
Optimized Analysis of the Lung Allograft Microbiota from Bronchoalveolar Lavage Fluid

by

Janice Evana Prescod

A thesis submitted in conformity with the requirements for the degree of Master of Science

Department of Laboratory Medicine and Pathobiology

University of Toronto

© Copyright by Janice Evana Prescod 2018

Optimized Analysis of the Lung Allograft Microbiota from Bronchoalveolar Lavage Fluid

Janice Prescod

Master of Science

Department of Laboratory Medicine and Pathobiology

University of Toronto

2018

Abstract

Introduction: Use of bronchoalveolar lavage fluid (BALF) for analysis of the allograft microbiota in lung transplant recipients (LTR) by culture-independent analysis poses specific challenges due to its highly variable bacterial density. Approach: We developed a methodology to analyze low-density BALF using a serially diluted mock community and BALF from uninfected LTR. Methods/Results: A mock microbial community was used to establish the properties of true-positive taxa and contaminants in BALF. Contaminants had an inverse relationship with input bacterial density. Concentrating samples increased the bacterial density and the ratio of community taxa (signal) to contaminants (noise), whereas DNase treatment decreased density and signal:noise. Systematic removal of contaminants had an important impact on microbiota-inflammation correlations in BALF. Conclusions: There is an inverse relationship between microbial density and the proportion of contaminants within microbial communities across the density range of BALF. This study has implications for the analysis and interpretation of BALF microbiota.


Acknowledgments

To my supervisor, Dr. Bryan Coburn, I am eternally grateful for the opportunity you have given me by accepting me as your graduate student. Over the past two years, working with you has been both rewarding and at times very challenging. Thank you for guiding me in the right direction, being as honest as possible, pushing me to do my best and, most of all, being a supportive mentor throughout my master’s degree. The lessons you have taught me have made me not only a better scientist but also a better person, and I will continue to use them throughout my future career.

To my lab mates, especially Saumya Bansal, who has endured this whole process with me from start to finish, and Ashley Rooney, for being my coffee mate and a great listener, thank you for being a great support system. I really do value all of your help and have grown to appreciate you all as my science family!

To my advisory committee, Dr. Tereza Martinu, Dr. Stephen Juvet and Dr. David Hwang, thank you all for your continued guidance throughout these two years. Your expertise in lung transplantation and pathology has allowed me to gain a greater understanding of this field.

To our collaborators at the Toronto Lung Transplant Program and within the Martinu lab, I would like to thank you for your work collecting samples, diagnosing patients and performing the cytokine analysis presented in this thesis. A special thank you to Dr. Liran Levey for collecting all the clinical data and for the cytokine analysis, and to Dr. Pierre Schneeberger and Dr. Youngho Lee for collaborating on the analysis of 16S data and the qPCR analysis of lung transplant samples.

To my family, I want to thank you for being the amazing supportive people who have relentlessly encouraged me to pursue my academic career. Words cannot express the appreciation I have for your unconditional support and the sacrifices you have made to help me achieve my goals.


Table of Contents

Acknowledgments

Table of Contents

List of Tables

List of Figures

List of Abbreviations and Definitions

Chapter 1 Introduction
1.1 Human microbiome
1.2 Airway microbiome
1.3 Allograft microbiota in lung transplant recipients
1.4 Chronic lung allograft dysfunction (CLAD) versus allograft microbiota
1.5 Limitations with allograft microbiota research to date
1.6 Rationale

Chapter 2 Materials and Methods
2.1 Optimization of low density samples with serially diluted mock community
2.2 Optimization of low density BALF samples from lung transplant recipients with CLAD

Chapter 3 Results
3.1 Mock community serial dilution
3.2 Pre-sequencing treatments
3.3 Post sequencing removal of contaminants
3.4 Removal of contaminants from lung transplant cohort
3.5 Threshold of 16S rRNA sequencing detection

Chapter 4 Discussion
4.1 Study limitations
4.2 Comparison to published reports
4.3 Future Directions

References


List of Tables

Table 1. List of contaminating taxa from mock community serial dilutions

Table 2. Contaminants that were reproducible within more than one density of mock community

Table 3. Contaminants that were retained post-filtration in mock communities

Table 4. Patient characteristics


List of Figures

Figure 1. A representation of the relationship between contaminants, microbial community complexity and density during 16S rRNA sequencing.

Figure 2. Overview of pre-sequencing optimization of low density samples

Figure 3. Range of absolute microbial densities within allograft microbiota of LTR.

Figure 4. Overview of negative control collection during the processing of mock community samples.

Figure 5. Sequence profile of serially diluted mock community samples

Figure 6. Overlapping taxa from mock community contaminants and reagents

Figure 7. Heat map of the relative abundance of taxa within each sample clustered by Bray-Curtis dissimilarity.

Figure 8. Source tracker 2 proportions for mock community samples

Figure 9. Bray-Curtis Dissimilarity between mock community samples to the highest density mock community (10^7 CFU/mL).

Figure 10. Reproducibility of technical replicates of the serially diluted mock community samples

Figure 11. Relative abundance of all contaminants in mock community samples

Figure 12. Quantitative results of pre-sequencing treatment on mock community samples using 16S rRNA qPCR.

Figure 13. Change in relative abundance of mock taxa and contaminants after pre-sequence treatment

Figure 14. Bray-Curtis dissimilarity between treatment samples and highest density mock community density

Figure 15. Histogram of the taxonomic composition of the BALF samples after each pre-sequencing treatment.

Figure 16. Post-sequencing methods for removal of contaminants

Figure 17. Histogram mock community post-sequencing filtration.

Figure 18. Principal Coordinate Analysis (PCoA) plots before and after post-sequencing filtration mock community samples.

Figure 19. Absolute microbial density within the allograft of lung transplant recipients with CLAD in the absence of infection

Figure 20. Cytokine expression for each group of LTR.

Figure 21. Source tracker 2 for the analysis of lung transplant recipient allograft microbiota

Figure 22. Histogram of post-sequence filtration of lung transplant BALF samples

Figure 23. Heatmap showing inverse in correlations between diversity and cytokines post filtration in LTR.

Figure 24. Inverted relationship between alpha diversity metrics and cytokines after filtration of LTR.

Figure 25. Proposed theory of spike in controls with mock community samples


List of Abbreviations and Definitions

16S rRNA 16S ribosomal ribonucleic acid, conserved region of gene used to identify

Alpha diversity Diversity of organisms in a sample

BALF Bronchoalveolar lavage fluid

BC Bray-Curtis dissimilarity

Berger Parker Abundance of the most abundant taxa

Beta Diversity Diversity between samples

BOS Bronchiolitis obliterans syndrome

CFU Colony forming units

Chao1 Estimate of total richness

CLAD Chronic lung allograft dysfunction

Ct Cycle threshold

FEV1 forced expiratory volume in 1 second

Microbiota Collection of microorganisms that live within and on a host

Noise Contaminants

OTU Operational taxonomic unit

OW Oral wash

PCoA Principal Coordinate Analysis. Used to visualize similarities/dissimilarities between samples

PCR Polymerase chain reaction

QIIME Quantitative Insights Into Microbial Ecology, a bioinformatics pipeline for analyzing microbiome sequences

qPCR Quantitative polymerase chain reaction

RA Relative abundance

RAS Restrictive allograft syndrome

Shannon Diversity Measure of richness (number of taxa) and evenness (distribution) in a sample

Signal True taxa within sample (mock community taxa)

TLC Total lung capacity

URT Upper respiratory tract


Chapter 1 Introduction

1.1 Human microbiome

1.1.1 What is the human microbiome?

The human body is home to trillions of microorganisms; these microorganisms and their genes are collectively called the human microbiome.1,2 Studies of the human microbiota (the collection of bacteria, archaea and fungi) began with Antonie van Leeuwenhoek in the late 1670s, who identified and cultured bacteria within the oral cavity, which he called “animalcules”.3 With the advancement of culture-independent techniques, we now know that microbes are not limited to the oral cavity but also inhabit the skin, nasal cavity, vagina and (at the highest density) the gut.4–7 Culture-independent techniques have not only made it possible to identify fastidious taxa, but have also expanded our understanding of the functions of these organisms in relation to the human host.8

Research on the human microbiome has gained popularity due to its health implications. One of the largest human microbiome studies, the Human Microbiome Project (HMP), was launched by the National Institutes of Health in 2008.9,10 The HMP was an expansion of the Human Genome Project, intended to understand how “the range of human genetic and physiological diversity… is influenced by the distribution and evolution of our microbial partners.”11 The objectives of the HMP were to: (1) characterize the “normal” microbiome at different sites and between healthy individuals; (2) address the changes in the human microbiome with respect to disease; and (3) create a database and standardized techniques for microbiome studies.2 Since its creation, the HMP has analyzed over 11,000 samples and collected several terabytes of microbial genomic data.12 To date, approximately 633 publications have cited the work produced from this project (https://commonfund.nih.gov/publications?pid=16). The Human Microbiome Project has demonstrated the impact microbial communities can have on human health and represents a new frontier for interrogating the mechanisms of human disease.

1.1.2 The mutualistic relationship between the microbiome and host

The gut microbiome remains by far the best studied human host-associated microbial ecosystem because of its large abundance of organisms and the relative ease of obtaining and sequencing stool.13 Most studies have focused on the gut microbiome’s interaction with the host, but many of the principles learned can be applied to other anatomical sites.13 A main objective in studying the human microbiome has been to define a healthy microbiome. In this context, the term “healthy” is generally defined by the absence of apparent disease within a host, not by the composition of the microbiome itself.13 Health-associated gut microbiota are generally dominated by taxa from the phyla Bacteroidetes and Firmicutes.10 However, formally defining a ‘healthy microbiome’ has been difficult for researchers due to the high variability in the bacterial communities that colonize healthy people.11,13

The microbiome aids host health through a mutualistic relationship involving the host’s metabolic, homeostatic and defense functions.8,14 The ratio between Bacteroidetes and Firmicutes within the gut differs between individuals with abnormal metabolism or obesity and healthy control subjects.8,15 In murine studies, an increase in Firmicutes and a decrease in Bacteroidetes are associated with greater weight gain in mice consuming the same amount of calories.16,17 Observations in human twin studies have linked obesity with shifts in microbial genes within the gut that contribute to carbohydrate metabolism and other metabolic pathways.18

The vaginal microbiome has gained interest from researchers because of its mutually beneficial relationship with the host: the vagina provides nutrients and optimal growth conditions, while the bacteria (e.g. Lactobacillus) play a role in preventing pathogenic organisms from colonizing the host.6 For example, Lactobacillus is known to decrease the pH of the vagina through the production of lactic acid and is correlated with a decreased risk of human immunodeficiency virus (HIV) acquisition.19,20 In addition, Lactobacillus in the vagina is known to stimulate the immune system.21 Commensal bacteria have also been observed to protect the host in the gut, by competing for attachment sites and preventing pathogenic species from entering the systemic circulation through breaks in the epithelial cells.7,22

Lastly, the human microbiome has also been linked to priming the development of the immune system and the ability to combat allergens.23 The intestinal microbiota during infancy has been hypothesized to prevent the immune system from overreacting to antigens in adulthood.22 This symbiotic relationship between the microbiota and the immune system promotes healthy microbial colonization, the induction of protective responses to pathogens and tolerance of harmless antigens.24 Overall, this cooperative relationship allows both host and microbes to thrive and has been a subject of interest because of its far-reaching potential to enhance many functions of the human body.

1.2 Airway microbiome

The lung microbiota has been studied less than the microbiota of other organs. Within the last several years, however, our understanding of the lung microbiota, the collection of microbes that reside within the human lung, has grown rapidly. The origin and composition of the lung microbiome in both healthy and diseased hosts have been investigated by culture-independent techniques such as 16S rRNA sequencing. This section discusses the lung microbiota literature in health and disease.


1.2.1 Are the lungs sterile? Conceptual biases that may have led to this conclusion

Although the human microbiota has been observed at various sites of the body, the lung has notoriously been one of the more difficult sites to study. Historical approaches that assessed the lung microbiota with culture-dependent techniques concluded that healthy lungs were sterile.25,26 In 1888, one of the first published studies proposed lung sterility after observing minimal growth of bacteria from the trachea and nasal passages of rabbits, and inferred that this was also true in humans.27 This conclusion is difficult to reconcile with the direct interaction the lungs have with the environment through inhalation of air (10^4-10^6 bacteria/mm^3) and the lungs’ connection to the highly dense oral flora.28,29 Yet, as Dickson et al. explain, this “sterile healthy lung” concept has been perpetuated by a variety of conceptual errors.27

Firstly, ‘sterility’ was generally established using bacterial culture. Many of the protocols used to observe the lung microbiota were optimized to detect bacterial pathogens (i.e. disease-causing organisms). Therefore, when healthy lungs were examined, the commensal bacteria were more difficult to cultivate.30 Not only are historical culture methods biased against commensal bacteria, but the low absolute microbial density that is a feature of healthy lungs also makes it harder to grow organisms in culture.30 Taxa cultivated from the healthy lung using traditional culture represent only a subset (60%) of the total bacteria detected with culture-independent techniques.31

The second potential error Dickson et al. proposed is the misinterpretation of cultured taxa as upper airway contaminants. The upper respiratory tract (URT) and the lungs are directly connected through mucosal continuity, and the URT is therefore a source of microbes within the lungs through migration.32,33 The lungs and the mouth share community membership; microbes common to both compartments should therefore not be dismissed as non-representative of the lung microbiota.27,34 Lastly, Dickson et al. proposed that there can be a failure to distinguish between sterility and the absence of resident taxa. In healthy people, taxa in the lung may only be transiently present and thus not consistently detected within the same individual over time.25,35

With the application of culture-independent techniques such as 16S rRNA gene sequencing, there has been a renewed interest in the lung as a potentially microbe-rich environment. Investigators have now been able to identify and make observations regarding the diversity and composition of microbes that reside within the lungs of healthy hosts and overcome the technical limitations of culture-dependent techniques.32,36,37

1.2.2 Composition of the lung microbiota

The lung microbiota is established by three processes: microbial immigration, microbial elimination and relative microbial reproductive rates.35,38 These factors determine the relative community membership and the total absolute microbial burden within the lungs. Changes within the lung microbiota can be attributed to the alteration of these three factors.27

Microbial immigration into the lungs occurs by micro-aspiration, direct inhalation of air and mucosal dispersion.35 Elimination of microbes from the lungs is achieved through innate immune defense mechanisms, coughing and mucociliary clearance. The result of these processes is a constant influx and efflux of bacteria within the human lung.35 Bacterial reproduction rates can be influenced by several factors that are endogenous to the host, such as pH, temperature, oxygen availability and nutrients. In addition, exogenous factors, such as air pollution, antibiotics and other medications, also play a role in the reproduction of the lung microbiota.35,38


There has been controversy regarding the relative proportion of resident community members (bacteria that colonize and reproduce) and transient community members (bacteria that immigrate and are then eliminated) within the lung.32,35,38 Dickson et al. applied the adapted island model of biogeography to explain the compositional and spatial variation with respect to influx and efflux of bacteria within the lung.35,38 Based on this model, in health, distal areas of the lung will have less bacterial diversity and less microbial similarity to the URT than proximal areas.38 Microbial species richness in this model is a function of the immigration of microbes originating from the URT. Distal airways will experience a lower rate of immigration due to their greater distance from the URT. However, rates of immigration, extinction and reproduction can be affected by changes in anatomy and physiology associated with disease, as well as by treatment. In advanced lung diseases, many factors may decrease extinction rates, including impaired ciliary function, decreased cough reflex and endobronchial obstruction. A decreased extinction rate would increase the burden of microbes within the distal areas and variably affect bacterial reproduction at these sites.38 Spatial variation and higher bacterial burden in the lung microbiota have been demonstrated in patients with cystic fibrosis and chronic obstructive pulmonary disease (COPD) in comparison to healthy controls.36,39
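The balance between immigration and extinction described above can be illustrated with a short numerical sketch. It assumes the classic linear MacArthur-Wilson form of island biogeography, which is a simplification chosen here for illustration and not necessarily the adapted model applied by Dickson et al. The function name, the source-pool size and all rate values are hypothetical, and the sketch models species richness only, not bacterial burden.

```python
def equilibrium_richness(pool_size, immigration_rate, extinction_rate):
    """Equilibrium richness S* where immigration balances extinction.

    Assumed linear rates (classic MacArthur-Wilson form):
      immigration of new taxa: I(S) = immigration_rate * (pool_size - S)
      extinction of taxa:      E(S) = extinction_rate * S
    Solving I(S) = E(S) gives
      S* = immigration_rate * pool_size / (immigration_rate + extinction_rate)
    """
    if pool_size < 0 or immigration_rate < 0 or extinction_rate < 0:
        raise ValueError("inputs must be non-negative")
    total_rate = immigration_rate + extinction_rate
    if total_rate == 0:
        raise ValueError("at least one rate must be positive")
    return immigration_rate * pool_size / total_rate


# Hypothetical, unitless values chosen only to illustrate the trend.
URT_POOL = 200  # taxa available in the upper respiratory tract source pool

# Proximal airway: close to the URT, so immigration is relatively high.
proximal = equilibrium_richness(URT_POOL, immigration_rate=0.5, extinction_rate=0.5)
# Distal airway in health: lower immigration, same extinction.
distal = equilibrium_richness(URT_POOL, immigration_rate=0.1, extinction_rate=0.5)
# Distal airway with impaired clearance: same immigration, lower extinction.
distal_impaired = equilibrium_richness(URT_POOL, immigration_rate=0.1, extinction_rate=0.1)

print(proximal, distal, distal_impaired)
```

Under these assumed rates, the distal site in health settles at a lower richness than the proximal site, and lowering the extinction rate raises the equilibrium at the distal site, which mirrors the qualitative predictions in the text.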

1.2.3 Characterizing the lung microbiota in health and disease

Characterization of the healthy lung microbiota has been attempted in a number of recent studies.32,36–38,40–45 While there has been variation in the “core” lung microbiota between studies, the most prevalent phyla are consistently Bacteroidetes and Firmicutes and, to a lesser extent, Proteobacteria and Actinobacteria.32,36,37,44,45 In addition, the genera Prevotella, Veillonella, Pseudomonas and Streptococcus have been observed consistently in a number of studies.32,38,41 Taxonomically, the lung and upper respiratory tract microbiota are similar in community membership, but differ in absolute and relative abundances.32,38,41,43 Comparisons of alpha diversity (diversity within a sample) have shown that the lung microbiota is lower in richness (number of different species) than the URT, yet some species have a higher relative abundance in the lower than in the upper respiratory tract.33,41 Studies of the healthy lung microbiota have generally included a small number of subjects, although Beck et al. published a multi-centered comparison of the lung microbiota in 86 healthy subjects and observed similar core taxa.40,46 Furthermore, there remains a lack of information regarding the healthy lung microbiota, such as its stability over time in healthy subjects. Projects such as the Lung HIV Microbiome Project aim to address some of these shortcomings with large multi-centered observations of the healthy lung microbiota (https://clinicaltrials.gov/ct2/show/NCT02392182).

Disease can have profound effects on the composition of the lung microbiota. Cystic fibrosis (CF) is one disease with well-established research regarding the lung microbiota, which in part may be due to the increased susceptibility to respiratory infections and to the pathogens that have been identified through culture-dependent and culture-independent techniques.46 Characterization of the CF lung microbiota through culture-independent techniques has identified complex communities during and after acute exacerbations. Rogers et al. observed that the lung microbiota of CF patients during infection is not always dominated by a single pathogen, such as Pseudomonas aeruginosa, but can be composed of a variety of bacterial species in large quantities (species richness 13.4 ± 6.7 per patient).47 Studies of other lung diseases, such as asthma and COPD, have reported differences in dominant phyla, including Bacteroidetes. Hilty et al. examined the lung microbiota of 43 subjects comprising patients with asthma or COPD and healthy controls.37 They observed that COPD and asthmatic patients had a higher prevalence of pathogenic Proteobacteria (Haemophilus spp.) in comparison to controls. Healthy subjects had more Prevotella spp. than subjects with either disease.


1.3 Allograft microbiota in lung transplant recipients

Lung transplantation is the last available life-extending therapy for the end-stages of a variety of lung diseases including cystic fibrosis, idiopathic pulmonary fibrosis and COPD.48 The world’s first successful single human lung transplant occurred in 1983 at the Toronto General Hospital.49 Now, thousands of lung transplants occur yearly worldwide. The latest report from the International Society for Heart and Lung Transplantation (ISHLT) stated that 60,107 lung transplantations were performed up until June 2016.50 In spite of the associated advancements in surgery and treatment for these lung diseases, the 10-year survival rate after lung transplantation is only 27% and is among the lowest of all solid organ transplants.48

Microbial infection has been observed to be a major risk factor for early death and the development of chronic rejection in lung transplant recipients (LTR).51 Bacteria, particularly airway pathogens such as Pseudomonas aeruginosa, have been observed to modulate the immune system, eliciting pro-inflammatory cytokine upregulation.52 With the advent of culture-independent techniques such as 16S rRNA sequencing, we have been able to expand our understanding of the impact of the lung microbiota on disease in lung transplant recipients beyond cultured pathogens.53–55 Research regarding the role of commensal bacteria, dysbiosis (pathological perturbation of whole microbial communities) and colonization by microbes not historically considered pathogens in the LTR airway is in its infancy. Yet, with our current understanding of the microbiota’s impact on human mucosal inflammation in other circumstances, the allograft microbiota represents a new area for understanding pathobiology and personalizing treatments for LTR.


1.3.1 Quantitative and qualitative insights into the allograft microbiota in LTR

The allograft microbiota in lung transplant recipients varies between patients in both microbial density and community structure. In healthy individuals, the bacterial burden of the lung is 2-4 log lower than that of the oral flora.32 This gradient can be maintained in the allograft of LTR; more commonly, however, bacterial loads are similar to those of the oral cavity, especially in patients with infection and suppurative lung disease.44,56 The lower airways of LTR differ in taxonomic composition from those of healthy controls. While the microbiota in healthy lungs generally shares community membership with the upper airway, this relationship is not always observed in LTR.32 Within the allograft microbiota, organisms belonging to the phylum Proteobacteria have been observed at higher abundances than in the healthy lower airway.44 Specifically, the allograft microbiota exhibits a higher frequency of dominant pathogenic taxa, including Pseudomonas, Staphylococcus and Achromobacter.44 Dominance by these genera is frequently observed in allograft microbiota with a high bacterial density.55,57,58 In a study performed by Dickson et al., symptomatic LTR were dominated by Pseudomonas aeruginosa, and this dominance was correlated with higher bacterial burden and lower diversity in comparison to healthy controls.58 Although the enrichment of certain pathogenic taxa within the lungs of LTR may be due to clinical and physiological factors, such as infection, other communities have also been identified in the allograft. Communities dominated by taxa normally considered ‘non-pathogens’, including species such as Pseudomonas fluorescens, are present in a subset of LTR.58 The significance of these communities is unclear, but may be related to allograft dysfunction, as discussed below.
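As a brief worked illustration of the "2-4 log lower" gradient between the oral flora and the healthy lung, the sketch below converts between log10 differences and fold differences. The helper names and the two example densities are hypothetical and are not measurements from any study cited here.

```python
import math


def log10_difference(density_a, density_b):
    """Difference in bacterial density expressed in log10 units (a relative to b)."""
    if density_a <= 0 or density_b <= 0:
        raise ValueError("densities must be positive")
    return math.log10(density_a / density_b)


def fold_change_from_log10(log_difference):
    """Convert a log10 difference back to a fold difference."""
    return 10.0 ** log_difference


# Hypothetical densities per mL, for illustration only.
oral_density = 1e8
lung_density = 1e5

gap = log10_difference(oral_density, lung_density)
print(f"oral vs lung: {gap:.1f} log, i.e. {fold_change_from_log10(gap):,.0f}-fold")

# The 2-4 log range quoted in the text corresponds to these fold differences.
print(fold_change_from_log10(2), fold_change_from_log10(4))
```

A 2 log gap is therefore a 100-fold difference and a 4 log gap a 10,000-fold difference in bacterial burden.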


Bacterial communities of the allograft microbiota

In an early study of the allograft microbiota using culture-independent techniques, Charlson et al. compared the microbiota of 6 healthy controls to that of 21 lung transplant recipients.44 Bronchoalveolar lavage fluid (BALF) and oropharyngeal wash (OW) were collected in parallel to assess the lung and URT microbiota, respectively. 16S rRNA sequencing was used to profile the taxonomic composition of each sample. They observed that LTR had a 44-fold higher bacterial burden (P<0.05) in comparison to healthy controls. Shannon diversity (richness and evenness) was lower in LTR (Shannon Diversity 1.8-3, P<0.05), and LTR had a greater prevalence of dominant organisms, such as Pseudomonas aeruginosa, than controls. Comparing OW and BALF, they observed a greater distinction in microbial composition between sample types within LTR than within controls (weighted UniFrac distance, P<0.05). Many taxa that dominated communities within BALF samples were detected by culture, including Pseudomonas aeruginosa. Yet there was a subset of patients who did not have positive cultures but showed dominance by a single taxon, generally a ‘non-pathogen’ such as Prevotella.
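To make the alpha diversity terms used above concrete, the following minimal sketch computes richness, Shannon diversity and the Berger-Parker index from a list of per-taxon counts. It is an illustration only: the count vectors are invented, the function names are hypothetical, and the natural logarithm is an assumption, since the source does not state which log base was used and software packages differ on this default.

```python
import math


def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for c in counts if c > 0)


def shannon_diversity(counts, base=math.e):
    """Shannon index H = -sum(p_i * log(p_i)) over taxa with non-zero counts.

    The log base is an assumption of this sketch; pass base=2 for bits.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in proportions)


def berger_parker(counts):
    """Relative abundance of the single most abundant taxon."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return max(counts) / total


# Invented read counts: four taxa in both communities (equal richness).
even_community = [25, 25, 25, 25]       # perfectly even
dominated_community = [97, 1, 1, 1]     # one taxon dominates

for name, counts in [("even", even_community), ("dominated", dominated_community)]:
    print(name, richness(counts), round(shannon_diversity(counts), 3),
          berger_parker(counts))
```

Both example communities have the same richness, but the dominated community has a much lower Shannon index and a Berger-Parker value close to 1, which is the pattern described for allografts dominated by a single organism.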

In a similar study, Dickson et al. assessed the microbial composition of allografts in LTR, stratified by clinical parameters, in comparison to healthy controls.58 They collected BALF from 33 LTR and 28 healthy controls, and assessed microbial density and composition through 16S rRNA quantitative polymerase chain reaction (qPCR) and pyrosequencing, respectively. Asymptomatic patients were defined as LTR who were undergoing a surveillance bronchoscopy, had no complaints of cough or fever and were not undergoing bronchoscopy for decreased lung function or new pulmonary infiltrates. As in the Charlson et al. study44, LTR overall had a higher bacterial burden in comparison to controls (P<0.05). Symptomatic patients exhibited lower diversity than controls (P<0.001) and were more likely to be dominated by Pseudomonas aeruginosa (mean relative abundance 11%). All samples with Pseudomonas aeruginosa as the dominant taxon were culture-positive for this organism. In contrast, samples from asymptomatic patients had similar diversity to controls and a greater prevalence and relative abundance of Pseudomonas fluorescens (mean relative abundance 12%). Interestingly, only one out of fifteen patients who were dominated by Pseudomonas fluorescens was also culture-positive for this organism. BALF samples dominated by P. aeruginosa had higher neutrophil levels than those dominated by P. fluorescens (P=0.03), suggesting that there are organism-specific relationships between ecological dominance and airway inflammation.

The presence of pathogens within the allograft microbiota may be due to recolonization by pre-transplant species. In a cohort of 14 LTR with cystic fibrosis, Syed et al. assessed the lung microbiota before and after transplantation in patients who underwent a bilateral lung transplant.59 They described the post-transplant microbiota as having a similar alpha diversity to the pre-transplant samples (Shannon diversity, P=0.65). Beta-diversity (differences in taxonomic composition between samples) was used to stratify patients into those with similar or dissimilar communities pre- and post-transplant. Patients with more similar communities (Bray Curtis mean 0.3) were dominated by the same pathogens, primarily Pseudomonas aeruginosa. In 11/12 cases, the re-established dominant Pseudomonas was genotypically the same as the pre-transplant strain, indicating that the allograft is colonized from the patient, not the environment.
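The Bray-Curtis dissimilarity used for beta-diversity comparisons such as the pre- versus post-transplant comparison above can be sketched as follows. This is a minimal illustration of the standard formula, the sum of absolute abundance differences divided by the sum of all abundances, and not the pipeline used in the cited study. The function name and the two example profiles are hypothetical.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two abundance profiles.

    Each sample is a dict mapping taxon name to abundance (counts or
    relative abundances). Taxa missing from a sample count as zero.
    Returns 0.0 for identical profiles and 1.0 for profiles with no shared taxa.
    """
    taxa = set(sample_a) | set(sample_b)
    numerator = sum(abs(sample_a.get(t, 0) - sample_b.get(t, 0)) for t in taxa)
    denominator = sum(sample_a.get(t, 0) + sample_b.get(t, 0) for t in taxa)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


# Invented relative abundance profiles, for illustration only.
pre_transplant = {"Pseudomonas": 0.7, "Streptococcus": 0.2, "Prevotella": 0.1}
post_transplant = {"Pseudomonas": 0.6, "Streptococcus": 0.1, "Staphylococcus": 0.3}

print(round(bray_curtis(pre_transplant, post_transplant), 3))
```

Because the value is bounded between 0 and 1, lower values indicate that the two communities share more of their composition, which is why a low mean dissimilarity is read as pre- and post-transplant communities being similar.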

Longitudinal analysis

Surveillance bronchoscopies are commonly performed at pre-specified intervals in LTR. They provide an opportunity to sample the allograft microbiota and can give insights into the dynamics of the microbiota over time. In a study performed by Borewicz et al. of 4 LTR and 2 non-transplant controls over 3 sampling points, shifts within the allograft microbiota over time were observed.60 Only 8-12% of the total relative abundance was retained between sampling points within patients. Luna et al. assayed the changes in the allograft microbiota within the first year after transplantation in 21 children who underwent lung transplantation.61 They collected 94 BALF samples and used 16S rRNA sequencing of the V3-V4 region to identify microbial communities. Interestingly, they observed that bacterial diversity increased until 9 months post-transplantation. Between 9 and 12 months after transplant, the diversity of the allograft microbiota was observed to decrease.

Allograft microbiota and infection

Respiratory infections such as bacterial pneumonia and tracheobronchitis have been associated with disease-specific microbial community composition and markers of host response in LTR.57 Shankar et al. examined the allograft microbiota in 16 LTR with bacterial pneumonia (n = 8), tracheobronchitis (n = 12) or colonization without respiratory infection (n = 29) and also assessed allograft inflammation.57 They observed lower microbial diversity in patients with bacterial pneumonia in comparison to both colonization and tracheobronchitis (P<0.05). This loss in diversity was mainly driven by a decrease in the abundance of commensal taxa such as Corynebacterium. They also observed elevated levels of the pro-inflammatory cytokine IL-1β, the anti-inflammatory IL-1RA and IL-4, and the immunoregulatory interferon γ (IFN-γ) in comparison to non-infected colonized LTR. Tracheobronchitis patients had a similar diversity index to uninfected LTR, yet displayed differences in taxonomic composition and in cytokine profiles that included macrophage inflammatory protein (MIP) 1β, IP-10, granulocyte colony-stimulating factor, IL-2 and IL-7. The researchers concluded that the differences in microbiota and cytokine profiles suggest that pneumonia and tracheobronchitis represent different pathological processes.


1.4 Chronic lung allograft dysfunction (CLAD) versus allograft microbiota

1.4.1 What is CLAD?

Chronic lung allograft dysfunction (CLAD, or “chronic rejection”) is the major cause of death for LTR during the late transplant period. Approximately 50% of LTR will develop CLAD within the first 5 years after transplantation.48,62 Inflammation of the allograft progressing to fibrosis is the hallmark of CLAD, which can be divided into two main phenotypes based on the affected area of the lung and the resulting dysfunction:

1) The first observed phenotype of CLAD is bronchiolitis obliterans syndrome (BOS). BOS is characterized by an obstructive physiology due to the development of inflammation and fibrosis within the small airways of the lung. BOS is identified by a persistent decline in forced expiratory volume in 1 second (FEV1) of >20% in comparison to the baseline within the LTR.63 There are various grades of BOS progression, which are determined by the decreasing FEV1 values. Approximately 75-85% of CLAD patients are diagnosed with BOS, and the median survival after diagnosis is 35 months.62

2) The second phenotype is restrictive allograft syndrome (RAS), which is characterized by inflammation and fibrosis in the peripheral lung and is associated with restrictive lung physiology.62,64 This phenotype was described in 2011 by Sato et al., who characterized RAS as a distinct phenotype of CLAD.64 While both BOS and RAS share a similar irreversible decline in pulmonary function tests, RAS patients also have a decline in total lung capacity (TLC).64 Of the 50% of LTR that are diagnosed with CLAD within 5 years after transplantation, 15% are diagnosed with RAS.64 RAS patients have a significantly worse prognosis compared to BOS patients, and their median survival after diagnosis is less than 2 years.64

While clinically and pathologically different, both phenotypes of CLAD exhibit fibrotic lesions and share risk factors for development, suggesting overlapping underlying pathobiology.64 Nonetheless, the poorer prognosis in RAS patients has created urgency in identifying risk factors specific to RAS.

1.4.2 Current literature regarding the allograft microbiota and CLAD

1.4.2.1 Pseudomonas colonization is a risk factor for CLAD development

One consistent association between the allograft microbiota and CLAD has been the link between the presence of Gram-negative species, such as Pseudomonas aeruginosa, and the onset and development of BOS.54,65,66 Current microbiota analysis and traditional culture have both established a strong basis in the literature for the association of Pseudomonas with BOS development. In a study of 17 LTR, Hayes et al. demonstrated that 12 LTR showed symptoms of BOS and all patients were infected with a Gram-negative species such as P. aeruginosa identified through culture.65 In a larger study of 92 LTR, Vos et al. identified that P. aeruginosa colonization was a risk factor for BOS development and was associated with worse stage BOS according to a univariate analysis.66 This work was followed up by Gregson et al., who concluded that the timing of Pseudomonas infection had a role in BOS development in LTR.67 Specifically, they observed an increased risk of the initial transition from stable to BOS associated with Pseudomonas infection and the associated inflammatory response.

While P. aeruginosa infection is a risk factor for BOS, the presence of this organism in the allograft without infection is not associated with disease in all studies. In 155 LTR, Botha et al. observed that 61% of the patients were colonized by P. aeruginosa after transplantation. They identified that de novo (newly acquired) Pseudomonas colonization was strongly associated with BOS development in comparison to those who were free of colonization, but found no association for those that were persistently colonized (i.e. were colonized pre-transplant).54 In addition, Willner et al. found the recolonization of Pseudomonas to be protective against BOS development in lung transplant patients who have cystic fibrosis.55 In a cross-sectional and longitudinal analysis of 57 LTR, reestablishment of the pre-transplant lung populations was negatively correlated with BOS development in CF patients. Conversely, de novo acquisition of the same genera increased the risk for BOS, similar to the observations of others.54,68 The presence of P. aeruginosa within the allograft microbiota has thus been observed to be associated with BOS development, but not in all circumstances. Furthermore, the relationship between the microbiota and CLAD development in the absence of dominant taxa, such as Pseudomonas, has yet to be elucidated.

1.4.2.2 The microbiota and inflammation of the allograft

Recently, Bernasconi et al. evaluated host-microbial interactions in LTR with regard to immunological tone and graft survival.69 They collected 209 BALF samples from 112 LTR, profiling the allograft microbiota and host inflammatory response. They assessed microbial community composition through 16S rRNA sequencing and host gene expression using qPCR. They discovered that different microbial community compositions were associated with pro-inflammatory or remodeling immunological profiles. Multiple patterns of ‘dysbiosis’ (the pathological perturbation of the microbiota) were identified in this cohort and related to patterns of immune activity. Bacteroidetes dysbiosis and Proteobacteria dysbiosis (dominance by either phylum) were associated with tissue remodeling and pro-inflammatory profiles, respectively, while intermediate activation of the gene expression profiles within the allograft was associated with a balance of these phyla. No additional studies have directly associated microbial community composition and patterns of dysbiosis with inflammatory profiles in this way.


1.4.2.3 Limited research regarding the allograft microbiota between CLAD phenotypes

To date, there have been limited data regarding the relatively new phenotype, RAS, and the allograft microbiota. In a study with 103 LTR, 73 with BOS and 24 with RAS, Verleden et al. observed that pulmonary infections of any kind were identified at a higher prevalence in BOS and RAS patients (51% and 54% respectively) than in non-CLAD LTR (33%).70 Colonization with Pseudomonas (defined as detection of the taxon within BALF without clinical symptoms) before CLAD onset was observed at a higher prevalence in BOS and RAS patients (32% and 45% respectively) than in non-CLAD LTR (19%). Overall, there were no statistically significant differences between CLAD phenotypes in the incidence of Pseudomonas colonization or respiratory infections, indicating once again that colonization and infection (the detection of the taxon in the presence of clinical symptoms or findings consistent with infection) with this organism are not pathologically identical. The relationship of other airway pathogens, non-pathogens and community composition has not been evaluated for RAS and BOS separately.

1.4.3 Mechanisms of host-microbiota interactions in CLAD development/progression

The allograft microbiota is in a dynamic relationship with the host and allograft immune response. It is proposed that there is a positive feedback mechanism between the allograft microbiota and the immune response, linking overgrowth of some taxa with increased inflammation, which further promotes changes in the microbiota. The mechanism of epithelial injury within the allograft lung may be due to the antibacterial response elicited by specific bacterial species.48 As Bernasconi et al. observed, Proteobacteria dysbiosis was associated with higher expression of type 1 inflammation markers such as tumor necrosis factor (TNF) and cyclooxygenase (COX)-2, while Bacteroidetes dysbiosis was linked to a remodeling response, characterized by an increase in the tissue inhibitor of metalloproteinase (TIMP)-1/matrix metalloproteinase (MMP)-12 ratio and platelet-derived growth factor D (PDGFD).69 Borthwick et al. associated anti-microbial activity to Pseudomonas with the expression of CXCL1 and CXCL5, cytokines associated with BOS.52 These cytokines lead to inflammation and activation of CD8+ lymphocytes and macrophages that mark the beginning of fibrotic remodeling and chronic rejection in LTR.52,71 However, many of the correlations between allograft microbiota and inflammation have been made in LTR with infection. Studies of allograft microbiota-host interactions in the absence of infection are needed to more strongly link the presence of these taxa and the development of CLAD.

1.5 Limitations with allograft microbiota research to date

1.5.1 Bias confounding effects of infection on microbiota structure

Studies of the post-transplant allograft microbiota largely include significant numbers of patients with infection.54,55,66,69,72 Although this has generated compelling data, the presence of infection has been correlated with the loss of diversity due to the emergence of a dominant organism in many lung diseases73–75, including pneumonia in LTR57, and thus may confound interpretations of microbiota-allograft interactions. Notably, Bernasconi et al. observed that pro-inflammation stimulatory taxa were associated with infection in LTR, but also that low stimulatory bacteria (usually non-pathogenic species) were enriched in LTR with pro-tissue remodeling and pro-fibrotic cytokine profiles.69 These findings indicate that specific microbiota-host relationships may be present in the absence of infection. In the allograft microbiota literature, there is little known about the influence of low stimulatory bacteria or the taxonomic composition of non-infected allograft microbiota and its relation to chronic rejection at CLAD diagnosis and longitudinally.


1.5.2 Technical limitations of culture-independent analysis of low density samples

1.5.2.1 What is culture-independent analysis for microbiome studies?

Culture-independent analysis in microbiome studies generally detects microbial communities directly from extracted DNA.76 The most common method for whole-community analysis is sequencing of the 16S rRNA gene, a ubiquitous region within the genome of bacteria that contains hypervariable regions which can distinguish taxa to species level through site-specific variants.77 Quantitation of bacterial load is often performed by 16S rRNA quantitative polymerase chain reaction (qPCR).78 During 16S rRNA qPCR, amplification of the 16S rRNA gene is performed using standard PCR techniques with primers and fluorescent probes specific to the conserved region of the gene.78 The process for 16S rRNA sequencing is as follows: (1) DNA is extracted; (2) PCR amplification of a variable region of the 16S rRNA gene is performed (to generate libraries for sequencing); (3) sequencing of the PCR products is performed on a next-generation sequencing platform; and (4) DNA sequences are aligned back to a database for identification and relative quantitation.77 Historically, DNA sequencing has been very expensive and time consuming. Within the last decade, high-throughput sequencing technologies have made identifying bacteria faster and more cost-effective, and a popular choice for analyzing the microbiota of humans.76,77

1.5.2.2 Impact of contaminants on the analysis of culture-independent analysis

Culture-independent techniques have been useful in identifying and characterizing microbial communities within the allograft, but have specific technical limitations of relevance to analysis in LTR. The most common sample type used to assess the lung microbiota has been BALF, as it is collected during scheduled surveillance and indication bronchoscopies post-transplantation.79


An important technical limitation in the analysis of the allograft microbiota using BALF is that samples can be low in bacterial density (compared to more commonly assayed samples such as stool or the oropharynx). During 16S rRNA gene sequencing, DNA from low-density bacterial communities is amplified along with contaminating DNA from numerous sources that include the upper airway, sample processing reagents and the environment.80 This makes it difficult to distinguish whether the sequenced taxa originated from the source of interest or from introduced contaminants. Figure 1 outlines the relationship between microbial starting density and the proportion of contaminating bacteria after 16S rRNA sequencing. DNA is ubiquitous within the laboratory setting, and even with aseptic techniques in collection, storage and processing, contamination of samples is unavoidable but is often only evident in low-density samples such as BALF from some LTR.32,80,81


[Figure 1 image: schematic plotting community complexity and proportion of contaminants (vertical axes) against Log2 microbial density (horizontal axis), from low-density sample (left) to high-density sample (right).]

Figure 1. A representation of the relationship between contaminants, microbial community complexity and density during 16S rRNA sequencing.

The blue line represents the increasing (left to right) community complexity of a sample and the red line represents the proportion of contaminants in the sample after sequencing. Low-density sample types, such as BALF, are characterized as containing low microbial density and low community complexity (richness and evenness) and are susceptible to a high proportion of contaminants after 16S rRNA gene sequencing. This is in comparison to a high-density sample, such as stool, which has high microbial density, high community complexity and a low proportion of contaminants observed after sequencing.


1.5.2.3 Ways researchers have observed low density samples for culture-independent analysis

To combat the technical limitations in analyzing low bacterial density samples, researchers have employed pre-sequencing treatments such as sample concentration to improve the overall density prior to sequencing. Absolute density has a negative correlation with contaminants during culture-independent analysis.81 Concentrating low-density samples can increase the overall microbial density and improve the identification of the true “signal” taxa within a sample. Other fields of microbiota research have employed concentration as a method for observing microbes in low-density sample types, such as sea water. This method is performed using ultrafiltration units to trap microorganisms and contents within the sample that are larger than the size of the pore.82–84 Although this technique has been useful in analyzing large volumes (sometimes liters) of low-density sample types, it has not been applied to BALF in LTR.

Another pre-sequencing treatment for reducing the contaminants observed during culture-independent analysis of low-density samples is the use of DNase.85 DNase treatment is performed to remove free-floating contaminating DNA within a sample. It has been observed to be effective in removing contaminants for 16S quantitative polymerase chain reactions (16S qPCR).86,87 For BALF samples, the use of DNase treatment can have an added benefit in confirming the presence of viable bacterial cells, as only free-floating DNA or dead bacteria are susceptible to DNase treatment.88 Notably, this generally requires using fresh samples, as freeze-thawing bacteria causes lysis. However, by using pre-sequencing sample preparation such as DNase treatment, it may be possible to increase the ability to observe signal DNA by removing contaminants in low-biomass samples for culture-independent analysis.


1.6 Rationale

To properly distinguish taxa that are found in the lung, we must create a protocol that can establish reproducible and consistent microbiota profiles from BALF that are not obscured by technical or experimental design factors. Our work will identify issues that are common when processing BALF samples. Our primary aim is to develop and apply methods for the analysis of BALF across the full range of bacterial density found in LTR, allowing us to assess relationships between microbial community structure and lung allograft status even in the absence of high bacterial burden and the confounding influence of infection.

1.6.1 Hypotheses

Hypothesis 1: We hypothesize that there will be an inverse relationship between initial sample density and proportion of contaminants within each sample. In addition, pre-sequencing sample concentration and DNase treatment will increase the ratio between signal DNA and contaminants.

Hypothesis 2: We hypothesize that: (1) in the absence of infection, a significant portion of LTR BALF samples will have bacterial densities at or near the threshold of detection; (2) allograft microbial diversity will be inversely correlated with inflammatory cytokine levels in BALF.

1.6.2 Overall Aims

The aims of this thesis are to:

1. Develop a technical approach to the analysis of low density BALF using a mock community and validate it in a BALF sample.

2. Characterize the allograft microbiota in a cohort of lung transplant recipients selected for lack of infection, using the methods described in Aim 1, and assess between-group differences.


Chapter 2 Materials and Methods

2.1 Optimization of low density samples with serially diluted mock community

2.1.1 Mock community and BALF collection

Microbial communities were created in vitro using three bacterial species obtained from the American Type Culture Collection (ATCC, Manassas, VA, USA): Pseudomonas aeruginosa PAO1, Stenotrophomonas maltophilia ATCC-13637 and Burkholderia multivorans ATCC-17616. Single colonies of each bacterial species were cultured overnight in 5 mL of Luria-Bertani (LB) medium at 37 °C while shaking at 200 rpm. A 10-fold serial dilution of the starter culture for each species was performed 6 times using sterile LB medium, and 100 µL of the 10⁻⁶ dilution was plated on LB agar and incubated overnight to quantify starting density and ensure purity of the culture. Samples were immediately stored at -80 °C until subsequent steps. Once the overnight cultures of each species were quantified, an equal mix of each species (1:1:1) was created to an initial mock community density of 10⁹ CFU/mL. Nine 10-fold serial dilutions were made using sterile LB. Mock communities in the range of 10⁷ to 10¹ CFU/mL were used and immediately stored at 4 °C until extraction that same day.
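The plate-count arithmetic behind this quantification can be sketched as follows. This is a minimal illustration, not the procedure used in this work: the colony count of 150 is a hypothetical value, and the function names are introduced here for clarity only. The plated volume (100 µL), the plated dilution (10⁻⁶) and the nine 10-fold dilutions from 10⁹ CFU/mL are taken from the text.

```python
def cfu_per_ml(colonies: int, dilution: float, plated_volume_ml: float) -> float:
    """Density of the undiluted culture estimated from one plate count."""
    return colonies / (dilution * plated_volume_ml)


def serial_dilution(start_density: float, steps: int, factor: float = 10.0) -> list[float]:
    """Densities obtained after each successive dilution step."""
    return [start_density / factor ** i for i in range(1, steps + 1)]


# Hypothetical plate: 150 colonies from 100 µL (0.1 mL) of the 10^-6 dilution.
starter_density = cfu_per_ml(colonies=150, dilution=1e-6, plated_volume_ml=0.1)

# Nine 10-fold dilutions of a 1e9 CFU/mL mock community (1e8 down to 1e0).
densities = serial_dilution(1e9, steps=9)

# The range used for the mock communities in the text: 1e7 to 1e1 CFU/mL.
used = [d for d in densities if 1e1 <= d <= 1e7]
```

With these assumed inputs the starter culture works out to roughly 1.5 × 10⁹ CFU/mL, and seven of the nine dilutions fall within the range that was carried forward.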

A single 4 mL BALF sample was obtained from a lung abscess patient undergoing a bronchoscopy. According to the patient’s clinical microbiology report, commensal bacteria were present in the BALF at < 10⁶ CFU/L. The sample was also clinically positive for scant yeast growth consistent with commensal flora. Upon collection of the BALF, the sample was immediately placed on ice and stored at -80 °C until further analysis.


2.1.2 Pre-sequencing treatment

To determine the effects of concentrating samples, we performed a 6- to 10-fold concentration on both the mock community densities and the BALF sample. For the mock community samples, 5 mL of each serial dilution was concentrated using Amicon Ultra-15 centrifugal filter units with a 30 kDa filter (MilliporeSigma, Darmstadt, Germany). The samples were concentrated by centrifugation at 3220 x g in 1-minute increments until an end volume of 500 µL was reached. For the BALF sample, we performed a 6-fold concentration with a starting volume of 3 mL following the same procedure. During all concentration runs, a negative control was performed following the same procedure using 5 mL of sterile water.

For the DNase treatment, we obtained DNase I from Invitrogen (Thermo Fisher Scientific, Waltham, Massachusetts, United States) and performed this treatment according to the manufacturer’s protocol. For each reaction, 2 µL of DNase I and 28 µL of 10x DNase Buffer were added to 250 µL of sample and incubated at 37 °C for 30 minutes. The samples were subsequently heat denatured at 65 °C for 10 minutes. A control was performed following the same procedure, substituting 250 µL of sterile water for the sample. All samples were immediately extracted following the protocol below.
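As a quick check on the reaction volumes given above, the following sketch computes the total reaction volume and the working concentration of the buffer. The helper name and dictionary layout are illustrative assumptions; only the three volumes and the 10x stock concentration come from the text.

```python
def final_concentration(stock_conc: float, stock_volume_ul: float, total_volume_ul: float) -> float:
    """Concentration of a component after dilution into the full reaction volume."""
    return stock_conc * stock_volume_ul / total_volume_ul


# Volumes per DNase reaction, in µL, as stated in the text.
components_ul = {"DNase I": 2.0, "10x DNase buffer": 28.0, "sample": 250.0}

total_ul = sum(components_ul.values())

# Working strength of the buffer (in "x" units) once diluted into the reaction.
buffer_x = final_concentration(10.0, components_ul["10x DNase buffer"], total_ul)

# Fraction of the reaction that is sample; this dilutes bacterial density slightly.
sample_fraction = components_ul["sample"] / total_ul
```

The volumes sum to 280 µL, so the 10x buffer ends at 1x, and the sample makes up about 89% of the reaction volume.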

2.1.3 DNA extraction

DNA was extracted using the DNeasy PowerSoil Kit (formerly MoBio PowerSoil™ DNA Isolation kit; Qiagen, Carlsbad, CA). DNA extraction was performed in duplicate using 125 µL of each sample to examine reproducibility. The extraction was performed according to the manufacturer’s protocol except for a deviation at step 12, where the sample was centrifuged for 2 minutes at 10,000 x g instead of 1 minute, according to the Human Microbiome Project protocol.89 All the samples were eluted in 55 µL of C6. Negative controls were performed during the DNA extraction step for each extraction run using sterile water instead of sample, and 50 µL of C6 was collected as a reagent control. All samples were stored at -20 °C until further analysis.

2.1.4 16S rRNA qPCR

Quantitative polymerase chain reaction (qPCR) of the 16S rRNA gene was performed to examine the absolute abundance of bacterial species in all samples. The qPCR assay used was developed by Nadkarni et al. and was performed using the Applied Biosystems SDS-7900 HT real-time PCR machine (Thermo Fisher Scientific, Foster City, CA).90 The reaction mixture included 10 µL of 2X TaqMan qPCR Master mix, 0.8 µL of 5 µM TaqMan probe ((6-FAM)-5’-CGTATTACCGCGGCTGCTGGCAC-3’-(TAMRA)), 1.2 µL of 5 µM of each forward (5’-TCCTACGGGAGGCAGCAGT-3’) and reverse (5’-GGACTACCAGGGTATCTAATCCTGTT-3’) primer, 5 µL of extracted DNA and 1.8 µL of sterile water.90 The amplification conditions were as follows: 10 min at 95 °C followed by 40 cycles of 95 °C for 15 s and 65 °C for 1 min. A negative control was performed using 5 µL of sterile water instead of extracted DNA. The standard curves were generated using known concentrations of genomic DNA from Pseudomonas aeruginosa (PAO1). All reactions were performed in duplicate and analyzed using the SDS software v2.3.
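Absolute quantification from a standard curve can be sketched as below. This is a generic illustration of the standard-curve calculation rather than what the SDS software does internally; the standard concentrations and Ct values are hypothetical, and all function names are introduced here for illustration.

```python
def fit_standard_curve(log10_copies: list[float], ct_values: list[float]) -> tuple[float, float]:
    """Least-squares line Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate 16S copies in a reaction."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope: float) -> float:
    """Per-cycle efficiency implied by the slope (1.0 means perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical 10-fold standards (log10 copies per reaction) and their mean Ct.
std_log10 = [2.0, 3.0, 4.0, 5.0, 6.0]
std_ct = [33.2, 29.9, 26.6, 23.3, 20.0]

slope, intercept = fit_standard_curve(std_log10, std_ct)
efficiency = amplification_efficiency(slope)

# Estimated copies in a hypothetical unknown with Ct = 25.0.
unknown_copies = copies_from_ct(25.0, slope, intercept)
```

Converting copies per reaction to copies per mL of original sample would additionally require the template, elution and input volumes, which are left out of this sketch.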

2.1.5 16S rRNA sequencing

Sequencing of the extracted DNA was performed at the Centre for the Analysis of Genome Evolution and Function (CAGEF) (Toronto, ON). The V4 hypervariable region of the 16S rRNA gene was amplified using a universal forward sequencing primer and a uniquely barcoded reverse sequencing primer to allow for multiplexing.91 Amplification reactions were performed using 12.5 µL of KAPA2G Robust HotStart ReadyMix (KAPA Biosystems, Wilmington, MA), 1.5 µL of 10 µM forward (5’-AATGATACGGCGACCACCGAGATCTACACTATGGTAATTGTGTGCCAGCMGCCGCGGTAA-3’) and reverse primers (5’-CAAGCAGAAGACGGCATACGAGAXXXXXXXXXXXXAGTCAGTCAGCCGGACTACHVGGGTWTCTAAT-3’, X denotes the unique barcode sequence), 8 µL of sterile water and 1.5 µL of DNA. The V4 region was amplified by cycling the reaction at 95 °C for 3 minutes followed by 30 cycles of 95 °C for 15 seconds, 50 °C for 15 seconds and 72 °C for 15 seconds, and finally held at 72 °C for a 5-minute extension. All amplification reactions were done in triplicate, checked on a 1% agarose TBE gel, and then pooled to reduce amplification bias. Pooled triplicates were quantified using the Quant-iT PicoGreen dsDNA Assay (Thermo Fisher Scientific, Waltham, MA) and combined at even concentrations. The final library was purified using Ampure XP beads (Agencourt, Beverly, MA), selecting for the bacterial V4 amplified band. The purified library was quantified using the Qubit dsDNA Assay (Thermo Fisher Scientific, Waltham, MA) and loaded on to the Illumina MiSeq for sequencing, according to the manufacturer’s instructions (Illumina, San Diego, CA). Sequencing was performed using the V2 (150bp x 2) chemistry.

2.1.6 Sequencing analysis

For 16S rRNA gene profiling, the UNOISE pipeline, available through USEARCH version 10.0.240, was used for sequence analysis.92–94 The last base, typically error-prone, was removed from all the sequences. Sequences were assembled and quality trimmed using -fastq_mergepairs and -fastq_filter, with -fastq_maxee set at 1.0 and 0.5, respectively. Assembled sequences shorter than 233 bp were removed. Following the UNOISE pipeline, unique sequences were identified from the merged pairs and sorted. Sequences were denoised and chimeras were removed using the unoise3 command in USEARCH. Assembled sequences were then mapped back to the chimera-free denoised sequences (OTUs) at 97% identity using the -usearch_global command. Taxonomy assignment was executed using SINTAX95, available through USEARCH, and the SINTAX-compatible Ribosomal Database Project (RDP) database version 16, with the default minimum confidence cut-off of 0.8.96 OTU sequences were aligned using PyNAST accessed through QIIME.97 Sequences that did not align were removed from the dataset and a phylogenetic tree of the filtered aligned sequence data was made using FastTree.98
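The expected-error criterion used for quality filtering can be illustrated with a short sketch. The expected number of errors in a read is the sum of the per-base error probabilities, each derived from its Phred score Q as 10^(-Q/10). The code below illustrates that definition together with the 233 bp length cut-off; it is not the USEARCH implementation, and the function names and the Phred+33 quality encoding are assumptions.

```python
def expected_errors(quality_string: str, phred_offset: int = 33) -> float:
    """Sum of per-base error probabilities, P = 10^(-Q/10), for one read."""
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string)


def passes_filter(sequence: str, quality_string: str, max_ee: float, min_length: int) -> bool:
    """Keep a read only if it is long enough and its expected errors are low enough."""
    return len(sequence) >= min_length and expected_errors(quality_string) <= max_ee


# Hypothetical merged reads: 'I' encodes Q40 and '+' encodes Q10 under Phred+33.
good_read = ("A" * 233, "I" * 233)
noisy_read = ("A" * 233, "+" * 233)
short_read = ("A" * 200, "I" * 200)

# Thresholds from the text for the filtering step: maxee 0.5, minimum length 233 bp.
kept = [passes_filter(seq, qual, max_ee=0.5, min_length=233)
        for seq, qual in (good_read, noisy_read, short_read)]
```

Only the first hypothetical read is retained: the second accumulates far more than 0.5 expected errors, and the third is shorter than 233 bp.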

2.1.7 Biostatistical analysis

Alpha and beta diversity analyses were performed using Quantitative Insights Into Microbial Ecology (QIIME) version 1.9.1. Beta diversity was calculated as the Bray-Curtis dissimilarity, using OTU tables rarefied to the lowest number of sequences within a sample (which must be >1500). Principal coordinate analysis (PCoA) plots were generated from the Bray-Curtis dissimilarity matrix output by QIIME. All taxonomic analysis was performed at the OTU level using rarefied tables as described above. All correlations were performed using non-parametric Spearman correlation (rs) with PRISM 7.
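Rarefaction and the Bray-Curtis dissimilarity can be sketched in a few lines. This is a from-scratch illustration of the two calculations, not the QIIME code; the OTU counts, the random seed and the function names are hypothetical.

```python
import random


def rarefy(counts: dict[str, int], depth: int, seed: int = 0) -> dict[str, int]:
    """Subsample reads without replacement down to a fixed depth."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    picked = random.Random(seed).sample(pool, depth)
    rarefied: dict[str, int] = {}
    for otu in picked:
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied


def bray_curtis(a: dict[str, int], b: dict[str, int]) -> float:
    """1 - 2 * (sum of shared minimum counts) / (total counts in both samples)."""
    shared = sum(min(a.get(otu, 0), b.get(otu, 0)) for otu in set(a) | set(b))
    return 1.0 - 2.0 * shared / (sum(a.values()) + sum(b.values()))


# Hypothetical OTU tables for two samples with unequal sequencing depth.
sample_a = {"OTU1": 1200, "OTU2": 600, "OTU3": 200}
sample_b = {"OTU1": 300, "OTU3": 900, "OTU4": 400}

# Rarefy both samples to the lowest depth, then compare them.
depth = min(sum(sample_a.values()), sum(sample_b.values()))
rarefied_a = rarefy(sample_a, depth)
rarefied_b = rarefy(sample_b, depth)
dissimilarity = bray_curtis(rarefied_a, rarefied_b)
```

Rarefying first keeps the comparison from being driven by differences in sequencing depth rather than in community composition.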

2.1.8 Overview of pre-sequencing workflow

An outline of the pre-sequencing workflow is shown in Figure 2.

Step 1 Concentration: A 10-fold concentration was performed using a starting volume of 5 mL of each mock community density. In addition, for the BALF sample, we performed a 6-fold concentration with a starting volume of 3 mL. A negative control was also performed with sterile water.

Step 2 DNase treatment: We performed a DNase treatment on 250 µL of both concentrated and non-concentrated samples. A negative control was also performed at this step with sterile water.

Step 3 DNA extraction: For reproducibility purposes, we performed each DNA extraction in duplicate. In parallel, we performed a negative DNA extraction control with sterile water and collected the elution control (C6) to identify any contaminants from the extraction kit and elution buffer.


Step 4: Sequencing of the 16S rRNA gene was performed by the Centre for the Analysis of Genome Evolution and Function (CAGEF) using the Illumina MiSeq platform.


[Figure 2 image: pre-sequencing workflow flowchart. Starting material: mock community or BALF. Step 1: concentrate using a 30 kDa centrifugal filter, giving separate (1) concentrated and (2) unconcentrated samples, with controls. Step 2: DNase treatment to remove free DNA, giving DNase-treated and untreated aliquots of each sample, with controls. Step 3: DNA extraction in duplicate to determine reproducibility, with controls. Step 4: 16S rRNA sequencing and qPCR of samples and reagent controls to determine microbial composition and density, respectively.]


2.2 Optimization of low density BALF samples from lung transplant recipients with CLAD

2.2.1 Study design

This study was approved by the University Health Network Research Ethics Board (15-9531-AE). We performed a retrospective study on 27 lung transplant recipients from the Toronto Lung Transplant Program (Toronto, Ontario, Canada). Patient sex, age, underlying condition, pulmonary function tests, and post-lung transplant bronchoscopy, radiology and pathology reports were abstracted from the medical records. These data were maintained by the physicians and research coordinators within the Toronto Lung Transplant Program. Patients underwent a surveillance or indication bronchoscopy, and a bronchoalveolar lavage was performed with approximately 100 mL of sterile saline into the right middle lobe. All samples were frozen at -80 °C until subsequent steps.

2.2.2 Diagnostic definitions

Patients were diagnosed with CLAD according to the criteria outlined by the International Society of Heart and Lung Transplantation.63 Using the pulmonary function test results, CLAD was diagnosed if there was an irreversible drop in forced expiratory volume in 1 second (FEV1) of a minimum of 20% in two sequential tests performed at least 3 weeks apart. In addition, the distinction between the two phenotypes of CLAD was made by assessing the total lung capacity (TLC) and the pathology/radiology reports. RAS patients were characterized as patients with a drop in TLC >10% and fibrosis in the peripheral lung in addition to the decreased FEV1.63,64 Control lung transplant recipients had no evidence of CLAD within the first four years post-transplantation. In total, there were 11 BOS, 4 RAS and 12 non-CLAD control patients. We utilized the first non-infected BALF samples after CLAD diagnosis. To determine if the patient had an infection, clinical microbiology reports were accessed and only samples with a negative bacterial culture (commensal or no growth) were used.
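The diagnostic thresholds described above can be summarized in a small sketch. It is a simplified illustration of the stated rules only (FEV1 drop of at least 20% on two sequential tests at least 3 weeks apart; TLC drop of more than 10% plus peripheral fibrosis for RAS). The data structure, field names and example values are hypothetical, and irreversibility and the pathology/radiology review are not modeled.

```python
from dataclasses import dataclass


@dataclass
class PulmonaryFunction:
    baseline_fev1: float                  # litres
    fev1_tests: list[tuple[int, float]]   # (day post-baseline, FEV1 in litres), chronological
    baseline_tlc: float                   # litres
    current_tlc: float                    # litres
    peripheral_fibrosis: bool             # from pathology/radiology reports


def has_persistent_fev1_drop(pf: PulmonaryFunction, min_drop: float = 0.20,
                             min_gap_days: int = 21) -> bool:
    """True if two sequential tests, at least 3 weeks apart, both show the drop."""
    tests = pf.fev1_tests
    for (day1, fev1_a), (day2, fev1_b) in zip(tests, tests[1:]):
        drop_a = 1 - fev1_a / pf.baseline_fev1
        drop_b = 1 - fev1_b / pf.baseline_fev1
        if drop_a >= min_drop and drop_b >= min_drop and day2 - day1 >= min_gap_days:
            return True
    return False


def classify(pf: PulmonaryFunction) -> str:
    """Assign 'non-CLAD', 'BOS' or 'RAS' from the simplified thresholds."""
    if not has_persistent_fev1_drop(pf):
        return "non-CLAD"
    tlc_drop = 1 - pf.current_tlc / pf.baseline_tlc
    if tlc_drop > 0.10 and pf.peripheral_fibrosis:
        return "RAS"
    return "BOS"


# Hypothetical patients.
ras_like = PulmonaryFunction(3.0, [(0, 2.2), (30, 2.1)], 6.0, 5.0, True)
bos_like = PulmonaryFunction(3.0, [(0, 2.2), (30, 2.1)], 6.0, 5.9, False)
stable = PulmonaryFunction(3.0, [(0, 2.9), (30, 2.8)], 6.0, 5.9, False)
labels = [classify(p) for p in (ras_like, bos_like, stable)]
```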

2.2.3 DNA extraction

DNA was extracted from approx. 300-500 µL aliquots of BALF using the MoBio PowerSoil™ DNA Isolation kit (now known as DNeasy; Qiagen, Carlsbad, CA). Technical replicates using the same samples were extracted according to the Human Microbiome Project protocol.89 Aliquots of sterile water were extracted in parallel and used as negative controls to assay for the presence of contaminants. Quantification of DNA was performed using the Qubit 4 fluorometer (Invitrogen, Carlsbad, CA, US).

2.2.4 16S rRNA sequencing, qPCR and bioinformatics

The isolated DNA was sequenced similarly to the methods described in 2.1.5. The V4 region of the 16S rRNA gene was amplified under the same conditions, and PCR-based library construction was performed in triplicate. These reactions were pooled together and combined to make the final library. This library was prepared according to the MiSeq user guide and sequencing was performed using the V2 (150bp x 2) chemistry on the Illumina MiSeq platform (Illumina, San Diego, CA). Technical replicates were sequenced in at least 2 runs. All samples followed the same protocol for 16S sequencing. 16S qPCR was performed using the same methods outlined in 2.1.4.

We utilized the UNOISE platform to analyze our sequences. All sample sequences were demultiplexed on the Illumina MiSeq. The same pipeline was followed for this run as described in 2.1.6. Alpha (Shannon Diversity, Chao1, Observed OTUs, Berger-Parker) and beta diversity (principal coordinate analysis and Bray-Curtis dissimilarity) analysis was performed in QIIME (version 1.9.1). The OTU tables were used to perform post-sequencing analysis in Excel, and statistical analysis (Spearman correlation) was performed using PRISM 7.
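The four alpha diversity measures named above can be written out as follows. This is an illustrative sketch rather than the QIIME code: the log base 2 for Shannon diversity and the bias-corrected form of Chao1 are assumptions about the settings used, and the OTU counts are hypothetical.

```python
import math


def shannon(counts: list[int], base: float = 2.0) -> float:
    """Shannon diversity H = -sum(p * log(p)) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def berger_parker(counts: list[int]) -> float:
    """Dominance: relative abundance of the most abundant OTU."""
    return max(counts) / sum(counts)


def observed_otus(counts: list[int]) -> int:
    """Richness: number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)


def chao1(counts: list[int]) -> float:
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed_otus(counts) + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical rarefied OTU counts for a single sample.
sample_counts = [50, 30, 15, 3, 1, 1, 0]

metrics = {
    "shannon": shannon(sample_counts),
    "berger_parker": berger_parker(sample_counts),
    "observed_otus": observed_otus(sample_counts),
    "chao1": chao1(sample_counts),
}
```

Shannon diversity and Berger-Parker dominance move in opposite directions as a single taxon comes to dominate a sample, which is why both are useful when looking for the loss of diversity described in the introduction.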


2.2.5 Cytokine analysis

Cytokines were quantified in the BALF samples using the Luminex multiplex bead analysis system (R&D Systems). Samples were thawed on ice and centrifuged at 3184 x g at 4 °C for 20 minutes. The supernatant was used for analysis as per the manufacturer’s instructions. A custom multiplex bead kit was used to measure cytokines based on the manufacturer’s instructions. Biomarker concentrations were obtained using a Bio-Plex® MAGPIX™ Multiplex reader (Bio-Rad Laboratories, Hercules, CA). For all analyses, any value falling below the lower limit of detection was assigned a level of 0 ng/mL. Cytokine levels were compared between CLAD patients and non-CLAD lung transplant recipient controls, and statistical significance between groups was calculated using the Kruskal-Wallis test in PRISM 7. Spearman correlations were generated for the cytokine levels and the four alpha diversity measures (Shannon diversity, Berger-Parker dominance, number of OTUs, and Chao1). The diversity measures were recalculated from post-filtration OTU tables, and Spearman correlations were calculated between the new alpha diversity values and cytokines. Spearman correlation coefficients were compared between pre- and post-filtration, and correlations of correlations were generated.
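One way to read the "correlations of correlations" step is sketched below: compute a Spearman coefficient between a diversity measure and each cytokine before and after filtration, then correlate the two sets of coefficients. This interpretation is an assumption, the analysis in this work was done in PRISM 7 rather than in code, and all values and names below are hypothetical.

```python
def rank(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rs is the Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


# Hypothetical Shannon diversity for five samples, before and after contaminant filtration.
shannon_pre = [2.1, 3.4, 1.2, 2.8, 3.9]
shannon_post = [1.0, 2.9, 1.5, 2.5, 3.1]

# Hypothetical cytokine levels for the same five samples.
cytokines = {
    "cytokine_A": [5.0, 1.0, 9.0, 3.0, 2.0],
    "cytokine_B": [1.0, 2.0, 3.0, 4.0, 5.0],
    "cytokine_C": [9.0, 8.0, 1.0, 2.0, 3.0],
}

rs_pre = [spearman(shannon_pre, levels) for levels in cytokines.values()]
rs_post = [spearman(shannon_post, levels) for levels in cytokines.values()]

# Agreement between the pre- and post-filtration correlation coefficients.
agreement = spearman(rs_pre, rs_post)
```

A high agreement value would indicate that filtration left the diversity-cytokine relationships largely unchanged, whereas a low value would indicate that contaminant removal altered them.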


Chapter 3 Results

3.1 Mock community serial dilution

3.1.1 16S rRNA gene sequencing of the serially diluted mock community

To assess the relationship between sequencing results and sample density across the range of bacterial densities we observe in post-transplant BALF samples, we performed 16S rRNA gene sequencing on a serially diluted community composed of equal proportions of three cultured bacterial species: Pseudomonas aeruginosa, Stenotrophomonas maltophilia and Burkholderia multivorans. These species (which we refer to as a ‘mock community’) were chosen because they have been observed to colonize the lung, have had their whole genomes sequenced (allowing easy identification by 16S rRNA gene sequencing) and can be easily cultured.99–103 We observed the absolute bacterial density in allograft BALF to be within a range of 4-6 log copies 16S rDNA/mL (Figure 3).58 The mock community was made to mimic this range by serially diluting an overnight culture (grown to 10^8 CFU/mL) from 10^7 to 10^1 CFU/mL in 10-fold increments. The use of defined input bacterial species enabled us to identify all “non-mock community” taxa as contaminants. All samples were extracted using DNeasy DNA extraction kits with run-specific negative controls, which also underwent 16S rRNA gene sequencing (Figure 4, outline of negative control collection).

Taxonomic profiles from the mock communities through the range of dilutions are shown in Figure 5A. We observed a statistically significant negative relationship between the mock community density and the total relative abundance of contaminants (Spearman correlation (rs) = -0.973, p = 0.002). At the highest mock community bacterial density, 10^7 CFU/mL, the mock community taxa comprised a mean of 97.5% of the total relative abundance. By the 5th dilution (10^2 CFU/mL), the contaminants exceeded 90% of the 16S rRNA sequences (Figure 5A). We identified 1091 Operational Taxonomic Units (OTUs) that were not mock community members. These contaminating taxa ranged from those commonly found in the environment, such as Acinetobacter, to common human skin- or gut-associated genera such as Corynebacterium, Streptococcus and Bacteroides (Table 1). Taxa that were present in high relative abundance (>10%), including Acinetobacter, Serratia proteamaculans, Enterobacteriaceae and Pseudomonas, were observed mainly in the lowest density mock community samples. We classified taxa as either ‘signal’ (mock community members) or ‘noise’ (contaminants) and observed that the signal to noise ratio decreased as input sample density decreased (Figure 5B).
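The signal and noise classification can be illustrated with a short Python sketch that takes the reads per taxon in one sample and a set of known mock community members. The taxon labels, read counts and function names below are hypothetical and serve only to show the calculation; they are not data from this study.

```python
def signal_to_noise(otu_counts, mock_taxa):
    """Ratio of reads assigned to mock community taxa to all other reads."""
    signal = sum(c for taxon, c in otu_counts.items() if taxon in mock_taxa)
    noise = sum(c for taxon, c in otu_counts.items() if taxon not in mock_taxa)
    if noise == 0:
        return float("inf")  # no contaminant reads detected
    return signal / noise


def contaminant_fraction(otu_counts, mock_taxa):
    """Total relative abundance of non-mock (contaminant) taxa."""
    total = sum(otu_counts.values())
    noise = sum(c for taxon, c in otu_counts.items() if taxon not in mock_taxa)
    return noise / total


if __name__ == "__main__":
    # Hypothetical labels for the three mock community members.
    mock = {"P. aeruginosa", "S. maltophilia", "B. multivorans"}
    # Hypothetical read counts for a high density sample.
    sample = {"P. aeruginosa": 400, "S. maltophilia": 300,
              "B. multivorans": 275, "Acinetobacter": 25}
    print(signal_to_noise(sample, mock), contaminant_fraction(sample, mock))
```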

[Figure 3: 16S rDNA log copy number/mL (y-axis, 0-8) for the CLAD Cohort Project and Surveillance Bronchoscopy groups.]

Figure 3. Range of absolute microbial densities within the allograft microbiota of LTR. BALF was collected from lung transplant recipients undergoing surveillance bronchoscopies or at their first BALF after chronic lung allograft dysfunction (CLAD) diagnosis in the absence of infection.


[Figure 4: flow chart with four steps: 1. Overnight culture of mock community; 2. Pre-sequencing treatment (concentration, DNase treatment) with negative water controls; 3. DNA extraction with extraction kit negative water control and elution buffer (C6) control collection; 4. 16S rRNA sequencing with 16S rRNA gene PCR negative control.]

Figure 4. Overview of negative control collection during the processing of mock community samples.


[Figure 5: A, stacked bar chart of relative abundance (0-1) by taxon; B, 16S signal:noise (0.001-100) and 16S rRNA qPCR cycle threshold (Ct, 15-40) against mock community bacterial density (10^1-10^7 CFU/mL).]

Figure 5. Sequence profile of serially diluted mock community samples. A- Taxonomic composition of the serially diluted mock community samples. The mock community taxa are represented in black and contaminant taxa are represented by various colours. Extractions were performed in duplicate for each density; only the first extraction is shown. With increasing dilutions, the proportion of bacterial reads from the mock community members decreases and contaminants become more abundant. B- Signal to noise ratio of mock community samples over various starting densities. The red points and line represent the 16S qPCR individual sample values and mean, respectively. The blue points and line represent the 16S signal (mock community taxa) to noise (contaminant) ratio for individual sample values and mean, respectively.

Table 1. List of contaminating taxa from mock community serial dilutions

Columns: Domain, Phylum, Operational taxonomic unit

Domain Bacteria, phylum Actinobacteria: Actinobacteria, Iamiaceae, Actinomycetales, Actinomyces, Actinomyces odontolyticus, Mobiluncus, Brevibacterium, Brevibacterium ammoniilyticum, Cellulomonadaceae, Cellulomonas, Corynebacterium, Corynebacterium diphtheriae, Corynebacterium pilbarense, Corynebacterium tuberculostearicum, Brachybacterium, Dermacoccus, Kytococcus sedentarius, Dietzia, Dietzia maris, Geodermatophilaceae, Blastococcus aggregatus, Geodermatophilus terrae, Modestobacter multiseptatus, Intrasporangiaceae, Knoellia, Kineosporiaceae, Kineococcus rhizosphaerae, Pseudokineococcus lusitanus, Quadrisphaera granulorum, Microbacteriaceae, Amnibacterium kyonggiense, Curtobacterium citreum, Herbiconiux ginsengi, Leucobacter, Leucobacter albus, Microbacterium, Schumannella luteola, Micrococcaceae, Arthrobacter, Arthrobacter agilis, Arthrobacter psychrochitiniphilus, Kocuria, Kocuria kristinae, Kocuria salsicia, Micrococcus luteus, Rothia, Rothia dentocariosa, Rothia endophytica, Rothia mucilaginosa, Mycobacterium, Nakamurella flavida, Nocardia coeliaca, Rhodococcus cerastii, Williamsia muralis, Aeromicrobium halocynthiae, Marmoricola, Nocardioides, Nocardioides iriomotensis, Nocardioides terrigena, Promicromonosporaceae, Microlunatus aurantiacus, Propionibacterium acnes, Propionibacterium granulosum, Actinomycetospora, Actinomycetospora succinea, Pseudonocardia, Pseudonocardia hydrocarbonoxydans, Sanguibacter, Streptomyces, Bifidobacterium, Bifidobacterium animalis subsp. lactis, Bifidobacterium bifidum, Bifidobacterium dentium, Atopobium rimae, Collinsella, Eggerthella lenta, Enterorhabdus mucosicola, Olsenella, Patulibacter minatonensis, Solirubrobacter, Solirubrobacter taibaiensis

Domain Bacteria, phylum Bacteroidetes: Bacteroidetes, Bacteroidales, Bacteroides, Bacteroides fragilis, Bacteroides uniformis, Bacteroides xylanisolvens, Porphyromonadaceae, Odoribacter, Parabacteroides distasonis, Parabacteroides goldsteinii, Parabacteroides merdae, Porphyromonas catoniae, Prevotella, Prevotella denticola, Prevotella nigrescens, Prevotella salivae, Alistipes, Alistipes onderdonkii, Cytophagales, Adhaeribacter, Cytophaga hutchinsonii, Dyadobacter, Hymenobacter, Hymenobacter flocculans, Hymenobacter latericoloratus, Hymenobacter soli, Larkinella insperata, Nibribacter koreensis, Pontibacter, Pontibacter akesuensis, Rufibacter, Rufibacter immobilis, Cryomorphaceae, Flavobacteriaceae, Chryseobacterium, Chryseobacterium gambrini, Chryseobacterium hagamense, Chryseobacterium hispanicum, Chryseobacterium hungaricum, Chryseobacterium indoltheticum, Cloacibacterium haliotis, Cloacibacterium normanense, Empedobacter falsenii, Epilithonimonas, Flavobacterium, Flavobacterium akiainvivens, Flavobacterium ceti, Flavobacterium lindanitolerans, Flavobacterium rakeshii, Chitinophagaceae, Cnuella takakiae, Sphingobacteriaceae, Arcticibacter, Arcticibacter svalbardensis, Mucilaginibacter, Pedobacter, Pedobacter xixiisoli, Sphingobacterium, Sphingobacterium alimentarium, Sphingobacterium thermophilum

Domain Bacteria, phylum Firmicutes: Firmicutes, Bacilli, Bacillales, Bacillaceae 1, Bacillus, Caldibacillus, Geobacillus stearothermophilus, Thermicanus aegyptius, Gemella haemolysans, Exiguobacterium mexicanum, Listeria welshimeri, Paenibacillus, Planococcaceae, Lysinibacillus sphaericus, Staphylococcaceae, Jeotgalicoccus nanhaiensis, Staphylococcus, Staphylococcus vitulinus, Lactobacillales, Aerococcus urinaeequi, Alloiococcus otitis, Desemzia incerta, Granulicatella elegans, Trichococcus palustris, Enterococcus, Lactobacillaceae, Lactobacillus, Lactobacillus faecis, Lactobacillus fermentum, Lactobacillus intestinalis, Lactobacillus salivarius, Lactobacillus taiwanensis, Leuconostoc, Leuconostoc citreum, Lactococcus taiwanensis, Streptococcus, Streptococcus gallolyticus subsp. pasteurianus, Clostridiales, Clostridium sensu stricto, Clostridium paraputrificum, Thermoanaerobacterium thermosulfurigenes, Anaerococcus, Anaerococcus octavius, Anaerococcus tetradius, Parvimonas micra, Clostridiales Incertae Sedis XIII, Eubacterium coprostanoligenes, Eubacterium desmolans, Lachnospiraceae, Acetatifactor muris, Anaerostipes, Anaerostipes butyraticus, Eubacterium hadrum, Blautia, Blautia faecis, Blautia glucerasea, Blautia luti, Blautia wexlerae, Ruminococcus obeum, Clostridium XlVa, Clostridium bolteae, Clostridium nexile, Clostridium scindens, Clostridium colinum, Clostridium lactatifermentans, Eubacterium hallii, Eubacterium ventriosum, Ruminococcus gnavus, Ruminococcus2, Peptostreptococcaceae, Clostridium sordellii, Intestinibacter bartlettii, Romboutsia, Terrisporobacter glycolicus, Ruminococcaceae, Butyricicoccus pullicaecorum, Clostridium IV, Clostridium leptum, Faecalibacterium prausnitzii, Flavonifractor plautii, Gemmiger formicilis, Intestinimonas butyriciproducens, Oscillibacter, Pseudoflavonifractor capillosus, Erysipelotrichaceae, Clostridium XVIII, Clostridium ramosum, Clostridium saccharogumia, Clostridium spiroforme, Erysipelotrichaceae incertae sedis, Faecalicoccus acidiformans, Turicibacter sanguinis, Phascolarctobacterium succinatutens, Veillonellaceae, Dialister succinatiphilus, Veillonella, Veillonella atypica, Veillonella dispar, Veillonella ratti

Domain Bacteria, phylum Proteobacteria: Proteobacteria, , Caulobacteraceae, Brevundimonas, Brevundimonas mediterranea, Caulobacter segnis, Phenylobacterium, Rhizobiales, Aureimonas, Aureimonas altamirensis, Bradyrhizobiaceae, Bosea lathyri, Bosea robiniae, Bradyrhizobium, Ochrobactrum daejeonense, Ochrobactrum grignonense, Pseudochrobactrum saccharolyticum, Devosia, Devosia insulae, Devosia subaequoris, , , Methylobacterium cerastii, Methylobacterium iners, Methylobacterium tarhaniae, Microvirga, Rhizobiaceae, Rhizobium, Rhizobium calliandrae, Rhizobium cellulosilyticum, Rhizobium grahamii, Rhizobium vignae, Shinella zoogloeoides, Azorhizobium doebereinerae, Rhodobacteraceae, Haematobacter missouriensis, Paracoccus, Paracoccus marcusii, Paracoccus yeei, Acetobacteraceae, Roseomonas, Roseomonas aestuarii, Roseomonas aquatica, Roseomonas gilardii subsp. rosea, Roseomonas ludipueritiae, Rhodospirillaceae, Azospirillum, Azospirillum thiophilum, Skermanella, Skermanella aerolata, Reyranella, Reyranella soli, Sphingomonadales, Altererythrobacter, Altererythrobacter aestuarii, Altererythrobacter troitsensis, Sphingomonadaceae, Novosphingobium, Sphingobium, Sphingobium limneticum, Sphingobium xenophagum, Sphingomonas, Sphingomonas aestuarii, Sphingomonas alpina, Sphingomonas aurantiaca, Sphingomonas cynarae, Sphingomonas dokdonensis, Sphingomonas endophytica, Sphingomonas laterariae, , , Alcaligenaceae, Achromobacter anxifer, Alcaligenes faecalis subsp. parafaecalis, Burkholderia dilworthii, Cupriavidus, Ralstonia, citratiphilum, , Tepidimonas fonticaldi, , , Comamonas jiangduensis, Comamonas odontotermitis, gracilis, bisanensis, Pseudorhodoferax, Ramlibacter henchirensis, Schlegelella thermodepolymerans, arseniciresistens, , Herbaspirillum chlorophenolicum, , Massilia aurea, Massilia namucuonensis, Massilia niabensis, Massilia norwichensis, Massilia plicata, Noviherbaspirillum malthae, Sutterella, Hydrogenophilus hirschii, Tepidiphilus, Methylobacillus gramineus, Neisseriaceae, Morococcus, Neisseria, Azoarcus, Zoogloea resiniphila, Peredibacter starrii, Cystobacter badius, Campylobacter*, Campylobacter rectus, Gammaproteobacteria, Aeromonas, Shewanella xiamenensis, Rheinheimera, Rheinheimera soli, Enterobacteriaceae**, Escherichia fergusonii, Kosakonia sacchari, Pantoea, Serratia, Serratia proteamaculans**, Halomonas, Haemophilus, Haemophilus parainfluenzae, Moraxellaceae, Acinetobacter**, Acinetobacter beijerinckii, Acinetobacter bohemicus, Acinetobacter indicus, Acinetobacter seifertii, Acinetobacter ursingii, Alkanindiges illinoisensis, Enhydrobacter aerosaccus, Moraxella porci, Psychrobacter, Pseudomonadaceae, Cellvibrio ostraviensis, Pseudomonas**, Pseudomonas abietaniphila, Pseudomonas azotifigens, Pseudomonas balearica, Pseudomonas cichorii, Pseudomonas guariconensis, Pseudomonas knackmussii, Pseudomonas lini, Pseudomonas psychrophila*, Pseudomonas psychrotolerans, Pseudomonas tremae, Pseudomonas xanthomarina, Pseudomonas zeshuii, Pseudomonas zhaodongensis, Xanthomonadaceae, Dyella, Luteimonas abyssi, Lysobacter, Lysobacter xinjiangensis, Pseudoxanthomonas indica, Pseudoxanthomonas mexicana, Pseudoxanthomonas taiwanensis, Stenotrophomonas, Pseudomonas beteli, Stenotrophomonas rhizophila, Vulcaniibacterium thermophilum

Domain Bacteria, phylum Other: Bacteria, Blastocatella fastidiosa, Gp6, Saccharibacteria genera incertae sedis, Cyanobacteria/Chloroplast, Streptophyta, Mucispirillum schaedleri, Deinococcus, Deinococcus reticulitermitis, Deinococcus wulumuqiensis, Truepera radiovictrix, Thermus scotoductus, Fusobacterium, Leptotrichia, Anaeroplasma, Mycoplasma salivarium, Spartobacteria genera incertae sedis, Akkermansia muciniphila, Luteolibacter luojiensis

Domain Archaea: Methanobacterium, Nitrososphaera viennensis

(*) Denotes taxa that had a maximum relative abundance of 1% in at least 1 sample.
(**) Denotes taxa that had a maximum relative abundance of 10% in at least 1 sample.
Bolded taxa were present (>0) in +50% of all the mock community samples.
Red taxa represent OTUs that had both a maximum RA of more than 1% and were present in greater than 50% of samples.


3.1.2 Reagents are largest source of contaminants within mock community samples

To identify the source of the contaminants within the mock community samples, we compared the sequenced communities to negative controls collected at each step, including DNA extraction and library amplification (PCR) controls. The PCR sequencing control and DNA extraction control taxonomic profiles contained 225 and 757 OTUs, respectively. One hundred sixteen contaminating OTUs from mock community samples overlapped with either the PCR control taxa (24 OTUs) or both the PCR control taxa and DNA extraction control taxa (96 OTUs) (Figure 6). Four hundred and six contaminant OTUs were identified in both the DNA extraction controls and mock community samples, and were similar to those identified by Salter et al. in their study of reagent contamination in DNA extraction kits.81 Figure 7 shows a heat map of the taxonomic composition of mock community samples and controls clustered by Bray-Curtis dissimilarity. Notably, the two lowest density samples (10^1-10^2 CFU/mL) cluster more closely with the reagent controls than with higher density mock community samples, meaning they are more taxonomically similar in composition to these reagents.
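The overlap counts reported here amount to a partition of the contaminant OTUs by the controls in which they also appear. A minimal Python sketch of that partition using set operations is given below; the OTU identifiers and the function name are hypothetical, and the sketch makes no claim about how the counts in this section were actually tallied.

```python
def partition_by_control(contaminants, pcr_control, extraction_control):
    """Split contaminant OTUs by which negative controls also contain them."""
    contaminants = set(contaminants)
    in_pcr = contaminants & set(pcr_control)
    in_extraction = contaminants & set(extraction_control)
    return {
        "pcr_only": in_pcr - in_extraction,
        "extraction_only": in_extraction - in_pcr,
        "both": in_pcr & in_extraction,
        "neither": contaminants - in_pcr - in_extraction,
    }


if __name__ == "__main__":
    # Hypothetical OTU identifiers for illustration.
    groups = partition_by_control(
        contaminants={"otu1", "otu2", "otu3", "otu4"},
        pcr_control={"otu1", "otu2", "otu9"},
        extraction_control={"otu2", "otu3"},
    )
    for name, members in groups.items():
        print(name, sorted(members))
```

The four groups are disjoint and together cover every contaminant OTU, so their sizes sum to the total number of contaminants.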

Using SourceTracker2, we approximated the proportion of taxa coming from potential “source” environments for each replicate across the range of dilutions. This approach uses a Bayesian model to assign sequences to a defined set of input source communities.104 For our data set, we compared our “sink” samples, which include the serially diluted mock community samples from 10^6 CFU/mL to 10^1 CFU/mL, to 4 source communities: DNA elution controls; DNA extraction controls; PCR reagents; and the highest density mock community (10^7 CFU/mL) (Figure 8). At the lowest mock community densities (10^1-10^2 CFU/mL), 67-78% of reads were predicted to originate from the DNA extraction controls. Sequences from moderate density communities were attributed approximately equally to PCR reagents and the mock community taxa, while the highest density community was dominated by taxa overlapping with the 10^7 CFU/mL mock community sample.

[Figure 6: overlap diagram of three sets: PCR Reagent, DNA Extraction and Mock Contaminants.]

Figure 6. Overlapping taxa from mock community contaminants and reagents


[Figure 7: heat map. Sample order after clustering: Elution Control 3; Extraction Control 1; Extraction Control 3; 10^1 CFU/mL Rep 2; 10^1 CFU/mL Rep 1; Extraction Control 2; 10^2 CFU/mL Rep 2; 10^2 CFU/mL Rep 1; Elution Control 2; Elution Control 1; Sequencing Control; 10^7 CFU/mL Rep 2; 10^7 CFU/mL Rep 1; 10^6 CFU/mL Rep 2; 10^6 CFU/mL Rep 1; 10^5 CFU/mL Rep 2; 10^4 CFU/mL Rep 1; 10^3 CFU/mL Rep 1; 10^5 CFU/mL Rep 1; 10^4 CFU/mL Rep 2; 10^3 CFU/mL Rep 2.]

Figure 7. Heat map of the relative abundance of taxa within each sample clustered by Bray-Curtis dissimilarity. Low density mock community samples (10^1-10^2 CFU/mL) cluster more closely to reagent controls than to higher density mock community samples.

[Figure 8: A, stacked proportions (0-1) of sequences attributed to each source (DNA Elution Control, DNA Extraction Control, Mock Community, PCR Reagent, Other) for replicates 1 and 2 at 10^1-10^6 CFU/mL; B, bar charts (0-1) for replicate 1 at each density from 10^1 to 10^6 CFU/mL (DNA Elution Control, DNA Extraction Control, Mock Community, PCR Control, Unknown).]

Figure 8. SourceTracker2 proportions for mock community samples. The “source” environments were the mock community negative reagent controls, which include the DNA elution control (C6, DNeasy kit), DNA extraction controls and PCR reagent controls, together with the highest density mock community (10^7 CFU/mL); “other” denotes sequences not attributed to any of the source communities. A- The proportion of sequences attributed to each source. B- Bar charts for the first replicate of each mock community density, with standard deviations.

3.1.3 16S rRNA sequencing accuracy varies by density

We then defined sequencing accuracy using an aggregate measure of taxonomic compositional dissimilarity (Bray-Curtis dissimilarity, BC), which compares the relative abundance of all taxa across all communities (Figure 9). The BC dissimilarity to the highest density mock community sample (used as the reference starting community) was calculated for all mock densities. As the density of the mock community decreases, the samples become increasingly dissimilar (higher BC dissimilarity) to the starting mock community (rs = -0.93, p = 0.002). The dissimilarity plateaus at the 10^2 CFU/mL dilution. Notably, this is the same dilution at which bacterial quantitation by 16S rRNA gene qPCR plateaus (Figure 5B) and at which >90% of OTUs represent contaminants (Figure 5A).
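For readers unfamiliar with the measure, Bray-Curtis dissimilarity between two samples is the sum of absolute abundance differences across taxa divided by the sum of all abundances, giving 0 for identical samples and 1 for samples with no taxa in common. The following Python sketch shows the calculation against a reference sample; the taxon labels and abundances are hypothetical, and this is not the QIIME implementation used in the analysis.

```python
def relative_abundance(counts):
    """Convert a {taxon: read count} mapping to relative abundances."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {taxon: abundance} mappings."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical reference (highest density) and diluted samples.
    reference = relative_abundance({"mock_A": 340, "mock_B": 330, "mock_C": 330})
    diluted = relative_abundance({"mock_A": 40, "mock_B": 30, "contam_X": 930})
    print(bray_curtis(reference, diluted))
```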

3.1.4 Sequencing precision varies by density

We next assessed the degree of agreement in taxonomic composition and OTU relative abundance (RA) between replicates at each density. Since all samples were extracted and sequenced in duplicate, we compared community composition between technical replicates. Overall, taxonomic relative abundances were generally correlated between replicates, with stronger agreement for higher RA taxa (Figure 10A). However, higher density samples exhibited a greater degree of agreement in OTU RA than lower density samples (R^2 = 0.95-0.998 for 10^7-10^6 CFU/mL vs R^2 = 0.68 for 10^1 CFU/mL). This relationship between density and reproducibility is mainly observed in taxa with a relative abundance greater than 1% (rs = 0.8, p = 0.02) and is diminished in taxa below 1% relative abundance (rs = 0.46, p = 0.3). To complement this finding, we also assessed the degree of taxonomic compositional similarity between technical replicates using Bray-Curtis dissimilarity (Figure 10B). Although BC was <0.5 for all duplicates, a density-dependent relationship was still apparent between the precision of technical replicates and the overall density of the mock community (rs = -0.93, p = 0.006).

3.1.5 Singletons represent large proportion of contaminants and are low relative abundance

We next investigated the reproducibility of the contaminants. Contaminants were classified as ‘local singletons’ if the taxon was present (RA above 0%) in only one of the technical replicates, or ‘global singletons’ if the taxon was detected only once in the whole dilution series. We identified 577 of the 1091 contaminant OTUs as global singletons. The proportion of taxa that were present in only one duplicate (local singletons) varied between samples of the same density. Local singletons accounted for approximately 67-80% of contaminants within a sample. The maximum relative abundance of a local singleton within our cohort was 1% (Figure 11). Approximately 127 of the 1091 contaminating OTUs were reproducible (present in both replicates) in at least 2 mock community densities (Table 2). These characteristics of contaminants were used to inform the post-sequencing filtration approaches employed in section 3.3.
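The three contaminant classes defined in this section can be expressed as simple set logic over the duplicate extractions at each density. The Python sketch below is one reading of those definitions, assuming each density has exactly two technical replicates represented as sets of detected contaminant OTUs; the data structure, identifiers and function name are hypothetical.

```python
from collections import Counter


def classify_contaminants(samples):
    """Classify contaminant OTUs from {density: (replicate1, replicate2)}.

    Each replicate is a set of contaminant OTU identifiers with RA above 0.
    Returns (global_singletons, local_singletons_by_density, reproducible).
    """
    detections = Counter()        # number of samples in which each OTU appears
    reproducible_at = Counter()   # number of densities with both replicates
    local_singletons = {}
    for density, (rep1, rep2) in samples.items():
        detections.update(rep1)
        detections.update(rep2)
        reproducible_at.update(rep1 & rep2)
        # Present in exactly one of the two replicates at this density.
        local_singletons[density] = rep1 ^ rep2
    global_singletons = {otu for otu, n in detections.items() if n == 1}
    reproducible = {otu for otu, n in reproducible_at.items() if n >= 2}
    return global_singletons, local_singletons, reproducible


if __name__ == "__main__":
    # Hypothetical detections at two densities.
    example = {
        "10^1": ({"a", "b"}, {"a", "c"}),
        "10^2": ({"a"}, {"a", "d"}),
    }
    print(classify_contaminants(example))
```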


[Figure 9: Bray-Curtis dissimilarity (0-1) against mock community density (10^1-10^7 CFU/mL); legend: Serial Dilution, Concentrated, DNase Treated, Both Treatments.]

Figure 9. Bray-Curtis dissimilarity between mock community samples and the highest density mock community (10^7 CFU/mL). There is a density-dependent relationship in the similarity of taxonomic composition to the highest density sample (10^7 CFU/mL). As the density decreases, samples become taxonomically dissimilar to the starting mock community (rs = -0.93, p = 0.002).


[Figure 10: A, replicate 2 relative abundance against replicate 1 relative abundance (log scale, 0.0001-1); B, pairwise Bray-Curtis dissimilarity (0-0.4) against density (10^1-10^7 CFU/mL); legend: Serial Dilution, Concentrated, DNase Treated, Both Treatments.]

Figure 10. Reproducibility of technical replicates of the serially diluted mock community samples. A- Relative abundance of OTUs in replicate 1 vs replicate 2. Precision of OTUs between technical replicates is best for taxa of higher relative abundance and decreases for taxa of lower relative abundance. B- Bray-Curtis dissimilarity between technical replicates. There is a density-dependent relationship in the similarity of microbial composition between technical replicates (rs = -0.93, p = 0.006).


[Figure 11: relative abundance (log scale, 0.000001-1) of All Contaminants, Local Singletons, Global Singletons and Non-singleton Contaminants in replicates 1 and 2 at each density. Colour legend: 10^1 CFU/mL = red, 10^2 CFU/mL = orange, 10^3 CFU/mL = yellow, 10^4 CFU/mL = green, 10^5 CFU/mL = aqua, 10^6 CFU/mL = blue, 10^7 CFU/mL = purple.]

Figure 11. Relative abundance of all contaminants in mock community samples. All contaminants represent all the contaminants at that density. Local singletons represent non-mock OTUs that were only present (>0% RA) in one technical replicate. Global singletons represent contaminants that were only present once in the whole data set. Non-singleton contaminants are non-mock community OTUs that were present in both technical replicates. Local and global singletons represent the low relative abundance taxa (<1%). Reproducible contaminants (non-singleton) represent a much higher relative abundance in samples.


Table 2. Contaminants that were reproducible within more than one density of mock community.

Columns: Phylum, Operational taxonomic unit

Actinobacteria: Actinomycetales, Dietzia, Microbacterium, Micrococcus luteus, Nocardia coeliaca, Propionibacterium acnes

Bacteroidetes: Bacteroides, Porphyromonadaceae, Chryseobacterium hispanicum, Flavobacterium, Flavobacterium rakeshii

Firmicutes: Exiguobacterium mexicanum, Staphylococcus, Enterococcus, Lactobacillus, Streptococcus, Lachnospiraceae, Clostridium IV

Proteobacteria: Alphaproteobacteria, Brevundimonas, Brevundimonas mediterranea, Rhizobiales, Bosea robiniae, Bradyrhizobium, Ochrobactrum grignonense, Pseudochrobactrum saccharolyticum, Rhizobium, Rhizobium cellulosilyticum, Rhizobium grahamii, Paracoccus, Paracoccus marcusii, Roseomonas aestuarii, Rhodospirillaceae, Novosphingobium, Sphingobium, Sphingomonas, Sphingomonas aestuarii, Alcaligenaceae, Achromobacter anxifer, Alcaligenes faecalis subsp. parafaecalis, Tepidimonas, Comamonadaceae, Comamonas, Herbaspirillum chlorophenolicum, Massilia, Massilia aurea, Massilia namucuonensis, Massilia niabensis, Massilia norwichensis, Shewanella xiamenensis, Rheinheimera soli, Enterobacteriaceae, Escherichia fergusonii, Acinetobacter, Acinetobacter beijerinckii, Acinetobacter indicus, Alkanindiges illinoisensis, Pseudomonadaceae, Pseudomonas, Pseudomonas psychrotolerans, Pseudomonas tremae, Pseudomonas xanthomarina, Pseudomonas zeshuii, Pseudomonas zhaodongensis, Luteimonas abyssi, Stenotrophomonas, Pseudomonas beteli, Stenotrophomonas rhizophila

Other: Bacteria


3.2 Pre-sequencing treatments

We sought to improve the sequencing signal:noise ratio and reproducibility through pre-sequencing treatments of the BALF samples and DNA. “Signal” is defined as the mock community taxa we want to observe, and contaminants within the sample are considered “noise”. We selected two methods and assessed their effects on sequencing outcomes across a range of sample densities: concentration using Amicon Ultra-15 centrifugal filter units with a 30 kDa filter (MilliporeSigma, Darmstadt, Germany), and DNase treatment using DNase I from Invitrogen (Thermo Fisher Scientific, Waltham, Massachusetts, United States).

3.2.1 Sample concentration increases sample density while DNase treatment decreases sample density

We performed pre-sequencing treatments on the full range of mock community samples and identified differences in microbial density after treatment. 16S rRNA qPCR was used to assess the microbial density of each sample before and after treatment (Figure 12). Concentrating increased sample density: we observed a median decrease in cycle threshold (Ct) of 2.06 (0.08-4.00) cycles. In contrast, DNase treatment decreased the amount of template DNA in the samples: we identified a median increase in Ct of 1.5 (0.08-3.5) cycles. The largest changes after treatment (median 3.0 ± 0.5) were identified in samples that had a starting density between 10^5 and 10^7 CFU/mL. Minimal changes in Ct values (median 1.0 ± 1) were observed in mock community samples with a very low starting density below 10^3 CFU/mL.
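A change in Ct can be translated into an approximate fold change in template quantity. The Python sketch below assumes ideal amplification, in which template doubles every cycle (100% efficiency); that assumption, the function name and the `efficiency` parameter are illustrative and are not taken from the qPCR analysis reported here.

```python
def fold_change_from_delta_ct(delta_ct, efficiency=1.0):
    """Approximate fold change in template from a change in Ct.

    delta_ct is Ct after treatment minus Ct before treatment, so a negative
    value means more template. efficiency=1.0 assumes perfect doubling per
    cycle, which real assays only approximate.
    """
    return (1.0 + efficiency) ** (-delta_ct)


if __name__ == "__main__":
    # Median Ct changes reported above, under the perfect-doubling assumption.
    print(fold_change_from_delta_ct(-2.06))  # concentration: Ct decreased
    print(fold_change_from_delta_ct(1.5))    # DNase treatment: Ct increased
```

Under this assumption, a 2.06-cycle decrease corresponds to roughly a 4.2-fold increase in template and a 1.5-cycle increase to roughly a 2.8-fold decrease; lower amplification efficiency would reduce both estimates.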

3.2.2 Sequencing accuracy improved in low density samples that are concentrated but not in those that were DNase treated

We assessed the changes in the signal to noise ratio after pre-sequencing treatment (concentration and DNase) in mock community samples ranging from 10^3 to 10^5 CFU/mL. Figure 13 outlines the signal to noise ratio after each treatment. All treatment groups worsened the signal to noise ratio of the mock community at a sample density of 10^3 CFU/mL, decreasing the relative abundance of the mock community OTUs by 20-31%. Concentrating samples improved the signal to noise ratio at sample densities greater than 10^3 CFU/mL, increasing the mock community relative abundance by 25-37%. DNase treatment increased the signal to noise ratio only at the 10^5 CFU/mL mock community density, where it increased the relative abundance of mock community taxa by 9%. The combination of both treatments had effects most similar to DNase treatment alone.

In addition, we performed beta diversity analysis to compare the taxonomic composition of the post-treatment samples to the most representative mock community at the highest density (10^7 CFU/mL, Figure 14). We observed varying effects on taxonomic similarity to the 10^7 CFU/mL mock community by treatment and sample density. At an input density of 10^3 CFU/mL, all treatments had little effect on Bray-Curtis dissimilarity, although communities had a high baseline dissimilarity (BC > 0.5) to the 10^7 CFU/mL sample. At the 10^4 CFU/mL mock community density, we observed a decreased Bray-Curtis dissimilarity after concentration (BC = 0.2) in comparison to the other treatments (BC = 0.6 for DNase treated and BC = 0.5 for both treatments), meaning that concentrating the samples increases the similarity to the composition of the highest density sample. At a sample density of 10^5 CFU/mL, all treatments had a small effect on both the signal of mock community taxa and the composition, as the Bray-Curtis dissimilarity after treatment was between 0.25 and 0.35.

3.2.3 Pre-sequencing concentration of moderately dense BALF sample increases density in comparison to DNase treatment

To validate our analysis of the mock community, we performed the same pre-sequencing treatments on a BALF sample obtained from an LTR with an input microbial density of 10^5 16S rDNA copies/mL, representing a mid-range density sample. Within our untreated BALF sample, we observed four taxa (Mycoplasma salivarium, Enterobacteriaceae, Campylobacter, Corynebacterium tuberculostearicum) that represented a mean total relative abundance of 78% of the community (Figure 15), consistent with the clinical microbiology report of ‘commensal flora’. The technical replicates of the BALF extraction showed reproducibility (overlap) of taxa that represent 98% of the total relative abundance.

We performed all the pre-sequencing treatments on the BALF sample to observe the changes in density, composition of the lung microbial community and potential removal of contaminants. Due to the limited amount of BALF, we were only able to perform a 6X concentration, at which the absolute microbial density increased by 6-fold as measured by 16S qPCR (mean 2.15 cycle threshold difference). The four most abundant taxa within this community had a non-statistically significant increase in relative abundance after concentration (p>0.05, increase in total RA of 3%). Twelve of fourteen low relative abundance taxa (RA <0.01%) had a statistically significant decrease in relative abundance. DNase treatment decreased the absolute density of the BALF sample by approximately one-fold. Many of the statistically significant (p<0.05) changes in taxa after DNase treatment were observed in low relative abundance taxa (0.03-0.0009% of sequences). Of the four most abundant taxa within this community, three had a non-statistically significant decrease in relative abundance after DNase treatment (mean 2 ±1% decrease in RA). Mycoplasma salivarium, the most highly abundant OTU, increased in relative abundance after DNase treatment, which may be due to the taxon being present as viable whole cells within the sample.88 Lastly, using both treatments on the BALF sample increased the absolute microbial density of the sample by one-fold overall. Similar to the earlier treatments, low relative abundance taxa (RA<0.03%) showed a statistically significant decrease in relative abundance after treatment (p<0.05) and the four highest relative abundance taxa had a non-statistically significant increase (mean 0.5%) in relative abundance.
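
The link between a shift in cycle threshold and a fold change in template can be sketched as follows. This is an idealized illustration that assumes template doubles every cycle (efficiency of 1); the function name and the efficiency parameter are hypothetical and are not taken from the assay used here, which would normally be calibrated against a standard curve.

```python
def fold_change_from_delta_ct(ct_before, ct_after, efficiency=1.0):
    """Estimate fold change in template from a shift in cycle threshold.

    A lower Ct after treatment means more template. `efficiency` is the
    assumed per-cycle amplification efficiency (1.0 = perfect doubling).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return (1 + efficiency) ** (ct_before - ct_after)


# Under perfect doubling, a 1-cycle decrease in Ct is a 2-fold increase in
# template and a 1-cycle increase in Ct is a 2-fold decrease.
print(fold_change_from_delta_ct(26.0, 25.0))  # 2.0
print(fold_change_from_delta_ct(25.0, 26.0))  # 0.5
```

Under this idealized assumption a 2.15-cycle decrease would correspond to roughly a 4.4-fold increase, so the fold changes reported in the text presumably rest on the assay's own calibration rather than on this simplification.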

Pre-sequencing treatments have variable effects on the signal to noise ratio depending on density. Concentration gave the best results, increasing the overall density of samples and the signal to noise ratio in samples over 10^3 CFU/mL. With DNase treatment these effects are not as pronounced due to the overall decrease in absolute abundance and mock community taxa being susceptible to treatment. Notably, we observed a threshold below which neither treatment was effective at increasing the signal (samples with an input density <10^3 CFU/mL).

[Figure 12 plot. Y-axis: difference in Ct. X-axis: starting cycle threshold (Ct), from 33-35 at the lowest density to 17-19 at the highest. Legend: bar color indicates starting density (10^1 CFU/mL = red, 10^2 CFU/mL = orange, 10^3 CFU/mL = yellow, 10^4 CFU/mL = green, 10^5 CFU/mL = aqua, 10^6 CFU/mL = blue, 10^7 CFU/mL = purple); full fill = concentrated, no fill = DNase treated, dotted = both treatments.]

Figure 12. Quantitative results of pre-sequencing treatment on mock community samples using 16S rRNA qPCR. The various colors represent the different mock community starting densities. A solid bar represents samples that were concentrated, a no fill bar represents samples that were DNase treated and a dotted bar represents samples that underwent both treatments. The difference in cycle threshold (Ct) is the change from the starting Ct after treatment (decrease in Ct = increase in absolute microbial density, increase in Ct = decrease in absolute microbial density). The largest differences occurred within high density samples (10^4-10^7 CFU/mL) and with concentration. In low density samples (10^1-10^2 CFU/mL) the difference after treatment was minimal, possibly due to the samples being at the threshold of detection.

[Figure 13 plot: stacked bars of relative abundance (0-1.0) of mock community taxa and contaminants for samples at 10^3, 10^4 and 10^5 CFU/mL under four conditions: no treatment, concentrated, DNase treated and both treatments.]

Figure 13. Change in relative abundance of mock taxa and contaminants after pre-sequencing treatment

Mock and contaminant relative abundance is the cumulative relative abundance of the taxa classified in each group. Concentrating samples increased the relative abundance (RA) of the mock community members in samples with a density of 10^4-10^5 CFU/mL. DNase treatment, in contrast, decreased the RA of the mock community taxa in 10^3-10^4 CFU/mL density samples. Utilizing both treatments mimicked the relationship observed in the DNase treated samples.

[Figure 14 plot: Bray-Curtis dissimilarity (0-1.0) versus mock community density (10^1-10^7 CFU/mL) for the serial dilution, concentrated, DNase treated and both-treatment samples.]

Figure 14. Bray-Curtis dissimilarity between treated samples and the highest density mock community sample. Bray-Curtis dissimilarity after concentration (red), DNase treatment (green) and both treatments (purple). Only mock community concentrations of 10^3-10^5 CFU/mL underwent sequencing after pre-sequencing treatment. The higher the Bray-Curtis dissimilarity value, the more divergent the taxonomic composition of the communities. Concentrating increased the similarity of low density samples >10^3 CFU/mL more effectively than DNase treatment or both treatments.

[Figure 15 plot: stacked bars of relative abundance (left Y-axis, 0-1.0) with 16S qPCR cycle threshold overlaid (right Y-axis, 24-28). Taxa in legend: Mycoplasma salivarium, Enterobacteriaceae, Campylobacter, Corynebacterium tuberculostearicum, Staphylococcus, Pseudomonas aeruginosa, Serratia proteamaculans, Parvimonas micra, Burkholderia, Corynebacterium, Stenotrophomonas maltophilia, Lactobacillus fermentum, Escherichia fergusonii, Listeria welshimeri, Pseudomonas, Actinomyces odontolyticus, Other.]

Figure 15. Histogram of the taxonomic composition of the BALF samples after each pre-sequencing treatment.

Both technical replicates of each treatment of the moderately dense BALF sample are shown. Relative abundance of taxa is shown on the left Y-axis and cycle threshold (Ct) after 16S rRNA qPCR is shown on the right Y-axis; the black dots and lines correspond to the Ct value for each sample and connect technical duplicates, respectively. All treatments retain over 90% of taxa, although concentration increases absolute density and retains more sequences in comparison to DNase treatment.

3.3 Post sequencing removal of contaminants

3.3.1 Introduce work flow of removal of contaminants from section 1.2

There are several potential limitations of using pre-sequencing sample manipulation to improve sequencing accuracy and precision. These treatments add expense, potentially introduce bias and did not improve sequencing accuracy of the highest and lowest density samples. In addition, they require a large overall volume of sample in order to undergo any treatment. We therefore aimed to develop strategies to improve sequencing accuracy through the identification and removal of taxa bioinformatically (post-sequencing). We generated a set of procedures for filtration of sequences from samples based on our observations of the contaminant characteristics from our mock community samples (Figure 16). We used two broad sets of approaches: filtration methods that can be applied on a single replicate, and filtration methods that require sequencing two replicates. Single replicate-compatible methods were designed based on the following common characteristics of contaminants in low density samples: (1) relative abundance is density dependent; (2) they are of low relative abundance (<1% of reads); (3) they are represented in negative controls. Replicate-based methods were based on the observation that OTUs represented in only a single replicate (which includes both ‘local’ and ‘global’ singletons as defined above) were all contaminants.

With this information, we performed the following steps on the serially diluted mock community. First, using a rarefied table (to ensure equal sequencing depth between samples), we removed contaminants that represent less than 1% of the total sequencing relative abundance in all samples. Second, using the maximum number of sequences for a taxon within a negative control, we subtracted this value from all the mock community samples to remove any potential contaminants that may have been introduced during the pre-sequencing/sequencing process. Lastly, we removed any local and global singletons, to increase reproducibility. Importantly, removal of contaminants was performed at the OTU level, as collapsing to a higher taxonomic level such as genus collapses related OTUs representing both contaminants and signal taxa (such as non-aeruginosa Pseudomonas and P. aeruginosa) into a single taxon.
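
The three filtration steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the scripts used in this study: the function name, the dictionary-based table layout and the sample names are hypothetical, and singleton removal is approximated by keeping only OTUs detected in every technical replicate of a sample.

```python
def filter_contaminants(samples, negative_controls, replicate_groups,
                        min_max_ra=0.01):
    """Apply three filtration steps to a rarefied OTU table.

    samples:           {sample_id: {otu_id: read_count}} (rarefied counts)
    negative_controls: {control_id: {otu_id: read_count}}
    replicate_groups:  {group_id: [sample_id, ...]} technical replicates
    """
    # Step 1: drop OTUs whose relative abundance never reaches the
    # threshold (default 1%) in any sample.
    keep = set()
    for counts in samples.values():
        depth = sum(counts.values())
        for otu, n in counts.items():
            if depth > 0 and n / depth >= min_max_ra:
                keep.add(otu)
    filtered = {s: {o: n for o, n in c.items() if o in keep}
                for s, c in samples.items()}

    # Step 2: subtract, per OTU, the maximum count seen in any negative
    # control; OTUs whose count falls to zero or below are removed.
    max_neg = {}
    for counts in negative_controls.values():
        for otu, n in counts.items():
            max_neg[otu] = max(max_neg.get(otu, 0), n)
    filtered = {s: {o: n - max_neg.get(o, 0) for o, n in c.items()
                    if n - max_neg.get(o, 0) > 0}
                for s, c in filtered.items()}

    # Step 3 (assumption): keep only OTUs detected in every technical
    # replicate of a group. An OTU seen in a single sample overall is
    # also removed by this rule.
    for members in replicate_groups.values():
        if len(members) < 2:
            raise ValueError("each group needs at least two replicates")
        shared = set.intersection(*(set(filtered[m]) for m in members))
        for m in members:
            filtered[m] = {o: n for o, n in filtered[m].items() if o in shared}
    return filtered


# Hypothetical toy data: two technical replicates and two negative controls.
samples = {
    "mock_rep1": {"otu_signal": 900, "otu_reagent": 80, "otu_rare": 5, "otu_once": 15},
    "mock_rep2": {"otu_signal": 920, "otu_reagent": 75, "otu_rare": 5},
}
negatives = {
    "neg1": {"otu_reagent": 100},
    "neg2": {"otu_reagent": 60, "otu_signal": 10},
}
result = filter_contaminants(samples, negatives,
                             {"mock": ["mock_rep1", "mock_rep2"]})
print(result)
# {'mock_rep1': {'otu_signal': 890}, 'mock_rep2': {'otu_signal': 910}}
```

In the toy data, the rare OTU is removed at step 1, the reagent OTU at step 2 and the OTU seen in only one replicate at step 3. Keeping the table at the OTU level, as the text recommends, is what allows the reagent OTU to be removed without touching a related signal OTU.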

3.3.2 Community composition with the post-sequencing filtration

Post-sequencing filtration was first performed on the serially diluted mock community members to ascertain the effects on a known community across the full range of bacterial densities.

Figure 17 shows the stepwise effects of each filtering strategy. A mean of 10% (±9%) of the contaminant reads were removed with the first filtration step, with a large number of contaminants retained at moderate (10^5-10^3 CFU/mL) and low (10^2-10^1 CFU/mL) input densities. Filtering the negative reagent contaminants contributed to the largest removal of sequences, and had especially pronounced effects on low density samples (which retained <10% of reads). Filtration requiring replicate sequencing further improved sequence attribution to mock community membership, but this effect was quantitatively small compared to the prior step. Overall, our post-filtration methods retained a median of 66% (range 0-89%) of the total sequences. There was a density dependent (rs=0.97, p<0.001) relationship between the sequences retained and the overall microbial density of the serially diluted mock community. Within the filtered dataset, we retained 16 OTUs that were non-mock community members, with a mean total relative abundance of 11% (Table 3). The lowest density samples (10^2-10^1 CFU/mL) retained 0% of the sequences after filtration.
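
The rank correlation (rs) reported above is a Spearman coefficient, which can be computed as the Pearson correlation of the ranked values. The sketch below is a self-contained illustration with hypothetical function names and made-up input values; it does not reproduce the statistical software or the data used in this study, and it does not compute a p-value.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical illustration only: log10 input density versus the fraction
# of reads retained after filtration (values invented for this example).
log_density = [1, 2, 3, 4, 5, 6, 7]
fraction_retained = [0.0, 0.0, 0.2, 0.5, 0.6, 0.8, 0.7]
print(round(spearman_rs(log_density, fraction_retained), 2))
```

Ties, such as several samples retaining no reads, receive the mean of the ranks they span, which is the usual convention for Spearman correlation.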

3.3.3 Removal of contaminants from mock community members improves the beta diversity similarity

We also assessed the changes in precision after filtration of the mock community members using beta diversity metrics. Principal coordinate analysis (PCoA) plots were used to visualize the difference in composition and similarity between all the mock community densities. We observed that, with filtration, mock community serial dilutions (≥10^3 CFU/mL) cluster more closely to each other (Figure 18) in comparison to unfiltered data. Interestingly, low density communities (10^1 and 10^2 CFU/mL) increased in distance (mean BC 0.7) from the negative reagent controls and high/moderate density mock communities, which may be due to the lack of sequences retained post filtration. Thus, this increase in similarity between the mock community samples after filtration underlines the impact of contaminants on microbial composition across a range of mock community densities.

Rarefy samples to the lowest number of sequences within a sample (minimum 1500)

1. Remove sequences with a maximum RA of 1% (low relative abundance taxa; contaminants that lack precision)

2. Remove the maximum number of sequences of a taxon in negative controls (large overlap of taxa in negative controls)

Requires duplicate sequencing:

3. Remove local and global singletons (increase reproducibility)

Figure 16. Post-sequencing methods for removal of contaminants
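
The rarefaction step at the top of this workflow can be illustrated with a short Python sketch. It assumes random subsampling of reads without replacement to a fixed depth; the function name, the seed parameter and the toy counts are hypothetical, and this is not the QIIME routine used in this study.

```python
import random


def rarefy(counts, depth, seed=None):
    """Randomly subsample `depth` reads without replacement from one sample.

    counts: {otu_id: read_count}. Returns None when the sample has fewer
    than `depth` reads, so the caller can exclude it.
    """
    total = sum(counts.values())
    if total < depth:
        return None
    rng = random.Random(seed)
    # Expand the counts into one label per read, then draw without replacement.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    rarefied = {}
    for otu in rng.sample(reads, depth):
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied


# Hypothetical sample rarefied to the minimum depth named in Figure 16.
bal_sample = {"otu_a": 3000, "otu_b": 1000, "otu_c": 12}
rarefied = rarefy(bal_sample, 1500, seed=1)
print(sum(rarefied.values()))  # 1500
```

Rarefying every sample to the same depth makes the 1% relative abundance threshold in step 1 correspond to the same number of reads in each sample.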

[Figure 17 plot: stacked bars of relative abundance (0-1.0) for mock community samples at 10^1-10^7 CFU/mL, shown in four panels: Unfiltered; Remove <1%; Remove <1% + Negative CTR; Remove <1% + Negative CTR + Singletons (requires duplicate sequencing). The legend lists the mock community taxa and contaminant OTUs.]

Figure 17. Histogram of mock community post-sequencing filtration.

Step 1: samples highlighted in blue represent the rarefied unfiltered data set. Step 2 (red) is the removal of low abundance taxa. Step 3 (green) represents taxa retained after the removal of the maximum number of sequences of a given OTU within negative controls. Step 4 (grey) is the removal of both local and global singletons (retaining reproducible taxa).

Table 3. Contaminants that were retained post-filtration in mock communities

Contaminant                 Maximum relative abundance after filtration
Serratia proteamaculans     0.113170732
Staphylococcus              0.047804878
Bacillus                    0.04
Mycoplasma salivarium       0.04
Lactobacillus fermentum     0.037317073
Escherichia fergusonii      0.033902439
Enterobacteriaceae          0.030731707
Listeria welshimeri         0.030243902
Enterobacteriaceae          0.029512195
Enterococcus                0.020487805
Enterobacteriaceae          0.019268293
Enterobacteriaceae          0.017317073
Campylobacter               0.01
Paracoccus                  0.008780488
Luteimonas abyssi           0.005365854
Massilia aurea              0.002195122

[Figure 18 plot: PCoA panels A and B. Legend: mock community ≥10^3 CFU/mL, mock community <10^3 CFU/mL, PCR control, DNA elution control, DNA extraction control.]

Figure 18. Principal Coordinate Analysis (PCoA) plots of mock community samples before and after post-sequencing filtration.

A- Before filtration, B- After filtration. Mock community samples ≥10^3 CFU/mL (orange) cluster more closely after filtration due to the increase in taxonomic similarity. Mock community samples <10^3 CFU/mL (green) in the unfiltered data set (panel A) cluster towards reagent controls (blue, red), which is due to the similarity in composition. After filtration (panel B) of low density mock communities (<10^3 CFU/mL), however, taxonomic dissimilarity increases, which is observed as an increase in distance between all samples due to the lack of retained sequences.

3.4 Removal of contaminants from lung transplant cohort

Having established the density-dependence of contamination with a mock community, we next assessed the impact of BALF 16S rRNA gene sequencing contaminants using samples from the post-transplant allograft and the effect of application of sequence filtration strategies on microbiota-host relationships.

3.4.1 Patient demographics

We obtained samples from a retrospective study of the allograft microbiota from 27 lung transplant recipients, outlined in Table 4. These samples were selected because: (1) they were the first non-infected BALF sample after CLAD incidence, and (2) the samples had been extracted for DNA and sequenced over at least two runs, to assess reproducibility. Fifteen of the BALF samples were collected from LTR with CLAD (11 BOS, 4 RAS) and twelve samples were collected from lung transplant recipient controls with no CLAD diagnosis at least four years post-transplantation. We selected patients that did not have infection at the time of BAL, because of the strong relationship between infection, microbial community composition and microbial density observed in prior reports. The age of the patients ranged from 16 to 70 with a median of 49 years of age. Most cultures contained only commensal bacteria or no growth, and acid-fast bacilli (AFB) testing was negative for 96% of the samples.

3.4.2 Increase in cytokines in RAS patients in comparison to BOS and CLAD in the absence of infection

To evaluate the immune response within the allograft, we performed multiplex analysis of cytokines within our BALF samples. We observed that RAS patients overall had a statistically significant (P<0.05) higher cytokine response in 50% of the cytokines tested (Figure 20). These include pro-inflammatory cytokines (IL-10, CCL5, CCL2, CXCL9) and growth-stimulating factors such as PDGF and VEGF that have been attributed to inflammatory response within the allograft microbiota in previous studies.67,69,71,105 Interestingly, RAGE, a receptor for advanced glycation end products that has been reported to be important in the activation of lung ischemia-reperfusion injury and development of CLAD, was observed to be significantly lower in abundance within RAS patients compared to controls and BOS.106

3.4.3 Variable microbial density in allograft microbiota of LTR without infection

To begin the analysis of the lung transplant cohort, we performed 16S rRNA qPCR to assess the absolute microbial density of the BALF samples from non-infected lung transplant recipients (Figure 19). The absolute microbial density varied from 10^2 to 10^6 gene copies/mL across the cohort. We observed a non-statistically significant (p=0.2) trend toward lower absolute density of the allograft microbiota in control LTR (mean 4.25 log copies/mL, 2-4 log copies/mL) in comparison to RAS (mean 4.6 log copies/mL, 4-5.5 log copies/mL) and BOS patients (mean 4.75 log copies/mL, 4-5 log copies/mL). Within CLAD patients, we did not observe a difference in microbial density between phenotypes, yet this may be due to the limited sample size of RAS patients.

3.4.4 Analysis of unfiltered samples reveals low density and high contribution of contaminants to community composition.

We first assessed the taxonomic composition and diversity of the lung transplant microbiota pre-filtration. Alpha diversity, the measure of diversity within the samples, was calculated using QIIME (Version 1.9.1) for Shannon diversity, Berger-Parker dominance, Chao1 and number of unique OTUs. We observed an overall mean Shannon diversity index (richness and evenness within a sample) of 5.6 ±1.003 within our cohort. Differences between groups were not statistically significant (p>0.05), yet CLAD LTR with the RAS phenotype had a lower mean Shannon diversity index (mean 5.04, 3.5-6.5) in comparison to control LTR (mean 5.9, 3.8-6.5). Berger-Parker dominance (the relative abundance of the most dominant taxon) was calculated to be a mean of 0.151 ±0.11. One of the most dominant taxa within these allograft microbiota communities was Prevotella (maximum relative abundance 41%), which is consistent with findings from Bernasconi et al. showing that Prevotella was negatively associated with infections in lung transplant recipients.69 In addition, Acinetobacter and Veillonella dispar were also observed in high relative abundance (>10%) in 26% and 23% of the samples, respectively. The number of unique OTUs and Chao1 (estimated richness with unlimited sampling) had means of 154.8±37 and 189.5±45, respectively.
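
The four alpha diversity metrics named above can be written out explicitly. The sketch below is illustrative only and is not the QIIME code used in this study: the function names are hypothetical, the Shannon index is assumed to use a base-2 logarithm, and Chao1 is assumed to be the bias-corrected form.

```python
import math


def observed_otus(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for n in counts if n > 0)


def shannon(counts, base=2):
    """Shannon diversity H = -sum(p_i * log(p_i)) over observed OTUs.

    The base-2 logarithm is an assumption made for this sketch.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("sample contains no reads")
    return -sum((n / total) * math.log(n / total, base)
                for n in counts if n > 0)


def berger_parker(counts):
    """Relative abundance of the single most abundant OTU."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample contains no reads")
    return max(counts) / total


def chao1(counts):
    """Bias-corrected Chao1 richness estimate (assumed variant).

    S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)), where F1 and F2 are the numbers
    of OTUs observed exactly once and exactly twice.
    """
    f1 = sum(1 for n in counts if n == 1)
    f2 = sum(1 for n in counts if n == 2)
    return observed_otus(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical counts: one dominant OTU, three singletons, one doubleton.
counts = [10, 5, 1, 1, 1, 2]
print(observed_otus(counts))   # 6
print(berger_parker(counts))   # 0.5
print(chao1(counts))           # 7.5
print(round(shannon(counts), 2))
```

Shannon diversity and Berger-Parker dominance depend on how reads are distributed among OTUs, whereas observed OTUs and Chao1 depend only on which OTUs are present and how many are rare. This distinction is relevant to how the metrics respond when low abundance reads are filtered out.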

We performed SourceTracker2 analysis to determine what proportion of sequences within the BALF samples were attributed to reagents. This analysis demonstrated that, for a large number of BALF samples, most taxa were attributable to the DNA extraction control as the source. We report that 65% of BALF samples had >50% of their sequencing reads originate from DNA extraction reagents (Figure 21).

3.4.5 Post-sequencing removal of contaminants from LTR retain <21% of sequences

Using the filtration method from the mock community cohort, we applied this post-sequencing identification and removal of contaminants to our lung transplant cohort. Figure 22 shows the histograms of the removal of contaminants at each step within the process. After filtration, we retained a mean of 21% of the total relative abundance within our samples but with significant between-sample variability (0.002-87% sequence retention). The first step, removal of low abundance taxa (<1%), removed a mean of 17% of the total sequences within each of these samples. The second step, the subtraction of sequences of taxa found within the negative DNA reagent controls, removed the largest proportion of contaminants, as seen in the mock community: a mean of 56% of the sequences. The taxa that were in high abundance (>10% of sequences) within the negative controls include Acinetobacter and Bacteroides (maximum RA 30% and 13% respectively). The final step removed local singletons (OTUs present in only one technical replicate), which comprised a mean of 7% of the total sequences. Prior to filtration we observed that technical replicates had a mean Bray-Curtis dissimilarity of 0.6 ±0.22. We performed a group significance analysis (in QIIME 1) and determined that stool-associated taxa such as Bacteroides and Prevotella copri differed significantly in relative abundance between sequencing runs. Post filtration, the Bray-Curtis dissimilarity values decreased to a mean of 0.33±0.22. Importantly, in run 1, BALF samples were multiplexed with stool samples from an unrelated experiment.

3.4.6 The effect of sequence filtration on microbiota-inflammation correlations

We assessed the impact of sequence filtration on interpretation of microbiota-host relationships by generating correlation matrices for microbial diversity indices and BALF cytokine levels before and after sequence filtration. Post-sequencing filtration of the BALF samples had a major impact on the diversity of the samples and on the relationship between diversity and inflammatory response within LTR. Within the pre-filtered data set, we observed an inverse relationship between alpha diversity measures (Shannon diversity, number of OTUs and Chao1) and cytokines in this LTR cohort (Figure 23A). This was consistent with many other studies showing that dysbiosis, which can cause a decrease in diversity, is correlated with an immune response within the allograft.69 Prior to applying post-sequencing filtration, Shannon diversity and Berger-Parker dominance were inversely correlated with BALF cytokine levels. These correlations became positive after filtration (Figure 23B, Figure 24). This change in relationship is due to the samples that retain more than 50% of the original sequences (7 out of 27 in our cohort). Correlations between cytokines and taxonomic richness remained unchanged.

Table 4. Patient characteristics

Characteristics                                   BOS (n=11)   RAS (n=4)    Control (n=12)   P value
Recipient age at transplant, years (mean ± SD)    37.5±14.6    43.3±14.6    54.6±12.4        NS
Male (%)                                          72.7         75           50
Primary diagnosis (%)
  IPF                                             27.3         50           30
  COPD                                            18.2         0            10
  CF                                              27.3         25           0
  Other                                           27.3         25           60
Time of CLAD onset (months)                       42±30        39±30        -                NS
Time of BAL (months)                              46±32        50±45        38±42            NS

[Figure 19 plot: 16S rDNA log (copy number/mL), 0-8, for the Control, BOS and RAS groups.]

Figure 19. Absolute microbial density within the allograft of lung transplant recipients with CLAD in the absence of infection

Controls have a non-statistically significant (P>0.05) lower absolute density in comparison to BOS and RAS patients.

[Figure 20 plot: nine panels (IL10, CXCL10, PDGF BB, CCL5, CCL2, VEGF C, IL17A, GCSF, RAGE), each comparing Controls, BOS and RAS.]

Figure 20. Cytokine expression for each group of LTR. RAS patients exhibit statistically significant (P<0.05) differences in comparison to control and BOS patients.

[Figure 21 plot: estimated proportion of each BALF sample's sequences attributed to DNA extraction controls, PCR reagent controls or an unknown source, shown per sample (labelled by sample ID) and grouped as Controls, BOS and RAS.]

Figure 21. SourceTracker2 analysis of the lung transplant recipient allograft microbiota. SourceTracker2 uses a Bayesian model approach to estimate the proportion of sequences attributed to a source community (reagents). Duplicated samples shown here were sequenced over different sequencing runs.

[Figure 22 plot: stacked bars of relative abundance (0-1.0) per BALF sample, grouped as Controls, BOS and RAS. Grey segments show sequences removed at each step (<1% RA, negative controls, local/global singletons); colored segments show retained taxa, labelled at order, family, genus or species level (for example g__Streptococcus, g__Prevotella, s__Veillonella_dispar).]

Figure 22. Histogram of post-sequencing filtration of lung transplant BALF samples. Dark grey bars represent step 1, the removal of OTUs <1% in relative abundance. Medium grey bars represent step 2, the removal of sequences found in DNA extraction controls. Light grey bars represent OTUs that were local or global singletons and were removed from the data set. Colored bars represent sequences retained after post-sequencing removal.

[Figure 23 plot: two heat maps (A, B) of Spearman correlation (rs, scale -1.0 to 1.0) between BALF cytokine levels (rows) and alpha diversity metrics (columns: Shannon, Observed OTUs, Chao1, Berger-Parker).]

Figure 23. Heat map showing inversion of correlations between diversity and cytokines post filtration in LTR. A- Unfiltered, B- Filtered. Green represents a positive correlation between cytokines and diversity. Red represents a negative correlation between cytokines and diversity.

[Figure 24 plot: four scatter plots of filtered correlation (Y-axis) against unfiltered correlation (X-axis), both from -1.0 to 1.0. Shannon diversity: rs = -0.4344, p<0.001. Berger-Parker: rs = -0.5154, p<0.001. Observed OTUs: rs = 0.6242, p<0.001. Chao1: rs = 0.5854, p<0.001.]

Figure 24. Inverted relationship between alpha diversity metrics and cytokines after filtration of LTR. Relationships for Shannon diversity (richness and evenness) and Berger-Parker dominance (RA of the most abundant OTU) were both inverted after filtration. Observed OTUs and Chao1 (number of OTUs predicted with infinite sampling) maintained their relationship after filtration.

3.5 Threshold of 16S rRNA sequencing detection

We suggest that <10^3 CFU/mL represents a threshold of detection of 16S rRNA sequencing. Within the mock community data set, we observed that there was a density dependent relationship between contaminants and the signal identified through 16S rRNA sequencing. In samples at the lowest densities (<10^3 CFU/mL), contaminants represented >90% of the sequences. A plateau in qPCR cycle threshold was identified between samples below this input density (Figure 5B), and we observed a minimal impact of pre-sequencing treatments and post-sequencing filtering on community composition at 10^3 CFU/mL. Importantly, within the sequenced lung transplant cohort we identified that a large proportion of samples were dominated by contaminants in similar abundance to reagent controls. After the post-sequencing removal of contaminants only 9 out of 27 LTR retained ≥40% of their sequences in at least 1 replicate. Coupled with the absolute density values identified through 16S rRNA qPCR, we observed that many of the samples (20 samples) were at or below the threshold of detection. The identification of this threshold is critical for the appropriate interpretation of BALF community structure. Samples at or below the threshold of detection would have inaccurate assessments of within-sample diversity and inter-sample taxonomic similarity without appropriate sequence filtration.

Overall, we have characterized the effects of contamination on mock community and low density BALF samples after culture independent analysis. Pre-sequencing treatments improved the signal:noise ratio in a density dependent manner for samples with an input density of >10^3 CFU/mL. Post-sequencing removal of contamination was effective in providing reproducible community composition for samples above the lower limit of detection. Most importantly, appropriate filtration of samples in this cohort effectively reversed observed alpha-diversity/cytokine correlations for Shannon diversity and Berger-Parker dominance, highlighting the potential impact of this phenomenon on the interpretation of host-microbiota relationships.

Chapter 4 Discussion

Our analysis of mock communities across a range of dilutions and BALF samples from LTR in the absence of infection supports the following conclusions:

1) Contaminants are a significant proportion of sequencing reads within the mock community samples at moderate and low input bacterial densities;

2) Within mock community and BALF samples, we observed that contaminants are introduced mainly through laboratory reagents used for DNA extraction, library preparation and sequencing;

3) Samples with input bacterial densities of <10^3 CFU/mL were below the limit of detection of 16S rRNA gene sequencing;

4) BALF samples collected from LTR in the absence of infection commonly have input densities at or below the threshold for detection/accurate sequencing, and;

5) Analysis of low-density samples that does not appropriately identify and account for thresholds of detection and contamination can result in incorrect interpretation of host-microbial relationships within LTR.

Through observations of mock community samples and BALFs over a range of densities, we were able to identify sample management strategies and a sequence filtration approach to minimize the contribution of contaminants to community structure. We systematically performed pre-sequencing treatments such as concentration and DNase treatment to improve the signal:noise ratio within a sample. Overall, concentration was the most efficient pre-sequencing method for improving the mock community signal within samples with a density greater than 10³ CFU/mL.
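As a concrete reading of the signal:noise ratio used throughout, the sketch below treats reads assigned to known mock community taxa as signal and all other reads as noise. The taxon labels and read counts are placeholders chosen for illustration, and the function is a hypothetical helper, not the code used to generate the results.

```python
def signal_to_noise(read_counts, signal_taxa):
    """Ratio of reads from expected (signal) taxa to reads from all other taxa.

    read_counts: dict mapping taxon name -> number of reads.
    signal_taxa: set of taxon names known to be in the mock community.
    Returns float('inf') when no non-signal reads are present.
    """
    signal = sum(n for taxon, n in read_counts.items() if taxon in signal_taxa)
    noise = sum(n for taxon, n in read_counts.items() if taxon not in signal_taxa)
    if noise == 0:
        return float("inf")
    return signal / noise


# Placeholder three-member mock community (names are illustrative only).
mock_members = {"mock_1", "mock_2", "mock_3"}

# Hypothetical read counts before and after a pre-sequencing treatment.
raw = {"mock_1": 300, "mock_2": 300, "mock_3": 300, "contaminant_1": 100}
concentrated = {"mock_1": 330, "mock_2": 330, "mock_3": 330, "contaminant_1": 10}

print(signal_to_noise(raw, mock_members))           # 9.0
print(signal_to_noise(concentrated, mock_members))  # 99.0
```

A treatment that improves signal:noise in this sense shifts reads from the non-mock taxa to the mock taxa, as in the second hypothetical sample.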

4.1 Study limitations

This study has several limitations with respect to both the synthetic mock communities created and the retrospective collection of BALF samples from lung transplant recipients. The mock community was used as a positive control to mimic taxa that may reside within the allograft lung.

This synthetic community excluded many taxa found in BALF and other lung samples, such as anaerobic species44,60, and the performance of pre-sequencing treatments on these taxa may not be uniform.90 For instance, Nadkarni et al. suggest increasing the concentration of universal primers by 3-fold for 16S rRNA qPCR in samples mainly containing anaerobic species.90 Therefore, without these species included in our mock community, we do not have a complete understanding of the precision and accuracy of culture-independent analysis of complex community structures in the lung.

All of the mock community members have been identified as contaminants at the genus level.81,107–110 Although the ratios of signal:noise across the serially diluted community are similar to what was observed as signal in previous mock community studies, sequences that may have been introduced by negative reagent controls could have been attributed to signal taxa (e.g. non-aeruginosa Pseudomonas species), which can affect the relative abundance of both mock community taxa and contaminants.81

In addition, our conclusions regarding the pre-sequencing treatments are limited due to the partial analysis of the range of mock communities sequenced and the inability to perform these treatments on a variety of BALF samples. The conclusions regarding the pre-sequencing treatments on the mock community cohort encompass the range of BALF samples within the lung, yet a robust analysis of all mock community and BALF sample densities would allow us to make more definitive conclusions regarding which treatment is best and how treatments affect the signal:noise ratio.

One concern that may arise with our post-sequencing methodology is the removal of potentially “real” low-abundance taxa. The mock community was composed of an even distribution of three taxa, which were therefore in high relative abundance. We have suggested the removal of low-abundance taxa (<1%) within our post-sequencing decontamination due to the large presence of singletons and non-mock community taxa within our mock community cohort.

Yet in studies of the lung microbiota of cystic fibrosis patients, low relative abundance taxa are used to discriminate between patients dominated by pathogens.111 Therefore, it may be that the lung microbiota is composed of complex communities, and the removal of less abundant contaminants may also remove actual organisms that reside within, and distinguish between, the allograft microbiota in LTR. However, the lack of reproducibility between technical replicates calls into question the importance of these taxa and the reliability with which they represent true lung residents.

Our last filtration step is the retention of taxa present in both technical replicates, which necessitates sequencing in duplicate. We observed that sequencing technical replicates at high density (e.g. >10⁴ CFU/mL) minimally changes the taxonomic composition. However, sequencing samples in duplicate may be a very costly and time-consuming step with an insignificant improvement in sequencing reproducibility. Next, we have identified sequencing-run batch effects that occur when sequencing low-density samples across different sequencing runs. This limitation can cause artefactual variance in the relative abundance of both signal and noise taxa. Batch-to-batch variability has been a common phenomenon in culture-independent analysis of microbiota that can cause misleading results about the composition of the sample.112,113
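The two filtration steps discussed in this section, removal of taxa below 1% relative abundance and retention of taxa found in both technical replicates, can be sketched as follows. This is a simplified illustration under stated assumptions: the order of the steps, the per-replicate application of the 1% cut-off, and the function names are our own choices for the example, and other parts of the decontamination workflow (such as the use of negative controls) are not shown. The read counts are hypothetical.

```python
def relative_abundance(counts):
    """Convert a dict of taxon -> read count into taxon -> proportion."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


def filter_replicates(rep_a, rep_b, min_abundance=0.01):
    """Keep taxa at or above min_abundance in BOTH technical replicates.

    rep_a, rep_b: dicts of taxon -> read count for the two replicates.
    Returns a sorted list of retained taxon names.
    """
    kept_a = {t for t, p in relative_abundance(rep_a).items() if p >= min_abundance}
    kept_b = {t for t, p in relative_abundance(rep_b).items() if p >= min_abundance}
    return sorted(kept_a & kept_b)


# Hypothetical technical replicates of one sample (illustration only).
rep_a = {"taxon_A": 600, "taxon_B": 395, "taxon_C": 5}   # taxon_C is 0.5%
rep_b = {"taxon_A": 580, "taxon_B": 400, "taxon_D": 20}  # taxon_D absent from rep_a

print(filter_replicates(rep_a, rep_b))  # ['taxon_A', 'taxon_B']
```

In this example taxon_C is dropped by the abundance cut-off and taxon_D is dropped because it is not reproduced across replicates, which mirrors the concern that a genuine but rare or inconsistently detected organism would be removed along with contaminants.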


Although we identified a ‘threshold of detection’ for sequencing, we did not assess the adequacy of BALF for sampling the entire lung. BALF does not represent a sample of the distal airway/alveolus, and low microbial density in a BALF sample does not exclude the possibility of high-density bacterial communities sequestered in the distal airway. Future studies comparing BALF densities and regional lung tissue microbial communities in LTR (e.g. in explanted lungs) would be informative.

Finally, we did not assess the physiologic significance of low-density communities in a model system or in the human lung. It remains possible that very low-density communities are still able to elicit responses within the allograft that may precipitate clinically significant events.

Therefore, our results regarding the allograft microbiota and CLAD onset are purely observational.

4.2 Comparison to published reports

Our results validate and extend important observations regarding contaminants in low-density samples and the allograft microbiota. Low-density samples, such as bronchoalveolar lavage fluid, have proven to be difficult to analyze.32,80,114 The difficulty of correctly identifying microbial communities in low-biomass samples by culture-independent analysis, given the amplification of ubiquitous contaminants, is not unique to BALF.32,43,115,116 Contaminating DNA has been reported in many studies within laboratory reagents such as DNA extraction kits and PCR negative controls.81,117–121

Many of the contaminant taxa in published reports are soil or water-dwelling organisms or taxa known to colonize the host in areas that include the skin and gut.81,122,123 In our study, a large proportion of contaminants were Acinetobacter, a contaminant that was ubiquitous not only in our mock community samples but also in our BALF cohort. This organism is prevalent in environmental samples and sequencing reagents.81,117–121


The importance of appropriate analysis and interpretation of low-density samples has been observed in other culture-independent analyses of human disease. Misinterpretation due to contamination in low-density samples has plagued microbiome studies and has been observed in high-profile studies, such as the discovery of XMRV in chronic fatigue syndrome patients and that of ancient human DNA and pathogens.124,125 Even within the healthy lung microbiota, contaminants have been observed in bronchoscopes and controls at relative abundances similar to those within BALF samples.32 The microbiota, especially at sites within the allograft of lung transplant recipients, has been of interest to many researchers due to the potential host-microbial relationship and disease outcomes in this population.52,54,55,69 Some studies have observed higher diversity in samples from LTR who were asymptomatic or without dysbiosis.58,69 Yet, with the lack of a ‘gold standard’ for processing low-density samples, it may be difficult to attribute these results to microbiota within the lung rather than to the consequences of contamination.

Within the current allograft microbiota literature, 10 reports have been published that utilized BALF to observe the microbiota in LTR.30,44,54,55,57,58,60,66,69,126 Based on our observations, we would interpret these studies with the following caveats:

1. Studies in which a high proportion of (or all) subjects had a high microbial load (e.g. those with pneumonia) are less likely to be influenced by density-dependent artifact;

2. Studies in which control samples were used to identify and remove contaminants are less likely to be affected by sequencing artifact;

3. Sequence data for low-density BALF may have been incorrectly interpreted. First, low-density samples may have inflated diversity due to the inappropriate inclusion of contaminants in diversity calculations. In a study performed by Borewicz et al., the LTR allograft microbiota was more diverse than that of non-transplanted healthy controls.60 This contrasts with other studies that have observed lower diversity in the lungs of transplant patients than in non-transplanted patients.44,55,58 Second, some taxa present in low-density samples may be interpreted as biologically significant when in fact they represent contaminants. In a study performed by Dickson et al., differences were observed in the microbial composition of asymptomatic vs symptomatic patients. Specifically, the presence of P. fluorescens was identified in LTR that were asymptomatic and correlated with an increase in microbial diversity and a lower microbial load in comparison to LTR with P. aeruginosa as the dominant Pseudomonas species. While P. aeruginosa was readily culturable when detected by sequencing, they were unable to culture P. fluorescens in a large proportion of samples where it was the dominant Pseudomonas OTU (1/15 subjects). P. fluorescens is a fairly ubiquitous species, found within the environment and more recently in humans, yet the inability to observe viable organisms may be due to the presence of sequencing artifacts within the samples.127,128

We compared our analysis of the cytokine-microbial relationships in LTR to the data presented in the study performed by Bernasconi et al.69 With regard to the composition of the microbiota, we observed that the samples that retained sequences had a high proportion of Prevotella spp., which was consistent with the Bernasconi et al. cohort, in which Prevotella was the most frequent dominating taxon. We observed that RAS patients had higher expression of pro-remodeling cytokines (such as platelet-derived growth factor, PDGF, and matrix metalloproteinase 8, MMP8) in comparison to BOS and control patients. RAS patients had a non-statistically significant higher microbial density, and all the patients that retained sequences had Prevotella at a relative abundance >10%. This is consistent with the observations reported within the Bernasconi et al. cohort. They discovered that patients with high proportions of Prevotella (>20% RA) within the allograft microbiota were correlated with a remodeling immune profile. Conversely, LTR with a low proportion of Prevotella (<20% RA) were correlated with a pro-inflammatory response. Prevotella has been observed to be a part of the ‘core’ microbiota.69 Yet, in the case of LTR, it is hypothesized that acute inflammatory responses (mainly observed within the first 6 months after transplant) cause the host to be more susceptible to pro-remodeling activation by less stimulatory taxa such as Prevotella.69

4.3 Future Directions

This study explores the potential for optimizing the analysis of low-density samples using both pre- and post-sequencing methodologies. We have planned future analyses to further optimize analysis of these samples. Stämmler et al. conducted a study of serially diluted stool samples with added exogenous, viable ‘spike-in’ bacteria to calibrate the sequencing reads of signal taxa.129 They developed this technique to normalize absolute abundances of taxa across various sample microbial loads. This spike-in calibration allowed precise estimation of the absolute reads of endogenous bacteria and identified changes in relative abundance at the phylum level in human stool samples of patients undergoing allogeneic stem cell transplantation. Not only do we want to calibrate sequences with this technique in low-density BALF samples, but we would also like to utilize spike-in bacteria to observe samples at the threshold of detection.


Using the information regarding the theoretical threshold of detection (10²–10³ CFU/mL), we would like to utilize the spike-in control as an internal indicator of the proportion of actual reads in a sample and increase the accuracy of 16S rRNA detection by increasing the absolute sample density. We hypothesize that if we spike in a positive control equivalent to the microbial density of a BALF sample, spike-in bacteria will displace a proportion of contaminating taxa from reagents and the environment (Figure 25).

[Figure 25: stacked bar chart. Y-axis: relative abundance (0 to 1). X-axis: raw sample and spike-in sample at 10², 10⁴ and 10⁶ CFU/mL. Bars show the mock community, the B. halodurans spike-in at 10⁴ CFU/mL, and contaminants.]

Figure 25. Proposed theory of spike-in controls with mock community samples. Black bars represent mock community members, the grey bar represents the spike-in control, B. halodurans, at 10⁴ CFU/mL, and the colored bars represent contaminants. The spike-in is at the concentration of the theoretical “threshold of detection”, therefore samples higher than this


If the sample were at the threshold of detection, a spike-in added at the same density would represent approximately 50% of the total relative abundance. If the sample has an absolute density greater than the spike-in control, then the relative abundance of the spike-in will be less than 50%. If the sample density is less than the spike-in concentration, the relative abundance of the spike-in taxon will be greater than 50%. We hypothesize that utilizing an exogenous control at the threshold of detection for 16S rRNA sequencing in low-density samples can increase the overall density of the sample (increasing the accuracy of sequencing, as seen in the results) and overwhelm contaminants introduced downstream of sample collection.
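The arithmetic behind this expectation can be sketched under a deliberately simple model: reads are assumed to be proportional to input density, and contaminant reads, 16S rRNA gene copy number and extraction bias are ignored. The function name and the densities used are illustrative assumptions (the 10⁴ CFU/mL spike-in follows Figure 25), not measured values.

```python
def expected_spike_in_fraction(sample_density, spike_density):
    """Expected relative abundance of the spike-in taxon.

    Assumes reads are proportional to input density (CFU/mL) and that
    no contaminant reads are present; both are simplifying assumptions.
    """
    total = sample_density + spike_density
    if total <= 0:
        raise ValueError("combined density must be positive")
    return spike_density / total


SPIKE = 1e4  # spike-in density in CFU/mL, as drawn in Figure 25

for sample in (1e2, 1e4, 1e6):
    fraction = expected_spike_in_fraction(sample, SPIKE)
    print(f"sample {sample:.0e} CFU/mL -> spike-in fraction {fraction:.3f}")
```

Under these assumptions a sample at the spike-in density yields a spike-in fraction of 0.5, a denser sample yields less than 0.5, and a sparser sample yields more than 0.5. Departures from these expectations in real data would reflect the contaminant load and the biases that this sketch leaves out.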

We will create two mock community samples over a range of densities (10⁸–10¹ CFU/mL) and spike in Bacillus halodurans, an environmental species that has had its whole genome sequenced for species-level identification and is not a common laboratory contaminant. We will also perform pre-sequencing treatments that include concentration, DNase and both treatments on the mock community samples to confirm initial observations of the changes in the composition of the microbiome over the full range of densities.

Utilizing the sequence-optimization strategies we have developed will allow us to optimize the analysis of BALF samples with the intent of analyzing the microbiota of lung transplant recipients.

Future work will also include a large retrospective study of the allograft microbiota and compositional changes over time with respect to CLAD development. To date, we have BALF samples from 230 lung transplant recipients (90 CLAD, 120 LTR controls) over 4 time points (3, 6, 9 and 12 months post-lung transplantation) to elucidate changes in the allograft prior to CLAD diagnosis. This study will include patients with BOS (n ≈ 65) and RAS (n ≈ 25), to draw conclusions about the taxonomic differences over time in the allograft microbiota between these two phenotypes. This will be one of the largest cohorts to date to analyze the allograft microbiota comparing both phenotypes of CLAD. These optimization techniques will be important for correctly identifying the composition of the allograft microbiota in order to draw valid conclusions without the confounding element of contaminants within this cohort.

Lastly, in order to make conclusions regarding causal relationships between the microbiota and the allograft in lung transplant recipients, mechanistic models must be employed. Mouse models can mimic the relationships observed in humans and allow us to assess potential causal relationships of the microbiota in CLAD. Although multiple models of lung transplantation exist, no model recapitulates all of the pathological features of CLAD.130,131 Bronchial epithelial cells represent an in vitro alternative to mouse models to assess limited mechanistic questions.132,133

Potential biomarkers of CLAD from observational studies can be utilized within this mechanistic model to provide information at cell-level resolution on host-cell recognition of disease-associated microbial communities. The potential to limit confounding elements, such as the presence of antibiotics or the genetic differences between murine and human hosts, within an in vitro study can allow researchers to obtain concrete evidence regarding the allograft microbiota and host response within LTR.

The results outlined in this thesis highlight the importance of a systematic technical approach in analyzing BALF for the analysis of the allograft microbiota. We have shown the effects of contaminants on a range of mock community samples and the consequences of pre- and post-sequencing optimization of low-density samples such as BALF. Developing these optimization strategies will allow researchers to draw more robust and reliable conclusions regarding the allograft microbiota in lung transplant recipients.


References

1. Ursell, L. K., Metcalf, J. L., Parfrey, L. W. & Knight, R. Defining the human microbiome. Nutr. Rev. 70, S38-44 (2012).

2. NIH HMP Working Group, T. N. H. W. et al. The NIH Human Microbiome Project. Genome Res. 19, 2317–23 (2009).

3. Gest Van Leeuwenhoek, H. & Gest, H. The discovery of microorganisms by Robert Hooke and Antoni van Leeuwenhoek, fellows of the royal society. Notes Rec. R. Soc. L. 187–201 (2004). doi:10.1098/rsnr.2004.0055

4. Grice, E. A. & Segre, J. A. The skin microbiome. Nat. Rev. Microbiol. 9, 244–53 (2011).

5. Bassis, C. M., Tang, A. L., Young, V. B. & Pynnonen, M. A. The nasal cavity microbiota of healthy adults. Microbiome 2, 27 (2014).

6. Ma, B., Forney, L. J. & Ravel, J. Vaginal microbiome: rethinking health and disease. Annu. Rev. Microbiol. 66, 371–89 (2012).

7. Guarner, F. & Malagelada, J.-R. Gut flora in health and disease. Lancet 361, 512–519 (2003).

8. Cho, I. & Blaser, M. J. The human microbiome: at the interface of health and disease. Nat. Rev. Genet. 13, 260–270 (2012).

9. Human Microbiome Project Consortium, B. A. et al. A framework for human microbiome research. Nature 486, 215–21 (2012).

10. Huttenhower, C. et al. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012).

11. Turnbaugh, P. J. et al. The human microbiome project. Nature 449, 804–10 (2007).

12. Proctor, L. M. The Human Microbiome Project in 2011 and Beyond. Cell Host Microbe 10, 287–291 (2011).


13. Lloyd-Price, J., Abu-Ali, G. & Huttenhower, C. The healthy human microbiome. Genome Med. 8, 51 (2016).

14. Benson, A. K. et al. Individuality in gut microbiota composition is a complex polygenic trait shaped by multiple environmental and host genetic factors. Proc. Natl. Acad. Sci. U. S. A. 107, 18933–8 (2010).

15. Ley, R. E., Turnbaugh, P. J., Klein, S. & Gordon, J. I. Microbial ecology: Human gut microbes associated with obesity. Nature 444, 1022–1023 (2006).

16. Turnbaugh, P. J. et al. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature 444, 1027–1031 (2006).

17. Ley, R. E. et al. Obesity alters gut microbial ecology. Proc. Natl. Acad. Sci. U. S. A. 102, 11070–5 (2005).

18. Greenblum, S., Turnbaugh, P. J. & Borenstein, E. Metagenomic systems biology of the human gut microbiome reveals topological shifts associated with obesity and inflammatory bowel disease. Proc. Natl. Acad. Sci. U. S. A. 109, 594–9 (2012).

19. Alakomi, H. L. et al. Lactic acid permeabilizes gram-negative bacteria by disrupting the outer membrane. Appl. Environ. Microbiol. 66, 2001–5 (2000).

20. Lai, S. K. et al. Human immunodeficiency virus type 1 is trapped by acidic but not by neutralized human cervicovaginal mucus. J. Virol. 83, 11196–200 (2009).

21. Witkin, S. S., Alvi, S., Bongiovanni, A. M., Linhares, I. M. & Ledger, W. J. Lactic acid stimulates interleukin-23 production by peripheral blood mononuclear cells exposed to bacterial lipopolysaccharide. FEMS Immunol. Med. Microbiol. 61, 153–158 (2011).

22. Bull, M. J. & Plummer, N. T. Part 1: The Human Gut Microbiome in Health and Disease. Integr. Med. (Encinitas). 13, 17–22 (2014).

23. Björkstén, B., Sepp, E., Julge, K., Voor, T. & Mikelsaar, M. Allergy development and the intestinal microflora during the first year of life. J. Allergy Clin. Immunol. 108, 516–20 (2001).


24. Belkaid, Y. & Hand, T. W. Role of the microbiota in immunity and inflammation. Cell 157, 121–41 (2014).

25. Thorpe, J. E., Baughman, R. P., Frame, P. T., Wesseler, T. A. & Staneck, J. L. Bronchoalveolar Lavage for Diagnosing Acute Bacterial Pneumonia. J. Infect. Dis. 155, 855–860 (1987).

26. Baughman, R. P., Thorpe, J. E., Staneck, J., Rashkin, M. & Frame, P. T. Use of the Protected Specimen Brush in Patients with Endotracheal or Tracheostomy Tubes. Chest 91, 233–236 (1987).

27. Dickson, R. P., Erb-Downward, J. R., Martinez, F. J. & Huffnagle, G. B. The Microbiome and the Respiratory Tract. Annu. Rev. Physiol. 78, 481–504 (2016).

28. Dickson, R. P. & Huffnagle, G. B. The Lung Microbiome: New Principles for Respiratory Bacteriology in Health and Disease. PLoS Pathog. 11, 1–5 (2015).

29. Lighthart, B. Mini-review of the concentration variations found in the alfresco atmospheric bacterial populations. Aerobiologia (Bologna). 16, 7–16 (2000).

30. Dickson, R. P. et al. Analysis of culture-dependent versus culture-independent techniques for identification of bacteria in clinically obtained bronchoalveolar lavage fluid. J. Clin. Microbiol. 52, 3605–13 (2014).

31. Venkataraman, A. et al. Application of a neutral community model to assess structuring of the human lung microbiome. MBio 6, (2015).

32. Charlson, E. S. et al. Topographical continuity of bacterial populations in the healthy human respiratory tract. Am. J. Respir. Crit. Care Med. 184, 957–63 (2011).

33. Bassis, C. M. et al. Analysis of the upper respiratory tract microbiotas as the source of the lung and gastric microbiotas in healthy individuals. MBio 6, e00037-15 (2015).

34. Beck, J. M., Young, V. B. & Huffnagle, G. B. The microbiome of the lung. Transl. Res. 160, 258–66 (2012).


35. Dickson, R. P., Erb-Downward, J. R. & Huffnagle, G. B. Towards an ecology of the lung: new conceptual models of pulmonary microbiology and pneumonia pathogenesis. Lancet. Respir. Med. 2, 238–46 (2014).

36. Erb-Downward, J. R. et al. Analysis of the Lung Microbiome in the ‘Healthy’ Smoker and in COPD. PLoS One 6, e16384 (2011).

37. Hilty, M. et al. Disordered Microbial Communities in Asthmatic Airways. PLoS One 5, e8578 (2010).

38. Dickson, R. P. et al. Spatial Variation in the Healthy Human Lung Microbiome and the Adapted Island Model of Lung Biogeography. Ann. Am. Thorac. Soc. 12, 821–30 (2015).

39. Willner, D. et al. Spatial distribution of microbial communities in the cystic fibrosis lung. ISME J. 6, 471–4 (2012).

40. Beck, J. M. et al. Multicenter Comparison of Lung and Oral Microbiomes of HIV-infected and HIV-uninfected Individuals. Am. J. Respir. Crit. Care Med. 192, 1335–44 (2015).

41. Morris, A. et al. Comparison of the respiratory microbiome in healthy nonsmokers and smokers. Am. J. Respir. Crit. Care Med. 187, 1067–75 (2013).

42. Segal, L. N. et al. Enrichment of lung microbiome with supraglottic taxa is associated with increased pulmonary inflammation. Microbiome 1, 1–10 (2013).

43. Charlson, E. S. et al. Assessing Bacterial Populations in the Lung by Replicate Analysis of Samples from the Upper and Lower Respiratory Tracts. PLoS One 7, e42786 (2012).

44. Charlson, E. S. et al. Lung-enriched organisms and aberrant bacterial and fungal respiratory microbiota after lung transplant. Am. J. Respir. Crit. Care Med. 186, 536–545 (2012).

45. Marsland, B. J. & Gollwitzer, E. S. Host–microorganism interactions in lung diseases. Nat. Rev. Immunol. 14, 827–835 (2014).

46. Dickson, R. P., Erb-Downward, J. R. & Huffnagle, G. B. The role of the bacterial microbiome in lung disease. Expert Rev. Respir. Med. 7, 245–57 (2013).

47. Rogers, G. B. et al. Characterization of bacterial community diversity in cystic fibrosis lung infections by use of 16s ribosomal DNA terminal restriction fragment length polymorphism profiling. J. Clin. Microbiol. 42, 5176–83 (2004).

48. Royer, P.-J., Olivera-Botella, G., Koutsokera, A., Aubert, J.-D. & Bernasconi, E. Chronic Lung Allograft Dysfunction: A Systematic Review of Mechanisms. Transplantation 100, 1803–1814 (2016).

49. Toronto Lung Transplant Group. Unilateral Lung Transplantation for Pulmonary Fibrosis. N. Engl. J. Med. 314, 1140–1145 (1986).

50. Chambers, D. C. et al. The Registry of the International Society for Heart and Lung Transplantation: Thirty-fourth Adult Lung And Heart-Lung Transplantation Report—2017; Focus Theme: Allograft ischemic time. J. Heart Lung Transplant. 36, 1048–1059 (2017).

51. HUSAIN, A. N. et al. Analysis of Risk Factors for the Development of Bronchiolitis Obliterans Syndrome. Am. J. Respir. Crit. Care Med. 159, 829–833 (1999).

52. Borthwick, L. A. et al. Pseudomonas aeruginosa Induced Airway Epithelial Injury Drives Fibroblast Activation: A Mechanism in Chronic Lung Allograft Dysfunction. Am. J. Transplant. 16, 1751–1765 (2016).

53. Willner, D. et al. 68 Distinct Microbial Signatures of Healthy and Failing Lung Allografts. J. Hear. Lung Transplant. 31, S32 (2012).

54. Botha, P. et al. Pseudomonas aeruginosa colonization of the allograft after lung transplantation and the risk of bronchiolitis obliterans syndrome. Transplantation 85, 771–774 (2008).

55. Willner, D. L. et al. Reestablishment of Recipient-associated Microbiota in the Lung Allograft Is Linked to Reduced Risk of Bronchiolitis Obliterans Syndrome. Am. J. Respir. Crit. Care Med. 187, 640–647 (2013).


56. Dickson, R. P. et al. The Lung Microbiota Is Distinct Following Lung Transplantation And Is Associated With Bronchiolitis Obliterans Syndrome. a104. Lung Transplant. Clin. Transl. Adv. A2209–A2209 (2013). doi:10.1164/ajrccm-conference.2013.187.1_MeetingAbstracts.A2209

57. Shankar, J. et al. Looking Beyond Respiratory Cultures: Microbiome-Cytokine Signatures of Bacterial Pneumonia and Tracheobronchitis in Lung Transplant Recipients. Am. J. Transplant. 16, 1766–1778 (2016).

58. Dickson, R. P. et al. Changes in the lung microbiome following lung transplantation include the emergence of two distinct pseudomonas species with distinct clinical associations. PLoS One 9, (2014).

59. Syed, S. A. et al. Reemergence of Lower-Airway Microbiota in Lung Transplant Patients with Cystic Fibrosis. Ann. Am. Thorac. Soc. 13, 2132–2142 (2016).

60. Borewicz, K. et al. Longitudinal analysis of the lung microbiome in lung transplantation. FEMS Microbiol. Lett. 339, 57–65 (2013).

61. Luna, R. A. et al. Characterization of the Lung Microbiome in Pediatric Lung Transplant Recipients. J. Hear. Lung Transplant. 32, S291 (2013).

62. Sato, M. Chronic lung allograft dysfunction after lung transplantation: The moving target. Gen. Thorac. Cardiovasc. Surg. 61, 67–78 (2013).

63. Geert M. Verleden, Ganesh Raghu, Keith C. Meyer, Allan R. Glanville, and P. C. A NEW CLASSIFICATION SYSTEM FOR CHRONIC LUNG ALLOGRAFT DYSFUNCTION. ISHLT at

64. Sato, M. et al. Restrictive allograft syndrome (RAS): A novel form of chronic lung allograft dysfunction. J. Hear. Lung Transplant. 30, 735–742 (2011).

65. Hayes, D. et al. Gram-Negative Infection and Bronchiectasis in Lung Transplant Recipients with Bronchiolitis Obliterans Syndrome. Thorac Cardiovasc Surg 61, 240–245 (2013).


66. Vos, R. et al. Pseudomonal airway colonisation: risk factor for bronchiolitis obliterans syndrome after lung transplantation? Eur. Respir. J. 31, 1037–45 (2008).

67. Gregson, A. L. et al. Interaction between Pseudomonas and CXC chemokines increases risk of bronchiolitis obliterans syndrome and death in lung transplantation. Am. J. Respir. Crit. Care Med. 187, 518–26 (2013).

68. Willner, D. et al. Comparison of DNA Extraction Methods for Microbial Community Profiling with an Application to Pediatric Bronchoalveolar Lavage Samples. PLoS One 7, e34605 (2012).

69. Bernasconi, E. et al. Airway microbiota determines innate cell inflammatory or tissue remodeling profiles in lung transplantation. Am. J. Respir. Crit. Care Med. 194, 1252–1263 (2016).

70. Verleden, S. E. et al. Bronchiolitis Obliterans Syndrome and Restrictive Allograft Syndrome. Transplant. J. 95, 1167–1172 (2013).

71. Kuehnel, M., Maegel, L., Vogel-Claussen, J., Robertus, J. L. & Jonigk, D. Airway remodelling in the transplanted lung. Cell Tissue Res. 367, 663–675 (2017).

72. Dickson, R. B. Z. E.-D. J. Lung Microbiota Diversity Is Decreased Among Lung Transplant Recipients. AJRCCM C14, A3781–A3781 (2014).

73. Twigg, H. L. et al. Effect of Advanced HIV Infection on the Respiratory Microbiome. Am. J. Respir. Crit. Care Med. 194, 226–35 (2016).

74. Wang, Z. et al. Lung microbiome dynamics in COPD exacerbations. Eur. Respir. J. 47, 1082–1092 (2016).

75. Collard, H. R. et al. Acute Exacerbations of Idiopathic Pulmonary Fibrosis. Am. J. Respir. Crit. Care Med. 176, 636–643 (2007).

76. Morgan, X. C. & Huttenhower, C. Chapter 12: Human microbiome analysis. PLoS Comput. Biol. 8, e1002808 (2012).


77. Clarridge III, J. E. Impact of 16S rRNA gene sequence analysis for identification of bacteria on clinical microbiology and infectious diseases. Clin. Microbiol. Rev. 17, 840–62 (2004).

78. Callbeck, C. M. et al. Improving PCR efficiency for accurate quantification of 16S rRNA genes. J. Microbiol. Methods 93, 148–152 (2013).

79. Talmadge E King, Jr, M. Basic principles and technique of bronchoalveolar lavage - UpToDate. at

80. Aho, V. T. E. et al. The microbiome of the human lower airways: a next generation sequencing perspective. World Allergy Organ. J. 8, 1–13 (2015).

81. Salter, S. J. et al. Reagent and laboratory contamination can critically impact sequence-based microbiome analyses. BMC Biol. 12, 1–12 (2014).

82. Sun, G. et al. Efficient purification and concentration of viruses from a large body of high turbidity seawater. MethodsX 1, 197–206 (2014).

83. Goyal, S. M. & Gerba, C. P. Simple Method for Concentration of Bacteria from Large Volumes of Tap Water. Appl. Environ. Microbiol. 40, 912–916 (1980).

84. Francy, D. S. et al. Comparison of filters for concentrating microbial indicators and pathogens in lake water samples. Appl. Environ. Microbiol. 79, 1342–52 (2013).

85. Lazarevic, V., Gaïa, N., Girard, M. & Schrenzel, J. Decontamination of 16S rRNA gene amplicon sequence datasets based on bacterial load assessment by qPCR. BMC Microbiol. 16, 1–8 (2016).

86. Corless, C. E. et al. Contamination and sensitivity issues with a real-time universal 16S rRNA PCR. J. Clin. Microbiol. 38, 1747–52 (2000).

87. Klaschik, S., Lehmann, L. E., Raadts, A., Hoeft, A. & Stuber, F. Comparison of Different Decontamination Methods for Reagents to Detect Low Concentrations of Bacterial 16S DNA by Real-Time-PCR. Mol. Biotechnol. 22, 231–42 (2002).


88. Pezzulo, A. A. et al. Abundant DNase I-sensitive bacterial DNA in healthy porcine lungs and its implications for the lung microbiome. Appl. Environ. Microbiol. 79, 5936–41 (2013).

89. Mcinnes, P. Manual of Procedures for Human Microbiome Project Core Microbiome Sampling Protocol A HMP Protocol # 07-001. (2010). at

90. Nadkarni, M., Martin, F. E., Jacques, N. A. & Hunter, N. Determination of bacterial load by real-time PCR using a broad range (universal) probe and primer set. Microbiology 148, 257–266 (2002).

91. Caporaso, J. G. et al. Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. ISME J. 6, 1621–1624 (2012).

92. Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461 (2010).

93. Edgar, R. C. UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nat. Methods 10, 996–998 (2013).

94. Edgar, R. C. UNOISE2: improved error-correction for Illumina 16S and ITS amplicon sequencing. bioRxiv 81257 (2016). doi:10.1101/081257

95. Edgar, R. C. SINTAX: a simple non-Bayesian taxonomy classifier for 16S and ITS sequences. bioRxiv (2016). doi:10.1101/074161

96. Wang, Q., Garrity, G. M., Tiedje, J. M. & Cole, J. R. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl. Environ. Microbiol. 73, 5261–7 (2007).

97. Caporaso, J. G. et al. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7, 335–6 (2010).

98. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree: Computing Large Minimum Evolution Trees with Profiles instead of a Distance Matrix. Mol. Biol. Evol. 26, 1641–1650 (2009).

99. Coburn, B. et al. Lung microbiota across age and disease stage in cystic fibrosis. Sci. Rep. 5, 10241 (2015).

100. Chung, H. et al. Global and local selection acting on the pathogen Stenotrophomonas maltophilia in the human lung. Nat. Commun. 8, 14078 (2017).

101. Stover, C. K. et al. Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen. Nature 406, 959–964 (2000).

102. Lira, F. et al. Whole-genome sequence of Stenotrophomonas maltophilia D457, a clinical isolate and a model strain. J. Bacteriol. 194, 3563–4 (2012).

103. Hsueh, P.-T. et al. Genomic Sequence of Burkholderia multivorans NKI379, a Soil Bacterium That Inhibits the Growth of Burkholderia pseudomallei. Genome Announc. 3, (2015).

104. Knights, D. et al. Bayesian community-wide culture-independent microbial source tracking. Nat. Methods 8, 761–3 (2011).

105. Weigt, S. S. et al. Altered Levels of CC Chemokines During Pulmonary CMV Predict BOS and Mortality Post-Lung Transplantation. Am. J. Transplant. 8, 1512–1522 (2008).

106. Sharma, A. K. et al. Receptor for advanced glycation end products (RAGE) on iNKT cells mediates lung ischemia-reperfusion injury. Am. J. Transplant. 13, 2255–67 (2013).

107. Tanner, M. A., Goebel, B. M., Dojka, M. A. & Pace, N. R. Specific ribosomal DNA sequences from diverse environmental settings correlate with experimental contaminants. Appl. Environ. Microbiol. 64, 3110–3 (1998).

108. Grahn, N., Olofsson, M., Ellnebo-Svedlund, K., Monstein, H.-J. & Jonasson, J. Identification of mixed bacterial DNA contamination in broad-range PCR amplification of 16S rDNA V1 and V3 variable regions by pyrosequencing of cloned amplicons. FEMS Microbiol. Lett. 219, 87–91 (2003).

109. Barton, H. A., Taylor, N. M., Lubbers, B. R. & Pemberton, A. C. DNA extraction from low-biomass carbonate rock: An improved method with reduced contamination and the low-biomass contaminant database. J. Microbiol. Methods 66, 21–31 (2006).

110. Laurence, M., Hatzis, C. & Brash, D. E. Common Contaminants in Next-Generation Sequencing That Hinder Discovery of Low-Abundance Microbes. PLoS One 9, e97876 (2014).

111. Hogan, D. A. et al. Analysis of lung microbiota in bronchoalveolar lavage, protected brush and sputum samples from subjects with mild-to-moderate cystic fibrosis lung disease. PLoS One 11, (2016).

112. Leek, J. T. et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet. 11, 733–9 (2010).

113. Laursen, M. F., Dalgaard, M. D. & Bahl, M. I. Genomic GC-Content Affects the Accuracy of 16S rRNA Gene Sequencing Based Microbial Profiling due to PCR Bias. Front. Microbiol. 8, 1934 (2017).

114. Becker, J., Poroyko, V. & Bhorade, S. The lung microbiome after lung transplantation. Expert Rev. Respir. Med. 8, 221–31 (2014).

115. Harris, J. K. et al. Molecular identification of bacteria in bronchoalveolar lavage fluid from children with cystic fibrosis. Proc. Natl. Acad. Sci. U. S. A. 104, 20529–33 (2007).

116. Kim, D. et al. Optimizing methods and dodging pitfalls in microbiome research. Microbiome 5, 52 (2017).

117. Shen, H., Rogelj, S. & Kieft, T. L. Sensitive, real-time PCR detects low-levels of contamination by Legionella pneumophila in commercial reagents. Mol. Cell. Probes 20, 147–153 (2006).

118. Carroll, N. M., Adamson, P. & Okhravi, N. Elimination of bacterial DNA from Taq DNA polymerases by restriction endonuclease digestion. J. Clin. Microbiol. 37, 3402–4 (1999).

119. Maiwald, M., Ditton, H.-J., Sonntag, H.-G. & von Knebel Doeberitz, M. Characterization of contaminating DNA in Taq polymerase which occurs during amplification with a primer set for Legionella 5S ribosomal RNA. Mol. Cell. Probes 8, 11–14 (1994).

120. Nogami, T., Ohto, T., Kawaguchi, O., Zaitsu, Y. & Shoichi, S. Estimation of Bacterial Contamination in Ultrapure Water: Application of the Anti-DNA Antibody. Anal. Chem. 70, 5296–5301 (1998).

121. Kulakov, L. A., McAlister, M. B., Ogden, K. L., Larkin, M. J. & O’Hanlon, J. F. Analysis of bacteria contaminating ultrapure water in industrial systems. Appl. Environ. Microbiol. 68, 1548–55 (2002).

122. Glassing, A., Dowd, S. E., Galandiuk, S., Davis, B. & Chiodini, R. J. Inherent bacterial DNA contamination of extraction and sequencing reagents may affect interpretation of microbiota in low bacterial biomass samples. Gut Pathog. 8, (2016).

123. Jervis-Bardy, J. et al. Deriving accurate microbiota profiles from human samples with low bacterial content through post-sequencing processing of Illumina MiSeq data. Microbiome 3, 19 (2015).

124. Kearney, M. F. et al. Multiple Sources of Contamination in Samples from Patients Reported to Have XMRV Infection. PLoS One 7, e30889 (2012).

125. Roberts, C. & Ingham, S. Using ancient DNA analysis in palaeopathology: a critical analysis of published papers, with recommendations for future work. Int. J. Osteoarchaeol. 18, 600–613 (2008).

126. Beaume, M. et al. Microbial communities of conducting and respiratory zones of lung-transplanted patients. Front. Microbiol. 7, 1–11 (2016).

127. Paulsen, I. T. et al. Complete genome sequence of the plant commensal Pseudomonas fluorescens Pf-5. Nat. Biotechnol. 23, 873–878 (2005).

128. Scales, B. S., Dickson, R. P., LiPuma, J. J. & Huffnagle, G. B. Microbiology, genomics, and clinical significance of the Pseudomonas fluorescens species complex, an unappreciated colonizer of humans. Clin. Microbiol. Rev. 27, 927–48 (2014).

129. Stämmler, F. et al. Adjusting microbiome profiles for differences in microbial load by spike-in bacteria. Microbiome 4, 28 (2016).

130. Jungraithmayr, W. M., Korom, S., Hillinger, S. & Weder, W. A mouse model of orthotopic, single-lung transplantation. J. Thorac. Cardiovasc. Surg. 137, 486–491 (2009).

131. Krupnick, A. S. et al. Orthotopic mouse lung transplantation as experimental methodology to study transplant and tumor biology. Nat. Protoc. 4, 86–93 (2009).

132. Mauck, K. A. & Hosenpud, J. D. The bronchial epithelium: a potential allogeneic target for chronic rejection after lung transplantation. J. Heart Lung Transplant. 15, 709–14 (1996).

133. Suwara, M. I. et al. Mechanistic differences between phenotypes of chronic lung allograft dysfunction after lung transplantation. Transpl. Int. 27, 857–67 (2014).