
BIOMESEQ: A QUANTITATIVE APPROACH FOR THE ANALYSIS OF

ANIMAL MICROBIOMES AND ITS APPLICATION IN CHARACTERIZING

THE MICROBIAL ECOLOGY OF AVIAN SPECIES

by

Kelly Ann Mulholland

A dissertation submitted to the Faculty of the University of Delaware in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Bioinformatics and Systems Biology

Spring 2020

© 2020 Kelly A. Mulholland All Rights Reserved

BIOMESEQ: A QUANTITATIVE APPROACH FOR THE ANALYSIS OF

ANIMAL MICROBIOMES AND ITS APPLICATION IN CHARACTERIZING

THE MICROBIAL ECOLOGY OF AVIAN SPECIES

by

Kelly Ann Mulholland

Approved: ______Cathy H. Wu, Ph.D. Chair of Bioinformatics and Computational Biology

Approved: ______Mark Rieger, Ph.D. Dean of the College of Agriculture and Natural Resources

Approved: ______Douglas J. Doren, Ph.D. Interim Vice Provost for Graduate & Professional Education and Dean of the Graduate College

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.

Signed: ______Calvin L. Keeler, Jr., Ph.D. Professor in charge of dissertation

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.

Signed: ______Carl Schmidt, Ph.D. Member of dissertation committee

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.

Signed: ______Shawn Polson, Ph.D. Member of dissertation committee

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.

Signed: ______Timothy Johnson, Ph.D. Member of dissertation committee

DEDICATION

To my father for the lifetime of love and support you gave me in the short time we shared together.

To Tyler for your overflowing love, your uplifting spirit and your unwavering patience after all this time.


ACKNOWLEDGMENTS

I would like to extend my gratitude to several people for their contribution to this work. First, I give my sincere thanks to my advisor and mentor, Dr. Calvin Keeler, for his immeasurable support, advice and guidance throughout this process. His enthusiasm and dedication to his work have been very inspiring to me. I would like to thank my committee members, Dr. Shawn Polson, Dr. Carl Schmidt and Dr. Timothy Johnson, for their time and invaluable insight. Thank you to the members of the Keeler Research Group, both past and present. I would especially like to thank Monique for her many contributions to this work and Sharon for sharing her knowledge and insight over the years. I also wish to thank the many friends that I have made during my time in the Bioinformatics Student Association and the EmPOWER mentoring program that have made my time at the University of Delaware so wonderful. I would like to acknowledge the University of Delaware CANR Unique Strengths Dissertation Award and the Agriculture and Food Research Initiative Competitive Grant for the financial support to make this work possible.

I am incredibly grateful to all of my family and friends for always believing in me and for their encouragement over these past few years. This dissertation would not have been possible without them. I wish to express my most sincere gratitude to my partner, Tyler, for being my greatest supporter throughout this entire journey and for all he has done to ensure that I accomplished this goal. It was his love and belief in me that gave me the strength to continue during some of the more difficult times. Thank you to Bob for his guidance and endless support in all of my endeavors. I will always cherish our conversations and laughs over coffee and breakfast. I wish to thank Kathy, David and Kari for their generosity, advice and love over the years. Thank you to Amy for her unparalleled friendship and for making these past few years so enjoyable with our many adventures. I would also like to thank Gibbs for being such a great companion and source of happiness. I am grateful to my father for instilling his work ethic in me and for always encouraging me to achieve my goals no matter how big or small. Although we are unable to celebrate this milestone together, I know that he is immensely proud. Finally, I would like to thank my mother and sisters for all of their support.


TABLE OF CONTENTS

LIST OF TABLES ...... xi

LIST OF FIGURES ...... xiii

ABSTRACT ...... xvi

Chapter

1 INTRODUCTION AND REVIEW OF LITERATURE ...... 1

1.1 Microbiomes ...... 1

1.1.1 Symbiotic Microbial Interactions with Healthy Host and Other Microbes ...... 3

1.1.2 Dysbiotic Microbial Interactions with Diseased Host and Other Microbes ...... 6

1.2 Respiratory Microbiome ...... 7

1.2.1 Healthy Mammalian Respiratory Microbiome ...... 7

1.2.2 Mammalian Respiratory Microbiome Diseases ...... 9

1.2.3 Avian Respiratory Microbiome ...... 11

1.2.4 Multifactorial Avian Respiratory Disease Complex ...... 12

1.3 Advancement of Technology for Detection of Microorganisms ...... 14

1.3.1 Next-Generation Sequencing Technology ...... 16

1.3.2 16S Ribosomal RNA Sequencing ...... 17

1.3.3 Metagenomic Shotgun Sequencing ...... 19

1.4 Characterization of the Virome ...... 20

1.4.1 Characterizing the Virome Using a Culture-Independent Approach ...... 21

1.4.2 Challenges with Developing Comprehensive Computational Tools for Analysis of the Virome ...... 22

1.4.2.1 Quantification of the Virome ...... 23


1.4.3 Existing Culture-Independent Tools for Virome Characterization and their Limitations ...... 24

1.5 Rationale and Objectives ...... 25

REFERENCES ...... 27

2 BIOMESEQ: A TOOL FOR THE CHARACTERIZATION OF ANIMAL MICROBIOMES FROM METAGENOMIC DATA ...... 48

2.1 Summary ...... 48

2.2 Introduction ...... 49

2.3 Results ...... 52

2.3.1 Design and Development of BiomeSeq ...... 52

2.3.2 Validation of BiomeSeq ...... 54

2.3.3 A Longitudinal Study of the Microbial Ecology of a Healthy Broiler Flock ...... 58

2.3.4 A Comparison of BiomeSeq Bacterial Results to 16S rRNA Results ...... 60

2.4 Discussion ...... 60

2.5 Materials and Methods ...... 66

2.5.1 Quality Trimming and Host Decontamination ...... 67

2.5.2 Microbial Database Alignment ...... 67

2.5.3 Quantification and Output ...... 68

2.5.4 Performance Metrics ...... 69

2.5.5 A Longitudinal Study of the Microbial Ecology of a Healthy Broiler Flock ...... 71

2.5.6 Comparison of BiomeSeq Bacterial Results to 16S rRNA Results ...... 72

REFERENCES ...... 85

3 METAGENOMIC ANALYSIS OF THE RESPIRATORY MICROBIOME OF A HEALTHY BROILER FLOCK FROM HATCHING TO PROCESSING ...... 91

3.1 Summary ...... 91

3.2 Introduction ...... 92

3.3 Results ...... 95


3.3.1 Avian Respiratory Eukaryotic Viral Diversity ...... 96

3.3.2 Bacterial Diversity ...... 98

3.3.3 Bacteriophage Diversity ...... 100

3.3.4 Fungal Diversity ...... 102

3.3.5 The Avian Microbiome ...... 103

3.4 Discussion ...... 104

3.5 Materials and Methods ...... 108

3.5.1 Sample Collection ...... 108

3.5.2 Nucleic Acid Extraction and Sequencing ...... 109

3.5.3 16S rRNA Amplicon Sequencing and Analysis ...... 109

3.5.4 Eukaryotic Virus, Bacteriophage and Fungal Analysis ...... 110

REFERENCES ...... 122

4 CHARACTERIZATION OF THE RESPIRATORY MICROBIOME OF WITH RESPIRATORY DISEASE ...... 128

4.1 Summary ...... 128

4.2 Introduction ...... 129

4.3 Materials and Methods ...... 132

4.3.1 Sample Collection ...... 132

4.3.2 Nucleic Acid Extraction and Sequencing ...... 132

4.3.3 Eukaryotic Virus, Bacteriophage, Yeast and Fungal Analysis ...... 133

4.4 Results ...... 135

4.4.1 Identifying the broiler respiratory microbiome and a comparison of the respiratory virome between healthy and diseased birds ...... 135

4.4.2 Comparison of the bacterial microbiome between healthy and diseased birds ...... 136

4.4.3 Comparison of the bacteriophage and fungal microbiomes between healthy and diseased birds ...... 137

4.4.4 Microbial network analysis ...... 138

4.5 Discussion ...... 139

REFERENCES ...... 155


5 A COMPARISON OF TRACHEA, CHOANAL CLEFT AND CLOACAL MICROBIOTA OF A HEALTHY TURKEY FLOCK ...... 160

5.1 Introduction ...... 160

5.2 Materials and Methods ...... 162

5.2.1 Sample Collection ...... 162

5.2.2 Nucleic Acid Extraction and Sequencing ...... 162

5.2.3 Eukaryotic Virus, , Bacteriophage and Fungal Analysis ...... 163

5.3 Results ...... 165

5.3.1 Quality Trimming and Decontamination of Sequencing Reads ...... 165

5.3.2 Diversity of Eukaryotic Viruses in the Choanal Cleft, Cloaca and Trachea of a Healthy Turkey Flock ...... 166

5.3.3 Bacteriophage Diversity in the Choanal Cleft, Cloaca and Trachea of a Healthy Turkey Flock ...... 167

5.3.4 Fungal Diversity in the Choanal Cleft, Cloaca and Trachea of a Healthy Turkey Flock ...... 168

5.3.5 Bacterial Diversity in the Choanal Cleft, Cloaca and Trachea of a Healthy Turkey Flock ...... 169

5.3.6 Microbial network of choanal cleft, cloaca and trachea of a healthy turkey flock ...... 170

5.4 Discussion ...... 170

REFERENCES ...... 192

6 CONCLUSIONS AND FUTURE DIRECTIONS ...... 196

REFERENCES ...... 203

Appendix

A BIOMESEQ: A TOOL FOR THE CHARACTERIZATION OF ANIMAL MICROBIOMES FROM METAGENOMIC DATA ...... 208

B METAGENOMIC ANALYSIS OF THE RESPIRATORY MICROBIOME OF A HEALTHY BROILER FLOCK FROM HATCHING TO PROCESSING ...... 216


LIST OF TABLES

Table 2.1 Software tools and parameters used by BiomeSeq ...... 83

Table 2.2 Example table generated by BiomeSeq of the viral component of a commercial poultry flock ...... 84

Table 3.1 Avian specific viral database structure...... 120

Table 3.2 Shannon diversity of respiratory microbes in a healthy broiler flock ...... 121

Table 4.1 Avian specific viral genome database structure ...... 149

Table 4.2 Sequencing data generated by DNA Seq and RNA Seq and number of reads trimmed, aligned to host and aligned to microbial databases ...... 150

Table 4.3 Eukaryotic viruses detected in healthy and diseased broiler flocks .... 151

Table 4.4 Bacteria detected in healthy and diseased poultry broiler flocks ...... 152

Table 4.5 Bacteriophage detected in healthy and diseased poultry broiler flocks ...... 153

Table 4.6 Fungi detected in healthy and diseased poultry broiler flocks ...... 154

Table 5.1 Avian specific viral genome database structure ...... 184

Table 5.2 Quality Trimming and Host DNA Decontamination of reads generated by DNA Seq and RNA Seq from samples collected from the choanal cleft, cloaca and trachea of turkeys ...... 185

Table 5.3 Shannon diversity of virus, bacteria, bacteriophage and fungi in choanal cleft, trachea and cloaca of turkey ...... 186

Table 5.4 Eukaryotic viral species abundance in the choanal cleft, trachea and cloaca of turkey ...... 187

Table 5.5 Bacteria abundance in the choanal cleft, trachea and cloaca of turkey ...... 188


Table 5.6 Bacteriophage species with top 10 highest abundances in the choanal cleft, trachea and cloaca of turkey ...... 189

Table 5.7 Fungal species with top 10 highest abundances in the choanal cleft, trachea and cloaca of turkey ...... 190


LIST OF FIGURES

Figure 2.1 BiomeSeq Workflow ...... 73

Figure 2.2 Percent relative abundance of microorganisms detected by BiomeSeq and known values from simulated datasets ...... 74

Figure 2.3 Average rate of speed at different steps in BiomeSeq processing including A) quality, B) decontamination, C) Microbial database alignment and D) quantification for four simulated datasets...... 75

Figure 2.4 Root Mean Square Error between known abundances and abundances determined by BiomeSeq ...... 76

Figure 2.5 Heatmap of percent normalized relative abundance of viruses detected in a commercial poultry flock from hatching to processing...... 77

Figure 2.6 Phylogenetic tree of bacterial species detected in a poultry flock ...... 78

Figure 2.7 Venn Diagram of the detected bacteriophage species in a commercial poultry flock at Week 0, Week 1 and Week 7 ...... 79

Figure 2.8 Fungal network of species detected in a commercial poultry flock ...... 80

Figure 2.9 Microbial network of the top 10 most abundant eukaryotic viruses, fungi, bacteria and bacteriophage in a commercial poultry flock at time of processing ...... 81

Figure 2.10 Bacteria detected in a healthy poultry broiler flock ...... 82

Figure 3.1 Normalized relative abundance of detected DNA and RNA viral species at each time point ...... 113

Figure 3.2 Heat map with phylogenetic tree representing the detection intensity of viral families at each individual week ...... 114


Figure 3.3 Heat map with phylogenetic tree representing the detection intensity of each viral family from hatching to processing ...... 115

Figure 3.4 Abundance of A) virus, B) bacteria, C) bacteriophage and D) fungi at Week 0, Week 1 and Week 7 ...... 116

Figure 3.5 Phylogenetic tree of A) virus, B) bacteria, C) bacteriophage and D) yeast and fungi ...... 117

Figure 3.6 Microbial network of the complete healthy avian respiratory microbiome ...... 118

Figure 3.7 Correlation matrix comparing bacteria and bacteriophage taxa at the family level...... 119

Figure 4.1 Sample Diversity of all detected microorganisms in healthy and diseased flocks...... 144

Figure 4.2 Heat map representing the detection intensity of viral families at each individual week ...... 145

Figure 4.3 Abundance of bacterial species in A) healthy and B) diseased flocks .... 146

Figure 4.4 Abundance of bacteriophage families in healthy and diseased flocks .... 147

Figure 4.5 Microbial network of the complete avian respiratory microbiome of a healthy and diseased flock including detected eukaryotic viruses, fungi, bacteria, and bacteriophage...... 148

Figure 5.1 Normalized relative abundance of eukaryotic viruses at the choanal cleft, cloaca and trachea of turkey...... 175

Figure 5.2 Normalized relative abundance of bacteria at the choanal cleft, cloaca and trachea of turkey...... 176

Figure 5.3 Normalized relative abundance of bacteriophage at the choanal cleft, cloaca and trachea of turkey...... 177

Figure 5.4 Normalized relative abundance of fungi at the choanal cleft, cloaca and trachea of turkey...... 178

Figure 5.5 Venn Diagram of the eukaryotic viruses detected in the choanal cleft, cloaca and trachea of turkeys...... 179


Figure 5.6 Venn Diagram of the bacteria detected in the choanal cleft, cloaca and trachea of turkeys...... 180

Figure 5.7 Venn Diagram of the bacteriophage detected in the choanal cleft, cloaca and trachea of turkeys...... 181

Figure 5.8 Venn Diagram of the fungi detected in the choanal cleft, cloaca and trachea of turkeys...... 182

Figure 5.9 Microbial network of eukaryotic viruses, fungi, bacteria and bacteriophage present in the cloaca, trachea and choanal cleft of healthy turkeys...... 183


ABSTRACT

Microbiomes are complex communities of microorganisms (including bacteria, eukaryotic viruses, fungi and bacteriophage) that inhabit a particular environment in animals and contribute to essential biological functions. The microorganisms within these environments interact with the host and each other in either symbiosis or dysbiosis, depending on the condition of the host as well as external factors. Disturbances within a microbiome may result in metabolic disturbances or disease in the host. The advancement of next-generation sequencing methodologies has given rise to an increase in studies attempting to examine the microbial communities existing in a variety of animals. Readily accessible and cost-effective sequencing methodologies, as well as a number of user-friendly bioinformatics software packages and databases for 16S rRNA sequencing data, provide the standard culture-independent approach for bacterial microbiome analysis. However, this approach cannot be extended to the characterization of eukaryotic viruses, bacteriophage and fungi. Therefore, elucidating the complete microbiome requires a new approach.

Herein, we present BiomeSeq, a computational tool developed for the characterization of complete animal microbiomes using metagenomic sequencing data. BiomeSeq and its accompanying databases address the constraints of current computational tools by providing a comprehensive workflow that accurately identifies and quantifies each major component of the microbiome. BiomeSeq provides taxonomic information for each detected microorganism as well as normalized abundance, relative abundance, genome coverage and sample diversity values. The performance of this tool was successfully evaluated using both simulated and clinical samples. BiomeSeq is available as a software package and as an open-source and user-friendly container, allowing users to easily download, install and use the program with a few simple commands. The versatility of BiomeSeq, including customizable parameters and support for custom databases, allows this tool to facilitate a variety of unique investigations.

BiomeSeq was utilized to detect and quantify microbial abundance and diversity of several avian microbiomes under various conditions. In one study, the respiratory tract of a healthy poultry broiler flock was examined at weekly intervals throughout its grow-out cycle from hatching to processing. As expected, the complexity and diversity of the viral community increased as the flock aged, while the timing and presence of several viral elements were consistent with the management practices of commercial broiler flocks. Additionally, correlations between bacteria and bacteriophage families were investigated and several highly positive correlations were identified. In a second study, the microbial ecology of the respiratory tract of a broiler flock clinically diagnosed with respiratory disease complex and a healthy broiler flock were compared. Changes in the composition and diversity of the viral, bacterial, and bacteriophage microbiomes were observed that were consistent with the complex etiology of this disease. In a final study, the ability of BiomeSeq to characterize a variety of microbiomes in different host species was demonstrated. The tool was successful in identifying microbial communities inhabiting three unique microbial niches, including the trachea, choanal cleft and cloaca in a healthy turkey flock.


Chapter 1

INTRODUCTION AND REVIEW OF LITERATURE

1.1 Microbiomes

The term microbiome was coined by Nobel laureate Joshua Lederberg in 2001 to describe the commensal, symbiotic and pathogenic microorganisms that exist within the human body [1]. Microbiomes consist of a variety of microorganisms including bacteria, fungi, eukaryotic viruses, bacteriophage and . These environments can exist throughout the body in the oral cavity, intestinal tract, respiratory tract, vaginal tract and skin of both animals and humans [2, 3]. The composition of different microbiomes varies because their microbial communities participate in distinct biological functions. For example, the human gut microbiome is involved in a variety of functions including the metabolism of glycans, amino acids and xenobiotics, immune system development, methanogenesis, and the 2-methyl-D-erythritol 4-phosphate pathway-mediated biosynthesis of vitamins essential for human health, such as B6 and B12 [4, 5]. The size of each microbial component varies as well. In total, the bacteria within human microbiomes were originally thought to outnumber somatic and germ cells by about 10:1 [6]; however, recent studies provide evidence that this ratio is closer to 1:1, with a total mass of about 0.2 kg [7]. Additional evidence suggests that the number of viruses may be 10-fold higher than the bacterial component [8]. These microorganisms often interact in symbiosis, in which the host organism and the microbiota interact to maintain homeostasis of the host environment [6]. However, a change within the environment can lead to dysbiosis, often resulting in infection and disease of the host. Therefore, understanding the complex etiology of a disease requires a characterization of the microbiota from both healthy and diseased organisms.

In 2008, the Human Microbiome Project emerged as an effort to characterize the human microbiome from multiple body sites, identify changes in composition between healthy and diseased individuals and provide a standard resource for microbial data [9]. Since then, this project has isolated and sequenced over 2,200 reference bacterial strains from the human body, sampled over 300 healthy adults at eighteen specific body sites, including gut, oral cavity, airway, skin and vagina, and it continues to provide raw sequencing data for metagenomic strains on the HMP Data Browser database [2, 3].

Although there are at least seven major microbiomes on the human body, microbiome research has primarily concentrated on the gut. A tally of microbiome literature in 2016 revealed that a total of 17,546 gut microbiome publications existed on PubMed [10]. The next most studied environment was the oral microbiome with 4,843 publications. About 1,477 studies concentrated on the reproductive tract, followed closely by skin microbiome studies with about 1,372 publications. The respiratory tract and ocular microbiomes were the least studied, with only 764 and 152 studies published, respectively [10]. The advancement of next-generation sequencing technology, along with its decreasing cost, promises even more research in the near future.

1.1.1 Symbiotic Microbial Interactions with Healthy Host and Other Microbes

The microorganisms residing in microbiomes interact with the host and each other to carry out specific biological functions while maintaining homeostasis of the host. These interactions are referred to as symbiotic. This balance typically occurs when commensal microbes outnumber pathogenic ones and greater diversity is observed [6]. In the bacterial microbiome, these functions can include housekeeping functions necessary for microbial life, processes specific to a body site, and specialized functions for each habitat [11]. Within the human gut microbiome, commensal microbes synthesize vitamins and amino acids, metabolize bile acids, aid in the development of the immune system and prevent the overgrowth of harmful bacteria by enhancing the epithelial barrier [5]. Commensal was found to inhibit the growth of pathogenic E. coli in nine animal species [12]. Additionally, Bifidobacterium has been found to produce acetate, which inhibits the colonization of pathogenic E. coli [13]. The introduction of probiotics, such as Bifidobacterium and Lactobacillus, and prebiotics to the gut microbiome has been shown to have a positive impact on human and animal health and has been an extensive area of research. In a metagenomic study on a cohort of 396 women conducted by Ravel et al., vaginal microbiome compositions included predominantly Lactobacillus iners, Lactobacillus crispatus, Lactobacillus gasseri, and Lactobacillus jensenii [14]. Lactobacillus species have been associated with healthy vaginal tracts, as they produce hydrogen peroxide, which combines with a low pH to prevent colonization by pathogens [15, 16]. A study by Skarin et al. determined that Lactobacillus species inhibit the growth of several bacteria, including Gardnerella vaginalis, Mobiluncus and Bacteroides, by producing a low pH [Skarin et al., 1986]. Commensal bacteria on the skin include Corynebacterium , jeikeium, Staphylococcus epidermidis, Staphylococcus aureus, Streptococcus mitis, Pseudomonas aeruginosa, and others [17, 18]. These skin-residing microbes have also been found to prevent colonization by pathogenic species [19]. For example, in a 2010 study by Lai et al., it was demonstrated that Staphylococcus epidermidis can reduce susceptibility to pathogens that lead to skin infections by activating TLR2 signaling and antimicrobial peptide expression [20].

Mutualistic symbionts within the viral microbiome, or virome, have been found to benefit the host by altering innate immunity to other pathogens, both viral and bacterial. Virgin et al. estimate that a healthy human could harbor ten or more permanent chronic systemic viral infections, which may contribute to activating the immune system [21]. These viruses include several herpesviruses [22, 23], polyomaviruses [24], anelloviruses [25], adenoviruses [26-29], papillomaviruses [30] and even endogenous [31]. One example of a virus affecting bacterial infection is gammaherpesvirus 68, which when latent has been found to increase resistance to the bacterial pathogens Listeria monocytogenes and in mice [32]. This relationship was later confirmed by Yager and colleagues in 2009, who found that the latency period is actually transient and not lifelong [33].

Viruses can also affect infection by other viral pathogens. This can occur due to interference, the phenomenon by which a viral infection causes a cell to be temporarily resistant to infection by other viruses. This type of behavior was observed in a study by Grivel et al. in 2001, in which they found that persistent infection by human herpesvirus 6 could inhibit HIV-1 infection and progression in lymphoid tissue [34].

Alternatively, viruses can also increase susceptibility to, and exacerbate infections by, other viruses. This type of co-infection was observed in a study by Bonfante and colleagues, in which infection with low pathogenic avian influenza virus in a flock of chickens was found to increase susceptibility to, and clinical signs of, velogenic Newcastle disease virus [35]. More of these relationships are expected to exist between the healthy host and the virome; however, for now the eukaryotic virome remains severely under-characterized.

Fungi and yeast are another component that is even less well characterized; however, there is some evidence that similar symbiotic relationships can occur. One example was observed in Saccharomyces boulardii, which was found to have probiotic behaviors against pathogens such as Escherichia coli, Vibrio , Salmonella [36] and Clostridium difficile in humans [37] and several animals, including turkeys [38].


1.1.2 Dysbiotic Microbial Interactions with Diseased Host and Other Microbes

Disturbances within a microbiome can result in an unfavorable imbalance of the microbiota referred to as dysbiosis. Disturbances can be caused by a number of factors including the colonization of a new infectious agent, external environmental stressors and the physiological or health status of the host. Dysbiosis can lead to an increase in pathogens and as a result, a decrease in the abundance and diversity of commensal microorganisms. Although it is common for pathogens to exist in an asymptomatic host, this imbalance in the microflora can often lead to infection and disease.

Dysbiosis in the gut microbiome has been linked to diseases such as Crohn’s disease [39-41], ulcerative colitis [39, 42, 43], irritable bowel syndrome [44, 45], colorectal cancer [46, 47], celiac disease [48, 49], type 1 and type 2 diabetes [50-54], chronic kidney disease [55] and obesity [50, 56, 57]. In the skin microbiome, dysbiosis has been linked to atopic dermatitis [58, 59], psoriasis [58, 60, 61], acne [62-64] and rosacea [65, 66]. Dysbiosis in the vaginal microbiome has been associated with bacterial vaginosis, vaginal candidiasis and perinatal group B streptococcal disease [15]. The onset of bacterial vaginosis, for example, can occur due to a shift in microbiota abundance of Lactobacillus sp. to [67]. Dysbiosis in the oral microbiome has been found to play a role in periodontal diseases, dental caries, and oral squamous cell carcinoma [68]. Interestingly, there is evidence that also links the oral microbiome to cardiovascular disease [69, 70] as well as esophageal [71, 72], pancreatic [73] and colorectal cancer [74].

1.2 Respiratory Microbiome

As previously discussed, the respiratory microbiome is understudied when compared to the intestinal, reproductive and oral microbiomes. One reason for the smaller number of studies on this particular environment is the difference in sampling techniques. Unlike intestinal samples, in which nucleic acids can effectively be extracted from fecal matter, sampling of the respiratory tract requires much more invasive techniques such as swabbing of the trachea. Nevertheless, it is essential to characterize the microflora within this environment, as pathogen introduction is a constant threat to host immunity. Recent efforts have been made to characterize the respiratory microbiome of both healthy and diseased organisms. Although much of this information concerns the bacterial component, this knowledge can be used to prevent future infections and diseases in both humans and animals.

1.2.1 Healthy Mammalian Respiratory Microbiome

For many decades, the healthy lung was thought to be a completely sterile environment with no bacteria present [75]. However, due to the advancement of culture-independent techniques, such as next-generation sequencing, microorganisms have been found to inhabit this environment. Due to migration of microorganisms from the upper-respiratory tract to the lower-respiratory tract, it is often difficult to distinguish which microorganisms inhabit which environment. It is not always clear whether upper-respiratory inhabitants are merely being detected in the lower region or whether they inhabit both environments. According to Dickson et al., the composition of the respiratory microbiome is determined by microbial immigration, elimination and growth [76]. Microbial immigration includes the inhalation of airborne microbes and the microaspiration of upper-respiratory tract contents. Elimination includes mucociliary clearance as well as innate and adaptive host immune defenses. Microbial growth can be due to several environmental factors such as nutrient availability, temperature and pH [76].

Recent studies have concluded that the most abundant bacterial genera in the healthy human respiratory tract are Prevotella, Veillonella and Streptococcus [77, 78]. Only a limited number of studies have focused on the respiratory virome. The first study characterizing the respiratory virome of humans was conducted by Willner and colleagues in 2009. In this study, a high viral diversity that was representative of the external environment was observed in healthy individuals [79]. Furthermore, the same twenty viruses were detected in each healthy individual, including mammalian adenoviruses, mammalian herpesviruses and poxviruses. This suggests that healthy humans share a common viral community structure in the respiratory tract.


1.2.2 Mammalian Respiratory Microbiome Diseases

Disease can alter the microbial composition by affecting immigration and elimination of the commensal microbes inhabiting a healthy human respiratory tract. Chronic respiratory diseases include chronic obstructive pulmonary disease (COPD), cystic fibrosis (CF) and asthma. Typically, disease onset begins with colonization by an infectious agent. For example, bacterial colonization within the nasopharyngeal niche has been found to result in overgrowth and invasion of the bacteria, eventually leading to respiratory disease [80]. A dysbiotic state in the nasopharynx can lead to acquisition of new bacterial or viral pathogens, carriage of multiple , or a viral co-infection [81].

In patients with COPD, infections can lead to increased shortness of breath, chest tightness, and phlegm. Specific bacteria and viruses have increased abundances in patients with COPD. A study by Papi and colleagues identified specific viruses which included , influenza viruses, respiratory syncytial viruses, parainfluenza viruses and coronaviruses. The most abundant bacteria they identified were Haemophilus influenzae, Streptococcus pneumoniae, , Staphylococcus aureus, , and Enterobacter spp. [82, 83]. Furthermore, coinfection with bacteria and viruses was found to increase severity of the disease, with greater lung function impairment resulting in a longer recovery time in the hospital.


In patients with cystic fibrosis, infection by bacteria such as Pseudomonas aeruginosa, Burkholderia cepacia and Staphylococcus aureus has been found to increase patient morbidity and mortality [79, 84]. In one of the first studies to characterize a respiratory virome, Willner and colleagues identified several eukaryotic viruses in patients with cystic fibrosis, including human herpesvirus and . Interestingly, they observed similar bacteriophage populations in patients with the disease and in healthy individuals, and also found that the bacteriophage populations corresponded to the detected bacterial species [79]. In a study conducted by Green et al., several bacteria, including Moraxella catarrhalis, Haemophilus spp. and Streptococcus spp., were identified in a majority of patients with asthma, suggesting their possible role in increased airway obstruction and inflammation [85]. Furthermore, in a study by Johnston and colleagues, respiratory viruses such as , coronaviruses, influenza, and respiratory syncytial virus were identified in children during symptomatic episodes of asthma [86].

Several recent studies have identified viruses that exacerbate bacterial infections in the human respiratory tract. These interactions include coronavirus on H. influenzae [Michaels et al., 1983]; adenovirus on H. influenzae and M. catarrhalis [87, 88]; influenza virus on S. pneumoniae [89], H. influenzae [90] and S. aureus [91]; human rhinovirus on S. pneumoniae [92], H. influenzae [88] and M. catarrhalis [88]; human on S. pneumoniae [93]; and respiratory syncytial virus on S. pneumoniae and H. influenzae [94, 95]. In one specific instance, influenza infection has been found to alter the host to predispose it to adherence, invasion and induction of disease by pneumococcus; however, the mechanism for this alteration requires further examination [96]. , Pseudomonas aeruginosa and Streptococcus pneumoniae have been shown to stimulate secretion of mucus, which could potentially lead to increased bacterial infection in patients [97]. Haemophilus influenzae and Pseudomonas aeruginosa were found to slow, and in some cases stop, human nasal cilia function, which could result in less efficient mucociliary clearance, thereby allowing infectious agents to colonize and spread more easily in the lung [98].

1.2.3 Avian Respiratory Microbiome

Similar to humans and other mammals, microbial interactions in the avian respiratory microbiome can be symbiotic or dysbiotic, depending primarily on the status of the bird and its living conditions. In a recent study by Glendinning and colleagues, culture-independent methods were utilized to identify bacteria in the buccal, nasal and lung microbiomes of healthy chickens at different time points [99]. The group identified differences in the bacterial microbiome between the respiratory sites as well as between age groups of birds. They identified Staphylococci, Lactobacilli and , which corresponded to previous culture-dependent studies. However, they were also able to identify several additional bacterial groups in relatively high abundance, including Faecalibacterium, Turicibacter and Jeotgalicoccus. In another study by Shabbir et al., culture-independent methods were utilized to compare the bacterial microbiome of the lower respiratory tract between healthy flocks located on different farms [100]. They determined that the environment has an impact on the composition of the bacterial microbiome, as significant differences were observed according to the farm to which the birds belonged. Breed of bird and geographic location showed less of an impact [100].

Although microorganisms are present in the respiratory tract of healthy flocks, the introduction of a new infectious agent or changes in the environmental conditions can lead to a variety of infections and co-infections that may result in respiratory disease.

1.2.4 Multifactorial Avian Respiratory Disease Complex

Avian respiratory disease complex is an example of a multifactorial syndrome that commonly affects poultry flocks and involves a combination of bacterial, viral and fungal infectious agents in conjunction with environmental stressors. Clinical signs of avian respiratory disease include snicking, head swelling, conjunctivitis, airsacculitis, nasal and ocular discharge and rattling noises [101]. Morbidity per house can range from 10-20% and mortality per house can range from 5-10%.

According to a 2012 USDA study in which 482 breeder farms were examined for respiratory disease, 5.2% of poultry flocks in the Eastern region of the United States were affected by respiratory disease, while only 2.7% of flocks located in the Central United States were affected [102]. In recent history, pathogenic outbreaks in poultry flocks have contributed to global economic loss. For example, during the 2014-2015 outbreak of highly pathogenic avian influenza, over 50 million chickens and turkeys were lost to disease or depopulation [103]. Poultry is the leading source of protein globally, with over $46.3 billion in global wholesale prices in 2018 [104]. Therefore, prevention of similar outbreaks is important for ensuring healthy nutrition and a healthy economy worldwide. Environmental factors such as increased dust and ammonia levels, crowded houses, or fluctuations in temperature and humidity can trigger respiratory disease in flocks [105-107]. Poor ventilation in the winter months can prompt these stressors. The living conditions of the flock are a contributing factor to respiratory disease onset that is often overlooked in controlled laboratory settings. Therefore, to understand the complexity of coinfections, birds should be studied in controlled settings as well as in the field.

In commercial flocks in particular, coinfection is common, and multiple bacteria-bacteria [108-110], virus-virus [35, 110-112] and even bacteria-virus interactions [35, 113-115] have been reported to result in respiratory disease. In the majority of cases, infection by two or more agents contributes to an exacerbation of clinical signs and an increase in mortality. In the Eastern region of the United States, the USDA reported that 2.1% of flocks were diagnosed with Mycoplasma synoviae, 2.4% were diagnosed with infectious laryngotracheitis and 0.8% were diagnosed with infectious bronchitis. In the Western region of the United States, 1.6% of flocks were diagnosed with Mycoplasma synoviae and 1.1% were diagnosed with infectious bronchitis [102]. Several studies have reported that infectious bronchitis virus infection can increase the severity of Mycoplasma synoviae in poultry [116-118]. Other contributing pathogens of avian respiratory disease complex include Erysipelothrix, Mycoplasma gallisepticum, Haemophilus paragallinarum, Escherichia coli, Pasteurella multocida, Ornithobacterium rhinotracheale, Aspergillus, infectious coryza, avian influenza, avian pneumovirus and Newcastle disease [101, 119].

Infection by two or more of these agents can increase the morbidity and mortality in poultry flocks. In a study by Springer and colleagues, it was demonstrated that two relatively non-pathogenic agents, Mycoplasma synoviae and Escherichia coli, prolong and increase the severity of infectious bronchitis virus infections [118]. Clinical signs were more severe than those resulting from co-infection with only Mycoplasma synoviae and infectious bronchitis virus or only Escherichia coli and infectious bronchitis virus. In contrast to diseases with a single causative agent, the mechanisms of multifactorial diseases can be quite challenging to determine. Nevertheless, elucidating the dynamic interactions occurring within the microbial communities inhabiting the respiratory tract will provide a better understanding of the etiology of avian respiratory disease.

1.3 Advancement of Technology for Detection of Microorganisms

Culture-dependent approaches for detecting microbial organisms have been the standard method since the 1880s, when Robert Koch invented plating [120]. While other methods such as microscopy, antigen detection and serology are commonly used in microbiology, culture is viewed as the standard method for diagnostics [121]. These traditional methods have contributed tremendously to the advancement of the field of microbiology and to our understanding of microbial diversity. Although culture is still the standard in some laboratories, traditional laboratory methods are time-consuming and do not reveal the full diversity within microbial communities. In some instances, it can take several days to receive bacteria and yeast results, while fungi can take as long as months [121]. Furthermore, current laboratory techniques are unable to culture the vast majority (over 90%) of microbial species, leaving gaps in our knowledge and understanding of the planet’s biodiversity [122].

About a century later, however, culture-independent technologies were invented, which have allowed us to analyze microbial communities within a particular environment by identifying microbial DNA isolated from a sample. Some early molecular methods used to determine the unknown microbial composition of a sample include fluorescent in situ hybridization (FISH) [123], denaturing gradient gel electrophoresis (DGGE) [124], automated ribosomal intergenic spacer analysis (ARISA) [125] and terminal restriction fragment length polymorphism (TRFLP) [126]. The dideoxy chain-termination method, or Sanger sequencing, was the most commonly used method of sequencing DNA and led to the development of automated DNA sequencing platforms [127]. Using these early sequencing approaches, Sanger sequenced the first DNA genome, bacteriophage Phi X [128]. The introduction of culture-independent methods provided more insight into microbial communities; however, these methods were still expensive and time-consuming.


1.3.1 Next-Generation Sequencing Technology

Advancements in sequencing technology have resulted in the development of more robust, accurate and rapid next-generation sequencing platforms, including the 454 Genome Sequencer (the first next-generation sequencer) [129], the Illumina Genome Analyzer [130] and the Applied Biosystems/SOLiD [131]. With next-generation sequencing platforms, unlike Sanger sequencing, DNA sequencing libraries are clonally amplified in vitro, the DNA is sequenced by synthesis and the spatially segregated and amplified DNA templates are sequenced in parallel [129, 130, 131]. Each sequencing platform has its advantages and disadvantages. For example, the 454 Genome Sequencer has a maximum read length of about 700 base pairs, a Phred quality score greater than Q20 and error rates between 1.07% and 1.7% [129, 132]. The Illumina Genome Analyzer has a maximum read length of 300 base pairs, a Phred quality score greater than Q30 and error rates between 1.0034% and 1% [133]. Finally, the Applied Biosystems/SOLiD has a maximum read length of only 75 base pairs, a Phred quality score greater than Q30 and error rates between 1.01% and 1% [134]. Third-generation sequencing platforms produce longer reads; however, their error rates are significantly higher than those of the other platforms. For example, the Nanopore sequencer can produce reads of up to 10 kb with error rates between 10% and 40% [135], while the Pacific Biosciences sequencer can produce up to 20 kb of sequence with error rates between 5% and 10% [136].
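The Phred quality scores quoted above can be related to per-base error probabilities through the standard definition Q = -10 log10(P). The following minimal Python sketch illustrates that conversion; the function names are hypothetical, chosen only for this illustration, and the calculation is not taken from any of the cited platform evaluations.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Probability that a base call is wrong for a given Phred score Q."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Phred score Q for a given per-base error probability P."""
    return -10 * math.log10(p)


# Convert the two thresholds mentioned in the text.
for q in (20, 30):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:.4f} ({p * 100:.2f}% per base)")
```

Under this definition, Q20 corresponds to an expected error of 1 in 100 base calls and Q30 to 1 in 1,000.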


As next-generation sequencing technologies continue to advance, larger amounts of data will be generated at lower cost and in less time. The very first human genome, sequenced in 2000, cost an estimated $300 million to complete [137, 138]. Since then, the National Human Genome Research Institute has tracked the cost per genome sequenced from 2001 to 2019, and in this time the cost has decreased from $100 million to as little as $1,000. This trend is often compared with Moore’s Law, which states that computational power doubles roughly every two years [139]. In addition to cost, sequencing time has also decreased. For example, the first human genome took over 15 months to sequence in 2000, while twenty years later, a human genome can be sequenced in just days [137].
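As a rough illustration of the pace of this decline, the sketch below estimates how often the cost per genome would have had to halve to fall from $100 million to $1,000 between 2001 and 2019, the figures quoted above. It assumes a smooth exponential decline, which is a simplification of the real cost curve, and the function name is hypothetical.

```python
import math


def halving_time_years(start_cost: float, end_cost: float, years: float) -> float:
    """Years per halving, assuming the cost fell exponentially at a constant rate."""
    halvings = math.log2(start_cost / end_cost)  # number of times the cost halved
    return years / halvings


# Figures quoted in the text: $100 million (2001) down to $1,000 (2019).
t = halving_time_years(100_000_000, 1_000, 2019 - 2001)
print(f"Implied halving time: {t:.2f} years")
```

Under these assumptions the cost halved approximately once per year over this period, a faster rate than one doubling every two years.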

Using next-generation sequencing technology, two approaches are commonly employed to characterize microbial composition. The first is amplification and sequencing of conserved marker genes, such as the 16S ribosomal RNA gene in bacteria, the 18S ribosomal RNA gene in eukaryotes and the internal transcribed spacer region in fungi. The second approach is metagenomic shotgun sequencing.

1.3.2 16S Ribosomal RNA Sequencing

Some components of the microbiome, such as bacteria, archaea and fungi, have conserved regions within their genomes, commonly referred to as marker genes, which can be sequenced and used to study the taxonomy and phylogeny of microorganisms in a given sample. In bacteria, the 16S ribosomal RNA (16S rRNA) gene is approximately 1,500 base pairs in size and consists of conserved regions and hypervariable regions (V1-V9) [140]. The conserved regions act as primer binding sites for PCR amplification and the hypervariable regions are used to identify specific bacteria. These sequences can then be clustered either into phylotypes according to sequences in a reference database [126], or into operational taxonomic units (OTUs), in which clusters are generated based on sequence similarity [141]. Several well-developed bioinformatics tools exist for the analysis of 16S rRNA data, including QIIME, MG-RAST and mothur, in addition to many comprehensive bacterial databases such as Greengenes and SILVA [141-145]. For fungal characterization, internal transcribed spacers (ITS) can be similarly sequenced and analyzed. Bioinformatics tools and databases exist for the analysis of ITS data; however, they are not as well developed as those that exist for 16S rRNA. For example, QIIME provides a pipeline for ITS analysis and the UNITE database specifically contains fungal ITS sequences [142, 146].
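To illustrate the idea of similarity-based OTU clustering described above, the following is a minimal greedy clustering sketch in Python. It is not the algorithm implemented by any of the cited tools: the identity measure simply compares equal-length sequences position by position, whereas real pipelines compute identity from sequence alignments, and the function names and toy reads are hypothetical. A 97% identity threshold is a common convention for OTUs; the toy example uses 90% because the reads are only ten bases long.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy identity measure needs equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_otu_clustering(sequences, threshold=0.97):
    """Assign each sequence to the first centroid it matches, else start a new OTU."""
    centroids = []  # one representative sequence per OTU
    clusters = []   # members of each OTU, in the same order as centroids
    for seq in sequences:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[i].append(seq)
                break
        else:
            # No existing centroid was similar enough: seq founds a new OTU.
            centroids.append(seq)
            clusters.append([seq])
    return clusters


# Hypothetical toy reads (10 bases each), clustered at 90% identity.
reads = ["ACGTACGTAC", "ACGTACGTAT", "TTTTGGGGCC", "TTTTGGGGCA", "ACGAACGAAC"]
clusters = greedy_otu_clustering(reads, threshold=0.9)
for number, members in enumerate(clusters, start=1):
    print(f"OTU {number}: {members}")
```

In this toy example the five reads fall into three OTUs. Because the procedure is greedy, the result can depend on the order in which the sequences are processed.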

Over the past three decades, 16S rRNA sequencing has provided an inexpensive and relatively rapid alternative to traditional culture techniques. This approach has been employed in numerous microbiome studies, and the insight it has provided into the role of bacteria in both healthy and diseased organisms must be appreciated. However, there are several limitations with this approach. Amplicon sequencing, such as 16S rRNA sequencing, utilizes primers that can introduce bias because they target a specific sequence in bacteria, thus leaving eukaryotes and viruses undetectable. Results are also typically output as abundance proportionate to the sample instead of absolute abundance, which can lead to over- or under-represented organisms, making sample comparison a challenge [147]. Furthermore, this method is restricted to classifying bacteria at the genus taxonomic level, as it lacks the power to differentiate specific bacterial species [147]. Because downstream analysis relies on bacterial reference databases, the identification of novel bacterial sequences is also limited. Finally, because amplicon sequencing is restricted to providing taxonomic information, it does not provide the information required to infer function.

1.3.3 Metagenomic Shotgun Sequencing

Metagenomic shotgun sequencing characterizes all known and novel microorganisms in a given sample, both culturable and unculturable. DNA from an entire sample can be sequenced using metagenomic shotgun sequencing approaches and microbial communities can be classified in a short amount of time [147]. This approach does not use PCR and is therefore not restricted by primers that target specific gene sequences. As a result, it is not limited to detecting one specific and has enough sensitivity to resolve microorganisms at the species and even strain taxonomic level. Therefore, a metagenomics approach can be employed to characterize viral and eukaryotic sequences. Moreover, data generated using this approach do not require alignment to a reference database, so novel microbial sequences can potentially be detected. In addition to providing invaluable insight into the diversity and composition of an environment, the data generated using this approach can also be employed in metatranscriptomics, metabolomics and proteomics analyses, providing a deeper understanding of the community interactions through functional profiles and metabolic pathways.

Several bioinformatics analysis tools exist for the characterization of microorganisms from shotgun metagenomics data; which tool is most appropriate depends on the research hypotheses. For identification of known microbial sequences, a sequence-dependent approach can be used to align reads to annotated reference genomes that exist in reference genome databases [148-150]. This can be achieved using de novo assembled contigs or unassembled reads. Several sequence aligners exist that align unassembled reads to reference genomes, including the commonly used Bowtie2 and the Burrows-Wheeler Aligner [149, 150]. The completeness of the reference genome database needs to be considered when using this approach. For identification of unknown microbial sequences, de novo assembly can be employed. Several tools exist that assemble metagenomic sequences into contigs for de novo detection of microbial elements without using a reference sequence [151-153]. Common metagenomic assemblers include MEGAHIT, MetaSPAdes and MetaVelvet [151-153]. The Basic Local Alignment Search Tool (BLAST), or a similar method, can also be employed to identify known microbial sequences from the assembled contigs [154].

1.4 Characterization of the Virome

Although the importance of the virome in both health and disease of animal hosts is apparent, our current understanding of viral diversity is incredibly limited. According to the latest master species list provided by the International Committee on Taxonomy of Viruses in 2018, only 150 families, 1,020 genera and 5,560 viral species have been classified [155]. In fact, it has been estimated that only about 1% of the planet’s virome has been discovered [8]. In addition to discovering novel viruses, elucidating the biological roles of known viruses is another area of great importance. As described previously, examining the eukaryotic virome in particular will provide a deeper understanding of the complex symbiotic and dysbiotic interactions that occur within the virome and between the virome, the host and other microorganisms. Moreover, it can even provide unique insights into our own genomes, as viral signatures have been identified within the human genome. Indeed, human endogenous retroviruses account for about 8% of our total genome [156]. Endogenous retroviruses are typically inactive in humans due to deletions, inversions and mutations, but they could contribute to regulating gene expression and encoding mRNAs for certain proteins [157-159]. Furthermore, they have been found to be reactivated in certain diseases such as cancer [160, 161]. Although the necessity of such studies is apparent, several challenges with characterizing the virome have limited the available computational methodology and tools required for this type of analysis.

1.4.1 Characterizing the Virome Using a Culture-Independent Approach

Although amplicon sequencing and the well-developed computational tools for analyzing these data have been successfully employed in detecting bacterial communities in a number of studies, similar approaches cannot be used for viral classification. Unlike bacteria, which share the 16S rRNA gene, viruses lack conserved genomic regions that are homologous across all viral genomes. However, the advancement of other culture-independent approaches, such as metagenomic shotgun sequencing, has made it possible to characterize viruses.

1.4.2 Challenges with Developing Comprehensive Computational Tools for Analysis of the Virome

Several challenges have made developing comprehensive tools for viral characterization arduous. For one, viral DNA and RNA are typically less abundant than those of the host and other microorganisms within a sample, resulting in a weaker signal and making viruses more difficult to detect during downstream analysis. Moreover, the rapid degradation and instability of RNA can make detecting RNA viruses even more challenging. Furthermore, DNA and RNA viruses vary significantly in genome size as well as in other characteristics, such as whether they are double- or single-stranded, positive- or negative-sense and enveloped or non-enveloped. Viruses are also highly genetically heterogeneous and evolve quickly. In 2015, an average of 2.5 viruses were reportedly being added to the NCBI viral RefSeq database each day [162]. Therefore, maintaining a current viral genome database requires frequent updating, possibly more so than other microbial databases. For this reason, there have been few attempts at creating and managing robust and curated viral genome databases, which hinders sequence-dependent approaches that rely on reference databases. As a result, many studies attempting to characterize the virome rely on sequence-independent assembly approaches and/or BLAST database searches, which can require a significant amount of computational time and resources. Additionally, the host background DNA within samples can interfere with mapping quality and efficiency, making data interpretation problematic. Finally, many existing metagenomics tools that attempt to characterize the virome require extensive command-line knowledge and expensive computational resources to process a sample.

Although a tool that addresses all of these limitations does not yet exist, some tools have recently been developed for taxonomic profiling and discovery of viruses.

1.4.2.1 Quantification of the Virome

Quantifying the viral component of a microbiome has also presented a challenge to researchers. Percent relative abundance of microbial elements is often the chosen method of quantifying microorganisms in culture-independent studies, whether utilizing 16S rRNA or metagenomic shotgun sequencing. This is typically calculated as the number of reads mapped to a particular microbial reference genome in proportion to the total number of microbial sequences detected within the entire sample. This formula allows researchers to determine which elements are most abundant in a particular sample, information that can then be compared to other samples to identify microbial shifts and trends. This method of quantification has become the standard; however, some studies have found that incorporating reference genome length to calculate normalized viral abundance increases the accuracy of abundance estimates [163, 164]. In a highly cited study by Moustafa and colleagues, normalized viral abundance was calculated by normalizing mapped reads based on both the viral reference genome length and the host reference genome length [163]. By considering the length of both viral sequences and host sequences, this formula was intended to provide a more accurate representation of viral abundance by eliminating bias stemming from the variable genome lengths of the viruses. Encouragingly, they found that normalized abundance corresponded well to viral abundance determined by polymerase chain reaction (PCR) experiments. Overall, accurate quantification of microbial abundance requires sequence-dependent alignment to comprehensive microbial reference databases.
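The two quantification strategies described above can be sketched as follows. The first function computes percent relative abundance as defined in the text; the second applies a generic per-kilobase genome length normalization before computing proportions. The second function is an assumption made for illustration and is not the exact formula of Moustafa and colleagues, which also incorporates host genome length; all names and read counts are hypothetical.

```python
def percent_relative_abundance(read_counts):
    """Reads mapped to each genome as a percentage of all mapped microbial reads."""
    total = sum(read_counts.values())
    return {name: 100 * n / total for name, n in read_counts.items()}


def length_normalized_abundance(read_counts, genome_lengths_bp):
    """Percentage of reads-per-kilobase signal attributed to each genome.

    Illustrative assumption: reads are divided by genome length in kilobases
    before proportions are taken, so long genomes are not favored.
    """
    per_kb = {
        name: n / (genome_lengths_bp[name] / 1000)
        for name, n in read_counts.items()
    }
    total = sum(per_kb.values())
    return {name: 100 * value / total for name, value in per_kb.items()}


# Hypothetical sample: two viruses with equal read counts but different genome sizes.
counts = {"virus_A": 300, "virus_B": 300}
lengths = {"virus_A": 5_000, "virus_B": 150_000}

print(percent_relative_abundance(counts))
print(length_normalized_abundance(counts, lengths))
```

In this toy example both viruses receive the same number of reads, so their percent relative abundance is identical (50% each), but after length normalization the virus with the 5 kb genome accounts for roughly 97% of the normalized signal.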

1.4.3 Existing Culture-Independent Tools for Virome Characterization and their Limitations

As described previously, sequence-dependent alignment and de novo assembly approaches can be employed to analyze metagenomic sequencing data. Several tools exist that perform de novo assembly specifically for metagenomics studies and can be used to analyze the virome. These tools, which were designed to account for the variability of multiple genome sizes, include MEGAHIT, MetaSPAdes and MetaVelvet [151-153]. Virus-specific tools have also been developed using sequence-dependent approaches to identify known viruses, including Virome, VirusHunter, VirusSeeker, MetaVir, ProViDE, Kraken and VirSorter [148, 165-167]. All but one of these tools use the Basic Local Alignment Search Tool (BLAST) for taxonomic classification, which can require an extensive amount of time and computational resources. These existing tools are appropriate for taxonomic classification; however, they do not provide accurate viral abundance, diversity and genome coverage estimates. Furthermore, many of these resources require extensive command-line knowledge to install and use, only work on certain operating systems and can be computationally intensive, requiring access to a powerful computer or server.
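Genome coverage is mentioned above without a formal definition. One common way to summarize it, sketched below under stated assumptions, is breadth of coverage (the fraction of reference positions covered by at least one read) together with mean depth (the average number of reads covering each position). The function name, the interval convention (0-based, half-open) and the example alignments are hypothetical, and this is not presented as the calculation used by BiomeSeq or by any of the tools listed.

```python
def coverage_stats(genome_length, alignments):
    """Return (breadth, mean_depth) for reads aligned to one reference genome.

    alignments: iterable of (start, end) intervals, 0-based and half-open,
    giving the reference positions spanned by each aligned read.
    """
    depth = [0] * genome_length
    for start, end in alignments:
        # Clip each interval to the reference so stray coordinates are ignored.
        for pos in range(max(start, 0), min(end, genome_length)):
            depth[pos] += 1
    covered = sum(1 for d in depth if d > 0)
    breadth = covered / genome_length
    mean_depth = sum(depth) / genome_length
    return breadth, mean_depth


# Hypothetical 100 bp reference with two overlapping 50 bp reads.
breadth, mean_depth = coverage_stats(100, [(0, 50), (25, 75)])
print(f"breadth = {breadth:.2f}, mean depth = {mean_depth:.2f}")
```

Here the two reads cover positions 0 to 74, giving a breadth of 0.75 and a mean depth of 1.0. The per-position loop is chosen for readability and would be slow for large genomes.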

1.5 Rationale and Objectives

Overwhelming evidence suggests that microbial interactions occurring between the host and microorganisms or between microorganisms of different kingdoms have a significant role in disease pathology in intestinal, respiratory, skin, oral and reproductive microbiomes. Shotgun metagenomic sequencing has provided a rapid and inexpensive approach to sequence microbiome samples. However, a majority of studies focus on the bacterial component as the computational tools needed to characterize the eukaryotic viruses, bacteriophage and fungi are lacking.

Elucidating the complete microbiome of animals is critical in understanding the role these microbial communities play in disease etiology. To address this, BiomeSeq, a tool for the detection and quantification of eukaryotic viruses, bacteriophage, fungi and bacteria, was developed. BiomeSeq utilizes a sequence-dependent approach to detect microorganisms in comprehensive microbial databases and accurately determines microbial abundance, diversity and genome coverage. The development of this tool is detailed in Chapter 2, including each step of the bioinformatics workflow as well as the contents of each microbial database. BiomeSeq performance was evaluated using several metrics on simulated datasets as well as a clinical dataset, and the tool performed with high accuracy and precision. BiomeSeq was implemented as a software package as well as a user-friendly container, and the manuscript is available on bioRxiv and was recently submitted to BMC Genomics.

BiomeSeq was employed in several studies examining various microbiomes of avian flocks. In the first study, BiomeSeq was utilized to analyze the respiratory tract of a healthy broiler flock from hatching to processing. Abundance, microbial diversity, species frequency and microbial shifts were examined for each of the microbial components within the respiratory microbiome. This study provides the first comprehensive analysis of the ecology of the avian respiratory microbiome. This work is discussed in Chapter 3 and a manuscript of this work was submitted to the Journal of Applied and Environmental Microbiology. In a second study, clinical isolates with respiratory disease complex were sampled and BiomeSeq was employed to characterize each of the microbial components. This information was compared to the healthy broiler flock, providing insight into the microbial shifts occurring as a result of dysbiosis in the diseased flock. This study is discussed in Chapter 4 and a manuscript of this work was submitted to Avian Diseases. In a third study, BiomeSeq’s ability to characterize a variety of hosts, sampling methods and microbiome environments is highlighted. In this study, the respiratory microbiome is compared to the intestinal microbiome in a flock of healthy turkeys. This study is a collaborative effort among the University of Delaware Department of Animal and Food Science, the University of Minnesota Department of Veterinary and Biomedical Sciences, and the Ohio State University Department of Veterinary Preventive Medicine and is discussed in Chapter 5.


REFERENCES

1. Lederberg, J. and A.T. McCray, 'Ome Sweet 'Omics--A Genealogical Treasury of Words. The Scientist, 2001. 15(7): p. 8.

2. Human Microbiome Project Consortium, A framework for human microbiome research. Nature, 2012. 486(7402): p. 215-21.

3. Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome. Nature, 2012. 486(7402): p. 207-14.

4. Gill, S.R., et al., Metagenomic analysis of the human distal gut microbiome. Science, 2006. 312(5778): p. 1355-9.

5. Kamada, N., et al., Control of pathogens and pathobionts by the gut microbiota. Nat Immunol, 2013. 14(7): p. 685-90.

6. Turnbaugh, P.J., et al., The human microbiome project. Nature, 2007. 449(7164): p. 804-10.

7. Sender, R., S. Fuchs, and R. Milo, Revised Estimates for the Number of Human and Bacteria Cells in the Body. PLoS Biol, 2016. 14(8): p. e1002533.

8. Mokili, J.L., F. Rohwer, and B.E. Dutilh, Metagenomics and future perspectives in virus discovery. Current Opinion in Virology, 2012. 2: p. 63-77.

9. Peterson, J., et al., The NIH Human Microbiome Project. Genome Res, 2009. 19(12): p. 2317-23.

10. Lloyd-Price, J., G. Abu-Ali, and C. Huttenhower, The healthy human microbiome. Genome Med, 2016. 8(1): p. 51.

11. Shafquat, A., et al., Functional and phylogenetic assembly of microbial communities in the human microbiome. Trends Microbiol, 2014. 22(5): p. 261-6.

12. Schamberger, G.P. and F. Diez-Gonzalez, Characterization of Colicinogenic Escherichia coli Strains Inhibitory to Enterohemorrhagic Escherichia coli. Journal of Food Protection, 2004. 67(3): p. 486-492.

13. Fukuda, S., et al., Bifidobacteria can protect from enteropathogenic infection through production of acetate. Nature, 2011. 469(7331): p. 543-7.

14. Ravel, J., et al., Vaginal microbiome of reproductive-age women. Proc Natl Acad Sci U S A, 2011. 108 Suppl 1: p. 4680-7.

15. Larsen, B. and G.R.G. Monif, Understanding the Bacterial Flora of the Female Genital Tract. Clinical Infectious Diseases, 2000. 32: p. 69-77.

16. Hickey, R.J., et al., Understanding vaginal microbiome complexity from an ecological perspective. Transl Res, 2012. 160(4): p. 267-82.

17. Gao, Z., et al., Molecular analysis of human forearm superficial skin bacterial biota. Proc Natl Acad Sci U S A, 2007. 104(8): p. 2927-32.

18. Dekio, I., et al., Detection of potentially novel bacterial components of the human skin microbiota using culture-independent molecular profiling. J Med Microbiol, 2005. 54(Pt 12): p. 1231-1238.

19. Kong, H.H., Skin microbiome: genomics-based insights into the diversity and role of skin microbes. Trends Mol Med, 2011. 17(6): p. 320-8.

20. Lai, Y., et al., Activation of TLR2 by a small molecule produced by

Staphylococcus epidermidis increases antimicrobial defense against bacterial

skin infections. J Invest Dermatol, 2010. 130(9): p. 2211-21.

21. Virgin, H.W., E.J. Wherry, and R. Ahmed, Redefining chronic viral infection.

Cell, 2009. 138(1): p. 30-50.

22. Zhu, J., et al., Virus-specific CD8+ T cells accumulate near sensory nerve

endings in genital skin during subclinical HSV-2 reactivation. J Exp Med,

2007. 204(3): p. 595-603.

23. Hislop, A.D., et al., Tonsillar homing of Epstein-Barr virus-specific CD8+ T

cells and the virus-host balance. J Clin Invest, 2005. 115(9): p. 2546-55.

24. Zur Hausen, H., Novel human polyomaviruses--re-emergence of a well known

virus family as possible human carcinogens. Int J Cancer, 2008. 123(2): p.

247-250.

25. Hino, S.M., H., Torque teno virus (TTV): current status. Reviews in Medical

Virology, 2006. 17(1): p. 45-57.

26. Gao, G., et al., Clades of Adeno-associated viruses are widely disseminated in

human tissues. J Virol, 2004. 78(12): p. 6381-8.

27. Chen, C.L., et al., Molecular characterization of adeno-associated viruses

infecting children. J Virol, 2005. 79(23): p. 14781-14792.

28. Erles, K., P. Sebökus, and J.R. Schlehofer, Update on the prevalence of serum antibodies

(IgG and IgM) to adeno-associated virus (AAV). Journal of Medical Virology,

1999. 59(3): p. 406-411.

29. Garnett, C.T., D. Erdman, W. Xu, and L.R. Gooding, Prevalence and Quantitation of

Species C Adenovirus DNA in Human Mucosal Lymphocytes. Journal of

Virology, 2002. 76(21): p. 10608–10616.

30. Leggatt, G.R. and I.H. Frazer, HPV vaccines: the beginning of the end for

cervical cancer. Current Opinion in Immunology, 2007. 19(2): p. 232-238.

31. Seifarth, W., et al., Comprehensive analysis of human endogenous retrovirus

transcriptional activity in human tissues with a retrovirus-specific microarray. J

Virol, 2005. 79(1): p. 341-52.

32. Barton, E.S., D.W. White, J.S. Cathelyn, K.A. Brett-McClellan, M. Engle,

M.S. Diamond, V.L. Miller, and H.W. Virgin, Herpesvirus latency confers

symbiotic protection from bacterial infection. Nature, 2007. 447: p. 326–329.

33. Yager, E.J., et al., gamma-Herpesvirus-induced protection against bacterial

infection is transient. Viral Immunol, 2009. 22(1): p. 67-72.

34. Grivel, J.C., et al., Suppression of CCR5- but not CXCR4-tropic HIV-1 in

lymphoid tissue by human herpesvirus 6. Nat Med, 2001. 7(11): p. 1232-5.

35. Bonfante, F., et al., Synergy or interference of a H9N2 avian influenza virus

with a velogenic Newcastle disease virus in chickens is dose dependent. Avian

Pathol, 2017. 46(5): p. 488-496.

36. Hatoum, R., S. Labrie, and I. Fliss, Antimicrobial and probiotic properties of

yeasts: from fundamental to novel applications. Front Microbiol, 2012. 3: p.

421.

37. McFarland, L.V., Systematic review and meta-analysis of Saccharomyces

boulardii in adult patients. World J Gastroenterol, 2010. 16(18): p. 2202-22.

38. Bradley, G.L., T.F. Savage, and K.I. Timm, The effects of supplementing diets with

Saccharomyces cerevisiae var. boulardii on male poult performance and ileal

morphology. Poultry Science, 1994. 73: p. 1766-1770.

39. Kaser, A., S. Zeissig, and R.S. Blumberg, Inflammatory bowel disease. Annu

Rev Immunol, 2010. 28: p. 573-621.

40. Sokol, H., et al., Faecalibacterium prausnitzii is an anti-inflammatory

commensal bacterium identified by gut microbiota analysis of Crohn disease

patients. Proc Natl Acad Sci U S A, 2008. 105(43): p. 16731-6.

41. Willing, B.P., et al., A pyrosequencing study in twins shows that

gastrointestinal microbial profiles vary with inflammatory bowel disease

phenotypes. Gastroenterology, 2010. 139(6): p. 1844-1854.e1.

42. Png, C.W., et al., Mucolytic bacteria with increased prevalence in IBD mucosa

augment in vitro utilization of mucin by other bacteria. Am J Gastroenterol,

2010. 105(11): p. 2420-8.

43. Lepage, P., et al., Twin study indicates loss of interaction between microbiota

and mucosa of patients with ulcerative colitis. Gastroenterology, 2011. 141(1):

p. 227-36.

44. Salonen, A., W.M. de Vos, and A. Palva, Gastrointestinal microbiota in

irritable bowel syndrome: present state and perspectives. Microbiology, 2010.

156(11): p. 3205-3215.

45. Saulnier, D.M., et al., Gastrointestinal microbiome signatures of pediatric

patients with irritable bowel syndrome. Gastroenterology, 2011. 141(5): p.

1782-91.

46. Sobhani, I., et al., Microbial dysbiosis in colorectal cancer (CRC) patients.

PLoS One, 2011. 6(1): p. e16393.

47. Wang, T., et al., Structural segregation of gut microbiota between colorectal

cancer patients and healthy volunteers. ISME J, 2012. 6(2): p. 320-9.

48. Nistal, E., et al., Differences of small intestinal bacteria populations in adults

and children with/without celiac disease: effect of age, gluten diet, and disease.

Inflammatory bowel diseases, 2012. 18(4): p. 649-656.

49. Di Cagno, R., et al., Duodenal and faecal microbiota of celiac children:

molecular, phenotype and metabolome characterization. BMC Microbiol,

2011. 11: p. 219.

50. Musso, G., R. Gambino, and M. Cassader, Interactions between gut microbiota

and host metabolism predisposing to obesity and diabetes. Annu Rev Med,

2011. 62: p. 361-80.

51. Vaarala, O., The gut as a regulator of early inflammation in type 1 diabetes.

Curr Opin Endocrinol Diabetes Obes, 2011. 18(4): p. 241-7.

52. Giongo, A., et al., Toward defining the autoimmune microbiome for type 1

diabetes. ISME J, 2011. 5(1): p. 82-91.

53. Larsen, N., et al., Gut microbiota in human adults with type 2 diabetes differs

from non-diabetic adults. PLoS One, 2010. 5(2): p. e9085.

54. Wu, X., et al., Molecular characterisation of the faecal microbiota in patients

with type II diabetes. Curr Microbiol, 2010. 61(1): p. 69-78.

55. Wing, M.R., et al., Gut microbiome in chronic kidney disease. Exp Physiol,

2016. 101(4): p. 471-7.

56. Ley, R.E., et al., Microbial ecology: human gut microbes associated with

obesity. Nature, 2006. 444(7122): p. 1022-3.

57. Turnbaugh, P.J., et al., A core gut microbiome in obese and lean twins. Nature,

2009. 457(7228): p. 480-4.

58. de Jongh, G.J., et al., High expression levels of keratinocyte antimicrobial

proteins in psoriasis compared with atopic dermatitis. J Invest Dermatol, 2005.

125(6): p. 1163-73.

59. Harder, J., et al., Enhanced expression and secretion of antimicrobial peptides

in atopic dermatitis and after superficial skin injury. J Invest Dermatol, 2010.

130(5): p. 1355-64.

60. Gao, Z., et al., Substantial Alterations of the Cutaneous Bacterial Biota in

Psoriatic Lesions. PLOS ONE, 2008. 3(7): p. e2719.

61. Gudjonsson, J.E., et al., Global gene expression analysis reveals evidence for

decreased lipid biosynthesis and increased innate immunity in uninvolved

psoriatic skin. J Invest Dermatol, 2009. 129(12): p. 2795-804.

62. Jugeau, S., et al., Induction of toll-like receptors by Propionibacterium acnes.

Br J Dermatol, 2005. 153(6): p. 1105-13.

63. Dessinioti, C. and A.D. Katsambas, The role of Propionibacterium acnes in

acne pathogenesis: facts and controversies. Clin Dermatol, 2010. 28(1): p. 2-7.

64. Grice, E.A. and J.A. Segre, The skin microbiome. Nat Rev Microbiol, 2011.

9(4): p. 244-53.

65. Holmes, A.D., Potential role of microorganisms in the pathogenesis of rosacea.

J Am Acad Dermatol, 2013. 69(6): p. 1025-32.

66. Whitfeld, M., et al., Staphylococcus epidermidis: a possible role in the pustules

of rosacea. J Am Acad Dermatol, 2011. 64(1): p. 49-52.

67. Martin, D.H. and J.M. Marrazzo, The Vaginal Microbiome: Current

Understanding and Future Directions. J Infect Dis, 2016. 214 Suppl 1: p. S36-

41.

68. Zhang, Y., X. Wang, H. Li, C. Ni, X. Du, and F. Yan, Human oral microbiota and its

modulation for oral health. Biomedicine & Pharmacotherapy, 2018. 99: p. 883-

893.

69. Grant, M.M. and D. Jonsson, Next Generation Sequencing Discoveries of the

Nitrate-Responsive Oral Microbiome and Its Effect on Vascular Responses. J

Clin Med, 2019. 8(8).

70. Sampaio-Maia, B., I.M. Caldas, M.L. Pereira, D. Pérez-Mongiovi, and R. Araujo,

Chapter Four - The Oral Microbiome in Health and Its Implication in Oral and

Systemic Diseases. Advances in Applied Microbiology, 2016. 97: p. 171-210.

71. Peters, B.A., et al., Oral Microbiome Composition Reflects Prospective Risk

for Esophageal Cancers. Cancer Res, 2017. 77(23): p. 6777-6787.

72. Gao, S.G., et al., Preoperative serum immunoglobulin G and A antibodies to

Porphyromonas gingivalis are potential serum biomarkers for the diagnosis and

prognosis of esophageal squamous cell carcinoma. BMC Cancer, 2018. 18(1):

p. 17.

73. Ertz-Archambault, N., P. Keim, and D. Von Hoff, Microbiome and pancreatic

cancer: A comprehensive topic review of literature. World J Gastroenterol,

2017. 23(10): p. 1899-1908.

74. Flemer, B., et al., The oral microbiota in colorectal cancer is distinctive and

predictive. Gut, 2018. 67(8): p. 1454-1463.

75. Cotran, R., et al., Robbins Pathologic Basis of Disease. 1999, Philadelphia:

Saunders.

76. Dickson, R.P., et al., The Microbiome and the Respiratory Tract. Annu Rev

Physiol, 2016. 78: p. 481-504.

77. Morris, A., et al., Comparison of the respiratory microbiome in healthy

nonsmokers and smokers. Am J Respir Crit Care Med, 2013. 187(10): p. 1067-

75.

78. Segal, L.N., et al., Enrichment of lung microbiome with

supraglottic taxa is associated with increased pulmonary inflammation.

Microbiome, 2013. 1(19): p. 1-12.

79. Willner, D., et al., Metagenomic analysis of respiratory tract DNA viral

communities in cystic fibrosis and non-cystic fibrosis individuals. PLoS One,

2009. 4(10): p. e7370.

80. Bosch, A.A., et al., Viral and bacterial interactions in the upper respiratory

tract. PLoS Pathog, 2013. 9(1): p. e1003057.

81. Murphy, T.F., L.O. Bakaletz, and P.R. Smeesters, Microbial interactions in the

respiratory tract. Pediatr Infect Dis J, 2009. 28(10 Suppl): p. S121-6.

82. Papi, A., et al., Infections and airway inflammation in chronic obstructive

pulmonary disease severe exacerbations. Am J Respir Crit Care Med, 2006.

173(10): p. 1114-21.

83. Rohde, G., et al., Respiratory viruses in exacerbations of chronic obstructive

pulmonary disease requiring hospitalisation: a case-control study. Thorax,

2003. 58(1): p. 37-42.

84. Kulczycki, L.L., T.M. Murphy, and J.A. Bellanti, Pseudomonas colonization in

cystic fibrosis. A study of 160 patients. Jama, 1978. 240(1): p. 30-4.

85. Green, B.J., et al., Potentially pathogenic airway bacteria and neutrophilic

inflammation in treatment resistant severe asthma. PLoS One, 2014. 9(6): p.

e100645.

86. Johnston, N.W., et al., The September epidemic of asthma exacerbations in

children: a search for etiology. Journal of Allergy and Clinical Immunology,

2005. 115(1): p. 132-138.

87. Michaels, R.H. and R.L. Myerowitz, Viral enhancement of nasal colonization

with Haemophilus influenzae type b in the infant rat. Pediatric research, 1983.

17(6): p. 472-473.

88. Moore, H.C., et al., The interaction between respiratory viruses and pathogenic

bacteria in the upper respiratory tract of asymptomatic Aboriginal and non-

Aboriginal children. Pediatr Infect Dis J, 2010. 29(6): p. 540-5.

89. McCullers, J.A. and J.E. Rehg, Lethal synergism between influenza virus and

Streptococcus pneumoniae: characterization of a mouse model and the role of

platelet-activating factor receptor. The Journal of infectious diseases, 2002.

186(3): p. 341-350.

90. Lee, L.N., et al., A mouse model of lethal synergism between influenza virus

and Haemophilus influenzae. The American journal of pathology, 2010.

176(2): p. 800-811.

91. Iverson, A.R., et al., Influenza virus primes mice for pneumonia from

Staphylococcus aureus. Journal of Infectious Diseases, 2011. 203(6): p. 880-

888.

92. Wiertsema, S.P., et al., High detection rates of nucleic acids of a wide range of

respiratory viruses in the nasopharynx and the middle ear of children with a

history of recurrent acute otitis media. Journal of medical virology, 2011.

83(11): p. 2008-2017.

93. Kukavica-Ibrulj, I., et al., infection predisposes to

severe pneumococcal pneumonia in mice. Journal of Virology, 2008.

94. McGillivary, G., et al., Respiratory syncytial virus-induced dysregulation of

expression of a mucosal β-defensin augments colonization of the upper airway

by non-typeable Haemophilus influenzae. Cellular microbiology, 2009. 11(9):

p. 1399-1408.

95. Stark, J.M., et al., Decreased bacterial clearance from the lungs of mice

following primary respiratory syncytial virus infection. Journal of medical

virology, 2006. 78(6): p. 829-838.

96. McCullers, J.A., Insights into the interaction between influenza virus and

pneumococcus. Clin Microbiol Rev, 2006. 19(3): p. 571-82.

97. Adler, K.B., D.D. Hendley, and G.S. Davis, Bacteria associated with

obstructive pulmonary disease elaborate extracellular products that stimulate

mucin secretion by explants of guinea pig airways. Am J Pathol, 1986. 125(3):

p. 501-14.

98. Wilson, R., D. Roberts, and P. Cole, Effect of bacterial products on human ciliary function in

vitro. Thorax, 1985. 40(40): p. 125-131.

99. Glendinning, L., G. McLachlan, and L. Vervelde, Age-related differences in

the respiratory microbiota of chickens. PLoS One, 2017. 12(11): p. e0188455.

100. Shabbir, M.Z., et al., Microbial communities present in the lower respiratory

tract of clinically healthy birds in Pakistan. Poult Sci, 2015. 94(4): p. 612-20.

101. Roussan, D.A., et al., Simultaneous detection of astrovirus, rotavirus, reovirus

and adenovirus type I in broiler chicken flocks. Pol J Vet Sci, 2012. 15(2): p.

337-44.

102. USDA Animal and Plant Health Inspection Service, Respiratory Disease on Breeder-Chicken Farms in the

United States. Technical Brief 2012; Available from:

https://www.aphis.usda.gov/aphis/home.

103. Ramos, S., M. MacLachlan, and A. Melton, Impacts of the 2014-2015 Highly Pathogenic

Avian Influenza Outbreak on the U.S. Poultry Sector. USDA, Economic

Research Service, 2015.

104. USDA National Agricultural Statistics Service, Poultry - Production and Value 2018 Summary. 2019;

Available from:

https://www.nass.usda.gov/Publications/Todays_Reports/reports/plva0519.pdf.

105. Chand, N., et al., Performance traits and immune response of broiler chicks

treated with zinc and ascorbic acid supplementation during cyclic heat stress.

Int J Biometeorol., 2014. 58(10): p. 2153-2157.

106. David, B., et al., Air Quality in Alternative Housing Systems May Have an

Impact on Laying Hen Welfare. Part I-Dust. Animals, 2015. 5(3): p. 495-511.

107. David, B., et al., Air Quality in Alternative Housing Systems may have an

Impact on Laying Hen Welfare. Part II-Ammonia. Animals, 2015. 5(3): p. 886-

896.

108. Ganapathy, K., R.C. Jones, and J.M. Bradbury, Pathogenicity of in vivo-

passaged Mycoplasma imitans in turkey poults in single infection and in dual

infection with rhinotracheitis virus. Avian Pathology, 1998. 27(1): p. 80-89.

109. Saif, Y.M., P.D. Moorhead, and E.H. Bohl, Mycoplasma meleagridis and

Escherichia coli infections in germfree and specific-pathogen-free turkey

poults: production of complicated airsacculitis. Am J Vet Res, 1970. 31(9): p.

1637-43.

110. Kato, K., Infectious coryza of chickens. V. Influence of Mycoplasma

gallisepticum infection on chicken infected with Haemophilus gallinarum. Natl

Inst Anim Health Q (Tokyo), 1965. 5(4): p. 183-9.

111. Karimi-Madab, M., et al., Risk factors for detection of bronchial casts, most

frequently seen in endemic H9N2 avian influenza infection, in poultry flocks

in Iran. Prev Vet Med, 2010. 95(3-4): p. 275-80.

112. Travers, A.F., Concomitant Ornithobacterium rhinotracheale and Newcastle

disease infection in broilers in South Africa. Avian Dis, 1996. 40(2): p. 488-90.

113. Okoye, J.O., C.N. Okeke, and F.K. Ezeobele, Effect of infectious bursal

disease virus infection on the severity of Aspergillus flavus aspergillosis of

chickens. Avian Pathol, 1991. 20(1): p. 167-71.

114. Omuro, M., et al., Interaction of Mycoplasma gallisepticum, mild strains of

Newcastle disease virus and infectious bronchitis virus in chickens. Natl Inst

Anim Health Q (Tokyo), 1971. 11(2): p. 83-93.

115. Kishida, N., et al., Co-infection of Staphylococcus aureus or Haemophilus

paragallinarum exacerbates H9N2 influenza A virus infection in chickens.

Arch Virol, 2004. 149(11): p. 2095-104.

116. Kleven, S.H., C.S. Eidson, and O.J. Fletcher, Airsacculitis induced in broilers

with a combination of Mycoplasma gallinarum and respiratory viruses. Avian

Dis, 1978. 22(4): p. 707-16.

117. Hopkins, S.R. and H.W. Yoder, Jr., Increased incidence of airsacculitis in

broilers infected with mycoplasma synoviae and chicken-passaged infectious

bronchitis vaccine virus. Avian Dis, 1984. 28(2): p. 386-96.

118. Springer, W.T., C. Luskus, and S.S. Pourciau, Infectious bronchitis and mixed

infections of Mycoplasma synoviae and Escherichia coli in gnotobiotic

chickens. I. Synergistic role in the airsacculitis syndrome. Infect Immun, 1974.

10(3): p. 578-89.

119. Gross, W.B., Factors affecting the development of respiratory disease complex

in chickens. Avian Dis, 1990. 34(3): p. 607-10.

120. Weiss, R.A., Robert Koch: the grandfather of cloning? Cell, 2005. 123(4): p.

539-42.

121. Laupland, K.B. and L. Valiquette, The changing culture of the microbiology

laboratory. Can J Infect Dis Med Microbiol, 2013. 24(3): p. 125-8.

122. Stewart, E.J., Growing unculturable bacteria. J Bacteriol, 2012. 194(16): p.

4151-60.

123. Gall, J.G. and M.L. Pardue, Formation and detection of RNA-DNA hybrid

molecules in cytological preparations. Genetics, 1969. 63: p. 378-383.

124. Fischer, S.G. and L.S. Lerman, DNA fragments differing by single base-pair

substitutions are separated in denaturing gradient gels: correspondence with

melting theory. Proc Natl Acad Sci U S A, 1983. 80(6): p. 1579-83.

125. Borneman, J. and E.W. Triplett, Molecular Microbial Diversity in Soils from Eastern

Amazonia: Evidence for Unusual Microorganisms and Microbial Population

Shifts Associated with Deforestation. Appl Environ Microbiol, 1997. 63(7): p.

2647-2653.

126. Liu, W.T., T.L. Marsh, H. Cheng, and L.J. Forney, Characterization of Microbial Diversity by

Determining Terminal Restriction Fragment Length Polymorphisms of Genes

Encoding 16S rRNA. Appl Environ Microbiol, 1997. 63(11): p. 4516-4522.

127. Sanger, F., S. Nicklen, and A.R. Coulson, DNA sequencing with

chain-terminating inhibitors. Proc Natl Acad Sci U S A, 1977. 74(12): p. 5463-5467.

128. Sanger, F., G.M. Air, B.G. Barrell, N.L. Brown, A.R. Coulson, C.A. Fiddes,

C.A. Hutchison, P.M. Slocombe, and M. Smith, Nucleotide sequence of

bacteriophage phi X174 DNA. Nature, 1977. 265: p. 687-695.

129. Margulies, M., et al., Genome sequencing in microfabricated high-density

picolitre reactors. Nature, 2005. 437(7057): p. 376-80.

130. Quail, M.A., et al., A large genome center's improvements to the Illumina

sequencing system. Nat Methods, 2008. 5(12): p. 1005-10.

131. Shendure, J., et al., Accurate Multiplex Polony Sequencing of an Evolved

Bacterial Genome. Science, 2005. 309(5741): p. 1728.

132. Gilles, A., et al., Accuracy and quality assessment of 454 GS-FLX Titanium

pyrosequencing. BMC Genomics, 2011. 12: p. 245.

133. Ross, M.G., et al., Characterizing and measuring bias in sequence data.

Genome Biol, 2013. 14(5): p. R51.

134. Glenn, T.C., Field guide to next-generation DNA sequencers. Mol Ecol

Resour, 2011. 11(5): p. 759-69.

135. Laver, T., et al., Assessing the performance of the Oxford Nanopore

Technologies MinION. Biomol Detect Quantif, 2015. 3: p. 1-8.

136. Koren, S., et al., Hybrid error correction and de novo assembly of single-

molecule sequencing reads. Nat Biotechnol, 2012. 30(7): p. 693-700.

137. Wetterstrand, K.A., DNA Sequencing Costs: Data from the NHGRI Genome

Sequencing Program. 2019.

138. Mardis, E.R., A decade’s perspective on DNA sequencing technology. Nature,

2011. 470(7333): p. 198-203.

139. Moore, G.E., Cramming more components onto integrated circuits. Electronics,

1965. 38(8): p. 1-4.

140. Lane, D.J., B. Pace, G.J. Olsen, D.A. Stahl, M.L. Sogin, and N.R. Pace, Rapid

determination of 16S ribosomal RNA sequences for phylogenetic analyses.

Proc Natl Acad Sci USA, 1985. 82: p. 6955-6959.

141. Schloss P, Westcott S, Ryabin T, Hall J, Hartman M, Hollister E, Lesniewski R,

Oakley B, Parks D, Robinson C, Sahl J, Stres B, Thallinger G, Van Horn D,

Weber C. , Introducing mothur: Open-source, platform-independent,

community-supported software for describing and comparing microbial

communities. Appl Enviro Microbiol, 2009. 75: p. 7537-7541.

142. Caporaso J, Kuczynski J, Stombaugh J, Bittinger K, Bushman F, Costello E, Fierer N,

Peña A, Goodrich J, Gordon J, Huttley G, Kelley ST, Knights D, Koenig JE,

Ley R, Lozupone C, McDonald D, Muegge B, Pirrung M, Reeder J, Sevinsky

JR, Turnbaugh P, Walters W, Widmann J, Yatsunenko T, Zaneveld J, Knight

R., QIIME allows analysis of high-throughput community sequencing data.

Nature Methods, 2010. 7: p. 335-336.

143. Meyer F, Paarmann D, D’Souza M, Olson R, Glass E, Kubal M, Paczian T, Rodriguez

A, Stevens R, Wilke A, Wilkening J, Edwards R., The metagenomics RAST

server- a public resource for the automatic phylogenetic and functional

analysis of metagenomes. BMC Bioinformatics, 2008. 9: p. 386.

144. DeSantis, T.Z., et al., Greengenes, a chimera-checked 16S rRNA gene

database and workbench compatible with ARB. Appl Environ Microbiol,

2006. 72(7): p. 5069-72.

145. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glockner F.,

The SILVA ribosomal RNA gene database project: improved data processing

and web-based tools. Nucl Acids Res., 2013. 41: p. 590-596.

146. Kõljalg U, Nilsson R, Abarenkov K, Tedersoo L, Taylor A, Bahram M, Bates S,

Bruns T, Bengtsson-Palme J, Callaghan T, Douglas B, Drenkhan T, Eberhardt

U, Dueñas M, Grebenc T, Griffith G, Hartmann M, Kirk P, Kohout P, Larsson

E, Lindahl B, Lücking R, Martín M, Matheny P, Nguyen N, Niskanen T, Oja J,

Peay K, Peintner U, Peterson M, Põldmaa K, Saag L, Saar I, Schüßler A, Scott

J, Senés C, Smith M, Suija A, Taylor D, Telleria M, Weiss M, Larsson K.,

Towards a unified paradigm for sequence-based identification of fungi. Mol

Ecol., 2013. 22: p. 5271-5277.

147. Boers, S.A., R. Jansen, and J.P. Hays, Understanding and overcoming the

pitfalls and biases of next-generation sequencing (NGS) methods for use in the

routine clinical microbiological diagnostic laboratory. Eur J Clin Microbiol

Infect Dis, 2019. 38(6): p. 1059-1070.

148. Wood, D.E. and S.L. Salzberg, Kraken: ultrafast metagenomic sequence classification

using exact alignments. Genome Biology, 2014. 15: p. R46.

149. Li, H. and R. Durbin, Fast and accurate long-read alignment with Burrows-Wheeler

Transform. Bioinformatics, 2009. EPub.

150. Langmead, B. and S.L. Salzberg, Fast gapped-read alignment with Bowtie 2.

Nat Methods, 2012. 9(4): p. 357-9.

151. Li, D., et al., MEGAHIT: an ultra-fast single-node solution for large and

complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics,

2015. 31(10): p. 1674-6.

152. Nurk, S., et al., metaSPAdes: a new versatile metagenomic assembler. Genome

Res, 2017. 27(5): p. 824-834.

153. Namiki, T., et al., MetaVelvet: an extension of Velvet assembler to de novo

metagenome assembly from short sequence reads. Nucleic Acids Res, 2012.

40(20): p. e155.

154. Altschul, S.F., et al., Basic local alignment search tool. J Mol Biol., 1990.

215(3): p. 403-410.

155. ICTV. The International Committee for the Taxonomy of Viruses Species List.

2018; Available from: https://www.ictvonline.org/files/master-species-lists.

156. Griffiths, D.J., Endogenous retroviruses in the human genome sequence.

Genome biology, 2001. 2(6): p. reviews1017. 1.

157. Feschotte, C. and C. Gilbert, Endogenous viruses: insights into viral evolution

and impact on host biology. Nature Reviews Genetics, 2012. 13(4): p. 283-

296.

158. Holmes, E.C., The evolution of endogenous viral elements. Cell host &

microbe, 2011. 10(4): p. 368-377.

159. Patel, M.R., M. Emerman, and H.S. Malik, Paleovirology—ghosts and gifts of

viruses past. Current opinion in virology, 2011. 1(4): p. 304-309.

160. Bustamante Rivera, Y.Y., et al., Endogenous Retrovirus 3–History,

Physiology, and Pathology. Frontiers in microbiology, 2018. 8: p. 2691.

161. Jern, P. and J.M. Coffin, Effects of retroviruses on host genome function.

Annual review of genetics, 2008. 42: p. 709-732.

162. Rose, R., et al., Challenges in the analysis of viral metagenomes. Virus Evol,

2016. 2(2): p. vew022.

163. Moustafa, A., et al., The blood DNA virome in 8,000 humans. PLoS Pathog,

2017. 13(3): p. e1006292.

164. Angly, F.E., et al., The GAAS metagenomic tool and its estimations of viral

and microbial average genome size in four major biomes. PLoS computational

biology, 2009. 5(12).

165. Wommack, K.E., et al., VIROME: a standard operating procedure for analysis

of viral metagenome sequences. Stand Genomic Sci, 2012. 6(3): p. 427-39.

166. Zhao, G., et al., VirusSeeker, a computational pipeline for virus discovery and

virome composition analysis. Virology, 2017. 503: p. 21-30.

167. Zhao, G., et al., Identification of novel viruses using VirusHunter--an

automated data analysis pipeline. PLoS One, 2013. 8(10): p. e78470.

Chapter 2

BIOMESEQ: A TOOL FOR THE CHARACTERIZATION OF ANIMAL MICROBIOMES FROM METAGENOMIC DATA

2.1 Summary

The complete characterization of a microbiome is critical in elucidating the complex ecology of the microbial composition within healthy and diseased animals.

Many microbiome studies characterize only the bacterial component, for which there are several well-developed sequencing methods, bioinformatics tools and databases available. The lack of comprehensive bioinformatics workflows and databases has limited efforts to characterize the other components of a microbiome.

BiomeSeq is a tool for the analysis of the complete animal microbiome using metagenomic sequencing data. With its comprehensive workflow, customizable parameters and microbial databases, BiomeSeq can rapidly quantify the viral, fungal, bacteriophage and bacterial components of a sample and produce informative tables for analysis. Several performance metrics were evaluated, and BiomeSeq displayed a strong positive correlation with known abundance values and exhibited high sensitivity and precision. BiomeSeq was employed to detect and quantify the respiratory microbiome of a commercial poultry broiler flock throughout its grow-out cycle, from hatching to processing, and successfully processed 780 million reads. For each microbial species detected, BiomeSeq calculated the normalized abundance, percent relative abundance, and coverage, as well as the diversity of each sample. The processing speed of each step in the BiomeSeq workflow, along with precision and accuracy, was calculated using in silico sequencing datasets to examine BiomeSeq’s performance. BiomeSeq demonstrated high precision (average of 99.52%) and sensitivity (average of 93.01%).
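As a point of reference, precision and sensitivity are conventionally computed from counts of true positives, false positives and false negatives. The sketch below assumes those standard definitions and compares the set of taxa detected in a simulated sample with the set known to be present. The function name and the example taxa are illustrative assumptions and are not taken from BiomeSeq.

```python
def precision_sensitivity(detected, expected):
    """Return (precision, sensitivity) as percentages for two collections of taxa."""
    detected, expected = set(detected), set(expected)
    true_pos = len(detected & expected)    # present and detected
    false_pos = len(detected - expected)   # detected but not present
    false_neg = len(expected - detected)   # present but missed
    precision = 100.0 * true_pos / (true_pos + false_pos) if detected else 0.0
    sensitivity = 100.0 * true_pos / (true_pos + false_neg) if expected else 0.0
    return precision, sensitivity


# Hypothetical in silico sample: four taxa are present, one is missed,
# and one spurious taxon is reported.
expected = {"virus_A", "virus_B", "virus_C", "virus_D"}
detected = {"virus_A", "virus_B", "virus_C", "virus_X"}
precision, sensitivity = precision_sensitivity(detected, expected)
```

With three true positives, one false positive and one false negative, both values in this toy example are 75%.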

When compared to bacterial results generated by the commonly used 16S rRNA sequencing method, BiomeSeq detected the same most abundant bacteria as well as several additional species. BiomeSeq provides for the detection and quantification of the microbiome from next-generation metagenomic sequencing data. The tool is implemented in a user-friendly container that requires a single command and generates a table of taxonomic information for each microbe detected. It also reports normalized abundance, percent relative abundance, genome coverage and sample diversity.

2.2 Introduction

Specific and unique animal microbiomes contribute to the biological function of various locations on the body, including the intestinal tract, skin, vaginal tract, oral cavity, and respiratory tract [1]. Disturbance of these environments through colonization by a new bacterium, eukaryotic virus, or fungus can lead to competition, invasion and replacement. Under appropriate conditions this may result in disease. Advancements in next-generation sequencing technology enable investigations into individual components of the microbiome, thereby providing insight into the dynamic interactions taking place [2]. Identification of microbial communities within these environments can aid in elucidating the role they play in both healthy and diseased animals.

Recent studies attempting to characterize the microbiomes of animals have focused primarily on their bacterial composition, as there are well-established methodological approaches to sequence and analyze this component [3-8]. The 16S rRNA gene is commonly used to identify and compare the bacterial genera present in a given sample. Accessible bacterial databases, such as Greengenes [9] and SILVA [10], in addition to well-developed bioinformatics workflows, are available to facilitate these analyses [11-13]. The internal transcribed spacer (ITS) region is a widely used fungal genetic marker. As with 16S rRNA, accessible fungal databases [14] and bioinformatics workflows for fungal analysis exist [12].

Characterizing the viral component of the microbiome presents unique challenges. Unlike bacteria and fungi, which carry conserved ribosomal genes, viruses are heterogeneous in their genetic content and therefore lack a conserved genomic region that can be sequenced and employed for taxonomic classification using the same approaches [15]. Metagenomic shotgun sequencing does not use PCR and is therefore not restricted by primers that target specific gene sequences. As a result, this method is not limited to detecting one specific kingdom and is sensitive enough to classify organisms at the species level. Using this approach, the major components of a microbiome can be identified. Many studies attempting to characterize microorganisms using metagenomic sequencing data rely on adapting a sequence-similarity independent assembly approach and computationally exhaustive BLAST-like database searches. This can be attributed to the limited number of comprehensive microbial databases that exist. Thus, this approach provides taxonomic classification of samples, but lacks the ability to accurately quantify abundance and diversity. Furthermore, many of the available computational tools require the user to possess extensive command-line knowledge and computational resources to successfully install and run the programs and their dependencies.

Herein, we present BiomeSeq, a tool for the analysis of complete animal microbiomes from metagenomic sequencing data. BiomeSeq addresses the constraints of current computational tools by providing a comprehensive workflow and corresponding microbial databases that accurately identify and quantify each major component of the microbiome. The workflow includes quality filtering and host decontamination, sequence-similarity dependent alignment to microbial reference genome databases, and quantification of microbial abundance and sample diversity. BiomeSeq also analyzes the eukaryotic viral, fungal, bacteriophage and bacterial components using the same sequencing data to produce a complete analysis of the microbiome without requiring additional sequencing of the 16S rRNA or ITS genes. Utilizing shotgun metagenomic data to analyze the bacterial and fungal components can increase taxonomic resolution, permit the analysis of complete genomes instead of a conserved genomic region, and allow for a comparison of bacteria and fungi to the viral and bacteriophage components [16]. BiomeSeq was evaluated using simulated datasets designed to mimic complex microbial communities and performed with exceptional accuracy and precision. BiomeSeq was also employed to characterize the respiratory microbiome of a healthy broiler flock. The results obtained using BiomeSeq were compared to those of the 16S rRNA approach: BiomeSeq identified 533 unique bacterial genera, compared to 24 detected by 16S rRNA sequencing. In addition to characterizing all microbial components from the same sample, BiomeSeq is also able to discriminate at a higher taxonomic resolution. BiomeSeq is available as an open-source and user-friendly container. This versatility allows BiomeSeq to be accessible to users with varied degrees of command-line knowledge and computational resources. While BiomeSeq has been developed and evaluated on avian species, it can be used to characterize microbiomes of a variety of species, including humans.

2.3 Results

2.3.1 Design and Development of BiomeSeq

BiomeSeq was designed to identify microbial communities within next generation sequencing files in single- and paired-end format. Figure 2.1 shows an overview of the BiomeSeq workflow. In summary, the workflow begins with a quality and decontamination step in which all adapter sequences, short reads and low-quality reads are first extracted from the sequencing files provided by the user. The trimmed


reads are then aligned to the host reference genome specified by the user. Host DNA is extracted from the file to increase analytical efficiency and mapping accuracy [17].

The remaining reads are then aligned to BiomeSeq’s microbial databases, which comprise eukaryotic viral, fungal, bacterial and bacteriophage genome databases containing sequences obtained from the NCBI RefSeq database. These databases are publicly available [18]. BiomeSeq also accepts custom databases provided by the user, which adds to its versatility. Several additional parameters can be specified by the user, including the host reference genome, the mapping quality threshold and the output files (i.e., alignment files). Table 2.1 lists all software and parameters used by BiomeSeq. Following the alignment of the decontaminated reads to the microbial databases, BiomeSeq calculates normalized abundance, percent relative abundance and genome coverage for each eukaryotic virus, bacterium, bacteriophage and fungus detected. It also calculates the diversity of the entire sample.

BiomeSeq generates a table consisting of the NCBI RefSeq accession number, microbe name, taxonomy and number of mapped reads for each detected microbe, together with all calculations. For each sample processed by BiomeSeq, four tables are generated, one for each of the four components. Table 2.2 shows an example of an output table for the viral component. Similar tables are generated for bacterial, bacteriophage and fungal data. The BiomeSeq workflow and associated databases were implemented into a software package and a container

(Figure S1). BiomeSeq is currently available as an open-source and user-friendly resource on Docker Hub. The self-contained environment simplifies installation and


execution by eliminating the need for downloading and installing the BiomeSeq program, databases and all dependent software. Furthermore, the BiomeSeq container allows the same customizable parameters and accepts custom databases provided by the user.

2.3.2 Validation of BiomeSeq

BiomeSeq’s performance was evaluated using simulated datasets consisting of known microorganisms and their corresponding abundances. Four simulated datasets were created to closely mimic the complex community structure of an avian respiratory microbiome. Each dataset was generated using genome sequences from 20 microorganisms that have been experimentally detected in the respiratory tract of poultry broilers (Table S1). A sequence from a poultry broiler was also included to represent the host environment. The simulated datasets contained an average of 24,523,032 total raw reads (Table S2), which were processed using BiomeSeq. The reads were first trimmed for quality and decontaminated of host DNA. Table S2 shows the number of reads that were removed during each of these processing steps.

An average of 24,522,253 reads remained after quality trimming, which removed all adapter sequences, sequences less than 100 base pairs in length and sequences with a Phred quality score under 30 (Table S2). An average of 5,158,715 reads remained following decontamination of chicken genomic sequences. The remaining reads were then aligned to four microbial databases including bacteriophage, bacteria, fungi and avian derived virus genomes


with a mapping quality threshold of 20. One major feature that makes BiomeSeq versatile is its ability to accept custom databases provided by the user. To evaluate this feature, an avian-specific viral database was constructed to replace BiomeSeq’s default viral database. An average of 90.8% of the reads were aligned to microbial genome sequences (Table S2). From the number of mapped reads, BiomeSeq then calculates the normalized abundance, relative abundance, genome coverage and diversity of the sample and generates a table for each of the four microbiome components. From the information provided by the BiomeSeq tables, several metrics were used to evaluate BiomeSeq’s overall performance, including correlation with known abundance, sensitivity, precision, rate of speed and root mean square error.

A total of twenty microbial genomes were included in the four simulated datasets and BiomeSeq was able to successfully identify each. From the number of known mapped reads and the number of reads BiomeSeq mapped, percent relative abundance was calculated. Figure 2.2 shows both known and predicted percent relative abundance of one of the simulated datasets. Of the known abundances, the most abundant fungus is Aspergillus oryzae (73.34%), the most abundant bacterium is Escherichia coli (10.87%), the most abundant eukaryotic virus is Gallid herpesvirus 2 (1.06%) and the most abundant bacteriophage is Enterobacteriophage T4 (0.33%).

The most abundant fungus detected by BiomeSeq is also Aspergillus oryzae (78.03%), the most abundant bacterium is also Escherichia coli (7.36%), the most abundant eukaryotic virus is Gallid herpesvirus 1 (0.27%) and the most abundant bacteriophage is also Enterobacteriophage T4 (0.09%). Pearson correlation coefficients between


predicted and known abundances were calculated at the species level. Abundances of species determined by BiomeSeq were highly correlated with known abundances demonstrating an average correlation coefficient of r = 0.997 for all four datasets.
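The species-level comparison described above can be sketched in a few lines of Python. This is a minimal illustration of Pearson's correlation coefficient, not BiomeSeq's own code; the function name and the abundance values are hypothetical placeholders rather than values from the simulated datasets.

```python
import math

def pearson_r(known, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(known) != len(predicted) or len(known) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_k = sum(known) / len(known)
    mean_p = sum(predicted) / len(predicted)
    # Covariance and variances are left unnormalized; the factors cancel in r.
    cov = sum((k - mean_k) * (p - mean_p) for k, p in zip(known, predicted))
    var_k = sum((k - mean_k) ** 2 for k in known)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_k == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_k * var_p)

# Placeholder percent relative abundances for five species (illustrative only).
known = [70.0, 12.0, 10.0, 5.0, 3.0]
predicted = [74.0, 9.0, 9.5, 4.5, 3.0]
r = pearson_r(known, predicted)
print(f"r = {r:.3f}")
```

A value of r close to 1 indicates that the predicted abundances rise and fall with the known abundances, which is the property reported for the four simulated datasets.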

The precision and sensitivity of BiomeSeq were evaluated using the same datasets. True positives, true negatives, false positives, sensitivity and precision were calculated for each microbial component (Table S3). Overall, 4,659,277 true positives, 22,397 false positives, and 350,271 false negatives were observed. Sensitivity describes the number of reads correctly aligned to the appropriate genome divided by the total number of sequences in the sample. Precision is the number of reads that were aligned to the appropriate genome divided by the total number of reads mapped to any genome. Using default parameters, BiomeSeq demonstrated exceptional accuracy, with 99.52% precision and 93.01% sensitivity (Table S3).
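As a quick check of the reported percentages, the following Python sketch applies the sensitivity and precision definitions given in Section 2.5.4 to the overall read counts stated above. The function names are illustrative and are not part of BiomeSeq.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of reads that should have mapped and were mapped correctly."""
    return true_pos / (true_pos + false_neg)

def precision(true_pos, false_pos):
    """Fraction of mapped reads that were mapped correctly."""
    return true_pos / (true_pos + false_pos)

# Overall counts reported for the four simulated datasets.
tp, fp, fn = 4_659_277, 22_397, 350_271

print(f"precision:   {100 * precision(tp, fp):.2f}%")
print(f"sensitivity: {100 * sensitivity(tp, fn):.2f}%")
```

With these counts the sketch gives 99.52% precision and 93.01% sensitivity, matching the values in Table S3 as quoted in the text.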

The rate of speed during each step of BiomeSeq was calculated for the four simulated datasets (Figure 2.3; Table S4). Rate of speed of BiomeSeq is contingent upon the number of computational cores, amount of computational memory and the size of the dataset and host reference genome. The four simulated datasets were processed on a server with 98 GB RAM and 4 CPU cores. The quality step, in which adapter sequences, reads less than 100 base pairs in length and low quality reads are trimmed from the input sequencing file, was measured at an average speed of 79,977 (± 9,204) reads per second (Figure 2.3; Table S4). The decontamination step had an average speed of 6,327 (± 473) reads per second (Figure 2.3; Table S4). During this step, the host reference genome is indexed; the larger the host genome is the longer


this step will take. The Gallus gallus genome (Annotation Release 104), used in this evaluation, is about 1.2 billion base pairs in length [19]. After the genome is indexed, the trimmed reads are aligned to the host reference genome and reads that map are removed from the file. Alignment of reads to microbial databases was measured at an average speed of 2,421 (± 174) reads per second (Figure 2.3; Table S4). During this step, the reads remaining after decontamination, an average of 5,158,715 for the four simulated datasets, are aligned to a total of 7,227 microbial genomes of various sizes. Finally, the quantification step, in which the normalized relative abundance, percent relative abundance, genome coverage and diversity are calculated from the reads that aligned to the microbial sequences, had an average speed of 183,264 (± 31,244) reads per second (Figure 2.3; Table S4).

Root mean square error (RMSE) measures the amount of error between the known abundances of each species and the abundances determined by BiomeSeq (Figure 2.4). A small RMSE value indicates that the abundance determined by BiomeSeq is close to the known abundances in the simulated dataset. RMSE was calculated for each eukaryotic virus, bacteria, bacteriophage and fungi species (Figure 2.4). An RMSE of < 4.70 was exhibited for all species and 17 species exhibited an RMSE value of < 0.24. These results further indicate that BiomeSeq can accurately determine microbial abundance at the species taxonomic level.


2.3.3 A Longitudinal Study of the Microbial Ecology of a Healthy Broiler Flock

BiomeSeq was employed to detect and quantify eukaryotic viruses, bacteria, bacteriophage, and fungi in a healthy commercial broiler flock during the grow-out cycle from hatching to processing. Samples were collected from the respiratory tract of a healthy broiler flock weekly as the flock aged (Day 1 – Day 49). DNA and RNA were isolated and sequenced using an Illumina NGS platform. A total of 780 million reads were generated and successfully processed using BiomeSeq. These reads were first trimmed for quality, decontaminated of host DNA and aligned to each microbial genome database. The default viral genome database provided by BiomeSeq was replaced by a custom database containing avian-derived viral sequences (Table S5).

For each microorganism identified, BiomeSeq calculated normalized abundance, percent relative abundance, genome coverage and sample diversity. The taxonomic and quantitative data generated by BiomeSeq was visually represented using a variety of available tools.

In total, BiomeSeq aligned 5,163 reads to avian DNA viruses and 71,936 reads to avian RNA viral sequences. A total of 9 viral species, representing 8 genera and 8 families, were identified from the avian respiratory tract during the grow-out period.

Figure 2.5 shows a heatmap of percent normalized viral abundance at each time point during the grow-out cycle. A total of 469,937 reads were aligned to the bacterial genome database. This included 533 unique bacterial species, of which 45 had a calculated relative abundance greater than 0.5%. The 45 most abundant species


detected extend from 4 phyla, 7 classes, 13 orders, 26 families and 45 genera. These data are represented in a phylogenetic tree generated using the phytools package in R (Figure 2.6) [20]. A total of 504,682 reads aligned to the bacteriophage genome database. A total of 30 unique bacteriophage species were identified, extending from 1 classified and 1 unclassified order, 4 classified and 1 unclassified families, and 5 classified and 4 unclassified genera. These data are represented in a Venn diagram of the common and unique bacteriophage species detected at Week 0, Week 3 and Week 7, generated using the VennDiagram package in R (Figure 2.7) [21]. A total of 1,964 reads aligned to the fungal genome database. Sixty-one unique fungal species were identified, extending from 2 phyla, 9 classes, 20 orders, 37 families and 50 genera. These data are represented in a fungal network generated with Cytoscape, in which the nodes are grouped according to class and the diameter of the inner nodes corresponds to the frequency with which that particular microbial species was detected during the grow-out cycle of the flock (Figure 2.8) [22]. BiomeSeq detects the major components of a microbiome and therefore provides the information necessary for a complete view of microbial community structures. To provide an example of how the taxonomic and quantitative information produced by BiomeSeq can be visually represented, a microbial network was generated using Cytoscape from one sample (Figure 2.9) [22]. This network contains all of the fungi, eukaryotic viruses, bacteria and bacteriophage detected in a single sample by BiomeSeq, with each node diameter corresponding to the percent relative abundance of the particular species detected.


2.3.4 A Comparison of BiomeSeq bacterial results to 16S rRNA Results

As previously discussed, 16S rRNA sequencing methods are commonly used to analyze the bacterial component of microbiome samples. To compare BiomeSeq to this method, the next generation sequencing data generated from a healthy broiler flock at week 7 was compared to 16S rRNA results. Using the same sample, metagenomic DNA-Seq data and 16S rRNA data were generated. The DNA-Seq data were analyzed using BiomeSeq and the 16S rRNA data were analyzed using mothur and SILVA. Interestingly, the same most abundant bacteria were identified using both methods (Figure 2.10; Table S6). BiomeSeq determined Gallibacterium anatis was the most abundant (29%), followed by Staphylococcus haemolyticus (28%) and Corynebacterium falsenii (18%; Figure 2.10B). The 16S rRNA approach determined Gallibacterium was the most abundant (39%), followed by Corynebacterium (23%), Lactobacillales (16%) and Staphylococcus (10%; Figure 2.10A). BiomeSeq has greater taxonomic sensitivity and is able to identify bacteria at the species level, whereas 16S rRNA is restricted to detection at the genus level.

2.4 Discussion

The complete characterization of a microbiome is critical in elucidating the complex ecology of the microbial composition within healthy and diseased animals.

The advancement of next generation sequencing methodologies has given rise to an increase in studies attempting to examine the microbial communities existing in a


variety of animals. Readily accessible and cost-effective sequencing methodologies as well as a number of user-friendly bioinformatics analysis software and databases for

16S rRNA sequencing data provide the standard culture-independent approach for bacterial analysis [9-13]. Although 16S rRNA has provided insight into one component of the microbiome, it is limited to detecting one specific kingdom, lacks the sensitivity to discriminate between species and cannot be used for novel microbial discovery. Metagenomic shotgun sequencing does not use PCR and is therefore not restricted by primers that target specific gene sequences. As a result, it is not limited to detecting one specific kingdom and has enough sensitivity to detect at the species taxonomic level. BiomeSeq is a novel computational tool designed to characterize the complete microbiome from metagenomic sequencing data. With its comprehensive workflow and microbial reference databases, this tool can rapidly identify the eukaryotic viral, fungal, bacteriophage and bacterial components of a sample and provide an accurate quantification of abundance, genome coverage and diversity.

BiomeSeq consists of three primary steps: i) quality trimming and decontamination of host DNA; ii) alignment to four microbial reference databases; iii) quantification of abundance, genome coverage and diversity (Figure 2.1). BiomeSeq utilizes a sequence-similarity dependent approach with comprehensive microbial databases to provide taxonomic classification and quantitate abundance and diversity.

This tool provides an accurate representation of abundance by considering the variability in microbial genome length and host genome length in these calculations.


Comprehensive eukaryotic viral, bacterial, fungal and bacteriophage databases were constructed using complete and representative genomes obtained from the NCBI

Reference Sequence Database and contain 5,693, 3,623, 1,281 and 2,212 genomes, respectively. These databases are publicly available [18]. A sequence-similarity dependent approach allows for accurate quantification; however, it is often limited by the completeness of the database used. To address this, BiomeSeq databases are updated biannually to include recently discovered microorganisms. Furthermore,

BiomeSeq accepts custom microbial databases provided by users; thus, studies are not limited to utilizing only the default databases. BiomeSeq was designed for the identification of known microorganisms; however, the sequencing data accepted by this tool can also be used in de novo microbial discovery. Many computational tools require extensive command-line knowledge and computational resources to process sequencing samples. In an attempt to increase user accessibility, the BiomeSeq software package is implemented into an open-source and user-friendly container (Figure S1). Containers such as this allow the user to download and install BiomeSeq, including the workflow, all databases and dependent software, on any operating system using one simple command. Furthermore, the user can process their sample with any custom parameters using one line of code.

BiomeSeq’s performance was evaluated using several metrics including correlation with known abundance, sensitivity, precision, rate of speed and root mean square error. Four simulated datasets containing known abundances of 20


microorganisms were employed for this evaluation (Table S1). BiomeSeq was successful in identifying each of the 20 microorganisms, and the abundance calculations at the species taxonomic level determined by BiomeSeq were highly correlated with the known abundances of these species (r = 0.997). Utilizing the default quality threshold of BiomeSeq, high precision and sensitivity were demonstrated with an average of 99.52% and 93.01%, respectively (Table S3). Rate of speed was calculated for each dataset at each step in the BiomeSeq workflow including quality trimming, decontamination of host DNA, alignment to four microbial databases, and quantification. Overall, an average total rate of speed of 271,584 (± 34,912) reads per second was observed (Figure 2.3; Table S4). However, this metric is highly dependent on computational resources, as well as the size of the host reference genome and sequencing file input into the program. An RMSE of less than 0.24 was demonstrated for 17 of the species in the simulated datasets, further demonstrating that the abundance determined by BiomeSeq at the species taxonomic level corresponds to the known values (Figure 2.4). Overall, BiomeSeq performed with exceptional speed, accuracy and sensitivity.

BiomeSeq was employed to detect and quantify the respiratory microbiome of a healthy commercial poultry broiler flock at weekly intervals from hatching to processing. For each component of the respiratory microbiome of this flock, abundance was calculated and population shifts were examined at each time point. A total of 11 eukaryotic viral species, 45 bacterial species, 31 bacteriophage species, and


61 fungal species were identified in this flock. The taxonomic and quantitative tables generated by BiomeSeq can be input into several programs to create visual representations of the data. Heatmaps, phylogenetic trees, Venn diagrams and microbial networks are examples of visualizations that can be easily generated to assist interpretation of the results (Figures 2.5-2.9).

The commercial broiler flock utilized in this study was vaccinated in ovo with a live Marek’s disease virus vaccine (SB-1) and a live recombinant herpesvirus of turkeys (HVT) vaccine expressing Newcastle disease virus genes. The presence of herpesviruses and coronaviruses in the respiratory tract is consistent with vaccination with these two live vaccines, coupled with the expected presence of these avian viruses in the environment. The presence of the family of bacteriophage correlated with Gallibacterium, an abundant bacterial species (data not shown). Interactions between bacteriophage and bacteria are known to have a significant impact on host health (24). Basidiomycota was highly abundant in this flock; however, further studies are needed to determine the relevance of this fungal phylum in the respiratory tract of avian species. The bacterial diversity of the flock was complex at the time of processing, containing significant amounts of , Corynebacteriaceae, Staphylococcaceae and Enterobacteriaceae. Using one sample from this study, bacterial results generated by BiomeSeq were compared to results generated by 16S rRNA sequencing methods. The same most abundant bacteria were observed using both methods (Figure 2.11; Table S6). However, BiomeSeq identified 533 unique bacteria, 45 with a relative abundance of greater than 0.5%, while 16S rRNA detected only 24 genera. Furthermore, BiomeSeq has greater taxonomic


sensitivity and is able to identify bacteria at the species level, whereas 16S rRNA is restricted to detection at the genus level. Moreover, 16S rRNA sequencing methodology can only be employed for taxonomic classification of the bacterial component, leaving the identity of the remaining components of the microbiome unknown. BiomeSeq is able to characterize all major components of a microbiome with high taxonomic sensitivity and accurately quantify abundance. Unlike 16S rRNA sequencing data, metagenomic shotgun sequencing data processed by BiomeSeq can also be used in sequence-independent approaches for de novo microbial discovery.

BiomeSeq is a tool developed for the analysis of complete animal microbiomes using metagenomic sequencing data. With its comprehensive workflow, customizable parameters and microbial databases, BiomeSeq can rapidly identify the major components of a microbiome from a sample and determine normalized abundance, percent relative abundance, genome coverage and sample diversity. While many existing tools focus on characterizing one microorganism, BiomeSeq provides a complete view of microbial ecology and diversity in a sample. The performance of this tool was evaluated using both simulated and clinical datasets and exceptionally accurate and precise abundance estimates were demonstrated. BiomeSeq is available as an open-source and user-friendly container, allowing users to easily download, install and use the program with a few simple commands. The versatility of BiomeSeq, including its customizable parameters and support for custom databases, allows this tool to facilitate a variety of unique investigations.


2.5 Materials and Methods

BiomeSeq is currently available as an open-access and user-friendly tool on Docker Hub. As the Docker container is self-contained, it simplifies installation and execution by eliminating the need for downloading and installing dependent software and requires only one command. BiomeSeq is customizable and allows the user to adjust parameters similar to a command-line tool. Table 2.1 includes all software and parameters used in BiomeSeq.

BiomeSeq accepts both single- and paired-end reads in fastq format generated by DNA-Seq or RNA-Seq methods. Along with the fastq file, the user may customize a number of parameters, including the host genome that the sample was derived from, custom databases provided by the user, the mapping quality threshold and output file types. Figure 2.1 shows an overview of the BiomeSeq workflow, which consists of three primary steps: i) quality trimming and decontamination of host DNA; ii) alignment to four microbial reference databases; iii) quantification of abundance, genome coverage and diversity. BiomeSeq generates a table consisting of the NCBI RefSeq accession number, microbe name, taxonomic information, number of mapped reads, normalized abundance, percent relative abundance and genome coverage for each eukaryotic virus, bacterium, bacteriophage and fungus detected, as well as the diversity of the sample. Table 2.2 is an example of an output table generated for the viral component. Similar tables are generated for bacterial, bacteriophage and fungal data. Visualizations of these results can be easily generated using several different packages in R.


2.5.1 Quality Trimming and Host Decontamination

The BiomeSeq workflow begins with a quality trimming step in which individual fastq sequence files input into the program are first analyzed for per-base sequence quality, per-sequence quality, sequence length distribution and duplicate sequences (Figure 2.1). Reads with a Phred quality score below 30, reads under 100 base pairs in length and adapter sequences are removed from the file. This step is conducted using Trim-Galore [23]. The next step in the workflow decontaminates the file of host DNA. In this step, the trimmed reads are aligned to the user-specified host reference genome using BWA, and only reads that do not align to the host genome are extracted and analyzed further (Figure 2.1) [24].

2.5.2 Microbial Database Alignment

The trimmed and decontaminated sequencing reads are aligned to a eukaryotic viral genome database, a bacterial database, a fungal database and a bacteriophage database using the Bowtie 2 alignment algorithm (Figure 2.1) [25]. The default mapping quality threshold is 20; however, this parameter may be customized by the user. The eukaryotic viral genome database currently includes 5,693 complete and representative viral sequences obtained from the National Center for Biotechnology Information (NCBI) Reference Sequence Database [26]. Bacterial, fungal and bacteriophage databases were constructed using a similar approach and contain 3,623, 1,281 and 2,212 genomes, respectively [26]. Each microbial database and


corresponding aligner index files are publicly available [18]. Each of the four microbial databases is continuously updated to include novel and recently discovered sequences. These databases are the default option for BiomeSeq. However, as an additional feature, BiomeSeq also accepts custom microbial databases provided by the user.
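To make the role of the mapping quality threshold concrete, the sketch below counts reads per reference sequence from alignment records in SAM text format, keeping only mapped reads whose MAPQ meets the threshold. It is an illustration under stated assumptions, not BiomeSeq's implementation: the function name is hypothetical, the reference names `refA` and `refB` are placeholders, and the choice to skip secondary and supplementary alignments is an assumption made here.

```python
from collections import Counter

def count_mapped_reads(sam_lines, min_mapq=20):
    """Count reads per reference with MAPQ >= min_mapq from SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & 0x4:
            continue  # FLAG bit 0x4: read is unmapped
        if flag & 0x900:
            continue  # 0x100 secondary or 0x800 supplementary (assumed excluded)
        if mapq >= min_mapq:
            counts[rname] += 1
    return counts

# Tiny hand-written SAM example with placeholder reference names.
sam = [
    "@SQ\tSN:refA\tLN:1000",
    "r1\t0\trefA\t10\t42\t100M\t*\t0\t0\t*\t*",   # kept
    "r2\t16\trefA\t50\t12\t100M\t*\t0\t0\t*\t*",  # MAPQ below threshold
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",           # unmapped
    "r4\t0\trefB\t5\t20\t100M\t*\t0\t0\t*\t*",    # kept (MAPQ equals threshold)
]
counts = count_mapped_reads(sam)
print(dict(counts))
```

The per-reference counts produced this way correspond to the "number of mapped reads" column that feeds the abundance and coverage calculations in Section 2.5.3.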

2.5.3 Quantification and Output

A sequence similarity-dependent approach for detecting microorganisms contributes to the rapid detection of known viruses while also allowing for the quantification of biodiversity, which similarity-independent approaches lack [27, 28].

To calculate microbial abundance, BiomeSeq uses an adaptation to the equation presented by Moustafa and colleagues in 2017 to quantify viral abundance [29]:

\[
\text{Microbial Abundance} = \frac{\dfrac{\text{number of reads mapped to microbe sequence} \times 2}{\text{microbe sequence size}}}{\dfrac{\text{number of reads mapped to host genome}}{\text{host genome size}}} \times 10^{?}
\]

Percent relative abundance is quantified using the following equation:

\[
\text{Percent Relative Abundance} = \frac{\text{microbial abundance}}{\text{total microbial abundance}} \times 100
\]
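The two abundance equations above can be sketched in Python as follows. This is a minimal illustration, not BiomeSeq's code: the function and argument names are hypothetical, the factor of 2 follows the reading of the normalized abundance equation given above, and the power-of-ten scaling factor is passed in as the `scale` argument rather than fixed, because its value is not assumed here. All input numbers are placeholders.

```python
def normalized_abundance(microbe_reads, microbe_size, host_reads, host_size, scale):
    """Microbe read density relative to host read density, times a scale factor."""
    if microbe_size <= 0 or host_size <= 0 or host_reads <= 0:
        raise ValueError("sizes and host read count must be positive")
    microbe_density = microbe_reads * 2 / microbe_size  # reads per microbe base
    host_density = host_reads / host_size               # reads per host base
    return microbe_density / host_density * scale

def percent_relative_abundance(abundances):
    """Convert a {name: abundance} mapping to percentages of the total."""
    total = sum(abundances.values())
    if total == 0:
        return {name: 0.0 for name in abundances}
    return {name: 100 * value / total for name, value in abundances.items()}

# Placeholder inputs: (mapped reads, genome size in bp) per microbe.
microbes = {"microbe_A": (3_000, 150_000), "microbe_B": (500, 100_000)}
host_reads, host_size = 1_000_000, 1_200_000_000
scale = 1.0  # assumption: stands in for the power-of-ten factor in the equation

abund = {
    name: normalized_abundance(reads, size, host_reads, host_size, scale)
    for name, (reads, size) in microbes.items()
}
percent = percent_relative_abundance(abund)
print(percent)
```

Because the percentages are ratios of normalized abundances within one sample, the host terms and the scale factor cancel, so the relative abundances depend only on each microbe's reads per base.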

Genome coverage is approximated using the following equation:


\[
\text{Genome Coverage} = \frac{\text{number of reads mapped to microbe} \times \text{read length}}{\text{microbe reference genome size}}
\]
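A short Python sketch of the genome coverage approximation is given below. The function name and the example numbers are illustrative only.

```python
def genome_coverage(mapped_reads, read_length, genome_size):
    """Approximate fold coverage: total mapped bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return mapped_reads * read_length / genome_size

# Placeholder example: 5,000 reads of 100 bp mapped to a 50,000 bp genome.
coverage = genome_coverage(mapped_reads=5_000, read_length=100, genome_size=50_000)
print(f"{coverage:.1f}X")
```

This quantity is an average depth, so it can exceed 1; it does not by itself indicate what fraction of the genome is covered by at least one read.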

Alpha diversity for each sample is calculated using the Shannon Diversity Index, a commonly used equation for calculating species diversity in a microbiome as it accounts for both species abundance and evenness within the sample [30, 31].
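For illustration, the Shannon Diversity Index can be computed from a list of species abundances as in the Python sketch below. The text does not state the logarithm base used by BiomeSeq, so the natural logarithm is an assumption here, and the function name is hypothetical.

```python
import math

def shannon_index(abundances):
    """Shannon index H = -sum(p_i * ln(p_i)) over species with p_i > 0."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    h = 0.0
    for value in abundances:
        if value > 0:  # species with zero abundance contribute nothing
            p = value / total
            h -= p * math.log(p)
    return h

# Four equally abundant species give the maximum for four species, ln(4).
even = shannon_index([25, 25, 25, 25])
# One dominant species lowers the index even though richness is unchanged.
uneven = shannon_index([97, 1, 1, 1])
print(f"even: {even:.3f}, uneven: {uneven:.3f}")
```

The comparison of the two inputs shows why the index reflects evenness as well as the number of species present.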

2.5.4 Performance Metrics

Simulated data were used to assess several metrics of BiomeSeq’s performance, including correlation with known abundance, sensitivity, precision, rate of speed and root mean square error. Four datasets were generated to closely mimic the complexity of data obtained from real microbiomes consisting of bacteria, eukaryotic viruses, bacteriophage and fungi genomes as well as host DNA sequences. The datasets contain sequences from 20 microorganisms commonly found in the respiratory microbiome of broiler chickens, including 10 eukaryotic viruses, 4 bacteriophage, 5 bacteria and 1 fungus (Table S1). We included one chicken sequence to represent the host environment (NC_006088.5). ART was used to simulate reads generated using next-generation sequencing technology [32]. Single-end reads with a length of 100, fold coverage of 10X and masking cutoff frequency of 1 in 100 were simulated based on an error and quality profile of the HiSeq 2500 Illumina sequencing platform. The number of reads simulated ranged from 24,522,223 to 24,523,708, with an average read count of 24,523,065.


The four simulated datasets were processed using BiomeSeq with the following parameters: -g chicken.fasta -d avian_virus -q 20. One major feature of BiomeSeq is its ability to accept custom databases provided by the user. To evaluate this feature, an avian-specific viral database was constructed to replace BiomeSeq’s default viral database (Table S5). The avian DNA viral genomes include 48 viral elements from 9 unique families and the avian RNA viral genomes include 63 viral elements from 13 families. The avian DNA and RNA viral database is arranged by viral structure and genome organization. DNA viruses are organized hierarchically by whether the virus is double- or single-stranded and whether the virus is enveloped or non-enveloped. RNA viruses are organized hierarchically by whether the virus is double- or single-stranded, negative or positive sense, segmented or non-segmented and whether the virus is enveloped or non-enveloped. This database is publicly available [18]. Abundance was calculated at the species level and several metrics were assessed based on the calculations determined by BiomeSeq, including correlation with known abundances, which was calculated using Pearson’s correlation coefficient. Rate of speed was calculated as the number of reads per second at each step of the BiomeSeq process on a server with 98 GB RAM and 4 CPU cores. Sensitivity and precision were calculated based on the following equations:

\[
\text{Sensitivity} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}
\]


\[
\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}
\]

True positives are the number of reads that BiomeSeq aligned to the genomes in the databases; false positives are the number of reads that were aligned to genomes not included in the databases; and false negatives are the number of reads that were not aligned. Root mean square error was calculated to compare the abundance calculations of BiomeSeq to the known abundance using the following equation:

\[
\text{RMSE} = \sqrt{\frac{\sum \left(\text{BiomeSeq Abundance} - \text{Known Abundance}\right)^{2}}{\text{number of samples}}}
\]
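The RMSE calculation can be sketched in Python as shown below. The function name is hypothetical and the abundance values are placeholders for one species across four datasets, not values from this study.

```python
import math

def rmse(predicted, known):
    """Root mean square error between paired predicted and known abundances."""
    if len(predicted) != len(known) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - k) ** 2 for p, k in zip(predicted, known)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Placeholder percent relative abundances of one species in four datasets.
biomeseq_abundance = [10.2, 9.8, 10.5, 9.9]
known_abundance = [10.0, 10.0, 10.0, 10.0]
error = rmse(biomeseq_abundance, known_abundance)
print(f"RMSE = {error:.3f}")
```

Because the errors are squared before averaging, a single large deviation in one dataset raises the RMSE more than several small deviations of the same total size.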

2.5.5 A Longitudinal Study of the Microbial Ecology of a Healthy Broiler Flock

Tracheal swabs were collected at hatching and at weekly intervals through processing at day 49 (8 samples) from an -free commercial broiler flock.

Both DNA and RNA were isolated, and sequencing was performed for each of the eight time points using the Illumina HiSeq platform, producing 1 × 100 single-end reads. Each of the resulting 16 samples was processed using BiomeSeq with the following parameters: -g chicken -d avian_virus -q 40. The previously described avian viral reference database was utilized in this study (Table S5). Normalized abundance, relative abundance, genome coverage and sample diversity were calculated for each of the microbial components. Visual representations of the results generated by


BiomeSeq, including heatmaps, phylogenetic trees, Venn diagrams and microbial networks, were produced using several R packages [20-22].

2.5.6 Comparison of BiomeSeq Bacterial Results to 16S rRNA Results

For comparison of BiomeSeq results to bacterial results generated using 16S rRNA sequencing methodology, the V4 hypervariable region of the bacterial 16S rRNA gene was extracted and amplified using PCR with primers 515F (5'-GTGCCAGCMGCCGCGGTAA-3') and 806R (5'-GGACTACHVGGGTWTCTAAT-3'), as previously described [7, 33]. The amplicons were sequenced at the University of Minnesota Genomics Center (Minneapolis, MN) using an Illumina MiSeq 600 cycle v3 kit. Each sample was assessed for quality and assembled into contigs using PEAR’s default parameters, with the modification that the quality score threshold was set to 30. Samples were further filtered and analyzed using mothur version 1.35.1 [13] following the MiSeq SOP [34]. OTUs were generated using 97% sequence similarity. mothur’s implementation of the SILVA database (v123) was used for classification of OTUs, and relative abundance was calculated. The results generated using 16S rRNA sequencing methodology were compared to results generated by BiomeSeq.


Figure 2.1. BiomeSeq Workflow.

[Flowchart: Input Fastq Files → Quality and Decontamination (trim sequences for quality; align trimmed reads to host genome) → Database Alignment (align unmapped reads to the animal viral, fungal, bacterial and bacteriophage databases) → Quantification (calculate abundance, diversity and genome coverage for the viral, bacterial, fungal and phage components).]

73

Figure 2.2. Percent relative abundance of microorganisms detected by BiomeSeq and known values from simulated datasets.

[Stacked bar chart. Bars: BiomeSeq, Known. Axis: Percent Relative Abundance (0 to 100). Legend: Aspergillus oryzae, Enterobacteria phage T4, Enterobacteria phage T7, Escherichia coli, Escherichia phage TL-2011b, Gallid herpesvirus 1, Gallid herpesvirus 2, Gallid herpesvirus 3, Infectious bronchitis virus, Infectious bursal disease virus, Influenza A virus, Mycoplasma gallisepticum, Mycoplasma synoviae, Newcastle disease virus, Ornithobacterium rhinotracheale, Staphylococcus phage StB20, Turkey coronavirus, Meleagridid herpesvirus 1.]

Figure 2.3. Average rate of speed at different steps in BiomeSeq processing, including A) quality trimming(a), B) host decontamination(b), C) microbial database alignment(c) and D) quantification(d), for four simulated datasets.

[Bar chart. Y-axis: log10 (Reads/second), 1 to 100,000. X-axis: Quality Trimming, Host Decontamination, Microbial Database Alignment, Quantification.]

a) Adapter sequences, reads shorter than 100 base pairs in length and reads with a Phred quality score of less than 30 are removed from the sequencing file. b) The host reference genome is indexed, and reads are aligned to the host reference genome to extract host DNA. c) Reads are aligned to four microbial databases: eukaryotic viruses, fungi, bacteria and bacteriophage. d) Normalized abundance, percent relative abundance, genome coverage and diversity are calculated from the reads that align to microbial sequences.

Figure 2.4. Root Mean Square Error between known abundances and abundances determined by BiomeSeq.

[Bar chart. Y-axis: Root Mean Square Error (0 to 5). X-axis: Ornithobacterium rhinotracheale, Mycoplasma gallisepticum, Mycoplasma synoviae, Pasteurella multocida, Escherichia coli, Aspergillus oryzae, Enterobacteria phage T4, Enterobacteria phage T7, Escherichia phage TL-2011b, Staphylococcus phage StB20, Infectious bronchitis virus, Infectious bursal disease virus, Influenza A virus, Newcastle disease virus, Avian metapneumovirus, Turkey coronavirus, Meleagridid herpesvirus 1, Gallid herpesvirus 1, Gallid herpesvirus 2, Gallid herpesvirus 3.]

Figure 2.5. Heatmap of percent normalized relative abundance of viruses detected in a commercial poultry flock from hatching to processing. Color corresponds to the range of relative abundance of each family from 0 to 100%. Green: 0-1%; yellow: 1-25%; orange: 25-75%; and red: 75-100%. The sum of each column, or week, is 100%.

[Heatmap. Row annotations: nucleic acid type, strand, sense, enveloping, family, genus, species. Columns: Week 0 to Week 7. Nonzero cell values by species, in column order:]

Gallid alphaherpesvirus 1: 0.171
Gallid alphaherpesvirus 2 and 3: 0.257, 0.145, 0.070, 0.006
Meleagrid alphaherpesvirus 1: 0.037, 0.004
Avian gyrovirus: 88.664, 12.054, 15.469, 36.773
Fowl aviadenovirus (Aviadenovirus): 53.299, 6.698
Infectious bursal disease virus (Avibirnavirus): 0.008, 2.333, 0.105
Avian infectious bronchitis virus (Gammacoronavirus): 0.382, 54.762, 58.947, 16.278, 1.884, 23.602, 21.786, 19.319
Avian carcinoma virus (Alpharetrovirus, Retroviridae): 0.077, 0.042
Avian endogenous retrovirus (unclassified Retroviridae): 99.447, 44.493, 41.017, 83.296, 9.290, 64.196, 7.108, 37.063
Chicken astrovirus (Astroviridae): 0.744
Chicken sicinivirus JSY (Sicinivirus, Picornaviridae): 0.169, 0.005

Figure 2.6. Phylogenetic tree of bacterial species detected in a commercial poultry flock. Branches extend from phylum to species. Nodes indicate detected species and diameter indicates average abundance.

[Tree clades labeled by phylum: Actinobacteria, Proteobacteria, Bacteroidetes, Firmicutes.]

Figure 2.7. Venn Diagram of the detected bacteriophage species in a commercial poultry flock at Week 0, Week 1 and Week 7.

[Venn diagram. Set labels: Week 0, Week 3, Week 7. Region counts: 10, 1, 6, 10, 1, 1, 1. Species shown: Enterobacteria phage P88, Staphylococcus phage GH15, Microbacterium phage Min1, Staphylococcus phage StB20-like, Staphylococcus phage P108, Staphylococcus phage phiSA012, Enterobacteria phage phi92, Enterobacteria phage IME10, Staphylococcus phage SPbeta-like, Enterobacteria phage cdtI, Enterobacteria phage P1, Staphylococcus phage MCE-2014, Staphylococcus phage phiRS7, Staphylococcus phage phiIPLA-RODI, Enterobacteria phage SfI, Salmonella phage RE-2010, Salmonella phage SJ46, Enterobacteria phage T7, Enterobacteria phage lambda, Stx2-converting phage 1717, Enterobacteria phage mEp460, Enterobacteria phage VT2phi_272, Shigella phage SfIV, Escherichia phage TL-2011b, Uncultured phage crAssphage, Stx2 converting phage vB_EcoP_24B, Enterobacteria phage YYZ-2008, Shigella phage SHFML-11, Staphylococcus phage StB20, Enterobacteria phage RB55.]

Figure 2.8. Fungal network of species detected in a commercial poultry flock. Outer nodes represent order level, while inner nodes represent species. The diameter of each inner node corresponds to species frequency, or the number of weeks in which the species was detected.

Figure 2.9. Microbial network of the top 10 most abundant eukaryotic viruses, fungi, bacteria and bacteriophage in a commercial poultry flock at time of processing. Node diameter indicates the percent relative abundance.

[Network. Central node: Avian Respiratory Microbiome. Branches: Bacteria, Eukaryotic Virus, Fungi, Bacteriophage.]

Figure 2.10. Bacteria detected in a healthy poultry broiler flock using A) 16S rRNA and B) BiomeSeq.

[Pie charts.]

A) 16S, healthy: Gallibacterium 40%; Corynebacteriaceae* 24%; Lactobacillales** 16%; Staphylococcus 10%; Other 10%.

B) BiomeSeq, healthy: Gallibacterium anatis 29%; Staphylococcus haemolyticus 28%; Corynebacterium falsenii 18%; Other 25%.

Table 2.1. Software tools and parameters used by BiomeSeq

Process | Tool Name | Parameters
Quality Trimming | Trim Galore | default
Host Decontamination | BWA | -x -S
Host Decontamination | Samtools view | -bS
Microbial Database Alignment | Bowtie 2 | -x -S
Microbial Database Alignment | Samtools view | -bSq [user input]

Table 2.2. Example table generated by BiomeSeq of the viral component of a commercial poultry flock at Week 6.

Ref Seq Number | Name | Taxonomy | Genome Size | Number Mapped | Norm. Abundance | Relative Abundance | Genome Coverage | Sample Diversity
NC002229 | Gallid Alphaherpesvirus 2 | Double Stranded; Enveloped; Herpesviridae; Mardivirus | 177874 | 1 | 27 | 0.004% | 0 | 0.534
NC002577 | Gallid Alphaherpesvirus 3 | Double Stranded; Enveloped; Herpesviridae; Mardivirus | 164270 | 1 | 30 | 0.004% | 0 |
NC015396 | Avian Gyrovirus | Single Stranded; Non-Enveloped; ; Gyrovirus | 2383 | 72 | 147166 | 22.493% | 3.05 |
NC001720 | Fowl Aviadenovirus | Double Stranded; Non-Enveloped; Adenoviridae; Aviadenovirus | 43804 | 4560 | 507049 | 77.498% | 10.51 |
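The relative abundance and sample diversity columns of Table 2.2 can be recomputed from the normalized abundance column. The sketch below is illustrative Python rather than BiomeSeq's own code. It assumes that percent relative abundance is each species' share of the summed normalized abundance in the sample, and that sample diversity is the Shannon index computed with the natural logarithm on those proportions. The function names are hypothetical.

```python
import math

# Normalized abundance values as listed in Table 2.2 (Week 6, viral component).
normalized = {
    "Gallid alphaherpesvirus 2": 27,
    "Gallid alphaherpesvirus 3": 30,
    "Avian gyrovirus": 147166,
    "Fowl aviadenovirus": 507049,
}


def percent_relative_abundance(abundance):
    """Share of the sample total contributed by each species, in percent."""
    total = sum(abundance.values())
    return {name: 100.0 * value / total for name, value in abundance.items()}


def shannon_index(abundance):
    """Shannon diversity H = -sum(p * ln p) over species with p > 0."""
    total = sum(abundance.values())
    proportions = [value / total for value in abundance.values() if value > 0]
    return -sum(p * math.log(p) for p in proportions)


if __name__ == "__main__":
    for name, pct in percent_relative_abundance(normalized).items():
        print(f"{name}: {pct:.3f}%")
    print(f"Shannon diversity: {shannon_index(normalized):.3f}")
```

Under these assumptions the sketch reproduces the relative abundance values (0.004%, 0.004%, 22.493% and 77.498%) and the sample diversity (0.534) listed in Table 2.2.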

REFERENCES

1. Peterson, J., et al., The NIH Human Microbiome Project. Genome research, 2009. 19(12): p. 2317-2323.

2. Barzon, L., et al., Applications of next-generation sequencing technologies to diagnostic virology. Int J Mol Sci, 2011. 12(11): p. 7861-84.

3. Bond, S.L., et al., Upper and lower respiratory tract microbiota in horses: bacterial communities associated with health and mild asthma (inflammatory airway disease) and effects of dexamethasone. BMC Microbiol, 2017. 17(1): p. 184.

4. De Boeck, C., et al., Longitudinal monitoring for respiratory pathogens in broiler chickens reveals co-infection of Chlamydia psittaci and Ornithobacterium rhinotracheale. J Med Microbiol, 2015. 64(5): p. 565-574.

5. Gaeta N, L.S., Teixeira A, Ganda E, Oikonomou G, Gregory L, Bicalho R., Deciphering upper respiratory tract microbiota complexity in healthy calves and calves that develop respiratory disease using shotgun metagenomics. J Dairy Sci., 2017. 100: p. 1445-1458.

6. Glendinning, L., G. McLachlan, and L. Vervelde, Age-related differences in the respiratory microbiota of chickens. PLoS One, 2017. 12(11): p. e0188455.

7. Johnson TJ, Y.B., Noll S, Cardona C, Evans NP, Karnezos P, Ngunjiri JM, Abundo MC, Lee C-W, A consistent and predictable commercial broiler chicken bacterial microbiota in antibiotic-free production displays strong correlations with performance. Appl. Environ. Microbiol., 2018. 84: p. e00362-18.

8. Shabbir, M.Z., et al., Microbial communities present in the lower respiratory tract of clinically healthy birds in Pakistan. Poult Sci, 2015. 94(4): p. 612-20.

9. De Santis T, H.P., Larsen N, Rojas M, Brodie E, Keller K, Huber T, Dalevi D, Hu P, Andersen G., Greengenes, a chimera-checked 16S rRNA gene. 2016.

10. Quast C, P.E., Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glockner F., The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucl Acids Res., 2013. 41: p. 590-596.

11. Meyer F, P.D., D'Souza M, Olson R, Glass E, Kubal M, Paczian T, Rodriguez A, Stevens R, Wilke A, Wilkening J, Edwards R., The metagenomics RAST server- a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics, 2008. 9: p. 386.

12. Caporaso J, K.J., Stombaugh J, Bittinger K, Bushman F, Costello E, Fierer N, Peña A, Goodrich J, Gordon J, Huttley G, Kelley ST, Knights D, Koenig JE, Ley R, Lozupone C, McDonald D, Muegge B, Pirrung M, Reeder J, Sevinsky JR, Turnbaugh P, Walters W, Widmann J, Yatsunenko T, Zaneveld J, Knight R., QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 2010. 7: p. 335-336.

13. Schloss P, W.S., Ryabin T, Hall J, Hartman M, Hollister E, Lesniewski R, Oakley B, Parks D, Robinson C, Sahl J, Stres B, Thallinger G, Van Horn D, Weber C., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75: p. 7537-7541.

14. Kõljalg U, N.R., Abarenkov K, Tedersoo L, Taylor A, Bahram M, Bates S, Bruns T, Bengtsson-Palme J, Callaghan T, Douglas B, Drenkhan T, Eberhardt U, Dueñas M, Grebenc T, Griffith G, Hartmann M, Kirk P, Kohout P, Larsson E, Lindahl B, Lücking R, Martín M, Matheny P, Nguyen N, Niskanen T, Oja J, Peay K, Peintner U, Peterson M, Põldmaa K, Saag L, Saar I, Schüßler A, Scott J, Senés C, Smith M, Suija A, Taylor D, Telleria M, Weiss M, Larsson K., Towards a unified paradigm for sequence-based identification of fungi. Mol Ecol., 2013. 22: p. 5271-5277.

15. Zhu, J., et al., Virus-specific CD8+ T cells accumulate near sensory nerve endings in genital skin during subclinical HSV-2 reactivation. J Exp Med, 2007. 204(3): p. 595-603.

16. Jovel, J., et al., Characterization of the Gut Microbiome Using 16S or Shotgun Metagenomics. Front Microbiol, 2016. 7: p. 459.

17. Daly G, L.R., Rowe W, Stubbs S, Wilkinson M, Ramirez-Gonzalez R, Mario C, Bernal W, Heeney J., Host subtraction, filtering and assembly validations for novel viral discovery using next generation sequencing data. PLoS One, 2015. 10(6).

18. Mulholland, K.A. BiomeSeq Microbial Databases. Avian Genomics 2019; Available from: https://sites.udel.edu/aviangenomics/.

19. Hillier, L.W., et al., Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature, 2004. 432(7018): p. 695-716.

20. Revell, L.J., phytools: an R package for phylogenetic comparative biology (and other things). Methods in ecology and evolution, 2012. 3(2): p. 217-223.

21. Chen, H. and P.C. Boutros, VennDiagram: a package for the generation of highly-customizable Venn and Euler diagrams in R. BMC Bioinformatics, 2011. 12(1): p. 35.

22. Shannon, P., et al., Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res, 2003. 13(11): p. 2498-504.

23. Martin, M., Cutadapt Removes Adapter Sequences from High-Throughput Sequencing Reads. EMBnet Journal, 2011. 17: p. 10-12.

24. Li, H. and R. Durbin, Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 2010. 26(5): p. 589-95.

25. Langmead, B. and S.L. Salzberg, Fast gapped-read alignment with Bowtie 2. Nat Methods, 2012. 9(4): p. 357-9.

26. O'Leary NA, W.M., Brister JR, Ciufo S, Haddad D, McVeigh R, Rajput B, Robbertse B, Smith-White B, Ako-Adjei D, Astashyn A, Badretdin A, Bao Y, Blinkova O, Brover V, Chetvernin V, Choi J, Cox E, Ermolaeva O, Farrell CM, Goldfarb T, Gupta T, Haft D, Hatcher E, Hlavina W, Joardar VS, Kodali VK, Li W, Maglott D, Masterson P, McGarvey KM, Murphy MR, O'Neill K, Pujar S, Rangwala SH, Rausch D, Riddick LD, Schoch C, Shkeda A, Storz SS, Sun H, Thibaud-Nissen F, Tolstoy I, Tully RE, Vatsan AR, Wallin C, Webb D, Wu W, Landrum MJ, Kimchi A, Tatusova T, DiCuccio M, Kitts P, Murphy TD, Pruitt KD., Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res., 2016. 4: p. 733-745.

27. Herath, D., et al., Assessing Species Diversity Using Metavirome Data: Methods and Challenges. Comput Struct Biotechnol J, 2017. 15: p. 447-455.

28. Rose, R., et al., Challenges in the analysis of viral metagenomes. Virus Evol, 2016. 2(2): p. vew022.

29. Moustafa, A., et al., The blood DNA virome in 8,000 humans. PLoS Pathog, 2017. 13(3): p. e1006292.

30. Lemos, L.N., et al., Rethinking microbial diversity analysis in the high throughput sequencing era. Journal of Microbiological Methods, 2011. 86(1): p. 42-51.

31. Ludwig, J. and J. Reynolds, Statistical Ecology, ed. Wiley. 1988, New York.

32. Huang, W., et al., ART: a next-generation sequencing read simulator. Bioinformatics, 2012. 28(4): p. 593-4.

33. Gohl DM, V.P., Garbe J, MacLean A, Hauge A, Becker A, Gould TJ, Clayton JB, Johnson TJ, Hunter R, Knights D, Beckman KB., Systematic improvement of amplicon marker gene methods for increased accuracy in microbiome studies. Nat Biotechnol, 2016. 34: p. 942-949.

34. Kozich J, W.S., Baxter N, Highlander S, Schloss P., Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform. Appl. Environ. Microbiol., 2013. 79: p. 5112-5120.

Chapter 3

METAGENOMIC ANALYSIS OF THE RESPIRATORY MICROBIOME OF A HEALTHY BROILER FLOCK FROM HATCHING TO PROCESSING

3.1 Summary

The severity and spread of many human and animal diseases are associated with specific bacterial and viral agents within the respiratory microbiome. Recent studies attempting to characterize the respiratory microbiome of poultry have focused primarily on bacteria; however, elucidating the complex microbial interactions that result in disease requires the characterization of the viruses, bacteria, bacteriophage, and fungi present in the respiratory microbiome of a healthy broiler flock. The lack of comprehensive bioinformatics pipelines and viral genome databases has limited efforts to characterize the avian virome. Next generation sequencing approaches, accompanied by the further development of novel computational and bioinformatics tools, were utilized to examine the evolution of the microbial ecology of the avian trachea during the growth of a commercial flock. The flock was sampled weekly, beginning at placement and concluding at 49 days, the day before processing.

Metagenomic sequencing of DNA and RNA and 16S rRNA sequencing were utilized to examine the bacterial, viral, bacteriophage, and fungal components at these times.


We detected a total of 11 eukaryotic viral species, 24 bacterial genera, 31 bacteriophage species, and 61 fungal species. Abundance at various taxonomic levels, alpha diversity, species frequency and microbial shifts were examined for each of the microbial components. Additionally, correlations between bacteria and bacteriophage families were investigated and several highly positive correlations were identified.

This study provides the first comprehensive analysis of the ecology of the avian respiratory microbiome and will facilitate future investigations of avian respiratory diseases.

3.2 Introduction

Microbiomes are complex environments consisting of eukaryotic viruses, bacteria, archaea, bacteriophage, fungi, and protozoa, all of which contribute to the creation of a particular biological niche. These microorganisms interact with the host and each other in either symbiosis or dysbiosis depending on the status of the host [1].

The introduction of an infectious agent can lead to disturbances of this environment and may result in disease [2]. Recent studies have identified specific bacterial and viral agents within the respiratory microbiome of both humans and animals that are associated with the severity and spread of disease [3-6]. For example, Pettigrew et al. (2008) evaluated the complex interactions between Streptococcus pneumoniae, Haemophilus influenzae, Moraxella catarrhalis, and Staphylococcus aureus in the upper respiratory tract of children with upper respiratory tract infections. They determined that colonization involves a combination of host conditions, host immune response, and direct competitive interactions between bacteria. The presence of a viral infection may predispose the respiratory tract to a bacterial superinfection. Bakaletz determined that this is due to viral-bacterial interactions resulting in disruption of the respiratory mucosal epithelium [4]. Since diseases in the respiratory tract of poultry in particular result in poor performance, our goal is to gain insight into the impact microbial communities have on the health of the respiratory tract of poultry.

Avian respiratory disease complex (RDC) is an example of a multi-faceted syndrome that commonly affects poultry [7]. This disease can be triggered by a combination of environmental factors and microbial agents. Interactions between endogenous bacteria (Mycoplasma gallisepticum and Escherichia coli) and fungal and viral infectious agents, such as infectious laryngotracheitis virus, infectious bronchitis virus and Newcastle disease virus, can lead to RDC and may result in high mortality rates in poultry flocks [7, 8]. Additionally, exposure to certain environmental factors, such as ammonia and other gases and insufficient ventilation, reduces the innate immune defenses of the bird and enables opportunistic microbial pathogens to establish themselves [8]. However, the extent of these microbial interactions is not fully understood. Elucidating the complex microbial interactions that result in RDC first requires a characterization of the complete respiratory microbiome of both healthy and diseased chickens.


Recent studies attempting to characterize the respiratory microbiome of poultry have focused primarily on bacteria, as there are well-established and rapid methods of sequencing and analyzing this component [9-14]. The 16S rRNA gene is commonly used to identify and compare bacteria present in a given sample [15]. Accessible bacterial databases, such as Greengenes [16] and SILVA [17], in addition to well-developed bioinformatics pipelines, are available to facilitate these analyses [18-20].

Glendinning et al. (2017) utilized these 16S rRNA gene amplification approaches to characterize the buccal, nasal and lung microbiota of chickens. Utilizing similar methods, Shabbir et al. (2015) determined that the lower respiratory tract of healthy flocks of chickens from different farms in Pakistan exhibited high levels of diversity in their microbiota. More recently, Johnson et al. (2018) presented a comprehensive analysis of the core bacterial microbiota in the broiler gastrointestinal, respiratory, and barn environments. Although Lactobacillae were the predominant bacteria found in the trachea, similar to the ileum, the dominant Lactobacillus species differed in relative abundance when tracheal and ileum tissues were compared.

Although the bacterial component provides valuable information about the respiratory microbiome of poultry, a comprehensive analysis of the avian respiratory microbiota has not been reported. Unlike bacteria, viruses lack a marker gene that can be sequenced and employed for taxonomic classification due to their high genetic heterogeneity [21]. With the advancement of next generation metagenomic sequencing technologies, virome characterization is also possible. However, the lack of comprehensive bioinformatics tools and viral genome databases limits efforts to characterize the virome. Given this limitation and the lack of a comprehensive characterization of the microbial environment of the broiler chicken, we developed and employed a bioinformatics pipeline and bacteriophage, fungal, and avian viral genome databases to examine a healthy flock of chickens throughout their grow out cycle. These methods were used to detect and quantify eukaryotic DNA and RNA viruses, bacteria, bacteriophage, and fungi. This study provides the first comprehensive analysis of the ecology of the avian respiratory microbiome and will facilitate future investigations of avian respiratory diseases such as RDC.

3.3 Results

A commercial poultry flock was utilized for this longitudinal study of the broiler respiratory microbiome. The flock was sampled weekly, beginning at placement, and concluding at 49 days, the day before processing. The flock had no health issues during grow out. Mortality after the first week was 1.4%, average final body weight was 7.95 lb, the feed conversion ratio was 1.76, and ammonia levels in the house were maintained below 20 ppm. Tracheal swabs from 12 birds were collected at each time point as two pools of six swabs. The two pools were combined and used for the extraction of DNA and RNA. DNA-Seq and RNA-Seq libraries were constructed and sequenced and the V4 hypervariable region of the 16S rRNA gene was also amplified and sequenced. A total of 339,319,712 trimmed DNA-Seq reads and 440,442,599 trimmed RNA-Seq reads were generated from a total of 16 libraries.

A total of 78.787 giga base pairs (Gbp) of high-quality nucleotide sequences were obtained (Table S1). An average of 88% of the DNA reads mapped to the chicken genome, while an average of 53% of the RNA reads mapped to the chicken genome.

3.3.1 Avian Respiratory Eukaryotic Viral Diversity

Unmapped DNA and RNA reads from the eight weekly broiler respiratory samples, each representing a pool of 12 birds, were aligned to an avian-specific viral genome database consisting of 63 complete avian RNA viral genomes and 48 complete avian DNA viral genomes (Table 3.1). The 5,163 reads that aligned to the avian viral DNA database and the 71,936 reads that aligned to the avian viral RNA database were analyzed as described in the Materials and Methods.

A total of 11 viral species, representing 9 genera and 8 families, were identified from the avian respiratory tract during the seven-week grow out period (Table S2). Normalized viral abundance was calculated for each eukaryotic viral species for each week (Figure 3.1; Table S3). At placement, or Week 0, gallid herpesvirus 1 was the only DNA virus detected (Table S2). Relatively small amounts of meleagrid herpesvirus 1 were detected at Week 2, and gallid herpesvirus 2 and 3 were detected in small quantities from Weeks 3-6. Two other DNA virus families were detected later during growth. Circoviridae (avian gyrovirus) first appeared in the avian respiratory tract at four weeks of age, while Adenoviridae was initially detected at Week 6. The relative abundance of the DNA viruses of the respiratory tract can be examined either as a percentage of all viruses in each sample (Figure 3.2; Table S4) or as the distribution of a specific virus family across the 7-week period (Figure 3.3; Table S5). It is notable that when Circoviridae and Adenoviridae first appear (Week 4 and Week 6, respectively), they represent the highest viral abundance in the tracheal sample (88.66% and 53.30%, respectively; Figure 3.2). They also represent the highest relative abundance of these virus families observed during flock growth (57.97% and 88.84%, respectively; Figure 3.3).
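The two views of relative abundance described above differ only in the direction of normalization: within each sample (each week's values sum to 100%) or across weeks for one family (each family's values sum to 100%). The Python sketch below illustrates the distinction. The abundance values are hypothetical placeholders, not the values from Tables S3-S5, and the function names are chosen for illustration.

```python
# Hypothetical normalized abundances (rows: virus families, columns: weeks).
weeks = ["Week 4", "Week 5", "Week 6", "Week 7"]
abundance = {
    "Circoviridae": [800.0, 100.0, 150.0, 330.0],
    "Adenoviridae": [0.0, 0.0, 500.0, 60.0],
    "Coronaviridae": [100.0, 300.0, 250.0, 200.0],
}


def within_sample(table):
    """Percent of each sample (column) contributed by each family."""
    n_weeks = len(next(iter(table.values())))
    totals = [sum(row[i] for row in table.values()) for i in range(n_weeks)]
    return {
        family: [100.0 * v / t if t else 0.0 for v, t in zip(row, totals)]
        for family, row in table.items()
    }


def across_weeks(table):
    """Percent of each family's total (row) observed in each week."""
    result = {}
    for family, row in table.items():
        total = sum(row)
        result[family] = [100.0 * v / total if total else 0.0 for v in row]
    return result


if __name__ == "__main__":
    by_sample = within_sample(abundance)
    by_family = across_weeks(abundance)
    for family in abundance:
        print(family, [round(v, 2) for v in by_sample[family]],
              [round(v, 2) for v in by_family[family]])
```

A family can therefore dominate a sample in one week while that week still accounts for only part of the family's total over the grow out period, which is why the two percentages reported for the same family and week differ.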

Five eukaryotic RNA virus families were identified in the tracheal samples during grow out. The pattern of RNA virus detection differed markedly from the observed patterns of DNA virus detection (Figure 3.2; Figure 3.3). As expected, transcripts from endogenous avian retroelements were detected throughout the growth of the flock. Coronaviridae were also observed in all of the respiratory samples, and they were the most abundant virus family found at Week 1 and Week 2 (Figure 3.2). Three other RNA virus families were observed in the broiler trachea. Astroviridae and Picornaviridae were observed transiently and in low numbers during Week 1 and Weeks 3-4, respectively. In addition, relatively low levels of Birnaviridae (infectious bursal disease virus) were observed in tracheal samples from Weeks 4, 6, and 7. Unlike the DNA viruses that were observed later during flock growth, the relative abundances of these viruses remained low.

Figure 3.4A compares the avian respiratory viral microbiome of newly placed chickens (Week 0), corresponding to the microbial environment of a commercial hatchery, with that of birds that have spent 1 week on litter (Week 1) and that of mature broiler chickens at the time of processing (Week 7). An examination of the viral microbiome (Figure 3.4A, Table S4) revealed trace amounts of Coronaviridae (0.38%) and Herpesviridae (0.17%) at hatch. After 1 week, Coronaviridae are well established in the birds. After 6 more weeks, a more diverse and complex viral environment was observed, in which avian adenovirus and infectious bronchitis virus dominate. Average normalized abundance was calculated at each taxonomic level (Figure 3.5A, Table S5). Alpha diversity was also calculated at each week using the Shannon Diversity Index (Table 3.2). Both RNA viruses and DNA viruses exhibited their lowest diversity at placement (Week 0, H = 0.041 and H = 0.000, respectively). DNA viruses saw an increase in diversity at Week 4 and exhibited their highest diversity at Week 7 (H = 0.867). The RNA virus population exhibited the highest diversity at Week 6 (H = 1.480).

3.3.2 Bacterial Diversity

A total of 50,181 reads were obtained from sequencing the V4 hypervariable region of the 16S rRNA gene (Table S1). Week 2 and Week 6 were omitted from analyses due to low numbers of processed reads. Processing and analysis were performed on the samples following the protocol discussed in the Materials and Methods section. Following processing, a total of 353 operational taxonomic units (OTUs) were obtained.


A total of 24 unique bacterial genera were identified, spanning 4 phyla, 7 classes, 13 orders and 24 families (Table S6). Average abundance was calculated in a similar manner to the previous analyses. The phylum Firmicutes made up most of the bacteria with an average abundance of 56.17%, followed by Proteobacteria (39.28%), Actinobacteria (24.00%) and Bacteroidetes (5.78%). The relative abundances of all phyla, classes, orders, families, genera and species are available in Table S7. Within the Firmicutes phylum, was the most abundant class with an average abundance of 49.02% followed by Clostridia (7.15%). Within the Proteobacteria phylum, was the most abundant class with an average abundance of 36.28% followed by (3.00%). Actinobacteria (24.00%) was the only class in the Actinobacteria phylum. The Bacteroidia (3.38%) and Flavobacteria (2.40%) classes made up the Bacteroidetes phylum (Figure 3.5B).

A comparison of the Week 0 to the Week 1 bacterial microbiome (Figure 3.4B) reveals that the Bacteroidetes present at hatch (9.00%) are absent by Week 1 and the Actinobacteria and Proteobacteria are significantly reduced. The Firmicutes nearly double in abundance by Week 1 at the expense of these three phyla. By the end of the grow out cycle, more balanced populations of Proteobacteria (37.20%), Actinobacteria (27.20%) and Firmicutes (33.40%) are observed. Calculations of alpha diversity (Table 3.2) showed a consistently diverse bacterial population, which is highest at placement and lowest near the end of the grow out period.

We also investigated the frequency of specific bacterial genera during the grow out cycle (Table S6). Three different population patterns were observed. At placement, several genera from all four phyla are represented. Representing the Actinobacteria are the Corynebacteriaceae (6%), Brevibacterium (9%), Brachybacterium (8%) and Yaniella (1%). These are observed in lower abundance throughout flock growth. At placement, the Proteobacteria are predominantly represented by Pseudomonas (13%), which is not found at significant levels after Week 1. As shown in Figure 3.4B, Week 0 is the time when significant numbers of Bacteroidetes are observed (Chryseobacterium, 7% and Alloprevotella, 2%). The predominant Firmicutes seen at placement, and consistently observed at high levels throughout growth, are the Lactobacilli (5.1%).

Once established on litter, the avian bacterial respiratory microbiome is consistent for the first 4 weeks and is dominated by the Lactobacilli, averaging almost 40% of the detected OTUs. Other and Staphylococcus from the Firmicutes as well as Actinobacteria phyla are also consistently observed. By Week 7, a significant shift to the Proteobacteria and Actinobacteria occurs in the respiratory tract (Figure 3.4B). While the relative abundance of Lactobacilli drops to 14.8%, significant numbers of Gallibacterium (37%) and Corynebacteriaceae (22%) are now present (Table 3.7).

3.3.3 Bacteriophage Diversity

The unmapped DNA sequences were also aligned to a bacteriophage database consisting of 3,429 complete genome sequences. A total of 504,682 reads aligned to bacteriophage genomes (Table S1). A total of 31 unique bacteriophage species were identified, spanning 1 classified and 1 unclassified order, 3 classified and 1 unclassified families, and 8 classified and 4 unclassified genera (Table S8). Normalized abundance, percent relative abundance and average abundance were calculated in a manner similar to the previous analyses (Tables S8-S10). Of the classified families of bacteriophage observed, the Myoviridae were the most abundant with an average normalized abundance of 70.99%, followed by Podoviridae (40.19%) and Siphoviridae (31.16%) (Figure 3.4C; Figure 3.5C; Table S8). The most abundant species of bacteriophage was Enterobacteria phage RB55 with an average normalized abundance of 39.16%.

We also investigated the frequency of specific bacteriophage species observed during the grow out cycle (Figure 3.6; Table S10). Salmonella phage RE-2010, Enterobacteria phage IME10, Enterobacteria phage T7, Enterobacteria phage VT2phi_272, Escherichia phage TL-2011b, Stx2 converting phage vB_EcoP_24B and Stx2-converting phage 1717 were detected in all eight weeks, whereas Salmonella phage SJ46, Shigella phage SfIV, Enterobacteria phage lambda, and Enterobacteria phage YYZ-2008 appeared in seven of the eight weeks. Alpha diversity was calculated at each week using the Shannon Diversity Index (Table 3.2). Samples exhibited the lowest bacteriophage diversity at Week 2 (H = 2.111) and the highest diversity at Week 3 (H = 2.922). When comparing bacteriophage families from hatching to processing, Myoviridae increased by 25.02% while Podoviridae and unclassified bacteriophage decreased by 23.27% and 3.89%, respectively (Figure 3.4C).
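Species frequency, as used above, is the number of weekly samples in which a species was detected. A minimal Python sketch of that tally, together with the set operations behind a Venn diagram of weeks, is given here. The detection sets are hypothetical and only borrow species names from the text, and the function names are illustrative.

```python
from collections import Counter

# Hypothetical detections for illustration only: week -> species detected.
detections = {
    0: {"Enterobacteria phage T7", "Salmonella phage SJ46",
        "Enterobacteria phage P88"},
    3: {"Enterobacteria phage T7", "Salmonella phage SJ46"},
    7: {"Enterobacteria phage T7", "Enterobacteria phage RB55"},
}


def species_frequency(by_week):
    """Count the number of weeks in which each species was detected."""
    return Counter(sp for found in by_week.values() for sp in found)


def detected_every_week(by_week):
    """Species present in all sampled weeks (the central Venn region)."""
    return set.intersection(*by_week.values())


def unique_to_week(by_week, week):
    """Species detected in the given week and in no other week."""
    others = set().union(*(s for w, s in by_week.items() if w != week))
    return by_week[week] - others


if __name__ == "__main__":
    for species, n_weeks in species_frequency(detections).most_common():
        print(f"{species}: {n_weeks} of {len(detections)} weeks")
    print("All weeks:", sorted(detected_every_week(detections)))
    print("Week 7 only:", sorted(unique_to_week(detections, 7)))
```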


Correlations between detected bacteria and bacteriophage were analyzed based on the Pearson coefficient of correlation. A total of 55 correlations were calculated between 4 families of bacteriophage and 11 families of bacteria (Figure 3.7). Strong, positive correlations were observed between the bacteriophage families and bacterial families present in the trachea. (R = 0.459), Podoviridae (R = 0.743) and the unclassified bacteriophage (R = 0.887) exhibited strong positive correlations with the Dermabacteraceae. Podoviridae showed positive correlations with the

Brevibacteriaceae (R = 0.802), the Pseudomonadaceae (R = 0.853), Flavobacteriaceae (R = 0.788) and Streptococcaceae (R = 0.812), which are found in the avian trachea at hatch. Siphoviridae had a positive correlation with the Staphylococcaceae (R = 0.641), which are found in the trachea throughout growth.

3.3.4 Fungal Diversity

The unmapped DNA sequences were also aligned to a fungal database consisting of 1,281 genomes. A total of 1,964 reads aligned to fungal genomes (Table S1). A total of 61 unique fungal species were identified, spanning 2 phyla, 9 classes, 20 orders, 37 families and 50 genera (Table S11). Normalized abundance, percent relative abundance and average abundance were calculated in a manner similar to the previous analyses (Tables S11-S13). Of the 2 phyla, Ascomycota was by far the most abundant. The average abundances of all phyla, classes, orders, families, genera and species are available in Table S11. Within the Ascomycota phylum, the most


abundant class of fungi was Saccharomycetes with an average abundance of 98.76%. Within the Basidiomycota phylum, the most abundant classes were Agaricomycetes with an average abundance of 0.05% and Tremellomycetes (0.03%).

We also investigated the frequency of specific fungal species observed during the grow out cycle (Figure 3.6; Table S13). Laccaria bicolor, Penicillium chrysogenum and Wickerhamomyces ciferrii were detected in all eight weeks whereas Tetrapisispora phaffii and Aspergillus oryzae appeared in seven of the eight samples. Twenty-six of the 61 fungal species were detected in only one sample. We also observed no shifts in fungal communities during the experiment (Figure 3.4D, Figure 3.5D). Alpha diversity was calculated at each week using the Shannon Diversity Index (Table 3.2). The dominance of a single fungal species resulted in low alpha diversity in each sample.

3.3.5 The Avian Microbiome

The development of NGS approaches, accompanied by the further development of novel computational and bioinformatics tools, enabled us to examine the evolution of the microbial ecology of the avian trachea (eukaryotic viruses, bacteria, bacteriophage, and fungi) during the growth of this commercial flock. Figure 3.6 is a representation of the complex ecology of the respiratory microbiome of the broiler chicken. In this microbial network, nodes of bacteria, bacteriophage, eukaryotic


viruses, and fungi are arranged by order, while the diameter of the node depicts taxa frequency from 1-8 samples.

3.4 Discussion

A detailed characterization of the bacterial microbiota of the commercial broiler chicken was recently published [13]. That study examined the core bacterial microbiota of the broiler gastrointestinal, respiratory, and barn environments. Lactobacillus was found to be the dominant bacterial taxon of the trachea, although the trachea was also found to contain Staphylococcus, Streptococcus, Ruminococcus, and Xanthomonas. It was conducted as a longitudinal study from Day 7 to Day 42 and utilized multiple flocks so that microbiome composition could be correlated with performance [13]. The goal of our study was to expand the characterization of the broiler respiratory microbiome beyond the bacterial component. Although emphasizing the eukaryotic virome, another objective was to use next generation sequence data to determine the bacteriophage and fungal composition of the avian respiratory tract. This required the development of a unique bioinformatics tool (BiomeSeq) that utilizes a sequence-dependent approach [22]. To determine and quantify the relative abundance of microbial elements, RNA-Seq and DNA-Seq derived sequences were initially aligned to the avian genome, followed by aligning the remaining sequences to avian-virus specific databases, a bacteriophage database, and a fungi database. Alignment to a host genome sequence followed by


alignment to specific microbial databases allowed the microbial community in a given sample to be quantified, a unique property of this sequence-dependent bioinformatics approach. This does not preclude the use of the same data in a sequence-independent manner, allowing for a more traditional metagenomics approach that can be used to create contigs that could then be utilized to identify and sequence novel viral elements. Using either method, functional genes and metabolic pathways can be identified using BLASTP and KEGG databases [23].

Previous avian virome studies have focused on the RNA virus community of the avian gut [24, 25]. Tracheal swabs of the avian respiratory tract are not amenable to traditional viral enrichment strategies such as centrifugation or filtration because of their small volume, the relatively low viral concentrations in the samples, and the nuclease-rich environment. For this study we pooled twelve swabs in two samples to increase yield and collected the swab material in a chaotropic buffer that was rapidly frozen in order to preserve the integrity of viral RNA. In addition, the decision to utilize a sequence-dependent approach to analyze the sequencing data necessitated the development of specific databases. For the eukaryotic virome, an avian virus-specific whole genome database was developed (Table 3.1). Representative whole genomes from 22 viral families (9 DNA viruses, 13 RNA viruses) are included in the database. Once chicken genome sequences are removed from the RNA-Seq and DNA-Seq library fastq files, alignment to the avian-specific viral database and subsequent analysis is rapid and efficient.


An examination of the avian respiratory viral microbiome confirmed the presence of a dynamic and diverse community. The commercial broiler flock utilized in this study was vaccinated in ovo with a live Marek’s disease virus vaccine (SB-1) and a live recombinant herpesvirus of turkeys (HVT) vaccine expressing Newcastle disease virus genes. At hatch, chicks were also vaccinated by spray with a multivalent infectious bronchitis virus (avian coronavirus) vaccine before placement. The persistent detection of herpesviruses and coronaviruses in the respiratory tract is consistent with vaccination with these live vaccines, coupled with the expected presence of these avian viruses in the environment. As predicted, the complexity and diversity of the viral community increased as the birds aged. Of particular note is the appearance of infectious bursal disease virus (Birnaviridae) at Week 6 and chicken anemia virus (Circoviridae) at Week 4. Broiler breeders are vaccinated in order to maximize the amount of maternal antibodies to these potential pathogens in the newly hatched chick. By Week 4, maternal antibody levels should be reduced to the level where colonization of the respiratory tract by these viruses is likely. However, by Week 4, the avian adaptive immune system has matured. Consequently, the relative abundance of chicken anemia virus is highest when it is first observed (57.97% of the detected chicken anemia virus sequences; Figure 3.3). The rapid reduction in the amount of these viruses in the respiratory tract is most likely due to the activation of the avian adaptive immune system. A similar pattern is seen with the appearance in Week 6 of avian adenovirus in the avian respiratory tract. Avian adenoviruses are commonly isolated from the avian respiratory tract. Finally, picornaviruses and


astroviruses are commonly found in the digestive tract of chickens [25], not the respiratory tract. However, it is not surprising that representatives from these virus families would be transiently observed in the respiratory tract during their initial colonization of the bird.

Consistent with the observations of Johnson et al. (2018), we observed that the bacterial microbiome of the avian respiratory tract was dominated by the Lactobacilli [13]. The bacterial microbiome of the newly hatched chick was more complex than expected. Although Lactobacilli (6.1%) were the dominant Firmicutes, Bacteroidetes (Chryseobacterium, 7%), Proteobacteria (Pseudomonas, 13%), and Actinobacteria (Brevibacterium, 9%) were also observed in significant numbers. The majority of the bacteriophage found in the avian respiratory tract was Enterobacteria phage RB55 of the Myoviridae family. The presence of this bacteriophage correlated with

Gallibacterium (Pasteurellaceae), an abundant bacterial genus found in the last two weeks of growth. Interactions between bacteriophage and bacteria are known to have a significant impact on host health [23]. Bacteriophage may also help control bacterial populations, influencing bacterial diversity and contributing to the dysbiosis of the respiratory microbiota during disease. Little is known about the diversity and role of fungi in the avian respiratory tract, and further studies are needed to determine the relevance of the high normalized relative abundance of the Ascomycota observed in this flock.

This approach utilized a longitudinal study of one commercial antibiotic-free broiler flock to develop the tools needed for a comprehensive analysis of the


microbial ecology of the avian respiratory tract (Figure 3.6). This initial study should be confirmed and expanded by examining the respiratory microbiome of multiple flocks from multiple companies, by examining multiple grow-out cycles from the same flock in order to determine seasonal effects and flock consistency, and by examining flocks grown under different production systems (antibiotic free, organic, free range, and traditional). These results could also be compared with multi-age backyard flocks from the same geographic area. Finally, efforts are underway to compare the ecology of the respiratory microbiome of birds exhibiting respiratory disease complex (RDC) to the control avian respiratory microbiome. We observe significant changes in the composition of the eukaryotic virome and the bacterial microbiome consistent with the complex etiology of this disease (manuscript in preparation).

3.5 Materials and Methods

3.5.1 Sample Collection

Tracheal swabs were collected at placement and at weekly intervals through processing at day 49 (8 samples) from an antibiotic-free commercial broiler flock grown in the Jones-Hamilton facility at the University of Delaware Carvel Research and Education Center. At each time point, two samples, each containing six individual swabs in 3 ml of buffer PV1 (Qiagen), were collected, frozen immediately on dry ice, and stored at -80°C until use.


3.5.2 Nucleic Acid Extraction and Sequencing

After thawing on ice, the pooled samples were gently homogenized, split into two tubes and then centrifuged (7,000 × g; 5 minutes; 4°C) to form pellets. Total RNA was isolated from one pellet using the Qiagen (previously MoBio) Viral Nucleic Acid extraction kit following the manufacturer’s protocol. DNA was isolated from the duplicate pellet using the Qiagen Blood and Tissue Kit following the manufacturer’s protocol. Both DNA and RNA sequencing were performed for each time point by the University of Delaware Sequencing Core Facility using the Illumina HiSeq platform, producing 1 × 100 single-end reads.

3.5.3 16S rRNA Amplicon Sequencing and Analysis

The V4 hypervariable region of the bacterial 16S rRNA gene was extracted and amplified using PCR with primers 515F (5′-GTGCCAGCMGCCGCGGTAA-3′) and 806R (5′-GGACTACHVGGGTWTCTAAT-3′), as previously described [13, 27]. The first PCR reaction consisted of an initial denaturation step at 95°C for 5 minutes, followed by 25 cycles of 98°C for 20 seconds, 55°C for 15 seconds, and 72°C for 1 minute, with a final extension at 72°C for 5 minutes. The product was diluted 1:100 and used in a second PCR reaction. The second PCR reaction consisted of an initial denaturation step at 95°C for 5 minutes, followed by 10 cycles of 98°C for 20 seconds, 55°C for 15 seconds, and 72°C for 1 minute, with a final extension at 72°C for 5 minutes. The pooled, size-selected sample was denatured with NaOH,


diluted to 8 pM in Illumina’s HT1 buffer, spiked with 20% PhiX, and heat denatured at 96°C for 2 minutes immediately prior to loading. The amplicons were sequenced at the University of Minnesota Genomics Center (Minneapolis, MN) using an Illumina MiSeq 600 cycle v3 kit.

Following sequencing, samples were sorted by barcode to generate individual fastq files. Each sample was assessed for quality and assembled into contigs using PEAR’s default parameters, with the modification that the quality score threshold was set to 30. Samples were further filtered and analyzed using Mothur version 1.35.1 [20] and the MiSeq SOP [26]. OTUs were generated using 97% sequence similarity. Mothur’s implementation of the SILVA database (v123) was used for classification of OTUs.

Alpha-diversity was measured using the Shannon diversity index [27, 28]. Relative abundance, mean relative abundance and genera frequency were also calculated. These data were represented by pie charts, phylogenetic trees and networks using the R library GraPhlAn and Cytoscape [29, 30]. A Pearson correlation matrix between bacteria and bacteriophage was constructed using the R library corrplot [31].
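As a minimal sketch of the alpha-diversity measure named above, the Shannon index H = -Σ p_i ln(p_i) can be computed from a list of taxon abundances. The function name and example counts are hypothetical, the use of the natural logarithm is an assumption (the text does not state the base), and this is not the code used in the study.

```python
import math


def shannon_index(abundances):
    """Return the Shannon diversity index H for a list of taxon abundances.

    Assumes natural-log units; zero-abundance taxa contribute nothing.
    """
    total = sum(abundances)
    if total <= 0:
        return 0.0
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total          # proportion of this taxon in the sample
            h -= p * math.log(p)   # accumulate -p * ln(p)
    return h


# Hypothetical counts: an even community versus one dominated by a single taxon.
even_sample = [25, 25, 25, 25]
dominated_sample = [97, 1, 1, 1]
h_even = shannon_index(even_sample)
h_dominated = shannon_index(dominated_sample)
```

A sample dominated by one taxon yields a value near zero, which matches the low fungal diversity reported in Table 3.2.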

3.5.4 Eukaryotic Virus, Bacteriophage and Fungal Analysis

Raw DNA-Seq and RNA-Seq reads were processed using BiomeSeq [22]. Individual sequence files were first analyzed for per-base sequence quality, per-sequence quality, sequence length distribution and duplicate sequences. Reads with a quality Phred score below 30, reads under 100 base pairs and reads containing only


adapter sequences were removed. The remaining reads were then aligned to the reference host genome (Gallus gallus; Annotation Release 104) using the Burrows-Wheeler Alignment algorithm [32-34]. Only unmapped reads were extracted and analyzed further. This step removes host genome contamination from the data, increasing analytical efficiency [35]. Determining the amount of host genome sequence in the library is also required when quantifying the results. The remaining reads were then aligned to microbial databases including a bacteriophage, a fungal and an avian-specific viral genome database using Bowtie2.
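The read-filtering criteria stated above (Phred score of at least 30, length of at least 100 base pairs) can be illustrated with a short sketch. It assumes that the Phred criterion applies to the mean quality of a read and that qualities use the Phred+33 ASCII encoding; neither detail is stated in the text. The function names are hypothetical, and adapter-only removal is not shown.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    if not quality_string:
        return 0.0
    return sum(ord(c) - offset for c in quality_string) / len(quality_string)


def passes_filter(sequence, quality_string, min_quality=30, min_length=100):
    """Keep a read only if it is long enough and its mean quality is high enough."""
    if len(sequence) < min_length:
        return False
    return mean_phred(quality_string) >= min_quality


# Hypothetical reads: 'I' encodes Phred 40 and '5' encodes Phred 20 under Phred+33.
good_read = passes_filter("A" * 100, "I" * 100)
short_read = passes_filter("A" * 99, "I" * 99)
low_quality_read = passes_filter("A" * 100, "5" * 100)
```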

The avian-specific viral genome database contains full genome reference sequences of both DNA and RNA avian viruses obtained from the National Center for Biotechnology Information (NCBI) reference sequences. The avian DNA viral database contains 48 viral elements from 9 unique families and the avian RNA viral database contains 63 viral elements from 13 families (Table 3.1). The avian DNA and RNA viral databases are organized by the classification of their viral structure and genome organization. DNA viruses are organized hierarchically by whether the virus is double- or single-stranded and whether the virus is enveloped or non-enveloped. RNA viruses are organized hierarchically by whether the virus is double- or single-stranded, negative or positive sense, segmented or non-segmented and whether the virus is enveloped or non-enveloped. The reads were also aligned to the default bacterial, fungal and bacteriophage databases provided by BiomeSeq, which contain


complete and representative genomes obtained from the NCBI Reference Sequence Database and consist of 3,623, 1,281 and 2,212 genomes, respectively [11].

A sequence similarity-dependent approach for detecting microbes, such as this, contributes to the rapid detection of known microbes while also allowing for the quantification of biodiversity which similarity-independent approaches lack [36]. For each individual sample, the reads that mapped to each microbe were normalized based on the genome length of both microbe and reference per 100,000 host cells using the following equation [37]:

\[
\text{Abundance} = \frac{2 \times \dfrac{\text{number of reads mapped to microbial genome}}{\text{microbe genome size}}}{\dfrac{\text{number of reads mapped to chicken genome}}{\text{chicken genome size}}} \times 10^{5}
\]
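The normalization described above (mapped reads scaled by genome length for both microbe and host, expressed per 100,000 host cells) can be sketched as follows. The function and parameter names are hypothetical, the factor of 2 is taken here to reflect a diploid host genome (an assumption), and the read counts and genome sizes in the example are illustrative only.

```python
def normalized_abundance(microbe_reads, microbe_genome_size,
                         host_reads, host_genome_size,
                         ploidy=2, per_cells=100_000):
    """Estimate microbial genome copies per `per_cells` host cells.

    Each read count is divided by its genome size so that long genomes are
    not over-represented; the ratio is then scaled by host ploidy.
    """
    if microbe_genome_size <= 0 or host_genome_size <= 0:
        raise ValueError("genome sizes must be positive")
    if host_reads <= 0:
        raise ValueError("host_reads must be positive")
    microbe_coverage = microbe_reads / microbe_genome_size
    host_coverage = host_reads / host_genome_size
    return ploidy * microbe_coverage / host_coverage * per_cells


# Illustrative values only: 500 reads on a 170 kb phage genome,
# 20 million reads on a 1 Gb host genome.
example = normalized_abundance(500, 170_000, 20_000_000, 1_000_000_000)
```

Dividing by genome length before taking the ratio is what allows abundances to be compared across microbes whose genomes differ in size by orders of magnitude.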

Relative microbial abundance, mean relative abundance and species frequency were also calculated. These data were represented by pie charts, phylogenetic trees and networks using the R library GraPhlAn and Cytoscape [29, 30]. Alpha diversity was measured using the Shannon diversity index [27]. In addition, stacked bar plots and heatmaps were generated with the R library phytools [38]. A Pearson correlation matrix between bacteria and bacteriophage was constructed using the R library corrplot [31].
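The Pearson correlation matrix mentioned above pairs each bacteriophage family with each bacterial family across the weekly samples. A minimal sketch of that computation is shown below; the study used R, so this Python version is only an illustration, the function names are hypothetical, and the abundance series are invented for the example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(phage, bacteria):
    """Nested dict of r values: one row per phage family, one column per bacterial family."""
    return {p: {b: pearson_r(pv, bv) for b, bv in bacteria.items()}
            for p, pv in phage.items()}


# Invented weekly abundances (not data from this study).
phage = {"Podoviridae": [60.0, 45.0, 40.0, 36.0]}
bacteria = {"Pseudomonadaceae": [13.0, 9.0, 8.0, 7.0],
            "Staphylococcaceae": [2.0, 5.0, 6.0, 7.0]}
matrix = correlation_matrix(phage, bacteria)
```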


Figure 3.1. Normalized relative abundance of detected DNA and RNA viral species at each time point. * No DNA viruses detected at Week 1

[Panels A and B, each spanning Week 0 to Week 7.]


Figure 3.2. Heat map with phylogenetic tree representing the detection intensity of viral families at each individual week. Color corresponds to the range of relative abundance of each week from 0 to 100%. The sum of each column, or week, is 100%.

[Columns W0 to W7. Cell values (%) as extracted, in left-to-right order; empty cells are omitted.]

DNA viruses
Herpesviridae: 0.17, 0.04, 0.26, 0.15, 0.07, 0.01
Adenoviridae: 53.30, 6.70
Genomoviridae: no values shown
Circoviridae: 88.66, 12.05, 15.47, 36.77

RNA viruses
Birnaviridae: 0.01, 2.33, 0.10
Astroviridae: 0.74
Picornaviridae: 0.17, 0.01
Retroviridae: 99.45, 44.49, 41.02, 83.30, 9.29, 64.27, 7.11, 37.11
Coronaviridae: 0.38, 54.76, 58.95, 16.28, 1.88, 23.60, 21.79, 19.32


Figure 3.3. Heat map with phylogenetic tree representing the detection intensity of each viral family from hatching to processing. Color intensity corresponds to the range of relative abundance of each family from 0 to 100%. The sum of each row, or viral family, is 100%.

[Columns W0 to W7. Cell values (%) as extracted, in left-to-right order; empty cells are omitted.]

DNA viruses
Herpesviridae: 24.86, 5.30, 37.27, 21.54, 10.16, 0.87
Adenoviridae: 88.84, 11.16
Circoviridae: 57.97, 7.88, 10.11, 24.04
No values shown: Poxviridae, Hepeviridae, Hepadnaviridae, Genomoviridae, Parvoviridae, Smacoviridae

RNA viruses
Birnaviridae: 0.32, 95.39, 4.28
Astroviridae: 100.00
Picornaviridae: 97.06, 2.94
Retroviridae: 25.76, 11.53, 10.63, 21.58, 2.41, 16.65, 1.84, 9.61
Coronaviridae: 0.19, 27.80, 29.93, 8.26, 0.96, 11.98, 11.06, 9.81
No values shown: Reoviridae, Orthomyxoviridae, Phenuiviridae, Bornaviridae, Pneumoviridae, Paramyxoviridae, Caliciviridae, Flaviviridae
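The two heat maps differ only in the direction of normalization: Figure 3.2 sums to 100% within each week, whereas Figure 3.3 sums to 100% within each family. As a worked check, the Coronaviridae values reported for Figure 3.3 appear to be reproduced (to within rounding) by rescaling the Coronaviridae values reported for Figure 3.2 so that they sum to 100%. This is an observation from the reported numbers, not a description of the code used in the study, and the function name is hypothetical.

```python
def rescale_to_percent(values):
    """Rescale a list of non-negative values so that they sum to 100."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [100.0 * v / total for v in values]


# Coronaviridae, Weeks 0-7, as reported for Figure 3.2 (percent of each week).
coronaviridae_by_week = [0.38, 54.76, 58.95, 16.28, 1.88, 23.60, 21.79, 19.32]

# Coronaviridae, Weeks 0-7, as reported for Figure 3.3 (percent of the family total).
reported_row = [0.19, 27.80, 29.93, 8.26, 0.96, 11.98, 11.06, 9.81]

rescaled = rescale_to_percent(coronaviridae_by_week)
largest_gap = max(abs(a - b) for a, b in zip(rescaled, reported_row))
```

The small residual differences are expected because the inputs are themselves rounded to two decimal places.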


Figure 3.4. Abundance of A) virus, B) bacteria, C) bacteriophage and D) fungi at Week 0, 1 and 7. Taxa represented at family (A and C) and phylum (B and D).

[Pie charts for Week 0, Week 1 and Week 7 in four rows: A) Virus, B) Bacteria, C) Bacteriophage, D) Fungi.]


Figure 3.5. Phylogenetic tree of A) virus, B) bacteria, C) bacteriophage and D) yeast and fungi. Node diameter indicates average abundance at species (A, C and D) and genera (B) level. Taxonomic levels range from phylum to genera (B), order to species (C) and phylum to species (D). Viruses are organized according to structural classification (A).

[Panels A to D. Labeled taxa: Herpesviridae, Birnaviridae, Adenoviridae, Astroviridae, Circoviridae, Picornaviridae, Retroviridae, Coronaviridae; Podoviridae, Siphoviridae, Myoviridae, Unclassified; Dothideomycetes, Eurotiomycetes, Lecanoromycetes, Leotiomycetes, Schizosaccharomycetes, Agaricomycetes, Tremellomycetes, Sordariomycetes, Saccharomycetes.]


Figure 3.6. Correlation matrix comparing bacteria and bacteriophage taxa at the family level. Node diameter corresponds to level of correlation. Node color corresponds to the Pearson correlation coefficient and ranges from -1 to 1 indicated by red and blue, respectively.


Figure 3.7. Microbial network of the complete healthy avian respiratory microbiome including detected RNA viruses, DNA viruses, yeast and fungi, bacteria, and bacteriophage. Taxa nodes are arranged by order. Node diameter correlates to taxa frequency.

[Network titled "Healthy Avian Respiratory Microbiome" with node groups: Fungi, RNA Virus, DNA Virus, Bacteriophage, Bacteria.]


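Table 3.1. Avian specific viral genome database structure.

Database | Classification (a, b, c, d) | Virus Family | Complete Genomes
Avian DNA Viral Database | Double/Single Stranded, Enveloped | Hepeviridae | 1
Avian DNA Viral Database | Double/Single Stranded, Enveloped | Hepadnaviridae | 1
Avian DNA Viral Database | Single Stranded, Non-Enveloped | Genomoviridae | 3
Avian DNA Viral Database | Single Stranded, Non-Enveloped | Parvoviridae | 7
Avian DNA Viral Database | Single Stranded, Non-Enveloped | Circoviridae | 10
Avian DNA Viral Database | Single Stranded, Non-Enveloped | Smacoviridae | 3
Avian DNA Viral Database | Double Stranded, Enveloped | Poxviridae | 3
Avian DNA Viral Database | Double Stranded, Enveloped | Herpesviridae | 6
Avian DNA Viral Database | Double Stranded, Non-Enveloped | Adenoviridae | 14
Avian RNA Viral Database | Double Stranded, Segmented, Non-Enveloped | Reoviridae | 5
Avian RNA Viral Database | Double Stranded, Segmented, Non-Enveloped | Birnaviridae | 1
Avian RNA Viral Database | Single Stranded, Positive, Non-Segmented, Enveloped | Retroviridae | 5
Avian RNA Viral Database | Single Stranded, Positive, Non-Segmented, Enveloped | Flaviviridae | 3
Avian RNA Viral Database | Single Stranded, Positive, Non-Segmented, Enveloped | Coronaviridae | 5
Avian RNA Viral Database | Single Stranded, Positive, Non-Segmented, Non-Enveloped | Astroviridae | 5
Avian RNA Viral Database | Single Stranded, Positive, Non-Segmented, Non-Enveloped | Caliciviridae | 1
Avian RNA Viral Database | Single Stranded, Positive, Non-Segmented, Non-Enveloped | Picornaviridae | 17
Avian RNA Viral Database | Single Stranded, Negative, Segmented, Enveloped | Orthomyxoviridae | 16
Avian RNA Viral Database | Single Stranded, Negative, Segmented, Enveloped | Phenuiviridae | 1
Avian RNA Viral Database | Single Stranded, Negative, Non-Segmented, Enveloped | Bornaviridae | 3
Avian RNA Viral Database | Single Stranded, Negative, Non-Segmented, Enveloped | Pneumoviridae | 1
Avian RNA Viral Database | Single Stranded, Negative, Non-Segmented, Enveloped | Paramyxoviridae | 14

a single stranded, double stranded or single/double stranded DNA and RNA viruses
b positive-sense or negative-sense RNA viruses
c segmented or non-segmented RNA viruses
d enveloped or non-enveloped DNA and RNA viruses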


Table 3.2. Shannon diversity of respiratory microbes in a healthy broiler flock over time.

Time        RNA Virus   DNA Virus   Bacteria   Phage   Fungi
Placement   0.041       0.000       2.707      2.218   0.022
Week 1      1.290       0.000       2.468      2.531   0.286
Week 2      1.108       0.000       -          2.111   0.096
Week 3      0.722       0.000       2.251      2.922   0.165
Week 4      0.738       0.013       2.499      2.756   0.151
Week 5      0.910       0.035       1.925      2.294   0.026
Week 6      1.480       0.534       -          2.134   0.013
Week 7      1.092       0.867       1.935      2.087   0.134


REFERENCES

1. Human Microbiome Project, C., A framework for human microbiome research. Nature, 2012. 486(7402): p. 215-21.

2. Bosch, A.A., et al., Viral and bacterial interactions in the upper respiratory tract. PLoS Pathog, 2013. 9(1): p. e1003057.

3. Bakaletz, L.O., Viral potentiation of bacterial superinfection of the respiratory tract. Trends Microbiol, 1995. 3(3): p. 110-4.

4. Pettigrew, M.M., et al., Microbial interactions during upper respiratory tract infections. Emerg Infect Dis, 2008. 14(10): p. 1584-91.

5. de Steenhuijsen Piters, W.A., et al., Nasopharyngeal Microbiota, Host Transcriptome, and Disease Severity in Children with Respiratory Syncytial Virus Infection. Am J Respir Crit Care Med, 2016. 194(9): p. 1104-1115.

6. Teo, S.M., et al., The infant nasopharyngeal microbiome impacts severity of lower respiratory infection and risk of asthma development. Cell Host Microbe, 2015. 17(5): p. 704-15.

7. Gross, W.B., Factors affecting the development of respiratory disease complex in chickens. Avian Dis, 1990. 34(3): p. 607-10.


8. Kleven, S.H., Mycoplasmas in the etiology of multifactorial respiratory disease. Poult Sci, 1998. 77(8): p. 1146-9.

9. Bond, S.L., et al., Upper and lower respiratory tract microbiota in horses: bacterial communities associated with health and mild asthma (inflammatory airway disease) and effects of dexamethasone. BMC Microbiol, 2017. 17(1): p. 184.

10. De Boeck, C., et al., Longitudinal monitoring for respiratory pathogens in broiler chickens reveals co-infection of Chlamydia psittaci and Ornithobacterium rhinotracheale. J Med Microbiol, 2015. 64(5): p. 565-574.

11. Gaeta N, L.S., Teixeira A, Ganda E, Oikonomou G, Gregory L, Bicalho R., Deciphering upper respiratory tract microbiota complexity in healthy calves and calves that develop respiratory disease using shotgun metagenomics. J Dairy Sci, 2017. 100: p. 1445-1458.

12. Glendinning, L., G. McLachlan, and L. Vervelde, Age-related differences in the respiratory microbiota of chickens. PLoS One, 2017. 12(11): p. e0188455.

13. Johnson TJ, Y.B., Noll S, Cardona C, Evans NP, Karnezos P, Ngunjiri JM, Abundo MC, Lee C-W, A consistent and predictable commercial broiler chicken bacterial microbiota in antibiotic-free production displays strong correlations with performance. Appl Environ Microbiol, 2018. 84: p. e00362-18.


14. Shabbir, M.Z., et al., Microbial communities present in the lower respiratory tract of clinically healthy birds in Pakistan. Poult Sci, 2015. 94(4): p. 612-20.

15. Clarridge, J.E., 3rd, Impact of 16S rRNA gene sequence analysis for identification of bacteria on clinical microbiology and infectious diseases. Clin Microbiol Rev, 2004. 17(4): p. 840-62.

16. De Santis T, H.P., Larsen N, Rojas M, Brodie E, Keller K, Huber T, Dalevi D, Hu P, Andersen G., Greengenes, a chimera-checked 16S rRNA gene. 2016.

17. Quast C, P.E., Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glockner F., The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucl Acids Res, 2013. 41: p. 590-596.

18. Meyer F, P.D., D’Souza M, Olson R, Glass E, Kubal M, Paczian T, Rodriguez A, Stevens R, Wilke A, Wilkening J, Edwards R., The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics, 2008. 9: p. 386.

19. Caporaso J, K.J., Stombaugh J, Bittinger K, Bushman F, Costello E, Fierer N, Peña A, Goodrich J, Gordon J, Huttley G, Kelley ST, Knights D, Koenig JE, Ley R, Lozupone C, McDonald D, Muegge B, Pirrung M, Reeder J, Sevinsky JR, Turnbaugh P, Walters W, Widmann J, Yatsunenko T, Zaneveld J, Knight R., QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 2010. 7: p. 335-336.


20. Schloss P, W.S., Ryabin T, Hall J, Hartman M, Hollister E, Lesniewski R, Oakley B, Parks D, Robinson C, Sahl J, Stres B, Thallinger G, Van Horn D, Weber C., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75: p. 7537-7541.

21. Zou, S., et al., Research on the human virome: where are we and what is next. Microbiome, 2016. 4(1): p. 32.

22. Mulholland, K.A. and C.L. Keeler, BiomeSeq: A Tool for the Characterization of Animal Microbiomes from Metagenomic Data. bioRxiv, 2019: p. 800995.

23. Yang, S., et al., Metagenomic Analysis of Bacteria, Fungi, Bacteriophages, and Helminths in the Gut of Giant Pandas. Front Microbiol, 2018. 9: p. 1717.

24. Day, J.M., et al., Comparative analysis of the intestinal bacterial and RNA viral communities from sentinel birds placed on selected broiler chicken farms. PLoS One, 2015. 10(1): p. e0117210.

25. Day, J.M. and L. Zsak, Recent progress in the characterization of avian enteric viruses. Avian Dis, 2013. 57(3): p. 573-80.

26. Kozich J, W.S., Baxter N, Highlander S, Schloss P., Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform. Appl Environ Microbiol, 2013. 79: p. 5112-5120.

27. Lemos, L.N., et al., Rethinking microbial diversity analysis in the high throughput sequencing era. J Microbiol Methods, 2011. 86(1): p. 42-51.

28. Ludwig, J. and J. Reynolds, Statistical Ecology, ed. Wiley. 1988, New York.

29. Asnicar, F., et al., Compact graphical representation of phylogenetic data and metadata with GraPhlAn. PeerJ, 2015. 3: p. e1029.

30. Shannon, P., et al., Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res, 2003. 13(11): p. 2498-504.

31. Wei, T. and V. Simko, corrplot: Visualization of a correlation matrix. R package version 0.73, 2013. 230(231): p. 11.

32. Hillier, L.W., et al., Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature, 2004. 432(7018): p. 695-716.

33. Langmead, B. and S.L. Salzberg, Fast gapped-read alignment with Bowtie 2. Nat Methods, 2012. 9(4): p. 357-9.

34. Li, H. and R. Durbin, Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 2010. 26(5): p. 589-95.


35. Daly G, L.R., Rowe W, Stubbs S, Wilkinson M, Ramirez-Gonzalez R, Mario C, Bernal W, Heeney J., Host subtraction, filtering and assembly validations for novel viral discovery using next generation sequencing data. PLoS One, 2015. 10(6).

36. Herath, D., et al., Assessing Species Diversity Using Metavirome Data: Methods and Challenges. Comput Struct Biotechnol J, 2017. 15: p. 447-455.

37. Moustafa, A., et al., The blood DNA virome in 8,000 humans. PLoS Pathog, 2017. 13(3): p. e1006292.

38. Revell, L.J., phytools: an R package for phylogenetic comparative biology (and other things). Methods in ecology and evolution, 2012. 3(2): p. 217-223.


Chapter 4

CHARACTERIZATION OF THE RESPIRATORY MICROBIOME OF CHICKENS WITH RESPIRATORY DISEASE

4.1 Summary

Respiratory diseases in commercial poultry are a clinical manifestation of a broader dysbiosis of the respiratory microbial community. Although the bacterial component of the healthy broiler chicken has recently been characterized, there are limited tools available to identify the eukaryotic virus, bacteriophage, and fungal composition of the broiler respiratory tract. BiomeSeq is a computational tool that utilizes a sequence similarity-dependent approach and a comprehensive workflow incorporating nucleotide data generated through high throughput sequencing platforms to determine the composition of the eukaryotic virus, bacterial, bacteriophage, and fungal microbiomes. This tool was used to determine the feasibility of generating a comprehensive assessment of the respiratory microbiome from birds diagnosed with infectious laryngotracheitis and/or respiratory disease complex. To that end, two samples were compared: a pooled sample of tracheal swabs collected from normal healthy 7-week-old broilers, and a pooled sample from three clinical submissions of poultry respiratory disease. It was confirmed that the diseased birds harbored


infectious laryngotracheitis virus (89% relative abundance in the diseased sample). A significant dysbiosis in the bacterial component of the microbiome was also observed.

An increase in the abundance of Escherichia coli, a loss of commensal bacteria such as Corynebacterium falsenii, and the introduction of Ornithobacterium rhinotracheale, a respiratory pathogen, were observed in the diseased tracheal samples.

Information learned about the respiratory microbiome using this approach can be represented as a microbial network, which can be used to make hypotheses about the relationships among the microbial components composing the microbial ecology of the avian respiratory tract.

4.2 Introduction

Since the Human Microbiome Project was initiated in 2008, considerable knowledge has been gained about the composition and function of the microbial species that inhabit different ecological niches in the human body [1]. These microbial communities, or “microbiomes”, interact with each other and with their host in order to benefit both systems, and imbalances in these microbial communities can be associated with specific diseases. For example, the lung microbiota of individuals with asthma demonstrate an increase in Proteobacteria [2] and a decrease in Firmicutes, Actinobacteria and Saccharibacteria [3]. While some studies have addressed the viral component of the gut, more recent studies have examined the blood virome in healthy and diseased individuals [4]. With regard to the respiratory tract, dysbiosis in the


nasopharynx may lead to the acquisition of novel viral pathogens or viral co-infections [5].

Research on the avian microbiome has primarily been directed at the prokaryotic component of the gastrointestinal tract, and several studies have clearly shown that the introduction of poultry pathogens negatively impacts the gut microbiome [6, 7]. Only recently has the healthy broiler chicken respiratory bacterial microbiome been characterized [8]. In determining the baseline microbial composition of the trachea in normal flocks, Lactobacillus was found to be the dominant taxon, and other bacterial taxa were also correlated with broiler performance. However, no systematic efforts have been made to identify the eukaryotic virus, bacteriophage, or fungal composition of the broiler respiratory tract.

Although defining the bacterial microbiome is crucial for understanding the relationship between health and disease, the inclusion of these other microbial components in the avian respiratory microbiome and an understanding of their interactions are required in order to develop more complete homeostatic or disequilibrium models. Respiratory diseases are a major cause of economic losses in poultry production. Commercial poultry are vaccinated in ovo and at hatch with a variety of live attenuated viral vaccines. Some vaccines increase the severity of infections with other viruses [9] while interference between viral vaccines has also been reported [10]. Under commercial conditions respiratory infections involving multiple agents are common.


Infectious laryngotracheitis virus (ILTV) is an alphaherpesvirus that causes mild to severe respiratory infections in chickens [11]. The virus enters through the upper respiratory tract or conjunctiva. Infection is generally limited to the nares, oropharynx, and trachea, while viremia is rarely observed [12]. Birds exhibiting the peracute form of the disease have difficulty breathing, gasp while extending their necks, may produce a bloody tracheal exudate, and exhibit high mortality. Mild forms of ILT may present with conjunctivitis, nasal discharge and coughing. Under commercial conditions, a clinical diagnosis of infectious laryngotracheitis (ILT) can be complicated by co-infections with other viral agents (either pathogens or vaccines) and/or the development of secondary bacterial infections. Combined with suboptimal environmental or management conditions, a multi-factored respiratory disease complex (RDC) often develops. It is difficult to reproduce the etiology of RDC under defined laboratory conditions because the complex microbial and environmental conditions found in a commercial broiler facility cannot be recreated.

One reason why RDC is difficult to reproduce is that the microbial ecology of the respiratory tract of healthy and diseased birds in a commercial setting is poorly characterized. Therefore, the purpose of this study was to determine the feasibility of generating sufficient next generation DNA and RNA sequencing data to compare the respiratory microbiome (including the bacterial, bacteriophage, eukaryotic viral, and fungal components) of a healthy broiler flock with that of flocks diagnosed with RDC.


4.3 Materials and Methods

4.3.1 Sample Collection

Tracheal swabs were collected at seven weeks of age from a healthy antibiotic-free commercial broiler flock grown in the Jones-Hamilton facility at the University of Delaware Carvel Research and Education Center. Two samples, each containing six individual swabs in 3 ml of buffer PV1 (Qiagen), were collected and frozen immediately on dry ice.

Tracheal swabs were also collected from three respiratory clinical isolates submitted to the University of Delaware Poultry Health System (UDPHS) Lasher Laboratory at the University of Delaware. Two of the clinical samples were obtained from ~50-day-old roaster flocks, while the third sample was obtained from a 31-day-old broiler flock. Tracheal swabs were collected in BHI broth and frozen immediately at -80°C. The clinical diagnosis of all three flocks was infectious laryngotracheitis (ILT), RDC, or an ILT complicated by RDC. All three flocks tested positive for infectious bronchitis virus (IBV) and ILTV by PCR, and tested negative for avian influenza virus (AIV). One broiler flock tested positive for Escherichia coli in the air sac and pericardium.

4.3.2 Nucleic Acid Extraction and Sequencing

After thawing on ice, both the healthy and diseased samples were pooled, gently homogenized, split evenly into two tubes and then centrifuged (7,000 × g, 20 min, 4°C) to form pellets. Total RNA was isolated from one pellet using the Qiagen Viral Nucleic Acid extraction kit following the manufacturer’s protocol. DNA was isolated from the duplicate pellet using the Qiagen Blood and Tissue Kit following the manufacturer’s protocol. Library construction and sequencing using the Illumina HiSeq platform, producing 1 × 100 single-end reads, were performed at the University of Delaware Sequencing Core Facility.

4.3.3 Eukaryotic Virus, Bacteriophage, Yeast and Fungal Analysis

Raw DNA-Seq and RNA-Seq reads were processed using BiomeSeq [13]. Individual sequence files were first analyzed for per-base sequence quality, per-sequence quality, sequence length distribution and duplicate sequences. Reads with a quality Phred score below 30, reads under 100 base pairs in length and adapter sequences were removed. The remaining reads were then aligned to the reference host genome (Gallus gallus; Annotation Release 104) using the Burrows-Wheeler Alignment algorithm [14, 15]. Only unmapped reads were extracted and analyzed further. This step removes host genome contamination from the data, increasing analytical efficiency [16]. Determining the amount of host genome sequence in the library is also required when quantifying the results. The remaining reads were then aligned to microbial databases including a bacteriophage, a fungal and an avian-specific viral genome database using Bowtie2 [17].
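The read-filtering thresholds described above can be illustrated with a minimal Python sketch. It is not taken from BiomeSeq; it assumes that the Phred threshold is applied to the mean quality of each read and that qualities are Phred+33 encoded, and the names Read, mean_phred, keep_read and filter_reads are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    """A single sequencing read from a FASTQ record."""
    name: str
    sequence: str
    quality: str  # per-base qualities, assumed to be Phred+33 encoded


def mean_phred(quality: str, offset: int = 33) -> float:
    """Return the mean Phred score of a quality string (0.0 if it is empty)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def keep_read(read: Read, min_quality: float = 30.0, min_length: int = 100) -> bool:
    """Apply the length and quality thresholds (mean quality is an assumption)."""
    return len(read.sequence) >= min_length and mean_phred(read.quality) >= min_quality


def filter_reads(reads: Iterable[Read]) -> List[Read]:
    """Keep only the reads that pass both thresholds."""
    return [read for read in reads if keep_read(read)]


reads = [
    Read("pass", "A" * 100, "I" * 100),         # 'I' encodes Phred 40
    Read("too_short", "A" * 50, "I" * 50),      # fails the length threshold
    Read("low_quality", "A" * 100, "5" * 100),  # '5' encodes Phred 20
]
kept = filter_reads(reads)
```

Only the first of the three illustrative reads passes both thresholds; host subtraction and database alignment would follow on the retained reads.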

The avian-specific viral genome database contains full genome reference sequences of both DNA and RNA avian viruses obtained from the National Center for Biotechnology Information (NCBI) reference sequences (Table 4.1). The avian DNA viral database contains 48 viral elements from 9 unique families and the avian RNA viral database contains 63 viral elements from 13 families. The avian DNA and RNA viral databases are organized by the classification of their viral structure and genome organization. DNA viruses are organized hierarchically by whether the virus is double- or single-stranded and whether the virus is enveloped or non-enveloped. RNA viruses are organized hierarchically by whether the virus is double- or single-stranded, negative or positive sense, segmented or non-segmented and whether the virus is enveloped or non-enveloped. The reads were also aligned to the default bacterial, fungal and bacteriophage databases provided by BiomeSeq, which contain complete and representative genomes obtained from the NCBI Reference Sequence Database and consist of 3,623, 1,281 and 2,212 genomes, respectively [18].

A sequence similarity-dependent approach for detecting microbes, such as this one, contributes to the rapid detection of known microbes while also allowing for the quantification of biodiversity, which similarity-independent approaches lack [19]. For each individual sample, the reads that mapped to each microbe were normalized by the genome sizes of both the microbe and the host and expressed per 100,000 host cells using the following equation [4]:

\[
\text{Microbial Abundance} =
\frac{\left(\dfrac{\text{number of reads mapped to microbial genome}}{\text{microbial genome size}}\right)}
     {\left(\dfrac{\text{number of reads mapped to host genome}}{\text{host genome size}}\right)}
\times 10^{5}
\]
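As an illustration only, the normalization per 100,000 host cells can be sketched in Python as follows. The function name and argument names are hypothetical, and the sketch assumes that the genome sizes of the microbe and the host are expressed in the same units (for example, base pairs).

```python
def normalized_abundance(
    microbe_reads: int,
    microbe_genome_size: int,
    host_reads: int,
    host_genome_size: int,
    per_host_cells: int = 100_000,
) -> float:
    """Microbial genome copies per `per_host_cells` host cells.

    Read counts are divided by genome size so that large and small
    genomes are compared on an equal footing (reads per base).
    """
    if microbe_genome_size <= 0 or host_genome_size <= 0:
        raise ValueError("genome sizes must be positive")
    if host_reads <= 0:
        raise ValueError("at least one host read is needed for normalization")
    microbe_depth = microbe_reads / microbe_genome_size
    host_depth = host_reads / host_genome_size
    return microbe_depth / host_depth * per_host_cells


# Illustrative numbers only (not taken from the study):
# 10 reads on a 1 kb genome against 1,000 host reads on a 1 Mb genome.
example = normalized_abundance(10, 1_000, 1_000, 1_000_000)
```

In this made-up example the microbe is covered ten times more deeply per base than the host, so the result is approximately 1,000,000 copies per 100,000 host cells.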


Relative microbial abundance was calculated and alpha diversity was measured using Shannon diversity index [20, 21].
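A minimal Python sketch of these two calculations is given below, using the normalized abundances reported for the healthy flock in Table 4.3 as input. The function names are hypothetical, and the use of the natural logarithm in the Shannon index is an assumption, since the logarithm base is not stated here.

```python
import math
from typing import Dict


def relative_abundance(abundances: Dict[str, float]) -> Dict[str, float]:
    """Express each taxon as a percentage of the summed abundance."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return {taxon: 100.0 * value / total for taxon, value in abundances.items()}


def shannon_index(abundances: Dict[str, float]) -> float:
    """Shannon diversity H = -sum(p * ln p) over taxa with p > 0."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    proportions = [value / total for value in abundances.values() if value > 0]
    return -sum(p * math.log(p) for p in proportions)


# Normalized abundances of the healthy flock, copied from Table 4.3.
healthy_viruses = {
    "Avian retrovirus": 8116608,
    "Avian gyrovirus 2": 6010480,
    "Adenovirus": 117156,
    "Avian infectious bronchitis virus": 4193379,
    "Infectious bursal disease virus": 22727,
}
healthy_percent = relative_abundance(healthy_viruses)
healthy_diversity = shannon_index(healthy_viruses)
```

With these inputs the percentages agree with the relative abundances listed in Table 4.3 after rounding (for example, 22.7% for avian infectious bronchitis virus).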

4.4 Results

4.4.1 Identifying the broiler respiratory microbiome and a comparison of the respiratory virome between healthy and diseased birds.

In this feasibility study, a healthy, seven-week-old broiler flock was sampled to represent a healthy respiratory microbiome. Twelve tracheal swabs were collected, pooled and then split in half for DNA-Seq and RNA-Seq library construction and sequencing. Similarly, tracheal swabs from three clinical cases presented to the UDPHS with infectious laryngotracheitis or respiratory disease complex were pooled and then split for DNA-Seq and RNA-Seq library construction and sequencing. All three clinical samples had been screened by PCR as positive for infectious laryngotracheitis virus (ILTV) and infectious bronchitis virus (IBV), and negative for avian influenza virus (AIV). E. coli was identified in the air sac and pericardium of one sample. The raw DNA-Seq and RNA-Seq reads were processed through BiomeSeq as described in the Materials and Methods. As shown in Table 4.2, between 462,382 and 20,884,583 sequencing reads from each of the four libraries could not be mapped to the chicken genome. From these sequences, a total of 147,097 sequencing reads were aligned to one of the four microbial databases that were derived from NCBI reference sequences. Knowing the sequence length of the viral genomes represented in the avian eukaryotic virus database, the size of the chicken genome, and the number of sequences that map to the chicken genome permits the quantification of the number of viral genomes per 100,000 chicken cells in the indicated sample (Table 4.3). From that value the relative abundance of each identified viral species in the sample can be determined. The heat map in Figure 4.1 presents these data graphically. Of the 19 avian eukaryotic virus families in the avian eukaryotic virus database, representatives from five families were found in each sample. Gallid herpesvirus 1 (ILTV), a member of the Herpesviridae, was identified only in the diseased sample and represented 28,937 (89.1%) of the recovered viral sequences. Adenoviridae, Circoviridae, Coronaviridae and Retroviridae sequences were identified in both samples. The healthy flock had a more diverse eukaryotic viral population (Figure 4.2) because of the disproportionate representation of ILTV sequences in the diseased sample.
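The read totals quoted in this section can be checked against Table 4.2 with a few lines of Python. The dictionary layout is only an illustrative choice; the counts are copied from Table 4.2 and the ILTV read count from the text above.

```python
# Reads aligned to each microbial database, copied from Table 4.2.
mapped_reads = {
    "healthy": {"virus": 7_039, "phage": 257, "bacteria": 17_038, "fungi": 74},
    "diseased": {"virus": 32_478, "phage": 1_448, "bacteria": 88_278, "fungi": 485},
}

# Total number of reads aligned to any of the four microbial databases.
total_mapped = sum(sum(counts.values()) for counts in mapped_reads.values())

# Share of the diseased sample's viral reads that were assigned to ILTV.
iltv_reads = 28_937
iltv_share = 100 * iltv_reads / mapped_reads["diseased"]["virus"]

print(f"total reads mapped to microbial databases: {total_mapped:,}")
print(f"ILTV share of diseased viral reads: {iltv_share:.1f}%")
```

The sum reproduces the 147,097 reads reported above, and the ILTV share rounds to 89.1%.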

4.4.2 Comparison of the bacterial microbiome between healthy and diseased birds.

Along with the changes observed in the eukaryotic virome, significant changes were observed in the bacterial microbiome of diseased birds when compared with the control flock. Table 4.4 lists the bacterial species found in each sample at a frequency of >0.10% and the loss in bacterial diversity observed in the diseased sample is shown in Figure 4.2. Significant alterations of the bacterial microbiome in the diseased flock were observed (Figure 4.3). Gallibacterium anatis and Corynebacterium falsenii, two commensal bacterial species commonly found in the microflora of the upper respiratory tract of chickens, were the most abundant bacterial species recovered from the healthy broiler flock, representing a combined 66.1% of the identified bacterial sequences (Figure 4.3A). In the diseased flock, E. coli abundance increased three-fold to become the most abundant species. Ornithobacterium rhinotracheale, a bacterium associated with avian respiratory diseases, was only observed in the diseased birds and was the third most abundant species. Together, these two bacterial species represented 68.4% of the identified bacterial sequences in the diseased sample.
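The fold change in E. coli can be reproduced from the read counts in Tables 4.2 and 4.4, as sketched below in Python. The sketch assumes that bacterial relative abundance is the fraction of all reads mapped to the bacterial database in each sample; the helper name percent is hypothetical.

```python
# Total reads mapped to the bacterial database per sample (Table 4.2).
bacterial_reads = {"healthy": 17_038, "diseased": 88_278}

# Reads mapped to selected species (Table 4.4).
species_reads = {
    "healthy": {
        "Gallibacterium anatis": 6_932,
        "Corynebacterium falsenii": 4_332,
        "Escherichia coli": 2_986,
    },
    "diseased": {
        "Escherichia coli": 48_736,
        "Gallibacterium anatis": 15_239,
        "Ornithobacterium rhinotracheale": 11_579,
    },
}


def percent(sample: str, species: str) -> float:
    """Relative abundance of a species as a percentage of bacterial reads."""
    return 100 * species_reads[sample][species] / bacterial_reads[sample]


# Combined share of the two dominant commensals in the healthy flock.
commensal_share = (percent("healthy", "Gallibacterium anatis")
                   + percent("healthy", "Corynebacterium falsenii"))

# Fold change of E. coli between the diseased and healthy samples.
ecoli_fold_change = (percent("diseased", "Escherichia coli")
                     / percent("healthy", "Escherichia coli"))
```

Under this assumption the two commensals account for 66.1% of the healthy bacterial reads, and E. coli is a little over three times more abundant in the diseased sample.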

4.4.3 Comparison of the bacteriophage and fungal microbiomes between healthy and diseased birds.

Large changes were observed in the bacteriophage components of the respiratory microbiome between healthy and diseased birds. When identifying bacteriophage species present at >1% in the two samples, 97.2% of the bacteriophage found in the healthy birds were represented by two species, both Enterobacteria phages from two different families (Table 4.5). The top 10 most abundant bacteriophage species in the diseased sample only represented 67.2% of the identified bacteriophage sequences, while two bacteriophage species represented 97.3% of the sequences recovered from healthy birds. In the diseased sample there were 20 different bacteriophage species represented at a frequency of at least 1%. The increase in bacteriophage diversity can be calculated using the Shannon diversity index (Figure 4.2) and is represented in Figure 4.4. Whereas the majority of the bacteriophage in the healthy respiratory microbiome are represented by two families, in the diseased flock the bacteriophage are distributed more evenly among three families and an unclassified group. The unclassified group (Enterobacteria phage P4) represented the most abundant bacteriophage population in the diseased sample (20.5%). The increased diversity in this sample can be associated with an increase in the diversity of Enterobacteria and Salmonella phage.

In contrast to the other components of the respiratory microbiome, the fungal component was found to be relatively consistent between the healthy flock and the diseased birds. Wickerhamomyces ciferrii was the predominant fungal species identified in both samples (Table 4.6).

4.4.4 Microbial network analysis.

Figure 4.5 represents the eukaryotic virus, bacterial, bacteriophage, and fungal microbiome results as a microbial network. The eukaryotic virus component, representing all of the identified viral families, clearly identifies the unique presence of ILTV in the diseased sample and the small (0.1%) unique presence of infectious bursal disease virus sequences in the healthy flock respiratory sample. The bacterial network demonstrates the increased bacterial diversity found in the healthy flock and the replacement and shift in the bacterial microbiome observed in the diseased birds. It does not demonstrate the significant shift in the distribution of commonly found species (Escherichia coli and Gallibacterium anatis) or that Ornithobacterium rhinotracheale represents a significant portion of the bacterial microbiome in diseased birds. The shift in bacterial composition resulted in a concomitant increase in the diversity of bacteriophage species in the diseased birds, while the same two fungal species were identified in both groups of birds.
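The node coloring used in such a network (blue for taxa found only in the healthy flock, red for taxa found only in the diseased birds, and green for taxa found in both, as in the legend of Figure 4.5) can be sketched with simple set operations. This is an illustrative Python sketch rather than the procedure used to draw the figure; the function name classify_nodes is hypothetical and the virus names are taken from Table 4.3.

```python
from typing import Dict, Set


def classify_nodes(healthy: Set[str], diseased: Set[str]) -> Dict[str, str]:
    """Assign a color to every taxon according to the samples it was found in."""
    colors = {}
    for taxon in sorted(healthy | diseased):
        if taxon in healthy and taxon in diseased:
            colors[taxon] = "green"   # detected in both samples
        elif taxon in healthy:
            colors[taxon] = "blue"    # detected only in the healthy flock
        else:
            colors[taxon] = "red"     # detected only in the diseased birds
    return colors


# Eukaryotic viruses detected in each sample (Table 4.3).
healthy_viruses = {
    "Avian retrovirus",
    "Avian gyrovirus 2",
    "Adenovirus",
    "Avian infectious bronchitis virus",
    "Infectious bursal disease virus",
}
diseased_viruses = {
    "Avian retrovirus",
    "Avian gyrovirus 2",
    "Adenovirus",
    "Avian infectious bronchitis virus",
    "Gallid herpesvirus 1",
}
virus_node_colors = classify_nodes(healthy_viruses, diseased_viruses)
```

For the viral component this marks Gallid herpesvirus 1 as unique to the diseased sample and infectious bursal disease virus as unique to the healthy flock, with the remaining four viruses shared.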


4.5 Discussion

Avian respiratory diseases, even with a specific diagnosis such as infectious laryngotracheitis, actually represent the clinical manifestation of a broader dysbiosis of the respiratory microbial community. This report evaluates an approach for determining the composition of the avian respiratory microbiome under such conditions. This was done by sampling a healthy broiler flock at seven weeks of age and comparing that sample to a pooled sample composed of three clinical isolates that had been confirmed by clinical, microbiological, and molecular tests as having infectious laryngotracheitis and/or respiratory disease complex. Total RNA and DNA were extracted from these samples and used to construct and sequence libraries, generating next generation sequence data. DNA-Seq data were used to identify the presence of eukaryotic DNA virus genetic material, as well as bacterial, bacteriophage, and fungal sequences. RNA-Seq data, derived from total RNA, were used to identify the presence of eukaryotic RNA virus genetic material, which would not be detected from DNA sequencing data. The resulting sequence data, greater than 127,000,000 reads over four libraries, were processed with BiomeSeq, a computational tool which utilizes a sequence-dependent approach and a comprehensive workflow to determine the composition of the eukaryotic virus, bacterial, bacteriophage, and fungal microbiomes.

This approach successfully identified changes in the composition and proportions of the eukaryotic viral, bacterial, and bacteriophage microbiomes, while detecting no substantive changes in the fungal microbiome component, demonstrating the utility of this approach for this limited and controlled sample.

Analysis of the eukaryotic viral microbiome utilized a customized avian eukaryotic virus database. This database contains 124 complete viral genome sequences distributed over 24 virus families. Sequences from six of these families were identified in the two samples. As expected, the most striking observation was the increase in number of viral sequences (4.5-fold greater in the diseased sample) and the overwhelming presence of ILTV (89.1% relative abundance) in the diseased birds, confirming the primary clinical diagnosis. With 29,000 ILTV reads being recovered, this approach would also permit the resequencing of this clinical isolate of ILTV for comparison with vaccine and field viruses. Expected normal eukaryotic viral flora was also observed with the presence of Adenoviridae, Circoviridae, and transcripts from endogenous retroviruses identified in both samples. Small amounts of Birnaviridae in the healthy sample are not unexpected considering the ubiquitous nature of infectious bursal disease virus in the production environment.

The metagenomic approach evaluated in this report also permitted the first contemporary comparison of the bacterial microbiome from diseased and healthy birds, confirming the complexity of the changes in the microbial ecology of birds diagnosed with respiratory disease. Remarkable changes in the composition and distribution of the major bacterial components of the respiratory microbiome were observed. The composition of the healthy flock’s bacterial microbiome was dominated by Gallibacterium anatis, Corynebacterium falsenii, and Escherichia coli. C. falsenii is recognized as a commensal bacterium of the respiratory tract, while G. anatis and E. coli are also found in the normal flora. However, these bacterial species may also participate in respiratory disease complex as opportunistic pathogens [22]. The ecology of the diseased birds revealed the loss of the C. falsenii commensal component, the emergence of E. coli as the dominant bacterial component (present at three times the abundance observed in the healthy flock), and the presence of Ornithobacterium rhinotracheale, a well characterized avian pathogen [23], as the third most abundant species (13.1%).

In previous studies of the broiler respiratory bacterial microbiome, Lactobacillus was found to be the dominant bacterial species [8]. Lactobacillus was found, but in relatively low abundance, in our samples. 16S rRNA sequencing confirmed the composition, but not the relative abundance, of the bacteria identified in the tracheal samples from the diseased birds (data not shown). Both methods identified the same two most abundant bacteria in the healthy sample, G. anatis and C. falsenii. However, while the high throughput sequencing approach identified E. coli as the third most abundant bacterial species in the healthy flock, 16S analysis identified Lactobacillus (16%) and Staphylococcus (10%) as the next most abundant bacterial species. Although the major findings were confirmed by 16S analysis, the differences may be due to a number of factors. While this feasibility study examined two pooled tracheal samples collected from broiler birds on the DelMarVa peninsula, the Johnson study [8] collected 2,309 samples from 37 commercial flocks in Minnesota. Thus, statistical differences due to the small sample size of this study, as well as differences in geographic location, the poultry integrator, management practices and seasonality could be factors influencing these differences.

Compared to the bacterial and viral components of the avian respiratory tract, little is known regarding the bacteriophage and fungal components. The bacteriophage microbiome was the only microbiome component exhibiting an increase in diversity in the diseased birds when compared to healthy birds (Figure 4.2). In healthy birds 97% of the identified bacteriophage sequences represented two bacteriophage from two families (Microviridae and Myoviridae), while there were 13 bacteriophage species present at >2% abundance in the sample from the diseased birds. There were significant representations from the Microviridae and Myoviridae, as well as the Siphoviridae and unclassified bacteriophage in this sample (Figure 4.4). The increase in E. coli in the respiratory tract of diseased birds may be contributing to the observed increase in bacteriophage diversity. In both samples the dominant fungal species were Wickerhamomyces ciferrii and Penicillium chrysogenum. W. ciferrii has been collected from wild birds [24] while P. chrysogenum can be found in damp indoor environments [25]. More research is needed to determine if the relative increase in fungal abundance in the respiratory tract of diseased birds can be confirmed.

Taken together, data generated about the unique biological components of the respiratory microbiome can be represented as a microbial network. This enables a comparison of the various microbial elements in a manner that can reveal associations that can be proposed and evaluated. Figure 4.5 demonstrates the unique presence of infectious laryngotracheitis virus, the significant changes in the bacterial component, and the increase in bacteriophage diversity in the pooled tracheal sample collected from diseased birds, when compared to the control flock.

The success of this experiment warrants further studies to confirm its utility. These should include examining the respiratory microbiome of individual birds (to determine bird to bird variability), collecting samples directly from multiple flocks exhibiting various clinical respiratory presentations, and determining seasonal effects.

The BiomeSeq tool has also been used to determine the composition of the microbiome in other environments, such as the enteric tract (data not shown). Of particular interest is determining the relationship between the enteric and respiratory microbiomes in birds experiencing respiratory diseases, enteric diseases, and diseases affecting immune function. A further desire is to see the approach in this study used in controlled experiments where the interplay between microbial components from the same or different kingdoms can be artificially determined and manipulated in attempts to study the complex etiology of avian diseases.


Figure 4.1. Heat map with phylogenetic tree representing the detection intensity of viral families in each sample. Color corresponds to the range of relative abundance in each sample from 0 to 100%. Green: 0-1%; yellow: 1-25%; orange: 25-75%; and red: 75-100%. The sum of each column, or sample, is 100%.

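[Figure 4.1 image: heat map with Healthy and Diseased columns. DNA virus rows: Poxviridae, Herpesviridae, Adenoviridae, Hepeviridae, Hepadnaviridae, Genomiviridae, Circoviridae, Parvoviridae. RNA virus rows: Reoviridae, Orthomyxoviridae, Phenuiviridae, Birnaviridae, Pneumoviridae, Paramyxoviridae, Astroviridae, Picornaviridae, Retroviridae, Flaviviridae, Coronaviridae. Labeled relative abundance (%), healthy/diseased: Herpesviridae -/89.1; Adenoviridae 0.6/2.0; Circoviridae 32.5/6.6; Birnaviridae 0.1/-; Retroviridae 43.9/0.4; Coronaviridae 22.7/1.8.]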

Figure 4.2. Sample Diversity of all detected microorganisms in healthy flock and diseased birds. Alpha diversity determined using the Shannon diversity index.

[Figure 4.2 image: bar chart of the Shannon diversity index (y-axis, 0 to 2) for Virus, Bacteria, Bacteriophage and Fungi (x-axis), with separate bars for the Healthy and Diseased samples.]

Figure 4.3. Abundance of bacterial species identified in A) healthy and B) diseased flocks (Top 5 most abundant).

[Figure 4.3 image: pie charts of the five most abundant bacterial species in A) the healthy flock and B) the diseased flock; the corresponding values are listed in Table 4.4.]

Figure 4.4. Abundance of bacteriophage families identified in healthy and diseased flocks (Top 10 most abundant).

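[Figure 4.4 image: stacked horizontal bars (0% to 100%) showing the proportions of Microviridae, Myoviridae, Podoviridae, Siphoviridae and Unclassified bacteriophage in the Healthy Flock and the Diseased Flock. Labeled segments: Healthy 74.8% and 23.9%; Diseased 10.6%, 16.9%, 19.2% and 20.5%.]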

Figure 4.5. Microbial network of the complete avian respiratory microbiome of a healthy broiler flock and diseased birds. Blue nodes indicate species detected in the healthy flock, red nodes indicate species detected in the diseased flock, green nodes indicate species detected in both flocks. The bacteria, bacteriophage, and fungal networks were constructed from elements present at greater than 1%.

[Figure 4.5 image: microbial network diagram. Legend: Healthy, Diseased, Both.]

Table 4.1. Avian specific viral genome database structure.

| Database | Classification | Virus Family | Complete Genomes |
|---|---|---|---|
| Avian DNA Viral Database | Double/Single Stranded (a), Enveloped (d) | Hepeviridae | 1 |
| | | Hepadnaviridae | 1 |
| | Single Stranded, Non-Enveloped | | 3 |
| | | Parvoviridae | 7 |
| | | Circoviridae | 10 |
| | | Smacoviridae | 3 |
| | Double Stranded, Enveloped | Poxviridae | 3 |
| | | Herpesviridae | 6 |
| | Double Stranded, Non-Enveloped | Adenoviridae | 14 |
| Avian RNA Viral Database | Double Stranded, Segmented (c), Non-Enveloped | Reoviridae | 5 |
| | | Birnaviridae | 1 |
| | Single Stranded, Positive (b), Non-Segmented, Enveloped | Retroviridae | 5 |
| | | Flaviviridae | 3 |
| | | Coronaviridae | 5 |
| | Single Stranded, Positive, Non-Segmented, Non-Enveloped | Astroviridae | 5 |
| | | Caliciviridae | 1 |
| | | Picornaviridae | 17 |
| | Single Stranded, Negative, Segmented, Enveloped | Orthomyxoviridae | 16 |
| | | Phenuiviridae | 1 |
| | | Bornaviridae | 3 |
| | Single Stranded, Negative, Non-Segmented, Enveloped | Pneumoviridae | 1 |
| | | Paramyxoviridae | 14 |

(a) single stranded, double stranded or single/double stranded DNA and RNA viruses
(b) positive-sense or negative-sense RNA viruses
(c) segmented or non-segmented RNA viruses
(d) enveloped or non-enveloped DNA and RNA viruses

Table 4.2. Sequencing data generated by DNA Seq and RNA Seq and number of reads trimmed, aligned to host and aligned to microbial databases.

| Sample | Trimmed | Map to Host | Not Mapped to Host |
|---|---|---|---|
| Healthy DNA | 41,196,338 | 37,149,004 | 4,066,203 |
| Healthy RNA | 54,162,138 | 33,277,555 | 20,884,583 |
| Disease DNA | 23,547,613 | 21,003,097 | 2,544,516 |
| Disease RNA | 8,680,175 | 8,217,793 | 462,382 |
| Total | 127,586,264 | 99,647,449 | 27,957,684 |
| Average | 31,896,566 | 24,911,862 | 6,989,421 |

| Sample (DNA and RNA libraries) | Map to Virus DB | Map to Phage DB | Map to Bacteria DB | Map to Fungi DB |
|---|---|---|---|---|
| Healthy | 7,039 | 257 | 17,038 | 74 |
| Disease | 32,478 | 1,448 | 88,278 | 485 |
| Total | 39,517 | 1,705 | 105,316 | 559 |
| Average | 19,758.5 | 852.5 | 52,658 | 279.5 |

Table 4.3. Eukaryotic Viruses detected in healthy and diseased poultry broiler flocks.

| Sample | Virus Name | Type | Taxonomy | Normalized Abundance | Percent Relative Abundance |
|---|---|---|---|---|---|
| Healthy | Avian Retrovirus | RNA; ss, positive; enveloped | Retroviridae, Alpharetrovirus | 8116608 | 44 |
| Healthy | Avian gyrovirus 2 | DNA; ss; nonenveloped | Circoviridae, Gyrovirus, Avian gyrovirus 2 | 6010480 | 33 |
| Healthy | Adenovirus | DNA; ds; nonenveloped | Adenoviridae, Aviadenovirus, aviadenovirus | 117156 | 0.6 |
| Healthy | Avian infectious bronchitis virus | RNA; ss, positive, nonsegmented; enveloped | Coronaviridae, Gammacoronavirus, Avian coronavirus | 4193379 | 22.7 |
| Healthy | Infectious bursal disease virus | RNA; ds, segmented; nonenveloped | Birnaviridae, Avibirnavirus, Infectious bursal disease virus | 22727 | 0.1 |
| Diseased | Avian Retrovirus | RNA; ss, positive; enveloped | Retroviridae, Alpharetrovirus | 12864 | 0.4 |
| Diseased | Avian gyrovirus 2 | DNA; ss; nonenveloped | Circoviridae, Gyrovirus, Avian gyrovirus 2 | 237457 | 7 |
| Diseased | Adenovirus | DNA; ds; nonenveloped | Adenoviridae, Aviadenovirus, aviadenovirus | 73306 | 2.0 |
| Diseased | Avian infectious bronchitis virus | RNA; ss, positive, nonsegmented; enveloped | Coronaviridae, Gammacoronavirus, Avian coronavirus | 65122 | 1.8 |
| Diseased | Gallid herpesvirus 1 | DNA; ds; enveloped | Herpesviridae, Iltovirus, Gallid alphaherpesvirus 1 | 3193134 | 89.1 |

Table 4.4. Bacteria detected in healthy and diseased poultry broiler flocks (> 1%).

| Sample | Bacteria Name | Number Mapped | Percent Relative Abundance |
|---|---|---|---|
| Healthy | Gallibacterium anatis | 6932 | 40.7 |
| Healthy | Corynebacterium falsenii | 4332 | 25.4 |
| Healthy | Escherichia coli | 2986 | 17.5 |
| Healthy | | 430 | 2.5 |
| Healthy | Neisseria sicca | 261 | 1.5 |
| Healthy | Methylobacterium radiotolerans | 238 | 1.4 |
| Healthy | Corynebacterium stationis | 237 | 1.4 |
| Healthy | Cronobacter sakazakii | 220 | 1.3 |
| Diseased | Escherichia coli | 48736 | 55.2 |
| Diseased | Gallibacterium anatis | 15239 | 17.3 |
| Diseased | Ornithobacterium rhinotracheale | 11579 | 13.1 |
| Diseased | Proteus mirabilis | 3186 | 3.6 |
| Diseased | Citrobacter freundii | 1637 | 1.9 |

Table 4.5. Bacteriophage detected in healthy and diseased poultry broiler flocks (Top10 and >1%).

| Sample | Bacteriophage Family | Bacteriophage Name | Normalized Abundance | Percent Relative Abundance |
|---|---|---|---|---|
| Healthy | Microviridae | Enterobacteria phage WA13 | 519726 | 74.2 |
| Healthy | Myoviridae | Enterobacteria phage fd | 161803 | 23.1 |
| Diseased | Unclassified | Enterobacteria phage P4 | 85457 | 20.5 |
| Diseased | Myoviridae | Enterobacteria phage P88 | 59207 | 14.2 |
| Diseased | Microviridae | Enterobacteria phage WA13 | 44074 | 10.6 |
| Diseased | Siphoviridae | Stx2-converting phage 1717 | 25974 | 6.2 |
| Diseased | Siphoviridae | Salmonella phage f18SE | 14372 | 3.5 |
| Diseased | Myoviridae | Escherichia phage pro483 | 11108 | 2.7 |
| Diseased | Siphoviridae | Enterobacteria phage mEp460 | 10515 | 2.5 |
| Diseased | Siphoviridae | Salmonella phage MA12 | 9963 | 2.4 |
| Diseased | Siphoviridae | Salmonella phage FSL SP-031 | 9955 | 2.4 |
| Diseased | Siphoviridae | Salmonella phage SETP7 | 9161 | 2.2 |

Table 4.6. Fungi detected in healthy and diseased poultry broiler flocks (> 1%).

| Sample | Fungi Name | Number Mapped | Normalized Abundance | Percent Relative Abundance |
|---|---|---|---|---|
| Healthy | Wickerhamomyces ciferrii | 21 | 403326 | 75.9 |
| Healthy | Penicillium chrysogenum | 9 | 122422 | 23.0 |
| Diseased | Wickerhamomyces ciferrii | 340 | 2856318 | 94.8 |
| Diseased | Penicillium chrysogenum | 21 | 126311 | 4.2 |

REFERENCES

1. Wypych, T.P., L.C. Wickramasinghe, and B.J. Marsland. The influence of the microbiome on respiratory health. Nat. Immunol., 2019. 20:1279-1290.

2. Huang, Y.J., S. Nariya, J.M. Harris, S.V. Lynch, D.F. Choy, J.R. Arron, and H. Boushey. The airway microbiome in patients with severe asthma: associations with disease features and severity. J. Allergy Clin. Immunol., 2015. 136:874-884.

3. Durack, J., S.V. Lynch, S. Nariya, N.R. Bhakta, A. Beigelman, M. Castro, A. Dyer, E. Israel, M. Kraft, and R.J. Martin. Features of the bronchial bacterial microbiome associated with atopy, asthma, and responsiveness to inhaled corticosteroid treatment. J. Allergy Clin. Immunol., 2017. 1450:63-75.

4. Moustafa, A., C. Xie, E. Kirkness, W. Biggs, E. Wong, Y. Turpaz, K. Bloom, E. Delwart, K.E. Nelson, J.C. Venter, and A. Telenti. The blood DNA virome in 8,000 humans. PLoS Pathog., 2017. 13:e1006292.

5. Murphy, T.F., L.O. Bakaletz, and P.R. Smeesters. Microbial interactions in the respiratory tract. The Pediat. Infect. Dis. J., 2009. 28:S121-S126.

6. Lin, Y., S. Xu, D. Zeng, X. Ni, M. Zhou, Y. Zeng, H. Wang, Y. Zhou, H. Zhu, K. Pan, and G. Li. Disruption in the cecal microbiota of chickens challenged with Clostridium perfringens and other factors was alleviated by Bacillus licheniformis supplementation. PLoS One, 2017. 12:e0182426.

7. Danzelsen, J.L., J.B. Clayton, H. Huang, D. Knights, B. McComb, S.S. Hayer, and T.J. Johnson. Temporal relationships exist between cecum, ileum, and litter bacterial microbiomes in a commercial turkey flock, and subtherapeutic penicillin treatment impacts ileum bacterial community establishment. Front. Vet. Sci., 2015. 2:56.

8. Johnson, T.J., B.P. Youmans, S. Noll, C. Cardona, N.P. Evans, T.P. Karnezos, J.M. Ngunjiri, M.C. Abundo, and C.-W. Lee. A consistent and predictable commercial broiler chicken bacterial microbiota in antibiotic-free production displays strong correlation with performance. Appl. Environ. Microbiol., 2018. 84:e00362-18.

9. Hassan, K.E., A. Ali, S.A.S. Shany, and M.F. El-Kady. Experimental co-infection of infectious bronchitis and low pathogenic avian influenza H9N2 viruses in commercial broiler chickens. Res. Vet. Sci., 2017. 115:356-362.

10. Cook, J.K., M.B. Huggins, S.J. Orbell, K. Mawditt, and D. Cavanaugh. Infectious bronchitis virus vaccine interferes with the replication of avian pneumovirus vaccine in domestic fowl. Avian Path., 2001. 20:233-242.

11. Hanson, L.E., and T.J. Bagust. Infectious laryngotracheitis. In Diseases of Poultry, 9th edn. Ed: Calnek, B.W., Iowa State University Press, Ames, IA., 1991. p. 485-495.

12. Bang, B.G., and F.B. Bang. Laryngotracheitis virus in chickens: a model for study of acute nonfatal desquamating rhinitis. J. Exp. Med., 1967. 125:409-428.

13. Mulholland, K.A., and C.L. Keeler. BiomeSeq: A tool for the characterization of animal microbiomes from metagenomic data. bioRxiv, 2019. p. e800995.

14. Li, H., and R. Durbin. Fast and accurate long-read alignment with Burrows-Wheeler Transform. Bioinformatics, 2009. 45:1745-1760.

15. Hillier, L.W., W. Miller, E. Birney, W. Warren, R.C. Hardison, C.P. Ponting, P. Bork, D.W. Burt, M.A.M. Groenen, M.E. Delany, et al. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature, 2004. 432:695-716.

16. Daly, G.M., R.M. Leggett, W. Rowe, S. Stubbs, M. Wilkinson, R. Ramirez-Gonzalez, C. Mario, W. Bernal, and J. Heeney. Host subtraction, filtering and assembly validations for novel viral discovery using next generation sequencing data. PLoS One, 2015. 10:e0129059.


17. Langmead, B., and S.L. Salzberg. Fast gapped-read alignment with Bowtie 2. Nat. Methods, 2012. 9:357-359.

18. Mulholland, K.A., and C.L. Keeler. BiomeSeq Microbial Databases. Available from: https://sites.udel.edu/aviangenomics/. 2019.

19. Herath, D., D. Jayasundra, D. Ackland, I. Saeed, S.-L. Tan, and S. Halgamuge. Assessing Species Diversity Using Metavirome Data: Methods and Challenges. Comput. Struct. Biotechnol. J., 2017. 15:447-455.

20. Lemos, L.N., R.R. Fulthorpe, E.W. Triplett, and L.F.W. Roesch. Rethinking microbial diversity in the high throughput sequencing era. J. Microbiol. Meth., 2011. 86:42-51.

21. Ludwig, J., and J. Reynolds. Statistical Ecology. Wiley. New York. 1988.

22. Zhang, J.J., T.Y. Kang, T. Kwon, H. Koh, N. Chandimali, D.L. Huynh, X.Z. Wang, N. Kim, and D.K. Jeong. Specific chicken egg yolk antibody improves the protective response against Gallibacterium anatis infection. Infect. Immun., 2019. 87:e00619-18.

23. van Empel, P.C.M., and H.M. Hafez. Ornithobacterium rhinotracheale: a review. Avian Pathol., 1999. 28:217-227.

24. Francesca, N., C. Carvalho, P.M. Almeida, C. Sannino, L. Settanni, J.P. Sampaio, and G. Moschetti. Wickerhamomyces sylviae f.a., sp. nov., an ascomycetous yeast species isolated from migratory birds. Int. J. Systematic Evol. Micro., 2013. 63:4824-4830.

25. Andersen, B., J.C. Frisvad, I. Sondergaard, I.S. Rasmussen, and L.S. Larsen. Associations between fungal species and water-damaged building materials. Appl. Environ. Micro., 2011. 77:4180-4188.


Chapter 5

A COMPARISON OF TRACHEA, CHOANAL CLEFT AND CLOACAL

MICROBIOTA OF A HEALTHY TURKEY FLOCK

5.1 Introduction

Poultry is the leading source of protein globally, with over $46.3 billion in global sales in 2018 [1]. Turkey in particular had an estimated $13.5 billion in global sales in 2016, with the United States accounting for about $6 billion, according to the United States Department of Agriculture [2, 3]. The United States is the world leader in turkey meat consumption, production and export, producing about 7.5 billion pounds and accounting for about 41% of the world’s turkey consumption in 2016 [2]. In recent years, pathogen outbreaks in poultry flocks have contributed to global economic loss. For example, during the 2014-2015 outbreak of highly pathogenic avian influenza, arguably the largest poultry health catastrophe in the United States, over 50 million chickens and turkeys were lost [4]. Because turkey health matters from both a nutritional and an economic standpoint, elucidating the microbiota of turkey flocks is essential.


Advancements in next-generation sequencing technology enable investigations into individual components of the microbiome. However, the current methodologies are limited to characterizing one component at a time. BiomeSeq is a tool developed for the analysis of complete animal microbiomes using metagenomic sequencing data. This tool addresses the constraints of current computational tools by providing a comprehensive workflow and corresponding microbial databases that accurately identify and quantify each major component of the microbiome. BiomeSeq has demonstrated high precision and sensitivity on several simulated datasets and has also been successfully employed to characterize the respiratory microbiome of a commercial broiler flock during its development and of a broiler flock clinically diagnosed with avian respiratory disease complex. BiomeSeq was designed to facilitate investigations of various microbial niches from any animal host. Therefore, this tool can be utilized to elucidate turkey microflora.

Utilizing BiomeSeq, we provide a complete characterization of the cloacal, choanal cleft and tracheal microbiomes of a healthy flock of turkeys. This tool successfully identified the microbial communities inhabiting these three unique biological niches from metagenomic next-generation sequencing data. Each major component of the microbiome, including eukaryotic viruses, bacteria, bacteriophage and fungi, is identified in each niche, and normalized relative abundance and diversity are examined. A comprehensive microbial network of the microorganisms inhabiting the respiratory and intestinal microbiomes is provided. This study further demonstrates the extensive utility of BiomeSeq and its ability to characterize various microbiomes from any animal host.

5.2 Materials and Methods

5.2.1 Sample Collection

Tracheal, choanal and cloacal swabs were collected from two 4-week-old and two 8-week-old commercial turkeys at the University of Minnesota. Individual swabs were placed in 3 ml of buffer PV1 (Qiagen), frozen immediately on dry ice and stored at -80°C.

5.2.2 Nucleic Acid Extraction and Sequencing

For each sampling site, the four swab samples were thawed, combined, homogenized and split into two tubes. Each tube was centrifuged (7,000 × g; 5 minutes; 4°C) to form two pellets for each sample. RNA was isolated from one of the pellets using the Qiagen Viral Nucleic Acid extraction kit, following the manufacturer’s protocol. DNA was isolated from the other pellet using the Qiagen Blood and Tissue kit, also following the manufacturer’s protocol. DNA and RNA sequencing was performed for each sample on the Illumina HiSeq platform, producing 1 × 100 single-end reads, at the University of Delaware Sequencing and Genotyping Center. A total of six sequencing files were created.


5.2.3 Eukaryotic Virus, Bacteria, Bacteriophage and Fungal Analysis

The raw DNA-Seq and RNA-Seq reads were processed using BiomeSeq [5]. In summary, each sequence file was trimmed for quality and reads were analyzed for per-base sequence quality, per-sequence quality, sequence length distribution and duplicate sequences. Reads with a quality Phred score below 30, adapter sequences and reads less than 100 base pairs in length were removed from the file. The remaining reads were then aligned to the reference host genome (Meleagris gallopavo; Annotation Release 101) using the Burrows-Wheeler Alignment algorithm [6, 7]. This step removes host (turkey) sequences from the dataset, increasing analytical efficiency of the remaining processing steps [8].
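The length and quality thresholds described above can be illustrated with a small filter. This is only a sketch under stated assumptions: it assumes Phred+33 encoded quality strings and assumes that the Phred cutoff of 30 is applied to the mean quality of a read, which the text does not specify. The function names are hypothetical and are not part of BiomeSeq.

```python
from typing import Iterable, Iterator, Tuple

# A FASTQ record reduced to (read id, bases, quality string).
Read = Tuple[str, str, str]


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(reads: Iterable[Read], min_len: int = 100,
                 min_mean_q: float = 30.0) -> Iterator[Read]:
    """Yield reads that are at least min_len bases long and whose mean
    Phred score is at least min_mean_q (an assumed interpretation)."""
    for read_id, seq, qual in reads:
        if len(seq) >= min_len and qual and mean_phred(qual) >= min_mean_q:
            yield read_id, seq, qual


if __name__ == "__main__":
    demo = [
        ("r1", "A" * 100, "I" * 100),  # Phred 40, full length: kept
        ("r2", "A" * 100, "5" * 100),  # Phred 20: dropped
        ("r3", "A" * 50, "I" * 50),    # too short: dropped
    ]
    for read_id, _, _ in filter_reads(demo):
        print(read_id)
```

Adapter removal is deliberately left out of the sketch, since it depends on the adapter sequences used during library preparation.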

Unmapped reads were extracted and aligned to microbial reference genome databases using the Bowtie2 alignment algorithm [9]. BiomeSeq provides eukaryotic virus, bacteriophage, fungi and bacteria reference genome databases. However, one major feature of this tool is its ability to accept custom databases provided by the user.
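The extraction of unmapped reads can be sketched as follows, assuming the host alignment is available as SAM text. In the SAM format, bit 0x4 of the FLAG field marks an unmapped read. The helper names below are hypothetical; this is an illustration of the step, not the BiomeSeq implementation.

```python
from typing import Iterable, Iterator

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_records(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM alignment lines whose FLAG marks the read as unmapped."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & UNMAPPED_FLAG:
            yield line.rstrip("\n")


def to_fastq(sam_record: str) -> str:
    """Convert one SAM record back to a four-line FASTQ entry."""
    fields = sam_record.split("\t")
    name, seq, qual = fields[0], fields[9], fields[10]
    return f"@{name}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",  # mapped to host
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",          # unmapped
    ]
    for record in unmapped_records(sam):
        print(to_fastq(record), end="")
```

The FASTQ entries produced this way would then serve as input to the alignment against the microbial databases.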

For this study, an avian-derived viral database was constructed to replace BiomeSeq’s default viral database (Table 5.1). This avian-derived viral database contains full genome reference sequences of both DNA and RNA avian viruses obtained from National Center for Biotechnology Information (NCBI) reference sequences. The avian DNA viral database contains 48 viral elements from 9 unique families and the avian RNA viral database contains 63 viral elements from 13 families. The avian DNA and RNA viral databases are organized by the classification of their viral structure and genome organization (Table 5.1). DNA viruses are organized hierarchically by whether the virus is double- or single-stranded and whether the virus is enveloped or non-enveloped. RNA viruses are organized hierarchically by whether the virus is double- or single-stranded, negative or positive sense, segmented or non-segmented and whether the virus is enveloped or non-enveloped. The avian viral database used in this study is publicly available [10]. The reads were also aligned to the default bacterial, fungal and bacteriophage databases provided by BiomeSeq, which contain complete and representative genomes obtained from the NCBI Reference Sequence Database and consist of 3,623, 1,281 and 2,212 genomes, respectively [11]. These databases are also publicly available [10].

A sequence similarity-dependent approach for detecting microorganisms, employed by BiomeSeq, contributes to the rapid detection of known eukaryotic viruses, bacteria, bacteriophage and fungi while also allowing for the quantification of biodiversity, which similarity-independent approaches lack [12]. For each individual sample, the reads that mapped to each microorganism were normalized based on both microbe and reference genome length per 100,000 host cells using an adaptation of the equation presented by Moustafa and colleagues in 2017 to quantify viral abundance [13]:

$$
\text{Abundance} = \frac{2 \times \dfrac{\text{number of reads mapped to microbe sequence}}{\text{microbe sequence size}}}{\dfrac{\text{number of reads mapped to host genome}}{\text{host genome size}}} \times 10^{5}
$$
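As a worked illustration of this normalization, the sketch below computes the abundance of one microbe per 100,000 host cells. The function and parameter names are hypothetical, the factor of 2 follows the form of the equation as read here (taken to represent two host genome copies per cell), and the input numbers are invented round values chosen only to make the arithmetic easy to follow.

```python
def normalized_abundance(microbe_reads: int, microbe_length: int,
                         host_reads: int, host_genome_length: int,
                         host_copies: int = 2,
                         per_cells: int = 100_000) -> float:
    """Microbe copies per `per_cells` host cells.

    Reads are first converted to per-base coverage for the microbe and
    for the host, and the ratio is scaled by the assumed number of host
    genome copies per cell and by the reporting unit.
    """
    if microbe_length <= 0 or host_genome_length <= 0 or host_reads <= 0:
        raise ValueError("lengths and host read count must be positive")
    microbe_coverage = microbe_reads / microbe_length
    host_coverage = host_reads / host_genome_length
    return host_copies * microbe_coverage / host_coverage * per_cells


if __name__ == "__main__":
    # Invented example: 50 reads on a 10 kb genome, 2 million host reads
    # on a 1 Gb host genome. The result is approximately 500,000.
    print(normalized_abundance(50, 10_000, 2_000_000, 1_000_000_000))
```

Because the microbe read count is divided by the microbe sequence size, a long genome does not appear more abundant simply because it attracts more reads.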


Percent relative abundance was also quantified from the normalized abundances using the following equation:

$$
\text{Percent Relative Abundance} = \frac{\text{microbial abundance}}{\text{total microbial abundance}} \times 100
$$
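A short worked example may help here. The sketch below applies the percent relative abundance calculation to the normalized abundances of the eukaryotic viruses detected in the choanal cleft, as listed in Table 5.4. The function name is hypothetical, and the total is assumed to be taken over the microbes of one class within one sample.

```python
from typing import Dict


def percent_relative_abundance(abundances: Dict[str, float]) -> Dict[str, float]:
    """Express each normalized abundance as a percentage of the total."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return {name: value / total * 100 for name, value in abundances.items()}


# Normalized abundances of choanal cleft viruses, copied from Table 5.4.
choanal_viruses = {
    "Gallid herpesvirus 2&3": 2253,
    "Adenovirus": 4503,
    "Turkey astrovirus 2": 30851,
    "Rotavirus D": 148309,
    "Turkey gallivirus": 13354,
}

if __name__ == "__main__":
    for name, pct in percent_relative_abundance(choanal_viruses).items():
        print(f"{name}: {pct:.1f}%")
```

Rounded to one decimal place, the values agree with the relative abundances reported for the choanal cleft in Table 5.4.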

Finally, alpha diversity for each sample was calculated using the Shannon Diversity Index, a commonly used measure of species diversity in a microbiome that accounts for both species abundance and evenness within the sample [14, 15]. These data were visualized as Venn diagrams, networks and stacked bar plots using the R package VennDiagram [16] and Cytoscape [17].
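The Shannon index is H = -Σ p_i ln(p_i), where p_i is the proportion of species i in the sample. The sketch below is a minimal illustration, not the BiomeSeq code. The use of the natural logarithm is an assumption, since the text does not state the base; under that assumption, the choanal cleft virus abundances from Table 5.4 give a value of about 0.83, in line with Table 5.3.

```python
import math
from typing import Iterable


def shannon_index(abundances: Iterable[float]) -> float:
    """Shannon diversity H = -sum(p * ln p) over species with p > 0."""
    values = [a for a in abundances if a > 0]
    total = sum(values)
    if total == 0:
        return 0.0
    return -sum((a / total) * math.log(a / total) for a in values)


if __name__ == "__main__":
    # Normalized abundances of choanal cleft viruses from Table 5.4.
    choanal_virus_abundances = [2253, 4503, 30851, 148309, 13354]
    print(round(shannon_index(choanal_virus_abundances), 2))
```

A sample dominated by one species yields a value near zero, while a sample with many evenly represented species yields a higher value.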

5.3 Results

5.3.1 Quality Trimming and Decontamination of Sequencing Reads

A total of 164,557,236 reads were obtained from RNA-Seq and 192,392,125 reads from DNA-Seq (Table 5.2). Following quality control and trimming, 164,412,647 RNA-Seq reads and 192,361,317 DNA-Seq reads remained. A total of 6,456,405 RNA-Seq reads aligned to the turkey genome, and the 157,956,242 reads that did not align to the turkey genome were aligned to the avian viral, bacteria, bacteriophage and fungi databases using BiomeSeq [5]. A total of 162,742,651 DNA-Seq reads aligned to the turkey genome, and the 29,618,666 reads that did not were also aligned to the avian viral, bacteria, bacteriophage and fungi databases using BiomeSeq [5] (Table 5.2).
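The read counts above can be cross-checked against the per-sample values in Table 5.2. The sketch below is only a bookkeeping aid with hypothetical names: it verifies that trimmed reads minus host-mapped reads equals unmapped reads for every sample, and it sums each column by sequencing method.

```python
from typing import Dict, Tuple

# sample -> (method, raw, trimmed, mapped to host, unmapped to host),
# copied from Table 5.2.
TABLE_5_2: Dict[str, Tuple[str, int, int, int, int]] = {
    "CK91": ("RNA", 39_116_031, 39_062_667, 1_989_029, 37_073_638),
    "CK92": ("RNA", 63_836_927, 63_770_120, 4_209_347, 59_560_773),
    "CK93": ("RNA", 61_604_278, 61_579_860, 258_029, 61_321_831),
    "CK94": ("DNA", 65_037_361, 65_026_950, 57_894_840, 7_132_110),
    "CK95": ("DNA", 67_112_661, 67_101_764, 59_695_581, 7_406_183),
    "CK96": ("DNA", 60_242_103, 60_232_603, 45_152_230, 15_080_373),
}


def samples_consistent(table: Dict[str, Tuple[str, int, int, int, int]]) -> bool:
    """True if trimmed <= raw and trimmed - mapped == unmapped everywhere."""
    return all(
        trimmed <= raw and trimmed - mapped == unmapped
        for _, raw, trimmed, mapped, unmapped in table.values()
    )


def totals(table: Dict[str, Tuple[str, int, int, int, int]],
           method: str) -> Tuple[int, ...]:
    """Column sums (raw, trimmed, mapped, unmapped) for one method."""
    rows = [row[1:] for row in table.values() if row[0] == method]
    return tuple(sum(column) for column in zip(*rows))


if __name__ == "__main__":
    print(samples_consistent(TABLE_5_2))
    print("RNA", totals(TABLE_5_2, "RNA"))
    print("DNA", totals(TABLE_5_2, "DNA"))
```

The column sums reproduce the RNA-Seq and DNA-Seq totals quoted in the text.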


5.3.2 Diversity of Eukaryotic Viruses in the Choanal Cleft, Cloaca and Trachea of a Healthy Turkey Flock

Two DNA viruses and three RNA eukaryotic viruses were detected in the choanal cleft (Table 5.4). The most abundant viral species was Rotavirus D with a relative abundance of 74.4%, followed by Turkey astrovirus 2 (15.5%) and Turkey gallivirus (6.7%; Figure 5.1). The DNA viruses, Adenovirus (2.3%) and Gallid herpesvirus 2&3 (1.1%), were the least abundant. Three DNA viruses were detected in the trachea (Table 5.4). The most abundant viral species was Adenovirus with a relative abundance of 95.7%. Gallid herpesvirus 2&3 (3.4%) and Meleagrid herpesvirus 1 (0.9%) were also detected (Figure 5.1). One DNA virus and four RNA viruses were detected in the cloaca (Table 5.4). The most abundant viral species were Turkey gallivirus, Turkey astrovirus 2 and Rotavirus D with relative abundances of 36.1%, 35.1% and 27.4%, respectively. Trace amounts of Avian leukosis virus (1.0%) and Adenovirus (0.4%) were also detected (Figure 5.1). A Venn diagram was generated to analyze the similarities and differences in the species of viruses detected in the choanal cleft, trachea and cloaca of this turkey flock (Figure 5.5). Viral alpha diversity was highest in the cloaca (H = 1.16), followed by choanal cleft (H = 0.83) and trachea (H = 0.20; Table 5.3).


5.3.3 Bacteriophage Diversity in the Choanal Cleft, Cloaca and Trachea of a Healthy Turkey Flock

A total of 84 unique bacteriophage were detected in the choanal cleft, 73 in the trachea and 79 in the cloaca. The top 10 most abundant bacteriophage are included in Table 5.6. The most abundant bacteriophage in all locations was Enterobacteria phage P88 with a normalized relative abundance of 20.3% in the choanal cleft, 39.0% in the trachea and 11.6% in the cloaca (Figure 5.3). In the choanal cleft, the next most abundant bacteriophage were Enterobacteria phage P4 (17.0%), Bordetella phage BPP-1 (11.8%) and Stx2-converting phage 1717 (11.0%). In the trachea, Enterobacteria phage P1 (7.5%), Shigella phage SfIV (7.0%) and Salmonella phage SJ46 (6.1%) were the next most abundant species (Figure 5.3). Finally, we observed Salmonella phage SJ46 (10.1%), Stx2-converting phage 1717 (7.3%) and Enterobacteria phage mEp460 (6.4%) as the next most abundant bacteriophage in the cloaca (Figure 5.3). Several bacteriophage species were shared among locations, while many were unique to a specific niche. A Venn diagram was generated to depict the similarities and differences in the top ten most abundant species of bacteriophage detected in the choanal cleft, trachea and cloaca of this turkey flock (Figure 5.7). Enterobacteria phage P88, Enterobacteria phage P1, Salmonella phage SJ46, Enterobacteria phage mEp460 and Stx2-converting phage 1717 were present in all locations. Shigella phage SfIV and Enterobacteria phage SfV were detected only in the trachea. Enterobacteria phage P4, Bordetella phage BPP-1, Enterobacteria phage phiP27 and Enterobacteria phage Sf6 were detected only in the choanal cleft. Salmonella phage RE-2010 and Stxconverting phage vB_EcoP_24B were detected only in the cloaca. Enterobacteria phage fiAA91-ss was observed in the trachea and choanal cleft, but not the cloaca. Phage cdtI was observed in the choanal cleft and cloaca, but not the trachea. Finally, Enterobacteria phage YYZ-2008 and Escherichia phage TL-2011b were detected in the trachea and cloaca, but not in the choanal cleft (Table 5.6; Figure 5.7). Bacteriophage alpha diversity was highest in the cloaca (H = 3.37), followed by choanal cleft (H = 2.76) and trachea (H = 2.52; Table 5.3).
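The shared and unique bacteriophage can be derived with plain set operations. The sketch below uses the most abundant bacteriophage per location as transcribed from Table 5.6, with "Phage cdtI DNA" and "Phage cdtI DNA, complete genome" treated as the same phage. The function name is hypothetical, and the original analysis used the R VennDiagram library rather than this code.

```python
from typing import Dict, FrozenSet, Set

# Most abundant bacteriophage per location, transcribed from Table 5.6.
TOP_PHAGE: Dict[str, Set[str]] = {
    "choanal cleft": {
        "Enterobacteria phage P88", "Enterobacteria phage P4",
        "Bordetella phage BPP-1", "Stx2-converting phage 1717",
        "Enterobacteria phage mEp460", "Enterobacteria phage phiP27",
        "Phage cdtI", "Enterobacteria phage fiAA91-ss",
        "Enterobacteria phage Sf6", "Enterobacteria phage P1",
        "Salmonella phage SJ46",
    },
    "trachea": {
        "Enterobacteria phage P88", "Enterobacteria phage P1",
        "Shigella phage SfIV", "Salmonella phage SJ46",
        "Enterobacteria phage mEp460", "Stx2-converting phage 1717",
        "Enterobacteria phage SfV", "Escherichia phage TL-2011b",
        "Enterobacteria phage fiAA91-ss", "Enterobacteria phage YYZ-2008",
    },
    "cloaca": {
        "Enterobacteria phage P88", "Salmonella phage SJ46",
        "Stx2-converting phage 1717", "Enterobacteria phage mEp460",
        "Salmonella phage RE-2010", "Enterobacteria phage P1",
        "Escherichia phage TL-2011b", "Phage cdtI",
        "Stxconverting phage vB_EcoP_24B", "Enterobacteria phage YYZ-2008",
    },
}


def venn_regions(groups: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Map each combination of locations to the items found in exactly
    that combination and nowhere else."""
    regions: Dict[FrozenSet[str], Set[str]] = {}
    for members in groups.values():
        for item in members:
            key = frozenset(name for name, s in groups.items() if item in s)
            regions.setdefault(key, set()).add(item)
    return regions


if __name__ == "__main__":
    for locations, phages in venn_regions(TOP_PHAGE).items():
        print(sorted(locations), len(phages), sorted(phages))
```

Each key of the returned dictionary corresponds to one region of a three-set Venn diagram, so the region sizes can be compared directly with the counts shown in Figure 5.7.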

5.3.4 Fungal Diversity in the Choanal Cleft, Cloaca and Trachea of a Healthy Turkey Flock

A total of 29 unique fungi were detected in the choanal cleft, 28 in the trachea and 39 in the cloaca. The fungal species with a percent relative abundance greater than 0.1% are included in Table 5.7. Wickerhamomyces ciferrii was the most abundant fungus in the trachea (94.7%) and cloaca (85.3%; Figure 5.4). The most abundant fungi in the choanal cleft were Penicillium chrysogenum (73.2%) and Wickerhamomyces ciferrii (26.4%; Figure 5.4). The most abundant fungi in the trachea were Wickerhamomyces ciferrii (94.7%), Penicillium chrysogenum (2.9%) and Sordaria macrospora (2.37%; Figure 5.4). The most abundant fungi in the cloaca were Wickerhamomyces ciferrii (85.3%) and Penicillium chrysogenum (14.6%; Figure 5.4). A Venn diagram was generated to analyze the similarities and differences in the species of fungi detected in the choanal cleft, trachea and cloaca of this turkey flock (Figure 5.8). Fungal alpha diversity was highest in the choanal cleft (H = 0.59), followed by cloaca (H = 0.42) and trachea (H = 0.24; Table 5.3).

5.3.5 Bacterial Diversity in the Choanal Cleft, Cloaca and Trachea of a Healthy Turkey Flock

A total of 4 unique bacteria were detected in the choanal cleft, 6 in the trachea and 20 in the cloaca with a percent relative abundance greater than 0.1% (Table 5.5). Escherichia coli was detected in all three locations (46.6% in the choanal cleft, 67.5% in the trachea and 3.4% in the cloaca; Table 5.5). Bordetella hinzii was the most abundant bacterium detected in the choanal cleft with a percent relative abundance of 50.8%, followed by Escherichia coli (46.6%), Lactobacillus amylovorus (1.5%) and Citrobacter freundii (1.1%; Figure 5.2). Escherichia coli was the most abundant bacterium detected in the trachea with a percent relative abundance of 67.5%, followed by Citrobacter freundii (28.4%) and Corynebacterium stationis (1.4%; Figure 5.2). In the cloaca, Staphylococcus warneri was the most abundant with a percent relative abundance of 73.8%, followed by Eubacterium sp. Marseille-P3177 (6.5%) and Campylobacter coli (3.9%; Figure 5.2). A Venn diagram was generated to analyze the similarities and differences in the species of bacteria detected in the choanal cleft, trachea and cloaca of this turkey flock (Figure 5.6). Bacterial alpha diversity was highest in the cloaca (H = 1.20), followed by trachea (H = 0.86) and choanal cleft (H = 0.75; Table 5.3).


5.3.6 Microbial Network of the Choanal Cleft, Cloaca and Trachea of a Healthy Turkey Flock

We examined the microbial ecology of the cloaca, trachea and choanal cleft of a healthy turkey flock. Figure 5.9 is a representation of the complex ecology of these microbiomes. In this microbial network, nodes represent species of bacteria, bacteriophage, eukaryotic viruses and fungi. The most abundant species are included: 17 bacteriophage, 7 eukaryotic viruses, 15 bacteria and 6 fungi. Five bacteriophage, 1 eukaryotic virus, 3 fungi and 1 bacterium were detected in all locations, represented by yellow nodes in the microbial network. One bacteriophage and 3 eukaryotic viruses were detected in the choanal cleft and cloaca, represented by red-green nodes. Two bacteriophage and 1 bacterium were detected in the trachea and cloaca, represented by blue-green nodes. One bacteriophage, 2 fungi, 2 bacteria and 1 eukaryotic virus were detected in the choanal cleft and trachea, represented by red-blue nodes. Six bacteriophage, 2 eukaryotic viruses, 1 fungus and 11 bacteria were detected in only a single location, represented by blue, green or red nodes in the microbial network (Figure 5.9).

5.4 Discussion

Over the years, poultry meat has become the main source of protein consumption globally [18], with the United States and Europe consuming the most [19]. While the majority of poultry consumed is chicken, turkey accounts for as much as 17% of consumption. According to the United States Department of Agriculture, the United States produces over 250 million turkeys per year [20]. A majority of studies concentrating on the avian microbiome examine chickens specifically, with far less attention paid to turkeys. Understanding the complex communities existing in turkey microbiomes could contribute to the treatment of pathogen-induced infections and diseases in this species, which account for a large economic loss for producers each year.

Numerous studies provide evidence that microbial communities vary between microbiome niches and that the specific microorganisms inhabiting different environments contribute to particular biological functions. An interesting example in birds can be seen in the microbiota on the face and mouth of vultures. These birds, which feed predominantly on decaying carcasses, are consequently exposed to pathogens, so the microbiota of their face and mouth is incredibly diverse [21]. Interestingly, their intestine was found to be less diverse, implying that the microbiota of the face and mouth eliminate pathogens before they can enter the intestinal tract [21]. Understanding the composition and diversity of these various microbial communities will lead to a deeper understanding of the complex role these microorganisms play in maintaining homeostasis of the host in which they reside.

Previous studies have demonstrated that BiomeSeq can successfully detect and quantify microbial abundance in the respiratory microbiome of broiler flocks. In one study, this tool was utilized to examine the development of microbial ecology throughout the growth of a healthy broiler flock from hatching to processing. In another study, this tool was utilized to compare a healthy broiler flock to a broiler flock clinically diagnosed with avian respiratory disease complex. In this study, we demonstrate the extensive utility of BiomeSeq by employing the tool to investigate microbial communities from the choanal cleft, trachea and cloaca of a healthy turkey flock. This includes the eukaryotic viruses, bacteria, bacteriophage and fungi that inhabit both the intestinal tract and respiratory tract. To detect and quantify the normalized relative abundance of these microorganisms, choanal cleft, tracheal and cloacal swabs were collected from a healthy turkey flock. From these samples, nucleic acids were extracted and DNA-Seq and RNA-Seq were conducted. A total of 356,949,361 raw reads were generated and processed using BiomeSeq. To quantify the normalized relative abundance of each microorganism detected, BiomeSeq utilizes a unique sequence-dependent approach. In summary, the raw sequencing reads are first trimmed for quality, and adapter sequences, reads shorter than 100 bp and low-quality reads are removed from the sample. The remaining reads are then aligned to the turkey genome in a decontamination step that removes turkey sequences from the sample. The decontaminated reads are aligned to the default bacteria, bacteriophage and fungi databases provided by BiomeSeq. One feature that makes BiomeSeq adaptable to a variety of studies is its ability to accept custom databases provided by the user. For this study, an avian-derived viral genome database was constructed to replace BiomeSeq’s default viral database.


A close examination of the turkey respiratory and intestinal microbiomes confirmed unique and complex microbial communities. The species diversity was confirmed by the detection of a variety of microorganisms. Interestingly, we observed a higher diversity in the cloacal microbiota when compared to the choanal cleft and trachea, which were quite comparable. Adenovirus was the only virus detected in all three locations, and the cloaca and trachea each had one unique virus. Interestingly, the cloaca and choanal cleft shared the most similar viral composition, including Turkey astrovirus 2, Rotavirus D and Turkey gallivirus. Astroviruses, reoviruses (such as rotavirus) and adenoviruses are enteric viruses frequently identified in both healthy and diseased chicken and turkey flocks [22-24]. Therefore, it is not surprising to detect Turkey astrovirus at a high abundance in the cloaca, as these viruses are common in the avian intestinal tract [25]. Turkey astroviruses can result in microscopic changes to the intestinal epithelium of turkeys, leading to failure to absorb water and resulting in diarrhea [26]. Furthermore, it is not surprising that rotaviruses were detected at such a high abundance in these samples, as previous studies provide evidence that rotaviruses in turkeys appear in the first month of life, which is consistent with the age of this flock at the time of sampling [27]. Moreover, the presence of herpesviruses is consistent with the vaccination of the birds with live herpesvirus vaccine, coupled with the expected presence of these avian viruses in the environment.


The only bacterium detected in all locations was Escherichia coli. The bacterial composition in the cloaca was much more diverse than that of the respiratory tract, with eight unique bacteria. Furthermore, the choanal cleft and trachea have Citrobacter freundii and Lactobacillus amylovorus in common, whereas more unique species were detected in the cloaca, with the most abundant being Staphylococcus warneri. Penicillium chrysogenum and Wickerhamomyces ciferrii were the most abundant fungal species identified in all three locations. Of the top 10 most abundant fungal species, only 2 were detected in all locations while 1 species was unique to the trachea and 3 were unique to the choanal cleft. Enterobacteria phage P88 was the most abundant bacteriophage in each location. Of the most abundant bacteriophage species, 5 were detected in all locations, including Enterobacteria phage P88, Stx2-converting phage 1717, Salmonella phage SJ46, Enterobacteria phage P1 and Enterobacteria phage mEp460. Unique bacteriophage were identified in each location; 4 were unique to the choanal cleft, 2 were unique to the trachea and 2 were unique to the cloaca.

In this study, we demonstrate the extensive utility of BiomeSeq to identify and quantify microbial abundance and diversity in various microbial niches and host animals. This tool was employed to investigate microbial communities from the choanal cleft, trachea and cloaca of a healthy turkey flock. By identifying the eukaryotic viruses, bacteria, bacteriophage and fungal elements within various animal microbiomes, the unique biological roles to which these microorganisms contribute can be elucidated. Furthermore, accurate quantification of abundance and diversity within these communities, provided by BiomeSeq, will contribute to valuable knowledge that may distinguish healthy from potentially infected microbiomes.


Figure 5.1. Normalized relative abundance of eukaryotic viruses at the choanal cleft, cloaca and trachea of turkey.

[Stacked bar plot. Bars: Cloaca, Trachea, Choanal Cleft. Axis range: 0 to 100. Legend: Gallid herpesvirus 2&3, Adenovirus, Turkey astrovirus 2, Rotavirus D, Turkey gallivirus, Meleagrid herpesvirus 1, Avian leukosis virus.]

Figure 5.2. Abundance of bacteria at the choanal cleft, cloaca and trachea of turkey.

[Stacked bar plot. Bars: Cloaca, Trachea, Choanal Cleft. Axis range: 0 to 100. Legend: Staphylococcus warneri, Eubacterium sp. Marseille-P3177, Campylobacter coli, Escherichia coli, Christensenella sp. Marseille-P2438, Intestinimonas butyriciproducens, Barnesiella viscericola, Roseburia hominis, Bacteroides vulgatus, Megasphaera elsdenii, Bordetella hinzii, Citrobacter freundii, Lactobacillus amylovorus, Corynebacterium stationis, Bordetella holmesii.]

Figure 5.3. Abundance of top 10 bacteriophage at the choanal cleft, cloaca and trachea of turkey.

[Stacked bar plot. Bars: Cloaca, Trachea, Choanal Cleft. Axis range: 0 to 100. Legend: Enterobacteria phage P88, Stx2-converting phage 1717, Bordetella phage BPP-1, Enterobacteria phage mEp460, Enterobacteria phage phiP27, Enterobacteria phage P4, Phage cdtI DNA, Salmonella phage SJ46, Enterobacteria phage P1, Enterobacteria phage fiAA91-ss, Enterobacteria phage Sf6, Shigella phage SfIV.]

Figure 5.4. Normalized relative abundance of top 10 fungi at the choanal cleft, cloaca and trachea of turkey.

[Stacked bar plot. Bars: Cloaca, Trachea, Choanal Cleft. Axis range: 0 to 100. Legend: Penicillium chrysogenum, Wickerhamomyces ciferrii, Trichosporon asahii, Usnea ceratina, Botrytis cinerea, Sordaria macrospora.]

Figure 5.5. Venn Diagram of the eukaryotic viruses detected in the choanal cleft, cloaca and trachea of turkeys.

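[Venn diagram. Choanal cleft only (0): none. Trachea only (1): Meleagrid herpesvirus 1. Cloaca only (1): Avian leukosis virus. Choanal cleft and trachea (1): Gallid herpesvirus 2&3. Choanal cleft and cloaca (3): Turkey astrovirus 2, Rotavirus D, Turkey gallivirus. Trachea and cloaca (0): none. All three locations (1): Adenovirus.]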

Figure 5.6. Venn Diagram of the bacteria detected in the choanal cleft, cloaca and trachea of turkeys.

[Venn diagram. Choanal cleft only (1): Bordetella hinzii. Trachea only (2): Corynebacterium stationis, Bordetella holmesii. Cloaca only (8): Staphylococcus warneri, Campylobacter coli, Christensenella Marseille-P2438, Intestinimonas butyriciproducens, Barnesiella viscericola, Roseburia hominis, Bacteroides vulgatus, Megasphaera elsdenii. Choanal cleft and trachea (2): Citrobacter freundii, Lactobacillus amylovorus. Trachea and cloaca (1): Eubacterium Marseille-P3177. All three locations (1): Escherichia coli.]

Figure 5.7. Venn Diagram of the bacteriophage detected in the choanal cleft, cloaca and trachea of turkeys.

[Venn diagram. Choanal cleft only (4): Bordetella phage BPP-1, Enterobacteria phage phiP27, Enterobacteria phage P4, Enterobacteria phage Sf6. Trachea only (2): Enterobacteria phage SfV, Shigella phage SfIV. Cloaca only (2): Salmonella phage RE-2010, Stxconverting phage vB_EcoP_24B. Choanal cleft and trachea (1): Enterobacteria phage fiAA91-ss. Choanal cleft and cloaca (1): Phage cdtI DNA. Trachea and cloaca (2): Enterobacteria phage YYZ-2008, Escherichia phage TL-2011b. All three locations (5): Enterobacteria phage P88, Stx2-converting phage 1717, Salmonella phage SJ46, Enterobacteria phage P1, Enterobacteria phage mEp460.]

Figure 5.8. Venn Diagram of the fungi detected in the choanal cleft, cloaca and trachea of turkeys.

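[Venn diagram. Choanal cleft only (3): Trichosporon asahii, Botrytis cinerea, Usnea ceratina. Trachea only (1): Sordaria macrospora. Cloaca only (0): none. Two-location overlaps (0): none. All three locations (2): Wickerhamomyces ciferrii, Penicillium chrysogenum.]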

Figure 5.9. Microbial network of eukaryotic viruses, fungi, bacteria and bacteriophage present in the cloaca, trachea and choanal cleft of turkeys. Yellow nodes are species identified in the choanal cleft, cloaca and trachea; green nodes are species identified in the cloaca; blue nodes are species identified in the trachea; red nodes are species identified in the choanal cleft.

[Network diagram. Legend: Cloaca, Choanal cleft, Trachea, All locations.]

Table 5.1. Avian specific viral genome database structure.

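| Virus Database | Stranded (a) | Sense (b) | Segmented (c) | Enveloped (d) | Family | Complete Genomes |
|---|---|---|---|---|---|---|
| Avian DNA Viral Database | Double/Single Stranded | - | - | Enveloped | Hepeviridae | 1 |
| Avian DNA Viral Database | Double/Single Stranded | - | - | Enveloped | Hepadnaviridae | 1 |
| Avian DNA Viral Database | Single Stranded | - | - | Non-Enveloped | Genomoviridae | 3 |
| Avian DNA Viral Database | Single Stranded | - | - | Non-Enveloped | Parvoviridae | 7 |
| Avian DNA Viral Database | Single Stranded | - | - | Non-Enveloped | Circoviridae | 10 |
| Avian DNA Viral Database | Single Stranded | - | - | Non-Enveloped | Smacoviridae | 3 |
| Avian DNA Viral Database | Double Stranded | - | - | Enveloped | Poxviridae | 3 |
| Avian DNA Viral Database | Double Stranded | - | - | Enveloped | Herpesviridae | 6 |
| Avian DNA Viral Database | Double Stranded | - | - | Non-Enveloped | Adenoviridae | 14 |
| Avian RNA Viral Database | Double Stranded | - | Segmented | Non-Enveloped | Reoviridae | 5 |
| Avian RNA Viral Database | Double Stranded | - | Segmented | Non-Enveloped | Birnaviridae | 1 |
| Avian RNA Viral Database | Single Stranded | Positive | Non-Segmented | Enveloped | Retroviridae | 5 |
| Avian RNA Viral Database | Single Stranded | Positive | Non-Segmented | Enveloped | Flaviviridae | 3 |
| Avian RNA Viral Database | Single Stranded | Positive | Non-Segmented | Enveloped | Coronaviridae | 5 |
| Avian RNA Viral Database | Single Stranded | Positive | Non-Segmented | Non-Enveloped | Astroviridae | 5 |
| Avian RNA Viral Database | Single Stranded | Positive | Non-Segmented | Non-Enveloped | Caliciviridae | 1 |
| Avian RNA Viral Database | Single Stranded | Positive | Non-Segmented | Non-Enveloped | Picornaviridae | 17 |
| Avian RNA Viral Database | Single Stranded | Negative | Segmented | Enveloped | Orthomyxoviridae | 16 |
| Avian RNA Viral Database | Single Stranded | Negative | Segmented | Enveloped | Phenuiviridae | 1 |
| Avian RNA Viral Database | Single Stranded | Negative | Non-Segmented | Enveloped | Bornaviridae | 3 |
| Avian RNA Viral Database | Single Stranded | Negative | Non-Segmented | Enveloped | Pneumoviridae | 1 |
| Avian RNA Viral Database | Single Stranded | Negative | Non-Segmented | Enveloped | Paramyxoviridae | 14 |

a single stranded, double stranded or single/double stranded DNA and RNA viruses
b positive-sense or negative-sense RNA viruses
c segmented or non-segmented RNA viruses
d enveloped or non-enveloped DNA and RNA viruses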

Table 5.2. Quality Trimming and Host DNA Decontamination of reads generated by DNA-Seq and RNA-Seq from samples collected from the choanal cleft, cloaca and trachea of turkeys

| Sample | Sample Location | Sequencing Method | Number Raw Reads | Number Trimmed Reads | Number Mapped to Host | Number Unmapped to Host |
|---|---|---|---|---|---|---|
| CK91 | Choanal | RNA Seq | 39,116,031 | 39,062,667 | 1,989,029 | 37,073,638 |
| CK92 | Trachea | RNA Seq | 63,836,927 | 63,770,120 | 4,209,347 | 59,560,773 |
| CK93 | Cloacal | RNA Seq | 61,604,278 | 61,579,860 | 258,029 | 61,321,831 |
| CK94 | Choanal | DNA Seq | 65,037,361 | 65,026,950 | 57,894,840 | 7,132,110 |
| CK95 | Trachea | DNA Seq | 67,112,661 | 67,101,764 | 59,695,581 | 7,406,183 |
| CK96 | Cloacal | DNA Seq | 60,242,103 | 60,232,603 | 45,152,230 | 15,080,373 |

Table 5.3. Shannon diversity of virus, bacteria, bacteriophage and fungi in choanal cleft, trachea and cloaca of turkey

| | Choanal Cleft | Trachea | Cloaca |
|---|---|---|---|
| Virus | 0.83 | 0.20 | 1.16 |
| Bacteria | 0.75 | 0.86 | 1.20 |
| Bacteriophage | 2.76 | 2.52 | 3.37 |
| Fungi | 0.59 | 0.24 | 0.42 |

Table 5.4. Eukaryotic viral species abundance in the choanal cleft, trachea and cloaca of turkey.

| Location | Virus Nucleic Acid Type | Virus Description | Virus Enveloping | Virus Taxonomy | Virus Species | Number Mapped | Normalized Abundance | Relative Abundance (%) |
|---|---|---|---|---|---|---|---|---|
| Choanal Cleft | DNA | ds | enveloped | Herpesviridae, Mardivirus, Gallid alphaherpesvirus 2&3 | Gallid herpesvirus 2&3 | 101 | 2253 | 1.1 |
| Choanal Cleft | DNA | ds | nonenveloped | Adenoviridae, Aviadenovirus, Aviadenovirus | Adenovirus | 49 | 4503 | 2.3 |
| Choanal Cleft | RNA | ss, positive, nonsegmented | nonenveloped | Astroviridae, Avastrovirus, Avastrovirus | Turkey astrovirus 2 | 2 | 30851 | 15.5 |
| Choanal Cleft | RNA | ds, positive, segmented | nonenveloped | Reoviridae, Rotavirus, Rotavirus D | Rotavirus D | 1 | 148309 | 74.4 |
| Choanal Cleft | RNA | ss, positive, nonsegmented | nonenveloped | Picornaviridae, Gallivirus, Gallivirus A | Turkey gallivirus | 1 | 13354 | 6.7 |
| Trachea | DNA | ds | enveloped | Herpesviridae, Mardivirus, Gallid alphaherpesvirus 2&3 | Gallid herpesvirus 2&3 | 98 | 2111 | 3.4 |
| Trachea | DNA | ds | nonenveloped | Adenoviridae, Aviadenovirus, Aviadenovirus | Adenovirus | 677 | 59709 | 95.7 |
| Trachea | DNA | ds | enveloped | Herpesviridae, Mardivirus, Meleagrid alphaherpesvirus 1 | Meleagrid herpesvirus 1 | 24 | 570 | 0.9 |
| Cloaca | DNA | ds | nonenveloped | Adenoviridae, Aviadenovirus, Aviadenovirus | Adenovirus | 270 | 50709 | 0.4 |
| Cloaca | RNA | ss, positive | enveloped | Retroviridae, Alpharetrovirus, Avian leukosis virus | Avian leukosis virus | 1 | 120036 | 1.0 |
| Cloaca | RNA | ss, positive, nonsegmented | nonenveloped | Astroviridae, Avastrovirus, Avastrovirus 3 | Turkey astrovirus 2 | 37 | 4399671 | 35.1 |
| Cloaca | RNA | ds, positive, segmented | nonenveloped | Reoviridae, Rotavirus, Rotavirus D | Rotavirus D | 3 | 3429738 | 27.4 |
| Cloaca | RNA | ss, positive, nonsegmented | nonenveloped | Picornaviridae, Gallivirus, Gallivirus A | Turkey gallivirus | 44 | 4529386 | 36.1 |

Table 5.5. Bacteria species abundance in the choanal cleft, trachea and cloaca of turkey.

Location  Bacteria Name  Number Mapped  Normalized Abundance  Relative Abundance
Choanal Cleft  Bordetella hinzii  113957  90412  50.8
Choanal Cleft  Escherichia coli  111443  83035  46.6
Choanal Cleft  Citrobacter freundii  2566  1962  1.1
Choanal Cleft  Lactobacillus amylovorus  1455  2729  1.5
Trachea  Escherichia coli  179927  130018  67.5
Trachea  Citrobacter freundii  6669  54777  28.4
Trachea  Corynebacterium stationis  1957  2634  1.4
Trachea  Eubacterium sp. Marseille-P3177  1774  1916  1.0
Trachea  Lactobacillus amylovorus  1182  2150  1.1
Trachea  Bordetella holmesii  1158  1183  0.6
Cloaca  Eubacterium sp. Marseille-P3177  85221  121668  6.5
Cloaca  Escherichia coli  67614  64596  3.4
Cloaca  Campylobacter coli  29671  72910  3.9
Cloaca  Christensenella sp. Marseille-P2438  24832  48476  2.6
Cloaca  Intestinimonas butyriciproducens  23550  34859  1.9
Cloaca  Barnesiella viscericola  20852  33871  1.8
Cloaca  Bacteroides vulgatus  19634  19006  1.0
Cloaca  Roseburia hominis  16220  22568  1.2
Cloaca  Flavonifractor plautii  8568  11214  0.6
Cloaca  Megasphaera elsdenii  6767  13505  0.7
Cloaca  Lactobacillus amylovorus  4143  9965  0.5
Cloaca  Alistipes finegoldii  4103  5491  0.3
Cloaca  Bacteroides salanitronis  3789  4463  0.2
Cloaca  Bifidobacterium animalis  3764  9705  0.5
Cloaca  Staphylococcus warneri  3644  1381197  73.8
Cloaca  Mobiluncus curtisii  3088  7190  0.4
Cloaca  Clostridium cellulovorans  2833  2691  0.1
Cloaca  Odoribacter splanchnicus  2609  2969  0.2
Cloaca  Eubacterium rectale  2316  3355  0.2
Cloaca  Cloacibacillus porcorum  2083  2904  0.2
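The relative abundances in Table 5.5 appear consistent with each organism's normalized abundance expressed as a percentage of the total normalized abundance at that location. The short Python sketch below checks this for the choanal cleft rows. The function name `relative_abundance` is illustrative and is not part of BiomeSeq, and the percentage-of-total relationship is inferred from the table values rather than stated in this section.

```python
def relative_abundance(normalized):
    """Return each normalized abundance as a percentage of the column total."""
    total = sum(normalized.values())
    return {name: 100.0 * value / total for name, value in normalized.items()}


# Normalized abundances for the choanal cleft, copied from Table 5.5.
choanal_cleft = {
    "Bordetella hinzii": 90412,
    "Escherichia coli": 83035,
    "Citrobacter freundii": 1962,
    "Lactobacillus amylovorus": 2729,
}

# Print one line per species, rounded to one decimal place as in the table.
for name, percent in relative_abundance(choanal_cleft).items():
    print(f"{name}: {percent:.1f}")
```

Rounded to one decimal place, these values can be compared directly with the Relative Abundance column for the choanal cleft.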

Table 5.6. Bacteriophage species with top 10 highest abundances in the choanal cleft, trachea and cloaca of turkey.

Location  Phage Name  Number Mapped  Normalized Abundance  Relative Abundance
Choanal Cleft  Enterobacteria phage P88  1383  150522  20.3
Choanal Cleft  Enterobacteria phage P4  377  126420  17.0
Choanal Cleft  Bordetella phage BPP-1  952  87327  11.8
Choanal Cleft  Stx2-converting phage 1717  1305  81850  11.0
Choanal Cleft  Enterobacteria phage mEp460  557  48778  6.6
Choanal Cleft  Enterobacteria phage phiP27  436  39917  5.4
Choanal Cleft  Phage cdtI DNA  318  26361  3.5
Choanal Cleft  Enterobacteria phage fiAA91-ss  161  18662  2.5
Choanal Cleft  Enterobacteria phage Sf6  138  13777  1.9
Choanal Cleft  Enterobacteria phage P1  187  7689  1.0
Choanal Cleft  Salmonella phage SJ46  195  7348  1.0
Trachea  Enterobacteria phage P88  1514  159809  39.0
Trachea  Enterobacteria phage P1  769  30665  7.5
Trachea  Shigella phage SfIV  300  28525  7.0
Trachea  Salmonella phage SJ46  681  24887  6.1
Trachea  Enterobacteria phage mEp460  237  20129  4.9
Trachea  Stx2-converting phage 1717  256  15572  3.8
Trachea  Enterobacteria phage SfV  148  15091  3.7
Trachea  Escherichia phage TL-2011b  178  15025  3.7
Trachea  Enterobacteria phage fiAA91-ss  133  14951  3.7
Trachea  Enterobacteria phage YYZ-2008  106  7299  1.8
Cloaca  Enterobacteria phage P88  104  14513  11.6
Cloaca  Salmonella phage SJ46  262  12658  10.1
Cloaca  Stx2-converting phage 1717  114  9168  7.3
Cloaca  Enterobacteria phage mEp460  71  7972  6.4
Cloaca  Salmonella phage RE-2010  54  7911  6.3
Cloaca  Enterobacteria phage P1  142  7486  6.0
Cloaca  Escherichia phage TL-2011b  60  6696  5.4
Cloaca  Phage cdtI DNA, complete genome  42  4464  3.6
Cloaca  Stxconverting phage vB_EcoP_24B  45  3899  3.1
Cloaca  Enterobacteria phage YYZ-2008  36  3278  2.6

Table 5.7. Fungal species with top 10 highest abundances in the choanal cleft, trachea and cloaca of turkey.

Location  Fungi Name  Number Mapped  Normalized Abundance  Relative Abundance
Choanal Cleft  Penicillium chrysogenum  59  144821  73.2
Choanal Cleft  Wickerhamomyces ciferrii  1318  52296  26.4
Choanal Cleft  Trichosporon asahii  1  120  0.1
Choanal Cleft  Usnea ceratina  2  119  0.1
Choanal Cleft  Botrytis cinerea  71  102  0.1
Trachea  Wickerhamomyces ciferrii  1345  4016207  94.7
Trachea  Penicillium chrysogenum  51  121408  2.9
Trachea  Sordaria macrospora  9  100659  2.4
Cloaca  Wickerhamomyces ciferrii  677  2672670  85.3
Cloaca  Penicillium chrysogenum  145  456360  14.6
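As a consistency check between Tables 5.3 and 5.7, the Shannon index can be recomputed from the fungal relative abundances listed above. The sketch assumes the natural-logarithm form H = -Σ p_i ln p_i, applied to relative abundances rescaled to sum to one. That form is an assumption here, not a statement of how BiomeSeq computes diversity, although it gives values close to the fungal entries reported in Table 5.3 for the trachea (0.24) and cloaca (0.42).

```python
import math


def shannon_index(abundances):
    """Shannon diversity H = -sum(p * ln p) over non-zero proportions."""
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


# Relative abundances (%) of fungal species, copied from Table 5.7.
trachea_fungi = [94.7, 2.9, 2.4]
cloaca_fungi = [85.3, 14.6]

print(f"Trachea fungi: H = {shannon_index(trachea_fungi):.2f}")
print(f"Cloaca fungi:  H = {shannon_index(cloaca_fungi):.2f}")
```

Because the index depends only on proportions, the same function can be applied to normalized abundances instead of percentages.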

REFERENCES

1. USDA, N.A.S.S. Poultry - Production and Value 2018 Summary. 2019; Available

from:

https://www.nass.usda.gov/Publications/Todays_Reports/reports/plva0519.pdf.

2. USDA, N.A.S.S. Turkeys: Production and Value of Production. 2017; Available

from: https://www.nass.usda.gov/Charts_and_Maps/Poultry/tkprvl.php.

3. Johnson, R. Global turkey meat market: Key findings and insights. The Poultry Site

2018; Available from:

https://thepoultrysite.com/news/2018/05/global-turkey-meat-market-key-findings-and-insights.

4. Ramos, S., M. MacLachlan, and A. Melton, Impacts of the 2014-2015 Highly

Pathogenic Avian Influenza Outbreak on the U.S. Poultry Sector. USDA,

Economic Research Service, 2015.

5. Mulholland, K.A. and C.L. Keeler, BiomeSeq: A Tool for the Characterization of

Animal Microbiomes from Metagenomic Data. bioRxiv, 2019: p. 800995.

6. Dalloul, R.A., et al., Multi-platform next-generation sequencing of the domestic

turkey (Meleagris gallopavo): genome assembly and analysis. PLoS Biol, 2010.

8(9).

7. Li, H. and R. Durbin, Fast and accurate long-read alignment with Burrows-

Wheeler transform. Bioinformatics, 2010. 26(5): p. 589-95.

8. Daly G., L.R., Rowe W., Stubbs S., Wilkinson M., Ramirez-Gonzalez R., Mario

C., Bernal W., Heeney J. , Host subtraction, filtering and assembly validations for

novel viral discovery using next generation sequencing data. PLoS One, 2015.

10(6).

9. Langmead, B. and S.L. Salzberg, Fast gapped-read alignment with Bowtie 2. Nat

Methods, 2012. 9(4): p. 357-9.

10. Mulholland, K.A. BiomeSeq Microbial Databases. Avian Genomics 2019;

Available from: https://sites.udel.edu/aviangenomics/.

11. O'Leary NA, W.M., Brister J.R., Ciufo S., Haddad D., McVeigh R., Rajput B.,

Robbertse B., Smith-White B., Ako-Adjei D., Astashyn A., Badretdin A., Bao Y.,

Blinkova O., Brover V., Chetvernin V., Choi J., Cox E., Ermolaeva O., Farrell

C.M., Goldfarb T., Gupta T., Haft D., Hatcher E., Hlavina W., Joardar V.S.,

Kodali V.K., Li W., Maglott D., Masterson P., McGarvey K.M., Murphy M.R.,

O'Neill K., Pujar S., Rangwala S.H., Rausch D., Riddick L.D., Schoch C., Shkeda

A., Storz S.S., Sun H., Thibaud-Nissen F., Tolstoy I., Tully R.E., Vatsan A.R.,

Wallin C., Webb D., Wu W., Landrum M.J., Kimchi A., Tatusova T., DiCuccio

M., Kitts P., Murphy T.D., Pruitt K.D., Reference sequence (RefSeq) database at

NCBI: current status, taxonomic expansion, and functional annotation. Nucleic

Acids Res., 2016. 4: p. 733-745.

12. Herath, D., et al., Assessing Species Diversity Using Metavirome Data: Methods

and Challenges. Comput Struct Biotechnol J, 2017. 15: p. 447-455.

13. Moustafa, A., et al., The blood DNA virome in 8,000 humans. PLoS Pathog, 2017.

13(3): p. e1006292.

14. Ludwig, J. and J. Reynolds, Statistical Ecology, ed. Wiley. 1988, New York.

15. Lemos, L.N., et al., Rethinking microbial diversity analysis in the high throughput

sequencing era. Journal of Microbiological Methods, 2011. 86(1): p. 42-51.

16. Chen, H. and P.C. Boutros, VennDiagram: a package for the generation of

highly-customizable Venn and Euler diagrams in R. BMC Bioinformatics, 2011.

12(1): p. 35.

17. Shannon, P., et al., Cytoscape: a software environment for integrated models of

biomolecular interaction networks. Genome Res, 2003. 13(11): p. 2498-504.

18. Foley, S.L., et al., Population dynamics of Salmonella enterica serotypes in

commercial egg and poultry production. Applied and Environmental

Microbiology, 2011. 77(13): p. 4273-4279.

19. Magdelaine, P., M.P. Spiess, and E. Valceschini, Poultry meat consumption

trends in Europe. World's Poultry Science Journal, 2008. 64(1): p. 53-64.

20. USDA, Respiratory Disease on Breeder-Chicken Farms in the United States.

Technical Brief 2012; Available from: https://www.aphis.usda.gov/aphis/home.

21. Marin, C., et al., Wild griffon vultures (Gyps fulvus) as a source of Salmonella

and Campylobacter in Eastern Spain. PLoS One, 2014. 9(4): p. e94191.

22. Reynolds, D.L., Y.M. Saif, and K.W. Theil, A survey of enteric viruses of turkey

poults. Avian Dis, 1987. 31(1): p. 89-98.

23. Pantin-Jackwood, M.J., et al., Enteric viruses detected by molecular methods in

commercial chicken and turkey flocks in the United States between 2005 and

2006. Avian Dis, 2008. 52(2): p. 235-44.

24. Reynolds, D.L., K.W. Theil, and Y.M. Saif, Demonstration of rotavirus and

rotavirus-like virus in the intestinal contents of diarrheic pheasant chicks. Avian

Dis, 1987. 31(2): p. 376-9.

25. Day, J.M. and L. Zsak, Recent progress in the characterization of avian enteric

viruses. Avian Dis, 2013. 57(3): p. 573-80.

26. Nighot, P.K., et al., Astrovirus infection induces sodium malabsorption and

redistributes sodium hydrogen exchanger expression. Virology, 2010. 401(2): p.

146-54.

27. Theil, K. and Y.M. Saif, Age-Related Infections with Rotavirus, Rotaviruslike

Virus, and Atypical Rotavirus in Turkey Flocks. Journal of Clinical Microbiology,

1987. 25(2): p. 333-337.

Chapter 6

CONCLUSIONS AND FUTURE DIRECTIONS

Microbiomes are complex environments consisting of a variety of microorganisms including eukaryotic viruses, bacteria, bacteriophage and fungi. These environments can exist in the oral cavity, intestinal tract, skin, respiratory tract, and vaginal tract of both animals and humans [1, 2]. The microorganisms within these environments interact with the host and each other in either symbiosis or dysbiosis, depending on the condition of the host as well as external factors in the host’s surroundings [3]. Symbiotic relationships occur when a balance of specific microflora is achieved, which contributes to maintaining homeostasis of the host environment.

Conversely, dysbiosis may occur due to a disruption of the environment, either by the colonization of a new infectious agent or the introduction of an unfavorable external environmental condition. This may result in infection or disease in the host.

The respiratory microbiome is understudied in comparison to the intestinal, reproductive and oral microbiomes. However, dysbiosis in this environment is associated with respiratory diseases such as chronic obstructive pulmonary disease (COPD), cystic fibrosis (CF) and asthma in humans. Several studies have identified the specific infectious agents that contribute to these diseases [4, 5] as well as the impact of co-infection by multiple infectious agents [6, 7]. In poultry, the main source of protein consumption with over $46.3 billion in global sales as of 2018, respiratory diseases, particularly avian influenza and respiratory disease complex, can contribute to severe economic losses [8]. For example, the 2014-2015 outbreak of highly pathogenic avian influenza in the United States resulted in a loss of over 50 million chickens and turkeys [9]. Many bacteria, viruses and fungi contributing to respiratory diseases in poultry have been identified, and studies have examined the effect of several bacteria-bacteria [10-12], virus-virus [13, 14] and even bacteria-virus co-infections [15-18] on the severity of the disease. However, due to limitations in current methodologies, a comprehensive view of the complex ecology within the respiratory microbiome remains elusive.

The advancement of next generation sequencing methodologies has given rise to an increase in studies attempting to examine the microbial communities existing in a variety of animals. In contrast to traditional culture-dependent approaches, new technologies allow researchers to identify microbial communities at a relatively low cost [19]. Readily accessible and cost-effective sequencing methodologies, together with a number of user-friendly bioinformatics analysis tools and databases for 16S rRNA sequencing data, provide the standard culture-independent approach for bacterial analysis [20-24].

Although 16S rRNA sequencing has provided insight into one component of the microbiome, it is limited to detecting one specific kingdom, lacks the sensitivity to discriminate between species and cannot be used for novel microbial discovery. Eukaryotic viruses are particularly difficult to analyze due to their high genetic heterogeneity and the lack of a common marker gene. Additionally, there are limited reference genome databases and bioinformatics tools available for viral analysis of microbiomes. To avoid these limitations, researchers have resorted to sequence-independent approaches for viral identification, which presents a new challenge because these approaches lose the information necessary for quantification of the virome.

The two major goals of this work were to develop a comprehensive and user-friendly computational tool for the detection and quantification of the major components of a microbiome and to utilize this tool for the analysis of several microbiomes in both healthy and diseased poultry. These goals were accomplished, and the computational tool developed, BiomeSeq, was successful in characterizing the respiratory microbiome of a healthy broiler flock, the respiratory microbiome of a clinically diseased broiler flock and the respiratory, cloacal and choanal cleft microbiomes of a healthy turkey flock. The design of the tool is described in detail in Chapter 2. In summary, a comprehensive workflow and microbial databases were developed that carefully consider the major limitations of existing methodology. Both DNA- and RNA-Seq data generated from next generation sequencing technology in both single- and paired-end format are accepted.

The workflow begins with a quality control and decontamination step, which includes trimming of adapter sequences and low-quality reads as well as removal of host DNA. Using a sequence-dependent approach, the remaining reads are aligned to four microbial reference genome databases. Eukaryotic viral, bacterial, fungal and bacteriophage databases were constructed using complete and representative genomes obtained from the NCBI Reference Sequence Database and contain 5,693, 3,623, 1,281 and 2,212 genomes, respectively. Normalized abundance, species diversity and genome coverage are determined from the number of reads aligned to the sequences within the microbial databases. The workflow and databases were packaged into a comprehensive computational tool called BiomeSeq, discussed in detail in Chapter 2. To further increase accessibility, BiomeSeq was also implemented as an open-source and user-friendly container available on Docker Hub. Containers such as this allow the end user to download and install BiomeSeq, including the workflow, the databases and all dependent software, on any operating system using one simple command. Furthermore, the container allows users to process their samples, with custom parameters, using one line of code.
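To make the genome coverage quantity mentioned above more concrete, the following Python sketch computes breadth of coverage, that is, the fraction of a reference genome covered by at least one aligned read. This is only an illustration under stated assumptions: the definition used by BiomeSeq is given in Chapter 2 and may differ, and the function name, the interval representation (0-based, end-exclusive) and the example coordinates are all hypothetical.

```python
def breadth_of_coverage(genome_length, alignments):
    """Fraction of genome positions covered by at least one aligned read.

    `alignments` is an iterable of (start, end) pairs using 0-based,
    end-exclusive coordinates on a single reference genome.
    """
    covered = 0
    current_start = None
    current_end = None
    for start, end in sorted(alignments):
        # Clip each alignment to the genome boundaries.
        start, end = max(0, start), min(genome_length, end)
        if start >= end:
            continue
        if current_end is None or start > current_end:
            # Close the previous merged interval and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent alignment: extend the merged interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Hypothetical read alignments on a 1,000-bp reference genome.
reads = [(0, 100), (50, 150), (900, 1000)]
print(breadth_of_coverage(1000, reads))
```

Merging overlapping intervals before summing prevents positions covered by several reads from being counted more than once.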

The performance of BiomeSeq was evaluated using synthetic datasets containing sequences from a variety of microorganisms experimentally observed in the respiratory microbiome of poultry. BiomeSeq detected each microorganism in the datasets, and abundances were calculated with high precision. Using a clinical sample, results obtained by BiomeSeq were compared with results obtained by the 16S rRNA approach. BiomeSeq was able to identify 533 unique bacterial genera compared to 24 detected by 16S rRNA sequencing. Furthermore, BiomeSeq has greater taxonomic sensitivity and is able to identify bacteria at the species level, whereas 16S rRNA sequencing is restricted to detection at the genus level. Moreover, 16S rRNA sequencing can only be employed for taxonomic classification of the bacterial component, leaving the identity of the remaining components of the microbiome unknown.

This resource was employed to investigate the microbial communities inhabiting several different microbiomes in both healthy and clinically diseased avian hosts, as detailed in Chapters 3, 4 and 5. In the first study, the development of the respiratory microbiome of a commercial, antibiotic-free broiler flock was examined at weekly intervals from hatching to processing. For each component of the respiratory microbiome of this flock, microbial abundance was calculated at various taxonomic levels and population shifts were examined at various time points. A total of 11 eukaryotic viruses, 45 bacteria, 31 bacteriophage, and 61 fungi were identified throughout the development of this flock. Notably, the complexity and diversity of the viral community increased as the flock aged, with the occurrence of several viral elements being consistent with the vaccination schedules of the chicks. Additionally, correlations between bacteria and bacteriophage families were investigated and several strongly positive correlations were identified. In the second study, the microbial ecology of the respiratory tract of a broiler flock clinically diagnosed with respiratory disease complex was compared with that of a healthy broiler flock. Changes in the composition and diversity of the viral, bacterial, and bacteriophage microbiomes were observed that were consistent with the complex etiology of this disease. In the final study, the utility of BiomeSeq for characterizing a variety of host species and microbiome locations was highlighted. BiomeSeq was successful in identifying microbial communities inhabiting three unique microbial niches in a turkey flock: the trachea, choanal cleft and cloaca. BiomeSeq was also successful in characterizing the respiratory microbiome of duck and quail (data not shown).
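The kind of correlation analysis between bacterial and bacteriophage families summarized above can be sketched as follows. The correlation measure actually used is described in Chapter 3. Pearson's r is assumed here purely for illustration, and the weekly abundance values are hypothetical rather than study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly relative abundances (%) for one bacterial family and
# one bacteriophage family; illustrative numbers only, not study data.
bacteria_family = [5.0, 12.0, 20.0, 31.0, 38.0, 44.0]
phage_family = [2.0, 6.0, 9.0, 16.0, 18.0, 23.0]

print(f"r = {pearson_r(bacteria_family, phage_family):.2f}")
```

A rank-based measure such as Spearman's coefficient would be a reasonable alternative when abundances are strongly skewed.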

BiomeSeq was successful in identifying and quantifying microbiomes in different locations with unique niches; microbiomes of both healthy and diseased hosts; and microbiomes of a variety of different host species. Therefore, the utility of this resource can be extended to include additional species, including humans. Because BiomeSeq accepts custom databases, species-specific microbial databases can be utilized in place of the databases provided with BiomeSeq. This feature was demonstrated in three studies (Chapters 3, 4 and 5). As part of an extensive sustainability plan for BiomeSeq, an automated program was designed to update the microbial databases biannually. Furthermore, BiomeSeq is accessible to users with various levels of command-line knowledge and computational resources. In addition to the software package and container, BiomeSeq will also become available as a web tool.

These studies provide valuable insight into the complexity of microbiomes. However, this work could be further expanded in several directions. For example, BiomeSeq was designed to identify known microbial species, but this information can be extended by employing sequence-independent approaches, such as contig assembly, which can be used to identify novel microbial elements that the reference databases lack. Moreover, the data presented in this work may be expanded to include metatranscriptomics, metabolomics and proteomics, providing invaluable insight into biological functions occurring within a particular microbiome. The genomic microbial networks presented in Chapters 3, 4 and 5 are the first visual representations of complete microbiomes, and expanding these networks to incorporate multi-omics data would provide an even deeper understanding. Other possible directions include modeling microbial community structures to predict disease progression during outbreaks. This could help explain how community diversity and shifts in specific species abundances contribute to the severity and spread of disease. Additionally, potential microbial interactions could be examined by utilizing text mining to generate systemic knowledge networks at the species or gene level. This type of analysis can strengthen our understanding of dynamic communities by providing valuable information about possible interactions reported in the literature. Interestingly, cross-talk between microbiomes and other areas of the body has been identified, including the gut-brain [25-28], gut-kidney [29, 30], and gut-liver axes [29], linking microbiomes to conditions such as depression [25, 26], eating disorders [31], autism [27, 28], cancer [32-35], kidney disease [29, 30] and diabetes [36]. The methodological approaches described in this work may help reveal information about these complex systems.

The available literature is rich with studies attempting to decipher the role microorganisms play in a myriad of biological functions in healthy and diseased animals. These studies have emphasized one or two components at a time. However, due to the lack of robust computational tools utilizing metagenomic sequencing data, a complete view of the microbiome remains elusive. The approaches used to develop the methodology necessary to overcome these challenges are detailed in this work and have contributed to a better understanding of the dynamic community structure of microbiomes. By providing new information on how microbial communities develop over time, the population shifts observed in healthy and diseased animals, and the similarities and differences observed between microbiomes in different locations of the same animal and in different species, this work allows the complexity of biological systems to be more fully appreciated. Furthermore, the development of this computational tool as an open-source and user-friendly resource was motivated by the hope that it will be used to facilitate future investigations of microbiomes and advance knowledge in this growing field.

REFERENCES

1. Human Microbiome Project, C., A framework for human microbiome research.

Nature, 2012. 486(7402): p. 215-221.

2. Human Microbiome Project, C., Structure, function and diversity of the healthy

human microbiome. Nature, 2012. 486(7402): p. 207-214.

3. Turnbaugh, P.J., et al., The human microbiome project. Nature, 2007. 449(7164):

p. 804-810.

4. Papi, A., et al., Infections and airway inflammation in chronic obstructive

pulmonary disease severe exacerbations. Am J Respir Crit Care Med, 2006.

173(10): p. 1114-1121.

5. Rohde, G., et al., Respiratory viruses in exacerbations of chronic obstructive

pulmonary disease requiring hospitalisation: a case-control study. Thorax, 2003.

58(1): p. 37-42.

6. Bosch, A.A., et al., Viral and bacterial interactions in the upper respiratory tract.

PLoS Pathog, 2013. 9(1): p. e1003057.

7. Willner, D., et al., Metagenomic analysis of respiratory tract DNA viral

communities in cystic fibrosis and non-cystic fibrosis individuals. PLoS One,

2009. 4(10): p. e7370.

8. USDA, N.A.S.S. Poultry - Production and Value 2018 Summary. 2019; Available

from:

https://www.nass.usda.gov/Publications/Todays_Reports/reports/plva0519.pdf.

9. Ramos, S., M. MacLachlan, and A. Melton, Impacts of the 2014-2015 Highly

Pathogenic Avian Influenza Outbreak on the U.S. Poultry Sector. USDA,

Economic Research Service, 2015.

10. Ganapathy, K., R.C. Jones, and J.M. Bradbury, Pathogenicity of in vivo-passaged

Mycoplasma imitans in turkey poults in single infection and in dual infection with

rhinotracheitis virus. Avian Pathology, 1998. 27(1): p. 80-89.

11. Saif, Y.M., P.D. Moorhead, and E.H. Bohl, Mycoplasma meleagridis and

Escherichia coli infections in germfree and specific-pathogen-free turkey poults:

production of complicated airsacculitis. Am J Vet Res, 1970. 31(9): p. 1637-

1643.

12. Kato, K., Infectious coryza of chickens. V. Influence of Mycoplasma gallisepticum

infection on chicken infected with Haemophilus gallinarum. Natl Inst Anim

Health Q (Tokyo), 1965. 5(4): p. 183-189.

13. Bonfante, F., et al., Synergy or interference of a H9N2 avian influenza virus with

a velogenic Newcastle disease virus in chickens is dose dependent. Avian Pathol,

2017. 46(5): p. 488-496.

14. Karimi-Madab, M., et al., Risk factors for detection of bronchial casts, most

frequently seen in endemic H9N2 avian influenza infection, in poultry flocks in

Iran. Prev Vet Med, 2010. 95(3-4): p. 275-280.

15. Travers, A.F., Concomitant Ornithobacterium rhinotracheale and Newcastle

disease infection in broilers in South Africa. Avian Dis, 1996. 40(2): p. 488-490.

16. Okoye, J.O., C.N. Okeke, and F.K. Ezeobele, Effect of infectious bursal disease

virus infection on the severity of Aspergillus flavus aspergillosis of chickens.

Avian Pathol, 1991. 20(1): p. 167-171.

17. Omuro, M., et al., Interaction of Mycoplasma gallisepticum, mild strains of

Newcastle disease virus and infectious bronchitis virus in chickens. Natl Inst

Anim Health Q (Tokyo), 1971. 11(2): p. 83-93.

18. Kishida, N., et al., Co-infection of Staphylococcus aureus or Haemophilus

paragallinarum exacerbates H9N2 influenza A virus infection in chickens. Arch

Virol, 2004. 149(11): p. 2095-2104.

19. Wetterstrand, K.A. DNA Sequencing Costs: Data from the NHGRI Genome

Sequencing Program. 2019; Available from:

https://www.genome.gov/about-genomics/fact-sheets/DNA-Sequencing-Costs-Data.

20. Caporaso J, K.J., Stombaugh J, Bittinger K, Bushman F, Costello E, Fierer N,

Peña A, Goodrich J, Gordon J, Huttley G, Kelley ST, Knights D, Koenig JE, Ley

R, Lozupone C, McDonald D, Muegge B, Pirrung M, Reeder J, Sevinsky JR,

Turnbaugh P, Walters W, Widmann J, Yatsunenko T, Zaneveld J, Knight R.,

QIIME allows analysis of high-throughput community sequencing data. Nature

Methods, 2010. 7: p. 335-336.

21. Meyer F, P.D., D’Souza M, Olson R, Glass E, Kubal M, Paczian T, Rodriguez A,

Stevens R, Wilke A, Wilkening J, Edwards R., The metagenomics RAST server- a

public resource for the automatic phylogenetic and functional analysis of

metagenomes. BMC Bioinformatics, 2008. 9: p. 386.

22. Schloss P, W.S., Ryabin T, Hall J, Hartman M, Hollister E, Lesniewski R, Oakley

B, Parks D, Robinson C, Sahl J, Stres B, Thallinger G, Van Horn D, Weber C. ,

Introducing mothur: Open-source, platform-independent, community-supported

software for describing and comparing microbial communities. Appl Environ

Microbiol, 2009. 75: p. 7537-7541.

23. DeSantis, T.Z., et al., Greengenes, a chimera-checked 16S rRNA gene database

and workbench compatible with ARB. Appl Environ Microbiol, 2006. 72(7): p.

5069-72.

24. Quast C, P.E., Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glockner F.,

The SILVA ribosomal RNA gene database project: improved data processing and

web-based tools. Nucl Acids Res., 2013. 41: p. 590-596.

25. Agrawal, L., et al., Therapeutic potential of serotonin 4 receptor for chronic

depression and its associated comorbidity in the gut. Neuropharmacology, 2020.

166: p. 107969.

26. Peñalver Bernabé, B., et al., Precision medicine in perinatal depression in light of

the human microbiome. Psychopharmacology (Berl), 2020. 237(4): p. 915-941.

27. Saurman, V., K.G. Margolis, and R.A. Luna, Autism Spectrum Disorder as a

Brain-Gut-Microbiome Axis Disorder. Dig Dis Sci, 2020.

28. Hartman, R.E. and D. Patel, Dietary Approaches to the Management of Autism

Spectrum Disorders, in Personalized Food Intervention and Therapy for Autism

Spectrum Disorder Management, M.M. Essa and M.W. Qoronfleh, Editors. 2020,

Springer International Publishing: Cham. p. 547-571.

29. Raj, D., et al., The gut-liver-kidney axis: Novel regulator of fatty liver associated

chronic kidney disease. Pharmacol Res, 2020. 152: p. 104617.

30. Jazani, N.H., et al., Impact of Gut Dysbiosis on Neurohormonal Pathways in

Chronic Kidney Disease. Diseases, 2019. 7(1): p. 21.

31. Seitz, J., S. Trinh, and B. Herpertz-Dahlmann, The Microbiome and Eating

Disorders. Psychiatric Clinics of North America, 2019. 42(1): p. 93-103.

32. Peters, B.A., et al., Oral Microbiome Composition Reflects Prospective Risk for

Esophageal Cancers. Cancer Res, 2017. 77(23): p. 6777-6787.

33. Gao, S.G., et al., Preoperative serum immunoglobulin G and A antibodies to

Porphyromonas gingivalis are potential serum biomarkers for the diagnosis and

prognosis of esophageal squamous cell carcinoma. BMC Cancer, 2018. 18(1): p.

17.

34. Ertz-Archambault, N., P. Keim, and D. Von Hoff, Microbiome and pancreatic

cancer: A comprehensive topic review of literature. World J Gastroenterol, 2017.

23(10): p. 1899-1908.

35. Flemer, B., et al., The oral microbiota in colorectal cancer is distinctive and

predictive. Gut, 2018. 67(8): p. 1454-1463.

36. Sharma, M., et al., The Epigenetic Connection Between the Gut Microbiome in

Obesity and Diabetes. Front Genet, 2020. 10: p. 1329.

Appendix A

BIOMESEQ: A TOOL FOR THE CHARACTERIZATION OF ANIMAL MICROBIOMES FROM METAGENOMIC DATA

Table S1. List of RefSeq genomes included in simulated datasets.

RefSeq ID  Microbe Type  Microbe Name
NC_002695.2  Bacteria  Escherichia coli
NC_004829.2  Bacteria  Mycoplasma gallisepticum
NC_018016.1  Bacteria  Ornithobacterium rhinotracheale
NZ_CP008918.1  Bacteria  Pasteurella multocida
NZ_CP011096.1  Bacteria  Mycoplasma synoviae
NC_000866.4  Bacteriophage  Enterobacteria phage T4
NC_001604.1  Bacteriophage  Enterobacteria phage T7
NC_019445.1  Bacteriophage  Escherichia phage TL-2011b
NC_019915.1  Bacteriophage  Staphylococcus phage StB20
AY851295.1  Eukaryotic Virus  Avian infectious bronchitis virus strain Mass 41
DQ530348.1  Eukaryotic Virus  Gallid herpesvirus 2 strain CVI988
EF523390.1  Eukaryotic Virus  Gallid herpesvirus 2 strain RB-1B
GQ504720.1  Eukaryotic Virus  Infectious bronchitis virus strain Arkansas DPI
GQ504723.1  Eukaryotic Virus  Infectious bronchitis virus strain Georgia 1998 Vaccine
KM244097.1  Eukaryotic Virus  Influenza A virus (A/chicken/Delaware/10851/2014(H7N7)) segment 1
KM244098.1  Eukaryotic Virus  Influenza A virus (A/chicken/Delaware/10851/2014(H7N7)) segment 2
KM244099.1  Eukaryotic Virus  Influenza A virus (A/chicken/Delaware/10851/2014(H7N7)) segment 3
KM244100.1  Eukaryotic Virus  Influenza A virus (A/chicken/Delaware/10851/2014(H7N7)) segment 4
KM244101.1  Eukaryotic Virus  Influenza A virus (A/chicken/Delaware/10851/2014(H7N7)) segment 5
KM244102.1  Eukaryotic Virus  Influenza A virus (A/chicken/Delaware/10851/2014(H7N7)) segment 6
KM244103.1  Eukaryotic Virus  Influenza A virus (A/chicken/Delaware/10851/2014(H7N7)) segment 7
KM244104.1  Eukaryotic Virus  Influenza A virus (A/chicken/Delaware/10851/2014(H7N7)) segment 8
NC_002229.3  Eukaryotic Virus  Gallid herpesvirus 2
NC_002577.1  Eukaryotic Virus  Gallid herpesvirus 3
NC_002617.1  Eukaryotic Virus  Newcastle disease virus B1
NC_002641.1  Eukaryotic Virus  Meleagrid herpesvirus 1
NC_004178.1  Eukaryotic Virus  Infectious bursal disease virus segment A
NC_004179.1  Eukaryotic Virus  Infectious bursal disease virus segment B
NC_006623.1  Eukaryotic Virus  Gallid herpesvirus 1
NC_007652.1  Eukaryotic Virus  Avian metapneumovirus
NC_010800.1  Eukaryotic Virus  Turkey coronavirus
NC_008282.1  Fungi  Aspergillus oryzae
NC_018100.1  Fungi  Aspergillus oryzae
NC_036435.1  Fungi  Aspergillus oryzae chromosome 1
NC_036436.1  Fungi  Aspergillus oryzae chromosome 2
NC_036437.1  Fungi  Aspergillus oryzae chromosome 3
NC_036438.1  Fungi  Aspergillus oryzae chromosome 4
NC_036439.1  Fungi  Aspergillus oryzae chromosome 5
NC_036440.1  Fungi  Aspergillus oryzae chromosome 6
NC_036441.1  Fungi  Aspergillus oryzae chromosome 7
NC_036442.1  Fungi  Aspergillus oryzae chromosome 8
NC_006088.5  Host Reference  Gallus gallus breed Red Jungle Fowl isolate RJF #256 chromosome 1

Table S2. The number of reads that remain after each BiomeSeq processing step using four simulated sequencing datasets.

Dataset  Raw Reads  Trimmed Reads  Decontaminated Reads  Reads Aligned to Microbial Genomes

Dataset 1 24,522,223 24,521,469 5,158,013 4,681,160

Dataset 2 24,523,708 24,522,890 5,159,593 4,682,852

Dataset 3 24,523,100 24,522,369 5,158,995 4,681,818

Dataset 4 24,523,100 24,522,284 5,158,260 4,680,864
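The effect of each processing step can be summarized as the fraction of reads retained relative to the previous step. The Python sketch below does this for Dataset 1 using the counts in Table S2. The helper name `retained_fractions` and the step labels are illustrative and are not part of BiomeSeq.

```python
# Read counts for Dataset 1, copied from Table S2 (in processing order).
dataset_1 = {
    "raw": 24_522_223,
    "trimmed": 24_521_469,
    "decontaminated": 5_158_013,
    "aligned": 4_681_160,
}


def retained_fractions(step_counts):
    """Fraction of reads kept at each step relative to the previous step."""
    names = list(step_counts)
    return {
        f"{prev}->{curr}": step_counts[curr] / step_counts[prev]
        for prev, curr in zip(names, names[1:])
    }


for step, fraction in retained_fractions(dataset_1).items():
    print(f"{step}: {fraction:.4f}")
```

For this dataset, trimming removes very few reads, host decontamination removes most of them, and the large majority of the remaining reads align to the microbial databases.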

Table S3. Precision and sensitivity of BiomeSeq for each microbial component.

Microbe  True Positive  False Positive  False Negative  Sensitivity  Precision
Eukaryotic Virus  135016  0  270422  0.413  1.000
Bacteria  3847552  12940  916288  0.808  0.997
Bacteriophage  57252  9457  59256  0.491  0.858
Fungi  14541724  0  155116  0.989  1.000
Total  18581544  22397  1401082  0.930  0.999
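For reference, the sensitivity and precision columns can be related to the read counts through the standard definitions, sensitivity = TP / (TP + FN) and precision = TP / (TP + FP). The Python sketch below assumes that Table S3 uses these definitions and applies them to the Bacteria and Total rows. The function names are illustrative.

```python
def sensitivity(true_positive, false_negative):
    """Fraction of truly present reads that were detected: TP / (TP + FN)."""
    return true_positive / (true_positive + false_negative)


def precision(true_positive, false_positive):
    """Fraction of reported reads that were correct: TP / (TP + FP)."""
    return true_positive / (true_positive + false_positive)


# (TP, FP, FN) read counts copied from Table S3.
rows = {
    "Bacteria": (3_847_552, 12_940, 916_288),
    "Total": (18_581_544, 22_397, 1_401_082),
}

for name, (tp, fp, fn) in rows.items():
    print(f"{name}: sensitivity={sensitivity(tp, fn):.3f}, "
          f"precision={precision(tp, fp):.3f}")
```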

Table S4. Rate of speed for simulated data during each BiomeSeq processing step in reads/second.

Simulated Dataset  Quality  Decontamination  Microbial Database Alignment  Quantification  Total
Dataset 1  92,537  6,966  2,614  222,912  325,029
Dataset 2  75,690  5,967  2,294  195,119  279,070
Dataset 3  71,705  6,049  2,354  156,061  236,168
Dataset 4  76,396  6,053  2,211  161,409  246,069


Table S5. Avian specific viral genome database structure.

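| Database | Strandedness (a) | Sense (b) | Segmentation (c) | Envelope (d) | Virus Family | Complete Genomes |
|---|---|---|---|---|---|---|
| Avian DNA Viral Database | Double/Single Stranded | | | Enveloped | Hepeviridae | 1 |
| | Double/Single Stranded | | | Enveloped | Hepadnaviridae | 1 |
| | Single Stranded | | | Non-Enveloped | Genomoviridae | 3 |
| | Single Stranded | | | Non-Enveloped | Parvoviridae | 7 |
| | Single Stranded | | | Non-Enveloped | Circoviridae | 10 |
| | Single Stranded | | | Non-Enveloped | Smacoviridae | 3 |
| | Double Stranded | | | Enveloped | Poxviridae | 3 |
| | Double Stranded | | | Enveloped | Herpesviridae | 6 |
| | Double Stranded | | | Non-Enveloped | Adenoviridae | 14 |
| Avian RNA Viral Database | Double Stranded | | Segmented | Non-Enveloped | Reoviridae | 5 |
| | Double Stranded | | Segmented | Non-Enveloped | Birnaviridae | 1 |
| | Single Stranded | Positive | Non-Segmented | Enveloped | Retroviridae | 5 |
| | Single Stranded | Positive | Non-Segmented | Enveloped | Flaviviridae | 3 |
| | Single Stranded | Positive | Non-Segmented | Enveloped | Coronaviridae | 5 |
| | Single Stranded | Positive | Non-Segmented | Non-Enveloped | Astroviridae | 5 |
| | Single Stranded | Positive | Non-Segmented | Non-Enveloped | Caliciviridae | 1 |
| | Single Stranded | Positive | Non-Segmented | Non-Enveloped | Picornaviridae | 17 |
| | Single Stranded | Negative | Segmented | Enveloped | Orthomyxoviridae | 16 |
| | Single Stranded | Negative | Segmented | Enveloped | Phenuiviridae | 1 |
| | Single Stranded | Negative | Segmented | Enveloped | Bornaviridae | 3 |
| | Single Stranded | Negative | Non-Segmented | Enveloped | Pneumoviridae | 1 |
| | Single Stranded | Negative | Non-Segmented | Enveloped | Paramyxoviridae | 14 |

a single stranded, double stranded or single/double stranded DNA and RNA viruses
b positive-sense or negative-sense RNA viruses
c segmented or non-segmented RNA viruses
d enveloped or non-enveloped DNA and RNA viruses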


Table S6. Abundance of bacterial species detected by BiomeSeq and 16S.

| Method | Family | Genera/Species | Percent Relative Abundance |
|---|---|---|---|
| 16S | Pasteurellaceae | Gallibacterium | 37.8 |
| 16S | Corynebacteriaceae | Corynebacteriaceae* | 22.5 |
| 16S | Staphylococcaceae | Staphylococcus | 9.2 |
| 16S | Lactobacillaceae | Lactobacillus | 8.7 |
| 16S | Lactobacillales | Lactobacillales** | 6.4 |
| 16S | Brevibacteriaceae | Brevibacterium | 3.1 |
| 16S | Staphylococcaceae | Salinicoccus | 2.5 |
| 16S | Streptococcaceae | Streptococcus | 2.3 |
| 16S | Dermabacteraceae | Brachybacterium | 2.0 |
| 16S | Bacillaceae | Bacillaceae* | 2.0 |
| BiomeSeq | Pasteurellaceae | Gallibacterium anatis | 23.1 |
| BiomeSeq | Corynebacteriaceae | Corynebacterium falsenii | 14.5 |
| BiomeSeq | Staphylococcaceae | Staphylococcus haemolyticus | 23.0 |
| BiomeSeq | Enterobacteriaceae | | 9.3 |
| BiomeSeq | Enterobacteriaceae | Escherichia coli | 5.1 |
| BiomeSeq | Staphylococcaceae | Staphylococcus saprophyticus | 2.1 |
| BiomeSeq | Methylobacteriaceae | Methylobacterium radiotolerans | 1.0 |
| BiomeSeq | | Neisseria sicca | 0.8 |
| BiomeSeq | Corynebacteriaceae | Corynebacterium stationis | 0.8 |
| BiomeSeq | Yersiniaceae | Serratia marcescens | 0.8 |


Figure S1. BiomeSeq implemented in a user-friendly container.


Appendix B

METAGENOMIC ANALYSIS OF THE RESPIRATORY MICROBIOME OF A HEALTHY BROILER FLOCK FROM HATCHING TO PROCESSING


Table S1. Sequencing data generated by DNA-Seq, RNA-Seq and 16S rRNA.

| DNA-Seq | Total Reads | After Trimming | % Mapped Host | Mapped to DNA Viral Reads | Mapped to Bacteriophage | Mapped to Fungi |
|---|---|---|---|---|---|---|
| Week 0 | 46,586,608 | 46,568,465 | 86.43 | 1 | 68,174 | 545 |
| Week 1 | 43,940,202 | 43,911,476 | 81.11 | 0 | 63,256 | 649 |
| Week 2 | 33,471,831 | 33,442,860 | 89.8 | 1 | 61,974 | 48 |
| Week 3 | 45,131,953 | 45,108,639 | 84.81 | 2 | 68,330 | 313 |
| Week 4 | 44,621,969 | 44,600,956 | 90 | 387 | 61,412 | 109 |
| Week 5 | 38,630,592 | 38,590,484 | 88.99 | 7 | 51,011 | 142 |
| Week 6 | 45,915,721 | 45,900,494 | 89.73 | 4,634 | 66,310 | 82 |
| Week 7 | 41,215,207 | 41,196,338 | 90.18 | 131 | 64,215 | 76 |
| Total | 339,514,083 | 339,319,712 | 701 | 5,163 | 504,682 | 1,964 |
| Average | 42,439,260 | 42,414,964 | 88 | 645 | 63,085 | 246 |

| RNA-Seq | Total Reads | After Trimming | % Mapped Host | Mapped to RNA Viral Reads |
|---|---|---|---|---|
| Week 0 | 62,375,955 | 62,351,619 | 49.21 | 416 |
| Week 1 | 54,496,930 | 54,477,530 | 44.3 | 16,569 |
| Week 2 | 51,426,090 | 51,415,545 | 65.54 | 10,016 |
| Week 3 | 57,566,851 | 57,551,710 | 28.76 | 977 |
| Week 4 | 56,334,014 | 56,321,058 | 61.62 | 3,660 |
| Week 5 | 50,937,821 | 50,926,463 | 57.56 | 4,579 |
| Week 6 | 53,248,431 | 53,236,536 | 59.08 | 29,476 |
| Week 7 | 54,173,052 | 54,162,138 | 61.44 | 6,243 |
| Total | 440,559,144 | 440,442,599 | 428 | 71,936 |
| Average | 55,069,893 | 55,055,325 | 53 | 8,992 |

| 16S rRNA | Total Reads | OTUs |
|---|---|---|
| Week 0 | 8,438 | 47 |
| Week 1 | 7,550 | 79 |
| Week 2 | 6,737 | 76 |
| Week 4 | 9,450 | 63 |
| Week 5 | 10,718 | 50 |
| Week 7 | 7,288 | 38 |
| Total | 50,181 | 353 |
| Average | 8,364 | 59 |

Table S2. Normalized abundance of detected eukaryotic viruses at the species level.

| Species | Week 0 | Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 | Week 7 |
|---|---|---|---|---|---|---|---|---|
| Gallid alphaherpesvirus 1 | 33.52 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Meleagrid alphaherpesvirus 1 | 0.00 | 0.00 | 41.97 | 0.00 | 31.40 | 0.00 | 0.00 | 0.00 |
| Gallid alphaherpesvirus 2&3 | 0.00 | 0.00 | 0.00 | 63.84 | 1186.57 | 71.12 | 57.03 | 0.00 |
| Avian gyrovirus | 0.00 | 0.00 | 0.00 | 0.00 | 727764.39 | 12256.84 | 147165.92 | 68358.83 |
| Fowl aviadenovirus | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 507048.92 | 12450.59 |
| Chicken astrovirus | 0.00 | 1490.69 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Chicken sicinivirus JSY | 0.00 | 0.00 | 0.00 | 42.11 | 42.11 | 0.00 | 0.00 | 0.00 |
| Avian carcinoma virus | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 78.52 | 0.00 | 78.52 |
| Avian infectious bronchitis virus | 74.78 | 109675.86 | 67740.67 | 4048.83 | 15467.52 | 23999.29 | 207253.45 | 35914.02 |
| Infectious bursal disease virus | 0.00 | 0.00 | 0.00 | 0.00 | 64.88 | 0.00 | 22191.95 | 194.65 |
| Avian Endogenous Retrovirus | 19470.62 | 89108.98 | 47135.34 | 20718.00 | 76251.66 | 65275.35 | 67618.14 | 68898.80 |

Table S3. Normalized percent relative abundance of detected eukaryotic viruses at the species level. Sum of columns = 100%.

| Family | Species | Week 0 | Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 | Week 7 | Frequency |
|---|---|---|---|---|---|---|---|---|---|---|
| Poxviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Herpesviridae | Gallid alphaherpesvirus 1 | 0.17 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 12.5 |
| Herpesviridae | Meleagrid alphaherpesvirus 1 | 0.00 | 0.00 | 0.04 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 12.5 |
| Herpesviridae | Gallid alphaherpesvirus 2&3 | 0.00 | 0.00 | 0.00 | 0.26 | 0.14 | 0.07 | 0.01 | 0.00 | 50.0 |
| Adenoviridae | Fowl aviadenovirus | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 53.30 | 6.70 | 25.0 |
| Hepeviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Hepadnaviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Genomoviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Circoviridae | Avian gyrovirus | 0.00 | 0.00 | 0.00 | 0.00 | 88.66 | 12.05 | 15.47 | 36.77 | 50.0 |
| Parvoviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Reoviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Orthomyxoviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Phenuiviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Birnaviridae | Infectious bursal disease virus | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 2.33 | 0.10 | 37.5 |
| Pneumoviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Paramyxoviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Astroviridae | Chicken astrovirus | 0.00 | 0.74 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 12.5 |
| Picornaviridae | Chicken sicinivirus JSY | 0.00 | 0.00 | 0.00 | 0.17 | 0.01 | 0.00 | 0.00 | 0.00 | 25.0 |
| Retroviridae | Avian Endogenous Retrovirus | 99.45 | 44.49 | 41.02 | 83.30 | 9.29 | 64.20 | 7.11 | 37.06 | 100.0 |
| Retroviridae | Avian carcinoma virus | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.08 | 0.00 | 0.04 | 25.0 |
| Flaviviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Coronaviridae | Avian infectious bronchitis virus | 0.38 | 54.76 | 58.95 | 16.28 | 1.88 | 23.60 | 21.79 | 19.32 | 100.0 |
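
The percentages in Table S3 of this appendix are consistent with dividing each normalized abundance in Table S2 of this appendix by its weekly column total, and the Frequency column is consistent with the percentage of the eight sampled weeks in which a virus has a non-zero entry. The Python sketch below illustrates both calculations for Week 0 and for the infectious bursal disease virus row. The function names are illustrative, and the interpretation of Frequency is inferred from the tabulated values rather than stated in the text.

```python
# Week 0 normalized abundances from Table S2 of this appendix.
# Species with a value of 0.00 in Week 0 are omitted for brevity.
WEEK0 = {
    "Gallid alphaherpesvirus 1": 33.52,
    "Avian infectious bronchitis virus": 74.78,
    "Avian Endogenous Retrovirus": 19470.62,
}

# Weekly percentages for infectious bursal disease virus (Table S3).
IBDV = [0.00, 0.00, 0.00, 0.00, 0.01, 0.00, 2.33, 0.10]


def percent_of_column(column):
    """Scale one week's abundances so that they sum to 100%."""
    total = sum(column.values())
    return {name: 100.0 * value / total for name, value in column.items()}


def detection_frequency(weekly_values):
    """Percent of sampled weeks with a non-zero value."""
    detected = sum(1 for v in weekly_values if v > 0)
    return 100.0 * detected / len(weekly_values)


for name, pct in percent_of_column(WEEK0).items():
    print(f"{name}: {pct:.2f}%")
print(f"IBDV frequency: {detection_frequency(IBDV):.1f}")
```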


Table S4. Normalized percent relative abundance of detected eukaryotic viruses at the family level. Sum of rows = 100%.

| Viral Family | Week 0 | Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 | Week 7 |
|---|---|---|---|---|---|---|---|---|
| Poxviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Herpesviridae | 24.86 | 0.00 | 5.30 | 37.27 | 21.54 | 10.16 | 0.87 | 0.00 |
| Adenoviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 88.84 | 11.16 |
| Hepeviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Hepadnaviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Genomoviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Circoviridae | 0.00 | 0.00 | 0.00 | 0.00 | 57.97 | 7.88 | 10.11 | 24.04 |
| Parvoviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Reoviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Orthomyxoviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Phenuiviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Birnaviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.32 | 0.00 | 95.39 | 4.28 |
| Pneumoviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Paramyxoviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Astroviridae | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Picornaviridae | 0.00 | 0.00 | 0.00 | 97.06 | 2.94 | 0.00 | 0.00 | 0.00 |
| Retroviridae | 25.76 | 11.53 | 10.63 | 21.58 | 2.41 | 16.65 | 1.84 | 9.61 |
| Flaviviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Coronaviridae | 0.19 | 27.80 | 29.93 | 8.26 | 0.96 | 11.98 | 11.06 | 9.81 |
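
Whereas the species-level table is normalized by column, the family-level values in Table S4 sum to 100% across each row. The values are consistent with summing the species-level percentages within a family and then rescaling the eight weekly totals. The Python sketch below illustrates this for two families with a single detected species each; the data structure and function names are illustrative, and the procedure is inferred from the tabulated values rather than stated in the text.

```python
# Species-level weekly percentages (Weeks 0 to 7) from Table S3
# of this appendix, grouped by viral family.
SPECIES_PERCENT = {
    "Circoviridae": {
        "Avian gyrovirus": [0, 0, 0, 0, 88.66, 12.05, 15.47, 36.77],
    },
    "Adenoviridae": {
        "Fowl aviadenovirus": [0, 0, 0, 0, 0, 0, 53.30, 6.70],
    },
}


def family_totals(species_rows):
    """Sum the weekly values of every species in one family."""
    return [sum(week) for week in zip(*species_rows.values())]


def percent_of_row(row):
    """Rescale one row so that its weekly values sum to 100%."""
    total = sum(row)
    if total == 0:
        return [0.0] * len(row)
    return [100.0 * value / total for value in row]


for family, species_rows in SPECIES_PERCENT.items():
    shares = percent_of_row(family_totals(species_rows))
    print(family, [round(s, 2) for s in shares])
```

The Circoviridae row is reproduced to two decimal places. The Adenoviridae result differs from the table in the last digit, which is expected when starting from inputs already rounded to two decimals.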


Table S5. Average normalized relative abundance of viruses detected.

| Strand | Segments | Envelope | Family | Genus | Species |
|---|---|---|---|---|---|
| double stranded DNA 7.59% | | enveloped DNA 0.09% | Herpesviridae 0.09% | Iltovirus 0.02% | Gallid alphaherpesvirus 1 0.02% |
| | | | | Mardivirus 0.06% | Meleagrid alphaherpesvirus 1 0.01% |
| | | | | | Gallid alphaherpesvirus 2&3 0.06% |
| | | non-enveloped DNA 7.50% | Adenoviridae 7.50% | Aviadenovirus 7.50% | Fowl aviadenovirus 7.50% |
| single stranded DNA 19.12% | | non-enveloped DNA 19.12% | Circoviridae 19.12% | Gyrovirus 19.12% | Avian gyrovirus 19.12% |
| single stranded RNA 73.95% | positive, non-segmented 73.64% | non-enveloped RNA 0.77% | Astroviridae 0.74% | Avastrovirus 0.74% | Chicken astrovirus 0.74% |
| | | | Picornaviridae 0.02% | Sicinivirus 0.02% | Chicken sicinivirus JSY 0.02% |
| | | enveloped RNA 72.87% | Coronaviridae 24.62% | Gammacoronavirus 24.62% | Avian infectious bronchitis virus 24.62% |
| | | | Retroviridae 48.25% | Unclassified Retrovirus 48.25% | Avian Endogenous Retrovirus 48.24% |
| | | | | | Avian carcinoma virus 0.01% |
| | negative, segmented 0.31% | enveloped RNA 0.31% | Birnaviridae 0.31% | Avibirnavirus 0.31% | Infectious bursal disease virus 0.31% |

Table S6. Normalized abundance of detected bacteriophage at the species level.

| Species | Week 0 | Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 | Week 7 |
|---|---|---|---|---|---|---|---|---|
| Enterobacteria phage IME10 | 202.28 | 1258.09 | 150.16 | 611.33 | 0.00 | 0.00 | 0.00 | 0.00 |
| Staphylococcus phage SPbeta-like | 313.94 | 0.00 | 0.00 | 284.63 | 90.51 | 0.00 | 0.00 | 0.00 |
| Enterobacteria phage P88 | 0.00 | 928.47 | 0.00 | 1353.48 | 484.17 | 0.00 | 0.00 | 0.00 |
| Staphylococcus phage GH15 | 0.00 | 59.46 | 0.00 | 1040.16 | 41.34 | 0.00 | 0.00 | 0.00 |
| Shigella phage SHFML-11 | 140.99 | 97.43 | 69.77 | 0.00 | 33.87 | 160.41 | 0.00 | 70.65 |
| Enterobacteria phage P1 | 253.79 | 613.83 | 62.80 | 894.81 | 0.00 | 0.00 | 0.00 | 0.00 |
| Staphylococcus phage StB20-like | 0.00 | 0.00 | 0.00 | 3873.58 | 568.48 | 168.27 | 0.00 | 0.00 |
| Staphylococcus phage P108 | 0.00 | 0.00 | 0.00 | 1290.95 | 123.15 | 48.60 | 0.00 | 0.00 |
| Staphylococcus phage phiSA012 | 0.00 | 0.00 | 0.00 | 1108.69 | 244.06 | 48.16 | 0.00 | 0.00 |
| Microbacterium phage Min1 | 2075.63 | 896.48 | 0.00 | 0.00 | 0.00 | 0.00 | 412.69 | 0.00 |
| Staphylococcus phage phiRS7 | 924.06 | 191.57 | 0.00 | 3071.89 | 133.20 | 315.42 | 0.00 | 0.00 |
| Enterobacteria phage cdtI | 170.56 | 2298.33 | 0.00 | 773.17 | 245.85 | 0.00 | 135.64 | 128.21 |
| Enterobacteria phage SfI | 1462.35 | 1732.38 | 310.15 | 1578.36 | 0.00 | 178.27 | 0.00 | 0.00 |
| Staphylococcus phage StB20 | 0.00 | 0.00 | 0.00 | 2979.68 | 1421.19 | 168.27 | 156.82 | 148.23 |
| Salmonella phage SJ46 | 387.63 | 1285.79 | 57.55 | 937.18 | 223.50 | 132.32 | 61.66 | 0.00 |
| Staphylococcus phage MCE-2014 | 56.51 | 58.58 | 0.00 | 2732.68 | 0.00 | 144.68 | 0.00 | 42.48 |
| Enterobacteria phage lambda | 2480.22 | 2399.55 | 122.74 | 249.85 | 476.68 | 141.10 | 526.00 | 0.00 |
| Enterobacteria phage phi92 | 0.00 | 55.94 | 0.00 | 978.52 | 933.43 | 184.20 | 0.00 | 0.00 |
| Enterobacteria phage mEp460 | 2702.67 | 3175.06 | 267.49 | 1089.04 | 0.00 | 461.27 | 143.29 | 0.00 |
| Lactobacillus prophage Lj928 | 0.00 | 10612.23 | 465.28 | 0.00 | 150.58 | 0.00 | 0.00 | 0.00 |
| Staphylococcus phage phiIPLA-RODI | 56.34 | 0.00 | 0.00 | 3490.40 | 284.23 | 480.77 | 0.00 | 42.35 |
| Salmonella phage RE-2010 | 9637.67 | 4873.26 | 523.47 | 4617.59 | 847.08 | 802.38 | 186.95 | 353.40 |
| Enterobacteria phage T7 | 7630.75 | 5620.16 | 298.12 | 3944.67 | 434.18 | 856.81 | 798.51 | 452.84 |
| Stx2-converting phage 1717 | 5032.73 | 4012.93 | 478.95 | 2729.93 | 186.01 | 550.60 | 205.26 | 194.00 |
| Enterobacteria phage VT2phi_272 | 6809.25 | 3781.24 | 270.78 | 3123.52 | 438.18 | 518.81 | 386.81 | 91.40 |
| Shigella phage SfIV | 11295.95 | 7318.20 | 898.40 | 3962.43 | 436.14 | 860.67 | 1283.37 | 0.00 |
| Escherichia phage TL-2011b | 11998.06 | 7981.90 | 398.79 | 4600.12 | 1032.51 | 916.89 | 996.93 | 269.22 |
| Enterobacteria phage RB55 | 0.00 | 0.00 | 3982.75 | 0.00 | 0.00 | 5388.95 | 0.00 | 1284.90 |
| Stx2 converting phage vB_EcoP_24B | 38098.43 | 20466.63 | 825.71 | 12816.52 | 2004.26 | 949.24 | 3096.30 | 522.60 |
| Uncultured phage crAssphage | 1487.20 | 171.29 | 0.00 | 124.85 | 0.00 | 0.00 | 0.00 | 0.00 |
| Enterobacteria phage YYZ-2008 | 5989.66 | 3482.95 | 108.44 | 2428.26 | 526.45 | 0.00 | 697.11 | 109.82 |

Table S7. Normalized percent relative abundance of detected bacteriophage at the species level.

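| Family | Species | Week 0 | Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 | Week 7 | Frequency |
|---|---|---|---|---|---|---|---|---|---|---|
| Myoviridae | Enterobacteria phage P88 | | 1.114 | | 2.030 | 4.262 | | | | 50 |
| Myoviridae | Staphylococcus phage GH15 | | 0.071 | | 1.560 | 0.364 | | | | 37.5 |
| Myoviridae | Shigella phage SHFML-11 | 0.129 | 0.117 | 0.751 | 0.000 | 0.298 | 1.190 | | 1.904 | 75 |
| Myoviridae | Enterobacteria phage P1 | 0.232 | 0.736 | 0.676 | 1.342 | | | | | 50 |
| Myoviridae | Staphylococcus phage P108 | | | | 1.936 | 1.084 | 0.361 | | | 37.5 |
| Myoviridae | Staphylococcus phage phiSA012 | | | | 1.663 | 2.149 | 0.357 | | | 37.5 |
| Myoviridae | Enterobacteria phage SfI | 1.339 | 2.078 | 3.338 | 2.367 | 0.000 | 1.323 | | | 62.5 |
| Myoviridae | Salmonella phage SJ46 | 0.355 | 1.542 | 0.619 | 1.405 | 1.968 | 0.982 | 0.678 | | 87.5 |
| Myoviridae | Staphylococcus phage MCE-2014 | 0.052 | 0.070 | | 4.098 | | 1.074 | | 1.145 | 62.5 |
| Myoviridae | Enterobacteria phage phi92 | | 0.067 | | 1.467 | 8.218 | 1.367 | | | 50 |
| Myoviridae | Staphylococcus phage phiIPLA-RODI | 0.052 | | | 5.234 | 2.502 | 3.568 | | 1.141 | 62.5 |
| Myoviridae | Salmonella phage RE-2010 | 8.825 | 5.845 | 5.634 | 6.924 | 7.457 | 5.954 | 2.057 | 9.525 | 100 |
| Myoviridae | Shigella phage SfIV | 10.344 | 8.778 | 9.669 | 5.942 | 3.840 | 6.387 | 14.123 | | 87.5 |
| Myoviridae | Enterobacteria phage RB55 | | | 42.865 | | | 39.989 | | 34.632 | 37.5 |
| Podoviridae | Enterobacteria phage IME10 | 0.185 | 1.509 | 1.616 | 0.917 | | | | | 50 |
| Podoviridae | Enterobacteria phage T7 | 6.987 | 6.741 | 3.209 | 5.915 | 3.822 | 6.358 | 8.787 | 12.206 | 100 |
| Podoviridae | Enterobacteria phage VT2phi_272 | 6.235 | 4.535 | 2.914 | 4.684 | 3.858 | 3.850 | 4.257 | 2.464 | 100 |
| Podoviridae | Escherichia phage TL-2011b | 10.987 | 9.574 | 4.292 | 6.898 | 9.090 | 6.804 | 10.970 | 7.256 | 100 |
| Podoviridae | Stx2 converting phage vB_EcoP_24B | 34.887 | 24.549 | 8.887 | 19.219 | 17.645 | 7.044 | 34.073 | 14.086 | 100 |
| Siphoviridae | Staphylococcus phage SPbeta-like | 0.287 | | | 0.427 | 0.797 | | | | 37.5 |
| Siphoviridae | Staphylococcus phage StB20-like | | | | 5.809 | 5.005 | 1.249 | | | 37.5 |
| Siphoviridae | Microbacterium phage Min1 | 1.901 | 1.075 | | | | | 4.541 | | 37.5 |
| Siphoviridae | Staphylococcus phage phiRS7 | 0.846 | 0.230 | | 4.606 | 1.173 | 2.341 | | | 62.5 |
| Siphoviridae | Enterobacteria phage cdtI | 0.156 | 2.757 | | 1.159 | 2.164 | | 1.493 | 3.456 | 75 |
| Siphoviridae | Staphylococcus phage StB20 | | | | 4.468 | 12.512 | 1.249 | 1.726 | 3.995 | 62.5 |
| Siphoviridae | Enterobacteria phage lambda | 2.271 | 2.878 | 1.321 | 0.375 | 4.196 | 1.047 | 5.788 | | 87.5 |
| Siphoviridae | Enterobacteria phage mEp460 | 2.475 | 3.808 | 2.879 | 1.633 | | 3.423 | 1.577 | | 75 |
| Siphoviridae | Lactobacillus prophage Lj928 | 0.000 | 12.729 | 5.008 | | 1.326 | | | | 37.5 |
| Siphoviridae | Stx2-converting phage 1717 | 4.608 | 4.813 | 5.155 | 4.094 | 1.638 | 4.086 | 2.259 | 5.229 | 100 |
| Unclassified | Uncultured phage crAssphage | 1.362 | 0.205 | | 0.187 | | | | | 37.5 |
| Unclassified | Enterobacteria phage YYZ-2008 | 5.485 | 4.178 | 1.167 | 3.641 | 4.635 | | 7.671 | 2.960 | 87.5 |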

Table S8. Average normalized relative abundance of detected bacteriophage.

| Order | Family | Genus | Species | Average |
|---|---|---|---|---|
| Caudovirales 142.34% | Myoviridae 70.99% | P1virus 1.83% | Enterobacteria phage P1 | 0.75% |
| | | | Salmonella phage SJ46 | 1.08% |
| | | Spounavirinae 6.97% | Staphylococcus phage GH15 | 0.67% |
| | | | Staphylococcus phage MCE-2014 | 1.29% |
| | | | Staphylococcus phage P108 | 1.13% |
| | | | Staphylococcus phage phiIPLA-RODI | 2.50% |
| | | | Staphylococcus phage phiSA012 | 1.39% |
| | | Tevenvirinae 39.89% | Enterobacteria phage RB55 | 39.16% |
| | | | Shigella phage SHFML-11 | 0.73% |
| | | unclassified Myoviridae 22.31% | Enterobacteria phage P88 | 2.47% |
| | | | Enterobacteria phage phi92 | 2.78% |
| | | | Enterobacteria phage SfI | 2.09% |
| | | | Salmonella phage RE-2010 | 6.53% |
| | | | Shigella phage SfIV | 8.44% |
| | Podoviridae 40.19% | Autographivirinae 6.75% | Enterobacteria phage T7 | 6.75% |
| | | Epsilon15virus 8.23% | Escherichia phage TL-2011b | 8.23% |
| | | Sepvirinae 24.15% | Enterobacteria phage VT2phi_272 | 4.10% |
| | | | Stx2 converting phage vB_EcoP_24B | 20.05% |
| | | Unclassified Podoviridae 1.06% | Enterobacteria phage IME10 | 1.06% |
| | Siphoviridae 31.16% | 11.15% | Enterobacteria phage cdtI | 1.98% |
| | | | Enterobacteria phage lambda | 2.55% |
| | | | Enterobacteria phage mEp460 | 2.63% |
| | | | Stx2-converting phage 1717 | 3.99% |
| | | Spbetavirus 0.50% | Staphylococcus phage SPbeta-like | 0.50% |
| | | unclassified Siphoviridae 19.51% | Lactobacillus prophage Lj928 | 6.35% |
| | | | Microbacterium phage Min1 | 2.51% |
| | | | Staphylococcus phage phiRS7 | 1.84% |
| | | | Staphylococcus phage StB20 | 4.79% |
| | | | Staphylococcus phage StB20-like | 4.02% |
| Unclassified 4.83% | Unclassified 4.83% | Unclassified 4.83% | Enterobacteria phage YYZ-2008 | 4.25% |
| | | | Uncultured phage crAssphage | 0.59% |

Table S9. Relative abundance of detected bacteria at the genus level. Taxa that could not be assigned a genus are shown at the most specific taxonomic level that could be assigned: * (family), ** (order), or *** (class).

| Family | Genus | Week 0 | Week 1 | Week 3 | Week 4 | Week 5 | Week 7 | Frequency |
|---|---|---|---|---|---|---|---|---|
| Corynebacteriaceae | Corynebacteriaceae* | 6.00 | 1.00 | 1.00 | 1.00 | 0.50 | 22.00 | 100 |
| Brevibacteriaceae | Brevibacterium | 9.00 | 6.00 | 7.00 | 5.00 | 4.00 | 3.00 | 100 |
| Dermabacteraceae | Brachybacterium | 8.00 | 6.00 | 5.00 | 7.00 | 2.00 | 2.00 | 100 |
| Micrococcaceae | Yaniella | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.20 | 100 |
| Micrococcaceae | Micrococcaceae* | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 16.7 |
| Nocardiopsaceae | | 3.00 | 1.00 | 1.00 | 21.00 | 0.10 | 0.00 | 83.3 |
| Bacteroidaceae | Bacteroides | 0.00 | 0.00 | 3.00 | 2.00 | 2.00 | 0.00 | 50 |
| Prevotellaceae | Alloprevotella | 2.00 | 0.00 | 0.00 | 0.00 | 0.10 | 0.00 | 33.3 |
| Flavobacteriaceae | Chryseobacterium | 7.00 | 0.00 | 0.00 | 0.10 | 0.10 | 0.00 | 50 |
| Bacillaceae | Lentibacillus | 2.00 | 0.40 | 0.10 | 0.00 | 0.00 | 0.00 | 50 |
| Bacillaceae | Paucisalibacillus | 1.00 | 2.00 | 2.00 | 6.00 | 0.30 | 0.30 | 100 |
| Staphylococcaceae | Jeotgalicoccus | 1.00 | 0.40 | 0.30 | 1.00 | 0.40 | 1.00 | 100 |
| Staphylococcaceae | Salinicoccus | 1.00 | 2.00 | 2.00 | 4.00 | 1.00 | 2.40 | 100 |
| Staphylococcaceae | Staphylococcus | 3.00 | 7.10 | 11.00 | 16.00 | 6.00 | 9.00 | 100 |
| Bacillaceae | Bacillaceae* | 6.50 | 6.30 | 4.20 | 10.40 | 1.20 | 2.00 | 100 |
| Planococcaceae | Planococcaceae* | 4.00 | 0.00 | 0.00 | 0.00 | 0.10 | 0.00 | 33.3 |
| Aerococcaceae | Facklamia | 2.00 | 0.00 | 0.10 | 0.20 | 0.00 | 0.10 | 66.7 |
| Lactobacillaceae | Lactobacillus | 5.10 | 14.10 | 34.10 | 7.00 | 18.20 | 8.50 | 100 |
| Leuconostocaceae | Weissella | 0.00 | 1.00 | 0.40 | 3.00 | 1.00 | 0.10 | 83.3 |
| Streptococcaceae | Streptococcus | 8.00 | 1.10 | 0.10 | 0.40 | 0.40 | 2.20 | 100 |
| Lactobacillales | Lactobacillales** | 1.00 | 23.00 | 0.10 | 0.00 | 11.00 | 6.30 | 83.3 |
| Lachnospiraceae | Lachnoclostridium | 0.00 | 3.00 | 2.00 | 2.00 | 0.30 | 0.00 | 66.7 |
| Ruminococcaceae | Anaerotruncus | 0.00 | 1.00 | 1.00 | 0.10 | 0.30 | 0.00 | 66.7 |
| Ruminococcaceae | Faecalibacterium | 0.00 | 0.00 | 5.00 | 2.00 | 4.00 | 1.00 | 66.7 |
| Ruminococcaceae | Subdoligranulum | 0.00 | 0.00 | 1.00 | 0.20 | 0.20 | 0.10 | 66.7 |
| Lachnospiraceae | Lachnospiraceae* | 0.00 | 2.00 | 0.20 | 0.10 | 0.00 | 0.00 | 50 |
| Peptostreptococcaceae | Peptostreptococcaceae* | 0.00 | 1.00 | 0.10 | 1.00 | 0.40 | 0.40 | 83.3 |
| | Bacilli*** | 0.50 | 1.00 | 2.40 | 1.40 | 0.50 | 0.00 | 83.3 |
| Oxalobacteraceae | Oxalobacteraceae* | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 16.7 |
| Pasteurellaceae | Gallibacterium | 0.00 | 0.00 | 1.00 | 0.00 | 44.00 | 37.00 | 50 |
| Pseudomonadaceae | Pseudomonas | 13.00 | 2.00 | 0.00 | 0.00 | 0.10 | 0.00 | 50 |
| Xanthomonadaceae | Xanthomonas | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 16.7 |
| Enterobacteriaceae | Escherichia-Shigella | 0.10 | 2.00 | 1.00 | 2.00 | 0.20 | 0.20 | 100 |
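
The Frequency column of Table S9 is consistent with the percentage of the six sampled weeks in which a taxon has a non-zero relative abundance, and the averages reported in Table S10 are consistent with a mean taken over only those weeks. The Python sketch below checks both on three rows of Table S9. The function names are illustrative, and averaging over detected weeks only is an inference from the tabulated values rather than a stated method.

```python
# Weekly relative abundances (Weeks 0, 1, 3, 4, 5, 7) from Table S9.
ROWS = {
    "Pseudomonas": [13.00, 2.00, 0.00, 0.00, 0.10, 0.00],
    "Chryseobacterium": [7.00, 0.00, 0.00, 0.10, 0.10, 0.00],
    "Nocardiopsaceae": [3.00, 1.00, 1.00, 21.00, 0.10, 0.00],
}


def detection_frequency(values):
    """Percent of sampled weeks with a non-zero abundance."""
    return 100.0 * sum(1 for v in values if v > 0) / len(values)


def mean_when_detected(values):
    """Mean abundance over the weeks in which the taxon was detected."""
    detected = [v for v in values if v > 0]
    return sum(detected) / len(detected) if detected else 0.0


for taxon, values in ROWS.items():
    print(f"{taxon}: frequency={detection_frequency(values):.1f}, "
          f"average={mean_when_detected(values):.2f}%")
```

For Pseudomonas this gives a frequency of 50 and an average of 5.03%, matching Tables S9 and S10; a mean over all six weeks would instead give about 2.52%.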


Table S10. Average relative abundance of detected bacteria.

| Phylum | Class | Order | Family | Genus |
|---|---|---|---|---|
| Actinobacteria 24.00% | Actinobacteria 24.00% | Corynebacteriales 5.25% | Corynebacteriaceae 5.25% | 5.25% |
| | | Micrococcales 13.53% | Brevibacteriaceae 5.67% | Brevibacterium 5.67% |
| | | | Dermabacteraceae 5.00% | Brachybacterium 5.00% |
| | | | Micrococcaceae 2.87% | Yaniella 0.87% |
| | | | | 2.00% |
| | | Streptosporangiales 5.22% | Nocardiopsaceae 5.22% | Nocardiopsis 5.22% |
| Bacteroidetes 5.78% | Bacteroidia 3.38% | Bacteroidales 3.38% | Bacteroidaceae 2.33% | Bacteroides 2.33% |
| | | | Prevotellaceae 1.05% | Alloprevotella 1.05% |
| | Flavobacteria 2.40% | Flavobacteriales 2.40% | Flavobacteriaceae 2.40% | Chryseobacterium 2.40% |
| Firmicutes 56.17% | Bacilli 49.02% | 21.35% | Bacillaceae 7.87% | Lentibacillus 0.83% |
| | | | | Paucisalibacillus 1.93% |
| | | | | 5.10% |
| | | | Staphylococcaceae 11.43% | Jeotgalicoccus 0.68% |
| | | | | Salinicoccus 2.07% |
| | | | | Staphylococcus 8.68% |
| | | | Planococcaceae 2.05% | 2.05% |
| | | Lactobacillales 27.67% | Aerococcaceae 0.60% | Facklamia 0.60% |
| | | | Lactobacillaceae 14.50% | Lactobacillus 14.50% |
| | | | Leuconostocaceae 1.10% | Weissella 1.10% |
| | | | Streptococcaceae 2.03% | Streptococcus 2.03% |
| | | | Lactobacillales 8.28% | 8.28% |
| | | | | 1.16% |
| | Clostridia 7.15% | Clostridiales 7.15% | Lachnospiraceae 2.59% | Lachnoclostridium 1.83% |
| | | | | 0.77% |
| | | | Ruminococcaceae 3.98% | Anaerotruncus 0.60% |
| | | | | Faecalibacterium 3.00% |
| | | | | Subdoligranulum 0.38% |
| | | | Peptostreptococcaceae 0.58% | 0.58% |
| Proteobacteria 39.28% | Betaproteobacteria 3.00% | 3.00% | Oxalobacteraceae 3.00% | 3.00% |
| | Gammaproteobacteria 36.28% | Pasteurellales 27.33% | Pasteurellaceae 27.33% | Gallibacterium 27.33% |
| | | Pseudomonadales 5.03% | Pseudomonadaceae 5.03% | Pseudomonas 5.03% |
| | | Xanthomonadales 3.00% | 3.00% | Xanthomonas 3.00% |
| | | Enterobacterales 0.92% | Enterobacteriaceae 0.92% | Escherichia-Shigella 0.92% |

Table S11. Normalized abundance of detected fungi at the species level.

Species Week 0 Week 1 Week 2 Week 3 Week 4 Week 5 Week 6 Week 7

Candida glabrata 4.53 3.79 11.63
Agaricus bisporus 8.73 2.99
Amorphotheca resinae 3.17 5.13
Aspergillus fumigatus 183.48 170.82
Aspergillus nidulans 169.51
Aspergillus oryzae 1.18 25.15 1.73 9.14 2.96 1.83 1.03 4.35
Bipolaris cookei 43.02
Botrytis cinerea 4.84 2.87 2.94 2.43 2.93 2.70
Candida dubliniensis 2.48 9.31 1.78 2.36
Chrysoporthe austroafricana 177.08 27.48
Chrysoporthe deuterocubensis 42.15
Clonostachys rosea 137.64
Colletotrichum graminicola 925.76
Cryptococcus gattii 2.94 2.45
Cryptococcus neoformans 3.74 3.75
Debaryomyces hansenii 3.69 7.92 8.80 4.07
Dekkera bruxellensis 73.67 68.59
Diaporthe longicolla 18760.22 125.00 1177.48 93.52 109.31
Didymella pinodes 178.10
Epidermophyton floccosum 216.10 169.64
Eremothecium gossypii 5.51 3.24
Eremothecium sinecaudum 5.48
Exophiala dermatitidis 191.68
Fusarium circinatum 587.48 703.22 174.09
Fusarium graminearum 58.87 438.45
Fusarium mangiferae 171.20 352.62
Gibberella moniliformis 92.73 97.55
Kazachstania naganishii 5.28 3.87
Kluyveromyces lactis 3.41
Kluyveromyces marxianus 3.71 3.06
Kuraishia capsulata 4.41 4.99 4.42
Laccaria bicolor 4.83 9.98 3.31 4.87 2.48 7.43 4.03 9.27
Lachancea thermotolerans 3.28 3.40
Meyerozyma guilliermondii 160.86
Moniliophthora perniciosa 48.06
Mycosphaerella graminicola 226.75
Nectria cinnabarina 80.58
Neurospora crassa 260.59
Penicillium chrysogenum 89.75 42.56 6.87 45.95 288.28 147.93 15.96 6.87
Pestalotiopsis fici 286.75
Pithomyces chartarum 144.63
Ricasolia amplissima 60.54
Saccharomyces cerevisiae 6.81
Scedosporium apiospermum 14.31
Schizosaccharomyces pombe 2.20
Sordaria macrospora 4.85 6.41 10.35 2.89 6.32 22.04
Stemphylium lycopersici 74.19
Sugiyamaella lignohabitans 4.37 7.41 1.80
Talaromyces marneffei 147.97 761.92
Tetrapisispora blattae 6.77 1.94
Tetrapisispora phaffii 68.48 45.12 21.25 20.25 23.67 26.32 58.36
Thermothelomyces thermophila 1.30 1.46 1.48 1.06 0.46 1.63 1.29 1.94
Thielavia terrestris 1.98 1.37 1.92 1.49 1.23 2.20 1.00 0.90
Torulaspora delbrueckii 4.00
Tremella fuciformis 142.18
Trichoderma asperellum 166.60
Trichophyton rubrum 247.54
Usnea ceratina 85.94 89.13
Wickerhamomyces ciferrii 539341.10 286941.40 45174.40 174945.06 33799.87 218599.53 50508.57 51128.68
Yarrowia lipolytica 2.16 1.37 1.44 2.17 1.61 1.49
Zymoseptoria tritici 1.42 1.43

Table S12. Normalized percent relative abundance of detected fungi at the species level.

Species Week 0 Week 1 Week 2 Week 3 Week 4 Week 5 Week 6 Week 7 Frequency

Candida glabrata 0.0008 0.0109 0.0053 37.5 Agaricus bisporus 0.0252 0.0014 25 Amorphotheca 0.0014 0.0101 25 resinae Aspergillus 25 fumigatus 0.0596 0.0953

Aspergillus nidulans 0.0551 12.5

Aspergillus oryzae 0.0002 0.0082 0.0038 0.0051 0.0086 0.0008 0.0020 0.0126 87.5 Bipolaris cookei 0.0196 12.5

Botrytis cinerea 0.0009 0.0009 0.0064 0.0014 0.0085 0.0012 75

Candida dubliniensis 0.0008 0.0269 0.0035 0.0023 50 Chrysoporthe 25 austroafricana 0.0575 0.0153 Chrysoporthe 12.5 deuterocubensis 0.0200

Clonostachys rosea 0.0447 12.5 Colletotrichum 12.5 graminicola 0.5165

Cryptococcus gattii 0.0013 0.0049 25 Cryptococcus 25 neoformans 0.0012 0.0021 Debaryomyces 50 hansenii 0.0007 0.0026 0.0049 0.0118 Dekkera bruxellensis 0.0239 0.0383 25

Diaporthe longicolla 6.0952 0.2727 0.6569 0.2703 0.0499 62.5

Didymella pinodes 0.0329 12.5 Epidermophyton 25 floccosum 0.4715 0.0946 Eremothecium 0.0015 25 gossypii 0.0159 Eremothecium 12.5 sinecaudum 0.0158 Exophiala 12.5 dermatitidis 0.0354

Fusarium circinatum 0.1909 0.3923 0.0794 37.5 Fusarium 25 graminearum 0.0191 0.2446

Fusarium mangiferae 0.0955 0.3405 25 Gibberella 25 moniliformis 0.0171 0.0544 Kazachstania 25 naganishii 0.0029 0.0112

Kluyveromyces lactis 0.0016 12.5 Kluyveromyces 25 marxianus 0.0012 0.0017

Kuraishia capsulata 0.0008 0.0016 0.0128 37.5

Laccaria bicolor 0.0009 0.0032 0.0072 0.0027 0.0072 0.0034 0.0080 0.0179 100


Lachancea 0.0015 25 thermotolerans 0.0095 Meyerozyma 12.5 guilliermondii 0.4650 Moniliophthora 12.5 perniciosa 0.0268 Mycosphaerella 12.5 graminicola 0.0419

Nectria cinnabarina 0.0262 12.5

Neurospora crassa 0.0847 12.5 Penicillium 0.0675 0.0315 0.0265 100 chrysogenum 0.0166 0.0138 0.0150 0.0256 0.8333

Pestalotiopsis fici 0.0530 12.5 Pithomyces 12.5 chartarum 0.0267

Ricasolia amplissima 0.0112 12.5 Saccharomyces 12.5 cerevisiae 0.0022 Scedosporium 0.0065 12.5 apiospermum Schizosaccharomyces 0.0021 12.5 pombe

Sordaria macrospora 0.0009 0.0021 0.0226 0.0084 0.0125 0.0426 75 Stemphylium 12.5 lycopersici 0.0241 Sugiyamaella 37.5 lignohabitans 0.0014 0.0041 0.0052 Talaromyces 0.7358 25 marneffei 0.0825 Tetrapisispora 25 blattae 0.0013 0.0011

Tetrapisispora phaffii 0.0222 0.0984 0.0119 0.0585 0.0108 0.0520 0.0564 87.5 Thermothelomyces 0.0007 0.0026 0.0056 75 thermophila 0.0002 0.0005 0.0032 0.0006 0.0013

Thielavia terrestris 0.0004 0.0004 0.0042 0.0008 0.0035 0.0010 0.0020 0.0026 75 Torulaspora 0.0079 12.5 delbrueckii

Tremella fuciformis 0.0263 12.5 Trichoderma 12.5 asperellum 0.4816

Trichophyton rubrum 0.5400 12.5

Usnea ceratina 0.0279 0.0407 12.5 Wickerhamomyces 99.7037 99.8600 98.7550 100 ciferrii 99.7309 93.2272 98.5550 97.5976 97.6982

Yarrowia lipolytica 0.0004 0.0004 0.0008 0.0063 0.0007 0.0029 50

Zymoseptoria tritici 0.0003 0.0041 12.5


Table S13. Average normalized relative abundance of detected fungi.

Average Phylum Class Order Family Genus Species Abundance

Ascomycota 103.8% 0.15% Capnodiales 0.05% Mycosphaerellaceae 0.05% Zymoseptoria 0.05% Bipolaris cookei 0.02%

Didymella pinodes 0.03% Mycosphaerella

Pleosporales 0.09% Astrosphaeriellaceae 0.04% Pithomyces 0.04% graminicola 0.04% Pithomyces

Didymellaceae 0.03% Didymella 0.03% chartarum 0.03% Stemphylium

Pleosporaceae 0.03% Bipolaris 0.02% lycopersici 0.02%

Stemphylium 0.00% Zymoseptoria tritici 0.002% Aspergillus Eurotiomycetes 0.99% Onygenales 0.08% Arthrodermataceae 0.08% Trichophyton 0.08% fumigatus 0.08%

Chaetothyriales 0.06% Herpotrichiellaceae 0.06% Exophiala 0.06% Aspergillus nidulans 0.06%

Eurotiales 0.86% Aspergillaceae 0.45% Aspergillus 0.32% Aspergillus oryzae 0.01% Epidermophyton

Aspergillaceae floccosum 0.28% Exophiala

dermatitidis 0.04% Penicillium

Penicillium 0.13% chrysogenum 0.13% Talaromyces

Trichocomaceae 0.41% Talaromyces 0.41% marneffei 0.41%

Onygenales 0.54% Arthrodermataceae 0.54% Epidermophyton 0.54% Trichophyton rubrum 0.54% Lecanoromycetes 0.55% Lecanorales 0.01% 0.01% Usnea 0.01% Ricasolia amplissima 0.01%

Peltigerales 0.03% Lobariaceae 0.03% Ricasolia 0.03% Usnea ceratina 0.03% Amorphotheca Leotiomycetes 0.01% Helotiales 0.01% Sclerotiniaceae 0.01% Botrytis 0.01% resinae 0.01% Leotiomycetes incertae

sedis 0.00% Myxotrichaceae 0.00% Amorphotheca 0.00% Botrytis cinerea 0.003% Saccharomycetes 98.76% Saccharomycetales 98.76% Debaryomycetaceae 0.02% Candida 0.01% Candida dubliniensis 0.01%

Debaryomyces 0.01% Candida glabrata 0.01% Debaryomyces

Meyerozyma 0.00% hansenii 0.01%


Dipodascaceae 0.03% Yarrowia 0.03% Dekkera bruxellensis 0.03% Eremothecium

Phaffomycetaceae 0.01% Wickerhamomyces 0.01% gossypii 0.01% Eremothecium

Pichiaceae 0.02% Brettanomyces 0.02% sinecaudum 0.02% Kazachstania

Saccharomycetaceae 0.54% Eremothecium 0.01% Naganishii 0.01% Kluyveromyces

marxianus 0.001%

Kazachstania 0.01% Kluyveromyces lactis 0.002%

Kuraishia capsulata 0.01% Lachancea

Kluyveromyces 0.01% thermotolerans 0.01% Meyerozyma

Lachancea 0.46% guilliermondii 0.47% Saccharomyces

Nakaseomyces 0.00% cerevisiae 0.002% Sugiyamaella

Saccharomyces 0.00% lignohabitans 0.004% Tetrapisispora

Tetrapisispora 0.05% blattae 0.001% Tetrapisispora

phaffii 0.04% Torulaspora

Torulaspora 0.01% delbrueckii 0.01% Wickerhamomyces

Sacc. Incertae sedis 98.14% Kuraishia 98.14% ciferrii 98.14%

Trichomonascaceae 0.00% Sugiyamaella 0.00% Yarrowia lipolytica 0.002% Schizosaccharomyces 0.00% 0.00% 0.00% Schizosaccharomyces 0.00% pombe 0.002% Chrysoporthe 3.37% Diaporthales 0.10% Cryphonectriaceae 0.06% Chrysoporthe 0.06% austroafrica 0.04% Chrysoporthe

deuterocubensis 0.02%

Diaporthaceae 0.04% Diaporthe 0.04% Clonostachys rosea 0.05% Colletotrichum

Glomerellales 0.52% Glomerellaceae 0.52% Colletotrichum 0.52% graminicola 0.52%

Hypocreales 2.19% Bionectriaceae 1.47% Clonostachys 1.47% Diaporthe longicolla 1.47%

Hypocreaceae 0.22% Trichoderma 0.22% Fusarium circinatum 0.22% Fusarium

Nectriaceae 0.50% Fusarium 0.41% graminearum 0.13%

Fusarium mangiferae 0.22%


Gibberella

moniliformis 0.04%

Nectria cinnabarina 0.03%

Nectria 0.08% Neurospora crassa 0.09%

Microascales 0.05% Microascaceae 0.05% Scedosporium 0.05% Pestalotiopsis fici 0.05% Scedosporium

Sordariales 0.03% Chaetomiaceae 0.02% Thermothelomyces 0.01% apiospermum 0.01%

Thielavia 0.01% Sordaria macrospora 0.02% Thermothelomyces

Sordariaceae 0.00% Neurospora 0.00% thermophila 0.002%

Sordaria 0.00% Thielavia terrestris 0.002% Trichoderma

Xylariales 0.48% Sporocadaceae 0.48% Pestalotiopsis 0.48% asperellum 0.48% Cryptococcus Basidiomycota 0.08% Tremellomycetes 0.03% Agaricales 0.03% Agaricaceae 0.00% Agaricus 0.00% neoformans 0.002%

Marasmiaceae 0.00% Moniliophthora 0.00% Cryptococcus gattii 0.003%

Tricholomataceae 0.03% Laccaria 0.03% Tremella fuciformis 0.03% Agaricomycetes 0.05% Tremellales 0.05% Cryptococcaceae 0.02% Cryptococcus 0.02% Agaricus bisporus 0.01%

Laccaria bicolor 0.02% Moniliophthora

Tremellaceae 0.03% Tremella 0.03% perniciosa 0.03%
