BIOMESEQ: A QUANTITATIVE APPROACH FOR THE ANALYSIS OF
ANIMAL MICROBIOMES AND ITS APPLICATION IN CHARACTERIZING
THE MICROBIAL ECOLOGY OF AVIAN SPECIES
by
Kelly Ann Mulholland
A dissertation submitted to the Faculty of the University of Delaware in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Bioinformatics and Systems Biology
Spring 2020
© 2020 Kelly A. Mulholland All Rights Reserved
Approved: Cathy H. Wu, Ph.D., Chair of Bioinformatics and Computational Biology
Approved: Mark Rieger, Ph.D., Dean of the College of Agriculture and Natural Resources
Approved: Douglas J. Doren, Ph.D., Interim Vice Provost for Graduate & Professional Education and Dean of the Graduate College
I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.
Signed: Calvin L. Keeler, Jr., Ph.D., Professor in charge of dissertation
Signed: Carl Schmidt, Ph.D., Member of dissertation committee
Signed: Shawn Polson, Ph.D., Member of dissertation committee
Signed: Timothy Johnson, Ph.D., Member of dissertation committee
DEDICATION
To my father for the lifetime of love and support you gave me in the short time we
shared together.
To Tyler for your overflowing love, your uplifting spirit and your unwavering patience
after all this time.
ACKNOWLEDGMENTS
I would like to extend my gratitude to several people for their contribution to this work. First, I give my sincere thanks to my advisor and mentor, Dr. Calvin Keeler, for his immeasurable support, advice and guidance throughout this process. His enthusiasm and dedication to his work have been very inspiring to me. I would like to thank my committee members, Dr. Shawn Polson, Dr. Carl Schmidt and Dr. Timothy
Johnson for their time and invaluable insight. Thank you to the members of the Keeler
Research Group, both past and present. I would especially like to thank Monique for her many contributions to this work and Sharon for sharing her knowledge and insight over the years. I also wish to thank the many friends that I have made during my time in the Bioinformatics Student Association and the EmPOWER mentoring program that have made my time at the University of Delaware so wonderful. I would like to acknowledge the University of Delaware CANR Unique Strengths Dissertation Award and the Agriculture and Food Research Initiative Competitive Grant for the financial support to make this work possible.
I am incredibly grateful to all of my family and friends for always believing in me and for their encouragement over these past few years. This dissertation would not have been possible without them. I wish to express my most sincere gratitude to my
partner, Tyler, for being my greatest supporter throughout this entire journey and for all he has done to ensure that I accomplished this goal. It was his love and belief in me that gave me the strength to continue during some of the more difficult times. Thank you to Bob for his guidance and endless support in all of my endeavors. I will always cherish our conversations and laughs over coffee and breakfast. I wish to thank Kathy,
David and Kari for their generosity, advice and love over the years. Thank you to Amy for her unparalleled friendship and for making these past few years so enjoyable with our many adventures. I would also like to thank Gibbs for being such a great companion and source of happiness. I am grateful to my father for instilling his work ethic in me and for always encouraging me to achieve my goals no matter how big or small. Although we are unable to celebrate this milestone together, I know that he is immensely proud. Finally, I would like to thank my mother and sisters for all of their support.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
Chapter
1 INTRODUCTION AND REVIEW OF LITERATURE
    1.1 Microbiomes
        1.1.1 Symbiotic Microbial Interactions with Healthy Host and Other Microbes
        1.1.2 Dysbiotic Microbial Interactions with Diseased Host and Other Microbes
    1.2 Respiratory Microbiome
        1.2.1 Healthy Mammalian Respiratory Microbiome
        1.2.2 Mammalian Respiratory Microbiome Diseases
        1.2.3 Avian Respiratory Microbiome
        1.2.4 Multifactorial Avian Respiratory Disease Complex
    1.3 Advancement of Technology for Detection of Microorganisms
        1.3.1 Next-Generation Sequencing Technology
        1.3.2 16S Ribosomal RNA Sequencing
        1.3.3 Metagenomic Shotgun Sequencing
    1.4 Characterization of the Virome
        1.4.1 Characterizing the Virome Using a Culture-Independent Approach
        1.4.2 Challenges with Developing Comprehensive Computational Tools for Analysis of the Virome
            1.4.2.1 Quantification of the Virome
        1.4.3 Existing Culture-Independent Tools for Virome Characterization and their Limitations
    1.5 Rationale and Objectives
    REFERENCES
2 BIOMESEQ: A TOOL FOR THE CHARACTERIZATION OF ANIMAL MICROBIOMES FROM METAGENOMIC DATA
    2.1 Summary
    2.2 Introduction
    2.3 Results
        2.3.1 Design and Development of BiomeSeq
        2.3.2 Validation of BiomeSeq
        2.3.3 A Longitudinal Study of the Microbial Ecology of a Healthy Broiler Flock
        2.3.4 A Comparison of BiomeSeq bacterial results to 16S rRNA Results
    2.4 Discussion
    2.5 Materials and Methods
        2.5.1 Quality Trimming and Host Decontamination
        2.5.2 Microbial Database Alignment
        2.5.3 Quantification and Output
        2.5.4 Performance Metrics
        2.5.5 A Longitudinal Study of the Microbial Ecology of a Healthy Broiler Flock
        2.5.6 Comparison of BiomeSeq Bacterial Results to 16S rRNA Results
    REFERENCES
3 METAGENOMIC ANALYSIS OF THE RESPIRATORY MICROBIOME OF A HEALTHY BROILER FLOCK FROM HATCHING TO PROCESSING
    3.1 Summary
    3.2 Introduction
    3.3 Results
        3.3.1 Avian Respiratory Eukaryotic Viral Diversity
        3.3.2 Bacterial Diversity
        3.3.3 Bacteriophage Diversity
        3.3.4 Fungal Diversity
        3.3.5 The Avian Microbiome
    3.4 Discussion
    3.5 Materials and Methods
        3.5.1 Sample Collection
        3.5.2 Nucleic Acid Extraction and Sequencing
        3.5.3 16S rRNA Amplicon Sequencing and Analysis
        3.5.4 Eukaryotic Virus, Bacteriophage and Fungal Analysis
    REFERENCES
4 CHARACTERIZATION OF THE RESPIRATORY MICROBIOME OF CHICKENS WITH RESPIRATORY DISEASE
    4.1 Summary
    4.2 Introduction
    4.3 Materials and Methods
        4.3.1 Sample Collection
        4.3.2 Nucleic Acid Extraction and Sequencing
        4.3.3 Eukaryotic Virus, Bacteriophage, Yeast and Fungal Analysis
    4.4 Results
        4.4.1 Identifying the broiler respiratory microbiome and a comparison of the respiratory virome between healthy and diseased birds
        4.4.2 Comparison of the bacterial microbiome between healthy and diseased birds
        4.4.3 Comparison of the bacteriophage and fungal microbiomes between healthy and diseased birds
        4.4.4 Microbial network analysis
    4.5 Discussion
    REFERENCES
5 A COMPARISON OF TRACHEA, CHOANAL CLEFT AND CLOACAL MICROBIOTA OF A HEALTHY TURKEY FLOCK
    5.1 Introduction
    5.2 Materials and Methods
        5.2.1 Sample Collection
        5.2.2 Nucleic Acid Extraction and Sequencing
        5.2.3 Eukaryotic Virus, Bacteria, Bacteriophage and Fungal Analysis
    5.3 Results
        5.3.1 Quality Trimming and Decontamination of Sequencing Reads
        5.3.2 Diversity of Eukaryotic Viruses in the Choanal Cleft, Cloaca and Trachea of a Healthy Turkey Flock
        5.3.3 Bacteriophage Diversity in the Choanal Cleft, Cloaca and Trachea of a Healthy Turkey Flock
        5.3.4 Fungal Diversity in the Choanal Cleft, Cloaca and Trachea of a Healthy Turkey Flock
        5.3.5 Bacterial Diversity in the Choanal Cleft, Cloaca and Trachea of a Healthy Turkey Flock
        5.3.6 Microbial network of choanal cleft, cloaca and trachea of a healthy turkey flock
    5.4 Discussion
    REFERENCES
6 CONCLUSIONS AND FUTURE DIRECTIONS
    REFERENCES
Appendix
A BIOMESEQ: A TOOL FOR THE CHARACTERIZATION OF ANIMAL MICROBIOMES FROM METAGENOMIC DATA
B METAGENOMIC ANALYSIS OF THE RESPIRATORY MICROBIOME OF A HEALTHY BROILER FLOCK FROM HATCHING TO PROCESSING
LIST OF TABLES
Table 2.1 Software tools and parameters used by BiomeSeq
Table 2.2 Example table generated by BiomeSeq of the viral component of a commercial poultry flock
Table 3.1 Avian specific viral genome database structure
Table 3.2 Shannon diversity of respiratory microbes in a healthy broiler flock
Table 4.1 Avian specific viral genome database structure
Table 4.2 Sequencing data generated by DNA Seq and RNA Seq and number of reads trimmed, aligned to host and aligned to microbial databases
Table 4.3 Eukaryotic viruses detected in healthy and diseased broiler flocks
Table 4.4 Bacteria detected in healthy and diseased poultry broiler flocks
Table 4.5 Bacteriophage detected in healthy and diseased poultry broiler flocks
Table 4.6 Fungi detected in healthy and diseased poultry broiler flocks
Table 5.1 Avian specific viral genome database structure
Table 5.2 Quality Trimming and Host DNA Decontamination of reads generated by DNA Seq and RNA Seq from samples collected from the choanal cleft, cloaca and trachea of turkeys
Table 5.3 Shannon diversity of virus, bacteria, bacteriophage and fungi in choanal cleft, trachea and cloaca of turkey
Table 5.4 Eukaryotic viral species abundance in the choanal cleft, trachea and cloaca of turkey
Table 5.5 Bacteria abundance in the choanal cleft, trachea and cloaca of turkey
Table 5.6 Bacteriophage species with top 10 highest abundances in the choanal cleft, trachea and cloaca of turkey
Table 5.7 Fungal species with top 10 highest abundances in the choanal cleft, trachea and cloaca of turkey
LIST OF FIGURES
Figure 2.1 BiomeSeq Workflow
Figure 2.2 Percent relative abundance of microorganisms detected by BiomeSeq and known values from simulated datasets
Figure 2.3 Average rate of speed at different steps in BiomeSeq processing including A) quality, B) decontamination, C) Microbial database alignment and D) quantification for four simulated datasets
Figure 2.4 Root Mean Square Error between known abundances and abundances determined by BiomeSeq
Figure 2.5 Heatmap of percent normalized relative abundance of viruses detected in a commercial poultry flock from hatching to processing
Figure 2.6 Phylogenetic tree of bacterial species detected in a poultry flock
Figure 2.7 Venn Diagram of the detected bacteriophage species in a commercial poultry flock at Week 0, Week 1 and Week 7
Figure 2.8 Fungal network of species detected in a commercial poultry flock
Figure 2.9 Microbial network of the top 10 most abundant eukaryotic viruses, fungi, bacteria and bacteriophage in a commercial poultry flock at time of processing
Figure 2.10 Bacteria detected in a healthy poultry broiler flock
Figure 3.1 Normalized relative abundance of detected DNA and RNA viral species at each time point
Figure 3.2 Heat map with phylogenetic tree representing the detection intensity of viral families at each individual week
Figure 3.3 Heat map with phylogenetic tree representing the detection intensity of each viral family from hatching to processing
Figure 3.4 Abundance of A) virus, B) bacteria, C) bacteriophage and D) fungi at Week 0, Week 1 and Week 7
Figure 3.5 Phylogenetic tree of A) virus, B) bacteria, C) bacteriophage and D) yeast and fungi
Figure 3.6 Microbial network of the complete healthy avian respiratory microbiome
Figure 3.7 Correlation matrix comparing bacteria and bacteriophage taxa at the family level
Figure 4.1 Sample Diversity of all detected microorganisms in healthy and diseased flocks
Figure 4.2 Heat map representing the detection intensity of viral families at each individual week
Figure 4.3 Abundance of bacterial species in A) healthy and B) diseased flocks
Figure 4.4 Abundance of bacteriophage families in healthy and diseased flocks
Figure 4.5 Microbial network of the complete avian respiratory microbiome of a healthy and diseased flock including detected eukaryotic viruses, fungi, bacteria, and bacteriophage
Figure 5.1 Normalized relative abundance of eukaryotic viruses at the choanal cleft, cloaca and trachea of turkey
Figure 5.2 Normalized relative abundance of bacteria at the choanal cleft, cloaca and trachea of turkey
Figure 5.3 Normalized relative abundance of bacteriophage at the choanal cleft, cloaca and trachea of turkey
Figure 5.4 Normalized relative abundance of fungi at the choanal cleft, cloaca and trachea of turkey
Figure 5.5 Venn Diagram of the eukaryotic viruses detected in the choanal cleft, cloaca and trachea of turkeys
Figure 5.6 Venn Diagram of the bacteria detected in the choanal cleft, cloaca and trachea of turkeys
Figure 5.7 Venn Diagram of the bacteriophage detected in the choanal cleft, cloaca and trachea of turkeys
Figure 5.8 Venn Diagram of the fungi detected in the choanal cleft, cloaca and trachea of turkeys
Figure 5.9 Microbial network of eukaryotic viruses, fungi, bacteria and bacteriophage present in the cloaca, trachea and choanal cleft of healthy turkeys
ABSTRACT
Microbiomes are complex communities of microorganisms (including bacteria, eukaryotic viruses, fungi and bacteriophage) that inhabit a particular environment in animals and contribute to essential biological functions. The microorganisms within these environments interact with the host and each other in either symbiosis or dysbiosis, depending on the condition of the host as well as external factors.
Disruptions within a microbiome may result in metabolic disturbances or disease in the host. The advancement of next-generation sequencing methodologies has given rise to an increase in studies examining the microbial communities that exist in a variety of animals. Readily accessible and cost-effective sequencing methodologies, together with a number of user-friendly bioinformatics analysis tools and databases for 16S rRNA sequencing data, provide the standard culture-independent approach for bacterial microbiome analysis. However, this approach cannot be extended to the characterization of eukaryotic viruses, bacteriophage and fungi. Therefore, elucidating the complete microbiome requires a new approach.
Herein, we present BiomeSeq, a computational tool developed for the characterization of complete animal microbiomes using metagenomic sequencing data.
BiomeSeq and its accompanying databases address the constraints of current computational tools by providing a comprehensive workflow that accurately identifies and quantifies each major component of the microbiome. BiomeSeq provides taxonomic information for each detected microorganism as well as normalized abundance, relative abundance, genome coverage and sample diversity values. The performance of this tool was evaluated successfully using both simulated and clinical samples. BiomeSeq is available as a software package and as an open-source, user-friendly container, allowing users to download, install and use the program with a few simple commands. The versatility of BiomeSeq, including customizable parameters and support for custom databases, allows this tool to facilitate a variety of unique investigations.
BiomeSeq was used to detect and quantify the microbial abundance and diversity of several avian microbiomes under various conditions. In one study, the respiratory tract of a healthy poultry broiler flock was examined at weekly intervals throughout its grow-out cycle from hatching to processing. As expected, the complexity and diversity of the viral community increased as the flock aged, while the timing and presence of several viral elements were consistent with the management practices of commercial broiler flocks. Additionally, correlations between bacteria and bacteriophage families were investigated and several strongly positive correlations were identified. In a second study, the microbial ecology of the respiratory tract of a broiler flock clinically diagnosed with respiratory disease complex was compared with that of a healthy broiler flock. Changes in the composition and diversity of the viral, bacterial, and bacteriophage microbiomes were observed that were consistent with the complex etiology of this disease. In a final study, the ability of BiomeSeq to characterize a variety of microbiomes in different host species was demonstrated. The tool successfully identified the microbial communities inhabiting three distinct microbial niches in a healthy turkey flock: the trachea, choanal cleft and cloaca.
Chapter 1
INTRODUCTION AND REVIEW OF LITERATURE
1.1 Microbiomes
The term microbiome was coined by Nobel laureate Joshua Lederberg in 2001 to describe the commensal, symbiotic and pathogenic microorganisms that exist within the human body [1]. Microbiomes consist of a variety of microorganisms including bacteria, fungi, eukaryotic viruses, bacteriophage and archaea. These communities can be found throughout the body, in the oral cavity, intestinal tract, respiratory tract, vaginal tract and skin of both animals and humans [2, 3]. The composition of different microbiomes varies because their microbial communities participate in distinct biological functions. For example, the human gut microbiome is involved in a variety of functions including the metabolism of glycans, amino acids and xenobiotics, immune system development, methanogenesis, and the 2-methyl-D-erythritol 4-phosphate pathway-mediated biosynthesis of vitamins essential for human health, such as B6 and B12 [4, 5]. The size of each microbial component varies as well.
In total, the bacteria within human microbiomes were originally thought to outnumber somatic and germ cells by about 10:1 [6]; however, recent studies provide evidence that this ratio is closer to 1:1, with a total mass of about 0.2 kg [7]. Additional evidence suggests that the number of viruses may be 10-fold higher than the bacterial component [8]. These microorganisms often interact in symbiosis, in which the host organism and the microbiota interact to maintain homeostasis of the host environment [6]. However, a change within the environment can lead to dysbiosis, often resulting in infection and disease of the host. Therefore, understanding the complex etiology of a disease requires a characterization of the microbiota from both healthy and diseased organisms.
In 2008, the Human Microbiome Project emerged as an effort to characterize the human microbiome from multiple body sites, identify changes in composition between healthy and diseased individuals and provide a standard resource for microbial data [9]. Since then, this project has isolated and sequenced over 2,200 reference bacterial strains from the human body and sampled over 300 healthy adults at eighteen specific body sites, including gut, oral cavity, airway, skin and vagina, and it continues to provide raw sequencing data for metagenomic strains on the HMP Data Browser database [2, 3].
Although there are at least seven major microbiomes on the human body, microbiome research has primarily concentrated on the gut. A tally of microbiome literature in 2016 revealed that a total of 17,546 gut microbiome publications existed on PubMed [10]. The next most studied environment was the oral microbiome with 4,843 publications. About 1,477 studies concentrated on the reproductive tract, followed closely by skin microbiome studies with about 1,372 publications. The respiratory tract and ocular microbiomes were the least studied, with only 764 and 152 studies published, respectively [10]. The advancement of next-generation sequencing technology, along with its decrease in cost, promises even more research in the near future.
1.1.1 Symbiotic Microbial Interactions with Healthy Host and Other Microbes
The microorganisms residing in microbiomes interact with the host and each other to carry out specific biological functions while maintaining homeostasis of the host. These interactions are referred to as symbiotic. The balance typically occurs when the commensal microbes outnumber the pathogenic and greater diversity is observed [6]. In the bacterial microbiome these functions can include housekeeping functions necessary for microbial life, processes specific to each body site, and specialized functions for each habitat [11]. Within the human gut microbiome, commensal microbes synthesize vitamins and amino acids, metabolize bile acids, aid in the development of the immune system and prevent the overgrowth of harmful bacteria by enhancing the epithelial barrier [5]. Commensal Escherichia coli was found to inhibit the growth of pathogenic E. coli in nine animal species [12]. Additionally, Bifidobacterium has been found to produce acetate, which inhibits the colonization of pathogenic E. coli [13]. The introduction of probiotics, such as Bifidobacterium and Lactobacillus, and prebiotics to the gut microbiome has been shown to have a positive impact on human and animal health and has been an extensive area of research. In a metagenomic study on a cohort of 396 women conducted by Ravel et al., vaginal microbiome compositions included predominantly Lactobacillus iners, Lactobacillus crispatus, Lactobacillus gasseri, and Lactobacillus jensenii [14]. Lactobacillus species have been associated with healthy vaginal tracts as they produce hydrogen peroxide, which combines with a low pH to prevent colonization of pathogens [15, 16].
A study by Skarin et al. determined that Lactobacillus species inhibit the growth of several bacteria including Gardnerella vaginalis, Mobiluncus and Bacteroides by producing a low pH [Skarin et al., 1986]. Commensal bacteria on the skin include Corynebacterium diphtheriae, Corynebacterium jeikeium, Staphylococcus epidermidis, Staphylococcus aureus, Streptococcus mitis, Pseudomonas aeruginosa, and others [17, 18]. These skin-residing microbes have also been found to prevent colonization by pathogenic species [19]. For example, in a 2010 study by Lai et al., it was demonstrated that Staphylococcus epidermidis can reduce susceptibility to pathogens that lead to skin infections by activating TLR2 signaling and antimicrobial peptide expression [20].
Mutualistic symbionts within the viral microbiome, or virome, have been found to benefit the host by altering innate immunity to other pathogens, both viral and bacterial. Virgin et al. estimate that a healthy human could harbor ten or more permanent chronic systemic viral infections, which may contribute to activating the immune system [21]. These viruses include several herpesviruses [22, 23], polyomaviruses [24], anelloviruses [25], adenoviruses [26-29], papillomaviruses [30] and even endogenous retroviruses [31]. One example of a virus affecting bacterial infection is gammaherpesvirus 68, which when latent has been found to increase resistance to the bacterial pathogens Listeria monocytogenes and Yersinia pestis in mice [32]. This relationship was later confirmed by Yager and colleagues in 2009, who found that the latency period is actually transient and not lifelong [33].
Viruses can also affect infection by other viral pathogens. This can occur due to interference, the phenomenon by which a viral infection causes a cell to be temporarily resistant to infection by other viruses. This type of behavior was observed in a study by Grivel et al. in 2001, in which they found that persistent infection by human herpesvirus 6 could inhibit HIV-1 infection and progression in lymphoid tissue [34].
Alternatively, viruses can also increase susceptibility to and exacerbate infections by other viruses. This type of co-infection was observed in a study by Bonfante and colleagues, in which infection of a flock of chickens with low pathogenic avian influenza virus was found to increase susceptibility to and clinical signs of velogenic Newcastle disease virus [35]. More of these relationships are expected to exist between the healthy host and the virome; however, for now the eukaryotic virome remains severely under-characterized.
Fungi and yeast are another component that is even less characterized; however, there is some evidence that similar symbiotic relationships can occur. One example was observed in Saccharomyces boulardii, which was found to have probiotic behaviors against pathogens such as Escherichia coli, Vibrio cholerae, Salmonella [36] and Clostridium difficile in humans [37] and several animals, including turkeys [38].
1.1.2 Dysbiotic Microbial Interactions with Diseased Host and Other Microbes
Disturbances within a microbiome can result in an unfavorable imbalance of the microbiota referred to as dysbiosis. Disturbances can be caused by a number of factors including the colonization of a new infectious agent, external environmental stressors and the physiological or health status of the host. Dysbiosis can lead to an increase in pathogens and as a result, a decrease in the abundance and diversity of commensal microorganisms. Although it is common for pathogens to exist in an asymptomatic host, this imbalance in the microflora can often lead to infection and disease.
Dysbiosis in the gut microbiome has been linked to diseases such as Crohn's disease [39-41], ulcerative colitis [39, 42, 43], irritable bowel syndrome [44, 45], colorectal cancer [46, 47], celiac disease [48, 49], type 1 and type 2 diabetes [50-54], chronic kidney disease [55] and obesity [50, 56, 57]. In the skin microbiome, dysbiosis has been linked to atopic dermatitis [58, 59], psoriasis [58, 60, 61], acne [62-64] and rosacea [65, 66]. Dysbiosis in the vaginal microbiome has been associated with bacterial vaginosis, vaginal candidiasis and perinatal group B streptococcal disease [15]. The onset of bacterial vaginosis, for example, can occur due to a shift in microbiota abundance from Lactobacillus sp. to Gardnerella vaginalis [67]. Dysbiosis in the oral microbiome has been found to play a role in periodontal diseases, dental caries, and oral squamous cell carcinoma [68]. Interestingly, there is also evidence that links the oral microbiome to cardiovascular disease [69, 70] as well as esophageal [71, 72], pancreatic [73] and colorectal cancer [74].
1.2 Respiratory Microbiome
As previously discussed, the respiratory microbiome is understudied when compared to the intestinal, reproductive and oral microbiomes. One reason for the smaller number of studies on this particular environment is the difference in sampling techniques. Unlike intestinal samples, for which nucleic acids can be extracted effectively from fecal matter, sampling of the respiratory tract requires much more invasive techniques such as swabbing of the trachea. Nevertheless, it is essential to characterize the microflora within this environment as pathogen introduction is a constant threat to host immunity. Recent efforts have been made to characterize the respiratory microbiome of both healthy and diseased organisms. Although much of this information concerns the bacterial component, this knowledge can be used to prevent future infections and diseases in both humans and animals.
1.2.1 Healthy Mammalian Respiratory Microbiome
For many decades the healthy lung was thought to be a completely sterile environment with no bacteria present [75]. However, due to the advancement of culture-independent techniques, such as next-generation sequencing, microorganisms have been found to inhabit this environment. Due to migration of microorganisms from the upper respiratory tract to the lower respiratory tract, it is often difficult to distinguish which microorganisms inhabit which environment. Whether upper-respiratory inhabitants are merely being detected in the lower region or actually inhabit both environments is not always clear. According to Dickson et al., the composition of the respiratory microbiome is determined by microbial immigration, elimination and growth [76]. Microbial immigration includes the inhalation of airborne microbes and the microaspiration of upper respiratory tract contents. Elimination includes mucociliary clearance as well as innate and adaptive host immune defenses. Microbial growth can be due to several environmental factors such as nutrient availability, temperature and pH [76].
Recent studies have concluded that the most abundant bacterial genera in the healthy human respiratory tract are Prevotella, Veillonella and Streptococcus [77, 78]. Only a limited number of studies focus on the respiratory virome. The first study characterizing the respiratory virome of humans was conducted by Willner and colleagues in 2009. In this study, a high viral diversity that was representative of the external environment was observed in healthy individuals [79]. Furthermore, the same twenty viruses were detected in each healthy individual, including mammalian adenoviruses, mammalian herpesviruses and poxviruses. This suggests that healthy humans share a common viral community structure in the respiratory tract.
1.2.2 Mammalian Respiratory Microbiome Diseases
Disease can alter the microbial composition by affecting immigration and elimination of the commensal microbes inhabiting a healthy human respiratory tract.
Chronic respiratory diseases include chronic obstructive pulmonary disease (COPD), cystic fibrosis (CF) and asthma. Typically, disease onset begins with colonization by an infectious agent. For example, bacterial colonization within the nasopharyngeal niche has been found to result in overgrowth and invasion of the bacteria, eventually leading to respiratory disease [80]. A dysbiotic state in the nasopharynx can lead to acquisition of new bacterial or viral pathogens, carriage of multiple pathogenic bacteria, or a viral co-infection [81].
In patients with COPD, infections can lead to increased shortness of breath, chest tightness, and phlegm. Specific bacteria and viruses have increased abundances in patients with COPD. A study by Papi and colleagues identified specific viruses which included rhinoviruses, influenza viruses, respiratory syncytial viruses, parainfluenza viruses and coronaviruses. The most abundant bacteria they identified were Haemophilus influenzae, Streptococcus pneumoniae, Moraxella catarrhalis,
Staphylococcus aureus, Pseudomonas aeruginosa, and Enterobacter spp. [82, 83].
Furthermore, coinfection of bacteria and viruses was found to increase severity of the disease, with greater lung function impairment resulting in a longer recovery time in the hospital.
In patients with cystic fibrosis, infection by bacteria such as Pseudomonas aeruginosa, Burkholderia cepacia and Staphylococcus aureus has been found to increase morbidity and mortality [79, 84]. In one of the first studies to characterize a respiratory virome, Willner and colleagues identified several eukaryotic viruses in patients with cystic fibrosis including human herpesvirus and retrovirus. Interestingly, they observed similar bacteriophage populations in patients with the disease when compared to healthy patients and also found that the bacteriophage populations corresponded to the detected bacteria species [79]. In a study conducted by Green et al., several bacteria, including Moraxella catarrhalis,
Haemophilus spp. and Streptococcus spp., were identified in a majority of patients with asthma, suggesting their possible role in increased airway obstruction and inflammation [85]. Furthermore, in a study by Johnston and colleagues, respiratory viruses such as picornaviruses, coronaviruses, influenza, rhinovirus and respiratory syncytial virus were identified in children during symptomatic episodes of asthma
[86].
Several recent studies have identified viruses that exacerbate bacterial infections in the human respiratory tract. These interactions include coronavirus on H. influenzae [Michaels et al., 1983]; adenovirus on H. influenzae and M. catarrhalis [87, 88]; influenza virus on S. pneumoniae [89], H. influenzae [90] and S. aureus [91]; human rhinovirus on S. pneumoniae [92], H. influenzae [88] and M. catarrhalis [88]; human metapneumovirus on S. pneumoniae [93]; and respiratory syncytial virus on S. pneumoniae and H. influenzae [94, 95]. In one specific instance, influenza infection has
been found to alter the host to predispose it to adherence, invasion and induction of disease by pneumococcus; however, the mechanism for this alteration requires further examination [96]. Haemophilus influenzae, Pseudomonas aeruginosa and Streptococcus pneumoniae have been shown to stimulate secretion of mucus, which could potentially lead to increased bacterial infection in patients [97]. Haemophilus influenzae and Pseudomonas aeruginosa were found to slow, and in some cases stop, human nasal cilia function, which could result in less efficient mucociliary clearance, therefore allowing infectious agents to colonize and spread more easily in the lung
[98].
1.2.3 Avian Respiratory Microbiome
Similar to humans and other mammals, microbial interactions in the avian respiratory microbiome can be symbiotic or dysbiotic, and this depends primarily on the status of the bird and its living conditions. In a recent study by Glendinning and colleagues, culture-independent methods were utilized to identify bacteria in the buccal, nasal and lung microbiomes of healthy chickens at different time points [99].
The group identified differences in the bacterial microbiome between the different respiratory sites as well as between different age groups of birds. They identified
Staphylococci, Lactobacilli and Enterobacteriaceae, which corresponded to previous culture-dependent studies. However, they were also able to identify several additional bacterial groups in a relatively high abundance, including Faecalibacterium,
Turicibacter and Jeotgalicoccus. In another study by Shabbir et al., culture-independent methods were utilized to compare the bacterial microbiome of the lower respiratory tract between healthy flocks located on different farms [100]. They determined that the environment has an impact on the composition of the bacterial microbiome, as significant differences were observed according to the farm to which the birds belonged. Breed of bird and geographic location showed less of an impact [100].
Although microorganisms are present in the respiratory tract of healthy flocks, the introduction of a new infectious agent or changes in the environmental conditions can lead to a variety of infections and co-infections that may result in respiratory disease.
1.2.4 Multifactorial Avian Respiratory Disease Complex
Avian respiratory disease complex is an example of a multifactorial syndrome that commonly affects poultry flocks and involves a combination of bacterial, viral and fungal infectious agents in conjunction with environmental stressors. Clinical signs of avian respiratory disease include snicking, head swelling, conjunctivitis, airsacculitis, nasal and ocular excretion and rattling noises [101]. The morbidity per house can range from 10-20% and mortality per house can range from 5-10%.
According to a study conducted by the USDA in 2012 where 482 breeder chicken farms were examined for respiratory disease, 5.2% of poultry flocks in the Eastern region of the United States were affected by respiratory disease, while only 2.7% of flocks located in Central United States were affected [102]. In recent history,
pathogenic outbreaks in poultry flocks have contributed to global economic loss. For example, during the 2014-2015 outbreak of highly pathogenic avian influenza, over 50 million chickens and turkeys were lost to disease or depopulation [103]. Poultry is the leading source of protein globally, with over $46.3 billion in global wholesale prices in 2018 [104]. Therefore, prevention of similar outbreaks is important for ensuring healthy nutrition and a stable economy worldwide. Environmental factors such as increased dust and ammonia levels, crowded houses, or fluctuations in temperature and humidity can trigger respiratory disease in flocks [105-107]. Poor ventilation in the winter months can reduce air exchange and intensify these stressors. The living conditions of the flock are a contributing factor to respiratory disease onset that is often overlooked in controlled laboratory settings. Therefore, to understand the complexity of coinfections, birds should be studied in controlled settings as well as in the field.
In commercial flocks in particular, coinfection is common and multiple bacteria-bacteria [108-110], virus-virus [35, 110-112] and even bacteria-virus interactions [35, 113-115] have been reported to result in respiratory disease. In the majority of cases, infection by two or more agents contributes to an exacerbation of clinical signs and increase in mortality. In the Eastern region of the United States, the
USDA reported 2.1% of flocks were diagnosed with Mycoplasma synoviae, 2.4% were diagnosed with infectious laryngotracheitis and 0.8% were diagnosed with infectious bronchitis. In the Western region of the United States, 1.6% of flocks were diagnosed with Mycoplasma synoviae and 1.1% were diagnosed with infectious bronchitis [102]. Several studies have reported that infectious bronchitis virus
infection can increase the severity of Mycoplasma synoviae in poultry [116-118].
Other contributing pathogens of avian respiratory disease complex include
Erysipelothrix, Mycoplasma gallisepticum, Haemophilus paragallinarum, Escherichia coli, Pasteurella multocida, Ornithobacterium rhinotracheale, Aspergillus, infectious coryza, avian influenza, avian pneumovirus and Newcastle disease [101, 119].
Infection by two or more of these agents can increase the morbidity and mortality in poultry flocks. In a study by Springer and colleagues, it was demonstrated that two relatively non-pathogenic agents, Mycoplasma synoviae and Escherichia coli, prolong and increase the severity of infectious bronchitis virus infections [118]. Clinical signs were more severe than those observed with co-infection by only Mycoplasma synoviae and infectious bronchitis virus or by only Escherichia coli and infectious bronchitis virus. Unlike diseases that have one causative agent, determining multifactorial disease mechanisms can be quite challenging. Nevertheless, elucidating the dynamic interactions occurring within the microbial communities inhabiting the respiratory tract will provide a better understanding of the etiology of avian respiratory disease.
1.3 Advancement of Technology for Detection of Microorganisms
Culture-dependent approaches for detecting microbial organisms have been the standard method since the 1880s, when Robert Koch invented plating [120]. While other methods such as microscopy, antigen detection and serology are commonly used in microbiology, culture is viewed as the standard method for diagnostics [121]. These
traditional methods have contributed tremendously to the advancement of the field of microbiology and to our understanding of microbial diversity. Although culture is still the standard in some laboratories, traditional laboratory methods are time consuming and do not capture the full diversity within microbial communities. In some instances, it can take several days to obtain results for bacteria and yeast, while results for fungi can take as long as months [121]. Furthermore, current laboratory techniques are unable to culture the vast majority (over 90%) of microbial species, leaving gaps in our knowledge and understanding of the planet’s biodiversity [122].
About a century later, however, culture-independent technologies were invented, which have allowed us to analyze microbial communities within a particular environment by identifying microbial DNA isolated from a sample. Some early molecular methods used to detect unknown microbial composition from a sample include fluorescent in situ hybridization (FISH) [123], denaturing gradient gel electrophoresis (DGGE) [124], automated ribosomal internal transcribed spacer analysis (ARISA) [125] and terminal restriction fragment length polymorphism
(TRFLP) [126]. The dideoxy chain-termination method, or Sanger sequencing, was the most commonly used method of sequencing DNA and led to the development of automated DNA sequencing platforms [127]. Using this approach, Sanger sequenced the first DNA genome, that of bacteriophage Phi X [128]. The introduction of culture-independent methods provided more insight into microbial communities; however, these methods were still expensive and time consuming.
1.3.1 Next-Generation Sequencing Technology
Advancements in sequencing technology have resulted in the development of more robust, accurate and rapid next generation sequencing platforms including the
454 Genome Sequencer (the first next-generation sequencer) [129], the Illumina
Genome Analyzer [130] and the Applied Biosystems/SOLiD [131]. With next generation sequencing platforms, unlike Sanger sequencing, DNA sequencing libraries are clonally amplified in vitro, the DNA is sequenced by synthesis and the spatially segregated and amplified DNA templates are sequenced in parallel [129, 130, 131].
Each sequencing platform has its advantages and disadvantages. For example, the 454
Genome Sequencer has a maximum read length of about 700 base pairs, a Phred quality score greater than Q20 and error rates between 1.07% and 1.7% [129, 132].
The Illumina Genome Analyzer has a maximum read length of 300 base pairs, a Phred quality score greater than Q30 and error rates between 0.0034% and 1% [133].
Finally, the Applied Biosystems/SOLiD has a maximum read length of only 75 base pairs, a Phred quality score greater than Q30 and error rates between 0.01% and 1% [134]. Third generation sequencing platforms produce longer reads; however, their error rates are significantly higher than those of the other platforms. For example, the Nanopore sequencer can produce reads of up to 10 kb with error rates between 10% and 40% [135], while the Pacific Biosciences sequencer can produce up to 20 kb of sequence with error rates between 5% and 10% [136].
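The Phred thresholds quoted for each platform can be related to per-base error probabilities through the standard definition Q = -10 log10(P). The short Python sketch below is illustrative only; the function names are chosen here for convenience and are not taken from any sequencing platform's software.

```python
import math

def phred_to_error_probability(q: float) -> float:
    """Per-base error probability P = 10^(-Q/10) for a Phred score Q."""
    return 10 ** (-q / 10)

def error_probability_to_phred(p: float) -> float:
    """Phred score Q = -10 * log10(P) for an error probability 0 < P <= 1."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)

# Thresholds mentioned in the text: Q20 and Q30.
for q in (20, 30):
    p = phred_to_error_probability(q)
    print(f"Q{q}: about 1 error in {round(1 / p)} bases ({p:.2%} per base)")
```

Under this definition, a Q20 threshold corresponds to a 1% chance that a base call is wrong, and Q30 to a 0.1% chance.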
As next generation sequencing technologies continue to advance, larger amounts of data will be generated at lower cost and in less time. The very first human genome, sequenced in 2000, cost an estimated $300 million to complete [137, 138]. Since then, the National Human Genome Research Institute has tracked the cost per genome sequenced from 2001 to 2019, and in this time the cost has decreased from $100 million to as little as $1,000. This trend is often compared with Moore’s Law, which states that computational power doubles approximately every two years [139]. In addition to cost, sequencing time has also decreased. For example, the first human genome took over 15 months to sequence in 2000, while twenty years later, a human genome can be sequenced in just days [137].
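The cost figures quoted above can be turned into an average halving time and set against a doubling every two years. The Python sketch below is a back-of-the-envelope calculation that assumes a constant exponential decline between 2001 and 2019, which is a simplification; the function name is illustrative.

```python
import math

def halving_time_years(start_cost: float, end_cost: float, years: float) -> float:
    """Average time for the cost to halve, assuming a constant exponential decline."""
    halvings = math.log2(start_cost / end_cost)
    return years / halvings

# Figures quoted in the text: about $100 million in 2001, about $1,000 in 2019.
years = 2019 - 2001
observed = halving_time_years(100e6, 1e3, years)

# Fold reduction expected over the same period if cost halved every two years.
moore_fold = 2 ** (years / 2)

print(f"Observed halving time: {observed:.2f} years")
print(f"Observed fold reduction: {100e6 / 1e3:,.0f}")
print(f"Fold reduction at one halving every two years: {moore_fold:,.0f}")
```

Under these assumptions the cost per genome halved roughly every 13 months on average, a 100,000-fold reduction compared with the 512-fold reduction that a halving every two years would give over the same 18 years.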
Using next generation sequencing technology, two approaches are commonly employed to characterize microbial composition. One method is amplification and sequencing of conserved marker genes, such as the 16S ribosomal RNA gene in bacteria, the 18S ribosomal RNA gene in eukaryotes and the internal transcribed spacer (ITS) region in fungi. The second approach is metagenomic shotgun sequencing.
1.3.2 16S Ribosomal RNA Sequencing
Some components in the microbiome, such as bacteria, archaea and fungi, have conserved regions within their genomes commonly referred to as marker genes, which can be sequenced and used to study taxonomy and phylogeny of microorganisms in a given sample. In bacteria, the 16S ribosomal RNA (16S rRNA) gene is approximately 1,500 base pairs
in size and consists of conserved regions and hypervariable regions (V1-V9) [140].
The conserved regions act as primer binding sites for PCR amplification and the hypervariable regions are used to identify specific bacteria. These sequences can then be clustered either into phylotypes according to sequences in a reference database
[126], or by operational taxonomic units (OTUs) in which clusters are generated based on similarity [141]. Several well-developed bioinformatics tools exist for the analysis of 16S rRNA data including QIIME, MG-RAST and mothur, in addition to many comprehensive bacterial databases such as Greengenes and SILVA [141-145]. For fungal characterization, internal transcribed spacers (ITS) can be similarly sequenced and analyzed. Bioinformatics tools and databases exist for the analysis of ITS data; however, they are not as well-developed as those that exist for 16S rRNA. For example, QIIME includes a pipeline for ITS analysis and the UNITE database specifically contains fungal ITS sequences [142, 146].
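Similarity-based OTU clustering can be illustrated with a minimal greedy sketch in Python. It is not the algorithm implemented by any of the tools named above: the 97% threshold is a commonly used convention assumed here, identity is computed position by position on equal-length toy sequences rather than from an alignment, and all function names are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy identity measure needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otu_clustering(seqs, threshold=0.97):
    """Assign each sequence to the first OTU whose seed it matches at or above
    the threshold; otherwise the sequence becomes the seed of a new OTU."""
    otus = []  # list of (seed, members) pairs
    for s in seqs:
        for seed, members in otus:
            if identity(s, seed) >= threshold:
                members.append(s)
                break
        else:
            otus.append((s, [s]))
    return otus

SWAP = {"A": "C", "C": "G", "G": "T", "T": "A"}

def mutate(seq, positions):
    """Return a copy of seq with a substitution at each listed position."""
    chars = list(seq)
    for i in positions:
        chars[i] = SWAP[chars[i]]
    return "".join(chars)

base = "ACGT" * 25                       # 100 bp toy amplicon
near = mutate(base, [0, 50])             # 98% identical to base
far = mutate(base, range(0, 100, 10))    # 90% identical to base

otus = greedy_otu_clustering([base, near, far])
for seed, members in otus:
    print(len(members), "sequence(s) in OTU seeded by", seed[:12] + "...")
```

In this toy input the first two sequences fall into one OTU and the third seeds its own, because only the first pair exceeds the assumed 97% identity threshold.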
Over the past three decades, 16S rRNA sequencing has provided an inexpensive and relatively rapid alternative to traditional culture techniques. This approach has been employed in numerous microbiome studies and has provided considerable insight into the role of bacteria in both healthy and diseased organisms. However, there are several limitations with this approach. Amplicon sequencing, such as 16S rRNA sequencing, relies on primers that target a specific bacterial sequence, which can introduce bias and leaves eukaryotes and viruses undetectable. Results are also typically reported as abundance relative to the sample rather than as absolute abundance, which can lead to over- or under-represented organisms and makes comparison between samples a challenge [147]. Furthermore, this method is restricted to classifying bacteria at the genus taxonomic level, as it lacks the power to differentiate specific bacterial species [147]. Because downstream analysis relies on bacterial reference databases, identification of novel bacterial sequences is also limited. Finally, because amplicon sequencing is restricted to providing taxonomic information, it does not provide the information required to infer function.
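The difficulty of comparing samples on the basis of proportions can be seen in a small worked example. The Python sketch below uses hypothetical taxon names and cell counts chosen purely for illustration.

```python
def relative_abundance(counts):
    """Convert absolute counts per taxon into proportions of the sample total."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

# Hypothetical absolute counts: taxon_X is unchanged between the two samples,
# while taxon_Y triples.
sample_a = {"taxon_X": 100, "taxon_Y": 100}
sample_b = {"taxon_X": 100, "taxon_Y": 300}

rel_a = relative_abundance(sample_a)
rel_b = relative_abundance(sample_b)

print(f"taxon_X in sample A: {rel_a['taxon_X']:.0%}")
print(f"taxon_X in sample B: {rel_b['taxon_X']:.0%}")
```

Although the absolute count of taxon_X is identical in both samples, its relative abundance falls from 50% to 25% because the other taxon has expanded, so a change in proportion alone does not show which organism actually changed.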
1.3.3 Metagenomic Shotgun Sequencing
Metagenomic shotgun sequencing characterizes all known and novel microorganisms in a given sample, both culturable and unculturable. DNA from an entire sample can be sequenced using metagenomic shotgun sequencing approaches and microbial communities can be classified in a short amount of time [147]. This approach does not use PCR and is therefore not restricted by primers that target specific gene sequences. As a result, it is not limited to detecting one specific kingdom and has enough sensitivity to detect at the species and even strain taxonomic level.
Therefore, a metagenomics approach can be employed to characterize viral and eukaryotic sequences. Moreover, data generated using this approach do not require alignment to a reference database, which makes it possible to detect novel microbial sequences. In addition to providing invaluable insight into the diversity and composition of an environment, the data generated using this approach can also be employed in metatranscriptomics, metabolomics and proteomics analyses, providing a deeper understanding of the community interactions through functional profiles and metabolic pathways.
Several bioinformatics analysis tools exist for the characterization of microorganisms from shotgun metagenomics data; determining which tool is most appropriate depends on the unique research hypotheses. For identification of known microbial sequences, a sequence-dependent approach can be used to align reads to annotated reference genomes that exist in reference genome databases [148-150]. This can be achieved using de novo assembly or with unassembled reads. Several sequence aligners exist that align unassembled reads to the template of reference genomes, including the commonly used Bowtie2 and the Burrows-Wheeler Aligner [149, 150]. The completeness of the reference genome database must be considered when using this approach. For identification of unknown microbial sequences, a de novo assembly can be employed. Several tools exist that assemble metagenomic sequences into contigs for de novo detection of microbial elements without using a reference sequence [151-153]. Some common metagenomic assemblers include MEGAHIT, MetaSPAdes and MetaVelvet [151-153]. The Basic Local Alignment Search Tool (BLAST), or a similar method, can also be employed to identify known microbial sequences from the assembled contigs [154].
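The sequence-dependent strategy can be illustrated with a deliberately simplified sketch. Bowtie2 and the Burrows-Wheeler Aligner use compressed indexes and tolerate mismatches; the Python example below substitutes a plain exact k-mer lookup table instead, and the reference names, sequences, reads and the value of k are all hypothetical.

```python
from collections import Counter, defaultdict

def build_kmer_index(references, k):
    """Map every k-mer to the set of reference names in which it occurs."""
    index = defaultdict(set)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index

def assign_read(read, index, k):
    """Assign a read to the reference sharing the most k-mers with it.
    Returns None when no k-mer matches or when the best score is tied."""
    votes = Counter()
    for i in range(len(read) - k + 1):
        for name in index.get(read[i:i + k], ()):
            votes[name] += 1
    if not votes:
        return None
    (best, best_n), *rest = votes.most_common(2)
    if rest and rest[0][1] == best_n:
        return None  # ambiguous between two references
    return best

# Hypothetical toy reference genomes and reads.
references = {
    "virus_A": "ACGTACGTTAGCCGATAGGCTTAACG",
    "virus_B": "TTGACCAGTGGATCCAAGTCTGACAA",
}
reads = ["CGTTAGCCGATAGG", "GGATCCAAGTCTG", "GGGGGGGGGGGG"]

k = 8
index = build_kmer_index(references, k)
assignments = [assign_read(r, index, k) for r in reads]
mapped_counts = Counter(a for a in assignments if a is not None)
print(assignments)
print(dict(mapped_counts))
```

The third read shares no k-mer with either reference and is left unassigned, which mirrors the dependence on database completeness noted above: a sequence absent from the reference set cannot be detected by this kind of approach.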
1.4 Characterization of the Virome
Although the importance of the virome in both health and disease of animal hosts is apparent, our current understanding of viral diversity is incredibly limited. According to the latest master species list provided by the International Committee on Taxonomy of Viruses in 2018, only 150 families, 1,020 genera and 5,560 viral species have been classified [155]. In fact, it has been estimated that only about 1% of the planet’s virome has been discovered [8]. In addition to discovering novel viruses, elucidating the biological roles of known viruses is another area of great importance. As described previously, examining the eukaryotic virome in particular will provide a deeper understanding of the complex symbiotic and dysbiotic interactions that occur within the virome and between the virome, the host and other microorganisms. Moreover, it can even provide unique insights into our own genomes as viral signatures have been identified within the human genome. Indeed, human endogenous retroviruses account for about 8% of our total genome [156]. Endogenous retroviruses are typically inactive in humans due to deletions, inversions and mutations, but they could contribute to regulating gene expression and encoding mRNAs for certain proteins [157-159]. Furthermore, they have been found to be reactivated in certain diseases such as cancer [160, 161]. Although the necessity of such studies is apparent, several challenges with characterizing the virome have limited the available computational methodology and tools required for this type of analysis.
1.4.1 Characterizing the Virome Using a Culture-Independent Approach
Although amplicon sequencing and the well-developed computational tools for analyzing this data have been successfully employed in detecting bacterial communities in a number of studies, similar approaches cannot be used for viral classification. Viruses lack conserved genomic regions that are homologous across all viral genomes, such as the 16S rRNA gene in bacteria. However, the advancement of
other culture-independent approaches, such as metagenomic shotgun sequencing, has resulted in the possibility of characterizing viruses.
1.4.2 Challenges with Developing Comprehensive Computational Tools for Analysis of the Virome
Several challenges have made developing comprehensive tools for viral characterization arduous. For one, viral DNA and RNA are typically less abundant than those of the host and other microorganisms within a sample, resulting in a weaker signal and making viruses more difficult to detect during downstream analysis. Moreover, the rapid degradation and instability of RNA can make detecting RNA viruses even more challenging. Furthermore, DNA and RNA viruses can vary significantly in genome size as well as in other characteristics, such as whether the genome is double- or single-stranded and positive- or negative-sense, and whether the virion is enveloped or non-enveloped.
Viruses are also highly genetically heterogeneous in nature and the sequence variability causes these viruses to evolve quickly. In 2015, a reported average of 2.5 viruses were being added to the NCBI viral RefSeq database each day [162].
Therefore, maintaining a current viral genome database will require frequent updating, possibly more so than other microbial databases. For this reason, there have been few attempts at creating and managing robust and curated viral genome databases, which hinders sequence-dependent approaches that rely on reference databases. As a result, many studies attempting to characterize the virome rely on sequence-independent assembly approaches and/or BLAST database searches, which can
require a significant amount of computational time and resources. Additionally, the host background DNA within samples can interfere with mapping quality and efficiency, making data interpretation problematic. Finally, many existing metagenomics tools that attempt to characterize the virome require extensive command-line knowledge and expensive computational resources to process a sample.
Although a tool that addresses all of these limitations does not yet exist, some tools have been recently developed for taxonomic profiling and discovery of viruses.
1.4.2.1 Quantification of the Virome
Quantifying the viral component of a microbiome has also provided a challenge to researchers. Percent relative abundance of microbial elements is often the chosen method of quantifying microorganisms in culture-independent studies, whether utilizing 16S rRNA or metagenomic shotgun sequencing. This is typically calculated as the number of reads mapped to a particular microbial reference genome in proportion to the total number of microbial sequences detected within the entire sample. This formula allows researchers to determine which elements are most abundant in a particular sample, information that could then be compared to other samples to identify microbial shifts and trends. This method of quantification has developed into the standard; however, incorporating reference genome length to calculate normalized viral abundance is a strategy some studies have found to increase accuracy of abundance estimates [163, 164]. In a highly cited study by Moustafa and colleagues, normalized viral abundance was calculated by normalizing mapped reads based on both the viral reference genome length as well as the host reference genome
length [163]. By considering the length of both viral sequences and host sequences, this formula was intended to provide a more accurate representation of viral abundance by eliminating bias stemming from variable genome lengths of the viruses. Encouragingly, they found that normalized abundance corresponded well to viral abundance determined by Polymerase Chain Reaction (PCR) experiments. Overall, accurate quantification of microbial abundance requires sequence-dependent alignment to comprehensive microbial reference databases.
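The two quantification strategies described in this section can be contrasted in a short Python sketch. The percent relative abundance follows the definition given above. The length-normalized function is only one plausible reading of normalizing by both viral and host genome length, namely viral reads per base divided by host reads per base; it is an assumption made for illustration and does not reproduce the exact formula or scaling used in [163]. All read counts, genome lengths and names are hypothetical.

```python
def percent_relative_abundance(mapped_reads):
    """Reads mapped to each genome as a percentage of all mapped microbial reads."""
    total = sum(mapped_reads.values())
    return {name: 100 * n / total for name, n in mapped_reads.items()}

def length_normalized_abundance(viral_reads, viral_genome_len,
                                host_reads, host_genome_len):
    """Viral reads per base of viral genome divided by host reads per base of
    host genome (an assumed form of length normalization)."""
    viral_density = viral_reads / viral_genome_len
    host_density = host_reads / host_genome_len
    return viral_density / host_density

# Hypothetical sample: two viruses with equal read counts but different genome sizes.
mapped = {"small_virus": 1_000, "large_virus": 1_000}
genome_len = {"small_virus": 10_000, "large_virus": 200_000}
host_reads, host_genome_len = 1_000_000, 1_000_000_000

relative = percent_relative_abundance(mapped)
normalized = {
    name: length_normalized_abundance(mapped[name], genome_len[name],
                                      host_reads, host_genome_len)
    for name in mapped
}

for name in mapped:
    print(f"{name}: {relative[name]:.1f}% relative, {normalized[name]:.1f} normalized")
```

In this example both viruses have a relative abundance of 50%, yet the length-normalized value of the virus with the smaller genome is twenty times higher, because the same number of reads is spread over a genome one twentieth the size.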
1.4.3 Existing Culture-Independent Tools for Virome Characterization and their Limitations
As described previously, sequence-dependent alignment and de novo assembly approaches can be employed to analyze metagenomic sequencing data. Several tools exist that perform de novo assembly specifically for metagenomics studies and can be used to analyze the virome. These tools were designed to consider the variability of multiple genome sizes and include MEGAHIT, MetaSPAdes and MetaVelvet [151-
153]. Virus-specific tools have also been developed using sequence-dependent approaches to identify known viruses including Virome, VirusHunter, VirusSeeker,
MetaVir, ProViDE, Kraken and VirSorter [148, 165-167]. All but one of these tools use the Basic Local Alignment Search Tool (BLAST) for taxonomic classification, which can require an extensive amount of time and computational resources. These existing tools are appropriate for taxonomic classification; however, they do not provide accurate viral abundance, diversity and genome coverage estimates.
Furthermore, many of these resources require extensive command-line knowledge to
install and use the tool, only work on certain operating systems and can be computationally intensive, requiring access to a powerful computer or server.
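To make the diversity and genome coverage estimates mentioned above concrete, the Python sketch below computes a Shannon diversity index from read counts and the breadth and mean depth of coverage from alignment intervals. The choice of the Shannon index, the half-open interval convention and all names and values are assumptions made for illustration; they are not drawn from any of the tools listed.

```python
import math

def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)

def genome_coverage(genome_len, alignments):
    """Return (breadth, mean depth) for a reference of genome_len bases.
    alignments holds 0-based, half-open (start, end) intervals of mapped reads."""
    depth = [0] * genome_len
    for start, end in alignments:
        for pos in range(max(start, 0), min(end, genome_len)):
            depth[pos] += 1
    covered = sum(1 for d in depth if d > 0)
    return covered / genome_len, sum(depth) / genome_len

# Hypothetical example: two equally abundant taxa, and two overlapping reads
# mapped to a 100-base reference.
h = shannon_diversity([50, 50])
breadth, mean_depth = genome_coverage(100, [(0, 50), (25, 75)])

print(f"Shannon index: {h:.3f}")
print(f"Breadth of coverage: {breadth:.0%}, mean depth: {mean_depth:.1f}x")
```

Breadth of coverage helps distinguish a genome that is genuinely present, with reads spread along its length, from one that attracts many reads to a single short region, which read counts alone cannot reveal.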
1.5 Rationale and Objectives
Overwhelming evidence suggests that microbial interactions occurring between the host and microorganisms or between microorganisms of different kingdoms have a significant role in disease pathology in intestinal, respiratory, skin, oral and reproductive microbiomes. Shotgun metagenomic sequencing has provided a rapid and inexpensive approach to sequence microbiome samples. However, a majority of studies focus on the bacterial component as the computational tools needed to characterize the eukaryotic viruses, bacteriophage and fungi are lacking.
Elucidating the complete microbiome of animals is critical in understanding the role these microbial communities play in disease etiology. To address this, BiomeSeq, a tool for the detection and quantification of eukaryotic viruses, bacteriophage, fungi and bacteria, was developed and is presented in Chapter 2. BiomeSeq utilizes a sequence-dependent approach to detect microorganisms in comprehensive microbial databases and accurately determines microbial abundance, diversity and genome coverage. The development of this tool is detailed in Chapter 2, including each step of the bioinformatics workflow as well as the contents of each microbial database.
BiomeSeq performance was evaluated using several metrics on simulated datasets as well as a clinical dataset, and the tool performed with high accuracy and precision.
BiomeSeq was implemented as a software package as well as a user-friendly container, and the manuscript is available on bioRxiv and was recently submitted to BMC Genomics. BiomeSeq was employed in several studies examining various microbiomes of avian flocks. In the first study, BiomeSeq was utilized to analyze the respiratory tract of a healthy broiler flock from hatching to processing.
Abundance, microbial diversity, species frequency and microbial shifts were examined for each of the microbial components within the respiratory microbiome. This study provides the first comprehensive analysis of the ecology of the avian respiratory microbiome. This work is discussed in Chapter 3 and a manuscript of this work was submitted to the Journal of Applied and Environmental Microbiology. In a second study, clinical isolates with respiratory disease complex were sampled and BiomeSeq was employed to characterize each of the microbial components. This information was compared to the healthy broiler flock, providing insight on the microbial shifts occurring as a result of dysbiosis in the diseased flock. This study is discussed in
Chapter 4 and a manuscript of this work was submitted to Avian Diseases. In a third study, BiomeSeq’s ability to characterize a variety of hosts, sampling methods and microbiome environments is highlighted. In this study, the respiratory microbiome is compared to the intestinal microbiome in a flock of healthy turkeys. This study is a collaborative effort between the University of Delaware Department of Animal and
Food Science, University of Minnesota Department of Veterinary and Biomedical
Sciences, and the Ohio State University Department of Veterinary Preventive
Medicine and is discussed in Chapter 5.
REFERENCES
1. Lederberg, J.M., Alexa, Ome Sweet 'Omics-- A Genealogical Treasury of
Words. The Scientist, 2001. 15(7): p. 8.
2. Human Microbiome Project, C., A framework for human microbiome research.
Nature, 2012. 486(7402): p. 215-21.
3. Human Microbiome Project, C., Structure, function and diversity of the
healthy human microbiome. Nature, 2012. 486(7402): p. 207-14.
4. Gill, S.R., et al., Metagenomic analysis of the human distal gut microbiome.
Science, 2006. 312(5778): p. 1355-9.
5. Kamada, N., et al., Control of pathogens and pathobionts by the gut
microbiota. Nat Immunol, 2013. 14(7): p. 685-90.
6. Turnbaugh, P.J., et al., The human microbiome project. Nature, 2007.
449(7164): p. 804-10.
7. Sender, R., S. Fuchs, and R. Milo, Revised Estimates for the Number of
Human and Bacteria Cells in the Body. PLoS Biol, 2016. 14(8): p. e1002533.
8. Mokili, J.L.R., F.; Dutilh, B.E., Metagenomics and future perspectives in virus
discovery. Current Opinion in Virology, 2012. 2: p. 63-77.
9. Peterson, J., et al., The NIH Human Microbiome Project. Genome Res, 2009.
19(12): p. 2317-23.
10. Lloyd-Price, J., G. Abu-Ali, and C. Huttenhower, The healthy human
microbiome. Genome Med, 2016. 8(1): p. 51.
11. Shafquat, A., et al., Functional and phylogenetic assembly of microbial
communities in the human microbiome. Trends Microbiol, 2014. 22(5): p. 261-
6.
12. Schamberger, G.P. and F. Diez-Gonzalez, Characterization of Colicinogenic
Escherichia coli Strains Inhibitory to Enterohemorrhagic Escherichia coli.
Journal of Food Protection, 2004. 67(3): p. 486-492.
13. Fukuda, S., et al., Bifidobacteria can protect from enteropathogenic infection
through production of acetate. Nature, 2011. 469(7331): p. 543-7.
14. Ravel, J., et al., Vaginal microbiome of reproductive-age women. Proc Natl
Acad Sci U S A, 2011. 108 Suppl 1: p. 4680-7.
15. Larson, B.a.M.G.R., Understanding the Bacterial Flora of the Female Genital
Tract. Clinical Infectious Diseases, 2000. 32: p. 69-77.
16. Hickey, R.J., et al., Understanding vaginal microbiome complexity from an
ecological perspective. Transl Res, 2012. 160(4): p. 267-82.
17. Gao, Z., et al., Molecular analysis of human forearm superficial skin bacterial
biota. Proc Natl Acad Sci U S A, 2007. 104(8): p. 2927-32.
18. Dekio, I., et al., Detection of potentially novel bacterial components of the
human skin microbiota using culture-independent molecular profiling. J Med
Microbiol, 2005. 54(Pt 12): p. 1231-1238.
19. Kong, H.H., Skin microbiome: genomics-based insights into the diversity and
role of skin microbes. Trends Mol Med, 2011. 17(6): p. 320-8.
20. Lai, Y., et al., Activation of TLR2 by a small molecule produced by
Staphylococcus epidermidis increases antimicrobial defense against bacterial
skin infections. J Invest Dermatol, 2010. 130(9): p. 2211-21.
21. Virgin, H.W., E.J. Wherry, and R. Ahmed, Redefining chronic viral infection.
Cell, 2009. 138(1): p. 30-50.
22. Zhu, J., et al., Virus-specific CD8+ T cells accumulate near sensory nerve
endings in genital skin during subclinical HSV-2 reactivation. J Exp Med,
2007. 204(3): p. 595-603.
23. Hislop, A.D., et al., Tonsillar homing of Epstein-Barr virus-specific CD8+ T
cells and the virus-host balance. J Clin Invest, 2005. 115(9): p. 2546-55.
24. Zur Hausen, H., Novel human polyomaviruses--re-emergence of a well known
virus family as possible human carcinogens. Int J Cancer, 2008. 123(2): p.
247-250.
25. Hino, S.M., H., Torque teno virus (TTV): current status. Reviews in Medical
Virology, 2006. 17(1): p. 45-57.
26. Gao, G., et al., Clades of Adeno-associated viruses are widely disseminated in
human tissues. J Virol, 2004. 78(12): p. 6381-8.
27. Chen, C.L., et al., Molecular characterization of adeno-associated viruses
infecting children. J Virol, 2005. 79(23): p. 14781-14792.
28. Erles, K., P. Sebökovà, and J.R. Schlehofer, Update on the prevalence of serum antibodies
(IgG and IgM) to adeno-associated virus (AAV). Journal of Medical Virology,
1999. 59(3): p. 406-411.
29. Garnett, C.T., et al., Prevalence and Quantitation of
Species C Adenovirus DNA in Human Mucosal Lymphocytes. Journal of
Virology, 2002. 76(21): p. 10608–10616.
30. Leggatt, G.R. and I.H. Frazer, HPV vaccines: the beginning of the end for
cervical cancer. Current Opinion in Immunology, 2007. 19(2): p. 232-238.
31. Seifarth, W., et al., Comprehensive analysis of human endogenous retrovirus
transcriptional activity in human tissues with a retrovirus-specific microarray. J
Virol, 2005. 79(1): p. 341-52.
32. Barton, E.S., et al., Herpesvirus latency confers
symbiotic protection from bacterial infection. Nature, 2007. 447: p. 326-329.
33. Yager, E.J., et al., gamma-Herpesvirus-induced protection against bacterial
infection is transient. Viral Immunol, 2009. 22(1): p. 67-72.
34. Grivel, J.C., et al., Suppression of CCR5- but not CXCR4-tropic HIV-1 in
lymphoid tissue by human herpesvirus 6. Nat Med, 2001. 7(11): p. 1232-5.
35. Bonfante, F., et al., Synergy or interference of a H9N2 avian influenza virus
with a velogenic Newcastle disease virus in chickens is dose dependent. Avian
Pathol, 2017. 46(5): p. 488-496.
36. Hatoum, R., S. Labrie, and I. Fliss, Antimicrobial and probiotic properties of
yeasts: from fundamental to novel applications. Front Microbiol, 2012. 3: p.
421.
37. McFarland, L.V., Systematic review and meta-analysis of Saccharomyces
boulardii in adult patients. World J Gastroenterol, 2010. 16(18): p. 2202-22.
38. Bradley, G.L., T.F. Savage, and K.I. Timm, The effects of supplementing diets with
Saccharomyces cerevisiae var. boulardii on male poult performance and ileal
morphology. Poultry Science, 1994. 73: p. 1766-1770.
39. Kaser, A., S. Zeissig, and R.S. Blumberg, Inflammatory bowel disease. Annu
Rev Immunol, 2010. 28: p. 573-621.
40. Sokol, H., et al., Faecalibacterium prausnitzii is an anti-inflammatory
commensal bacterium identified by gut microbiota analysis of Crohn disease
patients. Proc Natl Acad Sci U S A, 2008. 105(43): p. 16731-6.
41. Willing, B.P., et al., A pyrosequencing study in twins shows that
gastrointestinal microbial profiles vary with inflammatory bowel disease
phenotypes. Gastroenterology, 2010. 139(6): p. 1844-1854.e1.
42. Png, C.W., et al., Mucolytic bacteria with increased prevalence in IBD mucosa
augment in vitro utilization of mucin by other bacteria. Am J Gastroenterol,
2010. 105(11): p. 2420-8.
43. Lepage, P., et al., Twin study indicates loss of interaction between microbiota
and mucosa of patients with ulcerative colitis. Gastroenterology, 2011. 141(1):
p. 227-36.
44. Salonen, A., W.M. de Vos, and A. Palva, Gastrointestinal microbiota in
irritable bowel syndrome: present state and perspectives. Microbiology, 2010.
156(11): p. 3205-3215.
45. Saulnier, D.M., et al., Gastrointestinal microbiome signatures of pediatric
patients with irritable bowel syndrome. Gastroenterology, 2011. 141(5): p.
1782-91.
46. Sobhani, I., et al., Microbial dysbiosis in colorectal cancer (CRC) patients.
PLoS One, 2011. 6(1): p. e16393.
47. Wang, T., et al., Structural segregation of gut microbiota between colorectal
cancer patients and healthy volunteers. ISME J, 2012. 6(2): p. 320-9.
48. Nistal, E., et al., Differences of small intestinal bacteria populations in adults
and children with/without celiac disease: effect of age, gluten diet, and disease.
Inflammatory bowel diseases, 2012. 18(4): p. 649-656.
49. Di Cagno, R., et al., Duodenal and faecal microbiota of celiac children:
molecular, phenotype and metabolome characterization. BMC Microbiol,
2011. 11: p. 219.
50. Musso, G., R. Gambino, and M. Cassader, Interactions between gut microbiota
and host metabolism predisposing to obesity and diabetes. Annu Rev Med,
2011. 62: p. 361-80.
51. Vaarala, O., The gut as a regulator of early inflammation in type 1 diabetes.
Curr Opin Endocrinol Diabetes Obes, 2011. 18(4): p. 241-7.
52. Giongo, A., et al., Toward defining the autoimmune microbiome for type 1
diabetes. ISME J, 2011. 5(1): p. 82-91.
53. Larsen, N., et al., Gut microbiota in human adults with type 2 diabetes differs
from non-diabetic adults. PLoS One, 2010. 5(2): p. e9085.
54. Wu, X., et al., Molecular characterisation of the faecal microbiota in patients
with type II diabetes. Curr Microbiol, 2010. 61(1): p. 69-78.
55. Wing, M.R., et al., Gut microbiome in chronic kidney disease. Exp Physiol,
2016. 101(4): p. 471-7.
56. Ley, R.E., et al., Microbial ecology: human gut microbes associated with
obesity. Nature, 2006. 444(7122): p. 1022-3.
57. Turnbaugh, P.J., et al., A core gut microbiome in obese and lean twins. Nature,
2009. 457(7228): p. 480-4.
58. de Jongh, G.J., et al., High expression levels of keratinocyte antimicrobial
proteins in psoriasis compared with atopic dermatitis. J Invest Dermatol, 2005.
125(6): p. 1163-73.
59. Harder, J., et al., Enhanced expression and secretion of antimicrobial peptides
in atopic dermatitis and after superficial skin injury. J Invest Dermatol, 2010.
130(5): p. 1355-64.
60. Gao, Z., et al., Substantial Alterations of the Cutaneous Bacterial Biota in
Psoriatic Lesions. PLOS ONE, 2008. 3(7): p. e2719.
61. Gudjonsson, J.E., et al., Global gene expression analysis reveals evidence for
decreased lipid biosynthesis and increased innate immunity in uninvolved
psoriatic skin. J Invest Dermatol, 2009. 129(12): p. 2795-804.
62. Jugeau, S., et al., Induction of toll-like receptors by Propionibacterium acnes.
Br J Dermatol, 2005. 153(6): p. 1105-13.
63. Dessinioti, C. and A.D. Katsambas, The role of Propionibacterium acnes in
acne pathogenesis: facts and controversies. Clin Dermatol, 2010. 28(1): p. 2-7.
64. Grice, E.A. and J.A. Segre, The skin microbiome. Nat Rev Microbiol, 2011.
9(4): p. 244-53.
65. Holmes, A.D., Potential role of microorganisms in the pathogenesis of rosacea.
J Am Acad Dermatol, 2013. 69(6): p. 1025-32.
66. Whitfeld, M., et al., Staphylococcus epidermidis: a possible role in the pustules
of rosacea. J Am Acad Dermatol, 2011. 64(1): p. 49-52.
67. Martin, D.H. and J.M. Marrazzo, The Vaginal Microbiome: Current
Understanding and Future Directions. J Infect Dis, 2016. 214 Suppl 1: p. S36-
41.
68. Zhang, Y., et al., Human oral microbiota and its
modulation for oral health. Biomedicine & Pharmacotherapy, 2018. 99: p. 883-
893.
69. Grant, M.M. and D. Jonsson, Next Generation Sequencing Discoveries of the
Nitrate-Responsive Oral Microbiome and Its Effect on Vascular Responses. J
Clin Med, 2019. 8(8).
70. Sampaio-Maia, B., et al.,
Chapter Four - The Oral Microbiome in Health and Its Implication in Oral and
Systemic Diseases. Advances in Applied Microbiology, 2016. 97: p. 171-210.
71. Peters, B.A., et al., Oral Microbiome Composition Reflects Prospective Risk
for Esophageal Cancers. Cancer Res, 2017. 77(23): p. 6777-6787.
72. Gao, S.G., et al., Preoperative serum immunoglobulin G and A antibodies to
Porphyromonas gingivalis are potential serum biomarkers for the diagnosis and
prognosis of esophageal squamous cell carcinoma. BMC Cancer, 2018. 18(1):
p. 17.
73. Ertz-Archambault, N., P. Keim, and D. Von Hoff, Microbiome and pancreatic
cancer: A comprehensive topic review of literature. World J Gastroenterol,
2017. 23(10): p. 1899-1908.
74. Flemer, B., et al., The oral microbiota in colorectal cancer is distinctive and
predictive. Gut, 2018. 67(8): p. 1454-1463.
75. Cotran, R., et al., Robbins Pathologic Basis of Disease. 1999, Philadelphia:
Saunders.
76. Dickson, R.P., et al., The Microbiome and the Respiratory Tract. Annu Rev
Physiol, 2016. 78: p. 481-504.
77. Morris, A., et al., Comparison of the respiratory microbiome in healthy
nonsmokers and smokers. Am J Respir Crit Care Med, 2013. 187(10): p. 1067-
75.
78. Segal, L.N., et al., Enrichment of lung microbiome with
supraglottic taxa is associated with increased pulmonary inflammation.
Microbiome, 2013. 1(19): p. 1-12.
79. Willner, D., et al., Metagenomic analysis of respiratory tract DNA viral
communities in cystic fibrosis and non-cystic fibrosis individuals. PLoS One,
2009. 4(10): p. e7370.
80. Bosch, A.A., et al., Viral and bacterial interactions in the upper respiratory
tract. PLoS Pathog, 2013. 9(1): p. e1003057.
81. Murphy, T.F., L.O. Bakaletz, and P.R. Smeesters, Microbial interactions in the
respiratory tract. Pediatr Infect Dis J, 2009. 28(10 Suppl): p. S121-6.
82. Papi, A., et al., Infections and airway inflammation in chronic obstructive
pulmonary disease severe exacerbations. Am J Respir Crit Care Med, 2006.
173(10): p. 1114-21.
83. Rohde, G., et al., Respiratory viruses in exacerbations of chronic obstructive
pulmonary disease requiring hospitalisation: a case-control study. Thorax,
2003. 58(1): p. 37-42.
84. Kulczycki, L.L., T.M. Murphy, and J.A. Bellanti, Pseudomonas colonization in
cystic fibrosis. A study of 160 patients. JAMA, 1978. 240(1): p. 30-4.
85. Green, B.J., et al., Potentially pathogenic airway bacteria and neutrophilic
inflammation in treatment resistant severe asthma. PLoS One, 2014. 9(6): p.
e100645.
86. Johnston, N.W., et al., The September epidemic of asthma exacerbations in
children: a search for etiology. Journal of Allergy and Clinical Immunology,
2005. 115(1): p. 132-138.
87. Michaels, R.H. and R.L. Myerowitz, Viral enhancement of nasal colonization
with Haemophilus influenzae type b in the infant rat. Pediatric research, 1983.
17(6): p. 472-473.
88. Moore, H.C., et al., The interaction between respiratory viruses and pathogenic
bacteria in the upper respiratory tract of asymptomatic Aboriginal and non-
Aboriginal children. Pediatr Infect Dis J, 2010. 29(6): p. 540-5.
89. McCullers, J.A. and J.E. Rehg, Lethal synergism between influenza virus and
Streptococcus pneumoniae: characterization of a mouse model and the role of
platelet-activating factor receptor. The Journal of infectious diseases, 2002.
186(3): p. 341-350.
90. Lee, L.N., et al., A mouse model of lethal synergism between influenza virus
and Haemophilus influenzae. The American journal of pathology, 2010.
176(2): p. 800-811.
91. Iverson, A.R., et al., Influenza virus primes mice for pneumonia from
Staphylococcus aureus. Journal of Infectious Diseases, 2011. 203(6): p. 880-
888.
92. Wiertsema, S.P., et al., High detection rates of nucleic acids of a wide range of
respiratory viruses in the nasopharynx and the middle ear of children with a
history of recurrent acute otitis media. Journal of medical virology, 2011.
83(11): p. 2008-2017.
93. Kukavica-Ibrulj, I., et al., Human metapneumovirus infection predisposes to
severe pneumococcal pneumonia in mice. Journal of Virology, 2008.
94. McGillivary, G., et al., Respiratory syncytial virus-induced dysregulation of
expression of a mucosal β-defensin augments colonization of the upper airway
by non-typeable Haemophilus influenzae. Cellular microbiology, 2009. 11(9):
p. 1399-1408.
95. Stark, J.M., et al., Decreased bacterial clearance from the lungs of mice
following primary respiratory syncytial virus infection. Journal of medical
virology, 2006. 78(6): p. 829-838.
96. McCullers, J.A., Insights into the interaction between influenza virus and
pneumococcus. Clin Microbiol Rev, 2006. 19(3): p. 571-82.
97. Adler, K.B., D.D. Hendley, and G.S. Davis, Bacteria associated with
obstructive pulmonary disease elaborate extracellular products that stimulate
mucin secretion by explants of guinea pig airways. Am J Pathol, 1986. 125(3):
p. 501-14.
98. Wilson, R., D. Roberts, and P. Cole, Effect of bacterial products on human ciliary function in
vitro. Thorax, 1985. 40(2): p. 125-131.
99. Glendinning, L., G. McLachlan, and L. Vervelde, Age-related differences in
the respiratory microbiota of chickens. PLoS One, 2017. 12(11): p. e0188455.
100. Shabbir, M.Z., et al., Microbial communities present in the lower respiratory
tract of clinically healthy birds in Pakistan. Poult Sci, 2015. 94(4): p. 612-20.
101. Roussan, D.A., et al., Simultaneous detection of astrovirus, rotavirus, reovirus
and adenovirus type I in broiler chicken flocks. Pol J Vet Sci, 2012. 15(2): p.
337-44.
102. USDA Animal and Plant Health Inspection Service. Respiratory Disease on Breeder-Chicken Farms in the
United States. Technical Brief 2012; Available from:
https://www.aphis.usda.gov/aphis/home.
103. Ramos, S., M. MacLachlan, and A. Melton, Impacts of the 2014-2015 Highly Pathogenic
Avian Influenza Outbreak on the U.S. Poultry Sector. USDA, Economic
Research Service, 2015.
104. USDA National Agricultural Statistics Service. Poultry - Production and Value 2018 Summary. 2019;
Available from:
https://www.nass.usda.gov/Publications/Todays_Reports/reports/plva0519.pdf.
105. Chand, N., et al., Performance traits and immune response of broiler chicks
treated with zinc and ascorbic acid supplementation during cyclic heat stress.
Int J Biometeorol., 2014. 58(10): p. 2153-2157.
106. David, B., et al., Air Quality in Alternative Housing Systems May Have an
Impact on Laying Hen Welfare. Part I-Dust. Animals, 2015. 5(3): p. 495-511.
107. David, B., et al., Air Quality in Alternative Housing Systems may have an
Impact on Laying Hen Welfare. Part II-Ammonia. Animals, 2015. 5(3): p. 886-
896.
108. Ganapathy, K., R.C. Jones, and J.M. Bradbury, Pathogenicity of in vivo-
passaged Mycoplasma imitans in turkey poults in single infection and in dual
infection with rhinotracheitis virus. Avian Pathology, 1998. 27(1): p. 80-89.
109. Saif, Y.M., P.D. Moorhead, and E.H. Bohl, Mycoplasma meleagridis and
Escherichia coli infections in germfree and specific-pathogen-free turkey
poults: production of complicated airsacculitis. Am J Vet Res, 1970. 31(9): p.
1637-43.
110. Kato, K., Infectious coryza of chickens. V. Influence of Mycoplasma
gallisepticum infection on chicken infected with Haemophilus gallinarum. Natl
Inst Anim Health Q (Tokyo), 1965. 5(4): p. 183-9.
111. Karimi-Madab, M., et al., Risk factors for detection of bronchial casts, most
frequently seen in endemic H9N2 avian influenza infection, in poultry flocks
in Iran. Prev Vet Med, 2010. 95(3-4): p. 275-80.
112. Travers, A.F., Concomitant Ornithobacterium rhinotracheale and Newcastle
disease infection in broilers in South Africa. Avian Dis, 1996. 40(2): p. 488-90.
113. Okoye, J.O., C.N. Okeke, and F.K. Ezeobele, Effect of infectious bursal
disease virus infection on the severity of Aspergillus flavus aspergillosis of
chickens. Avian Pathol, 1991. 20(1): p. 167-71.
114. Omuro, M., et al., Interaction of Mycoplasma gallisepticum, mild strains of
Newcastle disease virus and infectious bronchitis virus in chickens. Natl Inst
Anim Health Q (Tokyo), 1971. 11(2): p. 83-93.
115. Kishida, N., et al., Co-infection of Staphylococcus aureus or Haemophilus
paragallinarum exacerbates H9N2 influenza A virus infection in chickens.
Arch Virol, 2004. 149(11): p. 2095-104.
116. Kleven, S.H., C.S. Eidson, and O.J. Fletcher, Airsacculitis induced in broilers
with a combination of Mycoplasma gallinarum and respiratory viruses. Avian
Dis, 1978. 22(4): p. 707-16.
117. Hopkins, S.R. and H.W. Yoder, Jr., Increased incidence of airsacculitis in
broilers infected with mycoplasma synoviae and chicken-passaged infectious
bronchitis vaccine virus. Avian Dis, 1984. 28(2): p. 386-96.
118. Springer, W.T., C. Luskus, and S.S. Pourciau, Infectious bronchitis and mixed
infections of Mycoplasma synoviae and Escherichia coli in gnotobiotic
chickens. I. Synergistic role in the airsacculitis syndrome. Infect Immun, 1974.
10(3): p. 578-89.
119. Gross, W.B., Factors affecting the development of respiratory disease complex
in chickens. Avian Dis, 1990. 34(3): p. 607-10.
120. Weiss, R.A., Robert Koch: the grandfather of cloning? Cell, 2005. 123(4): p.
539-42.
121. Laupland, K.B. and L. Valiquette, The changing culture of the microbiology
laboratory. Can J Infect Dis Med Microbiol, 2013. 24(3): p. 125-8.
122. Stewart, E.J., Growing unculturable bacteria. J Bacteriol, 2012. 194(16): p.
4151-60.
123. Gall, J.G. and M.L. Pardue, Formation and detection of RNA-DNA hybrid
molecules in cytological preparations. Genetics, 1969. 63: p. 378-383.
124. Fischer, S.G. and L.S. Lerman, DNA fragments differing by single base-pair
substitutions are separated in denaturing gradient gels: correspondence with
melting theory. Proc Natl Acad Sci U S A, 1983. 80(6): p. 1579-83.
125. Borneman, J. and E.W. Triplett, Molecular Microbial Diversity in Soils from Eastern
Amazonia: Evidence for Unusual Microorganisms and Microbial Population
Shifts Associated with Deforestation. Appl Environ Microbiol, 1997. 63(7): p.
2647-2653.
126. Liu, W.T., et al., Characterization of Microbial Diversity by
Determining Terminal Restriction Fragment Length Polymorphisms of Genes
Encoding 16S rRNA. Appl Environ Microbiol, 1997. 63(11): p. 4516-4522.
127. Sanger, F., S. Nicklen, and A.R. Coulson, DNA sequencing with chain-terminating
inhibitors. Proc Natl Acad Sci U S A, 1977. 74(12): p. 5463-5467.
128. Sanger, F., et al., Nucleotide sequence of
bacteriophage phi X174 DNA. Nature, 1977. 265: p. 687-695.
129. Margulies, M., et al., Genome sequencing in microfabricated high-density
picolitre reactors. Nature, 2005. 437(7057): p. 376-80.
130. Quail, M.A., et al., A large genome center's improvements to the Illumina
sequencing system. Nat Methods, 2008. 5(12): p. 1005-10.
131. Shendure, J., et al., Accurate Multiplex Polony Sequencing of an Evolved
Bacterial Genome. Science, 2005. 309(5741): p. 1728.
132. Gilles, A., et al., Accuracy and quality assessment of 454 GS-FLX Titanium
pyrosequencing. BMC Genomics, 2011. 12: p. 245.
133. Ross, M.G., et al., Characterizing and measuring bias in sequence data.
Genome Biol, 2013. 14(5): p. R51.
134. Glenn, T.C., Field guide to next-generation DNA sequencers. Mol Ecol
Resour, 2011. 11(5): p. 759-69.
135. Laver, T., et al., Assessing the performance of the Oxford Nanopore
Technologies MinION. Biomol Detect Quantif, 2015. 3: p. 1-8.
136. Koren, S., et al., Hybrid error correction and de novo assembly of single-
molecule sequencing reads. Nat Biotechnol, 2012. 30(7): p. 693-700.
137. Wetterstrand, K.A., DNA Sequencing Costs: Data from the NHGRI Genome
Sequencing Program. 2019.
138. Mardis, E.R., A decade’s perspective on DNA sequencing technology. Nature,
2011. 470(7333): p. 198-203.
139. Moore, G.E., Cramming more components onto integrated circuits. Electronics,
1965. 38(8): p. 1-4.
140. Lane, D.J., et al., Rapid
determination of 16S ribosomal RNA sequences for phylogenetic analyses.
Proc Natl Acad Sci USA, 1985. 82: p. 6955-6959.
141. Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent,
community-supported software for describing and comparing microbial
communities. Appl Environ Microbiol, 2009. 75: p. 7537-7541.
142. Caporaso, J.G., et al., QIIME allows analysis of high-throughput community
sequencing data. Nature Methods, 2010. 7: p. 335-336.
143. Meyer, F., et al., The metagenomics RAST server - a public resource for the
automatic phylogenetic and functional analysis of metagenomes. BMC
Bioinformatics, 2008. 9: p. 386.
144. DeSantis, T.Z., et al., Greengenes, a chimera-checked 16S rRNA gene
database and workbench compatible with ARB. Appl Environ Microbiol,
2006. 72(7): p. 5069-72.
145. Quast, C., et al., The SILVA ribosomal RNA gene database project: improved
data processing and web-based tools. Nucleic Acids Res, 2013. 41: p. 590-596.
146. Kõljalg, U., et al., Towards a unified paradigm for sequence-based
identification of fungi. Mol Ecol, 2013. 22: p. 5271-5277.
147. Boers, S.A., R. Jansen, and J.P. Hays, Understanding and overcoming the
pitfalls and biases of next-generation sequencing (NGS) methods for use in the
routine clinical microbiological diagnostic laboratory. Eur J Clin Microbiol
Infect Dis, 2019. 38(6): p. 1059-1070.
148. Wood, D.E. and S.L. Salzberg, Kraken: ultrafast metagenomic sequence classification
using exact alignments. Genome Biology, 2014. 15: p. R46.
149. Li, H. and R. Durbin, Fast and accurate long-read alignment with Burrows-Wheeler
Transform. Bioinformatics, 2009. EPub.
150. Langmead, B. and S.L. Salzberg, Fast gapped-read alignment with Bowtie 2.
Nat Methods, 2012. 9(4): p. 357-9.
151. Li, D., et al., MEGAHIT: an ultra-fast single-node solution for large and
complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics,
2015. 31(10): p. 1674-6.
152. Nurk, S., et al., metaSPAdes: a new versatile metagenomic assembler. Genome
Res, 2017. 27(5): p. 824-834.
153. Namiki, T., et al., MetaVelvet: an extension of Velvet assembler to de novo
metagenome assembly from short sequence reads. Nucleic Acids Res, 2012.
40(20): p. e155.
154. Altschul, S.F., et al., Basic local alignment search tool. J Mol Biol., 1990.
215(3): p. 403-410.
155. ICTV. The International Committee for the Taxonomy of Viruses Species List.
2018; Available from: https://www.ictvonline.org/files/master-species-lists.
156. Griffiths, D.J., Endogenous retroviruses in the human genome sequence.
Genome biology, 2001. 2(6): p. reviews1017.1.
157. Feschotte, C. and C. Gilbert, Endogenous viruses: insights into viral evolution
and impact on host biology. Nature Reviews Genetics, 2012. 13(4): p. 283-
296.
158. Holmes, E.C., The evolution of endogenous viral elements. Cell host &
microbe, 2011. 10(4): p. 368-377.
159. Patel, M.R., M. Emerman, and H.S. Malik, Paleovirology—ghosts and gifts of
viruses past. Current opinion in virology, 2011. 1(4): p. 304-309.
160. Bustamante Rivera, Y.Y., et al., Endogenous Retrovirus 3–History,
Physiology, and Pathology. Frontiers in microbiology, 2018. 8: p. 2691.
161. Jern, P. and J.M. Coffin, Effects of retroviruses on host genome function.
Annual review of genetics, 2008. 42: p. 709-732.
162. Rose, R., et al., Challenges in the analysis of viral metagenomes. Virus Evol,
2016. 2(2): p. vew022.
163. Moustafa, A., et al., The blood DNA virome in 8,000 humans. PLoS Pathog,
2017. 13(3): p. e1006292.
164. Angly, F.E., et al., The GAAS metagenomic tool and its estimations of viral
and microbial average genome size in four major biomes. PLoS computational
biology, 2009. 5(12).
165. Wommack, K.E., et al., VIROME: a standard operating procedure for analysis
of viral metagenome sequences. Stand Genomic Sci, 2012. 6(3): p. 427-39.
166. Zhao, G., et al., VirusSeeker, a computational pipeline for virus discovery and
virome composition analysis. Virology, 2017. 503: p. 21-30.
167. Zhao, G., et al., Identification of novel viruses using VirusHunter--an
automated data analysis pipeline. PLoS One, 2013. 8(10): p. e78470.
Chapter 2
BIOMESEQ: A TOOL FOR THE CHARACTERIZATION OF ANIMAL MICROBIOMES FROM METAGENOMIC DATA
2.1 Summary
The complete characterization of a microbiome is critical in elucidating the complex ecology of the microbial composition within healthy and diseased animals.
Many microbiome studies characterize only the bacterial component, for which there are several well-developed sequencing methods, bioinformatics tools and databases available. The lack of comprehensive bioinformatics workflows and databases has limited efforts to characterize the other components of a microbiome.
BiomeSeq is a tool for the analysis of the complete animal microbiome using metagenomic sequencing data. With its comprehensive workflow, customizable parameters and microbial databases, BiomeSeq can rapidly quantify the viral, fungal, bacteriophage and bacterial components of a sample and produce informative tables for analysis. In performance evaluations, BiomeSeq displayed a strong positive correlation with known abundance values and exhibited high sensitivity and precision. BiomeSeq was employed in detecting and quantifying the respiratory microbiome of a commercial poultry broiler flock throughout its grow-out cycle from hatching to processing and successfully processed 780 million reads. For each microbial species detected, BiomeSeq calculated the normalized abundance, percent relative abundance, and coverage, as well as the diversity of each sample. The processing speed of each step in the BiomeSeq workflow, precision and accuracy were calculated to examine BiomeSeq’s performance using in silico sequencing datasets. BiomeSeq demonstrated high precision (average of 99.52%) and sensitivity (average of 93.01%).
When compared to bacterial results generated by the commonly used 16S rRNA sequencing method, BiomeSeq detected the same most abundant bacteria as well as several additional species. BiomeSeq provides for the detection and quantification of the microbiome from next-generation metagenomic sequencing data. The tool is implemented in a user-friendly container that requires one command and generates a table of taxonomic information for each microbe detected. It also reports normalized abundance, percent relative abundance, genome coverage and sample diversity.
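The precision and sensitivity figures reported above follow the standard definitions, which can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the metrics are computed at the taxon level by comparing the set of organisms placed in a simulated dataset with the set reported by the tool, which this section does not specify. The function name `precision_sensitivity` and the example organism labels are hypothetical.

```python
def precision_sensitivity(expected, detected):
    """Compare detected taxa against the known composition of a simulated sample.

    precision   = TP / (TP + FP): share of reported taxa that are truly present.
    sensitivity = TP / (TP + FN): share of truly present taxa that were reported.
    """
    expected, detected = set(expected), set(detected)
    true_pos = len(expected & detected)   # present and reported
    false_pos = len(detected - expected)  # reported but not present
    false_neg = len(expected - detected)  # present but missed
    precision = true_pos / (true_pos + false_pos) if detected else 0.0
    sensitivity = true_pos / (true_pos + false_neg) if expected else 0.0
    return precision, sensitivity


# Hypothetical example: four organisms were simulated, three were recovered,
# and one organism that was not simulated was reported.
known = ["virus_A", "virus_B", "bacterium_C", "fungus_D"]
reported = ["virus_A", "virus_B", "bacterium_C", "phage_E"]
precision, sensitivity = precision_sensitivity(known, reported)
```

The same definitions apply if the counts are taken over reads rather than taxa; only the sets being compared change.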
2.2 Introduction
Specific and unique animal microbiomes contribute to the biological function of various locations on the body including the intestinal tract, skin, vaginal tract, oral cavity, and respiratory tract [1]. Disturbance of these environments through colonization by a new bacterium, eukaryotic virus, or fungus can lead to competition, invasion and replacement. Under appropriate conditions this may result in disease. Advancements in next-generation sequencing technology enable investigations into individual components of the microbiome, thereby providing insight into the dynamic interactions taking place [2]. Identification of microbial communities within these environments can aid in elucidating the role they play in both healthy and diseased animals.
Recent studies attempting to characterize the microbiomes of animals have focused primarily on their bacterial composition, as there are well-established methodological approaches to sequence and analyze this component [3-8]. The 16S rRNA gene is commonly used to identify and compare the bacterial genera present in a given sample. Accessible bacterial databases, such as Greengenes [9] and SILVA [10], in addition to well-developed bioinformatics workflows, are available to facilitate these analyses [11-13]. The Internal Transcribed Spacer (ITS) region is a widely used fungal genetic marker. As with 16S rRNA, accessible fungal databases [14] and bioinformatics workflows for fungal analysis exist [12].
Characterizing the viral component of the microbiome presents unique challenges. Unlike bacteria and fungi, which carry conserved ribosomal genes, viruses are heterogeneous in their genetic content and therefore do not have a conserved genomic region that can be sequenced and employed for taxonomic classification using the same approaches [15]. Metagenomic shotgun sequencing does not use PCR and is therefore not restricted by primers that target specific gene sequences. As a result, this method is not limited to detecting one specific kingdom and is sensitive enough to resolve organisms at the species level. Using this approach, the major components of a microbiome can be identified. Many studies attempting to characterize microorganisms using metagenomic sequencing data rely on adapting a sequence-similarity independent assembly approach and computationally exhaustive BLAST-like database searches. This can be attributed to the limited number of comprehensive microbial databases that exist. Thus, this approach provides taxonomic classification of samples, but lacks the ability to accurately quantify abundance and diversity. Furthermore, many of the available computational tools require the user to possess extensive command-line knowledge and computational resources to successfully install and run the programs and their dependencies.
Herein, we present BiomeSeq, a tool for the analysis of complete animal microbiomes from metagenomic sequencing data. BiomeSeq addresses the constraints of current computational tools by providing a comprehensive workflow and corresponding microbial databases that accurately identify and quantify each major component of the microbiome. The workflow includes quality filtering and host decontamination, sequence-similarity dependent alignment to microbial reference genome databases and quantification of microbial abundance and sample diversity.
BiomeSeq also analyzes the eukaryotic viral, fungal, bacteriophage and bacterial components using the same sequencing data to produce a complete analysis of the microbiome without requiring additional sequencing of the 16S rRNA or ITS genes.
Utilizing shotgun metagenomic data to analyze the bacterial and fungal components can increase taxonomic resolution, permit the analysis of complete genomes instead of a conserved genomic region, and allow for a comparison of bacteria and fungi to the viral and bacteriophage components [16]. BiomeSeq was evaluated using simulated datasets designed to mimic complex microbial communities and performed with high accuracy and precision. BiomeSeq was also employed to characterize the respiratory microbiome of a healthy broiler flock. The results obtained using BiomeSeq were compared to those of the 16S rRNA approach, and BiomeSeq identified 533 unique bacterial genera compared to 24 detected by 16S rRNA. In addition to characterizing all microbial components from the same sample, BiomeSeq is also able to discriminate at a higher taxonomic resolution. BiomeSeq is available as an open-source and user-friendly container. This versatility makes BiomeSeq accessible to users with varied degrees of command-line knowledge and computational resources. While BiomeSeq has been developed and evaluated on avian species, it can be used to characterize microbiomes of a variety of species, including humans.
2.3 Results
2.3.1 Design and Development of BiomeSeq
BiomeSeq was designed to identify microbial communities within next-generation sequencing files in single- and paired-end format. Figure 2.1 shows an overview of the BiomeSeq workflow. In summary, the workflow begins with a quality and decontamination step in which all adapter sequences, short reads and low-quality reads are first removed from the sequencing files provided by the user. The trimmed reads are then aligned to the host reference genome specified by the user. Reads that map to the host are removed from the file to increase analytical efficiency and mapping accuracy [17].
The remaining reads are then aligned to BiomeSeq’s microbial databases, which comprise eukaryotic viral, fungal, bacterial and bacteriophage genome databases containing sequences obtained from the NCBI RefSeq database. These databases are publicly available [18]. One feature that makes BiomeSeq versatile is that it accepts custom databases provided by the user. Several additional customizable parameters can be specified by the user, including the host reference genome, mapping quality threshold, and output files (e.g., alignment files). Table 2.1 includes all software and parameters used by BiomeSeq. Following the alignment of the decontaminated reads to the microbial databases, BiomeSeq calculates normalized abundance, percent relative abundance and genome coverage for each eukaryotic virus, bacterium, bacteriophage and fungus detected. It also calculates the diversity of the entire sample.
BiomeSeq generates a table consisting of NCBI RefSeq accession number, microbe name, taxonomy, number of mapped reads of the detected microbes and all calculations. For each sample processed by BiomeSeq, four tables are generated, one for each of the four components. Table 2.2 shows an example of an output table for the viral component. Similar tables are generated for bacteria, bacteriophage and fungal data. The BiomeSeq workflow and associated databases were implemented as a software package and a container (Figure S1). BiomeSeq is currently available as an open-source and user-friendly resource on Docker Hub. The self-contained environment simplifies installation and execution by eliminating the need for downloading and installing the BiomeSeq program, databases and all dependent software. Furthermore, the BiomeSeq container allows the same customizable parameters and accepts custom databases provided by the user.
2.3.2 Validation of BiomeSeq
BiomeSeq’s performance was evaluated using simulated datasets consisting of known microorganisms and their corresponding abundances. Four simulated datasets were created to closely mimic the complex community structure of an avian respiratory microbiome. Each dataset was generated using genome sequences from 20 microorganisms that have been experimentally detected in the respiratory tract of poultry broilers (Table S1). A sequence from a poultry broiler was also included to represent the host environment. The simulated datasets contained an average of
24,523,032 total raw reads (Table S2), which were processed using BiomeSeq. The reads were first trimmed for quality and decontaminated of host DNA. Table S2 shows the number of reads that were extracted during each of these steps in the processing.
An average of 24,522,253 reads remained after quality trimming of all adapter sequences, sequences less than 100 base pairs in length and sequences with a quality Phred score under 30 (Table S2). An average of 5,158,715 reads remained following decontamination of chicken genomic sequences. The remaining reads were then aligned to four microbial databases including bacteriophage, bacteria, fungi and avian-derived virus genomes with a mapping quality threshold of 20. One major feature of BiomeSeq that makes it so versatile is its ability to accept custom databases provided by the user. To evaluate this feature, an avian-specific viral database was constructed to replace BiomeSeq’s default viral database. An average of 90.8% of the reads were aligned to microbial genome sequences (Table S2). From the number of mapped reads, BiomeSeq then calculates the normalized abundance, relative abundance, genome coverage and diversity of the sample and generates a table for each of the four microbiome components. From the information provided by the BiomeSeq tables, several metrics were used to evaluate BiomeSeq’s overall performance including correlation with known abundance, sensitivity, precision, rate of speed and root mean square error.
A total of twenty microbial genomes were included in the four simulated datasets and BiomeSeq was able to successfully identify each. From the number of known mapped reads and the number of reads BiomeSeq mapped, percent relative abundance was calculated. Figure 2.2 shows both known and predicted percent relative abundance of one of the simulated datasets. Of the known abundances, the most abundant fungus is Aspergillus oryzae (73.34%), the most abundant bacterium is Escherichia coli (10.87%), the most abundant eukaryotic virus is Gallid herpesvirus 2 (1.06%) and the most abundant bacteriophage is Enterobacteriophage T4 (0.33%). The most abundant fungus detected by BiomeSeq is also Aspergillus oryzae (78.03%), the most abundant bacterium is also Escherichia coli (7.36%), the most abundant eukaryotic virus is Gallid herpesvirus 1 (0.27%) and the most abundant bacteriophage is also Enterobacteriophage T4 (0.09%). Pearson correlation coefficients between
predicted and known abundances were calculated at the species level. Abundances of species determined by BiomeSeq were highly correlated with known abundances demonstrating an average correlation coefficient of r = 0.997 for all four datasets.
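As a minimal sketch of how a species-level correlation of this kind can be computed, the Python function below implements Pearson's correlation coefficient directly from its definition. The three value pairs are the known and predicted percent relative abundances quoted above for the species that appear in both lists; the function name `pearson_r` is illustrative and is not part of BiomeSeq. Three points cannot reproduce the reported r = 0.997, which was averaged over all species in the four datasets.

```python
from math import sqrt


def pearson_r(known, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(known) != len(predicted) or len(known) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_k = sum(known) / len(known)
    mean_p = sum(predicted) / len(predicted)
    # Sum of products of deviations from the mean (unnormalized covariance).
    covariance = sum((k - mean_k) * (p - mean_p) for k, p in zip(known, predicted))
    spread_k = sqrt(sum((k - mean_k) ** 2 for k in known))
    spread_p = sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return covariance / (spread_k * spread_p)


# Percent relative abundances quoted in the text for one simulated dataset:
# Aspergillus oryzae, Escherichia coli, Enterobacteriophage T4.
known = [73.34, 10.87, 0.33]
predicted = [78.03, 7.36, 0.09]
print(f"r = {pearson_r(known, predicted):.3f}")
```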
The precision and sensitivity of BiomeSeq were evaluated using the same datasets. True positives, false positives, false negatives, sensitivity and precision were calculated for each microbial component (Table S3). Overall, 4,659,277 true positives, 22,397 false positives, and 350,271 false negatives were observed. Sensitivity describes the number of reads correctly aligned to the appropriate genome divided by the total number of sequences in the sample. Precision is the number of reads that were aligned to the appropriate genome divided by the total number of reads mapped to any genome. Using default parameters, BiomeSeq demonstrated exceptional accuracy, with 99.52% precision and 93.01% sensitivity (Table S3).
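The reported percentages follow directly from the overall read counts. The short Python check below recomputes them; it is an illustration of the arithmetic only, and the function names are not part of BiomeSeq.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of reads that should have mapped and did: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos, false_pos):
    """Fraction of mapped reads that mapped correctly: TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


# Overall counts reported in the text for the simulated datasets.
TRUE_POS, FALSE_POS, FALSE_NEG = 4_659_277, 22_397, 350_271

print(f"sensitivity = {sensitivity(TRUE_POS, FALSE_NEG):.2%}")
print(f"precision   = {precision(TRUE_POS, FALSE_POS):.2%}")
```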
The rate of speed during each step of BiomeSeq was calculated for the four simulated datasets (Figure 2.3; Table S4). Rate of speed of BiomeSeq is contingent upon the number of computational cores, amount of computational memory and the size of the dataset and host reference genome. The four simulated datasets were processed on a server with 98 GB RAM and 4 CPU cores. The quality step, in which adapter sequences, reads less than 100 base pairs in length and low quality reads are trimmed from the input sequencing file, was measured at an average speed of 79,977
(± 9,204) reads per second (Figure 2.3; Table S4). The decontamination step had an average speed of 6,327 (± 473) reads per second (Figure 2.3; Table S4). During this step, the host reference genome is indexed; the larger the host genome is the longer
this step will take. The Gallus gallus genome (Annotation Release 104), used in this evaluation, is about 1.2 billion base pairs in length [19]. After the genome is indexed, the trimmed reads are aligned to the host reference genome and reads that map are removed from the file. Alignment of reads to microbial databases was measured at an average speed of 2,421 (± 174) reads per second (Figure 2.3; Table S4). During this step, the reads remaining after decontamination, an average of 5,158,715 for the four simulated datasets, are aligned to a total of 7,227 microbial genomes with various sizes. Finally, the quantification step, in which the normalized relative abundance, percent relative abundance, genome coverage and diversity is calculated from the reads that aligned to the microbial sequences, had an average speed of 183,264 (±
31,244) reads per second (Figure 2.3; Table S4).
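To give a sense of scale, the per-step rates can be converted into approximate wall-clock times. The sketch below assumes that each rate applies to the number of reads entering that step (raw reads for trimming, trimmed reads for decontamination, decontaminated reads for alignment); this pairing is an assumption made for illustration and is not stated in Table S4.

```python
def seconds_for(reads, reads_per_second):
    """Time needed to process `reads` at a constant rate."""
    return reads / reads_per_second


# (step, average reads entering the step, average reads per second)
# Read counts and rates are the averages quoted in the text; pairing them
# this way is an assumption.
steps = [
    ("quality trimming", 24_523_032, 79_977),
    ("host decontamination", 24_522_253, 6_327),
    ("microbial database alignment", 5_158_715, 2_421),
]

for name, reads, rate in steps:
    minutes = seconds_for(reads, rate) / 60
    print(f"{name}: about {minutes:.1f} min")
```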
Root mean square error (RMSE) measures the amount of error between the known abundances of each species and the abundances determined by BiomeSeq
(Figure 2.4). A small RMSE value indicates that the abundance determined by
BiomeSeq is close to the known abundances in the simulated dataset. RMSE was calculated for each eukaryotic virus, bacteria, bacteriophage and fungi species (Figure
2.4). An RMSE of < 4.70 was exhibited for all species and 17 species exhibited an
RMSE value of < 0.24. These results further indicate that BiomeSeq can accurately determine microbial abundance at the species taxonomic level.
2.3.3 A Longitudinal Study of the Microbial Ecology of a Healthy Broiler Flock
BiomeSeq was employed to detect and quantify eukaryotic viruses, bacteria, bacteriophage, and fungi in a healthy commercial broiler flock during the grow-out cycle from hatching to processing. Samples were collected from the respiratory tract of a healthy broiler flock weekly as the flock aged (Day 1 – Day 49). DNA and RNA were isolated and sequenced using an Illumina NGS platform. A total of 780 million reads were generated and successfully processed using BiomeSeq. These reads were first trimmed for quality, decontaminated of host DNA and aligned to each microbial genome database. The default viral genome database provided by BiomeSeq was replaced by a custom database containing avian-derived viral sequences (Table S5).
For each microorganism identified, BiomeSeq calculated normalized abundance, percent relative abundance, genome coverage and sample diversity. The taxonomic and quantitative data generated by BiomeSeq was visually represented using a variety of available tools.
In total, BiomeSeq aligned 5,163 reads to avian DNA viruses and 71,936 reads to avian RNA viral sequences. A total of 9 viral species, representing 8 genera and 8 families, were identified from the avian respiratory tract during the grow-out period.
Figure 2.5 shows a heatmap of percent normalized viral abundance at each time point during the grow-out cycle. A total of 469,937 reads were aligned to the bacterial genome database. This included 533 unique bacterial species, of which 45 had a calculated relative abundance greater than 0.5%. The 45 most abundant species detected span 4 phyla, 7 classes, 13 orders, 26 families and 45 genera. This data is represented in a phylogenetic tree generated using the Phytools package in R (Figure 2.6) [20]. A total of 504,682 reads aligned to the bacteriophage genome database. A total of 30 unique bacteriophage species were identified, spanning 1 classified and 1 unclassified order, 4 classified and 1 unclassified families, and 5 classified and 4 unclassified genera. This data is represented in a Venn diagram of the common and unique bacteriophage species detected at Week 0, Week 3 and Week 7, generated using the VennDiagram package in R (Figure 2.7) [21]. A total of 1,964 reads aligned to the fungal genome database. Sixty-one unique fungal species were identified, spanning 2 phyla, 9 classes, 20 orders, 37 families and 50 genera. This data is represented in a fungal network generated with Cytoscape in which the nodes are grouped according to class and the diameter of the inner nodes corresponds to the frequency with which that particular microbial species was detected during the grow-out cycle of the flock (Figure 2.8) [22]. BiomeSeq detects the major components of a microbiome and therefore provides the information necessary for a complete view of microbial community structures. To provide an example of how the taxonomic and quantitative information produced by BiomeSeq can be visually represented, a microbial network was generated using Cytoscape from one sample
(Figure 2.9) [22]. This network contains all of the fungi, eukaryotic viruses, bacteria and bacteriophage detected in a single sample by BiomeSeq, with each node diameter corresponding to percent relative abundance of the particular species detected.
2.3.4 A Comparison of BiomeSeq bacterial results to 16S rRNA Results
As previously discussed, 16S rRNA sequencing methods are commonly used to analyze the bacterial component of microbiome samples. To compare BiomeSeq to this method, the next generation sequencing data generated from a healthy broiler flock at week 7 were compared to 16S rRNA results. Using the same sample, metagenomic DNA-Seq data and 16S rRNA data were generated. The DNA-Seq data were analyzed using BiomeSeq and the 16S rRNA data were analyzed using mothur and SILVA. Interestingly, the same most abundant bacteria were identified using both methods (Figure 2.10; Table S6). BiomeSeq determined Gallibacterium anatis was the most abundant (29%), followed by Staphylococcus haemolyticus (28%) and Corynebacterium falsenii (18%; Figure 2.10B). The 16S rRNA approach determined Gallibacterium was the most abundant (39%), followed by Corynebacterium (23%), Lactobacillales (16%) and Staphylococcus (10%; Figure 2.10A). BiomeSeq has greater taxonomic sensitivity and is able to identify bacteria at the species level, whereas 16S rRNA is restricted to detection at the genus level.
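Because the two methods report at different taxonomic ranks, a like-for-like comparison requires collapsing the species-level results to a coarser rank. The Python sketch below sums species abundances by genus and lists the genera shared with the 16S rRNA results. It assumes that the genus is the first word of each binomial name, and it uses only the percentages quoted above; it illustrates the comparison and is not a description of how Figure 2.10 was produced.

```python
def collapse_to_genus(species_abundance):
    """Sum species-level percentages by genus (first word of the binomial)."""
    totals = {}
    for species, percent in species_abundance.items():
        genus = species.split()[0]
        totals[genus] = totals.get(genus, 0.0) + percent
    return totals


# Percent relative abundances quoted in the text (week 7 sample).
biomeseq_species = {
    "Gallibacterium anatis": 29.0,
    "Staphylococcus haemolyticus": 28.0,
    "Corynebacterium falsenii": 18.0,
}
# Lactobacillales is an order, so it has no genus-level counterpart here.
sixteen_s_taxa = {
    "Gallibacterium": 39.0,
    "Corynebacterium": 23.0,
    "Lactobacillales": 16.0,
    "Staphylococcus": 10.0,
}

biomeseq_genera = collapse_to_genus(biomeseq_species)
shared = sorted(set(biomeseq_genera) & set(sixteen_s_taxa))
for genus in shared:
    print(f"{genus}: BiomeSeq {biomeseq_genera[genus]}%, 16S {sixteen_s_taxa[genus]}%")
```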
2.4 Discussion
The complete characterization of a microbiome is critical in elucidating the complex ecology of the microbial composition within healthy and diseased animals.
The advancement of next generation sequencing methodologies has given rise to an increase in studies attempting to examine the microbial communities existing in a
variety of animals. Readily accessible and cost-effective sequencing methodologies as well as a number of user-friendly bioinformatics analysis software and databases for
16S rRNA sequencing data provide the standard culture-independent approach for bacterial analysis [9-13]. Although 16S rRNA has provided insight into one component of the microbiome, it is limited to detecting one specific kingdom, lacks the sensitivity to discriminate between species and cannot be used for novel microbial discovery. Metagenomic shotgun sequencing does not use PCR and is therefore not restricted by primers that target specific gene sequences. As a result, it is not limited to detecting one specific kingdom and has enough sensitivity to detect at the species taxonomic level. BiomeSeq is a novel computational tool designed to characterize the complete microbiome from metagenomic sequencing data. With its comprehensive workflow and microbial reference databases, this tool can rapidly identify the eukaryotic viral, fungal, bacteriophage and bacterial components of a sample and provide an accurate quantification of abundance, genome coverage and diversity.
BiomeSeq consists of three primary steps: i) quality trimming and decontamination of host DNA; ii) alignment to four microbial reference databases; iii) quantification of abundance, genome coverage and diversity (Figure 2.1). BiomeSeq utilizes a sequence-similarity dependent approach with comprehensive microbial databases to provide taxonomic classification and quantitate abundance and diversity.
This tool provides an accurate representation of abundance by considering the variability in microbial genome length and host genome length in these calculations.
Comprehensive eukaryotic viral, bacterial, fungal and bacteriophage databases were constructed using complete and representative genomes obtained from the NCBI
Reference Sequence Database and contain 5,693, 3,623, 1,281 and 2,212 genomes, respectively. These databases are publicly available [18]. A sequence-similarity dependent approach allows for accurate quantification; however, it is often limited by the completeness of the database used. To address this, BiomeSeq databases are updated biannually to include recently discovered microorganisms. Furthermore,
BiomeSeq accepts custom microbial databases provided by users, so studies are not limited to the default databases. BiomeSeq was designed for the identification of known microorganisms; however, the sequencing data accepted by this tool can also be used in de novo microbial discovery. Many computational tools require extensive command-line knowledge and computational resources to process sequencing samples. To increase user accessibility, the BiomeSeq software package is implemented as an open-source and user-friendly container
(Figure S1). Containers, such as this, allow the user to download and install
BiomeSeq, both workflow and all databases, and dependent software on any operating system using one simple command. Furthermore, the user can process their sample with any custom parameters, using one line of code.
BiomeSeq’s performance was evaluated using several metrics including correlation with known abundance, sensitivity, precision, rate of speed and root mean square error. Four simulated datasets containing known abundances of 20
microorganisms were employed for this evaluation (Table S1). BiomeSeq was successful in identifying each of the 20 microorganisms, and the abundance calculations at the species taxonomic level determined by BiomeSeq were highly correlated with the known abundances of these species (r = 0.997). Utilizing the default quality threshold of BiomeSeq, high precision and sensitivity were demonstrated with an average of 99.52% and 93.01%, respectively (Table S3). Rate of speed was calculated for each dataset at each step in the BiomeSeq workflow including quality trimming, decontamination of host DNA, alignment to four microbial databases, and quantification. Overall, an average total rate of speed of
271,584 (± 34,912) reads per second was observed (Figure 2.3; Table S4). However, this metric is highly dependent on computational resources, as well as the size of the host reference genome and sequencing file input into the program. An RMSE of less than 0.24 was demonstrated for 17 of the species in the simulated datasets, further demonstrating that the abundance determined by BiomeSeq at the species taxonomic level corresponds to the known values (Figure 2.4). Overall, BiomeSeq performed with exceptional speed, accuracy and sensitivity.
BiomeSeq was employed to detect and quantify the respiratory microbiome of a healthy commercial poultry broiler flock at weekly intervals from hatching to processing. For each component of the respiratory microbiome of this flock, abundance was calculated and population shifts were examined at each time point. A total of 11 eukaryotic viral species, 45 bacterial species, 31 bacteriophage species, and
61 fungal species were identified in this flock. The taxonomic and quantitative tables generated by BiomeSeq can be input into several programs to create visual representations of the data. Heatmaps, phylogenetic trees, Venn diagrams, and microbial networks are examples of visualizations that can be easily generated to assist interpretation of the results (Figures 2.5-2.9).
The commercial broiler flock utilized in this study was vaccinated in ovo with a live Marek’s disease virus vaccine (SB-1) and a live recombinant herpesvirus of turkeys (HVT) vaccine expressing Newcastle disease virus genes. The presence of herpesviruses and coronaviruses in the respiratory tract is consistent with vaccination with these two live vaccines, coupled with the expected presence of these avian viruses in the environment. The presence of the Myoviridae family of bacteriophage correlated with Gallibacterium, an abundant bacterial genus (data not shown). Interactions between bacteriophage and bacteria are known to have a significant impact on host health (24). Basidiomycota was highly abundant in this flock; however, further studies are needed to determine the relevance of this fungal phylum in the respiratory tract of avian species. The bacterial diversity of the flock was complex at the time of processing, containing significant amounts of Pasteurellaceae, Corynebacteriaceae, Staphylococcaceae and Enterobacteriaceae. Using one sample from this study, bacterial results generated by BiomeSeq were compared to results generated by 16S rRNA sequencing methods. The same most abundant bacteria were observed using both methods (Figure 2.11; Table S6). However, BiomeSeq identified 533 unique bacteria, 45 with a relative abundance of greater than 0.5%, while 16S rRNA detected only 24 genera. Furthermore, BiomeSeq has greater taxonomic sensitivity and is able to identify bacteria at the species level, whereas 16S rRNA is restricted to detection at the genus level. Moreover, 16S rRNA sequencing methodology can only be employed for taxonomic classification of the bacterial component, leaving the identity of the remaining components of the microbiome unknown. BiomeSeq is able to characterize all major components of a microbiome with high taxonomic sensitivity and accurately quantify abundance. Moreover, unlike 16S rRNA sequencing data, metagenomic shotgun sequencing data processed by BiomeSeq can be further used in sequence-independent approaches for de novo microbial discovery.
BiomeSeq is a tool developed for the analysis of complete animal microbiomes using metagenomic sequencing data. With its comprehensive workflow, customizable parameters and microbial databases, BiomeSeq can rapidly identify the major components of a microbiome from a sample and determine normalized abundance, percent relative abundance, genome coverage and sample diversity. While many existing tools focus on characterizing a single type of microorganism, BiomeSeq provides a complete view of microbial ecology and diversity in a sample. The performance of this tool was evaluated using both simulated and clinical datasets and exceptionally accurate and precise abundance estimates were demonstrated. BiomeSeq is available as an open-source and user-friendly container, allowing users to easily download, install and use the program with a few simple commands. The versatility of BiomeSeq, such as customizable parameters and accepting custom databases, allows this tool to facilitate a variety of unique investigations.
2.5 Materials and Methods
BiomeSeq is currently available as an open-access and user-friendly tool on
Docker Hub. As the docker container is self-contained, it simplifies installation and execution by eliminating the need for downloading and installing dependent software and requires only one command. BiomeSeq is customizable and allows the user to adjust parameters similar to a command-line tool. Table 2.1 includes all software and parameters used in BiomeSeq.
BiomeSeq accepts both single- and paired-end reads in fastq format generated by DNA-Seq or RNA-Seq methods. Along with the fastq file, the user may customize a number of parameters including: the host genome that the sample was derived from, custom databases provided by the user, mapping quality threshold and output file types. Figure 2.1 shows an overview of the BiomeSeq workflow, which consists of three primary steps: i) quality trimming and decontamination of host DNA; ii) alignment to four microbial reference databases; iii) quantification of abundance, genome coverage and diversity. BiomeSeq generates a table consisting of NCBI
RefSeq accession number, microbe name, taxonomic information, number of mapped reads, normalized abundance, percent relative abundance, genome coverage for each eukaryotic virus, bacteria, bacteriophage and fungi detected, as well as diversity of the sample. Table 2.2 is an example of an output table generated for the viral component.
Similar tables are generated for bacteria, bacteriophage and fungal data. Visualizations of these results can be easily generated using several different packages in R.
2.5.1 Quality Trimming and Host Decontamination
The BiomeSeq workflow begins with a quality trimming step in which individual fastq sequence files input into the program are first analyzed for per-base sequence quality, per-sequence quality, sequence length distribution and duplicate sequences (Figure 2.1). Reads with a quality Phred score below 30, reads under 100 base pairs in length and adapter sequences are removed from the file. This step is conducted using Trim-Galore [23]. The next step in the workflow decontaminates the file of host DNA. In this step, the trimmed reads are aligned to the user-specified host reference genome using BWA, and only reads that do not align to the host genome are extracted and analyzed further (Figure 2.1) [24].
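The length and quality thresholds above can be illustrated with a simplified read filter. The Python sketch below is a stand-in for illustration only: it keeps or discards whole reads by length and mean Phred score, whereas Trim-Galore trims adapters and low-quality bases and is not reimplemented here. Phred+33 quality encoding is assumed, and all function names are hypothetical.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(symbol) - offset for symbol in quality_line]


def passes_filter(sequence, quality_line, min_length=100, min_mean_quality=30):
    """Keep a read only if it is long enough and its mean quality is high enough."""
    if len(sequence) < min_length:
        return False
    scores = phred_scores(quality_line)
    return sum(scores) / len(scores) >= min_mean_quality


def filter_fastq(lines, **thresholds):
    """Yield (header, sequence, quality) for four-line FASTQ records that pass."""
    records = [line.rstrip("\n") for line in lines]
    for start in range(0, len(records) - 3, 4):
        header, sequence, _plus, quality = records[start:start + 4]
        if passes_filter(sequence, quality, **thresholds):
            yield header, sequence, quality


# Three synthetic reads: one good, one too short, one low quality.
example = [
    "@read1", "A" * 100, "+", "I" * 100,   # 'I' encodes Q40
    "@read2", "A" * 50, "+", "I" * 50,
    "@read3", "A" * 100, "+", "#" * 100,   # '#' encodes Q2
]
kept = [header for header, _, _ in filter_fastq(example)]
print(kept)
```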
2.5.2 Microbial Database Alignment
The trimmed and decontaminated sequencing reads are aligned to a eukaryotic viral genome database, a bacterial database, a fungal database and a bacteriophage database using the Bowtie 2 alignment algorithm (Figure 2.1) [25]. The default mapping quality threshold is 20; however, this parameter may be customized by the user. The eukaryotic viral genome database currently includes 5,693 complete and representative viral sequences obtained from the National Center for Biotechnology Information (NCBI) Reference Sequence Database [26]. Bacterial, fungal and bacteriophage databases were constructed using a similar approach and contain 3,623, 1,281 and 2,212 genomes, respectively [26]. Each microbial database and its corresponding aligner index files are publicly available [18]. Each of the four microbial databases is continuously updated to include novel and recently discovered sequences. These databases are the default option for BiomeSeq. However, as an additional feature, BiomeSeq also accepts custom microbial databases provided by the user.
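The effect of the mapping quality threshold can be sketched by counting aligned reads per reference genome from SAM-formatted alignment records. The Python example below relies only on the standard SAM columns (FLAG in column 2, reference name in column 3, MAPQ in column 5) and on FLAG bit 0x4 marking unmapped reads; the function name and the reference names `genomeA` and `genomeB` are hypothetical, and this is not a description of BiomeSeq's internal code.

```python
def count_mapped_reads(sam_lines, min_mapq=20):
    """Count reads per reference, skipping unmapped and low-MAPQ alignments."""
    counts = {}
    for line in sam_lines:
        if line.startswith("@"):        # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, reference, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & 0x4:                  # bit 0x4: read is unmapped
            continue
        if mapq < min_mapq:             # below the mapping quality threshold
            continue
        counts[reference] = counts.get(reference, 0) + 1
    return counts


# Synthetic alignment records for illustration.
example_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tgenomeA\t10\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tgenomeA\t20\t5\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t16\tgenomeB\t7\t30\t4M\t*\t0\t0\tACGT\tIIII",
]
print(count_mapped_reads(example_sam))
```

Raising `min_mapq` trades sensitivity for precision: fewer ambiguous alignments are counted, at the cost of discarding some correct ones.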
2.5.3 Quantification and Output
A sequence similarity-dependent approach for detecting microorganisms contributes to the rapid detection of known viruses while also allowing for the quantification of biodiversity, which similarity-independent approaches lack [27, 28].
To calculate microbial abundance, BiomeSeq uses an adaptation of the equation presented by Moustafa and colleagues in 2017 to quantify viral abundance [29]:
\[
\text{Microbial Abundance} = \frac{2 \times \dfrac{\text{number of reads mapped to microbe sequence}}{\text{microbe sequence size}}}{\dfrac{\text{number of reads mapped to host genome}}{\text{host genome size}}} \times 10^{[\ldots]}
\]
Percent relative abundance is quantified using the following equation:
\[
\text{Percent Relative Abundance} = \frac{\text{microbial abundance}}{\text{total microbial abundance}} \times 100
\]
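The two abundance calculations can be sketched in Python as follows. This is an illustration under stated assumptions rather than BiomeSeq's code: the multiplier `copy_factor` and the power-of-ten `scale` are exposed as parameters because their values should be taken from the normalized abundance equation, and the read counts and microbial genome sizes are hypothetical. The host genome size of 1.2 billion base pairs is the approximate Gallus gallus genome length quoted earlier.

```python
def normalized_abundance(microbe_reads, microbe_size, host_reads, host_size,
                         copy_factor=2, scale=1.0):
    """Microbial read density relative to host read density.

    copy_factor and scale are assumptions; set them to match the equation.
    """
    if min(microbe_size, host_reads, host_size) <= 0:
        raise ValueError("sizes and host read count must be positive")
    microbe_density = microbe_reads / microbe_size
    host_density = host_reads / host_size
    return copy_factor * microbe_density / host_density * scale


def percent_relative_abundance(abundances):
    """Express each microbe's abundance as a percentage of the total."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("total microbial abundance must be positive")
    return {name: value / total * 100 for name, value in abundances.items()}


# Hypothetical read counts and microbial genome sizes, for illustration only.
host_reads, host_size = 1_000_000, 1_200_000_000
abundances = {
    "microbe_A": normalized_abundance(500, 5_000_000, host_reads, host_size),
    "microbe_B": normalized_abundance(50, 50_000, host_reads, host_size),
}
percents = percent_relative_abundance(abundances)
for name, value in percents.items():
    print(f"{name}: {value:.2f}%")
```

Dividing by genome size means that a small genome with few reads (microbe_B) can outrank a large genome with many reads (microbe_A), which is the point of normalizing for genome length.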
Genome coverage is approximated using the following equation:
\[
\text{Genome Coverage} = \frac{\text{number of reads mapped to microbe} \times \text{read length}}{\text{microbe reference genome size}}
\]
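A minimal Python sketch of the genome coverage approximation follows. The read count and genome size in the example are hypothetical, and the function name is illustrative.

```python
def genome_coverage(mapped_reads, read_length, genome_size):
    """Approximate fold coverage: total mapped bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return mapped_reads * read_length / genome_size


# Hypothetical example: 1,000 reads of 100 bp mapped to a 50,000 bp genome.
print(genome_coverage(1_000, 100, 50_000))
```

This estimate assumes that mapped bases are spread evenly across the genome, so it gives the average depth and not the fraction of the genome that is covered.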
Alpha diversity for each sample is calculated using the Shannon Diversity Index, a commonly used equation for calculating species diversity in a microbiome as it accounts for both species abundance and evenness within the sample [30, 31].
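The Shannon Diversity Index is commonly written as H' = -Σ p_i ln(p_i), where p_i is the proportion of the sample belonging to species i. The Python sketch below follows that definition; the use of the natural logarithm is an assumption, since the logarithm base is not stated here, and the function name is illustrative.

```python
from math import log


def shannon_index(abundances):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over species with p_i > 0."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    proportions = [value / total for value in abundances if value > 0]
    return -sum(p * log(p) for p in proportions)


# Four equally abundant species give the maximum for four species, ln(4).
print(shannon_index([25, 25, 25, 25]))
# One dominant species gives a much lower value.
print(shannon_index([97, 1, 1, 1]))
```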
2.5.4 Performance Metrics
Simulated data were utilized to assess several metrics of BiomeSeq’s performance capabilities including correlation with known abundance, sensitivity, precision, rate of speed and root mean square error. Four datasets were generated to closely mimic the complexity of data obtained from real microbiomes consisting of bacterial, eukaryotic viral, bacteriophage and fungal genomes as well as host DNA sequences. The datasets contain sequences from 20 microorganisms commonly found in the respiratory microbiome of broiler chickens, including 10 eukaryotic viruses, 4 bacteriophage, 5 bacteria and 1 fungus (Table S1). We included one chicken sequence to represent the host environment (NC_006088.5). ART was used to simulate reads generated using next-generation sequencing technology [32]. Single-end reads with a length of 100, fold coverage of 10X and masking cutoff frequency of 1 in 100 were simulated based on an error and quality profile of the HiSeq 2500 Illumina sequencing platform. The number of reads simulated ranged from 24,522,223 to 24,523,708, with an average read count of 24,523,065.
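For single-end reads, fold coverage, read length and genome length together determine roughly how many reads are simulated per genome, since coverage is approximately (number of reads × read length) / genome length. The Python sketch below applies that relationship with the settings above (10X coverage, 100 bp reads); the genome length is hypothetical, and the function is an approximation for illustration and not ART's implementation.

```python
def reads_for_coverage(genome_length, fold_coverage=10, read_length=100):
    """Approximate number of single-end reads needed for a target fold coverage."""
    if read_length <= 0:
        raise ValueError("read length must be positive")
    return round(fold_coverage * genome_length / read_length)


# Hypothetical 5 Mbp genome at 10X coverage with 100 bp reads.
print(reads_for_coverage(5_000_000))
```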
The four simulated datasets were processed using BiomeSeq with the following parameters: -g chicken.fasta -d avian_virus -q 20. One major feature of
BiomeSeq is its ability to accept custom databases provided by the user. To evaluate this feature, an avian-specific viral database was constructed to replace BiomeSeq’s default viral database (Table S5). The avian DNA viral genomes include 48 viral elements from 9 unique families and the avian RNA viral genomes include 63 viral elements from 13 families. The avian DNA and RNA viral database is arranged by the classification of their viral structure and genome organization. DNA viruses are organized hierarchically by whether the virus is double- or single-stranded and whether the virus is enveloped or non-enveloped. RNA viruses are organized hierarchically by whether the virus is double- or single-stranded, negative or positive sense, segmented or non-segmented and whether the virus is enveloped or non- enveloped. This database is publicly available [18]. Abundance was calculated on the species level and several metrics were assessed based on the calculations determined by BiomeSeq including correlation with known abundances which was calculated using Pearson’s correlation coefficient. Rate of speed was calculated as the number of reads per second at each step of the BiomeSeq process on a server with 98 GB RAM and 4 CPU cores. Sensitivity and precision were calculated based on the following equations:
\[
\text{Sensitivity} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}
\]
\[
\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}
\]
True positives are the number of reads that BiomeSeq aligned to the genomes in the databases; false positives are the number of reads that were aligned to genomes not included in the databases; and false negatives are the number of reads that were not aligned. Root mean square error was calculated to compare the abundance calculations of BiomeSeq to the known abundance using the following equation:
\[
\text{RMSE} = \sqrt{\frac{\sum (\text{BiomeSeq Abundance} - \text{Known Abundance})^{2}}{\text{number of samples}}}
\]
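A minimal Python sketch of the RMSE calculation is given below. The single worked value uses the Aspergillus oryzae abundances quoted in the Results for one simulated dataset (78.03% determined by BiomeSeq, 73.34% known); with only one dataset the RMSE reduces to the absolute difference, so it illustrates the formula and does not reproduce the per-species values in Figure 2.4, which use all four datasets.

```python
from math import sqrt


def rmse(biomeseq_abundance, known_abundance):
    """Root mean square error between paired abundance estimates."""
    if len(biomeseq_abundance) != len(known_abundance) or not known_abundance:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(b - k) ** 2
                      for b, k in zip(biomeseq_abundance, known_abundance)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Aspergillus oryzae in one simulated dataset (values quoted in the Results).
print(f"{rmse([78.03], [73.34]):.2f}")
```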
2.5.5 A Longitudinal Study of the Microbial Ecology of a Healthy Broiler Flock
Tracheal swabs were collected at hatching and at weekly intervals through processing at day 49 (8 samples) from an antibiotic-free commercial broiler flock.
Both DNA and RNA were isolated and sequencing was performed for each of the eight time points using the Illumina HiSeq platform producing 1 × 100 single-end reads. Each of the resulting 16 samples was processed using BiomeSeq with the following parameters: -g chicken -d avian_virus -q 40. The previously described avian viral reference database was utilized in this study (Table S5). Normalized abundance, relative abundance, genome coverage and sample diversity were calculated for each of the microbial components. Visual representations of the results generated by BiomeSeq, including heatmaps, phylogenetic trees, Venn diagrams and microbial networks, were generated using several R packages [20-22].
2.5.6 Comparison of BiomeSeq Bacterial Results to 16S rRNA Results
For comparison of BiomeSeq results to bacterial results generated using 16S rRNA sequencing methodology, the V4 hypervariable region of the bacterial 16S rRNA gene was extracted and amplified using PCR with primers 515F (5′-GTGCCAGCMGCCGCGGTAA-3′) and 806R (5′-GGACTACHVGGGTWTCTAAT-3′), as previously described [7, 33]. The amplicons were sequenced at the University of Minnesota Genomics Center (Minneapolis, MN) using an Illumina MiSeq 600 cycle v3 kit. Each sample was assessed for quality and assembled into contigs using PEAR’s default parameters, with the modification that the quality score threshold was set to 30. Samples were further filtered and analyzed using mothur version 1.35.1 [13] and MiSeq SOP [34]. OTUs were generated using
97% sequence similarity. Mothur’s implementation of the SILVA database (v123) was used for classification of OTUs, and relative abundance was calculated. The results generated using 16S rRNA sequencing methodology were compared to results generated by BiomeSeq.
Figure 2.1. BiomeSeq Workflow.
[Flowchart. Input: fastq files. Quality and decontamination: trim sequences for quality; align trimmed reads to host genome. Database alignment: align unmapped reads to the microbial databases (animal viral, fungal, bacterial and bacteriophage databases). Quantification: calculate abundance, diversity and genome coverage for the viral, bacterial, fungal and phage components.]
Figure 2.2. Percent relative abundance of microorganisms detected by BiomeSeq and known values from simulated datasets
[Stacked bar chart. Bars: BiomeSeq; Known. X-axis: Percent Relative Abundance (0 to 100). Legend: Aspergillus oryzae, Avian metapneumovirus, Enterobacteria phage T4, Enterobacteria phage T7, Escherichia coli, Escherichia phage TL-2011b, Gallid herpesvirus 1, Gallid herpesvirus 2, Gallid herpesvirus 3, Infectious bronchitis virus, Infectious bursal disease virus, Influenza A virus, Mycoplasma gallisepticum, Mycoplasma synoviae, Newcastle disease virus, Ornithobacterium rhinotracheale, Pasteurella multocida, Staphylococcus phage StB20, Turkey coronavirus, Meleagridid Herpesvirus 1.]
Figure 2.3. Average rate of speed at different steps in BiomeSeq processing including A) quality(a), B) decontamination(b), C) Microbial database alignment(c) and D) quantification(d) for four simulated datasets.
100,000
10,000
1,000 (Reads/second)
10 100 log
10
1 Quality Trimming Host Decontamination Microbial Database Quantification Alignment
a) Adapter sequences, reads shorter than 100 base pairs in length and reads with a Phred quality score of less than 30 are removed from the sequencing file. b) The host reference genome is indexed, and reads are aligned to the host reference genome to extract host DNA. c) Reads are aligned to four microbial databases: eukaryotic viruses, fungi, bacteria and bacteriophage. d) Normalized abundance, percent relative abundance, genome coverage and diversity are calculated from the reads that align to microbial sequences.
Figure 2.4. Root Mean Square Error between known abundances and abundances determined by BiomeSeq
[Bar chart. Y axis: Root Mean Square Error (0 to 5). X axis, one bar per organism: Aspergillus oryzae, Avian metapneumovirus, Enterobacteria phage T4, Enterobacteria phage T7, Escherichia coli, Escherichia phage TL-2011b, Gallid herpesvirus 1, Gallid herpesvirus 2, Gallid herpesvirus 3, Infectious bronchitis virus, Infectious bursal disease virus, Influenza A virus, Meleagridid Herpesvirus 1, Mycoplasma gallisepticum, Mycoplasma synoviae, Newcastle disease virus, Ornithobacterium rhinotracheale, Pasteurella multocida, Staphylococcus phage StB20, Turkey coronavirus.]
Figure 2.5. Heatmap of percent normalized relative abundance of viruses detected in a commercial poultry flock from hatching to processing. Color corresponds to the range of relative abundance of each family from 0 to 100%. Green: 0-1%; yellow: 1-25%; orange: 25-75%; and red: 75-100%. The sum of each column, or week, is 100%.
[Heatmap. Row annotations in the figure: nucleic acid type (DNA, RNA), strand (double stranded, single stranded), sense (negative, positive) and enveloping (enveloped, non-enveloped). Cell values in percent; "-" marks an empty cell.]
Family | Genus | Species | Week 0 | Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 | Week 7
Herpesviridae | Iltovirus | Gallid alphaherpesvirus 1 | 0.171 | - | - | - | - | - | - | -
Herpesviridae | Mardivirus | Gallid alphaherpesvirus 2&3 | - | - | - | 0.257 | 0.145 | 0.070 | 0.006 | -
Herpesviridae | Mardivirus | Meleagrid alphaherpesvirus 1 | - | - | 0.037 | - | 0.004 | - | - | -
Anelloviridae | Gyrovirus | Avian gyrovirus | - | - | - | - | 88.664 | 12.054 | 15.469 | 36.773
Adenoviridae | Aviadenovirus | Fowl aviadenovirus | - | - | - | - | - | - | 53.299 | 6.698
Birnaviridae | Avibirnavirus | Infectious bursal disease virus | - | - | - | - | 0.008 | - | 2.333 | 0.105
Coronaviridae | Gammacoronavirus | Avian infectious bronchitis virus | 0.382 | 54.762 | 58.947 | 16.278 | 1.884 | 23.602 | 21.786 | 19.319
Retroviridae | Alpharetrovirus | Avian carcinoma virus | - | - | - | - | - | 0.077 | - | 0.042
Retroviridae | Unclassified | Avian Endogenous Retrovirus | 99.447 | 44.493 | 41.017 | 83.296 | 9.290 | 64.196 | 7.108 | 37.063
Astroviridae | Avastrovirus | Chicken astrovirus | - | 0.744 | - | - | - | - | - | -
Picornaviridae | Sicinivirus | Chicken sicinivirus JSY | - | - | - | 0.169 | 0.005 | - | - | -
Figure 2.6. Phylogenetic tree of bacterial species detected in a commercial poultry flock. Branches extend from phylum to species. Nodes indicate detected species and diameter indicates average abundance.
[Tree diagram. Phylum labels: Actinobacteria, Proteobacteria, Bacteroidetes, Firmicutes.]
Figure 2.7. Venn Diagram of the detected bacteriophage species in a commercial poultry flock at Week 0, Week 1 and Week 7.
[Venn diagram. Circle labels as printed: Week 0, Week 3, Week 7. Region counts as printed: 10, 1, 6, 10, 1, 1, 1. Species labels: Enterobacteria phage P88, Staphylococcus phage GH15, Microbacterium phage Min1, Staphylococcus phage StB20-like, Staphylococcus phage P108, Staphylococcus phage phiSA012, Enterobacteria phage phi92, Enterobacteria phage IME10, Staphylococcus phage SPbeta-like, Enterobacteria phage cdtI, Enterobacteria phage P1, Staphylococcus phage MCE-2014, Staphylococcus phage phiRS7, Staphylococcus phage phiIPLA-RODI, Enterobacteria phage SfI, Salmonella phage RE-2010, Salmonella phage SJ46, Enterobacteria phage T7, Enterobacteria phage lambda, Stx2-converting phage 1717, Enterobacteria phage mEp460, Enterobacteria phage VT2phi_272, Shigella phage SfIV, Escherichia phage TL-2011b, Uncultured phage crAssphage, Stx2 converting phage vB_EcoP_24B, Enterobacteria phage YYZ-2008, Shigella phage SHFML-11, Staphylococcus phage StB20, Enterobacteria phage RB55.]
Figure 2.8. Fungal network of species detected in a commercial poultry flock. Outer nodes represent order level, while inner nodes represent species. Diameter of the inner nodes correlate to species frequency, or the number of weeks the species was detected.
Figure 2.9. Microbial network of the top 10 most abundant eukaryotic viruses, fungi, bacteria and bacteriophage in a commercial poultry flock at time of processing. Node diameter indicates the percent relative abundance.
[Network diagram titled "Avian Respiratory Microbiome". Node groups: Bacteria, Eukaryotic Virus, Fungi, Bacteriophage.]
Figure 2.10. Bacteria detected in a healthy poultry broiler flock using A) 16S rRNA and B) BiomeSeq
[Pie charts.]
A) 16S HEALTHY: Gallibacterium 40%; Corynebacteriaceae* 24%; Lactobacillales** 16%; Staphylococcus 10%; Other 10%.
B) BIOMESEQ HEALTHY: Gallibacterium anatis 29%; Staphylococcus haemolyticus 28%; Corynebacterium falsenii 18%; Other 25%.
Table 2.1. Software tools and parameters used by BiomeSeq
Process | Tool Name | Parameters
Quality Trimming | Trim Galore | default
Host Decontamination | BWA | -x -S
Host Decontamination | Samtools view | -bS
Microbial Database Alignment | Bowtie 2 | -x -S
Microbial Database Alignment | Samtools view | -bSq [user input]
Table 2.2. Example table generated by BiomeSeq of the viral component of a commercial poultry flock at Week 6.
Sample | Ref Seq Number | Name | Taxonomy | Genome Size | Number Mapped | Norm. Abundance | Relative Abundance | Genome Coverage | Diversity
 | NC002229 | Gallid Alphaherpesvirus 2 | Double Stranded; Enveloped; Herpesviridae; Mardivirus | 177874 | 1 | 27 | 0.004% | 0 | 0.534
 | NC002577 | Gallid Alphaherpesvirus 3 | Double Stranded; Enveloped; Herpesviridae; Mardivirus | 164270 | 1 | 30 | 0.004% | 0 |
 | NC015396 | Avian Gyrovirus | Single Stranded; Non-Enveloped; Circoviridae; Gyrovirus | 2383 | 72 | 147166 | 22.493% | 3.05 |
 | NC001720 | Fowl Aviadenovirus | Double Stranded; Non-Enveloped; Adenoviridae; Aviadenovirus | 43804 | 4560 | 507049 | 77.498% | 10.51 |
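The Relative Abundance column in Table 2.2 appears consistent with dividing each Norm. Abundance value by the sum of that column. The Python sketch below illustrates that relationship with the values from the table; the function name and dictionary layout are illustrative assumptions and are not taken from the BiomeSeq source.

```python
def percent_relative_abundance(normalized):
    """Return each taxon's share of the summed normalized abundance, in percent."""
    total = sum(normalized.values())
    if total == 0:
        # No reads mapped to any taxon: report zero for every entry.
        return {name: 0.0 for name in normalized}
    return {name: 100.0 * value / total for name, value in normalized.items()}


# Norm. Abundance values copied from Table 2.2 (viral component, Week 6).
week6 = {
    "Gallid alphaherpesvirus 2": 27,
    "Gallid alphaherpesvirus 3": 30,
    "Avian gyrovirus": 147166,
    "Fowl aviadenovirus": 507049,
}
relative = percent_relative_abundance(week6)
```

Under this assumption the two abundant viruses come out at the percentages listed in the table, and the four values sum to 100%.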
REFERENCES
1. Peterson, J., et al., The NIH Human Microbiome Project. Genome research,
2009. 19(12): p. 2317-2323.
2. Barzon, L., et al., Applications of next-generation sequencing technologies to
diagnostic virology. Int J Mol Sci, 2011. 12(11): p. 7861-84.
3. Bond, S.L., et al., Upper and lower respiratory tract microbiota in horses:
bacterial communities associated with health and mild asthma (inflammatory
airway disease) and effects of dexamethasone. BMC Microbiol, 2017. 17(1): p.
184.
4. De Boeck, C., et al., Longitudinal monitoring for respiratory pathogens in
broiler chickens reveals co-infection of Chlamydia psittaci and
Ornithobacterium rhinotracheale. J Med Microbiol, 2015. 64(5): p. 565-574.
5. Gaeta N, L.S., Teixeira A, Ganda E, Oikonomou G, Gregory L, Bicalho R.,
Deciphering upper respiratory tract microbiota complexity in healthy calves
and calves that develop respiratory disease using shotgun metagenomics. J
Dairy Sci., 2017. 100: p. 1445-1458.
6. Glendinning, L., G. McLachlan, and L. Vervelde, Age-related differences in
the respiratory microbiota of chickens. PLoS One, 2017. 12(11): p. e0188455.
7. Johnson TJ, Y.B., Noll S, Cardona C, Evans NP, Karnezos P, Ngunjiri JM,
Abundo MC, Lee C-W, A consistent and predictable commercial broiler
chicken bacterial microbiota in antibiotic-free production displays strong
correlations with performance. Appl. Environ. Micro., 2018. 84: p. e00362-18.
8. Shabbir, M.Z., et al., Microbial communities present in the lower respiratory
tract of clinically healthy birds in Pakistan. Poult Sci, 2015. 94(4): p. 612-20.
9. De Santis T, H.P., Larsen N, Rojas M, Brodie E, Keller K, Huber T, Dalevi D,
Hu P, Andersen G., Greengenes, a chimera-checked 16S rRNA gene. 2016.
10. Quast C, P.E., Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glockner F.,
The SILVA ribosomal RNA gene database project: improved data processing
and web-based tools. Nucl Acids Res., 2013. 41: p. 590-596.
11. Meyer F, P.D., D’Souza M, Olson R, Glass E, Kubal M, Paczian T, Rodriguez
A, Stevens R, Wilke A, Wilkening J, Edwards R., The metagenomics RAST
server- a public resource for the automatic phylogenetic and functional
analysis of metagenomes. BMC Bioinformatics, 2008. 9: p. 386.
12. Caporaso J, K.J., Stombaugh J, Bittinger K, Bushman F, Costello E, Fierer N,
Peña A, Goodrich J, Gordon J, Huttley G, Kelley ST, Knights D, Koenig JE,
Ley R, Lozupone C, McDonald D, Muegge B, Pirrung M, Reeder J, Sevinsky
JR, Turnbaugh P, Walters W, Widmann J, Yatsunenko T, Zaneveld J, Knight
R., QIIME allows analysis of high-throughput community sequencing data.
Nature Methods, 2010. 7: p. 335-336.
13. Schloss P, W.S., Ryabin T, Hall J, Hartman M, Hollister E, Lesniewski R,
Oakley B, Parks D, Robinson C, Sahl J, Stres B, Thallinger G, Van Horn D,
Weber C., Introducing mothur: Open-source, platform-independent,
community-supported software for describing and comparing microbial
communities. Appl Environ Microbiol, 2009. 75: p. 7537-7541.
14. Kõljalg U, N.R., Abarenkov K, Tedersoo L, Taylor A, Bahram M, Bates S,
Bruns T, Bengtsson-Palme J, Callaghan T, Douglas B, Drenkhan T, Eberhardt
U, Dueñas M, Grebenc T, Griffith G, Hartmann M, Kirk P, Kohout P, Larsson
E, Lindahl B, Lücking R, Martín M, Matheny P, Nguyen N, Niskanen T, Oja J,
Peay K, Peintner U, Peterson M, Põldmaa K, Saag L, Saar I, Schüßler A, Scott
J, Senés C, Smith M, Suija A, Taylor D, Telleria M, Weiss M, Larsson K.,
Towards a unified paradigm for sequence-based identification of fungi. Mol
Ecol., 2013. 22: p. 5271-5277.
15. Zhu, J., et al., Virus-specific CD8+ T cells accumulate near sensory nerve
endings in genital skin during subclinical HSV-2 reactivation. J Exp Med,
2007. 204(3): p. 595-603.
16. Jovel, J., et al., Characterization of the Gut Microbiome Using 16S or Shotgun
Metagenomics. Front Microbiol, 2016. 7: p. 459.
17. Daly G, L.R., Rowe W, Stubbs S, Wilkinson M, Ramirez-Gonzalez R, Mario
C, Bernal W, Heeney J., Host subtraction, filtering and assembly validations
for novel viral discovery using next generation sequencing data. PLoS One,
2015. 10(6).
18. Mulholland, K.A. BiomeSeq Microbial Databases. Avian Genomics 2019;
Available from: https://sites.udel.edu/aviangenomics/.
19. Hillier, L.W., et al., Sequence and comparative analysis of the chicken genome
provide unique perspectives on vertebrate evolution. Nature, 2004. 432(7018):
p. 695-716.
20. Revell, L.J., phytools: an R package for phylogenetic comparative biology
(and other things). Methods in ecology and evolution, 2012. 3(2): p. 217-223.
21. Chen, H. and P.C. Boutros, VennDiagram: a package for the generation of
highly-customizable Venn and Euler diagrams in R. BMC Bioinformatics,
2011. 12(1): p. 35.
22. Shannon, P., et al., Cytoscape: a software environment for integrated models of
biomolecular interaction networks. Genome Res, 2003. 13(11): p. 2498-504.
23. Martin, M., Cutadapt Removes Adapter Sequences from High-Throughput
Sequencing Reads. EMBnet Journal, 2011. 17: p. 10-12.
24. Li, H. and R. Durbin, Fast and accurate long-read alignment with Burrows-
Wheeler transform. Bioinformatics, 2010. 26(5): p. 589-95.
25. Langmead, B. and S.L. Salzberg, Fast gapped-read alignment with Bowtie 2.
Nat Methods, 2012. 9(4): p. 357-9.
26. O'Leary NA, W.M., Brister JR, Ciufo S, Haddad D, McVeigh R, Rajput B,
Robbertse B, Smith-White B, Ako-Adjei D, Astashyn A, Badretdin A, Bao Y,
Blinkova O, Brover V, Chetvernin V, Choi J, Cox E, Ermolaeva O, Farrell
CM, Goldfarb T, Gupta T, Haft D, Hatcher E, Hlavina W, Joardar VS, Kodali
VK, Li W, Maglott D, Masterson P, McGarvey KM, Murphy MR, O'Neill K,
Pujar S, Rangwala SH, Rausch D, Riddick LD, Schoch C, Shkeda A, Storz SS,
Sun H, Thibaud-Nissen F, Tolstoy I, Tully RE, Vatsan AR, Wallin C, Webb D,
Wu W, Landrum MJ, Kimchi A, Tatusova T, DiCuccio M, Kitts P, Murphy
TD, Pruitt KD., Reference sequence (RefSeq) database at NCBI: current
status, taxonomic expansion, and functional annotation. Nucleic Acids Res.,
2016. 44: p. D733-D745.
27. Herath, D., et al., Assessing Species Diversity Using Metavirome Data:
Methods and Challenges. Comput Struct Biotechnol J, 2017. 15: p. 447-455.
28. Rose, R., et al., Challenges in the analysis of viral metagenomes. Virus Evol,
2016. 2(2): p. vew022.
29. Moustafa, A., et al., The blood DNA virome in 8,000 humans. PLoS Pathog,
2017. 13(3): p. e1006292.
30. Lemos, L.N., et al., Rethinking microbial diversity analysis in the high
throughput sequencing era. Journal of Microbiological Methods, 2011. 86(1):
p. 42-51.
31. Ludwig, J. and J. Reynolds, Statistical Ecology, ed. Wiley. 1988, New York.
32. Huang, W., et al., ART: a next-generation sequencing read simulator.
Bioinformatics, 2012. 28(4): p. 593-4.
33. Gohl DM, V.P., Garbe J, MacLean A, Hauge A, Becker A, Gould TJ, Clayton
JB, Johnson TJ, Hunter R, Knights D, Beckman KB., Systematic improvement
of amplicon marker gene methods for increased accuracy in microbiome
studies. Nat Biotechnol, 2016. 34: p. 942-949.
34. Kozich J, W.S., Baxter N, Highlander S, Schloss P., Development of a dual-
index sequencing strategy and curation pipeline for analyzing amplicon
sequence data on the MiSeq Illumina sequencing platform. Appl. Environ.
Microbiol., 2013. 79: p. 5112-5120.
Chapter 3
METAGENOMIC ANALYSIS OF THE RESPIRATORY MICROBIOME OF A HEALTHY BROILER FLOCK FROM HATCHING TO PROCESSING
3.1 Summary
The severity and spread of many human and animal diseases are associated with specific bacterial and viral agents within the respiratory microbiome. Recent studies attempting to characterize the respiratory microbiome of poultry have focused primarily on bacteria; however, elucidating the complex microbial interactions that result in disease requires the characterization of the viruses, bacteria, bacteriophage, and fungi present in the respiratory microbiome of a healthy broiler flock. The lack of comprehensive bioinformatics pipelines and viral genome databases has limited efforts to characterize the avian virome. Next generation sequencing approaches, accompanied by the further development of novel computational and bioinformatics tools, were utilized to examine the evolution of the microbial ecology of the avian trachea during the growth of a commercial flock. The flock was sampled weekly, beginning at placement and concluding at 49 days, the day before processing.
Metagenomic sequencing of DNA and RNA and 16S rRNA sequencing were utilized to examine the bacterial, viral, bacteriophage, and fungal components at these times.
We detected a total of 11 eukaryotic viral species, 24 bacterial genera, 31 bacteriophage species, and 61 fungal species. Abundance at various taxonomic levels, alpha diversity, species frequency and microbial shifts were examined for each of the microbial components. Additionally, correlations between bacteria and bacteriophage families were investigated and several highly positive correlations were identified.
This study provides the first comprehensive analysis of the ecology of the avian respiratory microbiome and will facilitate future investigations of avian respiratory diseases.
3.2 Introduction
Microbiomes are complex environments consisting of eukaryotic viruses, bacteria, archaea, bacteriophage, fungi, and protozoa, all of which contribute to the creation of a particular biological niche. These microorganisms interact with the host and each other in either symbiosis or dysbiosis depending on the status of the host [1].
The introduction of an infectious agent can lead to disturbances of this environment and may result in disease [2]. Recent studies have identified specific bacterial and viral agents within the respiratory microbiome of both humans and animals that are associated with the severity and spread of disease [3-6]. For example, Pettigrew et al.
(2008) evaluated the complex interactions between Streptococcus pneumoniae,
Haemophilus influenzae, Moraxella catarrhalis, and Staphylococcus aureus in the upper respiratory tract of children with upper respiratory tract infections. They
determined that colonization involves a combination of host conditions, host immune response, and direct competitive interactions between bacteria. The presence of a viral infection may predispose the respiratory tract to a bacterial superinfection. Bakaletz determined that this is due to viral-bacterial interactions resulting in disruption of the respiratory mucosal epithelium [4]. Since diseases in the respiratory tract of poultry in particular result in poor performance, our goal is to gain insight into the impact microbial communities have on the health of the respiratory tract of poultry.
Avian respiratory disease complex (RDC) is an example of a multi-faceted syndrome that commonly affects poultry [7]. This disease can be triggered by a combination of environmental factors and microbial agents. Interactions of endogenous bacteria (Mycoplasma gallisepticum and Escherichia coli) with fungal and viral infectious agents, such as infectious laryngotracheitis virus, infectious bronchitis virus and Newcastle disease virus, can lead to RDC and may result in high mortality rates in poultry flocks [7, 8]. Additionally, exposure to certain environmental factors, such as ammonia and other gases and insufficient ventilation, reduces the innate immune defenses of the bird and enables opportunistic microbial pathogens to establish themselves [8]. However, the extent of these microbial interactions is not fully understood. Elucidating the complex microbial interactions that result in RDC first requires a characterization of the complete respiratory microbiome of both healthy and diseased chickens.
Recent studies attempting to characterize the respiratory microbiome of poultry have focused primarily on bacteria, as there are well established and rapid methods of sequencing and analyzing this component [9-14]. The 16S rRNA gene is commonly used to identify and compare bacteria present in a given sample [15]. Accessible bacterial databases, such as Greengenes [16] and Silva [17], in addition to well-developed bioinformatics pipelines are available to facilitate these analyses [18-20].
Glendinning et al. (2017) utilized these 16S rRNA gene amplification approaches to characterize the buccal, nasal and lung microbiota of chickens. Utilizing similar methods, Shabbir et al. (2015) determined that the lower respiratory tract of healthy flocks of chickens from different farms in Pakistan exhibited high levels of diversity in their microbiota. More recently, Johnson et al. (2018) presented a comprehensive analysis of the core bacterial microbiota in the broiler gastrointestinal, respiratory, and barn environments. Although Lactobacillae were the predominant bacteria found in the trachea, similar to the ileum, the dominant Lactobacillus species differed in relative abundance when tracheal and ileum tissues were compared.
Although the bacterial component provides valuable information about the respiratory microbiome of poultry, a comprehensive analysis of the avian respiratory microbiota has not been reported. Unlike bacteria, viruses lack a marker gene that can be sequenced and employed for taxonomic classification due to their high genetic heterogeneity [21]. With the advancement of next generation metagenomic sequencing technologies, virome characterization is also possible. The lack of comprehensive
bioinformatics tools and viral genome databases limits efforts to characterize the virome. Given this limitation and the lack of a comprehensive description of the microbial environment of the broiler chicken, we developed and employed a bioinformatics pipeline and bacteriophage, fungal, and avian viral genome databases to examine a healthy flock of chickens throughout their grow out cycle. These methods were used to detect and quantify eukaryotic DNA and RNA viruses, bacteria, bacteriophage, and fungi. This study provides the first comprehensive analysis of the ecology of the avian respiratory microbiome and will facilitate future investigations of avian respiratory diseases such as RDC.
3.3 Results
A commercial poultry flock was utilized for this longitudinal study of the broiler respiratory microbiome. The flock was sampled weekly, beginning at placement, and concluding at 49 days, the day before processing. The flock had no health issues during grow out. Mortality after the first week was 1.4%, average final body weight was 7.95 lb, the feed conversion ratio was 1.76, and ammonia levels in the house were maintained below 20 ppm. Tracheal swabs from 12 birds were collected at each time point as two pools of six swabs. The two pools were combined and used for the extraction of DNA and RNA. DNA-Seq and RNA-Seq libraries were constructed and sequenced and the V4 hypervariable region of the 16S rRNA gene was also amplified and sequenced. A total of 339,319,712 trimmed DNA-Seq reads and 440,442,599 trimmed RNA-Seq reads were generated from a total of 16 libraries.
A total of 78.787 gigabase pairs (Gbp) of high-quality nucleotide sequences were
obtained (Table S1). An average of 88% of the DNA reads mapped to the chicken genome, while an average of 53% of the RNA reads mapped to the chicken genome.
3.3.1 Avian Respiratory Eukaryotic Viral Diversity
Unmapped DNA and RNA reads from the eight weekly broiler respiratory samples, each representing a pool of 12 birds, were aligned to an avian specific viral genome database consisting of 63 complete avian RNA viral genomes and 48 complete avian DNA viral genomes (Table 3.1). The 5,163 reads which aligned to the avian viral DNA database and the 71,936 reads which aligned to the avian viral RNA database were analyzed as described in the Materials and Methods.
A total of 11 viral species, representing 9 genera and 8 families, were identified from the avian respiratory tract during the seven-week grow out period
(Table S2). Normalized viral abundance was calculated for each eukaryotic viral species for each week (Figure 3.1; Table S3). At placement, or week 0, gallid herpesvirus 1 was the only DNA virus detected (Table S2). Relatively small amounts of meleagrid herpesvirus 1 were detected at week 2, and gallid herpesvirus 2 and 3 were detected in small quantities from weeks 3-6. Two other viral DNA families were detected later during growth. Circoviridae (avian gyrovirus) first appears in the avian respiratory tract at four weeks of age, while avian adenovirus was initially detected at week 6. The relative abundance of the DNA viruses of the respiratory tract can be examined on the basis of their percent relative abundance to the other viruses in each
sample (Figure 3.2; Table S4) or with respect to the relative distribution of the specific virus family throughout the 7 week period (Figure 3.3; Table S5). It is notable that when Circoviridae and Adenoviridae first appear (Week 4 and Week 6, respectively) they represent the highest viral abundance in the tracheal sample
(88.66% and 53.30% respectively, Figure 3.2). They also represent the highest relative abundance of these virus families observed during flock growth (57.97% and
88.84% respectively, Figure 3.3).
Five eukaryotic RNA virus families were identified in the tracheal samples during grow out. The pattern of RNA virus detection differed markedly from the observed patterns of DNA virus detection (Figure 3.2; Figure 3.3). As expected, transcripts from endogenous avian retroelements were detected throughout the growth of the flock. Coronaviridae were also observed in all of the respiratory samples, and they were the most abundant virus family found at Week 1 and Week 2 (Figure 3.2).
Three other RNA virus families were observed in the broiler trachea. Astroviridae and
Picornaviridae were observed transiently and in low numbers during Week 1 and
Weeks 3-4 respectively. In addition, relatively low levels of Birnaviridae, infectious bursal disease virus, were observed in tracheal samples from Weeks 4, 6, and 7.
Unlike the DNA viruses that were observed later during flock growth, the relative abundances of these viruses remained low.
Figure 3.4A compares the avian respiratory viral microbiome of newly placed chickens, correlating with the microbial environment of a commercial hatchery (Week
0), with birds who have spent 1 week on litter (Week 1) to that of mature broiler
chickens at the time of processing (Week 7). An examination of the viral microbiome
(Figure 3.4A, Table S4) revealed trace amounts of Coronaviridae (0.38%) and
Herpesviridae (0.17%) at hatch. After 1 week, Coronaviridae were well established in the birds. After 6 more weeks, a more diverse and complex viral environment was observed, in which avian adenovirus, chicken anemia virus and infectious bronchitis virus dominated. Average normalized abundance was calculated at each taxonomic level (Figure 3.5A, Table S5). Alpha diversity was also calculated at each week using the
Shannon Diversity Index (Table 3.2). Both RNA viruses and DNA viruses exhibited their lowest diversity at placement (Week 0, H = 0.041 and H = 0.000 respectively).
DNA viruses saw an increase in diversity at Week 4 and exhibited their highest diversity at Week 7 (H = 0.867). The RNA virus population exhibited the highest diversity at Week 6 (H = 1.480).
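As a rough illustration of the alpha diversity calculation, the Python sketch below computes the Shannon Diversity Index, H = -Σ p_i ln(p_i), from a list of per-taxon abundances. The natural logarithm and the function name are assumptions made for illustration; this section does not state the logarithm base used for Table 3.2, so the sketch is not expected to reproduce the reported values.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with nonzero abundance."""
    total = sum(abundances)
    if total <= 0:
        return 0.0
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total  # proportion of the community held by this taxon
            h -= p * math.log(p)
    return h


# A sample containing a single taxon has no diversity.
single_taxon = shannon_index([5.0])
# Four equally abundant taxa give the maximum for four taxa, ln(4).
even_four = shannon_index([1, 1, 1, 1])
```

A sample with a single detected taxon yields H = 0, in line with the value reported for DNA viruses at Week 0, when only one DNA virus was detected.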
3.3.2 Bacterial Diversity
A total of 50,181 reads were obtained from sequencing the V4 hypervariable region of the 16S rRNA gene (Table S1). Week 2 and Week 6 were omitted from analyses due to low numbers of processed reads. Processing and analysis were performed on the samples following the protocol discussed in the Materials and
Methods section. Following processing, a total of 353 operational taxonomic units
(OTUs) were obtained.
A total of 24 unique bacterial genera were identified, spanning 4 phyla, 7 classes, 13 orders and 24 families (Table S6). Average abundance was calculated in a similar manner to the previous analyses. The phylum Firmicutes made up most of the bacteria with an average abundance of 56.17%, followed by Proteobacteria
(39.28%), Actinobacteria (24.00%) and Bacteroidetes (5.78%). The relative abundances of all phyla, classes, orders, families, genera and species are available in
Table S7. Within the Firmicutes phylum, Bacilli was the most abundant class with an average abundance of 49.02% followed by Clostridia (7.15%). Within the
Proteobacteria phylum, Gammaproteobacteria was the most abundant class with an average abundance of 36.28% followed by Betaproteobacteria (3.00%). Actinobacteria
(24.00%) was the only class in the Actinobacteria phylum. The Bacteroidia (3.38%) and
Flavobacteria (2.40%) classes made up the Bacteroidetes phylum (Figure 3.5B).
A comparison of the Week 0 to the Week 1 bacterial microbiome (Figure
3.4B) reveals that the Bacteroidetes present at hatch (9.00%) are absent by Week 1 and the Actinobacteria and Proteobacteria are significantly reduced. The Firmicutes nearly double in abundance by Week 1 at the expense of these three phyla. By the end of the grow out cycle, more balanced populations of Proteobacteria (37.20%),
Actinobacteria (27.20%) and Firmicutes (33.40%) are observed. Calculations of Alpha diversity (Table 3.2) showed a consistently diverse bacterial population, which is highest at placement and lowest near the end of the grow out period.
We also investigated the frequency of specific bacteria genera during the grow out cycle (Table S6). Three different population patterns were observed. At placement,
several genera from all four phyla are represented. Representing the Actinobacteria are Corynebacteriaceae (6%), Brevibacterium (9%), Brachybacterium (8%) and Yaniella (1%). These are observed in lower abundance throughout flock growth.
At placement, the Proteobacteria are predominantly represented by the Pseudomonas
(13%) which are not found in significant levels after Week 1. As shown in Figure
3.4B, Week 0 is the time when significant numbers of Bacteroidetes are observed
(Chryseobacterium, 7% and Alloprevotella 2%). The predominant Firmicutes seen at placement, and consistently observed at high levels throughout growth, are the
Lactobacilli (5.1%).
Once established on litter, the avian bacterial respiratory microbiome is consistent for the first 4 weeks and is dominated by the Lactobacilli, averaging almost
40% of the detected OTUs. Other Bacillaceae and Staphylococcus from the Firmicutes as well as Actinobacteria phyla are also consistently observed. By Week 7, a significant shift to the Proteobacteria and Actinobacteria occurs in the respiratory tract
(Figure 3.4B). While the relative abundance of Lactobacilli drops to 14.8%, significant numbers of Gallibacterium (37%) and Corynebacteriaceae (22%) are now present (Table 3.7).
3.3.3 Bacteriophage Diversity
The unmapped DNA sequences were also aligned to a bacteriophage database consisting of 3,429 complete genome sequences. A total of 504,682 reads aligned to
bacteriophage genomes (Table S1). A total of 31 unique bacteriophage species were identified, spanning 1 classified and 1 unclassified order, 3 classified and 1 unclassified families, and 8 classified and 4 unclassified genera (Table S8).
Normalized abundance, percent relative abundance and average abundance were calculated as in the previous analyses (Table S8-S10). Of the classified families of bacteriophage observed, the Myoviridae were the most abundant with an average normalized abundance of 70.99%, followed by Podoviridae (40.19%) and
Siphoviridae (31.16%) (Figure 3.4C; Figure 3.5C; Table S8). The most abundant species of bacteriophage was Enterobacteria phage RB55 with an average normalized abundance of 39.16%.
We also investigated the frequency of specific bacteriophage species observed during the grow out cycle (Figure 3.6; Table S10). Salmonella phage RE-2010,
Enterobacteria phage IME10, Enterobacteria phage T7, Enterobacteria phage
VT2phi_272, Escherichia phage TL-2011b, Stx2 converting phage vB_EcoP_24B and
Stx2-converting phage 1717 were detected in all eight weeks whereas Salmonella phage SJ46, Shigella phage SfIV, Enterobacteria phage lambda, and Enterobacteria phage YYZ-2008 appeared in seven of the eight weeks. Alpha diversity was calculated at each week using the Shannon Diversity Index (Table 3.2). Samples exhibited the lowest bacteriophage diversity at Week 2 (H = 2.111) and the highest diversity at
Week 3 (H = 2.922). When comparing bacteriophage families from hatching to processing, Myoviridae increased by 25.02% while Podoviridae and unclassified bacteriophage decreased by 23.27% and 3.89%, respectively (Figure 3.4C).
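Species frequency, as used above, is the number of weekly samples in which a species was detected. The following Python sketch shows one way such a tally could be made; the function name and the placeholder species labels are hypothetical and do not come from the study data.

```python
from collections import Counter


def detection_frequency(weekly_detections):
    """Count, for each species, the number of weekly samples in which it was detected."""
    counts = Counter()
    for detected in weekly_detections:
        # Use a set so a species is counted at most once per week.
        counts.update(set(detected))
    return dict(counts)


# Hypothetical detections for three weekly samples (placeholder names, not study data).
weeks = [
    ["phage A", "phage B"],
    ["phage A"],
    ["phage A", "phage B", "phage C"],
]
frequency = detection_frequency(weeks)
# Species found in every sample, analogous to those detected in all eight weeks.
always_present = sorted(s for s, n in frequency.items() if n == len(weeks))
```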
Correlations between detected bacteria and bacteriophage were analyzed based on the Pearson coefficient of correlation. A total of 55 correlations were calculated between 4 families of bacteriophage and 11 families of bacteria (Figure 3.7). Strong, positive correlations were observed between the bacteriophage families and bacterial families present in the trachea. Siphoviridae (R = 0.459), Podoviridae (R = 0.743) and the unclassified bacteriophage (R = 0.887) exhibited strong positive correlations with the Dermabacteraceae. Podoviridae showed positive correlations with the
Brevibacteriaceae (R = 0.802), the Pseudomonadaceae (R = 0.853), Flavobacteriaceae
(R = 0.788) and Streptococcaceae (R = 0.812) which are found in the avian trachea at hatch. Siphoviridae had a positive correlation with the Staphylococcaceae (R = 0.641), which are found in the trachea throughout growth.
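For reference, the Pearson coefficient of correlation between two abundance series x and y is the covariance of x and y divided by the product of their standard deviations. The Python sketch below is a minimal implementation for two equal-length lists, for example the weekly abundances of one bacteriophage family and one bacterial family; the function name and example values are illustrative assumptions, not the analysis code used in the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly abundances (placeholder values, not study data).
phage_family = [1.0, 2.0, 3.0, 4.0]
bacterial_family = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(phage_family, bacterial_family)
```

Values of R near 1 indicate that the two families rise and fall together across weeks, and values near -1 indicate opposite trends.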
3.3.4 Fungal Diversity
The unmapped DNA sequences were also aligned to a fungal database consisting of 1,281 genomes. A total of 1,964 reads aligned to fungal genomes (Table
S1). A total of 61 unique fungal species were identified, spanning 2 phyla, 9 classes, 20 orders, 37 families and 50 genera (Table S11). Normalized abundance, percent relative abundance and average abundance were calculated in a similar manner to the previous analyses (Table S11-13). Of the 2 phyla, Ascomycota was by far the most abundant. The average abundances of all phyla, classes, orders, families, genera and species are available in Table S11. Within the Ascomycota phylum, the most
abundant class of fungi was Saccharomycetes with an average abundance of 98.76%.
Within the Basidiomycota phylum, the most abundant classes were Agaricomycetes with an average abundance of 0.05% and Tremellomycetes (0.03%).
We also investigated the frequency of specific fungal species observed during the grow-out cycle (Figure 3.6; Table S13). Laccaria bicolor, Penicillium chrysogenum and Wickerhamomyces ciferrii were detected in all eight weeks, whereas Tetrapisispora phaffii and Aspergillus oryzae appeared in seven of the eight samples. Twenty-six of the 61 fungal species were detected in only one sample. We observed no shifts in fungal communities during the experiment (Figure 3.4D, Figure 3.5D). Alpha diversity was calculated at each week using the Shannon diversity index (Table 3.2). The dominance of a single fungal species resulted in low alpha diversity in each sample.
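To show why dominance by one species lowers alpha diversity, the sketch below computes the Shannon index for a dominated community and an even one. It assumes the natural logarithm, since the base used for Table 3.2 is not stated here; the read counts are hypothetical and the function name `shannon_index` is illustrative.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total          # proportion of reads assigned to this taxon
            h -= p * math.log(p)   # natural log is an assumption
    return h

# Hypothetical read counts per species; illustrative only, not study data.
dominated = [9900, 40, 30, 20, 10]      # one species holds 99% of reads
even = [2000, 2000, 2000, 2000, 2000]   # five equally abundant species

print(f"dominated community: H = {shannon_index(dominated):.3f}")
print(f"even community:      H = {shannon_index(even):.3f}")
```

The even community reaches the maximum possible value for five species (ln 5), while the dominated community stays close to zero.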
3.3.5 The Avian Microbiome
The development of NGS approaches, accompanied by the further development of novel computational and bioinformatics tools, enabled us to examine the evolution of the microbial ecology of the avian trachea (eukaryotic viruses, bacteria, bacteriophage, and fungi) during the growth of this commercial flock. Figure 3.6 is a representation of the complex ecology of the respiratory microbiome of the broiler chicken. In this microbial network, nodes of bacteria, bacteriophage, eukaryotic
viruses, and fungi are arranged by order, while the diameter of the node depicts taxa frequency from 1-8 samples.
3.4 Discussion
A detailed characterization of the bacterial microbiota of the commercial broiler chicken was recently published [13]. This study examined the core bacterial microbiota of the broiler gastrointestinal, respiratory, and barn environments.
Lactobacillus was found to be the dominant bacterial taxon of the trachea, although the trachea was found to also contain Staphylococcus, Streptococcus, Ruminococcus, and Xanthomonas. This study was conducted as a longitudinal study from Day 7 to
Day 42 and utilized multiple flocks so that microbiome composition could be correlated with performance [13]. The goal of our study was to expand the characterization of the broiler respiratory microbiome beyond the bacterial component. Although the emphasis was on the eukaryotic virome, another objective was to use next-generation sequencing data to determine the bacteriophage and fungal composition of the avian respiratory tract. This required the development of a unique bioinformatics tool (BiomeSeq) that utilizes a sequence-dependent approach [22]. To determine and quantify the relative abundance of microbial elements, RNA-Seq- and DNA-Seq-derived sequences were first aligned to the avian genome, and the remaining sequences were then aligned to avian virus-specific databases, a bacteriophage database, and a fungal database. Alignment to a host genome sequence followed by
alignment to specific microbial databases allowed the microbial community in a given sample to be quantified, a unique property of this sequence-dependent bioinformatics approach. This does not preclude the use of the same data in a sequence-independent manner, allowing for a more traditional metagenomics approach that can be used to create contigs that could then be utilized to identify and sequence novel viral elements.
Using either method, functional genes and metabolic pathways can be identified using
BLASTP and KEGG databases [23].
Previous avian virome studies have focused on the RNA virus community of the avian gut [24, 25]. Tracheal swabs of the avian respiratory tract are not amenable to traditional viral enrichment strategies such as centrifugation or filtration because of their small volume, the relatively low viral concentrations in the samples, and the nuclease-rich environment. For this study we pooled twelve swabs into two samples to increase yield and collected the swab material in a chaotropic buffer that was rapidly frozen in order to preserve the integrity of viral RNA. In addition, the decision to utilize a sequence-dependent approach to analyze the sequencing data necessitated the development of specific databases. For the eukaryotic virome, an avian virus-specific whole-genome database was developed (Table 3.1). Representative whole genomes from 22 viral families (9 DNA viruses, 13 RNA viruses) are included in the database. Once chicken genome sequences are removed from the RNA-Seq and
DNA-Seq library fastq files, alignment to the avian-specific viral database and subsequent analysis is rapid and efficient.
An examination of the avian respiratory viral microbiome confirmed the presence of a dynamic and diverse community. The commercial broiler flock utilized in this study was vaccinated in ovo with a live Marek’s disease virus vaccine (SB-1) and a live recombinant herpesvirus of turkeys (HVT) vaccine expressing Newcastle disease virus genes. At hatch, chicks were also vaccinated by spray with a multivalent infectious bronchitis virus (avian coronavirus) vaccine before placement. The persistent detection of herpesviruses and coronaviruses in the respiratory tract is consistent with vaccination with these live vaccines, coupled with the expected presence of these avian viruses in the environment. As predicted, the complexity and diversity of the viral community increased as the birds aged. Of particular note is the appearance of infectious bursal disease virus (Birnaviridae) at Week 6 and chicken anemia virus (Circoviridae) at Week 4. Broiler breeders are vaccinated in order to maximize the amount of maternal antibodies to these potential pathogens in the newly hatched chick. By Week 4, maternal antibody levels should be reduced to the level where colonization of the respiratory tract by these viruses is likely. However, by
Week 4 the avian adaptive immune system has also matured. Consequently, the relative abundance of these viruses is highest when they are first observed (57.97% of the detected chicken anemia virus sequences; Figure 3.3). The rapid reduction in the amount of these viruses in the respiratory tract is most likely due to the activation of the avian adaptive immune system. A similar pattern is seen with the appearance of avian adenovirus in the respiratory tract in Week 6. Avian adenoviruses are commonly isolated from the avian respiratory tract. Finally, picornaviruses and
astroviruses are commonly found in the digestive tract of chickens [25], not the respiratory tract. However, it is not surprising that representatives from these virus families would be transiently observed in the respiratory tract during their initial colonization of the bird.
Consistent with the observations of Johnson et al. (2018) we observed that the bacterial microbiome of the avian respiratory tract was dominated by the Lactobacilli
[13]. The bacterial microbiome of the newly hatched chick was more complex than expected. Although Lactobacilli (6.1%) were the dominant Firmicutes, Bacteroidetes
(Chryseobacterium, 7%), Proteobacteria (Pseudomonas, 13%), and Actinobacteria
(Brevibacterium, 9%) were also observed in significant numbers. The majority of the bacteriophage found in the avian respiratory tract was Enterobacteria phage RB55 of the Myoviridae family. The presence of this bacteriophage correlated with
Gallibacterium (Pasteurellaceae), an abundant bacterial genus found in the last two weeks of growth. Interactions between bacteriophage and bacteria are known to have a significant impact on host health [23]. Bacteriophages may also help control bacterial populations, influencing bacterial diversity and contributing to the dysbiosis of the respiratory microbiota during disease. Little is known about the diversity and role of fungi in the avian respiratory tract, and further studies are needed to determine the relevance of the high normalized relative abundance of the Ascomycota observed in this flock.
This approach utilized a longitudinal study of one commercial antibiotic-free broiler flock to develop the tools needed for a comprehensive analysis of the
microbial ecology of the avian respiratory tract (Figure 3.6). This initial study should be confirmed and expanded by examining the respiratory microbiome of multiple flocks from multiple companies, by examining multiple grow-out cycles from the same flock in order to determine seasonal effects and flock consistency, and by examining flocks grown under different production systems (antibiotic free, organic, free range, and traditional). These results could also be compared with multi-age backyard flocks from the same geographic area. Finally, efforts are underway to compare the ecology of the respiratory microbiome of birds exhibiting respiratory disease complex (RDC) to the control avian respiratory microbiome. We observe significant changes in the composition of the eukaryotic virome and the bacterial microbiome consistent with the complex etiology of this disease (manuscript in preparation).
3.5 Materials and Methods
3.5.1 Sample Collection
Tracheal swabs were collected at placement and at weekly intervals through processing at day 49 (8 samples) from an antibiotic-free commercial broiler flock grown in the Jones-Hamilton facility at the University of Delaware Carvel Research and Education Center. At each time point, two samples, each containing six individual swabs in 3 ml of buffer PV1 (Qiagen), were collected, frozen immediately on dry ice, and stored at -80°C until use.
3.5.2 Nucleic Acid Extraction and Sequencing
After thawing on ice, the pooled samples were gently homogenized, split into two tubes and then centrifuged (7,000 × g; 5 minutes; 4°C) to form pellets. Total RNA was isolated from one pellet using the Qiagen (previously MoBio) Viral Nucleic Acid extraction kit following the manufacturer’s protocol. DNA was isolated from the duplicate pellet using the Qiagen Blood and Tissue Kit following the manufacturer’s protocol. Both DNA and RNA sequencing were performed for each time point on the Illumina HiSeq platform, producing 1 × 100 single-end reads, by the University of
Delaware Sequencing Core Facility.
3.5.3 16S rRNA Amplicon Sequencing and Analysis
The V4 hypervariable region of the bacterial 16S rRNA gene was extracted and amplified using PCR with primers 515F (5'-GTGCCAGCMGCCGCGGTAA-3') and 806R (5'-GGACTACHVGGGTWTCTAAT-3'), as previously described [13, 27].
The conditions of the first PCR reaction used were an initial denaturation step at 95°C for 5 minutes, followed by 25 cycles of 98°C for 20 seconds, 55°C for 15 seconds, and
72°C for 1 minute, with a final extension at 72°C for 5 minutes. The product was diluted 1:100 and used in a second PCR reaction. The second PCR reaction consisted of an initial denaturation step at 95°C for 5 minutes, followed by 10 cycles of 98°C for
20 seconds, 55°C for 15 seconds, and 72°C for 1 minute, with a final extension at
72°C for 5 minutes. The pooled, size-selected sample was denatured with NaOH,
diluted to 8 pM in Illumina’s HT1 buffer, spiked with 20% PhiX, and heat denatured at 96°C for 2 minutes immediately prior to loading. The amplicons were sequenced at the University of Minnesota Genomics Center (Minneapolis, MN) using an Illumina
MiSeq 600 cycle v3 kit.
Following sequencing, samples were sorted by barcode to generate individual fastq files. Each sample was assessed for quality and assembled into contigs using
PEAR’s default parameters, with the modification that the quality score threshold was set to 30. Samples were further filtered and analyzed using Mothur version 1.35.1 [20] and the MiSeq SOP [26]. OTUs were generated using 97% sequence similarity. Mothur’s implementation of the SILVA database (v123) was used for classification of OTUs.
Alpha diversity was measured using the Shannon diversity index [27, 28]. Relative abundance, mean relative abundance and genus frequency were also calculated. These data were represented by pie charts, phylogenetic trees and networks using GraPhlAn and Cytoscape [29, 30]. A Pearson correlation matrix between bacteria and bacteriophage was constructed using the R package corrplot [31].
3.5.4 Eukaryotic Virus, Bacteriophage and Fungal Analysis
Raw DNA-Seq and RNA-Seq reads were processed using BiomeSeq [22].
Individual sequence files were first analyzed for per-base sequence quality, per-sequence quality, sequence length distribution and duplicate sequences. Reads with a Phred quality score below 30, reads under 100 base pairs and reads containing only
adapter sequences were removed. The remaining reads were then aligned to the reference host genome (Gallus gallus; Annotation Release 104) using the Burrows-
Wheeler Alignment algorithm [32-34]. Only unmapped reads were extracted and analyzed further. This step removes host genome contamination from the data, increasing analytical efficiency [35]. Determining the amount of host genome sequence in the library is also required when quantifying the results. The remaining reads were then aligned to microbial databases including a bacteriophage, a fungal and an avian-specific viral genome database using Bowtie2.
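The read-filtering step described above can be sketched as follows. This is not BiomeSeq's implementation: it assumes that the quality threshold applies to the mean Phred score of each read, that qualities use Phred+33 encoding, and that adapter removal happens elsewhere. All function names are hypothetical.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)

def passes_filter(sequence, quality_string, min_quality=30, min_length=100):
    """Keep a read only if it is long enough and its mean quality is high enough."""
    if len(sequence) < min_length:
        return False
    return mean_phred(quality_string) >= min_quality

def filter_fastq_records(records, min_quality=30, min_length=100):
    """Yield (header, sequence, quality) tuples that pass both thresholds."""
    for header, sequence, quality in records:
        if passes_filter(sequence, quality, min_quality, min_length):
            yield header, sequence, quality

# Three synthetic reads: 'I' encodes Phred 40 and '#' encodes Phred 2.
reads = [
    ("@read1", "A" * 100, "I" * 100),  # long, high quality -> kept
    ("@read2", "A" * 100, "#" * 100),  # long, low quality  -> removed
    ("@read3", "A" * 50, "I" * 50),    # too short          -> removed
]
kept = [header for header, _, _ in filter_fastq_records(reads)]
print(kept)
```

Reads that survive a filter of this kind would then go to host alignment, with only the unmapped reads carried forward to the microbial databases.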
The avian-specific viral genome database contains full genome reference sequences of both DNA and RNA avian viruses obtained from the National Center for
Biotechnology Information (NCBI) reference sequences. The avian DNA viral database contains 48 viral elements from 9 unique families and the avian RNA viral database contains 63 viral elements from 13 families (Table 3.1). The avian DNA and
RNA viral databases are organized by the classification of their viral structure and genome organization. DNA viruses are organized hierarchically by whether the virus is double- or single-stranded and whether the virus is enveloped or non-enveloped.
RNA viruses are organized hierarchically by whether the virus is double- or single-stranded, negative or positive sense, segmented or non-segmented and whether the virus is enveloped or non-enveloped. The reads were also aligned to the default bacterial, fungal and bacteriophage databases provided by BiomeSeq, which contain
complete and representative genomes obtained from the NCBI Reference Sequence
Database and consist of 3,623, 1,281 and 2,212 genomes, respectively [11].
A sequence similarity-dependent approach for detecting microbes, such as this, contributes to the rapid detection of known microbes while also allowing for the quantification of biodiversity which similarity-independent approaches lack [36]. For each individual sample, the reads that mapped to each microbe were normalized based on the genome length of both microbe and reference per 100,000 host cells using the following equation [37]:
\[
\text{Abundance} = \frac{2 \times \dfrac{\text{number of reads mapped to microbial genome}}{\text{microbe genome size}}}{\dfrac{\text{number of reads mapped to chicken genome}}{\text{chicken genome size}}} \times 10^{5}
\]
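A worked sketch of this normalization is given below. It expresses microbial genome copies per 100,000 host cells as the ratio of microbial coverage (reads divided by genome size) to host coverage, multiplied by 2 and by 10^5. Reading the factor of 2 as the ploidy of the host genome is an assumption, as are the function name and all input values.

```python
import math

def copies_per_100k_host_cells(microbe_reads, microbe_genome_size,
                               host_reads, host_genome_size,
                               host_ploidy=2):
    """Microbial genome copies per 100,000 host cells.

    Coverage is approximated as reads / genome size for both microbe and
    host; host_ploidy=2 is an assumed interpretation of the factor of 2.
    """
    if min(microbe_genome_size, host_reads, host_genome_size) <= 0:
        raise ValueError("genome sizes and host read count must be positive")
    microbe_coverage = microbe_reads / microbe_genome_size
    host_coverage = host_reads / host_genome_size
    return host_ploidy * microbe_coverage / host_coverage * 1e5

# Hypothetical inputs: 100 reads on a 10 kb viral genome, and
# 1,000,000 reads on a 1 Gb host genome (rough placeholder size).
abundance = copies_per_100k_host_cells(
    microbe_reads=100,
    microbe_genome_size=10_000,
    host_reads=1_000_000,
    host_genome_size=1_000_000_000,
)
print(f"{abundance:,.0f} copies per 100,000 host cells")
```

Dividing by genome size on both sides keeps a large genome from appearing more abundant simply because it attracts more reads.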
Relative microbial abundance, mean relative abundance and species frequency were also calculated. These data were represented by pie charts, phylogenetic trees and networks using GraPhlAn and Cytoscape [29, 30]. Alpha diversity was measured using the Shannon diversity index [27]. In addition, stacked bar plots and heatmaps were generated with the R package phytools [38]. A Pearson correlation matrix between bacteria and bacteriophage was constructed using the R package corrplot [31].
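The relative-abundance summaries could be computed along the lines of the sketch below: percentages within each week (each week sums to 100%, as in Figure 3.2), percentages within each taxon (each taxon sums to 100%, as in Figure 3.3), and the number of samples in which a taxon is detected. The input table and function names are hypothetical, and the exact definitions used in the study may differ.

```python
def percent_by_week(table):
    """Column-wise percentages: each week's values sum to 100."""
    n_weeks = len(next(iter(table.values())))
    totals = [sum(row[w] for row in table.values()) for w in range(n_weeks)]
    return {
        taxon: [100.0 * v / totals[w] if totals[w] else 0.0
                for w, v in enumerate(row)]
        for taxon, row in table.items()
    }

def percent_by_taxon(table):
    """Row-wise percentages: each taxon's values sum to 100."""
    result = {}
    for taxon, row in table.items():
        total = sum(row)
        result[taxon] = [100.0 * v / total if total else 0.0 for v in row]
    return result

def detection_frequency(table):
    """Number of samples in which each taxon has non-zero abundance."""
    return {taxon: sum(1 for v in row if v > 0) for taxon, row in table.items()}

# Hypothetical normalized abundances for three weeks; illustrative only.
abundance_table = {
    "Retroviridae": [300.0, 100.0, 50.0],
    "Coronaviridae": [100.0, 100.0, 0.0],
    "Circoviridae": [0.0, 0.0, 50.0],
}

print(percent_by_week(abundance_table))
print(percent_by_taxon(abundance_table))
print(detection_frequency(abundance_table))
```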
Figure 3.1. Normalized relative abundance of detected DNA and RNA viral species at each time point. * No DNA viruses detected at Week 1
[Figure 3.1: panels A and B, each showing Week 0 through Week 7.]
Figure 3.2. Heat map with phylogenetic tree representing the detection intensity of viral families at each individual week. Color corresponds to the range of relative abundance of each week from 0 to 100%. The sum of each column, or week, is 100%.
[Figure 3.2 heat map: columns W0 to W7; rows are viral families grouped as DNA and RNA viruses. Values shown (%), in left-to-right order for each family:
Herpesviridae: 0.17, 0.04, 0.26, 0.15, 0.07, 0.01
Adenoviridae: 53.30, 6.70
Circoviridae: 88.66, 12.05, 15.47, 36.77
Birnaviridae: 0.01, 2.33, 0.10
Astroviridae: 0.74
Picornaviridae: 0.17, 0.01
Retroviridae: 99.45, 44.49, 41.02, 83.30, 9.29, 64.27, 7.11, 37.11
Coronaviridae: 0.38, 54.76, 58.95, 16.28, 1.88, 23.60, 21.79, 19.32
No values are shown for Poxviridae, Hepeviridae, Hepadnaviridae, Genomoviridae, Parvoviridae, Smacoviridae, Reoviridae, Orthomyxoviridae, Phenuiviridae, Bornaviridae, Pneumoviridae, Paramyxoviridae, Caliciviridae or Flaviviridae.]
Figure 3.3. Heat map with phylogenetic tree representing the detection intensity of each viral family from hatching to processing. Color intensity corresponds to the range of relative abundance of each family from 0 to 100%. The sum of each row, or viral family, is 100%.
[Figure 3.3 heat map: columns W0 to W7; rows are viral families grouped as DNA and RNA viruses. Values shown (%), in left-to-right order for each family:
Herpesviridae: 24.86, 5.30, 37.27, 21.54, 10.16, 0.87
Adenoviridae: 88.84, 11.16
Circoviridae: 57.97, 7.88, 10.11, 24.04
Birnaviridae: 0.32, 95.39, 4.28
Astroviridae: 100.00
Picornaviridae: 97.06, 2.94
Retroviridae: 25.76, 11.53, 10.63, 21.58, 2.41, 16.65, 1.84, 9.61
Coronaviridae: 0.19, 27.80, 29.93, 8.26, 0.96, 11.98, 11.06, 9.81
No values are shown for Poxviridae, Hepeviridae, Hepadnaviridae, Genomoviridae, Parvoviridae, Smacoviridae, Reoviridae, Orthomyxoviridae, Phenuiviridae, Bornaviridae, Pneumoviridae, Paramyxoviridae, Caliciviridae or Flaviviridae.]
Figure 3.4. Abundance of A) virus, B) bacteria, C) bacteriophage and D) fungi at Week 0, 1 and 7. Taxa represented at family (A and C) and phylum (B and D).
[Figure 3.4: pie charts at Week 0, Week 1 and Week 7. Panel A (virus): Retroviridae, Coronaviridae, Herpesviridae, Astroviridae, Circoviridae, Adenoviridae, Birnaviridae. Panel B (bacteria): Firmicutes, Proteobacteria, Actinobacteria, Bacteroidetes. Panel C (bacteriophage): Myoviridae, Podoviridae, Siphoviridae, Unclassified. Panel D (fungi): Ascomycota, Basidiomycota.]
Figure 3.5. Phylogenetic tree of A) virus, B) bacteria, C) bacteriophage and D) yeast and fungi. Node diameter indicates average abundance at species (A, C and D) and genera (B) level. Taxonomic levels range from phylum to genera (B), order to species (C) phylum to species (D). Viruses are organized according to structural classification (A).
[Figure 3.5: panels A to D. Labels in panel A: Herpesviridae, Birnaviridae, Adenoviridae, Astroviridae, Circoviridae, Picornaviridae, Retroviridae, Coronaviridae. Labels in panel C: Podoviridae, Siphoviridae, Unclassified, Myoviridae. Labels in panel D: Dothideomycetes, Eurotiomycetes, Lecanoromycetes, Leotiomycetes, Schizosaccharomycetes, Agaricomycetes, Tremellomycetes, Sordariomycetes, Saccharomycetes.]
Figure 3.6. Correlation matrix comparing bacteria and bacteriophage taxa at the family level. Node diameter corresponds to level of correlation. Node color corresponds to the Pearson correlation coefficient and ranges from -1 to 1 indicated by red and blue, respectively.
Figure 3.7. Microbial network of the complete healthy avian respiratory microbiome including detected RNA viruses, DNA viruses, yeast and fungi, bacteria, and bacteriophage. Taxa nodes are arranged by order. Node diameter correlates to taxa frequency.
[Figure 3.7 network: central label "Healthy Avian Respiratory Microbiome" with groups Fungi, RNA Virus, DNA Virus, Bacteriophage and Bacteria.]
Table 3.1. Avian specific viral genome database structure.
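Database | Strandedness (a) | Sense (b) | Segmentation (c) | Envelope (d) | Virus Family | Complete Genomes
Avian DNA Viral Database | Double/Single Stranded | - | - | Enveloped | Hepeviridae | 1
Avian DNA Viral Database | Double/Single Stranded | - | - | Enveloped | Hepadnaviridae | 1
Avian DNA Viral Database | Single Stranded | - | - | Non-Enveloped | Genomoviridae | 3
Avian DNA Viral Database | Single Stranded | - | - | Non-Enveloped | Parvoviridae | 7
Avian DNA Viral Database | Single Stranded | - | - | Non-Enveloped | Circoviridae | 10
Avian DNA Viral Database | Single Stranded | - | - | Non-Enveloped | Smacoviridae | 3
Avian DNA Viral Database | Double Stranded | - | - | Enveloped | Poxviridae | 3
Avian DNA Viral Database | Double Stranded | - | - | Enveloped | Herpesviridae | 6
Avian DNA Viral Database | Double Stranded | - | - | Non-Enveloped | Adenoviridae | 14
Avian RNA Viral Database | Double Stranded | - | Segmented | Non-Enveloped | Reoviridae | 5
Avian RNA Viral Database | Double Stranded | - | Segmented | Non-Enveloped | Birnaviridae | 1
Avian RNA Viral Database | Single Stranded | Positive | Non-Segmented | Enveloped | Retroviridae | 5
Avian RNA Viral Database | Single Stranded | Positive | Non-Segmented | Enveloped | Flaviviridae | 3
Avian RNA Viral Database | Single Stranded | Positive | Non-Segmented | Enveloped | Coronaviridae | 5
Avian RNA Viral Database | Single Stranded | Positive | Non-Segmented | Non-Enveloped | Astroviridae | 5
Avian RNA Viral Database | Single Stranded | Positive | Non-Segmented | Non-Enveloped | Caliciviridae | 1
Avian RNA Viral Database | Single Stranded | Positive | Non-Segmented | Non-Enveloped | Picornaviridae | 17
Avian RNA Viral Database | Single Stranded | Negative | Segmented | Enveloped | Orthomyxoviridae | 16
Avian RNA Viral Database | Single Stranded | Negative | Segmented | Enveloped | Phenuiviridae | 1
Avian RNA Viral Database | Single Stranded | Negative | Non-Segmented | Enveloped | Bornaviridae | 3
Avian RNA Viral Database | Single Stranded | Negative | Non-Segmented | Enveloped | Pneumoviridae | 1
Avian RNA Viral Database | Single Stranded | Negative | Non-Segmented | Enveloped | Paramyxoviridae | 14
a single stranded, double stranded or single/double stranded DNA and RNA viruses
b positive-sense or negative-sense RNA viruses
c segmented or non-segmented RNA viruses
d enveloped or non-enveloped DNA and RNA viruses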
Table 3.2. Shannon diversity of respiratory microbes in a healthy broiler flock over time.
Time | RNA Virus | DNA Virus | Bacteria | Phage | Fungi
Placement | 0.041 | 0.000 | 2.707 | 2.218 | 0.022
Week 1 | 1.290 | 0.000 | 2.468 | 2.531 | 0.286
Week 2 | 1.108 | 0.000 | - | 2.111 | 0.096
Week 3 | 0.722 | 0.000 | 2.251 | 2.922 | 0.165
Week 4 | 0.738 | 0.013 | 2.499 | 2.756 | 0.151
Week 5 | 0.910 | 0.035 | 1.925 | 2.294 | 0.026
Week 6 | 1.480 | 0.534 | - | 2.134 | 0.013
Week 7 | 1.092 | 0.867 | 1.935 | 2.087 | 0.134
REFERENCES
1. Human Microbiome Project, C., A framework for human microbiome research.
Nature, 2012. 486(7402): p. 215-21.
2. Bosch, A.A., et al., Viral and bacterial interactions in the upper respiratory
tract. PLoS Pathog, 2013. 9(1): p. e1003057.
3. Bakaletz, L.O., Viral potentiation of bacterial superinfection of the respiratory
tract. Trends Microbiol, 1995. 3(3): p. 110-4.
4. Pettigrew, M.M., et al., Microbial interactions during upper respiratory tract
infections. Emerg Infect Dis, 2008. 14(10): p. 1584-91.
5. de Steenhuijsen Piters, W.A., et al., Nasopharyngeal Microbiota, Host
Transcriptome, and Disease Severity in Children with Respiratory Syncytial
Virus Infection. Am J Respir Crit Care Med, 2016. 194(9): p. 1104-1115.
6. Teo, S.M., et al., The infant nasopharyngeal microbiome impacts severity of
lower respiratory infection and risk of asthma development. Cell Host
Microbe, 2015. 17(5): p. 704-15.
7. Gross, W.B., Factors affecting the development of respiratory disease complex
in chickens. Avian Dis, 1990. 34(3): p. 607-10.
8. Kleven, S.H., Mycoplasmas in the etiology of multifactorial respiratory
disease. Poult Sci, 1998. 77(8): p. 1146-9.
9. Bond, S.L., et al., Upper and lower respiratory tract microbiota in horses:
bacterial communities associated with health and mild asthma (inflammatory
airway disease) and effects of dexamethasone. BMC Microbiol, 2017. 17(1): p.
184.
10. De Boeck, C., et al., Longitudinal monitoring for respiratory pathogens in
broiler chickens reveals co-infection of Chlamydia psittaci and
Ornithobacterium rhinotracheale. J Med Microbiol, 2015. 64(5): p. 565-574.
11. Gaeta N, L.S., Teixeira A, Ganda E, Oikonomou G, Gregory L, Bicalho R.,
Deciphering upper respiratory tract microbiota complexity in healthy calves
and calves that develop respiratory disease using shotgun metagenomics. J
Dairy Sci. , 2017. 100: p. 1445-1458.
12. Glendinning, L., G. McLachlan, and L. Vervelde, Age-related differences in
the respiratory microbiota of chickens. PLoS One, 2017. 12(11): p. e0188455.
13. Johnson TJ, Y.B., Noll S, Cardona C, Evans NP, Karnezos P, Ngunjiri JM,
Abundo MC, Lee C-W, A consistent and predictable commercial broiler
chicken bacterial microbiota in antibiotic-free production displays strong
correlations with performance. Appl. Environ. Micro., 2018. 84: p. e00362-18.
14. Shabbir, M.Z., et al., Microbial communities present in the lower respiratory
tract of clinically healthy birds in Pakistan. Poult Sci, 2015. 94(4): p. 612-20.
15. Clarridge, J.E., 3rd, Impact of 16S rRNA gene sequence analysis for
identification of bacteria on clinical microbiology and infectious diseases. Clin
Microbiol Rev, 2004. 17(4): p. 840-62.
16. De Santis T, H.P., Larsen N, Rojas M, Brodie E, Keller K, Huber T, Dalevi D,
Hu P, Andersen G., Greengenes, a chimera-checked 16S rRNA gene. 2016.
17. Quast C, P.E., Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glockner F.,
The SILVA ribosomal RNA gene database project: improved data processing
and web-based tools. Nucl Acids Res., 2013. 41: p. 590-596.
18. Meyer F, P.D., D’Souza M, Olson R, Glass E, Kubal M, Paczian T, Rodriguez
A, Stevens R, Wilke A, Wilkening J, Edwards R., The metagenomics RAST
server- a public resource for the automatic phylogenetic and functional
analysis of metagenomes. BMC Bioinformatics, 2008. 9: p. 386.
19. Caporaso J, K.J., Stombaugh J, Bittinger K, Bushman F, Costello E, Fierer N,
Peña A, Goodrich J, Gordon J, Huttley G, Kelley ST, Knights D, Koenig JE,
Ley R, Lozupone C, McDonald D, Muegge B, Pirrung M, Reeder J, Sevinsky
JR, Turnbaugh P, Walters W, Widmann J, Yatsunenko T, Zaneveld J, Knight
R., QIIME allows analysis of high-throughput community sequencing data.
Nature Methods, 2010. 7: p. 335-336.
20. Schloss P, W.S., Ryabin T, Hall J, Hartman M, Hollister E, Lesniewski R,
Oakley B, Parks D, Robinson C, Sahl J, Stres B, Thallinger G, Van Horn D,
Weber C. , Introducing mothur: Open-source, platform-independent,
community-supported software for describing and comparing microbial
communities. Appl Enviro Microbiol, 2009. 75: p. 7537-7541.
21. Zou, S., et al., Research on the human virome: where are we and what is next.
Microbiome, 2016. 4(1): p. 32.
22. Mulholland, K.A. and C.L. Keeler, BiomeSeq: A Tool for the Characterization
of Animal Microbiomes from Metagenomic Data. bioRxiv, 2019: p. 800995.
23. Yang, S., et al., Metagenomic Analysis of Bacteria, Fungi, Bacteriophages, and
Helminths in the Gut of Giant Pandas. Front Microbiol, 2018. 9: p. 1717.
24. Day, J.M., et al., Comparative analysis of the intestinal bacterial and RNA
viral communities from sentinel birds placed on selected broiler chicken farms.
PLoS One, 2015. 10(1): p. e0117210.
25. Day, J.M. and L. Zsak, Recent progress in the characterization of avian enteric
viruses. Avian Dis, 2013. 57(3): p. 573-80.
26. Kozich J, W.S., Baxter N, Highlander S, Schloss P., Development of a dual-index
sequencing strategy and curation pipeline for analyzing amplicon
sequence data on the MiSeq Illumina sequencing platform. Appl. Environ.
Microbiol., 2013. 79: p. 5112-5120.
27. Lemos, L.N., et al., Rethinking microbial diversity analysis in the high
throughput sequencing era. J Microbiol Methods, 2011. 86(1): p. 42-51.
28. Ludwig, J. and J. Reynolds, Statistical Ecology, ed. Wiley. 1988, New York.
29. Asnicar, F., et al., Compact graphical representation of phylogenetic data and
metadata with GraPhlAn. PeerJ, 2015. 3: p. e1029.
30. Shannon, P., et al., Cytoscape: a software environment for integrated models of
biomolecular interaction networks. Genome Res, 2003. 13(11): p. 2498-504.
31. Wei, T. and V. Simko, corrplot: Visualization of a correlation matrix. R
package version 0.73, 2013. 230(231): p. 11.
32. Hillier, L.W., et al., Sequence and comparative analysis of the chicken genome
provide unique perspectives on vertebrate evolution. Nature, 2004. 432(7018):
p. 695-716.
33. Langmead, B. and S.L. Salzberg, Fast gapped-read alignment with Bowtie 2.
Nat Methods, 2012. 9(4): p. 357-9.
34. Li, H. and R. Durbin, Fast and accurate long-read alignment with Burrows-
Wheeler transform. Bioinformatics, 2010. 26(5): p. 589-95.
35. Daly G, L.R., Rowe W, Stubbs S, Wilkinson M, Ramirez-Gonzalez R, Mario
C, Bernal W, Heeney J. , Host subtraction, filtering and assembly validations
for novel viral discovery using next generation sequencing data. PLoS One,
2015. 10(6).
36. Herath, D., et al., Assessing Species Diversity Using Metavirome Data:
Methods and Challenges. Comput Struct Biotechnol J, 2017. 15: p. 447-455.
37. Moustafa, A., et al., The blood DNA virome in 8,000 humans. PLoS Pathog,
2017. 13(3): p. e1006292.
38. Revell, L.J., phytools: an R package for phylogenetic comparative biology
(and other things). Methods in ecology and evolution, 2012. 3(2): p. 217-223.
Chapter 4
CHARACTERIZATION OF THE RESPIRATORY MICROBIOME OF CHICKENS WITH RESPIRATORY DISEASE
4.1 Summary
Respiratory diseases in commercial poultry are a clinical manifestation of a broader dysbiosis of the respiratory microbial community. Although the bacterial component of the healthy broiler chicken respiratory microbiome has recently been characterized, there are limited tools available to identify the eukaryotic virus, bacteriophage, and fungal composition of the broiler respiratory tract. BiomeSeq is a computational tool that utilizes a sequence similarity-dependent approach and a comprehensive workflow incorporating nucleotide data generated through high-throughput sequencing platforms to determine the composition of the eukaryotic virus, bacterial, bacteriophage, and fungal microbiomes. This tool was used to determine the feasibility of generating a comprehensive assessment of the respiratory microbiome from birds diagnosed with infectious laryngotracheitis and/or respiratory disease complex. To that end, two samples were compared: a pooled sample of tracheal swabs collected from normal healthy 7-week-old broilers, and a pooled sample from three clinical submissions of poultry respiratory disease. It was confirmed that the diseased birds harbored
infectious laryngotracheitis virus (89% relative abundance in the diseased sample). A significant dysbiosis in the bacterial component of the microbiome was also observed.
An increase in the abundance of Escherichia coli, a loss of commensal bacteria such as Corynebacterium falsenii, and the introduction of Ornithobacterium rhinotracheale, a respiratory pathogen, were observed in the diseased tracheal samples.
Information learned about the respiratory microbiome using this approach can be represented as a microbial network, which can be used to make hypotheses about the relationships among the microbial components composing the microbial ecology of the avian respiratory tract.
4.2 Introduction
Since the Human Microbiome Project was initiated in 2008, considerable knowledge has been gained about the composition and function of the microbial species that inhabit different ecological niches in the human body [1]. These microbial communities, or “microbiomes”, interact with each other and with their host to the benefit of both systems, and imbalances in these microbial communities can be associated with specific diseases. For example, the lung microbiota of individuals with asthma demonstrate an increase in Proteobacteria [2] and a decrease in Firmicutes,
Actinobacteria and Saccharibacteria [3]. And while some studies addressed the viral component of the gut, more recent studies have examined the blood virome in healthy and diseased individuals [4]. With regard to the respiratory tract, dysbiosis in the
nasopharynx may lead to the acquisition of novel viral pathogens or viral co-infections
[5].
Research on the avian microbiome has primarily been directed at the prokaryotic component of the gastrointestinal tract, and several studies have clearly shown that the introduction of poultry pathogens negatively impacts the gut microbiome [6, 7]. Only recently has the healthy broiler chicken respiratory bacterial microbiome been characterized [8]. In determining the baseline microbial composition of the trachea in normal flocks, Lactobacillus was found to be the dominant taxon, and other bacterial taxa were also correlated with broiler performance. However, no systematic efforts have been made to identify the eukaryotic virus, bacteriophage, or fungal composition of the broiler respiratory tract.
Although defining the bacterial microbiome is crucial for understanding the relationship between health and disease, the inclusion of these other microbial components in the avian respiratory microbiome and an understanding of their interactions are required in order to develop more complete homeostatic or disequilibrium models. Respiratory diseases are a major cause of economic losses in poultry production. Commercial poultry are vaccinated in ovo and at hatch with a variety of live attenuated viral vaccines. Some vaccines increase the severity of infections with other viruses [9] while interference between viral vaccines has also been reported [10]. Under commercial conditions respiratory infections involving multiple agents are common.
Infectious laryngotracheitis virus (ILTV) is an alphaherpesvirus that causes mild to severe respiratory infections in chickens [11]. The virus enters through the upper respiratory tract or conjunctiva. Viral replication is generally limited to the nares, oropharynx, and trachea, while viremia is rarely observed [12]. Birds exhibiting the peracute form of infectious laryngotracheitis (ILT) have difficulty breathing, gasp while extending their necks, may produce a bloody tracheal exudate, and exhibit high mortality. Mild forms of ILT may present with conjunctivitis, nasal discharge and coughing. Under commercial conditions, a clinical diagnosis of ILT can be complicated by co-infections with other viral agents (either pathogens or vaccines) and/or the development of secondary bacterial infections. Combined with suboptimal environmental or management conditions, a multifactorial respiratory disease complex (RDC) often develops. The etiology of RDC is difficult to reproduce under defined laboratory conditions because the microbial and environmental conditions found in a commercial broiler facility cannot readily be recreated.
One reason why RDC is difficult to reproduce is that the microbial ecology of the respiratory tract of healthy and diseased birds in a commercial setting is poorly characterized. Therefore, the purpose of this study was to determine the feasibility of generating sufficient next-generation DNA and RNA sequencing data to compare the respiratory microbiome (including the bacterial, bacteriophage, eukaryotic viral, and fungal components) of a healthy broiler flock with that of flocks diagnosed with RDC.
4.3 Materials and Methods
4.3.1 Sample Collection
Tracheal swabs were collected at seven weeks of age from a healthy antibiotic-free commercial broiler flock grown in the Jones-Hamilton facility at the University of Delaware Carvel Research and Education Center. Two samples, each containing six individual swabs in 3 ml of buffer PV1 (Qiagen), were collected and frozen immediately on dry ice.
Tracheal swabs were also collected from three respiratory clinical isolates submitted to the University of Delaware Poultry Health System (UDPHS) Lasher Laboratory at the University of Delaware. Two of the clinical samples were obtained from ~50-day-old roaster flocks, while the third sample was obtained from a 31-day-old broiler flock. Tracheal swabs were collected in BHI broth and frozen immediately at -80°C. The clinical diagnosis of all three flocks was infectious laryngotracheitis (ILT), RDC, or an ILT complicated by RDC. All three flocks tested positive for infectious bronchitis virus (IBV) and ILTV by PCR, and tested negative for avian influenza virus (AIV). One broiler flock tested positive for Escherichia coli in the air sac and pericardium.
4.3.2 Nucleic Acid Extraction and Sequencing
After thawing on ice, the healthy and the diseased samples were each pooled, gently homogenized, split evenly into two tubes and then centrifuged (7,000 × g, 20 min, 4°C) to form pellets. Total RNA was isolated from one pellet using the Qiagen Viral Nucleic Acid extraction kit following the manufacturer’s protocol. DNA was isolated from the duplicate pellet using the Qiagen Blood and Tissue Kit following the manufacturer’s protocol. Library construction and sequencing on the Illumina HiSeq platform, producing 1 × 100 single-end reads, were performed at the University of Delaware Sequencing Core Facility.
4.3.3 Eukaryotic Virus, Bacteriophage, Yeast and Fungal Analysis
Raw DNA-Seq and RNA-Seq reads were processed using BiomeSeq [13]. Individual sequence files were first analyzed for per-base sequence quality, per-sequence quality, sequence length distribution and duplicate sequences. Reads with a Phred quality score below 30, reads under 100 base pairs in length and adapter sequences were removed. The remaining reads were then aligned to the reference host genome (Gallus gallus; Annotation Release 104) using the Burrows-Wheeler alignment algorithm [14, 15]. Only unmapped reads were extracted and analyzed further. This step removes host genome contamination from the data, increasing analytical efficiency [16]. The amount of host genome sequence in the library must also be determined in order to quantify the results. The remaining reads were then aligned to microbial databases, including a bacteriophage, a fungal and an avian-specific viral genome database, using Bowtie2 [17].
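The read-level quality filter described above can be sketched in a few lines of Python. This is a minimal illustration and not the BiomeSeq implementation: it assumes Phred+33 quality encoding, interprets "a Phred quality score below 30" as the mean score of the read, and omits adapter trimming. The names `mean_phred`, `passes_filter` and `filter_reads` are hypothetical.

```python
from typing import Iterable, List, Tuple

# A FASTQ record reduced to (header, sequence, quality string).
FastqRecord = Tuple[str, str, str]


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_filter(record: FastqRecord,
                  min_mean_q: float = 30.0,
                  min_len: int = 100) -> bool:
    """True if the read is long enough and its mean quality is high enough."""
    _, seq, qual = record
    # Reject empty, short, or malformed records before computing the mean.
    if not seq or len(seq) < min_len or len(seq) != len(qual):
        return False
    return mean_phred(qual) >= min_mean_q


def filter_reads(records: Iterable[FastqRecord],
                 min_mean_q: float = 30.0,
                 min_len: int = 100) -> List[FastqRecord]:
    """Keep only the reads that pass both thresholds."""
    return [r for r in records if passes_filter(r, min_mean_q, min_len)]


# Hypothetical reads: 'I' encodes Phred 40 and '#' encodes Phred 2.
reads = [
    ("@read1", "A" * 100, "I" * 100),  # long enough, high quality: kept
    ("@read2", "A" * 50, "I" * 50),    # shorter than 100 bp: dropped
    ("@read3", "A" * 100, "#" * 100),  # low quality: dropped
]
kept = filter_reads(reads)
print([header for header, _, _ in kept])
```

Only reads that survive this kind of filter would be passed on to host alignment; the thresholds are exposed as parameters because the exact criterion applied per read is an assumption here.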
The avian-specific viral genome database contains full genome reference sequences of both DNA and RNA avian viruses obtained from the National Center for Biotechnology Information (NCBI) reference sequences (Table 4.1). The avian DNA viral database contains 48 viral elements from 9 unique families and the avian RNA viral database contains 63 viral elements from 13 families. The avian DNA and RNA viral databases are organized by the classification of their viral structure and genome organization. DNA viruses are organized hierarchically by whether the virus is double- or single-stranded and whether the virus is enveloped or non-enveloped. RNA viruses are organized hierarchically by whether the virus is double- or single-stranded, negative or positive sense, segmented or non-segmented and whether the virus is enveloped or non-enveloped. The reads were also aligned to the default bacterial, fungal and bacteriophage databases provided by BiomeSeq, which contain complete and representative genomes obtained from the NCBI Reference Sequence Database and consist of 3,623, 1,281 and 2,212 genomes, respectively [18].
A sequence similarity-dependent approach for detecting microbes, such as this one, contributes to the rapid detection of known microbes while also allowing for the quantification of biodiversity, which similarity-independent approaches lack [19]. For each individual sample, the reads that mapped to each microbe were normalized by the genome lengths of both the microbe and the host and expressed per 100,000 host cells using the following equation [4]:
\[
\text{Microbial abundance} =
\frac{\dfrac{\text{number of reads mapped to microbial genome}}{\text{microbial genome size}}}
     {\dfrac{\text{number of reads mapped to host genome}}{\text{host genome size}}}
\times 10^{5}
\]
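The normalization described in the text (reads per unit of microbial genome length, relative to reads per unit of host genome length, scaled to 100,000 host cells) can be sketched as follows. This is an illustration under stated assumptions rather than the BiomeSeq code: it uses the plain ratio form, so any additional constant factor in the original equation is not represented, and the function name and the example numbers are hypothetical.

```python
def genomes_per_100k_host_cells(microbe_reads: int,
                                microbe_genome_size: int,
                                host_reads: int,
                                host_genome_size: int,
                                host_cells: int = 100_000) -> float:
    """Length-normalized microbial abundance per `host_cells` host cells.

    Assumes the simple ratio form: microbial reads per base of microbial
    genome, divided by host reads per base of host genome.
    """
    if min(microbe_genome_size, host_reads, host_genome_size) <= 0:
        raise ValueError("genome sizes and host read count must be positive")
    microbe_density = microbe_reads / microbe_genome_size
    host_density = host_reads / host_genome_size
    return host_cells * microbe_density / host_density


# Hypothetical numbers chosen only to make the arithmetic easy to follow:
# 10 reads on a 1,000 bp genome vs. 100 reads on a 100,000 bp host genome.
example = genomes_per_100k_host_cells(10, 1_000, 100, 100_000)
print(round(example))
```

Dividing by genome length on both sides is what prevents a large microbial genome from appearing more abundant simply because it offers more sequence for reads to map to.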
Relative microbial abundance was calculated and alpha diversity was measured using the Shannon diversity index [20, 21].
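Both summary statistics are short computations. The sketch below shows percent relative abundance and the Shannon index H = -Σ p_i ln p_i; the use of the natural logarithm is an assumption, since the text does not state the base, and the function names are hypothetical.

```python
import math
from typing import Dict, Iterable


def relative_abundance(counts: Dict[str, float]) -> Dict[str, float]:
    """Percent of the total contributed by each taxon."""
    total = sum(counts.values())
    return {name: 100.0 * value / total for name, value in counts.items()}


def shannon_index(counts: Iterable[float]) -> float:
    """Shannon diversity H = -sum(p * ln p), skipping zero counts."""
    values = [c for c in counts if c > 0]
    total = sum(values)
    return -sum((c / total) * math.log(c / total) for c in values)


# Hypothetical community: one dominant taxon versus an even community.
print(relative_abundance({"taxon_a": 1, "taxon_b": 3}))
print(shannon_index([97, 3]) < shannon_index([50, 50]))
```

As a check against the reported values, 6,932 of the 17,038 reads mapped to the bacterial database in the healthy sample works out to about 40.7%, which agrees with the relative abundance listed for Gallibacterium anatis in Table 4.4. A community dominated by one taxon yields a lower index than an even one, which is the pattern used to compare samples in Figure 4.2.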
4.4 Results
4.4.1 Identifying the broiler respiratory microbiome and a comparison of the respiratory virome between healthy and diseased birds.
In this feasibility study, a healthy, seven-week-old broiler flock was sampled to represent a healthy respiratory microbiome. Twelve tracheal swabs were collected, pooled and then split in half for DNA-Seq and RNA-Seq library construction and sequencing. Similarly, tracheal swabs from three clinical cases presented to the UDPHS with infectious laryngotracheitis or respiratory disease complex were pooled and then split for DNA-Seq and RNA-Seq library construction and sequencing. All three clinical samples had been screened by PCR as positive for infectious laryngotracheitis virus (ILTV) and infectious bronchitis virus (IBV), and negative for avian influenza virus (AIV). E. coli was identified in the air sac and pericardium of one sample. The raw DNA-Seq and RNA-Seq reads were processed through BiomeSeq as described in the Materials and Methods. As shown in Table 4.2, between 462,382 and 20,884,583 sequencing reads in each of the four libraries could not be mapped to the chicken genome. From these sequences, a total of 147,097 sequencing reads were aligned to one of the four microbial databases derived from NCBI reference sequences. Knowing the sequence length of the viral genomes represented in the avian eukaryotic virus database, the size of the chicken genome, and the number of sequences that map to the chicken genome permits the quantification of the number of viral genomes per 100,000 chicken cells in the indicated sample (Table 4.3). From that value the relative abundance of each identified viral species in the sample can be determined. The heat map in Figure 4.1 presents these data graphically. Of the 19 avian eukaryotic virus families in the avian eukaryotic virus database, representatives from five families were found in each sample. Herpesviridae (Gallid herpesvirus 1, or ILTV) was identified only in the diseased sample and represented 28,937 (89.1%) of the recovered viral sequences. Adenoviridae, Circoviridae, Coronaviridae and Retroviridae sequences were identified in both samples. The healthy flock had a more diverse eukaryotic viral population (Figure 4.2) because of the disproportionate representation of ILTV sequences in the diseased sample.
4.4.2 Comparison of the bacterial microbiome between healthy and diseased birds.
Along with the changes observed in the eukaryotic virome, significant changes were observed in the bacterial microbiome of diseased birds when compared with the control flock. Table 4.4 lists the bacterial species found in each sample at a frequency of >1%, and the loss in bacterial diversity observed in the diseased sample is shown in Figure 4.2. Significant alterations of the bacterial microbiome in the diseased flock were observed (Figure 4.3). Gallibacterium anatis and Corynebacterium falsenii, two commensal bacterial species commonly found in the microflora of the upper respiratory tract of chickens, were the most abundant bacterial species recovered from the healthy broiler flock, representing a combined 66.1% of the identified bacterial sequences (Figure 4.3A). In the diseased flock, E. coli abundance increased three-fold, making it the most abundant species. Ornithobacterium rhinotracheale, a bacterium associated with avian respiratory diseases, was observed only in the diseased birds and was the third most abundant species. Together, these two bacterial species represented 68.4% of the identified bacterial sequences in the diseased sample.
4.4.3 Comparison of the bacteriophage and fungal microbiomes between healthy and diseased birds.
Large changes were observed in the bacteriophage components of the respiratory microbiome between healthy and diseased birds. When identifying bacteriophage species present at >1% in the two samples, 97.3% of the bacteriophage sequences found in the healthy birds were represented by two species, both enterobacteria phage from two different families (Table 4.5). In contrast, the top 10 most abundant bacteriophage species in the diseased sample represented only 67.2% of the identified bacteriophage sequences. In the diseased sample there were 20 different bacteriophage species represented at a frequency of at least 1%. The increase in bacteriophage diversity can be calculated using the Shannon diversity index (Figure 4.2) and is represented in Figure 4.4. Whereas the majority of the bacteriophage in the healthy respiratory microbiome are represented by two families, in the diseased flock the sequences are distributed more evenly among three families and an unclassified group. The unclassified group (Enterobacteria phage P4) represented the most abundant bacteriophage population in the diseased sample (20.5%). The increased diversity in this sample can be associated with an increase in the diversity of Enterobacteria and Salmonella phage.
In contrast to the other components of the respiratory microbiome, the fungal component was found to be relatively consistent between the healthy flock and the diseased birds. Wickerhamomyces ciferrii was the predominant fungal species identified in both samples (Table 4.6).
4.4.4 Microbial network analysis.
Figure 4.5 represents the eukaryotic virus, bacterial, bacteriophage, and fungal microbiome results as a microbial network. The eukaryotic virus component, representing all of the identified viral families, clearly identifies the unique presence of ILTV in the diseased sample and the small (0.1%) unique presence of infectious bursal disease virus sequences in the healthy flock respiratory sample. The bacterial network demonstrates the increased bacterial diversity found in the healthy flock and the replacement and shift in the bacterial microbiome observed in the diseased birds. It does not demonstrate the significant shift in the distribution of commonly found species (Escherichia coli and Gallibacterium anatis) or that Ornithobacterium rhinotracheale represents a significant portion of the bacterial microbiome in diseased birds. The shift in bacterial composition was accompanied by an increase in the diversity of bacteriophage species in the diseased birds, while the same two fungal species were identified in both groups of birds.
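The node coloring used for such a network reduces to set membership: a species is unique to the healthy flock, unique to the diseased birds, or shared. The sketch below illustrates this with the bacterial species listed in Table 4.4 and the color scheme from the Figure 4.5 caption; the function name and data layout are hypothetical, and the code does not reproduce how the published figure was actually drawn.

```python
from typing import Dict, Set

# Colors follow the Figure 4.5 caption: blue = healthy only,
# red = diseased only, green = detected in both.
NODE_COLORS = {"healthy": "blue", "diseased": "red", "both": "green"}


def classify_nodes(healthy: Set[str], diseased: Set[str]) -> Dict[str, str]:
    """Label every detected species by the sample(s) it was found in."""
    labels = {}
    for species in sorted(healthy | diseased):
        if species in healthy and species in diseased:
            labels[species] = "both"
        elif species in healthy:
            labels[species] = "healthy"
        else:
            labels[species] = "diseased"
    return labels


# Bacterial species present at >1% in each sample (Table 4.4).
healthy_bacteria = {
    "Gallibacterium anatis", "Corynebacterium falsenii", "Escherichia coli",
    "Serratia marcescens", "Neisseria sicca",
    "Methylobacterium radiotolerans", "Corynebacterium stationis",
    "Cronobacter sakazakii",
}
diseased_bacteria = {
    "Escherichia coli", "Gallibacterium anatis",
    "Ornithobacterium rhinotracheale", "Proteus mirabilis",
    "Citrobacter freundii",
}

nodes = classify_nodes(healthy_bacteria, diseased_bacteria)
for species, label in nodes.items():
    print(f"{species}: {NODE_COLORS[label]}")
```

Because this classification records presence only, it cannot show changes in the proportion of a shared species, which is the limitation noted above for Escherichia coli and Gallibacterium anatis.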
4.5 Discussion
Avian respiratory diseases, even with a specific diagnosis such as infectious laryngotracheitis, actually represent the clinical manifestation of a broader dysbiosis of the respiratory microbial community. This report evaluates an approach for determining the composition of the avian respiratory microbiome under such conditions. This was done by sampling a healthy broiler flock at 7 weeks of age and comparing that sample to a pooled sample composed of three clinical isolates that had been confirmed by clinical, microbiological, and molecular tests as having infectious laryngotracheitis and/or respiratory disease complex. Total RNA and DNA were extracted from these samples and used to construct and sequence libraries, generating next-generation sequence data. DNA-Seq data were used to identify the presence of eukaryotic DNA virus genetic material, as well as bacterial, bacteriophage, and fungal sequences. RNA-Seq data, derived from total RNA, were used to identify the presence of eukaryotic RNA virus genetic material, which would not be detected from DNA sequencing data. The resulting sequence data, greater than 127,000,000 reads over four libraries, were processed with BiomeSeq, a computational tool which utilizes a sequence-dependent approach and a comprehensive workflow to determine the composition of the eukaryotic virus, bacterial, bacteriophage, and fungal microbiomes. This approach successfully identified changes in the composition and proportions of the eukaryotic viral, bacterial, and bacteriophage microbiomes, while detecting no substantive changes in the fungal microbiome component, demonstrating the utility of this approach for this limited and controlled sample.
Analysis of the eukaryotic viral microbiome utilized a customized avian eukaryotic virus database. This database contains 124 complete viral genome sequences distributed over 24 virus families. Sequences from six of these families were identified in the two samples. As expected, the most striking observation was the increase in number of viral sequences (4.5-fold greater in the diseased sample) and the overwhelming presence of ILTV (89.1% relative abundance) in the diseased birds, confirming the primary clinical diagnosis. With 29,000 ILTV reads being recovered, this approach would also permit the resequencing of this clinical isolate of ILTV for comparison with vaccine and field viruses. Expected normal eukaryotic viral flora was also observed with the presence of Adenoviridae, Circoviridae, and transcripts from endogenous retroviruses identified in both samples. Small amounts of Birnaviridae in the healthy sample are not unexpected considering the ubiquitous nature of infectious bursal disease virus in the production environment.
The metagenomic approach evaluated in this report also permitted the first contemporary comparison of the bacterial microbiome from diseased and healthy birds, confirming the complexity of the changes in the microbial ecology of birds diagnosed with respiratory disease. Remarkable changes in the composition and distribution of the major bacterial components of the respiratory microbiome were observed. The composition of the healthy flock’s bacterial microbiome was dominated by Gallibacterium anatis, Corynebacterium falsenii, and Escherichia coli. C. falsenii is recognized as a commensal bacterium of the respiratory tract, while G. anatis and E. coli are also found in the normal flora. However, these bacterial species may also participate in respiratory disease complex as opportunistic pathogens [22]. In the diseased birds, the C. falsenii commensal component was lost, E. coli became the dominant bacterial component (present at 3 times the abundance observed in the healthy flock), and the third most abundant species observed (13.1%) was Ornithobacterium rhinotracheale, a well-characterized avian pathogen [23].
In previous studies of the broiler respiratory bacterial microbiome, Lactobacillus was found to be the dominant bacterial taxon [8]. Lactobacillus was found, but in relatively low abundance, in our samples. 16S rRNA sequencing confirmed the composition, but not the relative abundance, of the bacteria identified in the tracheal samples from the diseased birds (data not shown). Both methods identified the same two most abundant bacteria in the healthy sample, G. anatis and C. falsenii. However, while the high-throughput sequencing approach identified E. coli as the third most abundant bacterial species in the healthy flock, 16S analysis identified Lactobacillus (16%) and Staphylococcus (10%) as the next most abundant bacterial taxa. Although the major findings were confirmed by 16S analysis, the differences may be due to a number of factors. While this feasibility study examined two pooled tracheal samples collected from broiler birds on the DelMarVa peninsula, the Johnson study [8] collected 2,309 samples from 37 commercial flocks in Minnesota. Thus, the small sample size of this study, as well as differences in geographic location, poultry integrator, management practices and seasonality, could be factors influencing these differences.
Compared to the bacterial and viral components of the avian respiratory tract, little is known regarding the bacteriophage and fungal components. The bacteriophage microbiome was the only microbiome component that exhibited an increase in diversity in the diseased birds when compared to healthy birds (Figure 4.2). In healthy birds 97% of the identified bacteriophage sequences represented two bacteriophage from two families (Microviridae and Myoviridae), while there were 13 bacteriophage species present at >2% abundance in the sample from the diseased birds. There were significant representations from the Microviridae and Myoviridae, as well as the Siphoviridae and unclassified bacteriophage, in this sample (Figure 4.4). The increase in E. coli in the respiratory tract of diseased birds may be contributing to the observed increase in bacteriophage diversity. In both samples the dominant fungal species were Wickerhamomyces ciferrii and Penicillium chrysogenum. W. ciferrii has been collected from wild birds [24], while P. chrysogenum can be found in damp indoor environments [25]. More research is needed to determine if the relative increase in fungal abundance in the respiratory tract of diseased birds can be confirmed. Taken together, data generated about the unique biological components of the respiratory microbiome can be represented as a microbial network. This enables a comparison of the various microbial elements in a manner that can reveal associations that can be proposed and evaluated. Figure 4.5 demonstrates the unique presence of ILTV, the significant changes in the bacterial component, and the increase in bacteriophage diversity in the pooled tracheal sample collected from diseased birds, when compared to the control flock.
The success of this experiment warrants further studies to confirm its utility. These should include examining the respiratory microbiome of individual birds (to determine bird-to-bird variability), collecting samples directly from multiple flocks exhibiting various clinical respiratory presentations, and determining seasonal effects.
The BiomeSeq tool has also been used to determine the composition of the microbiome in other environments, such as the enteric tract (data not shown). Of particular interest is determining the relationship between the enteric and respiratory microbiomes in birds experiencing respiratory diseases, enteric diseases, and diseases affecting immune function. A further desire is to see the approach in this study used in controlled experiments where the interplay between microbial components from the same or different kingdoms can be artificially determined and manipulated in attempts to study the complex etiology of avian diseases.
Figure 4.1. Heat map with phylogenetic tree representing the detection intensity of viral families in each sample. Color corresponds to the range of relative abundance in each sample from 0 to 100%. Green: 0-1%; yellow: 1-25%; orange: 25-75%; and red: 75-100%. The sum of each column, or sample, is 100%.
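[Heat map not reproduced. Columns: Healthy, Diseased. DNA virus rows: Poxviridae, Herpesviridae, Adenoviridae, Hepeviridae, Hepadnaviridae, Genomoviridae, Circoviridae, Parvoviridae. RNA virus rows: Reoviridae, Orthomyxoviridae, Phenuiviridae, Birnaviridae, Pneumoviridae, Paramyxoviridae, Astroviridae, Picornaviridae, Retroviridae, Flaviviridae, Coronaviridae. Labeled values (% healthy / % diseased): Herpesviridae - / 89.1; Adenoviridae 0.6 / 2.0; Circoviridae 32.5 / 6.6; Birnaviridae 0.1 / -; Retroviridae 43.9 / 0.4; Coronaviridae 22.7 / 1.8.]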
Figure 4.2. Sample Diversity of all detected microorganisms in healthy flock and diseased birds. Alpha diversity determined using the Shannon diversity index.
[Bar chart not reproduced. y-axis: Diversity Index (0 to 2); x-axis categories: Virus, Bacteria, Bacteriophage, Fungi; series: Healthy, Diseased.]
Figure 4.3. Abundance of bacterial species identified in A) healthy and B) diseased flocks (Top 5 most abundant).
[Pie charts not reproduced. A) Healthy flock: Gallibacterium anatis 40.7%, Corynebacterium falsenii 25.4%, Escherichia coli 17.5%, Serratia marcescens 2.5%, Neisseria sicca 1.5%. B) Diseased flock: Escherichia coli 55.2%, Gallibacterium anatis 17.3%, Ornithobacterium rhinotracheale 13.1%, Proteus mirabilis 3.6%, Citrobacter freundii 1.9%.]
Figure 4.4. Abundance of bacteriophage families identified in healthy and diseased flocks (Top 10 most abundant).
[Stacked bar chart not reproduced. Bars: Healthy Flock, Diseased Flock; x-axis: 0% to 100%; legend: Microviridae, Myoviridae, Podoviridae, Siphoviridae, Unclassified. Segment labels shown: Healthy 74.8% and 23.9%; Diseased 10.6%, 16.9%, 19.2% and 20.5%.]
Figure 4.5. Microbial network of the complete avian respiratory microbiome of a healthy broiler flock and diseased birds. Blue nodes indicate species detected in the healthy flock, red nodes indicate species detected in the diseased flock, green nodes indicate species detected in both flocks. The bacteria, bacteriophage, and fungal networks were constructed from elements present at greater than 1%.
[Network diagram not reproduced. Legend: Healthy, Diseased, Both.]
Table 4.1. Avian specific viral genome database structure.
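Database | Stranded (a) | Sense (b) | Segmented (c) | Enveloped (d) | Virus Family | Complete Genomes
Avian DNA Viral Database | Double/Single Stranded | - | - | Enveloped | Hepeviridae | 1
Avian DNA Viral Database | Double/Single Stranded | - | - | Enveloped | Hepadnaviridae | 1
Avian DNA Viral Database | Single Stranded | - | - | Non-Enveloped | Genomoviridae | 3
Avian DNA Viral Database | Single Stranded | - | - | Non-Enveloped | Parvoviridae | 7
Avian DNA Viral Database | Single Stranded | - | - | Non-Enveloped | Circoviridae | 10
Avian DNA Viral Database | Single Stranded | - | - | Non-Enveloped | Smacoviridae | 3
Avian DNA Viral Database | Double Stranded | - | - | Enveloped | Poxviridae | 3
Avian DNA Viral Database | Double Stranded | - | - | Enveloped | Herpesviridae | 6
Avian DNA Viral Database | Double Stranded | - | - | Non-Enveloped | Adenoviridae | 14
Avian RNA Viral Database | Double Stranded | - | Segmented | Non-Enveloped | Reoviridae | 5
Avian RNA Viral Database | Double Stranded | - | Segmented | Non-Enveloped | Birnaviridae | 1
Avian RNA Viral Database | Single Stranded | Positive | Non-Segmented | Enveloped | Retroviridae | 5
Avian RNA Viral Database | Single Stranded | Positive | Non-Segmented | Enveloped | Flaviviridae | 3
Avian RNA Viral Database | Single Stranded | Positive | Non-Segmented | Enveloped | Coronaviridae | 5
Avian RNA Viral Database | Single Stranded | Positive | Non-Segmented | Non-Enveloped | Astroviridae | 5
Avian RNA Viral Database | Single Stranded | Positive | Non-Segmented | Non-Enveloped | Caliciviridae | 1
Avian RNA Viral Database | Single Stranded | Positive | Non-Segmented | Non-Enveloped | Picornaviridae | 17
Avian RNA Viral Database | Single Stranded | Negative | Segmented | Enveloped | Orthomyxoviridae | 16
Avian RNA Viral Database | Single Stranded | Negative | Segmented | Enveloped | Phenuiviridae | 1
Avian RNA Viral Database | Single Stranded | Negative | Segmented | Enveloped | Bornaviridae | 3
Avian RNA Viral Database | Single Stranded | Negative | Non-Segmented | Enveloped | Pneumoviridae | 1
Avian RNA Viral Database | Single Stranded | Negative | Non-Segmented | Enveloped | Paramyxoviridae | 14
a single stranded, double stranded or single/double stranded DNA and RNA viruses
b positive-sense or negative-sense RNA viruses
c segmented or non-segmented RNA viruses
d enveloped or non-enveloped DNA and RNA viruses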
Table 4.2. Sequencing data generated by DNA Seq and RNA Seq and number of reads trimmed, aligned to host and aligned to microbial databases.
Sample | Trimmed | Map to Host | Not Mapped to Host | Map to Virus DB | Map to Phage DB | Map to Bacteria DB | Map to Fungi DB
Healthy DNA | 41,196,338 | 37,149,004 | 4,066,203 | 7,039 | 257 | 17,038 | 74
Healthy RNA | 54,162,138 | 33,277,555 | 20,884,583 | | | |
Disease DNA | 23,547,613 | 21,003,097 | 2,544,516 | 32,478 | 1,448 | 88,278 | 485
Disease RNA | 8,680,175 | 8,217,793 | 462,382 | | | |
Total | 127,586,264 | 99,647,449 | 27,957,684 | 39,517 | 1,705 | 105,316 | 559
Average | 31,896,566 | 24,911,862 | 6,989,421 | 19,758.5 | 852.5 | 52,658 | 279.5
Microbial database counts are given once per sample; in the original layout each value spans the DNA and RNA rows of that sample.
Table 4.3. Eukaryotic Viruses detected in healthy and diseased poultry broiler flocks.
Sample | Virus Name | Type | Taxonomy | Normalized Abundance | Percent Relative Abundance
Healthy | Avian Retrovirus | RNA; ss, positive; enveloped | Retroviridae, Alpharetrovirus | 8116608 | 44
Healthy | Avian gyrovirus 2 | DNA; ss; nonenveloped | Circoviridae, Gyrovirus, Avian gyrovirus 2 | 6010480 | 33
Healthy | Adenovirus | DNA; ds; nonenveloped | Adenoviridae, Aviadenovirus, aviadenovirus | 117156 | 0.6
Healthy | Avian infectious bronchitis virus | RNA; ss, positive, nonsegmented; enveloped | Coronaviridae, Gammacoronavirus, Avian coronavirus | 4193379 | 22.7
Healthy | Infectious bursal disease virus | RNA; ds, segmented; nonenveloped | Birnaviridae, Avibirnavirus, Infectious bursal disease virus | 22727 | 0.1
Diseased | Avian Retrovirus | RNA; ss, positive; enveloped | Retroviridae, Alpharetrovirus | 12864 | 0.4
Diseased | Avian gyrovirus 2 | DNA; ss; nonenveloped | Circoviridae, Gyrovirus, Avian gyrovirus 2 | 237457 | 7
Diseased | Adenovirus | DNA; ds; nonenveloped | Adenoviridae, Aviadenovirus, aviadenovirus | 73306 | 2.0
Diseased | Avian infectious bronchitis virus | RNA; ss, positive, nonsegmented; enveloped | Coronaviridae, Gammacoronavirus, Avian coronavirus | 65122 | 1.8
Diseased | Gallid herpesvirus 1 | DNA; ds; enveloped | Herpesviridae, Iltovirus, Gallid alphaherpesvirus 1 | 3193134 | 89.1
Table 4.4. Bacteria detected in healthy and diseased poultry broiler flocks (> 1%).
Sample | Bacteria Name | Number Mapped | Percent Relative Abundance
Healthy | Gallibacterium anatis | 6932 | 40.7
Healthy | Corynebacterium falsenii | 4332 | 25.4
Healthy | Escherichia coli | 2986 | 17.5
Healthy | Serratia marcescens | 430 | 2.5
Healthy | Neisseria sicca | 261 | 1.5
Healthy | Methylobacterium radiotolerans | 238 | 1.4
Healthy | Corynebacterium stationis | 237 | 1.4
Healthy | Cronobacter sakazakii | 220 | 1.3
Diseased | Escherichia coli | 48736 | 55.2
Diseased | Gallibacterium anatis | 15239 | 17.3
Diseased | Ornithobacterium rhinotracheale | 11579 | 13.1
Diseased | Proteus mirabilis | 3186 | 3.6
Diseased | Citrobacter freundii | 1637 | 1.9
Table 4.5. Bacteriophage detected in healthy and diseased poultry broiler flocks (Top10 and >1%).
Sample | Bacteriophage Family | Bacteriophage Name | Normalized Abundance | Percent Relative Abundance
Healthy | Microviridae | Enterobacteria phage WA13 | 519726 | 74.2
Healthy | Myoviridae | Enterobacteria phage fd | 161803 | 23.1
Diseased | Unclassified | Enterobacteria phage P4 | 85457 | 20.5
Diseased | Myoviridae | Enterobacteria phage P88 | 59207 | 14.2
Diseased | Microviridae | Enterobacteria phage WA13 | 44074 | 10.6
Diseased | Siphoviridae | Stx2-converting phage 1717 | 25974 | 6.2
Diseased | Siphoviridae | Salmonella phage f18SE | 14372 | 3.5
Diseased | Myoviridae | Escherichia phage pro483 | 11108 | 2.7
Diseased | Siphoviridae | Enterobacteria phage mEp460 | 10515 | 2.5
Diseased | Siphoviridae | Salmonella phage MA12 | 9963 | 2.4
Diseased | Siphoviridae | Salmonella phage FSL SP-031 | 9955 | 2.4
Diseased | Siphoviridae | Salmonella phage SETP7 | 9161 | 2.2
Table 4.6. Fungi detected in healthy and diseased poultry broiler flocks (> 1%).
Sample | Fungi Name | Number Mapped | Normalized Abundance | Percent Relative Abundance
Healthy | Wickerhamomyces ciferrii | 21 | 403326 | 75.9
Healthy | Penicillium chrysogenum | 9 | 122422 | 23.0
Diseased | Wickerhamomyces ciferrii | 340 | 2856318 | 94.8
Diseased | Penicillium chrysogenum | 21 | 126311 | 4.2
REFERENCES
1. Wypych, T.P., L.C. Wickramasinghe, and B.J. Marsland. The influence of the
microbiome on respiratory health. Nat. Immunol., 2019. 20:1279-1290.
2. Huang, Y.J., S. Nariya, J.M. Harris, S.V. Lynch, D.F. Choy, J.R. Arron, and H.
Boushey. The airway microbiome in patients with severe asthma: associations
with disease features and severity. J. Allergy Clin. Immunol., 2015. 136:874-
884.
3. Durack, J., S.V. Lynch, S. Nariya, N.R. Bhakta, A. Beigelman, M. Castro, A.
Dyer, E. Israel, M. Kraft, and R.J. Martin. Features of the bronchial bacterial
microbiome associated with atopy, asthma, and responsiveness to inhaled
corticosteroid treatment. J. Allergy Clin. Immunol., 2017. 140:63-75.
4. Moustafa, A., C. Xie, E. Kirkness, W. Biggs, E. Wong, Y. Turpaz, K. Bloom,
E. Delwart, K.E. Nelson, J.C. Venter, and A. Telenti. The blood DNA virome
in 8,000 humans. PLoS Pathog., 2017. 13:e1006292.
5. Murphy, T.F., L.O. Bakaletz, and P.R. Smeesters. Microbial interactions in the
respiratory tract. The Pediat. Infect. Dis. J., 2009. 28:S121-S126.
6. Lin, Y., S. Xu, D. Zeng, X. Ni, M. Zhou, Y. Zeng, H. Wang, Y. Zhou, H. Zhu,
K. Pan, and G. Li. Disruption in the cecal microbiota of chickens challenged
with Clostridium perfringens and other factors was alleviated by Bacillus
licheniformis supplementation. PLoS One, 2017. 12:e0182426.
7. Danzeisen, J.L., J.B. Clayton, H. Huang, D. Knights, B. McComb, S.S. Hayer,
and T.J. Johnson. Temporal relationships exist between cecum, ileum, and
litter bacterial microbiomes in a commercial turkey flock, and subtherapeutic
penicillin treatment impacts ileum bacterial community establishment. Front.
Vet. Sci., 2015. 2:56.
8. Johnson, T.J., B.P. Youmans, S. Noll, C. Cardona, N.P. Evans, T.P. Karnezos,
J.M. Ngunjiri, M.C. Abundo, and C.-W. Lee. A consistent and predictable
commercial broiler chicken bacterial microbiota in antibiotic-free production
displays strong correlation with performance. Appl. Environ. Microbiol., 2018.
84:e00362-18.
9. Hassan, K.E., A. Ali, S.A.S. Shany, and M.F. El-Kady. Experimental co-
infection of infectious bronchitis and low pathogenic avian influenza H9N2
viruses in commercial broiler chickens. Res. Vet. Sci., 2017. 115:356-362.
10. Cook, J.K., M.B. Huggins, S.J. Orbell, K. Mawditt, and D. Cavanagh.
Infectious bronchitis virus vaccine interferes with the replication of avian
pneumovirus vaccine in domestic fowl. Avian Pathol., 2001. 30:233-242.
11. Hanson, L.E., and T.J. Bagust. Infectious laryngotracheitis. In Diseases of
Poultry, 9th edn. Ed: Calnek, B.W., Iowa State University Press, Ames, IA.,
1991. p. 485-495.
12. Bang, B.G., and F.B. Bang. Laryngotracheitis virus in chickens: a model for
study of acute nonfatal desquamating rhinitis. J. Exp. Med., 1967. 125:409-
428.
13. Mulholland, K.A., and C.L. Keeler. BiomeSeq: A tool for the characterization
of animal microbiomes from metagenomic data. bioRxiv., 2019. p. e800995.
14. Li H., and R. Durbin. Fast and accurate long-read alignment with Burrows-
Wheeler Transform. Bioinformatics., 2009. 45:1745-1760.
15. Hillier, L.W., W. Miller, E. Birney, W. Warren, R.C. Hardison, C.P. Ponting,
P. Bork, D.W. Burt, M.A.M. Groenen, M.E. Delaney, et al. Sequence and
comparative analysis of the chicken genome provide unique perspectives on
vertebrate evolution. Nature., 2004. 432:695-716.
16. Daly, G.M., R.M. Leggett, W. Rowe, S. Stubbs, M. Wilkinson, R. Ramirez-
Gonzalez, M. Caccamo, W. Bernal, and J. Heeney. Host subtraction, filtering and
assembly validations for novel viral discovery using next generation
sequencing data. PLoS One, 2015. 10:e0129059.
17. Langmead, B. and S.L. Salzberg. Fast gapped-read alignment with Bowtie 2.
Nat Methods, 2012. 9:357-359.
18. Mulholland, K.A and Keeler, C.L. BiomeSeq Microbial Databases. Available
from: https://sites.udel.edu/aviangenomics/. 2019.
19. Herath, D., D. Jayasundra, D. Ackland, I. Saeed, S.-L. Tan, and S. Halgamuge.
Assessing Species Diversity Using Metavirome Data: Methods and
Challenges. Comput. Struct. Biotechnol. J., 2017. 15:447-455.
20. Lemos, L.N., R.R. Fulthorpe, E.W. Triplett, and L.F.W. Roesch. Rethinking
microbial diversity in the high throughput sequencing era. J. Microbiol. Meth.,
2011. 86:42-51.
21. Ludwig, J., and J. Reynolds. Statistical Ecology. Wiley. New York. 1988.
22. Zhang, J.J., T.Y. Kang, T. Kwon, H. Koh, N. Chandimali, D.L. Huynh, X.Z.
Wang, N. Kim, and D.K. Jeong. Specific chicken egg yolk antibody improves
the protective response against Gallibacterium anatis infection. Infect.
Immun., 2019. 87:e00619-18.
23. van Empel, P.C.M., and H.M. Hafez. Ornithobacterium rhinotracheale: a
review. Avian Pathol., 1999. 28:217-227.
24. Francesca, N., C. Carvalho, P.M. Almeida, C. Sannino, L. Settanni, J.P.
Sampaio, and G. Moschetti. Wickerhamomyces sylviae f.a., sp. nov., an
ascomycetous yeast species isolated from migratory birds. Int. J. Systematic
Evol. Micro., 2013. 63:4824-4830.
25. Andersen, B., J.C. Frisvad, I. Sondergaard, I.S. Rasmussen, and L.S. Larsen.
Associations between fungal species and water-damaged building materials.
Appl. Environ. Micro., 2011. 77:4180-4188.
Chapter 5
A COMPARISON OF TRACHEA, CHOANAL CLEFT AND CLOACAL
MICROBIOTA OF A HEALTHY TURKEY FLOCK
5.1 Introduction
Poultry is the leading source of protein globally, with over $46.3 billion in global sales in 2018 [1]. Turkey in particular had an estimated $13.5 billion in global sales in 2016, with the United States accounting for about $6 billion, according to the United States Department of Agriculture [2, 3]. The United States is the world leader in turkey meat consumption, production and export, producing about 7.5 billion pounds and accounting for about 41% of the world’s turkey consumption in 2016 [2]. In recent years, disease outbreaks in poultry flocks have contributed to global economic loss. For example, during the 2014-2015 outbreak of highly pathogenic avian influenza, arguably the largest poultry health catastrophe in the United States, over 50 million chickens and turkeys were lost [4]. Because turkey health is important from both a nutritional and an economic standpoint, elucidating the microbiota of turkey flocks is essential.
Advancements in next-generation sequencing technology enable investigations into individual components of the microbiome. However, current methodologies are limited to characterizing one component at a time. BiomeSeq is a tool developed for the analysis of complete animal microbiomes using metagenomic sequencing data. It addresses the constraints of current computational tools by providing a comprehensive workflow and corresponding microbial databases that accurately identify and quantify each major component of the microbiome. BiomeSeq has demonstrated high precision and sensitivity on several simulated datasets and has been successfully employed to characterize the respiratory microbiome of a commercial broiler flock during its development and of a broiler flock clinically diagnosed with avian respiratory disease complex. BiomeSeq was designed to facilitate investigations of various microbial niches from any animal host and can therefore be used to elucidate turkey microflora.
Utilizing BiomeSeq, we provide a complete characterization of the cloacal, choanal cleft and tracheal microbiomes of a healthy flock of turkeys. The tool successfully identified the microbial communities inhabiting these three distinct biological niches from metagenomic next-generation sequencing data. Each of the major components of the microbiome, including eukaryotic viruses, bacteria, bacteriophage and fungi, was identified in each niche, and normalized relative abundance and diversity were examined. A comprehensive microbial network of the microorganisms inhabiting the respiratory and intestinal microbiomes is provided. This study further demonstrates the extensive utility of BiomeSeq and its ability to characterize various microbiomes from any animal host.
5.2 Materials and Methods
5.2.1 Sample Collection
Tracheal, choanal and cloacal swabs were collected from two 4-week-old and two 8-week-old commercial turkeys at the University of Minnesota. Individual swabs were placed in 3 ml of buffer PV1 (Qiagen), frozen immediately on dry ice and stored at -80°C.
5.2.2 Nucleic Acid Extraction and Sequencing
For each sampling site, the four swab samples were thawed, combined, homogenized and split into two tubes. Each tube was centrifuged (7,000 × g; 5 minutes; 4°C), yielding two pellets for each sample. RNA was isolated from one of the pellets using the Qiagen Viral Nucleic Acid extraction kit, following the manufacturer’s protocol. DNA was isolated from the other pellet using the Qiagen Blood and Tissue kit, also following the manufacturer’s protocol. DNA and RNA sequencing was performed for each sample on an Illumina HiSeq platform, producing 1 × 100 single-end reads, at the University of Delaware Sequencing and Genotyping Center. A total of six sequencing files were created.
5.2.3 Eukaryotic Virus, Bacteria, Bacteriophage and Fungal Analysis
The raw DNA-Seq and RNA-Seq reads were processed using BiomeSeq [5]. In summary, each sequence file was trimmed for quality, and reads were analyzed for per-base sequence quality, per-sequence quality, sequence length distribution and duplicate sequences. Adapter sequences, reads with a Phred quality score below 30 and reads shorter than 100 base pairs were removed from the file. The remaining reads were then aligned to the reference host genome (Meleagris gallopavo; Annotation Release 101) using the Burrows-Wheeler Alignment algorithm [6, 7]. This step removes host (turkey) sequences from the dataset, increasing the analytical efficiency of the remaining processing steps [8].
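The host-subtraction step can be illustrated with a minimal Python sketch. It assumes the host alignment has been written out as SAM text and relies on the SAM FLAG bit 0x4 (read unmapped) to keep only reads that did not align to the turkey genome. The function name and the toy records are hypothetical; this is not BiomeSeq's own code.

```python
from typing import Iterable, List

UNMAPPED_FLAG = 0x4  # SAM specification: bit set when the read is unmapped


def unmapped_read_names(sam_lines: Iterable[str]) -> List[str]:
    """Return the names of reads that did not align to the host reference."""
    names = []
    for line in sam_lines:
        if line.startswith("@"):  # header lines carry no alignments
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is the bitwise FLAG
        if flag & UNMAPPED_FLAG:
            names.append(fields[0])
    return names


# Toy single-end records (hypothetical data, not from this study).
toy_sam = [
    "@SQ\tSN:host_chr1\tLN:1000",
    "read1\t0\thost_chr1\t10\t60\t100M\t*\t0\t0\t*\t*",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "read3\t16\thost_chr1\t50\t60\t100M\t*\t0\t0\t*\t*",
]
print(unmapped_read_names(toy_sam))
```

Only the reads returned here would be passed on to the microbial alignment step.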
Unmapped reads were extracted and aligned to microbial reference genome databases using the Bowtie2 alignment algorithm [9]. BiomeSeq provides eukaryotic virus, bacteriophage, fungal and bacterial reference genome databases. However, one major feature of this tool is its ability to accept custom databases provided by the user. For this study, an avian-derived viral database was constructed to replace BiomeSeq’s default viral database (Table 5.1). This avian-derived viral database contains full genome reference sequences of both DNA and RNA avian viruses obtained from National Center for Biotechnology Information (NCBI) reference sequences. The avian DNA viral database contains 48 viral elements from 9 unique families and the avian RNA viral database contains 63 viral elements from 13 families. The avian DNA and RNA viral databases are organized by the classification of their viral structure and genome organization (Table 5.1). DNA viruses are organized hierarchically by whether the virus is double- or single-stranded and whether the virus is enveloped or non-enveloped. RNA viruses are organized hierarchically by whether the virus is double- or single-stranded, negative or positive sense, segmented or non-segmented and whether the virus is enveloped or non-enveloped. The avian viral database used in this study is publicly available [10]. The reads were also aligned to the default bacterial, fungal and bacteriophage databases provided by BiomeSeq, which contain complete and representative genomes obtained from the NCBI Reference Sequence Database and consist of 3,623, 1,281 and 2,212 genomes, respectively [11]. These databases are also publicly available [10].
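Once reads have been aligned to a microbial database, the per-genome read counts that feed the abundance calculation can be tallied from the alignment output. The sketch below is one way to do this, assuming SAM text as input; the function name and reference names are hypothetical, and skipping secondary (0x100) and supplementary (0x800) records is a design choice made here so that each read is counted once, not a documented BiomeSeq behaviour.

```python
from collections import Counter
from typing import Iterable

UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800  # SAM FLAG bits


def count_reads_per_reference(sam_lines: Iterable[str]) -> Counter:
    """Count primary alignments per reference sequence name (SAM column 3)."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, reference = int(fields[1]), fields[2]
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # count each aligned read once
        counts[reference] += 1
    return counts


# Toy records with made-up reference names (not real database entries).
toy_sam = [
    "@SQ\tSN:virus_A\tLN:5000",
    "r1\t0\tvirus_A\t1\t42\t100M\t*\t0\t0\t*\t*",
    "r2\t16\tvirus_A\t90\t42\t100M\t*\t0\t0\t*\t*",
    "r3\t0\tphage_B\t7\t42\t100M\t*\t0\t0\t*\t*",
    "r3\t256\tvirus_A\t300\t0\t100M\t*\t0\t0\t*\t*",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(count_reads_per_reference(toy_sam))
```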
A sequence similarity-dependent approach for detecting microorganisms, employed by BiomeSeq, contributes to the rapid detection of known eukaryotic viruses, bacteria, bacteriophage and fungi while also allowing for the quantification of biodiversity, which similarity-independent approaches lack [12]. For each individual sample, the reads that mapped to each microorganism were normalized based on both microbe and reference genome length per 100,000 host cells using an adaptation of the equation presented by Moustafa and colleagues in 2017 to quantify viral abundance [13]:

$$\text{Abundance} = \frac{2 \times \dfrac{\text{number of reads mapped to microbe sequence}}{\text{microbe sequence size}}}{\dfrac{\text{number of reads mapped to host genome}}{\text{host genome size}}} \times 10^{5}$$
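The normalization can be sketched in a few lines of Python. The sketch assumes the form used by Moustafa and colleagues: twice the per-base microbial read density divided by the per-base host read density, scaled to 100,000 host cells. The function name, parameter names and input values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def normalized_abundance(
    microbe_reads: int,
    microbe_length: int,
    host_reads: int,
    host_length: int,
    ploidy: int = 2,           # assumed factor for a diploid host genome
    per_cells: int = 100_000,  # abundance is reported per 100,000 host cells
) -> float:
    """Microbial genome copies per `per_cells` host cells."""
    microbe_density = microbe_reads / microbe_length  # reads per microbial base
    host_density = host_reads / host_length           # reads per host base
    return ploidy * microbe_density / host_density * per_cells


# Hypothetical inputs: 10 reads on a 1 kb microbial sequence,
# 1,000 reads on a 1 Mb host reference.
print(normalized_abundance(10, 1_000, 1_000, 1_000_000))
```

Dividing by the host read density is what makes samples with very different host content comparable.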
Percent relative abundance was also quantified from the normalized abundances using the following equation:

$$\text{Percent Relative Abundance} = \frac{\text{microbial abundance}}{\text{total microbial abundance}} \times 100$$
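As a worked check, the percent relative abundance calculation can be applied to the normalized viral abundances reported for the choanal cleft in Table 5.4. The function name is hypothetical; the input values are taken from that table.

```python
from typing import Dict


def percent_relative_abundance(normalized: Dict[str, float]) -> Dict[str, float]:
    """Express each normalized abundance as a percentage of the sample total."""
    total = sum(normalized.values())
    return {name: 100 * value / total for name, value in normalized.items()}


# Normalized abundances of eukaryotic viruses in the choanal cleft (Table 5.4).
choanal_viruses = {
    "Gallid herpesvirus 2&3": 2253,
    "Adenovirus": 4503,
    "Turkey astrovirus 2": 30851,
    "Rotavirus D": 148309,
    "Turkey gallivirus": 13354,
}

for name, pct in percent_relative_abundance(choanal_viruses).items():
    print(f"{name}: {pct:.1f}%")
```

Rounded to one decimal place, these values agree with the relative abundances listed in Table 5.4 (for example, 74.4% for Rotavirus D).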
Finally, alpha diversity for each sample was calculated using the Shannon Diversity Index, a commonly used measure of species diversity in a microbiome that accounts for both species abundance and evenness within the sample [14, 15]. These data were visually represented by Venn diagrams, networks and stacked bar plots using the R package VennDiagram [16] and Cytoscape [17].
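The Shannon index is H = -Σ p_i ln(p_i), where p_i is the proportion of species i in the sample. The sketch below assumes the natural logarithm, which the text does not state explicitly; with that assumption, the choanal cleft viral abundances from Table 5.4 give a value consistent with Table 5.3. The function name is hypothetical.

```python
import math
from typing import Iterable


def shannon_index(abundances: Iterable[float]) -> float:
    """Shannon diversity H = -sum(p * ln p) over species with non-zero abundance."""
    values = [a for a in abundances if a > 0]
    total = sum(values)
    return -sum((a / total) * math.log(a / total) for a in values)


# Normalized viral abundances in the choanal cleft (Table 5.4).
choanal_viruses = [2253, 4503, 30851, 148309, 13354]
print(round(shannon_index(choanal_viruses), 2))
```

A community dominated by a single species, such as the tracheal virome, yields a value close to zero, while evenly distributed communities score higher.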
5.3 Results
5.3.1 Quality Trimming and Decontamination of Sequencing Reads
A total of 164,557,236 reads were obtained from RNA-Seq and 192,392,125 reads from DNA-Seq (Table 5.2). Following quality control and trimming, 164,412,647 RNA-Seq reads and 192,361,317 DNA-Seq reads remained. A total of 6,456,405 RNA-Seq reads aligned to the turkey genome, and the 157,956,242 reads that did not align were aligned to the avian viral, bacterial, bacteriophage and fungal databases using BiomeSeq [5]. A total of 162,742,651 DNA-Seq reads aligned to the turkey genome, and the 29,618,666 reads that did not were also aligned to the avian viral, bacterial, bacteriophage and fungal databases using BiomeSeq [5] (Table 5.2).
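The read accounting above can be reproduced from the per-sample counts in Table 5.2. The short sketch below sums the counts by sequencing method and reports the fraction of trimmed reads assigned to the host; the variable and function names are hypothetical, and the numbers are copied from the table.

```python
# (sample, location, method, raw, trimmed, mapped_to_host, unmapped_to_host)
table_5_2 = [
    ("CK91", "Choanal", "RNA", 39_116_031, 39_062_667, 1_989_029, 37_073_638),
    ("CK92", "Trachea", "RNA", 63_836_927, 63_770_120, 4_209_347, 59_560_773),
    ("CK93", "Cloacal", "RNA", 61_604_278, 61_579_860, 258_029, 61_321_831),
    ("CK94", "Choanal", "DNA", 65_037_361, 65_026_950, 57_894_840, 7_132_110),
    ("CK95", "Trachea", "DNA", 67_112_661, 67_101_764, 59_695_581, 7_406_183),
    ("CK96", "Cloacal", "DNA", 60_242_103, 60_232_603, 45_152_230, 15_080_373),
]


def totals_by_method(rows):
    """Sum raw, trimmed, host-mapped and unmapped reads per sequencing method."""
    totals = {}
    for _, _, method, raw, trimmed, mapped, unmapped in rows:
        t = totals.setdefault(method, {"raw": 0, "trimmed": 0, "mapped": 0, "unmapped": 0})
        t["raw"] += raw
        t["trimmed"] += trimmed
        t["mapped"] += mapped
        t["unmapped"] += unmapped
    return totals


totals = totals_by_method(table_5_2)
for method, t in totals.items():
    host_fraction = t["mapped"] / t["trimmed"]
    print(f"{method}-Seq: {t['raw']:,} raw, {t['unmapped']:,} non-host, "
          f"host fraction {host_fraction:.1%}")
```

The contrast between the two methods is the main point: most DNA-Seq reads are host-derived, whereas most RNA-Seq reads are not.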
5.3.2 Diversity of Eukaryotic Viruses in the Choanal Cleft, Cloaca and Trachea of a Healthy Turkey Flock
Two DNA viruses and three RNA eukaryotic viruses were detected in the choanal cleft (Table 5.4). The most abundant viral species was Rotavirus D with a relative abundance of 74.4%, followed by Turkey astrovirus 2 (15.5%) and Turkey gallivirus (6.7%; Figure 5.1). The DNA viruses Adenovirus (2.3%) and Gallid herpesvirus 2&3 (1.1%) were the least abundant. Three DNA viruses were detected in the trachea (Table 5.4). The most abundant viral species was Adenovirus with a relative abundance of 95.7%. Gallid herpesvirus 2&3 (3.4%) and Meleagrid herpesvirus 1 (0.9%) were also detected (Figure 5.1). One DNA virus and four RNA viruses were detected in the cloaca (Table 5.4). The most abundant viral species were Turkey gallivirus, Turkey astrovirus 2 and Rotavirus D with relative abundances of 36.1%, 35.1% and 27.4%, respectively. Low levels of Avian leukosis virus (1.0%) and Adenovirus (0.4%) were also detected (Figure 5.1). A Venn Diagram was generated to analyze the similarities and differences in the species of viruses detected in the choanal cleft, trachea and cloaca of this turkey flock (Figure 5.5). Viral alpha diversity was highest in the cloaca (H = 1.16), followed by the choanal cleft (H = 0.83) and trachea (H = 0.20; Table 5.3).
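The overlap shown in a three-way Venn diagram reduces to set operations on the species detected at each site. The sketch below applies them to the viral species listed in Table 5.4; the function name is hypothetical and the sketch is independent of the R package used to draw the figures.

```python
from typing import Dict, Set


def venn_regions(choanal: Set[str], trachea: Set[str], cloaca: Set[str]) -> Dict[str, Set[str]]:
    """Split three species sets into the seven regions of a Venn diagram."""
    return {
        "all three": choanal & trachea & cloaca,
        "choanal and trachea only": (choanal & trachea) - cloaca,
        "choanal and cloaca only": (choanal & cloaca) - trachea,
        "trachea and cloaca only": (trachea & cloaca) - choanal,
        "choanal only": choanal - trachea - cloaca,
        "trachea only": trachea - choanal - cloaca,
        "cloaca only": cloaca - choanal - trachea,
    }


# Viral species detected at each site (Table 5.4).
choanal = {"Gallid herpesvirus 2&3", "Adenovirus", "Turkey astrovirus 2",
           "Rotavirus D", "Turkey gallivirus"}
trachea = {"Gallid herpesvirus 2&3", "Adenovirus", "Meleagrid herpesvirus 1"}
cloaca = {"Adenovirus", "Avian leukosis virus", "Turkey astrovirus 2",
          "Rotavirus D", "Turkey gallivirus"}

regions = venn_regions(choanal, trachea, cloaca)
for region, species in regions.items():
    print(f"{region}: {sorted(species)}")
```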
5.3.3 Bacteriophage Diversity in the Choanal Cleft, Cloaca and Trachea of a Healthy Turkey Flock
A total of 84 unique bacteriophage were detected in the choanal cleft, 73 in the trachea and 79 in the cloaca. The top 10 most abundant bacteriophage are included in Table 5.6. The most abundant bacteriophage in all locations was Enterobacteria phage P88, with a normalized relative abundance of 20.3% in the choanal cleft, 39.0% in the trachea and 11.6% in the cloaca (Figure 5.3). In the choanal cleft, the next most abundant bacteriophage were Enterobacteria phage P4 (17.0%), Bordetella phage BPP-1 (11.8%) and Stx2-converting phage 1717 (11.0%). In the trachea, Enterobacteria phage P1 (7.5%), Shigella phage SfIV (7.0%) and Salmonella phage SJ46 (6.1%) were the next most abundant species (Figure 5.3). Finally, Salmonella phage SJ46 (10.1%), Stx2-converting phage 1717 (7.3%) and Enterobacteria phage mEp460 (6.4%) were the next most abundant bacteriophage in the cloaca (Figure 5.3). Several bacteriophage species were shared among locations, while many were unique to a specific niche. A Venn Diagram was generated to depict the similarities and differences in the top ten most abundant species of bacteriophage detected in the choanal cleft, trachea and cloaca of this turkey flock (Figure 5.7). Enterobacteria phage P88, Enterobacteria phage P1, Salmonella phage SJ46, Enterobacteria phage mEp460 and Stx2-converting phage 1717 were present in all locations. Shigella phage SfIV and Enterobacteria phage SfV were detected only in the trachea. Enterobacteria phage P4, Bordetella phage BPP-1, Enterobacteria phage phiP27 and Enterobacteria phage Sf6 were detected only in the choanal cleft. Salmonella phage RE-2010 and Stxconverting phage vB_EcoP_24B were detected only in the cloaca. Enterobacteria phage fiAA91-ss was observed in the trachea and choanal cleft, but not the cloaca. Phage cdtI was observed in the choanal cleft and cloaca, but not the trachea. Finally, Enterobacteria phage YYZ-2008 and Escherichia phage TL-2011b were detected in the trachea and cloaca, but not in the choanal cleft (Table 5.6; Figure 5.7). Bacteriophage alpha diversity was highest in the cloaca (H = 3.37), followed by the choanal cleft (H = 2.76) and trachea (H = 2.52; Table 5.3).
5.3.4 Fungal Diversity in the Choanal Cleft, Cloaca and Trachea of a Healthy Turkey Flock
A total of 29 unique fungi were detected in the choanal cleft, 28 in the trachea and 39 in the cloaca. The fungal species with a percent relative abundance of at least 0.1% are included in Table 5.7. Wickerhamomyces ciferrii was the most abundant fungus in the trachea (94.7%) and cloaca (85.3%; Figure 5.4). The most abundant fungi in the choanal cleft were Penicillium chrysogenum (73.2%) and Wickerhamomyces ciferrii (26.4%; Figure 5.4). The most abundant fungi in the trachea were Wickerhamomyces ciferrii (94.7%), Penicillium chrysogenum (2.9%) and Sordaria macrospora (2.4%; Figure 5.4). The most abundant fungi in the cloaca were Wickerhamomyces ciferrii (85.3%) and Penicillium chrysogenum (14.6%; Figure 5.4). A Venn Diagram was generated to analyze the similarities and differences in the species of fungi detected in the choanal cleft, trachea and cloaca of this turkey flock (Figure 5.8). Fungal alpha diversity was highest in the choanal cleft (H = 0.59), followed by the cloaca (H = 0.42) and trachea (H = 0.24; Table 5.3).
5.3.5 Bacterial Diversity in the Choanal Cleft, Cloaca and Trachea of a Healthy Turkey Flock
A total of 4 unique bacteria were detected in the choanal cleft, 6 in the trachea and 20 in the cloaca with a percent relative abundance of at least 0.1% (Table 5.5). Escherichia coli was detected in all three locations (choanal cleft at 46.6%, trachea at 67.5% and cloaca at 3.4%; Table 5.5). Bordetella hinzii was the most abundant bacterium detected in the choanal cleft with a percent relative abundance of 50.8%, followed by Escherichia coli (46.6%), Lactobacillus amylovorus (1.5%) and Citrobacter freundii (1.1%; Figure 5.2). Escherichia coli was the most abundant bacterium detected in the trachea with a percent relative abundance of 67.5%, followed by Citrobacter freundii (28.4%) and Corynebacterium stationis (1.4%; Figure 5.2). In the cloaca, Staphylococcus warneri was the most abundant with a percent relative abundance of 73.8%, followed by Eubacterium sp. Marseille-P3177 (6.5%) and Campylobacter coli (3.9%; Table 5.5; Figure 5.2). A Venn Diagram was generated to analyze the similarities and differences in the species of bacteria detected in the choanal cleft, trachea and cloaca of this turkey flock (Figure 5.6). Bacterial alpha diversity was highest in the cloaca (H = 1.20), followed by the trachea (H = 0.86) and choanal cleft (H = 0.75; Table 5.3).
5.3.6 Microbial Network of the Choanal Cleft, Cloaca and Trachea of a Healthy Turkey Flock
We examined the microbial ecology of the cloaca, trachea and choanal cleft of a healthy turkey flock. Figure 5.9 is a representation of the complex ecology of these microbiomes. In this microbial network, nodes represent species of bacteria, bacteriophage, eukaryotic viruses and fungi. The most abundant species are included: 17 bacteriophage, 7 eukaryotic viruses, 15 bacteria and 6 fungi. Five bacteriophage, 1 virus, 3 fungi and 1 bacterium were detected in all locations, represented by yellow nodes in the microbial network. One bacteriophage and 3 eukaryotic viruses were detected in the choanal cleft and cloaca, represented by red-green nodes. Two bacteriophage and 1 bacterium were detected in the trachea and cloaca, represented by blue-green nodes. One bacteriophage, 2 fungi, 2 bacteria and 1 eukaryotic virus were detected in the choanal cleft and trachea, represented by red-blue nodes. Six bacteriophage, 2 eukaryotic viruses, 1 fungus and 11 bacteria were detected in only a single location, represented by blue, green or red nodes in the microbial network (Figure 5.9).
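The node coloring described above amounts to a mapping from the set of sites where a species was detected to a color label. The following sketch encodes that mapping (red for choanal cleft, blue for trachea, green for cloaca, yellow for all three, and two-color labels for pairs) and applies it to a few bacteriophage whose site membership is given in Section 5.3.3. The function and dictionary names are hypothetical, and the sketch does not reproduce the Cytoscape styling itself.

```python
from typing import Iterable

# Single-site colors as described for Figure 5.9.
SITE_COLOR = {"choanal cleft": "red", "trachea": "blue", "cloaca": "green"}
SITE_ORDER = ("choanal cleft", "trachea", "cloaca")


def node_color(sites: Iterable[str]) -> str:
    """Return the node color label for the set of sites where a species occurs."""
    detected = set(sites)
    if detected == set(SITE_COLOR):
        return "yellow"  # detected in all three locations
    return "-".join(SITE_COLOR[site] for site in SITE_ORDER if site in detected)


# Site membership taken from the bacteriophage results (Section 5.3.3).
phage_sites = {
    "Enterobacteria phage P88": ["choanal cleft", "trachea", "cloaca"],
    "Phage cdtI": ["choanal cleft", "cloaca"],
    "Enterobacteria phage YYZ-2008": ["trachea", "cloaca"],
    "Enterobacteria phage fiAA91-ss": ["trachea", "choanal cleft"],
    "Shigella phage SfIV": ["trachea"],
}

for phage, sites in phage_sites.items():
    print(f"{phage}: {node_color(sites)}")
```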
5.4 Discussion
Over the years, poultry meat has become the main source of protein consumption globally [18], with the United States and Europe consuming the most [19]. While the majority of poultry consumed is chicken, turkey accounts for as much as 17% of consumption. According to the United States Department of Agriculture, the United States produces over 250 million turkeys per year [20]. The majority of studies of the avian microbiome examine chickens specifically, with far less attention paid to turkeys. Understanding the complex communities existing in turkey microbiomes could contribute to the treatment of pathogen-induced infections and diseases in this species, which account for a large economic loss for producers each year.
Numerous studies provide evidence that microbial communities vary between microbiome niches and that the specific microorganisms inhabiting different environments contribute to particular biological functions. An interesting example in birds can be seen in the microbiota on the face and mouth of vultures. These birds, which feed predominantly on decaying carcasses, are consequently exposed to pathogens, and the microbiota of their face and mouth is highly diverse [21]. Their intestine was found to be less diverse, implying that the microbiota of the face and mouth eliminate pathogens before they can enter the intestinal tract [21]. Understanding the composition and diversity of these various microbial communities should lead to a deeper understanding of the complex role these microorganisms play in maintaining homeostasis of the host in which they reside.
Previous studies have demonstrated that BiomeSeq can successfully detect and quantify microbial abundance in the respiratory microbiome of broiler flocks. In one study, this tool was utilized to examine the development of microbial ecology throughout the growth of a healthy broiler flock from hatching to processing. In another study, this tool was utilized to compare a healthy broiler flock to a broiler flock clinically diagnosed with avian respiratory disease complex. In this study, we demonstrate the extensive utility of BiomeSeq by employing the tool to investigate microbial communities from the choanal cleft, trachea and cloaca of a healthy turkey flock. This includes the eukaryotic viruses, bacteria, bacteriophage and fungi that inhabit both the intestinal tract and respiratory tract. To detect and quantify the normalized relative abundance of these microorganisms, choanal cleft, tracheal and cloacal swabs were collected from a healthy turkey flock. From these samples, nucleic acids were extracted and DNA-Seq and RNA-Seq were conducted. A total of 356,949,361 raw reads were generated and processed using BiomeSeq. To quantify the normalized relative abundance of each microorganism detected, BiomeSeq utilizes a sequence similarity-dependent approach. In summary, the raw sequencing reads are first trimmed for quality; adapter sequences, reads shorter than 100 bp in length and reads with low quality are removed from the sample. The remaining reads are then aligned to the turkey genome in a decontamination step that removes turkey sequences from the sample. The decontaminated reads are aligned to the default bacterial, bacteriophage and fungal databases provided by BiomeSeq. One feature that makes BiomeSeq adaptable to a variety of studies is its ability to accept custom databases provided by the user. For this study, an avian-derived viral genome database was constructed to replace BiomeSeq’s default viral database.
A close examination of the turkey respiratory and intestinal microbiomes confirmed unique and complex microbial communities. The species diversity was confirmed by the detection of a variety of microorganisms. We observed a higher diversity in the cloacal microbiota than in the choanal cleft and trachea, which were quite comparable. Adenovirus was the only virus detected in all three locations, and the cloaca and trachea each had one unique virus. The cloaca and choanal cleft shared the most similar viral composition, including Turkey astrovirus 2, Rotavirus D and Turkey gallivirus. Astroviruses, reoviruses (such as rotavirus) and adenoviruses are frequently identified enteric viruses in chickens and turkeys in both healthy and diseased flocks [22-24]. Therefore, it is not surprising to detect Turkey astrovirus at a high abundance in the cloaca, as these viruses are common in the avian intestinal tract [25]. Turkey astroviruses can result in microscopic changes to the intestinal epithelium of turkeys, leading to failure to absorb water and resulting in diarrhea [26]. Furthermore, it is not surprising that rotaviruses were detected at such a high abundance in these samples, as previous studies provide evidence that rotaviruses in turkeys appear in the first month of life, which is consistent with the age of this flock at the time of sampling [27]. Moreover, the presence of herpesviruses is consistent with the vaccination of the birds with this live vaccine, coupled with the expected presence of these avian viruses in the environment.
The only bacterium detected in all locations was Escherichia coli. The bacterial composition of the cloaca was much more diverse than that of the respiratory tract, with eight unique bacteria. Furthermore, the choanal cleft and trachea have Citrobacter freundii and Lactobacillus amylovorus in common, whereas more unique species were detected in the cloaca, with the most abundant being Staphylococcus warneri. Penicillium chrysogenum and Wickerhamomyces ciferrii were the most abundant fungal species identified in all three locations. Of the top 10 most abundant fungal species, only 2 were detected in all locations, while 1 species was unique to the trachea and 3 were unique to the choanal cleft. Enterobacteria phage P88 was the most abundant bacteriophage in each location. Of the most abundant bacteriophage species, 5 were detected in all locations, including Enterobacteria phage P88, Stx2-converting phage 1717, Salmonella phage SJ46, Enterobacteria phage P1 and Enterobacteria phage mEp460. Unique bacteriophage were identified in each location; 4 were unique to the choanal cleft, 2 were unique to the trachea and 2 were unique to the cloaca.
In this study, we demonstrate the extensive utility of BiomeSeq to identify and quantify microbial abundance and diversity in various microbial niches and host animals. This tool was employed to investigate microbial communities from the choanal cleft, trachea and cloaca of a healthy turkey flock. By identifying the eukaryotic viruses, bacteria, bacteriophage and fungal elements within various animal microbiomes, the unique biological roles to which these microorganisms contribute can be elucidated. Furthermore, accurate quantification of abundance and diversity within these communities, provided by BiomeSeq, will contribute valuable knowledge that may distinguish healthy from potentially infected microbiomes.
Figure 5.1. Normalized relative abundance of eukaryotic viruses at the choanal cleft, cloaca and trachea of turkey.
[Stacked bar chart. Bars: Cloaca, Trachea, Choanal Cleft. Axis: relative abundance, 0 to 100%. Legend: Gallid herpesvirus 2&3, Adenovirus, Turkey astrovirus 2, Rotavirus D, Turkey gallivirus, Meleagrid herpesvirus 1, Avian leukosis virus.]
Figure 5.2. Abundance of bacteria at the choanal cleft, cloaca and trachea of turkey.
[Stacked bar chart. Bars: Cloaca, Trachea, Choanal Cleft. Axis: relative abundance, 0 to 100%. Legend: Staphylococcus warneri, Eubacterium sp. Marseille-P3177, Campylobacter coli, Escherichia coli, Christensenella sp. Marseille-P2438, Intestinimonas butyriciproducens, Barnesiella viscericola, Roseburia hominis, Bacteroides vulgatus, Megasphaera elsdenii, Bordetella hinzii, Citrobacter freundii, Lactobacillus amylovorus, Corynebacterium stationis, Bordetella holmesii.]
Figure 5.3. Abundance of top 10 bacteriophage at the choanal cleft, cloaca and trachea of turkey.
[Stacked bar chart. Bars: Cloaca, Trachea, Choanal Cleft. Axis: relative abundance, 0 to 100%. Legend: Enterobacteria phage P88, Stx2-converting phage 1717, Bordetella phage BPP-1, Enterobacteria phage mEp460, Enterobacteria phage phiP27, Enterobacteria phage P4, Phage cdtI DNA, Salmonella phage SJ46, Enterobacteria phage P1, Enterobacteria phage fiAA91-ss, Enterobacteria phage Sf6, Shigella phage SfIV.]
Figure 5.4. Normalized relative abundance of top 10 fungi at the choanal cleft, cloaca and trachea of turkey.
[Stacked bar chart. Bars: Cloaca, Trachea, Choanal Cleft. Axis: relative abundance, 0 to 100%. Legend: Penicillium chrysogenum, Wickerhamomyces ciferrii, Trichosporon asahii, Usnea ceratina, Botrytis cinerea, Sordaria macrospora.]
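Figure 5.5. Venn Diagram of the eukaryotic viruses detected in the choanal cleft, cloaca and trachea of turkeys.
[Venn diagram. All three sites (1): Adenovirus. Choanal cleft and trachea (1): Gallid herpesvirus 2&3. Choanal cleft and cloaca (3): Turkey astrovirus 2, Rotavirus D, Turkey gallivirus. Trachea and cloaca (0). Choanal cleft only (0). Trachea only (1): Meleagrid herpesvirus 1. Cloaca only (1): Avian leukosis virus.]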
Figure 5.6. Venn Diagram of the bacteria detected in the choanal cleft, cloaca and trachea of turkeys.
[Venn diagram. All three sites (1): Escherichia coli. Choanal cleft and trachea (2): Citrobacter freundii, Lactobacillus amylovorus. Trachea and cloaca (1): Eubacterium Marseille-P3177. Choanal cleft only (1): Bordetella hinzii. Trachea only (2): Corynebacterium stationis, Bordetella holmesii. Cloaca only (8): Staphylococcus warneri, Campylobacter coli, Christensenella Marseille-P2438, Intestinimonas butyriciproducens, Barnesiella viscericola, Roseburia hominis, Bacteroides vulgatus, Megasphaera elsdenii.]
Figure 5.7. Venn Diagram of the bacteriophage detected in the choanal cleft, cloaca and trachea of turkeys.
[Venn diagram. All three sites (5): Enterobacteria phage P88, Stx2-converting phage 1717, Salmonella phage SJ46, Enterobacteria phage P1, Enterobacteria phage mEp460. Choanal cleft and trachea (1): Enterobacteria phage fiAA91-ss. Choanal cleft and cloaca (1): Phage cdtI DNA. Trachea and cloaca (2): Enterobacteria phage YYZ-2008, Escherichia phage TL-2011b. Choanal cleft only (4): Bordetella phage BPP-1, Enterobacteria phage phiP27, Enterobacteria phage P4, Enterobacteria phage Sf6. Trachea only (2): Enterobacteria phage SfV, Shigella phage SfIV. Cloaca only (2): Salmonella phage RE-2010, Stxconverting phage vB_EcoP_24B.]
Figure 5.8. Venn Diagram of the fungi detected in the choanal cleft, cloaca and trachea of turkeys.
[Venn diagram. All three sites (2): Wickerhamomyces ciferrii, Penicillium chrysogenum. Choanal cleft only (3): Trichosporon asahii, Botrytis cinerea, Usnea ceratina. Trachea only (1): Sordaria macrospora. All other regions (0).]
Figure 5.9. Microbial network of eukaryotic viruses, fungi, bacteria and bacteriophage present in the cloaca, trachea and choanal cleft of turkeys. Yellow nodes are species identified in the choanal cleft, cloaca and trachea; green nodes are species identified in the cloaca; blue nodes are species identified in the trachea; red nodes are species identified in the choanal cleft.
[Network diagram. Node color legend: Cloaca, Choanal cleft, Trachea, All locations.]
Table 5.1. Avian-specific viral genome database structure.

| Database | Strandedness (a) | Sense (b) | Segmentation (c) | Envelope (d) | Family | Complete Genomes |
|---|---|---|---|---|---|---|
| Avian DNA Viral Database | Double/Single Stranded | - | - | Enveloped | Hepeviridae | 1 |
| | | | | | Hepadnaviridae | 1 |
| | Single Stranded | - | - | Non-Enveloped | Genomoviridae | 3 |
| | | | | | Parvoviridae | 7 |
| | | | | | Circoviridae | 10 |
| | | | | | Smacoviridae | 3 |
| | Double Stranded | - | - | Enveloped | Poxviridae | 3 |
| | | | | | Herpesviridae | 6 |
| | Double Stranded | - | - | Non-Enveloped | Adenoviridae | 14 |
| Avian RNA Viral Database | Double Stranded | - | Segmented | Non-Enveloped | Reoviridae | 5 |
| | | | | | Birnaviridae | 1 |
| | Single Stranded | Positive | Non-Segmented | Enveloped | Retroviridae | 5 |
| | | | | | Flaviviridae | 3 |
| | | | | | Coronaviridae | 5 |
| | Single Stranded | Positive | Non-Segmented | Non-Enveloped | Astroviridae | 5 |
| | | | | | Caliciviridae | 1 |
| | | | | | Picornaviridae | 17 |
| | Single Stranded | Negative | Segmented | Enveloped | Orthomyxoviridae | 16 |
| | | | | | Phenuiviridae | 1 |
| | Single Stranded | Negative | Non-Segmented | Enveloped | Bornaviridae | 3 |
| | | | | | Pneumoviridae | 1 |
| | | | | | Paramyxoviridae | 14 |

a single stranded, double stranded or single/double stranded DNA and RNA viruses
b positive-sense or negative-sense RNA viruses
c segmented or non-segmented RNA viruses
d enveloped or non-enveloped DNA and RNA viruses
Table 5.2. Quality Trimming and Host DNA Decontamination of reads generated by DNA-Seq and RNA-Seq from samples collected from the choanal cleft, cloaca and trachea of turkeys

| Sample | Sample Location | Sequencing Method | Number Raw Reads | Number Trimmed Reads | Number Mapped to Host | Number Unmapped to Host |
|---|---|---|---|---|---|---|
| CK91 | Choanal | RNA Seq | 39,116,031 | 39,062,667 | 1,989,029 | 37,073,638 |
| CK92 | Trachea | RNA Seq | 63,836,927 | 63,770,120 | 4,209,347 | 59,560,773 |
| CK93 | Cloacal | RNA Seq | 61,604,278 | 61,579,860 | 258,029 | 61,321,831 |
| CK94 | Choanal | DNA Seq | 65,037,361 | 65,026,950 | 57,894,840 | 7,132,110 |
| CK95 | Trachea | DNA Seq | 67,112,661 | 67,101,764 | 59,695,581 | 7,406,183 |
| CK96 | Cloacal | DNA Seq | 60,242,103 | 60,232,603 | 45,152,230 | 15,080,373 |
Table 5.3. Shannon diversity of virus, bacteria, bacteriophage and fungi in choanal cleft, trachea and cloaca of turkey

| | Choanal Cleft | Trachea | Cloaca |
|---|---|---|---|
| Virus | 0.83 | 0.20 | 1.16 |
| Bacteria | 0.75 | 0.86 | 1.20 |
| Bacteriophage | 2.76 | 2.52 | 3.37 |
| Fungi | 0.59 | 0.24 | 0.42 |
Table 5.4. Eukaryotic viral species abundance in the choanal cleft, trachea and cloaca of turkey.

| Location | Nucleic Acid Type | Virus Description | Virus Enveloping | Virus Taxonomy | Virus Species | Number Mapped | Normalized Abundance | Relative Abundance |
|---|---|---|---|---|---|---|---|---|
| Choanal Cleft | DNA | ds | enveloped | Herpesviridae, Mardivirus, Gallid alphaherpesvirus | Gallid herpesvirus 2&3 | 101 | 2253 | 1.1 |
| Choanal Cleft | DNA | ds | nonenveloped | Adenoviridae, Aviadenovirus, Aviadenovirus | Adenovirus | 49 | 4503 | 2.3 |
| Choanal Cleft | RNA | ss, positive, nonsegmented | nonenveloped | Astroviridae, Avastrovirus, Avastrovirus | Turkey astrovirus 2 | 2 | 30851 | 15.5 |
| Choanal Cleft | RNA | ds, positive, segmented | nonenveloped | Reoviridae, Rotavirus, Rotavirus D | Rotavirus D | 1 | 148309 | 74.4 |
| Choanal Cleft | RNA | ss, positive, nonsegmented | nonenveloped | Picornaviridae, Gallivirus, Gallivirus A | Turkey gallivirus | 1 | 13354 | 6.7 |
| Trachea | DNA | ds | enveloped | Herpesviridae, Mardivirus, Gallid alphaherpesvirus | Gallid herpesvirus 2&3 | 98 | 2111 | 3.4 |
| Trachea | DNA | ds | nonenveloped | Adenoviridae, Aviadenovirus, Aviadenovirus | Adenovirus | 677 | 59709 | 95.7 |
| Trachea | DNA | ds | enveloped | Herpesviridae, Mardivirus, Meleagrid alphaherpesvirus 1 | Meleagrid herpesvirus 1 | 24 | 570 | 0.9 |
| Cloaca | DNA | ds | nonenveloped | Adenoviridae, Siadenovirus, Aviadenovirus | Adenovirus | 270 | 50709 | 0.4 |
| Cloaca | RNA | ss, positive | enveloped | Retroviridae, Alpharetrovirus, Avian leukosis virus | Avian leukosis virus | 1 | 120036 | 1.0 |
| Cloaca | RNA | ss, positive, nonsegmented | nonenveloped | Astroviridae, Avastrovirus, Avastrovirus 3 | Turkey astrovirus 2 | 37 | 4399671 | 35.1 |
| Cloaca | RNA | ds, positive, segmented | nonenveloped | Reoviridae, Rotavirus, Rotavirus D | Rotavirus D | 3 | 3429738 | 27.4 |
| Cloaca | RNA | ss, positive, nonsegmented | nonenveloped | Picornaviridae, Gallivirus, Gallivirus A | Turkey gallivirus | 44 | 4529386 | 36.1 |
Table 5.5. Bacteria species abundance in the choanal cleft, trachea and cloaca of turkey.
Number Normalized Relative Location Bacteria Name Mapped Abundance Abundance
Bordetella hinzii 113957 90412 50.8 Escherichia coli 111443 83035 46.6 Choanal Cleft Citrobacter freundii 2566 1962 1.1 Lactobacillus amylovorus 1455 2729 1.5 Escherichia coli 179927 130018 67.5 Citrobacter freundii 6669 54777 28.4 Corynebacterium stationis 1957 2634 1.4 Trachea Eubacterium sp. Marseille-P3177 1774 1916 1.0 Lactobacillus amylovorus 1182 2150 1.1 Bordetella holmesii 1158 1183 0.6 Eubacterium sp. Marseille-P3177 85221 121668 6.5 Escherichia coli 67614 64596 3.4 Campylobacter coli 29671 72910 3.9 Christensenella sp. Marseille-P2438 24832 48476 2.6 Intestinimonas butyriciproducens 23550 34859 1.9 Barnesiella viscericola 20852 33871 1.8 Bacteroides vulgatus 19634 19006 1.0 Roseburia hominis 16220 22568 1.2 Flavonifractor plautii 8568 11214 0.6 Megasphaera elsdenii 6767 13505 0.7 Cloaca Lactobacillus amylovorus 4143 9965 0.5 Alistipes finegoldii 4103 5491 0.3 Bacteroides salanitronis 3789 4463 0.2 Bifidobacterium animalis 3764 9705 0.5 Staphylococcus warneri 3644 1381197 73.8 Mobiluncus curtisii 3088 7190 0.4 Clostridium cellulovorans 2833 2691 0.1 Odoribacter splanchnicus 2609 2969 0.2 Eubacterium rectale 2316 3355 0.2 Cloacibacillus porcorum 2083 2904 0.2
Table 5.6. Bacteriophage species with top 10 highest abundances in the choanal cleft, trachea and cloaca of turkey.
Location       Phage Name                       Number Mapped  Normalized Abundance  Relative Abundance (%)
Choanal Cleft  Enterobacteria phage P88         1383           150522                20.3
Choanal Cleft  Enterobacteria phage P4          377            126420                17.0
Choanal Cleft  Bordetella phage BPP-1           952            87327                 11.8
Choanal Cleft  Stx2-converting phage 1717       1305           81850                 11.0
Choanal Cleft  Enterobacteria phage mEp460      557            48778                 6.6
Choanal Cleft  Enterobacteria phage phiP27      436            39917                 5.4
Choanal Cleft  Phage cdtI DNA                   318            26361                 3.5
Choanal Cleft  Enterobacteria phage fiAA91-ss   161            18662                 2.5
Choanal Cleft  Enterobacteria phage Sf6         138            13777                 1.9
Choanal Cleft  Enterobacteria phage P1          187            7689                  1.0
Choanal Cleft  Salmonella phage SJ46            195            7348                  1.0
Trachea        Enterobacteria phage P88         1514           159809                39.0
Trachea        Enterobacteria phage P1          769            30665                 7.5
Trachea        Shigella phage SfIV              300            28525                 7.0
Trachea        Salmonella phage SJ46            681            24887                 6.1
Trachea        Enterobacteria phage mEp460      237            20129                 4.9
Trachea        Stx2-converting phage 1717       256            15572                 3.8
Trachea        Enterobacteria phage SfV         148            15091                 3.7
Trachea        Escherichia phage TL-2011b       178            15025                 3.7
Trachea        Enterobacteria phage fiAA91-ss   133            14951                 3.7
Trachea        Enterobacteria phage YYZ-2008    106            7299                  1.8
Cloaca         Enterobacteria phage P88         104            14513                 11.6
Cloaca         Salmonella phage SJ46            262            12658                 10.1
Cloaca         Stx2-converting phage 1717       114            9168                  7.3
Cloaca         Enterobacteria phage mEp460      71             7972                  6.4
Cloaca         Salmonella phage RE-2010         54             7911                  6.3
Cloaca         Enterobacteria phage P1          142            7486                  6.0
Cloaca         Escherichia phage TL-2011b       60             6696                  5.4
Cloaca         Phage cdtI DNA, complete genome  42             4464                  3.6
Cloaca         Stxconverting phage vB_EcoP_24B  45             3899                  3.1
Cloaca         Enterobacteria phage YYZ-2008    36             3278                  2.6
Table 5.7. Fungal species with top 10 highest abundances in the choanal cleft, trachea and cloaca of turkey.
Location       Fungi Name                Number Mapped  Normalized Abundance  Relative Abundance (%)
Choanal Cleft  Penicillium chrysogenum   59             144821                73.2
Choanal Cleft  Wickerhamomyces ciferrii  1318           52296                 26.4
Choanal Cleft  Trichosporon asahii       1              120                   0.1
Choanal Cleft  Usnea ceratina            2              119                   0.1
Choanal Cleft  Botrytis cinerea          71             102                   0.1
Trachea        Wickerhamomyces ciferrii  1345           4016207               94.7
Trachea        Penicillium chrysogenum   51             121408                2.9
Trachea        Sordaria macrospora       9              100659                2.4
Cloaca         Wickerhamomyces ciferrii  677            2672670               85.3
Cloaca         Penicillium chrysogenum   145            456360                14.6
REFERENCES
1. USDA, N.A.S.S. Poultry - Production and Value 2018 Summary. 2019; Available
from:
https://www.nass.usda.gov/Publications/Todays_Reports/reports/plva0519.pdf.
2. USDA, N.A.S.S. Turkeys: Production and Value of Production. 2017; Available
from: https://www.nass.usda.gov/Charts_and_Maps/Poultry/tkprvl.php.
3. Johnson, R. Global turkey meat market: Key findings and insights. The Poultry Site
2018; Available from:
https://thepoultrysite.com/news/2018/05/global-turkey-meat-market-key-findings-and-insights.
4. Ramos, S., M. MacLachlan, and A. Melton, Impacts of the 2014-2015 Highly
Pathogenic Avian Influenza Outbreak on the U.S. Poultry Sector. USDA,
Economic Research Service, 2015.
5. Mulholland, K.A. and C.L. Keeler, BiomeSeq: A Tool for the Characterization of
Animal Microbiomes from Metagenomic Data. bioRxiv, 2019: p. 800995.
6. Dalloul, R.A., et al., Multi-platform next-generation sequencing of the domestic
turkey (Meleagris gallopavo): genome assembly and analysis. PLoS Biol, 2010.
8(9).
7. Li, H. and R. Durbin, Fast and accurate long-read alignment with Burrows-
Wheeler transform. Bioinformatics, 2010. 26(5): p. 589-95.
8. Daly, G., et al., Host subtraction, filtering and assembly validations for
novel viral discovery using next generation sequencing data. PLoS One, 2015.
10(6).
9. Langmead, B. and S.L. Salzberg, Fast gapped-read alignment with Bowtie 2. Nat
Methods, 2012. 9(4): p. 357-9.
10. Mulholland, K.A. BiomeSeq Microbial Databases. Avian Genomics 2019;
Available from: https://sites.udel.edu/aviangenomics/.
11. O'Leary, N.A., et al., Reference sequence (RefSeq) database at NCBI: current
status, taxonomic expansion, and functional annotation. Nucleic Acids Res, 2016.
44(D1): p. D733-D745.
12. Herath, D., et al., Assessing Species Diversity Using Metavirome Data: Methods
and Challenges. Comput Struct Biotechnol J, 2017. 15: p. 447-455.
13. Moustafa, A., et al., The blood DNA virome in 8,000 humans. PLoS Pathog, 2017.
13(3): p. e1006292.
14. Ludwig, J. and J. Reynolds, Statistical Ecology, ed. Wiley. 1988, New York.
15. Lemos, L.N., et al., Rethinking microbial diversity analysis in the high throughput
sequencing era. Journal of Microbiological Methods, 2011. 86(1): p. 42-51.
16. Chen, H. and P.C. Boutros, VennDiagram: a package for the generation of
highly-customizable Venn and Euler diagrams in R. BMC Bioinformatics, 2011.
12(1): p. 35.
17. Shannon, P., et al., Cytoscape: a software environment for integrated models of
biomolecular interaction networks. Genome Res, 2003. 13(11): p. 2498-504.
18. Foley, S.L., et al., Population dynamics of Salmonella enterica serotypes in
commercial egg and poultry production. Applied and environmental
microbiology, 2011. 77(13): p. 4273-4279.
19. Magdelaine, P., M.P. Spiess, and E. Valceschini, Poultry meat consumption
trends in Europe. World's Poultry Science Journal, 2008. 64(1): p. 53-64.
20. USDA, Respiratory Disease on Breeder- Chicken Farms in the United States.
Technical Brief 2012; Available from: https://www.aphis.usda.gov/aphis/home.
21. Marin, C., et al., Wild griffon vultures (Gyps fulvus) as a source of Salmonella
and Campylobacter in Eastern Spain. PLoS One, 2014. 9(4): p. e94191.
22. Reynolds, D.L., Y.M. Saif, and K.W. Theil, A survey of enteric viruses of turkey
poults. Avian Dis, 1987. 31(1): p. 89-98.
23. Pantin-Jackwood, M.J., et al., Enteric viruses detected by molecular methods in
commercial chicken and turkey flocks in the United States between 2005 and
2006. Avian Dis, 2008. 52(2): p. 235-44.
24. Reynolds, D.L., K.W. Theil, and Y.M. Saif, Demonstration of rotavirus and
rotavirus-like virus in the intestinal contents of diarrheic pheasant chicks. Avian
Dis, 1987. 31(2): p. 376-9.
25. Day, J.M. and L. Zsak, Recent progress in the characterization of avian enteric
viruses. Avian Dis, 2013. 57(3): p. 573-80.
26. Nighot, P.K., et al., Astrovirus infection induces sodium malabsorption and
redistributes sodium hydrogen exchanger expression. Virology, 2010. 401(2): p.
146-54.
27. Theil, K. and Y.M. Saif, Age-Related Infections with Rotavirus, Rotaviruslike
Virus, and Atypical Rotavirus in Turkey Flocks. Journal of Clinical Microbiology,
1987. 25(2): p. 333-337.
Chapter 6
CONCLUSIONS AND FUTURE DIRECTIONS
Microbiomes are complex environments consisting of a variety of microorganisms including eukaryotic viruses, bacteria, bacteriophage and fungi. These environments can exist in the oral cavity, intestinal tract, skin, respiratory tract, and vaginal tract of both animals and humans [1, 2]. The microorganisms within these environments interact with the host and each other in either symbiosis or dysbiosis, depending on the condition of the host as well as external factors in the host’s surroundings [3]. Symbiotic relationships occur when a balance of specific microflora is achieved, which contributes to maintaining homeostasis of the host environment.
Conversely, dysbiosis may occur due to a disruption of the environment, either by the colonization of a new infectious agent or the introduction of an unfavorable external environmental condition. This may result in infection or disease in the host.
The respiratory microbiome is understudied in comparison to the intestinal, reproductive and oral microbiomes. However, dysbiosis in this environment is associated with respiratory diseases in humans such as chronic obstructive pulmonary disease (COPD), cystic fibrosis (CF) and asthma. Several studies have identified the specific infectious agents that contribute to these diseases [4, 5] as well as the impact of co-infection by multiple infectious agents [6, 7]. In poultry, the main source of protein consumption with over $46.3 billion in global sales as of 2018, respiratory diseases, particularly avian influenza and respiratory disease complex, can contribute to severe economic losses [8]. For example, the 2014-2015 outbreak of highly pathogenic avian influenza in the United States resulted in the loss of over 50 million chickens and turkeys [9]. Many bacteria, viruses and fungi contributing to respiratory diseases in poultry have been identified, and studies have examined the effect of several bacteria-bacteria [10-12], virus-virus [13, 14] and even bacteria-virus co-infections [15-18] on the severity of disease. However, due to limitations in current methodologies, a comprehensive view of the complex ecology within the respiratory microbiome remains elusive.
The advancement of next-generation sequencing methodologies has given rise to an increase in studies examining the microbial communities of a variety of animals. In contrast to traditional culture-dependent approaches, these technologies allow researchers to identify microbial communities at a relatively low cost [19]. Readily accessible and cost-effective sequencing methodologies, together with a number of user-friendly bioinformatics software packages and databases for 16S rRNA sequencing data, provide the standard culture-independent approach for bacterial analysis [20-24].
Although 16S rRNA sequencing has provided insight into one component of the microbiome, it is limited to detecting one specific kingdom, lacks the sensitivity to discriminate between species and cannot be used for novel microbial discovery. Eukaryotic viruses are particularly difficult to analyze due to their high genetic heterogeneity and the lack of a common marker gene. Additionally, there are limited reference genome databases and bioinformatics tools available for viral analysis of microbiomes. To avoid these limitations, researchers have resorted to sequence-independent approaches for viral identification, which present a new challenge because they lose the information necessary for quantification of the virome.
The two major goals of this work were to develop a comprehensive and user-friendly computational tool for the detection and quantification of the major components of a microbiome and to utilize this tool for the analysis of several microbiomes in both healthy and diseased poultry. These goals were accomplished, and the computational tool developed, BiomeSeq, was successful in characterizing the respiratory microbiome of a healthy broiler flock, the respiratory microbiome of a clinically diseased broiler flock and the respiratory, cloacal and choanal cleft microbiomes of a healthy turkey flock. The design of the tool is described in detail in Chapter 2. In summary, a comprehensive workflow and microbial databases were developed that carefully consider the major limitations of existing methodology. Both DNA-Seq and RNA-Seq data generated by next-generation sequencing technology, in either single- or paired-end format, are accepted. The workflow begins with a quality control and decontamination step, which includes trimming of adapter sequences and low-quality reads as well as removal of host DNA.
Using a sequence-dependent approach, the remaining reads are aligned to four microbial reference genome databases. Eukaryotic viral, bacterial, fungal and bacteriophage databases were constructed using complete and representative genomes obtained from the NCBI Reference Sequence Database and contain 5,693, 3,623, 1,281 and 2,212 genomes, respectively. Normalized abundance, species diversity and genome coverage are calculated from the number of reads aligned to the sequences within the microbial databases. The workflow and databases were packaged into a comprehensive computational tool called BiomeSeq, discussed in detail in Chapter 2. To further increase accessibility, BiomeSeq was also implemented in an open-source and user-friendly container available on Docker Hub. Containers such as this allow the end user to download and install BiomeSeq, including the workflow, the databases and all dependent software, on any operating system using one simple command. Furthermore, the container allows users to process their samples, with custom parameters, using one line of code.
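As an illustration of how a species diversity value can be derived from the quantification output described above, the following minimal Python sketch computes a Shannon diversity index from normalized abundances. The choice of the Shannon index and the function name `shannon_index` are assumptions made for illustration only; the diversity measure actually reported by BiomeSeq is defined in Chapter 2. The input values are the tracheal bacterial normalized abundances listed in Table 5.5.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with p_i > 0.

    Assumption: p_i is each taxon's share of the summed normalized abundance.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for value in abundances:
        if value > 0:
            p = value / total
            h -= p * math.log(p)
    return h


# Tracheal bacterial normalized abundances, copied from Table 5.5.
trachea = [130018, 54777, 2634, 1916, 2150, 1183]
print(f"Shannon index (trachea, illustrative): {shannon_index(trachea):.3f}")
```

A community dominated by a single taxon gives a value near zero, while an even community of n taxa gives ln(n), which is why the index is useful for comparing sampling locations or time points.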
The performance of BiomeSeq was evaluated using synthetic datasets containing sequences from a variety of microorganisms experimentally observed in the respiratory microbiome of poultry. BiomeSeq detected each microorganism in the datasets and calculated highly precise abundances. Using a clinical sample, results obtained by BiomeSeq were compared with results obtained by the 16S rRNA approach. BiomeSeq identified 533 unique bacterial genera compared to 24 detected by 16S rRNA sequencing. Furthermore, BiomeSeq has greater taxonomic sensitivity and is able to identify bacteria at the species level, whereas 16S rRNA sequencing is restricted to detection at the genus level. Moreover, 16S rRNA sequencing can only be employed for taxonomic classification of the bacterial component, leaving the identity of the remaining components of the microbiome unknown.
This resource was employed to investigate the microbial communities inhabiting several different microbiomes in both healthy and clinically diseased avian hosts; these studies are detailed in Chapters 3, 4 and 5. In the first study, the development of the respiratory microbiome of a commercial, antibiotic-free broiler flock was examined at weekly intervals from hatching to processing. For each component of the respiratory microbiome of this flock, microbial abundance was calculated at various taxonomic levels and population shifts were examined at various time points. A total of 11 eukaryotic viruses, 45 bacteria, 31 bacteriophage, and 61 fungi were identified throughout the development of this flock. In one interesting finding, the complexity and diversity of the viral community increased as the flock aged, with the occurrence of several viral elements being consistent with the vaccination schedule of the chicks. Additionally, correlations between bacteria and bacteriophage families were investigated and several strongly positive correlations were identified. In the second study, the microbial ecology of the respiratory tract of a broiler flock clinically diagnosed with respiratory disease complex was compared with that of a healthy broiler flock. Changes in the composition and diversity of the viral, bacterial, and bacteriophage communities were observed that were consistent with the complex etiology of this disease. In the final study, the utility of BiomeSeq for characterizing a variety of host species and microbiome locations was highlighted. BiomeSeq was successful in identifying microbial communities inhabiting three unique microbial niches, the trachea, choanal cleft and cloaca, in a turkey flock. BiomeSeq was also successful in characterizing the respiratory microbiome of duck and quail (data not shown).
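The bacteria-bacteriophage family correlations mentioned above can be illustrated with a short Python sketch. The correlation statistic is not named in this chapter, so the Pearson coefficient is assumed here purely for illustration. The function name `pearson` is hypothetical, and the weekly abundance series are placeholder values, not measurements from this work.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Placeholder weekly normalized abundances (hypothetical, for illustration only).
bacterial_family = [10.0, 12.0, 15.0, 20.0]
phage_family = [1.0, 1.5, 2.5, 4.0]

r = pearson(bacterial_family, phage_family)
print(f"Pearson r = {r:.3f}")
```

A coefficient close to +1 would correspond to the kind of strongly positive bacteria-bacteriophage relationship described in the text; a rank-based statistic such as Spearman's could be substituted if abundances are heavily skewed.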
BiomeSeq was successful in identifying and quantifying microbiomes in different locations with unique niches, microbiomes of both healthy and diseased hosts, and microbiomes of a variety of host species. Therefore, the utility of this resource can be extended to include additional species, including humans. Because BiomeSeq accepts custom databases, species-specific microbial databases can be used in place of the databases provided with BiomeSeq. This feature was demonstrated in three studies (Chapters 3, 4 and 5). As part of an extensive sustainability plan for BiomeSeq, an automated program was designed to update the microbial databases biannually. Furthermore, BiomeSeq is accessible to users with various levels of command-line knowledge and computational resources. In addition to the software package and container, BiomeSeq will also become available as a web tool.
These studies provide valuable insight into the complexity of microbiomes. However, this work could be further expanded in several directions. For example, BiomeSeq was designed to identify known microbial species, but this information can be extended by employing sequence-independent approaches, such as contig assembly, which can be used to identify novel microbial elements that the reference databases lack. Moreover, the data presented in this work may be expanded to include metatranscriptomics, metabolomics and proteomics, providing invaluable insight into the biological functions occurring within a particular microbiome. The genomic microbial networks presented in Chapters 3, 4 and 5 are the first visual representations of complete microbiomes, and expanding these networks to incorporate multi-omics data would provide an even deeper understanding. Other possible directions include modeling microbial community structures to predict disease progression during outbreaks. This could help explain how community diversity and shifts in the abundance of specific species contribute to the severity and spread of disease. Additionally, potential microbial interactions could be examined using text mining to generate systemic knowledge networks at the species or gene level. This type of analysis can strengthen our understanding of dynamic communities by drawing on information about possible interactions reported in the literature. Interestingly, cross-talk between microbiomes and other areas of the body has been identified, including the gut-brain [25-28], gut-kidney [29, 30], and gut-liver axes [29], linking microbiomes to conditions such as depression [25, 26], eating disorders [31], autism [27, 28], cancer [32-35], kidney disease [29, 30] and diabetes [36]. The methodological approaches described in this work may help reveal information about these complex systems.
The available literature is rich with studies attempting to decipher the role microorganisms play in a myriad of biological functions in healthy and diseased animals. These studies have emphasized one or two components at a time. However, due to the lack of robust computational tools utilizing metagenomic sequencing data, the complete microbiome remains elusive. The approaches taken to develop the methodology needed to overcome these challenges are detailed in this work and have contributed to a better understanding of the dynamic community structure of microbiomes. By providing new information on how microbial communities develop over time, the population shifts observed in healthy and diseased animals and the similarities and differences observed between microbiomes in different locations of the same animal and in different species, this work allows the complexity of biological systems to be more fully appreciated. Furthermore, the development of this computational tool as an open-source and user-friendly resource was motivated by the hope that it will be used to facilitate future investigations of microbiomes and advance knowledge in this growing field.
REFERENCES
1. Human Microbiome Project Consortium, A framework for human microbiome research.
Nature, 2012. 486(7402): p. 215-221.
2. Human Microbiome Project Consortium, Structure, function and diversity of the healthy
human microbiome. Nature, 2012. 486(7402): p. 207-214.
3. Turnbaugh, P.J., et al., The human microbiome project. Nature, 2007. 449(7164):
p. 804-810.
4. Papi, A., et al., Infections and airway inflammation in chronic obstructive
pulmonary disease severe exacerbations. Am J Respir Crit Care Med, 2006.
173(10): p. 1114-1121.
5. Rohde, G., et al., Respiratory viruses in exacerbations of chronic obstructive
pulmonary disease requiring hospitalisation: a case-control study. Thorax, 2003.
58(1): p. 37-42.
6. Bosch, A.A., et al., Viral and bacterial interactions in the upper respiratory tract.
PLoS Pathog, 2013. 9(1): p. e1003057.
7. Willner, D., et al., Metagenomic analysis of respiratory tract DNA viral
communities in cystic fibrosis and non-cystic fibrosis individuals. PLoS One,
2009. 4(10): p. e7370.
8. USDA, N.A.S.S. Poultry - Production and Value 2018 Summary. 2019; Available
from:
https://www.nass.usda.gov/Publications/Todays_Reports/reports/plva0519.pdf.
9. Ramos, S., M. MacLachlan, and A. Melton, Impacts of the 2014-2015 Highly
Pathogenic Avian Influenza Outbreak on the U.S. Poultry Sector. USDA,
Economic Research Service, 2015.
10. Ganapathy, K., R.C. Jones, and J.M. Bradbury, Pathogenicity of in vivo-passaged
Mycoplasma imitans in turkey poults in single infection and in dual infection with
rhinotracheitis virus. Avian Pathology, 1998. 27(1): p. 80-89.
11. Saif, Y.M., P.D. Moorhead, and E.H. Bohl, Mycoplasma meleagridis and
Escherichia coli infections in germfree and specific-pathogen-free turkey poults:
production of complicated airsacculitis. Am J Vet Res, 1970. 31(9): p. 1637-
1643.
12. Kato, K., Infectious coryza of chickens. V. Influence of Mycoplasma gallisepticum
infection on chicken infected with Haemophilus gallinarum. Natl Inst Anim
Health Q (Tokyo), 1965. 5(4): p. 183-189.
13. Bonfante, F., et al., Synergy or interference of a H9N2 avian influenza virus with
a velogenic Newcastle disease virus in chickens is dose dependent. Avian Pathol,
2017. 46(5): p. 488-496.
14. Karimi-Madab, M., et al., Risk factors for detection of bronchial casts, most
frequently seen in endemic H9N2 avian influenza infection, in poultry flocks in
Iran. Prev Vet Med, 2010. 95(3-4): p. 275-280.
15. Travers, A.F., Concomitant Ornithobacterium rhinotracheale and Newcastle
disease infection in broilers in South Africa. Avian Dis, 1996. 40(2): p. 488-490.
16. Okoye, J.O., C.N. Okeke, and F.K. Ezeobele, Effect of infectious bursal disease
virus infection on the severity of Aspergillus flavus aspergillosis of chickens.
Avian Pathol, 1991. 20(1): p. 167-171.
17. Omuro, M., et al., Interaction of Mycoplasma gallisepticum, mild strains of
Newcastle disease virus and infectious bronchitis virus in chickens. Natl Inst
Anim Health Q (Tokyo), 1971. 11(2): p. 83-93.
18. Kishida, N., et al., Co-infection of Staphylococcus aureus or Haemophilus
paragallinarum exacerbates H9N2 influenza A virus infection in chickens. Arch
Virol, 2004. 149(11): p. 2095-2104.
19. Wetterstrand, K.A. DNA Sequencing Costs: Data from the NHGRI Genome
Sequencing Program. 2019; Available from: https://www.genome.gov/about-
genomics/fact-sheets/DNA-Sequencing-Costs-Data.
20. Caporaso, J.G., et al., QIIME allows analysis of high-throughput community
sequencing data. Nat Methods, 2010. 7(5): p. 335-336.
21. Meyer, F., et al., The metagenomics RAST server - a public resource for the
automatic phylogenetic and functional analysis of metagenomes. BMC
Bioinformatics, 2008. 9: p. 386.
22. Schloss, P.D., et al., Introducing mothur: open-source, platform-independent,
community-supported software for describing and comparing microbial
communities. Appl Environ Microbiol, 2009. 75(23): p. 7537-7541.
23. DeSantis, T.Z., et al., Greengenes, a chimera-checked 16S rRNA gene database
and workbench compatible with ARB. Appl Environ Microbiol, 2006. 72(7): p.
5069-72.
24. Quast, C., et al., The SILVA ribosomal RNA gene database project: improved
data processing and web-based tools. Nucleic Acids Res, 2013. 41(D1): p.
D590-D596.
25. Agrawal, L., et al., Therapeutic potential of serotonin 4 receptor for chronic
depression and its associated comorbidity in the gut. Neuropharmacology, 2020.
166: p. 107969.
26. Penalver Bernabe, B., et al., Precision medicine in perinatal depression in light of
the human microbiome. Psychopharmacology (Berl), 2020. 237(4): p. 915-941.
27. Saurman, V., K.G. Margolis, and R.A. Luna, Autism Spectrum Disorder as a
Brain-Gut-Microbiome Axis Disorder. Dig Dis Sci, 2020.
28. Hartman, R.E. and D. Patel, Dietary Approaches to the Management of Autism
Spectrum Disorders, in Personalized Food Intervention and Therapy for Autism
Spectrum Disorder Management, M.M. Essa and M.W. Qoronfleh, Editors. 2020,
Springer International Publishing: Cham. p. 547-571.
29. Raj, D., et al., The gut-liver-kidney axis: Novel regulator of fatty liver associated
chronic kidney disease. Pharmacol Res, 2020. 152: p. 104617.
30. Jazani, N.H., et al., Impact of Gut Dysbiosis on Neurohormonal Pathways in
Chronic Kidney Disease. Diseases, 2019. 7(1): p. 21.
31. Seitz, J., S. Trinh, and B. Herpertz-Dahlmann, The Microbiome and Eating
Disorders. Psychiatric Clinics of North America, 2019. 42(1): p. 93-103.
32. Peters, B.A., et al., Oral Microbiome Composition Reflects Prospective Risk for
Esophageal Cancers. Cancer Res, 2017. 77(23): p. 6777-6787.
33. Gao, S.G., et al., Preoperative serum immunoglobulin G and A antibodies to
Porphyromonas gingivalis are potential serum biomarkers for the diagnosis and
prognosis of esophageal squamous cell carcinoma. BMC Cancer, 2018. 18(1): p.
17.
34. Ertz-Archambault, N., P. Keim, and D. Von Hoff, Microbiome and pancreatic
cancer: A comprehensive topic review of literature. World J Gastroenterol, 2017.
23(10): p. 1899-1908.
35. Flemer, B., et al., The oral microbiota in colorectal cancer is distinctive and
predictive. Gut, 2018. 67(8): p. 1454-1463.
36. Sharma, M., et al., The Epigenetic Connection Between the Gut Microbiome in
Obesity and Diabetes. Front Genet, 2020. 10: p. 1329.
Appendix A
BIOMESEQ: A TOOL FOR THE CHARACTERIZATION OF ANIMAL MICROBIOMES FROM METAGENOMIC DATA
Table S1. List of RefSeq genomes included in simulated datasets.
RefSeq ID      Microbe Type      Microbe Name
NC_002695.2    Bacteria          Escherichia coli
NC_004829.2    Bacteria          Mycoplasma gallisepticum
NC_018016.1    Bacteria          Ornithobacterium rhinotracheale
NZ_CP008918.1  Bacteria          Pasteurella multocida
NZ_CP011096.1  Bacteria          Mycoplasma synoviae
NC_000866.4    Bacteriophage     Enterobacteria phage T4
NC_001604.1    Bacteriophage     Enterobacteria phage T7
NC_019445.1    Bacteriophage     Escherichia phage TL-2011b
NC_019915.1    Bacteriophage     Staphylococcus phage StB20
AY851295.1     Eukaryotic Virus  Avian infectious bronchitis virus strain Mass 41
DQ530348.1     Eukaryotic Virus  Gallid herpesvirus 2 strain CVI988
EF523390.1     Eukaryotic Virus  Gallid herpesvirus 2 strain RB-1B
GQ504720.1     Eukaryotic Virus  Infectious bronchitis virus strain Arkansas DPI
GQ504723.1     Eukaryotic Virus  Infectious bronchitis virus strain Georgia 1998 Vaccine
KM244097.1     Eukaryotic Virus  Influenza A virus (A/chicken/Delaware/10851/2014(H7N7)) segment 1
KM244098.1     Eukaryotic Virus  Influenza A virus (A/chicken/Delaware/10851/2014(H7N7)) segment 2
KM244099.1     Eukaryotic Virus  Influenza A virus (A/chicken/Delaware/10851/2014(H7N7)) segment 3
KM244100.1     Eukaryotic Virus  Influenza A virus (A/chicken/Delaware/10851/2014(H7N7)) segment 4
KM244101.1     Eukaryotic Virus  Influenza A virus (A/chicken/Delaware/10851/2014(H7N7)) segment 5
KM244102.1     Eukaryotic Virus  Influenza A virus (A/chicken/Delaware/10851/2014(H7N7)) segment 6
KM244103.1     Eukaryotic Virus  Influenza A virus (A/chicken/Delaware/10851/2014(H7N7)) segment 7
KM244104.1     Eukaryotic Virus  Influenza A virus (A/chicken/Delaware/10851/2014(H7N7)) segment 8
NC_002229.3    Eukaryotic Virus  Gallid herpesvirus 2
NC_002577.1    Eukaryotic Virus  Gallid herpesvirus 3
NC_002617.1    Eukaryotic Virus  Newcastle disease virus B1
NC_002641.1    Eukaryotic Virus  Meleagrid herpesvirus 1
NC_004178.1    Eukaryotic Virus  Infectious bursal disease virus segment A
NC_004179.1    Eukaryotic Virus  Infectious bursal disease virus segment B
NC_006623.1    Eukaryotic Virus  Gallid herpesvirus 1
NC_007652.1    Eukaryotic Virus  Avian metapneumovirus
NC_010800.1    Eukaryotic Virus  Turkey coronavirus
NC_008282.1    Fungi             Aspergillus oryzae
NC_018100.1    Fungi             Aspergillus oryzae
NC_036435.1    Fungi             Aspergillus oryzae chromosome 1
NC_036436.1    Fungi             Aspergillus oryzae chromosome 2
NC_036437.1    Fungi             Aspergillus oryzae chromosome 3
NC_036438.1    Fungi             Aspergillus oryzae chromosome 4
NC_036439.1    Fungi             Aspergillus oryzae chromosome 5
NC_036440.1    Fungi             Aspergillus oryzae chromosome 6
NC_036441.1    Fungi             Aspergillus oryzae chromosome 7
NC_036442.1    Fungi             Aspergillus oryzae chromosome 8
NC_006088.5    Host Reference    Gallus gallus breed Red Jungle Fowl isolate RJF #256 chromosome 1
Table S2. The number of reads that remain after each BiomeSeq processing step using four simulated sequencing datasets.
Dataset    Raw Reads   Trimmed Reads  Decontaminated Reads  Reads Aligned to Microbial Genomes
Dataset 1  24,522,223  24,521,469     5,158,013             4,681,160
Dataset 2  24,523,708  24,522,890     5,159,593             4,682,852
Dataset 3  24,523,100  24,522,369     5,158,995             4,681,818
Dataset 4  24,523,100  24,522,284     5,158,260             4,680,864
Table S3. Precision and sensitivity of BiomeSeq for each microbial component.
Microbe           True Positive  False Positive  False Negative  Sensitivity  Precision
Eukaryotic Virus  135016         0               270422          0.413        1.000
Bacteria          3847552        12940           916288          0.808        0.997
Bacteriophage     57252          9457            59256           0.491        0.858
Fungi             14541724       0               155116          0.989        1.000
Total             18581544       22397           1401082         0.930        0.999
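For readers who wish to recompute the summary statistics in Table S3, the following minimal Python sketch applies the standard definitions, sensitivity = TP / (TP + FN) and precision = TP / (TP + FP), to the Bacteria and Total rows. That these standard definitions are the ones used in Table S3 is an assumption; the function names are hypothetical and are not part of BiomeSeq.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of true reads that were recovered: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos, false_pos):
    """Fraction of reported reads that were correct: TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


# (true positive, false positive, false negative) copied from Table S3.
rows = {
    "Bacteria": (3847552, 12940, 916288),
    "Total": (18581544, 22397, 1401082),
}

for name, (tp, fp, fn) in rows.items():
    print(f"{name}: sensitivity={sensitivity(tp, fn):.3f}, "
          f"precision={precision(tp, fp):.3f}")
```

For these two rows the computed values match the tabulated sensitivity and precision (0.808 and 0.997 for Bacteria, 0.930 and 0.999 for the total).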
Table S4. Rate of speed for simulated data during each BiomeSeq processing step in reads/second.
Simulated Dataset  Quality  Decontamination  Microbial Database Alignment  Quantification  Total
Dataset 1          92,537   6,966            2,614                         222,912         325,029
Dataset 2          75,690   5,967            2,294                         195,119         279,070
Dataset 3          71,705   6,049            2,354                         156,061         236,168
Dataset 4          76,396   6,053            2,211                         161,409         246,069
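Table S5. Avian specific viral genome database structure.
Database                  Strandedness (a)        Sense (b)  Segmentation (c)  Envelope (d)   Virus Family      Complete Genomes
Avian DNA Viral Database  Double/Single Stranded  -          -                 Enveloped      Hepeviridae       1
Avian DNA Viral Database  Double/Single Stranded  -          -                 Enveloped      Hepadnaviridae    1
Avian DNA Viral Database  Single Stranded         -          -                 Non-Enveloped  Genomoviridae     3
Avian DNA Viral Database  Single Stranded         -          -                 Non-Enveloped  Parvoviridae      7
Avian DNA Viral Database  Single Stranded         -          -                 Non-Enveloped  Circoviridae      10
Avian DNA Viral Database  Single Stranded         -          -                 Non-Enveloped  Smacoviridae      3
Avian DNA Viral Database  Double Stranded         -          -                 Enveloped      Poxviridae        3
Avian DNA Viral Database  Double Stranded         -          -                 Enveloped      Herpesviridae     6
Avian DNA Viral Database  Double Stranded         -          -                 Non-Enveloped  Adenoviridae      14
Avian RNA Viral Database  Double Stranded         -          Segmented         Non-Enveloped  Reoviridae        5
Avian RNA Viral Database  Double Stranded         -          Segmented         Non-Enveloped  Birnaviridae      1
Avian RNA Viral Database  Single Stranded         Positive   Non-Segmented     Enveloped      Retroviridae      5
Avian RNA Viral Database  Single Stranded         Positive   Non-Segmented     Enveloped      Flaviviridae      3
Avian RNA Viral Database  Single Stranded         Positive   Non-Segmented     Enveloped      Coronaviridae     5
Avian RNA Viral Database  Single Stranded         Positive   Non-Segmented     Non-Enveloped  Astroviridae      5
Avian RNA Viral Database  Single Stranded         Positive   Non-Segmented     Non-Enveloped  Caliciviridae     1
Avian RNA Viral Database  Single Stranded         Positive   Non-Segmented     Non-Enveloped  Picornaviridae    17
Avian RNA Viral Database  Single Stranded         Negative   Segmented         Enveloped      Orthomyxoviridae  16
Avian RNA Viral Database  Single Stranded         Negative   Segmented         Enveloped      Phenuiviridae     1
Avian RNA Viral Database  Single Stranded         Negative   Segmented         Enveloped      Bornaviridae      3
Avian RNA Viral Database  Single Stranded         Negative   Non-Segmented     Enveloped      Pneumoviridae     1
Avian RNA Viral Database  Single Stranded         Negative   Non-Segmented     Enveloped      Paramyxoviridae   14
a single stranded, double stranded or single/double stranded DNA and RNA viruses
b positive-sense or negative-sense RNA viruses
c segmented or non-segmented RNA viruses
d enveloped or non-enveloped DNA and RNA viruses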
Table S6. Abundance of bacterial species detected by BiomeSeq and 16S.
Method     Family               Genera/Species                  Percent Relative Abundance
16S        Pasteurellaceae      Gallibacterium                  37.8
16S        Corynebacteriaceae   Corynebacteriaceae*             22.5
16S        Staphylococcaceae    Staphylococcus                  9.2
16S        Lactobacillaceae     Lactobacillus                   8.7
16S        Lactobacillales      Lactobacillales**               6.4
16S        Brevibacteriaceae    Brevibacterium                  3.1
16S        Staphylococcaceae    Salinicoccus                    2.5
16S        Streptococcaceae     Streptococcus                   2.3
16S        Dermabacteraceae     Brachybacterium                 2.0
16S        Bacillaceae          Bacillaceae*                    2.0
BiomeSeq   Pasteurellaceae      Gallibacterium anatis           23.1
BiomeSeq   Corynebacteriaceae   Corynebacterium falsenii        14.5
BiomeSeq   Staphylococcaceae    Staphylococcus haemolyticus     23.0
BiomeSeq   Enterobacteriaceae   Klebsiella oxytoca              9.3
BiomeSeq   Enterobacteriaceae   Escherichia coli                5.1
BiomeSeq   Staphylococcaceae    Staphylococcus saprophyticus    2.1
BiomeSeq   Methylobacteriaceae  Methylobacterium radiotolerans  1.0
BiomeSeq   Neisseriaceae        Neisseria sicca                 0.8
BiomeSeq   Corynebacteriaceae   Corynebacterium stationis       0.8
BiomeSeq   Yersiniaceae         Serratia marcescens             0.8
Figure S1. BiomeSeq implemented in a user-friendly container.
Appendix B
METAGENOMIC ANALYSIS OF THE RESPIRATORY MICROBIOME OF A HEALTHY BROILER FLOCK FROM HATCHING TO PROCESSING
Table S1. Sequencing data generated by DNA-Seq, RNA-Seq and 16S rRNA.
| DNA-Seq | Total Reads | After Trimming | % Mapped Host | Mapped to DNA Viral Reads | Mapped to Bacteriophage | Mapped to Fungi |
|---|---|---|---|---|---|---|
| Week 0 | 46,586,608 | 46,568,465 | 86.43 | 1 | 68,174 | 545 |
| Week 1 | 43,940,202 | 43,911,476 | 81.11 | 0 | 63,256 | 649 |
| Week 2 | 33,471,831 | 33,442,860 | 89.8 | 1 | 61,974 | 48 |
| Week 3 | 45,131,953 | 45,108,639 | 84.81 | 2 | 68,330 | 313 |
| Week 4 | 44,621,969 | 44,600,956 | 90 | 387 | 61,412 | 109 |
| Week 5 | 38,630,592 | 38,590,484 | 88.99 | 7 | 51,011 | 142 |
| Week 6 | 45,915,721 | 45,900,494 | 89.73 | 4,634 | 66,310 | 82 |
| Week 7 | 41,215,207 | 41,196,338 | 90.18 | 131 | 64,215 | 76 |
| Total | 339,514,083 | 339,319,712 | 701 | 5,163 | 504,682 | 1,964 |
| Average | 42,439,260 | 42,414,964 | 88 | 645 | 63,085 | 246 |

| RNA-Seq | Total Reads | After Trimming | % Mapped Host | Mapped to RNA Viral Reads |
|---|---|---|---|---|
| Week 0 | 62,375,955 | 62,351,619 | 49.21 | 416 |
| Week 1 | 54,496,930 | 54,477,530 | 44.3 | 16,569 |
| Week 2 | 51,426,090 | 51,415,545 | 65.54 | 10,016 |
| Week 3 | 57,566,851 | 57,551,710 | 28.76 | 977 |
| Week 4 | 56,334,014 | 56,321,058 | 61.62 | 3,660 |
| Week 5 | 50,937,821 | 50,926,463 | 57.56 | 4,579 |
| Week 6 | 53,248,431 | 53,236,536 | 59.08 | 29,476 |
| Week 7 | 54,173,052 | 54,162,138 | 61.44 | 6,243 |
| Total | 440,559,144 | 440,442,599 | 428 | 71,936 |
| Average | 55,069,893 | 55,055,325 | 53 | 8,992 |

| 16S rRNA | Total Reads | OTUs |
|---|---|---|
| Week 0 | 8,438 | 47 |
| Week 1 | 7,550 | 79 |
| Week 2 | 6,737 | 76 |
| Week 4 | 9,450 | 63 |
| Week 5 | 10,718 | 50 |
| Week 7 | 7,288 | 38 |
| Total | 50,181 | 353 |
| Average | 8,364 | 59 |
Table S2. Normalized abundance of detected eukaryotic viruses at the species level.
| Species | Week 0 | Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 | Week 7 |
|---|---|---|---|---|---|---|---|---|
| Gallid alphaherpesvirus 1 | 33.52 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Meleagrid alphaherpesvirus 1 | 0.00 | 0.00 | 41.97 | 0.00 | 31.40 | 0.00 | 0.00 | 0.00 |
| Gallid alphaherpesvirus 2&3 | 0.00 | 0.00 | 0.00 | 63.84 | 1186.57 | 71.12 | 57.03 | 0.00 |
| Avian gyrovirus | 0.00 | 0.00 | 0.00 | 0.00 | 727764.39 | 12256.84 | 147165.92 | 68358.83 |
| Fowl aviadenovirus | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 507048.92 | 12450.59 |
| Chicken astrovirus | 0.00 | 1490.69 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Chicken sicinivirus JSY | 0.00 | 0.00 | 0.00 | 42.11 | 42.11 | 0.00 | 0.00 | 0.00 |
| Avian carcinoma virus | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 78.52 | 0.00 | 78.52 |
| Avian infectious bronchitis virus | 74.78 | 109675.86 | 67740.67 | 4048.83 | 15467.52 | 23999.29 | 207253.45 | 35914.02 |
| Infectious bursal disease virus | 0.00 | 0.00 | 0.00 | 0.00 | 64.88 | 0.00 | 22191.95 | 194.65 |
| Avian Endogenous Retrovirus | 19470.62 | 89108.98 | 47135.34 | 20718.00 | 76251.66 | 65275.35 | 67618.14 | 68898.80 |
Table S3. Normalized percent relative abundance of detected eukaryotic viruses at the species level. Sum of columns = 100%.
| Family | Species | Week 0 | Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 | Week 7 | Frequency |
|---|---|---|---|---|---|---|---|---|---|---|
| Poxviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Herpesviridae | Gallid alphaherpesvirus 1 | 0.17 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 12.5 |
| Herpesviridae | Meleagrid alphaherpesvirus 1 | 0.00 | 0.00 | 0.04 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 12.5 |
| Herpesviridae | Gallid alphaherpesvirus 2&3 | 0.00 | 0.00 | 0.00 | 0.26 | 0.14 | 0.07 | 0.01 | 0.00 | 50.0 |
| Adenoviridae | Fowl aviadenovirus | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 53.30 | 6.70 | 25.0 |
| Hepeviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Hepadnaviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Genomoviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Circoviridae | Avian gyrovirus | 0.00 | 0.00 | 0.00 | 0.00 | 88.66 | 12.05 | 15.47 | 36.77 | 50.0 |
| Parvoviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Reoviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Orthomyxoviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Phenuiviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Birnaviridae | Infectious bursal disease virus | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 2.33 | 0.10 | 37.5 |
| Pneumoviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Paramyxoviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Astroviridae | Chicken astrovirus | 0.00 | 0.74 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 12.5 |
| Picornaviridae | Chicken sicinivirus JSY | 0.00 | 0.00 | 0.00 | 0.17 | 0.01 | 0.00 | 0.00 | 0.00 | 25.0 |
| Retroviridae | Avian Endogenous Retrovirus | 99.45 | 44.49 | 41.02 | 83.30 | 9.29 | 64.20 | 7.11 | 37.06 | 100.0 |
| Retroviridae | Avian carcinoma virus | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.08 | 0.00 | 0.04 | 25.0 |
| Flaviviridae | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 |
| Coronaviridae | Avian infectious bronchitis virus | 0.38 | 54.76 | 58.95 | 16.28 | 1.88 | 23.60 | 21.79 | 19.32 | 100.0 |
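The relationship between Tables S2 and S3 can be sketched in Python. This is a minimal illustration, not BiomeSeq code. It assumes that each Week column of Table S3 is the corresponding column of Table S2 rescaled to sum to 100%, and that Frequency is the percentage of the eight sampled weeks in which a species has non-zero abundance. The function names are hypothetical.

```python
# Week 0 normalized abundances from Table S2 (all other species are 0.00).
week0 = {
    "Gallid alphaherpesvirus 1": 33.52,
    "Avian infectious bronchitis virus": 74.78,
    "Avian Endogenous Retrovirus": 19470.62,
}


def percent_relative_abundance(column):
    """Rescale one week's abundances so the column sums to 100% (assumed rule)."""
    total = sum(column.values())
    return {species: 100.0 * value / total for species, value in column.items()}


def detection_frequency(row):
    """Percentage of sampled weeks with non-zero abundance (assumed rule)."""
    return 100.0 * sum(1 for value in row if value > 0) / len(row)


week0_percent = percent_relative_abundance(week0)

# Gallid alphaherpesvirus 2&3 across Weeks 0-7, copied from Table S2.
gahv23 = [0.00, 0.00, 0.00, 63.84, 1186.57, 71.12, 57.03, 0.00]
gahv23_frequency = detection_frequency(gahv23)

for species, pct in week0_percent.items():
    print(f"{species}: {pct:.2f}%")
print(f"Gallid alphaherpesvirus 2&3 frequency: {gahv23_frequency:.1f}")
```

Under these assumptions the Week 0 values round to 0.17, 0.38 and 99.45, and the Gallid alphaherpesvirus 2&3 frequency is 50.0, in agreement with the corresponding entries of Table S3.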
Table S4. Normalized percent relative abundance of detected eukaryotic viruses at the family level. Sum of rows = 100%.
| Viral Family | Week 0 | Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 | Week 7 |
|---|---|---|---|---|---|---|---|---|
| Poxviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Herpesviridae | 24.86 | 0.00 | 5.30 | 37.27 | 21.54 | 10.16 | 0.87 | 0.00 |
| Adenoviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 88.84 | 11.16 |
| Hepeviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Hepadnaviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Genomoviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Circoviridae | 0.00 | 0.00 | 0.00 | 0.00 | 57.97 | 7.88 | 10.11 | 24.04 |
| Parvoviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Reoviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Orthomyxoviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Phenuiviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Birnaviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.32 | 0.00 | 95.39 | 4.28 |
| Pneumoviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Paramyxoviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Astroviridae | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Picornaviridae | 0.00 | 0.00 | 0.00 | 97.06 | 2.94 | 0.00 | 0.00 | 0.00 |
| Retroviridae | 25.76 | 11.53 | 10.63 | 21.58 | 2.41 | 16.65 | 1.84 | 9.61 |
| Flaviviridae | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Coronaviridae | 0.19 | 27.80 | 29.93 | 8.26 | 0.96 | 11.98 | 11.06 | 9.81 |
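The row normalization described in the caption of Table S4 (sum of rows = 100%) can be illustrated with a short Python sketch. It assumes that each family row of Table S4 is obtained by dividing that family's weekly percentages from Table S3 by their sum across the eight weeks. The helper name `row_percent` is hypothetical, and the zero-row handling is an assumption for families that were never detected.

```python
def row_percent(values):
    """Rescale one family's weekly values so the row sums to 100%.

    Rows that are zero in every week are returned unchanged (assumption).
    """
    total = sum(values)
    if total == 0:
        return [0.0 for _ in values]
    return [100.0 * value / total for value in values]


# Circoviridae (Avian gyrovirus) percentages for Weeks 0-7, from Table S3.
circoviridae = [0.00, 0.00, 0.00, 0.00, 88.66, 12.05, 15.47, 36.77]
circoviridae_share = [round(value, 2) for value in row_percent(circoviridae)]
print(circoviridae_share)
```

For Circoviridae this yields 57.97, 7.88, 10.11 and 24.04 for Weeks 4 to 7, matching the Circoviridae row of Table S4. Families with several species may differ slightly in the last digit if the published values were computed from unrounded data.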
Table S5. Average normalized relative abundance of viruses detected.
| Strand | Segments | Envelope | Family | Genus | Species |
|---|---|---|---|---|---|
| double stranded DNA 7.59% | | enveloped DNA 0.09% | Herpesviridae 0.09% | Iltovirus 0.02% | Gallid alphaherpesvirus 1 0.02% |
| | | | | Mardivirus 0.06% | Meleagrid alphaherpesvirus 1 0.01% |
| | | | | | Gallid alphaherpesvirus 2&3 0.06% |
| | | non-enveloped DNA 7.50% | Adenoviridae 7.50% | Aviadenovirus 7.50% | Fowl aviadenovirus 7.50% |
| single stranded DNA 19.12% | | non-enveloped DNA 19.12% | Circoviridae 19.12% | Gyrovirus 19.12% | Avian gyrovirus 19.12% |
| single stranded RNA 73.95% | positive, non-segmented 73.64% | non-enveloped RNA 0.77% | Astroviridae 0.74% | Avastrovirus 0.74% | Chicken astrovirus 0.74% |
| | | | Picornaviridae 0.02% | Sicinivirus 0.02% | Chicken sicinivirus JSY 0.02% |
| | | enveloped RNA 72.87% | Coronaviridae 24.62% | Gammacoronavirus 24.62% | Avian infectious bronchitis virus 24.62% |
| | | | Retroviridae 48.25% | Unclassified Retrovirus 48.25% | Avian Endogenous Retrovirus 48.24% |
| | | | | | Avian carcinoma virus 0.01% |
| | negative, segmented 0.31% | enveloped RNA 0.31% | Birnaviridae 0.31% | Avibirnavirus 0.31% | Infectious bursal disease virus 0.31% |
Table S6. Normalized abundance of detected bacteriophage at the species level.
| Species | Week 0 | Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 | Week 7 |
|---|---|---|---|---|---|---|---|---|
| Enterobacteria phage IME10 | 202.28 | 1258.09 | 150.16 | 611.33 | 0.00 | 0.00 | 0.00 | 0.00 |
| Staphylococcus phage SPbeta-like | 313.94 | 0.00 | 0.00 | 284.63 | 90.51 | 0.00 | 0.00 | 0.00 |
| Enterobacteria phage P88 | 0.00 | 928.47 | 0.00 | 1353.48 | 484.17 | 0.00 | 0.00 | 0.00 |
| Staphylococcus phage GH15 | 0.00 | 59.46 | 0.00 | 1040.16 | 41.34 | 0.00 | 0.00 | 0.00 |
| Shigella phage SHFML-11 | 140.99 | 97.43 | 69.77 | 0.00 | 33.87 | 160.41 | 0.00 | 70.65 |
| Enterobacteria phage P1 | 253.79 | 613.83 | 62.80 | 894.81 | 0.00 | 0.00 | 0.00 | 0.00 |
| Staphylococcus phage StB20-like | 0.00 | 0.00 | 0.00 | 3873.58 | 568.48 | 168.27 | 0.00 | 0.00 |
| Staphylococcus phage P108 | 0.00 | 0.00 | 0.00 | 1290.95 | 123.15 | 48.60 | 0.00 | 0.00 |
| Staphylococcus phage phiSA012 | 0.00 | 0.00 | 0.00 | 1108.69 | 244.06 | 48.16 | 0.00 | 0.00 |
| Microbacterium phage Min1 | 2075.63 | 896.48 | 0.00 | 0.00 | 0.00 | 0.00 | 412.69 | 0.00 |
| Staphylococcus phage phiRS7 | 924.06 | 191.57 | 0.00 | 3071.89 | 133.20 | 315.42 | 0.00 | 0.00 |
| Enterobacteria phage cdtI | 170.56 | 2298.33 | 0.00 | 773.17 | 245.85 | 0.00 | 135.64 | 128.21 |
| Enterobacteria phage SfI | 1462.35 | 1732.38 | 310.15 | 1578.36 | 0.00 | 178.27 | 0.00 | 0.00 |
| Staphylococcus phage StB20 | 0.00 | 0.00 | 0.00 | 2979.68 | 1421.19 | 168.27 | 156.82 | 148.23 |
| Salmonella phage SJ46 | 387.63 | 1285.79 | 57.55 | 937.18 | 223.50 | 132.32 | 61.66 | 0.00 |
| Staphylococcus phage MCE-2014 | 56.51 | 58.58 | 0.00 | 2732.68 | 0.00 | 144.68 | 0.00 | 42.48 |
| Enterobacteria phage lambda | 2480.22 | 2399.55 | 122.74 | 249.85 | 476.68 | 141.10 | 526.00 | 0.00 |
| Enterobacteria phage phi92 | 0.00 | 55.94 | 0.00 | 978.52 | 933.43 | 184.20 | 0.00 | 0.00 |
| Enterobacteria phage mEp460 | 2702.67 | 3175.06 | 267.49 | 1089.04 | 0.00 | 461.27 | 143.29 | 0.00 |
| Lactobacillus prophage Lj928 | 0.00 | 10612.23 | 465.28 | 0.00 | 150.58 | 0.00 | 0.00 | 0.00 |
| Staphylococcus phage phiIPLA-RODI | 56.34 | 0.00 | 0.00 | 3490.40 | 284.23 | 480.77 | 0.00 | 42.35 |
| Salmonella phage RE-2010 | 9637.67 | 4873.26 | 523.47 | 4617.59 | 847.08 | 802.38 | 186.95 | 353.40 |
| Enterobacteria phage T7 | 7630.75 | 5620.16 | 298.12 | 3944.67 | 434.18 | 856.81 | 798.51 | 452.84 |
| Stx2-converting phage 1717 | 5032.73 | 4012.93 | 478.95 | 2729.93 | 186.01 | 550.60 | 205.26 | 194.00 |
| Enterobacteria phage VT2phi_272 | 6809.25 | 3781.24 | 270.78 | 3123.52 | 438.18 | 518.81 | 386.81 | 91.40 |
| Shigella phage SfIV | 11295.95 | 7318.20 | 898.40 | 3962.43 | 436.14 | 860.67 | 1283.37 | 0.00 |
| Escherichia phage TL-2011b | 11998.06 | 7981.90 | 398.79 | 4600.12 | 1032.51 | 916.89 | 996.93 | 269.22 |
| Enterobacteria phage RB55 | 0.00 | 0.00 | 3982.75 | 0.00 | 0.00 | 5388.95 | 0.00 | 1284.90 |
| Stx2 converting phage vB_EcoP_24B | 38098.43 | 20466.63 | 825.71 | 12816.52 | 2004.26 | 949.24 | 3096.30 | 522.60 |
| Uncultured phage crAssphage | 1487.20 | 171.29 | 0.00 | 124.85 | 0.00 | 0.00 | 0.00 | 0.00 |
| Enterobacteria phage YYZ-2008 | 5989.66 | 3482.95 | 108.44 | 2428.26 | 526.45 | 0.00 | 697.11 | 109.82 |
Table S7. Normalized percent relative abundance of detected bacteriophage at the species level.
| Family | Species | Week 0 | Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 | Week 7 | Frequency |
|---|---|---|---|---|---|---|---|---|---|---|
| Myoviridae | Enterobacteria phage P88 | | 1.114 | | 2.030 | 4.262 | | | | 50 |
| | Staphylococcus phage GH15 | | 0.071 | | 1.560 | 0.364 | | | | 37.5 |
| | Shigella phage SHFML-11 | 0.129 | 0.117 | 0.751 | 0.000 | 0.298 | 1.190 | | 1.904 | 75 |
| | Enterobacteria phage P1 | 0.232 | 0.736 | 0.676 | 1.342 | | | | | 50 |
| | Staphylococcus phage P108 | | | | 1.936 | 1.084 | 0.361 | | | 37.5 |
| | Staphylococcus phage phiSA012 | | | | 1.663 | 2.149 | 0.357 | | | 37.5 |
| | Enterobacteria phage SfI | 1.339 | 2.078 | 3.338 | 2.367 | 0.000 | 1.323 | | | 62.5 |
| | Salmonella phage SJ46 | 0.355 | 1.542 | 0.619 | 1.405 | 1.968 | 0.982 | 0.678 | | 87.5 |
| | Staphylococcus phage MCE-2014 | 0.052 | 0.070 | | 4.098 | | 1.074 | | 1.145 | 62.5 |
| | Enterobacteria phage phi92 | | 0.067 | | 1.467 | 8.218 | 1.367 | | | 50 |
| | Staphylococcus phage phiIPLA-RODI | 0.052 | | | 5.234 | 2.502 | 3.568 | | 1.141 | 62.5 |
| | Salmonella phage RE-2010 | 8.825 | 5.845 | 5.634 | 6.924 | 7.457 | 5.954 | 2.057 | 9.525 | 100 |
| | Shigella phage SfIV | 10.344 | 8.778 | 9.669 | 5.942 | 3.840 | 6.387 | 14.123 | | 87.5 |
| | Enterobacteria phage RB55 | | | 42.865 | | | 39.989 | | 34.632 | 37.5 |
| Podoviridae | Enterobacteria phage IME10 | 0.185 | 1.509 | 1.616 | 0.917 | | | | | 50 |
| | Enterobacteria phage T7 | 6.987 | 6.741 | 3.209 | 5.915 | 3.822 | 6.358 | 8.787 | 12.206 | 100 |
| | Enterobacteria phage VT2phi_272 | 6.235 | 4.535 | 2.914 | 4.684 | 3.858 | 3.850 | 4.257 | 2.464 | 100 |
| | Escherichia phage TL-2011b | 10.987 | 9.574 | 4.292 | 6.898 | 9.090 | 6.804 | 10.970 | 7.256 | 100 |
| | Stx2 converting phage vB_EcoP_24B | 34.887 | 24.549 | 8.887 | 19.219 | 17.645 | 7.044 | 34.073 | 14.086 | 100 |
| Siphoviridae | Staphylococcus phage SPbeta-like | 0.287 | | | 0.427 | 0.797 | | | | 37.5 |
| | Staphylococcus phage StB20-like | | | | 5.809 | 5.005 | 1.249 | | | 37.5 |
| | Microbacterium phage Min1 | 1.901 | 1.075 | | | | | 4.541 | | 37.5 |
| | Staphylococcus phage phiRS7 | 0.846 | 0.230 | | 4.606 | 1.173 | 2.341 | | | 62.5 |
| | Enterobacteria phage cdtI | 0.156 | 2.757 | | 1.159 | 2.164 | | 1.493 | 3.456 | 75 |
| | Staphylococcus phage StB20 | | | | 4.468 | 12.512 | 1.249 | 1.726 | 3.995 | 62.5 |
| | Enterobacteria phage lambda | 2.271 | 2.878 | 1.321 | 0.375 | 4.196 | 1.047 | 5.788 | | 87.5 |
| | Enterobacteria phage mEp460 | 2.475 | 3.808 | 2.879 | 1.633 | | 3.423 | 1.577 | | 75 |
| | Lactobacillus prophage Lj928 | 0.000 | 12.729 | 5.008 | | 1.326 | | | | 37.5 |
| | Stx2-converting phage 1717 | 4.608 | 4.813 | 5.155 | 4.094 | 1.638 | 4.086 | 2.259 | 5.229 | 100 |
| Unclassified | Uncultured phage crAssphage | 1.362 | 0.205 | | 0.187 | | | | | 37.5 |
| | Enterobacteria phage YYZ-2008 | 5.485 | 4.178 | 1.167 | 3.641 | 4.635 | | 7.671 | 2.960 | 87.5 |
Table S8. Average normalized relative abundance of detected bacteriophage.
| Order | Family | Genus | Species | Average |
|---|---|---|---|---|
| Caudovirales 142.34% | Myoviridae 70.99% | P1virus 1.83% | Enterobacteria phage P1 | 0.75% |
| | | | Salmonella phage SJ46 | 1.08% |
| | | Spounavirinae 6.97% | Staphylococcus phage GH15 | 0.67% |
| | | | Staphylococcus phage MCE-2014 | 1.29% |
| | | | Staphylococcus phage P108 | 1.13% |
| | | | Staphylococcus phage phiIPLA-RODI | 2.50% |
| | | | Staphylococcus phage phiSA012 | 1.39% |
| | | Tevenvirinae 39.89% | Enterobacteria phage RB55 | 39.16% |
| | | | Shigella phage SHFML-11 | 0.73% |
| | | unclassified Myoviridae 22.31% | Enterobacteria phage P88 | 2.47% |
| | | | Enterobacteria phage phi92 | 2.78% |
| | | | Enterobacteria phage SfI | 2.09% |
| | | | Salmonella phage RE-2010 | 6.53% |
| | | | Shigella phage SfIV | 8.44% |
| | Podoviridae 40.19% | Autographivirinae 6.75% | Enterobacteria phage T7 | 6.75% |
| | | Epsilon15virus 8.23% | Escherichia phage TL-2011b | 8.23% |
| | | Sepvirinae 24.15% | Enterobacteria phage VT2phi_272 | 4.10% |
| | | | Stx2 converting phage vB_EcoP_24B | 20.05% |
| | | Unclassified Podoviridae 1.06% | Enterobacteria phage IME10 | 1.06% |
| | Siphoviridae 31.16% | Lambdavirus 11.15% | Enterobacteria phage cdtI | 1.98% |
| | | | Enterobacteria phage lambda | 2.55% |
| | | | Enterobacteria phage mEp460 | 2.63% |
| | | | Stx2-converting phage 1717 | 3.99% |
| | | Spbetavirus 0.50% | Staphylococcus phage SPbeta-like | 0.50% |
| | | unclassified Siphoviridae 19.51% | Lactobacillus prophage Lj928 | 6.35% |
| | | | Microbacterium phage Min1 | 2.51% |
| | | | Staphylococcus phage phiRS7 | 1.84% |
| | | | Staphylococcus phage StB20 | 4.79% |
| | | | Staphylococcus phage StB20-like | 4.02% |
| Unclassified 4.83% | Unclassified 4.83% | Unclassified 4.83% | Enterobacteria phage YYZ-2008 | 4.25% |
| | | | Uncultured phage crAssphage | 0.59% |
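The species averages in Table S8 appear to be means taken over only those weeks in which a phage was detected. This is an inference from the tabulated values in Table S7, not a method stated in the text. The Python sketch below illustrates that reading. The function name `mean_over_detected` is hypothetical, and blank cells in Table S7 are represented as `None`.

```python
def mean_over_detected(weekly_percent):
    """Average the weekly percentages over weeks with a detection only.

    Blank cells (None) and explicit zeros are treated as "not detected",
    which is an assumption about how Table S8 was computed.
    """
    detected = [v for v in weekly_percent if v is not None and v > 0]
    if not detected:
        return 0.0
    return sum(detected) / len(detected)


# Weekly percent relative abundance (Weeks 0-7) taken from Table S7.
p88 = [None, 1.114, None, 2.030, 4.262, None, None, None]
rb55 = [None, None, 42.865, None, None, 39.989, None, 34.632]

p88_average = round(mean_over_detected(p88), 2)
rb55_average = round(mean_over_detected(rb55), 2)
print(p88_average, rb55_average)
```

With these inputs the sketch gives 2.47 for Enterobacteria phage P88 and 39.16 for Enterobacteria phage RB55, matching Table S8. Averaging only over detected weeks would also explain why group totals such as Caudovirales can exceed 100%.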
Table S9. Relative abundance of detected bacteria at the genus level. (Taxa that could not be assigned a genus are displayed using the highest taxonomic level that could be assigned to them: * (family), ** (order), or *** (class).)
| Family | Genus | Week 0 | Week 1 | Week 3 | Week 4 | Week 5 | Week 7 | Frequency |
|---|---|---|---|---|---|---|---|---|
| Corynebacteriaceae | Corynebacteriaceae* | 6.00 | 1.00 | 1.00 | 1.00 | 0.50 | 22.00 | 100 |
| Brevibacteriaceae | Brevibacterium | 9.00 | 6.00 | 7.00 | 5.00 | 4.00 | 3.00 | 100 |
| Dermabacteraceae | Brachybacterium | 8.00 | 6.00 | 5.00 | 7.00 | 2.00 | 2.00 | 100 |
| Micrococcaceae | Yaniella | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.20 | 100 |
| Micrococcaceae | Micrococcaceae* | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 16.7 |
| Nocardiopsaceae | Nocardiopsis | 3.00 | 1.00 | 1.00 | 21.00 | 0.10 | 0.00 | 83.3 |
| Bacteroidaceae | Bacteroides | 0.00 | 0.00 | 3.00 | 2.00 | 2.00 | 0.00 | 50 |
| Prevotellaceae | Alloprevotella | 2.00 | 0.00 | 0.00 | 0.00 | 0.10 | 0.00 | 33.3 |
| Flavobacteriaceae | Chryseobacterium | 7.00 | 0.00 | 0.00 | 0.10 | 0.10 | 0.00 | 50 |
| Bacillaceae | Lentibacillus | 2.00 | 0.40 | 0.10 | 0.00 | 0.00 | 0.00 | 50 |
| Bacillaceae | Paucisalibacillus | 1.00 | 2.00 | 2.00 | 6.00 | 0.30 | 0.30 | 100 |
| Staphylococcaceae | Jeotgalicoccus | 1.00 | 0.40 | 0.30 | 1.00 | 0.40 | 1.00 | 100 |
| Staphylococcaceae | Salinicoccus | 1.00 | 2.00 | 2.00 | 4.00 | 1.00 | 2.40 | 100 |
| Staphylococcaceae | Staphylococcus | 3.00 | 7.10 | 11.00 | 16.00 | 6.00 | 9.00 | 100 |
| Bacillaceae | Bacillaceae* | 6.50 | 6.30 | 4.20 | 10.40 | 1.20 | 2.00 | 100 |
| Planococcaceae | Planococcaceae* | 4.00 | 0.00 | 0.00 | 0.00 | 0.10 | 0.00 | 33.3 |
| Aerococcaceae | Facklamia | 2.00 | 0.00 | 0.10 | 0.20 | 0.00 | 0.10 | 66.7 |
| Lactobacillaceae | Lactobacillus | 5.10 | 14.10 | 34.10 | 7.00 | 18.20 | 8.50 | 100 |
| Leuconostocaceae | Weissella | 0.00 | 1.00 | 0.40 | 3.00 | 1.00 | 0.10 | 83.3 |
| Streptococcaceae | Streptococcus | 8.00 | 1.10 | 0.10 | 0.40 | 0.40 | 2.20 | 100 |
| Lactobacillales | Lactobacillales** | 1.00 | 23.00 | 0.10 | 0.00 | 11.00 | 6.30 | 83.3 |
| Lachnospiraceae | Lachnoclostridium | 0.00 | 3.00 | 2.00 | 2.00 | 0.30 | 0.00 | 66.7 |
| Ruminococcaceae | Anaerotruncus | 0.00 | 1.00 | 1.00 | 0.10 | 0.30 | 0.00 | 66.7 |
| Ruminococcaceae | Faecalibacterium | 0.00 | 0.00 | 5.00 | 2.00 | 4.00 | 1.00 | 66.7 |
| Ruminococcaceae | Subdoligranulum | 0.00 | 0.00 | 1.00 | 0.20 | 0.20 | 0.10 | 66.7 |
| Lachnospiraceae | Lachnospiraceae* | 0.00 | 2.00 | 0.20 | 0.10 | 0.00 | 0.00 | 50 |
| Peptostreptococcaceae | Peptostreptococcaceae* | 0.00 | 1.00 | 0.10 | 1.00 | 0.40 | 0.40 | 83.3 |
| | Bacilli*** | 0.50 | 1.00 | 2.40 | 1.40 | 0.50 | 0.00 | 83.3 |
| Oxalobacteraceae | Oxalobacteraceae* | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 16.7 |
| Pasteurellaceae | Gallibacterium | 0.00 | 0.00 | 1.00 | 0.00 | 44.00 | 37.00 | 50 |
| Pseudomonadaceae | Pseudomonas | 13.00 | 2.00 | 0.00 | 0.00 | 0.10 | 0.00 | 50 |
| Xanthomonadaceae | Xanthomonas | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 16.7 |
| Enterobacteriaceae | Escherichia-Shigella | 0.10 | 2.00 | 1.00 | 2.00 | 0.20 | 0.20 | 100 |
Table S10. Average relative abundance of detected bacteria.
| Phylum | Class | Order | Family | Genus |
|---|---|---|---|---|
| Actinobacteria 24.00% | Actinobacteria 24.00% | Corynebacteriales 5.25% | Corynebacteriaceae 5.25% | 5.25% |
| | | Micrococcales 13.53% | Brevibacteriaceae 5.67% | Brevibacterium 5.67% |
| | | | Dermabacteraceae 5.00% | Brachybacterium 5.00% |
| | | | Micrococcaceae 2.87% | Yaniella 0.87% |
| | | | | 2.00% |
| | | Streptosporangiales 5.22% | Nocardiopsaceae 5.22% | Nocardiopsis 5.22% |
| Bacteroidetes 5.78% | Bacteroidia 3.38% | Bacteroidales 3.38% | Bacteroidaceae 2.33% | Bacteroides 2.33% |
| | | | Prevotellaceae 1.05% | Alloprevotella 1.05% |
| | Flavobacteria 2.40% | Flavobacteriales 2.40% | Flavobacteriaceae 2.40% | Chryseobacterium 2.40% |
| Firmicutes 56.17% | Bacilli 49.02% | Bacillales 21.35% | Bacillaceae 7.87% | Lentibacillus 0.83% |
| | | | | Paucisalibacillus 1.93% |
| | | | | 5.10% |
| | | | Staphylococcaceae 11.43% | Jeotgalicoccus 0.68% |
| | | | | Salinicoccus 2.07% |
| | | | | Staphylococcus 8.68% |
| | | | Planococcaceae 2.05% | 2.05% |
| | | Lactobacillales 27.67% | Aerococcaceae 0.60% | Facklamia 0.60% |
| | | | Lactobacillaceae 14.50% | Lactobacillus 14.50% |
| | | | Leuconostocaceae 1.10% | Weissella 1.10% |
| | | | Streptococcaceae 2.03% | Streptococcus 2.03% |
| | | | Lactobacillales 8.28% | 8.28% |
| | | | | 1.16% |
| | Clostridia 7.15% | Clostridiales 7.15% | Lachnospiraceae 2.59% | Lachnoclostridium 1.83% |
| | | | | 0.77% |
| | | | Ruminococcaceae 3.98% | Anaerotruncus 0.60% |
| | | | | Faecalibacterium 3.00% |
| | | | | Subdoligranulum 0.38% |
| | | | Peptostreptococcaceae 0.58% | 0.58% |
| Proteobacteria 39.28% | Betaproteobacteria 3.00% | Burkholderiales 3.00% | Oxalobacteraceae 3.00% | 3.00% |
| | Gammaproteobacteria 36.28% | Pasteurellales 27.33% | Pasteurellaceae 27.33% | Gallibacterium 27.33% |
| | | Pseudomonadales 5.03% | Pseudomonadaceae 5.03% | Pseudomonas 5.03% |
| | | Xanthomonadales 3.00% | Xanthomonadaceae 3.00% | Xanthomonas 3.00% |
| | | Enterobacterales 0.92% | Enterobacteriaceae 0.92% | Escherichia-Shigella 0.92% |
Table S11. Normalized abundance of detected fungi at the species level.
Species Week 0 Week 1 Week 2 Week 3 Week 4 Week 5 Week 6 Week 7
Candida glabrata 4.53 3.79 11.63
Agaricus bisporus 8.73 2.99
Amorphotheca resinae 3.17 5.13
Aspergillus fumigatus 183.48 170.82
Aspergillus nidulans 169.51
Aspergillus oryzae 1.18 25.15 1.73 9.14 2.96 1.83 1.03 4.35
Bipolaris cookei 43.02
Botrytis cinerea 4.84 2.87 2.94 2.43 2.93 2.70
Candida dubliniensis 2.48 9.31 1.78 2.36
Chrysoporthe austroafricana 177.08 27.48
Chrysoporthe deuterocubensis 42.15
Clonostachys rosea 137.64
Colletotrichum graminicola 925.76
Cryptococcus gattii 2.94 2.45
Cryptococcus neoformans 3.74 3.75
Debaryomyces hansenii 3.69 7.92 8.80 4.07
Dekkera bruxellensis 73.67 68.59
Diaporthe longicolla 18760.22 125.00 1177.48 93.52 109.31
Didymella pinodes 178.10
Epidermophyton floccosum 216.10 169.64
Eremothecium gossypii 5.51 3.24
Eremothecium sinecaudum 5.48
Exophiala dermatitidis 191.68
Fusarium circinatum 587.48 703.22 174.09
Fusarium graminearum 58.87 438.45
Fusarium mangiferae 171.20 352.62
Gibberella moniliformis 92.73 97.55
Kazachstania naganishii 5.28 3.87
Kluyveromyces lactis 3.41
Kluyveromyces marxianus 3.71 3.06
Kuraishia capsulata 4.41 4.99 4.42
Laccaria bicolor 4.83 9.98 3.31 4.87 2.48 7.43 4.03 9.27
Lachancea thermotolerans 3.28 3.40
Meyerozyma guilliermondii 160.86
Moniliophthora perniciosa 48.06
Mycosphaerella graminicola 226.75
Nectria cinnabarina 80.58
Neurospora crassa 260.59
Penicillium chrysogenum 89.75 42.56 6.87 45.95 288.28 147.93 15.96 6.87
Pestalotiopsis fici 286.75
Pithomyces chartarum 144.63
Ricasolia amplissima 60.54
Saccharomyces cerevisiae 6.81
Scedosporium apiospermum 14.31
Schizosaccharomyces pombe 2.20
Sordaria macrospora 4.85 6.41 10.35 2.89 6.32 22.04
Stemphylium lycopersici 74.19
Sugiyamaella lignohabitans 4.37 7.41 1.80
Talaromyces marneffei 147.97 761.92
Tetrapisispora blattae 6.77 1.94
Tetrapisispora phaffii 68.48 45.12 21.25 20.25 23.67 26.32 58.36
Thermothelomyces thermophila 1.30 1.46 1.48 1.06 0.46 1.63 1.29 1.94
Thielavia terrestris 1.98 1.37 1.92 1.49 1.23 2.20 1.00 0.90
Torulaspora delbrueckii 4.00
Tremella fuciformis 142.18
Trichoderma asperellum 166.60
Trichophyton rubrum 247.54
Usnea ceratina 85.94 89.13
Wickerhamomyces ciferrii 539341.10 286941.40 45174.40 174945.06 33799.87 218599.53 50508.57 51128.68
Yarrowia lipolytica 2.16 1.37 1.44 2.17 1.61 1.49
Zymoseptoria tritici 1.42 1.43
Table S12. Normalized percent relative abundance of detected fungi at the species level.
Species Week 0 Week 1 Week 2 Week 3 Week 4 Week 5 Week 6 Week 7 Frequency
Candida glabrata 0.0008 0.0109 0.0053 37.5
Agaricus bisporus 0.0252 0.0014 25
Amorphotheca resinae 0.0014 0.0101 25
Aspergillus fumigatus 0.0596 0.0953 25
Aspergillus nidulans 0.0551 12.5
Aspergillus oryzae 0.0002 0.0082 0.0038 0.0051 0.0086 0.0008 0.0020 0.0126 87.5
Bipolaris cookei 0.0196 12.5
Botrytis cinerea 0.0009 0.0009 0.0064 0.0014 0.0085 0.0012 75
Candida dubliniensis 0.0008 0.0269 0.0035 0.0023 50
Chrysoporthe austroafricana 0.0575 0.0153 25
Chrysoporthe deuterocubensis 0.0200 12.5
Clonostachys rosea 0.0447 12.5
Colletotrichum graminicola 0.5165 12.5
Cryptococcus gattii 0.0013 0.0049 25
Cryptococcus neoformans 0.0012 0.0021 25
Debaryomyces hansenii 0.0007 0.0026 0.0049 0.0118 50
Dekkera bruxellensis 0.0239 0.0383 25
Diaporthe longicolla 6.0952 0.2727 0.6569 0.2703 0.0499 62.5
Didymella pinodes 0.0329 12.5
Epidermophyton floccosum 0.4715 0.0946 25
Eremothecium gossypii 0.0159 0.0015 25
Eremothecium sinecaudum 0.0158 12.5
Exophiala dermatitidis 0.0354 12.5
Fusarium circinatum 0.1909 0.3923 0.0794 37.5
Fusarium graminearum 0.0191 0.2446 25
Fusarium mangiferae 0.0955 0.3405 25
Gibberella moniliformis 0.0171 0.0544 25
Kazachstania naganishii 0.0029 0.0112 25
Kluyveromyces lactis 0.0016 12.5
Kluyveromyces marxianus 0.0012 0.0017 25
Kuraishia capsulata 0.0008 0.0016 0.0128 37.5
Laccaria bicolor 0.0009 0.0032 0.0072 0.0027 0.0072 0.0034 0.0080 0.0179 100
Lachancea thermotolerans 0.0095 0.0015 25
Meyerozyma guilliermondii 0.4650 12.5
Moniliophthora perniciosa 0.0268 12.5
Mycosphaerella graminicola 0.0419 12.5
Nectria cinnabarina 0.0262 12.5
Neurospora crassa 0.0847 12.5
Penicillium chrysogenum 0.0166 0.0138 0.0150 0.0256 0.8333 0.0675 0.0315 0.0265 100
Pestalotiopsis fici 0.0530 12.5
Pithomyces chartarum 0.0267 12.5
Ricasolia amplissima 0.0112 12.5
Saccharomyces cerevisiae 0.0022 12.5
Scedosporium apiospermum 0.0065 12.5
Schizosaccharomyces pombe 0.0021 12.5
Sordaria macrospora 0.0009 0.0021 0.0226 0.0084 0.0125 0.0426 75
Stemphylium lycopersici 0.0241 12.5
Sugiyamaella lignohabitans 0.0014 0.0041 0.0052 37.5
Talaromyces marneffei 0.0825 0.7358 25
Tetrapisispora blattae 0.0013 0.0011 25
Tetrapisispora phaffii 0.0222 0.0984 0.0119 0.0585 0.0108 0.0520 0.0564 87.5
Thermothelomyces thermophila 0.0002 0.0005 0.0032 0.0006 0.0013 0.0007 0.0026 0.0056 75
Thielavia terrestris 0.0004 0.0004 0.0042 0.0008 0.0035 0.0010 0.0020 0.0026 75
Torulaspora delbrueckii 0.0079 12.5
Tremella fuciformis 0.0263 12.5
Trichoderma asperellum 0.4816 12.5
Trichophyton rubrum 0.5400 12.5
Usnea ceratina 0.0279 0.0407 12.5
Wickerhamomyces ciferrii 99.7309 93.2272 98.5550 97.5976 97.6982 99.7037 99.8600 98.7550 100
Yarrowia lipolytica 0.0004 0.0004 0.0008 0.0063 0.0007 0.0029 50
Zymoseptoria tritici 0.0003 0.0041 12.5
Table S13. Average normalized relative abundance of detected fungi.
Phylum Class Order Family Genus Species Average Abundance
Ascomycota 103.8% Dothideomycetes 0.15% Capnodiales 0.05% Mycosphaerellaceae 0.05% Zymoseptoria 0.05% Bipolaris cookei 0.02%
Didymella pinodes 0.03% Mycosphaerella
Pleosporales 0.09% Astrosphaeriellaceae 0.04% Pithomyces 0.04% graminicola 0.04% Pithomyces
Didymellaceae 0.03% Didymella 0.03% chartarum 0.03% Stemphylium
Pleosporaceae 0.03% Bipolaris 0.02% lycopersici 0.02%
Stemphylium 0.00% Zymoseptoria tritici 0.002% Aspergillus Eurotiomycetes 0.99% Onygenales 0.08% Arthrodermataceae 0.08% Trichophyton 0.08% fumigatus 0.08%
Chaetothyriales 0.06% Herpotrichiellaceae 0.06% Exophiala 0.06% Aspergillus nidulans 0.06%
Eurotiales 0.86% Aspergillaceae 0.45% Aspergillus 0.32% Aspergillus oryzae 0.01% Epidermophyton
Aspergillaceae floccosum 0.28% Exophiala
dermatitidis 0.04% Penicillium
Penicillium 0.13% chrysogenum 0.13% Talaromyces
Trichocomaceae 0.41% Talaromyces 0.41% marneffei 0.41%
Onygenales 0.54% Arthrodermataceae 0.54% Epidermophyton 0.54% Trichophyton rubrum 0.54% Lecanoromycetes 0.55% Lecanorales 0.01% Parmeliaceae 0.01% Usnea 0.01% Ricasolia amplissima 0.01%
Peltigerales 0.03% Lobariaceae 0.03% Ricasolia 0.03% Usnea ceratina 0.03% Amorphotheca Leotiomycetes 0.01% Helotiales 0.01% Sclerotiniaceae 0.01% Botrytis 0.01% resinae 0.01% Leotiomycetes incertae
sedis 0.00% Myxotrichaceae 0.00% Amorphotheca 0.00% Botrytis cinerea 0.003% Saccharomycetes 98.76% Saccharomycetales 98.76% Debaryomycetaceae 0.02% Candida 0.01% Candida dubliniensis 0.01%
Debaryomyces 0.01% Candida glabrata 0.01% Debaryomyces
Meyerozyma 0.00% hansenii 0.01%
Dipodascaceae 0.03% Yarrowia 0.03% Dekkera bruxellensis 0.03% Eremothecium
Phaffomycetaceae 0.01% Wickerhamomyces 0.01% gossypii 0.01% Eremothecium
Pichiaceae 0.02% Brettanomyces 0.02% sinecaudum 0.02% Kazachstania
Saccharomycetaceae 0.54% Eremothecium 0.01% naganishii 0.01% Kluyveromyces
marxianus 0.001%
Kazachstania 0.01% Kluyveromyces lactis 0.002%
Kuraishia capsulata 0.01% Lachancea
Kluyveromyces 0.01% thermotolerans 0.01% Meyerozyma
Lachancea 0.46% guilliermondii 0.47% Saccharomyces
Nakaseomyces 0.00% cerevisiae 0.002% Sugiyamaella
Saccharomyces 0.00% lignohabitans 0.004% Tetrapisispora
Tetrapisispora 0.05% blattae 0.001% Tetrapisispora
phaffii 0.04% Torulaspora
Torulaspora 0.01% delbrueckii 0.01% Wickerhamomyces
Sacc. Incertae sedis 98.14% Kuraishia 98.14% ciferrii 98.14%
Trichomonascaceae 0.00% Sugiyamaella 0.00% Yarrowia lipolytica 0.002% Schizosaccharomyces Schizosaccharomycetes 0.00% Schizosaccharomycetales 0.00% Schizosaccharomycetaceae 0.00% Schizosaccharomyces 0.00% pombe 0.002% Chrysoporthe Sordariomycetes 3.37% Diaporthales 0.10% Cryphonectriaceae 0.06% Chrysoporthe 0.06% austroafricana 0.04% Chrysoporthe
deuterocubensis 0.02%
Diaporthaceae 0.04% Diaporthe 0.04% Clonostachys rosea 0.05% Colletotrichum
Glomerellales 0.52% Glomerellaceae 0.52% Colletotrichum 0.52% graminicola 0.52%
Hypocreales 2.19% Bionectriaceae 1.47% Clonostachys 1.47% Diaporthe longicolla 1.47%
Hypocreaceae 0.22% Trichoderma 0.22% Fusarium circinatum 0.22% Fusarium
Nectriaceae 0.50% Fusarium 0.41% graminearum 0.13%
Fusarium mangiferae 0.22%
Gibberella
moniliformis 0.04%
Nectria cinnabarina 0.03%
Nectria 0.08% Neurospora crassa 0.09%
Microascales 0.05% Microascaceae 0.05% Scedosporium 0.05% Pestalotiopsis fici 0.05% Scedosporium
Sordariales 0.03% Chaetomiaceae 0.02% Thermothelomyces 0.01% apiospermum 0.01%
Thielavia 0.01% Sordaria macrospora 0.02% Thermothelomyces
Sordariaceae 0.00% Neurospora 0.00% thermophila 0.002%
Sordaria 0.00% Thielavia terrestris 0.002% Trichoderma
Xylariales 0.48% Sporocadaceae 0.48% Pestalotiopsis 0.48% asperellum 0.48% Cryptococcus Basidiomycota 0.08% Tremellomycetes 0.03% Agaricales 0.03% Agaricaceae 0.00% Agaricus 0.00% neoformans 0.002%
Marasmiaceae 0.00% Moniliophthora 0.00% Cryptococcus gattii 0.003%
Tricholomataceae 0.03% Laccaria 0.03% Tremella fuciformis 0.03% Agaricomycetes 0.05% Tremellales 0.05% Cryptococcaceae 0.02% Cryptococcus 0.02% Agaricus bisporus 0.01%
Laccaria bicolor 0.02% Moniliophthora
Tremellaceae 0.03% Tremella 0.03% perniciosa 0.03%