Characterization of Two Aerobic Bacteria from Canada Goose Microflora by Whole Genome Sequence Analyses

by Abigail Larkin

A THESIS

submitted to

Oregon State University

Honors College

in partial fulfillment of the requirements for the degree of

Honors Baccalaureate of Science in Biology (Honors Associate)

Presented July 31, 2020 Commencement June 2021

AN ABSTRACT OF THE THESIS OF

Abigail Larkin for the degree of Honors Baccalaureate of Science in Biology presented on July 31, 2020. Title: Characterization of Two Aerobic Bacteria from Canada Goose Microflora by Whole Genome Sequence Analyses.

Abstract approved:______Patrick Ball

The Canada goose (Branta canadensis) monogastric gut is proposed to contain aerobic spore-forming bacteria that produce non-toxin substances that may promote anti-inflammatory immune responses. It is hypothesized that the aerobic bacteria found in samples of Canada goose microflora could have probiotic effects in other avian species or be used for commercial or ecosystems analyses. To investigate this, two Gram-variable, spore-forming, rod-shaped, chloroform-tolerant aerobic bacterial strains, A4 and A15, were isolated from fecal samples of resident Canada geese. Analysis of 16S rRNA gene sequences determined that both novel strains were 94% similar to the rRNA genes of other species and that each had its own unique 16S rRNA sequence. Both A4 and A15 were found to contain urease genes, GerA spore germination protein genes, and a fibronectin-binding protein gene. The digital DNA–DNA hybridization (dDDH) values of both strains to similar genomes and between each other were <28%, below the 70% cut-off for species definition. In addition, the average nucleotide identity (ANI) and average amino acid identity (AAI) values were <82% for A4 and <88% for A15, also below the cut-off values for species definition. Phenotypic characteristics were tested, including assays of 177 different carbon sources and of pH effects on growth. It was concluded that A4 and A15 represent two novel species of the genus Sporosarcina, Sporosarcina cascadiensis (A4) and Sporosarcina obsidiansis (A15).

Key Words: Microbiome, Spore-forming bacteria, Probiotic, Average Nucleotide Identity

Corresponding e-mail address: [email protected]

©Copyright by Abigail Larkin July 31, 2020

Honors Baccalaureate of Science in Biology project of Abigail Larkin presented on July 31, 2020.

APPROVED:

______Patrick Ball, Mentor, representing the Department of Biology

______Bruce Seal, Committee Member, representing the Department of Biology

______Kristina Smith, Committee Member, representing the Department of Biology

______Toni Doolen, Dean, Oregon State University Honors College

I understand that my project will become part of the permanent collection of Oregon State University, Honors College. My signature below authorizes release of my project to any reader upon request.

______Abigail Larkin, Author

Acknowledgements

Appreciation is extended to Patrick Ball, Ph.D. for taking the time to mentor me throughout this research project and providing feedback throughout the process. Appreciation is also extended to Bruce Seal for his co-mentorship on the project and for taking his time to guide me through this process. In addition, I would like to thank Kristina Smith, Ph.D. for being on the committee and for providing feedback to all of us who participated in the project. I want to thank Brittany Martinez for partnering with me during the research process and for providing funding for the research from the URSA Scholarship. Funding was also provided by my mentor Patrick Ball, Ph.D. through a start-up grant from the OSU Faculty Innovation Committee and the Gaskins Fund. I am grateful for the collaborative effort of all involved and the opportunity I had to work with a group of individuals who helped foster my interest in microbial research.

Introduction

It was proposed that the Canada goose’s (Branta canadensis) monogastric gut contains aerobic bacteria that produce non-toxin substances that may promote anti-inflammatory immune responses in other avian species. The Canada goose is one of the most common waterfowl species in North America and has a gut flora that contains diverse communities of bacteria [1]. As in many domestic and free-ranging bird species, the Canada goose’s gut flora plays an important role in host nutrition and protection from pathogens [2]. The purpose of this study was to genotypically and phenotypically characterize two bacterial strains isolated from Canada goose feces. In particular, the study focused on strains of bacteria that exhibit sporulation, a mechanism that enables spore-forming bacteria to survive in diverse environments [3]. Two novel bacterial strains were isolated from Canada goose feces and selected for their potential to stimulate anti-inflammatory responses in avian species. The novel bacterial species, identified using whole genome sequencing, belong to the genus Sporosarcina and exhibited unique 16S rRNA sequences with only 94% similarity to previously identified Sporosarcina species.

Sporosarcina characterization and diversity

The genus Sporosarcina, which belongs to the family Bacillaceae, was created by Kluyver and van Niel to accommodate bacteria that have spherical or oval-shaped cells, low DNA G+C content (40–42 mol%), and MK-7 as the major menaquinone [4]. Currently Sporosarcina includes 17 species (https://www.bacterio.net/genus/sporosarcina) isolated from a range of environments. Some species, such as S. ureae, utilize the enzyme urease to break down urea [5]. Additionally, some species, such as S. newyorkensis, form endospores. S. newyorkensis is a Gram-positive, endospore-forming rod that originated from veterinary clinical specimens in New York State, USA and from raw milk in Flanders, Belgium [6].

Methods of Isolation and Identification

Fresh feces were aseptically collected from resident Canada goose populations in Bend, OR, USA (44.0582° N, 121.3153° W) using sterile plastic bags. Fecal material was stored in an ultra-cold freezer (-80 °C) until processed for selection of bacterial spores. Goose fecal material was thawed and suspended in phosphate-buffered saline (PBS) in organic solvent-resistant polypropylene 15 or 50 ml conical centrifuge tubes, followed by vortex mixing for five minutes [7]. Subsequently, low-speed centrifugation was conducted at 1,000 x g for five minutes to eliminate solids. Following centrifugation, chloroform was added to a concentration of approximately 3 percent (e.g., 0.3 ml chloroform per 10 ml of fecal suspension or 1.5 ml chloroform per 50 ml of fecal suspension), and the tubes were placed on a laboratory shaker for 30 minutes to eliminate vegetative bacterial cells and select for bacterial spores [8, 9]. Aliquots of spore suspensions (150 µl) were cultured aerobically for two days on brucella agar with blood and vitamin K-hemin (BBHK) or on reinforced clostridial agar with L-cysteine, Na acetate, and starch, without polymyxin B. This procedure resulted in fifteen aerobic bacterial isolates [10].
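The chloroform volumes quoted above follow from simple proportional arithmetic. The sketch below is illustrative only; the function name and the 3 percent default are assumptions made for this example, and the fraction is taken relative to the suspension volume, as in the worked volumes in the text.

```python
def chloroform_volume_ml(suspension_ml: float, fraction: float = 0.03) -> float:
    """Return the chloroform volume (ml) for a given fecal suspension volume.

    The fraction is relative to the suspension volume (not the final volume),
    matching the text's example of 0.3 ml chloroform per 10 ml suspension.
    """
    if suspension_ml <= 0:
        raise ValueError("suspension volume must be positive")
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    return suspension_ml * fraction


if __name__ == "__main__":
    # The two tube sizes mentioned in the methods.
    for volume in (10, 50):
        print(f"{volume} ml suspension: {chloroform_volume_ml(volume):.1f} ml chloroform")
```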

Two of the reported isolates were propagated on BHI media for preservation. Gram stains performed on both isolates initially indicated they were Gram negative (Fig. 1). DNA was extracted from bacterial colonies using either the QuickExtract™ Bacterial DNA Extraction Kit or the MO BIO UltraClean® Microbial DNA Isolation Kit. Genomic DNA was used to amplify 16S rRNA genes using the primers 27F (5'-AGAGTTTGATCMTGGCTCAG-3') and 1492R (5'-GGTTACCTTGTTACGACTT-3') [11]. After amplification and analysis of the 16S rRNA genes, two Gram-variable bacilli, designated A4 and A15, were selected because they had the most distinctive 16S rRNA sequences, which were only 94% similar to those of previously identified Sporosarcina spp.
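The 27F primer contains the IUPAC ambiguity code M (A or C), so it represents a small pool of concrete oligonucleotides. As a minimal sketch, assuming only the standard IUPAC nucleotide codes, the following expands a degenerate primer into its variants and computes the reverse complement of the 1492R primer, which is the sequence expected on the sense strand of the amplicon. The helper names are hypothetical and are not part of any tool used in this study.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expand_primer(primer: str) -> list[str]:
    """List every unambiguous sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


PRIMER_27F = "AGAGTTTGATCMTGGCTCAG"
PRIMER_1492R = "GGTTACCTTGTTACGACTT"

if __name__ == "__main__":
    for variant in expand_primer(PRIMER_27F):
        print("27F variant:", variant)
    print("1492R reverse complement:", reverse_complement(PRIMER_1492R))
```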

Fig. 1 Gram staining of isolates A4 and A15, showing both strains as Gram negative to Gram variable.

Genome Features

Genomic DNA was sent to the Bioinformatics Center at the Oregon State University Corvallis campus for whole genome sequencing (WGS) using Illumina’s MiSeq platform with 150 bp paired-end sequencing chemistry. Raw fastq files were used for de novo genome assembly with the St. Petersburg genome assembler (SPAdes) v 3.10.1 [12]. Assembled scaffolds of the strain A4 and A15 genome sequences were submitted to the National Center for Biotechnology Information (NCBI) for Basic Local Alignment Search Tool (BLAST) analyses using the Microbial genomes BLAST analysis tools [13]. Phylogenetic analyses of genomic data demonstrated that the isolates were closely related to the genus Sporosarcina, but only 79% related to currently classified Sporosarcina spp. in the database, suggesting the bacteria may be previously unclassified species (Fig. 2). Three genes of interest were identified by the BLAST analysis: a gene encoding a Bacillus/Clostridium-related GerA spore germination protein, a SpoVR-like protein gene related to the Bacillus subtilis stage V sporulation protein R gene (spoVR), which is involved in spore cortex formation, and a gene encoding a urea oxidase enzyme (Fig. 3). Spores may recognize small molecules through receptor proteins encoded by the gerA family of operons, which includes gerA, gerB, and gerK [14, 15]. Dormant spores germinate in the presence of nutrients known as germinants, which are recognized by the GerA spore germination protein. A gene common to the genus Sporosarcina encodes the urease accessory protein UreG [4, 5]. The urease accessory protein is part of the pre-activation complex of urease, a nickel-binding enzyme that catalyzes the hydrolysis of urea to form ammonia and carbamate [16]. In addition to these genes, a fibronectin-binding protein A gene (FbpA) was identified in both bacterial genomes. The FbpA protein binds to fibronectin in the extracellular matrix; fibronectin is also a substrate for the attachment of bacteria to eukaryotic cells [17, 18]. Another gene of interest identified in the conserved domain analysis of both isolate genomes encodes urea oxidase. Oxidases are a group of enzymes that catalyze oxidation-reduction reactions using dioxygen as the electron acceptor, leading to the formation of water (H2O) or hydrogen peroxide (H2O2) as a by-product [19].

Fig. 2 Phylogenetic analysis of the 16S small subunit ribosomal RNA gene sequences of isolates A4 and A15, which are denoted as the query sequences in the figure.

Fig. 3 Example of Conserved Domain Analyses and Protein Classification by BLAST analysis.

The Mash/MinHash algorithm was used to search for genomes similar to A4 and A15. Mash is software that estimates the distance between genomes using the MinHash algorithm and is used for rapid species identification [20]. The algorithm identified Sporosarcina sp. P13 as the nearest genome to strain A4 (distance 21%) and S. newyorkensis as the nearest genome to A15 (distance 17%). The phyloFlash pipeline, which rapidly reconstructs SSU rRNAs and explores the phylogenetic composition of genomic or transcriptomic datasets [21], also related A4 and A15 to an unknown species of the genus Sporosarcina.
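To illustrate the idea behind the Mash distance, the following is a simplified sketch and not the Mash implementation itself. It estimates the Jaccard index j of two k-mer sets from bottom-s MinHash sketches and converts it to a distance with the formula given by Ondov et al. [20], D = -(1/k) ln(2j / (1 + j)). The use of SHA-1 hashing, forward-strand k-mers only, and the small default parameters are assumptions made for brevity; Mash itself uses canonical k-mers and a different hash function.

```python
import hashlib
import math


def kmer_hashes(seq: str, k: int) -> set[int]:
    """Hash every k-mer of the forward strand to a 64-bit integer."""
    seq = seq.upper()
    return {
        int.from_bytes(hashlib.sha1(seq[i:i + k].encode()).digest()[:8], "big")
        for i in range(len(seq) - k + 1)
    }


def bottom_sketch(hashes: set[int], size: int) -> list[int]:
    """Keep the `size` smallest hash values (a bottom-s MinHash sketch)."""
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a: list[int], sketch_b: list[int], size: int) -> float:
    """Estimate the Jaccard index from two bottom-s sketches."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(jaccard: float, k: int) -> float:
    """Convert a Jaccard estimate to a Mash distance, clamped to [0, 1]."""
    if jaccard <= 0:
        return 1.0
    return min(1.0, max(0.0, -math.log(2 * jaccard / (1 + jaccard)) / k))


def sequence_distance(seq_a: str, seq_b: str, k: int = 5, size: int = 50) -> float:
    """Distance between two sequences; k and size are toy-sized defaults."""
    sketch_a = bottom_sketch(kmer_hashes(seq_a, k), size)
    sketch_b = bottom_sketch(kmer_hashes(seq_b, k), size)
    return mash_distance(jaccard_estimate(sketch_a, sketch_b, size), k)
```

Under this definition, identical sequences give a distance of 0 and sequences sharing no k-mers give a distance of 1.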

Assembled genomes were annotated using the IMG/MER Pipeline. IMG/MER contains annotated DNA and RNA sequence data from microbial genome and microbiome datasets that can be used for more comprehensive genome analysis [22]. The contigs, contiguous sequences assembled from overlapping reads, of the A4 strain had a total length of 3,718,715 nucleotides, with 85.95% coding DNA bases and a GC content of 44.02%. The contigs of the A15 strain had a total length of 3,642,744 nucleotides, with 86.07% coding DNA bases and a GC content of 40.97%. Of interest, the low DNA G+C content of both strains is comparable to the 40–42 mol% reported for other species in the genus Sporosarcina [4].
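Summary statistics such as total assembly length and GC content can be computed directly from the assembled contigs. The following is a minimal sketch, assuming the contigs are available as FASTA-formatted text; it is not the IMG/MER implementation, and the function names are hypothetical.

```python
def read_fasta(text: str) -> list[str]:
    """Parse FASTA-formatted text into a list of contig sequences."""
    contigs, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line closes the previous record, if any.
            if current:
                contigs.append("".join(current))
                current = []
        else:
            current.append(line)
    if current:
        contigs.append("".join(current))
    return contigs


def assembly_stats(contigs: list[str]) -> tuple[int, float]:
    """Return (total length in nucleotides, GC content in percent)."""
    total = sum(len(c) for c in contigs)
    gc = sum(c.upper().count("G") + c.upper().count("C") for c in contigs)
    return total, (100.0 * gc / total if total else 0.0)


if __name__ == "__main__":
    # Toy input for illustration only, not data from this study.
    example = ">contig_1\nGGCC\n>contig_2\nAT\nAT\n"
    length, gc_percent = assembly_stats(read_fasta(example))
    print(f"total length: {length} nt, GC content: {gc_percent:.2f}%")
```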

After annotating both genomes, average nucleotide identity (ANI) analyses compared A4 and A15 with 22 genomes of Sporosarcina spp. and three other closely related bacterial genomes. ANI is a similarity index between a given pair of genomes that is applicable to prokaryotic organisms independently of their G+C content. An ANI of >95% indicates that two organisms may belong to the same species [23]. The maximum ANI for both strains was with S. newyorkensis: 78.6% for A4 and 81% for A15. Both maximum ANI scores fall well under the 95% cutoff, providing evidence that A4 and A15 may indeed be novel organisms [24].

Further evidence that A4 and A15 are novel organisms was provided by digital DNA-DNA hybridization against 18 closely related genomes [25]. DNA-DNA hybridization (DDH) is a technique in which DNA is heated until the double-stranded DNA denatures to form single strands. Similarity between DNA samples is measured by the re-formation of complementary sequences; digital DDH (dDDH) estimates the same quantity in silico from genome sequences. The hybridization of A4 and A15 was analyzed using the Genome BLAST Distance Phylogeny approach (GBDP), which provided distances and dDDH values. The dDDH values indicated that A4 and A15 were most closely related to S. newyorkensis, with 23.5% for A4 and 27.4% for A15. These results fall well below the 70% dDDH cutoff at which two genomes are considered the same species, thereby supporting that A4 and A15 may indeed be novel organisms (Fig. 4).
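The species-boundary reasoning applied to the ANI and dDDH results can be written out as a simple threshold check. The sketch below uses the values reported in this section together with the 95% ANI cutoff and the 70% dDDH cutoff stated in the abstract; the function and variable names are hypothetical and serve only to make the comparison explicit.

```python
ANI_CUTOFF = 95.0   # percent; same-species threshold for ANI
DDDH_CUTOFF = 70.0  # percent; same-species threshold for dDDH

# Maximum values against the closest genome (S. newyorkensis), as reported above.
REPORTED = {
    "A4": {"ani": 78.6, "dddh": 23.5},
    "A15": {"ani": 81.0, "dddh": 27.4},
}


def below_species_cutoffs(ani: float, dddh: float) -> bool:
    """True when both indices fall below their same-species thresholds."""
    return ani < ANI_CUTOFF and dddh < DDDH_CUTOFF


if __name__ == "__main__":
    for strain, values in REPORTED.items():
        verdict = "below" if below_species_cutoffs(**values) else "not below"
        print(f"{strain}: ANI {values['ani']}%, dDDH {values['dddh']}% -> {verdict} both cutoffs")
```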

Fig. 4 Minimum evolution tree from Genome BLAST Distance Phylogeny (GBDP) distances calculated from genome sequences of strains A4, A15 and closest homologous genomes.

Analysis of the average amino-acid identity (AAI) of the genomes was performed using protein fasta files with the AAI-profiler. The AAI-profiler summarizes proteome-wide sequence search results to identify novel species, assess the need for taxonomic reclassification, and detect multi-isolate and contaminated samples. AAI-profiler visualizes results using a scatterplot that shows the AAI from the query proteome to all similar species in the sequence database [26]. In agreement with the ANI and dDDH analyses, the AAI indicated that A4 and A15 have the closest similarity to S. newyorkensis, with an AAI of 85.2% for A4 and 87.1% for A15.

Clusters of Orthologous Groups of proteins (COG), a database providing a phylogenetic classification of the proteins encoded in complete genomes of bacteria, archaea, and eukaryotes, is used for comparative analysis [27]. To further compare A4 and A15 with other genomes, principal coordinate clustering based on COG profiles [27] was performed. It indicated that the nearest genomes to A4 and A15 were again those of S. newyorkensis (Fig. 6), consistent with the ANI, dDDH, and AAI results that support A4 and A15 as novel organisms.

Fig. 6 Principal Coordinates Analysis (PCoA) plot of COG profiles of Sporosarcina genomes based on Bray-Curtis distance matrix. Compositional dissimilarities among genomes were measured using Bray-Curtis dissimilarity coefficient of COG abundances.
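The Bray-Curtis dissimilarity between two COG abundance profiles is the sum of absolute differences in abundance divided by the sum of all abundances. The following is a minimal sketch of that calculation for a pair of genomes; the COG identifiers and counts in the example are invented for illustration and are not data from this study.

```python
def bray_curtis(profile_a: dict[str, float], profile_b: dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    categories = set(profile_a) | set(profile_b)
    difference = sum(abs(profile_a.get(c, 0) - profile_b.get(c, 0)) for c in categories)
    total = sum(profile_a.get(c, 0) + profile_b.get(c, 0) for c in categories)
    return difference / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical COG counts for two genomes.
    genome_x = {"COG0001": 1, "COG0002": 3}
    genome_y = {"COG0001": 3, "COG0002": 1}
    print(f"Bray-Curtis dissimilarity: {bray_curtis(genome_x, genome_y):.2f}")
```

A full distance matrix for PCoA would apply this function to every pair of genomes.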

Phenotypic Methods and Results

Gram stains were completed using basic bacteriological procedures [28]. Additionally, the isolates were grown on media containing streptomycin, erythromycin, chloramphenicol, penicillin, and tetracycline [28] with no resultant antibiotic inhibition. Additional biochemical features were determined with Phenotype MicroArray plates (Biolog plates PM1, 2, 9 and 10) [29]. All the materials and reagents used were purchased from Biolog (Hayward, CA, USA). Carbon source utilization and pH effects were determined for both the A4 and A15 strains. Carbon sources were tested by incubating Phenotype MicroArray (PM) plates PM1 and PM2, and it was determined that out of 177 carbon sources, A4 was able to use 12 and A15 was able to use 38. PM10 was used to test pH effects, and it was determined that strain A4 could grow in the pH range 8-10 and A15 in the pH range 6-10. These results can be useful in determining the species of the isolates, and they can also indicate the type of environment in which the isolates would propagate. Both isolates used carbon sources, providing evidence that the isolates are heterotrophic and rely on carbon sources from their environment. So far, all species belonging to the genus Sporosarcina are classified as heterotrophic [30], which is consistent with A4 and A15 belonging to the genus Sporosarcina. Additionally, pH is useful in determining the type of environment an isolate can inhabit and the pH range in which it may produce spores [31]. Upon further investigation, the pH ranges of A4 and A15 may be useful in determining whether these isolates would be good candidates for promoting an anti-inflammatory immune response in avian species.

Conclusion

In conclusion, the genomic and phenotypic results indicate that A4 and A15 represent two novel species, Sporosarcina cascadiensis and Sporosarcina obsidiansis. Genotypic evidence relating A4 and A15 to the genus Sporosarcina was provided by several methods, including BLAST analysis, Mash/MinHash analysis, the phyloFlash pipeline, ANI analysis, AAI analysis, GBDP, and COG analysis. These methods were successful in relating A4 and A15 to Sporosarcina and, where a species cut-off applies, the values fell below it, indicating that the isolates are indeed novel species. Additionally, the IMG/MER Pipeline provided evidence of low DNA G+C content for both A4 (44.02%) and A15 (40.97%), near the 40–42 mol% reported for the genus, which is another indicator that the isolates belong to the genus Sporosarcina. The BLAST analysis also revealed three genes, a urease gene, a GerA spore germination protein gene, and a fibronectin-binding protein gene, whose products may, upon further study, prove to promote anti-inflammatory immune responses in avian species. Phenotypic analysis indicated that the isolates were Gram variable, utilized environmental carbon sources, and grew in high pH ranges, consistent with other members of the genus Sporosarcina. In addition, the phenotypic characteristics of the isolates may, upon further study, be used to analyze whether the isolates can be used as probiotics in other avian species or used for commercial or ecosystems analyses.

References

1. Lu J, Santo Domingo JW, Hill J, Edge TA. Microbial Diversity and Host-Specific Sequences of Canada Goose Feces. Appl Environ Microbiol 2009;75(18):5919-5926.

2. Waite DW, Taylor MW. Exploring the avian gut microbiota: current trends and future directions. Front Microbiol 2015;6:673.

3. Swick MC, Koehler TM, Driks A. Surviving Between Hosts: Sporulation and Transmission. Microbiol Spectr 2016;4(4):10.

4. Rylo Sona Janarthine S, Eganathan P. Plant Growth Promoting of Endophytic Sporosarcina aquimarina SjAM16103 Isolated from the Pneumatophores of Avicennia marina L. J Life Sci 2012;2012:1-10.

5. Garrity GM. Bergey's Manual of Systematic Bacteriology: The Firmicutes, Volume 3 of Bergey's Manual of Systematic Bacteriology. 2009.

6. Wolfgang WJ, Coorevits A, Cole JA, De Vos P, Dickinson MC et al. Sporosarcina newyorkensis sp. nov. from clinical specimens and raw cow's milk. Int J Syst Evol Microbiol 2012;62(Pt 2):322-329.

7. Wise MG, Siragusa GR. Quantitative analysis of the intestinal bacterial community in one- to three-week-old commercially reared broiler chickens fed conventional or antibiotic-free vegetable-based diets. J Appl Microbiol 2007;102(4):1138-1149.

8. Atarashi K, Tanoue T, Shima T, Imaoka A, Kuwahara T et al. Induction of colonic regulatory T cells by indigenous Clostridium species. Science 2011;331(6015):337-341.

9. Itoh K, Mitsuoka T. Production of gnotobiotic mice with normal physiological functions. I. Selection of useful bacteria from feces of conventional mice. Z Versuchstierkd 1980;22(3):173-178.

10. Keillor HR, Svendsen MK, Ball PN, Seal BS. Isolation of potential novel endospore-containing bacteria from Canada goose feces. Proceedings of the National Conference on Undergraduate Research (NCUR) 2017; http://www.ncurproceedings.org/

11. Stackebrandt E, Frederiksen W, Garrity GM, Grimont PA, Kämpfer P et al. Report of the ad hoc committee for the re-evaluation of the species definition in bacteriology. Int J Syst Evol Microbiol 2002;52(Pt 3):1043-1047.

12. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 2012;19(5):455-477.

13. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997;25(17):3389-3402.

14. Paidhungat M, Setlow P. Role of ger proteins in nutrient and nonnutrient triggering of spore germination in Bacillus subtilis. J Bacteriol 2000;182:2513-2519.

15. Galperin MY, Mekhedov SL, Puigbo P, Smirnov S, Wolf YI, Rigden DJ. Genomic determinants of sporulation in Bacilli and Clostridia: towards the minimal set of sporulation-specific genes. Environ Microbiol 2012;14(11):2870-90.

16. Fong YH, Wong HC, Yuen MH, Lau PH, Chen YW, Wong KB. Structure of UreG/UreF/UreH complex reveals how urease accessory proteins facilitate maturation of Helicobacter pylori urease. PLoS Biol 2013;11.

17. Huveneers S, Truong H, Fässler R, Sonnenberg A, Danen EH. Binding of soluble fibronectin to integrin alpha5 beta1 - link to focal adhesion redistribution and contractile shape. J Cell Sci 2008;121(Pt 15):2452–62.

18. Expression of fibronectin-binding protein FbpA modulates adhesion in Streptococcus gordonii. Microbiology 2002;148(Pt 6):1615-1625.

19. Prashant P, Sharma A, Gautam K. 11 - Microbial degradation of xenobiotics like aromatic pollutants from the terrestrial environments. Pharmaceuticals and Personal Care Products: Waste Management and Treatment Technology 2019; 259-278.

20. Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol 2016;17(1):132.

21. Gruber-Vodicka HR, Seah BKB, Pruesse E. phyloFlash – Rapid SSU rRNA profiling and targeted assembly from metagenomes. bioRxiv 2019.

22. Chen IA, Chu K, Palaniappan K, Pillay M, Ratner A et al. IMG/M v.5.0: an integrated data management and comparative analysis system for microbial genomes and microbiomes. Nucleic Acids Res 2019;47(D1):D666-D677.

23. Figueras MJ, Beaz-Hidalgo R, Hossain MJ, Liles MR. Taxonomic affiliation of new genomes should be verified using average nucleotide identity and multilocus phylogenetic analysis. Genome Announc 2014;2(6):927-14.

24. Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun 2018;9(1):5114.

25. Meier-Kolthoff JP, Göker M. TYGS is an automated high-throughput platform for state-of-the-art genome-based taxonomy. Nat Commun 2019;10(1):2182.

26. Medlar AJ, Törönen P, Holm L. AAI-profiler: fast proteome-wide exploratory analysis reveals taxonomic identity, misclassification and contamination. Nucleic Acids Res 2018;46(W1):W479-W485.

27. Tatusov RL, Galperin MY, Natale DA, Koonin EV. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 2000;28(1):33-36.

28. Carroll KC, Pfaller MA, Landry ML et al. Manual of Clinical Microbiology, Twelfth edition. 2019.

29. Bochner BR, Gadzinski P, Panomitros E. Phenotype microarrays for high-throughput phenotypic testing and assay of gene function. Genome Res 2001;11(7):1246-1255.

30. Garrity GM. Bergey's Manual of Systematic Bacteriology: The Firmicutes, Volume 3 of Bergey's Manual of Systematic Bacteriology. 2009.

31. Lowe SE, Pankratz HS, Zeikus JG. Influence of pH extremes on sporulation and ultrastructure of Sarcina ventriculi. J Bacteriol. 1989 Jul;171(7):3775-81. doi: 10.1128/jb.171.7.3775-3781.1989. PMID: 2738022; PMCID: PMC210124.