Abstract Metagenomic Analysis of Deep-Branching
Total Page:16
File Type:pdf, Size:1020Kb
ABSTRACT METAGENOMIC ANALYSIS OF DEEP-BRANCHING FIRMICUTES AND NEW REPRESENTATIVES OF KNOWN GENERA FROM THE CALUMET WETLANDS, A HIGHLY ALKALINE ENVIRONMENT Jay Osvatic, MS Department of Biological Sciences Northern Illinois University, 2017 Wesley Swingley, Director The Calumet Wetlands, located in southern Chicago, Illinois, USA, is a historical steel waste site that was created through the dumping of steel mill slag waste, which was used until the middle of the 20th century to turn the wetlands into buildable land. The weathering of the steel slag has led to a highly alkaline (up to pH 13.4), non-saline environment due to the release of Ca(OH)2 into the groundwater flow. While there are many chemical similarities to natural alkaline systems, such as serpentinizing sites, Calumet is distinct in that its extreme pH is a result of anthropogenic processes. Preliminary16S rDNA sequence analysis of these sites revealed that the microbial community was low in microbial diversity, and a significant portion of the community, particularly at the most alkaline sites, contained divergent sequences possibly representing some novel bacterial clades. The alkalinity of this site makes cultivation of these bacteria difficult and current ex situ culturing attempts have not yielded novel clades. As an alternative to the cultivation of these divergent taxa, we performed metagenomic sequencing and taxonomic binning to examine the genomes through computational analyses. Microbial DNA was sequenced from three samples taken from a single sediment core and assembled using a de novo metagenomic sequence assembler (SPAdes). Taxonomic bins were created using tetramer frequency and coverage values to separate contigs (Anvi’o). Nine robust genomes were identified within the Calumet metagenome dataset and were taxonomically placed using protein and 16S rRNA gene analyses and annotated using online tools (KEGG and RAST). Of the nine highest quality bins, four belonged to known genera and two to the novel Firmicutes from previous genetic surveys. Several of these genomes showed uncommon metabolic pathways, including nitrogen oxidation and hydrogen utilization that all for energy production in the harsh environmental conditions. Analysis also revealed very few proteins that are typically implicated in high pH adaptation and tolerance. NORTHERN ILLINIOS UNIVERSITY DEKALB, ILLINOIS MAY 2017 METAGENOMIC ANALYSIS OF DEEP-BRANCHING FIRMICUTES AND NEW REPRESENTATIVES OF KNOWN GENERA FROM THE CALUMET WETLANDS, A HIGHLY ALKALINE ENVIRONMENT BY JAY T. OSVATIC A THESIS SUBMITTTED TO THE GRADUATE SCHOOL IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE MASTER OF SCIENCE DEPARTMENT OF BIOLOGICAL SCIENCES Thesis Director: Wesley Swingley ii ACKNOWLEDGEMENTS I would like to thank my advisor, Wesley Swingley, for his help and guidance throughout this project. His encouragement has led me to become a better scientist and grow personally throughout my time at NIU. Thank you to my committee members, Dr. Yanbin Yin and Dr. Melissa Lenczewski for assisting in and reviewing my project. I would like to thank Ingemar Ohlsson and Eric Becraft for their contributions, both in collecting data and helping me understand the intricacies of bioinformatics. Thank you to all of MO Deep South for being friends and supportive fellow scientists, who when needed provided plenty of escapes from school life. A final thank you to my parents that have continuously encouraged my pursuit of sciences and higher education. iii TABLE OF CONTENTS PAGE LIST OF TABLES............................................................................................................ vi LIST OF FIGURES.................................................................................................................... viii INTRODUCTION................................................................................................ 1 Extreme Environments and Their Importance............................................. 1 Alkaline Environments.................................................................................... 2 Microbial Survival in Alkaline Systems....................................................... 5 Calumet Site Description and Geochemistry............................................... 7 Current Study. ................................................................................................. 10 METHODS.................................................................................................................... 13 Site Description................................................................................................ 13 Sampling Scheme............................................................................................ 14 DNA Extraction and Sequencing.................................................................. 15 Analysis of 16S rRNA Gene Sequences...................................................... 16 Metagenomic Assembly and Binning.......................................................... 16 iv 16S rRNA Gene Identification and Taxonomic Placement...................... 17 Protein and Nucleotide Based Taxonomic Placement............................... 17 Genome Quality Control................................................................................ 18 Determination of Metabolic Potential and Genome Features................... 18 Alkaline-related Adaptive Characteristics................................................... 19 Protein Based Comparison to Known Genomes......................................... 20 Putative Genome Phylogeny.......................................................................... 20 RESULTS...................................................................................................................... 25 Analysis of 16S rRNA Gene Sequences...................................................... 25 Naming Conventions for Genomes.............................................................. 28 Metagenomic Assembly and Binning.......................................................... 28 Protein and Nucleotide Based Taxonomic Placement............................... 30 Genome Quality Control................................................................................ 36 Alkaline-related Adaptive Characteristics and Serpentinization-Associated Metabolisms..................................................................................................... 38 Relative Abundance of Genomes.................................................................. 42 16S rRNA Gene and Protein Phylogeny...................................................... 43 Protein Based Comparison to Known Genomes......................................... 49 v Genome Determination of Metabolic Potential and Genome Features... 54 Alkaline Protein pI Trends............................................................................. 66 DISCUSSION............................................................................................................... 68 Bacterial Community...................................................................................... 68 Genomes........................................................................................................... 70 Alkaline Survival and Characteristics.......................................................... 75 Conclusion and Future Derection................................................................. 77 REFERENCES............................................................................................................. 78 vi LIST OF TABLES Table Page 1. AmphoraNet Genes Excluded from Gene Trees...................................................... 21 2. Gene Set Used for Firmicutes Gene Tree.................................................................. 23 3. The Relative Abundance of Amplicons Labeled to the Most Abundant Four Phyla by RDP....................................................................................................... 25 4. Genome Naming Conventions.................................................................................... 28 5. General Combined Metagenome Statistics............................................................... 29 6. General Post-Refinement Genome Statistics............................................................ 30 7. Taxonomic Placement of Genes Based on MEGAN5............................................. 31 8. Taxonomic Analysis of Conserved Single Copy Marker Genes from Binned Genomes......................................................................................................................... 34 9. rRNA Feature Taxonomy and Removal.................................................................... 37 10. Completion Analysis of Binned Genomes................................................................ 38 11. Presence of Kegg Annotated Genes Within Binned Genomes.............................. 39 12. Na+/H+ Associated Genomes Features.................................................................... 42 13. Relative Abundance of Genome Related 16S rRNA Genes.................................. 43 14. Approximate Relative Abundance of Genomes, Measured Through Anvi’o_Read_Incorporation....................................................................................... 43 15. General RAST Protein Counts for CVC_MG_Fluv1 and Fluviicola_taffensis_DSM_16823............................................................................. 54 vii 16. RAST Subsystem Category Distribution for CVC_MG_Fluv1