Genomic and Metabolic Gene Characterization of Bacterial Communities from the Neuse River Estuarine System Using Long Read Metagenomics
GENOMIC AND METABOLIC GENE CHARACTERIZATION OF BACTERIAL COMMUNITIES FROM THE NEUSE RIVER ESTUARINE SYSTEM USING LONG READ METAGENOMICS

Laura Elizabeth Fisch

A thesis submitted to the faculty at the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Master of Science in the Department of Marine Sciences.

Chapel Hill
2019

Approved By:
Scott Gifford
Alecia Septer
Adrian Marchetti
Chris Osborn

© 2019 Laura Elizabeth Fisch
ALL RIGHTS RESERVED

ABSTRACT

Laura Elizabeth Fisch: GENOMIC AND METABOLIC GENE CHARACTERIZATION OF BACTERIAL COMMUNITIES FROM THE NEUSE RIVER ESTUARINE SYSTEM USING LONG READ METAGENOMICS
(Under the direction of Scott Gifford)

The productivity of estuaries is linked to the microbes in the estuary and their significant role in carbon cycling. The greatest challenge in studying these microbial communities is finding a method to capture their complexity. Metagenomics is the recovery and sequencing of DNA from an environmental sample, and it can be applied to studying the potential of a community to drive carbon cycling by identifying the carbon metabolism genes encoded in the community’s DNA. Long read sequencing technology produces DNA sequence fragments (reads) of ~1,000 to 50,000 base pairs, which are long enough to contain multiple, complete functional genes. This study aims to understand the taxonomic identity and carbon metabolic pathways of estuarine microbial communities by applying long read sequencing technology from Oxford Nanopore Technologies to water samples collected from the Neuse River Estuary on March 8 and October 4, 2018.

ACKNOWLEDGMENTS

Thank you to Scott Gifford for advising me through this graduate degree. Under your guidance, I feel as though I have grown immensely as a scientist, presenter, and writer. I am very grateful for the many hours you have put into my education.
I want to especially thank you for the time spent training me to write scientifically through creating this thesis, which has greatly improved the quality of my writing. I also want to thank you for imparting your impressive presentation skills to me, and for developing my ability to see the big picture in all of the work that we do.

I also want to thank my committee members, Alecia Septer, Adrian Marchetti, and Chris Osborn. Each of you has, through your own expertise, contributed to this project. I found committee meetings with you all to be quite enjoyable, as we always had interesting conversations about microbiology, organic carbon, and metagenomics. Thank you all for contributing your ideas to this project and to my education. I want to mention Chris especially, as some of the papers you had me read for my comprehensive exams and the Daniel Thornton paper made their way into this thesis because of the questions you asked me to answer. Through my classes in the Marine Science program, I have been well educated on marine microbial ecology so that I could contribute to this project.

I want to thank the National Science Foundation for funding my graduate education through the Graduate Research Fellowship Program. By funding my graduate education, NSF made it possible to devote resources to a sequencing project, which is something I’d always wanted to work on.

A huge thanks to Hans Paerl’s lab for allowing us to come on the ModMon and Pamlico Sound cruises for sample collection. In particular, thanks to Jeremy Brady and Betsy Abare for taking us out to the estuary and for helping us out with our sampling.

Another huge thanks to Acacia Zhao for helping me with many of the bioinformatics tools, from installation to usage to understanding these tools. I really appreciate your help, as it made this project possible for a biologist without a computer science background.
Lastly, I’d like to thank Lauren Speare for all the great conversations about research and science. I found your tips and tricks for developing topic sentences to be very helpful. Thanks a bunch for sharing your writing experience and knowledge with me.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
INTRODUCTION
METHODS
    Sample Collection
    DNA extraction
    DNA sequencing with Oxford Nanopore Technologies
    Sequence processing and Quality control
    Read annotation
RESULTS AND DISCUSSION
    Internal Standards
    Sequencing Statistics
    Sequence Quality
    Community Composition
    Functional Composition
    Glucose metabolism: glycolysis and the pentose phosphate pathway
    Carbohydrate metabolism: isomerase enzymes
    Aromatic metabolism: the homogentisate aromatic degradation pathway
    Aromatic metabolism: the beta-ketoadipate pathway
CONCLUSIONS
APPENDIX: CARBON METABOLIC ANNOTATIONS
WORKS CITED

LIST OF TABLES

Table 1: Sequencing statistics
Table 2: The top 10 most abundant phyla and the relative abundances of the total number of reads assigned to them out of the total number of reads analyzed in the sample
Table 3: The top 10 most abundant phyla and the relative abundances of the total number of reads assigned to them out of the total number of reads analyzed in the sample
Table 4: Top 20 most abundant families and the number of reads assigned to them
Table 5: Broad COG categories and the number of reads with a COG assigned to them. Numbers represent the number of reads with a COG assignment that falls into each category
Table 6: COG Metabolism categories and the number of gene annotations in each category. Numbers represent the number of reads with a COG assignment that falls into each category
Table 1A: COG annotations of glycolysis enzymes

LIST OF FIGURES

Figure 1: 2018 sampling sites
Figure 2: Bioinformatics workflow
Figure 3: Example figure of how MEGAN6’s LCA algorithm assigns reads to a taxonomic identification from the list of best hits returned by DIAMOND blastx
Figure 4: Tree depicting the evolutionary relationships of the different taxonomic classifications of NR180 reads within the Proteobacteria
Figure 5: Tree depicting the evolutionary relationships of the different taxonomic classifications of PS9 reads within the Proteobacteria
Figure 6: 19 reads from NR180 that include the functional annotation uronic isomerase and the surrounding COG annotations on the read
Figure 7: 19