Identification of Environmental Alphaproteobacteria with Conserved Signature Proteins in Metagenomic Datasets
M. Sc. Thesis—Quan Yao McMaster—Biology

IDENTIFICATION OF ENVIRONMENTAL ALPHAPROTEOBACTERIA WITH CONSERVED SIGNATURE PROTEINS IN METAGENOMIC DATASETS

BY QUAN YAO, B.Sc.

A Thesis Submitted to the School of Graduate Studies in Partial Fulfillment of the Requirements for the Degree Master of Science

McMaster University
© Copyright by Quan Yao, Dec 2013

MASTER OF SCIENCE (2013) McMaster University
(Biology) Hamilton, Ontario

TITLE: Identification of Environmental Alphaproteobacteria with Conserved Signature Proteins in Metagenomic Datasets
AUTHOR: Quan Yao, B.Sc. (Ocean University of China)
SUPERVISOR: Professor H.E. Schellhorn
NUMBER OF PAGES: ix, 94

Abstract

Microbial metagenomics is the exploration of the taxonomic diversity of microbial communities in environmental habitats using large, exhaustive DNA sequence datasets. However, due to inherent limitations of sequencing technology and the complexity of environmental genomes, current analytical approaches do not reveal all of the microbes that may be present. In this study, a new classification approach is proposed that uses proteins specific to different clades of Alphaproteobacteria to predict the presence or absence of species from these groups in published metagenomic datasets. In this work, 264 previously identified, published conserved signature proteins (CSPs) characteristic of individual taxonomic clades of Alphaproteobacteria are used as probes to detect the presence of these bacteria in metagenomic datasets. Although public genome sequence information has increased manifold since these CSPs were initially identified 6 years ago, results indicate that nearly all of these CSPs (259 of 265) remain specific for their previously characterized clades.
Furthermore, they are confirmed to be present in the newly identified and sequenced members of these clades. In view of their specificity and predictive ability in different monophyletic clades of Alphaproteobacteria, the sequences of these CSPs provide reliable probes to determine the presence or absence of these Alphaproteobacteria in metagenomic datasets. In this work, CSPs are used to survey Alphaproteobacteria diversity in 10 published metagenomic datasets (bioreactor, compost, wastewater, activated sludge, groundwater, freshwater sediment, microbial mat, marine, hydrothermal vent and whale fall metagenomes), which cover diverse environments and ecosystems. The results indicate that BLAST searches with these CSPs can efficiently identify Alphaproteobacteria species in these metagenome datasets, and that the distribution and relative abundance of different Alphaproteobacteria species differ substantially among the tested datasets. Thus the CSPs, which are specific for different microbial taxa, provide a novel and powerful means for identifying microbes and for their taxonomic profiling in metagenomic datasets.

Acknowledgements

First, I must thank my supervisor, Dr. Herb Schellhorn, who offered many valuable suggestions and recommendations during my research, and who generously took us to the Canadian Society of Microbiologists conference in Ottawa, where we had a great opportunity to share our research and communicate with the world's top researchers. The second summer at Dr. Schellhorn's cottage is an unforgettable memory; we enjoyed a fascinating retreat there after a year of hard work. Equally important, I would like to thank my co-supervisor, Dr. Gupta, for his continuous support of my work and the inspiration he ignited in my mind, and my committee chair, Dr.
Igdoura, for his kindness and assistance in my defense. Secondly, I have to thank my lab mates, who accompanied me over the past 2 years both in the lab and off campus. I want to acknowledge Lingzi, Mohammed, Shirley, Sohail, Steve, Rachel, and Pardis. The coffee-break chats on casual and entertaining topics, the cooperative work we accomplished when we hit bottlenecks in research, and the in-depth exchanges of ideas about philosophy, the universe and ourselves all make up an indispensable part of my life and helped establish my values and beliefs. Finally, I must thank my parents for their encouragement throughout my life. Without their guidance and instruction, I could never have achieved the goals I dreamed of. Their love is my forever treasure and gives me the strength to overcome future obstacles in my career.

Table of Contents

Part I. Uniqueness of Alphaproteobacteria specific CSPs
  Chapter 1 Introduction
    1.1 Significance of Alphaproteobacteria
    1.2 Conserved signature proteins as phylogenetic markers
    1.3 Standards for taxonomic hierarchy
  Chapter 2 Materials and methods
    2.1 Confirmation of the uniqueness of CSPs
    2.2 Grouping of CSP into Taxonomic levels
  Chapter 3 Results
    3.1 Confirmation of the uniqueness of CSPs
    3.2 Grouping of CSP into Taxonomic levels
  Chapter 4 Discussion
    4.1 Confirmation of the uniqueness of CSP
    4.2 Grouping of CSP into Taxonomic levels
    4.3 Future experiments
Part II. Identification of Alphaproteobacteria specific CSPs in metagenomic samples
  Chapter 1 Introduction
    1.1 Metagenome, environmental genomes
    1.2 Taxonomic classification of metagenomic reads: methods and challenges
    1.3 Application of metagenomics
    1.4 Project objectives
  Chapter 2 Materials and methods
    2.1 Metagenome selection
    2.2 Identification of CSP in metagenomic samples
    2.3 Comparative analysis of Alphaproteobacteria in metagenomes
  Chapter 3 Results
    3.1 Metagenome selection
    3.2 Identification of CSPs in metagenomic samples
    3.3 Comparative analysis of Alphaproteobacteria in metagenomes
  Chapter 4 Discussion
    4.1 Metagenome selection
    4.2 Identification of CSPs in metagenomic samples
    4.3 Comparative analysis of Alphaproteobacteria in metagenomes
    4.4 Overall conclusions
    4.5 Future directions