Bioinformatics Approaches for Metagenomics Data

Bioinformatics Approaches for Metagenomics Data Analysis
Adi Doron-Faigenboim
Plant Sciences, Vegetable and Field Crops, ARO, The Volcani Center, Rishon LeZion 7528809, Israel

Metagenomics
• "Metagenomics is the study of the collective genomes of all microorganisms from an environmental sample"
◦ Community
◦ Environmental
◦ Ecological

DNA sequencing & microbial profiling
Traditional microbiology relies on the isolation and culture of bacteria:
• A cumbersome and labour-intensive process
• Fails to account for the diversity of microbial life
• The great plate-count anomaly (Staley, J. T., and A. Konopka. 1985. Measurements of in situ activities of nonphotosynthetic microorganisms in aquatic and terrestrial habitats. Annu. Rev. Microbiol. 39:321-346)

Why environmental sequencing?
• An estimated 1000 trillion tons of bacterial/archaeal life on Earth
• Only a small proportion of organisms have been grown in culture
• Species do not live in isolation
• Clonal cultures fail to represent the natural environment of a given organism
• Many proteins and protein functions remain undiscovered
Example sample types: rhizobiome, pollutants, non-human microbiomes, human microbiome sites.

The revolution in sequencing technologies
High-throughput technologies (e.g., Illumina HiSeq and MiSeq) promote the accumulation of enormous volumes of genomic and metagenomic data.
(Hodkinson, B. P., and E. A. Grice. Next-Generation Sequencing: A Review of Technologies and Tools for Wound Microbiome Research. Adv Wound Care (New Rochelle). 2015)

Experimental approaches
• Community composition
◦ Microbiome (16S rRNA gene, 18S, ITS, etc.)
• Community composition and functional potential
◦ Metagenomics
• Functional genetic response
◦ Metatranscriptomics

16S vs. shotgun metagenomics
• 16S: targeted sequencing of a single gene
◦ Marker for identification
◦ Well established
◦ Cheap
◦ Amplifies only what you want
• Shotgun sequencing: sequence all the DNA
◦ No primer bias
◦ Can identify all microbes
◦ Provides functional information

16S rRNA sequencing
• 16S rRNA forms part of the bacterial ribosome.
• It contains regions of highly conserved and highly variable sequence.
• The variable sequence acts as a molecular "fingerprint" that can be used to identify bacterial genera and species.
• Large public databases are available for comparison, e.g., the Ribosomal Database Project (RDP), which currently contains >1.5 million rRNA sequences.
• Conserved regions can be targeted to amplify a broad range of bacteria from environmental samples.
• Not quantitative, because the 16S gene copy number varies between organisms.
(Erlandsen S L et al. J Histochem Cytochem 2005;53:917-927)

16S rRNA gene sequencing
Pros
◦ Well established
◦ Sequencing costs are relatively low (~50,000 reads/sample)
◦ Only amplifies what you want (no host contamination)
Cons
◦ Primer choice can bias results towards certain organisms
◦ Usually not enough resolution to identify organisms to the strain level
◦ Different primers are usually needed for archaea & eukaryotes (18S)
◦ Cannot identify viruses
◦ No direct functional profiling

Binning sequences into OTUs
• Operational Taxonomic Unit (OTU): an arbitrary definition of a taxonomic unit based on sequence divergence
• Composition-based binning
◦ GC content
◦ Di-/tri-/tetra-/...nucleotide composition (k-mer frequency comparison)
◦ Codon usage statistics
• Similarity-based binning
◦ Direct comparison of OTU sequences to a reference database
◦ The identity cut-off depends on the resolution required: species 97%, genus 90%, family 80%
[Figure: an OTU present 50:50 in sample 1 and sample 2; MEGAN BLAST against the NCBI database; clustering of OTUs based on sequence similarity]

Software for binning
• Composition-based binning
◦ TETRA (maximal-order Markov model)
◦ PhyloPythia (support vector machine)
◦ Seeded Growing Self-Organising Maps (S-GSOM)
◦ TETRA + codon usage
• Similarity-based binning
◦ Requires that most sequences in a sample are present in a primary or secondary reference database
◦ QIIME
◦ MEGAN (comparison against BLAST NCBI NR)
◦ Mothur (RDP)
◦ CARMA (comparison against Pfam)
◦ ARB (linked with the SILVA database)

[Figure: sequence databases]

Measuring diversity of OTUs
Two primary measures for sequence-based studies:
• Alpha diversity
◦ What is there? How much is there?
◦ Diversity within a sample
• Beta diversity
◦ How similar are two samples?
◦ Diversity between samples

Alpha diversity – human microbiome
[Figure: C Huttenhower et al. Nature 486, 207-214 (2012) doi:10.1038/nature11234]

Alpha diversity
• Species count in the sample
◦ What is a species?
◦ OTUs
◦ Misses the level of evolutionary diversity
• Phylogenetic diversity (PD)
◦ Sum of the branch lengths covered by a sample
◦ Misses the distribution (abundance) of the species

Alpha diversity
• Simpson's diversity index (also the Shannon and Chao indexes)
◦ Gives less weight to the rarest species
D = \sum_{i=1}^{S} \frac{n_i (n_i - 1)}{N (N - 1)}
where S is the number of species, N is the total number of organisms, and n_i is the number of organisms of species i. D is the probability that two randomly drawn organisms belong to the same species; the diversity is commonly reported as 1 - D.
(Whittaker, R.H. (1972). "Evolution and measurement of species diversity". Taxon (International Association for Plant Taxonomy (IAPT)) 21 (2/3): 213–251)

Beta diversity – human microbiome
[Figure: C Huttenhower et al. Nature 486, 207-214 (2012) doi:10.1038/nature11234]

Beta diversity
• Diversity between samples
• UniFrac distance
◦ Phylogeny-based beta diversity
◦ Percentage of the observed branch length that is unique to either sample
(Lozupone and Knight, 2005. UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol 71:8228)

Other useful data representations
• Simple bar charts: what species are present?
• Rarefaction curves: how much of a community have we sampled?
[Figure: number of OTUs vs. number of sequences. Adapted from Wooley et al. A Primer on Metagenomics, PLoS Computational Biology, Feb 2010, Vol 6(2)]

Shotgun whole metagenome
• Unlike 16S, metagenomic sequencing is not targeted to a specific gene; instead it takes an unbiased sample of the entire genomic DNA.
• Shorter sequence reads are typically used to obtain >5 Gb of data per sample.
• HiSeq or NextSeq platforms are typically more cost-effective for metagenomic sequencing.

Shotgun metagenomics
Pros
◦ No primer bias
◦ Can identify all microbes (e.g., eukaryotes, viruses)
◦ Direct functional profiling
Cons
◦ More expensive (millions of sequences needed)
◦ Host/site contamination can be significant
◦ May not be able to sequence "rare" microbes
◦ Required computational resources can be restrictive
◦ More complex bioinformatic analyses required
◦ Chimeras, genes of unknown function

Sequence coverage
[Figure: complexity, diversity & coverage. From Rodriguez-R, L. M., and K. T. Konstantinidis. Estimating coverage in metagenomic data sets and why it matters. ISME J. 2014]

Metagenomics assembly
[Figure: from Metagenomic Assembly: Overview, Challenges and Applications. Yale J Biol Med. 2016 Sep; 89(3): 353–362]

Metagenomics assembly
• Greedy assembler:
◦ Reads with maximum overlaps are iteratively merged into contigs
• Overlap-Layout-Consensus:
◦ A graph is constructed by finding overlaps between all pairs of reads
• de Bruijn graph:
◦ Reads are chopped into short overlapping segments (k-mers)
◦ k-mers are organized in a de Bruijn graph based on their co-occurrence across reads
◦ The graph is simplified to remove artifacts due to sequencing errors
◦ Branch-less paths are reported as contigs

de Bruijn graph approach
• Low-abundance genomes may end up fragmented if the overall sequencing depth is insufficient to form connections in the graph
◦ Using a short k-mer size
• The assembler must strike a balance between recovering low-abundance genomes and obtaining long, accurate contigs for high-abundance genomes
• Computational time and memory may be insufficient to complete such assemblies
◦ Multiple k-mer approach
◦ Spread the memory load over a computer cluster

Metagenome assembly tools
[Figure: from Vollmers, J., S. Wiegand, and A.-K. Kaster. Comparing and Evaluating Metagenome Assembly Tools from a Microbiologist's Perspective - Not Only Size Matters!]

What we do with the assembly
• Characterizing the contigs/scaffolds
◦ Mapping statistics
◦ Composition (%GC, codon usage)
◦ Annotation: taxonomy & function assignments
• Binning
• Comparative genomics
• Metabolic pathways

Binning by read mapping
• Partition the metagenome into species using
◦ Read coverage (multiple samples)
◦ Composition

             Coverage    Coverage    Coverage
             sample 1    sample 2    sample 3    GC%
scaffold1    27          7           60          34
scaffold2    29          6           61          33
scaffold3    5           21          20          51
scaffold4    7           20          22          50

(Metagenomic Assembly: Overview, Challenges and Applications. Yale J Biol Med. 2016 Sep; 89(3): 353–362)
[Figure: line plot of coverage in sample 1, sample 2 and sample 3 and GC% for scaffold1–scaffold4]

Binning contigs
• Completely automated approaches
◦ CONCOCT
◦ GroopM
◦ MetaBAT
• Completeness of metagenome-assembled genomes (MAGs)
◦ Single-copy core genes (tRNA synthetases, ribosomal proteins)

Gene annotation
• Finding bacterial genes in the contigs/scaffolds
◦ Prodigal
◦ Prokka
• Annotation of the genes
◦ By homology searches (DIAMOND)
◦ Domain finding
• Comparisons
◦ Gene families
◦ Distribution among the samples (CD-HIT)
Functional potential: the annotations suggest the functional potential of the community, not its actual biological activity (the genes may not be transcribed and translated).

Common functional databases
• NCBI
• COG
◦ Well known, but based on the original classification (from 2003)
• Pfam
◦ Focused more on protein domains, based on hidden Markov models
• KEGG
◦ Very popular; each entry is well annotated and often linked into "Modules" or "Pathways"
◦ Full access now requires a license fee
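Similarity-based binning groups sequences into OTUs at an identity cut-off (e.g., 97% for species). The sketch below shows greedy centroid clustering. It makes two assumptions. First, reads are already aligned to the same length. Second, identity is simply the fraction of matching positions. Tools such as QIIME or mothur use proper alignments. The function names are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def greedy_otu_clustering(seqs: list[str], threshold: float = 0.97) -> list[list[int]]:
    """Assign each sequence to the first OTU centroid it matches at >= threshold identity.

    A sequence that matches no existing centroid starts a new OTU and becomes its centroid.
    Returns the OTUs as lists of indices into `seqs`.
    """
    centroids: list[int] = []
    otus: list[list[int]] = []
    for i, seq in enumerate(seqs):
        for otu_idx, c in enumerate(centroids):
            if identity(seq, seqs[c]) >= threshold:
                otus[otu_idx].append(i)
                break
        else:  # no centroid was close enough
            centroids.append(i)
            otus.append([i])
    return otus


base = "ACGT" * 25                      # a 100-bp toy read
reads = [base, "TT" + base[2:], "T" * 10 + base[10:]]
print(greedy_otu_clustering(reads))     # [[0, 1], [2]]
```

The second read differs from the first at 2 of 100 positions (98% identity), so it joins the first OTU. The third read differs at 8 positions (92% identity), so it starts a new OTU. Because the method is greedy, the result depends on the input order. Real pipelines therefore often sort the reads first, for example by abundance.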
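Composition-based binning compares sequences by their GC content and their di-, tri- or tetranucleotide (k-mer) frequencies. The sketch below computes both features and uses a plain Euclidean distance between k-mer profiles. The function names and the choice of distance are illustrative assumptions. Tools such as TETRA use more elaborate statistics. This sketch also does not merge reverse-complement k-mers.

```python
from itertools import product


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def kmer_profile(seq: str, k: int = 4) -> dict[str, float]:
    """Relative frequency of every k-mer over ACGT, counted in overlapping windows.

    Windows containing other characters (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


def profile_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Euclidean distance between two k-mer profiles (smaller = more similar composition)."""
    return sum((p[key] - q[key]) ** 2 for key in p) ** 0.5


print(round(gc_content("GGCCAT"), 1))  # 66.7
```

Contigs from the same genome tend to share a similar GC content and k-mer profile. A binner can therefore group contigs whose profiles lie close together.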
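The alpha-diversity slides list species richness, Simpson's index (with S, N and n_i as defined there) and the Shannon index. The sketch below computes all three from a list of per-species counts. It assumes the Shannon index uses natural logarithms.

```python
import math


def richness(counts: list[int]) -> int:
    """Observed number of species/OTUs (entries with a non-zero count)."""
    return sum(1 for n in counts if n > 0)


def simpson_index(counts: list[int]) -> float:
    """Simpson's index D = sum n_i(n_i - 1) / (N(N - 1)); diversity is usually 1 - D."""
    N = sum(counts)
    if N < 2:
        raise ValueError("need at least two organisms")
    return sum(n * (n - 1) for n in counts) / (N * (N - 1))


def shannon_index(counts: list[int]) -> float:
    """Shannon index H = -sum p_i ln p_i over species with a non-zero count."""
    N = sum(counts)
    return -sum((n / N) * math.log(n / N) for n in counts if n > 0)


even, uneven = [25, 25, 25, 25], [85, 5, 5, 5]
print(round(1 - simpson_index(even), 3), round(1 - simpson_index(uneven), 3))  # 0.758 0.273
```

Both communities have the same richness (4). The Simpson diversity is much lower for the uneven community because one dominant species accounts for most of the organisms.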
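Unweighted UniFrac is the fraction of observed branch length that is unique to one of the two samples. The sketch below computes it on a toy tree. The tree format (node mapped to parent and branch length) and the four-leaf example are hypothetical. Production analyses normally use an established implementation, such as the one in QIIME.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical tree format: node -> (parent, length of the branch above the node).
# The root has parent None, and its branch length is ignored.
Tree = Dict[str, Tuple[Optional[str], float]]

TREE: Tree = {
    "root": (None, 0.0),
    "X": ("root", 1.0), "Y": ("root", 1.0),
    "A": ("X", 1.0), "B": ("X", 1.0), "C": ("Y", 1.0), "D": ("Y", 1.0),
}


def leaves_below(tree: Tree) -> Dict[str, Set[str]]:
    """Map every node to the set of leaf names in its subtree."""
    parents = {p for p, _ in tree.values() if p is not None}
    below: Dict[str, Set[str]] = {n: set() for n in tree}
    for leaf in (n for n in tree if n not in parents):
        node: Optional[str] = leaf
        while node is not None:  # walk up to the root, tagging every ancestor
            below[node].add(leaf)
            node = tree[node][0]
    return below


def unweighted_unifrac(tree: Tree, sample_a: Set[str], sample_b: Set[str]) -> float:
    """Branch length unique to one sample / branch length observed in either sample."""
    below = leaves_below(tree)
    unique = observed = 0.0
    for node, (parent, length) in tree.items():
        if parent is None:  # the root has no branch above it
            continue
        in_a = bool(below[node] & sample_a)
        in_b = bool(below[node] & sample_b)
        if in_a or in_b:
            observed += length
            if in_a != in_b:
                unique += length
    return unique / observed if observed else 0.0


print(unweighted_unifrac(TREE, {"A", "B"}, {"A", "C"}))  # 0.6
```

In the example, branches X and A are shared by both samples. B, Y and C are each seen in only one sample, and D is absent from both. That gives 3 unique branch-length units out of 5 observed.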
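A rarefaction curve plots the number of OTUs observed against the number of sequences sampled. The sketch below builds one by subsampling reads without replacement and averaging richness over a few random draws. Each read is represented by its OTU label. The function name, the number of iterations and the toy community are assumptions for illustration.

```python
import random


def rarefaction_curve(otu_labels: list[str], depths: list[int],
                      iterations: int = 10, seed: int = 0) -> list[tuple[int, float]]:
    """Mean number of distinct OTUs seen when subsampling `depth` reads without replacement."""
    rng = random.Random(seed)  # fixed seed so the curve is reproducible
    curve = []
    for depth in depths:
        if depth > len(otu_labels):
            raise ValueError("depth exceeds the number of sequences")
        richness = [len(set(rng.sample(otu_labels, depth))) for _ in range(iterations)]
        curve.append((depth, sum(richness) / len(richness)))
    return curve


reads = ["otu1"] * 50 + ["otu2"] * 30 + ["otu3"] * 15 + ["otu4"] * 5
print(rarefaction_curve(reads, [1, 100]))  # [(1, 1.0), (100, 4.0)]
```

If the curve flattens at higher depths, most of the community has been sampled. If it is still rising, more sequencing would likely reveal additional OTUs.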
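The shotgun slides note that runs yield >5 Gb per sample, that host contamination can be significant, and that rare microbes may be missed. A back-of-the-envelope calculation shows why. The sketch below makes three assumptions: reads are spread uniformly, `relative_abundance` is a genome's share of the non-host DNA, and the Lander-Waterman approximation gives the fraction of a genome covered.

```python
import math


def expected_coverage(total_bases: float, genome_size: float,
                      relative_abundance: float, host_fraction: float = 0.0) -> float:
    """Mean sequencing depth for one community member, assuming uniformly spread reads."""
    microbial_bases = total_bases * (1.0 - host_fraction)
    return microbial_bases * relative_abundance / genome_size


def fraction_covered(coverage: float) -> float:
    """Lander-Waterman estimate of the fraction of a genome hit by at least one read."""
    return 1.0 - math.exp(-coverage)


# 5 Gb of data, a 5 Mb genome at 1% abundance vs. 0.01% abundance
print(round(expected_coverage(5e9, 5e6, 0.01), 1))                       # 10.0
print(round(fraction_covered(expected_coverage(5e9, 5e6, 0.0001)), 3))   # 0.095
```

At 1% abundance the genome is covered about 10x, which is usually enough to assemble it. At 0.01% abundance the coverage drops to about 0.1x, so only about 10% of the genome is seen at all.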
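The de Bruijn approach chops reads into k-mers, links overlapping (k-1)-mers and reports branch-less paths as contigs. The sketch below builds the graph and spells the maximal non-branching paths. It leaves out the graph-simplification step that real assemblers use to remove error artifacts, and it ignores isolated cycles. The function names are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads: list[str], k: int) -> dict[str, list[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer adds one edge prefix -> suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph: dict[str, list[str]] = defaultdict(list)
    for kmer in sorted(kmers):  # sorted for deterministic output
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def contigs(graph: dict[str, list[str]]) -> list[str]:
    """Spell maximal non-branching paths (isolated cycles are ignored in this sketch)."""
    indeg: dict[str, int] = defaultdict(int)
    for dsts in graph.values():
        for dst in dsts:
            indeg[dst] += 1
    nodes = set(graph) | set(indeg)

    def one_in_one_out(node: str) -> bool:
        return indeg[node] == 1 and len(graph.get(node, [])) == 1

    result = []
    for node in sorted(nodes):
        if one_in_one_out(node):
            continue  # interior node: it is reached from the start of its path
        for nxt in graph.get(node, []):
            path = node + nxt[-1]
            while one_in_one_out(nxt):  # extend while the path does not branch
                nxt = graph[nxt][0]
                path += nxt[-1]
            result.append(path)
    return result


print(contigs(build_de_bruijn(["ACGTTGCA"], 4)))       # ['ACGTTGCA']
print(contigs(build_de_bruijn(["ATGCA", "GTGCT"], 3)))  # ['ATG', 'GCA', 'GCT', 'GTG', 'TGC']
```

In the second example the two reads share "TGC". This shared region creates branches in the graph and breaks the assembly into short contigs, much as regions shared between genomes fragment a metagenome assembly.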
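The binning table groups scaffolds whose read coverage rises and falls together across samples and whose GC% is similar. The sketch below applies that idea to the four scaffolds in the table. It links two scaffolds when their coverage profiles have a Pearson correlation ≥ 0.95 and their GC% differs by ≤ 5 points. It then merges linked scaffolds into bins. Both thresholds are assumptions. Automated binners such as CONCOCT or MetaBAT use more samples, k-mer composition and statistical models.

```python
import math

# Coverage of each scaffold in samples 1-3 and its GC%, taken from the table above.
SCAFFOLDS = {
    "scaffold1": ([27, 7, 60], 34),
    "scaffold2": ([29, 6, 61], 33),
    "scaffold3": ([5, 21, 20], 51),
    "scaffold4": ([7, 20, 22], 50),
}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation (assumes neither profile is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def bin_scaffolds(data, min_corr: float = 0.95, max_gc_diff: float = 5.0) -> list[set[str]]:
    """Link co-varying scaffolds with similar GC%, then merge links transitively (union-find)."""
    names = list(data)
    parent = {n: n for n in names}

    def find(n: str) -> str:
        while parent[n] != n:
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (cov_a, gc_a), (cov_b, gc_b) = data[a], data[b]
            if pearson(cov_a, cov_b) >= min_corr and abs(gc_a - gc_b) <= max_gc_diff:
                parent[find(a)] = find(b)

    bins: dict[str, set[str]] = {}
    for n in names:
        bins.setdefault(find(n), set()).add(n)
    return list(bins.values())


print(sorted(sorted(b) for b in bin_scaffolds(SCAFFOLDS)))
# [['scaffold1', 'scaffold2'], ['scaffold3', 'scaffold4']]
```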
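The completeness of a MAG is estimated from single-copy core genes such as ribosomal proteins and tRNA synthetases. The sketch below uses simplified definitions:
• completeness: the share of expected markers found at least once
• contamination: the extra copies of those markers, relative to the size of the marker set

The marker set and function name are hypothetical. Tools such as CheckM use lineage-specific marker sets rather than one flat list.

```python
from collections import Counter


def completeness_contamination(marker_set: set[str], found_markers: list[str]) -> tuple[float, float]:
    """Return (completeness %, contamination %) for one bin.

    completeness  = 100 * (markers found at least once) / (markers expected)
    contamination = 100 * sum(copies - 1 for each found marker) / (markers expected)
    """
    counts = Counter(m for m in found_markers if m in marker_set)
    completeness = 100.0 * len(counts) / len(marker_set)
    extra_copies = sum(c - 1 for c in counts.values())
    contamination = 100.0 * extra_copies / len(marker_set)
    return completeness, contamination


# Hypothetical marker set: ribosomal-protein and tRNA-synthetase gene names
MARKERS = {"rplB", "rpsC", "alaS", "leuS"}
found = ["rplB", "rpsC", "rpsC", "alaS", "unrelated_gene"]
print(completeness_contamination(MARKERS, found))  # (75.0, 25.0)
```

A duplicated single-copy gene (here rpsC) suggests that contigs from a second genome were placed in the bin.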