Natural Product Reports View Article Online REVIEW View Journal A roadmap for metagenomic enzyme discovery Cite this: DOI: 10.1039/d1np00006c Serina L. Robinson, * Jorn¨ Piel and Shinichi Sunagawa Covering: up to 2021 Metagenomics has yielded massive amounts of sequencing data offering a glimpse into the biosynthetic potential of the uncultivated microbial majority. While genome-resolved information about microbial communities from nearly every environment on earth is now available, the ability to accurately predict biocatalytic functions directly from sequencing data remains challenging. Compared to primary metabolic pathways, enzymes involved in secondary metabolism often catalyze specialized reactions with diverse substrates, making these pathways rich resources for the discovery of new enzymology. To date, functional insights gained from studies on environmental DNA (eDNA) have largely relied on PCR- or activity-based screening of eDNA fragments cloned in fosmid or cosmid libraries. As an alternative, Creative Commons Attribution-NonCommercial 3.0 Unported Licence. shotgun metagenomics holds underexplored potential for the discovery of new enzymes directly from eDNA by avoiding common biases introduced through PCR- or activity-guided functional metagenomics workflows. However, inferring new enzyme functions directly from eDNA is similar to searching for a ‘needle in a haystack’ without direct links between genotype and phenotype. The goal of this review is to provide a roadmap to navigate shotgun metagenomic sequencing data and identify new candidate biosynthetic enzymes. We cover both computational and experimental strategies to mine metagenomes and explore protein sequence space with a spotlight on natural product biosynthesis. Specifically, we compare in silico methods for enzyme discovery including phylogenetics, sequence similarity networks, This article is licensed under a genomic context, 3D structure-based approaches, and machine learning techniques. We also discuss Received 31st January 2021 various experimental strategies to test computational predictions including heterologous expression and DOI: 10.1039/d1np00006c screening. Finally, we provide an outlook for future directions in the field with an emphasis on meta- rsc.li/npr omics, single-cell genomics, cell-free expression systems, and sequence-independent methods. Open Access Article. Published on 12 April 2021. Downloaded 9/29/2021 5:15:20 PM. 1. Introduction 3.4. Gene context and interactions 1.1. The sequence–structure–function paradigm 3.5. 3D-structure based methods 1.2. Metagenomics: promises and perils 3.6. Motifs and active site residues 1.3. Denitions for enzyme discovery 3.7. Machine learning 1.4. Caveats and assumptions 4. Reaching the destination: characterizing new enzymes 2. Setting course: experimental design for metagenomics 4.1. Cloning and heterologous expression studies 4.2. Heterologous expression 2.1. Activity-guided functional metagenomics 4.3. Screening for enzyme activity 2.2. PCR-based functional metagenomics 5. Scenic drives: a case study on marine metagenomics 2.3. Shotgun metagenomic sequencing 5.1. Global ocean microbiomics 2.4. Parallels with natural product research 5.2. Microbiomes of marine invertebrates 2.5. Hotbeds for enzyme discovery 6. Gearing up for the future: new frontiers in enzyme 3. On the road: computational methods for enzyme function discovery prediction 6.1. Meta-omics 3.1. Querying metagenomic databases 6.2. Single-cell genomics 3.2. Phylogenetics 6.3. Microuidics 3.3. Sequence similarity networking 6.4. Cell-free platforms 6.5. Sequence-independent methods 7. Conclusions Eidgenossische¨ Technische Hochschule (ETH), Zurich,¨ Switzerland. E-mail: [email protected] This journal is © The Royal Society of Chemistry 2021 Nat. Prod. Rep. View Article Online Natural Product Reports Review 7.1. Discoveries oen occur at the boundaries of protein phenotypic consequences of a genetic message”. Nearly 5 families decades later, predicting the phenotypic consequences of 7.2. Think outside the colorimetric assay box to move into protein sequences remains a complex task. Signicant progress unexplored protein space has been made on the three-dimensional prediction front, 7.3. Move beyond E. coli into new hosts however. In 2020, the deep learning algorithm AlphaFold2 7.4. (Genome) context is everything achieved landmark results for the prediction of 3D protein 8. Conicts of interest structure from primary sequence. In a rigorous blinded global 9. Acknowledgements competition, AlphaFold2 averaged within 1.6 A˚ of the truth, 10. References achieving an error less than the width of one atom.1 To this news, Frances Arnold, 2018 Nobel laureate in Chemistry, reac- ted with, “Pretty impressive! Perhaps we can now move to the 1. Introduction protein function problem?”. 1.1. The sequence–structure–function paradigm While accurate predictions for the 3D structures of many The 1972 Nobel laureate in Chemistry, Christian Annsen, proteins from primary sequence are now within our grasp, ended his Nobel lecture with the line, “It is certain that major understanding function from protein structure or sequence is advances in the understanding of cellular organization.will far from solved. Even for Escherichia coli, one of the most well- occur when we can predict, in advance, the three-dimensional, characterized organisms on earth, >35% of genes lack experi- mental evidence of function.2 Moreover, the pan-genome, that is, the complete set of genes found among all strains of E. coli is Serina Robinson is an ETH estimated to contain >16 000 different families of homologous Zurich¨ postdoctoral fellow with genes.3 By these estimates, E. coli is still considered to have an Dr Jorn¨ Piel and will start her open pan-genome since the species is undergoing constant independent career as a tenure- 4 Creative Commons Attribution-NonCommercial 3.0 Unported Licence. gene acquisition and diversi cation. Our limited under- track group leader at the Swiss standing of one of the world's most intensively-studied model Federal Institute of Aquatic organisms5 emphasizes the challenge in determining the Science and Technology (Eawag) functions of coding sequences not from organisms grown in in autumn 2021. She obtained monoculture in the laboratory but from metagenomic DNA her PhD in Microbiology and MSc from complex environments. in Bioinformatics and Computa- tional Biology from the University 1.2. Metagenomics: promises and perils of Minnesota, Minneapolis, USA 6 This article is licensed under a (advisor: Larry Wackett), where Metagenomics, a term rst coined in 1998, refers to the study she applied machine learning and genome mining techniques to of environmental DNA (eDNA). This is not only limited to investigate b-lactone synthetases, a newly-discovered family of natural environments in the classical sense, but to essentially enzymes involved in natural product biosynthesis. Her current every sampling location conceivable, including the hindguts of Open Access Article. Published on 12 April 2021. Downloaded 9/29/2021 5:15:20 PM. 7 8 9 research focuses on the discovery of new biosynthetic enzymes from termites, cheese rinds, and the International Space Station. marine, freshwater, and wastewater metagenomes. Enabled by next-generation sequencing technologies, Jorn¨ Piel studied Chemistry at the University of Bonn, Germany, Shinichi Sunagawa studied and obtained a PhD in 1998 Biochemistry and Marine Ecology (advisor: Wilhelm Boland). Aer in Germany, and obtained his a postdoc with Bradley S. Moore PhD in 2010 at the University of and Heinz G. Floss he became California, Merced, USA. Aer group leader at the Max Planck returning to Germany, he joined Institute for Chemical Ecology in the European Molecular Biology Jena, Germany, in 2000. From Laboratory in Heidelberg as 2004–2013 he was associate a postdoctoral fellow, and professor at the University of continued to work on ocean and Bonn and subsequently full human gut microbial communi- professor at the Institute of ties as a research- and staff Microbiology, ETH Zurich.¨ His lab works at the interface of scientist. In 2016, he established Chemistry and Biology, studying bacterial metabolism with an the Microbiome Research Laboratory at the Institute of Microbi- emphasis on microbial and biosynthetic ‘dark matter’, symbiosis, ology at ETH Zurich,¨ which combines bioinformatic and experi- marine natural products, biosynthetic engineering, and chemical mental approaches to integrate quantitative ‘meta-omics’ readouts ecology. with contextual information to study and the role of environmental microorganisms and mechanisms of host-microbial homeostasis. Nat. Prod. Rep. This journal is © The Royal Society of Chemistry 2021 View Article Online Review Natural Product Reports metagenomics quickly became a new scientic eld in its own characterization of new enzymes from eDNA – lacks sufficient right, contributing to exponential growth in the size of resolution. What exactly is a ‘new’ enzyme? In this review, we sequencing repositories. In 2007, still relatively early years for conceptualize metagenomic enzyme discovery as a pyramid metagenomics, a single study – the Global Ocean Sampling with three tiers (Fig. 1). The tip of the pyramid, which we refer to Expedition – nearly doubled the total number of protein as de novo enzyme discovery, refers to the identication of sequences in public databases.10 The rate of increase in next- entirely new types of biocatalysts. In other words, de novo generation sequencing has far
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages30 Page
-
File Size-