Bioinformatic Methods for Characterization of Viral Pathogens in Metagenomic Samples

Linköping Studies in Science and Technology
Dissertation No. 1489

Fredrik Lysholm

Department of Physics, Chemistry and Biology
Linköping, 2012

Cover art by Fredrik Lysholm 2012. The cover shows a rhinovirus capsid consisting of 4 amino acid chains (VP1: Blue, VP2: Red, VP3: Green and VP4: Yellow) in complex, repeated 60 times (in total 240 chains). The icosahedral shape of the capsid can be seen, especially through observing the placement of each 4 chain complex and the edges. The structure is fetched from RCSB PDB [1] (4RHV [2]) and the ~6500 atoms were rendered with ray-trace using PyMOL 1.4.1 at 3200x3200 pixels, in ~3250 seconds.

During the course of research underlying this thesis, Fredrik Lysholm was enrolled in Forum Scientium, a multidisciplinary doctoral programme at Linköping University.

Copyright © 2012 Fredrik Lysholm, unless otherwise noted. All rights reserved.

ISBN: 978-91-7519-745-6
ISSN: 0345-7524
Printed by LiU-Tryck, Linköping, Sweden, 2012.

Fredrik Lysholm, [email protected]
IFM Bioinformatics
Linköping University, Linköping, Sweden

Abstract

Virus infections impose a huge disease burden on humanity and new viruses are continuously found. As most studies of viral disease are limited to the investigation of known viruses, it is important to characterize all circulating viruses. Thus, a broad and unselective exploration of the virus flora would be the most productive development of modern virology.
Fueled by the reduction in sequencing costs and the unbiased nature of shotgun sequencing, viral metagenomics has rapidly become the strategy of choice for this exploration.

This thesis mainly focuses on improving key methods used in viral metagenomics as well as the complete viral characterization of two sets of samples using these methods. The major methods developed are an efficient automated analysis pipeline for metagenomics data and two novel, more accurate, alignment algorithms for 454 sequencing data. The automated pipeline facilitates rapid, complete and effortless analysis of metagenomics samples, which in turn enables detection of potential pathogens, for instance in patient samples. The two new alignment algorithms cover comparisons against both nucleotide and protein databases, while retaining the underlying 454 data representation. Furthermore, a simulator for 454 data was developed in order to evaluate these methods. This simulator is currently the fastest and most complete simulator of 454 data, which enables further development of algorithms and methods.

Finally, we have successfully used these methods to fully characterize a multitude of samples, including samples collected from children suffering from severe lower respiratory tract infections as well as patients diagnosed with chronic fatigue syndrome, both of which are presented in this thesis. In these studies, a complete viral characterization has revealed the presence of both expected and unexpected viral pathogens as well as many potential novel viruses.

Popular science summary (Populärvetenskaplig sammanfattning)

Methods for determining the composition of viruses in patient samples.

This thesis mainly deals with how to find out which viruses infect us by studying the genetic material (DNA) in collected samples. Over the past decade, the ability to determine the DNA present in a sample has developed explosively. For example, it is now possible to determine a person's complete genome for less than 10,000 dollars, compared with 3 billion dollars for the first one, which was completed about 10 years ago. This development has, however, not only made it possible to determine a person's DNA "cheaply", but has also enabled large-scale studies of the genomes of the viruses and bacteria found in our bodies. To make sense of all the data generated, however, we need the help of computers. Bioinformatics is the field concerned with storing, managing and analyzing precisely these data.

Analysis of DNA collected from samples of unknown content (so-called metagenomics) requires a large number of bioinformatic methods. For example, to understand what has been found in a sample, it must be compared against large databases of known content. Unfortunately, we have so far collected and characterized only a fraction of all the genetic material to be found among bacteria, viruses and other pathogens. These incomplete databases, together with the errors that arise during data collection, make in-depth analyses considerably more difficult. Furthermore, the increase in the amount of data generated, along with growing databases, has made these analyses more and more time-consuming.

In this thesis we have used and developed new bioinformatic methods to understand and analyze metagenomic data better and faster. We have both developed new ways of using existing search methods to find out as quickly as possible what a sample contains, and developed several methods for handling the errors that arise when a sample's DNA is sequenced. These bioinformatic methods and tools have already been used to survey several sample sets, including the viral content of samples collected from children suffering from severe lower respiratory tract infection and from patients suffering from chronic fatigue syndrome.

As a result of numerous analyses, most of the processes have been refined and automated so that computations and analyses can run without human intervention. This means, for example, that in the future it will be possible to use this or similar methods as a clinical tool to identify the viruses that may be infecting an individual patient.

Acknowledgments

This is perhaps the most important part of my thesis, or at least the most read part. While much of the work you do as a PhD student is done on your own, there are many people who made this thesis possible whom I really ought to thank a lot!

Firstly, I would like to thank my supervisors Bengt Persson and Björn Andersson. Thank you Bengt for always believing in the work I have done and for an always so very positive attitude towards our projects. I also appreciate that you let me work on whatever I felt like and always supported me in my projects. Björn, I would really like to thank you for your involvement and help in the metagenomic projects we shared and for all those nice lunches at MF or the local fast food place at KI.

I would also like to thank my two former co-workers, Jonas Carlsson and Joel Hedlund. Thank you both for all the cool discussions, pool-playing in Brag and for being stable lunch and fika partners during my first years as a PhD student. Jonas, my mentor during my master's thesis time, you are always happy and crazy in a way that made it impossible not to love you. Joel, with whom I shared an office after Jonas left, thanks for helping me to take a break and look at some “something awful pictures” or to discuss really odd stuff such as how best to make an utterly immovable nail-bed.

Also, I would like to thank the newcomers to Linköping, Björn Wallner and Robert Pilestål. Björn, you are probably the only person I have ever met who shares my occupation, my interest in carpentry and building stuff as well as playing floorball (a true He-Man).
Robert, PhD student in Björn’s group, with whom I now share an office, thanks for sharing your thoughts on everything from evolution to marriage! Thank you both (as well as Malin Larsson, new BILS person in Linköping) for sharing lunches and fikas with me during this last year.

I would also like to thank all other fellow PhD students at IFM over the years. Special thanks go to Leif “leffe” Johansson and Karin “karma” Magnusson for being my two allies over the years. Also, special thanks go to Sara, Viktor, Jonas, Per, Erik, Anders and all the other members of the “coffee club”. Thank you all for the greetings sent and conveyed by Karin at my wedding and for all the weird but incredibly funny discussions over coffee. My appreciation also goes to Stefan Klintström and all the members of Forum Scientium. It has been a lot of fun and educational to be part of Forum!

My colleagues at CMB, KI and SciLifeLab also deserve special thanks. First of all, thanks to Anna Wetterbom for helping and teaching me how to write a manuscript. I would also like to thank Oscar, Stefanie, Stephen and Hamid for the nice discussions, lunches and for housing me in their small office at KI!

I also really have to thank all other friends and collaborators over the years: Tobias Allander, Michael Lindberg, Dave Messina, Erik Sonnhammer and Patrik Björkholm. Tobias and Michael for help with all the viruses, and Dave and Erik for the joint effort in trying to make sense of the “dark matter”. Patrik, you are probably the most cheerful guy I know and you have been a great support over the years. It is downright strange we have not published together yet!

Finally, I would really like to thank my lovely wife, Ida, who has always encouraged me and been at my side.

Linköping, December 2012.

Papers

Papers included in this thesis

I. Lysholm F*, Andersson B, Persson B: An efficient simulator of 454 data using configurable statistical models. BMC Res Notes 2011, 4: 449.
   Fredrik Lysholm drafted the study, implemented the software and wrote the manuscript.

II. Lysholm F*, Andersson B, Persson B: FAAST: Flow-space Assisted Alignment Search Tool. BMC Bioinformatics 2011, 12: 293.
   Fredrik Lysholm drafted the study, implemented the software and wrote the manuscript.

III. Sullivan PF*, Allander T, Lysholm F, Goh S, Persson B, Jacks A, Evengård B, Pedersen NL, Andersson B*: An unbiased metagenomic search for infectious agents using monozygotic twins discordant for chronic fatigue.