J Antimicrob Chemother 2017; 72: 700–704
doi:10.1093/jac/dkw511
Advance Access publication 30 December 2016

Prediction of antibiotic resistance from antibiotic resistance genes detected in antibiotic-resistant commensal Escherichia coli using PCR or WGS

Downloaded from https://academic.oup.com/jac/article-abstract/72/3/700/2762720 by Biomedical Library user on 04 April 2019

Robert A. Moran1, Sashindran Anantham1, Kathryn E. Holt2,3 and Ruth M. Hall1*

1School of Life and Environmental Sciences, The University of Sydney, NSW 2006, Australia
2Centre for Systems Genomics, University of Melbourne, Parkville, Victoria 3010, Australia
3Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria 3010, Australia

*Corresponding author. School of Life and Environmental Sciences, Molecular Bioscience Building G08, The University of Sydney, NSW 2006, Australia. Tel: +61-2-9351-3465; Fax: +61-2-9351-5858; E-mail: [email protected]

Received 22 August 2016; accepted 27 October 2016

Objectives: To assess the effectiveness of bioinformatic detection of resistance genes in whole-genome sequences in correctly predicting resistance phenotypes.

Methods: Genomes of a collection of well-characterized commensal Escherichia coli were sequenced using Illumina HiSeq technology and assembled with SPAdes. Antibiotic resistance genes identified by PCR, SRST2 analysis of reads and ResFinder analysis of SPAdes assemblies were compared with known resistance phenotypes.

Results: Generally, the antibiotic resistance genes detected using bioinformatic methods were concordant, but only ARG-ANNOT included sat2. However, the presence or absence of genes was not always predictive of the phenotype. In one strain, trimethoprim resistance was due to a known mutation in the chromosomal folA gene.
In cases where the copy number was low, the aadA5 gene downstream of dfrA17 did not confer streptomycin or spectinomycin resistance. Resistance genes were found in the genomes that were not detected previously by PCRs targeting a limited gene set and gene cassettes in class 1 or class 2 integrons. In one isolate, the aadA1 gene cassette in the estX-aadA1 cassettes pair was outside an integron context and was not expressed. The qnrS1 gene, conferring reduced susceptibility to fluoroquinolones, and the blaCMY-2 gene, encoding an ESBL, were each detected in a single isolate and mphA (macrolide resistance) was present in six isolates surrounded by IS26 and IS6100.

Conclusions: WGS analysis detected more genes than PCR. Some were not expressed, causing inconsistencies with the experimentally determined phenotype. An unpredicted chromosomal folA mutation causing trimethoprim resistance was found.

Introduction

Bacteria resistant to therapeutic antibiotics represent a significant global health challenge as infections caused by multiply, extensively or pan antibiotic-resistant Gram-negative and Gram-positive bacteria continue to increase. As WGS becomes more affordable and searchable databases of acquired antibiotic resistance genes have been made available [1,2], predicting the antibiotic resistance profile by identifying antibiotic resistance genes in WGS data has become feasible. However, studies that compare experimentally determined resistance profiles with resistance gene content are needed in order to assess the reliability of WGS-based approaches.

We have established a non-redundant collection of commensal Escherichia coli recovered from healthy Australian adults by examining the population structure of E. coli from the colons of volunteers using a number of measures of strain diversity, including the antibiotic resistance phenotype of each isolate, phylogenetic group and random amplified polymorphic DNA (RAPD) profiling [3–6]. A single representative of each strain type detected was retained. For the resistant strains, PCR was used to detect class 1 and class 2 integrons and the gene cassettes they harbour, as well as a limited set of other resistance genes. The plasmid content was determined recently [6] and a few plasmids that carry resistance genes have been studied or completely sequenced [5,7].

Here, the genomes of the antibiotic-resistant isolates in the collection were sequenced via Illumina and both reads-based analysis with SRST2 using the ARG-ANNOT database and assembly-based analysis with ResFinder were used to determine the resistance gene content. The outputs of this analysis were reconciled with the resistance phenotype and resistance gene determination using PCR-based methods.

© The Author 2016. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please email: [email protected].

Table 1. Antibiotic-resistant commensal E. coli collection

Strain (a) | Resistance phenotype (b) | Phylogroup | blaTEM | strAB | aadA | sul | dfrA | tetA | Other | Additional genes (WGS)
14.3-R4 | TET | A0 | − | − | − | − | − | A | none | none
24.16-R4 | AMP STR SMX TMP | A0 | + | + | − | 2 | 5 | − | none | none
24.20-R5 | TET | A0 | − | − | − | − | − | A | none | none
1-R1 | AMP STR SMX TET TMP | A1 | + | + | − | 1, 2 | 7 | A | none | none
2.2-R2 | NAL TET | A1 | − | − | − | − | − | A | none | none
3.6-R4 | AMP STR SMX | A1 | + | + | − | 2 | − | − | none | none
3.6-R5 | AMP CHL GEN NAL STR SMX TET TOB TMP | A1 | + | + | 5 | 1, 2 | 17 | A | catA1, aacC2d | mphA
21.1-R1 | AMP STR SMX TET | A1 | + | + | − | 2 | − | A | none | none
1.4-R4 | SMX TET TMP | A1 | − | − | − | 1 | 7 | A | none | none
15.1-R1 | AMP STR SMX TMP | A1 | + | + | − | 2 | 5 | − | none | none
1.2-R2 | TET | A1 | − | − | − | − | − | A | none | none
1.2-R3 | SMX TET TMP | A1 | − | + | − | 2 | 14 | A | none | none
1.9-R7 | AMP TET | A1 | + | − | − | − | − | A | none | qnrS1
14.2-R3 | SMX TET TMP | A1 | − | − | − | 2 | − | A | none | aadA5, dfrA17
24-R3 | AMP STR SMX TMP | A1 | + | + | − | 2 | 5 | − | none | none
3.5-R3 | STR SPT TMP | B1 (c) | − | − | 1 | − | 1 | − | sat2 | none
19.1-R1a | STR SPT TMP | B1 | − | − | 1 | − | 1 | − | sat2 | none
19.1-R1 | TMP | B1 | − | − | − | − | 1 | − | sat2 | none
1.10-R8 | SMX TET TMP | B2 | − | − | − | 1 | 5 | B | none | none
2.1-R1 | AMP | B2 | + | − | − | − | − | − | none | none
3-S1R | AMP | B2 | + | − | − | − | − | − | none | none
3.3-R2 | AMP | B2 | + | − | − | − | − | − | none | none
13.1-R2 | AMP SMX | B2 | + | − | − | 2 | − | − | none | none
13.1-R2a | AMP SMX TET | B2 | + | − | − | 2 | − | B | none | none
14.2-R2 | AMP | B2 | + | − | − | − | − | − | none | none
22.1-R1 | AMP CHL STR SPT SMX TET TMP | B2 | + | − | 1, 2 | 3 | 12 | B | cmlA1 | mef(B)
10.1-R1 | AMP STR SMX TET TMP | B2 | + | + | − | 2 | 1 | B | sat2 | none
11.1-R1 | AMP TMP | B2 | + | − | − | − | − | − | none | none
11.3-R3 | AMP TET | B2 | + | − | − | − | − | A | none | none
19.1-R3 | AMP STR SPT SMX TET TMP | B2 | + | + | 1 | 1, 2 | 1 | B | none | none
1.9-R6 | AMP CIP GEN NAL STR SMX TET TOB TMP (d) | B2 | + | + | 5 | 1, 2 | 17 | A | aacC2d | mphA
5.1-R1 | AMP STR SMX | B2 | + | + | − | 2 | − | − | none | none
2.3-R3 | STR SMX TMP | D | − | + | − | 1, 2 | 5 | − | none | none
2.3-R4 | SMX TET TMP | D | − | − | − | 1 | 5 | C | none | none
2.3-R5 | STR SMX TET TMP | D | − | + | − | 1, 2 | 5 | B | none | none
3-R1 | AMP TET | D | + | − | − | − | − | B | none | none
4-R1 | AMP TET | D | + | − | − | − | − | B | none | none
4.2-R3 | TET | D | − | − | − | − | − | B | none | none
5.2-R2 | AMP CHL STR SMX TET TMP | D | + | + | − | 2 | 5 (e) | D | catA1 | none
11.4-R4 | SMX TMP | D | − | − | − | 1 | 5 | − | none | none
18.1-R1 | AMP TET | D | + | − | − | − | − | B | none | none
4-R2 | AMP STR SPT SMX TET TMP | D | + | + | 5 | 1, 2 | 17 | A | none | mphA
4.3-R2a | AMP | D | + | − | − | − | − | − | none | none
4.4-R2b | AMP STR SMX TET | D | + | + | − | 2 | − | A | none | mphA
11.2-R2 | AMP STR SPT SMX TET TMP | D | + | + | 5 | 1, 2 | 17 | A | none | mphA
14.1-R1 | AMP STR SPT SMX TET TMP | D | + | + | 5 | 1, 2 | 17 | A | none | mphA
6.2-R1 | SMX TMP | D | − | + | − | 2 | 14 | − | none | none
9.1-R1 | SMX | D | − | − | − | 2 | − | − | none | aadA1
13.1-R1 | TET | D | − | − | − | − | − | B | none | none
24.1-R1 | AMP CAZ CTX TET | D | − | − | − | − | − | B | none | blaCMY-2
24.1-R2 | AMP CHL STR SPT SMX TET TMP | D | + | + | 1, 2 | 3 | 12 | A | cmlA1 | none

(a) New strains and ciprofloxacin and nalidixic acid resistance are bold.
(b) AMP, ampicillin; CAZ, ceftazidime; CTX, cefotaxime; CHL, chloramphenicol; CIP, ciprofloxacin; GEN, gentamicin; NAL, nalidixic acid; STR, streptomycin; SPT, spectinomycin; SMX, sulfamethoxazole; TET, tetracycline; TOB, tobramycin; TMP, trimethoprim.
(c) Previously reported as phylogroup A0 in Moran et al. (2015) [6].
(d) Previously reported as AMP (CHL) STR SMX TET TMP in Anantham and Hall (2012) [5].
(e) Incorrectly reported as dfrA1 in Anantham and Hall (2012) [5].

Materials and methods

E. coli isolates

The strains used were either derived from a published collection of commensal E. coli strains recovered from the faeces of 22 healthy human subjects between 2008 and 2010 [8] or were isolated from further samples collected from the same subjects and from an additional subject over the time frame 2008–14 [4–6]. Sample collection followed protocols approved by the University of Sydney Human Research Ethics Committee (HREC) (04-2008/10778) with informed consent from subjects. The protocols for isolation and analysis are described elsewhere [4]. The 51 unique isolates that are resistant to at least 1 of the 12 antibiotics (ampicillin, ceftazidime, cefotaxime,

[...] JAC Online). Resistance genes in assembled contigs were detected using ResFinder (https://cge.cbs.dtu.dk//services/ResFinder/) [1] and raw reads were used to query ARG-ANNOT (http://en.mediterranee-infection.com/article.php?laref=283%26titre=arg-annot) [2] using SRST2 with default settings [11]. The coverage of resistance genes relative to the average coverage for genes used for MLST was used to assess copy number. Assembled sequences were compared with those found in the GenBank non-redundant DNA database using the BLAST alignment facility (http://blast.ncbi.nlm.nih.gov). Gene Construction Kit version 2.5 (Textco, West Lebanon, NH, USA) was used to draw figures to scale.

GenBank accession numbers

Sequences of fragments containing blaCMY-2 and the estX-aadA1 cassettes have been deposited in GenBank with the accession numbers KX462017 and KX462014, respectively.
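
The two comparisons described in the text, relating detected genes to the observed phenotype and estimating copy number from coverage relative to the MLST genes, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the gene-family-to-antibiotic map is a simplification, the function names are hypothetical, and the read depths are invented for the example. Only the strain 14.2-R3 entries (phenotype SMX TET TMP; sul2 and tetA by PCR; aadA5 and dfrA17 found additionally by WGS) are taken from Table 1.

```python
from statistics import mean

# Simplified gene-family -> antibiotic map (an assumption for illustration;
# abbreviations follow the footnote to Table 1).
FAMILY_TO_DRUGS = {
    "blaTEM": {"AMP"},
    "strAB": {"STR"},
    "aadA": {"STR", "SPT"},
    "sul": {"SMX"},
    "dfrA": {"TMP"},
    "tetA": {"TET"},
}


def predict_phenotype(genes):
    """Return the set of resistances predicted from a set of gene names."""
    predicted = set()
    for gene in genes:
        for family, drugs in FAMILY_TO_DRUGS.items():
            if gene.startswith(family):
                predicted |= drugs
    return predicted


def reconcile(observed, genes):
    """Compare observed resistances with those predicted from the genes.

    'unexplained'  : resistances observed but not predicted by any gene
    'not_observed' : resistances predicted but absent from the phenotype
    """
    predicted = predict_phenotype(genes)
    return {
        "unexplained": observed - predicted,
        "not_observed": predicted - observed,
    }


def relative_copy_number(gene_depth, mlst_depths):
    """Ratio of a gene's read depth to the mean depth of the MLST genes."""
    baseline = mean(mlst_depths)
    if baseline <= 0:
        raise ValueError("mean MLST depth must be positive")
    return gene_depth / baseline


# Strain 14.2-R3 from Table 1.
observed = {"SMX", "TET", "TMP"}
pcr_genes = {"sul2", "tetA"}
wgs_genes = pcr_genes | {"aadA5", "dfrA17"}

print(reconcile(observed, pcr_genes))   # TMP is unexplained by the PCR gene set
print(reconcile(observed, wgs_genes))   # STR and SPT are predicted but not observed

# Hypothetical read depths: seven MLST genes and one resistance gene.
mlst_depths = [60.0, 58.0, 62.0, 61.0, 59.0, 60.0, 60.0]
print(relative_copy_number(180.0, mlst_depths))
```

With the PCR gene set alone, trimethoprim resistance in 14.2-R3 has no explanation; adding the genes found by WGS accounts for it through dfrA17 but also predicts streptomycin and spectinomycin resistance from aadA5, which the phenotype does not show. A presence/absence comparison like this one flags such cases; it does not explain them, which is where the coverage ratio becomes useful.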