Basics of Phage Genome Annotation & Classification – How to Get Started
Dr. Evelien Adriaenssens (she/her)
Career Track Group Leader, Gut viruses & viromics
Chair, Bacterial Viruses Subcommittee (ICTV)
@EvelienAdri

Disclaimers
The following presentation is my personal opinion. It aims to communicate current best practices for phage genome annotation. Practices may change over time. I do not receive any remuneration for this presentation, and any content may only be reproduced for non-commercial purposes. Any mention of a software tool does not constitute institutional endorsement.

Database submission

Concepts
• Assembly
• Read mapping
• Annotation

Next-generation sequencing and assembly
Majority of phages: sequencing platform does not matter much
Things to keep in mind:
• The smallest sequencers are sufficient for phages; 30-50 fold read coverage of the genome is ideal; it is possible to co-sequence multiple phages
  E.g.: Illumina MiSeq, MiniSeq; Ion Torrent PGM; ONT MinION
• Long-read technologies have the potential to give the genome in one read
• Popular Nextera library prep is transposon-based: the ends of linear DNA will be missed
Assembly of genomes: choose tools that are accessible to you and in line with your bioinformatic skills
• In general: the easier to use, the more expensive
Do read mapping: find assembly errors, find genome ends
Owen, Perez-Sepulveda & Adriaenssens, 2021, Detection of Bacteriophages: Sequence-Based Systems: https://link.springer.com/referenceworkentry/10.1007%2F978-3-319-41986-2_19

Phage genome structure & implications (1)
Circularly permuted
• Headful packaging: random ends
• pac sites: start fixed, end random
Merrill et al, 2016, BMC Genomics: Excellent examples of genome organisations
Dickeya phage LIMEstone1, Adriaenssens et al 2012, PLoS ONE
Escherichia phage P1, Lobocka et al 2004, J. Bact.
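Because circularly permuted genomes have no fixed ends, two assemblies of the same phage can start at different positions and still carry the same sequence. The sketch below shows one simple way to test for this and to rotate a sequence to a chosen start. It is an illustration only: the function names are hypothetical, the sequences are toy examples, and it assumes exact sequence identity on the same strand (no reverse complement, no sequencing errors).

```python
def is_circular_permutation(seq_a: str, seq_b: str) -> bool:
    """Return True if seq_b is a rotation of seq_a (same strand, exact match)."""
    a, b = seq_a.upper(), seq_b.upper()
    # Every rotation of a appears as a substring of a + a.
    return len(a) == len(b) and b in a + a


def rotate(seq: str, start: int) -> str:
    """Rotate a circular sequence so that it begins at 0-based position `start`."""
    if not seq:
        return seq
    start %= len(seq)
    return seq[start:] + seq[:start]


# Toy example: the same "genome" assembled with two different arbitrary ends.
assembly_1 = "ATGCCGTAAGGT"
assembly_2 = rotate(assembly_1, 5)

print(assembly_2)
print(is_circular_permutation(assembly_1, assembly_2))
```

For real genomes, a whole-genome alignment or dot plot against a relative is the more robust check, since assemblies rarely match base for base.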
Phage genome structure & implications (2)
Circularly permuted
• Headful packaging: random ends
• pac sites: start fixed, end random
Useful software: PhageTerm: Garneau et al 2017, Scientific Reports

Phage genome structure & implications (3)
Defined ends
• Cohesive ends
• Terminal repeats
Merrill et al, 2016, BMC Genomics: Excellent examples of genome organisations

Phage genome structure & implications (4)
Defined ends
• Cohesive ends
• Terminal repeats
MSc thesis Vincent Dunon

Reorganise your genome!
Defined ends: verify (experimentally) and arrange correctly
No defined ends:
- Compare with database genomic relative
- Rearrange to be colinear with best-annotated relative

Genome annotation: Escherichia phage T7 example
UGENE, http://ugene.net/, Okonechnikov et al, 2012, Bioinformatics

ORF vs CDS
ORFfinder @ NCBI
Final annotation

Gene prediction
Know your organism:
• Common start codons
• Alternative stop codons?
• Which translation table?
• What is the coding density?
• Presence of introns?

Decisions, decisions
Shine-Dalgarno sequence: ribosome-binding site
E. coli: AGGAGG; phage T4 early genes: GAGG

Decision criteria for phage CDSs
• Presence of ribosome-binding site
• Alternative start codons allowed; most phages: translation table 11 (some exceptions)
• Maximize coding density
• Small CDS overlap allowed
• Nested CDS possible (depends on phage)
• CDS on both strands (depends on phage)
• Introns possible

Functional annotation
Based on similarity with proteins of known function
• BLASTp or derivatives (PSI-BLAST, PHI-BLAST…)
• HMM-based searches (e.g. HHpred)
• Structural predictions (e.g. PHYRE2)
Know your database: comparison search can't find what isn't in the database
• RefSeq: curated databases (genomes, proteins), not all phages present
• nr/nt: non-redundant databases of proteins/nucleotides
• Protein databases: Pfam, PDB, Swiss-Prot, UniProt
• Environmental databases: e.g.
IMG/VR

General rules for functional annotation
• If you are unsure of the function → "hypothetical protein"
• "If you can't convince your grandma of the gene's function, don't give it a name" – Ramy Aziz & Andrew Kropinski
• Mistakes in databases will get propagated
• Do not add organism information in the /product description
• Do not name a gene product "gp55": nobody knows what that means
• Add extra information in /note

Twitter survey on pet peeves

Other features
• Promoters
• Terminators
• tRNAs
[Figure: predicted terminator hairpin structure, ENERGY = -11.2, terminator_57-58]

Tools
Best tool: your own eyes
All-in-one tools and platforms for phage assembly and annotation:
- PATRIC (patricbrc.org)
- Galaxy platform
  - https://cpt.tamu.edu/galaxy-pub

Command line tools
• Assembly:
  • SPAdes (https://cab.spbu.ru/software/spades/)
  • Shovill (https://github.com/tseemann/shovill)
  • Megahit (https://github.com/voutcn/megahit)
• Gene prediction:
  • Prodigal (https://github.com/hyattpd/Prodigal)
  • Glimmer (http://ccb.jhu.edu/software/glimmer/index.shtml)
  • GeneMarkS (http://exon.gatech.edu/GeneMark/)
  • PHANOTATE (https://github.com/deprekate/PHANOTATE)
• Gene prediction & annotation:
  • Prokka (https://github.com/tseemann/prokka)
  • Balrog (https://github.com/salzberg-lab/Balrog)

Other useful tools
Kropinski suite of collected tools: https://molbiol-tools.ca/
STEP3 (https://step3.erc.monash.edu/): enhanced prediction of phage virion proteins
Easyfig (http://mjsull.github.io/Easyfig/)
BRIG (http://brig.sourceforge.net/) or alternatively CGView server (http://stothard.afns.ualberta.ca/cgview_server/)

Phage classification & taxonomy
https://www.mdpi.com/1999-4915/9/4/70

Bacterial Viruses Subcommittee
https://talk.ictvonline.org/
Categories: Study Group Chairs, Regional representatives, Members

Name | Affiliation | Position
Evelien Adriaenssens | Quadram Institute Bioscience, UK | Subcommittee Chair ICTV
Dann Turner | University of the West of England, UK | Vice Chair; Caudoviricetes & Autographiviridae SGC
Andrew Kropinski | University of Guelph, Canada | Actinobacteriophages SGC
Hanna Oksanen | University of Helsinki, Finland | EC Elected Member; Halopanivirales & Corticoviridae SGC
Minna Poranen | University of Helsinki, Finland | Cystoviridae SGC
Jakub Barylski | Adam Mickiewicz University, Poland | Herelleviridae SGC
Susan Lehman | FDA, USA | Lambdaviruses SGC
Janis Rumnieks | Biomedical Research and Study Center, Latvia | Leviviricetes SGC
Bas Dutilh | University of Utrecht, The Netherlands | EC Elected Member; Metagenomics SGC
Matthew Sullivan | The Ohio State University, USA | Metagenomics SG Co-Chair
François Enault | University of Clermont, France | Microviridae SGC
Leonardo (Lonnie) van Zyl | University of the Western Cape, RSA | Peduovirus SGC
Johannes Wittmann | DSMZ, Germany | Schitoviridae SGC
Mart Krupovic | Pasteur Institute, France | EC member; Plasmaviridae SGC
Malgorzata Lobocka | Polish Academy of Sciences, Poland | EC Elected Member; phi29-related phages SGC
Annika Gillis | Imperial College London, UK | Tectiviridae SGC
Andrew Millard | University of Leicester, UK | Tevenvirinae SGC
Petar Knezevic | University of Novi Sad, Serbia | Tubulavirales SGC
Andrey Shkoporov | University College Cork, Ireland | Crassvirales SGC
Rodney Brister | NCBI, USA | NCBI contact
Tong Yigang | Beijing Institute of Microbiology and Epidemiology, China | regional representative
Ramy Aziz | Cairo University, Egypt | regional representative
Andrea Moreno Switt | Universidad Andres Bello, Chile | regional representative
Jumpei Uchiyama | Azabu University, Japan | regional representative
Poliane Alfenas-Zerbini | Universidade Federal de Viçosa, Brasil | regional representative
Alla Kushkina | D.K. Zabolotny Institute of Microbiology and Virology of the National Academy of Sciences of Ukraine | regional representative
B.L. Sarkar | National Institute of Cholera and Enteric Diseases, India | regional representative
Vera Morozova | Institute of Chemical Biology and Fundamental Medicine (RAS), Russia | regional representative
Nina Chanishvili | The Eliava Institute of Bacteriophage, Georgia | regional representative
Ipek Kurtböke | University of the Sunshine Coast, Australia | regional representative
Jesca Nakavuma | Makerere University, Uganda | regional representative
Alejandro Reyes | Universidad de los Andes, Colombia | member
Cristina Moraru | University of Oldenburg, Germany | member
Rob Lavigne | KU Leuven, Belgium | member
Rob Edwards | Flinders University, Australia | member
Cédric Lood | KU Leuven, Belgium | member

Naming your phage
• No official rules about naming phage/virus isolates
• BUT lots of rules for official taxon names (e.g. no hyphens or slashes, no Greek letters...)
• BE UNIQUE!
• BVS has used the exemplar isolate name as basis for the species name where possible
  • isolate phage T7 infecting Escherichia coli
  • deposited as Enterobacteria phage T7
  • type strain of the species Escherichia virus T7
Remember: species != phage
All domestic dogs are members of the species Canis lupus

Binomial species naming system
Proposal: Use genus name plus species epithet to refer to virus species in freeform format
Example: Salmonella virus P22, member of genus Lederbergvirus
→ Lederbergvirus P22 (freeform binomial)
→ Lederbergvirus transductans (latinised binomial)
Clear difference between phage isolate and species!
In practice: my phage is called Salmonella phage Tweedledum and it belongs to the species Lederbergvirus P22.

Does my new phage represent a new species?
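As a rough sketch of how this question can be checked once a pairwise genome identity is available, the snippet below applies the 95% identity criterion that this slide gives for species demarcation. The function name and the example values are hypothetical, and the input is assumed to be a whole-genome identity percentage such as a comparison tool would report. Following the slide's wording (genomes "differ by less than 5%"), a value of exactly 95% is treated here as not meeting the criterion, which is an assumption about the boundary. Synteny still has to be checked separately.

```python
SPECIES_MAX_DIFFERENCE = 5.0  # percent difference over the genome length


def same_species(identity_percent: float) -> bool:
    """Return True if two genomes differ by less than 5% over their length."""
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity_percent must be between 0 and 100")
    difference = 100.0 - identity_percent
    return difference < SPECIES_MAX_DIFFERENCE


# Hypothetical identity values between a new isolate and database relatives.
for relative, identity in [("relative_A", 96.2), ("relative_B", 94.0)]:
    verdict = "same species" if same_species(identity) else "different species"
    print(f"{relative}: {identity}% identity -> {verdict}")
```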
Main species demarcation criterion for bacteriophages: genome sequence identity of 95%
→ the genomes of two isolates belonging to the same species differ from each other by less than 5% over the genome length
→ Suggested tool to use: VIRIDIC (http://rhea.icbm.uni-oldenburg.de/VIRIDIC/)
→ check for synteny: isolates with high levels of rearrangements do not belong to the same species
⇒ part of existing species: use this taxonomic