Basics of phage genome annotation & classification – How to get started
Dr. Evelien Adriaenssens (she/her)
Career Track Group Leader, Gut viruses & viromics
Chair, Bacterial Viruses Subcommittee (ICTV)
@EvelienAdri
Disclaimers
The following presentation is my personal opinion. It aims to communicate current best practices for phage genome annotation. Practices may change over time.
I do not receive any remuneration for this presentation and any content may only be reproduced for non-commercial purposes.
Any mention of a software tool does not constitute institutional endorsement.
Concepts
Assembly, Read mapping, Annotation, Database submission
Next-generation sequencing and assembly
Majority of phages: sequencing platform does not matter much
Things to keep in mind:
• Smallest sequencers are sufficient for phages
• 30-50 fold read coverage of the genome is ideal
• Possible to co-sequence multiple phages
E.g.: Illumina MiSeq, MiniSeq; Ion Torrent PGM; ONT MinION
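The 30-50 fold coverage target can be turned into a read count with the usual estimate: coverage = (number of reads × read length) / genome length. A minimal sketch, assuming a hypothetical 40 kb genome and 150 bp reads (illustrative values, not taken from the presentation):

```python
import math


def reads_needed(genome_length_bp: int, read_length_bp: int, target_coverage: float) -> int:
    """Estimate how many reads are needed to reach a target average coverage.

    Rearranges coverage = reads * read_length / genome_length and rounds up.
    """
    if genome_length_bp <= 0 or read_length_bp <= 0:
        raise ValueError("genome and read lengths must be positive")
    return math.ceil(target_coverage * genome_length_bp / read_length_bp)


def expected_coverage(n_reads: int, read_length_bp: int, genome_length_bp: int) -> float:
    """Average fold coverage obtained from a given number of reads."""
    return n_reads * read_length_bp / genome_length_bp


# Hypothetical example: 40 kb phage genome, 150 bp reads.
for target in (30, 50):
    print(f"{target}x coverage needs about {reads_needed(40_000, 150, target)} reads")
```

The same arithmetic shows why several phages can share one run: the read counts involved are small compared with the output of even a small sequencer.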
• Long-read technologies have the potential to give the genome in one read
• Popular Nextera library prep is transposon-based: ends of linear DNA will be missed
• Assembly of genomes: choose tools that are accessible to you and in line with your bioinformatic skills
• In general: the easier to use, the more expensive
Do read mapping: find assembly errors, find genome ends
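Once reads are mapped back to the assembly, per-position depth can be scanned for regions that deviate strongly from the median. Dips may point to assembly errors, and regions at roughly double depth can be consistent with terminal repeats that were collapsed during assembly. A minimal sketch, assuming the depth values have already been exported into a Python list by whatever mapping tool is used; the thresholds of 0.5 and 1.8 times the median are illustrative assumptions, not recommendations from the presentation:

```python
from statistics import median


def flag_depth_regions(depth, low_factor=0.5, high_factor=1.8):
    """Return (low_regions, high_regions) as lists of 0-based half-open intervals.

    low_regions:  runs of positions with depth below low_factor * median
    high_regions: runs of positions with depth above high_factor * median
    """
    med = median(depth)

    def runs(is_flagged):
        regions, start = [], None
        for i, d in enumerate(depth):
            if is_flagged(d):
                if start is None:
                    start = i          # open a new run
            elif start is not None:
                regions.append((start, i))  # close the current run
                start = None
        if start is not None:
            regions.append((start, len(depth)))
        return regions

    low = runs(lambda d: d < low_factor * med)
    high = runs(lambda d: d > high_factor * med)
    return low, high


# Toy depth profile: high-depth start, a short dip in the middle.
toy_depth = [82, 80, 81] + [40] * 10 + [5, 4] + [41] * 10
print(flag_depth_regions(toy_depth))
```

Flagged regions are only candidates; they still need inspection in a genome browser and, for genome ends, experimental confirmation.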
Owen, Perez-Sepulveda & Adriaenssens, 2021, Detection of Bacteriophages: Sequence-Based Systems: https://link.springer.com/referenceworkentry/10.1007%2F978-3-319-41986-2_19
Phage genome structure & implications (1)
Circularly permuted
• Headful packaging: random ends
• pac sites: start fixed, end random
Merrill et al, 2016, BMC Genomics: Excellent examples of genome organisations
Dickeya phage LIMEstone1, Adriaenssens et al 2012, PLoS ONE
Escherichia phage P1, Lobocka et al 2004, J. Bact.
Phage genome structure & implications (2)
Circularly permuted
• Headful packaging: random ends
• pac sites: start fixed, end random
Useful software: PhageTerm (Garneau et al 2017, Scientific Reports)
Phage genome structure & implications (3)
Defined ends
• Cohesive ends
• Terminal repeats
Merrill et al, 2016, BMC Genomics: Excellent examples of genome organisations
Phage genome structure & implications (4)
Defined ends
• Cohesive ends
• Terminal repeats
MSc thesis Vincent Dunon
Reorganise your genome!
Defined ends: verify (experimentally) and arrange correctly
No defined ends:
- Compare with database genomic relative
- Rearrange to be colinear with best-annotated relative
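For a genome without defined ends, rearranging the assembly to be colinear with a relative amounts to rotating the sequence to a new start position, after reverse-complementing it if the strands differ. A minimal sketch on plain strings, assuming the new start coordinate has already been determined (for instance from an alignment to the relative); `rotate_genome` and `reverse_complement` are hypothetical helper names, and this rotation is not appropriate for genomes with defined ends:

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Reverse-complement a DNA string (A/C/G/T only; other symbols are kept)."""
    return sequence.translate(_COMPLEMENT)[::-1]


def rotate_genome(sequence: str, new_start: int) -> str:
    """Rotate a circularly permuted sequence so that 0-based `new_start`
    becomes the first base. Length and content are unchanged."""
    if not sequence:
        return sequence
    new_start %= len(sequence)
    return sequence[new_start:] + sequence[:new_start]


# Toy contig; position 6 is assumed to match the first base of the relative.
contig = "GGGTTTAAACCC"
print(rotate_genome(contig, 6))  # AAACCCGGGTTT
```

After rotating, re-running read mapping against the rearranged sequence is a cheap check that the new junction is supported by reads.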
Genome annotation: Escherichia phage T7 example
UGENE, http://ugene.net/, Okonechnikov et al, 2012, Bioinformatics
ORF vs CDS
ORFfinder @ NCBI
Final annotation
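The contrast between raw ORF calls and the final annotation can be made concrete with a naive ORF scan: it reports every start-to-stop stretch in all six frames, most of which will not end up as CDSs. A minimal sketch, assuming the start codons ATG, GTG and TTG, the stop codons TAA, TAG and TGA, and an arbitrary minimum length; it illustrates the idea only and is not how ORFfinder is implemented:

```python
STARTS = {"ATG", "GTG", "TTG"}   # assumed start codons
STOPS = {"TAA", "TAG", "TGA"}    # assumed stop codons
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def orfs_on_strand(seq, min_codons):
    """Return (start, end) 0-based half-open ORFs in the three frames of `seq`.

    In each frame an ORF runs from the first start codon after the previous
    stop up to and including the next in-frame stop codon.
    """
    found = []
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon in STARTS:
                orf_start = i
            elif orf_start is not None and codon in STOPS:
                if (i + 3 - orf_start) // 3 >= min_codons:
                    found.append((orf_start, i + 3))
                orf_start = None
    return found


def find_orfs(seq, min_codons=3):
    """Six-frame scan; coordinates are reported on the forward strand."""
    seq = seq.upper()
    n = len(seq)
    orfs = [(s, e, "+") for s, e in orfs_on_strand(seq, min_codons)]
    for s, e in orfs_on_strand(reverse_complement(seq), min_codons):
        orfs.append((n - e, n - s, "-"))  # map back to forward coordinates
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC"))
```

Deciding which of these ORFs are real CDSs is the annotation step proper, and it relies on the criteria discussed in the following slides.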
Gene prediction
Know your organism:
• Common start codons
• Alternative stop codons?
• Which translation table?
• What is the coding density?
• Presence of introns?
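Coding density, one of the questions above, is the fraction of the genome covered by CDSs. A minimal sketch of the calculation, assuming CDS coordinates are given as 0-based half-open intervals and that overlapping CDSs are counted only once (the example coordinates are made up):

```python
def coding_density(cds_intervals, genome_length):
    """Fraction of the genome covered by at least one CDS.

    cds_intervals: iterable of (start, end), 0-based, end exclusive.
    Overlapping or nested intervals are merged before summing.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(cds_intervals):
        if cur_end is None or start > cur_end:
            # Gap before this CDS: close the previous merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps or touches the current block: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


# Hypothetical 1 kb genome with three CDSs, two of which overlap by 10 bp.
print(coding_density([(0, 300), (290, 600), (700, 1000)], 1000))
```

An unexpectedly low value for a given phage group is a prompt to look again for missed genes rather than a result in itself.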
Decisions, decisions
Shine-Dalgarno sequence: ribosome-binding site
E. coli: AGGAGG; phage T4 early genes: GAGG
Decision criteria for phage CDSs
• Presence of ribosome-binding site
• Alternative start codons allowed, most phages: Translation table 11 (some exceptions)
• Maximize coding density
• Small CDS overlap allowed
• Nested CDS possible (depends on phage)
• CDS on both strands (depends on phage)
• Introns possible
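The first criterion, presence of a ribosome-binding site, can be checked with a simple motif scan upstream of each candidate start codon. A minimal sketch, assuming exact matches to the two Shine-Dalgarno motifs mentioned earlier (AGGAGG and GAGG) and an upstream window of 20 nt; the window size and the function name `find_rbs` are assumptions for illustration, and real ribosome-binding sites are often imperfect matches:

```python
def find_rbs(sequence, start_pos, motifs=("AGGAGG", "GAGG"), window=20):
    """Look for a Shine-Dalgarno-like motif upstream of a start codon.

    sequence:  DNA string, coding strand
    start_pos: 0-based index of the first base of the candidate start codon
    Returns (motif, spacer) where spacer is the number of bases between the
    end of the motif and the start codon, or None if no motif is found.
    Motifs are tried in the order given, so longer motifs should come first.
    """
    sequence = sequence.upper()
    upstream = sequence[max(0, start_pos - window):start_pos]
    for motif in motifs:
        idx = upstream.rfind(motif)  # occurrence closest to the start codon
        if idx != -1:
            spacer = len(upstream) - (idx + len(motif))
            return motif, spacer
    return None


# Toy sequences: motif, spacer, then an ATG start codon.
print(find_rbs("TTTTAGGAGGTTTTTTTATGAAA", 17))
print(find_rbs("CCCCGAGGTTTTTATG", 13))
```

When two candidate start codons compete for the same CDS, comparing the motif and spacer found for each is one way to apply this criterion alongside the others in the list.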
Functional annotation
Based on similarity with proteins of known function
• BLASTp or derivatives (PSI-BLAST, PHI-BLAST…)
• HMM-based searches (e.g. HHPred)
• Structural predictions (e.g. PHYRE2)
Know your database: comparison search can’t find what isn’t in the database
• RefSeq: curated databases (genomes, proteins), not all phages present
• nr/nt: non-redundant datab