
Identification of Defective Viral Genomes using NGS Data on HIVE

Konstantinos Karagiannis, FDA/CBER/OBE/HIVE

02/22/2021

Disclaimer

This presentation reflects the views of the author and should not be construed to represent FDA’s views or policies.

What is a DVG?

Defective Viral Genomes (DVGs): genomes unable to complete a full replication cycle

[Figure: wild-type genome compared with DVG types]
a) Mutation
b) Deletions
c) Copy-back/snap-back

DVG Generation mechanisms

• Stochastic mistakes: error-prone viral RNA polymerase

• Genomic variations (single AA mutation in SeV’s nucleoprotein)

• Variants of structural proteins (PPXY domain of LCMV)

• Template switching (intra or inter-genomic)

Effects of DVGs

M. Vignuzzi & C.B. López. Defective viral genomes are key drivers of the virus-host interaction. 2019 June 03, Nature Microbiology

DVG and HIV

• Types of DVG
  • Mutations
  • Deletions

• Present at all stages of HIV-1 infection

• Replication-incompetent or competent

• Silent pool
  • Persistent seropositivity
  • In CD4+ memory cells 17y

H. Imamichi et al. Defective HIV-1 proviruses produce viral proteins. 2020 Feb 18, PNAS

DVG in SARS-CoV-2

• Defective / subgenomic RNA

D. Kim et al. The Architecture of SARS-CoV-2 Transcriptome. 2020 May 14, Cell

H. Doddapaneni et al. Oligonucleotide Capture Sequencing of the SARS-CoV-2 Genome and Subgenomic Fragments from COVID-19 Individuals. 2020 Dec 11, bioRxiv

RNA viruses with described DVGs

Virus | Type of DVG | Known DVG functions | Family

nsRNA viruses
Lymphocytic choriomeningitis mammarenavirus | Deletion | Interference/modulate virulence/persistence | Arenaviridae
Ebola virus | Deletion/copy-back | Persistence |
Influenza virus | Deletion | Interference/IFN-induction/persistence/modulate virulence |
Human parainfluenza virus 3 | Deletion/copy-back | Interference |
Parainfluenza virus 5 | Deletion/copy-back | IFN-induction |
Measles virus | Deletion/copy-back | Interference/IFN-induction/persistence/modulate virulence |
Mumps virus | | Persistence/modulate virulence |
Sendai virus | Deletion/copy-back | Interference/IFN-induction/immune stimulation/persistence/modulate virulence |
Bunyamwera virus | Deletion | Interference |
Rift Valley fever virus (RVFV) | Deletion | Interference/modulate virulence |
Toscana virus | Deletion | Interference |
Human metapneumovirus | Copy-back | IFN-induction |
Human respiratory syncytial virus | Deletion/copy-back | Interference/IFN-induction/persistence |
Vesicular stomatitis virus | Deletion/copy-back | Interference/IFN-induction/persistence/modulate virulence |
Rabies virus | Deletion | Interference/persistence |
Tomato spotted wilt virus | Deletion | Modulate virulence | Tospoviridae

psRNA viruses
Porcine reproductive and respiratory syndrome virus | Deletion | UN |
Equine arteritis virus | Deletion | Interference |
Citrus tristeza virus | Deletion | UN |
Berne virus | Deletion | Interference |
Bovine coronavirus | Deletion | UN |
Infectious bronchitis virus | Deletion | UN |
Mouse hepatitis virus | Deletion | Interference/persistence |
Transmissible gastroenteritis virus | Deletion | Interference |
Dengue virus | Deletion | Persistence |
Japanese encephalitis virus | | Persistence |
Hepatitis C virus | Deletion | Persistence |
Murray Valley encephalitis virus | Deletion | Persistence |
Tick-borne encephalitis virus | Deletion | Modulate virulence |
West Nile virus | Deletion | Interference/persistence |
Tomato black ring virus | Deletion | Interference | Nepoviridae
[virus name missing] | Deletion | Modulate virulence |
Encephalomyocarditis virus | Deletion | Interference | Picornaviridae
Foot-and-mouth disease virus | Deletion | Interference | Picornaviridae
Mengo virus | Deletion | Interference | Picornaviridae
Polio virus | Deletion | Modulate virulence | Picornaviridae
[virus name missing] | | Persistence | Togaviridae
Semliki Forest virus | Deletion | Interference/modulate virulence | Togaviridae
Sindbis virus | Deletion | Interference/IFN-induction | Togaviridae
Cucumber necrosis virus | Deletion | Modulate virulence |
Tomato bushy stunt virus | Deletion | Interference/modulate virulence |
[virus name missing] | Deletion | Interference/modulate virulence |
Human immunodeficiency virus 1 | Deletion/hypermutation/frame shift | Persistence | Retroviridae

dsRNA viruses
Infectious pancreatic necrosis virus | UN | Persistence |
Rosellinia necatrix virus | Deletion | Interference |
Type 3 reovirus | Deletion | Interference |
Wound tumor virus | Deletion | UN |

DVG-profiler

[Figure: DVG-profiler workflow. Step labels: HTS reads; align against reference; all hits; filter hits by read; hits from same read; all pairwise combinations; hits overlapping? (yes: highest alignment score, no: sum of the 2 alignment scores); sorted; maximum score path; detect breakpoints; assign breakpoints; sort breakpoints; group breakpoints.]

DVG-profiler

T. Bosma & K. Karagiannis et al. Identification and quantification of defective virus genomes in high throughput sequencing data using DVG-profiler, a novel post-sequence alignment processing algorithm. 2019 May 17, PLOS ONE

DVG-profiler: Annotations

DVG-profiler: Junction types

Deletion

3’ copy-back

5’ copy-back

Insertion

Stable junction calling
Experimental confirmation
Performance assessment
Reproducibility

[Figure: short reads spanning a junction, shown for copy-back and in/dels. Labels: First, Last, q bp, k bp, l bp; m = minimum aligned length, l = read length.]

DI-tector

• Uses artificial split reads and BWA
• Complexity increases with the read length
• Available only as end-to-end
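To illustrate why a split-read approach gets more expensive as reads get longer, the sketch below enumerates every way to cut one read into a first and a last segment. It is an illustration only: the names `split_read` and `count_splits` and the minimum segment length of 15 are assumptions made for this example, not DI-tector's code or documented defaults.

```python
def split_read(read: str, min_segment: int = 15):
    """Yield (first, last) segment pairs for every allowed cut position.

    Hypothetical helper: min_segment is an assumed parameter that keeps
    both segments long enough to be aligned on their own.
    """
    for cut in range(min_segment, len(read) - min_segment + 1):
        yield read[:cut], read[cut:]


def count_splits(read_length: int, min_segment: int = 15) -> int:
    """Number of segment pairs produced for one read of this length."""
    return max(0, read_length - 2 * min_segment + 1)


if __name__ == "__main__":
    # The number of artificial pairs grows linearly with read length,
    # and each pair is two more sequences to align.
    for length in (50, 100, 150, 250):
        print(length, count_splits(length))
```

Under these assumptions every extra base in a read adds one more segment pair, so the alignment workload scales with both the number of reads and their length.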

G. Beauclair et al. DI-tector: defective interfering viral genomes’ detector for next-generation sequencing data. 2018 Oct 24, RNA

Comparison DI-tector and DVG profiler

In silico samples:

Name | Type | Breakpoint | Reinitiation point | Delta | Length (b)
REF | Reference | n/a | n/a | n/a | 15384
DVG1 | 5’ cb | 13363 | 14924 | 1561 | 2483
DVG2 | 5’ cb | 13257 | 14191 | 934 | 3322
DVG3 | 3’ cb | 520 | 2312 | 1792 | 2832
DVG4 | 3’ cb | 5585 | 5686 | 101 | 11271
DVG5 | Deletion | 522 | 5587 | 5065 | 10320
DVG6 | Deletion | 521 | 542 | 21 | 15364
DVG7 | Insertion | 2311 | 2292 | 19 | 15404
DVG8 | Insertion | 5686 | 5586 | 100 | 15485

Sensitivity concentrations:

Dataset | Composition | Av. Depth of Coverage | Prevalence (%)
SED1 | REF/DVG3 | 6500.26/3.53 | 99.95 / 0.05
SED2 | REF/DVG3 | 6500.26/3.53 | 99.46 / 0.054
SED3 | REF/DVG3 | 6500.26/353.31 | 94.85 / 5.15
SED4 | REF/DVG3 | 6500.26/3531.07 | 64.80 / 35.20
SED5 | REF/DVG3 | 6500.26/35310.73 | 15.55 / 84.45

[Figures: Abundance prediction accuracy; Wall clock time; Performance in all 9 DVGs mixture. Axis labels: JS-Divergence, Time (s); datasets SED1 to SED5 and SPD; series DVG-profiler and DI-tector; TP and FP.]
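The in silico table can be checked with a little arithmetic. The sketch below is not taken from DVG-profiler: the length relations in `dvg_length` were inferred from the table's own numbers, and `prevalence` assumes prevalence is each genome's share of the total average depth. Both function names are hypothetical.

```python
REF_LENGTH = 15384  # reference length from the in silico table

# (name, type, breakpoint, reinitiation point, length) copied from the table
DVGS = [
    ("DVG1", "5' cb", 13363, 14924, 2483),
    ("DVG2", "5' cb", 13257, 14191, 3322),
    ("DVG3", "3' cb", 520, 2312, 2832),
    ("DVG4", "3' cb", 5585, 5686, 11271),
    ("DVG5", "Deletion", 522, 5587, 10320),
    ("DVG6", "Deletion", 521, 542, 15364),
    ("DVG7", "Insertion", 2311, 2292, 15404),
    ("DVG8", "Insertion", 5686, 5586, 15485),
]


def dvg_length(kind, breakpoint, reinitiation, ref_length=REF_LENGTH):
    """Genome length implied by a junction (relations inferred from the table)."""
    if kind == "5' cb":
        # two arms, each running from a junction position to the reference end
        return (ref_length - breakpoint + 1) + (ref_length - reinitiation + 1)
    if kind == "3' cb":
        # two arms, each running from the reference start to a junction position
        return breakpoint + reinitiation
    if kind in ("Deletion", "Insertion"):
        # deletion: bases strictly between the two positions are dropped;
        # insertion: the span between the two positions is present twice
        return ref_length + breakpoint - reinitiation + 1
    raise ValueError(f"unknown junction type: {kind}")


def prevalence(depth_ref, depth_dvg):
    """Percent share of each genome, assumed to be its share of total depth."""
    total = depth_ref + depth_dvg
    return 100 * depth_ref / total, 100 * depth_dvg / total


if __name__ == "__main__":
    for name, kind, bp, rp, length in DVGS:
        print(name, kind, dvg_length(kind, bp, rp), "table:", length)
    print(prevalence(6500.26, 353.31))  # compare with the SED3 row
```

With these relations all eight DVG lengths in the table are reproduced, and the depth-based prevalence agrees with the SED1, SED3, SED4 and SED5 rows to within about 0.01 percentage points.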

Tutorial – Reads in HIVE

https://hive.biochemistry.gwu.edu/dna.cgi?cmd=main

Tutorial – Align Reads against reference

Tutorial – From Alignments to DVG profiler

Tutorial – DVG profiler parameters

1) Maximum distance: d bp

2) Maximum overlap: o bp
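How DVG-profiler applies these two limits internally is not stated on the slide. As one plausible reading, the sketch below assumes both are measured in read coordinates between two alignments of the same read: a positive gap is the distance, a negative gap is the overlap. The `Hit` structure and `candidate_pairs` function are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Hit:
    """One local alignment of part of a read (hypothetical structure)."""
    read_start: int  # first aligned read position (0-based, inclusive)
    read_end: int    # one past the last aligned read position
    ref_start: int   # reference position where the alignment starts
    score: int       # alignment score


def candidate_pairs(hits, max_distance, max_overlap):
    """Return hit pairs from one read that could flank a junction."""
    pairs = []
    ordered = sorted(hits, key=lambda h: h.read_start)
    for a, b in combinations(ordered, 2):
        gap = b.read_start - a.read_end  # > 0: unaligned bases, < 0: overlap
        if gap > max_distance or -gap > max_overlap:
            continue
        pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    hits = [
        Hit(0, 60, 500, 60),
        Hit(58, 100, 2300, 42),
        Hit(90, 100, 9000, 10),
    ]
    for a, b in candidate_pairs(hits, max_distance=10, max_overlap=5):
        print(a.ref_start, "->", b.ref_start)
```

Tightening either limit discards more pairs, which trades sensitivity for fewer spurious junction candidates.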

Tutorial – DVG profiler results

[Figure: 2D peak detection of detected junctions. Panels: # of hits; # of hits (grouped). Axes: # of hits against genomic position (1 to 18001); inset of genomic position 11434 to 11440 against genomic position 14304 to 14308.]
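Grouping nearby junction calls into a single peak can be sketched as follows. This is a simple greedy scheme written for illustration, not DVG-profiler's peak detection: the function name, the `radius` parameter, and the example counts are all assumptions (only the coordinate range is borrowed from the figure axes).

```python
from collections import Counter


def group_junctions(junctions, radius=2):
    """Greedy 2D grouping of (breakpoint, reinitiation) calls.

    Returns a list of (peak, grouped_hits), strongest group first.
    """
    counts = Counter(junctions)
    groups = []
    while counts:
        # The most frequent remaining cell becomes the peak; ties go to
        # the smallest coordinates so the result is deterministic.
        peak = min(counts, key=lambda cell: (-counts[cell], cell))
        members = [
            cell for cell in counts
            if abs(cell[0] - peak[0]) <= radius
            and abs(cell[1] - peak[1]) <= radius
        ]
        total = sum(counts.pop(cell) for cell in members)
        groups.append((peak, total))
    groups.sort(key=lambda group: -group[1])
    return groups


if __name__ == "__main__":
    # Made-up calls scattered around one junction, plus a second junction.
    calls = (
        [(11437, 14306)] * 5
        + [(11436, 14306)] * 2
        + [(11438, 14307)]
        + [(520, 2312)] * 3
    )
    print(group_junctions(calls))
```

Calls that differ by a base or two, for example because of alignment ambiguity at the junction, are counted toward the same peak instead of being reported as separate junctions.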

Tutorial – DVG profiler results refined

Identifying junctions is the first step

• Nested events: RT-PCR detected deletions in cpRNAs
• DVG mutations and quasispecies

Quasispecies

4×10^-4 errors / nucleotide / round of replication

Highly diverse populations, mutant clouds, swarms

Quasispecies

Quasispecies is a mathematical model by Manfred Eigen¹

$$w_{ij} = A_i q_{ij} - D_i \delta_{ij}$$

$w_{ij}$: type $i$ sequence from parent sequence $j$
$A_i q_{ij}$: offspring $A_i$ with mutation rate $q_{ij}$
$D_i \delta_{ij}$: death rate $D_i$ of sequence $i$ ($\delta_{ij} = 0 \;\forall\, i \neq j$ and $\delta_{ij} = 1 \;\forall\, i = j$)

Problem definition

Quasispecies Spectrum Reconstruction (QSR) means determining:
• the set of sequence consensuses and
• the relative frequency of each sequence in the sample

HexaHedron algorithm

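[Figure labels: alignment length to window half length; sliding window with start and end marks; overlap between consecutive windows.]

Window profile

Pos | A | C | G | T | Ins | Del
1 | 50 | 1 | 0 | 0 | 0 | 0
2 | 0 | 0 | 65 | 0 | 0 | 1
3 | 0 | 82 | 1 | 0 | 0 | 0
4 | 0 | 67 | 0 | 64 | 0 | 0
5 | 0 | 44 | 0 | 86 | 0 | 0
6 | 1 | 0 | 99 | 0 | 0 | 0
7 | 76 | 0 | 41 | 0 | 0 | 0
8 | 0 | 32 | 41 | 47 | 0 | 1
9 | 104 | 0 | 4 | 0 | 0 | 1
10 | 0 | 11 | 0 | 80 | 1 | 0
11 | 0 | 56 | 0 | 17 | 0 | 0
12 | 64 | 0 | 0 | 0 | 0 | 0
13 | 0 | 1 | 48 | 0 | 0 | 0
14 | 0 | 0 | 0 | 39 | 0 | 0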
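A window profile like this one can be summarized in a few lines. The sketch below is an illustration, not HexaHedron's code: it takes the most frequent base per position as the consensus and flags positions where a second symbol reaches an assumed 5% of the depth. The function names and the threshold are hypothetical.

```python
# position: (A, C, G, T, Ins, Del) counts copied from the window profile
PROFILE = {
    1: (50, 1, 0, 0, 0, 0),
    2: (0, 0, 65, 0, 0, 1),
    3: (0, 82, 1, 0, 0, 0),
    4: (0, 67, 0, 64, 0, 0),
    5: (0, 44, 0, 86, 0, 0),
    6: (1, 0, 99, 0, 0, 0),
    7: (76, 0, 41, 0, 0, 0),
    8: (0, 32, 41, 47, 0, 1),
    9: (104, 0, 4, 0, 0, 1),
    10: (0, 11, 0, 80, 1, 0),
    11: (0, 56, 0, 17, 0, 0),
    12: (64, 0, 0, 0, 0, 0),
    13: (0, 1, 48, 0, 0, 0),
    14: (0, 0, 0, 39, 0, 0),
}
BASES = ("A", "C", "G", "T")


def consensus(profile):
    """Most frequent base at each position (indel columns are ignored)."""
    return "".join(
        BASES[max(range(4), key=lambda i: profile[pos][i])]
        for pos in sorted(profile)
    )


def polymorphic_positions(profile, min_fraction=0.05):
    """Positions where the second most frequent symbol reaches min_fraction."""
    result = []
    for pos in sorted(profile):
        counts = sorted(profile[pos], reverse=True)
        depth = sum(counts)
        if depth and counts[1] / depth >= min_fraction:
            result.append(pos)
    return result


if __name__ == "__main__":
    print(consensus(PROFILE))
    print(polymorphic_positions(PROFILE))
```

Positions flagged this way are where a single consensus hides more than one sub-population, which is why the reads in such a window need to be separated rather than averaged.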

K. Karagiannis et al. Separation and assembly of deep sequencing data into discrete sub-population genomes. 2017 Nov 17, NAR

HexaHedron algorithm

AGCCTGA|CATTAGT  Window1
AGCTCGG|TACCAGT  Window2
AGCCCGA|GATCAGT  Window3
:
Read n+1  CATTAGTAGTCAAC
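In this example the new read starts at the window midpoint, so it can be compared with the second half of each window. The sketch below computes two quantities for each window: the fraction of matching positions over the shared stretch, and a Sørensen–Dice index based on lengths. It is an illustration under assumptions: how HexaHedron combines the two into one score is not reproduced here, and the function names are hypothetical.

```python
WINDOWS = {
    "Window1": "AGCCTGACATTAGT",
    "Window2": "AGCTCGGTACCAGT",
    "Window3": "AGCCCGAGATCAGT",
}
READ = "CATTAGTAGTCAAC"
OFFSET = 7  # the read starts at the window midpoint in this example


def hamming_matches(a, b):
    """Number of equal positions; zip stops at the shorter sequence."""
    return sum(x == y for x, y in zip(a, b))


def dice_index(overlap, read_length, window_length):
    """Sørensen–Dice index 2|R ∩ W| / (|R| + |W|), using lengths as sizes."""
    return 2 * overlap / (read_length + window_length)


def score_windows(read, windows, offset):
    """Map window name -> (match fraction over the overlap, Dice index)."""
    scores = {}
    for name, window in windows.items():
        shared = window[offset:]  # part of the window the read covers
        matches = hamming_matches(read, shared)
        scores[name] = (
            matches / len(shared),
            dice_index(len(shared), len(read), len(window)),
        )
    return scores


if __name__ == "__main__":
    scores = score_windows(READ, WINDOWS, OFFSET)
    best = max(scores, key=lambda name: scores[name][0])
    print(scores)
    print("best match:", best)
```

Here the read agrees with Window1 at all seven shared positions, with Window3 at five, and with Window2 at four, so it would extend Window1.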

Hamming similarity normalized by Sørensen–Dice index: $H(R, W)$ is the Hamming similarity of read $R$ and window $W$ over their overlap $R \cap W$, normalized by the Sørensen–Dice index

$$QS(R, W) = \frac{2\,|R \cap W|}{|R| + |W|}$$

Exceptions:
• During bifurcation
• Paired end read with assigned mate

K. Karagiannis et al. Separation and assembly of deep sequencing data into discrete sub-population genomes. 2017 Nov 17, NAR

HexaHedron algorithm

• Sankey diagram
  + Flow diagram: energy between processes
  – Constant width
  – No axis information

• Nephosome / Graph Genome
  • Width corresponds to the depth of the coverage
  • Trajectories from left to right

K. Karagiannis et al. Separation and assembly of deep sequencing data into discrete sub-population genomes. 2017 Nov 17, NAR

Empirical sample (Mumpsvax)

• DB of genomic sequences of 54 strains of mumps

• Alignment produced 688,000 hits

• Predicted frequencies: 93.18% and 6.82%, consistent with the previous estimate based on quantitative PCR (95% and 5%)

• Consensus sequences of these two sub-strains were identical to those determined by conventional sequencing of plaque-purified clones

Source code

https://github.com/kkaragiannis/hexahedron

https://github.com/kkaragiannis/pan-HIV-ARM-quasispecies-analysis/

Tutorial

HIV Pipeline

• Subtyping

• Refined reference selection

• Global sequences

B. Hora & N. Gulzar et al. Streamlined Subpopulation, Subtype, and Recombination Analysis of HIV-1 Half Genome Sequences Generated by High-Throughput Sequencing. 2020 Sept, mSphere

Global sequence inference

Convert Nephosome to a weighted directed graph $G = (V, E)$ where each contig is a vertex $v \in V$

Run Monte Carlo simulations¹ until user-specified limit
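One simple way to picture these two steps is to sample paths through the contig graph in proportion to edge weights and count how often each full path appears. The sketch below does only that: it is not the probabilistic inference of the cited method or the HIVE implementation, and the toy graph, its weights, and the function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy graph: contig -> list of (next contig, edge weight)
GRAPH = {
    "c1": [("c2a", 93), ("c2b", 7)],
    "c2a": [("c3", 93)],
    "c2b": [("c3", 7)],
    "c3": [],
}


def sample_path(graph, source, rng):
    """Walk from source until a contig with no outgoing edge is reached."""
    path = [source]
    while graph[path[-1]]:
        targets, weights = zip(*graph[path[-1]])
        path.append(rng.choices(targets, weights=weights, k=1)[0])
    return tuple(path)


def estimate_global_sequences(graph, source, n_simulations, seed=0):
    """Relative frequency of each sampled path after n_simulations walks."""
    rng = random.Random(seed)
    counts = Counter(
        sample_path(graph, source, rng) for _ in range(n_simulations)
    )
    return {path: n / n_simulations for path, n in counts.items()}


if __name__ == "__main__":
    # n_simulations plays the role of the user-specified limit.
    for path, freq in estimate_global_sequences(GRAPH, "c1", 1000).items():
        print(" -> ".join(path), freq)
```

Each distinct path corresponds to one candidate global sequence, and its sampled frequency is a rough estimate of that sequence's share of the population.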

¹Topfer, A., et al., Probabilistic inference of viral quasispecies subject to recombination. J Comput Biol, 2013. 20(2): p. 113-23.

Tutorial

HIV Pipeline

B. Hora & N. Gulzar et al. Streamlined Subpopulation, Subtype, and Recombination Analysis of HIV-1 Half Genome Sequences Generated by High-Throughput Sequencing. 2020 Sept, mSphere

QSR Problem redefined

Quasispecies Spectrum Reconstruction (QSR) amended:
• Identify the DVGs
• Reconstruct consensuses of defective and non-defective viral genomes
• Quantitate the relative frequency of each consensus in the sample

Future work

DVG-profiler

DVGs

Acknowledgments

GWU/HIVE Raja Mazumder, PhD Naila Gulzar Charles Hadley King

FDA/OVRR FDA/HIVE Sergey Ivanovsky Christian Sauder, PhD Dipankar Chattopadhyay Kural Kamil Trent Bosma, PhD John Dougherty Alexander Lukyanov Konstantin Chumakov, PhD Majid Laassri, PhD Arya Eskandarian Hasmik Manukyan Marianna Faradzheva Ilya Mazo, PhD Sydney Fenstermaker Luis Quintero-Santana, PhD DUKE/EQUAPOL Feng Gao, MD Tigran Ghazanchyan Krista Smith Bhavna Hora, PhD Anton Golikov Sean Smith, PhD

Resources

Title | Link

Publications
HIVE | https://pubmed.ncbi.nlm.nih.gov/26989153/
DVG-profiler | https://pubmed.ncbi.nlm.nih.gov/31100083/
HIVE-Hexahedron | https://pubmed.ncbi.nlm.nih.gov/28977510/
HexaHedron HIV Quasispecies | https://pubmed.ncbi.nlm.nih.gov/33055255/

Tools
DVG-profiler | https://github.com/kkaragiannis/DVG-profiler
HIVE-HexaHedron | https://github.com/kkaragiannis/hexahedron
GWU-HIVE | https://hive.biochemistry.gwu.edu/home
FDA-HIVE | https://hive.fda.gov/