RADAR-seq: Utilizing PacBio SMRT Sequencing to Detect and Quantitate DNA Damage on a Genome-Wide Scale
Kelly M. Zatopek
Research Scientist
New England Biolabs
Ipswich, MA

Archaeal DNA Replication and DNA Repair
[Figure: replication fork showing the leading strand and the lagging strand. Replication factors: PCNA, polD, GAN, RFC clamp loader, MCM helicase, GINS, DNA primase, RPA. Okazaki fragment maturation: Fen1, polB (gap), DNA ligase. DNA repair: DNA Glycosylase, AP Endonuclease, polB, Fen1, DNA ligase.]
Performed in Thermococcus kodakarensis (Tko)

Experimental Techniques
1. Protein Expression & Purification
2. Enzyme Kinetics
3. CE Assay
4. Gel Filtration
5. Site Directed Mutagenesis
6. Genetics
7. NGS

GOAL: Utilize next generation sequencing to understand DNA replication and repair in Tko

Next Generation Sequencing
PacBio Single Molecule Real-Time
Genomic DNA Library Prep
Isolate genomic DNA
Shear into 100 bp – 20 kb fragments
Ligation of SMRT-bell adapters
Bind primer and polymerase

SMRT-Cell Sequencing Wells
PacBio Instrument    Wells
RSII                 150,000
Sequel               1,000,000
Sequel II            8,000,000

doi:10.1126/science.1079700
Detection of Non-Canonical Bases
SMRT-seq can detect the presence of certain modified nucleotides, such as 6mA and 4mC, through slower polymerase translocation, which shows up as a longer interpulse duration (IPD).

Two Major Hurdles
1. Detection of a wide variety of DNA modifications
2. Stochastic DNA modification detection

[Figure: IPD ratio plotted along both strands of a DNA sequence.]

IPD Ratio = Experimental IPD / Expected IPD
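The IPD ratio defined above can be written as a one-line calculation. The sketch below is only an illustration: the function name and the IPD values are hypothetical and are not taken from the talk or from PacBio software.

```python
def ipd_ratio(experimental_ipd: float, expected_ipd: float) -> float:
    """Return experimental IPD divided by expected IPD for one position."""
    if expected_ipd <= 0:
        raise ValueError("expected IPD must be positive")
    return experimental_ipd / expected_ipd


# Hypothetical IPDs (in seconds) at three positions of one read.
experimental = [0.21, 1.40, 0.19]
expected = [0.20, 0.20, 0.20]

ratios = [ipd_ratio(e, x) for e, x in zip(experimental, expected)]
# The middle position is about 7 times slower than expected,
# the kind of signal a modified base would produce.
```

A ratio close to 1 means the polymerase moved at the expected speed; a ratio well above 1 marks a position where it slowed down.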
doi:10.1126/science.1079700
doi:10.1186/2041-9414-2-10

RADAR-seq
RAre Damage and Repair - Seq
Workflow (× = DNA damage site)
1. Isolation of genomic DNA
2. PacBio library preparation (SMRTbell adapters)
3. Repair enzyme(s): DNA damage site becomes a nick site
4. Bst FL DNA polymerase with dTTP, dGTP, d6mATP, d4mCTP
5. Taq ligase, NAD+
6. PacBio SMRT sequencing
7. RADAR-seq patch detection

Patch of high IPDs produced by RADAR-seq
[Figure: single-molecule patch detection. IPD ratio (0 to 10) on the top and bottom strands across reference positions 551,600 to 551,650, showing a 38 nt patch of high IPD ratios.]

Identifying a patch of high IPDs at the single-molecule level gives us confidence that we are correctly calling a DNA damage site.

Analysis by Vladimir Potapov
doi: 10.1016/j.dnarep.2019.06.007
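The idea of calling a patch of high IPDs on a single molecule can be sketched as a simple run-finding scan. This is a minimal illustration and not the RADAR-seq algorithm itself (that code lives in the project's repository): the function name `find_patches`, the threshold of 3.0, and the minimum of four elevated positions are all assumptions. It relies only on what the workflow states, namely that the nick-translated stretch is synthesized with modified A and C (d6mATP, d4mCTP) and unmodified T and G.

```python
from typing import List, Tuple


def find_patches(bases: str, ipd_ratios: List[float],
                 threshold: float = 3.0,
                 min_hits: int = 4) -> List[Tuple[int, int]]:
    """Return (start, end) read indices of runs of elevated A/C positions.

    Only A and C are inspected, because only those bases are modified
    during nick translation. T and G are skipped and do not break a run.
    threshold and min_hits are illustrative values, not published ones.
    """
    if len(bases) != len(ipd_ratios):
        raise ValueError("bases and ipd_ratios must have the same length")

    patches = []
    start = end = None
    hits = 0
    for i, (base, ratio) in enumerate(zip(bases, ipd_ratios)):
        if base not in "AC":
            continue
        if ratio >= threshold:
            if start is None:
                start = i
            end = i
            hits += 1
        else:
            # A low A/C position closes the current run.
            if start is not None and hits >= min_hits:
                patches.append((start, end))
            start = end = None
            hits = 0
    if start is not None and hits >= min_hits:
        patches.append((start, end))
    return patches


# Hypothetical read: A/C positions 3, 5, 6 and 8 are elevated.
read_bases = "TGACTACGATCAGT"
read_ratios = [1.0, 1.1, 1.2, 5.0, 0.9, 6.5, 4.2,
               1.0, 7.0, 1.1, 1.0, 1.3, 1.0, 0.9]
patches = find_patches(read_bases, read_ratios)  # [(3, 8)]
```

Requiring several elevated positions in a row, rather than one, is what makes a single-molecule call robust against an isolated noisy IPD.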
Detectable DNA Lesions

DNA Lesions                  Repair Enzyme
alkylated purines            Alkyl Adenine DNA Glycosylase [1]
AP site                      Endonuclease IV [2]
CPD                          T4 Pyrimidine DNA Glycosylase [1,2]
damaged pyrimidines          Endonuclease VIII [1,2]
deoxyinosine                 Endonuclease V
deoxyuridine                 Uracil DNA Glycosylase [1,2]
oxidized pyrimidines         NEIL1 [1]
oxidized purines (8oxoG)     Fpg DNA Glycosylase [1,2]
rN embedded in DNA           RNaseH2
T:G mismatch                 Thymine DNA Glycosylase [1]

1. Must be used in conjunction with Endonuclease IV for nick translation
2. Component of PreCR

Two Major Hurdles
1. Detection of a wide variety of DNA modifications ✔
2. Stochastic DNA modification detection

doi: 10.1016/j.dnarep.2019.06.007

RADAR-seq Stochastic Modification Detection
Non-stochastic Versus Stochastic Base Detection
[Figure: RADAR-seq workflow beside a genome map (positions 0 to 2,000,000) marking a 6mA site and stochastic damage sites i to vi. First panel: Average IPD Ratio at each Genomic Position (values shown: 1.7, 1.9, 1.0, 7.0, 1.5). Final panel: IPD Ratio at each Genomic Position in every DNA fragment (values shown: 7.0, 1.0, 7.0).]

Two Major Hurdles
1. Detection of a wide variety of DNA modifications ✔
2. Stochastic DNA modification detection ✔
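The contrast between averaging IPD ratios at a genomic position and reading them in every DNA fragment can be shown with a toy calculation. Everything in this sketch is assumed for illustration: the ten-read coverage, the IPD ratio values, and the cutoff of 3.0 are not data from the talk.

```python
from statistics import mean

THRESHOLD = 3.0  # assumed cutoff for an elevated IPD ratio

# Hypothetical IPD ratios at one genomic position across ten molecules.
# Non-stochastic: the modification sits at this position in every molecule.
non_stochastic = [7.0] * 10
# Stochastic: a damage site is present in only one of the ten molecules.
stochastic = [7.0] + [1.0] * 9


def average_call(ratios):
    """Call the position from the average over all molecules."""
    return mean(ratios) >= THRESHOLD


def single_molecule_calls(ratios):
    """Return the indices of molecules that are elevated on their own."""
    return [i for i, r in enumerate(ratios) if r >= THRESHOLD]


stochastic_mean = mean(stochastic)  # the single event is diluted to 1.6
```

With these numbers, averaging calls the non-stochastic site but misses the stochastic one, while the per-molecule view recovers the single damaged molecule (index 0).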
Analysis by Vladimir Potapov
doi: 10.1016/j.dnarep.2019.06.007

Method Validation
Site-Specific Nicking Enzyme
Genome-wide: Tko & E. coli
[Figure: validation workflow. PacBio library preparation; Nb.BsrDI nicking of the SMRTbell template; Bst FL DNA polymerase with mdNTPs; Taq ligase, NAD+; PacBio SMRT sequencing; RADAR-seq algorithm optimization.]
[Figure: circular genome map (0 to 2000) of WT Tko treated with Nb.BsrDI.]
[Figure: A's and C's at Nb.BsrDI sites. Mean IPD ratio versus relative position (−100 to 500), one panel for A and one for C.]
doi: 10.1016/j.dnarep.2019.06.007

RADAR-seq
RAre Damage and Repair - Seq

Workflow: isolation of genomic DNA; PacBio library preparation; repair enzyme(s); Bst FL DNA polymerase with dTTP, dGTP, d6mATP, d4mCTP; Taq ligase, NAD+; PacBio SMRT sequencing; RADAR-seq patch detection.

Detectable DNA Lesions (lesion: repair enzyme): alkylated purines: Alkyl Adenine DNA Glycosylase [1]; AP site: Endonuclease IV [2]; CPD: T4 Pyrimidine DNA Glycosylase [1,2]; damaged pyrimidines: Endonuclease VIII [1,2]; deoxyinosine: Endonuclease V; deoxyuridine: Uracil DNA Glycosylase [1,2]; oxidized pyrimidines: NEIL1 [1]; oxidized purines (8oxoG): Fpg DNA Glycosylase [1,2]; rN embedded in DNA: RNaseH2; T:G mismatch: Thymine DNA Glycosylase [1].
1. Must be used in conjunction with Endonuclease IV for nick translation
2. Component of PreCR

doi: 10.1016/j.dnarep.2019.06.007

RADAR-seq Ribonucleotide Detection
RNaseH2 Workflow
[Figure: a ribonucleotide (rN) embedded in the SMRTbell template is converted to a nick site by RNaseH2; Bst FL DNA polymerase with mdNTPs; Taq ligase, NAD+; PacBio SMRT sequencing; RADAR-seq patch detection.]

Results
[Figure: ribonucleotides (per million bases) in WT versus ΔRNaseH2 strains. Tko (axis 0 to 500): 50-fold difference. E. coli (axis 0 to 25): 6.5-fold difference.]
[Figure: circular genome maps (0 to 2000) for WT Tko and ΔRNaseH2 Tko.]
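The results above are expressed as ribonucleotides per million bases and as a fold difference between strains. A minimal sketch of that normalization follows; the function names and all counts are hypothetical and were chosen only to make the arithmetic easy to follow, so they do not reproduce the values from the talk.

```python
def per_million_bases(event_count: int, bases_sequenced: int) -> float:
    """Normalize a count of detected events to events per million bases."""
    if bases_sequenced <= 0:
        raise ValueError("bases_sequenced must be positive")
    return event_count * 1_000_000 / bases_sequenced


def fold_difference(rate_a: float, rate_b: float) -> float:
    """Return how many times larger rate_a is than rate_b."""
    if rate_b <= 0:
        raise ValueError("rate_b must be positive")
    return rate_a / rate_b


# Hypothetical counts, not data from the talk.
wt_rate = per_million_bases(event_count=120, bases_sequenced=20_000_000)
mutant_rate = per_million_bases(event_count=900, bases_sequenced=15_000_000)
fold = fold_difference(mutant_rate, wt_rate)
```

Normalizing by the number of bases sequenced is what allows two runs with different sequencing depth to be compared directly.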
doi: 10.1016/j.dnarep.2019.06.007

RADAR-seq Utilization
RADAR-seq enables detection of a wide variety of stochastic DNA modifications on a genome-wide scale

Current RADAR-seq Projects
1. Track lagging strand DNA polymerase synthesis
2. Understand DNA deamination formation and repair in Tko
3. Look for off-target nicking activity of Cas12a
4. Determine formation of DNA secondary structure
5. Determine DNA damaging effects of DNA in space

[Figure: circular genome map (0 to 4400) for E. coli PolI I709A/ΔRNaseH2.]

RADAR-seq Version 2.0
Room for Improvement
1. Single-base resolution: Modified T & G
2. Detecting closely spaced lesions
3. RADAR-seq on large genomes

Modified A’s & C’s: 6mA, 4mC
Modified T’s & G’s: 2-thio-TTP, N2-methyl-dGTP, 5-aminoallyl-dUTP, 6-thio-dGTP
[Figure: mean IPD ratio versus relative position, one panel per modified nucleotide.]

NEB RADAR-Seq, NEB PacBio Group, and Pacific Biosciences
Vladimir Potapov, Alexey Fomenkov, Andrew F. Gardner, Rick Morgan, Jennifer Ong, Brian Anton, Laurence Ettwiller, Yvette Luyten, Tom Evans, Lisa Maduzia, Ece Alpaslan, Michael Weiand, Lixin Chen, Xander Watson
https://github.com/potapovneb/RADAR-seq

RADAR-seq Version 2.0
Modified A’s & C’s: 6mA, 4mC
All Four Modified Bases: 6mA, 4mC, N2-mG, 2T-T
[Figure: mean IPD ratio versus relative position, one panel per modified base in each condition.]

Method Validation
Site-Specific Nicking Enzyme
Single-Molecule Level: Tko & E. coli
[Figure: validation workflow. PacBio library preparation; Nb.BsrDI nicking of the SMRTbell template; Bst FL DNA polymerase with mdNTPs; Taq ligase, NAD+; PacBio SMRT sequencing; RADAR-seq algorithm optimization. Single-molecule sequencing reads are aligned to the reference genome, and patches are scored against the Nb.BsrDI nicking sites.]

tp = true positive
tn = true negative
fp = false positive
fn = false negative

Recall = tp/(tp+fn)
% of single-molecule sequencing reads containing a nick site and a true positive patch identified by RADAR-seq

Precision = tp/(tp+fp)
% of all single-molecule sequencing reads containing a true positive patch identified by RADAR-seq that occurs at a nick site
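The recall and precision definitions above translate directly into code. In this sketch the function names and the counts of true positives, false negatives, and false positives are hypothetical; they are not the counts behind the validation results.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of real nick sites that were called: tp / (tp + fn)."""
    if tp + fn == 0:
        raise ValueError("no nick sites to evaluate")
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of called patches that sit at a nick site: tp / (tp + fp)."""
    if tp + fp == 0:
        raise ValueError("no patches were called")
    return tp / (tp + fp)


# Hypothetical counts for one set of reads.
tp, fn, fp = 90, 10, 5

read_recall = recall(tp, fn)        # 90 of 100 nick sites found
read_precision = precision(tp, fp)  # 90 of 95 called patches are correct
```

True negatives do not enter either formula, which is why both metrics stay informative even when most positions in a read carry no nick.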

Recall, tp/(tp+fn)
Training set    Test set: Replicate 1    Replicate 2    Replicate 3
Replicate 1     90.8 ± 2.1%              90.3 ± 2.1%    92.0 ± 1.5%
Replicate 2     90.4 ± 1.0%              90.4 ± 1.3%    91.6 ± 0.7%
Replicate 3     91.3 ± 0.8%              90.4 ± 0.8%    92.9 ± 0.7%

Precision, tp/(tp+fp)
Training set    Test set: Replicate 1    Replicate 2    Replicate 3
Replicate 1     96.5 ± 0.2%              95.2 ± 0.1%    90.9 ± 0.2%
Replicate 2     96.9 ± 0.3%              96.1 ± 0.4%    91.3 ± 0.2%
Replicate 3     96.5 ± 0.1%              95.3 ± 0.1%    93.8 ± 0.3%

Analysis by Vladimir Potapov