RADAR-seq: Utilizing PacBio SMRT Sequencing to Detect and Quantitate DNA Damage on a Genome-Wide Scale
Kelly M. Zatopek
Research Scientist, New England Biolabs, Ipswich, MA

Archaeal DNA Replication and DNA Repair

Figure: model of the archaeal replication fork. The leading and lagging strands are shown with PCNA, polD, GAN, RFC clamp loader, MCM helicase, GINS, DNA primase, polB, Fen1, DNA ligase and RPA, together with Okazaki fragment maturation.

Performed in Thermococcus kodakarensis (Tko).

Figure (build): the same replisome model with DNA repair enzymes added (DNA glycosylase, AP endonuclease).

Goal: Utilize next-generation sequencing to understand DNA replication and repair in Tko.

Experimental techniques:
1. Protein expression & purification
2. Enzyme kinetics
3. CE assay
4. Gel filtration
5. Site-directed mutagenesis
6. Genetics
7. NGS

Next-Generation Sequencing: PacBio Single Molecule Real-Time (SMRT) Sequencing

Genomic DNA library prep:
1. Isolate genomic DNA
2. Shear into 100 bp – 20 kb fragments
3. Ligate SMRTbell adapters
4. Bind primer and polymerase

Sequencing wells per SMRT Cell, by instrument:
RS II: 150,000 wells
Sequel: 1,000,000 wells
Sequel II: 8,000,000 wells

doi:10.1126/science.1079700

Detection of Non-Canonical Bases

SMRT sequencing can detect certain modified nucleotides, such as 6mA and 4mC, because the polymerase translocates more slowly past them. The delay is measured as the interpulse duration (IPD).

Two major hurdles:
1. Detection of a wide variety of DNA modifications
2. Stochastic DNA modification detection

IPD ratio = experimental IPD / expected IPD
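A minimal sketch of the IPD ratio calculation, assuming several IPD measurements at one template position and an expected IPD for unmodified DNA in the same sequence context. In practice the expected value would come from a kinetic model or an unmodified control; here it is simply passed in. The function name `ipd_ratio` and the example numbers are hypothetical.

```python
from statistics import mean


def ipd_ratio(observed_ipds, expected_ipd):
    """IPD ratio = experimental IPD / expected IPD at one template position.

    observed_ipds: interpulse durations measured at this position
                   (hypothetical input, e.g. one value per read).
    expected_ipd:  IPD expected for an unmodified base in the same
                   sequence context (assumed to be supplied by the caller).
    """
    if not observed_ipds:
        raise ValueError("need at least one observed IPD")
    if expected_ipd <= 0:
        raise ValueError("expected_ipd must be positive")
    return mean(observed_ipds) / expected_ipd


# A ratio near 1.0 means the polymerase moved at the expected speed;
# a ratio well above 1.0 means it paused, as it does at 6mA or 4mC.
unmodified = ipd_ratio([0.24, 0.26, 0.25], expected_ipd=0.25)
modified = ipd_ratio([0.9, 1.1, 1.0], expected_ipd=0.25)
print(round(unmodified, 2), round(modified, 2))
```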

doi:10.1126/science.1079700; doi:10.1186/2041-9414-2-10

RADAR-seq: RAre DAmage and Repair-Seq

Each DNA damage site (×) is converted into a nick site, and each nick into a patch of high IPDs:
1. Isolate genomic DNA.
2. PacBio library preparation: ligate SMRTbell adapters.
3. Treat with repair enzyme(s) that convert each damage site into a nick.
4. Nick translation with Bst FL DNA polymerase in the presence of dTTP, dGTP, d6mATP and d4mCTP, so every A and C in the new patch (38 nt in the example shown) is 6mA or 4mC.
5. Seal the nick with Taq ligase (NAD+).
6. PacBio SMRT sequencing.
7. RADAR-seq patch detection.

Figure: IPD ratio (0–10) on the top and bottom strands at genomic positions 551,600–551,650, showing a patch of high IPDs produced by RADAR-seq in a single molecule.

Identifying a patch of high IPDs at the single-molecule level gives us confidence that we are correctly calling a DNA damage site.

Analysis by Vladimir Potapov. doi: 10.1016/j.dnarep.2019.06.007
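The patch idea can be illustrated with a minimal sketch, assuming per-base IPD ratios are available for one read. Both helpers (`simulate_read`, `detect_patches`) are hypothetical, the data are synthetic, and the window size, hit count and cutoff are arbitrary. This is a simple sliding-window caller, not the published RADAR-seq algorithm (code at https://github.com/potapovneb/RADAR-seq).

```python
import random

# In the patch, Bst FL DNA polymerase has only dTTP, dGTP, d6mATP and d4mCTP,
# so every A and C it inserts is methylated (6mA or 4mC).
MODIFIED_BASES = {"A", "C"}


def simulate_read(seq, patch_start, patch_len=38, background=1.0,
                  patch_level=5.0, jitter=0.3, seed=0):
    """Synthetic per-base IPD ratios for one read (illustrative values only).

    A/C positions inside the patch get a high IPD ratio; all other positions
    stay near background. Uniform jitter keeps the two levels separated.
    """
    rng = random.Random(seed)
    ratios = []
    for i, base in enumerate(seq):
        in_patch = patch_start <= i < patch_start + patch_len
        level = patch_level if (in_patch and base in MODIFIED_BASES) else background
        ratios.append(level + rng.uniform(-jitter, jitter))
    return ratios


def detect_patches(seq, ratios, window=20, min_hits=6, cutoff=2.5):
    """Call patches in one read with a sliding window.

    A window is patch-like when at least `min_hits` A/C positions exceed
    `cutoff`; overlapping patch-like windows are merged, then each merged
    region is trimmed to its first and last supporting position.
    Returns half-open (start, end) intervals in read coordinates.
    """
    hits = [i for i, (b, r) in enumerate(zip(seq, ratios))
            if b in MODIFIED_BASES and r > cutoff]
    merged = []
    for start in range(len(seq) - window + 1):
        n = sum(1 for i in hits if start <= i < start + window)
        if n < min_hits:
            continue
        end = start + window
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], end)  # extend the current region
        else:
            merged.append((start, end))
    patches = []
    for s, e in merged:
        support = [i for i in hits if s <= i < e]
        patches.append((support[0], support[-1] + 1))
    return patches


read = "ACGT" * 50
ratios = simulate_read(read, patch_start=80)
print(detect_patches(read, ratios))
```

Requiring several high-IPD A/C positions inside one window, rather than a single high value, mirrors the point above: a run of elevated IPDs in one molecule is much harder to produce by chance than one slow base.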

Detectable DNA lesions and the repair enzyme that converts each into a nick:

DNA lesion                  Repair enzyme
alkylated purines           Alkyl Adenine DNA Glycosylase (1)
AP site                     Endonuclease IV (2)
CPD                         T4 Pyrimidine DNA Glycosylase (1,2)
damaged pyrimidines         Endonuclease VIII (1,2)
deoxyinosine                Endonuclease V
deoxyuridine                Uracil DNA Glycosylase (1,2)
oxidized pyrimidines        NEIL1 (1)
oxidized purines (8oxoG)    Fpg DNA Glycosylase (1,2)
rN embedded in DNA          RNaseH2
T:G mismatch                Thymine DNA Glycosylase (1)

(1) Must be used in conjunction with Endonuclease IV for nick translation.
(2) Component of PreCR.
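As a small how-to, the lesion table can be turned into a lookup that assembles an enzyme cocktail and adds Endonuclease IV whenever footnote 1 applies. This is a hypothetical helper; the footnote assignments are as read from the lesion table above and should be checked against the original publication.

```python
# Lesion -> (repair enzyme, needs Endonuclease IV per footnote 1),
# as read from the lesion table above.
LESION_TO_ENZYME = {
    "alkylated purines": ("Alkyl Adenine DNA Glycosylase", True),
    "AP site": ("Endonuclease IV", False),
    "CPD": ("T4 Pyrimidine DNA Glycosylase", True),
    "damaged pyrimidines": ("Endonuclease VIII", True),
    "deoxyinosine": ("Endonuclease V", False),
    "deoxyuridine": ("Uracil DNA Glycosylase", True),
    "oxidized pyrimidines": ("NEIL1", True),
    "oxidized purines (8oxoG)": ("Fpg DNA Glycosylase", True),
    "rN embedded in DNA": ("RNaseH2", False),
    "T:G mismatch": ("Thymine DNA Glycosylase", True),
}


def enzyme_cocktail(lesions):
    """Return the sorted list of enzymes needed to nick the given lesion types.

    Endonuclease IV is added automatically whenever a chosen enzyme carries
    footnote 1. Unknown lesion names raise KeyError.
    """
    enzymes = set()
    for lesion in lesions:
        enzyme, needs_endo_iv = LESION_TO_ENZYME[lesion]
        enzymes.add(enzyme)
        if needs_endo_iv:
            enzymes.add("Endonuclease IV")
    return sorted(enzymes)


print(enzyme_cocktail(["deoxyuridine", "rN embedded in DNA"]))
```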

doi: 10.1016/j.dnarep.2019.06.007

RADAR-seq: Stochastic Modification Detection

Non-stochastic versus stochastic base detection

Figure: average IPD ratio at each genomic position for 6mA (x-axis: genome coordinate, 0–2,000,000).


Two major hurdles, both addressed by RADAR-seq:
1. Detection of a wide variety of DNA modifications ✔
2. Stochastic DNA modification detection ✔

Figure: IPD ratio at each genomic position in every DNA fragment (6mA; genome coordinate 0–2,000,000).
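Why per-fragment values matter can be shown with a short worked example, assuming ten DNA fragments cover one genomic position. The IPD ratios (1.0 and 7.0) and the call cutoff of 2.5 are hypothetical. A modification present in every copy keeps the average high, whereas one present in a single fragment is diluted in the average but remains obvious in that fragment.

```python
from statistics import mean


def compare_views(per_fragment_ratios, cutoff=2.5):
    """Contrast two ways of reading IPD ratios at a single genomic position.

    per_fragment_ratios: one IPD ratio per DNA fragment covering the position.
    Returns (average IPD ratio across fragments, indices of fragments whose
    own IPD ratio exceeds `cutoff`). The cutoff value is arbitrary.
    """
    average = mean(per_fragment_ratios)
    flagged = [i for i, r in enumerate(per_fragment_ratios) if r > cutoff]
    return average, flagged


# Non-stochastic: the base is modified in every copy of the genome.
print(compare_views([7.0] * 10))
# Stochastic: only one of ten fragments carries the modification.
print(compare_views([1.0] * 9 + [7.0]))
```

In the stochastic case the average is 1.6, below the hypothetical cutoff, while fragment 9 is still flagged on its own.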

Method Validation: Site-Specific Nicking Enzyme

Genome-wide in Tko & E. coli: A’s and C’s at Nb.BsrDI sites

1. PacBio library preparation (SMRTbell adapters)
2. Nick the DNA at Nb.BsrDI sites (example site: top strand GGCATTGCCAGAAACCTTGC, bottom strand CCGTAACGGTCTTTGGAACG)
3. Nick translation with Bst FL DNA polymerase and modified dNTPs (mdNTPs)
4. Taq ligase (NAD+)
5. PacBio SMRT sequencing
6. RADAR-seq algorithm optimization

Figure: circular map of the WT Tko genome (0–2,000 kb) treated with Nb.BsrDI, and mean IPD ratio for A’s (top) and C’s (bottom) against position relative to the nick site (−100 to 500).
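Validation depends on knowing where Nb.BsrDI sites lie in the reference genome. The sketch below scans both strands for a recognition motif, assuming GCAATG (the BsrDI recognition sequence; confirm against NEB’s Nb.BsrDI documentation). The exact nick offset is not modeled, the genome is treated as linear, and `find_motif_sites` is a hypothetical helper.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_motif_sites(genome, motif="GCAATG"):
    """Return (start, strand) for each motif occurrence on either strand.

    Starts are 0-based coordinates on the given (top) strand; '-' means the
    motif reads 5'->3' on the bottom strand. The genome is treated as linear,
    so sites spanning the origin of a circular chromosome are missed.
    """
    genome = genome.upper()
    rc = reverse_complement(motif)
    k = len(motif)
    sites = []
    for i in range(len(genome) - k + 1):
        word = genome[i:i + k]
        if word == motif:
            sites.append((i, "+"))
        elif word == rc:  # a palindromic motif would only be reported once
            sites.append((i, "-"))
    return sites


# The example top strand from the slide contains CATTGC, i.e. GCAATG on the bottom strand.
print(find_motif_sites("GGCATTGCCAGAAACCTTGC"))
```

With site coordinates in hand, per-read IPD ratios could be re-indexed relative to each site and averaged to produce a mean-IPD-ratio-versus-relative-position profile like the one in the figure.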


RADAR-seq: Ribonucleotide Detection

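RNaseH2 workflow:
1. RNaseH2 nicks the SMRTbell library at each embedded ribonucleotide (rN).
2. Nick translation with Bst FL DNA polymerase and modified dNTPs (mdNTPs).
3. Taq ligase (NAD+) seals the nick.
4. PacBio SMRT sequencing.
5. RADAR-seq patch detection.

Results: ribonucleotides per million bases in WT and ∆RNaseH2 strains. Deleting RNaseH2 increased ribonucleotide incorporation 50-fold in Tko (y-axis 0–500) and 6.5-fold in E. coli (y-axis 0–25).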

Figure: genome-wide maps of ribonucleotide sites in WT and ∆RNaseH2 Tko (circular plots, 0–2,000 kb).
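The ribonucleotide rate and fold change are simple ratios. The sketch below assumes the rate is the number of detected patches divided by the total number of bases sequenced; the published normalization may differ. The counts are hypothetical, chosen only to illustrate the arithmetic.

```python
def per_million_bases(patch_count, bases_sequenced):
    """Detected ribonucleotide patches per million sequenced bases."""
    return patch_count * 1_000_000 / bases_sequenced


def fold_change(mutant_rate, wild_type_rate):
    """Ratio of the mutant rate to the wild-type rate."""
    return mutant_rate / wild_type_rate


# Hypothetical counts, chosen only to illustrate the arithmetic.
wt = per_million_bases(patch_count=12, bases_sequenced=2_000_000)
mutant = per_million_bases(patch_count=600, bases_sequenced=2_000_000)
print(wt, mutant, fold_change(mutant, wt))
```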

RADAR-seq Utilization

RADAR-seq enables detection of a wide variety of stochastic DNA modifications on a genome-wide scale

Figure: genome-wide RADAR-seq map of E. coli PolI I709A/ΔRNaseH2 (circular plot, 0–4,400 kb).

Current RADAR-seq projects:
1. Track lagging-strand DNA polymerase synthesis
2. Understand DNA damage formation and repair in Tko
3. Look for off-target nicking activity of Cas12a
4. Determine formation of DNA secondary structure
5. Determine the damaging effects of space on DNA

RADAR-seq Version 2.0

Room for improvement:
1. Single-base resolution: modified T & G
2. Detecting closely spaced lesions
3. RADAR-seq on large genomes

Modified A’s & C’s versus modified T’s & G’s

Figure: mean IPD ratio against relative position for patches containing 6mA and 4mC (−100 to 500), compared with candidate modified T and G nucleotides: 2-thio-TTP, N2-methyl-dGTP, 5-aminoallyl-dUTP and 6-thio-dGTP (relative position 0–100).

Acknowledgments

NEB RADAR-Seq and NEB PacBio Group: Vladimir Potapov, Alexey Fomenkov, Andrew F. Gardner, Rick Morgan, Jennifer Ong, Brian Anton, Laurence Ettwiller, Yvette Luyten, Tom Evans, Lisa Maduzia
Pacific Biosciences: Ece Alpaslan, Michael Weiand, Lixin Chen, Xander Watson
Code: https://github.com/potapovneb/RADAR-seq

RADAR-seq Version 2.0

Modified A’s & C’s versus all four modified bases

Figure: mean IPD ratio against relative position for patches containing 6mA and 4mC (−100 to 500), and for all four modified bases, 6mA, 4mC, N2-mG and 2T-T (2-thio-T) (relative position 0–100).

Method Validation: Site-Specific Nicking Enzyme

Single-molecule level: Tko & E. coli

Workflow: PacBio library preparation, nicking with Nb.BsrDI, nick translation with Bst FL DNA polymerase and mdNTPs, Taq ligase (NAD+), PacBio SMRT sequencing, RADAR-seq algorithm optimization. Patches called in each single-molecule sequencing read are compared with the Nb.BsrDI nicking sites in the reference genome and classified as:
tp = true positive
tn = true negative
fp = false positive
fn = false negative

Recall = tp/(tp+fn): % of single-molecule sequencing reads containing a nick site in which RADAR-seq identified a true-positive patch.
Precision = tp/(tp+fp): % of patches identified by RADAR-seq, across all single-molecule sequencing reads, that occur at a nick site.

Recall, training set (rows) vs. test set (columns):
                   Test Rep. 1    Test Rep. 2    Test Rep. 3
Training Rep. 1    90.8 ± 2.1%    90.3 ± 2.1%    92.0 ± 1.5%
Training Rep. 2    90.4 ± 1.0%    90.4 ± 1.3%    91.6 ± 0.7%
Training Rep. 3    91.3 ± 0.8%    90.4 ± 0.8%    92.9 ± 0.7%

Precision, training set (rows) vs. test set (columns):
                   Test Rep. 1    Test Rep. 2    Test Rep. 3
Training Rep. 1    96.5 ± 0.2%    95.2 ± 0.1%    90.9 ± 0.2%
Training Rep. 2    96.9 ± 0.3%    96.1 ± 0.4%    91.3 ± 0.2%
Training Rep. 3    96.5 ± 0.1%    95.3 ± 0.1%    93.8 ± 0.3%

Analysis by Vladimir Potapov
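The recall and precision definitions can be made concrete with a minimal scoring sketch. It assumes each read comes with the positions of its known Nb.BsrDI nick sites and a list of called patches. A nick site covered by a patch counts as tp, an uncovered nick site as fn, and a patch covering no nick site as fp. These counting units are an assumption, and the helper names and example reads are hypothetical.

```python
def score_read(nick_sites, patches):
    """Compare one read's patch calls with its known Nb.BsrDI nick sites.

    nick_sites: positions of nick sites in the read (from the reference genome).
    patches:    half-open (start, end) intervals called by patch detection.
    Returns (tp, fp, fn): nick sites covered by a patch, patches covering no
    nick site, and nick sites with no patch.
    """
    def covered(pos):
        return any(s <= pos < e for s, e in patches)

    tp = sum(1 for n in nick_sites if covered(n))
    fn = len(nick_sites) - tp
    fp = sum(1 for s, e in patches if not any(s <= n < e for n in nick_sites))
    return tp, fp, fn


def recall_precision(reads):
    """reads: iterable of (nick_sites, patches) pairs, one per read."""
    tp = fp = fn = 0
    for nick_sites, patches in reads:
        t, f_pos, f_neg = score_read(nick_sites, patches)
        tp, fp, fn = tp + t, fp + f_pos, fn + f_neg
    recall = tp / (tp + fn) if tp + fn else 0.0        # tp/(tp+fn)
    precision = tp / (tp + fp) if tp + fp else 0.0     # tp/(tp+fp)
    return recall, precision


example = [
    ([100], [(90, 130)]),          # nick found: tp
    ([100, 500], [(90, 130)]),     # second nick missed: fn
    ([], [(300, 338)]),            # patch with no nick: fp
]
print(recall_precision(example))
```

Here tp = 2, fn = 1 and fp = 1, so recall and precision are both 2/3.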