
Leveraging DNA Damage Response Pathways to Enhance the Precision of CRISPR-Mediated Genome Editing

Tarun S. Nambiar

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy under the Executive Committee of the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY

2020

© 2020

Tarun Nambiar

All Rights Reserved

Abstract

The ability to efficiently and precisely modify the genome of living cells forms the basis of genetic studies and offers great potential for research and therapy. With its unprecedented ease of use and efficiency, CRISPR-Cas9 has revolutionized genome editing at a stunning pace. Functioning like a pair of molecular scissors, the RNA-guided Cas9 can cleave genomic DNA to generate double-stranded breaks (DSBs). DSBs trigger the DNA damage response (DDR), which sets into motion multiple cellular processes that attempt to repair these lesions. One such cellular pathway, named homology-directed repair (HDR), enables researchers to make desirable changes precisely to genomic DNA sequences. HDR facilitates nearly any genomic DNA change, from the replacement of a single nucleotide to the insertion of several thousands of nucleotides. Thus, its precision and versatility in modifying genomic DNA make HDR a particularly promising repair pathway for genome editing. However, competition with other error-prone DSB repair pathways reduces the efficiency of HDR and results in the generation of an excess of undesirable mutations. In this thesis, I address two challenges associated with CRISPR-Cas9 genome editing: i) the low efficiency of HDR and ii) the large deletion mutations generated upon repair of Cas9-induced DSBs.

The first part of the thesis describes our study to identify genetic factors that stimulate HDR at Cas9-induced DSBs. Towards this goal, we individually express in human cells 204 open reading frames involved in the DDR and determine their impact on CRISPR-mediated HDR. From these studies, we identify RAD18 as a stimulator of CRISPR-mediated HDR. By defining the RAD18 domains required to promote HDR, we derive an enhanced RAD18 variant (e18) that stimulates HDR induced by CRISPR-Cas9 in multiple human cell types, including embryonic stem cells. Mechanistically, e18 suppresses the localization of the HDR-inhibiting factor 53BP1 to DSBs. Through this suppression of 53BP1, e18 promotes HDR at the expense of insertion and deletion mutations introduced by error-prone DSB repair pathways. Altogether, this study identifies e18 as an enhancer of CRISPR-mediated HDR and highlights the promise of engineering DDR factors to augment the efficiency of precision genome editing.

In the second part of the thesis, I describe our study of the genetic mechanisms regulating large deletions that are generated upon repair of Cas9-induced DSBs. We perform a pooled CRISPR screen to interrogate the effect of knocking out 610 DDR genes on the frequency of CRISPR-mediated long deletions. The screen identifies genes that consistently affect the frequency of long deletions when knocked out in different experimental conditions. Thus, our study lays the foundations for uncovering the mechanisms regulating CRISPR-mediated long deletions and has the potential to aid in the development of new strategies to limit their generation.

Table of Contents

LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
ACKNOWLEDGEMENTS

CHAPTER 1: INTRODUCTION TO DNA DOUBLE-STRANDED BREAK MEDIATED GENOME EDITING
1.1 DNA double-stranded break mediated genome editing and the rise of CRISPR-Cas9
1.2 Mechanism of action of CRISPR-Cas9
1.2.1 Target search and recognition
1.2.2 Target cleavage
1.3 Repair of Cas9-induced DNA double-stranded breaks
1.3.1 DSB repair pathways
1.3.2 Regulation of DSB repair pathway choice
1.3.3 Undesirable DNA repair outcomes associated with Cas9-induced DSBs
1.4 Controlling DNA repair pathways to promote precision genome editing
1.5 Applications of CRISPR-mediated homology-directed repair
1.6 Figures

CHAPTER TWO: STIMULATION OF CRISPR-MEDIATED HOMOLOGY-DIRECTED REPAIR BY AN ENGINEERED RAD18 VARIANT
2.1 Introduction
2.2 Results
2.2.1 Identification of RAD18 as a potent enhancer of CRISPR-mediated HDR
2.2.2 RAD18 stimulates Cas9-mediated HDR in a UBZ motif dependent manner
2.2.3 Generation of an enhanced RAD18 (e18) variant for Cas9-mediated HDR stimulation
2.2.4 e18 stimulates HDR by preventing the localization of 53BP1 to DSBs
2.2.5 e18 expression leads to the inhibition of NHEJ
2.2.6 e18 stimulates CRISPR-mediated HDR at endogenous genomic loci in human cells
2.3 Discussion
2.4 Materials and methods
2.5 Figures
2.6 Tables

CHAPTER THREE: DISSECTING THE MOLECULAR MECHANISMS OF LONG DELETIONS INDUCED BY CRISPR-CAS9
3.1 Introduction
3.2 Results
3.2.1 A dual-Cas9 screen to map pathways that regulate CRISPR-mediated long deletions
3.2.2 Quality control assessment of dual-Cas9 screen data
3.2.3 Identification and validation of regulators of Cas9-induced long deletions
3.3 Discussion
3.4 Materials and methods
3.5 Figures
3.6 Tables

CHAPTER FOUR: CONCLUSION AND FUTURE PERSPECTIVES
4.1 Discussion
4.1.1 Improving the specificity and efficiency of e18
4.1.2 Regulation of the 53BP1-H2AK15ub axis for genome editing
4.1.3 Combinatorial modulation of the DDR
4.2 Future Directions

REFERENCES

LIST OF FIGURES

Main Figures

Figure 1.1: Major Cas9-induced DSB repair pathways and selected interventions to promote HDR-based precision genome editing
Figure 1.2: Outcomes of Cas9-induced DSB repair in human cells
Figure 2.1: ORF screen to identify stimulators of CRISPR-mediated HDR
Figure 2.2: Analysis of RAD18 domains required to promote CRISPR-mediated HDR
Figure 2.3: Interplay between e18 (RAD18-∆SAP) and 53BP1 at DSBs
Figure 2.4: Measurement of indel levels at Cas9-induced DSBs after expression of e18
Figure 2.5: Targeting of endogenous loci in human cells using e18
Figure 2.6: Targeting of endogenous loci in hESCs
Figure 3.1: A dual-Cas9 screen identifies regulators of CRISPR-mediated long deletions

Supplementary Figures

Supplementary Figure 2.1: HDR stimulation and PCNA ubiquitination induced by WT and mutant RAD18 variants expressed in HEK293T cells
Supplementary Figure 2.2: CRISPR-mediated HDR levels upon expression of RAD18 UBZ-NLS and RAD18-ΔSAP
Supplementary Figure 2.3: γH2AX foci formation and detection of 53BP1 and RNF8
Supplementary Figure 2.4: Analysis of indel patterns at Cas9-induced DSBs
Supplementary Figure 2.5: RFLP and flow cytometry analysis upon targeting of endogenous loci in cells expressing e18
Supplementary Figure 2.6: Analysis of the effect of e18 expression on cell proliferation
Supplementary Figure 2.7: Analysis of apoptosis in hESCs expressing e18
Supplementary Figure 2.8: Images of uncropped western blots
Supplementary Figure 3.1: Quality control assessment of dual-Cas9 screen
Supplementary Figure 3.2: Correlation comparison between dual-Cas9 screen populations and replicates
Supplementary Figure 3.3: sgRNA behavior in dual-Cas9 screen for identified hits

LIST OF TABLES

Supplementary Table 2.1: CRISPR-mediated HDR frequency values upon expression of 204 individual DDR ORFs
Supplementary Table 2.2: Validation of screen hits
Supplementary Table 2.3: Indel mutation patterns at endogenous loci targeted by Cas9 in e18-expressing cells
Supplementary Table 2.4: HDR frequency values at endogenous loci targeted by Cas9 in e18-expressing cells
Supplementary Table 2.5: List of ORFs used in this study
Supplementary Table 2.6: List of sgRNAs and ssODNs used in this study
Supplementary Table 2.7: List of primers, siRNA and antibodies used in this study
Supplementary Table 3.1: Processed dual-Cas9 screen data
Supplementary Table 3.2: List of sgRNAs used in this study

LIST OF ABBREVIATIONS

ABE        Adenine Base Editor
AID        Activation-Induced Cytidine Deaminase
AML        Adult
BE         Base Editor
BFP        Blue Fluorescent Protein
bp         Base Pairs
BIR        Break Induced Repair
CAR        Chimeric Antigen Receptor
CAS        CRISPR-Associated
Cas9n      CRISPR-Associated 9 Nickase
CRISPR     Clustered Regularly Interspaced Short Palindromic Repeats
crRNA      CRISPR RNA
dCas9      Dead Cas9
DDR        DNA Damage Response
D-loop     Displacement Loop
DMD        Duchenne Muscular Dystrophy
DNA        Deoxyribonucleic Acid
DSB        Double-Stranded DNA Break
dsDNA      Double-Stranded DNA
DUB        Deubiquitinating Enzyme
e18        Enhanced RAD18
FA         Fanconi Anemia
FACS       Fluorescence Activated Cell Sorting
FISH       Fluorescence In-Situ Hybridization
FLAER      Fluorescein-Labeled Proaerolysin
GCR        Gross Chromosomal Rearrangements
GFP        Green Fluorescent Protein
HDR        Homology-Directed Repair
HEK293     Human Embryonic Kidney 293
hESC       Human Embryonic Stem Cells
HR         Homologous Recombination
hSPC       Hematopoietic Stem and Progenitor Cell
hTERT      Human Telomerase Reverse Transcriptase
i53        Inhibitor of 53BP1
indels     Insertions and Deletions
iPSCs      Induced Pluripotent Stem Cells
IR         Ionizing Radiation
IRIF       Ionizing Radiation Induced Foci
kb         Kilobases
mAG        Monomeric Azami-Green
MAGeCK     Model-Based Analysis of Genome-Wide CRISPR-Cas9 Knockout
MIU        Motif Interacting with Ubiquitin
MMEJ       Microhomology-Mediated End Joining
NGS        Next-Generation Sequencing
NHEJ       Non-Homologous End Joining
nt         Nucleotides
ORF        Open Reading Frame
PAM        Protospacer Adjacent Motif
PCR        Polymerase Chain Reaction
PE         Prime Editor
pegRNA     Prime Editing Guide RNA
PRR        Post-Replication Repair
RING       Really Interesting New Gene
RNA        Ribonucleic Acid
RNP        Ribonucleoprotein
RRA        Robust Rank Aggregation
SAP        SAF-A/B, Acinus and PIAS
SDSA       Synthesis-Dependent Strand Annealing
sgRNA      Single Guide RNA
SSA        Single Strand Annealing
ssDNA      Single-Stranded DNA
ssODN      Single-Stranded Oligodeoxynucleotide
SSTR       Single Strand Template Repair
TALEN      Transcription Activator-Like Effector Nucleases
TLS        Trans-Lesion Synthesis
TRAC       T-Cell Receptor α Constant
tracrRNA   Trans-Activating CRISPR RNA
ub         Ubiquitin
UBZ        Ubiquitin-Binding Zinc Finger
UDR        Ubiquitin-Dependent Recruitment
WT         Wild-Type
ZFN        Zinc-Finger Nucleases

ACKNOWLEDGEMENTS

Finishing graduate school in the middle of a global pandemic, isolated from friends and colleagues, lab work, and the chaos of New York City, is an unrepresentative (and somewhat unceremonious) ending to perhaps the most influential period of my life. The last 5 years have been a time filled with joyful experiences, soaring highs and profound lows alike, and overall a period of great personal growth. I was also blessed during this period to have the mentorship, support, and companionship of some incredible individuals.

Joining the Ciccia lab in 2016 is a decision that I look back on without a trace of regret. For this experience, I thank first and foremost my thesis advisor, Alberto Ciccia. While Alberto’s sense of ambition mirrors my own, his trust in me to develop my ideas (though often after sharing hundreds of his own) gave me the confidence to follow through with my experimental plans. Over the years I have learned tremendously from his commitment to perfection and immaculate attention to detail, his alacrity in adapting to new technologies and data, and his quiet leadership by example. I will miss the weekly one-on-one meetings where we would discuss all things science, and I am truly grateful to him for giving me the opportunity to research one of the most revolutionary technologies of our time.

Thank you to Pierre Billon who has been a tremendous mentor and friend to me these last few years. His infectious passion for science, manic work-ethic, and brutal honesty contributed to my development as an independent scientist. He was the driving force behind the conception of the HDR project and I am grateful for his unflinching support through graduate school.

Thank you to Sam Hayward and Jen-Wei Huang for many things but especially the countless conversations relating to science, Netflix documentaries, and Jen-Wei’s astonishing (and mostly unfortunate) adventures in NYC. Our interactions, interspersed between pipetting and lunch (sometimes dinner) breaks, kept life in the lab lively and gave me the energy to plough on after the inevitable failed experiments.

I am indebted to the rest of the wonderful members of the Ciccia lab. The patient instruction in experimental techniques I received from Giuseppe Leuzzi and Angelo Taglialatela was crucial to the HDR project. For this and for always being willing to help (weekends and vacations included) I am grateful. Thank you to Raquel Cuella-Martin, whose organization and efficiency I admire and have benefitted so much from. A big thanks to Sarah Joseph, who has been a great support through the leadership roles we played in the graduate community and during our career search. And finally, thank you to two brilliant undergraduates I have had the privilege to mentor: Anuj Gupta and Andrew Palacios. I am excited about the amazing things they will achieve in their careers.

My progress in graduate school was also the direct result of the support from several amazing professors at Columbia. Thanks to Jean Gautier and Lorraine Symington for their suggestions during qualifying exams and TRAC meetings. A big thanks to Alex Chavez, whom I pestered frequently for advice on the CRISPR-mediated long deletions project. I am also grateful to Richard Baer, Chao Lu and Sam Sternberg for their insightful comments during lab and SLCC meetings. A huge thanks to Dieter Egli and his lab (specifically Giacomo Diedenhofen and Kunheng Cai), who performed critical experiments in human embryonic stem cells for the HDR project.

I would be remiss not to mention my undergraduate mentor Tim Bestor, who remains the most patient and encouraging mentor I have ever had. The two years I spent researching in his lab were the best preparation I could have had for graduate school.

To my friends in graduate school and beyond: there are plenty to list here and I shall eschew mentioning specific names to avoid the catastrophe of missing someone. But you know who you are and I cannot wait to continue celebrating our future achievements together. You make NYC truly magical.

A tremendous thank you to my family. Acha’s recounting of his influential years attaining a Ph.D. in organic chemistry during my childhood made my own decision to commit to graduate school an easy one. I cannot help but admire, in equal measure, his relentless work ethic, his ability to balance fatherhood and a taxing career, his rational approach to problem-solving and his sometimes-infuriating stubbornness. I am proud to follow in his footsteps. I am also blessed to have Amma, who in addition to her role as an amazing mother, has served informally over the years as my nutritionist, life coach, immigration attorney, and (especially since the beginning of the pandemic) as my unsolicited doctor. I would struggle to function in life without her. Undoubtedly, I owe any and every success I have ever had to my parents.

Additionally, I would like to give a shout out to my little brother Anirudh, who despite seeing success in school and at work (which I am proud of), woefully continues to struggle to beat me at squash.

And finally, thank you to Corinna. I couldn’t have done this without you, quite literally, as I have finished writing the entirety of this thesis as a refugee of the pandemic in your Chicago studio. Besides providing me rent-free accommodation these last few months, you have brought me endless joy and supported me through the most testing times. I am grateful for that algorithm that magically steered our lives to intertwine 3 years ago, and I am excited for the future.

CHAPTER 1: INTRODUCTION TO DNA DOUBLE-STRANDED BREAK MEDIATED GENOME EDITING

Contributions: I wrote this section of the thesis with help from Alberto Ciccia, Pierre Billon and Sarah Joseph.


1.1 DNA double-stranded break mediated genome editing and the rise of CRISPR-Cas9

Biologists have long attempted to introduce specific and desirable heritable changes in organisms. Classically, studies relied on the discovery and analysis of mutations that arose spontaneously or were artificially induced in the genome using DNA damaging agents1,2. However, the 1980s would usher in groundbreaking advances for targeted genome engineering in yeast and mice3-6. Such gene targeting depended on the process of homology-directed repair (HDR), wherein homologous DNA molecules could recombine and correctly insert at the desired chromosomal location5,7. However, despite being remarkably precise, HDR suffered from inefficiency in mammalian cells, thereby impeding advances in genome editing.

The study of the repair of damaged DNA would inadvertently reveal the key to improving the efficiency of precision genome engineering: a targeted DNA double-stranded break (DSB) in the chromosomal sequence of interest. Influential experiments in yeast and mammalian cells showed that the rate of HDR could be stimulated upon cutting genomic DNA with highly specific rare-cutting restriction enzymes8-11. Adoption of genome editing by restriction enzymes such as the endonuclease I-SceI remained limited, however, because these enzymes depend on specific sequences to recognize and cleave DNA. Still, these breakthrough experiments would, in a remarkably short and intense period, pave the way for programmable genome editing technologies that enable targeting of almost any region of the genome. These technologies include zinc-finger nucleases (ZFN), transcription activator-like effector nucleases (TALEN), and more recently clustered regularly interspaced short palindromic repeats and CRISPR-associated proteins (CRISPR-Cas).

CRISPR-Cas, the latest addition to the genome editing toolbox, was discovered rather unexpectedly as a cluster of short repeat sequences in a bacterial genome12. These sequences were found to have been derived from DNA fragments of bacteriophages that had previously infected the prokaryote13-15. The sequences, along with the Cas proteins, formed part of a CRISPR system that was shown to be necessary to detect and destroy DNA from similar bacteriophages during subsequent infections16,17. Further studies revealed that of all the Cas proteins, one particular protein was necessary and sufficient for cleavage of nucleic acids: Cas918.

The critical work that arguably sparked the beginning of the CRISPR genome editing revolution was the demonstration that Cas9 can be programmed to target a desired DNA sequence19,20. Cas9’s DNA targeting and cleavage activity is enabled by complexing with two small non-coding RNAs: the crRNA and a trans-activating crRNA (tracrRNA)21. Notably, one study simplified the CRISPR system by fusing the tracrRNA and crRNA to form a single guide RNA (sgRNA)19. The sgRNA and a single Cas9 protein were demonstrated to be sufficient to induce a sequence-specific DSB in vitro19. These studies were immediately followed by seminal publications showing that CRISPR can be adapted for in vivo genome editing in eukaryotic cells22-24. For the first time, researchers had a tool that proved to be not only efficient and specific, but also extremely simple in design and flexible enough to target nearly any locus in the genome.

Since the elucidation of the function of Cas9, several other CRISPR systems have been identified and characterized25. The CRISPR-Cas systems are divided into two classes based on the structural variation of the Cas genes and their organization style26. Specifically, class 1 CRISPR-Cas systems consist of multiprotein effector complexes, whereas class 2 systems comprise only a single effector protein26. Despite these discoveries, the type II CRISPR-Cas9 system, for now, remains at the forefront of gene editing efforts owing to its accuracy, efficiency, simplicity, and multiplexability in gene manipulation24,27. These advantages have enabled CRISPR-Cas9 to revolutionize biotechnology at a breathtaking pace, and it is now poised to stimulate significant developments in agriculture, infectious diseases, and human health.

1.2 Mechanism of action of CRISPR-Cas9

1.2.1 Target search and recognition

The recognition of specific DNA sequences and their cleavage requires that Cas9 form an active DNA surveillance complex by assembling with the sgRNA22,28. The sgRNA consists of a 20-base pair (bp) spacer sequence belonging to the crRNA that confers DNA target specificity, while the tracrRNA is essential for complex assembly29. Within the spacer sequence, the 10-12 nucleotides (nt) at the 3′ end that bind to the DNA, defined as the seed sequence, are particularly important for target specificity22,23,30-33. Mismatches in this seed region abrogate target DNA binding and cleavage, whereas close homology in the seed region often leads to off-target binding events even with many mismatches elsewhere34.
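The distinction between seed and non-seed mismatches can be illustrated with a minimal sketch. The function name `mismatch_profile` and the default seed length of 12 nt are illustrative assumptions and are not taken from any published off-target scoring tool; the sketch only counts mismatches and does not predict binding or cleavage.

```python
def mismatch_profile(spacer, target, seed_len=12):
    """Count mismatches in the PAM-proximal seed and in the rest of the spacer.

    Both sequences are written 5'->3' on the same strand, so the seed
    corresponds to the last `seed_len` positions (the 3' end of the spacer).
    `seed_len` is an assumed parameter within the 10-12 nt range in the text.
    """
    spacer, target = spacer.upper(), target.upper()
    if len(spacer) != len(target):
        raise ValueError("spacer and target must have the same length")
    seed_start = len(spacer) - seed_len
    mismatches = [i for i, (a, b) in enumerate(zip(spacer, target)) if a != b]
    seed = sum(1 for i in mismatches if i >= seed_start)
    return {"seed": seed, "non_seed": len(mismatches) - seed}
```

Under the rule of thumb described above, a candidate site with `seed == 0` but several non-seed mismatches would be flagged as a potential off-target, whereas a site with seed mismatches would be considered less likely to be bound.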

Besides the base pairing of the 20 bp spacer sequence with the target DNA, a specific DNA sequence called the protospacer adjacent motif (PAM) is critical for CRISPR interference. In the endogenous setting, the PAM sequence enables bacteria to discriminate between the genomic bacterial sequence (self) and invaders’ sequences (nonself)35. Located adjacent to the target site, the PAM is essential for target search and Cas9 activity20,22. Mutations in the PAM impede Cas9 activity and allow bacteriophages to evade the host immune response32,36. The native PAM sequence for the commonly used SpCas9 (derived from Streptococcus pyogenes) is 5′-NGG-3′.

Cas9 initiates the target DNA search process by probing for the PAM before interrogating the flanking DNA for potential guide RNA complementarity33. Cas9 rapidly dissociates from DNA that does not contain the appropriate PAM sequence33,37,38. However, once the target site is found, local DNA melting is triggered at the PAM-adjacent nucleation site, followed by RNA strand invasion to form an RNA–DNA hybrid from the PAM-proximal to the PAM-distal end38,39. This results in the activation of the CRISPR-Cas9 complex and induction of a double-strand break33.
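The PAM-first logic of target search can be mimicked in silico. The following is a minimal sketch, assuming the SpCas9 PAM 5′-NGG-3′ and a 20 nt spacer; the function names `find_spcas9_sites` and `find_target` are hypothetical, and the exact-match comparison is a simplification that ignores mismatch tolerance.

```python
import re

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def find_spcas9_sites(dna, spacer_len=20):
    """List every protospacer that is immediately followed by an NGG PAM.

    Returns (strand, start, protospacer, pam) tuples. For the "-" strand,
    `start` is given in coordinates of the reverse-complemented sequence.
    """
    dna = dna.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    sites = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        # The lookahead lets overlapping candidate sites be reported.
        for m in pattern.finditer(seq):
            sites.append((strand, m.start(), m.group(1), m.group(2)))
    return sites

def find_target(dna, spacer):
    """Keep only PAM-adjacent sites whose protospacer matches the spacer."""
    spacer = spacer.upper()
    return [s for s in find_spcas9_sites(dna, len(spacer)) if s[2] == spacer]
```

In this sketch, a sequence that matches the spacer perfectly but lacks an adjacent NGG is never reported, which mirrors the observation that Cas9 dissociates rapidly from DNA without an appropriate PAM.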

Cas9 contains two nuclease domains, a well-conserved RuvC domain and an HNH domain. Each domain cleaves one strand of the target dsDNA 3 bp from the PAM sequence19,20. There is a growing body of evidence that Cas9 generates both blunt and 1 bp staggered ends. In vitro, it has been demonstrated that the HNH domain accurately cuts at the –3 position upstream of the PAM sequence, whereas the RuvC domain flexibly cuts at either –3, –4, –5, or even further upstream40-45. Accurate cleavage by the HNH domain may thus reflect restrictions imposed by target DNA–sgRNA hybrid formation, whereas the lability of the single-stranded nontarget DNA may result in the flexibility of RuvC cleavage. The cleavage mechanism of Cas9 has important implications for the cellular repair of Cas9 breaks and will be discussed in later sections.
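To make the cut-site geometry concrete, the sketch below computes where each strand would be cut relative to the PAM and whether the resulting ends are blunt or staggered. The function name `cas9_cut_ends` and its parameters are illustrative assumptions; offsets are counted in bp upstream of the PAM, as in the text, and the sketch makes no claim about overhang polarity.

```python
def cas9_cut_ends(protospacer, hnh_offset=3, ruvc_offset=3):
    """Describe the DSB ends produced by the HNH and RuvC cuts.

    protospacer : target sequence written 5'->3', immediately upstream of the PAM
    hnh_offset  : bp upstream of the PAM where HNH cuts the target strand
    ruvc_offset : bp upstream of the PAM where RuvC cuts the nontarget strand
    Cut positions are indices into `protospacer`; a cut at index i falls
    between protospacer[i-1] and protospacer[i].
    """
    n = len(protospacer)
    for offset in (hnh_offset, ruvc_offset):
        if not 0 < offset <= n:
            raise ValueError("offsets must lie within the protospacer")
    stagger = abs(ruvc_offset - hnh_offset)
    return {
        "target_strand_cut": n - hnh_offset,
        "nontarget_strand_cut": n - ruvc_offset,
        "stagger_nt": stagger,
        "end_type": "blunt" if stagger == 0 else "staggered",
    }
```

With the default offsets both strands are cut at the –3 position and the ends are blunt; setting `ruvc_offset=4` reproduces the 1 bp staggered configuration described above.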

1.2.2 Target cleavage

In in vitro experiments, Cas9 or the catalytically dead Cas9 (dCas9) binds DNA with great affinity and possesses residence times of over five hours33,46,47. In living cells, the estimates of Cas9 residence on DNA vary widely38,48, and some approximate that it stays bound to its target in mammalian cells for as little as five minutes37. Further, genomic edits have been detected only one hour after electroporation of Cas9 ribonucleoproteins (RNPs), indicating a short residence time in cells and perhaps the promotion of multi-turnover activity of Cas9 on DNA by cellular factors. It is, therefore, possible that cellular DNA transactions such as transcription, replication, and chromatin remodeling could promote Cas9 removal from the genome49.

1.3 Repair of Cas9-induced DNA double-stranded breaks

1.3.1 DSB repair pathways

The generation of a DSB at a defined genomic location by Cas9 is not sufficient to edit the locus. Rather, genome editing is a consequence of the activity of the cellular DNA damage response (DDR). DSBs are particularly harmful lesions as they can lead to chromosomal rearrangements and cell death. Their accurate repair is crucial to promote genome stability and the accurate transmission of genetic information. Living systems have therefore developed a complex and intricately intertwined set of mechanisms to repair DSBs. The ultimate goal of genome editing efforts is to regulate these pathways and navigate DNA repair towards the desired repair outcomes and away from undesirable byproducts. The achievement of this goal will be made through advances in our understanding of the interplay between DNA repair processes and our ability to control them.

The major pathways in the DDR have been uncovered by decades of study50. In the context of genome editing, the cellular response to Cas9-induced DSBs can be broadly grouped into three major categories. Processes including non-homologous end-joining (NHEJ) and microhomology-mediated end-joining (MMEJ) result from the direct ligation of the two DNA ends. These pathways have mostly been exploited to promote gene disruption as they have the potential of introducing insertions and deletions (indels) or other DNA rearrangements at the site of a DSB. Alternatively, specific DNA sequences can be incorporated precisely at a desired locus by supplying an exogenous DNA template encoding the desired modification flanked by DNA sequences homologous to the region upstream and downstream of the cleavage site. Homology-directed repair (HDR) therefore enables the precise incorporation of desirable changes to the genome. The molecular mechanisms underlying these three major pathways in human cells (Fig. 1.1a-d) are described in the following sections.
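The structure of an HDR donor template described above (a desired modification flanked by homology arms) can be sketched in a few lines. The function names `build_hdr_donor` and `apply_hdr` and the default arm length are hypothetical choices for illustration; real donor design also has to consider arm length optimization, strand choice, and blocking mutations that prevent re-cutting, none of which is modeled here.

```python
def build_hdr_donor(genome, cut, insert, arm_len=40):
    """Assemble a donor: left homology arm + desired insert + right homology arm.

    `cut` is the index of the Cas9 break (between genome[cut-1] and genome[cut]).
    `arm_len` is an assumed arm length, not a recommended value.
    """
    if cut - arm_len < 0 or cut + arm_len > len(genome):
        raise ValueError("homology arms extend beyond the provided sequence")
    left_arm = genome[cut - arm_len:cut]
    right_arm = genome[cut:cut + arm_len]
    return left_arm + insert + right_arm

def apply_hdr(genome, cut, insert):
    """Idealized HDR outcome: the insert is placed precisely at the break."""
    return genome[:cut] + insert + genome[cut:]
```

In this idealized picture the donor sequence is found verbatim in the edited locus, which is what distinguishes precise HDR from the indel outcomes of end-joining pathways.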

Repair by non-homologous end-joining

NHEJ is the predominant form of DSB repair in human cells and is active during all phases of the cell cycle51. NHEJ involves the direct ligation of the two DNA ends, which may follow end trimming52,53. Despite the mutagenicity associated with this DSB end trimming, the rapid kinetics of NHEJ is important for the protection of genomic integrity54-57. After a DSB has formed, the ends are quickly bound and protected by the Ku70–Ku80 complex (Ku), which recruits and activates the DNA-dependent protein kinase catalytic subunit (DNA-PKcs), whose activity is essential for the recruitment of factors that process non-blunt ends57-60. Protection of DNA ends by Ku inhibits end-resection, a process involving the nuclease-dependent degradation of the exposed ends of the DNA. End-resection is a major process required to initiate repair by MMEJ and HDR61-63. Furthermore, Ku recruits XLF-XRCC4, which interacts with and stabilizes the DNA ligase LIG4, which ligates the blunt DNA ends64-66. When the DSB ends are not directly ligatable by XRCC4-LIG4 (for example due to overhangs of a few nucleotides or hairpin-capped ends), nucleases such as ARTEMIS can cleave at single-stranded DNA (ssDNA)–double-stranded DNA (dsDNA) boundaries to generate compatible blunt ends67. Further, polymerases, like Polμ and Polλ, are recruited to the DNA end by the Ku–DNA complex to fill in staggered sequences by template-independent and template-dependent polymerase activity (fill-in synthesis), respectively68. Due to its rapid kinetics and efficiency in all phases of the cell cycle, NHEJ is the dominant mechanism by which Cas9-induced DSBs in human cells are repaired.

Repair by microhomology-mediated end-joining

MMEJ was originally discovered in NHEJ-deficient cells as a backup pathway to repair DSBs69,70. Similar to NHEJ, MMEJ joins the broken DSB ends without templated synthesis from an exogenous donor; however, unlike NHEJ, MMEJ requires initial DSB end-resection71 and functions independently of Ku and LIG4. MMEJ is microhomology dependent and results in the deletion of DNA sequences located between the microhomologies72,73.

MMEJ is initiated by short-distance resection of DSBs that reveals homologies71. Initial resection is performed by the MRE11–RAD50–NBS1 (MRN) complex, which binds adjacent to the DSB and is activated in a cell cycle-dependent manner by its co-factor CtIP71,74-76. CtIP binding stimulates endonucleolytic nicking upstream of the break end on the 5′-terminated dsDNA strand by MRN77. The 3′-to-5′ exonuclease activity of MRE11 then resects from the nick back towards the break in a CtIP-dependent manner, generating the short 3′ overhangs necessary for initiating MMEJ71,74,75,78-81. Alternatively, MRN-CtIP mediated double-stranded endonucleolytic cleavage has also been recently reported to promote loss of Ku and DNA-PKcs from DNA82. End processing exposes microhomologies (> 2 nt) internal to the broken ends72,83-86 that are subsequently annealed through the activity of PARP1 and DNA polymerase θ (Polθ/POLQ)87,88. POLQ catalyzes extension of the minimally paired 3′ single-stranded DNA ends and the resulting non-complementary 3′ flaps are then removed by the ERCC1-XPF endonuclease. The single-strand gaps flanking the annealed microhomology are then filled by low-fidelity polymerases that may involve POLQ89,90. Finally, LIG1 and LIG3 ligate the remaining nicks to complete the repair91,92.
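Because MMEJ deletes the sequence between two annealed microhomologies while retaining one copy of the repeat, its candidate outcomes can be enumerated directly from the sequence around a break. The sketch below is a simplified illustration; the function name `mmej_deletions`, the minimum microhomology length of 2 nt, and the 30 nt search window are assumptions chosen for the example, and no attempt is made to rank outcomes by likelihood.

```python
def mmej_deletions(seq, cut, min_mh=2, max_span=30):
    """Enumerate deletions flanked by microhomologies that straddle a break.

    seq  : DNA sequence around the break, written 5'->3'
    cut  : break index (between seq[cut-1] and seq[cut])
    Returns sorted (microhomology, deleted_length, repaired_sequence) tuples.
    Nested microhomologies (e.g. "AG" inside "AGC") are reported separately
    even when they yield the same repaired sequence.
    """
    seq = seq.upper()
    lo = max(0, cut - max_span)
    hi = min(len(seq), cut + max_span)
    outcomes = set()
    for i in range(lo, cut):                      # start of the left copy
        for k in range(min_mh, cut - i + 1):      # left copy must end at or before the cut
            mh = seq[i:i + k]
            j = seq.find(mh, cut, hi)             # right copy must start at or after the cut
            while j != -1:
                # Keep the left copy, drop everything up to the end of the right copy.
                repaired = seq[:i + k] + seq[j + k:]
                outcomes.add((mh, j - i, repaired))
                j = seq.find(mh, j + 1, hi)
    return sorted(outcomes)
```

For example, with the sequence TTAGC|GGAGCAA (break marked by the bar), the repeat AGC on both sides of the break yields the single repaired product TTAGCAA with a 5 nt deletion.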

Due to the removal of the heterologous DNA flaps, MMEJ is innately mutagenic and can be harnessed for genome engineering to correct genetic disorders involving repeat expansion93. Such studies have enabled precise control of deletion outcomes without the need for exogenous DNA templates93,94. Additionally, the regulation of MMEJ enables HDR-free insertion by utilizing the microhomologies between an exogenous DNA donor and the genomic target to drive genomic integration95.

Homology-directed repair

HDR, like MMEJ, is dependent on end-resection; unlike MMEJ, however, it is characterized by more extensive end-resection. In genome editing experiments, HDR enables precise repair using exogenously provided DNA donor templates96,97. Although HDR can accurately repair damaged DNA, uncontrolled HDR leads to hyper-recombination and genomic instability, including potential loss of heterozygosity98. Therefore, end-resection is generally only initiated when an appropriate repair template is available, such as a sister chromatid in the S and G2 phases of the cell cycle99.

HDR is characterized by extended end-resection on the order of 1 kilobase or more. This is achieved by extending the short end-resection initiated by MRN-CtIP over a long distance by multiple nucleases and helicases such as DNA2, BLM and EXO180,100,101. EXO1 and DNA2-BLM extensively process the DNA in the 5′-to-3′ direction, thereby exposing long homologies between the resected DSB ends and the repair template. The emergent ssDNA is rapidly coated with the abundant RPA complex, which includes RPA1, RPA2 and RPA3 subunits. Subsequent RPA-ssDNA processing depends on the resolution by the specific HDR pathway102.

In the context of genome editing, HDR can repair DSBs using different forms of exogenous DNA templates that include double-stranded DNA (dsDNA) or single-stranded oligodeoxynucleotides (ssODN). The mechanisms utilizing dsDNA and ssODN as a template for repair are distinct. HDR using dsDNA donor templates is mediated by the canonical homologous recombination (HR) pathway. In the absence of an exogenously provided dsDNA template, this pathway predominantly utilizes the sister chromatid as a template for repair. ssDNA bound by

RPA cannot pair with other ssDNA, therefore HR requires the removal of RPA by BRCA2. BRCA2 which interacts with ssDNA, RAD51 monomers, and with BRCA1–BARD1 through PALB2103,104, exchanges RPA on the resected ends with RAD51. Loading of RAD51 onto ssDNA results in the formation of the presynaptic nucleoprotein filament105. This RAD51-DNA nucleoprotein filament, then conducts the homology search for an appropriate repair template, including exogenously provided dsDNA106. Strand invasion of the DSB end into the repair template forms a triple-stranded displacement loop (D-loop) structure, which is followed by extension of the

DSB-end from the template DNA by DNA polymerase δ (Polδ)90. This process is limited by motor proteins, including FANCM, BLM and RTEL1 which can disassemble D-loops107-111. HR is


characterized by the capture of the second DSB end resulting in the formation of a double

Holliday junction (dHJ) structure whose further processing results in either crossover or non-

crossover repair products (the latter outcome is also called gene conversion).

In addition to dsDNA templates, ssODNs have been used to stimulate HDR by a yet to be

fully dissected mechanism called single-stranded templated repair (SSTR). SSTR-based editing is

frequently applied because of its higher frequency relative to dsDNA and the ease of preparing

or purchasing the donor47,112-114. SSTR relies on annealing of the single-stranded overhangs at

the DSB with the ssODN and extension off the donor template spanning the break site.

Subsequent synthesis extends the DSB end unidirectionally using the ssODN. Following dissociation of the donor template, the DSB ends are ligated together115,116,117.

SSTR functions independently of RAD51 and BRCA2118,119. However, SSTR is still dependent on

the RAD51 paralogs, RAD51C and XRCC3, thereby suggesting the involvement of strand invasion

in this pathway119. Further, SSTR is partially dependent on RAD52, and overexpression of RAD52

or RAD52-Cas9 fusions increases templated repair using ssODNs115,118-121. Recent work has also

implicated the Fanconi anemia (FA) pathway in SSTR, as well as HR, by a mechanism

that remains unclear119. It also remains to be determined whether ssODNs are directly

incorporated into the genome. Thus, further research needs to be done to fully uncover the

mechanism and cellular rationale behind SSTR117,119.

Other potential pathways in Cas9-induced DSB repair

Single-strand annealing (SSA), synthesis-dependent strand annealing (SDSA), and break-induced replication (BIR) are other HDR pathways.


SSA is similar to MMEJ in requiring annealable homologies. Unlike MMEJ, however, SSA

utilizes longer stretches of homology, spanning several hundred nucleotides in mammalian cells122-124. SSA is mediated by RAD52, which binds the resected RPA-coated ssDNA ends and

anneals exposed single-stranded homologies125-129. Similar to MMEJ, SSA annealing also generates heterologous DNA flaps that are removed by the ERCC1-XPF nuclease complex130.

The resulting DNA gaps are filled and ligated together by currently unknown enzymes90,131.

Thus, SSA results in large deletions and this feature has been applied in genome editing

experiments in the removal of inserted selection cassettes that contain flanking homologies132.

SDSA involves D-loop migration but forms only a single HJ structure as the newly synthesized DNA is dissociated from the invaded exogenous template and anneals to the other

DSB end131. SDSA is, therefore, a non-crossover pathway and is predominantly used for somatic

HR (unlike the canonical HR pathway that is prominent in meiotic recombination)133. SSTR

shares some resemblance to SDSA, wherein the ssODN donor first anneals to a target ssDNA

overhang96,119.

BIR is likely a back-up DSB repair mechanism when the second DSB end cannot be located or is nonexistent. In such a scenario, extension by error-prone polymerases such as POLD3 proceeds to the end of the chromosome131. Consequently, such repair is associated with

increased mutations along the entire replicated tract131. The extent to which BIR, which is

predominantly used to repair one-ended DSBs potentially during mitotic DNA synthesis and

alternative lengthening of telomeres134,135, is also used for Cas9-induced DSBs is unclear.


1.3.2 Regulation of DSB repair pathway choice

Given the array of available DSB repair pathways, the cell is faced with the task of

choosing the most appropriate pathway for the repair of the break. In principle, all DSB repair pathways compete for access to free DNA ends. NHEJ is dominant in the repair of Cas9-induced

DSBs in all phases of the cell cycle in human cells. In the end-resection permissive phases of the

cell cycle and upon the availability of donor DNA templates, SSTR/HR is a competing pathway to

NHEJ. Given the direct ligation of the ends by NHEJ, Cas9 might repeatedly bind and cut the target DNA until the generation of mutations by error-prone pathways that interfere with Cas9

binding. Error-prone pathways such as MMEJ and SSA may thus function more

opportunistically, by scavenging for the products of aborted NHEJ or unresolved donor-mediated HDR. Thus, cells are presented with various points at which repair intermediates are vulnerable to hijacking by error-prone repair pathways. The major decision nodes that influence

DNA repair choice (Fig. 1.1) are discussed below.

Ku is an early DSB responder

The efficiency of mammalian NHEJ is mediated by Ku, which responds

rapidly at DNA ends surrounding DSBs81. At a molecular level, the override of NHEJ requires

displacement of the Ku70-Ku80 complex from the DNA end as Ku-bound DSB ends are resistant

to EXO1 and DNA2 processing61,136,137. One such mechanism is the targeted degradation of Ku

by specific E3 ubiquitin ligases138. Alternatively, the process of end-resection can promote the displacement of Ku. By this model, short-range end-resection through the MRN-CtIP

complex is responsible for Ku removal from the ends137,139, thereby committing the cell towards end-resection dependent repair pathways.

End-resection regulation by the nucleosome barrier

Genomic DNA is wrapped around the histone octamer to form a series of nucleosomes.

Nucleosomes are refractory to Cas9 cleavage due to steric hindrance in vitro140-143. Experiments in cells have demonstrated that Cas9 cleavage efficiency correlates either with general chromatin accessibility or transcription of target sequences141,144-147. Recently, it has been

observed that repair of Cas9-induced DSBs by NHEJ is more efficient in euchromatin contexts,

while MMEJ and SSTR are more active in specific heterochromatin contexts148. Thus, besides

modulating the cleavage efficiency of Cas9, chromatin contexts can influence Cas9-induced DSB

pathway balance45,148,149.

In agreement with the role of chromatin context in influencing DSB repair, it has been

demonstrated in vitro that the end-resection machinery is altered by chromatin accessibility150.

It has been observed in vitro that BLM can move nucleosomes along the DNA if RPA is added to the assay, promoting DNA resection by EXO1 and DNA2151. On the other hand, interference with

BLM activity is promoted by the phosphorylation of RPA, a process that is critical to limit end-resection at nucleosomes152. Higher mobility and instability of the histone octamer in the

nucleosome are promoted by the histone variant H2A.Z, thereby enabling EXO1 resection of

nucleosome-bound DNA150. Conversely, it has also been shown that the H2A.Z and H3.3 variants

facilitate the loading of the NHEJ factors Ku and XRCC4 onto DSBs, thus limiting end-resection

initiation153-155. Such contrasting observations on the influence of chromatin on the accessibility

of repair factors paint a complex picture of the regulation of DSB repair pathway choice by the

chromatin. Nevertheless, in vivo results currently support a fundamental role of chromatin


remodelers to mobilize and/or dissociate nucleosomes within 1-2 kilobases (kb) of a DSB, creating

the entry-space for repair factors156-158.

Histone code regulation and the 53BP1 axis

The DDR kinases ATM, ATR and DNA-PKcs phosphorylate several substrates in response

to the presence of DSBs159-163. An important example is their phosphorylation of Ser139 of the histone variant H2AX, thereby forming γH2AX, which sets in motion a ubiquitination cascade.

γH2AX recruits the adaptor MDC1 to form a specialized chromatin structure that can extend hundreds of kilobases away from the DSB164. MDC1 recruits the E3 ubiquitin ligase RNF8, which,

along with a second E3 ubiquitin ligase, RNF168, catalyzes Lys63 (K63)-linked polyubiquitylation of

histone H2A165,166. The ubiquitin scaffolds promote the recruitment of several complexes within

the vicinity of the DSB that function predominantly to limit end-resection. For example, a

complex of BRCA1 and the K63-linked polyubiquitin binding protein RAP80 further modulates

the ubiquitylation of chromatin near the DSB to antagonize end-resection167-170. In parallel,

TP53-binding protein 1 (53BP1) is recruited to chromatin by binding histone H4 dimethylated at

Lys20 (H4K20me2) and histone H2A monoubiquitylated at Lys15 (H2AK15ub)171.

53BP1 has been extensively characterized as an antagonist of BRCA1. 53BP1 loss largely

restores HR in BRCA1-deficient mice, resulting in the rescue of the embryonic lethality and the

suppression of cancer predisposition in pre-clinical mouse tumor models172-174. The dramatic

rescue of HR in BRCA1-deficient mice upon 53BP1 loss was attributed to the ability of 53BP1 to block end-resection173. This is achieved through its effectors RIF1 and PTIP175-183. Recent studies have

identified shieldin, a complex consisting of SHLD1, SHLD2, SHLD3, and REV7, as an essential

mediator of 53BP1 repair functions184-188. 53BP1/RIF1/shieldin inhibition of end-resection could


be achieved by multiple mechanisms184-189. In one model, shieldin may protect ssDNA from end-resection, in agreement with the observation that SHLD2 binds ssDNA184,188,190,191. Additionally, the shieldin complex may recruit the Polα/primase complex to counteract end-resection by filling in ssDNA192-194. Besides the RIF1/shieldin axis, 53BP1 interacts with PTIP to additionally

inhibit HDR post-end-resection by blocking PALB2195. Together, these data highlight the role of

53BP1 in antagonizing BRCA1 promotion of HDR both before and after end-resection.

BRCA1 promotion of HDR is thus driven in part by its ability to override the function of

53BP1. This is achieved by several mechanisms. For example, BRCA1 loading at DSBs is

promoted by TIP60-mediated acetylation of histone residues close to H4K20, which inhibits

53BP1 localization196,197. BRCA1–BARD1 can also ubiquitylate histone H2A at Lys27, thereby

recruiting the chromatin remodeler SMARCAD1 and facilitating 53BP1 repositioning away from

the DSB198. Furthermore, BRCA1-BARD1 has also been demonstrated to inhibit 53BP1

recruitment to post-replicative chromatin by recognition of H4K20me0 on new histones199.

Finally, BRCA1-BARD1 could potentially override the inhibition of RAD51 loading by

53BP1/RIF1/shieldin, by promoting homologous DNA pairing by directly interacting with

RAD51103.

Cell cycle dependence

Given its importance in DSB repair pathway choice, end-resection is highly regulated

during the different stages of the cell cycle. For example, in the G1 phase, end-resection is

suppressed by DNA helicase HELB, which is inactivated as cells enter the S phase200.

Additionally, the assembly of the BRCA1–PALB2–BRCA2–RAD51 recombinase complex is suppressed in the G1 phase by proteasome-mediated degradation of PALB2, following its


ubiquitylation by the E3 ubiquitin ligase CUL3–RBX1 and the adaptor protein KEAP1201. This

activity of KEAP1 is reversed by USP11 in the S phase.

Activation of end-resection and HDR is dependent on cyclin-dependent kinase (CDK)

activity, which increases as cells enter the S phase202-207. In this regard, CDK-dependent

phosphorylation of CtIP at Thr847 is essential for efficient activation of the nuclease

MRE1174,204,208. CDK activity also promotes end-resection through the phosphorylation of EXO1203

and NBS199,209-211. Cell cycle status is also communicated to the DSB repair machinery by the upregulation of HDR genes as cells transition from the G1 into the S phase. Chromatin features in the S phase also promote the HDR machinery. For example, the heterodimer TONSL–MMS22L supports RAD51 loading by binding to H4K20me0 on post-replicative chromatin212-216.

Finally, entry into the M phase presents a unique challenge on account of chromatin

condensation and the continuation of DNA replication even in the presence of the DSB217. The

M phase is characterized by an attenuation of the downstream effectors of DDR. In particular,

the activity of RNF8 and RNF168 or accumulation of BRCA1 or 53BP1 on chromatin is inhibited

by mitotic kinases218-220. Such inhibition of repair factors in mitosis prevents chromosome

rearrangement and telomere fusions, which form due to the erroneous activation of NHEJ.

Interestingly, upstream events in DSB repair such as the recruitment of Ku and MRN,

phosphorylation of H2AX, recruitment of MDC1, and initiation of end-resection by CtIP

continue through the M phase despite no evidence of downstream NHEJ or HR66,221-223. These upstream factors might therefore function to mark and tether DSB ends, thereby enabling future repair upon exiting mitosis224.

Chromosomal mobility and pathway choice


HDR necessitates pairing between the Cas9 cleaved DNA and the repair template. The

proximity of the DSB with the repair template and the mobility of the DNA ends influence the

efficiency of pairing with the repair template225. In an endogenous context, the proximity of the

repair template is ensured by the tethering of sister chromatids by cohesion226,227. Additionally,

DSBs induce chromatin movement as evidenced by the clustering of γH2AX foci after DSB induction228. While limited mobility is associated with NHEJ228-230, the clustering of RAD51 foci

upon DSB induction indicates the dependence on the mobility of DNA ends by the HDR

pathway231,232. Furthermore, it was observed that DSBs in transcribed genes coalesce in the G1

phase until their repair by HDR in the late phases of the cell cycle233. Recent work has revealed

the mechanisms regulating HDR mediated DSB clustering by demonstrating the role of actin

polymerization in promoting end-resection234. Collectively, these studies demonstrate the

contribution of regulated mobility of DNA ends to DSB repair pathway choice.

1.3.3 Undesirable DNA repair outcomes associated with Cas9-induced DSBs

Despite the precision of HDR, its dependence on DSBs means that it is accompanied

by undesired mutational outcomes (Fig. 1.2). These include the formation of indels235,236,

inter-homolog recombination237,238, gross chromosomal rearrangements239,240 and p53

activation241,242. The nature of these undesirable mutational and cellular-response outcomes

associated with Cas9-induced DSBs is discussed below.

Insertions and deletions

The introduction of indels by Cas9-induced DSBs in a gene can cause shifts in the reading

frame that abrogate the expression of the gene. Due to the low frequency of HDR and the high

frequency of end-joining processes, indels are frequently generated concurrently with the desirable edits obtained through HDR. The generation of indels was typically considered stochastic and heterogeneous. However, more recent studies have challenged this notion by demonstrating that the indel distribution at Cas9-targeted sites is reproducible and dependent on the targeted sequence41,45,94,243,244. These studies demonstrate that NHEJ-mediated direct ligation of the predominantly blunt-ended DNA breaks induced by Cas9 is mostly error-free; however, the reconstitution of the original sequence promotes repeated cleavage by Cas9 until mutations are introduced by error-prone repair pathways. In this regard, the majority of deletions are due to MMEJ, with the frequency of such mutations increasing at sequences with longer microhomologies, higher GC content, and decreasing distance between the microhomologous sequences. Concurrent analysis of the insertion pattern at Cas9-induced DSBs revealed that the majority of such mutations are single-nucleotide insertions, which can be attributed to Cas9 producing staggered-ended DSBs with one-nucleotide overhangs, in addition to blunt-ended DSBs. These studies further highlight that the spectrum of mutations associated with Cas9-induced DSBs varies depending on the targeted sequence as well as the cell line. Thus, the high frequency and diversity of indels make them a major barrier to DSB-based precision genome engineering. Limiting the frequency of indels has therefore emerged as an attractive strategy to promote HDR-based precise repair at Cas9-induced DSBs.
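As a rough illustration of the microhomology-driven deletions discussed in this section, the following Python sketch enumerates candidate MMEJ products at a single blunt cut and flags those that shift the reading frame. It is a simplified toy model and not a method taken from the cited studies: the function name predict_mmej_deletions, the 2-6 nt microhomology range, and the ranking (longer homology first, then shorter deletion) are assumptions made for illustration, and GC content and cell-line effects are ignored.

```python
from typing import List, NamedTuple


class MicrohomologyDeletion(NamedTuple):
    homology: str       # microhomology shared by both sides of the cut
    deletion_size: int  # number of base pairs removed
    frameshift: bool    # True if the deletion size is not a multiple of 3
    product: str        # sequence after the deletion


def predict_mmej_deletions(seq: str, cut: int, min_len: int = 2,
                           max_len: int = 6) -> List[MicrohomologyDeletion]:
    """Enumerate deletions that pair a microhomology left of the cut with an
    identical copy right of the cut, keeping a single copy in the product.
    `cut` is the index of the first base to the right of a blunt break.
    Hypothetical helper; the length limits are illustrative assumptions."""
    seq = seq.upper()
    found = {}
    # Visit long homologies first so that each product is recorded with the
    # longest microhomology that explains it.
    for length in range(max_len, min_len - 1, -1):
        for i in range(0, cut - length + 1):              # left copy ends at or before the cut
            left = seq[i:i + length]
            for j in range(cut, len(seq) - length + 1):   # right copy starts at or after the cut
                if seq[j:j + length] != left:
                    continue
                product = seq[:i] + seq[j:]               # drop left copy and intervening bases
                if product not in found:
                    size = j - i
                    found[product] = MicrohomologyDeletion(
                        left, size, size % 3 != 0, product)
    # Assumed heuristic ranking: longer homology, then smaller deletion.
    return sorted(found.values(),
                  key=lambda d: (-len(d.homology), d.deletion_size))


# Toy sequence with an ACTAG microhomology on each side of the cut.
example = predict_mmej_deletions("GGACTAGTTAACTAGCC", cut=9)
for deletion in example:
    print(deletion)
```

In this toy sequence the only candidate is an 8 bp deletion that retains one copy of ACTAG; because 8 is not a multiple of 3, it would shift the reading frame if it fell within a coding sequence.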

Inter-homolog recombination

In the absence of exogenously provided donor templates, HDR in mitotically dividing human cells most frequently uses the identical sister chromatid as a repair template245. HDR involving the sister chromatid restores the intact chromatid without loss of sequence

information. However, such high-fidelity repair is potentially compromised when the template for HDR is the allele on the homologous chromosome, owing to sequence differences between the maternal and paternal copies. Such inter-homolog homologous recombination (IH-HR) can lead to the loss of heterozygosity of large chromosome segments when recombination between the homologs is associated with either crossing over or extensive gene conversion without crossing over.

Studies in mice have demonstrated the occurrence of repair of Cas9-induced breaks by

IH-HR246,247. In mammalian cells, IH-HR-mediated repair of nuclease-induced DSBs is considered rare240,245. The rate of IH-HR at Cas9-induced DSBs is further lowered upon the availability of exogenous donor templates248. In contrast, recent observations of potentially high-frequency IH-HR following Cas9 cleavage in human embryos and mouse zygotes question the view that the homologous chromosome is consistently an inefficient repair template across all cell types237,238,249. However, the conclusions implicating IH-HR in these genome editing experiments remain highly debated250,251. Together, these studies highlight the need to improve our understanding of the mechanisms regulating IH-HR and perform a more comprehensive analysis of the frequency of this pathway at various genomic loci and in different human cell types.

Gross chromosomal rearrangements

While the study of repair outcomes within the vicinity of Cas9 breaks has received considerable attention, mounting evidence points to the additional generation of expansive chromosomal lesions induced by Cas9 cleavage. Gross chromosomal rearrangements (GCRs) include chromosomal translocations, inversions, duplications, and deletions, and are associated


with most cases of cancer252. GCRs in coding sequences of genes can affect protein function,

whereas those encompassing one or several coding units and gene-regulatory regions can

cause changes in gene dosage253-255. The major mechanisms associated with such lesions are

error-prone end-joining repair of a DNA DSB256-258 and HDR between repeat sequences259,260.

Cas9-induced DSB associated translocations have been reported to arise on account of

cleavage at predicted sgRNA off-target sites261-263. Therefore, the generation of multiple DSBs

by Cas9 could result in the erroneous ligation of the junctions predominantly by NHEJ. In agreement with this model, Cas9-induced translocation studies using paired-sgRNAs have

revealed little or no end processing at breakpoint junctions with few microhomologies264,265.

Loss of LIG4 (or its partner XRCC4) reduces translocations in multiple cell lines264, further

implicating NHEJ in promoting translocations. However, the knockdown of CtIP and the MMEJ

factor PARP1 has also been shown to decrease translocations, suggesting that a role for MMEJ

in translocation formation cannot be excluded265-268. Together, these studies highlight the

potential of modulating the DDR to reduce the frequency of such undesirable events.

Using Cas9 with two sgRNAs, precise inversions of DNA fragments ranging in size from

10 bp to >100 kb have been generated. Inversions were observed to be mediated by MMEJ

through short inverted repeats269. In addition, DNA fragment duplications through trans-allelic

recombination between DSBs induced at two homologous chromosomes using Cas9 have also

been detected41,269. While inversions and duplications have been detected at a single Cas9-

induced DSB270, it is unclear how the frequency and nature of these lesions are regulated.
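To make the two-sgRNA outcomes above concrete, the short Python sketch below computes the sequences expected when the segment between two blunt cuts is either lost (deletion) or rejoined in the opposite orientation (inversion). It assumes precise rejoining without end processing; the function names and the toy sequence are hypothetical, and duplications are not modeled.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def paired_cut_outcomes(seq: str, cut1: int, cut2: int) -> dict:
    """Model precise rejoining after two blunt cuts at indices cut1 < cut2.

    Hypothetical helper: 'deletion' joins the outer flanks directly, and
    'inversion' reinserts the excised segment as its reverse complement."""
    if not 0 <= cut1 < cut2 <= len(seq):
        raise ValueError("cut sites must satisfy 0 <= cut1 < cut2 <= len(seq)")
    seq = seq.upper()
    left, middle, right = seq[:cut1], seq[cut1:cut2], seq[cut2:]
    return {
        "deletion": left + right,
        "inversion": left + reverse_complement(middle) + right,
    }


# Toy example: the GATTCC segment between the cuts is deleted or inverted.
outcomes = paired_cut_outcomes("AAAGATTCCTTT", cut1=3, cut2=9)
print(outcomes["deletion"])
print(outcomes["inversion"])
```

Real junctions frequently carry small indels or microhomologies, so these products represent only the idealized case.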

Interestingly, reports have indicated that the presence of dsDNA donor templates could cause their multiple head-to-tail integration into the target site by processes that could include both


HDR and NHEJ271. Similarly, fluorescence in situ hybridization (FISH)-based methods have also detected off-target donor plasmid integrations in Cas9-targeted mouse embryonic stem cells272.

Together, these studies raise concerns about collateral damage associated with the use of dsDNA donor templates in Cas9-mediated genome editing experiments.

The design of reporters with repeat sequences has enabled the study of repeat-mediated large chromosomal deletions (>1 kb) induced by a single Cas9-induced site-specific

DSB40,273,274. These reports reveal the role of Ku70 and RAD51 in suppressing repeat-mediated deletions, while RAD52, CtIP, and BRCA1 promote this type of mutagenesis. The dependence on

RAD52 and end-resection factors indicates the role of SSA in mediating repair between the two repeats. Additionally, POLD3- and RAD51-dependent BIR has also been implicated in repeat-mediated deletions, especially when the repeats are far apart (100-200 kb) and the DSB is induced close to one repeat274.

Interestingly, long deletions (>250 bp) of varying length spanning several kilobases have recently been reported at multiple endogenous sites targeted by a single Cas9 break240,275,250.

Cas9-induced long deletions have been detected at a remarkably high frequency (up to ~15%) and without the presence of large homologies (required for SSA) at the deletion breakpoints.

Conversely, the observation of microhomologies at these long deletion breakpoint junctions suggests the involvement of MMEJ, rather than SSA, in their generation275. However, it is unclear how MMEJ, which is inhibited by extensive end-resection, could promote long deletions.

Additionally, the distribution of the sizes of long deletions cannot be predicted by microhomology sequences alone, thereby implicating the role of multiple pathways in the generation of such highly mutagenic lesions.


Inhibition of GCRs is crucial for the maintenance of genome stability following Cas9-mediated genome editing. The generation of GCRs can be limited by reducing Cas9 off-target cleavage by improving sgRNA design, optimizing modes of delivery of CRISPR components, and using more-specific variants of Cas9276. The detection of GCRs also remains challenging as they escape detection by conventional short-range PCR amplification-based methods; therefore, new tools have to be developed to sensitively detect GCRs. Finally, the molecular mechanisms behind GCRs need to be fully dissected as this would enable DDR modulation-based interventions to reduce their frequency.

p53 activation

The generation of DSBs within the genome results in the activation of the tumor suppressor p53, which is a potent cell cycle checkpoint that preserves genome stability by triggering cell cycle arrest, cellular senescence, and/or apoptosis277. DSBs induced by Cas9 lead to p53-dependent toxicity and cell cycle arrest in p53 proficient primary cells241,242.

Interestingly, this DSB-associated toxicity was even observed with the delivery of Cas9 RNPs, which have a limited lifetime in cells. These results raise the possibility that genome-edited clones isolated based on the desired genome edit are under selective pressure to lose p53 function. This is particularly concerning for therapeutic applications as these cells may have an enhanced potential to become tumorigenic. Thus, these observations highlight the need for a stringent evaluation of the p53-response to Cas9-induced DSBs.


1.4 Controlling DNA repair pathways to promote precision genome editing

Despite tremendous advances, CRISPR-Cas9 based genome editing can produce diverse genomic outcomes that can introduce risks and errors in a variety of applications. An error-free

DSB repair pathway such as HDR is highly desirable to ensure safe and accurate genome editing.

Although HDR is attractive for its precision, it remains inefficient in human cells and almost nonexistent in non-dividing somatic cells, as it is outcompeted by other repair pathways such as

NHEJ and MMEJ. Herein, I review known strategies to boost HDR by modulating the DNA damage response (Fig. 1.1), and also include a brief discussion of novel technologies that circumvent the need for HDR to make precise edits.

Optimizing donor design, chemistry and localization

Precision genome editing mediated by HDR can be achieved using two different donor template types. dsDNA templates with 0.5–2 kb of homology are generally used to insert large sequences into the genome, while ssODNs are capable of introducing small sequence changes278,279. While genome editing classically depended on dsDNA-mediated HDR, using templates such as an exogenously supplied plasmid, the use of ssODNs is becoming increasingly popular47,115,116,280 because of the high frequency of HDR it stimulates and the ease of designing and purchasing the donor47,112-114. Strategies considering donor templates to enhance HDR fall under two categories: improving donor design for more efficient integration, and improving localization of the donor to the Cas9-induced DSB (Fig. 1.1f).

Two strategies that improve efficiency of donor-templated repair involve the use of asymmetric ssODNs and modified donor templates. The design of asymmetric ssODNs is based


on the observation that Cas9 asymmetrically releases the 3' end of the non-target strand47. The

use of asymmetric donors has been reported to increase HDR by up to 60%; however, this

strategy may not be generalizable as it has proven difficult to reproduce. Additionally, the use of chemically modified donor oligonucleotides has been shown to improve HDR. For example, the use of phosphorothioate-modified oligonucleotides likely stimulates HDR by blocking degradation by exonucleases, thereby increasing their concentration in the cell281,282.

Conversely, dsDNA-dependent HDR has also been enhanced with the use of dsDNA with 3′

overhangs at both ends283. Additionally, the use of chromatinized dsDNA instead of a naked

dsDNA has also been reported to stimulate HDR, though the mechanism of stimulation remains

unclear284.

Besides altering the donor DNA, an alternative strategy is based on the principle that

HDR can be improved by increasing the availability of the donor template. This is achieved by

the covalent linkage of an ssODN to Cas9 or the sgRNA, which likely promotes increased nuclear availability of the donor template285-288.

Controlling the cell cycle

Two cell cycle-dependent approaches to enhance HDR have been tested (Fig. 1.1e). The first strategy involves increasing the number of cells in HDR-permissive phases. Pharmacological cell cycle arrest in the permissive S phase using aphidicolin or XL413 (CDC7 inhibitor) is effective in some cell types289,290. Unexpectedly, blocking in the M phase, using the microtubule inhibitor nocodazole, also increases HDR289. The second strategy involves restricting Cas9 activity to the

S, G2, and M phases of the cell cycle. This favors templated repair over NHEJ by preventing DSB

formation in the HDR inactive phases of the cell cycle. For example, fusing Cas9 to cell cycle-


regulated degrons, such as a domain of geminin, promotes its degradation in G1. This strategy

has been demonstrated to improve templated repair in different cell types, although with

modest effect (~1.5 fold)291-293. Further, injection of CRISPR reagents specifically in the S294 and

G2295 phases has also resulted in improved HDR in mouse zygotes and embryos respectively.

Inhibiting end-joining pathways

A key dynamic in DSB repair is the competition between end-joining and templated

repair. This has inspired efforts to inhibit end-joining to promote the repair of Cas9-induced

DSBs by HDR.

Efforts to inhibit the core NHEJ factors have produced mixed results in human cells. For

example, pharmacological inhibition of Ku only marginally stimulates HDR296. LIG4 inhibition by

the small molecule SCR7 or viral proteins that induce its proteasomal degradation has been

reported to increase HDR in some human and mouse cells, though this effect was more limited

in other cellular contexts293,297-307. Similarly, small molecule inhibition of DNA-PKcs with

NU7441, NU7026 and M3814 has also been reported to increase HDR303,308.

Given the key role of 53BP1 in promoting NHEJ by suppressing end-resection, the

inhibition of 53BP1 is an attractive strategy to stimulate Cas9-mediated HDR309. This was first

achieved by the development of i53 (inhibitor of 53BP1), an engineered ubiquitin variant309. By

binding to the Tudor domain of 53BP1, i53 prevents the binding of 53BP1 to H4K20me2,

thereby blocking its recruitment to chromatin. Expression of i53 improves repair by HDR using

both ssODN and dsDNA. 53BP1 inhibition has also been restricted to sites of DNA damage by fusing to Cas9 a dominant-negative mutant of 53BP1 (dn53BP1) that is


incapable of binding its effectors RIF1 and PTIP310. This fusion may prevent 53BP1 localization

specifically at Cas9 targeted sites, thereby improving HDR frequency.

Whereas the early steps of MMEJ resemble HDR, annealing of the overhanging ends

during MMEJ prevents HDR. Therefore, a promising and yet to be fully explored approach to

stimulating HDR is inhibiting MMEJ. For example, it has been demonstrated that inhibition of

MMEJ-mediated repair by POLQ knockdown could upregulate HDR in some contexts148,311,312.

Promoting end-resection and recombination

Additional HDR-stimulating strategies involve promoting end-resection and HDR factors

downstream of resection. In this regard, CtIP is a key protein in the early stages of end-

resection. It has been demonstrated that fusing CtIP to Cas9 improves templated repair during

genome editing, possibly through stimulating MRN-mediated resection of the DSB ends292. The

fusion was also found to increase MMEJ, further supporting its function in stimulating end-resection.

Human SSTR partially depends on RAD52, and overexpression of RAD52 or RAD52-Cas9

fusions increases templated repair using ssODNs115,118-121. Similarly, RAD51 is critical to dsDNA-mediated HDR, and a small-molecule stimulator of RAD51, RS-1, was found to stimulate HDR by stabilizing ssDNA-RAD51 nucleoprotein filaments301,313,314.

Combinatorial strategies

Given that cells have multiple critical decision points to choose between end-joining pathways and HDR pathways, there is a growing need to focus on the targeting of multiple pathways to synergistically enhance HDR. For example, co-expression of RAD52 and dn53BP1


synergizes to promote efficient HDR using ssODN in human cells121. This synergy is likely

attained by the promotion of end-resection by blocking 53BP1 by dn53BP1, and the

downstream promotion of SSTR by RAD52121.

The dual inhibition of MMEJ (by knocking down POLQ) and NHEJ (by knocking down

either Ku70, Ku80 or LIG4) has been shown to eliminate random integration or illegitimate recombination at off-target sites of dsDNA donors315. It remains to be characterized whether Cas9-mediated on-target HDR is significantly increased using this strategy.

HDR mediated precision genome editing has also been stimulated by using a cocktail of small molecules (named “CRISPY” mix) that target multiple DDR factors296. The mix consists of

NU7026 (DNA-PKcs inhibitor), Trichostatin A (ATM activator), MLN4924 (inhibitor of the Nedd8-activating enzyme that has been shown to inhibit the neddylation of CtIP, thereby potentially

increasing the extent of DNA end-resection), and NSC 15520 (inhibitor of RPA association with

p53 and RAD9316,317, thereby possibly increasing the abundance of RPA available for HDR). It is

unclear how each of the components of the mix stimulates HDR and how they synergize when used in combination. Further, CRISPY mix has been demonstrated not to work consistently at all the tested genomic loci and cell types. This observation indicating locus and cell-type specificity of CRISPY mix is likely true for most HDR stimulating strategies292. Therefore, it is highly likely

that DDR modulation-based interventions will have to be tailored based on the genomic locus of the targeted sequence and cell type for more robust stimulation of HDR.

DNA DSB-free editing: Cas9-nickases, base editors and prime editors

(adapted from Billon, P., Bryant, E. E., Joseph, S. A., Nambiar, T. S., Hayward, S. B., Rothstein, R., & Ciccia, A. (2017).


CRISPR-mediated base editing enables efficient disruption of eukaryotic genes through induction of STOP codons. Molecular cell, 67(6), 1068-1079.)

HDR based precision genome editing is limited by its dependence on exogenous DNA repair templates, the generation of indels due to competition by more efficient end-joining, and its low efficiency in most therapeutically relevant cell types (T-cells and some types of stem cell being important exceptions)235,318. Further, DSBs are toxic DNA lesions that can induce GCRs, activate DNA damage checkpoints, and cause cell death264,319-323.

In contrast to DSBs, DNA single-strand breaks or nicks generate a much lower frequency of undesired genome modification324,325. Mutating the HNH (Cas9 H840A) or RuvC (Cas9 D10A) domain of Cas9 can generate nickases (Cas9n) that can cleave only one of the two DNA strands at the locus116,117,326. Single nicks can lead to higher HDR:indel ratios than double-stranded DNA breaks115. However, Cas9n induces much lower frequencies of genome editing than Cas9, thereby limiting its application as a genome editing tool23,118. While modulating the DDR to improve nickase-mediated HDR has shown some promise116,118,262,325-327, there remains further scope for enhancing the efficiency of Cas9n-based precision genome editing. Improving our understanding of how DDR factors repair Cas9-induced nicks will undoubtedly help towards this goal. Interestingly, Cas9n has been remodeled to generate two new technologies that entail direct modification of DNA bases: base editors and prime editors328-334.

Base editing is the direct replacement of a single base pair in the genome with another without making a DSB. A base editor (BE) is an engineered fusion of a Cas9 nickase and a cytidine deaminase that enables a C–G to T–A conversion in an activity window that can be as narrow as one to two bases333. The last few years have seen extensive improvements in


the design and application of BEs335,336. Additionally, adenine base editors (ABEs) have also

been developed which enable an A–T to G–C transition. Together, BEs and ABEs give researchers

the ability to model and correct ~60% of the known pathogenic point mutations in humans337.

More recently, prime editing was developed mainly to overcome the limitations of base editing, i.e. the inability to perform transversion mutations and make targeted DSB-free deletions and insertions328. Prime editors (PEs) use a reverse transcriptase (RT) fused to an

RNA-programmable nickase and a prime editing guide RNA (pegRNA) to copy genetic

information directly from an extension on the pegRNA into the target genomic locus. Prime

editors can introduce all transition and transversion mutations into target genes at locations

ranging from 3 nt upstream to 29 nt downstream of a PAM. Additionally, PEs can be used to perform insertions of up to 44 nt and deletions of up to 80 nt. Finally, early signs indicate that PEs are not only more versatile than BEs but also more efficient, with fewer byproducts, than HDR in the cell lines tested. It remains to be tested whether PEs can be used to insert large sequences with the same efficiencies as HDR.

In addition to BEs and PEs, even more nascent DSB-free transposon-based technologies have been described to work with remarkable precision in bacteria338,339. If these can be

engineered to work in human cells, they could facilitate the precise integration of large genetic

payloads into the genome. Along with CRISPR-based transcriptional and epigenome editors340-344, as well as RNA editors345-348, scientists now have an arsenal of technologies to manipulate

cellular function without induction of a DSB. Therefore, further research in these DSB-free

genome engineering technologies to improve specificity, efficiency in different cell types, and

delivery capabilities will give scientists new precision genome editing tools with previously unachievable possibilities.

1.5 Applications of CRISPR-mediated HDR

Targeted gene modification using genome editing tools is a powerful method to interrogate gene function and precisely manipulate cellular behavior and function. Despite its recent history, genome editing tools using CRISPR-Cas9 have enhanced our ability to use genetically engineered animal and cell line models to understand the molecular mechanism of various diseases and design improved therapeutic strategies. A comprehensive summary of cell modeling and gene therapy strategies using CRISPR-Cas9 can be found elsewhere349,350. Herein,

I highlight a selection of promising strategies that illustrate the applications of CRISPR-Cas9 in studying and curing human diseases, with a focus on HDR-mediated precision genome editing.

Gene-tagging

One unique application of CRISPR-mediated HDR is the tagging of genes with markers.

Analysis of gene function in complex organisms relies extensively on tools to detect the cellular and subcellular localization of gene products, especially proteins. Gene tagging with fluorescent reporters circumvents the need for specific antibodies, which can be costly and time-consuming to make. Furthermore, antibodies do not enable live imaging studies of protein dynamics. Hence, tagging genes with standardized immune-epitopes or fluorescent tags that permit live imaging has become popular351-356. Tagging of genes with affinity tags has enabled the isolation of native protein complexes357,358. Genome engineering has also enabled the application of BioID, a proximity labeling method that relies on fusing a bait protein to a promiscuous biotin ligase,


BirA*, resulting in the tagging of vicinal proteins359. Finally, CRISPR-Cas9 has been deployed to generate conditional alleles of essential nuclear and cytoplasmic proteins by tagging these genes with an auxin-inducible degron (AID). The addition of auxin to these edited cells results in the rapid depletion of the tagged proteins360.

Screening for functional genes

An ongoing challenge in biology is comprehensively mapping genotype-phenotype relationships. It has become increasingly clear that the volume and complexity of genomic information necessitate rapid screening methodologies. In this regard, CRISPR-based screens can be used to interrogate the function of a massive number of genetic sequences, including the entire genome361,362. Libraries of sgRNAs have enabled a large number of high-throughput functional genomic screens, which have, in turn, identified key genes involved in a broad range of human health and disease conditions including infections, immune regulators and responses, and metabolic diseases363-365. Such screens are performed in an arrayed or pooled format. In an arrayed screen, the reagents are added into a multi-well plate so that a single gene is knocked out in the cells in each well. This arrayed format is significantly more expensive to perform and has lower throughput366. Pooled screens, on the other hand, test thousands of genetic perturbations in a single assay by allowing massive libraries of gene targets to be investigated in a single cell culture dish, thereby accelerating the process of functional screening. Measurements with single-cell quantitation

(such as NGS and FACS) are thereafter used to quantify the effect of a single perturbation.
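
In a pooled screen, the effect of each perturbation is typically inferred from the change in abundance of its sgRNA between two cell populations, as measured by NGS read counts. The following Python sketch, which is not taken from any of the cited studies, normalizes counts to reads per million and reports a log2 fold change for each sgRNA. The pseudocount, the function names and the example counts are assumptions made purely for illustration.

```python
import math
from typing import Dict


def reads_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw sgRNA read counts to reads per million for one sample."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {guide: n * 1e6 / total for guide, n in counts.items()}


def log2_fold_changes(
    before: Dict[str, int],
    after: Dict[str, int],
    pseudocount: float = 1.0,  # assumed value; avoids log(0) for lost guides
) -> Dict[str, float]:
    """Return log2(after / before) abundance for every sgRNA in `before`."""
    rpm_before = reads_per_million(before)
    rpm_after = reads_per_million(after)
    return {
        guide: math.log2(
            (rpm_after.get(guide, 0.0) + pseudocount)
            / (rpm_before[guide] + pseudocount)
        )
        for guide in rpm_before
    }


# Hypothetical read counts for two sgRNAs, for illustration only.
before = {"sgRNA_1": 500, "sgRNA_2": 500}
after = {"sgRNA_1": 250, "sgRNA_2": 750}
lfc = log2_fold_changes(before, after)
# sgRNA_1 is depleted (negative value) and sgRNA_2 is enriched (positive value).
```

Published screens generally rely on dedicated statistical tools that also model replicate variance; the sketch only conveys the underlying comparison.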

Acute Myeloid Leukemia (AML) was the first disease to be systematically analyzed with this technology367. Using this platform, the authors found several well-known potential targets

for AML therapies. Large-scale CRISPR-Cas9 screens have also been performed to systematically discover essential genes in many cancer lines368,369. Alternatively, CRISPR-based screens have been deployed to identify genes that promote resistance to specific drugs370,371.

Further, CRISPR-based double-knockout systems have been developed using dual sgRNA libraries to screen for genetic interactions, combinatorial genes and identify pairs of synthetic lethal drug targets362,372-375. Finally, CRISPR screens have been performed in vivo, which could lead to the development of personalized cancer therapies376.

While CRISPR-based knockout screens have become very popular, low HDR rates have prevented the widespread adoption of HDR-based pooled knockin screens. However, there are a few examples of successful outcomes using this approach. For example, an HDR screen was used to knock in all possible variants into an exon of the BRCA1 gene377. Recently, a pooled knockin screen of an HDR template library into T-cells was performed in vitro and in vivo in mice378. Using this strategy, the pooled knockin of several unique barcoded templates into the T-cell receptor locus was performed, resulting in the identification of gene constructs that enhance the fitness of T-cells. These examples demonstrate the utility of HDR in CRISPR screens.

Improving HDR rates could, therefore, ensure that the potential of pooled knockin screens may be more fully realized.

Disease modeling

One of the major hurdles in the generation of faithful disease models has been the recreation of patient-specific mutations or alterations379. The ease of use and efficiency of

CRISPR-Cas9 have facilitated the generation of disease-related models as well as gene correction alternatives. In a remarkably short period, CRISPR-Cas9 has been used to generate

disease models for many important human pathologies, including cancer380, neurological diseases381 and cardiovascular pathologies382, as well as other Mendelian or complex genetic human diseases. The generation of these models not only allows investigation of the molecular mechanism behind these pathologies, but also serves as an excellent platform for drug screening and other high throughput approaches.

CRISPR-based HDR has enabled a rapid validation of candidate oncogenes and tumor suppressor genes both in vitro as well as in vivo383,384. For instance, organoid colon cancer models were constructed in vitro by introducing mutations of tumor suppressor genes and gene modification of oncogenes by CRISPR-mediated HDR385.

CRISPR-Cas9 technology has also been deployed for the study of diseases associated with a single genetic mutation. Such diseases include cardiovascular, metabolic, neurodegenerative, hematological, and hereditary eye diseases. For example, Cas9-mediated

HDR was used in human induced pluripotent stem cells (iPSCs) to generate a congenital heart disease model associated with GATA4 mutations in vitro to investigate the pathogenesis of this gene mutation386,387. Similarly, mutations in the FTO allele, associated with obesity, were corrected using CRISPR-Cas9 mediated HDR thereby achieving the re-expression of genes regulating weight loss in patient-derived adipocytes388.

The possibility of ex vivo genome editing, that is, the genetic engineering of cells in vitro and the subsequent re-engraftment of modified cells back to patients, has spurred genome editing attempts to cure hematological diseases. Sickle cell anemia is caused by a single nucleotide mutation from A to T in the first exon of human β-globin389. Using HDR, researchers corrected the β-globin gene (HBB) mutations in patient-derived HSCs in vitro390. Normal HBB

mRNA could thereafter be detected in red blood cells that were differentiated from modified iPSCs.

NHEJ-mediated editing has also been applied towards the correction of genetic diseases.

For example, this strategy could be used to treat Duchenne muscular dystrophy (DMD), one of the most common forms of muscular dystrophy caused by mutations of the DMD gene. CRISPR-

Cas9 based NHEJ enabled excision of the mutant portion of DMD in a DMD mouse model, thereby synthesizing a shorter version of dystrophin protein in the muscle fibers and restoring partial muscle function391-393. This raises the possibility of correcting disease-causing mutations in the muscle tissue of patients.

Together, these selected examples serve to highlight how CRISPR-Cas9 based HDR and

NHEJ have enabled the engineering of diverse cell types in complex ways. In combination with next-generation sequencing, CRISPR-Cas9 provides an unprecedented opportunity to create powerful and more informative cellular and animal genetic models that enhance our understanding of pathological processes.

Therapeutic applications

Genome editing has the potential to correct or eliminate mutations that lead to the development of cancer and other genetically driven diseases. In particular, ex vivo genome editing has shown the most widespread use.

In this regard, immunotherapy such as chimeric antigen receptor (CAR) T-cell therapy has generated considerable attention, with its goal to engineer the patient’s own immune system to target tumor cells394. However, the manufacturing process remains complicated, the number of target antigens limited and the antitumor responses to solid tumors also limited.


CRISPR-Cas9 has demonstrated the potential to overcome these limitations and further improve CAR-T-cell design. Multiplexability of CRISPR-Cas9 enables easier and faster generation of CAR T-cells. For example, CAR T-cells were efficiently generated and tested in vivo in which two genes (TRAC and B2M) or three genes (TRAC, B2M and PD-1) were simultaneously disrupted395. Similarly, CRISPR-Cas9 mediated HDR was used to integrate the CAR gene at the

TRAC locus in T-cells, resulting in more potent T-cells396,397. Overall, CRISPR-Cas9 holds great potential to emerge as an effective weapon in the fight against cancers. To this end, multiple clinical trials using CRISPR-Cas9 are currently underway350,398.

Genome editing attempts in the context of β-hemoglobinopathies are showing progress, with three projects aiming to treat patients (clinical trials identifier NCT03655678 and

NCT03745287 by CRISPR Therapeutics Ltd. for β-thalassemia and sickle cell disease, respectively, and NCT03728322 by Allife Medical Science and Technology Co. Ltd. for β-thalassemia)398.

Similarly, the use of CRISPR-Cas9 to correct diseases in the eyes is bolstered by the accessibility and relative immune-privileged status of the eye. Gene augmentation is therefore successfully employed for the treatment of inherited retinal diseases, and multiple clinical trials of gene augmentation are underway for Leber congenital amaurosis (LCA), choroideremia, achromatopsia, X-linked retinoschisis and retinitis pigmentosa399. For example, EDIT-101 (clinical trials identifier AGN-151587), a treatment in phase 1/2 clinical trials, is an AAV vector containing 3 components: Staphylococcus aureus Cas9 (SaCas9) and two sgRNAs that are designed to eliminate the pathogenic mutation in an intron of the CEP290 gene to treat LCA398.


CRISPR-Cas9 research has also contributed to anti-viral treatments, in particular for HIV.

Targeting CCR5, a major coreceptor in the early stages of HIV infection, has been demonstrated as a promising approach. Recently, researchers have established a CRISPR-Cas9-based CCR5 gene editing system for adult HSPCs to achieve long-term and stable hematopoietic system reconstruction after infusion of modified CD34+ cells into patients with HIV-1 infection and acute lymphocytic leukemia (clinical trials identifier NCT03164135)400. This study demonstrates the potential feasibility and safety of gene edited adult HSPC transplantation in the human body.

The next several years will undoubtedly see more CRISPR-based therapeutics enter clinical trials. The permanent genetic changes made by CRISPR-Cas9 represent an immense paradigm shift by potentially eliminating the requirement for continuous treatment. Besides safety considerations, several ethical questions will also have to be deliberated before the widespread adoption of CRISPR-Cas9 as a therapeutic tool.


1.6 Figures

Figure 1.1: Major Cas9-induced DSB repair pathways and selected interventions to promote HDR-based precision genome editing.

a-d) Major pathways involved in the repair of a Cas9-induced break. The path highlighted in black shows the major branch points (see Chapter 1.3.1 and 1.3.2 for details) from DSB to SSTR and HR while also depicting where losses in templated repair efficiency may occur. Select interventions are labeled in grey, wherein ellipses indicate genetically encoded factors, hexagons indicate small molecules and rounded rectangles indicate Cas9-fusions (see Chapter 1.4 for details). a) Following a DSB, MRN promotes the ATM-mediated phosphorylation of H2AX (resulting in γH2AX). γH2AX recruits MDC1, which in turn recruits the E3 ubiquitin ligase RNF8. RNF8, through recruitment of a second E3 ubiquitin ligase, RNF168, promotes histone H2A ubiquitylation. This modification, together with H4K20 methylation, allows for 53BP1 recruitment. 53BP1, through its interactors RIF1 and PTIP, blocks end-resection, thereby channeling DSB repair towards NHEJ. Rapid association of Ku with DSBs directs their repair by NHEJ. b) Ku recruits DNA-PKcs, which phosphorylates ARTEMIS, resulting in the initial processing of DNA ends by ARTEMIS, followed by DNA ligation by LIG4. c) In the S-G2-M phases, the displacement of Ku and short-range end-resection of the DNA can be promoted by the MRN complex. Along with CtIP, MRN promotes the 3’-5’ resection that generates ssDNA that is refractory to Ku binding. ssDNA generated by short resection can be repaired by MMEJ in a POLQ/LIG3-dependent manner. d) Alternatively, 5’ to 3’ resection mediated by EXO1 and the DNA2-BLM heterodimer can generate long stretches of ssDNA that are stabilized by the binding of the RPA complex. If a homologous donor template is not found, resection continues until RAD52 mediates annealing of flanking repeats to bridge the DSB by the SSA pathway. Conversely, in the presence of ssODNs, RAD52 promotes SSTR by a mechanism that remains to be fully elucidated. Alternatively, in the presence of a dsDNA donor template, RAD51-mediated recombination is engaged. This requires the BRCA2-mediated displacement of RPA from the 3′-ssDNA ends and assembly of RAD51 filaments. The RAD51-bound DNA nucleoprotein filament directs strand invasion into homologous DNA sequences, resulting in the formation of a displacement loop (D-loop). Following synthesis of the invading strand in the D-loop, the invading strand captures the second end of the DSB (mediated by RAD52), resulting in the formation of a Holliday junction (HJ). Structure-specific endonucleases such as the MUS81-EME1 and SLX1-SLX4 complexes then resolve the HJ structure. e) Cell cycle-dependent approaches to enhance HDR. Select interventions are labeled in grey, wherein hexagons indicate small molecules and rounded rectangles indicate Cas9-fusions (see Chapter 1.4 for details). f) Modifying ssODNs to stimulate SSTR. Gene editing by SSTR can be improved using asymmetric donor ssODNs or phosphorothioate (PS)-modified ssODNs and by covalently linking ssODNs to Cas9-fusions or fusing to the sgRNA (see Chapter 1.4 for details).


Figure 1.2: Outcomes of Cas9-induced DSB repair in human cells.

Depiction of the major lesions associated with Cas9 induced DSBs in human cells in the absence of exogenously supplied donor molecules (see Chapter 1.3.3 for details).


CHAPTER TWO: STIMULATION OF CRISPR-MEDIATED HOMOLOGY-DIRECTED REPAIR BY AN ENGINEERED RAD18 VARIANT

This chapter has been adapted from: Nambiar, T.S., Billon, P., Diedenhofen, G., Hayward, S.B., Taglialatela, A., Cai, K., Huang, J.-W., Leuzzi, G., Cuella-Martin, R., Palacios, A., Gupta, A., Egli, D., Ciccia, A. Stimulation of CRISPR-mediated homology-directed repair by an engineered RAD18 variant. Nature Communications 10, 3395 (2019).

Contributions: T.S.N., P.B., and A.C. conceived the project and designed the experiments. T.S.N. performed the experiments with help from P.B., S.B.H., A.T., G.L., J.-W.H., R.C.M., A.P., and A.G. G.D. and K.C. performed the experiments in hESCs under the supervision of D.E. T.S.N. and A.C. wrote the paper with help from P.B.


2.1 Introduction

Genome editing using CRISPR technology relies on the repair of site-specific DNA

double-strand breaks (DSBs) induced by the RNA-guided Cas9 endonuclease401. Homology-

directed repair (HDR) of these DSBs enables precise editing of the genome by introducing

defined genomic changes, including base substitutions, sequence insertions and deletions402,403.

In genome editing experiments, HDR is stimulated by homologous donor templates delivered in the form of single-stranded oligodeoxynucleotides (ssODNs) or double-stranded DNA (dsDNA) donors96,97. However, the efficiency of HDR-dependent precision genome editing is limited

by DSB repair pathways that compete with HDR, such as non-homologous end-joining (NHEJ)97.

The choice of DSB repair pathway is determined in large part by DSB end-resection, a

nucleolytic process that converts DSB ends into 3’-single-stranded DNA overhangs137. Certain

NHEJ factors, including 53BP1, promote the direct joining of DSBs by protecting DNA ends from

resection173,184-188,404. However, limited resection of DSB ends can expose regions of sequence microhomology, which favor DSB repair through microhomology-mediated end-joining

(MMEJ)69,405, while more extensive DSB resection generates the long 3’-single-stranded DNA

tails required for HDR97,406. Thus, cellular factors that impede DSB end-resection represent

major barriers to HDR-mediated precision genome editing.

Previous studies have shown that the efficiency of CRISPR-mediated HDR can be

improved by modulating the cellular pathways responsible for DSB repair (see Chapter 1.4).

Despite these important studies highlighting the influence of selected DNA repair factors on

precision genome editing, the impact of proteins of the DNA damage response (DDR) on Cas9-

induced HDR has not been systematically examined.


In this study, we conducted a screen of 204 human DDR open reading frames (ORFs) to evaluate their effect on CRISPR-mediated HDR and identified RAD18, a RING-type E3 ubiquitin ligase involved in post-replication repair and HDR407-411, as the most potent stimulator of both ssODN- and dsDNA-mediated HDR. Through functional analysis, we defined the domains of

RAD18 necessary to promote HDR at Cas9-induced DSBs and derived an enhanced RAD18 variant (e18) that stimulates HDR with greater efficiency and specificity. The e18 variant promotes HDR events by blocking the recruitment of 53BP1 to DSBs, thereby inhibiting competition from the NHEJ pathway. In addition, we demonstrate that e18 augments Cas9-induced HDR at multiple genomic loci in various human cell types, including human embryonic stem cells. Altogether, this work establishes the engineered e18 variant of RAD18 as a potent enhancer of precision genome editing by CRISPR-dependent HDR.

2.2 Results

2.2.1 Identification of RAD18 as a potent enhancer of CRISPR-mediated HDR

To identify factors that stimulate precision genome editing by CRISPR-dependent HDR, we used a previously described fluorescence-based reporter that measures Cas9-induced HDR events47. This reporter consists of a stably integrated blue fluorescent protein (BFP) sequence that is converted into a green fluorescent protein (GFP) sequence following Cas9-induced cleavage and HDR-mediated substitution of a single codon (CAT into TAC) (Fig. 2.1a). The percentage of cells that have undergone HDR-mediated repair of Cas9-induced DSBs can then be determined by monitoring the proportion of GFP-positive cells by flow cytometry (Fig. 2.1b).

Using the above BFP reporter stably integrated in HEK293T cells, we examined the levels of HDR modulation induced by individually co-expressing, with Cas9/sgRNA, 204 ORFs implicated in


DNA damage, repair and replication, as well as predicted interactors of key DNA repair

proteins412 (Fig. 2.1c, Supplementary Tables 2.1 and 2.4). In this assay, we monitored the

frequency of HDR events at the BFP reporter using either ssODNs or dsDNAs as donor

templates (Fig. 2.1c). Previous studies have shown that genome editing using dsDNA donors

relies on canonical RAD51-mediated HDR, also known as homologous recombination, while

ssODN-mediated genome editing depends on single-stranded template repair (SSTR), a RAD51-

independent HDR process that is promoted by RAD52 and proteins of the Fanconi anemia

pathway96,115,116,118,119,413. In agreement with the above findings413, expression of the RAD52

ORF elicited a 1.2-1.3-fold induction of ssODN-mediated HDR in HEK293T cells, thereby serving

as a positive control in our assay (Supplementary Table 2.1). ORFs that modulated HDR ≥1.25-

or ≤0.75-fold in the screen described above were individually validated in triplicates in HEK293T

and HeLa cells harboring the BFP reporter (Supplementary Table 2.1). Our screen revealed that

expression of the RAD18 ORF in HEK293T cells led to the greatest HDR stimulatory effect for

both ssODN and dsDNA donors (Fig. 2.1d, e). The HDR stimulatory effect of RAD18 was also

observed in HeLa and U2OS cells carrying the BFP reporter using both ssODNs and dsDNA

donors (Fig. 2.1f, g). These studies identify RAD18 as a potent stimulator of CRISPR-mediated

HDR events.
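
The hit-calling rule used in the screen (ORFs that changed HDR ≥1.25- or ≤0.75-fold were carried forward for validation) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the normalization to an empty-vector control and the example percentages are hypothetical and are not taken from the screen data.

```python
from typing import Dict

# Thresholds stated in the text: ORFs changing HDR >=1.25-fold or <=0.75-fold
# were selected for validation.
STIMULATOR_CUTOFF = 1.25
SUPPRESSOR_CUTOFF = 0.75


def fold_change(gfp_positive_pct: float, control_pct: float) -> float:
    """HDR fold change relative to a control (assumed here: empty vector)."""
    if control_pct <= 0:
        raise ValueError("control HDR frequency must be positive")
    return gfp_positive_pct / control_pct


def classify_orfs(hdr_pct: Dict[str, float], control_pct: float) -> Dict[str, str]:
    """Label each ORF as 'stimulator', 'suppressor' or 'neutral'."""
    calls = {}
    for orf, pct in hdr_pct.items():
        fc = fold_change(pct, control_pct)
        if fc >= STIMULATOR_CUTOFF:
            calls[orf] = "stimulator"
        elif fc <= SUPPRESSOR_CUTOFF:
            calls[orf] = "suppressor"
        else:
            calls[orf] = "neutral"
    return calls


# Hypothetical GFP-positive percentages, for illustration only.
example = {"ORF_A": 6.0, "ORF_B": 2.0, "ORF_C": 4.2}
calls = classify_orfs(example, control_pct=4.0)
```

With these made-up values, ORF_A (1.5-fold) would be called a stimulator, ORF_B (0.5-fold) a suppressor and ORF_C (1.05-fold) neutral.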

2.2.2 RAD18 stimulates Cas9-mediated HDR in a UBZ motif dependent manner

RAD18 is an E3 ubiquitin ligase involved in post-replication repair (PRR), a cellular

pathway that promotes the bypass of DNA lesions during DNA replication or after its

completion414. PRR requires the ubiquitination of the DNA polymerase sliding clamp PCNA

catalyzed by RAD18 in complex with the E2 ubiquitin-conjugating enzyme RAD6. Ubiquitination of PCNA by


RAD18/RAD6 is dependent on the RING, SAP (SAF-A/B, Acinus and PIAS), and RAD6-binding domains of RAD18410,415,416. In addition, RAD18 also promotes DSB repair by homologous recombination407-409,411. Although its role in HDR is less well-characterized, both the UBZ and

RING domains of RAD18 have been proposed to contribute to HDR407,411. To ascertain how

RAD18 expression promotes CRISPR-mediated HDR, we constructed a panel of RAD18 mutants in which RAD18 functional domains have been individually removed (Fig. 2.2a). Upon expression of these RAD18 mutants in HEK293T cells carrying the BFP reporter (Supplementary

Fig. 2.1a), we noted that HDR stimulation was unaffected by deletion of the RING domain, suggesting that the E3 ligase activity of RAD18 is dispensable for CRISPR-mediated HDR (Fig.

2.2b, c). In contrast, however, HDR was completely abrogated by deletion of the UBZ motif (Fig.

2.2b, c). Moreover, expression of the UBZ domain alone fused to an NLS sequence (UBZ-NLS) was sufficient to promote HDR (Supplementary Fig. 2.2), indicating that RAD18 stimulates

CRISPR-induced HDR in a UBZ-dependent manner.

2.2.3 Generation of an enhanced RAD18 (e18) variant for Cas9-mediated HDR stimulation

Interestingly, expression of the RAD18-ΔSAP mutant generated a significant increase in

CRISPR-mediated HDR compared to full-length RAD18 or UBZ-NLS (Fig. 2.2b, c and

Supplementary Fig. 2.2b). The enhanced effect of the ΔSAP mutant over full-length RAD18 was also confirmed using a distinct expression plasmid (Supplementary Fig. 2.1b, c). Of note, the

HDR-stimulatory effect of RAD18-ΔSAP was fully abrogated upon deletion of the UBZ motif (Fig.

2.2b, c). To determine whether ubiquitin binding by the UBZ domain is required for HDR stimulation by RAD18-ΔSAP, we introduced the D221A mutation, which abrogates RAD18

ubiquitin binding417,418, into the ΔSAP mutant. As observed in Fig. 2.2b, c, the D221A mutation inhibited HDR stimulation by RAD18-ΔSAP in a manner comparable to deletion of the UBZ motif, indicating that ubiquitin binding is necessary for stimulation of CRISPR-mediated HDR by

RAD18-ΔSAP. Similar results were observed upon electroporation of RAD18-ΔSAP mRNA into

BFP+ HEK293T cells. As shown in Supplementary Fig. 2.1d, RAD18-ΔSAP mRNA increased Cas9-mediated HDR in a dose-dependent manner up to 3.9-fold over RAD18-ΔSAP-D221A mRNA, further confirming the role of UBZ-dependent ubiquitin binding in stimulating CRISPR-mediated

HDR. These studies identify RAD18-ΔSAP as the minimal RAD18 construct that most efficiently stimulates CRISPR-dependent HDR and demonstrate that HDR stimulation by RAD18-ΔSAP can be achieved by mRNA delivery.

Previous studies have shown that RAD18 can induce PCNA ubiquitination on lysine 164 (K164) and promote the association between ubiquitinated PCNA and PRR factors419. To determine whether the HDR-stimulatory effect of RAD18 is dependent on its ability to ubiquitinate PCNA, we monitored the levels of PCNA ubiquitination on K164 in cells expressing the WT or ΔSAP-mutant forms of RAD18 with or without replication stress induced by UV radiation. Under both untreated and UV-treated conditions, K164 ubiquitination of PCNA was markedly reduced in cells expressing the ΔSAP mutant relative to those expressing WT RAD18

(Supplementary Fig. 2.1e), in agreement with previous work implicating the SAP domain of

RAD18 in PCNA ubiquitination416. These observations confirm that stimulation of CRISPR-mediated HDR by RAD18 is independent of its role in inducing PCNA ubiquitination and establish that the ΔSAP mutant has enhanced specificity for HDR compared to WT RAD18.

Given the higher HDR efficiency and specificity of the ΔSAP mutant relative to WT RAD18, we


renamed the ΔSAP mutant as enhanced RAD18 (e18) variant for CRISPR-mediated HDR

stimulation.

2.2.4 e18 stimulates HDR by preventing the localization of 53BP1 to DSBs

The UBZ domain of RAD18 binds the histone H2A ubiquitinated on lysine 15

(H2AK15ub), and this interaction can be abrogated by the D221A substitution418.

Interestingly, H2AK15ub can also associate with the UDR domain of 53BP1 and is required for

53BP1 recruitment to DSBs171. Moreover, since RAD18 binds H2AK15ub with greater affinity

than 53BP1, it can potentially displace 53BP1 from DSBs411,418. In agreement with these

findings, we noted that e18, but not the e18-D221A mutant, localized to ionizing radiation-

induced foci (IRIF) in U2OS cells and suppressed 53BP1 IRIF formation (Fig. 2.3a, b). In contrast,

e18 expression did not alter IRIF formation by γH2AX, a marker of DSBs (Fig. 2.3b and

Supplementary Fig. 2.3a). These findings indicate that e18 abrogates the recruitment of 53BP1 to DSBs in a manner dependent on H2AK15ub binding.

To determine whether e18-induced HDR stimulation was dependent on 53BP1, we

depleted 53BP1 in HEK293T cells by siRNA and measured CRISPR-mediated HDR at the BFP

reporter using dsDNA donors, as described above (Supplementary Fig. 2.3b). As previously

observed420, 53BP1 depletion enhanced CRISPR-mediated HDR approximately 2-fold, in a

manner comparable to e18 expression (Fig. 2.3c). Interestingly, e18 expression did not further

stimulate HDR in 53BP1-depleted cells, suggesting an epistatic relationship between e18

expression and 53BP1 loss with respect to HDR stimulation (Fig. 2.3c). These observations were

confirmed in e18-expressing cells using i53, a genetically-encoded inhibitor of 53BP1420 (Fig.

2.3d). In line with previous studies showing that the recruitment of both 53BP1 and RAD18 to


DSBs is mediated by the RNF8-dependent ubiquitination pathway407,421-423, e18-mediated

stimulation of HDR was not observed in cells depleted of RNF8 (Fig. 2.3c and Supplementary

Fig. 2.3b). Taken together, these findings indicate that the e18-dependent stimulation of

CRISPR-mediated HDR occurs through the RNF8/53BP1 pathway.
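
The epistasis argument above can be made explicit with a small worked comparison. If e18 expression and 53BP1 depletion acted through independent routes, their effects would be expected to combine (here assumed to combine multiplicatively, a common null model that is not stated in the text), whereas factors acting in the same pathway give a combined effect no larger than the stronger single effect. The Python sketch below encodes this reasoning; the tolerance and the example fold changes are hypothetical values chosen only to mirror the approximately 2-fold effects described above.

```python
def expected_combined_fold(fold_a: float, fold_b: float) -> float:
    """Expected fold change if two perturbations act independently
    (multiplicative null model; an assumption made for this sketch)."""
    return fold_a * fold_b


def is_epistatic(fold_a: float, fold_b: float, fold_ab: float,
                 tolerance: float = 0.2) -> bool:
    """True when the combined effect does not exceed the stronger single
    effect by more than the given relative tolerance."""
    strongest = max(fold_a, fold_b)
    return fold_ab <= strongest * (1.0 + tolerance)


# Hypothetical fold changes in HDR relative to control cells.
fold_e18 = 2.0        # e18 expression alone
fold_si53bp1 = 2.0    # 53BP1 depletion alone
fold_both = 2.1       # e18 expression in 53BP1-depleted cells

expected = expected_combined_fold(fold_e18, fold_si53bp1)
epistatic = is_epistatic(fold_e18, fold_si53bp1, fold_both)
```

In this toy example the independent-action expectation would be 4-fold, while the combined value stays near 2-fold, which is the pattern interpreted in the text as an epistatic relationship.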

2.2.5 e18 expression leads to the inhibition of NHEJ

Previous studies have shown that inhibition of 53BP1 results in a reduction of NHEJ

activity424. To determine the impact of e18 expression on NHEJ, we generated a reporter that

measures precise end-joining of Cas9-induced DSBs. Similar to a recently published reporter of

precise end-joining425, our reporter consists of a GFP gene inactivated by an intervening

cassette, which, when removed upon Cas9-dependent cleavage at 2 distinct sites, leads to

functional GFP detectable by flow cytometry (Fig. 2.4a and Supplementary Fig. 2.4a). Using

this reporter, which we named GFP-2-cut, we found that e18 expression reduced the efficiency of precise end-joining in a manner comparable to i53 expression (Fig. 2.4b). This effect was dependent on the

UBZ domain of e18, as deletion of this domain restored precise end-joining to levels

comparable to the empty vector control (Fig. 2.4b).

Repair of DSBs by NHEJ can result in the formation of small insertions and deletions

(indels). To investigate the effect of e18 expression on the frequency of indels generated by

mutagenic end-joining of DSBs, we analyzed by next-generation sequencing (NGS) the repair events occurring in e18-expressing cells at DSBs induced by Cas9 in the BFP reporter described above. In these studies, e18 expression led to a UBZ-dependent reduction in indel formation at

the site of Cas9-mediated DNA cleavage (Fig. 2.4c). Reduction in indel formation was also

observed at Cas9-induced DSBs in two distinct endogenous loci (FANCM and SPRTN), showing

that e18 promotes CRISPR-mediated HDR at the expense of NHEJ (Fig. 2.4d and

Supplementary Fig. 2.4b).

Limited resection of DSBs can favor MMEJ, a mutagenic repair pathway that competes

with NHEJ to promote the joining of DSB ends containing regions of microhomology69. To determine whether MMEJ-dependent repair of Cas9-induced DSBs is altered in e18-expressing cells, we examined by NGS the pattern of end-joining products generated at Cas9-induced DSBs in 2 endogenous loci (FANCM and SPRTN) upon e18 expression (Supplementary Table 2.3). As shown in Fig. 2.4e,f and Supplementary Fig. 2.4c-e, e18 expression promoted the formation of

MMEJ-dependent repair products (microhomology-mediated deletions) at the expense of

NHEJ-mediated repair products (insertions and deletions lacking microhomology), consistent with a possible role for e18 in stimulating the processing of DSB ends necessary for MMEJ.

Overall, these findings further support the notion that e18 inhibits NHEJ by promoting DSB resection, which is required for both MMEJ and HDR pathways.
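The distinction between microhomology-mediated deletions and deletions lacking microhomology can be illustrated with a minimal sketch. It assumes, hypothetically, that each deletion is described by its start position and length on the reference amplicon and that at least 2 bp of microhomology are required to call a product MMEJ-like; the function names and the threshold are illustrative and are not taken from the analysis software used in this study.

```python
def microhomology_length(ref: str, start: int, length: int) -> int:
    """Return the number of bases of microhomology flanking a deletion.

    The deletion removes ref[start:start + length]. A deletion has k bases of
    microhomology when it can be shifted by k positions along the reference
    without changing the resulting repair product.
    """
    end = start + length
    if length <= 0 or start < 0 or end > len(ref):
        raise ValueError("deletion must lie within the reference")
    # Bases by which the deletion can be shifted to the right.
    right = 0
    while right < length and end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    # Bases by which the deletion can be shifted to the left.
    left = 0
    while left < length and start - left - 1 >= 0 and ref[start - left - 1] == ref[end - left - 1]:
        left += 1
    return min(left + right, length)


def classify_deletion(ref: str, start: int, length: int, min_mh: int = 2) -> str:
    """Label a deletion as MMEJ-like or NHEJ-like (min_mh is an assumed cutoff)."""
    mh = microhomology_length(ref, start, length)
    return "MMEJ-like" if mh >= min_mh else "NHEJ-like"


# Toy reference: two copies of "CGG" separated by "ATCT".
reference = "TTACGGATCTCGGTCC"
print(classify_deletion(reference, 3, 7))  # removes one "CGG" copy plus the spacer
print(classify_deletion(reference, 6, 2))  # removes "AT", no flanking homology
```

In this simplified scheme, insertions would be counted as NHEJ-like products without further testing, in line with the classification used in the text.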

2.2.6 e18 stimulates CRISPR-mediated HDR at endogenous genomic loci in human cells

To test the efficiency of e18 in stimulating CRISPR-mediated HDR, we targeted several endogenous loci in multiple human cell lines using distinct DNA donor types (Fig. 2.5a and

Supplementary Table 2.4). In particular, we investigated the effect of e18 on ssODN-mediated

HDR by introducing a cancer-associated frameshift mutation into TP53 (i.e., Arg209fs*6,

626_627delGA) and new restriction sites into the EMX1 and JAK2 genes413. As shown in Fig.

2.5b, c and Supplementary Fig. 2.5a, e18 stimulated HDR using ssODN donors up to 2.7- and 3-fold in HEK293T and HeLa cells, respectively. Next, we examined the ability of e18 to enhance


CRISPR-induced tagging of the SEC61B and ACTB genes with GFP using dsDNA plasmid

donors426. In these assays, HDR events result in the expression of GFP-tagged proteins that can

be detected by flow cytometry (Supplementary Fig. 2.5b). As shown in Fig. 2.5a, d, e18

enhanced HDR at the ACTB and SEC61B loci in HEK293T cells approximately 1.7- and 2-fold,

respectively. We additionally examined whether e18 expression affected the efficiency of gene

targeting at the histone H2B (HIST1H2BK) and LMNA loci in U2OS cells, using mAG and mClover

dsDNA donor constructs, respectively298,427. These studies revealed that e18 expression

enhanced gene targeting up to 3- and 2-fold at the HIST1H2BK and LMNA loci, respectively (Fig.

2.5e and Supplementary Fig. 2.5c). In agreement with our previous observations, expression of the e18-D221A mutant did not stimulate CRISPR-mediated HDR at endogenous loci, further confirming that the HDR stimulatory effect of e18 is UBZ-dependent

(Supplementary Table 2.4). Importantly, e18 expression did not alter cell cycle progression

or cell proliferation, nor did it further sensitize cells to treatment with hydroxyurea, a

replication stress-inducing agent, thereby indicating that e18 expression does not interfere

with cellular replication (Supplementary Fig. 2.6).

Human embryonic stem cells (hESCs) have emerged as an important tool for disease

modeling, drug development, and tissue repair428. To demonstrate the utility of e18 as a

stimulator of precision genome editing in hESCs, we introduced a point mutation in the CALD1

gene using an ssODN296 and tagged the HIST1H2BK gene using an mAG dsDNA donor in the

hESC line pES12. As shown in Fig. 2.6a, e18 enhanced CRISPR-mediated HDR at the CALD1

and HIST1H2BK loci in hESCs up to 2.5- and 1.7-fold, respectively. Importantly, e18 did not

affect the expression levels of the pluripotency markers NANOG and SSEA-4, nor the viability of

hESCs (Fig. 2.6b, c and Supplementary Fig. 2.7b, c). These studies indicate that e18 can stimulate precision genome editing in hESCs without altering cellular viability and pluripotency.

2.3 Discussion

Precision genome editing by the CRISPR-Cas9 system can be enhanced by modulating the cellular DNA repair machinery in a manner that promotes more efficient and precise repair of Cas9-induced DSBs. By screening 204 ORFs encoding various components of the DNA damage response, we have identified RAD18 as one of the most robust stimulators of CRISPR-mediated HDR (Fig. 2.1). Through functional studies, we derived an enhanced RAD18 variant, designated e18, with increased HDR stimulatory activity and demonstrated the use of e18 as a genetically-encoded inhibitor of NHEJ that promotes Cas9-dependent precision genome editing in human cells using either ssODNs or dsDNA donor templates (Figs. 2.2-2.6).

The e18 variant of RAD18 lacks the SAP domain (RAD18-ΔSAP) (Fig. 2.2a), which is required for RAD18 binding to DNA replication fork structures and ubiquitination of PCNA during PRR415,416. Ubiquitinated PCNA is known to associate with translesion synthesis (TLS)

DNA polymerases, which favor the bypass of DNA lesions in a potentially error-prone manner414,429. In accordance with these observations, PCNA ubiquitination was induced in human cells upon expression of WT RAD18, but not e18 (Supplementary Fig. 2.1e). Given its inability to catalyze PCNA ubiquitination, e18 is not expected to induce TLS-dependent DNA mutagenesis. In line with these conclusions, previous studies have shown that WT, but not

ΔSAP-mutant, RAD18 is recruited to sites of UV-induced replication stress, where it promotes accumulation of the TLS polymerase POLH416. Since RAD18 variants that lack the SAP domain are defective in localizing to sites of replication damage416, e18 might be more readily available

to associate with DSBs compared to WT RAD18, which localizes to both DSBs and stalled replication forks. The greater efficiency by which e18 stimulates HDR relative to WT RAD18 (Fig.

2.2b, c) might therefore result from its enhanced specificity for DSBs. Together, these observations indicate that e18 is a separation-of-function variant of RAD18 that is deficient for

PRR but proficient for HDR. As such, e18 exhibits enhanced specificity for HDR relative to WT

RAD18.

Our work indicates that e18 stimulates CRISPR-mediated HDR in a manner dependent on its UBZ motif (Fig. 2.2b, c and Supplementary Table 2.4). Previous studies reported that the

UBZ motif of RAD18 binds to H2AK15ub with higher affinity than the UDR domain of 53BP1, suggesting that RAD18 can displace 53BP1 from chromatin near DSBs411,418. In line with these findings, we show that e18 abrogates the formation of 53BP1 IRIF through its UBZ motif (Fig. 2.3a). Furthermore, we show that e18-mediated 53BP1 displacement and stimulation of CRISPR-mediated HDR depend on D221, a key residue of the UBZ domain required for binding

H2AK13/K15ub417,418 (Figs. 2.2b,c and 2.3a,b). Similar to RAD18, RNF169 has also been reported to bind H2AK13/K15ub through its MIU domain and inhibit 53BP1 IRIF formation418,421,430, raising the possibility that RNF169 expression might also stimulate CRISPR-mediated HDR through the displacement of 53BP1 from Cas9-induced DSBs. H2AK15 ubiquitination requires the RNF8 ubiquitin ligase, which promotes the UBZ-dependent recruitment of RAD18 to sites of

DNA damage407. Since e18-mediated HDR stimulation is abrogated by RNF8 knockdown (Fig.

2.3c), the UBZ domain of e18 enhances HDR in an RNF8-dependent manner.

Inactivation of 53BP1 results in enhanced DSB resection, thus preventing NHEJ and favoring HDR420,431. In agreement with a role for e18 in inhibiting NHEJ, e18 reduces end-joining


and indel formation at Cas9-induced DSBs (Fig. 2.4). Consistent with the possible role of e18 in

promoting DNA end processing, e18 expression stimulates MMEJ at the expense of NHEJ at

Cas9-induced DSBs (Fig. 2.4e, f and Supplementary Fig. 2.4c-e). The above observations

suggest that inhibition of MMEJ factors, such as POLQ312, might further stimulate Cas9-

mediated HDR in e18-expressing cells. In line with this possibility, the loss of POLQ was recently

shown to increase the precision of gene targeting in NHEJ-deficient cells315,432. The possible role of e18 in stimulating DSB resection may also explain its ability to induce both ssODN- and

dsDNA-mediated HDR. This feature distinguishes the use of e18 from other previously

described HDR-stimulatory strategies involving expression of RAD52413 or treatment with the

RAD51-enhancing small molecule RS-1301, which are specific for ssODN- or dsDNA-dependent

HDR, respectively. Given that RAD51 and RAD52 promote invasion and annealing of resected

DSB ends406,433, respectively, stimulation of their functions might synergize with e18 expression

to further enhance CRISPR-mediated HDR. Additional stimulation of the HDR-promoting activity

of e18 might be achieved by fusing DDR factors to e18 (or its UBZ motif), thereby facilitating

their recruitment to DSBs through the binding of ubiquitinated H2A. As in the case of RAD18,

other DDR factors could also be engineered to stimulate CRISPR-mediated HDR by eliminating

domains that may attenuate their HDR-promoting activity. In this regard, some of the other hits

identified in our screen (Supplementary Table 2.1), such as the BRCA1 partner BARD1, which

recruits BRCA1 to DSBs to promote HDR199,434,435, would be attractive targets for engineered

enhancement and either co-expression or fusion with e18. Collectively, our work highlights how

rational engineering of DDR factors can enhance the efficiency of precision genome editing.


Our study demonstrates that co-delivery of e18 with Cas9 enables more efficient precision genome editing in multiple human cell lines (Figs. 2.5 and 2.6). Thus, the ability of e18 to robustly enhance ssODN-mediated HDR should also enable the production of precisely edited cell lines without the use of selectable markers, which can cause unintended perturbations of coding or non-coding genomic elements. Expression of e18 could facilitate marker-free genome editing when used in combination with HDR-enrichment approaches, such as the marker-free co-selection strategy based on modification of the ATP1A1 gene427. Since e18 can promote the efficiency of gene tagging using dsDNA donor templates, it should also facilitate the study of the cellular localization and interactions of endogenously tagged proteins.

Importantly, e18 stimulates CRISPR-mediated HDR in hESCs without altering cellular viability or pluripotency (Fig. 2.6 and Supplementary Fig. 2.7). This feature of e18 should facilitate the analysis of human genetic variants in distinct cell types derived by hESC differentiation. Given the growing number of genetic variants identified in the human population and in genetic diseases436,437, these studies would be particularly informative to decipher cellular phenotypes caused by genetic variants. Altogether, our work establishes e18 as a robust tool for enhancing

CRISPR-mediated precision genome editing in human cells, thereby aiding the study of gene function and modeling of human diseases.


2.4 Materials and methods

DNA plasmids

pX330-U6-Chimeric_BB-CBh-hSpCas9 and pX335-U6-Chimeric_BB-CBh-hSpCas9n (D10A) were gifts from Feng Zhang (Addgene plasmids #42230 and #42335, respectively). Gateway pDONR223 or pDONR201 entry vectors containing DDR ORFs were either obtained from the

Human ORFeome Library or individually cloned (Supplementary Table 2.5). Entry vectors were individually recombined into the pMSCV-HA-FLAG438,439 or pHAGE-HA440 vectors by Gateway cloning. The identity of inserts following recombination into destination vectors was confirmed by BsrGI digestion and/or Sanger sequencing. The RAD18 cDNA was cloned by Gibson assembly into pcDNA3.1(+) linearized with BamHI and EcoRI. RAD18 mutants were generated by site-directed mutagenesis of pDONR223-RAD18 with primers designed using NEBaseChanger

(http://nebasechanger.neb.com). Mutants generated in the pDONR223 backbone were recombined into the pMSCV-HA-FLAG vector by Gateway cloning or inserted into pcDNA3.1(+) by Gibson assembly and validated by Sanger sequencing. All sgRNA sequences were cloned into

B52 (Addgene #100708) as previously described441. The GFP-2-cut reporter was constructed by removing Cas9 from the lenti-SpCas9 Blast plasmid (Addgene

#104997) by XbaI and BamHI digestion. The plasmid was re-circularized using complementary oligonucleotides. Oligonucleotides were phosphorylated using PNK (NEB), annealed as previously described441 and ligated into the digested plasmid. The resulting plasmid, which was named B116 (Lenti-empty-Blast), was then digested with AgeI and BamHI (NEB) and the GFP-2-cut reporter cassette (ordered as a gBlock from IDT) was cloned using Gibson assembly. The

GFP-2-cut reporter contains a 192 nt sequence inserted into the GFP sequence that was

generated using a random DNA sequence generator

(http://www.faculty.ucr.edu/~mmaduro/random.htm). STOP codons were removed manually from the randomly generated sequence and PAM sequences for Cas9 targeting were added at the junctions with the GFP. A full list of the vectors used along with the sequences of the sgRNAs, donor DNA templates and primers used for PCR amplification of genomic DNA loci or generation of RAD18 mutants can be found in Supplementary Table 2.6 and 2.7.
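The design of the intervening sequence can also be sketched programmatically. The example below draws random codons and skips the three stop codons, yielding a 192-nt sequence with no stop codon in one reading frame. It is only an illustration under stated assumptions: in this study the sequence came from an online generator and stop codons were removed manually, and the function names, the fixed seed and the choice of reading frame 0 are hypothetical.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def random_stop_free_sequence(length_nt: int = 192, seed: int = 0) -> str:
    """Build a random DNA sequence without stop codons in frame 0."""
    if length_nt % 3 != 0:
        raise ValueError("length must be a multiple of 3 to preserve the reading frame")
    rng = random.Random(seed)  # fixed seed so the design is reproducible
    codons = []
    while len(codons) < length_nt // 3:
        codon = "".join(rng.choice("ACGT") for _ in range(3))
        if codon not in STOP_CODONS:  # discard stop codons instead of editing them by hand
            codons.append(codon)
    return "".join(codons)


def in_frame_stops(seq: str) -> list:
    """Return the frame-0 positions of any stop codons in seq."""
    return [i for i in range(0, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


insert = random_stop_free_sequence()
print(len(insert), in_frame_stops(insert))
```

PAM sequences at the junctions with GFP would still need to be added separately, as described above.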

Cell culture

U2OS, HeLa and HEK293T cells were obtained from American Type Culture Collection

(ATCC). Cell lines were cultured in DMEM supplemented with 10% Fetalgro bovine growth serum (BGS, RMBIO) and 1X penicillin-streptomycin (ThermoFisher Scientific). Cells were grown at 37°C with 5% CO2. To generate cell lines expressing the BFP reporter, HEK293T and HeLa cells were transduced with lentiviruses carrying the BFP reporter under the control of an EF1α promoter (Addgene #71825) and sorted by flow cytometry to produce a pure population of

BFP-expressing cells. Single clones were generated from these populations and the clone exhibiting the highest HDR efficiency upon CRISPR-mediated targeting, as measured by BFP to

GFP conversion, was selected for the ORF screen. Similar experiments were conducted in U2OS cells to produce a population of BFP-expressing cells. HEK293T cells with the GFP-2-cut reporter were generated by lentiviral infection. Cells were then selected with blasticidin and transfected with sgRNAs to induce Cas9-mediated excision of the cassette within the GFP sequence. Diploid hESCs pES12 were grown on 6-well plates treated with Geltrex Matrix (A1413302,

ThermoFisher Scientific), with complete StemFlex medium (A3349401, ThermoFisher Scientific).

Cells were grown at 37°C in a humidified incubator with 5% CO2. The medium was replaced

every day. Cell cultures were maintained 4–6 days until ~80% confluency and subcultured at a

1:6 to 1:10 dilution. Adherent cells were dissociated using TrypLE Express Enzyme (12605‐036,

ThermoFisher Scientific). The medium was supplemented with 10 µM Rho-associated kinase (ROCK) inhibitor Y-27632 (S1049, Selleckchem) for one day after cell splitting in order to increase cell survival.

ORF screen

BFP+ HEK293T cells were seeded at 50%–70% confluency into 24-well plates and transfected by mixing SpCas9 (250 ng), an sgRNA (250 ng) and a vector expressing a DDR ORF or an empty control plasmid (79 fmol), along with either a plasmid HDR donor (500 ng) or an ssODN (4 pmol). The cells were collected 3 days after transfection and analyzed by flow cytometry for GFP+ cells using a BD LSRFortessa. To overcome variations due to transfection efficiency and plate effect, multiple controls were used per plate and the percentage of GFP+ cells calculated for each sample was normalized to the mean of all the samples on a single plate. Biological duplicates were performed on a separate day, with cells from different passages and the location of each DDR factor on the plate changed to avoid positional effects. Serial validation steps were performed: first, a low cutoff of HDR modulation (<0.75- or >1.25-fold) was utilized for pre-selection of the top hits; second, the identified hits were taken forward and tested in biological triplicates in BFP+ HEK293T cells; third, the hits that continued to reproduce in these trials were then tested in three independent experiments in BFP+ HeLa cells.
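The per-plate normalization and the pre-selection cutoff can be expressed as a short sketch. The well labels, the example values and the function names below are hypothetical; only the normalization to the plate mean and the 0.75- and 1.25-fold thresholds come from the procedure described above.

```python
from statistics import mean


def normalize_plate(gfp_percent_by_sample: dict) -> dict:
    """Divide each sample's %GFP+ value by the mean of all samples on the plate."""
    plate_mean = mean(gfp_percent_by_sample.values())
    return {name: value / plate_mean for name, value in gfp_percent_by_sample.items()}


def preselect_hits(fold_by_sample: dict, low: float = 0.75, high: float = 1.25) -> dict:
    """Keep samples whose normalized HDR value falls outside the low/high cutoffs."""
    return {name: fold for name, fold in fold_by_sample.items() if fold < low or fold > high}


# Made-up %GFP+ values for four samples on one plate.
plate = {"ORF_A": 2.0, "ORF_B": 4.0, "ORF_C": 3.0, "ORF_D": 3.0}
normalized = normalize_plate(plate)
hits = preselect_hits(normalized)
print(sorted(hits))
```

Because every value is expressed relative to its own plate, plate-to-plate differences in transfection efficiency cancel out before replicates from different days are compared.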

Antibodies


We utilized the following primary antibodies: anti-53BP1 (A300-273A, Bethyl

Laboratories), anti-FLAG (A2220, Sigma-Aldrich), anti-RAD18 (9040, Cell Signaling Technology), anti-RNF8 (sc-271462, Santa Cruz Biotechnology), anti-PCNA antibody (PC10, MA5-11358,

ThermoFisher Scientific), anti-ubiquityl-PCNA (Lys164) (D5C7P, 13439, Cell Signaling

Technology), anti-human/mouse SSEA-4, mouse IgG (MAB1435, R&D Systems), and anti-human/mouse NANOG, goat IgG (AF1997, R&D Systems). We utilized the following secondary antibodies: HRP-conjugated AffiniPure goat anti-rabbit IgG (Jackson ImmunoResearch), HRP-linked sheep anti-mouse IgG (NA931, GE Healthcare), Alexa Fluor 488 goat anti-mouse and anti-rabbit IgG (ThermoFisher Scientific), Alexa Fluor 647 goat anti-rabbit IgG (ThermoFisher

Scientific), Alexa Fluor 555 goat anti-mouse and anti-rabbit (ThermoFisher Scientific).

Western blotting

Cell pellets were washed in PBS, resuspended in lysis buffer (100 mM Tris pH 6.8, 4%

SDS and 1.716 M 2-mercaptoethanol) and incubated for 10 min at 95°C. Proteins from the cell extracts were quantified using the Bradford Assay (Bio-Rad). Equal quantities of the extracts were diluted in 1X loading buffer (NuPage, ThermoFisher Scientific) and heated for 5 min at

95°C. Samples were run on an 8% polyacrylamide gel at 160 V in tris-glycine buffer. Gels were subsequently transferred onto nitrocellulose membranes, which were then blocked for 30 min with 5% milk (Bio Basic) in TBS 0.1% Tween-20 (TBS-T). Membranes were then incubated with primary antibodies in TBS-T supplemented with 1% milk overnight at 4°C. Membranes were then washed three times in TBS-T and incubated for 1 h with IgG secondary antibodies coupled with HRP at 1:1000 dilution in TBS-T/1% milk. Membranes were subsequently washed three more times in TBS-T and the HRP signal was detected using SuperSignal West Pico


Chemiluminescent Substrate (ThermoFisher Scientific) and autoradiography films (Southern

Labware). All uncropped gels are available in Supplementary Fig. 2.8.

Immunofluorescence microscopy

U2OS cells were grown on glass coverslips, fixed and permeabilized simultaneously in

2% (w/v) paraformaldehyde, 0.25% (v/v) Triton X-100 in PBS for 30 min at room temperature.

Cells were blocked with 3% BSA in PBS-T for 30 min at room temperature. Cells were then incubated with the primary antibody diluted in blocking buffer overnight at 4°C. Cells were next washed with PBS and then incubated with secondary antibodies diluted in blocking buffer at room temperature. The coverslips were mounted onto glass slides with Fluoroshield with DAPI (Sigma-Aldrich). Confocal images were taken using a Nikon A1 confocal microscope. Image collection, processing and analysis were performed in the Confocal and Specialized Microscopy

Shared Resource of the Herbert Irving Comprehensive Cancer Center at Columbia University

Irving Medical Center. pES12 cells edited at the HIST1H2BK locus with the mAG tag were sorted for mAG expression on a Bio-Rad S3e cell sorter. mAG-positive sorted cells were grown at low density in a 24-well plate until colonies formed, following which they were fixed with 4% PFA for 20 min at room temperature. The cells were then permeabilized with DPBS and 3% donkey serum with 0.1% Triton-X for 30 min at room temperature. Primary antibodies in blocking solution were then added to the cells and incubated overnight at 4°C. Cells were then washed in PBST (0.1% Tween-20). Secondary antibodies and Hoechst in blocking solution were subsequently added to the cells and incubated for 1 h at room temperature. Cells were then washed with PBST and imaged using an Olympus IX-71.


RNA interference

All siRNAs employed in this study were single-duplex siRNAs purchased from Horizon

Dharmacon. RNA interference (RNAi) transfections were performed using HiPerFect

Transfection Reagent (Qiagen) in a forward transfection mode. The individual siRNA duplexes used were targeting 53BP1/TP53BP1 (D-003549-01, Horizon Dharmacon), control firefly luciferase438 (D-002050-01-20, Horizon Dharmacon) and RNF8442 (Horizon Dharmacon). Except when stated otherwise, siRNAs were transfected 48 h before cell processing. The sequences of all the siRNAs used can be found in Supplementary Table 2.7.

UV irradiation

HEK293T cells were transfected with pcDNA3 vectors either empty or expressing WT

RAD18 or e18. Two days after transfection, cells were exposed to 40 J/m² or 60 J/m² of UV radiation using a Stratalinker 2400. Cells were harvested 4 h after UV irradiation.

RFLP-based HDR assay

The RFLP assays were performed to detect HDR events at the EMX1 and JAK2 loci, as detailed previously441. Briefly, three days after transfection, HEK293T or HeLa cells were harvested. The cell pellet was resuspended in the Quick Extract DNA Extraction Solution

(Epicentre) and heated sequentially at 65°C for 5 min and 95°C for 5 min to isolate genomic

DNA (gDNA). The isolated gDNA was quantified using Nanodrop, diluted in water and stored at

−20°C or directly used in PCR reactions. PCR reactions were performed using the genomic DNA.

The following oligonucleotides were utilized for this reaction: EMX1 (EMX1 PCR Forward and

Reverse, Tm = 72°C) and JAK2 (JAK2 PCR Forward and Reverse, Tm = 72°C). The restriction

digestion was performed with 2 µl of the PCR reaction mix supplemented with CutSmart Buffer

(NEB) and one unit of PmeI. The restriction digestion was carried out for 1 h 30 min at 37°C, following which the samples were loaded onto a 6% TBE polyacrylamide gel. Gel pictures were taken using LI-COR Odyssey. The band intensity was quantitated using ImageJ (v.1.51m9, http://imagej.nih.gov/ij). The percentage of HDR was calculated using the equation ((b + c) / (a + b + c)) × 100, in which ‘a’ is the intensity of the uncleaved band and ‘b’ and ‘c’ are the intensities of the cleavage products.
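The band-intensity calculation amounts to dividing the summed intensity of the two cleavage products by the total intensity of all three bands. A minimal sketch follows; the function name and the example intensities are hypothetical, and the input is assumed to be background-subtracted band intensities exported from ImageJ.

```python
def rflp_hdr_percent(a: float, b: float, c: float) -> float:
    """Percentage of HDR from RFLP band intensities.

    a: intensity of the uncleaved band
    b, c: intensities of the two cleavage products
    """
    total = a + b + c
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * (b + c) / total


# Made-up intensities: 60 uncleaved, 25 and 15 for the cleavage products.
print(rflp_hdr_percent(60, 25, 15))  # 40.0
```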

Fluorescence-based HDR and NHEJ assays

The effect of e18 in cells with 53BP1 and RNF8 depleted by siRNA was measured in BFP+

HEK293T cells. Briefly, BFP+ HEK293T cells were transfected with 10 nM siRNA using HiPerFect

Transfection Reagent (Qiagen). 24 h later, the cells were transfected with pcDNA3 vectors, either empty or expressing e18, using polyethylenimine (PEI). 48 h after plasmid transfection, the cells were trypsinized and the percentage of GFP+ cells was analyzed using the BD

LSRFortessa flow cytometer. The targeting of HIST1H2BK and LMNA was performed as previously described298,427. The SEC61B and ACTB targeting assays were performed in HEK293T cells by transfection of the following plasmids: e18 vector or empty control vector (280 ng),

Cas9-P2A-mCherry (250 ng), sgRNA vector (250 ng), dsDNA donor (500 ng). Edited cells were measured by flow cytometry on a BD LSRFortessa 3 days after transfection, and the results were analyzed using FlowJo v10 software. For editing in hESCs, cells were nucleofected at 80% confluency on a 6-well plate using the human embryonic stem cell Nucleofector Kit (VVPH-

5012, Lonza) for the Nucleofector 2b device (AAB-1001, Lonza). One well was used per nucleofection reaction. Cells were dissociated with TrypLE express enzyme, centrifuged 4 min at


1300 rpm and then resuspended in 100 µl of the nucleofection mix containing hESC nucleofection buffer with the following plasmids: e18 vector or empty control vector (4 µg),

Cas9-P2A-mCherry (2.5 µg), HIST1H2BK sgRNA (1 µg), HIST1H2BK-mAG plasmid donor (5 µg).

The complete reaction mix was added to the electroporation cuvette and the cells were nucleofected using program A-23 of the Nucleofector 2b device. 48 h after nucleofection, cells were dissociated using TrypLE, filtered to obtain a single-cell suspension, and subjected to FACS on the Bio-Rad S3e for mCherry-expressing cells. The number of sorted mCherry-positive cells was kept consistent between conditions within each trial. During sorting, cells were kept at 4°C in

StemFlex supplemented with ROCK inhibitor. mCherry-expressing cells were then reseeded and left to grow for 4-5 days, dissociated at a confluence of approximately 50% and analyzed for percentage of mAG+ cells on the BD LSRFortessa to determine HDR efficiency. The raw data of all HDR assays targeting endogenous loci in HEK293T, U2OS and HeLa cells and hESCs can be found in Supplementary Table 2.4. Precise NHEJ was measured in HEK293T cells harboring the

GFP-2-cut reporter generated as described above. 24-well plates were seeded at approximately

70% cell confluency and were reverse transfected by mixing 250 ng of sgRNA targeting the reporter, 250 ng of Cas9, along with 280 ng of empty vector or e18 expressing vector. Precise

NHEJ was quantified as the percentage of GFP+ cells measured by flow cytometry analysis. Flow cytometry analyses were conducted in the Herbert Irving Comprehensive Cancer Center Flow

Cytometry Shared Resource.

Next-generation sequencing-based HDR and end-joining assays

NGS analysis was performed for gene editing experiments at the BFP reporter and at the TP53, CALD1, FANCM and SPRTN loci. Five days after transfection with Cas9, sgRNAs


and/or DNA donors, cells were collected and their genomic DNA (gDNA) was isolated as

detailed above. Sequencing libraries were prepared using primers listed in Supplementary

Table 2.7. The protocol outlined herein is modified from a protocol described elsewhere68. In

the first PCR step, the targeted gene was amplified from 100 ng of gDNA in a 25 µl reaction with

Q5 Master Mix (M0494L, NEB) and 500 nM final concentration of forward and reverse primers.

This step was omitted for experiments in which the FANCM and SPRTN genes were targeted, as they did not include the delivery of donor molecules. The thermal cycler program for the first PCR was as follows: 1 min at 98°C, 35 cycles × 10 s at 98°C, 20 s at

66°C, 30 s at 72°C, and 2 min at 72°C. The second PCR was performed to add the Illumina P5

and P7 adaptors. The product from the first PCR was diluted 100X and 2 µl of this dilution was used as template. The thermal cycler program for the second PCR was as follows: 1 min at

98°C, 35 cycles × 10 s at 98°C, 20 s at 60°C, 30 s at 72°C, and 2 min at 72°C. A third PCR was

performed to add index barcodes to each sample. For this PCR, the product of the first PCR was

diluted 100X and 8 µl of this dilution was used as template in a 25 µl reaction with Q5 Master

Mix and 500 nM final concentration of each of the forward (i5) and reverse (i7) primers. The

thermal cycler program for the third PCR was as follows: 1 min at 98°C, 12 cycles × 10 s at

98°C, 20 s at 60°C, 30 s at 72°C, and 2 min at 72°C. PCR amplifications were verified using

2% agarose gels in TAE. The indexed amplicons from PCR #3 were pooled and gel purified.

Gel purified samples were sequenced at the Genome Sciences Facility at The Pennsylvania

State College of Medicine. Data were analyzed using software described previously443. The sequences and allelic frequencies of all the FANCM and SPRTN variants identified by NGS are available in Supplementary Table 2.3.
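For bookkeeping, the three cycling programs described above can be written down as structured data. The sketch below does so and sums the programmed hold times; the dictionary keys and the function name are hypothetical, and the totals exclude ramping between temperatures, so actual run times will be longer.

```python
# Hold times in seconds and temperatures in °C, as listed in the protocol.
# Denaturation is at 98°C and extension at 72°C in all three programs.
PCR_PROGRAMS = {
    "PCR1": {"initial_s": 60, "cycles": 35, "denature_s": 10, "anneal_s": 20,
             "anneal_temp_c": 66, "extend_s": 30, "final_s": 120},
    "PCR2": {"initial_s": 60, "cycles": 35, "denature_s": 10, "anneal_s": 20,
             "anneal_temp_c": 60, "extend_s": 30, "final_s": 120},
    "PCR3": {"initial_s": 60, "cycles": 12, "denature_s": 10, "anneal_s": 20,
             "anneal_temp_c": 60, "extend_s": 30, "final_s": 120},
}


def hold_time_seconds(program: dict) -> int:
    """Sum of programmed hold times, ignoring temperature ramping."""
    per_cycle = program["denature_s"] + program["anneal_s"] + program["extend_s"]
    return program["initial_s"] + program["cycles"] * per_cycle + program["final_s"]


for name, program in PCR_PROGRAMS.items():
    print(name, hold_time_seconds(program) / 60, "min")
```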

mRNA editing experiments

Cas9 mRNA was obtained from TriLink (L-7206). sgRNA targeting the BFP reporter was generated by in vitro transcription using the HiScribe™ Quick T7 High Yield RNA Synthesis Kit

(E2050, NEB) following the manufacturer’s protocol. e18 and e18-D221A mRNAs were produced using the mMESSAGE mMACHINE T7 ULTRA Kit (ThermoFisher Scientific), according to the manufacturer's instructions. Oligonucleotide sequences used to produce the sgRNA are available in Supplementary Table 2.6.

Cell cycle analysis

HEK293T cells transfected with the indicated plasmids were collected 2 days after transfection, centrifuged and washed in PBS, fixed in 70% ethanol and stored overnight at

−20°C. At the time of analysis, the cells were centrifuged, washed twice in PBS, resuspended in

PBS with RNase (0.1 mg/ml) and propidium iodide (10 µg/ml), and incubated at 37°C for 15-30 min in the dark. Cell cycle distribution was determined using the BD LSRFortessa machine and

FlowJo v9 software.

Cell proliferation and viability assays

HEK293T and U2OS cells transfected with the indicated plasmids were seeded at 2,000 cells/well and 1,000 cells/well, respectively, in a 96-well plate and monitored over 8 days.

During the time course, cells were passaged into a larger plate upon reaching 70% confluency, to allow for continued proliferation. Cell proliferation was monitored across different time points by collecting the cells and quantifying them using the Countess II Automated Cell

Counter (ThermoFisher Scientific) following the manufacturer’s protocol. For survival assays

upon HU treatment, U2OS cells seeded in triplicate at 4,000 cells/well in 12-well plates were treated with DMSO or with the indicated concentrations of HU 16 h later. 7 days after treatment, cells were fixed and stained using crystal violet solution (0.5% [w/v] in 20% methanol). For quantification, bound crystal violet was dissolved in 10% (v/v) acetic acid, and absorbance of 1:50 dilutions was measured at 595 nm using the SpectraMax iD3 microplate reader (Molecular Devices). For cell viability studies in hESCs, diploid pES12 cells were electroporated with Cas9, sgRNA targeting the HIST1H2BK locus and either e18 or an empty vector as control. As a positive control for apoptosis, cells were treated with 50 µM of cisplatin

24 h after plating. 5 days after nucleofection, apoptosis was assayed using the Annexin V-FITC

Apoptosis Staining kit (Abcam) following the manufacturer’s protocol. Cells were subjected to flow cytometry using BD LSRFortessa. Edited pES12 clones (mAG-positive), following treatment with empty vector as control or e18, were sorted as single clones into each well of a 96-well plate using the Influx cell sorter machine. The viability of the clones was monitored over 14 days and the number of viable clones on day 14 was evaluated for each treatment condition.


2.5 Figures

Figure 2.1: ORF screen to identify stimulators of CRISPR-mediated HDR.


a) Schematic of the BFP reporter utilized for the HDR assays. The sequences of the unedited (BFP) and edited (GFP) loci are shown. The PAM sequence is underlined and the site of Cas9-induced DNA cleavage is indicated by red arrows. b) Representative flow cytometry plots of HEK293T cells carrying the BFP reporter shown in a targeted with Cas9 with or without transfection of an ssODN donor containing the GFP sequence indicated in a. c) Experimental workflow for the arrayed ORF screen conducted in this study. HEK293T cells carrying the BFP reporter shown in a were seeded in 24-well plates and transfected with Cas9, sgRNA, ssODN or dsDNA donors, in combination with individual human DDR ORFs (204). The percentage of GFP+ HDR events induced by the expression of each ORF was quantified by flow cytometry. d) Graphical representation of the HDR levels for 204 DDR ORFs. HDR fold change corresponds to the average of the HDR values obtained using both ssODN and dsDNA donors for each ORF relative to control (HDR fold change = 1). Each line corresponds to a single ORF. The top HDR stimulatory ORF (RAD18) is highlighted in orange. e) Scatter plot of the HDR fold change (relative to control) mediated by ssODN (x-axis) and dsDNA (y-axis) donors for 204 DDR ORFs. Each data point corresponds to the mean HDR fold stimulation of two biological replicates obtained for a single ORF using ssODN or dsDNA donors. RAD18 is highlighted in orange. f-g) CRISPR-mediated HDR levels measured using the BFP reporter stably integrated in the indicated cell lines following transfection with ssODN (f) or dsDNA donor (g), and vectors expressing Cas9 along with WT RAD18 or an empty vector control. 72 h post-transfection, cells were analyzed by flow cytometry for GFP fluorescence. The values of individual experiments were normalized to the empty vector control condition (dashed line) and presented as the mean ± SEM (n≥3).
Statistical significance was calculated using a one-way analysis of variance with Tukey’s multiple comparison test with single pooled variance (*p < 0.05, **p < 0.01, ***p < 0.001).
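
The combined HDR fold change described for panel d can be illustrated with a minimal Python sketch. It assumes that the two replicate fold-change values are first averaged per donor type and that the two donor means are then averaged, which is consistent with the RAD18 entries in Supplementary Table 2.1 used as input here. The function name and data layout are hypothetical and serve only as an illustration.

```python
from statistics import mean


def combined_hdr_fold_change(ssodn_trials, dsdna_trials):
    """Average replicate fold changes per donor, then average the two donor means."""
    ssodn_mean = mean(ssodn_trials)
    dsdna_mean = mean(dsdna_trials)
    return ssodn_mean, dsdna_mean, mean([ssodn_mean, dsdna_mean])


# RAD18 HDR fold-change values (relative to control) from Supplementary Table 2.1
rad18_ssodn = [1.454545455, 1.7]
rad18_dsdna = [2.17557252, 1.396825397]

ssodn_mean, dsdna_mean, combined = combined_hdr_fold_change(rad18_ssodn, rad18_dsdna)
print(f"ssODN mean: {ssodn_mean:.4f}")
print(f"dsDNA mean: {dsdna_mean:.4f}")
print(f"ssODN + dsDNA mean: {combined:.4f}")
```

With these inputs the sketch reproduces the tabulated RAD18 means (about 1.58 for ssODN, 1.79 for dsDNA and 1.68 combined).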

Figure 2.2: Analysis of RAD18 domains required to promote CRISPR-mediated HDR.

a) Schematic diagram of wild-type RAD18 and its mutant variants generated in this study. b-c) HDR levels measured using the BFP reporter stably expressed in HEK293T cells following transfection of ssODN (b) or dsDNA donor (c), and vectors expressing Cas9 along with WT or mutant RAD18 variants, or an empty vector control. Experiments were conducted as described in Fig. 2.1 f, g. The values of individual experiments were normalized to the empty vector condition and presented along with the mean ± SEM (n≥3). Statistical significance was calculated using a one-way analysis of variance with Tukey’s multiple comparison test with single pooled variance (*p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001).
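
The mean ± SEM values reported throughout these legends can be computed as in the following minimal sketch. It assumes that the SEM is the sample standard deviation (n − 1 denominator) divided by the square root of n; the legends do not state which estimator was used, so this is an assumption. The helper name is hypothetical, and the input values are the three normalized RAD18 trials (HEK293T, ssODN donor) listed in Supplementary Table 2.2.

```python
import math
from statistics import mean, stdev


def mean_and_sem(values):
    """Return the mean and the standard error of the mean (sample SD / sqrt(n))."""
    n = len(values)
    if n < 2:
        raise ValueError("SEM requires at least two values")
    return mean(values), stdev(values) / math.sqrt(n)


# Normalized HDR fold-change values for RAD18 (HEK293T, ssODN donor) from
# Supplementary Table 2.2; each value is already relative to the empty vector.
rad18_trials = [1.45454545, 1.7, 1.37827715]

avg, sem = mean_and_sem(rad18_trials)
print(f"mean ± SEM = {avg:.3f} ± {sem:.3f} (n={len(rad18_trials)})")
```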

Figure 2.3: Interplay between e18 (RAD18-∆SAP) and 53BP1 at DSBs.

a) Representative images showing IR-induced foci of the indicated proteins in U2OS cells transfected with vectors expressing FLAG-tagged e18, e18-D221A mutant or an empty vector control. Cells were fixed 1 h after irradiation with a 5 Gy dose and stained with antibodies that recognize the FLAG tag (green) or 53BP1 (red). Merged images with DAPI staining (blue) are also shown. Arrows indicate cells expressing e18 or e18-D221A. b) Graphical representation of the percentage of FLAG-positive cells with >5 53BP1 or γH2AX foci, under the conditions described in a. Individual experimental data are shown along with the mean ± SEM (n=3). c) HDR values in HEK293T cells carrying the BFP reporter and transfected with siRNAs targeting 53BP1 or RNF8, or with a non-targeting control siRNA. 24 h after siRNA transfection, cells were transfected with Cas9/sgRNA and dsDNA donor and the percentage of GFP+ cells was then determined after 72 h by flow cytometry. The values of three individual experiments were normalized to the siRNA control condition and presented along with the mean ± SEM (n=3). d) HDR levels in HEK293T cells carrying the BFP reporter transfected with Cas9 and dsDNA donor in the presence of e18, i53, or both. HDR frequency was determined by quantifying the percentage of GFP+ cells, as described in c. The values of individual experiments were normalized to the control and presented along with the mean ± SEM (n=3). Statistical significance of the data shown in b, c and d was calculated using a one-way analysis of variance with Tukey’s multiple comparison test with single pooled variance (*p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001).

Figure 2.4: Measurement of indel levels at Cas9-induced DSBs after expression of e18.

a) Schematic of the GFP-2-cut reporter assay utilized to measure precise end-joining. b) Precise end-joining (EJ) levels measured using the GFP-2-cut reporter shown in a stably integrated in HEK293T cells transfected with vectors expressing Cas9 and sgRNA, along with empty vector control, e18, e18-∆UBZ or i53. The frequency of precise end-joining was determined by quantifying the percentage of GFP+ cells by flow cytometry. Individual experiments were normalized to the empty vector condition and presented along with the mean ± SEM (n=4). Statistical significance was calculated using a one-way analysis of variance with Tukey’s multiple comparison test with single pooled variance (*p < 0.05, **p < 0.01). c) Indel levels measured by NGS at the BFP reporter integrated in HEK293T cells transfected with Cas9 and dsDNA donor along with e18, e18-∆UBZ mutant or an empty vector control. Values are presented as percentage of indels per edited sequences (Indels + HDR). Individual data are presented along with the mean ± SEM (n=3). Statistical significance was calculated as in b (*p < 0.05). d) Fold change in indel levels measured by NGS at the indicated loci in HEK293T cells transfected with vectors expressing Cas9/sgRNA along with e18 or an empty vector control. The calculated indel values of individual experiments were normalized to the empty vector condition (dashed line) and presented as the mean ± SEM (n=3). e) Pattern of indels induced upon treatment of HEK293T cells with Cas9 and sgRNA targeting FANCM along with either an empty vector control or e18, as determined by NGS. The wild-type FANCM sequence is shown on the top along with the 4 most frequent mutant alleles resulting from the repair of a Cas9-induced DSB (brown dashed line) in the FANCM locus. Sequences of microhomology (M.H., green boxes) and PAM sequence (purple line) are indicated. The repair outcome (deletion size) and the length of M.H. utilized for repair are specified. The frequency of the indicated repair products obtained upon transfection of an empty vector control or e18 is represented as fraction of total edited sequences along with the mean ± SEM (n=3). Non-M.H.-dependent deletions and M.H.-mediated deletions are depicted in red and blue, respectively. f) Fold change in the mean of all MMEJ- and NHEJ-induced repair events upon targeting the FANCM locus shown in e with Cas9 in cells treated with e18, relative to empty vector control. Each data point represents the average, across 3 e18-treated biological replicates, of the fraction of each variant generated by MMEJ (blue) or NHEJ (red), as shown in e, normalized to the empty vector control condition (dashed line). Data are presented as the mean ± SEM (n=3). Statistical significance was calculated with a paired t-test comparing the overall MMEJ and NHEJ fold change (**p < 0.01). The full list of repair products generated at the FANCM targeted site, their frequency and proportion in the edited allele population are compiled in Supplementary Table 2.3.
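
The normalization used in panels c and d (indels expressed as a percentage of edited sequences, followed by a fold change relative to the empty vector) can be sketched as follows. The function names are hypothetical and the read counts are invented purely for illustration; they are not data from this study.

```python
def indel_percentage(indel_reads, hdr_reads):
    """Percentage of indel reads among all edited reads (indels + HDR)."""
    edited = indel_reads + hdr_reads
    if edited == 0:
        raise ValueError("no edited reads")
    return 100.0 * indel_reads / edited


def fold_change(treated, control):
    """Value of the treated condition relative to the control condition."""
    return treated / control


# Hypothetical NGS read counts, for illustration only (not data from this study)
control_pct = indel_percentage(indel_reads=900, hdr_reads=100)
e18_pct = indel_percentage(indel_reads=800, hdr_reads=200)
relative_indels = fold_change(e18_pct, control_pct)

print(f"control: {control_pct:.1f}% indels per edited sequences")
print(f"e18: {e18_pct:.1f}% indels per edited sequences")
print(f"fold change (e18 / control): {relative_indels:.2f}")
```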

Figure 2.5: Targeting of endogenous loci in human cells using e18.

a) Schematics of the gene targeting experiments conducted using either ssODNs carrying nucleotide substitutions or dsDNA donor plasmids with fluorescent tags. b-c) Gene targeting efficiency at the TP53 (green) and JAK2 (purple) loci in HEK293T (b) and EMX1 (blue) and JAK2 (red) loci in HeLa (c) cells transfected with either TP53, JAK2, or EMX1 ssODN template, the vector coding for Cas9/sgRNA, and a plasmid expressing e18 or an empty vector control. Gene targeting efficiency at the TP53 locus was determined by NGS, while gene targeting efficiency at the JAK2 and EMX1 loci was calculated by RFLP assay. HDR frequency values of individual paired experimental sets are shown on the left graph. Fold change in HDR values for e18-expressing conditions relative to the control (dashed line) are shown on the right graph, where individual experiments are presented along with the mean ± SEM (n=3). d) Gene targeting efficiency at the SEC61B (green) and ACTB (brown) loci in HEK293T cells transfected with either SEC61B-GFP or ACTB-GFP donor template, the vector coding for the sgRNA and Cas9, and a plasmid expressing e18 or an empty vector control, as determined by flow cytometry analysis of GFP+ cells. Experiments are represented as in b (n=3). e) Gene targeting efficiency at the HIST1H2BK (yellow) and LMNA (blue) loci in U2OS cells transfected with either HIST1H2BK-mAG or ACTB-Clover donor template, the vector coding for the sgRNA and Cas9, and a plasmid expressing e18 or the empty vector control, as determined by flow cytometry analysis of mAG+ or Clover+ cells. Experiments are represented as in b (n=4). The full list of HDR values for all the gene editing experiments at endogenous loci is available in Supplementary Table 2.4.

Figure 2.6: Targeting of endogenous loci in hESCs.

a) Gene targeting efficiency at the CALD1 (brown) and HIST1H2BK (yellow) loci in pES12 hESCs transfected with a CALD1 ssODN or a HIST1H2BK-mAG plasmid donor, respectively, along with the vector coding for the sgRNA and Cas9, and a plasmid expressing e18 or an empty vector control. Gene targeting efficiency at the CALD1 and HIST1H2BK loci was determined by NGS and flow cytometry analysis of mAG+ cells, respectively. Experiments were performed 5 days after transfection. HDR values are represented as in Fig. 2.5b (HIST1H2BK, n=5; CALD1, n=3). b) Percentage of viable mAG+ pES12 hESC clones 14 days after transfection with Cas9, sgRNA and mAG dsDNA donor targeting the HIST1H2BK locus, and either an empty vector control or e18. The values of individual experiments are presented along with the mean ± SEM (n=3). Statistical significance was calculated using a paired t-test. c) Immunostaining of pES12 cells edited at the HIST1H2BK locus using mAG dsDNA donors with or without e18 expression, as shown in Fig. 2.5a. Cells were stained using antibodies against the pluripotency markers NANOG (red) and SSEA-4 (cyan). mAG-tagged H2BK is shown in green. Hoechst was used to stain nuclear DNA (blue). Merged images are also shown. Experiments were performed 5 days after transfection.

Supplementary Figure 2.1: HDR stimulation and PCNA ubiquitination induced by WT and mutant RAD18 variants expressed in HEK293T cells.

a) Protein levels of WT and mutant RAD18 upon transfection of RAD18-expressing pMSCV-FLAG-HA vectors into HEK293T cells carrying the BFP reporter and subjected to the HDR assays shown in Fig. 2.2b, c. Cell lysates were analyzed by western blotting using anti-RAD18 and anti-vinculin antibodies. b) Protein levels of WT RAD18 and RAD18-ΔSAP upon transfection of RAD18-expressing pcDNA3.1 plasmid into HEK293T cells carrying the BFP reporter. Western blotting was conducted as in a. c) CRISPR-mediated HDR levels measured at the BFP reporter in HEK293T cells transfected with dsDNA donor along with pcDNA3.1 vector expressing WT RAD18 or RAD18-ΔSAP, or an empty vector control. HDR frequency was determined by quantifying the percentage of GFP+ cells, as described in Fig. 2.1f, g. The values of individual experiments were normalized to the empty vector condition (dashed line) and presented along with the mean ± SEM (n≥3). Statistical significance was calculated using Tukey’s test as in Fig. 2.1f, g (*p < 0.05). d) HDR levels measured at the BFP reporter in HEK293T cells nucleofected with Cas9/sgRNA and ssODN along with increasing concentrations of RAD18-ΔSAP or RAD18-ΔSAP-D221A mRNA. HDR frequency was determined by quantifying the percentage of GFP+ cells, as in c. The values of individual experiments were normalized to the ΔSAP-D221A mRNA condition and presented along with the mean ± SEM (n=3). Statistical significance was calculated with a paired t-test comparing the HDR values obtained using RAD18-ΔSAP mRNA versus RAD18-ΔSAP-D221A mRNA (*p < 0.05). e) Levels of PCNA and ubiquitinated PCNA in HEK293T cells transfected with vectors expressing WT RAD18, RAD18-ΔSAP or the empty vector control with or without treatment with UV radiation (40 or 60 J/m²). Western blotting was conducted using anti-PCNA, anti-ubiquitinated PCNA (K164), anti-RAD18 and anti-vinculin antibodies. Two distinct exposures (short exposure, s.e.; long exposure, l.e.) of the anti-ubiquitinated PCNA blot are shown.

Supplementary Figure 2.2: CRISPR-mediated HDR levels upon expression of RAD18 UBZ-NLS and RAD18-ΔSAP.

a) Schematic representation of RAD18-ΔSAP and the NLS-tagged UBZ (UBZ-NLS) motif of RAD18. UBZ-NLS has a FLAG tag. b) CRISPR-mediated HDR levels measured using the BFP reporter stably integrated in HEK293T cells following transfection of dsDNA donor, and vectors expressing Cas9 and sgRNA along with RAD18-ΔSAP, RAD18-UBZ (UBZ-NLS) or an empty vector control. 72 h post-transfection, cells were analyzed for GFP fluorescence to determine the frequency of HDR events. The values of individual experiments were normalized to the empty vector condition and presented along with the mean ± SEM (n=3). Statistical analysis was conducted as in Fig. 2.1f, g (*p < 0.05, **p < 0.01). c) Expression levels of FLAG-tagged UBZ-NLS in HEK293T cells. Cell lysates were subjected to western blotting using anti-FLAG and anti-vinculin antibodies.

Supplementary Figure 2.3: γH2AX foci formation and detection of 53BP1 and RNF8

a) Representative images showing IR-induced foci of the indicated proteins in U2OS cells transfected with vectors expressing FLAG-tagged e18, e18-D221A mutant or an empty vector control. Cells were irradiated with a 5 Gy dose and fixed 1 h after irradiation. Staining with anti-FLAG (green) and anti-γH2AX (red) antibodies and their merge with DAPI staining (blue) are shown. b) Detection of 53BP1 and RNF8 protein levels in whole cell lysates prepared from HEK293T cells carrying the BFP reporter, transfected with 53BP1 and RNF8 siRNAs for the experiments shown in Fig. 2.3c, and probed with the indicated antibodies.

Supplementary Figure 2.4: Analysis of indel mutation patterns at Cas9-induced DSBs.

a) Representative flow cytometry plots of one of the end-joining experiments with the GFP-2-cut reporter shown in Fig. 2.4b. b) Indel levels measured by NGS at the FANCM locus in HEK293T cells transfected with Cas9/sgRNA and ssODN along with e18, i53 or an empty vector control. The values of individual experiments were normalized to the empty vector control and presented along with the mean ± SEM (n=3). Statistical significance of the data shown was calculated using a one-way analysis of variance with Tukey’s multiple comparison test with single pooled variance (**p < 0.01). c) Frequency of the indicated FANCM indel variants shown in Fig. 2.4e upon treatment with the empty vector control or e18. Individual paired experimental sets are shown. d) Pattern of indels formed upon treatment of HEK293T cells with Cas9 and sgRNA targeting the SPRTN gene along with either an empty vector control plasmid or e18, as determined by NGS. The wild-type SPRTN sequence is shown on the top along with the 4 most frequent mutant alleles resulting from the repair of a Cas9-induced DSB (brown dashed line) in the SPRTN locus. Sequences of microhomology (M.H., green boxes) and PAM sequence (purple line) are indicated. The repair outcome (deletion/insertion size) and the length of M.H. utilized for repair are specified. The frequency of the indicated repair products obtained upon transfection of an empty vector control or e18 is represented as fraction of total edited sequences along with the mean ± SEM (n=3). Non-M.H.-dependent repair events and M.H.-mediated deletions are depicted in red and blue, respectively. e) Frequency of the indicated SPRTN indel variants upon transfection with the empty vector control or e18. Individual paired experimental sets are shown. The full list of repair products generated at the SPRTN targeted site, their frequency and proportion in the edited allele population are compiled in Supplementary Table 2.3.

Supplementary Figure 2.5: RFLP and flow cytometry analysis upon targeting of endogenous loci in cells expressing e18.

a) Representative images of the RFLP assays conducted for the experiments shown in Fig. 2.5b, c. Undigested and PmeI-digested DNA products are indicated by red and green arrows, respectively. b) Representative flow cytometry plots of one of the gene targeting experiments at the SEC61B and ACTB loci shown in Fig. 2.5d. c) Representative flow cytometry plots of one of the gene targeting experiments at the HIST1H2BK and LMNA loci shown in Fig. 2.5e.

Supplementary Figure 2.6: Analysis of the effect of e18 expression on cell proliferation.

a) Cell cycle distribution upon e18 expression in HEK293T cells, as measured by propidium iodide staining. HEK293T cells were analyzed by flow cytometry 2 days after transfection with an empty vector control, e18 or the e18-D221A mutant. Results are presented as the mean ± SEM (n=3). b) Analysis of cell proliferation upon e18 expression. HEK293T (top panel) or U2OS (bottom panel) cells were transfected with an empty vector control or e18 and plated in a 96-well plate. The number of cells per well was quantified at the indicated time points. Pooled data from 3 independent experiments are shown. Error bars represent SEM (n=3). c) Cell survival analysis on HEK293T cells transfected with the indicated plasmids following treatment with hydroxyurea (HU). The fraction of surviving cells was determined by crystal violet staining after 7 days of HU treatment. Pooled data from 3 independent experiments are shown. Error bars represent SEM (n=3).

Supplementary Figure 2.7: Analysis of apoptosis in hESCs expressing e18.

a) e18 expression in diploid hESCs pES12. Cell lysates were analyzed by western blotting using anti-RAD18 and anti-vinculin antibodies. b) Representative flow cytometry plots of apoptosis analysis conducted in pES12 cells 4 days after transfection of Cas9/sgRNA, with e18 or empty vector control. As a positive control for apoptosis, untransfected cells were treated with cisplatin (50 µM) for 24 h before analysis. Cells were stained with Annexin V and propidium iodide (PI) to evaluate apoptotic cell death by flow cytometry. c) Percentage of viable (brown), early (green) and late (red) apoptotic pES12 cells under the conditions described in b, as determined by Annexin V/PI double staining. Statistical significance was calculated using a one-way analysis of variance with Tukey’s multiple comparison test with single pooled variance (*p < 0.05, ***p < 0.001, ****p < 0.0001).

Supplementary Figure 2.8: Images of uncropped western blots.

2.6 Tables

Supplementary Table 2.1: CRISPR-mediated HDR frequency values upon expression of 204 individual DDR ORFs. Results of the ORF screen conducted as shown in Fig. 2.1. HDR fold change values for each ORF, relative to empty vector control, are shown for two independent experiments using ssODN or dsDNA donors in HEK293T cells harboring the BFP reporter.

HDR fold change (relative to control)
ORF  ssODN Trial #1  ssODN Trial #2  ssODN Mean  dsDNA Trial #1  dsDNA Trial #2  dsDNA Mean  ssODN + dsDNA Mean
MRE11  0.92249047  1.05555556  0.98902301  1.04761905  0.711864407  0.879741728  0.93438237
POLH  0.86753731  0.95238095  0.90995913  1.08433735  1.09090909  1.08762322  0.998791176
RAD1  0.950444727  0.94444444  0.94744459  1.14285714  0.949152542  1.046004843  0.996724714
POLD1  1.03436497  0.90490396  0.96963446  1.23529993  1.01205185  1.12367589  1.046655177
HUS1  0.447268107  0.88888889  0.6680785  2.0952381  1.186440678  1.640839387  1.154458942
SLX4  1.01030997  0.99851471  1.00441234  0.98823995  0.84337654  0.91580824  0.960110292
RAD17  1.146124524  1  1.07306226  1.04761905  1.186440678  1.117029863  1.095046063
H2AFX  0.98625497  0.99851471  0.99238484  0.868  0.823  0.8455  0.918942421
KU80  0.950444727  1.22222222  1.08633347  0.85714286  0.949152542  0.9031477  0.994740587
RAD51C  0.98625497  1.07652367  1.03138932  0.98823995  0.9277142  0.95797707  0.994683197
TIPIN  0.92249047  1.27777778  1.10013412  0.95238095  1.06779661  1.010088781  1.055111452
UBE2V2  1.17869497  0.92050575  1.04960036  1.11176994  1.01205185  1.06191089  1.055755626
WHSC1  1.481575604  1.27777778  1.37967669  0.95238095  1.186440678  1.069410815  1.224543753
UBE2I  0.74570498  0.98291292  0.86430895  0.98823995  1.18072716  1.08448355  0.97439625
PCNA  0.726810673  0.94444444  0.83562756  0.57142857  0.711864407  0.641646489  0.738637024
ERCC3  0.96219997  0.90490396  0.93355196  0.86470995  0.9277142  0.89621207  0.91488202
TIP60  1.285895807  1.16666667  1.22628124  1.42857143  1.186440678  1.307506054  1.266893645
MCM3  1.05841997  1.04532009  1.05187003  1.11176994  0.9277142  1.01974207  1.035806048
MCMDC2  1.42566709  1.44444444  1.43505577  1.23809524  1.186440678  1.212267958  1.323661863
MSH5  0.72164998  0.99851471  0.86008235  0.74117996  0.84337654  0.79227825  0.826180298
RNF8  1.146124524  1.11111111  1.12861782  1.14285714  1.186440678  1.164648911  1.146633364
GEN1  0.81786998  0.99851471  0.90819234  1.11176994  0.84337654  0.97757324  0.942882792
RUVBL2  1.00635324  0.81545064  0.91090194  0.57142857  0.711864407  0.641646489  0.776274216
MCM2  0.89003498  1.04532009  0.96767753  0.86470995  1.01205185  0.9383809  0.953029216
UHRF1  1.09021601  0.77253219  0.9313741  0.95238095  0.830508475  0.891444714  0.911409407
MSH2  1.03544776  0.85714286  0.94629531  1.08433735  0.96969697  1.02701716  0.986656234
RAD51AP1  1.00635324  0.90128755  0.9538204  0.85714286  1.305084746  1.081113802  1.017467099
MSH4  0.91408998  1.04532009  0.97970503  1.35882992  1.09638951  1.22760971  1.103657373
BARD1  1.229987294  1.11587983  1.17293356  1.33333333  1.305084746  1.31920904  1.2460713
ERCC5  1.10099338  1.10964333  1.10531835  0.90128755  0.88654354  0.89391555  0.999616949
CHD1  0.531130877  0.68669528  0.60891308  0.66666667  0.593220339  0.629943503  0.619428291
LIG3  1.31529851  1.33333333  1.32431592  1.08433735  0.72727273  0.90580504  1.115060479
ASF1A  1.118170267  0.90128755  1.00972891  0.85714286  0.711864407  0.784503632  0.897116271
BRCA1  0.90397351  1.15125495  1.02761423  0.94635193  0.99736148  0.9718567  0.999735468
MCMBP  0.782719187  0.72961373  0.75616646  0.57142857  0.711864407  0.641646489  0.698906475
MLH3  0.97350993  1.04029062  1.00690028  1.08154506  0.94195251  1.01174879  1.009324532
MBD4  0.950444727  0.8583691  0.90440691  0.76190476  0.949152542  0.855528652  0.879967783
UBE2B  0.85761589  1.05416116  0.95588853  0.94635193  1.21899736  1.08267465  1.019281587
FEN1  1.17407878  1.97424893  1.57416385  1.14285714  1.333333333  1.238095238  1.406129546
APTX  0.91556291  1.05416116  0.98486204  1.08154506  1.10817942  1.09486224  1.03986214
LIG4  1.298245614  1.06857143  1.18340852  0.97777778  0.987179487  0.982478633  1.082943577
GINS2  0.92715232  0.84610304  0.88662768  1.03648069  0.99736148  1.01692108  0.95177438
CLOCK  1.50877193  1.50857143  1.50867168  1.46666667  1.551282051  1.508974359  1.508823019
MTA1  0.72761194  0.57142857  0.64952026  0.9939759  0.96969697  0.98183644  0.815678346
RAD51  1.368421053  1.06857143  1.21849624  0.97777778  1.128205128  1.052991453  1.135743847
ORC6  0.91556291  0.98480845  0.95018568  0.99141631  1.3298153  1.16061581  1.055400745
KIAA0913  1.157894737  0.94285714  1.05037594  1.34444444  0.846153846  1.095299145  1.072837543
RPA4  0.90397351  0.95706737  0.93052044  0.90128755  0.83113457  0.86621106  0.89836575
SSBP1  1.01754386  0.88  0.94877193  1.22222222  0.987179487  1.104700855  1.026736392
ALKBH3  0.89238411  0.97093791  0.93166101  1.08154506  1.10817942  1.09486224  1.013261626
SMARCAL1  2.011  0.849  1.43  1.34444444  1.128205128  1.236324786  1.333162393
MMS19  1.06622517  0.998679  1.03245208  0.99141631  1.10817942  1.04979786  1.041124973
RAD52  1.309021113  1.25  1.27951056  0.99099099  1.01986755  1.005429271  1.142469914
MPG  1.14735099  1.09577279  1.12156189  1.08154506  0.94195251  1.01174879  1.066655338
DNA2  1.052631579  1.06857143  1.0606015  0.97777778  0.987179487  0.982478633  1.021540068
POLL  0.99668874  0.98480845  0.9907486  1.08154506  0.72031662  0.90093084  0.945839721
RFWD3  0.736842105  0.69142857  0.71413534  1.1  0.846153846  0.973076923  0.843606131
ERCC8  1.00827815  0.98480845  0.9965433  0.99141631  0.66490765  0.82816198  0.91235264
RBBP8  1.122807018  1.00571429  1.06426065  1.46666667  1.269230769  1.367948718  1.216104685
ALKBH2  1.01986755  1.05416116  1.03701436  0.99141631  0.83113457  0.91127544  0.974144897
DCLRE1C  1.263157895  1.00571429  1.13443609  1.22222222  0.987179487  1.104700855  1.119568473
XRCC2  1.00827815  0.4993395  0.75380882  1.03648069  1.16358839  1.10003454  0.926921681
EXO1  1.157894737  0.94285714  1.05037594  1.1  0.987179487  1.043589744  1.046982842
APEX1  0.93874172  0.998679  0.96871036  0.85622318  0.88654354  0.87138336  0.920046858
RNF168  0.701754386  1.00571429  0.85373434  0.753  1.22  0.9865  0.920117168
HDAC6  1.13576159  0.95706737  1.04641448  0.99141631  0.94195251  0.96668441  1.006549444
GINS3  0.947368421  1.06857143  1.00796993  0.85555556  1.128205128  0.991880342  0.999925134
HDAC10  0.53171642  0.38095238  0.4563344  0.90361446  0.96969697  0.93665571  0.696495057
LMNA  0.49122807  0.62857143  0.55989975  0.61111111  0.705128205  0.658119658  0.609009704
BRD8  1.12951807  1.30232558  1.21592183  0.73619632  0.93333333  0.83476483  1.025343326
MCM8  0.666666667  0.69142857  0.67904762  1.1  0.846153846  0.973076923  0.826062271
RNF20  0.85843374  0.58604651  0.72224012  0.9202454  0.93333333  0.92678937  0.824514745
MYST2  1.157894737  0.88  1.01894737  0.97777778  1.41025641  1.194017094  1.106482231
SETD5  1.11258278  0.94319683  1.02788981  1.12660944  1.27440633  1.20050789  1.114198846
SIRT1  0.526315789  0.56571429  0.54601504  0.48888889  0.846153846  0.667521368  0.606768203
EHMT2  1.00827815  0.98480845  0.9965433  0.90128755  1.05277045  0.977029  0.986786151
TCOF1  0.456140351  0.69142857  0.57378446  0.48888889  0.423076923  0.455982906  0.514883684
EP400  1.08940397  1.12351387  1.10645892  0.90128755  0.88654354  0.89391555  1.000187234
CHK1  0.947368421  0.81714286  0.88225564  0.85555556  1.269230769  1.062393163  0.972324401
SIN3A  0.99212598  0.61149826  0.80181212  1.04180064  1.12759644  1.08469854  0.943255331
CHK2  0.561403509  0.62857143  0.59498747  0.48888889  0.705128205  0.597008547  0.595998008
ING4  0.8976378  0.70557491  0.80160635  1.09967846  0.78931751  0.94449798  0.873052168
TOPBP1  1.333333333  1.2  1.26666667  0.97777778  0.987179487  0.982478633  1.12457265
HDAC7  0.90944882  1.1445993  1.02702406  1.15755627  0.84569733  1.0016268  1.01432543
RPA1  0.99032882  0.59925094  0.79478988  0.57251908  0.761904762  0.667211923  0.731000901
SETD3  1.03937008  1.17595819  1.10766413  0.98392283  0.84569733  0.91481008  1.011237107
RAD18  1.454545455  1.7  1.57727273  2.17557252  1.396825397  1.786198958  1.681735843
MORF4L1  1.00393701  1.08188153  1.04290927  0.92604502  1.0148368  0.97044091  1.006675088
SSRP1  1.114119923  0.8988764  1.00649816  1.03053435  1.015873016  1.023203684  1.014850924
OGG1  1.02755906  0.89372822  0.96064364  1.09967846  0.90207715  1.0008778  0.980760722
OTUB1  0.866537718  1.07865169  0.9725947  0.91603053  1.015873016  0.965951775  0.969273238
HDAC11  0.76807229  0.3255814  0.54682684  0.4601227  0.8  0.63006135  0.588444096
RPA2  1.052224371  0.83895131  0.94558784  1.14503817  1.142857143  1.143947656  1.044767748
ING2  1.07480315  0.92508711  0.99994513  1.09967846  1.0148368  1.05725763  1.028601378
HMCES  1.423597679  0.9588015  1.19119959  1.14503817  1.015873016  1.080455592  1.13582759
SETD7  0.92620482  0.58604651  0.75612567  0.9202454  0.53333333  0.72678937  0.741457516
WRNIP1  0.649903288  0.9588015  0.80435239  0.91603053  0.761904762  0.838967648  0.821660021
RNF138  0.96850394  0.75261324  0.86055859  1.15755627  0.95845697  1.05800662  0.959282605
FBH1  0.773694391  0.71910112  0.74639776  0.91603053  1.26984127  1.092935902  0.91966683
KIAA0182  1.01574803  1.06620209  1.04097506  0.92604502  0.95845697  0.94225099  0.991613028
ESRRA  0.649903288  1.61797753  1.13394041  0.6870229  0.761904762  0.724463832  0.92920212
ABTB1  1.01656627  0.91162791  0.96409709  0.82822086  0.93333333  0.8807771  0.922437091
SIRT2  0.742746615  0.59925094  0.67099878  0.91603053  1.142857143  1.029443839  0.850221307
AEN  1.02755906  1.05052265  1.03904085  1.09967846  1.0148368  1.05725763  1.048149239
POLI  0.835589942  0.8988764  0.86723317  0.79  0.97  0.88  0.873616587
ALDH3A2  0.98031496  1.19163763  1.0859763  1.04180064  0.84569733  0.94374899  1.014862641
POLD2  0.711798839  1.01872659  0.86526272  0.6870229  1.26984127  0.978432086  0.921847401
ARL4C  0.82677165  0.98780488  0.90728827  0.63665595  0.73293769  0.68479682  0.796042542
MCM9  0.588007737  1.13857678  0.86329226  0.80152672  0.761904762  0.78171574  0.822503999
ASH2L  0.92125984  1.06620209  0.99373097  0.81028939  1.07121662  0.940753  0.967241985
DAXX  1.640232108  1.0188  1.32951605  1.14503817  1.015873016  1.080455592  1.204985823
BCL7A  1.08661417  0.83101045  0.95881231  0.8681672  1.0148368  0.941502  0.950157156
VRK3  1.547388781  0.8988764  1.22313259  1.25954199  0.761904762  1.010723374  1.116927983
BEND3  0.8976378  1.01916376  0.95840078  0.81028939  0.78931751  0.79980345  0.879102114
PRIMPOL  0.959381044  0.53932584  0.74935344  0.6  0.634920635  0.617460318  0.683406881
CDYL  1.15748032  1.05052265  1.10400148  1.04180064  0.84569733  0.94374899  1.023875234
HELLS  1.063432836  0.95238095  1.00790689  0.90361446  1.333333333  1.118473896  1.063190395
CKAP5  1.06299213  1.08188153  1.07243683  1.09967846  1.0148368  1.05725763  1.064847228
PTPN11  1.14738806  0.57142857  0.85940832  0.9939759  0.848484848  0.921230376  0.890319346
DDX17  0.90361446  0.91162791  0.90762118  1.10429448  0.93333333  1.01881391  0.963217544
UBR2  1.14738806  1.14285714  1.1451226  0.9939759  1.090909091  1.042442498  1.09378255
DDX20  1.11023622  1.3641115  1.23717386  1.09967846  0.90207715  1.0008778  1.119025832
STAM  1.259328358  0.95238095  1.10585466  0.9939759  1.212121212  1.103048558  1.104451607
DDX6  0.81325301  0.97674419  0.8949986  1.01226994  1.06666667  1.0394683  0.967233451
PECI  1.315298507  1.14285714  1.22907783  1.08433735  1.212121212  1.148229281  1.188653553
DDX23  1.12188366  1.07861369  1.10024868  1.07125506  0.99024756  1.03075131  1.065499994
RUVBL1  1.351248638  0.9784082  1.16482842  1.13  0.83  0.98  1.07241421
DONSON  0.97138554  0.71627907  0.84383231  1.28834356  1.2  1.24417178  1.044002043
SETD6  0.886756919  1.26477158  1.07576425  0.79278565  1.019872953  0.906329302  0.991046775
DSCR3  0.84764543  0.40912933  0.62838738  1.17327935  1.03975994  1.10651965  0.867453513
HDAC1  0.811567164  0.66666667  0.73911692  1.08433735  0.848484848  0.966411099  0.852764007
FAM98A  1.03915663  1.30232558  1.1707411  1.10429448  1.06666667  1.08548057  1.128110839
EPC1  0.928983439  1.24090796  1.0849457  0.79278565  1.165569089  0.97917737  1.032061536
GPKOW  0.99722992  0.48351648  0.7403732  0.91821862  0.92423106  0.92122484  0.830799021
SETDB2  0.886756919  0.76363567  0.82519629  1.18917848  0.582784545  0.885981511  0.855588902
HSFY1  1.08864266  0.78106509  0.93485387  1.1562753  1.02325581  1.08976556  1.012309717
DNMT3A  1.309022118  1.1454535  1.22723781  1.5855713  1.311265225  1.448418263  1.337828037
ISLR  1.12188366  1.09721048  1.10954707  0.96923077  1.07276819  1.02099948  1.065273275
ING1  0.9  0.92  0.91  0.396393  0.728481  0.562437  0.7362185
MCRS1  1.08033241  1.11580727  1.09806984  0.88421053  1.07276819  0.97848936  1.0382796
BRD3  0.928983439  1.28863519  1.10880932  0.99098206  1.019872953  1.005427508  1.057118412
NAA15  1.11357341  1.63651733  1.37504537  0.64615385  1.15528882  0.90072133  1.137883351
BRD4  0.82  0.8665  0.84325  0.594589  0.437088  0.5158385  0.67954425
NAA40  0.68975069  1.09721048  0.89348059  1.34331984  0.72618155  1.03475069  0.96411564
INO80  1.140116038  0.93068097  1.03539851  0.79278565  0.874176817  0.833481234  0.93443987
NAT10  0.93905817  1.227388  1.08322308  1.02024292  0.97374344  0.99699318  1.04010813
RAD51D  0.675624319  0.57272675  0.62417554  1.18917848  0.437088408  0.813133442  0.718654489
NKAP  1.06371191  1.17159763  1.11765477  0.91821862  1.02325581  0.97073722  1.044195995
XRCC4  1.055662999  1.09772628  1.07669464  1.38737489  1.019872953  1.203623921  1.140159279
PAF1  0.89750693  1.3761623  1.13683461  0.85020243  0.90772693  0.87896468  1.007899646
DMC1  1.132584779  0.8896184  1.01110159  0.46666604  0.792455821  0.629560933  0.820331262
PAGR1  1.09695291  1.19019442  1.14357367  0.98623482  1.02325581  1.00474532  1.074159491
ERCC1  1.02247237  1.34069252  1.18158245  1.0266653  1.188683731  1.107674515  1.144628481
RBM25  1.15512465  0.81825866  0.98669166  1.10526316  1.12228057  1.11377186  1.050231762
POLB  1.132584779  0.81443938  0.97351208  1.11999851  0.693398843  0.906698675  0.940105378
RHOBTB2  1.15512465  1.227388  1.19125633  1.05425101  0.99024756  1.02224929  1.106752806
RRM2B  1.148315123  1.10262563  1.12547038  1.11999851  0.891512798  1.005755653  1.065613014
SKIV2L2  0.86426593  1.11580727  0.9900366  0.88421053  0.89122281  0.88771667  0.938876633
POLE  1.069663402  0.8896184  0.9796409  1.0266653  1.089626753  1.058146026  1.018893464
SNX32  0.90581718  1.17159763  1.0387074  0.73117409  0.92423106  0.82770257  0.933204989
RAD23A  0.928983439  0.6443176  0.78665052  0.99098206  0.582784545  0.786883304  0.786766911
SPIN1  0.63157895  1.39475909  1.01316902  0.47611336  0.94073518  0.70842427  0.860796645
RAD51B  0.896629616  0.96479742  0.93071352  1.0266653  0.891512798  0.959089048  0.944901284
SPIN2B  0.78740158  0.57039338  0.67889748  0.24085225  0.61591356  0.4283829  0.553640188
RAD23B  0.849438584  1.07756595  0.96350227  0.93333209  1.089626753  1.011479421  0.987490845
TCEA1  0.83371931  0.55072464  0.69222198  0.40759611  0.85854617  0.63307114  0.662646558
PRIM1  0.717850839  0.83522651  0.77653868  0.79278565  0.582784545  0.687785098  0.732161887
TRMT6  0.78740158  0.78674948  0.78707553  0.39833256  0.82121808  0.60977532  0.698425423
XPA  0.83370824  0.96479742  0.89925283  0.74666567  0.693398843  0.720032257  0.809642545
TULP2  1.02825382  0.55072464  0.78948923  0.42612321  0.97053045  0.69832683  0.743908029
CDC25C  0.817977896  0.95226759  0.88512274  0.74666567  0.792455821  0.769560746  0.827341744
WDR55  0.72255674  0.60973085  0.66614379  0.37054192  0.91453831  0.64254011  0.654341954
PNKP  1.179775811  1.01491677  1.09734629  1.11999851  0.990569776  1.055284142  1.076315216
WDR76  1.18573414  0.66873706  0.9272356  0.55581288  1.17583497  0.86582392  0.896529761
APEX2  0.755056519  0.8896184  0.82233746  1.0266653  0.693398843  0.860032071  0.841184766
ZC3H11A  1.17647059  0.98343685  1.07995372  0.37054192  0.93320236  0.65187214  0.865912929
PARP3  0.770786863  0.87708857  0.82393772  1.262  1.115  1.1885  1.006218858
ZFHX3  1.1394164  1.10144928  1.12043284  0.49096804  1.21316307  0.85206555  0.986249194
MUS81  0.817977896  1.16527481  0.99162635  1.11999851  1.089626753  1.10481263  1.048219492
ZNF24  0.82445577  1.12111801  0.97278689  0.38906901  0.95186641  0.67046771  0.821627299
CDC45  1.116854435  0.97732726  1.04709085  0.83999888  0.891512798  0.865755839  0.956423344
ZNF460  1.07457156  1.14078675  1.10767916  0.55581288  1.08251474  0.81916381  0.96342148
FANCC  1.258427532  1.05250628  1.15546691  1.0266653  1.188683731  1.107674515  1.131570711
ZRANB2  0.84298286  1.61283644  1.22790965  0.36127837  0.85854617  0.60991227  0.91891096
FAN1  1.006742026  1.35322236  1.17998219  1.0266653  1.386797686  1.206731492  1.193356843
ZW10  0.89856415  1.43581781  1.16719098  0.39833256  1.02652259  0.71242758  0.939809277
FANCG  1.164045467  0.82696922  0.99550734  1.11999851  1.089626753  1.10481263  1.050159987
PDS5B  0.81519222  1.31780538  1.0664988  0.32422418  0.89587426  0.61004922  0.838274011
XRCC6  0.793814979  0.87370037  0.83375768  0.98823995  0.674701234  0.83147059  0.832614132
TTI1  1.09309866  1.31780538  1.20545202  0.54654933  1.06385069  0.80520001  1.005326014
DBF4  0.865979977  1.23254159  1.04926079  1.23529993  1.012051851  1.123675891  1.086468338
RINT1  0.8893006  1.23913044  1.06421552  0.47244095  1.04518664  0.75881379  0.911514656
ATRIP  1.058419971  0.87370037  0.96606017  0.61764997  1.096389505  0.857019735  0.911539953
TEL2  1.10692771  1.17209302  1.13951037  1.19631902  1.06666667  1.13149284  1.135501605
GINS1  0.72761194  0.76190476  0.74475835  0.81325301  0.848484848  0.83086893  0.787813641
DDX11  0.93561834  1.55383023  1.24472429  0.4631774  0.97053045  0.71685392  0.980789105
SPO11  1.154639969  0.99851471  1.07657734  0.98823995  1.096389505  1.042314725  1.059446032
DDX1  1.03751737  0.84575569  0.94163653  0.48170449  1.08251474  0.78210961  0.861873073
CDC14A  1.250859966  1.0141165  1.13248823  1.11176994  1.096389505  1.104079722  1.118283978
ZRANB3  1.44578313  1.2372093  1.34149622  1.3803681  1.06666667  1.22351738  1.2825068
FANCE  1.539519958  1.06092188  1.30022092  1.48235992  1.265064813  1.373712365  1.336966642
SMARCAD1  1.12951807  1.62790698  1.37871252  1.10429448  1.33333333  1.21881391  1.298763215
LIG1  0.72761194  0.76190476  0.74475835  0.81325301  0.727272727  0.77026287  0.75751061
SPRTN  0.78571429  1.01086957  0.89829193  1.53333333  1.72727273  1.63030303  1.264297478


Supplementary Table 2.2: Validation of screen hits. Normalized HDR fold-change values observed in the indicated cell lines transfected with Cas9 and an sgRNA targeting the BFP reporter, the indicated donor repair template, and a vector expressing the indicated gene, as determined by flow cytometry.

ssODN-mediated HDR fold change (relative to control)

ORF  HEK293T Trial #1  HEK293T Trial #2  HEK293T Trial #3  HEK293T Mean  HeLa Trial #1  HeLa Trial #2  HeLa Trial #3  HeLa Mean
HDAC10  0.53171642  0.38095238  0.76416897  0.55894592  0.4355  0.6  0.65  0.56183333
FANCE  1.53951996  1.06092188  2.6  1.73348061  0.88888889  0.42424242  0.66666667  0.65993266
TCOF1  0.45614035  0.69142857  0.87030354  0.67262416  0.88888889  0.42424242  1  0.77104377
HDAC11  0.76807229  0.3255814  0.74294205  0.61219858  0.86545455  0.55393586  0.9030837  0.77415804
CLOCK  1.50877193  1.50857143  0.75  1.25578112  1  0.6969697  1  0.8989899
LMNA  0.49122807  0.62857143  0.625  0.58159983  1.16666667  0.63636364  1  0.93434343
SIRT1  0.52631579  0.56571429  0.95521121  0.68241376  0.94444444  0.72727273  1.33333333  1.0016835
RAD52  1.30902111  1.25  1.1  1.2196737  1.24  1.36  1.3  1.3
LIG3  1.31529851  1.333  1.125  1.25776617  1.44444444  0.63636364  1.33333333  1.13804714
DAXX  1.64023211  1.0188  1.8576779  1.50557  1.27777778  0.57575758  1.66666667  1.17340067
SMARCAL1  2.011  0.849  1.534  1.46466667  1  0.36363636  2  1.12121212
TOPBP1  1.33333333  1.2  1.8857  1.47301111  1.33  0.66666667  2  1.33222222
RAD18  1.45454545  1.7  1.37827715  1.51094087  1.491  1.4  1.70909091  1.53336364
FEN1  1.17407878  1.97424893  1.5  1.54944257  1.8  1.45772595  1.0176  1.42510865

dsDNA-HDR fold change (relative to control)

ORF  HEK293T Trial #1  HEK293T Trial #2  HEK293T Trial #3  HEK293T Mean  HeLa Trial #1  HeLa Trial #2  HeLa Trial #3  HeLa Trial #4  HeLa Trial #5  HeLa Mean
PRIMPOL  0  0.634921  1.86862967  0.83451689  0.65216  0.74  0.69414317  0.71212121  -  0.69960609
TCOF1  0.488889  0.423077  1.75537939  0.88911513  0.826  0.9436  0.76789588  0.78030303  -  0.82944973
SPIN2B  0.240852  0.615914  1.3590034  0.7385898  0.1666  0.3  0.22559653  0.09090909  -  0.19577641
BRD4  0.594589  0.437088  1.69875425  0.91014375  0.22  0.2857  0.29934924  0.16494845  0.12878788  0.21975711
ING1  0.396393  0.728481  0.90600227  0.67695876  0.333  0.628  0.52060738  0.74226804  0.33333333  0.51144175
TIP60  1.42857143  1.18644068  1.41562854  1.34354688  0.82692308  1.11111111  0.66666667  1.16666667  -  0.94284188
BARD1  1.33333333  1.30508475  1.47286822  1.37042877  0.88461539  1.88888889  1.33333333  1.33333333  -  1.36004274
FANCE  1.48  1.26  0.90600227  1.21533409  0.67913212  0.9  0.9  -  -  0.82637737
RBBP8  1.46666667  1.26923077  1.3590034  1.36496694  1.05769231  1.55555556  0.83333333  1  -  1.1116453
CLOCK  1.46666667  1.55128205  1.83333333  1.61709402  0.95  1.07  0.88286334  0.87878788  1.16666667  0.98966358
HUS1  2.0952381  1.18644068  2.16666667  1.81611515  1  1.18  1.00867679  2.08762887  1.24242424  1.30374598
RAD18  2.17557252  1.3968254  3.16666667  2.24635486  2  2.17  1.20173536  0.74033149  -  1.52801671
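The relationship between the per-trial values and the Mean columns of Supplementary Table 2.2 can be illustrated with a short Python sketch. It assumes that each Mean is the arithmetic mean of the available trials and that trials marked "-" were not performed; this is inferred from the tabulated numbers rather than from a stated method, and the function names (`fold_change`, `mean_fold_change`) are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def fold_change(hdr_with_orf: float, hdr_control: float) -> float:
    """HDR frequency with the ORF, normalized to the control (assumed definition)."""
    if hdr_control <= 0:
        raise ValueError("control HDR frequency must be positive")
    return hdr_with_orf / hdr_control


def mean_fold_change(trials: Sequence[Optional[float]]) -> float:
    """Average the per-trial fold changes, skipping trials recorded as None ('-')."""
    available = [value for value in trials if value is not None]
    if not available:
        raise ValueError("no trials available")
    return mean(available)


# RAD18, ssODN donor, HEK293T: three trials taken from the table.
rad18_hek = [1.45454545, 1.7, 1.37827715]
# PRIMPOL, dsDNA donor, HeLa: the fifth trial is marked '-' in the table.
primpol_hela = [0.65216, 0.74, 0.69414317, 0.71212121, None]

if __name__ == "__main__":
    print(mean_fold_change(rad18_hek))     # close to the tabulated 1.51094087
    print(mean_fold_change(primpol_hela))  # close to the tabulated 0.69960609
```

Under these assumptions the sketch reproduces the tabulated means for both rows, which suggests that missing trials are excluded rather than counted as zero.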


NA NA 1.04406349 1.20417083 0.97019805 0.76803715 0.897182381 0.961200353 0.828821029 1.029475903 0.915342056 0.792470055 1.098470316 0.698129728 0.729994022 1.054011969 0.904282006 0.776475066 0.747361776 0.985876914 0.678187062 0.756425652 0.903625675 1.317018167 0.999324179 0.822253557 1.390147509 1.170843396 1.621271909 0.769938289 1.088185281 0.541978481 0.772624315 0.912935877 0.942585516 0.799260388 0.966508586 0.523740057 1.681848434 0.876945601 0.594584271 1.181289572 0.551161402 0.611695354 0.662454401 0.555980916 1.267258093 1.114211586 1.317806981 0.769209766 1.474741793 0.929679896 1.320355253 0.687672525 0.891648689 0.690584631 Mean fold change Mean fold change NA NA 2.19225 0.376937 1.172693 Trial #3 0.205061 7.475213 Trial #3 3.4133742 2.6734608 1.6333938 2.4640513 1.5775513 1.9405277 0.8934804 1.3402206 1.0540276 0.6770906 0.8585788 1.1936339 0.4886221 0.4607008 1.2215552 0.8585788 0.5444646 3.8461538 4.1881892 0.3210945 3.8112522 8.5372115 0.6046128 0.3910957 0.5433059 0.3255607 0.5235237 0.5263937 0.6257531 0.3657273 0.5010253 0.2875082 0.8646386 0.6468934 0.3361309 0.9407437 0.5327358 0.6130689 0.9449718 0.7208845 1.7060229 1.7440754 1.8392068 1.2599624 1.6256897 11.014101 2.9385028 3.4465292 2.0210135 NA NA e18 e18 0.417139 Trial #2 0.247853 1.060874 0.644102 Trial #2 3.1240081 2.9789163 2.1627749 2.9743822 1.3284969 1.3647699 1.1516663 1.2106098 0.7345273 0.7390614 1.0247109 0.7662661 0.5440943 0.5622308 1.0383133 0.5849014 0.5304919 3.5547494 3.4686012 0.6347767 0.5259578 4.1668555 8.8823395 0.5746401 0.7404016 0.4183506 0.5935842 0.7846047 0.6529132 0.4294014 0.4025638 0.4372948 0.4704471 0.3283657 0.4530816 0.3536247 0.5193862 0.7451377 1.1729603 1.7949608 1.6260419 1.6465648 1.2771533 1.6402501 9.8620232 2.1880525 8.4017429 3.1763071 3.1131599 2.0317631 NA NA Trial #1 Trial #1 0.2733845 0.7654766 2.68316634 0.76605774 1.79826249 1.25351856 2.13847431 3.076016463 2.577096807 2.852091893 1.253191892 1.638185013 1.162836364 1.390689435 
0.891769779 0.644274202 1.316047912 0.915340786 0.636417199 1.017481818 0.667845209 0.549990172 0.443920639 3.700648158 3.645649141 0.781771745 0.687487715 4.262423834 8.768414744 0.658147871 0.759401389 0.637897167 1.115813775 0.245033515 0.632488698 0.540693789 0.453615763 0.313885908 0.449565622 0.340211822 0.378688159 0.439440271 0.963933497 0.773576882 0.805978008 0.621696604 1.613981086 1.561329256 1.885340516 10.78754987 2.308580223 7.239626577 2.881675138 3.092282457 NA NA e18 or empty vector

2.444542 1.553162 0.806969 0.494868 Trial #3 1.273827 Trial #3 3.3696863 2.4580477 2.5998582 1.2425296 1.4586217 1.2459061 1.1581186 1.0466962 1.1142249 0.7732046 0.6651585 1.0905899 0.6482763 0.7900868 0.6752878 0.4828308 3.8525171 3.4507209 0.7529459 0.4659486 4.3319715 5.7703706 0.2510997 0.2675953 0.5131965 0.5278592 0.4655425 0.5810117 0.5315249 0.3720674 0.4527126 0.6506598 0.4618768 0.6891496 0.8247801 1.2481672 0.9622434 1.6697214 1.3196481 1.6898827 1.0777126 1.7210411 2.1664223 7.0784457 2.5458211 5.8687683 4.6022727 3.6895161 3.1983138 NA NA Frequency in edited allele population allele edited in Frequency population allele edited in Frequency ach variant in the total edited edited total the in variant ach 2.815226 Trial #2 0.383455 0.589781 0.486618 1.426764 Trial #2 Control Control 3.9041054 3.2757884 2.6810224 1.6195937 1.6348441 1.2291832 1.1803819 1.0675288 0.9119746 1.0156774 1.2322333 0.7320198 0.7686208 0.8601232 0.7503203 0.8113219 0.7991216 0.7442201 4.2487647 4.3006161 0.6130666 0.4727628 5.3071433 6.3624718 0.5664234 0.4029197 0.5625304 0.5002433 0.4827251 0.4574209 0.4282238 0.5683698 0.6092457 0.5450122 0.8622871 1.0549878 1.3683698 1.5396594 1.0335766 1.3761557 1.5435523 1.6233577 2.2072993 7.0832117 2.2345499 5.9211679 4.1265207 3.5892944 2.8944039 dependent variants is is variants dependent NA NA - 2.841333 Trial #1 Trial #1 1.1035812 0.7864959 0.7709525 0.7398657 0.5279831 0.5893433 3.44130813 2.74807262 2.60818204 1.68179557 1.64138274 1.57299179 1.24968913 1.21860234 0.97301666 0.83312609 0.81136533 0.74297438 0.67458344 0.63106192 4.18428252 3.89828401 0.59997513 0.59686645 5.24745088 6.70542154 0.50800537 0.51228631 0.51371329 0.55366877 0.55794971 0.55794971 0.55794971 0.56793858 0.60646708 0.61502897 0.66497332 0.82336825 0.99460601 1.15442792 1.20437227 1.22006907 1.33565456 1.35991324 1.38845287 1.59108422 2.32312566 7.30899854 2.41445247 5.71791432 4.35372014 3.54604869 2.87251348 1.27 0.173032 0.982893 0.200201 0.193969 
0.196306 0.599045 4.058549 Trial #3 Trial #3 0.8722953 0.6832088 0.4174174 0.6296937 0.4031467 0.4959061 0.2283309 0.3424963 0.2693591 0.2194117 0.3050358 0.1248684 0.1177331 0.3121711 0.2194117 0.1337876 0.0963271 0.1391391 1.0703009 0.2996843 0.0820564 0.9739738 2.1817031 74.444781 0.2227918 0.1441135 0.0755622 0.3186078 0.1199648 0.2305817 0.1347656 0.1846211 0.1059429 0.2383716 0.1238598 0.3466515 0.2259077 0.3482095 0.2656363 0.6286467 0.6426685 0.6777232 0.4642793 63.151335 0.8078148 2.7545162 1.0827991 0.7447165 0.17656 e18 e18 0.899224 0.365438 0.166137 0.574232 0.451023 Trial #2 Trial #2 0.9430218 0.6528612 0.8978553 0.4010238 0.4119732 0.3476452 0.2217265 0.2230951 0.3093221 0.2313072 0.1642418 0.1697165 0.3134281 0.1970902 0.1601358 0.1259187 1.0730465 1.0470416 0.1916155 0.1587671 1.2578186 2.6812477 69.813722 0.2029325 0.2614707 0.1477393 0.2096226 0.2770809 0.0875286 0.1516419 0.1421642 0.1544294 0.1159614 0.1600045 0.1248815 0.3746446 0.1834197 0.2631432 0.4142276 0.2274628 0.6338853 0.5814796 0.5792496 64.685287 3.4827452 0.7727045 2.9670513 1.1217037 1.0994035 0.7175113 0.412166 Trial #1 Trial #1 0.1808879 0.6830454 1.01168019 0.84758902 0.93803297 0.88247455 0.53878753 0.38244871 0.45738798 0.29329681 0.21189726 0.43283891 0.30104915 0.25195101 0.20931314 0.33464262 0.21964959 0.20802109 0.14600238 1.21711716 1.19902837 0.25711923 0.22610988 1.40188123 2.88386997 67.1107034 0.24998846 0.28844823 0.10384136 0.24229651 0.42382659 0.09307263 0.20537514 0.17229974 0.11922527 0.17076135 0.12922481 0.14383952 0.16691537 0.36613695 0.29075581 0.29383259 0.30613972 0.23614295 0.61304863 0.59304955 0.47613187 0.71612079 62.0163685 4.09750319 0.87688261 2.74987308 1.09456487 1.17456117 0.81227020 Raw frequency Raw frequency 0.350429 0.222692 0.219108 0.241171 1.531474 Trial #3 Trial #3 1.1281553 0.8184213 0.8229429 0.8704204 0.4159931 0.4883398 0.4171235 0.3877327 0.5199914 0.2701694 0.3730373 0.2588653 0.3651244 0.2170399 0.2645174 0.2260832 0.1616495 
1.2898048 1.1552853 0.2520828 0.1559974 1.4503239 1.9318932 66.520466 0.2054138 0.1042285 0.1110756 0.2130217 0.1932411 0.2206296 0.1544407 0.1879156 0.2700811 0.1917195 0.2860577 0.3423563 0.5180992 0.3994157 0.6930814 0.5477701 0.5287503 0.7014501 0.4473456 0.7143835 0.8992559 58.491198 2.9381781 1.0567399 2.4360554 1.9103483 1.3275817 expressing cells. Frequency of allelic variants of 2 endogenous - 0.159308 0.240886 0.215489 2.800588 Control Control Trial #2 Trial #2 1.5180986 1.2737796 1.0946914 1.0425068 0.6297737 0.6357038 0.4779639 0.4589876 0.4151051 0.3546183 0.3949428 0.4791499 0.2846435 0.2988757 0.3344561 0.2917596 0.3154799 0.3107358 0.2893875 1.6521182 1.6722805 0.2383889 0.1838323 2.0636653 2.4740263 61.115328 0.2239547 0.1516119 0.2331899 0.2224155 0.1924009 0.1977882 0.1908617 0.1808569 0.1693128 0.2247243 0.3409345 0.4171252 0.5410314 0.6087566 0.4086596 0.5641195 0.5441098 0.6102958 0.6418495 0.8727306 60.461608 0.8835051 2.3411345 1.6315599 1.4191493 1.1444007 1.32737 2.586393 0.674964 Trial #1 Trial #1 1.0959495 1.0599775 1.0060193 0.6486966 0.6331087 0.6067292 0.4820259 0.4700353 0.4256697 0.3753088 0.3213506 0.3129571 0.3033646 0.2973692 0.2865776 0.2853785 0.2601981 0.2434111 1.6139476 1.5036332 0.2314204 0.2302213 2.0240294 61.428332 0.2155042 0.2173202 0.2179256 0.2239791 0.2348754 0.2366914 0.2366914 0.2366914 0.2409288 0.2500091 0.2572733 0.2609054 0.2820926 0.3492863 0.4219282 0.4897272 0.5109144 0.5175733 0.5666065 0.5768975 0.5890044 0.9855079 57.578362 3.1005969 1.0242503 2.4256329 1.8469194 1.5042919 1.2185673 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA AA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA AC GG AAC GTA CGT GTA GAA GCC TGG AACG GAGG iant observed in cells expressing relative e18 to the empty vector control. Microhomology Microhomology

Supplementary Table 2.3: Indel mutation patterns at endogenous loci targeted by Cas9 in e18-expressing cells. Frequency of allelic variants of 2 endogenous loci (FANCM and SPRTN) targeted with Cas9/sgRNA in three independent experiments conducted in HEK293T cells transfected with e18 or empty vector control. The table lists the sequences of the FANCM and SPRTN variants, the raw frequency of each variant, the frequency of each variant in the total edited allele population and the mean fold change of the frequency of each edited variant observed in cells expressing e18 relative to the empty vector control. Edited variants resulting from MMEJ-mediated repair are highlighted in blue. The length of microhomology for the MMEJ-dependent variants is indicated in the Microhomology column.

TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTGGAGGCCGATCGAGGTGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA
TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTGGAGGCCGTTCGAGGTGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA
TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTGGAGGCC-TCGAGGTGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA
TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTGGAGGC---CGAGGTGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA
TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTGGAGGCCGGTCGAGGTGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA
TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGC------TCGAGGTGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA
TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTGGAGGCC------GTGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA
TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTGGAGGC--TCGAGGTGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA
TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCT------TCGAGGTGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA
TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTGGAGGCC-----GGTGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA
TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTGGAGGC------TGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTGGAGGCCGCGTCGAGGTGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTGGAGGCC------GAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTGGAGGCCG----GGTGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA TCGT-GGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTGGAGGCCGTCGAGGTGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTGGAGGCCGTATCGAGGTGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTGGAGGC------GAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTGGAGGCCG------GAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCT------GTCGAGGTGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTGGAGGCCGCTCGAGGTGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTGGAGGCCGTCGAGGTGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGA-GCGGGA TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTGGAGGCCG-CGAGGTGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA 
TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGC------TGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTGGAGGCCG--GAGGTGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTG------GAGGTGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA TCGTGGGAGTTGGTGGACCCCACACCGGACTTGCAGGCACTGTTTGTTCAGTTTAACGACCAATTCTTCTGGGGCCAGCTGGAGGCCGTCGAGGTGAAGTGGAGCGTGCGAATGACCCTGTGAGTTCCGAGCCCCGCTGGGGAAAGAGGCGGGA GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAA------CGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGA------GCCGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACG------AACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACGAATGGGTACAACTGGCCGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTT------GTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACGAATGGGTAG--CTGGCCGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAAC------AACTGGCCGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATT------CGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACGAATGGGTAGTAACTGGCCGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG 
GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACGAATGGGT---ACTGGCCGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACGAAT-----GAACTGGCCGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACGAATGGGTAGAAACTGGCCGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACGAATGGGTAG------ACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTG-AGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACGAATGGGTAGAACTGGCCGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACGA------AACTGGCCGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACGAATGGGTAG-ACTGGCCGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACGAATGGGTAG------AAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACGAAT------AACTGGCCGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GT-GGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACGAATGGGTAGAACTGGCCGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTAC------AACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACGAATG------GGCCGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACGAATGGGTAGGAACTGGCCGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG 
GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACGAATGGGTA-AACTGGCCGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACGAATGGGTAGAACTGGCCGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTAC------AACTGGCCGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACGAA------TGGCCGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACGAATGG------GTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACGAATGGGTA------GTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAAC------GAACTGGCCGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG GTGGGTGAAGAAGGTTTGGATATAGGAGAAGTTGATCTTATAATATGTTTTGATTCCCAGAAGAGCCCAATTCGTCTTGTACAACGAATGGGT--AACTGGCCGTAAACGTCAAGGCAGGATAGTTATTATCCTTTCTGAAGGACGAGAGGAACG 88

Supplementary Table 2.4: HDR frequency values at endogenous loci targeted by Cas9 in e18-expressing cells. Compilation of the raw HDR frequency values measured in the indicated cell lines and loci, following transfection with Cas9, sgRNA, donor template and either the empty vector control, e18 or e18-D221A. The HDR frequency values for e18 and the empty vector control are plotted in the graphs of Fig. 2.5 and 2.6a.

11.9 19.1 Trial #3 18 21 TP53 Trial #2 13.7 22.3 Trial #1 27.67 50.47 Trial #3 15 0.8 0.68 1.36 22.7 JAK2 14.06 37.34 Trial #2 Trial #4 Trial #5 1.1 1.6 0.9 9.78 10.1 17.5 18.31 Trial #1 Trial #3 Trial #4 LMNA 20 8.2 9.5 6.5 0.9 1.2 0.8 8.51 8.67 16.7 25.24 Trial #3 Trial #3 Trial #2 Trial #3 HEK293T HIST1H2BK 1 15 24 7.1 7.4 0.9 1.8 12.1 6.18 7.22 JAK2 15.53 ACT2B Trial #2 Trial #2 Trial #1 Trial #2 1 7.3 7.6 1.5 0.9 6.9 8.4 10.9 18.32 21.28 16.65 hESC U2OS Trial #1 Trial #1 Trial #4 Trial #1

6.4 7.3 0.4 0.9 1.7 11.8 2.15 6.43 2.84 0.31 0.61 HeLa Trial #3 Trial #3 Trial #3 Trial #3 HIST1H2BK 9 9.2 1.6 3.9 13.4 3.81 8.97 4.19 0.77 1.31 0.81 EMX1 Trial #2 CALD1 Trial #2 Trial #2 Trial #2 SEC61B 7 11 5.6 9.2 0.8 1.7 6.85 0.31 0.92 0.52 13.92 Trial #1 Trial #1 Trial #1 Trial #1 Construct Construct Construct Construct Control e18 e18-D221A Control e18 e18-D221A Control e18 e18-D221A Control e18


Supplementary Table 2.5: List of ORFs used in this study.

ORF Source Human ORFeome ID ORF Source Human ORFeome ID ORF Source Human ORFeome ID RAD23B ORFeome v8 150 RAD51D ORFeome v8 7201 BRCA1 ORFeome v8 55703 SPIN1 ORFeome v8 151 ASF1A ORFeome v7 5980 ALKBH2 ORFeome v8 14660 HDAC11 ORFeome v8 153 ZC3H11A ORFeome v8 6160 LMNA ORFeome v7 14730 ASH2L ORFeome v7 296 WRNIP1 ORFeome v7 6233 MCMDC2 ORFeome v8 14814 XRCC6 ORFeome v8 421 ERCC8 ORFeome v8 6310 PDS5B ORFeome v8 14893 NKAP ORFeome v7 445 GINS1 ORFeome v8 6412 MCM8 ORFeome v7 15002 SETD5 ORFeome v8 524 OTUB1 ORFeome v7 6420 EPC1 ORFeome v8 15025 RUVBL1 ORFeome v8 653 MCM2 ORFeome v8 4536 RRM2B ORFeome v8 54167 RPA1 ORFeome v7 752 MSH5 ORFeome v8 6760 MLH3 ORFeome v8 55150 SETD6 ORFeome v8 1153 HMCES ORFeome v7 6826 FAN1 ORFeome v8 11671 ISLR ORFeome v7 1167 MUS81 ORFeome v8 6827 RNF20 ORFeome v8 52855 STAM ORFeome v8 1364 MBD4 ORFeome v7 6853 DAXX ORFeome v7 52987 TULP2 ORFeome v7 1553 MCRS1 ORFeome v7 6901 DNA2 ORFeome v7 53124 RBBP8 ORFeome v8 1563 ABTB1 ORFeome v7 7026 UHRF1 ORFeome v7 53135 RAD18 ORFeome v7 1782 RAD23A ORFeome v8 7058 ZW10 ORFeome v7 53140 TRMT6 ORFeome v7 1842 DDX1 ORFeome v8 7143 BCL7A ORFeome v7 53866 LIG4 ORFeome v8 1884 PARP3 ORFeome v8 7318 TIP60 ORFeome v7 53941 MSH4 ORFeome v8 1980 KU80 ORFeome v7 7362 GEN1 ORFeome v8 54049 DDX20 ORFeome v7 2055 RAD51AP1 ORFeome v7 7510 CLOCK ORFeome v7 54316 DBF4 ORFeome v8 2066 XRCC4 ORFeome v8 7515 BARD1 ORFeome v7 54455 TTI1 ORFeome v8 2395 XPA ORFeome v8 7662 CHD1 ORFeome v7 55210 HDAC6 ORFeome v8 8810 POLH ORFeome v8 7683 ARL4C ORFeome v8 55311 DDX11 ORFeome v8 2444 FANCC ORFeome v8 7699 ALKBH3 ORFeome v8 55713 ERCC5 ORFeome v8 2474 MPG ORFeome v8 7728 ZNF460 ORFeome v8 55829 EHMT2 ORFeome v8 2537 CDC25C ORFeome v8 7841 POLB ORFeome v8 56121 SIRT1 ORFeome v7 2655 MSH2 ORFeome v8 7918 LIG1 ORFeome v8 71458 RAD1 ORFeome v8 2688 DDX17 ORFeome v8 8028 CDYL ORFeome v8 56575 TIPIN ORFeome v7 2757 EP400 ORFeome v8 8717 HDAC10 ORFeome v8 56643 SSBP1 ORFeome v7 2764 SETD3 ORFeome v8 8806 SETD7 ORFeome v8 
56672 APEX1 ORFeome v8 3865 RAD51C ORFeome v8 8834 BEND3 ORFeome v8 56682 UBE2I ORFeome v8 3129 APTX ORFeome v8 8866 RBM25 ORFeome v7 70593 RUVBL2 ORFeome v7 3139 HDAC7 ORFeome v8 8955 SKIV2L2 ORFeome v8 71068 HDAC1 ORFeome v8 3152 MTA1 ORFeome v8 8956 FAM98A ORFeome v8 71107 PCNA ORFeome v8 3156 MMS19 ORFeome v8 8970 WDR55 ORFeome v7 71279 DDX23 ORFeome v8 3178 MCM9 ORFeome v7 9308 BRD4 ORFeome v8 71377 ALDH3A2 ORFeome v7 3212 ZFHX3 ORFeome v8 9406 TCEA1 ORFeome v7 71438 OGG1 ORFeome v8 3268 PTPN11 ORFeome v8 9415 DSCR3 ORFeome v7 71487 GINS2 ORFeome v8 3432 SPO11 ORFeome v8 9483 CDC14A ORFeome v8 71733 PAF1 ORFeome v7 3459 CKAP5 ORFeome v8 9657 ING1 ORFeome v8 71736 SPIN2B ORFeome v7 3473 RAD17 ORFeome v7 9687 NAA15 ORFeome v7 71866 FANCG ORFeome v8 3568 KIAA0182 ORFeome v7 9694 INO80 ORFeome v8 72161 SSRP1 ORFeome v7 3674 NAT10 ORFeome v8 9701 SLX4 ORFeome v8 2565 SIRT2 ORFeome v7 3686 POLI ORFeome v7 10037 DCLRE1C ORFeome v7 13460 POLL ORFeome v8 3692 MYST2 ORFeome v7 10042 FAM98A ORFeome v8 71107 ZNF24 ORFeome v8 3699 DMC1 ORFeome v8 56779 WDR55 ORFeome v7 71279 APEX2 ORFeome v8 3714 SNX32 ORFeome v8 10408 BRD4 ORFeome v8 71377 RPA2 ORFeome v8 3722 HSFY1 ORFeome v8 10543 TCEA1 ORFeome v7 71438 MCM3 ORFeome v8 3726 SETDB2 ORFeome v8 10639 DSCR3 ORFeome v7 71487 PAGR1 ORFeome v7 3927 SMARCAD1 ORFeome v8 10760 CDC14A ORFeome v8 71733 AEN ORFeome v8 3954 ZRANB2 ORFeome v7 11325 ING1 ORFeome v8 71736 FEN1 ORFeome v7 3978 FANCE ORFeome v8 11400 NAA15 ORFeome v7 71866 POLD2 ORFeome v7 3988 DNMT3A ORFeome v8 11403 INO80 ORFeome v8 72161 GPKOW ORFeome v7 4001 DONSON ORFeome v7 11467 SLX4 ORFeome v8 2565 H2AFX ORFeome v8 4032 WDR76 ORFeome v8 11797 DCLRE1C ORFeome v7 13460 PECI ORFeome v8 8041 ERCC1 ORFeome v8 11933 ZRANB3 Ciccia et al, Mol Cell, 2012 N/A MORF4L1 ORFeome v8 4430 ORC6 ORFeome v8 12043 RAD51 cDNA clone MGC:2244 IMAGE:3139011 N/A EXO1 ORFeome v7 4484 DDX6 ORFeome v8 12339 KIAA0913 cDNA clone MGC:164743 IMAGE:40147119 N/A BRD8 ORFeome v8 4485 SIN3A ORFeome 
v8 12351 ATRIP transcript variant NM_130384 N/A RNF8 ORFeome v7 4755 ING2 ORFeome v8 12445 MRE11 transcript variant NM_005591 N/A GINS3 ORFeome v7 4758 VRK3 ORFeome v7 12512 RAD52 cDNA variant HSU12134, GenBank: U12134.1 N/A POLD1 ORFeome v8 4775 BRD3 ORFeome v8 12575 FBH1 cDNA clone MGC:141937 IMAGE:8322429 N/A CDC45 ORFeome v8 4828 POLE ORFeome v8 12663 WHSC1 cDNA clone IMAGE:100066394, MGC:195531 N/A ERCC3 ORFeome v8 4854 RHOBTB2 ORFeome v8 12763 PRIMPOL cDNA clone MGC:70479 IMAGE:5786393 N/A ING4 ORFeome v8 4992 HELLS ORFeome v8 13233 RFWD3 cDNA clone MGC:71573 IMAGE:5266451 N/A MCMBP ORFeome v7 5222 PNKP ORFeome v8 13397 SMARCAL1 Ciccia et al, Genes Dev, 2009 N/A UBE2V2 ORFeome v8 5572 XRCC2 ORFeome v8 13571 TCOF1 Ciccia et al, PNAS, 2014 N/A HUS1 ORFeome v8 5599 NAA40 ORFeome v8 13676 RNF168 transcript variant NM_152617 N/A PRIM1 ORFeome v8 5641 UBR2 ORFeome v7 13768 TOPBP1 transcript variant NM_007027, C-terminal fragment (2200 nt-end) N/A UBE2B ORFeome v8 5728 RPA4 ORFeome v8 14032 CHK2 transcript variant NM_007194 N/A TEL2 ORFeome v7 5884 LIG3 ORFeome v8 14243 ESRRA cDNA clone MGC:70614 IMAGE:6527166 N/A RAD51B ORFeome v8 8346 RINT1 ORFeome v8 14421 CHK1 transcript variant NM_001114122.2 N/A HELLS ORFeome v8 13233 DNA2 ORFeome v7 53124 SPRTN transcript variant NM_032018 N/A PNKP ORFeome v8 13397 UHRF1 ORFeome v7 53135 RNF138 transcript variant NM_001191324 N/A XRCC2 ORFeome v8 13571 ZW10 ORFeome v7 53140 ZRANB3 Ciccia et al, Mol Cell, 2012 N/A NAA40 ORFeome v8 13676 BCL7A ORFeome v7 53866 RAD51 cDNA clone MGC:2244 IMAGE:3139011 N/A UBR2 ORFeome v7 13768 TIP60 ORFeome v7 53941 KIAA0913 cDNA clone MGC:164743 IMAGE:40147119 N/A RPA4 ORFeome v8 14032 GEN1 ORFeome v8 54049 ATRIP transcript variant NM_130384 N/A LIG3 ORFeome v8 14243 CLOCK ORFeome v7 54316 MRE11 transcript variant NM_005591 N/A RINT1 ORFeome v8 14421 BARD1 ORFeome v7 54455 RAD52 cDNA variant HSU12134, GenBank: U12134.1 N/A BRCA1 ORFeome v8 55703 CHD1 ORFeome v7 55210 FBH1 cDNA clone MGC:141937 
IMAGE:8322429 N/A ALKBH2 ORFeome v8 14660 ARL4C ORFeome v8 55311 WHSC1 cDNA clone IMAGE:100066394, MGC:195531 N/A LMNA ORFeome v7 14730 ALKBH3 ORFeome v8 55713 PRIMPOL cDNA clone MGC:70479 IMAGE:5786393 N/A MCMDC2 ORFeome v8 14814 ZNF460 ORFeome v8 55829 RFWD3 cDNA clone MGC:71573 IMAGE:5266451 N/A PDS5B ORFeome v8 14893 POLB ORFeome v8 56121 SMARCAL1 Ciccia et al, Genes Dev, 2009 N/A MCM8 ORFeome v7 15002 LIG1 ORFeome v8 71458 TCOF1 Ciccia et al, PNAS, 2014 N/A EPC1 ORFeome v8 15025 CDYL ORFeome v8 56575 RNF168 transcript variant NM_152617 N/A RRM2B ORFeome v8 54167 HDAC10 ORFeome v8 56643 TOPBP1 transcript variant NM_007027, C-terminal fragment (2200 nt-end) N/A MLH3 ORFeome v8 55150 SETD7 ORFeome v8 56672 CHK2 transcript variant NM_007194 N/A FAN1 ORFeome v8 11671 BEND3 ORFeome v8 56682 ESRRA cDNA clone MGC:70614 IMAGE:6527166 N/A RNF20 ORFeome v8 52855 RBM25 ORFeome v7 70593 CHK1 transcript variant NM_001114122.2 N/A DAXX ORFeome v7 52987 SKIV2L2 ORFeome v8 71068 SPRTN transcript variant NM_032018 N/A RNF138 transcript variant NM_001191324 N/A


Supplementary Table 2.6: List of sgRNAs and ssODNs used in this study. sgRNA Sequence Orientation Forward CACCGgctgaagcactgcacgccat BFP Reverse AAACatggcgtgcagtgcttcagcC Forward CACCGGGGGCTTTAAGACGCTTACT HIST1H2BK Reverse AAACAGTAAGCGTCTTAAAGCCCCC Forward CACCGCACTCATCTCCAATATGGTA SEC61B Reverse AAACTACCATATTGGAGATGAGTGC Forward CACCGGCTATTCTCGCAACTGACAA ACTB Reverse AAACTTGTCAGTTGCGAGAATAGCC Forward CACCGGGGGTCGCAGTCGCCATGGC LMNA Reverse AAACGCCATGGCGACTGCGACCCCC Forward CACCGAATTATGGAGTATGTGTCTG JAK2 Reverse AAACCAGACACATACTCCATAATTC Forward CACCGGTCACCTCCAATGACTAGGG EMX1 Reverse AAACCCCTAGTCATTGGAGGTGACC Forward CACCGACCACCCTGACCTACGAGG GFP-2-cut #1 Reverse AAACCCTCGTAGGTCAGGGTGGTC Forward CACCGCTGAAGCACTGCACGCAGG GFP-2-cut #2 Reverse AAACCCTGCGTGCAGTGCTTCAGC Forward CACCGTGGAGACTATTGCTGCTTGA CALD1 Reverse AAACTCAAGCAGCAATAGTCTCCAC Forward CACCGCACTTTTCGACATAGTGTGG TP53 Reverse AAACCCACACTATGTCGAAAAGTGC Forward CACCGGTACAACGAATGGGTAGAAC FANCM Reverse AAACGTTCTACCCATTCGTTGTACC Forward CACCGGGCCAGCTGGAGGCCGTCG SPRTN Reverse AAACCGACGGCCTCCAGCTGGCCC

ssODN    Sequence
BFP-GFP  CGGGCATGGCGGACTTGAAGAAGTCGTGCTGCTTCATGTGGTCGGGGTAGCGGCTGAAGCACTGCACGCCGTAAGTCAGGGTGGTCACGAGGGTGGGCCAGGGCACGGGCAGCTTGCCGGTGGTGCAGATGAACTTCAGGGTCA
EMX1     AAGAAGGGCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACCTCCAATGACTAgtttaaacGGGTGGGCAACCACAAACCCACGAGGGCAGAGTGCTGCTTGCTGCTGGCCAGGCCCCTGCGTGGGCCCAAGCTGGACTCTGGCCACTCCC
JAK2     TTCCTTAGTCTTTCTTTGAAGCAGCAAGTATGATGAGCAAGCTTTCTCACAAGCATTTGGTTTTAAATTATGGAGTATGTGTgtttaaacCTGTGGAGACGAGAGTAAGTAAAACTACAGGCTTTCTAATGCCTTTCTCAGAGCATCTGTTTTTGTTTATATAGAAAATTCAGTTTCAGGATCA
CALD1    GTATACTGCTCCAGTCTGCTGTCAATCTTGGAGACTACTGCTGCTTGATGGGTCGATTTGACACCACTGCTAAAAAAGTAAACACATACA
TP53     TCTTAGGTCTGGCCCCTCCTCAGCATCTTATCCGAGTGGAAGGAAATTTGCGTGTGGAGTATTTGGATGACAAACACTTTTCGTCATAGTGTGGTTGTGCCCTATGAGCCGCCTGAGGTCTGGTTTGCAACTGGGGTCTCTGGGAGGAGGGGTTAAGGGTGGTTGT
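The forward and reverse sgRNA oligonucleotides in Supplementary Table 2.6 appear to follow a common annealed-oligo cloning pattern: a CACC overhang on the forward strand and an AAAC overhang on the reverse complement, with an optional extra G (and matching C) for most guides. The thesis does not state this rule; the sketch below is an inference from the table, and the function names and the `prepend_g` flag are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, preserving letter case."""
    pairs = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(pairs)[::-1]


def cloning_oligos(guide: str, prepend_g: bool = True) -> tuple[str, str]:
    """Build a forward/reverse oligo pair for annealed-oligo sgRNA cloning.

    Assumption (inferred from the table, not stated in the text):
    forward = CACC [+ G] + guide, reverse = AAAC + revcomp(guide) [+ C].
    """
    forward = "CACC" + ("G" if prepend_g else "") + guide
    reverse = "AAAC" + reverse_complement(guide) + ("C" if prepend_g else "")
    return forward, reverse


# Usage: reproduce two rows of the table from their 20-nt guide sequences.
hist1h2bk = cloning_oligos("GGGGCTTTAAGACGCTTACT")
sprtn = cloning_oligos("GGGCCAGCTGGAGGCCGTCG", prepend_g=False)
```

Guides such as GFP-2-cut #1, GFP-2-cut #2 and SPRTN match the pattern without the extra G, which is why the flag is exposed rather than hard-coded.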

Supplementary Table 2.7: List of primers, siRNA and antibodies used in this study.

PCR primer          Orientation  Sequence
EMX1 PCR            Forward      GTCTTCCCATCAGGCTCTCAGCTC
EMX1 PCR            Reverse      GAGCTGGAGGTAGAGACCAGGGT
JAK2 PCR            Forward      ACGTTGATGGCAGTTGCAGGTC
JAK2 PCR            Reverse      CTGACAGAGTTGCTAGACACTGGGTTG

Mutagenesis primer  Orientation  Sequence
RAD18-ΔRING         Forward      GTGACTGTCACAGAGCCG
RAD18-ΔRING         Reverse      CAGCAAATCATCTATTGTCTTCATG
RAD18-ΔSAP          Forward      TCACGCGAAGAGAAGAAG
RAD18-ΔSAP          Reverse      AACTTGTTTCAAAGTGGATG
RAD18-ΔUBZ          Forward      TACAATGCCCAATGCGATG
RAD18-ΔUBZ          Reverse      TACAGTTTTGGGCAGCGG
RAD18-D221A         Forward      AAGCATTTAGcCAGCTGTTTATC
RAD18-D221A         Reverse      ATTAATGTGACTTTCTGGAATG
RAD18 UBZ-NLS       Forward      GGATCCgccaccATGCCAAAGAAGAAGCGGAAGGTCGGTACTAAAGTGGATTGTCCT
RAD18 UBZ-NLS       Reverse      GAATTCTTACCTTTTGTGAACAGAACTTCTGAGGCTTTCC

NGS primer          Orientation  Sequence
BFP NGS Out         Forward      ggagtttccccacactgagt
BFP NGS Out         Reverse      cttgtacagctcgtccatgc
BFP NGS Adaptor     Forward      ACACTCTTTCCCTACACGACGCTCTTCCGATCTAGCTGACCCTGAAGTTCATCT
BFP NGS Adaptor     Reverse      AGACGTGTGCTCTTCCGATCTGCATGGCGGACTTGAAGAA
CALD1 NGS Out       Forward      CAATCTCTCCCTCCCAGGTT
CALD1 NGS Out       Reverse      CAATCTCTCCCTCCCAGGTT
CALD1 NGS Adaptor   Forward      ACACTCTTTCCCTACACGACGCTCTTCCGATCTGCTAATCAGCTAGCATATGTATGAGAA
CALD1 NGS Adaptor   Reverse      GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTTGGACTTGATTATTGTCCTAAGTG
TP53 NGS Adaptor    Forward      acactctttccctacacgacgctcttccgatctGCCCCTCCTCAGCATCTTAT
TP53 NGS Adaptor    Reverse      agacgtgtgctcttccgatctCTTAACCCCTCCTCCCAGAG
FANCM NGS Adaptor   Forward      acactctttccctacacgacgctcttccgatctGTGGGTGAAGAAGGTTTGGA
FANCM NGS Adaptor   Reverse      agacgtgtgctcttccgatctCGTTCCTCTCGTCCTTCAGA
SPRTN NGS Adaptor   Forward      acactctttccctacacgacgctcttccgatctTCGTGGGAGTTGGTGGAC
SPRTN NGS Adaptor   Reverse      agacgtgtgctcttccgatctTCCCGCCTCTTTCCCCAG

Table 2.4e: Sequences of siRNAs used in the study

siRNA                               Sequence/Identifier    Source
Control siRNA (Firefly luciferase)  CGUACGCGGAAUACUUCGAUU  Horizon Dharmacon
53BP1 siRNA                         D-003549-01            Horizon Dharmacon
RNF8 siRNA                          AGAAUGAGCUCCAAUGUAUUU  Horizon Dharmacon

Table 2.4f: List of antibodies used in the study

Antibody                                        Identifier   Use     Dilution                      Source
anti-vinculin                                   V9131        WB      1:5,000                       Sigma-Aldrich
anti-tubulin                                    NB600-506    WB      1:5,000                       Novus Biologicals
anti-γH2AX                                      A303-837A    IF      1:1,000                       Bethyl Laboratories
anti-53BP1                                      A300-273A    WB, IF  1:2,000 (WB); 1:1,000 (IF)    Bethyl Laboratories
anti-FLAG                                       A2220        WB, IF  1:5,000 (WB); 1:1,000 (IF)    Sigma-Aldrich
anti-RAD18                                      9040         WB      1:2,000                       Cell Signaling Technology
anti-RNF8                                       sc-271462    WB      1:1,000                       Santa Cruz Biotechnology
anti-PCNA                                       MA5-11358    WB      1:2,000                       ThermoFisher Scientific
anti-ubiquityl-PCNA (Lys164)                    13439        WB      1:10,000                      Cell Signaling Technology
anti-SSEA-4                                     MAB1435      IF      1:1,000                       R&D Systems
anti-NANOG                                      AF1997       IF      1:1,000                       R&D Systems
HRP-conjugated AffiniPure goat anti-rabbit IgG  111-035-144  WB      1:10,000                      Jackson ImmunoResearch
HRP-linked sheep anti-mouse IgG                 NA931        WB      1:10,000                      GE Healthcare
Alexa Fluor 488 goat anti-mouse IgG             A-11029      IF      1:10,000                      ThermoFisher Scientific
Alexa Fluor 488 goat anti-rabbit IgG            A-11008      IF      1:10,000                      ThermoFisher Scientific
Alexa Fluor 647 goat anti-rabbit IgG            A32733       IF      1:5,000                       ThermoFisher Scientific
Alexa Fluor 555 goat anti-mouse IgG             A28180       IF      1:5,000                       ThermoFisher Scientific
Alexa Fluor 555 goat anti-rabbit IgG            A27039       IF      1:5,000                       ThermoFisher Scientific

CHAPTER THREE: DISSECTING THE MOLECULAR MECHANISMS OF LONG DELETIONS INDUCED BY CRISPR-CAS9

Contributions: I conceived the project and designed experiments with the help of Alejandro Chavez and Alberto Ciccia. I performed the experiments with help from Samuel B. Hayward and Andrew Palacios. I wrote this section of the thesis with help from Alberto Ciccia and Samuel B. Hayward.

3.1 Introduction

The repurposing of CRISPR-Cas9 as a genome editing technology holds tremendous promise in human therapy and disease modeling29. Indeed, several clinical trials are currently underway for the treatment of cancer as well as hematologic and ophthalmologic diseases444. A long-recognized concern for the clinical use of CRISPR-based therapeutics is the generation of undesirable mutations at off-target loci and targeted sequences. Off-target effects have been studied extensively23,34,261,445,446, and several strategies have been designed to detect and mitigate them through careful guide design, improved delivery methods, and reengineering of Cas9 for greater specificity447-453.

In addition to off-target effects, Cas9-induced DSBs, when repaired by end-joining pathways, can introduce undesirable indels at the target locus. It was long believed that indels generated by Cas9-mediated end-joining were small in size (<50 bp)118,149,454-456. However, recent reports demonstrate that in addition to short indels, Cas9-induced DSBs can also generate substantial on-target genomic alterations such as complex chromosomal rearrangements and long deletions240,275. Such large-scale mutagenesis had previously gone undetected because these events escape conventional genotyping methods based on short-range PCR (<200 nt away from the target site). Analysis using long-range sequencing techniques has since uncovered Cas9-mediated generation of long deletions (>200 nt) and other complex chromosomal rearrangements at surprisingly high frequencies in mammalian cells240.

Cas9-induced breaks using single sgRNAs were initially shown to generate deletions of up to 600 nt in mouse zygotes457 and up to 1.5 kb in a haploid cancer cell line458. More recently, the prevalence of long deletions arising from a single Cas9-induced DSB in mammalian cells was analyzed more comprehensively using long-range PCR and PacBio or Sanger sequencing240. In that study, long deletions of up to 9.5 kb were detected in mouse ESCs at frequencies of 5-20%. Similarly, long deletions along with other large complex rearrangements were also detected at multiple distinct loci on different chromosomes, in cultured human and mouse cells, as well as in mouse embryos240.

The repair of two concurrent DSBs induced by two nucleases often gives rise to translocations, inversions, duplications, and long deletions457-463, but it is unclear how a single DSB can cause such complex rearrangements. A high frequency of microhomologies at large deletion breakpoint junctions suggests that MMEJ may play a role in the generation of these events275. However, the DNA repair mechanisms and factors involved in the generation of these CRISPR-mediated long deletions remain to be elucidated. Here, we developed a dual-Cas9 system capable of interrogating the function of hundreds of genes in regulating CRISPR-mediated long deletions. Long deletions were detected by a previously described fluorescence-activated cell sorting (FACS) assay involving the targeting of an intron of the PIGA gene. Using high-throughput FACS, we interrogated the effect of knocking out 610 DDR factors on the frequency of PIGA loss caused by CRISPR-mediated long deletions. The findings are discussed herein.

3.2 Results

3.2.1 A dual-Cas9 screen to map pathways that regulate CRISPR-mediated long deletions

To identify factors that modulate Cas9-induced long deletions, we used a previously described fluorescence-based assay that enables quantification of long deletion events240. The assay consists of targeting an intronic site that is 353 nt from the nearest exon of PIGA, which is located on the X chromosome. Previously, it had been demonstrated that targeting this intronic sequence results in the inactivation of PIGA due to long deletions extending many kilobases into the exon240. We tested this assay in our experimental conditions using a retinal pigment epithelial (RPE-1) cell line. hTERT-immortalized RPE-1 cells are diploid, have normal DNA repair machinery, and are functionally hemizygous at the PIGA locus due to X-inactivation240. We infected a single-cell-derived, Cas9-expressing RPE-hTERT clonal line with lentivirus expressing an sgRNA targeting either the PIGA intron or, as a positive control, a PIGA exon (Fig. 3.1a). Cells with the stably integrated sgRNA construct were selected and subsequently stained with a fluorescein-labeled proaerolysin (FLAER) reagent, which binds specifically to PIGA-dependent glycosylphosphatidylinositol (GPI) anchors in the plasma membrane of cells. The loss of PIGA was thereafter quantified by flow cytometry as the proportion of FLAER-negative cells in the population. Targeting the PIGA exon resulted in ~90% loss of PIGA (Fig. 3.1a). Further, we observed PIGA loss of up to ~4% upon targeting the intron, suggesting the incidence of large genomic lesions at a frequency consistent with the previous report (Fig. 3.1a).

To map the pathways involved in CRISPR-mediated long deletions, we developed a dual-Cas9 screening platform that combines Staphylococcus aureus Cas9 (SaCas9) mediated knockout of 610 genes with Streptococcus pyogenes Cas9 (SpCas9) cleavage of the aforementioned PIGA intronic sequence (Fig. 3.1b). The two Cas9 variants each use a distinct sgRNA, which prevents sgRNA swapping between them. This incompatibility ensures that the editing efficiency of each Cas9 variant is not lowered due to competition with the other. The sgRNA library’s plasmid construct expresses the SaCas9 enzyme along with an sgRNA that targets one gene. The customized sgRNA library targets known and predicted DDR factors along with other control genes. The relatively small library size, along with the high number of sgRNAs targeting each gene (10 sgRNAs per gene), enables the detection of relatively low-frequency events. This plasmid library was stably transduced (multiplicity of infection ~0.3) into a single-cell-derived, SpCas9-expressing RPE-hTERT clonal line and cultured for 14 days until the sgRNA populations reached an equilibrium. The pool was then infected at a low titer (multiplicity of infection ~0.3) with lentivirus expressing an SpCas9 sgRNA that targets the PIGA intron. The SpCas9-sgRNA construct additionally expresses RFP, thereby enabling FACS-based selection of cells expressing the PIGA-targeting sgRNA three days after transduction.

Finally, two weeks after PIGA sgRNA transduction, edited cells were separated by FACS, and next-generation sequencing was used to determine the genes whose knockout led to enrichment in or depletion from each population (Fig. 3.1b). Targeting of PIGA under these conditions resulted in a loss of PIGA at a frequency of 3-4% across the two replicates. We measured sgRNA abundances in the PIGA-inactivated (PIGA-) population and compared these distributions to the unsorted cell population to reveal which target genes promote (sgRNAs depleted from the PIGA- population) or restrict (sgRNAs enriched in the PIGA- population) long deletions.
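The comparison of sgRNA abundances between the PIGA- and unsorted populations can be illustrated with a per-sgRNA log2 fold change. This is a minimal sketch rather than the analysis actually run (which used MAGeCK, see Materials and methods); the reads-per-million scaling, the pseudocount, and all function names are assumptions made for illustration.

```python
import math


def reads_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw sgRNA read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    return {sgrna: count / total * 1e6 for sgrna, count in counts.items()}


def log2_fold_changes(
    piga_neg: dict[str, int],
    unsorted: dict[str, int],
    pseudocount: float = 1.0,
) -> dict[str, float]:
    """log2(PIGA- / unsorted) per sgRNA after depth normalization.

    Positive values: sgRNA enriched in PIGA- cells (target gene may restrict
    long deletions). Negative values: sgRNA depleted (target gene may promote them).
    """
    neg = reads_per_million(piga_neg)
    ref = reads_per_million(unsorted)
    return {
        sgrna: math.log2((neg.get(sgrna, 0.0) + pseudocount) / (ref[sgrna] + pseudocount))
        for sgrna in ref
    }


# Usage with made-up counts for two hypothetical sgRNAs.
lfc = log2_fold_changes({"sg_A": 300, "sg_B": 100}, {"sg_A": 200, "sg_B": 200})
```

The pseudocount keeps the ratio finite for sgRNAs with zero reads in one population.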

3.2.2 Quality control assessment of dual-Cas9 screen data

To determine the quality of our screen data, we performed quality control tests using previously designed analytical methods464,465. Briefly, we performed quality control at the read count level to determine abundance, quality, and evenness of sgRNA reads across replicates. Our analysis revealed a good abundance and quality of reads based on previously recommended statistics for CRISPR screens (>300 reads per sgRNA with at least 65% mapping to the plasmid library and low zero counts in early time points, Supplementary Fig. 3.1a,b)465. Similarly, we obtained a Gini index, a measure of the evenness of the sgRNA frequency distribution, within the recommended limits (0.1 for the cell library and at most 0.2 for later time points, Supplementary Fig. 3.1c), indicating evenness in oligonucleotide synthesis, optimal viral transduction efficiency and clonal selection465.

We next investigated if 14 days was sufficient for the cell viability phenotype of the sgRNAs to reach a steady state, given that the cell viability effect of the sgRNAs could introduce noise into the PIGA expression signal. To test this, we compared the correlation of sgRNA read counts for the populations collected on days 3, 14 and 28 (library, equilibrated library, and unsorted populations, respectively). The high correlation between the equilibrated library and the unsorted population across both replicates suggests stable sgRNA behavior after 14 days (Supplementary Fig. 3.1d). Thus, 14 days is indeed sufficient to ensure that the cell library population reaches equilibrium.
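As an illustration of the evenness metric used in the quality control above, the Gini index of a set of sgRNA read counts can be computed as follows. This is a generic sketch of the textbook formula; screen analysis tools may transform the counts (for example, by taking logarithms) before computing it, so values from this function are not guaranteed to match the reported ones.

```python
def gini_index(values: list[float]) -> float:
    """Gini index of non-negative values: 0 is perfectly even, values near 1 are highly skewed."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive value")
    if xs[0] < 0:
        raise ValueError("values must be non-negative")
    # Rank-weighted sum over the ascending-sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Usage: an even library versus one dominated by a single sgRNA.
even = gini_index([5, 5, 5, 5])
skewed = gini_index([0, 0, 0, 1])
```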

To benchmark our screening system, we identified essential genes by comparing the equilibrated library (day 14) with the parent cell library (day 3, Supplementary Fig. 3.1e). Importantly, previously reported essential genes were depleted after 14 days in the equilibrated library466. Further, we observed the enrichment of TP53-targeting guides after 14 days, which is consistent with reports that Cas9 induces a p53-mediated DNA damage response leading to a selection of cells with an inactive p53 pathway (Supplementary Fig. 3.1e)241,242. These results demonstrate the achievement of stable gene knockout in the cell pools. To further measure screen data quality at the sample level, we checked for consistency across replicates for gene behavior between the library and equilibrated library populations. Gene fold changes were highly correlated between the two replicates (Supplementary Fig. 3.1f).

In contrast, gene behavior in the PIGA- populations showed weak correlation between the replicates, indicating sampling variability or batch effects (Supplementary Fig. 3.2a). Given the long duration of the screen, we wondered if sgRNA frequency was stochastically altered during the course of the experiment. Such random drift has been described previously for CRISPR screens lacking a strong selective pressure467,468. To test this hypothesis, we compared the correlation of samples between the two replicates across different stages of the experiment. Comparison of the normalized sgRNA frequencies between replicates revealed an increase in variability at later time points upon analyzing the library (day 3), equilibrated library (day 14), and finally unsorted, PIGA- and PIGA+ cell populations (day 28, Supplementary Fig. 3.2b). Indeed, the unsorted and PIGA- populations showed only a moderate correlation between the two replicates, suggesting that the increased variability in the PIGA- populations could be due to the relatively long duration of the screen causing selection-independent drift of some sgRNA populations.
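The replicate comparison described above amounts to normalizing each sample's sgRNA counts to its total and computing Pearson's correlation coefficient between replicates. The sketch below shows one way to do this with the standard library only; the function names are hypothetical and no real screen data are used.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def replicate_correlation(rep1: dict[str, int], rep2: dict[str, int]) -> float:
    """Correlate sgRNA frequencies (counts / total counts) of two replicates."""
    shared = sorted(set(rep1) & set(rep2))
    total1, total2 = sum(rep1.values()), sum(rep2.values())
    freq1 = [rep1[s] / total1 for s in shared]
    freq2 = [rep2[s] / total2 for s in shared]
    return pearson(freq1, freq2)
```

Computing this value for each stage (library, equilibrated library, unsorted, PIGA-, PIGA+) would give the kind of time-resolved comparison shown in Supplementary Fig. 3.2b.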

3.2.3 Identification and validation of regulators of Cas9-induced long deletions

Despite variability between replicates in the unsorted and PIGA- populations, the high number of sgRNAs per gene (10 sgRNAs per gene) could still ensure reliable hit calling. We therefore ranked genes based on the fold change in the PIGA- population relative to the unsorted population and filtered for genes that consistently modulated long deletion frequency across the different sgRNAs (Fig. 3.1c). We identified several genes that consistently regulate the frequency of PIGA loss when knocked out with the majority of tested sgRNAs and across replicates (Supplementary Fig. 3.3). To determine the effectiveness of hit identification, we selected a few hits for further validation in a different single-cell-derived, Cas9-expressing RPE-hTERT clonal line. For this purpose, we independently knocked out HELQ, PSIP1, HMGN1, and ZRANB2 using two different SpCas9 sgRNAs in an arrayed format (Fig. 3.1d). The validation experiments demonstrated that HMGN1 knockout increased CRISPR-mediated PIGA loss by ~60% consistently across the two sgRNA conditions used. These results provide preliminary evidence that HMGN1 may regulate long deletions induced by Cas9. Importantly, these results also highlight the promise of our screening strategy to identify regulators of CRISPR-mediated long deletions, as all the hits validated in at least one of the two knockout conditions. Future efforts will focus on comprehensively validating all identified hits (Supplementary Fig. 3.3) and determining their role in regulating long deletions at different genomic loci and across multiple cell lines.
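The hit filter described above (rank by fold change, then keep genes whose sgRNAs agree) could be sketched as below. The exact criteria used in the thesis are not specified; the median-based ranking and both thresholds (`min_fraction`, `min_abs_median`) are hypothetical choices made purely for illustration, and the gene names in the usage example are placeholders.

```python
from statistics import median


def consistent_hits(
    gene_lfcs: dict[str, list[float]],
    min_fraction: float = 0.7,    # hypothetical: share of sgRNAs agreeing in direction
    min_abs_median: float = 0.5,  # hypothetical: minimum median |log2 fold change|
) -> list[tuple[str, float]]:
    """Return (gene, median log2 fold change) for genes with consistent sgRNA behavior."""
    hits = []
    for gene, lfcs in gene_lfcs.items():
        if not lfcs:
            continue
        med = median(lfcs)
        if abs(med) < min_abs_median:
            continue
        # Count sgRNAs whose fold change has the same sign as the gene-level median.
        agreeing = sum(1 for value in lfcs if value * med > 0)
        if agreeing / len(lfcs) >= min_fraction:
            hits.append((gene, med))
    # Rank from most enriched to most depleted in the PIGA- population.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Usage with placeholder genes and made-up per-sgRNA log2 fold changes.
example = {
    "GENE_A": [1.0, 1.2, 0.8, 0.9, -0.1],
    "GENE_B": [0.9, -1.0, 1.1, -0.8],
    "GENE_C": [-1.0, -1.2, -0.7],
}
ranked = consistent_hits(example)
```

In this toy input, GENE_B is discarded because its sgRNAs disagree in direction, which is the behavior the consistency filter is meant to capture.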

3.3 Discussion

Despite the rapid adoption of CRISPR-Cas9 technology as an efficient and versatile genome editing tool, concerns remain about the additional on-target damage and unintended mutational outcomes upon Cas9 cleavage. Given the relatively high frequency of occurrence of such collateral damage following Cas9 cleavage, modulating the DDR could be a strategy to minimize mutagenic outcomes such as long deletions. Thus, there is a need to further understand the molecular regulation of CRISPR-mediated long deletions.

Here, we developed a dual-Cas9 targeting system to map pathways that affect the frequency of CRISPR-mediated long deletions. The utility of this approach for identifying genes that regulate CRISPR-Cas9 editing outcomes has previously been demonstrated using a similar experimental design to screen for HDR-modulating genes119,290. Our screen identified known essential genes, indicating successful gene knockout in the screen (Supplementary Fig. 3.1e). However, the variability in sgRNA frequency between the replicates of the PIGA- pools could potentially generate false positives and negatives. Low replicate correlation is generally expected in positive selection screens, especially if the majority of sgRNAs do not strongly alter the phenotype being tested469,470. Indeed, modulation of the DDR has achieved only modest changes in the frequencies of repair pathways such as HDR119,309,471. Similarly, knocking out regulators of long deletions might induce only low-level alterations in their frequency. Thus, the absence of strong selection pressure and the relatively long duration of the assay may cause the sgRNA proportions in the population to change stochastically over time. Indeed, our observation of a time-dependent increase in variability between replicates is consistent with this hypothesis as well as with other reports467,468. Replicate variability could lead to the generation of false positives; however, strategies have been developed towards resolving these issues through more effective analysis472. Future efforts will focus on determining the best analytical strategy to ensure the most robust hit-calling.

Despite the decrease in signal-to-noise ratio due to replicate variability, the high number of sgRNAs per gene in the library improves the sensitivity of hit calling472. Promisingly, the effect of one identified hit, HMGN1, on CRISPR-mediated long deletions validated consistently in distinct knockout backgrounds (Fig. 3.1d). Further, all tested hits validated in at least one of the two knockout conditions. Future experiments will seek to determine the extent of gene inactivation in the two knockout conditions using qPCR and western blots to ascertain the cause of the differences between the replicates. Together, these promising data demonstrate the potential of the dual-Cas9 screen to reliably identify regulators of CRISPR-mediated long deletions.

Future efforts will focus on understanding how the validated hits regulate long deletions. In this regard, HMGN1 modulates the interaction of ATM with chromatin, and its depletion reduces the levels of ionizing radiation-induced ATM autophosphorylation and the activation of several ATM targets473. Given the central role of ATM in genome stability, HMGN1 loss might therefore impair the ATM-mediated suppression of toxic DSB repair that generates deleterious mutations474. Alternatively, given the prominent role of HMGN1 in altering chromatin structure by modulating chromatin compaction and organization475, it could inhibit long deletions by interfering with the recruitment of DDR factors to the Cas9-induced DSB. Similar functions could also be attributed to the other chromatin remodelers SUV39H2 and P300 (encoded by the EP300 gene) that were identified as hits in our screen (Fig. 3.1d).

Further, the identification of end-resection inhibitors, HELQ, 53BP1 and SHLD3 (Fig. 3.1d and Supplementary Fig. 3.3i), might indicate that the engagement of mutagenic end-resection-dependent pathways such as SSA and MMEJ could cause large-scale sequence loss following Cas9-induced DSBs137. Given that 53BP1 loss promotes MMEJ (Fig. 2.4)471, our findings are consistent with the recent observation of a high frequency of microhomologies at long deletion breakpoint junctions275,476.

Finally, our screen identified that knocking out PRPF19 reduces the frequency of CRISPR-mediated long deletions (Fig. 3.1d and Supplementary Fig. 3.3i). PRPF19 is an important regulator of pre-mRNA splicing477-479, but also promotes the recruitment of ATRIP through its direct interaction with ssDNA-bound RPA480,481. Interestingly, depletion of PRPF19 causes a reduction in BRCA1 mRNA levels, decreases RPA phosphorylation (a marker of single-stranded DNA), and reduces I-SceI-induced HR480. These results suggest that PRPF19 may promote long deletions by enhancing end-resection. More experiments are needed to dissect the multifaceted roles of PRPF19 in pre-mRNA splicing, the DNA damage response, cell proliferation, and apoptosis to further elucidate its potential role in stimulating CRISPR-mediated long deletions482.

It is interesting to note that our analysis of the dual-Cas9 screen data revealed only one gene that potentially promotes CRISPR-mediated long deletions. One plausible explanation for the scarcity of hits whose knockout decreases long deletions could be that very few genes independently promote such a highly mutagenic repair outcome. An alternative explanation may point to technical limitations associated with the narrow dynamic range of negative selection483. In addition, the potentially low-level modulation of long deletion frequency by DDR factors may further limit our ability to robustly detect genes that promote long deletions.

Given the identification of several promising hits using the dual-Cas9 screen, future efforts will focus on validating the role of these genes in regulating CRISPR-mediated long deletions. Validation steps would involve 1) reproducing the PIGA loss phenotype using different knockout/knockdown reagents and in different cell lines, 2) testing the effect of inactivating hits on CRISPR-mediated long deletions using other FACS-based reporters (such as targeting Cd9240), and 3) determining the function of the hits at the sequence level using long-read next-generation sequencing as well as Sanger-based sequencing approaches. Subsequent experiments will seek to dissect how the validated hits regulate CRISPR-mediated long deletions by generating functional mutants and targeting functional interactors of hits. Lastly, based on these experiments, it will be determined if targeting DDR pathways is an effective strategy to diminish the frequency of CRISPR-mediated long deletions in a clinically relevant system.

Together, these experiments will lead to the characterization of the molecular regulation of a highly mutagenic repair outcome that is not only associated with Cas9-induced DSBs but has also been implicated in cancer progression and other genetic diseases484. Therefore, our studies could advance our understanding of pathogenic DDR mechanisms in addition to potentially improving the safety of CRISPR-Cas9 as a therapeutic tool.

3.4 Materials and methods

DNA plasmids and cloning

lentiCas9-Blast (a gift from Feng Zhang; Addgene plasmid #52962) was used to generate the RPE-1-hTERT cell line stably expressing Cas9. sgRNA sequences targeting PIGA were cloned into the BsmBI site of the lentiviral vector pUSEPR (a gift from Chao Lu), which also expresses RFP-P2A-PURO. sgRNA library sequences were cloned into the SaCas9-expressing pXPR_206 (a gift from John Doench and David Root; Addgene plasmid #96920) at the BsmBI site to generate the sgRNA library used in the dual-Cas9 screen. SpCas9 sgRNA sequences that targeted HMGN1, PSIP1, ZRANB2, and HELQ, along with non-targeting (NT) sgRNA sequences, were cloned into lentiGuide-Puro (a gift from Feng Zhang; Addgene plasmid #52963) at the BsmBI restriction sites for sgRNA expression. All sgRNA sequences used in the study can be found in Supplementary Table 3.2.

SaCas9-sgRNA library construction

We created a library of 6397 sgRNAs (610 genes with 10 sgRNAs per gene and 297 non-targeting control guides). The 610 genes that were targeted included some known essential genes and non-essential genes as controls. For the selection of genes, curated gene lists from the Reactome pathways "DNA repair" and "chromatin remodeling" were used in conjunction with BioGRID and ClueGO. The Reactome pathways were expanded for all protein-protein interactions within BioGRID and trimmed based on ClueGO classification. A subset of potential unclassified DDR genes was added to this list based on their localization to laser micro-irradiation stripes, or their pulldown with known DNA repair factors in iPOND (isolation of proteins on nascent DNA) or co-immunoprecipitation experiments conducted previously within our lab.

The top ten sgRNAs for each gene were picked using the GPP sgRNA designer (https://portals.broadinstitute.org/gpp/public/analysistools/sgrna-design). An oligonucleotide pool containing all sgRNA sequences was synthesized by Agilent. Then, primers targeting the flanking sequences of the oligonucleotides were used for amplification of the sublibrary. The amplified sublibrary was then digested with BbsI to create homologies with BsmBI-digested pXPR_206. The amplified DNA products were ligated into the lentiviral vector using the Gibson cloning method485 and were transformed into electrocompetent cells (Endura) to obtain the sgRNA plasmid library. Representation of each sgRNA in the library was then determined by NGS using the protocol detailed below.

Cell lines and cell culture

HEK293T and RPE-1-hTERT cells were obtained from American Type Culture Collection (ATCC). Cell lines were cultured in DMEM supplemented with 10% Fetalgro bovine growth serum (BGS, RMBIO) and 1X penicillin-streptomycin (ThermoFisher Scientific). Cells were grown at 37 °C with 5% CO2. To generate single-clone-derived RPE-1-hTERT cell lines expressing Cas9, RPE-hTERT cells were transduced with lentiviruses (multiplicity of infection = 0.1) carrying SpCas9 (Addgene plasmid #52962) and selected with blasticidin (1 µg/ml), following which monoclonal populations were generated by limiting dilution. Clones were tested for Cas9 expression by western blotting, and the monoclonal lines with the highest Cas9 expression were selected for the dual-Cas9 screen and downstream validation of hits.

Lentiviral packaging

Lentivirus was produced by transfecting HEK293T with standard packaging vectors using the TransIT-LT1 Transfection Reagent (MIR 2700; Mirus Bio LLC). Viral supernatant was collected 48–72 h after transfection, filtered through a 0.45-μm polyethersulfone syringe filter, snap-frozen, and stored at −80 °C for future use.

PIGA targeting assay

Cas9-expressing RPE1 cells were seeded at a confluency of 50% in a 12-well plate on day 0. One day later (day 1), the cells were transduced with lentivirus (at a multiplicity of infection of ~0.3) carrying the PIGA-targeting SpCas9 sgRNA. The following day, the media was changed and puromycin (8 µg/ml) was added to select for sgRNA-expressing cells. Two days later, the drug selection was removed and cells were cultured until 14 days after the viral transduction. Around 300,000 cells were collected on day 14, stained in PBS+0.1% BSA for 30 min at room temperature with 1 μg/ml FLAER reagent (Cedarlane), washed twice and analyzed by flow cytometry using a BD LSRFortessa. Flow cytometry analyses were conducted in the Herbert Irving Comprehensive Cancer Center Flow Cytometry Shared Resource.

Pooled screen

On day 1, 20 million RPE-1-hTERT-SpCas9 cells were transduced with pXPR_206-derived, SaCas9-expressing sgRNA library lentivirus at a multiplicity of infection of ~0.3. The following day, cells stably expressing an sgRNA were selected by puromycin treatment (8 μg/ml) for one day (in which time the untransduced control cells were completely killed by the puromycin selection). On day 3, 6 million cells were subcultured for the screen and the remaining cell libraries were frozen down (library). On day 14, the cells were split and 20 million cells were transduced with pUSEPR-derived lentivirus expressing an SpCas9 sgRNA targeting the PIGA intron at a multiplicity of infection of ~0.3. The rest of the cells were frozen down (equilibrated library). Since the pUSEPR construct expresses RFP, 6 million PIGA sgRNA-expressing cells were sorted into culture by FACS using the BD Influx cell sorter on day 17. Cells were cultured until day 28, at which point 80 million cells were stained with FLAER (as described previously) and sorted by FACS using the BD Influx cell sorter. At least 8 million FLAER-positive (PIGA+) and ~2 million FLAER-negative (PIGA-) cells were collected and frozen down. The remaining cultured cells that were not sorted by FACS were spun down and frozen (unsorted). Throughout the screen, cells were cultured at a population of no less than 3.6 million cells to ensure a coverage of >500 cells per sgRNA (to guarantee that each perturbation is sufficiently represented in the final screening readout). A replicate was performed a month later using a different batch of sgRNA library lentivirus but the same cell line and conditions detailed above.
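The library size and coverage figures quoted in this chapter can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the Poisson model of lentiviral integration used to interpret a multiplicity of infection of ~0.3 is a standard simplifying assumption that is not stated in the thesis, and the function names are hypothetical.

```python
import math

GENES = 610
GUIDES_PER_GENE = 10
NON_TARGETING_GUIDES = 297


def library_size() -> int:
    """Total number of sgRNAs: targeting guides plus non-targeting controls."""
    return GENES * GUIDES_PER_GENE + NON_TARGETING_GUIDES


def cells_per_sgrna(cells: float, n_sgrnas: int) -> float:
    """Average representation of each sgRNA in a cell population."""
    return cells / n_sgrnas


def transduced_fraction(moi: float) -> float:
    """Fraction of cells with at least one integration (Poisson assumption)."""
    return 1.0 - math.exp(-moi)


def single_integrant_fraction(moi: float) -> float:
    """Among transduced cells, fraction carrying exactly one integration (Poisson assumption)."""
    return moi * math.exp(-moi) / (1.0 - math.exp(-moi))


# Usage: minimum culture size stated in the text versus the >500x coverage target.
coverage_at_minimum = cells_per_sgrna(3_600_000, library_size())
```

Under the Poisson assumption, a multiplicity of infection of ~0.3 means roughly a quarter of the cells are transduced and most of those carry a single sgRNA, which is the usual rationale for low-titer infection in pooled screens.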

Next-generation sequencing

DNA from each cell population (library, equilibrated library, unsorted, PIGA+ and PIGA-) was purified with DNA purification kits (mini genomic DNA kit: IB47200) and the total amount of DNA was quantified. A maximum of 2.5 μg of genomic DNA was amplified in a single Q5 polymerase (NEB) PCR reaction using primers specific to the sgRNA cassette. Up to 7 PCR reactions were set up for each cell population to obtain the desired coverage of the cell library. The thermocycler program was as follows: 1 min at 98 °C; 23 cycles of 10 s at 98 °C, 20 s at 60 °C, and 30 s at 72 °C; and 2 min at 72 °C. The amplicons were then barcoded in a second PCR step using custom primers containing TruSeq Illumina unique dual indexes (UDI). The thermal cycler program for the second PCR was as follows: 1 min at 98 °C; 12 cycles of 10 s at 98 °C, 20 s at 60 °C, and 30 s at 72 °C; and 2 min at 72 °C. PCR amplifications were verified using 2% agarose gels in TAE. The indexed amplicons from the second PCR were pooled and gel purified using a gel extraction kit (IBI gel/PCR DNA fragment extraction kit: IB47010). Gel-purified samples were sequenced at the Genome Sciences Facility at The Pennsylvania State College of Medicine.

Screen data analysis

We used the latest version of MAGeCK to analyze the screening data464. We used the MAGeCK “count” command to generate read counts of all populations. We next used the MAGeCK “test” command to identify the top negatively and positively selected sgRNAs and genes. MAGeCK uses an α-Robust Rank Aggregation (α-RRA) algorithm to calculate the “RRA score” of each gene, a score that describes the degree of negative (or positive) selection. A detailed description of the algorithm can be found in the original study464. Processed screen data for the PIGA- and unsorted cell populations are listed in Supplementary Table 3.1.
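To give an intuition for the rank aggregation step, the sketch below computes a simplified RRA-style score for one gene from the normalized ranks (between 0 and 1) of its sgRNAs: for each sgRNA it asks how surprising it is to see the k-th best rank that low under a uniform null, and keeps the minimum. This is not MAGeCK's implementation; the `alpha` cut-off is a simplified stand-in for the α threshold, and no permutation-based p-values are computed.

```python
from math import comb


def order_statistic_cdf(r: float, k: int, n: int) -> float:
    """P(the k-th smallest of n independent Uniform(0,1) draws is <= r)."""
    return sum(comb(n, j) * r**j * (1.0 - r) ** (n - j) for j in range(k, n + 1))


def rra_score(normalized_ranks: list[float], alpha: float = 1.0) -> float:
    """Simplified robust-rank-aggregation score for one gene (lower = stronger selection).

    normalized_ranks: rank of each sgRNA divided by the total number of sgRNAs.
    alpha: only sgRNAs with normalized rank <= alpha contribute (simplified alpha-RRA).
    """
    ranks = sorted(normalized_ranks)
    n = len(ranks)
    if n == 0:
        raise ValueError("gene has no sgRNAs")
    scores = [
        order_statistic_cdf(r, k, n)
        for k, r in enumerate(ranks, start=1)
        if r <= alpha
    ]
    # If no sgRNA passes the cut-off, the gene shows no evidence of selection.
    return min(scores) if scores else 1.0


# Usage: a gene whose sgRNAs all rank near the top scores lower than one with middling ranks.
strong = rra_score([0.01, 0.02, 0.03])
weak = rra_score([0.4, 0.5, 0.6])
```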

3.5 Figures

Figure 3.1: A dual-Cas9 screen identifies regulators of CRISPR-mediated long deletions.

a) Examples of PIGA editing in RPE1-Cas9-hTERT cells revealed by FLAER staining, for two sgRNAs and two controls. Positive control corresponds to unedited FLAER-stained cells while the negative control corresponds to unedited unstained cells. PIGA was inactivated using sgRNA targeting either intron 1 or exon 2. In the schematic, thick bars represent coding regions and narrow ones indicate untranslated regions. b) Schematic of the dual-Cas9 editing screening strategy as described in the main text. Cells were seeded on day 0 and the populations collected and sent for sequencing were cell library on day 3, equilibrated cell library on day 14, and unsorted, PIGA- (FLAER-) and PIGA+ (FLAER+) populations on day 28. c) Plot showing the distribution of the normalized change (y-axis) in the PIGA- population relative to the unsorted population. All sgRNA-targeted genes are shown. Red dots indicate hits that consistently modulated PIGA knockout frequency and will be taken forward for further validation of their effect on CRISPR-mediated long deletions. d) Individual SpCas9-mediated knockout of 4 hits identified in c along with a non-targeting (NT) control sgRNA. Each gene was knocked out using two SpCas9 sgRNAs independently in RPE1-Cas9-hTERT cells, following which the frequency of PIGA loss was tested. Cas9-induced PIGA loss frequency was measured by FACS following FLAER staining, 14 days after cells were transduced with lentivirus expressing a PIGA intron-targeting sgRNA. The values of individual experiments were normalized to the non-targeting (NT) control sgRNA (dashed line) and presented as the mean ± s.e.m. (n ≥ 2).


Supplementary Figure 3.1: Quality control assessment of dual-Cas9 screen.

a-c) Quality control measurements for both replicates of the screen populations derived from MAGeCK analysis464. a) Total number of reads and the percentage of unmapped reads. b) Number of zero-count sgRNAs. c) Gini-index. d) Pairwise Pearson’s correlations of sgRNA read counts for the indicated populations and replicates. Heatmap values correspond to Pearson’s correlation coefficient. e) CRISPR screen results of essential genes identified by comparing the cell library (library) with the equilibrated library (library eq.) populations (collected on days 3 and 14, respectively). Genes were rank-ordered by essentiality using robust rank aggregation (RRA) scores calculated by MAGeCK. Known essential genes and the TP53 gene are highlighted. f) Comparison between two replicates of gene frequency fold change (as calculated using MAGeCK analysis) between the cell library and equilibrated cell library. Pearson's correlation coefficient (R) is indicated.
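The quality control metrics in panels b-d can be illustrated with a short sketch. The function names are hypothetical and the calculations use the textbook definitions on raw counts; any transformation that MAGeCK applies to the counts before computing the Gini index is not reproduced here.

```python
from math import sqrt


def count_zero_sgrnas(counts):
    """Number of sgRNAs with no mapped reads."""
    return sum(1 for c in counts if c == 0)


def gini_index(counts):
    """Gini index of read counts: 0 = perfectly even, values near 1 = highly skewed."""
    x = sorted(counts)
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(x, start=1))
    return weighted / (n * total)


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length count vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```

A low Gini index and few zero-count sgRNAs indicate that the library is evenly represented, while a high Pearson's correlation between replicates indicates reproducible sgRNA counts.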


Supplementary Figure 3.2: Correlation comparison between dual-Cas9 screen populations and replicates.

a) Comparison between two replicates of gene frequency fold change (as calculated using MAGeCK analysis) between unsorted and PIGA- populations. Pearson's correlation coefficient (R) is indicated. b) Comparison of sgRNA sequence counts (normalized to total sgRNA counts) between two replicates for indicated populations with Pearson's correlation coefficient (R) reported.
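The two quantities compared in this figure can be sketched at the sgRNA level. This is an illustration only, not the MAGeCK calculation used for the figure: the helper names, the dictionary layout and the pseudocount are assumptions.

```python
from math import log2


def normalize_to_total(counts):
    """Convert raw sgRNA read counts into fractions of the total read count."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("population has no reads")
    return {sgrna: c / total for sgrna, c in counts.items()}


def log2_fold_change(sorted_freq, unsorted_freq, pseudocount=1e-6):
    """log2 ratio of normalized frequencies (e.g. PIGA- versus unsorted).

    The pseudocount (an assumed value) avoids division by zero for
    sgRNAs that are absent from one population.
    """
    return {
        sgrna: log2((sorted_freq[sgrna] + pseudocount)
                    / (unsorted_freq[sgrna] + pseudocount))
        for sgrna in sorted_freq
    }
```

Positive values indicate sgRNAs that are enriched in the PIGA- population relative to the unsorted population, and negative values indicate depletion.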


Supplementary Figure 3.3: sgRNA behavior in dual-Cas9 screen for identified hits.

a-i) sgRNA sequence counts (reads) across the indicated populations for all the sgRNAs that targeted the indicated gene in the dual-Cas9 screen. Data are shown for both screen replicates. Graphs highlight genes that were identified as “hits” in the screen for their modulation of Cas9-induced PIGA loss.


3.6 Tables

Supplementary Table 3.1: Processed dual-Cas9 screen data (PIGA- vs unsorted populations).

Gene Fold Change (PIGA- vs Unsorted)

Columns (repeated in three side-by-side panels): Gene, p-score, False discovery rate, Log2 fold change, Rank

RFC1 0.94279 0.999578 1.8745 1 | CBX3 0.69659 0.999578 0.33603 104 | APTX 0.60325 0.999578 0.15105 207
XPO1 0.50087 0.999578 1.8163 2 | ORC2 0.72549 0.999578 0.33285 105 | ACTR3 0.29766 0.890776 0.15053 208
ZRANB2 0.96588 0.999578 1.2554 3 | TELO2 0.25461 0.837716 0.33238 106 | BEND3 0.49741 0.999578 0.14949 209
XPA 0.99947 0.999578 1.1354 4 | KMT2D 0.94756 0.999578 0.33112 107 | SIRT1 0.91358 0.999578 0.14912 210
CDK7 0.05921 0.481576 1.0789 5 | PHF20 0.87922 0.999578 0.3298 108 | TMEM165 0.72992 0.999578 0.14604 211
TOP1 0.35779 0.940739 1.0637 6 | USP28 0.9898 0.999578 0.32961 109 | CGNL1 0.85807 0.999578 0.14483 212
TP53BP1 0.98852 0.999578 1.0499 7 | FANCB 0.98904 0.999578 0.32667 110 | CHD4 0.092077 0.601485 0.14455 213
PSIP1 0.99789 0.999578 0.95462 8 | RAD21 0.046771 0.430008 0.32445 111 | CEP63 0.94621 0.999578 0.14172 214
HELQ 0.98209 0.999578 0.89525 9 | KDM1A 0.032136 0.407719 0.32371 112 | XRCC2 0.82724 0.999578 0.1412 215
ATR 0.42073 0.999578 0.82938 10 | ATXN3 0.99648 0.999578 0.32123 113 | MRE11 0.21129 0.781139 0.1406 216
PARP2 0.9745 0.999578 0.81565 11 | UBC 0.17999 0.713243 0.32041 114 | SLF1 0.89068 0.999578 0.13861 217
SHLD3 0.81314 0.999578 0.81491 12 | UBE2B 0.99958 0.999578 0.31293 115 | DCLRE1A 0.74125 0.999578 0.13825 218
TIPIN 0.92253 0.999578 0.79075 13 | USP6 0.65728 0.999578 0.31259 116 | RPA4 0.83329 0.999578 0.13734 219
HMGN1 0.98223 0.999578 0.70994 14 | APAF1 0.03771 0.423416 0.31234 117 | WRNIP1 0.5758 0.999578 0.13685 220
BRD2 0.65221 0.999578 0.69749 15 | KMT5B 0.15722 0.713243 0.31148 118 | KDM2A 0.71053 0.999578 0.13601 221
XRCC4 0.98329 0.999578 0.69016 16 | H2AFX 0.39665 0.995698 0.31042 119 | HINT1 0.92263 0.999578 0.13568 222
KDM6B 0.82084 0.999578 0.68693 17 | MCM5 0.27914 0.882258 0.31039 120 | SNAP29 0.86492 0.999578 0.13521 223
SMC3 0.026006 0.377706 0.67431 18 | NHEJ1 0.93653 0.999578 0.30899 121 | KMT2B 0.79367 0.999578 0.13506 224
HDDC2 0.99743 0.999578 0.6608 19 | SMARCAD1 0.9811 0.999578 0.30865 122 | PHF8 0.51835 0.999578 0.13373 225
ORC4 0.23705 0.831052 0.65694 20 | MTMR6 0.57171 0.999578 0.30795 123 | GORASP2 0.85287 0.999578 0.1323 226
SWAP70 0.9671 0.999578 0.65666 21 | PARPBP 0.71056 0.999578 0.30728 124 | HERC2 0.93609 0.999578 0.13158 227
RAD17 0.97456 0.999578 0.65222 22 | EZH2 0.93321 0.999578 0.30505 125 | LIN9 0.14795 0.694242 0.12981 228
OOSP2 0.97712 0.999578 0.64435 23 | RECQL 0.90583 0.999578 0.30199 126 | DDX1 0.17903 0.713243 0.12893 229
POLDIP3 0.0207 0.332297 0.62931 24 | KAT2B 0.91567 0.999578 0.29526 127 | SMC6 0.76198 0.999578 0.12876 230
EIF3A 0.27741 0.88136 0.62072 25 | BMI1 0.98306 0.999578 0.29168 128 | RBM7 0.78629 0.999578 0.12794 231
RAD51B 0.55031 0.999578 0.61075 26 | USP15 0.75686 0.999578 0.29075 129 | PPM1D 0.2929 0.890776 0.12662 232
CBX5 0.96944 0.999578 0.6103 27 | DNA2 0.25232 0.837434 0.28965 130 | ABRAXAS1 0.56311 0.999578 0.12638 233
FAN1 0.99891 0.999578 0.60482 28 | C17orf53 0.56714 0.999578 0.28911 131 | ORC1 0.68599 0.999578 0.12595 234
SLF2 0.0070293 0.21803 0.58145 29 | USP7 0.24692 0.835814 0.28152 132 | RAD50 0.78715 0.999578 0.12457 235
SUV39H2 0.57638 0.999578 0.57634 30 | MORF4L1 0.77679 0.999578 0.28082 133 | Non_Target 0.98214 0.999578 0.12422 236
[missing] 0.99271 0.999578 0.57014 31 | SPRTN 0.46749 0.999578 0.27967 134 | PAXIP1 0.83001 0.999578 0.12391 237
WDR48 0.99594 0.999578 0.56757 32 | TDG 0.99142 0.999578 0.27188 135 | ANKHD1 0.97125 0.999578 0.12257 238
MCM6 0.096696 0.601485 0.5654 33 | TFCP2 0.87756 0.999578 0.27053 136 | C7orf49 0.2967 0.890776 0.12225 239
CHEK1 0.23461 0.827238 0.5625 34 | ZRANB3 0.72016 0.999578 0.26319 137 | HPRT1 0.54398 0.999578 0.12183 240
TET3 0.98231 0.999578 0.56146 35 | UBE2N 0.28162 0.884553 0.25978 138 | THUMPD3 0.87526 0.999578 0.11815 241
POLA1 0.58395 0.999578 0.5556 36 | NKTR 0.74572 0.999578 0.25749 139 | SETD1A 0.71484 0.999578 0.11788 242
CABS1 0.97059 0.999578 0.54541 37 | USP11 0.63293 0.999578 0.25611 140 | MYSM1 0.72676 0.999578 0.11742 243
CENPA 0.39095 0.993674 0.53899 38 | NEIL1 0.93043 0.999578 0.25336 141 | AGPAT4 0.69598 0.999578 0.11725 244
FANCC 0.82046 0.999578 0.53865 39 | SMARCD3 0.93692 0.999578 0.25327 142 | WDR76 0.48113 0.999578 0.11656 245
EP300 0.095494 0.601485 0.53255 40 | KAT2A 0.34391 0.916091 0.25311 143 | RNF169 0.47125 0.999578 0.11626 246
CHEK2 0.88532 0.999578 0.52913 41 | DCLRE1C 0.76559 0.999578 0.25226 144 | POLR2A 0.65503 0.999578 0.11334 247
BRCA2 0.94726 0.999578 0.52902 42 | SNX2 0.73071 0.999578 0.25139 145 | SYNE1 0.75405 0.999578 0.11328 248
ERCC6L 0.85731 0.999578 0.52835 43 | POLI 0.69783 0.999578 0.24954 146 | MMS22L 0.72059 0.999578 0.10901 249
TET2 0.79952 0.999578 0.5235 44 | SIRT5 0.82025 0.999578 0.24788 147 | UIMC1 0.97903 0.999578 0.10718 250
PALB2 0.55782 0.999578 0.52133 45 | UCHL5 0.80681 0.999578 0.24779 148 | RNF4 0.93989 0.999578 0.10333 251
CDC7 0.46869 0.999578 0.51967 46 | PCM1 0.19191 0.745628 0.24559 149 | EP400 0.080084 0.561512 0.10233 252
HUS1 0.72845 0.999578 0.51724 47 | COMMD8 0.99824 0.999578 0.2439 150 | RNASEH2C 0.62226 0.999578 0.1009 253
PRIMPOL 0.99058 0.999578 0.51304 48 | GTF2H1 0.77616 0.999578 0.2429 151 | ZMYM2 0.52981 0.999578 0.10052 254
MYH10 0.92356 0.999578 0.49567 49 | POLK 0.80999 0.999578 0.24115 152 | ZNF691 0.69313 0.999578 0.099522 255
SET 0.56682 0.999578 0.49296 50 | VN1R4 0.87472 0.999578 0.23957 153 | RUVBL2 0.33993 0.916091 0.098673 256
TOP3A 0.49887 0.999578 0.48734 51 | ERCC1 0.44635 0.999578 0.23669 154 | UBE3A 0.66243 0.999578 0.095167 257
ETAA1 0.96643 0.999578 0.48709 52 | CUL5 0.94015 0.999578 0.23565 155 | RAD9B 0.59076 0.999578 0.089735 258
ERCC8 0.98425 0.999578 0.48135 53 | LRRC41 0.55899 0.999578 0.2291 156 | BAIAP2 0.86979 0.999578 0.088796 259
USP3 0.99609 0.999578 0.47684 54 | TTBK2 0.97287 0.999578 0.22837 157 | PCID2 0.31817 0.90335 0.085225 260
JADE1 0.53897 0.999578 0.47645 55 | TAF12 0.58583 0.999578 0.22794 158 | RFC5 0.008162 0.22253 0.08172 261
BRD8 0.97139 0.999578 0.471 56 | HERC3 0.95679 0.999578 0.22484 159 | INTS2 0.29936 0.890776 0.079263 262
RXFP3 0.85975 0.999578 0.47097 57 | DDB2 0.89228 0.999578 0.22244 160 | RNASE8 0.87015 0.999578 0.075987 263
UNG 0.90927 0.999578 0.47051 58 | CCNK 0.39616 0.995698 0.21639 161 | MCM3AP 0.24527 0.835814 0.075809 264
EPC1 0.99264 0.999578 0.4578 59 | SUMO2 0.7922 0.999578 0.21568 162 | BRD3 0.77832 0.999578 0.074132 265
FAM35A 0.066731 0.497444 0.45465 60 | ENY2 0.1879 0.734751 0.21539 163 | SLX4 0.033676 0.407719 0.073565 266
SUZ12 0.91052 0.999578 0.4535 61 | ZFC3H1 0.75201 0.999578 0.21412 164 | OR2D2 0.54519 0.999578 0.06893 267
EPC2 0.96642 0.999578 0.44881 62 | MLH3 0.82273 0.999578 0.2125 165 | TCP1 0.13951 0.677754 0.06784 268
SEM1 0.75941 0.999578 0.44809 63 | ATRX 0.73322 0.999578 0.21029 166 | DIP2B 0.84197 0.999578 0.067333 269
NEK9 0.0056185 0.213369 0.44573 64 | NSUN2 0.89647 0.999578 0.20979 167 | TDP1 0.47438 0.999578 0.065639 270
PTEN 0.96464 0.999578 0.44466 65 | KDM5A 0.87779 0.999578 0.20875 168 | PSMD1 0.23379 0.827238 0.063633 271
PROP1 0.83626 0.999578 0.44001 66 | ING4 0.97012 0.999578 0.20796 169 | C18orf25 0.77365 0.999578 0.059429 272
MSH2 0.90337 0.999578 0.43923 67 | CASP3 0.99537 0.999578 0.20745 170 | L3MBTL3 0.96777 0.999578 0.059312 273
RFC4 0.33567 0.916091 0.43873 68 | ZGRF1 0.75394 0.999578 0.20738 171 | MCRS1 0.82343 0.999578 0.058401 274
SKA3 0.7476 0.999578 0.42811 69 | CYLD 0.95306 0.999578 0.20659 172 | RAD1 0.45183 0.999578 0.056447 275
HDAC2 0.93019 0.999578 0.42176 70 | HMGB1 0.20148 0.769867 0.19784 173 | ESCO2 0.69411 0.999578 0.056003 276
PNKP 0.17111 0.713243 0.42025 71 | CEP164 0.84769 0.999578 0.19654 174 | SIN3A 0.72812 0.999578 0.052391 277
NSMCE4A 0.34204 0.916091 0.41999 72 | BRIP1 0.96892 0.999578 0.19493 175 | THOC1 0.76134 0.999578 0.052383 278
DNMT1 0.72072 0.999578 0.41954 73 | PMS1 0.69818 0.999578 0.19371 176 | CBX1 0.9243 0.999578 0.05084 279
WDR75 0.58769 0.999578 0.41227 74 | HNRNPUL2 0.44026 0.999578 0.19261 177 | FANCF 0.2979 0.890776 0.049386 280
PRPF40A 0.29491 0.890776 0.41217 75 | ADAR 0.61685 0.999578 0.18943 178 | HNRNPUL1 0.61728 0.999578 0.048322 281
MSH3 0.90614 0.999578 0.39718 76 | EME2 0.91109 0.999578 0.1894 179 | EME1 0.60603 0.999578 0.048083 282
SBF2 0.97868 0.999578 0.39684 77 | YTHDF1 0.81521 0.999578 0.18652 180 | ALKBH2 0.91298 0.999578 0.04732 283
ASF1A 0.82552 0.999578 0.39628 78 | PIAS4 0.7476 0.999578 0.18645 181 | PARK7 0.16164 0.713243 0.04646 284
RNMT 0.7097 0.999578 0.39205 79 | TET1 0.74122 0.999578 0.1847 182 | SIN3B 0.066036 0.497444 0.042339 285
PIAS2 0.99368 0.999578 0.38896 80 | DRD3 0.82021 0.999578 0.18404 183 | PARP15 0.84721 0.999578 0.040915 286
SMG1 0.033984 0.407719 0.3855 81 | RECQL5 0.76419 0.999578 0.18171 184 | RAD54L 0.4512 0.999578 0.039832 287
SDCCAG8 0.98887 0.999578 0.38488 82 | CUL4B 0.98087 0.999578 0.18153 185 | SETDB1 0.82687 0.999578 0.039512 288
ALKBH3 0.28905 0.890776 0.37867 83 | POLE2 0.34031 0.916091 0.17794 186 | MTNR1B 0.69426 0.999578 0.03783 289
GATAD2A 0.68721 0.999578 0.37639 84 | FBXO18 0.91946 0.999578 0.17743 187 | PIAS1 0.94349 0.999578 0.036706 290
MLH1 0.78268 0.999578 0.37604 85 | UBE2U 0.63859 0.999578 0.17376 188 | UVSSA 0.90425 0.999578 0.033053 291
IRF3 0.96761 0.999578 0.3725 86 | RIF1 0.1373 0.67544 0.17208 189 | SLC22A6 0.67438 0.999578 0.031609 292
HAT1 0.75129 0.999578 0.36934 87 | NEIL3 0.89817 0.999578 0.16661 190 | PSMA4 0.095415 0.601485 0.030926 293
XRCC5 0.65085 0.999578 0.36909 88 | CXorf57 0.90059 0.999578 0.16593 191 | RHNO1 0.85681 0.999578 0.029678 294
UBQLN3 0.91086 0.999578 0.36712 89 | SIRT4 0.22578 0.819787 0.16518 192 | POLE4 0.86073 0.999578 0.029241 295
TP53 0.30914 0.897965 0.3665 90 | GEN1 0.7604 0.999578 0.16515 193 | KDM6A 0.77408 0.999578 0.028491 296
NEDD1 0.20149 0.769867 0.36427 91 | RAD18 0.87712 0.999578 0.16407 194 | APOBEC2 0.83059 0.999578 0.026533 297
KMT5A 0.88094 0.999578 0.35882 92 | RNASEH2B 0.97423 0.999578 0.1636 195 | OR8B8 0.74199 0.999578 0.025356 298
KIAA0753 0.93175 0.999578 0.35657 93 | MTA3 0.43077 0.999578 0.1634 196 | MSL2 0.92074 0.999578 0.024666 299
PRKDC 0.5332 0.999578 0.35581 94 | SKIV2L2 0.51399 0.999578 0.16249 197 | GSX1 0.95528 0.999578 0.023897 300
MKI67 0.94312 0.999578 0.35442 95 | RRM2B 0.90887 0.999578 0.15843 198 | FAAP20 0.6456 0.999578 0.023816 301
USP1 0.97647 0.999578 0.3542 96 | HLTF 0.92715 0.999578 0.15775 199 | RBX1 0.74396 0.999578 0.021864 302
TP53BP2 0.80603 0.999578 0.35159 97 | RUVBL1 0.12198 0.657653 0.15749 200 | POLB 0.99833 0.999578 0.021833 303
CEP57 0.46308 0.999578 0.35153 98 | XPC 0.87432 0.999578 0.15564 201 | POLN 0.87902 0.999578 0.018989 304
KDM4D 0.96916 0.999578 0.34861 99 | ERCC6 0.69974 0.999578 0.15431 202 | CEP152 0.81372 0.999578 0.015707 305
RPP30 0.67198 0.999578 0.34431 100 | C20orf196 0.66615 0.999578 0.15391 203 | CEP131 0.94974 0.999578 0.015679 306
FANCM 0.81841 0.999578 0.34095 101 | ABCF1 0.79496 0.999578 0.15309 204 | KMT5C 0.90834 0.999578 0.015642 307
BARD1 0.59073 0.999578 0.3377 102 | NOL11 0.39462 0.995698 0.15275 205 | ZDBF2 0.87544 0.999578 0.014389 308
DCUN1D1 0.69708 0.999578 0.33621 103 | PIAS3 0.61727 0.999578 0.15128 206 | TADA2A 0.78781 0.999578 0.014377 309

Gene Fold Change (PIGA- vs Unsorted) [continued]

Columns (repeated in three side-by-side panels): Gene, p-score, False discovery rate, Log2 fold change, Rank

REV1 0.93466 0.999578 0.011653 310 | PSMD2 0.11034 0.629029 -0.16648 413 | XRCC1 0.098604 0.601485 -0.41039 516
MAPK14 0.95821 0.999578 0.009334 311 | CUL4A 0.62243 0.999578 -0.167 414 | TDP2 0.2448 0.835814 -0.41233 517
PARP1 0.27735 0.88136 0.007204 312 | NUDCD3 0.018157 0.30766 -0.16788 415 | MDC1 0.38527 0.987459 -0.4132 518
CTNNBIP1 0.71371 0.999578 0.006609 313 | SAMHD1 0.63028 0.999578 -0.16883 416 | [missing] 0.067685 0.497444 -0.41633 519
CGN 0.11627 0.644766 0.002535 314 | YBX1 0.48025 0.999578 -0.17123 417 | POLL 0.31135 0.900115 -0.41656 520
ABL1 0.60892 0.999578 0.002364 315 | XIAP 0.17037 0.713243 -0.17178 418 | RNASEH2A 0.010884 0.252189 -0.42633 521
POLM 0.46464 0.999578 2.71E-07 316 | ING3 0.3223 0.90335 -0.17269 419 | RBBP5 0.26309 0.853659 -0.4328 522
USP4 0.86723 0.999578 -0.00422 317 | TUBB2A 0.57017 0.999578 -0.17386 420 | CDK1 0.041356 0.427322 -0.4371 523
TAAR9 0.77336 0.999578 -0.00474 318 | SUPT5H 0.071461 0.518939 -0.17409 421 | CLSPN 0.12147 0.657653 -0.4397 524
TREX1 0.64037 0.999578 -0.00732 319 | LMNA 0.84591 0.999578 -0.17485 422 | NSD2 0.55258 0.999578 -0.44283 525
ADH5 0.69793 0.999578 -0.00962 320 | KAT5 0.046036 0.430008 -0.175 423 | KRT35 0.21558 0.792205 -0.44303 526
MUM1 0.63057 0.999578 -0.01443 321 | MCM2 0.31399 0.90335 -0.17654 424 | RHO 0.30439 0.892672 -0.44846 527
TRIM9 0.6954 0.999578 -0.01592 322 | SLX1B 0.82202 0.999578 -0.17932 425 | PLK4 0.24907 0.835814 -0.4557 528
BLM 0.69053 0.999578 -0.01717 323 | SLC2A7 0.61886 0.999578 -0.1842 426 | RTEL1 0.17522 0.713243 -0.45805 529
MUTYH 0.41828 0.999578 -0.02017 324 | POLH 0.78588 0.999578 -0.18425 427 | SETD2 0.053229 0.43878 -0.45908 530
RAD52 0.29911 0.890776 -0.02213 325 | FAAP24 0.49846 0.999578 -0.1872 428 | GATAD2B 0.14 0.677754 -0.46121 531
WDR18 0.53767 0.999578 -0.02462 326 | NAP1L4 0.92125 0.999578 -0.18897 429 | CAT 0.3067 0.895157 -0.46608 532
IL22 0.033507 0.407719 -0.02593 327 | OR2AK2 0.78185 0.999578 -0.18959 430 | BCOR 0.27599 0.88136 -0.46682 533
KDM4A 0.8032 0.999578 -0.02721 328 | SALL2 0.77928 0.999578 -0.19329 431 | EXO1 0.20808 0.781139 -0.46775 534
UBE2V2 0.94255 0.999578 -0.02887 329 | TICRR 0.40762 0.999578 -0.19421 432 | HOXB9 0.16971 0.713243 -0.48768 535
SNX1 0.48728 0.999578 -0.02937 330 | AICDA 0.64541 0.999578 -0.19467 433 | RBBP8 0.3757 0.966999 -0.48794 536
MTA1 0.9968 0.999578 -0.03011 331 | NBN 0.030626 0.407719 -0.19838 434 | PRDM10 0.12722 0.657653 -0.49293 537
ING5 0.16871 0.713243 -0.03069 332 | SLC1A5 0.82348 0.999578 -0.20224 435 | RAD51C 0.36073 0.944401 -0.49454 538
WRN 0.24937 0.835814 -0.03114 333 | KDM2B 0.092692 0.601485 -0.2093 436 | OGG1 0.1728 0.713243 -0.49505 539
SMARCA5 0.71725 0.999578 -0.03211 334 | MTA2 0.47571 0.999578 -0.20956 437 | PMS2 0.02994 0.407719 -0.49565 540
TNKS1BP1 0.90913 0.999578 -0.03212 335 | RTF1 0.18006 0.713243 -0.21031 438 | KIAA0100 0.34257 0.916091 -0.50167 541
RNF168 0.6669 0.999578 -0.03425 336 | MRGBP 0.11539 0.644766 -0.21206 439 | YEATS2 0.20193 0.769867 -0.50221 542
MBD3 0.57631 0.999578 -0.03439 337 | PRMT7 0.12432 0.657653 -0.21221 440 | PIF1 0.12627 0.657653 -0.51103 543
SMARCAL1 0.16964 0.713243 -0.03492 338 | BABAM1 0.87678 0.999578 -0.21808 441 | PML 0.14389 0.691124 -0.51618 544
NASP 0.48585 0.999578 -0.03509 339 | CDK12 0.85496 0.999578 -0.21915 442 | FANCA 0.32088 0.90335 -0.51842 545
MCM9 0.67823 0.999578 -0.0354 340 | TRIM28 0.59636 0.999578 -0.2215 443 | TWISTNB 0.042211 0.427322 -0.52761 546
KRTAP10-12 0.20877 0.781139 -0.03755 341 | NSD1 0.60542 0.999578 -0.22466 444 | FANCL 0.20743 0.781139 -0.53195 547
SETD7 0.6753 0.999578 -0.0379 342 | OR1A1 0.60934 0.999578 -0.22616 445 | RNASEH1 0.10636 0.617922 -0.53937 548
HELB 0.91742 0.999578 -0.04206 343 | ATAD5 0.30173 0.892672 -0.22715 446 | FANCG 0.047079 0.430008 -0.54098 549
EHMT2 0.45074 0.999578 -0.04293 344 | PARP3 0.91644 0.999578 -0.22739 447 | ERCC4 0.27307 0.881337 -0.54345 550
TPD52L1 0.934 0.999578 -0.04581 345 | CIAPIN1 0.43417 0.999578 -0.22812 448 | POLA2 0.013875 0.282121 -0.54446 551
TBK1 0.46209 0.999578 -0.0461 346 | DBNL 0.58915 0.999578 -0.23014 449 | IFITM3 0.064933 0.497444 -0.54499 552
MBD1 0.63555 0.999578 -0.04627 347 | CTCF 0.067049 0.497444 -0.23424 450 | RFC3 0.16103 0.713243 -0.55005 553
DDB1 0.09305 0.601485 -0.04659 348 | LIG4 0.32018 0.90335 -0.23427 451 | SRCAP 0.012653 0.266144 -0.55111 554
RAP1B 0.41839 0.999578 -0.04883 349 | TWNK 0.13692 0.67544 -0.23625 452 | RCC1 0.13634 0.67544 -0.55447 555
BRCA1 0.83071 0.999578 -0.05166 350 | COLGALT1 0.24709 0.835814 -0.24071 453 | GTF2H4 0.16537 0.713243 -0.57187 556
SIRT6 0.14728 0.694242 -0.05274 351 | ACTL9 0.69761 0.999578 -0.24259 454 | RPA2 0.015226 0.299609 -0.58358 557
NCOR1 0.92925 0.999578 -0.05323 352 | RNF113A 0.16599 0.713243 -0.2439 455 | BOD1L1 0.12312 0.657653 -0.5989 558
OR10A5 0.28277 0.884553 -0.05458 353 | TREX2 0.48839 0.999578 -0.25744 456 | XRCC6 0.000651 0.131818 -0.60646 559
XAB2 0.053229 0.43878 -0.05674 354 | NIPBL 0.32022 0.90335 -0.25805 457 | DDX20 0.42626 0.999578 -0.61267 560
SOX14 0.55642 0.999578 -0.05875 355 | AWAT1 0.54538 0.999578 -0.25835 458 | EED 0.0746 0.52914 -0.6151 561
WDR43 0.081406 0.564291 -0.05974 356 | LIG1 0.32961 0.916091 -0.25859 459 | DFFB 0.04227 0.427322 -0.62318 562
PSMD7 0.47206 0.999578 -0.06344 357 | THOC2 0.15819 0.713243 -0.26115 460 | DONSON 0.11028 0.629029 -0.63799 563
ALDH2 0.34103 0.916091 -0.06564 358 | CDX4 0.38753 0.989083 -0.26522 461 | UNC45A 0.11998 0.657653 -0.65179 564
WNK1 0.17967 0.713243 -0.0665 359 | IRF4 0.39955 0.99887 -0.26597 462 | RNF2 0.097859 0.601485 -0.65204 565
POLR2B 0.36962 0.959446 -0.06668 360 | ZMPSTE24 0.29784 0.890776 -0.26773 463 | ACSS1 0.063194 0.497444 -0.65444 566
MGMT 0.44768 0.999578 -0.06687 361 | SETX 0.78686 0.999578 -0.26949 464 | SIRT3 0.089593 0.601485 -0.66039 567
PARG 0.50275 0.999578 -0.06759 362 | NOLC1 0.37432 0.966999 -0.27227 465 | YEATS4 0.086781 0.594791 -0.66982 568
TBP 0.42863 0.999578 -0.07031 363 | JADE2 0.6294 0.999578 -0.27266 466 | CHAF1A 0.12552 0.657653 -0.6711 569
PSMD8 0.2526 0.837434 -0.07039 364 | APEX2 0.113 0.638244 -0.27564 467 | INO80 0.096647 0.601485 -0.68857 570
APEX1 0.91744 0.999578 -0.07224 365 | KNL1 0.045926 0.430008 -0.28399 468 | PLRG1 0.10159 0.613399 -0.69539 571
RNF146 0.56469 0.999578 -0.07398 366 | SBF1 0.23365 0.827238 -0.28789 469 | ERCC3 0.031749 0.407719 -0.71064 572
MAPK8 0.45329 0.999578 -0.07469 367 | FEN1 0.22372 0.817184 -0.28936 470 | RBBP7 0.017958 0.30766 -0.74995 573
KIF21A 0.44435 0.999578 -0.07633 368 | CST8 0.67392 0.999578 -0.2906 471 | GTF2F1 0.027695 0.392882 -0.75063 574
PAXX 0.33972 0.916091 -0.07776 369 | RAVER1 0.41006 0.999578 -0.29275 472 | REV3L 0.091053 0.601485 -0.7519 575
MUS81 0.25543 0.837716 -0.07887 370 | TJP1 0.159 0.713243 -0.29735 473 | ACACA 0.004486 0.196566 -0.76298 576
CDC25C 0.50342 0.999578 -0.07979 371 | SUV39H1 0.33918 0.916091 -0.30567 474 | NPM1 0.003532 0.179545 -0.78595 577
PPP1R13B 0.52299 0.999578 -0.08038 372 | STUB1 0.24645 0.835814 -0.30608 475 | ARID1A 0.022648 0.354235 -0.78928 578
TOPBP1 0.54175 0.999578 -0.08056 373 | MCPH1 0.84104 0.999578 -0.30729 476 | XRCC3 0.012593 0.266144 -0.7978 579
CCNH 0.050651 0.430008 -0.08112 374 | MAD2L2 0.22817 0.823579 -0.30955 477 | POLD1 0.018992 0.313104 -0.80328 580
KMT2C 0.76735 0.999578 -0.08153 375 | SMARCA4 0.050755 0.430008 -0.32055 478 | RPA1 0.017163 0.30766 -0.81309 581
BRD7 0.49183 0.999578 -0.08432 376 | EEF2 0.5161 0.999578 -0.32128 479 | BRD4 0.006304 0.213636 -0.82615 582
TRAIP 0.5431 0.999578 -0.08454 377 | KLHL1 0.41008 0.999578 -0.32251 480 | TXN 0.035007 0.407719 -0.82809 583
NUDT1 0.76616 0.999578 -0.08516 378 | EEF1A1 0.010417 0.252189 -0.32364 481 | SOD1 0.042886 0.427322 -0.82872 584
ZNRF4 0.13007 0.662096 -0.09046 379 | SMARCD2 0.51996 0.999578 -0.32527 482 | HTATSF1 0.025579 0.377706 -0.83677 585
MCM4 0.32284 0.90335 -0.09334 380 | FAM210A 0.039101 0.42592 -0.33142 483 | POLQ 0.025946 0.377706 -0.85378 586
BCL2 0.72539 0.999578 -0.09479 381 | RECQL4 0.41743 0.999578 -0.33212 484 | NSMCE3 0.043433 0.427322 -0.86187 587
RPRD2 0.44521 0.999578 -0.09764 382 | PHF10 0.49619 0.999578 -0.33561 485 | RNF8 0.061873 0.496611 -0.86189 588
TCOF1 0.0048336 0.196566 -0.10509 383 | CHTF18 0.10521 0.617104 -0.33978 486 | RFC2 0.03465 0.407719 -0.88866 589
MSH6 0.96945 0.999578 -0.10625 384 | HDAC4 0.10429 0.617104 -0.34116 487 | POLG2 0.005946 0.213369 -0.88919 590
SPATA16 0.72316 0.999578 -0.10629 385 | EXO5 0.73039 0.999578 -0.35092 488 | MCM7 0.050517 0.430008 -0.88947 591
NEDD8 0.76432 0.999578 -0.10793 386 | TONSL 0.1816 0.714702 -0.35188 489 | PCNA 0.041475 0.427322 -0.91508 592
C3orf33 0.69901 0.999578 -0.10835 387 | HDAC1 0.24833 0.835814 -0.35562 490 | ELL 0.007149 0.21803 -0.9155 593
UBE2A 0.85143 0.999578 -0.10871 388 | DSCC1 0.4454 0.999578 -0.36112 491 | FANCD2 0.000959 0.131818 -0.92953 594
TMEM120B 0.86346 0.999578 -0.11257 389 | CDK2 0.17378 0.713243 -0.36224 492 | KDM5C 0.001654 0.131818 -0.93726 595
RNF138 0.82208 0.999578 -0.11405 390 | ERCC5 0.30312 0.892672 -0.36392 493 | POLE 0.016756 0.30766 -0.97186 596
NEK8 0.74568 0.999578 -0.11891 391 | DNMT3A 0.17799 0.713243 -0.36477 494 | DCLRE1B 0.007924 0.22253 -0.98397 597
PARP4 0.91835 0.999578 -0.12307 392 | HSDL2 0.57673 0.999578 -0.36644 495 | CDC45 0.002161 0.131818 -1.0164 598
SIRT2 0.48727 0.999578 -0.12842 393 | NSUN5 0.072842 0.522745 -0.36958 496 | LIG3 0.011162 0.252189 -1.0184 599
ZC3H4 0.2328 0.827238 -0.12948 394 | RRM1 0.16688 0.713243 -0.371 497 | MDM4 0.009881 0.251136 -1.0358 600
NSMCE1 0.44485 0.999578 -0.13142 395 | RAD51 0.035425 0.407719 -0.37393 498 | ERCC2 0.001803 0.131818 -1.0571 601
FANCE 0.69601 0.999578 -0.13318 396 | RAD9A 0.050109 0.430008 -0.37415 499 | DDX11 0.017978 0.30766 -1.1397 602
SMC5 0.47665 0.999578 -0.1339 397 | POLD3 0.049523 0.430008 -0.37973 500 | UHRF1 0.001227 0.131818 -1.2139 603
RAD51D 0.36259 0.945208 -0.13401 398 | NUDCD2 0.58128 0.999578 -0.38022 501 | PRMT5 0.004794 0.196566 -1.2362 604
TADA2B 0.61789 0.999578 -0.13857 399 | BUD31 0.83913 0.999578 -0.38693 502 | POLE3 0.008391 0.22253 -1.2635 605
SHPRH 0.46747 0.999578 -0.13975 400 | RAB2A 0.067198 0.497444 -0.38808 503 | PPP4C 0.000134 0.081818 -1.4346 606
ATP6V0C 0.038177 0.423416 -0.14259 401 | CCAR2 0.29181 0.890776 -0.38845 504 | PRPF19 0.001793 0.131818 -1.5832 607
BUB1B 0.21013 0.781139 -0.14374 402 | CHD1L 0.34307 0.916091 -0.38872 505 | DKC1 0.003333 0.179545 -1.7008 608
APOBEC3B 0.5436 0.999578 -0.1464 403 | ACTL6A 0.25683 0.837773 -0.39209 506 | TRRAP 0.002002 0.131818 -1.7305 609
ZNF518A 0.85051 0.999578 -0.14823 404 | ZMYND8 0.17569 0.713243 -0.39417 507 | MRPS16 0.000363 0.110606 -1.7763 610
HDAC10 0.44713 0.999578 -0.15097 405 | SOD2 0.16774 0.713243 -0.39575 508
POLG 0.84854 0.999578 -0.15346 406 | UBE2T 0.52423 0.999578 -0.40052 509
SLX1A 0.58008 0.999578 -0.15588 407 | ATRIP 0.10257 0.613399 -0.40277 510
TPR 0.13401 0.67544 -0.15766 408 | TIMELESS 0.049026 0.430008 -0.40399 511
SIRT7 0.70008 0.999578 -0.16393 409 | ATM 0.6143 0.999578 -0.40564 512
FANCI 0.35578 0.940739 -0.1656 410 | DNMT3B 0.3565 0.940739 -0.40577 513
[missing] 0.13025 0.662096 -0.16574 411 | NUDT16L1 0.41106 0.999578 -0.40792 514
ACTR8 0.1531 0.712908 -0.16604 412 | CDT1 0.14588 0.694242 -0.41038 515


Supplementary Table 3.2: List of sgRNAs used for PIGA targeting and validation of selected screen hits.

sgRNA          Orientation  Sequence
PIGA intron    Forward      CACCGGGTAAAGTATAAGAGTAAAG
PIGA intron    Reverse      AACTCTTTACTCTTATACTTTACCC
PIGA exon      Forward      CACCGGTAATAGACTTTGAGGCCAC
PIGA exon      Reverse      AACTGTGGCCTCAAAGTCTATTACC
Non-targeting  Forward      CACCGCACGAACTCACACCGCGCGA
Non-targeting  Reverse      AAACTCGCGCGGTGTGAGTTCGTGC
HELQ           Forward      CACCGCAGTGGGCATGTGACGTAGG
HELQ           Reverse      AAACCCTACGTCACATGCCCACTGC
HELQ           Forward      CACCGAAGTGGTGGAAAAACCCTCG
HELQ           Reverse      AAACCGAGGGTTTTTCCACCACTTC
ZRANB2         Forward      CACCGAGAACCAGCTGTAATCGATG
ZRANB2         Reverse      AAACCATCGATTACAGCTGGTTCTC
ZRANB2         Forward      CACCGACACTTGCAGAAAAGAGCCG
ZRANB2         Reverse      AAACCGGCTCTTTTCTGCAAGTGTC
PSIP1          Forward      CACCGAGATCGAAAACGCAAGCAAG
PSIP1          Reverse      AAACCTTGCTTGCGTTTTCGATCTC
PSIP1          Forward      CACCGAAGAGCCGGATAAAAAAGAG
PSIP1          Reverse      AAACCTCTTTTTTATCCGGCTCTTC
HMGN1          Forward      CACCGTTTCTTTCTCCAAGCCCAAG
HMGN1          Reverse      AAACCTTGGGCTTGGAGAAAGAAAC
HMGN1          Forward      CACCGCCGCAGGTCAGCTCCGCCGA
HMGN1          Reverse      AAACTCGGCGGAGCTGACCTGCGGC
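Most oligonucleotide pairs in this table appear to follow a common convention for cloning sgRNAs by annealed-oligo ligation: the forward oligo is CACC, a G, and the 20-nt protospacer, and the reverse oligo is AAAC, the reverse complement of the protospacer, and a C. The sketch below reproduces that pattern. It is an inference from the listed sequences rather than a statement of the cloning protocol used, and the function names and default overhangs are assumptions. The two PIGA reverse oligos, which are listed as starting with AACT, are not reproduced by this pattern.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_cloning_oligos(protospacer, fwd_overhang="CACC", rev_overhang="AAAC"):
    """Return (forward, reverse) oligos for a protospacer (hypothetical helper).

    Assumed pattern: forward = overhang + G + protospacer,
    reverse = overhang + reverse complement of protospacer + C.
    """
    protospacer = protospacer.upper()
    if not protospacer or set(protospacer) - set("ACGT"):
        raise ValueError("protospacer must contain only A, C, G, T")
    forward = fwd_overhang + "G" + protospacer
    reverse = rev_overhang + reverse_complement(protospacer) + "C"
    return forward, reverse
```

For example, the non-targeting protospacer `CACGAACTCACACCGCGCGA` yields the non-targeting forward and reverse oligos listed in the table.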


CHAPTER FOUR: CONCLUSION AND FUTURE PERSPECTIVES

Contributions: I wrote this section of the thesis with help from Alberto Ciccia and Jen-Wei Huang.


4.1 Discussion

CRISPR-Cas9 editing outcomes are influenced by the many overlapping pathways deployed by eukaryotic cells to repair the Cas9-induced DSB. Consequently, the permanent genetic changes that follow Cas9 cleavage of the genome can be diverse, depending on which DDR pathway repairs the break. Controlling the DDR is therefore an attractive strategy for making Cas9-based genome editing more efficient and specific. To this end, we designed and executed screens to identify regulators of both CRISPR-mediated HDR and long deletions. The pooled ORF screening approach discussed in Chapter 2 led to the development of e18, a genetically encoded stimulator of HDR. This work highlights the promise of engineering DDR factors to augment the efficiency of precision genome editing. Additionally, the dual-Cas9 screen discussed in Chapter 3 provides preliminary evidence for the genetic regulation of Cas9-induced long deletions.

Herein, I discuss future experiments that could build on our work and contextualize our findings within the larger goal of achieving highly efficient and precise genome editing.

4.1.1 Improving the specificity and efficiency of e18

Rationally engineering RAD18 based on its known regulation of the DDR enabled its repurposing as a genome editing tool. Several opportunities to further engineer e18 could result in variants with superior specificity and efficiency in stimulating HDR. Such efforts will be guided by improved understanding of the molecular function of e18 in human cells.

We observed that K164 ubiquitination of PCNA was markedly reduced in cells expressing e18 relative to those expressing WT RAD18 (Supplementary Fig. 2.1). Other reports have demonstrated that UV-induced focus formation of TLS polymerase η depends on the RAD18 SAP domain, further implicating the SAP domain as necessary for RAD18’s function in TLS486. However, given that the SAP domain is not necessary for RAD6 interaction487, it would be interesting to test whether e18 expression titrates RAD6 away from WT RAD18, thereby potentially inhibiting TLS. The impact of e18 expression on TLS-mediated maintenance of genome integrity could be further assessed by measuring the levels of monoubiquitinated PCNA upon genotoxic stress, the kinetics of replication fork progression (using DNA fiber analysis) and, given the role of TLS in suppressing ssDNA gaps488,489, the frequency of ssDNA gaps (using the S1 nuclease assay). On a similar note, it would also be worth testing whether e18 expression inhibits the interaction of RAD6 with other partners such as the E3 ligase RNF20490.

e18 promotes HDR with greater specificity and efficiency than WT RAD18 (Fig. 2.2b, c). However, it remains to be investigated whether the enhancement of HDR is a consequence of abrogating RAD18’s TLS function, which is mediated by the SAP domain. Alternatively, the difference between e18 and WT RAD18 could reflect differences in their cellular localization and stability. Indeed, the nuclear localization and stability of RAD18 have previously been demonstrated to depend on its three putative nuclear localization signals486 and on its auto-mono- and polyubiquitylation415. Immunofluorescence experiments comparing the expression and cellular localization of WT RAD18 and e18, as well as their effects on 53BP1 focus formation at different stages of the cell cycle, could reveal the reasons behind the enhanced HDR stimulation by e18 relative to WT RAD18.


Differences in stability and localization could also explain the difference in HDR stimulation between e18 and the NLS-UBZ domain (Supplementary Fig. 2.2). Alternatively, this result could indicate that other RAD18 functional domains contribute to stimulating HDR beyond the UBZ-mediated inhibition of 53BP1. Reported SAP domain-independent functions of RAD18 include the recruitment of SLF1491, RPA492, RAD51C407, NBS1493 and RAD6487, and the monoubiquitination of FANCD2494. Further, the E3 ligase activity of e18 could ubiquitinate yet-to-be-identified factors that further stimulate HDR. Chromatin immunoprecipitation coupled with mass spectrometry could enable the unbiased identification of factors whose recruitment to a DSB is affected by e18 expression. Testing the impact on HDR of a series of overlapping truncation mutants of e18 could further define the minimal domains necessary for e18 function. In addition, proximity-based labeling assays359 using e18 and RADΔUBZ could be used to distinguish between HDR- and TLS-based interactors of RAD18. Such studies could enable further engineering of e18 to be more specific and efficient at stimulating HDR.

Finally, future work could enhance the function of e18 beyond inhibiting 53BP1 by fusing DDR factors to e18. In this way, e18 could serve as a homing device that recruits HDR stimulators to Cas9-induced DSBs. In this regard, factors that act in DNA repair downstream of short end-resection, such as those that inhibit MMEJ, promote donor DNA-mediated recombination, or inhibit mutagenic homology-dependent pathways such as SSA, are attractive candidates for fusion to e18.


4.1.2 Regulation of the 53BP1-H2AK15ub axis for genome editing

53BP1 has emerged as an attractive target for promoting HDR. Indeed, our study, along with others, has demonstrated the effectiveness of manipulating this axis, though key differences exist between these strategies. dn53BP1 contains an intact Tudor domain and an adjacent UDR motif495 and therefore likely binds to chromatin with the same affinity as 53BP1. Conversely, e18 functions by competitively binding H2AK15ub, which prevents the UDR-dependent association of 53BP1 with chromatin in the vicinity of DSBs (Fig. 2.3 a, b). In vitro, the RAD18 UBZ domain has been demonstrated to bind H2AK15ub with greater affinity than the 53BP1 UDR418. It remains to be elucidated whether the stronger binding to H2AK15ub observed in vitro translates to greater stimulation of HDR. On the other hand, i53 enhances HDR at Cas9-induced DSBs by obstructing the Tudor domain of 53BP1420. The targeting of the Tudor domain by i53 and dn53BP1 can have nonspecific effects, given the role of this domain in DSB-independent activities of 53BP1 (e.g., transcriptional regulation of p53)496. In contrast, by binding to H2AK15ub, e18 inhibits 53BP1 only in the context of its role in DSB repair. However, H2AK15ub is a mark that recruits DSB repair factors besides RAD18 and 53BP1, such as RAP80, RNF169 and RNF168497,498. Thus, in addition to inhibiting 53BP1, e18 could also impede the recruitment of other proteins to DSBs. While the factors recruited by the RNF8-mediated ubiquitin cascade are associated predominantly with restricting end-resection, a few H2AK15ub-binding proteins have been implicated in HDR. It would therefore be interesting to determine whether the stimulation of HDR by e18 is partly offset by its inhibition of the recruitment of HDR-promoting factors to DSBs. For example, RNF169 is recruited to H2AK15ub and can also displace 53BP1 from DNA damage foci499.

122

Further, it has been demonstrated that RNF168 acts redundantly with BRCA1 to promote

RAD51 dependent HDR by loading PALB2 onto damaged DNA500. This function of RNF168, however, appears to be largely dispensable for HDR in cells with unperturbed BRCA1 function and therefore may remain unaffected by e18 expression in cells with functional BRCA1500.

Future experiments should focus on advancing our understanding of the consequences of e18 expression on factors that function in the RNF8-mediated ubiquitination pathway.

There remain other unexplored strategies for blocking the recruitment of 53BP1. Overexpression of RNF169 could be one such strategy, as its MIU domain was shown to bind H2AK15ub with a greater affinity than either the RAD18 UBZ or the 53BP1 UDR in vitro418. A recently identified protein, TIRR, functions similarly to i53 by binding the 53BP1 Tudor domain and preventing its H4K20me2 interaction501,502. While TIRR's small size and effectiveness at inhibiting 53BP1 make it an attractive candidate, its application to enhance genome editing should be preceded by careful examination of how its overexpression affects its non-53BP1-related functions, including the formation of cilia structures and translation503. Finally, dynein light chain 1 (DYNLL1) inhibits MRE11-mediated end-resection by stimulating the recruitment of 53BP1 to sites of DNA damage175,176,504. Therefore, inhibiting DYNLL1 could also stimulate HDR at Cas9-induced DSBs by destabilizing 53BP1 oligomerization.

Another strategy for inhibiting 53BP1 recruitment could be the targeting of writers and erasers of histone post-translational modifications. For example, the acetyltransferase TIP60 and its interactor MBTD1 impede 53BP1 recruitment by acetylating the histone H4 tail and H2AK15 and by directly competing with 53BP1 for H4K20me2 binding, respectively196,197. However, the several regulatory functions that TIP60 performs outside of 53BP1 inhibition, in transcription as well as in the modification of other histone substrates505, could reduce the specificity of this potential approach to promote Cas9-mediated HDR. Interestingly, one of the hits identified in our HDR screen, BARD1 (Supplementary Table 2.2), along with BRCA1 can ubiquitylate histone H2A at Lys27, thereby recruiting the chromatin remodeler SMARCAD1 and facilitating 53BP1 repositioning away from DSBs198,199,434,435. Further, BRCA1-BARD1 has also been demonstrated to inhibit 53BP1 recruitment to post-replicative chromatin by recognizing H4K20me0 on newly deposited histones199. H4K20me0 on post-replicative histones also recruits other HDR-promoting factors such as the TONSL-MMS22L and SLF1-RAD18 complexes199. TONSL-MMS22L promotes RAD51 loading199, while the SLF1-RAD18 complex might function to maintain chromosomal integrity by recruiting the SMC5/6 complex to DSBs and stalled replication forks491. It is unclear whether e18 affects this function of SLF1 at post-replicative chromatin, or alternatively whether its function is dependent on SLF1. Therefore, the SLF1-RAD18 complex and the other complexes recruited to H4K20me0 are interesting targets for stimulating CRISPR-mediated HDR, as they have multifaceted roles in the inhibition of 53BP1, the promotion of RAD51-mediated recombination, and the maintenance of HDR fidelity.

As an alternative to targeting chromatin writers and readers, targeting erasers of the histone marks recognized by 53BP1 could limit its retention at DSBs. In this regard, RNF168-dependent histone ubiquitination is counteracted by H2A-specific deubiquitinating enzymes (DUBs) such as USP51, USP44, USP11, and USP3506. Targeting these DUBs has been demonstrated to modulate 53BP1 function; their overexpression could therefore be a viable option to inhibit 53BP1 recruitment to DSBs.


Given the established role of 53BP1 outside of DSB regulation, a more DSB repair-specific strategy could involve targeting the downstream effectors of 53BP1. In this regard, targeting SHLD1 and SHLD2 is particularly attractive, as they function at the most downstream level of blocking end-resection by binding single-stranded DNA184-188. However, given that end-resection can proceed in the absence of RIF1/shieldin through the other 53BP1 effector PTIP195, targeting the shieldin complex alone may not be as effective a strategy as entirely blocking 53BP1 recruitment to DSBs. Further, it remains to be elucidated whether inhibiting the more recently characterized shieldin components (SHLD1, SHLD2 and SHLD3) can specifically promote HDR, as they might have functions outside of DSB repair, just as REV7 does in TLS507.

Finally, the safety and specificity of targeting the 53BP1-H2AK15ub axis remain to be fully tested. Indeed, several lines of evidence point to a possible role of 53BP1 in regulating the fidelity of end-resection rather than merely inhibiting it. These include our observation of an increase in end-resection-dependent MMEJ upon expression of e18 (Fig. 2.4e and Supplementary Fig. 2.4), the loss of RAD51 affinity for chromatin upon 53BP1 loss508, and the dependence of shieldin function on the binding of SHLD2 and SHLD1 to resected ssDNA184. Thus, 53BP1 may allow limited resection but prevent hyper-resection-dependent error-prone pathways such as SSA in the S/G2 phases of the cell cycle. This role of 53BP1 in impeding hyper-resection could explain our identification of 53BP1 and SHLD3 as negative regulators of Cas9-induced long deletions (Fig. 3.1e). These findings raise concerns about undesirable damage induced upon 53BP1 inhibition following Cas9 cleavage. While our study, as well as others, has demonstrated that the transient inhibition of 53BP1 recruitment is not toxic to some human cell types (Fig. 2.6 b, c)121,471, a comprehensive analysis of the effect of 53BP1 inhibition on Cas9-induced off-target activity, long deletions and chromosomal aberrations has not been performed. Therefore, future experiments should focus on a head-to-head comparison of the effectiveness and specificity of the different 53BP1-targeting strategies at stimulating HDR across different cell types and at different loci.

4.1.3 Combinatorial modulation of the DDR

Precision editing is influenced by the efficiency of targeting by Cas9 and the efficiency of repair by cellular factors. The efficiency of repair is influenced by multiple mechanisms that differ in speed and availability during the cell cycle235,509. In this regard, several methods (detailed in Chapters 1.4 and 2) have been tested to stimulate precise genome editing by modulating the DDR. Despite successfully stimulating HDR, these approaches, including our own, continue to be limited by the modest ratio of desired HDR product to undesired byproducts. Therefore, the goal of highly efficient HDR will perhaps be fully realized only through combinatorial modulation of the DDR. The critical nodes that would likely have to be targeted concomitantly are 1) inhibition of NHEJ, 2) activation of DNA end-resection, 3) promotion of donor DNA-mediated recombination, 4) inhibition of MMEJ, and 5) prevention of hyper-resection.
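The ratio of desired HDR product to undesired byproducts referred to above can be made concrete with a small calculation. The following is a minimal sketch, assuming that sequencing reads at a target locus have already been classified as HDR, indel, or unmodified. The class name, function name, and read categories are hypothetical and chosen for illustration only; they do not describe the analysis pipeline used in this thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditingOutcome:
    """Read counts per repair outcome at one target site (hypothetical classification)."""
    hdr_reads: int         # reads carrying the intended donor-templated edit
    indel_reads: int       # reads carrying insertions/deletions (undesired byproducts)
    unmodified_reads: int  # reads identical to the reference sequence


def hdr_metrics(outcome: EditingOutcome) -> dict:
    """Return HDR frequency, indel frequency, and the HDR:indel ratio."""
    total = outcome.hdr_reads + outcome.indel_reads + outcome.unmodified_reads
    if total == 0:
        raise ValueError("at least one read is required")
    # With no indel reads the ratio is unbounded; report infinity rather than failing.
    if outcome.indel_reads == 0:
        ratio = float("inf")
    else:
        ratio = outcome.hdr_reads / outcome.indel_reads
    return {
        "hdr_frequency": outcome.hdr_reads / total,
        "indel_frequency": outcome.indel_reads / total,
        "hdr_to_indel_ratio": ratio,
    }
```

Reporting the ratio alongside the absolute HDR frequency is useful because a treatment can raise the HDR frequency without improving the ratio if indels increase in parallel.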

The rapid kinetics with which NHEJ factors localize to DSBs enable them to outcompete other repair factors at repairing the breaks510,511. Therefore, stimulation of HDR by inhibiting 53BP1 could potentially be further enhanced by inhibiting early-responding NHEJ factors (such as DNA-PKcs or Ku), as they might still localize to the Cas9-induced DSB before end-resection factors. Indeed, the synergy between i53 and the DNA-PK inhibitor NU7441 provides some preliminary evidence for the effectiveness of such a strategy309.


A tremendous impediment to efficient HDR remains the cell cycle regulation of gene expression and the active suppression of end-resection and HDR factors in the G1 phase201,208,512-514. This feature prohibits the application of HDR-mediated precision editing in quiescent cells, while also lowering efficiency in mitotic cells. Two potential strategies to overcome or at least mitigate cell cycle suppression of HDR are the activation of HDR in G1 and the restriction of repair of Cas9-induced DSBs to the S/G2 phases of dividing cells. Previously, the minimal requirements to activate HDR in the G1 phase were demonstrated by promoting BRCA1-PALB2-BRCA2 interaction (by preventing KEAP1-CUL3-mediated degradation of PALB2) and activating CtIP-mediated end-resection (by expressing a phosphomimetic CtIP) in a 53BP1-KO cell line201. Alternative approaches to promote cell cycle-independent HDR could be streamlined by fusing proteins to Cas9 and e18. For example, it would be interesting to determine whether the combined expression of Cas9 fused to mutant CtIP (T847E) and e18 fused to a non-ubiquitylatable mutant PALB2 (generated by mutating its K20/25/30 residues) could induce HDR in G1. Fusing Cas9 to CtIP has been shown to effectively promote target-specific end-resection292; fusing Cas9 with the constitutively active CtIP(T847E) mutant could therefore enable end-resection in G1. Further, a fusion of e18 with mutant PALB2 (which is resistant to degradation by the KEAP1-CUL3 complex in G1) could combinatorially promote extended end-resection (by inhibiting 53BP1 localization) as well as RAD51 loading (by localizing the PALB2-BRCA2 complex to DSBs). Fusing PALB2 with e18 would therefore force PALB2 localization to DSBs, which has previously been demonstrated (by fusing PALB2 to the MDC1-recognizing FHA domain of RNF8) to circumvent its dependency on BRCA1 and RNF168 for recruitment500. Given the need to target multiple pathways, stimulating HDR in all phases of the cell cycle will undoubtedly be challenging, and future efforts will have to focus on dissecting the critical nodes for HDR activation in G1, as well as the consequences of such strategies for cell viability.

Alternative strategies to promote repair of Cas9-induced DSBs in the HDR-permissive phases of the cell cycle have been discussed in Chapter 1.4. Deploying these strategies in combination with 53BP1 inhibition should be investigated for the synergistic enhancement of HDR-mediated precision genome editing. However, the application of these strategies will be limited to mitotic cells.

The synergistic enhancement of HDR by depleting MMEJ factors in combination with 53BP1 has been discussed in Chapter 2. Finally, given the detection of modest frequencies of long deletions at Cas9-induced DSBs240,275 (Fig. 3.1), inhibiting factors that promote such mutagenic repair could further improve HDR:indel ratios. Characterization of the hits identified in Chapter 3 may reveal strategies to limit such highly mutagenic repair, which is potentially mediated by hyper-resection-dependent pathways such as SSA. Strategies involving the stimulation of RAD51 loading and of factors downstream of RAD51 loading could limit such mutagenic pathways by promoting donor-mediated recombination and limiting the time available for nucleases to extend end-resection. On a similar note, it would also be interesting to investigate whether the delivery of donor DNA, by promoting donor-mediated HDR at the expense of extended end-resection, could limit hyper-resection-dependent repair pathways and prevent the formation of CRISPR-mediated long deletions.

It is important to note that the effectiveness of the strategies discussed above may vary depending on context due to tissue- and cell-specific differences in DNA repair pathways. A well-known example of such tissue-specific regulation of the DDR is that mutations in BRCA1/2 specifically increase the risk of breast and ovarian cancers515. Therefore, the identification of a universal HDR-promoting strategy that is effective in all cell types is unlikely. Future efforts focusing on cell-type-specific profiling of key repair factors may reveal opportunities for promoting highly efficient cell-type-specific precision editing.

4.2 Future Directions

(partially adapted from Billon, P., Nambiar, T. S., Hayward, S. B., Zafra, M. P., Schatoff, E. M., Oshima, K., Dunbar, A., Breinig, M., Park, Y. C., Ryu, H. S., Tschaharganeh, D. F., Levine, R. L., Baer, R., Ferrando, A., Dow, L. E., & Ciccia, A. (2020). Detection of Marker-Free Precision Genome Editing and Genetic Variation through the Capture of Genomic Signatures. Cell Reports, 30(10), 3280–3295.e6.)

The ultimate goal of CRISPR-Cas9-mediated genome editing is to guide the cellular DDR perfectly from the DSB to the desired repair outcome with no diversion to undesirable side products. While investigating the modulation of the DDR towards this end is of great significance, parallel research into improving delivery approaches, the specificity of genome editing, and the detection of edits is also of vital significance.

One of the biggest challenges for CRISPR-Cas9 as a therapeutic is to transport it safely and efficiently into the nucleus of cells. The current limitations of the delivery of CRISPR components include inefficient entry into cells, off-target effects, and activation of the immune system. These issues will undoubtedly also impact strategies for modulating the DDR. The question of how to deliver factors that enhance precision genome editing along with the CRISPR components is crucial to consider, as it will determine their effectiveness as well as their specificity at stimulating HDR. DDR-modulating factors could be delivered as a small molecule or, like CRISPR components, as a plasmid, mRNA, or RNP, or packaged into a virus. Just as the editing efficiency of Cas9 varies based on the format516, we have observed that the delivery format of e18 (mRNA or plasmid-based) affects the degree of HDR stimulation (Supplementary Fig. 2.1d). Similarly, the delivery format of CRISPR components has been shown to influence editing specificity517. This is believed to be a function of availability; persistently expressed formats such as virus or plasmid DNA have more opportunity for off-target editing than transient formats such as protein delivery. Similar consideration of the effect of persistence in the cell on off-target effects and toxicity will have to be given to the co-delivery of DNA repair factors with CRISPR components. The size of the factor (specifically for genetically modified factors), the kinetics of modulating the DDR, and the abundance of the factor required to induce an effect will also influence the choice of delivery format. Finally, given the potential of the DDR to modulate the immune response518, the potential immunogenicity of the DDR-controlling strategy will be vital to consider for certain applications.

A single cell endowed with an unintended edit could result in a pathogenic lesion with catastrophic consequences. Therefore, the reduction and detection of the undesirable effects of CRISPR-Cas9 editing are paramount. The specificity of function is similarly of concern for DDR-modulating strategies. Our demonstration that RAD18 can be engineered to improve its specificity in stimulating HDR highlights the potential of engineering DDR factors to reduce undesirable effects. Additionally, the aforementioned strategies to spatiotemporally modulate the DDR by fusing factors to Cas9 or e18 could localize their function specifically to the Cas9 cleavage site.


In addition to reducing off-target cleavage, it is imperative to accurately and sensitively detect all edits made at the Cas9 target and off-target site(s). In this regard, we recently developed DTECT (Dinucleotide signaTurE CapTure), a rapid and versatile detection method that relies on the capture of targeted dinucleotide signatures resulting from the digestion of genomic DNA amplicons by the type IIS restriction enzyme AcuI519. DTECT enables the accurate quantification of marker-free precision genome editing events introduced by CRISPR-dependent HDR, base editing and prime editing in various biological systems, such as mammalian cell lines, organoids, and tissues. The ease, speed, and cost efficiency with which DTECT identifies genomic signatures circumvent the need for more time-consuming, sophisticated and expensive approaches like NGS. However, despite its versatility, DTECT, like other available approaches to detect Cas9 editing outcomes, is not capable of comprehensively detecting long deletions and large-scale chromosomal aberrations such as duplications, inversions and translocations. Thus, there remains a need for new technologies that can quickly, economically and comprehensively analyze the genome.
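The principle of capturing a dinucleotide signature by type IIS digestion can be illustrated in silico. The following is a minimal sketch, assuming that AcuI recognizes CTGAAG and cleaves 16 nt downstream on the top strand and 14 nt downstream on the bottom strand, so that the 2-nt 3' overhang corresponds to the 15th and 16th bases after the recognition site. These offsets, the function name, and the restriction to the first top-strand site are assumptions made for illustration; the sketch is not the DTECT protocol itself.

```python
from typing import Optional

ACUI_SITE = "CTGAAG"    # AcuI recognition sequence on the top strand (assumed)
TOP_CUT_OFFSET = 16     # assumed: top strand is cut 16 nt past the site
BOTTOM_CUT_OFFSET = 14  # assumed: bottom strand is cut 14 nt past the site


def captured_dinucleotide(amplicon: str) -> Optional[str]:
    """Return the 2-nt overhang exposed by the first top-strand AcuI site, if any.

    Returns None when no site is found or when the amplicon is too short
    for the enzyme to cut downstream of the site.
    """
    seq = amplicon.upper()
    site_start = seq.find(ACUI_SITE)
    if site_start == -1:
        return None
    site_end = site_start + len(ACUI_SITE)
    # The overhang spans the bases between the bottom-strand and top-strand cuts.
    overhang = seq[site_end + BOTTOM_CUT_OFFSET: site_end + TOP_CUT_OFFSET]
    return overhang if len(overhang) == 2 else None
```

Comparing the dinucleotide returned for a reference amplicon with that returned for an edited amplicon shows how a single-base change at the interrogated position yields a distinct signature.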

Additionally, approaches to enrich for cells with desirable edits could circumvent the need for high-frequency HDR in certain applications. Several such approaches have been developed520; however, we continue to be limited in our ability to identify live cells with desirable edits and to eliminate cells with undesirable edits. Together, improved strategies for the detection of Cas9 on- and off-target edits and for the enrichment of cells with desirable edits will greatly enhance the utility of CRISPR-Cas9-based genome editing in academic as well as clinical contexts.


Finally, it is imperative to acknowledge that while efforts to modulate DSB response factors to stimulate precision genome editing hold promise, the advancement of DSB-free technologies such as base editors and prime editors could drastically reduce applications for DSB-dependent editing. For the time being, key issues remain regarding the efficiency and specificity of base editors and prime editors. Further, owing to their large size, their delivery, especially for in vivo genome editing, remains challenging521. Ultimately, the development of improved base and prime editors will, in its own right, herald efforts to further understand how these technologies engage DNA repair pathways.

Thus, the symbiotic advancement of the DNA damage response and genome editing fields will continue to transform science and usher in breakthroughs that will change humankind and the planet for good. The future of both these fields is exciting, and I look forward to watching it unfold.


REFERENCES

1 Auerbach, C. The chemical production of mutations. The effect of chemical mutagens on cells and their genetic material is discussed. Science 158, 1141-1147, doi:10.1126/science.158.3805.1141 (1967).
2 Muller, H. J. Artificial Transmutation of the Gene. Science 66, 84-87, doi:10.1126/science.66.1699.84 (1927).
3 Rothstein, R. J. One-step gene disruption in yeast. Methods Enzymol 101, 202-211, doi:10.1016/0076-6879(83)01015-0 (1983).
4 Scherer, S. & Davis, R. W. Replacement of chromosome segments with altered DNA sequences constructed in vitro. Proc Natl Acad Sci U S A 76, 4951-4955, doi:10.1073/pnas.76.10.4951 (1979).
5 Smithies, O., Gregg, R. G., Boggs, S. S., Koralewski, M. A. & Kucherlapati, R. S. Insertion of DNA sequences into the human chromosomal beta-globin locus by homologous recombination. Nature 317, 230-234, doi:10.1038/317230a0 (1985).
6 Thomas, K. R., Folger, K. R. & Capecchi, M. R. High frequency targeting of genes to specific sites in the mammalian genome. Cell 44, 419-428, doi:10.1016/0092-8674(86)90463-0 (1986).
7 Smithies, O., Koralewski, M. A., Song, K. Y. & Kucherlapati, R. S. Homologous recombination with DNA introduced into mammalian cells. Cold Spring Harb Symp Quant Biol 49, 161-170, doi:10.1101/sqb.1984.049.01.019 (1984).
8 Choulika, A., Perrin, A., Dujon, B. & Nicolas, J. F. Induction of homologous recombination in mammalian chromosomes by using the I-SceI system of Saccharomyces cerevisiae. Mol Cell Biol 15, 1968-1973, doi:10.1128/mcb.15.4.1968 (1995).
9 Plessis, A., Perrin, A., Haber, J. E. & Dujon, B. Site-specific recombination determined by I-SceI, a mitochondrial group I intron-encoded endonuclease expressed in the yeast nucleus. Genetics 130, 451-460 (1992).
10 Rouet, P., Smih, F. & Jasin, M. Introduction of double-strand breaks into the genome of mouse cells by expression of a rare-cutting endonuclease. Mol Cell Biol 14, 8096-8106, doi:10.1128/mcb.14.12.8096 (1994).
11 Rudin, N., Sugarman, E. & Haber, J. E. Genetic and physical analysis of double-strand break repair and recombination in Saccharomyces cerevisiae. Genetics 122, 519-534 (1989).
12 Ishino, Y., Shinagawa, H., Makino, K., Amemura, M. & Nakata, A. Nucleotide sequence of the iap gene, responsible for alkaline isozyme conversion in Escherichia coli, and identification of the gene product. J Bacteriol 169, 5429-5433, doi:10.1128/jb.169.12.5429-5433.1987 (1987).
13 Pourcel, C., Salvignol, G. & Vergnaud, G. CRISPR elements in Yersinia pestis acquire new repeats by preferential uptake of bacteriophage DNA, and provide additional tools for evolutionary studies. Microbiology 151, 653-663, doi:10.1099/mic.0.27437-0 (2005).
14 Mojica, F. J., Diez-Villasenor, C., Garcia-Martinez, J. & Soria, E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 60, 174-182, doi:10.1007/s00239-004-0046-3 (2005).


15 Bolotin, A., Quinquis, B., Sorokin, A. & Ehrlich, S. D. Clustered regularly interspaced short palindrome repeats (CRISPRs) have spacers of extrachromosomal origin. Microbiology 151, 2551-2561, doi:10.1099/mic.0.28048-0 (2005).
16 Barrangou, R. et al. CRISPR provides acquired resistance against viruses in prokaryotes. Science 315, 1709-1712, doi:10.1126/science.1138140 (2007).
17 Sapranauskas, R. et al. The Streptococcus thermophilus CRISPR/Cas system provides immunity in Escherichia coli. Nucleic Acids Res 39, 9275-9282, doi:10.1093/nar/gkr606 (2011).
18 Garneau, J. E. et al. The CRISPR/Cas bacterial immune system cleaves bacteriophage and plasmid DNA. Nature 468, 67-71, doi:10.1038/nature09523 (2010).
19 Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816-821, doi:10.1126/science.1225829 (2012).
20 Gasiunas, G., Barrangou, R., Horvath, P. & Siksnys, V. Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria. Proc Natl Acad Sci U S A 109, E2579-2586, doi:10.1073/pnas.1208507109 (2012).
21 Deltcheva, E. et al. CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III. Nature 471, 602-607, doi:10.1038/nature09886 (2011).
22 Jinek, M. et al. RNA-programmed genome editing in human cells. Elife 2, e00471, doi:10.7554/eLife.00471 (2013).
23 Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823, doi:10.1126/science.1231143 (2013).
24 Mali, P. et al. RNA-guided engineering via Cas9. Science 339, 823-826, doi:10.1126/science.1232033 (2013).
25 Makarova, K. S. et al. Evolutionary classification of CRISPR-Cas systems: a burst of class 2 and derived variants. Nat Rev Microbiol 18, 67-83, doi:10.1038/s41579-019-0299-x (2020).
26 Makarova, K. S., Wolf, Y. I. & Koonin, E. V. Classification and Nomenclature of CRISPR-Cas Systems: Where from Here? CRISPR J 1, 325-336, doi:10.1089/crispr.2018.0033 (2018).
27 Gilbert, L. A. et al. Genome-Scale CRISPR-Mediated Control of Gene Repression and Activation. Cell 159, 647-661, doi:10.1016/j.cell.2014.09.029 (2014).
28 Jinek, M. et al. Structures of Cas9 endonucleases reveal RNA-mediated conformational activation. Science 343, 1247997, doi:10.1126/science.1247997 (2014).
29 Doudna, J. A. & Charpentier, E. Genome editing. The new frontier of genome engineering with CRISPR-Cas9. Science 346, 1258096, doi:10.1126/science.1258096 (2014).
30 Semenova, E. et al. Interference by clustered regularly interspaced short palindromic repeat (CRISPR) RNA is governed by a seed sequence. Proc Natl Acad Sci U S A 108, 10098-10103, doi:10.1073/pnas.1104144108 (2011).
31 Wiedenheft, B. et al. RNA-guided complex from a bacterial immune system enhances target recognition through seed sequence interactions. Proc Natl Acad Sci U S A 108, 10092-10097, doi:10.1073/pnas.1102716108 (2011).
32 Jiang, W., Bikard, D., Cox, D., Zhang, F. & Marraffini, L. A. RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nat Biotechnol 31, 233-239, doi:10.1038/nbt.2508 (2013).


33 Sternberg, S. H., Redding, S., Jinek, M., Greene, E. C. & Doudna, J. A. DNA interrogation by the CRISPR RNA-guided endonuclease Cas9. Nature 507, 62-67, doi:10.1038/nature13011 (2014).
34 Pattanayak, V. et al. High-throughput profiling of off-target DNA cleavage reveals RNA-programmed Cas9 nuclease specificity. Nat Biotechnol 31, 839-843, doi:10.1038/nbt.2673 (2013).
35 Marraffini, L. A. & Sontheimer, E. J. Self versus non-self discrimination during CRISPR RNA-directed immunity. Nature 463, 568-571, doi:10.1038/nature08703 (2010).
36 Bikard, D., Hatoum-Aslan, A., Mucida, D. & Marraffini, L. A. CRISPR interference can prevent natural transformation and virulence acquisition during in vivo bacterial infection. Cell Host Microbe 12, 177-186, doi:10.1016/j.chom.2012.06.003 (2012).
37 Knight, S. C. et al. Dynamics of CRISPR-Cas9 genome interrogation in living cells. Science 350, 823-826, doi:10.1126/science.aac6572 (2015).
38 Ma, H. et al. CRISPR-Cas9 nuclear dynamics and target recognition in living cells. J Cell Biol 214, 529-537, doi:10.1083/jcb.201604115 (2016).
39 Szczelkun, M. D. et al. Direct observation of R-loop formation by single RNA-guided Cas9 and Cascade effector complexes. Proc Natl Acad Sci U S A 111, 9798-9803, doi:10.1073/pnas.1402597111 (2014).
40 Li, Y. et al. A versatile reporter system for CRISPR-mediated chromosomal rearrangements. Genome Biol 16, 111, doi:10.1186/s13059-015-0680-7 (2015).
41 Shou, J., Li, J., Liu, Y. & Wu, Q. Precise and Predictable CRISPR Chromosomal Rearrangements Reveal Principles of Cas9-Mediated Nucleotide Insertion. Mol Cell 71, 498-509 e494, doi:10.1016/j.molcel.2018.06.021 (2018).
42 Gisler, S. et al. Multiplexed Cas9 targeting reveals genomic location effects and gRNA-based staggered breaks influencing mutation efficiency. Nat Commun 10, 1598, doi:10.1038/s41467-019-09551-w (2019).
43 Taheri-Ghahfarokhi, A. et al. Decoding non-random mutational signatures at Cas9 targeted sites. Nucleic Acids Res 46, 8417-8434, doi:10.1093/nar/gky653 (2018).
44 Lemos, B. R. et al. CRISPR/Cas9 cleavages in budding yeast reveal templated insertions and strand-specific insertion/deletion profiles. Proc Natl Acad Sci U S A 115, E2040-E2047, doi:10.1073/pnas.1716855115 (2018).
45 Chakrabarti, A. M. et al. Target-Specific Precision of CRISPR-Mediated Genome Editing. Mol Cell 73, 699-713 e696, doi:10.1016/j.molcel.2018.11.031 (2019).
46 Raper, A. T., Stephenson, A. A. & Suo, Z. Functional Insights Revealed by the Kinetic Mechanism of CRISPR/Cas9. J Am Chem Soc 140, 2971-2984, doi:10.1021/jacs.7b13047 (2018).
47 Richardson, C. D., Ray, G. J., DeWitt, M. A., Curie, G. L. & Corn, J. E. Enhancing homology-directed genome editing by catalytically active and inactive CRISPR-Cas9 using asymmetric donor DNA. Nat Biotechnol 34, 339-344, doi:10.1038/nbt.3481 (2016).
48 Shao, S. et al. Long-term dual-color tracking of genomic loci by modified sgRNAs of the CRISPR/Cas9 system. Nucleic Acids Res 44, e86, doi:10.1093/nar/gkw066 (2016).
49 Wang, A. S. et al. The histone chaperone FACT induces Cas9 multi-turnover behavior and modifies genome manipulation in human cells. bioRxiv 705657 (2019).


50 Haber, J. E. A Life Investigating Pathways That Repair Broken Chromosomes. Annu Rev Genet 50, 1-28, doi:10.1146/annurev-genet-120215-035043 (2016).
51 Mao, Z., Bozzella, M., Seluanov, A. & Gorbunova, V. Comparison of nonhomologous end joining and homologous recombination in human cells. DNA Repair (Amst) 7, 1765-1771, doi:10.1016/j.dnarep.2008.06.018 (2008).
52 Davis, A. J. & Chen, D. J. DNA double strand break repair via non-homologous end-joining. Transl Cancer Res 2, 130-143, doi:10.3978/j.issn.2218-676X.2013.04.02 (2013).
53 Lieber, M. R. The mechanism of double-strand DNA break repair by the nonhomologous DNA end-joining pathway. Annu Rev Biochem 79, 181-211, doi:10.1146/annurev.biochem.052308.093131 (2010).
54 Difilippantonio, M. J. et al. DNA repair protein Ku80 suppresses chromosomal aberrations and malignant transformation. Nature 404, 510-514, doi:10.1038/35006670 (2000).
55 Ren, K. & Pena de Ortiz, S. Non-homologous DNA end joining in the mature rat brain. J Neurochem 80, 949-959, doi:10.1046/j.0022-3042.2002.00776.x (2002).
56 Mjelle, R. et al. Cell cycle regulation of human DNA repair and chromatin remodeling genes. DNA Repair (Amst) 30, 53-67, doi:10.1016/j.dnarep.2015.03.007 (2015).
57 Chang, H. H. Y., Pannunzio, N. R., Adachi, N. & Lieber, M. R. Non-homologous DNA end joining and alternative pathways to double-strand break repair. Nat Rev Mol Cell Biol 18, 495-506, doi:10.1038/nrm.2017.48 (2017).
58 Davis, A. J., Chen, B. P. & Chen, D. J. DNA-PK: a dynamic enzyme in a versatile DSB repair pathway. DNA Repair (Amst) 17, 21-29, doi:10.1016/j.dnarep.2014.02.020 (2014).
59 Spagnolo, L., Rivera-Calzada, A., Pearl, L. H. & Llorca, O. Three-dimensional structure of the human DNA-PKcs/Ku70/Ku80 complex assembled on DNA and its implications for DNA DSB repair. Mol Cell 22, 511-519, doi:10.1016/j.molcel.2006.04.013 (2006).
60 Griffith, A. J., Blier, P. R., Mimori, T. & Hardin, J. A. Ku polypeptides synthesized in vitro assemble into complexes which recognize ends of double-stranded DNA. J Biol Chem 267, 331-338 (1992).
61 Shim, E. Y. et al. Saccharomyces cerevisiae Mre11/Rad50/Xrs2 and Ku proteins regulate association of Exo1 and Dna2 with DNA breaks. EMBO J 29, 3370-3380, doi:10.1038/emboj.2010.219 (2010).
62 Mimitou, E. P. & Symington, L. S. Ku prevents Exo1 and Sgs1-dependent resection of DNA ends in the absence of a functional MRX complex or Sae2. EMBO J 29, 3358-3369, doi:10.1038/emboj.2010.193 (2010).
63 Yang, S. H. et al. The SOSS1 single-stranded DNA binding complex promotes DNA end resection in concert with Exo1. EMBO J 32, 126-139, doi:10.1038/emboj.2012.314 (2013).
64 Yano, K. & Chen, D. J. Live cell imaging of XLF and XRCC4 reveals a novel view of protein assembly in the non-homologous end-joining pathway. Cell Cycle 7, 1321-1325, doi:10.4161/cc.7.10.5898 (2008).
65 Bryans, M., Valenzano, M. C. & Stamato, T. D. Absence of DNA ligase IV protein in XR-1 cells: evidence for stabilization by XRCC4. Mutat Res 433, 53-58, doi:10.1016/s0921-8777(98)00063-9 (1999).
66 Mari, P. O. et al. Dynamic assembly of end-joining complexes requires interaction between Ku70/80 and XRCC4. Proc Natl Acad Sci U S A 103, 18597-18602, doi:10.1073/pnas.0609061103 (2006).


67 Chang, H. H., Watanabe, G. & Lieber, M. R. Unifying the DNA end-processing roles of the artemis nuclease: Ku-dependent artemis resection at blunt DNA ends. J Biol Chem 290, 24036-24050, doi:10.1074/jbc.M115.680900 (2015).
68 Ma, Y. et al. A biochemically defined system for mammalian nonhomologous DNA end joining. Mol Cell 16, 701-713, doi:10.1016/j.molcel.2004.11.017 (2004).
69 Sfeir, A. & Symington, L. S. Microhomology-Mediated End Joining: A Back-up Survival Mechanism or Dedicated Pathway? Trends Biochem Sci 40, 701-714, doi:10.1016/j.tibs.2015.08.006 (2015).
70 Deriano, L. & Roth, D. B. Modernizing the nonhomologous end-joining repertoire: alternative and classical NHEJ share the stage. Annu Rev Genet 47, 433-455, doi:10.1146/annurev-genet-110711-155540 (2013).
71 Truong, L. N. et al. Microhomology-mediated End Joining and Homologous Recombination share the initial end resection step to repair DNA double-strand breaks in mammalian cells. Proc Natl Acad Sci U S A 110, 7720-7725, doi:10.1073/pnas.1213431110 (2013).
72 Boboila, C. et al. Alternative end-joining catalyzes class switch recombination in the absence of both Ku70 and DNA ligase 4. J Exp Med 207, 417-427, doi:10.1084/jem.20092449 (2010).
73 Yan, C. T. et al. IgH class switching and translocations use a robust non-classical end-joining pathway. Nature 449, 478-482, doi:10.1038/nature06020 (2007).
74 Anand, R., Ranjha, L., Cannavo, E. & Cejka, P. Phosphorylated CtIP Functions as a Co-factor of the MRE11-RAD50-NBS1 Endonuclease in DNA End Resection. Mol Cell 64, 940-950, doi:10.1016/j.molcel.2016.10.017 (2016).
75 Sartori, A. A. et al. Human CtIP promotes DNA end resection. Nature 450, 509-514, doi:10.1038/nature06337 (2007).
76 Myler, L. R. et al. Single-Molecule Imaging Reveals How Mre11-Rad50-Nbs1 Initiates DNA Break Repair. Mol Cell 67, 891-898 e894, doi:10.1016/j.molcel.2017.08.002 (2017).
77 Shibata, A. et al. DNA double-strand break repair pathway choice is directed by distinct MRE11 nuclease activities. Mol Cell 53, 7-18, doi:10.1016/j.molcel.2013.11.003 (2014).
78 Paull, T. T. & Gellert, M. The 3' to 5' exonuclease activity of Mre 11 facilitates repair of DNA double-strand breaks. Mol Cell 1, 969-979, doi:10.1016/s1097-2765(00)80097-0 (1998).
79 Zhou, Y., Caron, P., Legube, G. & Paull, T. T. Quantitation of DNA double-strand break resection intermediates in human cells. Nucleic Acids Res 42, e19, doi:10.1093/nar/gkt1309 (2014).
80 Garcia, V., Phelps, S. E., Gray, S. & Neale, M. J. Bidirectional resection of DNA double-strand breaks by Mre11 and Exo1. Nature 479, 241-244, doi:10.1038/nature10515 (2011).
81 Shibata, A. et al. Factors determining DNA double-strand break repair pathway choice in G2 phase. EMBO J 30, 1079-1092, doi:10.1038/emboj.2011.27 (2011).
82 Deshpande, R. A. et al. DNA-dependent protein kinase promotes DNA end processing by MRN and CtIP. Sci Adv 6, eaay0922, doi:10.1126/sciadv.aay0922 (2020).
83 Yu, A. M. & McVey, M. Synthesis-dependent microhomology-mediated end joining accounts for multiple types of repair junctions. Nucleic Acids Res 38, 5706-5717, doi:10.1093/nar/gkq379 (2010).





















253 Spielmann, M. & Mundlos, S. Structural variations, the regulatory landscape of the genome and their alteration in human disease. Bioessays 35, 533-543, doi:10.1002/bies.201200178 (2013). 254 Spielmann, M. et al. Homeotic arm-to-leg transformation associated with genomic rearrangements at the PITX1 locus. Am J Hum Genet 91, 629-635, doi:10.1016/j.ajhg.2012.08.014 (2012). 255 Montavon, T., Thevenet, L. & Duboule, D. Impact of copy number variations (CNVs) on long-range gene regulation at the HoxD locus. Proc Natl Acad Sci U S A 109, 20204- 20211, doi:10.1073/pnas.1217659109 (2012). 256 Smith, J. A., Waldman, B. C. & Waldman, A. S. A role for DNA mismatch repair protein Msh2 in error-prone double-strand-break repair in mammalian chromosomes. Genetics 170, 355-363, doi:10.1534/genetics.104.039362 (2005). 257 Pastorczak, A. et al. Secondary acute monocytic leukemia positive for 11q23 rearrangement in Nijmegen breakage syndrome. Pediatr Blood Cancer 61, 1469-1471, doi:10.1002/pbc.24994 (2014). 258 Bunting, S. F. & Nussenzweig, A. End-joining, translocations and cancer. Nat Rev Cancer 13, 443-454, doi:10.1038/nrc3537 (2013). 259 Strout, M. P., Marcucci, G., Bloomfield, C. D. & Caligiuri, M. A. The partial tandem duplication of ALL1 (MLL) is consistently generated by Alu-mediated homologous recombination in acute myeloid leukemia. Proc Natl Acad Sci U S A 95, 2390-2395, doi:10.1073/pnas.95.5.2390 (1998). 260 Jeffs, A. R., Benjes, S. M., Smith, T. L., Sowerby, S. J. & Morris, C. M. The BCR gene recombines preferentially with Alu elements in complex BCR-ABL translocations of chronic myeloid leukaemia. Hum Mol Genet 7, 767-776, doi:10.1093/hmg/7.5.767 (1998). 261 Fu, Y. et al. High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells. Nat Biotechnol 31, 822-826, doi:10.1038/nbt.2623 (2013). 262 Cullot, G. et al. CRISPR-Cas9 genome editing induces megabase-scale chromosomal truncations. Nat Commun 10, 1136, doi:10.1038/s41467-019-09006-2 (2019). 
263 Anderson, K. R. et al. CRISPR off-target analysis in genetically engineered rats and mice. Nat Methods 15, 512-514, doi:10.1038/s41592-018-0011-5 (2018). 264 Ghezraoui, H. et al. Chromosomal translocations in human cells are generated by canonical nonhomologous end-joining. Mol Cell 55, 829-842, doi:10.1016/j.molcel.2014.08.002 (2014). 265 Brunet, E. & Jasin, M. Induction of Chromosomal Translocations with CRISPR-Cas9 and Other Nucleases: Understanding the Repair Mechanisms That Give Rise to Translocations. Adv Exp Med Biol 1044, 15-25, doi:10.1007/978-981-13-0593-1_2 (2018). 266 Wray, J. et al. PARP1 is required for chromosomal translocations. Blood 121, 4359-4365, doi:10.1182/blood-2012-10-460527 (2013). 267 Soni, A. et al. Requirement for Parp-1 and DNA ligases 1 or 3 but not of Xrcc1 in chromosomal translocation formation by backup end joining. Nucleic Acids Res 42, 6380-6392, doi:10.1093/nar/gku298 (2014). 268 Zhang, Y. & Jasin, M. An essential role for CtIP in chromosomal translocation formation through an alternative end-joining pathway. Nat Struct Mol Biol 18, 80-84, doi:10.1038/nsmb.1940 (2011).

269 Li, J. et al. Efficient inversions and duplications of mammalian regulatory DNA elements and gene clusters by CRISPR/Cas9. J Mol Cell Biol 7, 284-298, doi:10.1093/jmcb/mjv016 (2015). 270 Pristyazhnyuk, I. E. et al. Time origin and structural analysis of the induced CRISPR/cas9 megabase-sized deletions and duplications involving the Cntn6 gene in mice. Sci Rep 9, 14161, doi:10.1038/s41598-019-50649-4 (2019). 271 Skryabin, B. V. et al. Pervasive head-to-tail insertions of DNA templates mask desired CRISPR-Cas9-mediated genome editing events. Sci Adv 6, eaax2941, doi:10.1126/sciadv.aax2941 (2020). 272 Paulis, M. et al. A pre-screening FISH-based method to detect CRISPR/Cas9 off-targets in mouse embryonic stem cells. Sci Rep 5, 12327, doi:10.1038/srep12327 (2015). 273 Hu, Q. et al. Break-induced replication plays a prominent role in long-range repeat- mediated deletion. EMBO J 38, e101751, doi:10.15252/embj.2019101751 (2019). 274 Mendez-Dorantes, C., Bhargava, R. & Stark, J. M. Repeat-mediated deletions can be induced by a chromosomal break far from a repeat, but multiple pathways suppress such rearrangements. Genes Dev 32, 524-536, doi:10.1101/gad.311084.117 (2018). 275 Owens, D. D. G. et al. Microhomologies are prevalent at Cas9-induced larger deletions. Nucleic Acids Res 47, 7402-7417, doi:10.1093/nar/gkz459 (2019). 276 Zhang, X. H., Tee, L. Y., Wang, X. G., Huang, Q. S. & Yang, S. H. Off-target Effects in CRISPR/Cas9-mediated Genome Engineering. Mol Ther Nucleic Acids 4, e264, doi:10.1038/mtna.2015.37 (2015). 277 Bieging, K. T., Mello, S. S. & Attardi, L. D. Unravelling mechanisms of p53-mediated tumour suppression. Nat Rev Cancer 14, 359-370, doi:10.1038/nrc3711 (2014). 278 Yang, H. et al. One-step generation of mice carrying reporter and conditional alleles by CRISPR/Cas-mediated genome engineering. Cell 154, 1370-1379, doi:10.1016/j.cell.2013.08.022 (2013). 279 Dickinson, D. J., Ward, J. D., Reiner, D. J. & Goldstein, B. 
Engineering the genome using Cas9-triggered homologous recombination. Nat Methods 10, 1028-1034, doi:10.1038/nmeth.2641 (2013). 280 Storici, F., Snipe, J. R., Chan, G. K., Gordenin, D. A. & Resnick, M. A. Conservative repair of a chromosomal double-strand break by single-strand DNA through two steps of annealing. Mol Cell Biol 26, 7645-7657, doi:10.1128/MCB.00672-06 (2006). 281 Liang, X., Potter, J., Kumar, S., Ravinder, N. & Chesnut, J. D. Enhanced CRISPR/Cas9-mediated precise genome editing by improved design and delivery of gRNA, Cas9 nuclease, and donor DNA. J Biotechnol 241, 136-146, doi:10.1016/j.jbiotec.2016.11.011 (2017). 282 Eckstein, F. Phosphorothioates, essential components of therapeutic oligonucleotides. Nucleic Acid Ther 24, 374-387, doi:10.1089/nat.2014.0506 (2014). 283 Hirotsune, S. et al. Enhanced homologous recombination by the modulation of targeting vector ends. Sci Rep 10, 2518, doi:10.1038/s41598-020-58893-9 (2020). 284 Cruz-Becerra, G. & Kadonaga, J. T. Enhancement of homology-directed repair with chromatin donor templates in cells. Elife 9, doi:10.7554/eLife.55780 (2020). 285 Aird, E. J., Lovendahl, K. N., St Martin, A., Harris, R. S. & Gordon, W. R. Increasing Cas9-mediated homology-directed repair efficiency through covalent tethering of DNA repair template. Commun Biol 1, 54, doi:10.1038/s42003-018-0054-2 (2018).

286 Savic, N. et al. Covalent linkage of the DNA repair template to the CRISPR-Cas9 nuclease enhances homology-directed repair. Elife 7, doi:10.7554/eLife.33761 (2018). 287 Carlson-Stevermer, J. et al. Assembly of CRISPR ribonucleoproteins with biotinylated oligonucleotides via an RNA aptamer for precise gene editing. Nat Commun 8, 1711, doi:10.1038/s41467-017-01875-9 (2017). 288 Yeh, C. D., Richardson, C. D. & Corn, J. E. Advances in genome editing through control of DNA repair pathways. Nat Cell Biol 21, 1468-1478, doi:10.1038/s41556-019-0425-z (2019). 289 Lin, S., Staahl, B. T., Alla, R. K. & Doudna, J. A. Enhanced homology-directed human genome engineering by controlled timing of CRISPR/Cas9 delivery. Elife 3, e04766, doi:10.7554/eLife.04766 (2014). 290 Wienert, B., Nguyen, D.N., Guenther, A. et al. Timed inhibition of CDC7 increases CRISPR-Cas9 mediated templated repair. Nature Communications (2020). 291 Lomova, A. et al. Improving Gene Editing Outcomes in Human Hematopoietic Stem and Progenitor Cells by Temporal Control of DNA Repair. Stem Cells 37, 284-294, doi:10.1002/stem.2935 (2019). 292 Charpentier, M. et al. CtIP fusion to Cas9 enhances transgene integration by homology- dependent repair. Nat Commun 9, 1133, doi:10.1038/s41467-018-03475-7 (2018). 293 Gutschner, T., Haemmerle, M., Genovese, G., Draetta, G. F. & Chin, L. Post-translational Regulation of Cas9 during G1 Enhances Homology-Directed Repair. Cell Rep 14, 1555- 1566, doi:10.1016/j.celrep.2016.01.019 (2016). 294 Abe T., I. K., Furata Y., Kiyonari H. Pronuclear Microinjection during S-Phase Increases the Efficiency of CRISPR-Cas9-Assisted Knockin of Large DNA Donors in Mouse Zygotes. Cell Reports 31 (2020). 295 Gu, B., Posfai, E. & Rossant, J. Efficient generation of targeted large insertions by microinjection into two-cell-stage mouse embryos. Nat Biotechnol 36, 632-637, doi:10.1038/nbt.4166 (2018). 296 Riesenberg, S. & Maricic, T. 
Targeting repair pathways with small molecules increases precise genome editing in pluripotent stem cells. Nat Commun 9, 2164, doi:10.1038/s41467-018-04609-7 (2018). 297 Greco, G. E. et al. SCR7 is neither a selective nor a potent inhibitor of human DNA ligase IV. DNA Repair (Amst) 43, 18-23, doi:10.1016/j.dnarep.2016.04.004 (2016). 298 Pinder, J., Salsman, J. & Dellaire, G. Nuclear domain 'knock-in' screen for the evaluation and identification of small molecule enhancers of CRISPR-based genome editing. Nucleic Acids Res 43, 9379-9392, doi:10.1093/nar/gkv993 (2015). 299 Yang, D. et al. Enrichment of G2/M cell cycle phase in human pluripotent stem cells enhances HDR-mediated gene repair with customizable endonucleases. Sci Rep 6, 21264, doi:10.1038/srep21264 (2016). 300 Zhang, J. P. et al. Efficient precise knockin with a double cut HDR donor after CRISPR/Cas9-mediated double-stranded DNA cleavage. Genome Biol 18, 35, doi:10.1186/s13059-017-1164-8 (2017). 301 Song, J. et al. RS-1 enhances CRISPR/Cas9- and TALEN-mediated knock-in efficiency. Nat Commun 7, 10548, doi:10.1038/ncomms10548 (2016).

302 Hu, Z. et al. Ligase IV inhibitor SCR7 enhances gene editing directed by CRISPR-Cas9 and ssODN in human cancer cells. Cell Biosci 8, 12, doi:10.1186/s13578-018-0200-z (2018). 303 Robert, F., Barbeau, M., Ethier, S., Dostie, J. & Pelletier, J. Pharmacological inhibition of DNA-PK stimulates Cas9-mediated genome editing. Genome Med 7, 93, doi:10.1186/s13073-015-0215-6 (2015). 304 Maruyama, T. et al. Increasing the efficiency of precise genome editing with CRISPR- Cas9 by inhibition of nonhomologous end joining. Nat Biotechnol 33, 538-542, doi:10.1038/nbt.3190 (2015). 305 Beumer, K. J. et al. Efficient gene targeting in Drosophila by direct embryo injection with zinc-finger nucleases. Proc Natl Acad Sci U S A 105, 19821-19826, doi:10.1073/pnas.0810475105 (2008). 306 Chu, V. T. et al. Increasing the efficiency of homology-directed repair for CRISPR-Cas9- induced precise gene editing in mammalian cells. Nat Biotechnol 33, 543-548, doi:10.1038/nbt.3198 (2015). 307 Srivastava, M. et al. An inhibitor of nonhomologous end-joining abrogates double-strand break repair and impedes cancer progression. Cell 151, 1474-1487, doi:10.1016/j.cell.2012.11.054 (2012). 308 Riesenberg, S. et al. Simultaneous precise editing of multiple genes in human cells. Nucleic Acids Res 47, e116, doi:10.1093/nar/gkz669 (2019). 309 Canny, M. D. et al. Inhibition of 53BP1 favors homology-dependent DNA repair and increases CRISPR-Cas9 genome-editing efficiency. Nat Biotechnol 36, 95-102, doi:10.1038/nbt.4021 (2018). 310 Jayavaradhan, R. et al. CRISPR-Cas9 fusion to dominant-negative 53BP1 enhances HDR and inhibits NHEJ specifically at Cas9 target sites. Nat Commun 10, 2866, doi:10.1038/s41467-019-10735-7 (2019). 311 Mateos-Gomez, P. A. et al. The helicase domain of Poltheta counteracts RPA to promote alt-NHEJ. Nat Struct Mol Biol 24, 1116-1123, doi:10.1038/nsmb.3494 (2017). 312 Mateos-Gomez, P. A. et al. Mammalian polymerase theta promotes alternative NHEJ and suppresses recombination. 
Nature 518, 254-257, doi:10.1038/nature14157 (2015). 313 Jayathilaka, K. et al. A chemical compound that stimulates the human homologous recombination protein RAD51. Proc Natl Acad Sci U S A 105, 15848-15853, doi:10.1073/pnas.0808046105 (2008). 314 Huang, F., Mazina, O. M., Zentner, I. J., Cocklin, S. & Mazin, A. V. Inhibition of homologous recombination in human cells by targeting RAD51 recombinase. J Med Chem 55, 3011-3020, doi:10.1021/jm201173g (2012). 315 Zelensky, A. N., Schimmel, J., Kool, H., Kanaar, R. & Tijsterman, M. Inactivation of Pol theta and C-NHEJ eliminates off-target integration of exogenous DNA. Nat Commun 8, 66, doi:10.1038/s41467-017-00124-3 (2017). 316 Glanzer, J. G. et al. A small molecule directly inhibits the p53 transactivation domain from binding to replication protein A. Nucleic Acids Res 41, 2047-2059, doi:10.1093/nar/gks1291 (2013). 317 Glanzer, J. G., Liu, S. & Oakley, G. G. Small molecule inhibitor of the RPA70 N-terminal protein interaction domain discovered using in silico and in vitro methods. Bioorg Med Chem 19, 2589-2595, doi:10.1016/j.bmc.2011.03.012 (2011).

318 Cox, D. B., Platt, R. J. & Zhang, F. Therapeutic genome editing: prospects and challenges. Nat Med 21, 121-131, doi:10.1038/nm.3793 (2015). 319 Torres, R. et al. Engineering human tumour-associated chromosomal translocations with the RNA-guided CRISPR-Cas9 system. Nat Commun 5, 3964, doi:10.1038/ncomms4964 (2014). 320 Roukos, V. & Misteli, T. The biogenesis of chromosome translocations. Nat Cell Biol 16, 293-300, doi:10.1038/ncb2941 (2014). 321 Frock, R. L. et al. Genome-wide detection of DNA double-stranded breaks induced by engineered nucleases. Nat Biotechnol 33, 179-186, doi:10.1038/nbt.3101 (2015). 322 Choi, P. S. & Meyerson, M. Targeted genomic rearrangements using CRISPR/Cas technology. Nat Commun 5, 3728, doi:10.1038/ncomms4728 (2014). 323 Aguirre, A. J. et al. Genomic Copy Number Dictates a Gene-Independent Cell Response to CRISPR/Cas9 Targeting. Cancer Discov 6, 914-929, doi:10.1158/2159-8290.CD-16- 0154 (2016). 324 Caldecott, K. W. Single-strand break repair and genetic disease. Nat Rev Genet 9, 619- 631, doi:10.1038/nrg2380 (2008). 325 Maizels, N. & Davis, L. Initiation of homologous recombination at DNA nicks. Nucleic Acids Res 46, 6962-6973, doi:10.1093/nar/gky588 (2018). 326 Davis, L., Zhang, Y. & Maizels, N. Assaying Repair at DNA Nicks. Methods Enzymol 601, 71-89, doi:10.1016/bs.mie.2017.12.001 (2018). 327 Rees, H. A., Yeh, W. H. & Liu, D. R. Development of hRad51-Cas9 nickase fusions that mediate HDR without double-stranded breaks. Nat Commun 10, 2212, doi:10.1038/s41467-019-09983-4 (2019). 328 Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149-157, doi:10.1038/s41586-019-1711-4 (2019). 329 Yang, L. et al. Engineering and optimising deaminase fusions for genome editing. Nat Commun 7, 13330, doi:10.1038/ncomms13330 (2016). 330 Plosky, B. S. CRISPR-Mediated Base Editing without DNA Double-Strand Breaks. Mol Cell 62, 477-478, doi:10.1016/j.molcel.2016.05.006 (2016). 331 Nishida, K. 
et al. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353, doi:10.1126/science.aaf8729 (2016). 332 Ma, Y. et al. Targeted AID-mediated mutagenesis (TAM) enables efficient genomic diversification in mammalian cells. Nat Methods 13, 1029-1035, doi:10.1038/nmeth.4027 (2016). 333 Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-424, doi:10.1038/nature17946 (2016). 334 Hess, G. T. et al. Directed evolution using dCas9-targeted somatic hypermutation in mammalian cells. Nat Methods 13, 1036-1042, doi:10.1038/nmeth.4038 (2016). 335 Hess, G. T., Tycko, J., Yao, D. & Bassik, M. C. Methods and Applications of CRISPR-Mediated Base Editing in Eukaryotic Genomes. Mol Cell 68, 26-43, doi:10.1016/j.molcel.2017.09.029 (2017). 336 Gaudelli, N. M. et al. Programmable base editing of A*T to G*C in genomic DNA without DNA cleavage. Nature 551, 464-471, doi:10.1038/nature24644 (2017).

337 Rees, H. A. & Liu, D. R. Base editing: precision chemistry on the genome and transcriptome of living cells. Nat Rev Genet 19, 770-788, doi:10.1038/s41576-018-0059-1 (2018). 338 Klompe, S. E., Vo, P. L. H., Halpin-Healy, T. S. & Sternberg, S. H. Transposon-encoded CRISPR-Cas systems direct RNA-guided DNA integration. Nature 571, 219-225, doi:10.1038/s41586-019-1323-z (2019). 339 Strecker, J. et al. RNA-guided DNA insertion with CRISPR-associated transposases. Science 365, 48-53, doi:10.1126/science.aax9181 (2019). 340 Vojta, A. et al. Repurposing the CRISPR-Cas9 system for targeted DNA methylation. Nucleic Acids Res 44, 5615-5628, doi:10.1093/nar/gkw159 (2016). 341 Hilton, I. B. et al. Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers. Nat Biotechnol 33, 510-517, doi:10.1038/nbt.3199 (2015). 342 Thakore, P. I. et al. Highly specific epigenome editing by CRISPR-Cas9 repressors for silencing of distal regulatory elements. Nat Methods 12, 1143-1149, doi:10.1038/nmeth.3630 (2015). 343 Maeder, M. L. et al. CRISPR RNA-guided activation of endogenous human genes. Nat Methods 10, 977-979, doi:10.1038/nmeth.2598 (2013). 344 Chavez, A. et al. Highly efficient Cas9-mediated transcriptional programming. Nat Methods 12, 326-328, doi:10.1038/nmeth.3312 (2015). 345 Cox, D. B. T. et al. RNA editing with CRISPR-Cas13. Science 358, 1019-1027, doi:10.1126/science.aaq0180 (2017). 346 Abudayyeh, O. O. et al. RNA targeting with CRISPR-Cas13. Nature 550, 280-284, doi:10.1038/nature24049 (2017). 347 East-Seletsky, A. et al. Two distinct RNase activities of CRISPR-C2c2 enable guide-RNA processing and RNA detection. Nature 538, 270-273, doi:10.1038/nature19802 (2016). 348 Abudayyeh, O. O. et al. C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector. Science 353, aaf5573, doi:10.1126/science.aaf5573 (2016). 349 Papasavva, P., Kleanthous, M. & Lederer, C. W. 
Rare Opportunities: CRISPR/Cas-Based Therapy Development for Rare Genetic Diseases. Mol Diagn Ther 23, 201-222, doi:10.1007/s40291-019-00392-3 (2019). 350 Li, H. et al. Applications of genome editing technology in the targeted therapy of human diseases: mechanisms, advances and prospects. Signal Transduct Target Ther 5, 1, doi:10.1038/s41392-019-0089-y (2020). 351 Miyaoka, Y. et al. Isolation of single-base genome-edited human iPS cells without antibiotic selection. Nat Methods 11, 291-293, doi:10.1038/nmeth.2840 (2014). 352 Leonetti, M. D., Sekine, S., Kamiyama, D., Weissman, J. S. & Huang, B. A scalable strategy for high-throughput GFP tagging of endogenous human proteins. Proc Natl Acad Sci U S A 113, E3501-3508, doi:10.1073/pnas.1606731113 (2016). 353 Kamiyama, D. et al. Versatile protein tagging in cells with split fluorescent protein. Nat Commun 7, 11046, doi:10.1038/ncomms11046 (2016). 354 Merkle, F. T. et al. Efficient CRISPR-Cas9-mediated generation of knockin human pluripotent stem cells lacking undesired mutations at the targeted locus. Cell Rep 11, 875- 883, doi:10.1016/j.celrep.2015.04.007 (2015).

355 Sharma, A. et al. CRISPR/Cas9-Mediated Fluorescent Tagging of Endogenous Proteins in Human Pluripotent Stem Cells. Curr Protoc Hum Genet 96, 21 11 21-21 11 20, doi:10.1002/cphg.52 (2018). 356 Chen, B., Zou, W., Xu, H., Liang, Y. & Huang, B. Efficient labeling and imaging of protein-coding genes in living cells using CRISPR-Tag. Nat Commun 9, 5065, doi:10.1038/s41467-018-07498-y (2018). 357 Savic, D. et al. CETCh-seq: CRISPR epitope tagging ChIP-seq of DNA-binding proteins. Genome Res 25, 1581-1589, doi:10.1101/gr.193540.115 (2015). 358 Dalvai, M. et al. A Scalable Genome-Editing-Based Approach for Mapping Multiprotein Complexes in Human Cells. Cell Rep 13, 621-633, doi:10.1016/j.celrep.2015.09.009 (2015). 359 Vandemoortele, G. et al. A Well-Controlled BioID Design for Endogenous Bait Proteins. J Proteome Res 18, 95-106, doi:10.1021/acs.jproteome.8b00367 (2019). 360 Natsume, T., Kiyomitsu, T., Saga, Y. & Kanemaki, M. T. Rapid Protein Depletion in Human Cells by Auxin-Inducible Degron Tagging with Short Homology Donors. Cell Rep 15, 210-218, doi:10.1016/j.celrep.2016.03.001 (2016). 361 Yu, J. S. L. & Yusa, K. Genome-wide CRISPR-Cas9 screening in mammalian cells. Methods 164-165, 29-35, doi:10.1016/j.ymeth.2019.04.015 (2019). 362 Wong, A. S. et al. Multiplexed barcoded CRISPR-Cas9 screening enabled by CombiGEM. Proc Natl Acad Sci U S A 113, 2544-2549, doi:10.1073/pnas.1517883113 (2016). 363 Sanjana, N. E. Genome-scale CRISPR pooled screens. Anal Biochem 532, 95-99, doi:10.1016/j.ab.2016.05.014 (2017). 364 Bester, A. C. et al. An Integrated Genome-wide CRISPRa Approach to Functionalize lncRNAs in Drug Resistance. Cell 173, 649-664 e620, doi:10.1016/j.cell.2018.03.052 (2018). 365 Luo, J. CRISPR/Cas9: From Genome Engineering to Cancer Drug Discovery. Trends Cancer 2, 313-324, doi:10.1016/j.trecan.2016.05.001 (2016). 366 Shalem, O., Sanjana, N. E. & Zhang, F. High-throughput functional genomics using CRISPR-Cas9. Nat Rev Genet 16, 299-311, doi:10.1038/nrg3899 (2015). 
367 Tzelepis, K. et al. A CRISPR Dropout Screen Identifies Genetic Vulnerabilities and Therapeutic Targets in Acute Myeloid Leukemia. Cell Rep 17, 1193-1205, doi:10.1016/j.celrep.2016.09.079 (2016). 368 Hart, T. et al. High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype- Specific Cancer Liabilities. Cell 163, 1515-1526, doi:10.1016/j.cell.2015.11.015 (2015). 369 Munoz, D. M. et al. CRISPR Screens Provide a Comprehensive Assessment of Cancer Vulnerabilities but Generate False-Positive Hits for Highly Amplified Genomic Regions. Cancer Discov 6, 900-913, doi:10.1158/2159-8290.CD-16-0178 (2016). 370 Heigwer, F. et al. CRISPR library designer (CLD): software for multispecies design of single guide RNA libraries. Genome Biol 17, 55, doi:10.1186/s13059-016-0915-2 (2016). 371 Krall, E. B. et al. KEAP1 loss modulates sensitivity to kinase targeted therapy in lung cancer. Elife 6, doi:10.7554/eLife.18970 (2017). 372 Najm, F. J. et al. Orthologous CRISPR-Cas9 enzymes for combinatorial genetic screens. Nat Biotechnol 36, 179-189, doi:10.1038/nbt.4048 (2018).

373 Adamson, B. et al. A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response. Cell 167, 1867-1882 e1821, doi:10.1016/j.cell.2016.11.048 (2016). 374 Han, K. et al. Synergistic drug combinations for cancer identified in a CRISPR screen for pairwise genetic interactions. Nat Biotechnol 35, 463-474, doi:10.1038/nbt.3834 (2017). 375 Shen, J. P. et al. Combinatorial CRISPR-Cas9 screens for de novo mapping of genetic interactions. Nat Methods 14, 573-576, doi:10.1038/nmeth.4225 (2017). 376 Manguso, R. T. et al. In vivo CRISPR screening identifies Ptpn2 as a cancer immunotherapy target. Nature 547, 413-418, doi:10.1038/nature23270 (2017). 377 Findlay, G. M., Boyle, E. A., Hause, R. J., Klein, J. C. & Shendure, J. Saturation editing of genomic regions by multiplex homology-directed repair. Nature 513, 120-123, doi:10.1038/nature13695 (2014). 378 Roth, T. L. et al. Pooled Knockin Targeting for Genome Engineering of Cellular Immunotherapies. Cell, doi:10.1016/j.cell.2020.03.039 (2020). 379 Soldner, F. et al. Generation of isogenic pluripotent stem cells differing exclusively at two early onset Parkinson point mutations. Cell 146, 318-331, doi:10.1016/j.cell.2011.06.019 (2011). 380 Torres-Ruiz, R. & Rodriguez-Perales, S. CRISPR-Cas9: A Revolutionary Tool for Cancer Modelling. Int J Mol Sci 16, 22151-22168, doi:10.3390/ijms160922151 (2015). 381 Heidenreich, M. & Zhang, F. Applications of CRISPR-Cas systems in neuroscience. Nat Rev Neurosci 17, 36-44, doi:10.1038/nrn.2015.2 (2016). 382 Seeger, T., Porteus, M. & Wu, J. C. Genome Editing in Cardiovascular Biology. Circ Res 120, 778-780, doi:10.1161/CIRCRESAHA.116.310197 (2017). 383 Sayin, V. I. & Papagiannakopoulos, T. Application of CRISPR-mediated genome engineering in cancer research. Cancer Lett 387, 10-17, doi:10.1016/j.canlet.2016.03.029 (2017). 384 Sanchez-Rivera, F. J. & Jacks, T. Applications of the CRISPR-Cas9 system in cancer biology. 
Nat Rev Cancer 15, 387-395, doi:10.1038/nrc3950 (2015). 385 Matano, M. et al. Modeling colorectal cancer using CRISPR-Cas9-mediated engineering of human intestinal organoids. Nat Med 21, 256-262, doi:10.1038/nm.3802 (2015). 386 Wang, G. et al. Modeling the mitochondrial cardiomyopathy of Barth syndrome with induced pluripotent stem cell and heart-on-chip technologies. Nat Med 20, 616-623, doi:10.1038/nm.3545 (2014). 387 Ang, Y. S. et al. Disease Model of GATA4 Mutation Reveals Cooperativity in Human Cardiogenesis. Cell 167, 1734-1749 e1722, doi:10.1016/j.cell.2016.11.033 (2016). 388 Claussnitzer, M. et al. FTO Obesity Variant Circuitry and Adipocyte Browning in Humans. N Engl J Med 373, 895-907, doi:10.1056/NEJMoa1502214 (2015). 389 Rees, D. C., Williams, T. N. & Gladwin, M. T. Sickle-cell disease. Lancet 376, 2018-2031, doi:10.1016/S0140-6736(10)61029-X (2010). 390 Dever, D. P. et al. CRISPR/Cas9 beta-globin gene targeting in human haematopoietic stem cells. Nature 539, 384-389, doi:10.1038/nature20134 (2016). 391 Nelson, C. E. et al. In vivo genome editing improves muscle function in a mouse model of Duchenne muscular dystrophy. Science 351, 403-407, doi:10.1126/science.aad5143 (2016).

392 Bengtsson, N. E. et al. Muscle-specific CRISPR/Cas9 dystrophin gene editing ameliorates pathophysiology in a mouse model for Duchenne muscular dystrophy. Nat Commun 8, 14454, doi:10.1038/ncomms14454 (2017). 393 Xu, L. et al. CRISPR-mediated Genome Editing Restores Dystrophin Expression and Function in mdx Mice. Mol Ther 24, 564-569, doi:10.1038/mt.2015.192 (2016). 394 Kirkwood, J. M. et al. Immunotherapy of cancer in 2012. CA Cancer J Clin 62, 309-335, doi:10.3322/caac.20132 (2012). 395 Liu, X. et al. CRISPR-Cas9-mediated multiplex gene editing in CAR-T cells. Cell Res 27, 154-157, doi:10.1038/cr.2016.142 (2017). 396 Roth, T. L. et al. Reprogramming human T cell function and specificity with non-viral genome targeting. Nature 559, 405-409, doi:10.1038/s41586-018-0326-5 (2018). 397 Eyquem, J. et al. Targeting a CAR to the TRAC locus with CRISPR/Cas9 enhances tumour rejection. Nature 543, 113-117, doi:10.1038/nature21405 (2017). 398 Mullard, A. Gene-editing pipeline takes off. Nat Rev Drug Discov, doi:10.1038/d41573- 020-00096-y (2020). 399 Moore, C. B. T., Christie, K. A., Marshall, J. & Nesbit, M. A. Personalised genome editing - The future for corneal dystrophies. Prog Retin Eye Res 65, 147-165, doi:10.1016/j.preteyeres.2018.01.004 (2018). 400 Xu, L. et al. CRISPR-Edited Stem Cells in a Patient with HIV and Acute Lymphocytic Leukemia. N Engl J Med 381, 1240-1247, doi:10.1056/NEJMoa1817426 (2019). 401 Knott, G. J. & Doudna, J. A. CRISPR-Cas guides the future of genetic engineering. Science 361, 866-869, doi:10.1126/science.aat5011 (2018). 402 Hsu, P. D., Lander, E. S. & Zhang, F. Development and applications of CRISPR-Cas9 for genome engineering. Cell 157, 1262-1278, doi:10.1016/j.cell.2014.05.010 (2014). 403 Wang, H. et al. One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-mediated genome engineering. Cell 153, 910-918, doi:10.1016/j.cell.2013.04.025 (2013). 404 Bothmer, A. et al. 
53BP1 regulates DNA resection and the choice between classical and alternative end joining during class switch recombination. J Exp Med 207, 855-865, doi:10.1084/jem.20100244 (2010). 405 Wang, H. & Xu, X. Microhomology-mediated end joining: new players join the team. Cell Biosci 7, 6, doi:10.1186/s13578-017-0136-8 (2017). 406 Jasin, M. & Rothstein, R. Repair of strand breaks by homologous recombination. Cold Spring Harb Perspect Biol 5, a012740, doi:10.1101/cshperspect.a012740 (2013). 407 Huang, J. et al. RAD18 transmits DNA damage signalling to elicit homologous recombination repair. Nat Cell Biol 11, 592-603, doi:10.1038/ncb1865 (2009). 408 Kobayashi, S. et al. Rad18 and Rnf8 facilitate homologous recombination by two distinct mechanisms, promoting Rad51 focus formation and suppressing the toxic effect of nonhomologous end joining. Oncogene 34, 4403-4411, doi:10.1038/onc.2014.371 (2015). 409 Szuts, D., Simpson, L. J., Kabani, S., Yamazoe, M. & Sale, J. E. Role for RAD18 in homologous recombination in DT40 cells. Mol Cell Biol 26, 8032-8041, doi:10.1128/MCB.01291-06 (2006). 410 Ulrich, H. D. Two-way communications between ubiquitin-like modifiers and DNA. Nat Struct Mol Biol 21, 317-324, doi:10.1038/nsmb.2805 (2014).

411 Helchowski, C. M., Skow, L. F., Roberts, K. H., Chute, C. L. & Canman, C. E. A small ubiquitin binding domain inhibits ubiquitin-dependent protein recruitment to DNA repair foci. Cell Cycle 12, 3749-3758, doi:10.4161/cc.26640 (2013). 412 Huttlin, E. L. et al. The BioPlex Network: A Systematic Exploration of the Human Interactome. Cell 162, 425-440, doi:10.1016/j.cell.2015.06.043 (2015). 413 Paulsen, B. S. et al. Ectopic expression of RAD52 and dn53BP1 improves homology- directed repair during CRISPR–Cas9 genome editing. Nature Biomedical Engineering 1, 878-888, doi:10.1038/s41551-017-0145-2 (2017). 414 Garcia-Rodriguez, N., Wong, R. P. & Ulrich, H. D. Functions of Ubiquitin and SUMO in DNA Replication and Replication Stress. Front Genet 7, 87, doi:10.3389/fgene.2016.00087 (2016). 415 Notenboom, V. et al. Functional characterization of Rad18 domains for Rad6, ubiquitin, DNA binding and PCNA modification. Nucleic Acids Res 35, 5819-5830, doi:10.1093/nar/gkm615 (2007). 416 Tsuji, Y. et al. Recognition of forked and single-stranded DNA structures by human RAD18 complexed with RAD6B protein triggers its recruitment to stalled replication forks. Genes Cells 13, 343-354, doi:10.1111/j.1365-2443.2008.01176.x (2008). 417 Crosetto, N. et al. Human Wrnip1 is localized in replication factories in a ubiquitin- binding zinc finger-dependent manner. J Biol Chem 283, 35173-35185, doi:10.1074/jbc.M803219200 (2008). 418 Hu, Q., Botuyan, M. V., Cui, G., Zhao, D. & Mer, G. Mechanisms of Ubiquitin- Nucleosome Recognition and Regulation of 53BP1 Chromatin Recruitment by RNF168/169 and RAD18. Mol Cell 66, 473-487 e479, doi:10.1016/j.molcel.2017.04.009 (2017). 419 Bi, X. et al. Rad18 regulates DNA polymerase kappa and is required for recovery from S- phase checkpoint-mediated arrest. Mol Cell Biol 26, 3527-3540, doi:10.1128/MCB.26.9.3527-3540.2006 (2006). 420 Canny, M. D. et al. 
Inhibition of 53BP1 favors homology-dependent DNA repair and increases CRISPR-Cas9 genome-editing efficiency. Nat Biotechnol, doi:10.1038/nbt.4021 (2017). 421 Panier, S. et al. Tandem protein interaction modules organize the ubiquitin-dependent response to DNA double-strand breaks. Mol Cell 47, 383-395, doi:10.1016/j.molcel.2012.05.045 (2012). 422 Kolas, N. K. et al. Orchestration of the DNA-damage response by the RNF8 ubiquitin ligase. Science 318, 1637-1640, doi:10.1126/science.1150034 (2007). 423 Mailand, N. et al. RNF8 ubiquitylates histones at DNA double-strand breaks and promotes assembly of repair proteins. Cell 131, 887-900, doi:10.1016/j.cell.2007.09.040 (2007). 424 Xie, A. et al. Distinct roles of chromatin-associated proteins MDC1 and 53BP1 in mammalian double-strand break repair. Mol Cell 28, 1045-1057, doi:10.1016/j.molcel.2007.12.005 (2007).

425 Bhargava, R. et al. C-NHEJ without indels is robust and requires synergistic function of distinct XLF domains. Nat Commun 9, 2484, doi:10.1038/s41467-018-04867-5 (2018).
426 Roberts, B. et al. Systematic gene tagging using CRISPR/Cas9 in human stem cells to illuminate cell organization. Mol Biol Cell 28, 2854-2874, doi:10.1091/mbc.E17-03-0209 (2017).
427 Agudelo, D. et al. Marker-free coselection for CRISPR-driven genome editing in human cells. Nat Methods 14, 615-620, doi:10.1038/nmeth.4265 (2017).
428 Robinton, D. A. & Daley, G. Q. The promise of induced pluripotent stem cells in research and therapy. Nature 481, 295-305, doi:10.1038/nature10761 (2012).
429 Vaisman, A. & Woodgate, R. Translesion DNA polymerases in eukaryotes: what makes them tick? Crit Rev Biochem Mol Biol 52, 274-303, doi:10.1080/10409238.2017.1291576 (2017).
430 Kitevski-LeBlanc, J. et al. The RNF168 paralog RNF169 defines a new class of ubiquitylated histone reader involved in the response to DNA damage. Elife 6, doi:10.7554/eLife.23872 (2017).
431 Zimmermann, M. & de Lange, T. 53BP1: pro choice in DNA repair. Trends Cell Biol 24, 108-117, doi:10.1016/j.tcb.2013.09.003 (2014).
432 Saito, S., Maeda, R. & Adachi, N. Dual loss of human POLQ and LIG4 abolishes random integration. Nat Commun 8, 16112, doi:10.1038/ncomms16112 (2017).
433 Bhargava, R., Onyango, D. O. & Stark, J. M. Regulation of Single-Strand Annealing and its Role in Genome Maintenance. Trends Genet 32, 566-575, doi:10.1016/j.tig.2016.06.007 (2016).
434 Westermark, U. K. et al. BARD1 participates with BRCA1 in homology-directed repair of chromosome breaks. Mol Cell Biol 23, 7926-7936 (2003).
435 Wu, L. C. et al. Identification of a RING protein that can interact in vivo with the BRCA1 gene product. Nat Genet 14, 430-440, doi:10.1038/ng1296-430 (1996).
436 McClellan, J. & King, M. C. Genetic heterogeneity in human disease. Cell 141, 210-217, doi:10.1016/j.cell.2010.03.032 (2010).
437 Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285-291, doi:10.1038/nature19057 (2016).
438 Ciccia, A. et al. The SIOD disorder protein SMARCAL1 is an RPA-interacting protein involved in replication fork restart. Genes Dev 23, 2415-2425, doi:10.1101/gad.1832309 (2009).
439 Sowa, M. E., Bennett, E. J., Gygi, S. P. & Harper, J. W. Defining the human deubiquitinating enzyme interaction landscape. Cell 138, 389-403, doi:10.1016/j.cell.2009.04.042 (2009).
440 Ciccia, A. et al. Treacher Collins syndrome TCOF1 protein cooperates with NBS1 in the DNA damage response. Proc Natl Acad Sci U S A 111, 18631-18636, doi:10.1073/pnas.1422488112 (2014).
441 Billon, P. et al. CRISPR-Mediated Base Editing Enables Efficient Disruption of Eukaryotic Genes through Induction of STOP Codons. Mol Cell 67, 1068-1079 e1064, doi:10.1016/j.molcel.2017.08.008 (2017).
442 Mallette, F. A. et al. RNF8- and RNF168-dependent degradation of KDM4A/JMJD2A triggers 53BP1 recruitment to DNA damage sites. EMBO J 31, 1865-1878, doi:10.1038/emboj.2012.47 (2012).

443 Pinello, L. et al. Analyzing CRISPR genome-editing experiments with CRISPResso. Nat Biotechnol 34, 695-697, doi:10.1038/nbt.3583 (2016).
444 Foss, D. V., Hochstrasser, M. L. & Wilson, R. C. Clinical applications of CRISPR-based genome editing and diagnostics. Transfusion 59, 1389-1399, doi:10.1111/trf.15126 (2019).
445 Hsu, P. D. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotechnol 31, 827-832, doi:10.1038/nbt.2647 (2013).
446 Mali, P. et al. CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering. Nat Biotechnol 31, 833-838, doi:10.1038/nbt.2675 (2013).
447 Cho, S. W., Kim, S., Kim, J. M. & Kim, J. S. Targeted genome engineering in human cells with the Cas9 RNA-guided endonuclease. Nat Biotechnol 31, 230-232, doi:10.1038/nbt.2507 (2013).
448 Gabriel, R., von Kalle, C. & Schmidt, M. Mapping the precision of genome editing. Nat Biotechnol 33, 150-152, doi:10.1038/nbt.3142 (2015).
449 Yu, C. et al. Small molecules enhance CRISPR genome editing in pluripotent stem cells. Cell Stem Cell 16, 142-147, doi:10.1016/j.stem.2015.01.003 (2015).
450 Veres, A. et al. Low incidence of off-target mutations in individual CRISPR-Cas9 and TALEN targeted human stem cell clones detected by whole-genome sequencing. Cell Stem Cell 15, 27-30, doi:10.1016/j.stem.2014.04.020 (2014).
451 Smith, C. et al. Whole-genome sequencing analysis reveals high specificity of CRISPR/Cas9 and TALEN-based genome editing in human iPSCs. Cell Stem Cell 15, 12-13, doi:10.1016/j.stem.2014.06.011 (2014).
452 Kim, S., Kim, D., Cho, S. W., Kim, J. & Kim, J. S. Highly efficient RNA-guided genome editing in human cells via delivery of purified Cas9 ribonucleoproteins. Genome Res 24, 1012-1019, doi:10.1101/gr.171322.113 (2014).
453 Kleinstiver, B. P. et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature 523, 481-485, doi:10.1038/nature14592 (2015).
454 Sack, L. M., Davoli, T., Xu, Q., Li, M. Z. & Elledge, S. J. Sources of Error in Mammalian Genetic Screens. G3 (Bethesda) 6, 2781-2790, doi:10.1534/g3.116.030973 (2016).
455 Haeussler, M. et al. Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR. Genome Biol 17, 148, doi:10.1186/s13059-016-1012-2 (2016).
456 Chang, H. H. et al. Different DNA End Configurations Dictate Which NHEJ Components Are Most Important for Joining Efficiency. J Biol Chem 291, 24377-24389, doi:10.1074/jbc.M116.752329 (2016).
457 Shin, H. Y. et al. CRISPR/Cas9 targeting events cause complex deletions and insertions at 17 sites in the mouse genome. Nat Commun 8, 15464, doi:10.1038/ncomms15464 (2017).
458 Gasperini, M. et al. CRISPR/Cas9-Mediated Scanning for Regulatory Elements Required for HPRT1 Expression via Thousands of Large, Programmed Genomic Deletions. Am J Hum Genet 101, 192-205, doi:10.1016/j.ajhg.2017.06.010 (2017).
459 Parikh, B. A., Beckman, D. L., Patel, S. J., White, J. M. & Yokoyama, W. M. Detailed phenotypic and molecular analyses of genetically modified mice generated by CRISPR-Cas9-mediated editing. PLoS One 10, e0116484, doi:10.1371/journal.pone.0116484 (2015).
460 Boroviak, K., Fu, B., Yang, F., Doe, B. & Bradley, A. Revealing hidden complexities of genomic rearrangements generated with Cas9. Sci Rep 7, 12867, doi:10.1038/s41598-017-12740-6 (2017).
461 Boroviak, K., Doe, B., Banerjee, R., Yang, F. & Bradley, A. Chromosome engineering in zygotes with CRISPR/Cas9. Genesis 54, 78-85, doi:10.1002/dvg.22915 (2016).
462 Kraft, K. et al. Deletions, Inversions, Duplications: Engineering of Structural Variants using CRISPR/Cas in Mice. Cell Rep 10, 833-839, doi:10.1016/j.celrep.2015.01.016 (2015).
463 Canver, M. C. et al. Characterization of genomic deletion efficiency mediated by clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 nuclease system in mammalian cells. J Biol Chem 289, 21312-21324, doi:10.1074/jbc.M114.564625 (2014).
464 Li, W. et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol 15, 554, doi:10.1186/s13059-014-0554-4 (2014).
465 Li, W. et al. Quality control, modeling, and visualization of CRISPR screens with MAGeCK-VISPR. Genome Biol 16, 281, doi:10.1186/s13059-015-0843-6 (2015).
466 Wang, T. et al. Identification and characterization of essential genes in the human genome. Science 350, 1096-1101, doi:10.1126/science.aac7041 (2015).
467 Kiessling, M. K. et al. Identification of oncogenic driver mutations by genome-wide CRISPR-Cas9 dropout screening. BMC Genomics 17, 723, doi:10.1186/s12864-016-3042-2 (2016).
468 Ihry, R. J. et al. Genome-Scale CRISPR Screens Identify Human Pluripotency-Specific Genes. Cell Rep 27, 616-630 e616, doi:10.1016/j.celrep.2019.03.043 (2019).
469 Hanna, R. E. & Doench, J. G. Design and analysis of CRISPR–Cas experiments. Nat Biotechnol (2020).
470 Sanson, K. R. et al. Optimized libraries for CRISPR-Cas9 genetic screens with multiple modalities. Nat Commun 9, 5416, doi:10.1038/s41467-018-07901-8 (2018).
471 Nambiar, T. S. et al. Stimulation of CRISPR-mediated homology-directed repair by an engineered RAD18 variant. Nat Commun 10, 3395, doi:10.1038/s41467-019-11105-z (2019).
472 Bodapati, S., Daley, T. P., Lin, X., Zou, J. & Qi, L. S. A benchmark of algorithms for the analysis of pooled CRISPR screens. Genome Biol 21, 62, doi:10.1186/s13059-020-01972-x (2020).
473 Kim, Y. C. et al. Activation of ATM depends on chromatin interactions occurring before induction of DNA damage. Nat Cell Biol 11, 92-96, doi:10.1038/ncb1817 (2009).
474 Shiloh, Y. & Ziv, Y. The ATM protein kinase: regulating the cellular response to genotoxic stress, and more. Nat Rev Mol Cell Biol 14, 197-210 (2013).
475 Murphy, K. J. et al. HMGN1 and 2 remodel core and linker histone tail domains within chromatin. Nucleic Acids Res 45, 9917-9930, doi:10.1093/nar/gkx579 (2017).
476 Densham, R. M. & Morris, J. R. Moving Mountains-The BRCA1 Promotion of DNA Resection. Front Mol Biosci 6, 79, doi:10.3389/fmolb.2019.00079 (2019).

477 Chen, C. H. et al. Functional links between the Prp19-associated complex, U4/U6 biogenesis, and recycling. RNA 12, 765-774, doi:10.1261/rna.2292106 (2006).
478 Chan, S. P., Kao, D. I., Tsai, W. Y. & Cheng, S. C. The Prp19p-associated complex in spliceosome activation. Science 302, 279-282, doi:10.1126/science.1086602 (2003).
479 Chan, S. P. & Cheng, S. C. The Prp19-associated complex is required for specifying interactions of U5 and U6 with pre-mRNA during spliceosome activation. J Biol Chem 280, 31190-31199, doi:10.1074/jbc.M505060200 (2005).
480 Abbas, M., Shanmugam, I., Bsaili, M., Hromas, R. & Shaheen, M. The role of the human psoralen 4 (hPso4) protein complex in replication stress and homologous recombination. J Biol Chem 289, 14009-14019, doi:10.1074/jbc.M113.520056 (2014).
481 Marechal, A. et al. PRP19 transforms into a sensor of RPA-ssDNA after DNA damage and drives ATR activation via a ubiquitin-mediated circuitry. Mol Cell 53, 235-246, doi:10.1016/j.molcel.2013.11.002 (2014).
482 Zhang, Y. et al. Crystal structure of the WD40 domain of human PRPF19. Biochem Biophys Res Commun 493, 1250-1253, doi:10.1016/j.bbrc.2017.09.145 (2017).
483 Sharma, S. & Petsalaki, E. Application of CRISPR-Cas9 Based Genome-Wide Screening Approaches to Study Cellular Signalling Mechanisms. Int J Mol Sci 19, doi:10.3390/ijms19040933 (2018).
484 Negrini, S., Gorgoulis, V. G. & Halazonetis, T. D. Genomic instability--an evolving hallmark of cancer. Nat Rev Mol Cell Biol 11, 220-228, doi:10.1038/nrm2858 (2010).
485 Gibson, D. G. Enzymatic assembly of overlapping DNA fragments. Methods Enzymol 498, 349-361, doi:10.1016/B978-0-12-385120-8.00015-2 (2011).
486 Nakajima, S. et al. Replication-dependent and -independent responses of RAD18 to DNA damage in human cells. J Biol Chem 281, 34687-34695, doi:10.1074/jbc.M605545200 (2006).
487 Watanabe, K. et al. Rad18 guides poleta to replication stalling sites through physical interaction and PCNA monoubiquitination. EMBO J 23, 3886-3896, doi:10.1038/sj.emboj.7600383 (2004).
488 Panzarino, N., Krais, J., Peng, M., Mosqueda, M., Nayak, S., Bond, S., Calvo, J., Cong, K., Doshi, M., Bere, M., Ou, J., Deng, B., Zhu, J., Johnson, N., Cantor, S. Replication gaps underlie BRCA-deficiency and therapy response. bioRxiv (2019).
489 Cong, K., Kousholt, A., Peng, M., Panzarino, N., Lee, W., Nayak, S., Krais, J., Calvo, J., Bere, M., Rothenberg, E., Johnson, N., Jonkers, J., Cantor, S. PARPi synthetic lethality derives from replication-associated single-stranded DNA gaps. bioRxiv (2019).
490 Kim, J. et al. RAD6-Mediated transcription-coupled H2B ubiquitylation directly stimulates H3K4 methylation in human cells. Cell 137, 459-471, doi:10.1016/j.cell.2009.02.027 (2009).
491 Raschle, M. et al. DNA repair. Proteomics reveals dynamic assembly of repair complexes during bypass of DNA cross-links. Science 348, 1253671, doi:10.1126/science.1253671 (2015).
492 Hedglin, M., Aitha, M., Pedley, A. & Benkovic, S. J. Replication protein A dynamically regulates monoubiquitination of proliferating cell nuclear antigen. J Biol Chem 294, 5157-5168, doi:10.1074/jbc.RA118.005297 (2019).

493 Yanagihara, H. et al. NBS1 recruits RAD18 via a RAD6-like domain and regulates Pol eta-dependent translesion DNA synthesis. Mol Cell 43, 788-797, doi:10.1016/j.molcel.2011.07.026 (2011).
494 Williams, S. A., Longerich, S., Sung, P., Vaziri, C. & Kupfer, G. M. The E3 ubiquitin ligase RAD18 regulates ubiquitylation and chromatin loading of FANCD2 and FANCI. Blood 117, 5078-5087, doi:10.1182/blood-2010-10-311761 (2011).
495 Zgheib, O., Pataky, K., Brugger, J. & Halazonetis, T. D. An oligomerized 53BP1 tudor domain suffices for recognition of DNA double-strand breaks. Mol Cell Biol 29, 1050-1058, doi:10.1128/MCB.01011-08 (2009).
496 Cuella-Martin, R. et al. 53BP1 Integrates DNA Repair and p53-Dependent Cell Fate Decisions via Distinct Mechanisms. Mol Cell 64, 51-64, doi:10.1016/j.molcel.2016.08.002 (2016).
497 Schwertman, P., Bekker-Jensen, S. & Mailand, N. Regulation of DNA double-strand break repair by ubiquitin and ubiquitin-like modifiers. Nat Rev Mol Cell Biol 17, 379-394, doi:10.1038/nrm.2016.58 (2016).
498 Altmeyer, M. & Lukas, J. Guarding against collateral damage during chromatin transactions. Cell 153, 1431-1434, doi:10.1016/j.cell.2013.05.044 (2013).
499 An, L. et al. RNF169 limits 53BP1 deposition at DSBs to stimulate single-strand annealing repair. Proc Natl Acad Sci U S A 115, E8286-E8295, doi:10.1073/pnas.1804823115 (2018).
500 Zong, D. et al. BRCA1 Haploinsufficiency Is Masked by RNF168-Mediated Chromatin Ubiquitylation. Mol Cell 73, 1267-1281 e1267, doi:10.1016/j.molcel.2018.12.010 (2019).
501 Drane, P. & Chowdhury, D. TIRR and 53BP1- partners in arms. Cell Cycle 16, 1235-1236, doi:10.1080/15384101.2017.1337966 (2017).
502 Drane, P. et al. TIRR regulates 53BP1 by masking its histone methyl-lysine binding function. Nature 543, 211-216, doi:10.1038/nature21358 (2017).
503 Avolio, R. et al. Protein Syndesmos is a novel RNA-binding protein that regulates primary cilia formation. Nucleic Acids Res 46, 12067-12086, doi:10.1093/nar/gky873 (2018).
504 West, K. L. et al. LC8/DYNLL1 is a 53BP1 effector and regulates checkpoint activation. Nucleic Acids Res 47, 6236-6249, doi:10.1093/nar/gkz263 (2019).
505 Ghobashi, A. H. & Kamel, M. A. Tip60: updates. J Appl Genet 59, 161-168, doi:10.1007/s13353-018-0432-y (2018).
506 Uckelmann, M. & Sixma, T. K. Histone ubiquitination in the DNA damage response. DNA Repair (Amst) 56, 92-101, doi:10.1016/j.dnarep.2017.06.011 (2017).
507 Setiaputra, D. & Durocher, D. Shieldin - the protector of DNA ends. EMBO Rep 20, doi:10.15252/embr.201847560 (2019).
508 Ochs, F. et al. 53BP1 fosters fidelity of homology-directed DNA repair. Nat Struct Mol Biol 23, 714-721, doi:10.1038/nsmb.3251 (2016).
509 Mehta, A. & Haber, J. E. Sources of DNA double-strand breaks and models of recombinational DNA repair. Cold Spring Harb Perspect Biol 6, a016428, doi:10.1101/cshperspect.a016428 (2014).
510 Chiruvella, K. K., Liang, Z. & Wilson, T. E. Repair of double-strand breaks by end joining. Cold Spring Harb Perspect Biol 5, a012757, doi:10.1101/cshperspect.a012757 (2013).

511 Fell, V. L. & Schild-Poulter, C. The Ku heterodimer: function in DNA repair and beyond. Mutat Res Rev Mutat Res 763, 15-29, doi:10.1016/j.mrrev.2014.06.002 (2015).
512 Dutertre, S. et al. Cell cycle regulation of the endogenous wild type Bloom's syndrome DNA helicase. Oncogene 19, 2731-2738, doi:10.1038/sj.onc.1203595 (2000).
513 Wang, S. C., Lin, S. H., Su, L. K. & Hung, M. C. Changes in BRCA2 expression during progression of the cell cycle. Biochem Biophys Res Commun 234, 247-251, doi:10.1006/bbrc.1997.6544 (1997).
514 Flygare, J., Benson, F. & Hellgren, D. Expression of the human RAD51 gene during the cell cycle in primary human peripheral blood lymphocytes. Biochim Biophys Acta 1312, 231-236, doi:10.1016/0167-4889(96)00040-7 (1996).
515 Huszno, J., Kolosza, Z. & Grzybowska, E. BRCA1 mutation in breast cancer patients: Analysis of prognostic factors and survival. Oncol Lett 17, 1986-1995, doi:10.3892/ol.2018.9770 (2019).
516 Wilbie, D., Walther, J. & Mastrobattista, E. Delivery Aspects of CRISPR/Cas for in Vivo Genome Editing. Acc Chem Res 52, 1555-1564, doi:10.1021/acs.accounts.9b00106 (2019).
517 Liang, X. et al. Rapid and highly efficient mammalian cell engineering via Cas9 protein transfection. J Biotechnol 208, 44-53, doi:10.1016/j.jbiotec.2015.04.024 (2015).
518 Nakad, R. & Schumacher, B. DNA Damage Response and Immune Defense: Links and Mechanisms. Front Genet 7, 147, doi:10.3389/fgene.2016.00147 (2016).
519 Billon, P. et al. Detection of Marker-Free Precision Genome Editing and Genetic Variation through the Capture of Genomic Signatures. Cell Rep 30, 3280-3295 e3286, doi:10.1016/j.celrep.2020.02.068 (2020).
520 Ren, C., Xu, K., Segal, D. J. & Zhang, Z. Strategies for the Enrichment and Selection of Genetically Modified Cells. Trends Biotechnol 37, 56-71, doi:10.1016/j.tibtech.2018.07.017 (2019).
521 Villiger, L. et al. Treatment of a metabolic disease by in vivo genome base editing in adult mice. Nat Med 24, 1519-1525, doi:10.1038/s41591-018-0209-1 (2018).
