BRCA1 and 53BP1 Mediate Reprogramming Through DNA Repair Pathway Choice

Daniela Georgieva

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy under the Executive Committee of the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY 2019

© 2019 Daniela Georgieva All rights reserved

ABSTRACT

BRCA1 and 53BP1 Mediate Reprogramming Through DNA Repair Pathway Choice

Daniela Georgieva

BRCA1 is a caretaker of genome integrity with various molecular functions, which are required for development and tumor suppression. These include the homology-directed repair

(HDR) of DNA double strand breaks, stalled replication fork protection (SFP), transcription, remodeling and cell cycle checkpoint control. Recent studies reported that BRCA1 is required for reprogramming to pluripotency, but its specific role remains unknown. In this work, we use separation of function mutants for the roles of BRCA1 in HDR and SFP to show that

BRCA1 is required to repair replication-associated DNA double strand breaks by during reprogramming. Deficiency in SFP proved inconsequential to induced pluripotent stem (iPS) cell generation and cells with this phenotype did not experience reduced reprogramming. Thus, the primary limiting factor for the transition to pluripotency is a specific class of DNA damage: double strand breaks, likely occurring in late replicating regions which require repair by homologous recombination.

These findings identify an important role of DNA damage, linked to the progression of

DNA replication, in limiting cell type transitions during reprogramming. Most studies on iPS cell generation have focused on expression as a limiting step, in part due to the wide availability of tools to analyze transcription. Since the progression of DNA replication and DNA damage during S-phase are cell type specific, we have started the development of a sequencing platform to map various aspects of replication progression, such as origin usage, polymerase direction,

pausing and stalling. In this work, we demonstrate that nucleotide analogs, incorporated during

DNA synthesis in mammalian cells, can be detected by Nanopore sequencing.

TABLE OF CONTENTS

LIST OF CHARTS, GRAPHS AND ILLUSTRATIONS ...... v

ACKNOWLEDGEMENTS ...... vii

DEDICATION...... ix

CHAPTER 1: INTRODUCTION ...... 1

1.1.THE TUMOR SUPPRESSOR BRCA1 ...... 1

1.2. BRCA1 DOMAIN ORGANIZATION ...... 1

1.3. MOLECULAR FUNCTIONS OF BRCA1 ...... 4

1.3.1. REPAIR OF DOUBLE STRAND BREAKS BY HOMOLOGOUS RECOMBINATION ...... 5

1.3.2. PROTECTION OF STALLED REPLICATION FORKS ...... 5

1.3.3. TRANSCRIPTION ...... 7

1.3.4. CHROMATIN REMODELING ...... 7

1.3.5. CELL CYCLE CHECKPOINT CONTROL ...... 7

1.4. THE BRCA1-INTERACTING PARTNER BARD1 ...... 8

1.5. MARKERS OF REPLICATION STRESS AND DNA DAMAGE ...... 10

1.6. REPAIR OF DNA DOUBLE STRAND BREAKS ...... 10

1.7. THE NHEJ REPAIR FACTOR 53BP1 ...... 14

1.8. PROCESSING OF STALLED REPLICATION FORKS...... 16

1.9. THE FORK REMODELLER SMARCAL1 ...... 18

1.10. REPROGRAMMING TO PLURIPOTENCY ...... 20

i

CHAPTER 2: BRCA1 AND 53BP1 MEDIATE REPROGRAMMING THROUGH DNA REPAIR

PATHWAY CHOICE ...... 22

2.1. ABSTRACT ...... 22

2.2. INTRODUCTION ...... 22

2.3. RESULTS ...... 25

2.3.1. REPROGRAMMING DEPENDS ON THE BRCT PHOSPHO-RECOGNITION PROPERTY

OF BRCA1...... 25

2.3.2. DISRUPTION OR RESCUE OF STALLED FORK PROTECTION (SFP) ALTERS THE

LEVELS OF DNA DAMAGE WITHOUT AFFECTING REPROGRAMMING EFFICIENCY. ... 30

2.3.3. REPROGRAMMING EFFICIENCY IS DETERMINED BY THE BALANCE OF DNA DSB

REPAIR PATHWAY CHOICE...... 50

...... 55

2.4. DISCUSSION ...... 65

2.5. MATERIALS AND METHODS ...... 71

2.5.1. Animal breeding ...... 71

2.5.2. Fibroblast derivation and genotyping ...... 71

2.5.3. Virus preparation and infection ...... 73

2.5.4. Reprogramming ...... 73

2.5.5. Immunofluorescence and DNA fiber analysis ...... 74

2.5.6. HR Assay ...... 75

CHAPTER 3: DETECTION OF BASE ANALOGS INCORPORATED DURING DNA

REPLICATION BY NANOPORE SEQUENCING ...... 77

ii

3.1. MOLECULAR METHODS AVAILABLE FOR THE STUDY OF DNA REPLICATION .. 77

3.2. ABSTRACT ...... 77

3.3. INTRODUCTION ...... 78

3.4. RESULTS ...... 82

3.4.1. DETECTION OF SINGLE-BASE DNA MODIFICATIONS WITH NANOPORE

SEQUENCING ...... 82

3.4.2. DETECTION OF MULTIPLE DNA MODIFICATIONS WITH NANOPORE

SEQUENCING ...... 94

3.4.3. A COMPARATIVE ANALYSIS OF NUCLEOTIDE ANALOGS FOR gDNA

MODIFICATIONS ...... 101

3.4.4. THYMIDINE ANALOGS INCORPORATED IN REPLICATING MAMMALIAN DNA

CAN BE DETECTED ON SINGLE STRANDS...... 106

3.5. DISCUSSION ...... 112

3.6. MATERIALS AND METHODS ...... 115

3.6.1. Training Template Assembly ...... 115

3.6.2. Click Cycloaddition Reactions ...... 115

3.6.3. Dot Blots ...... 116

3.6.4. MinION Runs on Synthetic Templates ...... 116

3.6.5. MinION Runs with Genomic DNA ...... 117

CHAPTER 4: FUTURE DIRECTIONS AND PERSPECTIVES ...... 118

4.1. BRCA1 IN REPROGRAMMING TO PLURIPOTENCY ...... 118

4.2. DETECTION OF MODIFIED NUCLEOTIDES WITH NANOPORE SEQUENCING ..... 120

iii

4.3. TYING GENETIC STABILITY TO CELL IDENTITY ...... 120

4.4. DNA BREAKS AND ...... 125

BIBLIOGRAPHY ...... 128

iv

LIST OF CHARTS, GRAPHS AND ILLUSTRATIONS

Figure 1. Schematic of human BRCA1 protein. 3

Figure 2. Molecular functions of BRCA1. 4

Figure 3. Role of BRCA1 in stalled replication fork protection (SFP). 6

Figure 4. Schematic of human BARD1 protein. 9

Figure 5. Main pathways for the repair of DNA double strand breaks (DSBs). 13

Figure 6. Schematic of human 53BP1 protein. 15

Figure 7. Main pathways for the processing of stalled replication forks. 17

Figure 8. Schematic of human SMARCAL1 protein. 19

Figure 9. Factors which affect reprogramming efficiency. 21

Figure 10. BRCT phospho-recognition is required for reprogramming. 26

Table 1. Reprogramming genotype collection. 29

Figure 11. Stalled fork protection deficiency increases DNA damage but does not impair reprogramming. 31

Figure 12. Compromising SF has no effect on development or reprogramming. 34

Figure 13. The function of Brca1 in HDR is required for reprogramming. 39

Figure 14. Loss of HR has negative consequences for development and reprogramming.

45

Figure 15. Restoration of HDR rescues impaired reprogramming efficiency. 52

Table 2. iPS cell line collection. 58

Figure 16. Restoration of HDR rescues development and reprogramming. 59

Figure 17. One-ended DSBs limit somatic cell reprogramming. 70

Table 3. Sequences of genotyping primers. 72

Figure 18. An artist's representation of nanopore sequencing. 80 v Figure 19. Design and assembly of training templates with single modifications. 83

Figure 20. Detecting a series of single-base modifications with increasing molecular weight

Table 3. Sequences of genotyping primers. 72

Figure 18. An artist's representation of nanopore sequencing. 80

Figure 19. Design and assembly of training templates with single modifications. 83

Figure 20. Detecting a series of single-base modifications with increasing molecular weight through MinION sequencing. 88

Figure 21. Design and assembly of training templates with multiple modifications. 95

Figure 22. Detection of multiple modifications with MinION sequencing. 98

Figure 23. Substitution of mammalian genomic DNA with EdU. 102

Figure 24. Sequencing depth required for the calling of base analogs in mixed samples with unmodified and modified reads. 104

Figure 25. Detection of IdU incorporated into replicating mammalian DNA. 108

Figure 26. Cell type specific DNA replication programs limit changes in cell identity. 123

Figure 27. Traffic analogy of DNA replication. 127

vi

ACKNOWLEDGEMENTS

“From the nature of things ideas do not come from prosperity, affluence and contentment, but rather from the blackness of despair, not in the bright light of day, nor the footlights’ glare, but rather in the quiet, undisturbed hours of midnight, or early morning, when one can be alone to think”-Frederick Banting

The journey of my PhD has been riddled with obstacles, but also exciting moments of discovery. I thank my mentor, Dr. Dieter Egli, for the opportunity to pursue projects I felt passionate about, for his guidance through the rough times of experimental problems, for his complete scientific and emotional investment in this work which has brought about scores of publication exchanges, hours of discussion and plenty of encouragement. My time at Dieter’s lab has been a privilege as I have had the rare opportunity to work closely with and learn from one of the world’s best! I thank Dieter for always seeking and respecting my opinion, for letting me contradict his views whenever we disagreed and for guiding me steadily towards the truth.

I would also like to thank my exceptional committee members-Richard Baer, Claudia

Doege, Alberto Ciccia and Jean Gautier. They have advised me constantly throughout my PhD and have been extremely generous with their time and lab resources. Special thanks to Dr. Richard

Baer for making the BRCA1 project possible! Many thanks to Dr. Ciccia for sharing lab reagents and his expertise on SMARCAL1! I am deeply grateful to Dr. Claudia Doege for her advice on my reprogramming experiments, for her mentorship and encouragement which have been invaluable during tough times. Finally, special thanks to Dr. Gautier who generously agreed to chair the committee. I thank him for raising important questions during committee meetings and for his great advice.

vii

Besides my mentors, I would like to thank my current labmates Lina Sui, Bryan Gonzalez,

Michael Zuccaro, Danielle Baum, Ning Wang, Maria-Caterina De Rosa (Doege lab) as well as past lab members for the wonderful work atmosphere. I am proud to graduate from the Egli lab at

Columbia which is a wonderful example of a hard-working, supportive and fun lab team.

Many thanks to my friends, who have learned about BRCA1 over the past 6 years, have started loving DNA and have become aware that a Minion can be something other than a 3D animation character. I am especially grateful to my best friend, Viktoriya Staneva, for her sisterly care and continuous support in everything I do.

viii

DEDICATION

To CAD and MVG who soldiered through breast cancer during my PhD. Your optimism and unswerving strength of character have been an inspiration and a lesson on how to prevail over the greatest malady of our time. I pray and hope for your good health in the years to come.

“Resilience, inventiveness, and survivorship-qualities often ascribed to great physicians-are reflected qualities, emanating first from those who struggle with illness and only then mirrored by those who treat them”- Siddhartha Mukherjee in the Emperor of All Maladies.

ix

CHAPTER 1: INTRODUCTION

1.1.THE TUMOR SUPPRESSOR BRCA1 BRCA1 is a tumor suppressor with multiple molecular functions, which play important roles in the maintenance of genome stability. Germline mutations in BRCA1 significantly increase the risk of breast and ovarian cancer [1] and are associated with higher incidence of fallopian tube, prostate, stomach and other malignancies, reviewed in Roy et al. [2]. Pathogenic mutations have been described in all domains of the protein, but most are truncations which eliminate or severely affect the BRCT domains [3, 4]. Two common BRCA1 alterations are the Ashkenazi founder mutations 185AGdel and 5382insC, both of which are frameshifts leading to a truncated protein

[4]. While 5382insC is prevalent in European countries, 185AGdel has been found in patients of

European, American, African and Asian descent [5]. The cumulative lifetime risk of developing breast cancer for BRCA1 mutation carriers is 70-80% [2]. BRCA1-asscoicated breast are typically triple negative (ER-negative, PR-negative, HER2-negative) and highly aggressive [6].

The tissue tropism of BRCA1 tumors has not been formally explained, but an attractive possibility is accelerated loss of heterozygosity (LOH) in mammary tissue due to increased rates of mitotic recombination [7]. The landmark cloning of BRCA1 in 1994 [1] provided the opportunity to discover, characterize and establish ways to manipulate the mechanisms of BRCA1-dependent tumor suppression with therapeutic intent.

1.2. BRCA1 PROTEIN DOMAIN ORGANIZATION BRCA1 (Breast Cancer Associated 1 Early Onset) is an 1863 amino acid protein, which maps to the long arm of human 17 [8]. BRCA1 participates in multiple molecular pathways with important consequences for genomic stability and cell proliferation. Complete loss of BRCA1 is embryonic lethal, which required the generation of hypomorphic alleles and tissue

1

specific knockouts to model the disease in mice [9]. The main functional units of the BRCA1 gene product are an N-terminal RING-finger domain, a DNA-binding domain, a serine cluster, and two copies of C-terminal BRCT repeats [10]. (Fig.1). The RING-finger is required for the formation of the BRCA1-BARD1 heterodimer which enhances the E3 ligase activity of BRCA1, but is not needed for tumor suppression [11]. The region between Exon11 and Exon13 spans over half of the

BRCA1 protein and provides surface for interaction with RB1 (RB transcriptional corepressor 1), the bHLH transcription factor cMYC, the double strand break repair protein RAD50, PALB2

(Partner and Localizer of BRCA2) and the RAD51 recombinase, which are involved in cell cycle control and DNA repair [12]. The serine cluster domain between aa1280 and aa1524 contains serine residues which are phosphorylated by ATM/ATR kinases in response to DNA damage and have been linked to ionizing irradiation induced S-phase (S1387) and G2/M (S1423 and S1524) cell cycle arrest [13, 14]. Another important phosphorylation site is the CHK2 target S988, which has been shown to regulate homologous recombination [15]. The C-terminal of BRCA1 provides a phosphoprotein binding surface for ABRAXAS/CCDC98, the DEAH family

BACH1/BRIP1 (BRCA1-Intercating Protein C-terminal helicase 1) and CTIP/RBBP8 (RB-

Binding Protein 8 endonuclease), which form mutually exclusive complexes with BRCA1 at

S1655 (S1598 in mouse Brca1) [16] and play roles in DNA repair and checkpoint control [17-19].

The C-terminal of BRCA1 can also associate with EP300 (E1A binding protein p300), which is involved in transcription [20]. Most of the identified cancer-predisposing BRCA1 lesions generate a truncated protein, missing one or both BRCT domains [3], which underscores their importance in tumor suppression.

2

Figure 1. Schematic of human BRCA1 protein. The RING (Really Interesting New Gene) domain of BRCA1 from aa10 to aa109 is followed by a DNA Binding Domain (aa452-1092) and a Serine Cluster (SQ) domain (aa1280-1524). Two pairs of BRCT repeats span the regions aa1640-1729 and aa1760-1821. The binding sites of several BRCA1-interacting partners are marked in blue. Many of the interacting factors are implicated in somatic cell reprogramming, including Rb [21], cMyc [22], Rad51 [23], Brca2 [24] and CtIP [25]. Selected ATM/ATR targets are shown in the serine cluster domain. The interaction of BRCA1 with ABRAXAS, BACH1 and CTIP occurs at BRCA1 S1655 (S1598 in mouse Brca1).

3

1.3. MOLECULAR FUNCTIONS OF BRCA1 BRCA1 is a member of various protein complexes with important molecular functions. In addition to its critical role in DNA double strand break repair by homologous recombination (HR),

BRCA1 is involved in stalled replication fork protection (SF), chromatin remodeling, transcription and cell cycle checkpoint control [26] (Fig. 2).

Figure 2. Molecular functions of BRCA1. The tumor suppressor BRCA1 has important roles in various molecular processes. BRCA1 functions in DNA double strand break repair by homologous recombination (HR), stalled replication fork (SF) protection, transcription, chromatin remodeling and cell cycle checkpoint control.

4

1.3.1. REPAIR OF DOUBLE STRAND BREAKS BY HOMOLOGOUS RECOMBINATION A key molecular role of BRCA1 is in the repair of DNA double strand breaks (DSBs) by homologous recombination (HR) [27]. Such lesions occur in response to ionizing irradiation, treatment with alkylating or crosslinking agents, oxidative damage related to cell metabolism and problems during DNA replication [28]. Single strand breaks (SSBs), for example, are converted to DSBs during replication when the runs through the affected region [29]. Stalled replication forks can also cause DSBs if they collapse or are degraded by nucleases [30]. Single ended DSBs, such as the ones generated by fork cleavage or collapse, are preferentially processed by HR as the competing non-homologous end joining (NHEJ) pathway can lead to lethal translocations [31]. Homologous recombination is a mostly error-free DSB repair pathway, which relies on homology to a sister chromatid. Thus, while NHEJ is active throughout the cell cycle,

HR is only available in S and G2. An important function of BRCA1 in HR is to inhibit 53BP1, a main factor in NHEJ, and thus direct the repair pathway choice to HR [32]. The frequency of HR is greatly reduced in BRCA1 deficient cells, which leads to genomic instability with developmental consequences [33] and increased cancer susceptibility [34]. Inactivation of 53bp1 in Brca1 mutant cells rescues their HR deficiency by eliminating the competition for DSB repair with 53bp1 [35]. The coiled coil domain of BRCA1 contains a PALB2 binding site which is required for the interaction with BRCA2 [36] and the formation of RAD51 filaments in the late steps of HR.

1.3.2. PROTECTION OF STALLED REPLICATION FORKS Replication fork stalling occurs at sites of DNA lesions, in regions with difficult to replicate repetitive sequences [37], due to collisions with transcription complexes [38], or as a result of replication stress in the presence of activated oncogenes [39]. One pathway to resolve stalled replication is by fork regression, catalyzed by the SNF-2 family translocase SMARCAL1 [40].

5

This process involves annealing of the two nascent strands to each other, which generates an intermediate structure, termed “chicken foot” [41]. Stalled forks, resolved via “chicken foot”, require protection by BRCA1, BRCA2 and RAD51 from MRE11-dependent nucleolytic degradation [42, 43] (Fig. 3). Depleting SMARCAL1 rescues the stalled fork protection (SF) deficiency of BRCA1 mutant cells and reduces the levels of [42].

Figure 3. Role of BRCA1 in stalled replication fork protection (SFP). The SMARCAL1 translocase is recruited to stalled replication forks in an RPA-dependent manner

[40]. Fork reversal by SMARCAL1 generates a “chicken foot” intermediate. This structure is protected by BRCA1 and other HR from MRE11-dependent degradation. Created with

BioRender.

6

1.3.3. TRANSCRIPTION A role of BRCA1 in transcription has been suggested by its association with core components of the transcription machinery, such as RNA Pol II and RNA helicase A [44, 45]. In addition, BRCA1 has been reported to interact directly with the transcriptional co-activator P300

[20]. These findings are thought to reflect the role of BRCA1 in transcription-coupled repair, rather than promoter activation since BRCA1 binds to RNA Pol II post transcription initiation [46].

BRCA1 also modulates the activities of several transcription factors, which control proliferation, cell cycle arrest and apoptosis. For example, the N-terminal of BRCA1 binds to ER-α to quench

ER-α-dependent transcription, which might have implications in tumor suppression [47].

Similarly, BRCA1 inhibits the transcriptional activities of another oncogene-cMYC [48]. Besides curbing the transformative potential of oncogenes, BRCA1 has been shown to interact with tumor suppressors, such as TP53 (Tumor Protein 53). Binding of BRCA1 to TP53 stimulates the induction of TP53 target , including P21/CDKN1A, which induces cell cycle arrest [49].

1.3.4. CHROMATIN REMODELING The BRCT domains of BRCA1 associate with components of the histone deacetylase complex-HDAC1 and HDAC2 [50]. BRCA1 can also directly interact with BRG1, which is a component of the SWI/SNF complex [51]. Furthermore, BRCA1 has been reported in a complex with BACH1/BRIP1, a DEAH family helicase [52], which plays a role in the DNA damage response [18]. The activities of BRCA1 in chromatin remodeling are likely related to DNA repair, recombination and transcription which require changes in chromatin state [12].

1.3.5. CELL CYCLE CHECKPOINT CONTROL The Serine Cluster domain of BRCA1 between aa1280 and aa1524 contains several serine residues, phosphorylated by ataxia telangiectasia mutated (ATM) and RAD3-related (ATR) kinases in response to DNA damage and implicated in cell cycle checkpoint control. These include

7

S1387 [13], required for the S-phase checkpoint, as well as, S1423 and S1524 for the G2/M checkpoint [14, 53]. Furthermore, BRCA1 regulates the expression of CHK1, which has a role in the induction of G2/M arrest [54]. In response to DNA damage, BRCA1 is phosphorylated at S988 by CHK2, which is also involved in the G2 to M transition. Mouse models lacking the Chk2 phosphorylation site have reduced ability to activate the G2/M checkpoint and are tumor prone

[55].

1.4. THE BRCA1-INTERACTING PARTNER BARD1 BRCA1 in vivo exists in a heterodimeric complex with BARD1 (Brca1 Associated RING

Domain 1)- a 777 amino acid protein with an N-terminal RING motif, three ankyrin repeats and two sets of C-terminal BRCT repeats [56] (Fig.4). BARD1 interacts with BRCA1 through helices in the vicinity of its RING motif [56] and this association is important for BRCA1 stabilization

[57] and recruitment to sites of DNA damage [58]. Loss of either BARD1 [59] or BRCA1 [9] is embryonic lethal, which shows that the BARD1/BRCA1 heterodimer mediates the role of BRCA1 in development. Mammary-specific ablation of BARD1 leads to the development of breast carcinomas in mice, similar to BRCA1 inactivation [60]. Germline mutations in BARD1 have been described in hereditary breast and ovarian cancer patients [61], which underscores the importance of the BARD1/BRCA1 interaction for tumor suppression. Two residues in the BRCT domain of

BARD1, S575 (S563 in mouse Bard1) and K619 (K607 in mouse Bard1), have been shown to play a role in PARP-mediated recruitment of BRCA1 to stalled replication forks [62]. Disrupting S563 or K607 renders cells unable to protect stalled replication forks from degradation without interfering with homologous recombination [62].

8

Figure 4. Schematic of human BARD1 protein. The protein product of BARD1 contains an N-terminal RING-domain, three tandem ankyrin repeats and two copies of a C-terminal BRCT motif. The residues, which play a role in stalled replication fork (SF) protection, S575 and K619, are displayed in red.

9

1.5. MARKERS OF REPLICATION STRESS AND DNA DAMAGE The presence of RPA (Replication Protein A) nuclear foci in cycling cells is a physiological phenomenon, related to the progression of DNA replication. The number of RPA foci increases in in the context of replication stress, induced by treatment with hydroxyurea or the cross-linking agent mitomycin C (MMC) [63] due to accumulation of excessive amounts of ssDNA from stalled replication forks. The p32 subunit of RPA is phosphorylated by ATR at S33 in response to replication stress in late S and G2 phases of the cell cycle [64]. The phosphorylation of RPA plays a role in the recruitment of PALB2 to stalled replication forks and subsequent fork recovery [65].

The phosphorylated form of histone variant H2AX at S139, termed γ-H2AX is a widely accepted marker of DNA double strand breaks (DSBs). Phosphorylation in response to DNA damage is carried out by ATM [66], while phosphorylation at sites of stalled replication forks is mediated by ATR [67]. Phosphorylated H2AX can accumulate over a region as large as 2MB surrounding the lesion [68] and plays a role in the recruitment of DNA repair proteins, such as

NBS1 and BRCA1 [69].

1.6. REPAIR OF DNA DOUBLE STRAND BREAKS Homologous recombination (HR) and non-homologous end joining (NHEJ) are the primary pathways for the repair of DNA double strands breaks (DSB) in cells (Fig.5). Homologous recombination is active in S and G2, while NHEJ is available throughout the cell cycle [70]. Repair by HR is mostly error free and depends on the presence of a sister chromatid. It is a slow process, estimated to require as long as 7 hours or more in human cells [71]. The tumor suppressor BRCA1 has been shown to promote HR in mouse embryonic stem cells [27]. BRCA1 is phosphorylated by CHK2 [72] and ATM/ATR [13, 14] kinases, which become active in response to DNA damage.

Phosphorylated BRCA1 inhibits the activities of 53BP1 in S and G2 phases of the cell cycle and drives repair pathway choice to HR [73]. The first step of repair by HR is end resection, initiated

10

by the MRN complex (MRE11/RAD50/NBS1) and stimulated by CTIP [74]. More extensive resection is accomplished by EXO1 and BLM/DNA2 [75], generating single stranded DNA, coated by RPA. In the next steps of HR, BRCA2 facilitates the displacement of RPA by RAD51 and the assembly of RAD51 filaments on ssDNA [76, 77], which is a key requirement for homology search. In its late stages, HR involves strand invasion, DNA synthesis, ligation and holiday junction resolution [78].

Non-homologous end joining (NHEJ) is a fast (~30min) [71] error prone repair pathway active in all cell cycle phases. A main player in NHEJ is 53BP1, which is phosphorylated by ATM in response to DNA damage [70]. Active 53BP1 recruits RIF1 and PTIP which inhibit end resection [79, 80], thereby directing repair pathway choice to NHEJ. The repair process begins with the binding of the KU70/80 heterodimer to the DNA ends which are held together through interaction with DNA-PKcs [81]. Removal of overhangs, loops and flaps is accomplished by the

DNA-PKcs/Artemis complex [82]. Finally, the ends are joined by DNA ligase IV and XRCC4, which stabilizes DNA ligase IV and stimulates its activity [83]. Mismatched ends require the activity of an additional factor, XLF [84]. Due to its template independence, NHEJ repair is prone to short insertions and deletions at the site of the lesion.

11

12

Figure 5. Main pathways for the repair of DNA double strand breaks (DSBs).

Two-ended DSBs can be repaired by homologous recombination (HR) or non-homologous end joining (NHEJ). While HR is only active in S and G2, NHEJ is available throughout the cell cycle.

BRCA1 stimulates HR by inhibiting the NHEJ repair factor 53BP1. The first step in HR is end resection, followed by RAD51 filament formation. The late stages involve strand invasion, DNA synthesis and holiday junction resolution, which concludes the error-free repair of the lesion.

Alternatively, the ends of the break can be bound by KU70/80 and ligated by LIGIV/XRCC4, generating insertions or deletions. Created with BioRender.

13

1.7. THE NHEJ REPAIR FACTOR 53BP1 53BP (p-53 Binding Protein 1) is a 1972 amino acid DNA repair factor involved in non- homologous end joining (NHEJ). NHEJ joins two broken DNA ends, such as in V(D)J and class switch recombination (CSR), which occur during immune system development, or after ionizing radiation. Mice, deficient in 53BP1 are hypersensitive to IR and prone to lymphomas [85]. The

N-terminal of 53BP1 has 28 S/T-Q sites, which can be phosphorylated by ATM and other kinases

[73] (Fig.6). The S/T-Q sites are followed by a Tudor domain and a pair of BRCT repeats at the

C-terminal of 53BP1 [86]. In DNA double strand break (DSB) repair, 53BP1 acts as scaffold to recruit additional NHEJ factors. These include RIF1 (RAP-Interacting Factor 1) and PTIP (Pax

Transactivation activation domain Interacting Protein), which associate with phosphorylated

53BP1 [87, 88]. In S and G2 phases of the cell cycle, BRCA1 promotes dephosphorylation of

53BP1, which directs DSB repair pathway choice to HR [32]. The activity of 53BP1 is highly consequential to cell proliferation and survival in the absence of BRCA1. Ablation of 53bp1 in

Brca1-deficient mice rescues their embryonic lethality [33]. Similarly, loss of 53BP1 in BRCA1 mammary tumors confers resistance to PARP inhibitors by restoring HR competence [89]. Another function of 53BP1 in DNA repair is to amplify ATM signaling in response to DNA damage [90].

In the absence of 53BP1, the phosphorylation of ATM targets, is reduced [91]. The relationship between 53BP1 and p53 remains to be fully defined, but 53BP1 has been implicated in the transcriptional activities of p53 [92]. Loss of 53BP1 was reported to interfere with the induction of p21, BAX and other targets in response to DNA damage in cultured human cancer cells [92].

Another study, however, showed normal activation and stabilization of p21 in the absence of

53bp1 in primary cells from transgenic mice. [93].

14

Figure 6. Schematic of human 53BP1 protein.

The N-terminal of 53BP1 contains 28 Ser/Thr-Gln sites, phosphorylated by ATM and other kinases. Phosphorylated sites 1-8 are bound by PTIP, while RIF1 binds to the next 7 sites towards the C-terminal [94]. The oligomerization domain (OD) spans the region from aa1231 to aa1277 and is followed by a Tudor domain (aa1480 to aa1540) and an ubiquitylation-dependent recruitment (UDR) motif. Two BRCT repeats, shown in pink, have been mapped to the C-terminal of the protein. Sites of interaction between 53BP1 and other factors are marked in blue. The minimal 53BP1 focus forming region spans from aa1231 to aa1711.

15

1.8. PROCESSING OF STALLED REPLICATION FORKS Stalled forks occur during normal conditions in replicating cells, but are accumulated in response to replication stress and in the absence of factors required for their stabilization and repair

[42]. Fork stalling can occur at sites of tightly bound proteins, DNA secondary structures and DNA lesions, such as crosslinks or adducts. Common pathways to bypass replication blocks include trans-lesion synthesis, re-priming or rescue from a converging fork [95]. Alternatively, stalled forks can be processed by the DNA translocase SMARCAL1, which catalyzes fork regression and the generation of a “chicken foot” structure (Fig.7). This intermediate is protected by BRCA1,

BRCA2 and FA [26] proteins from MRE11-dependent degradation [42, 43]. Recently, another

DNA repair factor, 53BP1 was reported to bind reversed forks and protect from genomic instability

[96]. The intermediate structures of reversed forks can be directly converted to active forks by reverse branch migration [97] or processed by BLM and WRN , which migrate holiday junctions [98, 99].

Another pathway for resolving stalled forks involves the nuclease MUS81, which is subject to cell cycle regulation [100]. Efficient cleavage by MUS81 requires the association with SLX4, which occurs in G2/M [101]. Thus, fork stalling in late S-phase or G2 is likely to be resolved by

MUS81-mediated cleavage. Another aspect of cell cycle regulation is the activity of Wee1 kinase, which inhibits the endonuclease Mus81 [102]. Wee1 kinase activity decreases in late S-phase and

G2 of the cell cycle [103]. Interestingly, the recruitment of the MUS81-SLX complex to stalled forks is facilitated by BRCA1 [104]. Cleavage by MUS81 results in a one-ended double strand break (DSB), which can be productively repaired only by HR, albeit 53BP1 competes with BRCA1 for the break end. Repair of a one-ended DSB by NHEJ can result in lethal translocations. Thus, shifting the balance towards HR can lead to more efficient repair and promote cell survival.

16

Figure 7. Main pathways for the processing of stalled replication forks.

Stalled forks can be remodeled by SMARCAL1 and further processed in a cleavage-free manner to restart replication. Alternatively, stalled forks can be cleaved my MUS81-SLX4, which generates a one-ended double strand break, requiring repair by BRCA1-mediated HR. This pathway is active in late S-phase and G2.

17

1.9. THE FORK REMODELLER SMARCAL1 SMARCAL1 (SWI/SNF family related, Matrix associated, Actin dependent Regulator of

Chromatin, subfamily A-like1) is a DNA translocase with important functions in stalled replication fork remodeling and telomere maintenance [105]. Biallelic mutations in SMARCAL1 result in

Schimke immuno-osseous dysplasia (SIOD), a multisystem disorder characterized by growth failure, brain malformations, atherosclerosis and deficiency in T-lymphocytes. Mice, lacking

Smarcal1 are hypersensitive to topoisomerase inhibitors etoposide and irinotecan, as well as, the nucleotide reductase inhibitor hydroxyurea (HU) [106].The product of the human SMARCAL1 gene is a 954 amino acid protein with an N-terminal RPA interacting motif, two HARP domains and a C-terminal helicase domain [105] (Fig.8). The interaction with RPA is required for the recruitment of SMARCAL1 to stalled replication forks [40] where SMARCAL1 acts as an annealing helicase to direct the binding of RPA-coated, complementary ssDNA strands to each other [107]. These activities generate DNA intermediates, which resemble holiday junctions and require protection by BRCA1 and other HR proteins from MRE11-dependent degradation [42,

43]. The functions of SMARCAL1 are especially important at telomeres where G-quadruplex and

T-loop structures frequently impede DNA replication. Accordingly, loss of SMARCAL1 has been shown to increase DNA damage at telomeres [108].

18

Figure 8. Schematic of human SMARCAL1 protein. The RBD (RPA-binding domain) of SMARCAL1 is located at the N-terminal of the protein and spans the region from aa2 to aa30. The RBD is followed by a pair of HARP domains from aa260 to aa303 and from aa327 to ss398. The helicase domain is located at the C-terminal of the protein and consists of two RecA-like domains: DEXDc (aa445-aa600) and HELICc (aa716-aa869).

19

1.10. REPROGRAMMING TO PLURIPOTENCY Somatic cells can be reprogrammed to pluripotency by the ectopic expression of OCT4,

SOX2, KLF4 and cMYC [22]. The reprogrammed cell population, termed “induced pluripotent stem cells” (iPS) exhibits the hallmark characteristics of embryonic stem cells, such as ability to proliferate indefinitely and capacity to differentiate into any cell type. Since the landmark

Yamanaka study [22], various reprogramming factor combinations have been developed, reviewed by Gonzalez et al. [109], including ones based entirely on small molecules [110]. The reprogramming process involves the generation of defined intermediate cell populations with characteristic transcriptional features [111]. To acquire pluripotency, cells first silence lineage- specific genes and then proceed to activate embryonic factor promoters [112].

The oncogene cMYC is not an essential component of the reprograming cocktail [113], but it enhances the process substantially. Overexpression of cMYC causes replication stress and results in DNA damage [114]. However, increased DNA damage has been reported during the generation of iPS cells in the absence of cMyc (3 factors only Oct4, Sox2, Klf4) [24] and during reprogramming by nuclear transfer where no overexpression of transcription factors occurs [115].

Thus, DNA damage is a general reprogramming phenomenon, likely related to replication stress

[116] and replication program remodeling [117], albeit the exact cause has not been formally established. The HR pathway for DSB repair plays an important role in iPS cell generation as

Brca1 and Brca2 mutants have been shown to reprogram poorly [24]. Mutations or knockdown of other proteins involved in DNA damage repair, including Parp1 [118], Rad51 [24], FancC and

FancA [119] as well as Atm [120] result in impaired reprogramming efficiency (Fig. 9).

Conversely, ablation of the tumor suppressor p53 [121, 122] improves reprogramming by

20

compromising DNA-damage induced apoptosis signaling. Enhanced reprogramming has also been reported upon downregulation of Rb [21], which controls entry into the cell cycle.

Figure 9. Factors which affect reprogramming efficiency. The ectopic expression of reprogramming factors during iPS cell generation leads to accumulation of DNA damage. Mutations in DNA damage signaling and repair pathways have a negative impact on reprogramming. Ablation of several tumor suppressors improves the efficiency of iPS cell generation. BRCA1 stands out as a tumor suppressor whose ablation impairs reprogramming.

LOF = loss of function. Created with BioRender.

21

CHAPTER 2: BRCA1 AND 53BP1 MEDIATE REPROGRAMMING THROUGH DNA REPAIR PATHWAY CHOICE

2.1. ABSTRACT Reprogramming to pluripotency is associated with increased levels of DNA damage which trigger the DNA damage response (DDR) pathway and require the functions of the homology- directed repair (HDR) protein BRCA1. Here, we leverage the genetic interactions between

BRCA1, its binding partner BARD1, the DNA translocase SMARCAL1 and the non-homologous end joining (NHEJ) factor 53BP1 to ascertain the relevance of stalled fork protection (SFP) and double strand break HDR in somatic cell reprogramming. Surprisingly, changes in stalled fork stability, which altered the levels of DNA damage marked by H2AX, were inconsequential for the transition to iPS cells, while deficiency in HDR alone impaired reprogramming. Ablation of

53bp1 restored reprogramming in Brca1 mutants and also increased the efficiency of iPS cell generation in wild type cells. These results demonstrate that the DNA damage limiting to somatic cell reprogramming is a double strand break, which requires repair by homologous recombination.

2.2. INTRODUCTION Somatic cells can be reprogrammed to pluripotency by the ectopic expression of four transcription factors OCT4, SOX2, KLF4 and cMYC, which are master regulators of the embryonic state [22]. The reprogrammed cell population, termed iPS cells, is endowed with the capacity to proliferate indefinitely, but also to differentiate into any specialized cell type in the adult organism. These characteristics make iPS cells uniquely suitable for disease modelling and drug discovery in vitro, as well as for the development of patient-specific cell replacement therapies. The overexpression of the OSKM reprogramming factors has been shown to result in increased levels of DNA damage [24], which was also reported in the absence of cMYC (OSK) and during reprogramming by nuclear transfer, where no overexpression of transcription factors

22

occurs [115]. These findings show that DNA damage is a general reprogramming phenomenon, likely related to replication stress [116] and replication program remodeling [117], albeit the exact cause has not been formally established. The HR pathway for double strand break (DSB) repair plays an important role in iPS cell generation as Brca1 and Brca2 mutants have been shown to reprogram poorly [24]. Mutations or knockdown of other proteins, involved in DNA damage repair, including Parp1 [118], Brca2 and Rad51 [24], FancC and FancA [119], FancD2 [123], as well as, Atm [120] have reduced reprogramming efficiency, suggesting that genome integrity plays a central role in somatic cell reprogramming. Conversely, ablation of tumor suppressors p53 [122,

124], p21[121] and Rb [21] results in more efficient iPS cell generation. These observations indicate that there are shared molecular processes between reprogramming and oncogenic transformation, related to genome instability and overcoming the limitations in somatic cell proliferation. However, the source of reprogramming-induced DNA damage and the mechanisms of repair remain poorly understood as most factors affecting reprogramming efficiency, including

BRCA1, exert multiple functions.

BRCA1 is involved in the homology-directed repair (HDR) of DNA double strand breaks

[125] and binds the strand invasion protein RAD51 [126]. Recent studies have reported a previously unrecognized function of BRCA1 in stalled replication fork protection (SFP).

Heterozygous BRCA1 mutant cells show more extensive resection at stalled replication forks

[127]. Stalled fork stability is conferred by the BRCA1/BARD1 heterodimer [62]. Point mutations in BARD1 have been reported to result in SF protection deficiency, manifested by reduced survival upon treatment with fork stalling agents MMC, hydroxyurea and the PARP inhibitor olaparib, though mutant mice are not cancer prone [62].

23

Here, we leverage the functional interactions of BRCA1 with BARD1, SMARCAL1 and

53BP1 in a panel of mouse genotypes to determine the nature of the DNA damage generated during reprogramming and the mechanisms of repair. Specifically, we generated cells with combined

HDR, SFP deficiency or single disruption of either double strand break HDR or SFP. We examined the relevance of SFP to reprogramming with two complementary strategies: disruption by point mutations in Bard1 and inactivation of one Brca1 allele. An additional model for the study of HDR and SFP functions of BRCA1 was developed by mutating the SNF-2 family translocase

SMARCAL1, which remodels stalled replication forks to generate intermediate “chicken-foot” structures, degraded by MRE11 in the absence of BRCA1 [42]. Inactivation of SMARCAL1 prevents fork degradation by interfering with the formation of the “chicken-foot” intermediate and selectively rescues SFP, without affecting HDR [42]. Furthermore, we restored HDR in Brca1 mutant cells through inactivation of 53bp1, which competes for DNA ends with Brca1 and promotes repair by non-homologous end joining (NHEJ) [32, 80]. Our studies show that deficiency in stalled fork protection (SFP) is inconsequential to reprogramming. In contrast, we find decreased reprogramming efficiency in both HDR-SFP- and HDR- genotypes. Reprogramming is rescued upon restoration of HDR through mutation of 53bp1 in Brca1 mutant cells. Therefore,

BRCA1 is required for the homology-directed repair (HDR) of replication stress-induced DNA damage during reprogramming. Surprisingly, stimulating HDR by disruption of 53bp1 also improved the reprogramming efficiency of wild type cells. This effect is not due to the role of

53bp1 in p53-regulated transcription, but due to DNA repair pathway choice. Competition with

NHEJ inhibits HDR, resulting in a DNA damage signal which suppresses somatic cell reprogramming.

24

2.3. RESULTS 2.3.1. REPROGRAMMING DEPENDS ON THE BRCT PHOSPHO-RECOGNITION PROPERTY OF BRCA1.

We previously reported that mouse fibroblasts, homozygous for a Brca1 truncation allele

(Brca1tr/tr), generated by a 50-bp insertion into exon 11 [34], reprogram with reduced efficiency

[24]. This Brca1 truncation produces a 924 amino acid hypomorphic protein, which lacks a portion of the DNA binding domain, the entire SQ cluster region and both copies of the C-terminal BRCT repeats. To test if BRCT phospho-recognition is required for somatic cell reprogramming, we generated fibroblasts triple homozygous for selected serine to alanine substitutions on Brca1 partners Abraxas (S404), Bach1 (S994) and CtIP (S327), which interact with Brca1S1598 in the

BRCT repeats. The phosphorylated isoforms of these three proteins associate with Brca1S1598 in a mutually exclusive manner and this interaction is abolished in the triple homozygous point mutant AbraxasS404AS404A, Bach1S994A/S994A, CtIPS327A/S327A genotype, referred to as

AABBCC. To determine the reprogramming efficiency of the AABBCC genotype, we first collected control and mutant embryos for fibroblast derivation. We noticed that AABBCC embryos were smaller than their wild type littermates at E13.5 (Fig.10A). We infected primary fibroblasts from control, double homozygous (AABB, BBCC and AACC) and triple homozygous point mutant

AABBCC embryos with doxycycline-inducible lentiviruses, which delivered the reprogramming factors Oct4, Sox2, Klf4 and cMyc. Reprogramming efficiency was assessed on day 20 by counting the number of colonies positive for alkaline phosphatase (AP), an early marker of pluripotency. This analysis revealed a 7-fold drop in reprogramming efficiency in the AABBCC genotype compared to control, which phenocopied the 6 to 8-fold reduction in reprogramming, recorded for Brca1tr/tr fibroblasts (Fig.1B, C). The double homozygous mutant AACC also

25

showed reduction in reprogramming efficiency (Fig.10D). This result demonstrates that BRCT phospho-recognition is required for reprogramming to pluripotency.

Figure 10. BRCT phospho-recognition is required for reprogramming. A). Size and morphology of E13.5 embryos from the AbraxasS404A/S404A, Bach1S994A/S994A,

CtIPS327A/S327A genotype collection. Each genotype included 3 to 9 embryos and statistical significance was determined with an unpaired, two-tailed t-test.

26

Figure 10 (continued). B). and C). Alkaline phosphatase (AP) staining and quantification on reprogramming day 20. Statistical analysis was performed with an unpaired, 2-tailed t test; n = 5 per genotype, except for AABBCC (AbraxasS404A/S404A, Bach1S994A/S994A, CtIP327A/327A), where n = 2, **p < 0.01 and ***p < 0.001.

27

Figure 10 (continued). D). AP staining and quantification for the ABC genotype collection. Data was collected from 3 to 7 biological and experimental replicates, except for AABBCC where only

2 data points were available. Statistical analysis was performed with one-way ANOVA and

Sidak’s multiple comparisons test.; *p < 0.05 and **p < 0.01.

28

The deficiency in BRCT phospho-recognition in the AABBCC point mutant disrupts both the function of Brca1 in the homology directed repair (HDR) and its role in stalled replication fork protection (SFP). To distinguish between these two roles and to determine the specific type of

DNA damage limiting to somatic cell reprogramming, we generated a panel of primary mouse fibroblasts with single and combination mutations in stalled replication fork protection (SFP) and homology-directed repair (HDR) of DNA double strand breaks (Table 1).

Genotype Phenotype Cancer Prone Reprogramming Phenotype Bard1K607A/+ HR+ SF- No No Bard1K607A/K607A HR+ SF- No No Bard1S563F/+ HR+ SF- No No Bard1S563F/S563F HR+ SF- No No Brca1tr/+ HR+ SF- No No Smarcal1+/- HR+ SF+ No Smarcal1-/- HR+ SF+ No Brca1tr/+, Smarcal1+/- HR+ SF- * No Brca1tr/+, Smarcal1-/- HR+ SF+ No Brca1tr/tr HR- SF- Yes reduced Brca1tr/tr, Smarcal1+/- HR- SF- * reduced Brca1tr/tr, Smarcal1-/- HR- SF+ reduced 53bp1+/- HR+ SF+ improved 53bp1-/- HR+ SF+ Yes improved Brca1tr/+, 53bp1+/- HR+ SF+ * improved Brca1tr/+, 53bp1-/- HR+ SF+ * improved Brca1tr/tr, 53bp1+/- HR+ SF+ * rescue Brca1tr/tr, 53bp1-/- HR+ SF+ * No rescue AABB TBD No No BBCC TBD No No AACC TBD No reduced AABBCC HR- SF- No reduced Table 1. Reprogramming genotype collection. Reprogramming phenotype and tumor susceptibility are indicated for each genotype-phenotype pair. * To be confirmed. TBD = to be determined.

29

2.3.2. DISRUPTION OR RESCUE OF STALLED FORK PROTECTION (SFP) ALTERS THE LEVELS OF DNA DAMAGE WITHOUT AFFECTING REPROGRAMMING EFFICIENCY.

To determine if stalled fork protection (SFP) is required for reprogramming, we first examined mutations which compromise the function of Brca1 in SFP without affecting homology- directed repair (HDR). The SFP- genotype set included a heterozygous Brca1 truncation

(Brca1tr/+), [34], as well as, heterozygous and homozygous Bard1S563F point mutants. Cells mutant for Bard1 (BRCA1 associated RING domain 1) are HDR-proficient, but sensitive to fork stalling agents such as hydroxyurea [62]. To determine the consequence of SFP deficiency on

DNA damage during reprogramming, we quantified foci for γH2AX, phospho RPA(S33), and

53BP1 on day 5 of reprogramming. We found a 1.5-fold increase in the mean number of γH2AX foci per cell in both SFP-deficient genotypes, Brca1tr/+ and Bard1S563F/S563F (Fig.11A). Foci, marked by phospho RPA(S33), phosphorylated by ATR in late S and G2 [64, 65] showed accumulation of ssDNA in the context of SFP deficiency. Notably, Brca1tr/+ mutants had a 3 to

4-fold increase in phospho RPA(S33) foci compared to the SFP proficient controls (Fig.11B).

Albeit, Brca1tr/+ fibroblasts accrued higher numbers of γH2AX foci, the mean number of 53BP1 foci per cell did not increase significantly (Fig.11C). Therefore, replication stress is increased in

SFP deficient cells during reprogramming, though it does not necessarily result in a double strand break bound by 53BP1. To determine whether the increase in γH2AX and phospho RPA(S33) foci was consequential for cell proliferation, we treated cells with the cytoplasmic dye CFSE, which is diluted through cell division, but retained in arrested cells. We noted no differences in proliferation on day 5 of reprogramming in the SFP mutants (Fig.11D and Fig.12A). Furthermore, deficiency in stalled fork protection had no consequences on embryo growth, as shown by the normal size and morphology of SFP- E13.5 embryos (Fig.12B). Despite an increase in DNA damage, SFP- genotypes (Brca1tr/+, Bard1S563F/+ and Bard1S563F/S563F) and the SFP+ control

30

reprogrammed equally well as shown by the number of alkaline phosphatase (AP) positive colonies on reprogramming day 20 (Fig.11E). This result was further confirmed by the reprogramming efficiency of two additional Bard1 SF- mutants, Bard1K607A/+ and

Bard1K607A/K607A, which was not different from controls (Fig.12C). These results indicate that the function of Brca1 in stalled fork protection (SFP) is dispensable for reprogramming. Therefore, elevated DNA damage, measured by an increase in γH2AX foci in the SFP-genotypes, does not directly correlate with impaired reprogramming efficiency.

Figure 11. Stalled fork protection deficiency increases DNA damage but does not impair reprogramming. A). Immunofluorescence and quantification for double strand break (DSB) marker γH2AX. Foci were counted on reprogramming day 5 in fibroblasts derived from 2-3 different embryos per genotype (>200 cells). Statistical analysis was performed with one-way

ANOVA and Sidak’s multiple comparisons test. B). Immunofluorescence and quantification of phospho RPA(S33)/γH2AX double positive foci. Statistical significance was evaluated with an unpaired, 2-tailed student’s t test. **p < 0.01, ***p < 0.001 and ****p < 0.0001.

31

Figure 11. (continued) C). Immunostaining and quantification of 53BP1 foci. The analysis included 2-3 biological replicates per genotype (>200 cells) and statistical analysis was performed with an unpaired two-tailed t-test. D). Number of CFSE positive cells relative to wild type control.

Data was collected from 2-3 biological replicates per genotype and analyzed with one-way

ANOVA and Sidak’s multiple comparisons test.

32

Figure 11. (continued) E). Alkaline phosphatase (AP) staining and reprogramming efficiency quantification. The number of AP positive colonies was first normalized to the percentage of infected cells and shown as a ratio to wild type. Data from 3-9 biological and experimental replicates per genotype was analyzed by one-way ANOVA.; All comparisons in the ANOVA analyses were made with Sidak’s multiple comparisons test.

33

Figure 12. Compromising SF has no effect on development or reprogramming.

A). Flow cytometry plots of CFSE cytoplasmic dye retention. Serum starved cells exhibit a bright peak, centered around 104, while cycling cells rapidly dilute the dye and the peak moves to the left.

34

Figure 12 (continued) B). Morphology and size of E13.5 embryos. In each genotype, the area of

11-13 embryos were measured, except for Bard1563F/S563F for which only one embryo was available. Statistical analysis was performed with one-way ANOVA and Sidak’s multiple comparisons test.

35

Figure 12. (continued). C). Alkaline phosphatase (AP) staining and quantification of reprogramming efficiency for 3 to 7 biological and experimental replicates per genotype. The results are shown as a ratio to wild type after normalization to percent infected cells. Statistical significance was measured with one-way ANOVA and Sidak’s multiple comparisons test.

36

The homozygous Brca1 truncated mutant Brca1tr/tr is deficient in both HR and SF.

Separation of function for the roles of BRCA1 in homology-directed repair (HDR) and stalled fork protection (SFP) can be achieved through loss of function of the SNF2-family translocase

SMARCAL1 [42]. Depleting SMARCAL1 rescues the SFP proficiency of BRCA1 mutant cells by preventing the formation of a chicken-foot structure, which is a substrate for MRE-11 dependent degradation. BRCA1 mutant cells depleted for SMARCAL1 remain HDR deficient

[42]. The HDR-SFP- Brca1tr/tr cells accumulated high levels of DNA damage on day 5 of reprogramming, marked by a 2-fold increase in the mean number of γH2AX foci per cell (Fig.

13A). This was higher than the 1.5-fold increase, recorded for the SFP- deficient genotypes on

Fig.11A. The number of γH2AX foci was significantly reduced upon rescue of SFP protection in

Brca1/Smarcal1 double mutant cells, though it remained higher than in controls (Fig.13A).

Similarly, deficiency for both HDR and SFP in Brca1 mutant cells resulted in a 5 to 6-fold increase in the number of phospho RPA(S33) foci (Fig.13B). This was higher than the 3 to 4-fold increase recorded for the SFP- genotype Brca1tr/+ on Fig.11B. The number of phospho RPA(S33) foci was greatly reduced in the absence of Smarcal1, but remained 2-fold higher than in the controls

(Fig.13B). To evaluate the SFP deficiency of Brca1tr/tr cells during reprogramming, we performed DNA fiber analysis through incubation with CldU, followed by IdU and subsequent treatment with hydroxyurea (HU) to induce fork stalling. A decrease in the mean length of the IdU tract relative to CldU shows excessive fork degradation. Stalled forks were frequently degraded in the HDR-SFP- genotype while loss of Smarcal1 rescued the phenotype (Fig.13C). Concordant results were obtained using the G-quadruplex stabilizing compound pyridostatin (PDS) which stalls replication forks in G-rich regions of the genome (Fig.14A). The proliferation capacity of the different genotypes during reprogramming was tested by incubation with the cytoplasmic dye

37

CFSE, which is rapidly diluted by cycling cells. This analysis revealed a proliferation defect in the

HDR-SFP- genotype, which persisted despite the rescue of SFP in Brca1tr/tr, Smarcal1-/- cells

(Fig.13D and Fig.14B). Brca1tr/tr embryos were delayed in development, and were not rescued by restoration of SFP through Smarcal1 deletion (Fig.14C). Ablation of Smarcal1 alone had no effect on proliferation capacity or development (Fig.13D and Fig.14C). Apoptosis signaling in the different genotypes was evaluated by flow cytometry for Annexin V and propidium iodide (PI) on reprogramming day 5. Combined HDR-SFP- deficiency led to accumulation of both early and late apoptotic cells in the Brca1tr/tr genotype, which persisted in Brca1tr/tr, Smarcal1-/- mutants, despite protection of SF (Fig.13E and Fig.14D).The reprogramming efficiency of HDR-SFP-

Brca1tr/tr cells was severely impaired (6 to 8-fold lower) compared to the wild type control

(Fig.13F and Fig.14E). Importantly, restoring SF in Brca1-deficient cells through the genetic inactivation of Smarcal1 did not rescue their poor reprogramming phenotype (Fig.13F and

Fig.14E). Ablation of one or both Smarcal1 alleles alone or in combinations with the Brca1tr/+ genotype had no effect on iPS cell generation (Fig.13F and Fig.14E). These findings show that

Brca1 is required for reprogramming for the homology-directed repair (HDR) of DSBs, not through its role in stalled replication fork protection (SFP). Though γH2AX-marked DNA damage was partially reduced in Brca1/Smarcal1 double mutant cells, reprogramming efficiency was not increased. Therefore, not all DNA damage marked by γH2AX is equally consequential to reprogramming, and only the specific type of damage requiring double strand break HDR is limiting.

38

Figure 13. The function of Brca1 in HDR is required for reprogramming.

A). Immunofluorescence and quantification for γH2AX foci on reprogramming day 5. Foci were counted in 3 biological replicates per genotype with a total number of at least 430 cells per genotype. Statistical analysis was performed with an unpaired, two-tailed Student’s t test. *p <

0.05 and ****p < 0.0001.

39

Figure 13. (continued) B). Staining and quantification for phospho RPA(S33)/γH2AX double positive foci. Data was collected from 3 biological replicates per genotype (>145 cells per genotype) and the statistical significance was determined with an unpaired, two-tailed Student’s t test. **p < 0.01 and ****p < 0.0001.

40

Figure 13. (continued) C). DNA fiber analysis in a fork stalling assay with hydroxyurea (HU).

At least 170 DNA fibers were measured per genotype from 2-3 biological replicates and analyzed by one-way ANOVA and Sidak’s multiple comparisons test. ****p < 0.0001.

41

Figure 13. (continued) D). Cell proliferation analysis with CFSE dye. Data was pooled from 2-4 biological and experimental replicates per genotype. Statistical significance was determined with one-way ANOVA and Sidak’s multiple comparisons test. **p < 0.01 and ***p < 0.001.

42

Figure 13. (continued) E). Apoptosis analysis with Annexin V and propidium iodide (PI). Data was collected from 3 biological replicates per genotype and analyzed by one-way ANOVA and

Sidak’s multiple comparisons test. **p < 0.01 and ****p < 0.0001.

43

Figure 13. (continued) F). Alkaline phosphatase staining and quantification of reprogramming efficiency. Data was collected from 3-5 biological and experimental replicates per genotype, except for the Smarcal1-/- and the AbraxasS404A/S404A, Bach1S994A/S994A, CtIP327A/327A genotypes which had 2 data points each. Reprogramming efficiency was normalized to percent infected cells and is shown as a ratio to wild type control. Statistical analysis was performed with one-way ANOVA. All comparisons in the ANOVA analyses were made with Sidak’s multiple comparisons test. ***p < 0.001 and ****p < 0.0001.

44

Figure 14. Loss of HR has negative consequences for development and reprogramming.

A). DNA fiber analysis in a fork stalling assay with pyridostatin (PDS). Data was collected from

2-3 biological replicates (>180 fibers per genotype in total) and analyzed with one-way ANOVA and Sidak’s multiple comparisons test.

45

Figure 14. (continued) B). Flow cytometry plots for CFSE cytoplasmic dye. Serum starved and

HR deficient cells exhibit high signal retention, indicating reduced proliferation.

46

Figure 14. (continued). C). Size and morphology of E13.5 embryos from the Brca1/Smarcal1 genotype collection. For quantification by embryo area size, each genotype included 3-10 embryos except Brca1tr/+, Smarcal1-/- and Brca1tr/tr, Smarcal1-/- for which 2 embryos per genotype were available. Statistical analysis was performed with one-way ANOVA and Sidak’s multiple comparisons test.

47

Figure 14. (continued) D). Flow cytometry for Annexin V-FITC and propidium iodide (PI). Early apoptotic cells stain for Annexin V only, while late apoptosis is marked by double staining for

Annexin V and PI due to an increase in membrane permeability typical for dying cells.

48

Figure 14. (continued) E). Alkaline phosphatase (AP) staining panel for the Brca1/Smarcal1 genotype collection.

49

2.3.3. REPROGRAMMING EFFICIENCY IS DETERMINED BY THE BALANCE OF DNA DSB REPAIR PATHWAY CHOICE

Ablation of 53bp1 in Brca1 mutant cells has been reported to re-establish their HDR proficiency [35]. In wild type cells, the ablation of 53BP1 is used to increase gene targeting by homologous recombination [128] as 53BP1 promotes NHEJ and inhibits BRCA1-mediated HDR in a competitive manner [32]. To assess whether rescuing the HDR function of Brca1tr/tr cells restores reprogramming to wild type levels, we harvested fibroblasts from embryos deficient in both Brca1 and the NHEJ repair factor 53bp1 (Table 1). Evaluation of DNA damage on reprogramming day 5 revealed that removing a single copy of 53bp1 is sufficient to reduce significantly the mean number of γH2AX foci per cell in the Brca1tr/tr genotype (Fig. 15A).

Further reduction was noted when both alleles of 53bp1 were eliminated (Fig. 15A). There was no detectable difference between the number of γH2AX foci per cell in the control and the Brca1tr/tr genotype with either heterozygous or homozygous inactivation of 53bp1 (Fig. 15A). Loss of one or both copies of 53bp1 also reduced the accumulation of phospho RPA(S33) foci in Brca1 deficient cells (Fig. 15B). The mean number of phospho RPA(S33) foci in Brca1tr/tr mutants on a 53bp1 null background was not significantly different from the wild type control, though it remained higher in heterozygous 53bp1 mutant, Brca1 deficient cells (Fig. 15B). To determine the effect of 53bp1 disruption on fork stability, we performed a fork-stalling assay with hydroxyurea.

Fiber analysis revealed that ablating one or both copies of 53bp1 rescues the SFP function of the

Brca1tr/tr genotype as shown by the restored length of the IdU track, relative to CldU (Fig.15C).

Concordant results were recorded for a complementary fork-stalling assay with pyridostatin

(Fig.16A). Restoration of HDR competence of the different genotypes was assessed by

CRISPR/Cas9-mediated targeting of a zsGreen repair template to the Hsp90 in mouse induced pluripotent stem (iPS) cells (Table 2) as described in Mateos-Gomez et al. [129]. This

50

strategy evaluates HDR proficiency based on the presence of a genomically integrated zsGreen template, which generates a fluorescent protein, detected by flow cytometry. The HDR deficiency of Brca1tr/tr iPS cell lines was confirmed by a 2-fold reduction in the number zsGreen positive cells, which was restored to wild type levels on a 53bp1-/- background (Fig.15D and Fig.16B).

Inactivation of 53bp1 also improved the proliferation capacity of the HDR-SFP- genotype as shown by a reduction in the number of cells that retain the CFSE cytoplasmic dye (Fig.15E and

Fig.16C). The positive effect on proliferation had developmental consequences, revealed by the complete rescue of the reduced embryo size in the Brca1tr/tr mutants upon loss of 53bp1

(Fig.16D). Heterozygous or homozygous ablation of 53bp1 in wt cells did not affect proliferation

(Fig.15E) or development (Fig.16D). Loss of 53bp1 also had pronounced effects on the apoptotic rates of Brca1tr/tr mutants (Fig.15F). The number of early apoptotic cells in HDR-SFP- Brca1tr/tr genotype was reduced 2-fold on a 53bp1 heterozygous background and 4-fold when 53bp1 was fully removed (Fig.15F and Fig.16E). A similar reduction was noted for late apoptotic cells, positive for both Annexin V and PI (Fig.15 and Fig.16E). Ablation of 53bp1 alone did not cause a detectable increase in apoptosis (Fig.15F). Importantly, removal of one or both copies of 53bp1 rescued the poor reprogramming efficiency of Brca1tr/tr cells (Fig.15G and Fig.16F). Disruption of a single copy of 53bp1 resulted in a 5-fold improvement in reprogramming efficiency (Fig.15G), while removing both copies resulted in 11-fold increase and a complete rescue in the number of alkaline phosphatase positive colonies (Fig.15G). We noted that 53bp1 loss also consistently resulted in a significant increase in reprogramming of HDR+SFP+ wild type cells as well of the

SFP- deficient Brca1tr/+ genotype (Fig.15G and Fig.16F). The improvement for wild type and

Brca1tr/+ cells was a 2-fold increase upon complete loss of 53bp1 (Fig.15G and Fig.16F).

51

Therefore, and unlike previously reported [130], ablation of 53bp1 increases reprogramming efficiency in all genetic contexts examined.

Figure 15. Restoration of HDR rescues impaired reprogramming efficiency.

A). Immunofluorescence staining and quantification of γH2AX foci on day 5 of reprogramming.

Data was collected from >400 cells per genotype and analyzed with one-way ANOVA and Sidak’s multiple comparisons test. ****p < 0.0001.

52

Figure 15. (continued) B). Staining and quantification for phospho RPA(S33)/γH2AX double positive foci. For each genotype at least 145 cells were analyzed and statistical significance was determined with one-way ANOVA and Sidak’s multiple comparisons test. **p < 0.01 and ****p

< 0.0001.

53

Figure 15. (continued) C). DNA fiber analysis in a fork stalling assay with hydroxyurea. For each genotype >120 fibers were measured and statistical significance was calculated with one-way

ANOVA and Sidak’s multiple comparisons test. **p < 0.01, ***p < 0.001 and ****p < 0.0001.

54

Figure 15. (continued) D). CRISPR/Cas9 based HR assay with 2-3 induced pluripotent stem (iPS) cells lines per genotype from the indicated genotypes. The data is shown as a ratio to wild type control and was analyzed with an unpaired two-tailed t-test. E). Analysis of cell proliferation with

CFSE cytoplasmic dye. Data from 2 to 5 biological and experimental replicates per genotype was pooled and statistical significance was measured with one-way ANOVA and Sidak’s multiple comparisons test. **p < 0.01.

55

Figure 15. (continued) F). Apoptosis analysis with Annexin V and propidium iodide. Data was collected from 3 biological replicates per genotype and evaluated with one-way ANOVA and

Sidak’s multiple comparisons test. ***p < 0.001 and ****p < 0.0001.

56

Figure 15. (continued) G). Alkaline phosphatase (AP) staining and quantification of reprogramming efficiency for the Brca1/53bp1 genotype collection. Data from 4-9 biological and experimental replicates was normalized to infection efficiency and is shown as a ratio to wild type control. Statistical analysis was performed with one-way ANOVA.; All comparisons in the

ANOVA analyses were made with Sidak’s multiple comparisons test; *p < 0.05 and ****p <

0.0001.

57

Genotype # of Cell lines Cell Line IDs wt 6 BS2-1A, BS2-1B, BS3-4A, BS3-4B, BS3-7A, BS3-7B Brca1tr/+ 5 BS1-2B, BS1-2C, BS4-3A, BS4-3B, BS4-9B Smarcal1+/- 6 BS2-5A, BS2-5B, BS2-6A, BS2-6B, BS4-1A, BS4-1B Smarcal1-/- 6 BS6-2A, BS6-2B, BS6-3A, BS6-3B, BS6-6A, BS6-6B Brca1tr/+, Smarcal1+/- 6 BS2-3A, BS2-3B, BS3-1A, BS3-1B, BS4-2A, BS4-2B Brca1tr/+, Smarcal1-/- 6 BS2-2C, BS2-2D, BS2-2E, BS6-1A, BS6-1B, BS6-5A Brca1tr/tr 4 BS2-4A, BS2-4B, BS4-6B, BS4-8A Brca1tr/tr, Smarcal1+/- 5 BS3-2A, BS3-2B, BS3-5B, BS4-7A, BS4-7B Brca1tr/tr, Smarcal1-/- 6 BS3-3A, BS3-3B, BS4-5A, BS4-5B, BS6-7A, BS6-7B 53bp1+/- 5 BP2-1B, BP2-1C, BP2-5A, BP2-5B, BP6-1A 53bp1-/- 10 BP5-3E, BP6-2A, BP6-2B, BP6-2C, BP6-2E, BP6-7A, BP6- 7C, BP6-7D, BP6-7E, BP6-7F Brca1tr/+, 53bp1+/- 3 BP2-4B, BP6-4A, BP6-4B Brca1tr/+, 53bp1-/- 3 BP3-9B, BP4-6A, BP5-2A Brca1tr/tr, 53bp1+/- 4 BP2-8A, BP3-7A, BP5-1A, BP5-1B Brca1tr/tr, 53bp1-/- 10 BP2-9B, BP4-5A, BP4-5B, BP4-5C, BP4-5D, BP4-5D BP5-3A, BP5-3B, BP5-3D, BP5-3E

Table 2. iPS cell line collection. List of mouse iPS cell lines, derived from fibroblasts of the indicated genotypes. Colonies were picked on reprogramming day 16.

58

Figure 16. Restoration of HDR rescues development and reprogramming.

A). DNA fiber assay of replication stalled with pyridostatin (PDS). At least 180 DNA fibers were measured per genotype and statistical significance was evaluated with one-way ANOVA and

Sidak’s multiple comparisons test. **p < 0.01 and ***p < 0.001.

59

Figure 16. (continued) B). Flow cytometry plots from iPS cell lines of the indicated genotypes in the CRISPR/Cas9-mediated HR assay. The number of zsGreen positive cells is used to assess HR competence.

60

Figure 16. (continued) C). Flow cytometry plots of cells incubated with the CFSE cytoplasmic dye to asses proliferation. Cycling cells exhibit reduced brightness due to progressive dilution of the dye.

61

Figure 16. (continued) D). Size and morphology of E13.5 embryos from the Brca1/53bp1 genotype collection. Each genotype included 3-11 embryos. Statistical analysis was performed with one-way ANOVA and Sidak’s multiple comparisons test. **p < 0.01 and ***p < 0.001.

62

Figure 16. (continued) E). Flow cytometry plots from apoptosis analysis with Annexin V and propidium iodide (PI). Cells positive for Annexin V are in early apoptosis, while late apoptotic cells stain for both for Annexin V and PI.

63

Figure 16. (continued) F). Alkaline phosphatase staining for the Brca1/53bp1 genotype collection.

64

2.4. DISCUSSION

The integrity of the genome of reprogrammed stem cells is an important determinant of their utility in stem cell research and therapy. From studies in two different reprogramming systems, iPS cells and somatic cell nuclear transfer, we know that reprogramming results in increased DNA damage and replication stress marked by H2AX [24, 116, 131]. Therefore, a detailed understanding of the type of DNA damage and the repair pathways required is important.

Replication stress signals can arise from stalled forks, which result in the formation of excessive single stranded DNA and double strand breaks. Stalled replication forks can be resolved by several pathways, including re-priming, regression with template switching, and break induced replication, reviewed in [95]. In this study, we use separation of function mutants for the roles of

Brca1 in stalled replication fork protection (SFP) and homology-directed repair (HDR) to assess their importance in reprogramming. Our data shows that despite a significant increase in γH2AX and phospho RPA(S33) foci, three genotypes deficient for SFP, Brca1tr/- and two Bard1 mutants, did not experience any reduction in reprogramming efficiency.

Stalled forks in BRCA1 mutant cells are degraded after formation of a four-way intermediate structure, termed “chicken foot”, which is generated by the fork remodelers

SMARCAL1, ZRANB3 and HLTF [42]. The inactivation of SMARCAL1 impairs fork regression and the formation of the branched intermediate, thereby preventing fork degradation in the absence of BRCA1. [42]. Leveraging this interaction, we tested the role of SFP in reprogramming further by ablating Smarca1l in Brca1 mutants. Loss of Smarcal1 in Brca1tr/tr cells reduced the numbers of both γH2AX and phospho RPA(S33) foci, indicating that the rescue of SFP lowers replication stress during reprogramming. Despite this decrease, Brca1tr/tr, Smarcal1-/- mutants were impaired in reprogramming, comparable to the Brca1tr/tr genotype. Our studies thus far allowed

65

for two important conclusions: first, not all replication stress-associated DNA damage affects reprogramming efficiency. And second, while stalled fork protection is dispensable for reprogramming, homology-directed repair is not: Brca1tr/tr, Smarcal1-/- cells are deficient specifically for the HDR, but not the SFP function of BRCA1 and reprogram poorly.

To test whether restoring HDR improves the poor reprogramming of Brca1tr/tr mutants, we made use of the genetic interaction with 53bp1. The DNA repair factor 53BP1 acts together with RIF1 in a competitive and antagonistic manner to BRCA1-CtIP to inhibit the resection of double strand breaks [80]. Inactivation of 53bp1 in Brca1 mutant cells has been previously shown to rescue their HDR competence [35]. Loss of one allele of 53bp1 in Brca1tr/tr cells was sufficient to reduce the number of γH2AX foci to wild type levels, and also significantly lowered the number of phospho RPA(S33) foci. Notably, this also restored the reprogramming efficiency of Brca1tr/tr cells to nearly wild type levels. Complete 53bp1 loss fully rescued the efficiency of iPS cell generation in the absence of Brca1. Surprisingly, improved reprogramming upon disruption of

53bp1 was also noted for wild type cells as well as the heterozygous SFP- deficient Brca1tr/+ genotype. This result challenges a previous study which reported a 2-fold reduction in iPS cells generation for 53bp1-/- mouse fibroblasts compared to controls [130]. Marion et. al did not publish error bars or statistical analysis of the reported difference. The strength of our conclusion comes from examining the consequences of 53bp1 loss in three different genotypes-wild type, Brca1tr/+ and Brca1tr/tr, all of which reprogram better in the absence of 53bp1. Furthermore, we show a dose dependent increase in reprogramming efficiency, which is detectable in 53bp1+/- cells, but more pronounced on the null background. Importantly, this result was universal across all three genotypes, demonstrating that 53bp1 suppresses reprogramming to pluripotency. Since Brca1 and

53bp1 compete to determine repair pathway choice, the absence of 53bp1 in wild type cells might

66

enhance the efficiency of HDR. This idea has been corroborated by CRISPR/Cas9 gene targeting studies which have reported improved rates of HR-mediated genome editing by inhibition of

53BP1 [128].

In addition to its role in DSB repair by non-homologous end joining, 53BP1 has a separate function in the stimulation of p53-dependent transcription [92]. Loss of 53BP1 has been reported to impair the induction of p21, as well as, pro-apoptotic targets BAX and PUMA/BBC3 in human metastatic adenocarcinoma cells [92]. Loss of p53 or downregulation of p21 improves iPS cell generation [121], providing an alternative mechanism by which ablation of 53bp1 could increase reprogramming. However, another study showed normal stabilization of p53 in 53bp1-/- mouse thymocytes and upregulation of p21 in response to IR [93]. In our experimental system-mouse embryonic fibroblasts, we did not note any changes in proliferation or apoptosis in 53bp1+/- or

53bp1-/- cells and found normal induction of p21 in both 53bp1 mutant cells (data not shown).

Therefore, the improvement in reprogramming efficiency in the absence of 53bp1 is not due to compromised induction of p53 targets.

The prominent role of HDR during iPS cell generation was further corroborated by studies of triple homozygous phosphoserine point mutants Abraxas S404A/S404A, Bach1S994A/S994A,

CtIPS327A/S327A. In their phosphorylated form, these proteins interact with Brca1 at S1598 which is required for tumor suppression [11]. While Brca1tr/tr is a hypomorph lacking the entire

BRCT domain, the AABBCC mutant offers a highly specific disruption of HR. We noted a reduction in reprogramming efficiency for AABBCC, as well as the double homozygous phosphoserine point mutant, AACC. This shows that the phospho-dependent interaction between

CtIP and Brca1 facilitates reprogramming, though revealing this effect also required a sensitized system with a phosho-mutation in Abraxas. In line with this result, downregulation of CtIP has

67

been previously shown to have a negative effect on iPS cell generation [25]. The phosphorylation of CtIP is cell cycle regulated and S327 is phosphorylated by Cdk1 in G2 [132]. This points out that repair by homologous recombination becomes relevant in G2 during reprogramming.

Our observations collectively indicate that the limiting factor for reprogramming to pluripotency is a one-ended double strand break which requires homology-directed repair (Fig.17).

Stalled replication forks are cleaved by endonuclease Mus81, which is regulated in a cell-cycle dependent manner by the Wee1 kinase [102]. In early S-phase, converging forks and alternative mechanisms of replication fork restart are available to cope with replication stress, which allows reprogramming despite increased DNA damage. In G2 of the cell cycle, HDR becomes limiting due to a paucity of origins in late replicating regions [133] and increased cleavage of stalled forks by Mus81. Replication stress in late replicating regions is hence most consequential. Consistent with this model, copy number variations in induced pluripotent stem (iPS) cells occur at common late replicating fragile sites [134, 135].

Several studies have pointed out the parallels between reprogramming and tumorigenesis

[136, 137].The deletion of tumor suppressors p53, p21 and Rb facilitate both reprogramming and abnormal cell transformation. In this work, we demonstrate that the primary inhibitor of reprogramming is a double strand break (DSB), repaired by HR. Homology-directed repair has slow kinetics [71] and is counteracted by nonproductive repair mechanisms. Unresolved repair intermediates can persist through G2, into and the next G1 [138]. The inheritance of such damage determines further cell cycle progression through activation of p21 and Rb [139, 140].

Therefore, an unrepaired replication-associated DSB is highly consequential and acts upstream of tumor suppressors p21, p53 and Rb to repress cell type transitions.

68

In contrast to the deletions of p21, Rb or p53, loss of the tumor suppressor Brca1 severely impairs reprogramming. The function of BRCA1 in the maintenance of genomic stability through

HDR is required for both reprogramming and tumor suppression. Mice with HDR deficiency are tumor prone [11]. The role of Brca1 in stalled fork protection (SFP) is dispensable for reprogramming, as well as, for tumor suppression [62]. Similarly, separation of function mutations in BRCA2 show that the HDR function, but not the SFP function is required to prevent chromosome fragility [141]. Importantly, loss of 53bp1 restores HDR and reduces the tumor incidence in Brca1tr/tr animals [33], and in our system we show that it also rescues reprogramming. Interestingly, some BRCA1-deficienct cancers inactivate 53BP1, which restores

HDR competence [89, 142]. This BRCA1 paradox, where HDR is required for tumor suppression, but also enables tumor growth and mediates cell transformation during reprogramming remains unresolved. A plausible solution is provided here: BRCA1 is required to maintain the integrity of the regions, which generate replication-associated double strand breaks, activate downstream tumor suppressors and thereby inhibit cell transformation.

69

Figure 17. One-ended DSBs limit somatic cell reprogramming.

One-ended double strand breaks (DSBs), resulting from replication-associated DNA damage during reprogramming require repair by homologous recombination (HR). The competing repair pathway, non-homologous end joining (NHEJ), is not productive and impairs reprogramming. The presence of DSBs results in the activation of tumor suppressors p53, p21 and Rb which inhibit reprogramming. Ablation of the NHEJ factor 53bp1 or the tumor suppressors p53, p21, Rb results in improved reprogramming.

70

2.5. MATERIALS AND METHODS

2.5.1. Animal breeding To generate Bard1 mutants, heterozygous Bard1S563F/+ and Bard1K607A/+ females on a C57BL/6J background, described in Billing et al. [62] were bred to males of the same genotypes.

The Brca1/Smarcal1 genotype collection was created from intercrosses between Brca1tr/+,

Smarcal1+/- animals of mixed C57BL/6J and 129Sv background. The Brca1tr/+ allele is described in [34]. Mice, mutant for Smarcal1 were obtained from the International Mouse

Phenotyping Consortium (IMPC). The Brca1/53bp1 genotype panel was generated from intercrosses between Brca1tr/+, 53bp1+/- males and females on a mixed C57BL/6J and 129Sv background. For the AbraxasS404A/Bach1S994A/CtIPS327A genotype collection, referred to as

ABC, we first crossed mixed background (C57BL/6J and 129Sv) AABB mice to CC to generate F1 triple heterozygous A+B+C+ animals. The F1 generation of A+B+C+ was intercrossed to obtain the different combinations of double homozygous mutants. To generate triple homozygous phosphoserine mutants, F2 A+BBCC males were crossed to F2 A+BBCC or A+BBC+ females.

From the A+BBCC x A+BBCC crosses, 1 of 9 embryos was triple homozygous mutant (expected mendelian ratio 1/4). One additional triple homozygous mutant embryo was found from the triple heterozygous intercrosses (A+B+C+ x A+B+C+) which produced 69 embryos (expected mendelian ratio 1/64).

2.5.2. Fibroblast derivation and genotyping To derive fibroblasts for reprogramming, we harvested E13.5 mouse embryos from the above described crossed and processed them as follows: the embryo head, limbs and all internal organs were removed by dissection and the remaining tissue was mechanically dissociated with surgical scissors. This step was followed by a 10-15min incubation with TrypLE at 37C (Thermo

71

Fisher Scientific #12605036), trypsin inactivation with media and a 5min centrifuge spin. The cells from a single embryo were then plated in one 10cm dish and grown in MEF media, consisting of

DMEM HG (Thermo Fisher Scientific #10569010), supplemented with 10% FBS (Atlanta

Biologicals #S11150), Glutamax (Thermo Fisher Scientific #35050079) and PenStrep (Thermo

Fisher Scientific 15140163). Cells were split once to P1 and frozen down for reprogramming experiments. The sequences of all genotyping primers are provided in Table 3.

Genotype Primer F Primer R Bard1S563F/+ GCAGGTGCTCTACCCTC AACCTGGCCATCAACAT Bard1S563F/S563F AAC G Bard1K607A/+ CACGTGGTTGCTGGAA ATGTAAAGGAGCCAGC Bard1K607A/K607A ATTG AGC Brca1tr/+ TGCTCACTCTGTGCCCT TCCATTCTCCCCGCTTCT Brca1tr/tr CAA GT Smarcal1+/- CCGCTCTAACCTGGGA GTGACAGACAACAGCC Smarcal1-/- ACAC AGCC TCGTGGTATCGTTATGC GCC 53bp1+/+ AGGAGACTGAAGAACC CTCAGTTTTCCTGGGCC AATCG TCCT

53bp1+/- GTCAGGGTTTCACTGG CCTTCTTGACGAGTTCTT 53bp1-/- CTTG AbraxasS404A/+ CAGCAGGCACCAAGAC TCTGTGTATTAATCCGA AbraxasS404A/S404A AAGG GAGGCAAAGA Bach1S994A/+ GCCAAGTGTCCCAGCT TCAGTGTCCCAGGCAAC Bach1S994A/S994A CAAA TAAG CtIPS327A/+ TAGCAAAAGTCCTCAG TGTTGCTAAAGGGAGCT CtIPS327A/S327A TGGGC GTC

Table 3. Sequences of genotyping primers. For each genotype, forward and reverse primer pairs are provided.

72

2.5.3. Virus preparation and infection This study used a doxycycline inducible lentiviral system, consisting of Tet-O-FUW-

OSKM (Addgene #20321) and FUW-M2rtTA (Addgene # 20342). Lentivirus was prepared in

293T cells by transfection of plasmids with Jetprime transfection reagent (VWR #89129-922) as outlined in the manufacturer’s instructions. Briefly, Tet-O-FUW vectors were transfected together with the envelope and packaging plasmids pMD2VSVG (Addgene #12259) and psPax2 (Addgene

# 12260) into 293T cells plated on collagen-coated dishes. Fresh antibiotic-free media DMEM HG

(Thermo Fisher Scientific #10569010), supplemented with 15% FBS (Atlanta Biologicals

#S11150) and Glutamax (Thermo Fisher Scientific #35050079) was provided 16 to 20h post transfection. Viral supernatant was collected on each of the following two days and kept at 4C for up to 4 days. Prior to infection, titer from the two collection days was pooled and filtered through a 40uM cell strainer (Fisher Scientific #08-771-1).

For infection, P1 mouse embryonic fibroblasts (MEFs) were thawed and plated at 1x106 cells per 10cm dish on the previous day. Infection proceeded in two rounds with 8h to 9h in between. Briefly, cells were incubated with an OSKM/rtta virus mix (1:1), supplemented with

8ug/ml protamine sulfate (Fisher Scientific #0219472905). The infection mix was removed on the following day and cells were left to recover in fresh MEF media (DMEM HG Thermo Fisher

Scientific #10569010 with 10% FBS Atlanta Biologicals #S11150, Glutamax Thermo Fisher

Scientific #35050079 and PenStrep Thermo Fisher Scientific 15140163).

2.5.4. Reprogramming Two days after infection, cells were re-plated for molecular analyses, colony picking and alkaline phosphatase (AP) staining. In each experiment, infected fibroblasts from the different genotypes were re-plated at multiple densities to allow for optimal reprogramming efficiency. For

73

wild type cells, 40-60K per well of a 24w dish routinely generated high numbers of iPS cell clones.

Besides for wild type, 40-60K per well of a 24w dish was also optimal for the Bard1 point mutants,

Brca1tr/+ (and all combination mutations with Smarcal1 or 53BP1); Brca1tr/tr, 53bp1-/- as well as the heterozygous or homozygous Smarcal1 and 53bp1 single mutants. The 3 genotypes-

Brca1tr/tr; Brca1tr/tr, Smarcal1+/- and Brca1tr/tr, Smarcal1-/- were plated at 120-160K per well of a 24w dish since we observed very few or no iPS clones at the densities selected for wild type.

The remaining Brca1tr/tr, 53bp1+/- genotype was re-plated at 90K/well of a 24w dish. The OSKM reprogramming factors were induced with 1ug/ml doxycycline (Sigma # D9891) in mouse embryonic stem (mES) cell media, consisting of Knockout DMEM (Life Technologies #10829-

018), supplemented with 15% Knockout Serum Replacement (Life Technologies #10828-028),

Glutamax (Thermo Fisher Scientific #35050079), MEM NEAA (Life Technologies #11140050),

PenStrep (Thermo Fisher Scientific 15140163), 2-mercaptoetahnol (Life Technologies #21985-

023) and 10ng/ul LIF (eBioscience #34-8521-82). Transduction efficiency was determined on reprogramming day 3 by staining for Sox2 (Stemgent #09-0024). Cells were fixed on reprogramming day 19-20 and stained for alkaline phosphatase with the Vector Red detection kit

(Vector Laboratories #SK-5100). Reprogramming efficiency was determined by considering the number of AP-positive colonies per number of infected cells at the optimal plating density for each genotype.

2.5.5. Immunofluorescence and DNA fiber analysis Detection of γH2AX, phospho RPA(S33) and 53bp1 was performed on reprogramming D5 with the following antibodies Phospho-RPA2Ser33 (Invitrogen # PA5-39809), Phospho-Histone

H2A.X Ser139 (Cell Signaling # 9718), 53BP1 Antibody H-300 (Santa Cruz #22760).

DNA fiber analysis was carried out as described in Terret et al. 2009 [143]. Briefly, fibroblasts of different genotypes were incubated on reprogramming day 5 with 25uM CldU for

74

30min, washed 3 times with warm PBS and incubated with 125uM IdU for another 30min. Fork stalling was then induced by a 5h treatment with 2mM hydroxyurea (HU). Fibers were stretched on slides and stained with anti-BrdU/CldU (Biorad # OBT0030) and anti-BrdU/IdU (BD # 347580) antibodies. Imaging was performed with a 100x objective on an Olympus microscope and fiber length was measured with Olympus cellSens imaging and analysis software. In an alternative fiber assay, fork stalling was induced by treatment with 2uM pyridostatin (PDS) during the 30min incubation with 125uM IdU.

2.5.6. HR Assay The HR competence of the different genotypes was evaluated in mouse induced pluripotent stem (iPS) cells with a CRISPR/Cas9-based assay where a zsGReen repair template is targeted to the Hsp90 genomic locus. This strategy has been described in detail by Mateos-Gomez et al. [129].

In short, 1x106 exponentially growing iPS cells were transfected with 200ng Cas9-puromycin vector and 800ng zsGreen repair template with Jetprimer transfection reagent (VWR #89129-922) as outlined in the manufacturer’s instructions. To enrich for Cas9-transfected cells, the plates were treated with 1ug/ml puromycin (Thermo Fisher #A11138-03) 20h post transfection for 12h. Three days after selection, HR was measured by flow cytometry for zsGreen.

Proliferation and apoptosis To evaluate proliferation, infected fibroblasts on reprogramming D2 were incubated with

5uM Cell Trace CSFE proliferation dye (Thermo Fisher # C34554) for 20min at 37C as outlined in the manufacturer’s protocol. Cells were then changed to fresh mouse ES cell media, composed of Knockout DMEM (Life Technologies #10829-018), 15% Knockout Serum Replacement (Life

Technologies #10828-028), Glutamax (Thermo Fisher Scientific #35050079), MEM NEAA (Life

Technologies #11140050), PenStrep (Thermo Fisher Scientific 15140163), 2-mercaptoetahnol

75

(Life Technologies #21985-023) and 10ng/ul LIF (eBioscience #34-8521-82), supplemented with

1ug/ml doxycycline (Sigma # D9891). Three days post incubation with CSFE (reprogramming day

5) cells were harvested for flow cytometry.

For apoptosis analysis, cells were collected on reprogramming day 5 and stained without fixation with the Annexin V-FITC apoptosis detection kit (Sigma # APOAF-20TST) according to protocols provided by the manufacturer. The numbers of early and late apoptotic cells were determined by flow cytometry for Annexin V-FITC and propidium iodide (PI). Early apoptosis is marked by Annexin V staining only, while late apoptotic cells stain for both Annexin V and PI.

76

CHAPTER 3: DETECTION OF BASE ANALOGS INCORPORATED DURING DNA REPLICATION BY NANOPORE SEQUENCING

3.1. MOLECULAR METHODS AVAILABLE FOR THE STUDY OF DNA REPLICATION The study on the role of BRCA1 in reprogramming to pluripotency used a widely employed

DNA fiber assay to evaluate the protection of stalled replication forks. A significant shortcoming of this method is that the genomic location of stalling is not known. The variability in replication tract length on different fibers demonstrates that replication fork progression and stability are not equal across the genome. Mutations in double strand break repair pathways may have differential effects on different genomic areas and thereby contribute to genetic abnormalities in reprogramming and cancer. To address this, we started the development of a novel method for the analysis of replication progression, which can be used to study DNA replication and repair in various biological contexts.

3.2. ABSTRACT DNA synthesis is a fundamental requirement for cell proliferation and DNA repair, but no tools exist to identify the location, direction and speed of replication forks with resolution. Mammalian cells have the ability to incorporate thymidine analogs along with the natural A, T, G and C bases during DNA synthesis, which allows for labelling of replicating or repaired DNA. The Oxford Nanopore Technologies (ONT) MinION infers nucleotide identity from changes in the ionic current as DNA strands are pulled through nanopores and can differentiate noncanonical nucleotides from natural ones. Here, we demonstrate the use of the ONT

MinION to detect 11 different thymidine analogs including CldU, BrdU, IdU, as well as, EdU alone or coupled to Biotin and other bulky adducts in synthetic DNA templates. We also show detection of IdU label, incorporated during DNA replication in the genome of mouse pluripotent

77

stem cells. We find that different modifications generate variable shifts in signals, providing a method of using analog combinations to identify the location and direction of DNA synthesis and repair at high resolution. We conclude that this novel method has the potential for genome-wide examination of DNA replication in stem cell differentiation or cell transformation.

3.3. INTRODUCTION DNA replication is a fundamental requirement for the development of an organism and the life-long maintenance of organ function. An estimated 2-3x1011 cell divisions occur in the human body each day to replenish lost cells and repair tissue damage. While DNA is replicated just once during a cell cycle, the pattern by which this occurs varies between cell types [144], and is different in malignancy [145]. During the neural differentiation of mouse embryonic stem cells, for example, as much as 20% of the genome switches replication timing [146]. Mammalian development induces replication program changes which affect at least 50% of the genome [147].

The generation of induced pluripotent stem cells is associated with changes in replication timing which reset the replication pattern to resemble the one in embryonic stem cells [117]. Importantly, cancer cells acquire replication programs which differ from the ones in their normal counterparts.

For instance, allelic replication asynchrony at the p53 and 21q22 loci has been described in invasive carcinomas [148]. Delayed replication of one allele of a tumor suppressor can interfere with and create a situation similar to loss of heterozygosity which is associated with malignancy [148]. Chromosome-wide delays in DNA replication have been shown to delay chromosome condensation and contribute to the chromosomal instability of various tumor cell lines [149]. Therefore, in addition to studying differential gene expression patterns, there is a strong rationale to study differences in DNA replication patterns in development and disease.

Methods that are currently available to analyze the progression of DNA replication include a pulse of BrdU and immune-precipitation for labelled DNA [150]. Replication patterns can also

78

be inferred from the sequencing of Okazaki fragments [151] or from counting the number of reads in next generation sequencing data which allows the identification of early and late replicating regions as well as of replication origins [152]. However, next generation sequencing requires amplification during library preparation and averages the signal from different cells and DNA strands. To identify the progression of replication on a single DNA strand, a commonly used technique is DNA combing, which relies on sequential pulses with two modified nucleotides- IdU and CldU, fiber stretching on glass slides and staining with specific antibodies. While this technique can track replisome progression, stalling and restart in a genome-wide manner, it does not provide any sequence information. To study replication dynamics at a locus of interest, fiber analysis requires combination with fluorescent in situ hybridization [153]. This method, however, is very laborious and not scalable to the level of a complete eukaryotic genome. Thus, to the best of our knowledge, there is currently no technique that can capture location, direction and speed of replication fork progression on single molecules at a genome-wide level with high resolution.

Nanopore technology has recently emerged as a powerful third-generation sequencing technique, primarily utilized for genome assembly due to its ability to produce ultra-long reads

[154]. The system uses electrophoresis to transport DNA through a collection of nanopores and bases are identified by measuring changes in the electric current across the surface of the orifice

[155] (Fig.18). In reality, signal shift is produced not by a single base, but is influenced by 5-6 neighboring nucleotides, making contact with the pore and termed a k-mer. Computational algorithms are then used to process the signal and determine the identity of the bases passing through. Since the nanopore platform does not require DNA amplification for library preparation, it can sequence cellular DNA directly and thereby avoid averaging signals from different DNA strands. Because nanopore technology does not rely on base-pairing to determine DNA sequence,

79

it may be used to distinguish not only the four canonical nucleotides (A, T, G, C), but potentially other types of base. Indeed, the system is able to detect methylated bases [156], which suggests that the sequencing of nucleotide analogs incorporated during DNA replication may also be possible.

Here, we present the use of nanopore sequencing on the Oxford Nanopore Technologies

(ONT) MinION to detect thymidine analogs that can be directly incorporated during DNA replication or generated in click cycloaddition reactions. Using a series of templates, we show that the MinION can distinguish between thymidine and all 11 analogs, with the greatest signal to noise ratio shown by IdU, CldU and Biotin-dU. We also show detection of IdU in substituted mammalian

DNA. The methods presented in this study provide a path towards extracting comprehensive, genome-wide information on the origin, direction and progression of polymerases during DNA replication and repair from MinION sequencing reads.

Figure 18. An artist's representation of nanopore sequencing. A DNA helicase at the pore unwinds dsDNA. Single DNA strands pass through the pore (CsgG from E. coli in this illustration) while an electrical current is applied across the pore membrane.

80

Changes in the current as ssDNA passes through are recorded and used in computational algorithms to determine base identity. Obtained from Operon Art. https://operonart.wixsite.com/operonart

81

3.4. RESULTS

3.4.1. DETECTION OF SINGLE-BASE DNA MODIFICATIONS WITH NANOPORE SEQUENCING To determine the ability of the ONT MinION sequencer to detect modified bases, we designed synthetic templates containing a single modification in a defined location. Template assembly involved a vector nicking step with endonuclease nb.BbvCI which generated a 63bp region of ssDNA on the minus strand, followed by the ligation of an oligo with one modified thymidine base (Fig.19A). This approach to replace segments within a plasmid with a labelled oligo has previously been used for FRET studies and reported to result in >75% substitution [157].

We used oligos without modification, or containing 5 different modifications, EdU, CldU, BrdU,

IdU, or Biotin for the substitution reaction (Fig.19B). Also, we modified EdU with a series of compounds of increasing molecular weight: azide, nicotine, glucose, AF488, AF647 and a 10bp single stranded DNA oligo through click cycloaddition reaction [158]. This method allows the attachment of any azide group containing moiety to alkyne-substituted DNA and has been reported to be nearly 100% efficient [159]. Dot blots for AF647, AF488, BrdU and Biotin modified plasmids showed efficient labelling (Fig.19C). Following the replace reaction, the vector was linearized with SpeI to generate a 6184-bp double-stranded training template with one modified thymidine on the minus strand at position 3073, which was sequenced on the MinION (Fig.19D).

82

Figure 19. Design and assembly of training templates with single modifications. Design and assembly of training templates with single modifications. A). A schematic of the two-step assembly of single modification training templates. The first step is nicking with the restriction enzyme nb.BbvCI and the second-ligation of a ssDNA oligo which carries a modification. The vector is then linearized with SpeI to generate a 6184bp sequence with one thymidine analog at position 3073.

83

Figure 19. (continued) B). Sequence of the single modification oligos used for ligation. C). Dot blots with ligation products for AF647, AF488, BrdU, IdU and Biotin modifications, demonstrating efficient replacement with the modified oligo.

84

Figure 19. (continued) D). A schematic illustrating the sequencing of a training template with a single-modification through a nanopore.

85

To identify modified nucleotides, we developed NanoMod, an analysis tool which calls

DNA modifications directly from electrical signals [160]. NanoMod uses Albacore for base calling and performs indel error correction by aligning signals to a reference sequence. The input to

NanoMod is a dataset with two groups of samples-a modified template and a sequence-matched control, while the output is the ranked list of regions with modifications. Signals from control and substituted samples are then compared using two statistical assays-Komogorov-Smirnov test and

Stouffer’s method for calculating combined p values. We used NanoMod to score individual nucleotides in 5-bp sliding windows to detect the presence of thymidine analogs. All 11 modifications (Fig.20A) in the dataset caused signal shifts at base 3073 as well as at neighboring positions (Fig.20B-M). The analogs EdU, CldU, BrdU and IdU caused signal shifts spreading as far as 3-4 bases upstream and 2 bases downstream of the modified position (Fig.20B, C, D, F), consistent with the minion detecting signals from a 5-6 bp k-mer. Therefore, the signal of the modified T is present in neighboring bases. The magnitude of the signal change was proportional to the size of the moiety with one of the largest unclicked analogs IdU (MW=354.1g/mol) generating the most significant change in pore current, log(Combined p-value) = -175 (Fig. 20F).

The smallest modification, EdU (MW=252.23 g/mol) was detectable as well at log(Combined p- value) = -125 (Fig.20A). A highly significant change in signal at 3073 was also recorded for the bulkier analog- Biotin, with a log(Combined p-value)=-150 (Fig.20I). Purification of the plasmid after the replace reaction with streptavidin beads, followed by sequencing further increased the significance of the change, with a log(Combined p-value)= -250 (Fig.20J). Therefore, modification at 75% [157], was sufficient to call the modified base, while 100% modification further magnified signal change. Biotin influenced electrical signals further away from the modification than EdU,

CldU, BrdU or IdU. An increased number of surrounding bases affected-as many as 3-4

86

nucleotides upstream and about 10 downstream of the modified position (Fig.20I, J) indicates that the effect of bulky modifications can spread beyond a single k-mer.

We also compared modifications attached to the training template by click cycloaddition - an azide, nicotine, glucose, AF488 and AF647. In regions outside the modified base, we did not observe significant changes in signals, indicating that the click reaction did not affect the ability to sequence DNA on the MinION (e.g. compare position T3063 and T3083 on the control and experimental templates in Fig.20G). At the modified base, nicotine showed a significant change with a log(Combined p-value) = -74 (Fig.20G). The mean raw signal of bulky bases caused a wider spread in signal, which generally affected 5-6 bases preceding the modification and about 2-4 bases downstream (Fig.20E, G, H). Clicked moieties AF488 and AF647 caused a more extensive downstream current shift, spreading as far as 8-10 bases beyond the modified position (Fig.20K,

L). The mean raw signal at the modified base was visibly different when compared to control, though the change did not translate into lower p-values in the NanoMod statistical analysis. The largest clicked modification - a ssDNA oligo, expected to generate a branched structure, caused the smallest change in signal which spread over 10 bases upstream and 2-3 positions downstream of 3073 (Fig.20M). In this particular case, reduced progression through the pore, or signal integration from bases on the training template and the clicked oligo might have contributed to the low shift in current recorded for the modified position.

These observations point to a relationship between adduct size and the magnitude of signal shift at the modified position as well as the neighboring bases. In general, bulkier moieties caused more pronounced changes in the raw signal at the modified position, and affected more bases in its vicinity. The differences in signal spread between adducts of different molecular weight show

87

that pairs of analogs, such as IdU and Biotin, can be distinguished from each other based on the width of the window affected by nanopore current shifts (Fig.20F and Fig.20J).

Figure 20. Detecting a series of single-base modifications with increasing molecular weight through MinION sequencing.

A). Chemical structure and molecular weight of the tested analogs.

88

Figure 20. (continued). A violin plot of the known modification site and the surrounding bases for B). EdU, C). CldU, and D). BrdU.

89

Figure 20. (continued). A violin plot of the known modification site and the surrounding bases for E). Azide, F). IdU and G). Nicotine.

90

Figure 20. (continued). A violin plot of the known modification site and the surrounding bases for H). Glucose, I). Biotin and J). Streptavidin purified Biotin.

91

Figure 20. (continued). A violin plot of the known modification site and the surrounding bases for K). AF488, L). AF647 and M). a 10bp ssDNA oligo, attached to position 3073 by click cycloaddition. In each panel, the first line denotes position, followed by the base, the second line shows mean raw signal from the control and modified sample, the third line contains the p-value

(pv for short) for a Kolmogorov-Smirnov test, and the fourth line displays the combined p-value, calculated using Stouffer’s method. ‘DS 1’ in blue represents the non-modified oligo samples,

92

while ‘DS 2’in green stands for the sample with a modification at position 3073. ‘- strand’ denotes reverse strand. Black stars represent unmodified T’s and the red star marks the modified position.

93

3.4.2. DETECTION OF MULTIPLE DNA MODIFICATIONS WITH NANOPORE SEQUENCING

To mimic the scenario of pulsing replicating cells with nucleotide analogs, which are incorporated at multiple positions into newly synthetized DNA, we repaired the nb.BbvCI nicks on the minus strand of the training sequence by ligating an oligo with 14 thymidine modifications between positions 3042 and 3104 (Fig. 21A). Based on the results with a single defined position, we selected 4 analogs - EdU, CldU, IdU, and Biotin added to EdU by click cyclo-addition (Fig.

21B). Efficient replacement was confirmed by dot blots for CldU, IdU and clicked Biotin (Fig.

21C). Templates with replacements were sequenced and analysis was carried out by dividing the reference sequence into ~400 regions of 60bp each with a 30-bp overlap between adjacent windows. For all analogs tested, NanoMod correctly identified the modified region between 3042 and 3104 as shown by the p-values for Kolmogorov-Smirnov test and Stouffer’s method (Fig.22A-

C). The most significant shift of signal was centered around a modified base, gradually abated with increasing distance from the analog and raised again in the vicinity of the next modification on the template. Within each template, some modified thymidines generated stronger signals than others

(Fig.22A-C). As in templates with single modifications, the magnitude of signal shift in sequences with multiple analogs was determined by the size of the adducts with IdU generating the greatest signal to noise ratio (Fig.22C). Biotin, the largest moiety, caused signal changes spreading over the whole region between 3042 and 3104. This spread of signal is likely related to the chemical structure of biotin, and the ability of bulky adducts to influence the current reading beyond a single k-mer. This dataset on multiple modifications demonstrates that MinION sequencing and analysis with NanoMod can be successfully used to identify short genomic regions with modified bases in

DNA.

94

Figure 21. Design and assembly of training templates with multiple modifications.

A). A schematic of the strategy used to assemble training templates with multiple modifications. A ssDNA oligo carrying 14 analogs is used in the ligation step.

95

Figure 21. (continued) B). Sequence of the multiple modification oligos used for ligation. E =

EdU; Cl = CldU and I = IdU. C). Dot blots with ligation products for CldU, IdU and Biotin. CldU and IdU were visualized by antibody staining on the blot. Biotin was seen after incubation with

Alexa 647-streptavidin.

96

Figure 21. (continued) D). An illustration of sequencing a DNA template with multiple analogs through a nanopore.

97

Figure 22. Detection of multiple modifications with MinION sequencing.

A). A violin plot of the modified region 3042-3104 and surrounding bases for EdU.

98

Figure 22. (continued) B). A violin plot of the modified region 3042-3104 and surrounding bases for CldU.

99

Figure 22. (continued) C). A violin plot of the modified region 3042-3104 and surrounding bases for IdU. ‘DS 1’ in blue represents the non-modified oligo sample, while ‘DS 2’ in green stands for the multi modified template. The Kolmogorov-Smirnov test p-value is shown on line 3, while the combined p-value, calculated by Stouffer’s method is displayed in line 4. Red stars indicate the position of modified T’s.

100

3.4.3. A COMPARATIVE ANALYSIS OF NUCLEOTIDE ANALOGS FOR gDNA MODIFICATIONS To determine if pore sequencing and the analysis method NanoMod developed here can be applied to sequencing base analogs incorporated by replicating cells in genomic DNA, we performed computational simulations. We first determined the level of analog substitution in vivo by exposing primary human fibroblasts to EdU for 24h, or a complete cell cycle. Purified, EdU- containing genomic DNA was then labelled with AF647 in a click cycloaddition reaction and the level of EdU-AF647 substitution was determined by measuring sample fluorescence against a standard curve of PCR products with known percentage substitutions of EdU, clicked with AF647.

This analysis revealed that 20% of thymidine bases in genomic DNA were substituted with EdU

(Fig.23A, B). This is the expected percent modification for replication tracts pulsed with thymidine analogs [161].

101

Figure 23. Substitution of mammalian genomic DNA with EdU.

A). Dot blots with EdU-substituted genomic DNA clicked with A647-azide. B). A647- fluorescence measurement of EdU-substituted DNA. The standard curve was prepared with products of PCR reactions ran with known concentrations of dTTP to dEdUTP and processed through click cycloaddition with A647 azide. The analysis shows ~20% substitution in genomic

DNA after 24h of incubation with 10uM EdU.

102

Based on this result, we performed computational simulations with 20% modified and 80% unmodified reads to determine what coverage would be required to detect single (Fig.24A) and multiple (Fig.24B) modifications in genomic DNA. Since the fraction of modified reads without purification is approximately 75% [157], the proportion of modified reads in the simulation is approximately 15%. IdU and CldU were easiest to detect in the group of analogs that can be directly incorporated during DNA synthesis, requiring 450 reads of a modified site to rank strands with a single modification in the 0.05 percentile or less (Fig.24A). The calling of BrdU, CldU and the calling of single EdU-modified strands also obtained the highest 0.05 percentile rank as low as

450 or 950. Similarly, Biotin modifications were ranked in the 0.05 percentile range with only 450 reads. The clicked analogs-an azide group, nicotine and glucose required 950 reads, almost twice the number noted for IdU and CldU, to achieve the 0.05 percentile rank. The clicked fluorophores

AF488 and AF647 required more extensive coverage-1950 and 2450 reads, respectively for more than 75% of the strands to be ranked in the 0.05 percentile or between percentiles 0.05 and 0.12.

This result is consistent with the wide-spread signals of lower significance described for AF488,

AF647 and the ssDNA oligo in Fig.20. This result corroborates the observation that thymidine analogs of higher molecular weight IdU, and Biotin are readily detected on the MinION platform.

To determine the number of reads required to identify a region with multiple modifications, we performed the same simulation with 14 thymidine analogs in a single template (Fig.24B).

Multiple EdU modifications required 250 reads for reliable detection (percentile 0.05 or between the 0.05 and 0.12 percentiles). For other analogs CldU and IdU, only 100 reads proved enough for correct calling (percentile 0.05 or between the 0.05 and 0.12 percentiles). The template which contained multiple clicked biotins required 350 reads for accurate detection. This analysis showed that the presence of multiple modified bases in a DNA segment facilitates detection with

103

NanoMod, requiring a smaller number of reads compared to a single modification. IdU was the modification that was most readily detectable in both the modelling of single modifications and multiple modifications.

Figure 24. Sequencing depth required for the calling of base analogs in mixed samples with unmodified and modified reads. A). Modeling using single modified base templates. The X axis represents the number of reads used for the analysis, and the Y axis shows the fraction a modified site can be ranked by percentile among all sites for each sample. Analysis was performed using Stouffer’s combined statistics.

104

Figure 24. (continued) B). Modeling using templates with multiple modifications for the modified region between 3042 and 3104 with tracks of thymidine analogs. IdU and CldU modifications cause the strongest signal shifts and require the least number of reads for reliable detection.

105

3.4.4. THYMIDINE ANALOGS INCORPORATED IN REPLICATING MAMMALIAN DNA CAN BE DETECTED ON SINGLE STRANDS.

To test if the MinION platform in conjunction with NanoMod can detect nucleotide analogs incorporated in replicating DNA, we incubated mouse embryonic stem (ES) cells from a pure

C57BL/6J strain with 25uM IdU for 24h. Genomic DNA from unlabelled and IdU-incubated samples was then harvested, sheared in 6-8kb fragments and prepared for 1D sequencing on the

MinION system. Error correction and signal annotation were performed with NanoMod. Base calling was accomplished with Albacore v2.3.1 and long reads were aligned with the GRCm38

(mm10) mouse reference genome. We aligned all reads from the IdU sample to the mouse reference genome and used a set of IdU-covered reads and control reads in the subsequent analysis.

These reads were from different, rather than the same genomic location. Reads were split in 9- mers with no thymidines (VVVVVVVVV), V=AGC, 1 thymidine in the center of the k-mer

(VVVVTVVVV) or 2 and more thymidines at different positions on the plus strand of the k-mer.

This classification included 335 non-T k-mers and 668 T-containing k-mers of different sequence composition with coverage of >500 reads per k-mer. Current signals were collected from the central T as well as 2 nucleotides upstream and 2 downstream of the base and used in a

Kolmogorov-Smirnov test to calculate p-values. Stouffer’s method was then applied to calculate a combined p-value for each 9-mer. The procedure was repeated 50 times and the median of the resulting 50 p-values was plotted on a logarithmic scale. Comparing IdU labelled samples with unlabelled controls revealed that 20% of the T-containing k-mers in the IdU sample had combined log10 p-values between -308 and -10, while none of the non-T k-mers had a p-value in this range, showing that IdU substitution in T-containing k-mers generates a detectable signal shift (Fig.25A,

B). Comparison of k-mers containing either 1T or 2T resulted in greater significance of detected differences: 25% of 1T and almost 40% of 2T-contaning 9-mers had combined log10 p-values

106

between -308 and -3 (Fig.25C, D). Analysing k-mers with p-values below 10-7 showed that only

1.3% of non-T k-mers have p-values in this range, while 5.9% of the 1T, 24.8% of the 2T and 68% of the 3T containing k-mers had signals with p-values <10-7 (Fig.25E). This result demonstrates that the method employed here has low background and can detect incremental signal differences generated by the presence of increasing numbers of IdU-substituted thymidines. To determine whether signals from a k-mer of a specific sequence could be used to distinguish IdU from thymidine, two selected 9-mers GAGATACAC and GAGTTACAC were divided into 5 different k-mers. Electrical signals from control and IdU labelled samples of the same k-mer were plotted and compared using NanoMod. A significant shift was seen in the IdU containing k-mers, with the shift depending on the position of the T within the k-mer (Fig.25F,G). This analysis used

427IdU/229control reads for GAGATACAC and 310IdU/211control for GAGTTACAC to identify an IdU containing k-mer in cellular DNA. This analysis shows that thymidine analogs, incorporated by replicating mammalian cells, can be called with nanopore sequencing, and that T containing k-mers of the same sequence, from different locations of the genome can be combined for the calling of base modifications. The combination of approximately 200 different k-mers on a single read should provide a tool to identify replicated DNA from long nanopore reads.

To determine if we can detect single IdU-substituted DNA molecules with Nanomod, we randomly selected 70% of the reads with T-containing k-mers from different genomic locations in the control genomic DNA sample. We calculated the signal mean and standard deviation for the central 5 bases (central T as well as 2 bases upstream and 2 bases downstream) of each 9-mer.

These results were used to calculate a combined p-value for each T in the remaining 30% of the

T-containing k-mers in the control genomic DNA sample and the IdU-substituted sample with the following thresholds for read length: 500, 800, 1000, 1500, and 2000bp. This analysis showed that

107

IdU-substituted single reads of length 0.8-1.0kb can be detected with Nanomod in the 10% quantile and individual modified reads of 1.0-1.5kb could be called in the 15% quantile (Fig.25H). These results demonstrate that IdU, incorporated in newly-synthesized mammalian DNA can be detected on individual DNA molecules with MiniION sequencing and Nanomod analysis.

Figure 25. Detection of IdU incorporated into replicating mammalian DNA.

A). Combined p-values calculated with Stouffer’s method for non-T containing 9-mers. The combined p-value includes p-values for the central base of the 9-mer as well as 2 bases upstream and downstream. B). Combined p-values for T-containing k-mers. All T-containing 9-mers were considered, including ones with multiple Ts.

108

Figure 25. (continued) C). Combined p-values for 1T-contaning k-mers. This analysis included

9-mers with the thymidine base in the middle. D). Combined p-values for 2T-containg k-mers. The first T was set to the middle of the 9-mer and the other T could take any other possible position.

Figure 25. (continued) E). Pie chart showing the percentage of 9-mers with p-values less than

10^-9. Non-T, 1T, 2T and 3T-containing k-mers are shown in red, green, blue and purple, respectively.

109

Figure 25. (continued) F). Signals changes as a selected 9-mer from IdU-substituted genomic

DNA with the sequence GAGATACAC, and G). GAGTTACAC was separated in 5 different k- mers. The k-mers were covered by 200-400 reads.

110

Figure 25. (continued) H). Detection of single modified reads. The top panel contains ranges of read lengths. The purple stars indicate the read length required for the detection of individual IdU- substituted genomic DNA reads when integrating signals from k-mers with p-values in the 10% and 15% quantile.

111

3.5. DISCUSSION

Analysis of DNA replication at a single molecule level and genome would greatly increase our understanding of DNA replication and genetic stability in development and disease. Towards this aim, we show here that pore sequencing with the MinION can be used in conjunction with

NanoMod to detect a panel of thymidine modifications in synthetic templates, as well as, IdU incorporated in the DNA of replicating mammalian cells. Compounds, which are chemically alike, but with different molecular weights or different structure, can generate distinct signals in a pore

[162]. Here we present increased signal change with increasing molecular weight from EdU, CldU and BrdU to IdU with a signal spread, affecting 2-4 neighboring bases on each side of the analog.

Structurally bulkier moieties of high molecular weight, such as biotin, generated large signal shifts spreading as far as 10 nucleotides in the neighborhood of the modified base. Larger polymers have been reported to have increased pore residence times compared to shorter molecules [163] and thus the extensive signal spread with biotin modified DNA may be due to longer dwell times in the pore.

In this work, we also demonstrate that click cycloaddition reactions can be used to generate any desired chemical change after incorporation of a nucleotide with a reactive group. These analogs could be detected without further modification, and the clicking of bulky adducts again showed a wider spread in signals. Though click chemistry using copper catalysts can introduce

DNA damage, novel copper-free click chemistry is now available and will likely be more suitable for this application [164]. For example, AmdU is an azide group-containing analog, which is incorporated during DNA replication and reacts with alkynes in copper-free reactions, allowing the formation of a bulky adduct while protecting DNA from damage [165, 166]. A base with a

112

vinyl-group, such as VdU, also allows the introduction of further modifications via copper-free alkene-tetrazine ligation[167].

The difference in signal spread between small analogs and larger bulky adducts can be potentially leveraged to distinguish between a small and a bulky analog which opens up the possibility of using the platform to determine replication patterns. For example, studies on the progression of DNA replication may use the sequential application of two analogs, such as IdU and EdU with the intent to modify the latter further with biotin and form a larger base. This advance will require the development of new algorithms which can distinguish one analog from the other as this is not yet possible with the methods used in the current study. The use of two analogs is routinely applied in studies of fork stalling, origin mapping, analysis of replisome speed and direction. The ability to add information on location in the genome on single DNA strands through pore sequencing would greatly facilitate our understanding of the pattern of DNA synthesis in normal and abnormal cells. To this end, we provide a proof of principle that noncanonical bases that can be incorporated into cells during DNA replication can be detected with nanopore sequencing. We chose IdU for analysis, as it performed best in modeling to detect a modified nucleotide when 20% of T’s are substituted. Using IdU labelling through a complete cell cycle, we demonstrate that IdU-substituted mammalian genomic DNA, can be distinguished from control

DNA. A coverage of 200-300 reads per 9-mer was sufficient to observe significant differences. As these k-mers are from different locations of the genome, k-mers from the same read may be combined for the analysis of a labelled DNA strand, thereby amplifying the signal and reducing the coverage requirement. We used this strategy to detect IdU-substitutions on individual DNA molecules and showed that single IdU-containing reads of 0.8-1.0kb can be called with Nanomod when sampling from the top 10% lowest p-values of T-containing k-mers along the read. This

113

result is a significant step forward in the development of applications to study origin usage, replisome direction and stalling, as well as, DNA repair. The feasibility of using nanopore sequencing for such purposes was recently demonstrated by a study which maps origin usage and fork direction in yeast with D-NAscent sequencing [168].

The MinION has been successfully used for the assembly of eukaryotic genomes, such as, yeast [169], C-elegans [170] and recently the with a median coverage of 26x, using

43 flow cells [171]. This demonstrates that nanopore sequencing can be used to analyse a mammalian genome. Furthermore, the PromethION, a high throughput MinION platform can run

48 flow cells simultaneously and generate up to 7.6Tb data during a 72h run. Applying single read analysis on PromethION data will allow the assembly of detailed replication dynamics maps, which inform on origin usage and fork progression on single DNA strands. This is currently not possible with existing methods, which require amplification, resulting in population averages.

Therefore, sequencing DNA replication using nanopores, termed here Replipore sequencing, is a novel technology with significant potential in research and diagnostics.

Cell proliferation plays an important role in many diseases. Abnormalities in DNA replication can result in genome instability, cellular senescence or apoptosis, and are therefore highly relevant to regenerative medicine. Furthermore, DNA replication is cell type specific and dependent on epigenetic principles that also regulate cell type specific gene expression. Due to the fundamental requirement of DNA replication in cell cycle progression, the cell type specificity of the replication program may have an important role in determining limitations in cell proliferation

[172]. Thus, insights about polymerase progression derived from sequencing modified nucleotides through nanopores will likely prove instrumental to our understanding, diagnosis and ultimately treatment of diseases characterized by cell cycle abnormalities.

114

3.6. MATERIALS AND METHODS 3.6.1. Training Template Assembly For training sequence assembly, the pCALNL-GFP vector (Addgene plasmid#13770) was digested with AgeI and NotI to release the E-GFP insert. A double stranded DNA oligomer with

5 nbBbvCI sites was prepared by hybridizing two single-stranded DNA oligos and ligated into the purified empty vector as described in Luzzietti et al.[157]. The product was transformed into E- coli. Nicking and substitution reactions were performed as outlined in [157] with oligos, purchased from Baseclick, Germany. Prior to MinION library prep, all templates were linearized with SpeI

(New England Biolabs).

Top Clone: 5’CCGGGCCTCAGCTTGCCACGACCTCAGCGTCAATTGTCCTCAGCTCAGATGACCTC AGCAGATTGTAGCCTCAGC-3’

Bottom Clone: 5’GGCCGCTGAGGCTACAATCTGCTGAGGTCATCTGAGCTGAGGACAATTGACGCTG AGGTCGTGGCAAGCTGAGGC-3’

Substitute Oligo Single: 5’PhosTGAGGCTACAATCTGCTGAGGTCATCTGAGCXGAGGACAATTGACGCTGAGG TCGTGGCAAGC-3’ X = EdU, CldU, BrdU, IdU

Substitute Oligo Multi: 5’PhosXGAGGCXACAAXCXGAGGXCAXCXGAGCXGAGGACAAXXGACGCXGAGGX CGXGGCAAGC-3’ X = EdU, CldU, IdU

3.6.2. Click Cycloaddition Reactions Click cyclo-addition was performed on single stranded EdU replace oligos prior to the substitution reaction. Briefly, 1ug phosphorylated oligo was incubated with reagents from the

CuAAC Biomolecule Reaction Buffer kit (Jena Bioscience CLK-072) and 250uM azide for 1h at

115

37C. Separate reactions were carried out with the following azides: sodium azide (Sigma S2002),

Nicotinoyl azide (Sigma CDS006775), 6-azido-6-deoxy-D-glucose (Sigma 712760) Biotin azide

(Thermo Fisher B10184), AF488 azide (Thermo Fisher A10266), AF647 azide (Thermo Fisher

A10277), and a ssDNA oligo purchased from Integrated DNA Technologies: 5’-

GGATAGCCTC/3AzideN/-3’. All click products were purified by precipitation.

3.6.3. Dot Blots For dot blots, 100-200ng DNA was denatured with 20mM NaOH for 5min at 99C, cooled on ice and neutralized with 0.6M ammonium acetate. Spotting was performed manually on a

0.45uM nylon membrane (Thermo Fisher 77016). Once fully dried, the membrane was baked in a microwave for 2.5min. Prior to imaging, Biotin-containing blots were stained with A647-

Streptavidin (Thermo Fisher S32357). For this procedure, the baked membrane was incubated with

3% BSA in TBST for 15min, followed by A647 Streptavidin 1:500 (Thermo Fisher S32357) for

45min at room temperature and 3 washes with TBST. To detect CldU or IdU-containing DNA,

500ng-1ug of sample was spotted on the membrane and stained with rat BrdU/CldU 1:500 (Bio

Rad OBT0030S) or mouse BrdU/IdU 1:500 (BD 347580) in 3% BSA in TBST for 1h at room temperature. Staining with mouse ssDNA antibody 1:500 (Millipore MAB3034) was used as a loading control. To visualize the signal, the membrane was incubated with Alexa Fluor 647 and

Alexa Fluro 488 secondary antibodies at 1:500. Following washes, the membranes were imaged with a Biorad ChemiDoc MP Imaging System.

3.6.4. MinION Runs on Synthetic Templates All samples were sequenced on MinION R9 flow cells. Sequencing libraries were prepared with Ligation Sequencing Kit 1D (ONT SQK-LSK108), following manufacturer’s instructions.

Each run was performed with 250-500 ng purified DNA. Typically, a single flow cell was used for multiple samples with washes in between, carried out with the Flow Cell Wash Kit (ONT EXP-

116

WSH002). Fast5 data files from each run were analyzed with NanoMod as outlined in Liu et. al

[160].

3.6.5. MinION Runs with Genomic DNA For genomic DNA runs, mouse embryonic stem cells from a pure C57BL/6J strain

(Jackson LAB Strain # 000664) were incubated with 25uM IdU for 24h. Cells were then harvested and genomic DNA was extracted with a High Pure PCR Template Preparation Kit (Roche

11796828001). MinION runs were performed on R9 with 400-500ng purified DNA per sample following library preparation with Ligation Sequencing Kit 1D (ONT SQK-LSK108). For each run, unlabelled DNA was loaded first and sequenced for 5-6 hours. This was followed by a wash step with the ONT wash kit (ONT EXP-WSH002), priming and loading of IdU-substituted gDNA which was run until the flow cell expired, typically from 8 to 20 hours without voltage adjustments.

Analysis was performed with data from two combined runs. The combined control data set included 719,295 long reads covering ~2.4G bases while the IdU sample was represented by

140,667 long reads across 417M bases.

117

CHAPTER 4: FUTURE DIRECTIONS AND PERSPECTIVES

4.1. BRCA1 IN REPROGRAMMING TO PLURIPOTENCY The current study shows that BRCA1 is required for reprogramming to repair replication- associated DNA damage by homologous recombination. Interestingly, eliminating 53bp1 makes iPS cell generation more efficient, due to the absence of competition between 53bp1 mediated

NHEJ and Brca1-dependent HDR for one-ended DNA breaks.

To rule out the possibility that ablation of 53bp1 interferes with the transcriptional activity of p53 and improves reprogramming by preventing apoptosis [92], we are looking into p21, Bax and Bbc3/Puma induction in wild type and 53bp1-/- cells post irradiation. Our preliminary data shows comparable upregulation of p21 transcript in wild type, 53bp1+/- and 53bp1-/- cells, which shows that 53bp1 does not improve reprogramming efficiency by compromising p53 dependent transcription.

To further strengthen the finding that DNA repair pathway choice is a critical determinant of reprogramming efficiency, we are in the process of stimulating HDR during reprogramming by overexpression of Rad51. We are also determining the homologous recombination efficiency in heterozygous 53bp1 knockout cells. These results aim to establish a correlation between homologous recombination efficiency and reprogramming efficiency.

To confirm that the breaks which limit reprogramming efficiency are one-ended, we have begun experiments where cells of wild type and 53bp1-/- genotypes are irradiated to induce two- ended DSBs or treated with the topoisomerase 1 inhibitor camptothecin (CPT), which generates one-ended DSBs. Our preliminary observations show that 53bp1-/- cells undergoing reprogramming are less sensitive to CTP, compared to wild type controls.

118

Our data points to one-ended DSBs in late S/G2 as the main limitation to reprogramming.

To confirm that the most consequential damage is indeed in late S/G2, we are preparing cell cycle profiles of wild type and Brca1-deficient cells to determine where in the cell cycle these genotypes arrest. In addition, we have derived iPS cell lines from wild type, Brca1tr/tr as well as Brca1tr/tr,

53bp1-/- genotypes, which we have sequenced on the Bionano Genomics platform to determine which regions of the genome are lost during reprogramming. To capture reprogramming-induced copy number variation, we have included two sibling iPS cell clones from the same somatic cell culture for each genotype in the analysis. Our expectation is that Brca1-deficient iPS cells will have copy number losses in late replicating regions, including in common fragile sites.

Surprisingly, γH2AX marked DNA damage does not directly correlate with reprogramming efficiency. As 53BP1 is an important factor in reprogramming, we are currently looking at γH2AX and 53BP1 double positive foci in the context of HDR deficiency, SFP rescue and SFP deficiency. We expect that Brca1tr/tr, Smarcal1-/- cells accumulate 53BP1 foci in numbers similar to the Brca1tr/tr genotype, which results in poor reprogramming, despite restoration of SFP. We expect that 53BP1 foci are a better predictor of reprogramming efficiency than either γH2AX or phospho-RPA (S33).

An intriguing interpretation of our findings is that the function of the genomic regions which are difficult to replicate is to form double strand breaks, which serve as a primary barrier to reprogramming. These one-ended double strand breaks act upstream of tumor suppressors p53, p21 and Rb, which limit reprogramming. Ablation of p53, p21 or Rb increases the efficiency of iPS cell generation. Equally, the deletion of unstable genomic regions reduces the signal upstream of p53, and may be selected for during reprogramming.

119

4.2. DETECTION OF MODIFIED NUCLEOTIDES WITH NANOPORE SEQUENCING The main conclusion of this project is that thymidine analogs, incorporated into nascent

DNA during replication or repair can be detected with nanopore sequencing. Importantly, besides calling of modifications on synthetic templates, we show detection of IdU in mammalian genomic

DNA. This result has wide implications for the development of techniques which track replication fork stalling in a sequence-specific context. To bring the technology closer to this point, we are currently looking at synthetic templates with two different analogs in an attempt to distinguish them from each other.

4.3. TYING GENETIC STABILITY TO CELL IDENTITY Replication of the DNA is a fundamental requirement for progression through the cell cycle and thereby for the development and maintenance of all tissues in the human body. Though all cells duplicate their DNA once before division, they do so in a manner dependent on cell type.

Since replication tends to begin in regions with open chromatin and active genes, about half of the human genome is replicated in a cell type dependent temporal pattern, while the other half is cell type independent [144]. Surprisingly, a functional significance of cell type specific DNA replication has not been established because of fundamental experimental challenges. It is not yet possible to manipulate all cell type specific origins and ask whether a cell can still progress through the cell cycle. Various observations, including feedback between replication progression and origin activity [173] as well as checkpoints monitoring the completion of S-phase before entry into mitosis, suggest that DNA replication is well prepared to duplicate the genetic information of any cell type. According to this interpretation, differences in DNA replication patterns are a mere adaptation to epigenetic changes required to establish cell type specific gene expression. Hence, most research to better understand different cell states has focused on the processes governing

120

gene expression, while cell type specific DNA replication remains a little explored dimension of cellular biology.

A connection between genomic stability and cellular identity is visible in various experimental systems. The conversion of somatic cells to induced pluripotent stem cells is associated with DNA damage and replication stress. In the absence of proteins involved in DNA damage repair, such as Brca1, Brca2 and Rad51, reprogramming fails [24]. Furthermore, cancer cells are widely known to duplicate their genetic material in an aberrant manner, often associated with DNA damage and genomic instability. The transfer of somatic nuclei to eggs leads to segregation errors related to problems with DNA replication [115]. Importantly, since early embryonic cells are the only cell type that does not require gene expression for cell cycle progression, nuclear transfer experiments conclusively show that cell type specific DNA replication provides constraints on the ability to progress through the cell cycle. Consequently, the pattern of DNA replication is not a mere adaption to a cell type specific gene expression program.

Rather, it limits the number of possible cellular states compatible with genetic integrity during cell proliferation. The organization of the genome and the packaging of the DNA in chromatin determine the ability to replicate and also enables cell type specific gene expression patterns, providing a link between cell identity and genetic stability.

The significance of this model to cellular biology is profound. The model predicts that cells with abnormal gene expression states will also have a compromised replication program, resulting in the activation of checkpoints that prevent or slow cell cycle progression (Fig.26). Thereby, checkpoints of genetic integrity are also the checkpoints of cellular identity, enabling continuous selection for cellular fitness in proliferating cells. During growth and development, this ensures the optimization of cell and organ function. During aging, it results in cell cycle arrest of damaged

121

cells. During terminal differentiation, the inactivation of a cell type specific replication program may be used in a physiological manner to permanently halt the proliferation of entire groups of cells. For instance, neurons do not proliferate, and forcing them into DNA replication results in cell death [174]. Cell type specific DNA replication may also be the primary impediment to the growth of cells assuming an abnormal identity. Consistent with this observation, DNA replication stress has been identified as an early barrier to tumorigenesis [175]. Proliferating cancer cells represent the rare gaps in this quality control system, allowing cell cycle progression of some abnormal cellular states, albeit with a compromised replication program and genetic instability.

Most abnormal cell states are incompatible with DNA replication and cell cycle progression would result in lethal chromosomal abnormalities. Perhaps the most important implication of this model is that, if cell type specific DNA replication is a cellular mechanism preventing the proliferation of abnormal cells, it may also be therapeutically relevant.

122

Figure 26. Cell type specific DNA replication programs limit changes in cell identity. A). Proliferating cells blue panel: Cell type transitions during normal development follow along cell type specific replication programs. In contrast, forced changes in cell identity during reprogramming to pluripotency (e.g. fibroblast to iPS cell) or direct conversion to another cell type

(e.g., fibroblast to hepatocyte) undermine replication program integrity and activate cell cycle checkpoints. Our current studies suggest that the consequences of compromised DNA replication program are one-ended DSBs that require repair by homologous recombination. Inefficient repair of these lesions results in activation of tumor-suppressing pathways including p53 and p21.

123

Figure 26. (continued) B). Proliferating cells pink panel: Deviation from a cell type specific replication program compatible with genetic stability (referred to as ‘replication ground state’) occurs during cell transformation. The outcome may be cell death, or genetic instability in proliferating cells. C). Cell cycle exit: The cell type specific replication program is compromised in a growing number of cells during aging, resulting in cellular senescence. Mature neurons lack a replication program compatible with genetic stability and permanently exit the cell cycle.

124

4.4. DNA BREAKS AND CANCER The regulation of cell cycle progression is of central importance to human health. The over- proliferation of normal and particularly abnormal cell types can be detrimental to organ function.

Our studies show that the pathway choice for the repair of a specific class of DNA damage: one- ended double strand breaks is a critical determinant to cell proliferation and acts as a barrier to abnormal cell type transitions. Breaks at fragile sites of the genome and regions that are difficult to replicate curb cell growth by generating an antiproliferative signal, which involves p53, Rb and p21. We show that 53bp1 helps sustain this signal by counteracting homology-dependent DNA repair. Besides suppressing proliferation, p21 and Rb also have epigenetic effects [21], which play a role in limiting cell type transitions. Therefore, DNA damage during reprogramming acts upstream of both cell cycle progression and epigenetic changes.

Similarly, increased replication stress in the first steps of tumorigenesis leads to oncogene- induced senescence and constitutes a barrier to malignant transformation [175, 176]. Since DNA damage frequently occurs at fragile sites, the instability of these regions might act as a tumor suppressor independent of their role in encoding gene products. This barrier to transformation might be overcome if the site generating the signal is eroded through recurrent damage. Indeed, cancer cells often have deletions in late replicating regions [177]. The erosion may be accelerated by mutations in genes required for the integrity of late replicating regions, such as BRCA1 which would explain the early onset of BRCA1 mutant tumors. Thus, DNA damage in late replicating regions, specifically one-ended DSBs, is a cell type-specific growth and transformation suppressor, which may be exploited for therapeutic purposes.

Current approaches to treat solid tumors already take advantage of this principle. Cytotoxic chemotherapies, which damage the DNA, mimic and amplify the signal emanating from late

125

replicating regions. However, while such treatments inhibit cell proliferation and promote apoptosis, they are not specific to genomic location and induce indiscriminate damage. To target exclusively the tumor in an effective manner, one may need to exploit the difference in replication patterns between healthy and transformed cells. Multiple groups have demonstrated that the temporal program of DNA replication differs between cell types, changes during differentiation or reprogramming and is altered in cancer [117, 146, 178]. However, the method portfolio to analyze

DNA replication patterns genome-wide lags far behind the tools to study gene expression as biomedical research has traditionally focused on the products of the genome, rather than the genome itself. Emerging tools which can be used genome-wide are the optical mapping of DNA replication after incorporation of fluorescent nucleotides [179] and the detection of nucleotide analogs through pore sequencing [168, 180]. Nanopore sequencing is particularly suitable for mapping replication patterns as it can provide information on origin usage, replication fork direction, speed, pausing and stalling, all in a sequence-specific context. This information can be assembled into maps which detail replication traffic in healthy and malignant cells (Fig.27). Cancer cell specific replication problems can then be exploited for the development of targeted treatments, which selectively suppress the proliferation of the tumor, without affecting normal cells. The success of such efforts could fundamentally shift cancer care from treatment and management to a cure.

126

Figure 27. Traffic analogy of DNA replication. A). Google maps of typical daily traffic in New York and Boston. In this analogy the two cities represent two different cell types. B). Some areas in each city incur traffic stalling, much like areas in the genome, which are difficult to replicate or are prone to breakage. These signals may be exploited for therapeutic purposes in a specific manner.

127

BIBLIOGRAPHY

1. Miki, Y., et al., A Strong Candidate for the Breast and Ovarian Cancer Susceptibility Gene BRCA1. Science 1994. 266(7): p. 66-71. 2. Roy, R., J. Chun, and S.N. Powell, BRCA1 and BRCA2: different roles in a common pathway of genome protection. Nat Rev Cancer, 2011. 12(1): p. 68-78. 3. Rebbeck, T.R., et al., Association of type and location of BRCA1 and BRCA2 mutations with risk of breast and ovarian cancer. JAMA, 2015. 313(13): p. 1347-61. 4. Sharma, B., et al., BRCA1 mutation spectrum, functions, and therapeutic strategies: The story so far. Curr Probl Cancer, 2018. 42(2): p. 189-207. 5. Karami, F. and P. Mehdipour, A comprehensive focus on global spectrum of BRCA1 and BRCA2 mutations in breast cancer. Biomed Res Int, 2013. 2013: p. 928562. 6. Macedo, G.S., B. Alemar, and P. Ashton-Prolla, Reviewing the characteristics of BRCA and PALB2-related cancers in the precision medicine era. Genet Mol Biol, 2019. 7. Monteiro, A.N.A., BRCA1: the enigma of tissue-specific tumor development. Trends in Genetics, 2003. 19(6): p. 312-315. 8. Hall, J., et al., Linkage of Early-Onset Familial Breast Cancer to Chromosome 17q21. Science 1990. 250(4988): p. 1684-1689. 9. Evers, B. and J. Jonkers, Mouse models of BRCA1 and BRCA2 deficiency: past lessons, current understanding and future prospects. Oncogene, 2006. 25(43): p. 5885-97. 10. Huen, M.S., S.M. Sy, and J. Chen, BRCA1 and its toolbox for the maintenance of genome integrity. Nat Rev Mol Cell Biol, 2010. 11(2): p. 138-48. 11. Shakya, R., et al., BRCA1 Tumor Suppression Depends on BRCT Phosphoprotein Binding, But Not Its E3 Ligase Activity. Science 2011. 334(28): p. 525-527. 12. Narod, S.A. and W.D. Foulkes, BRCA1 and BRCA2: 1994 and beyond. Nat Rev Cancer, 2004. 4(9): p. 665-76. 13. Xu, B., et al., Phosphorylation of Serine 1387 in Brca1 Is Specifically Required for the Atm- mediated S-Phase Checkpoint after Ionizing Irradiation. CANCER RESEARCH 2002. 62: p. 4588– 4591. 14. Cortez, D., et al., Requirement of ATM-Dependent Phosphorylation of Brca1 in the DNA Damage Response to Double-Strand Breaks. SCIENCE, 1999 286: p. 1162-1166. 15. Zhang, J., et al., Chk2 phosphorylation of BRCA1 regulates DNA double-strand break repair. Mol Cell Biol, 2004. 24(2): p. 708-18. 16. Moynahan, M.E. and M. Jasin, Mitotic homologous recombination maintains genomic stability and suppresses tumorigenesis. Nat Rev Mol Cell Biol, 2010. 11(3): p. 196-207. 17. Castillo, A., et al., The BRCA1-interacting protein Abraxas is required for genomic stability and tumor suppression. Cell Rep, 2014. 8(3): p. 807-17. 18. Peng, M., et al., BACH1 is a DNA repair protein supporting BRCA1 damage response. Oncogene, 2006. 25(15): p. 2245-53. 19. Yun, M. and K. Hiom, CtIP-BRCA1 modulates the choice of DNA double-strand-break repair pathway throughout the cell cycle. Nature, 2009. 459(Iss. 7245): p. 460-463. 20. Pao, G., et al., CBP/p300 interact with and function as transcriptional coactivators of BRCA1. PNAS, 2000. 97(3): p. 1020-1025. 21. Kareta, M.S., et al., Inhibition of pluripotency networks by the Rb tumor suppressor restricts reprogramming and tumorigenesis. Cell Stem Cell, 2015. 16(1): p. 39-50. 22. Takahashi, K. and S. Yamanaka, Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell, 2006. 126(4): p. 663-76.

128

23. Lee, J.Y., et al., Rad51 Regulates Reprogramming Efficiency through DNA Repair Pathway. Dev Reprod, 2016. 20(2): p. 163-9. 24. Gonzalez, F., et al., Homologous recombination DNA repair genes play a critical role in reprogramming to a pluripotent state. Cell Rep, 2013. 3(3): p. 651-60. 25. Gomez-Cabello, D., et al., CtIP-Specific Roles during Cell Reprogramming Have Long-Term Consequences in the Survival and Fitness of Induced Pluripotent Stem Cells. Stem Cell Reports, 2017. 8(2): p. 432-445. 26. Chen, C.C., et al., Homology-Directed Repair and the Role of BRCA1, BRCA2, and Related Proteins in Genome Integrity and Cancer. Annu Rev Cancer Biol, 2018. 2: p. 313-336. 27. Moynahan, M.E., et al., Brca1 Controls Homology-Directed DNA Repair. Molecular Cell, 1999. 4(511-518). 28. Aparicio, T., R. Baer, and J. Gautier, DNA double-strand break repair pathway choice and cancer. DNA Repair (Amst), 2014. 19: p. 169-75. 29. Aguilera, A. and B. Gomez-Gonzalez, Genome instability: a mechanistic view of its causes and consequences. Nat Rev Genet, 2008. 9(3): p. 204-17. 30. Cannan, W.J. and D.S. Pederson, Mechanisms and Consequences of Double-Strand DNA Break Formation in Chromatin. J Cell Physiol, 2016. 231(1): p. 3-14. 31. Brandsma, I. and D. van Gent, Pathway choice in DNA double strand break repair: observations of a balancing act. Genome Integrity 2012. 3(9). 32. Isono, M., et al., BRCA1 Directs the Repair Pathway to Homologous Recombination by Promoting 53BP1 Dephosphorylation. Cell Rep, 2017. 18(2): p. 520-532. 33. Cao, L., et al., A selective requirement for 53BP1 in the biological response to genomic instability induced by Brca1 deficiency. Mol Cell, 2009. 35(4): p. 534-41. 34. Ludwig, T., et al., Tumorigenesis in mice carrying a truncating Brca1 mutation. Genes Dev, 2001. 15(10): p. 1188-93. 35. Bunting, S.F., et al., 53BP1 inhibits homologous recombination in Brca1-deficient cells by blocking resection of DNA breaks. Cell, 2010. 141(2): p. 243-54. 36. Zhang, F., et al., PALB2 links BRCA1 and BRCA2 in the DNA-damage response. Curr Biol, 2009. 19(6): p. 524-9. 37. Gerhardt, J., et al., Stalled DNA Replication Forks at the Endogenous GAA Repeats Drive Repeat Expansion in Friedreich's Ataxia Cells. Cell Rep, 2016. 16(5): p. 1218-1227. 38. Felipe-Abrio, I., et al., RNA polymerase II contributes to preventing transcription-mediated replication fork stalls. EMBO J, 2015. 34(2): p. 236-50. 39. Bester, A.C., et al., Nucleotide deficiency promotes genomic instability in early stages of cancer development. Cell, 2011. 145(3): p. 435-46. 40. Ciccia, A., et al., The SIOD disorder protein SMARCAL1 is an RPA-interacting protein involved in replication fork restart. Genes Dev, 2009. 23(20): p. 2415-25. 41. Neelsen, K.J. and M. Lopes, Replication fork reversal in eukaryotes: from dead end to dynamic response. Nat Rev Mol Cell Biol, 2015. 16(4): p. 207-20. 42. Taglialatela, A., et al., Restoration of Replication Fork Stability in BRCA1- and BRCA2- Deficient Cells by Inactivation of SNF2-Family Fork Remodelers. Mol Cell, 2017. 68(2): p. 414- 430 e8. 43. Kolinjivadi, A.M., et al., Smarcal1-Mediated Fork Reversal Triggers Mre11-Dependent Degradation of Nascent DNA in the Absence of Brca2 and Stable Rad51 Nucleofilaments. Mol Cell, 2017. 67(5): p. 867-881 e7. 44. Anderson, S., et al., BRCA1 protein is linked to the RNA polymerase II holoenzyme complex via RNA helicase A. Nature Genetics volume 1998. 19: p. 254-256.

129

45. Scully, R., et al., BRCA1 is a component of the RNA polymerase II holoenzyme. Proc. Natl. Acad. Sci. , 1997. 94: p. 5605–5610. 46. Krum, S.A., et al., BRCA1 associates with processive RNA polymerase II. J Biol Chem, 2003. 278(52): p. 52012-20. 47. Fan, S., et al., BRCA1 Inhibition of Estrogen Receptor Signaling in Transfected Cells. Science, 1999. 284(5418): p. 1354-1356. 48. Wang, Q., et al., BRCA1 binds c-Myc and inhibits its transcriptional and transforming activity in cells. Oncogene, 1998. 17: p. 1939-1948. 49. Zhang, H., et al., BRCA1 physically associates with p53 and stimulates its transcriptional activity. Oncogene, 1998. 16: p. 1713-1721. 50. Yarden, R. and L. Brody, BRCA1 interacts with components of the histone deacetylase complex. Proc. Natl. Acad. Sci., 1999. 96: p. 4983–4988,. 51. Bochar, D., et al., BRCA1 Is Associated with a Human SWI/SNF-Related Complex: Linking Chromatin Remodeling to Breast Cancer. Cell, 2000. 102: p. 257–265. 52. Cantor, S., et al., BACH1, a Novel Helicase-like Protein, Interacts Directly with BRCA1 and Contributes to Its DNA Repair Function. Cell, 2001. 105 p. 149–160. 53. Arlt, M.F., et al., BRCA1 is required for common-fragile-site stability via its G2/M checkpoint function. Mol Cell Biol, 2004. 24(15): p. 6701-9. 54. Yarden, R.I., et al., BRCA1 regulates the G2/M checkpoint by activating Chk1 kinase upon DNA damage. Nat Genet, 2002. 30(3): p. 285-9. 55. Kim, S.S., et al., Uterus hyperplasia and increased carcinogen-induced tumorigenesis in mice carrying a targeted mutation of the Chk2 phosphorylation site in Brca1. Mol Cell Biol, 2004. 24(21): p. 9498-507. 56. Wu, L., et al., Identification of a RING protein that can interact in vivo with the BRCA1 gene product. Nature Genetics 1996. 14(4): p. 430-440. 57. Joukov, V., et al., Functional communication between endogenous BRCA1 and its partner, BARD1, during Xenopus laevis development. PNAS, 2001. 98(21): p. 12078–12083. 58. Li, M. and X. Yu, Function of BRCA1 in the DNA damage response is mediated by ADP- ribosylation. Cancer Cell, 2013. 23(5): p. 693-704. 59. McCarthy, E.E., et al., Loss of Bard1, the heterodimeric partner of the Brca1 tumor suppressor, results in early embryonic lethality and chromosomal instability. Mol Cell Biol, 2003. 23(14): p. 5056-63. 60. Shakya, R., et al., The basal-like mammary carcinomas induced by Brca1 or Bard1 inactivation implicate the BRCA1/BARD1 heterodimer in tumor suppression. PNAS 2008. 105(19): p. 7040–7045. 61. De Brakeleer, S., et al., Frequent incidence of BARD1-truncating mutations in germline DNA from triple-negative breast cancer patients. Clin Genet, 2016. 89(3): p. 336-40. 62. Billing, D., et al., The BRCT Domains of the BRCA1 and BARD1 Tumor Suppressors Differentially Regulate Homology-Directed Repair and Stalled Fork Protection. Mol Cell, 2018. 72(1): p. 127-139 e8. 63. Gupta, R., et al., FANCJ (BACH1) helicase forms DNAdamage inducible foci with replication proteinAand interacts physically and functionally with the single-stranded DNA-binding protein. . Blood, 2019. 110(7): p. 2390-2398. 64. Vassin, V.M., et al., Human RPA phosphorylation by ATR stimulates DNA synthesis and prevents ssDNA accumulation during DNA-replication stress. J Cell Sci, 2009. 122(Pt 22): p. 4070- 80.

130

65. Murphy, A.K., et al., Phosphorylated RPA recruits PALB2 to stalled DNA replication forks to facilitate fork recovery. J Cell Biol, 2014. 206(4): p. 493-507. 66. Burma, S., et al., ATM phosphorylates histone H2AX in response to DNA double-strand breaks. J Biol Chem, 2001. 276(45): p. 42462-7. 67. Ward, I.M. and J. Chen, Histone H2AX is phosphorylated in an ATR-dependent manner in response to replicational stress. J Biol Chem, 2001. 276(51): p. 47759-62. 68. Rogakou, E.P., et al., DNA Double-stranded Breaks Induce Histone H2AX Phosphorylation on Serine 139. . The Journal of Biological Chemistry, 1997. 273(10): p. 5858–5868. 69. Kobayashi, J., Molecular Mechanism of the Recruitment of NBS1/hMRE11/hRAD50 Complex to DNA Double-strand Breaks: NBS1 Binds to γ -H2AX through FHA/BRCT Domain. . J. Radiat. Res. , 2004. 45: p. 473–478. 70. Hustedt, N. and D. Durocher, The control of DNA repair by the cell cycle. Nat Cell Biol, 2016. 19(1): p. 1-9. 71. Mao, Z., et al., Comparison of nonhomologous end joining and homologous recombination in human cells. DNA Repair (Amst), 2008. 7(10): p. 1765-71. 72. Lee, J.S., et al., hCds1-mediated phosphorylation of BRCA1 regulates the DNA damage response. Nature, 2000. 404(6774): p. 201-204. 73. Feng, L., et al., Cell cycle-dependent inhibition of 53BP1 signaling by BRCA1. Cell Discov, 2015. 1: p. 15019. 74. Sartori, A.A., et al., Human CtIP promotes DNA end resection. Nature, 2007. 450(7169): p. 509-14. 75. Nimonkar, A.V., et al., BLM-DNA2-RPA-MRN and EXO1-BLM-RPA-MRN constitute two DNA end resection machineries for human DNA break repair. Genes Dev, 2011. 25(4): p. 350-62. 76. Jensen, R.B., A. Carreira, and S.C. Kowalczykowski, Purified human BRCA2 stimulates RAD51-mediated recombination. Nature, 2010. 467(7316): p. 678-83. 77. Liu, J., et al., Human BRCA2 protein promotes RAD51 filament formation on RPA-covered single-stranded DNA. Nat Struct Mol Biol, 2010. 17(10): p. 1260-2. 78. Sung, P. and H. Klein, Mechanism of homologous recombination: mediators and helicases take on regulatory functions. Nat Rev Mol Cell Biol, 2006. 7(10): p. 739-50. 79. Zimmermann, M., et al., 53BP1 Regulates DSB Repair Using Rif1 to Control 5′ End Resection. Science 2013. 339 p. 700-704. 80. Escribano-Diaz, C., et al., A cell cycle-dependent regulatory circuit composed of 53BP1- RIF1 and BRCA1-CtIP controls DNA repair pathway choice. Mol Cell, 2013. 49(5): p. 872-83. 81. DeFazio, L., et al., Synapsis of DNA ends by DNA-dependent protein kinase. EMBO J, 2002. 21(12): p. 21390-3200. 82. Ma, Y., K. Schwarz, and M.R. Lieber, The Artemis:DNA-PKcs endonuclease cleaves DNA loops, flaps, and gaps. DNA Repair (Amst), 2005. 4(7): p. 845-51. 83. Grawunder, U., et al., Activity of DNAligase IV stimulatedbycomplex formationwithXRCC4 protein inmammalian cells. NATURE 1997. 388: p. 492-495. 84. Tsai, C., S. Kim, and G. Chu, Cernunnos/XLF promotes the ligation of mismatched and noncohesive DNA ends. PNAS 2007. 104(19): p. 7851–7856. 85. Ward, I.M., et al., p53 Binding protein 53BP1 is required for DNA damage responses and tumor suppression in mice. Mol Cell Biol, 2003. 23(7): p. 2556-63. 86. Panier, S. and S.J. Boulton, Double-strand break repair: 53BP1 comes into focus. Nat Rev Mol Cell Biol, 2014. 15(1): p. 7-18. 87. Silverman, J., et al., Human Rif1, ortholog of a yeast telomeric protein, is regulated by ATM and 53BP1 and functions in the S-phase checkpoint. Genes Dev, 2004. 18(17): p. 2108-19.

131

88. Munoz, I.M., et al., Phospho-epitope binding by the BRCT domains of hPTIP controls multiple aspects of the cellular response to DNA damage. Nucleic Acids Res, 2007. 35(16): p. 5312-22. 89. Jaspers, J.E., et al., Loss of 53BP1 causes PARP inhibitor resistance in Brca1-mutated mouse mammary tumors. Cancer Discov, 2013. 3(1): p. 68-81. 90. DiTullio, R.A., Jr., et al., 53BP1 functions in an ATM-dependent checkpoint pathway that is constitutively activated in human cancer. Nat Cell Biol, 2002. 4(12): p. 998-1002. 91. Lee, J.H., et al., 53BP1 promotes ATM activity through direct interactions with the MRN complex. EMBO J, 2010. 29(3): p. 574-85. 92. Cuella-Martin, R., et al., 53BP1 Integrates DNA Repair and p53-Dependent Cell Fate Decisions via Distinct Mechanisms. Mol Cell, 2016. 64(1): p. 51-64. 93. Ward, I.M., et al., 53BP1 cooperates with p53 and functions as a haploinsufficient tumor suppressor in mice. Mol Cell Biol, 2005. 25(22): p. 10079-86. 94. Callen, E., et al., 53BP1 mediates productive and mutagenic DNA repair through distinct phosphoprotein interactions. Cell, 2013. 153(6): p. 1266-80. 95. Pasero, P. and A. Vindigni, Nucleases Acting at Stalled Forks: How to Reboot the Replication Program with a Few Shortcuts. Annu Rev Genet, 2017. 51: p. 477-499. 96. Schmid, J.A., et al., Histone Ubiquitination by the DNA Damage Response Is Required for Efficient DNA Replication in Unperturbed . Mol Cell, 2018. 71(6): p. 897-910 e8. 97. Yeeles, J.T., et al., Rescuing stalled or damaged replication forks. Cold Spring Harb Perspect Biol, 2013. 5(5): p. a012815. 98. Karow, J., et al., The Bloom’s syndrome gene product promotes branch migration of Holliday junctions. PNAS, 2000. 97(12): p. 6504–6508. 99. Constantinou, A., et al., Werner's syndrome protein (WRN) migrates Holliday junctions and co-localizes with RPA upon replication arrest. EMBO Rep, 2000. 1(1): p. 80-84. 100. Gallo-Fernandez, M., et al., Cell cycle-dependent regulation of the nuclease activity of Mus81-Eme1/Mms4. Nucleic Acids Res, 2012. 40(17): p. 8325-35. 101. Duda, H., et al., A Mechanism for Controlled Breakage of Under-replicated during Mitosis. Dev Cell, 2016. 39(6): p. 740-755. 102. Dominguez-Kelly, R., et al., Wee1 controls genomic stability during replication by regulating the Mus81-Eme1 endonuclease. J Cell Biol, 2011. 194(4): p. 567-79. 103. Harvey, S.L., et al., Cdk1-dependent regulation of the mitotic inhibitor Wee1. Cell, 2005. 122(3): p. 407-20. 104. Xu, Y., et al., 53BP1 and BRCA1 control pathway choice for stalled replication restart. Elife, 2017. 6. 105. Lugli, N., S.K. Sotiriou, and T.D. Halazonetis, The role of SMARCAL1 in replication fork stability and telomere maintenance. DNA Repair (Amst), 2017. 56: p. 129-134. 106. Baradaran-Heravi, A., et al., SMARCAL1 deficiency predisposes to non-Hodgkin lymphoma and hypersensitivity to genotoxic agents in vivo. Am J Med Genet A, 2012. 158A(9): p. 2204-13. 107. Yusufzai, T. and J.T. Kadonaga, HARP Is an ATP-Driven Annealing Helicase. Science 2008. 322(5902): p. 748-750. 108. Poole, L.A., et al., SMARCAL1 maintains telomere integrity during DNA replication. Proc Natl Acad Sci U S A, 2015. 112(48): p. 14864-9. 109. Gonzalez, F., S. Boue, and J.C. Izpisua Belmonte, Methods for making induced pluripotent stem cells: reprogramming a la carte. Nat Rev Genet, 2011. 12(4): p. 231-42. 110. Hou, P., et al., Pluripotent Stem Cells Induced from Mouse Somatic Cells by Small-Molecule Compounds. Science 2013. 341(6146): p. 651-654.

132

111. Polo, J.M., et al., A molecular roadmap of reprogramming somatic cells into iPS cells. Cell, 2012. 151(7): p. 1617-32. 112. Buganim, Y., D.A. Faddah, and R. Jaenisch, Mechanisms and models of somatic cell reprogramming. Nat Rev Genet, 2013. 14(6): p. 427-39. 113. Nakagawa, M., et al., Generation of induced pluripotent stem cells without Myc from mouse and human fibroblasts. Nature Biotechnology, 2008. 26(1). 114. Srinivasan, S.V., et al., Cdc45 is a critical effector of myc-dependent DNA replication stress. Cell Rep, 2013. 3(5): p. 1629-39. 115. Chia, G., et al., Genomic instability during reprogramming by nuclear transfer is DNA replication dependent. Nat Cell Biol, 2017. 19(4): p. 282-291. 116. Ruiz, S., et al., Limiting replication stress during somatic cell reprogramming reduces genomic instability in induced pluripotent stem cells. Nat Commun, 2015. 6: p. 8036. 117. Lu, J., et al., The distribution of genomic variations in human iPSCs is related to replication- timing reorganization during reprogramming. Cell Rep, 2014. 7(1): p. 70-8. 118. Doege, C.A., et al., Early-stage epigenetic modification during somatic cell reprogramming by Parp1 and Tet2. Nature, 2012. 488(7413): p. 652-5. 119. Muller, L.U., et al., Overcoming reprogramming resistance of Fanconi anemia cells. Blood, 2012. 119(23): p. 5449-57. 120. Kinoshita, T., et al., Ataxia-telangiectasia mutated (ATM) deficiency decreases reprogramming efficiency and leads to genomic instability in iPS cells. Biochem Biophys Res Commun, 2011. 407(2): p. 321-6. 121. Kawamura, T., et al., Linking the p53 tumour suppressor pathway to somatic cell reprogramming. Nature, 2009. 460: p. 1140-1144. 122. Hong, H., et al., Suppression of Indued Pluripotent Stem Cell Generation by the p53-p21 Pathway. Nature 2009. 460(27): p. 1132-1134. 123. Raya, A., et al., Disease-corrected haematopoietic progenitors from Fanconi anaemia induced pluripotent stem cells. Nature 2009. 460(7251): p. 53-59. 124. Utikal, J., et al., Immortalization eliminates a roadblock during cellular reprogramming into iPS cells. Nature, 2009. 460(7259): p. 1145-8. 125. Moynahan, M.E., et al., Brca1 controls homology-directed DNA repair. Mol Cell, 1999. 4(4): p. 511-8. 126. Scully, R., et al., Association of BRCA1 with Rad51 in mitotic and meiotic cells. Cell, 1997. 88(2): p. 265-75. 127. Pathania, S., et al., BRCA1 haploinsufficiency for replication stress suppression in primary cells. Nat Commun, 2014. 5: p. 5496. 128. Canny, M.D., et al., Inhibition of 53BP1 favors homology-dependent DNA repair and increases CRISPR-Cas9 genome-editing efficiency. Nat Biotechnol, 2018. 36(1): p. 95-102. 129. Mateos-Gomez, P.A., et al., The helicase domain of Poltheta counteracts RPA to promote alt-NHEJ. Nat Struct Mol Biol, 2017. 24(12): p. 1116-1123. 130. Marion, R.M., et al., A p53-mediated DNA damage response limits reprogramming to ensure iPS cell genomic integrity. Nature, 2009. 460(7259): p. 1149-53. 131. Chia, G., et al., Genomic instability during reprogramming by nuclear transfer is DNA replication dependent. Nat Cell Biol, 2017. 132. Yu, X. and J. Chen, DNA damage-induced cell cycle checkpoint control requires CtIP, a phosphorylation-dependent binding partner of BRCA1 C-terminal domains. Mol Cell Biol, 2004. 24(21): p. 9478-86. 133. Letessier, A., et al., Cell-type-specific replication initiation programs set fragility of the FRA3B fragile site. Nature, 2011. 470(7332): p. 120-3.

133

134. Pasi, C.E., et al., Genomic instability in induced stem cells. Cell Death Differ, 2011. 18(5): p. 745-53. 135. Hussein, S.M., et al., Copy number variation and selection during reprogramming to pluripotency. Nature, 2011. 471(7336): p. 58-62. 136. Ben-Porath, I., et al., An embryonic stem cell-like gene expression signature in poorly differentiated aggressive human tumors. Nat Genet, 2008. 40(5): p. 499-507. 137. Utikal, J., et al., Immortalization eliminates a roadblock during cellular reprogramming into iPS cells. Nature, 2009. 460(7259): p. Nature. 2009 Aug 27;460(7259):1145-8. doi: 10.1038/nature08285. Epub 2009 Aug 9. 138. Moreno, A., et al., Unreplicated DNA remaining from unperturbed S phases passes through mitosis for resolution in daughter cells. Proc Natl Acad Sci U S A, 2016. 113(39): p. E5757-64. 139. Lezaja, A. and M. Altmeyer, Inherited DNA lesions determine G1 duration in the next cell cycle. Cell Cycle, 2018. 17(1): p. 24-32. 140. Moser, J., et al., Control of the Restriction Point by Rb and p21. Proc Natl Acad Sci U S A, 2018. 115(35): p. E8219-e8227. 141. Feng, W. and M. Jasin, BRCA2 suppresses replication stress-induced mitotic and G1 abnormalities through homologous recombination. Nat Commun, 2017. 8(525). 142. Bouwman, P., et al., 53BP1 loss rescues BRCA1 deficiency and is associated with triple- negative and BRCA-mutated breast cancers. Nat Struct Mol Biol, 2010. 17(6): p. 688-95. 143. Terret, M.E., et al., Cohesin acetylation speeds the replication fork. Nature, 2009. 462(7270): p. 231-4. 144. Hansen, R.S., et al., Sequencing newly replicated DNA reveals widespread plasticity in human replication timing. Proc Natl Acad Sci U S A, 2010. 107(1): p. 139-44. 145. Leslie Smith, A.P., and Mathew Thayer, Delayed replication timing leads to delayed mitotic chromosome condensation and chromosomal instability of chromosome translocations. Proceedings of the National Academy of Sciences of the United States of America, 2001. 98(23): p. 13300-13305. 146. Hiratani, I., et al., Global reorganization of replication domains during embryonic stem cell differentiation. PLoS Biol, 2008. 6(10): p. e245. 147. Pope, B.D., et al., Topologically associating domains are stable units of replication-timing regulation. Nature, 2014. 515(7527): p. 402-5. 148. Amiel, l.K., Tania; Fishman, Amiram; Gaber, Elena; Klein, Zvi; Beyth, Yoram; Fejgin, Moshe D, Replication Pattern of the p53 and 21q22 Loci in the Premalignant and Malignant Stages of Carcinoma of the Cervix-Cancer.pdf>. Cancer 1998. 83(9): p. 1966-1971. 149. Leslie Smith, A.P., and Mathew Thayer, Delayed replication timing leads to delayed mitotic chromosome condensation and chromosomal instability of chromosome translocations. PNAS, 2001. vol. 98 (no. 23): p. 13300–13305 150. Haye-Bertolozzi, J.A., Oscar M., Quantitative Bromodeoxyuridine Immunoprecipitation Analyzed by High-Throughput Sequencing (qBrdU-Seq or QBU). In: Muzi-Falconi M., Brown G. (eds) Genome Instability. Methods in Molecular Biology, 2018. 1672: p. 209-225. 151. McGuffee, S.R., D.J. Smith, and I. Whitehouse, Quantitative, genome-wide analysis of eukaryotic replication initiation and termination. Mol Cell, 2013. 50(1): p. 123-35. 152. Koren, A., et al., Genetic variation in human DNA replication timing. Cell, 2014. 159(5): p. 1015-1026.

134

153. Norio, P., et al., Progressive activation of DNA replication initiation in large domains of the immunoglobulin heavy chain locus during B cell development. Mol Cell, 2005. 20(4): p. 575-87. 154. Jenjaroenpun, P., et al., Complete genomic and transcriptional landscape analysis using third-generation sequencing: a case study of Saccharomyces cerevisiae CEN.PK113-7D. Nucleic Acids Res, 2018. 46(7): p. e38. 155. Goodwin, S., et al., Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome. Genome Res, 2015. 25(11): p. 1750-6. 156. Rand, A.C., et al., Mapping DNA methylation with high-throughput nanopore sequencing. Nat Methods, 2017. 14(4): p. 411-413. 157. Luzzietti, N., et al., Nicking enzyme-based internal labeling of DNA at multiple loci. Nat Protoc, 2012. 7(4): p. 643-53. 158. Liang, L. and D. Astruc, The copper(I)-catalyzed alkyne-azide cycloaddition (CuAAC) “click” reaction and its applications. An overview. Coordination Chemistry Reviews, 2011. 255(23- 24): p. 2933-2945. 159. Hong, V., et al., Analysis and optimization of copper-catalyzed azide-alkyne cycloaddition for bioconjugation. Angew Chem Int Ed Engl, 2009. 48(52): p. 9879-83. 160. Liu, Q., et al., NanoMod: a computational tool to detect DNA modifications using Nanopore long-read sequencing data. bioRxiv, 2018. 161. Russev, G.C.T., R.G., Continuous Labeling of Mammalian DNA in vivo. Analytical Biochemistry 1973(54): p. 115-119. 162. Li-Qun Gu, O.B., Sean Conlan, Stephen Cheley & Hagan Bayley, Stochastic sensingof organic analytesbyapore-forming protein containing amolecularadapter. Nature, 1999. Vol 398 163. Joseph W. F. Robertson, C.G.R., Vincent M. Stanford, Kenneth A. Rubinson, Oleg V. Krasilnikov, and John J. Kasianowicz, Single-molecule mass spectrometry in solution using a solitary nanopore. PNAS 2007 Vol. 104 ( No. 20 ): p. 8207–8211. 164. Abel, G.R., Jr., et al., Measuring and Suppressing the Oxidative Damage to DNA During Cu(I)-Catalyzed Azide-Alkyne Cycloaddition. Bioconjug Chem, 2016. 27(3): p. 698-704. 165. Tera, M., S.M.K. Glasauer, and N.W. Luedtke, In Vivo Incorporation of Azide Groups into DNA by Using Membrane-Permeable Nucleotide Triesters. Chembiochem, 2018. 166. Neef, A.B. and N.W. Luedtke, An azide-modified nucleoside for metabolic labeling of DNA. Chembiochem, 2014. 15(6): p. 789-93. 167. Rieder, U. and N.W. Luedtke, Alkene-tetrazine ligation for imaging cellular DNA. Angew Chem Int Ed Engl, 2014. 53(35): p. 9168-72. 168. Muller, C.A., et al., Capturing the dynamics of genome replication on individual ultra-long nanopore sequence reads. Nat Methods, 2019. 16(5): p. 429-436. 169. Istace, B., et al., de novo assembly and population genomic survey of natural yeast isolates with the Oxford Nanopore MinION sequencer. Gigascience, 2017. 6(2): p. 1-13. 170. Tyson, J.R., et al., MinION-based long-read sequencing and assembly extends the Caenorhabditis elegans reference genome. Genome Res, 2018. 28(2): p. 266-274. 171. Jain, M., et al., Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat Biotechnol, 2018. 36(4): p. 338-345. 172. Georgieva, D. and D. Egli, Tying genetic stability to cell identity. Cell Cycle, 2017. 16. 173. Shechter, D., V. Costanzo, and J. Gautier, ATR and ATM regulate the timing of DNA replication origin firing. Nat Cell Biol, 2004. 6(7): p. 648-55.

135

174. Herrup, K., et al., Divide and die: cell cycle events as triggers of nerve cell death. J Neurosci, 2004. 24(42): p. 9232-9. 175. Bartkova, J., et al., DNA damage response as a candidate anti-cancer barrier in early human tumorigenesis. Nature, 2005. 434,(7035): p. 864-70. 176. Bartkova, J., et al., Oncogene-induced senescence is part of the tumorigenesis barrier imposed by DNA damage checkpoints. Nature, 2006. 444(7119): p. 633-7. 177. Ohta, M., et al., The FHIT gene, spanning the chromosome 3p14.2 fragile site and renal carcinoma-associated t(3;8) breakpoint, is abnormal in digestive tract cancers. Cell, 1996. 84(4): p. 587-597. 178. Donley, N. and M.J. Thayer, DNA replication timing, genome stability and cancer: late and/or delayed DNA replication timing is associated with increased genomic instability. Semin Cancer Biol, 2013. 23(2): p. 80-9. 179. Klein, K., et al., Genome-Wide Identification of Early-Firing Human Replication Origins by Optical Replication Mapping. bioRxiv 2017. 180. Georgieva, D., et al., Detection of Base Analogs Incorporated During DNA Replication by Nanopore Sequencing. bioRxiv, 2019.

136