The Function and Production of EccDNA

Teressa Paulsen Charlottesville, VA

B.S., Brigham Young University, 2012
B.A., University of Utah, 2014

A Dissertation presented to the Graduate Faculty of the University of Virginia in Candidacy for the Degree of Doctor of Philosophy

Department of Biochemistry and Molecular Genetics

University of Virginia
June, 2020

ABSTRACT

A hallmark of oncogenesis is the accumulation of genetic alterations that lead to unregulated cell proliferation and migration. One type of genetic alteration is the excision or copying of DNA from the chromosome during DNA repair, followed by ligation of the non-chromosomal DNA to form extrachromosomal circular DNA (eccDNA). DNA sequences within eccDNA can become greatly amplified through non-Mendelian inheritance, which can change the expression of the sequences they carry, including full coding genes as well as non-coding microRNA and novel si-like RNA. These changes in expression increase the heterogeneity and adaptability of a population of cancer cells.

EccDNA formation occurs through resection-dependent DNA repair pathways, which utilize microhomology-mediated end joining and trigger mismatch repair pathways. Conversely, non-homologous blunt end joining of DNA double-strand breaks on the chromosome represses the formation of eccDNA. The formation of eccDNA is tied to double-strand breaks (DSBs) in the genome and to replication, and eccDNAs are produced most during S-phase. These data suggest that eccDNA forms naturally as a result of DNA breaks during replication. I hypothesize that when DNA at the end of a break is resected, the resulting single-stranded sequence anneals back to itself, or to a sister chromatid, and forms mismatches and looped structures in the DNA that give rise to eccDNA. These observations can help improve cancer treatment strategies by providing targets for decreasing eccDNA formation, thereby decreasing tumor adaptation and growth.

ACKNOWLEDGEMENTS

I am very grateful for all the mentorship and guidance I have received over the course of my PhD program. Dr. Anindya Dutta directed my projects and assisted me through every milestone of the PhD program and in publishing our results. My committee members, Dr. Yuh-Hwa Wang, Dr. Marty Mayo, Dr. Jeff Smith, and Dr. James Larner, have all significantly helped develop my project and have broadened and deepened my work through insightful questions and ideas. Dr. Yuh-Hwa Wang also mentored me directly in the bench preparation of samples for electron microscopy imaging. Dr. Tarek Abbas and Rebeka Eki helped our project by contributing knockout cell lines, which were used to analyze the genes involved in eccDNA formation. I am very grateful for all the assistance I received and for all the guidance that drove the project forward.

Further, I am grateful for the mentorship I received as an undergraduate research assistant. I would like to express my deep appreciation to Dr. Simon Titen, who mentored me and inspired me by investing in my progress as an undergraduate and future scientist. I would like to offer my special thanks to Dr. Mario Capecchi, at the University of Utah, and Dr. Gregory Burton, at Brigham Young University, who allowed me to work in their labs and help develop their projects.

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

CHAPTER I: Extrachromosomal circles of DNA in eukaryotes

Introduction

Discussion

CHAPTER II: Small extrachromosomal circular DNA produce short regulatory RNAs that suppress gene expression independent of canonical promoters

Introduction

Materials and Methods

Results

Discussion

CHAPTER III: DNA repair of double-strand breaks by end-resection and homology dependent repair promote eccDNA formation

Introduction

Materials and Methods

Results

Discussion

CHAPTER IV: Future directions

APPENDIX: Contributions to other published works

LIST OF FIGURES

Figure 1-1 Junctional tag: Schematic representation of eccDNA and junctional sequence genesis from linear DNA

Figure 1-2 Examples of how eccDNA is formed: Replication slippage creates a loop on the template strand through mis-priming of a dissociated polymerase at the wrong direct repeat.

Figure 1-3 Functions of eccDNA in mammalian cells

Figure 2-1 In vitro transcription of artificial microDNA

Figure 2-2 Transcription of artificial microDNA carrying microRNA in vivo

Figure 2-3 Transfection of artificial microDNA carrying pre-microRNA sequences decreases expression of a co-transfected Renilla luciferase reporter containing a sequence complementary to the microRNA sequence within its 3’ UTR

Figure 2-4 Formation of si-like or sh-like RNA from microDNA containing an exonic sequence

Figure 2-5 Efficiency of pull-down of RNA polymerase complex by HT

Figure 2-6 Subunits of PolII and PolIII bind to microDNA

Supplemental Figure 2-1 Circular microDNA mimic molecules of all topologies are transcribed by RNA polymerases in HeLa nuclear extract in vitro transcription assay

Supplemental Figure 2-2 The in vitro transcription of microDNA by RNA polymerases within HeLa Nuclear Extract is verified with two other microDNA sequences

Supplemental Figure 2-3 Distribution of transcripts from microDNA is not random

Supplemental Figure 2-4 RNA transcribed from circular microDNA specifically and the repression of gene expression by RNA transcripts arising from microDNA is sequence-specific

Supplemental Figure 2-5 Diagram of the luciferase reporter assay used to test whether endogenous microDNA have the ability to repress gene expression

Supplemental Figure 2-6 Percentage of the genome that give rise to microDNA

Figure 3-1 EccDNA formation is induced by disruptions to DNA structure: (A) Assay developed to quantify eccDNA

Figure 3-2 EccDNA formation is suppressed by c-NHEJ and increased by alt-NHEJ pathways

Figure 3-3 After DSB, cells lacking c-NHEJ have more eccDNA and cells lacking functional MMEJ and resection have fewer eccDNA

Figure 3-4 EccDNA formation is increased in S-phase, G2-phase, and M-phase of the cell cycle

Supplemental Figure 3-1 Propidium Iodide FACS profiles to show cell-cycle profile of HeLa cells

Figure 4-1 DSB induces long eccDNAs as detected by metaphase spread of ES2 ovarian cancer cells

Figure 4-2 DSB induces long eccDNAs and this is repressed by inhibition of MMEJ by the PARP inhibitor AZD2461

Figure 4-3 Levels of transfected eccDNA remaining after transfection at time indicated in 293T, U2OS and HeLa cells

Figure 4-4 EM quantifications of eccDNA

LIST OF TABLES

Table 1-1 Size range of circular DNA in eukaryotes

Table 2-1 MicroDNA diversity associated with HT, and HT fused with indicated RNA polymerase subunits

Supplemental Table 2-1 Artificial microDNA were created using sequences of endogenous microDNA found in previous studies (Shibata 2012, Dillon 2015) which overlapped with microRNA

Supplemental Table 3-1 Chromosomal loci interrogated by inverse PCR

Supplemental Table 3-2 Inverse PCR primers to detect eccDNA formed from near a CRISPR-Cas9 induced double strand break

Chapter 1

Extrachromosomal circles of DNA in eukaryotes

The first part of this chapter (pages 2-15) is taken from the review cited below. I wrote the rest of the chapter (pages 16-22) specifically for this thesis:

Paulsen T, Kumar P, Koseoglu MM, Dutta A. Discoveries of Extrachromosomal Circles of DNA in Normal and Tumor Cells. Trends Genet. 2018;34(4):270-278. doi:10.1016/j.tig.2017.12.010

In this review, I wrote the first section, which covers eccDNA discoveries made before next-generation sequencing, including those made by Sanger sequencing and electron microscopy. I also covered the characteristics of eccDNAs and the implications of these characteristics for their mechanism of formation.

P. Kumar wrote the section on next-generation sequencing and the newer information about eccDNA hotspots and common features of eccDNAs between organisms.

M. Koseoglu wrote sections about some known functions of eccDNA including the amplification of genes that confer drug resistance.

A. Dutta managed and edited the manuscript.

ABSTRACT

While the vast majority of cellular DNA in eukaryotes is contained in long linear strands in chromosomes, we have long recognized some exceptions, like mitochondrial DNA, plasmids in yeasts and double minutes in cancer cells, where the DNA is present in extrachromosomal circles. In addition, specialized extrachromosomal circles of DNA (eccDNA) have been noted to arise from repetitive genomic sequences like telomeric DNA or ribosomal DNA. Recently, eccDNA arising from unique (non-repetitive) DNA have been discovered in normal and malignant cells, raising interesting questions about their biogenesis, function and clinical utility. Here we will first review recent results and future directions of inquiry on these new forms of eccDNAs and then introduce the specific work done in this thesis.

History of extrachromosomal circular DNA in eukaryotes

EccDNA was first found in 1964 by Alix Bassel and Yasuo Hotta while they were investigating Franklin Stahl’s theory that the chromosomes of higher organisms are made of a series of DNA circles.1 They found DNA circles of various sizes, ranging from hundreds to thousands of base pairs, within a preparation of mammalian DNA. These large DNA circles are referred to as Double Minutes (DMs). Another group verified the existence of DMs in human cancer cells by karyotype preparations and by CsCl gradient purification.2,3 CsCl gradient purification and EM imaging experiments were used to study DNA from a number of other organisms.4-10

Southern blots determined that eccDNA molecules were homologous to genomic DNA. The majority of eccDNA were <500 base pairs and were named small polydisperse circular DNA (spcDNA).11-15 Although the vast majority appeared to come from repetitive sequences, a few spcDNA molecules hybridized to unique sequences. Some groups found that some non-repetitive spcDNA sequences were flanked on both sides by direct repeats with an average length of 9-11 bp.13-15 This suggested that DNA repair pathways such as homologous recombination (HR) or microhomology-mediated end joining (MMEJ) between short repeats could generate the circles. However, later studies isolated and sequenced eccDNA with unique sequences that do not contain repetitive regions of any length within or flanking the DNA.16 Around the same time, a group used exonuclease III to quantify eccDNA and found that eccDNA levels vary between tissues in mice.17

Using techniques that mostly detected eccDNA from repetitive sequences, a few groups attempted to determine which processes contribute to the formation of eccDNA. Cycloheximide, an inhibitor of protein synthesis, caused a 70-fold increase of eccDNA in murine cells;14 increases were also seen with a carcinogen, 7,12-dimethylbenz[a]anthracene, and a DNA replication inhibitor, hydroxyurea.14 Cells from patients with Fanconi anemia, who have defects in a specific DNA repair pathway, contained longer and more numerous eccDNA molecules.12 Two-dimensional gel electrophoresis extended the exploration of the smaller eccDNA, showing that carcinogens increase eccDNA levels18 and that eccDNA production changes with developmental stage in frogs and flies.19-20 EccDNA could be formed from foreign DNA introduced into a cell,21 and the organization of DNA sequences in tandem repeats predisposed the DNA to eccDNA formation.21-23 Collectively, these results suggested that eccDNA formation is dependent on DNA sequence, DNA organization and DNA damage repair.

Regarding the function of eccDNA, sequencing of larger eccDNA molecules (DMs) indicated their capacity to code for full genes, and indeed eccDNA molecules in cancer cells were found to contain amplified oncogenes and drug resistance genes.18,24-26

Recent advances in eccDNA

Recently, paired-end high-throughput sequencing was used to sequence the full complement of eccDNA in mouse tissues and human cancer cell lines.27 High-throughput sequencing of exonuclease-resistant, rolling-circle-amplified, extrachromosomal cellular DNA, followed by a computational method to identify junctional sequences (Figure 1-1), allowed the characterization of eccDNA sequences at base-pair resolution.27 The sizes of unique (from non-repetitive DNA) eccDNA peaked around 180 and 380 bp, with 5% of the molecules extending to as long as 2-3 kb; the pattern was consistent across all mouse tissues and human cancer cell lines. These molecules were named microDNA. Longer eccDNA could have been under-represented in this study because rolling circle amplification amplifies smaller circles at a greater rate than larger circles, but electron microscopy of the circles also suggests that the majority of the circles are small.28 The eccDNA mapped to over a hundred thousand unique sites in the genome and were enriched in specific regions (hotspots), including 5’UTRs, CpG regions, areas with high GC content (60%), and transcriptionally active chromatin.27 The genomic DNA flanking most microDNA contains 2-15 base direct repeats, suggesting a microhomology-mediated mode of generation of the circles.27 Clustering by the chromosomal locations of the eccDNA sequences weakly separated prostate and ovarian cancer cell lines from each other, suggesting that sites of eccDNA formation could be correlated with cell lineage.28 Further, deletion of MSH3, which encodes a protein in the DNA mismatch repair pathway, caused eccDNA levels to decrease by 80%.28 It is not yet clear whether these small eccDNAs replicate. A very rough estimate (probably an underestimate) of eccDNA abundance, based on circles counted by electron microscopy in preparations from defined numbers of cells, suggests that there are at least 125-200 circles per DT40 cell.28
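The junction-based calling described above can be illustrated with a toy example. The sketch below is a simplified illustration and not the published pipeline. It assumes a single reference string, exact matching only, and a hypothetical helper named `find_junction`; a real pipeline would use an aligner that tolerates sequencing errors. The idea is that a read spanning the junction of a circle made from `ref[start:end]` runs off the end of the segment and continues at its start. In linear coordinates, therefore, the second part of the read maps upstream of the first part.

```python
import random
from typing import Optional, Tuple


def find_junction(ref: str, read: str, min_part: int = 16) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: infer a circle from one junction-spanning read.

    Returns (start, end), 0-based and half-open, such that ref[start:end] is the
    putative circle, or None if no out-of-order split of the read is found.
    """
    for i in range(min_part, len(read) - min_part + 1):
        head, tail = read[:i], read[i:]
        p_head = ref.find(head)  # leftmost exact match of the first part
        p_tail = ref.find(tail)  # leftmost exact match of the second part
        if p_head == -1 or p_tail == -1:
            continue
        # In a contiguous read the second part maps downstream of the first.
        # In a junctional read it maps upstream: the read ran off the end of
        # the segment and continued at the segment's start.
        if p_tail < p_head:
            return p_tail, p_head + len(head)
    return None


if __name__ == "__main__":
    rng = random.Random(7)
    ref = "".join(rng.choice("ACGT") for _ in range(400))  # toy reference
    start, end = 120, 300                                   # hypothetical 180 bp circle
    circle = ref[start:end]
    junction_read = circle[-30:] + circle[:30]
    print(find_junction(ref, junction_read))  # circle coordinates, possibly shifted
    print(find_junction(ref, ref[50:110]))    # contiguous read, so None
```

When the circle is flanked by a short direct repeat, the exact junction is ambiguous. The coordinates returned may then be shifted by up to the repeat length, although they still describe the same circular sequence (a rotation of it).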

A similar study done in Saccharomyces cerevisiae29 identified about two thousand extrachromosomal circles of DNA covering nearly a quarter of the yeast genome. The approach ignored potential eccDNAs less than 1 kb in size and did not depend on the identification of junctional sequences to call a sequence an eccDNA. Thus, the eccDNAs identified varied from 1 to 38 kb in length, with a significant enrichment of circles from repeated parts of the genome, like transposons, ribosomal DNA and gene duplications, suggesting a homologous recombination-mediated biogenesis of the circles. Nevertheless, nearly 60% of the eccDNA arose from unique sequences, and over 90% of the genomic sites revealed 7 base direct repeats that may point to a microhomology-mediated mechanism of circle formation.

A very recent study done in Caenorhabditis elegans and in human cell lines used an approach that relies on density gradient centrifugation (cesium chloride-ethidium bromide) followed by tagmentation and high-throughput sequencing [30]. The authors also report circles mapping to both coding and noncoding regions of the genome. The eccDNA frequently appeared to be derived from exons of protein-coding genes like mucin and titin. The authors hypothesized that eccDNA may contribute to the expression of different isoforms of a gene by interfering with or promoting the transcription of specific exons.27

Fluorescence microscopy has been used to quantify differences in large eccDNAs (DMs) between cancerous and normal cells.31,32 After only a few passages, EGFR and MYC genes were amplified in cancer cells through the formation of eccDNA.32 Using an image analysis software package combined with fluorescence imaging to quantify oncogene copies, another study found MYC and EGFR to be amplified in ~40% of examined human cancer tissues, whereas no enrichment was found in normal tissues.31 The oncogenes were significantly more amplified through eccDNA formation than through chromosomal amplification.31 The failure to detect eccDNA in normal tissues is probably due to the absence of an enrichment procedure for circles and to the fact that small eccDNA (microDNA) do not bind enough dye to be detected under the microscope. Together, these studies suggest that eccDNA could contribute to tumor heterogeneity and tumor evolution by increasing the copy number of oncogenes.28 These findings were validated by a study in which an oncogene, MET, was found by FISH to be amplified on eccDNA molecules in glioblastoma cells.33

Yeast also contain an eccDNA carrying the GAP1 gene, which encodes a protein involved in amino acid uptake; this eccDNA was amplified under metabolite limitation.34 The eccDNA was flanked by two long direct repeats, and it was hypothesized that homologous recombination between the two repeat regions could form the eccDNA.

Clearly, the eccDNA identified to date are distributed over a wide size range and can also be easily divided into those that arise from repetitive DNA and those that arise from unique DNA (Table 1-1). Thus, eccDNAs likely arise by many different pathways and have different functions.

FUTURE QUESTIONS

Biogenesis

The mechanisms that lead to the biogenesis of eccDNA have yet to be fully discovered.

A large percentage of eccDNA molecules contain or are proximal to short direct repeats, suggesting that some form of microhomology directed repair may form eccDNA [35, 36].

However, a significant portion of eccDNA contain no repeats and could not recombine with any nearby sequence [16]. EccDNA levels have been known to increase with the addition of carcinogens [16, 37]. The contribution of DNA replication to eccDNA production, however, is controversial: some labs find that eccDNA levels increase when ongoing replication is blocked by replication inhibitors [14], while others find that eccDNA can be formed in the absence of any DNA replication [21]. Further, some specific DNA repair proteins are necessary for eccDNA formation, such as MSH3 (involved in mismatch repair) in DT40 cells [28], while others are unnecessary, such as okra, mus309, and mei41 (involved in homologous recombination, non-homologous end joining, and DNA damage recognition, respectively) in Drosophila melanogaster cells [20]. Homologous recombination could excise repetitive DNA sequences during early development to give rise to larger eccDNA. This would contribute to the heterogeneity of tandem repeat sequences between individuals, including genes organized in tandem arrays such as rDNA and tDNA [21, 22]. Another hypothesis is that DNA synthesis through repeat-rich regions would require increased recruitment of DNA repair proteins, which may leave part of the chromosome incompletely condensed prior to mitosis. This would cause breakage of large fragments, which are then ligated into a circle.38 Finally, the enrichment of eccDNA in normal cells from GC-rich, transcriptionally active areas of the genome suggests that R-loop formation and repair may contribute to eccDNA formation.28 In Figure 1-2 we have speculated on the different mechanisms by which eccDNAs may arise. Overall, it seems that eccDNAs arise from various processes, and more research is needed to determine how the size, GC content, and repeat sequences of eccDNA are tied to pathways involving DNA metabolism.
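Whether the ends of a given circle carry a short direct repeat (the signature of microhomology-mediated formation discussed above) can be checked directly against the reference sequence. The sketch below uses a simplified definition chosen purely for illustration. Identical bases are counted leftward and rightward from both boundaries of the circle, and the two runs are summed. The function name `flanking_direct_repeat` is hypothetical, and published analyses may define and report microhomology differently.

```python
def flanking_direct_repeat(ref: str, start: int, end: int) -> int:
    """Length of the direct repeat spanning the junction of circle ref[start:end].

    Simplified, hypothetical definition: count identical bases leftward from
    both boundaries (ref[start-k:start] vs ref[end-k:end]) and rightward from
    both boundaries (ref[start:start+k] vs ref[end:end+k]), then sum the runs.
    """
    length = end - start
    left = 0
    while (left < length and start - left - 1 >= 0
           and ref[start - left - 1] == ref[end - left - 1]):
        left += 1
    right = 0
    while (left + right < length and end + right < len(ref)
           and ref[start + right] == ref[end + right]):
        right += 1
    return left + right


if __name__ == "__main__":
    # Toy reference: the repeat "ACG" occurs at both ends of the circled region.
    ref = "GCGC" + "ACG" + "TTCCAAT" + "ACG" + "GTGT"
    print(flanking_direct_repeat(ref, 4, 14))  # circle called as ACGTTCCAAT: 3
    print(flanking_direct_repeat(ref, 7, 17))  # same circle, shifted call: 3
```

Summing both directions makes the result insensitive to where inside the repeat the junction happened to be called. This matters because a junction flanked by a repeat cannot be placed uniquely.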

When unique DNA is detected as an eccDNA, there is also the question of whether it arose by excision of chromosomal DNA, leaving a corresponding deletion in the chromosomal sequence, or whether the circular DNA was produced by some kind of copying mechanism, as during DNA replication or repair, with no concurrent loss of chromosomal DNA. Indeed, in at least two hotspots for microDNA production, very deep sequencing of the chromosomal DNA from adult mouse brain identified microdeletions present substoichiometrically (in a somatically mosaic pattern).27 The microdeletions were rare, occurring in 1 of 400-4000 alleles from the brain, and would be missed unless the genomic sequencing is done at a very high depth. Nevertheless, the presence of eccDNA from over a hundred thousand sites in mouse, human and chicken cells makes it unlikely that all of them are created by an excision event that leaves behind over a hundred thousand somatically mosaic deletions on the chromosomes.
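The need for very deep sequencing can be made quantitative with a back-of-the-envelope model. Assume, as a simplification of ours rather than a claim of the cited study, that reads covering a site are sampled independently. The number of reads carrying a deletion present at allele frequency f in a sample sequenced to depth D is then approximately Poisson-distributed with mean λ = D·f. The hypothetical helper below returns the probability of seeing at least a given number of supporting reads.

```python
import math


def detection_probability(depth: float, allele_freq: float, min_reads: int = 1) -> float:
    """P(at least `min_reads` reads carry a variant at `allele_freq`), Poisson model."""
    lam = depth * allele_freq  # expected number of supporting reads
    term = math.exp(-lam)      # P(X = 0)
    below = 0.0                # running P(X < min_reads)
    for j in range(min_reads):
        below += term
        term *= lam / (j + 1)  # P(X = j + 1) from P(X = j)
    return max(0.0, 1.0 - below)


if __name__ == "__main__":
    # Allele frequencies bracketing the 1-in-400 to 1-in-4000 range reported above.
    for freq in (1 / 400, 1 / 4000):
        for depth in (100, 1_000, 10_000):
            p = detection_probability(depth, freq, min_reads=3)
            print(f"f={freq:.5f} depth={depth:>6}: P(>=3 reads) = {p:.3f}")
```

The depths and the three-read threshold are illustrative choices. The general point is that λ must be well above the read threshold before a deletion at 1 in 4000 alleles is reliably seen.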

Circular DNA containing telomeric DNA (t-circles) and ribosomal sequences (rDNA) have been studied for many more years and many mechanisms of formation have been proposed, though none proven. These theories include genomic rearrangement, excision and ligation, recombination between tandem repeats, and reverse transcription of mRNA.39 One protein complex, CTC1/STN1/TEN1, known to be involved in telomere maintenance has been shown to significantly contribute to t-circle formation.40 Also, in yeast cells lacking SGS1, a protein known to be involved in DNA repair, there are significantly increased levels of eccDNA.41

These results further suggest that multiple pathways contribute to eccDNA formation.

Functions

Known functions of eccDNA include contribution to intercellular genetic heterogeneity in tumors, particularly amplification of oncogenes and drug resistance genes, which presumes that the genes on the eccDNA are expressed. Theoretical functions of eccDNA can include expression of regulatory RNAs contributing to intercellular heterogeneity, sponging of transcription factors, production of a pool of mutable DNA for evolution of tumors, gene dose compensation, aging, release from cells for intercellular communication, liquid biopsy and stimulation of innate immune pathways (Figure 1-3).

EccDNAs have been linked to cancers and drug resistance. Double minutes carry oncogenes or drug resistance genes and advance cancers by gene amplification.42-45 Recent evidence suggests that oncogene amplification mediated by eccDNAs is much more common than previously thought and is critical for tumor heterogeneity and evolution. For example, eccDNAs were detected in nearly 40% of the tumor cell lines and nearly 90% of patient-derived brain tumor models.31 This study also provided mathematical and experimental evidence that driver oncogene amplification and tumor heterogeneity may be significantly higher when the amplification occurs on eccDNA than when it occurs within chromosomes. Because eccDNAs are distributed randomly to daughter cells, at each division one daughter cell may inherit a higher copy number of eccDNAs carrying a driver oncogene and thus acquire a proliferative advantage. The number of specific eccDNAs in cells may also change in response to environmental conditions, providing an additional mechanism for tumor adaptation to challenging conditions. An example of this occurs in glioblastomas, where EGFR is frequently mutated and commonly gives rise to the oncogenic EGFRvIII variant. While EGFRvIII promotes tumor growth, it also makes the tumor cells sensitive to EGFR tyrosine kinase inhibitors.46 Resistance to these inhibitors was found to develop through the loss of double minutes carrying the mutant EGFR.46 It is also possible that driver mutations arise extrachromosomally during the amplification of eccDNAs, which would also be critical for tumor heterogeneity and evolution.47 Overall, eccDNAs appear to be a common phenomenon in cancers, contributing to tumor heterogeneity, adaptation, and evolution.
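The consequence of random segregation can be illustrated with a minimal simulation. For illustration only, it assumes three things:

- every eccDNA copy is replicated exactly once per cell cycle;
- each copy goes to either daughter with probability 1/2, independently of the others;
- there is no selection.

The function names are hypothetical. Under these assumptions the mean copy number per cell stays constant, while the variance between cells grows by half the mean copy number at every division. Each daughter receives a Binomial(2n, 1/2) share of its parent's replicated n copies.

```python
import random
import statistics


def divide(copies: int, rng: random.Random) -> tuple[int, int]:
    """Replicate each eccDNA copy once, then send every copy to a random daughter."""
    replicated = 2 * copies
    to_first = sum(1 for _ in range(replicated) if rng.random() < 0.5)
    return to_first, replicated - to_first


def simulate(initial_copies: int, generations: int, seed: int = 0) -> list[int]:
    """Copy number in every cell after `generations` divisions of one founder cell."""
    rng = random.Random(seed)
    cells = [initial_copies]
    for _ in range(generations):
        next_cells: list[int] = []
        for copies in cells:
            next_cells.extend(divide(copies, rng))
        cells = next_cells
    return cells


if __name__ == "__main__":
    for g in (1, 4, 8):
        cells = simulate(initial_copies=20, generations=g, seed=1)
        print(g, statistics.mean(cells), statistics.pvariance(cells), min(cells), max(cells))
```

Adding a growth advantage proportional to copy number (not modeled here) would let the high-copy cells produced by this drift take over the population. That is the scenario described above for oncogene-bearing circles.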

EccDNA has also been linked to aging. EccDNAs containing ribosomal RNA genes accumulate with time and contribute to the aging of yeast cells.41 These extrachromosomal rDNA circles contain an ARS (autonomously replicating sequence) and are able to replicate. Moreover, in each cell division they are preferentially segregated to mother cells.41 These properties are responsible for an exponential increase of these eccDNAs in aging mother cells, while limiting the number of eccDNAs in daughter cells prolongs the daughters' lifespans. The exact mechanism by which the eccDNAs trigger the senescence and eventual death of old cells is not clear.41 It has been proposed that the sheer abundance of these eccDNAs might titrate away components of the replication and/or transcription machinery and thereby trigger the senescence and eventual death of old cells. Corroborating the titration hypothesis, ectopic introduction of an ARS plasmid is sufficient to trigger the final arrested state of old cells and eventual cell death. Recent discoveries of eccDNA in normal cells and tissues raise the interesting question of whether eccDNA accumulation may contribute to aging in higher eukaryotes.
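The exponential increase follows from the two properties just described. Consider a simplified model (our assumption, with a hypothetical retention fraction p) in which every rDNA circle replicates once per cell cycle and the mother keeps a fraction p of the replicated copies. The mother's copy number after g divisions is then m_g = m_0 (2p)^g. This stays constant under unbiased segregation (p = 1/2) and doubles at every division under complete retention (p = 1). The function name below is hypothetical.

```python
def mother_cell_copies(initial: float, retention: float, divisions: int) -> list[float]:
    """eccDNA copies in a mother cell after 0..divisions divisions.

    Simplified model: every circle replicates once per cycle and the mother keeps
    a fraction `retention` of the replicated pool (0.5 = unbiased segregation).
    """
    if not 0.0 <= retention <= 1.0:
        raise ValueError("retention must be between 0 and 1")
    copies = [float(initial)]
    for _ in range(divisions):
        replicated = 2 * copies[-1]            # ARS-driven replication doubles the pool
        copies.append(retention * replicated)  # mother-biased segregation
    return copies


if __name__ == "__main__":
    # Hypothetical retention values; the yeast literature reports a strong mother bias.
    for retention in (0.5, 0.75, 0.95):
        trajectory = mother_cell_copies(initial=1, retention=retention, divisions=20)
        print(retention, round(trajectory[-1], 1))
```

In the same model the daughter receives 2(1 - p) m_g copies, so strong retention keeps daughters nearly free of circles. This is consistent with the reset of daughter lifespan described above.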

EccDNAs may also play a role in gene dosage compensation.48 In S. cerevisiae, histones H2A and H2B are encoded by two gene pairs named HTA1-HTB1 and HTA2-HTB2. When HTA1-HTB1 is deleted, dosage compensation occurs by amplification of HTA2-HTB2 via formation of a new eccDNA containing 39 kb of chromosome II that includes HTA2-HTB2, the histone H3-H4, a centromere and origins of replication. This new eccDNA is created by recombination between two Ty1 retrotransposon elements that flank this region. In HTA1-HTB1-deleted strains, formation of this eccDNA is significantly elevated to compensate for the decrease in H2A and H2B.

In contrast to DMs, the smaller eccDNAs are more widespread, but much less is known about their function in cell biology.27,49 They are too small to contain protein-coding genes, but long enough to code for regulatory short RNAs or fragments of genes.

Another possibility for the microDNA function may be molecular sponging: they may function as sponges for transcription factors to control gene expression indirectly. Finally, it was recently found that microDNAs are present in serum and plasma of both mouse and humans as circulating DNA.49,50 Other than their potential as biomarkers in liquid biopsy experiments, if other cells can take in these microDNAs, this may be a novel way of communication between cells. This possibility is a speculation at this point, but is an important subject of future investigation.

Cells react to naked DNA in the cytoplasm by activation of the cGAS pathway that culminates in expression of interferon and stimulation of the immune system.51,52 This is part of the innate immune response that is used in response to foreign pathogens. One interesting possibility is that the eccDNAs are released to the cytoplasm during mitosis and are either degraded by enzymes like TREX1 or activate the cGAS pathway. Thus the eccDNA, especially if they are not protected by chromatin, may be an endogenous antigen that can activate autoimmune pathways.

Clinical utility

Tumor-specific features of eccDNA could be important for the identification and prognosis of the disease, and it is therefore important to find any differences in eccDNA between tumor and matched normal tissue. Human cancer cell lines were found to have a population of longer eccDNAs than normal mouse tissues.27 MicroDNA was present in both tumor and normal lung tissue, and most of the known properties of eccDNA were similar between normal and tumor tissue.49 However, the eccDNA identified in human lung cancers were slightly longer on average than those in matched normal tissue from the same patients (3 out of 4 pairs of tumor and normal samples). For eccDNA to be useful as cancer biomarkers, it will of course be more important to know whether their abundance or sequence of origin changes in a predictable way when a normal cell is transformed into a cancer cell.

There are numerous reports using high-throughput sequencing technology to identify tumor-specific linear DNA fragments present in serum or plasma, the so-called liquid biopsy.53 The presence of eccDNA in circulation has also recently been shown.49,50 The characteristics of circulating microDNA (higher GC content, 2-15 base direct repeats flanking their source genomic sites, genomic distribution, etc.) were very similar to those reported earlier in mouse tissue and in mouse, chicken and human cell lines. Furthermore, the circulating eccDNA arise from both genic and intergenic regions. Consistent with the finding that microDNA from lung cancers was slightly longer than that from matched normal tissue (mentioned above), the circulating eccDNA in patients before surgical resection of the tumor were generally longer than the circulating eccDNA in the same patients 6 weeks after surgery.49 This may suggest that human cancers release longer eccDNA into circulation than normal tissues do. EccDNAs are expected to be more stable than linear DNA, and this may be an advantage for using eccDNA in blood for liquid biopsy experiments.

Conclusion

The most important question to be resolved before eccDNA can be used as a biomarker for cancer is whether cell-free eccDNA arising from tumors can be differentiated from eccDNA arising from normal tissues. Further, because eccDNA has been shown to amplify oncogenes and drug resistance genes, it is vital to determine which specific proteins are involved in eccDNA formation. The proteins involved in generating eccDNA from the genome could act as therapeutic targets against this form of genomic plasticity. The functions of eccDNA need to be explored further. The majority of eccDNA are 200-400 base pairs (microDNA and some of the spcDNA). However, most studies have focused on the gene amplification abilities of DMs, and it is possible that microDNA contributes to similar amplification of small genes or regulatory RNAs. Lastly, the turnover of eccDNA in cells and of cell-free eccDNA in the circulation has yet to be determined, and the results will have implications for their intracellular functions and their utility for liquid biopsy, respectively. The half-life of these molecules inside cells would be particularly interesting in proliferating cells, where nuclear envelope breakdown in mitosis exposes them simultaneously to cytoplasmic nucleases and to the cGAS DNA-sensing system. Overall, research on eccDNA has increased the appreciation of the plasticity of genomic DNA and of the dynamics of gene amplification and deletion, and much more remains to be discovered regarding their biogenesis, function and possible clinical utility.

Introduction, continued:

Current studies of eccDNA

As discussed above, eccDNA were first noted as anomalous circular DNA elements when DNA was imaged by electron microscopy and later by light microscopy.1–3 Though they garnered enough interest for some of the molecules to be sequenced by laborious Sanger sequencing, no clear mechanism of formation or function was found, and they were generally thought of as inert byproducts of DNA metabolism.4–20 Recently, however, it has been found that eccDNA can amplify oncogenes and can significantly reduce the survival of cancer patients.21–23

Further, eccDNA has been found in all eukaryotic organisms tested to date, and circles carrying genes that confer drug resistance or increased growth are known to be selected for in plants and in human cancer cells, respectively.18,21,22,24–27

Next generation sequencing

The full complement of eccDNA in human and mouse cells was first characterized by next-generation sequencing in 2012 by Shibata et al.27 from Dr. Dutta’s lab. They found that eccDNA came from specific hotspots in the genome, i.e., regions with high GC content, microhomology sequences, and genic regions. This showed that there are patterns of eccDNA formation, and it suggested that eccDNA formation is tied to transcription and replication stalling in DNA regions that are less easily opened and more prone to forming secondary structures.

The Dutta lab later showed in 2015 that eccDNA differ enough between two cancer types, ovarian and prostate cancer, to cluster into separate groups according to tumor lineage.28 This suggests that the genomic sites that yield eccDNA are distinct in different cancer lineages and that eccDNA may contribute to the differential expression and behavior of cancers. It also suggests that eccDNA sequences could give information about the tissue of origin, which could be used to diagnose cancer by sequencing eccDNA from the blood.

One of the most interesting findings of these studies was the consistent length distribution of eccDNA in normal cells from various mammalian tissues and other organisms, with peaks at 180 and 380 base pairs. The number of circles drops off quickly beyond these lengths, showing that the vast majority of eccDNA are very small. Their small size precludes them from containing fully functional protein coding genes or promoter sequences. Our current work shows that eccDNA are nevertheless transcribed, through a characteristic conferred by their small diameter, and can code for functional RNA.

Two measures of eccDNA need to be distinguished. Abundance refers to the number of eccDNA copies per cell. Complexity refers to the number of unique genomic sites represented in a cell or tissue. Before the advent of next generation sequencing, only a few hundred naturally occurring eccDNAs had been sequenced. It was generally agreed that most of them arise from repetitive DNA sequences, suggesting low complexity.
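The short Python sketch below makes the distinction concrete by computing both measures from a list of circle calls. It assumes that each molecule has been counted directly (for example, on electron microscopy grids). Each molecule is recorded as a hypothetical (chromosome, start, end) tuple, and the coordinates are invented for illustration. After PCR amplification, the number of reads per site no longer reflects copy number, so only complexity can be read reliably from such data.

```python
from collections import Counter


def abundance_and_complexity(circles, n_cells):
    """Return (abundance per cell, complexity) for directly counted circles.

    circles: one hypothetical (chromosome, start, end) tuple per molecule counted.
    abundance  = total number of circle molecules / number of cells
    complexity = number of distinct genomic sites represented
    """
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    copies_per_site = Counter(circles)
    abundance = sum(copies_per_site.values()) / n_cells
    complexity = len(copies_per_site)
    return abundance, complexity


calls = [
    ("chr1", 1000, 1200),
    ("chr1", 1000, 1200),  # a second molecule from the same site
    ("chr2", 5000, 5380),
    ("chr7", 800, 980),
]
print(abundance_and_complexity(calls, n_cells=2))  # (2.0, 3)
```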

However, with the adoption of next generation sequencing, it became clear that nearly 50% of the circles seen in normal cells and tissues arise from unique DNA sequences. The eccDNA population therefore has a much higher complexity than previously thought. To prepare these circles for next generation sequencing, however, most methods used some form of PCR amplification. This made it difficult to estimate the abundance of a given circle from the sequencing results. Dillon et al.27 tried a very laborious way to measure the abundance of circles: the eccDNAs obtained from a specified number of cells were spread on electron microscopy grids and counted. This method was too laborious to measure the abundance of circles in parallel in multiple cell lines, so I developed an easier abundance assay that facilitated my studies in Chapter 3.

Another issue that needs to be addressed is whether the eccDNAs arise from excision of uncopied chromosomal DNA, which would leave corresponding deletions in the genome. The high complexity of the eccDNAs in normal tissues (up to 40,000 sites in a tissue) is not matched by as many deletions in the tissue. There are two possible explanations.

(1) The DNA is copied from the chromosomes and then ligated to form the circle. Alternatively, the deletion left by excision of the eccDNA is repaired by copying from another template.

(2) Not every eccDNA is present in every cell of the tissue. Because of this high somatic mosaicism, a deletion left by eccDNA production would be present in only a very small fraction of cells. It would not be detected unless one specifically looked for somatically mosaic deletions.

We think that both explanations are true. If genomic DNA from hotspots of eccDNA production is sequenced at very high depth, small chromosomal deletions become apparent. However, they are present at a very low allele frequency compared to the wild-type sequence (1:400 to 1:4000).28 Thus, some eccDNA are generated by excision of uncopied chromosomal DNA. The low number of chromosomal deletions compared to the high complexity of eccDNAs nevertheless makes it likely that many eccDNAs in normal tissues are generated by pathways that do not leave a chromosomal deletion.
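The need for very deep sequencing follows from simple sampling arithmetic. The sketch below is a minimal binomial model that assumes reads are independent draws and ignores sequencing errors and PCR duplicates. It gives the probability of seeing at least a chosen number of deletion-supporting reads at a given depth. The function name and the threshold of three reads are illustrative choices, not parameters from the original analysis.

```python
from math import comb


def prob_detect(depth, allele_freq, min_reads=1):
    """Probability that at least `min_reads` of `depth` reads carry a deletion
    present at `allele_freq`, treating reads as independent binomial draws."""
    p_fewer = sum(
        comb(depth, i) * allele_freq**i * (1 - allele_freq) ** (depth - i)
        for i in range(min_reads)
    )
    return 1.0 - p_fewer


for freq in (1 / 400, 1 / 4000):
    for depth in (100, 1_000, 10_000):
        p = prob_detect(depth, freq, min_reads=3)
        print(f"AF 1:{round(1 / freq)}  depth {depth:>6}  P(>=3 reads) = {p:.3f}")
```

At an allele frequency of 1:4000, even 10,000-fold coverage yields on average only 2.5 supporting reads (10,000 / 4,000). Such deletions are therefore easily missed at ordinary sequencing depth.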

Large eccDNA in cancer

Recently, there has been steadily increasing interest in the larger eccDNAs, which are long enough to contain full protein coding genes with their regulatory sequences. Newer methods have focused on sites of gene amplification detected by next generation sequencing.

These amplified loci were used as probes for fluorescence in situ hybridization of metaphase spreads of cancer cells. This revealed that long eccDNAs are much more common in cancers than previously believed. These large eccDNAs raise the copy number of the locus, so that an oncogene or drug-resistance gene can be amplified more than 20-fold.21,22 Further, Mischel and co-workers have shown that eccDNA are distributed between the daughter cells at cell division in a non-Mendelian pattern. This can lead to rapid and large changes in the copy number of specific eccDNA sequences in the cells. It suggests that tumor cells may develop aggressive characteristics within a short series of cell divisions. To counter this adaptability, we must find out how these eccDNA are formed and maintained. That knowledge could help neutralize their negative effect on patient survival and improve chemotherapeutic strategies.
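A toy simulation illustrates why random partitioning can change copy number so quickly. The sketch makes three assumptions:

- every eccDNA copy is replicated once per cell cycle;
- because eccDNA lack a centromere, each resulting copy goes to either daughter with probability 1/2;
- a single lineage is followed, and selection, copy loss and re-integration are ignored.

The function name and parameters are hypothetical.

```python
import random


def simulate_lineage(initial_copies, generations, rng):
    """Copy number of an acentric eccDNA along one cell lineage.

    Each cycle, every copy is replicated once and each of the 2n copies is sent
    to one of the two daughters with probability 1/2; one daughter is then
    followed at random. Selection and de novo formation are ignored."""
    copies = initial_copies
    history = [copies]
    for _ in range(generations):
        replicated = 2 * copies
        daughter_a = sum(rng.random() < 0.5 for _ in range(replicated))
        daughter_b = replicated - daughter_a
        copies = rng.choice((daughter_a, daughter_b))
        history.append(copies)
    return history


rng = random.Random(1)  # fixed seed so the run is reproducible
for _ in range(5):
    print(simulate_lineage(initial_copies=20, generations=10, rng=rng))
```

Each daughter's copy number is drawn binomially, so the expected copy number along a lineage stays constant. The spread between lineages, however, widens with every division, and some lineages lose the eccDNA entirely. A chromosomal locus would instead stay at a fixed copy number. Selection acting on this spread is what could allow rapid adaptation.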

Transcription of eccDNA is triggered by structure

We hypothesized that the small eccDNA are capable of contributing to the pool of active RNA and affecting gene expression. This hypothesis was supported by a past study showing that very small single-stranded DNA circles (80 nucleotides) were transcribed without containing promoter sequences.31 That study was designed to compare various promoter sequences. Contrary to expectations, it found that the sequence of the circles did not alter their levels of transcription. The majority of eccDNAs seen in normal tissues are larger (180-400 bp) and are most likely double stranded, not single stranded. We performed the experiments in Chapter 2 to test whether small, double-stranded eccDNA like those in normal cells are still transcribed without a promoter sequence.

DNA Repair

Soon after the discovery of eccDNA, it was hypothesized that eccDNA form through DNA repair mechanisms. Sequence analyses showed that the characteristics of eccDNA were broad and inconsistent.6–8,12,19,28–30 The chromosomal sources of eccDNA did not contain long homologous sequences, suggesting that the eccDNA were not produced by homologous recombination. Some papers did, however, note microhomology sequences within the sampled population of eccDNAs.13-15,20-24 Next generation sequencing revealed microhomology sequences at the ends of the chromosomal segments that give rise to eccDNAs (in ~60% of circles). Only one copy of the microhomology repeat is retained in the eccDNA.27,28 This suggested that many of the circles are formed by microhomology-mediated end joining. The absence of microhomology in ~40% of cases, however, suggests that other mechanisms also contribute to eccDNA formation. To evaluate which kinds of DNA damage and which DNA repair mechanisms contribute to eccDNA formation, we carried out the experiments in Chapter 3.
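The microhomology described above can be measured directly from the reference sequence. This sketch assumes that the circle spans genome[start:end], using 0-based, half-open coordinates (an assumed convention). The function counts how far the junction can slide in either direction while still giving the same circle. That sliding distance is the length of the direct repeat, and only one copy of the repeat is kept in the circle. A result of 0 corresponds to a blunt junction. The function and example sequences are hypothetical.

```python
def microhomology_length(genome, start, end, max_len=25):
    """Length of the direct repeat at a circle junction.

    The circle is genome[start:end] (0-based, half-open). If the base just after
    the circle equals its first base, the same circle could have been cut one
    base further right; likewise to the left. The total sliding distance is the
    microhomology length, of which only one copy remains in the circle."""
    right = 0
    while (
        right < max_len
        and end + right < len(genome)
        and start + right < end
        and genome[start + right] == genome[end + right]
    ):
        right += 1
    left = 0
    while (
        left < max_len
        and start - left - 1 >= 0
        and end - left - 1 > start
        and genome[start - left - 1] == genome[end - left - 1]
    ):
        left += 1
    return left + right


with_mh = "CCCC" + "GATC" + "T" * 10 + "GATC" + "AAAA"  # circle = GATC + T*10
blunt = "CCCC" + "GATC" + "T" * 10 + "AAAA"
print(microhomology_length(with_mh, 4, 18))  # 4
print(microhomology_length(blunt, 4, 18))    # 0
```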

Microscopy of double minutes in cancer cells

Visualization of eccDNA by microscopy has continued since their discovery, and modern techniques have allowed greater precision and quantification.8,16,18,21,22,26

Mischel et al. have shown that specific long eccDNA molecules can be visualized in cancers.21,22 They also showed that a large fraction of the long eccDNA molecules studied are not one contiguous stretch of DNA from the genome. Instead, they are composed of fragments from various chromosomes.

This suggests two things.

(1) DNA repair ligates the ends of DNA fragments together to turn off the checkpoint signal that double strand break ends send to arrest the cell cycle.

(2) EccDNA has ties to chromothripsis. In chromothripsis, a chromosome lags behind at the equatorial plane during cell division, becomes isolated in a micronucleus in a daughter cell, and shatters into fragments. Ligation of the fragments into a linear chromosome with a very high density of rearrangements produces the chromothriptic chromosome. However, the fragments could also be ligated into circular DNA molecules, which would then be enriched or lost in future cell divisions as eccDNA.

As mentioned earlier, it is becoming apparent that long eccDNA are very common in cancer cells. Over 60% of all patient tumor samples had long eccDNA detectable by DAPI staining.21,22 The genes on long eccDNAs are often over-expressed in cancer for two reasons:

(a) the copy number of the genes is increased;

(b) the cis-acting regulatory elements on the eccDNAs are changed, because the eccDNAs are chromatinized differently from the parental chromosomal DNA.60

These results suggest that eccDNA confer a survival and growth advantage on cells. If the formation of eccDNA can be hindered, patient survival could therefore be improved.

The experiments described in Chapter 4 are a first step in moving from the small eccDNAs of normal cells to the long eccDNAs of cancers. This work is not yet complete. My aim is to develop a system for studying how long eccDNAs are generated in cancer cells.


FIGURE 1-1 Junctional tag: Schematic representation of eccDNA and junctional sequence genesis from linear DNA. The two ends of a linear DNA are ligated, and the ligation creates a new junctional sequence that is not present in the parent linear DNA. Sequencing reads that map partly to the left and partly to the right of the ligation point span this junctional sequence. The junctional sequence is absent from linear DNA, so computer algorithms use it as a discriminatory feature to validate that the DNA fragment came from a circular DNA molecule [27, 28, 49]. Final validation of the circular nature of the starting DNA fragment uses paired-end read pairs in which one read maps completely inside the body of the circle and the other maps across the junctional sequence.
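The junctional tag logic in Figure 1-1 can be illustrated with exact string matching. This is a simplification. Real pipelines align reads with a split-read aligner and consider both strands, whereas this sketch checks only the forward strand of a toy sequence. The circle is assumed to span genome[start:end] (0-based, half-open), and the function names and sequences are hypothetical.

```python
def junction_sequence(genome, start, end, flank=10):
    """Bases across the ligation point of the circle genome[start:end]:
    the last `flank` bases of the circle followed by its first `flank` bases."""
    return genome[end - flank:end] + genome[start:start + flank]


def is_junctional(read, genome, start, end):
    """True if `read` fits the circle only by crossing the ligation point and is
    absent from the linear genome (forward strand only, exact matches only)."""
    circle = genome[start:end]
    wrapped = circle + circle[: len(read) - 1]  # once around the circle, plus overlap
    crosses_junction = read in wrapped and read not in circle
    return crosses_junction and read not in genome


genome = "ACGTTGCA" + "GGCATTACCGTA" + "TTGACCAG"  # circle = genome[8:20]
print(junction_sequence(genome, 8, 20, flank=4))  # CGTAGGCA
print(is_junctional("GTAGGC", genome, 8, 20))      # True
print(is_junctional("CATTAC", genome, 8, 20))      # False (inside the circle body)
```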


FIGURE 1-2: Examples of how eccDNA is formed: (A) Replication slippage creates a loop on the template strand through mis-priming of a dissociated polymerase at the wrong direct repeat. The loop is then excised and ligated into a circle, leaving a microdeletion on the chromosome. (B) Replication slippage creates a loop in the product strand, which is then excised and ligated into a circle, but no microdeletion is left on the chromosome. An R-loop displaces the non-template strand and allows the direct repeats on the unpaired strand to form a loop, which is then excised and ligated into a circle. Alternatively (not shown), the RNA-paired DNA strand could be excised, released from the RNA and ligated between direct repeats to form a circle. In either case, the gap in the chromosomal DNA is repaired by gap filling, and no deletion is left on the chromosome. (C) ODIRA mechanism of eccDNA formation. Replication slippage on pairs of inverted repeats and ligation forms a single-strand circle, e.g. [54]. (D) A double strand break within a repeat region with a proximal homologous repeat sequence is repaired by homologous recombination. The small fragment forms a circle, while the chromosome suffers a microdeletion. An example of this is in [34].


[Figure 1-3 image. Labels: eccDNAs; ageing; intercellular heterogeneity; gene dosage compensation; intercellular communication; oncogene amplification; stimulation of innate immune pathways; pool of mutable DNA for tumor adaptation and evolution; expression of regulatory RNAs; tumor heterogeneity; molecular sponges; drug resistance; liquid biopsy]

FIGURE 1-3: Functions of eccDNA in mammalian cells: The known functions of eccDNA are listed in black text; the hypothesized roles of eccDNA are listed in grey text.

Table 1-1: Size range of circular DNA in eukaryotes

| Name of circular DNA | Size range | Replication | References | Function (if any) |
|---|---|---|---|---|
| Double-minute chromosomes | 100 kb - 3 Mb | Self | [31, 42-45] | Double minutes contain proto-oncogenes in cancers |
| Extrachromosomal rDNA circle | 19.3 and 40.4 kb | Self | [41, 55] | Accumulation associated with aging; suppress mitochondrial “cheats” in yeast |
| Telomeric circle | Integral multiples of 738 bp | Self | [22] | Telomeric circles in a wide range of organisms provide a telomeric repeat template for restoring telomere length by homologous recombination |
| Mitochondrial circular DNA | 16 kb | Self | [56] | Contains genes essential for mitochondrial function |
| Chloroplast circular DNA | 120-160 kb | Self | [57] | Contains genes, including essential genes for photosynthesis |
| Alpha satellite circle | 2-20 kb | Self | [22] | Centromere evolution? |
| MicroDNA | 100-1000 bases | ND | [27, 28, 49] | miRNA generation? |
| Kinetoplast maxicircles | 20-40 kb | Self | [58, 59] | Encode ribosomal RNAs and mitochondrial proteins |
| Kinetoplast minicircles | 0.5-1 kb | Self | [58, 59] | Produce guide RNA to decode maxicircle gene information for mitochondria |

BIBLIOGRAPHY

1. Hotta, Y. and Bassel, A. (1965) Molecular Size and Circularity of DNA in Cells of Mammals and Higher Plants. Proc Natl Acad Sci U S A 53, 356-62.
2. Cox, D. et al. (1965) Minute Chromatin Bodies in Malignant Tumours of Childhood. Lancet 1 (7402), 55-8.
3. Radloff, R. et al. (1967) A dye-buoyant-density method for the detection and isolation of closed circular duplex DNA: the closed circular DNA in HeLa cells. Proc Natl Acad Sci U S A 57 (5), 1514-21.
4. Agsteribbe, E. et al. (1972) Circular DNA from mitochondria of Neurospora crassa. Biochim Biophys Acta 269 (2), 299-303.
5. Billheimer, F.E. and Avers, C.J. (1969) Nuclear and mitochondrial DNA from wild-type and petite yeast: circularity, length, and buoyant density. Proc Natl Acad Sci U S A 64 (2), 739-46.
6. Buongiorno-Nardelli, M. et al. (1976) Electron microscope analysis of amplifying ribosomal DNA from Xenopus laevis. Exp Cell Res 98 (1), 95-103.
7. Ono, T. et al. (1971) Characterization of nuclear and satellite DNA from trypanosomes. Biken J 14 (3), 203-15.
8. Smith, C.A. and Vinograd, J. (1972) Small polydisperse circular DNA of HeLa cells. J Mol Biol 69 (2), 163-78.
9. Stanfield, S. and Helinski, D.R. (1976) Small circular DNA in Drosophila melanogaster. Cell 9 (2), 333-45.
10. Wong, F.Y. and Wildman, S.G. (1972) Simple procedure for isolation of satellite DNA's from tobacco leaves in high yield and demonstration of minicircles. Biochim Biophys Acta 259 (1), 5-12.
11. Bertelsen, A.H. et al. (1982) Molecular characterization of small polydisperse circular deoxyribonucleic acid from an African green monkey cell line. Biochemistry 21 (9), 2076-85.
12. Motejlek, K. et al. (1993) Increased amount and contour length distribution of small polydisperse circular DNA (spcDNA) in Fanconi anemia. Mutat Res 293 (3), 205-14.
13. Stanfield, S.W. and Helinski, D.R. (1984) Cloning and characterization of small circular DNA from Chinese hamster ovary cells. Mol Cell Biol 4 (1), 173-80.
14. Sunnerhagen, P. et al. (1986) Molecular cloning and characterization of small polydisperse circular DNA from mouse 3T6 cells. Nucleic Acids Res 14 (20), 7823-38.
15. Stanfield, S.W. and Lengyel, J.A. (1979) Small circular DNA of Drosophila melanogaster: chromosomal homology and kinetic complexity. Proc Natl Acad Sci U S A 76 (12), 6142-6.
16. van Loon, N. et al. (1994) Formation of extrachromosomal circular DNA in HeLa cells by nonhomologous recombination. Nucleic Acids Res 22 (13), 2447-52.
17. Gaubatz, J.W. and Flores, S.C. (1990) Purification of eucaryotic extrachromosomal circular DNAs using exonuclease III. Anal Biochem 184 (2), 305-10.
18. Schneider, S.S. et al. (1992) Isolation and structural analysis of a 1.2-megabase N-myc amplicon from a human neuroblastoma. Mol Cell Biol 12 (12), 5563-70.
19. Cohen, S. et al. (1999) Regulated formation of extrachromosomal circular DNA molecules during development in Xenopus laevis. Mol Cell Biol 19 (10), 6682-9.
20. Cohen, S. et al. (2003) Extrachromosomal circular DNA of tandemly repeated genomic sequences in Drosophila. Genome Res 13 (6A), 1133-45.

21. Cohen, S. and Mechali, M. (2001) A novel cell-free system reveals a mechanism of circular DNA formation from tandem repeats. Nucleic Acids Res 29 (12), 2542-8.
22. Cohen, S. et al. (2010) Extrachromosomal circles of satellite repeats and 5S ribosomal DNA in human cells. Mob DNA 1 (1), 11.
23. Navratilova, A. et al. (2008) Survey of extrachromosomal circular DNA derived from plant satellite repeats. BMC Plant Biol 8, 90.
24. Beland, J.L. et al. (1993) CpG island mapping of a mouse double-minute chromosome. Mol Cell Biol 13 (8), 4459-64.
25. Carroll, S.M. et al. (1987) Characterization of an episome produced in hamster cells that amplify a transfected CAD gene at high frequency: functional evidence for a mammalian replication origin. Mol Cell Biol 7 (5), 1740-50.
26. Stahl, F. et al. (1992) Amplicon structure in multidrug-resistant murine cells: a nonrearranged region of genomic DNA corresponding to large circular DNA. Mol Cell Biol 12 (3), 1179-87.
27. Shibata, Y. et al. (2012) Extrachromosomal microDNAs and chromosomal microdeletions in normal tissues. Science 336 (6077), 82-6.
28. Dillon, L.W. et al. (2015) Production of Extrachromosomal MicroDNAs Is Linked to Mismatch Repair Pathways and Transcriptional Activity. Cell Rep 11 (11), 1749-59.
29. Moller, H.D. et al. (2015) Extrachromosomal circular DNA is common in yeast. Proc Natl Acad Sci U S A 112 (24), E3114-22.
30. Shoura, M.J. et al. (2017) Intricate and Cell-type-specific Populations of Endogenous Circular DNA (eccDNA) in Caenorhabditis elegans and Homo sapiens. G3 (Bethesda).
31. Turner, K.M. et al. (2017) Extrachromosomal oncogene amplification drives tumour evolution and genetic heterogeneity. Nature 543 (7643), 122-125.
32. Vogt, N. et al. (2014) Amplicon rearrangements during the extrachromosomal and intrachromosomal amplification process in a glioma. Nucleic Acids Res 42 (21), 13194-205.
33. deCarvalho, A.C. et al. (2017) Extrachromosomal DNA elements can drive disease evolution in glioblastoma. BioRxiv, 1-13.
34. Gresham, D. et al. (2010) Adaptation to diverse nitrogen-limited environments by deletion or extrachromosomal element formation of the GAP1 locus. Proc Natl Acad Sci U S A 107 (43), 18551-6.
35. Misra, R. et al. (1989) Recombination mediates production of an extrachromosomal circular DNA containing a transposon-like human element, THE-1. Nucleic Acids Res 17 (20), 8327-41.
36. Jones, R.S. and Potter, S.S. (1985) L1 sequences in HeLa extrachromosomal circular DNA: evidence for circularization by homologous recombination. Proc Natl Acad Sci U S A 82 (7), 1989-93.
37. Cohen, S. et al. (1997) Small polydispersed circular DNA (spcDNA) in human cells: association with genomic instability. Oncogene 14 (8), 977-85.
38. Hahn, P.J. (1993) Molecular biology of double-minute chromosomes. Bioessays 15 (7), 477-84.
39. Tomaska, L. et al. (2009) Telomeric circles: universal players in telomere maintenance? Nat Struct Mol Biol 16 (10), 1010-5.

40. Huang, C. et al. (2017) The human CTC1/STN1/TEN1 complex regulates telomere maintenance in ALT cancer cells. Exp Cell Res 355 (2), 95-104.
41. Sinclair, D.A. and Guarente, L. (1997) Extrachromosomal rDNA circles--a cause of aging in yeast. Cell 91 (7), 1033-42.
42. Vogt, N. et al. (2004) Molecular structure of double-minute chromosomes bearing amplified copies of the epidermal growth factor receptor gene in gliomas. Proc Natl Acad Sci U S A 101 (31), 11368-73.
43. Zuberi, L. et al. (2010) Rapid response to induction in a case of acute promyelocytic leukemia with MYC amplification on double minutes at diagnosis. Cancer Genet Cytogenet 198 (2), 170-2.
44. Storlazzi, C.T. et al. (2010) Gene amplification as double minutes or homogeneously staining regions in solid tumors: origin and structure. Genome Res 20 (9), 1198-206.
45. Del Rey, J. et al. (2010) Centrosome clustering and cyclin D1 gene amplification in double minutes are common events in chromosomal unstable bladder tumors. BMC Cancer 10, 280.
46. Nathanson, D.A. et al. (2014) Targeted therapy resistance mediated by dynamic regulation of extrachromosomal mutant EGFR DNA. Science 343 (6166), 72-6.
47. Nikolaev, S. et al. (2014) Extrachromosomal driver mutations in glioblastoma and low-grade glioma. Nat Commun 5, 5690.
48. Libuda, D.E. and Winston, F. (2006) Amplification of histone genes by circular chromosome formation in Saccharomyces cerevisiae. Nature 443 (7114), 1003-7.
49. Kumar, P. et al. (2017) Normal and Cancerous Tissues Release Extrachromosomal Circular DNA (eccDNA) into the Circulation. Mol Cancer Res.
50. Zhu, J. et al. (2017) Molecular characterization of cell-free eccDNAs in human plasma. Sci Rep 7 (1), 10968.
51. Mackenzie, K.J. et al. (2017) cGAS surveillance of micronuclei links genome instability to innate immunity. Nature 548 (7668), 461-465.
52. de Oliveira Mann, C.C. and Kranzusch, P.J. (2017) cGAS Conducts Micronuclei DNA Surveillance. Trends Cell Biol.
53. Heitzer, E. et al. (2015) Circulating tumor DNA as a liquid biopsy for cancer. Clin Chem 61 (1), 112-23.
54. Brewer, B.J. et al. (2015) Origin-Dependent Inverted-Repeat Amplification: Tests of a Model for Inverted DNA Amplification. PLoS Genet 11 (12), e1005699.
55. Poole, A.M. et al. (2012) A positive role for yeast extrachromosomal rDNA circles? Extrachromosomal ribosomal DNA circle accumulation during the retrograde response may suppress mitochondrial cheats in yeast through the action of TAR1. Bioessays 34 (9), 725-9.
56. Taanman, J.W. (1999) The mitochondrial genome: structure, transcription, translation and replication. Biochim Biophys Acta 1410 (2), 103-23.
57. Bendich, A.J. (2004) Circular chloroplast chromosomes: the grand illusion. Plant Cell 16 (7), 1661-6.
58. Shlomai, J. (2004) The structure and replication of kinetoplast DNA. Curr Mol Med 4 (6), 623-47.
59. Shapiro, T.A. and Englund, P.T. (1995) The structure and replication of kinetoplast DNA. Annu Rev Microbiol 49, 117-43.

Chapter 2

Small extrachromosomal circular DNA produce short regulatory RNAs that suppress gene expression independent of canonical promoters

Adapted from Teressa Paulsen, Yoshiyuki Shibata, Pankaj Kumar, Laura Dillon, Anindya Dutta: Small extrachromosomal circular DNA, microDNA, produce short regulatory RNAs that suppress gene expression independent of canonical promoters. Nucleic Acids Res 2019;47:4586-4596. doi: 10.1093/nar/gkz155.

I did the work for figures 2-1, 2-2, 2-3 and 2-4; supplemental figures 2-1, 2-3, 2-4 and 2-5; and supplemental table 2-1. I also wrote the manuscript and responded to the reviewers.

Yoshi Shibata did figures 2-5, 2-6 and table 2-1.

Pankaj Kumar did the bioinformatic analyses of eccDNA molecules to address the reviewers in supplemental figure 2-6.

Laura Dillon did two biological replicates of the in vitro transcription experiment in supplemental figure 2-2.

Anindya Dutta supervised the project.

ABSTRACT

Interest in extrachromosomal circular DNA (eccDNA) molecules has increased recently for three reasons: their widespread presence in normal cells of every species examined, from yeast to humans; their increased levels in cancer cells; and their overlap with oncogenic and drug-resistance genes. However, the majority of eccDNA (microDNA) in mammalian tissues and cell lines are too small to carry protein coding genes. We tested the functional capabilities of microDNA by creating artificial microDNA molecules that mimic known microDNA sequences. We discovered that they express functional small regulatory RNA, including microRNA and novel si-like RNA. MicroDNA is transcribed in vitro and in vivo independently of a canonical promoter sequence. MicroDNA carrying miRNA genes form transcripts that the endogenous RNA-interference pathway processes into mature miRNA molecules. These miRNAs repress a luciferase reporter gene as well as endogenous mRNA targets of the miRNA. Further, microDNA that contain exon sequences repress the endogenous gene from which the microDNA was derived through the formation of novel si-like RNA. We also show that endogenous microDNA associate with the RNA polymerase subunits POLR2H and POLR3F.

Together, these results suggest that microDNA may modulate gene expression through the production of both known and novel regulatory small RNA.

INTRODUCTION

Extrachromosomal circular DNA (eccDNA) exists in all eukaryotic organisms tested (1–17) and arises from consistent hotspots within the genome, including 5’UTRs, exons and CpG islands (5–9). For a recent review see (18). The majority of eccDNA in normal cells are small, 200-400 base pairs, though they can range up to tens of thousands of base pairs (5,6,8,9,19). Megabase-sized eccDNA molecules have been found to amplify oncogenes. Smaller forms of eccDNA (>10,000 base pairs) have recently been found to amplify oncogenes (20–22), drug-resistance genes (23) and tissue-specific genes (5,8,19,24) as well. The smallest type of naturally occurring eccDNA, <1000 bp, are called microDNA (5,6,8). The limited size of these molecules precludes them from carrying full protein coding gene sequences and/or promoter sequences.

By electron microscopy, endogenous microDNA are both single and double-stranded (5,8).

Our hypothesis that microDNA can be transcribed was based on previous research on single-stranded circular DNA of 34-89 nucleotides. That work suggested that such circles can be transcribed without a promoter, in vitro and within cells, by a rolling circle mechanism in which the polymerase travels around the circle 12-260 times (25–27). However, these circles are much smaller than the average microDNA. Paramecium tetraurelia, on the other hand, creates double-stranded DNA circles of unknown length by fusing transposon-derived sequences, and these circles also express small regulatory RNA (28).

Here we investigate whether microDNA mimics can be transcribed in mammalian cells without a canonical promoter and whether the transcripts are functional within a cell. Both single-stranded and double-stranded microDNA (ranging from ~180-400 base pairs) are transcribed without a promoter in vitro and in vivo. RNA is produced from both strands of microDNA without strand bias. MicroDNA containing miRNA coding sequences, but lacking the promoter of the gene, produce functional miRNA. This miRNA can knock down both a luciferase reporter and endogenous mRNA targets. MicroDNA is known to be enriched in genic regions (8,19,24), so some microDNA carry exon sequences. We report that microDNA arising from exons can also affect gene expression by expressing novel si-RNA that targets the parental gene from which the microDNA was derived. We also show that RNA polymerase subunits are associated with endogenous microDNA, giving further evidence that microDNA molecules can be transcribed in cells. Together these results show that microDNA could produce functional regulatory RNA, both miRNA and novel si-RNA. They also suggest a new mechanism by which genomic plasticity and instability can lead to changes in gene expression.

MATERIALS AND METHODS

Cell Culture

293A and 293T cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% fetal bovine serum, 100 U/ml penicillin and 100 µg/ml streptomycin in an environment containing 5% CO2 at 37°C. HCT116 cells were cultured in McCoy’s medium supplemented with 10% fetal bovine serum, 100 U/ml penicillin and 100 µg/ml streptomycin in an environment containing 5% CO2 at 37°C.

Artificial microDNA synthesis

Artificial microDNA molecules containing known microDNA sequences were created using a protocol published by Du et al. (29). In short, microDNA sequences were amplified from HeLa genomic DNA by PCR. A circularly permuted molecule was created by PCR amplification of each half, and the two halves were cloned in the appropriate order into a pUC19 plasmid using an In-Fusion HD Cloning Kit (Takara). The substrates were amplified by PCR with Phusion polymerase (NEB). The linear double-stranded molecules were denatured and renatured. This produced both the parental linear molecules and a circle with one nick in each strand, the two nicks lying nearly half-way around the circle from each other. The nicks were then sealed by Taq Ligase (NEB). The products were taken through 10 cycles of denaturation, annealing and ligation to enrich for circular DNA.
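A short sketch can check the geometry of the annealed product. It assumes that two molecules are both present: the parental linear molecule and a permuted copy rotated by half its length. It also assumes that the top strand of the parental molecule anneals to the bottom strand of the permuted one; the reciprocal pairing gives an equivalent structure. The function verifies that every base of the permuted bottom strand pairs with the parental top strand around the circle, then reports where each strand's nick falls. The names and the test sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def anneal_parental_and_permuted(circle_seq):
    """Pair the top strand of the parental linear molecule (circle_seq) with the
    bottom strand of a copy permuted by half its length (second half + first half).

    Returns the circle position after which each strand is nicked, after checking
    that the bottom strand base-pairs with the top strand all the way round."""
    n = len(circle_seq)
    k = n // 2
    permuted = circle_seq[k:] + circle_seq[:k]
    bottom = reverse_complement(permuted)  # written 5'->3'
    for j, base in enumerate(bottom):
        # bottom[j] pairs with permuted[n-1-j], i.e. circle position (k + n - 1 - j) % n
        if base.translate(COMPLEMENT) != circle_seq[(k + n - 1 - j) % n]:
            raise ValueError("strands do not pair around the circle")
    return {"top_nick_after": n - 1, "bottom_nick_after": k - 1}


print(anneal_parental_and_permuted("ACGTTGCAGG" * 20))
# {'top_nick_after': 199, 'bottom_nick_after': 99}
```

For a 200 bp circle the two nicks sit 100 positions apart, on opposite strands. The hybrid is therefore held together by base pairing on both sides of each nick until ligation seals it.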

Residual linear DNA was digested using ExoI and ExoIII (NEB), and the products were separated on a denaturing PAGE gel. The band corresponding to dsDNA circles was excised, and the DNA was extracted for the in vitro transcription reactions. Unless otherwise noted, the final products after the ligation cycles and exonuclease digestion were transfected into cells.

In vitro transcription assay

The artificial microDNA (100-200 ng per 50 µL reaction) was transcribed in vitro for 4 hours at 37°C in IVT buffer (40 mM Tris-HCl pH 7.9, 6 mM MgCl2, 10 mM DTT, 2 mM spermidine, 0.1 mM NaCl). The reaction contained rNTPs (2 mM rATP, 2 mM rCTP, 2 mM rGTP, 0.4 mM rUTP), [α-32P]-UTP (volume dependent on radioactivity) and 20 µg of HeLa nuclear extract (Millipore). The radioactive product RNA was heat denatured and run on a denaturing (urea) PAGE gel. The gel image was captured using a phosphor imaging screen and a gel imager.
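For reference, the amount of template can be converted to a molar concentration. This sketch makes three approximations that are not values reported in the protocol: a 50 µL reaction, circles of 180 to 400 bp, and an average mass of about 650 g/mol per base pair of double-stranded DNA.

```python
BP_MASS_G_PER_MOL = 650.0  # approximate average mass of one base pair of dsDNA


def dsdna_nanomolar(mass_ng, length_bp, volume_ul):
    """Molar concentration (nM) of a double-stranded DNA of the given length."""
    moles = (mass_ng * 1e-9) / (length_bp * BP_MASS_G_PER_MOL)
    liters = volume_ul * 1e-6
    return moles / liters * 1e9


for mass_ng in (100, 200):
    for length_bp in (180, 400):
        conc = dsdna_nanomolar(mass_ng, length_bp, volume_ul=50)
        print(f"{mass_ng} ng of a {length_bp} bp circle in 50 uL: {conc:.1f} nM")
```

Under these assumptions, the template is present at roughly 8 to 34 nM.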

Transfections of microDNA

MicroDNA was transfected into cells using Lipofectamine LTX according to the manufacturer’s instructions. RNA was isolated 24 hours after transfection. Plates that received 0 ng of microDNA received 100 ng of a GFP plasmid as carrier. From transfection of a GFP-expressing plasmid, we estimate that 70-80% of the 293 cells take up the plasmid. We are unable to say what fraction of the input plasmid enters the cell or the nucleus.

RNA Isolation and Quantification

RNA was extracted using TRIZOL (Ambion) according to the manufacturer’s instructions. cDNA was synthesized using the miScript II RT Kit (QIAGEN).

Pre-microRNA sequences were quantified as follows. cDNA was synthesized with the miScript II RT Kit and miScript HiFlex Buffer. It was then amplified by QPCR with primers that flank the mature microRNA sequence within the pre-microRNA molecule.

Mature microRNA sequences were quantified as follows. cDNA of mature microRNA was synthesized with the miScript II RT Kit and miScript HiSpec Buffer. It was then amplified by QPCR using a primer that targets the microRNA sequence together with the 10X miScript Universal Primer.

The miScript II kit is intended to selectively amplify the short microRNA and not the longer product that may be created by ligation of the 3’ adaptor to pre-microRNA.

QPCR was performed using Power SYBR Green Master Mix (Life Technologies).
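This section does not state how QPCR signals were normalized. One widely used approach is the comparative Ct (2^-ΔΔCt) method, sketched below. The sketch assumes roughly 100% amplification efficiency for both the target and a reference RNA. The reference, the function name and the Ct values are hypothetical and not taken from this study.

```python
def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the comparative Ct (2^-ddCt) method,
    assuming ~100% amplification efficiency for both assays."""
    d_ct_test = ct_target_test - ct_ref_test   # normalize to the reference RNA
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    dd_ct = d_ct_test - d_ct_ctrl              # compare with the control sample
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: miRNA and reference RNA in microDNA-transfected vs. carrier-only cells
print(fold_change_ddct(24.0, 18.0, 27.0, 18.0))  # 8.0
```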

Luciferase assays

DNA oligonucleotides carrying the sequence of the mature miRNAs encoded by the microDNA (miR191, miR126, miR145) were cloned into the siCHECK (Promega) vector. The insert was placed in the 3’UTR of the Renilla luciferase gene, such that the sequence complementary to the miRNA was on the + strand. The effect of microDNA-produced miRNA on the siCHECK vector was quantified using the Dual-Luciferase Reporter Assay System (Promega) according to the manufacturer’s instructions.
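A typical way to reduce Dual-Luciferase readings to a repression value is sketched below. It assumes that a firefly luciferase signal measured in the same well serves as the internal control for the Renilla reporter. It also assumes that wells transfected without microDNA serve as the reference. The function names and readings are hypothetical.

```python
from statistics import mean


def normalized_ratios(renilla, firefly):
    """Renilla / firefly ratio for each well (firefly as the internal control)."""
    return [r / f for r, f in zip(renilla, firefly)]


def relative_activity(test_renilla, test_firefly, ctrl_renilla, ctrl_firefly):
    """Mean normalized reporter activity of test wells relative to control wells."""
    test = mean(normalized_ratios(test_renilla, test_firefly))
    ctrl = mean(normalized_ratios(ctrl_renilla, ctrl_firefly))
    return test / ctrl


# Hypothetical triplicate readings: reporter + microDNA vs. reporter + carrier only
activity = relative_activity([500, 520, 480], [1000, 1000, 1000],
                             [1000, 1040, 960], [1000, 1000, 1000])
print(f"relative activity {activity:.2f}, repression {1 - activity:.0%}")
```

With these made-up readings, the microDNA wells retain about half of the control reporter activity, i.e. roughly 50% repression.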

Construction and infection of HaloTag vectors

The plasmid pENTR4-HaloTag, which encodes the HaloTag sequence, was obtained from Addgene. POLR2H and POLR3F cDNA were amplified from hORFeome V5.1 clones. The PCR-amplified POLR2H or POLR3F cDNA was inserted in frame downstream of the HaloTag sequence.

HaloTag, HaloTag-POLR2H or HaloTag-POLR3F were subcloned into the pCW plasmid vector.

293T cells were transfected with plasmids pCW-HaloTag (or pCW-Halo-fusions), psPAX2 and pCMV-VSV-G using Lipofectamine 2000. Lentivirus was harvested from the supernatant after 48 hours, cleared by centrifugation and passed through a 0.45 µm filter. Stably transduced clonal 293A derivatives expressing HaloTag fusion proteins were obtained as follows. Lentivirus was added to 293A cells in the presence of 6 µg/ml polybrene. Cells were then selected with 3 µg/ml puromycin, and clones were isolated by dilution cloning.

HT Cell lysate preparation and pull-down on Halo-Link beads

All reagents were pre-chilled, and the entire procedure was performed on ice. HT fusion proteins were induced by adding 1 mg/ml doxycycline to cells in a 15-cm plate. Two days later, cells were washed with PBS, scraped and transferred into a microcentrifuge tube. After centrifugation, five packed cell volumes (PCV) of Hypotonic Buffer were added to the cell pellet, and the cells were allowed to swell for 15 min on ice. The Hypotonic Buffer was removed after centrifugation. Cells were resuspended in 0.5 PCV of Buffer LS, and an equal volume of Buffer LS containing 600 mM NaCl and 0.2% Triton X-100 was added. Cells were homogenized by passing them 10 times through a 27G hypodermic needle and were then rotated for 20 min. After centrifugation, the supernatant was transferred into a new microcentrifuge tube, and an equal volume of Buffer LS was added to it.
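The two dilution steps above set the salt concentration of the lysate that is applied to the resin. A minimal sketch of that arithmetic follows. It assumes ideal mixing, ignores the pellet's own volume, treats "600 mM NaCl" as the total NaCl of the high-salt buffer, and treats the NaCl content of Buffer LS (not stated in this protocol) as a hypothetical input defaulting to zero; the function names are illustrative.

```python
def mix(*parts: tuple) -> float:
    """Concentration after combining (relative_volume, concentration) pairs, assuming ideal mixing."""
    total_volume = sum(volume for volume, _ in parts)
    return sum(volume * conc for volume, conc in parts) / total_volume


def lysis_nacl_mM(buffer_ls_nacl_mM: float = 0.0,
                  high_salt_nacl_mM: float = 600.0) -> tuple:
    """NaCl (mM) during extraction and after the final dilution.

    buffer_ls_nacl_mM is a hypothetical input: the NaCl content of Buffer LS is not
    given in the protocol, so it defaults to 0. The cell pellet's own volume is ignored.
    """
    # 0.5 PCV Buffer LS + an equal volume of high-salt Buffer LS (600 mM NaCl)
    extraction = mix((0.5, buffer_ls_nacl_mM), (0.5, high_salt_nacl_mM))
    # The cleared supernatant is diluted with an equal volume of Buffer LS
    final = mix((1.0, extraction), (1.0, buffer_ls_nacl_mM))
    return extraction, final


print(lysis_nacl_mM())  # (300.0, 150.0)
```

Under these assumptions nuclear proteins are extracted at about 300 mM NaCl and bound to the resin at about 150 mM. The same arithmetic takes the 0.2% Triton X-100 to 0.1% and then 0.05%, if Buffer LS contains no detergent.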

The cell lysate was added to equilibrated HaloLink Resin and incubated for 30 min at room temperature with mixing on a tube rotator. After centrifugation, the supernatant was saved as the flow-through sample. The resin was washed five times with PD Washing Buffer, and the beads were boiled in Laemmli sample buffer to obtain the eluates.

MicroDNA extraction and identification

DNA associated with HT, HT-POLR3F or HT-POLR2H was purified with the QIAprep Spin Miniprep Kit according to the manufacturer’s instructions and amplified by rolling circle amplification (RCA) as described (6,8).

Initially, paired-end high-throughput sequencing (250 cycles PE) was performed on the RCA products on the Illumina MiSeq at the University of Virginia DNA Sciences Core (Charlottesville, VA, USA) according to the manufacturer's protocol (Illumina) (Table 2-1). Read quality was checked with the program fastqc (FASTX-Toolkit), which showed that the median base quality fell below 28 after position 150 of the read. To remove these low-quality bases, we trimmed each library to 150-bp paired-end reads before downstream analysis. We used the Burrows–Wheeler Aligner with maximal exact matches (BWA-MEM) under default settings, which allow split-read alignments, to align the reads to the human hg38 genome. As in our previous publications (5,6,30), we used the mapped positions of split reads to identify microDNA coordinates at base-pair resolution. In summary, to identify microDNA we considered paired-end reads in which one end mapped uniquely to the reference genome (the mapped end) and the other end did not map contiguously but instead formed a split read (the junctional read). In addition, the two parts of the split read had to flank the position of its mapped mate. Each microDNA identified in this way represents the genomic coordinates of a potential junction created by ligation of the two ends of a linear DNA fragment.

We also checked the polarity (strand information) of the reads: both parts of the split read must map in the same orientation, and the mapped mate must map in the opposite orientation.
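The junction criteria above can be expressed as a small filter over already-aligned read parts. The sketch below is a simplified illustration, not the published pipeline. It takes pre-parsed alignments rather than BAM records, uses hypothetical class and function names, assumes 0-based half-open coordinates, and assumes each part's offset within the read is given in reference-forward orientation (as the read sequence is stored in SAM), so that across a circle junction the later part of the read maps upstream of the earlier part.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Part:
    """One aligned piece of a read (hypothetical representation)."""
    chrom: str
    ref_start: int        # 0-based, inclusive
    ref_end: int          # 0-based, exclusive
    strand: str           # '+' or '-'
    read_offset: int = 0  # start of this piece within the read, reference-forward orientation


def call_microdna(part_a: Part, part_b: Part, mate: Part) -> Optional[Tuple[str, int, int]]:
    """Return (chrom, start, end) of a candidate circle, or None if a criterion fails."""
    # Both parts of the split read must map to the same chromosome in the same orientation.
    if part_a.chrom != part_b.chrom or part_a.strand != part_b.strand:
        return None
    head, tail = sorted((part_a, part_b), key=lambda p: p.read_offset)
    # Across a circle junction the read runs off the circle's end and continues at
    # its start, so the tail of the read maps upstream of the head.
    if tail.ref_start >= head.ref_start:
        return None
    start, end = tail.ref_start, head.ref_end
    # The uniquely mapped mate must lie on the same chromosome, in the opposite
    # orientation, and between the two split-read parts (inside the circle).
    if mate.chrom != head.chrom or mate.strand == head.strand:
        return None
    if not (start <= mate.ref_start and mate.ref_end <= end):
        return None
    return head.chrom, start, end


# A 150-bp read crossing the junction of a hypothetical circle at chr1:1,000-1,300.
head = Part("chr1", 1240, 1300, "+", read_offset=0)
tail = Part("chr1", 1000, 1090, "+", read_offset=60)
mate = Part("chr1", 1100, 1250, "-")
print(call_microdna(head, tail, mate))  # ('chr1', 1000, 1300)
```

A collinear split (the later part mapping downstream) is rejected, because it indicates a deletion-like rearrangement rather than a back-ligated circle.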

Evaluation of complexity of microDNA associated with RNA polymerases

MicroDNA were identified from the HaloTag pull-downs as described above. Because the numbers of mapped reads obtained from the HT, HT-POLR2H and HT-POLR3F libraries differed, we used a random sampling approach to compare microDNA complexity among samples (Table 2-1). A number of mapped reads equal to the total number of mapped reads in the HT library was randomly extracted from the HT-POLR2H and HT-POLR3F libraries, and this was repeated 10 times. Each randomly selected set of reads was processed as above, and the number of unique microDNAs identified was counted (Table 2-1). The mean and standard deviation of the 10 samples are presented in Figure 2-6C.
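The repeated random sampling can be sketched as follows. This is a minimal illustration under stated assumptions: the input format (one entry per mapped read, holding the junction that read supports or None) and the function name are hypothetical, and sampling is without replacement.

```python
import random
import statistics
from typing import Hashable, List, Optional, Sequence, Tuple


def subsampled_complexity(read_junctions: Sequence[Optional[Hashable]], n_reads: int,
                          n_iter: int = 10, seed: int = 0) -> Tuple[float, float]:
    """Mean and SD of unique microDNA counts over repeated random read subsamples.

    read_junctions has one entry per mapped read: the junction coordinates the read
    supports, or None if it supports no junction (hypothetical input format).
    """
    rng = random.Random(seed)
    counts: List[int] = []
    for _ in range(n_iter):
        subset = rng.sample(read_junctions, n_reads)  # sampling without replacement
        # Count distinct junctions; reads that support no junction are ignored.
        counts.append(len({j for j in subset if j is not None}))
    return statistics.mean(counts), statistics.stdev(counts)
```

Calling this once per library with the same `n_reads` (for example 500,000, as in the Table 2-1 legend) makes the unique-microDNA counts comparable across libraries of different sequencing depth.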

RESULTS

Synthesis of microDNA mimics

We created synthetic microDNA mimicking known microDNA sequences using a technique called ligase-assisted minicircle accumulation (LAMA), which relies on cycles of annealing, ligation and denaturation to produce small circular DNA molecules (Figure 2-1A) (29). Using sequencing data of microDNA isolated from human cancer cell lines, we designed circles that mimic known microDNA overlapping with microRNA sequences or with exons of non-coding or protein-coding genes. Both single-stranded and double-stranded artificial microDNA molecules were created and isolated (Figure 2-1B). The sequences of the circular molecules are listed in Supplemental Table 1. Double-strandedness of specific isoforms was verified by restriction endonuclease digestion, and topoisomerase I was used to distinguish supercoiled from relaxed circles (Supplemental Figure 2-1).
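The design step behind Figure 2-1A, two circularly permuted linear strands whose ends fall at different positions so that they anneal into a circle with staggered nicks, can be sketched as follows. This is an illustrative helper, not the protocol of (29): the function names, nick positions and 16-nt example sequence are hypothetical (real microDNA mimics are much longer), and a practical design would also consider annealing stability near each nick.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def rotate(seq: str, offset: int) -> str:
    """Circular permutation: the sequence read from position `offset` around the circle."""
    offset %= len(seq)
    return seq[offset:] + seq[:offset]


def lama_oligos(circle: str, top_nick: int, bottom_nick: int) -> Tuple[str, str]:
    """Two circularly permuted linear oligos (top strand, bottom strand, both 5'->3').

    The top oligo's ends meet at top_nick and the bottom oligo's ends meet opposite
    bottom_nick (both in top-strand coordinates), so annealing gives a circle with
    staggered nicks that ligase can seal.
    """
    if (top_nick - bottom_nick) % len(circle) == 0:
        raise ValueError("nicks must be staggered, not opposite each other")
    top = rotate(circle, top_nick)
    bottom = revcomp(rotate(circle, bottom_nick))
    return top, bottom


top, bottom = lama_oligos("ATGCGTACCTAGGCTA", top_nick=0, bottom_nick=8)
print(top)     # ATGCGTACCTAGGCTA
print(bottom)  # GTACGCATTAGCCTAG
```

Placing the nicks half a circle apart, as in this example, gives each strand the longest possible stretch of intact partner strand on either side of its own nick.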

MicroDNA mimics are transcribed in vitro

To test whether microDNA are transcribed, we used an in vitro transcription system based on HeLa nuclear extract, which contains human RNA polymerases. We isolated single-stranded and double-stranded microDNA (containing the hsa-mir-145 sequence) by denaturing PAGE and added transcriptionally competent HeLa nuclear extract, NTPs and radiolabeled UTP. Both single-stranded and double-stranded circular microDNA molecules were transcribed, but the linear DNA control containing the same sequence was not (Figure 2-1C). Relaxed circles produced significantly more transcripts than supercoiled circles (Supplemental Figure 2-1). This experiment was repeated with two other microDNA sequences (microDNA carrying the hsa-mir-126 sequence and the hsa-let-7a sequence), with similar results (Supplemental Figure 2-2). Each in vitro transcription experiment was performed in duplicate. The RNA products have distinct lengths that correspond to multiples of the microDNA sequence length, suggesting that some RNA polymerases dissociate from the template after one pass around the circle while others continue around it multiple times.

Because the microDNA sequences did not contain known promoters, this result also shows that the short double-stranded DNA circles, as well as the single-stranded DNA circles, are transcribed by human RNA polymerases independently of a canonical promoter sequence, whereas a linear DNA fragment is not transcribed. We hypothesize that bending of the double-stranded DNA enables binding of TATA-binding proteins (31–34), which recruit RNA polymerase to initiate transcription independently of a canonical promoter.

To determine where transcription initiates on a microDNA mimic (microDNA carrying the hsa-mir-191 sequence), we performed an in vitro transcription reaction and then quantified the RNA arising from different regions of the microDNA sequence. Transcription was not uniformly distributed around the circle but showed sequence bias (Supplemental Figure 2-3), suggesting that certain sequences within the microDNA are more likely than others to be bound by the transcription initiation machinery.


MicroDNA mimics are transcribed in vivo

We next tested whether artificial microDNA molecules can be transcribed in vivo. Because some microDNA molecules sequenced from human cancer cells contain miRNA sequences (8), we hypothesized that microDNA may be capable of expressing miRNA (Figure 2-2A). We made artificial microDNA molecules carrying miRNA sequences observed in naturally occurring microDNA: hsa-mir-145, hsa-mir-191 and hsa-mir-126. These sequences contained only the pre-miRNA part of the gene and not the rest of the primary miRNA, and hence excluded the promoter found at the 5’ end of the long primary miRNA, often tens of kb away from the mature miRNA. As the amount of transfected artificial microDNA increased, the level of each pre-miRNA transcript increased 6- to 30-fold (Figure 2-2B). At least some of the pre-miRNA transcripts are processed into mature miRNAs, which increased concurrently by 3- to 6-fold (Figure 2-2C).

In addition, transfection of linear DNA carrying the same sequence as the microDNA did not increase the RNA levels, confirming that the DNA must be circular to be transcribed (Supplemental Figure 2-4A). To further show that the RNA arises from circular DNA and not from linear contaminants or the genomic locus, we quantified the RNA using primers that amplify the junction sequence. RNA spanning the junction increased as more artificial microDNA was added (Supplemental Figure 2-4B). The fold increase of junction-spanning RNA is much higher than the fold induction seen with primers targeting the pre-microRNA within the microDNA. We believe this is because untransfected cells contain no endogenous RNA spanning the junction of the circle, so there is no endogenous signal elevating the basal level. The fold induction measured with junction-spanning primers therefore gives a more accurate representation of the quantity of RNA arising from microDNA.
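The effect of the basal signal on fold change can be made concrete with the widely used 2^-ΔΔCt calculation. The sketch below assumes that method and ~100% amplification efficiency, neither of which is stated in the text; the Ct values are hypothetical and chosen only to illustrate why a target with almost no endogenous RNA (such as a junction-spanning amplicon) can show a much larger fold change than one with a high basal level.

```python
def fold_change_ddct(ct_target: float, ct_ref: float,
                     ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Relative expression by the 2^-ddCt method (assumes ~100% PCR efficiency).

    Each target Ct is first normalized to a reference gene (e.g. beta-actin or GAPDH)
    in the same sample, then compared with the untransfected or negative-control sample.
    """
    d_ct_sample = ct_target - ct_ref
    d_ct_control = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(d_ct_sample - d_ct_control)


# Hypothetical Cts, with the reference gene at Ct 18 in every sample.
print(fold_change_ddct(22.0, 18.0, 25.0, 18.0))  # pre-miRNA with a high basal level: 8.0
print(fold_change_ddct(24.0, 18.0, 34.0, 18.0))  # junction RNA near background: 1024.0
```

Because the control Ct of a junction amplifier sits near background, even a modest amount of microDNA-derived RNA produces a large ΔΔCt, whereas the same RNA added on top of an abundant endogenous pre-miRNA shifts the Ct only slightly.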

Because the microDNA is transcribed without a canonical promoter sequence, we wondered whether RNA is transcribed equally from both strands when double-stranded circular microDNA is transfected. Reverse transcription with strand-specific primers followed by Q-PCR showed that the microDNA carrying hsa-mir-145 was transcribed at similar levels from either strand (Figure 2-2D). The relative stabilities of the RNAs arising from the two strands, and whether these RNAs are capped and polyadenylated, are unknown and will require further research.
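Strand specificity in this assay comes from the reverse-transcription primer, which can only anneal to RNA complementary to it. A minimal sketch of that rule follows; the function names and the example region are hypothetical, and practical primer design would also check melting temperature, length and off-target binding.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def rt_primer(plus_region: str, rna_strand: str) -> str:
    """DNA primer (5'->3') that primes cDNA synthesis only from RNA of the chosen strand.

    plus_region is a short stretch of the microDNA written 5'->3' on the + strand.
    + strand RNA carries the + strand sequence (U for T), so its primer is the reverse
    complement of plus_region; - strand RNA carries the reverse complement, so its
    primer is plus_region itself.
    """
    if rna_strand == "+":
        return revcomp(plus_region)
    if rna_strand == "-":
        return plus_region
    raise ValueError("rna_strand must be '+' or '-'")


print(rt_primer("AACGTGGC", "+"))  # GCCACGTT
print(rt_primer("AACGTGGC", "-"))  # AACGTGGC
```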

Overall, these results suggest that microDNA-like molecules can be actively transcribed within cells. The features of microDNA that allow transcription relatively independently of a canonical promoter remain to be determined. This research gives insight into how these circular DNA molecules, until now assumed to be inert byproducts of DNA metabolism, could contribute to cell physiology by actively producing RNA transcripts.

MicroRNA produced from microDNA mimics are functional

To determine whether the microDNA produced functional miRNAs, luciferase reporters containing target sequences complementary to the miRNAs in their 3’ UTRs were co-transfected with the microDNA into 293T cells for dual-luciferase assays. Each microDNA (containing the hsa-mir-145, hsa-mir-191 or hsa-mir-126 sequence) repressed the luciferase reporter carrying the target sequence of its miRNA by >50% (Figure 2-3A). Further, the microDNA containing hsa-mir-145 and hsa-mir-191, which carry both the 3p and 5p sequences of the microRNA, repressed luciferase reporters containing either the 3p or the 5p target sequence. This shows that transcripts arising from the microDNA are processed into both mature strands, which function in the same manner as endogenous microRNA.

We next examined whether repression of a luciferase reporter by a co-transfected microDNA depends on the endogenous RNA interference pathway. None of the microDNA carrying microRNA sequences repressed the luciferase reporter when transfected into DICER1 KO 293T cells (35) (Figure 2-3B). Thus, transcripts from microDNA must be processed by Dicer, through the same pathway as pre-miRNA produced from a chromosomal locus.

The microDNA carrying miRNA sequences also repressed endogenous cellular genes that are predicted targets of the encoded miRNA (predictions obtained from TargetScan). The microDNA carrying the hsa-mir-145 sequence repressed mir-145 targets by up to 40%, hsa-mir-191 microDNA repressed mir-191 targets by up to 60%, and hsa-mir-126 microDNA repressed mir-126 targets by up to 90% (Figure 2-3C). In each case, most targets were repressed to similar levels. Gene repression by a microDNA was specific to targets of the miRNA it encodes: for example, targets of miR-145 or miR-126 were not repressed by microDNA containing miR-191 (Supplemental Figure 2-4C).

Together, these results show that microDNA could contribute to the pool of functional short RNAs within a cell and thereby influence the expression of endogenous genes.

MicroDNA mimics containing exon sequences can repress host genes

MicroDNA are preferentially derived from genes and often contain exons (8). Short hairpin RNA (shRNA) sequences have long been known to be processed into siRNAs that repress target genes, so if a transcript from an exon-containing microDNA folds into a hairpin, it could be processed into an siRNA that represses the gene from which the microDNA was derived. Alternatively, if a microDNA is transcribed from both strands, the resulting double-stranded RNA could also be processed into a functional siRNA (Figure 2-4A). Indeed, a similar phenomenon has been suggested in Paramecium tetraurelia, where transposon-derived DNA sequences are ligated to form circles (of unknown size) that produce siRNAs that repress transposon expression (28).

To test whether microDNA could repress the parental gene, we created artificial microDNAs containing exonic sequences that we had identified in human cancer cells (Supplemental Table 1). These microDNA were transfected into 293T cells, in which nearly 80% of cells take up the transfected DNA (data not shown). Three microDNA containing different exon sequences each specifically repressed the host gene containing the matching exon sequence. A microDNA carrying the full sequence of exon 2 of the TESC-AS1 gene repressed the TESC-AS1 transcript by ~60% (Figure 2-4B). A microDNA encoding an exon of the KCNQ1OT1 gene repressed the KCNQ1OT1 transcript by ~60% (Figure 2-4C). A microDNA encoding the full exon 6 of the SIGLEC9 gene repressed SIGLEC9 mRNA by ~50% (Figure 2-4D). Here again, repression of the endogenous gene by the microDNA depended on the RNA interference pathway, specifically DICER1: repression of TESC-AS1, KCNQ1OT1 and SIGLEC9 RNAs was significantly attenuated in DICER1 KO 293T cells (Figure 2-4B to 2-4D).

Expression of genes lacking homology to the microDNA mimic was not affected by the introduction of microDNA: for example, the KCNQ1OT1 microDNA did not repress TESC-AS1 or SIGLEC9 (Supplemental Figure 2-4D). This shows that microDNA derived from exonic sequences can also produce novel si-like RNA.

Thus, regulatory short RNAs can be produced not only from microDNA that contain pre-microRNA genes but also from microDNA that overlap with exons. This greatly expands the proportion of eccDNA that could contribute to gene expression changes.

Endogenous microDNA are associated with RNA polymerases

We next tested whether RNA polymerases bind to endogenous microDNA. HaloTag (HT)-tagged POLR3F (a subunit of RNA Polymerase III) or POLR2H (a subunit of all three RNA Polymerases) was induced with doxycycline in 293A cells (Figure 2-5). The tagged subunits were captured by covalent linkage to HaloLink resin, and the pull-down was confirmed by western blotting of the eluate for a non-covalently associated RNA polymerase subunit, POLR3A (Figure 2-5). The DNA associated with HT alone or with the HT-POLR subunits was isolated, digested with exonucleases and amplified by multiple displacement amplification with random hexamers (Figure 2-6A). Significantly more amplified DNA was recovered from the POLR3F and POLR2H eluates than from the HT-alone negative control: the yield was below the limit of detection for the HT control, 728 ng for HT-POLR3F and 408 ng for HT-POLR2H. Because HT alone is a 33-kDa protein, the negative control still captures a sizeable protein on the beads, so the low yield from the control is not simply due to the absence of a bead-bound protein. Further, PCR of the sheared RCA products ligated to sequencing adapters showed that HT-POLR3F and HT-POLR2H pull down about 2^10 (~1,000)-fold more DNA than the HT control (Figure 2-6B). Because POLR2H and POLR3F both bind endogenous microDNA, microDNA could be transcribed by RNA Polymerase III, and possibly by RNA Polymerases I and II.
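The ~2^10 figure comes from comparing how many PCR cycles each library needs to give a comparable product. Assuming it reflects a 10-cycle difference and that each cycle multiplies the product by (1 + efficiency), a minimal sketch of the conversion is below; the function name and the efficiency parameter are illustrative assumptions.

```python
def fold_from_cycle_shift(delta_cycles: float, efficiency: float = 1.0) -> float:
    """Fold difference in starting template implied by reaching the same PCR yield
    delta_cycles earlier. efficiency=1.0 means perfect doubling every cycle."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return (1.0 + efficiency) ** delta_cycles


print(fold_from_cycle_shift(10))              # 1024.0
print(round(fold_from_cycle_shift(10, 0.9)))  # 613
```

At 90% efficiency the same 10-cycle shift corresponds to roughly 600-fold, so "~1,000-fold" is best read as the ideal-efficiency estimate.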

The rolling circle amplification products were subjected to Illumina sequencing to identify the microDNA and to determine the number of unique genomic sites represented in each microDNA library (its complexity). The HT-POLR3F and HT-POLR2H associated libraries had 27-fold and 6-fold higher complexity, respectively, than the HT-associated library (Table 2-1). Because the DNA yield after rolling circle amplification was much higher for the two POLR precipitates, we also compared complexity by randomly sampling equal numbers of high-throughput reads from the three precipitates. Even after equalizing read numbers, the HT-POLR3F and HT-POLR2H precipitates had 3- to 8-fold higher microDNA complexity than the HaloTag-only control (Table 2-1 and Figure 2-6C), providing further evidence that microDNA bind to RNA polymerase subunits. Sequencing of the multiple displacement amplification products also confirmed that microDNA associated with the RNA polymerases have the same characteristics found in our previous studies, including length peaks at around 200-400 bp with nucleosome-length periodicity (Figure 2-6E) and a GC content of around 45-50% (Figure 2-6F). Thus, naturally occurring microDNA is associated with RNA polymerases, consistent with the hypothesis that it can be transcribed to produce functional regulatory small RNAs.

We have previously reported that microDNA in mammalian cells are 4- to 10-fold enriched, relative to random expectation, in 5’ UTRs, exons, genes and CpG islands (8). In contrast, the RNA polymerase-associated microDNA are more uniformly distributed throughout the genome (Figure 2-6D), suggesting that the RNA polymerase subunits have a lower affinity for microDNA derived from genes or CpG islands.
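"Enrichment relative to random expectation" compares the fraction of microDNA overlapping a feature with the fraction expected by chance. One simple way to compute it, assumed here rather than taken from the source, uses the feature's share of the genome as the expectation; analyses often use shuffled random intervals instead. The single-chromosome coordinates, function names and toy numbers below are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def enrichment(microdnas: List[Interval], features: List[Interval], genome_size: int) -> float:
    """Observed / expected fraction of microDNA overlapping a feature class.

    Expected is approximated by the fraction of the genome covered by the features,
    assuming the feature intervals do not overlap each other and ignoring microDNA length.
    """
    observed = sum(any(overlaps(m, f) for f in features) for m in microdnas) / len(microdnas)
    expected = sum(end - start for start, end in features) / genome_size
    return observed / expected


# Toy genome of 1,000 bp with one feature covering 10% of it; 2 of 4 microDNA overlap it.
print(enrichment([(10, 20), (50, 60), (500, 510), (700, 710)], [(0, 100)], 1000))
```

In this toy case 50% of microDNA overlap a feature that covers 10% of the genome, giving an enrichment of about 5; a value near 1 corresponds to the more uniform distribution seen for the polymerase-associated microDNA.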

DISCUSSION

In summary, we have found that microDNA-like molecules within cells can be transcribed without a canonical promoter to form functional microRNA and novel sh- or si-RNA.

Transcripts from microDNA mimics carrying microRNA sequences are processed into mature microRNA that can repress the expression of downstream targets and of a luciferase reporter. MicroDNA arise from about 5% of the genome in the chicken DT40 cell line and from 0.4-1.5% of the genome in human HeLa, C4-2 and LNCaP cell lines (Supplemental Figure 2-5), and are enriched in genic regions (5,6,8). An intriguing finding of this study is that microDNA mimics carrying exonic sequences form novel sh- or si-RNA that repress the gene from which they originated.

Additionally, subunits of RNA polymerase complexes (POLR2H and POLR3F) are bound to naturally occurring microDNA, supporting the possibility that endogenous microDNA may be transcribed. Collectively, these results support the hypothesis that microDNA could be functional in cells, actively repressing genes through the RNA interference pathway by producing microRNA and novel siRNA.

EccDNA, including microDNA, are significantly increased in cancer cells (5,6,8,21). Long eccDNA (>10 kb) can amplify genes, including oncogenes, in cancer and change gene expression patterns that contribute to oncogenesis.

Our current results suggest that microDNA could similarly regulate gene expression through the formation of novel regulatory RNAs.

The length distribution of eccDNA in various tissues and organisms shows a distinct pattern of peaks corresponding to the lengths of nucleosome-bound DNA, suggesting that the excision of DNA from the genome may be regulated by nucleosomes. Interestingly, artificial eccDNA molecules introduced into cancer cell lines were recently found to be highly stable but rapidly silenced transcriptionally by epigenetic mechanisms (36). Although 359-bp microDNA-like plasmids have been shown to assemble into mono- or di-nucleosomes in vitro (37), it is unclear whether microDNA in cells are chromatinized and, if so, whether they can be epigenetically repressed like long eccDNA. It will therefore be important to test whether microDNA can be epigenetically regulated.

Because the length of microDNA corresponds to that of DNA fragments generated during apoptosis, it has been suggested that microDNA may arise at least partially as a byproduct of apoptosis (9). Even in that paper, however, the authors noted that microDNA arise from specific parts of the genome, that they arise in lymphoblasts resistant to the apoptosis-inducing agents methotrexate or L-asparaginase, and that they arise in cells not treated with any apoptosis-inducing agent; the authors therefore qualified their suggestion by stating that apoptosis is not the only source of microDNA. Our finding that microDNA formation is diminished in DT40 cells mutant for the mismatch repair gene MSH3 (8), together with the specific association of endogenous microDNA with RNA polymerase subunits reported here, reinforces the suggestion that microDNA are not solely produced by apoptosis, if at all.

More research is necessary to determine the exact mechanism by which microDNA are transcribed, but circularization is clearly important, because the equivalent linear DNA was not transcribed in vitro (Figure 2-1C and Supplemental Figure 2-2) or in vivo (Supplemental Figure 2-4A). Transcription of microDNA without a canonical promoter sequence may occur because of structural features unique to microDNA that attract an RNA polymerase, or because circularization of the linear DNA creates cryptic promoters, as seen in (38). At most one or two nucleosomes could be assembled on 359-bp circles in vitro (37,39). Such incomplete chromatinization may leave the intervening naked DNA more accessible to transcription factors, including RNA polymerases (40), allowing promoters or cryptic promoters to be bound more readily. The small size of the microDNA molecule may also contribute to its spontaneous transcription through the formation of flipped bases (41) and bubbles of ssDNA (42), the latter known to initiate transcription (43).

Further, the bent shape of small circular DNA may itself signal for the binding of TATA-binding proteins (31–34), although DNA bending can also limit transcription when associated with a bona fide promoter (44). Lastly, nicks have been shown to initiate transcription, so nicks in microDNA molecules could also contribute to their transcription (45).

We have previously reported that microDNA arise from epigenetically active, gene-rich areas of the genome (8). This raises the possibility that transcription leads to microDNA formation and that the resulting microDNA repress the parent gene by generating regulatory short RNAs. This could be a negative feedback mechanism that represses GC-rich genes when they are transcribed at high levels and produce microDNA, most likely through some DNA repair process. Intriguingly, a similar pathway has been proposed for the silencing of transposon genes in the germline nucleus of Paramecium tetraurelia, where transposon-derived DNA sequences are ligated to form circles (of unknown size) that produce siRNAs that associate with PIWI proteins to repress transposon expression (28).

One criticism of our study is that the amount of synthetic microDNA transfected into cells to see an increase in microDNA-encoded regulatory RNAs is substantially higher than endogenous levels of microDNA. We believe this is because of inefficiencies introduced by several factors: endogenous microRNA that sets a high basal level of the RNA for comparison, the low number of DNA molecules taken up per cell, and possible saturation of endogenous RNA polymerases. The microDNA sequences selected for this study were chosen because their encoded microRNAs and the target genes of those microRNAs were already known. As a result, the fold increase of microRNAs arising from transfected microDNA is diminished by the high basal level of endogenous microRNA in untransfected cells. Indeed, when we measure microDNA junction-specific RNAs, the fold change is very high because there is very little endogenous RNA to set a high basal level (Supplemental Figure 2-4B). Another important factor is that we do not know how much of the added microDNA is taken up by the cells; this fraction is likely low, which would lead to an underestimate of the effect of the synthetic exogenous microDNA. In addition, the microDNA molecules have no inherent structural feature that would allow them to escape from endosomes and be transported into the nucleus, which limits the amount of transfected microDNA that can be transcribed. Finally, it is unknown whether there is sufficient free RNA polymerase to associate with newly arriving nuclear microDNA, which would further decrease the proportion of exogenous microDNA that is actively transcribed. Overall, we believe our experiments underestimate the effect of the microDNA transfected into cells because of these limiting factors.

The controls in all experiments were designed so that the only variable between experiment and control is the specific sequence of the artificial microDNA transfected into the cell. This ensures that differences in the expression of the genes measured by Q-RT-PCR and luciferase assays can be attributed to the introduction of the exogenous microDNA. Further, the results are reproducible: each cell-based experiment was repeated at least three times with consistent results.

In summary, inspired by the prevalence of small eccDNA (microDNA) in normal and cancer cells, we examined whether molecules with similar characteristics could express functional gene products. The results suggest that microDNA could express regulatory short RNAs, raising the possibility that microDNA could cause changes in cell phenotype by regulating gene expression.

Figures

FIGURE 2-1 (A) Diagram of artificial microDNA synthesis, starting with two circularly permuted linear DNA molecules that anneal to form a circle with staggered nicks. (B) Circular and linear products of LAMA run on a denaturing PAGE gel before and after ligation cycles. Circular DNA accumulates as ssDNA and as nicked and supercoiled dsDNA. ss: single-stranded; ds: double-stranded; sc: supercoiled; rel: relaxed; circ: circular. (C) 32P-UTP-labeled RNA run on a PAGE gel after in vitro transcription of the templates indicated at the top. Products are seen at sizes corresponding to multiples of the microDNA length. NE: HeLa nuclear extract.


FIGURE 2-2 (A) Diagram of transcription of microDNA carrying only the promoterless pre-microRNA part of the microRNA gene. The transcripts are processed by endogenous RNA interference proteins into functional mature miRNA. (B) Expression of pre-microRNA molecules after transfection of the indicated amounts of the corresponding artificial microDNA molecules. Expression is relative to beta-actin (hsa-mir-191 and hsa-mir-126) or GAPDH (hsa-mir-145) and normalized to a negative control transfected with a GFP plasmid. Mean and S.E. of 6 transfections. (C) Expression of processed mature miRNA molecules after transfection of the indicated amounts of artificial microDNA molecules. Expression is relative to beta-actin (hsa-mir-191 and hsa-mir-126) or GAPDH (hsa-mir-145) and normalized to the negative control. Mean and S.E. of 6 experiments. (D) Expression of the + and – strands of the hsa-mir-145 pre-microRNA from microDNA molecules, relative to GAPDH. Strand-specific primers were used for reverse transcription to generate cDNA specifically from the (+) or (–) strand of RNA.


FIGURE 2-3 (A) Transfection of artificial microDNA carrying the indicated pre-microRNA sequences into 293T cells decreases expression of a co-transfected Renilla luciferase (RL) reporter containing a sequence complementary to the microRNA within its 3’ UTR. RL activity is expressed relative to a co-transfected firefly luciferase and normalized to the level in cells transfected with 0 ng of microDNA. Mean and S.E. of 3 experiments. * indicates p < 0.05 in a Student’s t-test. (B) Repression of luciferase in the dual-luciferase assay is observed in WT 293T cells but not in DICER1-/- 293T cells. (C) Endogenous cellular targets of the indicated microRNAs are repressed after transfection of synthetic microDNA carrying the indicated pre-miRNA genes. mRNAs were quantitated by Q-RT-PCR, expressed relative to the beta-actin gene and normalized to the level in cells transfected with 0 ng microDNA. Mean and S.E. of 3 experiments.


FIGURE 2-4 (A) Diagram of a theoretical mechanism for the formation of si-like or sh-like RNA from microDNA containing an exonic sequence, either by transcription of both strands of the microDNA or by folding of the transcript into a short hairpin. (B) Transfection of a microDNA containing the sequence of exon 2 of the TESC-AS1 gene represses expression of the TESC-AS1 gene in 293T cells. The repression is observed in 293T cells but not in DICER1-/- 293T cells. The same is seen with microDNA carrying a portion of (C) exon 1 of KCNQ1OT1 and (D) the exon of SIGLEC9. RNAs were quantitated by Q-RT-PCR, expressed relative to beta-actin and normalized to cells with 0 ng of transfected microDNA. Mean and S.E. of 3 experiments. * P < 0.05 in a Student’s t-test. (E) A luciferase reporter assay in which a sequence homologous to a known microDNA arising from the gene TESC-AS1 has been inserted into the 3’UTR of a Renilla luciferase gene. Renilla luminescence is normalized to luminescence from a synthetic firefly luciferase gene on the same plasmid. Mean and S.E. of 4 experiments.


FIGURE 2-5 Efficiency of pull-down of the RNA polymerase complex by HT. Detection of the RNA polymerase III catalytic subunit (POLR3A) in the HT-POLR2H or HT-POLR3F pull-down. HaloTag-fused RNA polymerase subunit POLR2H or POLR3F was expressed in 293A cells (asterisk). The HT-tagged polymerase subunits were covalently bound to the HaloLink resin, and non-covalently associated proteins were eluted by boiling in Laemmli sample buffer. The HT proteins are covalently bound to the beads and are mostly not eluted; some covalent bonds break to release traces of HT protein into the eluate (squares). However, non-covalently associated POLR3A of the RNA polymerase complex is specifically released in the eluates from the HT-POLR3F and HT-POLR2H pull-downs.


FIGURE 2-6 Subunits of Pol II and Pol III bind to microDNA. (A) Diagram of pull-down of Halo-tagged RNA polymerase subunits and purification of associated microDNA for rolling circle amplification with random hexamers (RCA). (B) RCA products were sheared, ligated to library adapters for high-throughput sequencing, amplified by PCR for the indicated number of cycles, and comparable aliquots were run on a gel and visualized by ethidium bromide fluorescence. (C) Complexity of microDNA in the libraries prepared from the POLR3F and POLR2H pull-downs relative to the tag-only control, as calculated in Table 2-1. (D-F) Characterization of microDNA molecules pulled down by POLR3F and POLR2H: (D) enrichment relative to random expectation of microDNA from areas of the genome with the indicated genomic features, (E) length distribution, (F) GC content.


Table 2-1

MicroDNA diversity associated with HT and with HT fused to the indicated RNA polymerase subunits. Summary of reads obtained from the HT, HT-POLR2H and HT-POLR3F associated microDNA libraries (2 independent sequencing runs performed on separate days, R1 and R2). Low-quality bases at the 3’ end of reads were removed by trimming the 250-bp PE reads to 150-bp PE reads. We also generated 75-base PE reads from the same library to identify microDNA shorter than 150 bp. The total number of unique microDNA (unique microDNA junctions) identified from each run is indicated.

To normalize for the number of paired-end reads in the three libraries, 500,000 mapped reads were randomly selected from each library and the number of unique microDNAs in each sample was determined. This was repeated 10 times, and the mean number of microDNA and its standard deviation are indicated in the last two columns.

Sample   Paired-end reads   Mapped reads   MicroDNA   Mean*    SD*
2H-R1    3,182,235          3,104,725      54,592     16,531   94
3F-R1    1,479,615          1,434,156      19,217     9,675    60
HT-R1    11,472,991         9,152,503      14,882     2,473    42
2H-R2    10,314,411         10,081,812     112,572    17,072   110
3F-R2    2,868,822          2,780,519      30,038     9,751    66
HT-R2    1,092,336          854,021        3,732      2,842    18

* Mean and SD of unique microDNA counts in 10 random samples of 500K PE mapped reads.


Supplemental Figures

SUPPLEMENTAL FIGURE 2-1 Circular microDNA mimic molecules of all topologies are transcribed by RNA polymerases in an in vitro transcription assay with HeLa nuclear extract. (A) Different topologies of microDNA (with the KCNQ1OT1 sequence as an example), analyzed by NdeI digestion (which cuts the double-stranded microDNA twice) or topoisomerase treatment (which relaxes supercoiled microDNA). REL-CC: relaxed covalently closed circular DNA; REL-N: relaxed nicked circular DNA; SC: supercoiled circular DNA; LIN-DS: linear double-stranded DNA; LIN-SS: linear single-stranded DNA. Representative of duplicate experiments. (B) qPCR of RNA products obtained by in vitro transcription (IVT) of microDNA (containing the hsa-mir-191 sequence) with the indicated topologies or strandedness. Mean and S.D. of 3 replicates.


SUPPLEMENTAL FIGURE 2-2 In vitro transcription of microDNA by RNA polymerases in HeLa nuclear extract is verified with two other microDNA sequences: (A) microDNA containing hsa-let-7a; (B) microDNA containing hsa-mir-145. Representative of duplicate experiments.


SUPPLEMENTAL FIGURE 2-3 Distribution of transcripts from microDNA is not random. (A) Diagram of the regions of the microDNA whose transcripts are quantified by Q-RT-PCR. (B) Quantification of RNA arising from each region of the microDNA after normalizing to the efficiency of each primer pair when used to amplify the circular DNA. Mean and S.E. of 3 replicates. (C) Sequence of the microDNA within each specified region.


SUPPLEMENTAL FIGURE 2-4 RNA is transcribed specifically from circular microDNA, and the repression of gene expression by RNA transcripts arising from microDNA is sequence-specific. (A) Circular, but not linear, hsa-mir-126 microDNA molecules increase the encoded RNA when transfected into 293T cells. Mean and S.D. of 3 replicates. (B) qPCR of RNA arising only from microDNA, quantified using primers that amplify the junction sequence of the microDNA (containing hsa-mir-126). Mean and S.D. of 3 replicates. (C) When microDNA containing the miR-191 sequence was transfected into human cancer cells, genes not targeted by miR-191 were not repressed. Mean and S.D. of 3 replicates. (D) When microDNA containing the KCNQ1OT1 sequence was transfected into human cancer cells, genes other than KCNQ1OT1 were not repressed. Mean and S.D. of 3 replicates.


SUPPLEMENTAL FIGURE 2-5 Diagram of the luciferase reporter assay used to test whether endogenous microDNA have the ability to repress gene expression. The luciferase reporter was transfected into a human cancer cell line and the repression of the Renilla luciferase gene with the TESC-AS1 homologous sequence was normalized to the control luciferase gene (the synthetic firefly luciferase gene).


SUPPLEMENTAL FIGURE 2-6 Percentage of the genome that gives rise to microDNA in (A) DT40, (B) HeLa, (C) C4-2, and (D) LNCaP cells. As more libraries are prepared and sequenced, the percentage of the genome covered by microDNA sequences does not increase linearly but appears to saturate at 4-5% of the genome for chicken DT40 cells and at 0.4-1.5% of the genome for human HeLa, C4-2, and LNCaP cells.
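The saturation behavior described in this legend can be illustrated by pooling libraries one at a time and recomputing the fraction of the genome covered by the union of microDNA intervals. The sketch below assumes 0-based, half-open (chromosome, start, end) intervals and a known genome size; the function names are hypothetical, and overlapping microDNA are counted only once.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open


def percent_genome_covered(intervals: Iterable[Interval], genome_bp: int) -> float:
    """Percentage of the genome covered by the union of microDNA intervals."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    covered = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:        # overlapping or abutting: extend block
                cur_end = max(cur_end, end)
            else:                       # gap: close the current block
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return 100.0 * covered / genome_bp


def saturation_curve(libraries: List[List[Interval]], genome_bp: int) -> List[float]:
    """Cumulative % of genome covered as libraries are pooled one by one."""
    pooled: List[Interval] = []
    curve = []
    for lib in libraries:
        pooled.extend(lib)
        curve.append(percent_genome_covered(pooled, genome_bp))
    return curve
```

A curve that flattens as libraries are added, as described for DT40, HeLa, C4-2, and LNCaP cells, indicates that new libraries mostly re-sample regions already represented.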


Name Primer Sequence MicroDNA sequence Legend

miR145 AB F CCCCCCAGAGCAATAAGCC CCCCCCAGAGCAATAAGCCACATCCGGCGACGTGTGGCACCCCACCCTGG Bold: pre-hsa-mir-145 sequence R GAAGGAGGCAAATCCAGCTGT CTGCTACAGATGGGGCTGGATGCAGAAGAGAACTCCAGCTGGTCCTTAGG Underlined: 5p and 3p miR145 sequence CD F GGCCACTCGCTCCCACCTTG GACACGGCGGCCTTGGCGCTGAAGGCCACTCGCTCCCACCTTGTCCTCA CGGTCCAGTTTTCCCAGGAATCCCTTAGATGCTAAGATGGGGATTCCTGG TTCAGCGCCAAGGCCGC R AAATACTGTTCTTGAGGTCATGGTTTCACAGCTGGATTTGCCTCCTTC

miR191 AB F CCCCAGGAAGTAAGAGGGCTATCTTTAGCG GCAGGAGCTCCCCCGCCCCCCGCCAACGGCTGGACAGCGGGCAACGGA Bold: pre-hsa-mir-191 sequence R GCAGGAGCTCCCCCGCCC ATCCCAAAAGCAGCTG TTGTCTCCAGAGCATTCCAGCTGCGCTTGGATTT Underlined: 5p and 3p miR191 sequence CD F AGAGGAACTGAGACCCAAGCAGC CGTCCCCTGCTCTCCTGCCTGAGCAGCGCCCTGGCCCAGATGGGGTGCC CCTGACCCCCAGACATACTTTACTGAGCTGCTTGGGTCTCAGTTCCTCTCA R CAGTTGCGCCCTCAGGCT GTTGCGCCCTCAGGCTGGAGGTGATGGGTGTAGACGTGGGAGAGCCGAG GCTGGTGGCCATCTCTGGAACCCTGGGGAGGATTGGCGAGGGAGGGTGG ACCCAGGACCTCTGGGTAGGGCTGCAATGGTAGTGACTCCCCCAGGGCTG CCTCAGCCCGCATCTCGCTAAAGATAGCCCTCTTACTTCCTGGGG

miR126 AB F GGAGGATAGGTGGGTTCCCGA GGAGGATAGGTGGGTTCCCGAGAACTGGGGGCAGGTTGCCCGGAGCCTC Bold: pre-hsa-mir-126 sequence R CGGTGCCGTGGACGGCGCATT ATATCAGCCAAGAAGGCAGAAGTGCCCCGTCCCGGGGTCCTGTCTGCATC Underlined: 5p and 3p miR126 sequence CD F TTCTGGAAGACGCCACGCCTC CAGCGCAGCATTCTGGAAGACGCCACGCCTCCGCTGGCGACGGGACATT ATTACTTTTGGTACGCGCTGTGACACTTCAAACTCGTACCGTGAGTAATAA TGCTGCGCTGGATGCAGACAG R TGCGCCGTCCACGGCACCG

Let-7 AB F TCCTCAGCCCTCTTTCCTCC TCCTCAGCCCTCTTTCCTCCCGCGTCCCCAGGAGGTGCCTCTGGAAGCCA Bold: pre-hsa-let-7a sequence R CACCGCAGATATTACAGCCACTTC CGGAGTCCCATCGGCACCAAGACCGACTGCCCTTTGGGGTGAGGTAGTA Underlined: 5p and 3p let7a sequence CD F GGTAGTAGGTTGTATAGTTTGGG GGTTGTATAGTTTGGGGCTCTGCCCTGCTATGGGATAACTATACAATCTACT GTCTTTCCTGAAGTGGCTGTAATATCTGCGGTG R CCAAAGGGCAGTCGGTCTT

TESC-AS1 AB F TCTCCATCCAGTTGGGGTGACT TCTCCATCCAGTTGGGGTGACTGGGGCCGATGCTGTTGCCTGTGTAATGTG Bold: Portion of TESC-AS1 exon 2 R AGACTAACGTTCAGAACAGACCTGA ATTCCTCCTCCTTAAAATAAGGATAAGTTAATGAGATGTCCACCAGAGGGC CD F CCCACCAGCCTCTGTGGC CTGGCGTGGACACAGCACATGGCCACAGAGGCTGGTGGGCGCTATGGAA TCTTGTCCCCTGGAGAGGCACACAGCCAGGGCAGAACATCAAGGTCAAG R CGCTATGGAATCTTGTCCCCTG GCTCTCCTGAAGGCTCTGCAGTGCTTAGTGACACCACCATCAATGACCGT CAGGTATCAGGTCTGTTCTGAACGTTAGTCT

SIGLEC9 AB F CAGGCTACATGCTGGCTGTG CAGGCTACATGCTGGCTGTGGAGAGTCCACATCACTCACCTGAGAGGCTG Bold: Portion of SIGLEC9 exon 6 R CCAATCTGACCACACTGAAAGG AACCCCTGACAGCGTTTGCATCCTCTATGCCCGTATCTCCCACGCCCGCT CD F CACGCCCGCTGCTGGCCT GCTGGCCTTGCCGATTTCTTCCTGCAGGACCTCACTCTGAGTGAAGAGAC CAGAGAGCCTTTCAGTGTGGTCAGATTGG R GGAGATACGGGCATAGAGGATG

KCNQ1OT1 AB F GAAAGGACACCATGTTGAACACATC ATCCCTTGTCAGATGAATATTTTCTCCCATTCTACAGGATATCTCTTAATAGT Bold: Portion of KCNQ1OT1 exon 1 R ATCCCTTGTCAGATGAATATTTTCTCC TTATTATTTCCTTTTATGTGTAGGTTTTTAGTTTGATACAGTCCCATTTGTCTA CD F AGGATATGGGTCTAGCGAAGATTTTATG TTTTTGTTTTTGTTGCCTGTGCTTATAAAGTCTTACCCATAAAATCTTCGCTA GACCCATATCCTGAAGGGTTTCCCTTCTGTTTTCTTCTAGTACTGTTTGCTT R GAAGGGTTTCCCTTCTGTTTTCTTCTA CAGGTCTCATGTTTAAGTCTCTAATCAATTTTGAGTTGATTTTTTTATATGCT GAGAGATAGTATAGTTTCATTCTTCTGCATATGATATCCTGTTTTCCCAACAT GATGTGTTCAACATGGTGTCCTTTC

SUPPLEMENTAL TABLE 2-1 Artificial microDNA were created using sequences of endogenous microDNA found in previous studies (Shibata 2012, Dillon 2015) that overlapped with microRNA. The primers amplified the staggered duplexes AB or CD for each microDNA. When these duplexes were annealed and subjected to LAMA, we obtained circular microDNA derived entirely from genomic sequence that matched the microDNA sequences acquired in previous studies. The sequences are listed under the ‘MicroDNA sequence’ column.

BIBLIOGRAPHY

1. Hotta Y, Bassel A. Molecular size and circularity of DNA in cells of mammals and higher plants. Proc Natl Acad Sci USA. 1965;53:356-362.
2. Cox D, Yuncken C, Spriggs AI. Minute Chromatin Bodies in Malignant Tumours of Childhood. Lancet. 1965:20.
3. Wong FY, Wildman S. Simple procedure for isolation of satellite DNA’s from tobacco leaves in high yield and demonstration of minicircles. Biochim Biophys Acta. 1972;1:5-12.
4. Møller HD, Parsons L, Jørgensen T, Botstein D, Regenberg B. Extrachromosomal circular DNA is common in yeast. Proc Natl Acad Sci USA. 2015;112(24):E3114-22.
5. Kumar P, Dillon LW, Shibata Y, Jazaeri AA, Jones DR, Dutta A. Normal and Cancerous Tissues Release Extrachromosomal Circular DNA (eccDNA) into the Circulation. Mol Cancer Res. 2017;9:1197-1205.
6. Shibata Y, Kumar P, Layer R, Willcox S, Gagan J, Griffith J, Dutta A. Extrachromosomal microDNAs and chromosomal microdeletions in normal tissues. Science. 2012:82-86.
7. Zhu J, Zhang F, Du M, Zhang P, Fu S, Wang L. Molecular characterization of cell-free microDNAs in human plasma. Sci Rep. 2017;7:1-11.
8. Dillon LW, Kumar P, Shibata Y, et al. Production of Extrachromosomal MicroDNAs Is Linked to Mismatch Repair Pathways and Transcriptional Activity. Cell Rep. 2015;11(11):1749-1759.
9. Mehanna P, Gagné V, Lajoie M, Spinella J, St-Onge P, Sinnett D, Brukner I, Krajinovic M. Characterization of the microDNA through the response to chemotherapeutics in lymphoblastoid cell lines. PLoS One. 2017:1-14.
10. Radloff R, Bauer W, Vinograd J. A dye-buoyant-density method for the detection and isolation of closed circular duplex DNA: The closed circular DNA in HeLa cells. Proc Natl Acad Sci USA. 1967:1514-1521.
11. Agsteribbe E, Kroon AM, Van Bruggen E. Circular DNA from mitochondria of Neurospora crassa. Biochim Biophys Acta. 1972;2:299-303.
12. Billheimer FE, Avers C. Nuclear and mitochondrial DNA from wild-type and petite yeast: circularity, length, and buoyant density. Proc Natl Acad Sci USA. 1969;2:739-746.
13. Buongiorno-Nardelli M, Amaldi F, Lava-Sanchez P. Electron microscope analysis of amplifying ribosomal DNA from Xenopus laevis. Exp Cell Res. 1976;1:95-103.
14. Ono T, Ozeki Y, Okubo S, Inoki S. Characterization of nuclear and satellite DNA from trypanosomes. Biken J. 1971;14(3):203-215.
15. Smith CA, Vinograd J. Small polydisperse circular DNA of HeLa cells. J Mol Biol. 1972;69(2):163-178.
16. Stanfield S, Helinski DR. Small circular DNA in Drosophila melanogaster. Cell. 1976;9(2):333-345.
17. Stanfield SW, Helinski DR. Cloning and characterization of small circular DNA from Chinese hamster ovary cells. Mol Cell Biol. 1984;4(1):173-180.
18. Paulsen T, Kumar P, Koseoglu MM, Dutta A. Discoveries of Extrachromosomal Circles of DNA in Normal and Tumor Cells. Trends Genet. 2018;34(4):270-278.
19. Møller HD, Mohiyuddin M, Prada-Luengo I, et al. Circular DNA elements of chromosomal origin are common in healthy human somatic tissue. Nat Commun. 2018;9(1):1069.
20. Schneider SS, Hiemstra JL, Zehnbauer BA, et al. Isolation and structural analysis of a 1.2-megabase N-myc amplicon from a human neuroblastoma. Mol Cell Biol. 1992;12(12):5563-5570.
21. Turner KM, Deshpande V, Beyter D, et al. Extrachromosomal oncogene amplification drives tumour evolution and genetic heterogeneity. Nature. 2017;543(7643):122-125.
22. Nathanson DA, Gini B, Mottahedeh J, et al. Targeted Therapy Resistance Mediated by Dynamic Regulation of Extrachromosomal Mutant EGFR DNA. Science. 2014;343:72-76.
23. Koo D-H, Molin WT, Saski CA, et al. Extrachromosomal circular DNA-based amplification and transmission of herbicide resistance in crop weed Amaranthus palmeri. Proc Natl Acad Sci USA. 2018;115(13):3332-3337.
24. Shoura MJ, Gabdank I, Hansen L, et al. Intricate and Cell-type-specific Populations of Endogenous Circular DNA (eccDNA) in Caenorhabditis elegans and Homo sapiens. G3: Genes|Genomes|Genetics. 2017;7:g3.300141.
25. Daubendiek SL, Kool ET. Generation of catalytic RNAs by rolling transcription of synthetic DNA nanocircles. Nat Biotechnol. 1997;15(3):273-277.
26. Daubendiek SL, Ryan K, Kool ET. Rolling-Circle RNA Synthesis: Circular Oligonucleotides as Efficient Substrates for T7 RNA Polymerase. J Am Chem Soc. 1995;117(29):7818-7819.
27. Seidl CI, Lama L, Ryan K. Circularized synthetic oligodeoxynucleotides serve as promoterless RNA polymerase III templates for small RNA generation in human cells. Nucleic Acids Res. 2013;41(4):2552-2564.
28. Allen SE, Hug I, Pabian S, Rzeszutek I, Hoehener C, Nowacki M. Circular Concatemers of Ultra-Short DNA Segments Produce Regulatory RNAs. Cell. 2017;168(6):990-999.e7.
29. Du Q, Kotlyar A, Vologodskii A. Kinking the double helix by bending deformation. Nucleic Acids Res. 2008;36(4):1120-1128.
30. Heo SJ, Tatebayashi K, Ohsugi I, Shimamoto A, Furuichi Y, Ikeda H. Bloom’s syndrome gene suppresses premature ageing caused by Sgs1 deficiency in yeast. Genes Cells. 1999;4(11):619-625.
31. Kim J, Klooster S, Shapiro D. Intrinsically Bent DNA in a Eukaryotic Transcription Factor Recognition Sequence Potentiates Transcription Activation. J Biol Chem. 1995;270(3):1282-1288.
32. Gimenes F, Takeda KI, Fiorini A, Gouveia FS, Fernandez MA. Intrinsically bent DNA in replication origins and gene promoters. Genet Mol Res. 2008;7(2):549-558.
33. Wanapirak C, Kato M, Onishi Y, Wada-Kiyama Y, Kiyama R. Evolutionary conservation and functional synergism of curved DNA at the mouse and other globin-gene promoters. J Mol Evol. 2003;56(6):649-657.
34. Pérez-Martín J, Rojo F, de Lorenzo V. Promoters responsive to DNA bending: a common theme in prokaryotic gene expression. Microbiol Rev. 1994;58(2):268-290.
35. Bogerd HP, Whisnant AW, Kennedy EM, Flores O, Cullen BR. Derivation and characterization of Dicer- and microRNA-deficient human cells. RNA. 2014;20(6):923-937.
36. Møller HD, Lin L, Xiang X, et al. CRISPR-C: circularization of genes and chromosome by CRISPR in human cells. Nucleic Acids Res. 2018.
37. Goulet I, Zivanovic Y, Prunell A, Revet B. Chromatin reconstitution on small DNA rings. J Mol Biol. 1988;200:253-266.
38. Lemp NA, Hiraoka K, Kasahara N, Logg CR. Cryptic transcripts from a ubiquitous plasmid origin of replication confound tests for cis-regulatory function. Nucleic Acids Res. 2012;40(15):7280-7290.
39. Sivolob A, Prunell A. Nucleosome conformational flexibility and implications for chromatin dynamics. Philos Trans R Soc A Math Phys Eng Sci. 2004;362:1519-1547.
40. Li B, Carey M, Workman JL. The Role of Chromatin during Transcription. Cell. 2007;128(4):707-719.
41. Irobalieva RN, Fogg JM, Catanese DJ, et al. Structural diversity of supercoiled DNA. Nat Commun. 2015;6:1-10.
42. Jeon JH, Adamcik J, Dietler G, Metzler R. Supercoiling induces denaturation bubbles in circular DNA. Phys Rev Lett. 2010;105(20):1-4.
43. Aiyar SE, Helmann JD, DeHaseth PL. A mismatch bubble in double-stranded DNA suffices to direct precise transcription initiation by Escherichia coli RNA polymerase. J Biol Chem. 1994;269(18):13179-13184.
44. Lionberger TA, Meyhöfer E. Bending the rules of transcriptional repression: Tightly looped DNA directly represses T7 RNA polymerase. Biophys J. 2010;99(4):1139-1148.
45. Lewis MK, Burgess RR. Transcription of simian virus 40 DNA by wheat germ RNA polymerase II. Priming of RNA synthesis by the 3’-hydroxyl of DNA at single strand nicks. J Biol Chem. 1980;(10):4928-4936.

Chapter 3

DNA repair of double-strand breaks by end-resection and homology dependent repair promote eccDNA formation

Adapted from: DNA repair of double-strand breaks by end-resection and homology dependent repair promote eccDNA formation. Teressa Paulsen, Pumoli Malapati, Rebeka Eki, Tarek Abbas, Anindya Dutta. Manuscript in preparation for submission.

Teressa Paulsen wrote the manuscript and performed all the experiments.

Pumoli Malapati contributed biological replicates in 293T cells treated with PARP1i, RAD51i, cisplatin, and UV exposure.

Rebeka Eki and Dr. Tarek Abbas generated and characterized the U2OS KO cell lines.

Dr. Anindya Dutta managed the project and edited the manuscript.

ABSTRACT

Extrachromosomal circular DNAs (eccDNA) are known to contribute to tumor adaptation and evolution by amplifying oncogenes through non-Mendelian inheritance. However, the mechanisms that initially form eccDNA have not been fully elucidated, owing to the complex interactions among DNA repair pathways and the lack of a method to quantify eccDNA abundance within a cell. Through the development of a sensitive and quantitative assay for eccDNA, we found that eccDNA are generated by pathways involved in the repair of DNA double-strand breaks (DSBs) and depend on DNA strand resection and the annealing of homologous sequences (MMEJ, HR, SSA, and MMR), especially after DNA damage. Inactivating the classical NHEJ (c-NHEJ) repair pathway significantly increases eccDNA abundance, which we show to be highest in the S and G2 phases of the cell cycle. Induction of double-strand breaks in cancer cells also increased the levels of large extrachromosomal DNA. Our results indicate that the induction of double-strand breaks and the use of strand-resection-dependent repair pathways increase eccDNA abundance, suggesting that inhibiting these pathways could limit tumor survival and adaptation.

INTRODUCTION

Interest in eccDNA has been increasing rapidly because of the growing appreciation of the prevalence of eccDNA in both normal and cancer cells, and of the ability of eccDNA to promote the genetic variability and resistance of cancer cells.1–9 Here, we study the mechanism by which eccDNA molecules are produced, first by developing a method to quantify circular DNA and then by using CRISPR/Cas9 to produce cell lines that lack specific genes essential for known DNA repair pathways.

Previously, the mechanisms of eccDNA formation were inferred from the sequences and characteristics of the eccDNA molecules.10–30 In both normal and cancerous cells, microhomology was found flanking the genomic locations of around 10-90% of the eccDNA in different libraries.5,7,31,32 This suggested that DNA repair mechanisms that either utilize microhomology or are triggered by secondary structures formed by microhomology contribute to eccDNA formation. However, many eccDNAs lack microhomology in their flanking sequences, suggesting that microhomology is not essential for eccDNA formation. All eccDNA libraries also contain a population of eccDNA from repetitive sequences in the genome, and these may be produced by DNA repair processes, such as homologous recombination, that are specific to highly repetitive regions of the genome. Thus, more definitive genetic studies were necessary to elucidate the major mechanisms of eccDNA formation.

EccDNA are known to increase the copy number of oncogenes in cancer, including full protein-coding genes, microRNA, and si-like RNA.1,30,33,34 EccDNA abundance is significantly higher in cancer cells than in normal cells, and their presence has been shown to enhance tumorigenicity, including cancer cell growth, survival, and drug resistance.15,35–38 Full-length oncogenes can be amplified more than 30-fold on eccDNA. Oncogenes amplified on eccDNA include c-Myc in colon cancer cells,35 PDGFRA and EGFR in gliomas and glioblastomas,38 and eIF-5A2 in ovarian cancer cells.37

Here we show that cellular eccDNA abundance increases significantly when the classical NHEJ (c-NHEJ) pathway is compromised and decreases significantly when alt-NHEJ pathways, especially MMEJ, are compromised. We also show that double-strand breaks (DSBs) in the genome promote eccDNA formation. Together, these findings suggest that treating cancer cells with DSB-inducing agents will increase eccDNA formation, promoting gene amplification and drug resistance. Conversely, the tumorigenicity of cancer cells and the amplification of oncogenes or drug resistance genes encoded by eccDNA could be decreased by inhibiting the MMEJ pathway. Because the MMEJ pathway is considered non-essential in normal cells,39 targeting this pathway could be a specific and non-toxic therapeutic option for cancer treatment. Further, because the majority of cancer cells have increased levels of eccDNA,1 this strategy could be generalizable to numerous cancer types and stages.


METHODS

ECCDNA QUANTIFICATION

The eccDNA was isolated from the various knockout cell lines and treated cells using a HiSpeed midi-prep DNA isolation kit (Qiagen Catalog: 12643). The linear DNA was then digested using ATP-dependent Plasmid-Safe DNase (Lucigen Catalog: E3110K). The remaining circular DNA was purified using the DNA Clean & Concentrator-5 Kit (Genesee Catalog: 11-303).

qPCR was then performed using SYBR master mix (Life Technology Catalog: A25778) with the outward-facing primers listed in Supplemental Table 3-1. The abundance of each eccDNA relative to mitochondrial DNA, normalized to the control sample, was obtained with the following formula, where Ct is the qPCR threshold cycle:

$$\text{Relative eccDNA level in treated sample} = \frac{2^{\,Ct_{\text{mtDNA, treated}} - Ct_{\text{eccDNA, treated}}}}{2^{\,Ct_{\text{mtDNA, control}} - Ct_{\text{eccDNA, control}}}}$$

The mean abundance of the eight eccDNAs was taken as the eccDNA abundance in that experiment. Three or five independent biological replicates were performed, as indicated in the figure legends.
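The normalization above (a comparative Ct, or 2^-ΔΔCt, calculation against mitochondrial DNA) and the averaging over eccDNAs can be sketched in Python. The sketch assumes that the per-experiment value is the arithmetic mean of the per-eccDNA relative levels; the function names and the example Ct values in the usage note are hypothetical.

```python
import statistics
from typing import Iterable, Tuple

# (Ct_eccDNA_treated, Ct_mtDNA_treated, Ct_eccDNA_control, Ct_mtDNA_control)
CtRow = Tuple[float, float, float, float]


def relative_eccdna_level(ct_ecc_t: float, ct_mt_t: float,
                          ct_ecc_c: float, ct_mt_c: float) -> float:
    """2^(Ct_mt - Ct_ecc) in the treated sample divided by the same
    quantity in the control sample (comparative Ct method)."""
    return 2 ** (ct_mt_t - ct_ecc_t) / 2 ** (ct_mt_c - ct_ecc_c)


def experiment_abundance(rows: Iterable[CtRow]) -> float:
    """Arithmetic mean of the relative levels of the eccDNAs assayed
    (eight per experiment in this chapter; the mean type is an assumption)."""
    return statistics.mean(relative_eccdna_level(*row) for row in rows)
```

For instance, an eccDNA whose Ct drops from 25 in the control to 24 in the treated sample, with an unchanged mitochondrial Ct of 15, gives `relative_eccdna_level(24, 15, 25, 15) == 2.0`, a twofold increase. Because a lower Ct means more template, the mitochondrial Ct is subtracted first in each exponent.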

CELL CULTURE

HeLa cells were cultured in McCoy’s medium, and 293A, 293T, and U2OS cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM); both media were supplemented with 10% fetal bovine serum, 100 U/ml penicillin, and 100 µg/ml streptomycin, and cells were grown in an environment containing 5% CO2 at 37°C. 293T cells were cultured in McCoy’s medium supplemented with 10% fetal bovine serum, 100 U/ml penicillin, and 100 µg/ml streptomycin in an environment containing 5% CO2 at 37°C.


DRUG TREATMENTS

Drugs were added to the cell cultures at the following concentrations: NCS (200 ng/mL), cisplatin (2 µM), paclitaxel (10 nM), B02 (5 µM), AZD2461 (10 µM), MMS (200 µM), Mx (250 µg/mL), D-103 (30 µM), NSC16168 (0.5 µM). DNAPKi (Sigma Aldrich) was added at 30 µM.

Cells were harvested after 48 hours of treatment.

FACS ANALYSIS

Cells were fixed in 70% ethanol, washed, and then stained with propidium iodide (Thermo Fisher Catalog: P1304MP) according to the manufacturer’s recommendations before FACS analysis.

TRANSFECTION

The p413 plasmid expresses Cas9 together with a gRNA that lacks a targeting sequence. The experimental plasmid has the chr22 gRNA targeting sequence inserted into the multiple cloning site of the gRNA. The plasmids were transfected into 293A cells with PEI according to the manufacturer’s instructions. Puromycin was added 48 hours after transfection to select transfected cells, and eccDNA was isolated 2 days after transfection.

RESULTS

A QUANTITATIVE ASSAY FOR ECCDNA ABUNDANCE

To determine the abundance of eccDNA, we developed a quantitative assay that amplifies sequences of 8 of the most abundant eccDNA molecules detected by next-generation sequencing of eccDNA in chicken DT40 or human cancer cell lines (as indicated in Supplemental Table 3-1) and normalizes the eccDNA levels to that of endogenous circular mitochondrial DNA. To eliminate chromosomal DNA with tandem duplications that might give a confounding signal, the eccDNA was enriched using a circular DNA (plasmid) isolation kit, and the remaining contaminating linear DNA was then removed by digestion with an exonuclease for 24 hours. We then carried out inverse PCR with primers that are outward-directed on genomic DNA. Such primers will not give a product on contaminating linear DNA but will give a product when the same DNA is circularized as an eccDNA. We verified by Sanger sequencing that the amplicons from the inverse PCR reactions are indeed from circularized genomic DNA (Supplemental Table 3-1). Quantitative PCR with the inverse primers was followed by normalization to the circular mitochondrial DNA present in the same preparation.
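The logic of the inverse PCR design can be illustrated with a small Python model that predicts whether a primer pair yields an amplicon on a linear or a circular template. The sequences in the usage note are hypothetical, and the model ignores primer sites that span the circle's ligation junction as well as linear products from tandem duplications, which the exonuclease step is intended to remove.

```python
from typing import Optional

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return "".join(_COMPLEMENT[b] for b in reversed(seq.upper()))


def predicted_amplicon(template: str, fwd: str, rev: str,
                       circular: bool) -> Optional[int]:
    """Predicted PCR product length (bp), or None if no exponential product.

    template is the top strand. fwd has top-strand sequence and extends
    rightward; rev is the reverse complement of a top-strand site and
    extends leftward. Assumes unique primer sites that do not span the
    circle's ligation junction.
    """
    template = template.upper()
    f = template.find(fwd.upper())   # start of forward-primer site
    r = template.find(revcomp(rev))  # start of reverse-primer site
    if f == -1 or r == -1:
        return None
    if f < r:
        # Inward-facing pair: ordinary amplicon on either topology.
        return r + len(rev) - f
    if not circular:
        # Outward-facing on linear DNA: extensions run off opposite ends.
        return None
    # Outward-facing on a circle: extension from fwd passes the junction.
    return (len(template) - f) + r + len(rev)
```

With the 28-nt template `T = "TTGACCGATGCAAGTCCTAGGCTTAACG"`, the outward-facing pair `fwd="AGGCTT"`, `rev="TCGGTC"` gives `None` when `circular=False` and an 18-bp product when `circular=True`, whereas the inward-facing pair `fwd="GACCGA"`, `rev="AAGCCT"` gives a 22-bp product on either topology.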

ECCDNA FORMATION IS INDUCED BY DOUBLE-STRAND DNA BREAKS (DSB)

We hypothesized that eccDNA formation is tied to DNA repair mechanisms and thus that eccDNA levels would increase after DNA damage. Treatment of 293T cells with various agents that disrupt DNA structure, most notably those that induce double-strand breaks, increased eccDNA within 48 hours (Figure 3-1B-N). For example, the addition of cisplatin, which causes DNA crosslinking,40 increased the abundance of eccDNA (Figure 3-1B). Repair of cross-linked DNA is known to lead to the formation of DSBs.40 Induction of thymine dimers by UV radiation also increased eccDNA levels (Figure 3-1C). UV radiation is also known to induce DSBs, though at a much lower frequency than thymine dimers.41 Additionally, treatment with MMS, which methylates DNA and thereby causes replication stalling and double-strand breaks,42 increased eccDNA abundance (Figure 3-1D). Finally, several agents known to directly cause DSBs (neocarzinostatin, bleomycin, and X-rays)43,44 increased eccDNA abundance (Figure 3-1E-G). Interestingly, the increase in eccDNA caused by each type of damage plateaued at around 2-fold, suggesting that eccDNA production is limited either by repair dynamics or by the amount of damage a cell can withstand.

ECCDNA FORMATION IS INCREASED AT THE LOCUS OF A DOUBLE-STRAND BREAK

To ask whether DSBs lead to eccDNA formation locally near the DSB, we induced a DSB at a specific locus in 293A cells by transfecting a vector expressing Cas9 and either a control sgRNA or an sgRNA that anneals to a specific location in the genome (Chr22:18624104). The amount of eccDNA produced from that locus was measured using a few outward-directed primer pairs located within 2 kb of the DSB site (Figure 3-1H, I). The amplicons quantified by qPCR were confirmed by Sanger sequencing to arise from circular DNA (Figure 3-1H-J, Supplemental Table 3-2). EccDNAs arise from the neighborhood of the DSB and are induced 1.5- to 2.5-fold following the DSB (Figure 3-1J). The eccDNAs observed were 1511 bp (primer set 1), 680 bp (primer set 2), and 946 bp (primer set 3) long and were detected primarily on one side of the DSB, although this could be a locus-specific effect. Surprisingly, one of the eccDNAs detected (from primer set 1) spanned the DSB, suggesting that it was produced by copying DNA from an uncut allele, either before Cas9 was active or while it was active on one allele but had not yet cut the other. These results show that a DSB can induce the formation of eccDNA from the chromosomal neighborhood.

ECCDNA FORMATION IS SUPPRESSED BY c-NHEJ

To elucidate which specific DNA repair pathways form eccDNA as products of repair, we applied our quantitative eccDNA assay to an array of isogenic cell lines in which DNA repair genes were knocked out with CRISPR or inhibited by specific small-molecule inhibitors (Figure 3-2A-R). The results from human and chicken cells under normal growth conditions are summarized in Figure 3-3C. Human U2OS cells lacking the c-NHEJ pathway genes DNA-PKcs, XRCC4, XLF, LIG4, or 53BP1 all had significantly increased eccDNA (Figure 3-2B). DNA-PKcs acts in the early phase of c-NHEJ: together with KU70/KU80 it forms the DNA-PK holoenzyme, which binds the DNA break ends to hold them together during repair and blocks recruitment of end-processing enzymes.46 The critical role of DNA-PK in suppressing eccDNA production is further supported by the increased eccDNA levels observed in human embryonic kidney-derived 293T cells treated with a specific DNA-PK inhibitor (CAS-20357-25-9) (Figure 3-2M). XLF recruits XRCC4-LIG4 to ligate the broken ends together.47 53BP1 is recruited to DSBs and promotes c-NHEJ by inhibiting DNA end resection via the shieldin complex, thus inhibiting all resection-dependent repair pathways.48

To determine how general this result is, we turned to chicken DT40 lymphoma cells, in which many DNA repair genes have been knocked out by classic homologous recombination over the last thirty years. Loss or inhibition of several c-NHEJ repair genes in DT40 cells also increased eccDNA levels (Figure 3-2G). Cells with a LIG4 knockout showed a 3-fold increase in eccDNA abundance. Knockout of Ku70 did not significantly change eccDNA levels, but we suspect this could be explained by the higher level of cell death in this cell line.

Together, these results show that under normal growth conditions, the formation of eccDNA is suppressed by repair of DSBs through the c-NHEJ pathway.

ECCDNA FORMATION IS FACILITATED BY ALT-NHEJ REPAIR

We next turned to DNA repair pathways outside of c-NHEJ to test whether eccDNA formation is linked to alt-NHEJ pathways, particularly MMEJ (Figure 3-2D, 3-2N). U2OS cells that lack POLQ, a helicase-polymerase involved in unwinding DNA and facilitating the annealing of homologous ssDNA in MMEJ,49 have significantly reduced levels of eccDNA. POLQ has also been shown to prevent RAD51 binding, so loss of POLQ stimulates HR;50 yet eccDNA formation is reduced, suggesting that HR may not be involved in eccDNA formation. Further, adding the PARP1 inhibitor AZD2461 to 293T cells also reduced eccDNA levels (Figure 3-2N). PARP1 tethers DNA ends and interacts with XRCC1 and LIGIII to promote MMEJ,51 and the lack of PARP1 greatly reduces MMEJ.39 Interestingly, the lack of PARP1 also decreases microhomology at junctions formed during class-switch recombination at the immunoglobulin locus.39,52 These two results suggest that MMEJ is important for eccDNA formation.

PARP1 inhibition is known to disrupt base excision repair (BER) of some types of DNA lesions.53 To determine whether the PARP1 inhibitor alters eccDNA production through the BER pathway, we tested eccDNA levels after loss or inhibition of two genes in the BER pathway, LIG1 and APE1.54,55 Knockout of LIG1, the ligase used in BER,54 did not significantly change eccDNA abundance in U2OS cells (Figure 3-2F). Furthermore, knockout of FEN1, the endonuclease needed in BER to process flap-containing intermediates,56 did not change eccDNA abundance in DT40 cells (Figure 3-2L). The APE1 endonuclease is necessary at an early stage of BER because it is recruited to apurinic sites, where it cleaves the DNA and recruits other BER proteins.55 Consistent with BER not contributing to eccDNA production, an APE1 inhibitor (Mx) did not significantly alter eccDNA abundance in 293T cells (Figure 3-2P). Therefore, the role of PARP1 in promoting eccDNA production likely depends on its role in MMEJ rather than in BER.

Alt-NHEJ requires limited end resection, an activity mediated by the MRE11-RAD50-NBS1 nuclease complex (MRN) and CtIP. NBS1, a component of the MRN complex, is recruited early to DSBs, is critical for resecting DNA ends, and is a major factor in shifting repair from c-NHEJ to the end-resection-dependent alt-NHEJ and HR pathways.57 Knockout of NBS1 significantly decreased eccDNA abundance in both human U2OS and chicken DT40 cells (Figure 3-2C, 3-2H). This reduction is consistent with MMEJ contributing to eccDNA formation.

To test other repair pathways downstream of MRN-mediated end resection, we analyzed eccDNA levels in cells lacking functional proteins for single-strand annealing (SSA) and homologous recombination (HR). RAD52 is necessary for SSA because it mediates the annealing of complementary single strands in a RAD51-independent manner.58 We found no change in eccDNA abundance after adding the RAD52 inhibitor D-103 to 293T cells (Figure 3-2Q), suggesting that SSA is not essential for eccDNA formation.

To test the contribution of HR to eccDNA formation, we examined DT40 cells lacking functional HR proteins, including BRCA1, BRCA2, or CtIP (Figure 3-2K). Alternatively, we added the RAD51 inhibitor B02, which prevents the strand-exchange activity of RAD51,59 to 293T cells (Figure 3-2O). We did not detect a significant impact on eccDNA production in cells deleted of BRCA2, but we detected a slight increase in eccDNA levels after RAD51 inhibition. This suggests that the HR genes acting downstream of resection may not be essential for eccDNA formation, and that the lack of RAD51 activity may increase eccDNA formation by increasing DNA secondary structure formation.

However, the results in chicken DT40 cells suggest caution in dismissing the role of HR pathway proteins in eccDNA formation. The decrease in eccDNA after BRCA2 deletion was not statistically significant because of experimental variability, but the lack of RAD54 or BRCA1 in DT40 cells significantly decreased eccDNA abundance (Figure 3-2K). The requirement for NBS1 of the MRN nuclease complex, but not for CtIP, could be explained by the fact that the MRN complex can bind and resect DNA independently of CtIP and BRCA1, albeit at a slower rate,60 and that other enzymes, such as DNA2 and EXO1, are also involved in strand resection.61 A likely explanation is that some of the enzymes used in HR, perhaps for end resection and strand invasion, are important for eccDNA formation even if the entire HR machinery is not essential. The production of eccDNA does not depend on the NER pathway, as tested by inhibiting ERCC1-XPF with NSC16168 (Figure 3-2R).62,63

MISMATCH REPAIR AND FAN1 NUCLEASE ALSO PROMOTE ECCDNA FORMATION

We also examined a few selected DNA repair pathways outside of c-NHEJ and alt-NHEJ that may contribute to eccDNA formation. We previously reported that DT40 cells with defects in MSH3 from the mismatch repair (MMR) pathway produced less eccDNA, as measured by counting the extracted circles under an electron microscope.7 With the inverse-PCR assay used here, we find that knock-out of MSH2 or MLH1 in U2OS cells, as well as knock-out of MSH3 in DT40 cells, all led to a substantial decrease of eccDNA (Figure 3-2E, 3-2I). This confirms and extends our earlier results and suggests either that DNA secondary structures that signal for mismatch repair lead to eccDNA formation through further involvement of MMEJ proteins, or that alt-NHEJ pathways involve MMR proteins at some level. Further research is required to determine whether MMEJ and MMR contribute independently to eccDNA formation or whether cooperation between the two pathways leads to eccDNA formation.

Searching for other nucleases that may contribute to eccDNA formation, we found that FAN1, a nuclease initially implicated in unhooking interstrand cross-links in the Fanconi anemia (FA) pathway,64,65 is needed to sustain eccDNA levels in DT40 cells (Figure 3-2J). Recent work has suggested that the unhooking reaction in cross-link repair is most likely carried out by other nuclease complexes, EME1-MUS81 and ERCC1-XPF,66 so the link between FAN1 and the FA pathway remains unclear. Our results suggest that FAN1 has an unexpected role in eccDNA formation, perhaps in cooperation with the MRN complex implicated earlier.

CELLS LACKING C-NHEJ PRODUCE MORE ECCDNA AFTER DSBs

To examine the interaction between DSBs and the repair pathways implicated above in eccDNA formation, we induced DSBs with neocarzinostatin (NCS) in cells with compromised DSB repair (results summarized in Figure 3-3C, bottom). Cells lacking the c-NHEJ genes XRCC4 or 53BP1 produced more eccDNA upon addition of NCS than WT cells did (Figure 3-3A, p-value < 0.05). This result is consistent with the results above suggesting that c-NHEJ repair decreases eccDNA levels.

On the other hand, cells compromised in alt-NHEJ pathways do not show an increase of eccDNA after DSB induction. The PARP1 inhibitor AZD2461 suppressed the increase of eccDNA after DSBs (Figure 3-3B, p-value = 0.02), consistent with the hypothesis that MMEJ is important for eccDNA production. Similarly, knock-out of MLH1 from the MMR pathway suppressed the increase of eccDNA after DSBs (Figure 3-3B, p-value = 0.04). We conclude that alt-NHEJ and MMR promote eccDNA formation both under basal conditions and following DSB induction. Thus, our results show that DSB repair by c-NHEJ decreases eccDNA production, while alt-NHEJ and MMR remain important for eccDNA formation after DSBs (summarized in Figure 3-3C).

The RAD51 or RAD52 inhibitors suppressed the production of eccDNA after DSBs (Figure 3-3B), but the reduction in NCS-induced eccDNA did not differ significantly from that in control cells (as summarized in Figure 3-3C). Taken together, the results in normally proliferating cells and in NCS-treated cells suggest that some HR proteins involved in strand invasion may be required for eccDNA formation, but not the entire HR apparatus.

ECCDNA FORMATION IS INCREASED IN S-, G2-, AND M-PHASE OF THE CELL CYCLE

The extent of endogenous DSBs and the use of different DNA repair pathways vary across the cell cycle (Figure 3-4A, 3-4B). DSBs increase during normal DNA replication in S-phase, when the replication fork runs into nicks or other barriers to DNA replication.67 c-NHEJ is used throughout the cell cycle, but repair pathways dependent on end resection are used only in S- and G2-phase.68 DNA repair that uses only short stretches of homology (MMEJ) is active earlier in S-phase; as chromosomes are replicated and homologous sequences become available on the sister chromatid, repair pathways that use longer stretches of homology (SSA and HR) become active.51,69 We therefore hypothesized that eccDNA levels would increase from mid-S-phase onward.

Cells blocked at the G1/S transition with hydroxyurea were released from the block by washing and harvested at 2.5, 5, 7.5, and 9 hours after release. FACS analysis of propidium iodide staining confirmed that the cells progressed through S-phase as expected (Supplemental Figure 3-1). EccDNA levels increased progressively as the cells moved through S-phase (Figure 3-4B).

We also measured eccDNA in cells blocked in M-phase by nocodazole or paclitaxel and found the levels to be elevated, similar to those of cells in late S/G2-phase (Figure 3-4B). Together, these results show that eccDNA formation increases during S-phase, when DSBs appear and resection-dependent DNA repair pathways are most active.

Interestingly, even though the nuclear envelope breaks down in M-phase cells arrested with nocodazole or paclitaxel, exposure of the nuclear eccDNA to the cytosol did not result in immediate degradation of the eccDNA by cytoplasmic DNases. Consistent with a role for replication-associated DSBs in producing eccDNA, preventing DNA replication with aphidicolin, an inhibitor of replicative DNA polymerases, lowered eccDNA levels (Figure 3-4C).

DISCUSSION

EccDNA are formed by DNA repair pathways, especially after induced DNA damage. Double-strand breaks (DSBs), DNA crosslinks, and alkylating damage all increased eccDNA formation; in each case, either the original insult or its repair increases the likelihood of DSB formation. EccDNA arise at higher frequency from a specific locus after a DNA double-strand break is induced there. MMEJ, MMR, and the FAN1 nuclease contribute significantly to the formation of eccDNA. Proteins involved in strand displacement and homology searching also appear to contribute, a requirement that becomes more evident when cells are stressed with an exogenous DSB-inducing agent. It is also clear that functional c-NHEJ suppresses eccDNA formation. Conversely, eccDNA formation is tied directly to resection-dependent DNA repair pathways (MMEJ, SSA, HR), and eccDNA levels increase significantly when the use of those pathways rises in cells deficient in c-NHEJ repair. EccDNA are produced at higher levels during S-phase and G2-phase, when resection-dependent DNA repair pathways are most used (Figure 3-4B).

It has previously been shown that two DSBs induced by CRISPR/Cas9 within the same chromosome can lead to circularization of the excised DNA by c-NHEJ.70 This is because the Cas9-induced DSBs were blunt cuts with no homology around the broken ends. Thus, c-NHEJ does contribute to eccDNA formation when two DSBs are artificially induced within a chromosome, even when the cut sites are far apart.

Our results suggest that when there are single DSBs, c-NHEJ acts to repress eccDNA formation. This leads us to hypothesize that c-NHEJ suppresses eccDNA formation by rapidly repairing single DSBs in the chromosome. In the absence of c-NHEJ, the DSB may persist longer, and end resection produces single-stranded DNA tails at these sites prior to microhomology-mediated end joining (MMEJ), single-strand annealing (SSA), or homologous recombination (HR)-mediated repair. Our results suggest that end resection and homology-directed repair promote eccDNA formation. How this happens is unclear, but we suspect that the single-stranded ends can adopt different DNA conformations, mediated by microhomologies, whose repair leads to circular extrachromosomal DNA, as suggested in Figure 3-4D. We predict that the DNA lost from the chromosome in the form of eccDNA can then be recovered by microhomology-mediated or SSA-mediated repair using the DNA on the other side of the break, or by HR-mediated repair using a sister chromatid, so that formation of eccDNA will not always be accompanied by chromosomal microdeletions.
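The model in Figure 3-4D depends on short direct repeats (microhomologies) at the two ends of the excised segment. One common way to quantify this at a mapped eccDNA junction is to ask how far the ligation point can slide without changing the circle's sequence: if the first k bases of the circle are identical to the k bases just downstream of its end, those k bases are a candidate microhomology that the resected single strands could have annealed through. The sketch below is illustrative only; it assumes 0-based, half-open genome coordinates, and the function name and default cap are hypothetical rather than part of this work's analysis.

```python
def junction_microhomology(genome: str, start: int, end: int, max_len: int = 25) -> str:
    """Return the direct-repeat microhomology at an eccDNA junction.

    The circle is genome[start:end] (0-based, half-open). Ligating its last
    base back to its first base is ambiguous by k bases when
    genome[start:start + k] == genome[end:end + k]; those k bases are the
    candidate microhomology. max_len is an arbitrary illustrative cap.
    """
    k = 0
    while (
        k < max_len
        and start + k < end              # stay inside the circle
        and end + k < len(genome)        # stay inside the genome
        and genome[start + k] == genome[end + k]
    ):
        k += 1
    return genome[start:start + k]


# Toy example: "ACGT" occurs at the start of the circle and again just past its end.
toy = "CCCC" + "ACGT" + "T" * 8 + "ACGT" + "GGGG"
print(junction_microhomology(toy, 4, 16))  # the shared repeat at the junction
```

An empty string means a blunt, homology-free junction of the kind expected from c-NHEJ, whereas a non-empty result is consistent with, but does not prove, a microhomology-mediated origin.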

Recently, it has been shown in yeast that eccDNA formation is tied to SAE2 and MRE11, proteins with roles in DNA end resection during DSB repair.71 CTIP, the mammalian equivalent of SAE2, was dispensable for eccDNA formation, but this could be because several other nucleases have redundant roles in mammalian DSB repair. MRE11 of the MRN complex is important for eccDNA formation in yeast,71 consistent with our finding that NBS1 of the MRN complex is important for eccDNA formation in human and chicken cells. The MUS81 nuclease, which when paired with EME1 is involved in unhooking interstrand cross-links, is also required for eccDNA formation in yeast. This closely parallels the requirement we observe for FAN1, though it is entirely possible that MUS81 in yeast and FAN1 in chicken DT40 cells are required for their nuclease activity on some other DNA structure, not necessarily near an interstrand cross-link.

The requirement of MMR pathway proteins for eccDNA formation is more difficult to explain, though MMR genes are known to be important in HR-mediated repair for fixing mismatches after the strand invasion step and for reversing HR when strands misalign.72 MMR proteins have also been shown to be necessary for MMEJ during class switch recombination in mice,73 suggesting that the reliance on MMR for eccDNA formation may be tied to its role in MMEJ. It remains to be determined whether the genes involved in eccDNA formation in human cancer cells work in separate pathways or interact so that the DNA is excised and ligated into circular molecules. Further research is needed to define the specific interactions among these DNA repair pathways, as it has recently been shown that DNA repair pathways overlap significantly in repairing DNA lesions and are not as independent or isolated as previously considered.74

The cell-cycle stages in which most eccDNA is seen (S to M) are consistent with the stages in which natural DSBs occur and MMEJ, SSA, and HR are active. Notably, preventing DNA replication significantly reduces eccDNA formation. Surprisingly, the breakdown of the nuclear envelope in M-phase-arrested cells does not immediately subject the eccDNA to degradation by cytoplasmic DNases. It will be interesting to determine whether the eccDNA are tightly associated with chromosomes, and thus protected from cytoplasmic DNases, and whether reassembly of the nucleus after mitosis is accompanied by the return of the eccDNA to the nuclear compartment.

The increase in eccDNA abundance following DSBs could be relevant to the use of chemotherapeutic agents or radiotherapy, which cause DSBs and may therefore increase somatically mosaic eccDNA, thus increasing the genetic variation of cancer cells and potentially leading to adaptation of the cancer to therapy. Compromise of the c-NHEJ pathway increases the abundance of eccDNA. This could be relevant to cancers in which NHEJ is known to be downregulated, such as breast cancer,75 as it suggests that loss of NHEJ can increase genomic instability and tumor adaptation through increased formation of somatically mosaic eccDNA.

On the other hand, the finding that MMR and MMEJ are the main DNA repair pathways contributing to eccDNA formation, and that end resection and strand invasion are necessary for it, has important implications for cancer therapy.

Because cancer cells have higher levels of eccDNA, and eccDNA are known to carry and amplify oncogenes, specific inhibitors of MMR and MMEJ might decrease the eccDNA burden in cancers, especially when combined with DSB-inducing therapy. Since eccDNA have been shown to be lost quickly if they are not replicated at high levels,71 adding MMR and MMEJ inhibitors might reduce eccDNA even in late-stage cancers, after eccDNA have been amplified by replication, unequal segregation, and selection. Because inhibiting MMR and MMEJ may be less toxic than inhibiting DNA repair pathways such as NHEJ and HR,74,76 such inhibition could increase the effectiveness of cancer treatment when combined with other DNA-damaging chemotherapeutic agents.

FIGURES

FIGURE 3-1 EccDNA formation is induced by disruptions to DNA structure: (A) Assay developed to quantify eccDNA. (B-G) EccDNA levels measured by the QPCR assay in 293T cells 48 hours after treatment with (B) 2 µM cisplatin, (C) 10 J/m² UV, (D) 200 µM methyl methanesulfonate (MMS), (E) 6 µg/mL bleomycin, (F) 200 ng/mL neocarzinostatin, or (G) irradiation with 5 Gy of ionizing X-rays. (H) Diagram of induced DSBs at locus Chr22:18623837-18623994 in 293A cells and the amplification of circles arising from the surrounding genomic locus using outward-facing primers. (I) Diagram of eccDNA sequences amplified by the outward-facing primers. (J) Quantification of eccDNA isolated 48 hours after transfection of a p413 vector containing CAS9 and either a control gRNA sequence or a gRNA sequence targeting Chr22:18624104. P-values < 0.05 are indicated with (*), < 0.01 (**), < 0.001 (***). Mean±S.E. shown.

FIGURE 3-2 EccDNA formation is suppressed by c-NHEJ and increased by alt-NHEJ pathways. (A) Diagram of genes knocked out or inhibited. (B-E) Levels of eccDNA in U2OS cell lines with various knocked-out genes within DNA repair pathways. (F-K) Levels of eccDNA in DT40 cell lines with various knocked-out genes within DNA repair pathways. (L-R) Levels of eccDNA in 293T cells in which various genes within DNA repair pathways were inhibited by small-molecule inhibitors. P-values < 0.05 are indicated with (*), < 0.01 (**), < 0.001 (***). N=5 for the KO cells of U2OS and DT40; N=3 for cells treated with small-molecule inhibitors. Mean±S.E. shown relative to WT or DMSO-treated cells.
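The legends above report eccDNA levels from a QPCR assay as mean±S.E. relative to WT or DMSO-treated cells, and mitochondrial DNA primers are listed alongside the eccDNA primers in the supplemental tables. The exact normalization is not spelled out in this section. The sketch below assumes, for illustration only, the common 2^-ΔΔCt method with the mitochondrial amplicon as the reference and roughly 100% amplification efficiency for both amplicons; the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def relative_eccdna(ct_target: float, ct_ref: float,
                    ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Fold change of an eccDNA junction amplicon vs control by 2^-ddCt.

    ct_target / ct_ref: Ct of the eccDNA inverse-PCR amplicon and of the
    reference amplicon (assumed here: mitochondrial DNA) in the test sample.
    *_ctrl: the same two Ct values in the control (WT or DMSO) sample.
    Assumes both amplicons double every cycle (about 100% efficiency).
    """
    d_ct_sample = ct_target - ct_ref
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2 ** -(d_ct_sample - d_ct_ctrl)


def mean_se(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


# Illustrative numbers only: the junction amplicon crosses threshold one cycle
# earlier (relative to the reference) in the test sample than in the control.
print(relative_eccdna(24.0, 18.0, 25.0, 18.0))  # 2.0, i.e. twofold more eccDNA
```

Per-replicate fold changes computed this way could then be summarized with `mean_se`; if efficiencies differ noticeably from 100%, an efficiency-corrected model would be more appropriate.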


FIGURE 3-3 After DSBs, cells lacking c-NHEJ have more eccDNA, and cells lacking functional MMEJ and resection proteins have fewer eccDNA. (A, B) Levels of eccDNA after treatment with NCS (200 ng/mL) for 48 hours. P-values < 0.05 are indicated with (*). The P-values test two comparisons: whether NCS significantly alters eccDNA levels compared to DMSO, and whether the change seen with NCS differs from that in WT cells. For all panels N=3, mean±S.E. shown. (C) Graphical summary of whether mutants or inhibitors of a given repair pathway increased (green) or decreased (red) eccDNA abundance in cells growing normally (top) or after the induction of DSBs with neocarzinostatin (bottom). Bold: human cancer cell lines; regular: DT40 cell lines; KO: (-/-); inhibitor: (i).
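The Figure 3-3 legend describes two comparisons on three replicates: NCS versus DMSO within a genotype, and whether the NCS-induced change in a mutant differs from that in WT cells. The statistical test is not named here, so the following is only an illustrative sketch that assumes Welch's unequal-variance t-test, applied to raw levels for the first comparison and to per-replicate log2(NCS/DMSO) induction for the second, treating NCS and DMSO samples from the same replicate as paired. All function names are hypothetical.

```python
from math import log2, sqrt
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Illustrative choice of test; each group needs at least two values
    with non-zero spread.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def log2_induction(ncs: list[float], dmso: list[float]) -> list[float]:
    """Per-replicate log2(NCS / DMSO); assumes replicate i of each is paired."""
    return [log2(n / d) for n, d in zip(ncs, dmso)]


# Comparison 1 (does NCS change eccDNA in one genotype?):
#     welch_t(ncs_levels, dmso_levels)
# Comparison 2 (is the induction in a mutant different from WT?):
#     welch_t(log2_induction(ko_ncs, ko_dmso), log2_induction(wt_ncs, wt_dmso))
```

A two-sided p-value could then be read from the t distribution, for example `2 * scipy.stats.t.sf(abs(t), df)`. Working on the log2 scale makes the second comparison a test of a ratio of ratios, so a twofold induction counts the same whatever a cell line's baseline eccDNA level.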


FIGURE 3-4 EccDNA formation is increased in S-phase, G2-phase, and M-phase of the cell cycle. (A) The end-resection-independent repair pathway is active in G1, while end-resection (and homology)-dependent repair is more active in S and G2. (B) EccDNA levels in S-phase, G2-phase, and M-phase. (C) EccDNA levels after the addition of 3 µM aphidicolin (APH). P-values < 0.05 are indicated with (*), < 0.01 (**), < 0.001 (***). (D) Graphical summary of eccDNA formation. The circle is formed by alignment of microhomologies. Following excision of the circle on the right, the DNA lost from the chromosome can be recovered by copying from the DNA on the opposite side of the break or from a sister chromatid. The excised circle can become double-stranded through the action of DNA polymerases.


A) Human OF Primers and Amplicon Sequences # Size of Size of UF Primer DF Primer Abundance Coordinates of sequence in Coordinates of eccDNA by NGS circle from Amplicon Sequence of OF primer PCR amplicon sequence Sequence by NGS genome NGS (bp) (bp) 1 GAGATAGCCA TACTACCAT chr1 245530219 245530472 265246.5 253 chr1:245530078-245530685 567 GAGATAGCCAAGCTCCCATGTTTATTGCAGCACTATT AGCTCCCATG TGCAGATTG CACAATAGTCAAGAGTTGGAAGCAATCTAAGCATCC CG ATCAATAGACAAATGAATAAAGAAAAGGTGGTACATA TACAGAATGGAGTACTCTTCAGCCATAAAGAAGAAT GAGATCCTGCCATTTGCAACAACATGGATGGAACTG GAGGTCATTATGTTAAGTGAAATTAGCGAGGCCCAG AAAAACAGACTTCACATGTTTTTATGTGTGCTGCCCA GATGACACAATTCAGGAAAATCTTTCAGCTCCTACCA GAGAGTAAAGCTCTTCAGCTGTAGTGGCTCAACATC ATTCATTGATCATCATAGAAATGCAAATCAAAACTACAA TGAGATTATCATCTCACCACAGTTAAAATGGCTTTTATC CAAAAATCAGGCAATAACAAATGCTGGTGAGGATGTG GAGAAAAGGGAACCCTCATACACTGTGGATGGGAATG TAAATTAGTACAACCACTGTAGAGAACAGTTTGGAGCT TCCTCAAAAAACTAACAATTGAACTACCACGCAATCTG CAATGGTAGTA 2 CCTCTGAAAG GATCACGA chr16 83,620,081 83,620,204 130172 123 chr16:83620011-83620259 121 GATCACGAGGTCAAGAGATGGAGACCATTCTGGCCA TGCTGGGATT GGTCAAGA ACATGGTGAAACCCCGTCTCTACTAAAAGTACAAAA ACAG GATGGAG ATTAGCTGGGCGTGGTGGTGCTCGCCTGTAATCCCA GCACTTTCAGAGG 3 GGCCCTCTGG TGGTGGGC chr12 117100251 117100607 9879 356 chr12:117100381-117100483 281 TCTCCATCCAGTTGGGGTGACTGGGGCCGATGCTGT TGGACATCTC GCTATGGAA TGCCTGTGTAATGTGATTCCTCTTCCTTAAAATAAGGA ATTAA TCTT TAAGTTAATGAGATGTCCACCAGAGGGCCTGGCGTG GACACAGCACATGGCCACAGAGGCTGGTGGGCGCTA TGGAATCTTGTCCCCTGGAGAGGCACACAGCCAGGG CAGAATATCAAGGTCAAGGCTCTCCTGAAGGCTCTGC AGTGCTTAGTGACACCACCATCAATGACCGTCAGGTA TCAGGTCTGTTCTGAACGTTAGTCT 4 GGCCCTCTGG CTGGTGGG chr12 117100431 117100599 8358.2 168 chr12:117100381-117100661 251 CTGGTGGGCGCTATGGAATCTTGTCCCCTGGAGAGG TGGACATCTC CGCTATGG CACACAGCCAGGGCAGAACATCAAGGTCAAGGCTC ATTAA AATCTT TCCTGAAGGCTCTGCAGTGCTTAGTGACACCACCAT CAATGACCGTCAGGTATCAGGTCTGTTCTGAACGTTA GTCTTCTCCATCCAGTTGGGGTGACTGGGGCCGATG CTGTTGCCTGTGTAATGTGATTCCTCCTCCTTAAAATA AGGATAAGTTAATGAGATGTCCACCAGAGGGCC 5 CTGGTGGGCG 
GGCCCTCT chr12 117,100,306 117,100,599 8358.2 293 chr12: 117100381-117100661 251 GGCCCTCTGGTGGACATCTCATTAACTTATCCTTATTT CTATGGAATCT GGTGGACA TAAGGAAGAGGAATCACATTACACAGGCAACAGCAT T TCTCATTAA CGGCCCCAGTCACCCCAACTGGATGGAGAAGACTAA CGTTCAGAACAGACCTGATACCTGACGGTCATTGATG GTGGTGTCACTAAGCACTGCGGAGCCTTCAGGAGAG CCTTGACCTTGATATTCTGCCCTGGCTGTGTGCCTCT CCAGGGGACAAGATTCCATAGCGCCCACCAG 6 CGCTCCCAGA GGGGTTCT chr14 69,346,610 69,347,101 7352.5 491 Chr14: 69346689-69347123 336 GGGGTTCTTCCAAGTGGCTCTGGGACCCTCTGCCCT GCTGATAGGA TCCAAGTG GGAGGCTGGGCCTGAGACATAGAGGATGGGGGAGC GCTCT CATGGGCGGAGCTCAGGCTGGAAAAACAGAGCCCC TCAGTCCAGAAAGAGGAAAGCCTCAGGAGAGCCTG GGGCCAAAATGCATGCTCACCAGCTGTGTTCTGGCC CTGAGATTCTAAGCAGTTCACCCCCTGCTTAATGATG CCGGGGAGTGCTTCCTTCACGGGGACCCTGAGAGAA AAGGCAGCAGTCAGGCTTGTGCTTTCCCCCCCAAGG CTTACCTACACCCTCTGCCAGACGTCAGCATCCTATC AGCTCTGGGAGCG 7 GCCTCCCAAG GAGAATTG chr11 100,650,466 100,650,750 2880 284 Chr11: 100650414-100650783 361 GAGAATTGCTTGAACCCTGGAGGTGGAGGTTGCAGT TAGGTGGGAT CTTGAACC GAGCTGAGATCGCGCCACTGCAATCCAGCCTGGGC CTGGAG GGCAGAGTGAGACTCTGCCTCAAAAAAAAAAAGAG CAAAGTCAAGTCCATTGTCTTATACGGTACTTGCACA TTTTTTAATGACTTCAACTTTCTTAAGAGTCAAGTCGG CCGGGTGTGGTGGCTCATGCCTGTAATCCCAGCACTT TGGGAGGCCAAGGCAGGCAGATTACTTGAGGTCAGG AGTTTGAAACCAGCCTGGCCAACATGGTGAAACCCC GTCGCTACTAAAAATACAAAAATTAGGTGGGCATGGTG GCATGCACCTATAATCCCACCTACTTGGGAGGC 8 TACTACCATTG GAGATAGC chr1 245530571 245531755 247.9 1,184 chr1:245530303-245530648 306 TACTACCATTGCTAGATTGCGTGGTGAAAGATTTTCCT CTAGATTGC CAAGCTCC GAATTGTGTCATCTGGGCAGCACACATAAAAACATGT CAT GAAGTCTGTTTTTCTGGGCCTCGCTAATTTCACTTAAC ATAATGACCTCCAGTTCCATCCATGTTGTTGCAAATGG CAGGATCTCATTCTTCTTTATGGCTGAAGAGTACTCCA TTCTGTATATGTACCACCTTTTCTTTATTCATTTGTCTAT TGATGGATGCTTAGATTGCTTCCAACTCTTGACTATTG TGAATAGTGCTGCAATAAACATGGGAGCTTGGCTATCT C Mitochodrial DNA F:GATCAGGACATCCCGATGGTG R:AGGCGCTTTGTGAAGTAGGC

B) DT40 OF Primers and Amplicon Sequences Size of Size of Coordinates of eccDNA by Abundance Coordinates of sequence # UF Primer sequence DF Primer Sequence circle from Amplicon Sequence of OF primer PCR amplicon NGS by NGS in genome NGS (bp) (bp) 1 ATGCAAACCTGCCTTCGTTC GCAGTGGTATGGGACCTGTC chr11 2300736 2300935 19280.6 199 Chr11: 2300736 - 2300969 200 GCAGTGGTATGGGACCTGTCACAGGAACCATATGGTCGTCTGCT TCTTCTGCAAAGCAGCCGTCACAAGAGAGGAGGATGCTGCAGG GCTGCATGAATAAGCAGTGAGGAGCAAACCCACAAGGAAATTCA GGCAGTGCTTGGTTTTGCATAACAGTGCTATTTTGTCTGAGTTGT GGTTTACTGTTATGTTGTACAAGAAGAGACTATCGGATGCAAAGCC TCAACTCTGTAACTGGGAGGAGAGAGGGACAGCCATTACATCCAT CAGTGAAACTGGAGGAGATGATGAACGAAGGCAGGTTTGCAT

1 ATGCAAACCTGCCTTCGTTC GCAGTGGTATGGGACCTGTC chr11 2300736 2300935 19280.6 199 Chr11: 2300736 - 2300969 330 GCAGTGGTATGGGACCTGTCACAGGAACCATATGGTCGTCTGCT TCTTCTGCAAAGCAGCCGTCACAAGAGAGGAGGATGCTGCATAT TGCTGCTGCCTCTGGAACGACGATATTTAGGTCATCCATCAGTGA AACTGGAGGAGATGATGAACGAAGGCAGGTTTGCAT 2 TAAGCCTCAATGTGTGTTCAGCT GAGGCAATTCTGAAAGAAGCCG chr4 1460467 1460815 8535 348 Chr4: 1460467 - 1462710 174 AAGCCTCAATGTGTGTTCAGCTGGAATGCTGTCTAGCTATACATC CCTTATTAATGCCATATTAACAGACACAGCTGTTCAATATGCAGTA TGCAAACACACATGGATGTAAACAATCAGAGCAGAGGAGAGCT TTTGTTTCCTCGGCTTCTTTCAGAATTGCCTCCTCACTGTAAGCC TCAATGTGTGTTCAGCTGGAATGCTGTCTAGCTATACATCCCTTAT TAATGCCATATTAACAGACACAGCTGTTCAATATGCAGTATGCAAA CACACATGGATGTAAACAATCAGAGCAGAGGAGAGCTTTTGTTTC CTCGGCTTCTTTCAGAATTGCCTC 2 TAAGCCTCAATGTGTGTTCAGCT GAGGCAATTCTGAAAGAAGCCG chr4 1460467 1460815 8535 348 Chr4: 1460467 - 1462710 180 TAAGCCTCAATGTGTGTTCAGCTGGAATGCTGTCTAGCTATACAT CCCTTATTAATGCCATATTAACAGACACAGCTGTTCAATATGCAGT ATGCAAACACACATGGATGTAAACAATCAGAGCAGAGGAGAGC TTTTGTTTCCTCGGCTTCTTTCAGAATTGCCTCCTCACTGTAAGC CTCAATGTGTGTTCAGCTGGAATGCTGTCTAGCTATACATCCCTTA TTAATGCCATATTAACAGACACAGCTGTTCAATATGCAGTATGCAAA CACACATGGATGTAAACAATCAGAGCAGAGGAGAGCTTTTGTTTC CTCGGCTTCTTTCAGAATTGCCTC 3 AGGAGCATCAACACTGATGCA CCCCTAGATGGTCTGGACCT chr3 25576742 25577047 5161.2 305 Chr3: 25576742 - 25577047 205 CGTCTTGCTATAGCCCCCATCACCAACACACCTCTCACATTCCTC AAAATGCACCTTGAGACCAGAAACGAGCCCCCCTTGTTACCAC GAAGCCATAACAAGCCCACAGCAACCACGTGCTGTCAGAGGCA GTGCCCAGGAACCCCAACCATGTCTCCCCAGCCTTTGTCACTTG AAAGCTCACTGCATCAGTGTTGATGCTCCT 3 AGGAGCATCAACACTGATGCA CCCCTAGATGGTCTGGACCT chr3 25576742 25577047 5161.2 305 Chr3: 25576742 - 25577047 170 CCCCTAGATGGTCTGGACCTTTATGCATAGACCTGATGCAGTGCT TTTTCAAATGCCTGAAATTGATAAAAGGATTCTGGTGGATTTTGG CCCCAAGCAGGCAGATGCCTCCTGTATACACAGTGTGCTGTACA GCTTCTACTGCATCTCACTGCATCAGTGTTGATGCTCCT 4 CTGCACTGTGATCAGCAGCT GGAGGACAGATGGCTGCAC chr4 82442526 82442718 5096.5 192 Chr4: 82442326 - 82443624 177 GGAGGACAGATGGCTGCACCGCAGTCACTGCCAGCTGACACCT GCACAGGTTGAGCTCTCAGAGCTCCCCGTCACCATTTCTTGCCT 
TCTGATCAGCAGTGTGCCTGTGAGTGAGAACTCACACGCAGCCC TTTTCAGGGCCCACGGGAGATGGAGCTGCTGATCACAGTGCAG 4 CTGCACTGTGATCAGCAGCT GGAGGACAGATGGCTGCAC chr4 82442526 82442718 5096.5 192 Chr4: 82442326 - 82443624 179 GGAGGACAGATGGCTGCACCGCAGTCACTGCCAGCTGACACCT GCACAGGTTGAGCTCTCAGAGCTCCCCGTCACCATTTCTTGCCT TCTGATCAGCAGTGTGCCTGTGAGTGAGAACTCACACGCAGCCC TTTTCAGGGCCCACGGGAGATGGAGCTGCTGATCACAGTGCAGC TCTC 4 CTGCACTGTGATCAGCAGCT GGAGGACAGATGGCTGCAC chr4 82442526 82442718 5096.5 192 Chr4: 82442326 - 82443624 303 TGGCTGCACCCTCCCTGTCCGACTCATCCTCTTCAGACAGCCCC GAGTCAGCTTCGGTGGGAGCTGGATCTGGTGGGGCAGCCTTGA CGGGGGGCTCAGGCTCAGCTGAGTCCAAGCTGGGCTCCACTAT GTGGAGCTCGGTGCTGTCCAGCTCTGCCTCTCCTTCCTCCTCCT CCTCTTCCGAGCTCTCAGCCTCTTGCTGAGCCACAGGACCCCCC GGGCATGGAGCCCCCTCTGCTCCCCAGCT 5 TTAGTTTCTGCTCGGCAGCA TCAAACTCCGAGCACCAGTC chr2 69137247 69137609 1057.7 362 Chr2: 69137047-69137904 360 TCAAACTCCGAGCACCAGTCGTCTCGCTGTCAGGCAGTGCATG CTGGCTGCCTGCCTTATGGTGTGTGGAGCACCAAGGCAGACAGT AGTTCAGAGCACTTTGCTGTTGTAAAATAAGGTTTCAAGCTCAGA GACACTGGTTTGCGCTCTCAAAGATGCAGATTCTGATCATCATCA CATTTCATCTCGGTGATCTGTATCACTGAGTCCCAACAGTATGTAA GTTTGCATCAGATTCTCCCCTAAGTTCACTTCTTGCTTAGTTCACT GTTATGCTGCAGGTCTTTTTTTTTTAACCCTTCCATTGAGGGAGGT AAATGTAAAATGCTCAAGTGACTGCTGCCGAGCAGAAACTAA 6 ACTTCATCCAGTCCAGTGACAG GAGGCTTCCACAACCTTTCAGT chr8 13132838 13132985 1014.2 147 Chr8: 13132654 - 13133032 335 ACTTCATCCAGTCCAGTGACAGCAGTACATCAGCAGATCCATAC TGGCGTATCCTGCTGGTTCTTCACACCTGTAGGTTGCCAGTTTTA GTCTGATCAGCCTAAAATGTTTTTTTTCAGTTGCACATGTGCTTTA GGACAGGTGTTGTACTGAAAGGTTGTGGAAGCCTC 7 GGGACAGGCTGAGAGCAGAG GGCTTGGGAAGCAGTCATGC chr2 20383499 20383673 1014.2 174 Chr2: 20383321 - 20383744 199 GGCTTGGGAAGCAGTCATGCACATTGGCCTACCTGGCATGGGA TTTGAACATCTTCACCCACAGTGTAAGACAGTGTGCATTGTGAT GTGTAACTCCCATTGTTTTTTCTGCTATTTGCAGTGTAACGTGTG GATTAGTGCCCAAGTCATCTCTGCTCTGCTCTCAGCCTGTCCCTA ATGGCTTGGGAAGCAGTCATGCACATTGGCCTCAGCTGCTCTGG TTCCTACCTGGCATGGGATTTGAACATCTTCACCCACAGTGTAAG ACAGTGTGCATTGTGATGTGTAACTCCCATTGTTTTTTCTGCTATTT GCAGTGTAACGTGTGGATTAGTGCCCAAGTCATCTCTGCTCTGCT CTCAGCCTGTCCC 7 
GGGACAGGCTGAGAGCAGAG GGCTTGGGAAGCAGTCATGC chr2 20383499 20383673 894.1 174 Chr2: 20383321 - 20383744 379 GCTTGGGAAGCAGTCATGCACATTGGCCAATTCACAGGCTGAA GAAGTGCAGAGGGCAGTACTGGGCACTTAACATTGCATTCTGTT AGAGTAATTTTTATCAGCGCGCTGTAAGTACACAATTTGGCATAA CTAGTGCCTGGCGCTGCCTGATCCCTCACTCAGCTGTCAGTCCA TCTCCATCTCTGACCAGACAGGTGGTTGGCTGCCTCAGCTGCTC TGGTTCCTACCTGGCATGGGATTTGAACATCTTCACCCACAGTGT AAGACAGTGTGCATTGTGATGTGTAACTCCCATTGTTTTTTCTGCT ATTTGCAGTGTAACGTGTGGATTAGTGCCCAAGTCATCTCTGCTC TGCTCTCAGCCTGTCCCTAATGGCTTGGGAAGCAGTC 7 GGGACAGGCTGAGAGCAGAG GGCTTGGGAAGCAGTCATGC chr2 20383499 20383673 894.1 174 Chr2: 20383321 - 20383744 375 GGCTTGGGAAGCAGTCATGCACATTGGCCAATTCACAGGCTGA AGAAGTGCAGAGGGCAGTACTGGGCACTTAACATTGCATTCTGT TAGAGTAATTTTTATCAGCGCGTTGTAAGTACACAATTTGGCATAAC TAGTGCCTGGCGCCTCATGTCACAAACGCTCTCAGCCACAGCCA TCTCCATCTCTGACCAGACAGGTGGTTGGCTGCCTCAGCTGCTC TGGTTCCTACCTGGCATGGGATTTGAACATCTTCACCCACAGTGT AAGACAGTGTGCATTGTGATGTGTAACTCCCATTGTTTTTTCTGCT ATTTGCAGTGTAACGTGTGGATTAGTGCCCAAGTCATCTCTGCTC TGCTCTCAGCCTGTCCC 8 CACGACTGACAGGATTATCAGGC AAGGATTTGTGCATGCTTCTGAC chr1 30701201 30701303 894.1 102 Chr1: 30701201 - 30701303 190 CACGACTGACAGGATTATCAGGCCCACAGTATCGCTCCTATCCC TTTTTCAGCAATGCCGCTTTCCAGGTTATTCCCTCACAGCGAATA CCTGTTTTACAAAGGCTATATACCATACTGTAAACTGAGGCCTGTT CTCCCTTGTCAGAAGCATGCACAAATCCTTTTTATATGAAGGGGAG TTGGGTGCA Mitochodrial DNA F:ACCCCCAACTCAAACACTCG R:GGTGGTTCCTAAGACCAACGG

SUPPLEMENTAL TABLE 3-1 Chromosomal loci interrogated by inverse PCR, selected because they produced highly abundant eccDNA in human or chicken cells, as detected in eccDNA libraries prepared by rolling circle amplification and high-throughput sequencing in previous work.7 Abundance is the number of times the junction sequence was sequenced, which gives an estimate of the proportion of each specific eccDNA within the entire eccDNA population. The table gives the sequences of the outward-facing primers, the size of the inverse PCR amplicon, and the amplicon sequence with the two parts that are joined to form the circle. The two parts are indicated by bold and regular font; the junction is at the shift between bold and regular font. UF: upstream-facing; DF: downstream-facing. (A) Human eccDNA. (B) Chicken eccDNA. Some primers amplified more than one eccDNA, so some primers have more than one sequenced amplicon.
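The table legend treats the number of junction reads as a proxy for each eccDNA's share of the whole population, and the loci were chosen because they were highly abundant. Assuming the abundance values are proportional to junction read counts and that the total is taken over all junctions in the same library, both steps reduce to simple arithmetic. The sketch below uses hypothetical function names and made-up counts, not data from the table.

```python
def relative_abundance(junction_counts: dict[str, float]) -> dict[str, float]:
    """Percentage of all junction reads contributed by each eccDNA.

    Assumes the counts cover every junction in one library, so their sum
    approximates the whole eccDNA population.
    """
    total = sum(junction_counts.values())
    return {name: 100.0 * count / total for name, count in junction_counts.items()}


def top_loci(junction_counts: dict[str, float], n: int) -> list[str]:
    """Names of the n most abundant eccDNA junctions (candidates for inverse PCR)."""
    return sorted(junction_counts, key=junction_counts.get, reverse=True)[:n]


# Hypothetical library with three junctions.
counts = {"locus_a": 3.0, "locus_b": 1.0, "locus_c": 2.0}
print(relative_abundance(counts))
print(top_loci(counts, 2))
```

Because the denominator is the whole library, a locus's percentage depends on sequencing depth only through the total, which is why the raw counts in the table are most meaningful when compared within a single library.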

88 Primer and Amplicon Sequences of eccDNA arising at Chr22:18623837-18623994 Size of Size of Primer # UF Primer sequence DF Primer Sequence eccDNA Amplicon Sequence of OF primer PCR amplicon (bp) (bp) Primer Set 1 CCCAGTGCACCCC CCCACTCACTCTGGG 1511 1210 CTGGGCTGGGACCGTGGTCTGGATGCCAGGGAGTGGAAAGGAGCTGAAGGAACTCTCGCAGGCGGGCA AACAAAGTGTTGAT CTGGGAC GGCCTCCCAAGGGATTGGTTCAGCCAGAGAGGCAGGACACGGACGGGCCAAGTCTCGGTGTGCATAGAT TAGCGCTGGGGAAAGTCCCCGTGGGGGCAGGCTGTGGGCAGGGAGGAGGGCTCACCAGCCAGCCAAGT GCCCGCACGAGGGAAGGAGTGGCCATCGTCCAGTGAGATGGTGAAGGGGACGTGGCCGCTCTCATAGAG CAGGGGCGACACGAAGTGCACTTGCCCGGAGGGTTTCACCACATTGGCCAGGATGGTCTTGATCTCCTGAC ATCATCGTGATCCGCCCGCCTTGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGTAACCAGCCT GCCCTGCGTAATTTTTGCTGTCGATGATCCAAAGTGCACCCACTCGCCATCTCTGGTAGCTGGGAGGAGCAG GTTTTTGGGTGAAGGTGAAAGAGCCGGAGTTGGGGATGTGTGTGGCTGGGGAGTACAGGTGCGACCACTTT GCAGTCCACTCCTGTGAGTAGGGCATTCCTGAGGGAGAGAAGGCCTGGTTAGCCCTGGGGCTCCTTCGAGG CCTGGCCCCAGGCCTTGATACCTAGAGCAGCCTGACACATGTGTGGATGGACTGCCAGGCGGTCCCCTTAG GAGGGGGCTGTTCTAGCAAGGACCCTGGCTTGGGTGACCACCACCTCCTGCCACACTCAAGGATCCCTGTC CCTGCTGGAAGCCGGGCCCAGGGCCATGCCAGCCGTGAGGGGTTCTTCATGAGCCTCCTCCCCGGGACCT GGCCCATGCAGCTGCCCCTCCACCTGCCTGACCCCCTGGCTGCATTCCTCATTCGAATCTCATGTCGCCTGG ATTCCCTTCCTTCTCACTCCACCTGGAAACTCCCCTCCTCCTCCCACACAGCTGGGACTCCCTTTCTCCCAC CTCTGGGGGCCCAAAGCTCTGGATGCCACCCTAGCCCTCAGCTGGCCTCACCTGTCTCCTCATAGCCCCAC AGCTTGATGGTGTCTGTGTGGGCAGCGCCTCAACGTGCCAGGTCAGGCTGAGGTTGCCTGAGGTGTTGGC GGTGCCGTAGTATTGCCAATGCATCTCGTTCATCAACTCACTCTTCTCCTTCATCAACACTTTGTTGGGGTGCA CTGGGAAGGG

Primer Set 2 CCCTCGGCCATGTG GCTCTCCTGTGGCAT 680 673 TGCTCTCCTGTGGCATGCAAACCTGTGGGCCTGGTGGTGGTAGGAGGTGCTGGGCCACTGCAGGCTGAGC GACTCCTC GCAAACCTGT ATCTCGACAGTCTGGGCCCCCAACCTCCCTGGGGAACCCATCCAACCAGCTGTCCTGTGCCCTCCTGCCT GTCCTGGGTCTGAGGGGCTCCATTCAGCCTGCCAGTGGGTCTGAGCCCTCAGCATGCACAGGGATGGGG GCGTGGGGCCCCAGTGCCCGGCCTCCTGGGCCTCCCAACCTGCAGATCACCACCTGGCTTCTCCCCTGCC CTGGTGTGGCGGAGCTCAGTGGATAGGCACTGGCCCACCTCTCTAGGGACCCCTGCAGCTCCCACTCACTC TGGGCTGGGACCGTGGTCTGGATGCCAGGGAGTGGAAAGGAGCTGAAGGAACTCTCGCAGGCGGGCAGG CCTCCCAAGGGATTGGTTCAGCCAGAGAGGCAGGACACGGACGGGCCAAGTCTCGGTGTGCATAGATTAGC GCTGGGGAAAGTCCCCGTGGGGGCAGGCTGTGGGCAGGGAGGAGGGCTCACCAGCCAGCCAAGTGCCCG CACGAGGGAAGGAGTGGCCATCGTCCAGTGAGATGGTGAAGGGGACGTGGCCGCTCTCATAGAGCAGGGG CGACACGAAGTGCACTTGCCCGGAGGAGTCCACATGGCCGAGGG

Primer Set 3 CCCAGTGCACCCC AGGGGAGGTGAGGG 947 947 AGGGAGGTGAGGGGTGAGTGAGGGCTGGGCCCAGGGAAGCAGGGAGGAGGCCCTGAGGCTTCAAGGG AACAAAGTGTTGAT GTGAGTGAG CTGTGGGAGCCTCCAGGGCCAATGCATGGCAGCAGCTGGGTGATGGTCTCTCCGGCCCTCATGGCAGAAA AGGACGCTGAGGCCGGAGAGGCAGGGGCCTGCCCTCCCTCCACTGCCCACGCCTCTAGCCTCCACCTGG CTTCTCCCCTGCCCTGGTGTGGCGGAGCTCAGTGGATAGGCACTGGCCCACCTCTCTAGGGACCCCTGCA GCTCCCACTCACTCTGGGCTGGGACCGTGGTCTGGATGCCAGGGAGTGGAAAGGAGCTGAAGGAACTCT CGCAGGCGGGCAGGCCTCCCAAGGGATTGGTTCAGCCAGAGAGGCAGGACACGGACGGGCCAAGTCTC GGTGTGCATAGATTAGCGCTGGGGAAAGTCCCCGTGGGGGCAGGCTGTGGGCAGGGAGGAGGGCTCACC AGCCAGCCAAGTGCCCGCACGAGGGAAGGAGTGGCCATCGTCCAGTGAGATGGTGAAGGGGACGTGGC CGCTCTCATAGAGCAGGGGCGACACGAAGTGCACTTGCCCGGAGGAGTCCACATGGCCGAGGGTCTGGA TGCTCTCCTGTGGCATGCAAACCTGTGGGCCTGGTGGTGGTAGGAGGTGCTGGGCCACTGCAGGCTGAGC ATCTCGACAGTCTGGGCCCCCAACCTCCCTGGGGAACCCATCCAACCAGCTGTCCTGTGCCCTCCTGCCT GTCCTGGGTCTGAGGGGCTCCATTCAGCCTGCCAGTGGGTCTGTGGTGTCTGTGTGGGCAGCGCCTCAAC GTGCCAGGTCAGGCTGAGGTTGCCTGAGGTGTTGGCGGTGCCGTAGTATTGCCAATGCATCTCGTTCATCAA CTCACTCTTCTCCTTCATCAACACTTTGTTGGGGTGCACTGGGCAGGG

Mitochondrial DNA (Mt) F:GATCAGGACATCCCGATGGTG R:AGGCGCTTTGTGAAGTAGGC

SUPPLEMENTAL TABLE 3-2 Inverse PCR primers to detect eccDNA formed near a CRISPR-Cas9-induced double-strand break. The table gives the sequences of the outward-facing primers, the size of the inverse PCR amplicon, and the amplicon sequence with the two parts that are joined to form the circle. The two parts are indicated by bold and regular font. Because the primers are slightly spaced apart, the region between them is not amplified; the PCR amplicon is therefore slightly shorter than the eccDNA molecule from which it was amplified.
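Outward-facing primers point away from each other on linear chromosomal DNA, so they yield a product only when the template is circular and the polymerase can run across the ligation junction. The amplicons listed in the supplemental tables begin with one primer and end with the reverse complement of the other; for example, DT40 locus 1 begins with its DF primer and ends with the reverse complement of its UF primer. The sketch below predicts such a product from a circle sequence under that orientation convention. It is an illustrative model with hypothetical function names, not the primer-design procedure used in this work.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def inverse_pcr_amplicon(circle: str, uf_primer: str, df_primer: str) -> str:
    """Predict the inverse-PCR product of two outward-facing primers on a circle.

    `circle` is one strand of the eccDNA written linearly from an arbitrary
    start point. `df_primer` matches this strand and extends downstream;
    `uf_primer` matches the opposite strand, so its reverse complement lies
    upstream of the DF site on this strand. The product runs from the DF site
    forward around the circle, across the junction, to the end of the UF site.
    Bases lying between the two primer sites are not amplified.
    """
    n = len(circle)
    doubled = circle + circle  # unroll once so a product can wrap past index 0
    start = doubled.find(df_primer)
    if start == -1 or start >= n:
        raise ValueError("DF primer does not match this strand of the circle")
    uf_site = revcomp(uf_primer)
    end = doubled.find(uf_site, start + len(df_primer))
    if end == -1 or end + len(uf_site) > start + n:
        raise ValueError("UF primer site not found within one turn of the circle")
    return doubled[start:end + len(uf_site)]


# Toy 22-bp circle: UF site "GGTTCA", a 2-bp gap, then DF site "ACGTAC".
toy_circle = "CCCC" + "GGTTCA" + "AA" + "ACGTAC" + "GGGG"
product = inverse_pcr_amplicon(toy_circle, revcomp("GGTTCA"), "ACGTAC")
print(product, len(product))  # 20 bp: the circle minus the 2-bp gap
```

The toy output illustrates the legend's point: the amplicon length equals the circle length minus the bases between the two primer sites, and the result does not depend on where the circle sequence is written to start.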


SUPPLEMENTAL FIGURE 3-1 Propidium iodide FACS profiles showing the cell-cycle distribution of HeLa cells. To ensure that cells were progressing through S-phase as predicted, we analyzed them at several time points after release from hydroxyurea: 0, 1.5, 3, 4.5, 6, 7.5, and 9 hours. We further confirmed that cells stalled in M-phase after the addition of nocodazole.

BIBLIOGRAPHY

1. Turner KM, Deshpande V, Beyter D, et al. Extrachromosomal oncogene amplification drives tumour evolution and genetic heterogeneity. Nature. 2017;543(7643):122-125. doi:10.1038/nature21356
2. Koo D-H, Molin WT, Saski CA, et al. Extrachromosomal circular DNA-based amplification and transmission of herbicide resistance in crop weed Amaranthus palmeri. Proc Natl Acad Sci. 2018;115(13):3332-3337. doi:10.1073/pnas.1719354115
3. Nathanson DA, Gini B, Mottahedeh J, et al. Targeted Therapy Resistance Mediated by Dynamic Regulation of Extrachromosomal Mutant EGFR DNA. Science. 2014;343:72-76.
4. Paulsen T, Kumar P, Koseoglu MM, Dutta A. Discoveries of Extrachromosomal Circles of DNA in Normal and Tumor Cells. Trends Genet. 2018;34(4). doi:10.1016/j.tig.2017.12.010
5. Kumar P, Dillon LW, Shibata Y, Jazaeri AA, Jones DR, Dutta A. Normal and Cancerous Tissues Release Extrachromosomal Circular DNA (eccDNA) into the Circulation. Mol Cancer Res. 2017;9:1197-1205. doi:10.1158/1541-7786.MCR-17-0095
6. Shibata Y, Kumar P, Layer R, et al. Extrachromosomal microDNAs and chromosomal microdeletions in normal tissues. Science. 2012:82-86.
7. Dillon LW, Kumar P, Shibata Y, et al. Production of Extrachromosomal MicroDNAs Is Linked to Mismatch Repair Pathways and Transcriptional Activity. Cell Reports. 2015;11(11):1749-1759. doi:10.1016/j.celrep.2015.05.020
8. Gaubatz JW. Extrachromosomal circular DNAs and genomic sequence plasticity in eukaryotic cells. Mutat Res DNAging. 1990;237(5-6):271-292. doi:10.1016/0921-8734(90)90009-G
9. Decarvalho AC, Kim H, Poisson LM, et al. Discordant inheritance of chromosomal and extrachromosomal DNA elements contributes to dynamic disease evolution in glioblastoma. Nat Genet. 2018;50(5):708-717. doi:10.1038/s41588-018-0105-0
10. Hotta Y, Bassel A. Molecular size and circularity of DNA in cells of mammals and higher plants. Proc Natl Acad Sci USA. 1965;53:356-362.
11. Cox D, Yuncken C, Spriggs AI. Minute Chromatin Bodies in Malignant Tumours of Childhood. Lancet. 1965:20.
12. Jones RS, Potter SS. L1 sequences in HeLa extrachromosomal circular DNA: Evidence for circularization by homologous recombination. 1993;82(April 1985):1989-1993.
13. Stahl F, Wettergren Y, Levan G. Amplicon Structure in Multidrug-Resistant Murine Cells: A Nonrearranged Region of Genomic DNA Corresponding to Large Circular DNA. 1992;12(3):1179-1187.
14. Per S, Sjöberg R-M, Karlsson A-L, Lundh L, Bjursell G. Molecular cloning and characterization of small polydisperse circular DNA from mouse 3T6 cells. Nucleic Acids Res. 1986;14(20):7823-7838.
15. Carroll SM, Gaudray P, Rose MLDE, et al. Characterization of an Episome Produced in Hamster Cells That Amplify a Transfected CAD Gene at High Frequency: Functional Evidence for a Mammalian Replication Origin. 1987;7(5):1740-1750.
16. Van Loon N, Miller D, Murnane JP. Formation of extrachromosomal circular DNA in HeLa cells by nonhomologous recombination. Nucleic Acids Res. 1994;22(13):2447-2452. doi:10.1093/nar/22.13.2447
17. Motejlek K, Schindler D, Assum G, Krone W. Increased amount and contour length distribution of small polydisperse circular DNA (spcDNA) in Fanconi anemia. DNA Repair (Amst). 1993;293:205-214.
18. Cohen S, Lavi S. Induction of Circles of Heterogeneous Sizes in Carcinogen-Treated Cells: Two-Dimensional Gel Analysis of Circular DNA Molecules. 2014;16(5):2002-2014.
19. Cohen S, Regev A, Lavi S. Small polydispersed circular DNA (spcDNA) in human cells: association with genomic instability. Oncogene. 1997;14:977-985.
20. Cohen S, Yacobi K, Segal D. Extrachromosomal Circular DNA of Tandemly Repeated Genomic Sequences in Drosophila. Genome Res. 2003:1133-1145. doi:10.1101/gr.907603
21. Cohen S, Agmon N, Sobol O, Segal D. Extrachromosomal circles of satellite repeats and 5S ribosomal DNA in human cells. Mob DNA. 2010;1(1):1-11. doi:10.1186/1759-8753-1-11
22. Radloff R, Bauer W, Vinograd J. A dye-buoyant-density method for the detection and isolation of closed circular duplex DNA: The closed circular DNA in HeLa cells. Proc Natl Acad Sci USA. 1967:1514-1521.
23. Navrátilová A, Koblížková A, Macas J. Survey of extrachromosomal circular DNA derived from plant satellite repeats. BMC Plant Biol. 2008;8:1-13. doi:10.1186/1471-2229-8-90
24. Timmis K, Winkler U. Isolation of Covalently Closed Circular Deoxyribonucleic Acid from Bacteria Which Produce Exocellular Nuclease. J Bacteriol. 1973;113(1):508-509.
25. Stanfield S, Helinski DR.
Small circular DNA in Drosophila melanogaster. Cell. 1976;9(2):333-345. doi:10.1016/0092-8674(76)90123-9 26. Stanfield SW, Helinski DR. Cloning and characterization of small circular DNA from Chinese hamster ovary cells. Mol Cell Biol. 1984;4(1):173-180. doi:10.1128/MCB.4.1.173 27. Misra, R; Matera A; Schmid, C; Rush M. Recombinaton mediates production of an extrachromosomal circular DNA containing a transposon-like human element THE-1. Nucleic Acids Res. 1989;17(20):8327-8341. 28. Kunisada T, Yamagishi H. Sequence repetition and genomic distribution of small polydisperse circular DNA purified from HeLa cells. 1984;31:213-223. 29. Conley EC, Saunders VA, Saunders JR. Multiple mechanisms generate extrachromosomal circular DNA in Chinese hamster ovary cells. Nucleic Acids Res. 1986;14(22):8905-8917. 30. Schneider SS, Hiemstra JL, Zehnbauer BA, et al. Isolation and structural analysis of a 1.2-megabase N-myc amplicon from a human neuroblastoma. Mol Cell Biol.

92 1992;12(12):5563-5570. doi:10.1128/MCB.12.12.5563 31. Mehanna, P; Gagné, V; Lajoie, M; Spinella, J; St-Onge, P; Sinnett, D; Brukner, I; Krajinovic M. Characterization of the microDNA through the response to chemotherapeutics in lymphoblastoid cell lines. PLoS One. 2017:1-14. 32. Zhu, J; Zhang, F; Du, M, Zhang, P; Fu, S; Wang L. Molecular characterization of cell-free microDNAs in human plasma. Sci Rep. 2017;7:1-11. 33. Paulsen T, Shibata Y, Kumar P, Dillon L, Dutta A. Small extrachromosomal circular DNAs , microDNA , produce short regulatory RNAs that suppress gene. Nucleic Acids Res. 2019;47(9):4586-4596. doi:10.1093/nar/gkz155 34. Møller HD, Mohiyuddin M, Prada-Luengo I, et al. Circular DNA elements of chromosomal origin are common in healthy human somatic tissue. Nat Commun. 2018;9(1):1069. doi:10.1038/s41467-018-03369-8 35. Hoff DDVON, Mcgillt JR, Forseth BJ, et al. Elimination of extrachromosomally amplified MYC genes from human tumor cells reduces their tumorigenicity. Proc Natl Acad Sci. 1992;89(September):8165-8169. 36. Eckhardt SG, Dai A, Davidson K, Forseth B, Wahl G, Von Hoff D. Induction of differentiation in HL60 cells by the reduction of extrachromosomally amplified c- mnc. Proc Natl Aca Sci. 1994;91(July):6674-6678. 37. Yu L, Zhao Y, Quan C, et al. Gemcitabine Eliminates Double Minute Chromosomes from Human Ovarian Cancer Cells. PLoS One. 2013;8(8). doi:10.1371/journal.pone.0071988 38. B SC, Madhuri U, Amitabha R. Small Circular DNAs in Human Pathology. Malays J Med Sci. 2014;21(3):4-18. 39. Sfeir A, Symington L. Microhomology-mediated end joining: a back-up survival mechanism or dedicated pathway? Trends Biochem Sci. 2016;40(11):701-714. doi:10.1016/j.tibs.2015.08.006.Microhomology-mediated 40. Hashimoto S, Anai H, Hanada K. Mechanisms of interstrand DNA crosslink repair and human disorders. Genes Environ. 2016:1-8. doi:10.1186/s41021-016-0037-9 41. Rastogi RP, Kumar A, Tyagi MB, Sinha RP. 
Molecular Mechanisms of Ultraviolet Radiation-Induced DNA Damage and Repair. 2010:1-32. doi:10.4061/2010/592980 42. Alexander JL, Orr-weaver TL. Replication fork instability and the consequences of fork collisions from rereplication. Genes Dev. 2016:2241-2252. doi:10.1101/gad.288142.116. 43. Bradley MO, Kohn KW. X-ray induced DNA double strand break production and repair in mammalian cells as measured by neutral filter elution. Nuc Acids Res. 1979;7(3):793-804. 44. Smith BL, Bauer GB. DNA Damage Induced by Bleomycin , Neocarzinostatin , and Melphalan in a Precisely Positioned Nucleosome. J Bio Chem. 1994;269(48):30587-30594. 45. Santini J, Luebeck J, Rajkumar U, et al. Circular ecDNA promotes accessible chromatin and high oncogene expression. Nature. 2019;575(November 2018). doi:10.1038/s41586-019-1763-5 46. Davis AJ, Chen BPC, Chen DJ. DNA-PK : a dynamic enzyme in a versatile DSB repair pathway. DNA Repair (Amst). 2014;(214):21-29. doi:10.1016/j.dnarep.2014.02.020.DNA-PK

93 47. Ahnesorg P, Smith P, Jackson SP. XLF Interacts with the XRCC4- DNA Ligase IV Complex to Promote DNA Nonhomologous End-Joining. Cell. 2006;(D):301-313. doi:10.1016/j.cell.2005.12.031 48. Panier S, Boulton SJ. Double-strand break repair: 53BP1 comes into focus. Nat Rev. 2014;15(January). doi:10.1038/nrm3719 49. Black SJ, Ozdemir AY, Kashkina E, et al. Molecular basis of microhomology- mediated end-joining by puri fi ed full-length Pol θ. Nat Comm. 2019;10:4423. doi:10.1038/s41467-019-12272-9 50. Ceccaldi R, Liu JC, Amunugama R, et al. Homologous recombination deficient tumours are dependent on PolQ mediated repair. Nature. 2014;518(7538):258- 262. doi:10.1038/nature14184 51. Sallmyr A, Tomkinson AE. Role of alternative end-joining in genome stability. J Biol Chem. 2018:1-18. doi:10.1074/jbc.TM117.000375 52. Wang M, Wu W, Wu W, et al. PARP-1 and Ku compete for repair of DNA double strand breaks by distinct NHEJ pathways. Nuc Acids Res. 2006;34(21):6170- 6182. doi:10.1093/nar/gkl840 53. Reynolds P, Cooper S, Lomax M, Neill PO. Disruption of PARP1 function inhibits base excision repair of a sub-set of DNA lesions. Nuc Acids Res. 2015;43(8):4028-4038. doi:10.1093/nar/gkv250 54. Krokan HE, Bjørås M. Base Excision Repair. Cold Spring Harb Perspect Biol. 2013:1-22. 55. Roychoudhury S, Somsubhra N, Song H, et al. Human Apurinic/Apyrimidinic Endonuclease (APE1) Is Acetylated at DNA Damage Sites in Chromatin, and Acetylation Modulates Its DNA Repair Activity. Am Soc Microbiol. 2017;37(6):1- 16. 56. Asagoshi K, Tano K, Chastain PD, et al. FEN1 Functions in Long Patch Base Excision Repair Under Conditions of Oxidative Stress in Vertebrate Cells. Mol Cancer Res. 2010;8(2):204-215. doi:10.1158/1541-7786.MCR-09-0253.FEN1 57. Lamarche BJ, Orazio NI, Weitzman MD. The MRN complex in Double-Strand Break Repair and Telomere Maintenance. FEBS Lett. 2011;584(17):3682-3695. doi:10.1016/j.febslet.2010.07.029.The 58. Rothenberg E, Grimme JM, Spies M, Ha T. 
Human Rad52-mediated homology search and annealing occurs by continuous interactions between overlapping nucleoprotein complexes. Nat Acad Sci. 2008. 59. Huang F, Mazin A V. A Small Molecule Inhibitor of Human RAD51 Potentiates Breast Cancer Cell Killing by Therapeutic Agents in Mouse Xenografts. PLoS One. 2014;9(6). doi:10.1371/journal.pone.0100993 60. Cruz-García A, López-Saavedra A, Huertas P. BRCA1 accelerates CtIP-mediated DNA-end resection. Cell Rep. 2014;9:451-459. doi:10.1016/j.celrep.2014.08.076 61. Daley JM, Niu H, Miller AS, Sung P. Biochemical mechanism of DSB end resection and its regulation. DNA Repair (Amst). 2015;32:66-74. doi:10.1016/j.dnarep.2015.04.015 62. Arora S, Heyza J, Zhang H, et al. Identification of small molecule inhibitors of ERCC1-XPF that inhibit DNA repair and potentiate cisplatin efficacy in cancer cells. Oncotarget. 2016;7(46):1-14. 63. Faridounnia M, Folkers GE, Boelens R. Function and Interactions of ERCC1-XPF

94 in DNA Damage Response. Molecules. 2018:1-25. doi:10.3390/molecules23123205 64. Liu T, Ghosal G, Yuan J, Chen J, Huang J. FAN1 Acts with FANCI-FANCD2 to Promote DNA Interstrand Cross-Link Repair. Science (80- ). 2010;329(August):693-697. 65. Kratz K, Schöpf B, Kaden S, et al. Deficiency of FANCD2-Associated Nuclease KIAA1018 / FAN1 Sensitizes Cells to Interstrand Crosslinking Agents. Cell. 2010;142:77-88. doi:10.1016/j.cell.2010.06.022 66. Bhagwat N, Olsen AL, Wang AT, et al. XPF-ERCC1 Participates in the Fanconi Anemia Pathway of Cross-Link Repair . Mol Cell Biol. 2009;29(24):6427-6437. doi:10.1128/MCB.00086-09 67. Mehta A, Haber JE. Sources of DNA Double-Strand Breaks and Models of Recombinational DNA Repair. Cold Spring Harb Perspect Biol. 2014:1-17. 68. Brandsma I, Gent DC Van. Pathway choice in DNA double strand break repair : observations of a balancing act. Genome Integr. 2012:1-10. 69. Arnoult N, Correia A, Ma J, et al. G2 phases by the NHEJ inhibitor CYREN. Nature. 2017;549(7673):548-552. doi:10.1038/nature24023 70. Møller HD, Lin L, Xiang X, et al. CRISPR-C: circularization of genes and chromosome by CRISPR in human cells. Nucleic Acids Res. 2018;46(22). doi:10.1093/nar/gky767 71. Hull RM, King M, Pizza G, Id FK, Vergara X. Transcription-induced formation of extrachromosomal DNA during yeast ageing. PLOS Biol. 2019;17:1-29. 72. Chen W, Jinks-robertson SUE. Mismatch Repair Proteins Regulate Heteroduplex Formation during Mitotic Recombination in Yeast. Mol Cell Biol. 1998;18(11):6525-6537. 73. Eccleston J, Schrader CE, Yuan K, et al. Class switch recombination efficiency and junction microhomology patterns in MSH2-MLH1-, and EXO1-deficient mice depend on the presence of µ switch region tandem repeats. J Immunol. 2009;183:1222-1228. doi:10.4049/jimmunol.0900135 74. Dietlein F, Thelen L, Reinhardt HC. Cancer-specific defects in DNA repair pathways as targets for personalized therapeutic approaches. Trends Genet. 2014;30(8):326-339. 
doi:10.1016/j.tig.2014.06.003 75. Pozniak Y, Balint-lahat N, Rudolph JD, et al. System-wide Clinical Proteomics of Breast Cancer Reveals Global Remodeling of Tissue Homeostasis Article System-wide Clinical Proteomics of Breast Cancer Reveals Global Remodeling of Tissue Homeostasis. Cell Syst. 2016;2:172-184. doi:10.1016/j.cels.2016.02.001 76. Nickoloff JA, Jones D, Lee S, Williamson EA, Hromas R. Drugging the Cancers Addicted to DNA Repair. J Natl Cancer Inst. 2017;109:1-13. doi:10.1093/jnci/djx059

Chapter 4

FUTURE DIRECTIONS

In this chapter I will discuss future experiments, some of which I have already started, and end by summarizing my thesis and speculating on the future of eccDNA research.

Quantifying eccDNA through fluorescent microscopy after DNA damage and loss of DNA repair pathways

INTRODUCTION

Larger eccDNAs, known as double minutes, can be visualized by DAPI staining.1–3 This method captures only a small percentage of the eccDNA population within a cell. Even so, it gives an independent estimate of eccDNA levels that can be compared across samples. An increase in large eccDNA also makes it more likely that a complete oncogenic protein-coding gene is being expressed from eccDNA, and such a gene could then be selected for in the cancer. The method therefore focuses on the eccDNAs that cancer cells could use to amplify a host of genes, including full oncogenic protein-coding genes, long non-coding RNA genes, and microRNA genes, which can lead to increased growth and drug resistance.

METHODS

Double minutes were visualized by fluorescence microscopy of DAPI-stained DNA. Cells were treated with the indicated DNA-damaging agents, using the same technique and concentrations as with NCS in Chapter 3. After treatment, the cells were washed with PBS and given fresh medium. Colcemid (0.02 µg/mL) was then added for 10 hours to arrest the cells in mitosis. The mitotic cells were collected by trypsinization and centrifugation, washed again with PBS, and fixed in 3:1 methanol:acetic acid. The cells were then dropped onto microscope slides from a height of 10 cm. DAPI stain (Vectashield anti-fade mounting medium with DAPI) was added, and the samples were covered with a cover slip and sealed with nail polish.

The DNA was imaged at 630× magnification on a confocal fluorescence microscope. The visible eccDNAs were then counted manually.
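Each metaphase spread yields a single manual count. Comparing conditions therefore reduces to summarizing per-spread counts, as in the histograms of Figures 4-1 and 4-2. The Python sketch below shows one way to do this. The function names, the 5-count bin width and the example counts are hypothetical; they are not the data from this chapter.

```python
from collections import Counter
from statistics import mean, median


def summarize_counts(counts_per_cell):
    """Number of spreads, mean and median eccDNA count for one condition."""
    return {
        "cells": len(counts_per_cell),
        "mean": mean(counts_per_cell),
        "median": median(counts_per_cell),
    }


def histogram(counts_per_cell, bin_width=5):
    """Map each bin's lower edge to the number of spreads falling in that bin."""
    bins = Counter((c // bin_width) * bin_width for c in counts_per_cell)
    return dict(sorted(bins.items()))


def fold_change(treated, control):
    """Ratio of mean eccDNA per spread (treated / control); control mean must be > 0."""
    return mean(treated) / mean(control)


# Hypothetical per-spread counts, for illustration only.
dmso = [0, 1, 0, 2, 1, 0, 3, 1]
ncs = [12, 25, 18, 30, 9, 22, 27, 17]

print(summarize_counts(dmso))
print(histogram(ncs))
print(fold_change(ncs, dmso))
```

A fold change computed from means is sensitive to a few spreads with very high counts. For skewed count data, reporting the median alongside the mean, as above, is a reasonable safeguard.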

RESULTS

Double minutes, quantified by microscopy, are increased after DSBs

We previously found that DSBs (induced by NCS treatment for 48 hours) significantly increased levels of small eccDNA. We used the metaphase spread method above to determine whether NCS also increases the levels of long eccDNA. Long eccDNA levels increased greatly (>20-fold) after DSB induction (Figure 4-1). This shows that both small and large eccDNAs increase after DSB induction.
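A difference in per-spread counts like this one can be tested for significance in several ways. The sketch below shows one of them: a two-sided permutation test that compares the mean number of eccDNAs per spread between two conditions. It is an illustration only and not necessarily the test used for Figure 4-1. The function names and the counts are hypothetical.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in mean count per spread.

    Condition labels are shuffled repeatedly; the p-value is the fraction of
    shuffles whose absolute mean difference is at least the observed one.
    """
    rng = random.Random(seed)
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical counts, for illustration only.
control = [0, 1, 0, 2, 1, 0, 3, 1]
treated = [12, 25, 18, 30, 9, 22, 27, 17]
print(permutation_test(control, treated))
```

A permutation test is shown because per-spread eccDNA counts are often skewed and overdispersed. It avoids assuming that the counts are normally distributed.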

Double minutes, quantified by microscopy, are decreased by PARP1i

Previously, we found that a PARP1 inhibitor (AZD2461 for 48 hours) significantly decreased eccDNA levels. Using the metaphase spread assay, we found that PARP1 inhibition also decreases the abundance of long eccDNAs, consistent with our previous results with small eccDNA (Figure 4-2). Because PARP1 participates in MMEJ, this suggests that MMEJ contributes to the generation or maintenance of both large and small eccDNAs. It also suggests that MMEJ is a viable target for suppressing the formation of large eccDNAs in cancer cells.

Analysis of eccDNA sequences from KO and inhibited cell lines

We plan to investigate further how eccDNA formation changes when a DNA repair pathway is compromised, through either knockout (KO) or inhibition of a required gene. To do this, we will sequence the circles (small eccDNA) by next generation sequencing. This will reveal which structural features of eccDNA are associated with each DNA repair pathway, e.g. MMEJ and MMR. The technology I used in Chapters 2 and 3 to characterize eccDNAs is rolling circle amplification followed by next generation sequencing. In the future, I will use a transposase that cleaves DNA and attaches sequencing adapters, as in the ATAC-seq study by Kumar et al. in the Appendix. This approach is useful because it has less size bias than rolling circle amplification. It will give a qualitative view of which types of eccDNA are most commonly formed by the different repair pathways.

DISCUSSION

These experiments show that large eccDNAs increase after DSBs and, like the small eccDNAs quantified in our earlier assay, depend on PARP1. MMEJ, and PARP1 in particular, therefore appears to be important for both large and small eccDNA. Future experiments will extend this assay to other types of DNA damage and to the inhibition of other DNA repair pathways.

Other laboratories have studied eccDNA visualized by DAPI staining.1,2,4,5 They validated its circularity and sequence by next generation sequencing, FISH and scanning electron microscopy. Next generation sequencing identified junctional sequences that confirm circularity. FISH showed that many circles carry the same sequence, indicating that eccDNAs are propagated and multiplied through replication. Scanning electron microscopy further showed that the molecules that appear as small dots or circles in DAPI-stained mitotic spreads are distinctly circular. We will use the same techniques to determine whether the extrachromosomal DNA in our DAPI-stained mitotic spreads is circular or linear.
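Junction-based confirmation of circularity rests on a simple geometric signature. A read that crosses the point where the two ends of a circle were joined aligns as two pieces, and on the forward strand the piece read second maps upstream of the piece read first. The minimal Python sketch below encodes that rule for two aligned segments of one split read. The `Segment` class and the `circle_from_split_read` function are hypothetical illustrations of the principle, not the pipelines used in the cited studies. Those pipelines also parse aligner output and require several supporting reads.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Segment:
    """One aligned piece of a split read, listed in read order (hypothetical)."""
    chrom: str
    start: int   # 0-based, inclusive reference start
    end: int     # 0-based, exclusive reference end
    strand: str  # "+" or "-"


def circle_from_split_read(first: Segment, second: Segment) -> Optional[Tuple[str, int, int]]:
    """Return (chrom, start, end) of a putative circle, or None.

    On "+", a read crossing the junction runs to the circle's end and then
    continues from its start, so the second segment lies upstream. On "-",
    the read runs in decreasing coordinates, so the first segment lies upstream.
    A linear deletion-spanning read has the opposite order and is rejected.
    """
    if first.chrom != second.chrom or first.strand != second.strand:
        return None
    if first.strand == "+":
        upstream, downstream = second, first
    else:
        upstream, downstream = first, second
    if upstream.start < downstream.start and upstream.end <= downstream.end:
        return (first.chrom, upstream.start, downstream.end)
    return None


# Forward-strand read: ends of the circle first, then its beginning.
print(circle_from_split_read(Segment("chr1", 1900, 2000, "+"),
                             Segment("chr1", 1000, 1050, "+")))
```

The same two segments in the opposite order on the forward strand describe a deletion on a linear chromosome rather than a circle. This is why the order of the segments, not just their positions, carries the evidence for circularity.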

Because the cells we visualize progress into M-phase and are then arrested there, we predict that the eccDNAs are not linear DNA fragments. Linear fragments have broken DNA ends. Broken ends are strong signals for the ATM-Chk2 and ATR-Chk1 checkpoint pathways, which block cells at the G2/M transition, before entry into M-phase, so that the DNA can be repaired before it is distributed to the daughter cells.6 We obtain metaphase spreads in our cells, which means the cells can enter mitosis within 12 hr of the end of NCS treatment. There are two possible explanations. Either the cells have mutated checkpoint pathways that let them progress through the cell cycle even with linear DNA fragments signaling damage, or the fragments do not signal damage. Fragments would not signal damage if their ends had been ligated together to form a circular DNA molecule, or if the ends had acquired telomeres. The nature of the extrachromosomal DNA that we see in the metaphase spreads, and that we tentatively call eccDNA, will be determined by next generation sequencing, electron microscopy, and FISH with telomere-specific probes.

EccDNA may increase the drug resistance, growth, and migration of cancer cells

Future experiments will also determine whether eccDNAs affect cell behaviors such as growth, migration, and adaptation to chemotherapeutic treatments. For example, I will treat cancer cells with agents that inhibit MMEJ (inhibition of which decreases eccDNA levels) or NHEJ (inhibition of which increases eccDNA levels). I will then measure how readily the treated cells acquire resistance to chemotherapeutic drugs in vitro. We will also determine whether the ability of the cells to proliferate, migrate, invade, and grow in suspension changes. These properties can be measured with four assays:

- an MTT assay, to quantify cell growth on plastic over time;
- a scratch assay, to measure migration;
- a Boyden chamber assay, to measure invasion through Matrigel;
- colony formation in soft agar, to measure growth in suspension.

Because eccDNA has been found in the majority of cancer cells, understanding the mechanisms by which eccDNAs are generated and maintained could identify pathways to target in cancer treatment.
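Before conditions are compared, the raw readouts of the migration and growth assays are usually converted into simple ratios. The sketch below assumes two common conventions that this chapter does not specify. Percent wound closure is computed from the scratch area at time zero and at a later time. MTT growth is expressed as blank-subtracted absorbance relative to day 0. The function names and example values are hypothetical.

```python
def percent_wound_closure(area_t0: float, area_t: float) -> float:
    """Percent of the initial scratch area that has been closed at time t."""
    if area_t0 <= 0:
        raise ValueError("initial wound area must be positive")
    return 100.0 * (area_t0 - area_t) / area_t0


def relative_growth(absorbance_t: float, absorbance_t0: float, blank: float = 0.0) -> float:
    """Blank-subtracted MTT absorbance at time t relative to time zero."""
    return (absorbance_t - blank) / (absorbance_t0 - blank)


# Hypothetical readouts, for illustration only.
print(percent_wound_closure(area_t0=2.0, area_t=0.5))
print(relative_growth(absorbance_t=1.3, absorbance_t0=0.4, blank=0.1))
```

Normalizing each well to its own starting value removes differences in initial scratch width or seeding density. The comparison between inhibitor-treated and control cells then reflects the change over time.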

EccDNA in brain

Another interesting aspect of eccDNA is that its half-life appears to depend on cell division (Figure 4-3; data gathered by Debbie Pan and Laura Dillon, unpublished). Levels of transfected eccDNA, quantified at several time points after transfection (1-3 days in HeLa, 293T and U2OS cells), decrease sharply over time. Preliminary results show that the half-life of <20 hr in these rapidly proliferating cells is prolonged when cell proliferation is decreased by serum starvation (data not shown). We suggest that the eccDNA population decreases because the molecules are released into the cytoplasm when the nuclear envelope breaks down during mitosis. Part of the eccDNA population is retained in the nucleus when it re-forms in G1. The rest is excluded from the nucleus and exposed to cytoplasmic proteins that degrade DNA.
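The link between eccDNA loss and cell division can be made quantitative with a simple model. The model is hypothetical and rests on three assumptions: transfected eccDNA does not replicate, each division halves the per-cell copy number, and only a fraction r of molecules is recaptured when the nucleus re-forms. Under these assumptions, per-cell copy number falls as N(t) = N0 (r/2)^(t/Td), where Td is the doubling time. The Python sketch below fits an exponential decay to the fraction remaining to obtain a half-life, then inverts the model to estimate r. The model, the function names and the numbers are illustrative assumptions, not the data in Figure 4-3.

```python
import math


def fit_half_life(times_h, fraction_remaining):
    """Least-squares fit of ln(fraction) = a - k*t; returns (k per hour, half-life in hours)."""
    n = len(times_h)
    ys = [math.log(f) for f in fraction_remaining]
    t_bar = sum(times_h) / n
    y_bar = sum(ys) / n
    sxx = sum((t - t_bar) ** 2 for t in times_h)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times_h, ys))
    k = -sxy / sxx
    return k, math.log(2) / k


def retention_per_division(k, doubling_time_h):
    """Invert the hypothetical model N(t) = N0 * (r/2)**(t/Td) for r.

    ln N falls by ln(2/r) per doubling time, so k = ln(2/r) / Td and
    r = 2 * exp(-k * Td).
    """
    return 2.0 * math.exp(-k * doubling_time_h)


# Synthetic data generated with a 16 h half-life (not the Figure 4-3 data).
times = [24, 48, 72]
fractions = [0.5 ** (t / 16) for t in times]
k, half_life = fit_half_life(times, fractions)
print(round(half_life, 2))
print(round(retention_per_division(k, doubling_time_h=24), 3))
```

Under this model, a half-life equal to the doubling time corresponds to r = 1, that is, pure dilution by division. A shorter half-life implies that molecules are lost at each mitosis. An estimated r above 1 would mean the no-replication assumption does not hold.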

Neuronal tissue does not undergo cell division after development, yet neurons suffer DNA damage from superoxide radicals and other metabolites. Because eccDNA loss appears to be tied to cell division, eccDNA could accumulate in neurons and contribute to the diversification and function of brain cells. EccDNA could also contribute to aging if it accumulates with age. Our results in Chapter 2 suggest that eccDNAs contribute to the pool of RNA. This RNA could send mixed signals, either increasing expression of the genes on the eccDNAs or producing short regulatory RNAs that inhibit the genes from which the eccDNAs arose. These changes in gene expression could be toxic. In addition, aberrant regulation of RNA may lead to a build-up of misfolded RNA and miscoded proteins, which could also be toxic to neurons.

To test this, eccDNA should be isolated from the brains of mice of different ages, and its levels and types should be quantified.

EccDNA as a liquid biopsy using plasma from breast cancer patients

Kumar et al. showed that cancer cells release eccDNAs into the bloodstream.7 Whether this occurs while the tumor cells are alive or when they die is unknown, but it shows that cancer-derived eccDNA is detectable in the blood. One complication is that eccDNAs from normal cells are also present in the bloodstream. To determine whether eccDNAs arise from cancerous tissue, there must be tumor-specific mutations or a detectable change in eccDNA levels. In Kumar et al., the levels of eccDNAs in the circulation were not consistently raised or lowered after a tumor was removed from a cancer patient.

One potential indicator of cancer would be the accumulation of a specific type of eccDNA that carries a mutant oncogene, or even a wild-type oncogene. To test whether this occurs in patient samples, we have acquired plasma from breast cancer patients at different stages of chemotherapy. We hope to observe changes in the patients' eccDNA profiles that indicate enrichment of oncogenes or of mutations in tumor suppressor genes.

This could improve cancer screening and the early detection of tumors, allowing treatment to begin earlier in the development of cancer.
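One way to look for this kind of oncogene enrichment is to ask what fraction of the eccDNAs sequenced from a plasma sample overlap a chosen set of gene intervals, and whether that fraction changes between time points. The Python sketch below assumes that eccDNAs and genes are given as half-open (chromosome, start, end) intervals. The coordinates and function names are made up and do not correspond to real oncogenes.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(eccdnas: List[Interval], genes: List[Interval]) -> float:
    """Fraction of eccDNAs overlapping any interval in the gene set (brute force)."""
    if not eccdnas:
        return 0.0
    hits = sum(1 for e in eccdnas if any(overlaps(e, g) for g in genes))
    return hits / len(eccdnas)


# Hypothetical gene set and eccDNAs from two sampling time points.
gene_set = [("chr8", 1000, 2000)]
before = [("chr8", 1500, 1700), ("chr8", 2000, 2100), ("chr1", 1500, 1700), ("chr8", 900, 1001)]
after = [("chr1", 10, 50), ("chr2", 5, 9), ("chr8", 2500, 2600), ("chr8", 1999, 2300)]
print(fraction_overlapping(before, gene_set), fraction_overlapping(after, gene_set))
```

The brute-force overlap check is adequate for a handful of genes. For genome-wide gene sets and tens of thousands of circles, a sorted or interval-tree approach would be more practical. Any enrichment would also need to be compared against the background expected from the genomic size of the gene set.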

EM images of eccDNA from DT40 and U2OS KO cell lines

In past studies by Dillon et al., we used electron microscopy to both quantify and characterize eccDNA.8 We found that the length distribution of eccDNA measured by EM matched the distribution obtained by next generation sequencing. EM also gave a rough estimate of the relative quantity of eccDNA in WT and MSH3 KO DT40 cells, and this was the first evidence that MSH3 KO decreases eccDNA levels. I also have data showing a slight decrease of eccDNA in CtIP and NBS1 KO cells and a slight increase in LIG1 KO cells (Figure 4-4). Dr. Yuh-Hwa Wang helped with this work, including preparation of the electron microscopy grids, adherence of DNA to the grids, and tungsten coating. I also confirmed the reduction of eccDNA in MSH3 KO cells with data independent of those used in the published study. These data agree with our quantification of the same cells by qPCR with outward-facing primers.

In this experiment, eccDNA was isolated using the same techniques described above. The eccDNA solution was then placed onto graphite-coated copper EM grids. The grids were dehydrated through a graded series of ethanol in water, coated with tungsten, and analyzed by transmission electron microscopy (FEI Titan system).

We plan to use this technique to determine how eccDNA lengths change after various treatments, e.g. NCS. This will show whether larger or smaller eccDNAs are selectively affected by DSBs. The method is well suited to length determination because it adds little bias to the length distribution: the DNA is imaged directly, without amplification or modification. Our data suggest that there may be some changes in DNA length distributions, especially in NBS1 and MSH3 KO cell lines.
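To compare an EM length distribution with sequencing data, contour lengths must be translated into base pairs. The sketch below assumes the canonical B-form rise of about 0.34 nm per base pair. DNA spread and shadowed for EM may deviate somewhat from this value, so a size standard imaged on the same grid would give a better calibration. The constant name, function names, bin width and example lengths are hypothetical.

```python
NM_PER_BP = 0.34  # assumed rise per base pair of B-form DNA; calibrate if possible


def contour_length_to_bp(length_nm: float) -> int:
    """Convert a measured contour length (nm) to an approximate size in base pairs."""
    return round(length_nm / NM_PER_BP)


def length_distribution(lengths_nm, bin_bp=100):
    """Count circles per size bin, keyed by the bin's lower edge in bp."""
    bins = {}
    for nm in lengths_nm:
        lower = (contour_length_to_bp(nm) // bin_bp) * bin_bp
        bins[lower] = bins.get(lower, 0) + 1
    return dict(sorted(bins.items()))


# Hypothetical contour lengths (nm), for illustration only.
print(length_distribution([34.0, 51.0, 102.0]))
```

Binning by base pairs rather than nanometers puts the EM data on the same axis as the sequencing-derived length distributions. The comparison then does not depend on the choice of imaging units.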

SUMMARY OF THESIS

In this thesis I have elucidated some of the DNA repair pathways by which eccDNA forms, and I have shown possible ways in which small eccDNAs help modulate gene expression. EccDNA formation is tied to DNA double-strand breaks and is accompanied by end resection, homology searching, and sensing of mismatched sequences. Further, we have shown that small eccDNAs are transcribed without promoter sequences and can produce microRNA and si-like RNA.

It has been known since the 1970s that large eccDNAs carrying full protein-coding genes are transcribed and increase the expression of those genes.9–15 However, the vast majority of eccDNAs are too small to carry a full protein-coding gene or a promoter that would signal for transcription. These molecules have therefore been assumed to be inert byproducts within the cell. The work in my thesis has shown that these circles are not inert. Small eccDNAs are transcribed without promoter sequences and form functional RNA that influences the gene expression profile of the cell. These RNAs include known microRNAs, whose precursors (pre-microRNAs) are naturally present on eccDNAs. They also include regulatory si-like RNAs, which can arise from eccDNAs derived from any exonic sequence. The pre-microRNA sequences are transcribed and then processed by the Drosha and Dicer machinery into the known mature microRNAs, which repress the known downstream targets of those microRNAs. EccDNAs that carry exonic sequences can be transcribed from both strands, and the resulting sense and antisense transcripts can anneal to form dsRNA. The dsRNA can be processed into novel si-like RNAs that repress genes carrying a homologous sequence, including the gene from which the eccDNA arose.
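The sense/antisense mechanism can be illustrated with sequences. A circle can be transcribed from both strands, so its antisense transcript is the reverse complement of the sense transcript, and any window of it can base-pair with the parental mRNA. The Python sketch below finds the target sites of 21-nt antisense windows in an mRNA. The 21-nt length is an assumption chosen to resemble typical Dicer products, and the sequences and function names are arbitrary. The sketch also ignores the circular wrap-around of the template and the imperfect pairing that real small RNAs tolerate.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (the transcript from the other strand)."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq))


def target_sites(target_mrna: str, antisense_rna: str, k: int = 21):
    """Start positions in target_mrna that some k-nt window of antisense_rna pairs with perfectly."""
    sites = set()
    for i in range(len(antisense_rna) - k + 1):
        site = reverse_complement_rna(antisense_rna[i:i + k])
        pos = target_mrna.find(site)
        while pos != -1:
            sites.add(pos)
            pos = target_mrna.find(site, pos + 1)
    return sorted(sites)


# Arbitrary 31-nt exonic segment carried on a hypothetical circle.
exon = "AUGGCUACGUUAGCCAUGCAUCGGAUCCAGU"
sense = exon
antisense = reverse_complement_rna(sense)
mrna = "GGG" + exon + "CCC"  # parental mRNA containing the exon
print(target_sites(mrna, antisense))
```

Every antisense window finds a site in the parental mRNA. This is the sequence-level reason why si-like RNAs from an exonic circle are expected to act on the gene from which the circle arose.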

Further, my thesis ties eccDNA formation to resection-dependent DNA repair pathways. The overall mechanism is that replication creates secondary structures and breaks in the DNA, which trigger resection-dependent DNA repair pathways that produce eccDNA. MMR most likely intersects with these pathways through the mismatches that form when end-resected single-stranded DNA anneals to its partner using microhomology (Figure 3-4D). We propose that mismatch repair proteins must remove the mismatched sequences from the sites of microhomology before the invading strand can be extended and ligated to complete the circle. Overall, these data suggest that eccDNA formation is a normal and common event that occurs during each passage through the cell cycle. Moreover, because DNA damage triggers eccDNA formation, patients receiving DNA-damaging chemotherapy are likely to experience an upsurge of eccDNA in the cells affected by the treatment.
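Microhomology used to anneal resected ends leaves a detectable trace at each circle junction. A short sequence is present at both breakpoints, so the exact junction position is ambiguous. The Python sketch below extracts that shared sequence for a circle spanning `ref[start:end]`. This is one common operational definition, not necessarily the scoring used in Chapter 3. The function name and the example reference sequence are arbitrary.

```python
def junction_microhomology(ref: str, start: int, end: int) -> str:
    """Microhomology at the junction of a circle made from ref[start:end].

    The circle joins ref[end - 1] to ref[start]. Bases shared by the two
    breakpoint flanks make the junction position ambiguous; they are found by
    extending leftward (ref[start-1-i] == ref[end-1-i]) and rightward
    (ref[start+i] == ref[end+i]) from the breakpoints.
    """
    circle_len = end - start
    left = 0
    while (left < circle_len and start - 1 - left >= 0
           and ref[start - 1 - left] == ref[end - 1 - left]):
        left += 1
    right = 0
    while (right < circle_len and end + right < len(ref)
           and ref[start + right] == ref[end + right]):
        right += 1
    return ref[start - left:start + right]


# "CAT" occurs at both breakpoints, so either call describes the same circle.
ref = "GGGG" + "CAT" + "AAACCCC" + "CAT" + "TTTT"
print(junction_microhomology(ref, 7, 17))
print(junction_microhomology(ref, 6, 16))
```

Both calls return the same microhomology because shifting the junction within the shared bases leaves the circular sequence unchanged. This ambiguity is what distinguishes junctions joined by microhomology from blunt joins.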

Future research on eccDNA will continue to diversify and deepen our understanding of its impact on carcinogenesis and chemotherapeutic resistance. It has yet to be determined whether small eccDNA molecules in normal cells can be enriched, as large eccDNAs are in tumor cells, and whether they alter gene expression profiles. Similarly, it is unclear whether small eccDNAs in tumor cells can alter tumor phenotype. To further elucidate the role of eccDNA in tumor adaptation and diversity, we should study the effect of transfecting a population of eccDNAs from cancer cells into normal cells.

Another avenue of great interest is eccDNA as a biomarker. EccDNAs from xenografts of human cancers in mice are detectable in the blood,7 and fetal eccDNAs are detectable in maternal plasma.16 This suggests that eccDNA could be used as a biomarker for detecting pathological and physiological changes within the body. Because early detection can significantly improve patient survival,17 this is a very encouraging field to pursue.


FIGURE 4-1 DSB induces long eccDNAs as detected by metaphase spread of ES2 ovarian cancer cells. (A) EccDNA quantification by fluorescence microscopy after NCS addition. (B) Microscopy images of ES2 cells treated with control (DMSO) and DSB-inducing agent (NCS). (C, D) Histograms of the number of eccDNAs per cell after treatment with (C) DMSO and (D) NCS.


FIGURE 4-2 DSB induces long eccDNAs, and this is repressed by inhibition of MMEJ with the PARP inhibitor AZD2461. (A) Quantification of eccDNA by fluorescence microscopy after various treatments. (B) Microscopy images of ES2 ovarian cancer cells treated with NCS, PARP1i, and both NCS and PARP1i. (C) Histograms of the number of eccDNAs per cell after treatment with NCS, PARP1i, and both NCS and PARP1i.


FIGURE 4-3 Levels of transfected eccDNA remaining at the indicated times after transfection in 293T, U2OS and HeLa cells. (A) Logarithmic scale. (B) Linear scale. [Data gathered by Laura Dillon and Debbie Pan]


FIGURE 4-4 EM quantification of eccDNA in DT40 KO cells. (A) Number of eccDNAs per 10^8 cells. (B) Length distributions of eccDNA.

BIBLIOGRAPHY

1. Turner KM, Deshpande V, Beyter D, et al. Extrachromosomal oncogene amplification drives tumour evolution and genetic heterogeneity. Nature. 2017;543(7643):122-125. doi:10.1038/nature21356
2. Nathanson DA, Gini B, Mottahedeh J, et al. Targeted Therapy Resistance Mediated by Dynamic Regulation of Extrachromosomal Mutant EGFR DNA. Science. 2014;343:72-76.
3. Paulsen T, Kumar P, Koseoglu MM, Dutta A. Discoveries of Extrachromosomal Circles of DNA in Normal and Tumor Cells. Trends Genet. 2018;34(4). doi:10.1016/j.tig.2017.12.010
4. Decarvalho AC, Kim H, Poisson LM, et al. Discordant inheritance of chromosomal and extrachromosomal DNA elements contributes to dynamic disease evolution in glioblastoma. Nat Genet. 2018;50(5):708-717. doi:10.1038/s41588-018-0105-0
5. Santini J, Luebeck J, Rajkumar U, et al. Circular ecDNA promotes accessible chromatin and high oncogene expression. Nature. 2019;575. doi:10.1038/s41586-019-1763-5
6. Andreassen PR, Ho GP, D'Andrea AD. DNA damage responses and their many interactions with the replication fork. Carcinogenesis. 2006;27(5):883-892. doi:10.1093/carcin/bgi319
7. Kumar P, Dillon LW, Shibata Y, Jazaeri AA, Jones DR, Dutta A. Normal and Cancerous Tissues Release Extrachromosomal Circular DNA (eccDNA) into the Circulation. Mol Cancer Res. 2017;9:1197-1205. doi:10.1158/1541-7786.MCR-17-0095
8. Dillon LW, Kumar P, Shibata Y, et al. Production of Extrachromosomal MicroDNAs Is Linked to Mismatch Repair Pathways and Transcriptional Activity. Cell Rep. 2015;11(11):1749-1759. doi:10.1016/j.celrep.2015.05.020
9. Cox D, Yuncken C, Spriggs AI. Minute Chromatin Bodies in Malignant Tumours of Childhood. Lancet. 1965:20.
10. Koo D-H, Molin WT, Saski CA, et al. Extrachromosomal circular DNA-based amplification and transmission of herbicide resistance in crop weed Amaranthus palmeri. Proc Natl Acad Sci. 2018;115(13):3332-3337. doi:10.1073/pnas.1719354115
11. Smith CA, Vinograd J. Small polydisperse circular DNA of HeLa cells. J Mol Biol. 1972;69(2):163-178.
12. Timmis K, Winkler U. Isolation of Covalently Closed Circular Deoxyribonucleic Acid from Bacteria Which Produce Exocellular Nuclease. J Bacteriol. 1973;113(1):508-509.
13. Hotta Y, Bassel A. Molecular size and circularity of DNA in cells of mammals and higher plants. Proc Natl Acad Sci USA. 1965;53:356-362.
14. Radloff R, Bauer W, Vinograd J. A dye-buoyant-density method for the detection and isolation of closed circular duplex DNA: The closed circular DNA in HeLa cells. Proc Natl Acad Sci USA. 1967:1514-1521.
15. Stanfield S, Helinski DR. Small circular DNA in Drosophila melanogaster. Cell. 1976;9(2):333-345. doi:10.1016/0092-8674(76)90123-9
16. Sin STK, Jiang P, Deng J, et al. Identification and characterization of extrachromosomal circular DNA in maternal plasma. Proc Natl Acad Sci USA. 2020;117(3). doi:10.1073/pnas.1914949117
17. Loud J, Murphy J. Cancer screening and early detection in the 21st century. Semin Oncol Nurs. 2018;33(2):121-128. doi:10.1016/j.soncn.2017.02.002

Appendix

Contributions to other projects

Adapted from: Kumar P, Kiran S, Saha S, Su Z, Paulsen T, Chatrath A, Shibata Y, Shibata E, Dutta A: ATAC-seq identifies thousands of extrachromosomal circular DNA in cancer and cell lines. Sci Adv 2020; 6:eaba2489. doi: 10.1126/sciadv.aba2489.

Kumar et al. investigated whether ATAC-seq data in public data repositories could be used to identify eccDNA. Using ATAC-seq performed in our lab, we found eccDNAs in OVCAR8 ovarian cancer cells and C4-2 prostate cancer cells, and I helped confirm the eccDNAs from those two cell lines. The circles were confirmed in two ways. First, eccDNA was prepared by isolating it from cells with a plasmid isolation kit, purifying it by ethanol precipitation, digesting it twice overnight with an ATP-dependent Plasmid-Safe DNase to remove linear DNA, and purifying it again with a DNA purification kit; the purified eccDNA was then analyzed by outward-facing PCR and Sanger sequencing. Second, metaphase spreads were probed with fluorescent probes matching the eccDNA sequences.
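The logic of the outward-facing PCR check can be sketched in a few lines. This is an illustrative Python sketch, not the primer-design procedure used in the study; the function names and the 0-based coordinate convention (forward primer 5' end at top-strand position `f`, reverse primer 5' end at top-strand position `r`) are assumptions. Outward-facing primers point away from each other on the linear chromosome (`r < f`), so a linear template gives no product, whereas on a circle the forward primer extends through the ligation junction and back around to `r`.

```python
from typing import Optional


def pcr_product_length(f: int, r: int, template_len: int, circular: bool) -> Optional[int]:
    """Expected amplicon length (bp), or None if no product can form.

    Hypothetical convention: the product starts at top-strand position f
    (forward primer 5' end) and ends at position r (reverse primer 5' end).
    """
    if not (0 <= f < template_len and 0 <= r < template_len):
        raise ValueError("primer positions must lie within the template")
    if circular:
        # Forward extension runs f -> end of region -> junction -> start -> r.
        return (r - f) % template_len + 1
    # Linear template: primers only meet if r lies downstream of f.
    return r - f + 1 if r >= f else None


def circular_amplicon(circle_seq: str, f: int, r: int) -> str:
    """Sequence of the junction-spanning product amplified from a circle."""
    n = len(circle_seq)
    length = (r - f) % n + 1
    return "".join(circle_seq[(f + i) % n] for i in range(length))


circle = "AAAACCCCGGGGTTTT"  # toy 16 bp circle; the junction joins T(15) to A(0)
print(pcr_product_length(12, 3, len(circle), circular=True))
print(pcr_product_length(12, 3, len(circle), circular=False))
print(circular_amplicon(circle, 12, 3))
```

Because the amplicon returned by `circular_amplicon` crosses the junction, it is the kind of sequence one would expect to read back when the PCR product is Sanger sequenced.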

Using the same ATAC-seq analysis pipeline, many additional samples from patients with low-grade glioma (LGG) and glioblastoma (GBM) were also analyzed.
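The actual pipeline is described in Kumar et al. As a simplified, hypothetical illustration of the underlying signal only: a sequencing read that crosses the ligation junction of a circle is split, and its 3' part aligns upstream of its 5' part on the same chromosome and strand. The `SplitRead` record, the exact-coordinate matching, and the `min_reads` threshold below are assumptions; real data would also need strand checks, duplicate removal, and tolerance for soft-clipping.

```python
from collections import Counter
from typing import List, NamedTuple, Optional, Tuple


class SplitRead(NamedTuple):
    """Hypothetical record for one read whose two parts align to the same
    chromosome and strand (0-based, end-exclusive coordinates)."""
    chrom: str
    first_start: int   # alignment of the read's 5' part
    first_end: int
    second_start: int  # alignment of the read's 3' part
    second_end: int


Junction = Tuple[str, int, int]


def junction_of(read: SplitRead) -> Optional[Junction]:
    """Return (chrom, circle_start, circle_end) for a back-splice read, else None."""
    # Crossing a circle's junction jumps from the end of the excised region
    # back to its start, so the 3' part maps upstream of the 5' part.
    if read.second_start < read.first_start:
        return (read.chrom, read.second_start, read.first_end)
    return None  # collinear or deletion-like split: no circle evidence


def call_circles(reads: List[SplitRead], min_reads: int = 2) -> List[Junction]:
    """Keep junctions supported by at least `min_reads` reads."""
    counts = Counter(j for j in map(junction_of, reads) if j is not None)
    return sorted(j for j, n in counts.items() if n >= min_reads)
```

Requiring more than one supporting read is a simple way to suppress chance chimeric alignments; the threshold here is arbitrary.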

The results showed that sequences amplified on eccDNA molecules could be detected in existing next-generation sequencing data (ATAC-seq) from cancer patient tumor samples, and that many of these eccDNAs contained oncogenes. Interestingly, many of the eccDNA molecules identified were not detectable as increases in copy number (gene amplification), suggesting that they were present in only a small fraction of the tumor cells, at a pre-amplification stage. This suggests that the amplification of oncogenes through the formation of eccDNA could be a dynamic process that contributes to cancer progression all the way to metastasis. Because these eccDNA molecules lack centromeres, they are distributed randomly between daughter cells during mitosis. This could accelerate amplification when the cancer is under selective pressure: an eccDNA can be copied during replication and then become further enriched through unequal distribution into one of the two daughter cells. These eccDNAs also contain pseudogenes carrying nonsense mutations. Because the excision of sequences from the chromosome and their ligation into eccDNA can create deletions and rearrangements, including repositioning a gene relative to an enhancer or a promoter, it is possible that genes that are normally silenced, by a nonsense mutation or by the lack of a strong promoter or enhancer, could become functional and expressed.
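The effect of random segregation can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not a model from the study: every eccDNA copy is replicated exactly once per cell cycle, each resulting copy goes to either daughter with probability 0.5, and selective pressure is represented by a hypothetical truncation selection (`keep`) that retains only the cells with the most copies.

```python
import random
from statistics import mean, pvariance
from typing import List, Optional, Tuple


def divide(copies: int, rng: random.Random) -> Tuple[int, int]:
    """Replicate each eccDNA copy once, then assort every copy independently,
    since acentric circles have no centromere to ensure equal partitioning."""
    replicated = 2 * copies
    to_a = sum(1 for _ in range(replicated) if rng.random() < 0.5)
    return to_a, replicated - to_a


def simulate(start_copies: int, generations: int,
             keep: Optional[int] = None, seed: int = 0) -> List[int]:
    """Return eccDNA copy numbers per cell after `generations` divisions.

    If `keep` is set, only the `keep` highest-copy cells survive each round
    (a crude, hypothetical stand-in for drug selection).
    """
    rng = random.Random(seed)
    cells = [start_copies]
    for _ in range(generations):
        daughters: List[int] = []
        for c in cells:
            daughters.extend(divide(c, rng))
        if keep is not None:
            daughters = sorted(daughters, reverse=True)[:keep]
        cells = daughters
    return cells


neutral = simulate(10, 8)
print(mean(neutral), pvariance(neutral))
selected = simulate(10, 15, keep=50)
print(mean(selected))
```

Without selection, the average copy number stays at the starting value while the spread between cells grows; a centromeric locus would instead give every daughter exactly the parental copy number. With `keep` set, the mean can only ratchet upward, because the daughters' mean always equals the parents' mean and keeping the top cells cannot lower it.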

Overall, these results show that eccDNAs create a highly dynamic genetic environment in cancers, which can lead both to rapid amplification of genes and to the expression of genes that are normally suppressed.

My contribution

My contribution to this project was designing primers that targeted eccDNA and isolating the eccDNA from the cancer cell lines for analysis.

Abstract from “ATAC-seq identifies thousands of extrachromosomal circular DNA in cancer and cell lines”

Pankaj Kumar, Shashi Kiran, Shekhar Saha, Zhangli Su, Teressa Paulsen, Ajay Chatrath, Yoshiyuki Shibata, Etsuko Shibata and Anindya Dutta

Extrachromosomal circular DNAs (eccDNAs) are somatically mosaic and contribute to intercellular heterogeneity in normal and tumor cells. Because short eccDNAs are poorly chromatinized, we hypothesized that they are sequenced by tagmentation in ATAC-seq experiments without any enrichment of circular DNA. Indeed, ATAC-seq identified thousands of eccDNAs in cell lines that were validated by inverse PCR and by metaphase FISH. ATAC-seq in gliomas and glioblastomas identifies hundreds of eccDNAs, including one containing the well-known EGFR gene amplicon from chr7. More than 18,000 eccDNAs, many carrying known cancer driver genes, are identified in a pan-cancer analysis of ATAC-seq libraries from 23 tumor types. Somatically mosaic eccDNAs are identified by ATAC-seq even before amplification is recognized by genome-wide copy number variation measurements. Thus, ATAC-seq is a sensitive method to detect eccDNA present in a tumor at the pre-amplification stage and can be used to predict resistance to therapy.
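A back-of-the-envelope calculation, offered as an illustration rather than an analysis from the paper, shows why a mosaic eccDNA can escape bulk copy-number calls. Suppose a fraction $f$ of tumor cells each carry $k$ extra copies of a locus on eccDNA, while every cell keeps its two chromosomal copies. The average copy number measured in bulk, and its log2 ratio to the diploid baseline, are

$$\bar{c} = 2(1-f) + f(2+k) = 2 + fk, \qquad \log_2\frac{\bar{c}}{2} = \log_2\!\left(1 + \frac{fk}{2}\right).$$

With $f = 0.05$ and $k = 2$, $\bar{c} = 2.1$ and the log2 ratio is about 0.07, a small shift that may fall below the sensitivity of genome-wide copy-number measurements, whereas reads spanning a circle junction are a qualitative signal that does not depend on the average copy number.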

Adapted from: Chatrath A, Przanowska R, Kiran S, Su Z, Saha S, Wilson B, Tsunematsu T, Ahn J, Lee K, Paulsen T, Sobierajska E, Kiran M, Tang X, Li T, Kumar P, Ratan A, Dutta A: The pan-cancer landscape of prognostic germline variants in 10,582 patients. Genome Med 2020; 12:15. doi: 10.1186/s13073-020-0718-7.

Chatrath et al. used publicly available datasets from The Cancer Genome Atlas (TCGA) to show that specific germline variants in cancer patients can help predict patient outcome. The analysis of prognostic germline variants could readily be added to other types of analysis to help choose which chemotherapeutics would be most effective in treating a patient. Because the sample sizes in the study were limited, these associations should become more reliable as more patients are added and the analysis gains statistical power.
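The study's central statistical tool is the Cox proportional hazards model (the abstract below specifies multivariate Cox regression). As a minimal sketch only, the code below fits a Cox model with a single binary covariate (germline variant carrier = 1) by Newton-Raphson on the partial log-likelihood, with tied event times sharing one risk set (the Breslow approximation). It omits the clinical covariates of a multivariate model, the toy data are invented for illustration, and a real analysis would use an established survival-analysis package.

```python
import math
from typing import Sequence, Tuple


def score_and_information(beta: float, times: Sequence[float],
                          events: Sequence[int], x: Sequence[float]) -> Tuple[float, float]:
    """Score (first derivative) and information (negative second derivative)
    of the Cox partial log-likelihood
        l(beta) = sum over events i of
                  [beta * x_i - log(sum over j with t_j >= t_i of exp(beta * x_j))].
    """
    u = info = 0.0
    for xi, ti, di in zip(x, times, events):
        if not di:
            continue  # censored patients contribute only through risk sets
        s0 = s1 = s2 = 0.0
        for xj, tj in zip(x, times):
            if tj >= ti:  # patient j is still at risk at time t_i
                w = math.exp(beta * xj)
                s0 += w
                s1 += xj * w
                s2 += xj * xj * w
        m = s1 / s0  # risk-weighted mean covariate in the risk set
        u += xi - m
        info += s2 / s0 - m * m
    return u, info


def fit_cox_single(times, events, x, tol=1e-10, max_iter=50) -> float:
    """Newton-Raphson estimate of the log hazard ratio beta."""
    beta = 0.0
    for _ in range(max_iter):
        u, info = score_and_information(beta, times, events, x)
        step = u / info
        beta += step
        if abs(step) < tol:
            return beta
    raise RuntimeError("Newton-Raphson did not converge")


times = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # toy follow-up times (months)
events = [1, 1, 1, 1, 0, 1, 1, 0, 1, 0]   # 1 = death, 0 = censored
carrier = [1, 1, 0, 1, 1, 0, 0, 0, 1, 0]  # 1 = carries the germline variant
beta = fit_cox_single(times, events, carrier)
print(f"hazard ratio for carriers = {math.exp(beta):.2f}")
```

A hazard ratio above 1 would mark the variant as associated with worse outcome; in a multivariate model, the same score and information terms become a vector and a matrix, one entry per covariate.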

Of the germline variants identified, at least twelve are likely associated with patient outcome through changes in protein structure and at least five through differences in gene expression. Almost half of these variants lie within genes already associated with cancer progression as tumor suppressors, oncogenes, or cancer driver genes. Literature searches suggested, for example, that rs1800932, a germline variant in MSH6, may confer a survival advantage in tumors treated with temozolomide. Another variant, rs55796947, lies in MAP2K3, a gene associated with cell proliferation. Lastly, rs77903511 is a germline variant in SURVIVIN, a gene associated with the inhibition of apoptosis.

This study shows that sequencing of the normal (germline) genome can give insight into the outcome of cancer patients treated by standard methods in the USA. In the future, I expect this strategy will also be useful for identifying which therapy would most benefit a patient based on his or her germline variants. This will further expand the resources available to characterize a tumor and to understand the underlying mechanisms that give it adaptability and an advantage in acquiring drug resistance.

My contributions

My contribution to this project was to perform literature searches on the ~40 genes containing the prognostic germline variants, compiling the published studies on each gene and its links to other relevant proteins and oncogenic processes.

Abstract from “The pan-cancer landscape of prognostic germline variants in 10,582 patients”

Ajay Chatrath, Roza Przanowska, Shashi Kiran, Zhangli Su, Shekhar Saha, Briana Wilson, Takaaki Tsunematsu, Ji-Hye Ahn, Kyung Yong Lee, Teressa Paulsen, Ewelina Sobierajska, Manjari Kiran, Xiwei Tang, Tianxi Li, Pankaj Kumar, Aakrosh Ratan and Anindya Dutta

Background: While clinical factors such as age, grade, stage, and histological subtype provide physicians with information about patient prognosis, genomic data can further improve these predictions. Previous studies have shown that germline variants in known cancer driver genes are predictive of patient outcome, but no study has systematically analyzed multiple cancers in an unbiased way to identify genetic loci that can improve patient outcome predictions made using clinical factors.

Methods: We analyzed sequencing data from the over 10,000 cancer patients available through The Cancer Genome Atlas to identify germline variants associated with patient outcome using multivariate Cox regression models.

Results: We identified 79 prognostic germline variants in individual cancers and 112 prognostic germline variants in groups of cancers. The germline variants identified in individual cancers provide additional predictive power about patient outcomes beyond clinical information currently in use and may therefore augment clinical decisions based on expected tumor aggressiveness.

Molecularly, at least 12 of the germline variants are likely associated with patient outcome through perturbation of protein structure and at least five through association with gene expression differences. Almost half of these germline variants are in previously reported tumor suppressors, oncogenes or cancer driver genes with the other half pointing to genomic loci that should be further investigated for their roles in cancers.

Conclusions: Germline variants are predictive of outcome in cancer patients and specific germline variants can improve patient outcome predictions beyond predictions made using clinical factors alone. The germline variants also implicate new means by which known oncogenes, tumor suppressor genes, and driver genes are perturbed in cancer and suggest roles in cancer for other genes that have not been extensively studied in oncology. Further studies in other cancer cohorts are necessary to confirm that germline variation is associated with outcome in cancer patients as this is a proof-of-principle study.
