An investigation into cis-elements, rare mutations, and slipped-DNA detection at trinucleotide repeat disease-associated loci

by

Michelle Marie Axford

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy, Molecular Genetics, University of Toronto

©Copyright by Michelle M Axford 2012

An investigation into cis-elements, rare mutations, and slipped-DNA detection at trinucleotide repeat disease-associated loci

Michelle Marie Axford

Doctor of Philosophy

Molecular Genetics, University of Toronto

2012

Abstract

Gene-specific trinucleotide repeat expansions are the cause of an ever-growing number of disorders, including myotonic dystrophy type 1 (DM1) and spinocerebellar ataxia type 7 (SCA7).

Both DM1 and SCA7 are characterized by large differences in repeat numbers between tissues that are differentially affected, indicating tissue-specific mechanisms of repeat instability. The mechanisms of both somatic and germline instability are complex and still poorly understood, with evidence supporting the contribution of cis-elements, trans factors, and DNA metabolic processes that are hypothesized to involve alternative structure formation within the DNA tract. This thesis involves investigations into the role of a particular cis-element (a CTCF binding site) on instability, as well as the detection of slipped-DNAs in patient tissues and the presence of rare mutations within those same tissues.

Here I identify the first endogenous cis-element reported to show regulation of instability at a trinucleotide repeat disease locus, the DNA binding site for the insulator protein CCCTC-binding factor (CTCF) downstream of the SCA7 repeat. Using a mouse model with a mutation in the CTCF binding domain, I show that the loss of CTCF binding stimulates germline and somatic instability in a tissue-specific and age-dependent manner. The binding of CTCF likely protects the repeat tract from expansion by shielding it from other elements that may contribute to expansion.

DNA metabolic processes such as replication, repair, and transcription likely play a role in repeat expansion at disease loci, with the general mechanism hypothesized to be the extrusion and aberrant repair of slipped-DNA structures during the unwinding process for each. While characterizing DM1 patient tissues in order to isolate slipped-DNA structures, I characterized two non-CTG repeat insertion mutations that had completely replaced the repeat tract in a small subset of cells in only two tissues in one patient. Given the hypermutable nature of expanded repeat tracts, it is possible that these types of mutations are more common than suspected.

Finally, I report on the detection and isolation of slipped-DNA structures from the endogenous DM1 locus from patient tissues. The slip-outs appear as clusters along a length of DNA, rather than single isolated slip-outs, and more unstable tissues contain greater amounts of slipped-DNA compared to more stable tissues. This detection implies that slipped-DNA structures are not merely transient intermediates in the mutation and expansion process as has long been assumed, but remain within the DNA at detectable levels.

The data reported herein both further our understanding of trinucleotide repeat instability and confirm the decades-long hypothesis that slipped-DNAs do in fact form in patient tissues in a tissue-specific manner.


Acknowledgments

Thank you to my supervisor, Dr. Christopher Pearson, for taking a chance on a relatively green graduate student, and for trusting me with such an important and rewarding project. I have learned so much and have acquired many skills in your lab, for which I will always be grateful. Thank you to my committee members for the valuable feedback and critical discussions during my PhD. Thank you to our wonderful collaborators, especially Dr. Charles Thornton and Dr. Yuh-Hwa Wang, who have both provided valuable resources in my 2D3 project. Thank you to Dr. Maria Zannis-Hadjopoulos for the invaluable contribution of the 2D3 antibody. Many thanks also go to all present and former members of the Pearson lab, especially those who helped train and teach me. I would like to single out Dr. John Cleary, who was an invaluable resource to me when I started in the lab, and had the perfect mix of seriousness and hilarity to allow a new grad student to feel comfortable. Special thanks to my fellow graduate students in the lab, Meghan Slean and Kaalak Reddy; you’ve both helped make the last 6 years a fantastic experience for me, and have provided immeasurable insight, support, and the occasional welcome break. I couldn’t have asked for better lab mates or friends. Also to Jodie Simard, thank you for keeping the lab running smoothly every day, and thank you especially for your support of my ideas and beliefs, for the sympathetic ear, and for your valuable friendship. Thank you to the other graduate students and PIs on the floor and in the department, past and present, who have helped my science or me in one way or another throughout the years. Thank you to my extremely supportive and loving family: John and Nicole Axford, Marsha and Mark Sheehan, and Theresa Fairweather. Special thanks to my parents, Brian and Vera Axford, who have always fully supported me and my decisions throughout my life. Your unwavering support and love have helped me through good times and hard times, and mean more to me than you could ever know. Everyone should be so lucky to have such fantastic parents. Finally, thank you to my rock and my sounding board, Tom, for being very patient and understanding during my PhD years. But, maybe more importantly, thank you for being so supportive of it, especially for the moments when you listened and went out of your way to truly understand what I did, despite not being a biological scientist yourself. The times you would mention all the cool science you read about or heard on the news were not lost on me, and I truly appreciate it.


Table of Contents

Acknowledgments
Table of Contents
List of Tables
List of Figures
List of Abbreviations

1 Introduction
1.1 Trinucleotide repeat diseases and instability
1.2 Spinocerebellar ataxia type 7
1.3 Myotonic dystrophy type 1 (DM1)
1.4 Mechanisms of instability
1.4.1 Cis-elements, trans factors, and flanking sequences
1.4.2 Repeat tract purity
1.4.2.1 Interruptions in the spinocerebellar ataxias
1.4.2.2 Interruptions in fragile X
1.4.2.3 Interruptions in myotonic dystrophy type 1
1.4.2.4 Interruptions in Friedreich’s ataxia
1.4.3 DNA methylation
1.4.4 DNA replication
1.4.5 Transcription
1.4.6 Mismatch repair and other DNA binding proteins
1.5 DNA Structure
1.5.1 History
1.5.2 History of DNA structures and disease association
1.5.2.1 Formation: Slipping, sliding, creeping, shifting, bubble- and branch-migration
1.5.2.2 Formation: Mechanism
1.5.3 Structures
1.5.3.1 Structures: Evidence of DNA structure in trinucleotide repeats
1.5.4 Processing
1.5.5 Compounds interacting with slipped-DNA
1.6 Non-trinucleotide repeat mutations and other unusual mutations in TNR diseases
1.6.1 Mutations within or in proximity to the repeat
1.7 Thesis Goals

2 CTCF cis-regulates trinucleotide repeat instability in an epigenetic manner: a novel basis for mutational hot spot determination
2.1 Abstract
2.2 Introduction
2.3 Methods
2.3.1 Generation of SCA-CTCF-I-mut transgenic mice
2.3.2 Repeat Instability Analysis
2.4 Results
2.5 Conclusions and Discussion

3 Replacement of the myotonic dystrophy type 1 CTG repeat with ‘non-CTG repeat’ insertions in specific tissues
3.1 Abstract
3.2 Introduction
3.3 Methods
3.3.1 DNA extraction
3.3.2 DNA amplification and electrophoresis
3.3.3 Sequencing
3.3.4 DNA alignment and sequence location
3.3.5 Methylation status
3.4 Results
3.5 Conclusions and Discussion

4 Detection of slipped-DNAs at the trinucleotide repeats of the myotonic dystrophy type 1 disease locus in patient tissues
4.1 Abstract
4.2 Introduction
4.2.1 The Anti-DNA Junction Antibody (2D3)
4.3 Methods
4.3.1 Human tissues
4.3.2 DNA extraction
4.3.3 CTG repeat length analysis
4.3.4 Structure formation
4.3.5 Electrophoretic mobility shift assay (EMSA)
4.3.6 DNA-Immunoprecipitation
4.3.7 Polymerase chain reaction protocols
4.3.8 Mung Bean Nuclease, T7 Endonuclease I, and restriction enzyme digestion
4.3.9 Nuclease accessibility protocol
4.3.10 Electron microscopy
4.4 Results
4.4.1 Binding of anti-DNA junction antibody to slipped DNAs
4.4.2 Determination of repeat (CTG) size and heterogeneity in DM1 patient tissues
4.4.3 Immunoprecipitation of slipped-DNA from DM1 patient tissues: allele specificity
4.4.4 Quantification of immunoprecipitated DNAs
4.4.5 Sensitivity of patient DNAs to structure-specific enzymes
4.4.6 Electron microscopic analysis of immunoprecipitated DNAs
4.5 Conclusions and Discussion

5 Discussion and Implications
5.1 Cis-elements and instability
5.2 Rare mutations in repeat diseases
5.3 Slipped-DNA and alternative structures in repeat disease
5.4 Summary and concluding remarks

References

List of Tables

Chapter 1 Introduction
Table 1.1 Trinucleotide repeat diseases containing interruptions

Chapter 2 CTCF cis-regulates trinucleotide repeat instability in an epigenetic manner: a novel basis for mutational hot spot determination
Table 2.1 Repeat sizes of cortex DNA: CAG tract length - Small-pool PCR

Chapter 4 Detection of slipped-DNAs at the trinucleotide repeats of the myotonic dystrophy type 1 disease locus in patient tissues
Table 4.1 Patient, tissue, and DM1 CTG tract sizes
Table 4.2 Primers used in the experiments carried out in this chapter

Chapter 5 Discussion and Implications
Table 5.1 Trinucleotide repeat diseases known to be flanked by CTCF binding sites

List of Figures

Chapter 1 Introduction
Figure 1.1 Repair outcomes of slipped (CTG)/(CAG) DNA structures
Figure 1.2 Movement of slipped-out DNA structures
Figure 1.3 Unusual DNA structures formed by trinucleotide repeats
Figure 1.4 Mutations within or in proximity to various expanded repeats

Chapter 2 CTCF cis-regulates trinucleotide repeat instability in an epigenetic manner: a novel basis for mutational hot spot determination
Figure 2.1 The human SCA7 region
Figure 2.2 Small-pool PCR data from sperm - 16 month-old mice
Figure 2.3 SCA7-CTCF-I-mut mice display increased somatic instability
Figure 2.4 SCA7-CTCF-I-mut mice display increased somatic instability
Figure 2.5 SCA7-CTCF-I-mut mice display increased somatic instability
Figure 2.6 Model for CTCF regulation of CAG repeat instability

Chapter 3 Replacement of the myotonic dystrophy type 1 (CTG)n repeat with ‘non-CTG repeat’ insertions in specific tissues
Figure 3.1 Detection of extra bands after short myotonic dystrophy type 1 (DM1) allele PCR amplification in ADM9 DM1 patient
Figure 3.2 Characterisation by cloning approach of the myotonic dystrophy type 1 (DM1) repeat site in the short allele from the different ADM9 tissues analysed
Figure 3.3 Characterisation of the PCR products of the insert specific designed primers
Figure 3.4 Assessment of methylation allele specificity

Chapter 4 Detection of slipped-DNAs at the trinucleotide repeats of the myotonic dystrophy type 1 disease locus in patient tissues
Figure 4.1 Slipped-DNAs are bound by anti-DNA junction antibody
Figure 4.2 Models of expansion at trinucleotide repeats
Figure 4.3 2D3 and control antibody binding
Figure 4.4 The anti-DNA junction antibody 2D3 does not induce structure in long repeat-containing linear
Figure 4.5 Southern blot sizing of the expanded DM1 (CTG) repeat in various tissues from the two patients used in this study
Figure 4.6 Protocol for isolating slipped-DNAs from genomic DNA
Figure 4.7 Immunoprecipitated DNA enriches the expanded DM1 allele
Figure 4.8 2D3 antibody does not pull-down non-specific DNAs void of structure-forming sequences
Figure 4.9 Triplet-primed PCR analysis of immunoprecipitated DNA
Figure 4.10 Method of quantification for immunoprecipitated materials
Figure 4.11 Quantification of immunoprecipitated DNA from patient and control samples
Figure 4.12 Sensitivity of DM1 patient DNAs to structure-specific enzymes
Figure 4.13 TP-PCR analysis of samples +/- structure specific enzyme digestions, assessed by GeneScan
Figure 4.14 TP-PCR analysis of samples +/- structure-specific enzyme digestions, agarose analysis
Figure 4.15 TP-PCR analysis of samples +/- structure specific enzyme digestions within native chromatin context
Figure 4.16 Quantification of areas under the peak in Figure 4.4.N
Figure 4.17 Sensitivity of DM1 patient DNAs to structure specific enzymes when treated within their native chromatin context
Figure 4.18 Electron microscopic imaging of linear and induced-structure DNA
Figure 4.19 Electron microscopic analysis of slipped-DNAs
Figure 4.20 Quantification and analysis of EM images of patient slipped DNA
Figure 4.21 Expansion and heteroduplex slipped DNA
Figure 4.22 Slipped-DNA: Southern blot and PCR analysis

Chapter 5 Discussion and Implications
Figure 5.1 The fork-shift model of instability

List of Abbreviations

2D3    anti-DNA junction antibody
ALS    amyotrophic lateral sclerosis
Ataxin#    Ataxin gene (e.g. 7)
BLM    Bloom protein
BORIS    Brother of the Regulator of Imprinted Sites
cDNA    complementary DNA
ChIP    Chromatin immunoprecipitation
cTNT    cardiac troponin T
CTCF    CCCTC-binding factor
D-loop    displacement loop (DNA structure)
DM1/2    myotonic dystrophy type 1/2
DMPK    dystrophia myotonica protein kinase gene
DNA    deoxyribonucleic acid
DNMT1    DNA (cytosine-5)-methyltransferase 1 enzyme
E. coli    Escherichia coli – gram negative bacteria
EMSA    electrophoretic mobility shift assay
FD    fully-duplexed
FEN1    flap endonuclease 1 protein
FMR1    Fragile X mental retardation gene 1
FRAXA    Fragile X type A
FRDA    Friedreich’s ataxia
FSHD    Facio-scapulo-humeral muscular dystrophy
HD    Huntington’s disease
HNPCC    Hereditary non-polyposis colorectal cancer
IP    Immunoprecipitation
LNA    locked nucleic acid
MBN    mung bean nuclease
MBNL    muscleblind protein
MMR    mismatch repair
MSH2/3/6    MutS homolog 2/3/6
Mut    mutant
NER    nucleotide excision repair
OGG1    oxoguanine glycosylase 1 protein
PCR    polymerase chain reaction
PERT    Phenol Emulsion Reassociation Technique
pol-α    polymerase alpha
polyQ    Polyglutamine
R-loop    RNA-loop
RNA    ribonucleic acid
RNA-PolII    RNA polymerase II
SBMA    Spinal bulbar muscular atrophy
SCA#    Spinocerebellar ataxia # (e.g. 1, 2, 3...)
S-DNA    slipped-homoduplex DNA
SI-DNA    slipped intermediate (heteroduplex) DNA
SIX5    gene adjacent to DMPK
SV40    simian virus 40
T7endoI    T7 endonuclease I
TCR    transcription coupled repair
TNR    trinucleotide repeat
TP-PCR    triplet-primed polymerase chain reaction
UTR    untranslated region
WRN    Werner syndrome, RecQ helicase-like
WT    wildtype
XPG    Xeroderma pigmentosum G
ZNF9    zinc finger protein 9
Zn    zinc

1 Introduction

1.1 Trinucleotide repeat diseases and instability

Unstable trinucleotide repeat (TNR) tracts in the human genome are responsible for over 30 neuromuscular and neurodegenerative diseases. The first TNR disease mutations were reported in 1991 and 1992, with the discovery of the gene-specific trinucleotide repeat expansions causing X-linked spinal and bulbar muscular atrophy (SBMA), fragile X mental retardation syndrome (FRAXA), and myotonic dystrophy type 1 (DM1). These TNR mutations can occur at various locations within a gene, including the promoter region (e.g. the (CAG)n repeat of spinocerebellar ataxia type 12 [SCA12]); the 5′ untranslated region (UTR) (e.g. the (CGG)n of fragile X type A [FRAXA]); exons (e.g. the (CAG)n of Huntington’s disease [HD] and SCA7); introns (e.g. the (GAA)n of Friedreich’s ataxia [FRDA]); and the 3′ UTR (e.g. the (CTG)n repeats of DM1). In most cases, the unaffected population harbours a short repeat (generally <35 units) that is stable throughout each individual’s life and across generations. The expanded allele, however, is unstable both within individuals and during transmission, with instability differing between diseases. The instability in each disease is locus-specific, with no generalized genomic instability occurring. It is also interesting to note that the same repeat sequence at different loci (e.g. CTG/CAG in both DM1 and HD) can show remarkably different instabilities. For example, DM1 patients can harbour (CTG) repeats numbering in the thousands in certain affected tissues [1], while HD patients have an average repeat length rarely greater than 100 [2]. This dramatic difference in repeat length between diseases, however, does not equate to an equivalently worse phenotype in the disease with larger expansions, implying that factors beyond repeat length affect severity across diseases.

In general, TNR diseases exhibit a phenomenon termed genetic anticipation, wherein the disease has an earlier age-at-onset in subsequent generations, usually accompanied by symptoms that are more severe than in the previous affected generation. In the diseases where it has been studied, the mutation has also been shown to be dynamic within an individual, with various tissues showing variable levels of instability [1, 3-5]. The mechanisms of repeat instability are not fully elucidated, although there is evidence that certain metabolic processes, such as DNA replication, repair, methylation, and transcription, as well as alternative DNA structure formation, among others, have an effect (see “Mechanisms of instability” below).


1.2 Spinocerebellar ataxia type 7

There are over 30 known spinocerebellar ataxias (SCAs) for which a mutation has been defined [6]. The largest subtype of SCAs comprises those caused by an expansion of a (CAG)n repeat in a specific gene, which differs for each SCA, although SCAs caused by other mutations are also known, notably the recently discovered SCA31 and SCA36, which are caused by an inserted pentanucleotide and an expanded hexanucleotide, respectively [7, 8]. The first SCA for which a mutation was identified was SCA1 in 1993 [9], with mutations for many more SCAs following quickly thereafter. The SCA7 mutation is autosomal dominant; it was discovered in 1997 in the ataxin-7 (ATXN7) gene and is of the subtype caused by an expanded (CAG) repeat [10]. The pathogenesis of this disease is caused by a toxic gain-of-function of the ATAXIN-7 protein containing an expanded polyglutamine (polyQ) tract, which forms nuclear, and less commonly cytoplasmic, inclusions through self-aggregation [11]. The exact mechanism of the pathology, however, is still unknown. This toxic gain-of-function of the expanded protein is not restricted to SCA7, as other TNR diseases such as HD, SBMA, and various other SCAs, all containing expanded polyQ tracts, also show the same pathogenic inclusions. Although most SCAs have an adult age-at-onset, a few, including SCA7, can manifest as an infantile-onset form, generally leading to early death. The (CAG)n repeat in a SCA7 mouse model has been shown to be unstable in a tissue-specific manner [12], although the same is not yet known for human patients. Generally, as with some other TNRs that show anticipation, the larger the repeat, the younger the age-at-onset is likely to be. Unlike DM1 (below), however, the very large repeats in SCA7 are in the low hundreds [13-15], not thousands. It has recently been suggested that the normal function of ATAXIN-7 is the stabilization of microtubules, and that the disruption of this regulation may also contribute to disease pathology [16]. It is likely that SCA7 disease pathology is caused by a combination of factors.

1.3 Myotonic dystrophy type 1 (DM1)

Originally called Steinert’s disease, DM1 was first described by Hans Steinert in 1909. It wasn’t until 1992 that the (CTG) expansion causing the dominant disease was discovered in the 3′ UTR of the myotonic dystrophy protein kinase gene (DMPK) on the long arm of chromosome 19 [17]. Unlike the spinocerebellar ataxias, there are only two known types of myotonic dystrophy, DM1 and DM2, with the latter caused by the expansion of a (CCTG) repeat in the ZNF9 gene [18]. DM1 is distinguishable phenotypically from other muscular dystrophies in that the distal rather than proximal musculature is affected first upon disease onset, with proximal muscle involvement occurring later. Additional symptoms can include abnormal cardiac rhythms, apathy, learning difficulties, and male infertility.

The cause of DM1 disease pathology is mostly attributable to a toxic gain-of-function of the expanded (CUG)n RNA, which sequesters other proteins from their normal functions and creates foci of protein aggregates within the nucleus. One of the sequestered proteins is muscleblind (MBNL1), a protein that normally acts as a splicing factor [19, 20]. The sequestration of MBNL1 is directly connected to many of the symptoms of DM1. For example, MBNL1 is known to affect the alternative splicing of cardiac troponin T (cTNT), which is then misregulated when MBNL1 is sequestered in DM1, contributing to the cardiac symptoms seen in the disease [21]. In general, MBNL1 regulates the transition of the splicing of various RNAs from the neonatal isoforms to the adult isoforms, with the absence of MBNL1 blocking that transition and leaving those particular RNAs in the developmentally inappropriate neonatal form [22]. Similar to SCA7, DM1 has an adult-onset form, a juvenile-onset form, and a congenital form. DM1 has a range of repeat sizes associated with the unaffected and affected populations: unaffected individuals can have up to 35 CTG repeats, which are intergenerationally and somatically stable; premutation lengths (up to 90 or 100 repeats) are unstable; and full disease lengths extend into the thousands, with patients exhibiting symptoms at varying ages, generally dependent upon repeat length. Within the premutation lengths, few individuals exhibit disease symptoms, with potential cataract formation in older age not always being attributed to DM1 until the following generation’s symptom onset. Although repeat contractions are reported to have an occurrence rate of between 4.2 and 6.4% upon transmission [23], the majority of unstable alleles are expansions. Unlike SCA7, the tissue specificity of DM1 CTG instability has been documented in humans [1], with differences approaching thousands of repeats between affected and less affected tissues, although the mechanism by which this differential instability occurs is not fully understood.
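The repeat-size ranges quoted above can be summarized as a small lookup. The following is a minimal sketch only, assuming the cut-offs stated in the text (up to 35 repeats for unaffected alleles, up to 90 or 100 for premutation alleles, and anything longer as full disease length). The function name, the `premutation_max` parameter, and the treatment of boundary values as inclusive are illustrative assumptions and not a diagnostic rule.

```python
def classify_dm1_allele(ctg_repeats: int, premutation_max: int = 100) -> str:
    """Assign a DM1 CTG repeat count to one of the size ranges described in the text.

    Assumptions (hypothetical, for illustration): boundaries are inclusive, and
    premutation_max may be set to 90 or 100 because the text gives both values.
    """
    if ctg_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if ctg_repeats <= 35:
        return "unaffected"            # stable somatically and across generations
    if ctg_repeats <= premutation_max:
        return "premutation"           # unstable, usually few or no symptoms
    return "full disease length"       # can extend into the thousands


if __name__ == "__main__":
    for n in (12, 35, 60, 100, 1500):
        print(n, classify_dm1_allele(n))
```

Exposing the upper premutation bound as a parameter reflects the fact that the text gives a range (90 or 100) rather than a single threshold.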


1.4 Mechanisms of instability

1.4.1 Cis-elements, trans factors, and flanking sequences

Outside of the repeat expansions themselves, cis-elements and trans-acting factors can also play a role in TNR disease pathology. When referring to a cis-element, I am indicating the sequence of the repeats themselves, the length and purity of the tract including interruptions, alterations of the DNA itself such as methylation, or a change in DNA sequences within and into the flanking sequences of the repeat. Additionally, trans-factors refer to proteins that bind within or around the repeat, such as mismatch repair proteins and the CCCTC-binding factor (CTCF) protein, or processes that in some way may affect instability such as DNA replication and transcription.

1.4.2 Repeat tract purity

Diseases caused by TNRs were initially thought to be composed of pure tracts in both the stable and the expanded repeats, with the absolute length of the repeat tract being the most important predictor of further expansion and general disease severity. Repeat tracts containing interruptions within an otherwise pure tract were discovered soon after [24], and were shown to have increased stability against expansions or contractions relative to pure tracts in both yeast and human cell models [25, 26]. Since then, a growing number of trinucleotide repeat disorders have been associated with interruptions within the repeat tract (see [27] and Table 1.1 below). Although repeat tract interruptions have generally been associated with a more stable and less expanded repeat tract, there is evidence to support that stability is not always the case, or at least not the only explanation, where interruptions are concerned.

1.4.2.1 Interruptions in the spinocerebellar ataxias

Various spinocerebellar ataxias have been found to contain de novo (and in some cases maintained from the unexpanded allele) interruptions within the expanded allele. For example, the expanded SCA3 allele maintains an interruption in the 5′ end of the repeat [28], which is likely important in the protein context, as the interruption codes for a lysine residue that is essential in protein function. The CAG expanded repeat of SCA17 contains CAA interruptions, which are maintained and contribute to expansion, likely due to the fact that both trinucleotides encode glutamine [29]. Conversely, SCA8 (CAG) repeats are uninterrupted within the non-disease causing range, but become interrupted after expansion [30]. Until recently, it was also believed that SCA2, another CAG repeat disease, contained interruptions in the unexpanded but not in the expanded allele. However, the expanded allele was in fact shown to be interrupted in rare patients, sometimes in complex patterns of CAG tracts [31]. These alleles showed a strong bias for repeat lengthening at the 5′ end, with minor lengthening and shortening occurring at the 3′ end, where the interruptions were found. (CAA) interruptions have also been found in families with SCA2 wherein the pure expanded alleles as well as the pathogenic-length interrupted alleles originated from the same ancestral haplotype [32]. The results for these families suggest that interrupted alleles that are below the pathogenic threshold in SCA2 may be a reservoir for alleles prone to expansion.

1.4.2.2 Interruptions in fragile X

The “ancestral haplotype” hypothesis suggested for SCA2 has also been proposed for some fragile X patient families, wherein the CGG repeat tract is interspersed occasionally with AGG interruptions in the normal allele, with large but normal-range pure alleles hypothesized to be the likely reservoir of fragile X causing expansions [33]. Later work showed that interrupted alleles with 39 total (CGG+AGG) repeats in sperm from affected males showed almost no expansion, compared to a 3% expansion rate in alleles with 39 pure CGG repeats [34], indicating a stabilization through interruption. Non-sperm experiments have similarly indicated that the length of the uninterrupted tract is what determines the stability of the tract [35], with the loss of the AGG interruptions occurring as a late-stage event, given that intermediate-length alleles generally remain interrupted [36]. This loss of interruptions could potentially be mediated by DNA replication through the repeat [37, 38]. Analysis has also shown that the longest tract of uninterrupted repeats occurs at the 3′ end of the repeat, which contrasts with SCA2, wherein interruptions are clustered at the 3′ end. The contribution of interruptions to stability thus also appears to be locus-specific.
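The idea that the longest uninterrupted stretch, rather than the total repeat count, governs stability can be illustrated with a short calculation. The sketch below is an illustration under stated assumptions, not a method used in the cited studies: the function name `longest_pure_run` is hypothetical, the tract is assumed to be supplied in frame as a plain string of triplets, and the example allele (AGG interruptions after every ninth CGG) is invented for demonstration.

```python
def longest_pure_run(tract: str, unit: str = "CGG") -> int:
    """Return the longest run of consecutive `unit` triplets in an in-frame repeat tract.

    Assumption (for illustration): `tract` starts in frame and its length is a
    multiple of the unit length; any other triplet (e.g. AGG) counts as an interruption.
    """
    tract = tract.upper()
    step = len(unit)
    if len(tract) % step != 0:
        raise ValueError("tract length must be a multiple of the repeat unit length")
    best = run = 0
    for i in range(0, len(tract), step):
        # Extend the current pure run, or reset it at an interruption.
        run = run + 1 if tract[i:i + step] == unit else 0
        best = max(best, run)
    return best


if __name__ == "__main__":
    pure = "CGG" * 39
    # Hypothetical interrupted allele: 39 triplets in total, two AGG interruptions.
    interrupted = "CGG" * 9 + "AGG" + "CGG" * 9 + "AGG" + "CGG" * 19
    print(longest_pure_run(pure))         # total and longest pure run are both 39
    print(longest_pure_run(interrupted))  # same total length, shorter pure run
```

Both example alleles contain 39 triplets, yet the interrupted allele has a much shorter pure tract, which is the quantity the studies above associate with stability.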

1.4.2.3 Interruptions in myotonic dystrophy type 1

In addition to the spinocerebellar ataxias and fragile X, DM1 has also recently been shown to contain complicated arrays of interruptions. A family with multiple DM1 patients was found to harbour complex and variable interruptions within the expanded CTG tract, potentially accounting for unusual symptoms in the family [39]. To complicate this conclusion, however, a previous study examining interruptions in two families with myotonic dystrophy, as well as individual patients with DM1, determined that one family had simple interruptions near the 3′ end of the repeat tract, while the other had more complex interruption patterns [23], with the complex-interruption family showing large differences in interruption pattern between father and son. No significant difference in phenotype was observed between these individuals, however, nor when compared with what are considered classical DM1 symptoms. These conflicting results make it difficult to draw any conclusions regarding the significance of interruptions with respect to disease progression or transmission.

1.4.2.4 Interruptions in Friedreich’s ataxia

Friedreich’s ataxia (FRDA), an autosomal recessive disorder, is caused by the expansion of a GAA repeat in the first intron of the FRDA gene on both alleles, generally with one very large expansion on one allele and a smaller expansion on the other [40]. Age-at-onset is determined by the size of the smaller allele, with shorter repeats generally manifesting as a later age-at-onset with milder symptoms than larger ones [40]. Several studies have shown that particularly mild symptoms may be caused by multiple complex interruptions in the small allele [41]; such interruptions have been hypothesized to contribute to atypically late onset [42], or conversely to have no clear impact on age-at-onset [41]. This may depend upon the individual family being examined, or some other unknown element. As in DM1, FRDA interruptions can be quite complex, and include nucleotide compositions such as GAAGAG, GAAGGA, and GAAGAAAA. Also similarly to DM1, the role or impact of FRDA interruptions on the disease is not entirely clear.

1.4.3 DNA methylation

DNA methylation occurs mostly at CpG sites in the human genome, with alterations in normal methylation status associated with instability in tumours [43], as well as general genomic instability [44]. The loss or inhibition of expression of the maintenance DNA methyltransferase Dnmt1 in human cells as well as in mouse models has been shown to destabilize trinucleotide repeats [45, 46]. Many of the TNR disease loci are embedded within CpG islands, and so methylation at the repeat may play a role in stability in some of these diseases. The best characterized connection between repeat methylation and instability is in fragile X. Large expansions are generally relatively stable after expansion into the disease causing range, and are fully methylated, although there are high-functioning males that are mosaic for the expansion as well as for methylation [47]. Methylation was found to be coincident with a decrease in instability in an in vitro SV40 replication primate system [38]. A region upstream of the expanded (GAA) in FRDA was shown to be significantly methylated in individuals with the disease compared to controls, with an additional association with age-at-onset [48]. Conversely, no significant association was found between promoter DNA methylation and age-at-onset in SCA3 patient cells [49]. Recently, our lab examined the methylation pattern in various tissues from DM1 patients, fetuses, and transgenic DM1 mouse models [1]. Levels of methylation were highest in specific fetal tissues, with a decreased amount of methylation in tissues from adult-onset patients. Additionally, methylation was present upstream of the repeat in some adult DM1 tissues (not necessarily coincident with the highest levels of instability), but not across the repeat or downstream of the repeat. These results counter the previous idea that only congenital DM1 tissues exhibit methylation around the repeat. Interestingly, many disease-causing TNRs are flanked by one or two CTCF binding sites, which may potentially act as insulators at these loci. Methylation in the flanks would ablate the binding of CTCF protein, and has been shown to affect the stability of certain repeats [50]. Methylation status is complex across various TNR diseases, and may play a role in instability and disease depending upon the locus and several other cis-acting factors.

1.4.4 DNA replication

Repeat instability occurs in both proliferative and non-proliferative tissues in TNR disorders. For example, the repeats in the liver and skeletal muscle tissues in DM1 patients are very unstable [1], although of the two, only liver is considered a proliferative tissue. However, DNA replication likely does play some part in TNR instability, as several in vitro studies involving human or primate cells have indicated that the direction of replication affects the stability of trinucleotide repeats [51, 52], and that proliferation in general contributes to repeat instability [51]. Various other proteins involved in replication processes have also been implicated in repeat instability (FEN1, BLM, WRN) [53-55] although, with FEN1 for example, the role the protein played in instability varied greatly with the model system used [56-59]. For example, trinucleotide repeats in yeast were shown to be unstable when FEN1 was knocked out [60]; however, neither mouse models nor human cell models exhibited any change in repeat stability after FEN1 knockout [61, 62]. Given that the direction of replication through the repeat affects the stability of the repeat, it is possible that the in vivo position of DNA replication origins may be important in repeat instability. A DNA replication origin is a particular location within a stretch of DNA where the replication of DNA begins. Despite the relatively few specific origins of DNA replication that have been mapped within the human genome, several have in fact been mapped in close proximity to TNR tracts, strengthening the idea that replication through the tract plays a role in instability in at least some tissues in some TNR diseases. Two separate studies mapped a replication origin in the promoter region of the FRAXA gene [63, 64]. Candidate regions were also mapped at SCA7 and HD (both with a single origin mapped just downstream of the repeat tract), and SBMA (two origins closely flanking the repeat) [65]. Our lab also mapped two DNA replication origins at the DM1 locus that flank the CTG repeat in human and patient cells [66]. Additionally, various tissues from a DM1 transgenic mouse were used for replication origin mapping. This experiment was of interest, as these mice have only one human transgene inserted into their genome, which contains 45 kb of human DNA from the expanded DM1 locus. In mice, origin mapping of the expanded allele did not have the confounding effect of simultaneously mapping the unexpanded allele. Only one origin (downstream) was found to be active adjacent to the repeat in these mouse tissues, in contrast to the two found in human DM1 patient tissues. This direction of replication through the repeat (with the CAG strand as the lagging strand template) is the one generally associated with repeat expansion (reviewed in [67]). Various tissues and ages showed variable, asymmetric fork progression towards the repeat starting from the same origin location, indicating a potential role for replication and fork progression in TNR instability in a tissue-specific and age-dependent manner. Although likely contributing to instability in some manner, replication is not the only cis-element affecting instability.

1.4.5 Transcription

All known disease-associated TNRs are transcribed, some bidirectionally in a convergent manner, which makes transcription an attractive candidate to influence instability. Replication and transcription likely play a role in instability together, as it has been shown in vitro that the collision between replication and transcription machinery results in repeat instability [68]. In terminally differentiated cells that no longer replicate, transcription is likely a major contributor to repeat instability. Bidirectional transcription has been shown to increase the instability of CTG/CAG repeats more than twenty-fold relative to single-direction transcription [69], and also contributes to the formation of R-loops [70], an RNA-DNA hybrid structure that may additionally contribute to repeat instability [71], potentially by aberrant processing via transcription-coupled nucleotide excision repair [70]. In addition to R-loops, alternative structures in the DNA itself (see "DNA structure" below) may contribute to instability through multiple means, including a coupling with transcription that involves RNAPolII arrest and multiple attempted rounds of transcription-coupled repair (TCR) [72]. Transcription and replication machinery collision as well as multiple rounds of TCR in attempted repair of hybrids or arrested RNAPolII, in addition to the other cis-elements previously mentioned, likely all play some role in TNR instability.

1.4.6 Mismatch repair and other DNA binding proteins

In addition to cis-elements, certain trans-acting factors such as proteins that bind within or around the repeat tract may also contribute to repeat instability, particularly in tissues that have very little proliferation and therefore are not likely to have a high degree of instability attributable to DNA replication. Mismatch repair (MMR) proteins, in particular, have a strong effect upon TNR instability. Certain MMR proteins, such as MSH2 and MSH3, have in fact been shown to paradoxically be required for various TNR tract instabilities [73-75], rather than shielding the tract from further mutation as in most other cases. MSH2 binds directly to TNR DNA structures [76], which may explain at least some of the variability in instability seen in certain diseases, based upon the propensity of the nucleotides in the tract to assume non-B DNA structure. Instability is likely caused by error-prone or escaped repair of DNA structures [77], with clusters of slipped-structures the most poorly repaired [78]. (Mismatch and other repair proteins are further discussed in "DNA Structures: Processing" below.)

Besides MMR proteins, a zinc-finger protein named CCCTC binding factor (CTCF) also plays a role in instability. CTCF is well-characterized as a transcriptional insulator, although there is evidence for transcriptional activation as well [79]. CTCF acts primarily by insulating promoters from their respective enhancer elements. Interestingly, CTCF binding sites are known to flank many of the TNRs causing disease [75], with some evidence indicating that they may have a direct effect on TNR instability. A mouse model containing a human fragment of the ATXN7 locus with an expanded CAG repeat showed an increase in repeat instability when the flanking CTCF binding sites were methylated or mutated, and therefore unable to bind the protein (Chapter 2, below, and reference [50]). A depletion of CTCF binding in the 5' UTR of the frataxin gene was also shown to be coincident with both a decrease in sense transcription, which is known to contribute to the disease, and an increase in antisense transcription [80]. The non-trinucleotide megasatellite of the disease facioscapulohumeral dystrophy (FSHD) has been shown to itself act as a CTCF-dependent insulator site [81]. FSHD is caused by a deletion of this repeat region rather than an expansion. The deletion allows the repeat to act as an insulator, interfering with enhancer activity and protecting transgenes from positional effects in a CTCF-dependent manner [81]. Given that the repeats in many of the TNR diseases are flanked by CTCF binding sites [82], potential effects could be a widespread phenomenon among this class of disease.

Various other proteins have also been shown to have effects on repeat instability. DNA polymerase kappa, for example, has been shown to produce mutations in mononucleotide microsatellite sequences [83]. DNA polymerase alpha primase has also been shown to pause within microsatellite sequences, likely due to secondary structure formation, causing inserted base mutations [84]. Base excision repair proteins [85], ligase I [86], and OGG1 [87] have all been implicated in repeat instability of various trinucleotides as well.

Various cis-elements and trans-factors may work alone or in conjunction with each other to play a role in TNR disease instability and pathology, likely through an altered ability to repair structures formed through DNA replication or transcription, or both. The history of alternative DNA structures in relation to disease is discussed below.

1.5 DNA Structure

In addition to DNA repeat instability and the potential additional effects of cis-elements and trans-factors, the actual physical structure of the DNA within the expanded repeat may also contribute to disease.

1.5.1 History

Soon after the 1953 publication of the double helical structure of deoxyribonucleic acid, it became clear that this conformation was not the entire story for DNA organization. Various conformational permutations were soon suggested, starting in 1955 and 1958 with the theoretical idea of cruciforms and slipped-strand structures [88, 89]. By 1962 the structures of cruciform (4-way DNA junctions), triple-stranded, and four-stranded DNAs had been proposed to exist, with many of these structures dependent upon the primary sequence of the DNA. The first biological role for inter-strand slippage was in the reiterative synthesis of simple repeated DNAs by Escherichia coli (E. coli) DNA polymerase [90]. The process of reiterative synthesis, or replication slippage, has since been shown to be dependent upon both the sequence of the repeats as well as the DNA polymerases used [91]. More biologically relevant to humans, Streisinger and colleagues first suggested a potential role for alternative DNA structures in disease only a few years later [92]. Since the 1960s replication slippage has been a proposed mechanism of frameshift mutations at repeat tracts in prokaryotic, eukaryotic and human cancer cells. More recently, replication slippage and slipped-structures have been implicated in the genetic instability of trinucleotide repeat sequences, a mutation responsible for around 20 neurological, neuromuscular and neurodegenerative diseases [93, 94]. Although most of these disorders involve trinucleotide instability, it is now clear that other repeated sequences, such as tetrameric, pentameric, hexameric and dodecameric repeats, can also lead to human disease (reviewed in [95]). In all cases, remarkable features are associated with this class of diseases. First is a large intergenerational bias to expansions, a novel dynamic type of mutation that seems to be exclusive to humans. However, some age-dependent expansions have also been achieved in different transgenic mouse models [96, 97], most recently mimicking the "big jumps" expansions observed in DM1 patients [98]. Second, repeat size thresholds linked to disease symptoms are around 30-70 repeats in most cases. Beyond them, repeat tracts become highly unstable, including across intergenerational transmissions, leading to anticipation. Therefore, in general, the longer the expansion, the earlier the age of disease onset observed, with severe forms of the diseases reaching expansions from 100 up to thousands of repeats. No therapeutic strategy exists for stopping or slowing the expansion process, primarily because the specific destabilization mechanism(s) still remain a mystery. However, active contributions to repeat instability have been shown to occur through DNA metabolic pathways such as replication, repair, and transcription (reviewed in [95]), likely not as independent events but probably acting in concert, indicating variability in the biological or biochemical mechanisms causing disease. In addition, several factors acting either in cis or in trans influence the expansion mechanism, including repeat tract sequence [99, 100], genomic location [101, 102], repeat tract purity [26, 103], length [104], repeat orientation relative to the origin of replication [51, 66], repeat distance from the origin [51], DNA methylation [1, 45, 47] and transcription through the repeat [68-70]. Despite this disease complexity, the same underlying mutation, a gene-specific repeat expansion, is responsible for each of them. Thus, the same question remains unanswered: what common feature may contribute to unstable repeat expansions?

In the last few years, some effort towards answering this question has been directed to the ability of repeats involved in expansion diseases to form unusual DNA structures. All models proposed to explain repeat expansions involve DNA slippage at the repeats (reviewed in [95, 105]). The formation and aberrant repair of slipped-DNAs is a likely source of repeat instability and progressive disease severity in patients. Slipped-DNAs formed by out-of-register misalignment of the repeats have long been thought to be transient mutagenic intermediates forming in mitotic cells at replication forks or in non-mitotic cells at sites of DNA damage, transcription, or recombination. Characterization and understanding of the formation of these particular structures, as well as which factors might be involved in their processing, have highlighted some disease aspects. Likewise, disease therapy might be aimed at compounds that specifically recognize these structures. Given the genome-wide microsatellite instability of 1-3% in certain cancers and the presence of hundreds of thousands of simple sequence repeats in the human genome [106], with approximately 378,869 of those being TNRs [106], potential non-B conformations at repeats and their effect on disease become an important avenue of research.

1.5.2 History of DNA structures and disease association

The discovery of genetic instability of repeat sequences as the mutagenic cause of multiple diseases [94] stimulated renewed interest in slipped-strand DNA structures. An accurate characterization of slipped structures is providing insight into the mechanisms of repeat instability. For example, certain structural features of slipped-DNAs can determine their repair outcome. Slip-outs of CAG repeats are repaired by human cell extracts with greater efficiency than CTG slip-outs [77], shorter slip-outs are repaired with greater efficiency than longer slip-outs [78], and the number of slip-outs can also perturb repair, where clustered slip-outs on the same molecule are poorly repaired relative to isolated slip-outs [78] (Fig. 1.1). Components of slipped-DNAs include, in the case of CTG/CAG, the slip-outs of either CAG or CTG, as well as the variable types of junctions at which repeats are slipped-out from a duplex of complementary paired repeat strands. Understanding the mechanism of formation of slipped-strand DNAs is the first step in discovering the importance and potential role of these structures in multiple diseases.


Figure 1.1. Repair outcomes of slipped (CTG)/(CAG) DNA structures

CAG slip-outs are repaired better than CTG slip-outs, with error-prone repair occurring more often with CTG slip-outs. Smaller slip-outs of either CTG or CAG are repaired better than larger slip-outs, and clustered slip-outs are repaired poorly, retaining or acquiring multiple slip-out structures.


1.5.2.1 Formation: Slipping, sliding, creeping, shifting, bubble- and branch-migration

From the initial experiments demonstrating both the de novo and the reiterative synthesis of simple repeating nucleotides [90] it seemed evident that the two strands must be slipping with respect to each other in an out-of-register fashion. Strands that slipped in the 5' direction would result in 3' recessed ends, which are substrates for extension by DNA polymerases. Since all polymerases extend only 5' to 3', strands that have slipped in the 3' direction would not be active templates for extension. The precise details of such slippage remain unknown. It is possible, although unlikely, that all of the hydrogen bonds between the two strands are completely broken, the strands shifted, and then all hydrogen bonds reformed (Fig. 1.2A). This mechanism, which we term sliding, is unlikely as it would be a high energy-requiring process, with not just the breakdown and reformation of bonds but also the actual translational movement of the two strands. This is in contrast to branch migration of Holliday (4-way DNA) junctions, where there is juxtaposed simultaneous breakage and forming of base pairs (Fig. 1.2B).

The motion of oligo dT upon the poly dA seen by Kornberg probably occurred through a "creep" (or partial dissociation) type of motion (Fig. 1.2C) as opposed to an "exchange" or sliding type mechanism [107]. The latter two would require the complete dissociation and possible exchange of the oligo dT to another poly dA. The former would involve the partial denaturation, translocation and eventual positional shift of the oligo dT along the poly dA. There is little if any dissociation of oligo dT from the poly dA, thereby favoring the "creep" type mechanism (Fig. 1.2C).

If one assumes that there are no intra-strand interactions occurring in the slipped-out repeat strands (such as may occur in a loop-out), and that the ends of the slip-out are available for re-association with the complementary repeat strand, the slip-out can effectively branch migrate along the complementary strand. This is very much the same as D-loop migration or Holliday junction branch migration (a D-loop is a displacement loop, where two strands of DNA that are normally paired are displaced from each other by a third strand) (Fig. 1.2B compared to D). Such loop migration would entail only the energy necessary for the breaking and reforming of the inter-strand hydrogen bonds ahead of and behind the loop-out, respectively. However, if there are intra-strand interactions occurring in the slipped-out repeats (such as a hairpin or other structure) then migration of the slip-out would be a high energy demanding process, requiring that both the intra-strand base pairs as well as the base pairs at the slip-out junction be broken and reformed (see hairpin in Fig. 1.2D). Branch migration of slipped-out trinucleotide repeats along the Watson-Crick helix of repeats is impeded by the intra-strand base pairing of the hairpin [108, 109]. It is likely that the inability of the slip-out to branch migrate maintains the preferred location of slip-out extrusion. Being unhindered by intra-strand base pairings, a looped-out (CAG) repeat may actively slip (translocate) in a "creeping" or inchworm fashion [107] along the Watson-Crick duplex. Such inter-strand sliding motions would move at single repeat unit increments. This mechanism of sliding may result in some of the somatic and population polymorphisms of trinucleotide (or other) repeat tracts that arise as +/- 1-3 repeats. We do not believe that larger alterations (deletions and expansions) arise from such a process, as it is too demanding both kinetically and biophysically. We presently favor the strand dissociation/re-association (creep) model for the formation of expansion and deletion intermediates.

1.5.2.2 Formation: Mechanism

As indicated above, different DNA pathways seem to be involved in the complex repeat expansion mechanism, likely not in an independent way but playing simultaneous roles. Evidence from patients and experimental models indicates that both replicating and non-replicating cell environments can lead to repeat instability [51, 77, 110-114]. Likewise, in all proposed cases involving replication, repair, recombination and/or transcription, slipped DNAs formed by out-of-register pairings between complementary repeat strands can appear as deletion or expansion mutagenic intermediates. But how could these structures form in vivo?

At replication forks, formation of slip-outs of the repeat in the template strand (leading or lagging) may allow these sequences to be bypassed during replication synthesis, resulting in repeat contraction intermediates. On the contrary, if the slip-outs were to form in the newly replicated nascent strand, the outcome would lead to expansion intermediates. Evidence that slippage may occur in non-replicating cells has been shown [76, 77, 115], and thus slipped-DNA intermediates might also be formed under these conditions. Nicked DNA damage sites generated by several endogenous or exogenous sources may facilitate slippage at the repeats, with expansion or deletion intermediates forming when the strand undergoing the repair process is either the nicked or the continuous strand, respectively. Slipped-DNAs may also be formed during recombination processes as unequal crossovers, homologous recombination, or


Figure 1.2. Movement of slipped-out DNA structures

(A) Sliding. Slipped-out DNA structures slide along the DNA by breaking the hydrogen bonds between both strands on both sides of the structure; the two strands are then shifted relative to one another, and all hydrogen bonds are reformed when complementary bases are again aligned. This mechanism is unlikely as it would be a high energy-requiring process, with not just the breakdown and reformation of bonds but also the actual translational movement of the two strands. (B) Holliday (4-way DNA) junction branch migration. This involves the simultaneous breakage and formation of base pairs along both lengths of DNA in the direction the junction is migrating. (C) Creep. Structures move along the DNA by repeated rounds of partial denaturation, resulting in an eventual structural shift. There is very little dissociation, which energetically favours this mechanism. (D) Bubble migration (sliding). CAG slip-outs are loops that have little bonding between the looped-out bases. Migration of these "bubbles" along the DNA requires only the breakdown of bonds ahead of the structure in the direction in which it is migrating. Hairpins of CTG repeats require the more energy-consuming step of also breaking down the hydrogen bonds within the structure.


recombination-mediated repair of double-strand breaks [116, 117]; or during transcription processes, since all disease-associated trinucleotide repeat sequences are actively transcribed, facilitated by transcription-replication fork collisions [68-71].
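
The strand dependence of replication slippage described in this section can be expressed as simple arithmetic. The following is a minimal illustrative sketch only, not a model used in this thesis: the function name `replicated_tract_length` and its parameters are hypothetical, and it assumes that a template-strand slip-out is bypassed in full and that a nascent-strand slip-out is retained in full, with no repair.

```python
def replicated_tract_length(template_repeats: int,
                            slipout_repeats: int,
                            slipped_strand: str) -> int:
    """Return the repeat number of the newly synthesized strand.

    Hypothetical helper for illustration. Assumes no repair of the slip-out.
    slipped_strand is "template" or "nascent".
    """
    if template_repeats < 0 or slipout_repeats < 0:
        raise ValueError("repeat counts must be non-negative")
    if slipped_strand == "template":
        # Repeats slipped out of the template are bypassed during synthesis,
        # giving a contraction intermediate.
        if slipout_repeats > template_repeats:
            raise ValueError("slip-out cannot exceed the template tract")
        return template_repeats - slipout_repeats
    if slipped_strand == "nascent":
        # Repeats slipped out of the nascent strand are copied again,
        # giving an expansion intermediate.
        return template_repeats + slipout_repeats
    raise ValueError("slipped_strand must be 'template' or 'nascent'")


if __name__ == "__main__":
    print(replicated_tract_length(50, 5, "template"))
    print(replicated_tract_length(50, 5, "nascent"))
```

Under these assumptions, a five-repeat slip-out on a 50-repeat tract yields a shorter product when it forms in the template strand and a longer product when it forms in the nascent strand.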

To elucidate the mechanism(s) of repeat instability and to understand the interaction of proteins with them, it is imperative to understand the structural details of the putative DNA mutagenic intermediates. Just as an appreciation of the biophysical features of four-way DNA junctions has been informative to understanding recombination [118], the characterization of slipped-structures should provide insight into the mechanisms of repeat instability. Since genetic instability is tightly associated with repeat tract length [119], a full understanding will be obtained through analysis of disease-relevant tract lengths.

1.5.3 Structures

Some of the earliest reported structures thought to be slipped-strand DNAs [120-124] occurred within direct, but not tandem, repeating units, thereby having limited ability to assume numerous conformational isomers. Furthermore, in each case the structural alterations were located at homopurine•homopyrimidine tracts, suggesting that these were triple-stranded DNAs rather than slipped structures. Synthetic oligonucleotides composed of direct, but not tandem, repeats (which by sequence design necessarily form intra-strand DNA heteroduplex structures) have been used as models of slipped structures [125]. However, each of these model DNAs is incapable of inter-converting between slipped isomers differing by increments of individual repeat units.

1.5.3.1 Structures: Evidence of DNA structure in trinucleotide repeats

Biophysical characterization of trinucleotide repeat slipped-strand DNA had to wait until over a decade later [76, 104, 126]. Previous data confirmed that short repeat tracts (CTG)10, (CAG)10, (CGG)10 and (CCG)10 associated with dynamic mutations could form intra-strand hairpins [127] that may facilitate and stabilize the formation of slipped-structures. Two different kinds of slipped-DNA structures were well characterized at this time: homoduplex slipped DNAs (S-DNAs) and heteroduplex slipped intermediate DNAs (SI-DNAs) of (CNG)x•(CNG)y, where x = y or x ≠ y, respectively (see Fig. 4.1, as well as Fig. 1.3). Both structures are composed of complementary strands containing trinucleotide repeats, where the structural anomaly occurs within the repeat tract. The propensity of slipped structure formation correlates with both repeat tract length (n ranged from 17 up to 255 repeats) and tract purity, both known determinants of genetic instability, making these structures suitable models for mutagenic intermediates of disease-associated repeat instability [104, 126]. Biophysically, S-DNAs and SI-DNAs formed by (CNG/CNG) trinucleotide repeats are remarkably stable under physiological conditions, and unlike other alternative DNA structures (cruciforms, Z-DNAs or triplexes), their extrusion and sustainment do not require superhelical tension (Fig. 1.3B). This suggests that slipped-DNAs may be formed easily in vivo, either at replication forks or at nicked or gapped DNAs. Using zinc-finger nucleases, a recent study showed that slipped-DNA structures formed at an exogenously added repeat in human cells in a replication-dependent manner [128]. That study, however, could not comment on the structure or organization of the slipped-DNAs, or whether slipped structures could also form at an endogenous TNR disease locus. Locus specificity and the overall structural features of slipped-DNA may critically determine whether and how they might be recognized and processed by DNA metabolizing proteins. These features include sequence- and length-specific slip-outs, forming hairpin or loop structures, the presence of interruptions in the repeat tract, and the formation of slipped junctions. Interruptions within the trinucleotide repeat tract (Table 1.1) dramatically reduce the propensity to form slipped DNAs [104], which suggests that they may provide genetic stability by ablating the formation of mutagenic intermediates [39]. The pattern and amount of S-DNA formed by a given length of trinucleotide repeats can vary depending upon the sequence of the non-repeating DNA flanking the repeat. Conversely, the DNA sequences of the tracts flanking the trinucleotide repeat do not seem to affect the formation or structure of SI-DNAs [126, 129], lending strength to the idea that multiple TNR disease loci may form variable heteroduplexed slip-out structures regardless of flank. Sequence context (flanking sequence) is known to affect the biophysical structure of base-base mismatches, with differences correlating to differential abilities to be repaired [130-132]. Furthermore, the sequences flanking cruciforms [133] and triplex-forming tracts [134] are known to affect the extrusion of these structures. It has been shown that the repeats of the SCA10 locus (ATTCT·AGAAT) form ill-defined unpaired structures that spread out into the flanking DNA when within the genomic context [135]. It has additionally been found that a Z-DNA forming sequence 5' of the DM2 (CCTG·CAGG) repeat decreases the propensity for slipped-structure formation within the DM2 repeat itself [136]. The manner through which flanking sequence may


Figure 1.3. Unusual DNA structures formed by trinucleotide repeats.

TNRs can form various alternative DNA structures dependent upon the repeat tract composition. (A) C(N)G repeats can form hairpins (or slip-outs) with paired or unpaired bases within the hairpin. G-rich TNRs, such as the (CGG) of the Fragile X repeat, can form quadruplex structures. Homopurine-homopyrimidine repeats such as (GAA•TTC) can form triplex, or triple-stranded DNA, which are formed due to negative supercoiling. All of these structures have been shown to form in vitro. (B) Slipped-out structures formed by (CNG•CNG) trinucleotide repeats are extremely biophysically stable and, unlike most other alternative DNA structures, do not require superhelical tension to remain in the slipped-DNA structure format. (C) The junctions of slip-outs may contain either zero, one, or two unpaired bases across from the slip-out. This depends upon whether a C, T/A, or G is the first base of the slip-out on the 5' end.


TABLE 1.1. Trinucleotide repeat diseases containing interruptions

Various trinucleotide repeat disease loci have been reported to contain anything from simple to complex interruption patterns. The stability of the tract, in some cases, has been shown to be affected by the presence of interruptions.

Disorder/gene (disease) | Repeat unit | Repeats: normal | Repeats: premutation | Repeats: full/disease | Interruption
DM1/DMPK (myotonic dystrophy) | (CTG)/(CAG) | 5-37 | 50-80 | 80 to >6000 | (CCGCTG)n, complex variable
FRDA/FXN (Friedreich's ataxia) | (GAA)/(TTC) | 6–32 | 33-60 | 200-1700 | (GAG,GGA,GAA), CCG, TTG, CTA, GAAGAG, GAAGGA, GAAGAAAA
FRAXA/FMR1 (fragile X type A) | (CGG)/(GCC) | 6–52 | 59-230 | 230-2000 | AGG
DM2/ZNF9 (myotonic dystrophy 2) | (CCTG)/(CAGG) | 104–176 | ND | 75–11,000 | (GCTG,TCTG)
SCA1/ATX1 (spinocerebellar ataxia 1) | (CAG)/(CTG) | 6-39 | - | 40-82 | CAT
SCA2/ATX2 (spinocerebellar ataxia 2) | (CAG)/(CTG) | 13-33 | - | 32-200 | CAA
SCA3/ATX3 (spinocerebellar ataxia 3) | (CAG)/(CTG) | 13-44 | ND | 55-84 | CGG, GCC, CAG, CAA
SCA8/ATXN8 (spinocerebellar ataxia 8) | (CTG)/(CAG) | 16-92 | ND | 100-127 | CCA, GTC, CTT
SCA10/SCA10 (spinocerebellar ataxia 10) | (AATCT)/(AGATT) | 10-22 | ND | 280-4500 | ATGCT, ATTCTAT, variable sequences
FSHD/FSHDIA (facioscapulohumeral muscular dystrophy-1A) | 3.3 kb D4Z4 repeat | 11-150 | ND | <11 | Sequence variations

affect S-DNA, but not SI-DNA formation, is not clear, but such differences may correlate with the genetic instability displayed by (CAG)/(CTG) repeats at different disease loci. Additionally, the structural features of slipped DNAs, including conformations of the slip-outs as well as the slipped junctions, may permit aberrant processing of these mutagenic intermediates in a manner that leads to repeat length mutations. Understanding the structural aspects of slipped-DNAs, including the slip-out, the hairpin tip, the junction, and any unpaired bases across from the hairpins, may reveal what determines repair outcome. Analysis of disease-length (CTG)n and (CAG)n SI-DNAs showed that they are able to form intrastrand hairpins with G·C base pairs and T·T or A·A mismatches [126]. These hairpins, both biophysically stable, lead to different structures. The CTG slip-outs predominantly form intrastrand hairpins due to T·T mismatches stacked in the helix, and are stabilized by hydrogen bond formation. Alternatively, the CAG slip-outs predominantly form unpaired single-strand loops, since A·A mismatches are poorly stacked in the helix and do not form base pairs. Hairpin tips are an integral feature of slip-outs as well. Two hairpin tip conformations are possible, depending upon whether the number of repeats contained within the slip-out is even or odd (an odd number will have an unpaired T or A at the tip). Moreover, two general forms of slipped junctions can form between the complementary (CTG)n•(CAG)n repeats: those with slipped-out CAG repeats and those with slipped-out CTG repeats. One arm of the slipped junction is composed of the slipped-out strand, while two arms of the junction are composed of complementary paired repeat strands. For each sister SI-DNA there are three possible junction base-pair binding conformations (Fig. 1.3C), each defined by the first 5' base of the slipped-out hairpin being either G, T/A or C. The junctions formed by these would contain two, one or no unpaired bases, respectively, on the shorter non-slipped-out strand. The zero- and two-unpaired-base junctions may inter-convert with each other [126] at both CTG and CAG slip-outs, indicating that certain junction conformations may be more stable than others. The presence of unpaired bases at heterologous 3-way DNA junctions can increase the biophysical stability of the junctions [118], and along with the hairpin tip structure may serve to modulate recognition of the slip-stranded DNA by DNA metabolizing proteins (see "Processing" below). An appreciation of the biophysical features of the Holliday four-way DNA junction has proved to be informative to understanding their metabolism and mechanisms of recombination; in the same manner, this knowledge of slipped-DNA structures may aid our understanding of their repair outcome.
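
The hairpin tip and junction rules stated in this section can be written as a small lookup. This is a hedged sketch for illustration only: the function `describe_slipout`, the dictionary `UNPAIRED_BY_FIRST_BASE`, and the returned field names are hypothetical, and the code encodes only the two rules given in the text (odd repeat number leaves an unpaired T or A at the tip; a first 5' base of G, T/A or C gives two, one or no unpaired junction bases).

```python
from typing import Optional, TypedDict

# Unpaired bases on the shorter, non-slipped-out strand, keyed by the first
# 5' base of the slipped-out hairpin (G -> two, T/A -> one, C -> none).
UNPAIRED_BY_FIRST_BASE = {"G": 2, "T": 1, "A": 1, "C": 0}


class SlipoutFeatures(TypedDict):
    unpaired_tip_base: Optional[str]
    unpaired_junction_bases: int


def describe_slipout(repeat_unit: str, n_repeats: int,
                     first_base: str) -> SlipoutFeatures:
    """Summarize tip and junction features of a CTG or CAG slip-out.

    Hypothetical helper: applies only the parity and first-base rules
    described in the text, with no energetics.
    """
    if repeat_unit not in ("CTG", "CAG"):
        raise ValueError("repeat_unit must be 'CTG' or 'CAG'")
    if n_repeats < 1:
        raise ValueError("n_repeats must be at least 1")
    if first_base not in repeat_unit:
        raise ValueError("first_base must be a base of the repeat unit")
    # An odd number of repeats leaves the central T (CTG) or A (CAG)
    # unpaired at the hairpin tip; an even number leaves none.
    tip = repeat_unit[1] if n_repeats % 2 == 1 else None
    return {
        "unpaired_tip_base": tip,
        "unpaired_junction_bases": UNPAIRED_BY_FIRST_BASE[first_base],
    }


if __name__ == "__main__":
    print(describe_slipout("CTG", 3, "G"))
    print(describe_slipout("CAG", 4, "C"))
```

The lookup makes explicit that the tip conformation depends only on the parity of the slip-out, whereas the junction conformation depends only on the register of the first slipped-out base.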

The fact that different junctions have variable biophysical stability indicates that there is likely no ready interconversion between the various structural isomers of S-DNA, nor between those and the linear form. This differs greatly from the case of a bulged base in a repeat tract: Woodson and Crothers [137-140] analyzed a series of oligonucleotides having internal (G)n•(C)n tracts, specifically creating duplex hybrids where one strand contained four G residues and the other contained three C residues. The extra G residue, which could be positioned at any of multiple points along its complementary C tract, appeared to branch migrate throughout the whole of the tract. Similar results were found for an extra C residue in a (G)n•(C)n tract [141]. Thus, in the case of an extra (bulged) base in a mononucleotide repeat, there is rapid breaking and forming of hydrogen bonds to permit migration of the unpaired residue along its complement tract. In this situation there is only a single base, and hence no possibility of impeding intra-strand interactions. A single bulged CAG, CTG, CGG, or CCG repeat may not be sufficiently long to have strong intrastrand (C-G) interactions to impede its migration. These potential structural organizations are distinct for either a CTG or a CAG slip-out. In addition to junction structure, other features such as correlation of the size of slip-outs in SI-DNAs and S-DNAs with the size of tract length, as well as changes in proliferating and non-proliferating tissues, suggest that both are mutagenic intermediates in proliferative and non-proliferative processes of instability [126].

1.5.4 Processing

As with 4-way DNA junctions [118, 142], the sequence and structural features of slipped-DNA, including the slip-out and the slip-out junction, may critically determine whether they are recognized and processed by DNA repair proteins. The determination of repair processing of these mutagenic intermediates might aid our understanding of the important features involved in the instability of disease-causing repeats. In yeast, where the repeats tend to delete, it has been shown that heteroduplex DNAs with intra-strand hairpins composed of (CTG)10 or (CAG)10 escape repair [143]. The repeat structures analyzed were true heteroduplexes, in that they contained only CTG or CAG repeats in the complete absence of CAG or CTG repeats in the complementary strand. Thus, while the slip-out contained repeats, the other branches of the three-way junctions were composed of fully base-paired non-repeating sequences. Interestingly, another yeast study, using only a CTG loop-out on one strand, showed correct repair of the slip-out [144], while another [115] determined that a CAG-only slip-out was bound by mismatch repair proteins; however, without a true trinucleotide repeat junction, with fully paired repeats in the branches, these substrates may also miss important potential recognition and repair outcomes. Instability intermediates in human patients involving the gain or loss of some repeat units would be expected to contain three-way junctions composed of complementary repeats. In vitro evidence using cloned human DM1 repeats has also shown that MSH2 binds both S- and SI-DNA, with preferential binding occurring at the (CAG) loop-out in a repeat-length dependent fashion [76]. Data from various model organisms can be confusing at best or contradictory at worst, and therefore it is of interest to know whether, in human cells, where the repeats tend to expand rather than contract as they do in yeast, slipped-strand trinucleotide repeat structures are metabolized and, if so, what role, if any, the slip-out and (CTG)/(CAG) three-way junctions may play.

It has been found that human cell lines deficient in XPG (an endonuclease involved in nucleotide excision repair (NER)) can still remove (CTG)25 or (CAG)25 hairpins, although not as efficiently as HeLa cells in every case, and in a manner that involves differential nicking of the slipped-out strand [145]. Addition of purified XPG back into the system stimulated hairpin repair, indicating that there are likely multiple pathways used by human cells for repair of hairpin structures. It has also been shown that a single guanine in a CAG hairpin structure is hypersusceptible to DNA damage (7,8-dihydro-8-oxoguanine; 8-oxoG) relative to fully duplexed (CTG)/(CAG) DNA [146]. This DNA damage was previously implicated, along with attempted repair by the base-excision repair enzyme OGG1, in repeat expansion [87]. Not only is this damaged base repaired 700-fold more slowly than the same damage in fully duplexed DNA [146], but it also likely persists in the slipped-DNA intermediates, eventually becoming incorporated into the duplex DNA, restarting the repair attempt, worsening a toxic oxidation cycle, and likely contributing to repeat expansion by strand displacement, nicking, and polymerization [147].

Using extensively characterized slipped (CTG)/(CAG) substrates that model intermediates of expansion and contraction events with mutagenic length changes, it has been shown that these structures can be processed by several alternative repair paths in various human cell extracts, depending upon the structures present within the molecule and the repair proteins present within the extracts [77, 78]. The slip-out was either correctly repaired, escaped repair altogether, or underwent error-prone repair, wherein the excess repeats in the slip-out are incompletely excised during repair. Each of these repair paths depended upon the location of a nick in the DNA relative to the slip-out as well as the composition of the slip-out (CTG or CAG), but did not depend upon the presence or absence of various MMR or NER repair proteins, indicating the involvement of various repair pathways. Some of the substrates used are termed "expansion" substrates, meaning that they are prone to expansion rather than contraction during in vitro repair assays. Interestingly, it was only these expansion substrates that underwent error-prone repair, which presented a novel path to generate expansions, but not deletions. The same results were observed whether extracts were used from proliferating or non-proliferating cells, indicating a potentially ubiquitous repair pathway not dependent upon proliferation status of the cells. It has also been found that short, clustered slip-outs on either strand are repaired less efficiently than single, isolated slip-outs, likely due to each slip-out interfering with the repair of adjacent slip-outs [78]. Inefficient repair of slip-outs leads to expansions, and so it is possible that slipped-DNAs found at patient loci may contain clustered slip-outs on either strand.

1.5.5 Compounds interacting with slipped-DNA

Identifying compounds that interact with the various types of alternatively-structured DNA can potentially lead to the discovery of a therapeutic course of action for many of the repeat diseases that form slipped-out structures within the repeating DNA tract. Therapeutic strategies that are targeted at modifying the repeat dynamics at the DNA level can potentially have a broad general utility in repeat expansion disorders, and would attack the root cause, the repeat expansion, of each of these diseases. Very few compounds to date have been reported to show this feature, and even those reported have not yet been studied extensively in in vitro or in vivo assays to assess their ability to interfere with repeat expansions. Two groups of compounds have been studied more extensively than others. Spirocycline molecules have been shown to recognize DNA and RNA loops, although, as with some chemotherapeutic compounds, expansions rather than contractions are sometimes observed [148-151]. Likewise, naphthyridine derivatives have been shown to bind specifically to trinucleotide repeats, detecting the AA, CC, and GG mismatches formed in hairpin structures by recognizing the hydrogen bonding motif of the bases [152]. These derivatives work to stabilize various nucleotide composition hairpins by interfering with DNA replication [153, 154]. A few other compounds have also been found to interact with repeating DNA: actinomycin-D, an anti-cancer drug that interacts with G-quadruplex structures which may be forming at G-rich repeats such as the CGG repeats of fragile X [155]; aluminum, which interacts with CCG repeats and induces structural change [156]; and bis-acridine [157]. The design and improvement of compound screens [149] for ligands that specifically bind to disease repeats is another promising approach. In addition to compounds, nucleic acids themselves have been studied for their abilities to interact with hairpins. Interestingly, a CTG unstructured nucleic acid was found to disrupt CAG hairpins by invasion into the hairpin structure [158]. Similarly, a repeat-complementary peptide nucleic acid has been shown to bind within the androgen receptor [159], altering nucleosome assembly. Although promising, more work must be done to determine how effective compounds, unstructured nucleic acids, and peptide nucleic acids may be as hairpin interactors and TNR expansion prevention agents.

1.6 Non-trinucleotide repeat mutations and other unusual mutations in TNR diseases

Although the expansion of gene-specific TNRs is the root cause of the resultant disease, the repeat expansion is not the only factor within or around the repeat that contributes to phenotype. Repeat interruptions and de novo mutations can affect not only the TNR disease itself, but can also contribute to the phenotype of other closely related diseases. In the twenty years since the first TNR disease was described, the true complexity of these disorders has slowly begun to be revealed.

1.6.1 Mutations within or in proximity to the repeat

The instability of disease-causing trinucleotide repeats and the presence of interruptions are not always the only genetic changes occurring within or around the repeat. There is an ever-growing number of TNR diseases in which unique, rare mutations have been found in the adjacent sequences. Some have been associated with an alteration in the particular disease phenotype, while others are more cryptic in their potential effects. The recently discovered spinocerebellar ataxia type 31 (SCA31) has been associated with inserted, complex pentanucleotide repeats between 2.5 and 3.8 kb in length [7], with a string of 0-4 (TAGAA) repeats located at the 5′ end of this insertion [160]. It was found that the greater the number of inserted (TAGAA) repeats at the 5′ end, the smaller the overall insertion tended to be, which correlated with age-at-onset of individuals with SCA31. Additionally, the insertion in patients had long stretches of (TGGAA) repeats, whereas control individuals had neither these nor the (TAGAA) pentanucleotides. Controls had insertion lengths between 1.0 kb and 3.5 kb, and contained only (TAAAA) pentanucleotides. In the complicated case of SCA31, the length of the insertion is not the simple cause of the disease, with pathogenicity likely attributable to the presence of both (TGGAA) and (TAGAA) pentanucleotides within the expanded insertion.

Several studies have also revealed various rare mutations at the myotonic dystrophy type 1 locus. A case study reported a single individual with cardiomyopathy and no diagnosis of DM1, yet this patient harboured an expansion of the (CTG) repeat at the DM1 locus to over 50 repeats in a small fraction of peripheral blood mononuclear cells, with a larger fraction of the cardiac muscle also showing an expansion [161]. No other trinucleotide repeat loci showed instability, indicating that the instability reported was locus-specific. Similarly, in another individual case study, an expansion of the (CTG) repeat was seen only in muscle tissue [162]. Yet another study found a novel repeating hexamer (CCGCTG) in sperm of a single DM1 individual, again with any effect on phenotype remaining unclear [102]. Chapter 3 of this thesis describes another report of a mutation involving the CTG repeat at the DM1 locus.

Huntington's disease also has several reported instances of mutations other than the uniform expansion of the trinucleotide repeat. Monozygotic twins were shown to have (CAG) repeat mosaicism of 37 or 47 repeats in various tissues (blood, skin, and hair), with the symptomatic twin having only the 47 repeats in skin and hair, and the other twin having a mosaic pattern in both tissues [163]. Another family showed variable lengths of two (CCG) repeat stretches immediately downstream of the (CAG) repeat that causes HD, and these correlated with age-at-onset in the family [164].

Several studies revealing unusual mutations coincident with the fragile X expansion (or in place of the expansion) have proved to be more easily associated with clinical phenotypes of the disease than mutations in other TNR diseases. A review of the variable mutations that can lead to disease at the fragile X locus has been recently published [165], and only a few are mentioned here. Deletion of the entire CGG repeat as well as flanking sequence, in addition to a microdeletion on the second allele, has been reported in a single female from a fragile X family [166]. This female was, interestingly, clinically normal. Another separate case of a deletion just proximal to the repeat, however, caused the clinical phenotype of fragile X in an entire family [167], likely due to the loss of a promoter sequence. To complicate matters, it appeared in this case that the progenitor allele likely started with an expanded repeat, and at some point acquired a deletion mutation at the 5′ end that resulted in a decrease of the expansion to 45 CGG repeats, with no interspersed AGG interruptions. Several other studies looked at mosaicism of the CGG repeat, with affected individuals harbouring both an expanded allele in some tissues as well as an allele with deletions involving some of the repeat as well as some flanking sequence [168, 169]. At the opposite end of the spectrum from deletions, a small duplication on the X chromosome encompassing the FRAXA gene in another family resulted in a very similar phenotype to fragile X [170]. Point mutations in the FRAXA gene have also been reported to cause fragile X syndrome, due to the loss of transcription [171]. Unlike other TNR diseases, it appears that many different types of mutations that disrupt transcription of the FRAXA gene can cause the fragile X phenotype. Various FRDA patients have also been reported to have different types of mutations. Given that FRDA is autosomal recessive, most patients are homozygous for a (GAA) expansion. The other 2% have an expansion on one allele and a different type of mutation on the other, ranging from novel insertion-deletion mutations [172] to point mutations that result in missense, frameshift, or nonsense mutations [173, 174]. A summary of the above mutations is depicted in Figure 1.4.

Given the multiple mutations that have been shown to occur at various TNR loci, and in some cases contribute to phenotype or even cause disease, it is clear that many of these repeat disorders should not be defined simply as expansion mutation diseases, but as a combination of a TNR mutation and various other interacting elements.

1.7 Thesis Goals

The dramatic repeat length variation evident at different trinucleotide repeat disease loci is not easily explained by the repeat expansion alone, raising the possibility that cis-elements and trans-acting factors may be contributors to instability. Although the existence of cis-elements that alter instability has been widely accepted, the identities and in vivo effects of these elements are largely unknown. Chapter 2 focuses on providing the first in vivo evidence of a cis-element, the CTCF binding site adjacent to the (CAG)92 tract at an integrated human SCA7 locus, that modulates CAG length instability in a mouse model of SCA7.

Although TNR diseases are caused by a single mutated gene in the human genome, the mutation is generally not uniform throughout the tissues of individual patients or across generations, with additional rare mutations within or around the repeat having been reported. The goal of Chapter 3 is to further characterize rare mutations in DM1 patient tissues; it reports two novel mutations that occurred at the CTG repeat at the DM1 locus in a single patient. In a subset of cells, the CTG tract has been completely replaced by two short yet distinct mutations in two separate tissues. Although it is difficult to assess whether these rare mutations have any effect on instability or pathology, they do help us to gain further knowledge about the mutability of the TNR loci.

Another element that has long been assumed to contribute to instability in various TNR diseases is the formation of slipped-out DNA structures along the expanded trinucleotide repeat allele. Although their formation along repeat tracts has been shown in vitro as well as at an artificial exogenous locus inserted into a HeLa cell line, there is no evidence to show that they form in patient tissues. The goal of Chapter 4 is to show that slipped-DNAs do in fact form in myotonic dystrophy patient tissues in a manner coincident with instability, and further to structurally characterize these slip-outs. This chapter gives evidence, after more than 20 years of speculation, that slipped-DNA structures form at an expanded TNR disease-causing tract.

Figure 1.4. Mutations within or in proximity to various expanded repeats

A schematic of the various repeat diseases mentioned in the preceding section that contain various mutations within or in close proximity to the repeat.

CHAPTER 2 – CTCF cis-regulates trinucleotide repeat instability in an epigenetic manner: a novel basis for mutational “hot spot” determination

Co-authorship statement: The La Spada lab established the mouse lines, confirmed CTCF binding and performed methylation analysis, which is not touched upon in this chapter, but is within the published work (PLoS Genet. 2008 Nov;4(11):e1000257). The Pearson lab provided all measures of instability. I carried out the small-pool PCR analysis of tissue-specific and germline instability. I made Figures 2.2 and 2.5 and Table 2.1. Katharine Hagerman carried out experiments for data in Figures 2.1, 2.3, 2.4, and 2.6.

2 CTCF cis-regulates trinucleotide repeat instability in an epigenetic manner: a novel basis for mutational hot spot determination

2.1 Abstract

Dramatic variation in repeat instability occurs at different disease loci and between different tissues; however, cis-elements and trans-acting factors regulating the instability process remain largely undefined. Genomic fragments from the human spinocerebellar ataxia type 7 (SCA7) locus, containing a highly unstable (CAG) tract, were previously introduced into mice to localize cis-acting “instability elements,” and revealed that genomic context is required for repeat instability. The critical instability-inducing region contained, among others, binding sites for CTCF, a regulatory factor implicated in genomic imprinting, chromatin remodeling, and DNA conformation change. To evaluate the role of CTCF in repeat instability, our colleague Dr. Al La Spada derived transgenic mice carrying SCA7 genomic fragments with CTCF binding-site mutations. I found that CTCF binding-site mutation promotes trinucleotide repeat instability both in the germ line and in somatic tissues. As CTCF binding sites are associated with a number of highly unstable repeat loci, these findings suggest a novel basis for demarcation and regulation of mutational hot spots and implicate CTCF in the modulation of genetic repeat instability.

2.2 Introduction

In spinocerebellar ataxia type 7 (SCA7), disease onset in children who inherit the expanded repeat averages 20 years earlier than in the affected parent [175]. The profound anticipation in SCA7 stems from a significant tendency of the repeat to undergo large expansions upon parent-to-child transmission [176]. Other similarly-sized, disease-linked (CAG)/(CTG) repeat tracts do not exhibit strong anticipation, and are much more stable upon intergenerational transmission, as occurs at the SBMA disease locus [101]. Drastic differences in the stability of (CAG)/(CTG) repeats, depending upon the locus at which they reside, strongly support the existence of cis-acting DNA elements that modulate repeat instability at certain loci. Furthermore, dramatic variation in (CAG) tract instability in tissues from an individual patient, together with disparities in the timing, pattern, and tissue-selectivity of somatic instability between CAG/CTG disorders, indicates a role for epigenetic modification in DNA instability [77, 177-180]. While the existence of cis-elements regulating disease-associated instability is widely accepted, the identities of cis-elements that define the mutability of any repeat are still largely unknown. Proposed cis-elements that regulate repeat instability include: the sequence of the repeat tract, the length and purity of the repeat tract, flanking DNA sequences, surrounding epigenetic environment, replication origin determinants, trans-factor binding sites, and transcriptional activity [94, 105, 181]. Such cis-elements may enhance or protect against (CAG) tract instability. To identify cis-elements responsible for (CAG) expansion at the SCA7 locus, our collaborator Dr. La Spada previously introduced SCA7 (CAG)92 repeat expansions into mice, either on 13.5 kb ataxin-7 genomic fragments or on ataxin-7 cDNAs. Comparison of (CAG) repeat length change revealed that ataxin-7 genomic context drives repeat instability with an obvious bias toward expansion, while SCA7 (CAG) repeats introduced on ataxin-7 cDNAs were stable [12]. To localize the cis-acting elements responsible for this instability tendency, Dr. La Spada derived lines of transgenic mice based upon the original 13.5 kb ataxin-7 genomic fragment, deleting a large region (8.3 kb) of human sequence beyond the 3′ end of the (CAG) tract (α-SCA7-92R construct). As deletion of the 3′ region in the α-SCA7-92R transgenic mice significantly stabilized the CAG-92 tract [12], we hypothesized that cis-elements within this 3′ region modify repeat instability at the SCA7 locus. To identify cis-acting instability elements at the SCA7 locus and the trans-acting proteins that regulate them, we evaluated the critical genomic region 3′ to the (CAG) repeat for sequences that might regulate genetic instability. In the case of SCA7 and a number of other highly unstable CAG/CTG repeat loci, including HD, DM1, SCA2, and dentatorubral-pallidoluysian atrophy, binding sites for a protein known as CTCF have been found [82].
CTCF is an evolutionarily conserved zinc-finger DNA binding protein with activity in chromatin insulation, transcriptional regulation, and genomic imprinting [182, 183]. As CTCF affects higher order chromatin structure [184, 185], we wondered if CTCF binding at the SCA7 locus might regulate (CAG) repeat instability. To test this hypothesis, our collaborator derived SCA7 genomic fragment transgenic mice with CTCF binding site mutations, and analyzed the instability at the (CAG/CTG) repeat.

2.3 Methods

2.3.1 Generation of SCA7-CTCF-I-mut transgenic mice

To derive the SCA7-CTCF-I-mut transgenic construct, our collaborators synthesized a PCR primer with randomly mutated nucleotides introduced at the CTCF-I contact sites for recombineering into the RL-SCA7-92R (SCA7-CTCF-I-wt) construct [12], and then confirmed loss of CTCF binding by the mutated fragment by electrophoretic mobility shift assay (data found in the published study [50]). Using a standard recombineering approach [186], a SCA7-CTCF-I targeting cassette containing a chloramphenicol resistance gene and ClaI restriction site flanked by SCA7-CTCF-I region sequences was PCR generated with the following primer set: hSCA7-wt-CAM-F, 5′-tcccccctgcccccctcctgtatcgatgtttaagggcaccaataactgc-3′ and hSCA7-mut-CAM-R, 5′-catctctgcccctcgatttttatcgatatcgataatgatgagcacttttcgaccg-3′. After recombineering the SCA7-CTCF-I-mut targeting cassette into the SCA7-CTCF genomic fragment carried on a plasmid, selection, and PCR screening, the chloramphenicol gene was deleted by ClaI digestion and ligation. The sequence of the SCA7-CTCF-I-mut construct was verified prior to linearization with SalI-SpeI digestion, gel purification, and microinjection into C57BL/6J × C3H/HeJ oocytes. Transgene-positive founders were backcrossed onto the C57BL/6J background for more than 12 generations to yield incipient congenic mice before repeat instability analysis commenced. All experiments and animal care were performed in accordance with the University of Washington IACUC guidelines.
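
Because removal of the chloramphenicol gene relies on ClaI digestion, it can be useful to confirm that the ClaI recognition sequence (ATCGAT) is present in the cassette primers. The short Python sketch below scans the two primer sequences given above for this site. It is only a string search on the primer text and makes no claim about the full cassette or construct sequence; the helper name `find_sites` is hypothetical.

```python
# ClaI recognition sequence, written in lower case to match the primers.
# The site is palindromic, so scanning one strand is sufficient.
CLAI_SITE = "atcgat"

# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "hSCA7-wt-CAM-F": "tcccccctgcccccctcctgtatcgatgtttaagggcaccaataactgc",
    "hSCA7-mut-CAM-R": "catctctgcccctcgatttttatcgatatcgataatgatgagcacttttcgaccg",
}


def find_sites(seq, site=CLAI_SITE):
    """Return the 0-based start positions of every occurrence of `site` in `seq`."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        # Advance by one base so that overlapping occurrences are not missed.
        start = seq.find(site, start + 1)
    return positions


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, "ClaI site start positions:", find_sites(seq))
```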

2.3.2 Repeat Instability Analysis

I PCR amplified the SCA7 CAG repeat from genomic DNA samples in the presence of 0.1 mCi 32P-ATP using primers SCA7BR (5′-GGAGGCCTCAACCCACAGATTC-3′) and SCA7A (5′-GCGACTCTTTCCCCCTTTTTTTTG-3′) with the following conditions: 95°C for 3 min; 30 cycles of 95°C for 30 sec, 62°C for 30 sec, and 72°C for 2 min; and a final 72°C for 5 min. I then resolved the radiolabeled PCR products on 1.8% agarose gels [12]. For small-pool PCR, genomic DNA was diluted to yield 1–5 genome equivalents prior to amplification and sizing [176]. In all experiments, at least three mice per genotype, or three samples per time point, were analyzed.
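
As a worked check of the cycling conditions, the following Python sketch encodes the program as data and sums the programmed hold times. It assumes the usual reading of the conditions (a single initial 95°C step, 30 repeats of the three-step cycle, and a single final 72°C step) and ignores ramp times, which depend on the instrument. The variable and function names are illustrative only.

```python
# Each step is (label, temperature in degrees C, hold time in seconds).
# Assumption: only the three-step cycle is repeated 30 times.
INITIAL_STEP = ("initial denaturation", 95, 180)
CYCLE_STEPS = [
    ("denaturation", 95, 30),
    ("annealing", 62, 30),
    ("extension", 72, 120),
]
FINAL_STEP = ("final extension", 72, 300)
N_CYCLES = 30


def total_hold_time_seconds(initial=INITIAL_STEP, cycle=CYCLE_STEPS,
                            final=FINAL_STEP, n_cycles=N_CYCLES):
    """Sum of programmed hold times, excluding ramping between temperatures."""
    per_cycle = sum(seconds for _, _, seconds in cycle)
    return initial[2] + n_cycles * per_cycle + final[2]


if __name__ == "__main__":
    seconds = total_hold_time_seconds()
    print(f"Programmed hold time: {seconds} s ({seconds / 60:.0f} min)")
```

Under these assumptions the programmed hold time is 3 min + 30 × 3 min + 5 min = 98 min.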

2.4 Results

At the SCA7 locus, there are two CTCF binding sites that flank the (CAG) repeat tract; the CTCF-I binding site is located 3′ to the (CAG) repeat (Fig. 2.1A), within the critical region deleted from the SCA7 genomic fragment in the α-SCA7-92R mice (Fig. 2.1A,B). As CTCF binding sites are coincident with highly unstable repeat loci [82], and CTCF binding can alter chromatin structure and DNA conformation [184, 185], I hypothesized that CTCF binding might be involved in SCA7 repeat instability. To test this hypothesis, I compared SCA7 (CAG) repeat instability in mice carrying either the wild-type CTCF binding site or a mutant CTCF binding site incapable of binding CTCF (the mutation is indicated in gray in Figure 2.1B).

Figure 2.1. The human SCA7 region

(A) Sequence of the SCA7-CTCF region. Primary sequence for the 3′ end of intron 2, all of exon 3, and the 5′ end of intron 3 are shown. Intron sequence is lowercase; exon sequence is uppercase. CTCF binding sites are shown in blue. Note that the CTCF-I binding site is located in intron 3, while the CTCF-II binding site spans the intron 2–exon 3 boundary. Start site of translation is underlined in blue, and CAG repeat is shown in red. (B) SCA7 genomic fragments used for transgenesis. Upper: SCA7-CTCF-I-wt; Middle: α-SCA7 3′ genomic deletion; Bottom: SCA7-CTCF-I-mut. Core CCCTC sequences are underlined, and sequence alterations in the SCA7-CTCF-I-mut transgenic construct are shown in gray.

I assessed germ line repeat instability by small-pool PCR of individual alleles in sperm DNAs from mice at age 2 months and 16 months (Fig. 2.2A,B). Small-pool PCR is carried out on genomic DNA that has been diluted such that the equivalent of only one or a few genomes is present in each PCR reaction, thereby allowing rare alleles of varying sizes the opportunity to be amplified. As the mice aged, the (CAG) repeat in SCA7-CTCF-I-mut mice became increasingly unstable (p = 0.009, Mann-Whitney two-tailed test), as mean expansion and deletion sizes were significantly greater for 16-month-old SCA7-CTCF-I-mut mice in comparison to SCA7-CTCF-I-wt mice (+24.3 CAGs / -215.5 CAGs (mut) vs. +9.2 CAGs / -21.0 CAGs (wt)). Increasing (CAG) repeat instability with aging in SCA7-CTCF-I-mut mice suggests a role in preventing DNA instability during spermatogenesis for CTCF, or for the male germ line restricted CTCF-like paralogue (CTCFL), also known as Brother Of the Regulator of Imprinted Sites, or "BORIS" [187].
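
The reasoning behind the dilution step can be illustrated with a simple sampling model. The Python sketch below assumes that the number of template molecules drawn into each reaction follows a Poisson distribution with a mean equal to the nominal number of genome equivalents. This is a common idealization of limiting dilution rather than a description of the analysis performed here, and the function names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k template molecules when the expected number is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def template_summary(mean):
    """Fractions of reactions expected to hold zero, one, or several templates."""
    p_empty = poisson_pmf(0, mean)
    p_single = poisson_pmf(1, mean)
    return {
        "empty": p_empty,
        "single": p_single,
        "multiple": 1.0 - p_empty - p_single,
    }


if __name__ == "__main__":
    # Nominal inputs spanning the 1-5 genome equivalents used for small-pool PCR.
    for mean in (1, 2, 3, 4, 5):
        s = template_summary(mean)
        print(f"{mean} genome equivalent(s): "
              f"empty {s['empty']:.3f}, single {s['single']:.3f}, "
              f"multiple {s['multiple']:.3f}")
```

Under this model, lower inputs give more reactions with no template or a single template, which is why rare alleles are not masked by the predominant allele in each reaction.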

Another intriguing feature of repeat instability is variation in repeat size within and between the tissues of an individual organism. This tissue-specific instability, or ―somatic mosaicism‖, occurs in human patients with repeat diseases, and in mouse models of repeat instability and disease [95, 105, 179]. While shown to be age-dependent, the mechanistic basis of inter-tissue variation, which even occurs in postmitotic neurons [188], is unknown. To determine if somatic (CAG) mosaicism at the SCA7 locus involves CTCF binding, I surveyed repeat instability in various tissues from SCA7-CTCF-I-wt and SCA7-CTCF-I-mut mice. At two months of age, the SCA7 (CAG) repeat was remarkably stable in all analyzed tissues (Fig. 2.3). However, by ~10 months of age, SCA7-CTCF-I-wt and SCA7-CTCF-I-mut mice displayed large (CAG) repeat expansions in the cortex and liver (Fig. 2.3). The liver also exhibited a bimodal distribution of repeat size (i.e. two populations of cells with distinct tract lengths) (Fig. 2.3). The most pronounced somatic instability differences existed in the kidney, with large expansions for SCA7-CTCF-I-mut mice, but stable repeats in the SCA7-CTCF-I-wt mice (Fig. 2.3). This pattern of increased kidney and liver repeat instability was present in both SCA7-CTCF-I-mut transgenic lines (Fig. 2.3, 2.4). Indeed, comparable somatic instability was also detected in both SCA7-

42

Figure 2.2. Small-pool PCR data from sperm- 16 month-old mice

(A)Small-pool PCR of sperm from 16 month old wildtype and mutant mice. SCA7-CTCF-I-wt mice typically exhibited small repeat length changes, while SCA7-CTCF-I-mut mice displayed pronounced instability. Numbers along the side of the panels indicate repeat number. Each lane is representative of a single small-pool PCR reaction. The schematics above each panel represent the specific construct inserted into the specified mouse line, with the center rectangle representing the (CAG) repeat, and the flanking circles representing the CTCF binding sites. An ―x‖ indicates a mutated or absent binding site. (B) A compilation of small-pool PCR data in sperm. At 2 months of age, only modest instability was noted in both wildtype and mutant mice. At 16 months of age, SCA7-CTCF-I-wt mice displayed moderate instability, but SCA7-CTCF-I- mut mice exhibited significantly greater instability (p = 0.009; Mann-Whitney two-tailed test).

43

(A)

(B)

44

Figure 2.3. SCA7-CTCF-I-mut mice display increased somatic instability.

At 2 months of age, the SCA7 CAG repeat is stable in many tissues in the SCA7-CTCF-I-wt line and in both SCA7-CTCF-I-mut lines. With advancing age, tissue-specific instability is seen in SCA7-CTCF-I-wt mice; however, this tissue-specific instability is much more pronounced in SCA7-CTCF-I-mut mice. Results for individuals from the two different SCA7-CTCF-I-mut mice are shown here. The two mutant mice are from two separate transgenic lines, each with a single transgene inserted into a different part of the mouse genome. All panels are pictures of autoradiographic films used to image agarose gels run of standard PCR across the integrated human SCA7 locus, carried out with 32P. PCR protocols can be found in the Methods of this Chapter (2.3.2)

45

46

Figure 2.4. SCA7-CTCF-I-mut mice display increased somatic instability.

Two different mouse mutant lines (same mutation, different integration site) exhibit the same bimodal somatic instability in liver at two different ages. All panels are pictures of autoradiographic films used to image agarose gels run of standard PCR across the integrated human SCA7 locus, carried out with 32P. PCR protocols can be found in the Methods of this Chapter (2.3.2)

47

48

Figure 2.5. SCA7-CTCF-I-mut mice display increased somatic instability.

(A) To permit quantification of somatic instability, we performed small-pool PCR on tissue DNA samples from SCA7-CTCF-I-wt and SCA7-CTCF-I-mut mice. As shown here for cortex, SCA7-CTCF-I-mut mice displayed significantly greater instability than SCA7-CTCF-I-wt mice (p = 8.6×10−5, Mann-Whitney two-tailed test). See Table 2.1 for a compiled list of repeat alleles. (B) Histogram of repeat length variation in the cortex of SCA7-CTCF-I-wt and SCA7-CTCF-I- mut mice.SCA7-CTCF-I-mut mice exhibit significantly greater instability than SCA7-CTCF-I-wt mice, and this expansion tendency exceeds that of SCA7-CTCF-I-wt mice, even when 2.5 months younger (p = 0.0003, Mann-Whitney two-tailed test). With advancing age, the expansion bias between the SCA7-CTCF-I-mut and -wt mice becomes more pronounced (p<.0001, Mann- Whitney two-tailed test). Results for individuals from the two different SCA7-CTCF-I-mut mice are shown here.


CTCF-I-mut transgenic lines at five months of age (Fig. 2.4). When I closely examined repeat instability in the cortex by small-pool PCR, I observed significantly different repeat sizes (p = 8.6 × 10⁻⁵, Mann-Whitney), with a range of 39 to 152 (CAG) repeats in SCA7-CTCF-I-mut mice and 26 to 245 (CAG) repeats in SCA7-CTCF-I-wt mice (Fig. 2.5; Table 2.1). The increased somatic instability occurred in both SCA7-CTCF-I-mut transgenic lines, as an expansion bias was apparent in both lineages upon small-pool PCR analysis (Fig. 2.5 B; Table 2.1). These findings suggest that CTCF binding stabilizes the SCA7 (CAG) repeat in certain tissues. Thus, as noted for the germ line and documented for two independent lines of SCA7-CTCF-I-mut transgenic mice, SCA7 somatic (CAG) instability is dependent upon age and the presence of intact CTCF binding sites.
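As a rough illustration of the kind of comparison reported above, the following is a minimal sketch of a two-tailed Mann-Whitney test in plain Python, using the normal approximation with a tie correction and no continuity correction. The function name `mann_whitney_u` and the repeat sizes in `wt_sizes` and `mut_sizes` are hypothetical placeholders, not values from Table 2.1, and this is not necessarily the software or exact method used for the analysis in this chapter.

```python
import math


def mann_whitney_u(x, y):
    """Two-tailed Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p) where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign midranks so that tied values share the same average rank.
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1

    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every observation identical: no evidence of a difference
        return min(u1, u2), 1.0

    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p from the normal curve
    return min(u1, u2), p


# Hypothetical small-pool PCR repeat sizes (CAG units), for illustration only.
wt_sizes = [90, 92, 92, 93, 94, 95, 96]
mut_sizes = [92, 98, 105, 110, 120, 135, 150]

u_stat, p_value = mann_whitney_u(wt_sizes, mut_sizes)
print(f"U = {u_stat}, two-tailed p = {p_value:.4f}")
```

Because the test is rank-based, it makes no assumption that repeat sizes are normally distributed, which suits the skewed, expansion-biased distributions typical of small-pool PCR data. For small samples an exact p-value would be preferable to the normal approximation used in this sketch.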

2.5 Conclusions and Discussion

I have identified a CTCF binding site as the first cis-element regulating (CAG) tract instability at a disease locus. At the SCA7 locus and four other CAG/CTG repeat loci known to display pronounced anticipation, functional CTCF binding sites occur immediately adjacent to the repeats, and CTCF binding can affect DNA structure and chromatin packaging at such loci, and elsewhere [185, 189-192]. Although an interplay between GC-content, CpG islands, epigenetic modification, chromatin structure, repeat length, and unusual DNA conformation has long been postulated to underlie trinucleotide repeat instability [105, 193-195], the mechanistic basis of this process is ill-defined. CTCF insulator and genomic imprinting functions are subject to epigenetic regulation, as methylation status is a key determinant of CTCF action at certain differentially methylated domains, and methylation changes at CTCF binding sites are linked to oncogenic transformation [183, 185]. Thus, the inability to bind CTCF at sites adjacent to (CAG) tracts, because of CpG methylation or binding site mutations in the case of the SCA7-CTCF-I site, can promote further expansion of disease-length CAG repeat alleles (Figure 2.6).

In both affected humans and transgenic mice with expanded repeat tracts, the repeat displays high levels of instability. The flanking sequence has been thought to contain elements that may protect against or enhance repeat instability. These results show that CTCF binding is a stabilizing force at the SCA7 repeat locus, suppressing expansion of the (CAG) repeat in the germ line and soma. Interestingly, deletion of ~8.3 kb of 3′ genomic sequence in the previous SCA7 transgenic mouse, including the CTCF-I site, stabilized the repeat [12]. The CAG-92


Table 2.1. Repeat sizes of cortex DNA: CAG tract length - Small-pool PCR.

Raw counts of CAG repeat sizes in cortex DNA using small-pool PCR. (Related to the data in Figure 2.5). Each number is representative of the repeat number of a single sized allele.


stabilization, arising from the ~8.3 kb 3′ genomic fragment deletion, suggests the existence of positive cis-regulators that were "driving" (CAG) instability. One such element could be a replication initiation site that was mapped within the genomic region 3′ to the CTCF-I binding site at the SCA7 locus [65]. Hence, the 8.3 kb 3′ deletion could grossly alter the chromatin organization of the adjacent repeat, and would likely ablate replication origin activity, stabilizing the (CAG) repeat tract. However, this ~8.3 kb genomic region likely also contained negative cis-regulators of (CAG) repeat instability, whose dampening effects would not be apparent due to the coincident loss of instability drivers. The above results indicate that CTCF binding negatively regulates expanded (CAG) repeat instability at the SCA7 locus. CTCF regulation of repeat instability potential is consistent with its many roles in modulating DNA structure. CTCF can mediate long-range chromatin interactions and can co-localize physically distant genomic regions into discrete sub-nuclear domains [184, 185]. CTCF insulates heterochromatin and silenced genes from transcriptionally active genes, as CTCF binding sites occur at transition zones between X-inactivation regions and genes that escape from X-inactivation [190]. CTCF has been implicated in genomic imprinting, although recent studies indicate that such transcription insulator events may involve the coordinated action of CTCF with cohesin [196-198]. CTCF binding at the DM1 locus sequesters repeat-driven heterochromatin formation to the immediate repeat region, while repeat expansion-induced loss of CTCF binding may permit spreading of heterochromatin to adjacent genes, accounting for the mental retardation phenotype in congenital DM1 [189].
As DNA structural conformation and transcription activity are two highly intertwined processes that appear fundamental to the instability of expanded repeats [105, 181], CTCF appears a likely candidate for modulation of trinucleotide repeat instability.

At the SCA7 locus, a pronounced tendency for repeat expansion has been associated with transmission through the male germ line [10, 175, 176]. Although I have hypothesized that CTCF is principally responsible for modulating SCA7 (CAG) repeat instability both in the germ line and in the soma, I considered a possible role for the related CTCF-like factor BORIS. BORIS and CTCF share identical 11 zinc-finger domains for DNA binding [187]; hence, both CTCF and BORIS can bind to the CTCF binding sites at the SCA7 locus. Upon mutation or methylation of the CTCF binding site 3′ to the SCA7 (CAG) repeat, neither CTCF nor BORIS can bind [50]. As BORIS can bind to the H19 differentially methylated domain even when it is


Figure 2.6. Model for CTCF regulation of CAG repeat instability.

The non-expanded CAG repeat is stable, as CTCF is bound to the adjacent site. Upon repeat expansion, the chromatin environment and the DNA structure of the repeat region are altered, permitting instability. Loss of CTCF binding at the adjacent CTCF binding site, either by CpG methylation or CTCF binding site mutation, further promotes repeat instability.


methylated [199], these results suggest that the methylation dependence of BORIS binding is locus specific. BORIS and CTCF expression patterns overlap very little, if at all, and in the male germ line, BORIS appears restricted to primary spermatocytes, while CTCF occurs almost exclusively in post-meiotic cells, such as round spermatids [187]. In human HD patients and transgenic mouse models of CTG/CAG instability, large repeat expansions have been documented in spermatogonia, but not in post-meiotic spermatids or spermatozoa [73, 200-202]. Thus, absence or low levels of BORIS or CTCF in spermatogonia, the cells in which the largest and most frequent repeat expansions occur, may contribute to the paternal parent-of-origin expansion bias common to most CAG/CTG repeat diseases. In spermatocytes, BORIS may stabilize expanded (CAG) repeats, just as CTCF binding appears to promote repeat stability in somatic tissues. Thus, in the SCA7-CTCF-I-mut mice, abrogated binding of BORIS may contribute to increased repeat instability and expansion bias in the male germ line.

My findings suggest that CTCF is a trans-acting factor that specifically interacts with the adjacent cis-environment to prevent hyper-expansion of disease length CAG repeats. In a Drosophila model of polyglutamine repeat disease, expression of the mutant gene product modulated repeat instability by altering transcription and repair pathways [181]. Similarly, uninterrupted repeat sequences, and in particular, runs of CG-rich trinucleotide repeats, can affect replication machinery, DNA repair pathways, and nucleosome positioning, though in cis, by altering the structure and conformation of the DNA regions within which they reside [203, 204]. Association of adjacent CTCF binding sites with repeat loci is a common feature of unstable microsatellite repeats [82]. I propose that acquisition of CTCF binding sites at mutational hot spots represents an evolutionary strategy for insulating mutagenic DNA sequences [205], and our findings indicate that CTCF binding site utilization at a mutational hot spot is subject to epigenetic regulation. I thus envision a predominant role for CTCF in modulating genetic instability at DNA regions containing variably-sized repeats, unstable sequence motifs, or other repetitive sequence elements.


CHAPTER 3 - Replacement of the myotonic dystrophy type 1 (CTG) repeat with ‘non-CTG repeat’ insertions in specific tissues

Co-authorship statement: I'd like to thank the following people who contributed to data in this chapter: Masayuki Nakamori (University of Rochester) carried out the LNA Southern blot on patient tissues. Arturo Lopez-Castel contributed to the initial characterization, cloning, and sequencing of the insert sequences. Charles Thornton contributed the patient samples. I carried out experiments for the data in Figures 3.3 and 3.4. The data in this chapter have been published (J Med Genet. 2011 Jul;48(7):438-43).


3 Replacement of the myotonic dystrophy type 1 CTG repeat with ‘non-CTG repeat’ insertions in specific tissues

3.1 Abstract

Curious mutations have been reported to occur within the (CTG)n repeat tract of the DM1 locus. For example, the repeat, long presumed to be a pure repeat sequence, has now been revealed to contain interruption motifs in a proportion of cases with expansions. Similarly, a few de novo somatic CTG expansions have been reported to arise from non-expanded DM1 alleles with 5-37 units, thought to be genetically stable. Here I characterize a novel mutation configuration at the DM1 CTG repeat that arose as somatic mosaicism in a juvenile-onset DM1 patient with a non-expanded allele of (CTG)12 and tissue-specific expansions ranging from (CTG)1100 to (CTG)6000. The mutation configuration replaced the CTG tract with a non-CTG repeat insertion of 43 or 60 nucleotides, precisely placed in the position of the CTG tract with proper flanking sequences. The inserts appear to arise from a longer human sequence on chromosome 4q12, and may have arisen through DNA structure-mediated somatic inter-gene recombination or replication/repair template switching errors. De novo insertions were detected in cerebral cortex and skeletal muscle, but not in heart or liver. Repeat tracts with a change of 1 or 2 CTG units were also detected in cerebellum, which may have arisen by contractions of the short (CTG)12 allele. This non-CTG configuration expands current understanding of the sequence variations that can arise at this hypermutable site.

3.2 Introduction

The mechanism(s) by which a non-expanded (CTG) repeat tract (5 to 35 units) becomes a highly unstable sequence (>50 units) in patients with DM1 are poorly understood. The short, non-expanded allele is genetically stable, although occasional expansion-biased mutations of non-expanded (CTG) alleles in the DMPK gene have been reported [161, 206, 207]. These usually involve the high end of the normal size range (19-37 repeats), although in one case the normal allele that expanded was only 12 repeats, expanding to >50 repeats [161]. Aside from repeat expansions that arise through additions of more (CTG) units, the DM1 (CTG) tract has been shown to contain interspersed non-CTG units [23, 39, 102]. Somatic mosaicism of the


HD (CAG) tract (simultaneous presence of (CAG)20, (CAG)37 and (CAG)47) has been reported [163]. These rare events in DM1 and Huntington's disease (HD) have in some instances been detected in specific tissues [161, 163, 207]. However, it should be noted that such events are rare and that alternative explanations, such as contamination, were not ruled out; hence they can be described as 'putative' CTG/CAG changes. Here, I report the presence of aberrant sequences that replace the (CTG) tract at the DM1 locus. These non-CTG configurations arise as somatic events in specific tissues, and appear to be derived from a region of chromosome 4.

3.3 Methods

3.3.1 DNA extraction

DNA from the patient sample named ADM9, as well as all other patient tissue DNAs, was extracted by homogenization of samples using a manual glass homogenizer after tissue dissociation with mortar and pestle in liquid nitrogen. Homogenized tissues were then lysed using a proteinase K buffer (1M Tris pH8, 5M NaCl, 0.5M EDTA, 10% SDS, 20mg/ml proteinase K, 10mg/ml RNase A), followed by phenol and chloroform extraction and ethanol precipitation steps. DNA was resuspended in 1× Tris/EDTA buffer.

3.3.2 DNA amplification and electrophoresis

Amplification through the DMPK (CTG) repeat tract in ADM9 and other patient samples was performed using PCR primers 409 (5'-GAAGGGTCCTTGTAGCCGGGAA-3') and 407 (5'-CAGAGCAGGCGTCATGCACA-3') at an annealing temperature of 67°C for 2 min, over 30 cycles. Four per cent non-denaturing acrylamide gels (200V for 1.5 hrs) and 1% agarose gels (100V for 1 hr) were run to resolve the PCR products. Amplification using the primers designed based on the areas of homology found between the non-CTG insert in the skeletal muscle and cortex (3PC: 5'-CACCGGGTGGGTTACACC-3' and 5PSC: 5'-CTGGCCTTAGCCACGCCAC-3') was performed at an annealing temperature of 63°C for 45 sec, over 30 cycles. Products were run on a 1% agarose gel at 100V for 1 hr.
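The basic properties of the primers listed above can be checked with a short script. The following is a minimal sketch in Python that reports length, G+C fraction, and a crude melting-temperature estimate using the Wallace rule (2°C per A/T and 4°C per G/C). The function names are illustrative, and the Wallace rule is an assumption introduced here for illustration only: it is a rough approximation intended for short oligonucleotides, and the text does not state how the annealing temperatures in this chapter were chosen.

```python
def gc_fraction(primer):
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return sum(primer.count(base) for base in "GC") / len(primer)


def wallace_tm(primer):
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


# Primer sequences as given in the Methods (5' to 3').
primers = {
    "409": "GAAGGGTCCTTGTAGCCGGGAA",
    "407": "CAGAGCAGGCGTCATGCACA",
    "3PC": "CACCGGGTGGGTTACACC",
    "5PSC": "CTGGCCTTAGCCACGCCAC",
}

for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.0%}, "
          f"Wallace Tm ~ {wallace_tm(seq)} C")
```

Estimates of this kind are only a starting point; nearest-neighbour thermodynamic models and empirical optimisation generally give more reliable annealing conditions, particularly for G+C-rich templates such as the DMPK repeat region.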


3.3.3 Sequencing

Selected PCR products were sent to the sequencing facility of The Hospital for Sick Children (The Centre for Applied Genomics) after a step of gel excision/cleaning (QIAquick PCR purification kit, Qiagen) and cloning (TOPO-TA 4.0 cloning kit, Invitrogen). A minimum of 10 clones were sequenced for each PCR product analyzed. Each sequence was proofread against the electropherogram to avoid base-calling errors.

3.3.4 DNA alignment and sequence location

Non-repetitive DNA sequences present in the repeat location were aligned against human, bacterial (Escherichia coli), and TOPO vector DNAs using the Blast2Align tool available at http://www.ncbi.nlm.nih.gov/. DNA sequences obtained from the insert-specific primer PCRs were searched against the human genome using the BLAST tool at http://www.ncbi.nlm.nih.gov/.

3.3.5 Methylation status

The methylation status of the allele on which the non-CTG insert is located was determined using SacII digestion followed by multiplex PCR: 200 ng of ADM9 patient samples were digested with 10 units of SacII overnight, then subjected to a multiplex PCR protocol. Multiplex amplification across the DMPK (CTG) repeat tract was performed using PCR primers 409 (5'-GAAGGGTCCTTGTAGCCGGGAA-3'), CTCFIa (5'-CTGCCAGTTCACAACCGCTCCGAG-3') and CTCFIIb (5'-AAAGCAAATTTCCCGAGTAAGCAGGC-3') with the following conditions: 95°C 5 min, 95°C 30 sec, 65°C 45 sec, 72°C 45 sec, 72°C 45 sec over 30 cycles. Four per cent non-denaturing acrylamide gels were used to resolve PCR products, run for 1.5 hours at 200V.
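The logic of this methylation-sensitive assay can be sketched in a few lines of Python. SacII recognises the sequence CCGCGG and does not cut when the site is CpG-methylated, so an amplicon spanning the site survives digestion only if the template is methylated. In the sketch below, the function names and the amplicon labels "spanning" and "control" are hypothetical, and the example sequence is invented; none of these are taken from the thesis.

```python
SACII_SITE = "CCGCGG"  # SacII recognition sequence


def find_sites(seq, site=SACII_SITE):
    """Return 0-based start positions of every occurrence of the site."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def expected_products(digested, site_methylated):
    """Amplicons expected from one template in the multiplex PCR.

    'control'  : amplicon that does not span the SacII site (always amplifies)
    'spanning' : amplicon that spans the SacII site (lost if the site is cut)
    """
    products = {"control"}
    template_cut = digested and not site_methylated
    if not template_cut:
        products.add("spanning")
    return products


# Invented example sequence containing two SacII sites.
example = "AACCGCGGTTCCGCGG"
print(find_sites(example))

for digested in (False, True):
    for methylated in (False, True):
        print(digested, methylated, sorted(expected_products(digested, methylated)))
```

Read this way, loss of the spanning amplicon after digestion points to an unmethylated template, while its persistence points to methylation (or to incomplete digestion, which the control amplicon alone cannot rule out).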

3.4 Results

During analysis of (CTG) repeat length variations between DM1 patient tissues, I found that, while characterizing the non-expanded allele by PCR, aberrant PCR products arose in several tissues of a DM1 patient. The aberrant PCR products, in addition to the main PCR product, arose after the amplification of the short DMPK allele of a 44-year-old woman (ADM9) diagnosed with a juvenile form of DM1 disease (Fig. 3.1). This individual had expanded


Figure 3.1. Detection of extra bands after short myotonic dystrophy type 1 (DM1) allele PCR amplification in ADM9 DM1 patient.

Products of PCR amplification (30 rounds) were analysed by: (A) Non-denaturing acrylamide gel showing additional bands in cortex, skeletal muscle and cerebellum in ADM9 patient. (B) No detection of extra bands in other DM1 patients (*detection of extra bands). (C) Agarose gel analysis of ADM9 cerebellum, cortex, and skeletal muscle tissues to compare the appearance of additional bands versus acrylamide gel.


alleles ranging from 1100 to 6000 repeats [1]. The appearance of unexpected PCR products suggests their formation in only a fraction of the cells from the tissues affected. Detection was achieved in cerebral cortex, skeletal muscle, and cerebellum of the ADM9 patient during repeat sizing of the DMPK alleles in several tissues from up to eight different DM1-affected individuals (Fig. 3.1 A) [1]. The additional PCR fragments detected in the cortex and muscle were larger than the characterized (CTG)12 short allele, whereas those in the cerebellum were shorter. Such mosaicism in the shorter ranges was not detected in other tissues of the same individual nor in several tissues in other DM1 individuals (Fig. 3.1 B). Initially several faint extra PCR amplified fragments were detected in cerebral cortex, skeletal muscle, and cerebellum when resolved on non-denaturing acrylamide gels, which separate based upon both molecular weight and DNA secondary structure (Fig. 3.1 A). The same PCR products were resolved by agarose electrophoresis, which is relatively insensitive to structure (Fig. 3.1 C), suggesting that a single additional product was present in these tissues with the largest in the cortex and the smallest in the cerebellum, while the other bands on acrylamide gels were likely heteroduplexes of the two differently sized PCR products [126, 163, 208]. Sequence analysis of these additional PCR products from all three tissues (see Chapter 3 Methods) showed that six of 15 clones from cortex (40%) and six of 13 clones from skeletal muscle (46%) contained, as expected, larger sequences (a total of 63 and 46 base pairs, respectively) than the non-expanded repeat tract ((CTG)12, 36 base pairs). Strikingly, the sequences of these variants were completely devoid of (CTG) repeats except for the first 5'-CTG unit (Fig. 3.2 A).
In the cerebral cortex the complete 5'-(CTG)-3' tract appeared to have been replaced with a sequence containing the first (CTG) followed by a 60 base 5'-CTG-repeat-free sequence (Fig. 3.2 A). In the muscle the complete 5'-(CTG)-3' tract appeared to have been replaced with a sequence containing only the first (CTG) followed by a 43 base 5'-CTG-repeat-free sequence (Fig. 3.2 A). Interestingly, a comparison between the muscle and cortex insertion sequences revealed high levels of sequence identity at the beginning and end: complete alignment (100%) of the first 15 bases and strong alignment (81%) in the last 16 bases (13 of 16) (Fig. 3.2 B). Inserts began with a (CTG) and ended with a TG, presumably belonging to the first and last (CTG) units. The flanking DMPK sequence also PCR amplified in these samples was perfectly aligned with the known gene sequence (GeneID 1760; Fig. 3.2 A) in all 28 clones. The rest of the clones showed only the pure repeat tract with 12 or 11 units, suggesting a mixing of DNA products due to poor electrophoretic resolution. None of the 12 clones derived from cerebellum DNA of the same individual (isolated and characterized in parallel) showed this


Figure 3.2. Characterisation by cloning approach of the myotonic dystrophy type 1 (DM1) repeat site in the short allele from the different ADM9 tissues analyzed.

(A) Traces of sequenced, gel purified products of PCRs carried out across the CTG DM1 repeat. Examples of clones with a normal 12 CTG repeat (36 bp) sample (main PCR band in the ADM9 DM1 patient), a −2 CTG units sample (cerebellum), a 63 bp non-repeated sample (cerebral cortex), and a 46 bp non-repeated sample (skeletal muscle). (B) Homology degree between both non-repeated sequences found in cerebral cortex and skeletal muscle in ADM9 DM1 patient.
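The fragment sizes and proportions quoted for these clones follow from simple arithmetic, sketched here in Python as a consistency check. The helper name `percent` and the variable names are illustrative; all numbers are those reported in the text, with each variant assumed to consist of the single retained CTG unit plus the 60-base or 43-base insert.

```python
CTG_UNIT = 3  # bases per CTG repeat unit


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


normal_tract_bp = 12 * CTG_UNIT      # non-expanded (CTG)12 tract
cortex_variant_bp = CTG_UNIT + 60    # first CTG plus 60-base insert
muscle_variant_bp = CTG_UNIT + 43    # first CTG plus 43-base insert

cortex_clone_pct = percent(6, 15)    # variant clones from cortex
muscle_clone_pct = percent(6, 13)    # variant clones from skeletal muscle
end_identity_pct = percent(13, 16)   # identity over the last 16 insert bases

print(normal_tract_bp, cortex_variant_bp, muscle_variant_bp)
print(cortex_clone_pct, muscle_clone_pct, end_identity_pct)
```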


obscure repeat configuration, as all contained a pure (CTG) tract, where five of 12 clones (~42%) contained (CTG)10 or (CTG)11, which may be contractions of 1 or 2 repeat units from the normal (CTG)12 allele. These short repeat length variants explain the presence of the extra PCR products derived from this tissue resolved on acrylamide (Fig. 3.2 A). Sequence analysis of 10 clones derived in parallel from the main PCR product from the cerebellum showed 12 (CTG) repeats and the proper flanking sequence. I do not favour the possibility that these aberrant DM1 configurations arose by PCR contamination or artifactual events as it is unlikely that these would occur in all reactions from only specific tissues, but not others. Further argument against contamination is derived from assessing additional clones from the main PCR fragment amplified (short allele) in cerebral cortex (tissue showing the extra band) and from other ADM9 tissues (not showing the extra band). All 25 clones analyzed showed only the presence of the (CTG)12 repeat tract after sequencing, confirming the additional extra PCR products as the only ones containing the non-repetitive sequences. I also performed a sequence database search for potential donor sources for the inserted sequences. Both inserted sequences were assessed using the Blast2Align tool available online (http://www.ncbi.nlm.nih.gov/) and the alignments against different sources showed: (1) very low score alignments (<35%) with any known sequence in the human genome; (2) no alignments with the bacterial genome (E. coli) used for transformation of the clones; and (3) no alignments with the cloning TOPO TA vector sequence. These results argue against a bacterial or cloning vector DNA contamination. The absence of the inserted sequence in the human genome database may be due to it arising from a sequencing gap that has not been annotated [209-212] or possibly to a novel recombined sequence.
To determine if the insert sequences are part of the human genome, I designed primers specific to the insert using the regions of homology (Fig. 3.2 B), and these were able to PCR amplify products from the DNA of various patient tissues, confirming that this sequence is in fact of human origin (Fig. 3.3 A). The size of the PCR products was approximately 800 bp, considerably longer than the non-CTG insert, indicating that its source might not be a continuous sequence but harbours intervening sequences. Direct sequencing of these revealed a total sequence of 858 bp (Fig. 3.3 B). Sequencing was carried out on both the clones of direct PCR products of skeletal muscle, cortex, and cerebellum, as well as the gel purified and cloned PCR products, with three clones of each separate tissue having been sequenced. Each clone contained at least a partial sequence, with a BLAST search revealing that each sequence mapped to the same location within the human genome (chromosome 4q12) (accession # NT_022853.15) [213-215]. The sequence contained


Figure 3.3. Characterisation of the PCR products of the insert-specific designed primers.

(A) PCR amplification of ADM9 skeletal muscle and cortex DNA with the primers designed from the homology between the non-CTG insert in both tissues revealed a product approximately 800 bp in length. Primers were imperfectly hybridized at the ends of the sequence. (B) Sequencing and a BLAST search determined that the PCR product came from chromosome 4q12. Red text indicates locations of purine/pyrimidine tracts able to form Z-DNA. Bold green text shows a polymerase α mutation hotspot site. Mutant sequences were deposited to GenBank (accession numbers JF697199, JF697200 and JF697201).


motifs capable of forming unusual DNA structures, including Z-DNA (Fig. 3.3 B) [216], as well as containing a pol-α mutation hotspot site [217, 218], which may suggest a tendency to be recombinogenic, thereby permitting it to be inserted into the DM1 CTG repeat tract region [212]. Although these aberrant products are close in size to the short repeat allele, I cannot exclude the expanded allele as their source. To determine whether the non-CTG insertions arose on the expanded or non-expanded DM1 alleles, I took advantage of known epigenetic marks specific to the expanded allele. Initially I considered using the adjacent markers that could discern DM1 alleles. Unfortunately, in patient ADM9 the upstream BpmI polymorphism in exon 10 [219] is homozygous and the downstream Cac8I polymorphism in exon 3 of the SIX5 gene that is in complete linkage disequilibrium with the DM1 CTG expansion [220] is very far (~5 kb) from the (CTG) repeat and rich in G+C content, making PCR amplification impossible. However, our lab recently demonstrated the presence of CpG methylation on the expanded but not the non-expanded allele in many DM1 patient tissues [1]. In patient ADM9 the cortex showed high levels of CpG methylation upstream of the CTG repeat spanning the adjacent methyl-sensitive SacII restriction site. However, the cerebellum was relatively free of methylation. I used the methyl-specific multiplex PCR assay that I had devised to assess the methylation status of the SacII site 42 bp upstream of the CTG tract (Fig. 3.4 A). Forward and reverse primers were placed on opposite sides of the CTG repeat, and therefore were only capable of amplifying across the non-expanded (CTG) tracts, as well as the short non-CTG insertions.
(No primer sets were able to amplify across the expanded allele due to the very large expansions they harboured; skeletal muscle (with (CTG)= 3700-4300), cortex (with (CTG)= 4000-5500), and cerebellum (with (CTG)= 1100-1500 and 3800-4600) of this individual, and Chapter 4, [1]). Using two upstream primers, one that did and one that did not cover the SacII site, permitted determination of the methylation status of the SacII site in the non-expanded allele (Fig. 3.4 A). PCR amplification revealed that in the absence of SacII digestion two major PCR products, both from the non-expanded allele, were evident. Also evident were the slower migrating products, representing the non-CTG insertions in the muscle and cortex DNAs and the (CTG) length stutter products in the cerebellum (Fig. 3.4 B, see grey arrowhead). Successful SacII digestion before PCR amplification, an indication of an unmethylated template, considerably reduced the production of the PCR products that encompassed the SacII site. The loss of the non-expanded CTG allele, the non-CTG insertion, and the CTG stutter PCR products supports the conclusion that the DNA templates from which these were amplified were not methylated at the SacII site.


Figure 3.4. Assessment of methylation allele specificity.

(A) SacII methylation sensitive digestion combined with multiplex PCR amplification was performed, using two upstream primers and one downstream primer to produce two PCR products from only the non-expanded CTG allele, that either did or did not encompass the methylation sensitive SacII site present upstream of the CTG tract. The insert, or stutter PCR products of the skeletal muscle, cortex, or cerebellum, are visible as lighter grey bands in the schematic. (B) Following SacII digestion the reduction of only the PCR product encompassing the SacII site revealed that this site on the allele in which the non-CTG insert is found was not methylated in either the cortex or the skeletal muscle. The same is true of the stutter found in the cerebellum sample.


Together these results could suggest that the non-CTG insertions arose on the non-expanded DM1 allele, or that they arose on the subset of expanded alleles that was free of methylation, or that some time following insertion the expansion-specific methylation was lost.

3.5 Conclusions and Discussion

In this study I have observed a DM1 allele that has completely lost the (CTG) tract, retaining only the first CTG and the TG of the very last CTG unit. Previously, the complete loss of all CGG repeats at the FMR1 gene was reported, but this also incurred loss of flanking sequences, and appeared to have occurred during transmission [166]. In this DM1 sample the loss of the CTG tract did not incur any changes in flanking sequences and arose as a somatic insertion mutation. Insertions into repetitive sequences have not been reported; however, SCA31 was recently reported to have arisen through the insertion of an unstable repetitive element [7]. It is noteworthy that there are several reports claiming apparent DM1 allele length mosaicism, that have been interpreted to be the result of small CTG length alterations of the non-expanded allele [221]. However, sequence analysis of these presumed CTG length changes was not performed. Our findings suggest other reports of presumed CTG or CAG length changes [161, 206, 207, 221, 222] may actually be insertions at the repeat of non-CTG sequences, much as I described herein. Thus, the insertion mutations I report here may be more common than suspected. It is curious that these non-CTG somatic insertions arose in some, but not all, tissues. Previously, somatic CTG expansions were present at higher levels in the cardiac muscle than in peripheral blood mononuclear cells [161]. Tissue preference for somatic length heterogeneity at the DM1 repeat and the HD locus has been reported [162, 163]. Our lab previously reported, for this same juvenile onset DM1 individual, a range of heterogeneous (CTG) expansions in the heart (with (CTG)= 4100-5300 and 6000), liver (with (CTG)= 4200-4600), skeletal muscle (with (CTG)= 3700-4300), cortex (with (CTG)= 4000-5500), and cerebellum (with (CTG)= 1100-1500 and 3800-4600) for this individual [1]. 
The presence of non-repeat insertions in cortex and muscle, and limited CTG length variations of the short allele in cerebellum, with absence of variant short products in the heart and liver, does not offer any obvious pattern. While such variant DM1 configurations may be present at extremely low levels in other tissues, they are not detectable using the bulk PCR methods used herein. The significance, if any, of the tissue selections is unknown. The frequency and timing of these aberrant insertions is unclear. We speculate that the unique nature of the mutant alleles makes it extremely unlikely that these mutations occurred

more than once in the lifetime of the patient. Likewise, de novo mutations were detected in cerebral cortex and skeletal muscle, but not in heart or liver. Only two mutant alleles were detected, and although they must have occurred de novo at some point, it remains unclear if they actually occurred in those tissues or in a precursor lineage at an earlier developmental stage. The level of mosaicism of such novel species has not been determined rigorously. However, considering that they are readily detectable using the bulk PCR methods applied herein, we suggest that their occurrence is likely to be around 10% of the cells in the cortex and muscle. As to how the DM1 insertion mutations arose, I can only speculate. Such events may have arisen on the expanded or the non-expanded alleles, through events involving only one or both alleles. Short size alterations may have arisen through insertions into the non-affected (CTG) allele, or through catastrophic contractions and insertions into the expanded allele. Previously, a germline contraction of the expanded DM1 allele to non-affected lengths was reported [223], a transmission event that appeared to arise by recombination between the two DM1 alleles. Similarly, somatic mosaicism of the HD (CAG) tract was shown to arise by the contraction of the expanded (CAG)47 to (CAG)37 rather than by the expansion of the normal (CAG)20 allele [163]. While I was unable to definitively assign which allele incurred the non-CTG insertions with the methylation experiment, the analyses suggest that the mutations occurred on either the unmethylated non-expanded allele, or, more likely, given the ability of the expanded repeat to form slipped-DNA structures and the absence of CTG region mutations in non-DM1 individuals, that they arose on expanded alleles that were either unmethylated or became unmethylated after insertion.
Recent studies in yeast have suggested that other expanded repetitive elements can facilitate events, and that such events may be facilitated by unusual DNA structures, double- or single-strand breaks, and the capturing of single-strand regions [224]. Another possible mechanism is a template switching event, where a replication fork at the expanded (CTG) tract may arrest, and through multiple microhomology-mediated template switches with the chromosome 4 region (or other), may incur a series of insertions and aberrant nucleotide incorporations, resulting in a novel sequence at the DM1 CTG tract region [225, 226]. Such obscure mutation configurations, although rare, have been observed in model systems and may involve unusual DNA structures [217, 218, 224-228]. The likely source of the DM1 insert sequence on chromosome 4 is enriched in G+C content and can form unusual DNA structures, with this region having also been associated with sequence deletions in tumours [213-215]. The insertion at the DM1 repeat in two different tissues with sequence arising from the same region

74 of chromosome 4 argues for the close nuclear localization of this region with the DM1 region of chromosome 19. As to whether such events can lead to the non-CTG insertions observed here remains to be determined.

In conclusion, I detected a CTG-free configuration at the DM1 (CTG) repeat location that completely replaced the CTG tract. These non-CTG configurations arise as somatic events in specific tissues, and appear to be derived from a region of chromosome 4. The DM1 repeat tract has recently been described as presenting configurations distinct from the pure CTG trinucleotide motif. Interruptions by non-CTG repeat units like CCG and GGC within the DM1 CTG tract have in some cases been linked with altered CTG repeat stability [23, 39, 102]. However, like those interruptions, it is unknown if the non-repeat configuration observed herein is a product of instability of the larger pure tract. Interruptions in the DM1 tract have been suggested in one family to be associated with altered clinical presentation [39]. The impurity of the repeat will essentially deplete CUG repeats within the toxic DMPK transcript and may well affect its ability to bind and sequester RNA splicing factors such as MBNL1, which specifically binds CUG, as has been suggested [39]. Similarly, DMPK transcripts with a complete absence of CUG repeats, as in the alleles observed herein, may also be expected to have altered toxic effects. While the clinical significance of these distinct configurations of the DM1 repeat tract awaits future analysis, their existence advises against presuming a pure repeat when interpreting changes in nucleotide sizes of the DM1 CTG region.


CHAPTER 4- Detection of slipped-DNAs at the trinucleotide repeats of the myotonic dystrophy type 1 disease locus in patient tissues

Co-authorship statement: Thank you to the following people who contributed experimental data for this chapter: Masayuki Nakamori (University of Rochester) carried out the LNA Southern blot on patient tissues. Yuh-Hwa Wang (Wake Forest University) carried out EM imaging and EM data assembly. EM data analysis was done by both myself and Yuh-Hwa Wang. I carried out all other experiments. Charles Thornton (University of Rochester) contributed the patient samples. 2D3 antibody was provided by Maria Zannis-Hadjopoulos (McGill University). This chapter has been submitted to PNAS for publication consideration.


4 Detection of slipped-DNAs at the trinucleotide repeats of the myotonic dystrophy type 1 disease locus in patient tissues

4.1 Abstract

Slipped-strand DNAs, formed by out-of-register mispairing of repeat units on complementary strands, were proposed over 50 years ago as transient intermediates in repeat length mutations, hypothesized to cause genome evolution, cancer and at least 30 neurodegenerative diseases. While slipped-DNAs have been characterized in vitro, evidence of slipped-DNAs at an endogenous locus in biologically relevant tissues, where instability varies widely, is lacking. The largest expansions typically arise in non-mitotic tissues such as cortex and heart. Here, using an anti-DNA junction antibody and immunoprecipitation, I identify slipped-DNAs at the unstable trinucleotide repeats of the myotonic dystrophy disease locus in patient brain, heart, muscle, and other tissues. Slipped-DNAs were present in naked as well as chromatinized DNA. I characterized slipped-DNAs as clusters of slip-outs along a DNA, with each slip-out having 1-100 extrahelical repeats, and their levels correlate with the degree of repeat instability between tissues. This supports the formation of slipped-DNAs as persistent products of repeat instability, and not merely as transient intermediates as previously assumed. These findings further our understanding of genetic variation and have potential prognostic implications.

4.2 Introduction

All models proposed to explain the instability of repeats involve DNA slippage at the repeats (Fig. 4.2) [1, 66, 77, 95, 104, 128, 229-232]. Slipped-DNAs are thought to contribute to more than 30 neuromuscular/neurodegenerative diseases caused by unstable trinucleotide repeats and numerous cancers that show microsatellite instability [95, 229, 230]. Recently, the most common cause of amyotrophic lateral sclerosis was found to be an expanded hexanucleotide repeat, which may also arise through slipped-DNAs; hence understanding slipped-DNAs in patient tissues is of great importance [233, 234]. Expansion mutations continue in patients as they age, coinciding with worsening symptoms. Patients exhibit inter-tissue repeat length differences as great as 5,770 repeats, with large expansions occurring in affected tissues such as brain, muscle and heart, indicating high levels of continuing expansions [1, 66]. The formation and aberrant repair of slipped-DNAs is a likely source of repeat instability and progressive disease severity in patients (Fig. 4.2) [77, 231]. An understanding of these DNA mutagenic intermediates in patients should provide insight as to how they may be processed and lead to mutations. The important questions demanding answers are: 1) Do slipped-DNAs form at disease loci? 2) Do their levels vary in patient tissues that undergo variable levels of repeat expansion within a given individual? And 3) What is the biophysical structure of these slipped-DNAs? These questions cannot be answered in a heterologous model system whose repeat instability does not reflect that ongoing in a patient, nor in one lacking tissues. While slipped-DNAs have been characterized in vitro [104, 232], data supporting the presence of slipped-DNAs at an endogenous locus in biologically relevant tissues has been lacking.

Recently, an artificial system with an exogenously added CTG/CAG repeat was used to show slipped-DNAs forming in a replication-dependent manner through the use of a Zn-finger nuclease predicted to be specific for slipped-repeats [128]. This study supports the formation of slipped DNA in a living cultured system. Shortcomings of this study include the prevalence of contractions rather than expansions, and the poor characterization of the specificity of the nuclease used. Moreover, that study was limited in that it could provide no insight into what DNA structures may be forming during expansions (rather than contractions), in a disease locus, in different tissues with variable levels of instability or in non-replicative tissues such as the brain. It is imperative to study instability in patient tissues since repeat mutations are expansion-biased, arising by processes distinct from contractions, coupled with the known locus-specific effects, the tissue-specific variations of instability, and the fact that most expansions arise in non-replicating cells. To identify slipped-DNAs at a disease locus in patient tissues I devised a DNA-immunoprecipitation protocol that uses a highly specific anti-DNA junction antibody (2D3) that recognizes 3-way DNA junctions [232, 235, 236] (Fig. 4.1), a structural feature of slipped-DNAs [104, 232]. The antibody has been characterized extensively, and is known to recognize DNA-junctions but not single-stranded DNA, hairpin tips, Z-DNA, triplex or quadruplex, with no off-target binding reported [232, 235-237]. 2D3 binds best to slipped-DNAs [232], strengthening its use to isolate these structures.


4.2.1 The Anti-DNA Junction Antibody (2D3)

To identify slipped-DNAs at a disease locus in patient tissues I devised a DNA-immunoprecipitation protocol that uses a highly specific anti-DNA junction antibody (2D3) that recognizes 3-way DNA junctions with no sequence preference [232, 235, 236], with 3-way junctions known to be a structural feature of slipped-DNAs (Fig. 4.1) [104, 232]. No off-target binding has been reported with this antibody [232, 235, 237]. Binding of 2D3 is independent of superhelical tension, a feature that allows us to omit crosslinking and permits purification of slipped-DNAs in linear molecules [238]. The 2D3 binding site has been mapped to DNA junctions [235], and 1-2 antibodies bind per DNA junction [235-237]. 2D3 does not bind to junction-free linear duplexes, single-stranded DNA, Z-DNA or triple-stranded DNA; it is uniquely directed to 3-way and 4-way DNA junctions [235-241]. 2D3 has successfully purified cruciform-containing DNAs from human [236] and yeast cells [242]. Unlike those protocols, isolation of slipped-DNAs does not require antibody binding to supercoiled unbroken DNAs, as slipped-DNAs are very stable biophysically even in sheared DNAs, and survive DNA isolation, purification and de-proteinization [104].

4.3 Methods

4.3.1 Human tissues

Human tissues analyzed in this study are listed in Table 4.1. Autopsy tissues from a non-affected individual (ADN1) were obtained snap-frozen from the National Disease Research Interchange.

4.3.2 DNA extraction

DNA extractions from human tissues were carried out to eliminate conditions that might allow structure formation and/or DNA shearing, using low binding tubes (Marsh Biomedical Products), as recommended [243, 244]. Flash-frozen tissues were crushed by mortar and pestle in liquid nitrogen, and subsequently mixed with 10 x volume lysis buffer (50 mM Tris pH 8, 100 mM EDTA, 100 mM NaCl, 0.1% SDS, 200 µg/ml proteinase K). This was incubated at 40°C overnight.


Table 4.1. Patient, tissue, and DM1 CTG tract sizes

Sizing taken from LNA Southern blots (for the DM1 patients) or standard PCR reactions (control DM1). Numbers are absolute sizes for the non-expanded allele, and a median, with a size range for the expanded allele.


Figure 4.1. Slipped-DNAs

(A) Slipped-DNA structures that may form at the expanded repeats of the DM1 locus. S-DNA is slipped-homoduplex DNA, containing the same number of repeats in both DNA strands, with multiple clustered slip-outs. SI-DNA is slipped-intermediate heteroduplex DNA, containing differing numbers of repeats in each strand, where the number of repeats, "N", is reflected by font size. Each slip-out comprises both the slip-out structure and the 3-way junction structure (one slip-out and two fully base-paired arms). One junction is shown with an arrow.


Figure 4.2. Models of expansion at trinucleotide repeats

(A) Model of DNA slippage in trinucleotide repeats. Slippage and mispairing of trinucleotide repeats occur when the complementary repeat units shift out-of-register, leading to slipped-out repeats. Slipped-out DNAs may form on either the CTG or CAG strand, forming Slipped-Intermediate (SI) heteroduplexes, or on both strands, forming Slipped (S) homoduplexes, by out-of-register mispairing. (B) Slipped-strand DNAs can form during various metabolic processes such as replication, repair, recombination, transcription, and at unwound DNA. Mispairing of the repeats is shown at right.


The following day protein extraction was repeated two times with an equal volume of a 1:1 mixture of phenol and chloroform/isoamyl alcohol (24:1) and once with chloroform/isoamyl alcohol (pH 8.0) using a 10 ml serological pipette and pipetting slowly. Samples were mixed for 20 min at room temperature and centrifuged at 14000 rpm for 10 min after each phenol/chloroform/isoamyl alcohol extraction step, and mixed for 20 min and centrifuged at 14000 rpm for 5 min after the chloroform/isoamyl alcohol extraction step. DNA was precipitated by adding 1/10 volume of 3 M NaOAc and an equal volume of isopropanol, then mixed gently until stringy DNA was visible. After centrifugation, the pellet was rinsed with 70% ethanol, then 95% ethanol. The pellet was air-dried for no more than 20 min (allows for the removal of excess ethanol but not drying of pellet), and wet pellets were resuspended in 1 x TE (10 mM Tris, 0.1 mM EDTA, pH 8.0). Samples were allowed to hydrate for 10 min and placed at 40°C for two hours. To avoid denaturation, DNAs were never allowed to be dehydrated [245]. In regular polypropylene tubes, only low levels of non-specific polypropylene-induced slipped-DNAs formed by CTG/CAG repeats have been observed previously [26, 76]. DNA dehydration led to similar results [245] and hence must be avoided. Our use of phenol to extract proteins was done under conditions known to protect against DNA denaturation and enhance the stability of the duplexed state. Specifically, I used conditions of the Phenol Emulsion Reassociation Technique (PERT) assay, which is widely used to enhance DNA renaturation [246-249].

4.3.3 CTG repeat length analysis

Genomic DNA was digested with BglI, HaeIII, AluI, DpnII, or MwoI. Fragments were resolved on 0.5% agarose gels buffered with 40 mM Tris-acetate, 1 mM EDTA for 24 h at 1 V/cm, and then transferred overnight onto nylon membranes (Roche) by alkaline transfer. Blots were fixed at 120°C for 20 min and then hybridized for 4 hr at 70°C with 10 pmol/ml DIG-labeled (CAG)7 (5′-gcAgCagcAgCagCagcAgca-3′) (upper case letter indicates position of LNA nucleotide) in hybridization buffer (5x SSC, 1% block solution (Roche), 0.1% N-lauryl sarcosine, 0.02% sodium dodecyl sulfate). After washing to high stringency (0.5x SSC at 70°C), blots were developed with alkaline phosphatase-conjugated anti-DIG antibody using CDP-Star substrate according to the manufacturer's instructions (Roche), followed by 2-30 min exposure to BioMax XAR film (Kodak). For conventional Southern analysis, blots were hybridized with 32P-labeled random-primed probe p5B1.4 [250], a 348 nt DMPK fragment that is adjacent to the (CTG) repeats. These blots were analyzed by using a Typhoon 9600 phosphorimager (GE Health Care).

This sizing method takes advantage of the increased electrophoretic resolution of DM1 restriction fragments having a minimum of non-repetitive flanking sequences. Non-expanded (CTG) alleles were sized by sequencing (The Centre for Applied Genomics (TCAG), MaRS Centre, Toronto, Canada) of the products obtained after PCR amplification (forward primer 409: 5′-gaagggtccttgtagccgggaa-3′; reverse primer 407: 5′-cagagcagggcgtcatgcaca-3′; 67°C annealing temperature for 30 sec, 2 min of extension for 30 cycles).

4.3.4 Structure formation

Homoduplex and heteroduplex slipped-structures (S-DNAs) of 30 or 50 repeat containing DNAs end-labeled with 32P were formed by alkaline denaturation/renaturation. DNAs of (CTG)50, (CAG)50, (CTG)47, (CAG)47, (CTG)49, (CAG)49, (CTG)30, and (CAG)30 were end-labeled and mixed with an equimolar amount of a corresponding trinucleotide repeat in order to generate slip-outs of 20, 3, and 1 repeat (or multiple repeats in the case of the (CTG)50/(CAG)50 S-DNAs) on both strands. DNA samples were then ethanol precipitated, dried in a Speed Vac, then denatured by resuspension in a solution of 500 mM NaOH (pH 13) and 1.5 M NaCl, and incubated at room temperature for 5 min. Samples were neutralized by the addition of a 50-fold volume of 50 mM Tris-HCl (pH 8) and 5 mM EDTA, resulting in a solution having 0.01 M NaOH and 0.03 M NaCl (pH 8). These conditions promote full renaturation. Samples were immediately placed at 68°C for 3 hr, followed by purification by ethanol precipitation, resuspended in TE, and stored at -20°C. Reduplexed plasmids were electrophoretically resolved on a 4% (w/v) polyacrylamide gel for 1.5 hrs at a constant 200 V and then gel purified. The resultant structures were end-labeled using T4 polynucleotide kinase as directed by the manufacturer (New England Biolabs).
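The final NaOH and NaCl concentrations quoted for the neutralization step follow from simple dilution arithmetic. The sketch below is illustrative only: the function name is hypothetical, it assumes that "a 50-fold volume" means 50 volumes of neutralization buffer added to 1 volume of denatured sample (a 51-fold dilution), and it ignores the buffering chemistry of the Tris-HCl itself.

```python
def diluted_concentration(stock_molar, added_volumes):
    """Molar concentration after adding `added_volumes` volumes of diluent
    to one volume of stock (hypothetical helper, simple dilution only)."""
    return stock_molar / (1 + added_volumes)

# Denaturation solution from the text: 500 mM NaOH and 1.5 M NaCl.
naoh_final = diluted_concentration(0.5, 50)
nacl_final = diluted_concentration(1.5, 50)

# Rounded to two decimal places these match the values stated in the text.
print(f"NaOH: {naoh_final:.4f} M (~{round(naoh_final, 2)} M)")
print(f"NaCl: {nacl_final:.4f} M (~{round(nacl_final, 2)} M)")
```

If the buffer were instead added to reach a 50-fold final dilution, the values would be exactly 0.01 M and 0.03 M; either reading agrees with the stated concentrations to two decimal places.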

4.3.5 Electrophoretic mobility shift assay (EMSA)

The radioactivity of each labeled structure was determined using Cerenkov counting, and an equivalent radioactive concentration of each structure either in the presence or absence of 2D3 antibody was run on a 4% (w/v) polyacrylamide gel in 1 x TBE buffer at a constant 150 V for 1.5 hrs. Following electrophoresis, gels were dried and autoradiographed. The negative control used for binding was an anti-actin antibody. Fully duplexed (FD) (CTG)50/(CAG)50 DNAs were used to test for specificity of binding, with linearized p-Bluescript plasmid additionally being used as a competitor during antibody binding.

4.3.6 DNA-Immunoprecipitation

Immunoprecipitation of slipped-structures in patient DNA was carried out using the anti-DNA junction monoclonal antibody 2D3. To minimize random shearing of the extensively expanded CTG-tract containing DNAs, I restriction digested 1 µg of genomic DNAs with BamHI and BbsI overnight at 37°C, in order to release the repeat-containing genomic fragment from the rest of the genome. 50 µl of hybridoma 2D3 culture supernatant containing ~5 µg/ml immunoglobulin was diluted 1:1 with PBS and added to the restriction digested patient DNA the following day and incubated on ice for one hour. 80 µl Protein G beads (Millipore, 0.2 mg/ml) were prepared by washing once with 500 µl 0.15 M NaCl and 10 mM Tris buffer (pH 7.4) and once with 500 µl 1 x TE buffer (pH 7.4), and resuspended in 250 µl 1 x TE. In order to extract antibody-bound DNA, the resuspended beads were added to the antibody-DNA mixture and incubated on ice for 1 hour, mixing occasionally. The antibody/DNA/bead complex was washed free of unbound DNA three times with 1 ml buffer containing 0.5 M NaCl, 1 mg/ml bovine serum albumin, and 10 mM Tris (pH 7.4) by centrifuging the bead complex at 11000 rpm for 20 sec, decanting the supernatant, and gently resuspending the pellet in new wash solution. This was repeated with a buffer containing 0.15 M NaCl, 10 mM Tris (pH 7.4) and 0.1% Nonidet P-40. The affinity purified DNA was then eluted by resuspending the complex in 1 x TE buffer (pH 7.4) containing 2% SDS followed by centrifugation at 11000 rpm for 2 min. DNA was further purified by phenol-chloroform-isoamyl alcohol extraction, followed by 100% ethanol precipitation. Previous studies have found that DNA-immunoprecipitations, with few exceptions, can bring down DNA fragment sizes of 0.370-23 kb [242, 251, 252], arguing against a possible bias against bringing down the larger CTG-expanded DM1 allele.

4.3.7 Polymerase chain reaction protocols

Multiplex PCR analysis was carried out using primer sets 407-409 and CTCFIIa-CTCFIIb (Table 4.2) with the following conditions: 95°C for 3 min, 95°C for 45 sec, an annealing temperature of 67°C for 30 sec, an extension at 72°C for 2 min for 27 cycles. Products were resolved on a 4% non-denaturing acrylamide gel. Trinucleotide-primed PCR analysis was carried out using primer set P3R, P4CTG, and Somy4R, with the following conditions: 95°C for 5 min, 95°C for 1 min, an annealing temperature of 62°C, an extension of 72°C for 2 min for 30 cycles, with a final extension time of 10 min. Amplification of the Lamin B2 locus was carried out using primers B13dx and B13sx, with the following conditions: 95°C 5 min, 95°C 30 sec, an annealing temperature of 64°C 45 sec, an extension of 72°C 45 sec over 30 cycles. PCR across the DM1 locus was carried out with primers 407 and 409 with the following conditions: 95°C 5 min, 95°C 45 sec, an annealing temperature of 68°C 45 sec, and extension at 72°C 1 min for 30 cycles. Quantification PCR at the CTCFII site was carried out using CTCFIIa and CTCFIIb primers using the following conditions: 95°C 5 min, 95°C 45 sec, an annealing temperature of 65°C 45 sec, and an extension at 72°C 45 sec for 30 cycles.

Table 4.2. Primers used in the experiments carried out in this chapter

4.3.8 Mung Bean Nuclease, T7EndonucleaseI, and restriction enzyme digestion

10 units of MBN or T7endoI were incubated with 200 ng DM1 patient genomic, un-IP'd DNA in NEB Buffer 2 (New England Biolabs) at 30°C for 30 minutes and 37°C for 30 minutes, respectively. After digestion, the DNA was subjected to TP-PCR (see Table 4.2 for primers). Products of TP-PCR with a fluorescently labeled primer were analyzed by capillary electrophoresis at The Centre for Applied Genomics [Toronto]. TP-PCR products generated with standard, non-fluorescent primers were run on a 1% agarose gel at a constant 100 V and visualized under UV light. Restriction digestion of patient genomic DNA before IP was done with 100 units of BbsI and BamHI per µg of genomic DNA in NEB Buffer 2 (New England Biolabs) at 37°C overnight.

4.3.9 Nuclease accessibility protocol

Basic Buffer A was made (60 mM KCl, 15 mM NaCl, 0.15 mM spermine, 0.5 mM spermidine, 15 mM 2-mercaptoethanol, 15 mM Tris, adjusted to pH 7.4 with HCl) as per [253], with the addition of 0.05 M NaHSO3 to inhibit proteolysis and 0.5% Triton X to remove contaminating cytoplasmic membrane. Fresh frozen tissues were homogenized rapidly in 7 ml/g of Buffer A (+0.34 M sucrose, 2 mM EDTA, 0.5 mM EGTA). Homogenates were layered on 0.33 volumes of Buffer A (+1.37 M sucrose, 1 mM EDTA, 0.25 mM EGTA) and centrifuged for 15 min at 16,000 x g. The nuclear pellet was gently dispersed in 7 volumes Buffer A (+2.4 M sucrose, 0.1 mM EDTA, 0.1 mM EGTA), layered over an equal volume of the same buffer, and centrifuged for 45 minutes at 75,000 x g. The nuclear pellet was washed once in Buffer A (+0.34 M sucrose) by centrifuging for 15 min at 16,000 x g. The nuclei were then resuspended in 5% sucrose + 5 mM Tris (pH 7.4). Concentrations of nuclei were adjusted to 0.5 mg/ml (A260=10) using the appropriate enzyme reaction buffer (of either mung bean nuclease, T7 endonuclease, or AluI restriction enzyme, all from NEB). The mixtures were pre-incubated at the appropriate temperature (37°C for T7 and AluI, 30°C for MBN) for 5 minutes, and the enzyme was added (5000 units), and digested overnight. Genomic isolation was carried out as above, followed by TP-PCR.
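The equivalence between 0.5 mg/ml and A260 = 10 used for adjusting the nuclei concentration is consistent with the common approximation that one A260 unit corresponds to roughly 50 µg/ml of double-stranded DNA. The sketch below illustrates that conversion; the function names are hypothetical, and the 50 µg/ml factor is an assumption that the text does not state explicitly (absorbance of a nuclear suspension also includes contributions from RNA, protein and light scattering).

```python
# Assumed conversion factor: 1 A260 unit ~ 50 ug/ml double-stranded DNA.
DSDNA_UG_PER_ML_PER_A260 = 50.0

def a260_to_mg_per_ml(a260):
    """Approximate nucleic acid concentration (mg/ml) from an A260 reading."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 / 1000.0

def dilution_factor(measured_a260, target_a260=10.0):
    """Fold-dilution needed to bring a suspension down to the target A260."""
    if measured_a260 < target_a260:
        raise ValueError("sample is already more dilute than the target")
    return measured_a260 / target_a260

print(a260_to_mg_per_ml(10.0))   # target concentration in mg/ml
print(dilution_factor(35.0))     # example: a suspension reading A260 = 35
```

The example reading of A260 = 35 is made up purely to show usage.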

4.3.10 Electron microscopy

Immunoprecipitated DNA samples were mixed in a buffer containing 0.15 M NaCl, 1 mM MgCl2 and 2 mM spermidine, adsorbed to glow-charged thin carbon films, washed with a water/graded ethanol series and rotary shadow cast with tungsten. Samples were examined using a Philips 400 electron microscope. The NIH Image J program was used to measure the length of DNA molecules.

4.4 Results

4.4.1 Binding of anti-DNA junction antibody to slipped DNAs

To verify that the anti-junction DNA antibody recognizes slipped-DNA structures with varying slip-out sizes and slip-out numbers, I assessed 2D3 binding to various slipped-DNAs using an electrophoretic mobility shift assay (EMSA). Various slipped-DNA structures can form at trinucleotide repeats and these have been structurally characterized in detail by electrophoresis, chemical and enzymatic probing, and electron microscopy [26, 104, 126, 232] (Fig. 4.1). Previously, our lab demonstrated that 2D3 binds to homoduplex Slipped-DNAs (S-DNA) of 50 CTG/CAG repeats on complementary strands as well as Slipped Intermediate (SI) heteroduplex DNAs (with an excess of 20 CAGs or 20 CTGs) [232]. The putative slipped DNA at the DMPK repeats may involve slip-outs of various sizes, determined by the length difference between the two repeat-containing strands. To determine whether 2D3 would recognize and bind to smaller slip-outs, I made SI-DNAs with slip-outs of 1- or 3-excess repeat units of either the (CTG) or (CAG) strand, in addition to the larger slip-outs of 20 repeats and S-DNA. The antibody was incubated with radioactively labeled structured DNA and resolved on polyacrylamide gels to visualize antibody-DNA complexes, evident as slower migrating than the protein-free DNA (Fig. 4.3 A). As previously reported, 2D3 bound to homoduplex DNAs composed of multiple clustered slip-outs (S) on each molecule [26, 104, 126, 232]. The antibody bound all sizes of slip-outs (SI) tested, but not the fully-duplexed (FD) repeat-containing DNA. This binding and its persistence in the presence of a 25-fold excess of non-specific competitor (linearized plasmid) indicates the binding specificity of 2D3 for S- and SI-DNA. Importantly, the anti-DNA junction antibody did not induce the formation of slipped-DNA from fully-duplexed DNA (Fig. 4.3 A), consistent with its inability to induce cruciform or hairpin extrusion [235-237] (see also Fig. 4.4). Non-specific antibody-DNA binding was excluded, as an anti-actin antibody, which should not bind DNA, did not yield antibody-DNA complexes (Fig. 4.3 B). Thus, the anti-junction DNA antibody recognizes a variety of slip-out sizes as well as multiple clustered slip-outs, supporting the usefulness of the antibody as a tool to detect these structures.
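The relationship between strand lengths and slip-out size in SI-DNAs can be sketched in a few lines. This is an illustrative helper, not part of the experimental protocol: the function name is hypothetical, and the strand pairings listed are inferred from the repeat lengths given in the Methods (50, 49, 47 and 30 repeats) and the stated slip-out sizes of 1, 3 and 20.

```python
def slip_out(ctg_repeats, cag_repeats):
    """Return (structure class, strand carrying the excess, excess repeat units)
    for a duplex formed from a CTG strand and a CAG strand."""
    excess = ctg_repeats - cag_repeats
    if excess == 0:
        # Equal lengths: fully duplexed, or slipped homoduplex (S-DNA)
        # if the strands pair out of register.
        return ("FD or S-DNA", None, 0)
    strand = "CTG" if excess > 0 else "CAG"
    return ("SI-DNA", strand, abs(excess))

# Assumed pairings that would yield the slip-out sizes tested in the EMSA.
for ctg, cag in [(50, 49), (50, 47), (50, 30), (30, 50), (50, 50)]:
    print((ctg, cag), slip_out(ctg, cag))
```

The excess repeats sit on the longer strand, which is why slip-outs can be generated on either the CTG or the CAG strand by swapping which strand is shorter.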

To determine whether slipped-DNAs are present in a disease locus of patient tissues, I developed a DNA-immunoprecipitation strategy. To be sure that I was not inducing slipped-DNAs through manipulations (genomic isolation, immunoprecipitation etc.), I used several precautionary measures and performed a series of controls. Precautionary measures included preparing genomic DNAs under conditions that avoided DNA denaturation or shearing. Genomic DNAs were prepared from patient tissues under non-denaturing conditions to avoid inducing the formation of unusual DNA structures (see Fig. 4.4 B). Slipped-DNAs cannot be induced by DNA supercoiling, and their biophysical stability does not depend upon supercoiling [104, 254]. Since supercoiling does not induce base-unpairing for any structural transitions of expanded CAG/CTG repeats [254], and since slipped-DNAs do not depend on supercoiling, I strongly disfavor the extrusion of slipped-DNAs by chromatin removal during DNA isolation. Slipped DNAs can stably exist in linear unrestrained DNA [104]. To reduce DNA shear, enhance immunoprecipitation and eliminate the binding of 2D3 to supercoil-dependent DNA structures (cruciforms), genomic DNAs were restriction digested, and then immunoprecipitated using 2D3 and protein G agarose beads. While the 2D3 antibody does not induce cruciform or hairpin extrusion (see above) or slipped-DNAs in short fragments (Fig. 4.3 A), it was important to assess its effect upon disease-relevant expansions. Control experiments using in vitro prepared DNAs with (CTG)500 showed that the antibody did not induce slipped-DNAs into the expanded tracts (Fig. 4.4 A). Furthermore, I tested the possibility that structures might be induced during genomic isolation by adding the synthetically formed fully duplexed (CTG)500 to tissues and following it through the genomic isolation protocol; again, structure formation was not induced (Fig. 4.4 B). Using these conditions and controls, I proceeded to assess the presence of slipped-DNA in patient tissues.

Figure 4.3. 2D3 and control antibody binding.

(A) The anti-DNA junction antibody 2D3 bound slip-outs of 1-, 3- and 20-excess repeats as well as S-DNA by band-shift assay. The shift persisted under competitive conditions. (B) To test whether the bandshift observed in panel A was due to the specific binding of the anti-junction antibody, or a result of a large amount of non-specific antibody binding, we incubated slipped substrates with anti-actin antibody. No protein-DNA complexes were observed at any antibody concentration tested. FD= fully-duplexed. S= slipped homoduplex. SI= slipped intermediate.

Figure 4.4. (A) The anti-DNA junction antibody 2D3 does not induce structure in long repeat-containing linear DNA (Southern blot). (B) The genomic isolation protocol does not induce formation of slipped structures in (CTG)50 and (CTG)500 linear DNA.

(A) After IP of linearized, end-labeled DNA, no altered migration was detected in the flowthrough (SN=supernatant) DNA after electrophoretic migration on a polyacrylamide gel, and no DNA was detected in the IP lane, indicating a lack of induced structure. (B) After genomic isolation, the untreated labeled DNAs (-) were run alongside the linear DNAs that had been through the isolation (+) on a polyacrylamide gel. No altered migration was observed between the two linear DNAs. Both of the above results indicate a lack of induced or altered structure.

4.4.2 Determination of repeat (CTG) size and heterogeneity in DM1 patient tissues

To determine whether the anti-junction antibody would recognize and immunoprecipitate slip-outs that may have arisen at the CTG-expanded DM1 locus in human patient DNAs, I first had to characterize the DM1 (CTG) tract length in a set of tissues from DM1 patients and a control individual by Southern blot (Fig. 4.5, Table 4.1) [1]. A tissue-specific length variation of the expanded allele was evident in both patients used in this study, with many tissues showing a heterogeneous range of repeat sizes (Fig. 4.5, Table 4.1). Increased length heterogeneity within a tissue and larger expansions between tissues are indicative of high levels of active (CTG) instability, while limited heterogeneity (such as a discrete size) and shorter expansions reflect less instability [1]. The largest and most heterogeneous expansions were present in the most affected tissues (cortex, muscle and heart), while the cerebellum showed the shortest expansion with no length heterogeneity (Fig. 4.5, Table 4.1). Control non-DM1 samples showed only non-expanded alleles. These tissues were used throughout the experiments conducted within this chapter.

4.4.3 Immunoprecipitation of slipped-DNA from DM1 patient tissues: allele specificity

To determine whether 2D3 would recognize and bind to slipped-DNAs formed at the endogenous DM1 locus in patient DNAs, I established a protocol which immunoprecipitates slipped-DNAs (Fig. 4.6). Initial experiments were performed on DNAs isolated from various tissues, using methods which would not be expected to either introduce or remove any pre-existing aberrant DNA structures. Subsequent experiments were performed on chromatinized DNAs to further support their presence prior to isolation (see above). Immunoprecipitated DNAs were subsequently characterized by various means, as described below.


Figure 4.5. Southern blot sizing of the expanded DM1 (CTG) repeat in various tissues from the two patients used in this study.

Tissue-specific length variation of the expanded allele was evident in both patients, with many tissues showing a heterogeneous range of repeat sizes. Increased length heterogeneity within a tissue and larger expansions between tissues are indicative of high levels of active CTG instability, while limited heterogeneity (such as a discrete size) and shorter expansions reflect less instability. Note the considerable degree of smearing at the upper ranges of the CTG repeat sizes.


Figure 4.6. Protocol for isolating slipped-DNAs from genomic DNA.

Genomic DNA is isolated from patient tissues using a non-denaturing protocol (see Methods), after which the DNA is digested both upstream and downstream of the repeat. The digested DNA is then incubated with the 2D3 antibody, pulled down using Protein G beads, released, and the resultant immunoprecipitated DNA characterized. Further explanation can be found in Methods of this chapter.


Immunoprecipitation of slipped-DNA structures would be expected to enrich for the expanded DM1 disease allele, as the non-expanded allele would not be expected to contain slipped-DNAs. Slipped-DNAs were in fact present on the expanded but not the non-expanded allele, as outlined below. Because the expanded CTG tract in DM1 patient tissues is beyond the PCR-amplifiable size range, I devised a multiplex PCR to distinguish the disease allele from the non-disease allele in the IP'd DNAs. As outlined in Figure 4.7 A, two sets of PCR primers amplifying adjacent regions of the DM1 locus were used in the same PCR reaction. The regions included the (CTG) repeat and the downstream CTCF binding site. In the presence of both the expanded disease allele and the non-expanded allele, I expected three PCR products: the amplicon from the non-expanded (CTG) tract, the amplicon from the CTCF sites of both alleles, and a larger product amplified from across the non-expanded (CTG) tract and the downstream CTCF site. I did not expect a PCR product either from across the very large (CTG) expansion or from across the expansion and the adjacent CTCF site. Thus, when only the expanded (CTG) allele is present, as might be expected in samples IP'd for the presence of slipped-DNAs, I would expect only one PCR product (derived from the CTCF site), whereas when both expanded and non-expanded alleles are present I would expect all three PCR products. This multiplex PCR assay was applied to IP'd samples derived from various tissues of a DM1 patient with an expanded (CTG) tract (Fig. 4.7 B). Each genomic DNA prior to immunoprecipitation yielded all three multiplex PCR products, as expected when both DM1 alleles are present. The IP'd DNAs yielded only the one CTCF PCR product (Fig. 4.7 B). The absence of the upper two bands provides strong support that the IP'd DNA is enriched for the expanded DM1 allele and deficient in the non-expanded allele.

That the IP'd DNA was enriched for slipped-DNA was further supported by the inability to detect DNAs derived from a control region that is devoid of unusual structure-forming DNA sequence motifs (Fig. 4.8). Additional support for an enrichment of the expanded DM1 disease allele is the limited amount of DM1 DNA immunoprecipitated from a control (non-DM1) individual (see quantification below). Thus, the inability to detect PCR products from the (CTG) tract, together with the detection of the adjacent DM1 region, supports a specific enrichment of the expanded DM1 allele through anti-DNA junction immunoprecipitation. This strongly supports the conclusion that slipped-DNA is present in the expanded allele.
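
The band-counting logic of the multiplex assay can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not part of the protocol: the function name, the band labels, and the 200-repeat amplification cutoff are assumptions chosen for the example.

```python
# Hypothetical cutoff: repeat tracts longer than this are treated as too large
# to amplify across. The value is an assumption made for illustration only.
AMPLIFIABLE_LIMIT_REPEATS = 200


def expected_bands(allele_repeat_counts, limit=AMPLIFIABLE_LIMIT_REPEATS):
    """Return the multiplex PCR bands expected for the alleles in a sample.

    Each allele always yields the CTCF-site amplicon. Only an allele whose
    (CTG) tract is within the amplifiable range also yields the repeat
    amplicon and the larger product spanning the repeat and the CTCF site.
    """
    bands = set()
    for repeats in allele_repeat_counts:
        bands.add("CTCF site")
        if repeats <= limit:
            bands.add("CTG repeat")
            bands.add("repeat + CTCF site")
    return sorted(bands)


# Genomic input with one non-expanded and one expanded allele: three bands.
genomic = expected_bands([5, 4000])
# Sample containing only the expanded allele: the CTCF band alone.
ip_only_expanded = expected_bands([4000])
```

Under these assumptions, a sample holding both alleles gives three bands and a sample holding only the expanded allele gives one, which is the pattern used above to read the gels in Fig. 4.7 B.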


Figure 4.7. Immunoprecipitated DNA is enriched for the expanded DM1 allele.

(A) Multiplex PCR protocol to determine the DM1 allele specificity of IP'd DNA, where "n" and "N" are the non-expanded and expanded alleles, respectively. Two primer pairs (black, across the repeat, and blue, across the adjacent CTCF binding site) are used in the same PCR reaction in order to differentiate between the expanded and unexpanded alleles in genomic and IP'd DNA. Expected products are shown in the schematic gels for each case. In an un-IP'd genomic DNA sample, one would expect to see three bands when the products are electrophoresed on an agarose gel, representative of both the unexpanded and the expanded allele. In an IP'd sample, one would expect a single band, the lowest, representative of the expanded allele only. (B) Multiplex analysis of ADM5 patient tissue DNAs shows only the lower band in IP'd DNAs, indicating a strong enrichment of the expanded but not the unexpanded allele. Un-IP'd samples are marked as "-", and IP'd samples are marked as "+". M = marker.


Figure 4.8. The 2D3 antibody does not pull down non-specific DNAs devoid of structure-forming sequences

The lamin-B2 region was used as a control locus, as it is free of sequence motifs capable of assuming unusual DNA structures (there are no Z-DNA, cruciform, triple-stranded DNA, quadruplex DNA, or slipped-DNA forming sequences in this region). PCR amplification of the lamin-B2 locus from the 2D3-immunoprecipitated DNA (IP) did not yield detectable products. In contrast, the supernatant post-IP (SN), and the starting material (genomic) contained the lamin-B2 locus. Shown are representative PCR reaction products of DM1 patient ADM5 cerebellum DNAs run on a 4% polyacrylamide gel.


I used an independent, direct method to determine the enrichment of the expanded DM1 allele in the immunoprecipitated material. Triplet-primed PCR (TP-PCR), which has been used as a myotonic dystrophy diagnostic tool, allows for amplification of short stretches of a large expanded (CTG) tract [255]. TP-PCR uses a primer that hybridizes to the repeat and a primer that hybridizes downstream of the repeat (Fig. 4.9 A). TP-PCR of the expanded (CTG) allele typically yields a heterogeneous range of (CTG) sizes in the PCR products, electrophoretically visible as a smear. This range of PCR products arises by PCR amplification from the repeat-specific primer hybridized at random positions along the (CTG) expansion, but relatively close to the flanking primer. TP-PCR across the non-expanded allele yields a distinct but shorter size range of PCR products, due to the limited locations to which the repeat primer can hybridize (Fig. 4.9 B). TP-PCR analysis of the DM1 patient DNAs prior to immunoprecipitation revealed the expected smears of products for both the expanded and non-expanded alleles (Fig. 4.9 B, left lane). However, TP-PCR of the immunoprecipitated sample revealed predominantly the longer-range smeared products (middle lane), while the DNA in the supernatant following immunoprecipitation from the same experiment revealed predominantly the shorter products derived from the non-expanded allele (right lane). These results directly support the conclusion that the expanded DM1 disease allele was enriched by the anti-DNA junction antibody, consistent with the interpretation that slipped-DNAs are present along the expanded allele.
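
Why TP-PCR gives a short ladder for the non-expanded allele and a long smear for the expanded allele can be illustrated with a simplified model. The sketch below is not the assay itself: the flank length, the number of repeats covered by the repeat primer, and the maximum product size are all hypothetical values, and the primer tail used in real TP-PCR is ignored.

```python
def tp_pcr_product_sizes(repeats, flank_bp=100, primer_repeats=5,
                         max_product_bp=1000):
    """List product sizes (bp) for a repeat primer annealing in register.

    The repeat primer is assumed to cover `primer_repeats` units and may
    anneal at every in-register position along the tract. Each position
    further from the flanking primer adds one repeat unit (3 bp) to the
    product. Products longer than `max_product_bp` are assumed not to
    amplify, which models priming "relatively close to the flanking primer".
    All default values are illustrative assumptions.
    """
    sizes = []
    for covered_repeats in range(primer_repeats, repeats + 1):
        size = flank_bp + 3 * covered_repeats
        if size > max_product_bp:
            break
        sizes.append(size)
    return sizes


# A short, non-expanded tract gives a small, discrete ladder.
normal_ladder = tp_pcr_product_sizes(12)
# A very long tract gives a ladder that runs up to the size limit (a smear).
expanded_ladder = tp_pcr_product_sizes(4000)
```

In this model the non-expanded allele produces only a handful of products, whereas the expanded allele produces products at every 3 bp step up to the size limit, which on a gel would appear as the longer smear described above.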

4.4.4 Quantification of immunoprecipitated DNAs

The levels of slipped-DNA might be expected to correlate with the levels of (CTG) instability, which vary between tissues of the same DM1 individual [1]. I quantified the amount of DM1 DNA IP'd from two tissues from the same patient harboring high and low levels of (CTG) instability, indicated by the larger and heterogeneous (CTG) lengths in the heart (ranging from 4100 to 6000 repeats), and the more discrete and shorter lengths in the cerebellum (1100-1500 repeats), respectively (Table 4.1). Quantification was accomplished using the highly accurate competitive quantitative PCR, relative to a cloned internal competitor (detailed in Fig. 4.10). The number of IP'd DNA molecules was significantly greater in the heart than in the cerebellum (relative to input) (Fig. 4.11), an enrichment that is consistent with a greater amount of slipped-DNA present in tissues having greater levels of (CTG) instability. A similar analysis of DM1 skeletal muscle showed that it too was significantly enriched for slipped-DNA (see Sensitivity, below).
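
The arithmetic behind competitive quantitative PCR can be sketched as follows. The competitor dilution series is the one listed for Figure 4.10 B, but the band ratios are synthetic, idealized values generated for this example (they are not measured data), and the log-log least-squares fit is one common way to locate the 1:1 point, assumed here rather than taken from the Methods.

```python
import math


def equivalence_point(competitor_copies, competitor_to_template_ratios):
    """Estimate template copies as the competitor amount where the ratio is 1.

    Fits log10(ratio) = slope * log10(competitor) + intercept by ordinary
    least squares, then solves for the competitor amount at log10(ratio) = 0.
    """
    xs = [math.log10(c) for c in competitor_copies]
    ys = [math.log10(r) for r in competitor_to_template_ratios]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 10 ** (-intercept / slope)


# Competitor dilution series (molecules per lane) as listed for Figure 4.10 B.
COMPETITOR = [346_000, 173_000, 86_000, 57_666, 34_600, 17_300, 8_650]
# Synthetic, idealized competitor/template band ratios (NOT measured data),
# constructed so that the 1:1 point falls at 34,600 molecules.
SYNTHETIC_RATIOS = [c / 34_600 for c in COMPETITOR]

estimate = equivalence_point(COMPETITOR, SYNTHETIC_RATIOS)
```

With real densitometry values in place of the synthetic ratios, the returned estimate would be the number of template molecules in the IP'd aliquot, which can then be expressed relative to input.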


Figure 4.9. Triplet-primed PCR analysis of immunoprecipitated DNA

(A) Triplet-primed PCR protocol for IP'd DNAs (explained in text). Expected PCR products are schematically represented on the right. (B) TP-PCR reveals predominantly the expanded allele in IP'd DNA (black arrowhead), and an absence of the non-expanded allele (blue arrowhead), indicating specific immunoprecipitation of the expanded allele. The supernatant (SN) is depleted of the expanded but not the non-expanded allele.


Figure 4.10. Method of quantification for immunoprecipitated materials

(A) The immunoprecipitated material was quantified by the highly accurate competitive quantitative PCR [256, 257]. More sensitive and reproducible than RT-PCR, competitive quantitative PCR involves the co-amplification of a known amount of a cloned competitor, differing from the PCR target by a small insertion, along with a set amount of IP'd material. In the left panel in (A), the DM1 locus is shown (flanked by CTCF binding sites). The location at which IP'd DNA is being quantified along the locus (for example, the CTCF site) is PCR amplified and cloned, and a piece of DNA of roughly 50 bp is inserted into this clone. This competitor can now be amplified in the same reaction as genomic DNA, but will be electrophoretically resolvable when run on a gel. The competitor is then calibrated using genomic DNA of the patient from which IP'd DNA is being quantified (right panel in (A); C/T = competitor/template). The ratio of a known amount of competitor to input IP'd DNA allows for the accurate calculation of the total number of molecules in each IP'd sample. Products were quantified by densitometry (ImageQuant). (B) Representative competitive quantitative PCRs for each tissue quantified, with one example of ADM9 heart quantified by lane. In each example, the upper band of products is the linear PCR amplification of competitors. The lower bands are the PCR products from the IP'd material of the specified tissue. The number of molecules in each lane is given for one representative PCR reaction, with the 1:1 ratio indicated by an asterisk. A representative quantification chart is also given. The x-axis of each graph indicates the ratio of IP'd DNA to competitor DNA in the specific lane indicated. The appropriate range of competitor dilutions was chosen individually for each tissue quantified, so the gels are not necessarily comparable to each other lane for lane.


(B) Competitor concentrations (in number of molecules), by lane, in ADM9 heart competitive PCR panel 1 (lane 1 = point 1 on the graph):

Lane 1: 346,000
Lane 2: 173,000
Lane 3: 86,000
Lane 4: 57,666
Lane 5: 34,600 *
Lane 6: 17,300
Lane 7: 8,650

* 1:1 ratio at this dilution

x-axis = lane of the gel (which contains a known number of molecules); y-axis = ratio of competitor to template (IP'd) DNA


Figure 4.11. Quantification of immunoprecipitated DNA from patient and control samples

Quantitative competitive PCR indicates a significant increase in the amount of IP'd/input from heart compared to cerebellum DNA of the same DM1 patient (unpaired two-tailed t-test, p = 0.03). No significant difference was found between control tissues.

from a control non-DM1 individual were not enriched differentially between cerebellum and heart (Fig. 4.11). These findings reveal a trend of increased levels of slipped-DNAs in tissues with higher levels of instability.

4.4.5 Sensitivity of patient DNAs to structure-specific enzymes

As a further test for the presence of slipped-DNAs in DM1 patient tissues, DNAs were treated with enzymes that specifically digest slipped-DNA structures. Mung bean nuclease (MBN) and T7 endonuclease I (T7endoI) have been shown to specifically cleave the single-stranded regions in slip-outs and across DNA junctions of slipped-DNA [26, 104] (Fig. 4.12). Following enzyme digestions, DNAs were subjected to TP-PCR to assess the potential loss of the expanded alleles (Fig. 4.13). If the expanded allele contains slipped-DNA, cleavage by these enzymes should decrease the amount of the smear representing the expanded allele. The amount of expanded repeat products in several DM1 patient tissues was significantly reduced following treatment with MBN or T7endoI (Fig. 4.14). Non-DM1 control DNAs showed no difference after enzyme treatment (Fig. 4.14). The significant reduction of the expanded allele by either structure-specific enzyme is consistent with a portion of the DM1 disease alleles being in the slipped-DNA conformation.

It was of interest to know whether the slipped-DNAs were present within tissues in the native chromatin context. Furthermore, the detection of slipped-DNAs directly on chromatinized DNA in tissues, where freezing and thawing does not affect chromatin packaging relative to fresh tissues [258], would provide further evidence for the existence of slipped-DNAs prior to chromatin removal during DNA isolation in the above experiments. To assess the presence of slipped-DNAs in tissues, I used an established DNA nuclease accessibility assay, which assesses DNA in its native chromatin context [258]. Unusual DNA conformations have been detected in native chromatin through nuclease digestion [259-261]. For example, cruciform and stem-loop structures, similar to slipped-DNAs, were susceptible to nuclease digestion and found to stably reside in the inter-nucleosomal region [260]. DM1 patient and control tissues were treated with buffer only as a negative control, with both T7endoI and mung bean nuclease to test for digestion, or with AluI restriction enzyme to test for off-target digestion (Fig. 4.15, 4.16, 4.17). Following treatment, DNA was subjected to TP-PCR to assess the potential loss by digestion of the expanded alleles. There was a significant reduction in the expanded allele from


Figure 4.12. Sensitivity of DM1 patient DNAs to structure-specific enzymes

DM1 patient DNAs were treated with T7 endonuclease I or mung bean nuclease. The schematic shows the locations at which each of these nucleases cleaves within a slipped-DNA structure.


Figure 4.13. TP-PCR analysis of samples +/- structure-specific enzyme digestions, assessed by GeneScan

Products of TP-PCR with a fluorescently labeled primer were analyzed by capillary electrophoresis with scans shown, where the green line plots the TP-PCR products. Skeletal muscle from the two patients is shown, as well as from the control individual. The non-expanded and the expanded alleles are indicated. Pre- and post-enzyme treatments are also indicated. A decrease in the expanded allele signal is seen after treatment with either mung bean nuclease or T7 endonuclease I, compared to untreated, with reduced amounts of the expanded repeat allele visible as a reduced tail running towards the right of the scan (A: ADM5 skeletal muscle; B: ADM9 skeletal muscle; C: control skeletal muscle).


Figure 4.14. TP-PCR analysis of samples +/- structure-specific enzyme digestions, agarose analysis

(A) Agarose gel electrophoresis showing decreased signal of the expanded allele after both mung bean nuclease and T7 endonuclease I digestion, with control DNA showing no difference. (B) The reduction of the expanded allele was quantified by ImageQuant software as a percentage of the total amount of DNA in each lane, compared to untreated genomic DNA. There is a significant reduction in the expanded allele after treatment with both enzymes in both patients' skeletal muscle (paired two-tailed t-test, ADM5 skeletal muscle genomic DNA vs. MBN treatment, p = 0.0015, and vs. T7endoI treatment, p = 0.0015; ADM9 skeletal muscle genomic DNA vs. MBN treatment, p = 0.003, and vs. T7endoI treatment, p = 0.0009). There was no significant difference after MBN or T7endoI treatment in non-DM1 individual DNA. M = marker.


Figure 4.15. TP-PCR analysis of samples +/- structure-specific enzyme digestions within native chromatin context.

Patient and control tissues were subjected to digestion by structure-specific enzymes (MBN and T7endoI) or a control enzyme (AluI) within the native chromatin context (see Methods, "Nuclease Accessibility Assay"). Representative scans of control cerebellum, ADM9 cerebellum, control heart and ADM9 skeletal muscle, either undigested or digested by the indicated enzymes.


Figure 4.16. Quantification of areas under the peak in Figure 4.15

A comparison of the areas under the peak in ADM9 muscle treatments (untreated, MBN+T7, and AluI), and ADM9 cerebellum treatments (untreated, MBN+T7). Each peak is represented as a percentage of the first peak, the first peak being the start of the normal and expanded repeats combined. Additionally, the percent reduction in area under the peak in treated compared to untreated samples is given. Peak 9 is the location at which the normal allele ends and only the expanded allele is being scanned. Treatment of skeletal muscle with MBN+T7 decreases the area under the peak in the expanded allele by 21-54%, with no difference after AluI treatment, and no difference after treatment of the cerebellum.
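
The normalization described in this caption can be written out as a short calculation. The sketch below assumes one plausible reading of the caption (each peak area is first expressed as a percentage of the first peak in its own scan, and the reduction is then computed on those percentages); the function name and the peak areas used are hypothetical and are not taken from the GeneScan data.

```python
def percent_reduction(untreated_areas, treated_areas):
    """Percent reduction per peak after normalizing to each scan's first peak.

    Both inputs are lists of peak areas in the same peak order. Each area is
    converted to a percentage of the first peak of its own scan, and the
    reduction is reported relative to the untreated percentage.
    """
    first_untreated = untreated_areas[0]
    first_treated = treated_areas[0]
    reductions = []
    for untreated, treated in zip(untreated_areas, treated_areas):
        untreated_pct = 100 * untreated / first_untreated
        treated_pct = 100 * treated / first_treated
        reductions.append(100 * (untreated_pct - treated_pct) / untreated_pct)
    return reductions


# Hypothetical peak areas (arbitrary units), for illustration only.
example = percent_reduction([1000, 500, 200], [1000, 400, 100])
```

Normalizing to the first peak of each scan makes the comparison independent of differences in total signal between the treated and untreated runs.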


Figure 4.17. Sensitivity of DM1 patient DNAs to structure-specific enzymes when treated within their native chromatin context

(A) Agarose electrophoretic migration of native-chromatin context DNA after TP-PCR, showing decreased signal of the expanded allele in ADM9 skeletal muscle after both mung bean nuclease and T7 endonuclease I digestion, but not with AluI. Neither control tissue, nor the ADM9 cerebellum, showed any significant difference after any treatments. (B) The reduction of the expanded allele was quantified by ImageQuant software as a percentage of the total amount of DNA in each lane, compared to untreated genomic DNA. The graph shows the significant difference in the reduction of the expanded allele after MBN and T7 treatment, compared to both untreated and AluI treatment (p = 0.0038), n=3 experiments. There is no significant difference between untreated and MBN+T7 treated ADM9 cerebellum digested in the native chromatin context, nor in either control tissue.


patient ADM9 muscle tissue when treated with T7endoI and MBN (p = 0.0038), but not when treated with AluI (p = 0.255). There was no significant reduction in control cerebellum or heart, or in patient cerebellum, after treatment with either T7endoI and MBN, or AluI (Fig. 4.15, 4.16, 4.17). These results are consistent with the above results in that the tissue with greater (CTG) instability contains more slipped-DNAs, and control tissues do not. Thus, slipped-DNAs were present in the native chromatin in DM1 tissues, at levels that correlate with levels of instability.

4.4.6 Electron microscopic analysis of immunoprecipitated DNAs

In order to biophysically characterize the slipped-DNAs, I visualized IP'd DNA by electron microscopy. Electron microscopic analysis of the immunoprecipitated slipped-DNAs revealed multiple bends, kinks, bulges, branched arms, and regions with increased thickness relative to control DNA (compare Fig. 4.18, linear and induced structure, to Fig. 4.19). A significantly increased number of molecules with slip-outs was identified in tissues showing high levels of (CTG) instability compared to the cerebellum, which showed the lowest level of instability (skeletal muscle vs. cerebellum, two-sided t-test, p = 0.005; pancreas vs. cerebellum, two-sided t-test, p = 0.0202; all unstable tissues (heart, liver, pancreas, cortex, skeletal muscle) vs. cerebellum, p = 0.0386; see Fig. 4.20 B). The size of the slip-outs presented a bimodal distribution ranging from 1-100 repeats with peaks at ~30 and <10 repeats (Fig. 4.20 A). Multiple slip-outs were clustered along a given DNA, with distances of <100 bp between slip-outs (Fig. 4.20 B). These features were also present in in vitro-induced slipped-DNAs in synthetic (CTG)800 (Fig. 4.18) and are consistent with previous electron micrographs showing multiple short slip-outs in (CTG) repeat DNAs [26, 126]. The immunoprecipitated DNAs were shorter than expected, possibly due to IP-induced DNA shearing of the very long repeat expansions that I studied, a phenomenon previously observed for DNAs enriched in single-strand nicks, gaps or DNase I hypersensitive sites. An enrichment of such strand breaks may be expected for arrested repair of clustered slip-out lesions [78], a sensitivity to site fragility [60], and susceptibility of slipped-DNAs to double-strand breaks [262].


Figure 4.18. Electron microscopic imaging of linear and induced-structure DNA

As a positive control for slipped-DNAs, in vitro-induced slipped-DNAs in the synthetic (CTG)800 repeat-containing DNA fragments and their non-slipped variant were analyzed by EM. The fully base-paired (CTG)800 fragment appears with smooth contours (left panel), while the (CTG)800 fragments with induced slipped-DNAs appear thicker, with bends and kinked structures (right two panels).


Figure 4.19. Electron microscopic analysis of slipped-DNAs

(A) Multiple clustered short slip-outs were present in the immunoprecipitated DM1 patient DNAs (see also Fig. 4.18 for control experiment). Wide EM field of view. (B) EM images of immunoprecipitated DM1 DNAs and a control fully-duplexed DNA, in a narrow field of view which includes only one DNA fragment. Each of the non-control panels shows a fragment containing at least one visible slip-out.


Figure 4.20. Quantification and analysis of EM images of patient slipped DNA

(A) Quantification of slipped out molecules by tissue, as a percentage of total number of molecules seen in an EM field. Total number of molecules counted per tissue ranged from 6 to 32. Significant differences between tissues are noted with p-values determined using a two-sided t-test. (B) Analysis of slip-out sizes and the distance between slip-outs on immunoprecipitated slipped DNAs.


4.5 Conclusions and Discussion

I report here the isolation of slipped-DNAs present at an endogenous disease locus from various DM1 patient tissues. Our observations are supported by multiple independent, complementary experimental approaches. To better understand mutations leading to evolutionary variation, cancer-associated instability, and disease-causing mutation, it is crucial to study these structures at specific genomic loci and in relevant patient tissues. This is particularly important for trinucleotide repeat expansions as these events are expansion-biased, arising by processes distinct from contractions, with instability patterns varying between disease loci, and between mitotic and non-mitotic tissues. An understanding of these DNA mutagenic intermediates occurring in patients should provide insight as to how they may be processed and lead to mutations, as model systems may not accurately reflect what is ongoing in affected individuals. Previously it was reported that slipped-DNAs can form in an artificial cell system with CTG/CAG repeats placed at an ectopic locus, where instability was contraction-biased as opposed to the expansion bias present in affected patients [128]. However, although the authors report in vivo hairpin formation, that report could not comment upon structure formation at an endogenous disease locus, upon the variation of these structures between tissues showing variable instability, upon their persistence in patients, or upon the structural features of the slipped-DNAs. Furthermore, in that artificial system, instability and hairpin detection depended upon DNA replication, contrasting with the high levels of instability arising in post-mitotic tissues of patients. Here, I revealed slipped-DNAs at the expanded CTG/CAG DM1 disease locus in tissues including the brain and heart.

The brain in humans is essentially post-mitotic after birth [263], with both the cerebellum and cortex developing from very early on in embryogenesis and continuing up until the first years after birth [264, 265]. Similarly, the majority of cells in the heart are post-mitotic in nature, with cell division essentially ceasing in cardiac myocytes soon after birth and the dividing cell population in adults being less than 1% of the total number of cells [266-269]. Since (CTG) expansions continue to expand at a greater rate following birth [1], our detection of slipped-DNAs in non-mitotic tissues supports that they arose in the absence of DNA replication.

Slipped-DNAs are clusters of small slip-outs, with increased levels in DM1 patient tissues showing high repeat instability. This contrasts with the long-presumed concept that slipped-DNAs arise as isolated slip-outs on either strand of the duplex. That slipped-DNAs are clusters of short slip-outs gives new insight into the role they may play in the mutation process.


The majority of analyses addressing the potential roles of slip-outs have centered upon the presumption that single isolated slip-outs may form transiently and be processed to fully duplexed repeat-length mutations.

The most striking implication of our findings is that slipped-DNAs do not appear to be merely transient in nature, as previously believed [92], but may be persistent in vivo products of instability, consistent with their poor ability to be repaired [78]. This does not preclude them from also being transient in nature. However, the fact that they are abundant enough to be immunoprecipitated argues that numerous slipped-out structures are present at any one time within the tissues of a patient. Rather than being rapidly repaired following their formation as previously presumed, slipped-DNAs, particularly when short slip-outs are tandemly clustered along a given repeat tract, resist repair by interfering with each other's repair [78]. The persistence of clustered slip-outs in patient tissues with high levels of instability supports the formation of these as products of mutation, rather than serving only as transient intermediates. Intrastrand slippage may occur after attempted repair events on slipped-DNAs become arrested due to adjacent slip-outs. This shifts the repeats further out of register, leaving a gap that, when filled, results in an increased number of repeats in that strand and produces slipped-DNAs with an excess of repeats on one or both strands (Figure 4.21). Reiterations of such events in the absence of proper repair would lead to instability, where products are slipped-DNAs. That patient DNAs contain slipped-DNA at the repeats may have diagnostic implications, whereby the slower electrophoretic migration of a heteroduplex would inaccurately be interpreted as a fully-duplexed molecule with a longer repeat tract than is present in either strand of the heteroduplex (Figure 4.22). Essentially, the high end of the range would be incorrectly reported as being longer than it actually is.
Such erroneous broadening of the size range of repeat units could explain the often imperfect correlation of expansion size with clinical severity or age-of-disease onset, common to many trinucleotide repeat diseases [270-272]. Similar problems could arise with the PCR analysis of microsatellite instability, which in many DNA repair-defective tumors is used as a prognostic factor [273], with the additional stutter or shadow bands being interpreted as multiple discrete DNAs rather than as a single heteroduplex, which may complicate genotyping. Thus, the persistence of slipped-DNAs in patient tissues has implications on multiple levels.


Figure 4.21. Expansion and heteroduplex slipped DNA

Slipped homoduplex DNA that forms at a trinucleotide repeat may be a target of attempted repair. Interference by adjacent slip-outs may arrest repair, allowing for intrastrand slippage and the formation of a gap. When filled, this would result in an expansion in one of the strands, producing a heteroduplex as well as more slipped DNA.


Figure 4.22. Slipped-DNA: Southern blot and PCR analysis

(A) Southern blot analysis of expanded repeats can be interpreted as either multiple fully duplexed DNAs, or multiple (but smaller) slipped heteroduplex DNAs. In the latter interpretation, the largest product is in fact the result of the heteroduplexing of two smaller strands, with the end result being an over-estimation of the repeat length. Similarly, in (B) the PCR products of an expanded repeat showing multiple bands on a gel can be interpreted as either the genomic DNA containing multiple expanded alleles, or the genomic DNA containing one heteroduplex allele, with different numbers of repeat units on each strand. That patient DNAs contain slipped-DNA at the repeats may have prognostic implications, whereby the slower electrophoretic migration of a heteroduplex would inaccurately be interpreted as a fully-duplexed molecule with a longer repeat tract than is present in either strand of the heteroduplex. Essentially the high-end of the range would be incorrectly reported as longer than it actually is. This would lead to an inaccurate prognostic assessment. Such erroneous broadening of the size range of repeat units could explain the often imperfect correlation of expansion size with clinical severity or age-of-disease onset, common to many trinucleotide repeat diseases


The utility of isolating slipped-DNAs through a structure-specific antibody has applications that could reveal important processes in genetic variation arising during DNA damage, cancers and inherited diseases. This IP protocol has identified slipped-DNAs at a disease locus in patient tissues, but not in appropriate controls. While our findings are not definitive in vivo evidence, the fact that the levels of slipped-DNAs varied between tissues which showed varying levels of instability supports that these have arisen in the patient. Going forward, slipped-DNA identification and characterization could provide mechanistic and prognostic insights into medically important mutations in hypermutable viruses, mitochondria, hypervariable genomic regions, and fragile sites [274-278] as it has done herein for DM1 repeat instability.


5 Discussion and Implications

5.1 Cis-elements and instability

The causes and mechanisms of repeat instability in TNR diseases are an important avenue of research, as intergenerational instability can contribute to anticipation and phenotypically worse disease through generations, and somatic instability contributes to worsening symptoms throughout the lifetime of individuals harbouring these mutations. An understanding of any element that contributes to disease will aid in potential discoveries of therapeutic intervention strategies. Cis-elements are likely candidates for contributing to instability, especially the flanking sequence of the DNA, as it is known that TNRs of the same composition and relative length can show remarkably different repeat stabilities and lengths within their various genomic locations. The mouse lines described in Chapter 2 have either an intact or mutated CTCF binding site from the human ATXN-7 locus in a SCA7 mouse model, with the CAG tract initially containing 94 repeats, which is within the disease-causing range in humans. Here, I have shown that the CTCF binding site cis-element is an important regulator of instability. The presence of a wildtype CTCF binding site conferred stability to the CAG repeat; mice with mutated binding sites showed greater instability within the germline as well as somatically as they aged. This indicates that the loss of CTCF binding induces repeat instability both somatically and transgenerationally. This was the first published instance of an endogenous cis-element regulating the stability of a disease-causing TNR.

A previous SCA7 mouse model with an 8.3 kb deletion 3' of the CAG repeat showed a stabilization of the repeat tract [12], and the directed mutagenesis of the CTCF binding site in that same 8.3 kb region in the mouse model reported here conversely shows a decrease in stability, strongly supporting a role for CTCF binding in the stability of the SCA7 repeat. The wildtype binding of CTCF likely insulates the repeat from other elements that may be causing expansion. In the same 3' end in which the CTCF binding site is found, a putative replication origin has been mapped [65]. The placement of the origin of replication relative to the CTCF binding site suggests that the instability at this particular locus may be at least partially caused by a "fork-shift" model of repeat instability (Fig. 5.1) [67]. The fork-shift model states that the placement of a cis-element, such as the CTCF binding site in this particular case, between a DNA replication origin and the repeat itself could alter the progression of the


Figure 5.1. The fork-shift model of instability

The fork shift model of repeat instability states that, between the DNA replication origin and the repeat tract, there is a cis-element (blue X) that alters replication fork progression, leading to instability. OIZ- Okazaki initiation zone. Ori- origin of DNA replication

replication fork, therefore altering Okazaki fragment initiation and contributing to instability. Okazaki initiation occurs on the lagging strand of DNA synthesis, and is known to have preferential as well as excluded nucleotide sites for initiation of lagging strand DNA replication [67]. Depending upon the direction of initiation of DNA replication through the repeat, different flanking sequences as well as repeat tract sequences will have differential abilities to initiate Okazaki fragments. With the combined effect of a cis-element such as CTCF also potentially altering fork progression, the stability of the repeat may be further affected. In order to test whether the DNA replication origin is still active in the SCA7 mice, it could be mapped, as has been done previously in a mouse model of DM1 [66]. There is an interesting correlation of CTCF binding sites and TNR disease loci (Table 5.1), and so it is possible that this cis-element plays a role in instability in more than only a SCA7 context. The same CTCF mutation experiments as those carried out in Chapter 2 could be completed in mouse models of various other TNR diseases known to contain one or two flanking CTCF binding sites, in order to determine whether the role of CTCF in repeat instability is a more general phenomenon across TNR disorders. Additionally, due to the tissue-specific nature of instability in several TNR diseases, ChIP experiments could be carried out on mouse or human tissues showing variable instability in order to test for the binding of CTCF at these sites.

Other cis-elements are known to be coincident with certain expanded TNR disease-causing tracts, though few have been shown to have a direct effect on repeat stability. Replication origins have been shown to flank the human DM1 repeat, with only the downstream origin active in mice containing one copy of a 45 kb expanded human transgene [66]. Coupled with previous data showing that the direction of replication through the repeat tract alters repeat stability [51], this study is suggestive of another cis-element that affects stability in vivo at the DM1 locus. The same study also showed that mutations in the flanking CTCF binding sites led to decreased replication efficiency and increased repeat instability in vitro. Additionally, as discussed in the Introduction, transcription through expanded CTG repeats has been shown to affect repeat instability [69], though this study was not performed at an endogenous locus but at an exogenously added expanded (CTG) repeat. The formation of RNA/DNA hybrids [70, 71] has also been shown to affect stability in vitro and in E. coli, respectively. Methylation, another cis-acting element, has been shown to differ between control and affected individuals with

145

Table 5.1.Trinucleotide repeat disease loci known to be flanked by CTCF binding sites

146

Disorder/gene Repeat unit Proximal CTCF Disease Full/disease Binding Site

DM1/DMPK myotonic (CTG)/(CAG) 80 - >6000 Flanking dystrophy DRPLA/ATN1 Dentatorubral- (CAG)/(CTG) 49-88 Flanking pallidouysian atrophy

FRAXA/FMR1 (CGG)/(GCC) 230-2000 Flanking fragile X type A FXTAS/FMR1

Fragile x (CGG)/(GCC) 59-230 Flanking tremor/ataxia syndrome SCA2/ATX2 spinocerebellar (CAG)/(CTG) 32-200 Upstream ataxia 2

SCA7/ATXN7 spinocerebellar (CTG)/(CAG) 37-306 Flanking ataxia 7 HD/HTT >35 Huntington’s (CAG)/(CTG) Upstream disease

147 various TNR diseases [1, 82, 279, 280], although a clear correlation with instability is not always evident [1]. Although important work has and is being carried out in vitro, and in model systems such as yeast, bacteria, drosophila, mice, and even human tissue culture, none recapitulate the human disease perfectly. Much of the data gleaned from these studies must also be shown in human tissues in order to gain a true understanding of what is ongoing in patients. Ultimately, the mechanisms of repeat instability, especially those behind the large somatic differences that are evident between tissues in certain TNR diseases like SCA7 and DM1, are likely a combination of any or all of the above cis-elements as well as certain trans-acting factors such as repair protein levels and activity. The complex interactions involved warrant continued analysis, as a better understanding of the contribution of all interactors will lead to a more appropriate course of action for therapeutic intervention.

5.2 Rare mutations in repeat diseases

Although TNR diseases are essentially Mendelian in nature, with the causative mutation occurring at a single gene within the genome, it is very likely that there are other factors that are contributing to disease mutagenesis and pathology. I have described the role that CTCF, a cis-acting factor, has on the stability of the repeat in a mouse model of SCA7. In Chapter 3 I describe two insertion mutations that have almost completely replaced the (CTG) repeat at the DM1 locus in a small subpopulation of cells found in only two tissues from one DM1 patient. The exact mechanism of this insertion mutation remains elusive, although several potential mechanisms, such as gene conversion or a template switching event, are discussed. Although rare mutations such as the insertion mutations described in Chapter 3 likely do not have much of an effect on the stability or pathology of the disease, mostly due to the small fraction of cells harbouring the mutation, it is possible that other factors within those particular cells led to the insertion in the first place. Within the cells which incurred the insertion mutations, a minor change in the levels or activity of various proteins involved in the potential mutational event (for example, a template switch due to structure formation and double-strand breaks) may be enough to have caused the switch. It has been reported that errors made by various polymerases can increase microhomology-mediated template switches [225]. Tissues containing these insertion mutations could be dissociated, the (CUG) RNA aggregates which are normally found in DM1 cells could be antibody labeled, and the cells containing aggregates separated from those not containing aggregates using fluorescence-activated cell sorting (FACS) (i.e. the cells with the insertion mutation, and therefore no expanded RNA aggregates, should not fluoresce). The population of cells not containing aggregates (insertion mutations) could be compared to those with aggregates (expanded repeat) for various protein expression differences or other parameters of interest. Although any experiments attempting to address additional factors that may have contributed to such rare mutational events in such a small subpopulation of cells would be difficult, they may in fact be worthwhile in order to potentially determine some as yet unknown factors contributing to instability and disease progression.

It is possible that mutations within expanded trinucleotide repeats may be more common than we now believe. When the first few TNR mutations were discovered, they were believed to be pure, uninterrupted repeat tracts. Soon after, however, it was found that some tracts contain interruptions when unexpanded, with the expanded allele having lost the interruptions, leading to the hypothesis that they confer some protection against expansion [24]. Interestingly, subsequent research has revealed that some TNR tracts can either maintain interruptions after expansion [28], gain new interruption patterns [30, 39], or have varied, complicated patterns [23, 41], with some of these contributing to the stability of the repeat on one end or the other, causing expansion to occur predominantly in one direction. Also interesting to note is the fact that, for certain TNR diseases, the tract length or specific interruption pattern has been reported to have an effect on or contribute to the pathogenesis of other diseases not generally associated with TNRs. One of the best studied examples of this is the (CAG) repeat tract at the ATXN2 locus, which when expanded causes SCA2. Interrupted polyglutamine tracts of intermediate length (27-33 repeats) have been shown to increase the risk of ALS [281-284]. Pure expanded repeats (greater than 34) cause SCA2, and when the expanded repeats are interrupted with CAA, patients present with atypical parkinsonism [281]. Individuals with intermediate-length tracts also have an earlier age at ALS onset when the tract is interrupted with 3 CAA repeats, compared to individuals with fewer interruptions, despite the fact that those with 3 CAA repeats have fewer overall disease-causing (CAG) repeats. Interestingly, polymorphisms in the SBMA locus, another CAG TNR disease, do not contribute to an ALS phenotype [285], indicating a locus-specific effect of the expanded (CAG) repeat.

It is possible that the sometimes variable symptoms, as well as the oftentimes poor correlation of repeat length to disease pathology and onset [272] seen in some of the TNR disorders, may be partially due to mutations or alternative nucleic acid compositions within the expanded tracts. Given the propensity of various TNR tracts to form mutagenic secondary structures, there is the potential for these to lead to mutations such as the insertions seen in Chapter 3 deep within the repeat, where sequencing is not possible at this time.
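
As a minimal sketch of the ATXN2 length and interruption categories described earlier in this section, the following Python function counts CAG and CAA units in a tract and applies the thresholds quoted in the text (27-33 repeats as intermediate, greater than 34 as expanded). The function name, the category labels, and the choice to count CAA units toward total tract length (both codons encode glutamine) are illustrative assumptions, and this is not a validated genotyping method.

```python
from typing import NamedTuple


class TractSummary(NamedTuple):
    total_units: int         # CAG + CAA codons (assumption: both count toward length)
    cag_units: int
    caa_interruptions: int
    category: str            # hypothetical label, see thresholds below


def summarize_atxn2_tract(tract: str) -> TractSummary:
    """Count CAG and CAA codons in an in-frame tract and label its length class.

    Thresholds follow the text: 27-33 units is "intermediate" and more than 34
    is "expanded". A length the text does not assign (exactly 34) is labelled
    "unassigned"; shorter tracts are labelled "below intermediate".
    """
    seq = tract.upper()
    if len(seq) % 3 != 0:
        raise ValueError("tract length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    unexpected = set(codons) - {"CAG", "CAA"}
    if unexpected:
        raise ValueError(f"unexpected codons: {sorted(unexpected)}")

    cag = codons.count("CAG")
    caa = codons.count("CAA")
    total = cag + caa

    if total > 34:
        category = "expanded"
    elif 27 <= total <= 33:
        category = "intermediate"
    elif total < 27:
        category = "below intermediate"
    else:
        category = "unassigned"
    return TractSummary(total, cag, caa, category)


if __name__ == "__main__":
    pure = "CAG" * 36
    interrupted = "CAG" * 8 + "CAA" + "CAG" * 9 + "CAA" + "CAG" * 11
    print(summarize_atxn2_tract(pure))
    print(summarize_atxn2_tract(interrupted))
```

Returning the CAG count separately from the total keeps the distinction drawn in the text between overall tract length and the number of disease-causing (CAG) repeats.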

5.3 Slipped-DNA and alternative structures in repeat disease

Slipped-DNA structures have been proposed to exist in relation to disease for over 50 years, and have been hypothesized to contribute to trinucleotide disease pathology for nearly 20 years. In Chapter 4, I have shown for the first time that slipped-DNA structures do in fact form at an endogenous TNR disease locus in DM1 patient tissues. Importantly, I have shown that the levels of slipped-DNA correlate with the levels of repeat instability in tissues, and, most excitingly, that the slip-outs tend to be clustered along a length of DNA rather than as individual slip-outs, as has long been presumed.

Nearly all models of repeat instability rely upon the hypothesis that slipped structures form at expanded trinucleotide repeats, contributing to instability and disease progression. DNA metabolic processes such as replication, repair, and transcription all require that the duplexed DNA become unwound, leaving single-stranded regions which are then prone to self-anneal and become stable slipped-out structures. Although the data contained herein are not in vivo evidence per se, since determining the presence of slipped-DNA in living patient tissues is not possible, the data presented in Chapter 4 reveal their presence in the most important model system for TNR diseases: human tissues. Because slipped-DNAs were not found at any appreciable levels in the control individual, whose largest allele had a repeat size of 12, we cannot conclude based on these data that slipped-DNAs contribute to disease onset via slip-outs forming within a non-disease-causing allele length. What does appear to be occurring is that slip-outs form in the already expanded allele and contribute to continued expansion, given the fact that a greater number of slip-outs are present in more unstable tissues with the largest repeat lengths. It is still possible that slip-outs form in tissues of individuals with high-normal repeat sizes (in the 30s) at a level greater than in individuals with smaller allele sizes, and this could be tested on such tissues using the experimental approach in Chapter 4.

One question left unanswered is whether the slip-outs are strand biased. Slip-outs of (CTG) form more stable hairpin conformations than (CAG) slip-outs. If the repeat is in fact being replicated from the downstream DNA replication origin as previously published data suggest [66], the (CAG) strand would be the lagging strand template. Due to this particular nucleic acid composition being a poor template for Okazaki initiation, we may expect issues with initiation and extension, leading to potential mutagenic CTG slip-outs forming within the Okazaki fragments. This would lead to the newly replicated DNA containing expansions primarily on the CTG strand. One way to answer this question would be to carry out denaturing Southern blots, probing with either a CTG or a CAG oligo. If we expect the slip-outs to be present as a heteroduplex (SI-DNA), a difference in size from one strand compared to the other should be discernible, especially at very long lengths, where this type of slip-out has likely occurred many times. A comparison of these two probes separately as well as with both probes together could aid in the determination of a strand-biased aspect of the slip-outs. Additionally, a denaturing Southern blot could also answer the question as to whether the repeat sizes reported during standard Southern blotting for DM1 are actually overestimates based on the slower migration of heteroduplexes.
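
The base-pairing logic behind probing with either a CTG or a CAG oligo can be sketched in a few lines: an oligo anneals to the strand that is its reverse complement, so a (CAG)n probe detects the (CTG)n strand and a (CTG)n probe detects the (CAG)n strand. The helper names below are hypothetical, and the snippet illustrates only sequence complementarity, not hybridization conditions or blot analysis.

```python
# Complement table for the four DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def strand_detected_by(probe_unit: str, n_repeats: int = 5) -> str:
    """Return the repeat unit of the strand that a repeat oligo anneals to.

    `probe_unit` is the trinucleotide unit of the probe (e.g. "CAG"), and the
    probe is modelled as `n_repeats` tandem copies of that unit.
    """
    target = reverse_complement(probe_unit * n_repeats)
    return target[:3]


if __name__ == "__main__":
    for unit in ("CAG", "CTG"):
        print(f"({unit})n probe anneals to the ({strand_detected_by(unit)})n strand")
```

Because each probe reports on only one strand after denaturation, comparing the two signals is what would allow a strand-specific size difference to be seen.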

The set of experiments carried out in Chapter 4 could also be expanded to include multiple different TNR diseases. Various trinucleotide repeats are known to form alternative structures in vitro, and so it would be useful to know whether slip-outs occur in patient tissues of other repeat diseases such as HD, SCA7, or non-CTG repeat disorders such as Fragile X, in order to determine the utility of finding potential DNA-binding agents that can stop repeat expansion. This would be especially important for diseases in which continued somatic instability during a patient's life contributes to worsening symptoms (covered more fully below). Slipped-DNA isolation could also be expanded beyond TNR diseases to others involving microsatellite instability, such as hereditary non-polyposis colorectal cancer (HNPCC), also called Lynch syndrome. HNPCC is a disease caused by mutations in mismatch repair genes such as MutS homolog 2 (MSH2), which generally lead to a truncation and loss of function of the protein. In addition to the cancer phenotype, mostly colorectal in nature, many individuals with HNPCC show generalized instability of (CA) dinucleotides across the genome [286]. MSH2, in conjunction with MSH6, is mostly responsible for small-loop repair, and it is therefore possible that the unchecked dinucleotide repeats are forming short unstable loop-outs that are not being correctly repaired. Given the shadow bands that are often seen in PCRs carried out at the unstable loci of HNPCC [287], it is possible that what is actually being reported are heteroduplexed DNAs formed by slip-outs at the loci. Similarly, the levels and biophysical structure of slipped-DNAs forming at mono- and dinucleotide repeats may be distinct between tumors showing microsatellite instability compared to those that are microsatellite stable.

Additionally, different mutational paths may involve distinct slipped-DNAs. For example, slipped-DNAs isolated from transgenic mice harboring expanded CAG/CTG repeats may be structurally very different between mice that are MutSβ-proficient or -deficient, where instability is expansion-biased or contraction-biased, respectively. Isolating slipped-DNA from these mice is an avenue that could be explored in order to further our understanding of slipped-DNA formation.

As briefly mentioned in the Introduction and above, a more universal presence of slipped-DNA structures in various diseases could lead to a more general treatment of these diseases through targeting of the slip-out structures. Those diseases in which the repeats contain more numerous slip-outs would be better candidates for compounds directed towards slipped-DNA, and could help determine where such research should be directed. There are currently several research avenues testing the binding of ligands or nucleic acids to specific DNA structures [288-290]. Any compounds used, however, must be determined to be specific, as it is postulated that certain alternative structures are advantageous (or at the very least naturally occurring in cells). For example, it has been reported that cruciform structures can form at DNA replication origins in mammals [239]. Those diseases that are found to have little to no slip-out formation but still show substantial somatic instability would also benefit greatly from therapeutic targeting of the DNA, as it is the expansion itself which contributes to worsening symptoms. The target in such cases could instead be potential DNA/RNA hybrids, since it is known that all repeat disease loci are transcribed and that DNA/RNA hybrids themselves also contribute to repeat expansion [70].

It would also be advantageous to determine the junction structure(s) of the IP'd slip-outs, given that perfect three-way junctions cause any additional folding of the DNA to become energetically difficult, and that even one extruded base across from a slip-out relaxes this restraint [142]. NMR of three-way junctions has previously been carried out, which revealed localized differential stereochemistry of bulged-out bases at the junction, with several different conformations possible and identified within the same structure [291-294]. It may be possible to determine whether there are any variable conformations at the junctions in the DNA pulled down with the 2D3 IP. IP'd DNA could be cloned, amplified, and purified, and NMR then carried out.


Additionally, and as a complementary approach, NMR can be carried out on substrates that are designed to contain CTG/CAG repeats within the proper DM1 genomic context (i.e. accurate flanking sequences). As mentioned in the Introduction, flanking sequence has been shown to affect the formation of certain structures [126, 129]. This may also be true for the junction structure across from the slip-out. The substrates made can be based upon the genomic context of both the 5' end and 3' end of the repeat, as well as a substrate of pure repeats, because the flanking sequences of the structure-containing repeats far within the tract will be only other repeats. The slip-outs and junction structure may vary considerably in all cases. Based on the variable biophysical stability of different junction structures [118] and the potential for structure variation throughout the tract, this variation may be useful to know when screening for potential drugs or ligands that interact with slip-outs, as the particular location of interaction may in fact include the junction structure. This knowledge could aid in the selection of a ligand with as specific an interaction as possible to the particular slip-out of each disease.

The 2D3 antibody could also have utility in other models of TNR diseases which, when coupled with the genetic knockdown of proteins or pathways known to affect certain aspects of repeat instability, could help tease out more mechanistic aspects of slipped-DNA formation. For example, attempting to IP DNA from a DM1 mouse model that has various mismatch repair proteins knocked out could help determine to what extent MMR may be contributing to the formation of stable slipped-out structures in vivo. Slipped-DNA intermediates at other non-disease loci could also be isolated. For example, slipped-DNAs might be expected to arise and persist in certain tumors from individuals deficient in mismatch repair proteins. The 2D3 antibody might also prove useful in isolating and characterising slipped-DNAs or other aberrant DNA structures that contain junction features, such as reversed replication forks, recombination intermediates, and replication intermediates.

5.4 Summary and concluding remarks

Trinucleotide repeat diseases, although Mendelian in nature, are quite complex in pathology and mutational instability. A combination of cis-elements such as repeat tract length, purity, DNA structures, methylation and protein binding sites, as well as trans factors such as protein binding, repair, replication, transcription and potentially numerous other factors are all likely playing a role in repeat instability and disease progression. These various factors are likely also contributing to rare, cryptic non-expansion mutations that may also be having an effect on disease pathology. The best theoretical method of therapeutic treatment would be to attack these diseases at their root. The presence of slipped-DNAs may allow this targeting to be more locus- and disease-specific, reducing any potential off-target effects on other repeat loci and decreasing as much as possible the root cause of dozens of diseases: repeat expansion.


References

1. Lopez Castel, A., et al., Expanded CTG repeat demarcates a boundary for abnormal CpG methylation in myotonic dystrophy patient tissues. Hum Mol Genet, 2011. 20(1): p. 1-15.

2. Duyao, M., et al., Trinucleotide repeat length instability and age of onset in Huntington's disease. Nat Genet, 1993. 4(4): p. 387-92.

3. Taylor, A.K., et al., Tissue heterogeneity of the FMR1 mutation in a high-functioning male with fragile X syndrome. Am J Med Genet, 1999. 84(3): p. 233-9.

4. Tanaka, F., et al., Differential pattern in tissue-specific somatic mosaicism of expanded CAG trinucleotide repeats in dentatorubral-pallidoluysian atrophy, Machado-Joseph disease, and X-linked recessive spinal and bulbar muscular atrophy. J Neurol Sci, 1996. 135(1): p. 43-50.

5. Benitez, J., et al., Somatic stability in chorionic villi samples and other Huntington fetal tissues. Hum Genet, 1995. 96(2): p. 229-32.

6. Verbeek, D.S. and B.P. van de Warrenburg, Genetics of the dominant ataxias. Semin Neurol, 2011. 31(5): p. 461-9.

7. Sato, N., et al., Spinocerebellar ataxia type 31 is associated with "inserted" pentanucleotide repeats containing (TGGAA)n. Am J Hum Genet, 2009. 85(5): p. 544-57.

8. Kobayashi, H., et al., Expansion of intronic GGCCTG hexanucleotide repeat in NOP56 causes SCA36, a type of spinocerebellar ataxia accompanied by motor neuron involvement. Am J Hum Genet, 2011. 89(1): p. 121-30.

9. Orr, H.T., et al., Expansion of an unstable trinucleotide CAG repeat in spinocerebellar ataxia type 1. Nat Genet, 1993. 4(3): p. 221-6.

10. David, G., et al., Cloning of the SCA7 gene reveals a highly unstable CAG repeat expansion. Nat Genet, 1997. 17(1): p. 65-70.

11. Holmberg, M., et al., Spinocerebellar ataxia type 7 (SCA7): a neurodegenerative disorder with neuronal intranuclear inclusions. Hum Mol Genet, 1998. 7(5): p. 913-8.

12. Libby, R.T., et al., Genomic context drives SCA7 CAG repeat instability, while expressed SCA7 cDNAs are intergenerationally and somatically stable in transgenic mice. Hum Mol Genet, 2003. 12(1): p. 41-50.

13. Whitney, A., et al., Massive SCA7 expansion detected in a 7-month-old male with hypotonia, cardiomegaly, and renal compromise. Dev Med Child Neurol, 2007. 49(2): p. 140-3.


14. Ansorge, O., et al., Ataxin-7 aggregation and ubiquitination in infantile SCA7 with 180 CAG repeats. Ann Neurol, 2004. 56(3): p. 448-52.

15. Michalik, A., J.J. Martin, and C. Van Broeckhoven, Spinocerebellar ataxia type 7 associated with pigmentary retinal dystrophy. Eur J Hum Genet, 2004. 12(1): p. 2-15.

16. Nakamura, Y., et al., Ataxin-7 associates with microtubules and stabilizes the cytoskeletal network. Hum Mol Genet, 2012. 21(5): p. 1099-110.

17. Brook, J.D., et al., Molecular basis of myotonic dystrophy: expansion of a trinucleotide (CTG) repeat at the 3' end of a transcript encoding a protein kinase family member. Cell, 1992. 69(2): p. 385.

18. Liquori, C.L., et al., Myotonic dystrophy type 2 caused by a CCTG expansion in intron 1 of ZNF9. Science, 2001. 293(5531): p. 864-7.

19. Miller, J.W., et al., Recruitment of human muscleblind proteins to (CUG)(n) expansions associated with myotonic dystrophy. EMBO J, 2000. 19(17): p. 4439-48.

20. Jiang, H., et al., Myotonic dystrophy type 1 is associated with nuclear foci of mutant RNA, sequestration of muscleblind proteins and deregulated alternative splicing in neurons. Hum Mol Genet, 2004. 13(24): p. 3079-88.

21. Philips, A.V., L.T. Timchenko, and T.A. Cooper, Disruption of splicing regulated by a CUG-binding protein in myotonic dystrophy. Science, 1998. 280(5364): p. 737-41.

22. Lin, X., et al., Failure of MBNL1-dependent post-natal splicing transitions in myotonic dystrophy. Hum Mol Genet, 2006. 15(13): p. 2087-97.

23. Musova, Z., et al., Highly unstable sequence interruptions of the CTG repeat in the myotonic dystrophy gene. Am J Med Genet A, 2009. 149A(7): p. 1365-74.

24. Chung, M.Y., et al., Evidence for a mechanism predisposing to intergenerational CAG repeat instability in spinocerebellar ataxia type I. Nat Genet, 1993. 5(3): p. 254-8.

25. Rolfsmeier, M.L. and R.S. Lahue, Stabilizing effects of interruptions on trinucleotide repeat expansions in Saccharomyces cerevisiae. Mol Cell Biol, 2000. 20(1): p. 173-80.

26. Pearson, C.E., et al., Structural analysis of slipped-strand DNA (S-DNA) formed in (CTG)n.(CAG)n repeats from the myotonic dystrophy locus. Nucleic Acids Res, 1998. 26(3): p. 816-23.

27. Cleary, J.D. and C.E. Pearson, The contribution of cis-elements to disease-associated repeat instability: clinical and experimental evidence. Cytogenet Genome Res, 2003. 100(1-4): p. 25-55.

28. Kawaguchi, Y., et al., CAG expansions in a novel gene for Machado-Joseph disease at chromosome 14q32.1. Nat Genet, 1994. 8(3): p. 221-8.


29. Koide, R., et al., A neurological disease caused by an expanded CAG trinucleotide repeat in the TATA-binding protein gene: a new polyglutamine disease? Hum Mol Genet, 1999. 8(11): p. 2047-53.

30. Moseley, M.L., et al., SCA8 CTG repeat: en masse contractions in sperm and intergenerational sequence changes may play a role in reduced penetrance. Hum Mol Genet, 2000. 9(14): p. 2125-30.

31. Sobczak, K. and W.J. Krzyzosiak, Patterns of CAG repeat interruptions in SCA1 and SCA2 genes in relation to repeat instability. Hum Mutat, 2004. 24(3): p. 236-47.

32. Ramos, E.M., et al., Common origin of pure and interrupted repeat expansions in spinocerebellar ataxia type 2 (SCA2). Am J Med Genet B Neuropsychiatr Genet, 2010. 153B(2): p. 524-31.

33. Kunst, C.B. and S.T. Warren, Cryptic and polar variation of the fragile X repeat could result in predisposing normal alleles. Cell, 1994. 77(6): p. 853-61.

34. Kunst, C.B., et al., The effect of FMR1 CGG repeat interruptions on mutation frequency as measured by sperm typing. J Med Genet, 1997. 34(8): p. 627-31.

35. Eichler, E.E., et al., Length of uninterrupted CGG repeats determines instability in the FMR1 gene. Nat Genet, 1994. 8(1): p. 88-94.

36. Dombrowski, C., et al., Premutation and intermediate-size FMR1 alleles in 10572 males from the general population: loss of an AGG interruption is a late event in the generation of fragile X syndrome alleles. Hum Mol Genet, 2002. 11(4): p. 371-8.

37. Harr, B., B. Zangerl, and C. Schlotterer, Removal of microsatellite interruptions by DNA replication slippage: phylogenetic evidence from Drosophila. Mol Biol Evol, 2000. 17(7): p. 1001-9.

38. Nichol Edamura, K., M.R. Leonard, and C.E. Pearson, Role of replication and CpG methylation in fragile X syndrome CGG deletions in primate cells. Am J Hum Genet, 2005. 76(2): p. 302-11.

39. Braida, C., et al., Variant CCG and GGC repeats within the CTG expansion dramatically modify mutational dynamics and likely contribute toward unusual symptoms in some myotonic dystrophy type 1 patients. Hum Mol Genet, 2010. 19(8): p. 1399-412.

40. Mateo, I., et al., GAA expansion size and age at onset of Friedreich's ataxia. Neurology, 2003. 61(2): p. 274-5.

41. Stolle, C.A., et al., Novel, complex interruptions of the GAA repeat in small, expanded alleles of two affected siblings with late-onset Friedreich ataxia. Mov Disord, 2008. 23(9): p. 1303-6.

42. McDaniel, D.O., et al., Sequence variation in GAA repeat expansions may cause differential phenotype display in Friedreich's ataxia. Mov Disord, 2001. 16(6): p. 1153-8.


43. Herman, J.G., et al., Incidence and functional consequences of hMLH1 promoter hypermethylation in colorectal carcinoma. Proc Natl Acad Sci U S A, 1998. 95(12): p. 6870-5.

44. Xu, G.L., et al., Chromosome instability and immunodeficiency syndrome caused by mutations in a DNA methyltransferase gene. Nature, 1999. 402(6758): p. 187-91.

45. Dion, V., et al., Dnmt1 deficiency promotes CAG repeat expansion in the mouse germline. Hum Mol Genet, 2008. 17(9): p. 1306-17.

46. Gorbunova, V., et al., Genome-wide demethylation destabilizes CTG.CAG trinucleotide repeats in mammalian cells. Hum Mol Genet, 2004. 13(23): p. 2979-89.

47. Hagerman, R.J., et al., High functioning fragile X males: demonstration of an unmethylated fully expanded FMR-1 mutation associated with protein expression. Am J Med Genet, 1994. 51(4): p. 298-308.

48. Castaldo, I., et al., DNA methylation in intron 1 of the frataxin gene is related to GAA repeat length and age of onset in Friedreich ataxia patients. J Med Genet, 2008. 45(12): p. 808-12.

49. Emmel, V.E., et al., Does DNA methylation in the promoter region of the ATXN3 gene modify age at onset in MJD (SCA3) patients? Clin Genet, 2011. 79(1): p. 100-2.

50. Libby, R.T., et al., CTCF cis-regulates trinucleotide repeat instability in an epigenetic manner: a novel basis for mutational hot spot determination. PLoS Genet, 2008. 4(11): p. e1000257.

51. Cleary, J.D., et al., Evidence of cis-acting factors in replication-mediated trinucleotide repeat instability in primate cells. Nat Genet, 2002. 31(1): p. 37-46.

52. Farrell, B.T. and R.S. Lahue, CAG*CTG repeat instability in cultured human astrocytes. Nucleic Acids Res, 2006. 34(16): p. 4495-505.

53. Kamath-Loeb, A.S., et al., Interactions between the Werner syndrome helicase and DNA polymerase delta specifically facilitate copying of tetraplex and hairpin structures of the d(CGG)n trinucleotide repeat sequence. J Biol Chem, 2001. 276(19): p. 16439-46.

54. Lee, S. and M.S. Park, Human FEN-1 can process the 5'-flap DNA of CTG/CAG triplet repeat derived from human genetic diseases by length and sequence dependent manner. Exp Mol Med, 2002. 34(4): p. 313-7.

55. Wang, W. and R.A. Bambara, Human Bloom protein stimulates flap endonuclease 1 activity by resolving DNA secondary structure. J Biol Chem, 2005. 280(7): p. 5391-9.

56. Callahan, J.L., et al., Mutations in yeast replication proteins that increase CAG/CTG expansions also increase repeat fragility. Mol Cell Biol, 2003. 23(21): p. 7849-60.


57. Liu, Y., et al., Saccharomyces cerevisiae flap endonuclease 1 uses flap equilibration to maintain triplet repeat stability. Mol Cell Biol, 2004. 24(9): p. 4049-64.

58. Spiro, C. and C.T. McMurray, Nuclease-deficient FEN-1 blocks Rad51/BRCA1-mediated repair and causes trinucleotide repeat instability. Mol Cell Biol, 2003. 23(17): p. 6063-74.

59. Jackson, S.M., et al., A SCA7 CAG/CTG repeat expansion is stable in Drosophila melanogaster despite modulation of genomic context and gene dosage. Gene, 2005. 347(1): p. 35-41.

60. Freudenreich, C.H., S.M. Kantrow, and V.A. Zakian, Expansion and length-dependent fragility of CTG repeats in yeast. Science, 1998. 279(5352): p. 853-6.

61. van den Broek, W.J., et al., Fen1 does not control somatic hypermutability of the (CTG)(n)*(CAG)(n) repeat in a knock-in mouse model for DM1. FEBS Lett, 2006. 580(22): p. 5208-14.

62. Moe, S.E., J.G. Sorbo, and T. Holen, Huntingtin triplet-repeat locus is stable under long-term Fen1 knockdown in human cells. J Neurosci Methods, 2008. 171(2): p. 233-8.

63. Gray, S.J., et al., An origin of DNA replication in the promoter region of the human fragile X mental retardation (FMR1) gene. Mol Cell Biol, 2007. 27(2): p. 426-37.

64. Brylawski, B.P., et al., Mapping of an origin of DNA replication in the promoter of fragile X gene FMR1. Exp Mol Pathol, 2007. 82(2): p. 190-6.

65. Nenguke, T., et al., Candidate DNA replication initiation regions at human trinucleotide repeat disease loci. Hum Mol Genet, 2003. 12(9): p. 1021-8.

66. Cleary, J.D., et al., Tissue- and age-specific DNA replication patterns at the CTG/CAG-expanded human myotonic dystrophy type 1 locus. Nat Struct Mol Biol, 2010. 17(9): p. 1079-87.

67. Cleary, J.D. and C.E. Pearson, Replication fork dynamics and dynamic mutations: the fork-shift model of repeat instability. Trends Genet, 2005. 21(5): p. 272-80.

68. Schumacher, S., I. Pinet, and M. Bichara, Modulation of transcription reveals a new mechanism of triplet repeat instability in Escherichia coli. J Mol Biol, 2001. 307(1): p. 39-49.

69. Nakamori, M., C.E. Pearson, and C.A. Thornton, Bidirectional transcription stimulates expansion and contraction of expanded (CTG)*(CAG) repeats. Hum Mol Genet, 2011. 20(3): p. 580-8.

70. Reddy, K., et al., Determinants of R-loop formation at convergent bidirectionally transcribed trinucleotide repeats. Nucleic Acids Res, 2011. 39(5): p. 1749-62.


71. Lin, Y., et al., R loops stimulate genetic instability of CTG.CAG repeats. Proc Natl Acad Sci U S A, 2010. 107(2): p. 692-7.

72. Salinas-Rios, V., B.P. Belotserkovskii, and P.C. Hanawalt, DNA slip-outs cause RNA polymerase II arrest in vitro: potential implications for genetic instability. Nucleic Acids Res, 2011. 39(17): p. 7444-54.

73. Savouret, C., et al., CTG repeat instability and size variation timing in DNA repair-deficient mice. EMBO J, 2003. 22(9): p. 2264-73.

74. van den Broek, W.J., et al., Somatic expansion behaviour of the (CTG)n repeat in myotonic dystrophy knock-in mice is differentially affected by Msh3 and Msh6 mismatch-repair proteins. Hum Mol Genet, 2002. 11(2): p. 191-8.

75. Foiry, L., et al., Msh3 is a limiting factor in the formation of intergenerational CTG expansions in DM1 transgenic mice. Hum Genet, 2006. 119(5): p. 520-6.

76. Pearson, C.E., et al., Human MSH2 binds to trinucleotide repeat DNA structures associated with neurodegenerative diseases. Hum Mol Genet, 1997. 6(7): p. 1117-23.

77. Panigrahi, G.B., et al., Slipped (CTG)*(CAG) repeats can be correctly repaired, escape repair or undergo error-prone repair. Nat Struct Mol Biol, 2005. 12(8): p. 654-62.

78. Panigrahi, G.B., et al., Isolated short CTG/CAG DNA slip-outs are repaired efficiently by hMutSbeta, but clustered slip-outs are poorly repaired. Proc Natl Acad Sci U S A, 2010. 107(28): p. 12593-8.

79. Vostrov, A.A. and W.W. Quitschke, The zinc finger protein CTCF binds to the APBbeta domain of the amyloid beta-protein precursor promoter. Evidence for a role in transcriptional activation. J Biol Chem, 1997. 272(52): p. 33353-9.

80. De Biase, I., et al., Epigenetic silencing in Friedreich ataxia is associated with depletion of CTCF (CCCTC-binding factor) and antisense transcription. PLoS One, 2009. 4(11): p. e7914.

81. Ottaviani, A., et al., The D4Z4 macrosatellite repeat acts as a CTCF and A-type lamins-dependent insulator in facio-scapulo-humeral dystrophy. PLoS Genet, 2009. 5(2): p. e1000394.

82. Filippova, G.N., et al., CTCF-binding sites flank CTG/CAG repeats and form a methylation-sensitive insulator at the DM1 locus. Nat Genet, 2001. 28(4): p. 335-43.

83. Hile, S.E. and K.A. Eckert, DNA polymerase kappa produces interrupted mutations and displays polar pausing within mononucleotide microsatellite sequences. Nucleic Acids Res, 2008. 36(2): p. 688-96.

84. Hile, S.E. and K.A. Eckert, Positive correlation between DNA polymerase alpha-primase pausing and mutagenesis within polypyrimidine/polypurine microsatellite sequences. J Mol Biol, 2004. 335(3): p. 745-59.

85. Goula, A.V., et al., Stoichiometry of base excision repair proteins correlates with increased somatic CAG instability in striatum over cerebellum in Huntington's disease transgenic mice. PLoS Genet, 2009. 5(12): p. e1000749.

86. Lopez Castel, A., A.E. Tomkinson, and C.E. Pearson, CTG/CAG repeat instability is modulated by the levels of human DNA ligase I and its interaction with proliferating cell nuclear antigen: a distinction between replication and slipped-DNA repair. J Biol Chem, 2009. 284(39): p. 26631-45.

87. Kovtun, I.V., et al., OGG1 initiates age-dependent CAG trinucleotide expansion in somatic cells. Nature, 2007. 447(7143): p. 447-52.

88. Platt, J.R., Possible Separation of Intertwined Nucleic Acid Chains by Transfer-Twist. Proc Natl Acad Sci U S A, 1955. 41(3): p. 181-3.

89. Felsenfeld, G., Theoretical studies on the interaction of synthetic polyribonucleotides. Biochim Biophys Acta, 1958. 29(1): p. 133-44.

90. Kornberg, A., et al., Enzymatic Synthesis of Deoxyribonucleic Acid, Xvi. Oligonucleotides as Templates and the Mechanism of Their Replication. Proc Natl Acad Sci U S A, 1964. 51: p. 315-23.

91. Schlotterer, C. and D. Tautz, Slippage synthesis of simple sequence DNA. Nucleic Acids Res, 1992. 20(2): p. 211-5.

92. Streisinger, G., et al., Frameshift mutations and the genetic code. This paper is dedicated to Professor Theodosius Dobzhansky on the occasion of his 66th birthday. Cold Spring Harb Symp Quant Biol, 1966. 31: p. 77-84.

93. Richards, R.I. and G.R. Sutherland, Dynamic mutations: a new class of mutations causing human disease. Cell, 1992. 70(5): p. 709-12.

94. Sinden, R.R., Neurodegenerative diseases. Origins of instability. Nature, 2001. 411(6839): p. 757-8.

95. Pearson, C.E., K. Nichol Edamura, and J.D. Cleary, Repeat instability: mechanisms of dynamic mutations. Nat Rev Genet, 2005. 6(10): p. 729-42.

96. Fortune, M.T., et al., Dramatic, expansion-biased, age-dependent, tissue-specific somatic mosaicism in a transgenic mouse model of triplet repeat instability. Hum Mol Genet, 2000. 9(3): p. 439-45.

97. Seznec, H., et al., Transgenic mice carrying large human genomic sequences with expanded CTG repeat mimic closely the DM CTG repeat intergenerational and somatic instability. Hum Mol Genet, 2000. 9(8): p. 1185-94.

98. Gomes-Pereira, M., et al., CTG trinucleotide repeat "big jumps": large expansions, small mice. PLoS Genet, 2007. 3(4): p. e52.

99. Kramer, P.R., C.E. Pearson, and R.R. Sinden, Stability of triplet repeats of myotonic dystrophy and fragile X loci in human mutator mismatch repair cell lines. Hum Genet, 1996. 98(2): p. 151-7.

100. Goellner, G.M., et al., Different mechanisms underlie DNA instability in Huntington disease and colorectal cancer. Am J Hum Genet, 1997. 60(4): p. 879-90.

101. La Spada, A.R., et al., Meiotic stability and genotype-phenotype correlation of the trinucleotide repeat in X-linked spinal and bulbar muscular atrophy. Nat Genet, 1992. 2(4): p. 301-4.

102. Leeflang, E.P. and N. Arnheim, A novel repeat structure at the myotonic dystrophy locus in a 37 repeat allele with unexpectedly high stability. Hum Mol Genet, 1995. 4(1): p. 135-6.

103. Rolfsmeier, M.L., M.J. Dixon, and R.S. Lahue, Mismatch repair blocks expansions of interrupted trinucleotide repeats in yeast. Mol Cell, 2000. 6(6): p. 1501-7.

104. Pearson, C.E. and R.R. Sinden, Alternative structures in duplex DNA formed within the trinucleotide repeats of the myotonic dystrophy and fragile X loci. Biochemistry, 1996. 35(15): p. 5041-53.

105. Mirkin, S.M., Expandable DNA repeats and human disease. Nature, 2007. 447(7147): p. 932-40.

106. Subramanian, S., R.K. Mishra, and L. Singh, Genome-wide analysis of microsatellite repeats in humans: their abundance and density in specific genomic regions. Genome Biol, 2003. 4(2): p. R13.

107. Olivera, B.M. and I.R. Lehman, Enzymic joining of polynucleotides. 3. The polydeoxyadenylate-polydeoxythymidylate homopolymer pair. J Mol Biol, 1968. 36(2): p. 261-74.

108. Harvey, S.C., Slipped structures in DNA triplet repeat sequences: entropic contributions to genetic instabilities. Biochemistry, 1997. 36(11): p. 3047-9.

109. Pearson, C.E. and R.R. Sinden, Trinucleotide repeat DNA structures: dynamic mutations from dynamic DNA. Curr Opin Struct Biol, 1998. 8(3): p. 321-30.

110. Manley, K., et al., Msh2 deficiency prevents in vivo somatic instability of the CAG repeat in Huntington disease transgenic mice. Nat Genet, 1999. 23(4): p. 471-3.

111. Gomes-Pereira, M., M.T. Fortune, and D.G. Monckton, Mouse tissue culture models of unstable triplet repeats: in vitro selection for larger alleles, mutational expansion bias and tissue specificity, but no association with cell division rates. Hum Mol Genet, 2001. 10(8): p. 845-54.

112. Iyer, R.R. and R.D. Wells, Expansion and deletion of triplet repeat sequences in Escherichia coli occur on the leading strand of DNA replication. J Biol Chem, 1999. 274(6): p. 3865-77.

113. Panigrahi, G.B., J.D. Cleary, and C.E. Pearson, In vitro (CTG)*(CAG) expansions and deletions by human cell extracts. J Biol Chem, 2002. 277(16): p. 13926-34.

114. Seriola, A., et al., Huntington's and myotonic dystrophy hESCs: down-regulated trinucleotide repeat instability and mismatch repair machinery expression upon differentiation. Hum Mol Genet, 2011. 20(1): p. 176-85.

115. Owen, B.A., et al., (CAG)(n)-hairpin DNA binds to Msh2-Msh3 and changes properties of mismatch recognition. Nat Struct Mol Biol, 2005. 12(8): p. 663-70.

116. Stead, J.D. and A.J. Jeffreys, Allele diversity and germline mutation at the insulin minisatellite. Hum Mol Genet, 2000. 9(5): p. 713-23.

117. Meservy, J.L., et al., Long CTG tracts from the myotonic dystrophy gene induce deletions and rearrangements during recombination at the APRT locus in CHO cells. Mol Cell Biol, 2003. 23(9): p. 3152-62.

118. Lilley, D.M. and M.F. White, The junction-resolving enzymes. Nat Rev Mol Cell Biol, 2001. 2(6): p. 433-43.

119. Richards, R.I. and G.R. Sutherland, Heritable unstable DNA sequences. Nat Genet, 1992. 1(1): p. 7-9.

120. Hentschel, C.C., Homocopolymer sequences in the spacer of a sea urchin histone gene repeat are sensitive to S1 nuclease. Nature, 1982. 295(5851): p. 714-6.

121. Mace, H.A., H.R. Pelham, and A.A. Travers, Association of an S1 nuclease-sensitive structure with short direct repeats 5' of Drosophila heat shock genes. Nature, 1983. 304(5926): p. 555-7.

122. McKeon, C., A. Schmidt, and B. de Crombrugghe, A sequence conserved in both the chicken and mouse alpha 2(I) collagen promoter contains sites sensitive to S1 nuclease. J Biol Chem, 1984. 259(10): p. 6636-40.

123. Shen, C.K., Superhelicity induces hypersensitivity of a human polypyrimidine . polypurine DNA sequence in the human alpha 2-alpha 1 globin intergenic region to S1 nuclease digestion--high resolution mapping of the clustered cleavage sites. Nucleic Acids Res, 1983. 11(22): p. 7899-910.

124. Yu, Y.T. and J.L. Manley, Structure and function of the S1 nuclease-sensitive site in the adenovirus late promoter. Cell, 1986. 45(5): p. 743-51.

125. Ulyanov, N.B., et al., Tertiary base pair interactions in slipped loop-DNA: an NMR and model building study. Nucleic Acids Res, 1994. 22(20): p. 4242-9.

126. Pearson, C.E., et al., Slipped-strand DNAs formed by long (CAG)*(CTG) repeats: slipped-out repeats and slip-out junctions. Nucleic Acids Res, 2002. 30(20): p. 4534-47.

127. Gacy, A.M., et al., Trinucleotide repeats that expand in human disease form hairpin structures in vitro. Cell, 1995. 81(4): p. 533-40.

128. Liu, G., et al., Replication-dependent instability at (CTG) x (CAG) repeat hairpins in human cells. Nat Chem Biol, 2010. 6(9): p. 652-9.

129. Oussatcheva, E.A., et al., Structure of branched DNA molecules: gel retardation and atomic force microscopy studies. J Mol Biol, 1999. 292(1): p. 75-86.

130. Werntges, H., et al., Mismatches in DNA double strands: thermodynamic parameters and their correlation to repair efficiencies. Nucleic Acids Res, 1986. 14(9): p. 3773-90.

131. Fazakerley, G.V., et al., Structures of mismatched base pairs in DNA and their recognition by the Escherichia coli mismatch repair system. EMBO J, 1986. 5(13): p. 3697-703.

132. Hunter, W.N., et al., Structure of an adenine-cytosine base pair in DNA and its implications for mismatch repair. Nature, 1986. 320(6062): p. 552-5.

133. Sullivan, K.M. and D.M. Lilley, A dominant influence of flanking sequences on a local structural transition in DNA. Cell, 1986. 47(5): p. 817-27.

134. Kang, S., F. Wohlrab, and R.D. Wells, GC-rich flanking tracts decrease the kinetics of intramolecular DNA triplex formation. J Biol Chem, 1992. 267(27): p. 19435-42.

135. Potaman, V.N., et al., Unpaired structures in SCA10 (ATTCT)n.(AGAAT)n repeats. J Mol Biol, 2003. 326(4): p. 1095-111.

136. Edwards, S.F., et al., A Z-DNA sequence reduces slipped-strand structure formation in the myotonic dystrophy type 2 (CCTG) x (CAGG) repeat. Proc Natl Acad Sci U S A, 2009. 106(9): p. 3270-5.

137. Woodson, S.A. and D.M. Crothers, Proton nuclear magnetic resonance studies on bulge-containing DNA oligonucleotides from a mutational hot-spot sequence. Biochemistry, 1987. 26(3): p. 904-12.

138. Woodson, S.A. and D.M. Crothers, Binding of 9-aminoacridine to bulged-base DNA oligomers from a frame-shift hot spot. Biochemistry, 1988. 27(25): p. 8904-14.

139. Woodson, S.A. and D.M. Crothers, Structural model for an oligonucleotide containing a bulged guanosine by NMR and energy minimization. Biochemistry, 1988. 27(9): p. 3130-41.

140. Woodson, S.A. and D.M. Crothers, Preferential location of bulged guanosine internal to a G.C tract by 1H NMR. Biochemistry, 1988. 27(1): p. 436-45.

141. Woodson, S.A. and D.M. Crothers, Conformation of a bulge-containing oligomer from a hot-spot sequence by NMR and energy minimization. Biopolymers, 1989. 28(6): p. 1149-77.

142. Lilley, D.M., Structures of helical junctions in nucleic acids. Q Rev Biophys, 2000. 33(2): p. 109-59.

143. Moore, H., et al., Triplet repeats form secondary structures that escape DNA repair in yeast. Proc Natl Acad Sci U S A, 1999. 96(4): p. 1504-9.

144. Corrette-Bennett, S.E., et al., Efficient repair of large DNA loops in Saccharomyces cerevisiae. Nucleic Acids Res, 2001. 29(20): p. 4134-43.

145. Hou, C., et al., The Role of XPG in Processing (CAG)n/(CTG)n DNA Hairpins. Cell Biosci, 2011. 1(1): p. 11.

146. Jarem, D.A., N.R. Wilson, and S. Delaney, Structure-dependent DNA damage and repair in a trinucleotide repeat sequence. Biochemistry, 2009. 48(28): p. 6655-63.

147. Jarem, D.A., et al., Incidence and persistence of 8-oxo-7,8-dihydroguanine within a hairpin intermediate exacerbates a toxic oxidation cycle associated with trinucleotide repeat expansion. DNA Repair (Amst), 2011. 10(8): p. 887-96.

148. Kappen, L.S., et al., Stimulation of DNA strand slippage synthesis by a bulge binding synthetic agent. Biochemistry, 2003. 42(7): p. 2166-73.

149. Amrane, S., et al., Identification of trinucleotide repeat ligands with a FRET melting assay. Chembiochem, 2008. 9(8): p. 1229-34.

150. Xi, Z., D. Ouyang, and H.T. Mu, Stimulation on DNA triplet repeat strand slippage synthesis by the designed spirocycles. Bioorg Med Chem Lett, 2006. 16(5): p. 1180-4.

151. Xi, Z., et al., Targeting DNA bulged microenvironments with synthetic agents: lessons from a natural product. Chem Biol, 2002. 9(8): p. 925-31.

152. Nakatani, K., et al., Highly sensitive detection of GG mismatched DNA by surfaces immobilized naphthyridine dimer through poly(ethylene oxide) linkers. Bioorg Med Chem Lett, 2004. 14(5): p. 1105-8.

153. Hagihara, M. and K. Nakatani, Inhibition of DNA replication by a d(CAG) repeat binding ligand. Nucleic Acids Symp Ser (Oxf), 2006(50): p. 147-8.

154. Hagihara, M., et al., A small molecule regulates hairpin structures in d(CGG) trinucleotide repeats. Bioorg Med Chem Lett, 2012. 22(5): p. 2000-3.

155. Hudson, J.S., S.C. Brooks, and D.E. Graves, Interactions of actinomycin D with human telomeric G-quadruplex DNA. Biochemistry, 2009. 48(21): p. 4440-7.

156. Latha, K.S., et al., Molecular understanding of aluminum-induced topological changes in (CCG)12 triplet repeats: relevance to neurological disorders. Biochim Biophys Acta, 2002. 1588(1): p. 56-64.

157. Slama-Schwok, A., et al., A macrocyclic bis-acridine shifts the equilibrium from duplexes towards DNA hairpins. Nucleic Acids Res, 1997. 25(13): p. 2574-81.

158. Avila-Figueroa, A., D. Cattie, and S. Delaney, A small unstructured nucleic acid disrupts a trinucleotide repeat hairpin. Biochem Biophys Res Commun, 2011. 413(4): p. 532-6.

159. Boffa, L.C., et al., Invasion of the CAG triplet repeats by a complementary peptide nucleic acid inhibits transcription of the androgen receptor and TATA-binding protein genes and correlates with refolding of an active nucleosome containing a unique AR gene sequence. J Biol Chem, 1996. 271(22): p. 13228-33.

160. Sakai, H., et al., Analysis of an insertion mutation in a cohort of 94 patients with spinocerebellar ataxia type 31 from Nagano, Japan. Neurogenetics, 2010. 11(4): p. 409-15.

161. Furutama, D., et al., Possible de novo CTG repeat expansion in the DMPK gene of a patient with cardiomyopathy. J Clin Neurosci, 2010. 17(3): p. 408-9.

162. Hasan, Q., V. Mohan, and Y.R. Ahuja, (CTG)n expansion at DMPK locus seen only in muscle tissue: a novel case. Indian J Exp Biol, 2004. 42(9): p. 937-40.

163. Norremolle, A., et al., Mosaicism of the CAG repeat sequence in the Huntington disease gene in a pair of monozygotic twins. Am J Med Genet A, 2004. 130A(2): p. 154-9.

164. Rubinsztein, D.C., et al., Analysis of the huntingtin gene reveals a trinucleotide-length polymorphism in the region of the gene that contains two CCG-rich stretches and a correlation between decreased age of onset of Huntington's disease and CAG repeat number. Hum Mol Genet, 1993. 2(10): p. 1713-5.

165. Wells, R.D., Mutation spectra in fragile X syndrome induced by deletions of CGG*CCG repeats. J Biol Chem, 2009. 284(12): p. 7407-11.

166. Gronskov, K., et al., Deletion of all CGG repeats plus flanking sequences in FMR1 does not abolish gene expression. Am J Hum Genet, 1997. 61(4): p. 961-7.

167. Meijer, H., et al., A deletion of 1.6 kb proximal to the CGG repeat of the FMR1 gene causes the clinical phenotype of the fragile X syndrome. Hum Mol Genet, 1994. 3(4): p. 615-20.

168. de Graaff, E., et al., The fragile X phenotype in a mosaic male with a deletion showing expression of the FMR1 protein in 28% of the cells. Am J Med Genet, 1996. 64(2): p. 302-8.

169. Mila, M., et al., Mosaicism for the fragile X syndrome full mutation and deletions within the CGG repeat of the FMR1 gene. J Med Genet, 1996. 33(4): p. 338-40.

170. Rio, M., et al., Familial interstitial Xq27.3q28 duplication encompassing the FMR1 gene but not the MECP2 gene causes a new syndromic mental retardation condition. Eur J Hum Genet, 2010. 18(3): p. 285-90.

171. De Boulle, K., et al., A point mutation in the FMR-1 gene associated with fragile X mental retardation. Nat Genet, 1993. 3(1): p. 31-5.

172. Evans-Galea, M.V., et al., A novel deletion-insertion mutation identified in exon 3 of FXN in two siblings with a severe Friedreich ataxia phenotype. Neurogenetics, 2011. 12(4): p. 307-13.

173. Gellera, C., et al., Frataxin gene point mutations in Italian Friedreich ataxia patients. Neurogenetics, 2007. 8(4): p. 289-99.

174. Cossee, M., et al., Friedreich's ataxia: point mutations and clinical presentation of compound heterozygotes. Ann Neurol, 1999. 45(2): p. 200-6.

175. Gouw, L.G., et al., Analysis of the dynamic mutation in the SCA7 gene shows marked parental effects on CAG repeat transmission. Hum Mol Genet, 1998. 7(3): p. 525-32.

176. Monckton, D.G., et al., Very large (CAG)(n) DNA repeat expansions in the sperm of two spinocerebellar ataxia type 7 males. Hum Mol Genet, 1999. 8(13): p. 2473-8.

177. Ansved, T., A. Lundin, and M. Anvret, Larger CAG expansions in skeletal muscle compared with lymphocytes in Kennedy disease but not in Huntington disease. Neurology, 1998. 51(5): p. 1442-4.

178. Hashida, H., et al., Brain regional differences in the expansion of a CAG repeat in the spinocerebellar ataxias: dentatorubral-pallidoluysian atrophy, Machado-Joseph disease, and spinocerebellar ataxia type 1. Ann Neurol, 1997. 41(4): p. 505-11.

179. La Spada, A.R., Trinucleotide repeat instability: genetic features and molecular mechanisms. Brain Pathol, 1997. 7(3): p. 943-63.

180. Thornton, C.A., K. Johnson, and R.T. Moxley, 3rd, Myotonic dystrophy patients have larger CTG expansions in skeletal muscle than in leukocytes. Ann Neurol, 1994. 35(1): p. 104-7.

181. Jung, J. and N. Bonini, CREB-binding protein modulates repeat instability in a Drosophila model for polyQ disease. Science, 2007. 315(5820): p. 1857-9.

182. Lobanenkov, V.V., et al., A novel sequence-specific DNA binding protein which interacts with three regularly spaced direct repeats of the CCCTC-motif in the 5'-flanking sequence of the chicken c-myc gene. Oncogene, 1990. 5(12): p. 1743-53.

183. Ohlsson, R., R. Renkawitz, and V. Lobanenkov, CTCF is a uniquely versatile transcription regulator linked to epigenetics and disease. Trends Genet, 2001. 17(9): p. 520-7.

184. Ling, J.Q., et al., CTCF mediates interchromosomal colocalization between Igf2/H19 and Wsb1/Nf1. Science, 2006. 312(5771): p. 269-72.

185. Filippova, G.N., Genetics and epigenetics of the multifunctional protein CTCF. Curr Top Dev Biol, 2008. 80: p. 337-60.

186. Lee, E.C., et al., A highly efficient Escherichia coli-based chromosome engineering system adapted for recombinogenic targeting and subcloning of BAC DNA. Genomics, 2001. 73(1): p. 56-65.

187. Loukinov, D.I., et al., BORIS, a novel male germ-line-specific protein associated with epigenetic reprogramming events, shares the same 11-zinc-finger domain with CTCF, the insulator protein involved in reading imprinting marks in the soma. Proc Natl Acad Sci U S A, 2002. 99(10): p. 6806-11.

188. Gonitel, R., et al., DNA instability in postmitotic neurons. Proc Natl Acad Sci U S A, 2008. 105(9): p. 3467-72.

189. Cho, D.H., et al., Antisense transcription and heterochromatin at the DM1 CTG repeats are constrained by CTCF. Mol Cell, 2005. 20(3): p. 483-9.

190. Filippova, G.N., et al., Boundaries between chromosomal domains of X inactivation and escape bind CTCF and lack CpG methylation during early development. Dev Cell, 2005. 8(1): p. 31-42.

191. Navarro, P., et al., Tsix-mediated epigenetic switch of a CTCF-flanked region of the Xist promoter determines the Xist transcription program. Genes Dev, 2006. 20(20): p. 2787-92.

192. Splinter, E., et al., CTCF mediates long-range chromatin looping and local histone modification in the beta-globin locus. Genes Dev, 2006. 20(17): p. 2349-54.

193. Brock, G.J., N.H. Anderson, and D.G. Monckton, Cis-acting modifiers of expanded CAG/CTG triplet repeat expandability: associations with flanking GC content and proximity to CpG islands. Hum Mol Genet, 1999. 8(6): p. 1061-7.

194. Gourdon, G., et al., Intriguing association between disease associated unstable trinucleotide repeat and CpG island. Ann Genet, 1997. 40(2): p. 73-7.

195. Nichol, K. and C.E. Pearson, CpG methylation modifies the genetic stability of cloned repeat sequences. Genome Res, 2002. 12(8): p. 1246-56.

196. Parelho, V., et al., Cohesins functionally associate with CTCF on mammalian chromosome arms. Cell, 2008. 132(3): p. 422-33.

197. Stedman, W., et al., Cohesins localize with CTCF at the KSHV latency control region and at cellular c-myc and H19/Igf2 insulators. EMBO J, 2008. 27(4): p. 654-66.

198. Wendt, K.S., et al., Cohesin mediates transcriptional insulation by CCCTC-binding factor. Nature, 2008. 451(7180): p. 796-801.

199. Nguyen, P., et al., CTCFL/BORIS is a methylation-independent DNA-binding protein that preferentially binds to the paternal H19 differentially methylated region. Cancer Res, 2008. 68(14): p. 5546-51.

200. Savouret, C., et al., MSH2-dependent germinal CTG repeat expansions are produced continuously in spermatogonia from DM1 transgenic mice. Mol Cell Biol, 2004. 24(2): p. 629-37.

201. Yoon, S.R., et al., Huntington disease expansion mutations in humans can occur before meiosis is completed. Proc Natl Acad Sci U S A, 2003. 100(15): p. 8834-8.

202. Zhang, Y., et al., Age and insertion site dependence of repeat number instability of a human DM1 transgene in individual mouse sperm. Hum Mol Genet, 2002. 11(7): p. 791-8.

203. Mirkin, S.M., Toward a unified theory for repeat expansions. Nat Struct Mol Biol, 2005. 12(8): p. 635-7.

204. Wang, Y.H. and J. Griffith, Expanded CTG triplet blocks from the myotonic dystrophy gene create the strongest known natural nucleosome positioning elements. Genomics, 1995. 25(2): p. 570-3.

205. Benzer, S., On the Topography of the Genetic Fine Structure. Proc Natl Acad Sci U S A, 1961. 47(3): p. 403-15.

206. Martorell, L., et al., Germline mutational dynamics in myotonic dystrophy type 1 males: allele length and age effects. Neurology, 2004. 62(2): p. 269-74.

207. Meiner, A., et al., Instability in the normal CTG repeat range at the myotonic dystrophy locus. J Med Genet, 1998. 35(9): p. 791.

208. Olejniczak, M. and W.J. Krzyzosiak, Genotyping of simple sequence repeats--factors implicated in shadow band generation revisited. Electrophoresis, 2006. 27(19): p. 3724-34.

209. Bork, P. and R. Copley, The draft sequences. Filling in the gaps. Nature, 2001. 409(6822): p. 818-20.

210. Eichler, E.E., R.A. Clark, and X. She, An assessment of the sequence gaps: unfinished business in a finished human genome. Nat Rev Genet, 2004. 5(5): p. 345-54.

211. Eichler, E.E., Segmental duplications: what's missing, misassigned, and misassembled--and should we care? Genome Res, 2001. 11(5): p. 653-6.

212. Garber, M., et al., Closing gaps in the human genome using sequencing by synthesis. Genome Biol, 2009. 10(6): p. R60.

213. Cetin, E., et al., Deletion mapping of chromosome 4q22-35 and identification of four frequently deleted regions in head and neck cancers. Neoplasma, 2008. 55(4): p. 299-304.

214. Berg, M., et al., Distinct high resolution genome profiles of early onset and late onset colorectal cancer integrated with gene expression data identify candidate susceptibility loci. Mol Cancer, 2010. 9: p. 100.

215. Brosens, R.P., et al., Deletion of chromosome 4q predicts outcome in stage II colon cancer patients. Anal Cell Pathol (Amst), 2010. 33(2): p. 95-104.

216. Zhao, J., et al., Non-B DNA structure-induced genetic instability and evolution. Cell Mol Life Sci, 2010. 67(1): p. 43-62.

217. Christodoulou, J., et al., Deletion hotspot in the argininosuccinate lyase gene: association with topoisomerase II and DNA polymerase alpha sites. Hum Mutat, 2006. 27(11): p. 1065-71.

218. Kunkel, T.A., The mutational specificity of DNA polymerases-alpha and -gamma during in vitro DNA synthesis. J Biol Chem, 1985. 260(23): p. 12866-74.

219. Neville, C.E., et al., High resolution genetic analysis suggests one ancestral predisposing haplotype for the origin of the myotonic dystrophy mutation. Hum Mol Genet, 1994. 3(1): p. 45-51.

220. Thornton, C.A., et al., Expansion of the myotonic dystrophy CTG repeat reduces expression of the flanking DMAHP gene. Nat Genet, 1997. 16(4): p. 407-9.

221. Coolbaugh-Murphy, M.I., et al., Microsatellite instability (MSI) increases with age in normal somatic cells. Mech Ageing Dev, 2005. 126(10): p. 1051-9.

222. Dow, D.J., et al., Instability of normal (CTG)n alleles in the DM kinase gene. J Med Genet, 1997. 34(10): p. 871-3.

223. O'Hoy, K.L., et al., Reduction in size of the myotonic dystrophy trinucleotide repeat mutation during transmission. Science, 1993. 259(5096): p. 809-12.

224. Tang, W., et al., Friedreich's Ataxia (GAA)(n)*(TTC)(n) Repeats Strongly Stimulate Mitotic Crossovers in Saccharomyces cerevisae. PLoS Genet, 2011. 7(1): p. e1001270.

225. Hicks, W.M., M. Kim, and J.E. Haber, Increased mutagenesis and unique mutation signature associated with mitotic gene conversion. Science, 2010. 329(5987): p. 82-5.

226. Muers, M., Mutation rate: DNA repair and indels boost errors. Nat Rev Genet, 2010. 11(9): p. 592.

227. Kunkel, T.A. and A. Soni, Mutagenesis by transient misalignment. J Biol Chem, 1988. 263(29): p. 14784-9.

228. Kunkel, T.A., et al., Fidelity of DNA polymerase I and the DNA polymerase I-DNA primase complex from Saccharomyces cerevisiae. Mol Cell Biol, 1989. 9(10): p. 4447-58.

229. Lopez Castel, A., J.D. Cleary, and C.E. Pearson, Repeat instability as the basis for human diseases and as a potential target for therapy. Nat Rev Mol Cell Biol, 2010. 11(3): p. 165-70.

230. Wang, Y.H., et al., Preferential nucleosome assembly at DNA triplet repeats from the myotonic dystrophy gene. Science, 1994. 265(5172): p. 669-71.

231. Hou, C., et al., Incision-dependent and error-free repair of (CAG)(n)/(CTG)(n) hairpins in human cell extracts. Nat Struct Mol Biol, 2009. 16(8): p. 869-75.

232. Tam, M., et al., Slipped (CTG).(CAG) repeats of the myotonic dystrophy locus: surface probing with anti-DNA antibodies. J Mol Biol, 2003. 332(3): p. 585-600.

233. DeJesus-Hernandez, M., et al., Expanded GGGGCC hexanucleotide repeat in noncoding region of C9ORF72 causes chromosome 9p-linked FTD and ALS. Neuron, 2011. 72(2): p. 245-56.

234. Renton, A.E., et al., A hexanucleotide repeat expansion in C9ORF72 is the cause of chromosome 9p21-linked ALS-FTD. Neuron, 2011. 72(2): p. 257-68.

235. Frappier, L., et al., Characterization of the binding specificity of two anticruciform DNA monoclonal antibodies. J Biol Chem, 1989. 264(1): p. 334-41.

236. Steinmetzer, K., M. Zannis-Hadjopoulos, and G.B. Price, Anti-cruciform monoclonal antibody and cruciform DNA interaction. J Mol Biol, 1995. 254(1): p. 29-37.

237. Frappier, L., et al., Monoclonal antibodies to cruciform DNA structures. J Mol Biol, 1987. 193(4): p. 751-8.

238. Stollar, B.D., Antibodies to DNA. CRC Crit Rev Biochem, 1986. 20(1): p. 1-36.

239. Zannis-Hadjopoulos, M., et al., Effect of anti-cruciform DNA monoclonal antibodies on DNA replication. EMBO J, 1988. 7(6): p. 1837-44.

240. Ward, G.K., et al., The dynamic distribution and quantification of DNA cruciforms in eukaryotic nuclei. Exp Cell Res, 1990. 188(2): p. 235-46.

241. Ward, G.K., et al., DNA cruciforms and the nuclear supporting structure. Exp Cell Res, 1991. 195(1): p. 92-8.

242. Callejo, M., et al., The 14-3-3 protein homologues from Saccharomyces cerevisiae, Bmh1p and Bmh2p, have cruciform DNA-binding activity and associate in vivo with ARS307. J Biol Chem, 2002. 277(41): p. 38416-23.

243. Belotserkovskii, B.P. and B.H. Johnston, Polypropylene tube surfaces may induce denaturation and multimerization of DNA. Science, 1996. 271(5246): p. 222-3.

244. Belotserkovskii, B.P. and B.H. Johnston, Denaturation and association of DNA sequences by certain polypropylene surfaces. Anal Biochem, 1997. 251(2): p. 251-62.

245. Svaren, J., et al., DNA denatures upon drying after ethanol precipitation. Nucleic Acids Res, 1987. 15(21): p. 8739-54.

246. Kohne, D.E., S.A. Levison, and M.J. Byers, Room temperature method for increasing the rate of DNA reassociation by many thousandfold: the phenol emulsion reassociation technique. Biochemistry, 1977. 16(24): p. 5329-41.

247. Akiyama, K. and Y. Nishi, Cloning and physical mapping of DNA sequences encompassing a region in N-myc amplicons of a human neuroblastoma cell line. Nucleic Acids Res, 1991. 19(24): p. 6887-94.

248. Mor, O., et al., Novel DNA sequences at chromosome 10q26 are amplified in human gastric carcinoma cell lines: molecular cloning by competitive DNA reassociation. Nucleic Acids Res, 1991. 19(1): p. 117-23.

249. Shiloh, Y., et al., Rapid cloning of multiple amplified nucleotide sequences from human neuroblastoma cell lines by phenol emulsion competitive DNA reassociation. Gene, 1987. 51(1): p. 53-9.

250. Shelbourne, P., et al., Unstable DNA may be responsible for the incomplete penetrance of the myotonic dystrophy phenotype. Hum Mol Genet, 1992. 1(7): p. 467-73.

251. Schwartz, Y.B., T.G. Kahn, and V. Pirrotta, Characteristic low density and shear sensitivity of cross-linked chromatin containing polycomb complexes. Mol Cell Biol, 2005. 25(1): p. 432-9.

252. Bell, D., et al., Anti-cruciform DNA affinity purification of active mammalian origins of replication. Biochim Biophys Acta, 1991. 1089(3): p. 299-308.

253. Hewish, D.R. and L.A. Burgoyne, Chromatin sub-structure. The digestion of chromatin DNA at regularly spaced sites by a nuclear deoxyribonuclease. Biochem Biophys Res Commun, 1973. 52(2): p. 504-10.

254. Bacolla, A., et al., Flexible DNA: genetically unstable CTG.CAG and CGG.CCG from human hereditary neuromuscular disease genes. J Biol Chem, 1997. 272(27): p. 16783-92.

255. Warner, J.P., et al., A general method for the detection of large CAG repeat expansions by fluorescent PCR. J Med Genet, 1996. 33(12): p. 1022-6.

256. Zentilin, L. and M. Giacca, The renaissance of competitive PCR as an accurate tool for precise nucleic acid quantification. Methods Mol Biol, 2010. 630: p. 233-48.

257. Zentilin, L. and M. Giacca, Competitive PCR for precise nucleic acid quantification. Nat Protoc, 2007. 2(9): p. 2092-104.

258. Gaubatz, J., M. Ellis, and R. Chalkley, Nuclease digestion studies of mouse chromatin as a function of age. J Gerontol, 1979. 34(5): p. 672-9.

259. Larsen, A. and H. Weintraub, An altered DNA conformation detected by S1 nuclease occurs at specific regions in active chick globin chromatin. Cell, 1982. 29(2): p. 609-22.

260. Nickol, J. and R.G. Martin, DNA stem-loop structures bind poorly to histone octamer cores. Proc Natl Acad Sci U S A, 1983. 80(15): p. 4669-73.

261. Nobile, C., J. Nickol, and R.G. Martin, Nucleosome phasing on a DNA fragment from the replication origin of simian virus 40 and rephasing upon cruciform formation of the DNA. Mol Cell Biol, 1986. 6(8): p. 2916-22.

262. Marcadier, J.L. and C.E. Pearson, Fidelity of primate cell repair of a double-strand break within a (CTG).(CAG) tract. Effect of slipped DNA structures. J Biol Chem, 2003. 278(36): p. 33848-56.

263. Dobbing, J. and J. Sands, Comparative aspects of the brain growth spurt. Early Hum Dev, 1979. 3(1): p. 79-83.

264. Shankle, W.R., et al., Evidence for a postnatal doubling of neuron number in the developing human cerebral cortex between 15 months and 6 years. J Theor Biol, 1998. 191(2): p. 115-40.

265. ten Donkelaar, H.J., et al., Development and developmental disorders of the human cerebellum. J Neurol, 2003. 250(9): p. 1025-36.

266. Marino, T.A., et al., Proliferating cell nuclear antigen in developing and adult rat cardiac muscle cells. Circ Res, 1991. 69(5): p. 1353-60.

267. Engerman, R.L., D. Pfaffenbach, and M.D. Davis, Cell turnover of capillaries. Lab Invest, 1967. 17(6): p. 738-43.

268. Cruz-Munoz, W., et al., ENU induces mutations in the heart of lacZ transgenic mice. Mutat Res, 2000. 469(1): p. 23-34.

269. Bergmann, O., et al., Evidence for cardiomyocyte renewal in humans. Science, 2009. 324(5923): p. 98-102.

270. Gennarelli, M., et al., Prediction of myotonic dystrophy clinical severity based on the number of intragenic [CTG]n trinucleotide repeats. Am J Med Genet, 1996. 65(4): p. 342-7.

271. Zappacosta, B., et al., Psychiatric symptoms do not correlate with cognitive decline, motor symptoms, or CAG repeat length in Huntington's disease. Arch Neurol, 1996. 53(6): p. 493-7.

272. Giordano, M., et al., Problems arising in correlating clinical and molecular data in myotonic dystrophy. Clin Genet, 1995. 47(6): p. 302-4.

273. Umar, A., J.C. Boyer, and T.A. Kunkel, DNA loop repair by human cell extracts. Science, 1994. 266(5186): p. 814-6.

274. Berg, I., et al., Two modes of germline instability at human minisatellite MS1 (locus D1S7): complex rearrangements and paradoxical hyperdeletion. Am J Hum Genet, 2003. 72(6): p. 1436-47.

275. Buard, J., et al., Somatic versus germline mutation processes at minisatellite CEB1 (D2S90) in humans and transgenic mice. Genomics, 2000. 65(2): p. 95-103.

276. Wang, L., et al., Aphidicolin-induced FRA3B breakpoints cluster in two distinct regions. Genomics, 1997. 41(3): p. 485-8.

277. Guglietta, S., G. Pantaleo, and C. Graziosi, Long sequence duplications, repeats, and palindromes in HIV-1 gp120: length variation in V4 as the product of misalignment mechanism. Virology, 2010. 399(1): p. 167-75.

278. Boyer, J.C., K. Bebenek, and T.A. Kunkel, Unequal human immunodeficiency virus type 1 reverse transcriptase error rates with RNA and DNA templates. Proc Natl Acad Sci U S A, 1992. 89(15): p. 6919-23.

279. Malmgren, H., et al., Methylation and mutation patterns in the fragile X syndrome. Am J Med Genet, 1992. 43(1-2): p. 268-78.

280. Naumann, A., et al., A distinct DNA-methylation boundary in the 5'-upstream sequence of the FMR1 promoter binds nuclear proteins and is lost in fragile X syndrome. Am J Hum Genet, 2009. 85(5): p. 606-16.

281. Yu, Z., et al., PolyQ repeat expansions in ATXN2 associated with ALS are CAA interrupted repeats. PLoS One, 2011. 6(3): p. e17951.

282. Daoud, H., et al., Association of long ATXN2 CAG repeat sizes with increased risk of amyotrophic lateral sclerosis. Arch Neurol, 2011. 68(6): p. 739-42.

283. Van Damme, P., et al., Expanded ATXN2 CAG repeat size in ALS identifies genetic overlap between ALS and SCA2. Neurology, 2011. 76(24): p. 2066-72.

284. Lee, T., et al., Ataxin-2 intermediate-length polyglutamine expansions in European ALS patients. Hum Mol Genet, 2011. 20(9): p. 1697-700.

285. Garofalo, O., et al., Androgen receptor gene polymorphisms in amyotrophic lateral sclerosis. Neuromuscul Disord, 1993. 3(3): p. 195-9.

286. Aaltonen, L.A., et al., Clues to the pathogenesis of familial colorectal cancer. Science, 1993. 260(5109): p. 812-6.

287. Oda, S., et al., Precise assessment of microsatellite instability using high resolution fluorescent microsatellite analysis. Nucleic Acids Res, 1997. 25(17): p. 3415-20.

288. Chaires, J.B., A competition dialysis assay for the study of structure-selective ligand binding to nucleic acids. Curr Protoc Nucleic Acid Chem, 2003. Chapter 8: p. Unit 8.3.

289. Song, G. and J. Ren, Recognition and regulation of unique nucleic acid structures by small molecules. Chem Commun (Camb), 2010. 46(39): p. 7283-94.

290. Francois, J.C. and C. Helene, Recognition of hairpin-containing single-stranded DNA by oligonucleotides containing internal acridine derivatives. Bioconjug Chem, 1999. 10(3): p. 439-46.

291. Leontis, N.B., et al., A model for the solution structure of a branched, three-strand DNA complex. J Biomol Struct Dyn, 1993. 11(2): p. 215-23.

292. Rosen, M.A. and D.J. Patel, Structural features of a three-stranded DNA junction containing a C-C junctional bulge. Biochemistry, 1993. 32(26): p. 6576-87.

293. Rosen, M.A. and D.J. Patel, Conformational differences between bulged pyrimidines (C-C) and purines (A-A, I-I) at the branch point of three-stranded DNA junctions. Biochemistry, 1993. 32(26): p. 6563-75.

294. Thiviyanathan, V., et al., Hybrid-hybrid matrix structural refinement of a DNA three-way junction from 3D NOESY-NOESY. J Biomol NMR, 1999. 14(3): p. 209-21.
