NOVEL GENOME ENGINEERING STRATEGIES TO MODEL REPEAT EXPANSION DISEASES

By

RUAN OLIVEIRA

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2018

© 2018 Ruan Oliveira

To my parents, whose hard work and continuous support allowed me to obtain a doctoral degree

ACKNOWLEDGMENTS

First, I would like to thank my parents, Rosane and Rudimar Oliveira, for their unconditional love and uninterrupted support over the last 26 years. Their commitment to my education makes me prouder than my own graduate degree. Next, I would like to express my deepest gratitude to Andriel Fenner, whose ears endured my daily complaints about graduate school. His continued support kept me sane and his company eased the process of transitioning into adulthood (in progress). Also, I would like to thank Maria Seabra for crossing my path in 2010, when my sophomore version went to a conference in São Paulo and met this loud and contagious woman who was unable to stop talking about her experiences as a Ph.D. student at the University of

Florida. If I did not meet Maria, I would have never heard of Gainesville.

I would like to thank my Ph.D. mentor, Dr. Maurice Swanson, for giving me freedom to pursue my own ideas and trusting me. I also thank Maury for teaching me how to be a scientist and for his patience with my learning curve. I am grateful to Myrna

Stenberg, who taught me discipline and offered me psychological support in the moments I needed the most. I would like to thank Amrutha Pattamatta, Belinda Pinto,

Barbara Perez and Franjo Ivanković, for their continuous emotional support and friendship. I am also very grateful to Curtis Nutter, Jodi Bubenik and James Thomas, who assisted me in my research endeavors, especially during the last year. Additionally,

I would like to thank past and current members from the Swanson lab (Konstantinos

Charizanis, Mini Manchanda, Marianne Goodwin, Apoorva Mohan, Marina Scotti,

Catherine Marten, Łukasz Sznajder, Olgert Bardhi, Jordan Ross, Katherine Narvaez and Jorge Cañas) for creating a pleasant work environment and for their help and/or advice.

4

I would like to thank the Graduate Program in Biomedical Sciences, especially

Peggy Wallace and the Genetics concentration, for their commitment to graduate education. I am also grateful to my committee members, Laura Ranum, Eric Wang,

Susan Semple-Rowland and Naohiro Terada, for their availability and guidance. Last, I thank the Department of Molecular Genetics and the Center for NeuroGenetics, for their resources and support in the last 4 years.

5

TABLE OF CONTENTS

page

ACKNOWLEDGMENTS ...... 4

LIST OF TABLES ...... 8

LIST OF FIGURES ...... 9

LIST OF ABBREVIATIONS ...... 11

ABSTRACT ...... 15

CHAPTER

1 INTRODUCTION ...... 17

Repeat Expansion Diseases ...... 17 Pathomechanisms of Repeat Expansions ...... 18 Myotonic Dystrophy Type I: Mechanisms and Mouse Models ...... 21

2 ENGINEERING THE ENDOGENOUS DMPK LOCUS ...... 31

Background ...... 31 Targeting the Endogenous Dmpk 3’ UTR by CRISPR/Cas9...... 34 Gibson Assembly of CTGexp Recombination Templates ...... 36 Outcomes of CRISPR/Cas9 Zygote Microinjections ...... 37 Synthesis of Recombination Templates Carrying Larger Repeat Expansions ...... 40 The Knockin of 480 CTG Repeats in the Endogenous Dmpk 3’ UTR ...... 41 Discussion ...... 42

3 PHENOTYPIC OUTCOMES OF DMPK KNOCKIN MODELS ...... 54

Background ...... 54 Transcriptional and Translational Assessments in Dmpk170 Muscles ...... 55 Accumulation of RNA Foci in the Absence of Spliceopathy ...... 57 Increased Levels of RNA Foci Following Myogenic Differentiation ...... 58 The Transcriptomic Profile of Dmpk170/170 Myotubes ...... 59 Discussion ...... 60

4 THE CHOROID PLEXUS INVOLVEMENT IN DM1 ...... 68

Background ...... 68 In the Dmpk170 Brain, RNA Foci Are Restricted to Choroidal Epithelial Cells ...... 69 Increased Dmpk Expression in the Mouse Choroid Plexus ...... 70 Choroid Plexus Development, Functions and Pathology ...... 71 Consequences of Mbnl Loss-of-Function in the Choroid Plexus ...... 75

6

Discussion ...... 79

5 CONCLUSIONS AND FUTURE DIRECTIONS ...... 90

6 MATERIALS AND METHODS...... 94

Design and Validation of crRNAs Targeting Dmpk ...... 94 Propagation of Plasmids in Escherichia coli ...... 95 Gibson Assembly of Recombination Templates ...... 95 Rolling Circle Amplification ...... 97 Repeat Dimerization by Golden Gate Assembly ...... 98 Zygote Microinjections ...... 98 Isolation of Genomic DNA from Tail Snips ...... 99 Southern Blotting ...... 100 PCR Genotyping ...... 101 Repeat-primed PCR...... 101 RNA Isolation from Dissected Tissues and Cultured Cells ...... 102 cDNA Synthesis ...... 103 Quantitative PCR ...... 103 Splicing Analysis by RT-PCR ...... 104 Western Blotting ...... 104 RNA Fluorescence in situ Hybridization ...... 105 Myoblast Isolation, Culture and Differentiation ...... 106 RNA-Seq ...... 106

LIST OF REFERENCES ...... 108

BIOGRAPHICAL SKETCH ...... 123

7

LIST OF TABLES

Table page

1-1 Mouse models developed to study and treat myotonic dystrophy ...... 29

2-1 Primers used to amplify the targeted Dmpk locus for T7E1 assay ...... 52

2-2 Primers used to amplify homology arms for Gibson assembly ...... 52

2-3 Primers used to amplify Southern blot probes and resulting amplicons...... 52

2-4 Primers used to amplify the edited Dmpk 3’ UTR locus for genotyping purposes ...... 53

2-5 Primers used for fluorescent repeat-primed PCR ...... 53

2-6 Mutant Dmpk founders: germline transmission rates and expansion length ...... 53

3-1 Primers used to measure transcript levels by qPCR ...... 67

3-2 Primers used to asses Dmpk transcript distribution and alternative splicing by RT-PCRs...... 67

3-3 Animals used for isolation of primary myoblasts ...... 67

4-1 Choroid plexus RNA-Seq ...... 89

8

LIST OF FIGURES

Figure page

1-1 Examples of pathogenic microsatellites ...... 27

1-2 DM1 pathogenic cascade ...... 28

2-1 DMPK sequence conservation in mice ...... 44

2-2 Validation of Dmpk targeting by CRISPR/Cas9 ...... 45

2-3 Assembly of Dmpk targeting construct ...... 46

2-4 Rolling circle amplification of recombination templates ...... 47

2-5 Southern blot screening of mutant Dmpk founders ...... 48

2-6 Genotyping progeny from mutant Dmpk lines ...... 49

2-7 Generation of recombination templates carrying larger expansions ...... 50

2-8 Genotyping Dmpk170 and Dmpk480 knockin lines...... 51

3-1 Transcriptional and translational assessments in Dmpk knockin mice ...... 62

3-2 Accumulation of RNA foci in the absence of spliceopathy ...... 63

3-3 Increased Dmpk expression and RNA foci accumulation during myogenesis ..... 64

3-4 expression changes in Dmpk170 myotubes ...... 65

3-5 Alternative splicing changes in Dmpk170 myotubes ...... 66

4-1 Distribution of RNA foci in the brain of Dmpk170 knockin mice ...... 82

4-2 Increased Dmpk expression levels in the choroid plexus of wild-type mice ...... 83

4-3 Primary functions of the choroid plexus-cerebrospinal fluid system ...... 84

4-4 Differential in the choroid plexus of Mbnl1 and Mbnl2 knockout mice ...... 85

4-5 (GO) analysis of differentially expressed in the CP of Mbnl1 and Mbnl2 knockout mice ...... 86

4-6 Examples of downregulated genes in the CP of Mbnl1 or Mbnl2 knockout mice...... 87

9

4-7 Alternative splicing changes in the CP of Mbnl1 and Mbnl2 knockout mice ...... 88

10

LIST OF ABBREVIATIONS

AD Alzheimer's disease

Aldh1l2 aldehyde dehydrogenase 1 family member L2

ALS amyotrophic lateral sclerosis

Ap2m1 adaptor-related complex 2 mu 1 subunit

AQP1 1

AR androgen receptor

ATP2A1/Atp2a1 sarcoplasmic/endoplasmic reticulum calcium ATPase 1

ATXN8 ataxin 8

BBB blood-brain barrier

BCSFB blood-cerebrospinal fluid barrier bp (s)

C9ORF72 9 open reading frame 72

Cacna2d1 calcium voltage-gated channel auxiliary subunit alpha2 delta 1

Cas CRISPR-associated

Ccnd2 cyclin D2

CDM congenital myotonic dystrophy

CDS coding sequence

CELF CUGBP/Elav-like family

Ckmt2 creatine kinase mitochondrial 2

CLCN1/Clcn1 1

CLIP cross-linking immunoprecipitation

CNS central nervous system

CP choroid plexus

CRISPR clustered regularly interspaced palindromic repeats

11

crRNA CRISPR RNA

CSF cerebrospinal fluid

CUGBP CUG binding protein

DBI diazepam binding inhibitor

DKO double knockout

DM myotonic dystrophy

DM1 myotonic dystrophy type 1

DM2 myotonic dystrophy type 2

DMPK dystrophia myotonica protein kinase

DNA deoxyribonucleic acid

DSB double-stranded break

EAE experimental autoimmune encephalomyelitis

ES embryonic stem exp expansion

FISH fluorescence in situ hybridization

FMR1 fragile X mental retardation 1

FRAXA fragile X syndrome

FTD frontotemporal dementia

Fwd forward

GABA gamma-aminobutyric acid

Gln glutamine gRNA guide RNA

HD Huntington's disease

HDR homology-directed repair

HSA human skeletal actin

12

HTT huntingtin

IFN interferon

IGF2 insulin growth factor 2 kb kilobase

Kcne2 potassium voltage-gated channel subfamily E member 2

KO knockout

LB lysogeny broth

LIF leukemia inhibitory factor mbl muscleblind

MBNL muscleblind-like mRNA messenger ribonucleic acid

MS multiple sclerosis

NMD nonsense-mediated decay

Nmrk2 nicotinamide riboside kinase 2

NO nitric oxide nt nucleotide(s)

ORF open reading frame

PAM protospacer adjacent motif

PCR polymerase chain reaction

PKC protein kinase C qPCR quantitative real-time PCR

RAN repeat-associated non-ATG

RCA rolling circle amplification

REM rapid eye movement

Rev reverse

13

RNA ribonucleic acid

Rpph1 RNase P component H1

RT-PCR reverse transcriptase PCR

SBMA spinal and bulbar muscular atrophy

SCA spinocerebellar ataxia sgRNA single-guide RNA

SHH sonic hedgehog

SIX5 sine oculis homeobox homolog 5

Slco1c1 solute carrier organic anion transporter family member 1C1

SNP single nucleotide polymorphism

Sod3 superoxide dismutase 3

SVZ subventricular zone

T3 triiodothyronine

T4 thyroxine

TALEN transcription activator-like effector nuclease

TBP TATA-binding protein

THs thyroid hormones

TKO triple knockout

Tnnt2 troponin T2, cardiac type tracrRNA trans-activating crRNA

TTR transthyretin

Tyr tyrosinase

UTR untranslated region

WGA wheat-germ agglutinin

ZFN zinc finger nuclease

14

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

NOVEL GENOME ENGINEERING STRATEGIES TO MODEL REPEAT EXPANSION DISEASES

By

Ruan Oliveira

May 2018

Chair: Maurice S. Swanson Major: Medical Sciences

Repeat expansions are responsible for over than 30 neurological and neuromuscular diseases. Since their discovery in the early 1990s, several pathomechanisms have been identified, suggesting multiple strategies for therapeutic development. Despite these successes, no treatment is currently available for these disorders, in part due to the limited availability of animal models that faithfully recapitulate the entire spectrum of disease symptoms. While human genes that harbor repeat expansions are strongly conserved in the mouse genome, the process of inserting large expansions in these loci has been largely hindered by technical challenges, including the instability of targeting vectors in bacteria and the small repeat numbers capable of generating viable mouse ES cells for blastocyst injections.

To overcome these limitations, we used a bacteria-free cloning strategy to generate recombination templates carrying large repeat expansions, followed by

CRISPR/Cas9 gene editing of mouse zygotes, a technology that bypasses the requirement of targeting mouse ES cells. We tested this approach by developing an allelic series of knockin models for myotonic dystrophy type I (DM1), the most common

15

muscular dystrophy in adults, a disease caused by a CTG expansion (CTGexp) in the 3’ untranslated region (3’ UTR) of the DMPK gene on chromosome 19q13.3.

Microinjection of mouse zygotes with CRISPR/Cas9 components and different recombination templates generated Dmpk knockin models carrying from 170 to 480

CTG repeats, the largest repeat expansion inserted in a mouse gene to date. Dmpk170 mice accumulated mutant transcripts in nuclear RNA foci and showed early signs of spliceopathy in differentiated primary myoblasts. In the Dmpk170 brain, RNA foci accumulate exclusively in the choroid plexus, a structure that regulates cerebrospinal fluid homeostasis. These results confirmed the feasibility of our genome engineering strategy and provided novel models to develop therapeutic strategies that target all tissues affected in DM1. Future studies include the characterization of Dmpk480 mice and the development of additional knockin models for other repeat expansion diseases.

16

CHAPTER 1 INTRODUCTION

Repeat Expansion Diseases

The ambitious efforts to sequence the transformed our perception of genetic content and complexity1, 2. Approximately 98% of our DNA is not protein-coding, and nearly half of it is comprised of repetitive sequences3. Two major classes of repetitive DNA elements have been described to date. Non-adjacent or interspaced DNA repeats include transposable elements thought to accelerate evolution by gene conversion4. Adjacent or tandem repeats are further divided into three categories, according to the size of each repeat unit5. The largest members, known as satellites, are composed of DNA repeats larger than 60 bp, and can be found in the centromeres of some chromosomes6. Minisatellites are smaller, ranging from 10-60 bp per unit, and can also be found in some centromeres and in the sub-telomeric region of human chromosomes7. Microsatellites are the shortest members, with repeat units ranging from 2-10 bp8. Perhaps the most prominent example of microsatellite is the

TTAGGG repeat found in human telomeres9, but tandem repeats also exists in coding10 and non-coding11 sequences of mammalian genes.

Microsatellites are highly prone to folding into a variety of imperfect hairpins, quadruplex-like and slipped-stranded structures12, 13, presenting several challenges to some of the most essential processes in the cell, including DNA replication, repair, recombination and transcription14, 15. When attempting to handle microsatellites, cells can introduce extra repeat copies, leading to mutations known as repeat expansions. In most cases, small increases in repeat size are asymptomatic. Over several years, repeat instability can lead to large expansions, triggering the pathogenic cascade of

17

several diseases16. To date, microsatellite repeat expansions have been identified as the cause of over than 30 neurological diseases, including fragile X syndrome,

Huntington’s disease, amyotrophic lateral sclerosis, myotonic dystrophy and a set of spinocerebellar ataxias16, 17.

Pathomechanisms of Repeat Expansions

Pathogenic microsatellite repeat expansions can be found in coding and non- coding portions of several human genes12 (Figure 1-1). Protein-coding microsatellites are trinucleotide repeats that naturally exist in many essential genes10, including huntingtin (HTT), androgen receptor (AR) and TATA-binding protein (TBP)18. While repeats smaller than 30 units are usually non-pathogenic, larger expansions can lead to protein loss and/or gain-of-function, triggering the pathogenic cascade of disorders such as Huntington’s disease (HD), spinal and bulbar muscular atrophy (SBMA) and spinocerebellar ataxia type 17 (SCA17)19.

While the direct effect of coding microsatellites on protein function is fairly intuitive, the pathomechanisms underlying diseases caused by non-coding microsatellites has puzzled the field since these mutations were discovered in the early

1990s20-23. One possibility is the effect of GC-rich microsatellites on chromatin accessibility. Indeed, this mechanism was observed for the CGG repeat in the 5’ untranslated region (5’ UTR) of the Fragile X mental retardation 1 (FMR1) gene on chromosome X20-23. The presence of >230 CGG repeats in FMR1 leads to methylation and gene silencing24-26. Male infants carrying a pathogenic CGG expansion completely lack the protein product of FMR1, FMRP, and develop Fragile X Syndrome (FRAXA)27.

Several disease-relevant phenotypes are also observed in Fmr1 knockout mice, confirming that protein loss-of-function is the primary mechanism of FRAXA28.

18

Gene silencing was a plausible rationale to explain how non-coding repeat expansions could lead to disease. Consequently, this mechanism was further investigated in myotonic dystrophy type I (DM1), a dominantly-inherited disorder caused by a CTG expansion (CTGexp) in the 3’ untranslated region (3’ UTR) of the dystrophia myotonica protein kinase (DMPK) gene29-33. Surprisingly, this disease could not be explained by partial protein loss-of-function or DMPK haploinsufficiency, since Dmpk knockout mice did not develop disease-relevant phenotypes34, 35.

Repeat expansions in the 3’ UTR are expected to be present in fully mature messenger RNA (mRNA) molecules. Therefore, fluorescently-labeled oligonucleotides complementary to the repeat expansion would be able to hybridize to expanded mRNAs, becoming a useful tool to evaluate the presence and localization of expanded transcripts. In fact, the use of CAG probes for RNA fluorescent in situ hybridization

(RNA-FISH) revealed nuclear accumulation of mutant CUG expanded (CUGexp) transcripts in DM1 patient cells36-38. To date, these focal depositions of expanded RNAs, also known as RNA foci, have been detected in multiple repeat expansion diseases.

The observation that a repeat expansion in a non-coding region of a gene resulted in a dominantly inherited disorder led to the RNA gain-of-function hypothesis, in which transcripts with an expanded RNA repeat formed an unusual RNA structure that, in turn, sequestered RNA-binding with a high affinity for the binding motifs present in the expansion39. Indeed, numerous experiments have identified RNA processing factors with high affinity for CUGexp transcripts39. Later, this family of proteins was named muscleblind-like or MBNL due to homology with Drosophila’s muscleblind

(mbl)40. Further studies demonstrated co-localization of MBNL proteins to CUGexp RNA

19

foci in DM1 patient cells, validating the mechanism of MBNL sequestration in DM141. In agreement, Mbnl1 and Mbnl2 knockout mice recapitulated many DM1 symptoms, establishing RNA gain-of-function as a possible pathomechanism of repeat expansions42, 43. Additional studies in DM1 also revealed bidirectional transcription across the expanded repeat tract, giving rise to two RNA species of possible toxicity44,

45. To date, bidirectional transcription has been reported in numerous repeat expansion diseases, including C9ORF72 amyotrophic lateral sclerosis/frontotemporal dementia

(ALS/FTD)46 and spinocerebellar ataxia type 8 (SCA8)47.

In 2011, a study focused on SCA8 pathomechanisms transformed our understanding of molecular pathogenesis driven by repeat expansions48. Previously reported in Huntington’s disease and disorders characterized by ataxia, poly-glutamine

(PolyGln) peptides were detected in SCA8 autopsy tissue47. These peptides were initially thought to be derived from a single ATXN8 open reading frame (ORF)47. To evaluate the effects of RNA gain-of-function in SCA8 in the absence of polyGln, cells were transfected with CAG constructs lacking an ATG initiation codon, which in theory would prevent translation. Surprisingly, polyGln peptides were still detected, originating a mechanism known as repeat-associated non-ATG (RAN) translation48. Additionally, it was reported that RAN translation was capable of producing peptides in all different reading frames from repeat expansions, from both sense and antisense transcripts, in the absence of frame-shifting46, 48. Altogether, these findings suggest that a single repeat expansion can produce two distinct transcripts of possible RNA gain-of-function toxicity, and multiple RAN peptides49. To date, RAN translation products have been detected in numerous repeat expansion diseases, including those caused by coding

20

microsatellites such as Huntington’s disease50, and non-coding microsatellites such as myotonic dystrophies type I (DM1)48, type II (DM2)51 and C9ORF72-ALS/FTD46. The relative contribution of RAN peptides to disease pathogenesis, as well as the molecular mechanisms that initiate RAN translation, are currently being investigated.

Myotonic Dystrophy Type I: Mechanisms and Mouse Models

In 1909, Hans Steinert described a form of muscular dystrophy characterized by muscle weakness and delayed muscle relaxation following contraction, a symptom also known as myotonia52. In 1992, the genetic cause of Steinert’s disease – now renamed as myotonic dystrophy type I or DM1 – was described as a CTG repeat expansion

(CTGexp) in the 3’ untranslated region (3’ UTR) of the dystrophia myotonica protein kinase (DMPK) gene on chromosome 19q13.329-33. In unaffected individuals, this locus contains 5-37 CTG repeats but in DM1 these repeats expand into the thousands. Larger repeats correlate with an earlier disease onset and more severe symptoms53. Repeat size is also subject to somatic mosaicism, with sizes in peripheral blood correlating poorly to the expansion lengths in post-mitotic tissues such as skeletal muscle or brain54-57. In the germline, repeat instability can lead to transmission of larger repeats to successive generations, resulting in earlier disease onsets and an increase in disease severity, a phenomenon referred to as genetic anticipation58. Maternal transmission of alleles larger than 1,000 repeats can lead to congenital myotonic dystrophy (CDM), the most severe onset of DM1 characterized by respiratory distress, intellectual disability, and low muscle tone or hypotonia59, 60. Because DM1 is mostly an adult or late-onset disease, the birth of a child with CDM usually prompts genetic testing of affected families.

21

The most established DM1 pathomechanism is RNA gain-of-function. Upon transcription, the expanded DMPK allele gives rise to CUGexp RNAs that sequester

RNA-binding proteins from the muscleblind-like (MBNL) family41. MBNL proteins are major regulators of RNA processing, with well-established roles in alternative splicing61, alternative polyadenylation62 and RNA localization63-65. As described in numerous studies, MBNL1 and MBNL2 are required for alternative splicing of hundreds of developmentally-regulated genes, promoting fetal to adult splice isoform switching66.

Sequestration of MBNL1 and 2 by CUGexp RNAs results in the presence of fetal splicing isoforms in adult tissues, a characteristic feature underlying DM1 pathogenesis. A prominent example is the misplicing of chloride channel 1 (CLCN1) in the skeletal muscle of DM1 patients67. In unaffected adults, MBNL1 promotes skipping of exon 7a in the CLCN1 mRNA, resulting in a transcript that allows translation of CLCN1 protein required for muscle relaxation42. In DM1, sequestration of MBNL1 by CUGexp RNAs leads to increased inclusion of exon 7a, resulting in CLCN1 transcripts containing a premature stop codon that triggers degradation by nonsense mediated decay (NMD).

Decreased CLCN1 protein levels in the skeletal muscle of DM1 patients undermines the entrance of chloride ions, leading to delayed muscle relaxation following contraction, a phenotype also known as myotonia67, 68.

Additional contributors to the DM1 pathogenic cascade have been reported.

Steady-state levels of proteins from the CUGBP/Elav-like family (CELF) are increased in DM1 due to hyperphosphorylation mediated by protein kinase C (PKC)69. CELF proteins are MBNL antagonists, and the increase in CELF half-life further favors the existence of fetal splicing isoforms in adult tissues70. The mechanism triggering PKC

22

activation in DM1 is unknown, and it is unclear whether the increase in CELF1 levels is a direct effect of CUGexp RNAs or an indirect consequence of tissue repair. Other mechanisms, including RAN translation48 and RNA interference71, have also been reported in DM1, but the extent of their contribution to disease pathogenesis is not clear

(Figure 1-2).

Since the DM1 causal mutation was first described in 1992, several mouse models have been developed to improve our understanding of disease mechanisms and develop therapeutic strategies (Table 1-1)72. Early studies suggested that DM1 could result from a decrease in expression of DMPK73, 74 and its downstream gene

SIX575. While decreased levels of DMPK protein have been reported in DM1 patients76,

Dmpk or Six5 heterozygous knockouts – true models of haploinsufficiency – did not develop disease-relevant phenotypes34, 35, 77-80, suggesting that DMPK or SIX5 loss-of- function are not the primary mechanisms underlying DM1 pathogenesis.

In the years following the discovery of MBNL sequestration by CUGexp RNAs41, a variety of Mbnl knockout models were developed42, 43, 81-83. Mbnl1 knockout mice presented myotonia caused by Clcn1 misplicing, particulate cataracts similar to those observed in DM1 patients, and histopathological features of DM1 in skeletal muscle42.

Mbnl2 knockout mice showed several CNS features of DM1, including learning deficits and REM sleep disturbances43. Upregulation of MBNL2 protein in Mbnl1 knockout mice suggested a compensatory mechanism for MBNL proteins in RNA processing81. While constitutive Mbnl1; Mbnl2 double knockouts (DKO) are embryonic lethal, a muscle- specific knockout of Mbnl2 in an Mbnl1 null background has been generated, indicating that compound loss of Mbnl1 and 2 is required to recapitulate several DM1 symptoms,

23

including muscle weakness and wasting81. Recently, a muscle-specific knockout of

Mbnl2 and Mbnl3 in an Mbnl1 null background, here referred as Mbnl triple knockout

(TKO), was able to mimic several symptoms observed in CDM, such as congenital myopathy and respiratory distress83. Altogether, these models suggest that progressive

MBNL titration leads to phenotypes associated with different onsets and severities of

DM1.

To test the whether CTG repeats alone are capable of driving DM1 phenotypes, additional mouse models were generated focusing on the overexpression of a CTGexp in the 3’ UTR of a human skeletal actin (HSA) transgene84. While HSASR (short repeat) mice overexpressing 5 CTG repeats did not display any DM1 features, the HSALR (long repeat) line carrying 250 CTG repeats developed myotonia, showed accumulation of

RNA foci that co-localized with MBNL1, and presented widespread misplicing of MBNL1 targets such as Clcn1 and Atp2a184. These phenotypes provided solid evidence that overexpression of CTG repeats in a locus unrelated to DMPK is sufficient to drive disease, supporting RNA gain-of-function as the major mechanism underlying DM1.

Despite their contribution to our understanding of disease pathogenesis, HSALR mice only express the randomly inserted transgene in mature skeletal muscle and, therefore, cannot be used to study the multisystemic features of DM1 or the developmental defects observed in CDM.

Transgenic lines carrying a ~45 kb insert containing a mutant human DM1 locus have also been developed85. Homozygous DM300 mice expressing two copies of a

DMPK transgene with 300 CTG repeats developed muscle histopathology, myotonia, muscle weakness and insulin resistance85, 86. Furthermore, the occurrence of

24

intergenerational instability in one DM300 line established the DMSXL model, which carries a DMPK transgene containing up to 1,800 CTG repeats87. While hemizygous

DMSXL mice presented very mild phenotypes87, 88, homozygous animals developed severe growth defects87, in addition to heart89 and CNS abnormalities88, 90. The absence of additional DMSXL lines derived from different transgenic founders, added to the requirement of homozygosity to model a dominant disease, led to skepticism over the phenotypes observed in this model. The randomly inserted transgene disrupts the Fbxl7 gene, which encodes a subunit of the ubiquitin ligase complex. This could be responsible or influence the features presented by DMSXL mice. Additionally, low levels of transgene expression in skeletal muscle, added to high mortality rates and breeding difficulties, limit the feasibility of DMSXL mice as a DM1 model72.

The existence of a highly conserved DMPK orthologue in mice35 invoked the possibility of modeling DM1 by inserting large CTGexp in the 3’ UTR of the endogenous mouse Dmpk gene. This knockin strategy would lead to expression of mutant RNAs in the appropriate spatiotemporal context of Dmpk, closely mirroring the transcript distribution observed in DM1 patients. Still, the only Dmpk knockin model developed to date harbors 98 CTG repeats – numbers much smaller than those found in post-mitotic tissues of DM1 patients91. Despite several attempts by different research groups including ours, the knockin of larger expansions remains a largely inefficient and technically challenging process. Large repeat expansions are noticeably unstable in bacterial strains routinely used for cloning, limiting the expansion size in recombination templates to less than 200 repeats. These constructs are then used for traditional homologous recombination of mouse embryonic stem (ES) cells, a notoriously

25

inefficient process that requires extensive screening and often results in repeat sizes contracting over passaging. Not surprisingly, the development of knockin models for repeat expansion diseases has very few successful stories, including the CAG111 Htt knockin for Huntington’s disease92-94 and the CGG200 Fmr1 knockin for fragile X syndrome95, 96.

In this study, we overcame the technical challenges undermining the development of knockin models carrying large repeat expansions. This dissertation will describe the genome engineering strategies that enabled the generation of Dmpk knockin models carrying up to 480 CTG repeats - the largest repeat expansion inserted in a mouse gene to date. The contribution of these models to our understanding of DM1 pathogenesis will be discussed, as well as their potential limitations and future directions.

26

Figure 1-1. Examples of pathogenic microsatellites. Repeat expansions (pink triangles) can exist in exons (thick blue boxes), 5’ or 3’ untranslated regions (thin blue boxes) or introns (grey boxes). Repeat expansions (exp) are shown in bold. Repeat-containing genes are italicized, followed by the acronym for the related genetic condition.

27

Figure 1-2. DM1 pathogenic cascade. Bidirectional transcription of a CTGexp in the 3’ UTR of DMPK produces CUGexp (1a) and CAGexp (1b) mutant transcripts. CUGexp RNA sequesters MBNL proteins (2) and may trigger a PKC-mediated CELF1 hyperphosphorylation, increasing its steady-state levels (3). MBNL sequestration, in addition to CELF stabilization, favors fetal splicing patterns for MBNL target RNAs in adult tissues. In DM1 skeletal muscle, inclusion of exon 7A in the CLCN1 mRNA (fetal isoform) triggers nonsense-mediated decay and loss of CLCN1 protein, leading to myotonia (4). When exported, mutant transcripts can undergo RAN translation, giving rise to 6 polypeptides of potential toxicity (5). MBNL sequestration by CUGexp may also impair RNA localization (6).

28

Table 1-1. Mouse models developed to study and treat myotonic dystrophy. Mouse model Rationale Contributions Limitations Dmpk KO34, 35 DMPK Established that DM1 Conflicting reports on haploinsufficiency in is not primarily impacts of Dmpk DM1 caused by DMPK ablation in mice haploinsufficiency Six5 KO77-80 SIX5 Established that DM1 Animals develop haploinsufficiency in is not caused by SIX5 nuclear cataracts, DM1 haploinsufficiency different from the subcapsular cataracts in DM1 Cnbp KO97, 98 CNBP Heterozygous mice Conflicting reports of haploinsufficiency in present myotonia, CNBP DM2 cardiac conduction downregulation in defects, cataracts, DM2 and myopathic features Mbnl1 KO42, Mbnl1 loss-of- Presents myotonia, Not a C(C)TGexp 81 function aberrant splicing, model so may not centralized myonuclei address some repeat- and subcapsular induced cataracts pathomechanisms Mbnl2 KO43 Mbnl2 loss-of- Develops DM- including RAN function relevant learning translation, RNAi and deficits and REM CELF stabilization. sleep disturbances Constitutive loss of Mbnl3 KO99 Mbnl3 loss-of- Absence of nuclear both Mbnl1 and function isoform impairs Mbnl2 is embryonic muscle regeneration lethal

CELF Overexpression of Presents molecular, Premature lethality. transgenic100- CELF in skeletal histological and Limited applications 102 muscle or heart physiological features for therapeutic of DM1 development Dmt103, 104 First DM1 transgenic Occurrence of Absence of promoter model. Transgene somatic and elements prevents comprises ~162 CTG intergenerational transgene repeats and ~750 bp instability expression, limiting of flanking DNA from the applicability of the DMPK 3’ UTR this model to studies without any promoter on repeat instability element or coding sequence

29

Table 1-1. Continued Mouse model Rationale Contributions Limitations HSALR 84 Transgenic Established the RNA Transgene is overexpression of gain-of-function expressed only in CTG250 within the mechanism post-developmental human skeletal actin underlying DM1. skeletal muscle tissue (HSA) 3’ UTR Shows RNA foci and at levels much higher splicing alterations. than the endogenous Widely used for Dmpk gene. Absence therapeutic of muscle weakness development or wasting. No effect on CELF levels DM30085 Transgenic line Occurrence of Variable expression carrying a 45 kb somatic and levels. Mild region from the DM1 intergenerational phenotype locus including a repeat instability. mutant DMPK gene Myotonia and with ~300 CTG histological changes repeats DMSXL87 Transgenic line Splicing changes in Derived from a single derived from the the central nervous founder. Low DMPK DM300 model. system and skeletal expression. Requires Contains >1,000 CTG muscle of homozygosity to repeats homozygous mice model a dominant disease EpA960105, 106 Tissue-specific Suggested that Interrupted repeat overexpression of CELF1 upregulation tract. Heart-specific 960 interrupted CTG is dependent on the model displayed repeats within exon DMPK genomic premature death. 15 of DMPK context. TRE-EGFP- Transgenic line with Dox-induced Overexpression of a CTG5 and inducible expression transgene activation non-pathogenic CTG200107 of DMPK 3’ UTR leads to myotonia repeat, CTG5, results carrying 5 or 200 and cardiac defects. in DM1 features CTG repeats Dox withdrawal reverts these phenotypes Humanized Transgenic line in Somatic instability Absence of published Dmpk CTG98 which the genomic blocked in an Msh3- data addressing Knockin91 fragment between deficient background molecular, exons 13-15 from the and increased in an histological or endogenous Dmpk Msh6-deficient physiological gene was replaced by background phenotypes from the the human fragment expression of including 98 CTG humanized mutant repeats Dmpk alleles

30

CHAPTER 2 ENGINEERING THE ENDOGENOUS DMPK LOCUS

Background

The site-specific knockin of repeat expansions is a challenging task mediated by homologous recombination. To enable this process, a recombinant DNA fragment has to be engineered to contain the desired repeat expansion flanked by large sequences of perfect homology to the targeted locus. When linearized, recombination templates trigger homology-directed repair (HDR) by invading the homologous sequence present in the locus of interest. Microsatellites are notoriously unstable in E. coli strains commonly used for cloning, limiting the expansion sizes in recombination templates to less than 200 repeats. Therefore, the generation of recombination templates carrying large CTGexp is a limiting step in the development of Dmpk knockin models for DM1.

In 2005, a bacteria-free cloning method was developed to allow synthesis of sequences known to be toxic in E. coli, such as synthetic phage genomes carrying lethal mutations108. This strategy relies on the strong strand displacement of phi29 DNA polymerase, which enables the continuous extension of circular DNA molecules by rolling circle amplification (RCA)109. While this technology was primarily developed to bypass the limitations of cloning lethal genes, it also offered an alternative to overcome the instability issues of large repeat expansions in bacteria. In fact, a 2008 study by

Osborne and Thornton successfully generated recombinant DNA fragments containing up to 2,000 uninterrupted CTG repeats110. Their approach is a promising solution to supply recombination templates carrying large expansions, simplifying the development of knockin models for DM1 and other repeat expansion diseases.

31

Despite the addition of RCA to our technical repertoire, other challenges have hindered the development of Dmpk knockin models. Gene editing by traditional ES targeting is a very inefficient process111, since it relies on the spontaneous invasion of the homologous locus by free DNA ends from the linearized recombination template.

Additionally, this strategy is very laborious, requiring positive selection and extensive screening of ES clones prior to blastocyst injection. Cultured ES clones are also prone to repeat instability, often contracting the repeat size over passaging91. These issues continued to undermine the generation of Dmpk knockin models carrying large repeat expansions, until the recent development of novel gene editing strategies that bypass the usage of mouse ES cells.

To overcome the limitations of ES targeting, several research groups developed nucleases that can be engineered to recognize a specific DNA sequence and cleave it, forming a double-stranded break (DSB) in the targeted locus112-114. Compared to traditional ES targeting, engineered nucleases offer a multi-fold improvement in editing efficiency, since the resulting DSB is a very potent trigger of DNA repair115. This dramatic increase in efficacy meant that gene editing could now be performed directly in mouse zygotes, completely eliminating the requirement for mouse ES cells.

In their early stages of development, engineered nucleases were not easily accessible to the average molecular biologist. Zinc finger nucleases (ZFNs)115 and transcription activator-like effector nucleases (TALENs)114 remained a costly alternative to traditional ES targeting, primarily due to their difficult design, synthesis and validation.

In 2012, the CRISPR/Cas9 system was finally harnessed for gene editing, offering a simplified and cost-effective option that was quickly adopted for routine applications.

32

Clustered regularly interspaced palindromic repeats (CRISPR) were initially described as a locus containing non-adjacent repetitive sequences in the genome of several bacteria including E. coli116, 117. This locus also encoded CRISPR-associated

(Cas) proteins containing helicase and nuclease domains118-121. Little was known about the function of CRISPR, until the discovery of phage sequences in the spacers between repeats118, 119, 122. These findings originated the hypothesis that CRISPR could mediate adaptive immunity in bacteria, serving as a memory of past phage infections123. Initially, the CRISPR locus was thought to function similarly to RNA interference, producing antisense RNAs aimed to degrade viral transcripts124. However, later studies demonstrated that Cas proteins, when guided by CRISPR RNAs (crRNAs)125, were capable of targeting and cleaving specific DNA sequences from invading phages126.

The CRISPR locus is very diverse among bacteria127. While types I and III

CRISPR systems demand multiple Cas proteins for DNA recognition and cleavage128, type II only requires a single protein named Cas9 – an attractive feature for genome engineering applications129, 130. In Streptococcus pyogenes, Cas9 is guided by a dual-

RNA complex, composed of a crRNA encoded by the CRISPR locus and a trans- activating crRNA (tracrRNA) required for crRNA maturation129, 131. For gene editing purposes, this dual-guide RNA complex has been simplified to a chimeric single-guide

RNA (sgRNA) containing the tracrRNA scaffold required for Cas9 association and the variable crRNA sequence that can be modified to target any sequence of interest129, 131.

To prevent cleavage of the endogenous CRISPR locus, Cas9 activity requires the presence of a protospacer adjacent motif (PAM) in the target DNA – a small sequence motif immediately downstream to the crRNA binding site132. Motif sequences

33

vary according to the species from which Cas9 is derived. For instance, S. pyogenes requires an NGG PAM131, in contrast to Staphylococcus aureus that recognizes an

NNGRRT PAM133. When choosing a Cas9 variant, several items must be considered including the sequence constraints in the targeted locus and the packaging limitations of the intended delivery vector.

The groundbreaking potential of CRISPR/Cas9 has been demonstrated by numerous studies. Remarkable applications include the elimination of HIV in live animals134, the simultaneous engineering of 62 pig loci135, and the controversial editing of human embryos136, 137. In the field of repeat expansion diseases, CRISPR/Cas9 has been repetitively used to correct mutant alleles and revert disease phenotypes in patient cells 138, 139, and to block transcription of expanded repeats by stalling RNA polymerase

II in a DM1 mouse model140. This chapter will address the development of CTGexp Dmpk knockin models by CRISPR/Cas9, including the strategies used to generate recombination templates carrying large CTGexp and the outcomes from this gene editing approach.

Targeting the Endogenous Dmpk 3’ UTR by CRISPR/Cas9

To assess the conservation of human DMPK in the mouse genome, multiple

BLOSUM62 pairwise sequence alignments were performed. At the protein level, DMPK is strongly conserved in mice, with a 90.6% similarity in the amino acid sequence. Not surprisingly, DMPK is also highly conserved at the DNA level. In both humans and mice, the DMPK gene is primarily composed of 15 exons, with an average similarity of 86 and

61% for CDS exons and introns, respectively. In the 3’ UTR, a 59.8% similarity was observed (Figure 2-1). Altogether, these results indicate a strong evolutionary

34

conservation of DMPK between mice and humans, highlighting the potential of Dmpk knockin models to understand and treat DM1.

Our ability to harness novel technologies to engineer the endogenous Dmpk 3’

UTR and insert progressively larger CTGexp will yield mouse models aimed to recapitulate different DM1 onsets (Figure 2-2-A). Despite the existence of numerous studies reporting that larger CTGexp in peripheral blood correlate with earlier DM1 manifestations53, 54, 141-143, the exact expansion size underlying different DM1 onsets is not well-established. Repeat sizes from peripheral blood often underestimate the

CTGexp from affected tissues, mostly due to the occurrence of somatic instability throughout the patient lifespan55. Autopsy samples from children born with congenital myotonic dystrophy (CDM) shed some light on this issue as Southern blots from multiple CDM tissues revealed that repeat sizes from 700 to 1,000 are capable of triggering congenital disease59, 60. These patients also presented a reduced level of somatic mosaicism, likely due to their shorter lifespan59. For adult-onset DM1, the expansion threshold is still unknown. While some argue that post-developmental tissues express lower levels of DMPK and may require larger CTGexp to drive disease, others point to the possibility of increased toxicity of small repeats with aging.

Inbred mouse strains such as C57BL/6J are promising platforms to evaluate the functional consequences of different CTGexp sizes, primarily due to their short lifespan and homogenous genetic background. To enable the insertion of CTGexp in the endogenous Dmpk 3’ UTR of C57BL/6J mice, single-guide RNAs (sgRNAs) targeting this locus were searched using different CRISPR algorithms. Among the sgRNAs with the highest specificity scores, a 5’ – TGCTTTACTCAGCCCCGACGTGG – 3’ sequence

35

was chosen for further validations, mostly due to its close proximity to the interrupted

CTG repeat tract present in the mouse Dmpk 3’ UTR. To investigate the ability of this sgRNA to induce DNA cleavage, Cas9 ribonucleoprotein (RNP) complexes were transfected into mouse C2C12 cells and the resulting indel mutations were detected by

T7E1 assay (Figure 2-2-B; Table 2-1). The robust presence of indels in transfected cells demonstrated the ability of Cas9 to efficiently target and cleave the Dmpk 3’ UTR.

Gibson Assembly of CTGexp Recombination Templates

The knockin of large CTGexp relies on homology-directed repair (HDR) of the targeted locus following Cas9 cleavage. Consequently, a recombination template containing the desired repeat expansion flanked by homology arms is required. A plasmid containing 220 CTG repeats (courtesy of Dr. Charles Thornton) was electroporated into HB101 E. coli, aiming to minimize repeat instability. Kanamycin- resistant clones were grown at 30C for no longer than 12 hours, picked and further cultured in liquid lysogeny broth (LB) containing 50 mg/mL of kanamycin for 12-15 hours at 30C. Purified plasmids were double digested by BstXI and BstAPI – restriction enzymes that release the repeat expansion from the plasmid backbone (Figure 2-3-A).

Following agarose gel electrophoresis, digested plasmids containing a stable repeat expansion were used to generate a recombination template by Gibson assembly

(Figure 2-3-B). This method allows joining multiple DNA fragments in a single isothermal reaction based on three enzymatic components: a 5’ exonuclease, a DNA polymerase and a ligase. Adjacent fragments must have a 15 bp overlap to enable annealing of complementary sequences following exonuclease activity. Sequence gaps

36

are filled by the polymerase and nicks are finally removed by the ligase, resulting in a contiguous DNA fragment.

To assemble the final recombination template, two primer sets (F1/R1 and

F2/R2; Table 2-2) were designed to amplify homology arms of 956 bp flanking the Cas9 cleavage site (3 bp upstream the NGG PAM). These primers included a 15 bp overlap to the BstXI/BstAPI digested plasmid fragments, which enabled cloning by Gibson assembly. Primers F1 and R2 also included an AfeI restriction site, allowing gel extraction of linearized recombination templates free of any plasmid backbone sequence. Following Gibson assembly (Figure 2-3-C), plasmids purified from kanamycin-resistant HB101 E. coli clones were screened by AfeI digestion, which is expected to result in a ~2.6 kb band corresponding to the recombination template (220

CTG repeats flanked by two 956 bp homology arms) and a ~2.0 kb band derived from the plasmid backbone (Figures 2-3-D and E). Stable constructs were further amplified by rolling circle amplification (RCA) and digested by AfeI. Finally, the linearized recombination template was gel purified and injected into C57BL/6J zygotes (Figure 2-

4), together with Cas9 and the in vitro transcribed sgRNA at previously reported concentrations144. Sanger sequencing of injected recombination templates revealed a small contraction – from 220 to 202 CTG repeats, which likely occurred during construct assembly in bacteria.

Outcomes of CRISPR/Cas9 Zygote Microinjections

To assess the editing outcomes from zygote microinjections, genomic DNA was extracted from tail snips collected at P14, digested by EcoRI and evaluated by Southern blot using two different DNA probes (Figure 2-5-A; Table 2-3). While the external probe revealed that 5 out of 74 possible founders (6.76%) had an insertion at the Dmpk 3’

37

UTR locus (Figure 2-5-B), an internal probe demonstrated the absence of random insertion events in only 2 out of the 5 positive founders (40%) (Figure 2-5-C). The high occurrence of random insertions can be attributed to the use of a linearized recombination template. Unfortunately, there is no alternative to this problem, since recombination templates containing repeat expansions can only be efficiently synthesized by RCA, and this method yields a hyperbranched product that must be linearized and gel purified prior to zygote microinjections. Consequently, the use of recombination templates generated by RCA requires a very careful screening by

Southern blot aimed to detect and eliminate any founders containing randomly inserted

DNA fragments.

A peculiar outcome can be observed in the internal probe screening by Southern blot (Figure 2-5-C). Both Founders #35 and #52 showed a 2.6 kb band derived from a random insertion of the recombination template. Independent random insertion events, especially in two different animals, are unlikely to generate the same band size by

Southern blot. After six generations of backcrossing these 2 founders to C57BL/6J wild- type mice, these bands were still present in all mutant animals, suggesting they were linked to the endogenous Dmpk locus. These results led to the hypothesis of a concatameric insertion of the recombination template in the edited locus of founders

#35 and #52. This hypothesis was tested by PCR amplification using a forward primer outside the 5’ homology arm and a reverse primer either inside or outside the 3’ homology arm (Figure 2-6-A; Table 2-4). In true knockin lines, both primer combinations should yield the same product. In comparison, concatameric mutants should exhibit a much larger product when using a reverse primer outside the 3’ homology arm – a

38

primer that would force the amplification of the entire insertion. This strategy was tested in the progeny from 4 founders, indicating that founders #08 and #68 were true Dmpk knockins, whereas founders #35 and #52 harbored a concatameric insertion (Figures 2-

6-B and C). Finally, repeat-primed PCR was performed to confirm the presence of a repeat expansion in all 4 mutant lines (Figure 2-6-D; Table 2-5).

Since gene editing often takes place after the one-cell stage, founders are commonly mosaic145. The success of germline transmission depends on the zygote stage in which editing occurred, and whether the edited blastomere(s) will ultimately become germ cells. Among the 5 founders that tested positive for a Dmpk insertion by

Southern blot, only 4 transmitted the targeted allele to the next generation. The lack of mutant progeny in the litters from Founder #50 indicates absence of edited germline, despite the existence of mutant cells in the tail tissue screened by Southern blot. For the remaining 4 founders (#08, #35, #52 and #68), germline transmission rates varied from

14.28 to 47.05%. The expansion size in each of these lines was calculated by regression analysis of distances migrated by mutant PCR products following agarose gel electrophoresis. While Founders #35 and #52 contained a concatameric insertion of

202 repeats, Founders #08 and #68 carried a single knockin insertion of 170 and 150 repeats, respectively (Table 2-6). Given our goals of developing a Dmpk knockin model that harbors the same mutation underlying DM1, we chose to expand the 170 knockin line derived from Founder #08 for further characterization studies (Chapter 3). From this point, these mice will be referred as Dmpk+/170 (heterozygous) or Dmpk170/170

(homozygous).

39

Synthesis of Recombination Templates Carrying Larger Repeat Expansions

The use of RCA to generate a CTGexp recombination template, followed by

CRISPR/Cas9 editing of the endogenous Dmpk 3’ UTR, was successful to generate a

Dmpk knockin model carrying 170 CTG repeats – the largest expansion inserted in the

Dmpk locus to date. While this achievement demonstrates the feasibility of these technologies to develop mouse models for DM1, the knockin of a much larger repeat expansion may be needed to model this disorder. For this purpose, recombination templates carrying larger repeat expansions are required. To generate these constructs, we modified a previously developed strategy that relies on successive rounds of Golden

Gate assembly followed by RCA110. This method explores the ability of type IIS restriction enzymes (e.g. BbsI, BsaI, BsmBI) to cleave outside of their recognition sequence. When flanking a repeat expansion in opposite strands, these restriction sites can be used to create complementary overhangs within the expanded repeat tract, enabling construct dimerization by Golden Gate assembly. In this strategy, the starting construct is linearized by a single-site restriction enzyme and separately digested by each of the two type IIS enzymes whose sites flank the repeat expansion. Following gel purification of the expansion-containing fragment from each digestion, both pieces are re-ligated, generating a new circular construct containing 2n-1 repeats, where “n” is the original expansion size (Figure 2-7-A). This circular molecule can be amplified by RCA and the steps above can be repeated to further increase the expansion size. Thus, this bacteria-free strategy allows the generation of recombination templates carrying progressively larger repeat expansions. Since the restriction sites are outside the repeat tract, this method is not restricted to CTG expansions, and it could be easily applied to generate knockin models for other repeat expansion diseases.

40

The strategy above was applied to the original plasmid containing 202 repeats flanked by homology arms, generating new recombination templates carrying 403 and

805 uninterrupted CTG repeats (Figure 2-7-B). The latter was purified and microinjected into C57BL/6J zygotes, together with Cas9 protein and the same in vitro transcribed sgRNA used to generate the original Dmpk knockin lines.

The Knockin of 480 CTG Repeats in the Endogenous Dmpk 3’ UTR

The outcomes from microinjecting CRISPR/Cas9 reagents and a recombination template carrying 805 CTG repeats were assessed by Southern blot of genomic DNA extracted from tail snips. Seven out of 81 mice (8.64%) tested positive for a Dmpk insertion as determined using a radiolabeled probe external to the homology arm.

Repeat-primed PCR confirmed the presence of a repeat expansion in Founders #22 and #24. The absence of random insertions in Founder #24 led us to expand this colony for future characterization studies. The expansion size in this knockin line was calculated by regression analysis from restriction fragment length polymorphism (RFLP) and PCR genotyping data. Founder #24 carried 480 CTG repeats in the targeted Dmpk

3’ UTR locus.

Germline transmission rates from Founder #24 were extremely low, suggesting the existence of very few germ cells carrying the edited allele. This founder has been mated to numerous C57BL/6J and FVB/NJ dams. To date, C57BL/6J matings have generated 43 mice and no transmission has been observed. From FVB/NJ matings, germline transmission occurred in 1 out of 7 newborn mice (Figure 2-8). This single heterozygous Dmpk+/480 mouse was extensively mated to expand the colony and generate animals for phenotypic studies, before suddenly dying at 13 weeks of age.

41

A single nucleotide polymorphism (SNP) from the FVB/NJ background is particularly helpful when breeding Dmpk knockin mice. The albinism present in the

FVB/NJ strain is caused by a recessive G>C mutation in the tyrosinase (Tyr) gene, located 39.55 cM apart from Dmpk on . This distance yields a frequency of recombination of 27.33% - within the 50% limit used to define two loci as genetically linked. While recombination is still possible, most of Dmpk knockin mice in the mixed

C57BL/6J-FVB/NJ background are albino if wild-type, agouti if heterozygous and black if homozygous. This coat color distribution is due to the genetic linkage between mutant

Dmpk and wild-type Tyr alleles in the original C57BL/6J founder, and wild-type Dmpk and mutant Tyr alleles in the FVB/NJ mice used for backcrossing.

Discussion

This chapter outlined the genome engineering strategies capable of overcoming the main technical challenges that hindered the development of knockin mouse models for repeat expansion disorders. The use of Gibson assembly to build circular constructs containing a CTGexp flanked by homology arms, followed by rolling circle amplification, was successful to generate recombination templates for CRISPR/Cas9 editing of the endogenous Dmpk 3’ UTR locus. Following microinjection of mouse zygotes, tail DNA was screened by different genomic approaches, revealing the presence of mutant Dmpk alleles in approximately 7% of newborn mice. Mosaicism was observed in founder animals, with germline transmission rates ranging from 0 to 47%. The inserted expansion varied in length from 150 to 202 CTG repeats, suggesting the occurrence of contractions during homologous recombination. To our surprise, two independent lines carried a concatameric insertion of the recombination template in the targeted locus, stressing the importance of genomic screening by Southern blot to select properly

42

targeted lines for characterization studies. One Dmpk knockin founder carrying 170

CTG repeats was backcrossed to C57BL/6J wild-type mice for 10 successive generations and characterized by multiple molecular, histological and physiological assessments described in Chapter 3.

Microinjection of a recombination template containing 202 repeats generated mutant lines carrying smaller expansions, such as 150 and 170 CTGs. While the exact contraction mechanism is unclear, this outcome suggested a shortcut to create an allelic series of knockin mice for repeat expansion disorders. Instead of individually attempting to insert progressively larger expansions, one could simply generate a recombination template with the largest repeat possible and allow nature to spontaneously shrink the expansion during homologous recombination. Since every recombination event is distinct, a single microinjection would yield several mouse lines carrying different repeat lengths in the targeted locus. To test this hypothesis, a recombination template carrying

805 CTG repeats was generated by Golden Gate assembly followed by RCA and microinjected into C57BL/6J zygotes. As expected, this strategy resulted in Dmpk mutants carrying smaller repeat sizes, including a founder with 480 CTG repeats in the targeted locus. This line is currently undergoing extensive backcrossing to the FVB/NJ and C57BL/6J backgrounds to outcross any potential off-target mutations and enable future phenotypic studies.

43

Figure 2-1. DMPK sequence conservation in mice. A) Both humans and mice share a similar DMPK gene structure, which is primarily composed of 15 exons. B) In terms of genomic sequence, Dmpk is highly similar to its human counterpart. Pairwise sequence alignments revealed an average similarity of 86 and 61 percent for CDS exons and introns, respectively. In terms of amino acid sequence, the encoded DMPK protein is 90.6% similar to the human DMPK product.

44

Figure 2-2. Validation of Dmpk targeting by CRISPR/Cas9. A) Summary of CRISPR/Cas9 strategy aimed to generate an allelic series of Dmpk knockin models. Knockin mice carrying progressively large CTG repeat expansions are expected to model different DM1 onsets. B) Following cleavage by Cas9, DNA double-stranded breaks are often repaired by non-homologous end- joining (NHEJ), a pathway that leads to insertions or deletions (indels) in the target sequence. To validate a sgRNA targeting the endogenous Dmpk 3’ UTR, transfected cells were lysed and a small fragment containing the target site was PCR amplified. Following amplicon denaturation and re-annealing, fragments containing indels form DNA hairpins, which are recognized and cleaved by T7E1 endonuclease.

45

Figure 2-3. Assembly of Dmpk targeting construct. A) Plasmids containing 220 CTG repeats (courtesy of Dr. Charles Thornton) were initially propagated in HB101 E. coli to minimize repeat instability. B) Purified DNA was digested by BstXI and BstAPI. In stable clones (red stars), these enzymes should yield a 2.0 kb band (backbone) and a 660 bp band corresponding to 220 CTG repeats. C) The Dmpk targeting construct was generated by Gibson assembly, using PCR amplified homology arms and digested plasmid DNA from stable clones. D) Screening of antibiotic-resistant HB101 clones following Gibson assembly. When digested by AfeI, correctly assembled constructs (red stars) should yield a ~2.0 kb band (backbone) and a 2.6 kb band corresponding to the recombination template. E) Resulting plasmid containing a CTGexp flanked by homology arms.

46

Figure 2-4. Rolling circle amplification of recombination templates. A) Map for the construct previously generated by Gibson assembly. This plasmid contains a CTG repeat expansion flanked by homology arms, in addition to backbone sequences required for propagation in bacteria. B) The strong strand displacement of phi29 DNA polymerase enables continuous extension of the circular template by rolling circle amplification (RCA). Due to the use of random hexamers instead of specific primers, RCA can start at any location in the circular template or in the newly synthesized DNA. C) This continuous amplification strategy generates a hyperbranched product that must be linearized prior to zygote microinjection. D) Following linearization, recombination templates are gel purified and microinjected into C57BL/6J zygotes, together with Cas9 and the sgRNA targeting the endogenous Dmpk 3’ UTR.

47

Figure 2-5. Southern blot screening of mutant Dmpk founders. A) When digested by EcoRI, genomic DNA is expected to generate a 4,649 bp fragment containing the targeted Dmpk locus. In mutant Dmpk founders, a larger fragment should also be present, corresponding to the wild-type size added to the repeat expansion. The use of a radiolabeled DNA probe external to the homology arms allows selective detection of the targeted Dmpk locus, providing useful information about the occurrence of indels and the presence of a repeat expansion in the Dmpk 3’ UTR. In contrast, the use of a probe internal to any homology arm enables detection of random insertions from the recombination template, which yield fragments of unpredicted sizes. B) External probe screening of mutant founders. Presence of more than 2 bands or differences in the intensity of wild-type and mutant bands indicate mosaicism in the analyzed tissue (e.g., Founder #35). Bands smaller than the wild-type fragment suggest the occurrence of deletions (e.g., Founder #52). C) Internal probe screening of mutant founders. Bands of unexpected sizes indicate random insertions. The ideal outcome is the presence of only a wild-type and an expanded band in both external and internal screenings (e.g., Founder #08).

48

Figure 2-6. Genotyping progeny from mutant Dmpk lines. A) To amplify the targeted locus, PCR was performed using a forward primer outside the 5’ homology arm (Fwd-1) and a reverse primer either inside (Rev-1) or outside (Rev-2) the 3’ homology arm. Vertical dashed lines represent the boundaries of each homology. B) Primers Fwd-1 and Rev-1 yield smaller locus-specific amplicons, which are useful for routine genotyping purposes, and to estimate the repeat size by regression analysis. In wild-type mice, these primers generate a 1,178 bp band, while in mutants they produce a larger fragment depending on the repeat expansion size. C) To detect possible concatamers, PCR was performed using both primers outside the homology arms (Fwd-2 and Rev-2), which produces a 3,006 bp band in wild-type animals. In concatameric mutants carrying two copies of the recombination template, the use of primer Rev-1 would result in biased amplification of the first insertion, preventing the detection of the second copy. Primer Rev-2 overcomes this issue by forcing amplification of the entire targeted locus, enabling to distinguish true knockin mutants from mice harboring a concatameric insertion. D) Repeat-primed PCR of heterozygous progeny from each founder. In this strategy, amplification is performed using a fluorescently labeled forward primer and two reverse primers, one of them designed to anneal within the repeat expansion. Stutters in the capillary electrophoresis results indicate presence of a repeat expansion in the amplified fragment.

49

Figure 2-7. Generation of recombination templates carrying larger expansions. A) Repeat dimerization by Golden Gate assembly followed by RCA. Plasmids containing a repeat expansion flanked by BsaI and BbsI were initially linearized by SapI. Next, linear constructs were separately digested by BsaI or BbsI, and the repeat-containing fragments were gel purified, ligated and amplified by RCA. This bacteria-free process generates a new construct carrying 2n-1 repeats, where “n” is the initial expansion size. For simplification purposes, the starting construct shown here contains 5 CTG repeats, but this method is also applicable to plasmids containing larger expansions. B) This strategy was applied to the original plasmid containing 202 repeats flanked by homology arms, generating new recombination templates carrying 403 and 805 CTG repeats. RCA products were linearized by AfeI, yielding a constant band of ~2.0 kb (backbone; bb) and an upper band of variable size corresponding to the recombination template containing progressively larger repeat expansions.

50

Figure 2-8. Genotyping Dmpk170 and Dmpk480 knockin lines. A) PCR amplification of genomic DNA extracted from tail tissue of Dmpk+/+ (wild-type), Dmpk+/170 and Dmpk+/480 mice. This strategy uses a forward primer upstream the 5’ homology arm (Fwd-1) and a reverse primer within the 3’ homology arm (Rev- 1), yielding a single 1,178 bp locus-specific amplicon in wild-type mice, in contrast to Dmpk+/170 and Dmpk+/480 mutants that show an additional ~1,688 and ~2,618 bp band, respectively. B) Repeat-primed PCR of heterozygous progeny from each knockin line, confirming the presence of a repeat expansion in the targeted locus.

51

Table 2-1. Primers used to amplify the targeted Dmpk locus for T7E1 assay. Primer 5’ – 3’ Sequence (size) Dmpk-T7E1-Fwd TCCGCTCCACACTTCTGT (18 nt) Dmpk-T7E1-Rev GAGATATTGCGCGGGATTTAGG (22 nt)

Table 2-2. Primers used to amplify homology arms for Gibson assembly. Uppercase letters represent locus-specific sequences, whereas lowercase was used for sequence overlaps. AfeI restriction sites are shown in bold. Primer 5’ – 3’ Sequence (size) Gibson-Fwd1 (F1) agcggccccaggccaAGCGCTGGAACTCCCTATAC (35 nt) Gibson-Rev1 (R1) ggagaccccagtggcCGGGGCTGAGTAAAGCAAGG (35 nt) Gibson-Fwd2 (F2) gcgtcttctgcacaagACGTGGATGGGCAAACTGCT (36 nt) Gibson-Rev2 (R2) gggttaccagcaacttAGCGCTTCAGAGCAACGGAG (36 nt)

Table 2-3. Primers used to amplify Southern blot probes and resulting amplicons. Primer 5’ – 3’ Sequence (size) External-Fwd TAACCCTATAACCCCACCATTGAGG (24 nt) External-Rev TGAGCGATAACTCCACAGAGGGTG (24 nt) External probe TAACCCTATAACCCCACCATTGAGGAGGCTGAGGCTGGA GCATCACTGCAAATTTGAGGCCAGGATGGTCAACAAATA AGTCCCAGAGCTGGCATAGAGGAACTCTGTCTCAACAAT AAAGAGAACTTATCTAGCATTTATGAGGGTAAATAAAAAT TTACCATTGCCACAAAAAATGTAAATGAAGAGACTGCTTT TAGGAGTGAACTGGGAAGCAGGGAACACTTAGAGGATG CTCACTCACACAGGTATCCACCATCAGGCATGCCTCAGG CCTGCACAGGGAAGGACAACTTGTTTCATGATTTGCAAG CAGCATCCCATGCTCCTTAGAGCGGGTTGGGCCCAGCC CACCCTCTGTGGAGTTATCGCTCA (375 nt) Internal-Fwd CGCAATATCTCTGTTGTGGAAAGGA (25 nt) Internal-Rev CAGCAACTTTTCTGGAGTGTCGGT (24 nt) Internal probe CGCAATATCTCTGTTGTGGAAAGGAAACCGCCCCGCAGG CCAATGGAGAGTCCAATAGAGACAACCAATGGCTTGAGT GGGAGCTAGAGGGGAGGCAAAGCGCACGAATCAGGTTG AAGGGTGGGGCTTAGGCATCCAGCCAGTAGGAGAGAAG CAACAAGCCACCAGAGACACCACCGCCCCCCACCCTCC CCCCCAGCTGTGACCCAGCTGTGCCACTCAAGTTTGGAA AAAAGTAGGGGGTTGGGCCAGCAGCGGGCACACCATCT TCCCACTGCGCCTGCGCAAGCCACGCGCATCCGCTTTTT GGACCGACACTCCAGAAAAGTTGCTG (334 nt)

52

Table 2-4. Primers used to amplify the edited Dmpk 3’ UTR locus for genotyping purposes. Primer 5’ – 3’ Sequence (size) Outside-Fwd (Fwd-1) GCCCTTCTCCCGTCCACGTTTAGGGTCCATTCTCC (35 nt) Inside-Rev (Rev-1) AGGGGGCTCATGCTTATGGCTTGTAACTGA (30 nt) Outside-Fwd (Fwd-2) AGGTGGGATGCTGGTTTCTTAAAAGCCA (28 nt) Outside-Rev (Rev-2) TTCCCCCCTTGTGTGCCTCTGACTC (25 nt)

Table 2-5. Primers used for fluorescent repeat-primed PCR. Primer 5’ – 3’ Sequence (size) Fwd FAM-TGGGCATTCGCCGACCTTGCTTTAC (25 nt) Rev-1 CAGGAAACAGCTATGACCCGTCTTGTGCAGAAGACGCAG CAGCAGCAGCAGCAG (54 nt) Rev-2 CAGGAAACAGCTATGACC (18 nt)

Table 2-6. Mutant Dmpk founders: germline transmission rates and expansion length Founder % Transmission Expansion size Mutation #08 47.05 170 knockin #35 19.57 202 concatamer #50 zero unknown unknown #52 14.28 202 concatamer #68 40.00 150 knockin

53

CHAPTER 3 PHENOTYPIC OUTCOMES OF DMPK KNOCKIN MODELS

Background

In skeletal muscle tissue, adult-onset DM1 is primarily characterized by delayed muscle relaxation following contraction (myotonia), muscle weakness and wasting.

These symptoms are predominantly observed in distal muscles, where they compromise fine movements and often lead to foot drop. Additionally, DM1 commonly affects facial muscles and the tongue, causing ptosis, facial muscle weakness and difficulties to swallow or speak. In earlier disease onsets, common features include respiratory distress caused by diaphragm dysfunction and low muscle tone or hypotonia. In the cardiac muscle, DM1 manifestations include dilated cardiomyopathy, conduction defects and cardiac block. As a multisystemic disease, DM1 affects numerous other cell types, including the central nervous system (discussed in chapter

04) and the gastrointestinal tract146.

Among the multiple tissues compromised by DM1, skeletal muscle is by far the most studied, in part due to the abundance of mouse models developed to investigate disease progression in this tissue72. These include numerous Mbnl knockout models generated to assess the effects of progressive MBNL titration in skeletal muscle42, 81, 83, and transgenic lines such as the HSALR model carrying 250 CTG repeats in the 3’ UTR of a human skeletal actin (HSA) transgene84. Toxic transcripts in HSALR mice accumulate in RNA foci, sequestering MBNL proteins and giving rise to widespread splicing alterations that result in overt myotonia84. Still, HSALR mice do not express mutant transcripts in tissues other than mature skeletal muscle, a limitation that precludes their use to study the multisystemic features of DM1 or the developmental

54

defects observed in CDM. To overcome these issues, we developed an allelic series of

Dmpk knockin models carrying progressively larger CTGexp in the endogenous Dmpk 3’

UTR. These mice are expected to express toxic transcripts in the endogenous spatiotemporal context, recapitulating molecular, histological and physiological features observed in different DM1 onsets.

In Chapter 2, we reviewed the genome engineering strategies that enabled the generation of Dmpk knockin models carrying up to 480 CTG repeats in the Dmpk 3’

UTR. Here, we will discuss the phenotypic outcomes of inserting 170 CTG repeats in the endogenous locus. Although the expansion size in Dmpk170 mice is not expected to generate an overt phenotype, this model will provide a platform to investigate the spatial distribution of toxic transcripts and the effects of relatively small expansions in Dmpk transcription and translation.

Transcriptional and Translational Assessments in Dmpk170 Muscles

To assess the spatial distribution of mutant transcripts in Dmpk knockin mice, several tissues were dissected from mutant animals, including samples from skeletal, cardiac and smooth muscle. Following RNA extraction and cDNA synthesis, primers flanking the repeat expansion were used to amplify the region in exon 15 containing the edited locus. In heterozygous animals, these primers generate a 397 bp fragment corresponding to the wild-type transcript and a larger amplicon derived from the expanded mRNA. Using this RT-PCR strategy, we confirmed the expression of mutant and wild-type Dmpk alleles in numerous tissues (Figure 3-1-A) – a result that demonstrated the potential of our model to investigate the multisystemic nature of DM1.

Next, we sought to determine the effects of 170 CTG repeats in Dmpk expression. For this purpose, we performed quantitative real-time PCR (qPCR) in the

55

diaphragm and tibialis anterior from Dmpk+/+, Dmpk+/170 and Dmpk170/170 mice. To our surprise, a significant downregulation in Dmpk expression was observed in knockin samples, especially in homozygous mice (Figure 3-1-B). While it has been reported that repeat expansions can modify chromatin accessibility and modulate transcription45, 147, the observed decrease in Dmpk expression can also result from the inability of most

RNA isolation methods to properly extract transcripts bound to proteins51, 148. This hypothesis will be further discussed in a later section from this chapter.

Early DM1 studies pursued the hypothesis of DMPK haploinsufficiency as a major contributor to disease pathogenesis. Initially, this theory was supported by observations of reduced DMPK protein levels in DM1 tissues76, despite the existence of reports that contradicted these findings149. The absence of disease-relevant phenotypes in heterozygous Dmpk knockout mice permanently debunked the DMPK haploinsufficiency model. Consequently, this result reduced the impact of studies aiming to assess the levels of DMPK protein in DM1 – a controversial question that remains unanswered. If completely retained in nuclear RNA foci, mutant transcripts would not be available for translation in the cytoplasm, which in theory would halve the levels of

DMPK protein. We tested this hypothesis in the tibialis anterior of Dmpk170 knockin mice. Compared to wild-type littermates, DMPK protein levels were reduced to 42 and

6% in Dmpk+/170 and Dmpk170/170 mice, respectively. In heterozygous knockins, reduction to 42% (below the 50% threshold expected for these animals) may indicate oversaturation in the wild-type lanes or sequestration of a trans-dominant factor involved in the transport of Dmpk transcripts. Reduction of DMPK protein levels to 6% in homozygous Dmpk170/170 mice suggests that a small portion of mutant transcripts is still

56

exported – an outcome with possible implications for RAN translation (Figures 3-1-C and D).

Accumulation of RNA Foci in the Absence of Spliceopathy

Decreased levels of DMPK protein in the tibialis anterior of Dmpk170 models suggested nuclear retention of mutant transcripts. To test this hypothesis, we performed

RNA fluorescence in situ hybridization (RNA-FISH) in transverse sections of tibialis anterior from Dmpk+/170 mice. Wheat-germ agglutinin (WGA), a lectin that binds monosaccharides found in connective tissue, was used to outline muscle fibers and facilitate the assessment of RNA foci accumulation in myonuclei. As expected, RNA foci were present in the skeletal muscle of Dmpk+/170 mice (Figure 3-2-A), suggesting that expansions as short as 170 CTG repeats are able to accumulate in the nucleus, where they have been shown to sequester MBNL proteins and disrupt RNA processing41, 42.

Following the detection of RNA foci in skeletal muscle sections of Dmpk+/170 mice, we investigated the consequences of RNA toxicity in the alternative splicing of exons previously shown to be regulated by MBNL proteins, such as Atp2a1 exon 22,

Clcn1 exon 7a and Cacna2d1 exon 19. Alternative splicing of these exons was assessed by RT-PCR, using cDNA from the tibialis anterior muscle of Dmpk+/+,

Dmpk+/170 and Dmpk170/170 mice. As predicted for this expansion size, no detectable signs of spliceopathy were observed (Figures 3-2-B and C), suggesting insufficient

MBNL sequestration by mutant transcripts derived from the Dmpk170 allele. In agreement with these results, Dmpk170 mice did not develop overt myotonia or show any significant changes is muscle length, mass, cross-sectional area or force (Figure 3-

2-D through Figure 3-2-K).

57

Increased Levels of RNA Foci Following Myogenic Differentiation

Recent studies reported that DMPK expression is dramatically upregulated following differentiation of human muscle progenitor cells, peaking at 72h post-induction and returning to basal levels in mature myofibers (Figure 3-3-A, adapted from Thomas et al., 2017)83. Therefore, we isolated primary myoblasts from Dmpk170 mice to test the hypothesis of increased RNA toxicity during myogenesis. RNA foci were observed in primary myoblasts derived from Dmpk+/170 and Dmpk170/170 mice. In agreement with the previous study, accumulation of toxic transcripts was noticeably increased in differentiated cells, confirming the existence of a similar temporal pattern of Dmpk expression during muscle development in mice (Figure 3-3-B).

Similar to our findings in mature skeletal muscle, Dmpk transcript levels were significantly decreased in differentiated cells from Dmpk+/170 and Dmpk170/170 mice. To test the possibility of incomplete extraction of mutant transcripts that accumulate in RNA foci, cell lysates were split, and RNA was extracted using standard or hot TRIzol® methods148 (see Chapter 6 for additional details). To our surprise, RNA isolation in hot

TRIzol® was able to recover the entire pool of Dmpk transcripts, resulting in similar (if not slightly upregulated) levels of Dmpk expression in mutant cells, as analyzed by qPCR (Figure 3-3-C). This result confirms that standard RNA isolation procedures are notoriously ineffective when extracting transcripts that accumulate in RNA foci – an outcome that must be carefully considered when quantifying expression of transcripts that carry a repeat expansion. In agreement with our studies in tibialis anterior, DMPK protein levels were downregulated in differentiated myotubes from Dmpk+/170 and

Dmpk170/170 cells (Figure 3-3-D and E), likely due to retention of mutant transcripts in nuclear RNA foci, which prevents mRNA export and DMPK translation in the cytoplasm.

58

The Transcriptomic Profile of Dmpk170/170 Myotubes

Presence of abundant RNA foci in Dmpk170/170 myotubes led us to investigate the transcriptomic effects of RNA toxicity in these cells. For this purpose, we isolated RNA from 6 primary myoblast lines following 72h of myogenic differentiation (3 lines derived from Dmpk+/+ and 3 from Dmpk170/170 mice). Following ribosomal depletion, RNA was converted into a cDNA library using the NEB Ultra II Directional Library Prep kit and sequenced in the Illumina NextSeq500 platform. To enable detection of very subtle variations in alternative splicing, RNA-Seq was performed at very high depth, with an average read count of ~90 million reads per sample (Table 3-3). A sudden drop in read coverage was observed in the edited locus from Dmpk170/170 myotubes, confirming the presence of a repeat expansion in these cells (Figure 3-4-A).

Compared to wild-type myotubes, only 28 significant differences in gene expression were observed in Dmpk170/170 samples (p. adj.  0.05). These included several genes that encode mitochondrial proteins, such as creatine kinase 2 (Ckmt2) and aldehyde dehydrogenase 1 family member L2 (Aldh1l2). Other differentially expressed genes included cyclin D2 (Ccnd2) and nicotinamide riboside kinase 2

(Nmrk2). Overall, the small number of significant gene expression changes reflects the insufficiency of 170 CTG repeats to drive a pathological response, even when expressed at very high levels in homozygous cells (Figure 3-4-B).

Numerous alternative splicing differences were present in Dmpk170/170 myotubes, most of which were subtle yet significant (Figure 3-5-A and B). These included several alternatively spliced exons regulated by MBNL proteins, such as exons 4 and 5 of Tnnt2 and exon 6 of Mbnl2. Exon 4 of Ap2m1 was the most significant alternative splicing

59

event ( = 0.76; Z-score = 2.22) (Figure 3-5-C). Not surprisingly, this 6-nt exon has been extensively shown to be regulated by MBNL proteins. Additionally, a curious alternative 3’ splice site in exon 14 of Dmpk was observed in Dmpk170/170 cells (Figure 3-

5-D). This 3’ splice site is present in a non-coding isoform of Dmpk, and its reduced utilization may suggest a compensatory mechanism for decreased DMPK translation in homozygous cells. Altogether, these differences in alternative splicing indicate early signs of MBNL loss-of-function, which are expected to increase in future studies using primary cells from Dmpk480 mice.

Discussion

Myotonic dystrophy is the most common muscular dystrophy in adults, affecting 1 in 8,000 individuals worldwide. In skeletal muscle, the DM1 pathogenic cascade has been extensively studied, primarily supported by the availability of muscle biopsies from

DM1 patients, in addition to numerous mouse models generated to investigate disease pathogenesis in this tissue83. Transgenic mice such as the HSALR model have been largely used to investigate RNA toxicity and develop rational therapeutic strategies150, despite their inability to recapitulate the developmental defects observed in CDM or the multisystemic features observed in DM184. Other limitations of HSALR mice include the expression of toxic transcripts at levels much higher than observed in DM1 patients, lack of muscle weakness or wasting, and improper distribution of RNA foci that fail to accumulate in myonuclei from neuromuscular junctions151.

Using novel genome engineering strategies described in Chapter 2, we developed Dmpk knockin models carrying up to 480 CTG repeats in the endogenous locus – mice that are expected to recapitulate the spatiotemporal distribution of toxic

60

transcripts observed in DM1 patients. In this chapter, we characterized several aspects of DM1 in the skeletal muscle of Dmpk170 mice, a knockin model initially developed to assess the feasibility of CRISPR/Cas9 to introduce repeat expansions in the mouse genome. As expected for this repeat size, the mature skeletal muscle of Dmpk170 mice did not show any overt phenotypes or signs of spliceopathy, despite the accumulation of

RNA foci in this tissue. Interestingly, a dramatic decrease in DMPK protein levels was observed in the tibialis anterior of Dmpk170 mice, likely due to nuclear retention of toxic transcripts and reduced export of Dmpk mRNA. This result, in the absence of splicing changes or physiological phenotypes, represents the ultimate evidence that DM1 is not caused by DMPK haploinsufficiency.

In agreement with recent findings, we observed increased accumulation of RNA foci in differentiated primary myoblasts from Dmpk170 mice. Upon sequencing the transcriptome of Dmpk170/170 myotubes, we found subtle alternative splicing changes in exons previously shown to be regulated by MBNL proteins, suggesting mild Mbnl loss- of-function in these cells. These early signs of spliceopathy are expected to be more pronounced in future knockin models such as Dmpk480 mice. Altogether, these outcomes demonstrate the feasibility of CRISPR/Cas9 to generate relevant models for repeat expansion diseases, expanding our toolbox of mouse models available to understand disease progression and develop safe and effective therapeutic strategies.

61

Figure 3-1. Transcriptional and translational assessments in Dmpk knockin mice. A) RT-PCR amplification across the targeted site from a heterozygous Dmpk knockin mouse, demonstrating the expression of both mutant and wild-type alleles in numerous tissues. B) Quantitative PCR in skeletal muscle samples from Dmpk170 mice, showing a downregulation in Dmpk transcript levels. C) 40 g of protein lysates from tibialis anterior were blotted with a polyclonal anti-DMPK antibody (ThermoFisher #34-9900; 1:200), revealing decreased DMPK protein levels in the tibialis anterior of Dmpk170 mice. D) Quantification of the Western blot shown in panel C, normalized to total protein. When compared to wild-type levels, DMPK was reduced to 42 and 6% in the tibialis anterior of Dmpk+/170 and Dmpk170/170 mice, respectively.

62

Figure 3-2. Accumulation of RNA foci in the absence of spliceopathy. A) Presence of RNA foci (red) in transverse sections of tibialis anterior from a Dmpk+/170 mouse. Muscle tissue was counterstained with wheat-germ agglutinin (WGA; green) and DAPI (blue). B) Despite the presence of RNA foci, no changes in the alternative splicing of MBNL-regulated exons were observed in the tibialis anterior of Dmpk170 mice, suggesting the insufficiency of this repeat size to sequester enough MBNL proteins and disrupt RNA processing. C) Quantification of RT-PCRs shown in panel B. D-K) Length, mass, cross- sectional area (CSA) and normalized force measurements from the tibialis anterior (D-G) and diaphragm (H-K) of Dmpk170 knockin mice. No significant differences were observed.

63

Figure 3-3. Increased Dmpk expression and RNA foci accumulation during myogenesis. A) In humans, DMPK transcript levels are elevated during muscle development, peaking at 72h after myogenic induction and returning to basal levels in mature muscle tissue (adapted from Thomas et al., Genes Dev, 2017). B) In mice, a similar pattern is observed. Primary myoblasts isolated from Dmpk170 mice showed increased RNA foci accumulation following myogenic differentiation. C) Similar to our observations in mature skeletal muscle, differentiated myotubes showed decreased Dmpk transcript levels (blue bars). However, this result seems to be caused by the inability of standard RNA isolation methods to extract mutant transcripts bound to proteins. When extracted using hot TRIzol, Dmpk transcript levels were similar (if not slightly upregulated) in myotubes from all genotypes analyzed (red bars). D) Protein lysates from differentiated myotubes were blotted with a polyclonal anti-DMPK antibody (ThermoFisher #34-9900; 1:200), showing decreased DMPK protein levels in myotubes from Dmpk170 mice. E) Quantification of the Western blot shown in D, normalized to Gapdh levels. When compared to wild-type, DMPK was reduced to 81 and 19% in differentiated myotubes from Dmpk+/170 and Dmpk170/170 mice, respectively.

64

Figure 3-4. Gene expression changes in Dmpk170 myotubes. A) Drop in read coverage in the edited locus from Dmpk170/170 myotubes, confirming the genotype of these cells. B) Examples of differentially expressed genes in myotubes from Dmpk170/170 mice, including a 2.4-fold downregulation (p. adj. = 6.47E-10) in the transcript levels of mitochondrial creatine kinase 2 (Ckmt2).

65

Figure 3-5. Alternative splicing changes in Dmpk170 myotubes. A) Overview of the main classes of alternative splicing events. B) Number of significant changes (Z- score  1.5) within each splicing category in myotubes from Dmpk170/170 mice. Note that most changes are mild (variations between 0 and 10%), yet significant. C) Inclusion of Ap2m1 exon 4 is the most dramatic splicing change in Dmpk170/170 myotubes (Z-score = 2.22). D) A curious alternative 3’ splice site can be observed in exon 14 of Dmpk, which could indicate an attempt to downregulate the expression of a lncRNA isoform from this gene.

66

Table 3-1. Primers used to measure transcript levels by qPCR. Primer 5’ – 3’ Sequence (size) Dmpk qPCR-Fwd CTGCTCGACCTTCTCCTGG (19 nt) Dmpk qPCR-Rev CACGCCCGATCACCTTCAA (19 nt) Eef2 qPCR-Fwd TGTCAGTCATCGCCCATGTG (20 nt) Eef2 qPCR-Rev CATCCTTGCGAGTGTCAGTGA (21 nt) Gapdh qPCR-Fwd AGGTCGGTGTGAACGGATTTG (21 nt) Gapdh qPCR-Rev TGTAGACCATGTAGTTGAGGTCA (23 nt)

Table 3-2. Primers used to asses Dmpk transcript distribution and alternative splicing by RT-PCRs. Primer 5’ – 3’ Sequence (size) Dmpk-RT-Fwd ACCTTCGCCCCCTGAACCCTAAGACT (26 nt) Dmpk-RT-Rev AGGGGGCTCATGCTTATGGCTTGTAACTGA (30 nt) Atp2a1-E22-Fwd GCTCATGGTCCTCAAGATCTCAC (23 nt) Atp2a1-E22-Rev GGGTCAGTGCCTCAGCTTTG (20 nt) Clcn1-E7a-Fwd GGAATACCTCACACTCAAGGCC (22 nt) Clcn1-E7a-Rev CACGGAACACAAAGGCACTGAATGT (25 nt) Cacna2d1-E19-Fwd TGTCCCAATGGCTACTATTTTGC (23 nt) Cacna2d1-E19-Rev CTCAGCATCGAGAAAATCCAGTG (23 nt)

Table 3-3. Animals used for isolation of primary myoblasts. After extensive passaging to increase cell purity, myoblasts were differentiated for 72h and their RNA was sequenced in the NextSeq500 Illumina platform. Animal ID Genotype Age Sex Passage Number of reads P00772 Dmpk+/+ 3w5d Female P12 77,992,696 P00773 Dmpk+/+ 4w5d Female P12 88,959,247 P00887 Dmpk+/+ 5w2d Female P12 93,698,052 P00842 Dmpk170/170 6w Female P12 88,921,133 P00886 Dmpk170/170 5w2d Female P12 95,246,779 P00917 Dmpk170/170 6w Female P12 95,500,378

67

CHAPTER 4 THE CHOROID PLEXUS INVOLVEMENT IN DM1

Background

Myotonic dystrophy type I is one of the most variable genetic diseases, with symptoms affecting nearly every organ in the body. In the central nervous system

(CNS), disease manifestations include excessive daytime sleepiness, extreme fatigue, cognitive and memory deficits146, 152. Our knowledge on the mechanisms underlying these symptoms is primarily derived from studies using autopsy samples82, 153, constitutive Mbnl2 knockout models43, neuronal-specific Mbnl1; Mbnl2 double knockout mice82 and Drosophila154 or zebrafish155 models overexpressing CTG repeats. While these studies revealed widespread splicing alterations caused by MBNL sequestration in the CNS43, 82, 153, little is known about the primary events that orchestrate disease progression or the exact contribution of different brain structures to DM1 pathogenesis.

To explore these questions, a transgenic approach was used to generate a mouse model carrying a ~45 kb fragment containing a mutant human DMPK gene with approximately 300 CTG repeats85. Intergenerational instability in one DM300 line generated the DMSXL model carrying over than 1,000 CTG repeats87. Hemizygous

DMSXL mice are mostly unaffected, in contrast to homozygous animals that display several CNS abnormalities87, 88, 90. These phenotypes are usually received with skepticism, primarily due to the use of a single transgenic line, in which gene disruption by the randomly inserted transgene could influence the observed phenotypes.

Additionally, transgene expression levels are significantly different from the endogenous

Dmpk transcripts88, 156, undermining the contribution of DMSXL mice to investigate the involvement of specific CNS components in disease progression.

68

The development of Dmpk knockin models that express mutant transcripts from the endogenous locus offers a promising alternative to investigate the contribution of different brain areas to DM1 pathogenesis. Although the relatively small repeat expansion in the Dmpk170 knockin line is not expected to present overt physiological phenotypes in the CNS, this model may be used to evaluate the distribution of RNA foci across the brain. In fact, studies using Dmpk170 mice may reveal which structures are more prone to accumulation of mutant transcripts in the early stages of DM1 progression, when repeats are still fairly small. Additionally, the pattern of RNA foci distribution will also inform us on the spatial differences in Dmpk expression – data that is surprisingly absent from popular resources such as the Allen Mouse Brain Atlas.

In the Dmpk170 Brain, RNA Foci Are Restricted to Choroidal Epithelial Cells

To assess the presence and distribution of CUGexp RNA foci in the brain of Dmpk+/170 and Dmpk170/170 mice, RNA-FISH was performed in sagittal sections from 8-12-week old animals. To our surprise, both heterozygous and homozygous knockin mice accumulated RNA foci exclusively in the choroid plexus – more specifically in choroidal epithelial cells, a single-layered secretory epithelium surrounding the blood vessels that project into the ventricular lumen (Figure 4-1). Importantly, RNA foci were absent in endothelial cells that compose the fenestrated capillaries from the choroid plexus, and in ependymal cells lining the ventricular system. RNA foci were abundant in choroidal epithelial cells from the lateral ventricles rostral to the hippocampus, and in the fourth ventricle located between the brainstem and cerebellum. Choroid plexus tissue was not found in the third ventricle, at least in the sagittal sections used in our experiments.

In brain structures more commonly studied in neurodegenerative diseases, such as the cortex, striatum, hippocampus and cerebellum, no RNA foci were observed

69

(Figure 4-1). Previous reports using post-mortem brain tissue from DM1 patients has demonstrated the presence of RNA foci153 and histopathological remarks in these areas152. The absence of these features in the brain of Dmpk170 knockin mice is possibly due to the relatively small repeat size in this model, likely below the pathogenic threshold for these tissues. Interestingly, the presence of RNA foci in choroidal epithelial cells may suggest the involvement of the choroid plexus in early stages of DM1 pathogenesis, when the repeat tract has not expanded into the thousands. These unexpected results urge for a comprehensive study of molecular and histological signatures of DM1 in autopsy choroid plexus samples.

Increased Dmpk Expression in the Mouse Choroid Plexus

Within the mouse brain, the presence of RNA foci exclusively in choroidal epithelial cells could be caused by spatial differences in RNA turnover or Dmpk expression. To investigate the latter, we microdissected the choroid plexus from the fourth ventricle of wild-type C57BL/6J mice (Figure 4-2-A) and performed quantitative real-time PCR (qPCR) to evaluate Dmpk expression. The levels of Dmpk transcripts were normalized to Eef2, a housekeeping gene previously reported as uniformly expressed across different tissues157. When compared to other brain regions, the choroid plexus showed a ~10-fold increase in Dmpk expression, even though these levels are still ~10-fold lower than those from skeletal muscle or heart (Figure 4-2-B).

These results agree with a 1995 study on the spatial distribution of Dmpk protein in mouse tissues, which reported elevated levels of DMPK protein in the choroid plexus158,

159. Perhaps due to uncertainties concerning the specificity of anti-DMPK antibodies available in the 1990s, these finds have been largely neglected by DM1 researchers until now.

70

Detection of RNA foci in the choroid plexus, in addition to increased Dmpk expression in this tissue, reinforces the concept of repeat load, instead of repeat size, as a major determinant of DM1 pathogenesis. In this model, tissues that express DMPK at higher levels would require smaller repeat expansions to accumulate RNA foci and potentially disrupt RNA processing. As the repeat expansion continues to grow during the patient lifespan, tissues that express DMPK at lower levels become affected, resulting in the multisystemic spectrum of disease symptoms observed in DM1. This hypothesis is supported by the presence of numerous CNS abnormalities in congenital

DM1 patients who inherit a very large repeat expansion, in contrast to adult DM1 patients whose disease is primarily observed in skeletal, cardiac and smooth muscle tissues – all of which express DMPK at very high levels. However, this model fails to explain why adult DM1 patients often present extreme fatigue and excessive daytime sleepiness. These CNS symptoms might be driven by structures that express DMPK at higher levels such as the choroid plexus, or by other brain areas that are less tolerant to smaller expansions or more prone to repeat instability.

Choroid Plexus Development, Functions and Pathology

The nervous system is extremely diverse across species, evolving from a simple net of nerve cells in the cnidarian phylum to a complex and organized network in higher animals160. Despite this evolutionary diversity, the choroid plexus (CP) is conserved from cephalochordates to humans161. This villous structure is found in the ventricular lumen and it is composed of a monolayer of secretory choroidal epithelial cells that surround blood vessels protruding from the brain parenchyma (Figure 4-3). In the CP, capillaries are fenestrated, allowing rapid passage of water from the blood to choroidal epithelial cells, which are responsible for producing and secreting cerebrospinal fluid

71

(CSF). To facilitate CSF production, blood flow in the CP is 5-10 times higher when compared to other tissues162. Upon secretion, CSF can flow from the ventricular system down the spinal cord through the central canal or to the subarachnoid space where it is reabsorbed. The rate of CSF production is approximately 500 mL/day in humans and

432 L/day in mice, with usually no more than 150 mL and 35 L of circulating CSF in these species. Deficits in CSF production or secretion can impair brain development, while too much CSF due to overproduction or decreased reabsorption can lead to hydrocephalus. The absence of a blood-brain barrier (BBB) in the fenestrated capillaries found in the CP is compensated by tight junctions between choroidal epithelial cells, which ensure cell polarity and form the blood-CSF barrier (BCSFB). The apicobasal polarity found in choroidal epithelial cells is critical for choroid plexus function since many transporters such as aquaporin 1 (AQP1) are only present in the luminal side.

Differentiation of choroidal epithelial cells in the mouse brain starts as early as

E8.5, however CP is not evident until 2-3 days later. Following neural tube closure, the

CP from each ventricle originates individually, starting with the plexus from hindbrain

(fourth ventricle), followed by the forebrain CP (lateral ventricles) and finally the plexus from the diencephalon (third ventricle). The CP stroma is derived from head mesenchymal stem cells, while the CP epithelium belongs to the neuroepithelial lineage. Choroidal epithelial cells are mostly post-mitotic, terminating their proliferative stage at the end of embryonic development163. The forebrain and hindbrain CPs are distinct in their transcriptomes and proteomes. These differences result in distinct secretion products, which are thought to contribute to spatial patterning in the developing brain164.

72

Choroidal epithelial cells secrete microRNAs and numerous proteins, including insulin growth factor 2 (IGF2), sonic hedgehog (SHH), leukemia inhibitory factor (LIF) and many others, all of which have been implicated in the maintenance and differentiation of neural stem cells during brain development165. The CP also plays a role in adult hippocampal neurogenesis, secreting chemokines that modulate brain plasticity. With aging, multiple gene expression changes are observed in the CP transcriptome, including alterations in the expression of genes involved in immune response. For instance, the aged CP shows an increase in type I interferon (IFN-I) response, typically responsible for translational repression following viral infections.

Interestingly, aging downregulates type II interferon (IFN-II) pathways, triggering the expression and secretion of cytokines that decrease adult neurogenesis and impair learning and memory166.

The choroid plexus is also a critical entry point for thyroid hormones (THs).

Circulating THs reach the brain through transmembrane transporters from the solute carrier (SLC) family, such as MCT8 and SLCO1C1167. In the CNS, approximately 80% of THs enter the brain parenchyma through the BBB. The remaining 20% is transported from the serum to the CSF by choroidal epithelial cells. The distribution of hydrophobic

THs in the CSF is facilitated by transthyretin (TTR). In the brain, TTR is exclusively produced by the CP and it is the only known carrier of THs in the CNS. This protein plays a critical role in delivering THs to neural stem cells in the subventricular zone

(SVZ). In this niche, TH signaling through the thyroid receptor 1 (TR1) represses

Sox2 expression and favors neurogenic differentiation168. Studies using Ttr knockout

73

mice showed that T4 and T3 levels are reduced to 14 and 48% in the CP, which is likely to result in reduced levels of THs in the CSF and impaired neurogenesis in the SVZ.

For a long time, the adult CNS was thought to be isolated from the immune system. Early in development, myeloid progenitors from the yolk sac migrate to the brain, where they differentiate into microglia, cells that function as macrophages in the

CNS169. After the BBB is formed, myeloid progenitors can no longer access the brain and self-renewal of microglia following activation becomes the primary mechanism to replenish these cells in the adult CNS. However, several studies reported the presence of leukocytes in the CSF, suggesting that immune cells from the blood can access the brain following the BBB establishment170. Several factors implicated the CP as the gateway for immune cells to enter the adult CNS, including the presence of numerous macrophages and dendritic cells in the CP stroma, which are ready to activate lymphocytes, the presence of fenestrated capillaries to facilitate access of immune cells to the CP epithelium, and the significantly increased blood flow in this structure. For some time, the existence of tight junctions that form the BCSFB challenged this theory, but it was later confirmed that brain injury produces messengers aimed to signal the CP to dissociate these junctional complexes and allow basolateral migration of immune cells through the monolayer of choroidal epithelial cells171. This coordinated mechanism prevents the excess of immune cells in the healthy CSF, while still allowing the immune system to fight brain infections and injury. Very recently, CP dysfunction has been reported in Alzheimer’s disease (AD)172, with some studies pointing to the contribution of choroidal epithelial cells in the clearance of amyloid beta (A) from the CSF, and others focusing on the possibility of impaired trafficking of immune cells through the CP

74

as a contributor to AD pathology. Changes in CP structure or function have also been described in models of experimental autoimmune encephalomyelitis (EAE)173 and multiple sclerosis (MS)174.

Consequences of Mbnl Loss-of-Function in the Choroid Plexus

The increased levels of Dmpk expression in the CP compared to other CNS structures, in addition to the accumulation of RNA foci exclusively in choroidal epithelial cells within the brain of Dmpk170 knockin mice, raises the possibility of CP involvement in the pathogenesis of DM1. To date, not a single study has investigated the transcriptomic changes in the CP of DM1 patients or disease models. Consequently, the effects of MBNL sequestration by CUGexp transcripts in the RNA processing of choroidal epithelial cells are unknown. To determine the transcriptomic signatures of Mbnl loss-of- function in these cells, we sequenced the CP transcriptome from wild-type, Mbnl1E3/E3, and Mbnl2E2/E2 knockout mice42, 43. For this purpose, we extracted RNA from the hindbrain CP, which is usually found in the ventral side of the dissected cerebellum.

Following ribosomal depletion, cDNA libraries were constructed using the KAPA

Stranded kit and sequenced in the Illumina NextSeq500 platform (Table 4-1).

In the CP from wild-type mice, Mbnl2 transcript levels were approximately 2 and

8-fold higher than Mbnl1 and Mbnl3, in agreement with previous studies that reported

MBNL2 as the primary muscleblind-like protein in the adult CNS. Knockout of Mbnl1 or

Mbnl2 led to widespread differences in gene expression. While ablation of MBNL2 resulted in 567 significant changes (p.adj.<0.05), loss of MBNL1 led to 284 differentially expressed genes. An overlap of 100 events was observed, which corresponded to 35 and 18% of all differentially expressed genes in the CP of Mbnl1 and Mbnl2 knockout

75

mice. Upregulated genes represented 68 and 52% of all changes in the CP transcriptome following loss of MBNL1 or MBNL2, respectively. The remaining events

(32 and 48%) accounted for downregulated genes in these models (Figure 4-4). Gene ontology (GO) analysis was performed to determine which biological processes were enriched in the list of differently expressed genes in the CP from Mbnl1 or Mbnl2 knockout mice. Interestingly, loss of MBNL1 or MBNL2 affected the expression of genes enriched in similar pathways, including transport of neurotransmitters, ions and hormones. Other pathways, such as rhythmic processes and circadian regulation of gene expression were also significantly enriched, suggesting a possible role for MBNL proteins in the circadian rhythm of choroidal epithelial cells (Figure 4-5).

Absence of MBNL1 or MBNL2 in the mouse CP resulted in downregulation of several genes that have been previously associated with CNS phenotypes. A significant decrease in Slco1c1 expression was observed in the CP of Mbnl2 knockout mice

(Figure 4-6-A). This gene, also known as Oatp1c1 or Oatp14, encodes a transmembrane protein that transports thyroid hormones from the blood to choroidal epithelial cells. While Slco1c1 knockout mice did not display overt neurological abnormalities, these animals showed decreased levels of T4 and T3 hormones in the brain and downregulated expression of T3 targets167. In the CP, reduced transcript levels of Slco1c1 may compromise the transport of thyroid hormones to the CSF, where they play a critical role in regulating adult neurogenesis in the SVZ168.

A reduction in the expression of RNase P component H1, encoded by the Rpph1 gene, is observed in the CP from Mbnl1 and Mbnl2 knockout mice (Figure 4-6-B). The

Rpph1 product is an RNA subunit of RNase P, well-known for its function in tRNA

76

maturation175. In addition to its role as a ribonuclease component, the lncRNA encoded by Rpph1 has been reported as a competing endogenous RNA that promotes dendritic spine formation in the hippocampus by upregulating Cdc42176. To date, it is unknown whether choroidal epithelial cells secrete lncRNAs of paracrine function. It is possible that downregulation of Rpph1 expression in the CP of Mbnl1 and Mbnl2 knockout mice may compromise dendritic spine formation in the hippocampus and impair learning and memory – phenotypes previously observed in mice lacking MBNL243.

The expression of extracellular superoxide dismutase (Sod3) was also 2-fold downregulated in the CP from Mbnl2 knockout mice (Figure 4-6-C). In addition to its traditional role in neutralizing superoxide anions to prevent oxidative stress, Sod3 regulates blood flow by protecting endogenously produced nitric oxide (NO) – a potent vasodilator177. Increased blood flow in the CP facilitates the transport of water required for CSF production. Therefore, a plausible hypothesis is that decreased Sod3 is expected to cause vasoconstriction, reducing blood flow and impairing CSF production by choroidal epithelial cells.

Decreased levels of potassium voltage-gated channel subfamily E member 2

(Kcne2) transcripts were also observed in the CP following loss of MBNL2 (Figure 4-6-

D). This gene is highly expressed in choroidal epithelial cells178, where it regulates the activity of another channel – the sodium/myo-inositol cotransporter 1 (Slc5a3) responsible for secreting myo-inositol into the CSF. Decreased myo-inositol levels in the

CSF of Kcne2 knockout mice lead to behavioral abnormalities and seizures, phenotypes that can be rescued with CSF supplementation of myo-inositol179. Downregulation of

Kcne2 expression in the CP of Mbnl2 knockout mice is a promising contributor to the

77

increased susceptibility of this model to seizures43. Nevertheless, the levels of myo- inositol in the CSF of Mbnl2 knockout mice has yet to be determined.

Alternative splicing differences were also observed in the CP of Mbnl1 and Mbnl2 knockout mice (Figure 4-7). Separate ablation of MBNL1 or MBNL2 led to a similar number of splicing alterations, but most of these events did not overlap. These results may indicate that MBNL1 and MBNL2 regulate different alternative splicing targets in choroidal epithelial cells, or that ablation of MBNL1 or MBNL2 in this tissue compromises additional splicing pathways, leading to an indirect effect on non-MBNL targets. In the near future, strategies such as cross-linking and immunoprecipitation

(CLIP) should be performed to determine the direct RNA targets of MBNL proteins in the CP epithelium.

The primary class of alternative splicing events in the CP of mice lacking MBNL1 or MBNL2 was skipped exons, followed by retained introns, alternative 3’ splice sites, alternative 5’ splicing sites and mutually exclusive exons. Within each major class of alternative splicing, significant events (absolute Z-score  1.5) were grouped according to the absolute variation in the percent spliced in (referred as delta psi or simply ).

For most alternative splicing events, absolute  values ranged from 10 to 30%. Due to the compensatory mechanisms previously reported for MBNL1 and MBNL2, a CP- specific knockout of Mbnl2 in a Mbnl1 null background may be required to fully investigate the consequences of progressive MBNL titration by CUGexp transcripts in

DM1.

78

Discussion

Over the last decades, numerous studies investigated the CNS components of

DM1, uncovering several pathways associated with disease symptoms. The development of Mbnl loss-of-function models provided a resourceful platform to understand the consequences of MBNL sequestration by CUGexp transcripts in any disease-relevant cell type. When guided by histopathological findings from DM1 autopsy tissues153, Mbnl knockout mice have been extensively used to assess the effects of

RNA toxicity, providing useful biomarkers and potential avenues for therapeutic interventions43, 82. The limitation of this strategy is that post-mortem brain tissue usually presents widespread late-stage pathology, offering limited insights on the contribution of specific brain structures to disease pathogenesis. To determine the initial events that orchestrate DM1 progression in vivo, we generated an allelic series of Dmpk knockin models that express mutant transcripts in the endogenous spatiotemporal context.

Using the Dmpk170 knockin model, we evaluated the accumulation of RNA foci throughout the brain and identified the choroid plexus as a promising contributor to the early stages of DM1 pathogenesis.

The choroid plexus involvement in DM1 was further investigated by sequencing the transcriptome of this tissue in wild-type, Mbnl1 and Mbnl2 knockout mice. Ablation of MBNL1 or MBNL2 led to widespread changes in the expression of genes involved in the transport of neurotransmitters, ions and hormones. Several differentially expressed genes have been previously implicated in CNS abnormalities, such as impaired neurogenesis or increased susceptibility to seizures. These findings have yet to be validated using the recently developed Dmpk480 knockin model and autopsy choroid plexus samples from DM1 patients.

79

Since the primary function of choroidal epithelial cells is to produce and secrete cerebrospinal fluid (CSF), biochemical studies in the CSF can be performed to evaluate choroid plexus dysfunction. In mice, the feasibility of this approach is limited by multiple factors, including (1) the spatial differences in CSF composition within the ventricular system164, (2) the notoriously challenging process of collecting blood-free CSF180, (3) the small volumes that can be collected by cisterna magna puncture180, (4) the extremely low protein concentration in this fluid – approximately 0.3-0.4 mg/mL, 60% of which is albumin181, 182, and (5) the fact that several other brain structures actively modify the CSF, meaning that differences in CSF composition do not necessarily reflect choroid plexus dysfunction, unless the analyzed product is exclusively synthesized or transported by choroidal epithelial cells.

Compared to mice, human CSF can be collected at much larger volumes by lumbar puncture. Consequently, several research groups have attempted to characterize the CSF of DM1 patients, primarily aiming to investigate the mechanisms underlying neurodegeneration or excessive daytime sleepiness. Some studies reported conflicting results on the CSF levels of hypocretin 1 (also known as orexin A), a neuropeptide produced by the hypothalamus that regulates the sleep cycle and is decreased in narcoleptic patients183-185. Additional findings in the CSF of DM1 patients include increased levels of total tau and decreased levels of A42186. While increased

CSF levels of total tau are likely due to neurodegeneration, the mechanism underlying decreased levels of A42 is currently unknown, since the neurofibrillary tangles observed in DM1 are not associated with A deposits186, 187. Recently, it has been hypothesized that CSF from DM1 patients contains an endogenous benzodiazepine-like

80

(endozepine) compound that functions as a GABAA receptor agonist. Treatment of DM1 patients with flumazenil, a benzodiazepine antagonist used to treat patients with idiopathic hypersomnia (IH), improved their wakefulness188. Interestingly, the most studied endozepine – diazepam binding inhibitor (DBI) – is expressed at very high levels by choroidal epithelial cells189.

81

Figure 4-1. Distribution of RNA foci in the brain of Dmpk170 knockin mice. Fluorescently exp labeled CAG10 probes revealed the accumulation of toxic CUG transcripts exclusively in the choroid plexus of Dmpk170/170 (above) and Dmpk+/170 (not shown). RNA foci were absent in all other brain structures, including cerebellum, hippocampus and cortex.

82

Figure 4-2. Increased Dmpk expression levels in the choroid plexus of wild-type mice. A) Hindbrain choroid plexus microdissected from the fourth ventricle, compared to the cerebellum. B) Quantitative real-time PCR (qPCR) was performed to assess the relative expression levels of Dmpk in several tissues from wild-type C57BL/6J mice. In contrast to the most commonly studied brain structures, the choroid plexus expresses remarkably high levels of Dmpk transcripts. For comparison purposes, two muscle samples (skeletal and cardiac) were included in this analysis.

83

Figure 4-3. Primary functions of the choroid plexus-cerebrospinal fluid system. The CP is a monolayer of secretory epithelial cells surrounding blood vessels that protrude from the brain parenchyma into the ventricular lumen. These capillaries are fenestrated, allowing rapid passage of water from the blood to choroidal epithelial cells, which are responsible for producing and secreting CSF. The CP epithelium also secretes transthyretin (TTR) – the only carrier of thyroid hormones in the CNS – and numerous growth factors and cytokines that modulate neurogenesis. Tight junctions between adjacent choroidal epithelial cells form the blood-cerebrospinal fluid barrier (BCSFB). Following brain injury or infection, pro-inflammatory messengers in the CSF mediate the dissociation of these junctional complexes, allowing immune cells to enter the CNS.

84

Figure 4-4. Differential gene expression in the choroid plexus of Mbnl1 and Mbnl2 knockout mice. A) Transcript levels of Dmpk, Mbnl1, Mbnl2 and Mbnl3 in the CP of wild-type mice. Similar to other CNS structures, the adult CP expresses Mbnl2 at higher levels than Mbnl1 and Mbnl3. B) Not surprisingly, ablation of MBNL2 leads to a larger number of differentially expressed genes. In the CP, lack of MBNL1 resulted in 284 changes in gene expression – 194 upregulated and 90 downregulated events. In contrast, absence of MBNL2 led to 567 differentially expressed genes – 298 upregulated and 269 downregulated events.

85

Figure 4-5. Gene ontology (GO) analysis of differentially expressed genes in the CP of Mbnl1 and Mbnl2 knockout mice. A-E) Biological processes enriched in the gene expression changes observed in the CP of mice lacking MBNL1 or MBNL2. To facilitate visualization, gene lists were grouped based on biological processes. Genes expressed at very low levels (average RPKM<1) are not shown.

86

Figure 4-6. Examples of downregulated genes in the CP of Mbnl1 or Mbnl2 knockout mice. A) The expression of thyroid transporter Slco1c1 is 1.4-fold downregulated in the CP of Mbnl2 knockout mice (p. adj. = 4.11E-06). B) The expression of lncRNA Rpph1 is 2.8 and 2.3-fold downregulated in the CP of Mbnl1 (p. adj. = 4.06E-14) and Mbnl2 (p. adj. = 3.52E-21) knockout mice, respectively. C) The expression of extracellular superoxide dismutase Sod3 is 2-fold downregulated in the CP of Mbnl2 knockout mice (p. adj. = 1.76E-18). D) The expression of voltage-gated Kcne2 is 1.3-fold downregulated in the CP of Mbnl2 knockout mice (p. adj. = 0.0014). Previous studies have implicated these genes in CNS abnormalities.

87

Figure 4-7. Alternative splicing changes in the CP of Mbnl1 and Mbnl2 knockout mice. A) Alternative splicing is primarily divided into five major classes of events: skipped exons (SE), mutually exclusive exons (MXE), retained introns (RI), alternative 5’ splice sites (A5’SS) and alternative 3’ splice sites (A3’SS). B) Distribution of significant alternative splicing events (absolute Z-score  1.5) observed in the CP of Mbnl1 and Mbnl2 knockout mice, identified using the mixture of isoforms (MISO) probabilistic model. Within each major class of alternative splicing, events were grouped according to the absolute variation in the percent spliced in (referred as delta psi or simply ). A larger absolute  value indicates a more pronounced shift in alternative splicing. C) Overlapping splicing events in the CP of Mbnl1 and Mbnl2 knockout mice, grouped by alternative splicing categories. D-G) Examples of targets differently spliced in these datasets. To facilitate visualization, reads mapping to intronic sequences are not shown.

88

Table 4-1. Choroid plexus RNA-Seq. Animals whose hindbrain choroid plexus was sequenced, and the total read count for each sample. Animal ID Genotype Age Sex Number of reads C05817 Wild-type 12w5d Male 41,536,374 C05819 Wild-type 12w5d Male 27,090,917 C05820 Wild-type 12w5d Male 53,082,412 M02973 Mbnl1 KO 16w2d Male 43,679,213 M02977 Mbnl1 KO 16w2d Male 33,841,063 M03100 Mbnl1 KO 16w6d Male 52,790,437 C05807 Mbnl2 KO 13w3d Male 37,313,714 C05808 Mbnl2 KO 13w3d Male 35,724,545 C05809 Mbnl2 KO 13w3d Male 56,108,380

89

CHAPTER 5 CONCLUSIONS AND FUTURE DIRECTIONS

Repeat expansion mutations are responsible for over than 30 neurological and neuromuscular diseases, most of which are dominantly inherited. Multiple pathomechanisms have been described for these disorders, including haploinsufficiency, RNA gain-of-function and RAN translation. In mice, the relative contribution of these mechanisms has been investigated using several strategies, including ablation of conserved mouse orthologues to mimic protein loss-of-function28, knockout of RNA-binding proteins sequestered by mutant transcripts42, and development of transgenic lines carrying the mutant human locus85. Despite the immeasurable contribution of these models to understand disease pathogenesis, they often do not recapitulate the entire spectrum of disease symptoms, hindering the development of multisystemic therapies for repeat expansion disorders.

We are interested in developing knockin models that faithfully recapitulate repeat expansion diseases by inserting these mutations in the conserved endogenous mouse locus, which will lead to expression of toxic transcripts in the correct spatiotemporal context. Until recently, development of such a model has been hindered by several technical challenges, including the instability of targeting vectors in bacteria and the small repeat sizes capable of generating viable mouse ES cells for blastocyst injections.

In this study, we overcame these limitations using a bacteria-free strategy to generate recombination templates, followed by CRISPR/Cas9 genome engineering. To test our strategy, we developed an allelic series of mouse models for DM1, a neuromuscular disorder caused by a CTGexp in the 3’ UTR of the DMPK gene.

90

Our endeavors to engineer the highly conserved mouse Dmpk gene started with the generation of recombination templates by Golden Gate assembly and rolling circle amplification, a completely in vitro cloning approach that prevents repeat instability. To assess the feasibility of our strategy, we initially attempted to knockin a relatively small

202 CTG repeat, which is not expected to result in embryonic lethality or any overt phenotypes. For this purpose, we microinjected targeting constructs and CRISPR/Cas9 components into C57BL/6J zygotes, and the gene editing outcomes were assessed by several methods. Random insertions of the linear recombination template were found in multiple animals, highlighting the importance of careful screening by Southern blot and extensive backcrossing. Fortunately, one founder carrying 170 CTG repeats in the targeted locus was found to not carry any random insertions, and this animal was backcrossed for 10 generations to establish the Dmpk170 knockin line.

Next, we sought to characterize several aspects of DM1 in heterozygous and homozygous Dmpk170 mice. In skeletal muscle, mutant transcripts accumulated in RNA foci. Expression of Dmpk was initially found to be downregulated in Dmpk knockin animals, but this result was later confirmed to be a consequence from the inability of standard RNA extraction methods to isolate transcripts trapped in RNA foci. DMPK protein levels were dramatically downregulated in knockin animals, likely due to nuclear retention of mutant transcripts, which fail to be exported and translated in the cytoplasm.

As expected, in adult skeletal muscle, expression of 170 CTG repeats at endogenous levels was not enough to trigger spliceopathy or any overt phenotypes previously reported in DM1. Altogether, these results demonstrate that repeats as small as 170 units are able to accumulate in RNA foci, a process that impairs mRNA export and

91

DMPK translation. Absence of overt phenotypes or aberrant splicing patterns confirms that DM1 is not caused by DMPK haploinsufficiency.

Recent studies reported that DMPK expression is dramatically upregulated following differentiation of human muscle progenitor cells, peaking at 72h post-induction and returning to basal levels in mature myofibers. To test the hypothesis of increased

RNA toxicity during myogenesis, we isolated primary myoblasts from Dmpk170 mice and induced myogenic differentiation. Upon induction, myotubes showed increased RNA foci accumulation and early signs of spliceopathy, with several mild yet significant changes in the alternative splicing of exons regulated by MBNL proteins. Collectively, these results suggest that RNA toxicity is increased during muscle development in DM1 patients, reinforcing the idea of repeat load, instead of repeat size, as a major determinant of DM1 pathogenesis.

The CNS expresses DMPK at much lower levels than skeletal muscle or heart and, consequently, overt CNS abnormalities were not expected in Dmpk170 mice.

Nevertheless, expression of toxic transcripts in the endogenous spatiotemporal distribution makes this model a promising tool to evaluate the differential accumulation of RNA foci within the brain, which can possibly inform us about the involvement of specific brain structures in the early stages of DM1 when repeats have not expanded into the thousands. Therefore, we evaluated the accumulation of mutant transcripts in the brain of Dmpk170 mice and found that RNA foci were restricted to choroidal epithelial cells that surround the capillaries from the choroid plexus. Dmpk transcript levels were approximately 10-fold upregulated in the choroid plexus in comparison to other brain areas, suggesting that this structure may be involved in DM1. To assess the

92

consequences of MBNL sequestration in the choroid plexus, we sequenced the transcriptome of this tissue isolated from Mbnl1 and Mbnl2 knockout mice. Widespread changes in alternative splicing and gene expression were observed. Among the differently expressed genes, pathways such transport of neurotransmitters, ions and hormones were enriched, indicating the contribution of MBNL proteins to CSF homeostasis, and suggesting the blood-CSF barrier as a possible mediator of disease progression in DM1. Our results highlight the need for a comprehensive study using autopsy choroid plexus samples from DM1 patients, which will determine the degree of choroid plexus involvement and the overlap between disease patients and Mbnl knockout models.

The Dmpk170 knockin mouse was originally developed with the sole intention of evaluating the feasibility of our genome engineering strategy. However, this model resulted in significant contributions to our understanding of DM1, and these results have encouraged the generation of additional knockin models carrying larger repeat expansions. Recently, we microinjected C57BL/6J zygotes with a recombination template carrying 805 CTG repeats, which resulted in the generation of a Dmpk480 line.

This model is currently undergoing extensive backcrossing to eliminate potential off- target mutations prior to any phenotypic studies. In the near future, we hope to characterize these animals and develop additional knockin models for other repeat expansion diseases, providing tools to develop rational therapeutic strategies for these disorders.

93

CHAPTER 6 MATERIALS AND METHODS

Design and Validation of crRNAs Targeting Dmpk

Publicly available programs such as the MIT CRISPR design tool

(http://crispr.mit.edu) or CHOPCHOP (http://chopchop.cbu.uib.no) were used to search for S. pyogenes CRISPR/Cas9 targeting sites within the 3’ UTR of Dmpk. A 5’ –

TGCTTTACTCAGCCCCGACGTGG – 3’ sequence was chosen for further validations by T7E1 assay. For this purpose, the crRNA above and a common tracrRNA were ordered as Alt-R oligonucleotides from Integrated DNA Technologies (IDT). Alt-R Cas9 protein was also ordered from IDT. Equimolar amounts of crRNA and tracrRNA (2 l of

20 M stocks plus 36 L of nuclease-free duplex buffer) were mixed to a final concentration of 1 M and annealed in a thermocycler (95C for 5 min, followed by 25C forever, ramping down at 0.1C/s). For each well of a 96-well plate to be transfected, 1.5 pmol of cr:tracrRNA complex (1.5 L of 1 M mixture previously annealed) were mixed to 1.5 pmol of Alt-R Cas9 protein (1.5 L of 1 M solution of Cas9) and the final volume was brought up to 25 L with OptiMEM (ThermoFisher). To form ribonucleoprotein

(RNP) complexes, this mixture was incubated for 10 min at room temperature.

Following incubation, 23.8 L of OptiMEM and 1.2 L of Lipofectamine RNAiMAX

(ThermoFisher) were separately mixed and the resulting 25 L were added to RNP complexes (also in 25 L). These 50 L were incubated at room temperature for 30 min and added to one well of a 96-well plate containing 40,000 C2C12/NIH3T3 cells in suspension (recently plated in 100 L per well) and mixed by pipetting (total volume =

150 L/well). We adjusted the calculations above to perform this experiment in triplicate

94

and always included a negative control (untransfected). After 48-72h, genomic DNA was isolated using the DNeasy Blood & Tissue Kit (QIAGEN) and T7E1 assay was performed using the Alt-R Genome Editing Detection kit (IDT), following the manufacturer’s instructions and using primers from Table 2-1. T7E1 cleavage products were separated in a 2.5% agarose gel containing ethidium bromide and visualized under UV light.

Propagation of Plasmids in Escherichia coli

Plasmids containing repeat expansions (courtesy of Dr. Charles Thornton) were propagated in HB101 E. coli. Bacteria were electroporated with 1 ng of plasmid and grown in LB agar plates containing the appropriate concentration of antibiotics.

Following a 12h incubation at 30C, resistant clones were picked and transferred to a liquid LB culture, again in the presence of antibiotics. Following a 12-15h growth at 30C in a shaking incubator, cultures were centrifuged, and plasmid DNA was extracted from pellets using the QIAprep Spin Miniprep kit (QIAGEN), according to the manufacturer’s protocol. Stable clones were screened by restriction enzyme digestion and plasmids containing a stable repeat expansion were stored at -20C. Plasmids that did not harbor a repeat expansion were propagated in DH10 E. coli. Following electroporation, bacteria were transferred to LB agar plates and grown overnight at 37C. Resistant clones were cultured in liquid LB containing antibiotics for 15-18h at 37C in a shaking incubator, and plasmids were purified as described above.

Gibson Assembly of Recombination Templates

Primers used to amplify homology arms and generate fragments for Gibson assembly were designed using the NEBuilder tool from New England Biolabs (Table 2-

95

2). The left or 5’ homology arm was PCR amplified using Q5 high-fidelity DNA polymerase (NEB) and primers F1 and R1. A 50 L PCR reaction consisted of 1X Q5 master mix, 25 pmols of each primer and 500 ng of genomic DNA isolated from wild- type C57BL/6J mice. Optimized cycling conditions for primers F1/R1: 98C for 30s,

(98C for 10s, 62.8C for 30s, 72C for 60s) x 35 cycles, 72C for 2 min, 10C forever.

The right or 3’ homology arm was PCR amplified using the 5PRIME PCR Extender

System (5PRIME) and primers F2 and R2. A 50 L PCR reaction consisted of 1X

Tuning buffer, 10 nmols of each dNTP, 10 pmols of each primer, 3 units of 5PRIME

DNA polymerase and 500 ng of genomic DNA isolated from wild-type C57BL/6J mice.

Optimized cycling conditions for primers F2/R2: 94C for 3 min, (94C for 30s, 68C for

30s, 72C for 90s) x 35 cycles, 72C for 10 min, 10C forever. For each fragment, 8

PCR reactions of 50 L each were performed, gel extracted using the NucleoSpin Gel and PCR Clean-up kit (Clontech), combined and further purified by ethanol precipitation.

To enable Gibson assembly, a plasmid containing a CTG202 repeat expansion

(courtesy of Dr. Charles Thornton) was rolling circle amplified and double-digested by

BstAPI and BstXI. The resulting fragments (plasmid backbone and repeat expansion) were gel extracted and combined. Construct assembly was performed using 100 ng of digested plasmid and 200 ng of each homology arm, in addition to Gibson assembly master mix (NEB), following the manufacturer’s instructions. Assembled constructs were transformed into HB101 E. coli as previously described and purified plasmids were screened by AfeI digestion. Plasmids containing a stable repeat expansion were stored at -20C.

96

Rolling Circle Amplification

To generate large amounts of constructs containing repeat expansions, rolling circle amplification (RCA) was performed. For this purpose, 100 ng of plasmid was mixed with 1 L of random hexamers (ThermoFisher #SO142) and 1 L of 10X T4 DNA ligase buffer (NEB). The total volume of this mixture was brought up to 22 L and incubated at 95C for 3 min followed by 10C forever (no ramp). Following hexamer annealing, 8 L of RCA master mix was added to each reaction and mixed by pipetting.

This master mix was prepared by mixing 3 L of 10X phi29 DNA polymerase buffer

(NEB), 3 L of 10 mM dNTP solution mix (NEB), 0.6 L of 10 mg/mL BSA (NEB), 0.4 L of yeast inorganic pyrophosphatase (NEB) and 1 L of phi29 DNA polymerase (NEB).

The resulting 30 L RCA reaction was incubated at 37C for 18 hours, followed by 65C for 10 min and 10C forever. Viscous RCA products must be linearized prior to any downstream application. For this purpose, 2 L of restriction enzyme, 4 L of the appropriate 10X buffer and 4 L of nuclease-free water were added to each 30 L RCA reaction and the resulting 40 L were digested overnight at 37C (or at the specific temperature required by the restriction enzyme used). Digested RCA products were combined and purified using Agencourt AMPure XP beads (Beckman Coulter), following the manufacturer’s protocol.

Notes: (a) 10X T4 DNA ligase and 10X phi29 DNA polymerase buffers are highly sensitive to repetitive freeze-thawing cycles and should be stored in small aliquots at -

20C, preferably in a non-frost-free freezer; (b) ethanol precipitation instead of AMPure

XP beads is not recommended when purifying digested RCA products, since ethanol precipitates unincorporated dNTPs, which dramatically mask the concentration of RCA

97

products measured using NanoDrop; (c) the choice of restriction enzyme depends on the desired downstream application of RCA products. To generate recombination templates for zygote microinjections, RCA products were digested by AfeI. Products used for repeat dimerization by Golden Gate assembly were digested by SapI.

Repeat Dimerization by Golden Gate Assembly

Following SapI digestion, RCA products were combined, purified using AMPure

XP beads and quantified by NanoDrop. Multiple aliquots, each containing 7-10 g of

SapI-digested fragments, were separately digested overnight by BsaI or BbsI, and the fragments of interest were gel extracted, combined and purified using AMPure XP beads. To enable repeat dimerization by Golden Gate assembly, 100 ng of SapI-BsaI fragments were ligated to 100 ng of SapI-BbsI fragments, using T4 DNA ligase and buffer (NEB) in a total volume of 10 L. Ligations were performed at 16C overnight.

The following day, these 10 L were mixed to 11 L of nuclease-free water and 1 L of random hexamers, and the resulting 22 L were amplified by RCA as previously described. Viscous RCA products were again digested by SapI and this dimerization process was repeated to double the repeat expansion one more time. At every dimerization step, RCA products can be linearized by AfeI and gel purified for zygote microinjections.

Zygote Microinjections

Following digestion of RCA products by AfeI, the linearized recombination template was gel extracted and purified using AMPure XP beads. If necessary, this process was repeated to achieve the volume and concentration required for intracytoplasmic microinjections (30-50 L of template at >400 ng/L). A chimeric

98

sgRNA was in vitro transcribed using the MEGAshortscript T7 Transcription kit

(ThermoFisher) and purified using the MEGAclear Transcription Clean-Up kit

(ThermoFisher), following the manufacturer’s instructions. Lyophilized Cas9 protein was purchased from System Biosciences (SBI). Microinjections were performed at the

Genome Modification Facility from the Harvard University. Final concentrations of Cas9 protein, sgRNA and recombination templates were not disclosed.

Isolation of Genomic DNA from Tail Snips

Tail snips were digested overnight in a thermomixer set to 55C and 500 rpm, in

700 L of mouse tail digestion buffer (50 mM Tris-HCl pH 8.0, 100 mM EDTA, 100 mM

NaCl, 1% SDS) containing 250 g of proteinase K. The following day, digested tails were allowed to cool down to room temperature and 300 L of 5M NaCl was added to each tube. Lysates were mixed by inversion and centrifuged at maximum speed for 10 min. From each lysate, 700 L of clear supernatant were transferred to new tubes containing 700 L of isopropanol. These were mixed by inversion and centrifuged at maximum speed for 10 min. Supernatants were poured off and 800 L of 70% ethanol was added to each tube. Again, tubes were mixed by inversion and centrifuged at maximum speed for 10 min. Finally, ethanol was carefully removed since pellets are very loose following this washing step. Pellets were allowed to air dry for 5-10 min and resuspended in 50 L of nuclease-free water of TE buffer. To ensure complete resuspension, DNA was incubated overnight at 4C prior to any downstream application. For PCR genotyping, genomic DNA must be diluted 1:10 in nuclease-free water. Considering that Southern blotting requires highly concentrated DNA, this dilution should be performed in a separate tube.

99

Southern Blotting

Genomic DNA was quantified using NanoDrop and 7-10 g was digested overnight at 37C with 300 units of EcoRI in a total volume of 50 L. The next morning, digestions were spiked with additional 200 units of EcoRI and incubated at 37C for ~8 hours. Digested genomic DNA was separated in a 0.8% agarose gel and photographed next to a fluorescent ruler. Within the agarose gel, DNA was depurinated by 2 washes of 15 min in 0.2 M HCl solution, denatured by 2 washes of 15 min in a 1.5 M NaCl/0.5 M

NaOH solution and finally neutralized by 1 wash of 30 min in a 1.5 M NaCl/0.5 M Tris-

HCl pH 7.5 solution. Negatively charged DNA fragments were transferred to a positively charged nylon membrane (Hybond N+; GE #RPN303B) by capillary action, using a reservoir filled with 20X SSC. The next morning, the membrane was UV crosslinked and pre-hybridized in a rotating oven at 60C for 1h in pre-warmed Rapid-Hyb buffer (GE

#RPN1636). In the meantime, a PCR amplicon (previously generated and purified using primers from Table 2-3) was radiolabeled using the Random Primers DNA Labeling

System (ThermoFisher). For this purpose, 25 ng of purified PCR amplicons, in a total volume of 22 L, were denatured at 95C for 5 min and immediately placed on ice.

Denatured fragments were mixed with 15 L of random primers, 2 L of each dNTP except dCTP, 5 L of 32P-dCTP and 1 L of Klenow. Labeling reactions were mixed by pipetting and incubated for 1 hour at room temperature. After addition of 5 L of stop solution, 50 L of radiolabeled probes were purified using illustra MicroSpin G-50

Columns (GE #27533001), denatured at 95C for 5 min and immediately placed on ice.

Denatured radiolabeled probes were then added to pre-hybridization buffer and membranes were hybridized for 3 hours in a rotating oven at 60C. After hybridization,

100

membranes were rinsed with a large volume of 2X SSC/0.1% SDS, followed by 2 washes of 15 min in this solution in a rotating oven at room temperature. Membranes were then washed 2-4 times with 0.2X SSC/0.1% SDS for 15 min each in a rotating oven at 60C. Finally, membranes were wrapped in plastic and exposed to a

Carestream Kodak BioMax light film for 1-3 days at -80C and developed in the dark.

PCR Genotyping

PCR genotyping was performed using primers from Table 2-4. Prior to PCR amplification, genomic DNA must be diluted 1:10 using nuclease-free water. PCR reactions consisted of 2.5 L of Tuning buffer (5PRIME), 5 L of 5 M betaine (Sigma-

Aldrich), 1.5 L of 10 mM dNTP solution mix (NEB), 0.5 L of each 10 M primer (IDT),

0.4 L of Phire Hot Start II DNA Polymerase (ThermoFisher), 2 L of diluted genomic

DNA and 12.6 L of nuclease-free water. Cycling conditions were: 98C for 1 min,

(98C for 15s, 55C for 20s, 72C for 45s) x 35 cycles, 72C for 5 min and 10C forever.

When using primers Fwd-1 and Rev-1, extension times varied from 45s (Dmpk170) to

1min45s (Dmpk480). Primers Fwd-1 and Rev-2, which amplify a larger fragment, require

2min30s of extension. PCR products were separated in a 1.5% agarose gel and visualized under UV light.

Repeat-primed PCR

Repeat-primed PCR was performed using primers from Table 2-5. Prior to amplification, genomic DNA must be diluted 1:10 using nuclease-free water. Repeat- primed PCR reactions consisted of: 2 L of Tuning buffer (5PRIME), 4 L of 5 M betaine (Sigma-Aldrich), 1 L of 10 mM dNTP solution mix (NEB), 0.66 L of each 10

M primer (IDT), 1 L of Taq DNA Polymerase (ThermoFisher), 1 L of diluted genomic

101

DNA and 9 L of nuclease-free water. Cycling conditions were: 98C for 10 min – ramping up at 0.5C/s, (97C for 35s – ramping down at 0.5C/s, 64C for 2 min – ramping down at 0.5C/s, 68C for 8 min – ramping up at 0.5C/s) x 10 cycles, (97C for

35s – ramping up at 0.5C/s, 64C for 2 min – ramping down at 0.5C/s, 68C for [8 min

+ 20s per cycle] – ramping up at 0.5C/s) x 25 cycles, 68C for 2 min (no ramp) and

10C forever – ramping down at 0.5C/s. Products were diluted 1:10 with nuclease-free water and separated by capillary electrophoresis in the ABI3730 DNA Analyzer (Applied

Biosystems). Results were processed in the Gene Marker software (SoftGenetics).

RNA Isolation from Dissected Tissues and Cultured Cells

Total RNA was isolated from mouse tissues using the Direct-zol RNA extraction kit (Zymo Research), following the manufacturer’s instructions. For small tissues such as the choroid plexus, RNA was eluted in 25 L. For all other sample types, the elution volume was 50 L. Following extraction, RNA was stored at -80C.

When evaluating the ability of standard RNA extraction protocols to isolate transcripts bound to proteins (Figure 3-3), cells were resuspended in 1 mL of TRIzol

(ThermoFisher), mixed to 10 L of 500 mM EDTA and split into two tubes (500 L each). The volume in each tube was brought up to 1 mL with TRIzol. Both tubes were incubated in a thermomixer set to 1,000 rpm, one at 65C (hot) and the other at 10C

(standard). Tubes can pop open. To minimize this risk, tubes were wrapped in parafilm.

Following this step, RNA extraction was identically performed for both tubes. Briefly,

200 L of chloroform: isoamyl alcohol (1:24) was added, the mixture was vortexed for

30s and centrifuged at maximum speed for 10 min at room temperature. The aqueous phase was transferred to a new tube, precipitated with 1 volume of isopropanol and

102

incubated at -20C for >1h. Precipitated RNA was centrifuged, the supernatant was discarded, and the RNA pellet was washed with 70% ethanol. Last, the supernatant was removed, and the pellet was air dried for 5 min. RNA pellets were resuspended in nuclease-free water and stored at -80C.

cDNA Synthesis

Synthesis of cDNA was performed using SuperScript III Reverse Transcriptase

(ThermoFisher), following the manufacturer’s protocol. Briefly, 200-500 ng of RNA was brought up to 10 L with nuclease-free water and mixed to 1 L of 10 mM dNTP solution mix (NEB) and 1 L of random hexamers or oligo dT (ThermoFisher). This mixture was denatured at 65C for 5 min and placed immediately on ice. To each synthesis reaction, 8 L of a master mix was added (4 L of 5X 1st Strand Synthesis

Buffer, 2 L of DTT, 1 L of RNaseOUT and 1 L of Superscript III). The final reaction was incubated at 25C for 5 min, 50C for 60 min and 70C for 15 min. Last, cDNA was treated with 2 L of RNase H (NEB) to remove the RNA template. For small tissue samples such as choroid plexus, 200 ng of total RNA were used to synthesize cDNA.

For all other tissues and cultured cells, 500 ng of RNA were used in the cDNA synthesis. When performing cross-tissue comparisons, all cDNA synthesis reactions were performed using 200 ng of RNA. cDNA was stored at -20C for future use.

Quantitative PCR

Quantitative PCR (qPCR) was performed using the SsoAdvanced Universal

SYBR Green Supermix (Bio-Rad), according to the manufacturer’s instructions. Briefly, cDNA was diluted 1:5 (if 500 ng of RNA were used in the cDNA synthesis) or 1:2 (if cDNA was synthesized from 200 ng of RNA) and 2 L were used in each qPCR

103

reaction (10 ng of RNA equivalent). cDNA was mixed to 10 L of SYBR Green

Supermix, 0.5 L of each 10 M primer (Table 3-1) and 7 L of nuclease-free water.

Amplification was performed in a C1000 Touch Bio-Rad thermocycler, using the following program: 95C for 3 min, (95C for 10s, 55C for 30s, plate read) x 40 cycles,

65C for 31s, (65C for 5s, + 0.5C per cycle, plate read) x 60 cycles. Results were processed in the Bio-Rad CFX Manager software and relative expression levels were calculated using the 2-Cq method. Graphical representations were generated using

GraphPad Prism.

Splicing Analysis by RT-PCR

Alternative splicing was measured by RT-PCR. For this purpose, 10 ng of RNA equivalent were amplified using primers from Table 3-2. Splicing reactions consisted of

2.5 L of Tuning buffer (5PRIME), 1 L of 10 mM dNTP solution mix (NEB), 0.5 L of each 10 M primer (IDT), 0.25 L of 5PRIME DNA Polymerase (5PRIME), 1 L of cDNA (10 ng of RNA equivalent) and 19.25 L of nuclease-free water. Cycling conditions were: 94C for 4 min, (94C for 30s, 58C for 30s, 72C for 30s) x 32 cycles,

72C for 10 min and 10C forever. Products were separated in a 2.5% agarose gel containing ethidium bromide and visualized under UV light.

Western Blotting

Tibialis anterior muscles or differentiated myotubes were lysed in protein lysis buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 5 mM EDTA, 1% Igepal, 0.25% sodium deoxycholate, 1X picD and 1X picW protease inhibitors) and lysates were quantified using the Pierce 660 nm Protein Assay Reagent (ThermoFisher). Approximately 40 g of each protein lysate was denatured and loaded in a TGX Stain-Free Mini-PROTEAN

104

gel, any kD (Bio-Rad), which was run for 27 min at 200V and imaged in a ChemiDoc system. Proteins were transferred to a PVDF membrane (Millipore), which was blocked with 5% milk and blotted with anti-DMPK (ThermoFisher #34-9900; rabbit polyclonal;

1:200; overnight at 4C) or anti-GAPDH (Abcam #6C5; mouse monoclonal; 1:200,000;

1h at room temperature) antibodies. The membrane was washed 3X 5 min with PBST before adding secondary antibodies conjugated to horseradish peroxidase (HRP).

Secondary antibodies (1:5,000) were incubated for 1h at room temperature, washed 3X

5 min with PBST, 1X 5 min with PBS and treated with an ECL HRP substrate

(Prometheus) to enable chemiluminescent detection. Images were captured in the

Protein Simple and signal intensity was calculated using ImageJ.

RNA Fluorescence in situ Hybridization

Dissected tissues were frozen in optimal cutting temperature (OCT) compound.

8-10 m sections were fixed in 4% paraformaldehyde (PFA) for 10 min at room temperature. Sections were washed twice for 5 min each in 1X PBS and permeabilized in cold 2% acetone (diluted in 1X PBS) for 5 min at room temperature. Next, sections were pre-hybridized in 2X SSC/25% formamide for 30 min at 30C and hybridized overnight at 30C in FISH hybridization buffer (2X SSC, 25% formamide, 100 mg/mL of dextran sulfate, 1 mg/mL of yeast tRNA, 0.2 mg/mL of BSA, 2 mM of ribonucleoside vanadyl complex and 400 ng/mL of TexasRed-CAG10 probe). The following day, sections were washed twice for 30 min each in in 2X SSC/25% formamide at 30C. To block endogenous fluorescence, sections were treated with 0.25% Sudan Black B, diluted in 70% ethanol, for 5 min at room temperature, and washed 3X in PBS. Tissues were counterstained with DAPI diluted 1:100 in 1X PBS, washed once more and

105

mounted using ProLong Gold Antifade reagent. When counterstaining muscle sections with wheat-germ agglutinin (WGA), this lectin was added to the DAPI solution at the final concentration of 5 g/mL (1:200 of 1 mg/mL stock).

Myoblast Isolation, Culture and Differentiation

Primary myoblasts were isolated based on previously described protocols190.

Briefly, total hindlimb musculature from mice 4-6 weeks of age was minced using sterile razor blades and digested in a collagenase/dispase mixture for 1 hour at 37°C. The resulting cell slurry was triturated using a serological pipette and filtered through first a

70 m then a 40 m cell strainer. After centrifugation, the cell pellet was resuspended in myoblast growth medium (Ham’s F10, 20% FBS, 1X penicillin/streptomycin, 2.5 ng/L bFGF), and cells were grown on collagen I-coated tissue culture dishes at 37°C and 5%

CO2. Wild-type and knockin myoblasts were prepared simultaneously and cultured until nearly >90% of cells stained positive for desmin. To induce in vitro differentiation, cultures at ∼80% confluency were switched to differentiation medium (DMEM, 2% horse serum, 1X penicillin/streptomycin).

RNA-Seq

Total RNA was extracted from differentiated myotubes and dissected hindbrain choroid plexus as previously described. From myotubes, 500 ng of RNA was depleted of ribosomal RNAs and converted into a cDNA library using the Ultra II Directional RNA

Library Prep kit (NEB). From choroid plexus, 200 ng of RNA was similarly depleted of ribosomal RNAs and converted into a cDNA library using the KAPA Stranded RNA-Seq

Library Preparation kit (KAPA Biosystems). In either case, cDNA libraries were sequenced using the NextSeq500 Illumina platform. For splicing analysis, reads were

106

aligned to the mouse genome (mm10) using HISAT2 and splicing was quantified using

MISO. For differential gene expression studies, reads were aligned using OLego and transcript abundance was estimated using Quantas. Gene ontology studies were performed using Metascape and heat maps were generated by Morpheus (Broad

Institute).

107

LIST OF REFERENCES

1. Lander, E.S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860-921 (2001).

2. Venter, J.C. et al. The sequence of the human genome. Science 291, 1304-1351 (2001).

3. Treangen, T.J. & Salzberg, S.L. Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nat Rev Genet 13, 36-46 (2012).

4. Hurst, G.D. & Werren, J.H. The role of selfish genetic elements in eukaryotic evolution. Nat Rev Genet 2, 597-606 (2001).

5. Charlesworth, B., Sniegowski, P. & Stephan, W. The evolutionary dynamics of repetitive DNA in eukaryotes. Nature 371, 215-220 (1994).

6. Henikoff, S., Ahmad, K. & Malik, H.S. The centromere paradox: stable inheritance with rapidly evolving DNA. Science 293, 1098-1102 (2001).

7. Vergnaud, G. & Denoeud, F. Minisatellites: mutability and genome architecture. Genome Res 10, 899-907 (2000).

8. Ellegren, H. Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5, 435-445 (2004).

9. Moyzis, R.K. et al. A highly conserved repetitive DNA sequence, (TTAGGG)n, present at the telomeres of human . Proc Natl Acad Sci U S A 85, 6622-6626 (1988).

10. Marcotte, E.M., Pellegrini, M., Yeates, T.O. & Eisenberg, D. A census of protein repeats. J Mol Biol 293, 151-160 (1999).

11. Vinces, M.D., Legendre, M., Caldara, M., Hagihara, M. & Verstrepen, K.J. Unstable tandem repeats in promoters confer transcriptional evolvability. Science 324, 1213-1216 (2009).

12. Mirkin, S.M. Expandable DNA repeats and human disease. Nature 447, 932-940 (2007).

13. Bagshaw, A.T.M. Functional Mechanisms of Microsatellite DNA in Eukaryotic Genomes. Genome Biol Evol 9, 2428-2443 (2017).

14. Pearson, C.E., Nichol Edamura, K. & Cleary, J.D. Repeat instability: mechanisms of dynamic mutations. Nat Rev Genet 6, 729-742 (2005).

108

15. McMurray, C.T. Mechanisms of trinucleotide repeat instability during human development. Nat Rev Genet 11, 786-799 (2010).

16. Mohan, A., Goodwin, M. & Swanson, M.S. RNA-protein interactions in unstable microsatellite diseases. Brain Res 1584, 3-14 (2014).

17. Cleary, J.D. & Ranum, L.P. Repeat associated non-ATG (RAN) translation: new starts in microsatellite expansion disorders. Curr Opin Genet Dev 26, 6-15 (2014).

18. Gatchel, J.R. & Zoghbi, H.Y. Diseases of unstable repeat expansion: mechanisms and common principles. Nat Rev Genet 6, 743-755 (2005).

19. Everett, C.M. & Wood, N.W. Trinucleotide repeats and neurodegenerative disease. Brain 127, 2385-2405 (2004).

20. Verkerk, A.J. et al. Identification of a gene (FMR-1) containing a CGG repeat coincident with a breakpoint cluster region exhibiting length variation in fragile X syndrome. Cell 65, 905-914 (1991).

21. Oberle, I. et al. Instability of a 550-base pair DNA segment and abnormal methylation in fragile X syndrome. Science 252, 1097-1102 (1991).

22. Fu, Y.H. et al. Variation of the CGG repeat at the fragile X site results in genetic instability: resolution of the Sherman paradox. Cell 67, 1047-1058 (1991).

23. Yu, S. et al. Fragile X genotype characterized by an unstable region of DNA. Science 252, 1179-1181 (1991).

24. Colak, D. et al. Promoter-bound trinucleotide repeat mRNA drives epigenetic silencing in fragile X syndrome. Science 343, 1002-1005 (2014).

25. Pieretti, M. et al. Absence of expression of the FMR-1 gene in fragile X syndrome. Cell 66, 817-822 (1991).

26. Verheij, C. et al. Characterization and localization of the FMR-1 gene product associated with fragile X syndrome. Nature 363, 722-724 (1993).

27. Rousseau, F. et al. Direct diagnosis by DNA analysis of the fragile X syndrome of mental retardation. N Engl J Med 325, 1673-1681 (1991).

28. Fmr1 knockout mice: a model to study fragile X mental retardation. The Dutch- Belgian Fragile X Consortium. Cell 78, 23-33 (1994).

109

29. Brook, J.D. et al. Molecular basis of myotonic dystrophy: expansion of a trinucleotide (CTG) repeat at the 3' end of a transcript encoding a protein kinase family member. Cell 68, 799-808 (1992).

30. Buxton, J. et al. Detection of an unstable fragment of DNA specific to individuals with myotonic dystrophy. Nature 355, 547-548 (1992).

31. Fu, Y.H. et al. An unstable triplet repeat in a gene related to myotonic muscular dystrophy. Science 255, 1256-1258 (1992).

32. Harley, H.G. et al. Unstable DNA sequence in myotonic dystrophy. Lancet 339, 1125-1128 (1992).

33. Mahadevan, M. et al. Myotonic dystrophy mutation: an unstable CTG repeat in the 3' untranslated region of the gene. Science 255, 1253-1255 (1992).

34. Reddy, S. et al. Mice lacking the myotonic dystrophy protein kinase develop a late onset progressive myopathy. Nat Genet 13, 325-335 (1996).

35. Carrell, S.T. et al. Dmpk gene deletion or antisense knockdown does not compromise cardiac or skeletal muscle function in mice. Hum Mol Genet 25, 4328-4338 (2016).

36. Taneja, K.L., McCurrach, M., Schalling, M., Housman, D. & Singer, R.H. Foci of trinucleotide repeat transcripts in nuclei of myotonic dystrophy cells and tissues. J Cell Biol 128, 995-1002 (1995).

37. Davis, B.M., McCurrach, M.E., Taneja, K.L., Singer, R.H. & Housman, D.E. Expansion of a CUG trinucleotide repeat in the 3' untranslated region of myotonic dystrophy protein kinase transcripts results in nuclear retention of transcripts. Proc Natl Acad Sci U S A 94, 7388-7393 (1997).

38. Hamshere, M.G., Newman, E.E., Alwazzan, M., Athwal, B.S. & Brook, J.D. Transcriptional abnormality in myotonic dystrophy affects DMPK but not neighboring genes. Proc Natl Acad Sci U S A 94, 7394-7399 (1997).

39. Timchenko, L.T. et al. Identification of a (CUG)n triplet repeat RNA-binding protein and its expression in myotonic dystrophy. Nucleic acids research 24, 4407-4414 (1996).

40. Miller, J.W. et al. Recruitment of human muscleblind proteins to (CUG)(n) expansions associated with myotonic dystrophy. EMBO J 19, 4439-4448 (2000).

41. Fardaei, M. et al. Three proteins, MBNL, MBLL and MBXL, co-localize in vivo with nuclear foci of expanded-repeat transcripts in DM1 and DM2 cells. Hum Mol Genet 11, 805-814 (2002).

110

42. Kanadia, R.N. et al. A muscleblind knockout model for myotonic dystrophy. Science 302, 1978-1980 (2003).

43. Charizanis, K. et al. Muscleblind-like 2-mediated alternative splicing in the developing brain and dysregulation in myotonic dystrophy. Neuron 75, 437-450 (2012).

44. Batra, R., Charizanis, K. & Swanson, M.S. Partners in crime: bidirectional transcription in unstable microsatellite disease. Hum Mol Genet 19, R77-82 (2010).

45. Cho, D.H. et al. Antisense transcription and heterochromatin at the DM1 CTG repeats are constrained by CTCF. Mol Cell 20, 483-489 (2005).

46. Zu, T. et al. RAN proteins and RNA foci from antisense transcripts in C9ORF72 ALS and frontotemporal dementia. Proc Natl Acad Sci U S A 110, E4968-4977 (2013).

47. Moseley, M.L. et al. Bidirectional expression of CUG and CAG expansion transcripts and intranuclear polyglutamine inclusions in spinocerebellar ataxia type 8. Nat Genet 38, 758-769 (2006).

48. Zu, T. et al. Non-ATG-initiated translation directed by microsatellite expansions. Proc Natl Acad Sci U S A 108, 260-265 (2011).

49. Cleary, J.D. & Ranum, L.P. Repeat-associated non-ATG (RAN) translation in neurological disease. Hum Mol Genet 22, R45-51 (2013).

50. Banez-Coronel, M. et al. RAN Translation in Huntington Disease. Neuron 88, 667-677 (2015).

51. Zu, T. et al. RAN Translation Regulated by Muscleblind Proteins in Myotonic Dystrophy Type 2. Neuron 95, 1292-1305 e1295 (2017).

52. Steinert, H. Myopathologische Beiträge. Deutsche Zeitschrift für Nervenheilkunde 37, 58-104 (1909).

53. Harley, H.G. et al. Size of the unstable CTG repeat sequence in relation to phenotype and parental transmission in myotonic dystrophy. Am J Hum Genet 52, 1164-1174 (1993).

54. Ashizawa, T., Dubel, J.R. & Harati, Y. Somatic instability of CTG repeat in myotonic dystrophy. Neurology 43, 2674-2678 (1993).

111

55. Thornton, C.A., Johnson, K. & Moxley, R.T., 3rd Myotonic dystrophy patients have larger CTG expansions in skeletal muscle than in leukocytes. Ann Neurol 35, 104-107 (1994).

56. Nakamori, M., Sobczak, K., Moxley, R.T., 3rd & Thornton, C.A. Scaled-down genetic analysis of myotonic dystrophy type 1 and type 2. Neuromuscul Disord 19, 759-762 (2009).

57. Nakamori, M. et al. Splicing biomarkers of disease severity in myotonic dystrophy. Ann Neurol 74, 862-872 (2013).

58. Tsilfidis, C., MacKenzie, A.E., Mettler, G., Barcelo, J. & Korneluk, R.G. Correlation between CTG trinucleotide repeat length and frequency of severe congenital myotonic dystrophy. Nat Genet 1, 192-195 (1992).

59. Tachi, N., Ohya, K., Chiba, S., Sato, T. & Kikuchi, K. Minimal somatic instability of CTG repeat in congenital myotonic dystrophy. Pediatr Neurol 12, 81-83 (1995).

60. Wong, L.J. & Ashizawa, T. Instability of the (CTG)n repeat in congenital myotonic dystrophy. Am J Hum Genet 61, 1445-1448 (1997).

61. Ho, T.H. et al. Muscleblind proteins regulate alternative splicing. EMBO J 23, 3103-3112 (2004).

62. Batra, R. et al. Loss of MBNL leads to disruption of developmentally regulated alternative polyadenylation in RNA-mediated disease. Mol Cell 56, 311-322 (2014).

63. Wang, E.T. et al. Transcriptome-wide regulation of pre-mRNA splicing and mRNA localization by muscleblind proteins. Cell 150, 710-724 (2012).

64. Taliaferro, J.M. et al. Distal Alternative Last Exons Localize mRNAs to Neural Projections. Mol Cell 61, 821-833 (2016).

65. Wang, E.T. et al. Dysregulation of mRNA Localization and Translation in Genetic Disease. J Neurosci 36, 11418-11426 (2016).

66. Han, H. et al. MBNL proteins repress ES-cell-specific alternative splicing and reprogramming. Nature 498, 241-245 (2013).

67. Mankodi, A. et al. Expanded CUG repeats trigger aberrant splicing of ClC-1 chloride channel pre-mRNA and hyperexcitability of skeletal muscle in myotonic dystrophy. Mol Cell 10, 35-44 (2002).

112

68. Berg, J., Jiang, H., Thornton, C.A. & Cannon, S.C. Truncated ClC-1 mRNA in myotonic dystrophy exerts a dominant-negative effect on the Cl current. Neurology 63, 2371-2375 (2004).

69. Kuyumcu-Martinez, N.M., Wang, G.S. & Cooper, T.A. Increased steady-state levels of CUGBP1 in myotonic dystrophy 1 are due to PKC-mediated hyperphosphorylation. Mol Cell 28, 68-78 (2007).

70. Wang, E.T. et al. Antagonistic regulation of mRNA expression and splicing by CELF and MBNL proteins. Genome Res 25, 858-871 (2015).

71. Krol, J. et al. Ribonuclease dicer cleaves triplet repeat hairpins into shorter repeats that silence specific targets. Mol Cell 25, 575-586 (2007).

72. Gomes-Pereira, M., Cooper, T.A. & Gourdon, G. Myotonic dystrophy mouse models: towards rational therapy development. Trends Mol Med 17, 506-517 (2011).

73. Novelli, G. et al. Failure in detecting mRNA transcripts from the mutated allele in myotonic dystrophy muscle. Biochem Mol Biol Int 29, 291-297 (1993).

74. Carango, P., Noble, J.E., Marks, H.G. & Funanage, V.L. Absence of myotonic dystrophy protein kinase (DMPK) mRNA as a result of a triplet repeat expansion in myotonic dystrophy. Genomics 18, 340-348 (1993).

75. Thornton, C.A., Wymer, J.P., Simmons, Z., McClain, C. & Moxley, R.T., 3rd Expansion of the myotonic dystrophy CTG repeat reduces expression of the flanking DMAHP gene. Nat Genet 16, 407-409 (1997).

76. Fu, Y.H. et al. Decreased expression of myotonin-protein kinase messenger RNA and protein in adult form of myotonic dystrophy. Science 260, 235-238 (1993).

77. Wakimoto, H. et al. Characterization of cardiac conduction system abnormalities in mice with targeted disruption of Six5 gene. J Interv Card Electrophysiol 7, 127- 135 (2002).

78. Klesert, T.R. et al. Mice deficient in Six5 develop cataracts: implications for myotonic dystrophy. Nat Genet 25, 105-109 (2000).

79. Sarkar, P.S. et al. Heterozygous loss of Six5 in mice is sufficient to cause ocular cataracts. Nat Genet 25, 110-114 (2000).

80. Sarkar, P.S., Paul, S., Han, J. & Reddy, S. Six5 is required for spermatogenic cell survival and spermiogenesis. Hum Mol Genet 13, 1421-1431 (2004).

113

81. Lee, K.Y. et al. Compound loss of muscleblind-like function in myotonic dystrophy. EMBO Mol Med 5, 1887-1900 (2013).

82. Goodwin, M. et al. MBNL Sequestration by Toxic RNAs and RNA Misprocessing in the Myotonic Dystrophy Brain. Cell reports 12, 1159-1168 (2015).

83. Thomas, J.D. et al. Disrupted prenatal RNA processing and myogenesis in congenital myotonic dystrophy. Genes Dev 31, 1122-1133 (2017).

84. Mankodi, A. et al. Myotonic dystrophy in transgenic mice expressing an expanded CUG repeat. Science 289, 1769-1773 (2000).

85. Seznec, H. et al. Mice transgenic for the human myotonic dystrophy region with expanded CTG repeats display muscular and brain abnormalities. Hum Mol Genet 10, 2717-2726 (2001).

86. Guiraud-Dogan, C. et al. DM1 CTG expansions affect insulin receptor isoforms expression in various tissues of transgenic mice. Biochim Biophys Acta 1772, 1183-1191 (2007).

87. Gomes-Pereira, M. et al. CTG trinucleotide repeat "big jumps": large expansions, small mice. PLoS Genet 3, e52 (2007).

88. Huguet, A. et al. Molecular, physiological, and motor performance defects in DMSXL mice carrying >1,000 CTG repeats from the human DM1 locus. PLoS Genet 8, e1003043 (2012).

89. Algalarrondo, V. et al. Abnormal sodium current properties contribute to cardiac electrical and contractile dysfunction in a mouse model of myotonic dystrophy type 1. Neuromuscul Disord 25, 308-320 (2015).

90. Sicot, G. et al. Downregulation of the Glial GLT1 Glutamate Transporter and Purkinje Cell Dysfunction in a Mouse Model of Myotonic Dystrophy. Cell Rep 19, 2718-2729 (2017).

91. van den Broek, W.J. et al. Somatic expansion behaviour of the (CTG)n repeat in myotonic dystrophy knock-in mice is differentially affected by Msh3 and Msh6 mismatch-repair proteins. Hum Mol Genet 11, 191-198 (2002).

92. Wheeler, V.C. et al. Length-dependent gametic CAG repeat instability in the Huntington's disease knock-in mouse. Hum Mol Genet 8, 115-122 (1999).

93. Lloret, A. et al. Genetic background modifies nuclear mutant huntingtin accumulation and HD CAG repeat instability in Huntington's disease knock-in mice. Hum Mol Genet 15, 2015-2024 (2006).

114

94. White, J.K. et al. Huntingtin is required for neurogenesis and is not impaired by the Huntington's disease CAG expansion. Nat Genet 17, 404-410 (1997).

95. Willemsen, R. et al. The FMR1 CGG repeat mouse displays ubiquitin-positive intranuclear neuronal inclusions; implications for the cerebellar tremor/ataxia syndrome. Hum Mol Genet 12, 949-959 (2003).

96. Diep, A.A. et al. Female CGG knock-in mice modeling the fragile X premutation are impaired on a skilled forelimb reaching task. Neurobiol Learn Mem 97, 229- 234 (2012).

97. Chen, W. et al. The zinc-finger protein CNBP is required for forebrain formation in the mouse. Development 130, 1367-1379 (2003).

98. Chen, W. et al. Haploinsuffciency for Znf9 in Znf9+/- mice is associated with multiorgan abnormalities resembling myotonic dystrophy. J Mol Biol 368, 8-17 (2007).

99. Poulos, M.G. et al. Progressive impairment of muscle regeneration in muscleblind-like 3 isoform knockout mice. Hum Mol Genet 22, 3547-3558 (2013).

100. Timchenko, N.A. et al. Overexpression of CUG triplet repeat-binding protein, CUGBP1, in mice inhibits myogenesis. J Biol Chem 279, 13129-13139 (2004).

101. Ward, A.J., Rimer, M., Killian, J.M., Dowling, J.J. & Cooper, T.A. CUGBP1 overexpression in mouse skeletal muscle reproduces features of myotonic dystrophy type 1. Hum Mol Genet 19, 3614-3622 (2010).

102. Koshelev, M., Sarma, S., Price, R.E., Wehrens, X.H. & Cooper, T.A. Heart- specific overexpression of CUGBP1 reproduces functional and molecular abnormalities of myotonic dystrophy type 1. Hum Mol Genet 19, 1066-1075 (2010).

103. Monckton, D.G., Coolbaugh, M.I., Ashizawa, K.T., Siciliano, M.J. & Caskey, C.T. Hypermutable myotonic dystrophy CTG repeats in transgenic mice. Nat Genet 15, 193-196 (1997).

104. Fortune, M.T., Vassilopoulos, C., Coolbaugh, M.I., Siciliano, M.J. & Monckton, D.G. Dramatic, expansion-biased, age-dependent, tissue-specific somatic mosaicism in a transgenic mouse model of triplet repeat instability. Hum Mol Genet 9, 439-445 (2000).

105. Wang, G.S., Kearney, D.L., De Biasi, M., Taffet, G. & Cooper, T.A. Elevation of RNA-binding protein CUGBP1 is an early event in an inducible heart-specific mouse model of myotonic dystrophy. J Clin Invest 117, 2802-2811 (2007).

115

106. Orengo, J.P. et al. Expanded CTG repeats within the DMPK 3' UTR causes severe skeletal muscle wasting in an inducible mouse model for myotonic dystrophy. Proc Natl Acad Sci U S A 105, 2646-2651 (2008).

107. Mahadevan, M.S. et al. Reversible model of RNA toxicity and cardiac conduction defects in myotonic dystrophy. Nat Genet 38, 1066-1070 (2006).

108. Hutchison, C.A., 3rd, Smith, H.O., Pfannkoch, C. & Venter, J.C. Cell-free cloning using phi29 DNA polymerase. Proc Natl Acad Sci U S A 102, 17332-17336 (2005).

109. Dean, F.B., Nelson, J.R., Giesler, T.L. & Lasken, R.S. Rapid amplification of plasmid and phage DNA using Phi 29 DNA polymerase and multiply-primed rolling circle amplification. Genome Res 11, 1095-1099 (2001).

110. Osborne, R.J. & Thornton, C.A. Cell-free cloning of highly expanded CTG repeats by amplification of dimerized expanded repeats. Nucleic Acids Res 36, e24 (2008).

111. Thomas, K.R., Folger, K.R. & Capecchi, M.R. High frequency targeting of genes to specific sites in the mammalian genome. Cell 44, 419-428 (1986).

112. Chevalier, B.S. et al. Design, activity, and structure of a highly specific artificial endonuclease. Mol Cell 10, 895-905 (2002).

113. Kim, Y.G., Cha, J. & Chandrasegaran, S. Hybrid restriction enzymes: zinc finger fusions to Fok I cleavage domain. Proc Natl Acad Sci U S A 93, 1156-1160 (1996).

114. Boch, J. et al. Breaking the code of DNA binding specificity of TAL-type III effectors. Science 326, 1509-1512 (2009).

115. Bibikova, M., Beumer, K., Trautman, J.K. & Carroll, D. Enhancing gene targeting with designed zinc finger nucleases. Science 300, 764 (2003).

116. Ishino, Y., Shinagawa, H., Makino, K., Amemura, M. & Nakata, A. Nucleotide sequence of the iap gene, responsible for alkaline phosphatase isozyme conversion in Escherichia coli, and identification of the gene product. J Bacteriol 169, 5429-5433 (1987).

117. Mojica, F.J., Diez-Villasenor, C., Soria, E. & Juez, G. Biological significance of a family of regularly spaced repeats in the genomes of Archaea, Bacteria and mitochondria. Mol Microbiol 36, 244-246 (2000).

116

118. Bolotin, A., Quinquis, B., Sorokin, A. & Ehrlich, S.D. Clustered regularly interspaced short palindrome repeats (CRISPRs) have spacers of extrachromosomal origin. Microbiology 151, 2551-2561 (2005).

119. Pourcel, C., Salvignol, G. & Vergnaud, G. CRISPR elements in Yersinia pestis acquire new repeats by preferential uptake of bacteriophage DNA, and provide additional tools for evolutionary studies. Microbiology 151, 653-663 (2005).

120. Jansen, R., Embden, J.D., Gaastra, W. & Schouls, L.M. Identification of genes that are associated with DNA repeats in prokaryotes. Mol Microbiol 43, 1565- 1575 (2002).

121. Haft, D.H., Selengut, J., Mongodin, E.F. & Nelson, K.E. A guild of 45 CRISPR- associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes. PLoS Comput Biol 1, e60 (2005).

122. Mojica, F.J., Diez-Villasenor, C., Garcia-Martinez, J. & Soria, E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 60, 174-182 (2005).

123. Barrangou, R. et al. CRISPR provides acquired resistance against viruses in prokaryotes. Science 315, 1709-1712 (2007).

124. Makarova, K.S., Grishin, N.V., Shabalina, S.A., Wolf, Y.I. & Koonin, E.V. A putative RNA-interference-based immune system in prokaryotes: computational analysis of the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms of action. Biol Direct 1, 7 (2006). 125. Brouns, S.J. et al. Small CRISPR RNAs guide antiviral defense in prokaryotes. Science 321, 960-964 (2008).

126. Marraffini, L.A. & Sontheimer, E.J. CRISPR interference limits horizontal gene transfer in staphylococci by targeting DNA. Science 322, 1843-1845 (2008). 127. Makarova, K.S. et al. Evolution and classification of the CRISPR-Cas systems. Nat Rev Microbiol 9, 467-477 (2011).

128. Li, Y. et al. Harnessing Type I and Type III CRISPR-Cas systems for genome editing. Nucleic Acids Res 44, e34 (2016).

129. Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816-821 (2012).

130. Gasiunas, G., Barrangou, R., Horvath, P. & Siksnys, V. Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria. Proc Natl Acad Sci U S A 109, E2579-2586 (2012).

117

131. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823 (2013).

132. Shah, S.A., Erdmann, S., Mojica, F.J. & Garrett, R.A. Protospacer recognition motifs: mixed identities and functional diversity. RNA Biol 10, 891-899 (2013). 133. Ran, F.A. et al. In vivo genome editing using Staphylococcus aureus Cas9. Nature 520, 186-191 (2015).

134. Yin, C. et al. In Vivo Excision of HIV-1 Provirus by saCas9 and Multiplex Single- Guide RNAs in Animal Models. Mol Ther 25, 1168-1186 (2017).

135. Niu, D. et al. Inactivation of porcine endogenous retrovirus in pigs using CRISPR- Cas9. Science 357, 1303-1307 (2017).

136. Liang, P. et al. CRISPR/Cas9-mediated gene editing in human tripronuclear zygotes. Protein Cell 6, 363-372 (2015).

137. Ma, H. et al. Correction of a pathogenic gene mutation in human embryos. Nature 548, 413-419 (2017).

138. Provenzano, C. et al. CRISPR/Cas9-Mediated Deletion of CTG Expansions Recovers Normal Phenotype in Myogenic Cells Derived from Myotonic Dystrophy 1 Patients. Mol Ther Nucleic Acids 9, 337-348 (2017).

139. Xie, N. et al. Reactivation of FMR1 by CRISPR/Cas9-Mediated Deletion of the Expanded CGG-Repeat of the Fragile X Chromosome. PLoS One 11, e0165499 (2016).

140. Pinto, B.S. et al. Impeding Transcription of Expanded Microsatellite Repeats by Deactivated Cas9. Mol Cell 68, 479-490 e475 (2017).

141. Monckton, D.G., Wong, L.J., Ashizawa, T. & Caskey, C.T. Somatic mosaicism, germline expansions, germline reversions and intergenerational reductions in myotonic dystrophy males: small pool PCR analyses. Hum Mol Genet 4, 1-8 (1995).

142. Martorell, L. et al. Progression of somatic CTG repeat length heterogeneity in the blood cells of myotonic dystrophy patients. Hum Mol Genet 7, 307-312 (1998).

143. Hamshere, M.G., Harley, H., Harper, P., Brook, J.D. & Brookfield, J.F. Myotonic dystrophy: the correlation of (CTG) repeat length in leucocytes with age at onset is significant only for patients with small expansions. J Med Genet 36, 59-61 (1999).

118

144. Yang, H. et al. One-step generation of mice carrying reporter and conditional alleles by CRISPR/Cas-mediated genome engineering. Cell 154, 1370-1379 (2013).

145. Yen, S.T. et al. Somatic mosaicism and allele complexity induced by CRISPR/Cas9 RNA injections in mouse zygotes. Dev Biol 393, 3-9 (2014).

146. Gourdon, G. & Meola, G. Myotonic Dystrophies: State of the Art of New Therapeutic Developments for the CNS. Front Cell Neurosci 11, 101 (2017).

147. Filippova, G.N. et al. CTCF-binding sites flank CTG/CAG repeats and form a methylation-sensitive insulator at the DM1 locus. Nat Genet 28, 335-343 (2001).

148. Gagnon, K.T., Li, L., Janowski, B.A. & Corey, D.R. Analysis of nuclear RNA interference in human cells by subcellular fractionation and Argonaute loading. Nat Protoc 9, 2045-2060 (2014).

149. Narang, M.A., Waring, J.D., Sabourin, L.A. & Korneluk, R.G. Myotonic dystrophy (DM) protein kinase levels in congenital and adult DM patients. Eur J Hum Genet 8, 507-512 (2000).

150. Wheeler, T.M. et al. Targeting nuclear RNA for in vivo correction of myotonic dystrophy. Nature 488, 111-115 (2012).

151. Wheeler, T.M., Krym, M.C. & Thornton, C.A. Ribonuclear foci at the neuromuscular junction in myotonic dystrophy type 1. Neuromuscul Disord 17, 242-247 (2007).

152. Caillet-Boudin, M.L. et al. Brain pathology in myotonic dystrophy: when tauopathy meets spliceopathy and RNAopathy. Front Mol Neurosci 6, 57 (2014).

153. Jiang, H., Mankodi, A., Swanson, M.S., Moxley, R.T. & Thornton, C.A. Myotonic dystrophy type 1 is associated with nuclear foci of mutant RNA, sequestration of muscleblind proteins and deregulated alternative splicing in neurons. Hum Mol Genet 13, 3079-3088 (2004).

154. Yu, Z., Teng, X. & Bonini, N.M. Triplet repeat-derived siRNAs enhance RNA- mediated toxicity in a Drosophila model for myotonic dystrophy. PLoS Genet 7, e1001340 (2011).

155. Todd, P.K. et al. Transcriptional changes and developmental abnormalities in a zebrafish model of myotonic dystrophy type 1. Dis Model Mech 7, 143-155 (2014).

119

156. Gudde, A.E., Gonzalez-Barriga, A., van den Broek, W.J., Wieringa, B. & Wansink, D.G. A low absolute number of expanded transcripts is involved in myotonic dystrophy type 1 manifestation in muscle. Hum Mol Genet 25, 1648- 1662 (2016).

157. Kouadjo, K.E., Nishida, Y., Cadrin-Girard, J.F., Yoshioka, M. & St-Amand, J. Housekeeping and tissue-specific genes in mouse tissues. BMC Genomics 8, 127 (2007).

158. Whiting, E.J. et al. Characterization of myotonic dystrophy kinase (DMK) protein in human and rodent muscle and central nervous tissue. Hum Mol Genet 4, 1063-1072 (1995).

159. Sarkar, P.S., Han, J. & Reddy, S. In situ hybridization analysis of Dmpk mRNA in adult mouse tissues. Neuromuscul Disord 14, 497-506 (2004).

160. Niven, J.E. & Chittka, L. Evolving understanding of nervous system evolution. Curr Biol 26, R937-R941 (2016).

161. Brocklehurst, G. The significance of the evolution of the cerebrospinal fluid system. Ann R Coll Surg Engl 61, 349-356 (1979).

162. Meeker, R.B., Williams, K., Killebrew, D.A. & Hudson, L.C. Cell trafficking through the choroid plexus. Cell Adh Migr 6, 390-396 (2012).

163. Lun, M.P., Monuki, E.S. & Lehtinen, M.K. Development and functions of the choroid plexus-cerebrospinal fluid system. Nat Rev Neurosci 16, 445-457 (2015).

164. Lun, M.P. et al. Spatially heterogeneous choroid plexus transcriptomes encode positional identity and contribute to regional CSF production. J Neurosci 35, 4903-4916 (2015).

165. Lehtinen, M.K. et al. The cerebrospinal fluid provides a proliferative niche for neural progenitor cells. Neuron 69, 893-905 (2011).

166. Baruch, K. et al. Aging-induced type I interferon response at the choroid plexus negatively affects brain function. Science 346, 89-93 (2014).

167. Mayerl, S., Visser, T.J., Darras, V.M., Horn, S. & Heuer, H. Impact of Oatp1c1 deficiency on thyroid hormone metabolism and action in the mouse brain. Endocrinology 153, 1528-1537 (2012).

168. Lopez-Juarez, A. et al. Thyroid hormone signaling acts as a neurogenic switch by repressing Sox2 in the adult neural stem cell niche. Cell Stem Cell 10, 531-543 (2012).

120

169. Alliot, F., Godin, I. & Pessac, B. Microglia derive from progenitors, originating from the yolk sac, and which proliferate in the brain. Brain Res Dev Brain Res 117, 145-152 (1999).

170. Ransohoff, R.M. & Engelhardt, B. The anatomical and cellular basis of immune surveillance in the central nervous system. Nat Rev Immunol 12, 623-635 (2012). 171. Kunis, G. et al. IFN-gamma-dependent activation of the brain's choroid plexus for CNS immune surveillance and repair. Brain 136, 3427-3440 (2013).

172. Balusu, S., Brkic, M., Libert, C. & Vandenbroucke, R.E. The choroid plexus- cerebrospinal fluid interface in Alzheimer's disease: more than just a barrier. Neural Regen Res 11, 534-537 (2016).

173. Reboldi, A. et al. C-C chemokine receptor 6–regulated entry of TH-17 cells into the CNS through the choroid plexus is required for the initiation of EAE. Nature Immunology 10, 514 (2009).

174. Vercellino, M. et al. Involvement of the choroid plexus in multiple sclerosis autoimmune inflammation: a neuropathological study. J Neuroimmunol 199, 133- 141 (2008).

175. Evans, D., Marquez, S.M. & Pace, N.R. RNase P: interface of the RNA and protein worlds. Trends Biochem Sci 31, 333-341 (2006).

176. Cai, Y. et al. Rpph1 Upregulates CDC42 Expression and Promotes Hippocampal Neuron Dendritic Spine Formation by Competing with miR-330-5p. Front Mol Neurosci 10, 27 (2017).

177. Demchenko, I.T., Gutsaeva, D.R., Moskvin, A.N. & Zhilyaev, S.Y. Involvement of extracellular superoxide dismutase in regulating brain blood flow. Neurosci Behav Physiol 40, 173-178 (2010).

178. Roepke, T.K. et al. KCNE2 forms potassium channels with KCNA3 and KCNQ1 in the choroid plexus epithelium. FASEB J 25, 4264-4273 (2011).

179. Abbott, G.W. et al. KCNQ1, KCNE2, and Na+-coupled solute transporters form reciprocally regulating complexes that affect neuronal excitability. Sci Signal 7, ra22 (2014).

180. Liu, L. & Duff, K. A technique for serial collection of cerebrospinal fluid from the cisterna magna in mouse. J Vis Exp (2008).

181. Smith, J.S. et al. Characterization of individual mouse cerebrospinal fluid proteomes. Proteomics 14, 1102-1106 (2014).

121

182. Cunningham, R., Jany, P., Messing, A. & Li, L. Protein changes in immunodepleted cerebrospinal fluid from a transgenic mouse model of Alexander disease detected using mass spectrometry. J Proteome Res 12, 719-728 (2013).

183. Martinez-Rodriguez, J.E. et al. Decreased hypocretin-1 (Orexin-A) levels in the cerebrospinal fluid of patients with myotonic dystrophy and excessive daytime sleepiness. Sleep 26, 287-290 (2003).

184. Ciafaloni, E. et al. The hypocretin neurotransmission system in myotonic dystrophy type 1. Neurology 70, 226-230 (2008).

185. Iwata, T. et al. [A marked decrease of orexin in the cerebrospinal fluid in a patient with myotonic dystrophy type 1 showing an excessive daytime sleepiness]. Rinsho Shinkeigaku 49, 437-439 (2009).

186. Winblad, S. et al. Cerebrospinal fluid tau and amyloid beta42 protein in patients with myotonic dystrophy type 1. Eur J Neurol 15, 947-952 (2008).

187. Oyamada, R. et al. Neurofibrillary tangles and deposition of oxidative products in the brain in cases of myotonic dystrophy. Neuropathology 26, 107-114 (2006).

188. McConnell, O. et al. Impaired consciousness states in myotonic dystrophy-type 1 mediation by γ-amino butyric acid (GABA). Sleep Medicine 40, e285 (2017).

189. Marques, F. et al. Transcriptome signature of the adult mouse choroid plexus. Fluids Barriers CNS 8, 10 (2011).

190. Rando, T.A. & Blau, H.M. Primary mouse myoblast purification, characterization, and transplantation for cell-mediated gene therapy. J Cell Biol 125, 1275-1287 (1994).

122

BIOGRAPHICAL SKETCH

Ruan Oliveira was born and raised in Rio Grande, Rio Grande do Sul, Brazil. In

2012, he completed his bachelor’s degree in Biotechnology at the Federal University of

Pelotas, Brazil, where he worked as an undergraduate research assistant in the laboratory of Dr. Tiago Collares. In 2013, Ruan moved to Gainesville, FL, to pursue his

Ph.D. degree in Medical Sciences at the University of Florida. Under the mentorship of

Dr. Maurice S. Swanson, Ruan developed novel strategies to engineer the mouse genome and generate better models for repeat expansion disorders. During his Ph.D.,

Ruan spearheaded the efforts to implement the CRISPR/Cas9 technology at the

University of Florida, assisting several researchers in their gene editing endeavors. In the laboratory of Dr. Swanson, Ruan developed mouse models for myotonic dystrophy, aiming to provide a platform to understand disease progression and develop rational therapeutic approaches. Ruan graduated in the spring semester of 2018 and will continue his scientific career in the biotechnology industry.

123