IDENTIFICATION OF MODIFIERS OF

DISSERTATION

Presented in partial fulfillment of the Requirements for the Degree Doctor of Philosophy

in the Graduate School of The Ohio State University

By

Corey D. Ruhno, B.S.

The Ohio State Biochemistry Graduate Program

The Ohio State University

2019

Dissertation Committee

Dr. Arthur H. M. Burghes, Advisor

Dr. Stephen J. Kolb

Dr. Jill Rafael-Fortney

Dr. Brian K. Kaspar

Dr. Kun Huang

Copyright by

Corey D. Ruhno

2019

ABSTRACT

Spinal muscular atrophy (SMA) is a neurodegenerative disease with an incidence of approximately 1 in 10,000 live births that results in loss of motor neurons, leading to paralysis and death in the most severe form. SMA is a disease of low levels of SMN protein, which is encoded by 2 genes that are nearly identical: SMN1 and SMN2. However, a single base pair difference in exon 7 of the SMN2 gene disrupts the binding of splicing factors, resulting in a loss of exon 7 in a majority of SMN2 transcripts. Consequently, a majority of the protein originating from the SMN2 gene is truncated, unable to oligomerize efficiently, and is rapidly degraded. SMA patients have homozygous loss or mutation of SMN1 and therefore rely completely on the SMN2 gene for SMN production. Copy number of SMN2 varies in individuals and is inversely correlated with SMA severity. However, some patients, called “exception patients”, do not follow this correlation, the most striking of which are haploidentical SMA siblings who have discordant phenotypes. One known genetic modifier, c.859G>C, has been identified in SMN2; however, it explains only a small subset of patients. We have performed targeted sequencing of the SMN2 gene of 217 SMA patients to identify genetic modifiers of SMA. We have identified a 6.3 kilobase deletion that occurs in both the SMN1 and SMN2 genes. This deletion was screened for in a panel of 466 patients and from these data we model deletion frequency and show the deletion occurred in both SMN1 and SMN2. We called variants in SMN2 and by performing an association analysis we identified that the variants A-44G, A-549G, and C-1897T all statistically associate with mild exception SMA patients. We also captured the PLS3 gene, which was analyzed for modifying variants; however, we found no evidence of association between any PLS3 variants and SMA exception patients. From these results we conclude not all SMN2 genes are equal. Though we found modifiers in SMN2, they could only explain 14 out of 58 mild exception patients. Hence, we also conclude that modifying variants must exist outside of SMN2. The patient samples we sequenced can be used in association analysis of modifying genes outside of SMN2.

In order to find modifiers of SMA that lie outside of SMN2, we have also performed genomic and/or exomic sequencing on SMA discordant siblings. As a control, we exome sequenced 8 SMA type 1 patients with 2 copies of SMN2 and genome sequenced 4 pairs of SMA type 2 patients with 3 copies of SMN2. In 3 mild discordant siblings, I found 3 synonymous candidate variants in the ROCK2 gene, 2 of which were predicted by the Alamut software to alter splicing or create a cryptic splice site. I also identified 6 genes with multiple intronic candidate variants. These genes were FBXO3, CD59, SLIT1, PTPRD, FAM171A1, and HS6ST3. Using 1000 Genomes data, I determined the variants in the 6 genes were in linkage disequilibrium. In addition, using publicly available RNA-seq data of mouse motor neurons and the Allen Brain Atlas, I determined all candidate genes were highly expressed in neuronal tissue. In sum, I have identified a select few strong candidate genes as being modifiers of SMA. Finally, we have analyzed RNA-sequencing data from laser-capture microdissected motor neurons of SMA mice at P1 and P6. I have developed an analysis pipeline that allows for the detection of transcriptome changes, including splicing changes of novel exons. I have found splicing changes in the genes Mdm4, Magi1, and Tia1. By combining differential splicing and expression data, I found the pathway to be disrupted. Splicing changes were validated using ddPCR. From these data, I show splicing disruption in SMA animals can cause significant changes in biological pathways that may be contributing to the SMA phenotype.

ACKNOWLEDGEMENTS

First and foremost, I would like to thank Arthur for giving me the opportunity to work in the Burghes lab. I have learned a lot and will not forget the time I have spent there. Arthur is one of the most intelligent people I know and it has been an honor working for him. And, although I do not want to tarnish his reputation as a cranky British man, he is also very kind, so long as you are not a telemarketer interrupting his lunch. I also need to thank the past and present members of the Burghes lab for all of their support. In particular, Vicki was vital to all these projects and has given much guidance in scientific writing. Anton, Kaitlyn, and Chitra, it has been a pleasure working with all of you and I know you all have very successful careers ahead of you. I would also like to thank my committee for all their help over the years. Dr. Kaspar was the reason I became interested in OSU. Dr. Rafael-Fortney, you have been an important part of my graduate school career since my first visit to OSU during OSBP’s recruitment weekend. Dr. Huang, you have been very helpful and have done an excellent job teaching me bioinformatics. And Dr. Kolb, though you are new to my committee you have always been close and provided a good dose of humor. Finally, thank you to my wife, Nadia, I couldn’t have done this without you.

Together forever.

Corey Ruhno


VITA

June 2017 ...... Heritage High School, Saginaw, MI

May 2011 ...... B.Sc, Biochemistry, Michigan State University, East Lansing, MI

September 2011 – February 2019 ...... Graduate Research Associate

Ohio State Biochemistry Program

Department of Biological Chemistry and Pharmacology

The Ohio State University, Columbus, OH

PUBLICATIONS

1) Ruhno, C., McGovern, V.L., Avenarius, M.R., Snyder, P.J., Prior, T.W., Nery, F.C., Muhtaseb, A., Roggenbuck, J.S., Kissel, J.T., Sansone, V.A., Siranosian, J.J., Johnstone, A.J., New, P.H., Zhang, R.Z., Swoboda, K.J., Burghes, A.H.M. (2019). Complete sequencing of the SMN2 gene in SMA patients detects SMN gene deletion junctions and variants in SMN2 that modify the SMA phenotype. Hum Genet 138:241–256. doi: 10.1007/s00439-019-01983-0

2) Iyadurai, S., Arnold, W.D., Kissel, J.T., Ruhno, C., McGovern, V.L., Snyder, P.J., Prior, T.W., Roggenbuck, J., Burghes, A.H. and Kolb, S.J. (2017). Variable phenotypic expression and onset in MYH14 distal hereditary motor neuropathy phenotype in a large, multigenerational North American family. Muscle Nerve 56:341–345. doi: 10.1002/mus.25491


Fields of Study

Major Field: Biochemistry


TABLE OF CONTENTS

Abstract
Acknowledgements
Vita
List of Tables
List of Figures
List of Symbols and Abbreviations

1. Introduction
1.1 History and Classification of Spinal Muscular Atrophy (SMA)
1.2 Genetics of SMA
1.2.1 SMN Missense
1.3 SMN Function
1.3.1 The SMN Complex
1.3.2 Functional Domains and Cellular Location of SMN
1.3.3 Role of SMN Complex in snRNP Biogenesis
1.3.4 Other Functions of SMN
1.3.5 SMN in Various Tissues
1.4 Splicing Defects in SMA
1.4.1 Splicing Alterations Detected in SMA Using Exon Microarray
1.4.2 Splicing Disruption of the Agrin Gene
1.4.3 Stasimon
1.4.2 Other splicing changes
1.5 Modifiers of SMA
1.5.1 Evidence of Modifiers
1.5.2 Plastin3
1.5.3 Neurocalcin Delta
1.5.4 Myostatin
1.5.5 PTEN
1.5.6 Ubiquitination
1.5.7 Dynamics
1.5.8 Epigenetics
1.6 Therapies to treat SMA
1.6.1 Splicing Modulators of SMN1/2
1.6.2 ASO treatment of SMA
1.6.3 Valproic Acid treatment of SMA
1.6.4 Gene therapy
1.6.5

2. Complete sequencing of the SMN2 gene in SMA patients detects SMN gene deletion junctions and variants in SMN2 that modify the SMA phenotype
2.1 Introduction
2.2 Methods
2.2.1 DNA samples
2.2.2 Classification of patients
2.2.3 MDiGS sequencing
2.2.4 Bioinformatics
2.2.5 Determination of SMN1 and SMN2 copy number by ddPCR
2.2.6 PCR detection of the SMN1/2 6 deletion junction
2.2.7 Determination of the inheritance of alleles in family with deletion junction by ddPCR
2.3 Results
2.3.1 Identification of SMN1/2 deletion junction
2.3.2 Inheritance of the junction in an SMA family
2.3.3 Model of deletion junction frequency in individuals with varying SMN1 and SMN2 copy number
2.3.4 Correlation curve of SMN2 as determined by MDiGS compared to ddPCR
2.3.5 Analysis of SMN2 variants that can modify the SMA phenotype
2.3.6 Mutations in SMN2 or SMN1 in SMA patients
2.3.7 Alignment and map of the SMA region
2.4 Discussion

3. Identification of SMA Modifiers Using Exomic and Genomic Sequencing
3.1 Introduction
3.2 Methods
3.2.1 Patient Samples
3.2.2 Exomic sequencing of SMA discordant siblings
3.2.3 Bioinformatic Analysis of Exomes
3.2.5 Genomic sequencing of SMA discordant siblings
3.3 Results
3.3.1 Exonic variants identified from exomic and genomic sequencing
3.3.2 Confirmation of the CYP7B1 and COL5A3 variants
3.3.3 Intronic variants identified from genomic and exomic sequencing
3.4 Discussion

4. Identification of splicing changes in motor neurons of SMA mice
4.1 Introduction
4.2 Materials and methods
4.2.1 Collection of samples for RNA sequencing
4.2.2 Read alignment and splicing analysis
4.2.3 Identification of novel exons
4.2.4 Confirmation of splicing changes
4.2.5 Validation of novel exons
4.3 Results
4.3.1 Alignment and analysis
4.3.2 Transcriptome assembly
4.3.3 Determination of splicing changes
4.3.5 Validation of splicing changes
4.4 Discussion

5. Conclusions and future directions

References
Appendix A: PLS3 SNPs, Female
Appendix B: PLS3 indels, Females
Appendix C: PLS3 SNPs, Males
Appendix D: PLS3 indels, Males

List of Tables

Table 1.1. Classification of SMA
Table 2.1. Prevalence of deletion junction amongst a panel of 466 individuals with different copy numbers of SMN1 and SMN2
Table 2.2. Association between SNPs and exception SMA phenotypes
Table 2.3. Allele counts of SNPs in SMN1/2
Table 2.4. Allele counts of indels in SMN1/2
Table 3.1. Intronic candidate variants identified by genome sequencing
Table 3.2. Variants of FBXO3 and CD59 in linkage disequilibrium
Table 3.3. Clones that contain candidate modifying genes
Table 4.1. Gene expression changes at P1
Table 4.2. ddPCR validation of novel exons
Table 4.3. Splicing changes at P1
Table 4.4. Splicing changes at P6
Table A.1: PLS3 SNPs, Female
Table B.1: PLS3 indels, Females
Table C.1: PLS3 SNPs, Males
Table D.1: PLS3 indels, Males

List of Figures

Figure 1.1. Comparison of muscle biopsy in infantile and juvenile SMA
Figure 1.2. Diagram of SMN1 and SMN2
Figure 1.3. Domains of SMN and SMA causing missense mutations
Figure 1.4. Illustration showing the SMN complex
Figure 1.5. SMN complex and snRNP biogenesis
Figure 1.6. Pedigree with discordant SMA siblings
Figure 2.1. Detection of 6.3 kb deletion junction
Figure 2.2. Inheritance of a 6.3 kb deletion
Figure 2.3. Correlation curve of SMN2 copy number as determined by MDiGS compared to ddPCR
Figure 2.4. Map of the SMA region
Figure 3.1. Schematic of variant filtration pipeline
Figure 3.2. Validation of CYP7B1 p.R324H
Figure 3.2. Map of variants near FBXO3 in linkage disequilibrium
Figure 3.3. Read alignments of 2 NCALD variants
Figure 4.1. Pathways detected as being significantly altered in the SMA
Figure 4.2. Validation of splicing changes using ddPCR

List of Symbols And Abbreviations

ASO anti-sense oligonucleotide
CDCA chenodeoxycholic acid
ChAT choline acetyltransferase
CMAP compound muscle action potential
CNS central nervous system
CSF cerebrospinal fluid
ddPCR droplet digital PCR
DNA deoxyribonucleic acid
EDS Ehlers-Danlos syndrome
EMG electromyogram
FL-SMN full-length Survival Motor Neuron
GFP green fluorescent protein
HDAC histone deacetylase
ICV intra-cerebroventricular
ISS-N1 intronic splice silencer N1
kb kilobase
kDa kilodalton
LCM laser-capture microdissection
MO morpholino
MUNE motor unit number estimate
NAIP neuronal apoptosis inhibitor protein
NMJ neuromuscular junction
PCR polymerase chain reaction
PND post-natal day
RNA ribonucleic acid
Robo Roundabout
scAAV self-complementary adeno-associated virus
shRNA short-hairpin ribonucleic acid
SMA Spinal Muscular Atrophy
SMN Survival Motor Neuron
SMNΔ7 SMN with exon 7 deletion
snRNA small nuclear RNA
snRNP small nuclear ribonucleoprotein
SPG5 spastic paraplegia
TRAF TNFR-associated factor
UNRIP UNR-interacting protein
WT wild-type

Chapter 1

Introduction

1.1 History and Classification of Spinal Muscular Atrophy (SMA)

The first published accounts describing SMA appeared in the 1890s. In 1891, Guido Werdnig described 2 brothers who had a neuromuscular phenotype that began around 10 months of age (Werdnig 1891). This disease was called infantile or acute SMA due to the early onset and severity of the disease. Leg weakness was the first symptom observed, which later progressed to arm weakness and the inability to raise the head. It was noted that reflexes of the biceps and triceps were lost while intelligence was normal (Werdnig 1891). One brother died at 3 years of age and the other at 6 years. The autopsy revealed degeneration of the anterior horn cells and extreme atrophy of muscles, though some hypertrophied muscle fibers remained (Werdnig 1894). That the disease onset, symptoms, and progression were similar between the siblings suggested the disease had a genetic origin.

In 1893, Johann Hoffmann published a report detailing four cases of a disease very similar to that described by Werdnig (Hoffmann 1893). Disease onset in three of these cases began at 9 months. In these reports it was noted that muscle weakness, although widespread, did not affect all muscles equally (Hoffmann 1893). The most severely affected muscles were those of the legs, specifically the gluteal muscle, which was so weak it prevented hip flexion (Hoffmann 1893). In contrast, the muscles in the face were normal, as was the ability to swallow (Hoffmann 1893). Most strikingly, Hoffmann reported that the diaphragm, masseter, and cardiac muscles all appeared to be normal (Hoffmann 1893). These descriptions of SMA by Hoffmann were the first detailed accounts showing that certain muscles are spared by the disease. As a result of Werdnig’s and Hoffmann’s descriptions, the infantile form of SMA was named Werdnig-Hoffmann disease.

An early onset form of SMA was described in a patient report in 1902 by C.E. Beevor (Beevor 1902). The patient came from a family with 7 siblings, 3 of whom developed a form of paralysis around 6 weeks of age. The 3 affected children all died between 4 and 8 months of age, while the others remained healthy (Beevor 1902). The age of onset and time of death were markedly earlier than in the cases noted by Werdnig and Hoffmann, indicating that SMA had a spectrum of severity. However, similar to Hoffmann, Beevor noted a very strong diaphragm that was performing much of the work required to breathe, as evidenced by how the thorax moved with each inhalation (Beevor 1902). A comparison between the SMA patient and a similarly aged patient with a spinal cord injury showed that degeneration was present in the anterior horn in the patient with SMA but not in the patient with the spinal cord injury (Beevor 1902). Interestingly, the SMA patient Beevor described had an onset of disease at birth and died at 8 weeks, which was several months sooner than his siblings also afflicted with the disease (Beevor 1902). This makes the case described by Beevor the first reported instance of SMA siblings who have discordant SMA phenotypes.

A mild form of SMA (also called chronic or juvenile-onset SMA) was described by Kugelberg and Welander in 1956 (Kugelberg and Welander 1956). Patients experienced progressive muscle wasting with electrophysiological findings consistent with a neurological disease (Kugelberg and Welander 1956). The study described patients with a very wide spectrum of disease phenotypes. Twelve patient cases were examined, with age of onset ranging from 2 to 17 years (Kugelberg and Welander 1956). All but one patient had a disease duration greater than 10 years, with the longest being 40 years. Despite the slow progression, one patient developed weakness so severe they were bedridden and could barely move (Kugelberg and Welander 1956). Kugelberg and Welander explicitly stated that the disease they described is separate from Werdnig-Hoffmann disease (Kugelberg and Welander 1956). However, the symptoms they described are identical to those of infantile SMA, only less severe and with much slower progression, which led to speculation that the two were the same disease (Bobowick and Brody 1973). Evidence that the infantile and juvenile forms of SMA were the same disease included the existence of siblings with different severities of the disease as well as patients with infantile onset who had a slow progression and lived into adulthood (Dubowitz 1964; Munsat et al. 1969a). The mildest patients with infantile onset had remarkable overlap with patients described by Kugelberg in terms of symptom severity (Kugelberg and Welander 1956; Dubowitz 1964).

Although clinical examination was often sufficient for an SMA diagnosis, a muscle biopsy could also be performed for confirmation. Numerous abnormalities have been identified in muscle biopsies from SMA patients (Kugelberg and Welander 1956; Byers and Banker 1961; Hausmanowa-Petrusewicz et al. 1968; Munsat et al. 1969b; Buchthal and Olsen 1970). Muscles from SMA patients are reported to have fibers at all stages of atrophy depending on the severity of the disease (Byers and Banker 1961). In milder patients with the Kugelberg-Welander form of the disease, small subsections of atrophied fibers can be found adjacent to bundles of normal-sized fibers (Hausmanowa-Petrusewicz et al. 1968). In contrast, in the Werdnig-Hoffmann form of SMA, the atrophy typically extended throughout the entire bundle (Hausmanowa-Petrusewicz et al. 1968). A comparison of muscle biopsies from cases of Werdnig-Hoffmann and Kugelberg-Welander SMA can be seen in Figure 1.1 (Hausmanowa-Petrusewicz et al. 1968). In late stages of the disease, the distribution of fiber sizes becomes bimodal, with many atrophic and hypertrophic fibers present and very few fibers lying in the normal size range (Buchthal and Olsen 1970). The hypertrophied fibers are likely a result of motor neuron sprouting, as fiber type staining has shown they are the same fiber type as surrounding atrophied fibers (Dubowitz et al. 1985). Fiber type staining also indicated that atrophy occurred in both type I and type II fibers (Hausmanowa-Petrusewicz et al. 1968). Another abnormality reported from biopsies was nuclear changes such as an increase of intracellular nuclei in both atrophic and normal fibers (Munsat et al. 1969a). Abnormalities have also been observed in muscle spindles, such as a thickening of the capsule and increase in connective tissue, though these are not specific to SMA (Hausmanowa-Petrusewicz et al. 1968). A normal muscle biopsy can be obtained in milder cases of Kugelberg-Welander SMA if performed early in the disease course (Byers and Banker 1961).

Historically, an electrophysiology exam could also be used to assist in diagnosis, though now it is used more as a biomarker for disease (David Arnold et al. 2014; Arnold et al. 2015). Severe SMA patients have been shown to exhibit regular and spontaneous potentials that exist even when the patient is at rest or sleeping (Buchthal and Olsen 1970). This activity differs from fasciculations in its regularity and does not appear in other neurodegenerative diseases (Buchthal and Olsen 1970), though the reason for this is not known (Arnold et al. 2015). A study analyzing EMG data from 223 SMA patients showed that fibrillations and positive sharp waves were frequently found in all patients, though they were more common in older and milder Kugelberg-Welander patients (Hausmanowa-Petrusewicz and Karwańska 1986). Fasciculations were rare in Werdnig-Hoffmann patients but very common in milder Kugelberg-Welander patients (Hausmanowa-Petrusewicz and Karwańska 1986). This study also determined that the best parameter for distinguishing between the different forms of SMA was the mean amplitude of the single motor unit potential. In general, the patients with milder forms of the disease had longer duration of potentials with a higher amplitude (Hausmanowa-Petrusewicz and Karwańska 1986).

Figure 1.1. Comparison of muscle biopsy in infantile and juvenile SMA. Haematoxylin-eosin staining of deltoid muscle biopsies from SMA patients (Hausmanowa-Petrusewicz et al. 1968). (A) Muscle biopsy from SMA patient with severe Werdnig-Hoffmann form of the disease. There are entire bundles of atrophic fibers. 400X (B) Muscle biopsy from patient with Kugelberg-Welander form of SMA. There are small groupings of atrophic fibers within normal bundles. 100X

More recently, motor unit number estimation (MUNE) and compound muscle action potential (CMAP) have been shown to be useful biomarkers in monitoring disease progress of SMA (David Arnold et al. 2013). MUNE is the number of motor units for a particular output, and CMAP is a measure of the output of all motor units of a particular muscle (David Arnold et al. 2013). Generally, data have shown that there is a steep decline in both MUNE and CMAP for SMA patients during the first months of the disease, though the steepness and extent of the decline depend upon SMA type (Swoboda et al. 2005). As disease severity increases, there is a corresponding drop in both MUNE and CMAP. Thus, more severe SMA type 1 patients have a rapid decline in measured CMAP and MUNE while milder SMA type 3 patients have only a slight decline in MUNE and a slight decline or stable CMAP (Swoboda et al. 2005; Arnold and Burghes 2013).

Table 1.1. Classification of SMA (Kolb and Kissel 2015)

Type  Age of Onset     Highest Function     Natural Age of Death  SMN2 Copies
0     Prenatal         Respiratory support  <1 mo                 1
1     0-6 mos.         Never sit            <2 years              2
2     <18 mos.         Never stand          >2 years              3, 4
3     >18 mos.         Stand alone          Adult
3a    18 mos.-3 years  Stand alone          Adult                 3, 4
3b    >3 years         Stand alone          Adult                 4
4     >21 years        Stand alone          Adult                 4-8

Diagnostic criteria for SMA were solidified in 1991 by the SMA Collaboration. Criteria for SMA include symmetrical weakness which is greater in the proximal muscles as well as denervation as measured by EMG or muscle biopsy (Munsat 1991). A summary of these criteria can be found in Table 1.1 (Kolb and Kissel 2015). Age of onset and peak muscle function determine the type of SMA. SMA type 1 patients have an onset before 6 months of age, SMA type 2 patients have an onset between 6 months and 18 months of age, and SMA type 3 patients have an onset after 18 months of age. In terms of peak function (Table 1.1), SMA type 1 patients are defined as those who never sit, SMA type 2 patients as those who never stand independently, and SMA type 3 patients as those who are ambulatory at some point in their life

(Munsat 1991; Zerres and Rudnik-Schoneborn 1995). In some cases, these classifications have been further subdivided. For example, SMA type 3 can be divided into type 3a and 3b, depending on whether onset occurs prior to or after 3 years of age, respectively (Zerres and Rudnik-Schoneborn 1995). Additionally, it was suggested patients be excluded from an SMA diagnosis if they have central nervous system (CNS) dysfunction, weakness in the eyes or face, diaphragm defects, or sensory defects (Munsat 1991). Although rare, there is also an SMA type 0, which has onset at birth or even in utero (Macleod et al. 1999). Finally, SMA type 4 is adult onset SMA where patients remain asymptomatic until after 21 years of age.

Figure 1.2. Diagram of SMN1 and SMN2. A single nucleotide change distinguishes SMN1 from SMN2, which disrupts SMN exon 7 inclusion and as a result less functional protein is made from SMN2. There is a C to T change in exon 7 of the SMN2 gene. The presence of this nucleotide difference results in a splicing change, and thus exon 7 is skipped in a majority of transcripts. The result is that approximately 90% of protein translated from the SMN2 gene is unable to oligomerize correctly and thus is nonfunctional and is rapidly degraded (Burghes and Beattie 2009).
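The onset-based part of the classification described above can be sketched in a few lines of Python. This is only an illustrative simplification: the function name `sma_type_from_onset`, the `prenatal` flag, and the handling of ages that fall exactly on a threshold are assumptions made for this sketch, and actual classification also depends on the highest motor function achieved, which is not modeled here.

```python
def sma_type_from_onset(onset_months=None, prenatal=False):
    """Assign an SMA type label from age of onset alone (hypothetical helper).

    Thresholds follow the onset ranges given in the text: 6 months,
    18 months, 3 years (3a vs 3b), and 21 years (type 4). How ages that
    fall exactly on a threshold are assigned is an assumption.
    """
    if prenatal:
        # Type 0 is flagged explicitly here, since an onset at birth
        # cannot be told apart from type 1 by age in months alone.
        return "0"
    if onset_months is None or onset_months < 0:
        raise ValueError("onset_months must be a non-negative number")
    if onset_months < 6:
        return "1"
    if onset_months < 18:
        return "2"
    if onset_months < 36:
        return "3a"
    if onset_months <= 21 * 12:
        return "3b"
    return "4"


if __name__ == "__main__":
    for months in (3, 10, 24, 60, 300):
        print(months, "months ->", "type", sma_type_from_onset(months))
```

A label produced this way is only a starting point; as discussed in this section, peak function and exclusion criteria are also part of the diagnostic scheme.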

Currently, the necessary diagnostic criterion for SMA is a homozygous loss or mutation of the survival motor neuron 1 (SMN1) gene (Lefebvre et al. 1995). SMN1 codes for the survival motor neuron (SMN) protein, which is necessary for cellular survival (Lefebvre et al. 1995; Schrank et al. 1997). A second gene, SMN2, is highly homologous to SMN1, differing by only a single nucleotide in the coding regions (Lefebvre et al. 1995). Although this nucleotide change results in a synonymous mutation, it also disrupts the incorporation of exon 7 in the mRNA (Lorson et al. 1999b; Monani et al. 1999a). As a result, SMN protein from SMN2 is truncated, unable to oligomerize efficiently, and rapidly degraded (Coovert et al. 1997; Lorson et al. 1998a; Burnett et al. 2009b). Thus, when SMN1 is deleted or mutated, all SMN must come from the highly inefficient SMN2 gene, and the amount of functional SMN protein produced by the cell is greatly reduced (Coovert et al. 1997; Lefebvre et al. 1997). The deficiency of SMN protein causes the disease, meaning SMA is essentially a disease of low SMN levels (Schrank et al. 1997). A diagram of the SMN1 and SMN2 genes can be seen in Figure 1.2 (Burghes and Beattie 2009). Differences between the SMN1 and SMN2 genes are discussed further in the following section.

1.2 Genetics of SMA

SMA was considered to be an autosomal recessive disorder based on the frequency of the disease amongst siblings in SMA family studies (Byers and Banker 1961). Linkage studies localized the SMA region to 5q13 (Brzustowicz et al. 1990; Gilliam et al. 1990; Melki et al. 1990b, a). Interestingly, all types of SMA mapped to the same region of the genome, confirming that they were all the same disease (Gilliam et al. 1990; Melki et al. 1990b, a). The region of the genome was further narrowed by using pedigree analysis and linkage studies of polymorphic markers (Francis et al. 1993; Soares et al. 1993; Clermont et al. 1994; Burghes et al. 1994a). Additionally, YAC contigs were constructed in attempts to physically map the region (Francis et al. 1993; Kleyn et al. 1993; Carpten et al. 1994; Melki et al. 1994). Several polymorphic markers mapped to multiple regions of 5q13, which indicated the presence of low-copy repeats and suggested that the region may be unstable (Francis et al. 1993; Burghes et al. 1994b; Melki et al. 1994). Indeed, analysis of allele segregation near the markers C212 and C272 showed de novo deletions and rearrangements in SMA patients (Melki et al. 1994). Significant allelic association was demonstrated between SMA and 2 markers, Ag1-CA and CATT1, which was evidence the SMA-causing gene was nearby (Burghes et al. 1994b; DiDonato et al. 1994). Multiple cDNAs were detected from this region, namely those deriving from SMN1, SMN2, and neuronal apoptosis inhibitor protein (NAIP) (Roy et al. 1995; Thompson et al. 1995; Lefebvre et al. 1995). Initially there was some debate as to whether NAIP or SMN1 was responsible for SMA. Analysis of NAIP exons 5, 6, and 13 showed that NAIP is at least partially deleted in 24% of SMA patients, with the percentage varying based on SMA type (Hahnen et al. 1995). Multiple studies detected a partial NAIP deletion in up to 67% of SMA type 1 patients compared to only 7% of SMA type 3 patients (Hahnen et al. 1995; Roy et al. 1995). However, NAIP was fully intact in numerous SMA patients and 2% of non-SMA controls carried a NAIP deletion (Hahnen et al. 1995; Roy et al. 1995). In contrast, SMN1 was always deleted or mutated in SMA patients (Lefebvre et al. 1995). Additionally, missense mutations in SMN1 can cause severe SMA while the NAIP gene remains intact. Altogether, this evidence meant SMN1 must be the SMA determining gene.

Lying several hundred kilobases upstream of SMN1 is the highly homologous SMN2. The two genes were initially reported as lying in an inverted repeat (Lefebvre et al. 1995), with SMN1 being telomeric relative to SMN2, and thus SMN1 and SMN2 are sometimes referred to as ‘telomeric’ and ‘centromeric’, respectively, based on their positioning relative to each other. SMN1 and SMN2 generally differ by only 5 nucleotides (Monani et al. 1999a). Four of these nucleotides occur in non-coding regions (one in intron 6, two in intron 7, and one in exon 8) and the fifth is the critical C to T base change in exon 7 (Monani et al. 1999a). However, gene conversion events can cause variants that normally exist in SMN1 to be present in SMN2 or vice versa (DiDonato et al. 1997; Burghes 1997). Thus, the C to T change is essentially the only nucleotide that distinguishes SMN1 and SMN2.

The C to T change is critical as its presence drastically alters the splicing of the gene, which is the defining characteristic of SMN2 (Lorson et al. 1999b; Monani et al. 1999a). Experiments using minigenes that carried the SMN1 gene with different mutations that are usually only associated with SMN2 determined that the C to T change alone was responsible for drastically reducing the inclusion of exon 7 (Lorson et al. 1999a; Monani et al. 1999b). SMN protein that is missing exon 7 is unable to oligomerize correctly or form functional complexes (Lefebvre et al. 1997; Lorson et al. 1998a; Le et al. 2005). Indeed, this was confirmed by mutational analysis which showed that the missense mutation G279V in exon 7 disrupted oligomerization (Lorson et al. 1998a). This truncated protein, which cannot oligomerize correctly, is called SMNdelta7 and has been shown to undergo rapid degradation (Burnett et al. 2009b). Curiously, SMA mice that also carry a transgene for SMNdelta7 live longer than SMA mice that do not (Le et al. 2005). This indicates that SMNdelta7 can play some role in forming functional SMN complexes. However, some full-length SMN must also be present (Le et al. 2005; Burghes and Beattie 2009).

SMN1 and SMN2 lie in a region of the genome that is highly unstable and prone to rearrangements. This was evidenced by pulse-field gel electrophoresis experiments showing multiple banding patterns amongst both SMA patients and controls (Campbell et al. 1997). As previously mentioned, de novo deletions were detected in this region in SMA patients during linkage analysis (Melki et al. 1994). Duplications have also been shown to occur and some individuals are known to have two copies SMN1 on the same chromosome (Chen et al. 1999;

Mailman et al. 2002; Alías et al. 2014). An estimated 8% of Ashkenazi Jewish SMA carriers are

“silent carriers”, in that they have 2 copies of SMN1 on a single chromosome and 0 copies of SMN1 on the other (Chen et al. 1999). Silent carriers can be detected by screening for the g.27134T>G and g.27706_27707delAT variants which segregate with the 2-copy SMN1 haplotype (Luo et al. 2014). Partial deletions have also been reported in SMN1, including an Alu-mediated deletion of exons 5 and 6 (Wirth et al. 1999). Additionally, gene conversion events have been reported

(DiDonato et al. 1997; Burghes 1997). An SMN1 to SMN2 gene conversion event would result in an increase of the copy number of SMN2 on a single chromosome and elimination of SMN1 thus creating a carrier. As a result of the genetic rearrangements in the region, there is a wide degree of copy number variation of SMN1 and SMN2 in the general population. SMN2 copy number has been reported to be as high as 8 in normal individuals (Vitali et al. 1999) and approximately 10-

15% of people have 0 copies of SMN2 (Vitali et al. 1999; Mailman et al. 2002).

Copy number of SMN2 is particularly important for SMA patients as it is inversely correlated with disease severity (Velasco et al. 1996; McAndrew et al. 1997; Feldkötter et al.


2002; Mailman et al. 2002; Jedrzejowska et al. 2009; Calucho et al. 2018). This is because each additional SMN2 gene increases the total amount of full-length SMN protein. It is estimated that a single copy of SMN2 produces about 10% of the level of full-length protein produced by SMN1

(Coovert et al. 1997; Lefebvre et al. 1997). From a recent study of 625 Spanish SMA patients,

86% of SMA type 1 patients (n = 272) were found to have two copies of SMN2. Similarly, 87% of SMA type 2 patients (n = 186) had three copies of SMN2. Interestingly, 64% of SMA type 3 patients (n = 167) also had three copies of SMN2, while most of the rest (31%) had four copies

(Calucho et al. 2018). The fact that a majority of both SMA type 2 and SMA type 3 cases have 3 copies of SMN2 indicates that copy number alone is not sufficient to describe the phenotype of

SMA patients. Possible explanations for this discrepancy include modifiers outside of the SMA region that modulate the phenotype (Oprea et al. 2008a; Hosseinibarkooie et al. 2016; Riessland et al. 2017), or variants inside of SMN2 that affect how much full-length protein is produced (Prior et al. 2009; Wu et al. 2017). Regardless of the cause, there is a clear association between having more functional copies of SMN2 and a less severe SMA phenotype.
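The overlap between SMA type 2 and type 3 at three SMN2 copies can be made concrete with a little arithmetic. The sketch below uses only the percentages and group sizes quoted above from Calucho et al. (2018); the rounded patient counts are approximations derived from those percentages, not values reported in the study, and the variable names are illustrative.

```python
# Group sizes and the fraction of each group carrying the stated SMN2 copy
# number, as quoted in the text (Calucho et al. 2018).
cohort = {
    "type 1": {"n": 272, "copies": 2, "fraction": 0.86},
    "type 2": {"n": 186, "copies": 3, "fraction": 0.87},
    "type 3": {"n": 167, "copies": 3, "fraction": 0.64},
}

# Approximate patient counts, reconstructed by rounding n * fraction.
counts = {
    sma_type: round(group["n"] * group["fraction"])
    for sma_type, group in cohort.items()
}

# Among type 2 and type 3 patients with three copies, how many are type 2?
# Type 1 patients with three copies are ignored because the text gives no figure.
three_copy_type2 = counts["type 2"]
three_copy_type3 = counts["type 3"]
share_type2 = three_copy_type2 / (three_copy_type2 + three_copy_type3)

print(counts)
print(f"Share of type 2 among three-copy type 2/3 patients: {share_type2:.2f}")
```

Under these assumptions, a three-copy genotype is shared by well over one hundred patients in each of the two groups, which is why copy number alone cannot separate type 2 from type 3.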

Pan-ethnic carrier frequency was determined to be 1 out of 54 individuals in the United

States based on data from 72,453 people (Sugarman et al. 2012). Separating the data by ethnic groups showed a range in frequency from 1/47 in Caucasians to 1/72 in African Americans.

Other populations tested include Asian with a frequency of 1/59, Hispanic with a frequency of

1/68, and Asian Indian with a frequency of 1/52. Altogether, the incidence rate of SMA was predicted to be approximately 1 out of 10,000 (Pearn 1978).
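The relationship between carrier frequency and incidence can be illustrated with a short calculation. This is a minimal sketch that assumes random mating and simple autosomal recessive inheritance, and it ignores de novo mutations and silent (2+0) carriers; the function name is hypothetical and the carrier frequencies are those quoted above from Sugarman et al. (2012).

```python
from fractions import Fraction


def expected_incidence(carrier_frequency: Fraction) -> Fraction:
    """Expected fraction of births affected by an autosomal recessive disease.

    Assumes random mating: both parents must be carriers, and each child of
    two carriers has a 1-in-4 chance of inheriting both disease alleles.
    """
    return carrier_frequency * carrier_frequency * Fraction(1, 4)


# Carrier frequencies quoted in the text (Sugarman et al. 2012).
carrier_frequencies = {
    "Pan-ethnic": Fraction(1, 54),
    "Caucasian": Fraction(1, 47),
    "African American": Fraction(1, 72),
}

for group, freq in carrier_frequencies.items():
    incidence = expected_incidence(freq)
    print(f"{group}: carriers 1/{freq.denominator}, "
          f"expected incidence 1/{incidence.denominator}")
```

With a pan-ethnic carrier frequency of 1/54, this simplified model gives an expected incidence of 1/11,664, which is of the same order as the commonly cited figure of approximately 1 in 10,000.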

In summary, SMA is a disease of low SMN levels. This is a result of homozygous loss or mutation of the SMN1 gene while simultaneously retaining 1 or more copies of the SMN2 gene.


The amount of full-length, functional protein that comes from SMN2 is drastically limited compared to SMN1. However, with an increase in SMN2 copy number there is a corresponding increase in full-length protein. This means there is an inverse correlation between SMN2 copy number and disease severity; however, copy number alone is not enough to fully explain or predict the SMA phenotype.
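As a rough illustration of the copy number relationship summarized above, the sketch below scales the estimate of about 10% full-length protein per SMN2 copy linearly with copy number. The strict additivity across copies is an assumption made here for illustration only, and the function name and default value are hypothetical.

```python
def relative_full_length_smn(smn2_copies: int, per_copy: float = 0.10) -> float:
    """Estimate full-length SMN as a fraction of the output of one SMN1 gene.

    Assumes each SMN2 copy contributes the same amount (about 10%, the
    estimate quoted in the text) and that contributions are simply additive.
    """
    if smn2_copies < 0:
        raise ValueError("copy number cannot be negative")
    return smn2_copies * per_copy


for copies in range(1, 5):
    level = relative_full_length_smn(copies)
    print(f"{copies} SMN2 copies: about {level:.0%} of the SMN1 level")
```

Even under this simple model, patients with the same copy number are predicted to have identical SMN levels, so it cannot account for discordant phenotypes.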

1.2.1 SMN Missense Mutations

Numerous missense mutations have been identified in the SMN1 gene, which can be seen in Figure 1.3. These mutations have a wide spectrum of severity with some severely disrupting the protein’s function. For example, SMA patients with the p.Y272C mutation in SMN1 and 2 copies of SMN2 have similar levels of functional SMN compared to SMA type 1 patients

(Lefebvre et al. 1997). Binding assays indicate that SMN with this mutation had a decreased

Figure 1.3. Domains of SMN and SMA-causing missense mutations (adapted from (Burghes and Beattie 2009)). GEMIN2 has been found to bind to SMN exon 2a. The Tudor domain is located in SMN exon 3. The Tudor domain mutation p.E134K has been shown to impact Sm protein binding. The oligomerization domain primarily resides in SMN exon 6. The p.Y272C and p.T274I mutations in exon 6 are known to decrease the ability of SMN to self-associate.


ability to self-associate as it lies in the oligomerization domain (Lorson et al. 1998b). Similarly, the nearby mutations p.G279V, p.S262I, and p.T274I also affect the ability of SMN to self-associate, though to different degrees, as the relative binding of p.S262I and p.T274I is an order of magnitude higher than that of p.G279V (Lorson et al. 1998b). Several mutations in the Tudor domain disrupt the binding of Sm to the SMN complex or impact the ability of the SMN complex to assemble (Bühler et al. 1999; Shpargel and Matera 2005). One example is the Tudor domain mutation p.E134K, which is a severe mutation that disrupts the ability of SMN to bind the Sm core proteins (Bühler et al. 1999). Similarly, the SMN Tudor domain mutations p.I116F and p.Q136E have significantly decreased snRNP assembly capacity compared to wild-type SMN (Shpargel and Matera 2005).

A curious feature of SMN protein harboring missense mutations is that it is non-functional in the absence of full-length SMN. This was initially demonstrated with the mild p.A2G mutation, which was unable to rescue embryonic lethality when it was transgenically expressed in mice lacking Smn (with no SMN2) (Monani et al. 2003). Interestingly, when mild

SMN mutants like p.A2G and p.A111G are co-expressed with SMN2, it results in a rescue of phenotype indicating that mutant SMN must complement with full-length SMN to be functional

(Monani et al. 2003; Workman et al. 2009). Indeed, all mild SMN missense mutations analyzed need some full length SMN2 in order to rescue (Iyer et al. 2018). This is strong evidence for

SMN existing as an oligomer, which is discussed in more detail below.


1.3 SMN Function

1.3.1 The SMN Complex

SMN is a 38 kDa ubiquitously expressed protein that is essential for cellular survival.

SMN is present in all except Saccharomyces cerevisiae, though creation of SMN2 was a recent occurrence in evolution and as such it is only present in Homo sapiens (Rochette et al. 2001). Knockout of mouse Smn leads to cell death very early in embryo development

(Schrank et al. 1997). Similarly, mutation of zebrafish smn or fruit fly smn results in a lethal phenotype early in development, though with a slight lag as a result of maternal smn contribution

(Chang et al. 2008; Hao et al. 2011).

SMN has been found to interact with Gemin2, Gemin3, Gemin4, Gemin5, Gemin6,

Gemin7, Gemin8, and the UNR-interacting protein (UNRIP) (Charroux et al. 1999, 2000; Baccon et al. 2002; Gubitz et al. 2002; Pellizzoni et al. 2002a; Carissimi et al. 2005, 2006). Together, these proteins make up the SMN complex which is illustrated in Figure 1.4. It is hypothesized the SMN complex exists as an oligomer for multiple reasons. First, it has been

Figure 1.4. Illustration showing the SMN complex. Individual components are arranged to show known interactions between them (Pellizzoni 2007). However, this diagram has been simplified as the actual SMN complex is thought to be larger and composed of 2 tetramers.


found to oligomerize in vitro (Lorson et al. 1998a). Second, complementation studies indicated that complexes consisting entirely of mutated SMN are inactive, whereas if some full-length

SMN is also present the complex is still active (Workman et al. 2009; Burghes and McGovern

2017). For example, transgenic Smn null mice with SMN p.A111G and 1 copy of SMN2 can live for over a year while Smn null mice with only SMN p.A111G are not viable, strongly indicating missense alleles can complement with full-length SMN in a heteromer to form functional complexes (Workman et al. 2009). Third, mutations which disrupt oligomerization cause SMA and the ability of oligomerization to occur correlates with disease severity (Lorson et al. 1998a).

Although much evidence suggests the SMN complex exists as an oligomer, the exact stoichiometry is not currently known. However, complementation studies have demonstrated that high levels of SMN p.A111G can rescue better than SMN from an additional copy of SMN2, which implies that the complex contains multiple subunits and that just a single full-length SMN molecule may be needed in the complex for function (Workman et al. 2009).

1.3.2 Functional Domains and Cellular Location of SMN

The SMN protein has several functional domains which can also be seen in Figure 1.3.

The C-terminus of SMN contains the conserved YG box, which is a 12 amino acid stretch containing a tyrosine and glycine-rich motif (Talbot et al. 1997). This region of the protein is important for oligomerization and vital for protein function. Mutational analysis of SMN1 exons 5-7 shows that deletion of this region drastically reduces the ability of SMN to self-associate (Lorson et al. 1999a). Near the N-terminus of SMN is exon 2b, which can bind the protein Gemin2 (Young et al. 2000). SMN exon 3 has homology to the Tudor domain motif, which is known to interact with RNA (Ponting 1997). The Tudor domain of SMN also binds strongly to


the proteins SmB, SmD1, SmD2, SmD3 and SmE, which are all critical components of the Sm core of snRNPs (Bühler et al. 1999). Interestingly, the severe SMA mutation E134K which lies in the Tudor domain results in failure of the Sm proteins to associate with SMN (Bühler et al.

1999).

Staining experiments with an anti-SMN monoclonal antibody have shown that the SMN protein is weakly dispersed throughout the cytoplasm, as well as in concentrated foci in the nucleus called gems (Liu and Dreyfuss 1996). Gems are distinct from coiled bodies, though the two associate with each other in a majority of cases (Liu and Dreyfuss 1996; Carvalho et al. 1999).

Gems are more often found separate from coiled bodies in rapidly dividing cells. Interestingly, staining with anti-, anti-SMN, and anti-Sm protein antibodies shows that the three all associate with each other (Carvalho et al. 1999).

1.3.3 Role of SMN Complex in snRNP Biogenesis

The best documented function of SMN is that of snRNP biogenesis. A snRNP is composed of a snRNA (U1, U2, U4, U5, U6, U11, U12, U4atac, U6atac), a set of core Sm proteins, and proteins unique to each snRNP (Will and Lührmann 2001, 2005). One important function of snRNPs is the removal of introns from pre-mRNA in a process called splicing. This is a necessary biological process that removes non-coding elements from the pre-mRNA. The machinery responsible for excising introns from pre-mRNA is called the spliceosome. The vast majority of introns are excised by the major spliceosome, which is composed of the U1, U2, U4,

U5, and U6 snRNPs (Will and Lührmann 2001). In addition to the major spliceosome, there is the minor spliceosome. Approximately 1% of introns in the genome are spliced by the minor spliceosome, which differs from the major spliceosome in the snRNPs it contains (Will and


Lührmann 2005; Alioto 2007a). The minor spliceosome contains U11, U12, U4atac, U5, and

U6atac snRNPs (Will and Lührmann 2005).
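The difference in snRNP composition between the two spliceosomes can be summarized as a simple set comparison. This is only an illustrative sketch; the variable names are not from any library, and the membership lists are taken directly from the text above.

```python
# snRNP composition of each spliceosome, as listed in the text
# (Will and Lührmann 2001, 2005).
MAJOR_SPLICEOSOME = frozenset({"U1", "U2", "U4", "U5", "U6"})
MINOR_SPLICEOSOME = frozenset({"U11", "U12", "U4atac", "U5", "U6atac"})

# snRNPs used by both spliceosomes.
shared = MAJOR_SPLICEOSOME & MINOR_SPLICEOSOME

# snRNPs specific to one spliceosome.
major_only = MAJOR_SPLICEOSOME - MINOR_SPLICEOSOME
minor_only = MINOR_SPLICEOSOME - MAJOR_SPLICEOSOME

print("Shared:", sorted(shared))
print("Major only:", sorted(major_only))
print("Minor only:", sorted(minor_only))
```

The comparison shows that U5 is the only snRNP common to both spliceosomes, while the remaining four snRNPs in each are specific to that spliceosome.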

The biogenesis of snRNPs is an extremely complex process involving dozens of proteins and is illustrated in Figure 1.5. The process is initiated in the nucleus with the transcription of a small nuclear RNA (snRNA), which is then transported into the cytoplasm (Izaurralde et al.

1995). In the cytoplasm, seven Sm proteins (B or B', D1, D2, D3, E, F, and G) form a ring around

Figure 1.5. SMN complex and snRNP biogenesis (adapted from (Burghes and Beattie 2009)). (A) In the cytoplasm, SMN exists in the SMN complex along with GEMIN2-8 and UNRIP. Also in the cytoplasm, the core Sm proteins associate with the protein pICln and are methylated by PRMT7 which increases their affinity for the SMN complex, allowing them to bind. (B) The snRNA is transcribed in the nucleus and exported to the cytoplasm. (C) The snRNA binds the SMN complex through GEMIN5. With the assistance of the SMN complex, the core Sm proteins are loaded onto the snRNA as a heptameric ring. The methylase TGS1 hypermethylates the 5' cap of the snRNA. (D) With the snRNA now hypermethylated, the complex is able to bind Snurportin and be imported into the nucleus. Inside the nucleus, the complex localizes to Cajal bodies where the snRNP dissociates and is further processed into a mature snRNP.


the snRNA (Liu et al. 1997; Fischer et al. 1997; Meister et al. 2001; Pellizzoni et al. 2002b).

Association between the Sm proteins and a complex containing pICln, PRMT5 and PRMT7 results in the methylation of arginine residues on the Sm proteins (Neuenkirchen et al. 2008;

Chari et al. 2009). This critical step increases the binding affinity of the SMN complex with the

Sm proteins (Neuenkirchen et al. 2008). The Sm proteins bind to the Tudor domain of SMN, where the SMN complex will load the Sm ring onto the snRNA (Liu et al. 1997; Fischer et al.

1997; Bühler et al. 1999; Mohaghegh et al. 1999; Pellizzoni et al. 2002b). The SMN complex, still bound to the Sm core, is transported to the nucleus with the aid of snurportin1 (Narayanan et al. 2002).

Several pieces of evidence underpin the hypothesis that the SMN complex is involved in the biogenesis of snRNPs. First, experiments in Xenopus laevis oocytes have shown that when Gemin2 is reduced using anti-Gemin2 antibodies, there is a reduction in snRNP formation (Fischer et al. 1997). Second, SMN complex that was purified from cell extracts was able to assemble the Sm core in an ordered manner (Pellizzoni et al. 2002c). Third, depletion of SMN prevented the assembly of the Sm core onto snRNA (Meister et al. 2001). Finally, numerous defects in snRNP assembly have been observed in SMA tissues (Workman et al. 2012). Certain missense mutations in SMA result in reduced snRNP assembly activity (Cuscó et al. 2004;

Shpargel and Matera 2005), the most notable of which is the Tudor domain mutation E134K which disrupts the ability of SMN to bind Sm proteins (Bühler et al. 1999). Additionally, tissues from SMA patients have a marked reduction in snRNP assembly activity compared with healthy controls (Wan et al. 2005). Furthermore, snRNP assembly defects were found to correlate with

SMA severity (Gabanella et al. 2007). For example, SMNdelta7 mouse spinal cord extracts had


about 10% of the snRNP assembly activity compared to wild-type, while extracts from milder mice with the A2G mutation had 30% snRNP assembly activity relative to wild-type (Gabanella et al. 2007). The levels of U1, U2, U11, and U12 snRNP were also found to be reduced in spinal cords from SMA mice (Gabanella et al. 2007). Altogether, these data show defects in snRNP biogenesis in SMA animals.

In addition to assembly of the spliceosomal snRNPs, SMN also participates in the assembly of U7 snRNP, which is involved in the 3’ end processing of histone mRNA (Dominski and Marzluff 2007). The U7 snRNP differs from spliceosomal snRNPs in that its Sm core contains the Sm-like proteins LSm10 and LSm11 instead of SmD1 and SmD2

(Pillai et al. 2003). Several pieces of evidence indicated SMN involvement in U7 snRNP assembly. First, the U7 snRNA was present in coiled bodies along with SMN (Pillai et al. 2003).

Second, SMN depletion using RNAi in NIH3T3 cells resulted in a significant reduction of U7 snRNP assembly (Tisdale et al. 2013). Finally, histone 3’-end formation defects are found in both SMA mouse motor neurons and SMA patient samples (Tisdale et al. 2013). This evidence shows that SMN is involved in U7 snRNP assembly.

1.3.4 Other Functions of SMN

SMN has also been suggested to have a function in the axons of motor neurons.

Knockdown of zebrafish smn using a morpholino (MO) resulted in axon truncation and axon branching defects (McWhorter et al. 2003). Additionally, motor neurons from transgenic SMA mice isolated and grown in culture showed a decrease in axon growth but not survival (Rossoll et al. 2003). Furthermore, there is an isoform of SMN created from a readthrough of intron 3 called axonal-SMN (a-SMN) (Setola et al. 2007). Transfection of NSC34 cells with a-SMN resulted in an increase in axonogenesis (Setola et al. 2007). Although this experiment showed a-SMN may


be important for axon growth, a-SMN is a protein truncated shortly after exon 3. Furthermore, the fact that SMA missense mutations exist in the region missing from a-SMN indicates that a-SMN may not be relevant to SMA.

Other evidence of a role for SMN in axons comes from quantitative immunofluorescence experiments that have detected SMN in motor neuron axons with and without Gemin2 (Zhang et al. 2006; Fallini et al. 2011). Interestingly, SMN and Gemin2 were not always found in the presence of the Sm proteins, suggesting that SMN and Gemin2 associate together in axons in a multiprotein complex other than the SMN complex and with a function other than snRNP assembly (Zhang et al. 2006). However, exactly what these other functions are and what proteins exist in these complexes has not been elucidated. There is also evidence suggesting a complex containing the protein HuD and the SMN complex. Staining experiments in primary motor neurons with SMN and HuD antibodies showed a colocalization of the 2 proteins (Fallini et al. 2011). HuD is a neuronal RNA-binding protein that is involved in RNA processing, stability, and transport (Perrone-Bizzozero and Bolognani 2002). Furthermore, using an shRNA to knock down

SMN levels led to a 24% reduction in fluorescence intensity of HuD in the axons and a 57% reduction in poly-A signal in the axons as measured by fluorescence in situ hybridization (Fallini et al. 2011). This suggests SMN may play a role in axonal transport of mRNAs with the assistance of HuD. Interestingly, in both zebrafish and MN-1 cells an association between HuD and SMN has been demonstrated which is disrupted in SMN p.E134K mutants (Hubers et al.

2011; Hao le et al. 2017). Additionally, over-expression of HuD in zebrafish with mutated smn was shown to partially rescue axon growth defects (Hao le et al. 2017). This suggests SMN and


HuD function together; however, it should be noted that such axon defects are not present in severe SMA mice (McGovern et al. 2008).

In short, many experiments have shown a possible role for SMN in the axon of motor neurons. However, the data suggest SMN is present in some capacity outside of the SMN complex. For example, it is unclear why the mutant SMN p.E134K protein does not bind HuD, as SMN exists as an oligomer and as long as full-length SMN protein is present the complex should remain functional. A possible explanation is that SMN exists in a separate complex with

HuD but the function and components of such a complex are unknown. These complexes will need to be determined and their function ascertained in order to truly understand the extent to which SMN functions in the axon.

1.3.5 SMN in Various Tissues

An abundance of evidence points to motor neurons as being the primary tissue affected by SMA. As mentioned previously, degeneration of the anterior horn as well as electrophysiological abnormalities are well known in SMA (David Arnold et al. 2013). In mouse models of SMA, there is a decrease of innervated synapses at the NMJ (Murray et al. 2008).

However, the severe SMA mouse has no defects in axon growth or guidance, which suggests that the denervation, which begins during embryonic development, does not result from such defects (McGovern et al. 2008).

Other NMJ defects include immature clustering of acetylcholine receptors and neurofilament accumulation (Kariya et al. 2008). Transgenic studies that selectively express or knockdown

SMN in certain tissues also implicate motor neurons as the critical tissue affected in SMA. SMA mice carrying the SMN1 gene under control of the Prion promoter, which has high expression in motor neurons, lived over 200 days in contrast to SMA mice which lived for only 4 days


(Gavrilina et al. 2008). Though this experiment strongly suggests expression of SMN in nerves can rescue SMA animals, the experiment is imperfect due to leaky expression of the transgene in muscle (Gavrilina et al. 2008). Deletion of functional SMN in motor neurons using oligo-Cre drivers results in an SMA-like phenotype, albeit a milder one with the majority of mice surviving over a year and with increased motor function compared to SMA controls (Park et al.

2010). This may be a result of other neurons such as interneurons contributing to the disease, which would not have reduced SMN levels in this particular experiment. Other experiments using Cre-drivers performed by the Burghes lab have also shown the importance of motor neurons in SMA pathology. By using Nestin-Cre which drives expression in all neurons and glia,

ChAT-Cre which drives expression in cholinergic neurons, and a combination of both Nestin-Cre and ChAT-Cre drivers, Smn was selectively deleted in these tissues, which led to a significant reduction in the electrophysiological biomarkers MUNE and CMAP, although survival declined less than expected (McGovern et al. 2015a). Similarly, replacement of Smn using Nestin-Cre, ChAT-Cre, and Nestin-Cre + ChAT-Cre showed that Smn was needed in motor neurons for rescue of MUNE and CMAP (McGovern et al. 2015a).

In contrast to the SMN requirement in nerve, studies performed using Cre-drivers indicated high SMN levels are not needed in muscle. Experiments using Myf5-Cre demonstrated that reduction of SMN in muscle to SMA levels does not result in a neuromuscular phenotype

(Iyer et al. 2015). Similarly, replacement of SMN in muscles of SMA animals gives no survival benefit (Iyer et al. 2015). Prior studies had shown that selective knockdown of SMN in muscles resulted in a phenotype more similar to muscular dystrophy (Cifuentes-Diaz and Frugier 2001), though this is not unexpected as SMN is needed in all cells for survival. Although muscle may


not have a high SMN requirement, defects in muscle cells of SMA patients have been reported, namely impaired myotube development and immature myotubes that fuse incorrectly (Arnold et al. 2004; Martínez-Hernández et al. 2009).

The extent of interneuron involvement in SMA has been examined in numerous experiments. Co-culture experiments using sensory neurons from SMA-derived iPS cell lines and wild-type motor neurons found no growth or survival abnormalities of the motor neurons, demonstrating that motor neurons remain healthy even when sensory neurons are SMN deficient

(Schwab and Ebert 2014). Additionally, the number of VGLUT synapses on Renshaw inhibitory interneurons was found to be comparable in SMA to those in wildtype samples (Thirumalai et al.

2013). In contrast, other experiments have demonstrated possible sensory nerve involvement in

SMA. The gene stasimon was found to be aberrantly spliced in SMA fruit flies, but knockdown of stasimon in motor neurons alone did not result in a neuromuscular phenotype (Lotti et al. 2012). However, pan-neuronal knockdown of stasimon caused electrophysiological abnormalities, suggesting that sensory neurons that may act as inputs to motor neurons are important in SMA pathology (Lotti et al. 2012). Interestingly, fluorescent staining of VGluT1 in

SMA mouse spinal cord samples showed a reduction of VGluT1 boutons without a reduction in total proprioceptive neurons (Mentis et al. 2011). In addition, the number of boutons decreased as the disease progressed (Mentis et al. 2011). Hence, there is a reduced connectivity between sensory neurons and motor neurons in SMA mice. Consistent with this view is the fact that SMA motor neurons have a reduced threshold needed to cause an action potential, which results in

SMA motor neurons being hyperexcitable (Mentis et al. 2011). In short, there is evidence both in


favor of and against interneuron involvement in SMA and more experimentation will be needed to resolve this conflict.

1.4 Splicing Defects in SMA

1.4.1 Splicing Alterations Detected in SMA Using Exon Microarray

Splicing perturbations have been reported in SMA samples (Zhang et al. 2008, 2013;

Bäumer et al. 2009; Lotti et al. 2012; See et al. 2014). An experiment involving exon microarray analysis of tissues taken from SMA mice at postnatal day 11 (P11) identified over 200 splicing changes in the spinal cord and over 600 from the kidney, which shows that splicing changes do occur, that they are widespread, and that they are tissue specific (Zhang et al. 2008). However, the large number of splicing changes may be a secondary effect of the disease. Another study, which also performed an exon microarray analysis on SMA spinal cord samples but at multiple timepoints, showed that at P1 and P7 timepoints expression analysis identified only 12 and 23 genes to be differentially expressed, respectively, compared to the P13 timepoint when 142 genes were differentially expressed (Bäumer et al. 2009). Furthermore, analysis showed that at the P13 timepoint genes were enriched for pathways dealing with cell injury

(Bäumer et al. 2009). The conclusion from this data is that although there are transcriptome changes in SMA mice that are widespread and in multiple tissues, evidence suggests the vast majority are a consequence and not a cause of SMA.

Finding the splicing changes most relevant to SMA has proven to be a difficult task for multiple reasons. First, the disease primarily affects motor neurons, which are only a subpopulation of the spinal cord, making up just over 10% by weight. Using whole spinal cord tissue complicates analysis as changes unique to motor neurons become impossible to parse out from


the data. It also calls the relevance of whole-tissue data into question, as it has been shown that exon 7 splicing of SMN is particularly low in motor neurons, suggesting that motor neurons are more sensitive to low SMN levels (Ruggiu et al. 2012), and thus are more likely to experience splicing defects. Second, as mentioned above, many changes may be secondary and not actually causal of the SMA phenotype. Third, aberrant splicing may be difficult to detect, as many available splicing programs are incapable of detecting novel splicing isoforms (Katz et al. 2010). This is especially relevant for motor neurons, which may have many transcripts and isoforms not currently annotated in transcriptome databases. Regardless, several genes have been identified as being misspliced in SMA, as described below.

1.4.2 Splicing Disruption of the Agrin Gene

An RNA-sequencing experiment performed by the Dreyfuss lab identified downregulation of a specific isoform of the Agrin gene in motor neurons of SMA mice (Zhang et al. 2013). Agrin, which is known to play a role in the organization of the post-synaptic side of the NMJ, is alternatively spliced with certain isoforms, namely Z+ Agrin, being much more prevalent at the NMJ (Bezakova and Ruegg 2003). Skipping of two exons of the Agrin gene was detected in SMA motor neuron samples that was not present in the wild-type samples or white matter control cells, resulting in a loss of Z+ Agrin. Staining experiments confirmed the decrease of Z+ Agrin at P1 and almost the total loss at P3 (Zhang et al. 2013). A drawback of this study was that only 2 replicates were studied for both the SMA and control samples. Additionally, splicing analysis was performed using the program MISO (Katz et al. 2010), which unfortunately does not support testing of replicates and as a result can skew splicing analysis results. Indeed, in


the figures published the decrease of Z+ Agrin is not consistent across the two SMA replicates

(Zhang et al. 2013).

Nonetheless, transgenic experiments have shown that Z+ Agrin in SMA animals can have beneficial effects (Kim et al. 2017a). SMA mice expressing Z+ Agrin under the Hb9 promoter were partially rescued for NMJ defects including endplate innervation and pre-synaptic abnormalities, although there was no change in denervation and only a limited increase in survival (Kim et al. 2017a). These experiments show that splicing changes do occur in SMA and they may be specific to motor neurons, as evidenced by the absence of misspliced Agrin in white matter cells. It also demonstrates that certain NMJ defects found in SMA mice can be partially rescued when the corrected isoform is expressed in neuronal tissues. Finally, although expression of Z+ Agrin gives some benefit to SMA mice, clearly other splicing changes must be present in SMA that are critical in causing the disease.

1.4.3 Stasimon

A study performed by the Pellizzoni lab specifically analyzed 25 genes that contained a total of 28 known U11/U12 introns in mouse NIH 3T3 cells treated with an interfering RNA against mouse Smn (Lotti et al. 2012). They found aberrant splicing to occur in 9 of these introns, while all U2 introns they analyzed were spliced correctly (Lotti et al. 2012). This indicates that while U12 introns are more sensitive to aberrant splicing in SMA animals, a majority are still spliced correctly. One of the aberrantly spliced genes was Tmem41b, renamed to Stasimon, which was also aberrantly spliced in a Drosophila loss of function Smn73Ao mutant (Lotti et al.

2012). Interestingly, when the Drosophila homolog CG8408 was knocked down, it resulted in an increase in evoked excitatory postsynaptic potential (eEPSP) amplitude of 127% compared to controls,


which is strikingly similar to Drosophila Smn mutants which have an increased eEPSP amplitude of 125% (Lotti et al. 2012). Curiously, pan-neuronal knockdown of CG8408 would correct the eEPSP amplitude, while knockdown in motor neurons would not, suggesting that the increase in eEPSP may be caused by some disruption of the motor circuit (Lotti et al. 2012). Abnormalities of the motor circuit have been identified in SMA animals, including hyperexcitability and a reduction of VGLUT1 positive synapses (Mentis et al. 2011; Fletcher et al. 2017). Importantly, the eEPSP amplitude was rescued in Smn mutants with transgenic expression of CG8408 cDNA, thus showing a link between aberrant splicing caused by low SMN and an abnormal neuronal phenotype. In zebrafish, coinjection of an smn MO and Stasimon RNA resulted in a reduction of motor axon branching defects that are known to be present in SMA zebrafish (Lotti et al. 2012).

The discovery of Stasimon, for the first time, linked a specific aberrant splicing event that occurs as a result of SMN deficiency with an aspect of the SMA phenotype. This strongly suggests that correction of certain splicing effects can ameliorate the SMA phenotype.

Unfortunately, correction of the Stasimon splicing defect by injection of scAAV9-Stasimon does not extend survival of SMA mice (data not published), suggesting other splicing changes may also underlie SMA. Identification of these defects is crucial to understanding SMA and opening up new avenues of treatment. The study of Stasimon has also shown the importance of the motor circuit in SMA. The increase in eEPSP amplitude was not corrected in Drosophila Smn mutants when Stasimon cDNA was expressed only in motor neurons (Lotti et al. 2012). As such, Stasimon was important in the cholinergic neurons that act as inputs to motor neurons.

Additionally, in mice Stasimon was found to be aberrantly spliced in DRG cells, though this has not yet been studied further (Lotti et al. 2012). Intron retention of Stasimon has been detected in


whole spinal cord RNA-seq experiments of SMA mice, though at low expression levels (Doktor et al. 2017). Stasimon has also been shown to localize to mitochondrial-associated ER membranes (Van Alstyne et al. 2018), though how this relates to SMA, if at all, is not known.

1.4.4 Other Splicing Changes

The zebrafish gene neurexin2a (nrxn2a) has a known function at the NMJ and has been identified as being aberrantly spliced in SMA (See et al. 2014). The gene nrxn2a was significantly downregulated by over 3-fold, as well as being aberrantly spliced, in zebrafish treated with an MO that knocks down smn (See et al. 2014). The neurexin family of genes is critically important for neurotransmitter release, and its members function as presynaptic cell adhesion molecules (de Wit et al. 2009). Injecting a splice-blocking ASO against nrxn2a resulted in a branching defect in motor neurons which mimics the phenotype of SMA zebrafish. Furthermore, co-injection of a full-length isoform of nrxn2a with an smn MO resulted in a partial rescue of the branching defect (See et al. 2014). Nrxn2a was also downregulated in spinal cord tissue from SMA mice at P2, indicating that the effect is conserved across species (See et al. 2014).

Interestingly, nrxn2 is a critical protein for maintaining excitatory synapses (de Wit et al. 2009), and thus, like Stasimon, may link a splicing change to disruption of the motor circuit.

1.5 Modifiers of SMA

1.5.1 Evidence of Modifiers

Although SMN2 copy number is strongly correlated with SMA severity, there are exceptions to this rule. These patients, sometimes called “exception patients”, have a disease that is milder or more severe than would be expected for their given SMN2 copy number (Cobben et al. 1995; Prior et al. 2004). This is most easily seen in haploidentical siblings who have the same copy number and have inherited the same disease alleles, yet have markedly different phenotypes (Burghes et al. 1994a; Cobben et al. 1995; Hahnen et al. 1995; McAndrew et al. 1997; Prior et al. 2004; Oprea et al. 2008a; Jedrzejowska et al. 2008; Bernal et al. 2011). These patients are referred to as “discordant siblings”. There is no evidence the modifying effect is unique to any type of SMA, as it has been reported in all SMA types. Thus there are SMA type 1 patients who have a SMA type 2 sibling (DiDonato et al. 1994, 1997; Pane et al. 2017), as well as SMA type 3 patients who have an unaffected sibling (Cobben et al. 1995; Hahnen et al. 1995). An example of a family with SMA discordant siblings can be seen in Figure 1.6. Similarly, the effect is not limited to one sex, as exceptionally mild cases have occurred in both males and females, nor has it been shown that exception cases occur more often in one sex than the other.

Figure 1.6. Pedigree with discordant SMA siblings (Cobben et al. 1995). SMN-7 = SMN1 exons 7 and 8, NAIP-5 = NAIP exon 5. A (+) sign indicates at least 1 copy detected, while (--) indicates homozygous deletion. Individuals II-8 and II-11 have an identical genotype at flanking markers D5S629 and D5S639 and with homozygous deletion of SMN1. They were affected with SMA type III with age of onset between 8 and 12 years. The siblings II-2, II-7, and II-9 have the same genotype including homozygous SMN1 deletion, however they remain unaffected.

An SMA modifier could exist inside or outside of the SMA region. The highly complex and repetitive nature of the SMA region makes it prone to rearrangements and deletions (Lefebvre et al. 1995; Campbell et al. 1997). Indeed, the density of Alu repeats is known to be considerably higher in the SMA region compared to other genomic regions (Chen et al. 1999). At least one partial deletion has been reported in SMN2. It was an Alu-mediated deletion 6.6 kb in length that eliminates exons 5 and 6 (Wirth et al. 1999). Such a deletion would result in non-functional protein but may still be detected when determining copy number as exon 7 remained, thus giving the false impression of an intact gene. An example of a positive modifier in the SMN2 gene is the sequence variant c.859G>C in exon 7, which has been shown to drastically reduce SMA severity when it is present in patients (Prior et al. 2009). This variant is hypothesized to affect the binding of splicing enhancers, leading to more inclusion of the critical exon 7. Indeed, the presence of this variant increases full-length SMN2 transcript by 20% (Prior et al. 2009). One copy of this variant was found in a SMA type 2 patient with two copies of SMN2. Interestingly, two copies of this variant were found in a SMA type 3 patient with two copies of SMN2 (Bernal et al. 2010), indicating that the effect is cumulative. Additional variants around exon 7 have been analyzed for their effect on exon 7 incorporation. Two of these have been shown to have an effect, namely A-44G and G100A, with A-44G having the most pronounced effect on exon 7 splicing at about 19% difference (Wu et al. 2017). In sum, partial deletions and sequence variants can have an effect on SMA phenotype, showing that not all SMN2 genes are equal in their capacity to produce SMN protein. However, these variants only explain a small subset of patients and the cause of the vast majority of SMA exception cases remains unknown.

1.5.2 Plastin3

The gene Plastin3 (PLS3) has been proposed as a sex-specific modifier of SMA that is active in females (Oprea et al. 2008a). However, studies where PLS3 levels were measured from blood samples have given mixed results (Oprea et al. 2008a; Stratigopoulos et al. 2010; Bernal et al. 2011; Yanyan et al. 2014). Comparing PLS3 expression in discordant SMA siblings shows no general correlation with a mild phenotype, including in female patients (Bernal et al. 2011; Yanyan et al. 2014). A correlation between PLS3 expression and a mild phenotype in female SMA patients could be discerned, but only if the data are separated by age group (Stratigopoulos et al. 2010; Yanyan et al. 2014). Controlling for sex, age, and SMN2 copy number eliminates these correlations (Stratigopoulos et al. 2010). Also, even when separating into age groups, PLS3 expression is only higher when comparing between SMA type 2 and SMA type 3 patients, while no difference is found between SMA type 1 and SMA type 2 patients (Stratigopoulos et al. 2010; Yanyan et al. 2014). This would imply that PLS3 may not work to modify severe SMA phenotypes. However, this is inconsistent with certain discordant sibling cases where the severe sibling has SMA type 1 and the mild sibling has SMA type 2 (DiDonato et al. 1994, 1997; Pane et al. 2017). In addition, maximal CMAP and MUNE did not correlate with PLS3 expression in mild discordant SMA patients (Stratigopoulos et al. 2010).

The first experiment that identified PLS3 as a candidate modifier was performed by the Wirth group. In this experiment, gene expression in lymphoblasts was measured for SMA exception cases (Oprea et al. 2008a). PLS3 was the only gene found to have higher expression in the mild individuals. However, there are some problems with this hypothesis. To start, there were 5 families analyzed and in all of them the mild sibling was a female. PLS3 expression in the two siblings was then compared. In one case (patients BW 421 and BW 422), PLS3 expression was almost identical between the mild and severe sibling (Oprea et al. 2008a). This was explained in the paper to be the case because PLS3 acts as a modifier only in females. However, the authors then compare PLS3 expression in males to PLS3 expression in females, which is not a valid comparison if PLS3 is a sex-specific modifier, since PLS3 expression is irrelevant in males as it will never modify the phenotype. There is only one family where the female mild sibling is compared to a female severe sibling (Oprea et al. 2008a). Furthermore, studies have been done on PLS3 expression in discordant SMA siblings which show no correlation between PLS3 expression and a mild SMA phenotype (Bernal et al. 2011; Yanyan et al. 2014). In fact, one study shows two mild SMA females who have less PLS3 expression, as measured from lymphoblasts, compared to their female SMA sibling (Bernal et al. 2011). Interestingly, in this study PLS3 expression was found to be over an order of magnitude higher in fibroblasts than in lymphoblasts (Bernal et al. 2011), calling into question the relevance of relatively small changes in PLS3 expression in lymphoblasts. Similarly, transgenic mice that over-express PLS3 mRNA by 100-fold show only a 2-fold increase in total PLS3 protein (McGovern et al. 2015b), again suggesting PLS3 mRNA measurements may not be reliable indicators of the amount of PLS3 protein present. This evidence contests the idea of PLS3 as a modifier of SMA and at a minimum suggests other modifiers exist.

Both mouse and zebrafish models of SMA have been used to study the effects of PLS3 on SMA phenotype. When zebrafish smn was knocked down using an antisense MO while simultaneously injecting human PLS3 RNA, motor neuron outgrowth defects that are known to be present in SMA zebrafish were corrected (Oprea et al. 2008b). Further zebrafish studies showed that expression of PLS3 in motor neurons could rescue an SV2 expression defect at the NMJ in smn-/- animals (Hao et al. 2012). Although these data suggest a possible protective role for PLS3 in the motor neurons of SMA zebrafish, slight increases in PLS3 expression failed to give a survival benefit (Hao et al. 2012).

Similarly, studies of PLS3 using SMA mice have given mixed results. The Wirth lab created mice overexpressing PLS3 and crossed them with the severe Taiwanese SMA mouse model (Ackermann et al. 2013). The SMA mice overexpressing PLS3 showed no increase in survival nor increase in weight compared to SMA mice (Ackermann et al. 2013). Similarly, there was no improvement in function in SMA mice overexpressing PLS3 compared to SMA mice, as measured by a tube test and righting reflex test (Ackermann et al. 2013). However, modest improvements were noted in the muscle and the NMJ: muscle fiber size of the gastrocnemius muscle at P10, endplate size, and the number of axons innervating endplates were all increased in the SMA animals overexpressing PLS3 compared to controls, though the latter was transient and disappeared as the mice aged (Ackermann et al. 2013). The Burghes lab created transgenic mouse lines over-expressing PLS3 and crossed them with SMNdelta7 mice (McGovern et al. 2015b). None of the transgenic PLS3 mouse lines had any survival or weight gain benefit over SMNdelta7 mice (McGovern et al. 2015b). Additionally, electrophysiology was studied in these mice, and there was found to be no improvement of endplate current (EPC), miniature endplate current (MEPC), quantal content, or EPC time constant in SMA mice over-expressing PLS3 compared to SMNdelta7 mice (McGovern et al. 2015b). Thus, PLS3 overexpression gives neither a survival nor a functional benefit in severe SMA mice.

The Wirth lab suggests organ defects due to low SMN may be masking beneficial effects of PLS3 (Ackermann et al. 2013). In support of this hypothesis, they performed experiments on SMA mice on a mixed background, which is known to result in a milder SMA phenotype (Eshraghi et al. 2016). SMA mice overexpressing PLS3 on a mixed background had a mean lifespan of 19.4 days compared to 16.9 days in controls (Ackermann et al. 2013). Thus, there is some evidence PLS3 can ameliorate SMA, though the scope is limited and it only has an effect in mild mice. Indeed, treatment with scAAV9-PLS3 gave no survival benefit in SMNdelta7 mice, whereas in Smn2B/- mice it increased median survival from approximately 30 days to 55 days (Kaifer et al. 2017). Interestingly, increasing the dose from 1e11 to 3e11 vector genomes had no effect on survival (Kaifer et al. 2017).

Similarly, administering an SMN splice-correcting ASO in transgenic mice over-expressing PLS3 in both the heterozygous (PLS3het) and homozygous state (PLS3hom) showed a remarkable rescue in survival (Hosseinibarkooie et al. 2016). The PLS3het mice lived 169 days (n = 23) on average (albeit with a large standard deviation of 176 days), which is dramatically longer than ASO-treated control mice that only lived 26 days (n = 22). The PLS3hom mice lived even longer on average, at 219 days (n = 11) (Hosseinibarkooie et al. 2016). The mechanism of action of PLS3 to extend survival in this experiment was hypothesized to be the rescue of endocytosis, based on a fluorescence-activated cell sorting experiment that measured the uptake of fluorescently labeled dextran (FITC-Dex) in murine embryonic fibroblast cells (Hosseinibarkooie et al. 2016). There was a significant reduction in FITC-Dex uptake in the SMA cells which was restored to wild-type levels in the cells derived from PLS3het mice (Hosseinibarkooie et al. 2016). Defects related to endocytosis have been reported in a Caenorhabditis elegans model of SMA including decreases in the number of docked synaptic vesicles (Dimitriadi et al. 2016). Interestingly, overexpression of PLS3 in mouse models of SMA results in the partial rescue of endocytotic defects such as the area occupied by synaptic vesicles at nerve terminals as well as the number of docked or fused vesicles (Ackermann et al. 2013).

In sum, there is much conflicting evidence of PLS3 being an SMA modifier. The initial report of PLS3 acting as a protective modifier was based on a single case of PLS3 over-expression in a mild discordant sibling. Using similar methods, other studies have shown no correlation or even negative correlation between PLS3 expression and SMA severity, where the mild female SMA sibling has lower PLS3 expression than the severe sibling (Stratigopoulos et al. 2010; Bernal et al. 2011). When a correlation is found, it is only when the data are separated by age, such as after 3 years of age or post-puberty (Stratigopoulos et al. 2010; Yanyan et al. 2014), and the biological relevance of such an analysis is questionable. In animals, treating SMA mice with AAV9-PLS3 had no effect on survival or electrophysiology (Kaifer et al. 2017). Only when PLS3 is given in conjunction with a modifying ASO is any benefit perceptible (Hosseinibarkooie et al. 2016). Interestingly, as PLS3 has been shown to increase endocytosis (Ackermann et al. 2013), it is possible that it is increasing uptake of the ASO, which would indeed lead to a milder phenotype. This explanation is consistent with human data that do not show mild phenotypes being associated with higher PLS3 expression (Stratigopoulos et al. 2010; Bernal et al. 2011). It also explains why there was a dose-dependent effect of PLS3 in the experiments with SMN-ASO treated mice, but not in experiments where Smn2B/- mice were treated only with varying doses of AAV9-PLS3 (Kaifer et al. 2017). The Wirth group maintains that this is a result of PLS3 only affecting mild phenotypes; however, discordant siblings have been found for all types of SMA, indicating that even SMA type 1 phenotypes can be modified (DiDonato et al. 1994, 1997; Pane et al. 2017). Thus, while PLS3 may give a benefit in combination with ASO treatment, it is not known if the modifying effect is due to increased uptake of the ASO, or due to some other biological process being corrected in SMA motor neurons.

1.5.3 Neurocalcin Delta

The gene NCALD was identified as a modifier of SMA after a parametric linkage analysis was performed on a large family with patients who had a homozygous deletion of SMN1 yet were asymptomatic (Riessland et al. 2017). A transcriptome analysis was performed concurrently which identified differential expression of NCALD in the asymptomatic patients (Riessland et al. 2017). Sequencing of the NCALD gene led to the discovery of two variants that were present in the asymptomatic patients. The first was rs147264092, which is a 2 bp insertion in intron 1, and the second is a 17 bp deletion which is 600 kb upstream of the NCALD gene (Riessland et al. 2017). It is hypothesized that the two variants work synergistically to modify the SMA phenotype, but no evidence was given to support this and there is no explanation as to how such modification occurs. The 17 bp deletion is adjacent to an enhancer as identified by the ENCODE project; however, this deletion alone is insufficient to have a modifying effect (Riessland et al. 2017).


There is some biochemical evidence that NCALD is involved with neuronal growth and development. Ncald knockdown in NSC34 cells deficient in SMN restored a neurite outgrowth defect present in those cells (Riessland et al. 2017). In zebrafish, knockdown of both smn and ncald improves motor axon outgrowth defects compared to just smn knockdown, specifically by reducing the number of severely truncated motor axons (Riessland et al. 2017). However, about 20% of motor axons still had a defect compared to controls. Experiments involving SMA mice with a heterozygous knockdown of Ncald show little improvement of SMA phenotype. SMA-Ncaldko/wt mice had no survival advantage over SMA mice (Riessland et al. 2017). Similarly, SMA-Ncaldko/wt mice treated with a suboptimal dose of a corrective SMN-ASO showed no survival advantage compared to ASO-treated SMA mice (Riessland et al. 2017). Interestingly, the SMA-Ncaldko/wt mice did have a small, yet significant, improvement in righting reflex score, grip strength, and NMJ morphology compared to SMN-ASO treated SMA mice (Riessland et al. 2017). These data show that although reduction of Ncald can improve strength and NMJ physiology in SMA animals treated with SMN-ASO, Ncald reduction alone is insufficient to extend survival in SMA animals.

In conclusion, NCALD was identified as a modifier of SMA based on the segregation of 2 alleles with 5 asymptomatic patients, as well as reduced NCALD expression in said patients. In addition, these 2 variants were also found in a type 0 patient who lived longer than expected (Riessland et al. 2017). That a mild type 0 patient had these 2 variants is contradictory to what the Wirth group consistently maintains, which is that modifiers cannot act in severe individuals due to systemic organ failure. Indeed, in papers exploring the effects of both PLS3 and NCALD, mild mouse models, including SMA mice treated with corrective ASO, were used explicitly for this reason. It is also problematic that no evidence has been provided showing how the 2 variants result in reduced NCALD expression. The 17 bp deletion is adjacent to an ENCODE regulatory region, but as the deletion has not been shown to associate with mild SMA patients and has been found in patients with the expected phenotype for their copy number of SMN2, this deletion alone is not able to have a modifying effect.

1.5.4 Myostatin

Myostatin is a member of the TGF-beta family of proteins and is a negative regulator of skeletal muscle mass (McPherron et al. 1997). Conditional knockout of the myostatin gene in mice results in a widespread hypertrophy of skeletal muscle (Grobet et al. 2003), which has resulted in speculation that myostatin inhibition could be used as a therapy for diseases involving muscle loss or weakness (Wagner et al. 2002). One method of myostatin inhibition is treatment with follistatin, which binds to myostatin and acts as an inhibitor of myostatin activity (Hill et al. 2003). Treatment of SMAdelta7 mice with follistatin resulted in increased muscle mass of the triceps and gastrocnemius as well as increased strength as measured by percent of animals able to right (Rose et al. 2009). However, maximal survival was not improved in the follistatin-treated group, though one dosage cohort experienced less early death (Rose et al. 2009). Further experiments testing follistatin overexpression or transgenic myostatin inactivation in SMA mice have similarly shown little benefit to motor function and survival (Sumner et al. 2009; Rindt et al. 2012). Recently, co-treatment of SMAdelta7 mice with a splice-correcting ASO along with the myostatin-inhibiting monoclonal antibody muSRK-015P resulted in significantly increased muscle mass and muscle strength in treated animals compared to controls, as well as a reduction in bone loss (Long et al. 2018). Thus, while myostatin inhibition alone has little effect in treating SMA, myostatin inhibition in conjunction with treatments for SMA that result in more full-length SMN can treat secondary aspects of the disease.

1.5.5 PTEN

The tumor suppressor protein phosphatase and tensin homolog deleted on chromosome 10 (PTEN) is thought to have a neuroprotective role with respect to neural injury (Gary and Mattson 2002), which is mediated through regulation of NMDA receptors (Ning 2004). PTEN depletion has been shown to encourage axon regeneration after optic nerve injury (Park et al. 2008). When a small interfering RNA targeting PTEN was injected into the hindlimb gastrocnemius muscle of SMNdelta7 mice on P1, the siRNA was able to reach the spinal cord via retrograde transport and resulted in a 35% increase in motor neuron survival (Ning et al. 2010). Systemic delivery of an siRNA targeted against PTEN in SMAdelta7 mice resulted in a dramatic 3-fold increase in survival, likely due to activation of either the AKT or mTOR pathways (Little et al. 2015). Activated AKT has been shown to promote growth factors through its downstream effectors while also inhibiting the JNK-signaling apoptotic pathway (Okouchi et al. 2007).

1.5.6 Ubiquitination

Ubiquitination is a post-translational modification that is involved in a myriad of cellular processes, the best known of which is targeting proteins for degradation via the proteasome (Sánchez-Sánchez and Arévalo 2017). Ubiquitination is a process where a ubiquitin-activating enzyme (E1) activates and transfers the ubiquitin protein to a ubiquitin-conjugating enzyme (E2), followed by a reversible reaction catalyzed by a ubiquitin ligase (E3), which transfers the activated ubiquitin from E2 to the target protein (Petroski 2008). SMN has been shown to be ubiquitinated and subsequently degraded via the proteasome (Chang et al. 2004). Specifically, it has been shown that the mouse E3 ubiquitin ligase mind bomb 1 (Mib1) ubiquitinates SMN, and knockdown of the C. elegans ortholog mib-1 using RNAi attenuates pharyngeal pumping defects that are present in smn-1 deficient animals (Kwon et al. 2013). Ubiquitin can be removed from proteins by a class of deubiquitinating enzymes. Such enzymes can stabilize proteins by preventing their degradation. The deubiquitinating enzyme USP9X deubiquitinates SMN, preventing it from being degraded (Han et al. 2012). Knockdown of USP9X in HeLa cells resulted in a significant decrease in the number of nuclear gems (Han et al. 2012), illustrating that USP9X may function in the stabilization or proper localization of SMN. Interestingly, SMN that had been monoubiquitinated by the protein ITCH failed to co-localize with the protein coilin, suggesting defects in ubiquitination could cause SMN to mislocalize (Han et al. 2016). Proteomic analysis of the synapse detected abnormalities in ubiquitin pathways in samples from SMA tissues, including a decrease in the E1 ubiquitin-like modifier activating enzyme (UBA1) protein and an increase of the ubiquitin ligase ubiquitin carboxy-terminal hydrolase L1 (UCHL1) (Hsu et al. 2010; Wishart et al. 2014). When uba1 was knocked down in zebrafish, it resulted in a branching phenotype of the motor neurons, which resembles a defect present in SMA zebrafish (Wishart et al. 2014). Taiwanese SMA mice that received a systemic injection of AAV9-UBA1 experienced an increase in median survival from P9 to P12 along with a significant increase in spinal cord SMN levels (Powis et al. 2016). In NSC34 cells, over-expression of UCHL1 decreased the level of SMN protein, while treating with a UCHL1 inhibitor led to an increased SMN level, suggesting that UCHL1 inhibition may be therapeutic to SMA animals (Hsu et al. 2010). However, severe SMA mice treated with a UCHL1 inhibitor via intraperitoneal injection experienced no increase in survival compared to SMA controls (Powis et al. 2014). In contrast, the compound ML372, which blocks ubiquitination of the SMN protein, modestly extends lifespan from 14 days to 18 days when administered to SMAdelta7 mice (Abera et al. 2016). These conflicting reports of modification of SMA via the ubiquitin pathway necessitate further study to confirm ubiquitin’s role in modulating the SMA phenotype.

1.5.7 Actin Dynamics

Many proteins that have been proposed as associating with SMN are involved in actin dynamics or mRNA transport. SMN has been found to interact with hnRNP Q and hnRNP R in both yeast two-hybrid screens and co-immunoprecipitation experiments (Rossoll et al. 2002). hnRNP R is known for its critical role in transporting β-actin mRNA to the terminals of axons of motor neurons by binding the 3' UTR of β-actin RNA (Rossoll et al. 2003). Staining of β-actin in motor neurons from SMA mice shows that there is significantly less β-actin in the growth cones of SMA mice compared to controls (Rossoll et al. 2003). It was also found that β-actin does not co-localize to growth cones with mutant hnRNP R, including hnRNP R that is mutated in the SMN binding domain (Rossoll et al. 2003). These data point to a possible role for SMN in transporting β-actin mRNA and thus an indirect involvement in actin dynamics.

SMN was also found by co-immunoprecipitation experiments to associate with α-COP (Peter et al. 2011), which forms part of a vesicle coat protein complex that is important for Golgi and endoplasmic reticulum trafficking. Fluorescence and time-lapse studies showed that α-COP and SMN co-localize and traffic with each other down growth cones in immature motor neurons (Peter et al. 2011). Knockdown of α-COP disrupted SMN localization as well as F-actin dynamics in neurites (Peter et al. 2011). This suggests that α-COP is important in localizing SMN to growth cones and that the absence of SMN at the growth cones disrupts neurite outgrowth, possibly as a result of altered actin dynamics. Interestingly, as mentioned earlier, one proposed modifier of SMA is PLS3, which is known to associate with actin, though the mode of action of PLS3 is not yet known and is very controversial.

1.5.8 Epigenetics

Epigenetic effects have been proposed to affect SMA phenotype, both within and outside of the SMN2 gene, though evidence for this is limited (Hauke et al. 2009; Zheleznyakova et al. 2013, 2015). One study found differences in CpG methylation at certain CpG dinucleotides, two of which were near an alternative transcription start site of SMN2 (Hauke et al. 2009). Although these 2 CpG dinucleotides were shown to modulate expression of a particular isoform of SMN2, this isoform makes up less than 5% of total SMN2 transcripts and hence does not result in a significant change in total SMN protein (Hauke et al. 2009). Outside of SMN2, methylation differences have been found between lymphoblast samples from SMA patients and healthy controls near the genes CHML and ARHGAP22, which are related to the activity of Rab and Rho GTPases (Zheleznyakova et al. 2013). However, the ability of these methylation changes to alter the SMA phenotype has yet to be demonstrated.

1.6 Therapies to treat SMA

1.6.1 Splicing Modulators of SMN1/2

SMA is a disease of low SMN levels and thus increasing full-length SMN is one possible way of treating the disease. As all SMA patients have either homozygous loss or mutation of SMN1, this would need to be accomplished via altering SMN2 expression or splicing to increase total full-length SMN protein. SMN1 and SMN2 are nearly identical and the coding sequence remains the same. There is a single nucleotide difference, a C to T change in exon 7 of SMN2, which disrupts the splicing of the transcript and is essentially the only sequence difference that defines SMN2 (Monani et al. 1999a). However, there are other sequence variants that exist between the 2 genes. There are 34 other sequence variants that differ between SMN1 and SMN2 if the promoter region is included (Monani et al. 1999a). These variants are found as polymorphisms or as a result of gene conversion events and they can occur at different frequencies (DiDonato et al. 1997; Burghes 1997; DiMatteo et al. 2008). Seventeen of these changes occur in intron 6, 2 in intron 7, and 1 in exon 8 (Monani et al. 1999a). These are notable, as splicing regulatory regions have been identified in intron 6 and intron 7 of SMN1/2 (Miyajima et al. 2002; Singh et al. 2006). In particular, the A-44G change in intron 6, which is usually only present in SMN1, has been shown to increase exon 7 incorporation, likely by affecting the binding of splicing modulators (Wu et al. 2017). The binding of splicing modulators is responsible for the vast difference in total functional protein produced by SMN1 compared to SMN2 (Lorson et al. 1999a). There is evidence that the C to T change affects the binding of both splicing enhancers and splicing silencers (Cartegni and Krainer 2002). A 7 bp sequence at the 5' end of SMN1 exon 7 matches the motif of SF2/ASF exonic splicing enhancers, which is eliminated in SMN2 (Cartegni and Krainer 2002). A compensatory mutation designed to re-establish the SF2/ASF motif in SMN2 was used in a minigene construct to test splicing efficiency, and it restored splicing to SMN1 levels (Cartegni and Krainer 2002). This shows that the disruption of an exonic splicing enhancer is a mechanism by which the C to T change alters SMN2 splicing at exon 7. In contrast, another study has shown that the binding sequence for hnRNPA1, a splicing silencer, is created with the C to T mutation (Cartegni et al. 2006). Mutations at the +7 and +8 positions of exon 7 of SMN1 and SMN2 were shown to increase the splicing of exon 7 in SMN2 but not SMN1 (Cartegni et al. 2006). In addition to hnRNPA1 acting as a splicing silencer, it may also disrupt binding at an adjacent Tra2-dependent ESE site (Kashima and Manley 2003). Thus, there is biochemical evidence for both the disruption and enhancement of splicing via splicing modulators as a result of the C to T mutation in exon 7 of SMN2.

There is also evidence that splicing modulators bind to intronic sequences around exon 7. Sixty-seven bp upstream from exon 7 lies a sequence named Element 1, which is 45 bp in length (Miyajima et al. 2002). Mutations or deletions in Element 1 resulted in increased exon 7 incorporation (Miyajima et al. 2002). Evidence from RNA affinity chromatography suggests that the proteins FUSE-BP and PTB bind Element 1 and it is hypothesized that they act as splice silencers (Baughan et al. 2009). There is also a sequence in intron 7, located just 3 bp away from the U1 snRNP binding site, which is shown to be a strong splicing silencer (Singh et al. 2006). This sequence is called ISS-N1, and its deletion greatly increased exon 7 incorporation in human patient fibroblasts (Singh et al. 2006). Furthermore, treatment of the GM03813 cell line, which lacks SMN1, with an ASO that binds to ISS-N1 restored exon 7 incorporation to levels seen in control fibroblasts (Singh et al. 2006). These data show the prevalence of splicing modifiers in the SMN1/2 genes, especially around exon 7. Mutations in the splicing regulatory regions were able to reverse their effect, suggesting that mutations in SMN2 alter exon 7 splicing, and thus also alter the phenotype. In short, identified splicing regulatory regions are therapeutic targets for the treatment of SMA. The repression of a negative splicing regulatory region or enhancement of a positive regulatory element would result in the generation of more full-length SMN protein.

1.6.2 ASO treatment of SMA

As mentioned above, in vitro experiments that targeted splice regulatory regions had very promising results of increasing full-length SMN transcript (Singh et al. 2006), which led to the development of a 2'-O-(2-methoxyethyl) (MOE) phosphorothioate-modified ASO (Hua et al. 2010). The lab of Adrian Krainer has demonstrated that Taiwanese SMA mice given this ASO through both intracerebroventricular (ICV) and subcutaneous (SC) injections at P1 had a median survival of 173 days, considerably longer than SMA controls that lived only 10 days (Hua et al. 2011). Interestingly, when the mice received just a single ICV injection, survival was extended only modestly, by about 10 days. They also found that while multiple injections at later timepoints extended survival to 137 days, the benefit was dampened compared to injection at earlier timepoints, emphasizing the importance of early treatment (Hua et al. 2011). These results suggest that multiple systemic injections are required for rescue. However, other research by the Burghes lab shows that a single ICV injection of a splice-correcting phosphorodiamidate morpholino (MO) ASO targeting ISS-N1 was able to extend survival of SMA mice to a median of 112 days (Porensky et al. 2012). In both cases there was no evidence of toxicity (Hua et al. 2011; Porensky et al. 2012). Thus, there is conflicting evidence as to whether systemic delivery is necessary. Regardless, these early experiments showed that treatment of SMA animals using ASOs that target ISS-N1 was safe and highly effective.

Indeed, the drug Nusinersen is an MOE ASO targeted to ISS-N1 and was developed by Ionis Pharmaceuticals and Biogen to be delivered via intrathecal injection. Results from a double-blind study of 122 patients demonstrated that 61% of treated patients survived without requiring permanent ventilation, compared to just 32% of the control group (Finkel et al. 2017). In addition, the patients in the Nusinersen group reached motor milestones significantly more often than patients in the control group (Finkel et al. 2017). In fact, just over half of all treated patients reached a motor milestone while no patients in the control group did. More specifically, 22% of treated patients gained full head control, 8% were able to sit independently, and 1% were even able to stand (Finkel et al. 2017). Adverse events were similar between the control and treated groups, demonstrating that Nusinersen was well-tolerated (Finkel et al. 2017). Overall, the study shows that ASO treatment is safe and effective for SMA patients.

Treatment was so effective in these studies that in December of 2016 the Food and Drug Administration in the United States approved Nusinersen for the treatment of SMA in all age groups. Interestingly, at the time Nusinersen had not been administered to any adults. From the patient study, it is known that Nusinersen treatment is effective even after symptoms present; however, early treatment is more effective, as evidenced by the fact that earlier administration of the drug decreased the need for permanent ventilation in patients (Finkel et al. 2017). Thus, it is not fully known how less severe and older SMA patients will respond to treatment, though clinical trials of Nusinersen in SMA type 2 patients have similarly been successful (Mercuri et al. 2018).

Additionally, it is not known when treatment should begin in patients who have homozygous deletion of SMN1 but are asymptomatic. This is especially important for patients with higher copy numbers of SMN2 (4 copies or greater) who may not show symptoms until early adulthood or later. Delivery of the ASO to the CNS is a complex procedure and not without risks. Additionally, the ASO has a half-life of approximately four months and thus needs to be re-administered continuously for the treatment of the disease (Chiriboga et al. 2016). Treatment of the disease too early would be very costly and pose unnecessary risks to the patient. Finally, rescue of SMA animals using an MOE or MO ASO has been shown to be dose dependent. More research is needed to answer these questions related to ASO treatment in SMA patients.

1.6.3 Valproic Acid treatment of SMA

Small molecules are one method of increasing total full-length SMN, and can act by either increasing expression or modulating splicing of SMN2. Valproic acid (VPA) is one compound that has been extensively studied for the treatment of SMA. VPA is an FDA-approved drug for the treatment of numerous neurological ailments including epilepsy and migraines (Labiner 2002). Treatment of fibroblasts derived from SMA patients with increasing amounts of VPA led to increased levels of full-length SMN transcript (Brichta et al. 2003). The increase is due both to increased transcription and to augmented splicing at exon 7 (Brichta et al. 2003). VPA is known to be a histone deacetylase inhibitor (Leng and Chuang 2006), and thus it is thought to alter transcriptional regulation of the SMN2 gene. However, benefits of VPA were limited in an open-label Phase II study where VPA was administered to SMA patients. Younger patients (less than 5 years of age) had the best improvement, with a majority of patients showing an improvement in Hammersmith Motor Function (Swoboda et al. 2009). Changes in CMAP were reported in 2 clinical trials of VPA in SMA patients (Swoboda et al. 2009; Kissel et al. 2011). One complication was a deficiency of L-carnitine (Swoboda et al. 2009), as carnitine is known to be eliminated from the body as a result of VPA (Melegh et al. 1994). It was not known if carnitine deficiency was impacting motor function, though later trials where both carnitine and VPA were given to patients showed no clinically relevant changes in patients (Krosschell et al. 2018).

1.6.4 Gene therapy

Gene therapy is the delivery or correction of a gene in order to treat a patient. Delivery is accomplished using a viral vector. Delivery of a gene using self-complementary adeno-associated virus (scAAV9) is ideal for a disease like SMA, which is caused by loss of function of a single, small gene that can be packaged inside of the vector. Self-complementary adeno-associated virus has been shown to be an effective vector for delivery of recombinant DNA to a wide variety of tissues including heart, muscle, and neurons (McCarty 2008). Neonatal mice that received an intravenous injection of scAAV9-GFP at P1 showed widespread expression of GFP amongst motor neurons, indicating that at early timepoints scAAV9 was able to cross the blood-brain barrier (Foust et al. 2009). Interestingly, when adult mice were intravenously injected with scAAV9-GFP, around 64% of astrocytes from lumbar spinal cord were found to be transduced, whereas lower motor neurons were rarely transduced (Foust et al. 2009). Another experiment has shown that up to 28% of lower motor neurons of both cervical and lumbar spinal cord from adult mice could be transduced using scAAV9; however, the results were extremely variable between replicates, with the lowest having only 2% transduction (Duque et al. 2009). Thus, while scAAV9 crosses the blood-brain barrier at both neonatal and adult timepoints, early delivery is necessary for transduction of motor neurons. However, it was also demonstrated that delivery of scAAV9 directly into the CNS of 5-day-old pigs via intrathecal or intracisternal injection efficiently transduced motor neurons (Bevan et al. 2011). Furthermore, delivery into the cerebrospinal fluid (CSF) rather than intravenously was found to be highly effective at targeting motor neurons, even at 1/10th of the dose, and could be further improved by placing animals in the Trendelenburg position after delivery (Meyer et al. 2015). These experiments showed that systemic injection of scAAV9 is effective for gene delivery in younger animals, and direct injection into the CSF is a viable option for larger subjects.

Experiments in SMNdelta7 mice have shown that a single delivery of SMN using AAV9 at P1 can increase lifespan from 15 days to over 250 days (Foust et al. 2010). Righting time was also corrected. Interestingly, weight was only partially rescued, as the SMN-treated mice were smaller than control animals for their entire life (Foust et al. 2010). Treatment at later timepoints showed a decrease in efficacy, with only a 15-day extension of survival when injecting at P5 and no appreciable difference when injecting at P10 (Foust et al. 2010). This emphasizes the importance of early treatment in SMA. Clinical trials of scAAV9-SMN delivery in 15 SMA patients have shown very promising results (Mendell et al. 2017). Patients were divided into a low-dose and a high-dose cohort, and all patients had a disease onset between 1 and 3 months, making them classic SMA type 1 cases.

No treated patient required permanent ventilatory assistance, in contrast to 92% of patients at the same age according to natural history data (Kolb et al. 2017; Mendell et al. 2017). Treated patients showed gains in motor function according to the CHOP INTEND scale, with 11 patients having a score of 40 or greater, which is a level never attained by Type 1 patients (Mendell et al. 2017). Patients in the low-dose cohort never achieved a score of 40 on CHOP INTEND, though all of them improved. This shows that there is a dose-dependent response to the treatment. Most impressively, 2 patients gained the ability to walk independently, which is unheard of for Type 1 patients (Mendell et al. 2017).


The clinical trials have shown that scAAV9-SMN treatment for SMA is effective and safe. The need to administer only a single dose is a large advantage of this treatment. There are some drawbacks, however. Two patients did have a treatment-related adverse reaction, related to aminotransferase levels in serum, which was resolved with prednisolone treatment (Mendell et al. 2017). Another barrier to gene therapy treatment is the prevalence of individuals who are seropositive for AAV9 antibodies. These individuals cannot receive the treatment. Indeed, 1 individual who tested positive was excluded from the study. One study suggested that approximately 10% of the population has antibodies of some kind to adeno-associated virus, and that this prevalence appears to rise with age (Harrington et al. 2016). Currently, there are clinical trials underway studying the intrathecal delivery of AVXS-101 (clinicaltrials.gov identifiers NCT03381729 and NCT03505099). In December of 2018, the Food and Drug Administration accepted the filing and granted Priority Review for AVXS-101 (now known as Zolgensma), with regulatory approval expected in 2019.

1.6.5 Risdiplam

Risdiplam is another small molecule that is a very promising treatment for SMA. Risdiplam is a compound chemically similar to another small molecule, RG7800 (Sturm et al. 2018). RG7800 was found to significantly increase the amount of full-length SMN2 transcript by over 1.5-fold in SMA type 1 patient-derived fibroblasts (Ratni et al. 2016). Additionally, SMNDelta7 mice receiving 10 mg/kg/day of RG7800 experienced a significant rescue of survival and weight gain, with 80% of the mice living at least 70 days (Ratni et al. 2016). RG7800 entered human clinical trials, where it was shown to increase SMN protein levels by 2-fold in SMA patients; however, the trial was halted due to concerns of retinal toxicity (Kletzl et al. 2018). Risdiplam was developed as an optimization of RG7800 and is expected to have reduced retinal toxicity (Ratni et al. 2018). Indeed, a recent phase I study in healthy subjects demonstrated that Risdiplam was well-tolerated and led to a 42% increase in full-length SMN2 RNA (Sturm et al. 2018). One advantage of Risdiplam is that it is taken orally, in contrast to ASO treatment, which must be given through a spinal injection. Risdiplam is currently undergoing clinical trials in pediatric, adult, and pre-symptomatic SMA patients (clinicaltrials.gov identifiers NCT03032172 and NCT03779334).


Chapter 2

Complete sequencing of the SMN2 gene in SMA patients detects SMN gene deletion junctions and variants in SMN2 that modify the SMA phenotype

Several authors contributed to this work. Vicki McGovern was crucial to these experiments and performed the BAC capture and library prep, as well as validation of copy number using ddPCR. Matthew R. Avenarius and Pamela J. Snyder in the lab of Thomas W. Prior provided samples and performed screening of the deletion junction. Flavia C. Nery, Abdurrahman Muhtaseb, Jennifer J. Siranosian, Alec J. Johnstone, Pann H. Nwe, and Ren Z. Zhang in the lab of Kathryn J. Swoboda provided patient samples and maintained a patient phenotype database. Jennifer Sinott assisted with mathematical modeling of the junction. Jennifer S. Roggenbuck, John T. Kissel, and Valeria A. Sansone provided patient samples.

2.1 Introduction

The SMN2 copy number correlates with phenotypic severity, where higher copy numbers result in a less severe SMA phenotype (McAndrew et al. 1997; Burghes 1997; Feldkötter et al. 2002; Mailman et al. 2002; Jedrzejowska et al. 2009; Calucho et al. 2018). However, exceptions to this rule are known to occur (Cobben et al. 1995; Prior et al. 2004). For example, siblings with identical SMN2 copies inherited from the same 5q13 region yet discordant SMA phenotypes have been reported (Burghes et al. 1994a; Cobben et al. 1995; Hahnen et al. 1995; McAndrew et al. 1997; Prior et al. 2004; Oprea et al. 2008b; Jedrzejowska et al. 2008; Bernal et al. 2011). These sib pairs are referred to as “discordant siblings” and they can occur in all phenotypic forms of SMA. For example, there are SMA type 1 and type 2 siblings reported with the same SMN2 copy number (DiDonato et al. 1994, 1997; Pane et al. 2017), as well as siblings with SMA type 2 and type 3b, and cases of SMA type 3a with a phenotypically normal sibling (Cobben et al. 1995; Hahnen et al. 1995). Furthermore, 3 copies of SMN2 is the most common genotype for both SMA type 2 and type 3 (Calucho et al. 2018), indicating that SMN2 copy number alone poorly predicts severity of the disease in individuals with 3 or more copies of SMN2.

Discrepancy between SMN2 copy number and the expected SMA phenotype may result from partial deletions of SMN2 and modifying variants inside or outside of the SMA region (modifiers outside of SMN2 are investigated and discussed further in Chapter 3). Inside of SMN2, the variant c.859G>C in exon 7 has been demonstrated to be a positive modifier of SMA by causing an approximately 20% increase in exon 7 incorporation, resulting in more full-length SMN protein (Prior et al. 2009; Vezain et al. 2010). As expected, this variant has never been reported in a severe SMA type 1 patient but is found in SMA type 2 patients with 2 copies of SMN2 in a heterozygous state, and in SMA type 3 patients with 2 copies of SMN2 in a homozygous state (Bernal et al. 2010). Additional variants in SMN2 introns 6 and 7 have been shown to alter the incorporation of the critical exon 7 (Wu et al. 2017). In particular, the A-44G variant, which is normally present in SMN1 but can be found in SMN2 due to gene conversion events, was shown to have about a 20% effect on SMN2 exon inclusion (Burghes 1997; Monani et al. 1999b; Wu et al. 2017). This clearly indicates that the amount of SMN produced by a particular SMN2 gene is critical in determining phenotypic severity and that SMA phenotype can be modulated by variants in SMN2.

There are numerous pathogenic mutations which have been reported in SMN1 that can cause SMA. Missense mutations in SMN1 can have a wide degree of severity, with severe mutations drastically limiting the amount of functional protein produced. The missense mutation p.Y272C is in the oligomerization domain of SMN and, as such, disrupts the ability of SMN to self-associate (Lorson et al. 1998a). There is also the severe mutation p.E134K in the Tudor domain, which disrupts the ability of the SMN complex to bind Sm proteins (Bühler et al. 1999). Numerous mutations that result in a frameshift have also been reported, as well as the c.922+6T mutation of intron 6, which results in SMA (Wirth et al. 1999; Sun et al. 2005). However, the mutational profile of SMN2 is not as well known. The p.G278R mutation of exon 7 has been found in SMN2 (Feldkötter et al. 2002), but it is not known what other mutations reside in SMN2 that affect its ability to produce functional protein, or how common such mutations are.

The SMA genomic region is known to be unstable and prone to rearrangements, as evidenced by multiple banding patterns in pulsed-field gel electrophoresis experiments performed using probes for SMN1 and the adjacent gene neuronal apoptosis inhibitory protein (NAIP) (Campbell et al. 1997). The region is often depicted as an inverted repeat of 500 kilobases (kb) (Lefebvre et al. 1995). However, the region appears considerably more complex, with multiple arrangements in different individuals (Campbell et al. 1997; Burghes 1997). The NAIP gene and the Small EDRK-Rich Factor (SERF1A) gene flank the SMN1 gene (Chen et al. 1998), while a NAIP pseudogene and a copy of the SERF1B gene flank the SMN2 gene. Due to the high variation and difficulty in assembling this large repeat region, no consensus map of the SMA region exists. The loss of the SMN1 gene can occur via deletion or by gene conversion to SMN2 (Campbell et al. 1997; DiDonato et al. 1997; Burghes 1997). The larger deletions are also marked by loss of the intact NAIP or the probe XS2G3 (Roy et al. 1995; Thompson et al. 1995). To date, there is limited knowledge about the extent of deletions, and no deletion junction has been defined which removes exons 7 and 8 of either SMN1 or SMN2 (SMN1/2). However, at least one internal Alu-mediated deletion has been reported that eliminates exons 5 and 6 (Wirth et al. 1999).

Several genetic modifiers of the SMA phenotype have been reported. One proposed modifier that lies outside the SMA region is the plastin 3 (PLS3) gene, located on the X chromosome, whose expression level has been reported to alter the severity of the SMA phenotype. This is based on higher expression levels of PLS3 in lymphoblasts from mild exception patients, but the effect is sex-dependent and only partially penetrant (Oprea et al. 2008b; Bernal et al. 2011). The degree to which PLS3 expression modifies the SMA phenotype remains controversial, in part due to experiments in SMA mice showing no survival or electrophysiological benefit of PLS3 over-expression (McGovern et al. 2015b; Burghes and McGovern 2017). No DNA variant that accounts for the altered PLS3 expression has been reported to date.

In this work, we utilized an adaptation of the targeted sequencing technique Multiplexed Direct Genomic Selection (MDiGS) (Alvarado et al. 2014) to define all the variants that occur in the SMN1, SMN2, and PLS3 genes of 217 SMA patients. We defined, for the first time, a deletion junction that removed SMN1/2 exons 7 and 8 but retained exons 1-6. We tested for this deletion junction in a separate group of 466 individuals who have various copy numbers of SMN1 and SMN2. These data, as well as pedigree analysis, showed that the deletion can occur in SMN1 or SMN2. By using the read depth across the SMN2 gene we determined the copy number of SMN1/2, and the copy number was consistent with that determined by droplet digital PCR (ddPCR). Variants and indels in SMN2 were analyzed for their association with SMA patients with exceptionally mild or severe phenotypes not predicted by their copy number of SMN2. The intron 6 variants A-44G, A-549G, and C-1897T of SMN2 showed a statistically significant correlation with milder than expected exception patients. These variants were found in SMN2 but are typically associated with the SMN1 gene (Monani et al. 1999b; Wu et al. 2017). No variant or indel in PLS3 was found to significantly associate with mild or severe SMA patients. Thus, a majority of patients had fully intact SMN2 genes with no evidence of modifying mutations within the SMN2 gene, implying that modifying variants in a majority of cases lie outside of the SMA region. The patient sample set presented here is an ideal dataset for the further testing of candidate modifiers of SMA.

2.2 Methods

2.2.1 DNA samples

This study used multiple sources of DNA including samples previously collected for linkage analysis or molecular studies of SMA (Burghes et al. 1994a; McAndrew et al. 1997; Miller et al. 2001) under the Institutional Review Board (IRB) of The Ohio State University OSU1988H0371 that were de-identified and determined to be exempt by the IRB. This consisted of 80 OSU SMA samples as well as 10 parents or siblings. Thirteen new SMA samples were collected under IRB No. 2015H0115, giving a total of 93 SMA patients. From Massachusetts General Hospital (MGH), we included 127 samples from the Project Cure SMA Longitudinal Pediatric Data Repository, University of Utah IRB No. 8751 and Partners IRB No. 2016-P000469. The MGH samples were de-identified to the OSU investigators. A total of 217 SMA samples were sequenced and analyzed by MDiGS.

A set of 466 de-identified independent samples with known and varying copy number of SMN1 and SMN2 from The Ohio State University Molecular Pathology Laboratory was used to screen for deletion junction frequency. These samples were determined exempt by the OSU IRB.

2.2.2 Classification of patients

A total of 217 SMA patients were sequenced. Phenotypic information was known for all but 29 samples. Phenotype was reported by the physician when samples were collected and was based on the approved criteria of age of symptom onset and maximum achieved motor function (non-sitters, sitters, and walkers). Based on the phenotype and MDiGS-determined copy number, we classified each patient as concordant, mild, or severe. We defined concordant patients as those who had an expected SMN2 copy number for their given level of disease severity. For testing purposes, we used the following model to define expected copy number: SMA type 1 has 2 copies of SMN2, type 2 has 3 copies of SMN2, and type 3 has 4 copies of SMN2. This model allows us to identify exception (discordant) patients that have either a milder or more severe phenotype compared to SMA patients that are concordant with SMN2 copy number and severity. It also allows for testing the association of SMN2 and PLS3 variants with severity of SMA while accounting for SMN2 copy number. In the cohort we studied (n = 217), we identified 77 exception patients, 58 with a milder than expected phenotype and 19 with a more severe phenotype.
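The classification rule described above can be written compactly. The following is a minimal Python sketch, not code from this study: the function name classify_patient and the restriction to SMA types 1 to 3 (ignoring subtypes such as 3a/3b and type 0) are assumptions made purely for illustration.

```python
# Expected SMN2 copy number for each SMA type, as stated in the text:
# type 1 -> 2 copies, type 2 -> 3 copies, type 3 -> 4 copies.
EXPECTED_SMN2_COPIES = {1: 2, 2: 3, 3: 4}


def classify_patient(sma_type, smn2_copies):
    """Classify a patient as 'concordant', 'mild', or 'severe'.

    'mild' means the phenotype is milder than the SMN2 copy number predicts
    (fewer copies than expected for the observed type); 'severe' means the
    phenotype is more severe than predicted (more copies than expected).
    Hypothetical helper: handling of subtypes is not modeled here.
    """
    expected = EXPECTED_SMN2_COPIES[sma_type]
    if smn2_copies == expected:
        return "concordant"
    if smn2_copies < expected:
        return "mild"
    return "severe"


if __name__ == "__main__":
    # A type 3 patient with only 2 copies of SMN2 is a mild exception.
    print(classify_patient(3, 2))
    # A type 1 patient with 3 copies of SMN2 is a severe exception.
    print(classify_patient(1, 3))
```

Under this sketch, a patient is an exception whenever the observed copy number differs from the single expected value for the type; any tolerance or subtype-specific rule would need to be added explicitly.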

2.2.3 MDiGS sequencing

Genes of interest were captured and sequenced using an adaptation of the Multiplexed Direct Genomic Selection (MDiGS) assay (Alvarado et al. 2014). Using the same method as described in Alvarado et al., we pooled 48 indexed DNA samples. For BAC capture we used the clones RP11-652K3 (CFTR), RP11-1056O6 (CFTR), and RP11-268A15 (PLS3) obtained from the BACPAC Resource Center at Children’s Hospital Oakland Research Institute in Oakland, California. RP11-652K3 and RP11-1056O6 have approximately 88 kb of overlap and together cover the entire CFTR gene (Osoegawa et al. 2001). For the capture of SMN2 we used a 35.5 kb portion of the clone RP1-215P15. This portion contains the entire SMN2 gene flanked by BamHI sites and had previously been cloned into the BAC pIndigoBac5 (Epicentre) SMN26.6 (BAC5 SMN2) (Hao et al. 2011). Four cosmids (108F4, 121C9, 22A5, 30C9) flanking the SMN2 gene were used to block non-specific capture. We confirmed by PCR that the cosmids do not contain SMN2 (DiDonato et al. 1994; DiDonato 1995; Thompson et al. 1995). The biotinylated capture BACs were then hybridized with the pooled indexed DNA library for more than 70 hours. The DNA library then contained only sequences hybridized to SMN, PLS3, or CFTR. The 48 samples were run on a single MiSeq lane and then decoded using the indices.

2.2.4 Bioinformatics

Reads were aligned using the program STAR (Dobin et al. 2012) to a custom-made genome based on human reference hg19. This genome masked the SMN1, SMN2, CFTR, and PLS3 genes. In addition, the sequences of the clones RP1-215P15, RP11-652K3, RP11-1056O6, and RP11-268A15 were appended to the genome as separate contigs. Only the 35.5 kb sequence of RP1-215P15 that was captured was included in the genome.

Copy number was determined by calculating the ratio of reads aligned to RP1-215P15 (SMN2) compared to RP11-652K3 (CFTR) and RP11-1056O6 (CFTR). This is the ratio of reads aligned to SMN2 (and SMN1 if present) compared to CFTR (i.e. SMN1/2 / CFTR). We used 3 samples with known SMN2 copy number to normalize the read counts, as the clones have different capture efficiencies and lengths. Since the copy number of CFTR should be 2 for all individuals, as it is on an autosome, this normalized ratio gave the number of copies of SMN1 and SMN2. Additionally, RP1-215P15 was divided into equal-length bins of 1,815 bp, and SMN1/2 copy number was found for each bin, again by calculating the SMN1/2 / CFTR ratio of reads in that region. Variants and indels were called using the Genome Analysis Toolkit (DePristo et al. 2011). Duplicate reads were marked and removed using Picard Tools MarkDuplicates. Reads were realigned around known insertions and deletions (indels) in the SMN2, CFTR, and PLS3 genes. Base quality scores were then recalibrated for each read. HaplotypeCaller was used to call the variants, and the variants were then hard filtered. The presence and copy number of SMN1 were determined by analyzing variant calls for the C to T change at the +6 position of exon 7.
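The read-ratio copy number calculation described above can be illustrated with a short Python sketch. This is not the deposited analysis code: the function names, the example read counts, and the choice to average the scale factors from the known-copy-number samples are all assumptions, since the exact normalization procedure is not spelled out in the text.

```python
def capture_scale(controls):
    """Estimate copies per unit of SMN1/2 / CFTR read ratio.

    controls: list of (smn_reads, cftr_reads, known_copies) tuples from
    samples with known copy number. Averaging the per-sample factors is
    an assumption made for this sketch.
    """
    factors = [known / (smn / cftr) for smn, cftr, known in controls]
    return sum(factors) / len(factors)


def copy_number(smn_reads, cftr_reads, scale):
    """Convert a sample's SMN1/2 / CFTR read ratio into a copy number.

    The scale factor absorbs differences in clone length and capture
    efficiency, and the fact that CFTR is present in 2 copies.
    """
    return scale * smn_reads / cftr_reads


if __name__ == "__main__":
    # Hypothetical read counts for three control samples (2, 3, 4 copies).
    controls = [(1000, 2000, 2), (1500, 2000, 3), (2000, 2000, 4)]
    scale = capture_scale(controls)
    # An unknown sample with a quarter as many SMN as CFTR reads.
    print(round(copy_number(500, 2000, scale), 2))
```

The same two functions could be applied per bin by passing the reads aligned to a single 1,815 bp bin in place of the whole-contig SMN read count, with a per-bin scale factor if capture efficiency varies along the contig.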

Variants were tested for association with mild or severe exception patients by performing a Fisher’s exact test using a 2 x 2 contingency table and comparing allele counts in exception patients to concordant patients. We defined concordant patients as those who had an expected SMN2 copy number for their given level of disease severity. P-values were corrected for multiple testing using a False Discovery Rate calculation in R (R Core Team 2013). To reduce false positives, variants were only tested if more than 15% of reads at that position contained the variant.
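As an illustration of the association test described above, the following is a minimal, standard-library Python sketch of a two-sided Fisher's exact test on a 2 x 2 table followed by a false discovery rate adjustment. It is not the R code used in this work. The Benjamini-Hochberg procedure is assumed here as the FDR method (the text does not name one), and the function names and example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical allele counts: variant / reference alleles in
    # exception patients (row 1) and concordant patients (row 2).
    p = fisher_exact_two_sided(8, 12, 3, 57)
    print(p)
    print(benjamini_hochberg([p, 0.20, 0.03]))
```

Each tested variant contributes one p-value, and the adjustment is applied across all variants tested together.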

We analyzed the captured region for possible long deletions in our samples using two methods. First, we designed a Python script that extracted all aligned reads with gaps greater than 20 base pairs based on CIGAR strings containing the ‘N’ operation (Li et al. 2009). The script then printed out the length of every gap and the number of occurrences per sample. Second, we plotted the CFTR vs. SMN1/2 read count ratio over bins (1,815 bp per bin) for each sample. Any sudden decrease in copy number in a particular bin suggested a possible deletion at that location. Individual reads at sites of possible deletions were analyzed for gaps, partial alignments, and mismatches to determine the break point. Break points were then confirmed via PCR. Similarly, a mismatch between the copy number when determined over the whole gene and copy number when determined at exon 7 was evidence of a possible duplication or deletion. All scripts have been deposited to GitHub and are available at https://github.com/BurghesLab/SMN2Analysis.
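The first method, scanning CIGAR strings for long gaps, can be sketched as follows. This is an illustrative reconstruction and not the deposited script (which is available at the GitHub address given above); the function names and the example CIGAR strings are hypothetical, and reading alignments from a SAM/BAM file is left out so that the example stays self-contained.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code
# (operation codes as defined in the SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def long_gaps(cigar, min_gap=20):
    """Return the lengths of all 'N' gaps longer than min_gap bases."""
    return [int(length) for length, op in CIGAR_OP.findall(cigar)
            if op == "N" and int(length) > min_gap]


def gap_counts(cigars, min_gap=20):
    """Count how often each long gap length occurs in one sample."""
    counts = Counter()
    for cigar in cigars:
        counts.update(long_gaps(cigar, min_gap))
    return counts


if __name__ == "__main__":
    # Hypothetical CIGAR strings from one sample's aligned reads.
    sample = ["150M", "50M6310N100M", "30M6310N120M", "75M10N75M"]
    for length, n in sorted(gap_counts(sample).items()):
        print(length, n)
```

A gap length that recurs across many reads in a sample, such as a repeated 6310 in this made-up input, is the kind of signal that would then be followed up by inspecting the individual reads.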

2.2.5 Determination of SMN1 and SMN2 copy number by ddPCR

Copy number as determined by MDiGS was validated using droplet digital PCR on the QX200 Droplet Digital PCR system (BioRad). Locked nucleic acid (LNA) probes with a competitive non-extending oligo were used to determine SMN1 and SMN2 copy number using the primer set (5’-AATGCTTTTTAACATCCATATAAAGCT-3’ and 5’-CCTTAATTTAAGGAATGTGAGCACC-3’) (Anhuf et al. 2003), the SMN1 probe 5’-FAM-CAGG+GTT+T+C+AGACAAA-3’ with competitive oligo 5’-ATTTTCCTTACAGGGTTTtAGACAAAATCAAAAGA-PHO-3’, and the SMN2 probe 5’-FAM-TGATTTTGT+C+T+A+A+AA+CCCT-3’ with competitive oligo 5’-ATTTTCCTTACAGGGTTTcAGACAAAATCAAAAGA-PHO-3’ (Pyatt and Prior 2006). In every multiplex reaction, exon 14 of CFTR was amplified as a two-copy control and used to determine copy number (FP 5’-AGAGAGAAGGCTGTCCTTAGT-3’, RP 5’-GAGTGTGTCATCAGGTTCAGG-3’, probe 5’-HEX-TTCTGAGCAGGGAGAGGCGATACT-3’).
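Because CFTR serves as a two-copy reference in every reaction, a copy number follows from the ratio of target to reference signal. The sketch below shows that arithmetic in Python under stated assumptions: it uses the standard Poisson correction for digital PCR (mean copies per droplet estimated as the negative log of the fraction of negative droplets), which in practice is performed by the instrument software; the function names and droplet counts are hypothetical and do not come from this study.

```python
from math import log


def copies_per_droplet(positive, total):
    """Poisson-corrected mean number of template copies per droplet."""
    return -log(1 - positive / total)


def ddpcr_copy_number(target_positive, ref_positive, total, ref_copies=2):
    """Estimate target copy number relative to a reference locus.

    Assumes target and reference were measured in the same multiplexed
    reaction (same total droplet count) and that the reference (here
    CFTR) is present in ref_copies copies per genome.
    """
    target = copies_per_droplet(target_positive, total)
    reference = copies_per_droplet(ref_positive, total)
    return ref_copies * target / reference


if __name__ == "__main__":
    # Hypothetical droplet counts from one multiplexed well.
    print(round(ddpcr_copy_number(3000, 3000, 15000), 2))
```

Equal numbers of positive droplets for target and reference correspond to 2 copies of the target; the Poisson correction matters most when a large fraction of droplets is positive.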

2.2.6 PCR detection of the SMN1/2 intron 6 deletion junction

Once possible deletion junctions were identified bioinformatically, they were confirmed using conventional PCR. Primers were designed that flank the junction and do not contain repeats as identified using RepeatMasker (Smit et al. 2013). The forward primer used was 5’-CAGTTATCTGACTGTAACACTGTAGGC-3’ and the reverse primer used was 5’-GTTGTTGCTTATGCTGGTCTTG-3’, generating a 650 bp product. For 3 individuals the PCR product was subcloned into the pCR 2.1-TOPO TA vector (Thermo Fisher) and Sanger sequenced for confirmation.

2.2.7 Determination of the inheritance of alleles in family with deletion junction by ddPCR

To determine inheritance of alleles, primers and probes were made to SMN exon 1 (FP 5’-TGTTCCGCTCCCAGAAG-3’ and RP 5’-CTCATCGCCATAGCAAACC-3’, probe 5’-FAM-TTAAGAGTGACGACTTCCGCCGC-3’), exon 2a (FP 5’-TTTATTTCTTACCCTTTCCAGAGC-3’ and RP 5’-AAATGAAGCCACAGCTTTATCA-3’, probe 5’-FAM-TCTGACATTTGGGATGATACAGCACTGA-3’), and exon 6 (FP 5’-CACCTCCCATATGTCCAGATTC-3’ and RP 5’-CCAGTATGATAGCCACTCATGT-3’, probe 5’-FAM-TCTTGATGATGCTGATGCTTTGGGAAGT-3’). To determine the number of junction fragments created by the deletion, we designed a primer-probe set where the forward primer (FP 5’-ATACAAGTTGGCTGGGCACAA-3’) was located in intron 6 and the reverse primer (RP 5’-TTTTACTATGTTGGCCAGGCTG-3’) and probe (5’-FAM-TGGATCACCTGAGATCAGGAGTTCC-3’) were located in exon 8. Each ddPCR reaction used 10-20 ng of genomic DNA. All ddPCR reactions were multiplexed with the same CFTR primer/probe set used to determine SMN1/2 copy number. Copy number of SMN1 and SMN2 at exon 7 was determined via ddPCR using primers as described above.

2.3 Results

2.3.1 Identification of SMN1/2 deletion junction

To analyze the genotype/phenotype relationship in SMA, we captured and sequenced the CFTR, SMN2, and PLS3 genes of 217 SMA patients using the MDiGS procedure (Alvarado et al. 2014). We used the clones BAC5SMN2 (SMN2), RP11-652K3 (CFTR), RP11-1056O6 (CFTR), and RP11-268A15 (PLS3) as bait for the capture procedure. The average number of total reads per sample was approximately 940,000, with 19.2% of reads aligning to the clone sequences, which is similar to previously reported results (Alvarado et al. 2014). Copy number of SMN2 was determined by calculating the ratio of reads aligned to the CFTR contigs compared to the SMN2 contig.

In order to detect partial deletions of SMN1/2 we divided the SMN2 contig into bins of 1,815 bp and copy number was determined at each bin. Out of 217 samples sequenced, 17 samples displayed a decrease in the copy number over 3 consecutive bins in the 3’ region of the SMN1/2 genes (Figure 2.1A). Upon analysis of the read alignments in this region we found reads that contained a 6310 bp deletion. The deletion junction occurred within a 21 bp repeat that had an exact match a total of 15 times in the 35.5 kb segment containing the SMN2 gene that we captured (Figure 2.1B). Primers were made to unique sequences flanking the repeats and the deletion junction was amplified by PCR. The resulting fragment was Sanger sequenced to confirm the deletion in 3 samples (Figure 2.1C). From this assay alone it is impossible to determine if the deletion occurs in SMN1 or SMN2. Nonetheless, this is the first deletion junction identified that eliminates the critical exons 7 and 8 of SMN1/2. The deletion was detected in 17 out of 217 SMA patients sequenced. Of these 17 patients, 2 had SMN1: OSU284 and MGH300. OSU284 had 1 copy of SMN1 and 1 copy of SMN2, while MGH300 had 1 copy of SMN1 and 2 copies of SMN2. Of the 15 remaining patients, 2 had 1 copy of SMN2, 6 had 2 copies of SMN2, 6 had 3 copies of SMN2, and 1 had 4 copies of SMN2. However, this sample is not completely random, as the MGH samples were partially selected to include patients that had either a milder or more severe phenotype than expected. In the case of the OSU samples, the deletion was found in 9 out of 80 samples, which were all collected in an unbiased manner. The frequency of deletion was then determined by PCR in a separate set of samples with known SMN1 and SMN2 copy number to develop a model of deletion described below.

2.3.2 Inheritance of the junction in an SMA family

To determine if the deletion occurred in SMN1 or SMN2, we analyzed the inheritance of the deletion junction using copy number analysis as measured by ddPCR, polymorphic markers that lie at the 5’ end of the SMN1/2 genes, and a flanking single copy marker. The pedigree of Family 32 with proband OSU199 is shown in Figure 2.2. The proband had a severe SMA type 1/0 phenotype and was identified as having at least one copy of an SMN2 partial deletion using MDiGS. Copy number of SMN1/2 was determined at exons 1, 2a, 6, and 7. Only the exon 7 primers and probe were able to distinguish between SMN1 and SMN2. The parents (II:OSU201 and II:OSU200) and paternal grandparents (I:OSU201.1 and I:OSU201.2) of the proband were also analyzed.

Figure 2.1. Detection of the 6.3 kb deletion junction bioinformatically, location of repeats in SMN2, and verification of the deletion junction using Sanger sequencing after amplification using PCR. A. The ratio of reads aligned to CFTR compared to SMN1/2 was calculated and plotted for 19 bins, each with a size of 1.8 kb. A decrease in the ratio over 3 bins in the 3’ end of SMN1/2 is shown here for sample 199. Analysis of the sequencing reads resulted in the detection of a 6.3 kb deletion. The location of the deletion is indicated by black bars. B. A diagram illustrating where each bin aligns on the SMN2 gene. Also shown are the locations of the 21 bp repeats in SMN2, depicted as triangles. These 21 bp repeats match the Alu-repeat sequence and are known to be involved in deletions and rearrangements (Rüdiger et al. 1995). The black bars indicate the location of the deletion. C. The deletion junction was amplified using PCR, subcloned, and Sanger sequenced for verification. Shown is a sequence alignment between the Sanger sequencing and the MiSeq MDiGS reads. An asterisk (*) represents identical nucleotides. A “W” base call can be either an “A” or a “T”. The boxed nucleotides indicate the repeat that contains the junction.

The proband (III:OSU199) was determined to have 1 copy of SMN2 at exon 7, 3 copies of SMN1/2 at exon 1, and 2 copies of the deletion junction. The mother (II:OSU200) had 3 copies of SMN1/2 exon 7 (1 copy of SMN1, 2 copies of SMN2), 3 copies of SMN1/2 at exon 1, and 0 copies of the deletion junction, meaning all SMN1/2 copies are intact. The father (II:OSU201) had 1 copy of SMN1 exon 7, 4 copies of SMN1/2 exon 1, and 3 copies of the deletion junction. These data indicate that the proband inherited 2 deletion junctions from the father and 1 SMN2 from the mother, and that the 2 deletion junctions lie on the same chromosome. The paternal grandmother (I:OSU201.2) had 1 copy of SMN1 exon 7, but 3 copies of SMN1/2 exon 1, and 2 copies of the deletion junction. The grandfather (I:OSU201.1) had 2 copies of SMN1 exon 7, 4 copies of SMN1/2 exon 1, and 2 copies of the deletion junction.
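The copy numbers reported above can be cross-checked with simple arithmetic: a partially deleted copy retains exon 1 but lacks exon 7, so the exon 1 count minus the total exon 7 count should equal the number of deletion junctions. The sketch below is only an illustrative consistency check. The dictionary layout and function name are hypothetical, and the values marked ASSUMED (SMN1 exon 7 in the proband, SMN2 exon 7 in the father and grandparents) are not stated explicitly in the text.

```python
# ddPCR copy numbers as reported in the text. Values marked ASSUMED are not
# stated explicitly and are filled in only for illustration.
FAMILY_32 = {
    "III:OSU199 (proband)": {"smn1_ex7": 0, "smn2_ex7": 1, "ex1": 3, "junction": 2},  # smn1_ex7 ASSUMED
    "II:OSU200 (mother)": {"smn1_ex7": 1, "smn2_ex7": 2, "ex1": 3, "junction": 0},
    "II:OSU201 (father)": {"smn1_ex7": 1, "smn2_ex7": 0, "ex1": 4, "junction": 3},  # smn2_ex7 ASSUMED
    "I:OSU201.2 (grandmother)": {"smn1_ex7": 1, "smn2_ex7": 0, "ex1": 3, "junction": 2},  # smn2_ex7 ASSUMED
    "I:OSU201.1 (grandfather)": {"smn1_ex7": 2, "smn2_ex7": 0, "ex1": 4, "junction": 2},  # smn2_ex7 ASSUMED
}


def implied_partial_copies(measured):
    """Copies that retain exon 1 but lack exon 7."""
    return measured["ex1"] - (measured["smn1_ex7"] + measured["smn2_ex7"])


# True for an individual when the exon counts agree with the junction count.
consistency = {
    name: implied_partial_copies(m) == m["junction"] for name, m in FAMILY_32.items()
}
```

Under these assumptions every individual's exon counts agree with the measured junction count, which is what would be expected if each junction copy corresponds to one 3’-truncated SMN1/2 copy.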

In order to determine the origin of the deletion alleles, we examined the polymorphic markers AG1-CA, D5S823, and D5S107 (Weber et al. 1991; DiDonato et al. 1994; Wirth et al. 1995). The marker D5S107 lies near D5S39 and the markers are closely associated with each other (Weber et al. 1991). The marker AG1-CA was not informative, as the father inherited the same allele from both the grandmother and the grandfather. Sequencing of the flanking markers D5S823 and D5S107 indicated that the mutant chromosome was shared between the proband and the grandmother. Thus, the grandmother had a deletion in both SMN1 and SMN2, which was inherited by the proband. These markers provide strong evidence that the deletion junction occurs in both SMN1 and SMN2 in this family.

Figure 2.2. Inheritance of a 6.3 kb deletion. The copy number of SMN1 exon 7, SMN2 exon 7, SMN1/2 exon 1, and the deletion junction were measured using ddPCR in patient III:OSU199, her parents (II:OSU201 and II:OSU200), and her paternal grandparents (I:OSU201.1 and I:OSU201.2). Shown are the measured copy numbers, from which we have inferred the number of copies per chromosome. The proband III:OSU199 had 1 copy of SMN2 exon 7, 3 copies of SMN1/2 exon 1, and 2 copies of the deletion junction. The father (II:OSU201) had 1 copy of SMN1, 4 copies of SMN1/2 exon 1, and 3 copies of the deletion junction. The mother had 0 copies of the deletion junction, meaning the proband inherited 2 copies of the deletion junction from the father on the same chromosome. The grandfather (I:OSU201.1) had 2 copies of SMN1 exon 7, 4 copies of SMN1/2 exon 1, and 2 copies of the deletion junction. The grandmother had 1 copy of SMN1, 3 copies of SMN1/2 exon 1, and 2 copies of the deletion junction. Analysis of polymorphic markers D5S823 and D5S107 confirms the deletion was inherited from the grandmother.

2.3.3 Model of deletion junction frequency in individuals with varying SMN1 and SMN2 copy number

To determine the frequency of the deletion junction, we screened a separate panel of 466 samples with varying copy numbers of SMN1 and SMN2 using PCR. The results are shown in Table 2.1. The group with the highest frequency of deletion was individuals who had 1 copy of SMN1 and 0 copies of SMN2, with a frequency of 0.63. The genotype with the next highest deletion frequency was 2 SMN1, 0 SMN2, with a frequency of 0.46. For genotypes with the same copy number of SMN1, deletion frequency increased as copy number of SMN2 decreased. For example, the deletion frequency was 0, 0.02, 0.06, and 0.35 for individuals with 0 SMN1 and 4, 3, 2, and 1 copies of SMN2, respectively. Similarly, for genotypes with the same copy number of SMN2, deletion frequency increased as copy number of SMN1 decreased. The deletion frequency in patients with (3 SMN1; 0 SMN2), (2 SMN1; 0 SMN2), and (1 SMN1; 0 SMN2) was 0.10, 0.46, and 0.63, respectively.

Table 2.1. Prevalence of deletion junction amongst a panel of 466 individuals with different copy numbers of SMN1 and SMN2.

Known Copy Number    Total       Total Samples
SMN1    SMN2         Positive    Screened         Frequency
0       1             6          17               0.35
0       2             3          50               0.06
0       3             1          50               0.02
0       4             0          50               0.00
1       0            12          19               0.63
2       0            23          50               0.46
3       0             3          30               0.10
2       1            17          50               0.34
2       2             0          50               0.00
1       1            17          50               0.34
1       2             2          50               0.04

To determine if the deletion junction occurred in both SMN1 and SMN2, we compared two probability models that calculated the log odds ratio of having the deletion based on the copy numbers of SMN1 and SMN2. Model 1 assumed the deletion junction frequency is dependent only on SMN2. Model 2 assumed the frequency is dependent upon the cross-classification of SMN1 and SMN2.

Model 1:

\[ \mathrm{logit}\{P(Y=1)\} = \beta_0 + \beta_1 I(X_2 = 1) + \beta_2 I(X_2 = 2) + \beta_3 I(X_2 = 3) + \beta_4 I(X_2 = 4) \]

Model 2:

\[ \mathrm{logit}\{P(Y=1)\} = \beta_0 + \sum_{i=1}^{4} \sum_{j=1}^{3} \beta_{ij}\, I(X_2 = i \text{ and } X_1 = j) \]

where X_1 is SMN1 copy number, X_2 is SMN2 copy number, and I is the indicator function, which is 1 if the event occurs and 0 if the event does not occur.

By using the frequencies in Table 2.1 and comparing Model 1 to Model 2 using a likelihood ratio test, a p-value of 0.005 is obtained in favor of Model 2. As a result, the model that allows different probabilities of deletion by the cross-classified SMN1 and SMN2 copy numbers fits the data better than the model that assumes the probability of deletion depends only on SMN2. This supports the hypothesis that the deletion junction occurred in both SMN1 and SMN2. This model, however, does not make any assumptions about the frequency with which the deletion occurs in each gene. It is notable that all groups with 1 copy of SMN2 and varying copy numbers of SMN1 had a relatively similar rate of deletion, at frequencies of 0.34, 0.34, and 0.35. This indicates a higher deletion rate in SMN2 than in SMN1; however, SMN1 deletions do occur, as evidenced by the increased rate in the 1 SMN1; 0 SMN2 individuals compared to the 2 SMN1; 0 SMN2 individuals.
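The comparison of nested models by a likelihood ratio test can be illustrated with the counts in Table 2.1. The sketch below is an assumption-laden simplification and not the analysis actually run: it fits each model as a set of independent binomial proportions (one per SMN2 copy number for Model 1, one per SMN1/SMN2 cell for Model 2) and takes the degrees of freedom as the difference in the number of fitted proportions. Because the exact parameterisation used for the reported test is not given, the resulting p-value need not equal the reported 0.005. All function names are hypothetical.

```python
import math

# (SMN1 copies, SMN2 copies): (deletion positive, screened), from Table 2.1
TABLE_2_1 = {
    (0, 1): (6, 17), (0, 2): (3, 50), (0, 3): (1, 50), (0, 4): (0, 50),
    (1, 0): (12, 19), (2, 0): (23, 50), (3, 0): (3, 30),
    (2, 1): (17, 50), (2, 2): (0, 50), (1, 1): (17, 50), (1, 2): (2, 50),
}


def xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def binom_loglik(k, n, p):
    """Binomial log-likelihood without the binomial coefficient (it cancels in the ratio)."""
    return xlogy(k, p) + xlogy(n - k, 1 - p)


def max_loglik(groups):
    """Maximised log-likelihood when all cells in a group share one probability."""
    total = 0.0
    for cells in groups.values():
        p_hat = sum(k for k, _ in cells) / sum(n for _, n in cells)
        total += sum(binom_loglik(k, n, p_hat) for k, n in cells)
    return total


def grouped(table, key):
    """Group the (k, n) cells of the table by key(smn1, smn2)."""
    groups = {}
    for (smn1, smn2), cell in table.items():
        groups.setdefault(key(smn1, smn2), []).append(cell)
    return groups


def chi2_sf_even_df(x, df):
    """Upper tail of the chi-square distribution; closed form valid for even df only."""
    assert df % 2 == 0
    half = x / 2
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))


ll_model1 = max_loglik(grouped(TABLE_2_1, lambda s1, s2: s2))        # SMN2 only
ll_model2 = max_loglik(grouped(TABLE_2_1, lambda s1, s2: (s1, s2)))  # cross-classified
lr_stat = 2 * (ll_model2 - ll_model1)
df = len(TABLE_2_1) - len({s2 for _, s2 in TABLE_2_1})
p_value = chi2_sf_even_df(lr_stat, df)
```

In this simplified form, most of the improvement of Model 2 over Model 1 comes from the genotypes with 0 copies of SMN2, where the deletion frequency changes markedly with SMN1 copy number.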

2.3.4 Correlation curve of SMN2 as determined by MDiGS compared to ddPCR

SMN2 copy number as determined by MDiGS was validated in a subset of the sequenced samples (n = 180) using ddPCR. Of the 180 samples, 172 were confirmed to be correct (95.5%). The correlation curve had an R² value of 0.9108 and can be seen in Figure 2.3. Of the 8 discrepant samples, 3 samples had low read counts, which can complicate bioinformatic analysis of copy number. In 2 of these samples, MDiGS indicated 3 copies of SMN2 whereas ddPCR indicated 2 copies of SMN2. In the third, MDiGS indicated 4 copies of SMN2 whereas ddPCR indicated 3 copies of SMN2. Thus, when read depth was low, MDiGS overestimated SMN2 copy number by 1. Of the remaining 5 discrepant copy numbers, 3 were underestimates with MDiGS. In 2 cases, MDiGS indicated 3 copies of SMN2 while ddPCR indicated 4 copies of SMN2. In one case, MDiGS indicated 4 copies of SMN2 and ddPCR 5 copies of SMN2. In the other 2 samples, overestimates occurred, with MDiGS indicating 4 and 3 copies of SMN2 and ddPCR indicating 3 and 2 copies of SMN2, respectively. In all cases the difference between the two methods was a single copy. The exact reason for these discrepancies is unclear but could result from sample quality affecting the sequencing measures. We note that this rate of discrepancy is similar to previously reported rates when comparing sequencing and ddPCR copy number determination (Eisfeldt et al. 2018).

Figure 2.3. Correlation curve of SMN2 copy number as determined by MDiGS compared to ddPCR. A total of 180 samples were analyzed, 172 of which were confirmed to be correct (95.5%).
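The breakdown of discrepant samples given above can be tallied directly. This short sketch only restates the counts from the text; the variable names are illustrative, and the copy numbers of the 172 concordant samples are not listed here, so the R² value cannot be recomputed from it.

```python
# (MDiGS, ddPCR) SMN2 copy numbers for the 8 discrepant samples described in the text.
DISCREPANT = [
    (3, 2), (3, 2), (4, 3),  # samples with low read counts
    (3, 4), (3, 4), (4, 5),  # MDiGS underestimates
    (4, 3), (3, 2),          # remaining MDiGS overestimates
]
N_VALIDATED = 180

concordance = (N_VALIDATED - len(DISCREPANT)) / N_VALIDATED
overestimates = sum(1 for mdigs, ddpcr in DISCREPANT if mdigs > ddpcr)
underestimates = sum(1 for mdigs, ddpcr in DISCREPANT if mdigs < ddpcr)
max_abs_difference = max(abs(mdigs - ddpcr) for mdigs, ddpcr in DISCREPANT)
```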

2.3.5 Analysis of SMN2 variants that can modify the SMA phenotype

Variants, insertions, and deletions were identified for all samples. There were 157 SNPs in SMN1/2 amongst all patients examined. A total of 173 samples were examined from the 217 sequenced. Of the samples eliminated from analysis, 5 had SMN1, 29 had either no phenotypic data or an unclear phenotype, 8 had low read counts, and 2 had discrepant copy number of SMN2. As previously described, we classified each patient as concordant (Type 1 = 2 copies of SMN2, Type 2 = 3 copies of SMN2, and Type 3 = 4 copies of SMN2), mild, or severe. We analyzed each variant for segregation by comparing allele counts in exception cases (mild or severe) to allele counts in concordant cases using Fisher’s exact test. P-values were corrected for multiple testing using the false discovery rate. Samples were included in this analysis if phenotypic information was available and if the MDiGS-determined SMN2 copy number matched the SMN2 copy number determined by ddPCR or an alternative quantitative dosage assay (Mailman et al. 2002).

The SNPs with the highest significance are shown in Table 2.2. A total of 3 SNPs in intron 6, namely A-44G, A-549G, and C-1897T, were found to significantly associate with mild exception patients after correcting for multiple testing. Variant A-549G (NC_000005.9:g.69371799A>G; rs564142907; GnomAD frequency of 0.0018) was the most significant (adjusted p-value of 0.00294), with identification in 9 mild exception patients and never in concordant or severe exception patients. Similarly, A-44G (NC_000005.9:g.69372304A>G; rs212216; GnomAD frequency of 0.0001) was found in 6 mild exception patients but never in severe exception patients or concordant patients (adjusted p-value of 0.04982). C-1897T (NC_000005.9:g.69370451C>T; rs1381625877; GnomAD frequency of 0.0025) was found in 8 mild exception patients and 1 concordant patient (adjusted p-value of 0.03704).

In all cases, only 1 allele of the variant was present in each patient. All 3 of these variants are known to associate more commonly with SMN1 but can also be present in SMN2 due to gene conversion events (Burghes 1997; Monani et al. 1999b; Wu et al. 2017). Two other variants that are also usually in SMN1, C-478T (NC_000005.9:g.69371870C>T; rs1457707829) and C-255T (NC_000005.9:g.69372088C>T; rs1317747440; GnomAD frequency of 0.0005), were each detected in 5 mild exception patients; however, they did not reach statistical significance, with adjusted p-values of 0.1066 and 0.1126, respectively. Interestingly, A-44G has recently been shown to increase the level of full-length transcript when it is present in SMN2. C-255T and A-549G were also studied in the same experiment but were not found to have an effect (Wu et al. 2017). All of these variants had an extremely low frequency of less than 1% as reported by GnomAD. However, this frequency may be incorrect, as it is not clear if the variants were properly designated in SMN1 or SMN2. Moreover, some of these variants have been reported previously with significantly higher allele frequencies (Monani et al. 1999b). These data suggest that 3 variants in intron 6 of SMN2 result in a milder SMA phenotype than expected.

Table 2.2. Association between SNPs and exception SMA phenotypes.

                                      Concordant    Mild          Severe                    Adjusted
Location    Variant    Ref(a)  Alt(b)   Ref   Alt     Ref   Alt     Ref   Alt    P-Value     P-Value
Intron 6    A-549G     A       G        309     0     143     9      39     0    1.872E-5    0.00294
Intron 6    A-44G      A       G        310     0     156     6      39     0    0.000952    0.04982
Intron 6    C-1897T    C       T        308     1     146     8      39     0    0.000471    0.03704
Intron 6    C-478T     C       T        309     0     151     5      39     0    0.002716    0.10662
Intron 6    C-255T     C       T        311     0     165     5      39     0    0.003588    0.11265

(a) Ref = Reference Allele; (b) Alt = Alternate Allele
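The association test behind Table 2.2 can be sketched with the allele counts for A-549G. This is an illustrative reconstruction under stated assumptions and not the analysis code: the 2x2 layout (mild alleles versus pooled concordant and severe alleles) is inferred because it appears to reproduce the reported unadjusted p-value, the Benjamini-Hochberg procedure is assumed as the false discovery rate method, and the number of tests is assumed to be the 157 SNPs. Only the five smallest p-values from Table 2.2 are used, so the monotonicity step sees just those five. Function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]:
    sums the probabilities of all tables no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):  # hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


def bh_adjust_top(p_sorted, m):
    """Benjamini-Hochberg adjusted values for the smallest p-values (ascending)
    out of m tests in total; larger, unlisted p-values are ignored."""
    scaled = [min(1.0, p * m / rank) for rank, p in enumerate(p_sorted, start=1)]
    for i in range(len(scaled) - 2, -1, -1):  # enforce monotonicity
        scaled[i] = min(scaled[i], scaled[i + 1])
    return scaled


# A-549G: mild (Ref 143, Alt 9) versus concordant + severe pooled (Ref 309 + 39, Alt 0).
p_a549g = fisher_exact_two_sided(143, 9, 309 + 39, 0)

# Unadjusted p-values from Table 2.2 in ascending order; m = 157 SNPs is ASSUMED.
top_p = [1.872e-5, 0.000471, 0.000952, 0.002716, 0.003588]
adjusted = bh_adjust_top(top_p, m=157)
```

Under these assumptions the adjusted values fall close to those listed in Table 2.2, which suggests, but does not confirm, that a rank-based correction over all 157 SNPs was used.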

We also identified and tested 80 SMN1/2 indel alleles for association with either the mild or severe exception patients. There were no significant differences in the frequencies of these alleles after multiple testing correction. Similarly, we tested for segregation of PLS3 variants with exception patients after separating all patients by sex, as PLS3 is on the X chromosome. Including indels, we tested 401 PLS3 variants in males and 637 in females. We found no SNPs or indels in PLS3 that segregated with mild or severe exception patients after multiple testing correction. This includes the rs871773 variant, which is known to increase PLS3 expression in colon cancer (Szkandera et al. 2013). We found rs871773 in 6 female mild exception patients, but also in 3 female severe exception patients, with a corrected p-value of 0.6749. These data provide no evidence of a modifying variant in PLS3. Data for all SMN1/2 SNPs can be found in Table 2.3 and data for all SMN1/2 indels can be found in Table 2.4. Lists of PLS3 SNPs in females can be found in Appendix A, PLS3 indels in females in Appendix B, PLS3 SNPs in males in Appendix C, and PLS3 indels in males in Appendix D.

2.3.6 Mutations in SMN2 or SMN1 in SMA patients

We found 5 exonic variants in our patient samples (n = 217). One was the novel mutation p.A75T (c.223G>A; NC_000005.9:g.69361861G>A) in SMN2 exon 2b of one SMA patient. To our knowledge, this variant has never been reported in either SMN1 or SMN2. This variant was present in an SMA type 3 patient with 4 copies of SMN2. As the phenotype is concordant with SMN2 copy number, there is no evidence this mutation affects SMA phenotype. Indeed, the variant was predicted by PolyPhen to be benign with a score of 0.005.

We also identified the variant p.Y130C (c.389A>G; rs397514517; NC_000005.9:g.70238300A>G) in exon 3 of one patient. This patient had 1 copy of SMN1, 2 copies of SMN2, and was diagnosed as SMA type 3b. The SMN p.Y130C variant was previously reported as an SMA-causing mutation (Prior 2007), though the phenotype is milder compared to a 3 copy SMN2 individual. Therefore, the variant appears to result in a milder SMA phenotype.

Two patients had the c.859G>C (rs121909192; NC_000005.9:g.69372372G>C, GnomAD frequency of 0.0017) mutation in exon 7, which has previously been shown to modify SMN production by modulating the binding of splicing factors to exon 7 (Prior et al. 2009). The first patient with this variant was determined to have 2 copies of SMN2 using the MDiGS sequencing data and was phenotypically an SMA type 3a patient. The second patient was determined to have 2 copies of SMN2 and was an SMA type 3b patient. Additionally, this patient was determined to have 2 copies of the c.859G>C allele. These phenotypes are milder than expected, as the vast majority of 2 copy SMN2 patients have a severe type 1 phenotype (Feldkötter et al. 2002; Calucho et al. 2018). This milder phenotype is consistent with previous reports of SMA patients with the c.859G>C variant (Prior et al. 2009).

Table 2.3. Allele counts of SNPs in SMN1/2

chr5 hg19            Normal     Mild       Severe     Adjusted P-Value
Position   Ref  Alt  Ref  Alt   Ref  Alt   Ref  Alt   Mild     Severe
69341531   C    T    305    0   163    1    39    0   0.8020   1
69342183   T    C    310    1   164    3    36    0   0.5391   1
69342265   C    G    210   55   104   30    30    6   0.9831   1
69342581   G    C    277   31   150   18    35    4   1.0000   1
69342622   T    C    264   41   155    9    35    4   0.2280   1
69342821   G    C    302    9   165    3    37    2   0.9732   1
69342881   T    C    227   72   132   31    34    4   0.9077   1
69342937   A    G    277   29   154   12    34    3   0.9732   1
69342981   T    A    279   27   155    9    35    3   0.8020   1
69343010   C    T    278   27   156   10    35    3   0.8638   1
69343174   A    C    274   28   149   17    35    4   1.0000   1
69343206   T    C    281   22   150   10    35    3   1.0000   1
69343230   C    T    272   28   144   17    35    4   1.0000   1
69343421   G    C    277   30   146   12    36    3   0.9831   1
69343570   G    T    243   56   118   43    28    9   0.5226   1
69343788   T    A    272   29   153   15    34    5   1.0000   1
69343795   C    G    310    0   170    0    38    1   1.0000   1
69343799   G    C    276   27   152   16    34    4   1.0000   1
69343909   G    C    272   29   144   18    34    4   0.9917   1
69344071   A    G    276   31   152   16    34    4   1.0000   1
69344169   G    A    311    0   169    1    39    0   0.8020   1
69344333   G    A    309    2   163    6    38    1   0.5225   1
69344660   T    A    310    1   166    1    39    0   0.9732   1
69345130   T    G    287   21   152    9    39    0   1.0000   1
69345302   T    C    308    0   164    0    38    1   1.0000   1
69345346   C    G    308    0   163    1    39    0   0.8020   1
69345593   G    A    311    0   164    1    37    2   1.0000   1
69345751   A    C    310    1   166    1    39    0   0.9732   1
69346279   C    T    311    0   168    1    39    0   0.8020   1
69346522   C    A    304    1   160    1    36    0   0.9732   1
69346945   G    A    310    1   170    0    38    1   1.0000   1
69347007   C    T    310    1   167    2    38    1   0.9814   1
69347719   A    G    310    1   169    1    39    0   0.9732   1
69348440   T    C    296    3   140    1    36    0   1.0000   1
69349106   C    T    310    1   169    1    39    0   0.9732   1
69349359   T    A    311    0   169    1    39    0   0.8020   1
69349447   C    T    307    4   169    1    38    1   1.0000   1
69349820   T    C    259   39   137   19    32    7   1.0000   1
69350103   C    T    310    1   170    0    38    1   1.0000   1
69350418   T    C    310    1   170    0    37    1   1.0000   1
69350523   G    A    308    3   168    2    38    1   1.0000   1
69350617   G    C    310    1   169    1    39    0   0.9732   1
69350648   C    A    309    2   168    2    39    0   0.9814   1
69350794   A    C    302    9   169    1    36    3   0.5225   1
69351476   A    C    288   18   158   11    36    3   1.0000   1
69351640   T    G    279   24   144   19    35    4   0.8020   1
69352037   G    A    307    2   169    1    39    0   1.0000   1
69352552   A    G    311    0   170    0    38    1   1.0000   1
69352955   T    C    311    0   169    1    39    0   0.8020   1
69353136   A    G    296   13   164    6    38    1   1.0000   1
69353669   T    C    176  127    90   79    24   15   0.8020   1
69353924   A    G    311    0   169    1    39    0   0.8020   1
69354309   G    T    309    2   166    2    39    0   0.9814   1
69354754   A    G    311    0   169    1    39    0   0.8020   1
69354973   A    G    206   92   108   55    29   10   0.9732   1
69355378   C    A    311    0   169    1    39    0   0.8020   1
69355573   T    C    310    1   166    1    39    0   0.9732   1
69355622   T    A    291   16   159    7    37    2   1.0000   1
69356686   G    A    294   10   144    8    34    4   0.9917   1
69357190   A    G    272   27   153    6    38    1   0.5226   1
69357245   C    G    180  109   101   39    25   13   0.5225   1
69357509   G    A    289   15   147   18    35    4   0.4855   1
69357933   A    C    297   14   164    5    39    0   0.9917   1
69358232   A    G    311    0   165    1    39    0   0.8020   1
69358605   A    G    203   93   108   58    27   10   0.9077   1
69359034   C    T    310    1   170    0    38    1   1.0000   1
69359142   A    C    305    6   167    2    37    0   1.0000   1
69359244   C    T    302    9   163    6    37    2   1.0000   1
69359824   G    C    290   17   159    7    37    2   1.0000   1
69360020   G    T    308    3   164    6    38    1   0.5226   1
69360500   C    A    310    1   169    1    39    0   0.9732   1
69360651   G    A    311    0   167    3    39    0   0.4170   1
69360730   T    G    311    0   169    1    39    0   0.8020   1
69360743   G    C    297   14   156    7    36    3   1.0000   1
69361500   A    G    309    2   168    1    39    0   1.0000   1
69362280   G    A    309    2   167    3    39    0   0.8124   1
69362410   T    C    288   17   153   14    34    4   0.8308   1
69362696   G    A    310    1   168    1    39    0   0.9732   1
69362876   A    G    311    0   169    1    39    0   0.8020   1
69362949   A    G    202   96   113   56    28   10   1.0000   1
69363365   T    C    310    1   167    1    39    0   0.9732   1
69363717   C    T    267   34   140   22    33    5   0.9732   1
69363814   A    G    311    0   162    1    39    0   0.8020   1
69363996   G    A    309    2   169    1    39    0   1.0000   1
69364605   A    G    222   83   121   47    29    7   1.0000   1
69365020   T    G    308    3   167    3    39    0   0.8927   1
69365216   G    C    289   18   153   13    35    4   0.9814   1
69365646   C    T    309    2   169    1    39    0   1.0000   1
69366414   C    T    278   29   143   23    35    4   0.7937   1
69367010   C    T    276   26   130   13    37    2   1.0000   1
69367063   G    T    134  168    60   91    18   18   0.8020   1
69367742   G    C    311    0   169    1    39    0   0.8020   1
69367840   T    C    307    4   169    1    39    0   1.0000   1
69368084   A    G    167  134    89   76    22   16   1.0000   1
69368206   A    G    311    0   169    1    39    0   0.8020   1
69368270   C    G    274   29   161    7    35    3   0.5212   1
69368307   G    A    311    0   166    3    39    0   0.4170   1
69368329   G    A    156  152    79   90    22   16   0.8284   1
69368571   A    G    292   15   157    9    36    3   1.0000   1
69368717   G    A    253   47   139   24    29    8   1.0000   1
69368815   G    C    300    9   152    6    36    2   1.0000   1
69368905   T    C    311    0   166    1    39    0   0.8020   1
69368911   C    G    302    8   161    6    34    4   1.0000   1
69368927   G    C    311    0   165    2    39    0   0.5391   1
69369396   T    G    311    0   168    2    39    0   0.5391   1
69369556   A    G    307    4   164    2    39    0   1.0000   1
69369621   C    T    310    1   167    1    39    0   0.9732   1
69369651   A    T    311    0   169    1    39    0   0.8020   1
69370081   T    A    311    0   168    2    39    0   0.5391   1
69370451   C    T    308    1   146    8    37    0   0.0370   1
69370574   C    G    231   63   124   31    30    8   1.0000   1
69370591   A    G    297    6   160    8    38    0   0.5226   1
69370594   C    T    301    2   160    6    38    0   0.3365   1
69370731   A    G    295    3   153    6    35    0   0.5225   1
69370742   C    T    296    3   154    5    35    1   0.7376   1
69370895   A    G    281   10   142    9    34    1   0.8020   1
69371300   C    G    309    2   150    2    39    0   0.9814   1
69371328   A    G    308    3   153    1    39    0   1.0000   1
69371368   A    G    305    2   153    4    39    0   0.5226   1
69371499   C    A    307    3   156    6    39    0   0.4170   1
69371799   A    G    309    0   143    9    39    0   0.0029   1
69371870   C    T    309    0   151    5    39    0   0.1066   1
69371933   C    T    308    3   162    2    39    0   1.0000   1
69371981   C    A    201  101   104   65    25   13   0.8020   1
69372088   C    T    311    0   165    5    39    0   0.1127   1
69372304   A    G    310    0   156    6    39    0   0.0498   1
69372372   G    C    311    0   164    3    39    0   0.4170   1
69372501   G    A    311    0   162    3    39    0   0.4170   1
69372613   G    A    302    8   166    4    39    0   1.0000   1
69372616   G    A    311    0   164    5    39    1   0.3365   1
69372900   G    A    310    1   169    1    39    0   0.9732   1
69373000   G    A    304    7   168    2    39    0   1.0000   1
69373081   A    G    305    4   163    4    39    0   0.8020   1
69373373   C    G    306    5   168    2    39    0   1.0000   1
69373403   T    A    310    1   169    1    39    0   0.9732   1
69373485   A    G    302    7   166    4    38    1   1.0000   1
69373637   C    G    311    0   169    1    39    0   0.8020   1
69373667   A    G    205   95   108   58    29    9   0.8360   1
69373682   C    G    284   21   147   15    32    7   1.0000   1
69374446   A    G    189  114    93   73    23   14   0.7937   1
69374554   C    T    311    0   169    1    39    0   0.8020   1
69374555   A    G    311    0   168    2    39    0   0.5391   1
69374575   G    C    190  107    93   75    24   14   0.5225   1
69374607   A    C    303    5   168    2    38    1   1.0000   1
69374608   A    T    303    5   168    2    38    1   1.0000   1
69375026   G    A    310    1   165    0    38    1   1.0000   1
69375219   A    G    245   44   104   21    35    4   0.9732   1
69375425   C    A    257   40   138   23    31    5   1.0000   1
69375525   G    A    252   39   142   22    33    4   1.0000   1
69375535   G    A    254   37   144   19    33    4   1.0000   1
69376037   G    A    261   40   145   13    34    5   0.6517   1
69376082   C    G    309    1   164    1    39    0   0.9732   1
69376113   G    A    256   45   146   17    34    5   0.8020   1
69376219   A    C    311    0   169    1    39    0   0.8020   1
69376441   G    A    311    0   169    1    39    0   0.8020   1
69376576   A    G    231   76   127   23    35    3   0.5225   1
69376589   G    A    303    2   158    1    39    0   1.0000   1

Table 2.4. Allele counts of indels in SMN1/2

chr5 hg19                                         Normal    Mild      Severe    Adjusted P-Value
Position  Ref              Alt                    Ref  Alt  Ref  Alt  Ref  Alt  Mild   Severe
69375646  CTTT             CT                      19  227    8  125    6   27  0.739  0.511
69375646  CTTT             CTT                     19   27    8   10    6    2  1.000  0.886
69375646  CTTT             C                       19   17    8    6    6    3  1.000  1.000
69375585  CA               CAA                    192   95  104   36   25    8  0.739  1.000
69375585  CA               C                      192   16  104    5   25    2  0.739  1.000
69375258  CAAA             CAA                      9  274    3  138    3   33  0.888  0.768
69375258  CAAA             CA                       9   17    3    5    3    0  1.000  0.678
69375258  CAAA             C                        9    5    3    2    3    0  1.000  1.000
69371046  GT               G                      268   18  129    3   35    0  0.739  1.000
69371046  GT               GTT                    268    3  129    0   35    1  0.739  1.000
69370819  CA               C                      278    6  142    3   34    1  1.000  1.000
69370819  CA               CAA                    278    2  142    2   34    0  0.888  1.000
69370470  CA               C                      279    2  142    4   36    1  0.739  1.000
69370470  CA               CAA                    279   12  142    3   36    0  0.888  1.000
69367343  ATTT             ATTTT                  243   14  106    0   23    4  0.176  0.511
69367343  ATTT             ATT                    243   14  106   10   23    1  0.739  1.000
69367343  ATTT             AT                     243    1  106    4   23    0  0.359  1.000
69367343  ATTT             A                      243    1  106    2   23    0  0.739  1.000
69366035  AT               A                      244   24  139   16   31    4  0.958  1.000
69366035  AT               ATT                    244    5  139    2   31    0  1.000  1.000
69365975  CTGTG            CTGTGTG                183   55  111   23   26    5  0.739  1.000
69365975  CTGTG            CTG                    183   27  111   13   26    2  0.958  1.000
69365975  CTGTG            C                      183    1  111    1   26    0  1.000  1.000
69365530  CTT              C                      196    2  123    2   21    0  0.907  1.000
69365530  CTT              CT                     196   49  123   20   21    4  0.739  1.000
69365530  CTT              CTTT                   196   13  123    4   21    0  0.811  1.000
69364194  ATTT             ATT                     53  152   23   83    3   20  0.920  1.000
69364194  ATTT             AT                      53   54   23   29    3    9  0.958  0.909
69364194  ATTT             A                       53    4   23    4    3    0  0.739  1.000
69364194  ATTT             ATTTT                   53    1   23    3    3    1  0.739  1.000
69362971  CA               C                      229   17  132   16   26    2  0.739  1.000
69362971  CA               CAA                    229    2  132    3   26    1  0.795  1.000
69360406  AT               A                      163   59  104   30   21    8  0.760  1.000
69360406  AT               ATT                    163   29  104   12   21    3  0.739  1.000
69357002  GT               G                      256   15  118   10   36    0  0.739  1.000
69356554  CT               CTTTTTTT                44  121   18   61    5   24  0.961  1.000
69356554  CT               CTTTTTT                 44   13   18    5    5    2  1.000  1.000
69356554  CT               C                       44   21   18    1    5    2  0.310  1.000
69356554  CT               CTT                     44   10   18   18    5    2  0.176  1.000
69356176  CA               CAA                    201   84  118   38   29    8  0.760  1.000
69356176  CA               C                      201    9  118    2   29    1  0.739  1.000
69355565  CT               C                      260   10  150    2   35    0  0.739  1.000
69354033  GT               G                      232   40  127   26   28    5  0.888  1.000
69354033  GT               GTT                    232    7  127    1   28    0  0.811  1.000
69352953  CT               C                      278    1  154    6   33    0  0.176  1.000
69351634  GT               GTT                    260    4  134    5   30    2  0.739  1.000
69351634  GT               G                      260   22  134   16   30    2  0.739  1.000
69349812  CTTT             CTT                     30  200   14  111    2   28  1.000  1.000
69349812  CTTT             CT                      30   44   14   16    2    3  0.918  1.000
69349820  CTTT             C                       30    3   14    2    2    0  0.918  1.000
69349560  TC               T                      281   11  149    9   34    2  0.862  1.000
69349560  TC               TCC                    281    3  149    1   34    0  1.000  1.000
69349553  CT               C                      259   40  144   19   30    6  0.888  1.000
69348708  CAAA             CA                     195   17  128    4   26    3  0.379  1.000
69348708  CAAA             CAA                    195   43  128   19   26    3  0.739  1.000
69348708  CAAA             CAAAA                  195   22  128    6   26    4  0.379  1.000
69348708  CAAA             C                      195   10  128    2   26    0  0.739  1.000
69348708  CAAA             CAAAAA                 195    3  128    2   26    0  1.000  1.000
69348369  TAA              TA                     198   30  102   14   28    1  1.000  1.000
69348369  TAA              TAAA                   198    2  102    0   28    1  0.888  1.000
69348031  AAGAAA           A                      298    1  155    0   33    3  0.739  0.056
69348032  AGAA             A                      298    0  154    1   33    0  0.739  1.000
69348017  CAAAA            CAAA                   233   29  124   15   27    3  1.000  1.000
69348017  CAAAA            CAAAAA                 233   25  124    9   27    3  0.739  1.000
69348017  CAAAA            CAA                    233    5  124    2   27    0  1.000  1.000
69346683  CAAATAAAT        CAAAT                  300    3  162    2   32    0  0.918  1.000
69345281  CA               CAA                    281   11  148    4   32    2  0.888  1.000
69345281  CA               C                      281    4  148    6   32    1  0.739  1.000
69345023  TCACACACACACACA  TCACACACACACA           21   60   18   23    2    7  0.379  1.000
69345023  TCACACACACACACA  TCACACACACA             21   42   18   21    2    4  0.739  1.000
69345023  TCACACACACACACA  TCACACACACACACACACA     21   43   18   17    2   12  0.379  0.679
69345023  TCACACACACACACA  TCACACA                 21   27   18   13    2    3  0.739  1.000
69345023  TCACACACACACACA  TCACACACACACACACACACA   21   12   18    2    2    7  0.176  0.138
69345023  TCACACACACACACA  TCACA                   21   17   18    3    2    0  0.379  1.000
69344545  CA               C                      253   25  131   12   31    3  1.000  1.000
69344545  CA               CAA                    253   10  131    3   31    1  0.888  1.000
69343519  CTTTT            CTT                    222   10  120    4   21    3  0.888  0.679
69343519  CTTTT            CTTT                   222   42  120   14   21    6  0.708  1.000
69343519  CTTTT            CTTTTT                 222   13  120    8   21    4  1.000  0.679
69343519  CTTTT            CT                     222    2  120    1   21    2  1.000  0.511

The c.84C>T (rs1554066599; NC_000005.9:g.69359244C>T) variant in exon 2a was detected in 17 alleles. This is a synonymous variant located 3 bp from the beginning of exon 2a. This variant was found in 9 concordant, 6 mild, and 2 severe exception patients and was not significantly associated with either a milder (adjusted p-value 1.000) or more severe (adjusted p-value 1.000) phenotype. This is in contrast with previous data reporting it as a possible pathogenic variant, as it was found in an SMA patient with 1 copy of SMN1 (Wang et al. 2010a).

Finally, we detected the c.462A>G (rs1450194682; NC_000005.9:g.69362949A>G) variant in exon 3 of numerous patients. This allele was detected 162 times out of a total of 505 alleles. It was not found to segregate with either the mild (adjusted p-value 1.000) or severe exception patients (adjusted p-value 1.000).

2.3.7 Alignment and map of the SMA region

The complexity of the SMA region has caused notable difficulty in constructing a correct map of the region. There are multiple different arrangements of the region, as evidenced by the multiplicity of banding patterns obtained in pulsed field gel electrophoresis experiments when using SMN2 and NAIP probes. To determine one arrangement of the SMA region, we assembled a map using overlapping PACs that originated from a single library (Figure 2.4) (Osoegawa et al. 2001). We used a total of 10 clones, with an average overlap of 71,850 bp between adjacent clones and the smallest overlap being 32,833 bp. There was only a single mismatch among all the overlaps. It has been reported that heterozygous SNPs occur on average every 1.1 kb, making it very likely that this assembly represents one chromosome (Ceballos et al. 2018). The clones we used were all from the RP11 library (Osoegawa et al. 1998, 2001), specifically RP11-619K7, RP11-1012N14, RP11-1415C14, RP11-1414O21, RP11-497H16, RP11-1005E12, RP11-1432L1, RP11-974F13, RP11-195E2, and RP11-1280N14 (Figure 2.4B). The total length of the assembly was 1.358 Mbp.
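The single-chromosome argument above can be made concrete with a back-of-envelope calculation. The sketch below is illustrative only and rests on assumptions not stated in the text: that the 10 clones form a linear tiling path with 9 adjacent overlaps, that heterozygous sites are spread evenly enough for a Poisson approximation, and that the 1.1 kb spacing applies to this region. It asks how many mismatches would be expected if every overlapping pair of clones had come from different homologs.

```python
import math

N_CLONES = 10
MEAN_OVERLAP_BP = 71_850   # average overlap between adjacent clones, from the text
SNP_SPACING_BP = 1_100     # one heterozygous SNP per 1.1 kb, from the text
OBSERVED_MISMATCHES = 1

n_overlaps = N_CLONES - 1  # ASSUMES a linear tiling path of adjacent clones
total_overlap_bp = n_overlaps * MEAN_OVERLAP_BP
expected_mismatches = total_overlap_bp / SNP_SPACING_BP


def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * sum(lam**i / math.factorial(i) for i in range(k + 1))


# Probability of seeing no more than the observed mismatches under the
# different-homolog scenario described above.
p_at_most_observed = poisson_cdf(OBSERVED_MISMATCHES, expected_mismatches)
```

With several hundred mismatches expected under that scenario and only one observed, the overlaps are far more consistent with clones derived from a single chromosome.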

We identified and labeled genes and pseudogenes on the assembled map, which can be seen in Figure 2.4. SMN1 and SMN2 were located approximately 848 kb away from each other and were in the same orientation. Approximately 6.5 kb upstream of SMN2 was SERF1B and 16.4 kb downstream of SMN2 was a NAIP pseudogene containing NAIP exons 6-17 (Ensembl exons ENSE00003489009 through ENSE00003505062). The gene GTF2H2B was 338 kb downstream of SMN2, followed by a second NAIP pseudogene containing NAIP exon 3 (Ensembl exon ENSE00003668305) and exons 6-9 (Ensembl exons ENSE00003489009 through ENSE00002219419). Approximately 6.5 kb upstream of SMN1 was SERF1A and 16.4 kb downstream was NAIP. Approximately 80 kb downstream of SMN1 was GTF2H2, followed by a NAIP pseudogene that contains NAIP exons 6-13 (Ensembl exons ENSE00003489009 through ENSE00003590701). There were also 4 copies of the GUSBP3 pseudogene containing various combinations of exons and in different orientations.

Figure 2.4. Map of the SMA region that was assembled using overlapping clones that originate from the same chromosome. A. Map of the region showing all genes and their orientation. SMN1 and SMN2 are in the same orientation and are approximately 848 kb away from each other. Lying between SMN1 and SMN2 are 2 NAIP pseudogenes (one containing NAIP exons 6-17 and the other containing NAIP exons 3 and 6-9) and 2 GUSBP3 pseudogenes, as well as SERF1A and GTF2H2B. Pseudogenes are indicated with a (Ψ). B. Overlapping clones used to construct the region. The smallest overlap between clones was 32,833 bp while the average overlap was 71,850 bp. There was only a single mismatched base pair out of all the overlapping regions.

2.4 Discussion

SMA patients have a wide degree of severity, usually as a result of copy number variation in the SMN2 gene. However, variants in SMN2 have been reported which alter the phenotype or are suspected of altering the phenotype. In this work, I have analyzed targeted sequencing data of the SMN2 genes of SMA patients to determine which variants within SMN2 segregate with mild SMA exception patients. In addition, I have analyzed targeted sequencing data of the PLS3 gene, which has been suggested as one modifier outside of SMN2 that may modify the phenotype. In this work I identified a partial deletion that can occur in both SMN1 and SMN2, as well as 3 variants that are statistically associated with mild exception SMA patients. Meanwhile, no variant was found in PLS3 that segregated with exception patients. In short, I show that modifiers, including partial deletions, exist in SMN2, while no such modifier exists in PLS3.

I have also assembled the sequence of the SMA region on a single chromosome using overlapping clones (Figure 2.4). The SMA region was originally described as an inverted repeat which was based on mapping and pulsed field analysis of YAC clones obtained from the region

(Melki et al. 1994; Lefebvre et al. 1995). It is notable that in the assembly I created, the arrangement is not an inverted duplication. Instead, SMN1 and SMN2 are in the same orientation, as are NAIP and a NAIP pseudogene, indicating that this region may be prone to inversions with multiple orientations possible. The arrangement of the SMA region we present is consistent with the occurrence of unequal crossover events leading to deletion of the SMN1 gene as has been previously reported (Wirth et al. 1997). This arrangement is also consistent with the studies of

CA-dinucleotide markers in the region that show multiple copies but of varying number. The studies of these markers would also be consistent with multiple possible arrangements and additional copies of certain markers, including AG1-CA and CAAT1 (DiDonato et al. 1994;

Burghes et al. 1994a). From the available clones, I was only able to construct 1 assembly.

However, generating more assemblies would give insight into how genomic rearrangements like deletions, inversions, and duplications occur in the region, as well as which sequences are hotspots for these rearrangements. Furthermore, mapping reads of targeted

SMN2 sequencing to these assemblies may be ideal for detecting deletions that occur inside the region, including in NAIP. One approach to assembling the region is by long read sequencing technologies such as Nanopore MinION whole genome sequencing or 10X Genomics sequencing, which barcodes the small DNA fragments to allow for direct assembly.


It is known that SMN1 and SMN2 can be lost by deletion but deletion junctions have not been reported that remove SMN1/2 exons 7 and 8 (Burghes 1997). Using a ligation mediated

PCR assay for SMN1/2 copy number, exon 1 of SMN1/2 has been previously examined in SMA patients, carriers, and normal individuals. In these studies, SMA cases were presented where there was a loss of SMN1 exons 1-6, but with the presence of SMN1 exon 7 (Arkblad et al. 2006).

Additionally, there were cases with excess exons 1-6 but missing exons 7-8 (Arkblad et al.

2006). The loss of SMN exons 7 and 8 was described as a polymorphism as it was present in normal individuals with 2 copies of SMN1 (Arkblad et al. 2006; Calucho et al. 2018). However, here we report a 6.3 kb deletion that eliminates exons 7 and 8 and can occur in either the SMN1 or SMN2 gene (Fig 1). As the deletion occurs in SMN1 and can cause SMA, it is a disease-causing variant and cannot be referred to as a polymorphism. Interestingly, we did not detect loss of SMN1 exons 1-6, but these alleles clearly exist and are SMA alleles as they occur in SMA patients (Arkblad et al. 2006). Evidence of the 6.3 kb deletion occurring in SMN1 comes from dosage of the 3’ and 5’ ends of the SMN1/2 genes examined across 3 generations in one family

(Figure 2), as well as polymorphic markers D5S823 and D5S107 that trace the origin of the mutation. This analysis indicates that the mutant allele originates in the grandmother, who is deleted for both SMN1 and SMN2 on the same chromosome.

Additionally, we developed models based on the deletion frequency in patients with varying copy numbers of SMN1 and SMN2. In comparing the models, the model where the deletion occurs in both SMN1 and SMN2 is most consistent with the observed deletion frequency data. Together, the data show strong evidence of the deletion occurring in both SMN1 and SMN2.

The frequency of this deletion appears to be higher in SMN2 than SMN1 as indicated by the


similar frequency of the deletion in the groups containing a single copy of SMN2 but different

SMN1 copy number. A puzzling feature of the SMA region is the frequency of SMN1 loss versus the frequency of SMN2 loss. In the case of homozygous SMN1 loss, the frequency is 1/10,000

(0.01%), which is also the frequency of SMA (Pearn 1978). In contrast, the frequency of homozygous SMN2 loss in the general population is 10-15%, which is at least 1000 times higher than that of SMN1

(Mailman et al. 2002). Perhaps the reason for this is selection against loss of SMN1 but not

SMN2 as the loss of SMN1 in the population gives rise to SMA and the loss of SMN2 is not detrimental.

We identified 15 repeats in the SMN1/2 gene that have an exact 21 bp match to the 3' end of the deletion junction. The sequence of this 21 bp repeat matches perfectly the first 21 bp of the

Alu core element (Rüdiger et al. 1995). Interestingly, a deletion junction of SMN exons 5 and 6 has been found near a similar repeat, though with a slightly different sequence (Wirth et al.

1999). It is possible other less frequent deletion junctions are occurring at these other repeats.

The repeats also lie in the adjacent NAIP gene, which had a total of 24 repeats with an exact 21 bp match. With NAIP known to be deleted in approximately 43% of type 1 SMA patients (Roy et al. 1995; Thompson et al. 1995; Burlet et al. 1996), it is possible a deletion junction exists spanning NAIP and SMN1 that is flanked by these repeats. Our attempts to find such a deletion junction were unsuccessful, though future experiments which expand the captured region to include NAIP would greatly increase the chances of detection.
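The search for exact copies of a short repeat can be sketched in a few lines of Python. This is only an illustration of the idea and not the method used in this work: the motif and sequence below are short placeholders rather than the actual 21 bp repeat or the SMN1/2 sequence, the function names are hypothetical, and the choice to scan both strands is an assumption.

```python
# Sketch: locate every exact copy of a short motif on both strands of a sequence.
# The motif and sequence used in the demo are placeholders, NOT the real
# 21 bp repeat or the SMN1/2 or NAIP sequence.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_exact_matches(sequence, motif):
    """Return sorted (0-based start, strand) tuples for every exact match.

    Overlapping matches are reported. A motif that equals its own reverse
    complement is only searched once so it is not counted twice.
    """
    sequence = sequence.upper()
    motif = motif.upper()
    patterns = [("+", motif)]
    rc = reverse_complement(motif)
    if rc != motif:
        patterns.append(("-", rc))

    hits = []
    for strand, pattern in patterns:
        start = sequence.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = sequence.find(pattern, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    toy_motif = "GGCTGA"                       # hypothetical stand-in for the repeat
    toy_sequence = "AAGGCTGATTTCAGCCAAGGCTGA"  # hypothetical stand-in for a gene
    for position, strand in find_exact_matches(toy_sequence, toy_motif):
        print(position, strand)
```

Counting the returned hits per gene gives the kind of repeat tally reported above; allowing mismatches would require a different approach, such as a local alignment.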

In this study, we sequenced the SMN2 genes of SMA patients in order to find modifiers of SMA. The sequencing data were analyzed for variants as well as SMN2 copy number, which was then verified using ddPCR (Figure 3). From this analysis, we found 3 variants, A-44G, A-549G, and C-1897T in intron 6 of SMN2, that are significantly associated with mild exception patients. All 3 variants are more prevalent in the SMN1 gene (Monani et al. 1999; Wu et al.

2017) but may be present in SMN2 due to gene conversion events (Campbell et al. 1997;

DiDonato et al. 1997; Burghes 1997). The variants C-1897T and A-549G have not previously been implicated as SMA modifiers. The third variant, A-44G, has previously been suspected of being a modifier as it has been shown to increase the amount of full-length SMN

(Wu et al. 2017). Here we confirm that prediction, as A-44G was associated with 6 mild exception patients and never with concordant or severe exception patients, strongly suggesting the presence of this variant resulted in a milder than expected SMA phenotype. Five patients were identified as having 2 or more of these variants and there was some evidence of an additive effect. For example, patient MGH157 with 3 copies of SMN2 and 1 of the modifying variants (C-1897T) had SMA type 3a. Patients MGH335, OSU01-001, and OSU01-002 also had 3 copies of SMN2 but in addition these individuals each had all 3 of the variants and displayed milder type 3b or type 4 SMA. These data indicate that patients with multiple variants may manifest a milder than anticipated phenotype. However, variants in SMN2 cannot explain every discordant sibling pair. For example, OSU01-001 and OSU01-002 had identical SMN2 genotypes, yet OSU01-001 had an SMA type 3b phenotype while OSU01-002 had SMA type 4. Thus, in this case it is likely that individual OSU01-002 has additional modifiers outside of the SMA region that could account for the milder phenotype.

Indeed, genes outside of the SMA region have been implicated in modifying SMA.

Epigenetic effects have also been proposed to affect SMA phenotype, both within and outside of the SMN2 gene, though evidence for this is limited (Hauke et al. 2009; Zheleznyakova et al.


2013, 2015). One study found differences in CpG methylation at certain CpG dinucleotides, two of which were near an alternative transcription start site of SMN2 (Hauke et al. 2009). Although these 2 CpG dinucleotides were shown to modulate expression of a particular isoform of SMN2, this isoform makes up less than 5% of total SMN2 transcripts and hence does not result in a significant change in total SMN protein (Hauke et al. 2009). Outside of SMN2, methylation differences have been found between lymphoblast samples from SMA patients and healthy controls near the genes CHML and ARHGAP22, which are related to the activity of Rab and Rho

GTPases (Zheleznyakova et al. 2013). However, the ability of these methylation changes to alter the SMA phenotype has yet to be demonstrated.

The PLS3 gene was determined to be a sex-specific protective modifier that is over-expressed in lymphoblasts from SMA females (Oprea et al. 2008b). However, further studies of female discordant SMA siblings showed higher PLS3 expression in the severe sibling, demonstrating that the effect is non-penetrant in certain patients (Bernal et al. 2011). To date it is unclear why elevated PLS3 expression is found in some female patients but not others.

Additionally, male exception cases exist which PLS3 cannot explain (Burghes et al. 1994a;

Cobben et al. 1995; McAndrew et al. 1997; Cuscó et al. 2006). Furthermore, mouse experiments remain controversial in that our group showed no improvement in SMA mice upon overexpression of PLS3 while the Wirth group showed marginal improvement (Ackermann et al.

2013; McGovern et al. 2015b). Studies by Oprea et al. maintain that the modification can only occur in SMA type 2 patients and not in those with SMA type 1 and consequently the severe phenotypes cannot be modified (Oprea et al. 2008b). However, cases with discordant SMA type

1 and type 2 siblings have been reported (Pane et al. 2017). Mild SMA animals treated with both


PLS3 and a suboptimal dose of corrective antisense oligonucleotide (ASO) have an improved phenotype (Strathmann et al. 2018), but it is not known if it is due to improved uptake of oligonucleotide or a direct effect of PLS3. In sum, the mechanism and degree to which PLS3 is a modifier of SMA remains uncertain.

In this study I found no evidence that PLS3 was a modifier of SMA. I tested all alleles that were present in the 268A15 clone that contains PLS3 and none of them showed a statistically significant segregation with either mild or severe exception SMA patients. This includes the variant rs871773, which is known to increase expression of PLS3 in colon cancer (Szkandera et al. 2013). In males, rs871773 was found in 1 concordant and 1 mild exception patient, whereas in females it was found in 6 mild and 3 severe exception patients. In short, our data do not support the hypothesis that PLS3 modifies the SMA phenotype, as no variant was found to statistically segregate with exception patients. Finally, SMA patients with a milder phenotype than expected for their copy number are not exclusively females, as 24 of our 58 mild exception patients were males. Clearly, other modifiers besides PLS3 exist.

The work we have presented here has several important conclusions. First, not all SMN2 genes are equivalent, as evidenced by the existence of variants that segregate with milder phenotypes. I have shown that variants do exist in SMN2 that alter SMA phenotype. Second, there must be variants that modify SMA that exist outside of SMN2. The SMN2 variants we found to significantly associate with milder SMA individuals can only explain a small fraction

(14 out of 58) of our mild exception patients. Amongst all our patient samples we had 11 cases of discordant siblings and all were confirmed to have identical genotypes in the SMA region.

Hence the majority of SMA patients, including exception patients, have fully intact SMN2 genes.


This strongly supports the notion that in the majority of discrepant cases genetic modifiers lie outside the SMN region. Third, no variant in PLS3 segregated with the mild exception patients, not even when accounting for sex. Fourth, the data generated by this adaptation of the MDiGS technique had enough statistical power to identify significant variant associations with milder phenotypes, even when the variant was present in only a limited number of individuals. This makes the MDiGS technique a viable option for determining modifiers using targeted sequencing. Finally, the patient samples and data described here are an ideal dataset for identifying modifying variants outside of the SMA region. These fully sequenced samples have a confirmed

SMN2 copy number and well-characterized phenotypic information. Any candidate modifiers of

SMA can be tested for their presence in our confirmed exception cases, or their absence in our confirmed concordant cases.


Chapter 3

Identification of SMA Modifiers Using Exomic and Genomic Sequencing

This work was completed with the assistance of multiple individuals. Exome sequencing and library prep was performed by Jesse Hunter in Lisa Baumbach-Reardon’s group at TGen.

Genomic sequencing and alignment was performed by Novogene. Vicki McGovern performed all bench work experiments for the confirmation of variants. Bioinformatic analysis was entirely performed by me.

3.1 Introduction

SMA is a disease of low SMN levels and thus disease severity is generally inversely correlated with the copy number of SMN2 (McAndrew et al. 1997; Burghes 1997; Feldkötter et al. 2002; Mailman et al. 2002; Jedrzejowska et al. 2009; Calucho et al. 2018). However, patients have been reported who are exceptionally mild or severe for their SMN2 copy number, the most striking of which are siblings who have inherited the same disease locus but one sibling has a significantly milder phenotype (Burghes et al. 1994a; Cobben et al. 1995; Prior et al. 2004;

Cuscó et al. 2006; Jedrzejowska et al. 2008; Bernal et al. 2011). In Chapter 2, I demonstrated that 3 variants in the SMN2 gene segregated with mild exception patients and act as SMA modifiers. However, these modifiers can only explain 14 out of 58 of the mild exception patients we studied. Clearly, modifiers that lie outside of the SMN2 gene exist.

In Chapter 1, I discussed genes and pathways that have been suggested as modifiers of SMA. In addition to those discussed earlier, there is much evidence that the Rho-kinase (ROCK) pathway can modulate the SMA phenotype. RhoA is a small GTPase which is capable of regulating the actin cytoskeleton and neuronal growth cone signaling through its downstream effectors (Luo et al. 1997). Profilin IIa, which was shown to bind SMN in a yeast two-hybrid experiment, has been shown to regulate neuritogenesis by forming a complex with RhoA (Giesemann et al. 1999; Da

Silva et al. 2003). Interestingly, the ROCK pathway was found to be activated in tissues depleted of SMN (Nölle et al. 2011). Inhibition of ROCK using either of the ROCK inhibitors Y-27632 or

Fasudil resulted in a significant increase of survival in the Smn2b/- mild mouse model of SMA

(Bowerman et al. 2010, 2012). Though survival was increased, the mechanism was not determined, since neither muscle strength nor ventral motor neuron counts increased in the treated mice (Bowerman et al. 2010, 2012). Still, these data show that ROCK pathway inhibition can ameliorate SMA severity.

Numerous experiments have also suggested that the ubiquitin pathway can modify the

SMA phenotype. SMN is a target of ubiquitination and can be degraded via the proteasome

(Chang et al. 2004; Burnett et al. 2009a). Furthermore, SMN was shown to be ubiquitinated by the mouse E3 ubiquitin ligase Mib1 and SMN degradation could be altered by either knockdown or over-expression of Mib1 (Kwon et al. 2013). Similarly, the deubiquitinase USP9X was shown to remove ubiquitin from SMN and increasing knockdown of USP9X using an shRNA resulted in a progressive depletion of SMN protein (Han et al. 2012). Thus, enzymes that add or remove


ubiquitin from SMN are targets for SMA modification as they alter the total amount of SMN.

Experiments which target ubiquitin dynamics of SMN in the Taiwanese mouse model of SMA have resulted in increased SMN protein, weight gain, and survival in the treated animals (Powis et al. 2016). These experiments show that modification of the ubiquitin pathway can reduce the severity of SMA.

Despite the numerous genes and pathways that are purported as being modifiers of SMA, evidence linking genetic variants to mild exception SMA patients is extremely lacking. In

Chapter 2, I analyzed targeted sequencing data of the suspected SMA modifying gene PLS3 in

217 SMA patients and found no evidence of a modifying variant. Only 2 genetic variants have been proposed to modify SMA. These 2 variants are both in or near the NCALD gene and have been discovered in SMN1-deleted patients who were asymptomatic at the time of study, though it is unclear how these variants exert a modifying effect (Riessland et al. 2017). One of these variants was a 2 bp insertion inside of the 5’ UTR while the other was a 17 bp deletion 600 kb upstream of the NCALD gene (Riessland et al. 2017). The 17 bp deletion was adjacent to an

ENCODE super-enhancer, making it a plausible candidate for modification of SMA as NCALD expression was also detected as being lowered in mild exception SMA patients (Riessland et al.

2017). However, data showing how the deletion near the super-enhancer results in lower NCALD expression has not been published. In fact, the paper suggests both the 2 bp insertion and the 17 bp deletion may be required for the modifying effect to occur but no explanation has been given as to how the variants work synergistically to modify SMA or why 1 of the variants alone is insufficient for the modifying effect (Riessland et al. 2017). In short, numerous genes and


pathways have been implicated in modifying SMA, but in nearly all cases a genetic variant has never been discovered linking the modifying gene with an altered phenotype in SMA patients.

In this experiment, we have performed exomic and/or genomic sequencing of discordant SMA siblings in order to identify modifiers of SMA. Siblings from a total of 4 families were sequenced, with 1 of these families only exome sequenced, 1 only genome sequenced, and 2 both exome and genome sequenced. As a control, 8 SMA type 1 patients with 2 copies of SMN2 were exome sequenced and 4 pairs of concordant SMA siblings with SMA type 2 and 3 copies of SMN2 were genome sequenced. I developed a Python program for sorting variants that segregate with the mild sibling but not the severe sibling or the controls (these variants will be referred to as “candidate variants”). By combining the exomic and genomic data I identified 3 candidate exonic variants in the gene ROCK2 that were present in 3 out of 4 of the mild siblings and never in the severe. By performing a mutational analysis on these variants, I determined that 2 of them were highly likely to affect splicing of the gene through generation of a cryptic splice site. In addition, I identified multiple candidate variants in the introns of the genes SLIT1, FBXO3, FAM171A1, HS6ST3, and PTPRD. The intronic variants in these genes are in linkage disequilibrium, the most striking example of which is FBXO3, as there are 24 variants in a 106 kb region that includes the FBXO3 gene, all of which segregate with the mild exception siblings. One of these variants occurs in the poly-pyrimidine tract of a 3’ splice site and may be creating a new branch point. From this project I have identified a limited number of genes with candidate modifying variants of SMA. The genes I have identified can be further captured and sequenced in other discordant siblings and in the 217 individuals I analyzed in Chapter 2 to test whether these variants are associated with mild SMA patients.


3.2 Methods

3.2.1 Patient Samples

This study used multiple sources of DNA including samples previously collected for linkage analysis or molecular studies of SMA (Burghes et al. 1994a; McAndrew et al. 1997;

Miller et al. 2001) under the Institutional Review Board (IRB) of The Ohio State University

OSU1988H0371 that were de-identified and determined to be exempt by the IRB. Thirteen new

SMA samples were collected under IRB No. 2015H0115.

A total of 15 patient samples were exome sequenced, including 3 discordant sibling pairs from 3 different families. Two of these families were described previously (Burghes et al.

1994a). The first pair of siblings are OSU112 and OSU113, who are both female. OSU112 was diagnosed with SMA type 2 with an onset of symptoms at 17 months of age. OSU113 developed normally and an EMG at age 6 showed no signs of neurodegeneration. However, by age 13 she developed weakness in the legs and an EMG showed signs of degeneration. The second family consists of OSU50 (female) and OSU51 (male). OSU50 was diagnosed as SMA type 2 while OSU51 showed no signs of SMA. Haplotype analysis reveals that the two siblings share identical markers between D5S435 and D5S39, in the SMA region. OSU51 showed mild weakness at age 20, though EMG and muscle biopsy were not available at that time to confirm diagnosis. Finally, OSU02-007 and OSU02-009 were sequenced. OSU02-007 is SMA type 3b with 3 copies of SMN2, while OSU02-009 is SMA type 2 with 3 copies of SMN2.

Two of the sibling pairs that were exome sequenced were later used for genome sequencing, namely OSU50 and OSU51, and OSU02-007 and OSU02-009. In addition to

OSU02-007 and OSU02-009, a third sibling OSU02-008 was sequenced. OSU02-008 was a severe SMA sibling with SMA type 2. Patient OSU04-001 was a severe SMA sibling with SMA type 2, and OSU04-002 was mild with no symptoms at age 5. The mild cases include 2 males (OSU51 and OSU04-002) and 1 female (OSU02-007). All discordant siblings had 3 copies of

SMN2 with no alterations in their SMN2 genes. Four pairs of concordant SMA siblings were also sequenced who were all SMA type 2 with 3 copies SMN2 and no alteration of their SMN2 gene.

3.2.2 Exomic sequencing of SMA discordant siblings

DNA collected from the patients was used to create libraries using the TruSeq Library Preparation Kit (Illumina) as per the manufacturer’s guidelines. Quality checks were performed using a High Sensitivity DNA Bioanalyzer chip (Agilent). DNA concentrations were determined using a Qubit Hi-sensitivity DNA Assay kit (Life Technologies).

Libraries were pooled and barcoded and exome enrichment was performed using a TruSeq

Exome Enrichment Kit (Illumina). Sequencing was performed on an Illumina HiSeq 2500 and returned 100 x 100 bp paired-end reads.

3.2.3 Bioinformatic Analysis of Exomes

Sequenced reads were aligned to the hg19 human reference genome using BWA (Li and Durbin

2010). Duplicate reads were removed using the Picard Tools program MarkDuplicates. The best practices pipeline for the Genome Analysis ToolKit was followed for analysis (Rimmer et al.

2014). Reads were re-aligned along known insertions and deletions (as defined by 1000

Genomes phase1 data) using GATK IndelRealigner. Base quality scores were then recalibrated to reduce platform and lane specific errors. Finally, HaplotypeCaller was used to call variants, which were filtered for quality. The final result is a VCF file containing all SNPs and indels. Read counts and manipulations of SAM/BAM files were performed with Samtools (Li et al. 2009).


The VCF file was annotated using the program ANNOVAR (Wang et al. 2010b). Variants were filtered using GATK’s VariantFiltration module with the following settings: QualityByDepth >

2, FisherStrand > 60, RMSMappingQuality < 40, ReadPosRankSumTest < -8.

I wrote a custom Python script that analyzed the generated VCF file for candidate variants. For each sibling pair, the script iterated through all variants and determined which variants differed between the discordant siblings. These variants were considered candidate modifiers. After candidate variants were determined for all 3 sibling pairs, the script noted which candidate variants were present in multiple sibling pairs. Next, all sequence variants in the control SMA type 1 population were pooled and filtered from the list of candidates. The candidate variants were then written to a comma-separated values file. The script can print out candidates based on the number of siblings they are found in, as well as which siblings they are found in. For this analysis, I generated lists for candidate variants that were in all 3 mild siblings, as well as those that were in only 2 mild siblings. Thus, even if the modifying variant was not the same in all 3 of our mild exception patients, it would still have been detected as long as it was in 2 mild siblings. The script considers zygosity when determining candidate modifiers, such that if a mild sibling is homozygous for a variant while the severe is heterozygous, it is still considered a candidate, though this can be filtered if desired.
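The candidate-selection logic described above can be sketched as follows. This is a minimal illustration under stated assumptions and not the script used in this work: each individual's genotypes are assumed to have already been read from the VCF file into a dictionary that maps a variant key to its alternate-allele count (1 for heterozygous, 2 for homozygous), the function names and variant keys are hypothetical, and the zygosity rule is assumed to mean that the mild sibling carries more copies of the variant than the severe sibling.

```python
import csv
import sys
from collections import Counter


def pair_candidates(mild, severe, use_zygosity=True):
    """Variants that distinguish the mild sibling from the severe sibling.

    A variant is kept when the severe sibling lacks it, or (optionally) when
    the mild sibling carries more alternate alleles than the severe sibling.
    """
    candidates = set()
    for variant, mild_count in mild.items():
        severe_count = severe.get(variant, 0)
        if severe_count == 0 or (use_zygosity and mild_count > severe_count):
            candidates.add(variant)
    return candidates


def shared_candidates(pairs, controls, min_pairs=2, use_zygosity=True):
    """Count candidates across sibling pairs and drop any seen in a control."""
    counts = Counter()
    for mild, severe in pairs:
        counts.update(pair_candidates(mild, severe, use_zygosity))

    control_pool = set()
    for control in controls:
        control_pool.update(control)  # pools the variant keys of every control

    return {
        variant: n
        for variant, n in counts.items()
        if n >= min_pairs and variant not in control_pool
    }


def write_csv(candidates, handle):
    """Write candidates and the number of mild siblings carrying them."""
    writer = csv.writer(handle)
    writer.writerow(["variant", "mild_siblings"])
    for variant, n in sorted(candidates.items()):
        writer.writerow([variant, n])


if __name__ == "__main__":
    # Hypothetical genotypes: (mild sibling, severe sibling) for 3 pairs.
    pairs = [
        ({"v1": 1, "v2": 2, "v3": 1}, {"v2": 1, "v3": 1}),
        ({"v1": 1, "v2": 1, "v4": 1}, {"v2": 1}),
        ({"v1": 2, "v4": 1}, {"v1": 1}),
    ]
    controls = [{"v4": 1}]  # hypothetical SMA type 1 control
    write_csv(shared_candidates(pairs, controls, min_pairs=2), sys.stdout)
```

Setting `min_pairs` to the number of sibling pairs reproduces the stricter list of variants found in every mild sibling, while `use_zygosity=False` restricts candidates to variants that are entirely absent from the severe sibling.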

3.2.4 Confirmation of variant

Candidate variants were confirmed using PCR amplification and a restriction digest.

There were 2 candidates validated, p.P1002P in COL5A3 and p.R324H in CYP7B1. The forward primer used for CYP7B1 was 5’-CACATCATTTAGGCTTTCTCTGG-3’ and the reverse was

5’-GATTAGGCTGTCCAATTGTTCTC-3’. For the COL5A3 exonic variant, the forward primer used was 5’-CCACAGAGGAAGACAGGA-3’ and the reverse was 5’-CCAAGACACCTTGAGTCC-3’. The restriction digest was performed using the enzyme BtgI for the COL5A3 variant and HpyCH4III for the CYP7B1 variant. The PCR product and enzyme were incubated at 37°C with 1X CutSmart Buffer.

3.2.5 Genomic sequencing of SMA discordant siblings

For each sample, 1.0 µg of DNA was used to generate the sequencing library using the TruSeq Nano DNA HighThroughput Sample Preparation kit (Illumina). DNA was fragmented via sonication such that average DNA length was 350 bp. Poly-A tails and Illumina adaptors were added to the DNA fragments, which were then subjected to PCR amplification. The libraries were purified using the AMPure XP system and analyzed for size distribution using an Agilent 2100

Bioanalyzer. The samples were clustered using a cBot Cluster Generation System using TruSeq

PE cluster kit v4 (Illumina). Sequencing was performed on an Illumina HiSeq which returned

150 x 150 bp paired-end reads.

3.2.6 Bioinformatic analysis of genomic data

The sequenced reads, in FASTQ format, were subjected to quality control. To start, reads were trimmed for adapters. Reads were eliminated if more than 10% of bases in a read were ambiguous base calls or if the proportion of low quality base calls (as defined by a Phred score of less than 20, equivalent to an error rate above 1%) was over 50% of bases in the read. Reads were aligned using BWA to the human hg19 genome. Duplicate reads were marked using the Picard Tools program MarkDuplicates. Manipulation of the SAM/BAM files was done using Samtools.

Copy number variants and structural variants were called using proprietary software from Novogene. The calling of SNPs and indels was performed using GATK as described above,


including the filtering of the VCF files. Annotation of variants was performed using the program

ANNOVAR. The annotations include RefSeq annotations to identify variants in coding genes. It also annotates variants with scores for SIFT, PolyPhen, and GERP to indicate the degree of severity and conservation of mutations.
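The read-level quality criteria described at the start of this section can be expressed as a small filter. The sketch below is illustrative only and does not reproduce the software actually used for quality control: it assumes Phred+33 encoded quality strings, and the function names are hypothetical. A Phred score Q corresponds to an error probability of 10^(-Q/10).

```python
def phred_to_error(q):
    """Convert a Phred quality score to its base-call error probability."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(char) - 33 for char in quality_string]


def passes_quality_filter(bases, quals, max_n_fraction=0.10,
                          min_phred=20, max_low_fraction=0.50):
    """Return True when a read survives both filters described in the text.

    A read is discarded when more than `max_n_fraction` of its bases are
    ambiguous (N), or when more than `max_low_fraction` of its bases have a
    Phred score below `min_phred`.
    """
    if not bases or len(bases) != len(quals):
        return False  # malformed record
    n_fraction = bases.upper().count("N") / len(bases)
    low_fraction = sum(q < min_phred for q in quals) / len(quals)
    return n_fraction <= max_n_fraction and low_fraction <= max_low_fraction


if __name__ == "__main__":
    # Hypothetical 10 bp read with a single ambiguous base and high qualities.
    read = "ACGTNACGTA"
    quality = decode_phred33("IIIIIIIIII")
    print(passes_quality_filter(read, quality))
    print(phred_to_error(20))
```

For paired-end data a pair would typically be discarded when either mate fails, although that choice is an assumption and is not stated in the text.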

I developed a custom Python script that identifies variants that segregate with mild discordant SMA siblings but are never found in the severe sibling or in the concordant control samples. This script is very similar to the one used in the exome analysis but with several modifications. First, this script allows for the input of families with more than two siblings.

Second, this script looks for all the variants that are different between the concordant SMA siblings and populates them in a list, which is then used to filter all candidate variants that are in this list. The script was further adapted to filter any variant in the concordant siblings, as they all are SMA type 2 individuals with 3 copies of SMN2. Variants were classified by differences in zygosity. Candidate variants were prioritized based on the number of mild discordant siblings that they were detected in as well as the type of mutation with emphasis on exonic mutations that result in a frame shift or that were predicted to be damaging based on SIFT and GERP scores.
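The family-based version of the comparison can be sketched as below. This is a simplified illustration rather than the script itself: genotypes are reduced to sets of hypothetical variant keys, so the zygosity classification and the prioritization by mutation type are left out, and the function names are assumptions made for the example.

```python
from collections import Counter


def family_candidates(mild, severe_siblings):
    """Variants carried by the mild sibling and by none of the severe siblings."""
    carried_by_severe = set()
    for severe in severe_siblings:
        carried_by_severe.update(severe)
    return set(mild) - carried_by_severe


def rank_candidates(families, concordant_controls):
    """Rank candidates by the number of mild siblings that carry them.

    Any variant observed in a concordant control is removed first. Ties are
    broken by variant key so that the output order is reproducible.
    """
    control_variants = set()
    for control in concordant_controls:
        control_variants.update(control)

    counts = Counter()
    for mild, severe_siblings in families:
        counts.update(family_candidates(mild, severe_siblings) - control_variants)

    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical families: (mild sibling, [severe siblings]).
    families = [
        ({"a", "b", "c"}, [{"b"}, {"c"}]),  # family with two severe siblings
        ({"a", "d"}, [{"x"}]),
        ({"a", "d", "e"}, [set()]),
    ]
    concordant_controls = [{"e"}]
    for variant, n_mild in rank_candidates(families, concordant_controls):
        print(variant, n_mild)
```

Accepting a list of severe siblings per family is what allows a family such as the one containing OSU02-007, OSU02-008, and OSU02-009 to be analyzed as a single unit.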

Gene expression information was added based on expression from an RNA-sequencing experiment of laser-capture microdissected motor neurons from wildtype mice as well as human brain data from the Allen Brain Atlas (Hawrylycz et al. 2012; Zhang et al. 2013). Genes with neuronal expression greater than 1 read per kilobase per million reads sequenced (RPKM) were considered as good candidate genes for the modifier.
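For reference, the RPKM normalization behind this threshold can be written out explicitly. The sketch below uses the conventional definition, in which a gene's read count is divided by its length in kilobases and by the number of mapped reads in millions; the function names and the example counts are hypothetical and are not taken from the datasets cited above.

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


def is_candidate_by_expression(value, threshold=1.0):
    """Apply the expression cutoff: strictly greater than `threshold` RPKM."""
    return value > threshold


if __name__ == "__main__":
    # Hypothetical gene: 500 reads, 2,000 bp long, 25 million mapped reads.
    value = rpkm(500, 2000, 25_000_000)
    print(value, is_candidate_by_expression(value))
```

Because RPKM divides by gene length, long genes with few reads fall below the cutoff even when their raw read counts look substantial.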


3.3 Results

3.3.1 Exonic variants identified from exomic and genomic sequencing

In order to identify possible genetic modifiers of SMA, we performed exomic and genomic sequencing on pairs of discordant SMA siblings. Initially, we only exome sequenced 3 pairs of discordant siblings. However, we were unable to find any candidate variants that were consistently present in mild exception cases and absent in severe exception cases. In order to increase the chances of detecting the modifier, we genome sequenced an additional family, as well as 2 of the families who were exome sequenced. Thus, we had a total of 4 discordant sibling families that were sequenced. As a control, 8 SMA type 1 patients with 2 copies of SMN2 were exome sequenced and 4 pairs of concordant SMA type 2 siblings with 3 copies of SMN2 were genome sequenced. Candidate variants were determined bioinformatically by identifying variants in the mild SMA sibling but not the severe sibling. Variants were prioritized based on whether they were present in multiple mild SMA siblings. Variants were removed if they were detected in our concordant SMA patient controls. A schematic of the variant filtration process is shown in Figure 3.1.

Interestingly, I did not detect any exonic mutation that was present in all 4 mild discordant siblings. This means there may be 2 genetic modifiers in the families sequenced or the modifier is not exonic (intronic candidate modifiers are discussed in the next section). However, there were 6 exonic variants present in 3 mild discordant siblings, 1 of which was a non-frameshift insertion, 1 was missense, and 4 were synonymous. The non-frameshift insertion was 30 bp long and in the gene IVL, which is involved in keratinization (Yoon et al. 2014). To determine if IVL is expressed in motor neurons, I utilized publicly available data from an RNA-seq experiment performed on motor neurons laser-capture microdissected from mice as well as the Allen Brain Atlas (Hawrylycz et al. 2012; Zhang et al. 2013). From these data I determined IVL had very low expression in neurons of 0.001 RPKM and hence it is unlikely this gene modifies SMA, as motor neurons are the primary tissue affected in SMA.

Figure 3.1. Schematic of variant filtration pipeline for genomically sequenced discordant siblings. First, variants are identified that segregate with the mild SMA sibling but not the severe. A greater priority is placed on those present in multiple mild siblings. Variants in the control samples are then filtered.

I also detected 1 exonic variant each in the genes COL5A3 and CYP7B1. The synonymous c.3006T/p.P1002P (rs12610207) variant was found in the gene COL5A3, which is involved in making certain types of collagen (Malfait et al. 2005). Mutations in the related genes

COL5A1 and COL5A2 are known to cause Ehlers-Danlos syndrome (Malfait et al. 2005;

Hoffman et al. 2008). The p.P1002P mutation had a very high allele frequency at 0.63. In addition, there were 5 intronic mutations in COL5A3, all within 20 kb of each other with similarly high allele frequencies ranging from 0.56 to 0.66. In CYP7B1, the non-synonymous c.G971A/p.R324H (rs59035258) mutation was identified. CYP7B1 codes for a protein that is involved in cholesterol metabolism (Li-Hawkins et al. 2000; Tsaousidou et al. 2008; Cui et al.

2013). Interestingly, mutations in CYP7B1 are known to result in a form of spastic paraplegia, a disease which can result in weakness of the limbs (Schüle et al. 2009; Schlipf et al. 2011;

Arnoldi et al. 2012). The variant c.G971A/p.R324H has a frequency of approximately 0.029.

Unfortunately, the p.R324H mutation of CYP7B1 was subsequently found in the more severe sibling of 2 discordant sibling pairs and the p.P1002P mutation (as well as the 5 intronic mutations) of COL5A3 was later found in numerous concordant SMA siblings, eliminating both as being possible modifiers of SMA.

The remaining 3 exonic mutations were all found in the ROCK2 gene. These variants were the c.G342A/p.S114S (rs41264193), c.A2145G/p.L715L (rs55932113), and c.A255G/p.L85L (rs56070302) mutations, which all have an allele frequency of approximately 0.015. In order to determine if these synonymous mutations had any effect on the splicing of the gene, a splice site mutational analysis was performed using the Alamut software. From this analysis I determined that both the c.A2145G/p.L715L and c.A255G/p.L85L variants were predicted to affect splicing by creation of a cryptic splice site. Interestingly, the ROCK pathway has previously been implicated in modifying SMA, as treatment of mild Smn2b/- SMA mice with

ROCK inhibitors resulted in an extension of survival (Bowerman et al. 2010, 2012; Nölle et al.

2011), making these 3 ROCK2 variants highly interesting candidates.

In order to expand the list of candidate variants, I searched for variants present in only 2 mild siblings. I also performed an analysis for genes which were mutated in the mild siblings but

with different mutations. In this analysis, I prioritized the candidate variants if they resulted in a non-synonymous amino acid change, had a known neuronal function, and had an allele frequency less than 0.25 which is consistent with the numbers of SMA exception patients. By doing this analysis, I found a total of 141 candidate variants in 60 genes. Of note were 2 mutations in the SRRM2 gene, which is involved in 3’ splice site selection (Fontrodona et al.

2013), which is particularly interesting because of known splicing defects reported in SMA samples (Zhang et al. 2008; Lotti et al. 2012). Mutations in SRRM2 have been shown to result in alterations of mRNA splicing and exon inclusion (Tomsic et al. 2015), and thus mutations in

SRRM2 could possibly complement a reduction in SMN. Also of note were the p.Q1216E and p.E679K mutations in the SBF2 gene, which is known to be involved in Charcot-Marie-Tooth disease (Conforti et al. 2004).

3.3.2 Confirmation of the CYP7B1 and COL5A3 variants

Candidate variants were amplified using PCR and validated using a restriction digest. In particular, we validated the COL5A3 p.P1002P and the CYP7B1 p.R324H exonic variants.

Primers were made to amplify the variant. The resultant PCR product was incubated with the restriction enzyme BtgI (COL5A3) or HpyCH4III (CYP7B1) and analyzed using gel electrophoresis. A gel showing validation of the CYP7B1 variant can be found in Figure 3.2.

After performing this analysis, we were not able to identify the CYP7B1 variant in any additional mild discordant siblings and it has subsequently been found in a severe discordant sibling and thus it is unlikely to be a modifier of SMA. The COL5A3 variants have an extremely high allele frequency which is inconsistent with the numbers of SMA exception patients and were also subsequently found in numerous concordant patients.

Figure 3.2. Validation of the CYP7B1 p.R324H variant using restriction digest. The digest was performed with HpyCH4III, which does not cut if the variant is present. Lanes 1-8 were severe patients. Lanes 9-11 were the mild discordant siblings. Lanes 12 and 13 were negative controls. The digest indicates that the samples in lanes 9-11 are heterozygous for the p.R324H variant.

3.3.3 Intronic variants identified from genomic and exomic sequencing

In order to expand the search for SMA modifiers, I performed the analysis again using only variants that lie in introns. This analysis was performed on the 3 families that were genome sequenced. From this analysis, I identified 107 intronic candidate variants. Interestingly, many of the 107 intronic candidate variants occurred in the same genes. The genes SLIT1, CD59, FBXO3,

PTPRN2, PTPRD, RAPGEF1, HS6ST3, and FAM171A1 all had at least 2 intronic candidate variants. There were 5 intronic variants identified in SLIT1, 13 in CD59, 9 in FBXO3, 10 in

PTPRD, 12 in HS6ST3, and 2 each in PTPRN2, RAPGEF1, and FAM171A1. A table showing all the intronic mutations in genes with multiple intronic candidate variants can be found in Table

3.1. The high number of candidate variants found in the genes SLIT1, FBXO3, CD59, and

PTPRD suggests that the variants may be in linkage disequilibrium and strengthens the possibility the SMA modifier exists in this region. Interestingly, the genes CD59 and FBXO3 are adjacent to each other and thus between the 2 genes there were 22 candidate variants detected.

There was an additional candidate variant found in the gene C11orf91, which is just upstream of

CD59, as well as an intergenic variant found downstream of FBXO3. Altogether, there were 24

Table 3.1. Intronic candidate variants identified by genome sequencing

Loci           Ref^a   Alt^b   Gene         dbSNP          Freq^c
10:15276828    A       T       FAM171A1     rs11259564     0.134784
10:15277246    G       A       FAM171A1     rs113331813    0.136581
10:98801610    AAAG    A.D3    SLIT1        rs10533432     0.0716853
10:98809229    G       A       SLIT1        rs12098593     0.125599
10:98811296    C       T       SLIT1        rs4917755      0.1252
10:98812106    C       T       SLIT1        rs111453139    0.124002
10:98817662    G       A       SLIT1        rs59428103     0.109824
11:33721419    C       G       C11orf91^d   rs831614       0.189097
11:33733839    T       C       CD59         rs831619       0.182708
11:33738036    T       G       CD59         rs17760306     0.076278
11:33738267    C       T       CD59         rs831637       0.177117
11:33741128    A       T       CD59         rs831632       0.193091
11:33741657    C       G       CD59         rs10501127     0.0854633
11:33742290    C       CA      CD59         rs10631432     0.194489
11:33743534    A       G       CD59         rs704700       0.194489
11:33749637    C       T       CD59         rs145935204    0.0754792
11:33751536    T       G       CD59         rs12420580     0.0758786
11:33753739    C       T       CD59         rs61887474     0.0754792
11:33757092    T       C       CD59         rs17760694     0.0874601
11:33757173    G       C       CD59         rs3181268      0.0886581
11:33757459    CA      C.D1    CD59         rs571280343    0.129992
11:33778918    G       A       FBXO3        rs61887477     0.122804
11:33780163    A       T       FBXO3        rs1522086      0.122804
11:33782214    G       A       FBXO3        rs7941387      0.122804
11:33785913    T       G       FBXO3        rs7943459      0.123003
11:33786362    A       G       FBXO3        rs11607549     0.122604
11:33793407    C       G       FBXO3        rs56026019     0.123003
11:33794976    G       A       FBXO3        rs11599935     0.122804
11:33794984    T       C       FBXO3        rs7930105      0.123003
11:33795545    C       T       FBXO3        rs17767323     0.0828674
11:33827805    C       T       -^d          rs77581366     0.0796
7:157412101    G       A       PTPRN2       rs77798372     0.0393371
7:157536990    C       T       PTPRN2       rs75478675     0.115216
9:134462354    A       AAC     RAPGEF1      rs770094684    .
9:134590098    T       A       RAPGEF1      .              .
9:8538919      A       G       PTPRD        rs2184978      0.24361
9:8539431      C       T       PTPRD        rs34287535     0.110823
9:8546660      C       G       PTPRD        rs34736341     0.120008
9:8548306      C       A       PTPRD        rs59716145     0.116414
9:8549070      G       C       PTPRD        rs35386998     0.108427
9:8549818      T       C       PTPRD        rs34535566     0.151358
9:8550219      T       C       PTPRD        rs34388664     0.110024
9:8550388      C       T       PTPRD        rs35300369     0.111022
9:8550669      G       C       PTPRD        rs34313240     0.113419
9:8550819      C       T       PTPRD        rs62534057     0.111222
13:97111474    T       G       HS6ST3       rs7331513      0.2075
13:97111792    G       A       HS6ST3       rs7331050      0.2007
13:97117174    A       G       HS6ST3       rs16951476     0.2053
13:97118760    A       G       HS6ST3       rs9513148      0.1999
13:97118967    C       A       HS6ST3       rs9513149      0.1997
13:97118990    C       G       HS6ST3       rs12583324     0.1997
13:97224650    T       C       HS6ST3       rs3858769      0.1538
13:97228891    A       T       HS6ST3       rs9582054      0.1536
13:97229933    G       A       HS6ST3       rs4548722      0.1727
13:97229939    C       T       HS6ST3       rs4262817      0.1735
13:97342380    C       T       HS6ST3       rs35063443     0.1538
13:97356882    T       G       HS6ST3       rs1924580      0.1657

^a Reference allele. ^b Alternate allele. ^c Allele frequency. ^d These variants are adjacent to the FBXO3 and CD59 genes.

candidate variants detected within a 106 kb region. A diagram of this region showing the genes in the area and the location of all candidate variants can be found in Figure 3.3.
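The size of this region follows directly from the chromosome 11 coordinates listed in Table 3.1. The short sketch below recomputes the variant count and the span between the first and last candidate variant; the helper name `region_summary` is introduced here for illustration only, and the positions are copied from Table 3.1.

```python
def region_summary(positions):
    """Return (number of variants, span in bp) for positions on one chromosome."""
    return len(positions), max(positions) - min(positions)

# Chromosome 11 candidate variants from Table 3.1
# (C11orf91, CD59, FBXO3, and the downstream intergenic variant).
CHR11_CLUSTER = [
    33721419,                                          # C11orf91
    33733839, 33738036, 33738267, 33741128, 33741657,  # CD59
    33742290, 33743534, 33749637, 33751536, 33753739,
    33757092, 33757173, 33757459,
    33778918, 33780163, 33782214, 33785913, 33786362,  # FBXO3
    33793407, 33794976, 33794984, 33795545,
    33827805,                                          # intergenic
]

# SLIT1 candidate variants on chromosome 10 from Table 3.1.
SLIT1_CLUSTER = [98801610, 98809229, 98811296, 98812106, 98817662]

n_variants, span_bp = region_summary(CHR11_CLUSTER)
print(f"chr11 cluster: {n_variants} variants spanning {span_bp:,} bp")
```

The same helper can be applied to the other clusters in Table 3.1, such as the SLIT1 variants on chromosome 10.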

The high number of candidate variants in such narrow regions of the genome suggests the variants may be in linkage disequilibrium. In order to determine if these variants were in linkage disequilibrium, I searched for haplotype information of these variants in the Ensembl database.

On multiple ethnic backgrounds, many of the candidate variants were identified as being in linkage disequilibrium. Table 3.2 shows variants of FBXO3 on the ethnic background of Utah residents with Northern and Western European ancestry (CEU) which uses the rs1522086 variant as a starting point. On the CEU ethnic background, there were 11 candidate variants which were found to be in linkage disequilibrium according to Ensembl. In addition, there were 9 more

Figure 3.2. Map of variants near FBXO3 in linkage disequilibrium. Each orange vertical line represents 1 variant. There are 24 total variants that span a 106 kb region. These variants were present in all 3 mild discordant patients but in no severe or control patient. Underneath the 2 small green dashes are 3 candidate genes very close together.

Table 3.2. Variants of FBXO3 and CD59 in linkage disequilibrium

Variant        Distance   Gene             r²      D'      Concordant^a
rs61887477     1245       FBXO3            1.000   1.000
rs7941387      2051       FBXO3            1.000   1.000
rs7943459      5750       FBXO3            0.971   1.000
rs11607549     6199       FBXO3            0.971   1.000
rs56026019     13244      FBXO3            0.971   1.000
rs11599935     14813      FBXO3            0.971   1.000
rs7930105      14821      FBXO3            0.971   1.000
rs17767323     15382      FBXO3            0.971   1.000
rs11032377     19250      RP11-646J21.6    0.971   1.000   134_2
rs12419271     19404      RP11-646J21.6    0.971   1.000   134_2
rs147788946    21877      intergenic       0.971   1.000
rs11032385     23847      intergenic       0.971   1.000   134_2
rs11607434     18756      RP11-646J21.6    0.943   1.000   134_2
rs11601836     35812      intergenic       0.943   1.000   134_2
rs60691031     36509      intergenic       0.943   1.000   134_2
rs11608235     37825      intergenic       0.943   1.000   134_2
rs61887491     39255      intergenic       0.943   1.000   134_2
rs61887492     39256      intergenic       0.943   1.000   134_2
rs3181268      22990      CD59             0.943   0.971
rs17760694     23071      CD59             0.943   0.971

^a ID number of any concordant patients with the variant.

variants which were in the mildly affected siblings that were also determined to be in linkage disequilibrium on the CEU background; however, they were not candidate variants as they were detected in one of the concordant controls. All variants identified as being in linkage disequilibrium by Ensembl were present in the mild discordant siblings. However, there were 13 additional candidate variants which were not in linkage disequilibrium on the CEU background, which were rs831614, rs831619, rs17760306, rs831637, rs831632, rs10501127, rs10631432, rs704700, rs145935204, rs12420580, rs61887474, rs571280343, and rs77581366. Other ethnic

backgrounds in which several candidate FBXO3 variants were found to be in linkage disequilibrium include the Mexican ancestry in Los Angeles (MXL) and Han Chinese populations. I also analyzed the PTPRD gene for linkage disequilibrium. Nine of the 10 candidate variants in PTPRD were found to be in linkage disequilibrium on the CEU background. One variant, rs35300369, was not found to be in linkage disequilibrium. Similarly, the 4 candidate SNPs in SLIT1 were all found to be in linkage disequilibrium with each other on the MXL background, as were the variants rs4917754, rs74542871, and rs12244078, which were present in the mild exception cases but were not considered candidate variants because they were present in the concordant siblings OSU45 and/or OSU46. These data show that many candidate variants are in linkage disequilibrium, making them highly interesting to study as potential modifiers of SMA.
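For readers less familiar with the two linkage disequilibrium measures reported by Ensembl, the sketch below shows the standard definitions of r² and D' computed from two-locus haplotype counts. The function name `ld_stats` and the haplotype counts are hypothetical and chosen only to illustrate how a pair of variants can have D' = 1 while r² is below 1, as seen for several pairs in Table 3.2; the values in that table were taken from Ensembl, not computed this way.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D', r^2) from counts of the four two-locus haplotypes.

    A/a are the alleles at the first variant and B/b at the second.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    # D is the deviation of the observed haplotype frequency from independence.
    d = p_AB - p_A * p_B
    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the alleles at the two loci.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2

# Hypothetical counts: the A allele is only ever seen with B (D' = 1), but B
# also occurs once without A, so the correlation is not perfect (r^2 < 1).
d_prime, r2 = ld_stats(n_AB=12, n_Ab=0, n_aB=1, n_ab=87)
print(f"D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```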

Intronic variants may impact transcripts by altering splicing via the disruption of splicing modulators. To identify variants that may affect splicing, I performed a splicing analysis using

Human Splice Finder (Desmet et al. 2009). One variant, rs1522086, is found inside of the polypyrimidine tract of intron 2 of FBXO3. This variant, which is a T-to-A change on the mRNA, was predicted to create a new branch point by Human Splice Finder. Thus, this variant is highly likely to cause an alteration of splicing which could alter the protein. In order to determine if any cryptic splice sites may be created by the intronic mutations, I performed a mutational splicing analysis using Alamut. There was evidence of weak cryptic splice site activation for the variants rs7943459 and rs11599935 of FBXO3 as well as for variants rs59716145 and rs35386998 of

PTPRD. Thus, these variants may disrupt splicing of the gene leading to altered protein levels.

3.3.4 Analysis of PLS3 and NCALD

The PLS3 and NCALD genes have previously been identified as modifiers of SMA.

However, no genetic variant of PLS3 has ever been identified as being a modifier of SMA. In

Chapter 2, I sequenced the PLS3 gene and found no evidence of a modifying variant in PLS3 that segregated with mild exception SMA cases. Similarly, in this experiment no variant was detected in PLS3 or in a 100 kb region adjacent to the gene that segregated with the milder discordant

SMA siblings. The NCALD gene has been identified as being downregulated in asymptomatic

SMN1 deleted patients and two variants have been associated with this phenotype. To determine if these variants were responsible for the mild phenotype in the discordant patients I sequenced, I analyzed our samples for the presence of both these variants. The first variant is rs147264092 which is a 2 bp insertion in intron 1 of NCALD. This variant was present in S02_009 which is the more severely affected sibling of the discordant SMA trio that was sequenced. It was also present in S134_2 which is one of the concordant sibling controls. In order to determine if the rs147264092 variant disrupts splicing of the NCALD gene, I analyzed this variant using the

Alamut program. This variant was predicted to strongly activate a cryptic donor splice site. This has not previously been reported and may explain how the mutation results in lower levels of

NCALD. The second variant was rs150254064, which is a 17 bp deletion located 600 kb upstream of NCALD. I found the rs150254064 variant in both the mild and severe sibling in one discordant pair (S04_001 and S04_002) as well as in S57, one of the concordant sibling controls. Thus, I have found no evidence that either of these variants alone is sufficient to modify the SMA phenotype. Read alignments at these 2 variants in different patients can be seen in

Figure 3.3.

Figure 3.3. Read alignments of 2 NCALD variants. (A) Read alignments of the rs147264092 variant. The insertion is shown as a purple “I” and it is only present in the severe sibling. (B) Read alignments of the rs150254064 variant. The deletion is present in 1 concordant sibling but not the other.

3.4 Discussion

There is currently a lack of explanation as to why SMA exception cases exist. Genetic modifiers that lie in SMN2 can explain some cases (Prior et al. 2009; Wu et al. 2017; Ruhno et al.

2019), but not all as I showed in Chapter 2. Though numerous genes and pathways have been identified as being capable of modifying SMA severity, the role of these pathways in modifying

SMA severity in patients has not been demonstrated. Identifying genetic modifiers is crucial for accurate patient prognosis and measurement of clinical trial outcomes as well as identification of further treatment targets. Though clinical trials for SMA are advancing and one treatment is already approved, it is unknown how effective these treatments will be for older, symptomatic patients and thus further treatments may be needed and more research into understanding the disease

and how to attenuate it is crucial. Here, I have performed exomic and genomic sequencing on discordant SMA siblings. I have written Python scripts that allow for the filtering of variants segregating with the milder siblings. I have annotated variants, including whether the variants may result in cryptic splice site utilization. I have also determined which variants lie in genes that are expressed in mouse motor neurons and human neuronal tissue. I have identified candidate variants in the genes ROCK2, FBXO3, HS6ST3, FAM171A1, PTPRD, and SLIT1 which are in linkage disequilibrium. One of the FBXO3 variants lies in the polypyrimidine tract and may be altering splicing of the gene. Similarly, 2 variants in ROCK2 are predicted to affect splicing by the creation of a cryptic splice site. Thus, we have identified 6 candidate genes as possible modifiers of SMA, which will be further investigated to determine if they are present in milder SMA exception patients as well as if they alter protein levels of the gene.

From the exome sequencing data, I identified the p.R324H variant in cytochrome P450 7B1 (CYP7B1) as a candidate modifier. CYP7B1 is a gene involved in bile acid metabolism. It is involved in the alternative (also called acidic) pathway that converts cholesterol to lithocholic acid (LCA) (Kwong et al. 2015). Specifically, it catalyzes a reaction of an intermediate to chenodeoxycholic acid (CDCA) (Kwong et al. 2015). Mutations in CYP7B1 have been reported to cause a form of inherited spastic paraplegia (SPG5) (Tsaousidou et al. 2008; Biancheri et al.

2009; Criscuolo et al. 2009; Di Fabio et al. 2014). Interestingly, one reaction catalyzed by

CYP7B1 results in the formation of the 3β,7α-diHCA cholesterol molecule, which appears to have a neuroprotective effect in tissue culture experiments (Theofilopoulos et al. 2014). Analysis to determine the presence of this molecule in the CSF and plasma of SPG5 patients showed that there is a deficiency of this molecule in diseased patients (Theofilopoulos et al. 2014). Thus,

CYP7B1 may be capable of modifying neuronal phenotypes by increasing levels of 3β,7α-diHCA. Indeed, studies have listed p.R324H as a possible disease modifier as it associates more commonly with complex SPG5 patients than would be expected (Goizet et al. 2009; Schüle et al.

2009). However, the existence of the p.R324H mutation in the more severe sibling of a discordant SMA pair is strong evidence this variant does not modify SMA.

I also identified one synonymous mutation and four intronic mutations in COL5A3. This gene is known for encoding the alpha3 chain of collagen V (Mak et al. 2016). Collagen V can also contain alpha1 and alpha2 chains, encoded by the genes COL5A1 and COL5A2, respectively (Mak et al. 2016). Mutations in COL5A1 and COL5A2 are known to account for about half of classic cases of the disease Ehlers-Danlos syndrome (EDS) (Malfait et al. 2005). As a result of the prevalence of EDS-causing mutations in COL5A1 and COL5A2, COL5A3 was suspected as being involved in the disease as well. However, no EDS-causing mutations have yet been reported in COL5A3 (Hoffman et al. 2008). The variants I identified in COL5A3 were extremely common, with a frequency as high as 0.66 as reported by dbSNP. The high frequency eliminates them as possible SMA modifiers. To eliminate the possibility of another, less frequent, variant being in linkage disequilibrium with the COL5A3 variants, I searched the nearby intergenic region for any additional candidate variants, including in known miRNA. However, none were found. Thus, these mutations are likely just a common haplotype and not evidence of an SMA modifier in linkage disequilibrium.

By combining both the exomic and genomic sequencing data, I identified 3 synonymous candidate variants in ROCK2 that were in 3 of the 4 mild discordant siblings sequenced. Two of these variants were predicted to alter splicing via creation of a cryptic splice site. Interestingly,

there were also 147 intronic variants that I found segregating with 2 of the milder siblings, 49 of which were in linkage disequilibrium according to Ensembl. I am not able to determine if these intronic variants are also present in the third mild sibling with the synonymous ROCK2 mutations, as this sibling pair was only exome sequenced. Though I have identified 2 exonic mutations predicted to create a cryptic splice site, it is entirely possible an intronic variant in

ROCK2 modifies SMA in a similar manner and the exonic mutations are just in linkage disequilibrium. Therefore, it is very important to sequence the ROCK2 gene of additional milder discordant siblings to further narrow down the list of exonic and intronic variants that segregate with the exception patients.

In Chapter 1, I described how RhoA and the ROCK pathway were considered possible modifiers of SMA. Briefly, experiments where mice were treated with the RhoA/ROCK inhibitors Y-27632 or Fasudil resulted in amelioration of the SMA phenotype, including increases in weight, survival, and motor function (Bowerman et al. 2010, 2012). Although much of the research related to the ROCK pathway and SMA was not specifically focused on the gene

ROCK2, ROCK2 has been shown to act synergistically with ROCK1 and is similarly inhibited by

Y-27632 (Swanger et al. 2015). One of the variants I identified in ROCK2 was the rs56070302 mutation which is predicted to create a cryptic splice site donor. This would take the coding region out of frame and as it is located in the 5’ end of the gene in exon 3, it would certainly result in non-functional protein. Thus, the lowering of ROCK2 protein levels leading to inhibition of the ROCK pathway is a clear mechanism of how the variants I identified in ROCK2 could lead to a milder SMA phenotype.

The candidate variants in FBXO3 and CD59 were of great interest to me initially for three reasons. First, there were 24 variants found within 106 kb of each other. Second, many of these variants were known to be in linkage disequilibrium. Third, FBXO3 gene expression was found in both mouse motor neurons and human brain tissue. FBXO3 is more likely to be a modifier of

SMA as CD59 codes for a protein that is an inhibitor of the membrane-attack complex (Rollins and Sims 1990) and CD59 gene expression in motor neurons is an order of magnitude lower than

FBXO3, thus making it a less attractive candidate than FBXO3. The function of FBXO3 also makes it an attractive candidate for SMA modification. FBXO3 is an E3 ubiquitin ligase and a member of the F-box protein family. F-box proteins interact with SKP1, CULLIN1, and ROC1, and together form the Skp1-Cullin1-F-box (SCF) complex whose function is to bind specific substrates for ubiquitination (Cui et al. 2011). There are over 60 F-box proteins, though the targets of each have not been fully elucidated (Cui et al. 2011). In particular, FBXO3 has been shown to polyubiquitinate the protein SMURF1, targeting it for degradation (Li et al. 2015).

Interestingly, SMURF1 is also an E3 ubiquitin ligase and ubiquitination assays have shown it catalyzes the direct ubiquitination of RhoA, leading to its degradation (Wang et al. 2003). It may be possible for FBXO3 to indirectly affect the Rho-kinase pathway by altering levels of

SMURF1. An intronic mutation of FBXO3 which alters splicing and inactivates the gene would be predicted to decrease the ubiquitination of its targets, including SMURF1, allowing SMURF1 to act as a RhoA inactivator. Thus, two of the candidate modifiers I identified may alter SMA severity via ROCK inhibition. Other targets of FBXO3 include Fbxl2 (and by extension TNFR-associated factor (TRAF) proteins) and p62 (Kainulainen et al. 2014; Chandra et al. 2019).

TRAF proteins are responsible for inflammation (Mallampalli et al. 2013), while p62 is a subunit

of the transcription factor TFIIH (Kainulainen et al. 2014), functions that seem to be unrelated to SMA.

FBXO3 may also more directly be affecting SMA via ubiquitination. SMN is known to be degraded through the ubiquitin pathway, based on experiments showing increased SMN protein in SMA-patient derived fibroblasts that were treated with a proteasome inhibitor (Chang et al. 2004; Burnett et al. 2009a). Perturbations in the ubiquitin pathway have been reported in

SMA, such as decreased expression of the ubiquitin-like modifier activating enzyme UBA1

(Wishart et al. 2014). Restoration of UBA1 using AAV9 in a mouse model of SMA modestly increased survival (Powis et al. 2016). Salbutamol has been shown to inhibit ubiquitination of

SMN (Harahap et al. 2015), though clinical trials have been mixed as double-blinded studies showed no increase in strength (Pane et al. 2008; Morandi et al. 2013). Ubiquitin has other roles aside from protein degradation, including directing cellular localization of proteins and altering protein-protein interactions (Schnell and Hicke 2003). Indeed, differences in ubiquitination can account for the different subcellular location of SMN (Han et al. 2016). Mutations in FBXO3 could therefore be modifying SMA if they alter SMN protein ubiquitination.

A second candidate gene of interest I detected was the protein tyrosine phosphatase receptor PTPRD. Genes in the protein tyrosine phosphatase receptor family (PTPR) are known as tumor suppressors as their inactivation has been found in numerous types of cancer (Chan et al. 2009). Protein tyrosine phosphatase receptor delta (PTPRD) in particular is commonly found to be lost in glioblastoma multiforme cancers (Veeriah et al. 2009; Ortiz et al. 2014). Multiple studies have shown that the tumor suppressor property of PTPRD comes from its ability to dephosphorylate the protein STAT3 (Chan et al. 2009; Veeriah et al. 2009; Ortiz et al. 2014).

STAT3 activation has been detected in response to nerve injury (Rajan et al. 1995; Alonzi et al.

2001), including motor neurons (Schweizer et al. 2002a). When rat motor neurons were subjected to injury, there was a near-immediate increase in STAT3 activation which was sustained in some cases for 3 months afterward (Haas et al. 1999; Schwaiger et al. 2000). In a study where knockout of STAT3 in mice was performed using Cre recombinase, isolated motor neurons showed significantly reduced survival after nerve injury (Schweizer et al. 2002b).

STAT3 activation extends survival in mice with progressive motor neuropathy (Jablonka et al.

2013), suggesting that loss or disruption of a STAT3 inactivator may have a protective effect in motor neuron disease. The mode of action of STAT3-mediated neuroprotection may be the inhibition of apoptosis, as STAT3 activation results in an upregulation of the anti-apoptotic protein Bcl-xL

(Grad et al. 2000; Schwaiger et al. 2000). Interestingly, 2 related anti-apoptotic proteins, Bcl-2 and Bcl-X, were detected as being downregulated in SMA type 1 fetuses (Soler-Botija et al.

2003), and knockout of the apoptotic Bax protein prolonged survival in SMA mice (Tsai et al.

2006). Thus, STAT3 in motor neurons appears to be activated and have some neuroprotective effect. A loss of a STAT3 inactivator such as PTPRD may allow for increased activation of the

STAT3 pathway and attenuate disease severity by upregulating anti-apoptotic proteins.

The third region identified as being in linkage disequilibrium was in the gene SLIT1.

SLIT1 is crucial for neurodevelopment as it acts as a chemoattractant or chemorepellent by binding to Roundabout (Robo) receptors (Kramer et al. 2001). SLIT1 has been demonstrated to be involved in the development of many neuronal types, including pyramidal neurons (Yeh et al.

2014), olfactory sensory neurons (Jaafar et al. 2016), dorsal root ganglion neurons (Zhang et al.

2010), and retinal ganglion neurons (Thompson et al. 2006). Importantly, experiments in mouse

embryos using SLIT1 and Robo1/Robo2 mutants have shown that SLIT1 was needed for the proper migration of motor neuron axons out of the neural tube (Kim et al. 2015, 2017b). Thus, changes in Slit1 protein levels may alter motor neuron development. However, it should be noted that in mouse models of SMA, there was no delay or defect in axonal development and axons reached muscles at similar timepoints in SMA mice compared to controls (McGovern et al.

2008). Therefore, if Slit1 modifies SMA it is likely through a mechanism separate from neurodevelopment. As I detected expression of Slit1 in postnatal mouse motor neurons, it may have other functions in mature motor neurons. Indeed, Slit1 may also be involved in neuroregeneration, as it has been found to be upregulated in rat spinal cord after peripheral nerve injury (Yi et al. 2006). Still, it is unclear how intronic mutations may alter such a function. The most likely method would be by altering splicing in some way. SLIT1 is known to undergo alternative splicing (Tanno et al. 2004), though the function of the different spliceforms is not fully elucidated.

The genes FAM171A1 and HS6ST3 also had multiple candidate intronic variants detected in them. Not much is known about the FAM171A1 gene, but from the expression data available I determined it had high expression in neuronal tissues of 19.81 RPKM. HS6ST3 codes for heparan sulfate 6-O-sulfotransferase 3, which may be important in axon guidance (Irie et al. 2002). As such, both FAM171A1 and HS6ST3 could possibly act as SMA modifiers, though evidence is limited and they currently are not as strong candidates as the other genes discussed above.

Two genes that have been proposed to be SMA modifiers are the genes PLS3 and

NCALD (Oprea et al. 2008b; Riessland et al. 2017). No genetic variant has been identified in

PLS3 that has been linked to the milder phenotype. In Chapter 2, I performed targeted genome sequencing on the PLS3 gene of 217 SMA patients and found no evidence of modifying variants that segregate with exception SMA patients. In this project, I have again found no evidence of modification of the SMA phenotype by PLS3. Similarly, I did not find evidence of genetic variants in NCALD modifying the SMA phenotype, including the rs147264092 and rs150254064 variants. In the original paper describing NCALD as an SMA modifier in asymptomatic SMN1 deleted patients, it was implied that both the rs147264092 and rs150254064 variants must be present in a patient for them to have a modifying effect, though no data was published showing how these variants work synergistically (Riessland et al. 2017). However, the paper also demonstrated that these asymptomatic patients have reduced NCALD expression and protein levels (Riessland et al. 2017). Presumably, the variants result in the lowered NCALD expression and protein levels. Interestingly, rs150254064 is a 17 bp deletion and was found adjacent to an ENCODE super-enhancer. From my analysis, I have found evidence that rs147264092 may result in creation of a cryptic splice site, which would potentially disrupt the gene. If this is correct, lowered NCALD expression may come from rs150254064 impacting the ENCODE super-enhancer, while rs147264092 may lower protein levels by impacting gene splicing. If this is indeed how the variants work synergistically, it is unclear why both variants must be present for the modifying effect to take place. Regardless, we have demonstrated here that the presence of one of the variants alone is insufficient for any modifying effect.

This work paves the way for the discovery of SMA modifiers. Additional confirmation of each variant’s ability to modify SMA is needed and can be accomplished in multiple ways.

For the variants in ROCK2 and FBXO3 that may be impacting splicing, a splicing assay using a

Table 3.3. Clones that contain candidate modifying genes

Gene       Description                                                                       Clone                       Region in LD^a (bp)
CD59       COPII vesicle coating, regulation of membrane attack complex                      RP11-646J21                 106,386
FBXO3      protein ubiquitination                                                            RP11-646J21                 106,386
SLIT1      axon guidance                                                                     RP11-1083G23                16,052
PTPRD      regulation of presynapse assembly                                                 RP11-1036K24                11,900
FAM171A1   Astroprincin, abundant expression in astrocytes, actin cytoskeleton               RP11-721D14, RP11-1080E3    418
HS6ST3     promotes axonal guidance by slit/robo                                             RP11-621G18, RP11-1126G7    245,408
ROCK2      regulates cytokinesis, muscle contraction, actin stress fibers, focal adhesions   RP11-431N16                 36,827

^a LD = Linkage Disequilibrium.

minigene with and without the variants of interest can be done to determine if the variant alters splicing and amount of functional protein. Most importantly, the variant needs to be found in more mild discordant SMA siblings. One approach that would accomplish this using high- throughput technology would be to perform targeted sequencing of the candidate genes in multiple discordant SMA families. The MDiGS technique would be perfect for this experiment, allowing for multiple genes to be sequenced for multiple patients in a single MiSeq run. In the original MDiGS paper, 3 genes were captured and sequenced for 48 patients (Alvarado et al.

2014), which would be roughly equivalent to sequencing 5 genes of 28 patients or 14 discordant

SMA sibling pairs. Our lab has recently proposed such an experiment. We have found clones for each of the target genes which are shown in Table 3.3. These clones will be used to capture the target genes over 2 rounds of MDiGS sequencing. In collaboration with Dr. Kathryn Swoboda, we have 28 pairs of discordant siblings which will be sequenced in this experiment. I have 124

modeled this experiment and determined that to reach statistical significance using a 2 x 2 contingency table, the candidate variant needs to be present in only 7 mild individuals. As such, even if there are multiple modifying variants in different genes, we can still have the statistical power to detect these variants. In short, the work described in this Chapter has identified 7 genes as possible SMA modifiers, 1 of which (ROCK2) has previously been shown to ameliorate SMA severity in a mild mouse model of SMA, and I have outlined a strategy to confirm the variants in further discordant SMA pairs.
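The kind of 2 x 2 contingency-table calculation referred to above can be illustrated with a short sketch. It is not the model used in this work: it assumes a one-sided Fisher's exact test, 28 mild and 28 severe siblings, and a variant that is absent from the severe siblings. The function name and arguments are hypothetical. The number of carriers required for significance depends on the test chosen, its sidedness, and any correction for testing multiple variants, so this sketch is not expected to reproduce the threshold of 7 exactly.

```python
from math import comb


def fisher_one_sided(carriers_mild, carriers_severe, n_mild, n_severe):
    """One-sided Fisher's exact p-value for enrichment of carriers in the mild group.

    Returns the probability, under no association, that at least `carriers_mild`
    of the observed carriers fall in the mild group (hypergeometric upper tail).
    """
    total_carriers = carriers_mild + carriers_severe
    n_total = n_mild + n_severe
    denominator = comb(n_total, total_carriers)
    upper = min(total_carriers, n_mild)
    tail = sum(
        comb(n_mild, x) * comb(n_severe, total_carriers - x)
        for x in range(carriers_mild, upper + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # Assumed design: 28 discordant pairs, variant never seen in a severe sibling.
    for carriers in range(1, 11):
        p = fisher_one_sided(carriers, 0, 28, 28)
        print(f"{carriers} mild carriers, 0 severe carriers: p = {p:.4g}")
```

Under these particular assumptions, 7 carriers among the mild siblings and none among the severe siblings gives a p-value below 0.05. Allowing carriers in the severe group or correcting for multiple candidate variants would raise the number of carriers needed.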


Chapter 4

Identification of splicing changes in motor neurons of SMA mice

Vicki McGovern contributed to this work by performing library prep for sequencing and validation of splicing changes using ddPCR

4.1 Introduction

The SMN complex has been shown extensively to participate in the biogenesis of snRNPs (Liu and Dreyfuss 1996; Fischer et al. 1997; Pellizzoni et al. 2002c). These snRNPs are a crucial component of the cellular splicing machinery and are composed of an snRNA, core Sm proteins, and proteins specific to each individual snRNP (Kambach et al. 1999; Stark et al. 2001). In particular, the SMN complex loads the seven Sm proteins onto the snRNA (Fischer et al. 1997; Meister et al. 2001; Pellizzoni et al. 2002c). Defects of snRNP assembly have been implicated in contributing to the SMA phenotype, as cell extracts from SMA patient spinal cords show a clear reduction in the spliceosomal snRNPs (Wan et al. 2005). Additionally, snRNP assembly assays on spinal cord tissue from mice with varying degrees of SMA severity show that there is a reduction of snRNP assembly activity which correlates with disease severity (Gabanella et al. 2007), and there is also a reduction in snRNP levels in the spinal cord of SMA animals (Gabanella et al. 2007). Altogether, the data show a disruption of snRNP biogenesis in SMA tissues, suggesting that snRNP function may be reduced in these samples.

The most well-known function of many snRNPs is the splicing of pre-mRNA. They exist in a complex called the spliceosome, which is composed of the U1, U2, U4, U5, and U6 snRNPs (Kambach et al.; Will and Lührmann 2001). There is also a minor spliceosome that is composed of a unique set of snRNPs, namely U11, U12, U4atac, and U6atac (Will and Lührmann 2005). The minor spliceosome is responsible for splicing a small subset of introns (Alioto 2007a). As snRNP biogenesis has been shown to be disrupted in SMA, it was hypothesized that improper splicing of mRNA may contribute to the SMA phenotype. Indeed, early microarray experiments have shown that there are widespread splicing abnormalities in spinal cord, brain, and kidney tissues from SMA mice compared with controls (Zhang et al. 2008). However, similar experiments have shown that when correcting for multiple testing and when analyzing mice earlier in the disease at the P1 and P7 timepoints, splicing changes are limited to only a handful of genes (Bäumer et al. 2009). Thus, although numerous splicing changes may occur in SMA samples, many are likely secondary effects of the disease and may not be causal of SMA.

Determining the early splicing changes that lead to motor neuron cell death or other aspects of the SMA phenotype is crucial for understanding the disease and for developing novel therapeutic targets. The gene Stasimon has been demonstrated to be aberrantly spliced and prone to intron retention in Drosophila models of SMA, specifically at the intron that is spliced by the minor spliceosome (Lotti et al. 2012). Transgenic expression of full-length Stasimon cDNA in Drosophila smn mutants rescued multiple defects in these animals, including an abnormal increase in eEPSP and motor axon defects (Lotti et al. 2012). This illustrates that splicing defects can be linked to specific aspects of the SMA phenotype and that correction of the splicing defect can ameliorate the phenotype. Intron retention has been detected via RNA-seq experiments performed on spinal cord samples from SMA mice taken at P5, especially in introns known to be spliced by the minor spliceosome (Doktor et al. 2017). However, as this tissue was from whole spinal cord, it may not be representative of what occurs in motor neurons.

Difficulty remains in determining splicing changes that contribute to the SMA phenotype for multiple reasons. First, there is an abundance of secondary changes that are due to a late-stage diseased motor neuron and are not SMA specific (Bäumer et al. 2009). Second, as motor neurons are the primary tissue affected in SMA, it is motor neurons that must be analyzed, and their transcriptome may not be well annotated. Indeed, analysis of SMA RNA-seq data using different annotations gave different results (Doktor et al. 2017). Third, as SMA is predicted to result in splicing aberrations, analysis must be done with software that can detect novel splicing changes and aberrations. I have performed an RNA-sequencing experiment and designed a bioinformatic pipeline that addresses these challenges. I have analyzed RNA-seq data from the motor neurons of nmd mice, which I used as a disease control. nmd mice have a motor-neuropathy phenotype as a result of mutation of the IGHMBP2 gene, which causes a translational defect not related to SMN deficiency (de Planell-Saguer et al. 2009). I also have performed a transcriptome assembly using Oases, from which I have detected 669 novel exons that have not previously been annotated by Gencode. These novel exons were added to curated annotation files so that they were included in downstream analysis. Finally, I performed differential expression and splicing analysis on RNA-seq data from motor neurons of SMA mice.


4.2 Materials and methods

4.2.1 Collection of samples for RNA sequencing

Motor neurons were collected from mice on P6 using laser-capture microdissection. We used five different mouse lines: SMA (Smn-/-, SMN2+/+, SMNdelta7+/+), SMA carrier (Smn+/-, SMN2+/+, SMNdelta7+/+), nmd (nmd-/-), nmd carrier (nmd+/-), and SMA rescue mice that were treated on P1 with an ASO that corrects the SMN2 splicing defect, thus ameliorating the phenotype. SMA mice live for approximately 14 days and recapitulate many aspects of the disease. The nmd mice were used as a disease control, as they develop a motor neuropathy that is not a result of missplicing. In addition, I have also re-analyzed RNA-seq data of LCM SMA motor neurons at P1 published by the Dreyfuss group. Their experiments did not include analysis to detect novel splicing changes and did not use a disease control.

4.2.2 Read alignment and splicing analysis

I have developed a bioinformatic pipeline specifically for this project that allows for the detection of both novel and annotated splicing changes. Sequencing reads were aligned to the mm10 genome using STAR v 2.3.1. Read counts for each gene were calculated using HTCount. The read counts were used for differential expression analysis using the R package DESeq2. Differential expression changes were sorted by log2 fold change and were corrected for multiple testing. Finally, splicing analysis was performed using the program MATS. Splicing changes that were detected were eliminated if the false discovery rate was greater than 0.05. The inclusion level of each alternative splicing event, which is the proportion of reads that contain evidence of the splicing change, was calculated. Splicing changes with an absolute inclusion level difference of less than 0.20 were not considered to be biologically important.
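The two filtering thresholds described above can be expressed as a short sketch. This is an illustration only, not the pipeline code: the `SplicingEvent` record and the function names are hypothetical, and the inclusion level is computed here as a simple read proportion, whereas MATS applies its own length normalization when estimating inclusion levels.

```python
from dataclasses import dataclass


@dataclass
class SplicingEvent:
    """Hypothetical record holding the fields needed for filtering."""
    gene: str
    event_type: str              # e.g. "SE", "A5SS", "A3SS", "MXE"
    inc_level_condition: float   # inclusion level in the SMA sample
    inc_level_control: float     # inclusion level in the control sample
    fdr: float


def inclusion_level(inclusion_reads, skipping_reads):
    """Proportion of reads supporting inclusion (simplified, no length normalization)."""
    total = inclusion_reads + skipping_reads
    if total == 0:
        return None
    return inclusion_reads / total


def passes_filters(event, max_fdr=0.05, min_abs_diff=0.20):
    """Keep events with FDR <= 0.05 and |inclusion level difference| >= 0.20."""
    diff = abs(event.inc_level_condition - event.inc_level_control)
    return event.fdr <= max_fdr and diff >= min_abs_diff


if __name__ == "__main__":
    events = [
        SplicingEvent("GeneA", "SE", 0.90, 0.30, 0.001),   # kept
        SplicingEvent("GeneB", "SE", 0.55, 0.50, 0.001),   # difference too small
        SplicingEvent("GeneC", "A3SS", 0.90, 0.30, 0.20),  # FDR too high
    ]
    print([e.gene for e in events if passes_filters(e)])
```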


4.2.3 Identification of novel exons

I developed a Python script that can identify previously unannotated exons from RNA-seq data. The script works by taking an annotation file and an alignment file as input and searching between known coding exons for evidence of spliced read alignments. For this project, Gencode V2 mm10 annotations were used. The alignment file used was from a laser-capture microdissected (LCM) motor neuron RNA-seq experiment that was performed by the Dreyfuss lab (Zhang et al. 2013). Both the SMA and wildtype data sets were used. The raw read files from these experiments were assembled into full length transcripts using the program Oases. Assembled transcripts that were longer than 200 bp were put into a FASTQ file and aligned to the mm10 genome using the program STAR 2.3.1uL, which is a version of STAR specially adapted for long read alignments. This alignment file was used as input for my Python script to detect novel exons. The script searches for read alignments that lie between known exons. Evidence of spliced alignments was found by analyzing the CIGAR string in the alignment file, which indicates if reads were spliced. As an additional measure, I examined the genomic base pairs flanking where the alignment occurred to ensure that they are the canonical GT-AG or AT-AC splice sites. Additionally, the script can detect if a novel alternative splice site is being utilized at known exons. After identification of possible novel exons, the assembled transcript alignments were viewed manually in the program IGV to ensure that the transcript was assembled correctly. Transcripts determined to be assembled correctly were then appended to the input annotation file so that the novel exon could be used in any downstream analysis. Select novel exons were validated using PCR amplification with primers that flank the exon.
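The core idea of the search described above can be sketched in a few lines. This is a simplified illustration rather than the script used in this work: all function names are hypothetical, coordinates are 0-based and half-open, only plus-strand splice-site dinucleotides are checked (minus-strand genes would require reverse-complemented motifs), and the genome is passed as a plain string rather than read from an indexed FASTA file.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Intron-terminal dinucleotides accepted as splice sites (plus strand only).
CANONICAL_INTRON_ENDS = {("GT", "AG"), ("AT", "AC")}


def aligned_blocks(start, cigar):
    """Split an alignment into reference blocks separated by N (skipped region) operations."""
    blocks = []
    block_start = start
    pos = start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":                 # spliced gap: close the current block
            blocks.append((block_start, pos))
            pos += length
            block_start = pos
        elif op in "MD=X":            # operations that consume the reference
            pos += length
    blocks.append((block_start, pos))
    return blocks


def has_canonical_splice_sites(genome_seq, intron_start, intron_end):
    """Check the first and last two bases of an intron against GT-AG / AT-AC."""
    donor = genome_seq[intron_start:intron_start + 2].upper()
    acceptor = genome_seq[intron_end - 2:intron_end].upper()
    return (donor, acceptor) in CANONICAL_INTRON_ENDS


def candidate_novel_exons(start, cigar, known_exons, genome_seq):
    """Return internal aligned blocks that overlap no known exon and have valid flanking introns."""
    blocks = aligned_blocks(start, cigar)
    candidates = []
    for i in range(1, len(blocks) - 1):
        exon_start, exon_end = blocks[i]
        if any(exon_start < k_end and k_start < exon_end for k_start, k_end in known_exons):
            continue
        flanking_introns = [(blocks[i - 1][1], exon_start), (exon_end, blocks[i + 1][0])]
        if all(has_canonical_splice_sites(genome_seq, s, e) for s, e in flanking_introns):
            candidates.append((exon_start, exon_end))
    return candidates


if __name__ == "__main__":
    # Toy genome: exon, GT...AG intron, unannotated exon, GT...AG intron, exon.
    genome = "AAAAA" + "GTCCAG" + "TTTT" + "GTCCCAG" + "CCCCC"
    print(candidate_novel_exons(0, "5M6N4M7N5M", [(0, 5), (22, 27)], genome))
```

In practice, candidates found this way would still need the manual inspection described above, since a misassembled transcript can produce a spliced alignment with valid-looking splice sites.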


4.2.4 Confirmation of splicing changes

Splicing changes were validated using ddPCR with primers and probes designed for the alternate exon as well as for an adjacent constitutively expressed exon. For the Gria4 alternate exon, which is located on the minus strand 16537 bp upstream of the last coding exon, the forward primer used was 5'-AATGCTGTTAACCTCGCAGT-3', the reverse primer used was 5'-GCCACATTCTCCTTTGTCGTA-3', and the probe used was 5'-/56-FAM/ACTGAATGA/ZEN/ACAAGGCCTCTTGGACA/3IABkFQ/-3'. For the Gria4 constitutively expressed exon (coding exon 3), the forward primer used was 5'-GCAGGCGTCTTCTACATTCT-3', the reverse primer used was 5'-CTCTGCCCTGGACTTGTAAC-3', and the probe used was 5'-/56-FAM/CTTGGCAAT/ZEN/GCTGGTGGCTTTGAT/3IABkFQ/-3'. For the Slc39a14 alternate exon, which is located on the minus strand 16537 bp upstream of the last coding exon, the forward primer used was 5'-CCAGGCTACTTAATATCCAGAGTG-3', the reverse primer used was 5'-GTGAGGCCAAGGCTAATGT-3', and the probe used was 5'-/56-FAM/ACTGTTGGC/ZEN/TCAGTTTGACCAGGT/3IABkFQ/-3'. For the Slc39a14 constitutively expressed exon (coding exon 3), the forward primer used was 5'-GTTGGTGCTGTTTGGTATATGG-3', the reverse primer used was 5'-GTCAGGGTAAGGCTGTCATT-3', and the probe used was 5'-/56-FAM/TCCTGGAAG/ZEN/ATCTCATGGACCGCTA/3IABkFQ/-3'.

4.2.5 Validation of novel exons

Select novel exons were validated using ddPCR with primers and probes designed to amplify the novel exon. Specifically, novel exons in the genes Sdhaf2 and Ubl7 were tested. Primers and probes for a constitutively expressed exon in both genes were also designed, so the relative expression of the novel exon could be determined. For the Sdhaf2 novel exon, which is located on the minus strand 16537 bp upstream of the last coding exon, the forward primer used was 5'-AGACATGATTGAAATCCCTTTGC-3', the reverse primer used was 5'-TTTCTGCTCTCATAGAGTAAGCG-3', and the probe used was 5'-/56-FAM/CCGTGGCAG/ZEN/GAGAGAACTGATGA/3IABkFQ/-3'. For the Sdhaf2 constitutively expressed exon (coding exon 3), the forward primer used was 5'-GAGAAGCAGCTGAACCTCTATG-3', the reverse primer used was 5'-CTGTGGCCCAGTAGTAAATATCC-3', and the probe used was 5'-/56-FAM/TCGCCTGAT/ZEN/TAACGAGCCTAGCAA/3IABkFQ/-3'. For the Ubl7 novel exon, which is located 813 bp upstream of the first coding exon, the forward primer used was 5'-AGAGAGCATGGAACCGATGT-3', the reverse primer used was 5'-CTCCTGTCTTTGCATATAAGTTC-3', and the probe used was 5'-/56-FAM/TCTTGGTAG/ZEN/GAGTTTGCTAAGAGGCC/3IABkFQ/-3'. For the Ubl7 constitutively expressed exon (coding exon 1), the forward primer used was 5'-TCAGTTGCCAGAGACAGAAC-3', the reverse primer used was 5'-GGAACAGATTCCTGGAGCTT-3', and the probe used was 5'-/56-FAM/TCATTTCTG/ZEN/AAGCAGCTTATTGCTGGC/3IABkFQ/-3'.

4.3 Results

4.3.1 Alignment and gene expression analysis

In order to identify changes in the transcriptome of SMA mice, RNA-sequencing was performed on RNA isolated from motor neurons of SMA (Smn-/-, SMN2+/+, SMNdelta7+/+) and SMA carrier (Smn+/-, SMN2+/+, SMNdelta7+/+) mice. As a disease control, RNA from the motor neurons of nmd (nmd-/-) and nmd carrier (nmd+/-) mice was also sequenced. Finally, we also sequenced RNA from the motor neurons of SMA rescue mice that were treated on P1 with an ASO that corrects the SMN2 splicing defect. In all cases, motor neurons were isolated from the mice using LCM. We expect that any transcriptome changes relevant to SMA would be present in the SMNdelta7 mice, but not in the nmd mice, SMA carrier, or SMA rescue samples.

RNA-sequencing reads for all samples were aligned using the program STAR. Differential expression analysis was performed using the R package DESeq2. Results of differential expression analysis for the P1 data can be seen in Table 4.1. A total of 24 genes were found to be differentially expressed between the SMA and control motor neurons at the P1 timepoint, indicating that only a few expression differences are present at this early timepoint. In order to determine if any biological pathways were enriched in the list of differentially expressed genes, I performed gene ontology analysis using Panther. From the analysis, I determined the differentially expressed genes were involved in regulation of synaptic transmission (Tacr2), negative regulation of Schwann cell proliferation (Sox10), peripheral nervous system development (Sox10 and Etv1), positive regulation of gliogenesis (Sox10 and Lif), central nervous system myelination (Sox10), and neuromuscular synaptic transmission (Chrne). However, only 1 pathway was found to be significantly enriched after multiple testing correction: the interleukin signaling pathway, in which 3 genes were found to be differentially expressed (Cdkn1a, Fos, and Csf2rb).


Gene expression analysis was also performed on the samples taken at P6. However, after correcting for multiple testing, no significant differentially expressed genes were found. This is likely due to the poor quality observed in these samples.
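The effect of multiple testing correction on a list of p-values can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg procedure, which is the default adjustment in DESeq2; the text does not state which method was used, and the function below is a hypothetical stand-alone implementation rather than the DESeq2 code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p = {p:<6} adjusted = {q:.3f}")
```

Because each p-value is scaled by the number of tests divided by its rank, a gene with a small raw p-value can still fail an FDR threshold when many thousands of genes are tested and few of them show a strong signal.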

Table 4.1 Gene expression changes at P1

Gene | log2FoldChange | p-Value | FDR
Cdkn1a | 2.115044 | 2.88E-13 | 3.95E-09
Gjb5 | 5.705012 | 3.59E-12 | 2.46E-08
Srd5a2 | 5.641813 | 2.29E-11 | 1.05E-07
Lif | 5.461981 | 1.18E-10 | 4.06E-07
Slc26a9 | 4.738926 | 5.12E-09 | 1.41E-05
Csf2rb | 4.37249 | 1.23E-08 | 2.81E-05
Hmcn1 | -2.40929 | 1.09E-07 | 0.000214
6330410L21Rik | 4.248322 | 2.45E-07 | 0.000421
Rorc | -4.47341 | 4.05E-07 | 0.000617
Etv1 | -1.7484 | 1.33E-06 | 0.00182
Ppp1r1c | -2.64383 | 1.86E-06 | 0.002316
E130012A19Rik | -4.0981 | 3.75E-06 | 0.004195
Gpr15 | 4.033384 | 3.97E-06 | 0.004195
1700007K13Rik | 3.34154 | 8.81E-06 | 0.008641
Gsx1 | 3.961094 | 9.91E-06 | 0.009067
Rhbdf1 | 2.858979 | 1.25E-05 | 0.01072
Fos | 2.051568 | 1.71E-05 | 0.013849
Col1a1 | 2.851907 | 2.32E-05 | 0.017677
Tacr2 | -3.79976 | 2.71E-05 | 0.019573
Arhgap36 | -1.20316 | 3.62E-05 | 0.024511
Fbxl6 | 2.393986 | 3.75E-05 | 0.024511
C1qc | 1.59054 | 7.15E-05 | 0.044493
Chrne | -3.53061 | 7.57E-05 | 0.044493
Sox10 | 1.522331 | 7.78E-05 | 0.044493


4.3.2 Transcriptome assembly

In this experiment I wanted to detect any splicing aberrations in the SMA samples. However, such splicing changes may result in transcripts that are novel or otherwise unknown and not in any annotation database, which would result in splicing analysis programs missing these splicing abnormalities. To ensure comprehensive detection of splicing changes, I performed a de novo transcriptome assembly, in which raw sequencing reads were assembled into full length transcripts using the program Oases. All resulting transcripts that were greater than 200 bp in length were then aligned to the mm10 genome. Using a script that I wrote, novel splice sites and novel exons that were not in Gencode v2 annotations were detected. These novel exons and splice sites were manually curated into a new annotations file that I could use in downstream analysis. In all, 669 novel splice sites or exons were detected.
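The length filter applied to the assembled transcripts before alignment can be sketched as follows. This is an illustrative stand-alone example that assumes the assembled transcripts are available as FASTA-formatted text; the function names are hypothetical and are not part of Oases or STAR.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_by_length(records, min_length=200):
    """Keep transcripts strictly longer than `min_length` bases."""
    return [(name, seq) for name, seq in records if len(seq) > min_length]


if __name__ == "__main__":
    toy = [">short", "A" * 150, ">long", "A" * 120, "C" * 120]
    for name, seq in filter_by_length(read_fasta(toy)):
        print(name, len(seq))
```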

Select novel exons were chosen to be validated using ddPCR. Primers were made to amplify the novel exons found in the genes Sdhaf2 and Ubl7. These exons could be analyzed in both LCM-collected and spinal cord material. These data indicate that the novel exons are real and expressed in the spinal cord of SMA and carrier mice. Furthermore, they show that my script successfully detects novel exons, making it a useful tool in transcriptomic analysis. Expression data as measured by ddPCR can be found in Table 4.2.

Table 4.2 ddPCR validation of novel exons

Tissue | Mouse | Sdhaf2 expression | Ubl7 expression
MN | SMA | 22.1 | 13.5
MN | Carrier | 22.5 | 14.9
SC | SMA | 66.4 | 27.5
SC | Carrier | 57 | 32.4

4.3.3 Determination of splicing changes

In order to determine if there was splicing disruption in SMA, we performed splicing analysis on the sequencing data using the program MATS. Splicing analysis was performed on the data from both P1 and P6 samples. Splicing changes were only considered if there were a total of at least 10 reads at that particular locus. Additionally, splicing changes were prioritized by inclusion level of the alternative splicing event in the SMA sample compared to wildtype. I detected a total of 207 and 78 splicing changes in the P1 and P6 samples, respectively. These are splicing changes with an absolute value of the inclusion level difference greater than 0.20 and with an FDR less than 0.05. Note that of the 207 splicing changes at P1, a majority of them (158) have an inclusion level difference of less than 0.40, indicating that most of the changes are small in magnitude. The splicing changes for the P1 data can be found in Table 4.3 and the splicing changes in the P6 data can be found in Table 4.4. For the P1 data, they include 147 skipped exons (SE), 35 alternative 5' splice sites (A5SS), 19 alternative 3' splice sites (A3SS), and 6 mutually exclusive exons (MXE). For the P6 data, I found 52 SE, 9 A3SS, 7 A5SS, and 10 MXE. These data show there were numerous splicing differences between the SMA and SMA carrier samples, although in most cases the difference in isoform levels was small. Additionally, these data show there were more splicing changes detected at the earlier timepoint. However, it should be noted that the decrease in detectable splicing events at P6 was likely due to sample quality issues and may not indicate reduced alternative splicing at this timepoint.

Interestingly, there were 3 genes in common between the P1 and P6 datasets that were found to be alternatively spliced. These genes were Mdm4, Magi1, and Slc4a4. There was some evidence that the magnitude of the splicing alteration was affected by the progression of these


Table 4.3 Splicing changes at P1 Incl Lvl Incl Lvl Incl Lvl Symbol Type Position Conda Ctrlb Adjc P-Value FDR Ppp1r16a SE chr15:76689327 0.984 0.000 0.968 <1.0E-16 <1.0E-16 Shank2 SE chr7:144091711 0.036 0.991 0.910 0.000777 0.009173 2210408F21Rik SE chr6:31277847 0.067 0.976 0.819 3.41E-08 2.11E-06 Agap2 SE chr10:127090141 1.000 0.194 0.737 4.78E-10 5.69E-08 Crem SE chr18:3276693 0.355 1.000 0.643 3.16E-05 0.000674 Rif1 SE chr2:52074246 0.170 0.819 0.632 5.33E-15 2.24E-12 Erbb2ip SE chr13:103830187 0.895 0.181 0.613 <1.0E-16 <1.0E-16 Slc39a14 SE chr14:70331268 0.868 0.132 0.592 5.31E-09 4.31E-07 Tia1 SE chr6:86419100 0.202 0.951 0.592 2.77E-09 2.52E-07 Unc79 SE chr12:103125608 0.033 0.673 0.590 1.33E-15 6.34E-13 Mef2c SE chr13:83654597 0.047 0.795 0.559 1.14E-07 5.94E-06 E230016M11Rik SE chr6:67043758 0.000 0.553 0.550 <1.0E-16 <1.0E-16

137 Med23 SE chr10:24882553 0.286 1.000 0.545 3.45E-07 1.55E-05 Gria4 SE chr9:4432772 0.955 0.372 0.528 5.42E-10 6.29E-08 Bcas3 SE chr11:85354593 0.076 0.677 0.514 5.14E-10 6.06E-08 Stk30 SE chr12:110834125 0.083 0.823 0.514 1.54E-09 1.52E-07 Prrx1 SE chr1:163253976 0.688 0.142 0.508 0.000314 0.004368 Dgkg SE chr16:22572599 0.000 0.536 0.500 2.22E-16 1.38E-13 Gpatch2l SE chr12:86281448 0.437 1.000 0.494 8.56E-13 2.11E-10 Tfdp1 SE chr8:13357042 0.326 0.810 0.472 1.13E-10 1.65E-08 St7 SE chr6:17905706 0.840 0.254 0.471 4.44E-15 1.98E-12 Pex2 SE chr3:5570486 0.084 0.704 0.453 1.25E-06 4.57E-05 U2af1 SE chr17:31654978 0.546 1.000 0.442 0.000457 0.005912 Gpr56 SE chr8:94997300 0.524 0.000 0.436 6.15E-10 6.97E-08 Cbs SE chr17:31614528 0.000 0.484 0.436 <1.0E-16 <1.0E-16 Continued

Table 4.3 Continued Incl Lvl Incl Lvl Incl Lvl Symbol Type Position Conda Ctrlb Adjc P-Value FDR Stradb SE chr1:58992203 0.781 0.272 0.430 2.07E-08 1.36E-06 Pitpnm3 SE chr11:72061473 0.327 1.000 0.429 5.53E-06 0.000155 Zfp788 SE chr7:41634679 0.113 0.561 0.425 7.52E-06 0.000202 Zfp324 SE chr7:12968794 0.380 0.974 0.424 7.94E-06 0.000211 Soga2 SE chr17:66340483 0.150 0.700 0.422 0.000876 0.010116 Mphosph9 SE chr5:124316081 0.229 0.898 0.419 1.15E-07 5.97E-06 Gm12239* SE chr11:56015916 1.000 0.514 0.417 0.000605 0.007438 Spsb4 SE chr9:96995574 0.364 0.973 0.415 2.51E-08 1.6E-06 Mpv17 SE chr5:31145239 0.872 0.356 0.410 4.68E-08 2.79E-06 Mpv17 SE chr5:31144893 0.869 0.356 0.410 1.86E-06 6.34E-05 Senp6 SE chr9:80089848 0.234 0.771 0.410 0.000132 0.002172 Rasgef1c SE chr11:49978407 1.000 0.576 0.400 0.001797 0.017717

138 Mad1l1 SE chr5:140088629 0.543 1.000 0.396 2.27E-11 4.11E-09 Pan3 SE chr5:147455117 0.401 1.000 0.394 3.9E-11 6.47E-09 Gtf2h1 SE chr7:46797396 0.402 0.000 0.387 5.4E-09 4.36E-07 Ralgps1 SE chr2:33336512 0.473 1.000 0.386 0.002291 0.021564 Aspscr1 SE chr11:120703602 0.503 0.909 0.384 0.000104 0.001801 Zfp324 SE chr7:12968794 0.475 0.992 0.384 3.69E-09 3.19E-07 Nbeal1 SE chr1:60238571 1.000 0.612 0.382 4.6E-06 0.000134 Lrch1 SE chr14:74793126 0.912 0.270 0.380 3.15E-07 1.44E-05 Ogfod2 SE chr5:124114064 0.813 0.208 0.375 0.00615 0.045972 Glis3 SE chr19:28540200 0.089 0.773 0.367 0.004126 0.034015 Crem SE chr18:3299161 0.031 0.522 0.366 9.3E-07 3.56E-05 A730056A06Rik SE chr7:73332427 0.109 0.600 0.366 1.85E-07 8.92E-06 Aamdc SE chr7:97565150 0.975 0.442 0.362 0.004022 0.033347 Continued

Table 4.3 Continued Incl Lvl Incl Lvl Incl Lvl Symbol Type Position Conda Ctrlb Adjc P-Value FDR Spata2 SE chr2:167489168 0.000 0.372 0.355 6.26E-06 0.000172 Bicd1 SE chr6:149520352 1.000 0.596 0.353 0.002205 0.020926 Airn SE chr17:12859106 0.269 0.921 0.342 4.53E-11 7.44E-09 Strn3 SE chr12:51643103 0.551 0.100 0.342 <1.0E-16 <1.0E-16 Ube2q2 SE chr9:55168202 0.956 0.504 0.340 3.06E-05 0.000656 Zfp275 SE chrX:73351268 0.138 0.661 0.337 8.85E-06 0.000232 Fam189a2 SE chr19:24021605 0.348 1.000 0.333 0.000376 0.00507 Mgat4a SE chr1:37464420 0.462 0.917 0.333 3.38E-06 0.000104 Eri2 SE chr7:119787403 0.571 1.000 0.333 6.75E-10 7.53E-08 Zfp40 SE chr17:23191344 0.548 1.000 0.333 0.004664 0.037337 Patz1 SE chr11:3306287 0.600 0.214 0.331 9.66E-15 3.73E-12 Zfp870 SE chr17:32885702 0.576 0.920 0.330 7.55E-09 5.83E-07

139 Cand1 SE chr10:119208026 0.654 1.000 0.329 0.000174 0.002706 C77370 SE chrX:104099625 0.391 0.926 0.327 0.000119 0.002004 Rims1 SE chr1:22726929 0.255 0.861 0.317 0.000237 0.003492 Lrrc40 SE chr3:158041585 0.966 0.602 0.316 5.27E-09 4.31E-07 Lrrc40 SE chr3:158041619 0.940 0.523 0.315 1.59E-06 5.61E-05 Ubl7* SE chr9:57911709 0.118 0.440 0.310 0.002543 0.023532 Pde4b SE chr4:102421548 0.012 0.374 0.310 3.17E-12 7.08E-10 Rabep1 SE chr11:70919176 0.433 0.875 0.305 0.006598 0.048511 Prss36 SE chr7:127933566 0.875 0.376 0.303 3.25E-05 0.000689 Xpnpep1 SE chr19:53032008 0.363 0.777 0.302 3.09E-13 9.39E-11 Tpm1 SE chr9:67029663 0.734 0.333 0.302 9.6E-11 1.46E-08 Reln SE chr5:21891561 0.395 0.915 0.301 0.000445 0.005783 Pvt1 SE chr15:62107147 0.665 0.989 0.300 2.02E-05 0.000463 Continued

Table 4.3 Continued Incl Lvl Incl Lvl Incl Lvl Symbol Type Position Conda Ctrlb Adjc P-Value FDR Zbtb20 SE chr16:43571733 0.236 0.646 0.300 3.86E-07 1.71E-05 Jmjd1c SE chr10:67216982 0.468 0.848 0.300 1.25E-08 8.83E-07 Hmga1 SE chr17:27556951 0.168 0.511 0.297 5.55E-16 2.94E-13 4933431E20Rik SE chr3:107894860 1.000 0.582 0.296 1.06E-06 3.96E-05 Mbnl1* SE chr3:60500980 1.000 0.579 0.296 0.000126 0.002082 Zfp414 SE chr17:33629621 0.293 0.711 0.296 1.73E-11 3.2E-09 Tbp SE chr17:15501311 0.850 0.445 0.293 0.000149 0.002404 Mrpl24 SE chr3:87919802 0.911 0.515 0.290 4.07E-06 0.000121 Cacna1b SE chr2:24682975 0.887 0.367 0.288 2.38E-08 1.53E-06 Wnk1 SE chr6:119952684 0.971 0.602 0.288 1.14E-08 8.17E-07 Zfp783 SE chr6:47945882 0.217 0.833 0.286 0.00227 0.02142 Pvt1 SE chr15:62107147 0.691 0.990 0.283 1.69E-05 0.0004

140 Fance SE chr17:28322645 0.750 0.213 0.283 0.000966 0.010853 Lrrc51 SE chr7:101921503 0.000 0.585 0.282 0.000157 0.002502 4833439L19Rik SE chr13:54561692 0.314 0.883 0.282 <1.0E-16 <1.0E-16 Ccp110 SE chr7:118732393 0.642 1.000 0.281 2.75E-06 8.66E-05 Anapc1 SE chr2:128680108 0.488 0.890 0.281 7.74E-07 3.01E-05 Unc79 SE chr12:103096550 0.194 0.656 0.279 4.77E-15 2.07E-12 Fance SE chr17:28317157 0.164 0.648 0.279 2.1E-08 1.37E-06 Apbb3 SE chr18:36678549 0.492 0.878 0.278 0.006389 0.047315 Gas2 SE chr7:51896306 0.000 0.307 0.273 0.000028 0.000606 Egfr SE chr11:16862945 0.511 1.000 0.267 0.000455 0.005898 Ankrd13d SE chr19:4282168 1.000 0.681 0.267 2.6E-06 8.33E-05 Tsc22d2 SE chr3:58428227 0.165 0.481 0.265 2.04E-11 3.74E-09 Pax6 SE chr2:105684750 0.278 0.908 0.260 0.005636 0.042938 Continued

Table 4.3 Continued Incl Lvl Incl Lvl Incl Lvl Symbol Type Position Conda Ctrlb Adjc P-Value FDR Fbxo44 SE chr4:148156394 0.160 0.500 0.259 8.58E-14 2.72E-11 Tmem120b SE chr5:123104421 0.654 1.000 0.258 0.003027 0.026775 Ss18* SE chr18:14651022 1.000 0.577 0.256 1.29E-06 4.68E-05 Tctn1 SE chr5:122257469 0.359 0.986 0.254 0.000659 0.007998 Senp6 SE chr9:80087429 0.478 0.774 0.253 0.001129 0.012253 Ubr4 SE chr4:139435948 0.536 0.195 0.253 6.18E-08 3.57E-06 Smad5 SE chr13:56723428 0.672 0.975 0.253 0.003871 0.032323 Pan3 SE chr5:147503084 0.125 0.690 0.250 1.71E-07 8.44E-06 Ankrd29 SE chr18:12295878 0.583 0.161 0.250 0.004617 0.037095 Stx4a SE chr7:127848568 0.014 0.585 0.250 9.77E-07 3.71E-05 Madd SE chr2:91163962 0.203 0.523 0.250 4.1E-13 1.15E-10 Ncdn SE chr4:126752970 0.819 0.313 0.246 0.001305 0.013771

141 Nf1 SE chr11:79463232 0.083 0.368 0.243 <1.0E-16 <1.0E-16 Pxn SE chr5:115551487 0.617 0.213 0.233 0.005158 0.040071 Gnl3 SE chr14:31017807 0.672 1.000 0.231 0.006168 0.046058 Nrip1 SE chr16:76330745 0.611 0.993 0.231 5.33E-05 0.00103 Patz1 SE chr11:3306222 0.459 0.174 0.228 2.39E-10 3.16E-08 Sipa1l2 SE chr8:125423192 0.947 0.496 0.228 0.001004 0.011181 Lta4h SE chr10:93482249 0.527 0.830 0.226 0.005373 0.041371 Magi1 SE chr6:93694007 1.000 0.746 0.226 6.83E-06 0.000186 Rnf38 SE chr4:44158902 0.583 0.946 0.225 0.001736 0.017234 Zfx SE chrX:94082149 0.715 1.000 0.222 7.61E-09 5.84E-07 Hectd2 SE chr19:36583935 0.491 0.993 0.221 9.17E-09 6.85E-07 Dysf SE chr6:84064470 0.000 0.328 0.219 0.00319 0.027838 Etv6 SE chr6:134066306 0.647 0.244 0.219 1.53E-09 1.52E-07 Continued

Table 4.3 Continued Incl Lvl Incl Lvl Incl Lvl Symbol Type Position Conda Ctrlb Adjc P-Value FDR Zfp454 SE chr11:50878042 0.660 0.991 0.217 3.99E-13 1.15E-10 Zmynd11 SE chr13:9710145 0.554 0.858 0.217 0.001564 0.015894 Bphl SE chr13:34046785 0.122 0.651 0.216 2.27E-05 0.000505 Ppp4r1l-ps SE chr2:173599589 0.168 0.470 0.215 7.17E-06 0.000194 Snap23 SE chr2:120590821 0.000 0.365 0.214 0.000202 0.003052 Uhrf2 SE chr19:30039063 0.674 1.000 0.214 0.000671 0.008127 Matn2 SE chr15:34378654 1.000 0.768 0.214 0.0002 0.003035 Ccdc85a SE chr11:28434080 0.695 0.959 0.214 5.55E-16 2.94E-13 Phc1 SE chr6:122332521 0.851 0.635 0.211 0.000133 0.002184 Mdm4 SE chr1:133011064 0.623 0.985 0.210 5.84E-05 0.001111 Brsk2 SE chr7:141998919 0.150 0.568 0.210 6.46E-07 2.6E-05 Egfl7 SE chr2:26591663 0.990 0.749 0.209 0.000149 0.002397

142 Clstn1 SE chr4:149626086 0.605 0.849 0.208 <1.0E-16 <1.0E-16 Nck2 SE chr1:43553860 0.758 0.989 0.206 2.22E-09 2.08E-07 Cdk8 SE chr5:146292623 0.906 0.497 0.205 9.31E-05 0.001649 Luzp1 SE chr4:136539124 0.561 0.310 0.205 6.66E-06 0.000182 Orc6 SE chr8:85302795 0.751 0.976 0.205 7.63E-07 2.98E-05 Hsf1 SE chr15:76496452 0.585 0.929 0.204 2.01E-05 0.000463 Pax2 SE chr19:44835390 0.726 1.000 0.204 0.000113 0.001929 Unc80 SE chr1:66694364 0.975 0.685 0.203 <1.0E-16 <1.0E-16 Prpf3 SE chr3:95849694 0.004 0.446 0.202 1.31E-09 1.34E-07 Tpd52l2 SE chr2:181510503 0.613 0.908 0.200 7.4E-10 8.12E-08 Zim1 A3SS chr7:6688623 0.610 0.650 -0.474 0.004237 0.02825 Mier2 A3SS chr10:79549536 0.837 0.712 -0.327 0.000764 0.007623 Trp53i11 A3SS chr2:93197608 0.814 0.584 0.200 0.004023 0.027184 Continued

Table 4.3 Continued Incl Lvl Incl Lvl Incl Lvl Symbol Type Position Conda Ctrlb Adjc P-Value FDR Tgfbr1 A3SS chr4:47393246 0.616 0.350 0.204 0.00174 0.014364 Nup88 A3SS chr11:70944844 0.670 0.417 0.204 8.33E-15 3.18E-12 Rbm6 A3SS chr9:107852124 0.983 0.723 0.206 1.44E-06 4.89E-05 Dyrk1a A3SS chr16:94663741 0.366 0.652 0.207 0.000118 0.001646 Smim1 A3SS chr4:154023687 0.088 0.517 0.213 0.001511 0.012957 Arnt A3SS chr3:95493697 0.484 0.099 0.224 4.84E-06 0.000127 Atpaf2 A3SS chr11:60405775 0.906 0.508 0.224 0.000149 0.002011 Sun1 A3SS chr5:139238827 0.031 0.496 0.226 2.57E-05 0.000502 Bcor A3SS chrX:12048302 0.461 0.971 0.227 0.001114 0.009995 Pkp4 A3SS chr2:59308007 0.486 0.140 0.232 2.81E-12 4.3E-10 Ppih A3SS chr4:119300009 0.964 0.666 0.232 2.15E-05 0.000432 Galt A3SS chr4:41756343 0.071 0.550 0.236 0.000568 0.006061

143 Anks1b A3SS chr10:90163160 0.268 0.000 0.250 7.23E-07 2.69E-05 Irak1 A3SS chrX:74017129 0.000 0.321 0.250 0.001527 0.012957 Dync1i1 A3SS chr6:5769700 0.466 0.160 0.252 2.06E-09 1.26E-07 Rad52 A3SS chr6:119922471 0.560 0.000 0.262 0.003254 0.023069 Mro A3SS chr18:73873220 0.978 0.358 0.274 7.13E-05 0.001146 Cit A3SS chr5:116005781 0.329 0.722 0.281 7.21E-09 3.67E-07 Dyrk1b A3SS chr7:28185713 0.974 0.634 0.299 6.27E-05 0.00103 Ppil2 A3SS chr16:17086554 0.363 0.000 0.321 2.92E-09 1.65E-07 Fam13c A3SS chr10:70553028 0.038 0.582 0.335 2.71E-08 1.22E-06 Eci2 A3SS chr13:34993067 0.034 0.452 0.337 2.32E-14 7.09E-12 Tspan5 A3SS chr3:138896788 0.883 0.359 0.345 1.22E-13 2.66E-11 Pcdh19 A3SS chrX:133625268 0.380 0.797 0.347 1.61E-10 1.39E-08 Ciz1 A3SS chr2:32370031 0.787 0.205 0.379 5.24E-10 4E-08 Continued

Table 4.3 Continued

                                          Incl Lvl  Incl Lvl  Incl Lvl
Symbol          Type  Position            Cond(a)   Ctrl(b)   Adj(c)    P-Value   FDR
Rnf180*         A3SS  chr13:105252251     0.321     0.753     0.405     1.95E-05  0.000402
Kcnt1           A3SS  chr2:25903358       0.052     0.654     0.468     2.88E-06  8.62E-05
Hdgfrp2         A3SS  chr17:56096820      1.000     0.299     0.469     7.76E-06  0.000182
Mid1ip1         A3SS  chrX:10717983       0.285     0.907     0.477     1.61E-11  2.11E-09
Dlk1            A3SS  chr12:109460045     0.205     0.942     0.539     1.29E-08  6.35E-07
Fbf1            A3SS  chr11:116152476     0.967     0.356     0.576     <1.0E-16  <1.0E-16
Crem            A3SS  chr18:3267584       0.935     0.239     0.633     4.04E-10  3.24E-08
Kalrn           A5SS  chr16:34256151      0.977     0.635     0.205     0.000342  0.003762
Clec16a         A5SS  chr16:10595801      0.115     0.490     0.211     2.27E-07  7.8E-06
Zfp13           A5SS  chr17:23581056      0.378     0.014     0.230     2.78E-05  0.000508
Sdhaf2*         A5SS  chr19:10517168      0.609     0.090     0.238     5.97E-05  0.000962
Tbp             A5SS  chr17:15501153      0.119     0.485     0.239     0.00021   0.002575
Rnf220          A5SS  chr4:117285850      0.977     0.698     0.245     0.000131  0.001751
Zfp2            A5SS  chr11:50910705      0.685     0.361     0.247     6.55E-05  0.001012
Arhgap26        A5SS  chr18:39357552      0.926     0.433     0.256     7.56E-05  0.001138
Ubap2l          A5SS  chr3:90038814       0.252     0.515     0.256     6.61E-05  0.001012
Kcnt1           A5SS  chr2:25900888       0.624     0.991     0.271     9.51E-08  4.29E-06
Slc4a4          A5SS  chr5:89046212       0.397     0.055     0.280     <1.0E-16  <1.0E-16
Mgat4a          A5SS  chr1:37498572       0.103     0.432     0.283     1.16E-07  4.96E-06
Dixdc1          A5SS  chr9:50710764       0.861     0.474     0.301     4.57E-06  0.000119
Slc39a11        A5SS  chr11:113463950     0.421     0.000     0.316     0.000102  0.00139
Efr3a           A5SS  chr15:65815334      1.000     0.650     0.333     2.84E-09  1.67E-07
Htr2c           A5SS  chrX:147169569      0.507     0.106     0.374     0.000229  0.002725
2410089E03Rik*  A5SS  chr15:8178336       1.000     0.556     0.389     1.12E-05  0.000234
Shf             A5SS  chr2:122354028      0.059     0.677     0.487     1.27E-06  3.97E-05

Continued

Table 4.3 Continued

                                          Incl Lvl  Incl Lvl  Incl Lvl
Symbol          Type  Position            Cond(a)   Ctrl(b)   Adj(c)    P-Value   FDR
Golga3          A5SS  chr5:110176878      0.094     0.713     0.583     1.8E-14   1.93E-12
Erbb2ip         MXE   chr13:103827906     0.038     0.600     0.477     9.88E-06  0.00029
Arhgap26        MXE   chr18:39357552      0.926     0.428     0.256     1.55E-07  9.29E-06
Calu            MXE   chr6:29361293       0.766     0.471     0.254     7.54E-05  0.00162
Actn4           MXE   chr7:28909902       0.950     0.676     0.230     7.45E-09  5.96E-07
Brsk2           MXE   chr7:141998632      0.750     0.278     0.227     1.28E-06  5.25E-05
Gria4           MXE   chr9:4427029        0.382     0.779     0.205     3.33E-16  4.8E-13

(a) Inclusion level, condition (SMA)
(b) Inclusion level, control
(c) Inclusion level difference, adjusted for sample variability


Table 4.4 Splicing changes at P6

                                          Incl Lvl  Incl Lvl  Incl Lvl
Symbol          Type  Position            Cond(a)   Ctrl(b)   Adj(c)    P-Value   FDR
Tex9            SE    chr9:72325543       0.000     1.000     1         <1.0E-16  <1.0E-16
Mdm4            SE    chr1:134907641      0.000     1.000     1         7.56E-13  1.58E-11
Lyrm4           SE    chr13:36184674      0.000     1.000     1         6.07E-11  1.07E-09
Chmp3           SE    chr6:71510890       0.000     1.000     1         4.82E-10  8.04E-09
Plcb1           SE    chr2:135224984      0.000     0.974     0.974     <1.0E-16  <1.0E-16
Camta1          SE    chr4:151209922      0.000     0.957     0.957     <1.0E-16  <1.0E-16
Trim33          SE    chr3:103157489      0.082     1.000     0.918     1.05E-08  1.4E-07
Mpp6            SE    chr6:50146512       0.109     1.000     0.891     2.26E-09  3.14E-08
Magi1           SE    chr6:93644001       0.011     0.894     0.883     <1.0E-16  <1.0E-16
Magi1           SE    chr6:93644001       0.011     0.890     0.879     <1.0E-16  <1.0E-16
Magi1           SE    chr6:93644001       0.031     0.826     0.795     <1.0E-16  <1.0E-16
Clta            SE    chr4:44032745       0.234     1.000     0.766     0.008035  0.038338
Ktn1            SE    chr14:48345603      0.124     0.854     0.73      <1.0E-16  <1.0E-16
Fam184a         SE    chr10:53357982      0.000     0.713     0.713     <1.0E-16  <1.0E-16
Gabrb3          SE    chr7:65047861       0.316     1.000     0.684     1.4E-09   2.04E-08
Ube2k           SE    chr5:65972624       0.320     1.000     0.68      4.62E-13  1.03E-11
Ndrg3           SE    chr2:156784202      0.279     0.956     0.677     3.71E-08  4.13E-07
Usp15           SE    chr10:122600648     0.453     0.990     0.537     1.24E-08  1.6E-07
Chl1            SE    chr6:103664438      0.465     1.000     0.535     0.003262  0.018468
Zfp62           SE    chr11:49028056      0.000     0.533     0.533     1.05E-13  2.5E-12
Chd2            SE    chr7:80658975       0.484     1.000     0.516     <1.0E-16  <1.0E-16
Guf1            SE    chr5:69954321       0.000     0.486     0.486     0.000159  0.001158
Tecr            SE    chr8:86097309       0.000     0.451     0.451     0.001222  0.007287
Fxr1            SE    chr3:33967082       0.164     0.612     0.448     8.17E-06  7.58E-05
Map7d2          SE    chrX:155929699      0.190     0.613     0.423     1.55E-15  3.99E-14

Continued

Table 4.4 Continued

                                          Incl Lvl  Incl Lvl  Incl Lvl
Symbol          Type  Position            Cond(a)   Ctrl(b)   Adj(c)    P-Value   FDR
Rnf14           SE    chr18:38461309      0.583     1.000     0.417     0.000117  0.000887
Spast           SE    chr17:74758593      0.373     0.780     0.407     3.51E-05  0.0003
Ccdc82          SE    chr9_random:374358  0.000     0.406     0.406     <1.0E-16  <1.0E-16
Tax1bp1         SE    chr6:52692685       0.597     1.000     0.403     4.73E-11  8.78E-10
Csde1           SE    chr3:102844363      0.057     0.444     0.387     6.24E-05  0.000494
Acsl3           SE    chr1:78659715       0.000     0.368     0.368     2.67E-08  3.3E-07
Mycbp2          SE    chr14:103668574     0.025     0.389     0.364     0.000244  0.001694
Ttc3            SE    chr16:94605952      0.468     0.804     0.336     4.53E-05  0.000378
Dclk1           SE    chr3:55325771       0.000     0.276     0.276     0.009383  0.042931
Map9            SE    chr3:82185815       0.748     1.000     0.252     9.48E-10  1.44E-08
Atrx            SE    chrX:103079645      0.898     0.647     0.251     0.005002  0.026521
Casc4           SE    chr2:121732433      1.000     0.724     0.276     4.73E-06  4.51E-05
Slc4a4          SE    chr5:89661788       0.768     0.479     0.289     0.005859  0.030105
Hnrnpd          SE    chr5:100392749      0.962     0.630     0.332     <1.0E-16  <1.0E-16
Pphln1          SE    chr15:93290102      1.000     0.666     0.334     0.000234  0.001663
Trpc1           SE    chr9:95643615       1.000     0.555     0.445     0.00071   0.004559
Pank2           SE    chr2:131119134      1.000     0.555     0.445     0.003719  0.020364
Nova1           SE    chr12:47814484      0.879     0.425     0.454     5.2E-08   5.61E-07
Mbtps2          SE    chrX:154031656      1.000     0.412     0.588     0.002928  0.016863
Fchsd2          SE    chr7:108425131      1.000     0.408     0.592     4.35E-11  8.54E-10
Mdga2           SE    chr12:67730891      1.000     0.361     0.639     0.004102  0.0221
A330076H08Rik   SE    chr7:69126562       1.000     0.288     0.712     6.5E-08   6.78E-07
Pja2            SE    chr17:64647064      0.850     0.132     0.718     1.15E-05  0.000104
Stxbp1          SE    chr2:32650107       1.000     0.178     0.822     0.000132  0.000977
Clcn3           SE    chr8:63393549       1.000     0.161     0.839     1.66E-07  1.68E-06

Continued

Table 4.4 Continued

                                          Incl Lvl  Incl Lvl  Incl Lvl
Symbol          Type  Position            Cond(a)   Ctrl(b)   Adj(c)    P-Value   FDR
Ncam1           SE    chr9:49314957       1.000     0.147     0.853     <1.0E-16  <1.0E-16
Lrrcc1          SE    chr3:14535998       1.000     0.094     0.906     0.000464  0.003164
Nars            A3SS  chr18:64674979      0.289     0.620     0.331     <1.0E-16  <1.0E-16
Serbp1          A3SS  chr6:67222132       0.885     0.517     0.368     <1.0E-16  <1.0E-16
Whsc1           A3SS  chr5:34210222       0.088     0.881     0.793     2.83E-09  3.12E-08
Mapk9           A3SS  chr11:49696643      0.495     0.143     0.352     7.41E-09  6.12E-08
Ngfrap1         A3SS  chrX:132805801      1.000     0.776     0.224     3.09E-06  2.04E-05
Gnb1            A3SS  chr4:154931576      0.000     0.266     0.266     3.91E-06  2.15E-05
Ehbp1           A3SS  chr11:22046465      1.000     0.368     0.632     9.62E-05  0.000453
Eif4a2          A3SS  chr16:23112423      1.000     0.250     0.75      0.00255   0.01052
Nudcd1          A3SS  chr15:44231565      1.000     0.625     0.375     0.005493  0.02014
Smg7            A5SS  chr1:154695725      0.110     0.737     0.627     <1.0E-16  <1.0E-16
2700094K13Rik   A5SS  chr2:84510143       0.293     1.000     0.707     2.53E-08  1.52E-07
Srsf5           A5SS  chr12:82046490      1.000     0.583     0.417     2.91E-06  1.16E-05
Scg3            A5SS  chr9:75510999       0.090     0.390     0.3       0.001114  0.003341
Cnot4           A5SS  chr6:35014888       0.000     0.224     0.224     0.00634   0.013008
Flrt3           A5SS  chr2:140488443      0.000     0.208     0.208     0.006504  0.013008
D17Wsu92e       A5SS  chr17:27904855      0.225     0.458     0.233     0.044311  0.075962
Usp34           MXN   chr11:23220660      0.102     0.580     0.478     <1.0E-16  <1.0E-16
Vcan            MXN   chr13:89827889      0.973     0.474     0.499     <1.0E-16  <1.0E-16
Chd2            MXN   chr7:80652691       0.201     0.704     0.503     1.06E-11  5.58E-11
Casc4           MXN   chr2:121732433      0.603     0.279     0.324     1.3E-05   5.46E-05
Trpc1           MXN   chr9:95637238       1.000     0.667     0.333     5.71E-05  0.0002
Usp15           MXN   chr10:122590096     0.502     0.889     0.387     0.000505  0.001324
Bptf            MXN   chr11:106960879     1.000     0.775     0.225     0.000996  0.002324

Continued

Table 4.4 Continued

                                          Incl Lvl  Incl Lvl  Incl Lvl
Symbol          Type  Position            Cond(a)   Ctrl(b)   Adj(c)    P-Value   FDR
Mapk9           MXN   chr11:49687337      1.000     0.781     0.219     0.001169  0.002455
Atp8a1          MXN   chr5:68170198       0.000     0.467     0.467     0.002863  0.005465
Ube2k           MXN   chr5:65957227       0.895     0.654     0.241     0.007963  0.013935

(a) Inclusion level, condition (SMA)
(b) Inclusion level, control
(c) Inclusion level difference, adjusted for sample variability


disease. For example, Mdm4 coding exon 3 was found to have an inclusion level difference of 0.21 in the P1 mouse, with the exon being skipped slightly more often in the SMA samples. However, by P6, the inclusion level difference had increased dramatically to 1.000, with the exon being fully skipped in the SMA sample. Similarly, Magi1 was found with one SE event in the P1 data, with a difference in inclusion level of 0.22 between the SMA and SMA carrier samples. However, by P6 the inclusion level difference had increased to 0.88. The third gene, Slc4a4, was found with an A5SS event at P1 but an SE event at P6, both with an inclusion level difference of about 0.28, and thus the splicing difference is not exacerbated at the later timepoint. Nonetheless, the P1 and P6 data together show that some splicing changes are exacerbated as the disease progresses.
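The comparison above can be sketched in a few lines of Python. This is only an illustration, not part of the analysis pipeline: the inclusion level differences are the values quoted in the text and tables, while the function name `is_exacerbated` and the 0.10 cutoff are hypothetical choices made for the example.

```python
# Inclusion level differences quoted in the text (P1) and in Table 4.4 (P6).
# Note: for Slc4a4 the P1 event is an A5SS and the P6 event is an SE,
# so the two values do not describe the same splicing event.
reported = {
    "Mdm4": {"P1": 0.21, "P6": 1.00},
    "Magi1": {"P1": 0.22, "P6": 0.88},
    "Slc4a4": {"P1": 0.280, "P6": 0.289},
}


def is_exacerbated(p1_diff, p6_diff, min_increase=0.10):
    """Return True when the difference grows by more than min_increase.

    min_increase is a hypothetical cutoff chosen only for this example.
    """
    return (p6_diff - p1_diff) > min_increase


exacerbated = sorted(
    gene for gene, d in reported.items() if is_exacerbated(d["P1"], d["P6"])
)
print(exacerbated)
```

With these inputs only Mdm4 and Magi1 pass the cutoff, which matches the description above.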

In order to determine whether any particular pathways were altered in the SMA samples, pathway analysis was performed using Ingenuity Pathway Analysis (IPA®, QIAGEN Redwood City, www.qiagen.com/ingenuity). This analysis incorporates both splicing and expression data to determine if either were altering cellular pathways. IPA identified 15 pathways as being significantly altered between the SMA and SMA carrier samples at P1. These pathways and their significance values can be seen in Figure 4.1. I identified p53 signaling and myc-mediated apoptosis signaling as two of the canonical pathways most significantly affected. We also generated pathways downstream of genes we found to be alternatively spliced and used our gene expression data to determine which of these pathways were most altered. We found pathways downstream of the genes Mdm4 and Magi1 to be significantly changed (p-values of 0.038 and 0.00155, respectively), indicating that splicing changes in these genes may cause numerous downstream effects. These pathways and their significance values can be found in Table 4.5.
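As a small worked example, the two p-values above can be placed on the same -log(p-value) scale used in Figure 4.1. The sketch assumes a base-10 logarithm and a significance cutoff of 0.05; neither is stated explicitly in the text, so both are assumptions.

```python
import math

# p-values reported in the text for pathways downstream of each gene
downstream_p = {"Mdm4": 0.038, "Magi1": 0.00155}

ALPHA = 0.05  # assumed significance cutoff, not stated in the text

# Assumed base-10 logarithm for the -log(p-value) scale
scores = {gene: -math.log10(p) for gene, p in downstream_p.items()}
threshold = -math.log10(ALPHA)

for gene, score in scores.items():
    print(f"{gene}: -log10(p) = {score:.2f} (threshold {threshold:.2f})")
```

Under these assumptions both pathways fall above the threshold line, with the Magi1 pathway scoring roughly twice as high as the Mdm4 pathway.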


4.3.5 Validation of splicing changes

The splicing changes in Mdm4, Magi1, and Tia1 were of particular interest. Mdm4 is known to inhibit p53, and the splicing change we had identified would shift the reading frame, possibly altering functional protein levels, an effect which we show is exacerbated at later timepoints. Activation of p53 was detected in my pathway analysis, and altered Mdm4 splicing could be the reason for it. Magi1, like Mdm4, was found to be spliced more aberrantly at later timepoints and is known to be involved in post-synaptic organization. Finally, Tia1 is an RNA-binding protein which pathway analysis has indicated lies upstream of Fas, a pro-apoptotic gene with increased expression of 3.2 log2 fold-change in SMA (though not statistically significant). To validate these changes,

Figure 4.1. Pathways detected as being significantly altered in the SMA samples at P1 and their -log(p-value), as detected by Ingenuity Pathway Analysis. [Bar chart of canonical pathways; x-axis: -log(p-value), 0 to 4. Pathways shown: Amyotrophic Lateral Sclerosis Signaling; p53 Signaling; Wnt/β-catenin Signaling; PEDF Signaling; IL-3 Signaling; ErbB Signaling; NRF2-mediated Oxidative Stress Response; Neuropathic Pain Signaling In Dorsal Horn…; Ephrin Receptor Signaling; CXCR4 Signaling; UVC-Induced MAPK Signaling; Cholecystokinin/Gastrin-mediated Signaling; Endothelin-1 Signaling; Glioblastoma Multiforme Signaling; Myc Mediated Apoptosis Signaling.]


Figure 4.2. Validation of splicing changes using ddPCR. Shown are differences in inclusion of the alternative exon in both LCM and spinal cord (SC) material. Also shown is the splicing change as measured in the NMD disease control. In Magi1, Mdm4, and Fas a splicing change was detected between the LCM SMA and SMA carrier samples. In the case of Magi1 and Mdm4, this splicing change was not detected in tissues from the spinal cord, indicating the importance of using LCM motor neuron tissue in splicing analysis. In the case of Slc39a14, a splicing difference between SMA and SMA carrier LCM tissues was detected, but this change was also present in the NMD disease control.

expression was measured using ddPCR, with primers and probes designed against the alternative exon, and normalized to a flanking constitutive exon. Figure 4.2 shows the ddPCR measurements of the alternative exon normalized to the constitutive exon. Only a limited difference was detected in the exon inclusion of Mdm4 and Tia1. In the gene Magi1, there was a large detectable difference in expression of the alternative exon, with higher expression in the SMA sample, as expected. Fas also showed a difference in exon inclusion. The difference in Fas splicing was detected despite only a limited difference in Tia1 splicing, which I had predicted would alter the Fas gene, indicating that some other process may be altering Fas splicing.
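The normalization described above amounts to a simple ratio, sketched here in Python. The copy-number values are hypothetical placeholders rather than measured data, and the function name `normalized_inclusion` is introduced only for this illustration.

```python
def normalized_inclusion(alt_exon_copies, constitutive_exon_copies):
    """Alternative-exon signal divided by the flanking constitutive-exon signal."""
    if constitutive_exon_copies <= 0:
        raise ValueError("constitutive exon signal must be positive")
    return alt_exon_copies / constitutive_exon_copies


# Hypothetical ddPCR concentrations (copies/µL), for illustration only:
# (alternative exon, constitutive exon)
samples = {
    "SMA": (120.0, 400.0),
    "carrier": (60.0, 400.0),
}

ratios = {name: normalized_inclusion(alt, const) for name, (alt, const) in samples.items()}
difference = ratios["SMA"] - ratios["carrier"]
print(ratios, round(difference, 3))
```

Normalizing to a constitutive exon of the same gene means that a change in overall expression of the gene does not by itself appear as a change in exon inclusion.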

The ddPCR data highlight the importance of using LCM material for splicing analysis instead of whole spinal cord. Magi1, Mdm4, and Slc39a14 all showed no difference between the SMA and carrier samples when analyzing only the spinal cord data, despite a clear change between the SMA and carrier samples in the data from the LCM samples. Similarly, the splicing change found in Slc39a14 was also detected in the nmd samples, which indicates this change is not SMA specific. These results illustrate the importance of using LCM material and a disease control when performing transcriptome studies on SMA samples.

4.4 Discussion

Determining changes in the transcriptome of SMA animals requires overcoming several obstacles. First, the primary cell type affected in SMA is the motor neuron, which must be studied specifically. Second, there must be a method for detecting aberrant splicing. It is known that the main function of SMN is in snRNP assembly, and a lack of SMN is expected to alter splicing and create novel isoforms. Third, after detection of splice events, there must be a way to prioritize candidates. In this work, I have designed an experiment to address these issues. Samples from motor neurons were obtained via LCM so that analysis could be performed on the tissue most affected in SMA. I have also designed a bioinformatic pipeline which comprehensively detects transcriptome changes. This pipeline involves assembly of transcripts followed by detection of novel exons. Finally, by integrating expression and splicing data in pathway analysis, I could narrow a list of splicing changes down to just a few priority candidate genes for further study.

This work shows that mouse annotation databases are not yet complete with respect to motor neurons. I have found 669 splicing events, including novel exons, which were not in the Gencode V2 mm10 annotations. Incorporating such events into splicing analysis is crucial for a complete analysis of the transcriptome. Indeed, in the P1 data 7 of these novel events were determined to be alternatively spliced. Using ddPCR, 2 of these splicing changes were confirmed to exist, although an appreciable difference in expression of the isoforms between SMA and SMA carriers could not be demonstrated (Table 4.2). This analysis also needs to be performed at later timepoints in the disease. It is likely that different transcripts are expressed at later timepoints, and if they have novel exons or splicing aberrations this method can detect them.

By analyzing both the P1 and P6 data, I have found some evidence that splicing changes are correlated with disease course. From the P6 data, one of the splicing changes with the largest difference in inclusion level was the coding exon 3 of the gene Mdm4, which was skipped in 100% of reads. However, at P1 it had an inclusion level difference of only 0.21, with exon 3 being skipped only approximately 37% of the time, indicating that as the disease progresses so does the extent of the alternative splicing. This is also the case with the Magi1 gene, where the inclusion level difference increases at later timepoints. It is puzzling that only 2 splicing events occurred in both the P1 and P6 datasets (a third gene, Slc4a4, was identified as being differentially spliced in both datasets, though these were different splicing events). This may simply be a result of poor data quality in the P6 samples, which made detection of splicing events difficult. It may also mean that different splicing isoforms are required at different timepoints, and thus finding the transcripts altered in SMA that contribute most to the phenotype will require multiple experiments at various timepoints.

Interestingly, the Magi1 exon is skipped more often in the SMA carrier sample than in the SMA sample. This indicates that the splicing changes in SMA are not caused simply by a lack of snRNPs leading to increased exon skipping. This is in agreement with previous experiments where a mutated U2 snRNA gene led to widespread splicing alteration, not just exon skipping (Jia et al. 2012). However, of the 147 skipped exon events in the P1 sample, in 92 the exon was skipped more often in the SMA sample. Similarly, in the P6 sample 35 out of 52 differentially spliced exon skipping events occurred with more exon skipping in the SMA sample. These data show that although exon skipping tends to occur more frequently in SMA, it need not be the only way splicing changes occur. More data would be needed to confirm that exon skipping is more frequent in SMA.
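One way to look at the counts above is with an exact binomial test against an even split. The following Python sketch is not an analysis performed in this work. It assumes every skipped exon event is independent, which is unlikely to hold strictly (Magi1, for example, appears more than once in Table 4.4), and the function name `binom_two_sided_p` is introduced only for this example.

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(pr for pr in probs if pr <= cutoff))


# Counts reported in the text:
# (events with more skipping in SMA, total differentially spliced SE events)
counts = {"P1": (92, 147), "P6": (35, 52)}

results = {}
for timepoint, (k, n) in counts.items():
    results[timepoint] = {"fraction": k / n, "p_value": binom_two_sided_p(k, n)}
    print(timepoint, round(k / n, 3), results[timepoint]["p_value"])
```

Because the independence assumption is doubtful, the p-values from such a test should be read as a rough guide only, which is consistent with the caution above that more data are needed.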

In this work, I identified the Mdm4 gene as being spliced differently in the SMA sample, specifically by the increased skipping of exon 3. Skipping of this exon would cause a shift in the reading frame, resulting in a premature stop codon. The Mdm4 gene has been extensively studied in cancer biology because of its inhibitory effects on the tumor suppressor p53 (Imanishi et al.). Interestingly, skipping of Mdm4 exon 3 and exon 7 in SMA samples has been shown to induce p53 activation, which is attenuated upon delivery of full length SMN (Alstyne et al. 2018). This suggests missplicing could activate the p53 pathway, possibly leading to apoptosis. This is in agreement with the splicing data I have found, as well as with pathway analysis, which has shown an activation of the p53 pathway. A crucial experiment which should be done is the delivery of full-length Mdm4 to an SMA mouse model with subsequent measurement of p53 activation and survival of the mouse. The effect may be limited, however, as other evidence suggests that p53 knockout in SMA mice has no survival benefit (Tsai et al. 2006).

By combining splicing and expression data, I identified both Tia1 and Magi1 as misspliced genes involved in pathways altered in SMA. Tia1 was found to lie in a pathway upstream of the Fas gene, which I found to be over-expressed in SMA by over 3-fold, although this was not statistically significant after multiple testing correction. Tia1 is known to affect splicing of numerous genes, including Fas (Izquierdo et al. 2005). Specifically, I identified that the Tia1-b isoform was increased in SMA. This isoform may result in increased inclusion of Fas exon 6 (Izquierdo et al. 2005), which is an activated form of the gene known to have pro-apoptotic effects. Tia1 mediates this effect by binding to a uridine-rich sequence in intron 6 called URI6 (Izquierdo et al. 2005; Izquierdo and Valcárcel 2007). Interestingly, the ddPCR data did not confirm Tia1 to be spliced differently at P1. Measuring this effect at later timepoints could be informative in determining if this splicing change is involved in SMA pathology.

A third gene analyzed in this work was Magi1. It encodes a scaffolding junction protein known to be present on the post-synaptic side of the synapse (Wright et al. 2004). This gene was of interest because of its presence in both the P1 and P6 datasets. Magi1 interacts with the Notch signaling ligand Dll1 (Mizuhara et al. 2005). Interestingly, Notch signaling has been demonstrated to be activated in motor neurons from SMNdelta7 mice, including increased expression of Dll1 (Caraballo-Miralles et al. 2013). In short, altered Notch signaling is one mechanism by which misspliced Magi1 may be causing the SMA phenotype. Delivery of a full-length transcript followed by measurement of Notch activation is one way to test this hypothesis. It should be noted that I did not detect any significant alteration in expression of Notch pathway genes in the RNA-seq data, including in the Dll1 gene. Additionally, developmental defects of the motor axon are not present in a severe mouse model during early embryonic timepoints (McGovern et al. 2008).

It is unknown if motor neuron death occurs in SMA as a result of a single aberrantly spliced transcript, or if there are multiple misspliced transcripts that contribute to the disease. Although I have detected over 200 splicing changes in the P1 dataset, 159 of them had an inclusion level difference of less than 0.40. The biological relevance of small changes in splicing is not known, but it is fair to say that only a limited number of splicing changes capable of drastically altering the dominant isoform are present in SMA samples. This is in agreement with exon microarray studies which have found only a handful of splicing changes at earlier timepoints (Bäumer et al. 2009). Although there are only limited splicing changes, it is possible that several of them contribute to the SMA phenotype. Previously, the gene Stasimon has been shown to be aberrantly spliced in Drosophila, and this splicing defect was linked to a specific electrophysiological abnormality in SMA flies (Lotti et al. 2012). Expression of full-length Stasimon corrected this deficit, suggesting that a single splicing change may underlie only one aspect of the disease (Lotti et al. 2012). Agrin has also been identified as a transcript whose splicing was altered in SMA (Zhang et al. 2013). When a corrected isoform of Agrin was over-expressed transgenically in SMA mice, it resulted in a partial rescue of NMJ pathology, although it had no effect on neurodegeneration and only a limited effect on survival (Kim et al. 2017a). This indicates that altered Agrin may be responsible for only certain NMJ abnormalities in SMA and other splicing changes must be responsible for neurodegeneration.

Levels of U11 and U12 snRNA are lowered preferentially in SMA tissues, so it has been hypothesized that minor introns, such as the one in Stasimon, would be susceptible to aberrant splicing. Despite manually analyzing all 697 known minor introns in the U12DB (Alioto 2007b), I found no evidence of either missplicing or intron retention of minor introns in the P1 and P6 samples. A recent study on splicing in spinal cord tissue taken at P1 and P5 from SMA mice has shown that widespread intron retention does occur in SMA samples; however, it occurred almost exclusively in the P5 samples (Doktor et al. 2017). The same study also found over 150 alternative splicing events at the P5 timepoint compared to only 4 at P1 in the SMA sample (Doktor et al. 2017). I did not see such a large increase in splicing changes at the P6 timepoint, but as mentioned previously this may be due to poor data quality. Caution should also be taken in interpreting those results, as they were from whole spinal cord tissue, and as I have shown here changes in whole spinal cord are not always reflective of changes in the motor neuron. Nonetheless, that such a dramatic increase occurs may mean that at the P5 timepoint crucial changes have already occurred and secondary effects of the disease are underway. Thus, a study of splicing in the motor neurons at an earlier timepoint such as P3 may be informative for finding splicing changes that are SMA specific and cause the SMA phenotype.

In this work, I have designed a bioinformatic pipeline to detect transcriptome changes in SMA samples. I show that custom bioinformatic analysis must be performed to detect novel splicing changes and that current annotations are insufficient for splicing analysis of LCM material. I also show that there are limited splicing changes at P1 that result in a change in the dominant isoform that is expressed. This work has highlighted the genes Mdm4, Magi1, and Tia1 as possibly being important in SMA pathology, as they were detected as being alternatively spliced as well as being associated with pathways identified as being altered in SMA. The bioinformatic pipeline I have designed here should be used to analyze RNA-seq data from LCM material of SMA samples at a later timepoint, possibly P3. The pipeline could similarly be used on another animal model of SMA. In particular, this pipeline would be suited for SMA pigs, as it is meant for detection of novel transcripts and splicing changes that are not yet annotated.


Chapter 5

Conclusions and future directions

In this dissertation, I have performed genomic sequencing to identify modifiers of SMA. An RNA-sequencing analysis was also performed to identify genes that are spliced aberrantly in SMA samples. In Chapter 2, a targeted sequencing experiment was performed in order to identify modifiers that lie inside of the SMN2 gene. I made 2 important discoveries: that the intron 6 variants A-44G, A-549G and C-1897T within SMN2 associate with mild exception SMA patients, and that there is a 6.3 kb deletion which can occur in both SMN1 and SMN2 which eliminates SMN1/2 exons 7 and 8. These data show that variants inside of SMN2, including partial deletions, exist and can affect the SMA phenotype. However, as they were only able to explain a subset of patients (14 out of 54), it indicates modifiers exist outside of the SMA region as well, which I went on to investigate in Chapter 3 (discussed below).

The experiments performed in Chapter 2 can be improved on in several ways. First, sequencing of more SMA patients would allow for greater statistical power. Although a variant was needed in only 7 patients to reach statistical significance, this may not be enough to detect rare variants that modulate SMA phenotype. For example, c.859G>C, which is a known modifier of SMA that acts by increasing FL-SMN, had an adjusted p-value of 0.416. I detected 3 alleles of this variant (in 2 patients), which was not enough to give it statistical significance. It is possible I have missed other variants that act as positive or negative modifiers, and increasing statistical power may aid with that. Second, expanding the captured region may allow for detection of other deletions. NAIP is known to be deleted in some SMA patients, but no deletion junction that eliminates both SMN1 and NAIP has ever been reported. Capturing both NAIP and SMN and aligning them to a contig that contains both genes may help. However, it is not known how big the deletion is, and thus one end of the junction may still end up lying outside the captured region. Finally, performing a de novo assembly of the SMA region may enhance the ability to detect deletions. A consensus map of the region does not exist, though in Chapter 2 I have made a map from overlapping contigs from the RP11 library. Using long-read technologies to assemble the region in patients with NAIP deletions, SMN1 deletions, or SMN1 duplications would give critical insight into how these genomic rearrangements occur. Such assemblies could also be used to map reads from future targeted capture experiments of SMN2.

In Chapter 3 I performed exomic and genomic sequencing on SMA siblings who had a discordant phenotype. There have been numerous proposed modifiers of SMA, but identifying genetic variants that are proven to modulate the SMA phenotype has proven difficult. Here I have designed scripts which are capable of identifying all variants that are different between discordant siblings. We show here that with only 3 families (2 pairs and 1 trio) the variants can be filtered down such that only 2 non-synonymous exonic mutations remain. Thus, this is a valid approach for identifying disease modifiers in related individuals. It is interesting that no variant was present in all 4 of our families sequenced. This may be because the modifying variant is intronic and one of our families was only exome sequenced. Indeed, we have identified strong evidence of modification by either the FBXO3, SLIT1, or PTPRD gene as a result of an intronic variant. Aside from these genes, we also show 3 synonymous mutations in ROCK2 which segregate with the mild siblings, 2 of which are predicted to alter splicing. In short, we have narrowed in on a small list of candidates as a result of our sequencing analysis.
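The filtering idea summarized above can be sketched with Python sets. This is a simplified illustration and not the scripts used in Chapter 3: the family labels, gene names and variant identifiers are hypothetical placeholders, each family is reduced to one mild and one severe sibling, and the function name `discordant_genes` is introduced only for this example.

```python
# Hypothetical variant calls: family -> sibling phenotype -> set of (gene, variant)
families = {
    "fam1": {
        "mild": {("GENE_A", "v1"), ("GENE_B", "v2")},
        "severe": {("GENE_B", "v2")},
    },
    "fam2": {
        "mild": {("GENE_A", "v3"), ("GENE_C", "v4")},
        "severe": {("GENE_C", "v4")},
    },
    "fam3": {
        "mild": {("GENE_A", "v1"), ("GENE_D", "v5")},
        "severe": {("GENE_D", "v5"), ("GENE_E", "v6")},
    },
}


def discordant_genes(family):
    """Genes carrying a variant in the mild sibling that the severe sibling lacks."""
    mild_only = family["mild"] - family["severe"]
    return {gene for gene, _variant in mild_only}


# Keep only genes that are discordant in every family.
shared_candidates = set.intersection(
    *(discordant_genes(family) for family in families.values())
)
print(shared_candidates)
```

Intersecting at the gene level rather than the variant level allows different families to carry different variants in the same candidate gene, which is the situation the hypothetical data above are constructed to show.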

We are proposing the targeted sequencing of these genes in a further cohort of discordant SMA siblings. I have modeled this experiment and determined that significance can be reached with only 7 samples. An added benefit is that if there are multiple modifiers in different genes, we may still be able to detect them with the increased number of samples. This approach is also well suited to this analysis because, although we have narrowed down the list of candidate genes, we are unsure which variant is acting as the modifier. This is particularly true of the intronic candidates. With this approach we can finally identify genetic variants that modify the SMA phenotype.

In Chapter 4, I analyzed RNA-sequencing data to determine which genes were aberrantly spliced. The work can be expanded on in many ways. Most importantly, the data from the P6 timepoint were of poor quality, and analysis of much of it did not yield meaningful results. Higher quality sequencing data at later timepoints would be crucial for identifying changes that start early. The P1 raw sequencing data was originally from a paper published by the Dreyfuss lab (Zhang et al. 2013). Interestingly, this paper had identified the gene Agrin as being alternatively spliced in SMA samples (Zhang et al. 2013). I did not find this gene to be misspliced, likely because my analysis uses information from both replicates, whereas theirs did not, as it used the program MISO, which must analyze replicates separately. This highlights how variable replicates can be, which can complicate splicing analysis. Adding a third replicate would greatly increase the ability to distinguish true splicing changes from those that are just due to sample variability. Indeed, in an analysis by the creators of the MATS splicing analysis program, it was concluded that more replicates are preferred over deeper sequencing coverage (Shen et al. 2014). I anticipate the list of differentially spliced genes would be reduced even more with the addition of another replicate. This work can also be expanded by performing similar experiments in other animal models of SMA. In particular, a pig model of SMA has recently been created which successfully reproduces the SMA phenotype (Duque et al. 2016). Splicing changes that are conserved across species would be highly likely to be critical to causing the SMA phenotype. These improvements would help immensely in identifying transcripts misspliced in SMA.


References

Abera MB, Xiao J, Nofziger J, et al (2016) ML372 blocks SMN ubiquitination and improves spinal muscular atrophy pathology in mice. JCI Insight 1:. doi: 10.1172/jci.insight.88427

Ackermann B, Kröber S, Torres-Benito L, et al (2013) Plastin 3 ameliorates spinal muscular atrophy via delayed axon pruning and improves neuromuscular junction functionality. Hum Mol Genet 1–20. doi: 10.1093/hmg/dds540

Alías L, Barceló MJ, Bernal S, et al (2014) Improving detection and genetic counseling in carriers of spinal muscular atrophy with two copies of the SMN1 gene. Clin Genet 85:470–475. doi: 10.1111/cge.12222

Alioto TS (2007) U12DB: a database of orthologous U12-type spliceosomal introns. Nucleic Acids Res 35:D110-5. doi: 10.1093/nar/gkl796

Alonzi T, Middleton G, Wyatt S, et al (2001) Role of STAT3 and PI 3-kinase/Akt in mediating the survival actions of cytokines on sensory neurons. Mol Cell Neurosci 18:270–282. doi: 10.1006/mcne.2001.1018

Alstyne M Van, Simon CM, Sardi SP, Shihabuddin LS (2018) and Mdm4 splicing dysregulation underlies motor neuron death in SMA. 1–15. doi: 10.1101/gad.316059.118.5

Alvarado DM, Yang P, Druley TE, et al (2014) Multiplexed direct genomic selection (MDiGS): A pooled BAC capture approach for highly accurate CNV and SNP/INDEL detection. Nucleic Acids Res 42:1–10. doi: 10.1093/nar/gku218

Anhuf D, Eggermann T, Rudnik-Schöneborn S, Zerres K (2003) Determination of SMN1 and

SMN2 copy number using TaqManTM technology. Hum Mutat 22:74–78. doi:

10.1002/humu.10221

Arkblad EL, Darin N, Berg K, et al (2006) Multiplex ligation-dependent probe amplification

improves diagnostics in spinal muscular atrophy. Neuromuscul Disord 16:830–838. doi:

10.1016/j.nmd.2006.08.011

Arnold AS, Gueye M, Guettier-Sigrist S, et al (2004) Reduced expression of nicotinic AChRs in

myotubes from spinal muscular atrophy I patients. Lab Investig 84:1271–1278. doi:

10.1038/labinvest.3700163

Arnold WD, Burghes AHM (2013) Spinal muscular atrophy: Development and implementation

of potential treatments. Ann Neurol 74:348–362. doi: 10.1002/ana.23995

Arnold WD, Kassar D, Kissel JT (2015) Spinal muscular atrophy: Diagnosis and management in

a new therapeutic era. Muscle Nerve 51:157–167. doi: 10.1002/mus.24497

Arnoldi A, Crimella C, Tenderini E, et al (2012) Clinical phenotype variability in patients with

hereditary spastic paraplegia type 5 associated with CYP7B1 mutations. Clin Genet 81:150–

157. doi: 10.1111/j.1399-0004.2011.01624.x

Baccon J, Pellizzoni L, Rappsilber J, et al (2002) Identification and characterization of Gemin7,

a novel component of the survival of motor neuron complex. J Biol Chem 277:31957–

31962. doi: 10.1074/jbc.M203478200

Baughan TD, Dickson A, Osman EY, Lorson CL (2009) Delivery of bifunctional RNAs that

target an intronic repressor and increase SMN levels in an animal model of spinal muscular

atrophy. Hum Mol Genet 18:1600–11. doi: 10.1093/hmg/ddp076

Bäumer D, Lee S, Nicholson G, et al (2009) Alternative splicing events are a late feature of

pathology in a mouse model of spinal muscular atrophy. PLoS Genet 5:e1000773. doi:

10.1371/journal.pgen.1000773

Beevor CE (1902) A case of congenital spinal muscular atrophy (family type) and a case of

hemorrhage into the spinal cord at birth, giving similar symptoms. Brain 25:85–108

Bernal S, Alías L, Barceló MJ, et al (2010) The c.859G>C variant in the SMN2 gene is

associated with types II and III SMA and originates from a common ancestor. J Med Genet

47:640–2. doi: 10.1136/jmg.2010.079004

Bernal S, Also-Rallo E, Martínez-Hernández R, et al (2011) Plastin 3 expression in discordant

spinal muscular atrophy (SMA) siblings. Neuromuscul Disord 21:413–9. doi:

10.1016/j.nmd.2011.03.009

Bevan AK, Duque S, Foust KD, et al (2011) Systemic gene delivery in large species for targeting

spinal cord, brain, and peripheral tissues for pediatric disorders. Mol Ther 19:1971–1980.

doi: 10.1038/mt.2011.157

Bezakova G, Ruegg MA (2003) New insights into the roles of agrin. Nat Rev Mol Cell Biol

4:295–308. doi: 10.1038/nrm1074

Biancheri R, Ciccolella M, Rossi A, et al (2009) White matter lesions in spastic paraplegia with

mutations in SPG5/CYP7B1. Neuromuscul Disord 19:62–65. doi:

10.1016/j.nmd.2008.10.009

Bobowick AR, Brody JA (1973) Epidemiology of Motor-Neuron Diseases. N Engl J Med

288:1047–1055. doi: 10.1056/NEJM197305172882005

Bowerman M, Beauvais A, Anderson CL, Kothary R (2010) Rho-kinase inactivation prolongs

survival of an intermediate SMA mouse model. Hum Mol Genet 19:1468–1478. doi:

10.1093/hmg/ddq021

Bowerman M, Murray LM, Boyer JG, et al (2012) Fasudil improves survival and promotes

skeletal muscle development in a mouse model of spinal muscular atrophy. BMC Med 10:24

Brichta L, Hofmann Y, Hahnen E, et al (2003) Valproic acid increases the SMN2 protein level:

A well-known drug as a potential therapy for spinal muscular atrophy. Hum Mol Genet

12:2481–2489. doi: 10.1093/hmg/ddg256

Brzustowicz LM, Lehner T, Castilla LH, et al (1990) Genetic mapping of chronic childhood-

onset spinal muscular atrophy to chromosome 5q11.2-13.3. Nature 344:540–541

Buchthal F, Olsen PZ (1970) Electromyography and muscle biopsy in infantile spinal muscular

atrophy. Brain 93:15–30

Bühler D, Raker V, Lührmann R, Fischer U (1999) Essential role for the tudor domain of SMN

in spliceosomal U snRNP assembly: Implications for spinal muscular atrophy. Hum Mol

Genet 8:2351–2357. doi: 10.1093/hmg/8.13.2351

Burghes AHM (1997) When is a deletion not a deletion? When it is converted. Am J Hum Genet

61:9–15. doi: 10.1086/513913

Burghes AHM, Beattie CE (2009) Spinal muscular atrophy: why do low levels of survival motor

neuron protein make motor neurons sick? Nat Rev Neurosci 10:597–609. doi:

10.1038/nrn2670

Burghes AHM, Ingraham SE, Kóte-Jarai Z, et al (1994a) Linkage mapping of the spinal

muscular atrophy gene. Hum Genet 93:305–12

Burghes AHM, Ingraham SE, McLean M, et al (1994b) A multicopy dinucleotide marker that

maps close to the spinal muscular atrophy gene. Genomics 21:394–402. doi:

10.1006/geno.1994.1282

Burghes AHM, McGovern VL (2017) Genetics of Spinal Muscular Atrophy. Mol Cell Ther Mot

Neuron Dis 121–139. doi: 10.1016/B978-0-12-802257-3.00006-7

Burlet P, Bürglen L, Clermont O, et al (1996) Large scale deletions of the 5q13 region are

specific to Werdnig-Hoffmann disease. J Med Genet 33:281–283. doi:

10.1136/jmg.33.4.281

Burnett BG, Munoz E, Tandon A, et al (2009) Regulation of SMN Protein Stability. Mol Cell

Biol 29:1107–1115. doi: 10.1128/MCB.01262-08

Byers RK, Banker BQ (1961) Infantile Muscular Atrophy. Arch Neurol 5:140–164. doi:

10.1001/archneur.1961.00450140022003

Calucho M, Bernal S, Alías L, et al (2018) Correlation between SMA type and SMN2 copy

number revisited: An analysis of 625 unrelated Spanish patients and a compilation of 2834

reported cases. Neuromuscul Disord 28:208–215. doi: 10.1016/j.nmd.2018.01.003

Campbell L, Potter A, Ignatius J, et al (1997) Genomic variation and gene conversion in spinal

muscular atrophy: implications for disease process and clinical phenotype. Am J Hum

Genet 61:40–50. doi: 10.1086/513886

Caraballo-Miralles V, Cardona-Rossinyol A, Garcera A, et al (2013) Notch Signaling Pathway Is

Activated in Motoneurons of Spinal Muscular Atrophy. Int J Mol Sci 14:11424–11437. doi:

10.3390/ijms140611424

Carissimi C, Baccon J, Straccia M, et al (2005) Unrip is a component of SMN complexes active

in snRNP assembly. FEBS Lett 579:2348–2354. doi: 10.1016/j.febslet.2005.03.034

Carissimi C, Saieva L, Baccon J, et al (2006) Gemin8 is a novel component of the survival motor

neuron complex and functions in small nuclear ribonucleoprotein assembly. J Biol Chem

281:8126–8134. doi: 10.1074/jbc.M512243200

Carpten JD, DiDonato CJ, Ingraham SE, et al (1994) A YAC contig of the region containing the

spinal muscular atrophy gene (SMA): Identification of an unstable region. Genomics

24:351–356. doi: 10.1006/geno.1994.1626

Cartegni L, Hastings ML, Calarco JA, et al (2006) Determinants of Exon 7 Splicing in the Spinal

Muscular Atrophy Genes, SMN1 and SMN2. Am J Hum Genet 78:63–77. doi:

10.1086/498853

Cartegni L, Krainer AR (2002) Disruption of an SF2/ASF-dependent exonic splicing enhancer in

SMN2 causes spinal muscular atrophy in the absence of SMN. Nat Genet 30:377–384. doi:

10.1038/ng854

Carvalho T, Almeida F, Calapez A, et al (1999) The Spinal Muscular Atrophy Disease Gene

Product, Smn. J Cell Biol 147:715–728. doi: 10.1083/jcb.147.4.715

Ceballos FC, Hazelhurst S, Ramsay M (2018) Assessing runs of Homozygosity: A comparison

of SNP Array and whole genome sequence low coverage data. BMC Genomics 19:1–12.

doi: 10.1186/s12864-018-4489-0

Chan TA, Heguy A (2009) The protein tyrosine phosphatase receptor D, a broadly inactivated

tumor suppressor regulating STAT function. Cell Cycle 8:3063–3064. doi:

10.4161/cc.8.19.9455

Chandra D, Londino J, Alexander S, et al (2019) The SCFFBXO3 ubiquitin E3 ligase regulates

inflammation in atherosclerosis. J Mol Cell Cardiol 126:50–59. doi:

10.1016/j.yjmcc.2018.11.006

Chang HC, Hung WC, Chuang YJ, Jong YJ (2004) Degradation of survival motor neuron (SMN)

protein is mediated via the ubiquitin/proteasome pathway. Neurochem Int 45:1107–1112.

doi: 10.1016/j.neuint.2004.04.005

Chang HCH, Dimlich DN, Yokokura T, et al (2008) Modeling spinal muscular atrophy in

Drosophila. PLoS One 3:1–18. doi: 10.1371/journal.pone.0003209

Chari A, Paknia E, Fischer U (2009) The role of RNP biogenesis in spinal muscular atrophy.

Curr Opin Cell Biol 21:387–93. doi: 10.1016/j.ceb.2009.02.004

Charroux B, Pellizzoni L, Perkinson RA, et al (2000) Gemin4: A novel component of the SMN

complex that is found in both gems and nucleoli. J Cell Biol 148:1177–1186. doi:

10.1083/jcb.148.6.1177

Charroux B, Pellizzoni L, Perkinson RA, et al (1999) Gemin3: A novel DEAD box protein that

interacts with SMN, the spinal muscular atrophy gene product, and is a component of gems.

J Cell Biol 147:1181–1193. doi: 10.1083/jcb.147.6.1181

Chen KL, Wang YL, Rennert H, et al (1999) Duplications and de novo deletions of the SMNt

gene demonstrated by fluorescence-based carrier testing for spinal muscular atrophy. Am J

Med Genet 85:463–469. doi: 10.1002/(SICI)1096-8628(19990827)85:5<463::AID-

AJMG6>3.0.CO;2-V

Chen Q, Baird SD, Mahadevan M, et al (1998) Sequence of a 131-kb region of 5q13.1

containing the spinal muscular atrophy candidate genes SMN and NAIP. Genomics 48:121–

127. doi: 10.1006/geno.1997.5141

Chiriboga CA, Swoboda KJ, Darras BT, et al (2016) Results from a phase 1 study of nusinersen

(ISIS-SMN Rx) in children with spinal muscular atrophy. Neurology 86:890–897. doi:

10.1212/WNL.0000000000002445

Cifuentes-Diaz C, Frugier T (2001) Deletion of murine SMN exon 7 directed to skeletal muscle

leads to severe muscular dystrophy. J Cell Biol 152:1107–1114

Clermont O, Burlet P, Burglen L, et al (1994) Use of genetic and physical mapping to locate the

spinal muscular atrophy locus between two new highly polymorphic DNA markers. Am J

Hum Genet 54:687–94

Cobben JM, Van Der Steege G, Grootscholten P, et al (1995) Deletions of the Survival Motor

Neuron Gene in Unaffected Siblings of Patients with Spinal Muscular Atrophy. Am J Hum

Genet 57:805–808

Conforti FL, Muglia M, Mazzei R, et al (2004) A new SBF2 mutation in a family with recessive

demyelinating Charcot-Marie-Tooth (CMT4B2). Neurology 63:1327–1328. doi:

10.1212/01.WNL.0000140617.02312.80

Coovert DD, Le TT, McAndrew PE, et al (1997) The survival motor neuron protein in spinal

muscular atrophy. Hum Mol Genet 6:1205–1214. doi: 10.1093/hmg/6.8.1205

Criscuolo C, Filla A, Coppola G, et al (2009) Two novel CYP7B1 mutations in Italian families

with SPG5: A clinical and genetic study. J Neurol 256:1252–1257. doi: 10.1007/s00415-

009-5109-3

Cui Y, He S, Xing C, et al (2011) SCFFBXL15 regulates BMP signalling by directing the

degradation of HECT-type ubiquitin ligase Smurf1. EMBO J 30:2675–2689. doi:

10.1038/emboj.2011.155

Cui YL, Zhang JL, Zheng QC, et al (2013) Structural and dynamic basis of human cytochrome

P450 7B1: A survey of substrate selectivity and major active site access channels. Chem - A

Eur J 19:549–557. doi: 10.1002/chem.201202627

Cuscó I, Barceló MJ, Del Río E, et al (2004) Detection of novel mutations in the SMN Tudor

domain in type I SMA patients. Neurology 63:146–149. doi:

10.1212/01.WNL.0000132634.48815.13

Cuscó I, Barceló MJ, Rojas-García R, et al (2006) SMN2 copy number predicts acute or chronic

spinal muscular atrophy but does not account for intrafamilial variability in siblings. J

Neurol 253:21–5. doi: 10.1007/s00415-005-0912-y

Da Silva JS, Medina M, Zuliani C, et al (2003) RhoA/ROCK regulation of neuritogenesis via

profilin IIa-mediated control of actin stability. J Cell Biol 162:1267–1279. doi:

10.1083/jcb.200304021

David Arnold W, Porensky PN, McGovern VL, et al (2014) Electrophysiological biomarkers in

spinal muscular atrophy: Proof of concept. Ann Clin Transl Neurol 1:34–44. doi:

10.1002/acn3.23

de Planell-Saguer M, Schroeder DG, Rodicio MC, et al (2009) Biochemical and genetic evidence

for a role of IGHMBP2 in the translational machinery. Hum Mol Genet 18:2115–26. doi:

10.1093/hmg/ddp134

de Wit J, Sylwestrak E, O’Sullivan ML, et al (2009) LRRTM2 Interacts with Neurexin1 and

Regulates Excitatory Synapse Formation. Neuron 64:799–806. doi:

10.1016/j.neuron.2009.12.019

DePristo M, Banks E, Poplin R, et al (2011) A framework for variation discovery and

genotyping using next-generation DNA sequencing data. Nat Genet 43:491–8. doi:

10.1038/ng.806

Desmet FO, Hamroun D, Lalande M, et al (2009) Human Splicing Finder: An online

bioinformatics tool to predict splicing signals. Nucleic Acids Res 37:1–14. doi:

10.1093/nar/gkp215

Di Fabio R, Marcotulli C, Tessa A, et al (2014) Sensory ataxia as a prominent clinical

presentation in three families with mutations in CYP7B1. J Neurol 261:747–751. doi:

10.1007/s00415-014-7247-5

DiDonato CJ (1995) The spinal muscular atrophy gene: “Isolation and characterization of the

genetic and physical region surrounding the gene locus and identification of candidate

cDNAs.” The Ohio State University, Columbus OH

DiDonato CJ, Ingraham SE, Mendell JR, et al (1997) Deletion and conversion in spinal muscular

atrophy patients: is there a relationship to severity? Ann Neurol 41:230–237. doi:

10.1002/ana.410410214

DiDonato CJ, Morgan K, Carpten JD, et al (1994) Association between Ag1-CA alleles and

severity of autosomal recessive proximal spinal muscular atrophy. Am J Hum Genet

55:1218–1229

DiMatteo D, Callahan S, Kmiec EB (2008) Genetic conversion of an SMN2 gene to SMN1: a

novel approach to the treatment of spinal muscular atrophy. Exp Cell Res 314:878–86. doi:

10.1016/j.yexcr.2007.10.012

Dimitriadi M, Derdowski A, Kalloo G, et al (2016) Decreased function of survival motor neuron

protein impairs endocytic pathways. Proc Natl Acad Sci 113:E4377–E4386. doi:

10.1073/pnas.1600015113

Dobin A, Davis CA, Schlesinger F, et al (2012) STAR: ultrafast universal RNA-seq aligner.

Bioinformatics 29:15–21

Doktor TK, Hua Y, Andersen HS, et al (2017) RNA-sequencing of a mouse-model of spinal

muscular atrophy reveals tissue-wide changes in splicing of U12-dependent introns. Nucleic

Acids Res 45:395–416. doi: 10.1093/nar/gkw731

Dominski Z, Marzluff WF (2007) Formation of the 3′ end of histone mRNA: Getting closer to

the end. Gene 396:373–390. doi: 10.1016/j.gene.2007.04.021

Dubowitz V (1964) Infantile Muscular Atrophy. A Prospective Study With Particular Reference

To A Slowly Progressive Variety. Brain 87:707–18

Dubowitz V, Sewry CA, Fitzsimons RB (1985) Muscle biopsy: A practical approach, 2nd edn

Duque S, Joussemet B, Riviere C, et al (2009) Intravenous administration of self-complementary

AAV9 enables transgene delivery to adult motor neurons. Mol Ther 17:1187–1196. doi:

10.1038/mt.2009.71

Duque SI, Arnold WD, Odermatt P, et al (2016) A large animal model of Spinal Muscular

Atrophy and correction of phenotype. Ann Neurol 77:399–414. doi: 10.1002/ana.24332

Eisfeldt J, Nilsson D, Andersson-Assarsson JC, Lindstrand A (2018) AMYCNE: Confident copy

number assessment using whole genome sequencing data. PLoS One 13:1–14. doi:

10.1371/journal.pone.0189710

Eshraghi M, McFall E, Gibeault S, Kothary R (2016) Effect of genetic background on the

phenotype of the Smn 2B/- mouse model of spinal muscular atrophy. Hum Mol Genet

25:ddw278. doi: 10.1093/hmg/ddw278

Fallini C, Zhang H, Su Y, et al (2011) The Survival of Motor Neuron (SMN) Protein Interacts

with the mRNA-Binding Protein HuD and Regulates Localization of Poly(A) mRNA in

Primary Motor Neuron Axons. J Neurosci 31:3914–3925. doi: 10.1523/JNEUROSCI.3631-

10.2011

Feldkötter M, Schwarzer V, Wirth R, et al (2002) Quantitative analyses of SMN1 and SMN2

based on real-time lightCycler PCR: fast and highly reliable carrier testing and prediction of

severity of spinal muscular atrophy. Am J Hum Genet 70:358–68. doi: 10.1086/338627

Finkel RS, Mercuri E, Darras BT, et al (2017) Nusinersen versus Sham Control in Infantile-

Onset Spinal Muscular Atrophy. N Engl J Med 377:1723–1732. doi:

10.1056/NEJMoa1702752

Fischer U, Liu Q, Dreyfuss G (1997) The SMN-SIP1 complex has an essential role in

spliceosomal snRNP biogenesis. Cell 90:1023–9

Fletcher EV, Simon CM, Pagiazitis JG, et al (2017) Reduced sensory synaptic excitation

impairs motor neuron function via Kv2.1 in spinal muscular atrophy. Nat Neurosci 20:905–

916. doi: 10.1038/nn.4561

Fontrodona L, Porta-de-la-Riva M, Morán T, et al (2013) RSR-2, the Caenorhabditis elegans

ortholog of human spliceosomal component SRm300/SRRM2, regulates development by

influencing the transcriptional machinery. PLoS Genet 9:e1003543. doi:

10.1371/journal.pgen.1003543

Foust KD, Nurre E, Montgomery CL, et al (2009) Intravascular AAV9 preferentially targets

neonatal neurons and adult astrocytes. Nat Biotechnol 27:59–65. doi: 10.1038/nbt.1515

Foust KD, Wang X, McGovern VL, et al (2010) Rescue of the spinal muscular atrophy

phenotype in a mouse model by early postnatal delivery of SMN. Nat Biotechnol 28:271–4.

doi: 10.1038/nbt.1610

Francis MJ, Morrison KE, Campbell L, et al (1993) A contig of non-chimaeric YACs containing

the spinal muscular atrophy gene in 5q13. Hum Mol Genet 2:1161–1167. doi:

10.1093/hmg/2.8.1161

Gabanella F, Butchbach MER, Saieva L, et al (2007) Ribonucleoprotein assembly defects

correlate with spinal muscular atrophy severity and preferentially affect a subset of

spliceosomal snRNPs. PLoS One 2:e921. doi: 10.1371/journal.pone.0000921

Gary DS, Mattson MP (2002) PTEN regulates Akt kinase activity in hippocampal neurons and

increases their sensitivity to glutamate and apoptosis. NeuroMolecular Med 2:261–269. doi:

10.1385/NMM:2:3:261

Gavrilina TO, McGovern VL, Workman E, et al (2008) Neuronal SMN expression corrects

spinal muscular atrophy in severe SMA mice while muscle-specific SMN expression has no

phenotypic effect. Hum Mol Genet 17:1063–75. doi: 10.1093/hmg/ddm379

Giesemann T, Rathke-Hartlieb S, Rothkegel M, et al (1999) A Role for Polyproline Motifs in the

Spinal Muscular Atrophy Protein SMN. J Biol Chem 274:37908–37914. doi:

10.1074/jbc.274.53.37908

Gilliam TC, Brzustowicz LM, Castilla LH, et al (1990) Genetic homogeneity between acute and

chronic forms of spinal muscular atrophy. Nature 345:823–825. doi: 10.1038/345823a0

Goizet C, Boukhris A, Durr A, et al (2009) CYP7B1 mutations in pure and complex forms of

hereditary spastic paraplegia type 5. Brain 132:1589–1600. doi: 10.1093/brain/awp073

Grad JM, Zeng X, Boise LH (2000) Regulation of Bcl-xL: a little bit of this and a little bit of

STAT. Curr Op Onc 12:543–549

Grobet L, Pirottin D, Farnir F, et al (2003) Modulating skeletal muscle mass by postnatal,

muscle-specific inactivation of the myostatin gene. Genesis 35:227–238. doi:

10.1002/gene.10188

Gubitz AK, Mourelatos Z, Abel L, et al (2002) Gemin5, a novel WD repeat protein component

of the SMN complex that binds Sm proteins. J Biol Chem 277:5631–5636. doi:

10.1074/jbc.M109448200

Haas CA, Hofmann HD, Kirsch M (1999) Expression of CNTF/LIF-receptor components and

activation of STAT3 signaling in axotomized facial motoneurons: Evidence for a sequential

postlesional function of the cytokines. J Neurobiol 41:559–571. doi: 10.1002/(SICI)1097-

4695(199912)41:4<559::AID-NEU11>3.0.CO;2-A

Hahnen E, Forkert R, Marke C, et al (1995) Molecular analysis of candidate genes on

chromosome 5q13 in autosomal recessive spinal muscular atrophy: Evidence of

homozygous deletions of the SMN gene in unaffected individuals. Hum Mol Genet 4:1927–

1933. doi: 10.1093/hmg/4.10.1927

Han KJ, Foster D, Harhaj EW, et al (2016) Monoubiquitination of survival motor neuron

regulates its cellular localization and Cajal body integrity. Hum Mol Genet 25:1392–1405.

doi: 10.1093/hmg/ddw021

Han KJ, Foster DG, Zhang NY, et al (2012) Ubiquitin-specific protease 9x deubiquitinates and

stabilizes the spinal muscular atrophy protein-survival motor neuron. J Biol Chem

287:43741–43752. doi: 10.1074/jbc.M112.372318

Hao le T, Duy PQ, An M, et al (2017) HuD and the Survival Motor Neuron Protein Interact in

Motoneurons and Are Essential for Motoneuron Development, Function, and mRNA

Regulation. J Neurosci 37:11559–11571. doi: 10.1523/JNEUROSCI.1528-17.2017

Hao LT, Burghes AHM, Beattie CE (2011) Generation and Characterization of a genetic

zebrafish model of SMA carrying the human SMN2 gene. Mol Neurodegener 6:1–9. doi:

10.1186/1750-1326-6-24

Hao LT, Wolman M, Granato M, Beattie CE (2012) Survival motor neuron affects plastin 3

protein levels leading to motor defects. J Neurosci 32:5074–84. doi:

10.1523/JNEUROSCI.5808-11.2012

Harahap NIF, Nurputra DK, Ar Rochmah M, et al (2015) Salbutamol inhibits ubiquitin-mediated

survival motor neuron protein degradation in spinal muscular atrophy cells. Biochem

Biophys Reports 4:351–356. doi: 10.1016/j.bbrep.2015.10.012

Harrington EA, Sloan JL, Manoli I, et al (2016) Neutralizing Antibodies Against Adeno-

Associated Viral Capsids in Patients with mut Methylmalonic Acidemia. Hum Gene Ther

27:345–53. doi: 10.1089/hum.2015.092

Hauke J, Riessland M, Lunke S, et al (2009) Survival motor neuron gene 2 silencing by DNA

methylation correlates with spinal muscular atrophy disease severity and can be bypassed

by histone deacetylase inhibition. Hum Mol Genet 18:304–317. doi: 10.1093/hmg/ddn357

Hausmanowa-Petrusewicz I, Askanas W, Badurska B, et al (1968) Infantile and juvenile spinal

muscular atrophy. J Neurol Sci 6:269–287. doi: 10.1016/0022-510X(68)90096-8

Hausmanowa-Petrusewicz I, Karwańska A (1986) Electromyographic findings in different forms

of infantile and juvenile proximal spinal muscular atrophy. Muscle Nerve 9:37–46. doi:

10.1002/mus.880090106

Hawrylycz MJ, Lein ES, Guillozet-Bongaarts AL, et al (2012) An anatomically comprehensive

atlas of the adult human brain transcriptome. Nature 489:391–399. doi:

10.1038/nature11405

Hill JJ, Qiu Y, Hewick RM, Wolfman NM (2003) Regulation of Myostatin in Vivo by Growth

and Differentiation Factor-Associated Serum Protein-1: A Novel Protein with Protease

Inhibitor and Follistatin Domains. Mol Endocrinol 17:1144–1154. doi: 10.1210/me.2002-

0366

Hoffman GG, Dodson GE, Cole WG, Greenspan DS (2008) Absence of apparent disease causing

mutations in COL5A3 in 13 patients with hypermobility Ehlers-Danlos syndrome. Am J

Med Genet A 146A:3240–1. doi: 10.1002/ajmg.a.32586

Hoffmann J (1893) Ueber chronische spinale Muskelatrophie im Kindesalter, auf familiärer

Basis. Dtsch Z Nervenheilkd 3:427–470. doi: 10.1007/BF01668496

Hosseinibarkooie S, Peters M, Torres-Benito L, et al (2016) The Power of Human Protective

Modifiers: PLS3 and CORO1C Unravel Impaired Endocytosis in Spinal Muscular Atrophy

and Rescue SMA Phenotype. Am J Hum Genet 99:647–665. doi:

10.1016/j.ajhg.2016.07.014

Hsu SH, Lai MC, Er TK, et al (2010) Ubiquitin carboxyl-terminal hydrolase L1 (UCHL1)

regulates the level of SMN expression through ubiquitination in primary spinal muscular

atrophy fibroblasts. Clin Chim Acta 411:1920–1928. doi: 10.1016/j.cca.2010.07.035

Hua Y, Sahashi K, Hung G, et al (2010) Antisense correction of SMN2 splicing in the CNS

rescues necrosis in a type III SMA mouse model. Genes Dev 24:1634–1644. doi:

10.1101/gad.1941310

Hua Y, Sahashi K, Rigo F, et al (2011) Peripheral SMN restoration is essential for long-term

rescue of a severe spinal muscular atrophy mouse model. Nature 478:123–6. doi:

10.1038/nature10485

Hubers L, Valderrama-Carvajal H, Laframboise J, et al (2011) HuD interacts with survival motor

neuron protein and can rescue spinal muscular atrophy-like neuronal defects. Hum Mol

Genet 20:553–579. doi: 10.1093/hmg/ddq500

Imanishi M, Yamamoto Y, Wang X, et al Augmented antitumor activity of 5-FU by double

knockdown of MDM4 and MDM2 in colon and gastric cancer cells. Cancer Sci 0:. doi:

10.1111/cas.13893

Irie A, Yates EA, Turnbull JE, Holt CE (2002) Specific heparan sulfate structures involved in

retinal axon targeting. Development 70:61–70

Iyer CC, Corlett KM, Massoni-Laporte A, et al (2018) Mild SMN missense alleles are only

functional in the presence of SMN2 in mammals. Hum Mol Genet 00:1–13. doi:

10.1093/hmg/ddy251

Iyer CC, McGovern VL, Murray JD, et al (2015) Low levels of Survival Motor Neuron protein

are sufficient for normal muscle function in the SMNΔ7 mouse model of SMA. Hum Mol

Genet 24:6160–6173. doi: 10.1093/hmg/ddv332

Izaurralde E, Lewis J, Gamberi C, et al (1995) A cap-binding protein complex mediating U

snRNA export. Nature 376:709–712. doi: 10.1038/376709a0

Izquierdo JM, Majós N, Bonnal S, et al (2005) Regulation of Fas alternative splicing by

antagonistic effects of TIA-1 and PTB on exon definition. Mol Cell 19:475–84. doi:

10.1016/j.molcel.2005.06.015

Izquierdo JM, Valcárcel J (2007) Two isoforms of the T-cell intracellular antigen 1 (TIA-1)

splicing factor display distinct splicing regulation activities. Control of TIA-1 isoform ratio

by TIA-1-related protein. J Biol Chem 282:19410–7. doi: 10.1074/jbc.M700688200

Jaafar C, Omais S, Al Lafi S, et al (2016) Role of Rb during Neurogenesis and Axonal Guidance

in the Developing Olfactory System. Front Mol Neurosci 9:1–15. doi:

10.3389/fnmol.2016.00081

Jablonka S, Dombert B, Asan E, Sendtner M (2013) Mechanisms for axon maintenance and

plasticity in motoneurons: alterations in motoneuron disease. J Anat. doi: 10.1111/joa.12097

Jedrzejowska M, Borkowska J, Zimowski J, et al (2008) Unaffected patients with a homozygous

absence of the SMN1 gene. Eur J Hum Genet 16:930–4. doi: 10.1038/ejhg.2008.41

Jedrzejowska M, Milewski M, Zimowski J, et al (2009) Phenotype modifiers of spinal muscular

atrophy: the number of SMN2 gene copies, deletion in the NAIP gene and probably gender

influence the course of the disease. Acta Biochim Pol 56:103–108

Jia Y, Mu JC, Ackerman SL (2012) Mutation of a U2 snRNA gene causes global disruption of

alternative splicing and neurodegeneration. Cell 148:296–308. doi:

10.1016/j.cell.2011.11.057

Kaifer KA, Villalón E, Osman EY, et al (2017) Plastin-3 extends survival and reduces severity in

mouse models of spinal muscular atrophy. JCI insight 2:e89970. doi:

10.1172/jci.insight.89970

Kainulainen M, Habjan M, Hubel P, et al (2014) Virulence Factor NSs of Rift Valley Fever

Virus Recruits the F-Box Protein FBXO3 To Degrade Subunit p62 of General Transcription

Factor TFIIH. J Virol 88:3464–3473. doi: 10.1128/JVI.02914-13

Kambach C, Walke S, Nagai K (1999) Structure and assembly of the spliceosomal small nuclear

ribonucleoprotein particles. Curr Opin Struct Biol 9:222–230. doi: 10.1016/S0959-

440X(99)80032-3

Kariya S, Park G-H, Maeno-Hikichi Y, et al (2008) Reduced SMN protein impairs maturation of

the neuromuscular junctions in mouse models of spinal muscular atrophy. Hum Mol Genet

17:2552–69. doi: 10.1093/hmg/ddn156

Kashima T, Manley JL (2003) A negative element in SMN2 exon 7 inhibits splicing in spinal

muscular atrophy. Nat Genet 34:460–3. doi: 10.1038/ng1207

Katz Y, Wang ET, Airoldi EM, Burge CB (2010) Analysis and design of RNA sequencing

experiments for identifying isoform regulation. Nat Methods 7:1009–15. doi:

10.1038/nmeth.1528

Kim JK, Caine C, Awano T, et al (2017) Motor neuronal repletion of the NMJ organizer, Agrin,

modulates the severity of the spinal muscular atrophy disease phenotype in model mice.

Hum Mol Genet 26:2377–2385. doi: 10.1093/hmg/ddx124

Kim M, Fontelonga T, Roesener AP, et al (2015) Motor neuron cell bodies are actively

positioned by Slit/Robo repulsion and Netrin/DCC attraction. Dev Biol 399:68–79. doi:

10.1016/j.ydbio.2014.12.014

Kim M, Fontelonga TM, Lee CH, et al (2017) Motor axons are guided to exit points in the spinal

cord by Slit and Netrin signals. Dev Biol 432:178–191. doi: 10.1016/j.ydbio.2017.09.038

Kissel JT, Scott CB, Reyna SP, et al (2011) SMA CARNI-VAL TRIAL PART II: A prospective,

single-armed trial of L-carnitine and valproic acid in ambulatory children with spinal

muscular atrophy. PLoS One 6:1–11. doi: 10.1371/journal.pone.0021296

Kletzl H, Marquet A, Günther A, et al (2018) The oral splicing modifier RG7800 increases full

length survival of motor neuron 2 mRNA and survival of motor neuron protein: Results

from trials in healthy adults and patients with spinal muscular atrophy. Neuromuscul Disord

1–9. doi: 10.1016/j.nmd.2018.10.001

Kleyn PW, Wang CH, Lien LL, et al (1993) Construction of a yeast artificial chromosome contig

spanning the spinal muscular atrophy disease gene region. Proc Natl Acad Sci U S A

90:6801–6805. doi: 10.1073/pnas.90.14.6801

Kolb SJ, Coffey CS, Yankey JW, et al (2017) Natural history of infantile-onset spinal muscular

atrophy. Ann Neurol 82:883–891. doi: 10.1002/ana.25101

Kolb SJ, Kissel JT (2015) Spinal Muscular Atrophy. Neurol Clin 33:831–846. doi:

10.1016/j.ncl.2015.07.004

Kramer SG, Kidd T, Simpson JH, Goodman CS (2001) Switching repulsion to attraction:

Changing responses to slit during transition in mesoderm migration. Science

292:737–740. doi: 10.1126/science.1058766

Krosschell KJ, Kissel JT, Townsend EL, et al (2018) Clinical trial of L-Carnitine and valproic

acid in spinal muscular atrophy type I. Muscle Nerve 57:193–199. doi:

10.1002/mus.25776

Kugelberg E, Welander L (1956) Heredofamilial Juvenile muscular Atrophy Simulating

muscular Dystrophy. Arch Neurol Psychiatry 75:500–509. doi:

10.1001/archneurpsyc.1956.02330230050005

Kwon DY, Dimitriadi M, Terzic B, et al (2013) The E3 ubiquitin ligase mind bomb 1

ubiquitinates and promotes the degradation of survival of motor neuron protein. Mol Biol

Cell 24:1863–1871. doi: 10.1091/mbc.E13-01-0042

Kwong E, Li Y, Hylemon PB, Zhou H (2015) Bile acids and sphingosine-1-phosphate receptor 2

in hepatic lipid metabolism. Acta Pharm Sin B 5:151–157. doi: 10.1016/j.apsb.2014.12.009

Labiner DM (2002) DP-VPA D-Pharm. Curr Opin Investig Drugs 3:921–923

Le TT, Pham LT, Butchbach MER, et al (2005) SMNDelta7, the major product of the

centromeric survival motor neuron (SMN2) gene, extends survival in mice with spinal

muscular atrophy and associates with full-length SMN. Hum Mol Genet 14:845–57. doi:

10.1093/hmg/ddi078

Lefebvre S, Bürglen L, Reboullet S, et al (1995) Identification and characterization of a spinal

muscular atrophy-determining gene. Cell 80:155–65

Lefebvre S, Burlet P, Liu Q, et al (1997) Correlation between severity and SMN protein level in

spinal muscular atrophy. Nat Genet 16:265–9. doi: 10.1038/ng0797-265

Leng Y, Chuang D-M (2006) Endogenous α-Synuclein Is Induced by Valproic Acid through

Histone Deacetylase Inhibition and Participates in Neuroprotection against Glutamate-

Induced Excitotoxicity. J Neurosci 26:7502–7512. doi: 10.1523/JNEUROSCI.0096-06.2006

Li-Hawkins J, Lund EG, Turley SD, Russell DW (2000) Disruption of the oxysterol 7alpha-

hydroxylase gene in mice. J Biol Chem 275:16536–16542. doi: 10.1074/jbc.M001811200

Li D, Xie P, Zhao F, et al (2015) F-box protein Fbxo3 targets Smurf1 ubiquitin ligase for

ubiquitination and degradation. Biochem Biophys Res Commun 458:941–945. doi:

10.1016/j.bbrc.2015.02.089

Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform.

Bioinformatics 26:589–95. doi: 10.1093/bioinformatics/btp698

Li H, Handsaker B, Wysoker A, et al (2009) The Sequence Alignment/Map format and

SAMtools. Bioinformatics 25:2078–2079. doi: 10.1093/bioinformatics/btp352

Little D, Valori CF, Mutsaers CA, et al (2015) PTEN depletion decreases disease severity and

modestly prolongs survival in a mouse model of spinal muscular atrophy. Mol Ther 23:270–

277. doi: 10.1038/mt.2014.209

Liu Q, Dreyfuss G (1996) A novel nuclear structure containing the survival of motor neurons

protein. EMBO J 15:3555–3565

Liu Q, Fischer U, Wang F, Dreyfuss G (1997) The spinal muscular atrophy disease gene product,

SMN, and its associated protein SIP1 are in a complex with spliceosomal snRNP proteins.

Cell 90:1013–1021. doi: 10.1016/S0092-8674(00)80367-0

Long KK, Shea KMO, Khairallah RJ, et al (2018) Specific inhibition of myostatin activation is

beneficial in mouse models of SMA therapy. Hum Mol Genet 00:1–14. doi:

10.1093/hmg/ddy382

Lorson C, Strasswimmer J, Yao J (1998a) SMN oligomerization defect correlates with spinal

muscular atrophy severity. Nat … 19:63–66

Lorson CL, Hahnen E, Androphy EJ, Wirth B (1999a) A single nucleotide in the SMN gene

regulates splicing and is responsible for spinal muscular atrophy. Proc Natl Acad Sci U S A

96:6307–11. doi: 10.1073/pnas.96.11.6307

Lorson CL, Hahnen E, Androphy EJ, Wirth B (1999b) A single nucleotide in the SMN gene

regulates splicing and is responsible for spinal muscular atrophy. Proc Natl Acad Sci

96:6307–6311. doi: 10.1073/pnas.96.11.6307

Lorson CL, Strasswimmer J, Yao JM, et al (1998b) SMN oligomerization defect correlates with

spinal muscular atrophy severity. Nat Genet 19:63–66. doi: 10.1038/ng0598-63

Lotti F, Imlach WL, Saieva L, et al (2012) An SMN-Dependent U12 Splicing Event Essential for

Motor Circuit Function. Cell 151:440–454. doi: 10.1016/j.cell.2012.09.012

Luo L, Jan LY, Jan YN (1997) Rho family small GTP-binding proteins in growth cone

signalling. Curr Opin Neurobiol 7:81–86. doi: 10.1016/S0959-4388(97)80124-9

Luo M, Liu L, Peter I, et al (2014) An Ashkenazi Jewish SMN1 haplotype specific to duplication

alleles improves pan-ethnic carrier screening for spinal muscular atrophy. Genet Med

16:149–156. doi: 10.1038/gim.2013.84

Macleod MJ, Taylor JE, Lunt PW, et al (1999) Prenatal onset spinal muscular atrophy. Eur J

Paediatr Neurol 3:65–72. doi: 10.1016/S1090-3798(99)80015-4

Mailman MD, Heinz JW, Papp AC, et al (2002) Molecular analysis of spinal muscular atrophy

and modification of the phenotype by SMN2. Genet Med 4:20–6. doi:

10.1097/00125817-200201000-00004

Mak KM, Png CYM, Lee DJ (2016) Type V Collagen in Health, Disease, and Fibrosis. Anat Rec

299:613–629. doi: 10.1002/ar.23330

Malfait F, Coucke P, Symoens S, et al (2005) The molecular basis of classic Ehlers-Danlos

syndrome: A comprehensive study of biochemical and molecular findings in 48 unrelated

patients. Hum Mutat 25:28–37. doi: 10.1002/humu.20107

Mallampalli RK, Coon TA, Glasser JR, et al (2013) Targeting F Box Protein Fbxo3 To Control

Cytokine-Driven Inflammation. J Immunol 191:5247–5255. doi:

10.4049/jimmunol.1300456

Martínez-Hernández R, Soler-Botija C, Also E, et al (2009) The developmental pattern of

myotubes in spinal muscular atrophy indicates prenatal delay of muscle maturation. J

Neuropathol Exp Neurol 68:474–81. doi: 10.1097/NEN.0b013e3181a10ea1

McAndrew PE, Parsons DW, Simard LR, et al (1997) Identification of proximal spinal muscular

atrophy carriers and patients by analysis of SMNT and SMNC gene copy number. Am J

Hum Genet 60:1411–22. doi: 10.1086/515465

McCarty DM (2008) Self-complementary AAV vectors; advances and applications. Mol Ther

16:1648–1656

McGovern VL, Gavrilina TO, Beattie CE, Burghes AHM (2008) Embryonic motor axon

development in the severe SMA mouse. Hum Mol Genet 17:2900–9. doi:

10.1093/hmg/ddn189

McGovern VL, Iyer CC, David Arnold W, et al (2015a) SMN expression is required in motor

neurons to rescue electrophysiological deficits in the SMNΔ7 mouse model of SMA. Hum

Mol Genet 24:5524–5541. doi: 10.1093/hmg/ddv283

McGovern VL, Massoni-Laporte A, Wang X, et al (2015b) Plastin 3 expression does not modify

spinal muscular atrophy severity in the Δ7 SMA mouse. PLoS One 10:1–19. doi:

10.1371/journal.pone.0132364

McPherron AC, Lawler AM, Lee S-J (1997) Regulation of skeletal muscle mass in mice by a

new TGF-β superfamily member. Nature 387:83–90. doi: 10.1038/387083a0

McWhorter ML, Monani UR, Burghes AHM, Beattie CE (2003) Knockdown of the survival

motor neuron (Smn) protein in zebrafish causes defects in motor axon outgrowth and

pathfinding. J Cell Biol 162:919–931. doi: 10.1083/jcb.200303168

Meister G, Bühler D, Pillai R, et al (2001) A multiprotein complex mediates the ATP-dependent

assembly of spliceosomal U snRNPs. Nat Cell Biol 3:945–949. doi: 10.1038/ncb1101-945

Melegh B, Pap M, Morava E, et al (1994) Carnitine-dependent changes of metabolic fuel

consumption during long-term treatment with valproic acid. J Pediatr 125:317–321. doi:

10.1016/S0022-3476(94)70218-7

Melki J, Abdelhak S, Sheth P, et al (1990a) Gene for chronic proximal spinal muscular atrophies

maps to chromosome 5q. Nature 344:767–768

Melki J, Lefebvre S, Burglen L, et al (1994) De novo and inherited deletions of the 5q13 region

in spinal muscular atrophies. Science 264:1474–1477. doi: 10.1126/science.7910982

Melki J, Sheth P, Abdelhak S, et al (1990b) Mapping of acute (type I) spinal muscular atrophy to

chromosome 5q12-q14. Lancet 336:271–273. doi: 10.1016/0140-6736(90)91803-I

Mendell JR, Al-Zaidy S, Shell R, et al (2017) Single-Dose Gene-Replacement Therapy for

Spinal Muscular Atrophy. N Engl J Med 377:1713–1722. doi: 10.1056/NEJMoa1706198

Mentis GZ, Blivis D, Liu W, et al (2011) Early functional impairment of sensory-motor

connectivity in a mouse model of spinal muscular atrophy. Neuron 69:453–67. doi:

10.1016/j.neuron.2010.12.032

Mercuri E, Darras BT, Chiriboga CA, et al (2018) Nusinersen versus Sham Control in Later-

Onset Spinal Muscular Atrophy. N Engl J Med 378:625–635. doi:

10.1056/NEJMoa1710504

Meyer K, Ferraiuolo L, Schmelzer L, et al (2015) Improving single injection CSF delivery of

AAV9-mediated gene therapy for SMA: A dose-response study in mice and nonhuman

primates. Mol Ther 23:477–487. doi: 10.1038/mt.2014.210

Miller RG, Moore DH, Dronsky V, et al (2001) A placebo-controlled trial of gabapentin in spinal

muscular atrophy. J Neurol Sci 191:127–131. doi: 10.1016/S0022-510X(01)00632-3

Miyajima H, Miyaso H, Okumura M, et al (2002) Identification of a cis-acting element for the

regulation of SMN exon 7 splicing. J Biol Chem 277:23271–23277. doi:

10.1074/jbc.M200851200

Mizuhara E, Nakatani T, Minaki Y, et al (2005) MAGI1 recruits Dll1 to cadherin-based adherens

junctions and stabilizes it on the cell surface. J Biol Chem 280:26499–507. doi:

10.1074/jbc.M500375200

Mohaghegh P, Rodrigues NR, Owen N, et al (1999) Analysis of mutations in the tudor domain

of the survival motor neuron protein SMN. Eur J Hum Genet 7:519–525. doi:

10.1038/sj.ejhg.5200346

Monani UR, Lorson CL, Parsons DW, et al (1999) A single nucleotide difference that alters

splicing patterns distinguishes the SMA gene SMN1 from the copy gene SMN2. Hum Mol

Genet 8:1177–1183. doi: 10.1093/hmg/8.7.1177

Monani UR, Pastore MT, Gavrilina TO, et al (2003) A transgene carrying an A2G missense

mutation in the SMN gene modulates phenotypic severity in mice with severe (type I) spinal

muscular atrophy. J Cell Biol 160:41–52. doi: 10.1083/jcb.200208079

Morandi L, Abiusi E, Pasanisi MB, et al (2013) P.6.4 Salbutamol tolerability and efficacy in

adult type III SMA patients: Results of a multicentric, molecular and clinical, double-blind,

placebo-controlled study. Neuromuscul Disord 23:771. doi: 10.1016/j.nmd.2013.06.475

Munsat TL (1991) International SMA Collaboration. Neuromuscul Disord 1:81. doi:

10.1016/0960-8966(91)90052-T

Munsat TL, Woods R, Fowler W, Pearson CM (1969) Neurogenic Muscular Atrophy of Infancy.

Brain 92:9–24

Murray LM, Comley LH, Thomson D, et al (2008) Selective vulnerability of motor neurons and

dissociation of pre- and post-synaptic pathology at the neuromuscular junction in mouse

models of spinal muscular atrophy. Hum Mol Genet 17:949–62. doi: 10.1093/hmg/ddm367

Narayanan U, Ospina J, Frey M (2002) SMN, the spinal muscular atrophy protein, forms a

pre-import snRNP complex with snurportin1 and importin β. Hum Mol Genet 11:1785–1795. doi:

10.1093/hmg/11.15.1785

Neuenkirchen N, Chari A, Fischer U (2008) Deciphering the assembly pathway of Sm-class U

snRNPs. FEBS Lett 582:1997–2003. doi: 10.1016/j.febslet.2008.03.009

Ning K (2004) Dual Neuroprotective Signaling Mediated by Downregulating Two Distinct

Phosphatase Activities of PTEN. J Neurosci 24:4052–4060. doi:

10.1523/JNEUROSCI.5449-03.2004

Ning K, Drepper C, Valori CF, et al (2010) PTEN depletion rescues axonal growth defect and

improves survival in SMN-deficient motor neurons. Hum Mol Genet 19:3159–3168. doi:

10.1093/hmg/ddq226

Nölle A, Zeug A, van Bergeijk J, et al (2011) The spinal muscular atrophy disease protein SMN

is linked to the Rho-kinase pathway via profilin. Hum Mol Genet 20:4865–78. doi:

10.1093/hmg/ddr425

Okouchi M, Ekshyyan O, Maracine M, Aw TY (2007) Neuronal Apoptosis in

Neurodegeneration. Antioxid Redox Signal 9:1059–1096. doi: 10.1089/ars.2007.1511

Oprea GE, Kröber S, McWhorter ML, et al (2008) Plastin 3 is a protective modifier of autosomal

recessive spinal muscular atrophy. Science 320:524–7. doi: 10.1126/science.1155085

Ortiz B, Fabius AWM, Wu WH, et al (2014) Loss of the tyrosine phosphatase PTPRD leads to

aberrant STAT3 activation and promotes gliomagenesis. Proc Natl Acad Sci 111:8149–

8154. doi: 10.1073/pnas.1401952111

Osoegawa K, Mammoser AG, Wu C, et al (2001) A bacterial artificial chromosome library for

sequencing the complete human genome. Genome Res 11:483–496. doi: 10.1101/gr.169601

Osoegawa K, Woon PY, Zhao B, et al (1998) An improved approach for construction of

bacterial artificial chromosome libraries. Genomics 52:1–8. doi: 10.1006/geno.1998.5423

Pane M, Lapenta L, Abiusi E, et al (2017) Longitudinal assessments in discordant twins with

SMA. Neuromuscul Disord 27:890–893. doi: 10.1016/j.nmd.2017.06.559

Pane M, Staccioli S, Messina S, et al (2008) Daily salbutamol in young patients with SMA type

II. Neuromuscul Disord 18:536–540. doi: 10.1016/j.nmd.2008.05.004

Park G-H, Maeno-Hikichi Y, Awano T, et al (2010) Reduced survival of motor neuron (SMN)

protein in motor neuronal progenitors functions cell autonomously to cause spinal muscular

atrophy in model mice expressing the human centromeric (SMN2) gene. J Neurosci

30:12005–19. doi: 10.1523/JNEUROSCI.2208-10.2010

Park KK, Liu K, Hu Y, et al (2008) Promoting Axon Regeneration in the Adult CNS by

Modulation of the PTEN/mTOR Pathway. Science 322:963–966. doi:

10.1126/science.1161566

Pearn JH (1978) Incidence, prevalence, and gene frequency studies of chronic childhood spinal

muscular atrophy. J Med Genet 15:409–413. doi: 10.1136/jmg.15.6.409

Pellizzoni L (2007) Chaperoning ribonucleoprotein biogenesis in health and disease. EMBO Rep

8:340–345. doi: 10.1038/sj.embor.7400941

Pellizzoni L, Baccon J, Rappsilber J, et al (2002a) Purification of native survival of motor

neurons complexes and identification of Gemin6 as a novel component. J Biol Chem

277:7540–7545. doi: 10.1074/jbc.M110141200

Pellizzoni L, Yong J, Dreyfuss G (2002b) Essential role for the SMN complex in the specificity

of snRNP assembly. Science 298:1775–1779. doi: 10.1126/science.1074962

Perrone-Bizzozero N, Bolognani F (2002) Role of HuD and other RNA-Binding proteins in

neural development and plasticity. J Neurosci Res 68:121–126. doi: 10.1002/jnr.10175

Peter CJ, Evans M, Thayanithy V, et al (2011) The COPI vesicle complex binds and moves with

survival motor neuron within axons. Hum Mol Genet 20:1701–1711. doi:

10.1093/hmg/ddr046

Petroski MD (2008) The ubiquitin system, disease, and drug discovery. BMC Biochem 9:1–15.

doi: 10.1186/1471-2091-9-S1-S7

Pillai RS, Grimmler M, Meister G, et al (2003) Unique Sm core structure of U7 snRNPs:

Assembly by a specialized SMN complex and the role of a new component, Lsm11, in

histone RNA processing. Genes Dev 17:2321–2333. doi: 10.1101/gad.274403

Ponting CP (1997) Tudor domains in proteins that interact with RNA. Trends Biochem Sci

22:51–52. doi: 10.1016/S0968-0004(96)30049-2

Porensky PN, Mitrpant C, McGovern VL, et al (2012) A single administration of morpholino

antisense oligomer rescues spinal muscular atrophy in mouse. Hum Mol Genet 21:1625–38.

doi: 10.1093/hmg/ddr600

Powis RA, Karyka E, Boyd P, et al (2016) Systemic restoration of UBA1 ameliorates disease in

spinal muscular atrophy. JCI Insight 1:. doi: 10.1172/jci.insight.87908

Powis RA, Mutsaers CA, Wishart TM, et al (2014) Increased levels of UCHL1 are a

compensatory response to disrupted ubiquitin homeostasis in spinal muscular atrophy and

do not represent a viable therapeutic target. Neuropathol Appl Neurobiol 40:873–887. doi:

10.1111/nan.12168

Prior TW (2007) Spinal Muscular Atrophy Diagnostics. J Child Neurol 22:952–956. doi:

10.1177/0883073807305668

Prior TW, Krainer AR, Hua Y, et al (2009) A positive modifier of spinal muscular atrophy in the

SMN2 gene. Am J Hum Genet 85:408–13. doi: 10.1016/j.ajhg.2009.08.002

Prior TW, Swoboda KJ, Scott HD, Hejmanowski AQ (2004) Homozygous SMN1 deletions in

unaffected family members and modification of the phenotype by SMN2. Am J Med Genet

130 A:307–310. doi: 10.1002/ajmg.a.30251

Pyatt RE, Prior TW (2006) A feasibility study for the newborn screening of spinal muscular

atrophy. Genet Med 8:428–437. doi: 10.1097/01.gim.0000227970.60450.b2

R Core Team (2013) R: A language and environment for statistical computing. R Foundation for

Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

Rajan P, Stewart CL, Fink JS (1995) LIF-mediated activation of STAT proteins after neuronal

injury in vivo. Neuroreport 6:2240–2244

Ratni H, Ebeling M, Baird J, et al (2018) Discovery of Risdiplam, a Selective Survival of Motor

Neuron-2 (SMN2) Gene Splicing Modifier for the Treatment of Spinal Muscular Atrophy

(SMA). J Med Chem 61:6501–6517. doi: 10.1021/acs.jmedchem.8b00741

Ratni H, Karp GM, Weetall M, et al (2016) Specific Correction of Alternative Survival Motor

Neuron 2 Splicing by Small Molecules: Discovery of a Potential Novel Medicine to Treat

Spinal Muscular Atrophy. J Med Chem 59:6086–6100. doi:

10.1021/acs.jmedchem.6b00459

Riessland M, Kaczmarek A, Schneider S, et al (2017) Neurocalcin Delta Suppression Protects

against Spinal Muscular Atrophy in Humans and across Species by Restoring Impaired

Endocytosis. Am J Hum Genet 100:297–315. doi: 10.1016/j.ajhg.2017.01.005

Rimmer A, Phan H, Mathieson I, et al (2014) Integrating mapping-, assembly- and haplotype-

based approaches for calling variants in clinical sequencing applications. Nat Genet 46:912–

918. doi: 10.1038/ng.3036

Rindt H, Buckley DM, Vale SM, et al (2012) Transgenic inactivation of murine myostatin does

not decrease the severity of disease in a model of Spinal Muscular Atrophy. Neuromuscul

Disord 22:277–285. doi: 10.1016/j.nmd.2011.10.012

Rochette CF, Gilbert N, Simard LR (2001) SMN gene duplication and the emergence of the

SMN2 gene occurred in distinct hominids: SMN2 is unique to Homo sapiens. Hum Genet

108:255–266. doi: 10.1007/s004390100473

Rollins SA, Sims PJ (1990) The complement-inhibitory activity of CD59 resides in its capacity

to block incorporation of C9 into membrane C5b-9. J Immunol 144:3478–3483

Rose FF, Mattis VB, Rindt H, Lorson CL (2009) Delivery of recombinant follistatin lessens

disease severity in a mouse model of spinal muscular atrophy. Hum Mol Genet 18:997–

1005. doi: 10.1093/hmg/ddn426

Rossoll W, Jablonka S, Andreassi C, et al (2003) Smn, the spinal muscular atrophy-determining

gene product, modulates axon growth and localization of beta-actin mRNA in growth cones

of motoneurons. J Cell Biol 163:801–12. doi: 10.1083/jcb.200304128

Rossoll W, Kröning A, Ohndorf U, et al (2002) Specific interaction of Smn, the spinal muscular

atrophy determining gene product, with hnRNP-R and gry-rbp/hnRNP-Q: a role for Smn in

RNA processing in motor axons? Hum Mol Genet 11:93–105

Roy N, Mahadevan MS, McLean M, et al (1995) The gene for neuronal apoptosis inhibitory

protein is partially deleted in individuals with spinal muscular atrophy. Cell 80:167–178.

doi: 10.1016/0092-8674(95)90461-1

Rüdiger NS, Gregersen N, Kielland-Brandt MC (1995) One short well conserved region of

Alu-sequences is involved in human gene rearrangements and has homology with prokaryotic

chi. Nucleic Acids Res 23:256–260. doi: 10.1093/nar/23.2.256

Ruggiu M, McGovern VL, Lotti F, et al (2012) A role for SMN exon 7 splicing in the selective

vulnerability of motor neurons in spinal muscular atrophy. Mol Cell Biol 32:126–38. doi:

10.1128/MCB.06077-11

Ruhno C, McGovern VL, Avenarius MR, et al (2019) Complete sequencing of the SMN2 gene in

SMA patients detects SMN gene deletion junctions and variants in SMN2 that modify the

SMA phenotype. Hum Genet 138:241–256. doi: 10.1007/s00439-019-01983-0

Sánchez-Sánchez J, Arévalo JC (2017) A review on ubiquitination of neurotrophin receptors:

Facts and perspectives. Int J Mol Sci 18:. doi: 10.3390/ijms18030630

Schlipf N, Schüle R, Klimpe S, et al (2011) Amplicon-based high-throughput pooled sequencing

identifies mutations in CYP7B1 and SPG7 in sporadic spastic paraplegia patients. Clin

Genet 80:148–160. doi: 10.1111/j.1399-0004.2011.01715.x

Schnell JD, Hicke L (2003) Non-traditional functions of ubiquitin and ubiquitin-binding

proteins. J Biol Chem 278:35857–35860. doi: 10.1074/jbc.R300018200

Schrank B, Götz R, Gunnersen JM, et al (1997) Inactivation of the survival motor neuron gene, a

candidate gene for human spinal muscular atrophy, leads to massive cell death in early

mouse embryos. Proc Natl Acad Sci U S A 94:9920–5. doi: 10.1073/pnas.94.18.9920

Schüle R, Brandt E, Karle KN, et al (2009) Analysis of CYP7B1 in non-consanguineous cases of

hereditary spastic paraplegia. Neurogenetics 10:97–104. doi: 10.1007/s10048-008-0158-9

Schwab AJ, Ebert AD (2014) Sensory neurons do not induce motor neuron loss in a human stem

cell model of spinal muscular atrophy. PLoS One 9:31–33. doi:

10.1371/journal.pone.0103112

Schwaiger F, Hager G, Schmitt AB, et al (2000) Peripheral but not central axotomy induces

changes in Janus kinases (JAK) and signal transducers and activators of transcription

(STAT). 12:1165–1176

Schweizer U, Gunnersen J, Karch C, et al (2002) Conditional gene ablation of Stat3 reveals

differential signaling requirements for survival of motoneurons during development and

after nerve injury in the adult. J Cell Biol 156:287–297. doi: 10.1083/jcb.200107009

See K, Yadav P, Giegerich M, et al (2014) SMN deficiency alters Nrxn2 expression and splicing

in zebrafish and mouse models of spinal muscular atrophy. Hum Mol Genet 23:1754–1770.

doi: 10.1093/hmg/ddt567

Setola V, Terao M, Locatelli D, et al (2007) Axonal-SMN (a-SMN), a protein isoform of the

survival motor neuron gene, is specifically involved in axonogenesis. Proc Natl Acad Sci

104:1959–1964. doi: 10.1073/pnas.0610660104

Shen S, Park JW, Lu Z, et al (2014) rMATS: Robust and flexible detection of differential

alternative splicing from replicate RNA-Seq data. Proc Natl Acad Sci 111:E5593–E5601.

doi: 10.1073/pnas.1419161111

Shpargel KB, Matera AG (2005) Gemin proteins are required for efficient assembly of Sm-class

ribonucleoproteins. Proc Natl Acad Sci 102:17372–17377. doi: 10.1073/pnas.0508947102

Singh NK, Singh NN, Androphy EJ, Singh RN (2006) Splicing of a critical exon of human

Survival Motor Neuron is regulated by a unique silencer element located in the last intron.

Mol Cell Biol 26:1333–46. doi: 10.1128/MCB.26.4.1333-1346.2006

Soares VM, Brzustowicz LM, Kleyn PW, et al (1993) Refinement of the spinal muscular atrophy

locus to the interval between d5s435 and map1b. Genomics 15:365–371. doi:

10.1006/geno.1993.1069

Soler-Botija C, Ferrer I, Alvarez JL, et al (2003) Downregulation of Bcl-2 Proteins in Type I

Spinal Muscular Atrophy Motor Neurons During Fetal Development. J Neuropathol Exp

Neurol 62:420–426. doi: 10.1093/jnen/62.4.420

Stark H, Dube P, Lührmann R, Kastner B (2001) Arrangement of RNA and proteins in the

spliceosomal U1 small nuclear ribonucleoprotein particle. Nature 409:539–542. doi:

10.1038/35054102

Strathmann EA, Peters M, Hosseinibarkooie S, et al (2018) Evaluation of potential effects of

Plastin 3 overexpression and low-dose SMN-antisense oligonucleotides on putative

biomarkers in spinal muscular atrophy mice. 4:1–28

Stratigopoulos G, Lanzano P, Deng L, et al (2010) Association of plastin 3 expression with

disease severity in spinal muscular atrophy only in postpubertal females. Arch Neurol

67:1252–1256. doi: 10.1001/archneurol.2010.239

Sturm S, Günther A, Jaber B, et al (2018) A phase 1 healthy male volunteer single escalating

dose study of the pharmacokinetics and pharmacodynamics of risdiplam (RG7916,

RO7034067), a SMN2 splicing modifier. Br J Clin Pharmacol. doi: 10.1111/bcp.13786

Sugarman EA, Nagan N, Zhu H, et al (2012) Pan-ethnic carrier screening and prenatal diagnosis

for spinal muscular atrophy: clinical laboratory analysis of >72,400 specimens. Eur J Hum

Genet 20:27–32. doi: 10.1038/ejhg.2011.134

Sumner CJ, Wee CD, Warsing LC, et al (2009) Inhibition of myostatin does not ameliorate

disease features of severe spinal muscular atrophy mice. Hum Mol Genet 18:3145–3152.

doi: 10.1093/hmg/ddp253

Sun Y, Grimmler M, Schwarzer V, et al (2005) Molecular and functional analysis of intragenic

SMN1 mutations in patients with spinal muscular atrophy. Hum Mutat 25:64–71. doi:

10.1002/humu.20111

Swanger SA, Mattheyses AL, Gentry EG, Herskowitz JH (2015) ROCK1 and ROCK2 inhibition

alters dendritic spine morphology in hippocampal neurons. Cell Logist 5:e1133266. doi:

10.1080/21592799.2015.1133266

Swoboda KJ, Prior TW, Scott CB, et al (2005) Natural history of denervation in SMA: Relation

to age, SMN2 copy number, and function. Ann Neurol 57:704–712. doi: 10.1002/ana.20473

Swoboda KJ, Scott CB, Reyna SP, et al (2009) Phase II open label study of valproic acid in

spinal muscular atrophy. PLoS One 4:. doi: 10.1371/journal.pone.0005268

Szkandera J, Winder T, Stotz M, et al (2013) A common gene variant in PLS3 predicts colon

cancer recurrence in women. Tumor Biol 34:2183–2188. doi: 10.1007/s13277-013-0754-7

Talbot K, Ponting CP, Theodosiou AM, et al (1997) Missense mutation clustering in the survival

motor neuron gene: A role for a conserved tyrosine and glycine rich region of the protein in

RNA metabolism? Hum Mol Genet 6:497–500. doi: 10.1093/hmg/6.3.497

Tanno T, Takenaka S, Tsuyama S (2004) Expression and function of Slit1α, a novel alternative

splicing product for Slit1. J Biochem 136:575–581. doi: 10.1093/jb/mvh164

Theofilopoulos S, Griffiths WJ, Crick PJ, et al (2014) Cholestenoic acids regulate motor neuron

survival via liver X receptors. J Clin Invest 124:4829–4842. doi: 10.1172/JCI68506

Thirumalai V, Behrend RM, Birineni S, et al (2013) Preservation of VGLUT1 synapses on

ventral calbindin-immunoreactive interneurons and normal locomotor function in a mouse

model of spinal muscular atrophy. J Neurophysiol 109:702–10. doi: 10.1152/jn.00601.2012

Thompson H, Barker D, Camand O, Erskine L (2006) Slits contribute to the guidance of retinal

ganglion cell axons in the mammalian optic tract. Dev Biol 296:476–484. doi:

10.1016/j.ydbio.2006.06.017

Thompson TG, Didonato CJ, Simard LR, et al (1995) A novel cDNA detects homozygous

microdeletions in greater than 50% of type I spinal muscular atrophy patients. Nat Genet

9:56–62. doi: 10.1038/ng0195-56

Tisdale S, Lotti F, Saieva L, et al (2013) SMN Is Essential for the Biogenesis of U7 Small

Nuclear Ribonucleoprotein and 3’-End Formation of Histone mRNAs. Cell Rep 5:1187–95.

doi: 10.1016/j.celrep.2013.11.012

Tomsic J, He H, Akagi K, et al (2015) A germline mutation in SRRM2, a splicing factor gene, is

implicated in papillary thyroid carcinoma predisposition. Sci Rep 5:10566. doi:

10.1038/srep10566

Tsai MS, Chiu YT, Wang SH, et al (2006) Abolishing Trp53-dependent apoptosis does not

benefit spinal muscular atrophy model mice. Eur J Hum Genet 14:372–5. doi:

10.1038/sj.ejhg.5201556

Tsaousidou MK, Ouahchi K, Warner TT, et al (2008) Sequence Alterations within CYP7B1

Implicate Defective Cholesterol Homeostasis in Motor-Neuron Degeneration. Am J Hum

Genet 82:510–515. doi: 10.1016/j.ajhg.2007.10.001

Van Alstyne M, Lotti F, Dal Mas A, et al (2018) Stasimon/Tmem41b localizes to mitochondria-

associated ER membranes and is essential for mouse embryonic development. Biochem

Biophys Res Commun 506:463–470. doi: 10.1016/j.bbrc.2018.10.073

Veeriah S, Brennan C, Meng S, et al (2009) The tyrosine phosphatase PTPRD is a tumor

suppressor that is frequently inactivated and mutated in glioblastoma and other human

cancers

Velasco E, Valero C, Valero A, et al (1996) Molecular analysis of the SMN and NAIP genes in

Spanish spinal muscular atrophy (SMA) families and correlation between number of copies

of cBCD541 and SMA phenotype. Hum Mol Genet 5:257–263. doi: 10.1093/hmg/5.2.257

Vezain M, Saugier-Veber P, Goina E, et al (2010) A rare SMN2 variant in a previously

unrecognized composite splicing regulatory element induces exon 7 inclusion and reduces

the clinical severity of spinal muscular atrophy. Hum Mutat 31:1110–1125. doi:

10.1002/humu.21173

Vitali T, Sossi V, Tiziano FD, et al (1999) Detection of the survival motor neuron (SMN) genes

by FISH: further evidence for a role for SMN2 in the modulation of disease severity in

SMA patients. Hum Mol Genet 8:2525–2532

Wagner KR, McPherron AC, Winik N, Lee SJ (2002) Loss of myostatin attenuates severity of

muscular dystrophy in mdx mice. Ann Neurol 52:832–836. doi: 10.1002/ana.10385

Wan L, Battle DJ, Yong J, et al (2005) The Survival of Motor Neurons Protein Determines the

Capacity for snRNP Assembly: Biochemical Deficiency in Spinal Muscular Atrophy. doi:

10.1128/MCB.25.13.5543

Wang CC, Chang JG, Chen YL, et al (2010a) Multi-exon genotyping of SMN gene in spinal

muscular atrophy by universal fluorescent PCR and capillary electrophoresis.

Electrophoresis 31:2396–2404. doi: 10.1002/elps.201000124

Wang H, Zhang Y, Ozdamar B (2003) Regulation of Cell Polarity and Protrusion Formation by

Targeting RhoA for Degradation. 302:1775–1780

Wang K, Li M, Hakonarson H (2010) ANNOVAR: functional annotation of genetic variants

from high-throughput sequencing data. Nucleic Acids Res 38:e164. doi:

10.1093/nar/gkq603

Weber JL, Polymeropoulos MH, May PE, et al (1991) Mapping of human

microsatellite DNA polymorphisms. Genomics 11:695–700. doi:

10.1016/0888-7543(91)90077-R

Werdnig G (1891) Zwei frühinfantile hereditäre Fälle von progressiver Muskelatrophie unter

dem Bilde der Dystrophie, aber auf neurotischer Grundlage. Arch fur Psychiatr und

Nervenkrankheiten, Berlin 22:437–81

Werdnig G (1894) Die frühinfantile progressive spinale Amyotrophie. Arch fur Psychiatr und

Nervenkrankheiten, Berlin 26:707–44

Will CL, Lührmann R (2001) Spliceosomal UsnRNP biogenesis, structure and function. Curr

Opin Cell Biol 13:290–301

Will CL, Lührmann R (2005) Splicing of a rare class of introns by the U12-dependent

spliceosome. Biol Chem 386:713–24. doi: 10.1515/BC.2005.084

Wirth B, El-Agwany A, Baasner A, et al (1995) Mapping of the spinal muscular atrophy (SMA)

gene to a 750-kb interval flanked by two new microsatellites. Eur J Hum Genet 3:56–60.

doi: 10.1159/000472274

Wirth B, Herz M, Wetter A, et al (1999) Quantitative analysis of survival motor neuron copies:

identification of subtle SMN1 mutations in patients with spinal muscular atrophy, genotype-

phenotype correlation, and implications for genetic counseling. Am J Hum Genet 64:1340–

1356. doi: 10.1086/302369

Wirth B, Schmidt T, Hahnen E, et al (1997) De novo rearrangements found in 2% of index

patients with spinal muscular atrophy: mutational mechanisms, parental origin, mutation

rate, and implications for genetic counseling. Am J Hum Genet 61:1102–11. doi:

10.1086/301608

Wishart TM, Mutsaers CA, Riessland M, et al (2014) Dysregulation of ubiquitin homeostasis

and β-catenin signaling promote spinal muscular atrophy. J Clin Invest 124:1821–1834. doi:

10.1172/JCI71318

Workman E, Kolb SJ, Battle DJ (2012) Spliceosomal small nuclear ribonucleoprotein biogenesis

defects and motor neuron selectivity in spinal muscular atrophy. Brain Res 1462:93–9. doi:

10.1016/j.brainres.2012.02.051

Workman E, Saieva L, Carrel TL, et al (2009) A SMN missense mutation complements SMN2

restoring snRNPs and rescuing SMA mice. Hum Mol Genet 18:2215–29. doi:

10.1093/hmg/ddp157

Wright GJ, Leslie JD, Ariza-McNaughton L, Lewis J (2004) Delta proteins and MAGI proteins:

an interaction of Notch ligands with intracellular scaffolding molecules and its significance

for zebrafish development. Development 131:5659–69. doi: 10.1242/dev.01417

Wu X, Wang S-H, Sun J, et al (2017) A-44G transition in SMN2 intron 6 protects patients with

spinal muscular atrophy. Hum Mol Genet 26:2768–2780. doi: 10.1093/hmg/ddx166

Yanyan C, Yujin Q, Jinli B, et al (2014) Correlation of PLS3 expression with disease severity in

children with spinal muscular atrophy. J Hum Genet 59:24–7. doi: 10.1038/jhg.2013.111

Yeh ML, Gonda Y, Mommersteeg MTM, et al (2014) Robo1 Modulates Proliferation and

Neurogenesis in the Developing Neocortex. J Neurosci 34:5717–5731. doi:

10.1523/JNEUROSCI.4256-13.2014

Yi XN, Zheng LF, Zhang JW, et al (2006) Dynamic changes in Robo2 and Slit1 expression in

adult rat dorsal root ganglion and sciatic nerve after peripheral and central axonal injury.

Neurosci Res 56:314–321. doi: 10.1016/j.neures.2006.07.014

Yoon HK, Li ZJ, Choi D-K, et al (2014) Glucocorticoid receptor enhances involucrin expression

of keratinocyte in a ligand-independent manner. Mol Cell Biochem 390:289–295. doi:

10.1007/s11010-014-1985-7

Young PJ, Man NT, Lorson CL, et al (2000) The exon 2b region of the spinal muscular atrophy

protein, SMN, is involved in self-association and SIP1 binding. Hum Mol Genet 9:2869–77.

doi: 10.1093/hmg/9.19.2869

Zerres K, Rudnik-Schoneborn S (1995) Natural history in proximal spinal muscular atrophy:

clinical analysis of 445 patients and suggestions for a modification of existing

classifications. Arch Neurol 52:

Zhang H, Xing L, Rossoll W, et al (2006) Multiprotein Complexes of the Survival of Motor

Neuron Protein SMN with Gemins Traffic to Neuronal Processes and Growth Cones of

Motor Neurons. J Neurosci 26:8622–8632. doi: 10.1523/JNEUROSCI.3967-05.2006

Zhang HY, Zheng LF, Yi XN, et al (2010) Slit1 promotes regenerative neurite outgrowth of

adult dorsal root ganglion neurons in vitro via binding to the Robo receptor. J Chem

Neuroanat 39:256–261. doi: 10.1016/j.jchemneu.2010.02.001

Zhang Z, Lotti F, Dittmar K, et al (2008) SMN deficiency causes tissue-specific perturbations in

the repertoire of snRNAs and widespread defects in splicing. Cell 133:585–600. doi:

10.1016/j.cell.2008.03.031

Zhang Z, Pinto A, Wan L, et al (2013) Dysregulation of synaptogenesis genes antecedes motor

neuron pathology in spinal muscular atrophy. Proc … 110:1–6. doi:

10.1073/pnas.1319280110

Zheleznyakova GY, Nilsson EK, Kiselev AV, et al (2015) Methylation levels of SLC23A2 and

NCOR2 genes correlate with spinal muscular atrophy severity. PLoS One 10:1–14. doi:

10.1371/journal.pone.0121964

Zheleznyakova GY, Voisin S, Kiselev AV, et al (2013) Genome-wide analysis shows

association of epigenetic changes in regulators of Rab and Rho GTPases with spinal

muscular atrophy severity. Eur J Hum Genet 21:988–93. doi: 10.1038/ejhg.2012.293

Appendix A: PLS3 SNPs, Females

Table A.1: Allele counts of PLS3 SNPs, Females chrX hg19 Normal Mild Severe Adj P-Value Ref Alt Ref Alt Ref Alt Position Ref Alt Alleles Alleles Alleles Alleles Alleles Alleles Mild Severe 114734068 T C 87 1 65 1 14 0 1.000 1.000 114734111 G A 47 41 37 29 7 7 1.000 1.000 114734220 T C 87 1 65 1 14 0 1.000 1.000 114734718 G A 47 41 30 36 10 4 0.675 0.675 114734727 T C 88 0 65 1 14 0 0.675 0.675 114734731 T A 88 0 65 1 12 2 1.000 1.000 114735479 C T 88 0 64 2 14 0 0.675 0.675 114736614 C T 49 39 31 35 10 4 0.675 0.675 114736951 G T 79 9 61 5 12 2 0.899 0.899 114737436 A G 79 9 61 5 12 2 0.899 0.899 208 114738128 T C 87 1 62 2 14 0 0.862 0.862

114738319 T A 68 20 41 25 11 3 0.675 0.675 114738387 A G 47 41 29 37 10 4 0.675 0.675 114738436 A G 79 9 62 4 13 1 0.867 0.867 114738701 C T 68 20 45 21 10 4 0.675 0.675 114738837 G T 79 9 61 5 13 1 1.000 1.000 114738916 T C 87 1 65 1 14 0 1.000 1.000 114739063 T C 46 42 31 35 10 4 0.675 0.675 114739272 C G 46 42 31 35 9 5 0.712 0.712 114739498 C A 87 1 65 1 14 0 1.000 1.000 114739632 A T 66 22 45 21 10 4 0.675 0.675 114739684 T G 45 43 28 38 10 4 0.675 0.675 114740834 G A 79 9 61 5 11 1 1.000 1.000 114740938 A T 49 39 28 36 8 4 0.675 0.675 Continued

114741064 C T 81 7 61 5 12 2 1.000 1.000
114741270 T A 86 2 66 0 12 2 0.675 0.675
114741345 G A 87 1 65 1 14 0 1.000 1.000
114741485 A G 39 41 25 41 8 6 0.675 0.675
114741578 A G 88 0 65 1 14 0 0.675 0.675
114741788 G T 47 41 30 36 10 4 0.675 0.675
114741907 G A 86 2 64 2 14 0 0.946 0.946
114741937 G A 87 1 65 1 14 0 1.000 1.000
114742093 G A 41 47 24 42 8 6 0.675 0.675
114742104 A C 42 46 24 42 8 6 0.675 0.675
114742183 G A 69 19 55 11 14 0 1.000 1.000
114742194 A G 81 7 60 6 12 2 1.000 1.000
114742278 C T 46 42 31 35 8 6 0.829 0.829
114742390 C T 47 41 30 34 10 4 0.675 0.675
114742466 A C 45 41 31 35 10 4 0.675 0.675
114742705 C T 88 0 65 1 14 0 0.675 0.675
114742798 C G 87 1 65 1 14 0 1.000 1.000
114742863 T C 41 47 37 29 4 10 0.675 0.675
114743006 T C 88 0 65 1 14 0 0.675 0.675
114743131 G A 52 36 41 25 7 7 0.943 0.943
114743310 T G 87 1 65 1 14 0 1.000 1.000
114743445 C T 88 0 65 1 14 0 0.675 0.675
114743600 C T 88 0 65 1 14 0 0.675 0.675
114743679 G T 86 2 63 3 14 0 0.675 0.675

114743762 G A 82 6 64 2 12 2 0.675 0.675
114743890 G A 41 45 37 27 4 10 0.675 0.675
114744108 T C 88 0 65 1 14 0 0.675 0.675
114744241 G A 70 18 43 23 9 5 0.675 0.675
114744256 A G 69 19 43 23 9 5 0.675 0.675
114744434 T C 21 67 13 53 0 14 1.000 1.000
114744618 T G 88 0 66 0 12 2 0.822 0.822
114744917 C T 87 1 65 1 14 0 1.000 1.000
114744988 A C 40 48 38 28 4 10 0.675 0.675
114744992 A G 39 49 38 28 4 10 0.675 0.675
114745305 A G 40 48 38 28 4 10 0.675 0.675
114745371 C A 88 0 65 0 14 0 1.000 1.000
114745624 A G 88 0 65 1 14 0 0.675 0.675
114745883 T C 40 48 40 26 4 10 0.675 0.675
114745907 T G 88 0 65 1 14 0 0.675 0.675
114747025 G C 39 49 38 28 4 10 0.675 0.675
114747085 T A 88 0 65 1 14 0 0.675 0.675
114747295 T C 88 0 65 1 14 0 0.675 0.675
114747963 T C 19 69 11 55 0 14 1.000 1.000
114747976 G C 88 0 65 1 14 0 0.675 0.675
114748204 C T 65 11 52 6 11 1 0.923 0.923
114748507 T C 46 42 41 25 4 10 0.675 0.675
114748853 A G 87 1 65 1 14 0 1.000 1.000
114749162 T C 19 69 12 54 0 14 1.000 1.000

114749228 C A 88 0 65 1 14 0 0.675 0.675
114749613 T C 81 7 61 5 12 2 1.000 1.000
114750081 C T 40 46 35 29 4 10 0.675 0.675
114751436 G A 82 6 64 2 14 0 0.782 0.782
114751520 C T 88 0 65 1 14 0 0.675 0.675
114751583 A T 88 0 65 1 14 0 0.675 0.675
114751852 G A 20 68 11 55 0 14 0.979 0.979
114751859 A G 88 0 65 1 14 0 0.675 0.675
114752402 G A 20 68 12 54 0 14 1.000 1.000
114752603 C G 88 0 65 1 14 0 0.675 0.675
114752775 C A 88 0 65 1 14 0 0.675 0.675
114753258 G T 88 0 64 2 14 0 0.675 0.675
114753347 C T 88 0 65 1 14 0 0.675 0.675
114754151 T C 71 15 42 18 11 3 0.675 0.675
114754826 A G 67 21 52 14 9 5 0.882 0.882
114754908 G A 87 1 63 3 14 0 0.675 0.675
114755257 T C 19 69 12 54 0 14 1.000 1.000
114755337 G C 71 17 40 26 8 4 0.675 0.675
114756756 A T 68 20 43 23 10 4 0.675 0.675
114757329 A G 71 17 43 23 10 4 0.675 0.675
114757568 T C 88 0 65 1 14 0 0.675 0.675
114757700 A T 71 17 44 22 10 4 0.675 0.675
114757845 A G 86 2 65 1 14 0 1.000 1.000
114757970 C T 69 19 43 21 9 5 0.675 0.675

114758031 C T 86 0 58 2 14 0 0.675 0.675
114758408 T A 86 2 65 1 14 0 1.000 1.000
114758683 T C 70 18 43 21 10 4 0.675 0.675
114758766 G C 88 0 65 1 14 0 0.675 0.675
114758813 G C 86 0 60 2 14 0 0.675 0.675
114758818 G C 67 21 38 28 10 4 0.675 0.675
114758823 G C 58 30 34 32 8 6 0.675 0.675
114758948 G C 87 1 65 1 14 0 1.000 1.000
114759084 G A 70 18 47 19 10 4 0.675 0.675
114759321 A C 19 69 13 53 0 14 1.000 1.000
114759569 A G 88 0 65 1 14 0 0.675 0.675
114759959 G A 88 0 65 1 14 0 0.675 0.675
114760628 G A 79 9 59 7 13 1 1.000 1.000
114760676 G T 72 16 58 8 11 3 0.675 0.675
114760904 C G 88 0 65 1 14 0 0.675 0.675
114761010 C T 71 17 55 11 11 3 0.979 0.979
114761322 T C 65 23 44 22 10 4 0.675 0.675
114761328 A T 49 39 33 33 6 8 0.944 0.944
114761520 T C 88 0 65 1 14 0 0.675 0.675
114761639 T C 60 28 45 21 7 7 1.000 1.000
114762020 G A 71 17 58 8 11 3 0.675 0.675
114762220 C G 72 16 58 8 11 3 0.675 0.675
114762350 C T 88 0 65 1 14 0 0.675 0.675
114762834 G T 64 24 39 27 10 4 0.675 0.675

114762879 A G 19 69 14 52 0 14 0.986 0.986
114762892 T A 85 3 63 3 13 1 1.000 1.000
114763066 C T 46 42 45 21 4 8 0.675 0.675
114763078 C T 48 40 45 21 5 9 0.675 0.675
114763288 A C 87 1 64 2 14 0 0.862 0.862
114763493 G A 88 0 65 1 14 0 0.675 0.675
114763744 A G 55 33 33 33 10 4 0.675 0.675
114763860 A G 88 0 65 1 14 0 0.675 0.675
114763975 G A 88 0 65 1 14 0 0.675 0.675
114764042 T C 88 0 65 1 14 0 0.675 0.675
114764123 G A 88 0 65 1 14 0 0.675 0.675
114764139 G C 55 33 38 28 10 4 0.822 0.822
114764479 G A 88 0 65 1 14 0 0.675 0.675
114764557 G C 34 54 25 41 6 8 1.000 1.000
114764654 A G 58 30 37 29 10 4 0.675 0.675
114764661 G T 33 55 21 45 7 7 0.692 0.692
114764778 T G 88 0 65 1 14 0 0.675 0.675
114764923 C T 58 30 38 28 10 4 0.675 0.675
114765388 A C 87 1 64 2 14 0 0.862 0.862
114765680 G A 61 27 40 26 11 3 0.675 0.675
114765727 T A 60 28 39 27 11 3 0.675 0.675
114765890 T A 88 0 65 1 14 0 0.675 0.675
114766295 A G 88 0 65 1 14 0 0.675 0.675
114766619 A C 85 3 65 1 14 0 1.000 1.000

114766672 A G 87 1 65 1 14 0 1.000 1.000
114766778 T G 84 4 63 3 14 0 1.000 1.000
114767865 T C 88 0 65 1 14 0 0.675 0.675
114768535 T G 44 44 36 30 10 4 1.000 1.000
114768705 T G 87 1 64 0 14 0 1.000 1.000
114768985 G A 88 0 62 2 14 0 0.675 0.675
114769142 T G 80 6 56 6 11 3 1.000 1.000
114769546 G A 50 38 41 25 10 4 1.000 1.000
114769889 G A 88 0 65 1 14 0 0.675 0.675
114770286 T C 88 0 65 1 14 0 0.675 0.675
114771287 G A 88 0 65 1 14 0 0.675 0.675
114771468 A G 40 48 32 34 12 2 1.000 1.000
114771589 T C 88 0 65 1 14 0 0.675 0.675
114771628 C G 88 0 65 1 14 0 0.675 0.675
114771643 G A 88 0 65 1 14 0 0.675 0.675
114771867 A G 42 46 36 30 10 4 1.000 1.000
114772192 C T 88 0 65 1 14 0 0.675 0.675
114772286 T C 87 1 66 0 14 0 1.000 1.000
114772603 A G 86 2 63 3 14 0 0.675 0.675
114773990 C T 51 37 41 25 10 4 1.000 1.000
114774580 G C 87 1 66 0 14 0 1.000 1.000
114774690 G A 88 0 65 1 14 0 0.675 0.675
114774920 C T 82 6 62 4 10 4 0.867 0.867
114775033 A G 44 44 34 32 10 4 1.000 1.000

114775105 G A 88 0 65 1 14 0 0.675 0.675
114775257 T C 87 1 65 1 14 0 1.000 1.000
114775399 C T 47 41 34 32 10 4 0.944 0.944
114775543 T C 87 1 65 1 14 0 1.000 1.000
114775619 T C 87 1 65 1 14 0 1.000 1.000
114775701 A G 87 1 65 1 14 0 1.000 1.000
114775895 G A 87 1 65 1 14 0 1.000 1.000
114776693 A C 39 49 34 32 10 4 1.000 1.000
114776700 T C 87 1 65 1 14 0 1.000 1.000
114777980 C T 87 1 64 2 11 3 1.000 1.000
114778030 C T 88 0 65 1 14 0 0.675 0.675
114778076 T G 87 1 63 1 14 0 1.000 1.000
114778108 A G 87 1 65 1 14 0 1.000 1.000
114778292 T C 88 0 65 1 14 0 0.675 0.675
114778293 G C 87 1 65 1 14 0 1.000 1.000
114778345 G A 88 0 65 1 14 0 0.675 0.675
114778700 C T 84 4 64 2 14 0 1.000 1.000
114778714 C T 87 1 65 1 14 0 1.000 1.000
114778905 T C 87 1 65 1 14 0 1.000 1.000
114779339 T C 45 43 28 36 3 11 1.000 1.000
114779367 C A 81 3 58 2 12 0 1.000 1.000
114779777 C T 40 48 35 31 10 4 0.944 0.944
114779881 C T 88 0 65 1 14 0 0.675 0.675
114780625 T C 88 0 65 1 14 0 0.675 0.675

114782794 G C 88 0 64 2 14 0 0.675 0.675
114782887 G A 88 0 65 1 14 0 0.675 0.675
114783313 A G 85 3 65 1 14 0 1.000 1.000
114783868 T C 87 1 66 0 13 1 0.822 0.822
114784126 G A 88 0 65 1 14 0 0.675 0.675
114784926 C G 45 43 35 31 10 4 1.000 1.000
114785281 T C 85 3 64 2 14 0 1.000 1.000
114788192 G A 88 0 65 1 14 0 0.675 0.675
114788247 T C 87 1 66 0 14 0 1.000 1.000
114788320 C A 87 1 65 1 14 0 1.000 1.000
114788344 A T 87 1 66 0 14 0 1.000 1.000
114788545 T C 85 3 63 3 14 0 0.975 0.975
114788577 C T 46 42 35 31 10 4 1.000 1.000
114788651 A G 88 0 65 1 14 0 0.675 0.675
114789039 A C 13 75 12 54 1 13 0.822 0.822
114789201 A T 88 0 63 1 14 0 0.675 0.675
114789624 C G 42 42 30 32 8 4 1.000 1.000
114789653 C T 86 0 64 0 11 1 1.000 1.000
114789877 G C 88 0 65 1 14 0 0.675 0.675
114789963 C A 85 3 63 3 14 0 0.975 0.975
114791638 C A 88 0 65 1 14 0 0.675 0.675
114791842 C G 47 41 32 34 4 10 1.000 1.000
114792387 C T 88 0 66 0 13 1 1.000 1.000
114792611 A G 88 0 65 1 14 0 0.675 0.675

114792915 A G 85 3 65 1 14 0 1.000 1.000
114793506 T C 88 0 65 1 14 0 0.675 0.675
114794513 C T 85 3 65 1 14 0 1.000 1.000
114795131 T A 88 0 65 1 14 0 0.675 0.675
114795175 G T 41 47 32 34 10 4 1.000 1.000
114795245 T C 88 0 65 1 14 0 0.675 0.675
114795361 C A 88 0 65 1 14 0 0.675 0.675
114795541 C G 38 44 32 30 4 8 0.698 0.698
114795746 T C 88 0 65 1 14 0 0.675 0.675
114795942 T G 88 0 65 1 14 0 0.675 0.675
114796094 C T 84 2 64 2 14 0 0.946 0.946
114796421 A G 40 46 29 37 9 3 0.826 0.826
114796464 C A 84 4 64 2 14 0 1.000 1.000
114796664 C T 88 0 65 1 14 0 0.675 0.675
114797049 A G 88 0 64 2 14 0 0.675 0.675
114798262 C T 88 0 65 1 14 0 0.675 0.675
114798913 A G 44 44 36 30 10 4 1.000 1.000
114799168 A G 88 0 65 1 14 0 0.675 0.675
114799818 C T 88 0 65 1 14 0 0.675 0.675
114800070 G A 87 1 65 1 14 0 1.000 1.000
114800505 A C 85 3 65 1 14 0 1.000 1.000
114800802 C T 88 0 65 1 12 0 0.681 0.681
114800846 C G 88 0 63 1 12 0 0.675 0.675
114801127 G A 83 3 59 1 12 0 1.000 1.000

114801161 T C 85 3 64 2 12 0 1.000 1.000
114801256 C G 88 0 65 1 14 0 0.675 0.675
114801291 G A 84 4 59 7 14 0 0.675 0.675
114801702 T A 88 0 65 1 14 0 0.675 0.675
114801783 C T 88 0 65 1 14 0 0.675 0.675
114802248 G A 86 2 65 1 14 0 1.000 1.000
114802905 T C 87 1 66 0 14 0 1.000 1.000
114803346 A C 84 0 60 0 14 0 1.000 1.000
114804206 A G 88 0 65 1 14 0 0.675 0.675
114804220 A G 87 1 62 4 11 3 1.000 1.000
114804286 T C 87 1 66 0 14 0 1.000 1.000
114804487 G A 88 0 65 1 14 0 0.675 0.675
114804607 T C 84 4 61 5 14 0 0.675 0.675
114804639 C A 87 1 66 0 14 0 1.000 1.000
114804656 T C 88 0 65 1 14 0 0.675 0.675
114805173 C T 88 0 63 1 14 0 0.675 0.675
114805539 G A 88 0 62 2 12 0 0.675 0.675
114805895 A G 87 1 63 3 14 0 0.675 0.675
114806173 T C 42 46 36 30 10 4 1.000 1.000
114806391 C T 10 78 12 54 1 13 0.675 0.675
114807518 A T 88 0 65 1 14 0 0.675 0.675
114807728 C T 88 0 65 1 14 0 0.675 0.675
114807744 A T 5 83 3 63 1 13 1.000 1.000
114808082 A G 84 4 61 5 14 0 0.675 0.675

114808166 T C 87 1 61 1 14 0 1.000 1.000
114809488 A G 88 0 65 1 14 0 0.675 0.675
114809533 C T 87 1 65 1 14 0 1.000 1.000
114809763 A T 87 1 65 1 14 0 1.000 1.000
114809845 G A 88 0 65 1 14 0 0.675 0.675
114810024 C G 87 1 65 1 14 0 1.000 1.000
114810038 G A 75 13 60 6 10 4 0.675 0.675
114810250 T G 88 0 65 1 14 0 0.675 0.675
114810879 T A 46 42 37 29 10 4 1.000 1.000
114811458 T C 88 0 65 1 14 0 0.675 0.675
114812362 G T 87 1 66 0 14 0 1.000 1.000
114812529 T C 88 0 65 1 14 0 0.675 0.675
114812987 C T 85 1 53 1 12 0 1.000 1.000
114813469 A T 88 0 65 1 14 0 0.675 0.675
114813514 G A 87 1 65 1 14 0 1.000 1.000
114816740 G A 88 0 65 1 14 0 0.675 0.675
114817796 A T 88 0 65 1 14 0 0.675 0.675
114817866 A T 88 0 65 1 14 0 0.675 0.675
114818016 A T 85 3 62 4 12 2 1.000 1.000
114818182 T C 85 3 62 4 14 0 0.712 0.712
114818187 T G 83 3 64 2 14 0 1.000 1.000
114819001 C G 88 0 65 1 14 0 0.675 0.675
114819264 C A 87 1 66 0 14 0 1.000 1.000
114819969 A G 88 0 65 1 14 0 0.675 0.675

114820350 G A 87 1 64 0 12 0 1.000 1.000
114820424 C G 86 0 60 2 13 1 0.862 0.862
114820481 C T 86 0 57 1 14 0 0.675 0.675
114821455 T C 88 0 65 1 14 0 0.675 0.675
114821491 C T 87 1 65 1 14 0 1.000 1.000
114821880 C T 88 0 66 0 13 1 1.000 1.000
114821970 T C 88 0 64 0 13 1 1.000 1.000
114821971 A C 88 0 66 0 13 1 1.000 1.000
114822368 A G 88 0 65 1 14 0 0.675 0.675
114822737 C T 51 37 38 28 9 5 1.000 1.000
114823153 T A 88 0 64 2 13 1 0.862 0.862
114823319 T A 88 0 65 1 14 0 0.675 0.675
114823587 A T 88 0 65 1 14 0 0.675 0.675
114824370 G T 88 0 63 3 14 0 0.675 0.675
114824405 T A 84 4 63 3 14 0 1.000 1.000
114824747 C T 88 0 65 1 14 0 0.675 0.675
114825308 G A 88 0 65 1 14 0 0.675 0.675
114825375 C T 87 1 63 3 11 3 1.000 1.000
114825645 A G 86 2 65 1 11 3 0.683 0.683
114825897 T G 86 2 65 1 14 0 1.000 1.000
114826772 C T 88 0 65 1 14 0 0.675 0.675
114827504 T G 88 0 65 1 14 0 0.675 0.675
114827827 C T 75 13 58 8 13 1 1.000 1.000
114828115 A C 88 0 65 1 14 0 0.675 0.675

114828615 C T 88 0 65 1 14 0 0.675 0.675
114829699 A G 87 1 65 1 14 0 1.000 1.000
114830101 A T 88 0 65 1 14 0 0.675 0.675
114830188 A T 88 0 65 1 14 0 0.675 0.675
114830284 T C 87 1 65 1 14 0 1.000 1.000
114830353 A T 88 0 65 1 14 0 0.675 0.675
114830720 T C 84 4 61 5 14 0 0.675 0.675
114831254 C T 86 0 61 1 14 0 0.675 0.675
114831359 A C 88 0 65 1 14 0 0.675 0.675
114831493 A G 88 0 65 1 14 0 0.675 0.675
114831640 T A 87 1 65 1 14 0 1.000 1.000
114831767 G A 62 26 45 21 12 2 0.907 0.907
114832030 C G 88 0 65 1 14 0 0.675 0.675
114832253 A G 88 0 65 1 14 0 0.675 0.675
114833645 C G 81 3 60 2 14 0 1.000 1.000
114834030 C T 82 6 58 2 13 1 0.786 0.786
114834739 T G 88 0 65 1 14 0 0.675 0.675
114835030 A G 88 0 65 1 14 0 0.675 0.675
114835571 T C 85 3 62 4 14 0 0.712 0.712
114836002 G C 88 0 65 1 14 0 0.675 0.675
114836977 T C 83 5 65 1 14 0 0.683 0.683
114837345 A G 84 4 64 2 14 0 1.000 1.000
114838192 C T 88 0 66 0 13 1 1.000 1.000
114838316 A T 88 0 65 1 14 0 0.675 0.675

114838881 G A 85 3 66 0 13 1 0.675 0.675
114839042 G A 87 1 65 1 14 0 1.000 1.000
114839070 C T 88 0 65 1 14 0 0.675 0.675
114839408 C A 88 0 65 1 14 0 0.675 0.675
114840025 C T 88 0 65 1 14 0 0.675 0.675
114840405 G A 84 0 61 3 14 0 0.675 0.675
114840738 C T 88 0 65 1 14 0 0.675 0.675
114840777 T C 88 0 65 1 14 0 0.675 0.675
114840892 A G 86 2 64 2 14 0 0.946 0.946
114841606 A C 88 0 65 1 14 0 0.675 0.675
114841914 G C 88 0 64 2 14 0 0.675 0.675
114842507 A G 88 0 65 1 14 0 0.675 0.675
114843517 T C 88 0 65 1 14 0 0.675 0.675
114843597 A T 88 0 65 1 14 0 0.675 0.675
114843798 T G 88 0 65 1 14 0 0.675 0.675
114843818 A T 88 0 66 0 12 2 0.822 0.822
114843871 G T 87 1 65 1 14 0 1.000 1.000
114844477 C T 88 0 65 1 14 0 0.675 0.675
114844912 G T 88 0 65 1 14 0 0.675 0.675
114844991 A G 88 0 65 1 14 0 0.675 0.675
114845660 C T 88 0 65 1 14 0 0.675 0.675
114845851 C G 83 5 65 1 14 0 0.683 0.683
114848059 C T 87 1 65 1 14 0 1.000 1.000
114848193 A T 88 0 65 1 14 0 0.675 0.675

114849009 T A 88 0 65 1 14 0 0.675 0.675
114849835 G C 88 0 63 3 14 0 0.675 0.675
114850237 G C 85 3 64 2 14 0 1.000 1.000
114850349 T G 88 0 65 1 14 0 0.675 0.675
114850431 G A 88 0 65 1 14 0 0.675 0.675
114850755 G A 88 0 65 1 14 0 0.675 0.675
114850774 G A 88 0 65 1 14 0 0.675 0.675
114853028 C T 88 0 66 0 13 1 1.000 1.000
114853107 G A 85 3 63 3 14 0 0.975 0.975
114853406 T C 88 0 65 1 14 0 0.675 0.675
114854121 T C 88 0 65 1 14 0 0.675 0.675
114854496 A G 87 1 65 1 14 0 1.000 1.000
114854870 G A 88 0 65 1 14 0 0.675 0.675
114855085 T C 52 26 40 14 7 3 0.740 0.740
114855622 C T 88 0 63 1 14 0 0.675 0.675
114855713 A G 87 1 65 1 14 0 1.000 1.000
114855726 A G 88 0 65 1 14 0 0.675 0.675
114855994 C A 88 0 65 1 14 0 0.675 0.675
114856322 A G 88 0 65 1 14 0 0.675 0.675
114856428 T A 88 0 65 1 14 0 0.675 0.675
114856479 A G 88 0 65 1 14 0 0.675 0.675
114856508 C A 88 0 65 1 14 0 0.675 0.675
114856903 C T 87 1 63 3 11 3 1.000 1.000
114857706 G A 88 0 64 2 14 0 0.675 0.675

114858181 C T 61 27 42 24 10 4 0.806 0.806
114858395 T C 88 0 59 7 12 2 0.675 0.675
114859099 C T 87 1 65 1 14 0 1.000 1.000
114859489 A G 83 5 64 2 13 1 0.782 0.782
114859833 A C 88 0 64 2 14 0 0.675 0.675
114860235 G C 88 0 65 1 14 0 0.675 0.675
114860604 T C 88 0 64 2 13 1 0.862 0.862
114860775 T C 87 1 65 1 14 0 1.000 1.000
114861094 T C 88 0 62 4 12 2 0.675 0.675
114861835 G A 83 5 63 3 14 0 1.000 1.000
114863593 T A 87 1 65 1 14 0 1.000 1.000
114863725 T A 88 0 63 3 11 3 0.975 0.975
114864250 C T 88 0 65 1 14 0 0.675 0.675
114864705 C G 88 0 60 6 11 3 0.675 0.675
114865620 T C 88 0 60 6 10 2 0.675 0.675
114865786 G A 88 0 61 5 12 2 0.675 0.675
114867907 G A 88 0 61 5 12 2 0.675 0.675
114868007 A G 88 0 65 1 14 0 0.675 0.675
114868213 T C 88 0 65 1 14 0 0.675 0.675
114868681 A G 88 0 65 1 14 0 0.675 0.675
114869487 A G 88 0 65 1 14 0 0.675 0.675
114869488 G A 85 3 62 4 11 3 1.000 1.000
114869790 C T 88 0 62 4 11 3 0.712 0.712
114869820 A G 88 0 61 5 12 2 0.675 0.675

114870450 A G 81 7 53 13 10 4 0.675 0.675
114870587 T C 83 3 47 9 11 1 0.675 0.675
114870613 G T 87 1 59 1 14 0 1.000 1.000
114871123 T C 86 2 65 1 11 3 0.683 0.683
114871655 T C 81 7 52 14 10 4 0.675 0.675
114871969 C G 81 7 52 14 10 4 0.675 0.675
114871989 A C 81 7 53 13 10 4 0.675 0.675
114872801 G A 81 7 53 13 10 4 0.675 0.675
114873055 T C 81 7 51 13 11 3 0.675 0.675
114873093 C T 82 4 65 1 14 0 0.946 0.946
114873185 T C 87 1 65 1 14 0 1.000 1.000
114873277 C T 81 7 53 13 11 3 0.675 0.675
114873543 G A 80 8 52 14 10 4 0.675 0.675
114874220 G A 81 7 52 14 10 4 0.675 0.675
114874386 C T 88 0 65 1 14 0 0.675 0.675
114874505 C T 81 7 53 13 10 4 0.675 0.675
114875371 G A 87 1 65 1 14 0 1.000 1.000
114875665 G A 88 0 63 1 14 0 0.675 0.675
114876061 C T 88 0 65 1 14 0 0.675 0.675
114876294 G A 88 0 65 1 14 0 0.675 0.675
114878056 G A 88 0 65 1 14 0 0.675 0.675
114878143 G T 88 0 65 1 14 0 0.675 0.675
114878206 T C 88 0 63 3 12 2 0.675 0.675
114878315 G A 88 0 62 4 11 3 0.712 0.712

114878395 T C 67 21 55 11 13 1 0.862 0.862
114878441 C T 88 0 61 5 11 3 0.675 0.675
114878636 C T 88 0 65 1 14 0 0.675 0.675
114879399 T C 88 0 61 5 12 2 0.675 0.675
114879869 A C 88 0 65 1 14 0 0.675 0.675
114879876 G A 88 0 61 5 11 3 0.675 0.675
114880073 C A 88 0 60 6 11 3 0.675 0.675
114880121 T C 88 0 65 1 14 0 0.675 0.675
114880423 T C 88 0 63 3 11 3 0.975 0.975
114880523 C T 88 0 60 6 11 3 0.675 0.675
114880587 G A 88 0 65 1 14 0 0.675 0.675
114880673 A G 88 0 65 1 14 0 0.675 0.675
114880937 T C 88 0 60 6 11 3 0.675 0.675
114881104 G A 83 5 63 3 14 0 1.000 1.000
114881493 G A 86 2 65 1 14 0 1.000 1.000
114881569 C T 88 0 65 1 14 0 0.675 0.675
114881669 T C 88 0 65 1 14 0 0.675 0.675
114881758 A C 88 0 65 1 14 0 0.675 0.675
114882820 T G 87 1 62 4 12 2 0.712 0.712
114882996 C T 87 1 61 5 12 2 0.675 0.675
114883194 T A 84 4 65 1 14 0 0.946 0.946
114883286 C T 87 1 60 6 11 3 0.675 0.675
114883451 C T 88 0 65 1 14 0 0.675 0.675
114885926 C T 88 0 65 1 14 0 0.675 0.675
114885960 G A 87 1 65 1 14 0 1.000 1.000

Appendix B: PLS3 indels, Females


Table B.1: Allele counts of PLS3 indels, Females
Columns: Position (chrX, hg19); Ref; Alt; Normal Ref alleles; Normal Alt alleles; Mild Ref alleles; Mild Alt alleles; Severe Ref alleles; Severe Alt alleles; Adj P-Value, Mild; Adj P-Value, Severe
114738112 CAA CA 75 15 57 7 12 3 1.000 1.000
114738112 CAA CAAA 75 6 57 4 12 1 1.000 1.000
114739858 AT ATT 52 44 38 31 6 10 1.000 1.000
114739858 AT A 52 0 38 1 6 0 1.000 1.000
114747223 GAAA GAA 33 39 24 33 6 7 1.000 1.000
114747223 GAAA G 33 0 24 1 6 0 1.000 1.000
114747223 GAAA GA 33 4 24 6 6 1 1.000 1.000
114747223 GAAA GAAAA 33 4 24 3 6 0 1.000 1.000
114748164 CTTTTTTTTTTT CT 91 1 65 1 14 0 1.000 1.000
114749344 TAAA TA 48 13 40 5 5 3 0.876 1.000
114749344 TAAA TAA 48 26 40 15 5 7 0.995 1.000
114749344 TAAA TAAAA 48 6 40 2 5 1 1.000 1.000
114749344 TAAA T 48 1 40 4 5 0 0.901 1.000
114749519 CAA CA 46 46 37 29 8 8 1.000 1.000
114749519 CAA CAAA 46 0 37 4 8 0 0.629 1.000
114749966 GAAA GA 31 19 30 13 7 6 1.000 1.000
114749966 GAAA GAA 31 42 30 27 7 3 1.000 1.000
114750287 CAAAAA CAAA 7 63 0 43 4 10 0.466 0.715
114750287 CAAAAA CAA 7 10 0 2 4 0 1.000 0.873
114750287 CAAAAA C 7 4 0 5 4 0 0.366 1.000
114751278 CT C 63 15 45 12 8 2 1.000 1.000
114751278 CT CTT 63 17 45 11 8 5 1.000 1.000
114751278 CT CTTT 63 1 45 0 8 1 1.000 1.000

114754239 TACACAC TACAC 69 11 47 9 9 2 1.000 1.000
114754239 TACACAC TAC 69 9 47 4 9 3 1.000 1.000
114755000 CAAAAAA CAA 1 46 1 37 0 10 1.000 1.000
114755000 CAAAAAA CAAA 1 24 1 11 0 3 1.000 1.000
114755000 CAAAAAA CA 1 6 1 12 0 3 1.000 1.000
114755000 CAAAAAA CAAAAA 1 5 1 4 0 0 1.000 1.000
114755000 CAAAAAA C 1 0 1 1 0 0 1.000 1.000
114755000 CAAAAAA CAAAA 1 6 1 4 0 0 1.000 1.000
114755588 CAAAAA CAAA 16 47 3 37 0 9 0.780 1.000
114755588 CAAAAA CAA 16 10 3 8 0 1 0.876 1.000
114755588 CAAAAA CAAAA 16 14 3 12 0 2 0.780 1.000
114755588 CAAAAA CAAAAAA 16 2 3 1 0 1 1.000 1.000
114755588 CAAAAA CA 16 3 3 2 0 1 1.000 1.000
114755588 CAAAAA C 16 2 3 1 0 1 1.000 1.000
114756070 GT GTT 20 20 25 16 6 2 1.000 1.000
114756070 GT GTTT 20 44 25 26 6 6 0.876 1.000
114756070 GT GTTTT 20 0 25 1 6 0 1.000 1.000
114758051 CA CAAA 67 9 51 6 10 3 1.000 1.000
114758051 CA C 67 4 51 6 10 1 1.000 1.000
114758051 CA CAA 67 12 51 1 10 0 0.629 1.000
114758051 CA CAAAA 67 2 51 0 10 2 0.876 0.873
114758142 AATAT AAT 22 42 13 44 1 8 1.000 1.000
114758142 AATAT A 22 1 13 0 1 2 1.000 0.537

114758142 AATAT AATATAT 22 15 13 9 1 5 1.000 1.000
114758194 AAC AACACACAC 16 42 8 30 1 9 1.000 1.000
114758194 AAC AACAC 16 13 8 5 1 5 1.000 1.000
114758194 AAC AACACAC 16 9 8 3 1 0 1.000 1.000
114758194 AAC AACACACACACACAC 16 2 8 6 1 0 0.773 1.000
114758194 AAC A 16 2 8 3 1 1 1.000 1.000
114758194 AAC AACACACACACAC 16 4 8 3 1 0 1.000 1.000
114760206 CAAAAAAA C 72 9 46 7 10 2 1.000 1.000
114760206 CAAAAAAA CA 72 1 46 5 10 2 0.876 1.000
114760206 CAAAAAAA CAA 72 2 46 2 10 0 1.000 1.000
114762619 ACTT AT 77 19 44 22 8 5 0.876 1.000
114762619 ACTT ATT 77 0 44 1 8 0 1.000 1.000
114762619 ACTT A 77 0 44 3 8 1 0.876 1.000
114762620 CTTTT CTTT 49 44 35 26 8 6 1.000 1.000
114762620 CTTTT CTT 49 2 35 1 8 0 1.000 1.000
114764434 AAAAT A 92 1 65 2 15 1 1.000 1.000

114764434 AAAAT AAAATAAATAAAT 92 3 65 3 15 0 1.000 1.000
114764718 TTGTG TTG 43 22 34 14 11 3 1.000 1.000
114764718 TTGTG TTGTGTG 43 24 34 20 11 2 1.000 1.000
114764718 TTGTG TTGTGTGTG 43 0 34 2 11 0 0.876 1.000
114768876 TTTTATTTATTTATTTA TTTTATTTATTTA 25 44 16 31 4 4 1.000 1.000
114768876 TTTTATTTATTTATTTA TTTTATTTATTTATTTATTTA 25 24 16 16 4 6 1.000 1.000
114768876 TTTTATTTATTTATTTA TTTTATTTATTTATTTATTTATTTA 25 3 16 3 4 0 1.000 1.000
114769095 GA GAA 44 42 35 24 7 8 1.000 1.000
114769095 GA GAAA 44 2 35 1 7 1 1.000 1.000
114769095 GA G 44 6 35 2 7 0 1.000 1.000
114769182 AATT AATTATT 43 51 33 35 7 9 1.000 1.000
114771373 CAAAA CAAA 53 2 39 2 10 0 1.000 1.000
114771373 CAAAA C 53 18 39 12 10 4 1.000 1.000
114771373 CAAAA CAA 53 10 39 10 10 1 1.000 1.000
114771373 CAAAA CA 53 9 39 5 10 1 1.000 1.000

114776488 CTTT C 64 4 47 1 9 1 1.000 1.000
114776488 CTTT CTT 64 18 47 12 9 4 1.000 1.000
114776488 CTTT CTTTT 64 3 47 2 9 0 1.000 1.000
114776488 CTTT CT 64 5 47 4 9 0 1.000 1.000
114781529 TA TAA 17 59 18 32 8 2 1.000 0.286
114781529 TA TAAA 17 3 18 2 8 0 1.000 1.000
114781529 TA T 17 3 18 4 8 2 1.000 1.000
114784096 TA T 87 6 59 5 14 2 1.000 1.000
114784096 TA TAA 87 3 59 6 14 0 0.876 1.000
114789615 T TTCCC 40 47 36 31 8 4 1.000 1.000
114789615 T TTCCCTCCC 40 7 36 1 8 0 0.876 1.000
114797163 GAA GA 95 1 70 0 16 0 1.000 1.000
114802031 CTTTT CTTT 59 14 49 8 9 4 1.000 1.000
114802031 CTTTT CTT 59 10 49 6 9 1 1.000 1.000
114802031 CTTTT C 59 0 49 1 9 0 1.000 1.000
114802031 CTTTT CT 59 7 49 2 9 0 1.000 1.000
114803357 CA CAAAAAAA 78 5 58 2 15 0 1.000 1.000
114803357 CA CAA 78 5 58 2 15 0 1.000 1.000
114803357 CA C 78 2 58 2 15 1 1.000 1.000
114804819 CTT C 69 3 51 2 14 0 1.000 1.000
114804819 CTT CT 69 13 51 9 14 0 1.000 1.000

114804819 CTT CTTT 69 11 51 2 14 2 0.876 1.000
114804819 CTT CTTTT 69 0 51 2 14 0 0.876 1.000
114805210 CTTT CTT 60 22 44 10 11 4 1.000 1.000
114805210 CTTT CTTTT 60 3 44 7 11 1 0.876 1.000
114805210 CTTT CT 60 5 44 4 11 0 1.000 1.000
114805210 CTTT C 60 0 44 1 11 0 1.000 1.000
114805830 GAAA GA 41 6 28 4 4 0 1.000 1.000
114805830 GAAA GAA 41 44 28 32 4 10 1.000 1.000
114807408 CAAAAAAA CAAAAAA 70 13 39 8 9 2 1.000 1.000
114807408 CAAAAAAA CAAAAAAAA 70 4 39 2 9 3 1.000 0.873
114807408 CAAAAAAA C 70 1 39 1 9 0 1.000 1.000
114812883 GA GAA 96 0 69 1 16 0 1.000 1.000
114812883 CAAA CAA 51 5 29 4 12 2 1.000 1.000
114820828 CAAA C 51 0 29 1 12 0 1.000 1.000
114820828 CA CAA 87 1 52 4 11 2 1.000 1.000
114820828 CA C 87 6 52 6 11 1 1.000 1.000
114820991 CA CAAA 60 0 44 2 8 0 0.876 1.000
114820991 CA CAA 60 13 44 5 8 1 1.000 1.000
114820991 CA C 60 19 44 19 8 5 1.000 1.000
114822046 CAA CA 64 17 47 15 10 3 1.000 1.000

114822046 CAA CAAA 64 11 47 3 10 1 1.000 1.000
114822046 CAA C 64 2 47 1 10 0 1.000 1.000
114828956 CCTCTCT CCTCT 87 7 64 4 16 0 1.000 1.000
114828956 CCTCTCT C 87 0 64 2 16 0 0.876 1.000
114828980 TCA T 76 1 67 3 14 2 1.000 1.000
114829999 CTT CTTT 68 7 55 3 12 2 1.000 1.000
114829999 CTT C 68 3 55 4 12 0 1.000 1.000
114829999 CTT CT 68 16 55 6 12 2 0.901 1.000
114832604 CAA CA 51 40 47 18 7 6 0.629 1.000
114832604 CAA CAAA 51 1 47 3 7 1 1.000 1.000
114833648 CAA CA 84 3 52 8 14 0 0.516 1.000
114833648 CAA C 84 1 52 4 14 0 0.780 1.000
114833832 CA C 59 11 47 6 13 3 1.000 1.000
114833832 CA CAA 59 6 47 1 13 0 1.000 1.000
114833899 CTT C 71 3 40 3 14 0 1.000 1.000
114833899 CTT CT 71 2 40 3 14 0 1.000 1.000
114834604 ATT A 54 1 44 1 10 0 1.000 1.000
114834604 ATT AT 54 24 44 19 10 5 1.000 1.000
114834604 ATT ATTT 54 9 44 4 10 1 1.000 1.000
114835773 TAA TA 47 40 35 30 9 7 1.000 1.000
114835773 TAA TAAA 47 2 35 1 9 0 1.000 1.000
114835773 TAA T 47 5 35 2 9 0 1.000 1.000

114837176 TAA TA 2 94 1 69 0 16 1.000 1.000
114841811 ATTT A 15 0 12 1 4 0 1.000 1.000
114841811 ATTT AT 15 27 12 21 4 2 1.000 1.000
114841811 ATTT ATT 15 54 12 36 4 8 1.000 1.000
114844799 CTTT CTT 65 26 58 8 7 9 0.268 0.290
114844799 CTTT CTTTT 65 2 58 4 7 0 1.000 1.000
114849799 CAGAGAGAG CAG 45 0 40 1 7 0 1.000 1.000
114849799 CAGAGAGAG CAGAGAG 45 33 40 20 7 4 1.000 1.000
114849799 CAGAGAGAG CAGAGAGAGAG 45 1 40 3 7 1 1.000 1.000
114849799 CAGAGAGAG CAGAGAGAGAGAG 45 3 40 1 7 1 1.000 1.000
114849799 CAGAGAGAG C 45 0 40 1 7 0 1.000 1.000
114849799 CAGAGAGAG CAGAG 45 4 40 2 7 1 1.000 1.000
114855018 CTTTTTTTTTTTT CTTTTTTTTTT 82 6 58 0 13 2 0.773 1.000
114855018 CTTTTTTTTTTTT C 82 0 58 0 13 1 1.000 1.000

114859905 TAA TA 89 2 62 2 8 2 1.000 0.873
114859905 TAA T 89 1 62 6 8 4 1.000 0.290
114860598 CTT CT 70 14 54 12 12 4 1.000 1.000
114860598 CTT CTTT 70 12 54 4 12 0 1.000 1.000
114865357 TTA TTATA 80 0 46 2 12 0 0.876 1.000
114867489 GTTT GTT 60 33 59 9 14 2 0.268 1.000
114867489 GTTT GT 60 1 59 2 14 0 1.000 1.000
114869962 CAAA CAA 65 8 39 2 9 1 1.000 1.000
114869962 CAAA C 65 1 39 1 9 0 1.000 1.000
114870517 CTTT CTT 55 16 52 8 11 0 1.000 1.000
114870517 CTTT CTTTT 55 11 52 4 11 2 0.876 1.000
114870517 CTTT CT 55 6 52 0 11 3 0.377 1.000
114875537 CTTTT CTTT 43 18 44 8 9 2 0.876 1.000
114875537 CTTTT CTTTTT 43 7 44 4 9 1 1.000 1.000
114875537 CTTTT C 43 2 44 1 9 1 1.000 1.000
114875537 CTTTT CT 43 5 44 3 9 2 1.000 1.000
114875537 CTTTT CTT 43 9 44 8 9 1 1.000 1.000
114876350 CTTT CT 59 14 42 11 7 4 1.000 1.000
114876350 CTTT CTTTT 59 6 42 9 7 3 1.000 1.000
114876350 CTTT C 59 2 42 4 7 0 1.000 1.000
114876350 CTTT CTT 59 13 42 4 7 2 0.979 1.000
114878911 AAAATAAATAAATAAAT AAAATAAAT 1 39 2 24 0 3 1.000 1.000

114878911 AAAATAAATAAATAAAT AAAATAAATAAAT 1 56 2 36 0 10 1.000 1.000
114878911 AAAATAAATAAATAAAT A 1 0 2 5 0 3 1.000 1.000
114878911 AAAATAAATAAATAAAT AAAAT 1 0 2 1 0 3 1.000 1.000

Appendix C: PLS3 SNPs, Males


Table C.1: Allele counts of PLS3 SNPs, Males
Columns: Position (chrX, hg19); Ref; Alt; Normal Ref alleles; Normal Alt alleles; Mild Ref alleles; Mild Alt alleles; Severe Ref alleles; Severe Alt alleles; Adj P-Value, Mild; Adj P-Value, Severe
114734068 T C 58 0 23 1 11 0 1.000 1.000
114734076 C T 58 0 23 1 11 0 1.000 1.000
114734111 G A 24 34 11 13 8 3 1.000 0.384
114734469 T G 58 0 23 1 11 0 1.000 1.000
114734718 G A 28 30 12 12 7 4 1.000 0.953
114735356 G A 58 0 24 0 9 1 1.000 0.384
114736614 C T 28 30 12 12 6 5 1.000 1.000
114738300 G A 58 0 24 0 9 1 1.000 0.384
114738319 T A 42 16 16 8 9 2 1.000 1.000
114738387 A G 28 30 10 14 5 5 1.000 1.000
114738436 A G 54 4 23 1 11 0 1.000 1.000
114738701 C T 41 17 17 7 9 2 1.000 1.000
114739063 T C 28 30 11 13 6 5 1.000 1.000
114739272 C G 28 30 12 12 6 4 1.000 1.000
114739632 A T 41 17 17 7 9 2 1.000 1.000
114739684 T G 28 30 10 14 6 5 1.000 1.000
114740834 G A 55 3 22 1 11 0 1.000 1.000
114740938 A T 28 30 9 14 6 5 1.000 1.000
114741136 C A 58 0 23 1 10 0 1.000 1.000
114741485 A G 25 33 10 14 5 6 1.000 1.000
114741788 G T 30 28 12 12 6 4 1.000 1.000
114742093 G A 24 34 9 15 7 3 1.000 0.384
114742104 A C 24 34 9 15 6 5 1.000 0.953
114742183 G A 48 10 20 4 8 3 1.000 0.857

114742278 C T 28 30 12 12 5 6 1.000 1.000
114742390 C T 28 30 11 13 4 7 1.000 0.953
114742466 A C 29 29 12 12 5 6 1.000 1.000
114742863 T C 34 24 14 10 5 6 1.000 0.953
114743131 G A 38 20 16 8 5 6 1.000 0.583
114743762 G A 54 4 23 0 9 2 1.000 0.470
114743890 G A 36 21 14 9 5 5 1.000 0.953
114743905 G A 58 0 22 1 11 0 1.000 1.000
114744241 G A 43 15 16 8 9 2 1.000 1.000
114744256 A G 43 15 17 7 9 2 1.000 1.000
114744434 T C 15 43 5 19 3 8 1.000 1.000
114744988 A C 33 25 14 10 4 5 1.000 0.953
114744992 A G 33 25 13 11 5 5 1.000 1.000
114745305 A G 33 25 14 10 5 6 1.000 0.953
114745371 C A 56 2 24 0 11 0 1.000 1.000
114745883 T C 33 25 14 10 6 5 1.000 1.000
114746960 C T 58 0 23 1 11 0 1.000 1.000
114747025 G C 33 25 14 10 5 6 1.000 0.953
114747963 T C 15 43 5 19 2 9 1.000 1.000
114748204 C T 49 3 15 3 8 0 1.000 1.000
114748507 T C 35 23 14 10 5 6 1.000 0.953
114749162 T C 13 45 5 19 3 8 1.000 1.000
114750081 C T 34 23 14 8 5 5 1.000 0.953
114751852 G A 15 43 5 19 3 8 1.000 1.000

114751863 A T 58 0 23 1 11 0 1.000 1.000
114752402 G A 13 45 3 21 3 8 1.000 1.000
114753057 T G 58 0 23 1 11 0 1.000 1.000
114754151 T C 40 16 16 6 9 1 1.000 0.909
114754574 C T 58 0 23 1 11 0 1.000 1.000
114754826 A G 47 11 18 6 7 4 1.000 0.682
114755257 T C 15 43 5 19 3 8 1.000 1.000
114755337 G C 40 18 16 8 9 2 1.000 0.953
114756756 A T 42 16 16 8 9 2 1.000 1.000
114757329 A G 43 15 16 8 9 2 1.000 1.000
114757700 A T 43 15 16 8 9 2 1.000 1.000
114757970 C T 43 15 15 8 8 2 1.000 1.000
114758683 T C 42 16 15 8 9 1 1.000 0.682
114758818 G C 40 18 15 8 9 1 1.000 0.682
114758823 G C 35 12 14 9 9 1 1.000 0.384
114759084 G A 42 16 16 8 10 1 1.000 0.682
114759321 A C 17 41 5 19 3 8 1.000 1.000
114760676 G T 48 10 21 3 9 2 1.000 1.000
114761010 C T 46 12 21 3 9 2 1.000 1.000
114761322 T C 41 17 15 8 8 3 1.000 1.000
114761328 A T 28 30 12 11 6 5 1.000 1.000
114761639 T C 45 13 17 7 7 4 1.000 0.935
114761640 G A 58 0 23 1 11 0 1.000 1.000
114762020 G A 47 11 21 3 9 2 1.000 1.000

114762220 C G 48 10 21 3 9 2 1.000 1.000
114762834 G T 38 20 16 8 9 1 1.000 0.504
114762879 A G 13 45 5 19 2 9 1.000 1.000
114763066 C T 36 22 14 9 5 5 1.000 0.953
114763078 C T 37 21 14 9 5 5 1.000 0.953
114763361 T C 58 0 24 0 10 1 1.000 0.384
114763744 A G 42 16 14 10 6 5 1.000 0.953
114763860 A G 58 0 23 1 11 0 1.000 1.000
114763975 G A 58 0 23 1 11 0 1.000 1.000
114764123 G A 58 0 23 1 11 0 1.000 1.000
114764139 G C 42 16 14 10 6 5 1.000 0.953
114764479 G A 58 0 23 1 9 0 1.000 1.000
114764482 G A 58 0 24 0 8 1 1.000 0.384
114764557 G C 25 33 8 16 2 8 1.000 0.682
114764654 A G 42 16 14 9 6 5 1.000 0.703
114764661 G T 22 36 8 16 3 8 1.000 1.000
114764778 T G 58 0 23 1 11 0 1.000 1.000
114764923 C T 43 15 15 9 6 5 1.000 0.682
114765146 T G 58 0 23 1 11 0 1.000 1.000
114765680 G A 43 15 15 9 5 5 1.000 0.682
114765727 T A 44 14 14 9 5 5 1.000 0.682
114766295 A G 58 0 23 1 11 0 1.000 1.000
114766619 A C 55 3 24 0 10 1 1.000 0.832
114767739 G A 58 0 24 0 10 1 1.000 0.384

114768535 T G 30 28 15 9 7 4 1.000 1.000
114769428 C G 58 0 24 0 10 1 1.000 0.384
114769522 T C 58 0 23 1 11 0 1.000 1.000
114769546 G A 32 26 16 8 6 5 1.000 1.000
114770408 A G 58 0 23 1 11 0 1.000 1.000
114770805 G A 58 0 24 0 10 1 1.000 0.384
114770820 T C 58 0 23 1 11 0 1.000 1.000
114771468 A G 27 31 14 10 5 6 1.000 1.000
114771589 T C 58 0 23 1 11 0 1.000 1.000
114771867 A G 30 28 15 9 6 5 1.000 1.000
114772192 C T 58 0 23 1 11 0 1.000 1.000
114773968 C G 58 0 24 0 10 1 1.000 0.384
114773990 C T 32 26 16 8 6 5 1.000 1.000
114774126 C T 58 0 24 0 10 1 1.000 0.384
114774920 C T 56 2 18 6 10 1 1.000 1.000
114775033 A G 29 29 16 8 5 6 1.000 1.000
114775133 T A 58 0 24 0 10 1 1.000 0.384
114775257 T C 58 0 24 0 9 1 1.000 0.384
114775399 C T 32 26 16 8 6 4 1.000 1.000
114775543 T C 58 0 24 0 10 1 1.000 0.384
114775701 A G 58 0 24 0 10 1 1.000 0.384
114775895 G A 58 0 24 0 10 1 1.000 0.384
114776466 G C 58 0 23 0 9 1 1.000 0.384
114776473 G T 58 0 22 0 9 1 1.000 0.384

114776693 A C 27 31 12 10 4 7 1.000 0.953
114776700 T C 58 0 22 0 9 1 1.000 0.384
114776703 A G 58 0 22 0 10 1 1.000 0.384
114776735 G A 58 0 23 0 10 1 1.000 0.384
114776944 G A 58 0 23 0 8 1 1.000 0.384
114777684 G T 58 0 24 0 10 1 1.000 0.384
114777980 C T 57 1 24 0 10 1 1.000 0.609
114778076 T G 58 0 22 0 9 1 1.000 0.384
114778293 G C 58 0 24 0 9 1 1.000 0.384
114778345 G A 58 0 24 0 9 1 1.000 0.384
114778700 C T 57 1 23 1 11 0 1.000 1.000
114778905 T C 58 0 24 0 10 1 1.000 0.384
114779339 T C 29 29 10 13 4 6 1.000 1.000
114779611 C G 58 0 24 0 9 1 1.000 0.384
114779777 C T 27 31 15 9 4 5 1.000 1.000
114780234 C G 58 0 24 0 10 1 1.000 0.384
114780708 C T 58 0 24 0 10 1 1.000 0.384
114780757 T G 58 0 24 0 10 1 1.000 0.384
114784431 T A 58 0 24 0 9 1 1.000 0.384
114784683 C G 58 0 24 0 10 1 1.000 0.384
114784926 C G 31 27 15 9 6 5 1.000 1.000
114785276 A T 58 0 24 0 10 1 1.000 0.384
114785445 G T 58 0 24 0 10 1 1.000 0.384
114785447 T A 58 0 24 0 10 1 1.000 0.384

114785634 C T 57 1 23 0 8 1 1.000 0.569
114786633 T C 58 0 24 0 10 1 1.000 0.384
114786676 G A 58 0 24 0 10 1 1.000 0.384
114788442 T C 58 0 24 0 10 1 1.000 0.384
114788451 T C 58 0 24 0 10 1 1.000 0.384
114788545 T C 52 6 23 1 11 0 1.000 1.000
114788577 C T 31 27 16 8 6 5 1.000 1.000
114789039 A C 12 46 3 20 0 11 1.000 0.583
114789624 C G 30 25 15 6 5 4 1.000 1.000
114789963 C A 56 2 23 1 11 0 1.000 1.000
114790399 C A 58 0 23 0 9 1 1.000 0.384
114791288 C A 58 0 24 0 10 1 1.000 0.384
114791451 C G 58 0 24 0 10 1 1.000 0.384
114791842 C G 31 26 9 15 8 3 1.000 0.583
114792915 A G 56 2 21 3 11 0 1.000 1.000
114794449 C T 58 0 24 0 10 1 1.000 0.384
114795175 G T 27 31 14 10 5 6 1.000 1.000
114795541 C G 26 26 12 7 5 5 1.000 1.000
114795955 C T 58 0 24 0 9 1 1.000 0.384
114796421 A G 21 29 13 9 5 4 1.000 1.000
114796464 C A 50 1 22 1 9 0 1.000 1.000
114798913 A G 30 28 15 9 6 5 1.000 1.000
114800522 T C 58 0 24 0 9 2 1.000 0.384
114801127 G A 49 4 19 0 8 1 1.000 0.917

114801604 T C 58 0 24 0 10 1 1.000 0.384
114802248 G A 56 2 21 3 10 0 1.000 1.000
114802787 C T 58 0 24 0 10 1 1.000 0.384
114804220 A G 56 2 24 0 10 1 1.000 0.682
114804543 G A 58 0 23 0 9 2 1.000 0.384
114804607 T C 53 5 23 0 10 1 1.000 0.955
114805895 A G 55 3 24 0 10 1 1.000 0.832
114806173 T C 30 28 14 10 6 5 1.000 1.000
114806385 A T 58 0 23 1 11 0 1.000 1.000
114806391 C T 10 48 4 20 0 11 1.000 0.588
114806694 T A 58 0 23 1 10 0 1.000 1.000
114806935 T C 58 0 24 0 10 1 1.000 0.384
114807237 G A 58 0 22 1 10 0 1.000 1.000
114808082 A G 53 5 24 0 9 1 1.000 0.953
114808166 T C 56 2 24 0 9 1 1.000 0.682
114809533 C T 56 2 24 0 10 1 1.000 0.682
114809763 A T 57 1 24 0 10 1 1.000 0.609
114809904 G A 58 0 24 0 10 1 1.000 0.384
114810024 C G 58 0 23 1 10 0 1.000 1.000
114810038 G A 56 2 17 7 9 1 0.808 1.000
114810879 T A 32 26 16 8 6 5 1.000 1.000
114815621 T C 58 0 24 0 10 1 1.000 0.384
114818182 T C 53 5 24 0 10 1 1.000 0.953
114818187 T G 53 5 23 1 10 1 1.000 1.000

114818959 T C 58 0 24 0 10 1 1.000 0.384
114819969 A G 56 2 24 0 11 0 1.000 1.000
114821491 C T 56 2 24 0 10 1 1.000 0.682
114822737 C T 35 23 20 4 6 5 1.000 0.953
114825375 C T 57 1 22 2 10 0 1.000 1.000
114825509 A C 58 0 24 0 10 1 1.000 0.384
114825645 A G 58 0 22 2 11 0 1.000 1.000
114827198 C T 58 0 24 0 9 2 1.000 0.384
114827827 C T 48 10 21 3 9 2 1.000 1.000
114829007 A C 58 0 24 0 9 1 1.000 0.384
114829699 A G 56 2 24 0 10 1 1.000 0.682
114830063 A G 57 1 20 0 9 1 1.000 0.598
114830064 C G 57 1 20 0 9 1 1.000 0.598
114830069 T G 57 1 21 0 9 1 1.000 0.598
114830454 A T 56 2 24 0 10 1 1.000 0.682
114830720 T C 53 5 24 0 10 1 1.000 0.953
114831053 G C 58 0 24 0 9 1 1.000 0.384
114831640 T A 56 2 24 0 10 1 1.000 0.682
114831767 G A 35 23 17 7 8 3 1.000 1.000
114832251 C T 58 0 24 0 9 1 1.000 0.384
114833291 A G 58 0 24 0 10 1 1.000 0.384
114833645 C G 50 1 17 1 7 0 1.000 1.000
114834156 G T 58 0 24 0 9 1 1.000 0.384
114836920 T G 2 56 0 24 1 10 1.000 0.682

114837345 A G 57 1 22 2 11 0 1.000 1.000
114838881 G A 58 0 21 3 10 1 1.000 0.832
114839042 G A 56 2 24 0 10 1 1.000 0.682
114840027 C G 58 0 24 0 10 1 1.000 0.384
114840227 T C 56 2 24 0 10 1 1.000 0.682
114840892 A G 57 1 20 4 11 0 1.000 1.000
114840900 G A 58 0 24 0 10 1 1.000 0.384
114841914 G C 58 0 23 1 10 0 1.000 1.000
114842313 T C 2 56 0 24 1 10 1.000 0.682
114842350 G A 2 56 0 24 1 10 1.000 0.682
114842957 C T 58 0 24 0 10 1 1.000 0.384
114846716 T C 58 0 24 0 9 2 1.000 0.384
114847413 A C 56 2 24 0 10 1 1.000 0.682
114847566 T C 56 2 24 0 10 1 1.000 0.682
114848059 C T 56 2 24 0 10 1 1.000 0.682
114849925 A G 56 2 24 0 10 1 1.000 0.682
114850196 T C 58 0 24 0 10 1 1.000 0.384
114850237 G C 55 3 21 3 11 0 1.000 1.000
114850349 T G 58 0 24 0 10 1 1.000 0.384
114850414 G T 58 0 23 1 11 0 1.000 1.000
114850431 G A 58 0 24 0 10 1 1.000 0.384
114851887 C T 58 0 24 0 9 1 1.000 0.384
114852303 A G 58 0 24 0 10 1 1.000 0.384
114853107 G A 56 2 23 1 11 0 1.000 1.000

114853406 T C 58 0 24 0 10 1 1.000 0.384
114854121 T C 58 0 24 0 10 1 1.000 0.384
114855085 T C 26 22 7 6 5 2 1.000 0.917
114855257 C T 54 0 22 0 9 1 1.000 0.384
114855622 C T 58 0 24 0 10 1 1.000 0.384
114855726 A G 58 0 24 0 10 1 1.000 0.384
114855994 C A 58 0 24 0 9 1 1.000 0.384
114856322 A G 58 0 24 0 9 1 1.000 0.384
114856348 C T 58 0 23 1 10 0 1.000 1.000
114856479 A G 58 0 24 0 10 1 1.000 0.384
114856508 C A 58 0 24 0 10 1 1.000 0.384
114856903 C T 57 1 23 1 10 1 1.000 0.682
114858181 C T 38 20 19 5 5 6 1.000 0.521
114858395 T C 57 1 22 1 8 1 1.000 0.682
114860534 C A 58 0 23 1 11 0 1.000 1.000
114860576 C G 58 0 24 0 10 1 1.000 0.384
114860875 T C 58 0 24 0 10 1 1.000 0.384
114861094 T C 58 0 23 1 11 0 1.000 1.000
114861636 C T 58 0 24 0 10 1 1.000 0.384
114861835 G A 57 1 22 2 11 0 1.000 1.000
114862266 A G 58 0 24 0 10 1 1.000 0.384
114863650 T C 58 0 24 0 10 1 1.000 0.384
114864364 G A 58 0 24 0 10 1 1.000 0.384
114864705 C G 57 1 23 1 10 1 1.000 0.682

114865620 T C 58 0 23 1 10 0 1.000 1.000
114865786 G A 58 0 23 1 11 0 1.000 1.000
114866258 G A 58 0 24 0 10 1 1.000 0.384
114866391 C T 58 0 24 0 9 1 1.000 0.384
114867880 C G 58 0 24 0 10 1 1.000 0.384
114867907 G A 57 1 23 1 10 1 1.000 0.682
114869488 G A 57 1 23 1 11 0 1.000 1.000
114869545 T C 58 0 24 0 10 1 1.000 0.384
114869790 C T 58 0 23 1 11 0 1.000 1.000
114869820 A G 57 1 23 1 10 1 1.000 0.682
114870450 A G 56 2 21 1 8 2 1.000 0.384
114870587 T C 53 3 19 0 5 1 1.000 0.682
114871655 T C 55 3 22 2 8 3 1.000 0.384
114871764 C T 58 0 24 0 10 1 1.000 0.384
114871969 C G 55 3 22 2 9 2 1.000 0.569
114871989 A C 55 3 22 2 10 1 1.000 0.953
114872801 G A 55 3 21 3 8 3 1.000 0.384
114873055 T C 54 4 22 1 8 2 1.000 0.521
114873277 C T 54 4 21 3 10 1 1.000 1.000
114873543 G A 52 6 22 2 9 2 1.000 0.712
114874220 G A 54 4 22 2 8 3 1.000 0.384
114874505 C T 55 3 22 2 9 2 1.000 0.569
114877563 G C 58 0 24 0 10 1 1.000 0.384
114878206 T C 57 1 23 1 11 0 1.000 1.000

114878315 G A 57 1 23 1 11 0 1.000 1.000
114878395 T C 50 8 19 5 10 1 1.000 1.000
114878441 C T 57 1 23 1 11 0 1.000 1.000
114879399 T C 57 1 23 1 11 0 1.000 1.000
114879876 G A 57 1 23 1 11 0 1.000 1.000
114880073 C A 57 1 23 1 11 0 1.000 1.000
114880423 T C 57 1 23 1 11 0 1.000 1.000
114880523 C T 57 1 23 1 11 0 1.000 1.000
114880937 T C 57 1 23 1 11 0 1.000 1.000
114881104 G A 55 3 19 5 11 0 1.000 1.000
114881685 G A 57 1 23 1 11 0 1.000 1.000
114881700 C T 58 0 23 1 11 0 1.000 1.000
114882820 T G 57 1 23 1 11 0 1.000 1.000
114882996 C T 57 1 23 1 11 0 1.000 1.000
114883194 T A 58 0 24 0 9 2 1.000 0.384
114883286 C T 57 1 23 1 11 0 1.000 1.000
114885960 G A 58 0 23 1 11 0 1.000 1.000

Appendix D: PLS3 indels, Males


Table D.1: Allele counts of PLS3 indels, Males
Columns: Position (chrX, hg19); Ref; Alt; Normal Ref Alleles; Normal Alt Alleles; Mild Ref Alleles; Mild Alt Alleles; Severe Ref Alleles; Severe Alt Alleles; Adj P-Value (Mild); Adj P-Value (Severe)
114738112 CAA CA 53 1 20 3 6 0 0.859 1.000
114739858 AT ATT 23 28 6 16 2 2 1.000 1.000
114747223 GAAA GAA 23 0 6 1 2 1 1.000 1.000
114747223 GAAA GA 28 23 16 6 2 2 1.000 1.000
114749344 TAAA TA 25 2 12 1 6 0 1.000 1.000
114749344 TAAA TAA 25 15 12 2 6 0 1.000 1.000
114749344 TAAA TAAAA 25 3 12 4 6 0 1.000 1.000
114749519 CAA CA 29 22 17 5 5 1 1.000 1.000
114749966 GAAA GA 29 3 12 1 2 0 1.000 1.000
114749966 GAAA GAA 29 22 12 10 2 4 1.000 1.000
114751278 CT C 49 1 18 1 5 0 1.000 1.000
114751278 CT CTT 49 4 18 2 5 1 1.000 1.000
114754239 TACACAC TACAC 33 11 18 1 5 1 1.000 1.000
114754239 TACACAC TAC 33 4 18 1 5 0 1.000 1.000
114755000 CAAAAAA CAA 0 38 0 17 0 5 1.000 1.000
114755000 CAAAAAA CAAA 0 7 0 3 0 0 1.000 1.000
114755000 CAAAAAA C 0 0 0 0 0 1 1.000 1.000
114755000 CAAAAAA CAAAA 0 3 0 1 0 0 1.000 1.000
114755588 CAAAAA CAAA 0 32 0 16 0 3 1.000 1.000

114755588 CAAAAA CAA 0 3 0 2 0 0 1.000 1.000
114756070 GT GTT 19 4 9 2 1 0 1.000 1.000
114756070 GT GTTT 19 30 9 12 1 5 1.000 1.000
114758051 CA CAAA 41 3 15 2 3 0 1.000 1.000
114758051 CA CAAAA 41 0 15 0 3 1 1.000 1.000
114758142 AATAT AAT 12 33 1 12 1 2 1.000 1.000
114758142 AATAT AATATAT 12 3 1 4 1 1 0.629 1.000
114758194 AAC AACACACAC 12 23 0 11 1 2 0.580 1.000
114758194 AAC AACAC 12 3 0 3 1 0 0.580 1.000
114758194 AAC AACACACACACACAC 12 4 0 4 1 1 0.580 1.000
114760206 CAAAAAAA CA 36 3 12 1 4 0 1.000 1.000
114762619 ACTT AT 34 17 15 7 5 1 1.000 1.000
114762620 CTTTT CTTT 28 20 12 9 4 1 1.000 1.000
114764718 TTGTGTG TTGTG 16 23 4 16 1 4 1.000 1.000
114764718 TTGTGTG TTG 16 13 4 3 1 0 1.000 1.000
114764718 TTGTGTG TTGTGTGTG 16 2 4 0 1 1 1.000 1.000
114768876 TTTTATTTATTTATTTATTTA TTTTATTTATTTATTTA 8 22 4 11 0 2 1.000 1.000

114768876 TTTTATTTATTTATTTATTTA TTTTATTTATTTA 8 17 4 6 0 2 1.000 1.000
114768876 TTTTATTTATTTATTTATTTA T 8 0 4 0 0 1 1.000 1.000
114769095 GAA GA 15 39 7 14 0 4 1.000 1.000
114769095 GAA G 15 0 7 0 0 1 1.000 1.000
114769182 AATT AATTATT 32 22 10 9 5 0 1.000 1.000
114771373 CAAAA CAAA 4 37 1 16 1 3 1.000 1.000
114771373 CAAAA CAA 4 7 1 4 1 1 1.000 1.000
114776488 CTTT CTT 41 12 17 2 4 2 1.000 1.000
114776488 CTTT CT 41 0 17 1 4 0 1.000 1.000
114781529 TA TAA 10 28 4 11 0 2 1.000 1.000
114784096 TA T 51 2 23 0 5 1 1.000 1.000
114789615 T TTCCC 18 28 9 7 1 3 1.000 1.000
114789615 T TTCCCTCCC 18 3 9 4 1 1 1.000 1.000
114802031 CTTTT CTTT 43 4 14 4 6 0 1.000 1.000
114802031 CTTTT CTT 43 2 14 1 6 0 1.000 1.000
114802031 CTTTT CT 43 0 14 1 6 0 1.000 1.000
114803357 CA CAAAAAAA 47 1 18 2 4 1 1.000 1.000
114803357 CA C 47 1 18 1 4 0 1.000 1.000
114804819 CTT CTTT 37 3 19 2 6 0 1.000 1.000
114805210 CTTT CTT 40 3 17 1 3 1 1.000 1.000

114805210 CTTT CTTTT 40 1 17 1 3 0 1.000 1.000
114805830 GAA GAAA 24 28 10 11 1 5 1.000 1.000
114805830 GAA GA 24 2 10 1 1 0 1.000 1.000
114820828 CA C 45 1 11 3 4 0 0.580 1.000
114822046 CAA CA 47 4 14 3 5 1 1.000 1.000
114822046 CAA CAAA 47 1 14 2 5 0 1.000 1.000
114828956 CCTCTCT CCTCT 52 0 23 0 5 1 1.000 1.000
114828980 TCA T 51 0 21 1 5 0 1.000 1.000
114829999 CTT CTTT 43 4 19 1 5 0 1.000 1.000
114829999 CTT C 43 2 19 1 5 0 1.000 1.000
114829999 CTT CT 43 4 19 1 5 0 1.000 1.000
114832604 CAA CA 52 1 21 1 5 1 1.000 1.000
114832604 CAA CAAA 52 1 21 1 5 0 1.000 1.000
114833648 CAA CA 31 8 12 4 5 0 1.000 1.000
114833899 CTT CT 35 1 8 1 4 0 1.000 1.000
114834604 ATT AT 53 1 18 4 6 0 0.580 1.000
114834604 ATT ATTT 53 0 18 1 6 0 1.000 1.000
114835773 TAA TA 28 26 13 8 4 1 1.000 1.000
114835773 TAA T 28 0 13 0 4 1 1.000 1.000
114837176 TAA TA 2 52 0 23 1 5 1.000 1.000
114841811 ATTT AT 3 4 3 4 0 1 1.000 1.000
114841811 ATTT ATT 3 46 3 13 0 5 1.000 1.000

114849799 CAGAGAGAG CAGAGAGAGAG 42 3 22 1 5 1 1.000 1.000
114859905 TAA TA 52 0 21 0 4 2 1.000 0.467
114859905 TAA T 52 1 21 1 4 0 1.000 1.000
114860598 CTT CT 54 0 22 1 6 0 1.000 1.000
114867489 GTTT GTT 39 14 17 5 6 0 1.000 1.000
114867489 GTTT GT 39 0 17 1 6 0 1.000 1.000
114870517 CTTT CTT 39 8 16 6 4 1 1.000 1.000
114875537 CTTTT CTTT 42 6 15 4 5 1 1.000 1.000
114875537 CTTTT CTTTTT 42 0 15 1 5 0 1.000 1.000
114875537 CTTTT CTT 42 2 15 1 5 0 1.000 1.000
114876350 CTTT CT 46 0 18 1 6 0 1.000 1.000
114876350 CTTT CTT 46 5 18 3 6 0 1.000 1.000
114878911 AAAATAAATAAATAAAT AAAATAAAT 0 10 1 6 0 5 1.000 1.000
114878911 AAAATAAATAAATAAAT AAAATAAATAAAT 0 42 1 13 0 1 1.000 1.000
114878911 AAAATAAATAAATAAAT A 0 1 1 1 0 0 1.000 1.000
114878911 AAAATAAATAAATAAAT AAAAT 0 0 1 1 0 0 1.000 1.000