(' '%'#($ )/)# ""($)%" (#(' '%'# $ %)%'"'%'## $ ' $$ $ *").%  $ $ +'( ).%"( $!  $"$                                 

        

                 %&'($), ) ) &'# (( %$%)  *").%  $%) $ +'( ). %"( $! %'&*" -# $) %$'#%)".'%#")*' "" %# *#%$)  %.)%0"%!  "( $!  $"$ 

Supervised by Associate Professor Henna Tyynismaa, Ph.D Research Programs Unit, Faculty of Medicine University of Helsinki

Docent Emil Ylikallio, MD, Ph.D Research Programs Unit, Faculty of Medicine University of Helsinki

Reviewed by Professor Johanna Uusimaa, MD, Ph.D Faculty of Medicine University of Oulu

Doctor Vihandha Wickramasinghe, Ph.D School of Biomedical Sciences University of Melbourne

Opponent Professor Marina Kennerson, Ph.D Faculty of Health and Medicine University of Sydney

Custos Associate Professor Henna Tyynismaa, Ph.D Research Programs Unit, Faculty of Medicine University of Helsinki

The Faculty of Medicine uses the Urkund system (plagiarism recognition) to examine all doctoral dissertations.

Cover image: Human neurons differentiated from induced pluripotent stem cells expressing MAP2 (green) and GANP (red).

ISBN 978-951-51-7217-4 (print) ISBN 978-951-51-7218-1 (online) ISSN 2342-3161 (print) ISSN 2342-317X (online)

Dissertationes Scholae Doctoralis Ad Sanitatem Investigandam Universitatis Helsinkiensis

Unigrafia Helsinki 2021

To my husband Dr. Michael Woldegebriel.

 

Rare diseases are a heterogenous group of disorders, which together affect a large number of people. These diseases typically have a genetic cause, which affects various cellular processes and pathways. One of the key processes in the cell is RNA metabolism, which is crucial for expression. Defects in RNA processing steps contribute to a variety of human diseases ranging from motor neuron diseases to cancer. Most rare diseases currently have no cure. In this dissertation, biallelic variants in Minichromosome maintenance complex 3 associated (MCM3AP) gene were found to underlie a neurological syndrome: early-onset peripheral neuropathy with/without intellectual disability. Through international collaboration, patients from multiple families and the disease variants in MCM3AP were identified. Germinal center associated nuclear protein (GANP), encoded by MCM3AP, is a large scaffold protein in the Transcription and Export 2 (TREX-2) complex, which is responsible for the transport of mRNAs from nucleus to the cytoplasm. The effects of different disease-causing variants on GANP abundance on the nuclear envelope were determined, which led to establishment of a genotype/phenotype correlation regarding the severity of the motor phenotype. This research showed that GANP regulates in a cell type specific manner. In patient fibroblasts, GANP was found to dysregulate depending on intron content. In addition, stem cell-based technology and genome editing was utilized to allow modeling of GANP defects in human cultured motor neurons. We found that GANP is a major regulator of gene expression in developing motor neurons, with GANP defects altering the expression of genes related to synaptic functions and resulting in compensatory induction of protein synthesis and mitochondrial pathways. Understanding of the causes and mechanisms of rare diseases is needed for improving diagnostics, therapeutic development and personalized treatments. In this dissertation, the foundation for understanding the mechanisms of MCM3AP-linked neurological syndrome has been established through identification of the causative gene and description of the disease phenotype. The different cellular models including patient-specific motor neurons used in this study have revealed molecular pathways that may be targets for treatment in preclinical trials.

4



Abstract ...... 4

Contents ...... 5

List of original publications ...... 9

Abbreviations ...... 10

1 Introduction ...... 14

2 Review of the literature ...... 16

2.1 Introduction to human genetics ...... 16

2.1.1 DNA ...... 16

2.1.2 organization ...... 16

2.1.3 Genes and other elements of the genome ...... 16

2.1.4 Genetic variation ...... 17

2.1.5 Mendelian inheritance of traits ...... 18

2.2 mRNA export is a key step in gene expression ...... 19

2.2.1 Gene expression ...... 19

2.2.2 mRNAs are exported through nuclear pores ...... 20

2.2.3 Pre-mRNA processing and splicing ...... 21

2.2.4 Macromolecular complexes in gene expression and export 22

2.2.5 mRNA export factor GANP ...... 24

2.2.6 Selectivity of mRNA export ...... 25

2.2.7 Export and expression of intronless genes ...... 26

2.2.8 Nucleocytoplasmic transport in neurological diseases 27

2.3 Charcot- Marie Tooth disease: from phenotypes to genetics and variant detection ...... 28

5

2.3.1 Introduction and etiology ...... 28

2.3.2 Modes of inheritance ...... 28

2.3.3 Classification ...... 29

2.3.4 Clinical features ...... 30

2.3.5 Genetic causes and molecular mechanisms ...... 31

2.3.6 Genetic testing ...... 34

2.3.7 Advances in the identification of disease-causing variants by next generation sequencing ...... 35

2.4 Stem cell-based modeling of diseases affecting motor neurons 36

2.4.1 Introduction and stem cell types ...... 36

2.4.2 Induced pluripotent stem cells ...... 37

2.4.3 Modeling neuronal disease in human neurons ...... 38

2.4.4 Assessing local translation in neurons ...... 39

2.4.5 Editing the genome with CRISPR-Cas9 for disease modeling 40

3 Aims of the study ...... 42

4 Materials and methods ...... 43

4.1 Collection of clinical data (I, II) ...... 43

4.2 Ethical permissions (I, II, III) ...... 43

4.3 Affected individual and control cells (I, II, III) ...... 43

4.4 Whole exome and genome sequencing (I, II) ...... 44

4.5 Fibroblast cell culture (I, II) ...... 44

4.6 RNA isolation and cDNA synthesis (I, II, III) ...... 44

4.7 Quantitative reverse transcription polymerase chain reaction (qRT-PCR) (I, II, III) ...... 45

6

4.8 Minigene assay to detect altered mRNA splicing in fibroblasts (I) ...... 45

4.9 Mixed neuronal cultures (I) ...... 45

4.10 DNA repair assay (I) ...... 45

4.11 Sodium dodecyl sulfate-polyacrylamide electrophoresis (SDS-PAGE) and western blotting (I, II, III) ...... 46

4.12 Immunocytochemistry (I, II, III) ...... 46

4.13 Microscopy (I, II, III) ...... 46

4.14 RNA in situ hybridization (II, III) ...... 47

4.15 Transmission electron microscopy (II) ...... 47

4.16 Molecular modeling of Sac3 region of GANP (II) ...... 47

4.17 RNA sequencing and bioinformatics (II, III) ...... 47

4.18 Culture of human induced pluripotent stem cells (hiPSC) (I, III) 48

4.19 Motor neuron differentiation (III) ...... 48

4.20 Axon isolation for RNA and protein (III) ...... 49

4.21 Nuclei isolation for RNA and protein (III) ...... 49

4.22 CRISPR-Cas9 gene editing (III) ...... 50

4.23 Statistics (I, II, III) ...... 50

5 Results and discussion ...... 51

5.1 MCM3AP variants underlie an early-onset neurological disease spectrum (I, II) ...... 51

5.1.1 Clinical features ...... 51

5.1.2 Genetic features ...... 53

5.2 Distinct effects on GANP by variant location (I-III) ...... 54

5.2.1 Defects in GANP correlate with severity of disease phenotype 54

7

5.3 MCM3AP variants have different effects on gene expression depending on cell type (II-III) ...... 55

5.3.1 Generation and validation of a human neuronal system for disease modeling ...... 55

5.3.2 MCM3AP variants remodel neuronal gene expression 56

5.3.3 GANP regulates expression of intronless genes ...... 57

5.3.4 MCM3AP defect alters nuclear mRNA composition in neurons 58

6 Conclusions and future directions ...... 60

7 Acknowledgements ...... 62

8 References ...... 64

9 Contributions ...... 82

9.1 Publication I ...... 82

9.2 Publication II ...... 82

9.3 Publication III ...... 82

10 Original publications ...... 84

8

   

This thesis is based on the following publications:

I Ylikallio E*, Woldegebriel R*, Tumiati M, Isohanni P, Ryan MM, Stark Z, Walsh M, Sawyer SL, Bell KM, Oshlack A, Lockhart PJ, Shcherbii M, Estrada-Cuzcano A, Atkinson D, Hartley T, Tetreault M, Cuppen I, van der Pol WL, Candayan A, Battaloglu E, Parman Y, van Gassen KLI, van den Boogaard MH, Boycott KM, Kauppi L, Jordanova A, Lönnqvist T, Tyynismaa H. - MCM3AP in recessive Charcot-Marie-Tooth neuropathy and mild intellectual disability. *Equal contribution. Brain 2017 140(8):2093-2103.

II Woldegebriel R, Kvist J, Andersson N, Õunap K, Reinson K, Wojcik M.H, Bijlsma E.K, Hoffer M.J.V, Ryan M.M, Stark Z, Walsh M, Cuppen I, van den Boogaard M-J.H, Bharucha-Goebel D, Donkervoort S, Winchester S, Zori R, Bönnemann C.G, Maroofian R, O’Connor E, Houlden H, Zhao F, Carpén O, White M, Sreedharan J, Stewart M, Ylikallio E, Tyynismaa H. Distinct effects on mRNA export factor GANP in neurological disease misregulate intronless genes. Human Molecular Genetics 2020 Jun 3; 29 (9): 1426-1439.

III Woldegebriel R, Kvist J, White M, Sinkko M, Hänninen S, Sainio M.T, Torregrosa-Munumer R, Harjuhaahto S, Carpen O, Bassett A, Ylikallio E, Sreedharan J, Tyynismaa H. Human motor neuron specific gene regulation by mRNA export factor GANP. Manuscript.

The publications are referred to in the text by their roman numerals.

9 Abbreviations    

aa Amino acid ANOVA Analysis of variance ATXN7L3 Ataxin 7-like 3 gene ATXN7L3B Ataxin 7-like 3 B gene ATP Adenosinetriphosphate BDNF Brain derived neurotrophic factor bp Basepair BCA Bicinchoninic acid assay BSA Bovine serum albumin CADD Combined Annotation Dependent Depletion Cas9 CRISPR associated protein 9 cDNA Complementary deoxyribonucleic acid CETN3 Centrin 3 CHAT Choline acetyltransferase CHX Cyclohexamide CMT Charcot-Marie Tooth disease C-Myc Myc proto-oncogene CNTF Ciliary neurotrophic factor CNV Copy number variant CRISPR Clustered regularly interspaced short palindromic repeats CRM1 region maintenance 1 gene crRNA CRISPR RNA C9orf72 Chromosome 9 open reading frame 72 DABP D-Box Binding PAR BZIP Transcription Factor DAPI 4′,6-diamidino-2-phenylindole DSS1 Exoribonuclease II, mitochondrial DE Differentially expressed DMEM Dulbecco’s modified eagle’s medium DMSO Dimethylsulfoxide DNA Deoxyribonucleic acid DUB Deubiquitinating DSB Double stranded break dsODN Double standed oligodenucleotide DSS1 Exoribonuclease II, mitochondrial EDTA Ethylenediaminetetraacetic acid EEG Electroencephalography ENY2 Transcription and mRNA export factor ENY2 ExAC Exome aggregation consortium FBS Fetal bovine serum FG Phenylalanine-glycine FMRP The Fragile X mental retardation protein FTD Frontotemporal dementia

10

GANP Germinal center associated nuclear protein GAPDH Glyceraldehyde-3-phosphate dehydrogenase GDNF Glial derived neurotrophic factor GEO Gene expression omnibus GO GSEA Gene set enrichment analysis GSK-3 Glycogen synthase kinase 3 gRNA Guide ribonucleic acid HAT Histone acetyltransferase HCT-116 Human colon cancer cell line HD Huntington’s disease HDR Homology directed repair HeLa Human cervical cancer cell line hESC Human embryonic stem cell hiPSC Human induced pluripotent stem cell HRP Horseradish peroxidase H2B Histone H2B H2Bub1 Histone H2B monoubiquitination ID Intellectual disability iPSC Induced pluripotent stem cell ISL1 Insulin gene enhancer protein KEGG Kyoto Encyclopedia of Genes and Genomes Klf4 Kruppel-like factor 4 LIG4 DNA Ligase 4 MAP2 Microtubule associated protein 3 MCM3 Minichromosome maintenance complex component 3 gene MCM3AP Minichromosome maintenance complex component 3 associated protein gene MFN2 Mitofusin-2 MN Motor neuron MNP Motor neuron progenitor MNX1 Motor neuron and pancreas homeobox 1 mRNA Messenger ribonucleic acid MRI Magnetic resonance imaging NCV Nerve conduction velocity NEFL Neurofilament light NEFM Neurofilament medium NEP Neuroepithelial progenitor NHEJ Non-homologous end joining NMD Nonsense mediated decay NPC Nuclear pore complex NXF1 Nuclear RNA Export Factor 1 Oct4 Octamer-binding transcription factor 4 PAM Protospacer adjacent motif PAP Poly A polymerase PBS Phosphate buffered saline PCA Principal component analysis

11 Abbreviations  PCID2 PCI domain-containing protein 2 PCR Polymerase chain reaction PD Parkinson’s disease PFA Paraformaldehyde Poly A Polyadenylated POMP Proteasome maturation protein PNRIID Peripheral neuropathy, autosomal recessive, with or without impaired intellectual development PNS Peripheral nervous system PPIB Peptidyl-prolyl cis-trans isomerase B PRIMA1 Proline-rich membrane anchor 1 qRT-PCR Quantitative reverse transcriptase polymerase chain reaction Rab7a Ras-related protein RBP RNA binding protein rDNAse Recombinant DNAse RIPA Radioimmunoprecipitation assay buffer RNA Ribonucleic acid rRNA Ribosomal RNA RNAPII RNA polymerase II RNP Ribonucleoprotein SAGA Spt-Ada-Gcn5 acetyltransferase Sac3 Suppressor of actin 3 SDS-PAGE Sodium lauryl sulfate-polyacrylamide gel electrophoresis Sem1 SEM1 26S Proteasome Complex Subunit sgRNA Single guide RNA SNP Single nucleotide polymorphism snRNA Small nucleolar RNA snRNP Small nucleolar ribonucleoprotein complex SMN1 Survival of motor neuron 1 SOX2 SRY (sex determining region Y)-box 2 Sp Streptococcus pyogenes ssODN Single-stranded oligodenucleotide TARDBP Tar DNA binding protein-43 gene TBST Tris-buffered saline-Tween-20 TCA Tricarboxylic acid (cycle) TDP-43 Tar DNA binding protein-43 TEM Transmission electron microscopy Tgf-β Transforming growth factor beta THOC5 THO complex subunit 5 Thp1 Nuclear mRNA export protein THP1 TPR Tetratricopeptide repeat tracRNA Trans-activating CRISPR RNA tRNA Transfer RNA TRAP Translating ribosome affinity purification TREX Transcription and export complex TSS Transcription start site TTS Transcription termination site

12

TUJ1 Neuron-specific class III beta-tubulin USP22 Ubiquitin specific peptidase 22 UTR Untranslated region WES Whole exome sequencing WGS Whole genome sequencing XPC Xeroderma pigmentosum, complementation group C 143B cells Human osteosarcoma cell line

Abbreviations are listed above, excluding less frequently mentioned gene names.

13 Introduction   

Rare diseases are conditions that affect less than 1 in 2000 individuals in the European population. However, they are a major cause of disease burden, as more than 6000 different rare diseases have been identified to date (Nguengang Wakap et al., 2020). Most of these disorders are of genetic origin and typically monogenic, meaning that pathogenic variants in a single gene cause the disease. Rare disease phenotypes range from neurological conditions to cancers and can be severe. In neurological disorders in particular, early- onset recessive genetic variants typically cause more severe disability from childhood than the dominant variants. Exceptions to these are de novo disease variants, which in some cases present with a severe clinical phenotype. Identification of the disease-causing genetic variants makes it possible to provide affected individuals with a molecular diagnosis. Molecular diagnosis allows for genetic counselling and may provide personalized treatment options. Currently, treatments are not available for most rare diseases, which calls for more research on disease causes and mechanisms. The rise of next generation DNA sequencing methods such as whole- genome and whole-exome sequencing has brought molecular diagnosis for an increasing number of affected individuals with rare monogenic diseases. In particular, sequencing of the coding regions or whole genomes of the affected child and parents (trio sequencing) have increased diagnostic efficiency. This method is particularly useful for detection and early treatment of congenital forms of neurological diseases. Previously, time-consuming linkage studies requiring a large number of affected individuals with the same disease were used to identify disease genes. The new technologies have enabled gene discoveries in diseases where only a small number of patients with identical phenotypes are recognized. These rare patients may be scattered around the world, which necessitates international collaboration between clinicians and researchers. Once a new candidate disease gene is identified in patients from multiple families, molecular studies in relevant cell types may offer insight into disease mechanisms and function as a preclinical model for testing personalized treatments. Patient-specific induced pluripotent stem cells have provided a source for meaningful model systems that can be used to study cell- type specific molecular mechanisms. Combined with genome editing for generation of isogenic cell lines, these new technologies could provide a faster route to therapy. 1079 monogenic neuromuscular diseases have been described (Bonne, Rivier, 2021). One of them is Charcot-Marie Tooth disease (CMT), which is a collective name for inherited forms of peripheral neuropathies. They are a group of genetically heterogenous diseases for which over 80 disease genes have been identified (Rossor, Evans & Reilly, 2015). CMT is characterized by

14

motor and sensory neuron axon degeneration or demyelination in the peripheral nervous system. It was named after three physicians who described the phenotype in 1886. The disease typically causes foot and hand abnormalities, progressive loss of movement and sensory signs. CMT has an overall prevalence of 1:2500 and is considered the most common inherited neuromuscular disorder (Rossor, Evans & Reilly, 2015), this is what Rossor et al. considered. Nevertheless, many individual CMT genes are the disease cause in only a handful of patients. Various molecular pathways from mitochondrial dysfunction to faulty RNA metabolism contribute to CMT pathogenicity. No cure currently exists for CMT. Despite the large number of known CMT genes, many affected individuals remain without molecular diagnosis highlighting the importance of further disease gene identification (Rossor et al., 2013). Understanding of the disease mechanisms is limited particularly due to lack of suitable model systems that allow investigation in relevant cell types. Therefore, human neurological model systems could aid in understanding disease mechanisms in vitro, which are currently lacking for many monogenic diseases. In this dissertation, a novel rare peripheral neuropathy syndrome and its disease gene MCM3AP are described and modelled in human motor neurons.

15 Review of the literature      

 1752(8'7-2172,80%1+)1)7-'6

  

Human genome consists of deoxyribonucleic acid (DNA), which contains the genetic information of the cells. DNA is a biochemical structure, which is 99.9 % identical across all humans. Variation in nucleotide sequence contributes to individual differences in health and disease. The DNA is a double helix, formed of a chain of complementary nitrogen-containing nucleotide basepairs (WATSON, CRICK, 1953). Adenine (A) pairs with thymine (T) and cytosine (C) pairs with guanine (G). In addition, DNA contains a deoxyribose sugar backbone and and a phosphate molecule. Double stranded DNA has a sense strand from 5’ to 3’ direction and antisense strand on the opposite direction. The human genome consists of approximately 3 billion basepairs of DNA. Genetic information of the nuclear genome is the same in each cell regardless of cell type, and it forms the basic instructions for guiding RNA and protein production.

 +#$ $%#%( $!0*!%$

DNA is packed into . Each chromosome has a short arm (p) and long arm (q), which contain the loci or locations of the genes. Human genome is diploid meaning that it consists of 22 autosomes and a pair of sex chromosomes, inherited from each parents. Females have two X- chromosomes, whereas males have one copy of X and Y chromosomes. Therefore, the genome consists of total 46 chromosomes, and of maternally inherited multi-copy mitochondrial DNA (Miller et al., 2003). Chromosomal DNA resides within the nuclear compartment of the cell, encoding most of the genes. The order of the chromosomes is called the karyotype.

 $)$%* ("#$*)%*  $%#

Human genome contains genes that code for , but actually protein coding regions (exons of genes) cover only 1.5% of the genome (ENCODE Project Consortium, 2012). The same gene can have one or more different forms, called alleles. Since 1990, when the Human Genome Project was started, 20 000 – 25 000 protein coding human genes have been mapped

16

(International Human Genome Sequencing Consortium, 2004). The goal of the Human Genome Project was to map the precise order of the nucleotide sequences in a large number of individuals. This formed a basis for understanding the protein coding regions, particularly interesting for disease studies, since variants causing inherited disorder cluster in exons. Further studies on gene function are based on understanding the precise order of the genome and form the basis for personalized medicine. Exons in genes are flanked by non-coding introns, and in addition the genome contains regulatory regions, non-coding RNAs, repetitive sequences, microsatellites and pseudogenes. The non-coding regions consists of for example promoter, enhancer, silencer and insulator elements, which regulate transcription (Maston, Evans & Green, 2006). Promoter regions are located at the beginning of genes, whereas enhancers are found in the beginning and end parts of the gene (Plank, Dean, 2014). Silencers are capable of repressing gene function and are located before or after the exons in the gene. Insulators can function in blocking enhancer function in a long range in the genome. The exact functions of many of these elements, in particular non-coding regions is still unknown and an active area of research. Previously, the non-coding genome was thought to be unimportant, but recently it has been shown to have potentially important functions in for example gene regulation.

 $*!,(!*!%$

Variant is a permanent difference in a specific site of a DNA sequence between two genomes. These can be either harmful to for example protein function of the gene or contribute to natural differences that do not have adverse consequences to the health of an individual. Single nucleotide polymorphisms (SNP) are variants of a single nucleotide. In the human genome, there are approximately 10 million SNPs (International HapMap Consortium, 2003), most of which is part of the natural variation between individuals. However, some variants can be pathogenic and contribute to human disease (Table 1). Point variants change a single nucleotide in a gene. These can alter an amino acid in exonic regions (missense). On the other hand, nonsense variants generate a premature stop codon, potentially resulting in truncated protein product. Finally, silent or synonymous variants do not cause a change in the amino acid. Amino acid changing missense variants in exons may disrupt a conserved residue (has stayed the same through evolution), in which case they may have harmful consequences to protein function. These are particularly detrimental in functionally important regions of the protein product. Deletions and insertions remove or add nucleotide bases to the DNA sequence. This may lead to loss of the reading frame (frameshift variant) of the gene when nucleotides added or removed are not in multiples of three and add amino acids when they are in-frame (Table 1). Another typical change is repeat number expansion, in which case several nucleotides are added as repeat

17 Review of the literature  sequences in the gene. The variants can be categorized based on how they affect protein product of the gene or affect other critical functions of the gene. Variants can be categorized into five categories that apply to the classification of a majority of Mendelian diseases (Richards et al., 2015). These are I) pathogenic, II) likely pathogenic, III) variants of uncertain significance, IV) likely benign and finally, V) benign variants.

 &*+5.8/=1.68<=,86687?*;2*7==B9.<

      %>+<=2=>=287 $.95*,.6.7= 2<<.7<. 6278*,2-,1*70.   !87<.7<. .7.;*=2878/<=89 ,8-87   %B787B68>< !8*6278*,2- ,1*70. .5.=2878; --;.68?. 7/;*6. --8;;.68?. 27<.;=287 7>,5.8=2-.<27 *6278*,2-< 6>5=295.<8/ =1;..   --8; ">=8//;*6. $.*-270/;*6. ;.68?. <12/=< 7>,5.8=2-.< 78=27 6>5=295.<8/ =1;..  >952,*=287 --87.8; ;*6.<12/= *B*5=.;9;8=.27 68;.=26.< />7,=287 =1.<*6. 92.,.8/! $.9.*= --<.?.;*5 $.9.*=270<.:>.7,. 87<.:>.7,.<87 7>6+.; 7>,5.8=2-.< 9;8=.27/>7,=287 .A9*7<287

 $"!$!$ (!*$%*(!*)

Gregor Mendel described in 1865 segregation of traits in plant populations that formed the foundations for understanding of inheritance in animal populations. These principles apply to allelic segregation in families with disease variants. Segregation means that two copies of a gene are randomly distributed to the offspring so that one copy is present in half of the offspring and the other in half. Genetic variants are located in either autosomes or the X-chromosome. These can be inherited in autosomal dominant (requiring one copy of the allele from one parent), autosomal recessive (requiring a copy of variant from each parent), X-linked dominant, X-linked recessive or maternal pattern of inheritance in case of mitochondrial DNA variants. Parents of affected offspring with recessive variant combinations are called carriers and

18

are typically healthy due to presence of only one copy of the variant. The disease variants can be heterozygous, which means that they are only on one allele. Homozygosity, on the other hand, indicates that the same variant is present on both alleles. The variants can also be compound heterozygous, in which case two different variants in different alleles together contribute to the disease pathogenicity.

  0 );3257-6%.)<67)3-1+)1));35)66-21

 $.&())!%$

Genetic information from nuclear genes to proteins are processed by transcribing genes into messenger RNA (mRNA). Nucleobases in RNA are adenine, cytosine, guanine, and uracil (U). Genes contain a transcriptional start site (TSS) at the 5’ end untranslated region (UTR). Most genes are polyadenylated (poly A) at the 3’ UTR followed by transcription termination site (TTS). UTR regions have several critical functions from affecting protein function, to ribosome attachment and regulation of mRNA half-life (Barrett, Fletcher & Wilton, 2012). RNA polymerases (RNAP) transcribe genes to different RNA types. Different RNA types have unique RNAPs, which control transcription. In eukaryotes, protein-coding genes are transcribed by RNA Polymerase II (RNAP II). RNA Polymerase I transcribes rRNA genes and RNA Polymerase III transcribes tRNA and 5s RNA ribosomal genes (Carter, Drouin, 2009). RNAP II attaches to the primary strand of DNA and synthesizes RNA from a DNA template in the form of precursor mRNA. Its initiation is regulated by transcription factors that are recruited by binding to promoter and enhancer elements in the DNA and to coactivators binding to RNA (Lee, T. I., Young, 2013). The information of the template or primary strand of DNA transfers to RNA with complementary nucleotides. As mRNA is not present naked in cells, the messenger ribonucleoprotein complexes (mRNPs) have proteins protecting mRNA. Transcribed RNA contains exons and introns and must be futher processed by splicing out the introns (Figure 1). During the processing, a 5’ cap is added to mRNA, which is recognized at nuclear pore and necessary for nuclear exit (Topisirovic et al., 2011). The poly A tail is necessary for prevention of enzymatic degradation of mRNA in the nucleus in yeast (Burkard, Butler, 2000) and mammalian cells (Wu, Brewer, 2012). As a final step of gene expression, protein synthesis takes place in cytosol, where ribosomes read the genetic code from the mRNA template. Thus mRNAs must be efficiently exported from the nucleus to the cytosol. Gene expression describes the entire process from transcription to protein synthesis

19 Review of the literature  and is tightly regulated for each gene depending on cell type and state (Figure 1).

  %,1.6*=2,8/0.7..A9;.<<287<=.9<27=1.7>,5.><*7-,B=895*<68558@270 =;*7<,;29=287<952,270=*4.<95*,.27=1.<952,.8<86.@2=1=1.1.598/=1.&$) ,8695.A&1.7&$) *7-<,*//85-9;8=.27!#9*;=2,29*=.276$!.A98;=8/ 985B*-.7B5*=.-6$!<=8=1.,B=895*<67=1.,B=895*<6;2+8<86.=;*7<5*=.<=1. 27/8;6*=28727=89;8=.27

 # )(.&%(** (%+ $+"(&%()

The separation between nucleus and cytoplasm compartmentalizes mRNA processing into the nucleus by separating transcription from translation (Martin, W., Koonin, 2006, Gorlich, Kutay, 1999). The nucleus is separated from cytoplasm by the nuclear envelope (NE) lipid bilayer. The inner and outer nuclear membranes contain between them openings, through which mRNA molecules are transported. These openings are called nuclear pore complexes (NPCs), which are highly conserved structures across species (Strambio-De- Castillia, Niepel & Rout, 2010). An active human cell contains between 3000- 5000 NPCs (Cordes, Reidenbach & Franke, 1995). Transport through the nuclear pore is a metabolically active process, requiring energy from adenosinetriphosphate (ATP). The nuclear pores function as a barrier preventing other than mature mRNA such as small molecules and passively diffusing molecules from passing through the nuclear pore (Stewart, 2019). Transport through the NPCs require nuclear transport factors (NTFs), which bind on the RNA molecules facilitating their transport until the molecules reach the cytoplasmic side (Moroianu, 1999). There are

20

various types of NTFs depending on the type of RNA molecule to be transported. Karyopherin family (Kap) of NTFs consists of factors that import cargo to the nucleus (importins) and those that participate in export (exportins). Kaps transport rRNA, tRNA and viral RNA (Cullen, 2009). Distinct factors participate in the export of mRNA. General nuclear export complex nuclear RNA export factor 1 (NXF1) and NXT1 are most common export factors recruited for mRNA cargo (discussed in detail later). NTF and attached cargo is then transported to nuclear basket. The nuclear basket is a structure, which extends from the inner nuclear envelope through the NPCs. NPCs consist of an approximated 30 different nucleoporin proteins (Strambio-De-Castillia, Niepel & Rout, 2010). These are filamentous phenylalanine-glycine nucleoporins (FG-Nup) that fill the channel and interact with the mRNA molecules and NTF-cargo complexes that are transported. FG-Nups can be on both (symmetric) or only one side (asymmetric) of the NPC (Hoelz, Debler & Blobel, 2011). Nuclear basket is a functionally important entity with control over transcription and RNA metabolism (Strambio-De-Castillia, Niepel & Rout, 2010). Following exit of molecules to the cytoplasmic side, cytoplasmic filaments link to the cytoskeleton and to the protein synthesis machinery. As will be discussed in the following chapters, the process of mRNA export is highly interconnected between different stages.

 (# &(%))!$ $)&"!!$ 

Before mRNA can be exported from the nucleus, it needs to be preprocessed. Pre-mRNA modification is an important feature in eukaryotic gene expression. The process is controlled by a large number of gene regulatory elements (Struhl, 1999). Majority of human genes are spliced prior to export, but there is a high number of intronless genes that are expressed in particular cell types such as neurons (Grzybowska, 2012), highlighting the importance of tissue-specificity in gene expression. The human genome contains approximately 6% intronless genes (Hill, Sorscher, 2006). 5’capping To complete the preprocessing of mRNA, both 3’ and 5’ ends are capped by capping enzymes to mark mature mRNA and promote nuclear exit upon arrival at the NPC. M7G methyl capping ensures pre-mRNA is not prematurely degraded before mature mRNA is formed (Topisirovic et al., 2011). 5’ end cap is added to the pre-mRNP (Ramanathan, Robb & Chan, 2016, Shatkin, 1976). Polyadenylation The 3’ end of the mRNA contains a Poly A tail, which consists of hundreds of A nucleotides (Sachs, Wahle, 1993). In principle, all eukaryotic mRNAs are polyadenylated to facilitate their export (Shepard et al., 2011, Derti et al., 2012, Lin et al., 2012). Some exceptions to polyadenylation are histone-encoding genes (Yang et al., 2011). Poly A end of mRNA is synthesized by poly A polymerase (PAP) (Colgan, Manley, 1997).

21 Review of the literature  Splicing Splicing generates alternative transcripts, generating diversity in gene expression through generation of multiple isoforms (Nilsen, Graveley, 2010). Splicing factors have been shown to regulate target gene splicing and expression from yeast to human (Park et al., 2004, Pleiss et al., 2007, Saltzman, Pan & Blencowe, 2011, Wickramasinghe, Laskey, 2015). The splicing process takes place in the spliceosome, a macromolecular complex formed of RNA and over 300 proteins assembling uridine-enriched small nucleolar RNAs (U snRNAs) (Kornblihtt et al., 2013). Spliceosome proteins protect the mRNA during its preprocessing. Spliceosome consists of the minor and major parts and is formed of snRNPs (small nucleolar ribonucleoproteins) produced by U snRNAs. The major spliceosome catalyzes most of the snRNP reactions (Wahl, Will & Luhrmann, 2009). To form RNP complexes, various compositions of mRNA binding proteins attach to different RNAs. Most commonly these are heterogeneous nuclear ribonucleoproteins (hnRNPs) (Dreyfuss et al., 1993, Matunis, Matunis & Dreyfuss, 1993). Before mRNA is exported out the nucleus for gene expression, mRNA is processed into its mature form from unprocessed mRNP. Once matured, mRNP molecules move with Brownian motion in the nucleus to ensure accurate representation at the nuclear pore complexes (Stewart, 2007, Vargas et al., 2005). Finally, mRNA molecules are transported to the cytoplasm through export complexes at the nuclear pores.

 (%#%"+"(%#&".)!$ $.&())!%$$.&%(*

Following the preprocessing steps such as splicing and capping, mature mRNA is transported through the large nuclear pore macromolecule complexes to the cytoplasm for translation (Tudek et al., 2018). Three multiprotein complexes control gene expression at different levels from histone modification to preprocessing and export. Upstream in gene expression is Spt-Ada-Gcn5 acetyltransferase (SAGA) complex as the initial step for histone deubiquitinylation. Following this, Transcription and export (TREX) complex participates in splicing and pre-processes mRNA through interaction with RNAP II. Finally, Transcription and export 2 (TREX-2) complex is required to export fully matured mRNAs from the nucleus to cytoplasm (Figure 1). SAGA complex is a multi-protein complex upstream in tbe gene expression pathway that interacts with nuclear histones (Table 2). The primary function of the complex is the modification of chromatin through two important modules: histone acetylation (HAT) and deubiquitination (DUB) modules (Koutelou, Hirsch & Dent, 2010). The DUB module contains Ubiquitin specific peptidase 22 (USP22), which removes monoubiquitin of H2B (H2Bub1) in genes. The complex also has ataxin 7 like 3 (ATXN7L3) and ENY2 Transcription And Export Complex 2 Subunit (ENY2) as adaptor proteins for activity and stability (Rodriguez-Navarro, Hurt, 2011). SAGA complex also interacts with ataxin 7 like 3 B (ATXN7L3B), a cytoplasmic

22

paralog of ATXN7L3 encoded by a pseudogene, and competes with it for ENY2 binding, thus inhibiting the ability of SAGA complex to deubiquitinate histones (Li et al., 2016). TREX is recruited to mRNAs during splicing, and it consists of conserved core subunits such as ALY, UAP56 and the THO sub complex (Carey, Wickramasinghe, 2018) (Table 2). TREX interacts with RNAP II (Katahira, 2012). The components of the TREX transcriptional machinery are highly conserved with homologues in yeast and fly, indicating their importance for cellular function. TREX-2 is a large multiprotein complex located at the NPCs. The complex contains a large scaffold protein germinal center associated nuclear protein (GANP). The human TREX-2 is composed of four subunits in addition to GANP. These are ENY2, PCID2, CETN3 and DSS1 (Jani et al., 2012, Umlauf et al., 2013, Wickramasinghe, Laskey, 2015) (Table 2). GANP and ENY2 are crucial for localization to the NPC and function of the TREX-2 complex in gene expression (Umlauf et al., 2013). NXF1 controls the export of most mature mRNAs at nuclear pore. Furthermore, in TREX-2-dependent export, NXF1 interacts directly with the phenylalanine-glycine (FG) repeats of GANP in the cytoplasmic side of the nuclear basket (Wickramasinghe et al., 2010). The multiprotein complexes involved in transcription, splicing and export also share protein components and modules reciprocally allowing coordination or modulation of the processes (Helmlinger, Tora, 2017). For example, ENY2 in the TREX-2 complex is also utilized by the histone deubiquitinase module of the SAGA complex (Lang et al., 2011, Umlauf et al., 2013). In Drosophila melanogaster, TREX-2 and SAGA have been suggested to co-regulate transcription and mRNA export (Kurshakova et al., 2007). All together, this emphasizes the importance of concerted action of the different macromolecular complexes in gene expression. Recent study has interestingly linked the GANP and SAGA connection to genome stability, by showing that a functional interaction between human TREX-2 and the DUB activity of SAGA is needed to maintain the H2B/HB2Bub1 balance for efficient DNA repair and homologous recombination (Evangelista et al., 2018). These authors showed that, whereas cells lacking ENY2 had increased H2Bub1 levels due to decreased deubiquitination, GANP-depletion decreased global H2Bub1. This suggests SAGA and TREX-2 may function beyond gene expression and mRNA export.

23 Review of the literature 

 &*+5.8/=1.1>6*7%&$)*7-&$) ,8695.A9;8=.27<*72.=*5  *=*12;*  8>=.58>2;<,1.7= 

$ ! "   " #  "# @ " #  "#      # # ### #!!   " $" #&#& '  #!& # #  #  #  #  #   $ & '  #!&   ' #    "" 

 # .&%(**%( 

GANP is a large 208 kDa protein that localizes to the nuclear pore complex and is encoded by 28 exons of Minichromosome maintenance complex component 3 associated protein (MCM3AP) at chromosome 21q22.3 (Abe et al., 2000). MCM3AP has two splice variants encoding GANP, the longer protein of 1980 amino acids (Figure 2), and a shorter protein named MCM3AP, which consists of 721 amino acids (Wickramasinghe et al., 2011). The MCM3AP protein shares the C-terminus with GANP in the carboxyl terminus acetyltransferase region that was shown to associate with Minichromosome maintenance complex component 3 (MCM3) (Takei, Tsujimoto, 1998). In GANP this domain is crucial for the nuclear pore localization together with ENY2. However, the function of the MCM3AP protein is unknown. GANP consists of multiple functionally important domains (Figure 2). It contains a cluster of six nucleoporin-like FG repeats in the amino terminus. Another functionally crucial domain is the Suppressor of actin 3 (Sac3). Sac3 is a RNA binding domain, conserved from yeast to human. This region is thought to be crucial for mRNA export (Jani et al., 2009, Jones et al., 2000). The domain physically interacts with nucleoporins enabling recruitment of the mRNA export machinery to the NPCs (Gallardo et al., 2003, Jones et al., 2000). For localization to the nuclear envelope, carboxyl terminus/acetyltransferase domain of GANP is crucial and may be involved in acetylation of MCM3 contributing to DNA replication (Takei et al., 2001, Takei et al., 2002) in addition to the function of GANP in mRNA export. MCM3AP is expressed across different tissues (Abe et al., 2000), but the exact functions of GANP especially in different human tissues are not known. GANP as a scaffold in the TREX-2 complex has been shown by several studies to transport a subset of mRNAs through NPCs (Abe et al., 2000, Jani et al., 2012, Umlauf et al., 2013, Wickramasinghe et al., 2010, Wickramasinghe et al., 2014). Furthermore, GANP is thought to be important for export of mRNAs for key cellular functions, such as gene expression and ribosome biogenesis

24

(Wickramasinghe et al., 2014). In addition, GANP associates with NXF1 to facilitate the transport of mRNAs from the nuclear interior to the nuclear pores (Wickramasinghe et al., 2010, Jani et al., 2012). NXF1 is the main factor that transports the majority of mature mRNA species from the nucleus also in human cells. GANP function has been studied in immunological context, mainly in B cells. GANP was found to be upregulated in germinal center B cells (Kuwahara et al., 2000) and to be involved in DNA replication in the immune system (Abe et al., 2000). GANP has also been identified to participate in the regulation of immune response (Sakaguchi, Fujimura & Kuwahara, 2002). More recently, GANP has been found to modify chromatin and be involved in transcription complex recruitment in a normal state, whereas in GANP B cell specific knockout mice this activity was reduced at immunoglobulin variable regions (Singh et al., 2013). GANP as part of TREX-2 has been implicated in regulation of pathways related to cancer (Goodall, Wickramasinghe, 2020). Collectively, GANP is implicated as a key mRNA export regulator and also in DNA replication in the immune system, contributing to human disease such as cancer.

  !#9;8=.27*7-2=<-86*27<!#,87=*27<;.9.*=<@12,127=.;*,=@2=1 7>,5.898;27<  **#;8A26*5=8=12<$$ $!;.,8072=28768=2/27=.;*,=< @2=1$!<  **>7,=287*55B2698;=*7=%*, -86*27  ** 9*;=2,29*=.<276$!.A98;=    ***7-*,.=B5=;*7,5.*;58,*52C*=2878/!#26*0.+*<.-87*72.= *5 

  "*!,!*/%# .&%(*

Nuclear mRNA export is thought to be highly selective and there are multiple pathways preferring different types of RNAs. The NXF1-NXT1 dimer mediated mRNA export requires hydrolysis of ATP by DEAD-box helicase in the cytoplasm (Katahira, 2015, Stewart, 2019). NXF1 functions in concert with the TREX and TREX-2 multiprotein complexes to export subsets of transcripts interdependently (Wickramasinghe et al., 2014). Nevertheless, GANP and NXF1 also function independently to export specific transcripts. Therefore, mRNA export is partly selective, unlike previously thought (Wickramasinghe, Laskey, 2015). NXF1/NXT1 heterodimer is responsible for the export of a large portion of mature mRNAs and GANP exports a subset of genes, particularly those involved in cellular metabolism (Wickramasinghe et al., 2014). There are some exceptions to NXF1 mediated export. For instance, certain viral mRNAs

25 Review of the literature  utilize chromosome region maintenance 1 protein homologue (CRM1) for export (Wickramasinghe, Laskey, 2015). In a fly study, NXF1 was found to export most but not all mRNAs (Wu J, Bao A, Chatterjee K, Wan Y, Hopper AK, 2015), supported by indications of selective export in human cancer cell line HCT116 (Wickramasinghe et al., 2014). On the contrary, a deficiency in THOC5, a component of the TREX complex, decreased the cytoplasmic expression of only a small subset of mRNAs, particularly crucial in hematopoiesis (Guria et al., 2011) and pluripotency (Wang et al., 2013). TREX-2, through GANP, promotes the nuclear export of specific mRNA classes required for gene expression itself, such as RNA processing, splicing, mRNP and ribosome biogenesis (Herold, Teixeira & Izaurralde, 2003). These studies implicate the mRNA export machinery in regulation of various cellular processes and indicate that different components of the complexes may have varying functions. Interestingly, TREX and TREX-2 are thought to recognize mRNAs during early stages of biogenesis and transfer the mature, preprocessed mRNAs to NXF1. This then facilitates the export of final processed mRNAs by interacting with the FG repeats of nucleoporins (Carey, Wickramasinghe, 2018, Wickramasinghe et al., 2014, Wickramasinghe, Laskey, 2015). In light of this recent evidence, GANP seems to be crucial for mRNA export in human cells and seems to be selective in the mRNAs it exports. However, the responsible machineries for specific types of mRNAs have been only identified recently and role of GANP and other mRNA export factors remains to be studied in disease relevant cell types such as human neurons. The mechanisms by which GANP regulates gene expression could vary in different cell types, since RNA metabolism varies between tissues.

  .&%(*$.&())!%$%!$*(%$")) $)

In comparison to other species, higher eukaryotic genes are typically longer, contain more introns, and contribute multiple times more to the genome than shorter genes (Grzybowska, 2012). The typical human gene contains eight introns, each generating a splicing reaction (Sakharkar et al., 2005). Introns provide many benefits to the genome, although they are energetically unfavorable (Jo, Choi, 2015). Introns have a crucial role in the regulation of alternative splicing of genes, allowing for production of multiple proteins from a single gene. Introns additionally have a role in regulating nonsense- mediated decay (NMD) of mRNAs to reduce gene expression errors rising from faulty transcripts (Lewis, Green & Brenner, 2003). Genes with introns have been shown to have positive effects on increasing gene expression (Buchman, Berg, 1988). The human genome contains a small number of intronless genes, which are largely tissue-specific in their expression patterns and variants in them underlie diseases such as neuropathy (Grzybowska, 2012). However,

26

mechanisms underlying intron-content dependent gene expression are relatively poorly understood. It is known that genes with introns that are spliced, are more efficiently transported from the nucleus to the cytoplasm in eukaryotes, further highlighting the importance of splicing in gene expression (Luo, Reed, 1999). TREX has been suggested to be involved in the export of intronless mRNAs in human HeLa cells (Lei, Dias & Reed, 2011), and NXF1/TAP pathway mediating serine/arginine-rich (SR) protein related intronless gene export was identified critical for exporting intronless histone H2A mRNA (Huang et al., 2003). Furthermore, transcripts that were selectively regulated by GANP in colon cancer cells had fewer introns than those regulated by GANP and NXF1 together or NXF1 only, suggesting that GANP may facilitate rapid changes in gene expression by regulating the export of genes that require less processing (Wickramasinghe et al., 2014). Overall, mRNA export machinery has selectivity towards genes based on their intron content, which seems to be the case also in human cells.

  +"%/*%&")#!*($)&%(*!$$+(%"% !"!)))

Nucleocytoplasmic transport is crucial for gene expression and defects in this process contribute to a variety of neurological diseases (Cooper, Wan & Dreyfuss, 2009). It has been hypothesized that nuclear pore structures are highly stable, and therefore are vulnerable to damage in mature cell types such as neurons (D'Angelo et al., 2009). Various types of dysfunction from protein aggregation, decreased expression and mislocalization have been identified in amyotrophic lateral sclerosis (ALS), frontotemporal dementia (FTD), Huntington’s disease (HD) and Parkinson’s diseases (PD) (Boehringer, Bowser, 2018, Hutten et al., 2020). In ALS, which is a lethal motor neuron disease, expression or localization of proteins is altered causing neurodegeneration and progressive loss of motor neurons (Taylor, Brown & Cleveland, 2016). ALS manifests on a cellular level in pathologic, ubiquitinylated-phosphorylated cytoplasmic inclusions regardless of which gene carries the pathogenic variant (Neumann et al., 2006). Disease variants in multiple genes encoding for RNA binding proteins have been detected in ALS, such as in TARDBP (TDP-43) (Sreedharan et al., 2008). Other hereditary forms of ALS are caused for example by variants in Fused in Sarcoma (FUS) (Kwiatkowski et al., 2009, Vance et al., 2009) and heterogeneous nuclear ribonucleoprotein hnRNP genes A1 and A2 (Kim et al., 2013). Human disease-causing variants in Nucleoporin GLE1 (GLE1) were identified in a severe form of fetal motor neuron disease (lethal congenital contracture syndrome, LCCS1) and lethal arthrogryposis with anterior horn cell disease (LAAHD) leading to embryonic lethality (MIM # 253310, # 611890) (Nousiainen et al., 2008, Vuopala, Ignatius & Herva, 1995, Makela- Bengs et al., 1998). Furthermore, a family with postnatal lethal form of the phenotype has also been identified (Paakkola et al., 2018). The mostly loss of

27 Review of the literature  function mechanism shows that defects in mRNA metabolism due to mutations in nucleoporins can lead to consequences to neurons and cause degeneration. Spinal muscular atrophy (SMA) is a leading cause of motor neuron disease, caused by mutations in survival of motor neuron 1 (SMN1). The disease manifests often from birth, but may also be adult-onset. It is one of the leading causes of lethality in children and causes severe muscle weakness. SMN protein product is involved in splicing during snRNA-snRNP biogenesis (Zhang et al., 2008). This is example of a form of neurological disease with consequences on RNA metabolism, for which treatment exists in the form of gene therapy (Prior, Leach & Finanger, 2020).

  ,%5'27%5-)"227,(-6)%6)*5203,)127<3)6  72+)1)7-'6%1(9%5-%17()7)'7-21

 $*(%+*!%$$*!%"% /

Inherited peripheral neuropathies are a group of disorders affecting the sensory and motor nerves of the peripheral nervous system (PNS). Hereditary motor and sensory neuropathy (HMSN), also known as Charcot-Marie Tooth disease (CMT), is the most common disease in this category. CMT is a chronic and progressive neuromuscular disease caused by degeneration or demyelination of the axons of motor and/or sensory neurons in the PNS. With an overall prevalence of 1:2500, CMT is also the most frequent inherited neurological condition (Rossor et al., 2013, Rossor, Tomaselli & Reilly, 2016). The syndrome was initially described in 1886 by three physicians Jean-Martin Charcot, Pierre Marie and Howard Henry Tooth, who identified muscular atrophy in patients resulting from muscle wasting. CMT is both clinically and genetically heterogenous with 80 genes underlying the disease. CMTs are monogenic and most of them are rare, since only a handful of genes contribute to the majority of the disease burden.

 %)%!$ (!*$

CMT genes function through all forms of Mendelian inheritance including autosomal recessive (AR), autosomal dominant (AD), X-linked and mitochondrial pattern of inheritance. Age of onset links to the pattern of inheritance, since AR inherited forms tend to be more severe and begin earlier in life. AR forms of CMT are more common in consanguineous marriages (Dubourg et al., 2006, Vallat et al., 2005). Affected individuals with CMT often have negative family history of CMT or other neurological diseases due to

28

intrafamiliar variability, recessive inheritance, possibility of de novo variants, or X-linked inheritance.

 "))!!*!%$

Various types of classification for the CMT subtypes and nomenclature has been used over the past decades. Initially, CMT was divided into two subtypes based on whether the affected individual had normal or reduced nerve conduction velocity (NCV) (Dyck, Lambert, 1968). Since then, electrophysiology of the nerves has been in central role in diagnosis together with medical and family history. Measurement of NCVs helps to understand the nerves’ capability to conduct signals, which helps to categorize the disease subtype into two main categories: axonal or demyelinating CMT. The demyelinating subtype is also known as CMT1, whereas axonal type is known as CMT2. NCV of <35 m/s is typical in CMT1 and >45 m/s in CMT2, which is normal. A third, dominant intermediate form (DI-CMT2), is recognized and characterized by NCVs of 35-45 m/s (Davis, Bradley & Madrid, 1978, Reilly, Murphy & Laura, 2011) (Table 3). In the past, CMTs were classified into demyelinating, axonal or intermediate subtypes and by the mode of inheritance and clinical features (CMT1, CMT2, CMT3, CMT4, CMTX). CMT3 (also known as Dejerine-Sottas disease) is a severe, early onset CMT. CMT4 is a less frequent form of CMT, which can have features of both axonal and demyelinating neuropathy and CMT4 is mostly AR. X-linked CMT is denoted as CMTX. A key feature is that variants in the same gene can cause either axonal, demyelinating or intermediate CMT, and therefore cause some heterogeneity in classification. Recently, a new classification system was proposed to better streamline the classification with the increasing amount of disease genes identified (Magy et al., 2018). This system is modular, and takes into account the mode of inheritance, phenotype and name of the gene, presenting with a more widely applicable and consistent way of categorizing different disease forms based on the affected gene.

29 Review of the literature 

 &1.-2//.;.7= &<>+=B9.<=1.2;7.;?.,87->,=287?.58,2=2.<!'*7- 91.78=B92,/.*=>;.<*<.-872;-  -8627*=.27=.;6.-2*=.

 "!%   !%!"    !  "%     #,4?,305(:05.  49 09:(34;9*3,=,(25,99 /03+/66+     (:867/? 9,5968? 3699 ?,(89*64465     )03(:,8(3 -66:    +867469:3?46)03,      # >65(3  49 09:(34;9*3,=,(25,99    (:867/? 3,99 9,5968? /03+/66+68(+;3:   3699 (5+ +09()030:? 659,:*64465   +,7,5+05.65<(80(5:  # ,1,805,"6::(9    +09,(9, %,8? 36= %,8?9,<,8,7/,56:?7,  %9 *3050*(3=,(25,9965065 65.,50:(3-6848(8, );3)9,3,<(:,+" /?7,8:867/0*5,8<,9  #   49 %(80()3,%9 /03+/66+68(+;3: 05:8(-(4030(8<(80()030:? 659,:8(8, :?70*(3#7/,56:?7,

#&&3052,+  49 09:(34;9*3,=,(25,99 +;3:659,:8(8, 9,5968?90.59*,5:8(3 5,8<6;99?9:,4 05:,33,*:;(3+09()030:?

 "!$!"*+()

CMT manifests at different ages of onset from congenital, childhood to adult- onset disease. Typically, the disease begins within the first three decades of life. CMT is clinically heterogenous, and even individuals from the same family carrying the same disease variants may present with either axonal or demyelinating subtypes (Szigeti, Lupski, 2009). Age of onset also correlates with the severity of clinical features. Autosomal recessive CMT is typically more severe than the autosomal dominant forms, moreover it may manifest in early childhood with severe symptoms such as skeletal deformities and loss of ambulation before adulthood (Tazir et al., 2013). The clinical features of CMT consist of muscle weakness and wasting, progressing slowly from distal to proximal extremeties, spinal deformities such as scoliosis and hyporeflexia are common (Rossor et al., 2013). Other typical symptoms include reduction or difficulty in movement, and reduced motor development during childhood. Fine motor skills such as writing might be affected. CMT is characterized by sensory and motor neuron loss, which manifest as muscle weakness, atrophy and wasting in the arms and legs. Disease typically proceeds from distal to proximal. Clinical features vary from severe loss of mobilization to mild loss of sensory ability depending on the disease-causing variant. Upon examination, affected individuals present with

30

hammer toes, pes cavus (high arched feet) and claw hands due to muscle weakness in the underlying muscles. CMT manifests with several potential neurological and non-neurological comorbidities. These may include for example eye abnormalities such as strabismus, hearing loss, obesity, hypothyroidism, diabetes and dysautonomous signs, and may complicate disease course. Characteristic additional neurological signs may involve for example involvement of upper motor neurons and spinocerebellar ataxia.

 $*!+))$#%"+"(# $!)#)

CMT1 affects Schwann cells, the glia of the peripheral nervous system, whereas CMT2 affects motor and sensory neurons. Therefore, variants in genes affecting both of these cell types have been identified in CMT (Table 4). To date, approximately 80 genes have been identified, contributing to several cellular and molecular mechanisms. However, the five most common disease genes together contribute to 80-90% of the disease, whereas the contribution of the other genes remains small. The number of the identified disease associated genes is constantly increasing due to advances in next generation sequencing technologies. The most common genetic cause, and the first disease gene associated with CMT in 1991 is peripheral myelin protein 22 (PMP22) with its duplication on the short arm of chromosome 17 associated with demyelinating type CMT1A (Raeymaekers et al., 1991). The inheritance pattern is AD. Deletions and variants in PMP22 also contribute to the disease load (Valentijn et al., 1995) although point variants are less common. PMP22 affects the myelin sheet of the Schwann cells by interfering with its development and maintenance. Onion bulb formations in the myelin are seen as characteristic sign in nerve biopsies of patients with PMP22 disease causing variants (Nishimura et al., 1996). Variants in PMP22 contribute to for example defect in intra-axonal transport (de Waegh, Brady, 1990) and formation of protein aggregates (Ryan, Shooter & Notterpek, 2002). Two other common causes of CMT1 linked with myelination are variants in myelin protein zero (MPZ) and Gap junction beta- 1 protein (GJB1), which are also involved in myelination and gap junction formation in the Schwann cells, respectively. GJB1 is the second most frequent cause of CMT, and in addition to gap junction formation is involved in trafficking cargo in Schwann cells. The most common cause of CMT2, on the other hand, is mitofusin-2 (MFN2) (Zuchner et al., 2006, Verhoeven et al., 2006) which contributes to an estimated 10-30% of CMT2 cases with AD inheritance pattern (Tan et al., 2016). MFN2 function is relatively well recognized. The protein MFN2 functions in the mitochondrion together with MFN1 (Chen et al., 2003), at the contact sites between the outer membranes participating in mitochondrial fusion and dynamics (Pareyson et al., 2015). MFN2 variants are known to

31 Review of the literature  cause fragmentation of the mitochondrial networks (Amiott et al., 2008) and reduce mitochondrial movement, as well as cause axonal transport defects (Baloh et al., 2007), since mitochondria are integral in axons. Mitochondria are crucial for cellular energy metabolism, and CMT associated variants in both nuclear and mitochondrial DNA encoded genes have been found. Particularly they are regulating mitochondrial structure, dynamics and other functions such as mitochondrial contacts (Krols, Bultynck & Janssens, 2016, Krols et al., 2016). Other mitochondrial key genes in addition to MFN2 involved in CMT are GDAP1, MT-ATP6, PDK3. An important pathway affected in CMT2 is the cytoskeleton of the axon. Variants in intermediate filaments such as neurofilament light (NEFL) have been identified and for example genes involved in axonal transport Cytoplasmic dynein 1 heavy chain 1 (DYNC1H1). Another key pathway implicated in CMT is endosomal sorting, including genes that participate in vesicular transport, intracellular signaling and membrane trafficking (LITAF, MTMR2, SBF1, SBF2, SH3TC2, NDRG1, FIG4, RAB7A, TFG, DNM2, SIMPLE). Faulty processing by the ubiquitin-proteasome system has also been implicated in CMT with genes such as Tripartite motif containing 2 (TRIM2) and small heat shock protein genes HSPB1, HSPB8 being affected. Furthermore, there is a group of disease genes involved in transcription (HINT1, IGHMBP2, MORC2), aminoacyl-tRNA synthetases of protein synthesis (AARS, GARS, MARS, KARS, YARS) and nuclear envelope integrity (LMNA). Additionally, variants in ion channels (TRPV4) have been identified in CMT. Regardless of a large number of identified CMT genes and related cellular functions, a great number of affected individuals remain without a molecular diagnosis, particularly in CMT2, suggesting the need for further gene discovery (Rossor et al., 2013). In addition, understanding of cellular mechanisms in CMT is important in order to link disturbances on a molecular level to a specific genetic alteration and phenotype.

32

 >7,=287*5,*=.08;2.<8/-2//.;.7= &?*;2*7=<*7- &<>+=B9.*<.-87 (.2<.=*5 %86.,87=;2+>=.=88=1.;/8;6<8/271.;2=.-7.>;89*=1B+>= 875B &=B9.<*;.6.7=287.-1.;.

        

>,204(9054  $$



 % $$ $  $

>9581,2,954  $$$ 

  $

:*2,(7,4;,256,  "$ $ 

$7(48*7069054   $$ 

   $ '

  $ #

 ! $ 4,:753>59540((4+ (=54(24,:756(9/>

   "$ $  

 759,048>49/,808   $ 

   $ 

 $  $

   $"

   $ %

 759,04675*,8804.(4+   $  6759,(853,

    $ 

   $ 

 !  $ "

   $ 

497(*,22:2(797(486579  $&

  !  $ 

   $ 

  $ $

 $ $ 

33 Review of the literature 

 ! $

   $ 

   $ &

4+562(830*7,90*:2:3 ! $ 

   $ 

095*/54+70(2+>4(30*8  $ $  

  $ $ $"

095*/54+70(23,9()52083 ! $ !

   $& 

  $ 

,9()520835- !  $  6/586/50458090+,

 !   $  

 !  $  

   $ 

"/5$ (8,80.4(2204.   $"

   $ 

 49,7(*9054<09/9/,   $ ,=97(*,22:2(7,4;07543,49

  $ $

   $

  # $ 

 54*/(44,28 ! "  $ 

  9/,7   152,/*.*4&'0-,3. $&  &/)/5(-*04,)*',036/4+*3,3

 .,40(+0/)2,&- $&  .*),&4*)&101403,3

  $*!*)*!$ 

Determining the underlying cause of disease by genetic testing is in key role for molecular diagnosis, genetic counselling and potential treatment. It is also useful for determining which family members are carriers of the variants or in family planning. In monogenic diseases, it is particularly useful to test for

34

genetic variants in known disease genes first, if symptoms match a defined phenotype. However, in genetically heterogenous disorders such as CMT, due to large number of involved disease genes, further genetic testing is often required if causative gene is not a common one. Previously, the identification of human disease genes required linkage analysis studies involving multiple patients with identifical phenotypes and/or candidate gene sequencing. Both approaches were time-consuming with the disease variant identification relying ultimately on Sanger sequencing (Sanger, Nicklen & Coulson, 1977), which restricted the number of genes that could be sequenced. In addition, distinguishing between phenotypic features based on a given gene is difficult, since phenotypes are similar across most genes. Sanger sequencing may be still useful in sequencing the most common causes of CMT such as PMP22, GBJ1, MPZ and MFN2, however it is not optimal for identification of rare disease genes, which are common as a group in CMT.

  ,$)!$* !$*!!*!%$%!))+)!$ ,(!$*)/$.* $(*!%$)'+$!$ 

Recent developments in next generation sequencing (NGS) methods have enabled the analysis of all exons in the genome at once by whole exome sequencing (WES). Alternatively, whole genome sequencing (WGS) is an option for sequencing of the coding and non-coding regions. Trio sequencing, referring to the sequencing of the exomes or genomes of parents and the offspring, is an efficient method. As a primary diagnostic tool, this method can improve diagnostics and lead to early diagnosis. In addition, the method enables starting early treatment for treatable congenital disease forms. WES method enables the high-throughput investigation of both known and novel disease genes in coding regions of the genome, where most functional variation lies (85%) (Botstein, Risch, 2003, Majewski et al., 2011). The rise of WES in the past decade has enabled the identification of disease variants for Mendelian disease with known phenotypes in approximately 60% of sequenced exomes (Gilissen et al., 2012). This, however, varies largely depending on the disease, inheritance type and age of onset in question. NGS methods continue to be increasingly crucial for diagnosis of rare diseases as they offer a more cost-effective and comprehensive way of large-scale screening of disease-causing variants. Patients who present with a CMT phenotype are often referred for either targeted gene panels or untargeted NGS to link the phenotype to a gene variant that potentially causes their disease. Exome sequencing could thus serve as a first line diagnostic tool in rare disease patients without the need for complex and time-consuming sequencing of dozens of known disease genes separately. Identification of new disease genes from WES data is formed upon understanding of the molecular processes linked with the gene’s functions. Furthermore, literature search will reveal whether variants in the gene have

35 Review of the literature  previously been linked to the disease phenotype. Mendelian disease is particularly suitable for WES because of location of likely disease-causing variant in the coding region, high penetrance and rare in population (Ng et al., 2009). WES analysis has proven particularly useful for recessive variants, whereas dominant heterozygous variants are more difficult to pinpoint as a novel disease gene by this method (Goh, Choi, 2012). WES has benefited the diagnosis of neurological diseases from intellectual disability to mitochondrial diseases and neuromuscular phenotypes (Fogel, Satya-Murti & Cohen, 2016). However, further functional analyses are required following a candidate gene discovery to support the pathogenicity of the variants of interest. A complementary tool for genotype-phenotype analysis are online websites such as GeneMatcher and Matchmaker Exchange that connect clinicians and researchers from across the world with matching genotypes and phenotypes (Philippakis et al., 2015, Sobreira et al., 2015). This is particularly important when investigating the link between variants in a new disease gene and associated rare phenotype.

  !7)0')//&%6)(02()/-1+2*(-6)%6)6%**)'7-1+  027251)85216

 $*(%+*!%$$)*#""*/&)

Stem cells are undifferentiated cells from which specialized cell types are generated. They exist in different organs of the body and are important for growth and healing response in response to injury for example. They are present in several tissues such as the bone marrow, liver, blood and teeth. These are the somatic stem cells in the adult body. Stem cells have therapeutic potential and bone marrow, and blood stem cells are being used in certain treatment applications such as stem cell replacement therapy. Embryonic stem cells (ESC) are isolated from embryos and easily cultured and expanded into any cell type but have a problem with immune response in transplantations. Induced pluripotent stem cells (iPSC) are similar in application and differentiation potential than ESC, but do not have similar ethical or transplantation challenges. Stem cells have multiple applications in basic biology and translational medicine. However, their use and material they are derived from have been debated for decades. Recent developments in particularly the use of iPSC provided a model system with less ethical concerns and a high-throughput model for differentiations for example (Figure 3).

36

  &;*7<5*=2878//27-270</;869*=2.7=<*695.<=1;8>0112#%<*7-0.7..-2=270=8 =1.;*92.<

 $+&"+(!&%*$*)*#"") iPSC are non-terminal cell types that are capable of differentiating into multiple lineages and proliferating limitlessly in culture. They can be generated by reprogramming from many somatic cell types, for example skin, blood or urine (Figure 3). iPSCs were first reprogrammed from somatic cells of rodents (Takahashi, Yamanaka, 2006) shortly followed by differentiation from human somatic cells into hiPSC (Takahashi et al., 2007). iPSCs have since been used for studying basic biology and for disease modeling due to their capability to differentiate to a variety of mesodermal, ectodermal and endodermal cell types. This is particularly useful for translational value where affected individuals carry a specific genetic defect affecting particular cell type (Figure 3). Human induced pluripotent stem cells (hiPSC) can be reprogrammed from several somatic cell types in the presence of the pluripotency factors ‘Yamanaka factors’ Oct4, Klf-4, Sox2 and c-myc (Takahashi et al., 2007, Takahashi, Yamanaka, 2006). Indeed, Oct4 has been long considered the most important single factor for inducing pluripotency. Recent study has shown that c-myc is capable of inducing pluripotency in somatic cells in absence of Oct4 (Velychko et al., 2019). The research into understanding mechanisms of pluripotency is still ongoing. There are multiple exogenous and endogenous methods for reprogramming with variable benefits. Traditionally most common are transient expression of adenoviral or retroviral vectors or introducing transgenes (exogenous genomic sequences) into the genome of the cells. Transgenes may be more desirable for therapeutic applications (Martin, U., 2017) due to the problems associated with integrative vectors (Rothe, Modlich & Schambach, 2013). Many of the reprogramming approaches are constantly being developed and reprogramming is possible from increasing number of different tissues.

37 Review of the literature   %"!$ $+(%$"!))!$ +#$$+(%$)

Neuronal diseases require a cell-type specific model system. Due to variation between human and other model organisms, novel in vitro model systems have been developed based on hiPSC. These are particularly important for disease modeling, since neurons are otherwise unobtainable and nerve biopsies obtained from patients unethical and risky. hiPSC have enabled the study of human neurons in as short time as two to three weeks. Motor neurons are the primary cell type affected in CMT. Disease modelling using differentiated neurons has proved useful for study of other diseases affecting motor neurons. Motor neurons differentiated from hiPSC provide high potential for disease modeling and potential for transplantation without problems attached to hESC (Nori et al., 2011). However, hESC have been useful in multiple disease modeling studies due to robust differentiation abilities (Curtis et al., 2018, Lee, J. P. et al., 2007, McDonald et al., 1999, Peljto, Wichterle, 2011, Rosenzweig et al., 2018). Various motor neuron differentiation methods have been established and those have become faster and more efficient. One of the earliest protocols reported an efficient conversion of both hESC and hiPSC into motor neurons through SMAD pathway inhibition of TGF and BMP pathways (Chambers et al., 2009). The method of SMAD inhibition has since become a popular backbone of various motor neuron differentiation protocols. Reports have presented conversion of hiPSC into motor neurons at approximately 70-80% efficiency (Maury et al., 2015, Qu et al., 2014). Motor neurons can be generated in two-dimensional platform in adherent surfaces from attached hiPSCs or in a more scalable and physiological three-dimensional suspension-culture. Suspension culture approaches are most commonly used since they have produced higher differentiation efficiencies than adherent culture models. Typically, during the first few days to a week, hiPSC are differentiated into neuroepithelial progenitors (NEP) by dual SMAD inhibition of TFG-Beta (SB431542) and LDN193189 (BMP inhibitor) (Chambers et al., 2009). Some variation in protocols is seen, for example the addition of Chir-99021 as a GSK- 3 inhibitor. Neural induction typically takes place by inducing motor neuron progenitors (MNP) by activation of sonic hedgehog (SHH) and retinoic acid pathways. Following initial patterning, media may contain neurotrophic factors such as brain-derived neurotrophic factor (BDNF), and glia cell- derived neurotrophic factor (GDNF). After MNP induction typically at day 10 (Chambers et al., 2009, Maury et al., 2015, Qu et al., 2014, Vandoorne et al., 2019), spheroids are dissociated into single progenitor cells and plated down on an extracellular matrix containing poly-d-ornithine/lysine and laminin. Poly-d-ornithine/lysine is a general surface for cell growth used for various cell types, whereas laminin allows for axon formation and growth. Media typically still contains retinoic acid and sonic hedgehog, as well as neurotrophic factors in media until the end, and γ-secretase inhibitors such as N-[N-(3,5-Difluorophenacetyl)-L-alanyl]-S-phenylglycine t-butyl ester

38

(DAPT) are added for approximately a week to improve neuronal differentiation. Following this, neurons are typically cultured in media containing only neurotrophic factors until matured. Similar protocols may generate very specific subsets of neurons, characterized by HOX gene transcription factor expression, such as lateral motor column motor neurons of the thoracic/lumbar spinal cord (Hudish et al., 2020). Overall, motor neuron model systems have facilitated the study of molecular mechanisms linked to monogenic motor neuronal diseases. Since these are relevant cell types and human model system, they are expected to facilitate translation of findings from motor neuron model systems.

 ))))!$ "%"*($)"*!%$!$$+(%$)

The existence of axonal translation has been debated in literature. Previously, axons were thought to derive their proteins from cell body and dendrites, and not actively translate their own proteins (DROZ, LEBLOND, 1963). In situ hybridization and RNA sequencing of axons and dendrites from rat hippocampi indicated the presence of approximately 2500 mRNA species (Cajigas et al., 2012). To assess axonal transcription and translation, methods such as the microwell assay using Boyden chamber have been repurposed to study axons (BOYDEN, 1962). Additionally, new methods using microfluidic approaches have been developed to study axonal translation. Interestingly, recent studies have identified axon-specific transcriptome and proteome signatures and shown that axonal translation has an important role in both developing and adult nervous system (Shigeoka et al., 2016, Shigeoka et al., 2019). The purpose of local translation is to enable proteins crucial to the proximal compartment to be produced and transported efficiently (Shigeoka et al., 2016). Defects in local neuronal translation were identified to compromise mitochondrial function and translation in axonal CMT caused by Ras related protein (RAB7A) pathogenic variants (Cioni et al., 2019). Axons require local translation to efficiently synthesise in particular mitochondrial and ribosomal proteins. Since little is known about axonal translation overall, novel methods such as axon-TRAP (translating ribosome affinity purification) (Shigeoka et al., 2018) and RiboTag knock-in mouse line (Sanz et al., 2009) have been developed to understand transcription dynamics in mammalian axons in vitro and in vivo, respectively. Development of methods to analyze axonal transcriptomes have proven useful for understanding alterations in disease. Recent study analyzed the transcriptome of axons of motor neurons generated from hiPSC (Hudish et al., 2020). The results indicated that nuclear-encoded mitochondrial genes as well as ribosomal protein encoding genes are translated in axons, and that the axonal translation of nuclear-encoded mitochondrial genes is disrupted by hypoxia. Moreover, defective axonal

39 Review of the literature  translation may contribute to neuronal disease phenotypes such as neuropathy. To conclude, the role of axonal mRNA transport and translation was previously unappreciated, but recent understanding of axonal transcriptome has highlighted the importance of active translation to axonal function. Further research is required to understand axonal gene expression in health and disease.

 !*!$ *  $%#-!* ) %(!))#%"!$ 

Genome editing has enabled the modification of the genome (mutagenesis) of variants into coding and non-coding regions of the genome of humans and other species in a variety of cell types. hiPSC obtained from healthy individuals can be genetically modified by using different gene editing methods. In recent years clustered regularly interspaced palindromic repeats (CRISPR) has become a popular tool for editing the genome of hiPSCs. The disease modeling community has also benefited from the ability to generate cell models for reliable and robust readouts. Gene editing has for example been used in the modeling of pathomechanisms in ALS (Kramer et al., 2018). The Nobel Prize winning technology is expected to give more insight into human disease biology. Knock-in gene editing using homology directed repair (HDR) can be used to generate a variant in a healthy hiPSC line to study mechanisms and therapeutic targets of diseases (Shi et al., 2017). Generating variants in a healthy hiPSC line enables the cell lines to be isogenic by only altering the loci of interest, while otherwise being identical genetically. This method allows for a reliable readout of changes, which may otherwise be covered by the heterogeneous genetic background in a non-isogenic control. Somatic cells derived from affected individuals contain the disease-causing variant/haplotype. These can be used to generate an isogenic line by correcting the variant. Corrected cell lines offer a potential therapeutic interest for personalized drug discovery and transplantation. Affected individuals’ hiPSC are useful in studies for comparison with isogenic cell lines and for therapeutic applications of transplantation. Typically, gene editing experiments with CRISPR to introduce a knock-in variant consist of three main components – guide RNA (gRNA) an RNA that corresponds to the loci of interest, single-stranded oligodenucleotide (ssODN) that introduces the change desired (Figure 4). Finally, Cas nuclease plasmid or protein (typically Cas9) that recognizes the sequence in genome as directed by gRNA and generates a double-stranded break (DSB) in or near the loci. gRNA consists of 20 nucleotides long sequence CRISPR RNA (crRNA) that identifies complementary genomic DNA sequence. Base pairing in the genome is predetermined by protospacer adjacent motif sequences (PAM) and are specific to the Cas9 enzyme. The PAM sequence for SpCas9 is NGG and occurs

40

frequently in the genome. Alternative non-preferential PAM sites may be used but are less likely to produce specific cuts (Hsu et al., 2013). Two different ways for the cutting to take place by CRISPR are DSB by non- homologous end joining (NHEJ) or HDR, which differ in the specificity and predominance. NHEJ pathways are dominant repair pathways in the genome, whereas HDR guided repair mechanisms require a set of conditions for targeting. HDR mediated repair mechanism requires a donor template, typically single – or double stranded (ssODN, dsODN) oligonucleotide sequence for specific targeting. Gene editing community has generated variety of tools to design, assess efficiency and off-target effects of specific gRNAs. Particular care to design gRNAs that avoid excessive numbers of exonic off- targets is needed (Bruntraeger et al., 2019) and selection of multiple potential gRNAs. However, efficiency is not completely predictable by online tools, which indicates the need to test multiple gRNAs in cells (Bruntraeger et al., 2019). gRNAs have various efficiencies at target sites depending on chromatin accessibility and gRNA target sequence for example. Some protocols have reported up to 40% allelic modification by HDR (Bruntraeger et al., 2019). Delivery method additionally affects CRISPR efficiency. Small molecule approach has been used as an alternative for HDR targeting in human cells (Riesenberg, Maricic, 2018). The authors utilized ‘CRISPY’ mixture of small molecules and targeted repair pathways effectively in human pluripotent stem cells, whereas the effects in other cell types were compound specific. Engineered high-fidelity variants of Cas9 delivered as ribonucleoprotein complexes (RNP) are efficient and specific, minimizing unwanted off-target effects in human hematopoietic stem and progenitor cells (Vakulskas et al., 2018).

  %,1.6*=2,8/,86987.7=<8/=1.$%#$*<<.-=80.7.;*=.,>=+B-8>+5.<=;*7-.-+;.*4%8;1868580B-2;.,=.-;.9*2; $*==1.<2=.8/?*;2*7=&12<8,,>;<9;8A26*5=8=1.9;8=8<9*,.;*-3*,.7=68=2/ # 68<=,866875B!>2-.$!0$!0>2-.<=1.*<7>,5.*<.=8=1. ,8;;.,==*;0.=58,*=28727=1.0.786.&1.=B92,*55.70=18/=1.0$!2< +9&1. ,5.8=2-.<<"!27=;8->,.<=1.?*;2*7=

41 Aims of the study   !

The overall aim was to study the molecular mechanisms and genotype- phenotype correlation in CMT causing variants of MCM3AP, which is thought to be involved in mRNA export and gene expression. Prior to beginning of this dissertation work, MCM3AP had been identified as a candidate gene from the exome sequencing data of the Finnish family.

The specific aims of the dissertation work were the following:

1. To identify the molecular and functional basis for a novel disease gene MCM3AP. 2. Characterise a novel syndrome and its genotype-phenotype correlation in multiple families. 3. Generate an in vitro motor neuron model to study effects of MCM3AP affected individuals’ variants. 4. To study cell type specific effects of GANP on mRNA export and gene expression.

42

    

The following is a summary of the most important and commonly used methods and materials. The full details are available in each of the articles.

 2//)'7-212*'/-1-'%/(%7%

The affected individuals were identified and characterized in the participating centres obtaining all available clinical data including neurophysiology and imaging data.

  7,-'%/3)50-66-216

Experiments done using human samples from affected individuals were in accordance with the Declaration of Helsinki and voluntary written informed consent was received from affected individuals or their guardians. Institutional ethics boards approved the studies. The hiPSC programming was approved in HUS ethical committee with statement number 95/13/03/00/2015 and exome sequencing number 399/E9/07 with revisions 371/13/03/01/2012.

  **)'7)(-1(-9-(8%/%1('21752/')//6

For the study, fibroblast cell lines were obtained from nine members of seven families and two family members (mothers) (Table 5). Biomedicum Stem Cell Centre reprogrammed hiPSC from healthy controls (HEL61.2, HEL62.4) and two affected individuals’ fibroblasts (FIN1/2: HEL147.3, HEL148.1). One hiPSC line was generated by CRISPR-Cas9 by introducing a variant into healthy control hiPSC as explained below.

43 Materials and methods 

 &*+5.8/,.55527.<><.-27=12<<=>-B

         !     #"       (403?   --,*:,+05+0<0+;(3 0)86)3(9: /0 "    --,*:,+05+0<0+;(3 0)86)3(9: /0 "  (403?  $" --,*:,+05+0<0+;(3 0)86)3(9:  (403?   --,*:,+05+0<0+;(3 0)86)3(9:     (403?4,4),8 0)86)3(9:  (403?$"  $"  --,*:,+05+0<0+;(3 0)86)3(9:  (403?!  ! --,*:,+05+0<0+;(3 0)86)3(9:  ! (403?4,4),8 0)86)3(9:  (403?  "# --,*:,+05+0<0+;(3 0)86)3(9: "#    "#  --,*:,+05+0<0+;(3 0)86)3(9:  (403?    --,*:,+05+0<0+;(3 0)86)3(9:    (403?  !! ,5,,+0:,+ /0 "

  $,2/));20)%1(+)120)6)48)1'-1+

Whole-exome sequencing was done for ten affected individuals from six families and whole genome sequencing was done for two affected individuals from one family.

  -&52&/%67')//'8/785)

Skin fibroblasts were established by skin bunch biopsies and maintained in typical fibroblast culture conditions. Cells were cultured in incubator at 37°C in 5% CO2 and 95% O2 in Dulbecco’s modified eagle’s media (DMEM) (#BE12-614F, Lonza) containing 10% FBS, 1% penicillin–streptomycin, 1% L- glutamate and 0.1% uridine.

  -62/%7-21%1('6<17,)6-6

Total RNA from fibroblasts, hiPSC and motor neurons was isolated with Nucleospin RNA kit (#740955.50, Macherey-Nagel). cDNA was synthesized according to manufacturer’s instructions with Maxima First Strand cDNA Synthesis Kit (#K1671, Thermo Fisher Scientific).

44

 8%17-7%7-9) 5)9)56) 75%16'5-37-21 32/<0)5%6) ',%-15)%'7-214 "  qRT-PCR for quantification of mRNA amount, as well as RNA sequencing validations was done with cDNA isolated as described above. qRT-PCR was run on a CFX96 Touch Real-Time PCR cycler (Bio-Rad) with SYBR Green qPCR Kit (#F415S, Thermo Fisher Scientific) on a 96-well plate with gene- specific primers. The cycling protocol was 95°C for 7 min, then 40 cycles of 95°C for 10 s and 60°C for 30 s.

 -1-+)1)%66%<72()7)'7%/7)5)(0 63/-'-1+ -1*-&52&/%676

MCM3AP exon 5 and proximal intronic sequences were inserted by bacterial cloning into pSPL3 minigene. We introduced the splice donor variant of affected individual NL1 c.1857A4G, p. Q619= into exon 5. 2 μg of plasmids containing the mutant or wild type exon 5 were transfected into 143B cells with jetPRIME kit (Polyplus). RNA was isolated 24 h post-transfection with NucleoSpin kit (Macherey-Nagel) and analyzed by RT-PCR and agarose gel electrophoresis.

 -;)(1)8521%/'8/785)6

A previously characterized hiPSC line HEL11.4 was differentiated towards neuronal lineage (Hamalainen et al., 2013). Neuronal cells were analyzed using immunocytochemistry with antibody targeting neuronal β-tubulin III (TUJ1, #801201, BioSite), and GANP protein detected using an N-terminal antibody (#ab198173, Abcam).

 5)3%-5%66%<

To study DNA damage repair, we used fibroblast cells from affected individuals FIN1 and FIN2, three unaffected control cell lines and fibroblasts derived from a xeroderma pigmentosum group C (XPC) affected individual (Coriell Institute Cell repository, GM16684) and from a DNA ligase 4 syndrome (LIG4) affected individual (Coriell Institute Cell repository, GM16088) serving as positive controls for sensitivity to UV and ionizing radiation, respectively.

45 Materials and methods   !2(-80 (2()'

Cells were isolated with 1X RIPA (#9806, Cell Signaling) for 5 min on ice, scraped and centrifuged at 14 000 x g for 10 min at 4°C. Following this, protein concentrations were measured with Bradford assay (#500-0006, Bio-Rad). For SDS-PAGE analysis of affected and control fibroblast cells, 30 μg of protein was loaded per well on a 10% TGX 10-well 30μl/well gel (456-1033, Bio-Rad) and 5 μg protein for neurons. Molecular size of bands was detected with Protein Standard (#161-0374, Bio-Rad) and proteins loaded together with 4x Laemmli loading buffer (#161-0747, Bio-Rad) in 2-mercaptoethanol (#1610710, Bio-Rad). Gel was transferred into a nitrocellulose or PVDF membrane (Bio-Rad) and transferred using Trans-Blot Turbo system (Bio- Rad). The blot was washed with TBST and blocked in 5% milk/TBST for 1 hr in RT followed by primary and secondary antibody incubation steps.

  00812'<72',)0-675<

Cells were grown in standard cell culture conditions as described in relevant sections on cover slips. Cells were then washed three times with PBS, fixed with 4% paraformaldehyde for 15 min and washed three times with PBS. Cells were then permeabilized in 0.2% Triton X-100 in PBS for 15 min in room temperature. Coverslips were washed three times in PBST (0.1% Tween-20) followed by blocking in 5% BSA/PBST (#001-000-162, Jackson Immuno Research) for two hours in room temperature. Primary antibody in 5% BSA/PBST was incubated overnight in 4°C. Coverslips were washed with PBST for 15 min three times, followed by incubation in secondary antibody in 5% BSA/PBST for one hour in room temperature. Then cells were washed three times for 10 min and stained with Vectashield containing DAPI (#H-1200, Vector Laboratories).

  -'526'23<

Fluorescence imaging was done with Zeiss Axio Observer microscope (Zeiss, Germany) with a 20X air or 63X oil immersion objective and Apotome.2 with excitation/emission wavelengths (nm): blue (DAPI), 350/461; green (GFP), 498/516; and red (DsRed). Fibroblasts were imaged with Zeiss LSM880 IndiMo Axio Observer confocal microscope with DyLight 594 (GANP) imaged with laser 561 and Dapi imaged with laser 405 using the same exposure times and settings for all cell lines. Imaging objective was Plan-Apochromat 63x/1.4 Oil Dic M27.

46

    ,<&5-(-=%7-21

RNA in situ hybridization was performed on fixed fibroblasts using RNAscope Multiplex Fluorescent Reagent Kit Version 2 (#323100, Advanced Cell Diagnostics). Cells were first fixed in 10% neutral buffered formalin followed by dehydration in increasing concentrations of ethanol (50%, 70% and 100%) followed by rehydration. The samples were then labelled in decreasing concentrations of ethanol (70% and 50%) and washed in PBS. The probes PPIB (positive control, #313901), dapB (negative control, #310043), and ATXN7L3B (target probe, #804431), PRIMA1 (target probe #900601) were hybridized for 2 h at 40°C followed by amplification of the signal and developing the HRP channel with TSA Plus fluorophore Cyanine 3 dye (1:1500 dilution) (NEL744001KT, Perkin Elmer). The cells were counterstained with DAPI to label nuclei and mounted with ProLong Gold Antifade Mounting media (P36930, Invitrogen). The samples were imaged on 3DHISTECH Pannoramic 250 FLASH II digital slide scanner using a 40x objective.

  "5%160-66-21)/)'75210-'526'23<

Fibroblast cultures on glass slides were fixed with 2% PFA + 2% Glutaraldehyde (Sigma) for 30 min at 4°C, washed with PBS once and then fixed with 2% Glutaraldehyde O/N at 4°C. Fixed cells were then prepared according to standard procedures for TEM and images were captured with the Jeol 1400 TEM microscope at 80,000V at the Electron microscopy Unit of the Institute of Biotechnology.

  2/)'8/%502()/-1+2*!%' 5)+-212*

The structure of GANP in the Sac3 region was modelled based on the structure of this region in the S. cerevisiae TREX-2 complex, with high homology to the human complex. Visualization was generated using the CCP4MG software.

 6)48)1'-1+%1(&-2-1*250%7-'6

RNA was isolated as described above. RNA concentrations/RNA integrity were measured with Agilent TapeStation and Nanodrop. RNA measurements and mRNA sequencing were completed at Biomedicum Functional Genomics Unit with paired-end sequencing with Poly A binding beads (rRNA depletion for axons) and NEBNext Ultra Directional RNA Library Prep and sequenced with Illumina NextSeq High Output 2 x 75bp. Target number of reads was 45- 50M which was ensured by sequencing with Next Seq Mid Output flow cell (fibroblasts). For motor neurons (total and nuclei), the RNA-seq libraries were prepared using NEBNext Ultra II Directional (PolyA selection beads) kit and

47 Materials and methods  sequenced with NovaSeq SP 2x150bp reads (Illumina) at Biomedicum Functional Genomics Unit (FuGU). For axon-Seq, libraries were prepared using rRNA depletion method. The read count data was analyzed with DESeq2 (Love, Huber & Anders, 2014). R878H/R878H Sac3 mutant line was compared to the parental cell line, separately for each sample type (total RNA, nuclear RNA and axonal RNA). The raw data was processed through several quality control steps, detailed in the articles. Differential gene expression was analyzed using DESeq2, comparing disease status (Control vs affected individual, parental vs mutant). Splicing changes were analysed by KisSplice. All data was deposited to GEO.

 8/785)2*,80%1-1(8')(3/85-327)1767)0')//6 ,-! hiPSC were expanded and grown in Essential 8 or Essential 8 Flex media in vitronectin coated plates until confluent. The cells were dissociated with Versene/EDTA and processed for experiments. The cells were incubated in media containing Rocki/Revitacell supplement when needed. hiPSC were frozen in KOSR/10% DMSO.

 27251)8521(-**)5)17-%7-21

To differentiate hiPSC into motor neurons, we used an existing protocol with small modifications (Guo et al., 2017). In summary, confluent hiPSC were dissociated with 0.5 mM EDTA and cultured in Y/SB/LDN/CHIR for 2 days. Following five days, the cells were cultured in RA/SAG and for the next 7 days, BDNF and GDNF were added to the media. DAPT was added for days 9-17. Day 10 neurons were dissociated. Day 17 onwards until analysis, neurons were kept in BDNF, GDNF and CNTF. Neurons were analyzed for experiments on days 22-29. Briefly, to obtain neuroepithelial stem cells, hiPSC were dissociated with 0.5 mM EDTA and resuspended Neuronal basal medium (DMEM/F12/Glutamax, Neurobasal vol: vol, with N2 (Life Technologies), B27 (Life technologies), L-ascorbic acid 0.1 mM (Santa Cruz) and Primocin supplemented with 40OμM SB43154 (Merck), 0.2OμM LDN-193189 (Merck/Sigma), 3OμM CHIR99021 (Selleckchem), and 5OμM Y-27632 (Selleckchem). From day 3 on, 0.1OμM retinoic acid (Fisher) and 0.5OμM SAG (Calbiochem) was added to Neuronal Basal medium. From day 7 on, BDNF (10Ong/ml, Peprotech) and GDNF (10Ong/ml, Peprotech) were added. 20OμM DAPT (Calbiochem) was added on day 9. Motor neuron progenitor spheroids were dissociated into single cells for plating on day 11 by using Accumax (Invitrogen). Motor neuron progenitors were subsequently plated on poly-D- lysine 50 μg/ml (Merck Millipore) and laminin 10 μg/ml (SigmaAldrich)

48

coated 6 well plates for RNA sequencing (500 000 cells/well), 12 well plates for protein and RNA (200 000 cells/well) and 24 well plates with round coverslips for immunocytochemistry (75 000-100 000 cells/well). At day 14 retinoic acid and SAG were removed from media. From day 16 on, the cells were switched to motor neuron maturation medium supplemented with growth factors BDNF, GDNF, and CNTF (each 10Ong/ml, Peprotech). Media were changed every other day by replacing half of the medium. Developing motor neurons were isolated at earliest d22 of differentiation and expressed Isl1 and Hb9.

  ;21-62/%7-21*25 %1(3527)-1

Dissociated Accutase passaged single neurons d11 (500K) were plated in motor neuron differentiation media in 6 well hanging inserts with Boyden chamber device (1 μm pore, Falcon Cell Culture Inserts, #10289270, Thermo) coated on both sides with poly d lysine and only on bottom side with laminin. For protein extraction, three wells were combined as follows for nuclear and axonal compartments. Cells containing media were removed and scraped with 1 mL PBS into a conical tube on ice. The filter top was then washed 10 times with PBS strongly pipetting and then filter containing axons on bottom side was washed once with cold PBS. For RNA, PBS was removed by aspiration after spinning down and continued to RNA extraction (incubated in the first buffer of kit for 15 min after vortexing). Six wells combined into same conical tube. For protein, the filter was then placed on ice and lysed in 30 uL RIPA containing Halt protease inhibitor. After centrifuging the nuclear compartments for 7 min in 1000 rpm in cold centrifuge, PBS was removed and 30 uL RIPA with Halt protease inhibitor (#1861284, Thermo) was added. All samples were vortexed and incubated on ice for 15 min. The samples were then centrifuged for 10 min at 14 000 x g in a cold centrifuge and supernatant was removed for protein. Three wells combined into same conical tube. Protein concentration was measured with BCA assay kit (#23227, Thermo).

  8'/)--62/%7-21*25 %1(3527)-1

Neurons at d22 were washed once with PBS RT, lysed in homogenizing buffer (0.3 M sucrose, 1 mM EGTA, 5 mM MOPS, 5 mM KH2PO4, pH 7.4). Cells were scraped and an aliquot was removed for total cell extract for protein samples. Cells were run through 22 G syringe and needle 8 times; an aliquot was taken for trypan blue staining and cells were observed under microscope for lysis. Cells were then centrifuged for 15 minutes at 6500 rpm at 4°C. Supernatant formed the cytoplasmic extract and pellet the nuclear extract. RNA was isolated with Nucleospin RNA kit (#740955.50, Macherey-Nagel) and RNA concentrations were measured with Denovix and TapeStation. For RNA,

49 Materials and methods  nuclear and total cell extracts were lysed in 1 x RIPA with protease inhibitor, incubated on ice 5 min, centrifuged 14 000 x g for 10 min, removed supernatant for protein.

   ! %6+)1))(-7-1+ hiPSCs were cultured in feeder-free conditions in 10 cm culture dishes in TeSR-E8 (Stem Cell Tech) medium on SyntheMAX (2μg/cm2) (Corning #3535). For passaging cells were washed with PBS (no Mg/Ca2), incubated with Gentle Cell Dissociation Reagent (Stem Cell Tech) at 37°C for 3 min and added TesR-E8 media. Cells from a confluent 10 cm petri dish were dissociated with Accutase (Thermo) into single cells (1 million per electroporation). SpCas9 (final concentration of 20 μg/electroporation) ribonucleoprotein complexes (RNP) were formed with sgRNA (final concentration 20 μg sgRNA/electroporation) containing crRNA and tracrRNA and ssODN containing variant. Cells were electroporated with Amaxa 4D using (Lonza #AAF-1002B, #AAF-1002x) suspended in P3 Primary cell 4D-Nucleofector X Kit L (Lonza, V4XP-3012) and CA137 program. T7E1 assay was done on pooled cell population to assess cutting efficiency of the sgRNAs. For single cell cloning, cells were passaged into single cells with Accutase, and 500-1000 cells plated on 10 cm petri dishes. Single cells were washed and slightly detached from surface by incubation with Gentle Cell Dissociation Reagent. Colonies were then picked in TesR-E8 with Rock inhibitor under EVOS microscope into U-bottom 96-well plates containing the same media. Two identical plates with Geltrex or SyntheMAX were prepared for freezing and genotyping, respectively. Libraries tagged with i5 plate-specific and i7 well-specific Illumina compatible oligonucleotides were quantified by qRT-PCR-based KAPA Library quantification kit Illumina Platforms (#KK4824). Libraries were sequenced by MiSeq (Illumina). hiPSC karyotypes of the parental and edited clones were confirmed normal (46 XY) by g-banding.

  !7%7-67-'6

Unpaired parametric two-tailed t-test was used for qRT-PCR and RNA in situ hybridization experiments in which two biological groups were compared (affected individuals vs. controls). Fisher’s exact test was used for analyzing microscopy data. For intron distribution, testing Wilcoxon rank sum test with continuity correction (Mann-Whitney U test) was used. For analysis of GEO siRNA data, analysis was run in GEO2R comparing GANP knockdown and controls. For analysis of more than two groups, one-way ANOVA was used.

50

  

 9%5-%17681()5/-)%1)%5/<216)7 1)852/2+-'%/(-6)%6)63)'7580

In this study, we established biallelic variants in MCM3AP underlying recessively inherited CMT. Following our initial publication, the disease was named peripheral neuropathy with or without impaired intellectual development (PNRIID, #618124) by Mendelian Inheritance in Man (MIM).

 "!$!"*+()

MCM3AP-related disease is primarily axonal and predominantly motor neuropathy; disease begins in childhood and progresses into distal motor impairment, and difficulties in maintaining gait (Table 6). In some individuals, the syndrome initially presents with a generalized developmental syndrome, epilepsy or central dysfunction followed by development of neuropathy. Affected individuals have gradual loss of mobility, often leading to wheelchair dependency in childhood, and causing severe reduction in motor skills leading to significant disability (I, II). Most affected individuals additionally present with some degree of delay in intellectual development (14/24) or learning difficulties (2/24) (Karakaya et al., 2017, Kennerson et al., 2018, Schuurs-Hoeijmakers et al., 2013) (I, II). However, 8 of 24 individuals have normal cognitive ability (Kennerson et al., 2018). Third most common features are central nervous system related signs such as upper motor neuron signs, ataxia (4/24), cerebellar ataxia (2/24), encephalopathy or epilepsy (I, II). Brain MRI findings indicated temporal lobe signal changes in 2 of 24 patients, white matter changes were also seen (4/24) and other abnormalities in brain, basal ganglia or cervical regions (5/24) (I, II). The multi-system phenotype described in the affected individuals is consistent with potential secondary mitochondrial dysfunction. Sensory impairment such as altered pain or sensory thresholds or reduced sensory nerve action potentials are present in some affected individuals. Electrophysiological testing and nerve biopsy mostly indicate axonal sensorimotor neuropathy in the affected individuals (I, II), while individuals in one family have signs of demyelination (I). Additional signs seen in a portion of affected individuals include eye-related abnormalities such as strabismus (9/24), hearing problems (2/24), sleep apnoea (4/24), claw hands and foot deformities characteristic of CMT, and scoliosis of spine (2/24) (I, II). Most affected individuals who already reached adulthood also have accompanying non-neurological signs such as obesity (5/24), diabetes (2/24) and short stature (2/24) (Kennerson et al., 2018) or ovarian dysfunction in

51 Results and discussion  adulthood (I). Similar phenotypes in multiple reported patients thus confirm MCM3AP as the underlying genetic cause of PNRIID. Phenotypes linked to MCM3AP variants have since expanded by the identification of affected individuals who developed peripheral neuropathy and multiple sclerosis (Sedghi et al., 2019), and individuals with dominant periodic paralysis phenotype (Gustavsson et al., 2019). These findings indicate the role of GANP in a spectrum of neurological phenotypes, although most phenotypes identified so far are predominantly linked with neuropathy and ID. Interfamiliar variability is common in CMT. Therefore, it is not surprising that we found some affected individuals with demyelinating signs, whereas most individuals presented with purely axonal type disease. For example, variants in GDAP1 (Baxter et al., 2002, Claramunt et al., 2005, Cuesta et al., 2002, Nelis et al., 2002), NEFL (Jordanova et al., 2003) and MPZ (Shy et al., 2004) contribute to both CMT1 and CMT2. The involvement of both neuron and glia highlight the importance of interaction between the two cell types in pathogenesis in multiple forms of CMT. Overall, MCM3AP related disease phenotypes are fairly uniform, and neuropathy is always present.

 &*+5.8/9;.-8627*7=91.78=B92,/.*=>;.<8/9*=2.7=</;868>;<=>-B*7-=1. ;.98;=<52<=.-1.;. *;*4*B*.=*5  .77.;<87.=*5 %,1>>;< 8.236*4.;<.=*5 %.-012.=*5 

   !"  !  "  ! #" "!     ",5968046:68 >65(3964,:04,9  5,;867(:/? +,4?,305(:05.   5:,33,*:;(3+09()030:? 03+ :6 9,<,8, *6.50:065   4(?(396),5684(3   ,(8505.+09()030:? %09;646:6897,,*/  3(5.;(.,  :(>0( "64,:04,99,5968?68  *,8,),33(8   $77,846:685,;865 ()0592090.5  90.59   703,79? 8,9,5:0565,-(403?    8(0504(.05.-05+05.9 5, 7(:0,5: *,8<0*(3  ()5684(30:0,9  "7(9:0*0:? 8,9,5:0565,-(403?    ;3:073,9*3,86909 8,9,5:0565,-(403?    (0:     ?76:650(    ;9*;3(8 !,+;*,+ 68 ()9,5: 36=,8   304) 8,-3,>,9 4;9*3, =,(25,99 +0--0*;3:? ;905. /(5+9

52

 "*6306909    3(=/(5+9(5+-66: #?70*(36-# 64465 +,-6840:0,9  ,3(?,+=(3205. 59,: ),:=,,5  (5+   3(:,8:/(5  ?,(89 6- (., 964, 465:/9 7(:0,5:9 5684(3 (., 964,56=(3205.  6996-46)030:?   ?,(89 964, 9:033  =(3205.05(+;3:/66+    ),90:? 69:*64465565  5,;8636.0*(390.5   <(80(5+?9-;5*:065 ",,505964,63+,8  7(:0,5:9   "/68:9:(:;8,     ,(805.786)3,49    ":8()094;9 86:/,8,?,  ()5684(30:0,9  68:0**6(8*:(:065  

 $*!*+()

Affected individuals from several unrelated families were studied by WES or WGS (Figure 5) leading to the identification of biallelic variants in MCM3AP that segregated with the phenotype (I, II). The families were identified in different centers around the world and connected through GeneMatcher (I). We identified different compound heterozygous missense/nonsense variant combinations and homozygous missense variants. The homozygous variants affected conserved amino acids in the Sac3 domain of GANP, and compound heterozygous variants in other regions of the gene (I, II). Previously, a large genome sequencing study investigating causes of ID had identified siblings with a homozygous Sac3 domain variant (Schuurs-Hoeijmakers et al., 2013) as a potential disease cause. Following our publication (I), several mainly homozygous Sac3 disease-causing variants have been reported (Gustavsson et al., 2019, Karakaya et al., 2017, Kennerson et al., 2018, Sedghi et al., 2019) as well as a dominant Sac3 variant (Gustavsson et al., 2019). Altogether, MCM3AP variants have now been reported in 14 unrelated families with neurological phenotypes (Table 2). Biallelic loss-of-function variants in MCM3AP indicate that GANP loses its critical function in regulating gene expression in the nuclear pore needed for the export and expression of mRNAs. This could be particularly detrimental to peripheral neurons with long axons contributing to neuronal vulnerability. The family member who are carriers of heterozygous variants are asymptomatic, which suggests two copies of a disease allele are needed to develop the phenotype.

53 Results and discussion 

  *98/-2<.*<.,*><270?*;2*7=<27!#9;8=.27! *6278=.;627><;.9.*=< 91.7B5*5*727.05B,27.;.9.*=<$$ $!+27-27068=2/%*, <>99;.<<8;8/*,=27 "" ,*;+8AB5=.;627><

  -67-1'7)**)'7621&<9%5-%17/2'%7-21

We used affected individuals’ fibroblasts from multiple families to assess the correlation between genotypes and phenotype severity, and to gain a better understanding of the molecular mechanisms associated with MCM3AP disease (II).

 *)!$ %(("*-!* ),(!*/%!))& $%*/&

Fibroblast cell lines from nine affected individuals from families FIN, AUS, USA, NL1, NL2, EST, IR and the unaffected mothers of NL1 (NLM) and IR (IRM) were available (Figure 5) (II). In the fibroblast study, all other affected individuals carried compound heterozygous variants outside of the Sac3 domain, except the AUS individual and the IR individual who had compound heterozygous and homozygous Sac3 variants, respectively. We used modeling and functional studies to identify a genotype-phenotype correlation in MCM3AP disease. We observed that the impact on protein abundance at the nuclear envelope associates with variant location. Whereas compound heterozygous MCM3AP variant combinations outside of the Sac3 domain resulted in a drastic decrease in GANP protein at the nuclear envelope, Sac3 mutants did not affect localization or amount of GANP. To further investigate the disease mechanism of the Sac3 missense variants, we exploited the homology between the crystallized human GANP: PCID2: DCCI complex and the S. cerevisiae Sac3:Thp1: Sem1 complex in silico (II). PCID2/Thp1 and GANP/Sac3 both have in this region a (PCI) fold based on a series of helical tetratricopeptide repeats (TPR) underneath the winged

54

helix domain. Seven of the eight Sac3 domain variants identified in the affected individuals were located in the region that lies in the winged helix domain and the final one close to winged helix domain. The winged helix domain and the adjacent PCID2 bind to RNA indicating that disease variants could impair RNA binding. We also found that loss of GANP protein in the nuclear envelope associate with more severe motor phenotypes. The patients with compound heterozygous variants typically loose mobility earlier and can be wheelchair bound. They have severe delays in motor development and acquiring motor skills, and the disease progresses faster. On the other hand, Sac3 variants are predicted to affect mRNA binding but the associated motor phenotypes tend to be milder. This suggests that GANP has several roles in cellular function, and that GANP depletion causes a more severe loss-of-function contributing to the severe phenotype. The depletion of GANP seems to be particularly important for motor neuron function with their long peripheral axons, whereas other signs such as ID are seen equally in both variant categories. Understanding of the variant type and location will help in predicting the potential clinical outcome in these two subsets of phenotypes. Although a relatively large number of patients with variants have now been identified particularly considering the rarity of the disease, absolute genotype-phenotype correlations need to be established through more families.

   9%5-%176 ,%9) (-**)5)17 )**)'76 21 +)1) );35)66-21()3)1(-1+21')//7<3)

 $(*!%$ $ ,"!*!%$ %  +#$ $+(%$" )/)*# %( !))#%"!$ 

Variants in genes encoding for nuclear envelope proteins have been associated with neurological conditions such as intellectual disability and with cancer (Carey, Wickramasinghe, 2018). Mechanisms of nuclear mRNA export, however, remain unstudied particularly in relevant cell types such as neurons and in human model systems. Much of research on mechanisms of RNA metabolism, particularly mRNA export, have been conducted in yeast or rodent models. In recent years, understanding of mechanisms in human cells have been brought forward. However, there are few studies into mRNA export mechanisms in disease context using the cell type affected. To model disease mechanisms associated with the main cell type affected in MCM3AP associated disease, a human neuronal model system was generated (IV). hiPSC were reprogrammed from skin fibroblasts of two affected individuals carrying compound heterozygous variants P148*fs48/ V1272M from Family F. Additionally, homozygous Sac3 variant R878H was introduced by gene editing into wild type hiPSC (Sac3 mRNA binding variant).

55 Results and discussion  This model allowed comparison of isogenic lines for more robust understanding of disease variant. We attempted the editing of other patient variants as well, however, no clones corresponding to those variants were obtained from these experiments. Motor neurons were selected as the modelled neuronal type since all affected individuals shared the common clinical phenotype of peripheral neuropathy affecting motor neurons in the peripheral nervous system. hiPSC were characterized and differentiated into motor neurons in culture. Motor neurons were differentiated using a suspension culture approach followed by dissociating and culture on adherent surface. Developing motor neurons were matured for a minimum of 22 days and their gene expression of motor neuronal transcription factors and neuronal differentiation markers was validated by qRT-PCR. The relative levels of these markers were similar across cell lines and independent differentiations, suggesting that GANP variant did not impair the ability of the neurons to differentiate. The development of human disease models based on affected cell types will facilitate understanding of neuronal mechanisms and are particularly suitable for modeling monogenic diseases. For RNA sequencing experiments, we sequenced the transcriptomes of only gene edited neurons, since patient neurons had large numbers of progenitor cells in the cultures. Additionally, comparison between edited cell line to isogenic parental line yields expression differences independent of genetic background.

 ,(!$*)(#%"$+(%$" $.&())!%$

To understand how GANP defects affect gene expression in the cell type affected by the disease, we used the motor neuron model system described above for analyzing transcriptomic changes (III). Comparison of overall gene expression profiles with hierarchical correlation revealed highly different gene expression between the gene edited Sac3 mRNA binding mutant line and wild type neurons, with 7044 differentially expressed genes. In this study, the role of GANP in neuronal gene expression was studied for the first time. We sequenced the transcriptomes of differentiating motor neurons to identify the impact of GANP Sac3 mRNA binding mutant on neuronal gene expression signatures. The primary expected outcome was a block in mRNA export of selected genes based on studies on other human cancer cell types (Aksenova et al., 2020, Wickramasinghe et al., 2014). However, we did not find evidence of altered export of mRNA such as retainment of nuclear transcripts in mutant neurons. Nevertheless, we found that GANP had an unexpectedly high impact on the expression of genes involved in translation indicating alterations in RNA metabolic processes. Particularly, based on RNA sequencing, GANP variant depleted neuronal gene expression in genes crucial for neuronal signal transduction, synapses, mitochondria and axon development. This suggests that GANP regulates

56

neuronal development and function. In addition, GANP defect causes large compensatory effects in the transcriptome of motor neurons, as for example seen in enchanced expression of translation related genes. The gene expression changes related to tricarboxylic acid (TCA) cycle could partly explain the secondary mitochondrial dysfunction and multiorgan phenotype seen in affected individuals. The role of GANP in neurons seems to be somewhat different from other cell types beyond the function of mRNA export and could explain the tissue specificity seen in the phenotype. Effects of the acute reduction in GANP expression by siRNA and auxin inducible protein degradation approaches in previous models (Aksenova et al., 2020, Wickramasinghe et al., 2014) could explain differences compared to patient and gene edited neurons with chronic effects on GANP.

  ( +"*).&())!%$%!$*(%$")) $)

To understand the effect of GANP defects in fibroblasts, which was the available cell type from several patients, mRNA was sequenced of cell lines of affected individuals FIN1/2, AUS, USA, NL1 and NLM and two unrelated healthy individuals (II). To compare affected individuals’ overall gene expression to controls’, we compared the transcriptomes by hierarchical correlation analysis. The analysis revealed a correlation of variant type (compound heterozygous or Sac3 variant). The compound heterozygous cell lines clustered closest together, whereas the Sac3 variant cell line clustered separately. In this analysis, a very small number of differentially expressed (DE) genes were identified in affected individuals compared to healthy controls. 51 downregulated and 110 upregulated genes were shared in affected individuals in this cell type. The low number of DE genes was likely partially related to the non-isogenic background of the affected and healthy individuals as well as fibroblast not being the primary cell type affected. The other main finding identified in fibroblasts was the intron content dependency of gene expression. We examined the number of introns in the identified DE genes since GANP has been suggested to be selectively involved in the export of rapidly regulated genes, which are shorter and contain less introns than most genes (Wickramasinghe et al., 2014). One of the intronless genes identified in our fibroblast transcriptome data was ATXN7L3B, encoding ATXN7L3B, which interacts with ENY2, an adaptor protein in both the SAGA and TREX-2 complexes. We also identified in a GANP siRNA GEO dataset of a previous publication a downregulation of ATXN7L3B in HCT-116 cells (Wickramasinghe et al., 2014), and that ATXN7L3B was also downregulated in a recent report of a degron system induced GANP depletion (Aksenova et al., 2020). In the sequencing of neuronal RNA, we identified ATXN7L3B again as downregulated. This indicates that ATXN7L3B is regulated by GANP and is the sole consistent disease signature gene across different models of GANP depletion/variant and in various cell types. This

57 Results and discussion  seems to be a specific target of GANP since no changes in other TREX-2 or SAGA complex subunits were identified. Interestingly, we did not identify differential expression based on intron content in the neuronal transcriptomes, which suggests that gene expression mechanisms are different in neurons than other cell types (II-III). It also suggests that GANP has a more widespread effect on gene expression in neurons, but a more selective impact in other cell types. The large effect on gene expression was also suggested as a potential effect, since GANP was thought to participate in regulation of genes that control gene expression.

 *"*()$+"(# %#&%)!*!%$!$$+(%$)

In this study we seeked to understand the role of compartmentalized transcriptome in our mutant neurons in particular to understand protein synthesis in axons (III). We isolated the nuclear and axonal compartment of Sac3 mRNA binding mutant and parental line neurons to study effects on gene expression and splicing. Our results showed that 2225 genes were uniquely altered in the nuclear fraction, 4832 genes that were shared with total neuron population and 1701 genes unique to the total sequencing data. This indicates that in the nuclear fraction, we can identify unique genes that are enriched. In addition, the analysis revealed frequent splicing changes, approximately fivefold more by KissSplice analysis than in total neuron population. Principal component analysis showed similar clustering by genotypes in both total and nuclear RNA sequencing. We identified a small set of genes that were downregulated in the axonal dataset but upregulated in nuclear fraction of mutant neurons. Among these was proline rich membrane anchor 1 (PRIMA1), which functions in neurons and a variant in the gene was found in a family with nocturnal frontal lobe epilepsy (Hildebrand et al., 2015). In this research, the crucial role of GANP in gene expression in neurons was identified. Whereas it was previously shown in other cell types that GANP may participate in selective export of a small number of genes, our result shed new light on GANP function in human neurons, which is the affected cell type in the disease. GANP could play a role in selective mRNA export in other cell types as shown in previous research and in our studies gene expression in non- neuronal cell types, however, in neurons GANP seems to play a different role and drastically alter the transcriptome. Similar gene expression preference by affected cell type in neurons and muscles was found in Drosophila melanogaster study small bristles (sbr) (human NXF1), which interacts with GANP (Korey et al., 2001). In this work, we have shown that GANP has a different role in neurons than in other cell types studied so far. In other cell types GANP regulates selective export as shown previously, and in neurons GANP dysregulated approximately 40% of the genes expressed in our dataset, implicating it in processing various cellular pathways and particularly controlling translation

58

related genes as part of compensatory response. In particular, these findings could partly explain why the phenotype is neuron-specific although RNA metabolism defects are seen in other cell types. The neuronal phenotype could also be explained by axonal translation defects in the long axons of the PNS. Selective defects are seen in other cell types, whereas global alterations of gene expression could explain the neuronal predominance of the disease. This suggests that GANP defects in the Sac3 domain might cause only partial mRNA export defect that does not present as a nuclear retention of mRNAs, although we identified a larger effect on RNA metabolism in compensation of gene expression. The proposed role of GANP in selective mRNA export could be important also in neurons (Wickramasinghe et al., 2014), since GANP reshapes gene expression by dysregulating genes involved in multiple pathways including mitochondrial functions important for axons. As nucleocytoplasmic transport is important in many neurological diseases from motor neuron diseases to epilepsy, understanding of the role of key export factors such as GANP could improve development of therapies beyond treatment of PNRIID.

59 Conclusions and future directions     

Understanding of human genetic disease variants has dramatically increased over the past decades. Next generation sequencing methods such as exome sequencing have improved the detection of genetic defects and enabled identification of increasing number of monogenic human disease genes. This is particularly important in rare diseases, where molecular diagnosis is lacking for a large number of affected individuals. Identification of novel disease genes and their molecular mechanisms in relevant model systems could facilitate further preclinical studies and personalized therapies. In this thesis, a novel disease gene MCM3AP (GANP) was identified in individuals with peripheral neuropathy and intellectual disability. Here, genotype-phenotype correlations were described for variants depending on their location and effect on GANP protein. One of the aims of this work was to identify, for the first time, the role of GANP in human neurons, for which a motor neuron model system was used. Previously, GANP was shown to regulate TDP-43 toxicity in fly motor neurons (Sreedharan et al., 2015), being the only link between GANP and motor neurons prior to this study. The results implicate that there is a clear association with depletion of GANP protein and a more severe clinical phenotype seen in affected individuals. On the other hand, variants in the mRNA binding domain Sac3 are suggested to cause faulty binding to mRNA. We identified that these defects in GANP contribute to vast remodeling of gene expression in human neurons. This indicates that GANP is a critical regulator of gene expression of genes involved in various cellular processes, in particular translation compensatory increase and neuronal development. In other cell types, GANP selectively affects gene expression of a subset of genes. In summary, this thesis work provides understanding of the phenotype and genotype of MCM3AP neuropathy as well as disease mechanisms in human neuronal model system. Several open questions remain. Firstly, what are the common disease mechanisms across multiple different variants in the disease. Here, we explored the effects of mRNA binding patient variant in Sac3 domain and C-terminal compound heterozygous variants in motor neurons. However, further understanding of disease consequences due to other variants would be helpful to form an overall image of GANP function in neurons. Secondly, what is the neuromuscular pathology like in neuronal co-cultures, since the disease could potentially affect neuromuscular junctions and is there a disease phenotype in these co-cultures. Thirdly, what are the roles of GANP in other cell types such as stem cells, since components of the gene expression machinery were previously suggested to participate in regulation of pluripotency. The patient fibroblast model system has elucidated some of these, but the neuronal system generated here could be used to further study

60

the effect of other variants. This model system could be used to identify candidate genes to therapeutically target GANP defects for personalized treatments. Various molecular mechanisms are known to be involved in CMT, however few genes affecting gene expression dynamics have been identified. In this work, RNA metabolism defect and resulting compensation in neurons was identified as the primary disease signature. These results could help understand the underlying disease signatures in other CMT types and beyond in other neurological diseases. Examples of these include synaptopathies, epilepsy and ALS where pathogenic variants are linked with disturbances in RNA metabolism. The increased understanding of mechanisms in human neuronal model system could be used as a basis for developing treatments for synaptopathies since many of the underlying gene expression changes are linked with this group of diseases.

61 Acknowledgements    

This work was conducted in Stem Cells and Metabolism Research Program at the Faculty of Medicine, University of Helsinki 2017-2020. Gene editing part of this thesis was completed in Sanger Institute and King’s College London during 2018-2019. I would like to thank the following funding agencies for their financial support: Doctoral Programme Brain & Mind, University of Helsinki, Academy of Finland, Instrumentarium Science Foundation, HUS Helsinki University Hospital, Emil Aaltonen Foundation, Sigrid Juselius Foundation, Maud Kuistila Foundation, Lastentautien tutkimussäätiö. I am in gratitude to my supervisor, Dr. Henna Tyynismaa for mentoring me during these years. During this period, I have found my passion for translational science that makes an impact in patients’ lives. Thank you for discussions and guidance that not only improved the science but were inspirational and helped me grow into an independent researcher. I wish to thank my second supervisor Dr. Emil Ylikallio for sharing my passion for helping patients suffering from this rare disease. Both of you have helped me found what I am truly inspired to pursue in my scientific career. Dr. Jemeen Sreedharan, thank you for accepting me into your group for a wonderful research experience and for long-lasting collaboration and friendship. I have learnt so much during my year in London and I will never forget this experience. I admire your dedication to your patients, researchers and exceptional quality of your group’s research. Thank you for taking the time to mentor me and better my research with suggestions and thoughts. Dr. Matthew White - thank you for your dedication to help me in my project and for wonderful scientific discussions. Thank you both for your collaboration and friendship. Thank you for great times in the lab and beyond: Dr. Michael, Dr. Selina, Dr. Francesca, Dr. Emi, Ella and Camille. Thank you Dr. Andrew Bassett for allowing me to learn in such a stimulating environment with your excellent guidance and advice throughout my project. The opportunity to collaborate with you was very significant for the project and my training. I would like to thank Prof Dr. Johanna Uusimaa and Dr. Vihandha Wickramasinghe for agreeing to review my thesis and for useful comments to improve it and Professor Marina Kennerson for its public discussion. Thank you Dr. Merja Voutilainen and Dr. Thomas McWilliams for your guidance and multiple perspectives as part of the thesis committee, advice and encouragement during this work. Thank you also all old and current members of the lab – Miya, Tiina, Taru, Pooja, Jouni, Jana, Riitta, Svetlana K, Ruben, Sandra, Matilda, Markus, Jay, Nirajan, Mugen, Elina, Nelli, Svetlana M, Veijo, Julius J, Julius R and Sebastian.

62

My deepest gratitude to all collaborators – clinicians who contributed to genotype-phenotype characterization, modeling experts, laboratory scientists in Finland and internationally. Special thanks to Dr. Murray Stewart, Dr. Pirjo Isohanni and Dr. Tuula Lönnqvist. My hope is that this disease will have a cure one day and together we have learned so much about this disease. I would like to thank the patients and their families for taking part in this research. I hope the research will generate more research projects around the disease and be a small part in finding a therapy in the future. Thank you Michael and my parents for supporting me during these times.

Rosa Woldegebriel, Helsinki 2021

63 References   

 

69 #IK5<5F5# 1CG<=85% +INI?=% ,9F5G5?= %5HGIC1  ,5?5<5G<= ! +5?5;I7<=&  +HFI7HIF99LDF9GG=CB5B87

?G9BCJ5. +A=H< $99 <5H( GB5I@H <9B+ !69B" #5I:

A=CHH  $CHH( +CHC" #5B;(  %75::9FM" % =%5IFC+ 69@   @5B=;5B# % $5KGCB. +<5K" %  %=HC7

5@C<* +7

5FF9HH$ / @9H7<9F+ /=@HCB+   *9;I@5H=CBC:9I?5FMCH=7;9B9 9LDF9GG=CB6MH<9IBHF5BG@5H98;9B9F9;=CBG5B8CH<9FBCB7C8=B;9@9A9BHG ,33<3(9(5+463,*<3(930-,:*0,5*,:#JC@ BC  DD   

5LH9F* . 9B'H=7<"   I@9HH9 9K #B=;

=F8,   <5F7CH%5F=9,CCH<%, 9F98=H5FM&9IFCD5H:"98G % ( 85A F8=B;9F*  (5;CB9H5@ -B=J9FG=HMC:/5G<=B;HCB+95HH@9  9B9*9J=9KG=G5F9;=GH9F98HF589A5F?C: H<9-B=J9FG=HMC:/5G<=B;HCB+95HH@9 @@F=;

C9

CBB9 *=J=9F  ,5,;()3,6-5,<964<:*<3(9+0:69+,9: J5=@56@9

64

CHGH9=B *=G7<&  =G7CJ9F=B;;9BCHMD9GIB89F@M=B;

'1&+   ,<97<9ACH57H=79::97HC:A=LHIF9GC:5BH=6C8M5B85BH=;9BCB DC@MACFD

FIBHF59;9F% MFB9% $CB;# 5GG9HH *  8=H=B;H<9 9BCA9C: IA5B!B8I798(@IF=DCH9BH+H9A9@@G-G=B;*!+(* 5G*=6CBI7@9CDFCH9=B CAD@9L9G,;/6+:05463,*<3(9)0636.@30-;65 JC@   DD    

I7

IF?5F8# , IH@9F" +  BI7@95F9LCBI7@95G9=BJC@J98=BA*& 89;F585H=CB=BH9F57HGK=H<(C@MDC@MA9F5G95B8H<9

5>=;5G! " ,IG<9J /=@@, " HCA=97?+ I9FGH& +7

5F9M# , /=7?F5A5G=B;<9. '  *9;I@5HCFM(CH9BH=5@C:H<9*& (FC79GG=B;%57<=B9FM!AD@=75H=CBG:CF IA5B=G95G9$9,5+:05.,5,;0*: $JC@ BC DD    

5FH9F* FCI=B  ,<99JC@IH=CB5FMF5H9GC:9I?5FMCH=7*& DC@MA9F5G9G5B8C:H<9=FHF5BG7F=DH=CB:57HCFG5F95::97H986MH<9@9J9@C: 7CB79FH989JC@IH=CBC:H<9;9B9GH<9MHF5BG7F=6963,*<3(9)0636.@(5+ ,=63<;065JC@  BC  DD     

<5A69FG+ % 5G5BC  (5D5D9HFCI ( ,CA=G<=A5% +589@5=B%  +HI89F$   =;<@M9::=7=9BHB9IF5@7CBJ9FG=CBC:

<9B 9HA9F+  K5@8 "  F=::=B  F5G9F+  <5B    %=HC:IG=BG%:B 5B8%:B 7CCF8=B5H9@MF9;I@5H9A=HC7

=CB=" % $=B" )  C@H9FA5BB . #CDD9FG% "5?C6G%  N=N=  ,IFB9FF=8;9F +<=;9C?5, F5BN9#  5FF=G/   C@H    $5H9B8CGCA9G7H5GA*&,F5BG@5H=CB(@5H:CFAG5B8+IGH5=B %=HC7

65 References  @5F5AIBH* (98FC@5$ +9J=@@5, $CD9N89%IB5=B 9F7=5BC" I  +5B7<9N&5J5FFC %=@@5B" % +5=:= % $IDG?=" * .=@7<9N" "  GD=BCG (5@5I   9B9H=7GC:<5F7CH%5F=9,CCH<8=G95G9HMD9 AIH5H=CBG=B<9F=H5B79D<9BCHMD=7J5F=56=@=HM5B8:CIB89F9::97H6<95(3 6-4,+0*(3.,5,;0*:JC@  BC DD  

C@;5B  %5B@9M" $  %97<5B=GA5B8F9;I@5H=CBC:A*& DC@M589BM@5H=CB,5,:+,=,3674,5;JC@  BC  DD    

CCD9F,  /5B$ F9M:IGG  *&5B88=G95G9,33JC@  BC  DD  

CF89G.  *9=89B657<+ F5B?9/ /   =;<7CBH9BHC:5BI7@95FDCF9 7CAD@9LDFCH9=B=B7MHCD@5GA=75BBI@5H9@5A9@@59C:09BCDIGCC7MH9G <967,(516<95(36-*,33)0636.@JC@ BC DD     

I9GH5 (98FC@5$ +9J=@@5,  5F7=5(@5B9@@G" 

I@@9B *  .=F5@*&G@9GGCBG:FCAH<99B9AM,33JC@  BC DD    

IFH=G %5FH=B" *  569@ +=8IFM,33:;,4*,33JC@  BC  DD   9 

B;9@C%  *5=79G% (5BCKG?=+  9HN9F% /  ;989D9B89BH 89H9F=CF5H=CBC:BI7@95FDCF97CAD@9L9G75IG9G5@CGGC:BI7@95F=BH9;F=HM=B DCGHA=HCH=779@@G,33JC@  BC  DD    

5J=G " F58@9M/ %58F=8*  ,<9D9FCB95@AIG7I@5F5HFCD

89/59;<+ F58M+ ,   @H9F98G@CK5LCB5@HF5BGDCFH5B8F9;9B9F5H=CB =B5AM9@=B89:=7=9BHAIH5BHACIG9H<9HF9A6@9F5G5B=BJ=JCAC89@:CF +7

9FH=  5FF9HHB;9@9( %57=G557#  +H9J9BG*  +F=F5A+ <9B*  *C<@  "C

66

F9M:IGG %5HIB=G% " (=BC@*CA5+ IF8  6-06*/,40:;9@JC@  DD    

*'2 $$'& (  LCB5@%=;F5H=CBC:(FCH9=BG=BH<99BHF5@ &9FJCIG+MGH9A5B8(9F=D<9F5@&9FJ9G5GG

I6CIF;' NN98=B9 .9FBM IFCG=9F =FCI?&  CI=89F* +5@=< % CI

M7?( " $5A69FH  $CK9FACHCF5B8DF=A5FMG9BGCFMB9IFCB 8=G95G9GK=H<D9FCB95@AIG7I@5F5HFCD

&'(FC>97HCBGCFH=IA B=BH9;F5H989B7M7@CD98=5C:&9@9A9BHG =BH<9

J5B;9@=GH5 % %5;@CHH*CH< +H=9F@9% F=BC$ +CIHC;@CI ,CF5$  ,F5BG7F=DH=CB5B8A*&9LDCFHA57<=B9F=9G+ 5B8,*0  A5=BH5=BACBCI6=EI=H=B5H98 65@5B79F9EI=F98:CF&F9D5=F$/, 6<95(36-*,33)0636.@JC@  BC  DD   

C;9@ $ +5HM5%IFH=+ C<9B  @=B=75@9LCA9G9EI9B7=B;=B B9IFC@C;=78=G95G9 ,<9636.@3050*(379(*;0*,JC@ BC  DD    

5@@5F8C% $IB5* F8>IA9BHFCA5;9 ,9ADGH( ;I=@9F5   &56 D5B8H<9,

=@=GG9B  C=G7<9B FIBB9F .9@HA5B"   =G95G9;9B9 =89BH=:=75H=CBGHF5H9;=9G:CF9LCA9G9EI9B7=B;<967,(516<95(36-/<4(5 .,5,;0*:JC@  BC DD   

C< 

CC85@@ " /=7?F5A5G=B;<9. '  *&=B75B79F (;<9, 9,=0,>:(5*,9 

CF@=7< #IH5M-  ,F5BGDCFH69HK99BH<979@@BI7@9IG5B8H<9 7MHCD@5GA55<(3",=0,>6-,33(5+,=,3674,5;(30636.@JC@  DD    

67 References  FNM6CKG?5    IA5B=BHFCB@9GG;9B9G:IB7H=CB5@;FCIDG5GGC7=5H98 8=G95G9G9JC@IH=CB5B8A*&DFC79GG=B;=B56G9B79C:GD@=7=B;06*/,40*(3 (5+)067/@:0*(39,:,(9*/*644<50*(;065:JC@  BC  DD   

IC/ &5I>C7?% IA5;5@@=$ .5B8CCFB9, 55HG9B( CCB*  'F8CJ5G$ (5H9@ /9@H9FG% .5BK9@89B,  99BG& ,F=7CH,  9BCM. +H9M59FH" $9:96JF9'A5F C9GA5BG/ "5FD9%  +H9FB97?9FH" /9;B9F (9HF=+ C<@ .5B89B9F;<9( *C669F97

IF=5 ,F5B  *5A57<5B8F5B+ #C7< @CIB?5F=' IHH5(  5IG9F ,5AIF5,  !89BH=:=75H=CBC:A*&GH<5H5F9GD@=7986IH BCH9LDCFH98HCH<97MHCD@5GA=BH<956G9B79C:, '=BACIG99A6FMC :=6FC6@5GHG"  ,>'692 'JC@  BC DD    

IGH5JGGCB C@@9HH" 5FF9F% 5G@M"  5A=@MK=H<DF=A5FM D9F=C8=7D5F5@MG=G5B85AIH5H=CB=B%%(5;9B9=AD@=75H98=BA*& HF5BGDCFH<:*3, ,9=,JC@  BC DD    

5A5@5=B9B* %5BB=B9B, #C=JIA5?= #=G@=B% 'HCB?CG?=,  +ICA5@5=B9B  ,=GGI95B879@@HMD9GD97=:=7A5B=:9GH5H=CBGC: <9H9FCD@5GA=7AH&  AIH5H=CB=B

9@A@=B;9F ,CF5$  +<5F=B;H<9+ $9,5+:05)06*/,40*(3 :*0,5*,:JC@  BC  DD   

9FC@8 ,9=L9=F5$ !N5IFF5@89   9BCA9K=895B5@MG=GC:BI7@95F A*&9LDCFHD5H

=@896F5B8% + ,5B?5F8*  5N=B5 . 5A=5BC"  $5KF9B79# % 5<@ *9;5B % +<95F9F  +A=H<* " %5F=B=  I9FF=B=* $565H9   5A65F89@@5 ,=BID9F( $=7<9HH5$ 5@85GG5F=+ =GI@@=  (=DDI77=, +7<9::9F!  *9=8  (9HFCI+ 5<@C% 9F?CJ=7+   (*!% AIH5H=CB5B9K75IG9C:BC7HIFB5@:FCBH5@@C699D=@9DGM 55(3:6-*3050*(3(5+;9(5:3(;065(35,<9636.@JC@  BC DD   

=@@  +CFG7<9F "  ,<9BCBF5B8CA8=GHF=6IH=CBC:=BHFCB@9GG

C9@N 96@9F / @C69@  ,<9GHFI7HIF9C:H<9BI7@95FDCF9 7CAD@9L55<(3",=0,>6-06*/,40:;9@JC@  DD   

68

GI(  +7CHH  /9=BGH9=B"  *5B  #CB9FA5BB+ ;5FK5@5. $= 1 =B9 " /I0 +<5@9A' F58=7?, " %5FF5::=B=$  5C  2<5B;  &H5F;9H=B;GD97=:=7=HMC:*&;I=8985GBI7@95G9G (;<9,)06;,*/5636.@JC@  BC DD   

I5B;1  5HHCB=* +H9J9B=B" +H9=HN"   +*GD@=7=B;:57HCFGG9FJ95G 585DH9FDFCH9=BG:CF,(89D9B89BHA*&9LDCFH63,*<3(9*,33JC@  BC  DD  

I8=G<$ ! I65? ,F=C@C, % &=9A9M9F + +IGG9@$ &5;9@%  ,5@=5:9FFC" % *IGG   %C89@=B; MDCL=5!B8I798&9IFCD5H<=9G -G=B;55GH5B8+75@56@9 IA5B%CHCF&9IFCB=::9F9BH=5H=CB+MGH9A#;,4 *,339,769;:JC@  BC DD    

IHH9B+ -G@I9F+ CIF;9C=G +=ACB9HH= '89< % 5F9 %  NIDD5%  FIG?5(@C7<5B%  C:K969F% (C@MA9B=8CI% +

!BH9FB5H=CB5@ 5D%5DCBGCFH=IA ,<9!BH9FB5H=CB5@ 5D%5D(FC>97H (;<9,JC@  BC DD  

!BH9FB5H=CB5@ IA5B 9BCA9+9EI9B7=B;CBGCFH=IA =B=G<=B;H<9 9I7

"5B= $IHN+  IFH $5G?9M*  +H9K5FH% /=7?F5A5G=B;<9. '  IB7H=CB5@5B8GHFI7HIF5@7<5F57H9F=N5H=CBC:H<9A5AA5@=5B,*0  7CAD@9LH<5H@=B?GHF5BG7F=DH=CBK=H<BI7@95FA9GG9B;9F*&9LDCFH <*3,0* (*0+:9,:,(9*/JC@  BC  DD   

"5B= $IHN+ %5FG<5@@& " =G7<9F, #C<@9F @@=G8CB %  IFH  +H9K5FH%  +IG 87 5B8H<9+57!F9;=CB:CFA57CBG9FJ98 =BH9F57H=CBD@5H:CFAH<5HDFCACH9GBI7@95FDCF95GGC7=5H=CB5B8A*&9LDCFH 63,*<3(9*,33JC@ BC DD   

"C + 

"CB9G $ )I=A6M   CC8" # 9FF=;BC( #9G<5J5( +=@J9F(   CF69HH  +A5M@=B?BI7@95FDFCH9=B9LDCFHHC79@@7M7@9 DFC;F9GG=CB!96*,,+05.:6-;/, (;065(3*(+,4@6-#*0,5*,:6-;/,%50;,+ #;(;,:6-4,90*(JC@ BC DD    

"CF85BCJ5 9"CB;<9( C9F?C9@  ,5?5G<=A5 9.F=9B8H  9IH9F=7? %5FH=B" " IH@9F! " %5B7=5G( (5D5GCNCA9BCG+   ,9F9GDC@G?M (CHC7?=$ FCKB / +

69 References  B9IFC:=@5A9BH@=;

#5F5?5M5% %5N5<9F=& (C@5H! <5FI7<5 C969@ CB?9FJCCFH+  %5FCC:=5B* +<5F=5H=  C9@?9F! %CB5;<5B# /=B7<9GH9F+ 2CF=*  5@9<85F= CBB9A5BB 1=G- /=FH<  =5@@9@=7 %%(AIH5H=CBG75IG9<5F7CH%5F=9,CCH<B9IFCD5H

#5H5<=F5"  &I7@95F9LDCFHC:A9GG9B;9F*&,5,:JC@ BC  DD    

#5H5<=F5"  A*&9LDCFH5B8H<9,*07CAD@9L06*/040*(,; )067/@:0*((*;(JC@   BC DD    

#9BB9FGCB% $ CF69HH  @@=G% (9F9N+=@9G &=7

#=A " #=A&  /5B;1  +75F6CFCI;<  %CCF9" =5N2 %57$95 # + F9=65IA $=+ %C@@=9L #5B5;5F5> ( 5FH9F* CM@5B #  /C>H5G % *589A5?9FG* (=B?IG" $  F99B69F;+   ,FC>5BCKG?=" ) ,F5MBCF " +A=H< & ,CDD+  ?5N= + %=@@9F"  +<5K  #CHH@CFG% #=FG7

#CF9M  /=@?=9 5J=G! .5B.57HCF  GA5@@6F=GH@9G=GF9EI=F98 :CFH<9ACFD

#CFB6@=5F8=B (9HF=@@C %IBCN% "  @H9FB5H=J9GD@=7=B;5D=JCH5@GH9D69HK99B9I?5FMCH=7HF5BG7F=DH=CB5B8 HF5BG@5H=CB (;<9,9,=0,>:63,*<3(9*,33)0636.@JC@  BC DD    

#CIH9@CI  =FG7< $ 9BH+ 1  %I@H=D@9:579GC:H<9+  7CAD@9L<99,5;670506505*,33)0636.@JC@  BC DD  

#F5A9F& "  5B9M% + %CF;9BG / "CJ=7=7 CIH

#FC@G% I@HMB7? "5BGG9BG+  *%=HC7

70

#FC@G% J5B!GH9F859@ GG9@69F;< #F9A9F $=DD9BG+ ,=AA9FA5B . "5BGG9BG+  %=HC7

#IFG<5?CJ5% % #F5GBCJ & #CDMHCJ5 . +<=8@CJG?==1 . &=?C@9B?C " . &56=FC7

#IK5<5F5# 1CG<=85% #CB8C +5?5H5 /5H5B5691 69 #CIBC 1 ,CA=M5GI+ I>=AIF5+ ,C?I<=G5, #=AIF5 N5?=,  +5?5;I7<=&  BCJ9@BI7@95FD

#K=5H?CKG?=, " "FCG7C  $97@9F7 $ ,5AF5N=5B .5B89F6IF; *  *IGG 5J=G  =@7'692 'JC@  BC  DD    

$5B; CBB9H" -A@5I: #5FAC8=M5# #C::@9F" +H=9F@9% 9JMG  ,CF5$  ,<9H=;

$99" ( "9M5?IA5F%  CBN5@9N* ,5?5<5G<= $99( " 59?*  @5F?  *CG9 I @5F?9" %7#9F7<9F+ %99F@CC" %I@@9F " (5F? # ! IHH9FG,  K9?*  +7

$99, ! 1CIB;*   ,F5BG7F=DH=CB5@F9;I@5H=CB5B8=HGA=GF9;I@5H=CB=B 8=G95G9,33JC@   BC DD    

$9= =5G ( *998*  LDCFH5B8GH56=@=HMC:B5HIF5@@M=BHFCB@9GG A*&GF9EI=F9GD97=:=77C8=B;F9;=CBG9EI9B79G5B8H<9,*0A*&9LDCFH 7CAD@9L!96*,,+05.:6-;/, (;065(3*(+,4@6-#*0,5*,:6-;/,%50;,+ #;(;,:6-4,90*(JC@  BC DD    

$9K=G (  F99B*  F9BB9F+   J=89B79:CFH<9K=89GDF958 7CID@=B;C:5@H9FB5H=J9GD@=7=B;5B8BCBG9BG9A98=5H98A*&8975M=B

71 References  $=/ H5B5GGCJ + $5B0 %C<5B*  +K5BGCB+ # 5FF=5 , @CF9BG $ /5G<6IFB% ( /CF?A5B" $ 9BH+ 1 *  MHCD@5GA=7 ,0&$!BH9F:9F9GK=H<&I7@95FIB7H=CBGC:H<9+ 9I6=EI=H=B5G9 %C8I@963,*<3(9(5+*,33<3(9)0636.@JC@ BC  DD    

$=B1 $=2 'NGC@5? #=A+ / F5B;CF;CHM $=I, , ,9B9B65IA +  5=@9M, %CB5;<5B ( %=@CG( % "C

$CJ9% !  I69F/ B89FG+  %C89F5H989GH=A5H=CBC::C@87<5B;95B8 8=GD9FG=CB:CF*&G9E85H5K=H<+9E ,564,)0636.@JC@  BC   DD      

$IC% " *998*  +D@=7=B;=GF9EI=F98:CFF5D=85B89::=7=9BHA*& 9LDCFH=BA9H5NC5BG!96*,,+05.:6-;/, (;065(3*(+,4@6-#*0,5*,:6-;/, %50;,+#;(;,:6-4,90*(JC@ BC  DD    

%5;M$ %5H<=G+ $9%5GGCB  C=N9H ,5N=F% .5@@5H" %   -D85H=B;H<97@5GG=:=75H=CBC:=B<9F=H98B9IFCD5H<=9G*9GI@HGC:5B =BH9FB5H=CB5@GIFJ9M ,<9636.@JC@  BC  DD 9 9 

%5>9KG?=" +7

%5?9@59B;G( "5FJ=B9B& .ICD5@5# +ICA5@5=B9B !;B5H=IG" +=D=@5 %  9FJ5* (5@CH=9 (9@HCB9B$  GG=;BA9BHC:H<98=G95G9 @C7IG:CF@9H<5@7CB;9B=H5@7CBHF57HIF9GMB8FCA9HC5F9GHF=7H98F9;=CBC: 7

%5FH=B-  ,<9F5D9IH=7DD@=75H=CBC:(@IF=DCH9BH+H9A9@@G<5@@9B;9G 5B8*=G?G965;0,9:054,+0*05,JC@ DD   

%5FH=B/ #CCB=B .  !BHFCBG5B8H<9CF=;=BC:BI7@9IG7MHCGC@ 7CAD5FHA9BH5@=N5H=CB (;<9,JC@  BC   DD   

%5GHCB  J5BG+ #  F99B% *  ,F5BG7F=DH=CB5@F9;I@5HCFM 9@9A9BHG=BH<96-.,5640*:(5+/<4(5 .,5,;0*:JC@ DD   

%5HIB=G $ %5HIB=G% " F9M:IGG  GGC7=5H=CBC:=B8=J=8I5@

%5IFM1 CA9" (=G?CFCKG?=*  +5@5<%C<9@@=6=& <9J5@9MF9.  (9G7<5BG?=% %5FH=B5H &989@97+  CA6=B5HCF=5@5B5@MG=GC:

72

89J9@CDA9BH5@7I9G9::=7=9BH@M7CBJ9FHG

%7CB5@8" / $=I0 2 )I1 $=I+ %=7?9M+ # ,IF9HG?M  CHH@=96  ! IF98F5HGD=B5@7CF8 (;<9,4,+0*05, JC@ BC  DD     

%=@@9F " *CG9B:9@8H $ 2<5B; $=BB5B9 / &5;@9M(  (F97=G9 89H9FA=B5H=CBC:A=HC7

%CFC=5BI"  &I7@95F=ADCFH5B89LDCFHHF5BGDCFH:57HCFGA97<5B=GAG5B8 F9;I@5H=CB90;0*(39,=0,>:05,<2(9@6;0*.,5,,?79,::065JC@ BC  DD   

&9@=G F89A+ .5B9B9F;<( 1 9@D5=F99H<=CI%  9IH9F=7?  .5B 9FK9B. I9GH5 (98FC@5$ (5@5I  56F99@G9GH9B   .9F9@@9B ,5B 9A=F7=% .5BFC97?

&9IA5BB% +5AD5H5BCKG?=" ) $99. %  -6=EI=H=B5H98,( =B:FCBHCH9ADCF5@@C65F89;9B9F5H=CB5B85AMCHFCD<=7@5H9F5@G7@9FCG=G #*0,5*, ,>'692 'JC@  BC DD     

&;+  ,IFB9F *C69FHGCB(  @M;5F9+  =;<5A / $99  +<5::9F, /CB;% <5HH57<5F>99 =7<@9F  5AG<58%  &=7?9FGCB  +<9B8IF9"  ,5F;9H9875DHIF95B8A5GG=J9@MD5F5@@9@ G9EI9B7=B;C: 

&;I9B;5B;/5?5D+ $5A69FH % '@FM *C8K9@@  I9M85B  $5BB95I. %IFD

&=@G9B, /  F5J9@9M *  LD5BG=CBC:H<99I?5FMCH=7DFCH9CA96M 5@H9FB5H=J9GD@=7=B; (;<9,JC@ BC   DD  

&=G<=AIF5, 1CG<=?5K5 I>=AIF5 +5?C85+ 15B5;=<5F5,   77IAI@5H=CBC:D9F=D<9F5@AM9@=BDFCH9=B =BCB=CB6I@6G5B8+7

73 References  &CF=+ '?5851 15GI85 ,GI>=' ,5?5<5G<=1 #C65M5G<=1 I>=MCG<= # #C=?9% -7<=M5A51 !?985 ,CM5A51 15A5B5?5+ &5?5AIF5 % '?5BC   F5:H98IFM=B A=79!96*,,+05.:6-;/, (;065(3*(+,4@6-#*0,5*,:6-;/,%50;,+#;(;,:6- 4,90*(JC@  BC  DD     

&CIG=5=B9B ' #9GH=@5% (5??5G>5FJ=&  CB?5@5 #IIF9+ ,5@@=@5"  .ICD5@5# !;B5H=IG"  9FJ5* (9@HCB9B$  %IH5H=CBG=BA*& 9LDCFHA98=5HCF $ F9GI@H=B5:9H5@ACHCB9IFCB8=G95G9 (;<9,.,5,;0*: JC@  BC  DD    

(55??C@5, +5@C?5G# %==B5@5=B9B! $9CG5@C% #55FH99B5

(5F9MGCB +5J9F=( +5;B9@@= (=G7CGEI=HC  %=HC7

(5F?" / (5F=G?M# 9@CHHC % *99B5B*   F5J9@9M *   !89BH=:=75H=CBC:5@H9FB5H=J9GD@=7=B;F9;I@5HCFG6M*&=BH9F:9F9B79=B FCGCD<=@5!96*,,+05.:6-;/, (;065(3*(+,4@6-#*0,5*,:6-;/,%50;,+ #;(;,:6-4,90*(JC@  BC DD    

(9@>HC% /=7

(<=@=DD5?=G  NN5F=H= * 9@HF5B+ FCC?9G " FCKBGH9=B   FI8BC% FIBB9F IG?9' " 5F9M# C@@ IA=HF=I+  M?9+ ' 89BIBB9B" , =FH< .  =66G*   =F895%  CBN5@9N %  59B89@%   5ACG<  C@A!   I5B;$  IF@9G%   IHHCB  #F=9F"  %=GMIF5 %IB;5@@ " (5G7<5@@" (5H9B *C6=BGCB ( & +7<=9HH975HH9 +C6F9=F5& $ +K5A=B5H<5B " ,5G7

(@5B?" $ 95B  B<5B79F:IB7H=CBA97<5B=GH=75B8;9BCA9K=89 =BG=;

(@9=GG"  /<=HKCFH<  9F;?9GG9@%  IH

74

(F=CF, $957<% =B5B;9F  979A69F @5GHID85H9#705(3 <:*<3(9;967/@3 CA9D5;9C: 9B9*9J=9KG43'B@=B94 J5=@56@9

)I) $= $CI=G# * $=0 15B; +IB) F5B85@@+ * ,G5B;+  2

*59MA59?9FG( ,=AA9FA5B. &9@=G 9"CB;<9(  CC;9B8=>?"  55G  5F?9F  %5FH=B" " 9.=GG9F% C@

*5A5B5H<5B *C66  <5B+  A*&75DD=B;6=C@C;=75@ :IB7H=CBG5B85DD@=75H=CBG <*3,0*(*0+:9,:,(9*/JC@ BC  DD     

*9=@@M% % %IFD

*=7<5F8G+ N=N& 5@9+ =7? 5G+  5GH=9FCGH9F"  FC8M/ /  9;89% $MCB +D97HCF .C9@?9F8=B;# *9C=BH7CBG9BGIGF97CAA9B85H=CBC:H<9 A9F=75BC@@9;9C:%98=75@ 9B9H=7G5B8 9BCA=7G5B8H<9GGC7=5H=CB:CF %C@97I@5F(5H

*=9G9B69F;+ %5F=7=7,  ,5F;9H=B;F9D5=FD5H

*C8F=;I9N&5J5FFC+  IFH  $=B?=B;;9B9F9;I@5H=CBHCA*& DFC8I7H=CB5B89LDCFH<99,5;670506505*,33)0636.@JC@  BC DD     

*CG9BNK9=; + FC7?" $I( #IA5A5FI +5@9;=C  #58CM5#  /969F" $ $=5B;" " %CG95B?C*  5K697?9F+  I=9" *  5JHCB$   &CIH$CA5G1 + 9F;IGCB * 95HH=9% + F9GB5<5B"  ,IGNMBG?= %  *9GHCF5H=J99::97HGC:

*CGGCF % J5BG% * *9=@@M% %  DF57H=75@5DDFC57<HCH<9 ;9B9H=7B9IFCD5H<=9G!9(*;0*(35,<9636.@JC@  BC DD    

75 References  *CGGCF % (C@?9" %  CI@89B *9=@@M% %  @=B=75@=AD@=75H=CBG C:;9B9H=758J5B79G=B<5F7CH%5F=9,CCH<8=G95G9 (;<9, 9,=0,>: ,<9636.@JC@ BC  DD   

*CGGCF % ,CA5G9@@=( " *9=@@M% %  *979BH58J5B79G=BH<9;9B9H=7 B9IFCD5H<=9G<99,5;6705065055,<9636.@JC@  BC DD  

*CH<9% %C8@=7<- +7<5A657<  =CG5:9HM7<5@@9B;9G:CFIG9C: @9BH=J=F5@J97HCFG=B;9B9H<9F5DM<99,5;.,5,;/,9(7@JC@  BC DD   

*M5B%  +

+57

+5?5;I7<=& I>=AIF5+ #IK5<5F5#  !BJC@J9A9BHC: &(=B79@@ 57H=J5H=CB=B,79@@89D9B89BH5BH=;9BF9GDCBG9,=,3674,5;(3044<5636.@ JC@ BC DD    

+5?<5F?5F% # (9FIA5@ + $=A1 ( <9FB$ ( 1I1 #5B;I95B9(  @H9FB5H=J9@MGD@=798

+5@HNA5B $ (5B) @9B7CK9 "  *9;I@5H=CBC:5@H9FB5H=J9GD@=7=B; 6MH<97CF9GD@=79CGCA5@A57<=B9FM,5,:+,=,3674,5;JC@  BC DD   

+5B;9F &=7?@9B+ CI@GCB *  &G9EI9B7=B;K=H<7<5=B H9FA=B5H=B;=B<=6=HCFG!96*,,+05.:6-;/, (;065(3*(+,4@6-#*0,5*,:6-;/, %50;,+#;(;,:6-4,90*(JC@ BC  DD  

+5BN 15B;$ +I, %CFF=G * %7#B=;

+7A5?9FG" .I@HCJ5B+=@:

76

+98;<=% %CG@9A= 56F9F5+9FF5BC% BG5F=  <5G9A=%  5?H5G<=5B% 5HH5G<5F;<=  *979GG=J9<5F7CH %5F=9,CCH<5B8AI@H=D@9G7@9FCG=G5GGC7=5H98K=H<5J5F=5BH=B ! 9(05644<50*(;065:JC@  BC  

+<5H?=B "  5DD=B;C:9I75FMCH=7A*&G,33JC@ BC (, DD   

+<9D5F8( " '692 'JC@  BC DD   

+<=1 !BCI9 /I"  15A5B5?5+  !B8I798D@IF=DCH9BHGH9A79@@ H97:9<.+0:*6=,9@JC@  BC  DD    

+<=;9C?5, "IB; "IB;" ,IFB9FF=8;9F '

+<=;9C?5, "IB;"  C@H  "IB;  LCB,*(*=6C,5;::=B=HM (IF=:=75H=CBC:,F5BG@5H98A*&G:FCA&9IFCB5@LCBG=B%CIG9!B.=JC ,;/6+:05463,*<3(9)0636.@30-;65 JC@  DD  

+<=;9C?5, #CDD9FG% /CB; $=B" ) 5;B9HH5* K=J98M 89 F9=H5G&5G7=A9BHC" J5B,5FHK=>? / +HFC<@ =CB=" % +7<59::9F"  5FF=B;HCB% #5A=BG?=  "IB;  5FF=G/   C@H   'B +=H9*=6CGCA9*9AC89@=B;6M$C75@@M+MBH<9G=N98*=6CGCA5@(FCH9=BG=B LCBG,339,769;:JC@  BC  DD    9 

+9KG?=#  F5B8=G% $9K=G*  $=" +

+=B;<+ # %5985# =8% % @AC:HM+  'BC% (<5A(  CC8A5B %  +5?5;I7<=&   &(F9;I@5H9GF97FI=HA9BHC:!HC =AAIBC;@C6I@=BJ5F=56@9F9;=CBG6MAC8I@5H=B;HF5BG7F=DH=CB5B8BI7@9CGCA9 C77ID5B7M (;<9,*644<50*(;065:JC@ DD   

+C6F9=F5& +7<=9HH975HH9 .5@@9  5ACG<   9B9%5H7<9F5 A5H7<=B;HCC@:CF7CBB97H=B;=BJ9GH=;5HCFGK=H<5B=BH9F9GH=BH<9G5A9;9B9 <4(54<;(;065JC@ BC  DD   

+F998<5F5B" @5=F! ( ,F=D5H<=.   I0 .5B79 *C;9@> 7?9F@9M+  IFB5@@"  /=@@=5AG# $ IF5HH= 5F5@@9 899@@9FC7<9" %=H7<9@@ "  $9=;<( & @<5@56= %=@@9F  &=7'692 'JC@  BC  DD    

77 References  +F998<5F5B" &9I?CAA$ " FCKB* "FF99A5B% *  ;9 9D9B89BH,(%98=5H98%CHCF&9IFCB9;9B9F5H=CB*9EI=F9G +#<5H HF=7?5B8LA5G <99,5;)0636.@JC@  BC  DD     

+H9K5FH%  (C@M589BM@5H=CB5B8BI7@95F9LDCFHC:A*&G$/,6<95(36- )0636.0*(3*/,40:;9@JC@  BC DD    

+H9K5FH%  *5H7<9H=B;A*&CIHC:H<9BI7@9IG63,*<3(9*,33JC@   BC DD   

+HF5A6=C95GH=@@=5 &=9D9@% *CIH% (  ,<9BI7@95FDCF9 7CAD@9L6F=8;=B;BI7@95FHF5BGDCFH5B8;9B9F9;I@5H=CB (;<9, 9,=0,>:63,*<3(9*,33)0636.@JC@  BC DD   

+HFI<@#  IB85A9BH5@@M8=::9F9BH@C;=7C:;9B9F9;I@5H=CB=B9I?5FMCH9G5B8 DFC?5FMCH9G,33JC@ BC  DD   

+N=;9H=# $IDG?=" *  <5F7CH%5F=9,CCH<8=G95G9<967,(516<95(3 6-/<4(5.,5,;0*:JC@  BC DD   

,5?5<5G<=# '?=H5# &5?5;5K5% 15A5B5?5+  !B8I7H=CBC: D@IF=DCH9BHGH9A79@@G:FCA:=6FC6@5GH7I@HIF9G (;<9,796;6*63:JC@  BC   DD     

,5?5<5G<=# 15A5B5?5+  !B8I7H=CBC:D@IF=DCH9BHGH9A79@@G:FCA ACIG99A6FMCB=75B858I@H:=6FC6@5GH7I@HIF9G6M89:=B98:57HCFG,33JC@  BC DD  

,5?9=1 GG9B69F;% ,GI>=ACHC $5G?9M*  ,<9%%579HM@5G9 %%(=B<=6=HG=B=H=5H=CB6IHBCH9@CB;5H=CBC:&F9D@=75H=CBJ=5 =BH9F57H=CBK=H<%%$/,6<95(36-)0636.0*(3*/,40:;9@JC@  BC  DD    

,5?9=1 +K=9H@=?% ,5BCI9 ,GI>=ACHC #CIN5F=89G, $5G?9M*  %%(5BCJ9@579HM@HF5BG:9F5G9H<5H579HM@5H9GF9D@=75H=CBDFCH9=B %% 9,769;:JC@  BC  DD    

,5?9=1 ,GI>=ACHC  !89BH=:=75H=CBC:5BCJ9@%%5GGC7=5H98DFCH9=B H<5H:57=@=H5H9G%%BI7@95F@C75@=N5H=CB$/,6<95(36-)0636.0*(3 */,40:;9@JC@  BC DD    

,5B  *56=895I% @9J=BG /9GH6FCC?% " ?GH9=B, &M?5AD#  9I7<9F  5FD9F 9AA9F$  IHCGCA5@F979GG=J9%&  F9@5H98<5F7CH%5F=9,CCH<8=G95G9K=H<8=5D

,5M@CF" ( FCKB* "F@9J9@5B8 /  97C8=B;$+:FCA;9B9G HCA97<5B=GA (;<9,JC@ BC  DD    

78

,5N=F% 9@@5H57<9% &CI=CI5+ .5@@5H" %  IHCGCA5@F979GG=J9 <5F7CH%5F=9,CCH<8=G95G9:FCA;9B9GHCD<9BCHMD9G6<95(36-;/, 7,907/,9(35,9=6<::@:;,4! #JC@  BC  DD    

,CD=G=FCJ=7! +J=H?=B1 . +CB9B69F;& +<5H?=B "  5D5B875D 6=B8=B;DFCH9=BG=BH<97CBHFC@C:;9B99LDF9GG=CB&03,@05;,9+0:*07305(9@ 9,=0,>:" JC@  BC  DD    

,I89? +7

-A@5I: CBB9H" /5<5FH9 CIFB=9F% +H=9F@9% =G7<9F F=BC$  9JMG ,CF5$  ,<9

.5?I@G?5G  9J9F ( *9HH=; * ,IF?* "57C6= % C@@=B;KCC8 %  C89& % %7&9=@@% + 15B+ 5A5F9B5" $99 % (5F?+  /=96?=B;. 5?* '  CA9N'GD=B5& (5J9@=BI% +IB/ 5C  (CFH9IG% 9<@?9%   <=;<:=89@=HM5GAIH5BH89@=J9F985G 5F=6CBI7@9CDFCH9=B7CAD@9L9B56@9G9::=7=9BH;9B998=H=B;=B

.5@9BH=>B$ " 'IJF=9F*  J5B89BCG7<& C@9F=B9+CHH5GB9IFCD5H

.5@@5H" % ,5N=F% %5;89@5=B9 +HIFHN  F=8  IHCGCA5@ F979GG=J9<5F7CH%5F=9,CCH<8=G95G9G6<95(36-5,<967(;/636.@(5+ ,?7,904,5;(35,<9636.@JC@ BC DD  

.5B79 *C;9@>  CFHC65;M=, 9.CG# " &=G<=AIF5 $ +F998<5F5B"  I0 +A=H< *I88M /F=;+ @<5@56= $9=;<( & @5=F! ( &=7'692 'JC@  BC  DD   

.5B8CCFB9, .9MG#  IC/ +=75FH .=BHG# +K=>G9B %C=GG9%  9@9B  CIB?C& . IA5;5@@=$ 5N5@*  9FA9MG )I59;969IF  9B8H+ % 5FA9@=9H( .9F:5=@@=9 .5B5AA9(  <9GEI=9F9  9C7?# .5B9BCG7<$  =::9F9BH=5H=CB6IHBCH$+ AIH5H=CBG=B-+F9K=F9GACHCFB9IFCBA9H56C@=GA (;<9,*644<50*(;065: JC@  BC  DD     

79 References  .5F;5G 1 *5> %5FF5G+  #F5A9F * ,M5;=+  %97<5B=GAC: A*&HF5BGDCFH=BH<9BI7@9IG!96*,,+05.:6-;/, (;065(3*(+,4@6- #*0,5*,:6-;/,%50;,+#;(;,:6-4,90*(JC@  BC DD      

.9@M7

.9F

.ICD5@5# !;B5H=IG"  9FJ5*  $9H<5@5FH

/5<@%  /=@@ $ $I

/5B;$ %=5C1 $ 2<9B;0 $57?:CF8 2

/,+'&"  *!#  %C@97I@5FGHFI7HIF9C:BI7@9=757=8G5 GHFI7HIF9:CF89CLMF=6CG9BI7@9=757=8 (;<9,JC@   BC DD  

/9=G" @59MG# *CCG NN98=B9 #5HCB5! +7

/=7?F5A5G=B;<9. ' B8F9KG* @@=G( $5B;:CF8  IF8CB"  +H9K5FH % .9B?=H5F5A5B * $5G?9M*   +9@97H=J9BI7@95F9LDCFHC: GD97=:=77@5GG9GC:A*&:FCAA5AA5@=5BBI7@9==GDFCACH986M &( <*3,0*(*0+:9,:,(9*/JC@  BC DD    

/=7?F5A5G=B;<9. ' $5G?9M*   CBHFC@C:A5AA5@=5B;9B9 9LDF9GG=CB6MG9@97H=J9A*&9LDCFH (;<9,9,=0,>:63,*<3(9*,33)0636.@ JC@  BC DD   

80

/=7?F5A5G=B;<9. ' %7%IFHF=9( ! %5FF" A5;5G91 %5=B+ %=@@G   $5G?9M*  ,5?9=1  %%(=GHF5BG7F=698:FCA5 DFCACH9FK=H<=B5B=BHFCBC:H<9CJ9F@5DD=B;;9B9:CF &(6<95(36- 63,*<3(90636.@JC@  BC DD  

/=7?F5A5G=B;<9. ' %7%IFHF=9( ! %=@@G  ,5?9=1 (9BF

/I"5C<5HH9F>99#/5B1 CDD9F#                    ,5,:+,=,3674,5;JC@  BC  DD    

/I0 F9K9F  ,<9F9;I@5H=CBC:A*&GH56=@=HM=BA5AA5@=5B79@@G ,5,JC@  BC  DD   

15B;$ I::% '  F5J9@9M * 5FA=7<59@ <9B$ $    9BCA9K=897<5F57H9F=N5H=CBC:BCBDC@M589BM@5H98*&G,564,)0636.@ JC@  BC  DD *    F  DI6 96  

2<5B;2 $CHH= =HHA5F# 1CIB=G! /5B$ #5G=A% F9M:IGG  +%&89:=7=9B7M75IG9GH=GGI9GD97=:=7D9FHIF65H=CBG=BH<9F9D9FHC=F9C: GB*&G5B8K=89GDF95889:97HG=BGD@=7=B;,33JC@  BC DD  

2I79KG?=# % +H5>=7<"  ,CIFB9J! .9F



81 Contributions    

 #"

In this work, I participated in conceptualization of the project, investigation of background literature, development and creation of methodology, validation of the results, formal analyses, investigation, visualization of data, preparation of figures and writing the manuscript draft. I performed skin fibroblast culture, cDNA sequencing, and analysis of RNA and protein levels, utilization of bioinformatics online tools, minigene mRNA splicing assay, mixed neuronal culture, immunocytochemistry and imaging. I participated in collaborative work with international collaborators and overall project management. Participated in public outreach and communication about the work to a broader audience. I presented my work in international and national conferences.

  #"

In this work, I participated in the conceptualization, study design and hypothesis generation, research of background literature, development of models and methodology, validation of the experiments, formal analysis, experimental investigation, writing original draft of the publication, visualization, and acquired personal funding. For experimental work, I performed the Sanger sequencing, qRT-PCR experiments, analysis and quantification of protein in immunocytochemistry by microscopy, electron microscopy, isolation of RNA for sequencing, quantification of RNAScope data, statistical analysis and analysis/interpretation of final data. I presented my work in national and international meetings and took responsibility for managing the project.

  #"

In this project, I participated in the overarching conceptualization, review of background literature, design of experiments and implementation of experiments. I created models, applied and generated methodology, validated the experiments and data, did the formal analysis, conducted the investigation, I performed the experiments, analyzed the data, prepared the manuscript draft and prepared visualization of the data. I acquired personal funding. I supervised a student in the project. In terms of experiments, I performed cell culture, gene editing, characterized the hiPSC, performed motor neuron differentiation, subcellular fractionation, completed the

82

microscopy, RNA sequencing experiment and final data analysis and interpretation, statistical analysis, prepared karyotyping and analyzed results, performed qRT-PCR experiments, analyzed and visualized data. I generated collaboration with an international laboratory and participated in a research visit for one year for the genome editing part of the project. I managed the project and coordinated work with collaborators.

83