Unravelling the genetic and pathophysiological complexity of the mitochondrial myopathies

Author: R.G.J. Dohmen Final Graduation Report


University Maastricht

General data:

Graduation subject: Mitochondrial Myopathies
Author: Richard Gerardus Johannes Dohmen
Graduation term: 23-01-2012 to 25-06-2012
Student number: 2016701

Version: 1
Deadline: May/June 2011

Education contact information:
School of Life Sciences and Environment Technology
Lovensdijkstraat 61-63
4818 AJ Breda
Phone: 076 525 05 00

Internship contact information:
University Maastricht
Clinical Genomics Department
Universiteitssingel 50
6229 ER Maastricht
Phone: 043 388 19 95

Supervisor ATGM: Julian Ramakers
Internship mentor: Prof. Dr. Bert Smeets
Supervisor UM: Ing. Rick Kamps


Preface

This final report documents the graduation term from 23 January 2012 to 25 June 2012. During the graduation, my knowledge of the mechanisms of genomics, my use of different databases and my practical skills improved. The knowledge and results obtained during the graduation are included in this report.

I would like to thank Bert Smeets for the opportunity to become an intern at Clinical Genomics. Secondly, I want to thank Mike Gerards, Iris Boesten, Auke Otten and Bianca van den Bosch for the assistance, knowledge and practical tricks they shared with me. Furthermore, I thank the rest of the Clinical Genomics department for a wonderful and educational nine and a half months. Last but definitely not least, I thank my supervisor, Rick Kamps, for all the knowledge, support and guidance he gave me; his structured and calm way of handling things really inspired me.


Table of contents

Preface
Summary
Samenvatting
1. Introduction
2. Theoretical background
2.1 Whole Exome – Enrichment
2.2 Illumina Sequencing
2.3 Bioinformatics
2.3.1 Variant analyses
2.4 Functional tests
2.4.1 Targeting of the
2.4.2 Analyses other patients
2.4.3 Expression levels
2.5 Affected patients
2.5.1 Family DNA 07-2283
2.5.1.1 LAMA3
2.5.1.2 SYNPO2
2.5.1.3 DCHS2
2.5.1.4 SLC6A8
2.5.1.5 Zyxin
2.5.1.6 KIAA1109
2.5.1.7 Transformation fibroblast - Myogenesis
2.5.2 Family DNA 08-5759
2.5.2.1 KRTAP10-6
2.5.2.2 PSPH
2.6 Aim/ hypothesis
3. Methodology
3.1 Whole Exome Enrichment & Illumina Genome Analyzer HiSeq 2000
3.1.1 Sample preparation
3.1.2 Hybridization
3.1.3 Addition of Index Tags by Post-Hybridization Amplification
3.1.4 Cluster generation
3.1.5 Sequencing by synthesis
3.2 Data analyses
3.2.1 Validation & segregation
3.2.2 Relate function gene to phenotype
3.3 Functional tests
3.3.1 Targeting
3.3.2 Analyses other patients
3.3.3 Gene expression levels
3.3.4 Transformation fibroblasts
4. Results & Discussion/ Conclusion
4.1 Family DNA 07-2283
4.1.1 Variants
4.1.2 Sanger sequencing validation & segregation
4.1.3 SYNPO2 protein targeting
4.1.4 Myogenesis – qPCR gene expression levels
4.2 Family DNA 08-5759
4.2.1 Variants (2)
4.2.2 Sanger sequencing validation & segregation (2)
5. Discussion/ Conclusion & Recommendations
References
Appendices
Appendix 1: Flow diagram WE-enrichment
Appendix 2: pEGFP-n1
Appendix 3: Covaris S2 fragmentation
Appendix 4: AMPure XP beads Agencourt
Appendix 5: Agilent 2100 Bioanalyzer
Appendix 6: Primer design
Appendix 7: PCR & gel electrophoresis
Appendix 8: Sanger sequencing
Appendix 9: Designed primers for Sanger sequencing
Appendix 10: Regulation of SYNPO2 localization in myocytes
Appendix 11: Protein-protein interaction SYNPO2

Summary

Mitochondria are the power plants of the cell: they generate energy using five complexes located in the inner mitochondrial membrane. These five complexes, I to IV and ATP synthase, are encoded by mitochondrial and nuclear genes. Mutations in these genes can render the complexes or associated proteins non-functional. This can lead to a lack of energy for cells, tissues and organs; organs with a high energy demand are especially affected. These mutations can cause a mitochondrial myopathy, a disorder in which the energy supply of the muscles is affected. So far, the genetic cause of many of these disorders has remained unresolved. A new approach, Whole Exome (WE) sequencing, which is based on Next-Generation Sequencing (NGS) technology, detects all of the thousands of genetic variants in the coding regions of the genome, from which the potentially pathogenic, causative mutations need to be filtered out. These variants are validated, and their segregation in the families is tested, using Sanger sequencing. The variants are also tested functionally, by protein targeting and by cellular and gene expression studies.

Two patients from the WE families DNA 08-5759 and 07-2283 were investigated and sequenced, and the sequence data obtained from the two families were filtered in several filter steps. A reduced subset of potentially pathogenic variants remained: approximately 60 and 195 variants for families 08-5759 and 07-2283, respectively. Using several prediction programs, which predict the damaging effect of a variant, and by linking the phenotype to the gene function, the list was further reduced to 2-5 variants. The variants of family 08-5759 were false positives or, like KRTAP10-6, could not be related to the phenotype. The variants in family 07-2283 that were validated and tested for segregation, LAMA3, SYNPO2 and ZYX, were considered more interesting because they could be related to the phenotype. Eventually one of these variants, SYNPO2, was tested functionally. The localization of the SYNPO2 protein in the cell was determined: the protein was expressed diffusely throughout the cytoplasm and nucleus, not in the mitochondria, which makes a role in a mitochondrial disorder less likely. When the SYNPO2 variant was tested in the family, the two affected children were homozygous for the same variant, which is consistent with a pathogenic role; however, they were also homozygous for the DCHS2, KIAA1109 and LAMA3 variants. Patient fibroblasts were transfected with a myoD (myogenic regulatory factor) vector to transdifferentiate the fibroblasts into myotubes (muscle cells). The SYNPO2 expression in these transfected fibroblasts was determined by relative quantification and was low in comparison with the control, skeletal muscle.

We concluded that the mutation does not alter the expression of SYNPO2 in the patient fibroblasts, because expression was also low in the transfected control. However, before further analyses can be done, the transfection of the patient fibroblasts with myoD vectors has to be optimized. The low expression of the SYNPO2 gene is probably a consequence of insufficient differentiation of the transfected fibroblasts. Once the transfection has been optimized and the transfected fibroblasts have differentiated into myotubes, additional functional tests become possible, such as immunohistochemical or fluorescent antibody staining, electron microscopy and Western blotting, to analyse the Z-disc that is formed and the proteins that might be affected.


Samenvatting

Mitochondria are the energy factories of the cell: energy is generated by five complexes located in the inner membrane. These five complexes, I to IV and ATP synthase, are encoded by mitochondrial and nuclear genes. When a mutation occurs in one of these genes, one of the complexes or associated proteins can become non-functional. This leads to a lack of capacity to generate energy for the cell, tissue and organ; organs with a high energy consumption in particular are severely affected by a lack of energy. These mutations can be the cause of a mitochondrial myopathy, a group of diseases in which the energy supply of the muscles is affected. Detecting the genetic cause has so far often been impossible. With a new approach, all of the thousands of genetic variants in the coding part of the genome are detected, and the possibly pathogenic, causative mutations are filtered from these thousands of variants. Whole Exome (WE) sequencing is based on Next-Generation Sequencing (NGS) technology. These variants are validated, and segregation in the family is tested, using Sanger sequencing. In addition, the variant is analysed with functional tests, such as protein targeting and cellular and gene expression studies.

Two patients of the WE families DNA 08-5759 and 07-2283 were investigated and sequenced, and the sequence data obtained from the two families were filtered using various filter steps. A reduced subset of possibly pathogenic variants remains: approximately 60 and 195 variants for families 08-5759 and 07-2283, respectively. By means of several prediction programs, which predict the damaging effect of a variant, and by linking the phenotype to gene function, the list was further reduced until 2-5 variants remained. The variants of family 08-5759 were false positives or, like KRTAP10-6, could not be linked to the phenotype. The variants of family 07-2283 that were validated and tested for segregation, LAMA3, SYNPO2 and ZYX, were more interesting, as these could be related to the phenotype. One of these variants, SYNPO2, was eventually tested functionally. The location of the SYNPO2 protein in the cell was determined. The protein showed a diffuse staining of the cytoplasm and nucleus, not of the mitochondria, which makes a role in a mitochondrial disease less likely. The SYNPO2 variant was tested in the family: both affected patients were homozygous for the same mutation, but the same holds for the KIAA1109, DCHS2 and LAMA3 variants. Patient fibroblasts were transfected with a specific myoD (myogenic regulatory factor) vector to transdifferentiate the fibroblasts into muscle cells. The SYNPO2 expression in these transfected fibroblasts was quantified by relative quantification and was low in comparison with the control, skeletal muscle.

It is concluded that the low expression is not caused by the SYNPO2 variant, because the transfected control also shows a low expression. However, before further analyses can be performed, the transfection with the myoD vector has to be optimized; the low expression of SYNPO2 may be due to incomplete differentiation of the transfected fibroblasts. Once the transfection has been optimized and the 'myotubes' can be kept in culture long enough to differentiate, additional functional tests become possible, such as immunohistochemical or fluorescent antibody staining, electron microscopy and Western blotting, to analyse the proteins that are formed.


1. Introduction

Living cells of most eukaryotic and many prokaryotic organisms generate energy in the form of adenosine triphosphate (ATP) through a process called cellular respiration, and use this ATP to perform their many tasks. In most eukaryotic organisms the mitochondria are the sites of cellular respiration, whose most important process is oxidative phosphorylation (OXPHOS). OXPHOS takes place in the inner mitochondrial membrane and is carried out by a collection of proteins organised in multiprotein complexes numbered I through IV, together with ATP synthase. The subunits of these complexes are encoded by both the mitochondrial DNA (mtDNA) and the nuclear DNA (nDNA). Mutations in the mtDNA and/or in nuclear genes involved in the maintenance and replication of the mitochondria can lead to a malfunction of the mitochondria. Such a malfunction causes a lack of energy in cells, tissues and organs, which can lead to disorders. These disorders are called mitochondrial myopathies. These myopathies are among the most common hereditary metabolic or neurological disorders, and over 1500 genes are involved in them. Organs and tissues with a high energy demand are affected most often. In a quarter of all patients the mitochondrial myopathy has a hereditary cause; so far, finding an approach to detect these genetic causes has been unsuccessful.

Next-Generation Sequencing (NGS) applications, such as Whole Exome (WE) sequencing and the sequencing of Long Range PCR (LRPCR) fragments, are used to detect and investigate defects in the mtDNA and/or nDNA. This research aimed to identify the possibly pathogenic defect among the thousands of variants detected in patients with a mitochondrial myopathy, using standardised NGS protocols and several filtering steps. It is assumed that the pathogenic variant found by WE sequencing (WES) alters an amino acid, thereby affecting the protein and its function, and that its frequency in the general population is low. Based on these assumptions the list of variants, which includes the possibly pathogenic defect, is filtered and shortened. The number of variants was reduced further with additional strategies, such as relating the phenotype of the patient to the gene in which the possibly pathogenic variant is found. The variants were validated with Sanger sequencing to exclude false positives, and their segregation in the family was verified in the same way. Further analyses included functional tests: targeting of the encoded protein, gene and cellular expression levels, and Sanger sequencing of patients with the same phenotype. It was expected that NGS would help to unravel the genetic and pathophysiological complexity of the mitochondrial myopathies, and that the possibly pathogenic variants could be linked to the phenotype using the filtering steps and functional tests.

The report is composed of several chapters. Chapter two describes the theoretical background and the principles of the research and techniques used during the graduation. Chapter three describes the procedures performed and the techniques used. Chapter four presents the results obtained, followed by chapter five, in which the results are discussed and a conclusion is drawn. The references and appendices are included at the end of the report.


2. Theoretical background

2.1 Whole Exome – Enrichment

NGS is the second generation of sequencing technology. NGS platforms such as Roche 454, Illumina and SOLiD enable many more applications to be sequenced than the first generation, Sanger sequencing. Because Whole Genome Sequencing (WGS), an NGS application, is expensive, Whole Exome Sequencing (WES) is used as an alternative. WES sequences the entire exome, the protein-coding regions, which constitute approximately 1% of the genome [1][2][3][4]. With WES, the protein-coding regions are examined for rare variants that cause complex diseases and health-related traits. The exome is estimated to contain 85% of the mutations that have a large effect on disease-related traits [5]. Because the exome contains information that is essential for an organism to function, mutations in the exons can have severe consequences for the organism. The severity depends on the type of mutation: synonymous or non-synonymous. Synonymous mutations modify the codon without altering the amino acid, because multiple codons encode the same amino acid. Non-synonymous mutations alter both the codon and the amino acid, which can have severe consequences: the change may alter the polarity, permeability, acidity and charge of the protein. WES detects these amino-acid-changing mutations, but mutations in introns are largely missed. The non-coding regions harbour elements that are important for protein expression, such as splice sites, untranslated regions (UTRs) and transcription factor binding sites; the latter can influence the amount of protein synthesized from a gene. Mutations in splice sites affect the splicing of the pre-mRNA, possibly resulting in the loss of an exon or the retention of an intron.
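
The difference between synonymous and non-synonymous (missense or nonsense) substitutions can be made concrete with the standard genetic code. The sketch below is a minimal illustration, not the annotation software used in this project: it changes one base of a codon and compares the encoded amino acids. The function name `classify_substitution` and the extra "stop-loss" category are illustrative assumptions.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# T, C, A, G order with the first codon position varying slowest; "*" = stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(codon: str, position: int, alt_base: str) -> str:
    """Classify a single-base substitution at codon position 0, 1 or 2 (illustrative)."""
    codon = codon.upper()
    mutant = codon[:position] + alt_base.upper() + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"   # codon changes, amino acid does not
    if alt_aa == "*":
        return "nonsense"     # amino acid codon becomes a stop codon
    if ref_aa == "*":
        return "stop-loss"    # stop codon becomes an amino acid codon
    return "missense"         # a different amino acid is encoded


# GAA encodes glutamate (E)
print(classify_substitution("GAA", 2, "G"))  # GAG, glutamate: synonymous
print(classify_substitution("GAA", 1, "T"))  # GTA, valine: missense
print(classify_substitution("GAA", 0, "T"))  # TAA, stop codon: nonsense
```

Real annotation tools work from the reference transcript and reading frame; this sketch assumes the affected codon and position are already known.
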
WES captures only a fraction of these intronic and untranslated regions, such as the splice sites directly flanking the exons. WGS provides a complete overview of all mutations, in introns and exons, and does not miss important intronic mutations [6][7][8][9]. WES starts with the isolation of the exome using a WE-enrichment kit; various kits exist. The SureSelect Target Enrichment System from Agilent Technologies is compatible with the Illumina sequencers, the Genome Analyzer and the HiSeq. WES is performed on genomic DNA (gDNA), which is sheared into fragments of 150-200 bp. After shearing, the resulting sticky ends are converted to blunt ends and the DNA-fragments are phosphorylated. A dATP is then added to the 3' ends of the fragments, so that adapters with a complementary dTTP overhang can be ligated to both ends. The adapters serve as primer binding sites for the subsequent amplification of the DNA-fragments. After amplification the DNA-fragments are denatured and biotinylated RNA baits of 120 nucleotides are added. The baits are designed to be complementary to the exonic target sequences, which ensures that exonic DNA-fragments hybridize to them whereas fragments consisting of intronic sequence cannot. The DNA-fragments bound to the baits are isolated with streptavidin-coated magnetic beads: the biotinylated end of each RNA bait binds to the streptavidin, and a magnetic field separates the bait-bound exonic fragments from the unbound fraction of intronic (and some exonic) fragments. The captured fragments are then released: the DNA-fragment-RNA bait complex is denatured and the RNA is digested.


The single-stranded DNA-fragments are then PCR amplified and sequenced using the Illumina HiSeq 2000 [10][11]. The complete WE-enrichment flow diagram is included in appendix 1, figure 21.

2.2 Illumina Sequencing

The Illumina/Solexa sequencing technique uses reversible terminators and a solid surface to sequence a diverse set of templates, such as gDNA, cDNA and LRPCR-fragments. The input DNA is sheared into fragments smaller than 800 bp; the sticky ends created by shearing are converted to blunt ends by enzymatic reactions. An A-nucleotide is then added to the 3' blunt ends of the DNA-fragment, so that adapters (A and B) with a T-nucleotide overhang can be ligated to both ends of the fragment through A-T base pairing. The adapters contain an index tag, a short sequence used to identify the sample. Because multiple index sequences exist, different samples, each with its own index, can be pooled and sequenced together; the Illumina sequencing software afterwards separates the sequence data of all samples according to their index tags.
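
Separating pooled reads by their index tag (demultiplexing) amounts to comparing each read's index with the known sample indexes. The sketch below only illustrates that idea and is not the Illumina software; the sample names, index sequences, read tuples and the one-mismatch tolerance are illustrative assumptions.

```python
from collections import defaultdict


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length index sequences."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, sample_indexes, max_mismatches=1):
    """Group (read_id, index_read, sequence) tuples by sample.

    A read is assigned to a sample only if exactly one sample index lies within
    max_mismatches (assumed tolerance); otherwise it goes to "undetermined".
    """
    bins = defaultdict(list)
    for read_id, index_read, sequence in reads:
        hits = [sample for sample, index in sample_indexes.items()
                if len(index) == len(index_read)
                and mismatches(index_read, index) <= max_mismatches]
        sample = hits[0] if len(hits) == 1 else "undetermined"
        bins[sample].append((read_id, sequence))
    return dict(bins)


# Hypothetical pool of two samples, each labelled with its own 6-nt index
SAMPLE_INDEXES = {"patient": "ATCACG", "control": "CGATGT"}
reads = [
    ("r1", "ATCACG", "ACGTACGT"),  # exact match to the patient index
    ("r2", "CGATGA", "TTGCA"),     # one mismatch from the control index
    ("r3", "GGGGGG", "GGCC"),      # matches no index
]
bins = demultiplex(reads, SAMPLE_INDEXES)
print(bins)
```

Requiring exactly one matching index keeps a read with a sequencing error in its index from being assigned to the wrong sample when two indexes are similar.
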

A flow cell containing 8 lanes (figure 3) is loaded into the cBot, an automated cluster generator (figure 2). The denatured DNA-fragments are loaded onto the solid flow cell, whose surface is covered with oligonucleotides: adapters (B) and complementary adapters (A), see figure 1. A single-stranded DNA-fragment binds to an oligonucleotide complementary to adapter A that is attached to the surface of the flow cell. In an extension reaction starting from this oligonucleotide, the complementary strand of the DNA-fragment is synthesized. After the extension the template DNA is removed, leaving the newly synthesized single strand attached to the flow cell. The 3' end adapter sequence (B) of this strand then binds to a nearby oligonucleotide complementary to adapter B, so that the DNA-fragment forms a bridge, see figure 1. The surface-bound oligonucleotide functions as a primer for the following PCR amplification. Using PCR amplification reagents and multiple PCR cycles, the DNA-fragments are amplified starting from the adapters; this is called 'bridge amplification'. Bridge amplification creates clusters of copies of the same DNA-fragment on the surface of the flow cell, each consisting of thousands of molecules [12][13][2].

Figure 1: Schematic workflow of the Illumina sequencers (combination of multiple figures). The sticky ends of sheared DNA are converted to blunt ends, the fragments are phosphorylated and a dATP is added (1). Adapters are ligated to both ends of the DNA-fragment via a dTTP overhang (2), resulting in an adapter-DNA-fragment (3). The DNA-fragment is denatured and binds to the oligonucleotide complementary to adapter A, which is attached to the flow cell (4). In a PCR reaction the complementary strand is synthesized on the flow cell (5); the strand bends and its 3' adapter end binds to a complementary oligonucleotide (6). Clusters of identical DNA-fragments are formed on the surface of the flow cell (7) [14][12][2].


Figure 2: cBot; clusters are generated automatically.
Figure 3: Flow cell including the 8 lanes.

After cluster generation the flow cell is loaded into the HiSeq 2000 sequencer, which automatically adds all sequencing reagents, such as the sequencing primer and the four reversible terminator nucleotides (A, C, T and G). The sequencing primer, which is complementary to the 3' end adapter, anneals to the single-stranded DNA-fragment. The reversible terminator nucleotides are 3'-modified nucleotides (3'-O-azidomethyl 2'-deoxynucleoside triphosphates), each carrying a different removable fluorophore, and all four are added simultaneously. Once a nucleotide has been incorporated, its terminator group prevents DNA polymerase from adding another nucleotide. After incorporation, lasers illuminate the fluorophores, each of which emits light of a specific wavelength. The detectors record the emitted wavelength, identifying the nucleotide and the position of its cluster on the flow cell. The fluorophore and terminator groups are then cleaved off with tris(2-carboxyethyl)phosphine (TCEP), which regenerates the 3'-OH end of the incorporated nucleotide. This hydroxyl group enables the polymerase to add the next reversible terminator nucleotide in the next sequencing cycle. These synthesis cycles are repeated, and in each cycle the fluorescent signal of every cluster is measured, as depicted in figure 4. The DNA-fragments are sequenced in two reads of 100 cycles; in every cycle one nucleotide is incorporated and identified, so the read length of the HiSeq 2000 is approximately 100 bp. In a multiplexed run the first read is followed by a short index read; in the second read of 100 cycles the opposite end of each fragment is sequenced (paired-end sequencing). For this, the original templates are cleaved and removed, and the complementary strands are regenerated in the clusters and sequenced.
The complementary strands are regenerated by bridge amplification in the sequencer itself, in the same way as in the cBot, after which the regenerated clusters are sequenced. The Illumina software converts the fluorescent signals into raw data by determining the nucleotides, so that every cluster yields its own sequence. A base-calling algorithm assigns a quality (Q) value to each base call. Mapping software aligns the raw data to a reference genome, after which single nucleotide variants, insertions and deletions are detected. The HiSeq 2000 (figure 4) produces on average 2 x 300 gigabase pairs (Gb) of data per run, using two flow cells of 8 lanes each [14][15].
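
The conversion of four-channel fluorescence into bases and quality values can be sketched as follows. This is a strongly simplified illustration, not the Illumina base caller; the assumptions are that each cycle yields one intensity per channel (A, C, G, T), that the brightest channel is called, and that the error probability is estimated crudely as the share of signal outside that channel. The quality value uses the Phred scale, Q = -10 log10(p), so Q30 corresponds to an error probability of 1 in 1000; the cap of 40 is also an assumption.

```python
import math

CHANNELS = "ACGT"


def phred(error_probability: float, cap: int = 40) -> int:
    """Phred quality Q = -10 * log10(p), rounded and capped (cap is an assumption)."""
    if error_probability <= 0:
        return cap
    return min(cap, round(-10 * math.log10(error_probability)))


def call_cycle(intensities):
    """Call one base from four channel intensities given in the order A, C, G, T."""
    best = max(range(4), key=lambda i: intensities[i])
    total = sum(intensities)
    # Simplifying assumption: error = share of the signal not in the called channel
    error = 1 - intensities[best] / total if total > 0 else 0.75
    return CHANNELS[best], phred(error)


def call_read(cycles):
    """Call a read from a list of per-cycle intensity tuples of one cluster."""
    calls = [call_cycle(c) for c in cycles]
    sequence = "".join(base for base, _ in calls)
    qualities = [q for _, q in calls]
    return sequence, qualities


# Three hypothetical sequencing cycles of one cluster
seq, quals = call_read([(100, 5, 3, 2), (4, 90, 6, 1), (2, 3, 4, 120)])
print(seq, quals)  # ACT [10, 10, 12]
```
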

Figure 4: Schematic overview of an Illumina sequencing run and an image of the data. The sequencing primers and reversible terminator nucleotides are added (1); the incorporated nucleotide, which carries a fluorophore, is illuminated and the fluorophore emits a fluorescent signal (2). The fluorescent signal is detected by the CCD camera and an image is taken in each cycle (picture on the right); the fluorescent signal is converted into a nucleotide [14]. On the right is the Illumina HiSeq 2000 sequencer.


2.3 Bioinformatics

The Agilent SureSelect exon enrichment kit captures approximately 38 megabase pairs (Mb) of exonic sequence; the remaining 12 Mb of the captured target consists of UTRs, microRNAs and adjacent splice sites [10][14]. When the raw data are mapped against the reference genome, several thousand single nucleotide variants (SNVs) are discovered. The majority of these variants are known polymorphisms, but one or possibly more of them are potentially pathogenic. The list of detected variants is filtered in several steps to identify the potentially pathogenic variant. The filter steps are based on the assumptions that pathogenic mutations alter an amino acid, thereby affecting the protein and its function, and that their frequency in the general population is low. First, the variants are filtered according to the effect of the mutation: mutations in protein-coding regions have, in most cases, a more severe effect than mutations in non-coding regions. The damaging effect of missense mutations (alteration of the amino acid) and nonsense mutations (a codon changed into a stop codon) is predicted with algorithmic calculations: the Grantham table and the computer programs PolyPhen-2 and SIFT. The Grantham score reflects the chemical dissimilarity between the original and the substituted amino acid; PolyPhen-2 predicts the functional effect of an amino acid change based on its evolutionary conservation and the position of the alteration in the protein. The higher the Grantham and PolyPhen-2 scores, and the lower the SIFT score, the more pathogenic a variant is considered; the thresholds used are Grantham ≥ 101, PolyPhen-2 ≥ 0.851 and SIFT ≤ 0.05 [16][17][18]. The variants are also compared with Single Nucleotide Polymorphism (SNP) databases, which list known variants, some of which may be pathogenic. Known variants and variants with a high frequency in the general population are excluded, as high-frequency variants are assumed to be non-pathogenic.
The next filter step is based on the fact that most metabolic disorders are inherited as autosomal or X-linked recessive traits: only homozygous variants (both alleles affected) and compound heterozygous variants (two different variants in the same gene, which are pathogenic in combination) are retained. These filtering steps reduce the number of variants to a smaller subset of several hundred potentially pathogenic mutations, see figure 5 [19][6][7][11].
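
The filter steps described above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the software used in this project: the Grantham, PolyPhen-2 and SIFT thresholds come from the text, but the 1% population-frequency cut-off, the rule that one damaging score suffices, and the treatment of two heterozygous variants in one gene as a possible compound heterozygote (phase is not checked) are assumptions. The `Variant` record and the gene names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    gene: str
    effect: str                      # "synonymous", "missense", "nonsense", ...
    population_af: float             # allele frequency in the general population
    zygosity: str                    # "hom" or "het"
    grantham: Optional[int] = None
    polyphen2: Optional[float] = None
    sift: Optional[float] = None


def predicted_damaging(v: Variant) -> bool:
    """Thresholds from the text; requiring only one passing score is an assumption."""
    if v.effect == "nonsense":
        return True  # premature stop codon, treated as damaging without scores
    return ((v.grantham is not None and v.grantham >= 101)
            or (v.polyphen2 is not None and v.polyphen2 >= 0.851)
            or (v.sift is not None and v.sift <= 0.05))


def filter_variants(variants: List[Variant], max_af: float = 0.01) -> List[Variant]:
    # 1. Novel or rare in the general population (1% cut-off is an assumption)
    kept = [v for v in variants if v.population_af < max_af]
    # 2. Coding, non-synonymous changes only
    kept = [v for v in kept if v.effect in ("missense", "nonsense")]
    # 3. Predicted to be damaging
    kept = [v for v in kept if predicted_damaging(v)]
    # 4. Recessive model: homozygous, or at least two heterozygous variants in
    #    one gene (possible compound heterozygote; phase is not checked here)
    het_per_gene = Counter(v.gene for v in kept if v.zygosity == "het")
    return [v for v in kept if v.zygosity == "hom" or het_per_gene[v.gene] >= 2]


# Hypothetical variant calls for one patient
calls = [
    Variant("GENE_A", "missense", 0.0, "hom", grantham=150),
    Variant("GENE_B", "missense", 0.20, "hom", grantham=150),   # too common
    Variant("GENE_C", "synonymous", 0.0, "hom"),                # no amino acid change
    Variant("GENE_D", "missense", 0.001, "het", polyphen2=0.95),
    Variant("GENE_D", "nonsense", 0.0, "het"),
    Variant("GENE_E", "missense", 0.0, "het", sift=0.01),       # single heterozygous hit
    Variant("GENE_F", "missense", 0.0, "hom", grantham=20, polyphen2=0.1, sift=0.5),
]
kept = filter_variants(calls)
print([v.gene for v in kept])  # ['GENE_A', 'GENE_D', 'GENE_D']
```

Whether two heterozygous variants really lie on different alleles still has to be confirmed in the parents, which is part of the segregation analysis.
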

2.3.1 Variant analysis

The first filter steps reduce the number of variants; additional analyses are used to shorten the resulting list of potentially pathogenic variants further. These analyses comprise the validation and segregation of each variant and the relation of the gene to the phenotype of the patient.

Total variants (±50,000)
Novel and low frequency in the general population (±10,000)
Coding/non-synonymous mutations (±1,500)
Two alleles in one gene (homozygous, compound heterozygous) (±600)
Smaller subset of variants, after prediction programs and homozygosity (±90)
Validation and segregation of a small subset of potentially pathogenic variants (±3-5)
Previous step confirmed; variant functionally tested (gene expression levels, targeting, familial analyses)

Figure 5: Diagram overview of the filtering steps applied to the detected variants.


Relating the function of the gene to the phenotype

During sequencing, variants are discovered in many genes; mapping software assigns each variant to a known gene. Each gene has a specific function, and the functions of genes known in the NCBI database are retrieved from NCBI [20]. Mutations in protein-coding regions can alter the protein, possibly resulting in a non-functional protein; the pathophysiological effect of such a protein depends on its function. In patients with a mitochondrial myopathy, the pathophysiological cause manifests itself in a specific phenotype. By relating the functions of the genes in which a potentially pathogenic mutation is found to the clinical and subclinical phenotype, the subset of possibly pathogenic variants is reduced even further. For example, genes encoding mitochondria-related proteins are more likely to be associated with a mitochondrial disorder than genes encoding, e.g., hair-related proteins.

Validation of the variant

NGS greatly increases the throughput of sequence data, but its accuracy is lower, which results in a higher error rate. To exclude false positive variant calls, the region of the gene in which the variant is found is Sanger sequenced. This validation separates true variants from false positives and thereby increases the reliability of the variants found [2].

Segregation of the variant

The segregation of the variants found is determined by familial analyses. Segregation refers to the separation of the two alleles of a gene during gamete formation, so that each child inherits one allele from each parent; it explains how recessive disorders can skip generations. In most cases, pathogenic and non-pathogenic variants can be traced back to one of the parents, although de novo mutations, which arise spontaneously, also exist. Variants that do not segregate with the disease are excluded as potentially pathogenic variants. For example, when a healthy homozygous normal parent (AA) and a healthy heterozygous parent (Aa, where a is the variant) have an affected child who is heterozygous (Aa), the variant does not segregate correctly: the affected child has the same genotype as one of the healthy parents, so the two traits are not separated [25][26][27][19]. Such a variant is excluded as a potentially pathogenic variant. Through these analyses (relating the function of the gene to the phenotype, validation and segregation) the variants are reduced to a manageable number. Some of the remaining variants are then functionally tested.
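
The segregation check described above can be expressed as a consistency test on family genotypes. The sketch below assumes the simplest model only: an autosomal recessive disorder with full penetrance and a homozygous causal variant. Compound heterozygosity, X-linked inheritance and reduced penetrance are not handled, and the function and family member names are hypothetical.

```python
def segregates_recessive(family, parents=()):
    """Check one variant against an autosomal recessive model (simplified sketch).

    family maps a person to (variant_alleles, affected), where variant_alleles
    is 0 (AA), 1 (Aa) or 2 (aa). Every affected person must be homozygous for
    the variant and no unaffected person may be.
    """
    consistent = all((alleles == 2) == affected
                     for alleles, affected in family.values())
    # A homozygous child must have received one variant allele from each parent;
    # a parent without the variant points to a de novo event or a sample error.
    carriers = all(family[p][0] >= 1 for p in parents)
    return consistent and carriers


# Example from the text: healthy AA father, healthy Aa mother, affected Aa child
family_1 = {"father": (0, False), "mother": (1, False), "child": (1, True)}
# Two carrier parents, two affected homozygous children, one carrier sibling
family_2 = {"father": (1, False), "mother": (1, False),
            "child_1": (2, True), "child_2": (2, True), "child_3": (1, False)}

print(segregates_recessive(family_1))                         # False: variant excluded
print(segregates_recessive(family_2, ("father", "mother")))   # True: consistent
```
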

Various types of defects in genes

Mitochondrial functions are inherited in two ways: by maternal and by biparental inheritance. Maternal inheritance concerns the mitochondrial DNA (mtDNA): during fertilization, the part of the paternal mtDNA that ends up in the oocyte is degraded [21]. Mitochondrial dysfunctions caused by mtDNA defects are always maternally inherited. The inheritance of a mitochondrial disorder depends on which types of mtDNA are present: a cell contains 10² to 10⁵ mitochondria, and each mitochondrion contains two to ten copies of the mtDNA. Normal and mutant mtDNA can be distributed in different ratios among cells, tissues and organs. When all mtDNA copies in a cell are identical, this is called homoplasmy; heteroplasmy is the state in which normal and mutant copies of mtDNA are mixed within a mitochondrion or cell. The percentage of mutant mtDNA relative to normal mtDNA affects the severity of the phenotype, and disease generally occurs only when this percentage exceeds a certain threshold [22][23]. Biparental inheritance concerns the nuclear genome, of which 23 chromosomes are inherited from each parent; over a thousand nuclear genes encode mitochondrial functions [24].
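
The relation between heteroplasmy and the threshold can be illustrated with a small calculation. In this sketch the copy numbers are hypothetical and the 60% threshold is only a placeholder: real thresholds differ per mutation and per tissue. The function names are illustrative.

```python
def heteroplasmy(mutant_copies: int, normal_copies: int) -> float:
    """Percentage of mutant mtDNA copies in a cell or tissue sample."""
    total = mutant_copies + normal_copies
    if total == 0:
        raise ValueError("sample contains no mtDNA copies")
    return 100.0 * mutant_copies / total


def describe(mutant_copies, normal_copies, threshold_pct=60.0):
    """Return (state, percentage, above_threshold); the threshold is a placeholder."""
    pct = heteroplasmy(mutant_copies, normal_copies)
    state = "homoplasmic" if pct in (0.0, 100.0) else "heteroplasmic"
    return state, pct, pct >= threshold_pct


print(describe(0, 500))    # ('homoplasmic', 0.0, False)
print(describe(100, 300))  # ('heteroplasmic', 25.0, False)
print(describe(300, 200))  # ('heteroplasmic', 60.0, True)
```
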


Defects in the nuclear genes on the gDNA that encode proteins for the replication and maintenance of the mtDNA can cause mutations in the mtDNA. Pathogenic phenotypes caused by a mutation in a single gene are called monogenic diseases. Mutations in these gDNA genes are inherited in a Mendelian way, with each parent contributing a single copy of the gene. The inheritance is autosomal dominant, autosomal recessive or X-linked. Autosomal mutations are inherited via the autosomes, and X-linked mutations are inherited via the X chromosome. A mutation that causes a pathogenic phenotype when present in a single copy of an autosomal gene is an autosomal dominant mutation. Autosomal recessive traits are caused by mutations in both copies of an autosomal gene. Affected individuals are therefore homozygous (or compound heterozygous) for a pathogenic mutation. Heterozygous individuals carry one affected and one healthy copy of the gene and are carriers of the autosomal recessive trait. When both parents carry an autosomal recessive trait, each child has a one in four chance of having the disorder. For an X-linked recessive trait, a carrier mother and an unaffected father likewise have a one in four chance of an affected child (an affected son). In autosomal dominant and X-linked dominant inheritance, a child of a heterozygous affected parent has a one in two chance of inheriting the disorder, see figure 6.
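The probabilities in figure 6 follow from combining one allele from each parent, as in a Punnett square. The sketch below is a minimal illustration that assumes each allele is transmitted with probability 1/2. It writes the father of an X-linked cross as "XY" so that he passes on either X or Y. The function name is hypothetical.

```python
from collections import Counter
from itertools import product


def offspring_distribution(parent1, parent2):
    """Genotype probabilities of the progeny for a single locus.

    Each parent is a two-character string of alleles, e.g. "Cc". For X-linked
    loci the father is written as "XY" so that he passes on either X or Y.
    """
    def genotype(a, b):
        # Sort alleles so "cC" and "Cc" count as one genotype; keep Y last.
        return "".join(sorted(a + b, key=lambda allele: (allele == "Y", allele)))

    counts = Counter(genotype(a, b) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}


print(offspring_distribution("Cc", "Cc"))  # {'CC': 0.25, 'Cc': 0.5, 'cc': 0.25}
print(offspring_distribution("Xx", "XY"))  # {'XX': 0.25, 'XY': 0.25, 'Xx': 0.25, 'xY': 0.25}
```

The second call reproduces panel 3 of figure 6: half of the progeny is healthy (XX and XY), a quarter are carriers (Xx) and a quarter are affected sons (xY).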


Figure 6: Composition of various forms of heredity. (1) Autosomal dominant inheritance: H (dominant trait) and h (healthy gene). 50% of the progeny inherits the autosomal dominant trait (red) and 50% is healthy (blue). (2) Autosomal recessive inheritance: both parents are carriers of the autosomal recessive trait (c; C is healthy). 50% of the progeny are carriers (Cc), 25% is healthy (CC) and 25% inherits the autosomal recessive trait (cc). (3) X-linked (recessive) heredity: the mother is a carrier of the recessive X-linked trait (Xx). 50% of the progeny is healthy (XY and XX), 25% are carriers (Xx) and 25% inherits the recessive X-linked trait (xY). Dominant X-linked traits: 50% of the progeny is healthy and 50% inherits the X-linked dominant trait [27].

2.4 Functional tests

Once the potential pathogenic variants have been validated and their segregation determined, functional tests are performed. These functional tests include targeting, familial analyses and gene expression levels.

2.4.1 Targeting of the protein

Through transcription and translation, DNA is expressed as proteins. These proteins are synthesized on the ribosomes and each has a specific function. To function, a protein must be transported to its destination: an organelle, the membrane of an organelle, the inner space of an organelle, the cell membrane or the extracellular matrix outside the cell. This transport is directed by information in the protein itself, namely a specific targeting signal called the signal peptide. Based on this signal peptide, proteins are transported and delivered to the correct organelle.


There are two types of signal peptides: presequences and internal targeting peptides. Presequence targeting peptides are located at the N-terminus (the beginning of the protein) or at the C-terminus (the end of the protein). Internal targeting peptides are enclosed by the rest of the protein [1]. These targeting peptides can be used to localise the proteins encoded by genes that contain a potential pathogenic variant. A mutation in a targeting peptide could alter its composition and render it non-functional. The protein is then no longer targeted to its organelle.
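A very crude calculation can show whether a substitution changes the character of an N-terminal presequence. Mitochondrial presequences are typically enriched in positively charged residues (arginine, lysine) and depleted of acidic ones. The sketch below therefore compares the net charge of the first residues of a wild-type sequence and a mutant sequence. The 30-residue window and the example sequences are hypothetical. This heuristic is no substitute for dedicated predictors such as TargetP or MitoProt II.

```python
def n_terminal_net_charge(protein, window=30):
    """Approximate net charge (R + K - D - E) of the first `window` residues.

    Histidine and the terminal groups are ignored; this is only a rough screen.
    """
    segment = protein[:window].upper()
    positive = sum(segment.count(aa) for aa in "RK")
    negative = sum(segment.count(aa) for aa in "DE")
    return positive - negative


# Hypothetical N-terminus and a mutant in which one arginine became leucine.
wild_type = "MLSRLLRSAKRAVLG"
mutant = "MLSLLLRSAKRAVLG"
print(n_terminal_net_charge(wild_type), n_terminal_net_charge(mutant))  # 4 3
```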

The pre-amplified targeting peptide sequence of a gene of interest is ligated into a vector (pEGFP-N1, see appendix 2, figure 22) using restriction enzymes and DNA ligase. Competent cells, whose cell membranes have been made permeable to DNA, are transformed and used to replicate the vectors. The replicated vectors are isolated and used to transfect human cells with the FuGene reagent, which forms a complex with the DNA. In this complex the DNA is surrounded by a lipid membrane. This lipid membrane fuses with the membrane of the human cell, and the DNA ends up in the cell. Because the vector contains a promoter, the cell transcribes and translates the targeting peptide fused to the GFP (green fluorescent protein). The targeting peptide carries the GFP to its organelle, where the green fluorescence becomes visible. In this way the subcellular location of the protein encoded by the gene with the potential pathogenic variant is determined.

2.4.2 Analyses other patients

The pathogenicity of a mutation can be tested by sequencing other patients with a similar phenotype, in whom the specific exon that carries the variant is sequenced. The same analysis was also used during validation and segregation in the family to determine the inheritance of a variant (autosomal dominant or recessive, X-linked or de novo) [6].

2.4.3 Gene expression levels

Through transcription and translation, cells convert DNA into proteins, which are important for the functioning of the cell. Each protein has its own function, and some proteins are more important than others. The expression of a gene differs between tissues, because every tissue synthesizes the specific proteins it needs to function. Genes that are expressed in virtually all tissues are called housekeeping genes. A mutation in a gene can have several consequences. It can lead to the translation of a non-functional protein, to an increase or decrease in expression, or to the protein not being expressed at all. Quantitative PCR (qPCR) on cDNA makes it possible to determine the expression of genes in various tissues. WES identifies genes that contain potential pathogenic variants. Analysing the expression levels of these genes in multiple tissues shows the effect of the variant on the expression of the gene.

2.5 Affected patients

Two families were of central importance during the graduation and the apprentice internship: family DNA 07-2283 and family DNA 08-5759. For reasons of confidentiality, no family names are used in this report. These families had one or more affected children, who were screened using WES. The bioinformaticians analyzed and filtered the majority of the dataset and reduced the number of potential pathogenic variants. The remaining subset was then validated, tested for segregation and functionally analyzed.


2.5.1 Family DNA 07-2283

Family DNA 07-2283 was a consanguineous family with two affected family members. Both patients died six months after birth. They displayed the same phenotype and clinical symptoms: hypertrophic cardiomyopathy (HCM), abnormal muscle development and underdevelopment. After WES and the filtering steps a subset of 195 variants remained. Six of them, in LAMA3, SYNPO2, DCHS2, SLC6A8, Zyxin (ZYX) and KIAA1109, were selected as potentially pathogenic. The variants in LAMA3, SYNPO2, DCHS2 and KIAA1109 were selected because the two patients shared them. SLC6A8 and ZYX were chosen because their function is related to the phenotype. Several of these variants were analyzed during the graduation term.

2.5.1.1 LAMA3

LAMA3 (laminin, alpha 3), encoded on chromosome 18, is one of the subunits of the heterotrimeric glycoproteins laminin 5, 6 and 7. The laminins are components of the basement membrane, a thin layer of extracellular matrix that underlies or surrounds epithelial, endothelial, muscle, fat and Schwann cells. The basement membrane, including the laminins, mediates the maintenance of skin integrity, filtration and various stages of development. The laminins play an important role during embryonic development by mediating the attachment, migration and organization of cells into tissues [28][29]. The LAMA3 mutation, c.2234G>T, was detected in exon 19 of 76 and changes arginine 745 to leucine (p.Arg745Leu). Both the nucleotide and the amino acid are highly conserved, which increases the likelihood that the mutation is pathogenic.
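Simple arithmetic links a cDNA position (c.) to the affected codon and amino acid (p.). The codon number is (c - 1) // 3 + 1 and the position within the codon is (c - 1) % 3 + 1. The sketch below applies this together with the standard genetic code. The reference codons are taken from the Sanger reference sequences in section 4.1.2, and the function names are illustrative only.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def locate(c_position):
    """Codon number and position within the codon (1-3) for a c. coordinate."""
    return (c_position - 1) // 3 + 1, (c_position - 1) % 3 + 1


def substitute(ref_codon, position_in_codon, ref_base, alt_base):
    """Return (reference amino acid, alternative amino acid) for a substitution."""
    if ref_codon[position_in_codon - 1] != ref_base:
        raise ValueError("reference base does not match the codon")
    alt_codon = (ref_codon[:position_in_codon - 1] + alt_base
                 + ref_codon[position_in_codon:])
    return CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]


codon, pos = locate(2234)                # LAMA3 c.2234G>T
print(codon, pos)                        # 745 2
print(substitute("CGA", pos, "G", "T"))  # ('R', 'L'), i.e. p.Arg745Leu
```

The same check confirms SYNPO2 c.1656A>C (codon 552, AAA to AAC, p.Lys552Asn) and DCHS2 c.8351G>A (codon 2784, AGT to AAT, p.Ser2784Asn).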

2.5.1.2 SYNPO2

Synaptopodin 2 (myopodin) is a multi-adapter protein, encoded on , that interacts and co-localizes with filamin and alpha-actinin during all stages of muscle development. A study [30] showed that the protein is expressed in early stages of in vitro differentiation of human skeletal muscle cells. Synaptopodin 2 is thought to be involved in the early assembly and stabilization of the Z-disc, one of the major components of muscle cells. An affected Z-disc can impair the function of muscle cells and muscle, which could lead to severe problems. Mutations in several other genes that encode Z-disc-related proteins have been found to cause myopathies and cardiomyopathies. This makes the synaptopodin 2 variant a very interesting candidate for causing the severe phenotype [30][31]. The SYNPO2 mutation, c.1656A>C, was detected in exon 4 of 5 and changes lysine 552 to asparagine (p.Lys552Asn).

2.5.1.3 DCHS2

Dachsous 2 (DCHS2) is a cadherin (calcium-dependent adhesion molecule) encoded on chromosome 4. Cadherins are adhesion proteins that mediate cell-to-cell contact and are responsible for tissue and organ organization. They depend on calcium ions to function. DCHS2 functions as a homophilic cell adhesion protein, meaning it binds to an identical DCHS2 protein on an adjacent cell [32]. The DCHS2 mutation, c.8351G>A, was found in exon 25 of 28 and changes serine 2784 to asparagine (p.Ser2784Asn).

2.5.1.4 SLC6A8

Solute carrier family 6 (neurotransmitter transporter, creatine), member 8 (SLC6A8) is a gene on the X chromosome that encodes a transporter protein. The SLC6A8 protein transports creatine across the cell membrane. Creatine is a very important intermediate energy supply for muscle and nerve cells.


Creatine is used to store energy. A phosphate group is added to creatine to form creatine phosphate (CP), and this phosphate is later transferred to ADP, producing ATP (energy). Impaired transport of creatine into the cell reduces the amount of intracellular creatine and therefore lowers energy production. Such a creatine deficiency can be caused by a defect in the SLC6A8 gene and can have severe consequences for organs with a high energy demand. The SLC6A8 gene is linked to a specific trait, X-linked creatine deficiency [33]. The mutation, c.691C>G, was found in exon 4 of 14 and changes leucine 231 to valine (p.Leu231Val).

2.5.1.5 Zyxin

Zyxin, encoded on chromosome 7, is a zinc-binding phosphoprotein and a component of focal adhesions. Focal adhesions are actin-rich structures that connect the extracellular matrix to the cytoskeleton of the cell, and they mediate both the adhesion of the cell to the extracellular matrix and signal transduction. Zyxin binds to alpha-actinin and is thought to mediate the assembly and control of the actin cytoskeleton. Alpha-actinin is one of the components of the Z-disc [34][35]. The zyxin mutation, c.263C>A, was detected in exon 3 of 10 and changes alanine 88 to aspartic acid (p.Ala88Asp).

2.5.1.6 KIAA1109

KIAA1109 is a gene on chromosome 4 that encodes a protein involved in the regulation of epidermal growth and differentiation. The gene lies in a region of the long arm of chromosome 4 that is associated with susceptibility to celiac disease [36]. The mutation, c.6664C>T, was detected in exon 40 of 84 and changes arginine 2222 to tryptophan (p.Arg2222Trp).

2.5.1.7 Transformation fibroblast - Myogenesis

Myogenesis is the development of muscle tissue, in which myoblasts fuse into myotubes that form muscle fibres. Specific myogenic regulatory factors can convert fibroblasts into myotubes, so that failing muscle development can be detected. These factors are encoded by vectors that are incorporated into an adenovirus, which is used to transfect the fibroblasts. The inserted vectors trigger the fibroblasts to express muscle proteins. After the myoblasts have formed and fused into myotubes, various tests can be performed on the myotubes, such as qPCR, electron microscopy, Western blots and antibody staining [37].

2.5.2 Family DNA 08-5759

Family DNA 08-5759 was not consanguineous and had one affected family member. The patient, who also died at a young age, had malfunctioning kidneys and rhabdomyolysis. Rhabdomyolysis is a severe condition in which skeletal muscle breaks down. It probably caused the kidney failure, because the metabolites released from damaged muscle cells are toxic to the kidneys. The subset of potential pathogenic variants contained 40 variants, and most of these were excluded because of incorrect segregation. Two variants were validated and segregated correctly, in KRTAP10-6 and PSPH. They were selected because of, respectively, a high prediction score (Grantham, etc.) and a gene function related to the phenotype. During the graduation term these variants were considered relevant.


2.5.2.1 KRTAP10-6

Keratin is a fibrous structural protein found in hair, skin and nails. Keratin proteins form intermediate filaments, and in specific cell types, such as cells of the epidermis, the keratin filaments are part of the cytoskeleton. In hair, the keratin filaments combine with keratin-associated proteins (KRTAPs) to form rigid and resistant hair shafts. The KRTAPs fall into three groups: high-cysteine, ultrahigh-cysteine and high-glycine-tyrosine. The KRTAP genes are divided into 27 subfamilies (KRTAP1 to KRTAP27). KRTAP10-6 is a high-cysteine KRTAP encoded on chromosome 21, and the protein is expressed only in hair [38][39][40]. The KRTAP10-6 mutations, c.184C>T and c.206C>G, were detected in exon 1, the only exon of this gene. They change arginine 62 to cysteine and proline 69 to arginine.

2.5.2.2 PSPH

The phosphoserine phosphatase gene (PSPH), on chromosome 7, encodes an enzyme that catalyzes the formation of L-serine. PSPH is a member of the haloacid dehalogenase (HAD) superfamily, whose members are highly conserved. PSPH catalyzes the last, irreversible step in the formation of L-serine. Using magnesium, it hydrolyses L-phosphoserine into two products: L-serine and inorganic phosphate (Pi) [40]. The mutations, c.81A>T and c.95A>G, changed asparagine to cysteine and arginine to serine, respectively, and were found in exon 4 of 8.

2.6 Aim/ hypothesis

The aim of the project was to use WE enrichment and Illumina sequencing to detect potential pathogenic variants in the mtDNA and gDNA. The dataset of thousands of detected variants would then be filtered by data and variant analyses to unravel the genetic complexity of the mitochondrial myopathies. The protein-coding regions are assumed to harbour the majority of the disease-causing mutations, and WES sequences exactly these regions and detects the mutations in them. The hope is that WES can resolve the larger part of the genetic and pathophysiological complexity of the mitochondrial myopathies and, eventually, provide a more reliable diagnosis for the patient.

The thousands of detected variants are filtered to find the potential pathogenic variants. The filtered subset is validated and tested for segregation, and the gene function is linked to the phenotype. The remaining variants are functionally tested (targeting, gene expression levels and familial analysis) to identify the pathogenic variant.


3. Methodology

The following methodologies are standard methods used for the detection and validation of a pathogenic variant. The functional methods performed during the research depended on the patient, or rather on the patient's phenotype.

3.1 Whole Exome Enrichment & Illumina Genome Analyzer HiSeq 2000

Whole exome sequencing consists of multiple steps. The first is the WE enrichment kit, SureSelect Exon Enrichment (Agilent Technologies). The second is the cluster generation kit, TruSeq PE Cluster Kit v3. The third is the WE sequencing kit, TruSeq SBS Kit v3 (200 cycles) (Illumina). Other methods were used within these three steps: quality and quantity checks (Qubit, Bioanalyzer), fragmentation (Covaris S2) and purification/isolation with AMPure XP beads. These standard methods were taken from the apprenticeship report [42], and every method is included in the appendix.

3.1.1 Sample preparation

One library was prepared for each DNA sample to be sequenced. First the gDNA was sheared using the Covaris S2 (appendix 3), after which the DNA sample was purified with AMPure XP beads (Agencourt) (appendix 4). The length of the fragmented and purified DNA was verified using the Bioanalyzer (Agilent 2100) (appendix 5). After verification, the "End repair" mix was added to the sample and the sample was purified using the AMPure XP beads. Then 10X Klenow Polymerase Buffer, dATP and Exo(-) Klenow were added to the DNA sample, which was purified once more with AMPure XP beads. The purified DNA sample was added to the prepared "ligation master mix" and purified using the AMPure XP beads. After purification the PCR components were added to the DNA sample. After PCR the sample was purified, and the Bioanalyzer was used to assess its quantity, quality and fragment size (±100-120 bp). The sample preparation was performed according to the SureSelect Exon Enrichment protocol [10].

3.1.2 Hybridization

Each prepared DNA library sample was hybridized and captured individually, and the samples were not pooled at this stage. First the "hybridization buffers", the "SureSelect Capture Library" mix and the "SureSelect Block" mix were prepared. The prepped DNA library sample was pipetted into the wells of row B of a PCR plate. The "SureSelect Block" mix was added to the same wells and mixed by pipetting up and down. The hybridization buffers and the capture library mix were pipetted into the wells of rows A and C, respectively. A fraction of the hybridization buffers and the entire content of the prepped DNA library sample were transferred to row C. The mixture was mixed by pipetting up and down and incubated for 24 hours at 65 °C. During incubation the magnetic beads were prepared using "SureSelect Binding Buffer" and "SureSelect Wash Buffer #2". The hybridization mixture was added to the magnetic bead solution, and the beads were washed using "SureSelect Wash Buffer #1" and "SureSelect Wash Buffer #2". The DNA was then eluted with "SureSelect Elution Buffer". After "SureSelect Neutralization Buffer" was added to the captured DNA, the DNA sample was purified using the AMPure XP beads. The hybridization was performed according to the SureSelect Exon Enrichment protocol [10].


3.1.3 Addition of Index Tags by Post-Hybridization Amplification

One amplification reaction was prepared for each hybrid capture. The captured DNA was added to the "Herculase II Master Mix" in a PCR tube. After all reagents had been added, the PCR tubes were loaded into the thermal cycler and the PCR program was run. After PCR the DNA sample was purified using the AMPure XP beads. The quality and quantity of the DNA were assessed with the Bioanalyzer and qPCR, respectively. After this check, the samples were pooled in equimolar amounts. The pooled samples were then prepared for cluster generation using the TruSeq Cluster Generation Kit. This kit included "HP3 (2 N NaOH)", "HT1 (Hybridization Buffer)" and the PhiX Control, a known bacteriophage genome that serves as a control during sequencing runs. The addition of index tags was performed according to the SureSelect Exon Enrichment protocol [10].
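Pooling in equimolar amounts requires converting each library's concentration to molarity. The sketch below assumes that the concentration in ng/µl and the mean fragment length are known. It uses the common approximation of 660 g/mol per base pair of double-stranded DNA. When qPCR already reports molarity, only the second function is needed. The sample names, values and target amount per sample are hypothetical.

```python
def library_molarity_nM(conc_ng_per_ul, mean_fragment_bp):
    """Molarity (nM) of a dsDNA library, assuming ~660 g/mol per base pair."""
    return conc_ng_per_ul * 1e6 / (660 * mean_fragment_bp)


def pooling_volumes_ul(molarities_nM, fmol_per_sample):
    """Volume (µl) of each library that contributes the same amount to the pool.

    1 nM equals 1 fmol/µl, so volume = amount / concentration.
    """
    return [fmol_per_sample / m for m in molarities_nM]


libraries = {"sample_1": (12.0, 300), "sample_2": (8.0, 280)}  # hypothetical (ng/µl, bp)
molarities = [library_molarity_nM(c, bp) for c, bp in libraries.values()]
for name, volume in zip(libraries, pooling_volumes_ul(molarities, fmol_per_sample=100)):
    print(f"{name}: {volume:.2f} µl")
```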

3.1.4 Cluster generation

First the cBot Reagent Plate was prepared by thawing the reagents in the 96-well plate. 120 µl of the denatured DNA (the last step of 3.1.3) was loaded into an eight-tube strip. After the reagent plate and the DNA had been prepared, the eight-tube strip, the 96-well plate, the waste bottle and the flow cell were loaded into the cBot. The disposable self-piercing sippers were attached to the cBot above the flow cell and the 96-well reagent plate, and the specific cBot clustering program was run. The cluster generation was performed using the TruSeq PE Cluster Kit v3 [43].

3.1.5 Sequencing by synthesis

The TruSeq SBS multiplexing reagents were thawed and prepared for the first sequencing read. Before sequencing with the HiSeq 2000, all the sippers were flushed with water. The reagents and flow cell were then loaded into the HiSeq 2000 and the first sequencing read was started. After 4-5 days the stored paired-end reagents were freshly prepared for the second read and loaded into the HiSeq 2000, replacing the reagents of the first read. The program was then run. The SBS was performed using the TruSeq SBS Kit v3 (200 cycles) [44].

3.2 Data analysis

The bioinformaticians filtered the obtained sequence dataset of all variants. The filtered subset of possible pathogenic variants was further reduced by additional analyses: validation, segregation of the variant and relating the function of the gene to the phenotype.

3.2.1 Validation & segregation

The validation and segregation of the variants were performed with the same methodologies. First, primers were designed for the specific target gene such that the amplicon contained the mutation. These primers were used in a PCR, which was optimized where necessary, and the amplicon was visualized by gel electrophoresis. The amplicon was purified with AMPure XP beads and sequenced by Sanger sequencing. The Sanger sequencing data were compared with a reference and/or multiple family members. All these methods were taken from the apprenticeship report [42] and are included in appendices 5 to 8.


3.2.2 Relate function gene to phenotype

The query "name gene, Homo sapiens" was entered in the NCBI database with the search database set to Gene. The gene entry was opened and its summary or related articles were consulted. The function of the gene was then linked to the phenotype. Genes whose functions were unrelated to mitochondria or metabolism were excluded.

3.3 Functional tests

The reduced subset of validated variants was functionally tested with targeting, familial and gene expression level analyses. Not every test was used for every variant. The suitable test depended on the type of variant, the patient and the phenotype of the patient.

3.3.1 Targeting

First, primers were designed for the specific target gene such that the amplicon contained the first 60-120 coding bp, encoding the N-terminus. Each primer also received an overhang and the restriction site of one of two different restriction enzymes. These primers were used in a slowdown PCR, and the amplicon was visualized by gel electrophoresis. The sample was purified using the MSB Spin PCRapace (Invitek). The purified amplicon and a vector (pEGFP-N1) were then digested with the restriction enzymes KpnI and AgeI. The digested sample and vector were purified again using the MSB Spin PCRapace. Then 1 µl T4 DNA ligase, ligase buffer and the digested, purified vector were added to the sample. The sample was incubated overnight at 4 °C. The ligation was verified using a slowdown PCR with specific "vector" primers (see appendix 2). PCR amplicons of insert-vector and "empty" vector ligations were visualized by gel electrophoresis. For each sample that contained an insert-vector ligation, an agar plate and 100 ml LB medium containing kanamycin (30 µg/ml) were prepared. While the LB agar and medium were being autoclaved, one vial of competent cells (100 µl MAX Efficiency DH5α competent cells, Invitrogen) was transformed per sample. The transformed cells were streaked onto the solidified agar plate and incubated overnight at 37 °C. Colonies were picked into 15 ml glass tubes containing 3 ml LB medium (+ 30 µg/ml kanamycin), one tube per colony. The tubes were incubated for 7-8 hours at 250 rpm and 37 °C. The vectors were then isolated using the GeneJET Plasmid Miniprep Kit (Fermentas Life Sciences). The isolated vectors were quantified with the NanoDrop and verified using a slowdown PCR and an NcoI digestion.
After verification, the insert-vector samples were mixed in the proper concentration with FuGene HD Transfection Reagent (Roche Applied Science) and used to transfect HeLa cells. For each sample, a chamber slide containing 50-70% confluent HeLa cells was used. The transfected HeLa cells were incubated for 1-2 days at 37 °C. MitoTracker Red CMXRos (Molecular Probes, Invitrogen) was then added to the chamber slides and the cells were fixed. The slides were air-dried, and a cover glass was placed over the cells on a drop of DABCO DAPI. The slides were examined using a fluorescence microscope.
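The primer design described above can be sketched as follows. The recognition sequences of KpnI (GGTACC) and AgeI (ACCGGT) are standard. The forward primer carries KpnI and the reverse primer AgeI, following their order upstream of EGFP in the pEGFP-N1 multiple cloning site. This placement, the 20-nt annealing length and the four-base overhang are all assumptions, and the function name is hypothetical. The sketch only checks that the insert length is a multiple of three and free of internal sites. Whether the insert is in frame with GFP must still be checked against the vector map.

```python
KPNI_SITE = "GGTACC"  # KpnI recognition sequence
AGEI_SITE = "ACCGGT"  # AgeI recognition sequence
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def targeting_primers(cds, insert_bp=90, anneal_bp=20, overhang="GCGC"):
    """Forward/reverse primers that amplify the first `insert_bp` coding bp.

    `overhang` (hypothetical) gives the enzymes room to cut near the DNA end.
    """
    if not 60 <= insert_bp <= 120 or insert_bp % 3 != 0:
        raise ValueError("insert must be 60-120 bp and a multiple of 3")
    insert = cds.upper()[:insert_bp]
    if len(insert) < insert_bp:
        raise ValueError("coding sequence is shorter than the requested insert")
    for site in (KPNI_SITE, AGEI_SITE):
        if site in insert:
            raise ValueError(f"insert contains an internal {site} site")
    forward = overhang + KPNI_SITE + insert[:anneal_bp]
    reverse = overhang + AGEI_SITE + reverse_complement(insert[-anneal_bp:])
    return forward, reverse


cds = "ATG" + "GCTCGA" * 20  # hypothetical coding sequence (123 bp)
fwd, rev = targeting_primers(cds)
print(fwd)
print(rev)
```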

3.3.2 Analyses other patients

The analysis of other patients followed the same protocols as the validation and segregation of the variant. All these methods were taken from the apprenticeship report [42].


3.3.3 Gene expression levels

First the cultured cells were harvested, and RNA was isolated from them using the High Pure RNA Isolation Kit (Roche Applied Science). For the reverse transcriptase reaction, 1-2 µg RNA was used, and its volume was adjusted to 26 µl with DEPC-treated water. The following reagents were added to the RNA: 4 µl Reverse Transcriptase 200 U (Finnzymes), 1.5 µl oligo(dT) 500 µg/ml (Invitrogen), 1.5 µl random hexamer primer 500 µg/ml, 5.0 µl dNTPs 10 mM, 1 µl RNasin 40 U (Promega) and 1 µl First Strand Buffer 10x (Finnzymes). After incubation for 1 hour at 42 °C and 5 minutes at 95 °C, the cDNA was used for qPCR. The data were analyzed to estimate the relative expression level of the gene.
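The report does not state how the qPCR data were analyzed. One widely used option is the comparative 2^-ΔΔCt (Livak) method. The sketch below applies it under the assumption that the target and reference (housekeeping) genes amplify with roughly 100% efficiency. The Ct values in the example are hypothetical.

```python
def relative_expression(ct_target_patient, ct_ref_patient,
                        ct_target_control, ct_ref_control):
    """Fold change of the target gene in the patient versus the control.

    Uses the 2^-ddCt method: each target Ct is normalised to a reference
    (housekeeping) gene, and the patient is then compared with the control.
    """
    delta_patient = ct_target_patient - ct_ref_patient
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_patient - delta_control
    return 2 ** (-delta_delta)


# Hypothetical Ct values: the target appears one cycle later in the patient.
print(relative_expression(26.0, 18.0, 25.0, 18.0))  # 0.5, i.e. half the control level
```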

3.3.4 Transformation fibroblasts

Two wells of a 6-well plate were first coated with a 1:50 dilution of matrigel, which solidified during two hours of incubation. These two wells were used for the transfection. Patient and control fibroblasts were then seeded to obtain 50-70% confluent wells. 100 µl of a solution of 2.5 × 10⁷ adenovirus particles was added to 3 ml DMEM medium (+10% FBS, 1% P/S), and 1.5 ml of this was added to each of the two matrigel-coated wells. After approximately three hours the transfection medium was removed. The two wells were washed twice with PBS (1%), and 2 ml fresh DMEM medium was added. The differentiation medium was made during the 24-hour incubation: 50 ml DMEM + 4.5 g/l glucose containing 0.5% BSA, 0.15 mg/µl creatine and 5 ng/ml insulin. The solution was filtered through a 0.2 µm filter, and EGF (final conc.: 10 ng/µl) was added before use. After the 24-hour incubation the DMEM medium was removed and the differentiation medium was pipetted into the two matrigel-coated wells. The medium was replaced with fresh differentiation medium every 1-2 days [45].


4. Results & Discussion/ Conclusion

4.1 Family DNA 07-2283

4.1.1 Variants

WES detects thousands of variants, which are filtered to a reduced and manageable subset of potential pathogenic variants. Table 1 gives the number of variants remaining after each filter step.

Table 1: Number of variants of family DNA 07-2283 after each filter step.

Filter step | Number of variants
Total detected variants | ± 39914
No RS number & low frequency | ± 7241
Coding regions & non-synonymous | ± 1472
Homozygous/compound heterozygous (2 alleles) | ± 694
Only homozygous | ± 195
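The filter cascade in table 1 can be illustrated with a short sketch. The actual bioinformatics pipeline is not described in this report, so several assumptions are made here. The fields of the hypothetical Variant record and the 1% frequency cut-off are invented for illustration. The first step is read as keeping a variant if it has no rs number or is rare. Compound heterozygosity is approximated as two or more heterozygous variants in the same gene, without checking that they lie on different alleles.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Variant:
    gene: str
    rs_id: Optional[str]   # dbSNP identifier, None if the variant is novel
    frequency: float       # population allele frequency (0-1)
    coding: bool           # located in a coding region
    non_synonymous: bool   # changes the encoded amino acid
    zygosity: str          # "hom" or "het"


def two_alleles(variants):
    """Keep homozygous variants and genes with at least two heterozygous variants."""
    het_per_gene = Counter(v.gene for v in variants if v.zygosity == "het")
    return [v for v in variants if v.zygosity == "hom" or het_per_gene[v.gene] >= 2]


def filter_variants(variants: List[Variant], max_frequency: float = 0.01
                    ) -> Tuple[List[Tuple[str, int]], List[Variant]]:
    """Apply the filter steps of table 1 in order and count what remains."""
    steps = [
        ("No RS number & low frequency",
         lambda vs: [v for v in vs if v.rs_id is None or v.frequency < max_frequency]),
        ("Coding regions & non-synonymous",
         lambda vs: [v for v in vs if v.coding and v.non_synonymous]),
        ("Homozygous/compound heterozygous (2 alleles)", two_alleles),
        ("Only homozygous", lambda vs: [v for v in vs if v.zygosity == "hom"]),
    ]
    remaining = list(variants)
    counts = [("Total detected variants", len(remaining))]
    for name, step in steps:
        remaining = step(remaining)
        counts.append((name, len(remaining)))
    return counts, remaining
```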

Because a high number of variants remained, homozygosity mapping was performed in the family to compare the two patients with each other. Combining the mapping with the WES data left four variants that were present in both patients: LAMA3, SYNPO2, DCHS2 and KIAA1109. Two further variants, ZYX and SLC6A8, were picked from the larger WES dataset because the function of the gene could be related to the phenotype. The damaging effect of the variants was assessed using the Grantham, PolyPhen-2 and SIFT scores. This small subset of variants was chosen because of their prediction scores or their metabolic or pathway-related function (see theoretical background). Table 2 gives the prediction scores of the six variants (LAMA3, SYNPO2, DCHS2, SLC6A8, ZYX and KIAA1109).

Table 2: The damaging effect of the six potential pathogenic variants, determined by various prediction scores.

Variant | Grantham score | PolyPhen-2 | SIFT
LAMA3 | 102 | 1.000 (probably damaging) | 0 (damaging)
SYNPO2 | 94 | 0.493 (possibly damaging) | 0.43 (tolerated)
DCHS2 | 46 | 0.067 (benign) | 0.71 (tolerated)
SLC6A8 | 32 | 0.003 (benign) | 0.37 (tolerated)
ZYX | 126 | 0.079 (benign) | 0.28 (tolerated)
KIAA1109 | 101 | 0.999 (probably damaging) | 0 (damaging)


Discussion/ Conclusion

After the filtering steps the number of variants was still high, so the remaining subset had to be reduced further. The family is consanguineous, meaning the parents are related to each other and may share a large part of their genome. Therefore an additional strategy was used: homozygosity mapping of the family members to find significant regions of homozygosity. Combining this strategy with the WES data left four variants. The other two variants were picked from the WES data because of their relatedness to the phenotype. ZYX encodes a Z-disc-related protein, and SLC6A8 is related to creatine, which muscles use to store energy. Of the four variants found by combining homozygosity mapping and WES, only LAMA3 and SYNPO2 were considered interesting. The other two were considered less relevant because of their function: KIAA1109 is associated with celiac disease, and DCHS2 encodes a cadherin.
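The idea behind homozygosity mapping can be sketched in two steps: find runs of consecutive homozygous genotype calls in each patient, then intersect them. This is a simplified illustration. Real analyses use dense SNP data and minimum run lengths in megabases. The sketch instead uses a hypothetical minimum number of consecutive homozygous calls and assumes that the positions lie on one chromosome and are sorted. The function names are illustrative.

```python
def runs_of_homozygosity(calls, min_calls=3):
    """Return (start, end) positions of runs of >= min_calls homozygous calls.

    `calls` is a position-sorted list of (position, is_homozygous) tuples
    for a single chromosome.
    """
    runs, start, last, length = [], None, None, 0
    for position, homozygous in calls:
        if homozygous:
            if start is None:
                start, length = position, 0
            last = position
            length += 1
        else:
            if start is not None and length >= min_calls:
                runs.append((start, last))
            start, length = None, 0
    if start is not None and length >= min_calls:
        runs.append((start, last))
    return runs


def shared_regions(runs_a, runs_b):
    """Overlap of two lists of (start, end) intervals."""
    shared = []
    for start_a, end_a in runs_a:
        for start_b, end_b in runs_b:
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start <= end:
                shared.append((start, end))
    return shared


def in_shared_region(position, regions):
    """True if a variant position falls inside any shared homozygous region."""
    return any(start <= position <= end for start, end in regions)
```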

Overall, the three prediction programs predict broadly the same damaging effect for each variant. A variant is considered interesting when the scores predict it to be possibly or probably damaging. Possibly damaging: Grantham ≥ 101, PolyPhen-2 ≥ 0.151 and SIFT ≥ 0.05. Probably damaging: Grantham ≥ 150, PolyPhen-2 ≥ 0.851 and SIFT ≤ 0.05. The prediction scores of LAMA3 indicate that the amino acid change is severe and possibly to probably damaging. The SYNPO2, DCHS2, SLC6A8 and ZYX variants are more tolerated, although SYNPO2 might be possibly damaging. The KIAA1109 variant is predicted to be probably damaging, but despite the high prediction score it is considered less important because of its function. Based on their relatedness to the phenotype and their prediction scores, the most interesting variants are LAMA3 and SYNPO2. However, a prediction score, as the name indicates, only predicts the damaging effect. Only functional tests can prove whether a variant is damaging. Nevertheless, the prediction programs remain a useful tool for filtering the potential pathogenic variants.
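The cut-offs listed above can be applied mechanically. The sketch below uses the Grantham and PolyPhen-2 thresholds given in the text. For SIFT it uses the standard cut-off, under which a score ≤ 0.05 is predicted damaging, and it does not model the separate "possibly damaging" SIFT threshold. The function names and the "below threshold" label are illustrative.

```python
def grantham_class(score):
    """Classify a Grantham distance with the thresholds used in this report."""
    if score >= 150:
        return "probably damaging"
    if score >= 101:
        return "possibly damaging"
    return "below threshold"


def polyphen_class(score):
    """Classify a PolyPhen-2 score with the thresholds used in this report."""
    if score >= 0.851:
        return "probably damaging"
    if score >= 0.151:
        return "possibly damaging"
    return "benign"


def sift_class(score):
    """Standard SIFT cut-off: scores <= 0.05 are predicted to be damaging."""
    return "damaging" if score <= 0.05 else "tolerated"


# Scores from table 2: (Grantham, PolyPhen-2, SIFT).
table_2 = {
    "LAMA3": (102, 1.000, 0.0), "SYNPO2": (94, 0.493, 0.43),
    "DCHS2": (46, 0.067, 0.71), "SLC6A8": (32, 0.003, 0.37),
    "ZYX": (126, 0.079, 0.28), "KIAA1109": (101, 0.999, 0.0),
}
for gene, (grantham, polyphen, sift) in table_2.items():
    print(gene, grantham_class(grantham), polyphen_class(polyphen),
          sift_class(sift), sep=" | ")
```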

4.1.2 Sanger sequencing validation & segregation

The subset of potential pathogenic variants was to be validated and tested for segregation. Not every gene was validated or tested for segregation. The available validation and segregation data are included here, and table 3 gives an overview. The Sanger sequence data of the forward and reverse strands are mapped against the reference, and the variant and its reference base are marked between the two dashed lines. The codon that contains the mutation is underlined. The family tree also shows the genotypes of the patient, mother and father. The primers for Sanger sequencing are included in appendix 9.

Table 3: Overview of validation and segregation of the variants.

Variant | Status of validation | Segregation check
LAMA3 | Confirmed | Segregated correctly
SYNPO2 | Confirmed | Segregated correctly
DCHS2 | Confirmed | Not tested
SLC6A8 | Unconfirmed | Segregated incorrectly
ZYX | Confirmed as heterozygous, not as homozygous | Segregated correctly as a heterozygous variant
KIAA1109 | Not tested | Not tested


LAMA3, c.2234G>T p.Arg745Leu (figure 7):
Patient (T/T):
Forward: 5’ CTT C T A TTT 3’ (reference: 5’ CTT C G A TTT 3’)
Reverse: 5’ AAA T A G AAG 3’ (reference: 5’ AAA T C G AAG 3’)
Father (G/T):
Forward: 5’ CTT C T/G A TTT 3’ (reference: 5’ CTT C G A TTT 3’)
Reverse: 5’ AAA T A/C G AAG 3’ (reference: 5’ AAA T C G AAG 3’)
Mother (G/T):
Forward: 5’ CTT C T/G A TTT 3’ (reference: 5’ CTT C G A TTT 3’)
Reverse: 5’ AAA T A/C G AAG 3’ (reference: 5’ AAA T C G AAG 3’)

In the forward strand the T is the mutation; in the reverse strand the complementary A is the mutation.
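Because the reverse read is the reverse complement of the forward read, the mutant allele on the reverse strand is always the complement of the forward allele (T pairs with A). A minimal Python sketch of that consistency check, using the LAMA3 patient reads as input; the helper names are hypothetical, and heterozygous calls such as "T/G" are not handled.

```python
# Sketch: check that a reverse-strand Sanger read is the reverse complement
# of the forward read. Helper names are hypothetical; only A, C, G, T handled.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (spaces are ignored)."""
    bases = seq.replace(" ", "").upper()
    return "".join(COMPLEMENT[b] for b in reversed(bases))


def strands_agree(forward: str, reverse: str) -> bool:
    """True when the reverse read equals the reverse complement of the forward read."""
    return reverse_complement(forward) == reverse.replace(" ", "").upper()


if __name__ == "__main__":
    # LAMA3 patient reads (T/T): the forward T pairs with the reverse A.
    print(strands_agree("CTT C T A TTT", "AAA T A G AAG"))
```

The same check applied to the reference sequences confirms that the two reference strands listed for each variant belong together.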

SYNPO2, c.1656A>C p.Lys552Asn (figure 8):
Patient (C/C):
Forward: 5’ GCA A C A GCT 3’ (reference: 5’ GCA A A A GCT 3’)
Reverse: 5’ AGC T G T TGC 3’ (reference: 5’ AGC T T T TGC 3’)
Father (A/C):
Forward: 5’ GCA A A/C A GCT 3’ (reference: 5’ GCA A A A GCT 3’)
Reverse: 5’ AGC T G/T T TGC 3’ (reference: 5’ AGC T T T TGC 3’)
Mother (A/C):
Forward: 5’ GCA A A/C A GCT 3’ (reference: 5’ GCA A A A GCT 3’)
Reverse: 5’ AGC T G/T T TGC 3’ (reference: 5’ AGC T T T TGC 3’)

In the forward strand the C is the mutation; in the reverse strand the complementary G is the mutation.

Figure 7: Family tree of the LAMA3 variant. Both the mother and father are heterozygous (no trait, carriers, crossed sign (G/T)); both patients are homozygous (trait, fully colored (T/T)). Grandparents were not included in the genotyping (blank).

Figure 8: Family tree of the SYNPO2 variant. Both the mother and father are heterozygous (no trait, carriers, crossed sign (A/C)); both patients are homozygous (trait, fully colored (C/C)). Grandparents were not included in the genotyping (blank).


DCHS2, c.8351G>A p.Ser2784Asn:
Patient (A/A):
Forward: 5’ GGC A A T AAA 3’ (reference: 5’ GGC A G T AAA 3’)
Reverse: 5’ TTT A T T GCC 3’ (reference: 5’ TTT A C T GCC 3’)

In the forward strand the A is the mutation; in the reverse strand the complementary T is the mutation. The segregation of the DCHS2 variant was not tested because the other genes had priority.
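The coding position of a substitution determines the affected codon: the codon number is ⌊(c − 1)/3⌋ + 1, and (c − 1) mod 3 gives the position within the codon. The sketch below is a hypothetical Python helper built on the standard genetic code (not an HGVS library API); it reproduces the DCHS2 protein change from its coding position and the reference codon AGT visible in the reads above.

```python
# Sketch: derive the protein change of a coding substitution from its
# c. position and reference codon, using the standard genetic code.
# The helper name and interface are illustrative, not a library API.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Ter",
}


def protein_change(c_pos: int, ref_codon: str, alt_base: str) -> str:
    """Return e.g. 'p.Ser2784Asn' for a substitution at coding position c_pos."""
    codon_number = (c_pos - 1) // 3 + 1   # codons are numbered from 1
    offset = (c_pos - 1) % 3              # 0, 1 or 2 within the codon
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa = THREE_LETTER[CODON_TABLE[ref_codon]]
    alt_aa = THREE_LETTER[CODON_TABLE[alt_codon]]
    return f"p.{ref_aa}{codon_number}{alt_aa}"


if __name__ == "__main__":
    # DCHS2 c.8351G>A: position 8351 is the 2nd base of codon 2784 (AGT).
    print(protein_change(8351, "AGT", "A"))
```

The same helper gives p.Arg745Leu for LAMA3 (codon CGA) and p.Ala88Asp for ZYX (codon GCT), which is a quick way to cross-check the nucleotide and protein notation of each variant.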

SLC6A8, c.691C>G p.Leu231Val (figure 9):
Patient (G/G):
Forward: 5’ GCC C TC AAC 3’ (reference: 5’ GCC C TC AAC 3’)
Reverse: 5’ GTT GA G GGC 3’ (reference: 5’ GTT GA G GGC 3’)
Father (C/G):
Forward: 5’ GCC C TC AAC 3’ (reference: 5’ GCC C TC AAC 3’)
Reverse: 5’ GTT GA G GGC 3’ (reference: 5’ GTT GA G GGC 3’)
Mother (C/G):
Forward: 5’ GCC C TC AAC 3’ (reference: 5’ GCC C TC AAC 3’)
Reverse: 5’ GTT GA G GGC 3’ (reference: 5’ GTT GA G GGC 3’)

In the forward strand a G would be the mutation; in the reverse strand a C would be the mutation. The mutation was not observed in the Sanger sequence results: all reads show only the reference allele.

Figure 9: Family tree of the SLC6A8 variant. Both the mother and father are homozygous for the wild type (no trait, blank (C/C)); both patients are also homozygous for the wild type (no trait, blank (C/C)). Grandparents were not included in the genotyping (blank).

ZYX, c.263C>A p.Ala88Asp (figure 10):
Patient (C/A):
Forward: 5’ GGT G C/A T CTG 3’ (reference: 5’ GGT G C T CTG 3’)
Reverse: 5’ CAG A G/T C ACC 3’ (reference: 5’ CAG A G C ACC 3’)
Father (C/A):
Forward: 5’ GGT G C/A T CTG 3’ (reference: 5’ GGT G C T CTG 3’)
Reverse: 5’ CAG A G/T C ACC 3’ (reference: 5’ CAG A G C ACC 3’)
Mother (C/C):
Forward: 5’ GGT G C T CTG 3’ (reference: 5’ GGT G C T CTG 3’)
Reverse: 5’ CAG A G C ACC 3’ (reference: 5’ CAG A G C ACC 3’)

In the forward strand the A is the mutation; in the reverse strand the complementary T is the mutation.


Figure 10: Family tree of the ZYX variant. The mother is homozygous for the wild type (no trait, blank (C/C)); the father is heterozygous (no trait, carrier, crossed sign (A/C)); both patients are heterozygous (trait, but shown with a crossed sign because of the heterozygosity (A/C)). Grandparents were not included in the genotyping.

KIAA1109, c.6664C>T p.Arg2222Trp: this variant has not been validated or tested for segregation.

Discussion/ Conclusion
The LAMA3, SYNPO2, DCHS2 and ZYX variants were all confirmed by Sanger sequencing. The SLC6A8 variant was not confirmed, which indicates that it was a false-positive read; it was therefore excluded. The ZYX variant is heterozygous rather than homozygous. The KIAA1109 variant was neither validated nor tested for segregation, because it could not be linked to the phenotype. Only the LAMA3 variant (high prediction scores) and the SYNPO2, SLC6A8 and ZYX variants (all related to the phenotype) were tested for segregation. DCHS2 was not tested for segregation because the function of the gene did not relate to the phenotype. The segregation results show that both the LAMA3 and the SYNPO2 variants segregated correctly. The ZYX variant also segregated correctly as a heterozygous variant: the mother does not carry the mutation (C/C), so the variant can never be homozygous in the patients.

The unaffected parents were both heterozygous carriers of the LAMA3 and SYNPO2 mutations (LAMA3, G/T; SYNPO2, A/C), and the two patients were homozygous for the mutations (LAMA3, T/T; SYNPO2, C/C). The SLC6A8 variant did not segregate correctly: the parents were both homozygous for the wild type (C/C), and, as mentioned before, so were both patients (C/C). The validation and segregation analysis of the SLC6A8 variant were performed simultaneously, because running them separately for one gene would be time-consuming.
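Under the recessive model used for LAMA3 and SYNPO2, a variant segregates correctly when the affected patient is homozygous and both unaffected parents are heterozygous carriers. A minimal Python sketch of that check, assuming the family-tree genotype notation ("G/T") and ignoring de novo mutations; the function names are illustrative.

```python
# Sketch: test segregation under a simple autosomal recessive (homozygous)
# model. Assumptions: unaffected parents, no de novo events, genotypes "X/Y".

def alleles(genotype: str) -> tuple[str, str]:
    """Split a genotype such as 'G/T' into its two alleles."""
    first, second = genotype.split("/")
    return first, second


def segregates_recessive(alt: str, patient: str, father: str, mother: str) -> bool:
    """Patient homozygous for alt, and each parent carries exactly one alt allele."""
    return (
        alleles(patient).count(alt) == 2
        and alleles(father).count(alt) == 1
        and alleles(mother).count(alt) == 1
    )


if __name__ == "__main__":
    print(segregates_recessive("T", "T/T", "G/T", "G/T"))   # LAMA3
    print(segregates_recessive("A", "C/A", "C/A", "C/C"))   # ZYX
```

With the genotypes above, LAMA3 passes, whereas ZYX fails the homozygous model because the mother is C/C; this matches the conclusion that ZYX can only be heterozygous in the patients.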

Only the LAMA3 and SYNPO2 variants remain after validation and segregation analysis; these are considered the most relevant variants. LAMA3 is a component of the basement membrane, and testing it requires material that is currently not available; therefore SYNPO2 was chosen for the functional tests. SYNPO2 is linked to alpha-actin in muscle cells, or more precisely to Z-disc assembly, which makes the SYNPO2 variant easier to investigate.


4.1.3 SYNPO2 protein targeting
The potential pathogenic variants were to be tested functionally; only one variant, SYNPO2, was tested, because of its relation to the phenotype. The subcellular targeting of SYNPO2 was examined in HeLa cells (figure 13). The cells were transfected with the potential signal peptide of SYNPO2 fused to GFP (SYNPO2-GFP) and stained with DAPI, which stains the nucleus, and MitoTracker Red, which stains the mitochondria. All stainings were visualized with a fluorescence microscope and combined in a composite image. As controls, a vector without insert (figure 12) and OXCT1 (figure 11) were used; OXCT1 has a mitochondrial signal peptide and was chosen as a control for that reason.


Figure 11: Positive control, OXCT1. (A) DAPI (nucleus) staining, (B) MitoTracker (mitochondria) staining, (C) GFP and (D) composite.

Figure 12: Vector without insert. (A) DAPI (nucleus) staining, (B) MitoTracker (mitochondria) staining, (C) GFP and (D) composite.


Figure 13: SYNPO2 targeting. (A) DAPI (nucleus) staining, (B) MitoTracker (mitochondria) staining, (C) GFP and (D) composite.

Discussion/ Conclusion
Transfection of OXCT1-GFP into HeLa cells resulted in specific green fluorescent staining of the mitochondria, which was confirmed by co-staining of the mitochondria with MitoTracker Red. Transfection of the vector without insert resulted, as expected, in diffuse green fluorescent staining throughout the whole cell: this vector contains no signal peptide to transport the GFP to an organelle, so the GFP accumulates in the cytoplasm and nucleus. It is concluded that the controls are reliable, because both the OXCT1 and the empty-vector transfections resulted in the expected staining patterns.


Transfection of the potential SYNPO2 signal peptide fused to GFP into HeLa cells showed diffuse green fluorescent staining, similar to the vector without insert. It is possible that the insert ligated into the vector did not contain the signal peptide of SYNPO2: only the N-terminal sequence was amplified and used for transfection, whereas signal peptides can also be located at the C-terminus or be internal targeting sequences. To verify whether SYNPO2 has a signal peptide, the entire SYNPO2 protein (1261 amino acids) could be ligated into the vector. However, proteins that do not contain a signal peptide are localized in the cytoplasm [1], and according to the NCBI database and literature [48], SYNPO2 is localized in the cytoplasm or nucleus (or at the Z-disc in muscle cells). This is consistent with the localization observed in this experiment and indicates that SYNPO2 does not have a signal peptide [48]. Literature describes multiple proteins that mediate the transport of SYNPO2 into and out of the nucleus [49][50].

4.1.4 Myogenesis – qPCR gene expression levels
Patient fibroblasts were transfected with an adenovirus to convert them into myotubes; Normal Human Dermal Fibroblasts (NHDF's) were included as control. Fibroblasts are connective tissue cells that synthesize the extracellular matrix and collagen; they were used because they are easily converted. The transfected fibroblasts were kept in culture for approximately 7 days, along with the controls: non-transfected patient fibroblasts, transfected NHDF's and non-transfected NHDF's. After 7 days the cells were harvested and RNA was isolated. The RNA was reverse transcribed into cDNA, and a qPCR was performed on the cDNA diluted 1:4 in Milli-Q water. A relative quantification, with the TATA-binding protein (TBP) as housekeeping gene, was performed to analyze the SYNPO2 expression levels in the created myotubes. To verify that the transfected fibroblasts had become myotubes, the samples were first checked with several myotube markers (myogenin (MyoG), myosin heavy chain 1 (MYH1) and myostatin (MSTN)) and a fibroblast marker (vimentin (VIM)). The Ct values of the triplicates of this control experiment are included in table 4.

Table 4: Ct values of the relative quantification (triplicates) of TBP, VIM, MYH1, MyoG & MSTN.

| Samples | TBP | VIM | MYH1 | MyoG | MSTN |
|---|---|---|---|---|---|
| Blank | Undetermined | Undetermined | Undetermined | Undetermined | Undetermined |
| Blank | Undetermined | Undetermined | Undetermined | Undetermined | Undetermined |
| Blank | Undetermined | Undetermined | Undetermined | Undetermined | Undetermined |
| Fibroblasts | 28.326284 | 17.646263 | 26.730747 | 31.099161 | 35.750416 |
| Fibroblasts | 27.927063 | 18.302036 | 26.686623 | 30.904863 | 36.502052 |
| Fibroblasts | 27.959057 | 18.246935 | 26.702051 | 31.458458 | 36.87149 |
| Transf. fibroblasts | 28.27187 | 18.51885 | 27.41554 | 24.28934 | 27.21944 |
| Transf. fibroblasts | 28.51401 | 18.42749 | 27.46274 | 24.2681 | 27.0873 |
| Transf. fibroblasts | 28.5131 | 18.40431 | 27.55647 | 24.19987 | 27.09986 |
| NHDF's | Undetermined | 33.97698 | 37.82465 | Undetermined | Undetermined |
| NHDF's | Undetermined | 33.74108 | Undetermined | Undetermined | Undetermined |
| NHDF's | Undetermined | 34.16654 | Undetermined | Undetermined | Undetermined |
| Transf. NHDF's | 29.17628 | 20.83589 | 28.65399 | 24.99596 | 27.04029 |
| Transf. NHDF's | 29.0825 | 21.19682 | 31.62529 | 24.92736 | 27.17607 |
| Transf. NHDF's | 28.98302 | 20.96985 | 33.1312 | 25.04211 | 27.2651 |

The mean Ct value of the triplicates, which differed by no more than about 1-2%, was used to calculate the relative expression, denoted dCt in this report:

$\mathrm{dCt} = 2^{\mathrm{Ct}_{\mathrm{TBP}} - \mathrm{Ct}_{\mathrm{marker}}}$

The dCt values per marker are shown in figures 14 to 17 and listed in table 5. The vimentin dCt values vary; the transfected fibroblasts show the highest relative vimentin expression. The transfected and non-transfected fibroblasts both displayed a low MYH1 dCt value, with the non-transfected fibroblasts having the highest dCt. The MyoG and MSTN values differ between the two transfected cell lines and the non-transfected fibroblasts. The NHDF's gave no TBP Ct value, so no dCt could be calculated for any marker.
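The relative expression can be computed directly from the replicate Ct values in table 4. In this Python sketch the function name is hypothetical; using the full TBP triplicate and the VIM duplicate left after dropping the deviating replicate (17.646263) gives about 889.2, which matches the fibroblast VIM value in table 5.

```python
# Sketch of the relative-expression calculation described above:
# dCt = 2 ** (mean Ct of TBP - mean Ct of the marker).
from statistics import mean


def relative_expression(ct_reference: list[float], ct_marker: list[float]) -> float:
    """Return 2^(mean Ct_reference - mean Ct_marker) for replicate Ct values."""
    return 2 ** (mean(ct_reference) - mean(ct_marker))


if __name__ == "__main__":
    tbp = [28.326284, 27.927063, 27.959057]   # fibroblasts, TBP triplicate (table 4)
    vim = [18.302036, 18.246935]              # VIM duplicate, 17.646263 dropped
    print(round(relative_expression(tbp, vim), 2))
```

Because the Ct scale is logarithmic, a difference of one cycle corresponds to a twofold difference in starting template, assuming roughly 100% amplification efficiency for both genes.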

Table 5: dCt values of the control myotubes.

| Samples | VIM | MYH1 | MyoG | MSTN |
|---|---|---|---|---|
| Fibroblasts | 889.1702 | 2.574563 | 0.131105 | 0.002549 |
| Transf. fibroblasts | 1011.849 | 1.938235 | 18.13312 | 2.457957 |
| NHDF's | Undetermined | Undetermined | Undetermined | Undetermined |
| Transf. NHDF's | 289.5614 | Undetermined | 17.05501 | 3.78453 |

Figure 14: dCt graph of vimentin in the control myotubes (y-axis: vimentin dCt; x-axis: samples): fibroblasts, trans. fibroblasts, NHDF's and trans. NHDF's.


Figure 15: dCt graph of myosin heavy chain 1 (MYH1) in the control myotubes (y-axis: MYH1 dCt; x-axis: samples): fibroblasts, trans. fibroblasts, NHDF's and trans. NHDF's.

Figure 16: dCt graph of myogenin (MyoG) in the control myotubes (y-axis: MyoG dCt; x-axis: samples): fibroblasts, trans. fibroblasts, NHDF's and trans. NHDF's.


Figure 17: dCt graph of myostatin (MSTN) in the control myotubes (y-axis: MSTN dCt; x-axis: samples): fibroblasts, trans. fibroblasts, NHDF's and trans. NHDF's.

SYNPO2 expression levels
Using the same cDNA (1:4 dilution) as in the control experiment, a relative quantification of SYNPO2 expression was performed to analyze the SYNPO2 expression levels in the created myotubes and to compare them with skeletal muscle. Because the NHDF's did not give a result, they were excluded from the quantification. The Ct values are included in table 6.

Table 6: Ct values of the relative quantification of TBP & SYNPO2.

| Marker | Blank | Fibroblasts | Trans. fibroblasts | Trans. NHDF's | Sk. muscle |
|---|---|---|---|---|---|
| TBP | Undetermined | 25.96258 | 26.341621 | 26.975359 | Undetermined |
| TBP | Undetermined | 25.976967 | 26.493065 | 26.787342 | 34.28437 |
| TBP | Undetermined | 26.13358 | 26.27785 | 26.874172 | 33.765053 |
| SYNPO2 | Undetermined | 28.63408 | 28.275553 | 29.688778 | 31.2639 |
| SYNPO2 | Undetermined | 28.695406 | 27.890701 | 30.422806 | 31.463882 |
| SYNPO2 | Undetermined | 29.231686 | 28.214586 | 30.078016 | 32.029713 |

The triplicates, which differed by no more than about 1-2%, were averaged, and the means were used to calculate the dCt:

$\mathrm{dCt} = 2^{\mathrm{Ct}_{\mathrm{TBP}} - \mathrm{Ct}_{\mathrm{marker}}}$

The dCt values are included in table 7 and shown in figure 18. The skeletal muscle sample had the highest dCt and the transfected NHDF's the lowest. The relative expression in the transfected fibroblasts is higher than in the non-transfected fibroblasts.


Table 7: dCt values of the relative quantification of SYNPO2 expression.

| Sample | dCt value |
|---|---|
| Fibroblasts | 0.160387395 |
| Trans. fibroblasts | 0.272773582 |
| Trans. NHDF's | 0.124615952 |
| Sk. muscle | 6.323926062 |

Figure 18: dCt values of the relative quantification of SYNPO2 expression (y-axis: SYNPO2 dCt; x-axis: samples): fibroblasts, trans. fibroblasts, trans. NHDF's and sk. muscle.

Discussion/ Conclusion
RNA was isolated from the control fibroblasts, transfected fibroblasts, NHDF's and transfected NHDF's. During the isolation, ethanol is centrifuged through the spin filter to wash the RNA; during elution, small ethanol residues ended up in the eluted RNA solution of the NHDF sample. Ethanol precipitates RNA, so most of the RNA used in the qPCR was possibly precipitated and unusable. As a result, the RNA of the NHDF's gave no Ct values.

The synthesized cDNA of the harvested cell lines was used for a relative quantification. First, the cDNA was used in a control experiment to verify that the transfected fibroblasts had differentiated into myotubes. This verification used four markers: VIM (fibroblast marker) and MYH1, MyoG and MSTN (myotube markers). The triplicates per sample and marker were comparable and were therefore averaged; outliers that differed by ≥ 2% were excluded, leaving a reliable duplicate. The qPCR blank (negative control) was negative; because of the negative blank and the comparable triplicates, the experiment is considered reliable.
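The report does not state against which value the 2% deviation is measured. One possible reading, sketched below in Python as an assumption, compares each replicate with the median of its group; for the fibroblast VIM triplicate it drops 17.646263, while all three fibroblast TBP replicates are kept.

```python
# Sketch of one possible outlier rule: drop a replicate whose Ct deviates
# more than a relative threshold (default 2%) from the median of its group.
# Assumption: the report does not specify the reference point (median here).
from statistics import median


def filter_replicates(cts: list[float], max_rel_dev: float = 0.02) -> list[float]:
    """Keep replicate Ct values within max_rel_dev of the group median."""
    centre = median(cts)
    return [ct for ct in cts if abs(ct - centre) / centre <= max_rel_dev]


if __name__ == "__main__":
    print(filter_replicates([17.646263, 18.302036, 18.246935]))  # fibroblast VIM
    print(filter_replicates([28.326284, 27.927063, 27.959057]))  # fibroblast TBP
```

The median is used rather than the mean so that a single deviating replicate does not shift the reference value it is compared against.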

Vimentin, a component of the intermediate filaments, is a fibroblast marker: it has the highest expression in fibroblasts but is also expressed, to a lesser extent, in other cell types. It was highly expressed in all cell lines, with the transfected fibroblasts having a higher dCt value (1011.8) than the non-transfected fibroblasts (889.2). Although the fibroblasts were transfected, not all of them take up the virus or, when transfected, differentiate, so the transfected cell line remains a mixture of differentiated and undifferentiated fibroblasts.


However, even with such a mixture, the vimentin level was expected to decrease during the differentiation of the transfected fibroblasts, because vimentin is expressed to a lesser extent in myotubes than in fibroblasts. The MYH1 marker showed a small difference between the fibroblasts and transfected fibroblasts, with the fibroblasts showing the higher dCt value. It is unclear why the fibroblasts contain a higher MYH1 level than the transfected fibroblasts, because MYH1 is a muscle-specific protein. The myogenin and myostatin markers are also muscle-specific, and their values varied widely among the samples. The transfected cell lines had the highest expression, which differed strongly from the non-transfected fibroblasts (for MyoG, 18.1 and 17.1 versus 0.13). The high expression of these genes in both transfected cell lines suggests that the fibroblasts as well as the NHDF's had probably differentiated into myotubes. However, the cell lines were not compared with skeletal muscle, which is necessary to conclude that the transfected cell lines are myotubes. Because of the difference in MyoG and MSTN expression, and assuming that the higher vimentin level is caused by the mixture of cells, it is concluded that the transfected fibroblasts were expressing muscle proteins and were therefore in a differentiating state.

Next, a relative quantification of the SYNPO2 expression levels in the transfected patient fibroblasts was performed, to check whether the mutation affected the gene expression level and whether SYNPO2 was expressed in these transfected fibroblasts at all. The measurement itself was reliable: the triplicates were comparable and were therefore averaged, and the negative controls (water) were negative. The SYNPO2 expression level was highest in skeletal muscle and much lower in the other cell lines, among which the transfected fibroblasts had the highest expression. Expression in the transfected fibroblasts was higher than in the non-transfected fibroblasts, but still far below that of skeletal muscle. The SYNPO2 expression was also low in the transfected NHDF's; it is therefore concluded that the low expression in the transfected fibroblasts and NHDF's is due to unsuccessful differentiation of the cell lines. Before other tests are used, such as staining the cell lines with antibodies, the myogenesis of the fibroblasts should first be optimized. Factors to optimize include the number of virus particles used to transfect the cells (too much virus is not optimal) and the passage number of the fibroblasts, which was high.
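To make this comparison concrete, the dCt values of table 7 can be expressed as ratios. The Python sketch below is illustrative only (the dictionary and function names are hypothetical): it shows that the transfected patient fibroblasts reach roughly 1.7 times the SYNPO2 level of the non-transfected fibroblasts, but only about 4% of the level in skeletal muscle.

```python
# Sketch: express the SYNPO2 dCt values of table 7 as fold changes,
# relative to the non-transfected fibroblasts and to skeletal muscle.
SYNPO2_DCT = {
    "Fibroblasts": 0.160387395,
    "Trans. fibroblasts": 0.272773582,
    "Trans. NHDF's": 0.124615952,
    "Sk. muscle": 6.323926062,
}


def fold_change(sample: str, reference: str) -> float:
    """Ratio of two relative-expression (dCt) values."""
    return SYNPO2_DCT[sample] / SYNPO2_DCT[reference]


if __name__ == "__main__":
    print(f"{fold_change('Trans. fibroblasts', 'Fibroblasts'):.2f}")
    print(f"{fold_change('Trans. fibroblasts', 'Sk. muscle'):.1%}")
```

Because each dCt value is already normalized to TBP, the ratio of two of them is equivalent to the usual 2^-ΔΔCt fold change between the two samples.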


4.2 Family DNA 08-5759

4.2.1 Variants (2)
Using WES, thousands of variants are detected; this large data set is filtered to a reduced and manageable subset of potential pathogenic variants. The number of variants remaining after each filter step for family 08-5759 is included in table 8.

Table 8: Number of variants of family DNA 08-5759 after each filter step.

| Filter step | Number of variants |
|---|---|
| Total detected variants | ≈ 53762 |
| No RS number & low frequency | ≈ 9419 |
| Coding regions & non-synonymous | ≈ 1069 |
| Homozygous/ compound heterozygous (2 alleles) | ≈ 627 |
| Compound heterozygous | ≈ 40 |
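The filter cascade of table 8 can be sketched as successive list filters. In this Python sketch the Variant fields, the consequence labels and the 1% allele-frequency cut-off are assumptions (the report does not state them), and a variant is treated as rare when it has no rs number or a low population frequency.

```python
# Sketch of the filter cascade in table 8. Field names, consequence labels
# and the 1% cut-off are assumptions for illustration only.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    gene: str
    rs_id: Optional[str]   # dbSNP rs number, None when the variant is novel
    allele_freq: float     # population allele frequency (0.0 when unknown)
    consequence: str       # e.g. "missense", "synonymous", "intronic"
    genotype: str          # "hom" or "het"


# Assumed set of consequence labels that count as coding and non-synonymous.
NON_SYNONYMOUS = {"missense", "nonsense", "frameshift", "splice_site"}


def filter_variants(variants: list[Variant], max_freq: float = 0.01) -> list[Variant]:
    """Apply the rare, coding/non-synonymous and two-allele filters in turn."""
    # Step 1: novel (no rs number) or rare in the population.
    rare = [v for v in variants if v.rs_id is None or v.allele_freq < max_freq]
    # Step 2: coding regions and non-synonymous changes only.
    coding = [v for v in rare if v.consequence in NON_SYNONYMOUS]
    # Step 3: two affected alleles, i.e. homozygous, or at least two
    # heterozygous variants in the same gene (candidate compound heterozygous).
    het_counts: dict[str, int] = {}
    for v in coding:
        if v.genotype == "het":
            het_counts[v.gene] = het_counts.get(v.gene, 0) + 1
    return [
        v for v in coding
        if v.genotype == "hom" or het_counts.get(v.gene, 0) >= 2
    ]
```

The last step of table 8, keeping only compound heterozygous candidates for a non-consanguineous family, would be one more filter that retains only the heterozygous entries of this output.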

For the remaining subset of potential pathogenic variants, the gene function was compared with the phenotype, and the Grantham, PolyPhen-2 and SIFT scores were determined. A small subset of variants was chosen because of their prediction scores or their metabolic or pathway-related function (see theoretical background). The prediction scores of the four variants, two compound heterozygous pairs in KRTAP10.6 and PSPH, are listed in table 9.

Table 9: Prediction scores of four potential pathogenic variants.

| Variant | Grantham score | PolyPhen-2 | SIFT |
|---|---|---|---|
| KRTAP10.6 variant 1 (c.184C>T) | 26 | 0.000 (benign) | 0.01 (damaging) |
| KRTAP10.6 variant 2 (c.206C>G) | 103 | 0.000 (benign) | 0.12 (tolerated) |
| PSPH variant 1 (c.81A>T) | 110 | 0.001 (benign) | 0 (damaging) |
| PSPH variant 2 (c.95A>G) | 94 | 0.846 (possibly damaging) | 0.49 (tolerated) |

Discussion/ Conclusion
The filtering steps reduced the number of potential pathogenic variants to 627. Because the family was not consanguineous, compound heterozygous variants were considered the most interesting. This remaining subset contained 40 variants, of which a small subset was chosen to be validated and tested for segregation: KRTAP10.6, PSPH and FAHD2B. PSPH and FAHD2B were chosen because of their metabolic function, and KRTAP10.6 because of its damaging SIFT prediction. FAHD2B had already been tested: the variant was validated but did not segregate correctly, which left only PSPH and KRTAP10.6 as potential pathogenic variants. These two genes were validated and tested for segregation.

The prediction scores diverge, especially for variant 1 of KRTAP10.6 and PSPH, which SIFT predicts to be damaging but PolyPhen-2 predicts to be benign. The predictions for variant 2 of both genes diverge less, but still differ. Each prediction program uses a different method to predict the damaging effect of a variant; only functional tests will show which variant is pathogenic. As mentioned before, a prediction score is only a prediction, but it is a useful tool to reduce the number of variants.


4.2.2 Sanger sequencing validation & segregation (2)
The subset of potential pathogenic variants was to be validated and tested for segregation; the validation and segregation data of the two genes are included below and summarized in table 10. The Sanger sequence data of the forward and reverse strands are mapped against the reference; the variant and its reference are marked between the two dashed lines, and the codon containing the mutation is underlined. The genotypes of the patient, mother and father are also shown in the family trees. The primers used for Sanger sequencing are included in appendix 9.

Table 10: Overview of validation and segregation of the variants of family DNA 08-5759.

| Variant | Status of validation | Segregation check |
|---|---|---|
| KRTAP10.6 variant 1 | Confirmed | Segregated correctly |
| KRTAP10.6 variant 2 | Confirmed | Segregated correctly |
| PSPH variant 1 | Unconfirmed | Segregated incorrectly |
| PSPH variant 2 | Unconfirmed | Segregated incorrectly |

KRTAP10.6, c.184C>T p.Arg62Lys (figure 19):
Patient (C/T):
Forward: 5’ AGC C/T GT GTG 3’ (reference: 5’ AGC C GT GTG 3’)
Reverse: 5’ CAC AC G/A GCT 3’ (reference: 5’ CAC AC G GCT 3’)
Father (C/C):
Forward: 5’ AGC C GT GTG 3’ (reference: 5’ AGC C GT GTG 3’)
Reverse: 5’ CAC AC G GCT 3’ (reference: 5’ CAC AC G GCT 3’)
Mother (C/T):
Forward: 5’ AGC C/T GT GTG 3’ (reference: 5’ AGC C GT GTG 3’)
Reverse: 5’ CAC AC G/A GCT 3’ (reference: 5’ CAC AC G GCT 3’)

In the forward strand the T is the mutation; in the reverse strand the complementary A is the mutation. The mutation (C>T) was detected in the Sanger sequencing data of the patient and the mother.

KRTAP10.6, c.206C>G p.Pro69Arg (figure 20):
Patient (C/G):
Forward: 5’ TGC C C/G A GTG 3’ (reference: 5’ TGC C C A GTG 3’)
Reverse: 5’ CAC T G/C G GCA 3’ (reference: 5’ CAC T G G GCA 3’)
Father (C/G):
Forward: 5’ TGC C C/G A GTG 3’ (reference: 5’ TGC C C A GTG 3’)
Reverse: 5’ CAC T G/C G GCA 3’ (reference: 5’ CAC T G G GCA 3’)
Mother (C/C):
Forward: 5’ TGC C C A GTG 3’ (reference: 5’ TGC C C A GTG 3’)
Reverse: 5’ CAC T G G GCA 3’ (reference: 5’ CAC T G G GCA 3’)

In the forward strand the G is the mutation; in the reverse strand the complementary C is the mutation. The mutation (C>G) was detected in the Sanger sequencing data of the patient and the father.


Figure 19: Family tree of the KRTAP10.6 c.184C>T variant. The mother is heterozygous (no trait, carrier, crossed sign (C/T)); the father is homozygous for the wild type (no trait (C/C)). The patient is heterozygous (trait, fully colored (C/T)). Grandparents were not included in the genotyping.

Figure 20: Family tree of the KRTAP10.6 c.206C>G variant. The mother is homozygous for the wild type (no trait (C/C)); the father is heterozygous (no trait, carrier, crossed sign (C/G)). The patient is heterozygous (trait, fully colored (C/G)). Grandparents were not included in the genotyping.

PSPH, c.81A>T p.Arg27Ser:
Patient (A/A):
Forward: 5’ ATC AG A GAA 3’ (reference: 5’ ATC AG A GAA 3’)
Reverse: 5’ TTC T CT GAT 3’ (reference: 5’ TTC T CT GAT 3’)
Father (A/A):
Forward: 5’ ATC AG A GAA 3’ (reference: 5’ ATC AG A GAA 3’)
Reverse: 5’ TTC T CT GAT 3’ (reference: 5’ TTC T CT GAT 3’)
Mother (A/A):
Forward: 5’ ATC AG A GAA 3’ (reference: 5’ ATC AG A GAA 3’)
Reverse: 5’ TTC T CT GAT 3’ (reference: 5’ TTC T CT GAT 3’)

In the forward strand a T would be the mutation; in the reverse strand an A would be the mutation. The mutation was not detected in the Sanger sequencing data.

PSPH, c.95A>G p.Asp32Gly:
Patient (A/A):
Forward: 5’ ATC G A T GAG 3’ (reference: 5’ ATC G A T GAG 3’)
Reverse: 5’ CTC A T C GAT 3’ (reference: 5’ CTC A T C GAT 3’)
Father (A/A):
Forward: 5’ ATC G A T GAG 3’ (reference: 5’ ATC G A T GAG 3’)
Reverse: 5’ CTC A T C GAT 3’ (reference: 5’ CTC A T C GAT 3’)
Mother (A/A):
Forward: 5’ ATC G A T GAG 3’ (reference: 5’ ATC G A T GAG 3’)
Reverse: 5’ CTC A T C GAT 3’ (reference: 5’ CTC A T C GAT 3’)

In the forward strand a G would be the mutation; in the reverse strand a C would be the mutation. The mutation (A>G) was not detected by Sanger sequencing; because neither PSPH mutation was found, no family tree is included.


Discussion/ Conclusion
Both KRTAP10.6 variants were confirmed by Sanger sequencing, whereas both PSPH variants were unconfirmed. The PSPH variants are false-positive reads and are therefore excluded; they illustrate the importance of validating variants found with NGS. NGS has a higher throughput and sequences much faster, but these advantages come at the cost of lower accuracy and reliability of the called variants. The variants of both genes were nevertheless tested for segregation, simultaneously with the validation, because this is less time-consuming. The KRTAP10.6 variants segregated correctly: the patient is compound heterozygous, meaning that the patient carries two different heterozygous variants in the same gene, one inherited from each parent. The c.184C>T variant was inherited from the mother, who was heterozygous for the variant (C/T), while the father was homozygous for the wild type (C/C). The c.206C>G variant was inherited from the father, who was heterozygous for this variant (C/G), while the mother was homozygous for the wild type (C/C). As expected, the PSPH variants did not segregate correctly, because they were not present: both the patient and the parents were homozygous for the wild type (A/A; A/A).
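For compound heterozygosity, each of the two variants must come from a different parent, as for KRTAP10.6 c.184C>T (maternal) and c.206C>G (paternal). A minimal Python sketch of this "in trans" check, assuming the family-tree genotype notation and no de novo events; the helper names are hypothetical.

```python
# Sketch: check that two heterozygous variants in one gene are in trans,
# i.e. each was inherited from a different parent (compound heterozygosity).
# Genotypes follow the family-tree notation ("C/T"); de novo events ignored.

def carries(genotype: str, alt: str) -> bool:
    """True when the genotype contains the alternative allele."""
    return alt in genotype.split("/")


def in_trans(alt1: str, gt1: dict[str, str], alt2: str, gt2: dict[str, str]) -> bool:
    """gt1/gt2 map 'patient', 'mother' and 'father' to genotypes per variant."""
    if not (carries(gt1["patient"], alt1) and carries(gt2["patient"], alt2)):
        return False
    from_mother_1 = carries(gt1["mother"], alt1) and not carries(gt1["father"], alt1)
    from_father_1 = carries(gt1["father"], alt1) and not carries(gt1["mother"], alt1)
    from_mother_2 = carries(gt2["mother"], alt2) and not carries(gt2["father"], alt2)
    from_father_2 = carries(gt2["father"], alt2) and not carries(gt2["mother"], alt2)
    return (from_mother_1 and from_father_2) or (from_father_1 and from_mother_2)


if __name__ == "__main__":
    variant_1 = {"patient": "C/T", "mother": "C/T", "father": "C/C"}  # c.184C>T
    variant_2 = {"patient": "C/G", "mother": "C/C", "father": "C/G"}  # c.206C>G
    print(in_trans("T", variant_1, "G", variant_2))
```

If both variants had come from the same parent (in cis), one allele of the gene would remain intact and the compound heterozygous explanation would not hold.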

Only the KRTAP10.6 variants remained after filtering, validation and segregation analysis. Because the PSPH variants were false-positive reads and the KRTAP10.6 variants could not be linked to the patient's phenotype, family DNA 08-5759 was set aside and only the variants of family 07-2283 were functionally tested.


5. Discussion/ Conclusion & Recommendations
Whole exome enrichment combined with Illumina sequencing is a useful tool to detect potential pathogenic variants that cause myopathies and other diseases. The aim of the graduation term was to identify the genetic cause of a mitochondrial myopathy; two families were chosen for investigation and their exomes were sequenced. The resulting data set was filtered with steps based on a few assumptions: the causal variants are possibly novel, have a low allele frequency in the general population and are non-synonymous. Although based on assumptions, these filter steps are quite solid: mitochondrial myopathies are relatively rare (about 1 in 10,000 children is affected), and 85% of the mutations with a large effect on disease-related traits are located in the exome [5]. The final filter step, before additional strategies are used, depends on whether the patient comes from a consanguineous family. Consanguineous families share a large part of their genome, so the pathogenic variant is expected to be homozygous rather than heterozygous; in non-consanguineous families it may be the other way around, with (compound) heterozygous variants being more likely. Overall, these filter steps are fairly reliable for reducing a large data set to a subset of potential pathogenic variants. After the first filter steps, the subset is shortened further using prediction scores and by linking gene function to the phenotype. These steps are useful but remain tricky. Because NGS instruments are less accurate, potentially pathogenic variants are validated by Sanger sequencing, which may reduce the list further and strengthens the reliability of the remaining variants.
After the segregation analysis, the variants are tested functionally. The functional tests used during the graduation term were relatively short, quick, standard tests that could easily be performed within the term. However, more extensive tests are necessary to find out how the variant causes the trait and to establish its pathogenicity.

Because the variants of family 08-5759 were false-positive reads or considered less relevant, only the variants of family 07-2283 were investigated further. After filtering, validation and segregation analysis of the data set of family 07-2283, a small subset of three potentially pathogenic variants remained: LAMA3, SYNPO2 and, found later, ZYX. SYNPO2 and ZYX are both Z-disc related genes and were chosen because of this relation to the phenotype [30][31], whereas LAMA3 was chosen because of its severe prediction scores. SYNPO2 was eventually selected because it was considered an interesting variant that can easily be tested with several functional tests described in literature [30][48][49][50]. SYNPO2 was tested using transfected fibroblasts, which were expected to differentiate into myotubes containing Z-discs. A relative quantification in these transfected fibroblasts was used to measure SYNPO2 expression, to check whether the gene was expressed and whether the variant altered its expression. The SYNPO2 expression of the transfected fibroblasts and NHDF's is low compared with skeletal muscle. The similarly low SYNPO2 expression in both transfected cell lines indicates that the variant was not the cause of the low expression, which is probably due to insufficient differentiation of the transfected cell lines. To test whether the variant alters the expression or the protein, further analysis of the SYNPO2 variant should focus on the myogenesis of the fibroblasts, by optimizing the transfection and differentiation of the patient fibroblasts. These transfected fibroblasts are essential for analyzing the SYNPO2 variant.
SYNPO2 is expressed in multiple tissues, such as the uterus, lung, stomach, colon, skeletal muscle and heart; however, its expression is mainly restricted to the muscular cell layers of these tissues, that is, the smooth or striated muscle cells.


The damaging effect of the SYNPO2 variant can therefore only be analyzed in muscle cells, which makes the transfection of the patient fibroblasts even more important. The expression of SYNPO2 and the differentiation of the cells can also be detected with immunohistochemical or fluorescent antibodies in differentiating cells. The expression and distribution of SYNPO2 in the cell depend on cell type, state of differentiation and stress. In undifferentiated myoblasts SYNPO2 is located in the nucleus and, very weakly, in the cytoplasm; when the myoblasts differentiate, SYNPO2 is located in the nucleus and along actin filaments. In differentiated myotubes SYNPO2 is completely exported from the nucleus into the cytoplasm, where it binds to the Z-disc protein alpha-actinin. Because SYNPO2 attaches to actin in myoblasts and is relocated to alpha-actinin once the myoblasts begin to differentiate, SYNPO2 may be important for the assembly of the cytoskeleton and Z-disc during myogenesis. This is supported by the fact that SYNPO2 expression precedes the expression of alpha-actinin. Under stress conditions SYNPO2 is located only in the nucleus and is depleted from the cytoplasm and Z-disc; in the patient, such a depletion could also be due to the mutation. The mutation detected in exon 4 is located in the alpha-actinin binding region and in one of the three actin-binding regions [48][30]. It is possible that the mutation prevents SYNPO2 from binding to actin, so that alpha-actinin cannot be correctly connected to actin, resulting in a disrupted Z-disc. Using antibodies at various developmental stages of the transfected fibroblasts could give an impression of the role of SYNPO2 in Z-disc assembly. Z-disc assembly can also be examined with an electron microscope, with which improper development of the Z-disc can easily be detected.

Another pathway that could be investigated is the trafficking of SYNPO2 between the nucleus and the cytoplasm, because SYNPO2 is not transported by means of a signal peptide. Instead, it depends on interactions with other proteins, and its location is determined by its phosphorylation state. Calcineurin, a signalling phosphatase, keeps SYNPO2 dephosphorylated; in this dephosphorylated state the 14-3-3 protein cannot bind to SYNPO2. However, if protein kinase A (PKA) and Ca2+/calmodulin-dependent kinase II (CaMKII), which are also signalling proteins, are activated, or if calcineurin is inhibited, SYNPO2 is phosphorylated, which enables the interaction with 14-3-3. This interaction releases SYNPO2 from the Z-disc anchoring proteins; once released into the cytoplasm, SYNPO2 can interact with importin and is transported into the nucleus (see appendix 10, figure 24 for an overview). Because of its interaction with these signalling proteins, SYNPO2 could itself act as a signalling protein that communicates with the nucleus, as has been suggested for multiple Z-disc proteins [50]. In this scenario the Z-disc would sense an increase in stress in the cell, tissue or organ, and this signal would lead to phosphorylation of SYNPO2 and its transport to the nucleus. Once in the nucleus, SYNPO2 could signal a change in the expression of muscle-specific genes, which could result in remodelling and cardiac hypertrophy [51].

SYNPO2 is, however, not the only component of the Z-disc: during the first stages of muscle development SYNPO2 co-localizes with zyxin (see appendix 11, figure 25). Zyxin is not expressed in later developmental stages of the myotubes or in adult skeletal muscle, which suggests that it acts in the early stages of myogenesis [30]. The co-localization of SYNPO2 with other Z-disc proteins should be investigated to determine the damaging effect of a non-functional SYNPO2 protein on Z-disc assembly. To analyze this effect, however, the differentiation and transfection of the patient fibroblasts should first be optimized. Once this is done, many in vitro experiments can be performed with the proper antibodies to build an overview of the SYNPO2 pathway.


Second, once this overview has been created and the antibodies have proven to work, pathological material of the patient could be used. Immunohistochemistry or fluorescent antibodies and Western blotting can be used to detect the SYNPO2 protein and to analyze Z-disc assembly in the patient's muscle cells. Third, besides the SYNPO2 variant, the LAMA3 protein can also be visualized in the transfected fibroblasts or pathological material. Fourth, SYNPO2 could be analyzed in a model system such as the zebrafish, as has been done for another Z-disc-related gene, CHAP [52]. Fifth, the trafficking of SYNPO2 between the nucleus and the cytoplasm should be analyzed: if the signalling proteins are disturbed, SYNPO2 behaves as under stress and is located in the nucleus. The disease could also result from a combination of two causes, the mutation and disturbed signalling proteins. Further analyses could reveal the cause of the disease in the two patients of family DNA 07-2283.

Whole exome enrichment combined with Illumina sequencing can be used to find the genetic cause of many kinds of diseases and traits, such as the mitochondrial myopathies or the myopathies affecting the patients of the investigated families. The filter steps, prediction programs and additional tests used to find the potential pathogenic variant are effective; functional tests, however, need time to be optimized and to work properly. The aim was to filter the potential pathogenic variant from a dataset containing thousands of variants and to functionally test the variants found. The filter steps reduced the number of variants to a small subset of potential pathogenic variants. These variants can only be called pathogenic once they have been functionally tested, which so far was done only for the SYNPO2 variant. Only half of the goal has therefore been achieved, and it remains uncertain whether SYNPO2, or for that matter LAMA3, carries the pathogenic variant.


References
[1] Campbell, N.A., Reece, J.B., Urry, L.A., Cain, M.L., Wasserman, S.A., Minorsky, P.V., Jackson, R.B. (2008). Biology. 8th ed. San Francisco: Pearson Education. Chapter 17, pages 329-344.
[2] Michael L. Metzker (January 2010). Sequencing technologies – the next generation. Nature Reviews Genetics, volume 11.
[3] Zhenqiang Su, Baitang Ning, Hong Fang, Huixiao Hong, Roger Perkins, Weida Tong and Leming Shi (2011). Next-generation sequencing and its applications in molecular diagnostics. Expert Rev. Mol. Diagn. 11(3), 333-343.
[4] Olena Morozova, Marco A. Marra (2008). Applications of next-generation sequencing technologies in functional genomics. Genomics 92, 255–264.
[5] Choi M, Scholl UI, Ji W, Liu T, Tikhonova IR, Zumbo P, Nayir A, Bakkaloğlu A, Ozen S, Sanjad S, Nelson-Williams C, Farhi A, Mane S, Lifton RP. Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc Natl Acad Sci U S A. 2009 Nov 10;106(45):19096-101.
[6] Chee-Seng Ku, Nasheen Naidoo, Yudi Pawitan. Revisiting Mendelian disorders through exome sequencing. Hum Genet (2011) 129:351–370.
[7] Michael J. Bamshad, Sarah B. Ng, Abigail W. Bigham, Holly K. Tabor, Mary J. Emond, Deborah A. Nickerson & Jay Shendure. Exome sequencing as a tool for Mendelian disease gene discovery. Nature Reviews Genetics 12, 745-755 (November 2011).
[8] Andrew B Singleton. Exome sequencing: a transformative technology. Lancet Neurology 2011; 10: 942–46.
[9] Guey-Shin Wang & Thomas A. Cooper. Splicing in disease: disruption of the splicing code and the decoding machinery. Nature Reviews Genetics 8, 749-761 (October 2007).
[10] Agilent Technologies. Protocol: SureSelectXT Target Enrichment System for Illumina Paired-End Sequencing Library, SureSelectXT Target Enrichment for Illumina Multiplexed Sequencing. Version 1.2, May 2011.
[11] Christian Gilissen, Heleen H. Arts, Alexander Hoischen, Liesbeth Spruijt, Dorus A. Mans, Peer Arts, Bart van Lier, Marloes Steehouwer, Jeroen van Reeuwijk, Sarina G. Kant, Ronald Roepman, Nine V.A.M. Knoers, Joris A. Veltman, and Han G. Brunner. Exome Sequencing Identifies WDR35 Variants Involved in Sensenbrenner Syndrome. Am J Hum Genet. 2010 September 10; 87(3): 418–423.
[12] Wilhelm J. Ansorge. Next-generation DNA sequencing techniques. N Biotechnol. 2009 Apr;25(4):195-203.
[13] Daniel C. Koboldt, Li Ding, Elaine R. Mardis, and Richard K. Wilson. Challenges of sequencing human genomes. Brief Bioinform. 2010 Sep;11(5):484-98.
[14] Elaine R. Mardis. The impact of next-generation sequencing technology on genetics. Trends in Genetics, Volume 24, Issue 3, March 2008, Pages 133–141.
[15] David R. Bentley, Shankar Balasubramanian, Harold P. Swerdlow, Geoffrey P. Smith, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53-59 (6 November 2008).
[16] Grantham R. Amino acid difference formula to help explain protein evolution. Science 1974;185:862–4.
[17] Matthew F. Rudd, Richard D. Williams, Emily L. Webb, et al. The Predicted Impact of Coding Single Nucleotide Polymorphisms Database. Cancer Epidemiol Biomarkers Prev 2005;14:2598-2604.
[18] Ivan A Adzhubei, Steffen Schmidt, Leonid Peshkin, Vasily E Ramensky, Anna Gerasimova, Peer Bork, Alexey S Kondrashov & Shamil R Sunyaev. A method and server for predicting damaging missense mutations. Nature Methods 7, 248-249 (2010).
[19] Sarah B Ng, Kati J Buckingham, Choli Lee, Abigail W Bigham, Holly K Tabor, Karin M Dent, Chad D Huff, Paul T Shannon, Ethylin Wang Jabs, Deborah A Nickerson, Jay Shendure & Michael J Bamshad. Exome sequencing identifies the cause of a mendelian disorder. Nature Genetics 42, 30–35 (2010).
[20] National Center for Biotechnology Information (NCBI), Gene database. http://www.ncbi.nlm.nih.gov/
[21] David R. Thorburn and Hans-Henrik M. Dahl. Mitochondrial Disorders: Genetics, Counseling, Prenatal Diagnosis and Reproductive Options. American Journal of Medical Genetics (Semin. Med. Genet.) 106:102–114 (2001).
[22] Saskia Koene & Jan Smeitink. Mitochondrial medicine: a clinical guideline. (2011) Khondrion, Nijmegen, the Netherlands. Chapter 7, page 93.
[23] Rodrigue Rossignol, Benjamin Faustin, Christophe Rocher, Monique Malgat, Jean-Pierre Mazat and Thierry Letellier. Mitochondrial threshold effects. Biochem. J. (2003) 370, 751–762.
[24] Michael V. Zaragoza, Marin C. Brandon, Marta Diegoli, Eloisa Arbustini and Douglas C. Wallace (2011). Mitochondrial cardiomyopathies: how to identify candidate pathogenic mutations by mitochondrial DNA sequencing, MITOMASTER and phylogeny. European Journal of Human Genetics 19, 200-207.
[25] C.T.R.M. Schrander-Stumpel, L.M.G. Curfs, J.W. van Ree. Klinische Genetica. 1st ed. Bohn Stafleu van Loghum, Houten (2005). Chapter 2, pages 8-9.
[26] Craigen WJ. Mitochondrial DNA mutations: an overview of clinical and molecular aspects. Methods Mol Biol. 2012;837:3-15.
[27] Inheritance. http://www.pcleiden.nl/item.html&objID=718
[28] Hamill KJ, Paller AS, Jones JC. Adhesion and migration, the diverse functions of the laminin alpha3 subunit. Dermatol Clin. 2010 Jan;28(1):79-87.
[29] Jeffrey H. Miner. Laminins and their roles in mammals. Microscopy Research and Technique 71:349-356 (2008).
[30] Linnemann A, van der Ven PF, Vakeel P, Albinus B, Simonis D, Bendas G, Schenk JA, Micheel B, Kley RA, Fürst DO. The sarcomeric Z-disc component myopodin is a multiadapter protein that interacts with filamin and alpha-actinin. Eur J Cell Biol. 2010 Sep;89(9):681-92. Epub 2010 May 31.
[31] Ralph Knöll, Byambajav Buyandelger, and Max Lab. The Sarcomeric Z-Disc and Z-Discopathies. J Biomed Biotechnol. 2011; 2011: 569628. Published online 2011 October 18. doi: 10.1155/2011/569628
[32] NCBI – DCHS2. http://www.ncbi.nlm.nih.gov/gene/54798
[33] Patricia Alcaide, Begoña Merinero, Pedro Ruiz-Sala, Eva Richard, Rosa Navarrete, Angela Arias, Antonia Ribes, Rafael Artuch, Jaume Campistol, Magdalena Ugarte, and Pilar Rodriguez-Pombo. Defining the Pathogenicity of Creatine Deficiency Syndrome. Human Mutation Volume 32, Issue 3. Article first published online: 8 February 2011.
[34] Bo Li and Beat Trueb. Analysis of the α-Actinin/Zyxin Interaction. The Journal of Biological Chemistry Vol. 276, No. 36, Issue of September 7, pp. 33328–33335, 2001.
[35] B. Sjöblom, A. Salmazo and K. Djinović-Carugo. α-Actinin structure and regulation. Cell. Mol. Life Sci. 65 (2008) 2688–2701.
[36] NCBI – KIAA1109. http://www.ncbi.nlm.nih.gov/gene/84162
[37] Haiyan Zhou, Naohiro Yamaguchi, Le Xu, Ying Wang, Caroline Sewry, Heinz Jungbluth, Francesco Zorzato, Enrico Bertini, Francesco Muntoni, Gerhard Meissner and Susan Treves. Characterization of recessive RYR1 mutations in core myopathies. Human Molecular Genetics, 2006, Vol. 15, No. 18, 2791–2803. doi:10.1093/hmg/ddl22
[38] Jeffrey E. Plowman. The proteomics of keratin proteins. Journal of Chromatography B, Volume 849, Issues 1–2, 15 April 2007, Pages 181–189.
[39] Michael A. Rogers, Lutz Langbein, Silke Praetzel-Wunder, Hermelita Winter, Jürgen Schweizer. Human Hair Keratin-Associated Proteins (KAPs). International Review of Cytology, Volume 251, 2006, Pages 209–263.
[40] Michael A Rogers, Lutz Langbein, Hermelita Winter, Iris Beckmann, Silke Praetzel and Jürgen Schweizer. Hair Keratin Associated Proteins: Characterization of a Second High Sulfur KAP Gene Domain on Human Chromosome 21. Journal of Investigative Dermatology (2004) 122, 147–158; doi:10.1046/j.0022-202X.2003.22128.x
[41] Yves Peeraer, Anja Rabijns, Jean-Francois Collet, Emile Van Schaftingen and Camiel De Ranter. How calcium inhibits the magnesium-dependent enzyme human phosphoserine phosphatase. Eur. J. Biochem. 271, 3421–3427 (2004).
[42] R.G.J. Dohmen. Project plan apprentice internship: Mitochondriale myopathieën. Date: 30-09-2011 (apprenticeship term: 29-08-11 to 23-01-12).
[43] Illumina. TruSeq PE Cluster Kit v3 – Reagent Preparation Guide.
[44] Illumina. TruSeq SBS Kit v3 (200 cycles) – HS Reagent Preparation Guide.
[45] Anneke Janson and Judith van Deutekom. Leiden Muscular Dystrophy pages – Protocols. Adenovirus-mediated MyoD gene transfer. Last modified February 2004.
[46] pEGFP-n1 features. https://www.lablife.org/g?a=seqa&id=vdb_g2.VPKknuTT9yqfuhh9oHRBnjQjWNs-_sequence_a718cc5b874124167db38f73795e7c2aeccf8b43_10
[47] Protein-protein interaction SYNPO2. http://string-db.org/newstring_cgi/show_network_section.pl
[48] Astrid Weins, Karin Schwarz, Christian Faul, Laura Barisoni, Wolfgang A. Linke and Peter Mundel. Differentiation- and stress-dependent nuclear cytoplasmic redistribution of myopodin, a novel actin-bundling protein. JCB article, 22 October 2001.
[49] Faul C, Dhume A, Schecter AD, Mundel P. Protein kinase A, Ca2+/calmodulin-dependent kinase II, and calcineurin regulate the intracellular trafficking of myopodin between the Z-disc and the nucleus of cardiac myocytes. Mol Cell Biol. 2007 Dec;27(23):8215-27. Epub 2007 Oct 8.
[50] Frank, D., C. Kuhn, H. A. Katus, and N. Frey. The sarcomeric Z-disc: a nodal point in signaling and disease. J. Mol. Med. 84:446-468.
[51] Pyle, W. G., and R. J. Solaro. At the crossroads of myocardial signaling: the role of Z-discs in intracellular signaling and cardiac function. Circ. Res. 94:296-306.
[52] Abdelaziz Beqqali, J. Monshouwer-Kloots, R. Monteiro, M. Welling, J. Bakkers, E. Ehler, A. Verkleij, C. Mummery and R. Passier. CHAP is a newly identified Z-disc protein essential for heart and skeletal muscle function. Journal of Cell Science 123, 1141-1150.


Appendices
Appendix 1: Flow diagram WE-enrichment
Appendix 2: pEGFP-n1
Appendix 3: Covaris S2 fragmentation
Appendix 4: AMPure XP beads Agencourt
Appendix 5: Agilent 2100 Bioanalyzer
Appendix 6: Primer design
Appendix 7: PCR & gel electrophoresis
Appendix 8: Sanger sequencing
Appendix 9: Designed primers for Sanger sequencing genes
Appendix 10: Regulation of SYNPO2 localization in myocytes
Appendix 11: Protein-protein interaction SYNPO2


Appendix 1: Flow diagram WE-enrichment

Figure 21: Flow diagram of the SureSelect WE-enrichment. gDNA is sheared, the ragged ends are repaired to blunt ends and 5'-phosphorylated, and a single dA overhang is added to the 3' ends. Adapters carrying a complementary dT overhang are ligated to the fragments. The library is hybridized to biotinylated RNA baits; the streptavidin on the streptavidin-coated magnetic beads binds the biotin of the baits, and a magnetic field is used to capture the exon fraction. The captured fragments are released from the beads and the RNA is digested, leaving single-stranded DNA fragments that are ready to be sequenced [10].


Appendix 2: pEGFP-n1

Figure 22: Features of pEGFP-n1 [46]

Digestion (10 µl per reaction):
- 7 µl pre-amplified PCR product (target sequence), or 6.5 µl H2O + 0.5 µl pEGFP-n1 for the vector digestion
- 1 µl BSA (100 mg/ml)
- 1 µl NEBuffer 1
- 0.5 µl KpnI
- 0.5 µl AgeI

Ligation:
- 7 µl digested PCR product
- 1 µl digested pEGFP-n1
- 1 µl T4 DNA ligase
- 1 µl T4 DNA ligase buffer (10X)

Primers used:
- Forward: 5’ TTGACGTCAATGGGAGTTTG 3’
- Reverse: 5’ GCCACCTACGGCAAGCTGAC 3’

Amplicon without insert: 359 bp; amplicon with insert: 359 bp + the length of the target sequence.

NcoI digestion yields the standard fragments of 1905 bp, 1809 bp and 703 bp; the size of the fourth fragment depends on the size of the insert.
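The expected NcoI pattern can be predicted in silico before the gel is run. The Python sketch below is an illustration, not the procedure used in this project: it finds every occurrence of a palindromic recognition site (NcoI cuts C^CATGG) in a circular sequence and returns the fragment lengths after complete digestion. The function names and the 42 bp toy plasmid are hypothetical; in practice the full pEGFP-n1 sequence including the insert would be supplied. Non-palindromic sites would also require a search of the reverse complement, which this sketch omits.

```python
def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of `site` in a circular sequence.

    Assumes a palindromic site (such as NcoI's CCATGG), so only the top
    strand needs to be searched.
    """
    seq, site = seq.upper(), site.upper()
    # Append the first len(site) - 1 bases so a site spanning the origin is found.
    extended = seq + seq[: len(site) - 1]
    return [i for i in range(len(seq)) if extended.startswith(site, i)]


def circular_fragments(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths (largest first) after complete digestion of a circular plasmid.

    cut_offset is the cut position within the site on the top strand,
    e.g. 1 for NcoI (C^CATGG).
    """
    n = len(seq)
    cuts = sorted((pos + cut_offset) % n for pos in find_sites(seq, site))
    if len(cuts) <= 1:
        # No cut: the plasmid stays circular; one cut: it is only linearised.
        return [n]
    # Distance from each cut to the next one, wrapping round the origin.
    return sorted(
        ((cuts[(i + 1) % len(cuts)] - cuts[i]) % n for i in range(len(cuts))),
        reverse=True,
    )


# Hypothetical 42 bp "plasmid" with two NcoI sites.
toy_plasmid = "CCATGG" + "A" * 10 + "CCATGG" + "T" * 20
print(circular_fragments(toy_plasmid, "CCATGG", 1))  # [26, 16]
```

Because the plasmid is circular, n cuts give n fragments whose lengths always add up to the plasmid length, which is a quick sanity check on a predicted band pattern.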


Appendix 3: Covaris S2 fragmentation

Materials:
- Covaris S2
- Sample vessel: Snap-Cap microTube with AFA fiber
- S2 holder: microTube holder with Snap-Cap
- Buffer: Tris-EDTA (pH 8.0)
- Sample: 3 µg gDNA or LRPCR product in 123 µl, dissolved in 1x TE buffer

Method:
1. Fill the tank with fresh deionized water to the proper fill line (10-15). Degas the water for the recommended time and set the chiller to the right temperature.
2. Set up the Covaris S2 at the appropriate temperature according to the operating conditions, see table 3.
3. First calculate how much DNA solution is needed, then slowly pipette x µl of DNA solution into the miniTUBE.
4. Load the capped miniTUBE into the holder, place it in the Covaris and press start.
5. At the completion of the run, remove the miniTUBE from the holder. For maximum recovery of the sample, tilt the miniTUBE and slowly remove the sample with a pipette.
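Step 3 requires the DNA volume to be calculated first. A minimal Python sketch, assuming the stock concentration (in ng/µl) has been measured beforehand and that the remainder of the 123 µl is made up with 1x TE; the function name and the 100 ng/µl example stock are hypothetical.

```python
def covaris_sample_volumes(stock_ng_per_ul: float,
                           dna_mass_ug: float = 3.0,
                           final_volume_ul: float = 123.0) -> tuple[float, float]:
    """Return (µl DNA stock, µl 1x TE) for one Covaris S2 sample.

    Defaults follow the protocol above: 3 µg DNA in a final volume of 123 µl.
    """
    if stock_ng_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    dna_ul = dna_mass_ug * 1000.0 / stock_ng_per_ul  # µg -> ng, divided by ng/µl
    if dna_ul > final_volume_ul:
        raise ValueError(
            f"stock too dilute: {dna_ul:.1f} µl needed, more than {final_volume_ul} µl"
        )
    return dna_ul, final_volume_ul - dna_ul


# Example: a hypothetical gDNA stock of 100 ng/µl.
dna_ul, te_ul = covaris_sample_volumes(100.0)
print(f"{dna_ul:.1f} µl DNA + {te_ul:.1f} µl TE")  # 30.0 µl DNA + 93.0 µl TE
```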

Table 11: Work conditions Covaris S2.

| Target fragment size (bp) | Duty cycle | Intensity | Cycles per burst | Time (s) |
|---|---|---|---|---|
| 150 | 10% | 5 | 200 | 430 |
| 200 | 10% | 5 | 200 | 180 |
| 300 | 10% | 4 | 200 | 80 |
| 400 | 10% | 4 | 200 | 55 |
| 500 | 5% | 3 | 200 | 80 |
| 800 | 5% | 3 | 200 | 50 |
| 1000 | 5% | 3 | 200 | 40 |
| 1500 | 2% | 4 | 200 | 15 |

Settings for all runs:
- Temperature (water bath): 6-8 °C
- Power mode: Frequency sweeping
- Degassing mode: Continuous
- Volume: 123 µl
- Buffer: Tris-EDTA, pH 8.0
- DNA mass: < 5 µg
- Starting material: > 50 kb
- Water level (Fill/Run): S2 – Level 12
- AFA intensifier: Yes
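Because table 11 is essentially a lookup from the desired fragment size to instrument settings, it can be encoded directly. The sketch below transcribes the table into a Python dictionary with a nearest-row lookup; choosing the nearest tabulated size rather than interpolating is an assumption of this sketch, not part of the Covaris protocol, and the names are hypothetical.

```python
# Table 11 transcribed: target size (bp) -> (duty cycle %, intensity, cycles per burst, time s)
COVARIS_S2_SETTINGS = {
    150: (10, 5, 200, 430),
    200: (10, 5, 200, 180),
    300: (10, 4, 200, 80),
    400: (10, 4, 200, 55),
    500: (5, 3, 200, 80),
    800: (5, 3, 200, 50),
    1000: (5, 3, 200, 40),
    1500: (2, 4, 200, 15),
}


def settings_for(target_bp: int) -> tuple[int, tuple[int, int, int, int]]:
    """Return the tabulated target size nearest to target_bp and its settings.

    Values are not interpolated between rows; on a tie the smaller size wins
    because the dictionary is ordered by increasing size.
    """
    nearest = min(COVARIS_S2_SETTINGS, key=lambda size: abs(size - target_bp))
    return nearest, COVARIS_S2_SETTINGS[nearest]


size, (duty, intensity, cycles, seconds) = settings_for(180)
print(size, duty, intensity, cycles, seconds)  # 200 10 5 200 180
```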


Appendix 4: AMPure XP beads Agencourt

Method:
1. Gently shake the Agencourt AMPure XP bottle to resuspend any magnetic particles that may have settled. Add Agencourt AMPure XP according to: volume of AMPure XP per reaction = 1.8 × reaction volume.
2. Mix the reagent and the PCR reaction thoroughly by pipetting up and down 10 times. Incubate the mixed samples for 5 minutes at room temperature for maximum recovery. The mixture should appear homogeneous after mixing.
3. Place the tubes in an Agencourt MSB magnetic holder for 2 minutes to separate the beads from the solution. Wait for the solution to clear before proceeding to the next step. Aspirate the cleared solution from the tube and discard it.
4. Dispense 200 µl of 70% ethanol into each tube and incubate for 30 seconds at room temperature. Aspirate the ethanol and discard it. Repeat for a total of two washes. Perform these steps on the magnetic holder and do not disturb the separated magnetic beads. Remove all of the ethanol from the bottom of the well, as ethanol is a known PCR inhibitor.
5. Off the magnetic holder, add 40 µl of elution buffer (reagent-grade water, Tris-acetate pH 8.0, or TE) to each tube and mix by pipetting 10 times.
6. Place the tubes in an Agencourt magnetic holder for 1 minute to separate the beads from the solution.
7. Transfer the eluate to a new tube.
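The bead volume in step 1 is a simple ratio. The Python sketch below computes it per reaction and, as an illustrative extra, for a batch of samples; the function names are hypothetical and the 10% pipetting excess in the batch function is an assumption, not part of the Agencourt protocol.

```python
def ampure_xp_volume(reaction_ul: float, ratio: float = 1.8) -> float:
    """Volume of AMPure XP bead suspension to add to one reaction (step 1)."""
    if reaction_ul <= 0 or ratio <= 0:
        raise ValueError("volume and ratio must be positive")
    return round(ratio * reaction_ul, 1)


def bead_stock_needed(reaction_volumes_ul: list[float], ratio: float = 1.8,
                      excess: float = 0.1) -> float:
    """Total bead volume for a batch of samples, including an assumed pipetting excess."""
    total = sum(ampure_xp_volume(v, ratio) for v in reaction_volumes_ul)
    return round(total * (1 + excess), 1)


print(ampure_xp_volume(50))             # 90.0
print(bead_stock_needed([50, 50, 25]))  # 247.5
```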


Appendix 5: Agilent 2100 Bioanalyzer

Materials:
• Consumables:
- DNA chip
- Electrode cleaner
- Syringe
- Spin filters
- Pipettes (10 µl, 100 µl and 1000 µl) and pipette tips
- 0.5 ml tubes
• Chemicals:
- High Sensitivity DNA ladder (depends on chip) (yellow)
- High Sensitivity DNA markers (depends on chip) (green)
- High Sensitivity DNA dye concentrate (blue)
- High Sensitivity DNA gel matrix (red)
• Equipment:
- Vortex
- Centrifuge

Method:
Preparation of the gel-dye mix
1) Allow the High Sensitivity (HS) DNA dye concentrate (blue) and HS DNA gel matrix (red) to equilibrate to room temperature for 30 min.
2) Add 15 µl of HS DNA dye concentrate (blue) to a HS DNA gel matrix vial (red).
3) Vortex the solution well and spin it down. Transfer it to a spin filter.
4) Centrifuge at 6000 rpm for 10 min. Protect the solution from light and store it at 4 °C in the dark.

Loading the gel-dye mix
1) Allow the gel-dye mix to equilibrate to room temperature for 30 min before use.
2) Put a new HS DNA chip on the chip priming station.
3) Pipette 9 µl of gel-dye mix into the well marked G.
4) Make sure that the plunger is positioned at 1 ml and then close the chip priming station.
5) Press the plunger until it is held by the clip.
6) Wait exactly 60 s, then release the clip.
7) Wait 5 s, then slowly pull the plunger back to the 1 ml position.
8) Open the chip priming station and pipette 9 µl of gel-dye mix into the wells marked G.

Loading the High Sensitivity DNA marker
1) Pipette 5 µl of marker (green) into all sample and ladder wells. Do not leave any wells empty.

Loading the ladder and the samples
1) Pipette 1 µl of HS DNA ladder (yellow) into the well marked with the ladder symbol.
2) In each of the 11 sample wells, pipette 1 µl of sample (used wells) or 1 µl of marker (unused wells).
3) Put the chip horizontally in the adapter and vortex for 1 min at the indicated setting (2400 rpm).
4) Run the chip in the Agilent 2100 Bioanalyzer within 5 min.

Figure 23: Schematic overview of the HS chip. All the wells are marked with a specific colour.


Appendix 6: Primer design

Materials:
- Computer
- Primer3
- SNPcheck (single-nucleotide polymorphism check)
- NCBI (National Center for Biotechnology Information)
- NGS dataset

Methods:
1. Compare the NGS dataset with a reference genome and filter it to retain the potentially pathogenic mutations. Determine the chromosome, the exon and the mutation.
2. Search for the gene in the NCBI Gene database and retrieve the gDNA sequence in FASTA format.
3. Do the same for the mRNA sequence; locate the exon in which the mutation occurred and the coding sequence (CDS, starting at the start codon) of the gene.
4. Determine the position of the mutation in the gDNA using the CDS and the position of the mutation in the cDNA. In the cDNA, nucleotides are numbered as c.x, where x is the number of the nucleotide and c.1 is the first nucleotide of the start codon. For example, if the CDS starts at position 17 of the mRNA, nucleotide 17 of the mRNA is c.1.
5. Copy the gDNA sequence surrounding the mutation into the Primer3 program; the amplicon should be about 400 base pairs long. Highlight the region of interest and pick primers.
6. Check the picked primers for SNPs with SNPcheck and for specificity with BLAST (NCBI).
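Step 4 is an offset calculation. The Python sketch below converts a c. position into an mRNA position and then maps it onto the retrieved gDNA sequence using exon coordinates. It assumes that the exon coordinates within the gDNA sequence are known, that the gDNA is in the same orientation as the mRNA, and that only positions inside the coding sequence (c.1 onwards) are needed; the function names and exon coordinates in the example are hypothetical.

```python
def c_to_mrna(c_pos: int, cds_start: int) -> int:
    """Convert a coding position c.x (x >= 1) to a 1-based mRNA position.

    cds_start is the mRNA position of the first nucleotide of the start
    codon, i.e. of c.1. UTR (c.-x, c.*x) and intronic (c.x+y) notation is
    not handled in this sketch.
    """
    if c_pos < 1:
        raise ValueError("only c.1 and higher are handled")
    return cds_start + c_pos - 1


def mrna_to_gdna(mrna_pos: int, exons: list[tuple[int, int]]) -> int:
    """Map a 1-based mRNA position onto the retrieved gDNA sequence.

    exons: 1-based inclusive (start, end) of each exon in the gDNA sequence,
    in transcript order, with the gDNA in the same orientation as the mRNA.
    """
    remaining = mrna_pos
    for start, end in exons:
        exon_length = end - start + 1
        if remaining <= exon_length:
            return start + remaining - 1
        remaining -= exon_length
    raise ValueError("position lies beyond the last exon")


# Hypothetical gene: CDS starts at mRNA position 17 (the example in step 4),
# exon 1 spans gDNA 101-150 and exon 2 spans gDNA 301-400.
exons = [(101, 150), (301, 400)]
print(c_to_mrna(1, 17), mrna_to_gdna(c_to_mrna(1, 17), exons))    # 17 117
print(c_to_mrna(40, 17), mrna_to_gdna(c_to_mrna(40, 17), exons))  # 56 306
```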


Appendix 7: PCR & gel electrophoresis

Materials PCR:
• Equipment:
- Vortex
- Eppendorf centrifuge
- Biometra thermocycler
- Biometra Tgradient
• Materials and reagents:
- Tubes (0.2 ml)
- Pipettes (2 and 10 µl)
- Milli-Q
- 10x PCR buffer
- MgCl2 (50 mM)
- dNTP mix (10 mM)
- Taq DNA polymerase (5 U/µl)
- Forward/reverse primer (10 pmol/µl)
- DNA

Methods PCR:
1. Thaw all reagents and keep them on ice. Mix the reagents as follows:
- 39.3-36.3 µl Milli-Q
- 5 µl 10x PCR buffer
- 1.5 µl MgCl2
- 1.0 µl dNTP mix
- 1.0 µl forward and 1.0 µl reverse primer
- 0.2 µl Taq polymerase
- 1-5 µl DNA (negative control: 1 µl Milli-Q)
2. Vortex the master mix, place the tubes in the thermocycler and run the temperature program of table 12.

Table 12: Temperature program standard PCR.

| Step | Time (min:s) | Temperature (°C) | Cycles |
|---|---|---|---|
| Pre-denaturation | 5:00 | 95 | - |
| Denaturation | 1:00 | 95 | 33* |
| Annealing | 1:00 | Primer dependent | 33* |
| Extension | 1:30 | 72 | 33* |
| Extra extension | 7:00 | 72 | - |
| Hold | Infinite | 4 | - |

* Denaturation, annealing and extension are repeated together for 33 cycles.

3. The PCR is optimized by testing annealing temperatures from 56 °C to 62 °C.
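The PCR recipe above is written per 50 µl reaction (with 1 µl DNA and 39.3 µl Milli-Q the listed volumes add up to 50 µl). The Python sketch below scales it to a master mix for several reactions and computes final concentrations with C1 × V1 = C2 × V2. The 10% pipetting excess and the function names are assumptions made for illustration.

```python
# Per-reaction volumes (µl) from the PCR recipe above. Milli-Q is not listed:
# it is calculated so that each reaction reaches the total volume.
PER_REACTION_UL = {
    "10x PCR buffer": 5.0,
    "MgCl2 (50 mM)": 1.5,
    "dNTP mix (10 mM)": 1.0,
    "forward primer (10 pmol/µl)": 1.0,
    "reverse primer (10 pmol/µl)": 1.0,
    "Taq polymerase (5 U/µl)": 0.2,
}


def final_concentration(stock: float, volume_ul: float, total_ul: float = 50.0) -> float:
    """C2 = C1 * V1 / V2: final concentration in the reaction, in the stock's unit."""
    return stock * volume_ul / total_ul


def master_mix(n_reactions: int, dna_ul: float, total_ul: float = 50.0,
               excess: float = 0.1) -> dict[str, float]:
    """Master-mix volumes (µl) for n reactions plus an assumed pipetting excess.

    DNA is pipetted into each tube separately, so it is left out of the mix.
    """
    water_ul = total_ul - sum(PER_REACTION_UL.values()) - dna_ul
    if water_ul < 0:
        raise ValueError("DNA volume does not fit in the reaction volume")
    factor = n_reactions * (1 + excess)
    mix = {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}
    mix["Milli-Q"] = round(water_ul * factor, 2)
    return mix


print(final_concentration(50, 1.5))              # 1.5  (mM MgCl2)
print(final_concentration(10, 1.0))              # 0.2  (mM dNTPs)
print(master_mix(1, 1.0, excess=0)["Milli-Q"])   # 39.3
```

With these volumes the added MgCl2 ends at 1.5 mM and the dNTPs at 0.2 mM; any Mg2+ already present in the 10x buffer is not counted in this sketch.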

Materials gel electrophoresis:
• Equipment:
- Microwave
- Electrophoresis unit
- Imago Compact Imaging System
• Materials and reagents:
- Pipettes (20 µl)
- TAE buffer (1x)
- Agarose
- Ethidium bromide (0.625 mg/ml)
- Loading buffer (Orange G, 5x)
- O'GeneRuler DNA Ladder Mix
- PCR products

Methods gel electrophoresis:
1. Prepare a 2% agarose gel in 100 ml 1x TAE buffer (2 grams of agarose).
2. Heat the mixture in the microwave and let it cool down to 65-70 °C.
3. Add 10 µl of ethidium bromide and pour the gel; make sure the comb is in place.
4. Leave the gel to solidify for 20 min and remove the comb. Place the gel in the electrophoresis unit and fill the unit with 1x TAE to 2-3 mm above the gel.
5. Add 5 µl loading buffer to 5 µl sample and pipette the mixture into one of the wells.
6. Pipette 2.5 µl DNA ladder mix into one of the wells.
7. Run the gel for 30 minutes at 80 V and image it with the Imago Compact Imaging System.
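The gel recipe is plain weight/volume arithmetic. A small Python sketch with hypothetical helper names reproduces the 2 grams of agarose for a 2% gel in 100 ml and shows the final ethidium bromide concentration that 10 µl of the 0.625 mg/ml stock gives in that volume (the small volume of the added stain is ignored).

```python
def agarose_grams(percent_w_v: float, buffer_ml: float) -> float:
    """Agarose (g) for a gel of percent_w_v % (w/v) in buffer_ml of 1x TAE."""
    return percent_w_v * buffer_ml / 100.0


def etbr_final_ug_per_ml(stock_mg_per_ml: float, added_ul: float, gel_ml: float) -> float:
    """Final ethidium bromide concentration in the gel (µg/ml).

    mg/ml equals µg/µl, so stock * µl added gives µg of stain,
    which is then divided by the gel volume in ml.
    """
    return stock_mg_per_ml * added_ul / gel_ml


print(agarose_grams(2, 100))                 # 2.0
print(etbr_final_ug_per_ml(0.625, 10, 100))  # 0.0625
```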


Appendix 8: Sanger sequencing

Materials:
• Equipment:
- Centrifuge
- Biometra thermocycler
- 3100 Genetic Analyzer
• Materials and reagents:
- Tubes (0.2 ml)
- Pipettes (2 and 10 µl)
- Milli-Q
- BigDye® Terminator v1.1 cycle sequencing buffer
- BigDye® Terminator v1.1 cycle sequencing kit
- Forward primer (1 pmol)
- Reverse primer (1 pmol)
- Purified PCR product

Method:
1. Thaw all reagents and keep them on ice. Mix the reagents as follows:
- 3.3 µl Milli-Q
- 1.5 µl sequencing buffer
- 3.2 µl forward or reverse primer
- 1 µl PCR product
- 1 µl sequencing enzyme
2. Vortex the mixture, place the tubes in the thermocycler and run the temperature program of table 13.

Table 13: Temperature program standard Sanger sequencing.

| Step | Time (min:s) | Temperature (°C) | Jump to step | Cycles |
|---|---|---|---|---|
| 1 | 0:10 | 94 | - | - |
| 2 | 0:10 | 50 | - | - |
| 3 | 2:00 | 60 | 1 | 24 |
| 4 | - | 4 | - | - |


Appendix 9: Designed primers for Sanger sequencing genes

DNA Family 07-2283

LAMA3, exon 19, chromosome 18 Forward primer: TTAATGTACTCTGTCAATGTGGT Reverse primer: ATCGTGGAATCAGTTCTGG Product size: 384 bp

SYNPO2, exon 4, chromosome 4 Forward primer: ACGCCAAGCAGAGAACAA Reverse primer: GGCTGATTCACAGACCCT Product size: 248 bp

DCHS2, exon 25, chromosome 4 Forward primer: GGAGATCAAGGGGAAGGCTGCA Reverse primer: TTTTGGAATGCCAGGCATATG Product size: 309 bp

SLC6A8, exon 4, chromosome X Forward primer: CCTGTCCTCGGAGAGTCCTG Reverse primer: CTTCCAGACACAGAAGTAGACCAG Product size: 399 bp

ZYX, exon 3, chromosome 7 Forward primer: CTGGGACCCCTGAGAGAGTT Reverse primer: AGAGGTGCAGGGGAGGTAAG Product size: 397 bp

DNA Family 08-5759

KRTAP10.6, exon 1, chromosome 21 (both mutations) Forward primer: AACAAGGCCAGGTGGAATAAAA Reverse primer: GAAGAGGAAATCCCAGAGCAGA Product size: 595 bp

PSPH, exon 4, chromosome 7 (both mutations) Forward primer: GGGAAGTAAACAAAAAGAAAAGGAA Reverse primer: GGGTCTAGCACTTCATCCAAAACAG Product size: 399 bp
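Before ordering, the designed primers can be given a quick sanity check. The Python sketch below computes the GC content and a rough melting temperature with the Wallace rule (2 °C per A/T, 4 °C per G/C), shown here for the SYNPO2 primers. The function names are hypothetical, and this rule is only an approximation for short oligonucleotides; Primer3 itself uses a more detailed thermodynamic model, so its reported Tm values will differ.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in a primer."""
    primer = primer.upper()
    return 100.0 * sum(base in "GC" for base in primer) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (°C): 2 per A/T plus 4 per G/C (Wallace rule)."""
    primer = primer.upper()
    at = sum(base in "AT" for base in primer)
    gc = sum(base in "GC" for base in primer)
    return 2 * at + 4 * gc


primers = {
    "SYNPO2 forward": "ACGCCAAGCAGAGAACAA",
    "SYNPO2 reverse": "GGCTGATTCACAGACCCT",
}
for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.0f}%, Tm ~{wallace_tm(seq)} °C")
```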


Appendix 10: Regulation of SYNPO2 localization in myocytes

Figure 24: Regulation of SYNPO2 (myopodin, orange bar) localization in myocytes. (A) Normal state in fully differentiated myotubes: SYNPO2 is attached to the Z-disc (alpha-actinin). (B) Inhibition of calcineurin or activation of PKA and CaMKII leads to phosphorylation of SYNPO2 (pink P), which allows SYNPO2 to bind 14-3-3. The phosphorylated SYNPO2 detaches from the Z-disc and is located in the cytoplasm. (C) Importin binds the cytoplasmic SYNPO2 and imports it into the nucleus [50].


Appendix 11: Protein-protein interaction SYNPO2

Figure 25: Known protein-protein interactions of SYNPO2 (red, in the middle) [47].
