GENETICSOFDISTALHEREDITARY MOTORNEUROPATHIES

By alexander peter drew

A thesis submitted for the Degree of Doctor of Philosophy

Supervised by Professor Garth A. Nicholson Dr. Ian P. Blair

Faculty of Medicine University of Sydney

2012 statement

No part of the work described in this thesis has been submitted in fulfilment of the requirements for any other academic degree or qualification. Except where due acknowledgement has been made, all experimental work was performed by the author.

Alexander Peter Drew CONTENTS acknowledgements ...... i summary ...... ii list of figures ...... v list of tables ...... viii acronyms and abbreviations ...... xi publications ...... xiv

1 literature review ...... 1 1.1 Molecular genetics and mechanisms of disease in Distal Hereditary Motor Neuropathies ...... 1 1.1.1 Small heat shock family ...... 2 1.1.2 Dynactin 1 (DCTN1)...... 9 1.1.3 Immunoglobulin mu binding protein 2 (IGHMBP2) 11 1.1.4 Senataxin (SETX)...... 14 1.1.5 Glycyl-tRNA synthase (GARS)...... 16 1.1.6 Berardinelli-Seip congenital lipodystrophy 2 (SEIPIN) gene (BSCL2)...... 18 1.1.7 ATPase, Cu2+-transporting, alpha polypeptide gene (ATP7A) 20 1.1.8 Pleckstrin homology domain-containing protein, G5 gene (PLEKHG5)...... 21 1.1.9 Transient receptor potential cation channel, V4 gene (TRPV4) 22 1.1.10 DYNC1H1 ...... 23 1.1.11 Clinical variability in dHMN ...... 24 1.1.12 Common disease mechanisms in dHMN ...... 29 2 general materials and methods ...... 32 2.1 General materials and reagents ...... 32 2.1.1 Reagents and Enzymes ...... 32 2.1.2 Equipment ...... 33 2.1.3 Plasticware ...... 33 2.2 Study participants ...... 34 2.3 DNA methods ...... 34 2.3.1 DNA extraction from blood ...... 34 2.3.2 DNA oligonucleotides ...... 35 2.3.3 PCR protocol ...... 35 2.3.4 Agarose gel electrophoresis ...... 36 2.3.5 DNA purification from PCR ...... 36 2.3.6 DNA sequencing ...... 37 3 genetic mapping ...... 39 3.1 Introduction ...... 39 3.1.1 Overview ...... 39 3.1.2 Homologous recombination and linkage disequilibrium 39 3.1.3 Genetic linkage analysis ...... 41 3.1.4 Penetrance in dHMN ...... 44 3.1.5 Characterisation of dHMN1 in a large Australian family 44 3.1.6 Previous genetic studies in dHMN1 at 7q34-q36 . . . . . 44 3.1.7 Hypothesis and aims ...... 48 3.2 Materials and Methods ...... 50 3.2.1 Microsatellite analysis ...... 50 3.2.2 Microsatellite markers ...... 51 3.2.3 Haplotype analysis ...... 51 3.2.4 Genetic linkage analysis ...... 52 3.2.5 Association analysis ...... 53 3.3 Results ...... 54 3.3.1 Fine mapping of family CMT54 ...... 54 3.3.2 Linkage analysis using new markers in CMT54 . . . . . 57 3.3.3 Identification of additional dHMN families ...... 60 3.3.4 Linkage power analysis ...... 62 3.3.5 Genetic linkage analysis in additional dHMN families . 65 3.3.6 Suggestive linkage in family CMT44 ...... 65 3.3.7 Uninformative families ...... 66 3.3.8 dHMN families excluded from 7q34-q36 ...... 74 3.3.9 Haplotype association analysis ...... 75 3.4 Discussion ...... 76 3.4.1 Linkage analysis in CMT54 ...... 76 3.4.2 Linkage analysis in the extended dHMN cohort . . . . . 79 3.4.3 Proportion of dHMN caused by a pathogenic gene at 7q34-q36 ...... 82 4 analysis of candidate ...... 84 4.1 Introduction ...... 84 4.1.1 Chapter overview ...... 84 4.1.2 Strategy for gene identification ...... 84 4.1.3 Identifying genes within a disease interval ...... 85 4.1.4 Selection of candidate genes ...... 86 4.1.5 Gene screening methods ...... 88 4.2 Materials and Methods ...... 90 4.2.1 In-silico candidate gene prioritisation ...... 90 4.2.2 PCR and sequencing for gene screening ...... 90 4.2.3 Analysis of effect of variants on exon splicing ...... 91 4.2.4 RNA studies of splicing variants ...... 92 4.2.5 RNA folding prediction ...... 93 4.3 Results ...... 94 4.3.1 Positional candidate genes ...... 94 4.3.2 Functional candidate gene selection ...... 96 4.3.3 Screening candidate genes ...... 99 4.3.4 Analysis of the interval defined by significantly associ- ated haplotypes ...... 106 4.4 Discussion ...... 107 4.4.1 Gene screening summary ...... 107 4.4.2 Was the missed? ...... 108 5 copy number and structural variation ...... 109 5.1 Introduction ...... 109 5.1.1 CNV and structural variation ...... 109 5.1.2 CNV in dHMN and related peripheral neuropathies . . 112 5.1.3 Methods of detecting CNV ...... 113 5.1.4 Hypothesis and Aim ...... 115 5.2 Materials and Methods ...... 115 5.2.1 Custom CGH Microarray ...... 115 5.2.2 Array processing ...... 116 5.2.3 Software Nexus 5 ...... 117 5.2.4 PCR confirmation of deletions ...... 118 5.3 Results ...... 118 5.3.1 Cytogenetic analysis ...... 118 5.3.2 CNV analysis at 7q34-q36 using aCGH ...... 119 5.4 Discussion ...... 130 5.4.1 Summary ...... 130 5.4.2 Comparison of methods for CNV analysis ...... 130 5.4.3 Limitations of array-based CGH ...... 133 6 next generation sequencing of cmt54 ...... 134 6.1 Introduction ...... 134 6.1.1 NGS technology ...... 134 6.1.2 NGS of large genomes ...... 138 6.1.3 Applications of NGS to Mendelian disease ...... 139 6.2 Materials and methods ...... 141 6.2.1 NimbleGen array-based targeted sequence capture . . . 141 6.2.2 454 GS FLX sequencing ...... 142 6.2.3 NimbleGen array-based exome capture and Solexa se- quencing ...... 143 6.2.4 Confirmation of sequence variants using Sanger sequencing144 6.2.5 Sequence variant analysis using Galaxy Browser . . . . . 144 6.2.6 NGS assembly and annotation using Bowtie and SAMtools144 6.2.7 Unequal coverage comparison ...... 145 6.2.8 In silico prediction of functional impacts of sequence vari- ants ...... 145 6.3 Results ...... 146 6.3.1 7q34-q36 target region sequencing . . . . . 146 6.3.2 7q34-q36 sequence variant analysis ...... 147 6.3.3 Off-target and repeat mapping reads ...... 154 6.3.4 Exome sequencing ...... 155 6.3.5 Further sequencing analysis using unequal coverage anal- ysis ...... 158 6.3.6 Exome wide analysis ...... 163 6.4 Discussion ...... 164 6.4.1 Limitations of the NGS analysis ...... 166 7 exome sequencing of cmt44 ...... 169 7.1 Introduction ...... 169 7.2 Materials and methods ...... 170 7.2.1 Exome sequencing ...... 170 7.2.2 NGS variant analysis ...... 171 7.2.3 HRM analysis ...... 172 7.3 Results ...... 172 7.3.1 Exome sequencing in CMT44 ...... 172 7.3.2 Clinical detail of CMT44 ...... 173 7.3.3 MFN2 genotyping in CMT44 using high resolution melt analysis ...... 177 7.3.4 Exome wide sequence variant analysis in CMT44 . . . . 177 7.3.5 Comparison of SNPs between CMT44 and CMT54 . . . 179 7.4 Discussion ...... 179 8 general discussion ...... 185 8.1 Genetic studies in family CMT54 ...... 185 8.2 Genetic linkage analysis in additional dHMN families . . . . . 190 8.3 Genetic studies in family CMT44 ...... 191 8.4 Conclusion ...... 192 a custom gnu bash scripts ...... 194 a.1 Linkage analysis ...... 194 a.2 NGS coverage visualisation ...... 197 a.3 NGS variant comparison with fold coverage analysis ...... 198 b pcr primers ...... 205 c pedigrees for additional dhmn families ...... 216 d two point lod scores for additional dhmn families ...... 230 e multi-point lod scores for additional dhmn families ...... 240 f 7q34-q36 control haplotypes ...... 249 g 7q34-q36 candidate genes and reported ...... 251 h next generation sequencing ...... 255 i cnv identified with array cgh ...... 257 bibliography ...... 261 ACKNOWLEDGEMENTS

It is a pleasure to thank the people who made this PhD project possible. My warmest thanks to my supervisors Professor Garth Nicholson and Dr Ian Blair for your support, encouragement and the opportunities presented to me throughout this project. To Garth, your expertise with the clinical aspects of peripheral neuropathies as well as your academic experience have proved invaluable. To Ian, thank you for your guidance in the lab, the opportunities and resources I was given and the assistance with preparation of my thesis. I am extremely grateful to all participating dHMN family members, without whom this research would not be possible. To my colleagues, in particular, Kelly Williams, for assistance with the targeted sequencing of candidate genes and Jenn Solski, for assistance with sequencing variants identified with NGS. To all the staff and students in the Northcott Neuroscience and Molecular Medicine laboratories, thank you for your friendly assistance over these years. The help of Annette Berryman in the Molecular Medicine office and the procedural guidance of Tracey Dent and Annet Doss in the ANZAC research institute office has been most helpful. The support and encouragement of my friends and family has been essential. My parents, Paul and Josephina Drew, have been a constant source of support, both emotionally and financially during my postgraduate years. Finally, my warmest thanks go to my girlfriend Victoria, without her loving support and patience this thesis would never have been possible. You always bring me joy. This work was supported in part by an Australian Postgraduate Award and funding from the National Health and Medical Research Council of Australia (511941).

i SUMMARY

The distal hereditary motor neuropathies (dHMN) are a clinically and geneti- cally heterogeneous group of disorders that primarily affect motor neurons, without significant sensory involvement. Using genome wide linkage analy- sis in a large Australian family (CMT54), a form of dHMN was previously mapped by this laboratory, to a 12.98 Mb interval on chromosome 7q34-q36. The axonal neuropathy seen in this family was classified as dHMN1; with autosomal dominant inheritance, early but variable age of onset, and muscle weakness and wasting affecting the lower limbs.

In this project, genetic linkage analysis of the chromosome 7q34-q36 disease interval was carried out in the original family (CMT54) and 20 smaller families from an Australian dHMN cohort. Fine mapping in family CMT54, including unaffected individuals suggested a minimum probable candidate interval of 6.92 Mb, flanked by markers D7S615 and D7S2546 within the 12.98 Mb critical disease interval. Of the additional dHMN families, one (family CMT44) achieved suggestive linkage to the chromosome 7q34-q36 disease locus with a LOD score of 2.02.

Mutation screening was carried out in family CMT54 at the chromosome 7q34-q36 locus. The 12.9 Mb disease interval contains 89 annotated protein- coding genes, of which 60 lay within the prioritised 6.92 Mb interval. A combi- nation of methods was used to screen these genes for a putative pathogenic mutation. Functional candidate genes were identified via a literature and database search. The coding exons of 35 prioritised candidate genes were sequenced and no pathogenic mutation was identified. Cytogenetic analysis excluded large scale chromosomal abnormalities. Array based comparative

ii genomic hybridisation of the 7q34-q36 interval in patients did not identify any pathogenic duplications or deletions.

Next generation sequencing (NGS) techniques were used to identify se- quence variants within the remaining genes within the 7q34-q36 interval and elsewhere in the genome. Two NGS based approaches were applied to muta- tion screening in family CMT54. Initially, the chromosome 7q34-q36 disease interval was analysed in one affected individual using a custom designed DNA capture microarray and 454 GS FLX (Roche) sequencing. Approximately 80% of patient coding exons were captured, sequenced and no pathogenic muta- tions were identified. The chromosome 7q34-q36 target captured DNA sample was also re-sequenced along with an additional two affected individuals and one unaffected parent using exome capture and Solexa (Illumina) sequencing. Combined, 99.5% of coding exons were sequenced in the chromosome 7q34- q36 interval and all sequence variants that were identified were excluded from a pathogenic role. Sequence variants identified elsewhere in the exome were also excluded from a pathogenic role.

Exome sequencing of dHMN family CMT44 did not identify any putative pathogenic mutation at the chromosome 7q34-q36 locus. The exomes of four affected and one unaffected individuals were sequenced. Exome wide analysis identified a potential digenic inheritance in CMT44 of a previously published MFN2 mutation causing a mild CMT2 phenotype and a second mutation causing a dHMN phenotype. Potential candidate for dHMN were identified in two genes, PCDHGA4 and DNAH11. PCDHGA4, was previously shown to function in the brain and spinal cord, and deletion of PCDHG genes in a mouse model causes a severe neurodegenerative phenotype.

The gene mutation causing dHMN that maps to chromosome 7q34-q36 remains to be identified. The disease mutation may lie in a coding region not captured by current exome platforms, a non-coding region, or the muta- tion may cause disease through an alternate mechanism not detected by the

iii methods employed in this thesis.Future studies should concentrate on tran- scriptome analysis by next-gen RNA sequencing, which may identify unknown transcripts and exons that map to chromosome 7q34-q36 or highlight sequence variants located in regulatory elements. Identification of new gene mutations is critical to further understanding the biochemical and cellular processes underlying dHMN. Although the causative mutation for dHMN on 7q34-q36 was not identified, a significant proportion of the disease interval has been excluded using a combination of traditional and new technologies. The purpose of this thesis is to identify new gene mutations causing dHMN. The genetic and functional data presented here suggest this will be a difficult task; the genetic heterogeneity complicates genetic analysis and the multiple molecular mechanisms implicated to date make it difficult to pinpoint specific candidate genes. The identification of additional genes and genetic modifiers is necessary to increase our understanding of the disease mechanisms caus- ing dHMN and related neuropathies. This will directly aid in the diagnosis and classification of these neurodegenerative diseases and may lead to new therapeutics and treatment strategies.

iv LISTOFFIGURES

Figure 1.1 Eleven causative genes described to date for dHMN. . .4 Figure 3.1 Original pedigree CMT54 and haplotype analysis at chr7q34- q36 ...... 46 Figure 3.2 Physical map of the chr7q34-q36 disease locus ...... 49 Figure 3.3 Fine mapping of the proximal end of the minimum prob- able candidate interval in CMT54...... 55 Figure 3.4 Detail of fine mapping of the distal end of minimum probable candidate interval in family CMT54 ...... 57 Figure 3.5 Updated pedigree CMT54 and haplotype analysis . . . . 59 Figure 3.6 CMT54 multi-point LOD plot at 7q34-q36 ...... 61 Figure 3.7 Pedigree CMT44 haplotype analysis ...... 67 Figure 3.8 CMT44 multi-point LOD plot...... 69 Figure 4.1 Genes mapping to the 7q34-q36 disease interval . . . . . 95 Figure 4.2 Location of variant CNTNAP2:c.3476-15C>A and adja- cent exons...... 104 Figure 4.3 PCR amplification of cDNA flanking the CNTNAP2:c3476- 15C>A variant ...... 105 Figure 4.4 Genes in the interval defined by significantly associated haplotypes ...... 106 Figure 5.1 Forms of CNV and structural variation at a chromosomal or molecular scale...... 111 Figure 5.2 Array based CGH analysis...... 115 Figure 5.3 Direct method for aCGH...... 117 Figure 5.4 The proportion of CNV calls within each size range. . . 120 Figure 5.5 Array CGH analysis in CMT54...... 121 Figure 5.6 CNV probe plot around deletion A...... 122 Figure 5.7 UCSC genome browser views of deletion A...... 123 Figure 5.8 CNV probe plot around deletion B...... 126 Figure 5.9 UCSC genome browser views of deletion B...... 127 Figure 5.10 CNV probe plot around deletion C...... 128 Figure 5.11 UCSC genome browser views of deletion C...... 129 Figure 5.12 PCR Genotyping of control individuals for deletion C. . 129 Figure 5.13 CNV probe plot around deletion D...... 131 Figure 5.14 UCSC genome browser views of deletion D...... 132 Figure 6.1 Clonal amplification of immobilised template DNA in NGS136 Figure 6.2 Filtering strategy for variants identified by targeted cap- ture sequencing of 7q34-q36...... 148 Figure 6.3 454 sequencing contigs at the ACCN3 gene...... 152 Figure 6.4 7q34-q36 sequencing coverage AGRF sequencing . . . . 153

v Figure 6.5 Recent segmental duplications at the chromosome 7q34- q36 disease locus...... 155 Figure 6.6 Sequencing coverage comparison between three exome capture and one target region capture samples...... 160 Figure 7.1 Updated pedigree for family CMT44 ...... 175 Figure 7.2 HRM genotyping analysis of the MFN2 mutation identi- fied in CMT44...... 178 Figure 7.3 Updated comparison of haplotypes between CMT44 and CMT54 using SNP data from NGS...... 180 Figure 7.4 Organisation of the protocadherin gene cluster on chro- mosome 5p31...... 183 Figure C.1 Pedigree CMT484 with haplotype analysis at 7q34-q36 . 217 Figure C.2 Pedigree CMT274 with haplotype analysis at 7q34-q36 . 217 Figure C.3 Pedigree CMT609 with haplotype analysis at 7q34-q36 . 218 Figure C.4 Pedigree CMT677 with haplotype analysis at 7q34-q36 . 219 Figure C.5 Pedigree CMT703 with haplotype analysis at 7q34-q36 . 219 Figure C.6 Pedigree CMT131 with haplotype analysis at 7q34-q36 . 220 Figure C.7 Pedigree CMT273 with haplotype analysis at 7q34-q36 . 220 Figure C.8 Pedigree CMT333 with haplotype analysis at 7q34-q36 . 221 Figure C.9 Pedigree CMT701 with haplotype analysis at 7q34-q36 . 222 Figure C.10 Pedigree CMT260 with haplotype analysis at 7q34-q36 . 223 Figure C.11 Pedigree CMT618 with haplotype analysis at 7q34-q36 . 224 Figure C.12 Pedigree CMT477 with haplotype analysis at 7q34-q36 . 225 Figure C.13 Pedigree CMT748 with haplotype analysis at 7q34-q36 . 226 Figure C.14 Pedigree CMT779 with haplotype analysis at 7q34-q36 . 226 Figure C.15 Pedigree CMT808 with haplotype analysis at 7q34-q36 . 226 Figure C.16 Pedigree CMT375 with haplotype analysis at 7q34-q36 . 227 Figure C.17 Pedigree CMT102 with haplotype analysis at 7q34-q36 . 227 Figure C.18 Pedigree CMT724 with haplotype analysis at 7q34-q36 . 228 Figure C.19 Pedigree CMT331 with haplotype analysis at 7q34-q36 . 229 Figure E.1 Multi-point LOD plot for family CMT609...... 240 Figure E.2 Multi-point LOD plot for family CMT274...... 241 Figure E.3 Multi-point LOD plot for family CMT677...... 241 Figure E.4 Multi-point LOD plot for family CMT484...... 242 Figure E.5 Multi-point LOD plot for family CMT703...... 242 Figure E.6 Multi-point LOD plot for family CMT131...... 243 Figure E.7 Multi-point LOD plot for family CMT333...... 243 Figure E.8 Multi-point LOD plot for family CMT273...... 244 Figure E.9 Multi-point LOD plot for family CMT260...... 244 Figure E.10 Multi-point LOD plot for family CMT701 ...... 245 Figure E.11 Multi-point LOD plot for family CMT618...... 245 Figure E.12 Multi-point LOD plot for family CMT724...... 246 Figure E.13 Multi-point LOD plot for family CMT331...... 246 Figure E.14 Multi-point LOD plot for family CMT477...... 247 Figure E.15 Multi-point LOD plot for family CMT779...... 247

vi Figure E.16 Multi-point LOD plot for family CMT748...... 248 Figure E.17 Multi-point LOD plot for family CMT808...... 248

vii LISTOFTABLES

Table 1.1 OMIM classification of dHMN ...... 25 Table 3.1 Previous two-point LOD scores in CMT54 ...... 47 Table 3.2 New STR markers for fine mapping ...... 56 Table 3.3 CMT54 two-point LOD scores between the dHMN locus and microsatellite markers at 7q34-q36 ...... 57 Table 3.4 Sample and clinical information for 20 additional dHMN families...... 63 Table 3.5 Simulated two-point LOD scores for additional dHMN families at θ = 0.00 ...... 64 Table 3.6 CMT44 two-point LOD scores between the dHMN locus and microsatellite markers on 7q34-q36 ...... 68 Table 3.7 Comparison of haplotypes between CMT54 and addi- tional dHMN families ...... 77 Table 3.8 Haplotypes used in association analysis...... 78 Table 3.9 Association analysis of haplotypes with dHMN using Fisher’s exact test ...... 79 Table 4.1 Candidate gene prioritisation ranking using GeneWan- derer and Endeavour...... 99 Table 4.2 DNA variants identified in genes screened. All mutations found in affected individuals sequenced including known SNPs. Allele frequency. When both controls are homozy- gous, allele freq will be 0% or 100%. 100% indicates they are homozygous for the alternate SNP allele. Where rs numbers are indicated, alleles identified in sequencing matched those reported in dbSNP...... 101 Table 6.1 Summary statistics for targeted sequencing of CMT54 family member III:14...... 147 Table 6.2 Novel non-synonymous variants identified by targeted capture and sequencing of 7q34-q36...... 149 Table 6.3 Novel sequence variants possibly affecting splicing iden- tified within the 7q34-q36 interval...... 150 Table 6.4 Novel variants within the 5’ and 3’ UTR of known genes. 151 Table 6.5 Summary of exome and chromosome 7q34-q36 targeted sequencing using Solexa platform...... 157 Table 6.6 Sequence variants identified by exome capture and BGI sequencing ...... 158 Table 6.7 Novel sequence variants identified with exome sequencing.158 Table 6.8 Segregation analysis of exome sequencing identified vari- ants assuming unequal sequencing coverage...... 162

viii Table 6.9 Markers with two-point LOD scores > 1.5 in a genome wide linkage analysis...... 163 Table 6.10 Sequence variants predicted to be deleterious or affecting conserved sequences...... 164 Table 7.1 Summary of exome sequencing in CMT44...... 174 Table 7.2 Sequence variants identified by exome sequencing in CMT44 ...... 174 Table B.1 PCR primers used in gene screening...... 205 Table B.2 PCR primers for microsatellite markers...... 214 Table B.3 PCR primers for RT-PCR...... 214 Table B.4 PCR primers for HRM analysis...... 214 Table B.5 PCR primers for validation of CNV...... 214 Table B.6 PCR primers for validation of variants identified through NGS...... 215 Table D.1 Two-point linkage analysis in family CMT274 for 7q34- q36 markers...... 230 Table D.2 Two-point linkage analysis in family CMT609 for 7q34- q36 markers...... 231 Table D.3 Two-point linkage analysis in family CMT333 for 7q34- q36 markers...... 231 Table D.4 Two-point linkage analysis in family CMT484 for 7q34- q36 markers...... 232 Table D.5 Pedigree CMT677 two-point LOD scores between the dHMN locus and microsatellite markers on chromosome 7q34-q36...... 232 Table D.6 Two-point linkage analysis in family CMT703 for 7q34- q36 markers...... 233 Table D.7 Two-point linkage analysis in family CMT131 for 7q34- q36 markers...... 233 Table D.8 Two-point linkage analysis in family CMT273 for 7q34- q36 markers...... 234 Table D.9 Two-point linkage analysis in family CMT808 for 7q34- q36 markers...... 235 Table D.10 Two-point linkage analysis in family CMT260 for 7q34- q36 markers...... 235 Table D.11 Two-point linkage analysis in family CMT779 for 7q34- q36 markers...... 236 Table D.12 Two-point linkage analysis in family CMT477 for 7q34- q36 markers...... 236 Table D.13 Two-point linkage analysis in family CMT331 for 7q34- q36 markers...... 237 Table D.14 Two-point linkage analysis in family CMT724 for 7q34- q36 markers...... 237 Table D.15 Two-point linkage analysis in family CMT618 for 7q34- q36 markers...... 238

ix Table D.16 Two-point linkage analysis in family CMT748 for 7q34- q36 markers...... 238 Table D.17 Two-point linkage analysis in family CMT701 for 7q34- q36 markers...... 239 Table D.18 Two-point linkage analysis in family CMT102 for 7q34- q36 markers...... 239 Table D.19 Two-point linkage analysis in family CMT375 for 7q34- q36 markers...... 239 Table F.1 Healthy control haplotypes for association analysis . . . 249 Table G.1 Candidate genes within the minimum probable candidate interval on 7q34-q36 ...... 251 Table G.2 Candidate genes outside the minimum probable candi- date interval on 7q34-q36 ...... 253 Table G.3 Expression pattern of genes within the chromosome 7q34- q36 dHMN1 disease region...... 254 Table H.1 Sequencing gaps in 454 sequencing of 7q34-q36 . . . . . 255 Table H.2 Analysis of variants within chromosomal translocations on 7q34-q36 ...... 256 Table I.1 Gains and losses identified in control individual II:7 using aCGH...... 257 Table I.2 Gains and losses identified in affected individual III:9 using aCGH...... 258 Table I.3 Gains and losses identified in affected individual III:12 using aCGH...... 259 Table I.4 Gains and losses identified in affected individual II:1 using aCGH...... 260

x ACRONYMSANDABBREVIATIONS

ACRF Australian Cancer Research Foundation aCGH array comparative genomic hybridisation

AD autosomal dominant

ADLD autosomal dominant leukodystrophy

AGE agarose gel electrophoresis

AGRF Australian Genome Research Foundation

ALS amyotrophic lateral sclerosis

AR autosomal recessive

ASSP Alternative Splice Site Predictor

BAsH GNU Bourne-Again SHell bp

CCDS consensus coding sequence

CHOP Children’s Hospital of Philadelphia cM centimorgan

CMT Charcot-Marie-Tooth

CMT1A CMT type 1A

CNV copy number variation ddNTP dideoxynucleotide dHMN distal hereditary motor neuropathy

DNA deoxyribonucleic acid

DSMA distal spinal muscular atrophy

ESTs expressed sequence tags

FAM 6-carboxyfluorescein

FTD frontotemporal dementia

GWAS genome-wide association studies

xi HNPP hereditary neuropathy with liability to pressure palsies

HSF Human Splicing Finder

IBD identical by descent

IBS identical by state indel insertion or deletion kb kilobase

LOD logarithm of the odds

MAF minor allele frequency

Mb megabase microsatellite short tandem repeat

NCS nerve conduction studies

NGS next generation sequencing

NNSplice Splice Site Prediction by Neural Network

NS non-synonymous

OMIM Online Mendelian Inheritance in Man

PCR polymerase chain reaction

PCDHγ protocadherin-gamma

PPI protein-protein interaction

QC quality control

Rw rump white

SCA spinocerebellar ataxia

SD segmental duplication small HSP small heat shock protein

SMA spinal muscular atrophy

SNP single nucleotide polymorphism snRNP small nuclear ribonucleoprotein

SNV single nucleotide variation

SS splice site

xii ssDNA single-stranded DNA

STS sequence-tagged site

WGA whole-genome association

Wt wild type

xiii PUBLICATIONS

papers

• Drew AP, Blair IP, Nicholson GA. “Molecular genetics and mechanisms of disease in distal hereditary motor neuropathies: insights directing future genetic studies”. Curr Mol Med. 2011 Nov;11(8):650-65. abstracts

Presentations

• Drew AP,“Identifying a gene for distal motor neuron disease”, Concord clinical week 2008, Seminar series finalist.

Posters

• Williams K, Solski J, Drew A, Albulym O, Durnall J, Thoeng A, Thomas V, Warraich S, Crawford J, Rouleau G, Nicholson G, Blair I, “Exome sequencing in ALS and other motor neuron diseases”, 22nd International Symposium on ALS/MND, Sydney 2011.

• Williams K, Drew A, Solski J, Durnall J, Thoeng A, Albulym O, Warraich S, Thomas V, Crawford J, Rouleau G, Nicholson G, Blair I, “Investigating the genetic basis of ALS and other motor neuron diseases: Analysis of known genes and search for new loci”, 21st International Symposium on ALS/MND, Orlando 2010.

• Drew AP, Blair IP , Williams KL Nicholson GA, “Genetic analysis of distal motor neuron disease”, ANS satellite meeting motor neuron disease symposium 2010.

• Drew AP, Blair IP , Williams KL Nicholson GA, “Positional cloning of a gene for distal motor neuron disease on 7q34-q36”, ASHG annual conference Honolulu Oct 2009.

xiv • Drew AP, Blair IP , Williams KL Nicholson GA, “Molecular genetic studies of distal motor neuron disease”, ASMR NSW scientific meeting 2009.

• Drew AP, Blair IP , Williams KL Nicholson GA, “Molecular genetic studies of distal motor neuron disease”, From cell to society 6 2008.

• Drew AP, Blair IP, Durnall JC, Gopinath S, Kennerson ML, Nicholson GA, “Finding new genes causing motor neuron disease”, Lorne genome conference 2008.

xv LITERATUREREVIEW 1

1.1 molecular genetics and mechanisms of disease in distal hereditary motor neuropathies

The distal hereditary motor neuropathies (dHMNs) are a clinically and genet- ically heterogeneous group of disorders that superficially resemble Charcot- Marie-Tooth (CMT) neuropathy, where motor neurons are primarily affected with an absence of significant sensory involvement. Distal HMN is also known as distal spinal muscular atrophy (DSMA) or the spinal form of CMT (spinal CMT). Distal HMN is distinct from the other peroneal muscular atrophies, the combined motor and sensory neuropathies (hereditary motor and sensory neuropathy; HMSN including CMT1 and CMT2) or exclusively sensory neu- ropathy (hereditary sensory neuropathy; HSN). Symptoms caused by motor neuron loss include weakness and wasting of muscles controlled by the af- fected nerves, leading to loss of function of those muscles. The nerves affected varies between types of dHMN. Sensory neuron loss in related diseases results in loss of sensation and proprioception, while pain nerves generally remain unaffected. This can lead to over use of affected limbs. The original classi- fication of dHMN was based on the system proposed by Harding [1] with seven subtypes defined according to; mode of inheritance, age at onset and additional complicating features. Several additional types of dHMN have been described, necessitating additional classifications (see Table 1.1). The molecular genetic characterisation of dHMN has further delineated the classification of these disorders and has highlighted some overlap between phenotypic forms

1 1 literature review 2 of dHMN. There are also rare sporadic cases which can be attributed, at least in part, to de-novo mutations in dominant dHMN genes.

Several new genes with dHMN mutations have recently been identified, confirming the genetic heterogeneity of dHMN. As shown in Table 1.1, there are now 12 causative genes described for dHMN, and an additional five genetic loci where the gene remains to be identified. Some of these mutated genes are common between dHMN and CMT2. This review examines the growing number of identified dHMN genes, discusses recent insights into the functions of these genes and possible pathogenic mechanisms, and looks at the increasing overlap between dHMN and other neuropathies including CMT2 and spinal muscular atrophy (SMA).

1.1.1 Small heat shock protein family

Missense mutations in three genes from the small heat shock protein (small HSP) family, heat shock 27kDa protein 1 (HSPB1), heat shock 22kDa protein 8 (HSPB8) and heat shock 27kDa protein 3 (HSPB3), cause both dHMN2 and CMT2 [2–4]. Classically, dHMN2 is characterised by adult onset weakness and wasting, predominantly in the distal muscles of the lower limbs, with a rapid progression resulting in paralysis of the lower extremities within 5-10 years [5]. However, there can be a broad clinical phenotype including some cases with later bilateral hand weakness and minor sensory involvement. Also, a more severe earlier onset phenotype, similar to amyotrophic lateral sclerosis 4 (ALS4; Section 1.1.4), has been observed with a specific HSPB1 mutation [6]. Unrelated dHMN2 and CMT2 families have been reported with identical mutations in HSPB1 or HSPB8, demonstrating that there is significant genetic overlap between these disorders [2,7–9]. 1 literature review 3

The small HSPs are encoded by eleven genes, HSPB1 to HSPB11 [10]. The three small HSPs involved in dHMN, HSPB1, HSPB8 and HSPB3 all contain the highly conserved α-crystallin domain, characteristic of small HSPs [11]. The α-crystallin domain is flanked by a variable N-terminal and in HSPB1 and HSPB8 a short C-terminal. The expression of the small HSP genes is variable between tissue types, with expression of HSPB1, HSPB5, HSPB6, HSPB7 and HSPB8 in spinal cord [12–14]. Both HSPB1 and HSPB8 are expressed in motor neurons, with HSPB1 showing high expression [3, 15, 16].

In dHMN2, two missense mutations with autosomal dominant (AD) inheri- tance have been identified in HSPB8, both affecting the same amino acid [3]. Ten mutations have been described in HSPB1 in dHMN and CMT2F families, nine of which show AD inheritance and one with homozygous autosomal recessive (AR) inheritance [2, 17, 18]. Most recently a mutation in HSPB3 was identified in a dHMN family [4]. The majority of the HSPB1 mutations and both HSPB8 mutations are located in the conserved α-crystallin domain (HSPB1; L99M, R127W, S135F, R136W, R140G and T151I. HSPB8; K141N and K141E), with two mutations in the C-terminal of HSPB1 affecting the same amino acid (P182L and P182S) and three mutations in the N-terminal region (HSPB1; P39L and G84R. HSPB3; R7S). The G84R mutation is adjacent to the α-crystallin domain and lies in a conserved phosphorylation recognition motif (Figure 1.1 A).

Under normal cellular conditions, small HSPs freely interact to form large oligomeric complexes containing several small HSP family members, leaving only low levels of un-phosphorylated monomer [23, 24]. Under conditions of stress, small HSPs are phosphorylated, triggering the large oligomers to dissociate into smaller oligomers, dimers, and monomers [25, 26]. These phos- phorylated active act as molecular chaperones to help stabilise cellular proteins, preventing protein aggregation and facilitating substrate refolding in conjunction with other molecular chaperones [27]. In addition to their role Figure 1.1: Eleven causative genes described to-date for dHMN. Protein domains are drawn to scale. Mutations are numbered relative to the start ATG (codon 1) and the location is shown as vertical lines. The scale for each gene is indicated. (A) Small HSP family members HSPB1, HSPB8 and HSPB3. Most mutations lie within the α-crystallin domain (blue). (B) DCTN1, the G59S mutation lies within the CAP- gly domain (microtubule-binding) (orange). (C) IGHMBP2 and SETX are the two UPF1-like helicases identified in dHMN. Most missense mutations in IGHMBP2 are located in the Superfamily I DNA and RNA helicase domain (green), whereas null mutants (not shown) are distributed across the gene. Of the three reported SETX mutations, one lies in the Superfamily I DNA and RNA helicase domain and the remaining two lie in a putative N-terminal protein-protein interaction domain (aqua) [19]. (D) GARS mutations E71G, L129P, G240R, I280F, H418R D500N and G526R affect the catalytic domain (red). S581L and G598L lie within the tRNA binding domain (blue). E71G also interacts with the tRNA binding domain [20]. (E) BSCL2 mutations N88S and S90L both affect an N-glycosylation motif. (F) ATP7A mutations are located within or adjacent to transmembrane domains (purple). (G) The PLEKHG5 mutation is located within the pleckstrin-homology domain (yellow). (H) All TRPV4 mutations are located within the ankyrn repeat domain (brown) as part of ANK4 and ANK5 [21]. (I) DYNC1H1 mutations are localised to the dynein binding residues (brown). Figure generated with FancyGene [22]. 1 literature review 4

HSPB1 phosphorylation α-crystallin domain

5'UTR 3'UTR

P39L G84R L99M R127W T151L P182L S135F P182S R136W R140G HSPB8

A 5'UTR 3'UTR

K141N K141E HSPB3

5'UTR

100bp R7S DCTN1 microtubule-binding domain dynein-binding domain -related protein-binding domain

B 5'UTR 3'UTR

G59S 500bp

IGHMBP2 Superfamily I DNA and RNA helicase domain 3'UTR motifs I Ia II III IV V VI R3H Znf-AN1

L17P G86_A355del L192P C241R E334K E382K L472P D565N R637C D974E Q196A L125P G354S L426P E514K R603C 500bp H213R L361P H445P T493I R603H C P216L L364P K572del L577P V580I R581S R583I G586C T221A W386R

SETX Superfamily I DNA and RNA helicase domain Putative protein interaction domain motifs I Ia II III IV V VI NLS

3'UTR

T3I L389S R2136H 1000bp

GARS WHEP-TRS Catalytic Catalytic tRNA binding domain D 5'UTR 3'UTR

200bp E71G L129P G240R I280F H418R D500N S581L G526R G598A

BSCL2 transmembrane N-glycosylation motif transmembrane

E 5'UTR 3'UTR

100bp N88S S90L

ATP7A Copper binding sites Cu#1 Cu#3Cu#2 Cu#4 Cu#5 Cu#6 Phosphatase Phosphorylation ATP binding

F 3'UTR 500bp T994I P1386S

PLEKHG5 RhoGEF PH domain coiled coil G 5'UTR 200bp F647S

TRPV4 ankyrn repeat domain ANK4 ANK5 transmembrane domains pore H 3'UTR

200bp R269H R315W R269C R316C DYNC1H1 stem domain motor domain dimerisation residues ATPase ATPaseATPase ATPase ATPaseATPase ATPase I

1000bp H306R I584L Y970C K671E 1 literature review 5 as chaperones, small HSPs are also involved in a diverse range of cellular processes including, suppression of apoptosis, regulation of dy- namics, protection from oxidative stress, modulation of inflammation and RNA processing [28–33].

With the diverse range of cellular activities of small HSPs, the definitive mechanism through which mutations in HSPB1, HSPB3 and HSPB8 cause dHMN remains unclear. The variability of clinical phenotypes caused by these mutations further confuses their role. However, recent observations of mutant small HSPs in disease models and characterisation of the function of these proteins are providing insight into the processes involved in these neurodegenerative disorders.

Protein aggregates are seen in many neurodegenerative diseases but it is unclear whether these are a cause or consequence of the disease [34]. In neuronal cell culture, expression of HSPB1 and HSPB8 carrying dHMN2 mutations leads to formation of insoluble aggregates in the cell body [2, 3, 35]. Expression of mutant HSPB8 leads to formation of aggregates containing both HSPB1 and HSPB8 with increased interaction between the two proteins [3]. In addition to aggregation in the cell body, the HSPB1 P182L mutation prevented the correct transportation of both mutant and wild type (Wt) HSPB1 down neuronal processes [35]. Furthermore, aggregates were associated with disrupted neurofilament assembly. These aggregates contained p150 dynactin and neurofilament middle chain subunit (NFM) but not other cellular cargoes actively transported in axons [2, 35]. Disrupted assembly and aggregation of NF is also seen in CMT2E caused by mutations in the neurofilament- light subunit (NFL) [36, 37]. In cultured primary mouse motor neurons, Zhai et al [38] showed that both NFL and HSPB1 mutations caused progressive degeneration and loss of motor neurons. Co-expression of mutant NFL with Wt HSPB1 reduced the NF disruption and aggregation caused by mutant NFL. Furthermore, in NFL-null motor neurons expressing mutant HSPB1, 1 literature review 6 no aggregation of NFL or NFM is observed, indicating that HSPB1 interacts specifically with NFL [38].

In contrast to the cell culture models of the HSPB1, HSPB8 and NFL mutants, analysis of mutant HSPB8 in primary motor neuron cultures did not result in the formation of aggregates, but still resulted in a disruption of axonal transport. The cultured primary motor-neurons had shortened neurites with spheroids and end bulbs, resembling Wallerian degeneration, likely a result of axon retraction rather than lack of outgrowth [39]. Spheroids and endbulbs are filled with organelles and disorganised cytoskeleton components [40]. The increase in apoptotic activity seen in injured neurons [41] was not observed in the HSPB1 mutants, therefore, it is unlikely that the anti-apoptotic role of HSPB1 influences the initial pathogenesis of dHMN [39].

HSPB1 also interacts with actin microfilaments both as a chaperone un- der stress conditions and in the regulation of actin filament dynamics. As a chaperone, phosphorylated HSPB1 monomers coat actin filaments, prevent- ing the action of actin severing proteins activated by the stress response and aiding the recovery of actin filaments after stress [27, 42, 43]. Under normal conditions HSPB1 functions in a regulatory role, with un-phosphorylated HSPB1 monomers preventing the polymerisation of actin, promoting a pool of soluble actin subunits [44]. In motor neurons, actin microfilaments are the main components of the plasma membrane cytoskeleton and are enriched in presynaptic terminals, dendritic spines and growth cones [45]. The interaction of mutant HSPB1 with actin microfilaments in motor neuron disease remains to be investigated.

These studies suggest that disruption of the neurofilament network is a com- mon triggering event for motor neuron degeneration in dHMN2 and CMT2E/F, perhaps by disruption of axonal transport rather than direct neurotoxic effects of protein aggregates. 1 literature review 7

As small HSPs interact with the cytoskeleton as chaperones under stress conditions and a loss of chaperone activity is associated with other chaper- onopathies [46], the influence of the mutations on the chaperone activity of HSPB1 and HSPB8 have been investigated. In vivo analysis of HSPB1 mutations in the α-crystallin and C-terminal domains showed there was no reduction in the chaperone activity of HSPB1, some mutations even resulted in an increase in chaperone activity. This was associated with increased levels of dimer to monomer conversion of mutant proteins, with normal levels of HSPB1 dimer maintained by Wt protein. All HSPB1 mutants examined showed enhanced binding to target proteins, overall suggesting increased protein interaction activity is involved in dHMN and CMT2 [26]. This supports previous analysis showing decreased HSPB1 self-association correlates with increased chaperone function [27]. Conversely, the chaperone activity of K141E HSPB8 mutant was decreased, with a significant reduction seen at lower HSPB8 concentrations. Interestingly, with specific substrates, mutant HSPB8 was active earlier than Wt, but with a lower overall activity [47]. The K141N/E HSPB8 mutations affect the equivalent residue mutated in HSPB4 in AD congenital cataract [3, 48]. Mutation of this residue resulted in a similar reduction of chaperone activity [49]. Unfortunately, comparisons are difficult because the analysis of chaperone activity for HSPB1 and HSPB8 used different assays. The concentra- tion of protein, the time point examined and the substrate selected all influence the measured chaperone activity.

Recently, both HSPB1 and HSPB8 have been shown to interact in RNA processing pathways. As RNA processing defects are postulated to be a major factor in neurodegeneration [50, 51] these functions of HSPB1 and HSPB8 should be considered as a possible part of dHMN pathogenesis. Both disease causing HSPB8 mutants lead to abnormally increased binding of HSPB8 to the RNA helicase Ddx20 (gemin-3), which interacts with the survival-of-motor- neurons protein (SMN) [52]. Mutations in the SMN1 gene encoding SMN 1 literature review 8 protein cause SMA type 1 (SMA1; MIM#253300) [53]. With reduced SMN mRNA and protein in patients with SMA1, it is postulated that insufficient levels of SMN in motor neurons results in SMA1 [54, 55]. SMN is part of a multi-protein complex (including gemin-3), involved in the assembly and regeneration of small nuclear ribonucleoproteins (snRNPs) (RNA-protein com- plexes that combine with unmodified pre-mRNA and various other proteins to form a spliceosome) [56, 57] and microtubule associated transport of RNA [58]. The role of mutant SMN in SMA is not well understood, however in SMA mice, SMN deficiency results in reduced axon growth in motor-neurons, correlating with reduced β-actin mRNA and protein in distal axons and growth cones [59], as well as a reduction of snRNP assembly and expression of a subset of Gemin proteins in spinal cord [60]. The HSPB8/Ddx20 interaction involves RNA suggesting a role for HSPB8 in ribonucleoprotein processing in conjunc- tion with SMN [52]. Recently, HSPB1 was identified as a critical subunit of the AUF1- and signal transduction- regulated complex (ASTRC) and is an AU-rich element (ARE) binding protein [33]. AU-rich element mRNA degradation is the process whereby mRNAs with AREs in their 3’ UTR are specifically targeted for degradation [61, 62]. The association of HSPB1 with ASTRC may provide a sensing mechanism for ARE mRNA degradation involved in the regulation of transiently expressed proteins, including those involved in inflammatory responses, cell proliferation and intracellular signalling [33].

In addition to missense mutations, a mutation in the highly conserved heat shock element promoter region of HSPB1 was identified in a sporadic ALS patient with atypically long disease duration. This mutation lead to in vitro reduction of expression of HSPB1 under normal conditions and elimination of the heat shock response in neuronal cells, suggesting an increase in cellu- lar damage and toxicity resulting from a lack of heat-shock response, may contribute to the pathogenesis of motor neuron degeneration [63]. 1 literature review 9

In summary, it is likely that dominant toxic interactions involving mutant HSPB1 and HSPB8 cause dHMN through disruption of axonal transport, a pathogenic mechanism to which motor neurons are particularly vulnerable.

1.1.2 Dynactin 1 (DCTN1)

A missense mutation in the DCTN1 gene encoding the p150Glued subunit of dynactin (p150Glued) was described in a dHMN7B (MIM#607641) family [64]. Distal HMN7B, also known as distal spinal and bulbar muscular atrophy is an AD lower motor neuron disease presenting in the second or third decade of life, with vocal fold paralysis leading to breathing difficulties, progressive facial weakness, weakness and muscle atrophy of the hands and later involvement of the lower extremities [64, 65]. Rare DCTN1 mutations have also been identified in ALS and ALS with FTD [65–67]

Dynactin is a microtubule motor protein involved in mitosis and specialised subcellular movement [68]. Dynactin is a large protein complex necessary for dynein and kinesinII based microtubule movement. The p150Glued subunit is particularly important for its role in binding dynactin to microtubules through its microtubule binding domain. Dynein and other molecular motors bind to dynactin on the arm of p150Glued adjacent to the microtubule binding domain, with the remaining rod like structure involved in binding a variety of membrane and protein structures [68]. The p150Glued subunit is ubiquitously expressed and is highly enriched in neurons of the central nervous system [69].

The G59S mutation reported in dHMN7B is located in the conserved CAP- gly domain of p150Glued [64](Figure 1.1 B). Mutations in the CAP-gly domain of p150Glued were also identified in Perry syndrome (MIM#168605), a rapidly progressive AD neurological disorder involving parkinsonism, weight loss, 1 literature review 10 depression with suicidal attempts and later nocturnal hypoventilation leading to respiratory insufficiency and death within 2-10 years [70]. With no motor neuron involvement, Perry syndrome and dHMN7B are distinct disorders. Unlike the mutations in dHMN7B and Perry syndrome, the ALS DCTN1 mutations are outside the CAP-gly domain.

The G59S p150Glued dHMN mutant shows decreased binding to microtubules and does not disrupt the binding of Wt p150Glued [64, 71]. Dynactin was disrupted in a late onset progressive motor neuron disease phenotype mouse model through over expression of the dynamitin (p50) subunit of dynactin, causing inhibition of retrograde axonal transport [72]. This was reflected in the immunohistochemical staining of a patient with dHMN7B (74yo) after autopsy which showed a change in distribution of dynactin, dyenin and related proteins from diffuse fine granular staining throughout the cell body and axons to more intense, coarser and irregularly shaped granules and larger accumulations within the cell body, proximal axon and distal neurites. Normal neurofilament staining in motor neuron cell bodies and axons indicated that the defect was with dynactin [73]. The G59S p150Glued dHMN mutant leads to formation of inclusion bodies, and it has been suggested that a loss of stability promotes aggregation [71].

The G59S mutation is located in a fold of the CAP-gly domain and is predicted to distort the folding of the microtubule domain [64]. Recent experi- ments with yeast mutants analogous to the G59S human mutation have helped understand the mechanism underlying the dysfunction of dynactin-mediated transport in motor neuron disease. The CAP-gly domain has a critical role in the initiation and persistence of dynein-dependant movement, but is not essential for dyenin based movement. It was speculated that the GAP-Gly activity is necessary for overcoming high force/load thresholds of movement [74]. 1 literature review 11

Analysis of the p150Glued mutants in Perry’s syndrome indicate the mutated CAP-gly domain is folded correctly but is less stable than Wt p150Glued. The domains containing the Perry syndrome mutations bind to microtubules but fail to bind to EB1, a microtubule plus-end tracking protein, unlike a G59A mutant which retained EB1 binding ability. The G59A mutant was analysed instead of the G59S mutant, which was not correctly expressed in this model [75]. Levy et al. [71] showed that EB1 binding was reduced with the G59S mutation but the stability of the dynactin complex was not affected. Ahmed et al. [75] inferred that the steric clashes stemming from the G59S mutation prevent correct protein folding under physiological conditions and that p150Glued is more unstable in dHMN7B than in Perry’s syndrome.

A reduction in the stability of CAP-gly domain of p150Glued supports the model of motor neuron specific degradation proposed by Levy et al [71] that mutant DCTN1 causes decreased binding of p150Glued to microtubules and EB1 resulting in disrupted axonal transport. An aberrant self-association leading to aggregates, specifically in neurons, leads to further impairment of axonal transport.

1.1.3 Immunoglobulin mu binding protein 2 gene (IGHMBP2)

Mutations in the IGHMBP2 gene on chromosome 11q13 result in dHMN6, also known as distal spinal muscular atrophy 1 (DSMA1; MIM#604320) [76]. To date, over 50 novel AR mutations have been described, with both homozygous and compound heterozygous inheritance patterns, including; missense, nonsense, frameshift, in-frame deletions and intronic splice mutations [76–86]. In addition, a large deletion and a complex mRNA rearrangement of IGHMBP2 have been identified together with missense mutations in two DSMA1 cases [78, 79]. In DSMA1 there is a progressive degeneration of spinal motor and sensory 1 literature review 12 neurons, with axonal degeneration, abnormal myelin formation, and motor end-plate degeneration leading to muscle paralysis and wasting [87]. Patients present with rapidly progressive, life threatening respiratory distress caused by paralysis of the diaphragm, and distal muscle weakness and wasting with foot and hand deformities [77, 88]. Sensory nerves are also affected in later stages of the disease. Presentation within the first 6 months is strongly indicative of IGHMBP2 mutations, with only few cases with later juvenile onset of respiratory distress identified with IGHMBP2 mutations [84, 85].

The IGHMBP2 gene encodes Immunoglobulin mu binding protein 2 (IGHMBP2) a ubiquitously expressed protein, predominantly present in the cytoplasm, axons and growth cones with a smaller nuclear localised fraction in spinal motor neurons [89, 90]. IGHMBP2 is a UPF1-like helicase, a member of the superfamily I (SF1) DNA/RNA helicases [91], containing the seven helicase motifs distinctive of SF1 helicases [92], including the conserved Walker A and B ATPases [93](Figure 1.1 C). IGHMBP2 also contains a R3H motif [94] and zinc finger AN1-like domain [95]. Mutations in a second UPF1-like helicase, senataxin, cause the dHMN, ALS4 [96]. Recent insight into the cellular func- tion of IGHMBP2 has indicated it is a ribosome associated protein with 5’-3’ helicase activity [97]. The catalytic activity of IGHMBP2 requires both helicase binding and ATP hydrolysis [97]. More specifically, it is involved in transla- tion of mRNA[98]. IGHMBP2 is co-localised with RNA processing machinery [99] and interacts with related helicases, Reptin, Pontin and Abt1 as well as TFIIIC220 a RNA polymerase III transcription factor involved in tRNA gene transcription, ribosomal maturation and pre-ribosomal RNA processing and maturation [98].

The missense mutations in the helicase domain affect the catalytic activity of IGHMBP2 either directly through disruption of the helicase binding activity or indirectly by loss of the ATPase activity powering the helicase [97]. Rarer missense mutations outside the helicase domain do not affect the helicase 1 literature review 13 binding or ATP hydrolysis activities of IGHMBP2, but rather result in reduced protein levels (50% reduction by the T493I allele), by a disruption of protein stability or reduced transcription of the normal mRNA levels [85]. Similarly, in a splicing mutant where exon 8 is skipped there is partial expression resulting in 24.4 ± 6.9% of Wt IGHMBP2 mRNA levels [84]. In this case, reduction of the exon 8 skipped mRNA was probably the result of nonsense mediated mRNA decay. A second nonsense mutation (C496X) leads to a complete loss of mutant mRNA rather than truncated protein [80].

Some evidence suggests a possible correlation between IGHMBP2 protein levels and age of onset of respiratory symptoms, with infantile onset having a lower IGHMBP2 protein level than patients with juvenile onset [85]. Impor- tantly, 50-75% Wt protein levels were shown in carriers, therefore the critical level of IGHMBP2 for not developing symptoms lies in the range of 25-50% [85]. This is supported by studies of the nmd mouse, a model of SMA resulting from a homozygous splicing mutation in IGHMBP2 that express ∼20% of normal IGHMBP2 mRNA levels in all tissues. The Nmd mouse model is not a 100% null mutant but rather a hypomorphic allele [100].

The recessive inheritance of DSMA1 indicates that there is a loss of function of IGHMBP2 involved in the disease. This is either through missense mutations in the helicase domain affecting the catalytic activity of IGHMBP2 or through reduced protein levels. Protein levels are not the only factor influencing the disease outcome in the nmd mouse model. A protective allele has been mapped to the Mnm locus in the nmd mouse which rescues the nmd phenotype [100]. The modifying factor does not restore IGHMBP2 protein levels, rather it possibly compensates for lost IGHMBP2 function. Interestingly, the Mnm locus fails to rescue the dilated cardiomyopathy phenotype seen later in nmd mice after the neurodegeneration phenotype is rescued [101]. In addition, dilated cardiomyopathy is also seen in nmd mice after neuronal restoration of 1 literature review 14

IGHMBP2 levels and rescue of nmd phenotype, suggesting a motor neuron specific activity of the Mnm locus [101].

In summary, DSMA1 involves the loss of function of IGHMBP2 helicase activity through disruption of protein activity by missense mutations or overall reduced protein levels. The level of active protein is likely to correlate with the rate of disease progression and age of onset. As IGHMBP2 protein levels are equivalently reduced in neurons and other cell types in DSMA1, motor neurons are apparently more sensitive to the reduction of IGHMBP2 than other cell types, possibly due to their large size, high metabolic demands and precise connectivity requirements [85].

1.1.4 Senataxin (SETX)

Mutations in the SETX gene encoding senataxin cause the rare AD distal hereditary motor neuropathy with pyramidal features known as juvenile amyotrophic lateral sclerosis type 4 (ALS4; MIM#602433). ALS4 is a juvenile or adolescent onset, slowly progressive disease that involves limb weakness and muscle wasting with pyramidal signs as a result of degeneration of motor neurons in the brain and spinal cord. Sensory involvement is not seen and there is sparing of bulbar and respiratory muscles [102]. Three ALS4 families were reported with senataxin mutations (T3I, L389S and R2136H) [96] with a fourth ALS4 family later reclassified as dHMN1, with linkage to chromosome 7q34-q36 [103]. A large number of SETX mutations have also been reported in several other disorders including the AR neurological disorder, ataxia with ocular apraxia type 2 (AOA2) [104–117], a rare AR ataxia with peripheral neuropathy [114] and an AD cerebellar ataxia/tremor syndrome [118]. While the involvement of peripheral neuropathy suggests some clinical overlaps between ALS4 and AOA2, they are clinically distinct [96, 114]. 1 literature review 15

Senataxin is ubiquitously expressed, present primarily in the nucleus but also present in the cytoplasm [19, 109]. Senataxin has a conserved SF1 RNA/DNA helicase domain in the C-terminus with strong homology to the helicase domains of UPF1 and IGHMBP2 [96, 104]. As such, senataxin is the second UPF1-like helicase identified in a dHMN. The helicase domain of senataxin is similarly conserved with the yeast orthologue Sen1p [96, 119]. Senataxin also contains a putative N-terminal protein-protein interaction domain and a C-terminal nuclear localisation signal [19, 104]. The SETX missense mutations cluster in the helicase and protein-protein interaction domains suggesting a role for these domains in disease (Figure 1.1C)[ 19]. Although the direct mechanisms through which mutations in SETX cause ALS4 are unknown, recent studies have shown senataxin is involved in transcriptional regulation of genes including RNA polymerase II termination factor and in pre-mRNA processing affecting splicing efficiency and splice site selection [120].

The function of senataxin is similar to Sen1p in yeast, where it is a RNA helicase acting on a wide range of RNA classes, including tRNAs, rRNAs and small nuclear and nucleolar RNA [121], tRNA splicing [122], small nuclear RNA synthesis [123], transcription and transcription-coupled DNA repair [124] and as a RNA polymerase II transcription regulator [125]. An amino acid substitution in the helicase domain of Sen1p altered the genome wide distribution of RNA-polymerase II in both non-coding and protein coding genes [125]. With similar mutations, Steinmetz [125] suggested a role for altered gene transcription in ALS4 and AOA2.

In AOA2, senataxin mutations result in altered response to oxidative stress [109, 117]. Both the knockdown of SETX expression and an AOA2 SETX null mutant, showed a decrease in expression of genes involved in oxidative stress, including SOD1, IMPDH2, CYC and RPL36 [120]. Mutant senataxin also showed a similar toxic effect, however, knockdown of mutant senataxin 1 literature review 16 ameliorated the toxic effect of the mutant form. This resulted in normal levels of oxidative stress response [117].

Mutant senataxin showed normal cellular distribution with no apparent aggregation in cell culture and a normal molecular weight seen in Western immunoblotting. It was therefore suggested that mislocalisation or generation of abnormal protein isomers and aggregates do not play a role in ALS4 [19]. While it appears there is a possible role for decreased oxidative stress response in AOA2, the mechanism of mutant senataxin in ALS4 has not been determined. An error in transcriptional regulation of coding genes or an error in pre-mRNA processing, affecting splice site selection or splicing efficiency are still plausible possibilities in ALS4.

1.1.5 Glycyl-tRNA synthase (GARS)

Mutations of the GARS gene encoding glycyl-tRNA synthetase (GlyRS) were initially reported in dHMN5 (MIM#600794) and CMT2D (MIM#601472) [126]. Distal HMN5, also known as distal spinal muscular atrophy type 5, is an AD inherited peripheral neuropathy, characterized by adolescent onset of weakness and atrophy of muscles in hands and feet, predominantly of the upper limbs, with slow progression to involve the lower limbs in most cases [127]. Distal HMN5 and CMT2D are allelic variants, with sensory involvement distinguishing CMT2D from dHMN5 [126–128]. Atypical dHMN5 phenotypes have also been reported with GARS mutations, a variable phenotype showing lower or upper limb onset (G562R) [129], predominant lower limb involvement (P244L) [130, 131] and early childhood onset with predominantly lower limb involvement (G598A) [132]. Three GARS mutations were reported in distinct dHMN5 and CMT2D families (dHMN5; L129P and G526R, CMT2D; G240R) and one family with both CMT2D and dHMN5 individuals (E71G) [126]. 1 literature review 17

Further GARS mutations have been described in dHMN5 (I280F and A57V) and CMT2 (S581L, H418R, and P244L) (Figure 1.1D)[127, 131–133].

Two mouse models of CMT2D have been described with mutations in the mouse Gars gene, the orthologue of human GARS [134, 135]. These two Gars mutations cause neuropathy through a dominant, gain-of-function mechanism [135].

The aminoacyl-tRNA synthetases (ARS) (including GARS) are a family of enzymes that catalyse the covalent bonding (charging) of an amino acid to its corresponding transfer RNA (tRNA). The charged tRNA can then deliver the amino acid to a growing polypeptide on a ribosome [136]. ARS are ubiquitously expressed and highly conserved across organisms, reflecting their fundamental role in protein synthesis. GlyRS is a homodimer, essential for the charging of glycine to tRNAgly [137], in both the cytoplasm and mitochondria.

Antonellis and Green [20] proposed four mechanisms whereby mutant ARS could cause neurodegeneration. (1) Impaired function of ARS enzymes ei- ther through loss of tRNA charging function or altered cellular localization (compartment specific loss of function) or (2) a toxic gain of function caused by aggregation of mutant GARS or incorrectly synthesized proteins or (3) disruption of a neuronal-specific secondary function of these enzymes or (4) mitochondrial specific disruption of ARS function through loss of ARS trans- port into mitochondria. While the pathologic mechanisms involving mutant GlyRS in dHMN5 remain unclear, Motley et al. [138] suggested that a loss of enzyme function through altered dimerisation or impaired axonal transport of GARS with local protein synthesis defects, were likely pathogenic mechanisms of GARS in CMT2D and dHMN5. Three out of four ARS genes mutated in CMT (lysyl-, alanyl- and tyrosyl-tRNA) involve loss of function through re- duction in tRNA aminoacylation [126, 139, 140] or altered axonal distribution [141]. A dominant-negative loss of function of GlyRS would agree with the loss of function observed by the other mutant ARS in CMT. However, other 1 literature review 18 possible pathogenic mechanisms include an altered non-canonical function of GARS, toxic protein interactions or mitochondrial toxicity [138].

1.1.6 Berardinelli-Seip congenital lipodystrophy 2 (SEIPIN) gene (BSCL2)

In addition to GARS mutations, dHMN5 can also be caused by mutations in the Berardinelli-Seip congenital lipodystrophy 2 (seipin) gene (BSCL2). These mutations are also found in the allelic disorder spastic paraplegia 17 (SPG17; MIM#270685) also known as Silver syndrome [142]. Two BSCL2 mutations have been identified in dHMN5 and SPG17, N88S and S90L (Figure 1.1 E). These affect a conserved N-glycosylation motif [142]. This glycosylation site is a mutational hot-spot, with these mutations identified independently in several families and as de-novo mutations in sporadic cases [6].

SPG17 is a rare AD neurodegenerative disorder, characterised by weakness and wasting of the small muscles in the hand and marked spasticity in the lower limbs [143]. BSCL2 mutations result in heterogeneous phenotypes, with clinical variability often seen within individual families. While the clinical phe- notypes are predominantly dHMN5 and SPG17, atypical phenotypes include, subclinical neurological damage, variant SPG17 and dHMN5 with lower limb or both lower and upper limb involvement and CMT2 [143–146].

Unlike the missense BSCL2 mutations in dHMN5 and SPG17, null mutations cause congenital generalized lipodystrophy type 2 (CGL2; MIM #269700), a rare AR disease of lipid metabolism [147]. CGL2 is characterised by a near absence of adipose tissue from birth or infancy, severe insulin resistance and mental retardation. Lipodystrophy is not evident in dHMN5 and SPG17, suggesting that BSCL2 mutants in dHMN5 and SPG17 confer a dominant toxic gain of function [144]. 1 literature review 19

Three alternative transcripts of BSCL2 have been identified. Two are ubiq- uitously expressed and one is specifically expressed in motor neurons in the spinal cord, cortical neurons in the cerebral cortex, and in the testis and pi- tuitary gland [142, 148]. The 288 aa central region of BSCL2 is functionally conserved across species [149]. This region of BSCL2 contains two transmem- brane domains, with the N and C-terminals in the cytoplasm. The conserved N-glycosylation motif mutated in dHMN and SPG17 is located between these two transmembrane domains [150].

Ito and Suzuki [151] demonstrated that mutations in seipin disrupt N- glycosylation leading to incorrect protein folding in the endoplasmic reticulum (ER). They proposed a mechanism for neurodegeneration, whereby some misfolded seipin is eliminated by the ubiquitin-proteosome system, however this is insufficient to prevent mutant seipin from accumulating in the ER, leading to cell death through ER stress [152]. Similar accumulation of mutant BSCL2 was seen in cell cultures expressing the N88S and S90L mutations, with aggreosome like structures containing mutant BSCL2 along with altered cellular localisation [142]. Further evidence for a toxic gain of function comes from functional studies in yeast. BSCL2 proteins containing the N88S or S90L mutations were able to rescue a lipid droplet morphology in the same way as Wt BSCL2 and truncated BSCL2 containing only the conserved 288 amino acid region. This contrasts with BSCL2 containing a CGL2 null-mutant, which resulted in an altered lipid droplet morphology, indicating the N88S and S90L mutants still possess some normal BSCL2 function [149].

In a dHMN family carrying a N88S BSCL2 mutation a second disease locus was mapped to chromosome 16p [146]. This family had a variable dHMN phenotype, including either lower or upper or both lower and upper limb involvement, while some family members also showed pyramidal features. All twelve patients with dHMN carried the N88S BSCL2 mutation and the 16p locus. One individual carried only the 16p locus and showed sub-clinical 1 literature review 20 neurological damage. This digenic inheritance of the 16p locus may have an additive pathologic effect or be sufficient for an attenuated form of dHMN [146].

While the number of studies of mutant BSCL2 is small, and no post-mortem analysis has been reported, the proposed toxic gain of function leading to neurodegeneration through ER stress remains a likely central mechanism in dHMN5 and SPG17.

1.1.7 ATPase, Cu2+-transporting, alpha polypeptide gene (ATP7A)

Mutations causing X-linked dHMN (DSMAX; MIM#300489) have been reported in the gene encoding the copper-transporting ATPase, ATP7A (Figure 1.1F) [153]. This X-linked dHMN syndrome is characterised by slowly progressive distal weakness, with minor sensory involvement. The mutations occurred in families of Brazilian [154] and North American [155] ancestry. An early age of onset was observed in the Brazilian family with the North American family presenting in the third or fourth decades. Similar motor-neuron disease like symptoms can also result from acquired copper deficiency [156]. Mutations in ATP7A have previously been described in the severe conditions, Menkes dis- ease (MD) [157–159] and occipital horn syndrome (OHS) [160]. However, these syndromes are phenotypically distinct from X-linked dHMN [153]. Analysis of the copper transport function and cellular localisation of ATP7A indicate the X-linked dHMN mutations are causing a protein trafficking defect, where ATP7A doesn’t move away from the trans Golgi network in response to ele- vated copper [153]. It is unclear how this defect in copper transport results in a motor-neuron specific phenotype. 1 literature review 21

1.1.8 Pleckstrin homology domain-containing protein, G5 gene (PLEKHG5)

A mutation in the pleckstrin homology domain containing protein, family G member 5 (PLEKHG5) gene was reported in large consanguineous pedi- gree with an AR lower motor neuron disease [161, 162]. Classified as distal spinal muscular atrophy 4 (DSMA4; MIM#611067) or dHMN4, this severe juvenile onset, lower motor neuropathy presents with distal muscle wasting of both upper and lower limbs, rapidly progressing to a general paralysis and impaired respiration, but sparing cranial nerves [161]. A milder phenotype was identified in one family member, with delayed onset, slower course and moderate generalised weakness.

PLEKHG5 is a member of the Db1 protein family, with characteristic cat- alytic Db1-homology domain and pleckstrin homology domain (Figure 1.1 G). PLEKHG5 is a low abundance protein predominantly expressed in periph- eral nerve, spinal cord and brain, with lower expression in heart and skeletal muscle. [162, 163]. PLEKHG5 is a direct activator of RhoA in vivo, and is a suppressor of neuronal differentiation [163]. PLEKHG5 is also an activator of the NFκB signalling pathway [162, 164]. In neurons, NFκB activation stimu- lates pro-survival genes, including anti-apoptotic proteins, antioxidants and neurotrophic factors [165].

The reported mutation in PLEKHG5 leads to protein instability and a fail- ure in activation of the NFκB pathway. Furthermore, aggregation of mutant PLEKHG5 was seen when over expressed in a motor-neuron like cell line [162]. The demonstrated instability of mutant PLEKHG5, and its inability to activate NFκB signalling, support a homozygous loss of function mechanism. Maystadt et al. [162] proposed that PLECKHG5 may play a protective role in motor neurons and that a loss of NFκB stimulating activity leads to neuronal-specific 1 literature review 22 cell death. Further investigation of PLEKHG5 aggregation is needed to support a role in neurodegeration.

1.1.9 Transient receptor potential cation channel, V4 gene (TRPV4)

Two mutations in the vanilloid transient receptor potential cation-channel gene (TRPV4) were recently identified in congenital DSMA linked to chromosome 12q23-q24 (MIM#600175) [21]. The congenital DSMA TRPV4 mutations were concurrently reported with TRVP4 mutations in hereditary motor sensory neuropathy type 2C (HMSN2C; MIM#606071, or CMT2C) and scapuloperoneal spinal muscular atrophy (SPSMA; MIM#181405) [21, 166, 167], with overlap of all the TRVP4 mutations between the three conditions [168]. In all, four mutations were reported; two mutations affecting the same amino acid, R269H and R269C, and two mutations affecting adjacent amino acids, R315W and R316C (Figure 1.1 H). Both congenital DSMA families showed intra-familial clinical variation. The phenotype of the original congenital DSMA family ranged from mild distal lower limb involvement, through to a more severe phenotype involving pelvic girdle weakness and subsequent scoliosis [169, 170]. The second family reported further clinical variation including mild and severe congenital DSMA, SPSMA and HMSN2C [21]. Although congenital DSMA and SPSMA were originally considered to be distinct [170], they have since been confirmed to be allelic variants of mutant TRVP4 along with HMSN2C [21, 166–168].

TRVP4 forms a homo-tetramer that acts as a plasma membrane calcium ion channel, activated by heat, mechanical stress and extracellular hypotonicity [171]. The four unique mutations in TRVP4 are all located in the cytosolic N-terminal ankyrin repeats, a putative regulatory region involved in TRVP4 tetramerization or interaction with regulatory proteins [171]. 1 literature review 23

The function of mutant TRVP4 in peripheral nerves is unclear, with conflict- ing disease mechanisms reported [168]. Auer-Grumbach et al. [21] reported reduced cell surface expression of TRPV4 calcium channels and reduced cal- cium influx, whereas two other studies reported marked cellular toxicity, with increased calcium channel activity [166, 167]. These differences have been attributed to the use of different experimental techniques [168] and further research is required to clarify the functional effects of TRPV4 mutations in neurodegeneration.

1.1.10 DYNC1H1

During the preparation of this thesis, mutations in the dynein cytoplas- mic 1 heavy chain 1(DYNC1H1) gene were identified in three families with spinal muscular atrophy with lower extremity predominance (SMA-LED, MIM#158600)[172] and one with CMT2O(MIM#614228)[173]. Both the SMA- LED and CMT2O families with DYNC1H1 mutations showed a lower limb predominance of motor neuron disease. The affected patients from SMA-LED families, showed a pure motor neurone disease, manifesting in early childhood as delayed motor development and lower limb weakness affecting walking and running, without sensory involvement. The affected patients from the CMT2O family also showed delayed motor development and early-onset muscle weak- ness affecting walking. However, slow disease progression lead to distal lower limb weakness and wasting with pes cavus deformity and variable sensory involvement in some family members, including reduced vibration sense, loss of proprioception and fine touch and lower limb pains. Some of the affected patients from the CMT2O family had similar clinical descriptions to SMA-LED indicating a clinical overlap between the disorders. 1 literature review 24

The four mutations are located in the tail domain of DYNC1H1, involved in dimerisation of dyenin/dynactin complex motor units (Figure 1.1 I). Three mouse models with motor neuron defects and muscle wasting have been described with mutations in the mouse Dync1h1 gene [174, 175]. These three mutations are also located in the tail domain of Dync1h1.

The discovery of DYNC1H1 mutations adds to previous evidence from DCTN1 (Section 1.1.2) that disruption to retrograde transport in motor neurons is a critical disease mechanism in dHMN. Other genes involved in these processes are likely disease candidates in additional unsolved dHMN families.

1.1.11 Clinical variability in dHMN

The dHMN genes identified to date have a diverse range of functions, sug- gesting motor neuron degeneration is a multifactorial process or that there are numerous ways to disrupt these highly metabolic, terminally differentiated cells. While distal motor neurons are primarily affected, the involvement of other neuronal cell types, including upper motor neurons and sensory neu- rons is apparent in some dHMN. This variability is both intrafamilial, with individuals having different diagnoses within a single family as well as in- terfamilial, where separate families carrying the same mutation, or different mutations in the same gene, have differing diagnosis. In the case of sensory involvement in dHMN, individuals without sensory signs are diagnosed as dHMN, while individuals with clear sensory signs are diagnosed as CMT2. The clinical variability in dHMN makes accurate diagnosis a complex and difficult task. As the involvement of sensory and pyramidal tract signs are prominent in several forms of dHMN they should be considered in diagnostic and gene screening paradigms [6, 185]. This further complicates genetic studies which rely on accurate diagnosis of family members for linkage analysis in 1 literature review 25 [ 3 ] [ 2 ] [ 4 ] [ 64 ] [ 142 ] [ 162 ] [ 181 ] 179 , 180 ] References [ 21 , 169 , 170 , † OMIM classification of dHMN AD adult onset lower limband predominant wasting distal weakness diaphragm contractures tract signs function XL juvenile onset distal weakness and wasting [ 153 ] AR infantile onset with respiratory distressAD congenital, nonprogressive lower limb weakness and paralysis with [ 76 ] AR juvenile onset severe muscle weakness and wasting and paralysisAD of upper the limb predominant distal muscle weakness and wastingAD adult onset breathing difficulties due to vocal cord [ 126 ] paralysis, facial AR childhood onset muscle weakness and severely decreased respiratory AD juvenile onset distal weakness and wasting with pyramidal tract signsAD juvenile delayed motor [ 96 ] development and lower limb weakness [ 172 ] weakness and hand muscle atrophy Table 1.1: IGHMBP2 TRPV4 HSPB8 HSPB1 HSPB3 GARS BSCL2 DCTN1 ATP7A PLEKHG5 SETX DYNC1H1 11q13.2 12q23-q24 11q13.311q13.3 Unknown AR Unknown adult onset distal weakness and wastingXq13-q21 [ 176 , 177 ] 4q34.3-q35.2 Unknown AD adult onset lower limb distal weakness and wasting with pyramidal 11q12-q14 ‡ ‡ AD, autosomal dominant, AR, autosomalA recessive, family XL, mapping X-chromosome to linked. 11q13.3 has both dHMN3 and dHMN4 phenotypes [ 177 ], Irobi et al [ 184 ] suggested that they are the same disorder. † ‡ DiseasedHMN1 (MIM%182960)dHMN2A (MIM#158590)dHMN2B (MIM#608634) 7q34-q36 12q24.3 dHMN2C (MIM#613376)dHMN3 Unknown 7q11-q21 (MIM%607088) AD 5q11.2 dHMN4 juvenile (MIM%607088) onset lower limb predominant Locus distal weakness and wastingdHMN5 (MIM#600794) GenedHMN6 7p15 [ 103 ] dHMN7A (MIM%158580) Inheritance/phenotype dHMN7B (MIM#607641) 2q14 2p13 DSMA4 Unknown (MIM#611067) AD adult onset withCongenital vocal distal cord SMA paralysis 1p35 dHMN-ALS4 (MIM#602433) 9q34 dHMN with pyramidal features dHMN-J (MIM%605726)SMA-LED (MIM#158600) 9p21.1-p12 14q32.31 Unknown AR juvenile onset distal weakness and wasting with pyramidal tract signs [ 178 ] [ 182 , 183 ] (DSMA1; MIM#604320) (MIM#600175) X-Linked dHMN (DSMAX; MIM#300489) 1 literature review 26 such pedigrees. This adds to the limitations on linkage analysis caused by the genetic heterogeneity of dHMN which prevents independent pedigrees from being analysed together. In addition, the functional variability between known genetic causes of dHMN limits the application of pure functional cloning based gene identification (Section 4.1.2).

Sensory involvement is sometimes seen with mutations in dHMN genes, highlighting the overlap between dHMN and CMT2. Interfamilial variability is seen with unique HSPB1 mutations reported in unrelated dHMN2 and CMT2 families. The S135F HSPB1 mutation results in both dHMN2 and CMT2 in unrelated families [2, 18]. As such, the differences in sensory symptoms between these families cannot only be attributed to the different HSPB1 muta- tions having varying influence between sensory and motor neurons, and the disease mechanism must be the same between dHMN2/CMT2 with HSPB1 mutations. Similarly the K141N HSPB8 mutation causes both dHMN2 and CMT2L in unrelated families with mild to moderate sensory involvement the distinguishing feature between these conditions [3, 8]. Furthermore, GARS mutations cause both dHMN5 and CMT2 in unrelated families, as well as in a single pedigree carrying the E71G GARS mutation [126, 127, 131–133].

Additional interfamilial variability in dHMN further supports the overlap of dHMN and CMT2. Nicholson et al. [185] identified several dHMN families with variable sensory signs and individuals from CMTX families lacking sensory symptoms. The X-linked dHMN families with ATP7A mutations have minor sensory involvement in some family members [153]. The dHMN1 family linked to chromosome 7q34-q36 has two individuals with minor sensory signs [103] and the dHMN with pyramidal signs linked to chromosome 4q34.3-q35.2 has one individual with sensory signs [181]. Later sensory symptoms are seen in DSMA1 with IGHMBP2 mutations, confirmed by autopsy results which showed progressive motor and sensory neuron degeneration [87]. Generally no sensory involvement is seen in with BSCL2 mutations, however in the large 1 literature review 27 family reported by Auer-Grumbach et al. [143], 20% of affected members had a motor and sensory neuropathy consistent with CMT2.

Clinical variability in dHMN extends to upper motor neuron involvement in some dHMN families. Upper motor neuron involvement has been reported, again with intra- and interfamilial variability. Pyramidal signs are seen with SETX mutations [6, 102], and the rare HSPB1 P182L mutation [6]. Pyramidal signs have also been reported in dHMN1 mapped to chromosome 7q34-q36 [103], and dHMN with pyramidal features mapped to chromosome 4q34.3- q35.2 [181]. Significant variability is also seen with BSCL2 mutations. Generally, the N88S mutation causes a dHMN5 phenotype, with atrophy and weakness of hand muscles whereas the S90L mutation causes the more severe SPG17 phenotype, with spasticity in the lower limbs and additional upper motor neuron signs in addition to the muscle atrophy in the upper limbs [6]. The unique de novo G598A GARS mutation causes an infantile SMA contrasting to the usual dHMNV/CMT2 [132].

It is still unclear why variability of affected tissues and age of onset exists between closely related individuals, however this implicates additional mod- ifying factors, environmental or genetic, influencing the phenotype. Genetic modifiers are other polymorphisms or mutations which ameliorate or intensify the phenotype created by the primary disease mutation. Two genetic modifiers have been mapped in a dHMN5 family and the nmd mouse model of DSMA1. A digenic inheritance of the N88S BSCL2 mutation and a second locus mapping to chromosome 16p was reported in a dHMN5 family with variable presen- tation and additional pyramidal features in some family members [146]. The chromosome 16p locus was also identified in two individuals with sub-clinical motor neuron damage and is suggested to be a modifier locus, responsible for some of the clinical variability within the family. The LITAF1 gene associated with CMT1C is located in this region, however no mutation, copy number variation or altered expression was identified. A further 49 genes remain in 1 literature review 28 the 16p locus, including other good candidate genes [146]. As previously de- scribed, the Mnm locus in the nmd mouse model of DSMA1 with IGHMBP2 mutations, ameliorates the motor-neuron disease phenotype [100]. The Mnm locus contains several tRNA genes and the ribosomal associated factor Abt1, which associates with IGHMBP2; however, no polymorphisms were identified in these genes to account for the compensatory effect of the Mnm locus [98].

While the two mapped dHMN genetic modifiers have not yet been identified, the genetic modifiers influencing the severity of SMA caused by mutations in the SMN1 gene have been characterised. The phenotype of SMN related SMA is highly variable and ranges from the fatal infantile onset type 1 SMA (SMA1), through the chronic infantile onset SMA type 2 (SMA2), juvenile type 3 SMA (SMA3) and adult onset, type 4 SMA (SMA4). Mutations in SMN1 involve deletions or exon skipping resulting in a loss of SMN protein. Heterozygous SMN1 mutations cause the least severe phenotype SMA4, with loss of both copies of SMN1 causing the more severe earlier onset forms of SMA. There are several other factors which influence the severity of the disorder, the first is the second copy of the SMN gene (SMN2) located on a large duplication on . The SMN2 gene has a silent mutation which causes exon 7 skipping leading to an inactive protein. However SMN2 transcription still produces about 10% of normal SMN mRNA. SMN2 copy number influences the phenotype of SMA, with higher SMN2 copy number correlating with less severe diagnosis [186]. SMN1 mutations that maintain partial transcription levels similar to SMN2, lead to a less severe phenotype than a complete loss of SMN1 [187]. Secondly, high expression levels of the PLS3 gene, encoding Plastin3, have been shown to be protective against SMN1 mutations [188]. Plastin3 over expression rescued axon defects caused by SMN1 deletion. The PLS3 protective effect is gender-specific to females and shows less than 100% penetrance suggesting an unknown regulatory mechanism is responsible for the altered expression levels of PLS3[188]. A third modifying factor is the 1 literature review 29 neuronal apoptosis inhibitory protein (NAIP) [189]. Partial deletions of NAIP are a polymorphism found in 2% of non-SMA . When NAIP deletions are inherited with SMN1 mutations, the more severe SMA1 pheno- type results. Deletions in the NAIP gene are found in a high proportion of patients with SMA1, including 67% of European and 37% of Chinese SMA1 cohorts [189, 190]. NAIP is an inhibitor of cell death and regulator of neuronal outgrowth [191]. These functions are lost with the deletions of NAIP. Similar modifiers may play a role in the variability of dHMN.

1.1.12 Common disease mechanisms in dHMN

While the genes associated with dHMN encompass a diverse range of func- tions, some common mechanisms have emerged. Disrupted axonal transport has been implicated in dHMN with DCTN1, HSPB1 and HSPB8 mutations. As a core component of microtubule based transport, the disruption of the motor function of Dynactin 1 directly disrupts axonal transport. Disruption of components of microtubule based movement, including cargo adapters, micro- tubule tracks and motor proteins are seen in other neurodegenerative diseases including ALS, Huntington’s disease, Parkinson’s disease and Alzheimer’s disease [192]. The mechanism whereby HSPB1 and HSPB8 mutations disrupt axonal transport is less clear, although a disruption of the axonal cytoskeleton or protein aggregation are possible mechanisms leading to axon blockage. Active transport along axons is crucial for the survival of motor neurons. Transport is required for delivery of proteins, lipids and mitochondria to the distal synapse, long distance signalling and the clearing of damaged proteins. These requirements make neurons particularly susceptible to disruption of axonal transport. 1 literature review 30

RNA processing defects are also emerging as a significant factor in mo- tor neuron disease [193]. It is apparent that motor neurons are particularly sensitive to disruptions in RNA processing pathways, which include, RNA transcription, pre-mRNA splicing, editing, transportation, translation and degradation [51]. Several genes mutated in dHMN have roles in RNA pro- cessing including IGHMBP2, SETX and GARS which have been shown to be disrupted. While HSPB8 and HSPB1 have functions involved in RNA pro- cessing, whether these functions are involved in the pathogenesis of dHMN has not been investigated. Recently, mutations in TAR DNA binding protein (TDP-43)[194] and fused in sarcoma (FUS)[195] have been shown to cause ALS. Both FUS and TDP-43 have roles in RNA processing including regulation of splicing, transcription and transport. An understanding of the effects of mutations in these RNA processing proteins should provide insight into the disease mechanisms of dHMN and other motor neuron disease.

Protein aggregation and inclusion body formation are recognised as common molecular mechanisms leading to neurodegeneration [34]. Protein aggregation may result from a failure of the proteasome to clear misfolded proteins, which aggregate causing further toxic effects, including blocked access to axons, toxic interactions with additional proteins or triggering cell death pathways. In dHMN2 with HSPB1 and HSPB8 mutations, protein aggregates were seen in vitro, with aggregates containing other proteins including the cytoskele- ton protein NFL. This was thought to contribute to disruption of the axonal cytoskeleton and axonal transport. Mutant BSCL2 was shown to form aggreo- some like structures and accumulate in the ER, leading to ER stress and cell death [152].

It is unclear how mutations in ubiquitously expressed genes result in a motor neuron predominant phenotype. Motor neurons may be more susceptible due to their unique features such as their large size and long axons, and heavy reliance on active transport and high energy demands, which make them 1 literature review 31 most vulnerable to the effects of these mutations. On the other hand, some genes mutated in dHMN are specifically expressed or expressed at higher levels in the CNS than other tissues, thereby accounting for much of the motor neuron specific disease phenotype. Mutations in the PLEKHG5 gene, which is specifically expressed in motor neurons, results in a more severe dHMN phenotype than many of the other ubiquitously expressed dHMN genes. PLEKHG5 mutations are thought to lead to cell death through loss of NFkb signalling, resulting in a motor neuron specific phenotype due to its limited expression. Future genetic studies including those searching for additional disease genes can use the information from known disease genes to identify likely candidate genes in related genes and common pathways. As additional genes are described in dHMN, it is likely the common disease mechanisms and pathways will become clearer. These common disease pathways will provide targets to develop disease treatments or preventative strategies. The purpose of this thesis is to identify new gene mutations causing dHMN. The genetic and functional data presented here suggest this will be a difficult task; the genetic heterogeneity complicates genetic analysis and the multiple molecular mechanisms implicated to date make it difficult to pinpoint specific candidate genes. The identification of additional genes and genetic modifiers is necessary to increase our understanding of the disease mechanisms caus- ing dHMN and related neuropathies. This will directly aid in the diagnosis and classification of these neurodegenerative diseases and may lead to new therapeutics and treatment strategies. 2 GENERALMATERIALSANDMETHODS

This chapter details the general materials and methods used throughout this project. Protocols specific to individual experiments are detailed in the relevant chapters.

2.1 general materials and reagents

All solutions were prepared using Type 1, reagent-grade water, purified using a Milli-Q Academic (Millipore) water purification system. Solutions, plasticware and glassware were sterilised by autoclaving.

2.1.1 Reagents and Enzymes

The following reagents and enzymes were used: 10 mM dNTP Mix (Bioline), Blue Perpetual Taq DNA polymerase (Vivantis) and supplied buffers, Chromo AtTaq (Vivantis) and supplied Buffer A, ImmoMix Red (Bioline), 10× PCRx Enhancer Solution (Invitrogen), HRM Master Mix (Idaho Technology), LC Green Plus (Idaho Technology), agarose (Vivantis), 25× TAE buffer (Amresco), SYBR safe (Invitrogen), HyperLadder I (Bioline), HyperLadder IV (Bioline), 100% ethanol (Ajax Finechem).

32 2 general materials and methods 33

2.1.2 Equipment

Pipetting was carried out using the following pipettes: Nichipet 100-1000 µL, 20-200 µL, and 2-20 µL (Nichiryo); Pipetman P2 (Gilson); multichannel 20-200 µL (Socorex). Disposable tips used for pipetting were: Ultratip (Greiner Bio- One), Gilson yellow 200 and Gilson blue 1000; ART Aerosol Resistant Tips (Molecular BioProducts), ART 10 and ART 20p; and Aerosol Barrier Tips (Inter- path Services), 1000 µL, 200 µL, 20 µL and 10 µL. Thermal cycling was carried out using either the GeneAmp PCR System 9700 (Applied Biosystems), Master- cycler ep gradient S (Eppendorf), or Mastercycler pro S (Eppendorf). Agarose gel electrophoresis was carried out in horizontal submarine units (Cleaver Sci- entific and Amersham Bioscience) using a 3000Xi Electrophoresis power supply (Bio-Rad). DNA concentration was measured using a BioPhotometer 6131 (Ep- pendorf) with UVette cuvettes (Eppendorf) and a 10 mm optical path length. Centrifugation was carried out using: PMC-860 Capsulefuge (Tomy), 5417C centrifuge (Eppendorf), Biofuge pico (Heraeus), GS6R centrifuge (Beckman) or Megafuge 1.0 (Heraeus).

2.1.3 Plasticware

The following plasticware were used: 1.5 mL microcentrifuge tubes (Astral Scientific), 8-strip 200 µL PCR tubes (Scientific Specialities Inc.), 96-well un- skirted PCR plates (Bioplastics), 96-well black-shell/white-well skirted PCR plates (BioRad), 15 mL and 50 mL centrifuge tubes (Greiner Bio-One), and

Vacuette R 9 mL K3E K3EDTA (Greiner Bio-one) blood collection tubes. 2 general materials and methods 34

2.2 study participants

Individuals participating in this study gave informed consent in accordance to protocols approved by the Sydney South West Area Health Service Ethics Review Committee (Sydney Australia; Reference number: CH62/6/2006 - 050 - G.Nicholson).

2.3 dna methods

2.3.1 DNA extraction from blood

DNA samples were extracted by the Molecular Medicine Laboratory, Concord Repatriation General Hospital (Sydney, Australia). Genomic DNA was ex- tracted from peripheral blood using the PUREGENE DNA isolation kit (Gentra Systems) according to the manufacturers protocol. In brief, RBC lysis solution was added to 3 mL of whole blood to a total volume of 12 mL and incubated at room temperature for 10 min. White cells were pelleted by centrifugation for 10 min at 2000 × g and the supernatant discarded. Cells were lysed by adding cell lysis solution (3 mL) and mixing by vortexing. Protein was precipitated by addition of protein precipitation solution (1 mL) followed by vortexing and centrifugation for 10 min at 2000 × g. Supernatant was recovered into a 15 mL centrifuge tube. DNA was precipitated by addition of 100% isopropanol (3 mL), mixing by inversion, then pelleted by centrifugation for 3 min at 2000 × g. The DNA pellet was washed in 70% ethanol (3 mL) and centrifuged for 1 min at 2000 × g. The supernatant was discarded and the DNA pellet was air dried for 15 min. The DNA was re-suspended in 250 µL of DNA hydration solution and stored at 4°C. 2 general materials and methods 35

DNA samples older than the year 2000 were extracted using an alternate phenol chloroform based DNA extraction protocol as described by Sambrook et al. [196].

2.3.2 DNA oligonucleotides

DNA oligonucleotides for PCR primers were designed using either the Exon- Primer software through the UCSC genome browser [197], or LightScanner Primer Design software version 1.0.R.84 (Idaho Technology). Primer sequences are listed in Appendix B. Primers were synthesised by Invitrogen or Sigma.

2.3.3 PCR protocol

PCR reactions were carried out in a 10 µL reaction volume containing 10 ng template DNA, 1 U of Taq polymerase, 1 × reaction buffer, 100 µM each dNTP, 1.5 mM MgCl2 and 4 pmol each of forward and reverse primers. If required, PCRx Enhancer solution or LC Green Plus solutions were added at

1 × concentration during optimisation. MgCl2 concentration was increased for difficult to optimise amplicons using a gradient from 1.5 mM to 2.5 mM. PCR reaction thermal cycling involved: initial denaturation at 95°C for 10 min; 30 cycles of 95°C for 30 sec, optimal annealing temperature for 30 sec, and 72°C for 30 sec; and a final extension at 72°C for 5 min. The optimal annealing temperature was determined using the Mastercycler ep gradient S (Eppendorf) with a temperature gradient of 55 to 65°C. Thermal cyclers used are listed in Section 2.1.2. Alternatively, a touchdown PCR protocol was used to avoid optimisation of annealing temperature. The touchdown PCR protocol used: initial denaturation at 95°C for 10 min; 10 cycles of 95°C for 30 sec, 72°C for 30 sec with a 1.5°C decrease in annealing temperature per cycle and 72°C for 2 general materials and methods 36

30 sec; 25 cycles of 95°C for 30 sec, 55°C for 30 sec and 72°C for 30 sec; and a final extension at 72°C for 5 min.

2.3.4 Agarose gel electrophoresis

PCR amplicons were size fractionated using agarose gel electrophoresis. 1- 2% (w/v) agarose gels were prepared by disolving agarose into 1 × TAE buffer with heating using a microwave oven. Molten agarose was allowed to cool to 65°C before adding CYBR Safe (Invitrogen) DNA gel stain, then poured into horizontal gel trays and allowed to solidify. PCR samples were electrophoresed at 40 Vcm-1 for 40 min. DNA was visualised using a Safe Imager Transilluminator (Invitrogen) and imaged using a Canon PhotoShot S5-IS digital camera with Hoya O(G) filter. DNA size standards used included HyperLadder I (200 bp to 10 kb) or HyperLadder IV (100 bp to 1 kb).

2.3.5 DNA purification from PCR

Purification of PCR products was carried out using alternate protocols de- pending on the sequencing service used. The The Montage PCR96 Cleanup Kit (Millipore) was used for PCR amplicons sequenced at Garvan sequencing centre (Section 2.3.6.1) and ExoSAP PCR purification system was used for PCR amplicons sequenced using the Macrogen sequencing service (Section 2.3.6.2). The Montage PCR96 Clean-up Kit (Millipore) was used following the man- ufacturers semi-automated protocol using a vacuum manifold (Millipore), Pascal type 2005C vacuum pump (Alcatel) and high-speed microplate shaker (Illumina). For PCR cleanup using ExoSAP, a master mix was prepared on ice containing (per sample) 0.3 µL Exonuclease I (New England Biolabs), 1 µL

Shrimp Alkaline Phosphatase (Promega) and H2O to a volume of 5 µL. For a 2 general materials and methods 37

10 µL PCR reaction volume, 5 µL of ExoSAP master mix was added to each completed PCR reaction and incubated for 30 min at 37°C then 15 min at 80°C.

2.3.6 DNA sequencing

Sequencing was carried out using the sequencing service at Australian Cancer Research Foundation (ACRF) Facility GARVAN institute (Sydney, Australia) and the sequencing service at Macrogen (Korea).

2.3.6.1 GARVAN sequencing

PCR amplicons were cleaned up using the The Montage PCR96 Clean-up Kit (Millipore). Sequencing reactions were carried out using the Big Dye terminator kit (Applied Biosystems) following the manufacturers protocol. In brief, sequencing reactions were carried out in a half reaction volume of 10 µL containing: BigDye Sequencing Buffer (1 ×), ready reaction premix (1 ×), sequencing primer (3.2 pmol), and PCR product DNA template (3-10 ng). Cycle sequencing was carried out using a Mastercycler ep gradient S (Eppendorf) with a modified protocol of: 96°C for 1 min; 40 cycles of 96°C for 10 sec, 50°C 5 sec, 60°C 4 min; then held at 4°C. Sequencing products were forwarded to the sequencing service at Australian Cancer Research Foundation (ACRF) Facility GARVAN institute (Sydney, Australia) for sequencing using an ABI 3100 Genetic Analyser (Applied Biosystems).

2.3.6.2 Macrogen sequencing

PCR products were cleaned up using ExoSAP. Cleaned up PCR products were forwarded to Macrogen (Korea) for their standard-sequencing single or plate service using Big Dye Terminator cycle sequencing (Applied Biosystems) and 3730xl DNA Analyzer (Applied Biosystems). 2 general materials and methods 38

2.3.6.3 Sequence analysis

Sequencing traces were downloaded from sequencing service providers in .abi format. Sequences were aligned to the Human Mar. 2006 (NCBI36/hg18) assembly reference sequence for the target gene using the software SeqMan II version 5.08 (DNASTAR). Affected and control sequences were compared to identify sequence variants. Correct alignment, exon coverage and sequence variant annotation of aligned sequences for each PCR amplicon were checked using the BLAT tool from the UCSC genome browser [198]. GENETICMAPPING 3

3.1 introduction

3.1.1 Overview

The focus of this genetic study is the dHMN1 disease locus on chromosome 7q34-q36 in a large Australian family, CMT54 [103]. This disease locus is a 12.9 megabase (Mb) region flanked by the markers D7S2513 and D7S637, with a minimum probable candidate interval of 7.6 Mb flanked by D7S2511 and D7S2546, suggested by recombination events in two unaffected individuals [103]. There remains potential for fine mapping to refine the minimum probable candidate interval in family CMT54. Further refinement of the disease interval requires additional dHMN families linked to 7q34-q36 or expansion of the original family. This chapter includes the fine mapping of the minimum probable candidate interval in CMT54 and an analysis of the 7q34-q36 locus in a cohort of dHMN families that are negative for known dHMN genes.

3.1.2 Homologous recombination and linkage disequilibrium

Homologous recombination is a mechanism whereby sections of DNA are exchanged between pairs of homologous chromosomes. Homologous recombi- nation results in a chromosomal crossover, creating two chromosomes with a new combination of alleles. While genes located on separate chromosomes

39 3 genetic mapping 40 assort independently during meiosis, genes located on the same chromosome only assort independently if homologous recombination occurs between them. The further apart genes are on a chromosome the more likely this will occur. Two genetic loci that do not assort independently are deemed linked. The recombination fraction (θ) is a measure of the frequency of recombination between two loci, the genetic distance between them. Recombination fractions of < 0.5 indicate some degree of linkage, less offspring are recombinant than would be expected for random assortment. A recombination fraction of 0.5 indicates that two loci are unlinked, that is 50% of offspring are recombinant.

The map unit of genetic distance is the centimorgan (cM), with one map unit defined as the distance between two loci such that 1% of meiosis is recombinant (θ = 0.01). Unless located closely together, two loci on the same chromosome will be rapidly redistributed in a population through genetic recombination and assume random assortment. Because closely located loci are less likely to be recombined, they redistribute slowly and do not assort randomly in a population. This non-random association of alleles at linked loci is termed linkage disequilibrium or allelic association [199].

Genetic distance is a measure of the rate of recombination, not a physical distance. An approximation of 1 cM Mb−1 is often used as a conversion from cM to Mb [199], however the true rate of recombination varies across the genome. The rate of recombination varies: between chromosomes, at different regions across each chromosome, between individuals and between males and females [200, 201]. The average male to female ratio of recombination rate is 1.65. When examined at a Mb scale resolution, recombination rate generally varies from 0-2 cM Mb−1, with localised peaks of high recombination of 3-7 cM Mb−1 [200, 201]. Fine scale mapping at a kilobase (kb) resolution has identified recombination hot spots approximately every 200 kb, generally located outside genes, with higher densities of such hot-spots in regions of higher recombination [202]. Recently, motifs and sequence contexts have been 3 genetic mapping 41 shown to play a role in recombination site selection [203]. In particular, the PRDM9 reignition sequence is a major determinant of recombination site selection [204].

3.1.3 Genetic linkage analysis

Genetic linkage analysis is a method for mapping a gene or genetic variant to a specific chromosomal locus by identifying linked genetic markers. In human genetics, linkage analysis can be an important first step in the identification of disease genes. Recombination events within a pedigree are used to define the locus harbouring the disease allele. Fine mapping is the process of refining the location of these recombination events by analysing markers at greater resolution around recombination sites.

3.1.3.1 DNA markers

DNA markers are characterised genetic variations which are polymorphic within a study population. Markers are deemed informative in an individual when allele phase can be identified, that is, both alleles can be assigned a parental origin. Highly polymorphic markers are those with the greatest heterozygosity, and therefore, are more likely to be informative within a pedigree. Maximum heterozygosity is achieved when individuals within a pedigree carry different alleles for a marker. This allows marker phase to be assigned, and recombinant and non-recombinant meiosis to be identified, giving the maximum possible linkage power for a pedigree. While several types of polymorphic markers exist, short tandem repeat (microsatellite) and single nucleotide polymorphism (SNP) markers are the most commonly used in current genetic linkage analysis techniques. 3 genetic mapping 42

Microsatellites are short repeat sequences of 1-6 base pairs of DNA, with the length of the microsatellite used to identify alleles. Di- and tri-nucleotide sequences repeating 20-30 times are traditionally used as markers, as they are highly polymorphic and suitable for PCR based amplification and sizing. Mark- ers with 10 or more alleles are not uncommon, therefore microsatellite markers are generally highly informative and allow correct phase to be assigned.

SNPs are single point mutations, insertions or deletions, common at a popu- lation level, typically of 1% or greater. The informativeness of SNP markers is lower compared with microsatellites, limited by the possible number of alleles and the population incidence of the SNP. However, the ability to assay large numbers cheaply and quickly with array based techniques and the higher density across the genome, means SNP based genotyping can be suitably informative when analysed using multi-point linkage techniques. However, such analysis is generally limited to small pedigrees or genomic loci due to software and computational limitations.

3.1.3.2 Two point linkage analysis

The odds of linkage between two loci is quantified as the ratio of the likelihood of the two alternate assumptions; that the loci are linked (0 < θ < 0.5) or not linked (θ = 0.5)[205]. In positional cloning, one locus is the marker being tested and the second is the disease locus. The logarithm of the odds (LOD) score (Z) is used to evaluate linkage in pedigrees [206]. The LOD score at a selected value for θ is determined by the equation:

 θR(1 − θ)NR  Z = log 10 (0.5)R(0.5)NR

Where R = number of recombinants and NR = number of non recombinants.

Z is calculated at a range of values for θ from 0 to 0.5, with best estimate of the distance between the two loci given by the value of θ where Z reaches 3 genetic mapping 43 its maximum positive value. This allows the localisation of a disease locus within a genetic map. Calculations of LOD scores used in this thesis were implemented in computer software; SuperLink, FastSLink and Genehunter (Section 3.2.4).

To demonstrate significant linkage, the LOD score must be powerful enough to reject the null hypothesis of no-linkage. By convention, a LOD score of +3 gives sufficient power, with 1000:1 odds in favour of genome wide linkage. Conversely, a LOD score of -2 is considered sufficient to exclude the marker lo- cus from linkage. LOD scores between these limits are uninformative, however LOD scores > 1 may be considered suggestive of linkage and warrant further investigation because the rates of false positive linkage with such scores are low [205].

The strength of Z is influenced by the number of informative meioses within a pedigree and the marker allele frequencies. To achieve a LOD score of +3 generally requires at least 10 informative meioses. As such, smaller pedigrees cannot independently achieve significant LOD scores. In situations where the disease in multiple pedigrees is thought to be due to mutations in a common gene, LOD scores can be added to achieve a greater power to identify a disease locus. LOD scores of -2 can be achieved with small pedigrees when alleles do not segregate among two or more affected individuals. This is useful for excluding linkage to known disease loci. Combining linkage data between pedigrees in heterogeneous diseases such as dHMN, is usually ineffective because each gene is only responsible for a small proportion of cases.

3.1.3.3 Multi-point Linkage analysis

Multi-point linkage analysis combines data from multiple adjacent markers in a single analysis and as such, is useful for markers with limited informative- ness. Multi-point linkage analysis is necessary for SNP based linkage analysis as individually these markers are insufficiently informative. For highly infor- 3 genetic mapping 44 mative microsatellite markers this is less of a concern. A plot of the multi-point LOD score across genetic distance is useful for visualising linkage or exclusion across a genetic locus.

3.1.3.4 Haplotype analysis

A haplotype is a series of linked alleles at a chromosomal locus which can be treated as a single unit of transmission. To identify a haplotype requires informative markers, from which the phase of the alleles in a chromosomal segment is determined. The segregation of a haplotype within a pedigree is useful for visualising recombination events in individuals. The minimum haplotype segregating with affected individuals in a pedigree defines the disease interval. Fine mapping involves genotyping additional markers around recombination sites to minimise the size of the disease interval.

3.1.4 Penetrance in dHMN

The penetrance of a disease allele is the proportion of individuals carrying the mutation who exhibit clinical symptoms. Distal HMN is generally a highly penetrant disorder, however there is some variability between different types of dHMN. Mutations causing dHMN2A [3], dHMN2B [2] and ALS4 [96], have complete penetrance. Similarly, dHMN5 and CMT2D caused by GARS mutations showed complete penetrance in the first identified families [126], however a subsequent dHMN5 family with a GARS mutation showed reduced penetrance [129]. Distal HMN5 caused by BSCL2 mutations also shows reduced penetrance in addition to clinical variability [142]. Until the mutation causing dHMN mapped to 7q34-q36 is identified, the penetrance of dHMN1 in family CMT54 cannot be confirmed. The possibility of reduced penetrance in dHMN limits the use of recombination events in unaffected individuals for refining 3 genetic mapping 45 the disease locus, especially where the recombinant chromosome has not been inherited further by unaffected offspring. Nevertheless, recombinant unaffected individuals, not at risk of becoming affected, help prioritise candidate gene selection in mutation screening analysis.

3.1.5 Characterisation of dHMN1 in a large Australian family

Genetic linkage analysis was carried out in a large Australian dHMN family (CMT54) using microsatellite markers [103]. The disease in this family was classified as dHMN1, showing autosomal dominant inheritance with 10 af- fected individuals (Figure 3.1). The disease in this family is slowly progressive, with an early but variable age of onset (range 3-40 years, median 10 years), presenting with difficulties walking and running due to lower limb weakness. Neurological examination identified predominant foot deformities including pes cavus in all patients, and hammer toes with callus formation in all but one patent. Increased muscle tone was observed in six patients. Power was reduced in ankle extensors and intrinsic foot muscles in all but one patient. Plantar responses were extensor in five patients. Reduced hand-grip in two patients was the only upper limb weakness demonstrated. The only sensory abnormalities identified were reduced vibration in the feet of four patients. Sural nerve biopsy in one patient at age 43 years was consistent with chronic axonal neuropathy.

3.1.6 Previous genetic studies in dHMN1 at 7q34-q36

A genome-wide two-point linkage analysis in family CMT54 showed signifi- cant evidence for linkage at D7S636 on 7q36.1 with a LOD score of 3.22. An Figure 3.1: Previously published pedigree of CMT54 and haplotype analysis at 7q34-q36[103]. Females and males are represented by circles and squares respectively. Blackened symbols indicate affected individuals. Marker alleles are listed below individuals, ordered from centromere to telomere. Haplotypes are indicated by bars, with the haplotype segregating with the disease blackened. The narrow bar in individual II:7 indicates uninformative markers. The recombination events in individuals II:10 and III:13 define the disease gene candidate interval flanked by D7S2513 and D7S637. Recombination events in unaffected individuals II:7 and III:3 suggest a minimum probable candidate interval, flanked by D7S2511 and D7S2546. Individuals id’s have been updated to match those in Figure 3.5. 3 genetic mapping 46 3 genetic mapping 47

Table 3.1: Previous two-point LOD scores between chromosome 7q34-q36 markers and the dHMN1 disease locus in CMT54 [103]. LOD score at recombination fraction Marker 0.00 0.01 0.05 0.10 0.20 0.30 0.40 D7S1518 -5.09 0.86 1.36 1.41 1.20 0.70 0.25 D7S2513 -0.05 0.04 0.22 0.30 0.30 0.20 0.06 D7S2473 2.99 2.94 2.75 2.51 2.00 1.30 0.62 D7S661 3.73 3.68 3.43 3.11 2.40 1.70 0.78 D7S498 0.58 0.59 0.60 0.57 0.44 0.25 0.08 D7S2511 2.59 2.57 2.47 2.30 1.84 1.26 0.58 D7S688 1.67 1.61 1.40 1.12 0.53 -0.05 -0.33 D7S2426 2.69 2.64 2.46 2.21 1.69 1.11 0.49 D7S505 3.29 3.23 3.00 2.70 2.03 1.29 0.48 D7S636 3.22 3.16 2.95 2.68 2.10 1.40 0.67 D7S2439 1.72 1.70 1.59 1.46 1.16 0.83 0.45 D7S3070 3.59 3.53 3.30 2.99 2.32 1.57 0.74 D7S483 3.29 3.24 3.03 2.76 2.15 1.46 0.69 D7S798 1.76 1.74 1.63 1.49 1.20 0.90 0.46 D7S2462 2.17 2.14 2.00 1.82 1.45 1.02 0.55 D7S2546 2.48 2.49 2.45 2.32 1.90 1.30 0.65 D7S637 -2.23 0.71 1.31 1.45 1.30 1.00 0.47 3 genetic mapping 48 additional 16 markers flanking D7S636 were analysed in family CMT54, with four additional markers providing significant evidence of linkage (Table 3.1).

Multi-point linkage analysis using the linked markers achieved a maximum multi-point LOD score of 3.74 at D7S661. Recombinant haplotype analysis of affected individuals defined the disease locus as the 21.78 cM interval flanked by the markers D7S2513 and D7S637. Recombinant haplotype analysis includ- ing unaffected individuals > 25 years of age suggests a minimum probable candidate interval spanning 16.7 cM flanked by the markers D7S2511 and D7S2546. Physical mapping of the flanking markers localised the disease inter- val to chromosome 7q34-q36 (Figure 3.2). Transcript mapping and mutation analysis of the 7q34-q36 dHMN1 locus is described in Chapter 4.

Two other dHMN genes and one spinocerebellar ataxia (SCA) gene have also been mapped to (Figure 3.2). The gene for dHMN2B, HSPB1 is located proximal to the dHMN1 locus at 7q11.23 and the gene for dHMN5, GARS, is located at 7p15.1. The gene for SCA18 (MIM ID %607458), an autosomal dominant sensorimotor neuropathy with ataxia, has been localised to 7q22-q32 [207]. A mutation in the interferon-related developmental regulator 1 (IFRD1) gene was identified as the probable pathogenic mutation for SCA18, however it is yet to be replicated in additional families [208]. Mutations in these genes and other known dHMN genes have been previously excluded in CMT54 [209].

3.1.7 Hypothesis and aims

It is hypothesised that dHMN seen in family CMT54 is caused by a gene mutation within the linked region on chromosome 7q34-q36. The aims of this chapter are to: 3 genetic mapping 49

Figure 3.2: (Left) Ideogram of human chromosome 7, indicating the known dHMN and other neurodegenerative disease genes and loci mapped to chromosome 7. Horizontal bars indicate the location of disease genes and vertical bars indicate the extent of each disease locus. (Right) Physical map of the dHMN1 chromosome 7q34-q36 locus. Original flanking markers are shown in bold. The original 7 Mb minimum probable candidate interval is flanked by the markers D7S2511 and D7S2546. 3 genetic mapping 50

• refine the minimum probable candidate interval in CMT54 through fine mapping.

• identify additional dHMN families linked to 7q34-q36 to further refine the disease interval and confirm the disease locus.

3.2 materials and methods

3.2.1 Microsatellite analysis

Microsatellite markers were PCR amplified from genomic DNA incorporating a 6-carboxyfluorescein (FAM) label that was present on each forward primer (Section 2.3.3). Primer details are included in Appendix B. Size separation of Microsatellite marker alleles was performed using the microsatellite analysis service from the Australian Cancer Research Foundation (ACRF) at the Garvan Institute of Medical Research (Darlinghurst, NSW Australia). In brief, FAM labelled PCR products were analysed using an ABI PRISM® 3100 Genetic Analyzer (Applied Biosystems Ltd) using a GeneScan™ 600 LIZ size standard (Applied Biosystems Ltd).

Scoring of microsatellite alleles was performed using the software, Gene- Marker version 1.51 (SoftGenetics LLC). Allele sets were scored by size to allow comparison of alleles between families. The software Cyrillic, version 2.1.3 (Cherwell Scientific Publishing Ltd) was used to input and manage pedigrees and genotype data. Pedigree and marker data were exported from Cyrillic for linkage analysis using the MLINK format. 3 genetic mapping 51

3.2.2 Microsatellite markers

The microsatellite markers used in fine mapping of the dHMN1 chromosome 7q34-q36 disease locus in family CMT54 were selected from the Marshfield genetic map through the STS Markers track of the UCSC genome browser

(http://genome.ucsc.edu/)[210], based on their physical location between candidate genes. When no suitable markers were available, additional new microsatellite markers were designed by selecting repeat sequences from the UCSC genome browser microsatellite track. The microsatellite track displays di- nucleotide and tri-nucleotide repeats, with 100% identity, no indels of at least 15-mer, identified with the software Tandem Repeats Finder [211]. Flanking PCR primers were designed using the software, LightScanner® Primer Design version 1.0.R.84 (Idaho Technology Inc.) using DNA sequences from the hg18 (2006) assembly, downloaded using the UCSC genome browser. The Marshfield STRP map used in genetic linkage analysis was modified to include these additional markers. The physical position of the microsatellites in the hg18 human assembly was used to correctly order markers in the genetic map. No genetic distances were available for these new markers, therefore the genetic distances were approximated by averaging the genetic position of the flanking markers in the sex-averaged Marshfield STRP map.

3.2.3 Haplotype analysis

Haplotypes were generated from genotype data using parental genotypes to assign chromosomal phase of alleles where possible. Otherwise, allele phase was inferred from the haplotypes of offspring and siblings. Recombinant hap- lotypes were assigned by minimising the number of individuals with the new recombinant haplotype. Pedigrees were analysed for co-inheritance of haplo- 3 genetic mapping 52 types with disease state. Pedigrees with non-segregation of haplotypes among affected individuals and pedigrees in which an unaffected spouse carried the disease associated haplotype were excluded. Families where unaffected at-risk individuals carried the disease haplotype were not excluded due to possible non-penetrance in dHMN.

3.2.4 Genetic linkage analysis

The easyLINKAGEplus version 5.02 software package [212] was used for genetic linkage analysis, including; simulation of two-point LOD scores (Fast- SLink), two-point linkage analysis (SuperLink), and multi-point linkage analy- sis (GeneHunter). A modified version of the sex-averaged Marshfield STRP genetic map including new microsatellite markers, was used in all linkage analysis (Section 3.2.2). A script pedmake.sh (Appendix A), was written for GNU Bourne-Again SHell (BAsH), version 4.1.5 (Free Software Foundation, Inc) to convert the MLINK format files generated by Cyrillic into the pedigree and marker files required for EasyLINKAGEplus.

3.2.4.1 FastSLink

The software FastSLink, version 2.51 [213, 214] was used for simulation of maximum LOD scores for dHMN pedigrees. FastSLink parameters were; dominant inheritance model, using microsatellite markers with six marker alleles, disease allele frequency of 0.0001, penetrance of wt/mt and mt/mt genotypes of 0.95, and 1000 replicates.

3.2.4.2 SuperLink

The software SuperLink, version 1.5 was used for two-point parametric linkage analysis. SuperLink parameters were; single locus analysis type, autosomal- 3 genetic mapping 53 dominant trait with a penetrance vector of 0.95 for wt/mt and mt/mt alleles, and disease allele frequency of 0.0001.

3.2.4.3 GeneHunter

The software GeneHunter, version 2.1.R5 was used for multi-point linkage analysis. GeneHunter parameters were; autosomal-dominant trait with a pen- etrance vector of 0.95 for wt/mt and mt/mt genotypes, no liability classes applied, and 100 markers per set.

3.2.5 Association analysis

Common haplotypes of three or more alleles were identified by observation. Each haplotype was treated as a single unit and tested for association with the disease. The Fisher’s exact test of independence was used to test for association of haplotypes with dHMN. A 2x2 contingency table was used, with the first row the dHMN group haplotypes, and the second row the healthy control group haplotypes. Column 1 counted chromosomes with a common haplotype and column 2 counted chromosomes with a different haplotype. The Fishers exact test calculation was carried out using the software, QuickCalcs - online calculators for scientists (GraphPad Software, Inc, www.graphpad.com/ quickcalcs/index.cfm). Control haplotypes with incomplete matches due to missing allele information were counted as matching to achieve the highest stringency. The null hypothesis tested is: The relative proportion of each haplotype is independent of the disease. 3 genetic mapping 54

3.3 results

3.3.1 Fine mapping of family CMT54

Using fine mapping there was potential to refine the minimum probable candi- date interval defined by genetic recombination events in unaffected individuals in pedigree CMT54. Markers were selected to saturate the flanking regions of the minimum probable candidate interval, with one marker between each gene located in these areas. This density of markers achieved the maximum resolution for excluding genes from the minimum probable candidate interval.

3.3.1.1 Minimum probable candidate interval proximal flanking marker

Under the original analysis, haplotypes show a genetic recombination in in- dividual CMT54-III:3, which defined the proximal flanking marker for the minimum probable candidate interval as D7S2511.The physical distance be- tween D7S2511 and the next marker, D7S688 is 2.00 Mb and contains half of the gene CNTNAP2, the genes CUL1 and EZH2, and the uncharacterised transcript C7orf33. The additional microsatellite markers D7S615, D7S2442 and D7S2419 were selected in the region between the previous markers D7S2511 and D7S688 (Figure 3.3). All three additional markers were informative in individual III:3. Haplotype analysis localised the recombination event in individual III:3 to the 1.06 Mb interval between the markers D7S615 and D7S2442. Therefore, the new proximal flanking marker for the minimum probable candidate interval is marker D7S615. This excluded a further 0.74 Mb of CNTNAP2, a particularly large gene occupying just under 1/5 of the critical disease interval.

3.3.1.2 Minimum probable candidate interval distal flanking marker

The genetic recombination in individual CMT54-II:7 (Figure 3.1), defined the proximal flanking marker for the minimum probable candidate interval 3 genetic mapping 55

Figure 3.3: Detail of fine mapping of the proximal end of the minimum probable candidate interval defined in pedigree CMT54. The recombination boundary lies between D7S615 and D7S2442. The new proximal flanking marker is D7S615. This excludes a further 0.74 Mb of the gene CNTNAP2 from the minimum probable candidate interval. Markers indicated with an asterisk are those used in the previous haplotyping. Uninformative markers are underlined. The black bar represents the haplotype inherited with the disease and the white bar indicates the alternate, non- disease haplotype. Allele scores are shown as numbers in the haplotype bars. Genes in the intervals are shown as bars, with exons shown as wide sections on the bars and introns the narrow sections, the direction of the arrow indicates gene orientation. Haplotype bars are expanded on the right. 3 genetic mapping 56 as D7S2546. Two critical markers, D7S2462 and D7S798 are uninformative in individual II:7. The distance between D7S2546 and the next informative marker, D7S483 is 2.01 Mb and contains the genes XRCC2, ACTR3B and the start of DPP6. The additional marker D7S1815 was selected in this region but was also uninformative in individual II:7.

Four new markers AD0, AD1, AD2 and AD3, were designed using repeat se- quences identified from the microsatellites track of the UCSC genome Browser (Table 3.2). Microsatellites AD0, AD1 and AD3 were genotyped and shown to be polymorphic in CMT54, with 5, 7 and 4 alleles identified respectively. Mi- crosatelite AD2 showed significant PCR banding stutter, making similar sized alleles indistinguishable. Therefore AD2 was unsuitable as a microsatellite marker.

Markers AD0, AD1 and AD3 were informative in individual II:7. The recom- bination event in individual II:7 was localised to the 1.05 Mb interval between the markers AD3 and D7S2546 (Figure 3.4). Therefore the proximal flanking marker for the minimum probable candidate interval remains D7S2546 and no further genes were excluded from the minimum probable candidate interval.

3.3.1.3 Updated pedigree for CMT54

The pedigree of family CMT54 was updated to include two unaffected children of individual II:7; individuals III:7 and III:8 (Figure 3.5). Both individuals are

Table 3.2: New STR markers for fine mapping of the minimum probable candidate interval in pedigree CMT54. The number of alleles indicated includes alleles observed in all dHMN families examined. Marker Repeat† Physical position‡ Genomic band # alleles AD0 (GT)23 chr7:151,940,317-151,940,362 7q36.1 5 AD1 (TA)28 chr7:152,255,637-152,255,692 7q36.2 7 AD2 (TA)30 chr7:152,497,434-152,497,494 7q36.2 - AD3 (CA)23 chr7:152,787,257-152,787,302 7q36.2 4 † (nucleotides)number of repeats ‡ Human Mar. 2006 (NCBI36/hg18) assembly. 3 genetic mapping 57

Figure 3.4: Detail of fine mapping of the distal end of minimum probable candidate in- terval in family CMT54. The recombination boundary lies between AD3 and D7S2546. Markers indicated with an asterisk are those used in the previous haplotyping. Unin- formative markers are underlined. The black bar represents the haplotype inherited with the disease and the white bar indicates the alternate, non-disease haplotype. Allele scores are shown as numbers in the haplotype bars. Genes in the intervals are shown as bars, with exons shown as wide sections on the bars and introns the narrow sections, the direction of the arrow indicates gene orientation. Haplotype bars are expanded on the right. 3 genetic mapping 58

Table 3.3: Pedigree CMT54 two-point LOD scores between the dHMN locus and additional microsatellite markers on chromosome 7q34-q36. LOD scores for D7s2442, D7s2419 and AD0 reach significance. Markers AD1 and AD3 were not genotyped through the entire pedigree, limiting their LOD scores. LOD score at recombination fraction Marker 0.00 0.05 0.10 0.20 0.30 0.40 D7S615 2.0383 1.8863 1.7262 1.3782 0.9848 0.5323 D7S2442 3.1630 2.9259 2.6765 2.1347 1.5235 0.8223 D7S2419 3.7168 3.3267 2.9174 2.0417 1.1145 0.2948 AD0 4.9420 4.5508 4.1388 3.2432 2.2324 1.0944 D7S1815 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 AD1 0.5961 0.5316 0.4664 0.3360 0.2105 0.0967 AD3 0.8971 0.8087 0.7160 0.5191 0.3150 0.1285

recombinant, with neither inheriting the disease haplotype from individual II:7. Therefore, these two individuals do not reduce the possibility of incomplete penetrance in individual II:7.

3.3.2 Linkage analysis using new markers in CMT54

3.3.2.1 Two-point linkage analysis

Two-point LOD scores were calculated for the new markers in pedigree CMT54 (Table 3.3). Significant linkage was achieved for the markers D7S2442, D7S2419 and AD0 with LOD scores of 3.16, 3.71 and 4.94 at θ=0.00 respectively. The LOD score for the marker D7S615 did not reach a significant LOD score due to limited heterozygosity. The marker D7S1815 was uninformative in all meiosis in family CMT54, with only one allele observed. The markers AD1 and AD3 were only genotyped in a subset of individuals in family CMT54 limiting their LOD scores which are not representative of the power of these markers. Figure 3.5: Updated pedigree of family CMT54, showing haplotypes for selected markers on chromosome 7q34-q36. Females and males are represented by circles and squares respectively. Blackened symbols indicate affected individuals. Marker alleles are listed below individuals, ordered from centromere to telomere. Haplotypes are indicated by bars, with the haplotype segregating with the disease blackened. The recombination events in individuals II:10 and III:13 define the disease gene candidate interval flanked by D7S2513 and D7S637. Recombination events in unaffected individuals II:7 and III:3 suggest a minimum probable candidate interval, flanked by D7S615 and D7S2546. 3 genetic mapping 59 3 genetic mapping 60

3.3.2.2 Multi-point linkage analysis in CMT54

Multi-point linkage analysis in CMT54 achieved a maximum LOD score of 3.5 in the interval between D7S2442 and D7S2462 corresponding with the minimum probable candidate interval (Figure 3.6).

3.3.2.3 The minimum probable candidate interval in CMT54

Fine mapping in family CMT54 using haplotype and linkage analysis has refined the minimum probable candidate interval by 0.74 Mb, to the 6.92 Mb interval flanked by D7S615 and D7S2546. This excluded region is located within the CNTNAP2 gene which encompasses almost 1.5% of chromosome 7. CNTNAP2 spans the recombination in unaffected individual II:7, which defines the proximal end of the minimum probable candidate interval.

3.3.3 Identification of additional dHMN families

Further refinement of the 7q34-q36 interval may also be achieved by identifying additional dHMN families linked to 7q34-q36 and the analysis of recombinant individuals within these families. An analysis of the 7q34-q36 disease locus was carried out in a cohort of additional dHMN families to help refine the locus. Families from the Molecular Medicine (Concord Hospital) peripheral neuropathies database, indexed as dHMN, distal SMA or CMT2 with mild or no sensory involvement were evaluated. Families were selected if the famil- ial disease was inherited in an autosomal-dominant fashion, with unknown genetic cause, and symptoms were similar to those shown by CMT54.

Fifty-eight additional dHMN families were identified from the database with clinical features consistent with dHMN or CMT2 with no or minor sensory involvement. Twenty of these families had sufficient DNA samples available to enable haplotypes to be generated and were selected for genetic 3 genetic mapping 61

4.0

3.0

2.0

1.0

0.0 0 5 10 15 20 25

-1.0

-2.0 Multi-point LOD score -3.0

-4.0 D7S684 D7S1518 D7S2202 D7S2473 D7S2513 D7S661 D7S498 D7S615 D7S2511 D7S2442 D7S688 D7S2419 D7S2426 D7S505 D7S636 D7S2439 D7S483 D7S3070 AD0 D7S1815 AD1 D7S798 AD3 D7S2462 D7S637 D7S2546 D7S1823 D7S2447

Genetic distance (cM) Figure 3.6: Multi-point LOD score plot for family CMT54, including the new markers for fine mapping the minimum probable candidate interval. 3 genetic mapping 62 linkage analysis (Table 3.4). The remaining families had only one affected DNA sample available or were otherwise unsuitable for linkage and haplotype analysis and were indexed for future screening of any new dHMN gene. Selected families were previously screened for mutations in known autosomal- dominant dHMN genes; HSPB1, HSPB8, BSCL2, DCTN1, TRPV4, and SETX (O.Albulym, personal communication). Families indicated as possible dHMN5 were screened for mutations in GARS, and YARS (A.Antonellis, personal communication).

3.3.4 Linkage power analysis

The power of each family to detect linkage was determined using FastSLink as part of the easy linkage software package. The theoretical LOD scores obtained from simulated linkage analysis were comparable to the size of the families (Table 3.5). No pedigrees reached a maximum simulated LOD score of greater than 3.0, required for significant linkage. Pedigree CMT44 achieved the highest simulated maximum LOD score of 2.02, sufficient for suggestive linkage. Pedi- grees CMT260, CMT701, CMT375 and CMT618 achieved simulated maximum LOD scores of > 1.5. The remaining pedigrees are small nuclear families. While not powerful enough to achieve significant linkage on their own, these pedi- grees may be informative if ancestral links between families can be identified. Simulated negative LOD scores give an estimation of the exclusion power of a pedigree. Two families CMT331 and CMT779 achieved simulated LOD scores that reach significance for exclusion at θ = 0.00 (Table 3.5). At θ = 0.05, CMT331 and CMT779 achieved simulated negative LOD scores of -1.05 and -2.00 respectively. As dHMN is genetically heterogeneous, it is unlikely that all these families share mutations in a common gene. Therefore, the combined 3 genetic mapping 63

Table 3.4: Sample and clinical information of the 20 additional dHMN families selected for genetic linkage analysis. Family # Affected # Unaffected Condition Complicating features4 samples1 samples2,3 CMT44 5 7 dHMN2 CMT102 3 1 dHMN CMT131 3 1 dMHN CMT260 4 (1) 3 dHMN or CMTV CMT273 3 0 dHMN CMT274 3 1 dHMN or CMTV CMT331 4 0 dHMN variable minor/mild symptoms CMT333 3 1 dHMN CMT375 4 4 dHMN CMT477 2 3 dHMN CMT484 3 1 dHMN or CMTV CMT609 4 4 (1) dHMN or minor/mild sensory and CMTV pyramidal signs CMT618 4 3 dHMN or NS CMTV CMT677 2 4 dHMN vocal cord paralysis CMT701 6 1 dHMN pyramidal signs + GOR cough CMT703 3 0 dHMN5 CMT724 4 4 CMT2D minor/mild sensory CMT748 3 3 (1) dHMN pyramidal signs CMT779 3 (1) dHMN CMT808 3 3 dHMN pyramidal signs 1 Carriers with affected children are indicated in ( ) in the affected samples column. 2 Individuals with unknown clinical status are indicated in ( ) in the normal samples column. 3 Number of normal samples includes married in parents. 4 NS indicates normal sensory function in CMT families, GOR cough indicates gastro- oesophageal reflux and chronic cough. 3 genetic mapping 64

Table 3.5: Simulated two-point LOD scores for additional dHMN families at θ = 0.00 Pedigree Average StdDev Min Max CMT44 1.413277 0.672809 -1.880637 2.022451 CMT260 1.227786 0.616653 -1.079144 1.861581 CMT701 1.108423 0.453187 -0.339436 1.747461 CMT375 1.169244 0.619372 -1.793023 1.721420 CMT618 1.351277 0.417707 0.079144 1.630130 CMT724 0.917044 0.602872 -1.160476 1.441581 CMT609 0.827045 0.341491 -0.719797 1.183100 CMT331 0.617530 0.407886 -2.082981 1.181784 CMT477 0.214345 0.384159 -1.027830 0.713644 CMT779 0.376384 0.305761 -4.951016 0.602058 CMT748 0.408205 0.325215 -1.021184 0.580868 CMT808 0.412215 0.322585 -1.021187 0.580868 CMT274 0.378580 0.328142 -1.021107 0.580862 CMT677 0.192213 0.303921 -0.866278 0.442662 CMT484 0.241424 0.119957 -0.000000 0.301028 CMT703 0.197789 0.176466 -0.602437 0.301028 CMT273 0.180616 0.147473 -0.000261 0.301028 CMT131 0.183623 0.146825 -0.000261 0.301023 CMT333 0.193699 0.076322 -0.124957 0.234141 CMT102 0.223685 0.022229 0.176054 0.234128 3 genetic mapping 65 linkage power of this set of families was not calculated and families were instead analysed for linkage/exclusion independently.

3.3.5 Genetic linkage analysis in additional dHMN families

Tests for a common founding mutation(s) were carried out in the 20 additional dHMN families large enough for haplotype analysis. Family members were genotyped for microsatellite markers at 7q34-q36. Haplotype analysis, two- and multi-point linkage analysis was carried out on each family separately. Nine families showed a haplotype segregating with affected individuals only. Of these nine families, one achieved LOD scores suggestive of linkage to 7q34-q36 and eight families were uninformative. Ten families were excluded from linkage to 7q34-q36 and in one family the marker phase could not be determined. These results are detailed in Section 3.3.6, Section 3.3.7 and Section 3.3.8.

3.3.6 Suggestive linkage in family CMT44

Family CMT44 is a multi-generational dHMN family with samples available from five affected individuals across three generations (Figure 3.7). Genotypes for markers across the 7q34-q36 interval were analysed in CMT44 using hap- lotype analysis, two-point and multi-point linkage analysis. CMT44 has a haplotype at 7q34-q36 which segregates with the five affected individuals and one recombinant unaffected individual who carries the proximal end of the disease haplotype. This haplotype is not present in three additional unaffected individuals. One affected individual (III:7) shows a recombination event with an alternate haplotype for markers at the distal end of the disease interval. This recombination defines D7S637 as the distal flanking marker for CMT44. 3 genetic mapping 66

The proximal end of the disease interval lies beyond the most proximal marker analysed in CMT44 (D7S1518), this is beyond the proximal limits of the dis- ease interval defined by CMT54. However, the recombination event in the unaffected individual III:6 suggests the probable proximal flanking marker is D7S498. These two recombination events do not further refine the disease interval beyond family CMT54; the marker D7S637 is the flanking marker previously defined in family CMT54, and D7S498 lies beyond the minimum probable candidate interval.

Family CMT44 achieved a two-point LOD score of 2.02 at D7S2511, D7S688, D73070 and D7S483 (Table 3.6). The distal end of the disease interval is ex- cluded with significantly negative LOD scores at D7S637 and D7S2447 due to the recombination event in affected individual II:7. Multi-point linkage analysis achieved a maximum LOD score of 2.02 across the interval between D7S2511 and D7S2462 (Figure 3.8). This reaches the maximum possible power for this pedigree at this locus and indicates the lower two-point LOD scores within this interval are due to lower marker informativeness. These results support suggestive linkage to chromosome 7q34-q36 in pedigree CMT44, with the results reaching the limit of power in this pedigree.

3.3.7 Uninformative families

Analysis of genotypes for markers across the 7q34-q36 interval using hap- lotype analysis, two-point and multi-point linkage analysis, indicated fami- lies CMT609, CMT274, CMT677, CMT484, CMT703, CMT131, CMT333, and CMT273 were uninformative at this locus. These families were not excluded from the 7q34-q36 locus. The pedigrees, two-point LOD scores and multi-point LOD plots for these families are included in Appendix C, Appendix D and Appendix E respectively. 3 genetic mapping 67

Figure 3.7: Haplotype analysis of markers from the dHMN locus at 7q34-q36 in family CMT44. Females and males are represented by circles and squares respectively. Blackened symbols indicate affected individuals. Marker alleles are listed below individuals, ordered from centromere to telomere. Haplotypes are indicated by bars, with the hap- lotype segregating with the disease blackened. A recombination in affected individual III:7 defines D7S637 as the distal flanking marker in CMT44. The recombination in unaffected individual III:6 suggests the proximal flanking marker is D7S498. 3 genetic mapping 68

Table 3.6: Pedigree CMT44 two-point LOD scores between the dHMN locus and microsatellite markers on chromosome 7q34-q36. The maximum LOD score achieved is 2.0225 at the markers D7s2511, D7s688, D7s3070 and D7s483. These scores reach the maximum simulated LOD score of this pedigree. lod score at recombination fraction Marker 0.00 0.05 0.10 0.20 0.30 0.40 D7S1518 1.1405 1.0549 0.9647 0.7691 0.5486 0.2959 D7S661 1.1405 0.9885 0.8304 0.5070 0.2184 0.0439 D7S498 0.7214 0.8596 0.8744 0.7628 0.5583 0.3002 D7S2511 2.0225 1.8488 1.6661 1.2708 0.8358 0.3858 D7S615 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S2442 1.1406 1.0549 0.9647 0.7691 0.5486 0.2959 D7S2419 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170 D7S688 2.0225 1.8488 1.6661 1.2708 0.8358 0.3858 D7S2439 1.1405 1.0549 0.9647 0.7691 0.5486 0.2959 D7S3070 2.0225 1.8488 1.6661 1.2708 0.8358 0.3858 D7S483 2.0225 1.8488 1.6661 1.2708 0.8358 0.3858 AD0AD0 0.5597 0.5174 0.4730 0.3766 0.2683 0.1445 D7S1815 1.1405 1.0549 0.9647 0.7691 0.5486 0.2959 AD3AD3 1.1405 1.0549 0.9647 0.7691 0.5486 0.2959 D7S2462 0.8819 0.7939 0.7014 0.5017 0.2872 0.0899 D7S637 -4.8289 0.5724 0.7194 0.6977 0.5316 0.2936 D7S2447 -5.4186 -0.4825 -0.2453 -0.0714 -0.0170 -0.0023 3 genetic mapping 69

3.0

2.0

1.0

0.0 0 5 10 15 20 25 -1.0

-2.0 Multi-point LODMulti-point score -3.0

-4.0

-5.0 D7S2419 D7S688 D7S1518 D7S661 D7S2442 D7S2447 AD0 D7S2511 D7S615 D7S498 D7S2439 D7S3070 D7S483 D7S1815 AD3 D7S2462 D7S637

Genetic distance (cM) Figure 3.8: Multi-point LOD plot for family CMT44. The LOD score between the markers D7s2511 and D7s2462 reaches the maximum simulated two point LOD score of 2.02. 3 genetic mapping 70

3.3.7.1 CMT609

CMT609 is a multi-generational extended family, initially diagnosed as per- oneal muscular atrophy with pyramidal features, with minor sensory involve- ment [215]. The age of onset of dHMN in this family is lower than CMT54, with individual IV:2 clinically affected when examined at age six. CMT609 shows a haplotype segregating with four affected individuals across three generations (Figure C.3). A genetic recombination event has occurred within the 7q34-q36 interval, however due to missing DNA this recombination could be assigned to either individual II:2, II:4 or III:3. The recombinant haplotype has been drawn as occurring in individual III:3 for simplicity (Figure C.3), however both possible family founder haplotypes are considered in haplotype comparisons (Section 3.3.9). Nevertheless, this recombination event excludes the interval centromeric of D7S2462 with significantly negative LOD scores of < -3 at θ = 0.00 (Table D.2 and Figure E.1). The remaining interval is uninformative achieving a maximum two-point LOD score of 0.72 at θ = 0.00 for D7S2462. The power of this family has potential to be increased as unaffected individual IV:3 also carries the disease haplotype. However this at-risk individual was aged only three when clinically examined and therefore of unknown phenotype and not included in linkage analysis. The recombination event in CMT609 would refine the critical disease interval if this family were linked to 7q34-q36.

3.3.7.2 CMT274

CMT274 is a nuclear family including one affected parent with two affected and one unaffected children. This pedigree has a haplotype spanning 7q34- q36 which segregates with all three affected family members and not the unaffected child, with no recombination events observed (Figure C.2). This pedigree achieved a maximum two-point LOD score of 0.58 at θ = 0.00 for 5 markers across the 7q34-q36 interval (Table D.1). The multi-point LOD scores 3 genetic mapping 71 were similar, between 0.5 and 0.6 across the 7q34-q36 interval (Figure E.2). These LOD scores reach the maximum power for this pedigree.

3.3.7.3 CMT677

CMT677 is a three-generation extended family. Generation II includes an affected sibling pair and generation III includes three unaffected siblings and one affected cousin. DNA samples were unavailable for generation I, the affected parent II:2 and unaffected child III:3 (Figure C.4). The haplotype for affected individual II:2 was partly inferred using the haplotypes of the available family members. This pedigree has a haplotype spanning the 7q34- q36 interval which segregates with the three affected individuals and not unaffected individual III:2. Unaffected individual III:1 is recombinant and carries the telomeric end of the chr.7q34-q36 interval. This recombination event suggests the distal flanking marker is D7S3070 in CMT677. However, being in an unaffected individual, without 100% penetrance this recombination does not produce significantly negative LOD scores in the region affected. The two-point and multi-point LOD scores were uninformative across the 7q34-q36 interval (LOD < 0.5; Table D.5 and Figure E.3). The multi-point LOD scores achieved the maximum power for this pedigree in the interval not affected by the recombination event.

3.3.7.4 CMT484

CMT484 is a nuclear family with one affected parent, two affected and one un- affected offspring. The affected parent has a further family history of dHMN, with another possibly affected family member not included in this study (Figure C.1). No DNA was available for the unaffected individual III:2. The pedigree for CMT484 shows a haplotype spanning the 7q34-q36 interval segre- gating with the three affected individuals with no recombinations observed. The affected siblings have alternate haplotypes from the unaffected parent 3 genetic mapping 72 confirming the correct assignment of the disease haplotype. Linkage analysis was uninformative across 7q34-q36 with a maximum LOD score of 0.30 at θ = 0.00 achieved at 9 markers (Table D.4). The multi-point LOD scores were 0.30 across the 7q34-q36 interval (Figure E.4). These LOD scores reach the maximum power for this pedigree.

3.3.7.5 CMT703

CMT703 is a nuclear family with one affected parent, with two affected and two unaffected children. The affected parent has a family history of dHMN. DNA was not available from the two unaffected siblings III:3 and III:4 (Fig- ure C.5). The pedigree for CMT703 has a haplotype spanning 7q34-q36 which segregates with all three affected family members, however one affected child has a recombination event between the markers D7S2511 and D7S2442. This recombination event can be assigned to either III:1 or III:2, therefore there are two possible parental haplotypes for CMT703. Haplotype analysis indicates the distal flanking marker is D7S2442, excluding the distal end of the 7q34-q36 interval. Due to limited marker informativeness in the region affected by the recombination event, linkage analysis only achieved a significant negative two-point LOD score of -4.40 at θ = 0.00 for D7S3070 (Table D.6 and Figure E.5). Two-point and multi-point LOD score were uninformative in the region not excluded by the recombination event, reaching a maximum of 0.30 at θ = 0.00. This reaches the maximum power for this pedigree. The genetic recombina- tion in CMT703 would refine the disease interval if the family was expanded sufficiently to achieve significant linkage.

3.3.7.6 CMT131

CMT131 is a three-generation family with four affected individuals (Figure C.6). DNA was available for one affected individual in each generation. The family founder has three unaffected siblings and no prior family history of dHMN. 3 genetic mapping 73

The pedigree for CMT131 has a haplotype shared by all three affected individ- uals. Affected individual IV:1 shows a recombination event, carrying only the proximal end of the disease haplotype. Haplotype analysis defined D7S3070 as the proximal flanking marker in CMT131. Significant negative LOD scores ex- clude the proximal end of the 7q34-q36 interval (Table D.7 and Figure E.6). The remaining interval was uninformative in two-point and multi-point linkage analysis, achieving a LOD score of 0.30, the maximum power for this pedigree. The genetic recombination in CMT131 would refine the disease interval if the family was expanded sufficiently to achieve significant linkage.

3.3.7.7 CMT333

CMT333 is a three-generation family with two affected and one unaffected sib- lings in generation II and one affected individual in generation III (Figure C.8). No disease was indicated in generation I, however they were not clinically examined. DNA was unavailable for the unaffected sibling II:3 or from genera- tion I. A haplotype at the 7q34-q36 interval segregates with all three affected individuals, with no recombination events. Two-point and multi-point LOD scores were uninformative across the interval, with multi-point LOD scores reaching the maximum power for this pedigree (Table D.3 and Figure E.7).

3.3.7.8 CMT273

CMT273 is a four-generation family with one affected individual in each generation. No DNA was available for the affected individual in generation I or the four unaffected family members. A haplotype across 7q34-q36 segregates with the affected individuals in generation II and III (Figure C.7). Affected individual IV:1. carries the alternate haplotype which may exclude this interval. However, in individual IV:1, the marker phase cannot be determined for 7 markers at the proximal end of the 7q34-q36 interval (D7S1518, D7S661, D7S498, D7S2511, D7S615, D7S2442 and D7S2419) and two markers on the 3 genetic mapping 74 distal end of the interval (D7S637 and D7S2447). There is the possibility of a recombination event within either of these regions. Two-point and multi- point LOD scores reached -4.94 at θ = 0.00 for informative markers, significant for exclusion. However, this family is insufficiently powerful to exclude the flanking regions of uninformative markers, with two-point and multi-point LOD scores uninformative at θ > 0.00 (Table D.8 and Figure E.8). Additional markers are required to clarify if CMT273 is excluded from 7q34-q36 or if there has been a genetic recombination in individual IV:1.

3.3.8 dHMN families excluded from 7q34-q36

Analysis of genotypes for markers across the 7q34-q36 interval using hap- lotype analysis, two-point and multi-point linkage analysis, indicated fami- lies CMT701, CMT260, CMT618, CMT477, CMT748, CMT779, CMT331 and CMT808 were excluded from this interval. These families do not have a haplo- type segregating with affected individuals and achieved significant negative LOD scores for informative markers across the disease interval. Initial linkage analysis using two markers in CMT375 and CMT102 identified non-segregation of haplotypes and negative two-point LOD scores sufficient to exclude linkage. The pedigrees, two-point LOD scores and multi-point LOD plots for these families are included in Appendix C, Appendix D and Appendix E respectively.

CMT724 is an extended family which has a mild and variable dHMN phe- notype with no sensory involvement. As this phenotype is mild, reduced penetrance is more likely in this pedigree. All four affected individuals share a haplotype for the 7q34-q36 locus (Figure C.18). Affected individual III:5 has a recombination event, excluding the region distal to the marker D7S2439. Two unaffected individuals also carry the disease haplotype. Due to the re- combination event in individual III:5, markers distal to this location generate 3 genetic mapping 75 significantly negative two-point and multi-point LOD scores (Table D.14 and Figure E.12). The LOD scores in the remaining interval are uninformative, how- ever the occurrence of the disease haplotype in multiple unaffected individuals suggests CMT724 is not linked to 7q34-q36.

The two largest families ascertained, CMT260 and CMT701 are good candi- dates for further genetic analysis. Future clinical evaluation of at-risk individu- als may identify newly affected individuals. This has the potential to increase the linkage power in these families to a significant level in a genome wide analysis.

3.3.9 Haplotype association analysis

The haplotypes of the eight additional dHMN families that were not excluded were compared with CMT54 to identify any shared alleles (Table 3.7). The minimal possible haplotypes defined by recombination events in affected indi- viduals were considered for families CMT44, CMT131, CMT609 and CMT703. Six haplotype blocks of three or more alleles common between CMT54 and at least one additional family were identified (Table 3.8). CMT54 and CMT44 have two separate allele groups shared; haplotype A (1-4-3) for the mark- ers D7S615, D7S2442, and D7S2419; and haplotype B (4-4-1) for the markers D7S2439, D7S3070 and D7S483. CMT333 and CMT54 share six alleles; haplo- type C (6-1-4-3-4-4) for markers D7S2511, D7S615, D7S2442, D7S2419, D7S688 and D7S2439. CMT54 and CMT484 share five alleles; haplotype D (7-1-6-1- 4) for markers D7S661, D7S498, D7S2511, D7S615 and D7S2442. CMT703, CMT484 and CMT54 share three alleles; haplotype E (1-6-1) for markers D7S161, D7S2511 and D7S615. Families CMT54, CMT333 and CMT484 share three alleles in overlapping haplotype F (6-1-4), for markers D7S2511, D7S615 and D7S2442. 3 genetic mapping 76

Haplotypes for 56 healthy control chromosomes for markers across the 7q34- q36 interval were determined by genotyping unaffected spouses in dHMN families (Appendix F). The six haplotypes were tested for association with the disease using a two tailed Fisher’s exact test of independence. The two groups defined for this test were: the dHMN group, consisting of nine dHMN families, including CMT54 and the additional dHMN families not excluded from the 7q34-q36 locus and the normal control group, consisting of the 56 healthy control chromosomes from unaffected spouses. Statistically significant evidence of association was demonstrated for four haplotypes: B (p = 0.0173), C (p = 0.0173), D (p = 0.0481) and F (p = 0.0099) (Table 3.9). Haplotypes A (p = 0.1010) and E (p = 0.0735) did not reach significance, found in 6 and 5 of 56 control haplotypes respectively.

These results suggest the shared haplotypes B, C, D and F may be derived from a common ancestor. However, the region covered by haplotype B is mutually exclusive from haplotypes C, D and F. A larger haplotype of seven markers spanning A and B in CMT44 is identical to CMT54 with the exception of the allele for D7S688.

3.4 discussion

3.4.1 Linkage analysis in CMT54

Linkage analysis in family CMT54 refined the minimum probable candidate interval by 0.74 Mb to the 6.92 Mb interval flanked by D7S615 and D7S2546. This excluded region was within the CNTNAP2 gene which spans the recombi- nation in unaffected individual II:7 defining the proximal end of the minimum probable candidate interval. The critical disease interval based on affected individuals alone is the 12.98 Mb interval flanked by D7S2513 and D7S637. 3 genetic mapping 77 , with alleles common

D7S2447 1 2 1 2 4 5 1 1 1 1 1/3 gray . . 5 D7S637 6 4 6 5 5 2/5 2/5 3/5 5 2 6 1 4 2 1 6 3 3

D7S2462 10 ...... AD3 2 2 1 . D7S1815 1 1 1 2 3 2 1 12 . . 2/3 ...... AD0 1 4 2

D7S483 1 1 5 2 3 2 9 9 3 5 2 4 4 6 3 4 4 3 6 4 7 D7S3070 2

D7S2439 4 4 4 3 4 3 1 1 1 3 4 . D7S688 4 2 4 3 3 2 3 4 3 2 3 3 3 2 6 3 3 3 2 7

D7S2419 11 4 5 D7S2442 4 4 4 4 2 3 3 2 5/4 Original haplotype Additional Haplotypes

D7S615 1 1 1 1 1 2 1 1 2 2 1/2

D7S2511 6 1 6 6 3 4 6 6 4 5 1

D7S498 1 4 2 1 3 1 1 1 2 7 1

D7S661 7 1 3 7 3 9 4 4 3 2 3

D7S1518 4 7 8 9 2 6 6 5 8 7 3/8

Family CMT54 CMT44 CMT333 CMT484 CMT274 CMT677 CMT131 . Haplotypes recombinant in affected individuals within families are unshaded. Both possible haplotypes for CMT609 CMT703 H1 CMT703 H2 CMT609 H1 CMT609 H2 blue Comparison of haplotypes between CMT54 and eight additional dHMN families. Haplotypes are shaded to CMT54 coloured and CMT703 are shown. Table 3.7: 3 genetic mapping 78

# alleles 3 3 6 5 3 3

D7S2447 2 1 2 1 2 1 2 1 2

D7S637 5 4 5 4 6 6 4 6 2/5

D7S2462 2 6 2 6 1 1 1 6 1 ...... AD3 2 2

D7S1815 1 1 1 1 2 1 2 1 2 ...... AD0 4 4 1 5 5 2 9 2 5 2 D7S483 1

D7S3070 4 6 4 6 3 6 3 6 3 4 4 4 3 1 3 4 3 D7S2439 4

D7S688 2 4 2 4 3 4 3 4 3 3 3 2 3 2 3 2 D7S2419 3 3 4 4 4 4 5 4 D7S2442 4 4 4 Original haplotype 1 1 1 1 1 1 1 1 1 D7S615 Additional Haplotypes 6 6 6 D7S2511 1 6 1 6 6 6 1 1 D7S498 4 2 4 2 1 2 1 7 D7S661 1 3 1 3 4 7 3 7 471614344411125 6 1 D7S1518 7 8 7 8 9 6 9 8 9

Family CMT54 CMT44 CMT44 CMT333 CMT333 CMT484 CMT703 CMT484 CMT333 CMT484 F B E C D Haplotype A Haplotypes used in association analysis are boxed. The six haplotype blocks tested are labelled A to F. Each family is highlighted a unique color. Haplotypes excluded by recombination in affected individuals are shaded gray. Table 3.8: 3 genetic mapping 79

Table 3.9: Association analysis of haplotypes with dHMN using Fisher’s exact test of independence. Significance scores (p value) are shown for each haplotype identified in Table 3.8. In total, 9 haplotypes were counted in the dHMN group and 56 haplotypes in the healthy control group.

Haplotype block Freq. dHMN Freq. controls Significance A 3 6 0.1010 B 2 0 0.0173 C 2 0 0.0173 D 2 1 0.0481 E 3 5 0.0735 F 3 4 0.0496

Because of reduced disease penetrance in dHMN, recombination events in unaffected family members cannot be used to conclusively exclude regions. The updated pedigree for CMT54 includes two additional unaffected children of the unaffected individual used to define the proximal flanking marker for the minimum probable candidate interval. Neither of these individuals carry the recombinant haplotype and therefore it is not possible to discount the possibility of non-penetrance in individual II:7. An accurate determination of disease penetrance in CMT54 requires the identification of the pathogenic 7q34-q36 gene mutation.

3.4.2 Linkage analysis in the extended dHMN cohort

Linkage analysis in the larger dHMN cohort of 20 families identified suggestive linkage to 7q34-q36 in family CMT44. CMT44 achieved a two-point LOD score of 2.02 which reaches the maximum power for this pedigree. This strongly supports the linkage of family CMT44 to the same locus as CMT54. Seven additional families also had a haplotype segregating with affected individuals, however they were uninformative in linkage analysis. 3 genetic mapping 80

The recombination event in the affected individual in CMT44 defines the distal flanking marker in this family as D7S637, with a recombination in an unaffected family member suggesting the proximal flanking marker is D7S498. This interval does not refine the interval defined in CMT54, with marker D7S637 previously defined as the distal flanking marker in CMT54 and D7S498 beyond the proximal flanking marker defining the minimum probable candidate interval in CMT54.

Association analysis of 7q34-q36 haplotypes from dHMN families was per- formed in an attempt to determine the most likely location of the disease gene and to establish whether a founding mutation was evident among families with evidence of linkage to 7q34-q36. Haplotypes from four families showed significant association with the disease compared to a healthy control cohort using the Fisher’s two tailed test of independence. This association suggests the haplotype shared by dHMN families is a result of a common disease allele within an ancestral haplotype [216]. An association between unlinked families cannot be used as evidence for linkage in these smaller families. To identify with certainty families that have a haplotype that is IBD requires a known pedigree linking the two families. Founder effects have previously been identified in peripheral neuropathies. [217] identified a founder effect in hereditary sensory neuropathy type I, with six families of English descent sharing the p.T399G SPTLC1 mutation and associated surrounding haplotype on chromosome 9. Similarly, a founder effect was demonstrated in a Gipsy population with CMT type 4C for the SH3TC2 p.R1109X mutation and associ- ated surrounding haplotype on chromosome 5q23-q33 [218]. A founder effect was described in a Dutch population with CMT4C at the 5q23-q33 locus which helped narrow the disease interval prior to identification of SH3TC2 as the disease gene [219, 220]. A similar founder effect was identified in CMTX3 [221], where the disease gene is yet to be elucidated. 3 genetic mapping 81

Families CMT54, CMT44, CMT333 and CMT484 have 7q34-q36 haplotypes that are significantly associated with dHMN. However the haplotype shared by CMT44 and CMT54 spans an interval mutually exclusive to the haplotype shared by CMT54, CMT333 and CMT484. A founder effect is not possible if this is the case as the disease allele cannot be located in both areas. This suggests a false positive association. A larger haplotype of 7 markers is identical in CMT44 and CMT54 apart from the allele for D7S688. CMT54 has a (4) allele for D7S688 and CMT44 has a (2) allele. This larger haplotype in CMT44 would span the separate haplotypes independently associated with dHMN if it were a real ancestral haplotype. If this larger haplotype is a true ancestral haplotype shared between CMT44 and CMT54, then the difference in the D7S688 marker may be due to a change of the microsatellite repeat size, a genotyping error or a tight double-recombinant. The (2) allele was identified in all five affected individuals in CMT44, therefore a genotyping error is unlikely. A tight double- recombinant is also highly unlikely as the genomic distance between the two markers flanking D7S688 is only 0.002 Cm. Change in microsatellite repeat number generally occurs as strand-slippage during replication, with gene- conversion during recombination or DNA repair less common [222, 223]. Most slip mutations involve a shift of a single-repeat (∼90%), with shifts of two- and three-repeats rarer [222, 224]. Microsatellites have a mutation rate which is proportional to the number of repeats, with higher repeats more likely to be mutated [200]. D7S688 is a CA repeat with a nominal size of 22 repeats, and therefore within the highly polymorphic size range (> 10 repeats). The two alleles in CMT54 and CMT44 differ by three CA repeats (143 to 149 bp), which is within the normal range of a single sequence-slip mutation during replication. It is possible that the differing allele at D7S688 between CMT54 and CMT44 is due to a mutation of the microsatellite repeat and these two families share a common founder. In further support for a founder effect in these four families with evidence of haplotype association, they have a similar 3 genetic mapping 82 phenotype and all have British ancestry, however any localised geographic links are unknown.

The convergence of the significantly associated haplotypes in CMT54, CMT333 and CMT484 suggests the disease allele lies in the region flanked by the markers D7S498 and D7S2419. This spans the proximal end of the minimum probable candidate interval in CMT54. Using the proximal flanking marker for the minimum probable candidate interval, the most likely position of the disease allele is within the 1.14 Mb interval flanked by the markers D7S615 and D7S2419.

3.4.3 Proportion of dHMN caused by a pathogenic gene at 7q34-q36

This study suggests that the incidence of dHMN with 7q34-q36 mutations is low. In the screen of 20 dHMN families with unidentified genetic cause, good evidence for linkage was shown in one family, CMT44, with an additional 7 families uninformative. Association analysis suggested possible ancestral links between CMT54, CMT44 and two additional families. With these families joined, then only one additional family is not excluded from the 7q34-q36 locus. This suggests that the 7q34-q36 locus may be responsible for ∼5% of dHMN within this cohort. In a review of 119 autosomal-dominant dHMN families from the peripheral neuropathies database (from which the families in this study were selected), mutations in known genes were responsible for ∼5% of dHMN with the remaining ∼95% having unknown genetic cause (O.Albulym, personal communication, June 2011). This is slightly lower than found in another autosomal-dominant dHMN cohort where known genes account for ∼15% of dHMN [6]. In both cohorts, each gene individually accounts for between ∼1.5 and ∼7% of of dHMN. The identification of the 7q34-q36 disease 3 genetic mapping 83 gene will likely lead to identification of additional families and clarify the proportion of dHMN caused by mutations at 7q34-q36. ANALYSISOFCANDIDATEGENES 4

4.1 introduction

4.1.1 Chapter overview

This chapter describes the application of a positional candidate based approach to the identification of the disease gene within the 7q34-q36 interval. The strategy employed here is targeted towards the identification of a mutation within the protein-coding exons and flanking intronic sequences of candidate genes within the 7q34-q36 interval. This analysis was conducted concurrently with the analysis of the 7q34-q36 locus in the extended dHMN cohort described in Chapter 3.

4.1.2 Strategy for gene identification

Several contrasting strategies for disease gene identification have been de- veloped to make best use of available experimental resources and charac- teristics of the disease under study. These methods include functional and positional cloning of disease genes [225] and more recently, next generation sequencing (NGS) strategies (Chapter 6). Functional cloning identifies a disease gene through a detailed understanding of the biochemical defect involved in the disease, which implicates a specific gene. Where a disease gene is not immediately apparent, functional cloning can be expanded to a functional

84 4 analysis of candidate genes 85 candidate approach. In such an approach, a series of informed decisions are used to identify more likely candidate genes which are then screened for mutations. In contrast, positional cloning identifies a disease gene based on its location in a genetic map. This is possible if a narrow genomic interval can be defined through linkage analysis and fine mapping to a locus containing only one gene. When a genomic interval linked to a disease contains many genes, positional cloning is combined with functional candidate gene selection to prioritise gene screening. This is known as a positional candidate approach [225].

As dHMN is functionally poorly understood, pure functional cloning is not possible and positional cloning and candidate gene strategies are applied. Most of the known dHMN genes have been identified through linkage analysis and positional candidate gene screening in affected families. One dHMN gene, HSPB3 was identified through a candidate gene based screen of all small HSPs following the identification of two other small HSP genes in dHMN through positional cloning (HSPB1 and HSPB8)[2–4]. Due to the genetic heterogeneity of dHMN, individual genes account for only a few percent of overall disease; any candidate based approaches require large cohorts to identify a new gene even in a single family [226].

4.1.3 Identifying genes within a disease interval

With the completion of the project most genes have been reliably identified through automated gene annotation complemented with manual gene annotation [227, 228], and genetic maps are well integrated into the physical sequence based maps. The identification of genes within a candidate interval has generally been reduced to a computer exercise [229]. Genomic intervals can be examined with online tools such as NCBI 4 analysis of candidate genes 86

map viewer (http://www.ncbi.nlm.nih.gov/mapview) and the UCSC genome browser (http://genome.ucsc.edu,[210]).

Without a comprehensive genome reference, identifying the genes within a disease locus was previously the rate limiting step in positional cloning, requiring the identification of expressed sequences from the linked genomic region [reviewed in 230]. This meant that much smaller disease intervals were required to achieve a reasonable gene identification workload [231]. Early gene identification was greatly assisted by the development of genetic maps generated using sequence-tagged site (STS) markers [232] and expression maps using expressed sequence tags (ESTs)[233].

4.1.4 Selection of candidate genes

Disease intervals identified though linkage analysis can contain hundreds of genes. In a positional candidate based approach, any functional annotation suggesting a disease role is utilised to prioritise candidate genes with a higher likelihood of carrying the disease mutation. Common gene annotation sources include; comparative genomics, epidemiological studies, functional candidates, in-silico analysis of annotation databases, and gene expression patterns.

4.1.4.1 Association analysis

Genetic association analysis is used to identify genes that may have a role in disease aetiology by identifying genetic markers that are overrepresented in disease population compared to a healthy control population, beyond what should occur by chance [234]. Disease associated markers may be in linkage disequilibrium with other functional changes, or may directly cause a change in a protein or its expression [235]. In whole-genome association (WGA) analysis, SNP markers are genotyped across all (or most) genes in the genome. Due 4 analysis of candidate genes 87 to the large number of SNPs examined, false positives are likely. Therefore, any initial associations identified need to be confirmed with additional studies. WGA has been applied to the related disorder amyotrophic lateral sclerosis (ALS) with limited success, possibly due to the heterogeneous nature of ALS [236].

4.1.4.2 Comparative genomics

Comparative genomics is based on the principle that biological features of gene orthologs are evolutionarily conserved among a group of organisms [237]. This provides good functional information about candidate genes which have been characterised in animal models. This was the case for the identification of IGHMBP2 in DSMA1; mutations in the mouse orthologue, Igmbp2 were previously shown to be responsible for the neurodegenerative phenotype in nmd mice [238]. As such, IGHMBP2 was the most promising candidate gene [76] (See Section 1.1.3).

4.1.4.3 Function dependant

Study of the known dHMN genes has highlighted some common pathways and protein functions. These include: disrupted axonal transport, RNA processing defects, protein aggregation and inclusion body formation (See Section 1.1.11). Candidate genes with similar functions to these known dHMN genes are good candidates as they possibly influence the same molecular pathways. However, as the range of functions of known dHMN genes is broad, many candidate genes can be hypothetically linked to these pathways and suggested as functional candidates.

4.1.4.4 In-silico prioritisation of disease genes

In-silico gene prioritisation utilises software to analyse publicly available databases of to prioritise candidate genes within a disease 4 analysis of candidate genes 88 locus based on interactions and homology with known disease genes [239]. While the application of in-silico prioritisation has been successful in some gene discoveries, its application relies on the known biology of the disease being studied and requires that the gene being targeted is not acting in a novel disease pathway.

4.1.4.5 Gene expression analysis

Generally, gene expression patterns matching the tissues affected in a disease can be used as a guide for selecting candidate genes. However, ubiquitously expressed genes can still cause disease in specific tissues. This is the case in dHMN, where some of the known disease genes are ubiquitously expressed (see Section 1.1.11).

4.1.5 Gene screening methods

Gene screening is the process of identifying the DNA variant (mutation) responsible for the disease. To fully characterise the nature of a DNA variant requires the determination of the DNA sequence. DNA sequencing by chain termination, known as ’Sanger’ or ’dideoxy’ sequencing, is commonly used to determine DNA sequence. Sanger sequencing can determine the sequence of DNA fragments of up to 800 bp on average, in a single reaction [240], suitable for sequencing individual exons of genes.

As a gene screening project may need to examine hundreds of exons, several methods have been developed for the detection of single base pair changes in DNA without sequencing, aiming to achieve faster and cheaper gene screening. These methods detect altered properties of the variant DNA strand including: mobility shift of variant DNA during electrophoresis (single-stranded confor- mational analysis and denaturing gradient gel electrophoresis), detection of 4 analysis of candidate genes 89 heteroduplexes, altered digestion of variant DNA by enzymes or chemical cleavage (RNase A cleavage and chemical mismatch cleavage) [reviewed in 241]. These methods all require the subsequent application of direct sequencing to define precisely the nature of any sequence variant identified.

Advances in Sanger sequencing mean it is now rapid, accurate, and afford- able enough to allow the direct sequencing of candidate genes without the need for prior screening using alternate methods [242]. Recently, the use of Sanger sequencing itself has begun to be replaced by NGS technologies (par- ticularly exome sequencing) as the primary sequencing method in positional cloning. NGS allows the sequencing of much larger targets in a single reaction. During the course of this project, as NGS technologies have become available they have been adopted where relevant. This chapter details the application of a positional candidate approach to gene screening using Sanger sequencing of candidate genes. The application of NGS to gene screening within the 7q34-q36 disease interval is detailed in Chapter 6.

4.1.5.1 Hypothesis and aims

It is hypothesised that dHMN seen in family CMT54 is caused by a single gene mutation within the linked region on chromosome 7q34-q36. The aims of this chapter are to:

• Identify and prioritise candidate genes in the 7q34-q36 disease interval.

• Sequence coding sequences and flanking intronic sequences of candidate genes to identify the pathogenic mutation. 4 analysis of candidate genes 90

4.2 materials and methods

4.2.1 In-silico candidate gene prioritisation

Two web based in-silico candidate gene prioritisation programs were used; GeneWanderer and Endeavour. The same set of reference genes was used to generate comparisons in both programs. This set included the known dHMN genes: HSPB1, HSPB8, GARS, BSCL2, IGHMBP2, DCTN1, SETX, and PLEKHG5.

4.2.1.1 GeneWanderer

The web-based interface for the GeneWanderer software (http://compbio. charite.de/genewanderer/GeneWanderer [243] was used with the parameters: linkage interval was set as chr7:141,053,887-154,034,713, analysis method was set to Random walk, and all interactions were selected.

4.2.1.2 Endeavour

The web-based interface for the Endeavour software (http://www.esat.kuleuven. be/endeavourweb [244, 245] was used with the following parameters: the species selected was Homo sapiens, all data sources, and the candidate gene interval was chr7:141,053,887-154,034,713.

4.2.2 PCR and sequencing for gene screening

PCR amplification of exons was carried out as described in Section 2.3.3, with primer sets detailed in Appendix B. Sequencing was carried out as described in Section 2.3.6. 4 analysis of candidate genes 91

4.2.2.1 Sequence analysis

Sequence traces in .abi format were aligned using SeqMan II 5.08 (DNAStar), and sequence variants were identified by visual inspection of sequence traces. The consensus sequence from the affected individuals was aligned to the Human Mar. 2006 (NCBI36/hg18) Assembly using the BLAT tool from the UCSC genome browser [198]. Correct sequence identity, exon coverage, and characterisation of identified variants was carried out following the BLAT alignment.

4.2.3 Analysis of effect of variants on exon splicing

The effect of mutations on exon splicing can be predicted using a combination of several splicing prediction programs and a modest threshold for change in splicing scores [246]. The guidelines for validation of splicing variants [247] were followed, with at least three programs used to predict the effect of each novel variant and variants with two positive in-silico results further analysed with RNA studies.

4.2.3.1 In-silico splicing prediction

The splicing prediction programs used were NetGene2 [248], Splice Site Pre- diction by Neural Network (NNSplice)[249], Human Splicing Finder (HSF) [250], GeneSplicer [251] and Alternative Splice Site Predictor (ASSP)[252]. Default options were used for each software. Where programs required DNA sequences as input, DNA sequence was downloaded from the UCSC genome browser Human Mar. 2006 (NCBI36/hg18) assembly then manually edited to include DNA variants being tested. 4 analysis of candidate genes 92

4.2.4 RNA studies of splicing variants

4.2.4.1 Isolation of total RNA

Total RNA was extracted from lymphocytes purified from peripheral blood using density gradient centrifugation. Approximately 30 ml of blood was

® collected into 9 ml VACUETTE K3EDTA tubes (Greiner bio-one). The blood sample was made up to a volume of 35 ml by adding cold RPMI-1640 medium (Sigma). The Blood/RPMI was overlaid onto 15 ml of Ficoll-paque® PLUS (GE healthcare Bio-sciences AB, Sweden) and centrifuged at 1700 × rpm for 40 min without brakes using a Megafuge® 1.0 (Heraeus® Instruments) type centrifuge. The buffy coat was recovered by pipetting then washed twice by suspending cells in 35-50 ml RPMI media followed by centrifugation at 1000 × rpm for 5 min.

RNA extraction was carried out using the total RNA isolation NucleoSpin® RNA II kit (Machery-Nagel) as per the protocol Total RNA purification from cultured cells and tissue with NucleoSpin® RNA II, version May 2003/Rev.02. In brief, buffy coat cells were lysed using the supplied lysis buffer, with me- chanical disruption by vortexing then passing the sample through a 21 g needle three times. The cell lysate was filtered, RNA was bound to a NucleoSpin® RNA II column, desalted, DNAse digested, and washed three times. RNA was eluted using 60 µL of RNAse free water, then reapplied to the column and re-eluted. All filtration, wash and elution centrifugation steps were carried out using a 5417 C bench-top centrifuge (Eppendorf). Eluted RNA was stored at -70 °C. 4 analysis of candidate genes 93

4.2.4.2 cDNA synthesis and RT-PCR

All cDNA synthesis was carried out using the iScript cDNA synthesis kit (BioRAD), using the supplied protocol, in a 2 × reaction volume (40 µL). Each reaction was set up as follows:

Component Volume

5× iScript Reaction Mix 8 µL

iScript Reverse Transcriptase 2 µL

RNA template up to 1 µg total RNA

Nuclease-free water to a total volume of 40 µL

The reaction mix was incubated for 5 minutes at 25°C, 30 minutes at 42°C, and 5 minutes at 85°C. Synthesised cDNA was stored at -20°C. RT-PCR was carried out using standard PCR protocol described in Section 2.3.3. Nucleotide primers for RT-PCR are detailed in Appendix B.

4.2.5 RNA folding prediction

Sequence variants within the 3’UTR of genes may affect RNA secondary structure. The software Mfold version 4.4 [253, 254] was used to predict RNA secondary structure. Required nucleotide sequences were downloaded from the UCSC genome browser, Human Mar. 2006 (NCBI36/hg18) assembly, then manually edited to include the sequence variants being tested. 4 analysis of candidate genes 94

4.3 results

4.3.1 Positional candidate genes

The critical disease interval defined by recombination events in affected in- dividuals in CMT54 spans 12.98 Mb of DNA at 7q34-q36 (NCBI36/hg18 assembly) and is contained in two sequenced map contigs: NT_007914.10 and NT_034885.1 (Figure 4.1)[255]. The sequence gap between these two contigs has an estimated size of 99-100 kb based on fluorescence in situ hy- bridisation analysis of human DNA and comparison to the analogous region in the mouse genome (gap report HG-740: http://www.ncbi.nlm.nih.gov/ projects/genome/assembly/grc)[255]. The gap is located on the telomeric end of the critical disease interval. The minimum probable candidate interval defined by recombination events in unaffected individuals in CMT54 is located entirely within sequenced map contig NT_007914.10. Positional candidate genes were identified from within the critical disease interval by retrieving records for known genes from the UCSC genes track using the UCSC table browser tool [256]. The UCSC genes track contains gene records combined from RefSeq [257, 258], Genbank [259], the consensus coding sequence (CCDS) project [260] and UniProt [261] databases. The critical disease interval con- tained 245 gene records after removing duplicates. These records correspond to:

• 89 protein coding genes (not including taste, olfactory or T-cell receptor transcripts).

• 10 taste receptor gene transcripts.

• 14 olfactory receptor gene transcripts in the olfactory receptor gene cluster [262]. 4 analysis of candidate genes 95

• 86 T-cell receptor beta-chain gene transcripts localised to the T-cell recep- tor beta-chain gene complex at 7q35 [263].

• 32 putative non-coding transcripts

• 14

The 89 coding genes are the target for this candidate gene screening approach (Figure 4.1). The minimum probable candidate interval contains 60 gene tran- scripts (Table G.1), with 28 gene transcripts in the remaining 7q34-q36 critical disease interval (Table G.2). To ensure all possible exons of candidate genes were screened, transcripts for splicing variants or alternate protein isoforms were identified from the UCSC genes and RefSeq tracks from the UCSC genome browser as well as the Uniprot database. Unique exons were added to the candidate genes list as alternate isoforms.

Chromosome 7 21.11 31.1 33 q34 35

5Mb Cytogenetic map 7q34 7q35 7q36.1 7q36.2 NT_034885 Map contigs NT_007914 CDI MPCI Genes 1 2 5 1 5 K es 6A G2 467 CA-T enes DPP6 DR86 MAP4 PDIA4 Known genes CNTNAP2 WEE2, SSBP1 ABP1, KCNH2 NOS3, ATG9B NOBOX, TPK1 NOBOX, MLL3, CCT8L1 CASP2, CLCN1 ABCB8, ACCN3 XRCC2, ACTR3B REPIN1, ZNF775 SSPO, ATP6V0E2 SSPO, GIMAP6, GIMAP2 GIMAP1, GIMAP5 PRSS37, CLEC5A PRSS37, ZNF786, ZNF425 ZNF398, ZNF282 ZNF777, ZNF746 ACTR3C, LRRC61 C7orf34, KEL, PIP ZNF212, ZNF783 GSTK1, TMEM139 MGAM, LOC93432 C7orf29, RARRES2 GALNTL5, GALNT11 FAM115C, FAM115A FAM115C, C7orf33, CUL1, EZH2 ARHGEF35, ARHGEF TRYX3, PRSS1, PRSS TRYX3, PRSS1, TMUB1, AGAP3, GBX CDK5, SLC4A2, FAST EPHB6, TRPV6, TRPV FAM131B, ZYX, EPHA FAM131B, T-cell receptor B gen receptor T-cell CRYGN, RHEB, PRKA TMEM176B, TMEM17 Olfactory receptor g Olfactory receptor ZNF767, KRBA1, ZNF SMARCD3, NUB1, W ASB10, ABCF2, CSGL GIMAP8, GIMAP7, GI

Figure 4.1: Genes mapping to the 7q34-q36 disease interval. Top panel: chromosome 7 ideogram depicting the location of the 7q34-q36 disease locus. Lower panel: detail of the 7q34-q36 disease locus. Figure labels indicate the following: critical disease interval (blue bar), minimum probable candidate interval (green bar), cytogenetic map and known genes. 4 analysis of candidate genes 96

4.3.2 Functional candidate gene selection

A literature and database search was carried out to identify functional can- didate genes from the 89 positional candidate genes. This search included; neurodegenerative disease role or association, comparative genomics, func- tional candidates, gene expression patterns and in-silico prioritisation.

4.3.2.1 Relationship of candidate genes with neurodegenerative diseases

Using the Online Mendelian Inheritance in Man (OMIM) database (http:

//www.ncbi.nlm.nih.gov/omim), positional candidate genes were examined for any known function in, or published association with neurodegenerative or neurological diseases. Six genes were identified as potential candidate genes: DPP6, ACCN3, CUL1, CNTNAP2, KCNH2 and NOS3. The gene DPP6 is asso- ciated with susceptibility to sporadic ALS in different populations of European ancestry [264]. The same association was reproduced in homogeneous Irish [265] and Italian populations [266]. A second gene, ACCN1 was shown to be associated with sporadic ALS in one study [267]. The positional candidate ACCN3 forms a functional complex with ACCN1, which has a functional role in the peripheral nervous system [268, 269]. Cullin1, coded by the candidate gene CUL1, interacts with Parkin [270], mutations in which cause autosomal recessive juvenile Parkinson’s disease (PARK2 or PDJ: MIM#600116) [271]. CNTNAP2 mutations cause several neurological diseases: recessive frameshift mutations in CNTNAP2 cause cortical dysplasia-focal epilepsy (CDFE) syn- drome [272]: CNTNAP2 haploinsufficiency causes schizophrenia and epilepsy [273]: and CNTNAP2 is disrupted through a chromosomal insertion/translo- cation in familial Gilles de la Tourette syndrome and obsessive compulsive disorder (motor and vocal ticks) [274]. In addition, SNPs within CNTNAP2 are associated with increased risk of developing autism [275]. KCNH2 is associated with schizophrenia, and has a functional role in the brain [276]. Genetic vari- 4 analysis of candidate genes 97 ants in NOS3 infer a slight increase in risk for late-onset Alzheimer’s disease [277, 278].

4.3.2.2 Identification of candidate genes through comparative genomics

Using comparative genomics, the gene CASP2 was identified as a good candi- date gene. Mutant CASP2 causes accelerated motor neuron cell death during development in a mouse model [279]. CDK5 was previously screened for CDS mutations in this family [103] due to the observed deregulation of CDK5 in a SOD1 ALS mouse model [280]. The candidate gene DPP6 is disrupted by a chromosomal re-arrangement in the rump white (Rw) mouse which is em- bryonic lethal in homozygotes [281]. However, complementation with DPP6 deletion mutants indicated that DPP6 disruption is not the embryonic lethal factor in the Rw chromosomal rearrangement [282]. Therefore, this does not add further support for DPP6 as a candidate gene.

4.3.2.3 Functional candidate genes

The function of candidate genes were examined to identify genes with roles in tissues of the peripheral nervous system or in pathways affected in dHMN (Section 1.1.11). Seven genes were identified as potential candidate genes: CUL1, FASTK, SMARCD3, SSBP1, CNTNAP2, TRPV6 and TRPV5. CUL1 is a core protein of the complex responsible for ubiquination of components of NFκB complexes [283, 284]. Cell death through loss of NFκB signalling is caused by PLEKHG5 mutations in DSMA4 [162] (see Section 1.1.8). FASTK is a negative modulator of neurite outgrowth and a positive modulator of neurite retraction, with a likely signalling role in cytoskeleton maintenance [285]. SMARCD3 is involved in chromatin remodelling and transcriptional regulation [286]. SSBP1 is involved in genome stability as part of the single-stranded DNA (ssDNA)- binding complex and is a core protein of mitochondrial nucleoid complexes responsible for replication and translation of mtDNA [287, 288]. CNTNAP2 is 4 analysis of candidate genes 98 thought to have a role in the localisation of ion channels at the juxtaparanodal region of myelin required for proper conduction of the nerve impulse [289]. TRPV6 and TRPV5 form calcium-permeable channels, which participate in neurotransmission, muscle contraction, and exocytosis by providing calcium as an intracellular second messenger. TRPV6 and TRPV5 are members of the same subgroup of the transient receptor potential protein superfamily as the recently identified dHMN gene TRPV4. However, TRPV6 and TRPV5 are calcium selective channels as opposed to TRPV1-4 which are thermosensitive non-selective cation channels. Nevertheless, they remain good candidate genes.

4.3.2.4 Prioritisation based on gene expression pattern

The gene expression pattern of candidate genes in brain, spinal cord and muscle was examined using the expression section of the GeneCards® page for each gene [290, www..org]. The GeneCards® expression figures are generated using data from Genenote [291, 292] and GNF BioGPS [293, 294]. Five genes, CDK5, CNTNAP2, DPP6, WDR86 and ATP6V0E2 had high relative expression levels in brain or spinal cord. For two genes NOBOX and ACTR3C, there was no expression data available for some or all of the tissues. The remaining genes were all expressed in the selected tissues (Appendix G, Table G.3).

4.3.2.5 In silico prioritisation of candidate genes

Two separate programs, GeneWanderer and Endeavour, were used to prioritise candidate genes. The web-based software GeneWanderer, investigates the interaction of candidate genes with known disease genes through a protein- protein interaction (PPI) network based on experimental data from several species [243]. The second program, Endeavour, also investigates the interaction of candidate genes with known disease genes, however uses a greater number of relationships including; literature, functional annotation, gene expression, 4 analysis of candidate genes 99

Table 4.1: Candidate gene prioritisation ranking using GeneWanderer and Endeavour software. Rank GeneWanderer Endeavour Gene Symbol Score Gene Symbol Score 1 CASP2 0.8695 CDK5 0.000247 2 GSTK1 0.8504 CASP2 0.00267 3 TRPV6 0.03027 ACTR3B 0.00387 4 NOS3 0.03019 ARHGEF5 0.00395 5 CDK5 0.02678 CENTG3 0.00714 6 CUL1 0.02558 SSBP1 0.00837 7 MGAM 0.02010 ABP1 0.0105 8 KEL 0.01449 PDIA4 0.0167 9 CNTNAP2 0.01380 GIMAP4 0.0205 10 ABCF2 0.01109 ZYX 0.0225 protein domains, PPI, pathway membership, gene regulation, transcriptional motifs, and sequence similarity [244, 245]. Two genes, CASP2 and CDK5 were ranked highly by both programs, with the remainder unique to each list (Table 4.1). Both these genes have previously been identified as good candidates. The gene GSTK1 also scored highly with GeneWanderer.

4.3.3 Screening candidate genes

The protein-coding exons and flanking intronic regions of candidate genes were sequenced using direct sequencing, in two affected and two normal individu- als from CMT54. Functional candidate genes within the minimum probable candidate interval were prioritised for sequencing, followed by some candi- date genes beyond this interval. The remaining genes screened were generally selected based on their location in the minimum probable candidate interval. A total of 32 genes were sequenced (comprising 475 exons): CUL1, ACCN3, DPP6, SMARCD3, KCNH2, NOS3, CASP2, FASTK, TRPV6, GSTK1, ABCF2, CENTG3, ABP1, PDIA4, KEL, ZYX, ABCB8, RARRES2, CLCN1, TMUB1, NUB1, 4 analysis of candidate genes 100

EZH2, REPIN1, CRYGN, RHEB, ZNF775, PRKAG2, TMEM176A, TMEM176B,

GALNT11, SLC4A2, and XRCC2. One gene, MLL3 was partly sequenced with 40 of 59 exons sequenced. Prior to this PhD project, the protein-coding exons and exon-intron boundaries for two positional candidate genes, CDK5 and CNTNAP2, were sequenced in an affected individual from CMT54, with no pathogenic variants identified. The sequencing traces for CNTNAP2 and CDK5 were reviewed to check for variants affecting splicing. In total, 31 sequence variants were identified in both affected individuals screened (Table 4.2). 14 of these variants were not present in the two unaffected individuals sequenced. Of these variants, nine were reported validated SNPs in dbSNP 1.29 or 1.30. Five variants were examined further as possible pathogenic mutations.

4.3.3.1 ACCN3:c.915C>T

The variant ACCN3:c.915C>T, is a non-synonymous base change predicted to cause the amino acid substitution Pro289Ser. Examination of the conservation track of the UCSC genome browser indicated Pro289 is 100% conserved across mammals, however the Stickleback is Wt for Ser in this location. Sequencing of CMT54 family members showed this variant segregated with all affected individuals tested (9 individuals) and was not present in any unaffected individuals. Sequencing of a healthy control cohort identified this variant in 9 of 269 individuals, with a minor allele frequency (MAF) of 1.67%. Therefore, this variant was considered a non-pathogenic SNP.

4.3.3.2 DPP6:c.*347A>G

The variant DPP6:c.*347A>G, lies in the 3’UTR of a short isoform of the DPP6 gene. This base is conserved across primates. Using the software mfold version 4.4, the variant was predicted to cause altered mRNA secondary structure (data not shown). Sequencing of all CMT54 family members showed that this variant did not segregate with affected individuals, found only in 4 affected 4 analysis of candidate genes 101 controls frequency # controls Table 4.2 - continued on next page 9 269 3.2% for T 2 2 0% for T 3 1 50% for A Affected intron (P289S) (P727L) cds-synon rs3750042 † A rs11981217 intron 2 2 0% for A G rs56680997 intron 2 2 0% for G A - intron 3 1 50% for A A intron 1 - - T rs10271133 intron 2 2T 25% for rs2291538 T intron 3 1 0% for T A † T rs2272252 intron 2 2 0% for T TT rs2272251 rs13438232 cds-synon missense 2 2 100% for T > > > G - 3’UTR 4 of 10 24 4.1% for G > > > TT - rs6962852 cds-synon missense 2 2 0% for T > A > > > > > > > Variant dbSNP Type # c.*347A ‡ accession NM_014141NM_014141 c.1528+117insAAAC rs34615841 c.3476-15C NM_004769 intronNM_001091 c.915C NM_000083 c.1856+15G NM_000083 1NM_000083 c.261C NM_000083 c.1402-9C - c.2154C NM_000083 c.2180C c.2284+33C - NM_130797 c.457+32G NM_014141NM_003592 c.3790-6delC c.1191+8C Multiple SNPs intron 2 - - NM_130797NM_130797 c.1666+47T c.358+30C NM_001936 c.303G DNA variants identified in genes screened. All mutations found in affected individuals sequenced including known SNPs. Allele protein Q8IYG9 ABP1 DPP6 DPP6 DPP6 DPP6 CUL1 CLCN1 CLCN1 CLCN1 CLCN1 CLCN1 ACCN3 isoform 2 CNTNAP2 CNTNAP2 CNTNAP2 Gene Symbol RefSeq DPP6 frequency. When both controls areallele. homozygous, Where allele rs freq numbers will are be indicated, 0% alleles or identified 100%. in 100% sequencing indicates matched they are those homozygous reported for in the dbSNP. alternate SNP Table 4.2: 4 analysis of candidate genes 102 A refer to the > frequency controls isoform 2:c.303G DPP6 # Affected # controls A and > :c.457+32G DPP6 T - intron 2 2 0% for T T - intron 2 2 25% for T T rs2303924 intron 2 2 0% for T A rs1800781 intron 2 2 25% for A C rs41277434 intron 2 2 100% for C G rs41277434 intron 2 2 25% for g G rs446886 intron 2 2 75% for G C rs1805123 missense (K557T) 1 - - G rs2566514 cds-synon 2 2 25% for G > > > T rs2075001 cds-synon 2 2 0% for T T rs17256042 missense (R180W) 2 2 0% for T > C rs3173833 missense (S94R) 2 2 50% for C > A rs2072443 missense (A134T) 2 2 50% for A > C rs1549758 cds-synon 2 2 25% for C isoforms. Both the A rs2303926 cds-synon 2 2 25% for A > C rs2302479 3’UTR 2 2 25% for C > > > > > > > > > DPP6 Variant dbSNP Type NM_06118 c.598+7A NM_022087 c.712+50C NM_000603NM_000603 c.270+42G NM_000603 c.774T c.1998C NM_172057NM_007188 c.1639A NM_007188 c.659+12C NM_144727 c.984T c.411C NM_004456NM_004456 c.2110+6T c.2276+6T NM_014020NM_014020 c.538C c.*3T NM_004456 c.2111-46C NM_014020NM_014020 c.280A c.400G RefSeq accession EZH2 EZH2 EZH2 NOS3 NOS3 NOS3 NUB1 ABCB8 ABCB8 KCNH2 CRYGN One variant has multiple affectssame in mutation. UniProtKB/TrEMBL accession number. GALNT11 TMEM176B TMEM176B TMEM176B TMEM176B † ‡ Gene Symbol Table 4.2 - continued from previous page 4 analysis of candidate genes 103 individuals. Furthermore, this variant was present in 1 of 24 unrelated control individuals (MAF = 2.1%). This variant was considered a non-pathogenic SNP.

4.3.3.3 EZH2:c.2111-46C>T

The variant EZH2:c.2111-46C>T, is an intronic variant located upstream of exon 19. Examination of the conservation track of the UCSC genome browser indicated this nucleotide is not conserved across species, with the T allele present in one of six species at this position. Using four different splicing prediction programs (NetGene2, NNSplice, HSF and GeneSplicer), this variant was predicted to not affect splicing or create any additional splice motifs. Therefore, this variant was excluded. Segregation analysis in the larger CMT54 pedigree was not carried out.

4.3.3.4 CNTNAP2:c.3476-15C>A

The variant CNTNAP2:c.3476-15C>A, is a non-coding variant that was con- sidered to potentially affect both the long and short isoforms of CNTNAP2, located 15 bp upstream of exon 21 and exon 1 respectively (Figure 4.2). Ex- amination of the conservation track of the UCSC genome browser indicated this nucleotide is not conserved across species with C and T nucleotides at this location in eight species. Sequencing of CMT54 family members showed this variant segregated with all affected individuals and was not present in any unaffected individuals. The effect of the CNTNAP2:c.3476-15C>A variant was simulated using the splicing prediction programs NetGene2, NNSplice and HSF. This variant was predicted to reduce the power of the adjacent splice site by -1% by NNSplice, and -5.24% by HSF with no change predicted by Net- Gene2. In addition, the formation of a new enhancer motif site was predicted by the HSF software. These scores suggested the CNTNAP2:c.3476-15C>A vari- ant may affect splicing. The effect of this variant on splicing was investigated by RT-PCR using three PCR primer sets designed to detect any aberrantly 4 analysis of candidate genes 104

CNTNAP2 C>A Ex. 20 Ex. 21 Exon 22 Ex. 23 Exon 24 3'UTR

20F 21F 23R 24R PCR-A (479bp) PCR-B (390bp)

CNTNAP2 short isoform C>A 5' UTR Exon 1 Ex. 2 Exon 3 3'UTR

100 bp shortF 23R 24R PCR-C (370bp)

Figure 4.2: Location of variant CNTNAP2:c.3476-15C>A and adjacent exons. This variant possibly affects both isoforms of CNTNAP2, with the variant located in the three PCRs overlapping the splice site variant. Exon 22 is 240 bp and Exon 23 is 81 bp

spliced variants of CNTNAP2 (Figure 4.2). These amplicons may identify exon skipping of exon 21 or criptic splice site use altering the length of the transcript. No PCR products indicating aberrant exon splicing were detected in affected individuals from CMT54 (Figure 4.3). However, this cannot rule out the pos- sibility of a neuron-specific splicing defect. As neuronal tissue from affected patients is unavailable in this study, this can not be readily tested. Sequencing of a healthy control cohort identified a heterozygous C>A change in 6 of 84 control individuals sequenced (MAF = 3.57%). Therefore, this variant was considered a non-pathogenic SNP. This variant was subsequently identified as SNP rs77706740, included in the dbSNP build 132 (release 9/11/2010).

4.3.3.5 CNTNAP2:c.3790-6delC

The variant CNTNAP2:c.3790-6delC is a deletion that lies upstream of exon 22. Examination of the conservation track of the UCSC genome browser indicated this nucleotide is not conserved across 10 species, with variable gaps of up to 9 bp present at this location. There are seven SNPs reporting small deletions, 4 analysis of candidate genes 105

M2 +I -

Figure 4.3: PCR amplification of cDNA flanking the CNTNAP2:c3476-15C>A 600 - 500 - A variant. PCR-A and PCR-B amplify cDNA 400 - from the full length CNTNAP2 transcript whereas PCR-C amplifies the short iso- form transcript. Lanes marked (1) and (2) are affected individuals, (+) unaffected control, (-) PCR negative control, and (M) M I 2 + - DNA size marker HyperLadder IV 100- 1000bp DNA ladder (Bioline). 500 - B 400 - 300 -

M I 2 + -

500 - 400 - C 300 - 4 analysis of candidate genes 106

Chromosome 7

500 kb

Cytogenetic map 7q35 7q36.1

Genes

CNTNAP2 CNTNAP2 short isoform LOC392145 C7orf33 CUL1 Figure 4.4: Genes in the interval defined by significantly associated haplotypes (Sec- tion 3.3.9). Top panel: chromosome 7 ideogram depicting the location of the smaller locus. Lower panel: detail of the smaller locus. Figure labels indicate the following: cytogenetic map and known genes. The 5’ end of CNTNAP2 is excluded by the proxi- mal end of the interval. Only the 5’-UTR of CUL1 is included at the distal end of the interval.

of two to four bases, affecting this base or adjacent bases, with this nucleotide deleted in the SNPs, rs71188981, rs72268642 and rs72031591 (dbSNP build 130). Analysis of this variant with splicing prediction software indicated no change with NetGene2 (confidence = 0.97) and a slight reduction in confidence with ASSP (12.537 > 12.209). With the reported SNPs at this location and the neutral scores in splicing prediction software, this variant was considered a non-pathogenic SNP.

4.3.4 Analysis of the interval defined by significantly associated haplotypes

The 1.14 Mb interval defined by the overlapping haplotypes associated with dHMN (Section 3.3.9) was examined in greater detail. This interval, at the proximal end of the minimum probable candidate interval contains the 3’ end of CNTNAP2, the 5’ end of CUL1, transcript C7orf33 and LOC392145 (Figure 4.4). Both CNTNAP2 and CUL1 were good candidate genes and protein-coding exons of both genes have been excluded. The CNTNAP2 3’UTR, CUL1 5’UTR, transcript C7orf33 and pseudogene LOC392145 were examined using NGS detailed in Chapter 6.

21.11 31.1 33 q34 35 4 analysis of candidate genes 107

4.4 discussion

4.4.1 Gene screening summary

The protein-coding exons and exon-intron boundaries of 35 genes have been sequenced and no putative pathogenic mutation was identified. Thirty-one sequence variants were identified in two affected individuals from CMT54, however, all were excluded as non-pathogenic through: non-segregation with all affected members of CMT54, being identified in unaffected spouses or healthy controls, reported as SNPs in dbSNP, or without a putative pathogenic function. No further genes were sequenced as this candidate based approach was replaced with a NGS based screen of genes within the 7q34-q36 interval (Chapter 6).

This application of a positional candidate based approach for gene screening has been ineffective in identifying the pathogenic mutation. This may be due to the disease gene having a novel function unrelated to the known dHMN disease genes, a novel uncharacterised function of a candidate gene, or the mutation confers a gain of new function to a gene not prioritised. Also, the possibility of a mutation acting through a regulatory mechanism or being a CNV need to be considered. These possibilities are considered in Chapter 6 and Chapter 5. Nevertheless, the examination of the protein-coding exons and flanking intronic sequences for functional candidate genes is the most logical place to start a gene screen such as this.

The two dHMN genes identified recently, ATP7A and TRPV4 are both novel genes in dHMN. Both genes had no obvious relationship with the other known dHMN genes and therefore were only identified after sequencing numerous other candidate genes. The X-linked dHMN, DSMAX (Section 1.1.7), was mapped to an interval containing 71 candidate genes, 33 of which were 4 analysis of candidate genes 108 screened prior to the identification of the pathogenic mutation in ATP7A. ATP7A is a membrane bound copper transporter and ATP7A mutations had been shown previously shown to cause Menkes disease and OHS, was not selected as a priority candidate. Similarly, the gene for congenital DSMA (Section 1.1.9), TRPV4 was not selected as a primary candidate, with the protein-coding exons and exon-intron boundaries of 19 genes sequenced prior to the identification of the pathogenic mutation in TRPV4. These recent gene discoveries have added to the genetic heterogeneity of dHMN.

4.4.2 Was the mutation missed?

The possibility that the pathogenic mutation was missed is unlikely. Dideoxy sequencing was used in this screen which has a very high accuracy. Only sequence traces of high quality were used and two affected individuals were sequenced for most exons. Sequencing can detect any molecular level change within the size limits of the PCR amplicon sequenced. In addition, all exons from alternate transcripts of the candidate genes were included. However, this approach cannot readily identify CNV involving the entire PCR amplicon or detect allele specific amplification due to additional SNPs within primer binding sites. An analysis of CNV within the dHMN 7q34-q36 disease interval was carried out in Chapter 5. Most candidate genes were focused within the minimum probable candidate interval, with only a handful of genes outside this interval sequenced. This interval is defined by recombination events in unaffected individuals in CMT54, as such the possibility of reduced penetrance means that the defined interval may be incorrect, however with the information at hand this is the most likely area. An examination of the remaining genes in the 7q34-q36 interval was carried out using NGS, detailed in Chapter 6. COPYNUMBERANDSTRUCTURALVARIATION 5

5.1 introduction

Copy number variation (CNV) has recently been recognised as a major source of genetic diversity in humans, including variation leading to disease [295]. It is estimated that copy number variation (CNV) affects 5% of the sequence of the human genome [296], a much greater amount than the 1% affected by SNPs [295]. This chapter explores CNV at the 7q34-q36 disease interval as a potential disease mechanism in dHMN.

5.1.1 CNV and structural variation

By the classical definition, CNV refers to deoxyribonucleic acid (DNA) seg- ments that are present in a genome at an abnormal number when compared to a reference genome. CNV can result from a change in chromosome number or unbalanced structural variation within chromosomes. Balanced structural variation, including inversions or insertion/deletions, where the copy number remains unchanged, are also a source of genetic variation which must be con- sidered in disease gene identification strategies. Structural variation can occur at a scale ranging from large chromosomal regions through to a molecular level of a single base pair. While there is no set definition of the minimum size of a CNV, functional definitions have been based on the limit of resolution achievable with each new technology. With current technology reaching base-

109 5 copy number and structural variation 110 pair resolution, the functional minimum size definition for structural variation is one exon or 50-100 bp of DNA [295, 297]. Similarly, the mean size of CNV detected in the genome has decreased with the increasing resolution of new techniques, with smaller CNV more common [297, Box 2].

5.1.1.1 Chromosomal CNV

Gain or loss of an entire autosomal chromosome has serious clinical conse- quences. Generally, loss of one or two copies of an autosomal chromosome (nullisomy and monosomy respectively) are pre-implantation or embryonic lethal [199]. The gain of an additional autosomal chromosome (trisomy) is also usu- ally embryonic lethal, however, some forms are survivable but with congenital abnormalities. Gain or loss of a sex chromosome is clinically less severe, with affected individuals generally showing only minor abnormalities.

5.1.1.2 Structural variation

Structural variations are rearrangements of DNA segments within homologous or non-homologous chromosomes. Such rearrangements include: deletions, a loss of a DNA segment; duplications, a gain of a DNA segment, either in tandem or dispersed; inversion, the reversal of direction of a DNA segment; insertion, a gain of a foreign DNA segment from a homologous or non-homologous chromosome; translocation, the exchange of a DNA segments between non- homologous chromosomes (Figure 5.1)[297]. Insertions can be either novel sequences or mobile-elements (transposons). Structural variations can be unbal- anced, where the copy number is altered, or balanced, where there is no change in copy number. Complex structural variations, involving a combination of several types of structural variation, can arise from multiple independent rearrangements at the same loci or complex rearrangements occurring in one event. 5 copy number and structural variation 111

The molecular mechanisms by which CNV can cause disease include: gene dosage, gene interruption, gene fusion, position effects, unmasking of reces- sive alleles or functional polymorphisms, and transvection effects [295]. Gene dosage effects are seen when a dosage-sensitive gene is duplicated or deleted which leads to over expression or haploinsufficiency respectively. Gene in- terruption occurs when the breakpoint of a CNV or structural variation lies within a gene, leading to loss of function. Gene fusion occurs when separate genes or their regulatory elements are brought together by structural variation. Positional effects involve alteration to the regulatory elements of genes. Thus, expression of genes located beyond the region involved in a CNV can be af- fected. Unmasking recessive alleles or functional polymorphisms occurs when the good copy of a gene is deleted, thus allowing the effect of the recessive allele to be observed. Transvection is the cross regulation of gene expression by gene pairs on homologous chromosomes. Loss of one allele or regulatory elements disrupts this regulation, affecting gene expression.

Most of the known CNV is located in genomic regions with no obvious phenotypic effect [295]. In several genome-wide CNV studies, up to half of the CNV identified in each study are present in multiple healthy individuals indi- cating they are polymorphic [298–304]. Recent genome-wide high resolution CNV analysis at a population scale has defined both common polymorphic

Deletion Duplication Inversion Insertion Translocation

Chr B Derivative Chr B Chr A Derivative Derivative Chr B Chr A

Figure 5.1: Forms of CNV or structural variation at chromosomal or molecular scale. This includes deletion, duplication, inversion, insertion and translocation. 5 copy number and structural variation 112

CNVs (MAF > 5%) and population specific polymorphic CNVs [302, 304]. The most common structural variation identified is sub-microscopic deletion.

5.1.2 CNV in dHMN and related peripheral neuropathies

Recently, the role of CNV in diseases of the nervous system, including: neu- rodevelopmental, neurodegenerative, and psychiatric disorders has greatly expanded [reviewed by 305]. The majority of disease genes in dHMN identified to date involve point mutations or small deletions affecting only a few bp (Fig- ure 1.1). An analysis of CNV within 35 peripheral neuropathy genes (including the dHMN genes: HSPB1, HSPB8, DCTN1, SETX, GARS and BSCL2), did not identify any putative pathogenic CNV other than known PMP22 duplications in CMT type 1A (CMT1A)[306].

In rare cases, structural variation has been identified as the cause of dHMN. In autosomal recessive dHMN6, where over 50 point mutations or small deletions have been described in IGHMBP2, two cases were observed to carry structural variations of IGHMBP2, heterozygous with point mutations [78, 79]. The first case involved a deletion of 18.5 kb of DNA spanning exons 3-7 and the second case involved a complex structural variation with two deletions and one inversion affecting exons 9-15, creating a premature stop codon. Both of these CNVs lead to disruption of IGHMBP2 and unmasking of the recessive allele on the alternate chromosome. Other important examples of neuropathies caused by structural variation include: CMT1A caused by tandem duplication of PMP22 [307]; hereditary neuropathy with liability to pressure palsies (HNPP) caused by the deletion of the same 1.5 Mb region [308]; SMA caused by deletion or gene conversion of SMN1 (see section Section 1.1.11); and autosomal dominant leukodystrophy (ADLD) caused by duplication of LMNB1 [309]. 5 copy number and structural variation 113

5.1.3 Methods of detecting CNV

CNVs were first identified by cytogenetic analysis of chromosomal number and structure at a microscopic resolution. Such techniques developed to achieve a resolution of approximately 3 Mb. With the advent of sequencing, CNV at specific targeted loci could be characterised to a base pair resolution. How- ever, it wasn’t until recently that comprehensive genome-wide analysis of CNV, at high resolution, has become possible. The technologies that have al- lowed this include clone and oligonucleotide-based array comparative genomic hybridisation (aCGH), SNP genotyping arrays, and NGS based CNV analysis. For reviews on these techniques see Feuk et al. [310] and Alkan et al. [297].

No single technique, as yet, can correctly identify all structural variation in a genome-wide scan. Karyotyping can robustly identify all the types of structural variation but is limited by its resolution and cannot identify sub- microscopic scale structural variation which account for the majority of known structural variation. Array based CGH, based on relative amounts of probe signals can only detect CNV, with balanced structural variation not detected and translocations incorrectly identified as a duplication of the DNA segment at the donor site. However, the resolution of aCGH is limited only by the array probe densities, which have become increasingly high. SNP genotyping arrays are similarity restricted to the identification of CNV and also achieve a high resolution. Recently, the use of NGS in structural variation analysis has shown great potential, however the computational and bioinformatics overheads required still present a significant challenge beyond large research groups. Structural variation analysis using NGS can be grouped into four strategies: read-pair methods, which identify structural variation break points by locating paired-end sequencing reads that are inconsistent with a reference genome; read-depth methods, which identify CNV based on read depth of 5 copy number and structural variation 114 sequence alignments, similar to PCR based methods; split-read approaches, which identify structural variation break points by identifying breaks in se- quence alignment due to multiple reads; and sequence assembly comparisons, where de novo assembled genome sequences are aligned to identify structural variation [reviewed by, 297]. These approaches combined have the potential to identify all types of structural variation, however are limited by the sequencing read-length, sequence fragment sizes, library construction and uniqueness of the genome region, factors that limit the effective detectable size of structural variation to less than 1 kb.

5.1.3.1 Analysis of CNV at 7q34-q36 using aCGH

To examine the 7q34-q36 disease interval for pathogenic structural variation two analysis were carried out in this thesis; karyotyping to detect microscopic structural variation and high-density oligonucleotide aCGH targeted to the 7q34-q36 disease interval to detect deletions and duplications. To confirm a structural variation as pathogenic would require: The structural variation segregates with disease in CMT54; it is a novel structural variation, not reported in the normal population or healthy controls; the structural variation has a putative functional mechanism disrupting a gene or regulatory element; and the structural variation must be validated by a second method.

Array CGH is a flexible technology that allows the design of custom microar- rays targeted to specific regions of the genome. In this study an oligonucleotide array was designed to target the 7q34-q36 disease interval. To detect CNV, aCGH measures the relative amounts of test and reference DNA which bind competitively to the array (Figure 5.2). While not providing a characterisation of all structural variation, high-density aCGH combined with karyotyping should identify the majority of structural variation within the 7q34-q36 interval. 5 copy number and structural variation 115

Digest and Detect and quantify Sample DNA Hybridise to array label DNA Cy3:Cy5 signal ratios

Cy:5 Duplication Deletion ratio Cy3:Cy5 Genomic position Cy:3 Reference DNA Figure 5.2: aCGH analysis. Genomic DNA from the test sample is digested and labelled with a contrasting flurophore to a reference DNA mixture. Both labelled digested DNAs are hybridised to an array and the relative amounts of each DNA fragment are quantified. A plot of the log ration of test over reference identifies intervals that are duplicated and deleted.

5.1.4 Hypothesis and Aim

It is hypothesised that dHMN seen in family CMT54 is caused by a single gene mutation within the linked region on chromosome 7q34-q36. This mutation may be a CNV or structural variation. The aims of this chapter are to:

• Identify microscopic CNV or structural variation using karyotyping of chromosomal spreads.

• Identify any molecular level CNV using aCGH.

5.2 materials and methods

5.2.1 Custom CGH Microarray

CNV analysis was carried out using the Agilent 60-mer oligonucleotide mi- croarray CGH for genomic DNA. An Agilent SurePrint G3 custom CGH microarray 4x 180K was designed using Agilent Technologies Earray software

(release 6.5; https://earray.chem.agilent.com). The 7q34-q36 disease inter- val and 100 Kb of flanking sequence was targeted with all available probes 5 copy number and structural variation 116 included. The design included 117636 probes to the chromosome 7q34-q36 disease interval, the Human CGH 3k Agilent Normalization Probe Group and 5 × the Human CGH 1k Agilent Replicate Probe Group. The average distance between probes in the chromosome 7q34-q36 interval was 109 bp.

5.2.2 Array processing

Genomic DNA (gDNA) from the three clinically affected and one married in control (section 6.2.2) as well as the Agilent SurePrint G3 custom CGH microarray, was forwarded to the Department of Cytogenetics, the Children’s Hospital at Westmead (Artur Darmanian, Senior Hospital Scientist) for aCGH analysis. Arrays were processed following the Agilent Oligonucleotide Array- Based CGH for Genomic DNA Analysis Protocol Version 5.0, (June 2007), using the direct method for sample preparation (Figure 5.3). Briefly, for each sample analysed, an equal amount of control human gDNA (Female 100ug (Promega) Cat: G1521, Male 100ug (Promega) Cat: G1471) was prepared. Genomic DNA was digested with AluI and RsaI restriction enzymes. Digested sample and control DNA were alternatively labelled with random primers and Cyanine 3-dUTP (Cy3) or Cyanine 5-dUTP (Cy5). Following clean-up of labelled gDNA, samples were prepared for hybridisation by mixing Cy3 and Cy5 labelled gDNA with blocking and hybridisation buffers. Samples and arrays were hybridised to the arrays for 19 hours at 65 °C. After washing, slides were scanned using Agilent Microarray Scanner, model G2505B. Feature extraction was carried out using the software, Genomic Workbench Standard Edition version 5.0.14 (Agilent Technologies). 5 copy number and structural variation 117

Experimental sample gDNA Experimental sample gDNA ? ? Restriction digestion of gDNA Restriction digestion of gDNA ? ? Labelling of gDNA Labelling of gDNA ? ? Purification of labelled gDNA Purification of labelled gDNA

? Preparation before hybridisation ? 19-hour hybridisation (65 °C) ? Microarray washing ? Microarray scanning ? Feature extraction

Figure 5.3: Workflow for Agilent aCGH using the Direct method.

5.2.3 Software Nexus 5

Feature extracted aCGH data was segmented with Nexus 5.0 (BioDiscovery) using the rank segmentation algorithm. The significance threshold for segmen- tation was set at 10-6 and the minimal number of probes required per segment was 3. Copy number gains and loss were defined with a log2 Cy3/Cy5 ratio of 0.2 and -0.2 respectively. High gains and homozygous losses were defined with a ratio of 0.6 and -0.7 respectively.

5.2.3.1 UCSC genome browser

Copy number variation coordinates were exported from the Nexus 5 software in bedGraph file format using the Human Mar. 2006 (NCBI36/hg18) assembly reference then uploaded to the UCSC Genome browser [http://genome.ucsc. edu, 210]. CNVs were compared to the following UCSC genome browser tracks: 5 copy number and structural variation 118

UCSC Genes, Database of Genomic Variants: Structural Variation and Segmental Duplications.

5.2.3.2 CHOP Copy Number Variation project

Copy number variations identified with Nexux 5 software were checked against the Copy Number Variation project at the Children’s Hospital of Philadelphia (CHOP) (CNV.CHOP.org), a database of CNVs from over 2000 healthy children. The UCSC liftover tool (http://genome.ucsc.edu) [210] was used to convert the coordinates of CNVs from NCBI36/hg18 format generated by the Nexux 5 software, to the NCBI35/hg17 format used by the CHOP Copy Number Variation project.

5.2.4 PCR confirmation of deletions

PCR amplicons were designed to bridge identified CNV based on the coordi- nates generated from Nexus 5 software with primers designed to sequences flanking each end of the identified CNV. PCR and sequencing protocols are described in Chapter 2. Sequence generated from PCR bridging deletions was aligned to the Human Mar. 2006 (NCBI36/hg18) assembly reference using the BLAT alignment tool from the UCSC genome browser [198].

5.3 results

5.3.1 Cytogenetic analysis

Cytogenetic analysis was carried out to investigate if microscopic structural variations are present in affected individuals in CMT54. Peripheral blood was collected from clinically affected individual III:14 (Figure 3.5) and forwarded 5 copy number and structural variation 119 to the Department of Cytogenetics, the Children’s Hospital at Westmead (Dr C Rudduck, Cytogeneticist). Five Trypsin (GTL) banded metaphases were counted from 10 out of a total of 15 cells. Trypsin (GTL) banding showed a female karyotype (46, XX) with no abnormalities detected.

5.3.2 CNV analysis at 7q34-q36 using aCGH

The 7q34-q36 dHMN1 disease interval was investigated in detail for deletions and duplications using Agilent 60-mer oligonucleotide microarray CGH for genomic DNA analysis. Three affected individuals and one married in control individual from pedigree CMT54 were analysed. Based on the haplotype analysis at 7q34-q36 carried out previously, the most genetically distant family members were selected. The three affected individuals (Figure 3.5: II:1, III:9, III:12) carry the disease chromosome and a unique alternate chromosome at 7q34-q36. The healthy control individual (Figure 3.5: II:7) is the married in parent of affected individual III:9 and therefore shares the non-disease chromosome at 7q34-q36.

Using the Nexus 5 software a total of 89 unique CNVs were identified within the 7q34-q36 interval screened at high resolution (Appendix I). These CNVs included 40 duplications and 49 deletions. These CNVs had a mean size of 3847 bp, with a range from 4 bp to 110312 bp. The size distribution of identified CNV, showed most CNVs were sub-microscopic (< 3 kb) with the largest proportion less than 500 bp (Figure 5.4). This matched the expected distribution, with smaller CNVs more common [297].

The location of the CNVs were plotted against the 7q34-q36 interval for the three affected individuals and the control individual (Figure 5.5). Two heterozygous duplications and six heterozygous deletions were common to the three affected individuals. The coordinates of the ends of the CNV were 5 copy number and structural variation 120

50 45

alls 40 35 30 25 20 15 10

Percentage of of CNV c Percentage 5 0

1.5k 2.5k 3.5k 4.5k 5.5k 6.5k 7.5k 8.5k 9.5k 10k 30k 50k 70k 90k 500.0 110k CNV size (bp)

Figure 5.4: The proportion of CNV calls within each size range. The CNVs were counted in 500 bp blocks up to 10 kb, then counted in 10 kb blocks. The size distribution of the CNVs identified matches the expected distribution, with smaller CNVs more common. checked to ensure identity. CNVs that were not present in all three affected individuals were excluded. Of the eight CNVs identified in the three affected individuals, both duplications and two deletions were also present in the control individual and excluded. The remaining four deletions, labelled A, B, C and D were examined further (Figure 5.5).

5.3.2.1 Deletion A

Deletion A involves the loss of up to 178 kb of DNA from within the olfactory receptor gene cluster at 7q35. The size of the deletion is different between the three affected individuals; affected individual III:9 shows a single large deletion of 178 kb, whereas affected individuals II:1 and III:12 have two smaller deletions of 55.6 kb and 110.3 kb separated by a 12.1 kb gap, maintaining the same overall size as the deletion in III:9. The probe plot generated with the Nexux 5 software shows different Log2 cy3/cy5 ratios for the two deletion patterns in affected individuals (Figure 5.6). Healthy control individual II:7 shows slight gain in Log2 cy3/cy5 ratio across this region, but is below the threshold for a duplication call. 5 copy number and structural variation 121

Chromosome 7

5Mb Cytogenetic map 7q34 7q35 7q36.1 7q36.2 MPCI Genes DGV Struct Var

3 2 CNV 1 Affected Samples 1 2 3 Del A Del B Del C Del D CNV Healthy Sample

Figure 5.5: Array CGH analysis in CMT54. Top: Chromosome 7 ideogram depicting the location of the 7q34-q36 disease locus analysed using aCGH (UCSC genome browser). Middle: detail of the 7q34-q36 disease locus. Figure labels indicate the following: cytogenetic map, minimum probable candidate interval (green bar), known genes, and known CNV from DGV. Bottom: Detail of CNV identified with Nexux5 software for three affected individuals and one healthy married in parent from CMT54. The upper graph shows number of affected individuals carrying deletions (red bars) and duplications (green bars). The lower graph shows CNVs carried by the healthy married in parent. Four deletions labelled A-D are carried by all three affected individuals and not by the control individual. Two additional deletions and two duplications are carried by all three affected individuals as well as the control individual.

The Nexux 5 genome browser indicated that there is a reported CNV at this location. Examination of reported CNV from DGV using the UCSC genome browser showed several reported duplications, deletions and inversions at this location (Figure 5.7). In particular, two CNV reports match the coordinates of the 178 kb deletion seen in affected individual III:9 (DGV; variation_0726 and variation_2110). These reports identified both duplication and deletion of this interval. In addition, two deletions (DGV; variation_94938 and variation_94944) and two duplications (DGV; variation_82238 and variation_82244) matched the 55.6 kb and 110.3 kb deletions identified in affected individuals II:1 and III:12. Further complicating the situation, two small duplications were reported which match the size of the 12.1 kb gap (DGV: variation_70060 and variation_82243). It is possible that affected individuals II:1 and III:12 carry a duplication of the 12.1 kb interval in addition to the larger deletion A. However, additional 5 copy number and structural variation 122

55.6 110.3 kb

Figure 5.6: CNV probe plot around deletion A. Top: Genomic loca- tion, probe design and reported CNV’s. Main panels: CNV probe plots generated with Nexus software around dele- tion A. The log2 test- over-reference ratios of 178 kb hybridisation signals is plotted against genomic position. Each probe is represented by a blue dot and the size of dele- tion is indicated by the red scale bar. The upper three panels shown af- fected individuals II:1, III:9, and III:12, and the lower panel shows 55.6 110.3 kb healthy control individ- ual II:7. Affected in- dividual III:9 shows a 178 kb deletion and af- fected individuals II:1 and III:12 show two smaller deletions of 55.6 kb and 110.3 kb, in- dicated by the red scale bar below each probe plot. Control individ- ual II:7 has a slight gain across this region which does not reach significance. 5 copy number and structural variation 123

chr7 (q35) 13 21.11 31.1 33 q34 35

Figure 5.7: UCSC genome browser views of deletion A. Detail of deletion A and reported CNVs from DGV. The deletions in the affected individuals are shown as orange bars. No CNV is shown for the healthy control individual. UCSC genes and DGV structural variation tracks are shown. Colour in the DGV structural variation track indicates the type of variation relative to the reference: purple indicates inversions, red indicates a gain in size, blue indicates loss in size, and green indicates both a loss and a gain in size.

Scale 100 kb chr7: 143550000 143600000 143650000 143700000 - - - - UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics CTAGE4 OR2A1 FKSG35 FLJ43692 OR2A9P CTAGE4 FKSG35 OR2A42 ARHGEF5 FLJ43692 BC040701 FKSG35 ARHGEF5 OR2A7 OR2A9P ARHGEF5 LOC728377 Database of Genomic Variants: Structural Variation (CNV, Inversion, In/del) 3708 32962 37282 64974 94940 59845 4573 94936 70057 82237 82245 82235 37283 94934 2110 59622 70058 64976 70060 94945 31398 4571 4572 82243 70056 94939 0726 7612 70063 70059 70061 37567 82239 94944 70062 82244 94941 37284 82240 82241 94937 64975 39511 94935 82242 82236 94942 Affected (II:1) 94943 82238 Affected (III:12) 94938 Affected (III:9) Control (II:7) 5 copy number and structural variation 124 evidence would be required to confirm this. These size matches are likely due to the same array platform and probe set being used in the experiments which reported these CNVs as used in this experiment [311–313]. Numerous other reported CNVs overlap deletion A.

To confirm the break points of deletion A, three PCR primer sets were de- signed from the Nexuxs 5 software coordinates. Despite a robust optimisation protocol, no PCR products were amplified. This suggests deletion A may be more complex than indicated, or the deletion co-ordinates may be too far from the true break point to allow PCR amplification.

The interval affected by deletion A contains several olfactory receptor genes and one candidate gene ARHGEF5, located at the distal end of the deletion (Figure 5.7). Three alternate transcripts are shown for ARHGEF5, two of which are partly deleted and one entirely deleted. The deletion or duplication of ARHGEF5 has been identified in numerous healthy individuals, from different populations, in at least three studies [311–313]. This suggests that ARHGEF5 CNV is not involved in the pathogenesis of dHMN linked to 7q34-q36 and deletion A is highly likely to be a common population variant.

5.3.2.2 Deletion B

Deletion B involves the loss of 5.4 kb of DNA from 7q35. The size of this deletion is between 5.4 kb and 5.2 kb for the three affected individuals. The probe plot generated with the Nexus 5 software shows the Log2 cy3/cy5 ration varies between the three affected individuals (Figure 5.8). The healthy control individual also shows a reduction in the Log2 cy3/cy5 ratio which is below the threshold for a deletion call. The Nexus 5 genome browser shows a reported CNV in this location (Figure 5.9 top panel). Examination of reported CNV from DGV using the UCSC genome browser showed two deletions of similar size to deletion B (Figure 5.9). Furthermore, no genes are located within or 5 copy number and structural variation 125 nearby this deletion. Therefore deletion B is likely to be a normal population variant and non-pathogenic in CMT54.

5.3.2.3 Deletion C

Deletion C involves the loss of intronic sequence from CNTNAP2, 14Kb down- stream and 80Kb upstream of the closest two exons. While CNTNAP2 spans the proximal end of the minimum probable candidate interval in CMT54, this deletion is located outside the minimum probable candidate interval. The size of the deletion is 1.4-1.5 kb in the three affected individuals and is absent from the control individual. The probe plot generated with the Nexux 5 software shows an equal reduction in the Log2 cy3/cy5 ratio between the three affected individuals and no reduction in the control individual. No CNV was reported in this location by the Nexus genome browser, however, a single deletion at this location was reported in DGV (DGV; variation_64982) (Figure 5.11).

To check for segregation in CMT54 and screen additional healthy controls, PCR primers were designed flanking deletion C based on the coordinates from the Nexus 5 software. The PCR primers were located outside the deletion, with an expected amplicon size of ∼2 Kb for normal chromosomes and ∼600 bp for chromosomes with the deletion. Amplification from affected individuals in CMT54 produced only the 600 bp PCR product. The 2 kb PCR product was not seen in any affected individuals from CMT54 or 36 control samples indicating the 2 kb product was not amplified by the PCR as opposed to the deletion being homozygous. To confirm the specificity of the PCR, the 600 bp PCR product from an affected individual was sequenced and aligned to the Human Mar. 2006 (NCBI36/hg18) assembly using the BLAT tool from the UCSC genome browser. The sequence aligned with 100% identity to the flanking sequence around deletion C (Figure 5.11). This alignment confirmed the size of the deletion as 1098 bp, smaller than indicated by the Nexux 5 software, possibly due to a lower number of probes at the distal end of the deletion (Figure 5.11). 5 copy number and structural variation 126

5.4 kb

Figure 5.8: CNV probe plot around deletion B. Top: Genomic loca- tion, probe design and reported CNV’s. Main panels: CNV probe plots generated with Nexus software around dele- 5.2 kb tion B. The log2 test- over-reference ratios of hybridisation signals is plotted against genomic position. Each probe is represented by a blue dot and the size of dele- tion is indicated by the red scale bar. The upper three panels shown af- fected individuals II:1, 5.4 kb III:9, and III:12, and the lower panel shows healthy control individ- ual II:7. The three af- fected individuals show a common deletion of 5.2 - 5.4 kb. Control in- dividual II:7 shows a decrease in the Log2 cy3/cy5 ratio which is below the threshold for a deletion call. The probe coverage in this region is shown in the probe design above. 5 copy number and structural variation 127

chr7 (q35) 33 34 35

Figure 5.9: UCSC genome browser views of deletion B. Detail of deletion B and reported CNVs from DGV. The deletions in the affected individuals are shown as orange bars. No CNV is shown for the healthy control individual. UCSC genes and DGV structural variation tracks are shown. Colour in the DGV structural variation track indicates the type of variation relative to the reference: blue indicates loss in size.

Using this PCR as a test, the deletion was present in all 10 affected individuals in CMT54 and the unaffected individual III:3 who carries the proximal end of the disease haplotype after a genetic recombination event. None of the remaining unaffected family members or spouses from CMT54 carried the deletion. A screen of 36 unrelated healthy control individuals identified three individuals with deletion C (Figure 5.12). With the deletion reported in DGV and present in three of 36 healthy controls, deletion C was excluded as a normal population variant.

5.3.2.4 Deletion D

Deletion D involves a loss of sequence from an intragenic region within the minimum probable candidate interval in CMT54. The closest gene is ACTR3C ∼93 Kb downstream. The size of this deletion ranges from 1.9 kb to 2.6 kb in the three affected individuals. The probe plot generated with the Nexus 5 software shows a variable Log2 cy3/cy5 ratio for the deletion between the three affected individuals (Figure 5.13). In addition, the healthy control individual also shows a reduction in the Log2 cy3/cy5 ratio, but is below the threshold for a deletion call. In this location, the Nexus 5 genome browser showed a known CNV and DGV contained 5 deletions, 1 insertion and 1 insertion and

Scale 2 kb chr7: 144548000 144549000 144550000 144551000 144552000 144553000 - Affected (II:1) - Affected (III:12) - Affected (III:9) - Control (II:7) UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics Database of Genomic Variants: Structural Variation (CNV, Inversion, In/del) 6578 64979 5 copy number and structural variation 128

1.4 kb

Figure 5.10: CNV probe plot around deletion C. Top: Genomic loca- tion, probe design and reported CNV’s. Main panels: CNV probe plots generated with Nexux 1.5 kb software around dele- tion C. Each blue dot represents a probe. The Y-axis represents the log2 test-over-reference ratios of hybridisation signals. The X-axis rep- resents the genomic po- sition. The upper three panels shown affected individuals II:1, III:9, 1.4 kb and III:12, and the lower panel shows con- trol individual II:7. Af- fected individuals show a 1.4-1.5 kb deletion, in- dicated by the red scale bar below each probe plot. Control individ- ual II:7 does not show any CNV in this region. The probe coverage in this region is shown in the probe design above. There are no reported CNVs in this region. 5 copy number and structural variation 129

chr7 (q35) 33 34 35 A. Scale chr7:

Your Sequence from Blat Search B.

Figure 5.11: (A) UCSC genome browser views of deletion C. Detail of deletion C and reported CNVs from DGV. The deletions in the affected individuals are shown as orange bars. No CNV is shown for the healthy control individual. UCSC genes and DGV structural variation tracks are shown. Colour in the DGV structural variation track indicates the type of variation relative to the reference: blue indicates loss in size. (B) Characterisation of break points of deletion C. PCR was targeted to the deletion allele and sequenced in two affected individuals. A BLAT search on the Human Mar. 2006 (NCBI36/hg18) assembly18 aligned the sequence to the position of deletion C. The deleted area is the arrowed line joining the two sequence blocks. All three CNV results from the NEXUS CNV analysis over reported the true size of the deletion (1098bp) due to a low density of probes at the 5’ end of the deletion.

M 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

M 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38

Figure 5.12: Genotyping of control individuals for deletion C. Lanes marked M are HyperLadder IV DNA size marker, lanes 1 and 2 are CMT54 positive controls, lane 3 is the PCR negative control, Lanes 4-37 are healthy control samples, and lane 38 is samples 29-37 pooled before PCR. Normal controls 21, 23 and 34 are positive for deletion C, indicating it is a normal population variant. 5 copy number and structural variation 130 deletion (Figure 5.14). Therefore, deletion D is a normal population variant and non-pathogenic in CMT54.

5.4 discussion

5.4.1 Summary

While CNV is a known mechanism in dHMN and the related peripheral neu- ropathy CMT1A, the evidence presented here suggests that CNV is not the cause of disease in this instance. Cytogenetic analysis at a microscopic reso- lution did not indicate any deletion, duplication or other structural variation present at the 7q34-q36 interval. CNV analysis at a sub-microscopic resolution using aCGH identified 89 unique duplications and deletions. Four deletions were present in the three affected family members and not the married in healthy control. However, no putative pathogenic link could be established for these deletions and all were excluded. These results are in agreement with the heterozygosity observed for microsatellite markers across the 7q34-q36 inter- val (Chapter 3). Microsatellite markers can detect duplications by observing increased allele dosage or number of alleles, whereas deletions are observed by loss of heterozygosity. No unexplained homozygosity or additional alleles were observed during microsatellite analysis.

5.4.2 Comparison of methods for CNV analysis

None of the CNV identified using aCGH were reported by the cytogenetic analysis. The majority of the CNV identified by aCGH was sub-microscopic, with the majority of CNV below 1.5 kb. The larger CNV identified ranged in size from 12-110 kb, well below the 1-10 Mb resolution observable by 5 copy number and structural variation 131

1.9 kb

Figure 5.13: CNV probe plot around deletion D. Top: Genomic loca- tion, probe design and reported CNV’s. Main panels: CNV probe plots generated with Nexux software around dele- tion D. Each blue dot 2.5 kb represents a probe. The Y-axis represents the log2 test-over-reference ratios of hybridisation signals. The X-axis rep- resents the genomic po- sition. The upper three panels shown affected individuals II:1, III:9, and III:12, and the lower panel shows con- 2.6 kb trol individual II:7. Af- fected individuals show a 1.9-2.6 kb deletion, in- dicated by the red scale bar below each probe plot. Control individual II:7 shows a decrease in the Log2 ratio, how- ever, this does not reach the cut-off, but does suggest this individual may carry this deletion. The probe coverage in this region is shown in the probe design above. 5 copy number and structural variation 132

chr7 (q36.1) 33 34 35

Figure 5.14: UCSC genome browser views of deletion D. Detail of deletion D and reported CNVs from DGV. The deletions in the affected individuals are shown as orange bars. No CNV is shown for the healthy control individual. UCSC genes and DGV structural variation tracks are shown. Colour in the DGV structural variation track indicates the type of variation relative to the reference: red indicates a gain in size, blue indicates loss in size, and green indicates both a loss and a gain in size. The yellow bar shows a segmental duplication on 7q36.1.

cytogenetic analysis. Therefore the karyotyping and CNV analysis are both in agreement.

The methods used here sufficiently overlap in resolution to conclude that microscopic structural variation and all scale of insertions and deletions are unlikely to be responsible. There remains the possibility that sub-microscopic balanced structural variation may be present within the 7q34-q36 interval. The resolution of the custom oligonucleotide based aCGH used here was high enough in most places to identify CNV and overlaps with resolution to detect indels (CNV) with direct sequencing of exons.

The NGS data generated by targeted sequence capture was not amenable to structural variation analysis with the computer resources available. The analysis of this data for evidence of structural variation including inversions and insertions or the characterisation of the breakpoints of the identified structural variation to a base pair resolution should be perused if sufficiently powerful computing resources become available to the research group.

Scale 5 kb chr7: 149505000 149510000 - Affected (II:1) - Affected (III:12) - Affected (III:9) - Control (II:7) UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics Database of Genomic Variants: Structural Variation (CNV, Inversion, In/del) 3709 2716 36647 64984 94959 32964 94960 Duplications of >1000 Bases of Non-RepeatMasked Sequence chr7:153033636 5 copy number and structural variation 133

5.4.3 Limitations of array-based CGH

The CNVs reported in databases have been identified from individuals con- sidered to be clinically normal. It is possible that apparently normal/healthy individuals will later develop disease. The larger the cohorts the greater the chance of including asymptomatic carriers of disease alleles. This has occurred in the case of the duplication that causes CMT1A. This duplication has been reported in a population based study of CNV [314] where the deletion was found in apparently normal individuals. This is also likely to be a concern with the CHOP variation database which has used healthy children as a sam- ple group. This limits the usefulness of these as controls to exclude CNV as pathogenic. To use reported CNVs as controls for excluding pathogenic CNV, the CNV needs to be either; identified in multiple individuals, at a rate above the frequency of the disease in the general population; or from a control group that has been clinically examined and is not at risk of developing the disease. The prevalence of autosomal dominant dHMN is much lower than CMT1A (20 per 100 000) and CMT2 (2 per 100 000)[315], so reported CNV are suitable for filtering CNVs in dHMN. The size of reported CNVs also needs to be considered when filtering po- tential CNV mutations. The size of reported CNVs is generally over-estimated due to the limited resolution of the techniques used to create initial CNV maps [295]. The large CNVs reported from platforms with limited resolution may contain several smaller CNVs. The sequencing of deletion C here, shows that with a high probe density, the size of the deletion was only slightly over represented by the aCGH results. Thus, the size of CNVs within 7q34-q36 is relatively accurate and can be compared to reported CNVs identified with similar platforms. NEXT GENERATION SEQUENCING OF CMT54 6

6.1 introduction

The recent development of new sequencing technologies has revolutionised the strategies for the identification of genetic defects underlying disease. These new sequencing technologies are collectively referred to as next generation sequencing (NGS) and encompass; DNA template preparation, sequencing, data handling and bioinformatics [316]. NGS can generate very large amounts of sequence data. This has led to a shift in focus from data generation (template preparation and sequencing) to data analysis and validation. NGS is particular- ity useful for the identification of disease genes and for targeted re-sequencing of disease intervals defined by linkage analysis or genome-wide association studies (GWAS). The development of NGS continues to rapidly advance and associated costs have dropped so that NGS has become increasingly accessible. This chapter details the application of NGS for the identification of genetic variation in family CMT54 at both the 7q34-q36 dHMN disease interval and elsewhere in the genome.

6.1.1 NGS technology

Multiple NGS platforms were used in this study. As described below, NGS platforms vary in the methods and biochemistry used in each step. For detailed

134 6 next generation sequencing of cmt54 135 descriptions of currently available NGS platforms see reviews by Metzker [316] and Shendure and Ji [317].

6.1.1.1 Template preparation

For the NGS platforms used here, template preparation, or library genera- tion, involves the fragmentation of template DNA followed by immobilisation on solid supports. The immobilisation of template fragments is the key to achieving the massive parallel sequencing required to achieve high sequencing throughput and low costs. For the detection of fluorescent sequencing sig- nals, most instruments require amplification of the DNA template fragments. Template amplification is generally achieved by emulsion or solid phase PCR. Both approaches use an adapter ligated shotgun library and a single-primer set, multi-template PCR. Physical separation of template fragments is used to maintain single template specificity in PCRs, otherwise known as clonal amplification.

For example, the 454 sequencing system (Roche) uses a water in oil emulsion PCR where the template fragment concentration is controlled to achieve, at most, a single template molecule in each emulsion reaction compartment [318]. PCR products are captured to micron-scale beads included in the emulsion. After breaking the emulsion each bead contains PCR products from a single template fragment (Figure 6.1 a). Beads are immobilised by chemical attach- ment to a glass slide or picotitre plate. Alternatively, the Solexa sequencing system (Illumina) uses bridge PCR, with PCR primers fixed to a solid support. Adapter ligated template fragments are bound to complementary probes on a sequencing slide. The concentration of template fragments is controlled to achieve physical separation of initial template fragments. After PCR amplifica- tion, template clusters contain copies of a single template fragment (Figure 6.1 b). Increased instrument sensitivity allows the Helicos sequencing systems (Helicos Biosciences) to sequence from single-molecule templates, avoiding the 6 next generation sequencing of cmt54 136

Figure 6.1: Clonal amplification of immobilised template DNA in NGS. (a) Water- in-oil emulsion PCR. An adapter-ligated shotgun library is PCR amplified within emulsion compartments. One PCR primer is attached to the surface of micron scale beads included in the reaction. The concentration of template and beads is adjusted to achieve one template molecule per compartment. After breaking the emulsion, beads contain only one type of template molecule which are then fixed to a slide or picotitre plate. (b) Bridge PCR. Both PCR primers densely coat the surface of the solid support slide. The initial concentration of the adapter-linked shotgun library is controlled to achieve physical separation between template molecules. PCR products remain tethered near the original template fragment resulting in clonal clusters of PCR products from single templates. Figure adapted from [317]. need for template amplification. As no amplification is needed, templates are bound directly to supports with a primer or other adapter probe.

6.1.1.2 Sequencing Biochemistries

The NGS platforms used in this thesis are based on pyrosequencing and reversible terminator sequencing biochemistries. Several other sequencing biochemistries are currently available or under development and include sequencing by ligation [319], semiconductor based sequencing [320] and real- time sequencing [321].

Pyrosequencing is based on the detection of the pyrophosphate released upon nucleotide incorporation during strand extension by DNA polymerase [322, 323]. In brief, one of the four dNTP types is added to the sequencing reaction and pyrophosphate released stoichiometrically is detected by chemi- luminescence through a a set of enzymatic reactions using sulfurylase and luciferase. In this reaction, pyrophosphate is converted to ATP by sulfurylase, 6 next generation sequencing of cmt54 137 which in turn is used by luciferase in the conversion of luciferin to oxyluciferin, generating light in proportion to the amount of available ATP. Nucleotides are sequentially cycled to generate a fluorescence flow diagram from which the sequence can be read [324].

Reversible terminator based sequencing utilises cyclic addition of dNTPs modified with a reversible terminator capping the 3’-OH and fluorescent tag used in a similar way to dideoxynucleotides (ddNTPs) used in Sanger sequencing [316]. In brief, after each nucleotide extension and imaging, excess dNTPs are washed off and the fluorescent label and the strand terminator are cleaved, restoring the 3’-OH, allowing extension in the subsequent extension cycle. NGS systems using reversible terminators have been developed using both four color individually tagged dNTPs (used here) [325] or single color tags with sequential extension [326].

6.1.1.3 Data handling and bioinformatics

The current NGS instruments produce short sequence reads from the the end(s) of longer DNA fragments making up the sequencing library. Read lengths produced by current instruments include ’short-reads’ of 25-100 base pair (bp) or ’long-reads’ of 250-800 bp [316]. Reads can be made from a single end of a DNA fragment or can be read from both ends as ’paired-end’ reads. Paired-end reads have advantages in assembly accuracy over single reads, in particular for repeat and duplicated sequences [327]. This is a greater limitation for short- reads than long-reads. Sequence reads are aligned to form longer sequence contigs by mapping to a reference genome [328] or de novo assembly [329]. For disease gene discovery, further bioinformatics may include, sequence variant calling, annotation (genes, exons, known SNPs), copy number variation, and segregation/inheritance analysis. 6 next generation sequencing of cmt54 138

6.1.2 NGS of large genomes

NGS is ideally suited to whole genome sequencing, particularly for smaller prokaryote genomes [318]. While this is the ultimate approach for identifying sequence variation within an individual, this remains costly in large genomes, especially where higher read depths are required for accurate identification of heterozygous variants [330]. To overcome this, several non-cloning based approaches to genomic partitioning have been developed to enrich for genomic regions of interest suitable for NGS. These include pooled PCR products, multiplex PCR, and hybridisation sequence capture [331]. PCR based targeted enrichment still requires a large preparation workload and is best suited to small numbers of targets.

Hybridisation based sequence capture, either array-based or solution-based, has allowed the generation of DNA libraries targeting specific genomic in- tervals at an effective speed and cost [332]. Hybridisation sequence capture is based on the construction of a small-fragment DNA library of defined se- quences of interest to which sample DNA is hybridised; thus allowing for uniform recovery of sequences of interest [333]. This was demonstrated by the targeted capture of all known coding exons of the human genome [332]. The protein coding portion of the genome is referred to as the exome. Using larger standardised arrays has reduced the cost of exome capture and sequencing [334]. In addition, as most known disease causing mutations are located in coding regions, the exome is a high yield target for research [324]. Exome sequencing has recently become a popular approach for the identification of Mendelian disease mutations [335]. 6 next generation sequencing of cmt54 139

6.1.3 Applications of NGS to Mendelian disease

With many human genomes now sequenced, the number of known single bp sequence variants or small indels is over ∼3.4 million [336]. An examining of only the protein coding portion of the genome identified an average of 17 272 coding variants per individual [334]. As mutations causing rare Mendelian diseases are expected to be highly penetrant, SNPs that are common can be excluded. The majority of sequence variants within an individual are common SNPs. Studies have shown that around 92% of coding variants identified or 82% of all sequence variants, were previously reported in SNP databases [334, 336, 337]. The identification of a single highly-penetrant disease causing variant from several thousand variants requires a comparison of several individuals and filtering strategies to reduce the number of variants to be validated.

The first successful application of NGS to the identification of a mutation in a Mendelian disease was in the recessive disease, Miller syndrome [338]. Four affected individuals including two siblings and two other affected individuals from three families were exome sequenced. Under an autosomal recessive model, only one candidate gene was shared between individuals from the three families. Compound heterozygous mutations were identified in the gene DHODH in all affected individuals. Confirmation with Sanger sequencing showed that the mutations were inherited in a autosomal recessive manner, and parents were heterozygous carriers of the mutations. DHODH mutations were subsequently identified in an additional three Miller syndrome families using Sanger sequencing.

Combining sibling pairs and unrelated families is effective in recessive diseases with low genetic heterogeneity. Autosomal dominant diseases are generally more genetically heterogeneous, including dHMN and other neu- ropathies where each gene mutation is only responsible for a small proportion 6 next generation sequencing of cmt54 140 of cases. As such, unrelated individuals are less likely to have mutations in the same gene. Multiple individuals from a single pedigree are therefore required for comparison. In the analysis of Miller syndrome by Ng et al. [338], both autosomal recessive and dominant modes of inheritance were considered; using a dominant model increased the number of potential gene variants by a factor of 14-25 × compared to the recessive model. Due to the shared genetic background of individuals in a single pedigree, there is a greater number of genetic variants segregating with the disease by chance. In a practical sense, an autosomal dominant disease requires the comparison of four related affected individuals to achieve a similar number of candidate variants as a recessive disease comparing three individuals [334, 338]. The number of potential vari- ants does not significantly decrease by adding additional affected individuals above 4 or 3 for dominant and recessive modes of inheritance respectively.

In the Miller syndrome analysis, comparison to eight unrelated control individuals was a more effective way of reducing the total number of sequence variants [338]. Under the autosomal dominant disease model, comparing two affected individuals and 8 unrelated controls achieved a similar number of novel variants as comparing only the four affected individuals. However, with rapidly expanding public sequence variation databases, the effect of comparing additional control individuals will be less.

Additional strategies for filtering out SNPs unrelated to disease have been employed. Various software can predict the effect of a sequence variant on protein function and implicate those likely to be damaging. However, there is a risk of falsely excluding a true mutation due to inaccurate predictions [338]. Where the disease in a large pedigree has been mapped to a chromosomal locus, combining NGS with mapping information can greatly reduce the number of novel candidate variants. Sequencing of fewer individuals is required to identify a disease mutation [339]. Based on the reported rates of variation and linkage disequilibrium, ∼4 novel non-synonymous, splice site or indel 6 next generation sequencing of cmt54 141

(NS/SS/Indel) variants would be expected within a disease interval containing 100 genes when sequencing one affected individual [339].

6.1.3.1 Hypothesis and aims

It is hypothesised that dHMN seen in family CMT54 is caused by a single gene mutation within the linked region on chromosome 7q34-q36. With a defined disease locus containing a large number of candidate genes, the use of NGS has the potential to identify the disease gene mutation in CMT54. The aim of this chapter is to:

• Capture and sequence genomic regions of interest in family CMT54 using NGS in order to identify the genetic variant causing dHMN.

6.2 materials and methods

6.2.1 NimbleGen array-based targeted sequence capture

6.2.1.1 NimbleGen sequence capture array design

Two custom 385K 1-plex arrays were designed and manufactured by Roche NimbleGen to capture genomic sequence from the 7q34-q36 disease interval. The 12.98 Mb interval was defined by genetic recombinations in affected individuals in CMT54 (NCBI36/hg18 assembly coordinates; chr7:141 054 291- 154 034 394). Unique probes were designed to 13 348 tiled regions covering 9.03 Mb of sequence within the target interval. Probe tiling excluded repetitive DNA unsuitable for capture.

6.2.1.2 NimbleGen DNA sequence capture

Genomic DNA was purified from peripheral blood following the protocol described previously (Section 2.3.1). DNA was tested for purity by UV spec- 6 next generation sequencing of cmt54 142 trophotometer and degradation by agarose gel electrophoresis according to NimbleGen quality control (QC) requirements. DNA (∼ 40 µg) was forwarded to Roche Diagnostics (Asia Pacific) for sequence capture using the custom 385K arrays, following standard manufacturer protocol (Roche NimbleGen). In brief, approximately 20 µg of genomic DNA was fragmented to a size range of 300-500 bp then purified. Adapters were ligated to the DNA fragments using T4 DNA Ligase and small fragments (< 100 bp) removed. The library was hybridized to the custom 385K arrays. Un-bound DNA fragments were removed by washing followed by elution of target DNA fragments. Eluted target DNA fragments were PCR amplified using primers complementary to the adapter sequence. The amplified DNA library was used in subsequent sequencing reactions.

6.2.2 454 GS FLX sequencing

The Roche NimbleGen target captured DNA was forwarded to Australian Genome Research Facility (AGRF) for sequencing using the Roche GS FLX (454) sequencing platform according to the manufacturer’s protocol. Sequence reads were aligned to the Human Mar. 2006 (NCBI36/hg18) assembly using gsMapper software (Roche). Identification of sequence variants required at least two non-duplicate reads with at least 5 bp of sequence flanking the variant. Sequence variants were defined as ’high-confidence’ if they were identified in three non-duplicate reads and both forward and reverse directions. Variants were annotated to the SNP database, dbSNP build 129. 6 next generation sequencing of cmt54 143

6.2.3 NimbleGen array-based exome capture and Solexa sequencing

Genomic DNA from two affected individuals and one ’married in’ unaffected parent was forwarded to BGI (Hong Kong) for exome capture and sequenc- ing. Exome capture was carried out using the NimbleGen Sequence Capture Human Exome 2.1M Array following the manufacturers protocol (Roche Nim- bleGen). In brief, genomic DNA was randomly fragmented to an average size of 500 bp then purified. Adapters were ligated to the DNA fragments using T4 DNA Ligase and small fragments (< 100 bp) removed. The library was hybridized to a NimbleGen 2.1M Human exome array. Unbound DNA frag- ments were removed by washing followed by elution of target DNA fragments. Eluted target DNA fragments were randomly fragmented and Illumina com- patible adapters ligated. PCR amplification using primers complementary to the Illumina adapters was used to enrich the material for downstream Solexa sequencing.

Sequencing was carried out using an Illumina Genome analyser according to the manufacturers protocol. Adapter sequences within reads were removed and the remaining read sequence used only if longer than 29 bp (BGI local programming algorithm). Sequence reads were aligned to the Human Mar. 2006 (NCBI36/hg18) assembly using SOAP aligner software using one read end of paired end reads [340]. A maximum of 6 sequence variants were allowed per read. Consensus sequence and sequence variant calling was carried out using SOAPsnp software. Sequence variants were annotated to dbSNP build 129. 6 next generation sequencing of cmt54 144

6.2.4 Confirmation of sequence variants using Sanger sequencing

Sequence variants were confirmed by PCR amplification of selected exons and sequencing by Macrogen (Korea) as described in Section 2.3.6.

6.2.5 Sequence variant analysis using Galaxy Browser

Sequence variants were analysed using Galaxy Browser software (http:// galaxyproject.org/)[341, 342]. Sequence variants were annotated using ge- nomic coordinates for UCSC genes (UCSC genes track) downloaded from the UCSC table browser via Galaxy. Public sequence variation databases used included dbSNP 129, dbSNP 130, and dbSNP 132 downloaded via the UCSC table browser.

6.2.6 NGS assembly and annotation using Bowtie and SAMtools

Solexa sequencing reads were aligned to the Human Mar. 2006 (NCBI36/hg18) assembly using Bowtie software version 0.12.5 in SAM mode. SAMtools soft- ware version 0.1.8 was used to convert the Bowtie alignment to a pileup format which describes the base-pair information at each chromosomal posi- tion including SNV/indel calling, read-depth and quality scores. The Bowtie alignment was converted from bam to sam format, sorted, then indexed and output to a pileup formatted file. This version of Bowtie does not support indel calling.

6.2.6.1 Sequence coverage visualisation

A custom BAsH script was written to extract read-depth coverage data from SAMtools pileup format and create a bar graph in bedgraph format that 6 next generation sequencing of cmt54 145 can be visualised in the UCSC genome browser. This software allows the visualisation of sequencing coverage at a single base level in an annotated graphical environment. This script is shown in Appendix A.

6.2.7 Unequal coverage comparison

A custom BAsH script was written to check segregation of variants between samples using an autosomal dominant model of inheritance. The script inspects the read-depth at a chromosomal position before excluding variants. Only variants with 10 × coverage are used to exclude variants. Comparisons are based on the annotated position of variants as used in the UCSC genome browser and requires all samples to use the same genome reference. First, the software collates the reported variants from all samples. Segregation of each variant is checked between the samples. If the variant is not identified in a sample, the read-depth at the corresponding position is checked from the pileup file. Where the sequence read-depth is >= 10 × and no sequence variant is identified, the individual is considered as wt! (wt!) or equal to the annotation reference. Where read-depth at the position of a variant is below 10 × the variant is flagged as uninformative. This script is shown in Appendix A.

6.2.8 In silico prediction of functional impacts of sequence variants

The software tools used for prediction of functional impacts of sequence variants were: multiple sequence alignment (MSA) to identify conserved nu- cleotides; the sequence based prediction tools SIFT (http://blocks.fhcrc. org/sift/SIFT.html )[343] and PANTHER (http://www.pantherdb.org/tools/ csnpScoreForm.jsp)[344]; the Sequence and structure-based prediction tools

SNAP (http://www.rostlab.org/services/SNAP/)[345] and Polyphen (http: 6 next generation sequencing of cmt54 146

//genetics.bwh.harvard.edu/pph/)[346, 347]. Web interfaces were used for all tools.

6.3 results

6.3.1 Chromosome 7q34-q36 target region sequencing

The chromosome 7q34-q36 disease interval was sequenced in one affected individual using targeted sequence capture and 454 sequencing. Peripheral blood DNA from affected individual CMT54-III:14 (Figure 3.5), met quality control standards (A260/280 = 1.83 with a high molecular weight by agarose gel electrophoresis (AGE); and was captured using two 385K 1-plex arrays (NimbleGen). Capture probes were designed against all non repeat-masked sequences within the 7q34-q36 interval, covering 9.03 Mb of the 12.98 Mb region (69.63%).

Captured DNA was forwarded to Australian Genome Research Foundation (AGRF) Ltd. (Brisbane, Australia) for 454 sequencing and read alignment. A total of 1 063 758 QC passed sequence reads were generated as single-end reads with an average read length of 362 bp (Table 6.1). These reads were mapped against the Human Mar. 2006 (NCBI36/hg18) assembly. Approximately 46% of reads mapped to the 7q34-q36 interval, ∼53% mapped to regions outside 7q34-q36 or mapped to multiple locations (repeat-mapped) and 0.54% were un- mapped. The repeat-mapped and unmapped reads were not used for sequence variant identification. Reads mapping to the 7q34-q36 interval generated 5186 sequence contigs covering 9 501 821 bp (73.19% of the 7q34-q36 interval) with an average of 18.84 × coverage. As DNA fragments and sequencing reads are longer than the capture probes, sequence contigs cover 3.56% more of the 7q34-q36 disease interval than the area tiled by array capture probes. A total 6 next generation sequencing of cmt54 147

Table 6.1: Summary statistics for targeted sequencing of CMT54 family member III:14. Sequencing reads 7q34-q36 coverage Total Uniquely 7q34-q36 Called 7q34-q36 Depth† reads∗ mapped mapped bases coverage‡ 1 063 758 757 413 494 387 178 999 610 18.84 × 71.73% ∗ Reads uniquely mapping to the Human Mar. 2006 (NCBI36/hg18) assembly (chr7 = 558261 reads and other chromosomes = 199152 reads). † Mean sequence depth across all sequence contigs mapping to 7q34-q36. ‡ Percentage of 7q34-q36 covered by sequence contigs. of 13 550 high confidence variants and 22 680 lower confidence variants were identified within the 7q34-q36 interval.

6.3.2 7q34-q36 sequence variant analysis

The 13 550 high confidence sequence variants identified within the 7q34-q36 interval were filtered to identify putative pathogenic variants (Figure 6.2). After excluding all variants annotated as SNPs (dbSNP 129), there were 3193 novel variants. Novel variants were filtered using Galaxy browser software [341, 342] against genomic intervals from the Human Mar. 2006 (NCBI36/hg18) assembly to identify non-synonymous, splice site, 5’ UTR and 3’ UTR variants. Sequence variants identified included single nucleotide variations (SNVs) and small insertion or deletions (indels).

6.3.2.1 Novel non-synonymous variants

Twenty-four novel non-synonymous changes were identified (Table 6.2). Two variants were reported in an updated dbSNP database (dbSNP 130) and excluded. To identify false positives arising from reads mapping to repeat sequences, 20 bp of consensus sequence flanking each variant was aligned to the Human Mar. 2006 (NCBI36/hg18) assembly using the BLAT alignment tool [198]. Six variants were identified as part of chromosomal translocations, 6 next generation sequencing of cmt54 148

High confidence variants 13550 ? dbSNP 129 filtered variants 3193

? ? ? ? Exons 5’ UTR 3’ UTR Intronic splicing 40 7 9 13

? ? ? ? ? Non-synonymous Synonymous Splice site Regulatory 24 16 0 13

Figure 6.2: Filtering strategy for variants identified by targeted capture sequencing of 7q34-q36. The number of sequence variants at each filter step is indicated. with the variant present on the alternate chromosome as either the normal sequence or as a SNP at the corresponding position (Appendix H, Table H.2). These variants were excluded as false positives. The remaining 16 variants were re-sequenced using Sanger sequencing in all CMT54 family members: Four variants did not segregate with affected individuals and 12 variants were not present, indicating they were false positives from NGS.

6.3.2.2 Splicing variants

Sequence variants possibly affecting exon splicing were defined as: intronic variants within 50 bp, or exonic changes within 5 bp of intron-exon bound- aries. This covered variants affecting the core splice site motif, intronic splice enhancers and the exonic splicing motif. Thirteen novel intronic variants, and no exonic variants were identified (Table 6.3). One variant, CNTNAP2:c.3476- 15C>A, was previously excluded by Sanger sequencing of candidate genes (Section 4.3.3). A second variant was reported in an updated dbSNP database (dbSNP 130). These 12 variants were re-sequenced using Sanger sequencing in 6 next generation sequencing of cmt54 149

Table 6.2: Novel non-synonymous variants identified by targeted capture and se- quencing of the 7q34-q36 disease interval. Gene Variant AA Seq. Variant Validation change depth freq. result† PRSS2 c.416C>G S139C 5 0.6 FP PRSS2 c.449C>T S150F 6 0.67 FP ZNF398 c.732C>A P244Q 21 0.14 FP KRBA1 c.307G>A E103K 28 0.11 FP SSPO c.2286G>T L762F 16 0.62 rs73168055 SSPO c.9212_9213insC C3071R 11 1 rs71194663 SSPO c.14297C>G H4766D 10 0.6 SNP ns GIMAP5 c.914A>G H305R 28 0.11 SNP ns ABCF2 c.154_155insTT G52fs 15 0.2 FP ABCF2 c.151A>G T51G 15 0.27 FP ABCF2 c.147G>C E49D 14 0.29 FP ABCF2 c.143G>A R48K 13 0.31 FP ABCF2 c.126_129delCTTC K43fs 13 0.31 FP ABCF2 c.96T>C V32A 11 0.36 FP ABCF2 c.65G>A R22Q 10 0.4 FP ABCF2 c.59G>T R20L 10 0.4 FP MLL3 c.2689C>T R897X 3 1 Chr dup MLL3 c.2674G>A G892R 3 1 Chr dup MLL3 c.177_178delGA K60fs 12 0.25 SNP ns CCT8L1 c.532A>G N178D 4 0.75 Chr dup CCT8L1 c.619A>G T207A 4 0.75 Chr dup CCT8L1 c.626A>C H209P 4 0.75 Chr dup CCT8L1 c.700G>A A234T 4 0.75 Chr dup DPP6 iso2‡ c.283T>C S95P 27 0.11 SNP ns † Validation result indicates: SNP ns, the variant was validated in additional family mem- bers and did not segregate with the disease; FP, false positive, not reproduced indicate SNPs not identified when re-sequencing using Sanger sequencing of the affected individ- ual sequenced by next generation sequencing; Chr dup, the SNP is part of a chromosomal duplication as detailed in section 5.3.1.3; rs#, the variant is a validated SNP reported in dbSNP build 130. ‡ This change is reported in a short transcript of DPP6 isoform 2 (UCSC gene: uc0101qh.1). The transcript contains exons 1-3 with an additional 24 amino acids and stop codon following exon 3. 6 next generation sequencing of cmt54 150

Table 6.3: Novel sequence variants possibly affecting splicing identified within the 7q34-q36 interval by targeted DNA capture and 454 sequencing. Gene Variant Type Seq. Variant Validation depth freq. result† TRPV5 c.909+36T>A donor 18 0.17 FP CNTNAP2 c.1499-13T>C acceptor 28 0.11 FP CNTNAP2 c.3476-15C>A acceptor 20 0.45 +‡ ZNF398 c.776-18T>A acceptor 18 0.17 SNP ns GIMAP1 c.46-51_46-50delAG acceptor 26 0.12 FP PRKAG2 c.1437+46T>C donor 10 0.4 SNP ns, rs73156279 MLL3 c.1185-29C>G acceptor 6 0.67 FP MLL3 c.1185-9A>G acceptor 8 0.5 FP MLL3 c.2653-17G>A acceptor 3 1 FP MLL3 c.2653-37G>A acceptor 3 1 FP MLL3 c.3500-46C>T acceptor 4 0.75 FP MLL3 c.3500-23G>A acceptor 4 0.75 FP DPP6 c.457+13T>C donor 27 0.11 SNP ns † Validation result indicates: +, variant segregates with three affected individuals and not the control sample sequenced; SNP ns, variant was identified but did not segregate with the three affected and not the control individual. FP, false positive, variant was not identified in re-sequencing. rs#, the variant is a validated SNP reported in dbSNP build 130. The affected individual sequenced using NGS was one of the three affected individuals re-sequenced. ‡ Variant is a novel SNP characterised in (Chapter 4).

three affected and one unaffected CMT54 family members. Three variants did not segregate with the disease and nine variants were false positives.

6.3.2.3 5’ and 3’ UTR variants

Seven and nine novel variants were identified within the 5’-UTR and 3’-UTR of known genes respectively (Table 6.4). One variant, DPP6:c.*347A>G, was previously excluded by Sanger sequencing of candidate genes (Section 4.3.3). Re-sequencing of the remaining 15 variants using Sanger sequencing in three affected and one unaffected CMT54 family members excluded 4 variants which did not segregate with the disease and 10 variants as false positives. The variant 6 next generation sequencing of cmt54 151

Table 6.4: Novel variants within the 5’ and 3’ UTR of known genes in the 7q34-q36 disease interval. Gene Variant Seq. Variant Validation result depth freq. 5’ UTR sequence variants GIMAP7 c.-32G>A 18 0.33 SNP ns TMEM176B c.-170A>G 5 0.6 +, rs114949263 ABCF2 c.-18C>A 11 0.36 FP ABCF2 c.-31C>T 11 0.36 FP CSGLCA-T c.-429T>C 16 0.19 FP PRKAG2 c.-346_357del 21 0.43 FP PRKAG2 c.-414T>C 24 0.42 +, rs117690714 3’ UTR sequence variants CLEC5A c.*815_*816delTA 20 0.15 FP ZNF783 c.*2008T>G 23 0.13 FP ABCF2 c.*214G>A 22 0.27 FP ABCF2 c.*184G>A 21 0.29 FP ABCF2 c.*181C>T 22 0.27 FP ABCF2 c.*178G>A 22 0.27 FP ABCF2 c.*166G>A 7 0.57 SNP ns GALNT11 c.*329T>A 14 0.21 +, rs117690714 DPP6 c.*347A>G 23 0.57 + ‡ † Validation result indicates: SNP ns, the variant was validated in additional family members and did not segregate with the disease in CMT54; +, variant segregates with the disease in CMT54; FP, false positive, variant not identified when re-sequenced using Sanger sequencing; rs#, the variant is a validated SNP reported in dbSNP build 132. ‡ Variant is a novel SNP characterised in (Chapter 4).

GALNT11:c.T>A, which segregated with affected individuals from CMT54, was identified in an updated dbSNP database (dbSNP 132).

6.3.2.4 454 sequencing quality

No putative pathogenic mutation was identified from the novel sequence variants identified within the 7q34-q36 interval. Of the three novel variants pre- viously identified in affected individual III:14 using Sanger sequencing (Chap- ter 4), two were identified here (CNTNAP2:c.3476-15C>A and DPP6:c.*374A>G) and one variant was missed (ACCN3:c.915C>T). The ACCN3 variant was also 6 next generation sequencing of cmt54 152

Chr7

Contigs created by Roche NimbleGen 454 sequencing Contig 324 Contig 325 UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics

Figure 6.3: 454 sequencing contigs at the ACCN3 gene. Top: Chromosome 7 ideograph: The region indicated by the red bar is expanded below. Lower: UCSC Genome Browser image detailing detailing the two 454 sequence contigs that span the ACCN3 gene. The known SNV within exon 5 was missed, as both exons 5 and 6 are not covered by either contig 324 or 325. not identified in the lower confidence variant report. The ACCN3 gene was covered by two sequence contigs, ACCN3 with exons 5 and 6 in the gap be- tween these two contigs (Figure 6.3). The known SNV is located within exon 5, and hence, was not identified in the variant report. The low confidence variants were briefly examined for changes within good candidate genes with previous Sanger sequencing coverage. However, due to the high false positive rate encountered with these and the high confidence changes previously, this analysis was stopped. This sample was later re-sequenced with a higher quality (Section 6.3.4), minimising the potential that the true variant may have been reported here and missed.

Sequence variations within 7q34-q36 which were missed may be either in the gaps between sequence contigs or within contigs with low read-depth. The majority of gaps between the 5186 sequence contigs consist predominantly of short sequence gaps < 10 kb in length. There were 18 large sequence gaps > 10 Kb totalling 467 kb, 3.6% of the disease interval (Table H.1). Examination of the 18 large sequence gaps identified four gaps containing candidate genes: gap 4 contained the entire FAM115C gene and gaps 11, 15, 17 and 18 contained individual exons of the genes ARHGEF5, ACTR3B and DPP6 (Figure 6.4). The remaining large sequence gaps were intergenic.

To confirm that genomic intervals within sequence contigs are free of se- quence variants requires a minimum read-depth of 10 ×. This limits the 6 next generation sequencing of cmt54 153

chr7 Scale 5Mb chr7: 7q34-q36 sequencing gaps of > 10 kb GAP1 GAP8 GAP12 GAP15 GAP17 GAP2 GAP9 GAP13 GAP16 GAP3 GAP10 GAP14 GAP18 GAP4 GAP11 GAP5 GAP6 GAP7 Sequencing depth at reported variants 50 40 30 20 Depth (x) 10 0 Sequencing contigs

RefSeq Genes

Figure 6.4: UCSC genome browser view of the 7q34-q36 region showing sequencing coverage AGRF sequencing (NimbleGen). Top: Chromosome 7 ideograph: The 7q34- q36 target region indicated in the red box is expanded below. Lower): Detail of the 7q34-q36 target region. Gaps greater than 10 Kb are shown as black bars labelled 1 to 18. The sequencing read-depth at each reported variation is shown as a purple bar. Sequence contigs generated with 454 sequencing are shown in the black bar. RefSeq genes are shown compacted in dark blue. The read-depth analysis indicates not all contigs are sequenced at sufficient read-depth to rule out missed sequence variants.

possibility that all reads derive from a single chromosome, i.e. missing a se- quence variation on the alternate chromosome. The read-depth at identified variants ranged from 2 to 62× with an average of 15×. This is lower than the average sequence contig read-depth of 18.8× and below the sequencing target of 20×. As an indication of the variation seen in sequencing depth, the read-depth at the identified variants was plotted across the 7q34-q36 interval and visualised using the UCSC genome browser (Figure 6.4). This identified several regions with low sequencing read-depth.It is likely that additional sequence variants were missed due to contig gaps and localised intervals of low sequence read-depth. Therefore, this analysis can only report the identified variants. It is unable to determine if a genomic interval is free from variation. 6 next generation sequencing of cmt54 154

6.3.3 Off-target and repeat mapping reads

Repeat-mapped reads are those that map equally well to more than one ge- nomic location. Repeat-mapped reads and reads mapping uniquely to genomic regions outside the 7q34-q36 interval accounted for 53% of all reads. A fur- ther 18.7% of reads mapped uniquely to other chromosomes, 6% mapped uniquely to other areas on chromosome 7, and 29.66% were repeat-mapped. A proportion of the reads mapping outside the 7q34-q36 interval and to other chromosomes were included in the DNA capture design for QC purposes. Of the repeat-mapped reads, 95% mapped to multiple locations on chromosome 7 and 5% mapped to other chromosomes. Some of these off-target reads and repeat mapped reads may be the result of non-specific DNA sequence capture or capturing DNA fragments from recent segmental duplication (SD) into or out-of the 7q34-q36 interval.

A large amount of recent SD had been previously reported in the original finished reference sequence for chr7 [255]. Recent SD within the 7q34-q36 disease interval in the Human Mar. 2006 (NCBI36/hg18) assembly reference sequence was examined using the segmental duplications track (sequences of >= 1 kb and >= 90% identity) [348, 349] of the UCSC table browser [256] and Genome Pixelizer software [350]. Recent SDs at 7q34-q36 accounted for 2.53 Mb of DNA sequence, including: 86 intrachromosomal SDs totalling 1.97 Mb within the 7q34-q36 interval, 3 intrachromosomal SDs (8.04 kb) of chromosome 7 outside the 7q34-q36 interval, and 86 interchromosomal SDs totalling 0.56 Mb (Figure 6.5). Therefore recent SD is likely to be responsible for some of the off-target and repeat mapped reads obtained with targeted capture and 454 sequencing. 6 next generation sequencing of cmt54 155

1 2 3 4 5 6 7 8

7q34-q36

9 10 11 12 13 13-R 15 16 17 19 21 22 XY Figure 6.5: Recent segmental duplications of >= 1 kb and >= 90% identity at the chromosome 7q34-q36 disease locus in the Human Mar. 2006 (NCBI36/hg18) assembly reference genome. Intrachromosomal segmental duplications within the 7q34-q36 interval (blue), outside the 7q34-q36 interval (purple) and interchromosomal segmental duplications (red) are shown. The 7q34-q36 interval is magnified 100 × compared to chromosomes. Details of recent segmental duplications were downloaded from the segmental duplications track [348, 349] using the UCSC table browser [256]. The chart was generated using Genome Pixelizer [350].

6.3.4 Exome sequencing

Because substantial areas of the 7q34-q36 target region were not covered by targeted sequencing (above), the 7q34-q36 interval was re-examined in CMT54 using newly available exome sequencing. DNA from two affected individuals (II:2 and III:9, Figure 3.5) and an unaffected ’married-in’ parent (CMT54-II:8), was forwarded to BGI (Hong Kong) for exome capture and sequencing using the NimbleGen 2.1M Human Exome Array (Roche) and Solexa sequencing system (Illumina). The exome capture design targets 180 000 coding exons from the consensus coding sequence (CCDS) database [260] and ∼550 miRNA exons from miRBase [351]. Additionally, the NimbleGen target captured DNA sequenced previously (affected individual III:14) was resequenced by BGI using the Solexa sequencing system.

An average of 24.9 Gb of sequence was generated as paired-end, 74 bp reads for each sequenced sample. Sequences were aligned by BGI (Hong Kong) as single-end reads only. For the exome capture samples, this achieved an average of 43× coverage for 94% of the CCDS exons targeted (Table 6.5). For 6 next generation sequencing of cmt54 156 the 7q34-q36 targeted capture (sequenced as two samples) this achieved an average of 227× coverage for 99.5% of CCDS and miRNA exons within the 7q34-q36 interval.

6.3.4.1 Sequence variant analysis

An average of 30 526 sequence variants were identified within each exome sequenced in family CMT54 (Table 6.6). These variants were annotated and filtered using the online Galaxy Browser software. An average of 144 variants within the linked interval on 7q34-q36 were identified in the exome sequenced samples and 415 variants in the target captured sample. These variants were compared to the dbSNP 130 database to remove variants present in the popu- lation. Non-synonymous variants, were identified using open reading frame co-ordinates from the CCDS database. After filtering, 4-6 non-synonymous variants remained for each exome sequenced sample and 133 non-synonymous variants for the target region captured sample.

With an autosomal dominant model of inheritance, all affected individuals are expected to carry the same mutation, and be absent from ’married in’ unaffected individuals. Applying this to the data from the three affected and one unaffected samples analysed, no non-synonymous variants were identified within the 7q34-q36 interval. Extending this to all novel variants within the 7q34-q36 interval identified three intronic variants (Table 6.7). One variant was excluded with previous Sanger sequencing (Chapter 4) and the remaining two variants did not segregate with all affected individuals in CMT54 when re-sequenced using Sanger sequencing.

Of the three sequence variants previously shown to segregate with the disease in CMT54, only one was identified in this filtering approach (CNT- NAP2:c.3476-15C>A). The SNV ACCN3:c.915C>T was identified within the 7q34-q36 target capture sequenced sample only. The variant DPP6:c.*374A>G was not reported in any of the samples as this variant is within an alternate 6 next generation sequencing of cmt54 157 94.18% 93.28% 94.28% 99.41% 23.03% × × × × × Total Sequencing bases Coverage reads Total Mapped Exome mapped Depth Exome 17 743 070 2 625 974 360 1 814 973 684 10 175 693 226.91 15 865 300 2 348 064 400 1 646 449 085 17 224 0.38 Summary of exome and chromosome 7q34-q36 targeted sequencing using Solexa platform. † † a b II:2 17 262 146II:8 2 554 797 608 2 16 406 903 209 507 599 2 501 719 036 1 517 249 532 2 358 641 852 1 475 012 041 44.48 43.24 III:9 16 390 674 2 425 819 752 2 262 780 840 1 435 254 630 42.08 Captured DNA from the custominclude capture sequenced arrays intervals were outside sequenced CCDS separately. exons. The exome coverage does not † III:14 III:14 Sample Table 6.5: 6 next generation sequencing of cmt54 158

Table 6.6: Sequence variants identified by exome capture and BGI sequencing Filter II:2 III:9 II:8 III:14 (whole exome/7q34-q36 locus) All Variants 30 676/142 30 655/154 30 248/135 na/415 All Variants & not in 22 258/16 22 277/16 21 809/16 na/268 dbSNP 130 CDS Variants 12 542/65 12 570/71 12 227/51 na/242 NS Variants 5464/29 5497/36 5353/22 na/159 NS Variants & not in 4006/6 4024/6 3847/4 na/133 dbSNP 130

Table 6.7: Novel sequence variants identified with exome and targeted interval sequencing with segregation analysis in CMT54. Position Variant Gene Validation chr7:147 711 659 C>A CNTNAP2 healthy controls∗ chr7:148 137 226 G>A EZH2 no segregation in CMT54 chr7:151 204 513 G>A PRKAG2 no segregation in CMT54 ∗ CNTNAP2 sequencing detailed in Chapter 4

DPP6 isoform not present in the CCDS reference database and therefore, not targeted by the exome capture and BGI sequencing-bioinformatics pipeline. As the pathogenic mutation was not identified a more detailed examination of the sequencing data was undertaken.

6.3.5 Further sequencing analysis using unequal coverage analysis

The sequence read-depths and sequence contig sizes were compared between the four sequenced samples to assess uniformity. Sequence contig coordinates and single-base read-depth were not reported in the sequence variant report by BGI, therefore, sequencing bioinformatics were repeated locally for the BGI sequenced samples. Sequence reads were aligned to the Human Mar. 2006 (NCBI36/hg18) assembly using Bowtie software and sequence alignments were converted to pileup format Using SAMtools software. Sequence contigs 6 next generation sequencing of cmt54 159 and per base read-depth were compared between samples using the custom BAsH script PileupToBgr.sh and the UCSC genome browser (Appendix A).

The sequencing coverage was examined for the ACCN3 gene due to the known variant present and the incomplete sequencing coverage. The three exome captured samples had a similar coverage of sequence contigs, however, the read-depth varied at the ends of target intervals and across exons 4-6 with poor coverage (Figure 6.6). ACCN3 was sequenced to a greater extent in the target captured DNA sample than the exome sequenced samples. Targeted DNA capture included introns and exons at this location as well as the higher number of sequencing reads. Overall, the same exons were covered poorly by both DNA capture platforms. Additional genes were examined and all showed similar variation in sequence coverage at the ends of target intervals or specific exons (data not shown). This confirmed that there is variation in read-depth and contig size between samples in the intervals sequenced.Differences in sequence coverage need to be considered when comparing sequence variants between samples to avoid false negatives from one sample excluding true variants identified in another sample.

6.3.5.1 Sequence variant analysis considering unequal coverage

With an autosomal dominant model of inheritance, all affected individuals should carry the same mutation (assuming there are no phenocopies), which should be absent from ’married-in’ unaffected spouses. With unequal sequenc- ing coverage, the per-base read-depth needs to be considered before excluding a sequence variant that is not reported in an individual. A custom script was written for the BAsH unix shell for variant segregation analysis with unequal sequence coverage (CompSNPchr7.sh; Appendix A). The script checks for segregation of identified sequence variants among the four CMT54 family members. However, when a variant is not reported in an individual, the se- quencing coverage at that position is checked. A sequence depth of at least 10 6 next generation sequencing of cmt54 160

chr7 (q36.1) 34 Scale 1 kb hg18 chr7: 150377000 150378000 150379000 150380000 50 _

II:1

10 - 0 _ 50 _

II:8

10 - 0 _ 50 _

III:9

10 - 0 _ 50 _

III:14

10 - 0 _ UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics ACCN3 ACCN3 ACCN3

Figure 6.6: Sequencing coverage comparison between three exome capture and one target region capture samples. Top Chromosome 7 ideograph. The region bounded by the red box is expanded below. Lower panel Comparison of sequencing depth and coverage for Illumina sequenced samples in the region of the candidate gene ACCN3. Control II:8, affected II:2 and affected III:9 were exome captured (green); targeted DNA capture of the chr7q34-q36 dHMN1 disease interval was used for affected individual III:14 (light blue). Sequencing depth greater than 50 fold is indicated by a pink line. Three isoforms of ACCN3 are shown in dark blue. 6 next generation sequencing of cmt54 161 reads at the selected DNA base position was required to exclude the presence of a variant. Sequence variants were annotated against the dbSNP 130 database to identify novel variants. The validation status of SNPs was considered before excluding reported SNPs. Any SNPs from dbSNP 130 with an ’unknown’ validation were manually examined before excluding them.

Using unequal coverage analysis, the three novel variants that were previ- ously shown to segregate with the disease in the three affected individuals from CMT54 were also identified here (Section 6.3.4.1, Table 6.7). Four additional variants were identified in regions where sequence coverage was insufficient to exclude the possibility of a missed variant call (Table 6.8). These four variants were subsequently excluded with Sanger sequencing of CMT54 family mem- bers. One of these four variants was the ACCN3 mutation described previously. The ACCN3 SNP was only identified in affected individual III:14, and not in the two affected individuals or control individual sequenced with exome sequencing with a with a read-depth of 3-7×. An additional five sequence variants were identified that were reported SNPs with unknown validation status (Table 6.8). Of these, four SNPs were excluded with updated annotation information and one was confirmed as a false positive.

No sequence variation with a putative pathogenic role was identified in the CCDS exons and flanking intronic regions from the 7q34-q36 disease interval in CMT54. The sequencing coverage of these exons is high with 99.5% of CCDS exons covered in the target captured sample. This includes the interval that was common between families CMT54 and CMT44 which was identified through haplotype comparisons. 6 next generation sequencing of cmt54 162 ∗ ∗ NSNS 1 3 4 6 Y Y 1 7 no segregation healthy controls NS 3 Y 0 7 rs71532786 intron 8 3intron Yintron 5 9intron 5 Y no segregation 5 5 Y 4 Y 5 Y 16 rs5888023 rs12334067, 16 rs12333828 rs12333828 intron 2 3 Y 4 rs61356147 5’ UTR Y 5 6 0 no segregation CUL1 PDIA4 PDIA4 CASP2 ACCN3 KCNH2 ZNF767 CLEC5A SMARCD3 10 fold, the threshold for a homozygous call (no variant identified). < T C A A > > > > Segregation analysis of exome sequencing identified variants assuming unequal sequencing coverage. Both these variants incorrectly report a single base deletion at position 148 333 891 in a repetitive sequence. ∗ PositionNovel variants chr7:142 695 577 Variant A Genechr7:150 378 829SNPs with C type unknown validation status (dbSNP130) chr7:141 282 007 II:2 III:9 -/A III:14chr7:148 333 892 II:8 Validation C/A chr7:148 125 760chr7:150 275 881 G T chr7:148 333 891 A/C chr7:148 952 576 G/A chr7:150 568 025 -/T Table 6.8: Variants with fold coverage 6 next generation sequencing of cmt54 163

Table 6.9: Markers with two-point LOD scores > 1.5 in a genome wide linkage analysis by Gopinath et al. [103]. Locus Marker LOD score 2q36 D2S396 2.07 at θ = 0 7q36.1 D7S661 3.19 at θ = 0 7q36.1 D7S636 3.19 at θ = 0 7q36.2 D7S798 1.76 at θ = 0 8q21 D8S270 2.02 at θ = 0 18q12 D18S1102 1.76 at θ = 0

6.3.6 Exome wide analysis

As no putative disease causing mutation was identified within the chromosome 7q34-q36 disease locus, non-synonymous variants identified elsewhere in the exome were examined. Sequence variants from the three exome sequenced samples were examined. A total of 87 novel non-synonymous variants were present in both affected individuals and absent in the unaffected control. The location of these variants was compared to a 10 cM genome wide microsatellite linkage analysis carried out previously as described by Gopinath et al. [103]. With the exception of the 7q34-q36 locus, there were three markers with a two-point LOD score > 1.5 (Table 6.9). One variant in the TSPYL5 gene was located near the marker D8S270 with a two-point LOD score of 2.02. Using Sanger sequencing of two additional affected and two unaffected individuals from CMT54, this variant did not segregate with the disease in the larger CMT54 pedigree.

To identify NS variants with a likely functional impact from amongst the remaining variants, a combination of five software tools were used to assess se- quence conservation or effect on protein function. Variants that were predicted to be deleterious by two or more of the software tools were considered can- didates. Twenty-six potentially deleterious NS variants were re-sequenced in additional CMT54 family members and control individuals. All were excluded 6 next generation sequencing of cmt54 164

Table 6.10: Sequence variants predicted to be deleterious or affecting conserved sequences. MSA SNAP Polyphen SIFT Panther 46/32/9 28/43/16 13/17/44/13 21/51/15 11/24/12/40 results indicate the number of variants in each group: MSA priority - High- /Medium/Low priority; SNAP - Not neutral/neutral/unknown; Polyphen - Probably damaging/possibly damaging/benign/unknown; SIFT - Affect/tol- erate/unknown; Panther - Probably damaging/possibly damaging/not dam- aging/unknown. through: lack of segregation with affected individuals, presence in control individuals or identified as homozygous SNPs.

6.4 discussion

NGS was used to screen all known genes within the chromosome 7q34-q36 disease interval. No putative pathogenic mutation was identified. Sequence variations that were identified and analysed included non-synonymous SNV, splice site, 5’ UTR and 3’ UTR sequence variants. A screen of non-synonymous variants from elsewhere in the exome, including those predicted to be dele- terious, also did not identify a putative pathogenic mutation. This analysis considered variants that segregate with an autosomal dominant mode of in- heritance, and took into account the differences in sequence coverage between samples.

While a pathogenic mutation was not identified here, a similar exome sequencing approach previously identified the disease gene TGM6 causing spinocerebellar ataxia [339]. In this case four patients from a single pedigree were sequenced and compared to identify the disease mutation as the sole candidate sequence variant. This mutation lay within the previously linked interval on chromosome 20p13-p12.2. Meta analysis showed that any two exomes combined with the linked disease locus would have successfully 6 next generation sequencing of cmt54 165 identified the TGM6 mutation as the sole putative pathogenic candidate for this autosomal dominant disorder.

By considering variation in sequencing coverage between samples, the se- quence variants known to segregate with the disease in CMT54 were identified. This identified several additional sequence variants which would have other- wise been missed because of false negatives in individual samples caused by poor or no sequencing coverage. This unequal coverage analysis is a novel way of comparing sequencing variants between samples to account for variations in sequencing coverage between samples. Other approaches to address differences in sequencing coverage between a number (n) of samples have used a relaxed model of segregation requiring variants to be present in n-1 or n-2 samples only [338]. However, this application does not account for low read-depth and would not have identified the ACCN3 SNP described here. Another approach for the identification of variants within low coverage intervals is to combine sequence reads from multiple affected individuals from a pedigree prior to assembly and analyse them as a single ’virtual’ individual. By combining reads from three affected individuals, Hedges et al. [352] demonstrated an increased yield of sequence variants of 11%, compared to assembling three individuals independently. This approach increases the chance of variants being reported from areas of low sequencing coverage. It is necessary to consider unequal se- quencing coverage if genes are to be excluded (i.e. free of putative pathogenic sequence variation) as low read-depth will have a high false positive rate. The addition of one sample with poor sequencing quality can falsely exclude positive variants. This is also necessary when considering the differences in sequence capture and sequencing platforms [353]. 6 next generation sequencing of cmt54 166

6.4.1 Limitations of the NGS analysis

Despite the comprehensive analysis of NGS data, the chromosome 7q34-q36 mutation was not identified. The read-depth and sequencing coverage for one affected individual was high after targeted sequence capture of the disease interval. Due to the high average read-depth of 226×, most of the 99.5% of CCDS exons were covered by sequence contigs at sufficient read-depth to be confident that a sequence variant was not missed in these regions. However there are several important limitations to this analysis.

An important limitation of sequence analysis based on a reference genome is that gene annotation is not 100% complete with ongoing refinement and addition of alternate transcripts. This is important for the definition of coding exons used in the standard exome captures. The exome capture does not sequence all exons within the genome because of incomplete annotation of the reference genes and because not all exons are suitable targets for capture. This includes coding regions within repetitive sequences and areas that may not amplify effectively with PCR required for the sequence capture and enrichment. In this project, a known variation in the 3’-UTR of DPP6, previously identified by targeted Sanger sequencing of candidate genes, was not identified by exome sequencing. In this case the DPP6 gene was not targeted in the NimbleGen Sequence Capture 2.1M Human Exome Array used (2.1M Human Exome Array annotation files). However, it was identified with the use of the custom sequence capture microarray targeting the 7q34-q36 interval. In a comparison of three current exome capture platforms, differences in the genes targeted have been demonstrated; approximately 1100 additional genes were targeted by the Agilent SureSelect platform compared to the NimbleGen exome array [354]. 6 next generation sequencing of cmt54 167

A further limitation is that exome sequencing does not capture regulatory regions that may be involved in disease. These may include promoter regions, splicing enhancers, cryptic splice sites and other remote regulatory elements. Relatively few genes have well characterised regulatory elements beyond their core promoter region. Similarly, the effect of synonymous variants on exon splicing is difficult to assess. In rare cases, synonymous sequence variants can cause exon skipping or other splicing errors [355]. These variants have been reported to create leaky expression, with the incorrect transcript making up only a small percentage of overall gene expression. As with promoter predictions, splice site software is generally restricted to predicting the effect of variants on well characterised core splicing and enhancer motifs.

With 20 000 to 24 000 SNVs identified on average in a human exome it is essential to filter out known SNPs [356]. However, as the number of individuals reported in sequence variant databases continues to grow, there is an increased risk of undiagnosed individuals inadvertently being included. This is more problematic for late onset diseases or those with subtle phenotypes. While the true prevalence of autosomal dominant dHMN is unknown, the most common form of peripheral neuropathy, CMT1, has a prevalence of approximately 20 cases per 100 000 and CMT2 2 per 100 000 [315]. Clinical dHMN is seen less often than CMT2 in the Neurogenetics clinic at Concord Hospital and assumed to have a lower prevalence (G. Nicholson, personal communication). The genetic heterogeneity of dHMN decreases this further, as mutations in a particular disease gene will account for a small percentage of all dHMN [6]. With mutations being so rare, it is unlikely that undiagnosed dHMN individuals will be present in the dbSNP cohorts. Nonetheless, a simple filtering restriction by not using SNPs with a MAF rarer than a reasonable threshold can easily overcome this. Using a MAF threshold of 0.001 for SNPs has been suggested [356], however, for rare mendelian diseases this threshold can be 0.000 01 or greater. 6 next generation sequencing of cmt54 168

A further limitation of the NGS strategy used here is the poor capacity to identify indel mutations. While both SNV and indels were considered in the original analysis using 454 sequencing and gsMapper software (Roche), no putative pathogenic indels indels were identified. However, the performance of this early 454 sequencing and gsMapper software in detecting indels was shown to be limited, with a similar application missing 50% of the indels identified by Sanger sequencing [353]. Indels were not called by the SOAP1 or Bowtie software used to assemble the reads from the Solexa sequencing used in exome sequencing. The use of paired-end reads and the long reads in a de novo assembly may better identity indels as well as translocations and inversions. However the bioinformatics and computer resources required for this assembly exceed those available for this project. Indel identification and analysis using NGS is still in its infancy and currently has a much higher false positive rate than SNV calling. In this project, in silico predictions of disease effect were used to prioritise non-synonymous variants from the exome analysis. However, excluding se- quence variants based on predictions is a risky approach, as true pathogenic variants may be missed. This was demonstrated with the analysis by Bamshad et al. [356] in identifying DHODH mutations in Miller syndrome where in silico predictions incorrectly excluded the true mutation under the recessive model of inheritance tested. The filtering strategies employed here were less stringent, requiring only two positive results from five software tools to screen variants further. These limitations provide possible explanations for not finding a pathogenic variant. The pathogenic variant may be a variant in: a gene that is not yet recognised; a gene that is not targeted by the exome capture; a repetitive sequence that was not sequenced; a sequence variant in a regulatory element; or an indel or other structural variation not identified within the limits of the sequence assembly used here. EXOME SEQUENCING OF CMT44 7

7.1 introduction

A second dHMN family CMT44, with suggestive linkage to the chromosome 7q34-q36 interval, was examined by exome sequencing. Previous two-point and multi-point linkage analysis in CMT44 achieved a maximum LOD score of 2.02 at the 7q34-q36 locus. This reached the theoretical maximum LOD-score for this family. The haplotype comparisons carried out previously (Chapter 3) indicate that families CMT44 and CMT54 possibly share a common ancestor. If this is the case, CMT44 may share the same mutation with CMT54; and the same difficulties will arise in mutation discovery. However, while the haplotype association between these two families is statistically significant, there remains the possibility that the families are unrelated. Recent analysis by Brewer [357] has shown statistically significant haplotype associations to be incorrect when a larger set of markers were included in the analysis. In addition, the more recent sequencing platform used in analysis of CMT44 had the potential to achieve greater coverage in difficult to sequence exons.

7.1.0.1 Hypothesis and aims

It is hypothesised that dHMN seen in family CMT44 is caused by a single gene mutation within the linked region on chromosome 7q34-q36. The aim of this chapter is to:

169 7 exome sequencing of cmt44 170

• Capture and sequence the exomes of individuals from family CMT44 using NGS in order to identify the genetic variant causing dHMN.

7.2 materials and methods

7.2.1 Exome sequencing

Genomic DNA from peripheral blood (3 µg) was forwarded to Axeq Tech- nologies (Seoul, Korea) for exome sequencing. An Illumina sequencing library was prepared for each sample which was enriched using the Illumina Exome Enrichment protocol and sequenced using an Illumina HiSeq 2000 Sequencer.

7.2.1.1 Sequencing library preparation

Sequencing libraries were prepared using the Illumina TruSeq DNA Sample Prep system following manufacturers protocols. In brief, 1 µg of genomic DNA was fragmented by nebulization, 3’ ends of fragmented DNA were repaired, then Illumina adapters were ligated to the fragments. Ligated DNA fragments were size selected in the range of 350-400 bp. Size-selected DNA fragments were PCR amplified to enrich for adapter ligated DNA fragments. Correct library fragment size and concentration were assayed using an Agilent Bioanalyzer.

7.2.1.2 Exome capture and enrichment

Exome capture and enrichment was carried out using the Illumina TruSeq Exome Enrichment system following manufacturers protocols. In brief, DNA libraries were indexed and pooled prior to enrichment. DNA libraries were hybridised to capture probes bound to streptavidin beads. Bound libraries were washed three times to remove non-specific binding fragments. The enriched 7 exome sequencing of cmt44 171 library was eluted and subjected to a second hybridization and three wash steps. The enriched library was eluted from the capture probes and prepared for sequencing.

7.2.1.3 Sequencing

The DNA libraries were PCR amplified to further enrich for adapter ligated DNA fragments. Exome capture and enrichment was validated using Axeq Technologies internal quality control procedures. Automated sequencing was carried out using an Illumina HiSeq 2000 instrument.

7.2.1.4 NGS read mapping and variant detection

Sequence read-mapping and variant annotation was carried out by Axeq Technologies (Seoul, Korea). Sequence reads were mapped to the Human Feb. 2009 (GRCh37/hg19) assembly using BWA software [358] using a read quality threshold of 20. Sequence variants, including single-base substitutions and indels, were called using SAMtools software [359] using a maximum read depth of 1000× and a quality threshold of 20 for both SNVs and indels. SNPs were annotated using dbSNP 131, dbSNP 132, and 1000 genomes SNP call release (20101109, 628 individuals).

7.2.2 NGS variant analysis

Sequence variants were analysed using Galaxy Browser software (http:// galaxyproject.org/)[341, 342, 360]. Novel variants were annotated using the dbSNP 134 and 1000 Genomes Phase 1 Integrated Variant Call Set. Pub- lic data downloaded sites were; dbSNP 134 (ftp://ftp.ncbi.nih.gov/snp/), the 1000 Genomes Phase 1 Integrated Variant Call Set (ftp://ftp-trace. ncbi.nih.gov/1000genomes/ftp/) and the NHLBI Exome Sequencing Project 7 exome sequencing of cmt44 172

database (http://evs.gs.washington.edu/EVS/). Known peripheral neuropa- thy genes were downloaded from the Mutation Database of Inherited Pe- ripheral Neuropathies (http://www.molgen.ua.ac.be/cmtmutations/). Novel variants were screened in a panel of 32 unrelated control exomes of unaffected spouses from unrelated ALS and CMT families using Integrative Genomics Viewer software [361].

7.2.3 HRM analysis

Genotyping of sequence variants was carried out with high resolution melt analysis using the LightScanner System (Idaho Technology Inc.). Target exons were PCR amplified in a 10 µL reaction volume with LightScanner master mix (1×; Idaho Technology Inc.), PCR Enhancer (1×; Invitrogen), forward and reverse primers (0.4 pmol µL-1 each) and 10 ng template DNA. Fluorescent melt curves were acquired using the LightScanner instrument between 65°Cand 98°C.

7.3 results

7.3.1 Exome sequencing in CMT44

Exome sequencing was performed using four affected and one unaffected individuals from CMT44 (individuals III:1, III:5, IV:1 and II:2; Figure 7.1). An average of 6.5 billion bases of sequence per individual was generated as paired- end reads with an average length of 105 bp (Table 7.1). This achieved a mean read-depth of 47.72× across the five exomes. Exome coverage at 1× read-depth ranged from 94.9% to 93.0%, however at 10× read-depth the exome coverage was lower, ranging from 77.3% to 87.0% of target exons. An average of 65 236 7 exome sequencing of cmt44 173

SNVs and 13 385 indels were identified in each sample, of which 11.6% of SNVs and 4.6% of indels, resulted in non-synonymous or acceptor/donor splice site changes (including non-frameshift indels) (Table 7.2).

The non-synonymous and splice site variants (SNV and indels) that were common to the four affected individuals were compared with the database of 44 known peripheral neuropathy genes (Mutation Database of Inherited Pe- ripheral Neuropathies) and two additional dHMN genes (ATP7A and HSPB3). A mutation in the MFN2 gene was identified in all four affected individuals. This missense mutation MFN2:c.C368T (p.P123L), was previously reported in autosomal dominant CMT2 by Verhoeven et al. [362]. However, exome data showed that unaffected individual II:2 also carries this mutation (Figure 7.1). CMT44 had previously been screened for mutations in MFN2, using individual II:1 as the index patient (O.Albulym, personal communication). No mutation was identified in indiviudal II:1. This indicated that the MFN2 mutation, had been inherited from individual II:2. This warranted a clinical re-examination of CMT44 family members.

7.3.2 Clinical detail of CMT44

Approximately seven years had elapsed since the previous clinical neurological examination of CMT44 family members. A further neurological examination of three individuals from CMT44 (II:1, II:2 and III:1) was carried out at the Neurology clinic, Concord Hospital.

Individual II:1, aged 84, has a mild motor neuropathy, with minor muscle wasting in the lower legs and pes cavus. Pes cavus was apparent since child- hood, with a family history of similar feet from a young age. A mild axonal motor neuropathy with no sensory loss was confirmed by nerve conduction studies (NCS). 7 exome sequencing of cmt44 174 × × × × × ∗ depth Mean depth × depth at 10 × (SNVs/indels) Summary of exome sequencing in CMT44. Sequence variants identified by exome sequencing in CMT44 Table 7.1: 1037/628 624/414 309/255 677/367 686/413 ∗ Table 7.2: Filter III:1 III:5 III:7 IV:1 II:2 databases All Variants 68 963/14 923 68 087/14 092 67 205/12 703 60 636/12 446 61 289/12 762 CDS Variants 19 354/845 19 200/651 19 165/486 17 954/568 17 907/617 Public SNP databases used: dbSNP 131, dbSNP132 and 1000 genomes SNP call release (20101109). Total Sequencing bases Exome coverage reads Total Mapped Exome mapped at 1 ∗ & not in public NS/SS VariantsNS/SS Variants 9675/836 9931/646 9306/481 8942/560 9098/609 II:2 46 223 172 4 947 525 468 4 036 220 679 2 188 857 813 93.2% 78.4% 35.3 IV:1 41 324 016 4 422 761 250 3 622 232 062 2 015 291 144 93.0% 77.3% 32.5 III:1III:5 88 020 920III:7 9 69 421 646 239 330 514 7 7 65 034 679 861 279 947 498 330 663 6 6 652 012 011 133 298 870 4 197 581 663 5 167 135 310 3 485 101 401 2 922 467 830 94.4% 94.7% 94.9% 85.7% 87.0% 87.2% 67.6 56.1 47.1 Based on an exome size of 62 085 286 bp. ∗ Sample 7 exome sequencing of cmt44 175

I:1 I:2

DNA DNA

II:1 II:2 II:3 II:4 II:5

P123L/wtMFN2 - wt/wt P123L/wtMFN2 DNAH11 - V1692I/wt wt/wt PCDHGA4 - Y33S/wt wt/wt

DNA DNA DNA DNA DNA DNA

III:1 III:2 III:9 III:3 III:4 III:5 III:10 III:6 III:7 III:8

MFN2 - P123L/wt wt/wt wt/wt P123L/wt P123L/wt P123L/wt DNAH11 - V1692I/wt V1692I/wt V1692I/wt PCDHGA4 - Y33S/wt Y33S/wt Y33S/wt

DNA DNA DNA DNA ? IV:1 IV:2 IV:7 IV:8 IV:3 IV:4 IV:5 IV:6 IV:9 IV:10 IV:11 IV:12 IV:13

MFN2 - P123L/wt P123L/wt wt/wt wt/wt DNAH11 - V1692I/wt PCDHGA4 - Y33S/wt Figure 7.1: Updated pedigree for family CMT44. Females and males are represented by circles and squares respectively. solid coloured symbols indicate affected individuals, with orange indicating the dHMN phenotype, blue the mild motor and sensory neuropathy phenotype and black the more severe CMT2 phenotype. Bars indicate the haplotype at 7q34-q36 described previously Figure 3.7, with blackened bars indicating the haplotype segregating with the disease. The genotype for the MFN2, DNAH11 and PCDHGA4 mutations are indicated below each individual. 7 exome sequencing of cmt44 176

Individual II:2, previously listed as unaffected (Chapter 3), now aged 84, shows distal sensory signs with reduced vibration sensation in the legs. No muscle wasting was apparent. NCS indicated a mild motor and sensory neu- ropathy. Since the previous clinical examination, clinical symptoms had pro- gressed slightly, the patient now required the use of a walking stick. The mild clinical signs would not have brought this individual to the attention of clinicians, if it were not for the affected offspring. Based on the previous clinical description, this individual was initially ruled out as the carrier of the gene mutation in this family.

Affected individual III:1, aged 52, has a severe motor and sensory neuropathy, with gross wasting and weakness of muscles in the lower legs and hands. Pes cavus was apparent since childhood. This is a more severe phenotype than either parent. At 44 years of age a dHMN phenotype was present, however, no sensory defect was apparent from NCS. The first sensory symptoms started at 49 years of age, confirmed by NCS.

Other family members were not re-examined, however previous clinical de- scriptions indicate the more severe motor and sensory neuropathy phenotype described for individual III:1, is seen in affected individuals III:5, III:7 and IV:1, with pes cavus and leg problems from childhood and motor and sensory symptoms developing from adulthood. Individuals III:3, III:6, and all other individuals from generation IV were clinically normal.

The phenotype in generations III and IV is generally more severe than either parental phenotype from generation II. This suggests that two peripheral neuropathy genes are combined in this family; the MFN2 P123L mutation inherited from individual II:2 who has a mild CMT2, and a second gene from individual II:1 with a pure motor neuropathy. This second gene may be located within the 7q34-q36 interval or elsewhere in the genome. 7 exome sequencing of cmt44 177

7.3.3 MFN2 genotyping in CMT44 using high resolution melt analysis

Genotyping of the MFN2 P123L mutation in CMT44 family members was car- ried out using high resolution melt analysis. DNA samples from 12 individuals from CMT44 were analysed in duplicate (Figure 7.2). These included the two founder individuals with dHMN and mild CMT2 respectively, four individu- als with more severe CMT2, four clinically normal individuals (at risk) and two unaffected ‘married in’ spouses (Figure 7.1). The MFN2 P123L mutation was present in the five individuals with CMT2, as identified previously using exome sequencing. Two clinically normal at risk individuals also carry the MFN2 P123L mutation. The individual with dHMN (II:1), two other unaffected family members and the two unaffected spouses did not carry the mutation. The MFN2 P123L mutation shows reduced or age dependant penetrance in CMT44.

7.3.4 Exome wide sequence variant analysis in CMT44

Given that both parents (II:1 and II:2) showed clinical features consistant with peripheral neuropathy, it is possible that two mutations are segregating in CMT44. In addition to individual II:2, segregation analysis of sequence variants was also carried out with affected individual II:1 as the founder. A total of 5155 non-synonymous (NS), splice site (SS) and indel variants were common to the four affected individuals. Excluding those variants that were common to individual II:2 established that 608 NS/SS variants were inherited from individual II:1. Of these variants, 12 SNVs and 5 indels were novel, not reported in dbSNP 132 or the 1000 genomes project (SNP release 20101109). These variants were further filtered to exclude rare SNPs by examining a panel of 30 unrelated control exomes, dbSNP 134, the 1000 Genomes project Phase 1 7 exome sequencing of cmt44 178

1

0.08

0.06

0.04 Fluorescence

4 0.02

0

85 86 87 88 89 90 91 92 Temperature °C

0.08 0.06 0.04 0.02 0 -0.02 -0.04 -0.06

Normalised Fluorescence -0.08 -0.1 85 86 87 88 89 90 91 92 Temperature °C

Figure 7.2: HRM genotyping analysis of the MFN2 mutation identified in CMT44. Normalised melt curves (above) and difference curves (below) are displayed for MFN2 exon 5. The curves are color coded, indicating: grey for normal individuals not carrying the mutation, including two married in spouses; red and orange for affected individuals with the mutation; and green for two unaffected at risk individuals carrying the mutation. The orange curve differs due to the presence of a second SNP in the exon 5 amplicon. 7 exome sequencing of cmt44 179

Integrated Variant Call Set (October 2011), and the NHLBI Exome Sequencing Project database. After filtering, two novel non-synonymous SNVs remained in the genes; Axonemal dynein heavy chain 11, (DNAH11:c.G5074A:p.V1692I) and Protocadherin gamma-A4 (PCDHGA4:c.A98C:p.Y33S). There were no putative pathogenic mutations identified within the 7q34-q36 disease interval in CMT44. Furthermore, no novel sequence variants were identified within these two genes in previous exome sequencing data from family CMT54.

7.3.5 Comparison of SNPs between CMT44 and CMT54

To examine whether the chromosome 7q34-q36 haplotype shared by CMT44 and CMT54 (Chapter 3) represented identical by descent (IBD) or identical by state (IBS), a set of 30 SNPs from 7q34-q36 were selected from exome sequencing data (Figure 7.3). Of these SNPs, 11 were homozygous for different alleles in each family. The remaining alleles were homozygous and identical between the two families (9 SNPs) or heterozygous with a possible match (10 SNPs). Therefore, any SNP based haplotype would have at least 11/30 alleles not shared between the two families. It is therefore unlikely that CMT44 and CMT54 share a common ancestor for the 7q34-q36 disease interval as suggested previously (Chapter 3).

7.4 discussion

We have identified a possible digenic inheritance of dHMN and CMT2 in family CMT44. Both parents in generation II (Figure 7.1) show clinical features of peripheral neuropathy. Parent II:1 has a purely motor neuropathy caused by a novel gene mutation and parent II:2 has a mild CMT2 caused by the MFN2 P123L mutation. Four additional affected individuals appear to carry 7 exome sequencing of cmt44 180

Position (Mb) Marker CMT44 CMT54 146.5 D7S2511 6 1 rs2692185 A T SNP1 A A/C rs2286127 A A/G rs2074715 A/G G D7S615 1 1 rs2538962 A C/A rs2373284 T C rs10240503 A A/G rs3801976 G/A A rs77706740 C C/A

es rs9648691 G A 147.2 rs987456 A C rs2530312 A G rs3194 C A rs1062072 G A rs2530311 G A rs2717829 C G rs2530310 T C rs6962115 T/G T/G rs6962292 T/C T/C ed by common haplotyp common ed by D7S2442 4 4 didate interval CMT54 interval didate D7S2419 3 3 rs1014095 C C rs6956231 G G rs10271133 T C rs2072406 A A rs933436 T T rs2007404 C C rs3217095 - -

Minimal region suggest Minimal region SNP2 G G/A

Minimum probable can probable Minimum 148.3 rs2072407 G G 148.4 rs10274535 G G rs3214332 - - 148.5 D7S688 2 4

Figure 7.3: Updated comparison of haplotypes between CMT44 and CMT54 using SNP data from NGS. The minimal region suggested by a founder effect in CMT54 and additional dHMN families (Chapter 3) is flanked by the markers D7S615 and D7S2419. Physical map of the interval shows the position of markers, with coordinates based on the Human Feb. 2009 (GRCh37/hg19) assembly. Markers include five microsatellites (Chapter 3) and 30 SNPs from NGS. The marker alleles are shown for CMT44 and CMT54, with alleles differing between the two families coloured red. Microsatellite marker alleles are numbered and SNP markers alleles show nucleotides (ATGC) or 1 bp deletions (-). 7 exome sequencing of cmt44 181 both the MFN2 mutation and the second disease gene and have a substantially more severe, earlier onset, CMT2 phenotype than either parent. The MFN2 mutation shows reduced or age dependant penetrance, with two unaffected at risk individuals also carrying the mutation. The second gene mutation causing the founder dHMN phenotype is possibly the DNAH11 V1692I variant or the PCDHGA4 Y33S variant identified from exome analysis. The two mutations segregating in CMT44 may have an additive effect creating the more severe phenotype seen in individuals carrying both. This may also explain the lower penetrance seen in generation IV of CMT44, where fewer individuals are likely to carry both mutations.

A digenic inheritance has been previously described in a three generation dHMN family with both a BSCL2 mutation and a second disease locus on chromosome 16p [146]. In this family, 12 affected individuals inherited both the BSCL2 mutation and the 16p locus, and one individual, with sub-clinical motor neuron damage, carried only the 16p locus. The 16p locus was thought to contribute to the dHMN phenotype seen in this family. Similarly, a family with a CMT1 phenotype, has been described with mutations in the SH3TC2 gene [336]. Normally, compound heterozygous SH3TC2 mutations cause the severe recessive disorder CMT4C through loss of function. In this family, individuals carrying a single missense mutation Y169H, were diagnosed with a mild axonal neuropathy. A second SH3TC2 mutation, R945X, was identified in the family, which on its own did not cause disease. However, this second mutation had an additive effect to the Y169H mutation, with affected patients who were compound heterozygous for the SH3TC2 mutations having a more severe CMT1 phenotype.

Neither of the two candidate dHMN genes DNAH11 or PCDHGA4 in CMT44 have been previously associated with a neurodegenerative disease. DNAH11 is an axonemal dynein, which are molecular motors that drive the beating of cilia and flagella. These are distinct to the cytoplasmic dyneins involved 7 exome sequencing of cmt44 182 in retrograde axonal transport in neurons. Homozygous and compound het- erozygous mutations in DNAH11 cause the recessive disorder, primary ciliary dyskinesia 7 (PCD7; MIM#611884), caused by immotile cilia [363, 364]. A role for cytoplasmic dyenins has been suggested in several neurodegenerative dis- eases [365, 366], with rare mutations of DCTN1 in dHMN7B [64], ALS and ALS with FTD [65–67]. The DNAH11 V1692I mutation is a highly conserved amino acid in the stem domain, which joins the cargo binding domains to the stalk and microtuble binding domains. This is a different region to the mutations described in PCD7 which all cause truncation of the conserved C terminal domain [363, 364] or the DCTN1 mutation in dHMN7 which disrupts the microtubule binding domain [64].

PCDHGA4 is part of the protocadherin-gamma (PCDHγ) gene cluster, one of three protocadherin gene clusters on chromosome 5q31 [367](Figure 7.4-A). Protocadherins are cell-adhesion molecules variably expressed in synaptic junc- tions in brain and spinal cord [368, 369]. The PCDHγ gene cluster consists of 22 ’variable region’ exons which are individually spliced to three ’constant region’ exons (Figure 7.4-B). Transcription of the 22 variable region genes is controlled by individual promoters with selective cis-splicing removing any intermediate variable region exons [370, 371]. Each variable exon encodes extracellular and transmembrane domains of the protocadherin protein, and the constant region exons encode the intracellular domain [372]. The extracellular domain contains six highly conserved cadherin domains which mediate cell-cell interactions when bound to calcium [367](Figure 7.4-C). The Y33S mutation is located at a highly conserved amino acid in the first cadherin domain. This amino acid is 100% conserved across the protocadherin genes on 5q31.

A mouse model with dramatic neurodegeration leading to neonatal death has been described lacking all 22 Pcdhγ genes [368]. Neonatal mice are nearly immobile and have reduced spinal cord synapses. In culture, neurons differ- entiate with normal axonal outgrowth and form synapses, but neurons die 7 exome sequencing of cmt44 183

chr5 Scale 200 kb 5q31.3 A. PCDHα PCDHβ PCDHγ

PCDHGA1 PCDHGA2 PCDHGA3 PCDHGB1 PCDHGA4 PCDHGB2 PCDHGA5 PCDHGB3 PCDHGA6 B. PCDHGA7 PCDHGB4 PCDHGA8 PCDHGB5 PCDHGA9 PCDHGB6 PCDHGA10 PCDHGB7 PCDH-psi3 PCDHGA11 PCDHGA12 PCDHGC3 PCDHGC4 PCDHGC5 C. PCDHGA4 Y33S CA1CA2CA3CA4CA5CA6

Figure 7.4: Organisation of the protocadherin gene cluster on chromosome 5p31. (A) Chromosome 5 ideograph with the location of the protocadherin gene cluster indicated by a red bar. This locus, expanded below contains three gene families; PCDHα, β and γ. (B) The PDGHγ family contains 22 genes. The variable exons (blue) are tandemly arranged upstream of the three common exons (orange). The PCDHGA4 variable exon is highlighted red. (C) The PCDHGA4 gene structure is shown, overlaid with the position of the six cadherin protein domains (CA1-6;blue) transmembrane domain (red) and intracellular domain (yellow). The Y33S variant is located in the first CA domain. Protein domain coordinates were downloaded from the UniProt Knowledgebase [376].

soon afterwards. The neuronal die-back was shown to be cased by fewer and weaker synapses after the deletion of Pcdhγ genes [373]. The PCDHγ locus is unrelated to the CMT4C locus on 5q32 with mutations in the SH3TC2 gene [220, 374, 375].

Only suggestive evidence of linkage to chromosome 7q34-q36 (LOD score 2.02) was identified in CMT44. With a genome wide linkage analysis it would be expected that several loci reach a suggestive LOD score of 2 with five affected individuals. This highlights the limits of genetic analysis using smaller single families that are unable to generate significant linkage scores. In this 7 exome sequencing of cmt44 184 analysis we identified 17 rare SNVs or indels that segregated with five affected individuals in family CMT44. It is likely that suitably informative markers at these 17 loci would generate LOD scores approaching the maximum theoretical value of 2 for this pedigree. The SNP genotype results indicate that there is no common chromosome 7 haplotype between CMT54 and CMT44 and they should be considered to be unrelated. A similar haplotype analysis linking two CMTX3 families was recently shown to be incorrect when considering additional SNP markers across the haplotype [357]. The SNPs used as markers here were all common, with a population incidence of > 0.1 (mean 0.4). As such they are unsuitable for haplotype comparison between families due to the high probability of identity by state. However, for homozygous SNPs where nucleotides are unique to each pedigree, the haplotypes can be effectively shown to be different. Analysis of exome data indicated that chromosome 7q34-q36 is unlikely to play a role in the disease in CMT44. The two genes with non-synonymous SNVs described here, in particular PCDHGA4, are good candidates for further analysis in additional dHMN and CMT families. Unfortunately this analysis was beyond the time-frame of this candidature. To confirm a pathogenic role for either of these genes will require the identification of additional mutations in pedigrees with a similar phenotype and further functional analysis. Functional analysis of these genes including cell and animal models, may shed light on the mechanisms that underlie neurodegenerative disease. GENERALDISCUSSION 8

Distal HMN is a disorder of the peripheral nervous system, predominantly affecting motor neurons. It clinically resembles the related hereditary motor and sensory neuropathies including CMT. This PhD project aimed to identify the defective gene on chromosome 7q34-q36 that is responsible for autosomal dominant dHMN1. A combination of traditional and new technologies were applied including linkage analysis, positional cloning and NGS. In addition to analysis of the original linked family CMT54, linkage analysis was carried out in an additional 20 families from an Australian dHMN cohort, with exome sequencing in one of these families with evidence of suggestive linkage to the dHMN1 7q34-q36 disease locus.

8.1 genetic studies in family cmt54

Linkage and haplotype analysis in family CMT54 was used to refine the minimum probable candidate interval containing the dHMN disease gene, to the 6.92 Mb interval flanked by D7S615 and D7S2546. This interval is defined by recombination events in clinically unaffected individuals in CMT54. This is only considered a ‘probable’ candidate region due the possibility of reduced penetrance in dHMN. The critical disease interval defined by recombinations in affected individuals is the 12.98 Mb interval flanked by D7S2513 and D7S637.

The 12.98 Mb interval on 7q34-q36 contains 89 known candidate genes (not including transcripts for taste, olfactory or T-cell receptors), with 56 candidate genes within the minimum probable candidate interval. A positional candidate

185 8 general discussion 186 based approach was initially applied to screen selected candidate genes using Sanger sequencing of exons and flanking intronic regions. With the possibility of reduced penetrance, the minimum probable candidate interval was used as a guide for prioritising candidate genes for sequencing. No pathogenic mutation was identified in 32 genes screened (and 1 partially screened gene) in CMT54.

Sanger sequencing can identify small scale sequence changes including point mutations and small indels, but cannot identify larger scale structural variation. Therefore, two complementary approaches were used to identify structural variation in the 7q34-q36 interval in CMT54: karyotyping of chromosomal spreads and aCGH. No microscopic structural variation or chromosomal abnormalities were identified using karyotyping. Array based CGH identified 89 unique CNVs ranging from 4 bp to 110 312 bp, with a mean size of 3847 bp. Four deletions segregated with the disease when just three affected and one unaffected ‘married in’ spouse were compared, however all were shown to be non-pathogenic, being intragenic sequences or identified in normal control populations. This excluded structural variation at a microscopic resolution and CNV from an approximate resolution of 500 bp and greater as a source of the causative disease mechanism in CMT54.

Using NGS, the 7q34-q36 disease interval was initially sequenced in one af- fected individual using targeted DNA capture of the 7q34-q36 interval and 454 sequencing. This achieved coverage of ∼73% of the critical disease interval with an average read-depth of 18.84 ×. No putative pathogenic sequence variant was identified from all exons, flanking intronic intervals and 5’ and 3’ UTRs. Using exome sequencing of two additional affected and one family control in- dividual, and re-sequencing of the DNA target capture individual, the coding exons from the 7q34-q36 interval were again examined with greater sequenc- ing coverage. Differences in sequencing coverage were demonstrated between patients, which limited the identification of confirmed known sequence vari- 8 general discussion 187 ants segregating with the affected individuals. To address this, a method was developed to examine the segregation of all identified variants considering the sequence read-depth in order to eliminate false negatives in affected individu- als. Using this method, 99.4% of exons defined by the CCDS database within the 7q34-q36 interval were sequenced with an average read depth of 227 ×. As no putative pathogenic mutation was identified in CMT54 within the 7q34-q36 interval, novel non-synonymous changes from the remainder of the exome were examined. After filtering using in silico based predictions of the effect of sequence variants and sequencing in additional CMT54 family members, all remaining non-synonymous changes were excluded.

While the sequencing of genes within the 7q34-q36 interval was compre- hensive, including both the targeted Sanger sequencing and NGS based ap- proaches, several possibilities still exist for the identification of the disease mutation. These include; mutations in genomic regions not included in the cur- rent analysis; or mutations that cause disease through a alternate mechanism not detected by these methods.

Considering genomic regions not covered by the current analysis, it is possible that the mutation may be present in: an unknown gene; an unknown isoform of a known gene; or a regulatory region. In the positional candidate based approach using Sanger sequencing, all alternate isoforms and known exons were considered for each candidate gene. This was achieved by using annotations from the UCSC genes track (UCSC genome browser) which uses data from RefSeq, Genbank, CCDS and UniProt databases. Given the large number of alternate transcripts for known genes in the candidate region, it is reasonable to assume that other isoforms are yet to be discovered.The sequencing approach using targeted DNA capture and NGS of the 7q34-q36 interval also applied the gene annotations from the UCSC genes track. On the other hand, exome sequencing only targeted CCDS genes, as used in the NimbleGen 2.1M exome capture arrays. The CCDS genes database includes a 8 general discussion 188 core set of protein coding regions that are consistently annotated and of high quality. However, not all genes and alternate transcripts are included [354, 377]. While the exome capture array targets most genes and exons within the 7q34- q36 interval, there are some notable exceptions, including DPP6, screened previously as a positional candidate using Sanger sequencing. Additionally, while most genes have been effectively identified since the completion of the human genome project [227, 228], annotations continue to be updated. An example of this is the AGAP3 gene (formerly CENTG3), within the 7q34- q36 interval, which has been revised during the course of this project with additional exons and isoforms annotated. While sequencing concentrated on coding sequences and intronic splice site intervals, regulatory regions including the UTR exons and core promoter were also examined. However, this analysis was limited to basic annotation such as upstream start codons or mutated stop codons. Several other regulatory mechanisms were not examined such as RNA binding sites, metal response element binding sites, secondary structure, and upstream non-core promoter elements, due to the difficulty in identifying these regions for most genes in silico. Sequence coverage for UTRs was variable between the platforms used. While all UTRs were targeted by the NimbleGen 7q34-q36 DNA capture arrays, these regions have been shown to be poorly covered by NimbleGen exome capture platforms, only sequenced by over-run of sequencing from coding sequences [227].

Considering structural variation as a disease mechanism at the 7q34-q36 in- terval, there remains a gap in completeness of structural variation analysis due to the limitations of resolution for the methods used in this thesis. Karyotyping is able to assess all structural variation within the genome but is limited by its microscopic resolution of approximately 3 Mb. Array based CGH used in this thesis achieves a much higher resolution, with a very high probe density, and was able to report on duplications and deletions down to a resolution of only a few bp. However, the probe density varied across the 7q34-q36 interval 8 general discussion 189 with variable suitability of the sequence to probe design. The average probe density achieved a resolution to detect a CNV of approximately 500 bp or greater across the 7q34-q36 interval. This analysis is limited to duplications and deletions of unique sequences; it does not explore inversions or balanced translocations or known repeat elements such as microsatellites. Small indels within genes in the 7q34-q36 interval were identified with targeted Sanger sequencing and NGS analysis (454 sequencing and gsMapper software), how- ever a gap remains in the resolution between the sequencing and aCGH based platforms.

Recently a non-coding GGGGCC hexanucleotide repeat in the gene C9ORF72 was identified as a genetic cause of ALS and frontotemporal dementia (FTD) by two studies [378, 379]. The size of this repeat ranges from 2-23 repeats in controls to approximately 700 to 1600 repeats in patients [378]. The disease associated allele is too large to amplify by PCR due to the large number of GC rich repeats. Hence, in one study, this repeat expansion was identified by an aberrant segregation pattern of alleles in a pedigree. In the second study, the presence of the expanded repeat was identified by manual alignment of NGS reads around a cluster of SNPs at this location. A similar repeat would not have been identified by any of the methods employed in this project. With expansion of short tandem repeats now demonstrated as a major disease mechanism in ALS, similar repeats within genes in the 7q34-q36 interval in CMT54 are good candidates for further gene screening.

A potential location for repeat elements is the gap in the human genome reference sequence located within the intronic region of the DPP6 gene. This is a non-bridged sequence gap with a nominal size of 99 kb [255](Section 4.3.1). Almost all such remaining gaps contain tandem repeat sequences [227]. This region of DPP6 is excluded from the disease locus by a genetic recombination in an unaffected individual by one informative marker (D7S2546). However, if this unaffected individual were non-penetrant, then the disease gene based on 8 general discussion 190 positional cloning would be DPP6 alone. DPP6 was initially selected as a good positional candidate gene due to its association with susceptibility to sporadic ALS in several European populations [264–266] This association could not be explained by coding sequence mutations in DPP6 [380]. DPP6 was sequenced in this thesis, with no mutations identified. The potential size of the gap in the genome reference sequence assembly makes it difficult to quantify using PCR based methods.

The initial genome wide linkage analysis carried out in CMT54 identified 7q34-q36 as the only locus with significant linkage to the disease. While three other loci achieved two-point LOD scores above 1.5, all non-synonymous sequence variants in these intervals identified by exome sequencing were excluded. Therefore, it remains likely that the causative disease mutation is located within the 7q34-q36 locus and is yet to be identified.

8.2 genetic linkage analysis in additional dhmn families

Using linkage analysis, the 7q34-q36 interval was examined in a cohort of 20 families with dHMN or CMT with predominant motor phenotype. One family (CMT44) was identified with suggestive linkage to 7q34-q36 (two-point LOD score of 2.02) reaching the the maximum theoretical power for this pedigree. Seven additional families had a haplotype segregating with affected individuals at 7q34-q36, however, they were uninformative in linkage analysis. Linkage to 7q34-q36 was excluded in the remaining 12 families.

Comparison of haplotypes between the families not excluded and a control group of 56 haplotypes indicated CMT54 and three other families (including CMT44) had haplotypes that statistically were significantly associated with the disease. This implied that the haplotypes were IBD and the families share a common ancestor for the 7q34-q36 interval. The common interval shared by 8 general discussion 191 these haplotypes suggested a smaller disease interval of 1.14 Mb. However, subsequent analysis comparing CMT54 and CMT44 haplotypes using a larger set of SNP markers showed the haplotypes are different. The initial false haplotype association may have been caused by chance or due to differences in ethnicity between the disease and control groups used in the analysis. The families with the common haplotypes all had similar ethnic backgrounds.

With no additional families conclusively linked to 7q34-q36 from the larger dHMN cohort it is likely that the gene on 7q34-q36 is a rare cause of dHMN. This reflects the genetic heterogeneity in dHMN overall, with most genes accounting for only a small percentage of disease [6]. If the disease gene on 7q34-q36 has a similar incidence to other dHMN genes we can expect few of the 20 families to be linked to 7q34-q36. However, this does not exclude the small families that were uninformative or gave suggestive LOD scores at 7q34-q36. Many of these families achieved the maximum theoretical LOD scores for markers at this locus.

8.3 genetic studies in family cmt44

In dHMN family, CMT44, exome sequencing identified the potential digenic inheritance of the known CMT gene MFN2 and an unknown mutation in a second gene. In this pedigree, the founder dHMN individual does not carry the MFN2 mutation, which is instead transmitted from his spouse who has a mild CMT phenotype. The four other affected individuals have a more severe combined dHMN/CMT2 predominant motor phenotype and carry the MFN2 mutation and the unknown mutation. The MFN2 mutation shows reduced or age dependant penetrance. The founder CMT individual has mild symptoms and onset during advanced adulthood. Two clinically unaffected but at risk individuals also carry the MFN2 mutation. A second mutation is likely to 8 general discussion 192 be inherited from the founder dHMN parent causing the dHMN phenotype. A haplotype at the 7q34-q36 disease interval was shown to segregate with individuals in CMT44 with a dHMN phenotype and not in any of the other family members. No putative pathogenic sequence variants were identified within the 7q34-q36 interval using exome sequencing. However, two candidate SNVs were identified in the genes DNAH11 and PCDHGA4. The SNV in PCD- HGA4 is a particularly good candidate gene for dHMN, with essential roles in the brain and spinal cord and a mouse model with severe neurodegenerative phenotype. As no pathogenic mutation was identified within the 7q34-q36 interval in CMT44, it is unlikely that this locus plays a major role in the disease and the suggestive (but insignificant) LOD score of 2 has probably occurred by chance.

The MFN2 P123L mutation was previously described in a small pedigree with severe early onset CMT2. This phenotype contrasts with the reduced penetrance and mild phenotype seen in some individuals in CMT44. The more severe dHMN phenotype in four individuals is likely to result from inheritance of both the dHMN and CMT2 genes from the two independently affected parents. The genetic analysis of CMT44 highlights the difficulties associated with diagnosis of dHMN and related peripheral neuropathies and reaffirms that caution must be applied when using unaffected individuals to define disease intervals.

8.4 conclusion

Identification of new gene mutations is critical to further understanding the biochemical and cellular processes underlying dHMN. Although the causative mutation for dHMN on 7q34-q36 was not identified, a significant proportion of the disease interval has been excluded using a combination of traditional 8 general discussion 193 and new technologies. This project spanned a period where there has been a revolution in sequencing technologies. NGS continues to be developed and improved. Bioinformatics platforms are also rapidly advancing for analysis of the vast amounts of sequencing data. Much of the sequencing data gen- erated by NGS in this project can be re-examined when new more powerful bioinformatics approaches address current limitations or when more powerful computing resources become available. In addition to the analysis of 7q34-q36 in family CMT54, a strong candidate dHMN gene mutation has been identified in CMT44. Finding the 7q34-q36 disease gene and other dHMN mutations will lead to improved dHMN diagnostics and may identify potential therapeutic targets for disease prevention or treatment. The further analysis of rare genetic variants identified through NGS will allow us to characterise the influence of less penetrant modifier loci on the variable phenotypes seen in dHMN and other peripheral neuropathies. Future studies should include transcrip- tome analysis by next-gen RNA sequencing, which may identify unknown transcripts and exons that map to chromosome 7q34-q36. CUSTOMGNUBASHSCRIPTS A

This Appendix details the BAsH scripts written to automate tasks and analysis in this thesis. All scripts were written by myself for BAsH, version 4.1.5 (Free

Software Foundation, Inc) using Ubuntu 10.04.4 LTS (www.ubuntu.com).

a.1 linkage analysis

The script pedmake.sh converts MLINK pedigree and marker data files into the input format required for the Easylinkage plus software package. File and marker naming limitations apply when using this script. File names can have up to 4 digits in a marker (e.g. D7s1234). Unique marker names are supported if added when prompted by the script and must be 6 or 7 characters long. For markers not named accordingly, manually edit the mlink.DAT file before running the script to change the marker names to comply with these requirements. Family names should be 6 digits long (e.g. CMT123). The output files from this script have been tested with; FastSLink, FastLink, SuperLink, and GeneHunter.

1 #!/bin/bash

2 echo"converts cyrillic output into easy linkage format"

3 OPTIONS="Help Run Quit"

4 select opt in $OPTIONS; do

5 if["$opt"="Quit" ]; then

6 echo done

7 exit

8 elif ["$opt"="Run" ]; then

9

10 echo"enter the marker chromosome"

11 read chrom

12 echo"D"$chrom > pattern.tmp

13 echo"d"$chrom >> pattern.tmp

194 a custom gnu bash scripts 195

14 echo"Are there any special microsat names(y/n)?"

15 read REPLY

16 while [ $REPLY =="y"]

17 do

18 # if[ $REPLY =="y" ]; then

19 echo"enter the special name"

20 read specM

21 echo $specM >> pattern.tmp

22 echo"another(y/n)?"

23 read REPLY

24 # fi

25 done

26 27 echo" *searching for patterns" 28 cat pattern.tmp

29 grep -f pattern.tmp mlink.DAT > markers.tmp# geta list of markers from. DAT file 30 echo" *the following markers were found" 31 cat markers.tmp

32 rm pattern.tmp

33

34 echo"is the disease marker in the1st position(y/n)?"

35 read REPLY

36 if [ $REPLY =="y" ]; then

37 b=42#position of first marker scoreA _1

38 d=45#position of second markerA _2

39 elif [ $REPLY =="n" ]; then

40 b=36#position of first marker scoreA _1

41 d=39#position of second markerA _2

42 fi

43 fcount=1# file counter

44

45 while read line

46 do

47 extnum=$(printf %03d $fcount)#adds leading zeros to the extension mumber for the file name

48 markerC=‘echo"$line"‘;# marker name line info

49 markerD=‘echo ${markerC/\"/\@}‘# Replaces first match of" with @

50 markerE=‘echo ${markerD/\"}‘# Removes last"

51 markerN=‘echo ${markerE:\‘expr index"$markerE" ’\@’\‘:7}‘# holds the marker name with the extra space at the end

52 echo"generating file for" $markerN

53

54 filename=‘echo ${markerE:\‘expr index"$markerE" ’\@’\‘:${#markerN}}" _" $chrom"_"$extnum".abi"‘

55

56 echo $filename

57 echo"output for marker"$markerN" into file:"$filename

58 echo -e"MARKER\tLANE\tID\tA _1\tA_2" > $filename#file header

59 a=0#line counter

60 while read line

61 do

62 a=$(($a+1))#a++ a custom gnu bash scripts 196

63 lineC=‘echo -e"$line\n"‘# load marker scores line into lineC

64 echo -e ${markerE:‘expr index"$markerE" ’\@’‘:${#markerN}}"\t1\ t"$a"\t"${lineC:‘echo $b‘:2}"\t"${lineC:‘echo $d‘:2} >> $filename;# MarkerName1 $aA _1A _2

65 done<"mlink.pre"

66 b=$(($b+10))# move to the next markerA _1

67 d=$(($d+10))# move to the next markerA _2

68 fcount=$(($fcount+1))#count to the next file

69 cat‘echo $filename‘

70 done<"markers.tmp"

71

72 # creating thep _family.pro file

73 #cut-b 1-39 mlink.pre>p _family.pro# create the pedigree file for easylinkage

74 # required extra field for SLink

75 a=0#line counter

76 b=$(($b+1))# move to the correct position for’S’ in Sample

77 while read line

78 do

79 a=$(($a+1))#a++

80 if[‘echo ${line:\‘echo $b\‘:1}‘ =="S"]

81 then

82 echo"Sample"

83 # echo-e${line:0:13}"\t"${line:13:7}"\t"${line:21:7}"\t"${line :28:4}"\t"${line:32:5}"\t"${line:37:6}"\t2" >> pro.tmp;#

84 echo -e ${line:0:41}"\t2" >> pro.tmp ;#

85 else

86 echo"No sample"

87 # echo-e${line:0:13}"\t"${line:13:7}"\t"${line:21:7}"\t"${line :28:4}"\t"${line:32:5}"\t"${line:37:6}"\t0" >> pro.tmp;#

88 echo -e ${line:0:41}"\t0" >> pro.tmp ;#

89 fi

90 done<"mlink.pre"

91

92 cat pro.tmp > p_family.pro

93 catp _family.pro

94 rm pro.tmp

95

96 echo"all done. Choose another option1(HELP),2(RUN) or3(EXIT)."

97

98 rm markers.tmp

99

100 elif ["$opt"="Help" ]; then

101 echo -e"limitations marker format D7s1234, with up to\n four digits for the marker name. Doesnt work\n for made up markers unless they are added in as specials\n will find D7S as well as d7s. There cannot be any spaces in marker names, manually edit these if there is not many in the mlink .DAT file. Make marker names6 or7 characters long"

102 else

103 clear

104 echo bad option

105 fi

106 done

¥ a custom gnu bash scripts 197

a.2 ngs coverage visualisation

The script PileupToBgr.sh extracts sequencing depth from SAMTOOLS pileup format and formats a bedgraph file for visualisation using the UCSC genome browser. To work there must be some sequence coverage at the start position selected. If there is no coverage at the desired start position then a position in the previous gene must be selected where some sequence coverage is present.

1 #!/bin/bash

2

3 echo this script will take pileup formatted data and generate a bedgraph showing the sequencing depth at each base position

4 echo it is recommended to examine small regions only to limit the size of the files uploaded to the UCSC genome browser

5

6 echo enter sample number# id of the sample being examined

7 read sample

8 echo"enter the chromosome in the format chrN"

9 read chrom

10 echo"enter the start position for the gene or region of interest in the format 123456789"#1st base position of gene of interest

11 read GeneStart

12 echo"enter the end position of the region of interest in the format 123456789"

13 read GeneEnd

14 echo enter the gene or region name"(Identifier)"# name of the gene of interest

15 read GeneName

16 GLength=$[$GeneEnd-$GeneStart]#calculate the length of the region of interest

17

18 # 1. makea smaller file with the info from just one chromosome

19 echo do you want to isolate chromosome data? only needs to be"done once.y/n"# skip this step if it has already been done previously

20 read Answer

21 if [ $Answer ="y" ]; then

22 str1="grep"$chrom""$sample".pileup>"$sample"."$chrom".pileup"

23 echo isolating $chrom data

24 eval $str1

25 fi

26

27 # 2. makea pile up file contining the sequence information for the gene of interest

28 # due to gaps in the sequence where there is no coverage at all this pileup will contain more coverage information at the end of the gene. This is because it counts lines equal to the length of the gene so ant gaps will cause lines after the gene to be included. This is no problem and beneficial as you can see where the gaps really are.

29 str2="grep-A"$GLength""$GeneStart""$sample"."$chrom".pileup>"$GeneName". pileup.tmp"

30 echo isolating $GeneName data

31 eval $str2 a custom gnu bash scripts 198

32

33 # 3. turn pileup into bedgraph format

34 echo generating bedgraph format

35 str3="cat"$GeneName".pileup.tmp| cut-f 1,2> one.tmp"

36 eval $str3

37 str4="cat"$GeneName".pileup.tmp| cut-f 2,4> two.tmp"

38 eval $str4

39 str5="join -12 -21-o 1.1,1.2,2.1,2.2 one.tmp two.tmp>"$GeneName".tmp"

40 eval $str5

41

42 # 4. add header to bedgraph

43 echo adding headers

44 echo"browser position"$chrom":"$GeneStart"-"$GeneEnd > header.tmp

45 echo browser full altGraph > header1.tmp

46 echo track type=bedGraph name=\" $sample $GeneName\" description=\"Sequencing depth"for" $sample $GeneName\" visibility=full color=200,100,0 altColor =0,100,200 priority=10 alwaysZero=on autoScale=off viewLimits=0:50 yLineMark =10 yLineOnOff=on > header2.tmp

47 str6="cat header.tmp header1.tmp header2.tmp"$GeneName".tmp>"$GeneName".bgr"

48 eval $str6

49

50 # 5. remove temporary files

51 echo removing temporary files 52 rm -v *.tmp 53

54 #merge Bedgraphs from different samples into one file with data on mutations in the region... can load all the mutations into the same file as there are not that many.

55

56 #6 upload the GeneName.bgr to the UCSC genome browser

57 echo"process complete" check $GeneName".bgr"

58 #echo GeneName.bgr is now ready to upload to UCSC genome browser. Make sure you are viewing HG18 build.

¥

a.3 ngs variant comparison with fold coverage analysis

Two versions have been written, CompSNPAllchr.sh and CompSNPchr7.sh. CompSNPAllchr.sh replaces CompSNPchr7.sh with more efficient code and handling of input data from multiple chromosomes. This tool compares SNP reports between samples using the SNP reports generated with SAMtools. It lists SNPs common to affected samples and not present in control samples, checking for coverage where the variant is not reported. The dbSNP annota- tions can be downloaded using the UCSC table browser. Limitations: Exome a custom gnu bash scripts 199

sequencing pileup files with good coverage are large (6 gb to 8 gb). Pileup files should be optimised to ensure good performance. If a linked region is being examined extract only the target chromosome data using the grep command (grep chr7 sample1.pileup > sample1.chr7.pileup). Otherwise use the cut com- mand to remove only the necessary colums out of the data files to reduce their size (optimised size 60%)˜ (cut -f 1-4 sample1.pileup > sample1.cut1-4.pileup). The SNP report files are smaller and are not a limit to the comparison on a reasonable desktop PC. Hard disk speed limits process on most computers, so locate files on local HDD not on USB HDD or network attached storage. If large amounts of RAM are available, load SNP and pileup files onto a RAM disk to increase speed but output should be directed to normal HDD space. This script is not recommended for routine use as it takes a long time to run on an entire exome.

grep chr7 sample1.pileup > sample1.chr7.pileup

cut -f 1-4 sample1.pileup > sample1.cut1-4.pileup

1 #!/bin/bash ¥

2 3 echo -e" **************\nCompSNP script\n**************" 4

5 echo"A specific file structure is required to run this script"

6 echo /Project

7 echo"/Project/snpfiles (/Project/\$SamPath/Sample1.snp)"

8 echo /Project/Sample1/Sample1.pileup

9 echo /Project/Sample2/Sample2.pileup

10 echo"Results are located in/Project/results renamed with the time of completion"

11

12 13 echo -e"\nprogram requirements\n * Sample data: sample.snp and sample.pileup\n snp file must have the same file name as the.pileup file\n * SNP database reference for annotation\n eg. chr7q34-q36SNP130.snp\n * read the script file for more detail\n\n"

14 read -p"Press any key to continue when files are ready or terminate usingC"

15

16 echo -e"\n\nEnter the path to the samples(dont use use/ at end of path)\n\n !!!Current default BGIData/onlychr7data/ will overwrite whatever you type!!! "

17 read SamPath

18 SamPath=BGIData#SamPath points to the sample.snp files only

19 a custom gnu bash scripts 200

20 #Initialise results files

21 echo"‘date‘" > $SamPath/debug.txt

22 echo"‘date‘" > $SamPath/GoodSNP.txt

23 echo"‘date‘" > $SamPath/BadSNP.txt

24 echo"‘date‘" > $SamPath/checkSNP.tmp

25

26 # get sample numbers without file extensions

27 # should use read to input new sample numbers each time buti was using these same four samples so added defaults that overwrite input

28 echo -e"Enter your sample numers\n\n!!!Current default will overwrite whatever you type!!!"

29

30 echo -e"enter sample Affected 1:\c"

31 read Sam1

32 Sam1=A040681# comment this line if you want to use different samples

33

34 echo -e"enter sample Affected 2:\c"

35 read Sam2

36 Sam2=A050794# comment this line if you want to use different samples

37

38 echo -e"enter sample Affected 3:\c"

39 read Sam3

40 Sam3=A080682# comment this line if you want to use different samples

41

42 echo -e"enter sample Control 1:\c"

43 read Sam4

44 Sam4=N010742# comment this line if you want to use different samples

45

46 read -p"Press any key to start analysis... terminate process early using C"

47

48 # addsa header with the sample numbers used to results files. Improvement would createa unique file name each run.

49 echo -e"Samples:\t" $Sam1 $Sam2 $Sam3 $Sam4 >> $SamPath/GoodSNP.txt

50 echo -e"Samples:\t" $Sam1 $Sam2 $Sam3 $Sam4 >> $SamPath/BadSNP.txt

51 echo -e"Samples:\t" $Sam1 $Sam2 $Sam3 $Sam4 >> $SamPath/checkSNP.tmp

52 echo -e"Samples:\t" $Sam1 $Sam2 $Sam3 $Sam4 >> $SamPath/debug.txt

53 # Error checking for files

54 echo"header for file $Sam1.snp" >> $SamPath/debug.txt

55 head -n 1 $SamPath/$Sam1.snp >> $SamPath/debug.txt

56 echo"header for file $Sam1.pileup" >> $SamPath/debug.txt

57 head -n 1"$Sam1/$Sam1.pileup" >> $SamPath/debug.txt

58

59 echo"header for file $Sam2.snp" >> $SamPath/debug.txt

60 head -n 1 $SamPath/$Sam2.snp >> $SamPath/debug.txt

61 echo"header for file $Sam2.pileup" >> $SamPath/debug.txt

62 head -n 1"$Sam2/$Sam2.pileup" >> $SamPath/debug.txt

63

64 echo"header for file $Sam3.snp" >> $SamPath/debug.txt

65 head -n 1 $SamPath/$Sam3.snp >> $SamPath/debug.txt

66 echo"header for file $Sam3.pileup" >> $SamPath/debug.txt

67 head -n 1"$Sam3/$Sam3.pileup" >> $SamPath/debug.txt

68

69 echo"header for file $Sam4.snp" >> $SamPath/debug.txt a custom gnu bash scripts 201

70 head -n 1 $SamPath/$Sam4.snp >> $SamPath/debug.txt

71 echo"header for file $Sam4.pileup" >> $SamPath/debug.txt

72 head -n 1"$Sam4/$Sam4.pileup" >> $SamPath/debug.txt

73 echo -e"\nNo file errors should have been reported. There should bea header for each file." >> $SamPath/debug.txt

74

75 # generate SNP reference file automatically

76

77 echo"Generating SNP reference file"

78 cut -f 1,2"$SamPath/$Sam1.snp" > AllSNP.tmp

79 cut -f 1,2"$SamPath/$Sam2.snp" >> AllSNP.tmp

80 cut -f 1,2"$SamPath/$Sam3.snp" >> AllSNP.tmp

81 cut -f 1,2"$SamPath/$Sam4.snp" >> AllSNP.tmp

82 sort AllSNP.tmp > AllSNP.sort

83 uniq AllSNP.sort AllSNP.uniq

84 echo -e"unique SNPs:\c"

85 wc -l AllSNP.uniq

86

87 rm AllSNP.tmp

88 rm AllSNP.sort

89

90 DataFile=AllSNP.uniq

91 echo"DataFile is $DataFile" >>"$SamPath/debug.txt"

92 DataCol=1

93 echo"DataCol is $DataCol" >>"$SamPath/debug.txt"

94

95 #Pull out all markers from file(add-u command to sort to pull out unique SNPs)

96 #echo"The data file contains the following SNP positions"

97 str1="cat $DataFile"

98 #str1="cut-f $DataCol $DataFile"

99 #str1="sort-k"$DataCol","$DataCol""$DataFile"| cut-f"$DataCol

100

101 #Count number of markers in $Datafile

102 echo -e"Begining analysis.\nThis may take some time.\nThere are‘eval $str1| wc-l‘ unique SNPs to be cross checked"

103 echo -e"number\tControl\tAff1\tAff2\tAff3\t"

104 #generates an array with the marker positions

105

106 OLD_IFS=$IFS

107 IFS=$’\n’

108 let line_counter=0

109 for line in $(cat"$DataFile"); do

110 str1=‘grep -m 1 ${line} $SamPath/$Sam1.snp | cut -f 1,2‘# is therea SNP in thos position?

111 str2=‘grep -m 1 ${line} $SamPath/$Sam2.snp | cut -f 1,2‘

112 str3=‘grep -m 1 ${line} $SamPath/$Sam3.snp | cut -f 1,2‘

113 str4=‘grep -m 1 ${line} $SamPath/$Sam1.snp | cut -f 1,2‘

114 str5=1

115 str6=0

116 echo -n -e"\n$line _counter:\t"

117 while:

118 do

119 str14=n

120 str11=n a custom gnu bash scripts 202

121 str12=n

122 str13=n

123 if [[ $str4 ]]

124 then

125 str14=Y#This is the control sample soY here excludes this SNP

126 echo -n -e $str14"\t"

127 str5=0

128 str6=0

129 break

130 else

131 str14=‘grep -m 1 ${line} $Sam4/$Sam4.pileup | cut -f 4‘

132 if [[ $str14 -lt 10 ]]

133 then#cannot exclude this SNP coverage is too low

134 echo -n -e"!!"$str14"!!\t"

135 str6+=1

136 else# if this is above 10 then the sample is included and will be caught by str5 analysis

137 echo -n -e $str14"\t"

138 fi

139 fi

140 if [[ $str1 ]]# if the SNP exists in the sample.snp file it is reported

141 then

142 str11=Y

143 echo -n -e $str11"\t"

144 else# the SNP is not reported in the sample.snp file, check the coverage in the pileup file. Field4 is the coverage figure .

145 str11=‘grep -m 1 ${line} $Sam1/$Sam1.pileup | cut -f 4‘

146 str5=0

147 if [[ $str11 -lt 10 ]]

148 then#cannot exclude this SNP coverage is too low

149 echo -n -e"!!"$str11"!!\t"

150 str6+=1

151 else#this SNP is not in this sample and has good coverage therefore is excluded

152 echo -n -e $str11"\t"

153 str6=0

154 break

155 fi

156 fi

157 if [[ $str2 ]]

158 then

159 str12=Y

160 echo -n -e $str12"\t"

161 else

162 str12=‘grep -m 1 ${line} $Sam2/$Sam2.pileup | cut -f 4‘

163 str5=0

164 if [[ $str12 -lt 10 ]]

165 then#cannot exclude this SNP coverage is too low

166 echo -n -e"!!"$str12"!!\t" a custom gnu bash scripts 203

167 str6+=1

168 else#this SNP is not in this sample and has good coverage therefore is excluded

169 echo -n -e $str12"\t"

170 str6=0

171 break

172 fi

173 fi

174 if [[ $str3 ]]

175 then

176 str13=Y

177 echo -n -e $str13"\t"

178 break

179 else

180 str13=‘grep -m 1 ${line} $Sam3/$Sam3.pileup | cut -f 4‘

181 str5=0

182 if [[ $str13 -lt 10 ]]

183 then#cannot exclude this SNP coverage is too low

184 echo -n -e"!!"$str13"!!\t"

185 str6+=1

186 break

187 else#this SNP is not in this sample and has good coverage therefore is excluded

188 echo -n -e $str13"\t"

189 str6=0

190 break

191 fi

192 fi

193 done

194 if [[ $str5 -gt 0 ]]

195 then#generatesa report on this SNP position from the UCSC snp 130 information

196 snpstr=‘grep ${line} chr7q34-q36SNP130.snp | cut -f 5,8,10,16,12,13‘

197 echo -e ${line}":\t" $str11"\t"$str12"\t"$str13"\t" $str14"\t"$str5"\t" $snpstr >>"$SamPath/GoodSNP.txt "

198 else# these SNPs are not found

199 echo -e ${line}":\t" $str11".\t"$str12".\t"$str13".\t" $str14".\t"$str5 >>"$SamPath/BadSNP.txt"

200 fi

201 if [[ $str6 -gt 0 ]]

202 then

203 snpstr=‘grep"${line}" chr7q34-q36SNP130.snp | cut -f 5,8,10,16,12,13‘

204 echo -e ${line}":\t" $str11".\t"$str12".\t"$str13".\t" $str14".\t"$str5"\t" $snpstr >>"$SamPath/checkSNP. tmp"

205 fi

206 let line_counter=($line_counter+1)

207 done

208 IFS=$OLD_IFS

209 a custom gnu bash scripts 204

210 echo"SNPs that need to be checked" >>"$SamPath/GoodSNP.txt"

211 cat"$SamPath/checkSNP.tmp" >>"$SamPath/GoodSNP.txt"

212 rm"$SamPath/checkSNP.tmp"

213

214 mkdir results

215 mv"$SamPath/GoodSNP.txt""results/‘date‘ GoodSNP.txt"

216 mv"$SamPath/BadSNP.txt""results/‘date‘ BadSNP.txt"

217 mv"$SamPath/debug.txt""results/‘date‘ debug.txt"

218 echo"the process is complete. The program output can be found in results/"

¥ PCRPRIMERS B

This Appendix lists the PCR primers used in this thesis.

Table B.1: PCR primers used in gene screening. Primers were supplied by Invitrogen in a 25 nmole scale desalted purification

Gene / Exon Primer sequence (forward / reverse) RARRES2 Ex1A F/R AGGAGCCAGAGTTGCTGGAG / ACCTAAACACGTGGGAGGG Ex1B F/R AGAGGAAGGAGGGAAGATTTTG / CCCGGGCCATGCTACTC Ex1C F/R AGTATTCGGCCTCCGGC / ACAGTGGCCTTTTCTCCAAAG Ex2 F/R ATTTCCAGGCTCCTTCACCT / GAATGCGCTGGTCTCCTG Ex3 F/R GGATCACACCAGGGCTTG / TTCTTAATGGACCTCCCAATTC Ex4/5 F/R GCCTAGTTGGGGATCTGGTC / CCCCTCAGCATTCCCAG ABP1 Ex1_1 F/R AAAATTCCATGGCCCTAACC / ATACTCTGCTGTGGAGATGGG Ex1_2 F/R AATGTCACCGAGTTTGCTGTG / GCTTGTGGGAGGAGAAGAGG Ex1_3 F/R GTACAACGGGAAGTTCTATGGG / ATAATGGACCGGGTCATCG Ex1_4 F/R ACCAAGTACCTCGATGTCGG / TTCCACCACCAGGGAGTTATAG Ex2 F/R GATCCTGGGGTAGAATGAGGAG / ATCCACATAGACTAGGAACCCC Ex3 F/R CTCTATTCTGACCCTGAGGCTG / GATGGTGATGAGGATGGTCC Ex4 F/R GAGGAATTCAGCAAGTTTCCAG / CTCTGAGGGTTTATTGTCTGCC EZH2 Ex2 F/R ACTGATTGTTAGTTTGCTGCGG / AAAACTTATTGAACTTAGGAGGGG Ex3 F/R TTTTGTATTATTTGAATGTGGGAAAC / AAAGATGGACACCCTGAGGTC Ex4 F/R ACCCTAAGTAAAAGAAAAGAGAGAATC / GGAAAAGAGTAATACTGCACAGG Ex5 F/R AAATCTGGAGAACTGGGTAAAGAC / TCATGCCCTATATGCTTCATAAAC Ex6 F/R CTAGGCTATGCCTGTTTTGTCC / AAAAGAGAAAGAAGAAACTAAGCCC Ex7 F/R GGCATTCCACAGACATGTAAAG / GCTCATCCGCTACATTGATTC Ex8 F/R CATCAAAAGTAACACATGGAAACC / GCAATCCTCAAGCAACAAAG Ex9 F/R TCCATTAATTGACTTTTCCAGTG / CATTAACGCTGACTTGATCACC Ex10 F/R TTCACAATATAGAACTGTCTTGCC / AAACACCACAAGCTAGGGTACG Ex11 F/R TTGAGTTGTCCTCATCTTTTCG / CCAAGAATTTTCTTTGTTTGGAC Ex12 F/R GCATCTAGCAGTGTCACAGAAG / AGAAACCAACAACAGCCCTTAG Ex13 F/R TTCATGGCACAGATAGATCCAG / CAAATTGGTTTAACATACAGAAGGC Ex14 F/R CACTCCACAGGTAGTAGGGAAG / AGGGAGTGCTCCCATGTTC Ex15 F/R GAGAGTCAGTGAGATGCCCAG / TGCCCCAGCTAAATCATCTAAG Ex16 F/R TTCTGTAGTCTACTTTGTCCCCAG / ATTCATTTCCAATCAAACCCAC Ex17 F/R TCCAGAAGGTCCAGTATTCACTC / GTCACCTCACTGACCTCTACCC Ex18-19 F/R CAAACCCTGAAGAACTGTAACC / AACCCTCCTTTGTCCAGAGTTC Ex20 F/R TTCTATGAATGTGCCGTTATGC / AAAACACTTTGCAGCTGGTG Ex20-N F/R CACTATCTTCAGCAGGCTT / GTTAAATCAAGATTCATACAAAGACAACTA

Table B.1 - continued on next page

205 b pcr primers 206

Table B.1 - continued from previous page Gene / Exon Primer sequence (forward / reverse) FASTK Ex1 F/R GTCTCCGAAGACCGATAGCTG / GTCTCCTCACCCGTGAAATG Ex2_1 F/R TGAGGTAAGCAGTAGGTAGCAGG / AAGCGCCACCGAGTAGTG Ex2_2 F/R GTCTCCGAAGACCGATAGCTG / GTCTCTGTCCGTGGTCCTCTAC Ex3 F/R AACGGGAGAGGAGAAAGGTG / CTGGCTCAATTCCACCTCAG Ex4 F/R CTGACAGGAGGAGGAAGGTG / CTTCAGGAAAAGGATAGCCGAG Ex5 F/R GGGACTGTCCTGTCTTTACAGC / ACGCACAATCAGAGCATGAG Ex6 F/R AACTACATCAGTGGTACGCAGC / CATGGTCCCTCTGGCTTTTC Ex7-8 F/R GTGGTGGCACTAGACTGGGAC / GCAGAAATGCCAGCGTTC Ex9-10 F/R AGACCCTGCCCAGAGGTAAAG / ATCAAAACACAGGGAACCAAAG Ex1-N F/R GCGTTCTGCTCTGGCTTTG / GTGGCCGCTGCCGCTAA Ex2B-N F/R GCTGCTGATCCCTCCAGTA / CTCTGTCCGTGGTCCTCTA Ex7-N F/R CTGTGCCTCCAGGCTACT / CCAGCGTTCCCGCAACA Ex8-N F/R CGCCACTACTCGAGACC / GGACTCCAGTTCCTCGAAG LR8 Ex2 F/R TCTCTCCTTTCACCTGTGCTG / TGTTGCTCCCAAAATGTAATTC Ex3 F/R AGAGGTCAAGGCAGAGGAAAAG / CAAGATCCATAAGCGCCTTC Ex4 F/R CCTGCTTCACATTGGCTATTC / GGGCAGAAGACAGATAATGGAG Ex5 F/R GGGAGGAGAGGGTTTCCTG / ACATCTCTGTCCTCCAGGG Ex6 F/R AGCAAGCTGGATAGAGTCAGTG / CCCCGCACTCCTACCTGTC Ex7 F/R TTCCTTTCCCTCCTCCTTTC / CATAGGGGAGGCAAGTGTGAG CENTG3 Ex1-2 F/R GTTGTTGTTGCCGCTGGAG / TGGAACACAGTCAGCAGAAAAC Ex3-4 F/R ATAGGTGACACAGGCAGCAG / AGGGAAGAGGACTGGAAGG Ex5 F/R CTGAAGGTGAGCGTCACAGG / CAAGTAGACCCTTCTCCCTGG Ex6 F/R GGTCTACTTGAGTTCACCTGCC / GAAAAGAGTAGGGGCTTGGG Ex7 F/R GTTCTTGGTGCTCTGTGTGTTC / ACTGGGAGCCTTCCTATCG Ex8 F/R AGTGAGAGCAAGGCTGTGTGTC / GAAGGTGGACAGGTGTGAGC Ex9 F/R CCCCATTAGCACAGGGATTC / TACCGTCTTCCTGACTTGGG Ex10 F/R TCTCAGTTCCTTCCCGTCC / CCTTGTTAAGGACTCCGCC Ex11 F/R TGACGACATCGTGACCTCC / AAATCACTAAGGTCTGCTTCCC Ex12 F/R GCAGGGAGAGACCTAGGAGTTG / CTGTGCCCAGGCTGAGAAG Ex13 F/R CTGAATGTCTCAGGCTCTGCTC / AAGAATGCTGCCCAGTTCC Ex14 F/R CCTAGAGAGAGGTGTCCGTCTG / GCAACTACTAGAAAGTGCTTAATCC Ex15-16 F/R GGTAGTGAGTGGAGTTGTGGC / GCGGGACAACTAGGGAAGG Ex17 F/R GAAGCCTTCCCTAGTTGTCCC / TCTAATACCATCACAGCCACTCC Ex18 F/R TCTCACTGTTTCTTCCTTGTTCC / GGGAGGAAAGAAGAGGTTGC Ex19 F/R TCTTCTTTCCTCCCCTACAACC / TGCTTCTCCATCTCTGTGTCTG UTR-N F/R ATGAGTAGGGGAGAAAGGCATATACTATT / GCTCCTCGCTTACAGCGTC IsoA-Ex1 F/R ATGAGTAGGGGAGAAAGGCATATACTATT / CGCCAGCCCAGTAGTCC IsoA-Ex1B F/R GAACTGCCCGCCGCAGA / GCTCCGAAGCCATGAACTTCC IsoA-Ex9 F/R GGAGGAGGGAGGCAGGA / CCCATCACAGAGGCCTGAG ACCN3 Ex4-5 F/R CTAAACCCTCAGCTCCTCAGAC / CACGAGTCCTTGCGAAGC Ex6 F/R AGTACAAGAACTGTGCCCACC / CACTAGGGCTACGGCCTG Ex7-8 F/R CTTGAAAGGAGTGGGAGAACC / ACACCTGGAGGCAAGCAG Ex9-10 F/R CCTCGAGATCCTAGACTACCTCTG / AACGGGCACACCCACTG NOS3 Ex2 F/R GTCCCCTCCCTCTTCCTAAG / CTCCCGACTCAGCTACAAGTTC Ex3 F/R GTGAGATCGCCAGTGCTGTAAC / ATTCAAAGAAGGGTTGAGCTTC Ex4 F/R ACCCCTGTACCCTTCCTCC / GGACAAGCAGGAAGTTGTGTG Ex5 F/R AGCACTTGCACAAAGCCTG / CGGGAAGCCTTAGGAACTG Table B.1 - continued on next page b pcr primers 207

Table B.1 - continued from previous page Gene / Exon Primer sequence (forward / reverse) Ex6-7 F/R CCTGCCTCCTCACCAGC / GAGGCTCCCCTCTTCCTG Ex8 F/R TGAAGGCAGGAGACAGTGG / CTTGCAGGCCCTTCTTGAG Ex9 F/R AGCACCAAAGGGATTGACTG / TTTTGGGGATGGAGTGAGAG Ex10 F/R GGGAAACAGAGATAGTCTCCCC / TCCTGCCCTAGTTTCTAAATGG Ex11-12 F/R GATATGGGGTAATCGAGGGC / ACTGCTGGGGCAGGGTC Ex13 F/R CACTCAGTATCCCAAAACCCTG / TCCTGACTTGTAACACTTCCTGTC Ex14 F/R GTGTTCAGAGATCAAGTTGGGG / GTTCCTGCCCAGTCACTCC Ex15 F/R CAAATCAATAATAATGAGGATCAGC / ACAGAGGGAGAGCTTTTCAGAC Ex16-17 F/R CCTCAGCCCCTCCCAAG / AAAGAACCAAACAGAATCAGGG Ex18-19 F/R AGGAGCAAGACGCAGTGAAG / AGTCAGAGAGGGGTGGGG Ex20 F/R CTCCTCCTCCCACATTCTCC / CTGGGTCGGGTCCTGAG Ex21 F/R AGTCACCAAAACACAAACATCAG / AGGAGAAGAAAGTCCAGGGG Ex22 F/R AAACCCTAAAGAGGCTCAGTGG / AGAGCTCCTTCCTGTTTCCAG Ex23-24 F/R GAAGAACTTGGGTCCTCCTTG / TTCCTATTTCCTGCCTTCCAG Ex25 F/R TTTCAGCCCAAAACGCTG / CTGGAGAGCGCGAGTCC Ex26 F/R GGCTCTGCCCCTGTTGAC / GGAAAAGAGCACAGTGGATCAG Ex27 F/R ACATGGAGCTGGACGAGG / AGGAATAATGCTGAATCCTTGC Ex4-N F/R GGCCTGTCCTGACCTTT / CCACGTGGTGTTAGGATCT EX18-N F/R CCAGGAGCAAGACGCAGTGAA / GCAAATGAGGCAGCCTGAGCTA EX19-N F/R CTAACCCGGCTGGTTCT / GGGCACTTATGGGGAGTCA Ex20-N F/R CCCTGTCCTGAACACCCTGA / CTGGGTCGGGTCCTGAG EX26-N F/R CCCGCTCCGGAGACTTT / CGTTGCTGATCCTGTCGGAA ZNF775 NC-Ex1 F/R CCAATGGAGTGGCTGGG / CTGGTCCATCCCTCGCAC Ex2 F/R AGAGTGCTTTATGGGATGGGAG / TGGCTTTAGAAAAGATAAGGGC Ex3A F/R TTCTCTCCATCCCACCCAG / GCCGGGAGTGGGTCTTC Ex3B F/R GTCACTTTGTATGCCTGGACTG / GCAGCTCTTACCGCAGC ExCR F/R AGCTGATTCAGGACGCGG / CTCAGCGACAGCGTGGG Ex3D F/R CTCTGCCAGGGCTGGTG / TTCAGAGCCTTAAGGAATCCAC Ex3E F/R AAGAGCTTCTCGTGGTGGTC / CTGAAGAGGGTGGTCACATCC Ex3F F/R GGTGGATTCCTTAAGGCTCTG / CATACCAGTTCTAGAGTCGGGC TMEM176A Ex1 F/R CTGCAGCCTGTCATGAACC / CTCCTACGCTCCTCCAACTG Ex2 F/R CTCCCCAACTCCAATGAAATC / AAGACATAACTGGCTCGGTGTC Ex3 F/R TTTTGGGGTTTTAGCGG / CATGTTGGAGGAAGGCAATC Ex4 F/R CCTGCATGGAGGCTCTCTC / TCTCTCTCAGAGTCCAGTCGG Ex5 F/R CTGTTGCAGGCTTCTCTGG / GGTAACACCTATTTCTGGTGGAG Ex6 F/R ATAAGGAGCATGGGGACAGAG / ATGATAATAGGTGGGACTTGGG Ex7 F/R ACCATGCTGCTGATTCTGGTAG / GTTTGCCATTGGCTCCTTC DPP6 iso1-Ex1 F/R GCTGAGCCAGGCAGAGTC / AACGTAAGGCGAATTCCAGAT iso3-Ex1 F/R CTGCCCTTAGGGACCAGAG / ACTGAGAAGGACAGACCTCCC iso2-Ex1 F/R TTACCTTACCGCTTGGACTCTG / CCCCAAATACATATCCTCCCTC Ex2 F/R TAGCTTCAAGAATGGGAAGATG / GCATAGTGAGAATTAGAACTTGGG Ex3 F/R AAATGACAATCAGAATGTTTATGG / AGACCCAACCTATTGAAAGTGG Iso3-Ex1A F/R CTGCCCTTAGGGACCAGA / CTGAAGACTGGATGGAAACG Iso3-Ex1B F/R AAACTGAAGACCTGGAAGATTT / CATCGCTCCTCACGGTG iso1-Ex1A F/R GAGAGAAAGCACAGCCAGA / GGTGTTGATCTTGCCAGT iso1-Ex1B F/R GGCTTCGCTGTACCAGA / AAAATTTCCCCTGCCGAC iso2-Ex1 F/R CCACTTGCCGGTTGCTAA / CTCTGTCTCTGTCTCTCTCA Exon2 F/R ACTGTTTTGTATCTTGGTTCTTGA / GCATAGTGAGAATTAGAACTTGGGA Exon3 F/R CAATCAGAATGTTTATGGTCCTCTC / GCAAATTACTAGACCCAACCTAT Table B.1 - continued on next page b pcr primers 208

Table B.1 - continued from previous page Gene / Exon Primer sequence (forward / reverse) Exon4 F/R AGTGATTTTACTTTACATAAGCATTCG / TTGGCTATGCTAAAAGTTGAAACC Exon5 F/R AAATGGCACCAGCAGATAC / AGAAGAGTCAAGGAAATATACTTAATGAG prot-Ex6A F/R AGCATCCTGGGACCATAG / GGTCTCCATGGAATATCTGAAACA prot-Ex6B F/R CATTGCCAGCTTGCCATGTA / CCAGTCTCCTTCCCATGATTA prot-Ex6C F/R CAATGGACCCTCGTGAATAC / GCTTCAGACACATGGAGAAAG Exon6 F/R CTGATGTTTGTTAGGATTCACAGT / AACAGCTCGTCCATGTTC Exon7 F/R AGGAGAAAGGAGTTAAGTTATAGGTAG / CAGGCTGAGTACAGTGAGAT Exon8 F/R ACCCGGTTGATGATTTCAG / AAGAACCCAACTCCAATGTAG Exon9 F/R CCTCATGTTCCTGCTGG / ACTGAGTGAGCAGGGTT Exon10 F/R GCTTTGCAGAGCTGAAGT / CCAGAAGAGACCATGCAC Exon11 F/R TGGCTGTGTCACCACTG / GGGCTGAGACGACTCTG Exon12 F/R GCCAAATTGTTTCCTGCAC / GTGATTTACTCTGAGAAGCCAT Exon13 F/R ATTTTATCCTGGTTCAACCTCTT / CCTAGAGGCTGTCACCC Exon14 F/R GAGGATGAAGAGCCATGC / CTGTACTCCATAAGTGTAATTGACAAA Exon15 F/R TGTCCATAGAGGTAGTGCCA / GGTGCAATTCTGCAAAGC Exon16 F/R TCCGCAGATCACCCTCAT / CCCTCCCTGCAAGTGTG Exon17 F/R AGCCTGTTTTCTTGTTCTCG / GGGCCAGCTGATCTCAT Exon18 F/R CAGTTACGCACATGAAAAGC / CTGCACTGTGTCACAGC Exon19 F/R CACCCAGAGCACCCTCA / TTCAATATTCCTTTATCATGGGTGT Exon20A F/R CGGGTCACGGGTAAAATC / CTTCGTGCAGGAGCTTG Exon20B F/R GTGGTGGTAAAGTGTGACG / TTTAGGTAAGAGCAAATTGCATGAA Exon21 F/R TCTGGGCTACAGGAAGACAA / GGATTAGACGTGGGTAATGG Exon22 F/R GAACTGGCTGCTCTGGG / CTGAAGGGAGGGCTGTG Exon23 F/R GGCCCTCCCTAGAGTTTG / GAAGGCAGAGTGGGCCATA Exon24 F/R TTCTCAGTCAGGTTTCCAAGC / ACCCATGGTGCCATCAA Exon25 F/R AGAGCCTCCTCCTTGATG / TGAACACCCGTCAGGTC Exon26A F/R CACACCTTCTGTGCGTTC / TGGAGGCCGTGAGTGAT Exon26B F/R TTGCTTGGGAAACAAGCTC / TCACCATGTTGAACGTCCTCATTA Exon26C F/R GTTCATATATGGGCTTGCTACT / TAAACCAAATGTCATGGGAAAGAG Exon26D F/R TGTCCTCCTTGGTACCG / CGAAGTCAGTCCTGACATTC Exon26E F/R TTCCTTTTGAGCTTGTGGAC / AAGTCTCCTGAAGCCTAGCA Exon22-N F/R TGGAACTGGCTGCTCTGGG / GGACGGTGGTGCTGAAGG CUL1 Ex2 F/R TTTTCATCTAATGGAGAACTAGCTG/TTTTACATGATGAACTGGGAGG Ex3 F/R TTTGGTTGATATATTGGGAATGC/TGCAATAGCAAAACACTGTCAC Ex4 F/R CGCATATCAAATGTTCAGGG/AAACTGTGATGGTAGGGTCAGG Ex5/6 F/R TGTCCTGATAGCAGATTTTGACC/CCTTCGTAAATGCTAACAATATCC Ex7 F/R AACATAACTTAAACTGAATGCTACACC/CAACAAATGACTTGTAAAGAAAGC Ex8 F/R CCTCCCTTATTTTCTTAATCGC/TTTTAAAACTGACATGGGCAAG Ex9 F/R TGAGAATTTTAGGAAGCATTAGAGTTG/AGCCTTTTGTTTCTAGGGAAGC Ex10/11 F/R GTTTCAACGATGGTACCAAAAG/GGAGGGAATGCAGATTTTCAAC Ex12 F/R TGAGATTGCCAGTTTGCTGTAG/GGACTGACTGAGTATTAATGAGAAAAG Ex13 F/R AAAGCCAAAGGCACATCTTG/GCTCGGCCATAGAGCTG Ex14 F/R TGTGCAACATAGAGTTGTGGG/TAGAGCTGACACCACTTCAAGG Ex15 F/R AATAAAGAGCAATCCACATTCATC/ATCTGCAGGCCTGTCCC Ex16 F/R GCTTTACTTAGAAACGCTGCC/AAACCTGCCACTATTTCACCAG Ex17 F/R AACTTGGGCTGTATATTTTGGG/AAAGATGCCCATATCCTCTTCC Ex18/19 F/R GAAAATGGACTGTGAGGTTGC/FAATACTGAGATCCGTGCTAGGG Ex20 F/R AAGCAGGGCACTTCTTAAAGG/GCAATCACTCCTTTGTCACCC Ex21 F/R CTCTGTGCAGCTTGTCCTTAAC/CATGCTACAGCAATCAATCCAC Ex22 F/R TGCCAGGAAGTAAAACTCTATGC/ACGGAGGGTCAGTCCCC KCNH2 Table B.1 - continued on next page b pcr primers 209

Table B.1 - continued from previous page Gene / Exon Primer sequence (forward / reverse) Ex1 F/R ACCCGAAGCCTAGTGCTGG/CCATCCACACTCGGAAGAG Ex2 F/R GAGTGGAGAATGTGGGGAAG/CTCGGGGCGTTCTCCAG Ex3 F/R ACTTGGGTTCCAGGGTCC/CCATTACATCCTCCCTTCCTG Ex4 F/R GTTCCCCTCCTTCCCTTACC/GTGAAAAGGTCAGAAGCCGAG Ex5 F/R GGTCTCCACTCTCGATCTATGG/CTCTGGATCACAGCCCACTC Ex6 F/R CTCCTCCTCATTCTGCTTGG/TTTCTCTGTCCTCCTCGCC Ex7 F/R AAGTCTTTGGGGATCTGACCTC/GGCTAGCAGCCTCAGTTTCC Ex8 F/R ACGCTGAGACTGAGACACTGAC/ACTTGTTTGCTGTGCCAAGAG Ex9A F/R AGATTTCTCTGACATGGAGGGG/CCCAGTGACTGCATATTCAGAAG Ex9B F/R GCCCTGTACTTCATCTCCCG/GGCCCTTAGTGAAACCAAATG Ex1-New F/R AGTCCAGTCTTGGCCGC/ACCAGGCCCCATTGACTC Ex2-New F/R CAGTCCCAGCCATGCTTC/CCCACAGAACCCTGCCC Ex9-New F/R AGATTTCTCTGACATGGAGGGG/CCCAGTGACTGCATATTCAGAAG Ex10 F/R AGCTGAGGGGACATGCTCTG/ATTCAATGTCACACAGCAAAGG Ex11 F/R TTTTCTGTGTTAAGGAGGGAGC/AAGGGATGGGAAGGTCTGAG Ex12 F/R TCCTGCCCAGTCCTCTCTC/CTGGGTGAGCGGGGTAG Ex13 F/R GAAGAGCAGCGACACTTGC/CCCTCTACCAGACAACACCG Ex14 F/R AGGGCCTGGGTTGACAG/CAGCAGAAAGGCAGCAAAG Ex15 F/R CTCCCGTCCATCCTCTGTC/CTCCTGAGCAGGGCCTC Ex1-New2 F/R CCCGAAGCCTAGTGCTG/CCCATCCACACTCGGAAGA Ex2-New2 F/R GCTGTGTGAGTGGAGAATG/GTCACACCCCCACAGAAC Ex4-New2 F/R TCCATTTCCCAGGCCTT/GGCCCAGAATGCAGCAA SMARCD3 Ex2 F/R CAGATCCTCAGAATGGCCC/ATGTGGTGGGAGCAGGAG Ex3 F/R GTCACCAGCCGAGGGTC/CGCTACTCGCTTACCTGGTC Ex4 F/R GGTCACTCCTGTGTCCTGAGAG/AGGCCCCTGTGTTCTGG Ex5/6 F/R CTCCCCTGACATCTTTTGTACC/CCTTAGTGCAGACACCTTGTTC Ex7/8 F/R GCAGAGGATTTCATCTAAGCCC/TCTCTCAAGATGCACCACTTTG Ex9 F/R TGTTTCCTCAATCAACACATCC/ATCTAGAAGGGAGGGGTGGTAG Ex10/11 F/R GAATCAGACTTCCTCTCAGCTTC/ACACCCTGGTCTCAGAAAAGTC Ex12/13 F/R TGAGGGGTTGTCCTTCACAG/CCACAAGCTGAACAGGGAG Ex14 F/R GGTGTCAGTAGGTGGGATTGTC/TGAATGACTTTTAATCCAGCCC Ex3-New F/R AACGCTCAGGGTCACCAG/CGCTCCCTGCTAATCATTC Ex3-New2 F/R TCAACGCTCAGGGTCAC/ACTCGCTTACCTGGTCC NUB1 Ex2 F/R TTACCCAACAAAGCACCTTAAC/GCCAAAAGTTTATATGGTCATGC Ex3 F/R TCATCAGAGTTGATGGCCTTAG/TGGATTGTAACTTTCCAAACAAC Ex4 F/R TCAGTTTATATCCTCTGCCGTC/CTCACAGAGGATCAAATAAGCC Ex5 F/R ATAGGATCTCTCTGTTGCCCAG/TTGGAAAAGCTATTTTCTGCTTTAC Ex6/7 F/R GGGTGACAGAGTGAGACCTTG/TCCAACCCTTTACAACACTTCC Ex8 F/R TTCTTAATTTTGGCTTTTAGATTGAG/CCACTGTTGTAGATGAAAGTCTGC Ex9 F/R AAACACAGCTCCCTTCCTAAAC/GCCAGATTTGTAAAACATTGAACAC Ex10 F/R TTTAATACCTATCATGTTGCCCC/GAGAAAGCACAGGCATGTTTC Ex11 F/R CCAAAATCTGATGCCTTTTAATTC/AGTTTCTCCTTCCTTCCCAATC Ex12 F/R CTGGAAATGCGGAGAATCC/CTCAGGGACAAGGCCAGAG Ex13 F/R TCATAGCAGCAGAGGACGG/AGTCCTGCTGCCTGATTCAC Ex14 F/R TCTTAAACATGAGAGTCGGCTG/CCTCAGTCTCCACATCTGCC Ex15 F/R CAGTCTGAGGTTGACTGTGGTC/AAAGAGCTGCATTAGCCTTTTC TMUB1 Ex2 F/R GACCCTGTGGAGGTTCCAG/ATAAAGGAGGGCTGGGTCTTAG Ex3 F/R GAGGGTGGGAGTGCTGTTC/AGAGGCAGGCCGGAGAG Ex2-New F/R GGTTTAGGTGAACCGTAACG/CCTTCTGCGACCACTCTC Table B.1 - continued on next page b pcr primers 210

Table B.1 - continued from previous page Gene / Exon Primer sequence (forward / reverse) Ex3-New F/R GTGGGAGTGCTGTTCCG/CAGTCCGTGCCCTTCTC REPIN1 Ex1A F/R GTTGGTCTCTGGAGTGACTGTG/CCGCAGATGAGCTCGAAG Ex1B F/R GCCCTGGTTCTGCATCTG/GATCGCTTGTGAATCTTGCTG Ex1C F/R CGCTTTACCAATAAGCCCTATC/GGCTGCCCTGGGAGAAG Ex1D F/R AAGAACTTCGGCAAGAAGACG/TAGGATTCCCATTGTCCCCTAC ABCB8 Ex1 F/R AGCCAACATAGAGCCCTCAG/CTCGTGGACTCCATCTCCC Ex2 F/R AACAAGCATAAACATTTGCGG/CTACATGAGGCCTTCTCCCAC Ex3 F/R AAGAGGAAGTGCTCCTTCCAG/GGAATGCCATCCCTTTGC Ex4/5 F/R GACAAGCGCAGTGCTCAGTAG/CTTCAGGATAAGGTCCTACCCC Ex6 F/R GTGACTGATGGCCTGAGGG/GTCAGCCCAACATCAGGG Ex7/8 F/R CTGACCCTTGGAAAAGTCCTTC/TGAGAGAGACCAGTGCATTCAG Ex9 F/R AAGAGGAGTGAGTCCTGGGAG/ATGAAGGGAGGGCCTTTG Ex10 F/R ACTCTCCCAGTCAGCAGTGG/AGCTGCACAGCAGTCGG Ex11/12 F/R ACCTGCTTCCTTGCTCCC/GGGGACATTAGTCCAGACCTC Ex13/14 F/R AGGGCTACAGGGAGACTTCAG/AAAGAAAGAAAAGGCACCTGAG Ex15 F/R GACTGTCCCATTGTGTATGGTG/CTGCTCACTGCTCCTGTGG Ex16 F/R GGCACTTAGAGGATCTTGATGG/CAACTTCATATAACTCATTGCAGC Ex17 F/R GCCCAATGGCTGATAGAGAG/AGTCCCCAGTAGGCTCCC CRYGN Ex1 F/R AGCAAATCTTTTGTCCGGG/GGATTTGTCACCATCAACAGTG Ex2 F/R CTGGGAGTAGGGGTTCCATC/GATTCCTTGCCACTGTCTGC Ex3 F/R GCTTCTGTTTCTGGGAGAGG/CACTGCTGGAGGTCTCATTC Ex4 F/R GTAGATGGGTCCCCAAAAGG/AACTGGCAGAGAAATGAAAACTG RHEB Ex1 F/R AAAAGCGGCGGAAGAAG/CTCACTTCCCCAGACGAGAC Ex2 F/R AATGTTTTCTGACCCTGATACC/CCTCCATGAATACACAATGGC Ex3 F/R TCATTGTTCAAACTTCAGGTGG/ACAACCTGAGGGGTTTTCTTG Ex4 F/R ATCCACTCTGCGTCTACTTTG/ACTATTAATAATTCCAGCACAGTCC Ex5/6 F/R ATGTTAAAATGGTGAGTCTGTGAC/ACGACCATAACACCTCTGAACG Ex7 F/R AGAAATTGACCAAAATGGAGCC/ATCAAGACAGAGGTCAAATGCC Ex8 F/R CTGAAGGAAACAGGGTCAAAAG/TGAAAGCCACATCAAGTAGCAC Ex1-New F/R GATGGAGTTTCCGACACG/CGGCTCCAAACTTCGCA PRKAG2 Ex1 F/R GTGGCTTCAGGGTGTAACAGAG/AGCGTTAACCGCTTCGG Ex2 F/R AAGGATAAGGTCTCAGGAAGGG/GTGAACACACAGGTGGAAGTG Ex3 F/R AGTTTCATACAGGGCAGGAGG/ATACAGGCACTCAGCACCAAG Ex4 F/R GAGACCAGACCTCTGACCACC/TGCAGATAGACTGCTCAGCTTG Ex5 F/R GTCCCTTCCCACGCTTC/TAATTCGAGTTCGGCGAGTAAG Ex6 F/R GGTTTACGTATATGCTCTTCATCC/AATAAACGTTTGCTGGCATTG Ex7 F/R AAGACTTGGTCAAATATGTTAAGGG/TGAATTCCAAACCGGCATC Ex8 F/R TCCAGTTTGGTATACACAATTAACAC/GATTTGGAAAATCACCATCAGC Ex9 F/R TGCTATTATCCAAGCTATGTTTTG/TTTTATGGAGCAGAGAGAAAAGG Ex10 F/R AAACTGAAGGGTATTTGAATTTGTG/GCCCTGTTTGGAATGAAGAAC Ex11 F/R AAGTGCTTTAAGGCAGGGGTAG/GGGAGACCTCAAAACTAACAATTTC Ex12 F/R TCATGTTAATTCAGGCATCCAG/CTCCTGTGCCAGGTCATCTTAC Ex13 F/R TGAATAAATTCAGCACCAAGGG/CCTCACACCCCAAAAGGTTAG Ex14 F/R ATGCAGGTACTAAATTCAGATAAGC/CTGAGATTCAGGCTTCCAGAG Ex15 F/R GGCTGGAGGGATGTGTTG/TTAAATGCTGCACTTCCTGTTC Ex16 F/R CTGCCAGACACAGGTAGAAACC/ACATTCTTAACCACTTGCAGCC Ex1-New F/R CTTCTTGGCCCGCTGGA/CAGCGTTAACCGCTTCG Table B.1 - continued on next page b pcr primers 211

Table B.1 - continued from previous page Gene / Exon Primer sequence (forward / reverse) GALNT11 Ex2-New F/R AGTGGTCCTACTTGGTGGTTTG/TGATGCAGCATTTCTGATCTC Ex3-New F/R CAACCTAGAGACAACCTAAGTTTGAG/CCTCGTGACCTCTTACACTGAC Ex4-New F/R GGTGAACAAGGGGACAGTAAATC/CTTGAAAGTTCAGCTGTGTTAGTG Ex5-New F/R CTGTAGTATTTGCTTGATAGTTTGTTG/ATCAGGACAGGCGCTGAG Ex6-New F/R ATTGGACGGGTGGTTGTACC/CTTTTGCTCCTTACAAAGAAGG Ex1 F/R ACGGGTGGTTGTACCTAAATTG/CTTTTGCTCCTTACAAAGAAGG Ex2 F/R GTTATGTTCCAAAATGACACTGAG/GGCAGCAGAGAAAGACCCTATC Ex3 F/R CTTGCCAAGTTTCATAAACACC/CTTGCCAAGTTTCATAAACACC Ex4 F/R TTCATTGATTCTCAGATACAGTTTAAG/TGCGTAGAGAAGTTCCCATAAAG Ex10-New F/R AATTCGTACCTCCGGAATAACC/CTGATGTCCTCAGGAGTGACTG Ex11-New F/R AGTGCATTGGTCTGTTTCTGC/AAAATGAGTTCATCTGCCACAC Ex12-New F/R ACTAACCTTTGGCTAAGAGGGG/CTTTCCTAAAGCCCCAAACTCC SLC4A2 Ex2 F/R CCATCCAGGTTATGCCTCC/CTCACACCCCTGGTTGGC Ex3 F/R ATTAGAGGAGATAGCTGCTGGC/ACTAGGACCCCAGTCCCTTC Ex4/5 F/R GGGATCAGGTTTGACTGTCC/GTTCCAGCATCACCTCACTATC Ex6/7 F/R AGCTAGGAGGGCCTGGAGTC/AGGCTTGAGATTACAGCTGGTC Ex8 F/R ACTCACCTCTGACACCCACC/GTGGACCATGCTTCCTCTG Ex9/10 F/R CATGTGTAGGCTGGTGTTCC/CCTCCCACTCTGAGTCCAC Ex11/12 F/R TGAGGAAGCCTGGGTGTG/CACAAAGAAGAGGGCTGAGG Ex13 F/R GTCTTGATCCCCATGACTGC/ACCCAGGCCTCCGTCAG Ex14/15 F/R ATCTGCCCAAGATAAGGGTACG/AGGTCCTGAGCTCTGGGTG Ex16 F/R CTCGGTGAGGGCTCTTCTC/CTCAAATCCAGAGGATGGGAC Ex17/18 F/R TTCCACCCGACACTCACTG/TAAGCTTAAGGACAGGCTGGG Ex19 F/R CTTAAGATGCCATCCCCTTTC/ATCCAAGGTTCCCTGAGGTC Ex20 F/R CATTCTGGAAAGACCCTGTG/ATCCCCGATAACTATGGAGAGG Ex21 F/R GTACGTTGCCTCTTGCCTTTC/CTGAACTGGGCAGATGGG Ex22/23 F/R CTGGCACCTAGGAATGTTCTTC/CCAAAGCACTTTACTGCAGGG XRCC2 Ex1 F/R CTGCGCAGTTGGTGAATG/CCCAAGCCTCCCAATCC Ex2 F/R CACCCAGCCTAAAGTTATGAAATC/CACAACCTGTGTCTCATTTTCC Ex3A F/R TTGCTTGTACAGCTCCATTTTG/TGCATTATAGTTTGTGTCGTTGC Ex3B F/R CCTTTTGATTTTGGATAGCCTG/TTTAAAACTTTAGGGCTGGGC ABCF2 Ex2 F/R ATTCCTGAGAGGGATGGAGC/ATCAAGATTCCAACTCTGGCTG Ex3 F/R AGTGCTCAGCAGGATTGGAG/TTTGCTCTGAACACAGTTGATG Ex4/5 F/R ATGGATGAGGTCCCAGTTTGAC/ACCACTGACTCCTGTTCCAAAC Ex6 F/R ACTCAGAATTGCATGTAGCAGG/TAACATCAGTGCAGTTTCTGGG Ex7 F/R GGACAACACATCTAAAAGGTGC/TCAAGTGACCCACCCACC Ex8/9 F/R CTTACTGCACAGCCTTAGAGCC/GCTCCTTTCTTATCCACCTCAC Ex10 F/R GATAAGGTGGGGTGTGATTGG/AGCACAGAACAGGAGCAACTC Ex11 F/R CTTCCTCCTCTCTTGGTGACTG/AGTAGTAGCCCCTGCATCACTC Ex12/13 F/R GTTTGCCTGCTCTGCCTATG/AAATGGAGGAGACTCTGGTTTG Ex14 F/R TTATAAATGGCGATGCTAATGG/GGGTATTATGCACCCTCCAC Ex15 F/R CTTCCTAGACGCCCCTCAG/GAGCGGCTGGTCAGGTTA Ex16 F/R CTTGCAGTGTAGGCAGGACC/CACAGCCGTCTGGACTTCTC KEL Ex1 F/R AAGGGCAAGATTGCTTGG/GGGCAGGACTTTTGCAC Ex2 F/R AACTCTCTTCAATGCACCAT/TAGTAGATGAGTGTTTGTGGGAT Ex3 F/R TTCTTCCCATCCCTGTCTCTCAA/GGTCAGGGTGGGCAGAA Ex4 F/R CCTGATGTTCTTTGCCTAGC/GGAGCAGTCATGGTCATTT Table B.1 - continued on next page b pcr primers 212

Table B.1 - continued from previous page Gene / Exon Primer sequence (forward / reverse) Ex5 F/R CCTTTGATAGGAATTTGGGGC/TAAGCATGCATTGGTCCCATA Ex6 F/R AGAGCCGATCCAGACAA/ACCTTCCTCTAAGGGATGATT Ex7 F/R CACCTGCCCTCTCCATC/CCAGAGGTCCCTGCTTT Ex8 F/R CCTCTGAGGCCAGGAGAA/GAGGAAGAAGGAAAGGAGGTAAT Ex9 F/R TCACACCCAAGGGGAAG/CCAAGATCTACGGTGCTCA Ex10 F/R GGCTCTTCTTGAAGCATCC/GAGAGAGATGCCGACATTTAC Ex11 F/R GGAATGAAGATCTAGTGAAGACTG/GGAAGGCTGCTTTGGGTA Ex12 F/R GTCAGAAGCTGTGGGCA/CATCTCGCTTGTTCCAATACT Ex13 F/R GCAACGAGTGGGAGAAAT/CTTAGGAGGGTCAGAGAAGTG Ex14 F/R ATTGGTGTTTGTCAGCGT/GGCTTCCTTTCCTGTGAATC Ex15 F/R GAGGGACTGTGTAGGTCTCA/CATCATAACACCTGTCGGC Ex16 F/R GGACTTACGGTTGGAGAATTG/CCCTTGTGGTCTTCCCTTA Ex17 F/R CTCACCTAGGCAGCACCAA/CAGGAATGGTGGAAGGAAA Ex18 F/R GTGTGGAGGGACTTTCAAAT/AGCTTCAGTAATAAGGCGTTC Ex19 F/R ATCACCCCTTGTCACCG/GGAAATCTTGCTCTGATTCCAT MLL3 Ex1 F/R GGAGGAACCCGAGCAGCA/GGTGACAGCTCCGGCCC Ex2 F/R TCAAATTGTGCCTTATGTTGAATATAAGTC/TACATGCAGTGTACAAACATATGCAAA Ex3 F/R ATGCTTTATAGATACATTCACCAGTGATTA/TTTCTCTATAGTCATTTCACTGCTTTATTC Ex4 F/R AAGCACAAGTATCTATAGTTATCGC/GAACAATCCCATTTGAAAATTTATAAACAT Ex5 F/R ATTTGGCAGTACATTGCATTTAAACA/CTAGATGAAAGGTATTTATTATTGAGGACA Ex6 F/R AGAGTTGATTTTGAAGTTTTATGTCTTTAT/AATCCTTTCAAGTGAACTTCTGATT Ex7 F/R CTTCCCAGCTTATATTTGAATATTCCTT/ACTCAGGGAATTTACAACATTTGTTA Ex8 F/R AATCCTTTCAAATGATTTTACTGTATAGTC/CTATTATATTACACAAACCTTTAGCACC Ex9 F/R GCAGTTCCCAGGACTGATTATT/CCGATTTGTCTGGTCTCCAT Ex10 F/R AAGGTTTACATCAGAACTACAATTACAAAT/GCTGTAAGATGGTAGAGCAAATGAC Ex11 F/R GCAATGCCTTCTCTTTACCTAGATAAT/ATTGGTGATACAATGGCACAAC Ex12 F/R AATCATAATTGTTGACTTAATGAACAAAGA/ACTGAAGTTTTAGTCAATATTCACATATTT Ex13 F/R AACTCTTTACTCAAATTACTGTTGAGGGAA/AGCAATCGCAAAAATGTCTTTGG Ex14A F/R GTTCTAGTGTAGCCAAATGTTTTATATATC/CATCAATCCAGTAGAAAGTTCAGAAT Ex14B F/R CATTATGTCCTGAGGAACAGT/TCCACACCAAAGCTACACATATAC Ex15 F/R TTCCTTAGTTGCTGGCTTTGAC/AAGATTTAATTCTTAACATCCAGTAGGG Ex16 F/R GTTTTATCTCAGTGGCATTTGGATTTA/ACTTGCTATGAGATTTTCATCATTAACT Ex17 F/R TTCTGAAGCTTTCCAGTTAAGTGTA/GTATATGAGATAAAGCAGAAATGTGCAA Ex18 F/R TAGCTATAGAGCTATATTAGTTAAGAAGGT/AACTGGGAGCTCATGTTACT Ex19 F/R CTTCTAGGTACATAAGCACCATATTAATTT/ACCAAAGTGGTATGAGAAAGC Ex20 F/R CTTCCTGAATTACAGTAACTGAGTAACTA/TTGTGATTAAGGAAAAGCTATTTTCCAT Ex21 F/R TGAAGACTCAGTGTTACAGGTTT/GGTTATGGACAGTAAGGTGCAA Ex22 F/R GCACTTTTAACTTTAGTGTTCAAACAT/GTTTGACAAGTGGTAAGCCAAT Ex23 F/R ACATGGGTATTTTGGAACAGTAATC/ACACATATTTCAAAAGTTACTTTATATGCT Ex24 F/R ATTCCCCATTATGTAGGGATTATATATTTG/ATTGAAGTCTTCAGTATTGTATTTTTCTTT Ex25 F/R CCTGTGTACTGTGGAAATGTAG/TTTCCAATAGAAATATCAAGACCAAAGG Ex26 F/R ATTACCTTCAGTGGGTTTGTCT/ACAACAAACCTATGTTTTAAAACTCAATAC Ex27 F/R TCTGTTATTTCTGCTTCCTTGTCAA/AAAGATTTAACTGAAAGGAAGATGTATG Ex28 F/R CAGTATCTGCTGTACTCTAGGTT/ACACATCATACACCTCTCGC Ex29 F/R GTTATATACTAAGTACAGGTAAATAGTGCT/AGAAGTTAAATTATAACCATGCAAAAATCT Ex30 F/R AAGATTTTTGCATGGTTATAATTTAACTTC/AGACTAGTAGGGTCCTGGTA Ex31 F/R CCCATACCAGGACCCTACT/CTTGCTTGTGTGTGTATATGTACC Ex32 F/R AGCTTTACTTCAACTCCTTGTTAC/AATATCATACACAATCATTTAAGCAGACTA Ex33 F/R TGTAATAGGTTTTAATGAAAGCTATACTTG/GTTAAGCTACCTTTCTATTGCAGTT Ex34 F/R CATCAACATTCTTTATTACAACACATCAT/CCTAATACAAATAAGGTTCAAGCACTG Ex35 F/R TTTCTTTTATTTGAGAAAGTTCATGATAGT/AAATAATAATGCAAAGACCTCCCT Table B.1 - continued on next page b pcr primers 213

Table B.1 - continued from previous page Gene / Exon Primer sequence (forward / reverse) Ex36A F/R TCCCCCCATGATGGTAGTT/AGATGGTGGTCGTGAGTTAG Ex36B F/R GTCTTTCTCAGGCTCAGACT/TCACTGGTTCCAGCTGCTA Ex36C F/R TCCCAAATCCTTGGGCCTATC/CTACAACAGGTCGTGGAGTTC Ex36D F/R CCCAGCCTGGAACCATA/AGGTGTCTGGGAACATGTATC Ex36E F/R TTCCTGCAAGCAGCACAA/TCCTGTGTAGAAATTACTCAGTTTGATG Ex37 F/R GGGCAACATAGCAAGATCCCATATCTAA/GCCACCACACCCAGCTC Ex38A F/R GGTGATAGCTTTGCTTTTGTAAGTC/GGAGCTTCTGAAAATTCACCAC Ex38B F/R CCCAGGGTCTACCTAATCAGC/GAGAACCAGAGTTTTGTTTTCTTG Ex38C F/R GATCCAGAACTTGACATGGGAG/AACTTGACACATGATTGGATGG Ex38D F/R TGTGGCATAACTGGATCAACTC/AACAAACTCCTCAAGGAGGAATAG Ex39 F/R AATATTTGAATTACTTCTTAGTGTACGTTG/GGGCTTTCCATTTTCTGTATGT Ex40 F/R ACTGTTAAGCACCTAATACAGTTT/AAGCATGAGTTTGCTGGG Ex41 F/R TTTTCTTTCTTTCTTTTAAAGTTTTTTCCC/GGTTTAGAACAGCACACATAGAG Ex42 F/R AGATTTGTGTAACTTATAATGTTGGACT/AGTCACTAATGAACAACACACC Ex43A F/R ACCTCAACCACAGTCATCAGC/CCGTTGTCTCTCTTGCTGTTC Ex43B F/R AGTCTCAAATGCAAATCCACAG/GTGTAAAAGAAGGCCTCACTGG Ex43C F/R ATTCACCATCAATCCCTGTTG/TCCATGGAGAGCTTGTCTACTTC Ex43D F/R AAGAGTCGGTGGAACCAGTC/GAATAATGTGGAAATACAGTTGCTG Ex44 F/R AGCTTGGCACTGCATGAACTAA/GGCATGAGCCACTGCCC Ex31-iso2 F/R GGTCTTAGAAAGTTGAGCTACAC/ACAAGCTCTATCAATTCTGACATGTTA Ex45 F/R ACAGAACTACGTTTTAGCAAGG/CTTTACAATTCTATTGCCATTTCACAATAC Ex46 F/R GGCAATAGAATTGTAAAGTTCTACCC/AAAGACATTATGCCACATACATACATTT Ex47 F/R AGGGTTGGTACAGGAAGC/AATCAAAATATTTACATTAGTAAATTGGCG Ex48 F/R ACACTACGCCAATTTACTAATGTAAATA/TGGTAGACACCTGACCCATC Ex49 F/R TTTCTTCAGTGGTTCACTGTTG/AAGATAATCAGTATCTCAAAGAGTAAGC Ex50 F/R ATGTTGTAATGAGTTGATACTGCATAATTT/AAGAGACACTGCCAAGGAC Ex51 F/R AGAACCACTGTCACTTTTGAATATAACATA/ATGCCCAGAACACAGAAGTC Ex52A F/R TTGTCTTAACAAGAATTGCTTTCCG/GATAGTCTTTGGGCACAGGAT Ex52B F/R ACATTTAAACCACCTTGTGAGGATGAAATA/TGTCGCACCTCATCACGC Ex52C F/R CCAAAGGGAATTCATGAGCAAG/ACAAAATACATTCATTAGCCTGACA Ex53 F/R TCCTTGTGAGTGGTTATTCTTCATT/ATGTTACAGCATGTGTTCAAGT Ex54 F/R CCTTTGACCAGCTTGAGATTG/ACAGGAAACCACTAATACGCC Ex55 F/R TTTGGGCATTACTCAGAGGTATC/TCACATACTGGACATCAATAGCTT Ex56 F/R CGAGGGTGAAACCTTTGTATGAA/ACGGAGAAAGGGAGAGGAT Ex57 F/R AGTCCTCCTGGTGTGGATAG/TCTCCACCAGCACTGGC Ex58 F/R ACTGCTTTATCAATAAATGGGAAGT/GACCTGTGTGAGGAGGG Ex59 F/R AGATGGAGGCAGAAACGC/CCAATGGTGTGTTGAATCGC PDIA4 Ex1 F/R CCAGACCCCGAGTTCTC/CACTAGTTCTGGAGCTGAC Ex2 F/R CCGGCCAGGAAAGTTTTAAG/TGAAGGGCTGAGCGAAT Ex3 F/R ATGTATTATTGTTGGATTTAAATCCCATAG/CAAATTCCCATTCCCTGCG Ex4 F/R CTTCTGGAGGGAGGTGAAG/CTGTTCCTGCCTCACGG Ex5 F/R GCTCATTTAGCTAGAAAGTGAGTC/TCAATGGCAGCTTCCTGTTA Ex6 F/R ATGGGCCATGGGAAGTG/GATGTCCCAGGAGCTTTCT Ex7 F/R GGTAACGAATTAAATCTCAGACATTTAG/GTGGCTCAAAGCAGAGT Ex8 F/R GTCCCTCAAACATTGTACTCAAGAA/CACTCCACCTTGCCTGAAA Ex9 F/R GAGTCTGAGCTGGGACTTG/CCTGGCCTGTCTGGATT Ex10 F/R CCCAGAATCTTTGGCCTTT/ACTGTCGTTTGTTGCCG b pcr primers 214

Table B.2: PCR primers for microsatellite markers. Forward primers were 5’-FAM labelled. Marker Primers forward / reverse D7S1815 F/R FAM-GCCTACTTCCCTTGCTTACC / CCCAGTTCATGCCTCTGC D7S615 F/R FAM-TCCTATGGAATTCCTGTGG / GCCATTGGTCACTCATATTTG D7S2442 F/R FAM-TGAGCCAAGATCACAGCACT / CTGGAAGCAACAGATGTCACTA D7S2419 F/R FAM-TTGTCCCAGGCAGTTTTAC / GGCAATGAAGTTTCCCAG AD0-23xGT F/R FAM-GCTGAGATGGCACCATTGTA / GTAGGTGTGGGGGTGGAGT AD1-28xTA F/R FAM-ACATGCCTGTTATCCCAGCTAC / CGTTTTGATACATTATGGAATGG AD2-30xTA F/R FAM-GGCCTTGTGATCATGTGAGTTA / GAAACGTTGAAACAATGACAGC AD3-23xCA F/R FAM-AGATACTCAGGAGGCTGAGGTG / ATTCCTGGGGATGCTTATGAC D7S1823 F/R FAM-TCCAAGTAAAGGACAGACTTC / AGCCTCTGTGCAGATCCTC † D7S2447 F/R FAM-GTCAGATGTAGGAACCAAGC / TCTCACTGTACTCAGCCTGT † D7S1518 F/R FAM-TCAGTGGGGAAAGCCCTCAC / GGGCAACAAGAACGAAACTCC † D7S684 F/R FAM-GCTTGCAGTGAGCCGAC / GATGTTGATGTAAGACTTTCCAGCC † D7S637 F/R FAM-GCTGTACCTCCCAGCA / ATATCAAAACCAAGATGTTGAC † D7S661 F/R FAM-TTGGCTGGCCCAGAAC / TGGAGCATGACCTTGGAA † D7S2513 F/R FAM-GCAGCATTATCCTCAACAGC / CACAAATGGCAGCCTTTC D7S2473 F/R FAM-CAGACCTCAGTGCCTCAG / GGGGATTCACCAGTAGATG † Marked primers were supplied by Sigma. Otherwise primers were supplied by Invitrogen.

Table B.3: PCR primers for RT-PCR. Primers supplied by Invitrogen. Name Primer sequence CNTNAP2-Ex20 F GCCACGAGAAGACCATCTTT CNTNAP2-Ex21 F CCCAAGTCGCTCTTTCTG CNTNAP2-SiEx1 F GACAAAGGACGGTCCTCG CNTNAP2-Ex22 R TCTGGAGAGGCAACCAGT CNTNAP2-Ex23 R GGATTATATGGAAAATCCGCACT CNTNAP2-Ex24 R GGTGCACAGGATGGTGAA

Table B.4: PCR primers for HRM analysis. Primers supplied by Sigma. Name Primer sequence MFN2 Exon5F CTGGTATCTGCGTTGTGAAAGT MFN2 Exon5R AGAGGTGAGGTCAGCCGTGTC

Table B.5: PCR primers for validation of CNV identified with aCGH at 7q34-q36. Primers supplied by Invitrogen Name Primer sequence DelAa_F AGATTCATATTCAAACCATCTTTATTTACG DelAb_F TAGTTCTAGGTATCAGGACTAAAGGAAT DelAc_F GATTCCTTCACAGTTCTAGTAAAGATT DelAa_R AACAAAGCAGGGTCATAGC DelAc_R CAAAGCAGGGTCATAGCAT DelC_F TATAATGTGTGGGATTGAGGGTC DelC_R CTGTTACATGAATTGAATAAACTACAGTTG b pcr primers 215

Table B.6: PCR primers for validation of variants identified through NGS at 7q34-q36. Primers supplied by Invitrogen and Sigma as indicated. Sequence Name Supplier AGCATCGCACTAATCCACG CLCN1-UTR6.1F Invitrogen GAGCTACAAAGGGATGGGG CLCN1-UTR6.1R Invitrogen CAATCCCTCACCCTACCTCTTAC CLCN1-UTR6.2F Invitrogen ACAGTCATTCCCTGGATGC CLCN1-UTR6.2R Invitrogen GCAGGGCCTGCAGTGAC SSPO–>C F Invitrogen GACAGTGCCCTGGGTTGGTAA SSPO–>C R Invitrogen CTTTGGGTGGAGGCATGG SSPO-C>G F Invitrogen CCCCTCACAAGTCCGATG SSPO-C>G R Invitrogen GCCCTGGCTGTGGGAGAA PRSS-FRAG F Invitrogen GCCTGGAGGCGAGAGGATTTA PRSS-FRAG R Invitrogen ACTTGATTATCTTACCTTCTTATTCAAGTT ZNF398 frag F Invitrogen TCTTCAGTCAGTCCTGGCA ZNF398 frag R Invitrogen TGAAGGAGATCCCAGAGTTCTTGTTT KRBA1 frag F Invitrogen AGTGCAGTGCCGTGGAC KRBA1 frag R Invitrogen TCAAATTGTGCCTTATGTTGAATATAAGTC MLL3 frag F Invitrogen GTACAAACATATGCAAAGCATCATTATAC MLL3 frag R Invitrogen CTGACTCCCGGCCTCTC CENTG3prot Ex1 F Invitrogen ACACTCGGTCCAGACCC CENTG3prot Ex1 R Invitrogen CTGGTTACACAGGTGTGTTCAGT NOS3EST-Gen-F Invitrogen GGCATCTAAACAGCGCTCAAT NOS3EST-Gen-R Invitrogen TCCCTCGAACACGAGACG NOS3EST-cDNA-F Invitrogen AAACAGCGCTCAATTTCTTCTTT NOS3EST-cDNA-R Invitrogen ATTCCAGCATTTTATTTTATTTGTTTTATT NOS3EST-LS-F Invitrogen AAGGCCAGCCTGGGCAA NOS3EST-LS-R Invitrogen CCAGTGTCTGGGTTCCTGAG SSCP-Exon19-21-F Invitrogen GAGGGAAGGATGATGTGTGG SSCP-Exon19-21-R Invitrogen GTGTCTGGGTGGGAGGAG SSCP-Exon63-65-F Invitrogen GTTCTTCGCTGTCCAGATCC SSCP-Exon63-65-R Invitrogen CGATCTGATTGCTGGGG SSCP-Exon99-100-F Invitrogen CTCTCCCCACACCTACCC SSCP-Exon99-100-R Invitrogen AGTACCAGGCCAAAGTGGAAT GIMAP5-frag-F Invitrogen AGGCACACAGCTCCAAGTCT GIMAP5-frag-R Invitrogen ACAGCAGCGGGTAGTATTTAGG CUL1-Ex1AF Invitrogen ACCAGCCTCTGGAGAAACG CUL1-Ex1AR Invitrogen CCCAACAGCTTCTCCCC CUL1-Ex1BF Invitrogen AGTCACTGCGACAccgc CUL1-Ex1BR Invitrogen TTTTCTGTCAGTGTGTCTCACC CUL1-Ex2F-Alex Invitrogen AAGGTCTAGCTGCGTCCCTC CR624471-Ex1_1F Invitrogen TCGAGAATTGGCTGAGGTTC CR624471-Ex1_1R Invitrogen TTCAACCTAAATACTACCCGCTG CR624471-Ex1_3F Invitrogen AACCCCACATAGGTACACTGG CR624471-Ex1_3R Invitrogen CTGTTGCTGCTGCTCGC UTR-A-CNTNAP2F Invitrogen CTCCAGGCTCCAGTTCTTCC UTR-A-CNTNAP2R Invitrogen CCTTCTTGGTTTGAATTTCCTC UTR-B-CNTNAP2F Invitrogen ACTTCGGAGTTGATACCCTGG UTR-B-CNTNAP2R Invitrogen GCAGTGTCATCTCCTACCACAG CNTNAP2-Exon18F Invitrogen TTGGAAAATTCCTACCTAAGTTGAG CNTNAP2-Exon18R Invitrogen GATACTTTTGTGTGGCTCTCCC CNTNAP2-Alt-EX1F Invitrogen GCGCACCCTTGCAACAC CNTNAP2-Alt-EX1R Invitrogen ATTATGCAATTGGGTTGGGAG LOC392145-EX1F Invitrogen ctttctgtgcagggaacagc LOC392145-EX1R Invitrogen TCCCCTTCCCTGGGCTC CASP2-Ex1R Sigma TTCACCCCATTGGACCGC CASP2-Ex1F Sigma PEDIGREESFORADDITIONALDHMNFAMILIES C

This Appendix shows the pedigrees for the additional dHMN families not shown in Chapter3. The dHMN families uninformative for markers at 7q34- q36 are; CMT484, CMT274, CMT677, CMT703, CMT131, CMT273, CMT333 and CMT609. The dHMN families excluded from linkage to 7q34-q36 are; CMT701, CMT260, CMT618, CMT477, CMT748, CMT779, CMT808, CMT375, CMT102, CMT724 and CMT331.

216 c pedigrees for additional dhmn families 217 Pedigree CMT274 with haplotype analysis at Figure C.2: 7q34-q36 Pedigree CMT484 with haplotype analysis at Figure C.1: 7q34-q36 c pedigrees for additional dhmn families 218

Figure C.3: Pedigree CMT609 with haplotype analysis at 7q34-q36 c pedigrees for additional dhmn families 219 Pedigree CMT703 with haplotype analysis at 7q34- Figure C.5: q36 Pedigree CMT677 with haplotype analysis at 7q34- Figure C.4: q36 c pedigrees for additional dhmn families 220

Figure C.6: Pedigree CMT131 with haplo- Figure C.7: Pedigree CMT273 with type analysis at 7q34-q36 haplotype analysis at 7q34-q36 c pedigrees for additional dhmn families 221

Figure C.8: Pedigree CMT333 with haplotype analysis at 7q34-q36 c pedigrees for additional dhmn families 222 Pedigree CMT701 with haplotype analysis at 7q34-q36 Figure C.9: c pedigrees for additional dhmn families 223 Pedigree CMT260 with haplotype analysis at 7q34-q36 Figure C.10: c pedigrees for additional dhmn families 224

Figure C.11: Pedigree CMT618 with haplotype analysis at 7q34-q36 c pedigrees for additional dhmn families 225

Figure C.12: Pedigree CMT477 with haplotype analysis at 7q34-q36 c pedigrees for additional dhmn families 226 Pedigree CMT808 with haplo- Figure C.15: type analysis at 7q34-q36 Pedigree CMT779 with haplo- Figure C.14: type analysis at 7q34-q36 Pedigree CMT748 with haplo- Figure C.13: type analysis at 7q34-q36 c pedigrees for additional dhmn families 227

Figure C.16: Pedigree CMT375 with haplotype analysis at 7q34-q36

Figure C.17: Pedigree CMT102 with haplotype analysis at 7q34-q36 c pedigrees for additional dhmn families 228

Figure C.18: (Main figure) Pedigree CMT724 with haplotype analysis at 7q34-q36. (Box) The two possible haplotype pairs for individuals II:1 and II:2 can be inferred from generation III but can not be assigned to either individual. c pedigrees for additional dhmn families 229 Pedigree CMT331 with haplotype analysis at 7q34-q36 Figure C.19: TWOPOINTLODSCORESFORADDITIONAL D DHMN FAMILIES

This Appendix shows the two-point LOD scores for the additional dHMN families not shown in Chapter 3. The dHMN families uninformative for mark- ers at 7q34-q36 are; CMT484, CMT274, CMT677, CMT703, CMT131, CMT273, CMT333 and CMT609. The dHMN families excluded from linkage to 7q34- q36 are; CMT701, CMT260, CMT618, CMT477, CMT748, CMT779, CMT808, CMT375, CMT102, CMT724 and CMT331.

Table D.1: Pedigree CMT274 two-point LOD scores between the dHMN locus and microsatellite markers on chromosome 7q34-q36. LOD score at recombination fraction Marker 0.00 0.05 0.10 0.20 0.30 0.40 D7S1518 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S661 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S498 0.5809 0.5153 0.4469 0.3035 0.1612 0.0463 D7S2511 0.5809 0.5153 0.4469 0.3035 0.1612 0.0463 D7S615 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S2442 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S2419 0.5809 0.5153 0.4469 0.3035 0.1612 0.0463 D7S688 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S2439 0.5809 0.5153 0.4469 0.3035 0.1612 0.0463 D7S3070 0.5809 0.5153 0.4469 0.3035 0.1612 0.0463 D7S483 0.4403 0.3842 0.3271 0.2128 0.1075 0.0295 D7S1815 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S2462 0.3538 0.3052 0.2566 0.1623 0.0797 0.0213 D7S2447 0.4403 0.3842 0.3271 0.2128 0.1075 0.0295

230 d two point lod scores for additionaldhmn families 231

Table D.2: Pedigree CMT609 two-point LOD scores between the dHMN locus and microsatellite markers on chromosome 7q34-q36. LOD score at recombination fraction Marker 0.00 0.05 0.10 0.20 0.30 0.40 D7S1518 -3.5741 -0.4492 -0.2014 -0.0119 0.0449 0.0417 D7S661 -3.2730 -0.1704 0.0539 0.1922 0.1910 0.1209 D7S498 -3.4491 -0.0634 0.1437 0.2505 0.2231 0.1335 D7S2511 0.5944 0.5222 0.4540 0.3294 0.2171 0.1102 D7S615 -0.1041 -0.0720 -0.0493 -0.0219 -0.0086 -0.0025 D7S2442 -3.2730 -0.1704 0.0539 0.1922 0.1910 0.1209 D7S2419 -3.2730 -0.1704 0.0539 0.1922 0.1910 0.1209 D7S688 0.4058 0.3657 0.3262 0.2482 0.1701 0.0889 D7S2439 0.0039 0.0263 0.0398 0.0486 0.0416 0.0244 D7S3070 -3.5741 -0.4492 -0.2014 -0.0119 0.0449 0.0417 D7S483 0.3503 0.3799 0.3841 0.3449 0.2626 0.1474 D7S1815 -0.0689 -0.0724 -0.0684 -0.0481 -0.0245 -0.0070 D7S2462 0.7270 0.6616 0.5958 0.4616 0.3214 0.1702 D7S637 -0.2024 -0.1814 -0.1543 -0.0970 -0.0490 -0.0163 D7S2447 0.5351 0.5039 0.4682 0.3832 0.2792 0.1532

Table D.3: Pedigree CMT333 two-point LOD scores between the dHMN locus and microsatellite markers on chromosome 7q34-q36. LOD score at recombination fraction Marker 0.00 0.05 0.10 0.20 0.30 0.40 D7S1518 0.0223 0.0252 0.0256 0.0209 0.0123 0.0038 D7S661 0.1316 0.1094 0.0887 0.0521 0.0239 0.0061 D7S498 0.0580 0.0529 0.0466 0.0319 0.0167 0.0048 D7S2511 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S615 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S2442 0.0280 0.0228 0.0181 0.0103 0.0046 0.0012 D7S2419 0.0902 0.0744 0.0598 0.0347 0.0157 0.0040 D7S688 0.0580 0.0529 0.0466 0.0319 0.0167 0.0048 D7S2439 0.1397 0.1182 0.0975 0.0594 0.0283 0.0075 D7S3070 0.1316 0.1094 0.0887 0.0521 0.0239 0.0061 D7S483 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S1815 -0.2108 -0.1621 -0.1227 -0.0647 -0.0276 -0.0067 D7S2462 0.0580 0.0291 0.0079 -0.0136 -0.0141 -0.0055 D7S637 0.0580 0.0291 0.0079 -0.0136 -0.0141 -0.0055 D7S2447 0.0792 0.0652 0.0523 0.0302 0.0137 0.0035 d two point lod scores for additionaldhmn families 232

Table D.4: Pedigree CMT484 two-point LOD scores between the dHMN locus and microsatellite markers on chromosome 7q34-q36. LOD score at recombination fraction Marker 0.00 0.05 0.10 0.20 0.30 0.40 D7S1518 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170 D7S661 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170 D7S498 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S2511 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170 D7S615 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S2442 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170 D7S2419 0.1761 0.1477 0.1206 0.0719 0.0334 0.0086 D7S688 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170 D7S2439 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170 D7S3070 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170 D7S483 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S1815 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S2462 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170 D7S637 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170 D7S2447 0.1761 0.1477 0.1206 0.0719 0.0334 0.0086

Table D.5: Pedigree CMT677 two-point LOD scores between the dHMN locus and microsatellite markers on chromosome 7q34-q36. LOD score at recombination fraction Marker 0.00 0.05 0.10 0.20 0.30 0.40 D7S1518 -0.0212 -0.0340 -0.0402 -0.0373 -0.0228 -0.0072 D7S661 0.1038 0.0780 0.0569 0.0272 0.0105 0.0024 D7S498 -0.0984 -0.0783 -0.0630 -0.0398 -0.0211 -0.0064 D7S2511 0.1415 0.1124 0.0874 0.0485 0.0220 0.0058 D7S615 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S2442 0.2075 0.1682 0.1310 0.0673 0.0243 0.0040 D7S2419 0.0186 0.0176 0.0159 0.0110 0.0057 0.0016 D7S688 0.2239 0.1839 0.1453 0.0774 0.0294 0.0053 D7S2439 -0.1947 -0.1414 -0.1022 -0.0509 -0.0216 -0.0056 D7S3070 -0.7413 -0.4904 -0.3510 -0.1841 -0.0823 -0.0215 D7S483 0.2129 0.1782 0.1464 0.0908 0.0456 0.0131 D7S2462 -0.6164 -0.3733 -0.2460 -0.1118 -0.0451 -0.0110 D7S637 0.1415 0.1124 0.0874 0.0485 0.0220 0.0058 d two point lod scores for additionaldhmn families 233

Table D.6: Pedigree CMT703 two-point LOD scores between the dHMN locus and microsatellite markers on chromosome 7q34-q36. LOD score at recombination fraction Marker 0.00 0.05 0.10 0.20 0.30 0.40 D7S1518 -0.4771 -0.3372 -0.2416 -0.1192 -0.0490 -0.0117 D7S661 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170 D7S498 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170 D7S2511 -0.0969 -0.0768 -0.0595 -0.0325 -0.0141 -0.0035 D7S615 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S2442 -0.4771 -0.3372 -0.2416 -0.1192 -0.0490 -0.0117 D7S2419 -0.4771 -0.3372 -0.2416 -0.1192 -0.0490 -0.0117 D7S688 -0.4771 -0.3372 -0.2416 -0.1192 -0.0490 -0.0117 D7S2439 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S3070 -4.9384 -0.7212 -0.4437 -0.1938 -0.0757 -0.0177 D7S483 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S1815 -0.4771 -0.3372 -0.2416 -0.1192 -0.0490 -0.0117 D7S2462 -0.4771 -0.3372 -0.2416 -0.1192 -0.0490 -0.0117 D7S637 0.1761 0.1477 0.1206 0.0719 0.0334 0.0086 D7S2447 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

Table D.7: Pedigree CMT131 two-point LOD scores between the dHMN locus and microsatellite markers on chromosome 7q34-q36. LOD score at recombination fraction Marker 0.00 0.05 0.10 0.20 0.30 0.40 D7S1518 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S661 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S498 -4.5026 -0.9999 -0.6989 -0.3979 -0.2218 -0.0969 D7S2511 -4.5026 -0.9999 -0.6989 -0.3979 -0.2218 -0.0969 D7S615 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S2442 -4.5026 -0.9999 -0.6989 -0.3979 -0.2218 -0.0969 D7S2419 0.0792 0.0719 0.0645 0.0492 0.0334 0.0170 D7S2439 -4.5026 -0.9999 -0.6989 -0.3979 -0.2218 -0.0969 D7S3070 -4.5026 -0.9999 -0.6989 -0.3979 -0.2218 -0.0969 D7S483 0.3010 0.2787 0.2553 0.2041 0.1461 0.0792 AD0AD0 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S1815 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 AD3AD3 0.3010 0.2787 0.2553 0.2041 0.1461 0.0792 D7S2462 0.0792 0.0719 0.0645 0.0492 0.0334 0.0170 D7S637 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S2447 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 d two point lod scores for additionaldhmn families 234

Table D.8: Pedigree CMT273 two-point LOD scores between the dHMN locus and microsatellite markers on chromosome 7q34-q36. LOD score at recombination fraction Marker 0.00 0.05 0.10 0.20 0.30 0.40 D7S1518 -0.0969 -0.0862 -0.0757 -0.0555 -0.0362 -0.0177 D7S661 -0.0969 -0.0862 -0.0757 -0.0555 -0.0362 -0.0177 D7S498 -0.0969 -0.0862 -0.0757 -0.0555 -0.0362 -0.0177 D7S2511 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S615 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S2442 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S2419 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S688 -0.1761 -0.1549 -0.1347 -0.0969 -0.0621 -0.0300 D7S2439 -4.9402 -1.0000 -0.6989 -0.3979 -0.2218 -0.0969 D7S3070 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S483 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S1815 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S2462 -4.9402 -1.0000 -0.6989 -0.3979 -0.2218 -0.0969 D7S637 -0.0969 -0.0862 -0.0757 -0.0555 -0.0362 -0.0177 D7S2447 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 d two point lod scores for additionaldhmn families 235

Table D.9: Pedigree CMT808 two-point LOD scores between the dHMN locus and microsatellite markers on chromosome 7q34-q36. LOD score at recombination fraction Marker 0.00 0.05 0.10 0.20 0.30 0.40 D7S1518 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S661 -5.1858 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247 D7S498 -4.9402 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247 D7S2511 -0.2016 -0.1560 -0.1187 -0.0635 -0.0277 -0.0072 D7S615 -5.1858 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247 D7S2442 -5.1858 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247 D7S2419 -4.9402 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247 D7S688 -4.9402 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247 D7S2439 -1.0212 -0.5733 -0.3758 -0.1712 -0.0679 -0.0160 D7S3070 -4.9402 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247 D7S483 0.1037 0.1038 0.1018 0.0914 0.0720 0.0423 D7S1815 0.5187 0.4788 0.4370 0.3467 0.2459 0.1317 D7S2462 -1.0212 -0.5733 -0.3758 -0.1712 -0.0679 -0.0160 D7S2447 -4.9402 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247

Table D.10: Pedigree CMT260 two-point LOD scores between the dHMN locus and microsatellite markers on chromosome 7q34-q36. LOD score at recombination fraction Marker 0.00 0.05 0.10 0.20 0.30 0.40 D7S1518 -4.9720 -0.2587 0.0044 0.1649 0.1436 0.0545 D7S661 -5.5081 -0.8214 -0.5004 -0.2035 -0.0705 -0.0139 D7S498 -5.1560 -0.6542 -0.3532 -0.0999 -0.0138 0.0026 D7S2511 -4.7725 -0.3563 -0.1115 0.0453 0.0549 0.0203 D7S615 0.0667 0.0532 0.0415 0.0228 0.0099 0.0025 D7S2442 0.5516 0.4869 0.4205 0.2824 0.1457 0.0390 D7S2419 0.1987 0.1595 0.1229 0.0609 0.0201 0.0025 D7S688 -4.8352 -0.4436 -0.1806 -0.0038 0.0211 0.0068 D7S2439 0.0806 0.0637 0.0491 0.0263 0.0113 0.0028 D7S3070 -4.8352 -0.4436 -0.1806 -0.0038 0.0211 0.0068 D7S483 -5.5383 -0.9486 -0.6004 -0.2649 -0.1025 -0.0236 D7S1815 -1.0985 -0.6980 -0.4767 -0.2326 -0.1010 -0.0264 D7S2462 -5.1490 -0.6904 -0.4119 -0.1666 -0.0586 -0.0119 D7S2447 -6.1524 -1.1612 -0.7143 -0.3007 -0.1117 -0.0247 d two point lod scores for additionaldhmn families 236

Table D.11: Pedigree CMT779 two-point LOD scores between the dHMN locus and microsatellite markers on chromosome 7q34-q36. LOD score at recombination fraction Marker 0.00 0.05 0.10 0.20 0.30 0.40 D7S1518 -4.9645 -1.9996 -1.3978 -0.7959 -0.4437 -0.1938 D7S2511 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170 D7S615 0.0134 -0.0086 -0.0269 -0.0500 -0.0531 -0.0356 D7S2442 -0.0389 -0.0534 -0.0641 -0.0728 -0.0641 -0.0389 D7S2419 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S688 -4.9645 -1.9996 -1.3978 -0.7959 -0.4437 -0.1938 D7S2439 0.0000 -0.0410 -0.0757 -0.1192 -0.1192 -0.0757 D7S3070 -5.2180 -0.7212 -0.4437 -0.1938 -0.0757 -0.0177 D7S483 -5.2180 -0.7212 -0.4437 -0.1938 -0.0757 -0.0177 D7S1815 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S2462 -5.2180 -0.7212 -0.4437 -0.1938 -0.0757 -0.0177 D7S2447 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

Table D.12: Pedigree CMT477 two-point LOD scores between the dHMN locus and microsatellite markers on chromosome 7q34-q36. LOD score at recombination fraction Marker 0.00 0.05 0.10 0.20 0.30 0.40 D7S1518 -0.9379 -0.6017 -0.4066 -0.1847 -0.0713 -0.0162 D7S661 0.1020 0.0748 0.0544 0.0289 0.0155 0.0072 D7S498 -0.1965 -0.1292 -0.0843 -0.0331 -0.0103 -0.0018 D7S615 -0.2314 -0.1832 -0.1416 -0.0767 -0.0330 -0.0079 D7S2442 -0.0517 -0.0588 -0.0569 -0.0380 -0.0153 -0.0009 D7S2419 -0.8477 -0.5450 -0.3714 -0.1732 -0.0696 -0.0173 D7S688 -0.1965 -0.1292 -0.0843 -0.0331 -0.0103 -0.0018 D7S2439 -0.9379 -0.6017 -0.4066 -0.1847 -0.0713 -0.0162 D7S3070 -1.0035 -0.6408 -0.4338 -0.2029 -0.0852 -0.0246 D7S1815 0.0350 0.0251 0.0173 0.0073 0.0023 0.0004 D7S2462 -0.6099 -0.4279 -0.3011 -0.1416 -0.0552 -0.0126 D7S2447 -0.2979 -0.1716 -0.1035 -0.0404 -0.0174 -0.0076 d two point lod scores for additionaldhmn families 237

Table D.13: Pedigree CMT331 two-point LOD scores between the dHMN locus and microsatellite markers on chromosome 7q34-q36. LOD score at recombination fraction Marker 0.00 0.05 0.10 0.20 0.30 0.40 D7S1518 0.0561 0.0436 0.0330 0.0172 0.0072 0.0017 D7S661 -2.3970 -0.4527 -0.2337 -0.0800 -0.0270 -0.0034 D7S498 -1.7849 -0.5506 -0.3130 -0.1173 -0.0371 -0.0038 D7S2511 0.2793 0.2383 0.1980 0.1222 0.0586 0.0154 D7S615 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S2442 0.1213 0.0930 0.0610 0.0043 -0.0227 -0.0175 D7S2419 -5.3260 -0.4600 -0.2023 0.0023 0.0666 0.0603 D7S688 -0.1641 -0.1316 -0.1021 -0.0542 -0.0222 -0.0048 D7S2439 0.2673 0.2147 0.1672 0.0890 0.0340 0.0030 D7S3070 -2.6658 -0.5634 -0.3118 -0.1035 -0.0179 0.0121 D7S483 -2.5798 -0.7164 -0.4417 -0.1932 -0.0755 -0.0177 D7S1815 -2.2566 -0.9877 -0.6773 -0.3630 -0.1901 -0.0788 D7S2462 0.4043 0.3624 0.3195 0.2330 0.1499 0.0728 D7S2447 0.0502 0.0449 0.0447 0.0485 0.0462 0.0309

Table D.14: Pedigree CMT724 two-point LOD scores between the dHMN locus and microsatellite markers on chromosome 7q34-q36. LOD score at recombination fraction Marker 0.00 0.05 0.10 0.20 0.30 0.40 D7S2202 -1.1596 -0.5877 -0.2878 -0.0271 0.0390 0.0228 D7S1518 0.6013 0.5264 0.4502 0.2971 0.1533 0.0429 D7S2513 -1.1602 -0.5878 -0.2879 -0.0271 0.0390 0.0228 D7S2473 0.5809 0.5153 0.4469 0.3035 0.1612 0.0463 D7S661 -0.0121 -0.0114 -0.0101 -0.0066 -0.0033 -0.0009 D7S498 -0.1414 -0.0627 -0.0216 0.0072 0.0070 0.0018 D7S2511 0.1015 0.0839 0.0676 0.0393 0.0179 0.0045 D7S615 -0.6518 -0.3902 -0.2515 -0.1101 -0.0449 -0.0118 D7S2442 -0.5481 -0.2910 -0.1344 0.0175 0.0501 0.0243 D7S2419 0.6080 0.5332 0.4570 0.3029 0.1570 0.0442 D7S688 0.5923 0.5187 0.4438 0.2930 0.1512 0.0423 D7S2439 -3.8570 -1.1223 -0.6148 -0.1923 -0.0362 0.0035 D7S3070 -3.8570 -1.1223 -0.6148 -0.1923 -0.0362 0.0035 D7S483 -4.4991 -1.1124 -0.6803 -0.2802 -0.1002 -0.0209 D7S1815 -5.3591 -1.4482 -0.9227 -0.4044 -0.1535 -0.0342 D7S2462 -5.1540 -1.5686 -0.9846 -0.4173 -0.1527 -0.0326 D7S2447 -0.7413 -0.4724 -0.3223 -0.1516 -0.0610 -0.0145 d two point lod scores for additionaldhmn families 238

Table D.15: Pedigree CMT618 two-point LOD scores between the dHMN locus and microsatellite markers on chromosome 7q34-q36. LOD score at recombination fraction Marker 0.00 0.05 0.10 0.20 0.30 0.40 D7S1518 -4.5214 -0.4425 -0.1885 0.0103 0.0704 0.0614 D7S661 -2.8949 -0.2845 -0.0595 0.0873 0.1063 0.0707 D7S498 -2.9054 -0.5799 -0.3328 -0.1328 -0.0498 -0.0117 D7S2511 -8.1683 -1.1633 -0.6320 -0.1835 -0.0053 0.0437 D7S2442 -4.4554 -0.6362 -0.3337 -0.0654 0.0384 0.0537 D7S2419 -4.9227 -1.3617 -0.8196 -0.3437 -0.1286 -0.0287 D7S688 -7.9118 -1.1629 -0.6318 -0.1834 -0.0053 0.0437 D7S2439 0.1249 0.1038 0.0840 0.0492 0.0226 0.0058 D7S3070 -9.9771 -1.4425 -0.8874 -0.3876 -0.1514 -0.0355 D7S483 -4.3118 -0.1849 0.0263 0.1438 0.1348 0.0785 D7S1815 -5.1849 -0.7560 -0.4969 -0.2470 -0.1059 -0.0261 D7S2462 -4.8261 -1.0688 -0.5517 -0.1310 0.0221 0.0519 D7S637 -3.3683 -0.7205 -0.4434 -0.1937 -0.0757 -0.0177

Table D.16: Pedigree CMT748 two-point LOD scores between the dHMN locus and microsatellite markers on chromosome 7q34-q36. LOD score at recombination fraction Marker 0.00 0.05 0.10 0.20 0.30 0.40 D7S1518 -4.6970 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247 D7S661 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S498 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S2511 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S615 -1.0212 -0.5732 -0.3758 -0.1712 -0.0679 -0.0160 D7S2442 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S2419 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S688 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S2439 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S3070 -4.9569 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247 D7S483 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 D7S1815 -1.0212 -0.5732 -0.3758 -0.1712 -0.0679 -0.0160 D7S2462 -5.6999 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247 D7S2447 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 d two point lod scores for additionaldhmn families 239

Table D.17: Pedigree CMT701 two-point LOD scores between the dHMN locus and microsatellite markers on chromosome 7q34-q36. LOD score at recombination fraction Marker 0.00 0.05 0.10 0.20 0.30 0.40 D7S661 -4.4795 -0.5837 -0.3262 -0.1159 -0.0344 -0.0053 D7S498 -8.9135 -1.6371 -1.0715 -0.5454 -0.2734 -0.1075 D7S2511 -8.7837 -1.3049 -0.7699 -0.3097 -0.1101 -0.0230 D7S2419 -4.8428 -0.6513 -0.3846 -0.1555 -0.0558 -0.0118 D7S688 -9.2754 -1.4098 -0.8598 -0.3697 -0.1421 -0.0327 D7S2439 -9.2308 -2.4876 -1.6226 -0.8057 -0.3821 -0.1346 D7S3070 -4.1146 -1.8532 -1.2343 -0.6326 -0.3126 -0.1180 D7S483 -3.8545 -1.2639 -0.7707 -0.3747 -0.2013 -0.0915 D7S2462 -4.0306 -1.4206 -0.9053 -0.4600 -0.2406 -0.1008 D7S637 -4.2524 -1.6062 -1.0544 -0.5429 -0.2749 -0.1083 D7S2447 -4.4725 -1.9030 -1.2804 -0.6644 -0.3286 -0.1223

Table D.18: Pedigree CMT102 two-point LOD scores between the dHMN locus and microsatellite markers on chromosome 7q34-q36. LOD score at recombination fraction Marker 0.00 0.05 0.10 0.20 0.30 0.40 D7S2511 -3.9964 -0.7211 -0.4436 -0.1938 -0.0757 -0.0177 D7S483 -3.9964 -0.7211 -0.4436 -0.1938 -0.0757 -0.0177

Table D.19: Pedigree CMT375 two-point LOD scores between the dHMN locus and microsatellite markers on chromosome 7q34-q36. LOD score at recombination fraction Marker 0.00 0.05 0.10 0.20 0.30 0.40 D7S2511 -5.1596 -0.6962 -0.3311 -0.0347 0.0562 0.0555 D7S483 -6.1442 -0.6962 -0.3311 -0.0347 0.0562 0.0555 MULTI-POINTLODSCORESFORADDITIONAL E DHMN FAMILIES

This Appendix shows the multi-point LOD scores for the additional dHMN families not shown in Chapter3. The dHMN families uninformative for mark- ers at 7q34-q36 are; CMT484, CMT274, CMT677, CMT703, CMT131, CMT273, CMT333 and CMT609. The dHMN families excluded from linkage to 7q34- q36 are; CMT701, CMT260, CMT618, CMT477, CMT748, CMT779, CMT808, CMT375, CMT102, CMT724 and CMT331. These plots show multi-point LOD score plotted against genetic distance.

3.0

2.0

1.0

0.0 0 5 10 15 20 25

-1.0

-2.0 Multi-point LOD score -3.0

-4.0 D7S1518 D7S661 D7S498 D7S2511 D7S615 D7S2442 D7S2419 D7S688 D7S2439 D7S3070 D7S483 D7S1815 D7S2462 D7S637 D7S2447

Genetic distance (cM) Figure E.1: Multi-point LOD plot for family CMT609.

240 e multi-point lod scores for additional dhmn families 241

1.0

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2 Multi-point LOD score

0.1

0.0 0 5 10 15 20 25 D7S1518 D7S661 D7S498 D7S2511 D7S615 D7S2442 D7S2419 D7S688 D7S2439 D7S3070 D7S483 D7S1815 D7S2462 D7S2447

Genetic distance (cM) Figure E.2: Multi-point LOD plot for family CMT274.

3.0

2.0

1.0

0.0 0 5 10 15 20

-1.0 Multi-point LOD score

-2.0 D7S3070 D7S483 D7S1518 D7S661 D7S498 D7S2442 D7S2439 D7S2462 D7S637 D7S2511 D7S615 D7S2419 D7S688

Genetic distance (cM) Figure E.3: Multi-point LOD plot for family CMT677. e multi-point lod scores for additional dhmn families 242

0.7

0.6

0.5

0.4

0.3

0.2

0.1 Multi-point LOD score 0.0 0 5 10 15 20 25 D7S1518 D7S661 D7S498 D7S2511 D7S615 D7S2442 D7S2419 D7S688 D7S2439 D7S3070 D7S483 D7S1815 D7S2462 D7S637 D7S2447

Genetic distance (cM) Figure E.4: Multi-point LOD plot for family CMT484.

3.0

2.0

1.0

0.0 0 5 10 15 20 25

-1.0

-2.0 Multi-point LODMulti-point score -3.0

-4.0

-5.0 D7S2511 D7S615 D7S1518 D7S661 D7S498 D7S2447 D7S2419 D7S688 D7S2442 D7S2439 D7S3070 D7S483 D7S1815 D7S2462 D7S637

Genetic distance (cM) Figure E.5: Multi-point LOD plot for family CMT703. e multi-point lod scores for additional dhmn families 243

3.0

2.0

1.0

0.0 0 5 1 0 1 5 2 0 2 5 -1.0

-2.0

-3.0 Multi-point LOD score -4.0

-5.0 D7S1518 D7S661 D7S498 D7S2511 D7S615 D7S2442 D7S2419D7S2419 D7S2439 D7S3070 D7S483AD0 D7S1815 AD3 D7S2462 D7S637 D7S2447

Figure E.6: Multi-point LOD plot for family CMT131.

1.0

0.9

0.8

0.7

0.6

0.5

0.4

0.3

Multi-point LODMulti-point score 0.2

0.1

0.0 0 5 10 15 20 25 D7S2442 D7S2419 D7S688 D7S1518 D7S661 D7S49 D7S2439 D7S3070 D7S483 D7S1815 D7S2462 D7S637 D7S2447 D7S2511 D7S615

Genetic distance (cM) Figure E.7: Multi-point LOD plot for family CMT333. e multi-point lod scores for additional dhmn families 244

3.0

2.0

1.0

0.0 0 5 10 15 20 25 -1.0

-2.0

-3.0 Multi-point LODMulti-point score -4.0

-5.0 D7S1518 D7S661 D7S498 D7S2511 D7S615 D7S2442 D7S2419 D7S688 D7S2439 D7S3070 D7S483 D7S1815 D7S2462 D7S637 D7S2447

Genetic distance (cM) Figure E.8: Multi-point LOD plot for family CMT273.

3.0

2.0

1.0

0.0 0 5 10 15 20 25 -1.0

-2.0

-3.0

Multi-point LODMulti-point score -4.0

-5.0

-6.0 D7S1518 D7S661 D7S498 D7S2511 D7S615 D7S2442 D7S2419 D7S688 D7S2439 D7S3070 D7S483 D7S1815 D7S2462 D7S2447

Genetic distance (cM) Figure E.9: Multi-point LOD plot for family CMT260. e multi-point lod scores for additional dhmn families 245

3.0

2.0

1.0

0.0 0 5 10 15 20 -1.0

-2.0

-3.0

-4.0

-5.0

Multi-point LOD score -6.0

-7.0

-8.0 D7S661 D7S498 D7S2511 D7S2419 D7S688 D7S2439 D7S3070 D7S483 D7S2462 D7S637 D7S2447

Genetic distance (cM) Figure E.10: Multi-point LOD plot for family CMT701

3.0

2.0

1.0

0.0 0 5 1510 20 -1.0

-2.0

-3.0

-4.0

-5.0

Multi-point LOD score -6.0

-7.0

-8.0 D7S1518 D7S661 D7S2439 D7S3070 D7S483 D7S1815 D7S2462 D7S637 D7S498 D7S2511D7S2511 D7S2442 D7S2419 D7S688

Genetic distance (cM) Figure E.11: Multi-point LOD plot for family CMT618. e multi-point lod scores for additional dhmn families 246

3.0

2.0

1.0

0.0 0 5 10 15 20 25 -1.0

-2.0

-3.0

-4.0 Multi-point LOD score

-5.0

-6.0 D7S2202 D7S1518 D7S2513 D7S2473 D7S661 D7S498 D7S2511 D7S615 D7S2442 D7S2419 D7S688 D7S2439 D7S3070 D7S483 D7S1815 D7S2462 D7S2447

Genetic distance (cM) Figure E.12: Multi-point LOD plot for family CMT724.

3.0

2.0

1.0

0.0 50 10 15 20 25

-1.0

-2.0

-3.0 Multi-point LOD score -4.0

-5.0 D7S1518 D7S661 D7S498 D7S2511 D7S615 D7S2442 D7S2419 D7S688 D7S2439 D7S3070 D7S483 D7S1815 D7S2462 D7S2447

Genetic distance(cM) Figure E.13: Multi-point LOD plot for family CMT331. e multi-point lod scores for additional dhmn families 247

3.0

2.0

1.0

0.0 0 5 10 15 20 25

-1.0

Multi-point LODMulti-point score -2.0

-3.0 D7S1518 D7S661 D7S498 D7S615 D7S2442 D7S2419 D7S688 D7S2439 D7S3070 D7S1815 D7S2462 D7S2447

Genetic distance (cM) Figure E.14: Multi-point LOD plot for family CMT477.

3.0

2.0

1.0

0.0 5 10 15 20 25

-1.0

-2.0

-3.0 Mult-point LOD Mult-point score -4.0

-5.0 D7S1518 D7S2511 D7S615 D7S2442 D7S2419 D7S688 D7S2439 D7S3070 D7S483 D7S1815 D7S2462 D7S2447

Genetic distance (cM) Figure E.15: Multi-point LOD plot for family CMT779. e multi-point lod scores for additional dhmn families 248

3.0

2.0

1.0

0.0 0 5 10 15 20 25 -1.0

-2.0

-3.0

-4.0 Multi-point LOD score

-5.0

-6.0 D7S1518 D7S661 D7S498 D7S2511 D7S615 D7S2442 D7S2419 D7S688 D7S2439 D7S3070 D7S483 D7S1815 D7S2462 D7S2447

Genetic distance (cM) Figure E.16: Multi-point LOD plot for family CMT748.

3.0

2.0

1.0

0.0 0 5 10 15 20 25

-1.0

-2.0

-3.0 Multi-point LOD score -4.0

-5.0 D7S1518 D7S661 D7S498 D7S2511 D7S615 D7S2442 D7S2419 D7S688 D7S2439 D7S3070 D7S483 D7S1815 D7S2462 D7S2447

Genetic distance (cM) Figure E.17: Multi-point LOD plot for family CMT808. 7Q34-Q36 CONTROL HAPLOTYPES F

Table F.1: Healthy control haplotypes for association analysis from unaffected spouses in dHMN families

# D7S1518 D7S661 D7S498 D7S2511 D7S615 D7S2442 D7S2419 D7S688 D7S2439 D7S3070 D7S483 AD0 D7S1815 AD3 D7S2462 D7S637 D7S2447 153371258163.2.5.2 293251434424.1.3.3 373152234133.2.2.3 463762253144...8.1 587271474112.5.551 684881573126.1.312 753111534367.2.34. 8 5 2 2 10 1 2 5 8 3 2 10 . 1 . 1 5 . 975111732323.2.241 10 11 4 2 1 2 4 3 4 4 3 3 . 3 . 6 5 3 11 8 2 2 . 2 4 3 4 4 1 3 . 2 . 7 5 . 12 9 4 2 . 1 4 3 1 3 1 . . 1 . 6 5 . 13 4 8 8 3 2 7 7 2 4 6 9 . 1 . 4 4 . 14 5 3 1 5 1 4 3 3 4 1 2 . 4 . 2 3 . 15 5 1 1 6 . 4 3 2 7 2 5 . 1 . 3 3 . 16 8 1 1 5 . 8 3 7 10 1 5 . 2 . 5 1 . 17 4 3 2 6 . 2 5 2 5 2 5 . 1 . 5 1 . 18 9 1 1 5 . . 6 7 4 3 9 . 3 . 6 3 . 19 8 3 1 1 . 2 3 7 10 1 2 . . . 5 1 . 20 9 4 4 1 . 4 3 2 4 4 4 . . . 2 3 . 21 2 1 1 6 2 . 3 . 4 . 6 . 1 . . . 3 22 5 3 1 7 2 . 3 . 1 . 3 . 1 . . . 3 23 6 7 2 6 1 . 3 7 4 4 9 . 2 . 2 . 3 24 9 5 2 8 1 5 3 5 2 4 4 . 2 . 1 . 1 25 7 1 1 6 1 2 3 4 4 7 2 . 2 . 2 . 1 26 10 5 1 1 2 4 3 4 3 3 3 . 1 . 9 . 3 27 4 5 3 6 1 4 11 . 3 4 2 2 3 2 6 3 1 28 5 3 1 5 2 2 5 . 3 6 4 2 2 4 5 5 3 29 3 1 2 3 2 2 4 2 3 2 2 2 2 2 2 3 2 30 5 3 1 4 1 2 3 5 1 5 3 2 2 1 2 6 2 31 5 2 1 1 2 2 3 6 6 2 5 4 1 1 2 5 1 Table F.1 - continued on next page

249 f 7q34-q36 control haplotypes 250

Table F.1 - continued from previous page D7S1518 # D7S661 D7S498 D7S2511 D7S615 D7S2442 D7S2419 D7S688 D7S2439 D7S3070 D7S483 AD0 D7S1815 AD3 D7S2462 D7S637 D7S2447 32 4 1 1 5 2 4 3 3 5 3 2 4 1 3 4 5 2 33 2 1 1 6 1 2 1 1 4 4 3 4 4 4 1 3 1 34 1 4 1 7 1 2 2 3 2 3 3 2 2 2 3 3 2 35 8 5 1 1 1 4 . 2 1 4 4 . 2 . 8 . 3 36 9 3 1 1 1 4 3 4 2 5 3 . 2 . . . 2 37 9 3 1 1 . 2 . . 4 3 2 5 . 2 1 . . 38 9 7 1 . 2 2 3 4 1 4 . . 2 8 . . 2 39 4 2 1 5 1 6 6 4 10 . 9 . . . 5 4 3 40 5 1 2 3 2 4 6 7 1 . 10 . . . 2 5 3 41 7 7 1 6 1 4 13 4 3 . 3 . . . 6 4 2 42 5 2 2 2 2 4 14 4 4 . 2 . . . 5 5 3 43 9 1 1 1 1 4 13 5 4 . 9 . . . 7 3 1 44 5 3 1 1 2 2 14 1 4 . 9 . . . 5 3 3 45 5 1 1 1 1 2 13 5 1 . 3 . . . 5 5 2 46 2 3 1 6 1 7 6 3 4 . 2 . . . 6 3 2 47 3 3 1 6 2 4 5 8 1 5 2 . 1 . 2 . 3 48 8 4 1 7 1 4 2 3 ...... 8 . 3 49 . 7 1 1 1 2 3 2 4 4 3 2 1 3 3 3 3 50 . 2 3 10 1 . . . . . 4 3 1 1 1 3 2 51 . 7 7 5 1 4 2 . 4 6 3 . 1 . 4 6 1 52 . . . 1 2 2 . 4 1 4 2 . 3 . 3 . 3 53 7 1 1/2 1/3 1 4 3 3/4 1/7 3 8 . 3 . 2 . 3 54 1 5 7 . 2 6 2 2 8 4 . . 1 . . . . 55 1 3 2 1/4 1/2 4 . 2 4 4 2 . 3 . 3 . 2 56 6 . . 4 1 4 3 4 2 5 2 . 1 . 1 . 3 7Q34-Q36 CANDIDATE GENES AND REPORTED G GENE EXPRESSION

Table G.1: Candidate genes within the minimum probable candidate interval on 7q34-q36. Only alternate isoforms with unique exons are shown. Symbol Accession # Gene name Alternate isoforms CNTNAP2 NM_014141† contactin-associated protein-like 2 isoform 2 (NM_014141)† C7orf33 NM_145304.2 chromosome 7 open reading frame 33 CUL1 NM_003592 cullin 1 EZH2 NM_004456 enhancer of zeste 2 isoform a isoform b (NM_152998) PDIA4 NM_004911 protein disulfide isomerase-associated 4 ZNF786 NM_152411 zinc finger protein 786 ZNF425 NM_001001661 zinc finger protein 425 ZNF398 NM_170686 zinc finger 398 isoform a isoform b (NM_020781) ZNF282 NM_003575 zinc finger protein 282 ZNF212 NM_012256 zinc finger protein 212 ZNF783 NM_001004302 zinc finger protein 783 ZNF777 NM_015694 zinc finger protein 777 ZNF746 NM_152557 zinc finger protein 746 ZNF767 NM_024910 zinc finger family member 767 KRBA1 NM_032534 KRAB A domain containing 1 ZNF467 NM_207336 zinc finger protein 467 SSPO NM_198455 SCO-spondin ZNF862 (NM_001099220) ATP6V0E2 NM_145230 ATPase, H+ transporting, V0 subunit isoform 1 isoform 2 (NM_001100592) ACTR3C NM_001164458 Actin-related protein 3C variant 2 (NM_001164459) LRRC61 NM_001142928 leucine rich repeat containing 61 variant 2 (NM_023942) C7orf29 NM_138434 Homo sapiens chromosome 7 open reading frame 29 RARRES2 NM_002889 chemerin preproprotein REPIN1 NM_014374 replication initiator 1 isoform 1 isoform 3 (NM_001099695) ZNF775 NM_173680 zinc finger protein 775 GIMAP8 NM_175571 GTPase, IMAP family member 8 GIMAP7 NM_153236 GTPase, IMAP family member 7 GIMAP4 NM_018326 GTPase, IMAP family member 4 GIMAP6 NM_024711 GTPase, IMAP family member 6 GIMAP2 NM_015660 GTPase, IMAP family member 2 GIMAP1 NM_130759 GTPase, IMAP family member 1 GIMAP5 NM_018384 GTPase, IMAP family member 5 Table G.1 - continued on next page † alternate isoforms with the same identifier as the parent gene.

251 g 7q34-q36 candidate genes and reported gene expression 252

Table G.1 - continued from previous page Symbol Accession # Gene name Alternate isoforms TMEM176B NM_014020 transmembrane protein 176B variant 1 variant 2 (NM_001101311) variant 3 (NM_001101312) TMEM176A NM_018487 hepatocellular carcinoma-associated antigen 112 ABP1 NM_001091† Amiloride-sensitive amine oxidase [copper-containing] isoform 2 (NM_001091)† KCNH2 NM_172057 voltage-gated potassium channel, subfamily H isoform Isoform b c (NM_172056) NOS3 NM_000603 nitric oxide synthase 3 (endothelial cell) ATG9B NM_173681 ATG9 autophagy related 9 homolog B ABCB8 NM_007188 ATP-binding cassette, sub-family B, member 8 ACCN3 NM_004769 amiloride-sensitive cation channel 3 isoform a isoform b (NM_020321) CDK5 NM_004935 cyclin-dependent kinase 5 SLC4A2 NM_003040 solute carrier family 4, anion exchanger, member isoform 1 variant 2 isoform 1 (NM_001199692) isoform 3 (NM_001199694) FASTK NM_006712 Fas-activated serine/threonine kinase isoform 1 isoform 4 (NM_033015) TMUB1 NM_031434 transmembrane and ubiquitin-like domain-containing isoform 2 protein 1 isoform 1 (NM_001136044) AGAP3 NM_031946 centaurin, gamma 3 isoform a isoform b (NM_001042535) GBX1 NM_001098834 gastrulation brain homeo box 1 ASB10 NM_001142459 ankyrin repeat and SOCS box-containing 10 variant 3 (NM_080871) ABCF2 NM_007189 ATP-binding cassette, sub-family F, member 2 isoform isoform b a (NM_005692) CSGLCA-T NM_019015 chondroitin sulfate glucuronyltransferase SMARCD3 NM_001003802 SWI/SNF-related matrix-associated actin-dependent variant 3 regulator of chromatin subfamily D member 3 (NM_001003801) variant 3 (NM_003078) NUB1 NM_016118 NEDD8 ultimate buster-1 WDR86 NM_198285 WD repeat-containing protein 86 CRYGN NM_144727 Gamma-crystallin N (Gamma-N-crystallin) RHEB NM_005614 Ras homolog enriched in brain PRKAG2 NM_016203 5’-AMP-activated protein kinase subunit gamma-2 variant b (NM_024429) variant c (NM_001040633) GALNTL5 NM_145292 putative polypeptide N-acetylgalactosaminyltransferase-like protein 5 GALNT11 NM_022087 polypeptide N-acetylgalactosaminyltransferase 11 MLL3 NM_170606 myeloid/lymphoid or mixed-lineage leukemia 3 CCT8L1 NM_001029866 chaperonin containing TCP1, subunit 8 XRCC2 NM_005431 X-ray repair cross complementing protein 2 ACTR3B NM_020445 actin-related protein 3-beta isoform 1 DPP6 NM_130797 dipeptidyl-peptidase 6 isoform 1 isoform 2 (NM_001936) isoform 3 (NM_001039350) DPP6 protein (Q8IYG9)‡ † alternate isoforms with the same identifier as the parent gene. ‡ UniProtKB/TrEMBL accession number. g 7q34-q36 candidate genes and reported gene expression 253

Table G.2: Genes localised outside the minimum probable candidate interval on chromosome 7q34-q36. Alternate isoforms are listed where alternate exons for CDS or UTRs exist which require additional PCR amplicons designed for sequencing coverage. Alternately spliced variants without unique exons are not shown. Symbol Accession # Gene name Alternate isoforms WEE2 NM_001105558 WEE1 homolog 2 SSBP1 NM_003143 single-stranded DNA binding protein 1 PRSS37 NM_001008270 protease, serine, 37 CLEC5A NM_013252 C-type lectin, superfamily member 5 MGAM NM_004668 maltase-glucoamylase LOC93432 NR_003715 Putative maltase-glucoamylase-like protein LOC93432 TRYX3 NM_001001317 trypsin X3 PRSS1 NM_002769 Trypsin-1 PRSS2 NM_002770 Trypsin-2 EPHB6 NM_004445 Ephrin type-B receptor 6 isoform 3 (-) TRPV6 NM_018646 Transient receptor potential cation channel subfamily V member 6 TRPV5 NM_019841 Transient receptor potential cation channel subfamily V member 5 C7orf34 NM_178829 Uncharacterized protein C7orf34 KEL NM_000420 Kell blood group, metallo-endopeptidase PIP NM_002652 prolactin-induced protein GSTK1 NM_001143679 glutathione S-transferase kappa 1 isoform 2 TMEM139 NM_153345 transmembrane protein 139 CASP2 NM_032982 caspase 2 isoform 1 isoform 3 (NM_032983) CLCN1 NM_000083 Chloride channel protein 1 FAM131B NM_014690 Protein FAM131B ZYX NM_003461 zyxin variant 2 (NM_001010972) EPHA1 NM_005232 ephrin receptor EphA1 FAM115C NM_001130025 Protein FAM115C variant 2 (NM_173678) FAM115A NM_014719 Homo sapiens family with sequence similarity 115, member A ARHGEF35 NM_001003702 rho guanine nucleotide exchange factor 35 ARHGEF5 NM_005435 rho guanine nucleotide exchange factor 5 NOBOX NM_001080413 NOBOX oogenesis homeobox TPK1 NM_022445 thiamin pyrophosphokinase 1 isoform a isoform b (NM_001042482) g 7q34-q36 candidate genes and reported gene expression 254

Table G.3: Expression pattern of genes within the chromosome 7q34-q36 dHMN1 disease region. (+) indicates the gene is expressed in the tissue indicated, (H) indicates the gene is expressed at a higher level in this tissue relative to other tissues in the data set, (L) indicates the gene is expressed at a lower level in this tissue relative to other tissues, (-) indicates there is no gene expression detected. (*) indicates genes with no expression data available from Genenote and GNF BioGPS, instead UniGene - electronic northern normal expression data is presented.

Gene Brain Spinal Cord Muscle Gene Brain Spinal Cord Muscle WEE2 + + + ACTR3C*L SSBP1 + + + LRRC61 + + + PRSS37 + + + RARRES2 LLL CLEC5A + + + REPIN1 + + + MGAM LLL ZNF775 + + + LOC93432 + + + GIMAP8 + + + TRYX3 + + + GIMAP7 + + + PRSS1 LLL GIMAP4 + + + PRSS2 LLL GIMAP6 + + + EPHB6 + + + GIMAP2 + + + TRPV6 + + + GIMAP1 + + + TRPV5 + + + GIMAP5 + + + C7orf34 + + + TMEM176B LLL KEL + + + TMEM176A LLL PIP LLL ABP1 LLL GSTK1 + + + KCNH2 + + + TMEM139 + + + NOS3 + + + CASP2 + + + ATG9B + + + CLCN1 + + + ABCB8 + + + FAM131B + + L ACCN3 + + + ZYX + + + CDK5 H + + EPHA1 + + + SLC4A2 + + + FAM115C + + + FASTK + + + ARHGEF5 + + + TMUB1 + + + NOBOX* AGAP3 + + + TPK1 + + + GBX1 + + + CNTNAP2 HHL ASB10 + + + CUL1 + + + ABCF2 + + + EZH2 LLL CSGLCA-T + + + PDIA4 + + + SMARCD3 + + + ZNF786 + L L NUB1 + + + ZNF425 + + + WDR86 + H + ZNF398 + L + CRYGN* + - - ZNF282 + + + RHEB + + + ZNF212 + + + PRKAG2 + + + ZNF783 + + + GALNTL5 + + + ZNF777 + + + GALNT11 + + + ZNF746 + + + MLL3 + + + ZNF767 + + + CCT8L1 + + + KRBA1 + + + XRCC2 + + + ZNF467 + + + ACTR3B + + + SSPO + + + DPP6 H H + ATP6V0E2 H H + NEXTGENERATIONSEQUENCING H

Table H.1: Sequencing gaps in 454 sequencing of 7q34-q36 with length greater than 10 kb. Gap Start position† End position† Length (bp) Genes GAP1 142928643 142944949 16307 GAP2 142958727 142984092 25366 GAP3 142993142 143019916 26776 GAP4 143019976 143055729 35754 FAM115C GAP5 143080626 143097060 16435 GAP6 143118288 143174571 56285 GAP7 143179456 143202297 22842 GAP8 143507776 143535852 28077 GAP9 143542456 143582489 40034 GAP10 143591439 143664224 72789 GAP11 143671214 143704929 33718 ARHGEF5 GAP12 149236329 149249201 12878 GAP13 149336627 149348154 11534 GAP14 149468960 149485938 16997 GAP15 152134431 152148468 14038 ACTR3B GAP16 153092835 153106814 13985 GAP17 153357906 153369172 11267 DPP6 GAP18 153369692 153381978 12291 DPP6 † Sequence position based on Human Mar. 2006 (NCBI36/hg18) Assembly.

255 h next generation sequencing 256

Table H.2: Analysis of novel variants within chromosomal translocations on 7q34-q36. Novel variants include six non-synonymous changes and three other novel variants Identity† Chr. Strand Start End Ref. DNA SNP at position MLL3:c.3611T>C 100% 7 - 151548623 151548663 T 95.20% 1 - 147158243 147158283 C 95.20% 2 + 91248287 91248327 G 95.20% 13_rand - 177226 177266 C MLL3:c.2689C>T 100% 7 - 151563895 151563935 C 100% 1 - 147142930 147142970 A/G rs58408284 100% 2 + 91263559 91263599 G 100% 13_rand - 163587 163627 G 97.60% 21 + 10048654 10048694 A/G rs58408284 MLL3:c.2674G>A 100% 7 - 151563910 151563950 G 97.60% 1 - 147142915 147142955 C/T rs2838159 rs8175336 97.60% 2 + 91263574 91263614 C/T rs2838159 rs8175336 97.60% 13_rand - 163572 163612 G 95.20% 21 + 10048669 10048709 C/T rs8175336 CCT8L1:c.483G>A 100% 7 + 151773957 151773997 G 97.60% 22 - 15452938 15452978 T CCT8L1:c.532A>G 100% 7 + 151774006 151774046 A 95.20% 22 - 15452889 15452929 C CCT8L1:c.579G>A 100% 7 + 151774054 151774094 G 97.60% 22 - 15452841 15452881 T CCT8L1:c.619A>G 100% 7 + 151774093 151774133 A 95.20% 22 - 15452802 15452842 C CCT8L1:c.626A>C 100% 7 + 151774100 151774140 A 95.20% 22 - 15452795 15452835 G CCT8L1:c.700G>A 100% 7 + 151774175 151774215 G 97.60% 22 - 15452720 15452760 T † Identity indicates the variation of the sequence within 20bp in both directions around reported variants. CNVIDENTIFIEDWITHARRAYCGH I

Table I.1: Gains and losses identified in control individual II:7 using aCGH. Start and end positions refer to the Human Mar. 2006 (NCBI36/hg18) assembly.

Chr. Start End Chr. Start End II:7 gains chr7 112933761 114687270 chr7 141136570 141136707 chr7 145426554 145427643 chr7 142281871 142282647 chr7 147106328 147107617 chr7 148583077 148583362 chr7 147703070 147707039 chr7 149208062 149208712 chr7 148493747 148494200 chr7 153092228 153092495 chr7 150677108 150677431 chr7 154022900 154027513 chr7 151432571 151433370 II:7 losses chr7 151728888 151747584 chr1 104250813 106105850 chr9 92604082 93435806 chr3 117620757 119959839 chr11 27545010 29000366 chr4 167486570 171501584 chr11 38438080 39760781 chr6 122839696 126222259 chr12 84625746 86284769 chr7 49342534 50068562 chr18 30000311 31201020 chr7 77906667 79255842 chr21 15899328 18942345

257 i cnv identified with array cgh 258

Table I.2: Gains and losses identified in affected individual III:9 using aCGH. Start and end positions refer to the Human Mar. 2006 (NCBI36/hg18) assembly.

Chr. Start End Chr. Start End Affected III:9 gains chr7 20398581 22826154 chr7 141008522 141008827 chr7 87955968 88650701 chr7 141136570 141136764 chr7 112933761 114687270 chr7 141405649 141408015 chr7 118682103 122036934 chr7 141411964 141433327 chr7 128353657 130376195 chr7 141436298 141436906 chr7 141383232 141383755 chr7 142368556 142369029 chr7 141452114 141452346 chr7 142590983 142591379 chr7 141529779 141532054 chr7 142926564 142928267 chr7 142705631 142707109 chr7 143232340 143233967 chr7 143527056 143705101 chr7 148583077 148583344 chr7 144547151 144552372 chr7 148942734 148948626 chr7 145923012 145924645 chr7 149208036 149208712 chr7 146642408 146643847 chr7 150070208 150071040 chr7 148493747 148494215 chr7 150469368 150472497 chr7 149504457 149506975 chr7 151003440 151003796 chr7 150021169 150021774 chr7 154021940 154027513 chr7 150677028 150677330 chrX 133579511 134418254 chr7 151432885 151433370 Affected III:9 losses chr7 153572344 153576331 chr1 78301095 81843776 chr8 28788076 30510657 chr1 104250813 106105850 chr8 37782056 38480739 chr1 187316392 188037381 chr8 115839571 118291267 chr2 109804744 111177875 chr9 101609238 104005216 chr2 165516142 167081870 chr9 121327018 122844496 chr2 218674727 219908166 chr10 44219153 45339171 chr2 239631901 240609103 chr10 87775822 89497316 chr3 37928555 38986892 chr11 1405506 3601945 chr3 73618613 75126725 chr12 73217711 74588154 chr3 76194026 81813307 chr12 84625746 86284769 chr3 86019386 87778062 chr13 106444209 109227022 chr3 171960773 176318840 chr14 42637746 43884075 chr3 191707744 194848137 chr14 79896757 83876603 chr4 11290241 17357044 chr15 88803154 91002601 chr4 26907162 32192670 chr18 22618645 26406470 chr4 124278703 124438026 chr18 30000311 31201020 chr6 49926208 51776238 chr18 46253735 48483432 chr6 78591356 80510744 chr17 22200000 24170873 chr6 113488617 119933450 chr21 15899328 18942345 i cnv identified with array cgh 259

Table I.3: Gains and losses identified in affected individual III:12 using aCGH. Start and end positions refer to the Human Mar. 2006 (NCBI36/hg18) assembly.

Chr. Start End Chr. Start End Affected III:12 gains chr7 87955968 88650701 chr7 141076919 141077135 chr7 143594780 143705101 chr7 141667108 141667973 chr7 144546961 144552372 chr7 141691212 141712851 chr7 144944021 144946260 chr7 142350806 142351891 chr7 145532824 145533594 chr7 143232301 143233967 chr7 146642327 146643847 chr7 143729164 143730031 chr7 147703070 147707039 chr7 148583210 148583362 chr7 148493747 148494215 chr7 148817533 148820960 chr7 149504178 149506807 chr7 149207950 149208712 chr7 151432885 151433370 chr7 149666004 149668696 chr7 151511225 151515357 chr7 153092228 153092499 chr7 151708944 151717257 chr7 154338103 154339800 chr7 151728927 151747584 Affected III:2 losses chr7 154023876 154031412 chr1 104250813 106105850 chr8 37782056 38480739 chr2 33323057 35182721 chr8 111799647 114702528 chr3 37928555 38986892 chr10 44219153 45339171 chr3 144583097 147976445 chr11 38438080 39760781 chr4 97035757 99272119 chr11 40954001 41584983 chr5 117403380 118509111 chr12 73217711 74588154 chr7 77906667 79255842 chr13 93112275 94079060 chr7 112933761 114687270 chr14 42637746 43884075 chr7 143527112 143582678 chr18 37521727 38182028 i cnv identified with array cgh 260

Table I.4: Gains and losses identified in affected individual II:1 using aCGH. Start and end positions refer to the Human Mar. 2006 (NCBI36/hg18) assembly.

Chr. Start End Chr. Start End Affected II:1 gains chr7 143594780 143705101 chr7 141411964 141433443 chr7 144402282 144404560 chr7 143729164 143730012 chr7 144546976 144552372 chr7 148583210 148583362 chr7 144930202 144931073 chr7 149207950 149208712 chr7 144944085 144945984 chr7 149666004 149667659 chr7 146642408 146643847 chr7 150065475 150065749 chr7 148136687 148137341 chr7 153092228 153092495 chr7 148493747 148494215 chr7 154022327 154029815 chr7 149503960 149505812 chrX 22015874 23768392 chr7 150677108 150677330 chrX 105779829 106404347 chr7 151432825 151433370 chrX 115487962 116370306 chr7 151503095 151504259 chrX 128122567 129811507 chr7 151597923 151620438 chrX 133579511 134418254 chr7 151728927 151747193 Affected II:1 losses chr8 41597188 43507449 chr1 172119474 179297968 chr8 101729326 106640189 chr1 181412375 186257480 chr9 76586459 78615389 chr2 165516142 167081870 chr10 44219153 45339171 chr3 37928555 38986892 chr10 49721945 52447409 chr3 122740584 124718365 chr10 110908377 111439564 chr4 90972361 93284435 chr11 40954001 41584983 chr5 1597400 5644849 chr12 84625746 86284769 chr6 49926208 51776238 chr14 42637746 43884075 chr6 62601457 65114644 chr14 57198571 58329722 chr6 78591356 80510744 chr15 81026720 86249006 chr7 45237624 47001006 chr15 88803154 91002601 chr7 112933761 114687270 chr16 58191891 61617665 chr7 141452964 141453225 chr18 22618645 26406470 chr7 141543684 141543981 chr18 30000311 31201020 chr7 143527056 143582678 chr18 37521727 38182028 BIBLIOGRAPHY

[1] A.E. Harding. Inherited neuronal atrophy and degeneration predominantly of lower motor neurons. In P.J. Dyck, P.K. Thomas, J.W. Griffin, P.A. Low, and J.F. Poduslo, editors, Peripheral Neuropathy, pages 1051–1064. W.B. Saunders, Philadelphia, 3rd edition, 1993.

[2] O. V. Evgrafov, I. Mersiyanova, J. Irobi, L. Van Den Bosch, I. Dierick, C. L. Leung, O. Schagina, N. Verpoorten, K. Van Impe, V. Fedotov, E. Dadali, M. Auer-Grumbach, C. Windpassinger, K. Wagner, Z. Mitrovic, D. Hilton- Jones, K. Talbot, J. J. Martin, N. Vasserman, S. Tverskaya, A. Polyakov, R. K. Liem, J. Gettemans, W. Robberecht, P. De Jonghe, and V. Timmerman. Mutant small heat-shock protein 27 causes axonal Charcot-Marie-Tooth disease and distal hereditary motor neuropathy. Nat. Genet., 36:602–606, Jun 2004.

[3] J. Irobi, K. Van Impe, P. Seeman, A. Jordanova, I. Dierick, N. Verpoorten, A. Michalik, E. De Vriendt, A. Ja- cobs, V. Van Gerwen, K. Vennekens, R. Mazanec, I. Tournev, D. Hilton-Jones, K. Talbot, I. Kremensky, L. Van Den Bosch, W. Robberecht, J. Van Vandekerckhove, C. Van Broeckhoven, J. Gettemans, P. De Jonghe, and V. Timmerman. Hot-spot residue in small heat-shock protein 22 causes distal motor neuropathy. Nat. Genet., 36:597–601, Jun 2004.

[4] S. J. Kolb, P. J. Snyder, E. J. Poi, E. A. Renard, A. Bartlett, S. Gu, S. Sutton, W. D. Arnold, M. L. Freimer, V. H. Lawson, J. T. Kissel, and T. W. Prior. Mutant small heat shock protein B3 causes motor neuropathy: utility of a candidate gene approach. Neurology, 74:502–506, Feb 2010.

[5] V. Timmerman, P. Raeymaekers, E. Nelis, P. De Jonghe, L. Muylle, C. Ceuterick, J. J. Martin, and C. Van Broeck- hoven. Linkage analysis of distal hereditary motor neuropathy type II (distal HMN II) in a single pedigree. J. Neurol. Sci., 109:41–48, May 1992.

[6] I. Dierick, J. Baets, J. Irobi, A. Jacobs, E. De Vriendt, T. Deconinck, L. Merlini, P. Van den Bergh, V. M. Rasic, W. Robberecht, D. Fischer, R. J. Morales, Z. Mitrovic, P. Seeman, R. Mazanec, A. Kochanski, A. Jordanova, M. Auer-Grumbach, A. T. Helderman-van den Enden, J. H. Wokke, E. Nelis, P. De Jonghe, and V. Timmerman. Relative contribution of mutations in genes for autosomal dominant distal hereditary motor neuropathies: a genotype-phenotype correlation study. Brain, 131:1217–1227, May 2008.

[7] K. W. Chung, S. B. Kim, S. Y. Cho, S. J. Hwang, S. W. Park, S. H. Kang, J. Kim, J. H. Yoo, and B. O. Choi. Distal hereditary motor neuropathy in Korean patients with a small heat shock protein 27 mutation. Exp. Mol. Med., 40:304–312, Jun 2008.

[8] B. Tang, X. Liu, G. Zhao, W. Luo, K. Xia, Q. Pan, F. Cai, Z. Hu, C. Zhang, B. Chen, F. Zhang, L. Shen, R. Zhang, and H. Jiang. Mutation analysis of the small heat shock protein 27 gene in chinese patients with Charcot- Marie-Tooth disease. Arch. Neurol., 62:1201–1207, Aug 2005.

[9] B. S. Tang, G. H. Zhao, W. Luo, K. Xia, F. Cai, Q. Pan, R. X. Zhang, F. F. Zhang, X. M. Liu, B. Chen, C. Zhang, L. Shen, H. Jiang, Z. G. Long, and H. P. Dai. Small heat-shock protein 22 mutated in autosomal dominant Charcot-Marie-Tooth disease type 2L. Hum. Genet., 116:222–224, Feb 2005.

[10] H. H. Kampinga, J. Hageman, M. J. Vos, H. Kubota, R. M. Tanguay, E. A. Bruford, M. E. Cheetham, B. Chen, and L. E. Hightower. Guidelines for the nomenclature of the human heat shock proteins. Cell Stress Chaperones, 14:105–111, Jan 2009.

[11] G. Kappe, E. Franck, P. Verschuure, W. C. Boelens, J. A. Leunissen, and W. W. de Jong. The human genome encodes 10 alpha-crystallin-related small heat shock proteins: HspB1-10. Cell Stress Chaperones, 8:53–61, 2003.

[12] S. Quraishe, A. Asuni, W. C. Boelens, V. O’Connor, and A. Wyttenbach. Expression of the small heat shock protein family in the mouse CNS: differential anatomical and biochemical compartmentalization. Neuroscience, 153:483–491, May 2008.

[13] P. Tallot, J. F. Grongnet, and J. C. David. Dual perinatal and developmental expression of the small heat shock proteins [FC12]aB-crystallin and Hsp27 in different tissues of the developing piglet. Biol. Neonate, 83:281–288, 2003.

[14] P. Verschuure, C. Tatard, W. C. Boelens, J. F. Grongnet, and J. C. David. Expression of small heat shock proteins HspB2, HspB8, Hsp20 and cvHsp in different tissues of the perinatal developing pig. Eur. J. Cell Biol., 82:523–530, Oct 2003.

261 bibliography 262

[15] J. C. Plumier, D. A. Hopkins, H. A. Robertson, and R. W. Currie. Constitutive expression of the 27-kDa heat shock protein (Hsp27) in sensory and motor neurons of the rat nervous system. J. Comp. Neurol., 384:409–428, Aug 1997.

[16] R. Benndorf, X. Sun, R. R. Gilmont, K. J. Biederman, M. P. Molloy, C. W. Goodmurphy, H. Cheng, P. C. Andrews, and M. J. Welsh. HSP22, a new member of the small heat shock protein superfamily, interacts with mimic of phosphorylated HSP27 ((3D)HSP27). J. Biol. Chem., 276:26753–26761, Jul 2001.

[17] K. Kijima, C. Numakura, T. Goto, T. Takahashi, T. Otagiri, K. Umetsu, and K. Hayasaka. Small heat shock protein 27 mutation in a Japanese patient with distal hereditary motor neuropathy. J. Hum. Genet., 50:473–476, 2005.

[18] H. Houlden, M. Laura, F. Wavrant-De Vrieze, J. Blake, N. Wood, and M. M. Reilly. Mutations in the HSP27 (HSPB1) gene cause dominant, recessive, and sporadic distal HMN/CMT type 2. Neurology, 71:1660–1668, Nov 2008.

[19] Y. Z. Chen, S. H. Hashemi, S. K. Anderson, Y. Huang, M. C. Moreira, D. R. Lynch, I. A. Glass, P. F. Chance, and C. L. Bennett. Senataxin, the yeast Sen1p orthologue: characterization of a unique protein in which recessive mutations cause ataxia and dominant mutations cause motor neuron disease. Neurobiol. Dis., 23:97–108, Jul 2006.

[20] A. Antonellis and E. D. Green. The role of aminoacyl-tRNA synthetases in genetic diseases. Annu Rev Genomics Hum Genet, 9:87–107, 2008.

[21] M. Auer-Grumbach, A. Olschewski, L. Papi?, H. Kremer, M. E. McEntagart, S. Uhrig, C. Fischer, E. Frohlich, Z. Balint, B. Tang, H. Strohmaier, H. Lochmuller, B. Schlotter-Weigel, J. Senderek, A. Krebs, K. J. Dick, R. Petty, C. Longman, N. E. Anderson, G. W. Padberg, H. J. Schelhaas, C. M. van Ravenswaaij-Arts, T. R. Pieber, A. H. Crosby, and C. Guelly. Alterations in the ankyrin domain of TRPV4 cause congenital distal SMA, scapuloper- oneal SMA and HMSN2C. Nat. Genet., 42:160–164, Feb 2010.

[22] D. Rambaldi and F. D. Ciccarelli. FancyGene: dynamic visualization of gene structures and protein domain architectures on genomic loci. Bioinformatics, 25:2281–2282, Sep 2009.

[23] J. M. Fontaine, X. Sun, R. Benndorf, and M. J. Welsh. Interactions of HSP22 (HSPB8) with HSP20, alphaB- crystallin, and HSPB3. Biochem. Biophys. Res. Commun., 337:1006–1011, Nov 2005.

[24] A. A. Shemetov, A. S. Seit-Nebi, O. V. Bukach, and N. B. Gusev. Phosphorylation by cyclic AMP-dependent protein kinase inhibits chaperone-like activity of human HSP22 in vitro. Biochemistry Mosc., 73:200–208, Feb 2008.

[25] K. Kato, K. Hasegawa, S. Goto, and Y. Inaguma. Dissociation as a result of phosphorylation of an aggregated form of the small stress protein, hsp27. J. Biol. Chem., 269:11274–11278, Apr 1994.

[26] L. Almeida-Souza, S. Goethals, V. de Winter, I. Dierick, R. Gallardo, J. Van Durme, J. Irobi, J. Gettemans, F. Rousseau, J. Schymkowitz, V. Timmerman, and S. Janssens. Increased monomerization of mutant HSPB1 leads to protein hyperactivity in Charcot-Marie-Tooth neuropathy. J. Biol. Chem., 285:12778–12786, Apr 2010.

[27] D. Hayes, V. Napoli, A. Mazurkie, W. F. Stafford, and P. Graceffa. Phosphorylation dependence of hsp27 multimeric size and molecular chaperone function. J. Biol. Chem., 284:18801–18807, Jul 2009.

[28] R. A. Stetler, Y. Gao, A. P. Signore, G. Cao, and J. Chen. HSP27: mechanisms of cellular protection against neuronal injury. Curr. Mol. Med., 9:863–872, Sep 2009.

[29] P. Mehlen, A. Mehlen, J. Godet, and A. P. Arrigo. hsp27 as a switch between differentiation and apoptosis in murine embryonic stem cells. J. Biol. Chem., 272:31657–31665, Dec 1997.

[30] R. Benndorf, K. Hayess, S. Ryazantsev, M. Wieske, J. Behlke, and G. Lutsch. Phosphorylation and supramolecu- lar organization of murine small heat shock protein HSP25 abolish its actin polymerization-inhibiting activity. J. Biol. Chem., 269:20780–20784, Aug 1994.

[31] A. P. Arrigo. The cellular "networking" of mammalian Hsp27 and its functions in the control of protein folding, redox state and apoptosis. Adv. Exp. Med. Biol., 594:14–26, 2007.

[32] Y. Chen, T. S. Voegeli, P. P. Liu, E. G. Noble, and R. W. Currie. Heat shock paradox and a new role of heat shock proteins and their receptors as anti-inflammation targets. Inflamm Allergy Drug Targets, 6:91–100, Jun 2007. bibliography 263

[33] K. S. Sinsimer, F. M. Gratacos, A. M. Knapinska, J. Lu, C. D. Krause, A. V. Wierzbowski, L. R. Maher, S. Scrudato, Y. M. Rivera, S. Gupta, D. K. Turrin, M. P. De La Cruz, S. Pestka, and G. Brewer. Chaperone Hsp27, a novel subunit of AUF1 protein complexes, functions in AU-rich element-mediated mRNA decay. Mol. Cell. Biol., 28:5223–5237, Sep 2008.

[34] C. A. Ross and M. A. Poirier. Protein aggregation and neurodegenerative disease. Nat. Med., 10 Suppl:S10–17, Jul 2004.

[35] S. Ackerley, P. A. James, A. Kalli, S. French, K. E. Davies, and K. Talbot. A mutation in the small heat-shock protein HSPB1 leading to distal hereditary motor neuronopathy disrupts neurofilament assembly and the axonal transport of specific cellular cargoes. Hum. Mol. Genet., 15:347–354, Jan 2006.

[36] I. V. Mersiyanova, A. V. Perepelov, A. V. Polyakov, V. F. Sitnikov, E. L. Dadali, R. B. Oparin, A. N. Petrin, and O. V. Evgrafov. A new variant of Charcot-Marie-Tooth disease type 2 is probably the result of a mutation in the neurofilament-light gene. Am. J. Hum. Genet., 67:37–46, Jul 2000.

[37] G. M. Fabrizi, T. Cavallaro, C. Angiari, L. Bertolasi, I. Cabrini, M. Ferrarini, and N. Rizzuto. Giant axon and neurofilament accumulation in Charcot-Marie-Tooth disease type 2E. Neurology, 62:1429–1431, Apr 2004.

[38] J. Zhai, H. Lin, J. P. Julien, and W. W. Schlaepfer. Disruption of neurofilament network with aggregation of light neurofilament protein: a common pathway leading to motor neuron degeneration due to Charcot-Marie- Tooth disease-linked mutations in NFL and HSPB1. Hum. Mol. Genet., 16:3103–3116, Dec 2007.

[39] J. Irobi, L. Almeida-Souza, B. Asselbergh, V. De Winter, S. Goethals, I. Dierick, J. Krishnan, J. P. Timmermans, W. Robberecht, P. De Jonghe, L. Van Den Bosch, S. Janssens, and V. Timmerman. Mutant HSPB8 causes motor neuron-specific neurite degeneration. Hum. Mol. Genet., 19:3254–3265, Aug 2010.

[40] M. Coleman. Axon degeneration mechanisms: commonality amid diversity. Nat. Rev. Neurosci., 6:889–898, Nov 2005.

[41] S. C. Benn, D. Perrelet, A. C. Kato, J. Scholz, I. Decosterd, R. J. Mannion, J. C. Bakowska, and C. J. Woolf. Hsp27 upregulation and phosphorylation is required for injured sensory and motor neuron survival. Neuron, 36:45–56, Sep 2002.

[42] J. N. Lavoie, G. Gingras-Breton, R. M. Tanguay, and J. Landry. Induction of Chinese hamster HSP27 gene ex- pression in mouse cells confers resistance to heat shock. HSP27 stabilization of the microfilament organization. J. Biol. Chem., 268:3420–3429, Feb 1993.

[43] N. Mounier and A. P. Arrigo. Actin cytoskeleton and small heat shock proteins: how do they interact? Cell Stress Chaperones, 7:167–176, Apr 2002.

[44] B. M. Doshi, L. E. Hightower, and J. Lee. The role of Hsp27 and actin in the regulation of movement in human cancer cells responding to heat shock. Cell Stress Chaperones, 14:445–457, Sep 2009.

[45] G. Pigino, L.L. Kirkpatrick, and S.T. Brady. Cytoskeleton of Neurons and Glia. In G.J. Siegel, editor, Basic neurochemistry : molecular, cellular, and medical aspects, pages 123–138. Elsevier Academic, Toronto, 7th edition, 2006.

[46] A. J. Macario and E. Conway de Macario. Chaperonopathies by defect, excess, or mistake. Ann. N. Y. Acad. Sci., 1113:178–191, Oct 2007.

[47] M. V. Kim, A. S. Kasakov, A. S. Seit-Nebi, S. B. Marston, and N. B. Gusev. Structure and properties of K141E mutant of small heat shock protein HSP22 (HspB8, H11) that is expressed in human neuromuscular disorders. Arch. Biochem. Biophys., 454:32–41, Oct 2006.

[48] M. Litt, P. Kramer, D. M. LaMorticella, W. Murphey, E. W. Lovrien, and R. G. Weleber. Autosomal dominant congenital cataract associated with a missense mutation in the human alpha crystallin gene CRYAA. Hum. Mol. Genet., 7:471–474, Mar 1998.

[49] S. Bera, P. Thampi, W. J. Cho, and E. C. Abraham. A positive charge preservation at position 116 of alpha A-crystallin is critical for its structural and functional integrity. Biochemistry, 41:12421–12426, Oct 2002.

[50] D. Baumer, O. Ansorge, M. Almeida, and K. Talbot. The role of RNA processing in the pathogenesis of motor neuron degeneration. Expert Rev Mol Med, 12:e21, 2010.

[51] M. van Blitterswijk and J. E. Landers. RNA processing pathways in amyotrophic lateral sclerosis. Neurogenetics, 11:275–290, Jul 2010. bibliography 264

[52] X. Sun, J. M. Fontaine, A. D. Hoppe, S. Carra, C. DeGuzman, J. L. Martin, S. Simon, P. Vicart, M. J. Welsh, J. Landry, and R. Benndorf. Abnormal interaction of motor neuropathy-associated mutant HspB8 (Hsp22) forms with the RNA helicase Ddx20 (gemin3). Cell Stress Chaperones, 15:567–582, Sep 2010.

[53] S. Lefebvre, L. Burglen, S. Reboullet, O. Clermont, P. Burlet, L. Viollet, B. Benichou, C. Cruaud, P. Millasseau, and M. Zeviani. Identification and characterization of a spinal muscular atrophy-determining gene. Cell, 80: 155–165, Jan 1995.

[54] S. Lefebvre, P. Burlet, Q. Liu, S. Bertrandy, O. Clermont, A. Munnich, G. Dreyfuss, and J. Melki. Correlation between severity and SMN protein level in spinal muscular atrophy. Nat. Genet., 16:265–269, Jul 1997.

[55] C. Soler-Botija, I. Cusco, L. Caselles, E. Lopez, M. Baiget, and E. F. Tizzano. Implication of fetal SMN2 expression in type I SMA pathogenesis: protection or pathological gain of function? J. Neuropathol. Exp. Neurol., 64:215–223, Mar 2005.

[56] A. K. Gubitz, W. Feng, and G. Dreyfuss. The SMN complex. Exp. Cell Res., 296:51–56, May 2004.

[57] G. Meister, C. Eggert, and U. Fischer. SMN-mediated assembly of RNPs: a complex story. Trends Cell Biol., 12: 472–478, Oct 2002.

[58] A. G. Todd, D. J. Shaw, R. Morse, H. Stebbings, and P. J. Young. SMN and the Gemin proteins form sub- complexes that localise to both stationary and dynamic neurite granules. Biochem. Biophys. Res. Commun., 394: 211–216, Mar 2010.

[59] W. Rossoll, S. Jablonka, C. Andreassi, A. K. Kroning, K. Karle, U. R. Monani, and M. Sendtner. Smn, the spinal muscular atrophy-determining gene product, modulates axon growth and localization of beta-actin mRNA in growth cones of motoneurons. J. Cell Biol., 163:801–812, Nov 2003.

[60] F. Gabanella, M. E. Butchbach, L. Saieva, C. Carissimi, A. H. Burghes, and L. Pellizzoni. Ribonucleoprotein assembly defects correlate with spinal muscular atrophy severity and preferentially affect a subset of spliceo- somal snRNPs. PLoS ONE, 2:e921, 2007.

[61] G. Laroia, R. Cuesta, G. Brewer, and R. J. Schneider. Control of mRNA decay by heat shock-ubiquitin- proteasome pathway. Science, 284:499–502, Apr 1999.

[62] G. Laroia, B. Sarkar, and R. J. Schneider. Ubiquitin-dependent mechanism regulates rapid turnover of AU-rich cytokine mRNAs. Proc. Natl. Acad. Sci. U.S.A., 99:1842–1846, Feb 2002.

[63] I. Dierick, J. Irobi, S. Janssens, J. Theuns, R. Lemmens, A. Jacobs, E. Corsmit, N. Hersmus, L. Van Den Bosch, W. Robberecht, P. De Jonghe, C. Van Broeckhoven, and V. Timmerman. Genetic variant in the HSPB1 promoter region impairs the HSP27 stress response. Hum. Mutat., 28:830, Aug 2007.

[64] I. Puls, C. Jonnakuty, B. H. LaMonte, E. L. Holzbaur, M. Tokito, E. Mann, M. K. Floeter, K. Bidus, D. Drayna, S. J. Oh, R. H. Brown, C. L. Ludlow, and K. H. Fischbeck. Mutant dynactin in motor neuron disease. Nat. Genet., 33:455–456, Apr 2003.

[65] C. Munch, R. Sedlmeier, T. Meyer, V. Homberg, A. D. Sperfeld, A. Kurt, J. Prudlo, G. Peraus, C. O. Hanemann, G. Stumm, and A. C. Ludolph. Point mutations of the p150 subunit of dynactin (DCTN1) gene in ALS. Neurology, 63:724–726, Aug 2004.

[66] C. Munch, A. Rosenbohm, A. D. Sperfeld, I. Uttner, S. Reske, B. J. Krause, R. Sedlmeier, T. Meyer, C. O. Hanemann, G. Stumm, and A. C. Ludolph. Heterozygous R1101K mutation of the DCTN1 gene in a family with ALS and FTD. Ann. Neurol., 58:777–780, Nov 2005.

[67] C. Vilarino-Guell, C. Wider, A. I. Soto-Ortolaza, S. A. Cobb, J. M. Kachergus, B. H. Keeling, J. C. Dachsel, M. M. Hulihan, D. W. Dickson, Z. K. Wszolek, R. J. Uitti, N. R. Graff-Radford, B. F. Boeve, K. A. Josephs, B. Miller, K. B. Boylan, K. Gwinn, C. H. Adler, J. O. Aasly, F. Hentati, A. Destee, A. Krygowska-Wajs, M. C. Chartier-Harlin, O. A. Ross, R. Rademakers, and M. J. Farrer. Characterization of DCTN1 genetic variability in neurodegeneration. Neurology, 72:2024–2028, Jun 2009.

[68] T. A. Schroer. Dynactin. Annu. Rev. Cell Dev. Biol., 20:759–779, 2004.

[69] R. H. Melloni, M. K. Tokito, and E. L. Holzbaur. Expression of the p150Glued component of the dynactin complex in developing and adult rat brain. J. Comp. Neurol., 357:15–24, Jun 1995.

[70] M. J. Farrer, M. M. Hulihan, J. M. Kachergus, J. C. Dachsel, A. J. Stoessl, L. L. Grantier, S. Calne, D. B. Calne, B. Lechevalier, F. Chapon, Y. Tsuboi, T. Yamada, L. Gutmann, B. Elibol, K. P. Bhatia, C. Wider, C. Vilarino-Guell, O. A. Ross, L. A. Brown, M. Castanedes-Casey, D. W. Dickson, and Z. K. Wszolek. DCTN1 mutations in Perry syndrome. Nat. Genet., 41:163–165, Feb 2009. bibliography 265

[71] J. R. Levy, C. J. Sumner, J. P. Caviston, M. K. Tokito, S. Ranganathan, L. A. Ligon, K. E. Wallace, B. H. LaMonte, G. G. Harmison, I. Puls, K. H. Fischbeck, and E. L. Holzbaur. A motor neuron disease-associated mutation in p150Glued perturbs dynactin function and induces protein aggregation. J. Cell Biol., 172:733–745, Feb 2006.

[72] B. H. LaMonte, K. E. Wallace, B. A. Holloway, S. S. Shelly, J. Ascano, M. Tokito, T. Van Winkle, D. S. Howland, and E. L. Holzbaur. Disruption of dynein/dynactin inhibits axonal transport in motor neurons causing late- onset progressive degeneration. Neuron, 34:715–727, May 2002.

[73] I. Puls, S. J. Oh, C. J. Sumner, K. E. Wallace, M. K. Floeter, E. A. Mann, W. R. Kennedy, G. Wendelschafer-Crabb, A. Vortmeyer, R. Powers, K. Finnegan, E. L. Holzbaur, K. H. Fischbeck, and C. L. Ludlow. Distal spinal and bulbar muscular atrophy caused by dynactin mutation. Ann. Neurol., 57:687–694, May 2005.

[74] J. K. Moore, D. Sept, and J. A. Cooper. Neurodegeneration mutations in dynactin impair dynein-dependent nuclear migration. Proc. Natl. Acad. Sci. U.S.A., 106:5147–5152, Mar 2009.

[75] S. Ahmed, S. Sun, A. E. Siglin, T. Polenova, and J. C. Williams. Disease-associated mutations in the p150(Glued) subunit destabilize the CAP-gly domain. Biochemistry, 49:5083–5085, Jun 2010.

[76] K. Grohmann, M. Schuelke, A. Diers, K. Hoffmann, B. Lucke, C. Adams, E. Bertini, H. Leonhardt-Horti, F. Muntoni, R. Ouvrier, A. Pfeufer, R. Rossi, L. Van Maldergem, J. M. Wilmshurst, T. F. Wienker, M. Sendtner, S. Rudnik-Schoneborn, K. Zerres, and C. Hubner. Mutations in the gene encoding immunoglobulin mu- binding protein 2 cause spinal muscular atrophy with respiratory distress type 1. Nat. Genet., 29:75–77, Sep 2001.

[77] K. Grohmann, R. Varon, P. Stolz, M. Schuelke, C. Janetzki, E. Bertini, K. Bushby, F. Muntoni, R. Ouvrier, L. Van Maldergem, N. M. Goemans, H. Lochmuller, S. Eichholz, C. Adams, F. Bosch, P. Grattan-Smith, C. Navarro, H. Neitzel, T. Polster, H. Topaloglu, C. Steglich, U. P. Guenther, K. Zerres, S. Rudnik-Schoneborn, and C. Hubner. Infantile spinal muscular atrophy with respiratory distress type 1 (SMARD1). Ann. Neurol., 54:719–724, Dec 2003.

[78] M. Pitt, H. Houlden, J. Jacobs, Q. Mok, B. Harding, M. Reilly, and R. Surtees. Severe infantile neuropathy with diaphragmatic weakness and its relationship to SMARD1. Brain, 126:2682–2692, Dec 2003.

[79] U. P. Guenther, M. Schuelke, E. Bertini, A. D’Amico, N. Goemans, K. Grohmann, C. Hubner, and R. Varon. Genomic rearrangements at the IGHMBP2 gene locus in two patients with SMARD1. Hum. Genet., 115:319–326, Sep 2004.

[80] I. Maystadt, M. Zarhrate, P. Landrieu, O. Boespflug-Tanguy, S. Sukno, P. Collignon, J. Melki, C. Verellen- Dumoulin, A. Munnich, and L. Viollet. Allelic heterogeneity of SMARD1 at the IGHMBP2 locus. Hum. Mutat., 23:525–526, May 2004.

[81] R. E. Appleton, C. Hubner, K. Grohmann, and R. Varon. ’Congenital peripheral neuropathy presenting as apnoea and respiratory insufficiency: spinal muscular atrophy with respiratory distress type 1 (SMARD1)’. Dev Med Child Neurol, 46:576, Aug 2004.

[82] A. Giannini, A. M. Pinto, G. Rossetti, E. Prandi, D. Tiziano, C. Brahe, and N. Nardocci. Respiratory failure in infants due to spinal muscular atrophy with respiratory distress type 1. Intensive Care Med, 32:1851–1855, Nov 2006.

[83] V. C. Wong, B. H. Chung, S. Li, W. Goh, and S. L. Lee. Mutation of gene in spinal muscular atrophy respiratory distress type I. Pediatr. Neurol., 34:474–477, Jun 2006.

[84] U. P. Guenther, R. Varon, M. Schlicke, V. Dutrannoy, A. Volk, C. Hubner, K. von Au, and M. Schuelke. Clin- ical and mutational profile in spinal muscular atrophy with respiratory distress (SMARD): defining novel phenotypes through hierarchical cluster analysis. Hum. Mutat., 28:808–815, Aug 2007.

[85] U. P. Guenther, L. Handoko, R. Varon, U. Stephani, C. Y. Tsao, J. R. Mendell, S. Lutzkendorf, C. Hubner, K. von Au, S. Jablonka, G. Dittmar, U. Heinemann, A. Schuetz, and M. Schuelke. Clinical variability in distal spinal muscular atrophy type 1 (DSMA1): determination of steady-state IGHMBP2 protein levels in five patients with infantile and juvenile disease. J. Mol. Med., 87:31–41, Jan 2009.

[86] V. Fanos, A. Cuccu, S. Nemolato, V. Marinelli, and G. Faa. A new nonsense mutation of the IGHMBP2 gene responsible for the first case of SMARD1 in a Sardinian patient with giant cell hepatitis. Neuropediatrics, 41: 132–134, Jun 2010.

[87] A. Diers, M. Kaczinski, K. Grohmann, C. Hubner, and G. Stoltenburg-Didinger. The ultrastructure of pe- ripheral nerve, motor end-plate and skeletal muscle in patients suffering from spinal muscular atrophy with respiratory distress type 1 (SMARD1). Acta Neuropathol., 110:289–297, Sep 2005. bibliography 266

[88] A. M. Kaindl, U. P. Guenther, S. Rudnik-Schoneborn, R. Varon, K. Zerres, M. Schuelke, C. Hubner, and K. von Au. Spinal muscular atrophy with respiratory distress type 1 (SMARD1). J. Child Neurol., 23:199–204, Feb 2008.

[89] Y. Fukita, T. R. Mizuta, M. Shirozu, K. Ozawa, A. Shimizu, and T. Honjo. The human S mu bp-2, a DNA- binding protein specific to the single-stranded guanine-rich sequence related to the immunoglobulin mu chain switch region. J. Biol. Chem., 268:17463–17470, Aug 1993.

[90] K. Grohmann, W. Rossoll, I. Kobsar, B. Holtmann, S. Jablonka, C. Wessig, G. Stoltenburg-Didinger, U. Fischer, C. Hubner, R. Martini, and M. Sendtner. Characterization of Ighmbp2 in motor neurons and implications for the pathomechanism in a mouse model of human spinal muscular atrophy with respiratory distress type 1 (SMARD1). Hum. Mol. Genet., 13:2031–2042, Sep 2004.

[91] A. E. Gorbalenyaa and E. V. Kooninb. Helicases: amino acid sequence comparisons and structure-function relationships. Curr. Opin. Struct. Biol., 3:419–429, June 1993.

[92] T. R. Mizuta, Y. Fukita, T. Miyoshi, A. Shimizu, and T. Honjo. Isolation of cDNA encoding a binding protein specific to 5’-phosphorylated single-stranded DNA with G-rich sequences. Nucleic Acids Res., 21:1761–1766, Apr 1993.

[93] J. E. Walker, M. Saraste, M. J. Runswick, and N. J. Gay. Distantly related sequences in the alpha- and beta- subunits of ATP synthase, myosin, kinases and other ATP-requiring enzymes and a common nucleotide binding fold. EMBO J., 1:945–951, 1982.

[94] N. V. Grishin. The R3H motif: a domain that binds single-stranded nucleic acids. Trends Biochem. Sci., 23: 329–330, Sep 1998.

[95] E. Liepinsh, A. Leonchiks, A. Sharipo, L. Guignard, and G. Otting. Solution structure of the R3H domain from human Smubp-2. J. Mol. Biol., 326:217–223, Feb 2003.

[96] Y. Z. Chen, C. L. Bennett, H. M. Huynh, I. P. Blair, I. Puls, J. Irobi, I. Dierick, A. Abel, M. L. Kennerson, B. A. Rabin, G. A. Nicholson, M. Auer-Grumbach, K. Wagner, P. De Jonghe, J. W. Griffin, K. H. Fischbeck, V. Timmerman, D. R. Cornblath, and P. F. Chance. DNA/RNA helicase gene mutations in a form of juvenile amyotrophic lateral sclerosis (ALS4). Am. J. Hum. Genet., 74:1128–1135, Jun 2004.

[97] U. P. Guenther, L. Handoko, B. Laggerbauer, S. Jablonka, A. Chari, M. Alzheimer, J. Ohmer, O. Plottner, N. Gehring, A. Sickmann, K. von Au, M. Schuelke, and U. Fischer. IGHMBP2 is a ribosome-associated helicase inactive in the neuromuscular disorder distal SMA type 1 (DSMA1). Hum. Mol. Genet., 18:1288–1300, Apr 2009.

[98] M. de Planell-Saguer, D. G. Schroeder, M. C. Rodicio, G. A. Cox, and Z. Mourelatos. Biochemical and genetic evidence for a role of IGHMBP2 in the translational machinery. Hum. Mol. Genet., 18:2115–2126, Jun 2009.

[99] M. Miao, S. L. Chan, G. L. Fletcher, and C. L. Hew. The rat ortholog of the presumptive flounder antifreeze enhancer-binding protein is a helicase domain-containing protein. Eur. J. Biochem., 267:7237–7246, Dec 2000.

[100] G. A. Cox, C. L. Mahaffey, and W. N. Frankel. Identification of the mouse neuromuscular degeneration gene and mapping of a second site suppressor allele. Neuron, 21:1327–1337, Dec 1998.

[101] T. P. Maddatu, S. M. Garvey, D. G. Schroeder, T. G. Hampton, and G. A. Cox. Transgenic rescue of neurogenic atrophy in the nmd mouse reveals a role for Ighmbp2 in dilated cardiomyopathy. Hum. Mol. Genet., 13: 1105–1115, Jun 2004.

[102] P. De Jonghe, M. Auer-Grumbach, J. Irobi, K. Wagner, B. Plecko, M. Kennerson, D. Zhu, E. De Vriendt, V. Van Gerwen, G. Nicholson, H. P. Hartung, and V. Timmerman. Autosomal dominant juvenile amyotrophic lateral sclerosis and distal hereditary motor neuronopathy with pyramidal tract signs: synonyms for the same disorder? Brain, 125:1320–1325, Jun 2002.

[103] S. Gopinath, I. P. Blair, M. L. Kennerson, J. C. Durnall, and G. A. Nicholson. A novel locus for distal motor neuron degeneration maps to chromosome 7q34-q36. Hum. Genet., 121:559–564, Jun 2007.

[104] M. C. Moreira, S. Klur, M. Watanabe, A. H. Nemeth, I. Le Ber, J. C. Moniz, C. Tranchant, P. Aubourg, M. Tazir, L. Schols, M. Pandolfo, J. B. Schulz, J. Pouget, P. Calvas, M. Shizuka-Ikeda, M. Shoji, M. Tanaka, L. Izatt, C. E. Shaw, A. M’Zahem, E. Dunne, P. Bomont, T. Benhassine, N. Bouslam, G. Stevanin, A. Brice, J. Guimaraes, P. Mendonca, C. Barbot, P. Coutinho, J. Sequeiros, A. Durr, J. M. Warter, and M. Koenig. Senataxin, the ortholog of a yeast RNA helicase, is mutant in ataxia-ocular apraxia 2. Nat. Genet., 36:225–227, Mar 2004.

[105] A. Duquette, K. Roddier, J. McNabb-Baltar, I. Gosselin, A. St-Denis, M. J. Dicaire, L. Loisel, D. Labuda, L. Marc- hand, J. Mathieu, J. P. Bouchard, and B. Brais. Mutations in senataxin responsible for Quebec cluster of ataxia with neuropathy. Ann. Neurol., 57:408–414, Mar 2005. bibliography 267

[106] C. Criscuolo, L. Chessa, S. Di Giandomenico, P. Mancini, F. Sacca, G. S. Grieco, M. Piane, F. Barbieri, G. De Michele, S. Banfi, F. Pierelli, N. Rizzuto, F. M. Santorelli, L. Gallosti, A. Filla, and C. Casali. Ataxia with oculomotor apraxia type 2: a clinical, pathologic, and genetic study. Neurology, 66:1207–1210, Apr 2006.

[107] T. Asaka, H. Yokoji, J. Ito, K. Yamaguchi, and A. Matsushima. Autosomal recessive ataxia with peripheral neuropathy and elevated AFP: novel mutations in SETX. Neurology, 66:1580–1581, May 2006.

[108] B. L. Fogel and S. Perlman. Novel mutations in the senataxin DNA/RNA helicase domain in ataxia with oculomotor apraxia 2. Neurology, 67:2083–2084, Dec 2006.

[109] A. Suraweera, O. J. Becherel, P. Chen, N. Rundle, R. Woods, J. Nakamura, M. Gatei, C. Criscuolo, A. Filla, L. Chessa, M. Fusser, B. Epe, N. Gueven, and M. F. Lavin. Senataxin, defective in ataxia oculomotor apraxia type 2, is involved in the defense against oxidative DNA damage. J. Cell Biol., 177:969–979, Jun 2007.

[110] D. R. Lynch, C. D. Braastad, and N. Nagan. Ovarian failure in ataxia with oculomotor apraxia type 2. Am. J. Med. Genet. A, 143A:1775–1777, Aug 2007.

[111] M. Anheim, M. C. Fleury, J. Franques, M. C. Moreira, J. P. Delaunoy, D. Stoppa-Lyonnet, M. Koenig, and C. Tranchant. Clinical and molecular findings of ataxia with oculomotor apraxia type 2 in 4 families. Arch. Neurol., 65:958–962, Jul 2008.

[112] L. Arning, L. Schols, H. Cin, M. Souquet, J. T. Epplen, and D. Timmann. Identification and characterisation of a large senataxin (SETX) gene duplication in ataxia with ocular apraxia type 2 (AOA2). Neurogenetics, 9: 295–299, Oct 2008.

[113] P. Nicolaou, A. Georghiou, C. Votsi, L. T. Middleton, E. Zamba-Papanicolaou, and K. Christodoulou. A novel c.5308_5311delGAGA mutation in Senataxin in a Cypriot family with an autosomal recessive cerebellar ataxia. BMC Med. Genet., 9:28, 2008.

[114] L. Schols, L. Arning, R. Schule, J. T. Epplen, and D. Timmann. "Pseudodominant inheritance" of ataxia with ocular apraxia type 2 (AOA2). J. Neurol., 255:495–501, Apr 2008.

[115] M. Tazir, L. Ali-Pacha, A. M’Zahem, J. P. Delaunoy, M. Fritsch, S. Nouioua, T. Benhassine, S. Assami, D. Grid, J. M. Vallat, A. Hamri, and M. Koenig. Ataxia with oculomotor apraxia type 2: a clinical and genetic study of 19 patients. J. Neurol. Sci., 278:77–81, Mar 2009.

[116] M. Anheim, B. Monga, M. Fleury, P. Charles, C. Barbot, M. Salih, J. P. Delaunoy, M. Fritsch, L. Arning, M. Syn- ofzik, L. Schols, J. Sequeiros, C. Goizet, C. Marelli, I. Le Ber, J. Koht, J. Gazulla, J. De Bleecker, M. Mukhtar, N. Drouot, L. Ali-Pacha, T. Benhassine, M. Chbicheb, A. M’Zahem, A. Hamri, B. Chabrol, J. Pouget, R. Murphy, M. Watanabe, P. Coutinho, M. Tazir, A. Durr, A. Brice, C. Tranchant, and M. Koenig. Ataxia with oculomotor apraxia type 2: clinical, biological and genotype/phenotype correlation study of a cohort of 90 patients. Brain, 132:2688–2698, Oct 2009.

[117] G. Airoldi, A. Guidarelli, O. Cantoni, C. Panzeri, C. Vantaggiato, S. Bonato, M. Grazia D’Angelo, S. Falcone, C. De Palma, A. Tonelli, C. Crimella, S. Bondioni, N. Bresolin, E. Clementi, and M. T. Bassi. Characterization of two novel SETX mutations in AOA2 patients reveals aspects of the pathophysiological role of senataxin. Neurogenetics, 11:91–100, Feb 2010.

[118] A. G. Bassuk, Y. Z. Chen, S. D. Batish, N. Nagan, P. Opal, P. F. Chance, and C. L. Bennett. In cis autosomal dominant mutation of Senataxin associated with tremor/ataxia syndrome. Neurogenetics, 8:45–49, Jan 2007.

[119] H. D. Kim, J. Choe, and Y. S. Seo. The sen1(+) gene of Schizosaccharomyces pombe, a homologue of budding yeast SEN1, encodes an RNA and DNA helicase. Biochemistry, 38:14697–14710, Nov 1999.

[120] A. Suraweera, Y. Lim, R. Woods, G. W. Birrell, T. Nasim, O. J. Becherel, and M. F. Lavin. Functional role for senataxin, defective in ataxia oculomotor apraxia type 2, in transcriptional regulation. Hum. Mol. Genet., 18: 3384–3396, Sep 2009.

[121] D. Ursic, K. L. Himmel, K. A. Gurley, F. Webb, and M. R. Culbertson. The yeast SEN1 gene is required for the processing of diverse RNA classes. Nucleic Acids Res., 25:4778–4785, Dec 1997.

[122] M. Winey and M. R. Culbertson. Mutations affecting the tRNA-splicing endonuclease activity of Saccha- romyces cerevisiae. Genetics, 118:609–617, Apr 1988.

[123] T. P. Rasmussen and M. R. Culbertson. The putative nucleic acid helicase Sen1p is required for formation and stability of termini and for maximal rates of synthesis and levels of accumulation of small nucleolar RNAs in Saccharomyces cerevisiae. Mol. Cell. Biol., 18:6885–6896, Dec 1998. bibliography 268

[124] D. Ursic, K. Chinchilla, J. S. Finkel, and M. R. Culbertson. Multiple protein/protein and protein/RNA inter- actions suggest roles for yeast DNA/RNA helicase Sen1p in transcription, transcription-coupled DNA repair and RNA processing. Nucleic Acids Res., 32:2441–2452, 2004.

[125] E. J. Steinmetz, C. L. Warren, J. N. Kuehner, B. Panbehi, A. Z. Ansari, and D. A. Brow. Genome-wide distribu- tion of yeast RNA polymerase II and its control by Sen1 helicase. Mol. Cell, 24:735–746, Dec 2006.

[126] A. Antonellis, R. E. Ellsworth, N. Sambuughin, I. Puls, A. Abel, S. Q. Lee-Lin, A. Jordanova, I. Kremensky, K. Christodoulou, L. T. Middleton, K. Sivakumar, V. Ionasescu, B. Funalot, J. M. Vance, L. G. Goldfarb, K. H. Fischbeck, and E. D. Green. Glycyl tRNA synthetase mutations in Charcot-Marie-Tooth disease type 2D and distal spinal muscular atrophy type V. Am. J. Hum. Genet., 72:1293–1299, May 2003.

[127] K. Sivakumar, T. Kyriakides, I. Puls, G. A. Nicholson, B. Funalot, A. Antonellis, N. Sambuughin, K. Christodoulou, J. L. Beggs, E. Zamba-Papanicolaou, V. Ionasescu, M. C. Dalakas, E. D. Green, K. H. Fis- chbeck, and L. G. Goldfarb. Phenotypic spectrum of disorders associated with glycyl-tRNA synthetase muta- tions. Brain, 128:2304–2314, Oct 2005.

[128] R. Del Bo, F. Locatelli, S. Corti, M. Scarlato, S. Ghezzi, A. Prelle, G. Fagiolari, M. Moggio, M. Carpo, N. Bresolin, and G. P. Comi. Coexistence of CMT-2D and distal SMA-V phenotypes in an Italian family with a GARS gene mutation. Neurology, 66:752–754, Mar 2006.

[129] O. Dubourg, H. Azzedine, R. B. Yaou, J. Pouget, A. Barois, V. Meininger, D. Bouteiller, M. Ruberg, A. Brice, and E. LeGuern. The G526R glycyl-tRNA synthetase gene mutation in distal hereditary motor neuropathy type V. Neurology, 66:1721–1726, Jun 2006.

[130] A. Hamaguchi, C. Ishida, K. Iwasa, A. Abe, and M. Yamada. Charcot-Marie-Tooth disease type 2D with a novel glycyl-tRNA synthetase gene (GARS) mutation. J. Neurol., 257:1202–1204, Jul 2010.

[131] A. Abe and K. Hayasaka. The GARS gene is rarely mutated in Japanese patients with Charcot-Marie-Tooth neuropathy. J. Hum. Genet., 54:310–312, May 2009.

[132] P. A. James, M. Z. Cader, F. Muntoni, A. M. Childs, Y. J. Crow, and K. Talbot. Severe childhood SMA and axonal CMT due to anticodon binding domain mutations in the GARS gene. Neurology, 67:1710–1712, Nov 2006.

[133] B. Rohkamm, M. M. Reilly, H. Lochmuller, B. Schlotter-Weigel, N. Barisic, L. Schols, G. Nicholson, D. Pareyson, M. Laura, A. R. Janecke, G. Miltenberger-Miltenyi, E. John, C. Fischer, F. Grill, W. Wakeling, M. Davis, T. R. Pieber, and M. Auer-Grumbach. Further evidence for genetic heterogeneity of distal HMN type V, CMT2 with predominant hand involvement and Silver syndrome. J. Neurol. Sci., 263:100–106, Dec 2007.

[134] K. L. Seburn, L. A. Nangle, G. A. Cox, P. Schimmel, and R. W. Burgess. An active dominant mutation of glycyl-tRNA synthetase causes neuropathy in a Charcot-Marie-Tooth 2D mouse model. Neuron, 51(6):715–726, Sep 2006.

[135] W. W. Motley, K. L. Seburn, M. H. Nawaz, K. E. Miers, J. Cheng, A. Antonellis, E. D. Green, K. Talbot, X. L. Yang, K. H. Fischbeck, and R. W. Burgess. Charcot-Marie-Tooth-linked mutant GARS is toxic to peripheral neurons independent of wild-type GARS levels. PLoS Genet., 7(12):e1002399, Dec 2011.

[136] M. Delarue. Aminoacyl-tRNA synthetases. Curr. Opin. Struct. Biol., 5:48–55, Feb 1995.

[137] Q. Ge, E. P. Trieu, and I. N. Targoff. Primary structure and functional expression of human Glycyl-tRNA synthetase, an autoantigen in myositis. J. Biol. Chem., 269:28790–28797, Nov 1994.

[138] W. W. Motley, K. Talbot, and K. H. Fischbeck. GARS axonopathy: not every neuron’s cup of tRNA. Trends Neurosci., 33:59–66, Feb 2010.

[139] P. Latour, C. Thauvin-Robinet, C. Baudelet-Mery, P. Soichot, V. Cusin, L. Faivre, M. C. Locatelli, M. Mayen- con, A. Sarcey, E. Broussolle, W. Camu, A. David, and R. Rousson. A major determinant for binding and aminoacylation of tRNA(Ala) in cytoplasmic Alanyl-tRNA synthetase is mutated in dominant axonal Charcot- Marie-Tooth disease. Am. J. Hum. Genet., 86:77–82, Jan 2010.

[140] H. M. McLaughlin, R. Sakaguchi, C. Liu, T. Igarashi, D. Pehlivan, K. Chu, R. Iyer, P. Cruz, P. F. Cherukuri, N. F. Hansen, J. C. Mullikin, L. G. Biesecker, T. E. Wilson, V. Ionasescu, G. Nicholson, C. Searby, K. Talbot, J. M. Vance, S. Zuchner, K. Szigeti, J. R. Lupski, Y. M. Hou, E. D. Green, and A. Antonellis. Compound heterozygosity for loss-of-function lysyl-tRNA synthetase mutations in a patient with peripheral neuropathy. Am. J. Hum. Genet., 87:560–566, Oct 2010. bibliography 269

[141] A. Jordanova, J. Irobi, F. P. Thomas, P. Van Dijck, K. Meerschaert, M. Dewil, I. Dierick, A. Jacobs, E. De Vriendt, V. Guergueltcheva, C. V. Rao, I. Tournev, F. A. Gondim, M. D’Hooghe, V. Van Gerwen, P. Callaerts, L. Van Den Bosch, J. P. Timmermans, W. Robberecht, J. Gettemans, J. M. Thevelein, P. De Jonghe, I. Kremensky, and V. Timmerman. Disrupted function and axonal distribution of mutant tyrosyl-tRNA synthetase in dominant intermediate Charcot-Marie-Tooth neuropathy. Nat. Genet., 38:197–202, Feb 2006.

[142] C. Windpassinger, M. Auer-Grumbach, J. Irobi, H. Patel, E. Petek, G. Horl, R. Malli, J. A. Reed, I. Dierick, N. Verpoorten, T. T. Warner, C. Proukakis, P. Van den Bergh, C. Verellen, L. Van Maldergem, L. Merlini, P. De Jonghe, V. Timmerman, A. H. Crosby, and K. Wagner. Heterozygous missense mutations in BSCL2 are associated with distal hereditary motor neuropathy and Silver syndrome. Nat. Genet., 36:271–276, Mar 2004.

[143] M. Auer-Grumbach, B. Schlotter-Weigel, H. Lochmuller, G. Strobl-Wildemann, P. Auer-Grumbach, R. Fischer, H. Offenbacher, E. B. Zwick, T. Robl, G. Hartl, H. P. Hartung, K. Wagner, and C. Windpassinger. Phenotypes of the N88S Berardinelli-Seip congenital lipodystrophy 2 mutation. Ann. Neurol., 57:415–424, Mar 2005.

[144] J. Irobi, P. Van den Bergh, L. Merlini, C. Verellen, L. Van Maldergem, I. Dierick, N. Verpoorten, A. Jor- danova, C. Windpassinger, E. De Vriendt, V. Van Gerwen, M. Auer-Grumbach, K. Wagner, V. Timmerman, and P. De Jonghe. The phenotype of motor neuropathies associated with BSCL2 mutations is broader than Silver syndrome and distal HMN type V. Brain, 127:2124–2130, Sep 2004.

[145] B. P. van de Warrenburg, H. Scheffer, J. J. van Eijk, M. H. Versteeg, H. Kremer, M. J. Zwarts, H. J. Schelhaas, and B. G. van Engelen. BSCL2 mutations in two Dutch families with overlapping Silver syndrome-distal hereditary motor neuropathy. Neuromuscul. Disord., 16:122–125, Feb 2006.

[146] E. Brusse, D. Majoor-Krakauer, B. M. de Graaf, G. H. Visser, S. Swagemakers, A. J. Boon, B. A. Oostra, and A. M. Bertoli-Avella. A novel 16p locus associated with BSCL2 hereditary motor neuronopathy: a genetic modifier? Neurogenetics, 10:289–297, Oct 2009.

[147] J. Magre, M. Delepine, E. Khallouf, T. Gedde-Dahl, L. Van Maldergem, E. Sobel, J. Papp, M. Meier, A. Megar- bane, A. Bachy, A. Verloes, F. H. d’Abronzo, E. Seemanova, R. Assan, N. Baudic, C. Bourut, P. Czernichow, F. Huet, F. Grigorescu, M. de Kerdanet, D. Lacombe, P. Labrune, M. Lanza, H. Loret, F. Matsuda, J. Navarro, A. Nivelon-Chevalier, M. Polak, J. J. Robert, P. Tric, N. Tubiana-Rufi, C. Vigouroux, J. Weissenbach, S. Savasta, J. A. Maassen, O. Trygstad, P. Bogalho, P. Freitas, J. L. Medina, F. Bonnicci, B. I. Joffe, G. Loyson, V. R. Panz, F. J. Raal, S. O’Rahilly, T. Stephenson, C. R. Kahn, M. Lathrop, and J. Capeau. Identification of the gene altered in Berardinelli-Seip congenital lipodystrophy on chromosome 11q13. Nat. Genet., 28:365–370, Aug 2001.

[148] D. Ito, T. Fujisawa, H. Iida, and N. Suzuki. Characterization of seipin/BSCL2, a protein associated with spastic paraplegia 17. Neurobiol. Dis., 31:266–277, Aug 2008.

[149] W. Fei, G. Shui, B. Gaeta, X. Du, L. Kuerschner, P. Li, A. J. Brown, M. R. Wenk, R. G. Parton, and H. Yang. Fld1p, a functional homologue of human seipin, regulates the size of lipid droplets in yeast. J. Cell Biol., 180: 473–482, Feb 2008.

[150] C. Lundin, R. Nordstrom, K. Wagner, C. Windpassinger, H. Andersson, G. von Heijne, and I. Nilsson. Mem- brane topology of the human seipin protein. FEBS Lett., 580:2281–2284, Apr 2006.

[151] D. Ito and N. Suzuki. Molecular pathogenesis of seipin/BSCL2-related motor neuron diseases. Ann. Neurol., 61:237–250, Mar 2007.

[152] D. Ito and N. Suzuki. Seipinopathy: a novel endoplasmic reticulum stress-associated disease. Brain, 132:8–15, Jan 2009.

[153] M. L. Kennerson, G. A. Nicholson, S. G. Kaler, B. Kowalski, J. F. Mercer, J. Tang, R. M. Llanos, S. Chu, R. I. Takata, C. E. Speck-Martins, J. Baets, L. Almeida-Souza, D. Fischer, V. Timmerman, P. E. Taylor, S. S. Scherer, T. A. Ferguson, T. D. Bird, P. De Jonghe, S. M. Feely, M. E. Shy, and J. Y. Garbern. Missense mutations in the copper transporter gene ATP7A cause X-linked distal hereditary motor neuropathy. Am. J. Hum. Genet., 86: 343–352, Mar 2010.

[154] R. I. Takata, C. E. Speck Martins, M. R. Passosbueno, K. T. Abe, A. L. Nishimura, M. D. Da Silva, A. Monteiro, M. I. Lima, F. Kok, and M. Zatz. A new locus for recessive distal spinal muscular atrophy at Xq13.1-q21. J. Med. Genet., 41:224–229, Mar 2004.

[155] M. Kennerson, G. Nicholson, B. Kowalski, K. Krajewski, D. El-Khechen, S. Feely, S. Chu, M. Shy, and J. Garbern. X-linked distal hereditary motor neuropathy maps to the DSMAX locus on chromosome Xq13.1-q21. Neurology, 72:246–252, Jan 2009.

[156] C. C. Weihl and G. Lopate. Motor neuron disease associated with copper deficiency. Muscle Nerve, 34:789–793, Dec 2006. bibliography 270

[157] C. Vulpe, B. Levinson, S. Whitney, S. Packman, and J. Gitschier. Isolation of a candidate gene for Menkes disease and evidence that it encodes a copper-transporting ATPase. Nat. Genet., 3:7–13, Jan 1993.

[158] J. F. Mercer, J. Livingston, B. Hall, J. A. Paynter, C. Begy, S. Chandrasekharappa, P. Lockhart, A. Grimes, M. Bhave, and D. Siemieniak. Isolation of a partial candidate gene for Menkes disease by positional cloning. Nat. Genet., 3:20–25, Jan 1993.

[159] J. Chelly, Z. Tumer, T. Tonnesen, A. Petterson, Y. Ishikawa-Brush, N. Tommerup, N. Horn, and A. P. Monaco. Isolation of a candidate gene for Menkes disease that encodes a potential heavy metal binding protein. Nat. Genet., 3:14–19, Jan 1993.

[160] S. G. Kaler, L. K. Gallo, V. K. Proud, A. K. Percy, Y. Mark, N. A. Segal, D. S. Goldstein, C. S. Holmes, and W. A. Gahl. Occipital horn syndrome and a mild Menkes phenotype associated with splice site mutations at the MNK locus. Nat. Genet., 8:195–202, Oct 1994.

[161] I. Maystadt, M. Zarhrate, D. Leclair-Richard, B. Estournet, A. Barois, F. Renault, M. C. Routon, M. C. Durand, S. Lefebvre, A. Munnich, C. Verellen-Dumoulin, and L. Viollet. A gene for an autosomal recessive lower motor neuron disease with childhood onset maps to 1p36. Neurology, 67:120–124, Jul 2006.

[162] I. Maystadt, R. Rezsohazy, M. Barkats, S. Duque, P. Vannuffel, S. Remacle, B. Lambert, M. Najimi, E. Sokal, A. Munnich, L. Viollet, and C. Verellen-Dumoulin. The nuclear factor kappaB-activator gene PLEKHG5 is mutated in a form of autosomal recessive lower motor neuron disease with childhood onset. Am. J. Hum. Genet., 81:67–76, Jul 2007.

[163] M. De Toledo, V. Coulon, S. Schmidt, P. Fort, and A. Blangy. The gene for a new brain specific RhoA exchange factor maps to the highly unstable chromosomal region 1p36.2-1p36.3. Oncogene, 20:7307–7317, Nov 2001.

[164] A. Matsuda, Y. Suzuki, G. Honda, S. Muramatsu, O. Matsuzaki, Y. Nagano, T. Doi, K. Shimotohno, T. Harada, E. Nishida, H. Hayashi, and S. Sugano. Large-scale identification and characterization of human genes that activate NF-kappaB and MAPK signaling pathways. Oncogene, 22:3307–3318, May 2003.

[165] S. W. Barger, A. M. Moerman, and X. Mao. Molecular mechanisms of cytokine-induced neuroprotection: NFkappaB and neuroplasticity. Curr. Pharm. Des., 11:985–998, 2005.

[166] G. Landoure, A. A. Zdebik, T. L. Martinez, B. G. Burnett, H. C. Stanescu, H. Inada, Y. Shi, A. A. Taye, L. Kong, C. H. Munns, S. S. Choo, C. B. Phelps, R. Paudel, H. Houlden, C. L. Ludlow, M. J. Caterina, R. Gaudet, R. Kleta, K. H. Fischbeck, and C. J. Sumner. Mutations in TRPV4 cause Charcot-Marie-Tooth disease type 2C. Nat. Genet., 42:170–174, Feb 2010.

[167] H. X. Deng, C. J. Klein, J. Yan, Y. Shi, Y. Wu, F. Fecto, H. J. Yau, Y. Yang, H. Zhai, N. Siddique, E. T. Hedley- Whyte, R. Delong, M. Martina, P. J. Dyck, and T. Siddique. Scapuloperoneal spinal muscular atrophy and CMT2C are allelic disorders caused by alterations in TRPV4. Nat. Genet., 42:165–169, Feb 2010.

[168] B. Nilius and G. Owsianik. Channelopathies converge on TRPV4. Nat. Genet., 42:98–100, Feb 2010.

[169] P. Fleury and G. Hageman. A dominantly inherited lower motor neuron disorder presenting at birth with associated arthrogryposis. J. Neurol. Neurosurg. Psychiatr., 48:1037–1048, Oct 1985.

[170] A. J. van der Vleuten, C. M. van Ravenswaaij-Arts, C. J. Frijns, A. P. Smits, G. Hageman, G. W. Padberg, and H. Kremer. Localisation of the gene for a dominant congenital spinal muscular atrophy predominantly affecting the lower limbs to chromosome 12q23-q24. Eur. J. Hum. Genet., 6:376–382, 1998.

[171] G. Owsianik, D. D’hoedt, T. Voets, and B. Nilius. Structure-function relationship of the TRP channel super- family. Rev. Physiol. Biochem. Pharmacol., 156:61–90, 2006.

[172] M. B. Harms, K. M. Ori-McKenney, M. Scoto, E. P. Tuck, S. Bell, D. Ma, S. Masi, P. Allred, M. Al-Lozi, M. M. Reilly, L. J. Miller, A. Jani-Acsadi, A. Pestronk, M. E. Shy, F. Muntoni, R. B. Vallee, and R. H. Baloh. Mutations in the tail domain of DYNC1H1 cause dominant spinal muscular atrophy. Neurology, Mar 2012.

[173] M. N. Weedon, R. Hastings, R. Caswell, W. Xie, K. Paszkiewicz, T. Antoniadi, M. Williams, C. King, L. Green- halgh, R. Newbury-Ecob, and S. Ellard. Exome sequencing identifies a DYNC1H1 mutation in a large pedigree with dominant axonal Charcot-Marie-Tooth disease. Am. J. Hum. Genet., 89(2):308–312, Aug 2011.

[174] M. Hafezparast, R. Klocke, C. Ruhrberg, A. Marquardt, A. Ahmad-Annuar, S. Bowen, G. Lalli, A. S. With- erden, H. Hummerich, S. Nicholson, P. J. Morgan, R. Oozageer, J. V. Priestley, S. Averill, V. R. King, S. Ball, J. Peters, T. Toda, A. Yamamoto, Y. Hiraoka, M. Augustin, D. Korthaus, S. Wattler, P. Wabnitz, C. Dickneite, S. Lampel, F. Boehme, G. Peraus, A. Popp, M. Rudelius, J. Schlegel, H. Fuchs, M. Hrabe de Angelis, G. Schiavo, D. T. Shima, A. P. Russ, G. Stumm, J. E. Martin, and E. M. Fisher. Mutations in dynein link motor neuron degeneration to defects in retrograde transport. Science, 300(5620):808–812, May 2003. bibliography 271

[175] X. J. Chen, E. N. Levedakou, K. J. Millen, R. L. Wollmann, B. Soliven, and B. Popko. Proprioceptive sensory neuropathy in mice with a mutation in the cytoplasmic Dynein heavy chain 1 gene. J. Neurosci., 27(52):14515– 14524, Dec 2007.

[176] L. Viollet, A. Barois, J. G. Rebeiz, Z. Rifai, P. Burlet, M. Zarhrate, E. Vial, M. Dessainte, B. Estournet, B. Kleinknecht, J. Pearn, R. D. Adams, J. A. Urtizberea, D. P. Cros, K. Bushby, A. Munnich, and S. Lefeb- vre. Mapping of autosomal recessive chronic distal spinal muscular atrophy to chromosome 11q13. Ann. Neurol., 51:585–592, May 2002.

[177] L. Viollet, M. Zarhrate, I. Maystadt, B. Estournet-Mathiaut, A. Barois, I. Desguerre, M. Mayer, B. Chabrol, B. LeHeup, V. Cusin, T. Billette De Villemeur, D. Bonneau, P. Saugier-Veber, A. Touzery-De Villepin, A. De- laubier, J. Kaplan, M. Jeanpierre, J. Feingold, and A. Munnich. Refined genetic mapping of autosomal recessive chronic distal spinal muscular atrophy to chromosome 11q13.3 and evidence of linkage disequilibrium in Eu- ropean families. Eur. J. Hum. Genet., 12:483–488, Jun 2004.

[178] M. McEntagart, N. Norton, H. Williams, M. D. Teare, M. Dunstan, P. Baker, H. Houlden, M. Reilly, N. Wood, P. S. Harper, P. A. Futreal, N. Williams, and N. Rahman. Localization of the gene for distal hereditary motor neuronopathy VII (dHMN-VII) to chromosome 2q14. Am. J. Hum. Genet., 68:1270–1276, May 2001.

[179] S. Reddel, R. A. Ouvrier, G. Nicholson, I. Dierick, J. Irobi, V. Timmerman, and M. M. Ryan. Autosomal dominant congenital spinal muscular atrophy–a possible developmental deficiency of motor neurones? Neu- romuscul. Disord., 18:530–535, Jul 2008.

[180] C. J. Frijns, J. Van Deutekom, R. R. Frants, and F. G. Jennekens. Dominant congenital benign spinal muscular atrophy. Muscle Nerve, 17:192–197, Feb 1994.

[181] M. Muglia, A. Magariello, L. Citrigno, L. Passamonti, T. Sprovieri, F. L. Conforti, R. Mazzei, A. Patitucci, A. L. Gabriele, C. Ungaro, M. Bellesi, and A. Quattrone. A novel locus for dHMN with pyramidal features maps to chromosome 4q34.3-q35.2. Clin. Genet., 73:486–491, May 2008.

[182] K. Christodoulou, E. Zamba, M. Tsingis, A. Mubaidin, K. Horani, S. Abu-Sheik, M. El-Khateeb, K. Kyriacou, T. Kyriakides, A. K. Al-Qudah, and L. Middleton. A novel form of distal hereditary motor neuronopathy maps to chromosome 9p21.1-p12. Ann. Neurol., 48:877–884, Dec 2000.

[183] L. T. Middleton, K. Christodoulou, A. Mubaidin, E. Zamba, M. Tsingis, K. Kyriacou, S. Abu-Sheikh, T. Kyri- akides, V. Neocleous, D. M. Georgiou, M. el Khateeb, A. al Qudah, and K. Horany. Distal hereditary motor neuronopathy of the Jerash type. Ann. N. Y. Acad. Sci., 883:65–68, Sep 1999.

[184] J. Irobi, I. Dierick, A. Jordanova, K. G. Claeys, P. De Jonghe, and V. Timmerman. Unraveling the genetics of distal hereditary motor neuronopathies. Neuromolecular Med., 8:131–146, 2006.

[185] G. Nicholson, M. Kennerson, M. Brewer, J. Garbern, and M. Shy. Genotypes & sensory phenotypes in 2 new X-linked neuropathies (CMTX3 and dSMAX) and dominant CMT/HMN overlap syndromes. Adv. Exp. Med. Biol., 652:201–206, 2009.

[186] I. Cusco, M. J. Barcelo, R. Rojas-Garcia, I. Illa, J. Gamez, C. Cervera, A. Pou, G. Izquierdo, M. Baiget, and E. F. Tizzano. SMN2 copy number predicts acute or chronic spinal muscular atrophy but does not account for intrafamilial variability in siblings. J. Neurol., 253(1):21–25, Jan 2006.

[187] D. D. Coovert, T. T. Le, P. E. McAndrew, J. Strasswimmer, T. O. Crawford, J. R. Mendell, S. E. Coulson, E. J. Androphy, T. W. Prior, and A. H. Burghes. The survival motor neuron protein in spinal muscular atrophy. Hum. Mol. Genet., 6:1205–1214, Aug 1997.

[188] G. E. Oprea, S. Krober, M. L. McWhorter, W. Rossoll, S. Muller, M. Krawczak, G. J. Bassell, C. E. Beattie, and B. Wirth. Plastin 3 is a protective modifier of autosomal recessive spinal muscular atrophy. Science, 320(5875): 524–527, Apr 2008.

[189] N. Roy, M. S. Mahadevan, M. McLean, G. Shutler, Z. Yaraghi, R. Farahani, S. Baird, A. Besner-Johnston, C. Lefebvre, and X. Kang. The gene for neuronal apoptosis inhibitory protein is partially deleted in individuals with spinal muscular atrophy. Cell, 80:167–178, Jan 1995.

[190] Y. H. Liang, X. L. Chen, Z. S. Yu, C. Y. Chen, S. Bi, L. G. Mao, B. L. Zhou, and X. N. Zhang. Deletion analysis of SMN1 and NAIP genes in Southern Chinese children with spinal muscular atrophy. J Zhejiang Univ Sci B, 10:29–34, Jan 2009.

[191] R. Gotz, C. Karch, M. R. Digby, J. Troppmair, U. R. Rapp, and M. Sendtner. The neuronal apoptosis inhibitory protein suppresses neuronal differentiation and apoptosis in PC12 cells. Hum. Mol. Genet., 9:2479–2489, Oct 2000. bibliography 272

[192] E. Perlson, S. Maday, M. M. Fu, A. J. Moughamian, and E. L. Holzbaur. Retrograde axonal transport: pathways to cell death? Trends Neurosci., 33:335–344, Jul 2010.

[193] S. J. Kolb, S. Sutton, and D. R. Schoenberg. RNA processing defects associated with diseases of the motor neuron. Muscle Nerve, 41:5–17, Jan 2010.

[194] J. Sreedharan, I. P. Blair, V. B. Tripathi, X. Hu, C. Vance, B. Rogelj, S. Ackerley, J. C. Durnall, K. L. Williams, E. Buratti, F. Baralle, J. de Belleroche, J. D. Mitchell, P. N. Leigh, A. Al-Chalabi, C. C. Miller, G. Nicholson, and C. E. Shaw. TDP-43 mutations in familial and sporadic amyotrophic lateral sclerosis. Science, 319:1668–1672, Mar 2008.

[195] C. Vance, B. Rogelj, T. Hortobagyi, K. J. De Vos, A. L. Nishimura, J. Sreedharan, X. Hu, B. Smith, D. Ruddy, P. Wright, J. Ganesalingam, K. L. Williams, V. Tripathi, S. Al-Saraj, A. Al-Chalabi, P. N. Leigh, I. P. Blair, G. Nicholson, J. de Belleroche, J. M. Gallo, C. C. Miller, and C. E. Shaw. Mutations in FUS, an RNA processing protein, cause familial amyotrophic lateral sclerosis type 6. Science, 323:1208–1211, Feb 2009.

[196] J. Sambrook, E. F. Fritsch, and T Maniatis. Molecular Cloning, A laboratory manual,. Laboratory Press, Cold Spring Harbor, 2nd edition, 1989.

[197] X. Wu and D. J. Munroe. EasyExonPrimer: automated primer design for exon sequences. Appl. Bioinformatics, 5:119–120, 2006.

[198] W. J. Kent. BLAT–the BLAST-like alignment tool. Genome Res., 12:656–664, Apr 2002.

[199] T. Strachan and A. P. Read. Human Molecular Genetics. Wiley-Liss, New York, 2nd edition, 1999.

[200] K. W. Broman, J. C. Murray, V. C. Sheffield, R. L. White, and J. L. Weber. Comprehensive human genetic maps: individual and sex-specific variation in recombination. Am. J. Hum. Genet., 63:861–869, Sep 1998.

[201] A. Kong, D. F. Gudbjartsson, J. Sainz, G. M. Jonsdottir, S. A. Gudjonsson, B. Richardsson, S. Sigurdardottir, J. Barnard, B. Hallbeck, G. Masson, A. Shlien, S. T. Palsson, M. L. Frigge, T. E. Thorgeirsson, J. R. Gulcher, and K. Stefansson. A high-resolution recombination map of the human genome. Nat. Genet., 31:241–247, Jul 2002.

[202] G. A. McVean, S. R. Myers, S. Hunt, P. Deloukas, D. R. Bentley, and P. Donnelly. The fine-scale structure of recombination rate variation in the human genome. Science, 304:581–584, Apr 2004.

[203] S. Myers, L. Bottolo, C. Freeman, G. McVean, and P. Donnelly. A fine-scale map of recombination rates and hotspots across the human genome. Science, 310:321–324, Oct 2005.

[204] F. Baudat, J. Buard, C. Grey, A. Fledel-Alon, C. Ober, M. Przeworski, G. Coop, and B. de Massy. PRDM9 is a major determinant of meiotic recombination hotspots in humans and mice. Science, 327:836–840, Feb 2010.

[205] M. A. Pericak-Vance. Analysis of Genetic Linkage Data for Mendelian Traits. Current Protocols in Human Genetics, Supplement 9:1.4.1–1.4.31, 2001.

[206] N. E. Morton. Sequential tests for the detection of linkage. Am. J. Hum. Genet., 7:277–318, Sep 1955.

[207] Z. Brkanac, M. Fernandez, M. Matsushita, H. Lipe, J. Wolff, T. D. Bird, and W. H. Raskind. Autosomal dominant sensory/motor neuropathy with Ataxia (SMNA): Linkage to chromosome 7q22-q32. Am. J. Med. Genet., 114:450–457, May 2002.

[208] Z. Brkanac, D. Spencer, J. Shendure, P. D. Robertson, M. Matsushita, T. Vu, T. D. Bird, M. V. Olson, and W. H. Raskind. IFRD1 is a candidate gene for SMNA on chromosome 7q22-q23. Am. J. Hum. Genet., 84:692–697, May 2009.

[209] S. Gopinath. Finding new genes causing motor neuron diseases. Master’s thesis, Faculty of Medicine, Univer- sity of Sydney, 2006.

[210] W. J. Kent, C. W. Sugnet, T. S. Furey, K. M. Roskin, T. H. Pringle, A. M. Zahler, and D. Haussler. The human genome browser at UCSC. Genome Res., 12:996–1006, Jun 2002.

[211] G. Benson. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res., 27:573–580, Jan 1999.

[212] T. H. Lindner and K. Hoffmann. easyLINKAGE: a PERL script for easy and automated two-/multi-point linkage analyses. Bioinformatics, 21:405–407, Feb 2005.

[213] J. Ott. Computer-simulation methods in human linkage analysis. Proc. Natl. Acad. Sci. U.S.A., 86:4175–4178, Jun 1989. bibliography 273

[214] D.E. Weeks, J. Ott, and G.M. Lathrop. SLINK: a general simulation program for linkage analysis. Am. J. Hum. Genet., 47(Suppl):A204, September 1990.

[215] A. E. Harding and P. K. Thomas. Peroneal muscular atrophy with pyramidal features. J. Neurol. Neurosurg. Psychiatr., 47:168–172, Feb 1984.

[216] W. J. Ewens, R. S. Spielman, N. L. Kaplan, X. Gao, R. W. Morris, and E. R. Martin. Disease associations and family-based tests. Current Protocols in Human Genetics, 58:1.12.1–1.12.24, 2008.

[217] G. A. Nicholson, J. L. Dawkins, I. P. Blair, M. Auer-Grumbach, S. B. Brahmbhatt, and D. J. Hulme. Hereditary sensory neuropathy type I: haplotype analysis shows founders in southern England and Europe. Am. J. Hum. Genet., 69:655–659, Sep 2001.

[218] R. Claramunt, T. Sevilla, V. Lupo, A. Cuesta, J. M. Millan, J. J. Vilchez, F. Palau, and C. Espinos. The p.R1109X mutation in SH3TC2 gene is predominant in Spanish Gypsies with Charcot-Marie-Tooth disease type 4. Clin. Genet., 71:343–349, Apr 2007.

[219] A. Gabreels-Festen, S. van Beersum, L. Eshuis, E. LeGuern, F. Gabreels, B. van Engelen, and E. Mariman. Study on the gene and phenotypic characterisation of autosomal recessive demyelinating motor and sensory neuropathy (Charcot-Marie-Tooth disease) with a gene locus on chromosome 5q23-q33. J. Neurol. Neurosurg. Psychiatr., 66:569–574, May 1999.

[220] J. Senderek, C. Bergmann, C. Stendel, J. Kirfel, N. Verpoorten, P. De Jonghe, V. Timmerman, R. Chrast, M. H. Verheijen, G. Lemke, E. Battaloglu, Y. Parman, S. Erdem, E. Tan, H. Topaloglu, A. Hahn, W. Muller-Felber, N. Rizzuto, G. M. Fabrizi, M. Stuhrmann, S. Rudnik-Schoneborn, S. Zuchner, J. Michael Schroder, E. Buch- heim, V. Straub, J. Klepper, K. Huehne, B. Rautenstrauss, R. Buttner, E. Nelis, and K. Zerres. Mutations in a gene encoding a novel SH3/TPR domain protein cause autosomal recessive Charcot-Marie-Tooth type 4C neuropathy. Am. J. Hum. Genet., 73:1106–1119, Nov 2003.

[221] M. Brewer, F. Changi, A. Antonellis, K. Fischbeck, P. Polly, G. Nicholson, and M. Kennerson. Evidence of a founder haplotype refines the X-linked Charcot-Marie-Tooth (CMTX3) locus to a 2.5 Mb region. Neurogenetics, 9:191–195, Jul 2008.

[222] J. L. Weber and C. Wong. Mutation of human short tandem repeats. Hum. Mol. Genet., 2:1123–1128, Aug 1993.

[223] G. F. Richard, A. Kerrest, and B. Dujon. Comparative genomics and molecular dynamics of DNA repeats in eukaryotes. Microbiol. Mol. Biol. Rev., 72:686–727, Dec 2008.

[224] B. Brinkmann, M. Klintschar, F. Neuhuber, J. Huhne, and B. Rolf. Mutation rate in human microsatellites: influence of the structure and length of the tandem repeat. Am. J. Hum. Genet., 62:1408–1415, Jun 1998.

[225] F. S. Collins. Positional cloning moves from perditional to traditional. Nat. Genet., 9:347–350, Apr 1995.

[226] J. S. Collins and C. E. Schwartz. Detecting polymorphisms and mutations in candidate genes. Am. J. Hum. Genet., 71:1251–1252, Nov 2002.

[227] International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature, 431:931–945, Oct 2004.

[228] J. C. Venter, M. D. Adams, E. W. Myers, P. W. Li, R. J. Mural, G. G. Sutton, H. O. Smith, M. Yandell, C. A. Evans, R. A. Holt, J. D. Gocayne, P. Amanatides, R. M. Ballew, D. H. Huson, J. R. Wortman, Q. Zhang, C. D. Kodira, X. H. Zheng, L. Chen, M. Skupski, G. Subramanian, P. D. Thomas, J. Zhang, G. L. Gabor Miklos, C. Nelson, S. Broder, A. G. Clark, J. Nadeau, V. A. McKusick, N. Zinder, A. J. Levine, R. J. Roberts, M. Simon, C. Slayman, M. Hunkapiller, R. Bolanos, A. Delcher, I. Dew, D. Fasulo, M. Flanigan, L. Florea, A. Halpern, S. Hannenhalli, S. Kravitz, S. Levy, C. Mobarry, K. Reinert, K. Remington, J. Abu-Threideh, E. Beasley, K. Biddick, V. Bonazzi, R. Brandon, M. Cargill, I. Chandramouliswaran, R. Charlab, K. Chaturvedi, Z. Deng, V. Di Francesco, P. Dunn, K. Eilbeck, C. Evangelista, A. E. Gabrielian, W. Gan, W. Ge, F. Gong, Z. Gu, P. Guan, T. J. Heiman, M. E. Higgins, R. R. Ji, Z. Ke, K. A. Ketchum, Z. Lai, Y. Lei, Z. Li, J. Li, Y. Liang, X. Lin, F. Lu, G. V. Merkulov, N. Milshina, H. M. Moore, A. K. Naik, V. A. Narayan, B. Neelam, D. Nusskern, D. B. Rusch, S. Salzberg, W. Shao, B. Shue, J. Sun, Z. Wang, A. Wang, X. Wang, J. Wang, M. Wei, R. Wides, C. Xiao, C. Yan, A. Yao, J. Ye, M. Zhan, W. Zhang, H. Zhang, Q. Zhao, L. Zheng, F. Zhong, W. Zhong, S. Zhu, S. Zhao, D. Gilbert, S. Baumhueter, G. Spier, C. Carter, A. Cravchik, T. Woodage, F. Ali, H. An, A. Awe, D. Baldwin, H. Baden, M. Barnstead, I. Barrow, K. Beeson, D. Busam, A. Carver, A. Center, M. L. Cheng, L. Curry, S. Danaher, L. Davenport, R. Desilets, S. Dietz, K. Dodson, L. Doup, S. Ferriera, N. Garg, A. Gluecksmann, B. Hart, J. Haynes, C. Haynes, C. Heiner, S. Hladun, D. Hostin, J. Houck, T. Howland, C. Ibegwam, J. Johnson, F. Kalush, L. Kline, S. Koduru, A. Love, F. Mann, D. May, S. McCawley, T. McIntosh, I. McMullen, M. Moy, L. Moy, B. Murphy, K. Nelson, C. Pfannkoch, E. Pratts, V. Puri, H. Qureshi, M. Reardon, R. Rodriguez, Y. H. Rogers, D. Romblad, B. Ruhfel, R. Scott, C. Sitter, M. Smallwood, E. Stewart, R. Strong, E. Suh, R. Thomas, N. N. Tint, S. Tse, C. Vech, G. Wang, J. Wetter, S. Williams, M. Williams, S. Windsor, E. Winn-Deen, K. Wolfe, J. Zaveri, K. Zaveri, J. F. Abril, R. Guigo, M. J. bibliography 274

Campbell, K. V. Sjolander, B. Karlak, A. Kejariwal, H. Mi, B. Lazareva, T. Hatton, A. Narechania, K. Diemer, A. Muruganujan, N. Guo, S. Sato, V. Bafna, S. Istrail, R. Lippert, R. Schwartz, B. Walenz, S. Yooseph, D. Allen, A. Basu, J. Baxendale, L. Blick, M. Caminha, J. Carnes-Stine, P. Caulk, Y. H. Chiang, M. Coyne, C. Dahlke, A. Mays, M. Dombroski, M. Donnelly, D. Ely, S. Esparham, C. Fosler, H. Gire, S. Glanowski, K. Glasser, A. Glodek, M. Gorokhov, K. Graham, B. Gropman, M. Harris, J. Heil, S. Henderson, J. Hoover, D. Jennings, C. Jordan, J. Jordan, J. Kasha, L. Kagan, C. Kraft, A. Levitsky, M. Lewis, X. Liu, J. Lopez, D. Ma, W. Majoros, J. McDaniel, S. Murphy, M. Newman, T. Nguyen, N. Nguyen, M. Nodell, S. Pan, J. Peck, M. Peterson, W. Rowe, R. Sanders, J. Scott, M. Simpson, T. Smith, A. Sprague, T. Stockwell, R. Turner, E. Venter, M. Wang, M. Wen, D. Wu, M. Wu, A. Xia, A. Zandieh, and X. Zhu. The sequence of the human genome. Science, 291:1304–1351, Feb 2001.

[229] A. D. Baxevanis. Using genomic databases for sequence-based biological discovery. Mol. Med., 9:185–192, 2003.

[230] J. E. Parrish and D. L. Nelson. Methods for finding genes. A major rate-limiting step in positional cloning. Genet. Anal. Tech. Appl., 10:29–41, Apr 1993.

[231] N. Papadopoulos. Introduction to positional cloning. Clin. Exp. Allergy, 25 Suppl 2:116–118, Nov 1995.

[232] M. Olson, L. Hood, C. Cantor, and D. Botstein. A common language for physical mapping of the human genome. Science, 245:1434–1435, Sep 1989.

[233] D. L. Nelson. Positional cloning reaches maturity. Curr. Opin. Genet. Dev., 5:298–303, Jun 1995.

[234] T. A. Pearson and T. A. Manolio. How to interpret a genome-wide association study. JAMA, 299:1335–1344, Mar 2008.

[235] H. K. Tabor, N. J. Risch, and R. M. Myers. Candidate-gene approaches for studying complex genetic traits: practical considerations. Nat. Rev. Genet., 3:391–397, May 2002.

[236] K. Garber. Genetics. The elusive ALS genes. Science, 319:20, Jan 2008.

[237] T.A. Brown, editor. Genomes. Wiley-Liss, Oxford, 2nd edition edition, 2002.

[238] S. A. Cook, K. R. Johnson, R. T. Bronson, and M. T. Davisson. Neuromuscular degeneration (nmd): a mutation on mouse chromosome 19 that causes motor neuron degeneration. Mamm. Genome, 6:187–191, Mar 1995.

[239] M. Zhu and S. Zhao. Candidate gene identification approach: progress and challenges. Int. J. Biol. Sci., 3: 420–427, 2007.

[240] M. L. Metzker. Emerging technologies in DNA sequencing. Genome Res., 15:1767–1776, Dec 2005.

[241] M. Grompe. The rapid detection of unknown mutations in nucleic acids. Nat. Genet., 5:111–117, Oct 1993.

[242] E. Y. Chan. Advances in sequencing technology. Mutat. Res., 573:13–40, Jun 2005.

[243] S. Kohler, S. Bauer, D. Horn, and P. N. Robinson. Walking the interactome for prioritization of candidate disease genes. Am. J. Hum. Genet., 82:949–958, Apr 2008.

[244] L. C. Tranchevent, R. Barriot, S. Yu, S. Van Vooren, P. Van Loo, B. Coessens, B. De Moor, S. Aerts, and Y. Moreau. ENDEAVOUR update: a web resource for gene prioritization in multiple species. Nucleic Acids Res., 36:W377– 384, Jul 2008.

[245] S. Aerts, D. Lambrechts, S. Maity, P. Van Loo, B. Coessens, F. De Smet, L. C. Tranchevent, B. De Moor, P. Mary- nen, B. Hassan, P. Carmeliet, and Y. Moreau. Gene prioritization through genomic data fusion. Nat. Biotechnol., 24:537–544, May 2006.

[246] C. Houdayer, C. Dehainault, C. Mattler, D. Michaux, V. Caux-Moncoutier, S. Pages-Berhouet, C. D. d’Enghien, A. Lauge, L. Castera, M. Gauthier-Villars, and D. Stoppa-Lyonnet. Evaluation of in silico splice tools for decision-making in molecular diagnosis. Hum. Mutat., 29:975–982, Jul 2008.

[247] J. Bell, D. Bodmer, E. Sistermans, and S. Ramsden. Practice guidelines for the interpretation and reporting of unclassified variants (uvs) in clinical molecular genetics. Technical report, Clinical Molecular Genetics Society and Dutch Society of Clinical Genetic Laboratory Specialists, http://www.cmgs.org/BPGs/, Oct 2007.

[248] S. M. Hebsgaard, P. G. Korning, N. Tolstrup, J. Engelbrecht, P. Rouze, and S. Brunak. Splice site prediction in Arabidopsis thaliana pre-mRNA by combining local and global sequence information. Nucleic Acids Res., 24: 3439–3452, Sep 1996. bibliography 275

[249] M. G. Reese, F. H. Eeckman, D. Kulp, and D. Haussler. Improved splice site detection in Genie. J. Comput. Biol., 4:311–323, 1997.

[250] F. O. Desmet, D. Hamroun, M. Lalande, G. Collod-Beroud, M. Claustres, and C. Beroud. Human Splicing Finder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res., 37:e67, May 2009.

[251] M. Pertea, X. Lin, and S. L. Salzberg. GeneSplicer: a new computational method for splice site prediction. Nucleic Acids Res., 29:1185–1190, Mar 2001.

[252] M. Wang and A. Marin. Characterization and prediction of alternative splice sites. Gene, 366:219–227, Feb 2006.

[253] D.H. Mathews, J. Sabina, M. Zuker, and D.H. Turner. Expanded Sequence Dependence of Thermodynamic Parameters Provides Robust Prediction of Rna Secondary Structure. J. Mol. Biol., 288:910–940, 1999.

[254] M. Zuker, D.H. Mathews, and D.H. Turner. Algorithms and Thermodynamics for RNA Secondary Structure Prediction: A Practical Guide. In J. Barciszewski and B.F.C. Clark, editors, RNA Biochemistry and Biotechnology,. NATO ASI Series, Kluwer Academic Publishers„ 1999.

[255] L. W. Hillier, R. S. Fulton, L. A. Fulton, T. A. Graves, K. H. Pepin, C. Wagner-McPherson, D. Layman, J. Maas, S. Jaeger, R. Walker, K. Wylie, M. Sekhon, M. C. Becker, M. D. O’Laughlin, M. E. Schaller, G. A. Fewell, K. D. Delehaunty, T. L. Miner, W. E. Nash, M. Cordes, H. Du, H. Sun, J. Edwards, H. Bradshaw-Cordum, J. Ali, S. Andrews, A. Isak, A. Vanbrunt, C. Nguyen, F. Du, B. Lamar, L. Courtney, J. Kalicki, P. Ozersky, L. Bielicki, K. Scott, A. Holmes, R. Harkins, A. Harris, C. M. Strong, S. Hou, C. Tomlinson, S. Dauphin- Kohlberg, A. Kozlowicz-Reilly, S. Leonard, T. Rohlfing, S. M. Rock, A. M. Tin-Wollam, A. Abbott, P. Minx, R. Maupin, C. Strowmatt, P. Latreille, N. Miller, D. Johnson, J. Murray, J. P. Woessner, M. C. Wendl, S. P. Yang, B. R. Schultz, J. W. Wallis, J. Spieth, T. A. Bieri, J. O. Nelson, N. Berkowicz, P. E. Wohldmann, L. L. Cook, M. T. Hickenbotham, J. Eldred, D. Williams, J. A. Bedell, E. R. Mardis, S. W. Clifton, S. L. Chissoe, M. A. Marra, C. Raymond, E. Haugen, W. Gillett, Y. Zhou, R. James, K. Phelps, S. Iadanoto, K. Bubb, E. Simms, R. Levy, J. Clendenning, R. Kaul, W. J. Kent, T. S. Furey, R. A. Baertsch, M. R. Brent, E. Keibler, P. Flicek, P. Bork, M. Suyama, J. A. Bailey, M. E. Portnoy, D. Torrents, A. T. Chinwalla, W. R. Gish, S. R. Eddy, J. D. McPherson, M. V. Olson, E. E. Eichler, E. D. Green, R. H. Waterston, and R. K. Wilson. The DNA sequence of human chromosome 7. Nature, 424:157–164, Jul 2003.

[256] D. Karolchik, A. S. Hinrichs, T. S. Furey, K. M. Roskin, C. W. Sugnet, D. Haussler, and W. J. Kent. The UCSC Table Browser data retrieval tool. Nucleic Acids Res., 32:D493–496, Jan 2004.

[257] K. D. Pruitt, T. Tatusova, W. Klimke, and D. R. Maglott. NCBI Reference Sequences: current status, policy and new initiatives. Nucleic Acids Res., 37:D32–36, Jan 2009.

[258] K. D. Pruitt, T. Tatusova, and D. R. Maglott. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res., 35:D61–65, Jan 2007.

[259] D. A. Benson, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, and D. L. Wheeler. GenBank: update. Nucleic Acids Res., 32:D23–26, Jan 2004.

[260] K. D. Pruitt, J. Harrow, R. A. Harte, C. Wallin, M. Diekhans, D. R. Maglott, S. Searle, C. M. Farrell, J. E. Loveland, B. J. Ruef, E. Hart, M. M. Suner, M. J. Landrum, B. Aken, S. Ayling, R. Baertsch, J. Fernandez-Banet, J. L. Cherry, V. Curwen, M. Dicuccio, M. Kellis, J. Lee, M. F. Lin, M. Schuster, A. Shkeda, C. Amid, G. Brown, O. Dukhanina, A. Frankish, J. Hart, B. L. Maidak, J. Mudge, M. R. Murphy, T. Murphy, J. Rajan, B. Rajput, L. D. Riddick, C. Snow, C. Steward, D. Webb, J. A. Weber, L. Wilming, W. Wu, E. Birney, D. Haussler, T. Hubbard, J. Ostell, R. Durbin, and D. Lipman. The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res., 19:1316–1323, Jul 2009.

[261] R. Leinonen, F. Nardone, W. Zhu, and R. Apweiler. UniSave: the UniProtKB sequence/annotation version database. Bioinformatics, 22:1284–1285, May 2006.

[262] S. Rouquier, S. Taviaux, B. J. Trask, V. Brand-Arpon, G. van den Engh, J. Demaille, and D. Giorgi. Distribution of olfactory receptor genes in the human genome. Nat. Genet., 18:243–250, Mar 1998.

[263] M. A. Robinson, M. P. Mitchell, S. Wei, C. E. Day, T. M. Zhao, and P. Concannon. Organization of human T-cell receptor beta-chain genes: clusters of V beta genes are present on chromosomes 7 and 9. Proc. Natl. Acad. Sci. U.S.A., 90:2433–2437, Mar 1993.

[264] M. A. van Es, P. W. van Vught, H. M. Blauw, L. Franke, C. G. Saris, L. Van den Bosch, S. W. de Jong, V. de Jong, F. Baas, R. van’t Slot, R. Lemmens, H. J. Schelhaas, A. Birve, K. Sleegers, C. Van Broeckhoven, J. C. Schymick, B. J. Traynor, J. H. Wokke, C. Wijmenga, W. Robberecht, P. M. Andersen, J. H. Veldink, R. A. Ophoff, and L. H. van den Berg. Genetic variation in DPP6 is associated with susceptibility to amyotrophic lateral sclerosis. Nat. Genet., 40:29–31, Jan 2008. bibliography 276

[265] S. Cronin, S. Berger, J. Ding, J. C. Schymick, N. Washecka, D. G. Hernandez, M. J. Greenway, D. G. Bradley, B. J. Traynor, and O. Hardiman. A genome-wide association study of sporadic ALS in a homogenous Irish population. Hum. Mol. Genet., 17:768–774, Mar 2008.

[266] R. Del Bo, S. Ghezzi, S. Corti, D. Santoro, A. Prelle, M. Mancuso, G. Siciliano, C. Briani, L. Murri, N. Bresolin, and G. P. Comi. DPP6 gene variability confers increased risk of developing sporadic amyotrophic lateral sclerosis in Italian patients. J. Neurol. Neurosurg. Psychiatr., 79:1085, Sep 2008.

[267] T. Dunckley, M. J. Huentelman, D. W. Craig, J. V. Pearson, S. Szelinger, K. Joshipura, R. F. Halperin, C. Stamper, K. R. Jensen, D. Letizia, S. E. Hesterlee, A. Pestronk, T. Levine, T. Bertorini, M. C. Graves, T. Mozaffar, C. E. Jackson, P. Bosch, A. McVey, A. Dick, R. Barohn, C. Lomen-Hoerth, J. Rosenfeld, D. T. O’connor, K. Zhang, R. Crook, H. Ryberg, M. Hutton, J. Katz, E. P. Simpson, H. Mitsumoto, R. Bowser, R. G. Miller, S. H. Appel, and D. A. Stephan. Whole-genome analysis of sporadic amyotrophic lateral sclerosis. N. Engl. J. Med., 357: 775–788, Aug 2007.

[268] E. Lingueglia, J. R. de Weille, F. Bassilana, C. Heurteaux, H. Sakai, R. Waldmann, and M. Lazdunski. A modulatory subunit of acid sensing ion channels in brain and dorsal root ganglion cells. J. Biol. Chem., 272: 29778–29783, Nov 1997.

[269] D. Alvarez de la Rosa, P. Zhang, D. Shao, F. White, and C. M. Canessa. Functional implications of the localization and activity of acid-sensitive channels in rat peripheral nervous system. Proc. Natl. Acad. Sci. U.S.A., 99:2326–2331, Feb 2002.

[270] J. F. Staropoli, C. McDermott, C. Martinat, B. Schulman, E. Demireva, and A. Abeliovich. Parkin is a component of an SCF-like ubiquitin ligase complex and protects postmitotic neurons from kainate excitotoxicity. Neuron, 37:735–749, Mar 2003.

[271] T. Kitada, S. Asakawa, N. Hattori, H. Matsumine, Y. Yamamura, S. Minoshima, M. Yokochi, Y. Mizuno, and N. Shimizu. Mutations in the parkin gene cause autosomal recessive juvenile parkinsonism. Nature, 392: 605–608, Apr 1998.

[272] K. A. Strauss, E. G. Puffenberger, M. J. Huentelman, S. Gottlieb, S. E. Dobrin, J. M. Parod, D. A. Stephan, and D. H. Morton. Recessive symptomatic focal epilepsy and mutant contactin-associated protein-like 2. N. Engl. J. Med., 354:1370–1377, Mar 2006.

[273] J. I. Friedman, T. Vrijenhoek, S. Markx, I. M. Janssen, W. A. van der Vliet, B. H. Faas, N. V. Knoers, W. Cahn, R. S. Kahn, L. Edelmann, K. L. Davis, J. M. Silverman, H. G. Brunner, A. G. van Kessel, C. Wijmenga, R. A. Ophoff, and J. A. Veltman. CNTNAP2 gene dosage variation is associated with schizophrenia and epilepsy. Mol. Psychiatry, 13:261–266, Mar 2008.

[274] A. J. Verkerk, C. A. Mathews, M. Joosse, B. H. Eussen, P. Heutink, and B. A. Oostra. CNTNAP2 is disrupted in a family with Gilles de la Tourette syndrome and obsessive compulsive disorder. Genomics, 82:1–9, Jul 2003.

[275] D. E. Arking, D. J. Cutler, C. W. Brune, T. M. Teslovich, K. West, M. Ikeda, A. Rea, M. Guy, S. Lin, E. H. Cook, and A. Chakravarti. A common genetic variant in the neurexin superfamily member CNTNAP2 increases familial risk of autism. Am. J. Hum. Genet., 82:160–164, Jan 2008.

[276] S. J. Huffaker, J. Chen, K. K. Nicodemus, F. Sambataro, F. Yang, V. Mattay, B. K. Lipska, T. M. Hyde, J. Song, D. Rujescu, I. Giegling, K. Mayilyan, M. J. Proust, A. Soghoyan, G. Caforio, J. H. Callicott, A. Bertolino, A. Meyer-Lindenberg, J. Chang, Y. Ji, M. F. Egan, T. E. Goldberg, J. E. Kleinman, B. Lu, and D. R. Weinberger. A primate-specific, brain isoform of KCNH2 affects cortical physiology, cognition, neuronal repolarization and risk of schizophrenia. Nat. Med., 15:509–518, May 2009.

[277] M. Dahiyat, A. Cumming, C. Harrington, C. Wischik, J. Xuereb, F. Corrigan, G. Breen, D. Shaw, and D. St Clair. Association between Alzheimer’s disease and the NOS3 gene. Ann. Neurol., 46:664–667, Oct 1999.

[278] A. Akomolafe, K. L. Lunetta, P. M. Erlich, L. A. Cupples, C. T. Baldwin, M. Huyck, R. C. Green, and L. A. Farrer. Genetic association between endothelial nitric oxide synthase and Alzheimer disease. Clin. Genet., 70: 49–56, Jul 2006.

[279] L. Bergeron, G. I. Perez, G. Macdonald, L. Shi, Y. Sun, A. Jurisicova, S. Varmuza, K. E. Latham, J. A. Flaws, J. C. Salter, H. Hara, M. A. Moskowitz, E. Li, A. Greenberg, J. L. Tilly, and J. Yuan. Defects in regulation of apoptosis in caspase-2-deficient mice. Genes Dev., 12:1304–1314, May 1998.

[280] M. D. Nguyen, R. C. Lariviere, and J. P. Julien. Deregulation of Cdk5 in a mouse model of ALS: toxicity alleviated by perikaryal neurofilament inclusions. Neuron, 30:135–147, Apr 2001.

[281] R. B. Hough, A. Lengeling, V. Bedian, C. Lo, and M. Bucan. Rump white inversion in the mouse disrupts dipeptidyl aminopeptidase-like protein 6 and causes dysregulation of Kit expression. Proc. Natl. Acad. Sci. U.S.A., 95:13800–13805, Nov 1998. bibliography 277

[282] L. Wilson, Y. H. Ching, M. Farias, S. A. Hartford, G. Howell, H. Shao, M. Bucan, and J. C. Schimenti. Random mutagenesis of proximal mouse chromosome 5 uncovers predominantly embryonic lethal mutations. Genome Res., 15:1095–1105, Aug 2005.

[283] J. T. Winston, P. Strack, P. Beer-Romero, C. Y. Chu, S. J. Elledge, and J. W. Harper. The SCFbeta-TRCP- ubiquitin ligase complex associates specifically with phosphorylated destruction motifs in IkappaBalpha and beta-catenin and stimulates IkappaBalpha ubiquitination in vitro. Genes Dev., 13:270–283, Feb 1999.

[284] T. Maniatis. A ubiquitin ligase complex essential for the NF-kappaB, Wnt/Wingless, and Hedgehog signaling pathways. Genes Dev., 13:505–510, Mar 1999.

[285] S. H. Loh, L. Francescut, P. Lingor, M. Bahr, and P. Nicotera. Identification of new kinase clusters required for neurite outgrowth and retraction by a loss-of-function RNA interference screen. Cell Death Differ., 15:283–298, Feb 2008.

[286] H. Z. Ring, V. Vameghi-Meyers, W. Wang, G. R. Crabtree, and U. Francke. Five SWI/SNF-related, matrix- associated, actin-dependent regulator of chromatin (SMARC) genes are dispersed in the human genome. Ge- nomics, 51:140–143, Jul 1998.

[287] D. F. Bogenhagen, D. Rousseau, and S. Burke. The layered structure of human mitochondrial DNA nucleoids. J. Biol. Chem., 283:3665–3675, Feb 2008.

[288] J. Huang, Z. Gong, G. Ghosal, and J. Chen. SOSS complexes participate in the maintenance of genomic stability. Mol. Cell, 35:384–393, Aug 2009.

[289] S. Poliak, L. Gollan, R. Martinez, A. Custer, S. Einheber, J. L. Salzer, J. S. Trimmer, P. Shrager, and E. Peles. Caspr2, a new member of the neurexin superfamily, is localized at the juxtaparanodes of myelinated axons and associates with K+ channels. Neuron, 24:1037–1047, Dec 1999.

[290] M. Rebhan, V. Chalifa-Caspi, J. Prilusky, and D. Lancet. GeneCards: integrating information about genes, proteins and diseases. Trends Genet., 13:163, Apr 1997.

[291] I. Yanai, H. Benjamin, M. Shmoish, V. Chalifa-Caspi, M. Shklar, R. Ophir, A. Bar-Even, S. Horn-Saban, M. Safran, E. Domany, D. Lancet, and O. Shmueli. Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics, 21:650–659, Mar 2005.

[292] O. Shmueli, S. Horn-Saban, V. Chalifa-Caspi, M. Shmoish, R. Ophir, H. Benjamin-Rodrig, M. Safran, E. Do- many, and D. Lancet. GeneNote: whole genome expression profiles in normal human tissues. C. R. Biol., 326: 1067–1072, 2003.

[293] C. Wu, C. Orozco, J. Boyer, M. Leglise, J. Goodale, S. Batalov, C. L. Hodge, J. Haase, J. Janes, J. W. Huss, and A. I. Su. BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources. Genome Biol., 10:R130, 2009.

[294] A. I. Su, T. Wiltshire, S. Batalov, H. Lapp, K. A. Ching, D. Block, J. Zhang, R. Soden, M. Hayakawa, G. Kreiman, M. P. Cooke, J. R. Walker, and J. B. Hogenesch. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl. Acad. Sci. U.S.A., 101:6062–6067, Apr 2004.

[295] F. Zhang, W. Gu, M. E. Hurles, and J. R. Lupski. Copy number variation in human health, disease, and evolution. Annu Rev Genomics Hum Genet, 10:451–481, 2009.

[296] S. A. McCarroll, F. G. Kuruvilla, J. M. Korn, S. Cawley, J. Nemesh, A. Wysoker, M. H. Shapero, P. I. de Bakker, J. B. Maller, A. Kirby, A. L. Elliott, M. Parkin, E. Hubbell, T. Webster, R. Mei, J. Veitch, P. J. Collins, R. Handsaker, S. Lincoln, M. Nizzari, J. Blume, K. W. Jones, R. Rava, M. J. Daly, S. B. Gabriel, and D. Altshuler. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat. Genet., 40:1166–1174, Oct 2008.

[297] C. Alkan, B. P. Coe, and E. E. Eichler. Genome structural variation discovery and genotyping. Nat. Rev. Genet., 12:363–376, May 2011.

[298] J. Sebat, B. Lakshmi, J. Troge, J. Alexander, J. Young, P. Lundin, S. Maner, H. Massa, M. Walker, M. Chi, N. Navin, R. Lucito, J. Healy, J. Hicks, K. Ye, A. Reiner, T. C. Gilliam, B. Trask, N. Patterson, A. Zetterberg, and M. Wigler. Large-scale copy number polymorphism in the human genome. Science, 305:525–528, Jul 2004.

[299] A. J. Iafrate, L. Feuk, M. N. Rivera, M. L. Listewnik, P. K. Donahoe, Y. Qi, S. W. Scherer, and C. Lee. Detection of large-scale variation in the human genome. Nat. Genet., 36:949–951, Sep 2004. bibliography 278

[300] R. Redon, S. Ishikawa, K. R. Fitch, L. Feuk, G. H. Perry, T. D. Andrews, H. Fiegler, M. H. Shapero, A. R. Carson, W. Chen, E. K. Cho, S. Dallaire, J. L. Freeman, J. R. Gonzalez, M. Gratacos, J. Huang, D. Kalaitzopoulos, D. Komura, J. R. MacDonald, C. R. Marshall, R. Mei, L. Montgomery, K. Nishimura, K. Okamura, F. Shen, M. J. Somerville, J. Tchinda, A. Valsesia, C. Woodwark, F. Yang, J. Zhang, T. Zerjal, J. Zhang, L. Armengol, D. F. Conrad, X. Estivill, C. Tyler-Smith, N. P. Carter, H. Aburatani, C. Lee, K. W. Jones, S. W. Scherer, and M. E. Hurles. Global variation in copy number in the human genome. Nature, 444:444–454, Nov 2006.

[301] J. M. Kidd, G. M. Cooper, W. F. Donahue, H. S. Hayden, N. Sampas, T. Graves, N. Hansen, B. Teague, C. Alkan, F. Antonacci, E. Haugen, T. Zerr, N. A. Yamada, P. Tsang, T. L. Newman, E. Tuzun, Z. Cheng, H. M. Ebling, N. Tusneem, R. David, W. Gillett, K. A. Phelps, M. Weaver, D. Saranga, A. Brand, W. Tao, E. Gustafson, K. McKernan, L. Chen, M. Malig, J. D. Smith, J. M. Korn, S. A. McCarroll, D. A. Altshuler, D. A. Peiffer, M. Dorschner, J. Stamatoyannopoulos, D. Schwartz, D. A. Nickerson, J. C. Mullikin, R. K. Wilson, L. Bruhn, M. V. Olson, R. Kaul, D. R. Smith, and E. E. Eichler. Mapping and sequencing of structural variation from eight human genomes. Nature, 453:56–64, May 2008.

[302] D. F. Conrad, D. Pinto, R. Redon, L. Feuk, O. Gokcumen, Y. Zhang, J. Aerts, T. D. Andrews, C. Barnes, P. Campbell, T. Fitzgerald, M. Hu, C. H. Ihm, K. Kristiansson, D. G. Macarthur, J. R. Macdonald, I. Onyiah, A. W. Pang, S. Robson, K. Stirrups, A. Valsesia, K. Walter, J. Wei, C. Tyler-Smith, N. P. Carter, C. Lee, S. W. Scherer, and M. E. Hurles. Origins and functional impact of copy number variation in the human genome. Nature, 464:704–712, Apr 2010.

[303] H. Park, J. I. Kim, Y. S. Ju, O. Gokcumen, R. E. Mills, S. Kim, S. Lee, D. Suh, D. Hong, H. P. Kang, Y. J. Yoo, J. Y. Shin, H. J. Kim, M. Yavartanoo, Y. W. Chang, J. S. Ha, W. Chong, G. R. Hwang, K. Darvishi, H. Kim, S. J. Yang, K. S. Yang, H. Kim, M. E. Hurles, S. W. Scherer, N. P. Carter, C. Tyler-Smith, C. Lee, and J. S. Seo. Discovery of common Asian copy number variants using integrated high-resolution array CGH and massively parallel DNA sequencing. Nat. Genet., 42:400–405, May 2010.

[304] R. E. Mills, K. Walter, C. Stewart, R. E. Handsaker, K. Chen, C. Alkan, A. Abyzov, S. C. Yoon, K. Ye, R. K. Cheetham, A. Chinwalla, D. F. Conrad, Y. Fu, F. Grubert, I. Hajirasouliha, F. Hormozdiari, L. M. Iakoucheva, Z. Iqbal, S. Kang, J. M. Kidd, M. K. Konkel, J. Korn, E. Khurana, D. Kural, H. Y. Lam, J. Leng, R. Li, Y. Li, C. Y. Lin, R. Luo, X. J. Mu, J. Nemesh, H. E. Peckham, T. Rausch, A. Scally, X. Shi, M. P. Stromberg, A. M. Stutz, A. E. Urban, J. A. Walker, J. Wu, Y. Zhang, Z. D. Zhang, M. A. Batzer, L. Ding, G. T. Marth, G. McVean, J. Sebat, M. Snyder, J. Wang, K. Ye, E. E. Eichler, M. B. Gerstein, M. E. Hurles, C. Lee, S. A. McCarroll, J. O. Korbel, and . 1000 Genomes Project. Mapping copy number variation by population-scale genome sequencing. Nature, 470:59–65, Feb 2011.

[305] J. A. Lee and J. R. Lupski. Genomic rearrangements and gene copy-number alterations as a cause of nervous system disorders. Neuron, 52:103–121, Oct 2006.

[306] J. Huang, X. Wu, G. Montenegro, J. Price, G. Wang, J. M. Vance, M. E. Shy, and S. Zuchner. Copy number variations are a rare cause of non-CMT1A Charcot-Marie-Tooth disease. J. Neurol., 257:735–741, May 2010.

[307] J. R. Lupski, R. M. de Oca-Luna, S. Slaugenhaupt, L. Pentao, V. Guzzetta, B. J. Trask, O. Saucedo-Cardenas, D. F. Barker, J. M. Killian, C. A. Garcia, A. Chakravarti, and P. I. Patel. DNA duplication associated with Charcot-Marie-Tooth disease type 1A. Cell, 66:219–232, Jul 1991.

[308] P. F. Chance, M. K. Alderson, K. A. Leppig, M. W. Lensch, N. Matsunami, B. Smith, P. D. Swanson, S. J. Odelberg, C. M. Disteche, and T. D. Bird. DNA deletion associated with hereditary neuropathy with liability to pressure palsies. Cell, 72:143–151, Jan 1993.

[309] Q. S. Padiath, K. Saigoh, R. Schiffmann, H. Asahara, T. Yamada, A. Koeppen, K. Hogan, L. J. Ptacek, and Y. H. Fu. Lamin B1 duplications cause autosomal dominant leukodystrophy. Nat. Genet., 38:1114–1123, Oct 2006.

[310] L. Feuk, A. R. Carson, and S. W. Scherer. Structural variation in the human genome. Nat. Rev. Genet., 7:85–97, Feb 2006.

[311] H. Matsuzaki, P. H. Wang, J. Hu, R. Rava, and G. K. Fu. High resolution discovery and confirmation of copy number variants in 90 Yoruba Nigerians. Genome Biol., 10:R125, 2009.

[312] D. P. Locke, A. J. Sharp, S. A. McCarroll, S. D. McGrath, T. L. Newman, Z. Cheng, S. Schwartz, D. G. Albert- son, D. Pinkel, D. M. Altshuler, and E. E. Eichler. Linkage disequilibrium and heritability of copy-number polymorphisms within duplicated regions of the human genome. Am. J. Hum. Genet., 79:275–290, Aug 2006.

[313] A. J. Sharp, D. P. Locke, S. D. McGrath, Z. Cheng, J. A. Bailey, R. U. Vallente, L. M. Pertz, R. A. Clark, S. Schwartz, R. Segraves, V. V. Oseroff, D. G. Albertson, D. Pinkel, and E. E. Eichler. Segmental duplications and copy-number variation in the human genome. Am. J. Hum. Genet., 77:78–88, Jul 2005. bibliography 279

[314] T. H. Shaikh, X. Gai, J. C. Perin, J. T. Glessner, H. Xie, K. Murphy, R. O’Hara, T. Casalunovo, L. K. Conlin, M. D’Arcy, E. C. Frackelton, E. A. Geiger, C. Haldeman-Englert, M. Imielinski, C. E. Kim, L. Medne, K. Anna- iah, J. P. Bradfield, E. Dabaghyan, A. Eckert, C. C. Onyiah, S. Ostapenko, F. G. Otieno, E. Santa, J. L. Shaner, R. Skraban, R. M. Smith, J. Elia, E. Goldmuntz, N. B. Spinner, E. H. Zackai, R. M. Chiavacci, R. Grundmeier, E. F. Rappaport, S. F. Grant, P. S. White, and H. Hakonarson. High-resolution mapping and analysis of copy number variations in the human genome: a data resource for clinical and research applications. Genome Res., 19:1682–1690, Sep 2009.

[315] B. Guthmundsson, E. Olafsson, F. Jakobsson, and P. Luthvigsson. Prevalence of symptomatic Charcot-Marie- Tooth disease in Iceland: a study of a well-defined population. Neuroepidemiology, 34:13–17, 2010.

[316] M. L. Metzker. Sequencing technologies - the next generation. Nat. Rev. Genet., 11:31–46, Jan 2010.

[317] J. Shendure and H. Ji. Next-generation DNA sequencing. Nat. Biotechnol., 26:1135–1145, Oct 2008.

[318] M. Margulies, M. Egholm, W. E. Altman, S. Attiya, J. S. Bader, L. A. Bemben, J. Berka, M. S. Braverman, Y. J. Chen, Z. Chen, S. B. Dewell, L. Du, J. M. Fierro, X. V. Gomes, B. C. Godwin, W. He, S. Helgesen, C. H. Ho, C. H. Ho, G. P. Irzyk, S. C. Jando, M. L. Alenquer, T. P. Jarvie, K. B. Jirage, J. B. Kim, J. R. Knight, J. R. Lanza, J. H. Leamon, S. M. Lefkowitz, M. Lei, J. Li, K. L. Lohman, H. Lu, V. B. Makhijani, K. E. McDade, M. P. McKenna, E. W. Myers, E. Nickerson, J. R. Nobile, R. Plant, B. P. Puc, M. T. Ronan, G. T. Roth, G. J. Sarkis, J. F. Simons, J. W. Simpson, M. Srinivasan, K. R. Tartaro, A. Tomasz, K. A. Vogt, G. A. Volkmer, S. H. Wang, Y. Wang, M. P. Weiner, P. Yu, R. F. Begley, and J. M. Rothberg. Genome sequencing in microfabricated high-density picolitre reactors. Nature, 437:376–380, Sep 2005.

[319] A. Valouev, J. Ichikawa, T. Tonthat, J. Stuart, S. Ranade, H. Peckham, K. Zeng, J. A. Malek, G. Costa, K. McKer- nan, A. Sidow, A. Fire, and S. M. Johnson. A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning. Genome Res., 18:1051–1063, Jul 2008.

[320] J. M. Rothberg, W. Hinz, T. M. Rearick, J. Schultz, W. Mileski, M. Davey, J. H. Leamon, K. Johnson, M. J. Milgrew, M. Edwards, J. Hoon, J. F. Simons, D. Marran, J. W. Myers, J. F. Davidson, A. Branting, J. R. Nobile, B. P. Puc, D. Light, T. A. Clark, M. Huber, J. T. Branciforte, I. B. Stoner, S. E. Cawley, M. Lyons, Y. Fu, N. Homer, M. Sedova, X. Miao, B. Reed, J. Sabina, E. Feierstein, M. Schorn, M. Alanjary, E. Dimalanta, D. Dressman, R. Kasinskas, T. Sokolsky, J. A. Fidanza, E. Namsaraev, K. J. McKernan, A. Williams, G. T. Roth, and J. Bustillo. An integrated semiconductor device enabling non-optical genome sequencing. Nature, 475:348–352, Jul 2011.

[321] J. Korlach, K. P. Bjornson, B. P. Chaudhuri, R. L. Cicero, B. A. Flusberg, J. J. Gray, D. Holden, R. Saxena, J. Wegener, and S. W. Turner. Real-time DNA sequencing from single polymerase molecules. Meth. Enzymol., 472:431–455, 2010.

[322] M. Ronaghi, S. Karamohamed, B. Pettersson, M. Uhlen, and P. Nyren. Real-time DNA sequencing using detection of pyrophosphate release. Anal. Biochem., 242:84–89, Nov 1996.

[323] M. Ronaghi, M. Uhlen, and P. Nyren. A sequencing method based on real-time pyrophosphate. Science, 281: 363, 365, Jul 1998.

[324] P. D. Stenson, M. Mort, E. V. Ball, K. Howells, A. D. Phillips, N. S. Thomas, and D. N. Cooper. The Human Gene Mutation Database: 2008 update. Genome Med, 1:13, 2009.

[325] J. Guo, N. Xu, Z. Li, S. Zhang, J. Wu, D. H. Kim, M. Sano Marma, Q. Meng, H. Cao, X. Li, S. Shi, L. Yu, S. Kalachikov, J. J. Russo, N. J. Turro, and J. Ju. Four-color DNA sequencing with 3’-O-modified nucleotide reversible terminators and chemically cleavable fluorescent dideoxynucleotides. Proc. Natl. Acad. Sci. U.S.A., 105:9145–9150, Jul 2008.

[326] J. Bowers, J. Mitchell, E. Beer, P. R. Buzby, M. Causey, J. W. Efcavitch, M. Jarosz, E. Krzymanska-Olejnik, L. Kung, D. Lipson, G. M. Lowman, S. Marappan, P. McInerney, A. Platt, A. Roy, S. M. Siddiqi, K. Steinmann, and J. F. Thompson. Virtual terminator nucleotides for next-generation DNA sequencing. Nat. Methods, 6: 593–595, Aug 2009.

[327] M. J. Fullwood, C. L. Wei, E. T. Liu, and Y. Ruan. Next-generation DNA sequencing of paired-end tags (PET) for transcriptome and genome analyses. Genome Res., 19:521–532, Apr 2009.

[328] C. Trapnell and S. L. Salzberg. How to map billions of short reads onto genomes. Nat. Biotechnol., 27:455–457, May 2009.

[329] M. J. Chaisson, D. Brinza, and P. A. Pevzner. De novo fragment assembly with short mate-paired reads: Does the read length matter? Genome Res., 19:336–346, Feb 2009.

[330] D. R. Bentley. Whole-genome re-sequencing. Curr. Opin. Genet. Dev., 16:545–552, Dec 2006. bibliography 280

[331] E. H. Turner, S. B. Ng, D. A. Nickerson, and J. Shendure. Methods for genomic partitioning. Annu Rev Genomics Hum Genet, 10:263–284, 2009.

[332] E. Hodges, Z. Xuan, V. Balija, M. Kramer, M. N. Molla, S. W. Smith, C. M. Middle, M. J. Rodesch, T. J. Albert, G. J. Hannon, and W. R. McCombie. Genome-wide in situ exon capture for selective resequencing. Nat. Genet., 39:1522–1527, Dec 2007.

[333] M. A. Cleary, K. Kilian, Y. Wang, J. Bradshaw, G. Cavet, W. Ge, A. Kulkarni, P. J. Paddison, K. Chang, N. Sheth, E. Leproust, E. M. Coffey, J. Burchard, W. R. McCombie, P. Linsley, and G. J. Hannon. Production of complex nucleic acid libraries using highly parallel in situ oligonucleotide synthesis. Nat. Methods, 1:241–248, Dec 2004.

[334] S. B. Ng, E. H. Turner, P. D. Robertson, S. D. Flygare, A. W. Bigham, C. Lee, T. Shaffer, M. Wong, A. Bhattachar- jee, E. E. Eichler, M. Bamshad, D. A. Nickerson, and J. Shendure. Targeted capture and massively parallel sequencing of 12 human exomes. Nature, 461:272–276, Sep 2009.

[335] C. Gilissen, A. Hoischen, H. G. Brunner, and J. A. Veltman. Unlocking Mendelian disease using exome sequencing. Genome Biol, 12:228, Sep 2011.

[336] J. R. Lupski, J. G. Reid, C. Gonzaga-Jauregui, D. Rio Deiros, D. C. Chen, L. Nazareth, M. Bainbridge, H. Dinh, C. Jing, D. A. Wheeler, A. L. McGuire, F. Zhang, P. Stankiewicz, J. J. Halperin, C. Yang, C. Gehman, D. Guo, R. K. Irikat, W. Tom, N. J. Fantin, D. M. Muzny, and R. A. Gibbs. Whole-genome sequencing in a patient with Charcot-Marie-Tooth neuropathy. N. Engl. J. Med., 362:1181–1191, Apr 2010.

[337] P. C. Ng, S. Levy, J. Huang, T. B. Stockwell, B. P. Walenz, K. Li, N. Axelrod, D. A. Busam, R. L. Strausberg, and J. C. Venter. Genetic variation in an individual human exome. PLoS Genet., 4:e1000160, 2008.

[338] S. B. Ng, K. J. Buckingham, C. Lee, A. W. Bigham, H. K. Tabor, K. M. Dent, C. D. Huff, P. T. Shannon, E. W. Jabs, D. A. Nickerson, J. Shendure, and M. J. Bamshad. Exome sequencing identifies the cause of a mendelian disorder. Nat. Genet., 42:30–35, Jan 2010.

[339] J. L. Wang, X. Yang, K. Xia, Z. M. Hu, L. Weng, X. Jin, H. Jiang, P. Zhang, L. Shen, J. F. Guo, N. Li, Y. R. Li, L. F. Lei, J. Zhou, J. Du, Y. F. Zhou, Q. Pan, J. Wang, J. Wang, R. Q. Li, and B. S. Tang. TGM6 identified as a novel causative gene of spinocerebellar ataxias using exome sequencing. Brain, 133:3510–3518, Dec 2010.

[340] R. Li, Y. Li, K. Kristiansen, and J. Wang. SOAP: short oligonucleotide alignment program. Bioinformatics, 24: 713–714, Mar 2008.

[341] D. Blankenberg, G. Von Kuster, N. Coraor, G. Ananda, R. Lazarus, M. Mangan, A. Nekrutenko, and J. Taylor. Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol, Chapter 19:1–21, Jan 2010.

[342] J. Goecks, A. Nekrutenko, J. Taylor, E. Afgan, G. Ananda, D. Baker, D. Blankenberg, R. Chakrabarty, N. Coraor, J. Goecks, G. Von Kuster, R. Lazarus, K. Li, A. Nekrutenko, J. Taylor, and K. Vincent. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol., 11:R86, 2010.

[343] P. C. Ng and S. Henikoff. Predicting the effects of amino acid substitutions on protein function. Annu Rev Genomics Hum Genet, 7:61–80, 2006.

[344] P. D. Thomas, M. J. Campbell, A. Kejariwal, H. Mi, B. Karlak, R. Daverman, K. Diemer, A. Muruganujan, and A. Narechania. PANTHER: a library of protein families and subfamilies indexed by function. Genome Res., 13: 2129–2141, Sep 2003.

[345] Y. Bromberg, G. Yachdav, and B. Rost. SNAP predicts effect of mutations on protein function. Bioinformatics, 24:2397–2398, Oct 2008.

[346] V. Ramensky, P. Bork, and S. Sunyaev. Human non-synonymous SNPs: server and survey. Nucleic Acids Res., 30:3894–3900, Sep 2002.

[347] S. Sunyaev, V. Ramensky, I. Koch, W. Lathe, A. S. Kondrashov, and P. Bork. Prediction of deleterious human alleles. Hum. Mol. Genet., 10:591–597, Mar 2001.

[348] J. A. Bailey, Z. Gu, R. A. Clark, K. Reinert, R. V. Samonte, S. Schwartz, M. D. Adams, E. W. Myers, P. W. Li, and E. E. Eichler. Recent segmental duplications in the human genome. Science, 297:1003–1007, Aug 2002.

[349] J. A. Bailey, A. M. Yavor, H. F. Massa, B. J. Trask, and E. E. Eichler. Segmental duplications: organization and impact within the current human genome project assembly. Genome Res., 11:1005–1017, Jun 2001. bibliography 281

[350] A. Kozik, E. Kochetkova, and R. Michelmore. GenomePixelizer–a visualization program for comparative genomics within and between species. Bioinformatics, 18:335–336, Feb 2002.

[351] S. Griffiths-Jones, H. K. Saini, S. van Dongen, and A. J. Enright. miRBase: tools for microRNA genomics. Nucleic Acids Res., 36:D154–158, Jan 2008.

[352] D. J. Hedges, D. Hedges, D. Burges, E. Powell, C. Almonte, J. Huang, S. Young, B. Boese, M. Schmidt, M. A. Pericak-Vance, E. Martin, X. Zhang, T. T. Harkins, and S. Zuchner. Exome sequencing of a multigenerational human pedigree. PLoS ONE, 4:e8232, 2009.

[353] O. Harismendy, P. C. Ng, R. L. Strausberg, X. Wang, T. B. Stockwell, K. Y. Beeson, N. J. Schork, S. S. Murray, E. J. Topol, S. Levy, and K. A. Frazer. Evaluation of next generation sequencing platforms for population targeted sequencing studies. Genome Biol., 10:R32, 2009.

[354] N. F. Asan, Y. Xu, H. Jiang, C. Tyler-Smith, Y. Xue, T. Jiang, J. Wang, M. Wu, X. Liu, G. Tian, J. Wang, J. Wang, H. Yang, and X. Zhang. Comprehensive comparison of three commercial human whole-exome capture plat- forms. Genome Biol, 12:R95, Sep 2011.

[355] L. Cartegni, S. L. Chew, and A. R. Krainer. Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat. Rev. Genet., 3:285–298, Apr 2002.

[356] M. J. Bamshad, S. B. Ng, A. W. Bigham, H. K. Tabor, M. J. Emond, D. A. Nickerson, and J. Shendure. Exome sequencing as a tool for Mendelian disease gene discovery. Nat. Rev. Genet., 12:745–755, 2011.

[357] Megan Hwa. Brewer. Molecular genetics of X-linked Charcot-Marie-Tooth neuropathy (CMTX3). Master’s thesis, Faculty of Medicine, University of Sydney, 2011.

[358] H. Li and R. Durbin. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25:1754–1760, Jul 2009.

[359] H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, and R. Durbin. The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25:2078–2079, Aug 2009.

[360] B. Giardine, C. Riemer, R. C. Hardison, R. Burhans, L. Elnitski, P. Shah, Y. Zhang, D. Blankenberg, I. Albert, J. Taylor, W. Miller, W. J. Kent, and A. Nekrutenko. Galaxy: a platform for interactive large-scale genome analysis. Genome Res., 15:1451–1455, Oct 2005.

[361] J. T. Robinson, H. Thorvaldsdottir, W. Winckler, M. Guttman, E. S. Lander, G. Getz, and J. P. Mesirov. Integra- tive genomics viewer. Nat. Biotechnol., 29:24–26, Jan 2011.

[362] K. Verhoeven, K. G. Claeys, S. Zuchner, J. M. Schroder, J. Weis, C. Ceuterick, A. Jordanova, E. Nelis, E. De Vriendt, M. Van Hul, P. Seeman, R. Mazanec, G. M. Saifi, K. Szigeti, P. Mancias, I. J. Butler, A. Kochanski, B. Ryniewicz, J. De Bleecker, P. Van den Bergh, C. Verellen, R. Van Coster, N. Goemans, M. Auer-Grumbach, W. Robberecht, V. Milic Rasic, Y. Nevo, I. Tournev, V. Guergueltcheva, F. Roelens, P. Vieregge, P. Vinci, M. T. Moreno, H. J. Christen, M. E. Shy, J. R. Lupski, J. M. Vance, P. De Jonghe, and V. Timmerman. MFN2 mutation distribution and genotype/phenotype correlation in Charcot-Marie-Tooth type 2. Brain, 129:2093–2102, Aug 2006.

[363] L. Bartoloni, J. L. Blouin, Y. Pan, C. Gehrig, A. K. Maiti, N. Scamuffa, C. Rossier, M. Jorissen, M. Armengot, M. Meeks, H. M. Mitchison, E. M. Chung, C. D. Delozier-Blanchet, W. J. Craigen, and S. E. Antonarakis. Mutations in the DNAH11 (axonemal heavy chain dynein type 11) gene cause one form of situs inversus totalis and most likely primary ciliary dyskinesia. Proc. Natl. Acad. Sci. U.S.A., 99:10282–10286, Aug 2002.

[364] G. C. Schwabe, K. Hoffmann, N. T. Loges, D. Birker, C. Rossier, M. M. de Santi, H. Olbrich, M. Fliegauf, M. Failly, U. Liebers, M. Collura, G. Gaedicke, S. Mundlos, U. Wahn, J. L. Blouin, B. Niggemann, H. Omran, S. E. Antonarakis, and L. Bartoloni. Primary ciliary dyskinesia associated with normal axoneme ultrastructure is caused by DNAH11 mutations. Hum. Mutat., 29:289–298, Feb 2008.

[365] G. T. Banks and E. M. Fisher. Cytoplasmic dynein could be key to understanding neurodegeneration. Genome Biol., 9:214, 2008.

[366] J. Eschbach and L. Dupuis. Cytoplasmic dynein in neurodegeneration. Pharmacol. Ther., 130:348–363, Jun 2011.

[367] Q. Wu and T. Maniatis. A striking organization of a large family of human neural cadherin-like cell adhesion genes. Cell, 97:779–790, Jun 1999.

[368] X. Wang, J. A. Weiner, S. Levi, A. M. Craig, A. Bradley, and J. R. Sanes. Gamma protocadherins are required for survival of spinal interneurons. Neuron, 36:843–854, Dec 2002. bibliography 282

[369] C. Zou, W. Huang, G. Ying, and Q. Wu. Sequence analysis and expression mapping of the rat clustered protocadherin gene repertoires. Neuroscience, 144:579–603, Jan 2007.

[370] B. Tasic, C. E. Nabholz, K. K. Baldwin, Y. Kim, E. H. Rueckert, S. A. Ribich, P. Cramer, Q. Wu, R. Axel, and T. Maniatis. Promoter choice determines splice site selection in protocadherin alpha and gamma pre-mRNA splicing. Mol. Cell, 10:21–33, Jul 2002.

[371] X. Wang, H. Su, and A. Bradley. Molecular mechanisms governing Pcdh-gamma gene expression: evidence for a multiple promoter and cis-alternative splicing model. Genes Dev., 16:1890–1905, Aug 2002.

[372] Q. Wu, T. Zhang, J. F. Cheng, Y. Kim, J. Grimwood, J. Schmutz, M. Dickson, J. P. Noonan, M. Q. Zhang, R. M. Myers, and T. Maniatis. Comparative DNA sequence analysis of mouse and human protocadherin gene clusters. Genome Res., 11:389–404, Mar 2001.

[373] J. A. Weiner, X. Wang, J. C. Tapia, and J. R. Sanes. Gamma protocadherins are required for synaptic develop- ment in the spinal cord. Proc. Natl. Acad. Sci. U.S.A., 102:8–14, Jan 2005.

[374] E. LeGuern, A. Guilbot, M. Kessali, N. Ravise, J. Tassin, T. Maisonobe, D. Grid, and A. Brice. Homozygosity mapping of an autosomal recessive form of demyelinating Charcot-Marie-Tooth disease to chromosome 5q23- q33. Hum. Mol. Genet., 5:1685–1688, Oct 1996.

[375] H. Azzedine, N. Ravise, C. Verny, A. Gabreels-Festen, M. Lammens, D. Grid, J. M. Vallat, G. Durosier, J. Senderek, S. Nouioua, T. Hamadouche, A. Bouhouche, A. Guilbot, C. Stendel, M. Ruberg, A. Brice, N. Birouk, O. Dubourg, M. Tazir, and E. LeGuern. Spine deformities in Charcot-Marie-Tooth 4C caused by SH3TC2 gene mutations. Neurology, 67:602–606, Aug 2006.

[376] M. Magrane and U. Consortium. UniProt Knowledgebase: a hub of integrated protein data. Database (Oxford), 2011:bar009, 2011.

[377] M. J. Clark, R. Chen, H. Y. Lam, K. J. Karczewski, R. Chen, G. Euskirchen, A. J. Butte, and M. Snyder. Perfor- mance comparison of exome DNA sequencing technologies. Nat. Biotechnol., 29:908–914, 2011.

[378] M. Dejesus-Hernandez, I. R. Mackenzie, B. F. Boeve, A. L. Boxer, M. Baker, N. J. Rutherford, A. M. Nicholson, N. A. Finch, H. Flynn, J. Adamson, N. Kouri, A. Wojtas, P. Sengdy, G. Y. Hsiung, A. Karydas, W. W. Seeley, K. A. Josephs, G. Coppola, D. H. Geschwind, Z. K. Wszolek, H. Feldman, D. S. Knopman, R. C. Petersen, B. L. Miller, D. W. Dickson, K. B. Boylan, N. R. Graff-Radford, and R. Rademakers. Expanded GGGGCC Hexanucleotide Repeat in Noncoding Region of C9ORF72 Causes Chromosome 9p-Linked FTD and ALS. Neuron, 72:245–256, Oct 2011.

[379] A. E. Renton, E. Majounie, A. Waite, J. Simon-Sanchez, S. Rollinson, J. R. Gibbs, J. C. Schymick, H. Laaksovirta, J. C. van Swieten, L. Myllykangas, H. Kalimo, A. Paetau, Y. Abramzon, A. M. Remes, A. Kaganovich, S. W. Scholz, J. Duckworth, J. Ding, D. W. Harmer, D. G. Hernandez, J. O. Johnson, K. Mok, M. Ryten, D. Trabzuni, R. J. Guerreiro, R. W. Orrell, J. Neal, A. Murray, J. Pearson, I. E. Jansen, D. Sondervan, H. Seelaar, D. Blake, K. Young, N. Halliwell, J. B. Callister, G. Toulson, A. Richardson, A. Gerhard, J. Snowden, D. Mann, D. Neary, M. A. Nalls, T. Peuralinna, L. Jansson, V. M. Isoviita, A. L. Kaivorinne, M. Holtta-Vuori, E. Ikonen, R. Sulkava, M. Benatar, J. Wuu, A. Chio, G. Restagno, G. Borghero, M. Sabatelli, D. Heckerman, E. Rogaeva, L. Zinman, J. D. Rothstein, M. Sendtner, C. Drepper, E. E. Eichler, C. Alkan, Z. Abdullaev, S. D. Pack, A. Dutra, E. Pak, J. Hardy, A. Singleton, N. M. Williams, P. Heutink, S. Pickering-Brown, H. R. Morris, P. J. Tienari, and B. J. Traynor. A Hexanucleotide Repeat Expansion in C9ORF72 Is the Cause of Chromosome 9p21-Linked ALS- FTD. Neuron, 72:257–268, Oct 2011.

[380] H. Daoud, P. N. Valdmanis, P. A. Dion, and G. A. Rouleau. Analysis of DPP6 and FGGY as candidate genes for amyotrophic lateral sclerosis. Amyotroph Lateral Scler, 11:389–391, Aug 2010.