Positional cloning of the gene mutated in Hereditary Motor and Sensory Neuropathy-Russe (HMSNR)

Janina Hantke

This thesis is presented for the degree of Doctor of Philosophy at The University of Western Australia

Western Australian Institute for Medical Research and Centre for Medical Research, School of Medicine and Pharmacology, 2004

DECLARATION OF CONTENT

To the best of my knowledge and belief, I certify that this thesis does not incorporate any material previously submitted for a degree or diploma in any other institution of higher education. For any work in this thesis that has been co-published with other authors, I have the permission of all co-authors to include this work in my thesis. The thesis complies with the guidelines of the University of Western Australia and the Faculty of Medicine and Dentistry.

Prof Luba Kalaydjieva (Principal supervisor)

Dr David Chandler (Co-supervisor)

Janina Hantke (PhD candidate)

ABSTRACT

Hereditary Motor and Sensory Neuropathy-Russe (HMSNR) is a rare recessive form of Charcot-Marie-Tooth disease (CMT) that has been identified in the European Gypsy (Roma) population. Clinically, HMSNR presents with typical CMT symptoms, and no additional associated features have been detected. A distinctive neuropathological feature of HMSNR is the presence of numerous clusters of thinly myelinated fibres that originate from regenerative activity. HMSNR was previously mapped to chromosome 10q in a large Bulgarian Gypsy kindred. The subsequent identification of related chromosome 10q haplotypes in Spanish and Romanian Gypsy families suggested that HMSNR is caused by a founder mutation in the Gypsy population. This thesis describes the refined mapping of the HMSNR gene. A high-density physical-genetic map of the HMSNR region was generated, containing 20 microsatellite markers and 229 SNPs and insertions/deletions. This map allowed meticulous mapping of recombination breakpoints and reduced the HMSNR gene region from 1 Mb to just 63.8 kb. The positional candidates analysed by direct sequencing included 14 known genes, 7 predicted genes and 42 expressed sequence tags (ESTs) that do not overlap with these genes. Seventy-eight putative HMSNR mutations were identified, two of which segregate completely with the HMSNR phenotype. Both are located in the so-called testis-specific part of an unexpected candidate gene, hexokinase 1 (HK1): one lies in a rare alternative untranslated 5’ exon of HK1 and the other in the adjacent downstream intron. Expression analysis of transcripts containing the alternative exon suggests that the exon is not confined to testis but may also be expressed in the nervous system. How a gene that functions in the fundamental process of energy generation might be involved in a neuropathy remains a matter of speculation. Further investigations are likely to expand our knowledge of the importance of HK1 in the peripheral nervous system and may reveal new roles for HK1.


TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
INDEX OF TABLES
INDEX OF FIGURES
ABBREVIATIONS
ACKNOWLEDGEMENTS
1 GENERAL INTRODUCTION AND REVIEW OF THE LITERATURE
1.1 THE USE OF GENETIC ISOLATES FOR POSITIONAL CLONING
1.1.1 Advantages of genetic isolates
1.1.2 The Gypsies as a founder population
1.2 THE BIOLOGY OF THE PERIPHERAL NERVOUS SYSTEM (PNS)
1.2.1 Organisation of the PNS
1.2.2 Cell types of the PNS
1.2.3 Myelinated axons in the PNS
1.2.3.1 Morphology and function of myelinated axons in the PNS
1.2.3.2 PNS myelin composition
1.2.3.3 The process of myelination
1.2.4 Pathology of the PNS
1.2.4.1 Neuropathies – disorders of the PNS
1.2.4.2 Diseased versus normal – diagnostic tools for determination of PNS function
1.3 CHARCOT-MARIE-TOOTH DISEASE (CMT)
1.3.1 History
1.3.2 Prevalence
1.3.3 Clinical classification
1.3.4 Clinical aspects
1.3.4.1 Clinical aspects of the dominant forms
1.3.4.2 Clinical aspects of the recessive forms
1.3.5 Genetics
1.3.5.1 Charcot-Marie-Tooth disease – demyelinating forms (CMT 1, 4 and X)
1.3.5.2 Charcot-Marie-Tooth disease – axonal forms (CMT 2 and X)
1.3.5.3 Conclusion
1.4 HEREDITARY MOTOR AND SENSORY NEUROPATHY-RUSSE (HMSNR)
1.4.1 Clinical aspects
1.4.2 Genetics
1.4.2.1 Genome scan
1.4.2.2 Refined mapping
1.5 THE AIMS OF THIS PHD PROJECT
2 MATERIALS AND METHODS
2.1 MATERIALS
2.1.1 DNA samples
2.1.2 Tissue samples
2.1.3 Recipes for solutions
2.1.3.1 Media for cultivation of E. coli
2.1.3.2 Media for cell culture
2.1.3.3 Solutions for agarose gel electrophoresis
2.1.3.4 Solutions for polyacrylamide gel electrophoresis
2.1.3.5 Solutions for cDNA library screen [155]
2.1.3.6 Wash solutions (Northern blot and cDNA library screen)
2.1.3.7 Solutions for immunohistochemistry
2.2 METHODS
2.2.1 Culture methods
2.2.1.1 Preparation of chemically competent cells
2.2.1.2 Isolation and culture of Schwann cells and DRGs
2.2.2 Molecular biological methods
2.2.2.1 Polymerase chain reaction (PCR)
2.2.2.2 Genotyping
2.2.2.3 Agarose gel electrophoresis
2.2.2.4 PCR and plasmid purification
2.2.2.5 DNA sequencing
2.2.2.6 Typing of sequence variants
2.2.2.7 Cloning of PCR fragments
2.2.2.8 Isolation of total RNA
2.2.2.9 RT-PCR
2.2.2.10 Northern Blot
2.2.2.11 cDNA library screen
2.2.2.12 Immunohistochemistry
2.2.3 Data analysis and Internet resources
2.2.3.1 Identification of the critical HMSNR region in online maps and identification of positional candidates
2.2.3.2 Interpretation of sequence data
2.2.3.3 Other Internet resources
3 REFINED MAPPING OF THE CRITICAL HMSNR GENE REGION
3.1 REFINED MAPPING AS A PART OF THE POSITIONAL CLONING PROCESS
3.1.1 Linkage mapping of Mendelian disease loci
3.1.2 General strategy of the refined mapping
3.1.3 Refined mapping in isolated founder populations
3.1.3.1 Homozygosity mapping
3.1.3.2 Mapping in young founder populations
3.1.3.3 Mapping in old founder populations
3.1.3.4 Mapping in the Gypsies
3.2 APPROACH TO THE REFINED MAPPING OF THE HMSNR GENE
3.2.1 Foundations of the refined mapping of HMSNR (work prior to this PhD project)
3.2.2 Strategy of the refined mapping of the HMSNR gene
3.2.3 The use of databases to implement the strategy
3.3 RESULTS OF THE REFINED MAPPING OF THE HMSNR GENE REGION
3.3.1 Integration of genetic data with a changing physical map
3.3.2 Polymorphic markers used for the construction of a high density genetic map of the HMSNR region
3.3.2.1 Identification of polymorphisms
3.3.2.2 Microsatellites
3.3.2.3 Insertion/deletions (indels) and Single Nucleotide Polymorphisms (SNPs)
3.3.3 Mapping of the recombination breakpoints
3.3.3.1 The Spanish Gypsy families – no further recombination
3.3.3.2 The large Bulgarian Gypsy kindred – historical recombination on the telomeric side
3.3.3.3 The Romanian Gypsy families – centromeric and telomeric recombination
3.3.3.4 The small Bulgarian Gypsy family – recent centromeric recombination
3.3.3.5 Summary of the recombination mapping
3.4 SUMMARY AND DISCUSSION OF THE REFINED MAPPING
4 ANALYSIS OF POSITIONAL CANDIDATES – IDENTIFICATION OF PUTATIVE HMSNR MUTATIONS
4.1 STRATEGIC CONSIDERATIONS
4.1.1 Classification of heritable disease-causing changes
4.1.2 Looking for the HMSNR mutation
4.1.3 Exclusion criteria for putative mutations
4.1.3.1 The exclusion criteria discussed in the literature
4.1.3.2 Application of the exclusion criteria to the HMSNR project
4.2 ANALYSIS OF POSITIONAL CANDIDATES BY DIRECT SEQUENCING
4.2.1 Positional candidate genes (stage 1)
4.2.1.1 Direct sequencing of the exons of the known genes
4.2.1.2 Direct sequencing of the predicted promoters of the known genes
4.2.1.3 Direct sequencing of predicted genes
4.2.1.4 Summary of the sequencing of positional candidate genes (stage 1)
4.2.2 Positional candidate ESTs (stage 2)
4.2.3 Sequencing the final region of homozygosity of 63.8 kb (stage 3)
4.2.4 Summary of the mutation analysis
4.3 POPULATION SCREENING OF THE TWO PUTATIVE HMSNR MUTATIONS
4.3.1 Aim of the mutation screening
4.3.2 Introduction to Hexokinase 1
4.3.3 Typing the two putative HMSNR mutations
4.3.4 Screening individuals of Gypsy ethnicity
4.3.4.1 Population screen
4.3.4.2 Screening of Gypsy families with unclassified CMTs
4.3.5 Screening individuals of non-Gypsy ethnicity
4.3.5.1 Population screen
4.3.5.2 Screening of non-Gypsy families with unclassified CMT
4.4 SCREENING OF HEXOKINASE-1 IN A SPANISH PATIENT OF NON-GYPSY ETHNICITY
4.4.1 Clinical features of the Spanish patient
4.4.2 Genetic investigation of the Spanish patient
4.4.3 Conclusions of screening HK1 for mutations in the Spanish patient
4.5 SUMMARY AND DISCUSSION OF THE MUTATION SCREEN
5 GATHERING EVIDENCE FOR THE INVOLVEMENT OF HK1 IN HMSNR
5.1 HEXOKINASES – AN OVERVIEW
5.1.1 The four hexokinase isoenzymes in mammalia
5.1.1.1 Evolutionary conservation
5.1.1.2 Gene structure and of the four human isoenzymes
5.1.2 Functions of hexokinases
5.1.3 Hexokinase pathologies and associated phenotypes
5.1.3.1 Hexokinase pathologies in humans
5.1.3.2 Mouse models of hexokinase pathologies
5.1.4 Hexokinase 1 in the peripheral nervous system
5.2 STRATEGIC CONSIDERATIONS
5.2.1 The HMSNR mutation
5.2.2 Potential effects of the two putative HMSNR mutations
5.2.2.1 Possible effects of the two putative mutations on HK1
5.2.2.2 Long range effects in cis
5.2.3 Strategy for the investigation
5.3 RESULTS
5.3.1 Results of the investigation of HK1 and the two putative HMSNR mutations
5.3.1.1 Computational analysis of the two putative mutations
5.3.1.2 Transcriptional analysis
5.3.1.3 Hexokinase activity in cultured Schwann cells
5.3.1.4 Immunohistochemistry
5.3.2 Results of examining the possible involvement of a new gene in HMSNR
5.3.3 Results of examining the possible involvement of FLJ31406 or FLJ22761 in HMSNR
5.3.3.1 FLJ31406
5.3.3.2 FLJ22761
5.4 SUMMARY AND DISCUSSION
6 GENERAL DISCUSSION AND CONCLUSION
6.1 REFINED MAPPING OF THE HMSNR GENE REGION
6.2 MUTATION ANALYSIS
6.3 HK1 AS THE POSSIBLE HMSNR GENE
6.3.1 The transport theme in CMT
6.3.2 The apoptosis theme in CMT
6.3.3 Possible mutational mechanisms
6.4 FUTURE DIRECTIONS
REFERENCES
APPENDICES
APPENDIX A: LIST OF POLYMORPHIC VARIANTS
APPENDIX B: LIST OF PRIMERS
APPENDIX C: PUBLICATIONS


INDEX OF TABLES

Table 1: Summary of lipids in PNS myelin (adapted from [11, 12])
Table 2: Summary of proteins in PNS myelin (adapted from [12])
Table 3: Normal values for nerve conduction velocities in the median nerve (adapted from [14])
Table 4: Prevalence of CMT (table modified from [43])
Table 5: HMSN classification of Charcot-Marie-Tooth disease [59]
Table 6: Overview of Charcot-Marie-Tooth disease: genes and loci *
Table 7: Overview of the HMSNR families
Table 8: DNA ladders used for agarose gel electrophoresis
Table 9: Internet addresses of databases that provide physical maps of the
Table 10: Overview of the microsatellites on chromosome 10q used in this study, ordered from centromere to telomere
Table 11: Frequencies of the different types of base substitutions for the SNPs identified in this PhD project
Table 12: Distribution of the 229 polymorphisms identified in the HMSNR region between D10S2480 and D10S560
Table 13: Density of SNPs and insertion/deletions identified in this PhD project in intra- and intergenic regions
Table 14: Comparison of the functional knowledge in August 2001 and to date about the 14 known genes in the interval between bA86K9CA1 and D10S560, in order from centromere to telomere (source: NCBI Locus Link [231])
Table 15: Putative mutations detected while sequencing the known genes and their position within the gene
Table 16: Putative mutations detected while sequencing the known genes and their exclusion
Table 17: Amount of sequencing performed for each gene including promoter predictions (all in kb)
Table 18: Names and accession numbers of the predicted genes from the NCBI database that were sequenced in the critical region
Table 19: Putative mutations identified while sequencing the predicted genes
Table 20: Summary of the putative mutations identified by sequencing analysis
Table 21: Location of the five putative mutations identified during the sequencing of the ESTs in the 110 kb and details on their exclusion
Table 22: Overview of the exclusion of the newly identified mutations during the sequencing of the remaining introns and intergenic regions in the 63.8 kb
Table 23: Overview of the findings of the mutation analysis
Table 24: Changes detected in the Spanish patient while sequencing the exons of hexokinase 1
Table 25: Comparison of the exon sizes of several mammalian hexokinases (from [245])
Table 26: Overview of the human hexokinases
Table 27: Kinetic and regulatory properties of the four human hexokinases (modified from [253] and [252])
Table 28: HK1 ESTs originating from the peripheral nervous system
Table 29: Transcripts inspected for alternative translation initiation
Table 30: Result of the cDNA library screen


INDEX OF FIGURES

Figure 1: Representative drawing of a vertebrate neuron (figure reproduced from [10])
Figure 2: Cross-section of three fascicles of a normal peripheral nerve (figure adapted from [9])
Figure 3: Diagram showing a myelinated axon of the PNS (figure reproduced from [12])
Figure 4: Schematic drawing of the two hypotheses of myelin formation (reproduced from [12])
Figure 5: The development of a neural crest cell into a myelinating or non-myelinating Schwann cell [34]
Figure 6: Schematic representation of the three main reactions of peripheral nerve to disease (figure taken from [9])
Figure 7: Large Bulgarian pedigree in which HMSNR and HMSNL segregate independently (figure adapted from [5])
Figure 8: Picture of a 24-year-old Spanish HMSNR patient with distal atrophy of the limbs
Figure 9: Electron micrograph of a transverse section through an HMSNR sural nerve [152]
Figure 10: Light micrograph of a transverse section through an HMSNR sural nerve [152]
Figure 11: Chromosome 10q haplotype data from the initial genome scan and the refined mapping (data adapted from [5, 6])
Figure 12: Pedigree of the large Romanian Gypsy family ROM-1
Figure 13: Physical map of the interval between bA86K9CA1 and D10S1742 in April 2001 and shared HMSNR haplotype
Figure 14: Changes in the physical map of the critical HMSNR region between bA86K9CA1 and D10S1742
Figure 15: Schematic drawing of the markers included in the haplotype analysis, the detection of new variants and how these relate to the critical HMSNR region at the beginning of this PhD project
Figure 16: Distribution of the microsatellites over the critical HMSNR region
Figure 17: Haplotypes at the beginning of this PhD project
Figure 18: Haplotypes after the refined mapping
Figure 19: Markers used in the pedigrees in the following section
Figure 20: Pedigree of the Spanish Gypsy family Sp-4
Figure 21: Pedigree of the Spanish Gypsy family Sp-5
Figure 22: Pedigree of the Bulgarian Gypsy family BULG-1a
Figure 23: Pedigree of the Bulgarian Gypsy family BULG-1b
Figure 24: Pedigree of the Bulgarian Gypsy family BULG-1c
Figure 25: Pedigree of the Bulgarian Gypsy family BULG-1d
Figure 26: Pedigree of the large Romanian Gypsy family ROM-1
Figure 27: Pedigree of the small Romanian Gypsy family ROM-2
Figure 28: Comparison of the centromeric side of haplotype “n” with the conserved haplotype
Figure 29: Comparison of the telomeric side of the conserved HMSNR haplotype with haplotypes “o”, “d” and “e”
Figure 30: Pedigree of the Bulgarian Gypsy family BULG-2 as obtained from the initial genotyping of markers in the broad HMSNR interval
Figure 31: Pedigree of the Bulgarian Gypsy family BULG-2 after extended typing of polymorphisms
Figure 32: Current physical map of the interval between bA86K9CA1 and D10S560
Figure 33: Map of the ESTs analysed in the 110 kb in relation to the known genes
Figure 34: Map of the final region of homozygosity of 63.8 kb
Figure 35: Genomic arrangement of human hexokinase 1 and the tissue-specific mRNAs
Figure 36: Genomic arrangement of human hexokinase 1 and the ESTs
Figure 37: Locations of the two putative mutations relative to each other and to the 5’ exons of hexokinase 1
Figure 38: Inferred chromosome 10 haplotypes of the five HMSNR carriers identified in the population screen in the Gypsies
Figure 39: Pedigree of the Gypsy family IRE-1
Figure 40: Extended pedigree of the Romanian Gypsy family ROM-1
Figure 41: Haplotypes of the Bulgarian Gypsy patient
Figure 42: Inferred haplotypes of the Spanish patient in comparison to the HMSNR haplotype
Figure 43: Schematic drawing of the hexokinase reaction with glucose
Figure 44: Sequence comparison of the homologous regions in chimpanzee, dog and mouse for the putative HMSNR mutation in alternative exon T2 of HK1
Figure 45: Sequence comparison of the homologous region in dog for the putative HMSNR mutation in the intron after alternative exon T2 of HK1
Figure 46: Search for upstream ORFs and start codons
Figure 47: Prediction of the secondary structure of the mRNA containing alternative exon T2 in normal and mutant state using the Vienna Package (RNAfold)
Figure 48: Prediction of the secondary structure of the mRNA containing alternative exon T2 in normal and mutant state using Genebee
Figure 49: Spermatocyte transcripts of mouse Hk1 published by Mori et al. (1993) [241]
Figure 50: Area of high identity of the mouse genome to human alternative exon T2
Figure 51: RT-PCR results using mouse testis and mouse brain
Figure 52: Overview of the RT-PCR experiments performed in the 5’ region of mouse HK1
Figure 53: Overview of the RT-PCR experiments performed in the 5’ region of human HK1
Figure 54: Probes used in the Northern Blot experiments
Figure 55: Northern blots
Figure 56: Result of RACE from the alternative exon in 5’ and 3’ direction
Figure 57: Nerve biopsies from (a) normal and (b) HMSNR nerve stained with an anti-HK1 antibody
Figure 58: Location of spliced ESTs and predicted genes in relation to the known genes in the genomic area surrounding the two putative mutations
Figure 59: Comparison of the PDB (porin binding domain) of HK1 and HK2 with the putative PDB in FLJ22761


ABBREVIATIONS

µCi	Microcurie
A	Adenine
aa	Amino acid
ADP	Adenosine diphosphate
AKT1	v-akt murine thymoma viral oncogene homolog 1
Amp	Ampicillin
AMP	Adenosine monophosphate
ANT	Adenine nucleotide translocator
APS	Ammonium peroxydisulfate
ARCMT	Autosomal recessive Charcot-Marie-Tooth disease
ARMS-PCR	Amplification refractory mutation system-polymerase chain reaction
ATP	Adenosine triphosphate
BAC	Bacterial artificial chromosome
Bad	BCL2-antagonist of cell death
BAEP	Brainstem auditory evoked potentials
BAX	BCL2-associated X protein
BCL2	B-cell CLL/lymphoma 2
Bid	BH3 interacting domain death agonist
Bi-PASA	Bidirectional PCR Amplification of Specific Alleles
BLAST	Basic local alignment search tool
bp	Base pairs
BSA	Bovine serum albumin
BULG	Bulgarian HMSNR family
C	Cytosine
cAMP	Cyclic adenosine monophosphate
CCAR-1	Cell cycle and apoptosis regulator 1
CCFDN	Congenital Cataracts Facial Dysmorphism Neuropathy syndrome
cDNA	Complementary DNA
CDTP1	Carboxy-terminal-domain phosphatase 1
Cen	Centromere
CFTR	Cystic fibrosis transmembrane conductance regulator
cfu	Colony forming units
CH	Congenital Hypomyelination
CLN5	Ceroid-lipofuscinosis, neuronal 5
cM	Centimorgan
CMT	Charcot-Marie-Tooth disease
CMT1	Demyelinating Charcot-Marie-Tooth disease (type 1)
CMT2	Axonal Charcot-Marie-Tooth disease (type 2)
CMT4	Autosomal recessive demyelinating Charcot-Marie-Tooth disease
CNP	2’,3’-cyclic nucleotide 3’-phosphodiesterase
CNS	Central nervous system
CSF1R	Colony stimulating factor 1 receptor
Cx32	Connexin 32
CXXC6	CXXC finger 6
DAB	Diaminobenzidine tetrahydrochloride
dATP	2'-deoxy-adenosine-5'-triphosphate
dbSNP	SNP database at NCBI
dCTP	2'-deoxy-cytidine-5'-triphosphate
DDX50, DDX21	DEAD-box polypeptide 50 and 21
Dea	Downeast anaemia mouse model of Hk1 deficiency
dGTP	2'-deoxy-guanosine-5'-triphosphate
DMEM	Dulbecco's Modified Eagle's Medium
DNA	Deoxyribonucleic acid
dNTP	2'-deoxy-nucleoside-5'-triphosphate
DRG	Dorsal root ganglia
DSS	Déjerine-Sottas syndrome
DTD	Diastrophic dysplasia
dTTP	2'-deoxy-thymidine-5'-triphosphate
EBI	European Bioinformatics Institute
E-cadherin	Epithelial cadherin
EDTA	Ethylenediaminetetraacetic acid
EGR	Early growth response gene
EMBL	European Molecular Biology Laboratory
EPM1	Epilepsy progressive myoclonus 1
ER	Endoplasmic reticulum
ESE	Exonic splice site enhancer
ESS	Exonic splice site silencer
EST	Expressed sequence tag
ETn	Early transposon
FATA	Fatty acid
FA	Formamide
Fam	Fluorescein
FOXN1	Forkhead box N1
FTDP-17	Frontotemporal dementia and parkinsonism linked to chromosome 17
G	Guanine
G6P	Glucose-6-phosphate
GAN	Giant Axonal Neuropathy
GARS	Glycyl-tRNA synthetase
GCK or Gck	Glucokinase
GCKR	Glucokinase regulator protein
GDAP1	Ganglioside-induced differentiation associated protein 1
Glc	Glucose
GTC	Genome Therapeutics Corporation
Hex	Hexachlorofluorescein
HGMD	Human gene mutation database
HGP	Human Genome Project
HGVbase	Human Genome Variation database
HI	Hyperinsulinemia of infancy
HK or Hk	Hexokinase (1 to 4)
HMG	High mobility group domain
HMN	Hereditary Motor Neuropathy
HMSN	Hereditary Motor and Sensory Neuropathy
HMSNL	Hereditary Motor and Sensory Neuropathy-Lom
HMSNR	Hereditary Motor and Sensory Neuropathy-Russe
HNPP	Hereditary Neuropathy with liability to Pressure Palsies
Hs	Homo sapiens
HSN	Hereditary Sensory Neuropathy
HSP	Heat shock protein
IBD	Identical by descent
IGHD II	Familial isolated growth hormone deficiency type II
IHGSC	International Human Genome Sequencing Consortium
IL-1	Interleukin-1
IMBB/FORTH	Institute of Molecular Biology and Biotechnology/Foundation for Research and Technology Hellas
IPTG	Isopropyl-beta-D-thiogalactopyranoside
IRE	Irish family where HMSNR is suspected
IRES	Internal ribosome entry site
ISE	Intronic splice site enhancer
ISS	Intronic splice site silencer
kb	Kilobases
kDa	Kilodalton
KIF1Bβ	Kinesin superfamily motor protein 1Bβ
Lab Id	Laboratory identification number
LB medium	Luria-Bertani medium
LCT	Lactase
LD	Linkage disequilibrium
LGMD	Limb girdle muscular dystrophy
LITAF	Lipopolysaccharide-induced tumour necrosis factor-α
LMBR1	Limb region 1
LMNA	Lamin A/C
LOD	Logarithm of the odds
MAG	Myelin-associated glycoprotein
MAPK	Mitogen-activated protein kinase
Mb	Megabases
MBP	Myelin basic protein
MCM6	Minichromosome maintenance deficient 6
MF	Myelinated fibre
MFN2	Mitofusin 2
Mm	Mus musculus
MMLV-RT	Moloney Murine Leukaemia Virus Reverse Transcriptase
MNCV	Motor nerve conduction velocity
MODY II	Maturity onset diabetes of the young type II
MOPS	3-[N-morpholino]propanesulfonic acid (free acid)
MRI	Magnetic resonance imaging
mRNA	Messenger ribonucleic acid
MTM	Myotubularin
MTMR2	Myotubularin related protein 2
NCBI	National Center for Biotechnology Information
NCV	Nerve conduction velocity
NDM	Neonatal diabetes mellitus
NDRG1	N-myc downstream-regulated gene 1
NEUROG3	Neurogenin 3
NF-L	Neurofilament-light
NIDDM	Non-insulin dependent diabetes mellitus
NSHA	Non-spherocytic haemolytic anaemia
NT2	Human brain neural progenitor cell line (derived from a teratocarcinoma)
OD	Optical density
ORF	Open reading frame
P0 (MPZ)	Myelin protein zero
PA	Phosphoric acid
PAX6	Paired box gene 6
PCR	Polymerase chain reaction
PHHI	Persistent hyperinsulinemic hypoglycaemia of infancy
Pi	Inorganic phosphate
PKD	Polycystic kidney disease
PLO-SL	Pre-senile frontal lobe dementia with bone cysts
PLP/DM20	Proteolipid protein
PMD	Pelizaeus-Merzbacher disease
PMP2	Peripheral myelin protein 2
PMP22	Peripheral myelin protein 22
PNS	Peripheral nervous system
PRG1	Proteoglycan, secretory granule 1
PTPC	Permeability transition pore complex
RAB7	RAS-related GTP-binding protein 7
RACE	Rapid amplification of cDNA ends
Ras	Rat sarcoma viral oncogene
RFCGR	Rosalind Franklin Centre for Genome Research
Rn	Rattus norvegicus
RNA	Ribonucleic acid
RNase	Ribonuclease
ROM	Romanian HMSNR family
rs	Reference SNP
RT	Reverse transcription
RT-PCR	Reverse transcriptase-polymerase chain reaction
SBF2	SET-binding factor 2
SDS	Sodium dodecyl sulphate
SHH	Sonic hedgehog homolog
SNP	Single nucleotide polymorphism
Sox9/10	SRY (sex determining region Y)-box 9/10
SP	Spanish HMSNR family
SPTLC1	Serine palmitoyltransferase
SREBP-1c	Sterol regulatory element-binding protein 1c
SSC buffer	Sodium Chloride-Sodium Citrate Buffer
SSPE buffer	Sodium Chloride-Sodium Phosphate-EDTA Buffer
STS	Sequence tagged site
SUPV3L1	Suppressor of var3 like 1
SVL	Superficial vastus lateralis
T	Thymine
TACR2	Tachykinin receptor 2
TAE buffer	Tris acetate EDTA buffer
TBE buffer	Tris borate EDTA buffer
TCA	Tricarboxylic acid cycle
TDO2	Tryptophan 2,3-dioxygenase
Tel	Telomere
Tet	Tetrachlorofluorescein
UTR	Untranslated region
UV	Ultraviolet
VDAC	Voltage-dependent anion channel
vLINCL	Variant late infantile neuronal ceroid lipofuscinosis
VNTR	Variable number tandem repeat
VPS26	Vacuole protein sorting 26
www	World wide web
X-Gal	5-Bromo-4-chloro-3-indolyl-β-D-galactoside
YAC	Yeast artificial chromosome


ACKNOWLEDGEMENTS

First of all I want to thank Prof Luba Kalaydjieva for the support throughout my PhD and for pushing me to rewrite chapters over and over until I really understood what was important. I have learned a lot over the past four years. I am very grateful to Dave for showing me new techniques in the lab, such as Schwann cell culture and DRG culture, which I greatly enjoyed learning. I also want to thank you for taking over the co-supervisor position for the last part of my PhD, for reading hundreds of pages of chapter drafts for my thesis, and especially for taking on the rough drafts that I couldn’t give to Luba. Thank you to Tam for being co-supervisor for the first part of my PhD. I also want to thank the whole lab, Bec, Mick and Dora, and also Bharti, for being a good team and making my time enjoyable. I especially thank Mick for our conversations about the dilemmas of a PhD student. It’s good to know you’re not alone. I am greatly indebted to WAIMR for the WAIMR scholarship, without which this whole PhD wouldn’t have been possible. A big thank you also to Heather Williams and Sato Juniper from the Postgraduate Research and Scholarships Office, who made it possible to finish this PhD. I also want to thank the University of WA for awarding me the completion scholarship for the last 16 weeks of my PhD. Thank you to Evan for supplying my project with his control mice, and I also thank Bruno for letting me scavenge rat tissues from the “Friday rat”. I want to thank my parents for supporting me through all the years, financially but also by listening to my problems in lengthy phone calls. A big thank you also to the rest of my family just for being family. Then I want to thank all my friends back home in Germany for trying to keep in touch – I know it’s hard. I hope to see you all soon; there’s lots of catching up to do. Thank you also to all the new friends I made here in Australia over the past four years, especially Brigit and Annett.
Last but not least I want to thank Khang for supporting me over the past year. Thank you for all the practical support, from putting corrections into my thesis to cooking for me (your Thai beef salad rocks!!!). But most of all thank you for believing in me and for just being there - writing can be very lonely. And I also thank cat for keeping me company.


Chapter 1: General Introduction and Review of the Literature

1 GENERAL INTRODUCTION AND REVIEW OF THE LITERATURE

Chapter Outline

The first part of this thesis introduces its topic and reviews the relevant literature. This chapter has five aims:
1. To give an overview of the use of founder populations for positional cloning and of their importance for this project.
2. To introduce the reader to the peripheral nervous system, specifically to axons, Schwann cells and the process of myelination.
3. To give an overview of the different forms of Charcot-Marie-Tooth (CMT) disease, their common and distinctive features, and their genetics.
4. To present the clinical and genetic data available on HMSNR at the beginning of this project.
5. To explain the aims of this study.

1.1 THE USE OF GENETIC ISOLATES FOR POSITIONAL CLONING

1.1.1 Advantages of genetic isolates

The mapping and/or cloning of over 50 disease genes using population isolates such as the Finns, the Old Order Amish, the Hutterites, the Sardinians or the Jews demonstrates how valuable these communities are for the study of inherited, especially Mendelian, disorders (reviewed in [1]). The main advantages stem from the founding of each population by a small number of individuals and its subsequent genetic isolation, which is often accompanied by a high rate of consanguinity. Together these factors alter the prevalence of inherited diseases relative to the general population: some disorders become disproportionately frequent, whereas others disappear entirely. Moreover, allelic and locus heterogeneity are uncommon for monogenic disorders in these communities. Such disorders are in most cases caused by a single mutation on a common ancestral chromosome. They can therefore be mapped simply by searching for haplotypes that are shared not only within one family but across the whole population. Historic recombinations that have accumulated on the disease haplotype then greatly facilitate the refinement of the critical region. Studies in the Finns provide excellent examples of the power of positional cloning in genetic isolates. The locus for progressive myoclonus epilepsy (EPM1) was mapped to a 7 cM interval on chromosome 21q, and refined mapping using 38 families reduced the critical region to only 176 kb. A similar result was obtained with 14 families for the locus of pre-senile frontal lobe dementia with bone cysts (PLO-SL) on chromosome 19q, where examination of the shared haplotypes yielded a critical region of just over 150 kb (reviewed in [2]).
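The following Python sketch illustrates the principle described above. It is not the procedure used in this thesis. Every disease chromosome descends from one ancestral chromosome, and historic recombinations trim the shared segment from either side, so the gene must lie within the stretch that all disease chromosomes still share. The sketch makes three assumptions:

- phased haplotypes are available at markers ordered along the chromosome;
- the founder haplotype is known;
- there is an anchor marker at which every disease chromosome carries the founder allele.

The function names and toy data are hypothetical. The sketch also ignores alleles shared by chance (identity by state) and genotyping errors, both of which a real analysis must take into account.

```python
from typing import List, Sequence, Tuple


def shared_run(chrom: Sequence[str], founder: Sequence[str], anchor: int) -> Tuple[int, int]:
    """Return (first, last) indices of the contiguous block of markers around
    `anchor` at which `chrom` still carries the founder allele."""
    if chrom[anchor] != founder[anchor]:
        raise ValueError("chromosome lacks the founder allele at the anchor marker")
    first = anchor
    while first > 0 and chrom[first - 1] == founder[first - 1]:
        first -= 1
    last = anchor
    while last < len(founder) - 1 and chrom[last + 1] == founder[last + 1]:
        last += 1
    return first, last


def critical_region(chroms: List[Sequence[str]], founder: Sequence[str],
                    positions_kb: Sequence[float], anchor: int) -> Tuple[float, float]:
    """Interval (kb) that must contain the disease gene (hypothetical helper).

    Each disease chromosome restricts the candidate interval to the markers it
    shares with the founder haplotype. The gene lies between the nearest
    markers on either side at which some chromosome has recombined away from
    the founder allele. If no recombination is observed on one side, the
    outermost typed marker is returned, so the true interval may extend beyond it.
    """
    first, last = 0, len(founder) - 1
    for chrom in chroms:
        start, end = shared_run(chrom, founder, anchor)
        first, last = max(first, start), min(last, end)
    left = positions_kb[first - 1] if first > 0 else positions_kb[0]
    right = positions_kb[last + 1] if last < len(founder) - 1 else positions_kb[-1]
    return left, right


# Hypothetical toy data: six markers and three disease chromosomes.
founder = ["A", "C", "G", "T", "A", "C"]
positions_kb = [0, 150, 300, 450, 600, 750]
chroms = [
    ["A", "C", "G", "T", "A", "C"],  # no recombination detected
    ["G", "C", "G", "T", "A", "T"],  # recombinant at both outer markers
    ["A", "T", "G", "T", "G", "C"],  # recombinant close to the anchor on both sides
]
print(critical_region(chroms, founder, positions_kb, anchor=3))  # prints (150, 600)
```

Each additional chromosome can only shrink the interval, never widen it. This is why historic recombinations accumulated across many families narrow the critical region so effectively.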

1.1.2 The Gypsies as a founder population

One of the founder populations that have emerged as a remarkable resource for disease gene mapping is the Gypsies (also called Roma). The success of genetic studies in the Gypsies is underlined by the recent identification of a number of single-gene disorders and their disease-causing mutations, which in the majority of cases are single founder mutations [3]. After their exodus from India, the Gypsies reached the Balkan region of Europe about one thousand years ago. Several successive waves of migration out of the Balkans into other parts of Europe led to the formation of the present Gypsy groups, which are genetically distinct from the surrounding European populations. Because the genetic distance between Gypsy groups is even larger than their distance from the populations of their countries of residence, haplotype diversity is extremely large. For a founder mutation, this means that a vast number of different recombinations can accumulate on the disease haplotype, which makes it possible to reduce the critical homozygous region to a bare minimum. Furthermore, mutation frequencies are often high, so that disease genes can be mapped by conventional linkage analysis rather than by homozygosity mapping within a single consanguineous family, where recombinations are soon exhausted while the critical region is still of considerable size. With carrier rates of between 5 and 20 % in the Gypsy population, consanguinity loses its relevance (reviewed in [3] and [4]). HMSNR, the focus of this PhD project, was mapped in the Bulgarian Gypsies [5]. Subsequently, affected individuals of Gypsy origin from Spain and Romania were identified who shared haplotypes with the original Bulgarian kindred [6]. Based on previous experience with HMSNL [7] and other disorders identified in the Gypsies, positional cloning of the HMSNR gene relies on the shared haplotypes and on the premise that a founder mutation is the most likely cause of HMSNR.
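The statement that consanguinity loses its relevance at carrier rates of 5 to 20 % can be made concrete with Hardy-Weinberg arithmetic. The Python sketch below is an illustration under stated assumptions, not an analysis from this thesis: it assumes Hardy-Weinberg equilibrium, treats the carrier rate as the heterozygote frequency 2pq, and uses the standard inbreeding coefficient F = 1/16 for the children of first cousins, so that the expected frequency of affected homozygotes is q² + Fpq.

```python
import math


def allele_freq_from_carrier_rate(carrier_rate: float) -> float:
    """Disease-allele frequency q solving 2pq = carrier_rate with p = 1 - q
    (the root with q <= 0.5), assuming Hardy-Weinberg equilibrium."""
    if not 0 < carrier_rate <= 0.5:
        raise ValueError("carrier rate must lie in (0, 0.5]")
    return (1 - math.sqrt(1 - 2 * carrier_rate)) / 2


def incidence(q: float, inbreeding_coefficient: float = 0.0) -> float:
    """Expected frequency of affected homozygotes, q^2 + F*p*q."""
    p = 1 - q
    return q * q + inbreeding_coefficient * p * q


FIRST_COUSIN_F = 1 / 16  # inbreeding coefficient of a child of first cousins

for carrier_rate in (0.05, 0.20):
    q = allele_freq_from_carrier_rate(carrier_rate)
    random_mating = incidence(q)
    cousins = incidence(q, FIRST_COUSIN_F)
    print(f"carrier rate {carrier_rate:.0%}: q = {q:.4f}, "
          f"about 1 in {1 / random_mating:.0f} affected under random mating, "
          f"risk x{cousins / random_mating:.1f} for children of first cousins")
```

Under these assumptions, a carrier rate of 20 % corresponds to roughly one affected child in 80 births without any consanguinity, and first-cousin marriage raises that risk only about 1.5-fold; for a rare allele (for example q = 0.001) the same marriage would raise the risk more than 60-fold.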

1.2 THE BIOLOGY OF THE PERIPHERAL NERVOUS SYSTEM (PNS)

1.2.1 Organisation of the PNS

The human nervous system can be divided into two parts: the central nervous system (CNS), which consists of the brain and the spinal cord, and the peripheral nervous system (PNS). The main function of the PNS is to connect the brain and the spinal cord with the rest of the body, thus mediating the transfer of information between the CNS and the periphery. This connection requires two types of neurons: efferent or motor neurons, which transmit information away from the CNS, and afferent or sensory neurons, which convey signals to the CNS [8]. The PNS in turn consists of two parts, the somatic and the autonomic nervous system. The somatic nervous system, which consists of the cranial and spinal nerves, serves as the connection to the external environment. The autonomic nervous system, on the other hand, is mainly involved in involuntary functions such as heart rate, vasomotor tone, bowel motility and saliva flow, and thus links the CNS with the internal environment. It can be divided into the sympathetic and the parasympathetic systems [9].

1.2.2 Cell types of the PNS

The two major cell types of the peripheral nervous system are neurons (nerve cells) and Schwann cells. Other cell types are found in the supporting connective tissue and the blood vessels. A neuron basically comprises a cell body with one long axon, which extends outwards and transmits impulses away from the cell body, and several shorter, more branched dendrites, which receive signals from other neurons ([10] and Figure 1).


Figure 1: Representative drawing of a vertebrate neuron (figure reproduced from [10]). Outgoing signals are transmitted away from the cell body along the axon, while incoming signals reach the cell body via the dendrites.

Whereas the structure and anatomy of axons are comparable in the CNS and PNS, major differences exist in the associated supporting structures, myelin and connective tissue. Myelin is a specialised membrane structure that envelops some axons in both the CNS and the PNS, so axons in both systems may be myelinated or unmyelinated. The myelin-producing cells of the CNS are the oligodendrocytes, each of which supplies several axons with myelin. This contrasts with the PNS, where each myelinating Schwann cell provides just one segment of the myelin sheath of a single axon. A non-myelinating Schwann cell, however, may surround three to six unmyelinated axons [9]. In the PNS, both myelinated and unmyelinated nerve fibres are covered by three layers of connective tissue, which protect the fibres and also nourish them through various blood vessels. The endoneurium, which consists of fibroblasts and collagen, directly surrounds the individual axons inside a bundle of nerve fibres, or fascicle. The fascicle itself is coated with the perineurium, a layer of dense connective tissue. Each group of fascicles is in turn surrounded by the epineurium, which contains dense and loose collagenous tissue and adipose tissue ([9] and Figure 2).


Figure 2: Cross-section of three fascicles of a normal peripheral nerve (figure adapted from [9]). The three layers of connective tissue covering the nerve fibres, namely the epineurium, the perineurium and the endoneurium, are labelled in the figure.

1.2.3 Myelinated axons in the PNS

“Myelin is a membrane, laid down in segments along axons, that functions as an insulator to increase the velocity of stimuli being transmitted between a nerve-cell body and its target.” ([11] p. 1).

1.2.3.1 Morphology and function of myelinated axons in the PNS

As stated above, Schwann cells generate PNS myelin. The segment of axon covered by the myelin of a single Schwann cell is called the internode, and the gap between two adjacent internodes is called the node of Ranvier. The myelin itself can be divided into compact and non-compact myelin, the non-compact part being the area close to the node of Ranvier, also called the paranode (reviewed in [12] and Figure 3).


Figure 3: Diagram of a myelinated axon of the PNS (figure reproduced from [12]). The left part of the figure is a magnification of the compact myelin, detailing the positions of the major dense line and the intraperiod line. The right part shows a myelinated axon in section. The myelin is non-compacted at the node of Ranvier and the paranode, and compacted in the area leading away from the paranode.

To form myelin, the Schwann cell flattens into a spade-like shape and wraps itself tightly around the axon, extruding most of the cytoplasm from the spade. Two theories have been proposed for the wrapping process: myelin is formed by movement of either the adaxonal inner rim or the abaxonal outer rim of the Schwann cell (Figure 4, reviewed in [12]). In 1989, Bunge and colleagues monitored the movement of the nuclei of myelinating Schwann cells relative to the inner and outer lips of the Schwann cell cytoplasm and found that the outer lip serves as an anchor, whereas the spiral movement around the axon occurs at the inner lip [13]. In electron micrographs of myelin cross-sections, two lines are visible: the major dense line, formed by the apposed cytoplasmic surfaces of the wrapping Schwann cell membrane, and the intraperiod line, formed by the apposed extracellular surfaces of the membrane. In addition, openings in the major dense line have been observed; these were at first thought to be artefacts of microscopic preparation but were soon found to be real (Figure 3). These so-called Schmidt-Lanterman incisures were found to contain Schwann cell cytoplasm, desmosomes, a certain type of microtubule and dense bodies. As for the functions of these cytoplasmic channels, it has been proposed that they increase the plasticity of the myelin and/or play a role in the metabolism and maintenance of the myelin sheath (reviewed in [12, 14]).

Figure 4: Schematic drawing of the two hypotheses of myelin formation (reproduced from [12]). The upper part of the figure shows the formation of myelin by movement of the outer lip of the Schwann cell around the axon, while the lower part of the figure illustrates the formation of myelin by means of movement of the inner lip. In 1989, studies by Bunge and colleagues suggested that myelin formation occurs in fact by progression of the inner lip of the Schwann cell [13].

The nodes of Ranvier are regularly spaced along the axon. As they are not covered in myelin, there is no electrical insulation at these points. During the conduction of an electrical impulse along a myelinated axon, the internode becomes depolarised, while the action potential is regenerated each time it reaches a node of Ranvier by the influx of sodium ions through voltage-gated channels. This process is called “saltatory conduction”, because the action potential appears to jump from node to node. In unmyelinated axons, the sodium channels are evenly distributed along the whole axon, making the charge density similar at all points. The wave of depolarisation therefore has a much shorter range, and conduction velocity is lower. The benefits of myelin insulation become obvious when the conduction velocities of myelinated and unmyelinated axons are compared: whereas an impulse travels at about 1 m/s in unmyelinated axons, it reaches over 100 m/s in myelinated ones. Because conduction velocity increases with fibre diameter, an unmyelinated fibre would have to be dramatically wider to achieve a similar speed. Myelin is therefore a remarkable structure that saves the body time, energy and space (reviewed in [11, 12]). Apart from conveying electrical impulses, nerve fibres also transport molecules and organelles. This transport may occur in the anterograde direction (from the cell body to the end of the nerve fibre) or in the opposite, retrograde direction (reviewed in [14]). Two models have been proposed for how the axon is supplied with the proteins required for its maintenance. The first assumes that all protein is produced in the cell body and then moved to its axonal destination via a slow transport system. The other suggests that proteins are also synthesised locally along the axon (reviewed in [15]).
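The comparison of conduction velocities in myelinated and unmyelinated axons can be turned into a short worked example. This Python sketch uses the approximate speeds quoted in the text (1 m/s and 100 m/s) and a hypothetical 1 m nerve length. The diameter calculation assumes that velocity scales with a power of fibre diameter and shows two commonly cited cases: linear scaling (as for myelinated fibres) and square-root scaling (as predicted by cable theory for unmyelinated fibres).

```python
def conduction_time_ms(distance_m: float, velocity_m_per_s: float) -> float:
    """Time in milliseconds for an impulse to travel distance_m."""
    return distance_m / velocity_m_per_s * 1000.0


def diameter_fold_change(speed_fold_change: float, exponent: float) -> float:
    """Fold increase in fibre diameter needed for a given fold increase in
    conduction velocity, assuming velocity is proportional to diameter**exponent."""
    return speed_fold_change ** (1.0 / exponent)


NERVE_LENGTH_M = 1.0  # hypothetical nerve length
for label, velocity in (("unmyelinated", 1.0), ("myelinated", 100.0)):
    print(f"{label}: {conduction_time_ms(NERVE_LENGTH_M, velocity):.0f} ms")

gain = 100.0 / 1.0  # fold increase in speed from 1 m/s to 100 m/s
print("diameter fold change, linear scaling:", diameter_fold_change(gain, 1.0))
print("diameter fold change, square-root scaling:", diameter_fold_change(gain, 0.5))
```

Under linear scaling, a 100-fold gain in speed would require a 100-fold wider fibre; under square-root scaling it would require a 10,000-fold wider fibre, which illustrates why myelin saves space.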

1.2.3.2 PNS myelin composition

Compared with other biological membranes, myelin has a high proportion of lipid, which ensures its insulating function during the conduction of nerve impulses. Overall, lipids account for 70 to 80 % of total myelin, while proteins make up only 20 to 30 % (reviewed in [12]).

Lipids

Myelin does not contain any lipids that are unique to it; rather, all classes of lipids detected in other membranes are also found in myelin. The major subclasses of lipids and their presence in PNS myelin are summarised in Table 1. The main fatty acids of PNS myelin lipids are oleic acid, with 18 carbon atoms [C18:1(n-9)], and saturated fatty acids with 20 to 24 carbon atoms (C20 to C24).

Table 1: Summary of lipids in PNS myelin (adapted from [11, 12])
Columns: lipid class; components [16]; content in total lipid of PNS myelin

1. Cholesterol: total 20-30 %
2. Phospholipids: total 46-56 %
   Phosphatidylethanolamine: PA + ethanolamine; ++
   Phosphatidylcholine: PA + choline; ++
   Phosphatidylserine: PA + serine; ++
   Phosphatidylinositol: PA + inositol; +
   Plasmalogen: PA + alkene + FATA or choline; +
   Ceramide: sphingosine + FATA; +
   Sphingomyelin: sphingosine + FATA + PA + choline; ++
3. Glycolipids: total 15-30 %
   Cerebroside (sulfatides): glucose or galactose + sphingosine + FATA (+ sulphuric acid); ++
   Ganglioside: oligosaccharide + sialic acid + sphingosine + FATA; +

Abbreviations: FATA = fatty acid, PA = phosphoric acid, + = minor component, ++ = major component

When comparing CNS and PNS myelin, only minor differences can be noted. Furthermore, the amount of each lipid component differs between species, and errors can be introduced by the preparation of the myelin and the detection method (reviewed in [11, 12]).

Proteins

While there are no myelin-specific lipids, a number of proteins are unique to myelin. Glycoproteins make up the bulk (over 60 %) of the protein content of PNS myelin. The remainder consists of basic proteins (up to 25 %) and a variety of low-abundance proteins, including a membrane-associated enzyme and a gap junction protein. In addition, a large number of proteins are expressed at very low levels; most of these are poorly characterised and their function in the PNS is unknown (Table 2) (reviewed in [12]).

Table 2: Summary of proteins in PNS myelin (adapted from [12])
Columns: protein; abundance (% of total myelin protein); molecular mass of mature protein (kDa); transmembrane domain(s); localisation in compact or non-compact myelin

Glycoproteins
   P0: 50-70 %; 28; 1; compact
   PMP22: 2-5 %; 22; 4; compact
   MAG: 1 %; 100; 1; non-compact
   Periaxin: 5 %; 170; none; non-compact
   E-cadherin: < 0.5 %; 130; 1; non-compact
Basic proteins
   MBP: 5-15 %; 14-21.5; none; compact
   PMP2/P2: 1-10 %; 14.8; none; compact
Other proteins
   CNP: < 0.5 %; 46/48; none; compact
   PLP/DM20: < 0.5 %; 30/25; 4; controversial
   Cx32: < 0.5 %; 32; 4; non-compact

• Glycoproteins

P0 (protein zero), a homophilic adhesion molecule, accounts for a large proportion of PNS myelin glycoprotein. Interestingly, P0 is highly expressed in myelinating Schwann cells, whereas it is totally absent from non-myelinating Schwann cells and oligodendrocytes (reviewed in [12]), suggesting a specific role for P0 in PNS myelin. P0 is believed to contribute to the structure of the major dense line in conjunction with MBP (myelin basic protein). In addition, P0 is thought to be important for preserving the compactness of the myelin (reviewed in [12]). P0 gene expression is influenced by the transcription factor SRY (sex determining region Y)-box 10 (Sox10) [17], and the level of P0 protein is important for the proper localisation of epithelial cadherin (E-cadherin) [18]. Localisation studies with P0 point mutants demonstrated correct localisation to the plasma membrane for most of the mutants [19]. Changes in the extracellular domain necessary for adhesion seem to be the cause of neuropathy, as shown with epitope-tagged P0 in mice [20].

Expression of peripheral myelin protein 22 (PMP22), which is not restricted to the PNS, is regulated in a tissue-specific manner through the alternative use of two promoters, which generate two mRNAs differing in their 5’ untranslated region (reviewed in [12]). Duplication of a 1.5 Mb region containing the PMP22 gene is the most common cause of Charcot-Marie-Tooth disease (CMT) [21]. Apart from its proposed function in myelin formation and maintenance, cell culture experiments suggest that PMP22 is involved in cell death and cell spreading. The ability of PMP22 to trigger apoptosis has been shown to depend on incorporation of the protein into the plasma membrane and hence its display on the cell surface [22]. A number of reports have investigated the impact of neuropathy-associated PMP22 mutations, focusing mainly on the localisation of the mutant protein. In summary, a large number of point mutants appear to be retained in intracellular compartments such as the Golgi or the endoplasmic reticulum (ER) instead of being transported to the plasma membrane [19, 22, 23]. It has therefore been proposed that the pathological mechanism is a “gain of function”, caused either by blocking of an intracellular compartment or by formation of heterodimers with the wild-type protein (reviewed in [22]). The latter is less likely, as other studies indicated that the mutant protein did not interfere with the trafficking of the wild-type protein (reviewed in [23]).
Comparing the different localisation studies makes it clear that results obtained in vitro in cell culture may not capture everything relevant to the in vivo biological system. This is best illustrated by the glycine 107 to valine mutation, which showed ER retention in cell culture [19], whereas in a nerve biopsy from a patient the mutant protein had accumulated in the onion bulbs [23], structures that cannot be analysed in cultures of cells derived from tissues outside the nervous system. Moreover, results obtained for the PMP22 duplication and for its in vitro simulation by overexpression of PMP22 seem to differ substantially. Whereas no evidence of PMP22 accumulation was found in nerve biopsies [23], the study by Shames and colleagues using the artificial system demonstrated that overexpressed PMP22 accumulates inside the cell in structures resembling myelin [19].

The myelin-associated glycoprotein (MAG), a membrane protein expressed in two isoforms (S-MAG and L-MAG), is thought to be involved in axonal recognition and adhesion, intermembrane spacing, regulation of neurite growth, signal transduction during glial cell differentiation and maintenance of axonal integrity (reviewed in [12]). Disturbance of MAG function leads to myelin defects and axonal abnormalities in mice older than eight months, while initial myelination seems to proceed normally. It has therefore been proposed that the importance of MAG lies in sustaining a normal interplay between the axon and the myelinating Schwann cell rather than in the development of the myelin sheath [24]. Periaxin exists in two isoforms, L- and S-periaxin, which are produced from the same gene by alternative splicing. Periaxin has recently been shown to be part of a dystroglycan complex, in which it binds directly to dystrophin-related protein 2, as demonstrated in yeast two-hybrid experiments. Disruption of L-periaxin destabilises the dystroglycan complex and thus the myelin sheath, which in turn leads to a demyelinating neuropathy, called CMT4F in humans (described in 1.3.5.1). As for MAG, periaxin is not required for the formation of myelin but for maintaining its stability once it has been established. In addition, periaxin is one of the many targets of the transcription factor EGR2 (early growth response gene 2) [25, 26]. E-cadherin belongs to the family of calcium-dependent adhesion molecules, which mediate cell adhesion in epithelia. While E-cadherin usually joins the membranes of two adjacent cells, in myelinated fibres of the PNS it interconnects the consecutive spirals of membrane formed by one and the same Schwann cell around an axon.
Normally, E-cadherin is localised to adherens junctions in the paranodes, the Schmidt-Lanterman incisures and the mesaxon, the part of the myelin sheath where the abaxonal outer rim of the wrapping Schwann cell membrane is attached to the membrane underneath (reviewed in [18, 27]). A recent study using a mouse model with PNS-specific loss of E-cadherin found abnormalities limited to the outer mesaxon, which were explained by possible replacement of E-cadherin function by other members of the cadherin family, for instance N-cadherin [27]. Furthermore, it has been demonstrated in the P0-/- mouse that P0 is vital for the accurate localisation of E-cadherin and its binding partner β-catenin. PMP22 has also been implicated in incorrect E-cadherin distribution through its influence on P0 levels [18].


• Basic proteins

The main physicochemical properties of myelin basic protein (MBP) are its hydrophilicity and its high isoelectric point. The MBP locus on chromosome 18 gives rise to eight alternatively spliced mRNAs transcribed from three transcription start sites. The encoded gene products, called either “classical” MBPs or “golli” (gene in the oligodendrocyte lineage) proteins, fulfil tasks in the immune and nervous systems (reviewed in [28]). As already mentioned, in the PNS MBP contributes to the formation of the major dense line through electrostatic interaction with P0. Interestingly, experiments with the shiverer mouse mutant, which lacks functional MBP, demonstrated that loss of MBP leads to an increase in connexin 32 (Cx32) and MAG protein, but not in their respective mRNAs, whereas the levels of PMP22 and P0 were not altered. A decrease in the level of MBP was also found to increase the number of Schmidt-Lanterman incisures and the amount of E-cadherin. This suggests that MBP exerts a regulatory effect on the levels of a number of myelin proteins [29]. The expression of another basic protein, PMP2 (peripheral myelin protein 2) or P2, is limited to the nervous system, where it occurs in both the PNS and the CNS. P2 has been assigned to a family of proteins known for their ability to bind fatty acids (reviewed in [30]). P2 has a high affinity for oleic acid, retinoic acid and retinol, and it is thought to function as a lipid carrier in the nervous system (reviewed in [12]).

• Other proteins

Both 2’,3’-cyclic nucleotide 3’-phosphodiesterase (CNP) and proteolipid protein (PLP) have been extensively investigated in the CNS; however, their function in the PNS, where they are minor constituents of myelin, remains to be elucidated. CNP, an enzyme with two isoforms, converts 2’,3’-cyclic nucleotides to 2’-nucleotides. The highest enzyme activity occurs in the corpus callosum of the brain. Activity in peripheral nerve is much lower, but higher than in tissues outside the nervous system (reviewed in [11]). In PNS myelin, the main sites of CNP expression are the periaxonal region, the outer border of the myelin sheath, the Schwann cell surface membrane, the Schmidt-Lanterman incisures and the Schwann cell cytoplasm (reviewed in [12]). Enzyme activity has been found to be up-regulated during development and lowered in the trembler and quaking mouse models of defective myelination; it also appears to decrease dramatically during Wallerian degeneration (reviewed in [11]). The first clues to the function of CNP come from studies of oligodendrocytes in the CNS. In a mouse model expressing a six-fold dosage of CNP, the speed of oligodendrocyte development and the amount of myelin formed were notably increased, while myelin compaction was lacking. This indicates that CNP may play a role in the temporal organisation of myelination, the regulation of myelin protein expression and the compaction of myelin [31]. Mutations in PLP, the major protein constituent of CNS myelin, cause Pelizaeus-Merzbacher disease (PMD), an X-linked myelin disorder of the CNS usually associated with flickering eyes (nystagmus), rigidity, and physical and mental retardation. The majority of PMD cases are caused by a chromosomal duplication, resulting in a dosage effect similar to that seen for PMP22 in the PNS. Moreover, patients with the duplication present with a relatively mild form of PMD, another parallel to the situation with PMP22 (reviewed in [32]). Gap junction proteins such as connexin 32 (Cx32) enable the exchange of ions and small molecules between cells through the formation of channels (connexons) consisting of six connexin monomers (reviewed in [33]). In PNS myelin, Cx32 is located in the non-compact paranodal regions and at the Schmidt-Lanterman incisures. Connexons made of Cx32 monomers provide a route for the direct exchange of ions between the loops of myelin formed by one Schwann cell (reviewed in [12]). It has been shown that mutant Cx32 proteins are unable to form connexons and are retained in the cell. Moreover, the mutant protein does not aggregate in the cell but is subject to quality control and rapid degradation, similar to the fast turnover of the wild-type protein [33].

1.2.3.3 The process of myelination

Schwann cells originate in the neural crest, which is derived from the ectoderm. In addition to Schwann cells, the neural crest also gives rise to other cell types (e.g. sensory neurons). Some future Schwann cells are already committed to the glial lineage before their journey to the PNS begins, whereas others receive a signal while they are migrating. Once in the PNS, the immature Schwann cells start encircling first groups of axons, then single axons. As Schwann cell and axonal development progresses, a “one-to-one” relationship is established between a Schwann cell and a segment of an axon destined to be myelinated (Figure 5). The production of a basal lamina then marks the final transition to the myelinating phenotype (reviewed in [11, 12, 34, 35]). The presence of an axon with a diameter larger than 0.7 µm is the key signal that induces myelination and thus the expression of a typical set of genes. Moreover, continuing contact between axon and Schwann cell is indispensable for sustaining the myelinating phenotype (reviewed in [11, 12]).

Figure 5: The development of a neural crest cell into a myelinating or non-myelinating Schwann cell [34]. While migrating into the PNS, neural crest cells develop into Schwann cell precursors and then into immature Schwann cells. The immature Schwann cells establish a “one-to-one” relationship with the axons. Further development into myelinating or non-myelinating Schwann cells is determined by signals from the axon, with axon diameter being the key signal.

Transcriptional control of myelination

The progression from neural crest cell to myelinating Schwann cell is influenced by the expression of several transcription factors: Pax3, Tst-1/Oct6/SCIP, EGR2/Krox20, EGR1/Krox24 and Sox10. In the early stages of development, the homeodomain protein Pax3 (the homeodomain is a DNA-binding domain) is detected in the spinal cord and the neural crest. As Schwann cell development progresses, Pax3 expression is maintained; its level rises at birth and remains high in non-myelinating Schwann cells, whereas it decreases in myelinating Schwann cells (reviewed in [36]). The domain architecture of Oct6, which is also called SCIP (“suppressed cAMP inducible POU”) or Tst-1, comprises a homeodomain, a POU domain and a POU-specific domain (reviewed in [36]). The POU domain is a protein domain of 70 to 75 amino acids, usually located upstream of a homeodomain, and is thought to play a role in site-specific DNA binding and in the interaction of proteins on DNA (reviewed in [37]). Oct6 is expressed in both Schwann cells and oligodendrocytes, although loss of function apparently leads to defects only in Schwann cell development. In Schwann cells, the expression of Oct6 depends on the level of cAMP, which in turn depends on axonal contact (reviewed in [36]). Oct6 expression peaks transiently in the promyelinating Schwann cell, and it has been shown in Oct6-null mice that lack of this protein prevents the cells from leaving the promyelinating stage and thus impedes progression to the mature myelinating phenotype [38]. The two zinc-finger proteins early growth response 1 (EGR1; mouse homologue Krox24) and early growth response 2 (EGR2; mouse homologue Krox20) have been implicated as having an important role in Schwann cell development. While EGR1/Krox24 is expressed in non-myelinating Schwann cells, expression of EGR2/Krox20 is confined to the myelinating phenotype. EGR2/Krox20 is first detected at the embryonic stage in cells of the myelinating lineage, where its expression is triggered by axonal contact. After commitment to the myelinating lineage, expression continues throughout life (reviewed in [36]). Homozygous inactivation of EGR2/Krox20 results in a lack of terminal differentiation, as demonstrated in mice in which Krox20-/- homozygotes develop a neuropathy while Krox20+/- heterozygotes remain normal [39]. Mutant EGR2 protein has been shown to impede the expression of myelin genes regulated by wild-type EGR2, thus exerting a dominant negative effect [40]. Musso and colleagues demonstrated decreased binding of a particular EGR2 mutant to the Cx32 promoter region, suggesting that transcriptional deregulation of Cx32 leads to CMT1 [41]. The transcription factor SOX10, a high mobility group (HMG) domain protein (the HMG domain binds DNA), is expressed in Schwann cells at all stages. It has been shown to control expression of the structural myelin protein P0 through interaction with both distal and proximal elements of the P0 promoter region [17].


1.2.4 Pathology of the PNS

1.2.4.1 Neuropathies - Disorders of the PNS

Peripheral neuropathies can be divided into two main groups according to the pattern of PNS involvement. The first group, the polyneuropathies, shows a bilaterally symmetrical disturbance of PNS function. Causes of this type of neuropathy include toxic substances, deficiency states, systemic metabolic disorders and, in some cases, immune reactions. The second group comprises mononeuropathies, triggered by focal lesions of the PNS, and multiple mononeuropathies, in which several individual lesions occur together. Neuropathies in this group are often due to mechanical injury of the nerve or damage by radiation, electricity or heat, but also to tumours of the PNS [14]. An alternative classification, based mainly on the cause of the neuropathy, distinguishes the following types [14]:
1. Neuropathy due to ischemia and physical agents
2. Inherited peripheral neuropathy
3. Neuropathy associated with systemic disease
4. Infectious, postinfectious and inflammatory neuropathy
5. Neuropathy associated with industrial agents, metals and drugs
6. Neuropathy associated with neoplasms

Apart from the inherited forms (see 1.3), the different types of neuropathy will not be discussed further, as this is beyond the scope of this thesis.

1.2.4.2 Diseased versus normal – diagnostic tools for determination of PNS function

General reactions of peripheral nerve to disease

The response of peripheral nerve to disease can be summarised in three main categories [9] (Figure 6):
1. Wallerian degeneration and regeneration
2. Axonal degeneration
3. Segmental demyelination


Figure 6: Schematic representation of the three main reactions of peripheral nerve to disease (figure taken from [9]). Wallerian degeneration is the reaction of the nerve to injury. Segmental demyelination is caused by disease of the Schwann cell, while dying back of the axon causes axonal degeneration. The left part of the figure shows a normal nerve for comparison.

Wallerian degeneration and regeneration, named after Augustus Waller, summarises the series of events that take place in an axon after injury. Regeneration is generally more successful in the PNS than in the CNS, where it is very slow. Regeneration takes place at the proximal stump, which is still connected to the nerve cell body, whereas degeneration occurs primarily in the distal stump. Degeneration is characterised by dissolution of the cytoskeleton, accumulation of debris in the axoplasm and disappearance of membranes. Later, the axon fragments and the myelin breaks up, while the basal lamina stays intact. Clearance of the debris starts in autophagic vacuoles of the Schwann cell and, in time, macrophages appear in large numbers. A prerequisite for successful regeneration is the formation of the “bands of Büngner” by the increasing number of proliferating Schwann cells. These bands are thought to guide the axon sprouts extending from the proximal stump once they have crossed the gap. As in the developing fibre, a “one-to-one” relationship between axons and Schwann cells is re-established and the axons are remyelinated. However, the new myelin is thinner and the internodes are shorter than before the injury (reviewed in [9, 14]).

Disintegration of the axon and its subsequent disappearance are signs of axonal degeneration. The so-called “dying back” of the axon, starting at its distal end, has been suggested to indicate a disease primarily affecting the nerve cell body [9]. Segmental demyelination, the disintegration of myelin along segments of internodes, can be caused by disease of the Schwann cell, but also, as a secondary effect, by axonal atrophy. Remyelination can occur once the initial cause of demyelination recedes: accompanied by Schwann cell proliferation, a new myelin sheath is elaborated, which is usually thinner than the previous one. Recurring episodes of de- and remyelination give rise to structures called “onion bulbs”, named after their resemblance to the multiple layers of an onion [9].

Diagnosis of neuropathies

• Electrophysiology

The principle of electrophysiological measurement of PNS function is electrical stimulation of the nerve, with the stimulus delivered by either a constant-voltage or a constant-current unit. Stimulation is achieved by current flowing between two percutaneous electrodes (anode and cathode), whose arrangement depends on whether sensory or motor conduction studies are being carried out. After the action potential has been recorded, nerve conduction velocities (NCVs) can be calculated. The values obtained vary between different nerves and between segments of the same nerve: NCVs are slower in the lower than in the upper extremities, and faster in the proximal than in the distal segments of a nerve. Table 3 lists normal values for the median nerve, whose conduction velocity is one of the most widely used criteria for assessing nerve function [14].

Table 3: Normal values for nerve conduction velocities in the median nerve (adapted from [14])

| Fibres | Points of stimulation | Conduction velocity (m/s)* |
|---|---|---|
| Motor | Palm-wrist | 48.8 ± 5.3 |
| Motor | Wrist-elbow | 57.7 ± 4.9 |
| Motor | Elbow-axilla | 63.6 ± 6.2 |
| Sensory | Digit-palm | 58.8 ± 5.8 |
| Sensory | Palm-wrist | 56.2 ± 5.8 |
| Sensory | Wrist-elbow | 61.9 ± 4.2 |

* mean ± standard deviation
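The calculation of an NCV is not spelled out above. A common approach for motor fibres, assumed here rather than taken from [14], divides the distance between two stimulation points by the difference between the latencies recorded from them, which cancels out the time spent at the neuromuscular junction. The Python sketch below applies this two-point method and expresses the result as a z-score against the wrist-elbow motor reference in Table 3 (57.7 ± 4.9 m/s). The function names, distance and latencies are hypothetical.

```python
def conduction_velocity(distance_mm: float, proximal_latency_ms: float,
                        distal_latency_ms: float) -> float:
    """Two-point motor conduction velocity in m/s.

    Millimetres per millisecond equal metres per second, so no unit
    conversion is needed.
    """
    delta_ms = proximal_latency_ms - distal_latency_ms
    if delta_ms <= 0:
        raise ValueError("proximal latency must be longer than distal latency")
    return distance_mm / delta_ms


def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations a value lies from the reference mean."""
    return (value - mean) / sd


# Hypothetical median nerve recording, wrist-elbow segment (250 mm apart).
velocity = conduction_velocity(250.0, 8.0, 3.5)    # 250 mm / 4.5 ms
deviation = z_score(velocity, mean=57.7, sd=4.9)    # reference from Table 3
print(f"{velocity:.1f} m/s, z = {deviation:.2f}")   # prints: 55.6 m/s, z = -0.44
```

A z-score is used only to show how a single measurement relates to the tabulated mean and standard deviation; laboratory-specific reference ranges would normally take its place.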

When surface electrodes are used, a compound action potential is recorded, which reflects the combined response of all the muscle fibres innervated by the nerve.

Measurement of compound action potentials from the sural nerve has gained great importance, as it provides electrophysiological evidence for the involvement of different fibre groups in neuropathies [14]. Each of the three pathological reactions described in the previous section changes the electrophysiological values: Wallerian and axonal degeneration reduce the amplitude of the compound action potential and produce electrical evidence of denervation, whereas segmental demyelination is associated with slowing of nerve conduction [14].

• The clinical defect

Evaluation of the clinical defect in neuropathies concentrates mainly on the motor deficit and the sensory disturbance. The motor deficit is usually assessed by measuring the force generated by a muscle. Weakness, an inability to generate force, and fatigue, an inability to maintain force over a prolonged period, are the general manifestations of neuromuscular disorders [14]. The simplest survey of sensation over the body surface, without standardised tests, can be carried out with inexpensive hand-held instruments: the skin is touched with cotton, sharpened pins or similar tools, and the patient reports whether there is any sensation in the area in question and whether it feels normal. Sensation can also be compared between regions of the body. To standardise the process, computer-assisted systems can be employed [14].

• Neuropathology

Nerve biopsies are the foundation of all neuropathological studies. Because of its easy access and the extensive records available for comparison, the sural nerve in the ankle region is the nerve of choice in most cases. It lies close to the saphenous vein and, with few exceptions, consists entirely of sensory fibres. Morphometric studies generally aim to examine the number, size distribution and shape of discrete groups of neurons and their axons in sections of the fibres. For the pathological assessment of neuropathies, the number, density, dispersion, diameter, distribution and shape of myelinated fibres (MF) are of special interest. Values obtained from the axons and myelin in patient biopsies can be related to previously established parameters of the disease state and compared with normal biopsies, enabling, for instance, very similar phenotypes to be distinguished. Axonal swelling and atrophy, demyelination and remyelination, and onion bulbs can all be assessed. The teased fibre preparation is a special method for inspecting single myelinated fibres, which are pulled out of a nerve biopsy and then fixed on a slide. This preparation allows examination of a number of features of myelinated fibres that would otherwise be hard to access: successive internodes can be assessed, and pathological characteristics can be identified more easily because the method is more sensitive than examination of fibre sections. In addition, the teased fibre method can show whether a neuropathy is active, can help identify branching or sprouting of fibres, and enables ultrastructural analysis of a given pathological change. To facilitate comparison, an extensive classification has been developed that can serve as a standard for assessing the state of the fibre (reviewed in [14]).
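As a hedged illustration of the kind of arithmetic behind the morphometric parameters mentioned above, the Python sketch below computes myelinated fibre density (fibres per mm² of fascicular area) and a fibre diameter size distribution using fixed-width bins. The 1 µm bin width, the function names and all numbers in the example are assumptions chosen for illustration, not values or procedures from [14].

```python
from collections import Counter


def fibre_density(n_fibres: int, fascicular_area_mm2: float) -> float:
    """Myelinated fibres per square millimetre of fascicular area."""
    if fascicular_area_mm2 <= 0:
        raise ValueError("area must be positive")
    return n_fibres / fascicular_area_mm2


def diameter_histogram(diameters_um, bin_width_um=1.0):
    """Count fibres per diameter bin, keyed by the lower bin edge in µm."""
    counts = Counter(int(d // bin_width_um) for d in diameters_um)
    return {k * bin_width_um: counts[k] for k in sorted(counts)}


# Hypothetical measurements from one biopsy section.
print(fibre_density(4200, 0.5))                       # 8400.0
print(diameter_histogram([2.1, 2.7, 3.4, 8.9, 9.2]))  # {2.0: 2, 3.0: 1, 8.0: 1, 9.0: 1}
```

Keying bins by their lower edge keeps the output easy to plot as a size-distribution histogram.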


1.3 CHARCOT-MARIE-TOOTH DISEASE (CMT)

1.3.1 History

When, in February 1886, Dr Jean-Martin Charcot and Dr Pierre Marie from the Hôpital de la Pitié de Salpêtrière in Paris published their findings on a special form of progressive muscular atrophy, Howard Henry Tooth had almost finished his thesis on the peroneal type of muscular atrophy. Although Tooth published his results only three months later, the disease was named after Charcot and Marie, and Tooth was included in the name only later, in recognition of his work. However, they were not the first to describe this disease. In 1855 Rudolf Virchow published a study of progressive muscular atrophy, and further reports followed by Eulenburg in 1856 and by Friedreich and Eichhorst in 1873. In 1893 Déjerine and Sottas reported two siblings affected by a recessive disease of the peripheral nervous system that resembled the disorder described by Charcot and Marie but was much more severe, igniting a discussion about whether this so-called Déjerine-Sottas syndrome was a separate entity. Another much-debated publication was that of Roussy and Lévy in 1926, introducing the Roussy-Lévy syndrome, which differs from the original CMT picture by the addition of static tremor of the hands (reviewed in [42]). Over time, numerous reports have been published emphasising specific symptoms or proposing new entities, but only in recent years, with the availability of the human genome sequence, has the number of CMT loci and genes increased dramatically, reflecting the enormous genetic heterogeneity of CMT.

1.3.2 Prevalence

The prevalence of CMT ranges from 0 to 42 per 100,000 individuals (Table 4). This variation reflects both the method of ascertainment and recruitment of patients and differences between populations. When the dominant cases are divided into demyelinating (CMT1) and axonal (CMT2) forms, the two appear to be equally frequent (reviewed in [43]), whereas recessive cases are rare [44]. However, the overall prevalence, the ratio of CMT1 to CMT2 and the ratio of dominant to recessive forms can reach extreme values in isolated inbred populations. This is the case in Western Norway, where about 41 in 100,000 individuals are affected with CMT [44]. Another example is Algeria, where 23 % of all marriages in the general population are consanguineous and autosomal recessive forms account for over 60 % of all CMT cases diagnosed in the three main neurological centres of Algiers [45]. This thesis focuses on the Gypsies, an isolated population in which consanguineous marriage is frequent in certain groups. Based on the cases of three neuropathies, HMSNL, HMSNR and CCFDN, and on available census data on the number of Gypsies, it has been estimated that autosomal recessive neuropathies affect 1 in 5000 Gypsies in Bulgaria (personal communication from Luba Kalaydjieva and Ivailo Tournev).

Table 4: Prevalence of CMT (table modified from [43])

| Area of survey | Prevalence per 100,000 | Author/Reference |
|---|---|---|
| North Carolina, USA | 5.3 | Herndon 1954 [46] |
| Rochester, Minnesota, USA | 0 | Kurland 1958 [47] |
| Carlisle, United Kingdom | 1.4 | Brewis et al 1966 [48] |
| Iceland | 1.6 | Gudmundson 1969 [49] |
| Guam, Asian territory of the USA | 23.7 | Chen et al 1968 [50] |
| Islands of southern Japan | 3.48 | Kondo et al 1970 [51] |
| Western Norway | 41 | Skre 1974 [44] |
| Newcastle upon Tyne, United Kingdom | 4.7 | Davis et al 1978 [52] |
| Edinburgh, United Kingdom | 6 | Brooks and Emery 1982 [53] |
| Southwest Sweden | 19 | Hagberg and Westerberg 1983 [54] |
| Cantabria, Spain | 28.2 | Combarros et al 1987 [43] |
| Yonago and Sakaiminato, Western Japan | 10.8 | Kurihara et al [55] |
| Molise, South Italy | 17.5 | Morocutti et al [56] |
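The Bulgarian Gypsy estimate quoted above (1 in 5000) is expressed on a different scale from the figures in Table 4. Converting it to cases per 100,000 is simple arithmetic, sketched below in Python with hypothetical helper names. Note that the Gypsy estimate covers autosomal recessive neuropathies only, so the converted figure is not strictly comparable with the overall CMT prevalences in the table.

```python
def per_100k_from_ratio(one_in_n: float) -> float:
    """Convert a '1 in N' prevalence to cases per 100,000."""
    return 100_000 / one_in_n


def per_100k_from_counts(cases: int, population: int) -> float:
    """Prevalence per 100,000 from a case count and a population size."""
    return cases * 100_000 / population


print(per_100k_from_ratio(5000))   # 20.0
```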

1.3.3 Clinical classification

Charcot-Marie-Tooth (CMT) disease is a polyneuropathy belonging to the group of inherited neuropathies, which, apart from CMT, also comprises a number of disorders caused by inherited defects of metabolic pathways. CMT, or hereditary motor and sensory neuropathy (HMSN), is a genetically heterogeneous group of disorders sharing a phenotype characterised by weakness and wasting of the distal limb muscles, frequently associated with distal sensory loss and skeletal deformities. Before the discovery of any disease-causing mutations, classifications of CMT were based mainly on the clinical picture and the mode of inheritance. In 1968 Dyck and Lambert [42, 57] distinguished a hypertrophic type of CMT, associated morphologically with segmental demyelination, and a neuronal type with little or no evidence of demyelination, each comprising a majority of dominant cases and a few recessive ones. The main diagnostic difference between the two classes was the motor nerve conduction velocity, which was found to be half the normal value in the hypertrophic type but only slightly slowed or normal in the neuronal type. Harding and Thomas refined this classification by setting a cut-off value of 38 m/s: individuals with a median motor nerve conduction velocity below 38 m/s were designated as affected with hereditary motor and sensory neuropathy type I (HMSN I), whereas values above 38 m/s corresponded to hereditary motor and sensory neuropathy type II (HMSN II) [58]. Summarising the problems of classification in a recent report, Reilly [59] noted that the HMSN classification, mostly applied by clinicians, has seven subclasses (HMSN I to HMSN VII, Table 5). To a degree, this classification is interchangeable with the CMT classification favoured by geneticists (HMSN I = CMT1 and HMSN II = CMT2).
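The Harding and Thomas criterion amounts to a single threshold on median motor nerve conduction velocity. The Python sketch below encodes it; how a value of exactly 38 m/s should be treated is not stated above, so the sketch flags it as a boundary case rather than guessing. The constant, function name and labels are hypothetical.

```python
CUTOFF_M_PER_S = 38.0  # Harding and Thomas cut-off for median MNCV [58]


def classify_hmsn(median_mncv_m_per_s: float) -> str:
    """Assign HMSN I or HMSN II from median motor nerve conduction velocity."""
    if median_mncv_m_per_s < CUTOFF_M_PER_S:
        return "HMSN I (CMT1)"
    if median_mncv_m_per_s > CUTOFF_M_PER_S:
        return "HMSN II (CMT2)"
    # The criterion as described only covers values below or above 38 m/s.
    return "boundary value (38 m/s): not assigned by the criterion"


for v in (20.0, 52.5, 38.0):
    print(v, classify_hmsn(v))
```

The function only restates the electrophysiological rule; the clinical and neuropathological features described in the following sections are not taken into account.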

Table 5: HMSN classification of Charcot-Marie-Tooth disease [59]

| HMSN type | Features |
|---|---|
| HMSN I | Demyelinating (CMT1) |
| HMSN II | Axonal (CMT2) |
| HMSN III | Severe demyelinating |
| HMSN IV | Refsum’s disease |
| HMSN V | CMT + spastic paraplegia |
| HMSN VI | CMT + optic atrophy |
| HMSN VII | CMT + retinitis pigmentosa |

When extended, the CMT classification system also accommodates the recessive forms of HMSN I/CMT1, named CMT4, and the X-linked forms, named CMTX. In contrast, the system provides no separate designation for the recessive forms of HMSN II/CMT2, which are simply called ARCMT2. A detailed genetic classification based on the CMT system is given in section 1.3.5.

1.3.4 Clinical aspects

At the clinical level, all forms of CMT share common features whose incidence and severity may vary even within families; this made it very difficult to separate HMSN I/CMT1 from HMSN II/CMT2 before genetic data became available. To reflect the historical progress and to draw a general picture, the content of the following section is taken mainly from four papers pre-dating the Human Genome Project, published by Dyck and Lambert [42, 57] and Harding and Thomas [58, 60], who put great effort into documenting all the clinical manifestations of CMT.

1.3.4.1 Clinical aspects of the dominant forms

In both dominant forms of CMT, CMT1 and CMT2, a subset of individuals is asymptomatic (16 % in [58]), even though their affected status has been proven by genetic testing and electrophysiological examination. In the remaining individuals, disease onset is in the first or second decade for CMT1 and in the second decade for CMT2; in CMT2, onset can be delayed until the seventh decade in some individuals. The first sign of disease is often foot deformity, which is evident in nearly all cases of CMT1 and some cases of CMT2. Characteristic findings are high arches, curled-up toes (hammer toes), pes cavus, pes equinovarus (as a result of higher raising of the knee in order to compensate for foot drop) and, in association with these features, corns, calluses and sometimes foot ulcers. Scoliosis is more frequent in CMT1 than in CMT2. Over time, most CMT1 patients develop a pronounced motor deficit, evident as weakness and atrophy of the distal muscles of the upper and lower limbs and as gait abnormalities such as frequent tripping and general clumsiness; nevertheless, confinement to a wheelchair is uncommon and the prognosis is favourable. Similar symptoms are seen in CMT2, but they are always less severe and later in onset. Tremor and ataxia of the lower limbs occur to a similar extent in both types, whereas tremor and ataxia of the upper limbs are more frequent in CMT1. Harding and Thomas [58] reported total tendon areflexia in 58 % of all CMT1 cases but only 9 % of CMT2 cases. A sensory deficit usually develops after the onset of motor involvement and is frequently associated with loss of vibration sense, two-point discrimination and joint position sense. Again, the sensory impairment is less prominent in CMT2 than in CMT1. On electrophysiological examination, median motor nerve conduction velocities are below 38 m/s in CMT1 and normal or slightly reduced in CMT2.
Sensory action potentials are either absent or of low amplitude with prolonged latency (the time between stimulus and response), again more markedly in CMT1 than in CMT2. The neuropathological changes are distinct: some CMT1 patients show an enlarged transverse fascicular area of the nerves, which has not been detected in any CMT2 case. Furthermore, nerve biopsies from individuals with CMT1 show a decreased number of myelinated fibres, and the largest myelinated fibres are smaller than normal. Under the electron microscope, segmental demyelination and remyelination with variable formation of onion bulbs can be seen in CMT1. CMT2, on the other hand, shows predominantly axonal involvement with no evidence of hypertrophic changes or segmental demyelination. A number of other features may be associated with CMT1 and CMT2, including unequal pupils, sensorineural deafness, optic atrophy, dysarthria and, in severe cases, the need for amputation because of deformity or ulceration related to dense sensory loss.

1.3.4.2 Clinical aspects of the recessive forms

All clinical symptoms found in the dominant forms of CMT are also present in the recessive ones, but the recessive forms exhibit a more severe clinical picture. Motor deficit and sensory loss are more prominent and the prognosis is worse than in dominant cases. Weakness, ataxia, total tendon areflexia and scoliosis are much more frequent in recessive cases. In recessive CMT2, the age of onset is significantly earlier than in the dominant form, whereas no such difference was noted between dominant and recessive CMT1. Median motor nerve conduction velocities are significantly lower in recessive than in dominant CMT1, but well preserved in recessive CMT2 [60].


1.3.5 Genetics

CMT can be inherited as an autosomal dominant, autosomal recessive or X-linked trait. The following section, which mainly uses the CMT-based classification, summarises the genetic findings and points out specific clinical features. The ever-increasing number of loci being discovered gives a good indication of the complexity of the biological processes involved in CMT pathology. An overview is given in Table 6.
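The practical difference between dominant and recessive inheritance lies in the risk to offspring. The sketch below is a generic single-locus Punnett square in Python, not anything specific to CMT: it enumerates offspring genotypes for two parents and reports the fraction expected to be affected. Alleles are written as "+" (wild type) and "M" (mutant), a notation assumed here for illustration, and the function names are hypothetical. X-linked inheritance is left out because it requires separate handling of the sexes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def punnett(parent1: str, parent2: str) -> dict:
    """Offspring genotype probabilities for one autosomal locus.

    Each parent is a two-character genotype such as "+M"; one allele is
    taken from each parent with equal probability.
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def risk_affected(parent1: str, parent2: str, mode: str) -> Fraction:
    """Probability that a child is affected under the given mode."""
    offspring = punnett(parent1, parent2)
    if mode == "dominant":   # one mutant allele suffices
        return sum((p for g, p in offspring.items() if "M" in g), Fraction(0))
    if mode == "recessive":  # both alleles must be mutant
        return offspring.get("MM", Fraction(0))
    raise ValueError(f"unknown mode: {mode}")


print(risk_affected("+M", "++", "dominant"))   # affected x unaffected parent: 1/2
print(risk_affected("+M", "+M", "recessive"))  # two carriers: 1/4
```

The recessive case requires two carrier parents, a situation made more likely by the consanguineous marriages discussed in section 1.3.2.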

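Table 6: Overview of Charcot-Marie-Tooth disease: genes and loci*

| Inheritance | Pathology | Form | Gene | Locus |
|---|---|---|---|---|
| Dominant | Demyelinating | CMT1A | PMP22 | 17p11 |
| Dominant | Demyelinating | CMT1B | P0 | 1q22 |
| Dominant | Demyelinating | CMT1C | LITAF | 16p13 |
| Dominant | Demyelinating | CMT1D | EGR2 | 10q21 |
| Dominant | Demyelinating | HNPP | PMP22 deletion | 17p11 |
| Dominant | Demyelinating | HMSN3/Déjérine-Sottas | PMP22, P0, EGR2 | - |
| Dominant | Axonal | CMT2A | KIF1Bβ/MFN2 | 1p36 |
| Dominant | Axonal | CMT2B | RAB7 | 3q13-q22 |
| Dominant | Axonal | CMT2C | - | 12q23-q24 |
| Dominant | Axonal | CMT2D | GARS | 7p14 |
| Dominant | Axonal | CMT2E | NF-L | 8p21 |
| Dominant | Axonal | CMT2F | HSP27 | 7q11-q21 |
| Dominant | Axonal | CMT2G | - | 12q12-13.3 |
| Dominant | Axonal | CMT2L | - | 12q24 |
| Dominant | Axonal | CMT2P | - | 3q13.1 |
| Recessive | Demyelinating | CMT4A | GDAP1 | 8q21.1 |
| Recessive | Demyelinating | CMT4B1 | MTMR2 | 11q23 |
| Recessive | Demyelinating | CMT4B2 | SBF2 | 11p15 |
| Recessive | Demyelinating | CMT4C | KIAA1985 | 5q23-q33 |
| Recessive | Demyelinating | CMT4D/HMSNL | NDRG1 | 8q24 |
| Recessive | Demyelinating | CMT4E | EGR2 | 10q21 |
| Recessive | Demyelinating | CMT4F | periaxin | 19q13 |
| Recessive | Demyelinating | CMT4 + juvenile glaucoma | SBF2 | 11p15 |
| Recessive | Demyelinating | CCFDN | CTDP1 | 18qter |
| Recessive | Demyelinating | HMSNR | - | 10q22 (this thesis) |
| Recessive | Axonal | AR-CMT2A | Lamin A/C | 1q21 |
| Recessive | Axonal | ARCMT2B | - | 19q13.3 |
| Recessive | Axonal | AR-CMT + pyramidal signs | - | 8q21.1 |
| Recessive | Axonal | GAN | Gigaxonin | 16q24 |
| Recessive | Axonal | Infantile neuropathy + respiratory failure | - | - |
| X-linked | Demyelinating | CMTX | Cx32 | Xq13.1 |
| X-linked | Axonal | CMTX axonal | - | Xq24-q26 |

* After www.neuro.wustl.edu/neuromuscular/time/hmsn.html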

1.3.5.1 Charcot-Marie-Tooth disease: demyelinating forms (CMT 1, 4 and X)

Autosomal dominant inheritance of demyelinating CMT1: A wide variety of autosomal dominant forms of Charcot-Marie-Tooth disease have been reported so far, among them CMT1A, CMT1B, CMT1C, hereditary neuropathy with liability to pressure palsies (HNPP) and CMT1D (also called congenital hypomyelination). The majority of CMT1 cases (68 %) [21] result from a 1.5 Mb duplication on chromosome 17p11.2, in a region that is a hot spot for unequal crossing over during meiosis [61]. Sporadic cases carrying a de novo duplication are frequently encountered. The duplications are thought to be usually of paternal origin and to arise from unequal non-sister chromatid exchange during spermatogenesis, owing to misalignment at the so-called CMT1A-REP repeat sequence [62, 63]. A maternal origin of the duplication has been reported by Silander et al 1996 [64]. Patients with the 17p11.2 duplication show a range of phenotypes, from classic CMT1 symptoms to diaphragmatic weakness, pyramidal signs and, in some cases, the Roussy-Lévy phenotype, which adds gait ataxia and tremor to the clinical picture [65]. Roussy-Lévy syndrome was originally thought to be a distinct entity, but detection of the 1.5 Mb duplication in Roussy-Lévy patients confirmed that it is a phenotypic variant of CMT1A rather than a separate disease [66]. The duplicated region includes the gene for peripheral myelin protein 22 (PMP22), leading to a gene dosage effect and an increased amount of PMP22 in patients with the duplication [67, 68]. A deletion of the same 1.5 Mb region causes HNPP [69], which shows special features such as focal myelin thickening and the occurrence of pressure palsies. Valentijn [70] implicated point mutations in PMP22 as a cause of CMT1A after detecting the same point mutation in PMP22 in human CMT1A patients and in the Trembler-J mouse, a spontaneous neurological mouse mutant.
In summary, alterations of PMP22 (duplication, deletion and point mutations) have been associated with CMT1A, HNPP and DSS (see Nelis et al., 1999, for a list of mutations).

CMT1B has been associated with mutations in the myelin protein zero gene (P0) on chromosome 1q22-23, which encodes myelin protein zero, a major structural component of peripheral myelin [71]. CMT1C, clinically a typical CMT1 with no outstanding features, has been mapped to chromosome 16p13.1-p12.3 [72]. Mutations in the lipopolysaccharide-induced tumour necrosis factor-α factor gene (LITAF/SIMPLE) have recently been identified as the cause of CMT1C [73].

CMT1D, congenital hypomyelination (CH), has its onset at or before birth, is associated with severe weakness and can lead to early death [74]. CH is caused by mutations in the EGR2 gene and has been reported both as a dominant and as a recessive trait (CMT4E) [75].

Déjerine-Sottas syndrome (DSS) may be caused by mutations in P0, PMP22 and EGR2, with both dominant and recessive modes of inheritance [64, 74, 76-78]. It is, however, questionable whether the disease descriptions collected under the name DSS represent a separate disease entity or whether they are phenotypic variants of several different CMT forms, whose severity is due to the position of the mutation in the respective gene and possibly to the genetic background on which the mutation is imposed.

Autosomal recessive inheritance of demyelinating CMT: To date, a number of loci have been identified: CMT4A, CMT4B1, CMT4B2, CMT4 with glaucoma, CMT4C, CMT4D (HMSNL), CMT4E, CMT4F, DSS, CCFDN and HMSNR. CMT4A (linked to 8q13-21.1) presents with severe hypomyelination, basal lamina onion bulbs, onset before the age of two years and a severe, rapid progression, often to wheelchair dependency [79]. Mutations in the gene encoding ganglioside-induced differentiation-associated protein 1 (GDAP1), a putative glutathione S-transferase, were found to cause the CMT4A phenotype [80]. Interestingly, mutations in the same gene have also been linked with an axonal form of recessive CMT associated with hoarse voice and vocal cord paresis [81], implying that GDAP1 mutations can cause both demyelinating and axonal pathologies. Subsequent reports confirmed the phenotypic heterogeneity at the GDAP1 locus and added to the growing list of GDAP1 mutations [82, 83]. To date, 16 mutations in GDAP1 have been recorded in the CMT mutation database (http://molgen-www.uia.ac.be/CMTMutations/).

The hallmark of CMT4B, found on sural nerve biopsy in both forms (CMT4B1 and CMT4B2), is focally folded myelin with loss of myelinated fibres.
CMT4B1 has been mapped to chromosome 11q22 [84, 85], and mutations have subsequently been identified in the MTMR2 gene, which encodes myotubularin-related protein 2, a dual-specificity phosphatase [86, 87]. The phenotypic spectrum of CMT4B1 also includes facial and bulbar weakness, respiratory problems and diaphragmatic weakness [87, 88]. Furthermore, patients become wheelchair-bound by the third decade of life [89]. CMT4B2 has been linked to 11p15 [90]. Recently, mutations in the SET binding factor 2 gene (SBF2), a member of the pseudo-phosphatase branch of the myotubularin family, have been associated with CMT4B2 [91]. This publication was closely followed by a report linking MTMR13, also a pseudo-phosphatase, with CMT4B2 associated with early-onset glaucoma [92]. MTMR13 and SBF2 are in fact one and the same gene, and the early-onset glaucoma is a phenotypic variant, probably due to the position of the mutation in SBF2. Additionally, mutations in P0 have been associated with the occurrence of focally folded myelin [93].

CMT4C, which has been mapped to 5q23-33, presents with rapidly worsening foot deformities and severe early-onset scoliosis, in contrast to a slow progression of the motor deficit [94-96]. Manifestations of the disease include infantile neuropathy associated with early wheelchair dependence and respiratory problems [97]. Nerve biopsies reveal basal lamina onion bulbs and morphological alterations in myelinating and non-myelinating Schwann cells [98]. Mutations in the gene KIAA1985 have been identified as causative for the CMT4C phenotype. Apart from domain prediction, which identified TPR (tetratricopeptide repeat) domains, commonly found in proteins required for mitosis and RNA synthesis [99], and SH3 (Src homology 3) domains, which are thought to be involved in protein-protein interaction [10], nothing is known about the function of the KIAA1985 protein [97].

HMSNL (CMT4D) is one of the three neuropathies occurring in the European Gypsy population. It was mapped to chromosome 8q24 in 1996 [100], and a truncating mutation and an exon-skipping mutation associated with an even more severe phenotype were subsequently found in the N-myc downstream-regulated gene 1 (NDRG1) [7, 101]. The disorder is characterised by very severe, early axonal loss and is invariably associated with the development of neural deafness during the second to third decade of life [102].
Neuropathological findings include hypomyelination, deficient myelin compaction, subsequent demyelination, poorly formed regressing onion bulbs and peculiar pleomorphic inclusions in the adaxonal Schwann cell cytoplasm [103]. CMT4E (corresponding to a congenital hypomyelinating neuropathy), which presents with particularly thin or even absent myelin sheaths and with onion bulbs, has been linked to mutations in the EGR2 gene on chromosome 10q21-22. EGR2 mutations have been connected to both recessive and dominant CMT [75]. Both CMT4F and recessive DSS have been associated with mutations in the gene encoding periaxin, which maps to chromosome 19q13.13-13.2, and are therefore likely to represent the same disease [26, 104, 105]. Both phenotypes are severe, characterised by marked depletion of myelinated fibres, onion bulbs, severe sensory loss and motor involvement (MNCVs undetectable).

Parman et al [77] report recessive DSS in association with point mutations in PMP22, which illustrates the wide variety of sub-forms of DSS arising from its definition as “severe cases of CMT”. As mentioned earlier, there has been substantial doubt as to whether DSS is a separate disease entity [57, 58]; if it is not, the cases with periaxin mutations would be better included in the CMT4F category. Another recessive form of CMT1 occurs as part of the Congenital Cataracts Facial Dysmorphism Neuropathy (CCFDN) syndrome, mapped to chromosome 18qter in the European Gypsies [106]. Apart from a neuropathy due to hypomyelination, a wide range of other clinical features has been detected, including congenital cataracts and microcorneas, impaired physical growth, delayed motor and intellectual development, facial dysmorphism and hypogonadism [107, 108]. The mutation causing CCFDN is a single nucleotide substitution in intron 6 of the carboxy-terminal domain phosphatase 1 gene (CTDP1), which leads to abnormal splicing and insertion of an Alu sequence into the processed CTDP1 mRNA, resulting in a premature stop codon and down-regulation of the transcript [109]. As HMSNR is the subject of this study, it will be discussed in a separate part of this chapter.

X-linked inheritance of demyelinating CMT: Mutations in the gene encoding connexin 32 (Cx32), located on chromosome Xq13-q22, have been shown to be responsible for the X-linked form of CMT1. Males are usually more severely affected, with onset between 2 and 24 years, whereas females can be asymptomatic with normal to slightly reduced MNCVs [110]. A Cx32 null mouse model showed a demyelinating peripheral neuropathy, thus confirming the importance of Cx32 for Schwann cells [111]. Moreover, similar observations were made in humans with a complete deletion of the Cx32 coding sequence [112].

1.3.5.2 Charcot-Marie-Tooth disease: axonal forms (CMT 2 and X)

Autosomal dominant inheritance of CMT2: As already mentioned in the section on clinical characteristics, CMT2 patients have normal or only slightly reduced MNCVs, and the disease often progresses slowly and is less severe than CMT1. On neuropathological examination, axonal involvement rather than demyelination is the prominent feature of CMT2. To date, nine subtypes of the autosomal dominant form have been described: CMT2A, CMT2B, CMT2C, CMT2D, CMT2E, CMT2F, CMT2G, CMT2L and CMT2P.

CMT2A, mapped to 1p35-36, presents with onset in the first decade, atrophy of distal muscles and reduced tendon reflexes in the lower limbs [113, 114]. In a Japanese family, this form of axonal CMT has been attributed to a mutation in the microtubule motor KIF1Bβ, a member of the kinesin superfamily of molecular motor proteins [115]. However, a recent article reports that no mutations in KIF1Bβ have been found in other CMT2A families, prompting the search for another gene that could be involved in CMT2A pathology. Consequently, a number of missense mutations in the gene encoding the GTPase mitofusin 2 (MFN2) have been detected and proposed as the cause of CMT2A in these families [116].

After CMT2B had been linked to chromosome 3q13-22 in an American family [117], it was argued that this disorder would be better assigned to the group of hereditary sensory neuropathies (HSN), because foot ulcers and osteomyelitis leading to amputation occur in CMT2B patients [118]. Recent genetic evidence has associated mutations in the serine palmitoyltransferase gene (SPTLC1) on chromosome 9q22.1-22.3 with HSN type I [119, 120] and mutations in the small GTPase late endosomal RAS-related GTP-binding protein 7 (RAB7) with CMT2B [121], confirming that the two are genetically separate entities, although they share a number of clinical symptoms.

CMT2C manifests with vocal cord paresis as its most remarkable feature [122]. Further symptoms include weakness of the intercostal muscles and, in severe cases, respiratory problems [123]. A report by Dematteis on the occurrence of sleep apnoea in CMT1A patients suggested that CMT2C might be a clinical manifestation of CMT1A, implying that a familial predisposition to pharyngeal neuropathy may be part of CMT1A [124]. However, there seems to be a clear genetic distinction between CMT1A patients, who may occasionally present with respiratory features, and CMT2C, which has now been mapped to chromosome 12q23-24 [125].

The main characteristic of CMT2D is severe atrophy and weakness of the hands, while the feet are only mildly or moderately affected [126]. Mutations in the glycyl tRNA synthetase gene (GARS) on chromosome 7p14 have been found to cause CMT2D, but also distal hereditary motor neuropathy type V (HMN V) [127]. CMT2E has been connected with mutations in the neurofilament-light gene (NF-L) on chromosome 8p21. The Russian family used for the original linkage study presented with a typical CMT2 phenotype [128]. Since then, additional features, including deafness, MNCVs in the CMT1 range and severe early onset, have been associated with other mutations in the same gene, implying that the clinical spectrum of the disease is much wider than initially expected [129, 130].

CMT2F has been described in a multigenerational Russian family with CMT2 characteristics, and the locus has been assigned to chromosome 7q11-21 [131]. Mutations in the gene encoding the small heat shock protein 27 (HSP27) have been found to cause CMT2F, but have also been linked with distal HMN [132]. A very recent study added a new locus on chromosome 12q12-13.3 to the growing number of CMT2 loci; the disorder, which manifests as typical CMT2 with slow progression and no associated symptoms, has been named CMT2G [133]. Another recent article reports the mapping of a new CMT2 locus (termed CMT2L) to chromosome 12q24 in a large Chinese pedigree. While most affected individuals show the typical CMT2 phenotype with mild or moderate sensory deficit, some additionally present with involvement of proximal muscles. The locus is distinct from CMT2C (12q23-24) but overlaps with a region for distal HMN II [134]. CMT2P has a late onset, in the fourth or fifth decade, preceded by muscle cramps. Further symptoms include areflexia, elevated creatine kinase levels, hyperlipidemia and diabetes mellitus. The disease was mapped to chromosome 3q14.1-q13, a region that was further refined to 3q13.1 [135, 136].

Autosomal recessive inheritance of CMT2: So far, five recessive CMT2 disorders have been found: ARCMT2A, ARCMT2B, an ARCMT2 with pyramidal signs, GAN and severe infantile axonal neuropathy. As with the recessive CMT1 forms, the clinical picture of recessive CMT2 is often more severe than that of the dominant forms.
ARCMT 2A, which has been mapped to chromosome 1q21.2-21.3, presents with classical CMT2 signs and additionally with weakness and amyotrophy of proximal muscles of the pelvic girdle and moderate scoliosis in some patients [45, 137]. Mutations in the lamin A/C gene (LMNA) have been shown to cause ARCMT2A [138]. Interestingly, defects in this gene have also been implicated in Emery-Dreifuss muscular dystrophy, dilated cardiomyopathy type 1A, limb girdle muscular dystrophy type 1B and autosomal dominant partial lipodystrophy [139-143], illustrating the phenotypic heterogeneity associated with a single gene. ARCMT2B has its onset in the second to fourth decade and has been mapped to chromosome 19q13.3 [144].

Another autosomal recessive CMT 2, which has been mapped to 8q21.3 (with GDAP1 and PMP2 lying outside the region of homozygosity), starts in the first decade and is characterised by distal atrophy in all four limbs, pyramidal involvement and absent or brisk tendon reflexes [145]. Recently, the gene mutated in giant axonal neuropathy (GAN) has been mapped to chromosome 16q24.1. The protein encoded by this gene has been named gigaxonin, after the most striking and unique neuropathological feature of GAN, the formation of giant axons. These are due to the accumulation of neurofilaments, leading to segmental distension of the axons. Onset of GAN is in childhood, with distal amyotrophy, hypoesthesia of the lower limbs and areflexia. Cerebellar ataxia and signs of pyramidal tract involvement are also observed. Death usually occurs in adolescence [146-148]. The gene for severe infantile axonal neuropathy has not been localised so far. Clinical features of this disease include respiratory failure (4 of 5 patients died within the first month after birth), absent sensory action potentials, reduced MNCV, denervation and equinovarus deformities at birth [149].

X-linked inheritance of CMT 2: A family with an X-linked recessive form of axonal neuropathy, with onset in childhood and associated deafness and mental retardation, has shown linkage to Xq24-q26 [150].

1.3.5.3 Conclusion

When the first CMT mutations in PMP22, P0 and CX32 were identified in the early 1990s, it seemed that defects in major structural components of myelin caused dominant demyelinating CMT. Then, in the late 1990s, the discovery of recessive gene mutations causing demyelinating CMT accelerated, and these turned out to lie in genes of very diverse function, such as phosphatases in CMT4B, a transcription factor in CMT4E and the structural myelin protein periaxin in CMT4F. In a similar fashion, the first proteins implicated in dominant CMT2 were structural proteins, for example NF-light, which is involved in CMT2E. Over time, the variety increased with the discovery of CMT mutations in proteins involved in fundamental processes such as transcription (CTDP1 mutations in CCFDN) or translation (GARS mutations in CMT2D). Additionally, it became clear that structural proteins represent only part of this diversity and that they are not confined to dominant forms of CMT. The implication for the HMSNR project was that no assumption could be made about the type of gene likely to be mutated in HMSNR.

Another aspect of the CMT genes is phenotypic heterogeneity: mutations in one and the same CMT gene may cause both demyelinating and axonal pathologies. Moreover, apart from causing forms of CMT, mutations in CMT genes may also lead to other diseases, not necessarily with neuromuscular pathology.


1.4 HEREDITARY MOTOR AND SENSORY NEUROPATHY- RUSSE (HMSNR)

Hereditary Motor and Sensory Neuropathy-Russe (HMSNR) is a severe peripheral neuropathy, which was identified during studies on HMSN-Lom, a neuropathy caused by mutations in the gene NDRG1 on chromosome 8q24. The original extended Bulgarian pedigree showed multiple affected members (Figure 7). In some branches of this pedigree linkage to chromosome 8q24 (the HMSNL locus) could be established, whereas in other branches linkage to 8q24 was excluded, suggesting the independent segregation of a second autosomal recessive neuropathy. The new disorder was subsequently mapped to 10q22. It was named HMSN-Russe (HMSNR) after the town Russe in Northern Bulgaria, where most of the affected individuals live [5]. Both the original Bulgarian HMSNR kindred and subsequently identified families belong to the Gypsy group of the Kalderash. Together with the Rudari, the Lom and the Kalaydjii South, the Kalderash Gypsies belong to the large migrational category of the Vlax Gypsies, which settled in Wallachia (present-day Romania) (reviewed in [151]).

Figure 7: Large Bulgarian pedigree in which HMSNR and HMSNL segregate independently (figure adapted from [5]). This large Bulgarian Gypsy kindred was used to map HMSNR. Black symbols indicate individuals affected by HMSNR; white symbols indicate unaffected individuals. Symbols with a vertical line denote individuals affected by HMSNL. The four HMSNR branches of this pedigree are labelled BULG-1a to d.

1.4.1 Clinical aspects

HMSNR manifests clinically at the age of 7 to 16 years with weakness and atrophy in the distal lower limbs (Figure 8), whereas the distal upper limbs become involved at the age of 10 to 43 years. The disease progresses steadily and leads to severe disability, with total paralysis of the muscles below the elbow and below the knee in the 4th and 5th decades. All patients present with foot deformities, e.g. pes cavus or claw toes, and frequently with hand deformities. Sensory loss, affecting all modalities, is severe in older individuals. Electrophysiological findings include absent sensory action potentials and motor nerve conduction velocities that are moderately reduced in the upper limbs (ulnar nerve 31.9 ± 7.05 m/s; median nerve 32.0 ± 6.8 m/s) and unmeasurable in the lower limbs. All patients show a peculiar increase in the threshold for electrical nerve stimulation. No associated symptoms (e.g. sensorineural deafness or optic atrophy, as discussed in 1.3.4) have been encountered in HMSNR patients [6].

Figure 8: Picture of a 24-year-old Spanish HMSNR patient with distal atrophy of the limbs (unpublished photograph; patient described in [152])

As HMSNL and HMSNR may segregate independently within the same family, it is important to note which clinical features distinguish these two autosomal recessive neuropathies, especially in early childhood, when the distinction is difficult. While both disorders present with clinical symptoms from an early age, these seem to be more severe in HMSNL. The two neuropathies can be differentiated more easily by electrophysiological examination. Nerve conduction velocities are markedly lower (below 21 m/s) in young HMSNL patients and become unobtainable in the legs after the age of 7 to 9, while in HMSNR patients of similar age the nerve conduction velocities remain above 21 m/s. Abnormalities of the distal latencies are more pronounced in HMSNL patients. The most useful distinction for patients in the first or second decade is made using BAEP (brainstem auditory evoked potential) recordings, which are always normal in HMSNR patients and abnormal in HMSNL patients from an early age. Despite these differences, a final conclusion is best supported by genetic evidence [153].

As the general clinical symptoms are mostly indistinguishable from those of other CMTs, the special features of HMSNR have been elucidated mainly by examination of nerve biopsies. Sural nerve biopsies reveal a general depletion of large myelinated fibres, which is typical for all CMTs, but also profuse regenerative activity with numerous clusters of thinly myelinated fibres (Figure 9 and Figure 10). The latter is a prominent finding in HMSNR and is not seen in other forms of CMT disease. Myelinated fibre densities in HMSNR patients are lowered to 3500 to 7500 myelinated fibres (MF) per mm², while the density of unmyelinated fibres is elevated to 52200 to 67700 fibres per mm². Myelin thickness relative to axonal diameter is reduced, which is characteristic of hypomyelination. However, it is unclear whether hypomyelination is the primary problem in HMSNR or whether it results from the rate of myelination falling behind that of axonal regeneration. This implies that both Schwann cells and axons are potentially involved in the disease pathology, making it difficult to assign HMSNR definitively to the demyelinating (CMT1) or axonal (CMT2) group of CMT [6].

Figure 9: Electron micrograph of a transverse section through an HMSNR sural nerve [152]. This electron micrograph of a sural nerve biopsy from an HMSNR patient shows a regenerative cluster comprising small fibres surrounded by the parent basal lamina.


Figure 10: Light micrograph of a transverse section through an HMSNR sural nerve [152]. This light micrograph of a transverse section through an HMSNR sural nerve shows a marked reduction in the myelinated fibre population and numerous clusters of thinly myelinated fibres. No active fibre degeneration, demyelination or hypertrophic changes are visible.

1.4.2 Genetics

1.4.2.1 Genome scan

The original HMSNR mapping kindred from Bulgaria (branches BULG-1a and b in Figure 7) consisted of 15 individuals, 11 of whom were affected. A genome scan with an average intermarker distance of 10 cM showed evidence of linkage to a >40 cM region on chromosome 10q, flanked by markers D10S208 and D10S1686. The multipoint LOD score peaked in the interval D10S196-D10S37, with a maximum value of 4.5. The highest two-point LOD score (3.96) was observed at D10S1652 [5]. The obvious functional and positional candidate gene EGR2, which is involved in CMT4E and CMT1, was excluded: sequencing revealed no putative mutations, and a newly identified single nucleotide polymorphism (SNP) at nucleotide position 1219 of the EGR2 mRNA (accession number: AF139463.1) showed recombination with HMSNR. The SNP is an A to C change that is synonymous, retaining arginine at position 362 of the EGR2 protein (accession number: AAD24588.1) [5]. Reducing the average intermarker distance to 1.5 cM and adding affected individuals from BULG-1c and d (Figure 7) to the analysis positioned the HMSNR gene in a region on 10q22 between markers D10S581 and D10S1742. The peak two-point LOD score of 5.53 was calculated at marker D10S1647, whereas the highest multipoint LOD scores were found in the interval D10S1647-D10S560, with a maximum value of 6.43. Analysis of the haplotypes revealed four recent recombinations, one centromeric and three telomeric, which placed the gene in the 5.2 cM interval between D10S581 and D10S537. Using historical recombinations, the telomeric boundary could be defined as D10S1742, reducing the region to a genetic distance of 3.5 cM (Figure 11, upper panel) [5].
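The LOD scores quoted above compare the likelihood of the marker data under linkage at a recombination fraction θ with the likelihood under free recombination (θ = 0.5), on a log10 scale. The sketch below covers only the simplest textbook case and assumes phase-known, fully informative meioses that can be counted as recombinant or non-recombinant; the published two-point and multipoint scores were obtained from full pedigree likelihoods, which this does not reproduce. The function names are illustrative.

```python
import math


def lod(theta: float, recombinants: int, nonrecombinants: int) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10[ theta^R * (1 - theta)^NR / 0.5^(R + NR) ]
    """
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    if theta == 0.0 and recombinants > 0:
        # Any recombinant is impossible under complete linkage.
        return float("-inf")
    log_linked = nonrecombinants * math.log10(1.0 - theta)
    if recombinants:
        log_linked += recombinants * math.log10(theta)
    log_unlinked = (recombinants + nonrecombinants) * math.log10(0.5)
    return log_linked - log_unlinked


def max_lod(recombinants: int, nonrecombinants: int, step: float = 0.001):
    """Grid search over [0, 0.5) for the theta that maximises the LOD score."""
    best_theta, best_score = 0.0, float("-inf")
    for i in range(int(round(0.5 / step))):
        theta = i * step
        score = lod(theta, recombinants, nonrecombinants)
        if score > best_score:
            best_theta, best_score = theta, score
    return best_theta, best_score


if __name__ == "__main__":
    # Ten informative meioses without a recombinant: LOD at theta = 0
    print(round(lod(0.0, 0, 10), 2))
    # One recombinant among ten meioses: maximum LOD and its theta
    print(max_lod(1, 9))
```

A LOD score of 3, corresponding to odds of 1000:1 in favour of linkage, is the conventional significance threshold; in this idealised setting ten non-recombinant meioses give 10 × log10(2) ≈ 3.01, just above it.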

1.4.2.2 Refined mapping

Ongoing collaboration with clinicians from Europe provided new Gypsy families from other countries that had been diagnosed with a severe autosomal recessive CMT disease, possibly HMSNR. New families from Spain and Romania (hereafter called Sp-4 and Sp-5; ROM-1 and ROM-2) were recruited that carried haplotypes closely related to the conserved HMSNR haplotype, suggesting that a common founder mutation causes HMSNR in all affected subjects, similar to what has been seen with HMSNL [6]. New microsatellite markers (namely bA153K11CA1, bA153K11CA2, bA86K9CA1, bA227H15CA1 and bA227H15CA2) were identified in the critical region by searching the UCSC database (Figure 11). Meanwhile, the DNA sample of another affected Bulgarian individual (R1 from branch BULG-1d) had become available and was included in the analysis.


[Figure 11 panels. Upper panel, "1. Initial refined mapping and exclusion of EGR2 (BULG-1a-d)": markers from centromere (cen) to telomere (tel) D10S561, EGR2, D10S581, D10S1646, D10S1670, D10S210, D10S2480, D10S1678, D10S1647, D10S1672, D10S1742, D10S1665, D10S560, D10S37; haplotypes "a" to "i-2"; HMSNR "broad" interval of ~3.5 cM. Middle and lower panels, "2. Identification of new HMSNR families - refinement of the critical region": markers D10S581, D10S1646, D10S1670, bA153K11-CA1, D10S210, bA153K11CA2, D10S2480, bA86K9CA1, D10S1678, D10S1647, bA227H15CA1, bA227H15CA2, D10S1672, D10S1742, D10S1665, D10S560; haplotypes "a" to "i" (original Bulgarian mapping kindred, BULG-1a-d), "j" (R1, added individual from family BULG-1d) and "k" to "o" (Spanish and Romanian families Sp-4, Sp-5, ROM-1 and ROM-2, new haplotypes only); HMSNR "narrow" interval of ~1 Mb (estimate in 2001).]

Figure 11: Chromosome 10q haplotype data from the initial genome scan and the refined mapping (data adapted from [5, 6]). The initial genome scan (upper panel) in the large Bulgarian Gypsy kindred (BULG-1a to d) placed the critical HMSNR region between markers D10S581 and D10S1742. For the refinement of the critical region, new markers, namely bA153K11-CA1, bA153K11CA2, bA86K9-CA1, bA227H15-CA1 and bA227H15-CA2, were added (middle panel), and new Gypsy families from Spain and Romania and one additional member of the large Bulgarian Gypsy kindred were recruited, which added new recombinant HMSNR haplotypes (haplotypes “j”, “k”, “l”, “m”, “n” and “o”). This placed the critical HMSNR region between markers bA86K9-CA1 and D10S1742 (lower panel). Haplotype “n” contains a new allele at marker D10S1647 (allele “5”), which was presumed to be a microsatellite mutation. The abbreviations cen and tel stand for centromere and telomere, respectively.

The two families from Spain, Sp-4 and Sp-5, each consisting of one affected and two unaffected individuals, contributed haplotypes “k”, “l” and “m”, thus moving the centromeric boundary to between bA153K11CA2 and D10S2480. The two new Romanian families, ROM-1, with eight unaffected and three affected subjects, and ROM-2, with two unaffected individuals and one affected individual, added haplotypes “n” and “o”, reducing the critical region to approximately 1 Mb (as estimated at that time), with recombination breakpoints located between bA86K9CA1 and D10S1678 on the centromeric side and between D10S1672 and D10S1742 on the telomeric side. An unusual haplotype, generated from the ancestral haplotype by a centromeric recombination and containing a presumed microsatellite mutation at marker D10S1647 (Figure 11: haplotype “n”, and Figure 12), was found in individuals ROM-5 and ROM-20.
The fact that this haplotype could not be found in ROM-52, the grandmother of ROM-20, precluded formal confirmation of linkage to 10q22 in this family. This observation can be explained by a historical recombination, a high carrier rate and a high frequency of the recombinant haplotype in the population, and/or consanguineous marriage between the grandparents of ROM-20. The assumption of a high frequency of this recombinant disease haplotype in the population is supported by its occurrence in another, more remote branch of the same family and in a second, totally unrelated Romanian HMSNR family. On the basis of these haplotypes, a “narrow” gene interval, flanked by markers bA86K9CA1 and D10S1742, was defined. Since conventional linkage analysis resulted in exclusion under a low gene frequency and inconclusive evidence of linkage under a high gene frequency, it was not possible to rule out that these assumptions were incorrect and that the gene was actually located in the “broad” region defined by the Bulgarian recombinant haplotypes. The problem could be resolved by recruiting additional families and by obtaining more detailed pedigree information on the large Romanian family in which the critical recombination was observed.
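The reasoning used to narrow the interval can be sketched as follows: each disease chromosome retains the ancestral (founder) haplotype only over a segment containing the gene, so the gene must lie within the intersection of all such segments, bounded by the nearest recombinant markers. This is an illustration under simplifying assumptions: phase-known haplotypes, markers ordered from centromere to telomere, a single founder haplotype, and an anchor marker assumed to lie inside the gene region. This naive version would misread a single-marker mismatch caused by a microsatellite mutation, such as the new allele of haplotype “n” at D10S1647, as a recombination breakpoint. Marker names and alleles in the example are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def shared_segment(founder: Sequence[int], haplotype: Sequence[int],
                   anchor: int) -> Tuple[int, int]:
    """Indices of the first non-founder marker on each side of the anchor.

    left == -1 or right == len(founder) means the haplotype is ancestral
    all the way to that end of the marker map.
    """
    if haplotype[anchor] != founder[anchor]:
        raise ValueError("haplotype lacks the founder allele at the anchor")
    left = anchor
    while left >= 0 and haplotype[left] == founder[left]:
        left -= 1
    right = anchor
    while right < len(founder) and haplotype[right] == founder[right]:
        right += 1
    return left, right


def critical_interval(markers: Sequence[str], founder: Sequence[int],
                      haplotypes: List[Sequence[int]],
                      anchor: int) -> Tuple[Optional[str], Optional[str]]:
    """Flanking markers of the smallest segment shared by all disease haplotypes."""
    left, right = -1, len(markers)
    for hap in haplotypes:
        l, r = shared_segment(founder, hap, anchor)
        left, right = max(left, l), min(right, r)  # intersect the segments
    cen = markers[left] if left >= 0 else None
    tel = markers[right] if right < len(markers) else None
    return cen, tel


if __name__ == "__main__":
    # Hypothetical markers (centromere -> telomere) and allele numbers
    markers = ["M1", "M2", "M3", "M4", "M5", "M6"]
    founder = [3, 2, 1, 2, 5, 7]
    disease_haplotypes = [
        [3, 2, 1, 2, 5, 7],  # fully ancestral
        [9, 2, 1, 2, 5, 7],  # breakpoint between M1 and M2
        [3, 4, 1, 2, 5, 7],  # breakpoint between M2 and M3
        [3, 2, 1, 2, 8, 7],  # breakpoint between M4 and M5
    ]
    print(critical_interval(markers, founder, disease_haplotypes, anchor=2))
```

In this toy example the gene would be placed between M2 and M5, with the flanking markers themselves lying outside the interval, in the same way that bA86K9CA1 and D10S1742 flank the "narrow" interval.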


[Figure 12: pedigree of family ROM-1 showing individuals ROM-41, ROM-53, ROM-52, ROM-42, ROM-23, ROM-5, ROM-15, ROM-47, ROM-55, ROM-20 and ROM-16 with their chromosome 10q haplotypes]

Figure 12: Pedigree of the large Romanian Gypsy family ROM-1. This figure shows the newly recruited Romanian Gypsy family. Haplotypes “a” and “i”, which occur in the Bulgarian HMSNR Gypsy kindred, were also present in this family. Additionally, this family contributed the haplotypes “n” and “o”. Haplotype “n” appears to be inherited within the family through individual ROM-53, while at the same time being newly introduced by the husband of individual ROM-52. White haplotypes denote non-HMSNR haplotypes.


1.5 THE AIMS OF THIS PHD PROJECT

At the beginning of this PhD project, the HMSNR gene had been positioned in a ~1 Mb region on chromosome 10q22, flanked by the markers bA86K9CA1 and D10S1742. The aim of this project was the positional cloning of the HMSNR gene. The project was built on two resources: firstly, the Gypsy families with their closely related haplotypes and, secondly, the public databases. Using the diversity of recombinations that had accumulated on the disease haplotype, the smallest interval of homozygosity was to be defined by mapping the recombination breakpoints; this interval is where the founder mutation must be located. Searching the public databases, which became more reliable as the Human Genome Project progressed, provided candidate genes that were then searched for mutations by sequencing. Each putative mutation was evaluated by testing its segregation with HMSNR and by checking whether it was a known polymorphism. Candidate mutations that were not excluded by these criteria were tested in control samples.
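The evaluation of putative mutations described above can be expressed as a small filtering pipeline. The sketch below is illustrative only and rests on assumptions not stated in the text: genotypes coded as the number of variant alleles (0, 1 or 2); a recessive model in which every affected individual must be homozygous and no unaffected individual may be; a set of already catalogued polymorphisms; and counts of homozygous healthy controls, with heterozygous controls tolerated because carriers are expected in a population with a founder mutation. All names and data are hypothetical.

```python
from typing import Dict, List, Mapping, Set


def segregates_recessive(genotypes: Mapping[str, int], affected: Set[str]) -> bool:
    """True if all affected are homozygous (2) and no unaffected individual is."""
    for individual, n_alleles in genotypes.items():
        if individual in affected and n_alleles != 2:
            return False
        if individual not in affected and n_alleles == 2:
            return False
    return True


def candidate_mutations(variants: Mapping[str, Mapping[str, int]],
                        affected: Set[str],
                        known_polymorphisms: Set[str],
                        control_homozygotes: Mapping[str, int]) -> List[str]:
    """Apply the exclusion criteria in order and keep the surviving variants."""
    kept = []
    for name, genotypes in variants.items():
        if not segregates_recessive(genotypes, affected):
            continue  # excluded: does not segregate with the disease
        if name in known_polymorphisms:
            continue  # excluded: catalogued polymorphism
        if control_homozygotes.get(name, 0) > 0:
            continue  # excluded: homozygous in healthy controls
        kept.append(name)
    return kept


if __name__ == "__main__":
    variants: Dict[str, Dict[str, int]] = {
        "var1": {"A1": 2, "A2": 2, "U1": 1, "U2": 0},
        "var2": {"A1": 2, "A2": 1, "U1": 0, "U2": 0},  # affected A2 heterozygous
        "var3": {"A1": 2, "A2": 2, "U1": 1, "U2": 1},  # listed as a known SNP
    }
    print(candidate_mutations(variants, {"A1", "A2"}, {"var3"}, {}))
```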

The specific aims of this project were:
1. Integrated physical-genetic map of the critical HMSNR gene region
2. Refined genetic mapping:
   - Haplotype analysis
   - Identification of recombinant haplotypes
   - Mapping recombination breakpoints
   - Defining the minimum region of homozygosity (critical gene region)
3. Identification of the HMSNR mutation


2 MATERIALS AND METHODS

2.1 MATERIALS

2.1.1 DNA samples

A total of 50 members of the HMSNR families participated in this study (Table 7). These families are of Gypsy ethnicity and reside in Bulgaria, Romania/France and Spain. Blood samples were collected prior to the start of this PhD, and DNA had been isolated from peripheral blood lymphocytes by co-workers according to a protocol by Miller et al. [154] or using the QIAamp DNA Blood kit (Qiagen) according to the manufacturer’s recommendations.

Table 7: Overview of the HMSNR families

Family | Number of individuals | Number of HMSNR-affected members | Number of unaffected members
BULG-1 | 23 | 16 | 7
BULG-2 | 3 | 1 | 2
ROM-1 | 15 | 5 | 10
ROM-2 | 3 | 1 | 2
Sp-4 | 3 | 1 | 2
Sp-5 | 3 | 1 | 2
Sum | 50 | 25 | 25

Furthermore, a population sample of 790 healthy Gypsies, a control sample of 54 Bulgarians originating from an earlier study on polycystic kidney disease conducted in the laboratory, and 144 samples from families with unclassified Charcot-Marie-Tooth disease were used in a population screen. DNA samples of the latter were referred to the laboratory by collaborating clinicians. Informed consent was obtained from all participating individuals. The study complied with the ethical guidelines of the institutions involved, and ethics approval was obtained (UWA Human Ethics Approval Number: 0611).

2.1.2 Tissue samples

Human peripheral nerve tissue was obtained from a patient undergoing surgery on a neuroma. RNA was isolated from a 2 cm piece of sciatic nerve, 4 cm proximal to the neuroma. This tissue was collected as part of a donor program organised to supply human nervous tissue for the study of peripheral neuropathies (UWA Ethics Approval number: N2003-057). Animal tissues were taken from mice and rats. Mice of the strain C57BL/6 were provided by Dr. Evan Ingley and rats by Dr. Bruno Meloni. The animals were part of projects conducted by Dr. Evan Ingley and Dr. Bruno Meloni for which ethics approval had been obtained (rats: UWA Ethics approval number: 01/100/096; mice: RPH ethics approval number: RPH 11/97). The animal tissues used in this PhD originated from animals that were sacrificed within the experiments of these projects.

2.1.3 Recipes for solutions

2.1.3.1 Media for cultivation of E. coli

LB medium (1 litre)

Tryptone: 10 g
Yeast extract: 5 g
NaCl: 5 g
ddH2O: add to 1 l

SOB medium (1 litre)

Tryptone: 20 g
Yeast extract: 5 g
NaCl: 0.5 g
1 M KCl: 2.5 ml
ddH2O: add to 990 ml

After autoclaving add: sterile 1 M MgCl2 solution 10 ml

SOC medium (1 litre)

Add to SOB medium after autoclaving: Sterile 1 M Glucose solution 20 ml

Agar plates

Add to medium before autoclaving: Agar agar 15 g/l

Add to medium after autoclaving, or spread on agar plates:
X-Gal (5-bromo-4-chloro-3-indolyl-β-D-galactoside): 80 µg/ml (plates) or 20 µl of 50 mg/ml stock (spreading)*
* The stock can be diluted with sterile LB medium for easier spreading.

Antibiotics

Add to medium after autoclaving and cooling to ~55 °C: ampicillin from a 1000 × stock to a final concentration of 100 µg/ml
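The additions above all follow the dilution relation C1 × V1 = C2 × V2. A minimal helper, with hypothetical function names, is sketched below. It assumes that concentrations are given in the same units on both sides and, like the recipes, neglects the small change in total volume caused by the addition. A 1000 × ampicillin stock for a final concentration of 100 µg/ml implies a stock of 100 mg/ml (100 000 µg/ml).

```python
def stock_volume(stock_conc: float, final_conc: float, final_volume: float) -> float:
    """Volume of stock to add (same volume unit as final_volume): V1 = C2 * V2 / C1."""
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be more concentrated than the final solution")
    return final_conc * final_volume / stock_conc


def final_concentration(stock_conc: float, added_volume: float, total_volume: float) -> float:
    """Concentration after dilution: C2 = C1 * V1 / V2."""
    return stock_conc * added_volume / total_volume


if __name__ == "__main__":
    # Ampicillin: 100 mg/ml (1000 x) stock to 100 ug/ml in 500 ml LB (ml of stock)
    print(stock_volume(100_000, 100, 500))
    # MgCl2 in SOB: 10 ml of 1 M (1000 mM) in a final 1000 ml (mM)
    print(final_concentration(1000, 10, 1000))
    # X-Gal: 80 ug/ml in 1000 ml of medium from a 50 mg/ml stock (ml of stock)
    print(stock_volume(50_000, 80, 1000))
```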

2.1.3.2 Media for cell culture

Schwann cell medium

Iscove’s modified DMEM, supplemented with:
Penicillin: 100 U/ml
Streptomycin: 100 µg/ml
Foetal calf serum: 10 %
Recombinant human β1-heregulin 177-244 (Genentech): 10 nM
Insulin: 2.5 µg/ml
3-Isobutyl-1-methylxanthine: 0.5 mM
Forskolin: 0.5 µM

Until the 1st passage, include:
Phytohemagglutinin: 0.25 µg/ml
Laminin: 0.5 µg/ml

Dorsal root ganglia (DRG) medium

DMEM/F12, supplemented with:
Penicillin: 100 U/ml
Streptomycin: 100 µg/ml
Foetal calf serum: 5 %

Dissociation mix

DMEM/F12, supplemented with:
Penicillin: 100 U/ml
Streptomycin: 100 µg/ml
Foetal calf serum: 5 %
Collagenase (Sigma): 0.1 %
BSA: 0.1 %
Trypsin: 0.1 %

2.1.3.3 Solutions for agarose gel electrophoresis

TAE Buffer 50 × (1 litre)

Tris base: 242 g
Glacial acetic acid: 57.1 ml
EDTA × 2 H2O (Na2-salt): 32.7 g
ddH2O: add to 1 l


TBE Buffer 10 × (1 litre)

Tris base: 108 g
Boric acid: 55 g
EDTA × 2 H2O (Na2-salt): 8.3 g
ddH2O: add to 1 l

Ficoll loading buffer [155]

Ficoll: 15 % (w/v)
Bromophenol Blue: 0.25 % (w/v)
Xylene cyanol: 0.25 % (w/v)
Note: The amounts of Bromophenol Blue and Xylene cyanol can be halved, as long as the Ficoll concentration is maintained at 15 %.

Formamide (FA) gels for RNA (100 ml) (according to Qiagen)

Agarose: 1.2 g
10 × FA gel buffer: 10 ml
RNase-free water: add to 100 ml

After melting the agarose and cooling to ~65 °C, add:
37 % (12.3 M) formaldehyde: 1.8 ml
Ethidium bromide (10 mg/ml): 1 µl

10 × FA gel buffer pH 7.0 (NaOH)

MOPS (free acid): 200 mM
Na-acetate: 50 mM
EDTA × 2 H2O (Na2-salt): 10 mM

1 × FA gel running buffer

10 × FA gel buffer: 100 ml
37 % (12.3 M) formaldehyde: 20 ml
RNase-free water: 880 ml

5 × RNA loading buffer

Saturated aqueous bromophenol blue solution: 16 µl
500 mM EDTA × 2 H2O (Na2-salt), pH 8.0: 80 µl
37 % (12.3 M) formaldehyde: 720 µl
100 % glycerol: 2 ml
Formamide: 3084 µl
10 × FA gel buffer: 4 ml
RNase-free water: add to 10 ml

2.1.3.4 Solutions for polyacrylamide gel electrophoresis

5 % polyacrylamide gels for sequencing and GeneScan

Urea: 18 g
10 × TBE: 5 ml
50 % Long Ranger® (FMC BioProducts): 5 ml
ddH2O: add to 50 ml
Before pouring the gel, add:
10 % APS: 250 µl
TEMED: 35 µl

Loading buffer for polyacrylamide gels

Blue Dextran EDTA dye:
EDTA × 2 H2O (Na2-salt): 50 mM
Blue Dextran: 50 mg/ml

Loading buffer for sequencing reactions:
Blue Dextran EDTA: 1 µl
Formamide: 5 µl

Loading buffer for genotyping reactions:
Blue Dextran EDTA: 1 µl
GeneScan™-500 TAMRA™ size standard (Applied Biosystems): 2 µl
Formamide: 10 µl

2.1.3.5 Solutions for cDNA library screen [155]

Denaturing solution

NaOH: 0.5 N
NaCl: 1.5 M

Neutralising solution

NaCl: 1.5 M
Tris-Cl (pH 7.4): 0.5 M

20 × SSPE (pH 7.4)

NaCl: 3 M
NaH2PO4 × H2O: 0.2 M
EDTA: 0.02 M


20 × SSC (pH 7.4)

NaCl: 3 M
Na-citrate, pH 7: 0.3 M

Pre-washing solution

SSC: 5 ×
SDS: 0.5 %
EDTA, pH 8.0: 1 mM
Note: The pre-washing solution was made as suggested in an older edition of Sambrook et al. [155]. The new edition suggests washing in 6 × SSC or 6 × SSPE.

50 × Denhardt’s reagent in H2O [156]

Ficoll 400: 1 % (w/v)
Polyvinylpyrrolidone: 1 % (w/v)
Bovine serum albumin (Fraction V): 1 % (w/v)
Note: Store the stock at -20 °C.

Pre-hybridisation buffer

SSC: 6 ×
Denhardt’s reagent: 5 ×
SDS: 0.5 %
Denatured salmon sperm DNA (Sigma): 20 µg/ml

2.1.3.6 Wash solutions (Northern blot and cDNA library screen)

Wash solutions used for hybridisation with ULTRAhyb™ (Ambion) and the cDNA library screen

Wash 1: 2 × SSC, 0.1 % (w/v) SDS
Wash 2: 1 × SSC, 0.1 % (w/v) SDS
Wash 3: 0.1 × SSC, 0.1 % (w/v) SDS

Wash solutions used for hybridisation with Expresshyb (Clontech)

Wash 1: 2 × SSC, 0.05 % (w/v) SDS
Wash 2: 0.1 × SSC, 0.1 % (w/v) SDS

2.1.3.7 Solutions for immunohistochemistry

TBS Buffer

Tris, pH 7.5: 10 mM
NaCl: 200 mM
Azide: 0.02 %

2.2 METHODS

2.2.1 Culture methods

2.2.1.1 Preparation of chemically competent cells

One Shot TOP10 chemically competent E. coli cells (Invitrogen/Life Technologies) were used to prepare competent cells by the calcium chloride method. A 50 µl aliquot of purchased One Shot TOP10 E. coli was inoculated into 1 ml of LB and incubated with shaking at 37 °C overnight. 100 µl of this culture were inoculated into 10 ml LB medium and incubated with shaking at 37 °C for 1 to 2 h, or until the optical density at 660 nm (OD660, determined using a Perkin Elmer MBA2000 or an Eppendorf BioPhotometer spectrophotometer) was between 0.6 and 1. The cells were cooled on ice and centrifuged at 2500 rpm and 4 °C in a 5810R centrifuge (Eppendorf), and the supernatant was discarded. The resulting cell pellet was resuspended in 5 ml of sterile ice-cold 0.1 M calcium chloride solution and left on ice for 40 min. The cells were centrifuged again at 2500 rpm and 4 °C, resuspended in 1 ml of sterile ice-cold 0.1 M calcium chloride solution and divided on ice into 50 µl aliquots, which were stored at -80 °C.
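The centrifugation steps in this chapter are specified in rpm, which only corresponds to a defined relative centrifugal force (RCF, × g) for a given rotor radius. The standard conversion is RCF = 1.118 × 10⁻⁵ × r × rpm², with r in cm. The sketch below implements it; the 15 cm radius in the example is a hypothetical value, not the radius of the rotors actually used, which the protocols do not state.

```python
import math


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) for a rotor radius given in cm."""
    return 1.118e-5 * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Inverse conversion: rpm needed to reach a given RCF at radius_cm."""
    return math.sqrt(rcf / (1.118e-5 * radius_cm))


if __name__ == "__main__":
    radius = 15.0  # hypothetical rotor radius in cm
    for rpm in (600, 1000, 2500):  # speeds used in the protocols of this chapter
        print(rpm, "rpm ->", round(rcf_from_rpm(rpm, radius)), "x g")
```

Stating centrifugation conditions as RCF rather than rpm makes a protocol transferable between centrifuges with different rotors.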

2.2.1.2 Isolation and culture of Schwann cells and DRGs

Schwann cells and DRGs from rats and mice were isolated and cultured using a method adapted from a protocol by Li [146]. All centrifugation steps were carried out in 15 ml polypropylene tubes in a 5810 centrifuge (Eppendorf) at room temperature. After the sciatic nerves or DRGs were dissected from the animals, they were stored in DRG medium until further processing, for at most one night. For isolation of Schwann cells and DRGs, the DRGs or sciatic nerves were cut into small pieces and sedimented for 3 min at 1000 rpm. 1 ml of dissociation mix was added to the pellet, and the tubes were incubated at 37 °C for 1 h in a water bath with gentle shaking every 20 min. The mix was then resuspended by pipetting up and down with a P1000 pipette. Afterwards, 2 ml of DMEM/F12 were added and the mix was centrifuged for 3 min at 1000 rpm. This step was repeated twice, and the pellet was resuspended in 200 µl DMEM/F12. For cultivation of Schwann cells from sciatic nerve, the 200 µl were resuspended in 2 ml of Schwann cell medium, distributed into cell culture dishes and incubated at 37 °C in 5 % CO2 (Heraeus incubator). For cultivation of DRG neurons, the 200 µl were pipetted onto a gradient of 8 ml 5 % BSA in DMEM/F12 overlaid with 4 ml DMEM/F12. Density centrifugation was carried out at 600 rpm for 1 min. From the pellet, Schwann cells could be cultivated by plating it out in Schwann cell medium. The pellet was washed twice with DMEM/F12, resuspended in DRG medium, plated out into flasks and incubated at 37 °C in 5 % CO2. Cells were fed and replated into a new flask or culture dish when necessary, depending on cell density.

2.2.2 Molecular biological methods

2.2.2.1 Polymerase chain reaction (PCR)

PCR primers were designed either using primer finder [157] or manually, fulfilling the following criteria: primer length between 17 and 23 nucleotides, a G or C at the 3’ end, no runs of four or more identical bases, and similar GC content and annealing temperature for both primers of a pair. All primers were purchased from Geneworks. A typical 50 µl reaction mix contained:

ddH2O: 24.8 µl
10 × buffer containing 1.5 mM MgCl2: 5 µl
5 mM dNTPs: 5 µl
Forward primer (20 ng/µl): 5 µl
Reverse primer (20 ng/µl): 5 µl
Taq polymerase (5 U/µl): 0.2 µl
Template (genomic DNA, 10 ng/µl): 5 µl
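The manual primer design criteria listed above can be checked programmatically. The sketch below is an illustration and not the primer finder tool cited: it tests length, the 3’ G/C and runs of identical bases, and compares GC content and a melting temperature estimated with the Wallace rule (Tm ≈ 2 °C × (A + T) + 4 °C × (G + C)), which is only a rough approximation for short oligonucleotides. The tolerances for a "similar" pair (2 °C and 10 % GC) are assumptions, since the original criteria give no numbers; the example primers are hypothetical.

```python
import re
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> float:
    """Rough Tm estimate in degrees C: 2 x (A+T) + 4 x (G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def primer_problems(seq: str) -> List[str]:
    """List the design criteria that a single primer violates."""
    seq = seq.upper()
    problems = []
    if not 17 <= len(seq) <= 23:
        problems.append("length outside 17-23 nt")
    if seq[-1] not in "GC":
        problems.append("no G or C at the 3' end")
    if re.search(r"([ACGT])\1{3,}", seq):  # one base followed by 3+ copies of itself
        problems.append("run of four or more identical bases")
    return problems


def pair_is_balanced(fwd: str, rev: str, max_tm_diff: float = 2.0,
                     max_gc_diff: float = 0.10) -> Tuple[bool, float]:
    """Compare Tm and GC content of a primer pair (tolerances are assumptions)."""
    tm_diff = abs(wallace_tm(fwd) - wallace_tm(rev))
    gc_diff = abs(gc_fraction(fwd) - gc_fraction(rev))
    return tm_diff <= max_tm_diff and gc_diff <= max_gc_diff, tm_diff


if __name__ == "__main__":
    fwd, rev = "AGCTTGACCTGAAGCTGAC", "AAAAGCTTGACCTGAAGCT"  # hypothetical primers
    print(fwd, primer_problems(fwd), wallace_tm(fwd))
    print(rev, primer_problems(rev))
    print(pair_is_balanced(fwd, rev))
```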

All reactions were performed on 96-well thermal cyclers of the type Mastercycler®, Mastercycler® gradient (Eppendorf) or PTC-100 (MJ Research). Standard cycling programmes used either annealing at 55 °C or touchdown annealing from 63 °C to 55 °C, with the following conditions:

Annealing at 55 °C:
94 °C 5 min
35 cycles of: 94 °C 30 s, 55 °C 30 s, 72 °C 30 s
72 °C 7 min

Touchdown 63 °C to 55 °C:
94 °C 5 min
15 cycles of: 94 °C 30 s, 63 °C (-0.5 °C per cycle) 30 s, 72 °C 30 s
20 cycles of: 94 °C 30 s, 55 °C 30 s, 72 °C 30 s
72 °C 7 min

PCR reactions were optimised by changing the composition of the reaction mixture or the cycling conditions. Alterations of the reaction mixture included the addition of reagents that modify annealing dynamics, such as Q-solution (Qiagen); the use of hot-start enzymes such as HotStarTaq polymerase, with the initial denaturation extended to 15 min; or the titration of MgCl2 at concentrations between 1 mM and 3 mM using reaction buffer without MgCl2. The 72 °C extension step was prolonged for longer PCR products, on the basis that Taq polymerase synthesises approximately 1 kb per minute. If PCR products displayed additional non-specific bands, the annealing temperature was increased; in these cases the touchdown programme was run from 68 °C to 60 °C. Conversely, if PCR products were faint or no amplification could be detected, annealing temperatures were decreased. To adjust the annealing temperature, a gradient thermal cycler (Mastercycler® gradient, Eppendorf) was used in some cases to test a range of temperatures.
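The touchdown programme and the length-dependent extension step can be generated from a few parameters. In this sketch the first 15 cycles start at the upper temperature and drop by 0.5 °C per cycle, followed by 20 cycles at the final annealing temperature, mirroring the 63 °C to 55 °C programme; the extension time assumes the approximate Taq synthesis rate of 1 kb per minute, with the standard 30 s as a minimum. Applying the same cycle structure to the 68 °C to 60 °C variant is an assumption, and the function names are hypothetical.

```python
import math
from typing import List


def touchdown_annealing(start_c: float, final_c: float, step_c: float = 0.5,
                        touchdown_cycles: int = 15, plateau_cycles: int = 20) -> List[float]:
    """Annealing temperature for every cycle of a touchdown programme."""
    temps = [round(start_c - step_c * i, 2) for i in range(touchdown_cycles)]
    return temps + [final_c] * plateau_cycles


def extension_seconds(product_bp: int, bp_per_min: float = 1000.0,
                      minimum_s: int = 30) -> int:
    """Extension time at 72 C, assuming roughly 1 kb per minute for Taq polymerase."""
    return max(minimum_s, math.ceil(product_bp / bp_per_min * 60))


if __name__ == "__main__":
    standard = touchdown_annealing(63.0, 55.0)
    print(len(standard), standard[0], standard[14], standard[15])
    stringent = touchdown_annealing(68.0, 60.0)  # variant for non-specific bands
    print(stringent[:3], stringent[-1])
    for size in (300, 1500, 2500):
        print(size, "bp ->", extension_seconds(size), "s")
```

With 15 steps of 0.5 °C starting at 63 °C, the touchdown phase ends at 56 °C, one step above the 55 °C used for the remaining 20 cycles.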

2.2.2.2 Genotyping

To type microsatellite markers, PCR primers were designed to amplify a 100 to 300 bp fragment encompassing the microsatellite. All primers were synthesised by Geneworks. A fluorescent label (HEX, FAM or TET) was attached to the 5’ end of each forward primer. A typical PCR reaction contained:
ddH2O: 1.5 µl
10 × buffer without MgCl2: 0.5 µl
5 mM dNTPs: 0.5 µl
Forward primer (200 ng/µl): 0.5 µl
Reverse primer (200 ng/µl): 0.5 µl
1.5 mM MgCl2: 0.3 µl
Taq polymerase (5 U/µl): 0.05 µl
Template (genomic DNA, 10 ng/µl): 2 µl
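When many samples are genotyped, the components other than template are usually combined into a master mix. The sketch below scales the per-reaction volumes of the genotyping recipe to n reactions plus an excess to cover pipetting losses; the 10 % excess and the function name are assumptions, not part of the original protocol, and template DNA is added to each well separately.

```python
from typing import Dict

# Per-reaction volumes (ul) from the genotyping recipe; template (2 ul) added per well.
GENOTYPING_MIX_UL: Dict[str, float] = {
    "ddH2O": 1.5,
    "10x buffer without MgCl2": 0.5,
    "5 mM dNTPs": 0.5,
    "forward primer (200 ng/ul)": 0.5,
    "reverse primer (200 ng/ul)": 0.5,
    "MgCl2 (as listed, 1.5 mM)": 0.3,
    "Taq polymerase (5 U/ul)": 0.05,
}


def master_mix(per_reaction_ul: Dict[str, float], n_reactions: int,
               excess: float = 0.10) -> Dict[str, float]:
    """Scale per-reaction volumes to n reactions plus a fractional excess."""
    factor = n_reactions * (1.0 + excess)
    return {name: round(vol * factor, 2) for name, vol in per_reaction_ul.items()}


if __name__ == "__main__":
    mix = master_mix(GENOTYPING_MIX_UL, 96)
    for name, vol in mix.items():
        print(f"{name:30s} {vol:8.2f} ul")
    per_well = round(sum(GENOTYPING_MIX_UL.values()), 2)
    print("dispense per well:", per_well, "ul of mix + 2 ul template")
```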

Optimisation of PCR reactions was performed as stated previously. PCR products of different size or label could be pooled. Between 0.5 and 1.5 µl of PCR product were mixed with 3 µl of loading buffer and fragments were separated on an ABI PRISM 377 DNA Analyzer (Applied Biosystems), using denaturing 5.0 % polyacrylamide gels. The genotyping data were processed using GENOTYPER version 2.5.
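Allele calling from the sized fragments amounts to binning each fragment length into a repeat count. The binning performed by GENOTYPER is not reproduced here; the sketch below is a simplified illustration that assumes a dinucleotide (CA) repeat, as suggested by the CA in several marker names, expresses alleles as repeat-unit offsets from a reference allele, and flags sizes lying more than 0.5 bp from any expected position. The function name and fragment sizes are hypothetical.

```python
from typing import List, Optional, Sequence


def call_alleles(sizes_bp: Sequence[float], reference_bp: float,
                 repeat_unit: int = 2, tolerance_bp: float = 0.5) -> List[Optional[int]]:
    """Repeat-count offset of each fragment relative to a reference allele.

    None marks a fragment that does not fall within tolerance_bp of a bin.
    """
    calls: List[Optional[int]] = []
    for size in sizes_bp:
        offset = (size - reference_bp) / repeat_unit
        nearest = round(offset)
        if abs(offset - nearest) * repeat_unit > tolerance_bp:
            calls.append(None)  # off-bin peak: check manually
        else:
            calls.append(nearest)
    return calls


if __name__ == "__main__":
    # Hypothetical fragment sizes for one marker, relative to a 150 bp allele
    print(call_alleles([150.2, 152.1, 155.0, 147.9], reference_bp=150.0))
```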

2.2.2.3 Agarose gel electrophoresis

DNA was size-separated on agarose gels (DNA grade, Progen) prepared with TAE or TBE buffer and run in the submerged horizontal gel systems Sub-Cell Model 96/192 (Bio-Rad), Model B1 EasyCast™ Mini Gel Electrophoresis System (Owl Scientific) or Horizon 58 (Life Technologies) at voltages between 80 and 100 V. The size standards used are listed in Table 8. TBE-buffered 1 % gels were used to test RNA for degradation and to separate large PCR products. The success of PCR reactions was examined on 2 % TAE or TBE gels. 3 or 4 % gels were used to separate fragments generated by restriction digestion or tetra-primer ARMS PCR. To allow easier melting of the agarose and higher resolution, 4 % gels were prepared with one part MetaPhor agarose and three parts standard agarose. Ethidium bromide (MP Biomedicals, formerly ICN Biochemicals; stock 10 mg/ml) was added to agarose solutions to a final concentration of 0.3 µg/ml. Ethidium bromide-stained DNA and RNA fragments were visualised using a UV transilluminator (Spectroline) and photographed using a digital camera in conjunction with the Kodak 1D software version 3.5.4 (Kodak Scientific Imaging Systems).

53 Chapter 2: Materials and Methods

Table 8: DNA ladders used for agarose gel electrophoresis (marker; concentration in loading buffer; fragment sizes in bp)
pUC19 (Fisher Biotech); 250 ng/µl; 501/489, 404, 331, 242, 190, 147, 111/110, 67, 34, 26
1 kb ladder (Promega); 250 ng/µl; 10000, 8000, 6000, 5000, 4000, 3000, 2500, 2000, 1500, 1000, 750, 500, 250/253
GeneRuler™ 1 kb DNA Ladder (Fermentas Life Sciences); 250 ng/µl; 10000, 8000, 6000, 5000, 4000, 3500, 3000, 2500, 2000, 1500, 1000, 750, 500, 250

2.2.2.4 PCR and plasmid purification

PCR products were purified for sequencing or cloning using the following kits according to the manufacturers’ protocols: the QIAquick PCR purification kit, the QIAquick 96 PCR purification kit and the QIAquick gel extraction kit (all Qiagen), and the GFX PCR and gel band purification kit and the GFX 96 PCR purification kit (both Amersham Biosciences). Elution was performed in volumes of up to 50 µl of ddH2O (Baxter); lower volumes were used to concentrate PCR products. Plasmids from 5 ml overnight E. coli cultures were purified using the QIAprep Plasmid Mini kit (Qiagen) according to the manufacturer’s recommendations.

2.2.2.5 DNA sequencing

Plasmids or PCR products were purified (see above), and cycle sequencing reactions were set up using either the BigDye Terminator cycle sequencing kit (Applied Biosystems) or the ET Terminator cycle sequencing kit (Amersham Biosciences). BigDye Terminator could be used in combination with enhancing buffers, namely BetterBuffer (Microzone) or BigBuffer (Applied Biosystems). Approximate concentrations of DNA templates were estimated from agarose gels prior to purification; the amount used in the cycle sequencing reaction was increased for templates that stained weakly on agarose gels and decreased for templates that stained strongly. Cycle sequencing was performed in 96-well thermal cyclers (Mastercycler® or Mastercycler® gradient, Eppendorf; PTC-200, MJ Research). The composition of the sequencing reactions and the cycling conditions were as follows:

BigDye 2.0: primer (20 ng/µl) 1 µl; BigDye 2.0 4 µl; template 1-4 µl; ddH2O to 10 µl
ET Terminator: primer (20 ng/µl) 1 µl; ET terminator 1 µl; template 3-4 µl; ddH2O to 10 µl
BigDye 3.0: primer (20 ng/µl) 1 µl; BigDye 3.0 4 µl; BetterBuffer 2 µl; template 3-4 µl; ddH2O to 10 µl
BigDye 3.1: primer (20 ng/µl) 1 µl; BigDye 3.1 2 µl; BigBuffer 1 µl; template 3-4 µl; ddH2O to 10 µl

Cycle sequencing conditions for BigDye versions 2.0, 3.0 and 3.1:
96 °C for 1 min (BigDye 3.1 only)
25 cycles of 96 °C for 10 s, 50 °C for 5 s and 60 °C for 4 min

Cycle sequencing conditions for ET Terminator:
25 cycles of 95 °C for 20 s, 50 °C for 15 s and 60 °C for 1 min

After cycle sequencing was completed, the reactions were subjected to ethanol precipitation in a 96-well format according to the protocol supplied by Applied Biosystems. For reactions set up with BigDye versions 2.0/3.0 and ET Terminator, this involved addition of 1 µl Na-acetate (pH 4.9) and 25 µl non-denatured ethanol, centrifugation at 2500 × g for 30 min, drying of the inverted tubes on tissue by centrifugation at 600 × g for 1 min, addition of 150 µl 70 % ethanol, centrifugation at 2500 × g for 10 min and a final drying step. For BigDye version 3.1, 2.5 µl of 125 mM EDTA were added before the Na-acetate and ethanol, drying steps were performed at 185 × g, the volume of 70 % ethanol was reduced to 30 µl, and the subsequent centrifugation was performed at 1650 × g for 15 min. All centrifugation steps were performed at room temperature in 5810 or 5810R centrifuges (Eppendorf). 4 µl of loading buffer were added to each reaction, and the reactions were denatured at 96 °C for 3 min in a thermal cycler before being run on denaturing 5 % polyacrylamide gels on an ABI 377 DNA Analyzer (Applied Biosystems). Sequences were analysed using the software Sequence Navigator 1.0.1 (Applied Biosystems).
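The precipitation protocol specifies relative centrifugal force, whereas some centrifuges are set in rpm. The standard conversion is RCF = 1.118 × 10^-5 × r × rpm², with r the rotor radius in cm. The sketch below applies it; the 15 cm radius is a hypothetical value, and the real figure should be taken from the specification of the plate rotor used.

```python
import math

# Standard conversion between relative centrifugal force and rotor speed:
# RCF = 1.118e-5 * r * rpm**2, with r in cm measured to the sample.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Hypothetical radius; take the real value from the plate rotor's specification.
RADIUS_CM = 15.0

if __name__ == "__main__":
    for g_force in (2500, 1650, 600, 185):
        print(f"{g_force:5d} x g -> {rpm_from_rcf(g_force, RADIUS_CM):6.0f} rpm")
```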

2.2.2.6 Typing of sequence variants

Restriction fragment length polymorphism (RFLP) analysis

Restriction digests were mainly used for typing SNPs, but also for determining whether plasmids isolated from positive colonies identified in the cDNA library screen contained inserts. All restriction enzymes were purchased from New England Biolabs or Promega. To set up a restriction digest for SNP typing, the sequences of the two SNP alleles, each including about 10 bp of flanking sequence, were entered into the programme Webcutter 2.0 [158], and all enzymes with recognition sites of four or more bp that cut either sequence were identified. A suitable enzyme was then selected from those that cut only one of the two sequences. Enzymes for which the base change in question would create a restriction site were preferred over those for which it would abolish one. In addition, enzymes with longer recognition sites were preferred because they cut less frequently, creating fewer fragments that could be separated unambiguously on agarose. Cost and availability were further criteria. The sequence of the PCR fragment to be digested was then entered into Webcutter 2.0 to determine the expected fragment sizes. A standard 20 µl restriction digest contained:
ddH2O: to 20 µl
Enzyme (2.5 U): x µl
10 × enzyme buffer: 2 µl
PCR product: 10 µl

The amount of PCR product was increased if more than three restriction fragments were expected, and, if necessary, the reaction was optimised by testing several volumes of PCR product. For digesting the plasmids isolated from the cDNA library screen, the following mix was used:
ddH2O: to 20 µl
EcoRI (20 U): 1 µl
BamHI (20 U): 1 µl
10 × Eco buffer: 2 µl
Plasmid: 2 µl

All digests were incubated for 6 to 12 h under the conditions recommended by the manufacturer (temperature and, in some cases, additives such as BSA). The completed digest was mixed with 3 µl of Ficoll loading buffer, loaded onto 3 or 4 % agarose gels for SNP typing or 1 % agarose gels for plasmid digests, and run at 90 to 100 V for up to 1 h.
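The enzyme selection criteria described for SNP typing (cut only one allele, prefer site-creating changes, prefer longer recognition sites) can be expressed as a short search. The sketch below is an illustration of that logic, not a replacement for Webcutter 2.0, which was actually used; the small enzyme table, the example sequences and the reference/alternative allele labels are assumptions. The enzymes listed all have palindromic sites, so scanning one strand is sufficient.

```python
# Recognition site (5'->3') and top-strand cut offset for a few common enzymes.
# All of these sites are palindromic, so scanning one strand finds every site.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "BamHI": ("GGATCC", 1),
    "HindIII": ("AAGCTT", 1),
    "AluI": ("AGCT", 2),
    "HaeIII": ("GGCC", 2),
    "MspI": ("CCGG", 1),
    "TaqI": ("TCGA", 1),
}


def find_sites(seq: str, site: str) -> list:
    """Start positions (0-based) of every occurrence of site in seq."""
    seq = seq.upper()
    hits, pos = [], seq.find(site)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(site, pos + 1)
    return hits


def discriminating_enzymes(ref: str, alt: str) -> list:
    """Enzymes that cut exactly one allele, as (name, creates_site) pairs.

    creates_site is True when the site is present only in the alt allele.
    Ranking: site-creating enzymes first, then longer recognition sites.
    """
    candidates = []
    for name, (site, _) in ENZYMES.items():
        cuts_ref = bool(find_sites(ref, site))
        cuts_alt = bool(find_sites(alt, site))
        if cuts_ref != cuts_alt:
            candidates.append((name, cuts_alt))
    return sorted(candidates, key=lambda c: (c[1], len(ENZYMES[c[0]][0])), reverse=True)


def fragment_sizes(seq: str, name: str) -> list:
    """Fragment lengths (bp) after complete digestion of a linear fragment."""
    site, offset = ENZYMES[name]
    cuts = [pos + offset for pos in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical C>T SNP with 10 bp of flanking sequence on each side.
ref = "TTACGTCAGT" + "GAACTC" + "CATGTTACAG"
alt = "TTACGTCAGT" + "GAATTC" + "CATGTTACAG"
print(discriminating_enzymes(ref, alt))
print(fragment_sizes(ref, "EcoRI"), fragment_sizes(alt, "EcoRI"))
```

In this hypothetical example the C>T change creates an EcoRI site, so the T allele is cut into 11 and 15 bp pieces while the C allele stays intact at 26 bp. For a real assay, `fragment_sizes` would be applied to the full PCR product.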

Tetra-primer-ARMS PCR

Tetra-primer-ARMS PCR [159] was used for typing three SNPs for which restriction digests were not possible. The technique combines ARMS-PCR [160], Bi-PASA PCR [161] and Tetra-primer PCR [162]. It uses four primers of 26 bp or longer in a single PCR reaction: two outer and two inner primers. The outer primer pair has a higher annealing temperature, and PCR conditions are chosen such that, in the first cycles, the outer primers generate the template for the inner primers. The two inner primers anneal to opposite strands, one on each side of the sequence variant, with their 3’ ends at the SNP position; allele specificity is achieved because the 3’ terminal base of each inner primer matches only one of the two alleles. To enhance allele specificity, the inner primers additionally contain a deliberate mismatch at position -2 from the 3’ end. Each inner primer generates an allele-specific product together with the outer primer on the opposite strand, and the primers are positioned so that the two allele-specific products differ in size and can be separated on agarose gels. Software to design optimal primers for Tetra-primer-ARMS PCR is available on the internet [159, 163]. The PCR reactions were prepared as follows, and a touchdown programme (see below) was used for amplification. After PCR cycling was completed, each reaction was mixed with 3 µl of Ficoll loading buffer and run on a 4 % agarose gel.

ddH2O: 0.8 µl
10 × buffer, 1.5 mM MgCl2: 2 µl
5 mM dNTPs: 2 µl
Outer forward primer (20 ng/µl): 2 µl
Outer reverse primer (20 ng/µl): 2 µl
Inner forward primer (20 ng/µl): 2 µl
Inner reverse primer (20 ng/µl): 2 µl
5 × Q-Solution: 4 µl
Taq polymerase (5 U/µl): 0.2 µl
Template (genomic DNA, 10 ng/µl): 3 µl
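The band pattern of a Tetra-primer-ARMS PCR follows from where the primers sit on the outer amplicon: the outer product is always present as an internal control, and each allele adds its own shorter product. The sketch below computes these sizes under the geometry described above (both inner primers ending with their 3’ base on the SNP). The amplicon length, SNP position, primer length and the assignment of alleles "A" and "G" to the inner primers are hypothetical.

```python
def tetra_arms_bands(amplicon_len: int, snp_pos: int,
                     inner_fwd_len: int, inner_rev_len: int) -> tuple:
    """Sizes (bp) of the outer and the two allele-specific products.

    Coordinates are 0-based on the top strand of the outer amplicon. Both inner
    primers are assumed to end with their 3' base on the SNP (snp_pos): the
    inner forward primer covers snp_pos - inner_fwd_len + 1 .. snp_pos, the
    inner reverse primer covers snp_pos .. snp_pos + inner_rev_len - 1.
    """
    if not (inner_fwd_len - 1 <= snp_pos <= amplicon_len - inner_rev_len):
        raise ValueError("inner primers must lie within the outer amplicon")
    outer = amplicon_len
    fwd_product = amplicon_len - (snp_pos - inner_fwd_len + 1)  # inner fwd -> outer rev
    rev_product = snp_pos + inner_rev_len                        # outer fwd -> inner rev
    return outer, fwd_product, rev_product


def expected_bands(genotype: str, amplicon_len: int, snp_pos: int, inner_len: int = 26,
                   fwd_allele: str = "A", rev_allele: str = "G") -> list:
    """Bands expected on the gel for a genotype such as "AA", "AG" or "GG".

    Which allele each inner primer detects is a hypothetical design choice.
    """
    outer, fwd_product, rev_product = tetra_arms_bands(amplicon_len, snp_pos, inner_len, inner_len)
    bands = {outer}  # the outer product serves as an internal amplification control
    if fwd_allele in genotype:
        bands.add(fwd_product)
    if rev_allele in genotype:
        bands.add(rev_product)
    return sorted(bands, reverse=True)


# Hypothetical design: 400 bp outer product, SNP at position 150, 26-mer inner primers.
for gt in ("AA", "AG", "GG"):
    print(gt, expected_bands(gt, 400, 150))
```

Placing the SNP off-centre, as here, keeps the two allele-specific products (275 and 176 bp) far enough apart to resolve on a 4 % gel.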

Touchdown programme (68 °C to 61 °C):
94 °C for 4 min
15 cycles of 94 °C for 1 min, 68 °C for 1 min (decreasing by 0.5 °C per cycle) and 72 °C for 1 min
20 cycles of 94 °C for 1 min, 61 °C for 1 min and 72 °C for 1 min
72 °C for 2 min
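Listing the annealing temperature of every cycle shows how the two phases of the touchdown programme fit together. The sketch below assumes that the first touchdown cycle anneals at 68 °C and each later cycle 0.5 °C lower. Under that assumption, the fifteenth touchdown cycle reaches exactly 61 °C, so the programme continues into the 20 cycles at 61 °C without a jump.

```python
def touchdown_annealing(start_c: float = 68.0, step_c: float = 0.5, touchdown_cycles: int = 15,
                        final_c: float = 61.0, final_cycles: int = 20) -> list:
    """Annealing temperature (°C) of every cycle of the programme above.

    Assumes the first touchdown cycle anneals at start_c and each later one
    is step_c lower.
    """
    touchdown = [start_c - step_c * i for i in range(touchdown_cycles)]
    return touchdown + [final_c] * final_cycles


temps = touchdown_annealing()
print(f"{len(temps)} cycles; last touchdown cycle at {temps[14]} °C; then {temps[15]} °C")
```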

Typing of Insertion/deletions (indels)

The majority of insertion/deletions were typed by direct sequencing. However, one 27 bp insertion/deletion could be typed by PCR amplification followed by separation of the resulting fragments on 4 % agarose gels.

2.2.2.7 Cloning of PCR fragments

For sequencing, PCR products were cloned into the pCR®2.1-TOPO® vector using the TA TOPO® cloning kit (both Invitrogen/Life Technologies). TOPO® cloning relies on topoisomerase I bound to the vector, which makes the use of a ligase unnecessary. The cloning reaction was set up according to the manufacturer’s protocol using up to 4 µl of purified, concentrated PCR product. The reaction mix was transformed into One Shot TOP10 chemically competent E. coli cells (Invitrogen/Life Technologies) as described in the protocol. With One Shot TOP10 cells, blue/white selection can be performed without adding IPTG to the medium.

2.2.2.8 Isolation of total RNA

Total RNA was isolated from human peripheral nerve; from mouse sciatic nerve, dorsal root ganglia, brain, testis, liver, heart and spleen; and from rat sciatic nerve, brain and dorsal root ganglia. Rat and mouse tissues were dissected, placed in cryogenic tubes (cryo.s, 2 ml, round bottom, external thread, with starfoot; Greiner Bio-One) and transferred directly onto dry ice or, if available, into liquid nitrogen. Human nerve was removed by a surgeon and stored on dry ice. For long-term storage, tissues were kept at -80 °C, except for human peripheral nerve, which was kept in liquid nitrogen. Frozen human and rat peripheral nerve were ground in custom-made containers using a bowl mill (Retsch) for 2 min at maximum speed; both nerve and containers were pre-cooled in liquid nitrogen for this process. The other tissues were ground with mortar and pestle under liquid nitrogen. In each case, the resulting powder was suspended in Trizol® (Invitrogen/Life Technologies), transferred into a 1.5 ml or 2.0 ml tube, frozen on dry ice and stored at -80 °C until RNA isolation. Cultured Schwann cells were prepared for RNA isolation by adding Trizol® to the culture dish according to the Trizol® protocol and stored at -80 °C until RNA isolation. For RNA isolation, the Trizol® mixture was thawed and either processed according to the Trizol® protocol, or the aqueous phase was transferred onto RNeasy columns (Qiagen) and treated with DNase (RNase-free DNase set, Qiagen) according to the manufacturer’s recommendations, after which isolation proceeded as described in the RNeasy protocol. The concentration of the resulting RNA was determined by measuring absorbance at 260 nm with a spectrophotometer (Perkin Elmer MBA2000 or Eppendorf BioPhotometer). RNA integrity was tested by running 3 µl on a 1 % agarose gel, on which high-quality RNA displays two strong, characteristic ribosomal bands of approximately 2 and 5 kb.
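The A260 reading is converted to a concentration with the conventional factor for single-stranded RNA (A260 of 1.0 at 1 cm path length corresponds to about 40 µg/ml). From that concentration follows the volume needed for a fixed input such as the 2 µg used per MMLV-RT reaction or the 5 µg loaded per Northern blot lane. The reading and dilution in the sketch are hypothetical.

```python
# Conventional conversion for single-stranded RNA: A260 = 1.0 (1 cm path) ~ 40 µg/ml.
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """RNA concentration (µg/ml) of the undiluted sample."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def volume_for_ug(target_ug: float, conc_ug_per_ml: float) -> float:
    """Volume (µl) of RNA solution containing target_ug."""
    return target_ug / conc_ug_per_ml * 1000.0


# Hypothetical reading: A260 = 0.25 for a 1:50 dilution.
conc = rna_ug_per_ml(0.25, 50)
print(f"{conc:.0f} µg/ml; 2 µg (RT) = {volume_for_ug(2, conc):.1f} µl; "
      f"5 µg (Northern lane) = {volume_for_ug(5, conc):.1f} µl")
```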

2.2.2.9 RT-PCR

All RT-PCR was performed by the two-step protocol, where cDNA is first synthesised from an RNA template and the cDNA is then used in separate reactions as a PCR template.

RT-PCR using MMLV-RT minus RNase H (Promega)

For reverse transcription with MMLV-RT minus RNase H, the following mix was prepared in 0.5 ml tubes, incubated at 65 °C for 10 min and then transferred onto ice:
RNA: 2 µg
Random hexamers or oligo dT (50 ng/µl, Promega): 2 µl or 3 µl, or gene-specific primer (10 µM)
ddH2O: to 14 µl

The following mix (prepared as a master mix if several RT reactions were to be performed) was then added to the RNA mix on ice:
5 × RT buffer (Promega): 6 µl
5 mM dNTPs: 6 µl
MMLV-RT minus RNase H (Promega): 0.5 µl
RNAguard 30 U/µl (Amersham Biosciences), optional: 1 µl
RNase-free water: 3.5 µl

Each RT reaction was then incubated at 30 °C for 10 min followed by 42 °C for 80 min. The resulting cDNA was stored at -20 °C. PCR reactions using 0.5 to 4 µl of the cDNA as template were set up with exonic primers (spanning introns where possible), and PCR conditions were optimised as described.

RT-PCR using Sensiscript (Qiagen)

RNase inhibitor (RNAguard, Amersham, 30 U/µl) was diluted to 10 U/µl by adding 1.5 µl 10 × Sensiscript buffer (Qiagen) and 8.5 µl RNase-free water to 5 µl of RNase inhibitor. A reverse transcription reaction was prepared as follows:
10 × Sensiscript buffer (Qiagen): 2 µl
25 mM dNTPs: 0.4 µl
Random hexamers 50 ng/µl (Promega): 2 µl
RNase inhibitor mix (10 U/µl): 1 µl
Sensiscript (1 reaction): 1 µl
RNA: 50 ng
RNase-free water: to 20 µl

PCR reactions using 2 to 5 µl of the cDNA as template were set up as described.

2.2.2.10 Northern Blot

Production of a Northern blot

A formamide (FA) agarose gel containing ethidium bromide was equilibrated in FA gel running buffer for about 30 min before 5 µg of RNA mixed with 5 × RNA loading buffer were loaded into each well. RNA was either purchased (Human Brain Total RNA, Human Testis Total RNA, both from Clontech) or isolated as described above. The FA gel was run at 80 to 90 V in a Model B1 EasyCast™ Mini Gel Electrophoresis System (Owl) in FA gel running buffer for about 2 h. The RNA bands were visualised with a UV transilluminator (Spectroline) and photographed with a digital camera in conjunction with the Kodak 1D software version 3.5.4 (Kodak Scientific Imaging Systems). A Hybond™-N+ membrane was cut to the size of the gel, pre-wetted by floating it on RNase-free water in a Petri dish and then immersed in blotting buffer (20 × SSC). Upward capillary transfer was set up in the same gel apparatus as used for running the gel, following a protocol in Sambrook et al. [155]. The two chambers of the gel apparatus were filled with 20 × SSC blotting buffer, and the following layers were built up on the platform between the two chambers: one sheet of pre-wetted Whatman paper, the FA gel, the pre-wetted Hybond membrane, three sheets of pre-wetted Whatman paper, paper towels (ca. 2-4 cm), a small flat tray and a weight (ca. 0.5 kg, e.g. a Schott bottle). Care was taken to avoid air bubbles when placing the membrane. Gladwrap was placed around the gel before the membrane was added, to prevent direct transfer of buffer from the reservoir to the paper towels. Transfer proceeded overnight for approximately 16 h. After transfer, the membrane was washed in 6 × SSC for 5 min with gentle shaking and dried on Whatman paper. To cross-link the RNA to the membrane, the membrane was exposed to UV light in a UV cabinet for 1 min 45 s and placed in a microwave for 5 s. After additional drying at 55 °C in an oven (universal oven, Memmert), the membrane was stored in aluminium foil until hybridisation.

Radioactive labelling of DNA probes and removal of excess nucleotides

The Random Primed DNA Labelling Kit (Roche Applied Science) was used according to the manufacturer’s recommendations to radioactively label DNA probes. The radioactive isotope used for all labelling reactions was 32P in the form of [α-32P] dCTP. Radioactivity was monitored throughout labelling and subsequent hybridisation using a Geiger-Müller counter (Series 900 Mini Monitor, Mini Instruments). 25 ng of purified PCR product was diluted to a volume of 9 µl, denatured at 95 °C for 10 min and placed on ice. Then 3 µl of dNTP mix (dATP, dGTP and dTTP, each 0.5 mM), 2 µl of reaction mix (10 × concentrated, containing random hexamers), 1 µl of Klenow enzyme and 5 µl of [α-32P] dCTP (50 µCi) were added. For fresh isotope, 5 µl corresponds to 50 µCi; for older isotope, the volume was increased to up to 10 µl to compensate for radioactive decay. However, no isotope older than two weeks was used for labelling reactions. After brief mixing, the tube was placed in a heating block at 37 °C for 45 min. The reaction was stopped by adding 2 µl of 0.2 M EDTA (pH 8.0). Unincorporated dNTPs were removed using Quick Spin Columns, Sephadex G-50 (Roche Applied Science) or QIAquick spin columns (Qiagen) in conjunction with buffer PN (Qiagen) according to the manufacturer’s protocol. The success of a labelling reaction was tested by comparing the counts of the column and of the probe eluate at a fixed distance (~25 cm). The counts of the column should always be lower than those of the probe eluate, demonstrating that the majority of the [α-32P] dCTP was incorporated. The counts of the probe were determined by taking small aliquots and diluting them with ddH2O if necessary. After this step, the probe could be stored for a few days at -20 °C until hybridisation.
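The rule of increasing the isotope volume from 5 µl to at most 10 µl, and using no isotope older than two weeks, follows from the exponential decay of 32P (half-life about 14.3 days). The sketch below computes the volume that still delivers 50 µCi. It assumes that "fresh" means the activity reference date, at which 5 µl contains 50 µCi.

```python
P32_HALF_LIFE_DAYS = 14.3  # physical half-life of phosphorus-32 (about 14.3 days)


def remaining_fraction(days_since_reference: float) -> float:
    """Fraction of the reference-date activity left after exponential decay."""
    return 0.5 ** (days_since_reference / P32_HALF_LIFE_DAYS)


def isotope_volume_ul(target_uci: float = 50.0, reference_uci_per_ul: float = 10.0,
                      days_since_reference: float = 0.0) -> float:
    """Volume of [alpha-32P] dCTP (µl) delivering target_uci on a given day.

    reference_uci_per_ul = 10 corresponds to 5 µl = 50 µCi for fresh isotope.
    """
    return target_uci / (reference_uci_per_ul * remaining_fraction(days_since_reference))


for day in (0, 7, 14):
    print(f"day {day:2d}: {isotope_volume_ul(days_since_reference=day):.1f} µl")
```

After 14 days about 9.9 µl is needed, so the 10 µl ceiling coincides with roughly one half-life, which matches the two-week limit.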

Hybridisation, washing and detection

A Human 12-Lane Multi-Tissue-Northern™ (MTN™) Blot (Clontech) was used, or blots were made according to the procedure described above. Northern blot membranes were hybridised using either ExpressHyb (Clontech) for the MTN™ Blot or ULTRAhyb™ (Ambion) for the other blots, in volumes between 5 and 10 ml in hybridisation bottles (Hybaid). Both reagents had to be warmed to 68 °C in the hybridisation oven (Hybaid) before the start of the experiment. For pre-hybridisation, the membrane was placed in the bottle flush against the glass and incubated at 68 °C for 30 min to 2 h. Meanwhile, the radioactively labelled probe was denatured for 3 min at 95 °C and put on ice. Hybridisation was started by mixing the probe with the hybridisation solution. With ExpressHyb, the hybridisation time was 1 h, and ULTRAhyb™ hybridisations were performed overnight, both as recommended by the manufacturers. After hybridisation, the blot was washed several times. For ExpressHyb, the wash steps followed the protocol: several rinses in wash solution 1, followed by 30 to 40 min of washing with agitation in the same solution at room temperature, then 40 min at 50 °C in wash solution 2. For ULTRAhyb™, the washing was slightly modified from the manufacturer’s protocol: three wash solutions were used and all washes were performed at 42 °C. First, the blot was rinsed twice for 30 s in wash solution 1, then washed twice for 5 min in wash solution 1, once for 10 min in wash solution 2 and finally twice for 15 min in wash solution 3. The counts were measured after each wash step using a hand-held monitor (Geiger-Müller counter, Series 900 Mini Monitor, Mini Instruments), and if the counts were still above 100 cpm after the last step, washing continued at a higher temperature.
The washed membrane was wrapped in plastic wrap, taped to an old x-ray film and exposed to a new film in an autoradiography cassette at -80 °C for up to several days. The film was developed with an automatic developer (CP1000, AGFA), and the orientation of blot and film was determined from asymmetric fluorescent ink marks applied before exposure. For re-use, Northern blot membranes were stripped by heating the wash solutions in a microwave and washing the blot for 5 to 10 min in each of them, or by using heated 0.5 % SDS in the same way.

2.2.2.11 cDNA library screen

The cDNA library screen was performed using the SuperScript™ Human Foetal Brain cDNA Library (Life Technologies), prepared from mRNA of a healthy 18-week-old foetus. The screening procedure followed Sambrook et al. [155] for large agar plates (φ 135 mm), with some adjustments.

Titration of the cDNA library

The library was titrated to determine the number of colony-forming units (cfu) per library aliquot, which ensures that adequate cell densities are plated onto the master plates. The library was thawed on ice, and two 5 µl aliquots were taken under sterile conditions. The first aliquot was used for the titration, while the second was stored at 4 °C overnight. A serial dilution was prepared from the first aliquot by adding the 5 µl to 5 ml of SOB-Amp medium (dilution 1:1000). After mixing, 5 µl of the first dilution were added to 5 ml SOB-Amp medium (dilution 1:1,000,000). This was repeated once more, reaching a final dilution of 1:1,000,000,000. 100 µl of each dilution were added to 400 µl SOB-Amp medium, and each of the three resulting suspensions was spread onto an SOB-Amp agar plate and incubated at 37 °C overnight. On the following day, colonies were counted on the plates where the density allowed, and the number of cfu in the library aliquot was calculated.
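The titre calculation and the subsequent dilution for plating can be written out explicitly. The sketch below treats each 5 µl into 5 ml step as exactly 1:1000 (strictly 1:1001). The colony count is a hypothetical value, and the target of 20,000 cfu per 500 µl is taken from the plating step of the protocol.

```python
def titre_cfu_per_ul(colonies: int, dilution_factor: float, plated_ul: float = 100.0) -> float:
    """cfu per µl of undiluted library from a plate of a serial dilution.

    dilution_factor is e.g. 1e6 for the 1:1,000,000 dilution; plated_ul is the
    volume of that dilution spread (100 µl here, diluted into 400 µl before plating).
    Each 5 µl -> 5 ml step is treated as exactly 1:1000 (strictly 1:1001).
    """
    return colonies * dilution_factor / plated_ul


def plating_dilution(titre: float, cfu_per_plate: float = 20_000, plate_volume_ul: float = 500.0) -> float:
    """Fold dilution of the library giving cfu_per_plate in plate_volume_ul."""
    return titre / (cfu_per_plate / plate_volume_ul)


# Hypothetical count: 150 colonies on the 1:1,000,000 plate.
titre = titre_cfu_per_ul(150, 1e6)
print(f"{titre:.2e} cfu/µl; {titre * 5:.2e} cfu per 5 µl aliquot; "
      f"dilute {plating_dilution(titre):,.0f}-fold for 20,000 cfu per 500 µl")
```

With a titre in this range, 20,000 cfu would be contained in far less than 1 µl of undiluted library. This is why the second aliquot has to be diluted before plating rather than pipetted directly.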

Plating of the library and preparing replica plates

Based on the calculated number of cfu, the second library aliquot was diluted so that approximately 20,000 cfu were contained in about 500 µl of SOB-Amp medium. 12 labelled nylon membranes (Hybond™-N+, φ 132 mm, Amersham Biosciences) were placed on 12 SOB-Amp agar plates. Each 500 µl dilution of the library (~20,000 cfu) was spread with glass beads onto one of the plates and incubated overnight at 37 °C. On the next day, the six master plates that displayed the most even spreading were chosen to produce replica plates. The following steps were performed in a laminar flow hood for each master membrane. The master membrane was peeled off the SOB plate and placed onto autoclaved round Whatman paper. The first labelled replica membrane was moistened by laying it onto an SOB-Amp plate and then placed on top of the master membrane, avoiding exact overlap of the two membranes, which would make it difficult to separate them. A Whatman paper was placed on top of the two membranes, covered with a Petri dish and pressed down firmly to transfer the bacteria. Asymmetric marks were made through the membranes with an 18 gauge needle so that the orientation of master and replica could be reproduced later. The replica was then placed onto an SOB-Amp plate. The process was repeated with the second replica before moving on to the next master plate, so that each of the six master plates was replicated onto two replica plates. All 12 replica plates were incubated for 5 h (until colonies were clearly visible) and then stored at 4 °C until further processing.

Lysis of colonies and binding of DNA to filters

Four flat plastic trays, each containing a sheet of Whatman paper soaked in (1) 10 % SDS, (2) denaturing solution, (3) neutralising solution or (4) 2 × SSPE, were set up, and excess fluid was removed from each tray. The replica membranes were removed from the SOB-Amp plates and placed in the first tray (10 % SDS) for 3 min and subsequently in each of the following trays for 5 min. After the 5 min incubation in the last tray (2 × SSPE), more 2 × SSPE was added; the membranes were left to float for a few minutes and then submerged by gently shaking the tray. All membranes were left to dry on a new sheet of Whatman paper for approximately 1 h, and further drying was achieved by incubation at 65 °C. The fully dried replicas were wrapped in Gladwrap, and the DNA was fixed to the membranes by placing them DNA side down onto a UV transilluminator (Spectroline) for 1.5 min. Until hybridisation, the membranes were stored wrapped in aluminium foil with round Whatman papers sandwiched between them.

Hybridisation

To wet the membranes before hybridisation, a tray was filled with 2 × SSC, and the membranes were floated on the surface for 5 min and then submerged. The membranes were then stacked in a plastic tray with pre-washing solution and incubated at 50 °C for 30 min. Afterwards, the membranes were removed from the solution, and bacterial debris was scraped off with a Kimwipe soaked in pre-washing solution. Pre-hybridisation was performed in the same box by replacing the pre-washing solution with pre-hybridisation solution and incubating at 68 °C in a hybridisation oven (Hybaid) for up to 3 h. For hybridisation, two labelling reactions of radioactively labelled probe were prepared as described. To reach a final concentration of 2 × 10^5 to 1 × 10^6 cpm/ml, the volume of the pre-hybridisation solution was reduced to approximately 65 ml. To ensure even hybridisation of all membranes, most of the pre-hybridisation solution was transferred into a 50 ml tube, the probe was added and the solution was mixed by gentle shaking. The membranes were then placed back into the plastic tray one after another, with hybridisation solution between them. Hybridisation was performed at 68 °C with shaking over two nights. The required hybridisation time can be estimated using the following formula [164]:

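$$T_{1/2} = \frac{1}{x} \times \frac{y}{5} \times \frac{z}{10} \times 2$$

where $T_{1/2}$ is the time (in h) to 50 % hybridisation, $x$ is the weight of the probe in µg, $y$ is the complexity of the probe in kb and $z$ is the volume of the hybridisation solution in ml.

For the hybridisation carried out in this project ($x$ = 0.05 µg, $y$ = 0.336 kb, $z$ = 65 ml), the calculation was:

$$T_{1/2} = \frac{1}{0.05} \times \frac{0.336}{5} \times \frac{65}{10} \times 2 = 17.47\ \text{h} \approx 17.5\ \text{h}$$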
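The half-time formula is easy to re-evaluate when probe amount, probe length or hybridisation volume change. The sketch below is a direct transcription of the formula with the values used in this screen. The remark that 0.05 µg matches two 25 ng labelling reactions is an interpretation, not a statement from the original protocol.

```python
def hybridisation_half_time_h(probe_ug: float, complexity_kb: float, volume_ml: float) -> float:
    """T1/2 (h) = (1/x) * (y/5) * (z/10) * 2, with x in µg, y in kb and z in ml [164]."""
    return (1.0 / probe_ug) * (complexity_kb / 5.0) * (volume_ml / 10.0) * 2.0


# Values from the screen: 0.05 µg probe (consistent with two 25 ng labelling
# reactions), 0.336 kb probe complexity, 65 ml hybridisation solution.
t_half = hybridisation_half_time_h(0.05, 0.336, 65.0)
print(f"T1/2 = {t_half:.2f} h")
```

Because T1/2 is inversely proportional to probe amount and proportional to volume, reducing the solution to about 65 ml, as described above, directly shortens the required hybridisation time.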

Washing and exposure

After hybridisation was complete, the shaker was switched off and the sealed plastic box was left to stand for a few minutes to allow any aerosol to settle. The membranes were then transferred to a large volume (300 to 500 ml) of wash solution 1 and agitated gently for 5 min; this was repeated once. Afterwards, the membranes were transferred to a similar volume of wash solution 2 at 68 °C for about 30 min. The counts should be near background level after this step; if not, washing can continue with wash solution 3, also at 68 °C. The washed membranes were dried briefly on Whatman paper at room temperature, wrapped in plastic wrap and taped to an old x-ray film, avoiding wrinkles in the plastic wrap. Marks with fluorescent ink were applied to indicate the orientation of the membranes on the old film relative to the new film to be exposed. The x-ray film was exposed in an autoradiography cassette at -80 °C for several days, and exposure was repeated until the signal was strong enough. All films were developed using a developing machine (CP1000, AGFA).

Secondary screening and identification of positive colonies

The developed film and the replica membranes were aligned using the fluorescent ink marks, and the replicas were aligned with the master using the needle marks. Positive colonies present on both replicas were selected, and the corresponding colony was identified on the master plate. As colonies can be densely packed, the area around each positive colony was scraped off the master plate with a sterile pipette tip, inoculated into 1 ml sterile LB-Amp and grown overnight at 37 °C for the secondary screen. On the next day, a 1:1000 dilution of the bacterial culture was prepared in LB-Amp; 100 µl of the dilution were added to 400 µl of LB-Amp, spread onto SOB-Amp plates using glass beads and incubated overnight at 37 °C. Replicas were taken from the colonies on agar, and the secondary screen proceeded in the same way as the primary screen. Positive colonies identified in the secondary screen were scraped from the secondary master, inoculated into 5 ml LB-Amp and grown at 37 °C overnight. Plasmid DNA was isolated using the Qiaquick plasmid purification kit (Qiagen) according to the manufacturer’s protocol and sequenced directly using the vector-specific primers M13 forward and M13 reverse.

2.2.2.12 Immunohistochemistry

Immunohistochemistry was performed using the HXK1 (N-19) sc-6517 antibody (Santa Cruz Biotechnology). This work was carried out by members of Dr Rosalind King’s group in London, UK. Frozen sections of human peripheral nerve were fixed with acetone for 10 min at room temperature and then incubated in TBS buffer for 10 min. Blocking was performed with 0.1 % Triton X-100 in rabbit serum diluted 1:4 in TBS for 20 min at room temperature. Excess blocker was then removed, and the sections were incubated overnight at 4 °C with anti-HXK1 antibody diluted 1:5 in TBS buffer. The sections were washed with TBS buffer for 5 min at room temperature and incubated with peroxidase-conjugated anti-goat antibody raised in rabbit (diluted 1:5) for 30 min at room temperature. Staining was developed with DAB (diaminobenzidine tetrahydrochloride), and the sections were counterstained with haematoxylin, dehydrated and mounted. Pictures were taken with an Axiophot microscope (Zeiss).

2.2.3 Data analysis and Internet resources

2.2.3.1 Identification of the critical HMSNR region in online maps and identification of positional candidates

At each stage of the refined mapping, the critical region was located in the online maps of the National Center for Biotechnology Information (NCBI; Map Viewer) and of the University of California Santa Cruz (UCSC) Genome Bioinformatics site (Genome Browser) by searching for microsatellite markers in the critical region and at its boundaries, and for the specific BAC clones from which microsatellite markers had been identified. Changes in BAC clones and markers between maps and map versions were evaluated against the genetic data. BAC clones in the HMSNR region whose sequence had not yet been finished by the Human Genome Project (HGP) could also be tested for overlap using Blast 2 sequences. Positional candidates comprised all known genes, predicted genes and expressed sequence tags (ESTs) in Map Viewer and Genome Browser located between the most centromeric and the most telomeric markers that exhibited heterozygosity in HMSNR-affected individuals. Additionally, BAC clones were analysed in NIX at the Rosalind Franklin Centre for Genomics Research (RFCGR) to identify positional candidates mapping to the respective clone. General information about positional candidates was obtained from Locus Link, and exon annotation from Sequence Viewer and Evidence Viewer at NCBI (available via the Locus Link entry for each gene) or from the UCSC Genome Browser. If annotation was incomplete or unavailable, exon/intron boundaries were inspected manually, and protein sequences were deduced using ORF (open reading frame) Finder at NCBI, the translate tool at the Expert Protein Analysis System (Expasy) server, or Sequence Navigator 1.0.1 (Applied Biosystems).

Internet addresses:
Map Viewer (NCBI) http://www.ncbi.nlm.nih.gov/mapview/
Genome Browser (UCSC) http://www.genome.ucsc.edu/
Blast 2 sequences (NCBI) http://www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html
NIX (RFCGR) http://www.hgmp.mrc.ac.uk/Registered/Webapp/nix/index.html
Locus Link (NCBI) http://www.ncbi.nlm.nih.gov/LocusLink/
ORF Finder (NCBI) http://www.ncbi.nlm.nih.gov/gorf/gorf.html
Translate tool (Expasy) http://au.expasy.org/tools/dna.html
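Defining positional candidates amounts to two steps. First, walk outwards from a marker that is homozygous in all affected individuals until a marker that is heterozygous in an affected individual is reached on each side. Second, list every annotated feature that overlaps the resulting interval. The sketch below illustrates this logic. The marker names, coordinates and feature names are hypothetical, and counting features that only partly overlap the interval is a design choice of the sketch, not necessarily the rule applied in the thesis.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    name: str
    position: int          # bp on a given map build
    homozygous: bool       # homozygous in every affected individual


@dataclass
class Feature:
    name: str              # known gene, predicted gene or EST
    start: int
    end: int


def critical_interval(markers: list, seed: str) -> tuple:
    """Positions of the flanking markers around the homozygous block containing seed.

    Walks outwards from seed until a marker heterozygous in an affected
    individual is reached; if none is found on one side, the outermost typed
    marker is returned for that side.
    """
    ordered = sorted(markers, key=lambda m: m.position)
    i = next(k for k, m in enumerate(ordered) if m.name == seed)
    if not ordered[i].homozygous:
        raise ValueError("seed marker must be homozygous in all affected individuals")
    left = i
    while left > 0 and ordered[left].homozygous:
        left -= 1
    right = i
    while right < len(ordered) - 1 and ordered[right].homozygous:
        right += 1
    return ordered[left].position, ordered[right].position


def positional_candidates(features: list, interval: tuple) -> list:
    """Names of features overlapping the interval (partial overlap counts)."""
    lo, hi = interval
    return [f.name for f in features if f.start <= hi and f.end >= lo]


# Hypothetical markers and features.
markers = [Marker("M1", 100_000, False), Marker("M2", 200_000, True),
           Marker("M3", 300_000, True), Marker("M4", 400_000, False),
           Marker("M5", 500_000, True)]
features = [Feature("GENE_A", 50_000, 120_000), Feature("GENE_B", 250_000, 260_000),
            Feature("EST_1", 390_000, 410_000), Feature("GENE_C", 420_000, 450_000)]
interval = critical_interval(markers, "M3")
print(interval, positional_candidates(features, interval))
```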

2.2.3.2 Interpretation of sequence data

Sequencing was performed either to confirm the identity of a PCR fragment or to analyse positional candidates for variants that were informative about recombination or that could represent putative HMSNR mutations. Sequence data were analysed using Sequence Navigator 1.0.1 (Applied Biosystems), which included comparing forward and reverse strands and comparing them to the reference sequence obtained from the database entry of the BAC clone to which the fragment mapped. For sequenced plasmids, a map of the plasmid was obtained from the manufacturer’s homepage (Invitrogen) in order to distinguish the plasmid sequence from the cloned fragment. Variants identified while sequencing positional candidate genes were divided into three groups. First, changes that occurred only in unaffected individuals were not examined further. Second, putative mutations were checked for their presence in the public databases (dbSNP, HGVbase), and segregation with HMSNR was tested by typing additional or all members of the HMSNR families. Third, any change that was not excluded by lack of co-segregation or by presence in the databases (the likelihood of a rare Gypsy founder mutation being found in the databases was considered to be low) was subjected to a population screen in the Gypsies.

Internet addresses:
dbSNP (NCBI) [165]: http://www.ncbi.nlm.nih.gov/SNP/
HGVbase (Karolinska Institute) [166, 167]: http://hgvbase.cgb.ki.se/


2.2.3.3 Other Internet resources

Protein domain prediction
Conserved Domain Database: http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi

Promoter prediction
CorePromoter [168]: http://argon.cshl.org/genefinder/CPROMOTER/index.htm
NNPP: http://www.fruitfly.org/seq_tools/promoter.html
Grail and TSSW through NIX: http://www.hgmp.mrc.ac.uk/Registered/Webapp/nix/index.html

Alignment of DNA and protein sequences
ClustalW [169, 170]: http://www.ebi.ac.uk/clustalw/index.html
BLAST [171, 172]: http://www.ncbi.nlm.nih.gov/BLAST/

Splice-site prediction
NNSPLICE 0.9 [173, 174]: http://www.fruitfly.org/seq_tools/splice.html
NetGene2 [175]: http://www.cbs.dtu.dk/services/NetGene2/
GeneSplicer [176]: http://www.tigr.org/tdb/GeneSplicer/gene_spl.html

RNA secondary structure prediction
GeneBee [177]: http://www.genebee.msu.su/services/rna2_reduced.html
Vienna Package (RNAfold) [178]: http://www.tbi.univie.ac.at/~ivo/RNA/

Exonic splicing enhancer (ESE) prediction
ESE finder: http://rulai.cshl.edu/tools/ESE/

Analysis of UTRs
UTRscan [179]: http://www.ba.itb.cnr.it/BIG/UTRScan/

69 Chapter 3: Refined Mapping of the Critical HMSNR Gene Region

3 REFINED MAPPING OF THE CRITICAL HMSNR GENE REGION

Chapter outline

The aims of this chapter are:
1. To provide background on the positional cloning of Mendelian disorders, including refined mapping, and on the merits of founder populations for positional cloning and refined mapping.
2. To recapitulate what was known about the position of the HMSNR gene at the start of this PhD, and to explain the strategy used to perform the refined mapping.
3. To describe the results of the refined mapping. These include the integration of changing physical maps with genetic data, the polymorphic markers identified in this PhD, and the mapping of recombination breakpoints to define the minimal region of homozygosity.
4. To draw conclusions about what has been achieved in the refined mapping of the HMSNR gene region and what lessons have been learned.

3.1 REFINED MAPPING AS A PART OF THE POSITIONAL CLONING PROCESS

3.1.1 Linkage mapping of Mendelian disease loci

Positional cloning is the identification of a disease gene based on its chromosomal position. It is the approach of choice when little or no information about the gene or its protein product is available [180]. In 1986, the gene for chronic granulomatous disease was the first to be identified by positional cloning [181]. Since then, positional cloning strategies have benefited greatly from the progress of the Human Genome Project (HGP). The focus has shifted from pure positional cloning to a “positional candidate approach”: once the approximate chromosomal position has been determined, suitable candidate genes in the region are selected from the databases and screened for mutations [182].

Linkage analysis is used to reveal the approximate chromosomal location of a disease gene. Using statistical calculations, it aims to find segments of the genome that are co-inherited with the disease phenotype in a family and that are unlikely to be passed on in this pattern by chance alone [183]. Non-independent segregation of Mendelian loci was first observed in the sweet pea in 1905/1906. A few years later, in 1911, T. H. Morgan proposed that this kind of segregation was caused by the proximity of the loci on the same chromosome. In the following decades, a number of scientists tried to estimate linkage between loci using diverse statistical approaches (reviewed in [184]). It was not until 1955 that N. E. Morton developed the lod (logarithm of the odds) score method [185], which is now the method of choice for linkage analysis. The observation of linked loci in humans started in 1937 with the loci for haemophilia and colour blindness on the X chromosome. The first linkage between autosomal human loci was found in 1954, between the Lutheran blood group and the secretor character (reviewed in [184]). Successful linkage of a disease gene to a chromosomal region depends critically on correct genotyping, but also on the size and structure of the family and on correct assessment of the phenotype [186]. Several factors, such as genetic or phenotypic heterogeneity, incomplete penetrance or phenocopies, can lead to inaccurate evaluation of phenotypes. Genetic heterogeneity means that similar phenotypes are caused by different mutations. This can be locus heterogeneity, where the same phenotype is due to mutations at different loci, or allelic heterogeneity, where different mutations at one and the same locus cause the same phenotype [187].
Locus heterogeneity has been shown for early-onset Alzheimer's disease, which results from mutations at three different loci on chromosomes 1, 14 and 21 (reviewed in [188]). An example of allelic heterogeneity, on the other hand, is CMT1A, for which over 50 causative mutations have been reported at the PMP22 locus (http://www.molgen.ua.ac.be/CMTMutations/). Allelic heterogeneity has also been seen in the recessive disorder cystic fibrosis. Patients from outbred populations often carry two different disease alleles and do not show homozygosity at the disease locus, as seen in Dutch and French-Canadian patients in a study by de Vries [189]. Phenotypic heterogeneity means that different mutations at one and the same locus cause different diseases with distinct phenotypes [187]. This has been reported for the lamin A/C gene, mutations in which can give rise to autosomal recessive CMT2A, Emery-Dreifuss muscular dystrophy, dilated cardiomyopathy type 1A, limb-girdle muscular dystrophy type 1B or autosomal dominant partial lipodystrophy [138-143]. Penetrance designates the percentage of individuals with a mutant genotype who exhibit the disease phenotype. Penetrance is complete (100 %) when all individuals with the mutant genotype show the associated phenotype; otherwise it is incomplete [187]. Penetrance can also be age-related, meaning that the proportion of carriers of the disease genotype who show signs of the disease varies with age. Late onset, as seen in 95 % of Alzheimer's cases (reviewed in [188]), can be a problem for phenotypic assessment. Phenocopies are another challenge to linkage analysis. In these individuals, the disease phenotype is caused by environmental influences and resembles a phenotype that is usually attributable to genetic factors. This can be the case for adult-onset diseases such as cancer or diabetes [187]. In linkage analysis, the whole genome is screened with a set of markers with defined locations and intermarker distances [183]. The genetic distance between markers is expressed in centiMorgans (cM), derived from the recombination fraction: J. B. S. Haldane defined a recombination fraction of 1 % between two loci as a distance of 1 cM in 1919 (reviewed in [184]). On average, 1 cM on a genetic map corresponds to 0.88 megabases (Mb) on a physical map. However, genetic distances are sex-specific: in females 1 cM corresponds to approximately 0.7 Mb, while in males 1 cM is on average 1.05 Mb [180]. In 1913, A. H. Sturtevant compiled the first genetic map by determining the order of five markers on the X chromosome of Drosophila (reviewed in [187]). Among these markers were characteristics such as white eyes, miniature wings and rudimentary wings (reviewed in [190]). The first human genetic markers were blood groups; later, variation in the electrophoretic mobility of serum proteins was also used.
Over time, these have been replaced by more suitable markers, which are more informative, easier to type and present in large numbers throughout the genome. Today, the standard is to use microsatellites, which belong to the group of polymorphisms called variable number tandem repeats (VNTR). Microsatellites are usually di-, tri- or tetranucleotide repeats that are highly informative because of their many alleles (reviewed in [180]). Widely used genetic maps generated by typing microsatellites in large pedigrees include the Généthon map (1996) [191], the Marshfield map (1998) [192] and, more recently, the deCODE map [193]. The intermarker distance, and thus the resolution of linkage mapping, is between 5 and 10 cM for the initial screen. It is reduced to about 1 cM by typing additional markers in regions where linkage is suspected [186]. Only a few years ago, the translation of genetic distances into physical ones was an expensive and time-consuming process. It involved constructing a physical map of the region of interest from overlapping genomic clones. With the resources generated by the Human Genome Project, the physical position of genetic markers can now be obtained by a simple search in one of the databases available via the Internet [186].
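The lod score compares the likelihood of the observed meioses under linkage at recombination fraction θ with their likelihood under free recombination (θ = 0.5). In the idealised case of N phase-known, fully informative meioses, R of which are recombinant, it reduces to LOD(θ) = log10[θ^R (1 − θ)^(N − R) / 0.5^N], with its maximum at θ = R/N. The Python sketch below assumes exactly this idealised situation, and its function names are hypothetical. Real pedigrees, such as the HMSNR kindred, require linkage software that sums over unknown phases and genotypes. As a check on the formula, ten non-recombinant informative meioses give a maximum lod score of 10 × log10(2) ≈ 3.01.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point lod score for phase-known, fully informative meioses."""
    if not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")

    def log10_power(p: float, count: int) -> float:
        # log10(p ** count), with 0 ** 0 treated as 1
        if count == 0:
            return 0.0
        return -math.inf if p == 0.0 else count * math.log10(p)

    log_linked = (log10_power(theta, recombinants)
                  + log10_power(1.0 - theta, meioses - recombinants))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


def max_lod(recombinants: int, meioses: int):
    """Maximum-likelihood theta (R/N, capped at 0.5) and the lod score there."""
    theta_hat = min(recombinants / meioses, 0.5)
    return theta_hat, lod_score(recombinants, meioses, theta_hat)


if __name__ == "__main__":
    for r in (0, 1, 3):
        theta_hat, lod = max_lod(r, 10)
        print(f"R={r}/10: theta={theta_hat:.2f}, LOD={lod:.2f}")
```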

3.1.2 General strategy of the refined mapping

Refined mapping is the reduction of the critical region, in which the disease-causing mutation is located, to the smallest possible size. This is important because the size of the critical region determines the number of positional candidate genes that need to be screened for disease-causing mutations [180]. A genome-wide scan with microsatellite markers usually results in a candidate region of several cM. With 1 cM corresponding to roughly 1 Mb, such regions are often several Mb in size. As gene density in the human genome varies between 0.6 and 19.0 genes per Mb [194], one can easily be left with a large number of positional candidates. Refining the critical region is therefore a crucial step of positional cloning, which can save considerable time and money. The critical region is defined by haplotype analysis. A haplotype is a set of alleles on one and the same chromosome that is passed on as a block in a pedigree. The disease haplotype is the set of alleles that is co-inherited with the disease. Haplotype analysis traces the inheritance of each haplotype, including the disease haplotypes, through the pedigree as far as possible. The length of the disease haplotype defines the critical region within which the disease-causing mutation must be located [180]. The critical region is refined by analysing recombination breakpoints: with numerous polymorphic markers covering the entire critical region, the breakpoints can be mapped precisely [180]. Recombination is the result of DNA exchange between homologous DNA sequences. Meiotic recombination involves non-sister chromatids of homologous chromosomes [10]. The limit of refined mapping is reached when the closest recombination breakpoints on either side of the disease locus have been mapped with densely spaced markers [180].
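The arithmetic linking the size of a critical region to the number of positional candidates can be made explicit. The Python sketch below uses only the averages quoted in this chapter: 0.88 Mb per cM (sex-averaged), 0.7 and 1.05 Mb per cM for the female and male maps, and 0.6 to 19.0 genes per Mb. The function name is hypothetical, and local recombination rates and gene densities can differ considerably from these genome-wide averages.

```python
# Average physical length per cM quoted in the text (sex-averaged, female, male).
MB_PER_CM = {"sex_averaged": 0.88, "female": 0.7, "male": 1.05}
# Range of human gene density quoted in the text (genes per Mb).
GENES_PER_MB = (0.6, 19.0)


def candidate_estimate(region_cm: float, map_type: str = "sex_averaged"):
    """Return (approximate size in Mb, minimum and maximum expected gene count)."""
    size_mb = region_cm * MB_PER_CM[map_type]
    low, high = GENES_PER_MB
    return size_mb, size_mb * low, size_mb * high


if __name__ == "__main__":
    for cm in (5.0, 1.0):
        mb, lo, hi = candidate_estimate(cm)
        print(f"{cm} cM ~ {mb:.2f} Mb, roughly {lo:.0f} to {hi:.0f} genes")
```

Even a 5 cM interval may therefore contain anywhere from a handful to over 80 genes. This wide range is why refinement before mutation screening pays off.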


3.1.3 Refined mapping in isolated founder populations

Isolated founder populations, such as the Finns, the Ashkenazi Jews or the Old Order Amish, provide a powerful tool for the identification of disease genes. Because of their specific genetic properties, their use may greatly facilitate positional cloning [1]. A founder effect occurs when a small number of individuals are separated from a larger population by events such as migration to a new geographic location. A similar scenario occurs when a population passes through a bottleneck that dramatically reduces its size. In both cases, genetic diversity is reduced [195]. The founder effect and continued isolation lead to the formation of a relatively homogeneous population with allele frequencies distinct from those of the parental population. In this process, some disease alleles become more common while others may disappear completely. In addition, a high rate of inbreeding may raise the frequency of rare recessive disorders. Founder populations derive from founders of limited genetic diversity and show little admixture with other populations in subsequent generations. As a result, they exhibit less locus heterogeneity and also less allelic heterogeneity for Mendelian diseases, which can be exploited for mapping [2, 196]. Specific strategies that utilise the properties of founder populations have been developed for mapping disease loci and for subsequent refined mapping. These strategies assume that a considerable number of disease mutations in these populations are founder mutations, i.e. all affected individuals inherited the disease allele from one or a few distant ancestors [2]. In some cases, where extensive genealogical records are available, relatedness can be demonstrated for seemingly unrelated individuals, supporting the hypothesis of a founder mutation. This was possible for variant late infantile neuronal ceroid lipofuscinosis (vLINCL), which occurs in the Finns.
Family records could be traced back to the 17th century, making it possible to join the majority of pedigrees [197]. Another example is the recessive disorder congenital severe combined immunodeficiency associated with alopecia, which is due to a mutation in the gene encoding the transcription factor forkhead box N1 (FOXN1). Two affected individuals born to consanguineous parents were identified in an isolated community in southern Italy. Community records indicated affected individuals in previous generations and thus prompted further testing. All identified carriers and affected individuals could be linked in a large seven-generation pedigree. This pedigree was founded by a single couple born at the beginning of the 19th century, both of whom carried the disease mutation [198]. Over time, many different disease haplotypes are generated by recombination events on the ancestral haplotype of the founder, on which the disease-causing mutation originated. Recombination that can be observed in a given pedigree is termed recent, whereas older recombination is called historical. Each disease haplotype contains a part of the ancestral chromosome that is identical by descent (IBD) and includes the mutation. Because of different recombination events, this ancestral part has a different size on each disease haplotype. The combined power of all recombinations, historical and recent, can therefore be used to refine the critical gene region by identifying the smallest segment shared by all affected individuals. The more recombinations are identified, the smaller the shared segment and hence the critical region. Because of the founder effect, all affected individuals are likely to carry the same mutation. This allows seemingly unrelated families to be included in the haplotype analysis, thereby increasing the number of informative recombinations. Refined mapping in founder populations can therefore reduce the interval to a size that cannot be achieved in a single consanguineous family from an outbred population [186, 199]. All IBD methods depend on the number of meioses that have taken place between the ancestor and the present-day affected individual [196]. For example, consider homozygosity mapping in consanguineous families from outbred populations. Here the same disease allele, inherited from a recent common ancestor, has been transmitted to the affected offspring via both parents. It is therefore identical by descent, and homozygous segments of several cM are expected to span the disease locus [200].
However, a critical region of several cM can be a serious obstacle to positional cloning. In contrast, IBD segments in founder populations are much smaller and therefore offer greater resolution for fine mapping. Because of the founder effect, markers surrounding the disease locus are in linkage disequilibrium (LD) with the disease mutation, i.e. specific marker alleles are over-represented on disease haplotypes. The size of the segments is related to the age of the population and of the mutation. In younger founder populations the IBD regions are larger. In older founder populations the IBD intervals are shorter, and mapping methods need to ensure that the shared region is not missed [196]. The next four sections provide examples of IBD methods and their mapping power, covering homozygosity mapping, mapping in younger and older isolates, and mapping in the Gypsies.
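The dependence of segment size on population age can be quantified under Haldane's model of no interference, in which crossovers on a transmitted chromosome occur as a Poisson process at a rate of one per Morgan. After g meioses since the founder, the distance from the mutation to the nearest crossover on each side is then exponentially distributed with mean 100/g cM. The ancestral segment on a single disease chromosome therefore averages about 200/g cM. The Python sketch below computes this and checks it by simulation. Its function names are hypothetical, and chromosome ends are ignored. As a rough orientation only, assume a generation time of about 25 years and a mutation introduced when the Gypsies left India roughly 1000 years ago. This would correspond to some 40 meioses and an expected segment of about 5 cM.

```python
import math
import random


def expected_segment_cm(meioses: int) -> float:
    """Expected length (cM) of the ancestral segment around the mutation after
    the given number of meioses (100/g cM on each side, chromosome ends ignored)."""
    return 2 * 100.0 / meioses


def prob_intact(distance_cm: float, meioses: int) -> float:
    """Probability that no crossover separated the mutation from a marker
    distance_cm away in any of the meioses (Poisson crossovers, no interference)."""
    return math.exp(-meioses * distance_cm / 100.0)


def simulate_segment_cm(meioses: int, trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo check: the distance to the first crossover on each side is
    exponential with rate `meioses` per Morgan; converted here to cM."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += 100.0 * (rng.expovariate(meioses) + rng.expovariate(meioses))
    return total / trials


if __name__ == "__main__":
    for g in (10, 40, 100):
        print(g, expected_segment_cm(g), round(simulate_segment_cm(g), 2))
```

Averaged over many disease chromosomes, the segment shrinks in inverse proportion to the number of meioses. This is why older isolates give shorter IBD intervals.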

3.1.3.1 Homozygosity mapping

Homozygosity mapping has been used to identify a locus for autosomal recessive retinitis pigmentosa: with the help of two large Indian families, a 16 cM region of homozygosity was located on chromosome 16p12.1-12 [201]. Another example is the detection of a locus for autosomal recessive non-progressive infantile ataxia. Here, a region spanning 19.5 cM on chromosome 20 was identified by homozygosity mapping in 23 members of a large multigenerational pedigree [202]. In both cases, even though the families were large, the homozygous interval was of considerable size. Moreover, neither study obtained a lod score of three or more, which is regarded as the threshold for conclusive linkage (reviewed in [184]). Homozygosity mapping has also been applied successfully to map CMT4B1 and CMT4C. The CMT4B1 locus was identified in a large pedigree originating from southern Italy. Initial typing of five affected individuals identified several regions of shared homozygosity. All but one of these were excluded after typing of all pedigree members, mapping the disease to a 4 cM interval on chromosome 11q23 with a peak lod score of 5.54 [84]. Similarly, homozygosity mapping of the CMT4C locus yielded a 13 cM interval, which could be reduced to 4 cM by analysing haplotype sharing between two families [94].

3.1.3.2 Mapping in young founder populations

An example of mapping disease loci in young isolates is benign recurrent intrahepatic cholestasis. Houwen and colleagues mapped the mutated gene using only three distantly related affected individuals from an isolated fishing community in the Netherlands. Because of the relatively young age of this founder population, the IBD segment was expected to be reasonably large. This made it possible to map the disease locus to a ~20 cM region on chromosome 18 by typing just 256 markers in 10 individuals and searching for shared segments [196].

3.1.3.3 Mapping in old founder populations

The Finns are one of the older founder populations, with a history going back two thousand years for parts of the population (reviewed in [2]). One of the many disease genes mapped in the Finnish population is that for diastrophic dysplasia (DTD). The chromosomal location of the gene was determined by linkage analysis of 13 families, each with two or three affected siblings. The highest two-point lod score, 7.37, was obtained at D5S72 [203]. As no obvious heterogeneity was seen among Finnish patients, refinement of the critical region also included single cases, which cannot be linked to the region by conventional linkage analysis. Linkage disequilibrium mapping placed the DTD gene 60 kb from the CSF1R locus [199]. Finally, the causative mutation was identified in a novel sulphate transporter gene 70 kb proximal to CSF1R, exemplifying the power of LD mapping [204].
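The principle behind LD mapping can be illustrated with a simple decay model, which is not necessarily the model used in the cited study. Let δ = (p_D − p_N)/(1 − p_N) measure the excess of a marker allele on disease chromosomes (frequency p_D) over normal chromosomes (frequency p_N). Assume that all founding disease chromosomes carried the allele (δ = 1) and that g generations have passed. Then δ ≈ (1 − θ)^g, so θ ≈ 1 − δ^(1/g). In the Python sketch below, the function names, allele frequencies and generation count are hypothetical.

```python
def excess_association(p_disease: float, p_normal: float) -> float:
    """delta = (p_D - p_N) / (1 - p_N): proportion of disease chromosomes carrying
    the marker allele beyond what its population frequency would predict."""
    if not 0.0 <= p_normal < 1.0:
        raise ValueError("p_normal must lie in [0, 1)")
    return (p_disease - p_normal) / (1.0 - p_normal)


def theta_from_delta(delta: float, generations: int) -> float:
    """Invert delta = (1 - theta) ** g, assuming delta was 1 in the founder."""
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1]")
    return 1.0 - delta ** (1.0 / generations)


if __name__ == "__main__":
    delta = excess_association(p_disease=0.95, p_normal=0.05)  # hypothetical frequencies
    theta = theta_from_delta(delta, generations=100)           # hypothetical age
    print(f"delta = {delta:.3f}, theta = {theta:.5f}")
```

Because δ decays slowly over many generations, strong residual association points to a very small recombination fraction. This is how LD mapping can localise a gene to within tens of kilobases, well below the resolution of pedigree-based linkage.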

3.1.3.4 Mapping in the Gypsies

Because of their history and distinct group structure, the Gypsies, who are the subject of this thesis, are an interesting young founder population. They left India about 1000 years ago and migrated to Europe, where Gypsy groups subsequently moved to different countries in several migrational waves. The Gypsies share a common origin but have separated into groups residing in different European countries. As a result, they are genetically distinct from the surrounding European populations, while at the same time showing considerable divergence between Gypsy groups (reviewed in [4]). The genetic isolation of the Gypsies is maintained by endogamy, which is reinforced by strong group identity (reviewed in [3]). Hereditary motor and sensory neuropathy-Lom (HMSNL) is caused by a founder mutation in the Gypsy population. Once its position on 8q24 had been identified, refined mapping based on haplotypes from Gypsy patients from all over Europe reduced the critical region to just 200 kb. Mutation screening and haplotype analysis demonstrated that all affected individuals carry the same founder mutation and at least part of the haplotype on which the mutation originally occurred. The collection of new patients could therefore be based on these closely related yet divergent haplotypes and was not limited to families large enough to calculate linkage [7, 205]. The list of successful positional cloning efforts using founder populations is impressive. Identification of disease genes provides valuable knowledge not only about pathophysiology but also about fundamental biological processes [2].


3.2 APPROACH TO THE REFINED MAPPING OF THE HMSNR GENE

3.2.1 Foundations of the refined mapping of HMSNR (work prior to this PhD project)

HMSNR was identified in several branches of a large Bulgarian HMSNL pedigree of Gypsy ethnicity, in which linkage to the HMSNL locus on chromosome 8q24 was excluded. It was therefore concluded that a second peripheral neuropathy was segregating independently in this kindred [5]. While both disorders manifest with typical neuropathy symptoms, thorough investigation distinguished the new form, termed HMSN-Russe, from HMSNL. HMSNL patients present with neural deafness, severely slowed MNCVs and rapid disease progression, and neuropathological examination reveals hypertrophic onion bulbs in younger patients. HMSNR, on the other hand, is characterised by MNCVs just inside the demyelinating range and less rapid disease progression, and none of the patients exhibits neural deafness. In addition, neuropathology shows no onion bulbs but profuse regenerative activity, which is not typical of any other neuropathy [6, 102, 103, 153, 206]. A genome scan with an average intermarker distance of 10 cM was conducted in HMSNR-affected individuals from two branches of the large Bulgarian Gypsy kindred in which HMSNL and HMSNR segregate independently. Linkage analysis assigned the critical HMSNR gene region to a >40 cM interval on chromosome 10q flanked by markers D10S208 and D10S1686. The two-point lod score peaked at D10S1652 with a value of 3.96. Multipoint analysis gave the highest score, 4.5, for the interval between markers D10S196 and D10S537 [5]. The obvious candidate gene in this region, EGR2, which is mutated in two neuropathies, CMT1D and CMT4E [75], was excluded because direct sequencing of its exons revealed no disease-causing mutations. Further analysis of polymorphisms identified during sequencing placed this gene outside the shared region of homozygosity [5].
Typing of additional microsatellite markers, and inclusion of the remaining two branches of the large Bulgarian Gypsy pedigree in the analysis, decreased the intermarker distance from 10 cM to 1.5 cM. Recalculation of the two-point lod scores gave a peak at D10S1647 with a value of 5.53. The highest multipoint lod score, 6.43, was obtained for the interval between D10S1647 and D10S560. Haplotype analysis revealed several historical recombinations and one recent recombination, which placed the critical HMSNR gene region between markers D10S581 and D10S1742 [5]. Subsequently, collaborators in Europe recruited additional Gypsy families with autosomal recessive neuropathies similar to HMSNR. These families were typed for the markers in the critical region. To obtain higher resolution, new microsatellite markers were identified by searching the available BAC clone sequence for dinucleotide tandem repeats. These then had to be tested for polymorphism in the HMSNR families. In this way, five new microsatellite markers were added to the haplotype. Closely related haplotypes were identified in two Romanian and two Spanish Gypsy families. Linkage to the HMSNR region could not be formally established in these families. Three were nuclear families, each comprising one affected and two unaffected individuals. In the one large family, an important individual was deceased, and the precise relationship between two branches of the family was unknown. Nevertheless, these families could be included because the shared homozygous part of the HMSNR haplotype was thought to have been passed on from a common ancestor, so that all affected individuals are assumed to be distantly related. This approach relies on the assumption that refined mapping can identify the recombination events that accumulated on the disease haplotype over the entire population history. Examination of the haplotypes in the newly added families revealed several historical recombinations that differed from those identified in the large Bulgarian Gypsy pedigree, which allowed a reduction of the critical region [6, 152].
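Locating a historical recombination on a single disease chromosome amounts to scanning outward from a marker inside the shared core until the first non-ancestral allele is reached. The Python sketch below illustrates this with hypothetical marker names and alleles, assuming markers are listed in map order. Each marker's ancestral state is a set of accepted alleles. This lets a marker tolerate a presumed microsatellite mutation, comparable to the two alleles observed at D10S1647 in Figure 13. Any other deviation is treated as a recombination, although in real data a typing error would look the same.

```python
from typing import Optional, Sequence, Set, Tuple

Interval = Optional[Tuple[str, str]]


def find_breakpoints(markers: Sequence[str],
                     ancestral: Sequence[Set[int]],
                     haplotype: Sequence[int],
                     core_index: int) -> Tuple[Interval, Interval]:
    """Return the pair of adjacent markers (in map order) bracketing the breakpoint
    on the centromeric side and on the telomeric side, or None where the haplotype
    stays ancestral up to the last marker typed."""
    if haplotype[core_index] not in ancestral[core_index]:
        raise ValueError("core marker does not carry an ancestral allele")

    def scan(step: int) -> Interval:
        i = core_index
        while 0 <= i + step < len(markers):
            if haplotype[i + step] not in ancestral[i + step]:
                lo, hi = sorted((i, i + step))
                return markers[lo], markers[hi]
            i += step
        return None

    return scan(-1), scan(+1)


if __name__ == "__main__":
    markers = ["M1", "M2", "M3", "M4", "M5", "M6", "M7"]  # hypothetical, centromeric first
    ancestral = [{3}, {3}, {3}, {5, 6}, {1}, {2}, {1}]     # {5, 6}: tolerated mutation
    haplotype = [2, 3, 3, 6, 1, 4, 1]                      # one hypothetical disease chromosome
    print(find_breakpoints(markers, ancestral, haplotype, core_index=3))
```

In this example the breakpoints fall between M1 and M2 and between M5 and M6, so this chromosome alone restricts the disease locus to the interval flanked by M1 and M6.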


3.2.2 Strategy of the refined mapping of the HMSNR gene

At the beginning of this PhD project, the critical HMSNR gene region, i.e. the interval for which all HMSNR-affected individuals are homozygous, was ca. 1 Mb in size (according to the physical map in 2001). It was bordered by microsatellite markers bA86K9CA1 on the centromeric side and D10S1742 on the telomeric side [6, 152]. The region contained (Figure 13):
• 7 microsatellite markers,
• 311 SNPs (source: dbSNP),
• 9 known genes (source: UCSC freeze April 2001),
• 12 BAC clones (source: UCSC freeze April 2001), all consisting of a number of unordered pieces, and
• a gap of unknown size in the telomeric part of the critical region.

[Figure 13: (a) BAC clones: AL360177.16, AL442635.8, AC026046.5, AC005063.2, AC027617.2, AL513534.10, AL359844.3, gap, AL391539.11, AL590067.1, AC016821.5, AC026057.4, AC011010.4; (b) genes: DDX21, VPS26, FLJ22761, MGC3199, PRG1, SUPV3L1, HK1, TACR2, NET-7; (c) microsatellite markers: bA86K9CA1, D10S1678, D10S1647, bA227H15CA2, D10S1742, bA227H15CA1, D10S1672; (d) shared haplotype: 3 3 3 5/6 1 2 1]

Figure 13: Physical map of the interval between bA86K9CA1 and D10S1742 in April 2001, and the shared HMSNR haplotype. The map shows the BAC clones (a), genes (b) and microsatellite markers (c) contained in the critical HMSNR gene region in April 2001. The order of markers bA227H15CA1, D10S1647, D10S1672 and bA227H15CA2 changed with completion of the sequence by the HGP. The conserved HMSNR disease haplotype (d) is shaded in grey. Two alleles, 5 and 6, were observed for marker D10S1647, with allele 5 presumed to result from a microsatellite mutation. The data for (a) and (b) were obtained from UCSC.

Recombination mapping required additional markers to obtain more accurate information about the recombination breakpoints. The existing SNPs were not considered useful because, even today, large numbers of SNPs have not been validated. Moreover, first-pass reads of BAC clone sequences contained many errors, which would have resulted in false SNPs. In addition, a number of SNPs were expected not to be polymorphic in the HMSNR families. Therefore, refined mapping and mutation analysis were performed simultaneously, by direct sequencing of the exons and 50 to 100 bp of flanking intronic sequence of positional candidate genes. Because of the diversity of genes mutated in CMT, the selection of functional candidate genes was difficult; therefore all positional candidate genes had to be included. Each sequence variant identified was examined for its potential to be either a disease-causing mutation or a neutral variant that might provide further information for mapping recombination breakpoints. The procedure for evaluating putative disease-causing mutations is described in detail in chapter 4 (see 4.1.3.2, page 125). Polymorphisms that were informative for mapping recombination breakpoints were genotyped in the pedigrees and incorporated into the haplotypes. In a dynamic process, the haplotype information was then related back to the physical map, and sequencing of candidate genes continued in the newly defined critical region. Throughout this process, the development of the physical maps had to be monitored closely, that is, the progress of the Human Genome Project in sequencing this region of chromosome 10. Changes in the physical position of markers were evaluated in conjunction with the haplotype data.
On specific request, the Sanger Centre prioritised BAC clones located in the critical region for sequencing, which led to a more rapid completion of the sequence in this part of chromosome 10q. The end point of the refined mapping was defined as (a) determining the positions of the recombination breakpoints and (b) identifying the minimum region of homozygosity shared by all disease chromosomes.
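The minimum region of homozygosity shared by all disease chromosomes can be computed as the intersection of per-individual runs of homozygosity for the ancestral alleles. Each run is taken around a marker assumed to lie inside the shared core. In the Python sketch below, marker names, ancestral alleles and genotypes are hypothetical, and markers are listed in map order. Homozygosity for a non-ancestral allele ends a run, as for individual "C" at M2. A real analysis would also have to allow for microsatellite mutations and typing errors. The two flanking markers returned are the closest markers at which at least one affected individual departs from the shared haplotype.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Genotype = Tuple[int, int]


def ancestral_run(genotypes: Sequence[Genotype], ancestral: Sequence[int],
                  seed: int) -> Tuple[int, int]:
    """(first, last) indices of the run around `seed` that is homozygous for the
    ancestral allele at every marker."""
    def ok(i: int) -> bool:
        a, b = genotypes[i]
        return a == b == ancestral[i]

    if not ok(seed):
        raise ValueError("seed marker is not homozygous for the ancestral allele")
    first = seed
    while first > 0 and ok(first - 1):
        first -= 1
    last = seed
    while last < len(genotypes) - 1 and ok(last + 1):
        last += 1
    return first, last


def shared_homozygous_region(markers: List[str], ancestral: Sequence[int],
                             affected: Dict[str, Sequence[Genotype]], seed: int
                             ) -> Tuple[List[str], Tuple[Optional[str], Optional[str]]]:
    """Markers shared by all affected individuals, plus the two flanking markers."""
    first, last = 0, len(markers) - 1
    for genotypes in affected.values():
        f, l = ancestral_run(genotypes, ancestral, seed)
        first, last = max(first, f), min(last, l)
    flank_cen = markers[first - 1] if first > 0 else None
    flank_tel = markers[last + 1] if last < len(markers) - 1 else None
    return markers[first:last + 1], (flank_cen, flank_tel)


if __name__ == "__main__":
    markers = ["M1", "M2", "M3", "M4", "M5", "M6"]
    ancestral = [1, 2, 3, 4, 5, 6]
    affected = {
        "A": [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 7)],
        "B": [(1, 8), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)],
        "C": [(1, 1), (9, 9), (3, 3), (4, 4), (5, 5), (6, 6)],
    }
    print(shared_homozygous_region(markers, ancestral, affected, seed=3))
```

Each additional individual can only shrink the shared interval. This is why including distantly related families with different historical recombinations narrows the critical region.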

3.2.3 The use of databases to implement the strategy

Today, positional cloning of a disease gene relies heavily on the physical maps provided by the human genome databases, which allow a marker's position in the DNA sequence to be determined exactly. This has not always been possible. Just a few years ago, investigators carrying out positional cloning had to create their own physical map of the critical region using genomic resources such as YAC clones (yeast artificial chromosomes). Genes were identified by methods such as exon trapping or cDNA selection. In February 2001, draft sequences of the human genome were published by the company Celera and by the International Human Genome Sequencing Consortium (IHGSC) [207, 208] and made available to the scientific community. The two sequencing efforts were based on different strategies. Celera performed whole-genome shotgun sequencing, which produces unmapped, fragmentary sequence. The IHGSC instead isolated, ordered and sequenced overlapping bacterial artificial chromosome (BAC) clones by the shotgun method (reviewed in [209]). Chromosome 10, to which the HMSNR region has been assigned, is still being sequenced as part of the IHGSC project [210]. The work is carried out by the Sanger Centre and, in part, by the Genome Therapeutics Corporation (GTC) and the Institute of Molecular Biology and Biotechnology/Foundation for Research and Technology Hellas (IMBB/FORTH). To date (2004), large parts of the sequence have been finished and only a few gaps remain. The two publicly available assemblies of the human sequence are the Human Genome Project Working Draft at the University of California, Santa Cruz (UCSC) and the assembly produced by the National Center for Biotechnology Information (NCBI) (Table 9). Both construct the genome from BAC clones but employ different methods. The UCSC obtains a frozen dataset from GenBank at a given time point and generates an assembly using the corresponding fingerprint contigs from the Washington University Genome Sequencing Center. The NCBI, on the other hand, identifies neighbouring BAC clones by detecting overlaps with a variant of the BLAST algorithm [171, 172]. This is further supported by chromosome assignment with fluorescence in situ hybridisation (FISH) and sequence-tagged sites (STS).
Both, UCSC and NCBI, also use expressed sequence tags (ESTs) and mRNAs to determine the order and orientation of clones [211].

Table 9: Internet addresses of databases that provide physical maps of the human genome

Database | Internet address
NCBI | http://www.ncbi.nlm.nih.gov
UCSC | http://www.genome.ucsc.edu/
Celera (available with subscription) | http://www.celera.com
GRL | http://grl.gi.k.u-tokyo.ac.jp
Ensembl | http://www.ensembl.org

Annotations of the assembly are made available to researchers through World Wide Web (www) interfaces. For this purpose, NCBI has created the Map Viewer and UCSC the Genome Browser; both are publicly available and were the main sources of information for the HMSNR project. Other annotations, such as the GRL (Genome Resource Locator), maintained by the University of Tokyo, and Ensembl, a joint project of the European Molecular Biology Laboratory/European Bioinformatics Institute (EMBL-EBI) and the Sanger Centre, were also explored (Table 9).


3.3 RESULTS OF THE REFINED MAPPING OF THE HMSNR GENE REGION

3.3.1 Integration of genetic data with a changing physical map

One of the main challenges was to locate the current critical region reliably at every stage of the refined mapping process. At the beginning of this PhD project, the Human Genome Project had just started to yield its first results, and most of the sequence available in the public databases was full of gaps, prone to error and, above all, subject to constant change. To obtain more accurate information about the critical region of the HMSNR gene, the map of the interval between bA86K9CA1 and D10S1742 was compared between UCSC and NCBI, and changes in the databases were closely monitored. The resulting physical map was then superimposed onto the haplotype data from the HMSNR families, so that the positions of markers and the orientation of BAC clones and contiguous sequences (contigs) could be assessed and, at times, questioned. To give an overview of the changes in the physical map, the old UCSC freezes are used for illustration (Figure 14). These maps do not always coincide with the NCBI maps for the same period, owing to the different methods employed for contig assembly. The starting point for this PhD project was the UCSC December 2000 freeze, in which all BAC clones were still unfinished and the critical region spanned over 1 Mb with a gap of unknown size in the middle (part [a] in Figure 14). D10S2480 had to be included in the critical HMSNR region because both bA86K9CA1 and D10S2480 mapped to the same unfinished BAC clone (AL360177), which consisted of a number of unordered pieces, so that the order of the two markers was unclear. On the telomeric side, BAC clones AC021954 and AL450311 were both considered part of the physical map of the HMSNR region for similar reasons. The order of the telomeric markers was ambiguous: because of the gaps, it was possible that this part of the sequence, containing four BAC clones, was oriented in the wrong direction.
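Comparing the UCSC and NCBI maps essentially asks whether markers appear in the same relative order in both assemblies. A minimal Python sketch of that check is given below, assuming each map is available as a dictionary of marker positions in base pairs. The function name and all positions are hypothetical and do not come from either database; the code illustrates the underlying logic only and is not taken from this project.

```python
from itertools import combinations


def discordant_pairs(map_a: dict[str, int], map_b: dict[str, int]) -> list[tuple[str, str]]:
    """Return marker pairs whose relative order differs between two maps.

    map_a and map_b map marker names to positions (bp) in each assembly.
    Only markers placed on both maps are compared.
    """
    shared = sorted(set(map_a) & set(map_b))
    conflicts = []
    for m1, m2 in combinations(shared, 2):
        # True if m1 lies centromeric of m2 in the respective assembly
        order_a = map_a[m1] < map_a[m2]
        order_b = map_b[m1] < map_b[m2]
        if order_a != order_b:
            conflicts.append((m1, m2))
    return conflicts


# Hypothetical positions (bp), chosen only to illustrate an order conflict
ucsc = {"bA86K9CA1": 100, "D10S2480": 150, "D10S1678": 400, "D10S1742": 900}
ncbi = {"bA86K9CA1": 120, "D10S2480": 60, "D10S1678": 380, "D10S1742": 880}
print(discordant_pairs(ucsc, ncbi))
```

A conflicting pair such as D10S2480 and bA86K9CA1 would then be checked against the haplotype data in the families, as described above.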


[Figure 14, panels [a] to [f]: BAC clone maps of the interval between bA86K9CA1 and D10S1742 in the UCSC December 2000, April 2001, August 2001, December 2001, April 2002 and June 2002 freezes; the interval measures > 1 Mb in panel [a] (markers D10S2480, bA86K9CA1, D10S1678, D10S1742) and ~0.8 Mb in panel [f].]

Figure 14: Changes in the physical map of the critical HMSNR region between bA86K9CA1 and D10S1742. This figure follows the changes in the physical map of the critical HMSNR region between bA86K9CA1 and D10S1742 using data from the UCSC freezes of December 2000 (part [a]) to June 2002 (part [f]), when sequencing of this area of chromosome 10 was completed and the minimum tiling path of BAC clones across the critical region had been established. BAC clones with complete sequence are shown in black boxes, while unfinished ones are shown in grey. Gaps in the sequence are indicated by a dashed box with the word “gap”.

Subsequently, the April 2001 freeze brought only minor changes to the map, with all BAC clones still in the same order. The August 2001 freeze (part [c] in Figure 14), however, gave an entirely different picture. AL513534 was the first BAC clone in the critical region to be sequenced completely. Several BAC clones, among them AL360177, to which D10S2480 and bA86K9CA1 had been mapped, disappeared from the region. UCSC temporarily placed D10S2480 about 2 Mb closer to the centromere, which seemingly increased the critical region to 3.4 Mb. This, however, was inconsistent with the genotyping data obtained with more centromeric markers, making an error in the assembly of the BAC clones likely. Furthermore, BAC clones AC091603 and AC037447 had been newly placed into the gap. In the next freeze, in December 2001 (part [d] in Figure 14), these clones were removed from the region and replaced by clone AC067749, BAC clone AL360177 reappeared on the map, and sequencing had been completed for five more BAC clones. This rapid progress was largely owed to the Sanger Centre, which prioritised sequencing of BAC clones in the critical HMSNR region on specific request. The Sanger Centre also provided information about the clone chosen to bridge the gap, AL672126, which facilitated the evaluation of the misplaced BAC clone AC067749 that was also present in NCBI build 28 in February 2002. Preliminary sequence (several unordered pieces) of BAC clone AL672126 was downloaded from the Sanger Centre ftp server. An alignment of the pieces with the ends of the adjacent BAC clones, the centromeric AL569223 and the telomeric AC016821, using BLAST 2 Sequences, allowed the size of the gap to be estimated.
By April 2002 (part [e] in Figure 14) sequencing was close to completion, and in the June freeze of the same year (part [f] in Figure 14) the minimum tiling path for the HMSNR region, that is, the smallest number of BAC clones needed to cover the whole interval, was available and all markers could be mapped unambiguously. The map location and order of markers at the centromeric and telomeric boundaries of the critical HMSNR region were confirmed, and the interval between bA86K9CA1 and D10S1742 was reduced to 0.8 Mb. From this point onwards, the BAC clones in the region underwent only minor sequence changes, which had little effect on the position of markers in the critical HMSNR region.
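Choosing a minimum tiling path is an instance of the classic interval-covering problem, for which a greedy sweep gives an optimal answer: starting from the first uncovered base, always take the clone that begins at or before it and extends furthest. The Python sketch below assumes clones are given as (name, start, end) coordinates in base pairs on a single contig. The clone names and coordinates are invented, and the code is not intended to reproduce how the tiling path for chromosome 10 was actually chosen by the sequencing centres.

```python
def minimum_tiling_path(clones, start, end):
    """Greedily select the fewest clones covering [start, end] (inclusive, bp).

    clones: list of (name, clone_start, clone_end) tuples on one contig.
    Returns clone names from centromeric to telomeric; raises ValueError
    if some base in the interval is not covered by any clone.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path = []
    covered = start  # first base not yet covered
    i = 0
    while covered <= end:
        best = None
        # Among clones starting at or before the first uncovered base,
        # keep the one that reaches furthest towards the telomere.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] < covered:
            raise ValueError(f"gap in coverage at position {covered}")
        path.append(best[0])
        covered = best[2] + 1
    return path


# Hypothetical clones: (name, start, end)
clones = [("A", 0, 150), ("B", 100, 300), ("C", 120, 200), ("D", 280, 500), ("E", 290, 450)]
print(minimum_tiling_path(clones, 0, 500))  # ['A', 'B', 'D']
```

Clone C lies entirely within A and B, and E within D, so both are redundant; an uncovered stretch raises an error, mirroring the sequence gaps seen in the early freezes.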

3.3.2 Polymorphic markers used for the construction of a high density genetic map of the HMSNR region

3.3.2.1 Identification of polymorphisms

At the beginning of this PhD, the boundaries of the critical HMSNR gene interval of ~1 Mb (in 2001) on chromosome 10q were defined by the microsatellite markers bA86K9CA1 on the centromeric side and D10S1742 on the telomeric side (Figure 15).

[Figure 15: markers from centromere (cen) to telomere (tel): D10S581, D10S2480, bA86K9CA1, D10S1742, D10S560; the critical region at the start of the PhD (~1 Mb in 2001), the interval searched for new variants and the interval covered by haplotype analysis are indicated.]

Figure 15: Schematic drawing of the markers included in the haplotype analysis and in the detection of new variants, and how these relate to the critical HMSNR region at the beginning of this PhD project. Cen denotes centromere and tel denotes telomere.

The identification of new polymorphisms concentrated on the region between bA86K9CA1 and D10S560, the latter located ~370 kb telomeric of D10S1742 (distance in 2004), outside the critical region (Figure 15). On the centromeric side, the relative order of D10S2480 and bA86K9CA1 was uncertain, and any changes in the map had to be monitored continuously. However, in 2001 no gene was located in the ~80 kb interval between these two markers, so the search for putative HMSNR mutations did not include this area. Conversely, two excellent positional candidate genes, neurogenin 3 and tetraspan NET-7, both with proposed functions in the peripheral nervous system, were annotated on the telomeric side of the critical region, within the ~370 kb between D10S1742 and D10S560. Because of sequence gaps existing in 2001 on either side of the map segment containing D10S1742 and D10S560, the orientation of this part of the contig was unclear. The interval between D10S1742 and D10S560 was therefore included in the search for informative markers and putative mutations, because it seemed possible that it was in fact part of the HMSNR gene region. Haplotype analysis, in contrast, focussed on the interval between D10S581 and D10S560 (Figure 15), which is approximately 5.7 Mb (in 2004) in size. Although no new markers were identified between D10S581 and bA86K9CA1, the previously established data had to be incorporated into the haplotype analysis in order to trace the inheritance of haplotypes in the pedigrees and to identify recombinant haplotypes.

Identification of microsatellite markers

Microsatellite markers were initially taken from the published genetic maps, specifically the Généthon map [191] and the Southampton map [212] as described in [5], and their positions were related to the physical map using the primer sequences provided by the databases. New microsatellites were identified by searching the physical map of the critical region for tandem repeats with the help of the UCSC database, which lists all repeats in a specified fragment of genomic DNA. Suitable low-complexity repeats that would allow unambiguous allele discrimination were selected and typed to determine whether they were polymorphic in the HMSNR families.
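The repeat search itself relied on the repeat annotation in the UCSC database. To show what such a scan looks for, a minimal Python sketch for finding perfect tandem repeats of two- to four-nucleotide units is given below. The thresholds (unit length 2 to 4 nt, at least six copies) and the test sequence are illustrative assumptions, not the criteria used to select the HMSNR markers; units that are themselves repeats of a shorter unit (for example ACAC) are skipped, so each repeat is reported with its shortest unit.

```python
import re


def find_perfect_repeats(seq, min_unit=2, max_unit=4, min_copies=6):
    """Find perfect tandem repeats with unit length min_unit..max_unit.

    Returns sorted (start, unit, copies) tuples with 0-based start positions.
    Thresholds are illustrative assumptions.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        # ([ACGT]{k}) captures one unit; \1{n,} demands further exact copies of it
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are repeats of a shorter unit (e.g. 'AA', 'ACAC')
            if any(unit == unit[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


# Invented test sequence: a (CA)8 dinucleotide and an (AAAC)6 tetranucleotide repeat
test_seq = "GGT" + "CA" * 8 + "TTG" + "AAAC" * 6 + "G"
print(find_perfect_repeats(test_seq))  # [(3, 'CA', 8), (22, 'AAAC', 6)]
```

Whether a repeat found in this way is useful still has to be established by typing, since, as described below, several perfect repeats proved non-polymorphic in the HMSNR families.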

Identification of single nucleotide polymorphisms (SNPs)

Single nucleotide polymorphisms (SNPs) and insertion/deletions were identified by direct sequencing of positional candidate genes in selected HMSNR family members; in total, nearly 150 kb were sequenced. Any deviation from the reference sequence, taken from the BAC clone to which the positional candidate mapped, was examined to decide whether it was a putative mutation or a polymorphism useful for the refined mapping. The panel of individuals used for the initial sequencing was selected to represent HMSNR-affected subjects, obligate carriers and non-carriers, and included individuals representative of all recombinations. All typing information was collected in a master table and subsequently entered into the pedigrees to assess the inheritance of alleles and recombinations.


3.3.2.2 Microsatellites

As a result of this PhD, the haplotype of the HMSNR region is now defined by 20 microsatellite markers from D10S581 to D10S560 (Table 10). Of these, 15 are listed in NCBI databases: 11 as microsatellite markers in UniSTS and four as insertion/deletions in dbSNP. For the remaining five markers, no entries could be found in UniSTS or dbSNP. Seventeen of the 20 microsatellite markers were known prior to this PhD, while three, bA227H15AAAC, bA404C6AC1 and bA404C6AC2, were identified by a co-worker during this PhD project and successfully integrated into the map. In addition, the PCR conditions for microsatellite bA314J18TA1 were re-optimised, allele calling was corrected and the marker was added to the map.

Table 10: Overview of the microsatellites on chromosome 10q used in this study, in order from centromeric to telomeric

Microsatellite | Accession number | Distance to previous microsatellite on contig NT_008583.16 (2004), Mb
D10S581 | UniSTS:50476 | -
D10S1646 | UniSTS:64851 | 2.773533
D10S1670 | UniSTS:42813 | 0.251444
bA153K11CA2 * | rs10636655 (ins/del) | 1.079315
D10S210 | UniSTS:43878/UniSTS:32895 | 0.095595
bA153K11CA1 * | not listed | 0.041324
D10S2480 | UniSTS:38119 | 0.224762
bA86K9CA1 * | not listed | 0.081463
bA314J18TA1 * | not listed | 0.239135
D10S1678 | UniSTS:4255 | 0.050645
D10S1647 | UniSTS:66820 | 0.257876
bA227H15CA1 * | rs5785901 (ins/del) | 0.052638
bA227H15CA2 * | not listed | 0.015696
D10S1672 | UniSTS:79906 | 0.011253
bA227H15AAAC * (new) | rs10668853 (ins/del) | 0.077939
D10S1742 | UniSTS:48298 | 0.082255
bA404C6AC1 * (new) | rs10583969 (ins/del) | 0.016339
bA404C6AC2 * (new) | not listed | 0.026869
D10S1665 | UniSTS:33169 | 0.076274
D10S560 | UniSTS:17166/UniSTS:14764 | 0.250284

* The names of these markers have been derived from the Sanger name of the BAC clone to which they map. “New” indicates that these polymorphisms were newly identified.

The density over the whole region between D10S581 and D10S560 is one microsatellite marker every ~285 kb (20 microsatellites in ~5.7 Mb). The distances between the microsatellites are listed in Table 10.

For the critical region between bA86K9CA1 and D10S1742 the density is higher, with one microsatellite marker every ~88 kb (nine microsatellites in ~0.8 Mb) (Figure 16). Apart from the nine microsatellites that define the interval flanked by bA86K9CA1 and D10S1742, a further seven simple repeats of two to four nucleotides, representing putative microsatellite markers, were tested. However, four of these were not polymorphic in the HMSNR families, and three showed only two alleles and behaved as insertion/deletion polymorphisms.

[Figure 16: positions (0 to 0.8 Mb) of the microsatellite markers bA86K9CA1, bA314J18TA1, D10S1678, D10S1647, bA227H15CA1, bA227H15CA2, D10S1672, bA227H15AAAC and D10S1742 (lower panel), of the non-polymorphic repeats and insertion/deletions, and of the perfect repeats not analysed (upper panel); the refined HMSNR region is marked.]

Figure 16: Distribution of the microsatellites over the critical HMSNR region. The haplotype for the interval between bA86K9CA1 and D10S1742 is defined by nine microsatellite markers (lower panel). Additionally, seven repeats of two to four nucleotides were analysed; four of these were non-polymorphic, while three showed only two alleles and were therefore classified as insertion/deletion polymorphisms. A further 18 perfect repeats (upper panel, arrows) were not included in the analysis. The scale is given in megabases.

A higher density of microsatellites might have been achievable and could have reduced the sequencing effort. Between bA86K9CA1 and D10S1742 there are 18 additional perfect repeats that represent potential microsatellites: eight dinucleotide, two trinucleotide and eight tetranucleotide repeats. Their distribution over the region between bA86K9CA1 and D10S1742 is shown in Figure 16. Most of these repeats are located between bA86K9CA1 and bA314J18TA1 (six repeats) and between D10S1678 and D10S1647 (nine repeats), while two are located near bA227H15AAAC and one lies between bA227H15CA1 and bA227H15CA2. With respect to the refined critical HMSNR region resulting from this PhD project, the repeats in and near this region have clearly been almost fully exhausted. The large number of repeats centromeric to the refined HMSNR region (15 repeats) could have been explored more thoroughly, which might have enabled an earlier reduction of the critical region on the centromeric side, including the earlier recognition of allele “5” at D10S1647 as the result of recombination rather than of microsatellite mutation. Conversely, the repeat sequences in the telomeric part between D10S1672 and D10S560 have been analysed to an extent that makes the remaining repeats redundant for the refined mapping of the HMSNR region. The refined critical region itself, which measures only 63.8 kb, is well defined by three microsatellites; two further repeats were non-polymorphic, leaving just one repeat that has not been examined.

3.3.2.3 Insertion/deletions (indels) and Single Nucleotide Polymorphisms (SNPs)

Although 311 variants had been reported in dbSNP by March 2001 for the critical HMSNR interval flanked by bA86K9CA1 and D10S1742, it was considered preferable to identify variants by sequencing positional candidate genes, for several reasons. Many of the listed variants were (and still are) not verified, and those that truly exist had been ascertained in selected populations. It was expected that some of them would be non-polymorphic in an isolated founder population such as the Gypsies, and that new, unreported variants would be identified. Direct sequencing of the exons of positional candidates together with 50 to 100 bp of the flanking introns, supplemented by sequencing of introns and intergenic regions in a small part of the critical HMSNR interval (the latter performed by a co-worker), identified a total of 229 polymorphic variants. This sequencing effort covered nearly 150 kb, 94 % of it intragenic and 6 % intergenic sequence. Of the 229 variants, 217 are located in the interval between bA86K9CA1 and D10S1742, and the remaining 12 in the area flanked by D10S1742 and D10S560. Most variants detected in this PhD were unknown at the time of identification. This becomes evident from the number of SNPs and insertion/deletions present in dbSNP at the start of this PhD project: for the interval between bA86K9CA1 and D10S1742, 311 variants had been submitted to dbSNP by March 2001. By October 2004, this number had increased more than tenfold to 3262, about two thirds of which (2167) were submitted between March 2003 and October 2004. This highlights the need to identify new variants during this PhD rather than analysing those already submitted to the database.

To date, according to dbSNP build 121 released in June 2004, 177 of the 229 detected variants (77.3 %) are known polymorphisms that have been submitted to dbSNP by other research groups. Despite the latest waves of new entries into the database, 52 variants (22.7 %) still appear to be private to the Gypsies. Of the 229 variants, 216 are single nucleotide polymorphisms and the remaining 13 are insertion/deletions of between 1 and 27 bp. A number has been assigned to each polymorphism to facilitate handling of the data (see Appendix A for a comprehensive list of all polymorphisms).

Type of variation

• Insertion/deletions

Of the 13 identified insertion/deletions, six were 1-bp insertion/deletions; two each were 2-bp and 4-bp insertion/deletions, and insertion/deletions of 3, 6 and 27 bp occurred once each. A considerable number of the insertion/deletions appeared to be part of a nucleotide repeat unit, as judged by inspection of the surrounding sequence.

• Single nucleotide polymorphisms

A total of 216 SNPs were identified in this PhD project. Most of these were A/G and C/T transitions, with 79 and 80 occurrences, respectively. A/C and G/T changes were less common, each occurring 16 times, and A/T and C/G changes were even rarer, found in only 12 and 13 cases, respectively (Table 11).

Table 11: Frequencies of the different types of base substitutions for the SNPs identified in this PhD project

Base substitution | Frequency (% of 216 SNPs in total) | Frequency of transitions and transversions (% of 216 SNPs in total)
A/G | 79 (36.6 %) | Transitions: 159 (73.6 %)
C/T | 80 (37.0 %) |
A/C | 16 (7.4 %) | Transversions: 57 (26.4 %)
G/T | 16 (7.4 %) |
A/T | 12 (5.6 %) |
C/G | 13 (6.0 %) |

When evaluating the type of polymorphism, one must take into account that an A to G transition on one DNA strand is a T to C transition on the complementary strand, so complementary classes should occur at similar frequencies if SNP formation has no strand preference. In the 216 SNPs gathered in this PhD project, A/G and C/T changes make up 36.6 % and 37.0 % of the total, and A/C and G/T changes 7.4 % each (Table 11), indicating no preference for either strand in the formation of SNPs. Transitions (the replacement of a purine by a purine or of a pyrimidine by a pyrimidine) and transversions (the exchange of a purine for a pyrimidine or vice versa) account for 73.6 % and 26.4 % of the 216 HMSNR SNPs, respectively, indicating that transitions occur more frequently than transversions. The bias is notable because only two of the six substitution classes are transitions, so that substitutions occurring at random would yield about twice as many transversions as transitions.
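The bookkeeping behind Table 11 can be sketched in a few lines of Python. The code below is a minimal illustration, not the analysis actually performed: it assigns each SNP to an unordered base-pair class, decides whether that class is a transition, and shows why A/G pairs with C/T, and A/C with G/T, once the complementary strand is considered. The input is rebuilt from the counts in Table 11, and the function names are hypothetical.

```python
from collections import Counter

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def snp_class(a: str, b: str) -> str:
    """Unordered substitution class on the reported strand, e.g. ('G', 'A') -> 'A/G'."""
    return "/".join(sorted((a.upper(), b.upper())))


def is_transition(cls: str) -> bool:
    """True if both bases are purines or both are pyrimidines."""
    bases = set(cls.split("/"))
    return bases <= PURINES or bases <= PYRIMIDINES


def other_strand(cls: str) -> str:
    """The same substitution read on the complementary strand, e.g. 'A/G' -> 'C/T'."""
    return "/".join(sorted(cls.translate(COMPLEMENT).split("/")))


# Rebuild the 216 SNPs of Table 11 as (allele 1, allele 2) pairs
counts = {"A/G": 79, "C/T": 80, "A/C": 16, "G/T": 16, "A/T": 12, "C/G": 13}
snps = [tuple(cls.split("/")) for cls, n in counts.items() for _ in range(n)]

classes = Counter(snp_class(a, b) for a, b in snps)
transitions = sum(n for cls, n in classes.items() if is_transition(cls))
transversions = sum(classes.values()) - transitions

print(transitions, transversions)                # 159 57
print(round(100 * transitions / len(snps), 1))   # 73.6
print(other_strand("A/G"), other_strand("A/C"))  # C/T G/T
```

A/T and C/G are their own complements, which is why they form single classes, while the similar counts for the complementary pairs A/G and C/T reflect the absence of a strand preference.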

Density and distribution of SNPs and insertion/deletions

The distribution of the variants matters for both the refined mapping and the mutation analysis: variants in genes are more likely to be candidate mutations, whereas variants in intergenic regions are expected to be neutral, because they are less likely to disrupt the function of a gene and thereby subject the respective allele to negative selection. For this reason, one could postulate that sequencing intergenic regions would be more likely to identify polymorphic markers. In most positional cloning projects, however, the typing effort is directed at identifying both putative mutations and informative polymorphisms, which favours intragenic over intergenic regions. A truly unbiased picture of the distribution of variants between genes and intergenic regions requires both to be analysed to the same extent. This was not the case in this study, where mainly exons were selected for direct sequencing and intergenic regions were sequenced largely by “accident”, in the form of predicted genes that were later removed from the physical map of the region. Then again, a number of papers published in this area share the same selection bias because they include data obtained from mutation databases. These contain mainly data from the coding parts of genes, for the obvious reason that coding sequence is the first to be examined for mutations, given the effects of truncating mutations and amino acid changes on the protein. Of the 229 variants detected in this study, 220 (96.0 %) were located in genic regions and 9 (4.0 %) in intergenic regions (Table 12), reflecting the selection bias discussed above. For variants flanking genes, the dbSNP definition has been adopted: a polymorphism is called flanking if it maps within 2 kb of a gene, whereas beyond 2 kb it is intergenic. With this approach, changes in the promoter region are safely included. Most intragenic variants from the HMSNR study (55.9 %) were located in introns. For flanking, 3’/5’ UTR and coding sequence, the values were 7.3 %, 10.0 % and 9.1 %, respectively. Of the variants found in the coding regions of the genes, 50 % were synonymous, while the remaining 50 % were non-synonymous and changed the encoded amino acid. In addition, 25 variants were identified in ESTs that do not belong to the known genes, and a further 14 variants in an mRNA for which no translation into a protein could be established. These variants were listed separately, as they could not be assigned to either 3’/5’ untranslated or coding sequence (Table 12).

Table 12: Distribution of the 229 polymorphisms identified in the HMSNR region between D10S2480 and D10S560

Location | Frequency
Intergenic | 9 (4 % of 229)
Intragenic (total) | 220 (96 % of 229)
  Intronic | 123 (55.9 % of 220)
  Flanking | 16 (7.3 % of 220)
  3’ and 5’ UTR | 22 (10.0 % of 220)
  Coding sequence | 20 (9.1 % of 220)
    synonymous changes | 10 (50 % of 20)
    non-synonymous changes | 10 (50 % of 20)
  ESTs | 25 (11.4 % of 220)
  Untranslated mRNA | 14 (6.4 % of 220)
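The 2 kb rule for flanking variants can be stated compactly in code. The Python sketch below classifies a variant position against a list of gene intervals; the function name and gene coordinates are invented for illustration. Note that in Table 12 flanking variants are counted among the intragenic (genic) variants, and that a real annotation would also need strand, transcript and untranslated-region boundaries, which are omitted here.

```python
def classify_position(pos, genes, flank=2000):
    """Classify a variant position relative to gene intervals.

    genes: list of (start, end) pairs in bp, inclusive.
    Returns 'within gene' inside a gene body, 'flanking' within `flank` bp
    of a gene (the 2 kb window described in the text), else 'intergenic'.
    """
    if any(start <= pos <= end for start, end in genes):
        return "within gene"
    if any(start - flank <= pos <= end + flank for start, end in genes):
        return "flanking"
    return "intergenic"


# Hypothetical gene coordinates (bp) on one contig
genes = [(10_000, 25_000), (60_000, 61_500)]
for pos in (12_000, 26_500, 40_000, 58_500):
    print(pos, classify_position(pos, genes))
```

Because the window is applied on both sides of each gene, variants just upstream of a gene, where a promoter would lie, fall into the flanking class rather than being counted as intergenic.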

The total sequencing effort performed during this PhD project amounts to 149.6 kb, comprising 140.8 kb (94.1 %) of intragenic and 8.8 kb (5.9 %) of intergenic sequence (Table 13). Of the 149.6 kb sequenced, 54 kb fall into the refined HMSNR region of 63.8 kb, while the remaining 89 kb are located between bA86K9CA1 and D10S560, excluding the 63.8 kb flanked by SNP #171 and SNP #156 (for SNP positions see Appendix A).

Table 13: Density of SNPs and insertion/deletions identified in this PhD in intra- and intergenic regions

Region | Amount of sequencing performed | Number of SNPs and indels identified | Frequency of variants per kb
Intergenic regions | 8.808 kb (5.9 %) | 9 | 1.02
Intragenic regions (total) | 140.798 kb (94.1 %) | 220 | 1.56
  Intronic | 83.360 kb | 123 | 1.48
  Flanking | 10.734 kb | 16 | 1.49
  3’ and 5’ UTR | 10.706 kb | 22 | 2.05
  Coding sequence | 25.652 kb | 20 | 0.78
  ESTs | 8.413 kb | 25 | 2.97
  Untranslated mRNA | 1.933 kb | 14 | 7.24
Total | 149.606 kb | 229 | 1.53

The overall density of SNPs and insertion/deletions identified in this PhD project was an average of 1.53 variants per kb of DNA sequence analysed (Table 13). The variants are not evenly distributed, however, because in the late stages of the project sequencing efforts were focussed on the area inside and around the newly defined region of homozygosity of 63.8 kb (for details see the recombination mapping in 3.3.3). A total of 85 variants were detected in the 54 kb sequenced inside the refined 63.8 kb HMSNR region, a density of one variant every 635 bp of sequence analysed in this part of the region. The average density of 1.53 per kb can be broken down into 1.02 variants per kb of intergenic and 1.56 variants per kb of intragenic sequence. For intronic, flanking, 3’/5’ UTR and coding sequence, the values are 1.48, 1.49, 2.05 and 0.78 variants per kb, respectively. For the variants in the ESTs and in the untranslated mRNA, the density was 2.97 and 7.24 variants per kb, respectively, much higher than for the rest of the sequence (Table 13).
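The per-kb values in Table 13 are simple ratios of variants found to kilobases sequenced. The short Python script below recomputes them from the counts and sequence lengths listed in the table, so that each entry can be checked; the variable and function names are illustrative. Inverting the same ratio with the 54 kb sequenced inside the refined region as the denominator gives the figure of one variant every 635 bp.

```python
# Sequence analysed (kb) and variants found, as listed in Table 13
categories = {
    "Intergenic": (8.808, 9),
    "Intronic": (83.360, 123),
    "Flanking": (10.734, 16),
    "3'/5' UTR": (10.706, 22),
    "Coding": (25.652, 20),
    "ESTs": (8.413, 25),
    "Untranslated mRNA": (1.933, 14),
}


def density(kb, n):
    """Variants per kb of sequence analysed."""
    return n / kb


for name, (kb, n) in categories.items():
    print(f"{name:18s} {density(kb, n):.2f}/kb")

total_kb = sum(kb for kb, _ in categories.values())
total_n = sum(n for _, n in categories.values())
print(f"{'Total':18s} {density(total_kb, total_n):.2f}/kb")

# Refined region: 85 variants in 54 kb sequenced -> bp per variant
print(round(54_000 / 85))
```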


3.3.3 Mapping of the recombination breakpoints

Haplotype analysis aimed at mapping the recombination breakpoints was carried out in the interval flanked by markers D10S581 and D10S560. A total of 249 polymorphic markers were examined: 20 microsatellite markers, 216 SNPs and 13 insertion/deletions. Of the 20 microsatellite markers, 16 were informative about recombination breakpoints, while four showed the same allele in all affected individuals. Because SNPs and insertion/deletions have only two alleles, a lower proportion of these variants, 56 of 229 (24.4 %), were informative. A further 78 (34.1 %) were classified as putative mutations (discussed in Chapter 5), because HMSNR patients were homozygous for the rare allele, while the remaining 95 (41.5 %) variants were uninformative; these were mainly detected in non-carriers. Prior to this PhD, the HMSNR haplotype was defined by 16 microsatellite markers (Figure 17). In the course of the refined mapping of the HMSNR gene region, 60 informative markers were added: four microsatellites, 53 SNPs and three insertion/deletions (Figure 18). The total number of HMSNR chromosomes was 50 (25 HMSNR patients), among which 17 different haplotypes were observed. Haplotype “a” has been designated the conserved ancestral haplotype because it was found on the highest number of chromosomes, namely eight. However, one might argue that haplotype “b” or “f” could be ancestral, as each occurs seven times. The number of haplotypes was 15 at the start of this project and 17 after the refined mapping, owing to the addition of one individual, who contributed haplotypes “p” and “q”. The number of informative polymorphisms varied between haplotypes: haplotype “q” had the highest number of informative variants (40), followed by haplotypes “d” and “e” with 27 each and haplotype “o” with 17.
All other haplotypes contained considerably fewer informative variants (Figure 18).
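The three outcomes for a biallelic marker described above can be written as a simple decision rule. The Python sketch below is an assumed, simplified reading of that rule rather than a procedure taken from this project: it looks only at the genotypes of HMSNR patients and at which allele is rare, and it ignores carriers, non-carriers and pedigree structure. The function name and genotypes are hypothetical.

```python
def classify_variant(patient_genotypes, rare_allele):
    """Classify a biallelic variant from the genotypes of HMSNR patients.

    patient_genotypes: list of (allele_1, allele_2) tuples, one per patient.
    Assumed rule (a simplified reading of the text):
      - alleles differ among patient chromosomes -> informative
      - all patients homozygous for the rare allele -> putative mutation
      - all patients homozygous for the common allele -> uninformative
    """
    alleles = {allele for genotype in patient_genotypes for allele in genotype}
    if len(alleles) > 1:
        return "informative"
    (shared,) = alleles
    return "putative mutation" if shared == rare_allele else "uninformative"


# Hypothetical genotypes of three patients at three SNPs (rare allele: T)
print(classify_variant([("C", "T"), ("C", "C"), ("C", "C")], "T"))  # informative
print(classify_variant([("T", "T")] * 3, "T"))                      # putative mutation
print(classify_variant([("C", "C")] * 3, "T"))                      # uninformative
```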


[Figure 17: haplotype table showing the alleles of the 16 microsatellite markers from D10S581 to D10S560 (scale 0 to 5.7 Mb, centromere to telomere) for the conserved haplotype “a” and the recombinant haplotypes “b” to “o”, grouped into haplotypes identified in the genome screen and initial refinement and haplotypes contributed by the Spanish and Romanian families, with the number of alleles per haplotype and their Bulgarian, Romanian and Spanish origin.]

Figure 17: Haplotypes at the beginning of this PhD project. A total of 15 HMSNR haplotypes were known at the beginning of this PhD. The scale at the top indicates the approximate position of the microsatellite markers on the physical map (dashed lines). The haplotypes are divided into the conserved HMSNR haplotype “a” (in grey) and recombinant haplotypes, namely those identified in the initial genome scan in the large Bulgarian Gypsy kindred and those identified in the Spanish and Romanian Gypsy families (recombinations shown in colour). The critical region as defined by recombinant haplotypes “n” and “o” was estimated to be over 1 Mb according to physical maps in 2001; however, after the sequencing of this region had been completed by the HGP, it measured only 787.4 kb. The overall number of alleles and the numbers of alleles contributed by the Bulgarian, Romanian and Spanish Gypsies are stated to the right of each haplotype. The order of microsatellites is as in 2004. The boxed allele 5 at marker D10S1647 denotes a presumed microsatellite mutation.


[Figure 18: haplotype table (scale 0 to 5.7 Mb, centromere to telomere) for the conserved haplotype “a” and the recombinant haplotypes “b” to “q”, listing the microsatellites of Figure 17 together with bA314J18TA1, bA227H15AAAC, bA404C6AC1, bA404C6AC2 and the newly added SNP and insertion/deletion markers (numbered as in Appendix A, partly shown in groups); the 63.8 kb critical region after refined mapping is marked.]

Figure 18: Haplotypes after the refined mapping. This figure is based on Figure 17 and shows the additional informative markers that have been used to refine the critical HMSNR gene region. Haplotypes “o”, occurring in one Bulgarian individual, and “q”, only seen in two Romanian HMSNR patients, are the most informative for the refined mapping. This is supported by haplotypes “d”, “e” and “n”, while the other haplotypes contributed little or no information. Allele “5” at marker D10S1647, which was presumed to be a microsatellite mutation, is now shown to be part of an ongoing recombination in haplotype “n”. The conserved HMSNR haplotype is shown in grey, recombinations are in colour. Markers that have been added during this PhD project are in bold. “?” has been used where markers have not been typed for the particular haplotype. New “p” and new “q” denote the haplotypes of the newly added Bulgarian HMSNR patient K3. Thirteen markers between bA86K9CA1 and D10S1647, for which only haplotype “q” differed from the conserved haplotype “a”, have been omitted from the figure for simplicity. To save space, variants located in close proximity to each other have been summarised in groups.

98 Chapter 3: Refined Mapping of the Critical HMSNR Gene Region

While all HMSNR haplotypes were clearly related, as evident from their shared conserved segment, they showed a high diversity of recombinations. In this small sample of 25 affected individuals, 11 centromeric and 8 telomeric recombinations were detected, giving a total of 19 recombinations for the refined mapping. This haplotype diversity, generated by recombination events that accumulated on the ancestral HMSNR disease haplotype over many generations, gave the recombination mapping its power and made it possible to reduce the critical region to only 63.8 kb (Figure 18). The following sections describe the mapping of the recombinations in each family. The pedigrees show all microsatellites used in the haplotype analysis and a representative selection of SNPs (Figure 19). Newly identified markers are given in bold. For comparison, the conserved haplotype (haplotype “a”) has been added to each pedigree.
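The recombination mapping described above rests on a simple comparison: walking outward from a marker inside the shared segment, a recombinant haplotype keeps the conserved alleles until it crosses a historical breakpoint. The sketch below illustrates that logic, assuming the marker order and conserved alleles listed for haplotype “a” in Figure 19. The function name `breakpoint_intervals`, the choice of D10S1672 as the starting (core) marker and the example haplotype are illustrative assumptions. The sketch treats the first mismatch as a crossover, so on its own it cannot distinguish a recombination from a microsatellite mutation.

```python
from typing import Dict, List, Optional, Tuple

# Marker order (centromeric -> telomeric) and conserved alleles of haplotype "a",
# taken from the conserved HMSNR haplotype shown in Figure 19.
MARKERS: List[str] = [
    "D10S581", "D10S1646", "D10S1670", "bA153K11CA2", "D10S210",
    "bA153K11CA1", "D10S2480", "bA86K9-CA1", "bA314J18TA1", "D10S1678",
    "D10S1647", "#62", "#171", "bA227H15CA1", "bA227H15CA2", "D10S1672",
    "#156", "#85", "bA227H15AAAC", "D10S1742", "bA404C6AC1", "bA404C6AC2",
    "D10S1665", "D10S560",
]
CONSERVED_ALLELES = [3, 2, 1, 7, 5, 2, 1, 3, 5, 3, 6, 1, 1, 3, 2, 1, 1, 1, 3, 1, 7, 4, 7, 7]
assert len(CONSERVED_ALLELES) == len(MARKERS)
CONSERVED: Dict[str, int] = dict(zip(MARKERS, CONSERVED_ALLELES))

# (last marker still matching the conserved haplotype, first marker that differs)
Interval = Optional[Tuple[str, str]]


def breakpoint_intervals(hap: Dict[str, Optional[int]], core: str) -> Tuple[Interval, Interval]:
    """Walk outward from `core` (assumed to lie inside the shared segment) and
    return the centromeric and telomeric intervals in which `hap` stops matching
    the conserved haplotype. Untyped markers (None or absent) are skipped;
    None for a side means no recombination was detected on that side."""
    start = MARKERS.index(core)

    def scan(step: int) -> Interval:
        last_match = core
        i = start + step
        while 0 <= i < len(MARKERS):
            marker = MARKERS[i]
            allele = hap.get(marker)
            if allele is not None:
                if allele != CONSERVED[marker]:
                    return (last_match, marker)
                last_match = marker
            i += step
        return None

    return scan(-1), scan(+1)


# Hypothetical recombinant haplotype: conserved except at both ends, #156 untyped.
hap = dict(CONSERVED)
hap.update({"D10S581": 4, "D10S1742": 3, "#156": None})
print(breakpoint_intervals(hap, core="D10S1672"))
# (('D10S1646', 'D10S581'), ('bA227H15AAAC', 'D10S1742'))
```

Skipping untyped markers mirrors the “?” entries in the pedigrees: an untyped marker neither confirms nor moves a breakpoint, so the interval simply widens to the nearest typed markers.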

Conserved HMSNR haplotype (haplotype “a”), markers listed from centromeric to telomeric with their conserved alleles:
D10S581: 3
D10S1646: 2
D10S1670: 1
bA153K11CA2: 7
D10S210: 5
bA153K11CA1: 2
D10S2480: 1
bA86K9-CA1: 3
bA314J18TA1: 5 (newly identified)
D10S1678: 3
D10S1647: 6
#62: 1 (newly identified)
#171: 1 (newly identified; representative for the centromeric boundary of the critical region)
bA227H15CA1: 3 (refined region of homozygosity)
bA227H15CA2: 2 (refined region of homozygosity)
D10S1672: 1 (refined region of homozygosity)
#156: 1 (newly identified; representative for the telomeric boundary of the critical region)
#85: 1 (newly identified)
bA227H15AAAC: 3 (newly identified)
D10S1742: 1
bA404C6AC1: 7 (newly identified)
bA404C6AC2: 4 (newly identified)
D10S1665: 7
D10S560: 7

Figure 19: Markers used in the pedigrees in the following sections. Newly identified markers are shown in bold. The conserved HMSNR disease haplotype is shaded grey; this shading is also used in the following figures. The arrow indicates the refined region of homozygosity, which is defined by the three microsatellite markers bA227H15CA1, bA227H15CA2 and D10S1672.


3.3.3.1 The Spanish Gypsy families – no further recombination

The two Spanish Gypsy families (Sp-4 and Sp-5) had been recruited because collaborating clinicians had diagnosed in them a recessive neuropathy that is clinically and neuropathologically compatible with HMSNR. Genotyping demonstrated that their haplotypes in the critical interval on chromosome 10 were closely related to the HMSNR haplotype identified in the Bulgarian Gypsy kindred in which HMSNR had been mapped. While family Sp-4 (Figure 20) is consanguineous (second-cousin marriage), no consanguinity has been reported in family Sp-5 (Figure 21). In family Sp-4, one identical disease haplotype (“k”) was transmitted by both parents; it could have been introduced into the family by one of the great-great-grandparents of the affected individual Sp4-3. In family Sp-5, on the other hand, two disease haplotypes (“l” and “m”) with different historical recombinations were passed on to the affected subject Sp5-3. Altogether, this points to at least three different HMSNR haplotypes in the Spanish Gypsy population (Figure 20 and Figure 21), none of which was shared by any of the other HMSNR families. Data gathered by a co-worker before this PhD [6, 152] showed that, on the centromeric side of the critical region, the two Spanish nuclear families Sp-4 and Sp-5 (Figure 20 and Figure 21) had recombination breakpoints between D10S1670 and bA153K11CA1 (Sp4-3, haplotype “k”), between D10S1646 and D10S1670 (Sp5-3, haplotype “m”) and between bA153K11CA2 and D10S2480 (Sp5-3, haplotype “l”), while the telomeric boundary was located between D10S1665 and D10S560 in both families. The identical allele at marker D10S560 (allele “2”) in all three haplotypes indicates relatedness, with all three haplotypes probably derived from the same ancestral haplotype prevalent in the Spanish Gypsy sub-isolate.
The historical recombinations identified on the centromeric side warranted further analysis, whereas the telomeric one appeared less promising for the refined mapping. However, additional typing of these two families for newly identified polymorphisms (in bold in the pedigrees) did not shift the recombination breakpoints, so families Sp-4 and Sp-5 did not contribute further to the reduction of the critical region (Figure 20 and Figure 21).
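Why the Spanish families could not help further follows from how the critical region is defined: it is the overlap of the segments that every disease haplotype still shares with the conserved haplotype, so only the innermost breakpoint on each side constrains it. A minimal sketch of that intersection, assuming each recombinant haplotype has been reduced to the physical position (in kb, on an arbitrary scale) of its first non-conserved marker on each side. The function name and the haplotype labels and coordinates in the usage line are hypothetical and are not measurements from this study.

```python
from typing import Dict, Optional, Tuple

# haplotype -> (centromeric flank, telomeric flank) in kb; each flank is the
# position of the first marker that no longer matches the conserved haplotype,
# or None if the haplotype shows no recombination on that side.
Flanks = Tuple[Optional[float], Optional[float]]


def critical_region(breakpoints: Dict[str, Flanks]) -> Tuple[float, float]:
    """Return (left, right) bounds of the segment shared by all haplotypes:
    the innermost centromeric flank and the innermost telomeric flank."""
    cen = [c for c, _ in breakpoints.values() if c is not None]
    tel = [t for _, t in breakpoints.values() if t is not None]
    if not cen or not tel:
        raise ValueError("need at least one recombination on each side")
    left, right = max(cen), min(tel)
    if left >= right:
        raise ValueError("breakpoints are inconsistent with one shared segment")
    return left, right


# Hypothetical example: "h4" recombines outside the others and changes nothing.
region = critical_region({
    "h1": (100.0, None),
    "h2": (250.0, 900.0),
    "h3": (None, 620.0),
    "h4": (50.0, 1500.0),
})
print(region, region[1] - region[0])  # (250.0, 620.0) 370.0
```

A haplotype whose breakpoints lie outside the current bounds, as for “h4” here, leaves the result unchanged, which is the situation of haplotypes “k”, “l” and “m”.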


[Pedigree figure: family Sp-4, individuals Sp4-2, Sp4-3 and Sp4-4 with their alleles for markers D10S581 to D10S560, haplotype labels “k” and “a”, and the conserved HMSNR haplotype for comparison.]

Figure 20: Pedigree of the Spanish Gypsy family Sp-4. The recombination breakpoints on HMSNR haplotype “k” in family Sp-4 could not be moved any further, so this pedigree did not contribute to the reduction of the critical region. The pedigree contains all microsatellites and selected SNPs. Markers in bold have been newly added in the process of refined mapping. “?” has been used where markers have not been typed for that individual. The letters underneath the alleles represent haplotype names as in Figure 17 and Figure 18. SNP alleles are coded as follows: #62: 1=A, 2=C; #171: 1=C, 2=T; #156: 1=C, 2=T; #85: 1=T, 2=G.


[Pedigree figure: family Sp-5, individuals Sp5-1 (420), Sp5-2 (415) and Sp5-3 (408) with their alleles for markers D10S581 to D10S560, haplotype labels “l”, “m” and “a”, and the conserved HMSNR haplotype for comparison.]

Figure 21: Pedigree of the Spanish Gypsy family Sp-5. The recombination breakpoints on HMSNR haplotypes “l” and “m” in family Sp-5 could not be moved any further, so this pedigree did not contribute to the reduction of the critical region. The pedigree contains all microsatellites and selected SNPs. Markers in bold have been newly added in the process of refined mapping. “?” has been used where markers have not been typed for that individual. The letters underneath the alleles represent haplotype names as in Figure 17 and Figure 18. SNP alleles are coded as follows: #62: 1=A, 2=C; #171: 1=C, 2=T; #156: 1=C, 2=T; #85: 1=T, 2=G.

3.3.3.2 The large Bulgarian Gypsy kindred – historical recombination on the telomeric side

Two of the four Bulgarian Gypsy families, namely BULG-1a and BULG-1b, were used in the genome scan of the original mapping study, while BULG-1c and BULG-1d were typed for the markers of the critical region on chromosome 10 after linkage had been established [5]. All four families are part of the large Bulgarian Gypsy pedigree in which HMSNL and HMSNR segregate independently (Figure 7, [5]). The DNA sample of R1 only became available at a later stage (R1 is not included in the pedigree in Figure 7). In family BULG-1a (Figure 22, [5]) the great-grandmothers of the affected individuals are sisters. Interestingly, the two HMSNR haplotypes found in this family (haplotypes “e” and “f”) are not identical, contrary to what one would expect in a consanguineous family. This raises the question of whether one of the haplotypes was introduced by an individual who married into the family or was produced by recombination within the family. In family BULG-1b (Figure 23, [5]) four HMSNR haplotypes (“a”, “b”, “c” and “d”) were detected. Haplotype “a” is the conserved ancestral haplotype; the other three were generated by recombination. One of them is the product of a recent centromeric recombination (haplotype “c” in R11), while the others are due to historical recombination. One disease haplotype (haplotype “b”) occurs in all seven affected individuals but was not introduced by the same subject, as R15 does not carry this haplotype. In family BULG-1c (Figure 24, [5]) the two siblings (R16 and R17) carry haplotype “f”, identical to that in BULG-1a, and the recombinant haplotype “i”, which carries a historical recombination. Because parental DNA samples were lacking, haplotypes were inferred on the basis of the haplotypes seen in the other HMSNR families. In BULG-1d (Figure 25), six HMSNR haplotypes (“e”, “f”, “g”, “h”, “i” and “j”) with different centromeric and telomeric recombination breakpoints were detected.
Haplotype “e” also occurs in family BULG-1a, haplotype “f” also in BULG-1a and BULG-1c, and haplotype “i” was also observed in BULG-1c, as mentioned above. Haplotypes “g”, “h” and “j” have only been observed in BULG-1d. As DNA samples for the parents of R4 and R5 were unavailable, the haplotypes of these individuals had to be inferred.


[Pedigree figure: family BULG-1a, individuals R-21, R-22, R-23, R-42, R-43, R40 and R41 with their alleles for markers D10S581 to D10S560, haplotype labels “e”, “f” and “a”, and the conserved HMSNR haplotype for comparison.]

Figure 22: Pedigree of the Bulgarian Gypsy family BULG-1a. While HMSNR haplotype “f” did not contribute to the reduction of the critical region, it was possible to refine the recombination breakpoint of haplotype “e” to the vicinity of SNP #156. The pedigree contains all microsatellites and selected SNPs. Markers in bold have been newly added in the process of refined mapping. “?” has been used where markers have not been typed for that individual. The letters underneath the alleles represent haplotype names as in Figure 17 and Figure 18. SNP alleles are coded as follows: #62: 1=A, 2=C; #171: 1=C, 2=T; #156: 1=C, 2=T; #85: 1=T, 2=G.


[Pedigree figure: family BULG-1b, individuals R-15, R-10, R-9, R-13, R-14, R-11, R-12, R-6, R-7 and R-8 with their alleles for markers D10S581 to D10S560, haplotype labels “a”, “b”, “c” and “d”, and the conserved HMSNR haplotype for comparison.]

Figure 23: Pedigree of the Bulgarian Gypsy family BULG-1b. Haplotype “a” is the conserved HMSNR haplotype. For haplotypes “b” and “c”, the recombination breakpoints were not changed in the process of the refined mapping. Haplotype “d” is closely related to haplotype “e” (Figure 22). Both exhibit the same recombination breakpoint between SNPs #156 and #85 and differ only at marker D10S560, which suggests that an independent recombination event took place on one of the haplotypes. The pedigree contains all microsatellites and selected SNPs. Markers in bold have been newly added in the process of refined mapping. “?” has been used where markers have not been typed for that individual. The letters underneath the alleles represent haplotype names as in Figure 17 and Figure 18. SNP alleles are coded as follows: #62: 1=A, 2=C; #171: 1=C, 2=T; #156: 1=C, 2=T; #85: 1=T, 2=G.


[Pedigree figure: family BULG-1c, siblings R-16 and R-17 with their alleles for markers D10S581 to D10S560, haplotype labels “f”, “i” and “a”, and the conserved HMSNR haplotype for comparison.]

Figure 24: Pedigree of the Bulgarian Gypsy family BULG-1c. The recombination breakpoints on HMSNR haplotypes “f” and “i” were not changed in the process of the refined mapping. The pedigree contains all microsatellites and selected SNPs. Markers in bold have been newly added in the process of refined mapping. “?” has been used where markers have not been typed for that individual. The letters underneath the alleles represent haplotype names as in Figure 17 and Figure 18. SNP alleles are coded as follows: #62: 1=A, 2=C; #171: 1=C, 2=T; #156: 1=C, 2=T; #85: 1=T, 2=G.


[Pedigree figure: family BULG-1d, individuals R-4, R-3, R-1, R-2 and R-5 with their alleles for markers D10S581 to D10S560, haplotype labels “e”, “f”, “g”, “h”, “i”, “j” and “a”, and the conserved HMSNR haplotype for comparison.]

Figure 25: Pedigree of the Bulgarian Gypsy family BULG-1d. For HMSNR haplotypes “f”, “g”, “h”, “i” and “j”, the refined mapping did not change the recombination breakpoints. Haplotype “e”, which was already seen in family BULG-1a (Figure 22), also appears in this family. The pedigree contains all microsatellites and selected SNPs. Markers in bold have been newly added in the process of refined mapping. “?” has been used where markers have not been typed for that individual. The letters underneath the alleles represent haplotype names as in Figure 17 and Figure 18. SNP alleles are coded as follows: #62: 1=A, 2=C; #171: 1=C, 2=T; #156: 1=C, 2=T; #85: 1=T, 2=G.

These families initially showed the furthest centromeric recombination between D10S581 and D10S1646 in a number of individuals and haplotypes, while promising telomeric recombinations between D10S1672 and D10S1742 were observed in BULG-1b (haplotype “d”, Figure 23) and in BULG-1a and d (haplotype “e”, Figure 22 and Figure 25) [5]. However, the addition of R1 to family BULG-1d provided another centromeric recombination breakpoint, between D10S1646 and D10S1670, which lies further into the region of homozygosity (Figure 25). During the refined mapping, no newly identified informative variants moved the centromeric breakpoints further into the critical region, whereas the telomeric end of the region of homozygosity was redefined. The critical HMSNR region was reduced through the telomeric recombination in haplotypes “d” and “e”, which had previously been placed between D10S1672 and D10S1742. Typing of additional variants localised the recombination breakpoint of both haplotypes to the vicinity of SNP #163 (located close to SNP #85, which is shown in the pedigrees), thus reducing the critical region by 129.9 kb. Both haplotypes carry exactly the same alleles apart from D10S560, which indicates that a single recombination near SNP #163 generated an ancestral haplotype, while an additional recombination with a breakpoint centromeric of D10S560 occurred on one of the two haplotypes. The individuals carrying haplotypes “d” and “e” are R13 and R14 from BULG-1b; R21, R22, R23 and R42 from BULG-1a; and R5 from BULG-1d (Figure 22, Figure 23, Figure 24, Figure 25).
The fact that two closely related recombinant haplotypes (“d” and “e”) carrying one and the same historical recombination near SNP #163 were transmitted in different branches of the large pedigree implies that the historical recombination that created the ancestral haplotype may have happened several generations ago. Furthermore, this haplotype could be common in the Bulgarian Gypsy population.
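The inference that haplotypes “d” and “e” descend from one ancestral recombination rests on listing the typed markers at which they differ. A minimal sketch, assuming each haplotype is stored as a Python dict from marker name to allele (None for untyped). The function name is illustrative, and the allele values in the example are toy values loosely patterned on the telomeric part of the pedigrees, not a transcription of them.

```python
from typing import Dict, List, Optional

Haplotype = Dict[str, Optional[int]]


def differing_markers(h1: Haplotype, h2: Haplotype) -> List[str]:
    """Return the markers typed in both haplotypes whose alleles differ,
    in the order in which the markers appear in h1."""
    diffs = []
    for marker, allele in h1.items():
        other = h2.get(marker)
        if allele is None or other is None:
            continue  # untyped in either haplotype: uninformative
        if allele != other:
            diffs.append(marker)
    return diffs


# Toy telomeric-side haplotypes (hypothetical alleles, for illustration only).
hap_d = {"D10S1672": 1, "#156": 1, "#85": 2, "D10S1742": 4, "D10S1665": 5, "D10S560": 8}
hap_e = {"D10S1672": 1, "#156": 1, "#85": 2, "D10S1742": 4, "D10S1665": 5, "D10S560": 2}

print(differing_markers(hap_d, hap_e))  # ['D10S560']
```

A single difference at the distal end, as in this toy case, is the pattern expected when both haplotypes share one breakpoint and one of them later acquired a second recombination further out.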

3.3.3.3 The Romanian Gypsy families – centromeric and telomeric recombination

Like the Spanish Gypsy families, the two Romanian Gypsy families were recruited by collaborators in Europe because of a recessive neuropathy with a phenotype compatible with HMSNR. Genotyping for the markers that had shown linkage in the Bulgarian Gypsy families revealed closely related haplotypes, and these families were added to the HMSNR study [6].

New subjects, specifically two more patients (ROM-13 and ROM-14) and their parents (ROM-8 and ROM-9) (Figure 26) from a branch of the family residing in France, were added to the large pedigree (ROM-1) during this PhD. Although the number of affected individuals had now risen to five, linkage analysis was inconclusive because of the unusual inheritance of one of the HMSNR haplotypes (haplotype “n”, Figure 26). This haplotype was introduced into the family by three seemingly unrelated individuals (ROM-53, ROM-8 and the husband of ROM-52), two of whom married into the family. This raises the possibility of unreported consanguinity or, alternatively, that haplotype “n” is very frequent among Romanian Gypsies, which is supported by its occurrence in another, unrelated family (ROM-2, Figure 27). Apart from its inheritance, haplotype “n” has another interesting feature: the conserved HMSNR allele at D10S1647 is 6, but in haplotype “n” it is 5. Mutation rates for microsatellites have been estimated at about 10⁻³ to 10⁻⁴ per locus per generation [213, 214]. Because there was no evidence of an ongoing recombination on either side of the marker, it seemed likely that allele “5” at D10S1647 was the product of a mutation, and the recombination mapping was therefore carried out under the assumption of a microsatellite mutation. At the start of this PhD, the furthest recombination breakpoints in the two Romanian Gypsy families were flanked by D10S1670 and bA153K11CA1 on the centromeric side (haplotype “o”) and by D10S1672 and D10S1742 on the telomeric side (haplotype “n”) [6]. Overall, four HMSNR haplotypes occurred in these two families, namely haplotypes “a”, “i”, “n” and “o” (Figure 26 and Figure 27). Haplotype “a”, the conserved ancestral haplotype, and haplotype “i” were also found in the Bulgarian Gypsies, indicating a common origin of these haplotypes in all of the families.
Haplotypes “n” and “o”, on the other hand, seem to be confined to the Romanian Gypsy families. Both carried promising recombinations for the refined mapping, haplotype “n” on the centromeric and haplotype “o” on the telomeric side of the critical region.
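To put the quoted microsatellite mutation rates in perspective, the probability that an allele changes at least once along a single line of descent is 1 - (1 - μ)^g for a per-generation rate μ over g generations, assuming mutations occur independently in each generation. The sketch below evaluates this expression; the function name is illustrative, and the depth of 20 generations is a hypothetical value chosen for the example, not an estimate of the age of any HMSNR haplotype.

```python
def p_mutation(mu: float, generations: int) -> float:
    """Probability of at least one mutation at one locus along a single
    lineage, assuming an independent per-generation mutation rate mu."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must be a probability")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return 1.0 - (1.0 - mu) ** generations


# Rates quoted in the text (1e-3 to 1e-4 per locus per generation);
# 20 generations is a hypothetical depth used only for illustration.
for mu in (1e-3, 1e-4):
    print(f"mu = {mu:g}: P(>=1 mutation in 20 generations) = {p_mutation(mu, 20):.4f}")
```

For small μ the result is close to μ·g, so under these assumptions a mutation on one lineage is possible but uncommon, which is why denser typing around D10S1647 was needed to decide between mutation and recombination.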


[Pedigree figure: family ROM-1, individuals ROM-41, ROM-53, ROM-52, ROM-8, ROM-9, ROM-42, ROM-23, ROM-5, ROM-15, ROM-47, ROM-12, ROM-13, ROM-14, ROM-55, ROM-20 and ROM-16 with their alleles for markers D10S581 to D10S560, haplotype labels “n”, “o”, “i” and “a”, and the conserved HMSNR haplotype for comparison.]

Figure 26: Pedigree of the large Romanian Gypsy family ROM-1. For both HMSNR haplotypes “n” and “o”, it was possible to move the recombination breakpoint further into the critical region, thereby reducing it substantially. Haplotype “n” is introduced into the family on three occasions, namely by ROM-8, the husband of ROM-52 and ROM-53. The allele “5” at marker D10S1647 was previously thought to be a microsatellite mutation. However, additional data obtained during the refined mapping support this marker as a recombination breakpoint on the centromeric side of the critical HMSNR gene region. Haplotype “o”, which occurs in HMSNR patient ROM-23 and his father ROM-41, has enabled a reduction of the critical region on the telomeric side, redefining the boundary of the critical region to the vicinity of SNP #156. The pedigree contains all microsatellites and selected SNPs. Markers in bold have been newly added in the process of refined mapping. “?” has been used where markers have not been typed for that individual. The letters underneath the alleles represent haplotype names as in Figure 17 and Figure 18. SNP alleles are coded as follows: #62: 1=A, 2=C; #171: 1=C, 2=T; #156: 1=C, 2=T; #85: 1=T, 2=G.


[Pedigree figure: family ROM-2, individuals II-1 (R148X +/-), II-2 (R148X -/-) and II-3 (R148X -/-) with their alleles for markers D10S581 to D10S560, haplotype labels “n”, “o” and “a”, and the conserved HMSNR haplotype for comparison.]

Figure 27: Pedigree of the small Romanian Gypsy family ROM-2. Haplotypes “n” and “o”, seen in family ROM-1 (Figure 26), also occur in this small nuclear family, suggesting that haplotypes resulting from the same historical recombination occur across the population and can be shared between unrelated families. Furthermore, this pedigree shows that an individual may be affected with HMSNR and at the same time be a carrier for HMSNL, as is the case for individual II-1. The pedigree contains all microsatellites and selected SNPs. Markers in bold have been newly added in the process of refined mapping. “?” has been used where markers have not been typed for that individual. The letters underneath the alleles represent haplotype names as in Figure 17 and Figure 18. SNP alleles are coded as follows: #62: 1=A, 2=C; #171: 1=C, 2=T; #156: 1=C, 2=T; #85: 1=T, 2=G. (R148X +/+ = HMSNL affected; R148X +/- = HMSNL carrier.)


Haplotype analysis and further typing of newly identified variants in both the small (ROM-2) and the large (ROM-1) Romanian Gypsy families helped to reduce the region of homozygosity on both the centromeric and the telomeric side. The centromeric boundary of the region of homozygosity was moved by 548 kb, from bA86K9CA1 to D10S1647 (haplotype “n”), the marker that had previously been assumed to carry a microsatellite mutation. The position of this recombination breakpoint is supported by three other variants, namely SNPs #5 and #39 and the microsatellite bA314J18-TA1 (Figure 28).

[Haplotype figure: centromeric side of haplotype “n” aligned against the conserved haplotype across SNPs #5 to #171 and markers D10S581 to bA227H15CA2; distance labels ~548 kb, 155.1 kb, 122.7 and 185.9 kb.]

Figure 28: Comparison of the centromeric side of haplotype “n” with the conserved haplotype. Markers in bold were newly added during the refined mapping in this PhD project. “?” has been used where markers have not been typed for that haplotype. Alleles that support the recombination breakpoint at D10S1647 are italicised and framed.

In a similar way, extensive typing of variants provided support for the telomeric recombination (Figure 29). After SNPs #85 to #88 and #95 had been identified, there initially appeared to be two possible explanations. Either two recombination events had occurred, leaving the part of the haplotype between #95 and D10S1742 identical to the conserved haplotype, or the order of the markers was wrong. A thorough investigation of the area, including the region beyond D10S1742, confirmed that the marker order was in fact correct. With the addition of new informative markers, the telomeric recombination breakpoint was mapped to the interval between D10S1672 and SNP #156 (haplotype “o”). This led to the conclusion that the recombination on haplotype “o”, which occurs only in individuals Rom-23 and II-1, differs from the one observed on haplotypes “d” and “e” in families BULG-1a, b and d (Figure 29). The historical recombination in haplotype “o” maps 0.7 kb centromeric of the breakpoint of haplotypes “d” and “e” in the Bulgarian Gypsies. It therefore provides strong support for the boundary defined by these two haplotypes. In summary, the recombinations in the Romanian Gypsies refined the shared region of homozygosity to just 109.3 kb, flanked by the microsatellite marker D10S1647 and SNP #156.

Figure 29: Comparison of the telomeric side of the conserved HMSNR haplotype with haplotypes “o”, “d” and “e”. Markers in bold were newly added during the refined mapping in this PhD project. “?” has been used where markers have not been typed for that haplotype. Alleles that support the recombination breakpoint at #156 in haplotype “o” are italicised and framed.
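The breakpoint reasoning above compares a recombinant haplotype with the conserved haplotype marker by marker. Walking outward from a shared marker, the breakpoint lies between the last concordant and the first discordant informative marker. The Python sketch below illustrates this logic under stated assumptions. It is not the software used in this project, and the function name `breakpoint_interval` is hypothetical. Untyped markers (“?”) are treated as uninformative. The example alleles are a subset of haplotype “q” and of the conserved haplotype, transcribed from the genotypes in Figure 31 for illustration only.

```python
from typing import Optional, Sequence, Tuple


def breakpoint_interval(markers: Sequence[str],
                        conserved: Sequence[str],
                        observed: Sequence[str],
                        start: int,
                        direction: int) -> Optional[Tuple[str, str]]:
    """Walk outward from a marker shared with the conserved haplotype.

    direction = -1 walks towards the centromere (lower indices), +1 towards
    the telomere. Returns (last concordant marker, first discordant marker),
    which brackets the recombination breakpoint, or None if the haplotype
    matches the conserved one up to the end of the marker list.
    Untyped alleles ("?") are skipped as uninformative.
    """
    if direction not in (-1, 1):
        raise ValueError("direction must be -1 or +1")
    if observed[start] != conserved[start]:
        raise ValueError("start marker must be shared with the conserved haplotype")
    last_shared = markers[start]
    i = start + direction
    while 0 <= i < len(markers):
        allele = observed[i]
        if allele != "?":
            if allele == conserved[i]:
                last_shared = markers[i]
            else:
                return last_shared, markers[i]
        i += direction
    return None


# Marker order (centromeric to telomeric) and alleles; illustrative subset
# transcribed from Figure 31 (haplotype "q" versus the conserved haplotype).
MARKERS = ["D10S1678", "D10S1647", "#62", "#171", "bA227H15CA1",
           "bA227H15CA2", "D10S1672", "#156"]
CONSERVED = ["3", "6", "1", "1", "3", "2", "1", "1"]
HAPLOTYPE_Q = ["3", "2", "2", "2", "3", "2", "1", "1"]

if __name__ == "__main__":
    core = MARKERS.index("D10S1672")
    print(breakpoint_interval(MARKERS, CONSERVED, HAPLOTYPE_Q, core, -1))
    print(breakpoint_interval(MARKERS, CONSERVED, HAPLOTYPE_Q, core, +1))
```

The centromeric walk stops at #171 and places the breakpoint between #171 and bA227H15CA1. D10S1678 matches the conserved allele again further out, but only by state, which is why the walk stops at the first discordant marker. A single discordant microsatellite could also reflect a microsatellite mutation rather than a recombination, as was once assumed for D10S1647. Supporting SNPs therefore remain essential.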

3.3.3.4 The small Bulgarian Gypsy family – recent centromeric recombination

With the new critical region in mind, a nuclear family from Bulgaria (BULG-2) was re-assessed. In this family, both mother and son are affected with a peripheral neuropathy. The mother shows the typical symptoms of HMSNL and was found to be homozygous for the truncating mutation R148X in NDRG1 on chromosome 8q24. The son and the daughter, by contrast, are carriers of the HMSNL haplotype and heterozygous for the HMSNL mutation. Therefore, the son must be affected by a different neuropathy from his mother. A co-worker performed the initial genotyping of HMSNR markers on chromosome 10 (Figure 30). This demonstrated that the mother was a carrier of the conserved HMSNR haplotype. Both son and daughter had inherited her normal chromosome and had received a conserved HMSNR chromosome from their deceased father. In the course of this PhD project, the members of this family were typed for additional markers in the HMSNR region.


Figure 30: Pedigree of the Bulgarian Gypsy family BULG-2 as obtained from the initial genotyping of markers in the broad HMSNR interval. The initial typing of selected markers in the HMSNR region suggested that all three individuals in this pedigree were carriers for HMSNR. Grey areas highlight the conserved HMSNR haplotype. R148X +/+ indicates an HMSNL-affected individual, while R148X +/- indicates an HMSNL carrier.

The recombinations identified in the other families had reduced the critical region of homozygosity to just 109.3 kb. Additional variants were therefore typed in this family. This showed that the chromosome K3 inherited from his mother (haplotype “q”) had been subject to recombination. Its centromeric end derives from her normal chromosome, while its telomeric end derives from her HMSNR chromosome “i”. Haplotype “i” also occurs in the Romanian Gypsy families and in parts of the large Bulgarian Gypsy pedigree. The breakpoint of this maternal recombination was mapped between SNP #171 and the microsatellite marker bA227H15CA1, which reduced the size of the critical region to just 63.8 kb (Figure 31). Prompted by these genetic findings, the patient underwent an extended clinical examination. This confirmed what the previous clinical examination and the genotype data had indicated: K3 was not affected by HMSNL. He had normal BAEPs (brainstem auditory evoked potentials) and higher nerve conduction velocities than HMSNL patients of a similar age. Furthermore, his clinical picture was within the range seen in HMSNR patients [215].

Figure 31: Pedigree of the Bulgarian Gypsy family BULG-2 after extended typing of polymorphisms. The extended typing effort in this family detected a maternal recombination that generated the HMSNR haplotype “q”. This haplotype proves crucial for defining the centromeric boundary of the 63.8 kb critical region. The pedigree contains all microsatellites and selected SNPs. Markers in bold have been newly added in the process of refined mapping. “?” has been used where markers have not been typed for that individual. The letters underneath the alleles represent haplotype names as in Figure 17 and Figure 18. SNP alleles are coded as follows: #62: 1=A, 2=C; #171: 1=C, 2=T; #156: 1=C, 2=T; #85: 1=T, 2=G.

3.3.3.5 Summary of the recombination mapping

At the beginning of this PhD project, 15 HMSNR haplotypes (haplotypes “a” to “o”) had been identified. The boundaries of the critical region lay between bA86K9CA1 and D10S1678 on the centromeric side and between D10S1672 and D10S1742 on the telomeric side. This interval was estimated to be over 1 Mb in 2001. However, once the Human Genome Project completed the sequence of this region, its true size was found to be around 0.8 Mb. Recombination mapping in the interval between bA86K9CA1 and D10S1742 in the HMSNR families moved the recombination breakpoints of four haplotypes: “d”, “e”, “n” and “o”. A high-density physical-genetic map allowed these breakpoints to be mapped with high accuracy. Furthermore, a new Bulgarian Gypsy family, BULG-2, was included in the typing effort, adding haplotypes “p” and “q” to the picture. Haplotype “q” was produced by a recent recombination in the mother of the affected individual, and it was crucial for the refined mapping (Figure 18). Refined mapping and the addition of the new family thus increased the number of HMSNR haplotypes from 15 to 17. Twelve of the haplotypes (“a” to “j”, “p” and “q”) were found in the Bulgarian Gypsies. Two of these (“a” and “i”) also occurred in the Romanian Gypsy families, which in addition shared two further haplotypes (“n” and “o”) with each other. The two Spanish HMSNR Gypsy families each exhibited distinct haplotypes that they did not share with each other (Sp4: haplotype “k”; Sp5: haplotypes “l” and “m”). The new shared region of homozygosity between the Bulgarian, Spanish and Romanian Gypsy families has a size of only 63.8 kb. Its centromeric boundary is defined by the recent recombination in haplotype “q”, with the breakpoint telomeric to SNP #171. Its telomeric boundary is given by SNP #156 in haplotype “o” (Figure 18).
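The summary above reduces to a simple rule. Each recombinant haplotype truncates the ancestral segment on one side. The region still shared by all disease haplotypes is bounded by the most telomeric of the centromeric breakpoints and the most centromeric of the telomeric breakpoints. The Python sketch below encodes that rule under stated assumptions. The function name `minimal_shared_region` is hypothetical, and positions are represented only by marker order, not by base-pair coordinates. Each haplotype is summarised by the first marker at which it departs from the conserved haplotype, as stated in the text for “n”, “q” and “o”.

```python
from typing import Dict, Sequence, Tuple


def minimal_shared_region(markers: Sequence[str],
                          cen_breaks: Dict[str, str],
                          tel_breaks: Dict[str, str]
                          ) -> Tuple[Tuple[str, str], Tuple[str, str]]:
    """Return ((centromeric flank, haplotype), (telomeric flank, haplotype)).

    cen_breaks / tel_breaks map a haplotype name to the first marker at which
    it no longer matches the conserved haplotype on that side. `markers` must
    be ordered from centromere to telomere.
    """
    pos = {m: i for i, m in enumerate(markers)}
    # The innermost breakpoint on each side defines the boundary.
    cen_hap = max(cen_breaks, key=lambda h: pos[cen_breaks[h]])
    tel_hap = min(tel_breaks, key=lambda h: pos[tel_breaks[h]])
    if pos[cen_breaks[cen_hap]] >= pos[tel_breaks[tel_hap]]:
        raise ValueError("no region is shared by all haplotypes")
    return (cen_breaks[cen_hap], cen_hap), (tel_breaks[tel_hap], tel_hap)


# Marker order (centromeric to telomeric) as listed in the pedigree figures.
MARKERS = ["bA86K9CA1", "bA314J18TA1", "D10S1678", "D10S1647", "#62",
           "#171", "bA227H15CA1", "bA227H15CA2", "D10S1672", "#156",
           "#85", "bA227H15AAAC", "D10S1742"]

if __name__ == "__main__":
    print(minimal_shared_region(MARKERS,
                                cen_breaks={"n": "D10S1647", "q": "#171"},
                                tel_breaks={"o": "#156"}))
```

Without haplotype “q”, the same call returns D10S1647 and #156, which is the 109.3 kb region defined by the Romanian families. Adding “q” moves the centromeric flank to #171, giving the 63.8 kb region.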


3.4 SUMMARY AND DISCUSSION OF THE REFINED MAPPING

The refined mapping of the critical HMSNR gene region relied on the assumption that the disease-causing mutation is a founder mutation in the Gypsy population. The ancestral haplotype acquired the mutation many generations ago and has since been subject to recombination. Accurately mapping the recombination breakpoints on the resulting haplotypes would reveal the size of the remaining part of the ancestral haplotype that is common to all HMSNR disease haplotypes. This minimal region of shared homozygosity must contain the disease-causing mutation. To achieve this aim, the physical and genetic data of numerous markers had to be integrated into a bigger picture. The strategy was to combine refined mapping with mutation analysis. Exons of positional candidates were sequenced, and each identified variant was examined both as a putative mutation and as a marker informative about a recombination breakpoint. Previously, the HMSNR gene region had been mapped to the interval flanked by bA86K9CA1 on the centromeric side and D10S1742 on the telomeric side, estimated in 2001 to be over 1 Mb. However, once the HGP had completed the sequence of this part of chromosome 10, its true size was revealed to be ~0.8 Mb. The identification of new variants concentrated on the slightly larger area between bA86K9CA1 and D10S560. This was necessary because the order of markers at the telomeric boundary of the HMSNR region was uncertain, owing to incomplete BAC clone sequence and gaps in the map. It was therefore possible that this area was actually part of the HMSNR region. In addition, more informative markers were needed to clarify the haplotypes between D10S1742 and D10S560. Moreover, the region flanked by D10S1742 and D10S560 contained two excellent positional candidates for which functions in the PNS had been proposed. In total, 216 SNPs and 13 insertion/deletions were identified.
Amongst the insertion/deletions, 1 bp deletions were relatively abundant. This might be explained by the loss of a base through spontaneous depurination, one of the most common chemical changes that DNA undergoes (reviewed in [10]). Moreover, 46 % of all deletions and 63 % of all insertions submitted to the Human Gene Mutation Database (HGMD) involve 1 bp [216].

Amongst the 216 SNPs, transitions accounted for 73.6 % and transversions for 26.4 %. In most cases, a single nucleotide polymorphism is attributed to chemical changes of a base, such as deamination or tautomeric shifts. For example, deamination transforms cytosine into uracil, which pairs with adenine instead of guanine (reviewed in [10]). After replication, this produces a C to T transition. A tautomeric shift likewise always results in a transition (reviewed in [16]). Together, these mechanisms could explain why transitions were much more frequent amongst the SNPs identified in this PhD. Moreover, the proportions observed among the 216 SNPs of the HMSNR study match the figures from a study of 63,609 SNPs by Salisbury quite well. In that study, transitions represented 71 % and transversions 29 % of the total [217]. The report by Salisbury also found that neither DNA strand seems to be preferred for the generation of a SNP. Thus, A to G transitions and T to C transitions, which are the same event viewed from opposite strands, occur at the same frequency. Approximately 150 kb were sequenced in the course of this PhD, giving an overall density of 1.53 SNPs and insertion/deletions per kb of DNA analysed. Inside the refined critical region of 63.8 kb, a higher density of one variant per 635 bp was achieved. Microsatellite markers proved more likely to be informative about recombination breakpoints. SNPs and insertion/deletions, however, were important for fine-tuning the mapping and provided vital support for the microsatellite data. Finally, 54 kb of the 63.8 kb region have been sequenced, with one variant every 635 bp. It can therefore be assumed that refined mapping has been exhausted and that the recombination breakpoints are unlikely to be moved further into the critical region.
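The transition/transversion classification and the variant densities quoted above can be checked with a few lines of Python. This is a minimal sketch. The transition and transversion counts (159 and 57 of 216) are back-calculated from the stated percentages and are not reported in the text. The 150 kb sequenced length is approximate, so the density is approximate too.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    ref, alt = ref.upper(), alt.upper()
    bases = {ref, alt}
    if ref == alt or not bases <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    return bases <= PURINES or bases <= PYRIMIDINES


def percentage(part: float, whole: float) -> float:
    return 100.0 * part / whole


def variants_per_kb(n_variants: int, length_bp: float) -> float:
    return n_variants / (length_bp / 1000.0)


if __name__ == "__main__":
    print(is_transition("C", "T"))      # deamination of C gives a C>T transition
    print(is_transition("A", "C"))      # purine to pyrimidine: a transversion
    print(round(percentage(159, 216), 1), round(percentage(57, 216), 1))
    print(round(variants_per_kb(216 + 13, 150_000), 2))   # whole sequenced area
    print(round(variants_per_kb(1, 635), 2))              # inside 63.8 kb region
```

One variant per 635 bp corresponds to about 1.57 variants per kb, slightly above the overall 1.53 per kb, consistent with the statement that the density inside the critical region was higher.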
Because of the genetic isolation of the Gypsies, not all identified polymorphisms were expected to be known variants, i.e. variants already submitted to the dbSNP database by other laboratories. This held true: ~22 % of all variants seemed to be private to the Gypsies. The remaining 78 % of the identified polymorphisms appear to be shared with other populations. However, this figure excludes known variants that were simply not polymorphic in the Gypsies. Furthermore, two thirds of the variants located in the 0.8 Mb interval that formed the critical HMSNR region at the beginning of this PhD were only submitted to databases between March 2003 and October 2004. By March 2003, the critical region had already been refined to nearly its present size. Sequencing was therefore very useful for identifying variants. Haplotype analysis in the region between D10S581 and D10S560 incorporated 20 microsatellite markers and 56 informative SNPs and insertion/deletions. It revealed that the ancestral HMSNR haplotype had been subject to numerous recombinations, which points to the mutation being old. At the same time, the low number of affected individuals and their restriction to one particular Gypsy group (the Kalderash) indicate the influence of genetic drift. Drift would lead to enrichment of the HMSNR haplotype in one group and loss of the allele in other groups. Both the Bulgarian and the Romanian Gypsy families contributed crucial recombinations. One of the important recombinations was only identified after refined mapping on the centromeric side had progressed considerably, which led to the inclusion of a nuclear Gypsy family from Bulgaria in the study. Extensive typing of variants revealed a crucial recent recombination in a male subject diagnosed with a recessive peripheral neuropathy similar to HMSNR. This emphasizes the difficulties caused by an HMSNR phenotype that strongly resembles the classical CMT phenotype and has few specific features. The critical region was successfully reduced from about 1 Mb (in 2001) between bA86K9CA1 and D10S1742 to just 63.8 kb flanked by SNPs #171 and #156. An interval of this size is amenable to direct sequencing, which facilitates the identification of the disease-causing mutation. The refinement of the HMSNR gene region to just 63.8 kb compares favourably with refined mapping efforts for the other two neuropathies in the Gypsies. In these, the shared region of homozygosity was reduced to 200 kb for HMSNL [7] and to 155 kb for CCFDN [109]. The result also compares well with refined mapping examples in the Finns, where the EPM locus was refined to 176 kb and the PRO-SL locus to 150 kb (reviewed in [2]).

119 Chapter 4: Analysis of Positional Candidates-Identification of Putative HMSNR Mutations

4 ANALYSIS OF POSITIONAL CANDIDATES – IDENTIFICATION OF PUTATIVE HMSNR MUTATIONS

Chapter outline

This chapter deals with the following issues:
1. The process of finding putative mutations, the exclusion criteria and the strategy behind the analysis of positional candidate genes.
2. Results of the screening of positional candidates for mutations and the characteristics of these putative mutations.
3. Introduction of two putative HMSNR mutations in the gene hexokinase 1 (HK1) and screening for these mutations in Gypsy and non-Gypsy subjects.
4. Screening of HK1 in a Spanish Non-Gypsy patient with a complex phenotype, including a neuropathy with neuropathological features similar to those seen in HMSNR.
5. Summary of the mutation analysis and conclusion.

4.1 STRATEGIC CONSIDERATIONS

4.1.1 Classification of heritable disease-causing changes

The term mutation is derived from the Latin “mutare”, which translates as “to change” or “to exchange”. In human genetics, mutation describes a heritable change that may be associated with a negative outcome, i.e. a disease [218]. Genetic disease can be caused by mutations at the DNA sequence level or by epigenetic modifications that do not affect the DNA sequence (reviewed in [180]). Epigenetic changes interfere with the heritable pattern of silencing of chromosomal regions. They can be caused by alteration of DNA methylation, modification of histones or changes in RNA-associated silencing (reviewed in [219]). DNA mutations fall into three categories (reviewed in [220]):
- genome mutations, which affect the genome as a whole (i.e. the number of chromosomes);
- chromosome mutations, including translocation, deletion, insertion or inversion of large parts of single chromosomes;
- gene mutations, which affect the DNA sequence at the molecular level, such as base substitutions, insertions and deletions.

On a functional level, one distinguishes loss-of-function mutations, hypomorphic mutations, hypermorphic mutations and gain-of-function mutations. A loss-of-function mutation completely abolishes the normal function of the gene product. Hypomorphic mutations reduce the normal function or level of expression, while hypermorphic mutations increase the amount of gene product. Gain-of-function mutations, on the other hand, confer new functions on the gene or gene product [190].

4.1.2 Looking for the HMSNR mutation

An epigenetic mutation that is not apparent at the DNA sequence level was considered an unlikely cause of HMSNR for several reasons. Because they affect particular chromosomal regions globally, epigenetic disorders often manifest with complex phenotypes, frequently including mental retardation. Moreover, the inheritance patterns of these disorders are not Mendelian and are usually difficult to analyse [219, 221]. Neither applies to HMSNR. Its inheritance is strictly autosomal recessive, and its phenotype is well defined and fully penetrant, with a pathology exclusively affecting the peripheral nervous system [5, 6]. Therefore, the search for the HMSNR mutation concentrated on changes in the DNA sequence. A co-worker's genome scan with subsequent linkage analysis placed the critical HMSNR region on chromosome 10q [5]. New microsatellites were then added and additional families recruited, giving a first refinement of the region. At the beginning of this PhD project, the critical region of approximately 1 Mb was therefore defined by markers bA86K9CA1 and D10S1742. Identifying putative disease-causing mutations in this region required a method capable of detecting changes in the DNA sequence. For this PhD project, the method of choice was direct sequencing of PCR products. This method reliably uncovers single base changes and insertion/deletions of up to several hundred base pairs, provided the insertion/deletion lies within the amplified PCR fragment. Large insertion/deletions can cause PCR amplification to fail in that region. In that case, the region needs to be analysed by other methods, e.g. FISH (fluorescence in situ hybridisation) or Southern blots. The search for the mutation concentrated on the genes in the region, as these were the most likely location of a disease-causing mutation. Therefore, exons and 50 to 100 bp of flanking intronic sequence were amplified for direct sequencing.
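The target design described above (exons plus 50 to 100 bp of flanking intronic sequence) can be expressed as padding and merging genomic intervals. The Python sketch below rests on a few assumptions. The function name `sequencing_targets` and the exon coordinates are hypothetical. It also assumes that padded exons which overlap or abut are sequenced as a single target, a design choice the text does not state. Primer design itself is not modelled.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def sequencing_targets(exons: Sequence[Interval], flank: int = 100) -> List[Interval]:
    """Pad each exon by `flank` bp of intronic sequence and merge overlaps."""
    if flank < 0:
        raise ValueError("flank must be non-negative")
    padded = sorted((max(0, start - flank), end + flank) for start, end in exons)
    merged: List[Interval] = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlapping or abutting padded exons become one target.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    hypothetical_exons = [(1000, 1200), (1250, 1400), (5000, 5100)]
    print(sequencing_targets(hypothetical_exons, flank=50))
    print(sequencing_targets(hypothetical_exons, flank=100))
```

With a 50 bp flank, the two closely spaced exons collapse into one target (950 to 1450), while the distant exon stays separate.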
This positional candidate gene approach stands in contrast to a functional cloning approach. In functional cloning, information about the protein in question is available, and the gene is mapped to a chromosomal position after the protein has been identified. Positional cloning cannot build on any functional information. Thus, in functional cloning the candidates are proteins that fulfil a specific function, whereas in positional cloning the candidates are genes located in a critical chromosomal region [222]. Potential positional candidates include known genes, but also gene predictions and ESTs, which can point to so far unknown genes in the region or extend known genes. Nothing was known about the disease mechanism of HMSNR. The selection and prioritisation of candidate genes could therefore only build on the fact that the disease pathology of HMSNR occurs in the PNS. Genes with a known function and/or expression in the PNS are therefore the prime positional candidates. However, the genes found to be mutated in other autosomal recessive CMTs are very diverse. Many of them had no known PNS-specific function before they were implicated in CMT. Therefore, all known genes, gene predictions and ESTs in the region needed to be considered and ranked. During this PhD project, refined mapping and the analysis of positional candidates were performed in parallel. As the critical region was reduced, positional candidates lying outside the refined region were excluded, and the focus of sequencing shifted to the new region.
Therefore, two factors determined the process of selecting positional candidates. The first was the current critical region and its content, including known genes, predicted genes and expressed sequence tags (ESTs). The second was the ranking of these positional candidates according to their likelihood of being involved in the pathology of a peripheral neuropathy. Positional candidates were identified using the genome browsers of NCBI and UCSC. Both provide extensive information about genes and ESTs and their positions in the map of the human genome, through the ‘Map Viewer’ and the ‘Human Genome Browser’, respectively.
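The two selection factors above (location within the current critical region, then ranking by likely relevance to the PNS) can be sketched as a filter followed by a sort. This is a hypothetical illustration, not the procedure actually used. The `Candidate` fields, the feature names and coordinates, and the ordering of known genes before predicted genes before ESTs are all assumptions introduced here. Re-running the filter with a smaller region mimics the exclusion of candidates as the critical region shrank.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Illustrative assumption: known genes rank above predictions, which rank above ESTs.
KIND_RANK = {"known gene": 0, "predicted gene": 1, "EST": 2}


@dataclass(frozen=True)
class Candidate:
    name: str
    start: int          # genomic start in bp
    end: int            # genomic end in bp (exclusive)
    kind: str           # "known gene", "predicted gene" or "EST"
    pns_evidence: bool  # known function or expression in the PNS


def prioritise(candidates: Sequence[Candidate],
               region_start: int, region_end: int) -> List[Candidate]:
    """Keep features overlapping the current critical region, then rank them."""
    inside = [c for c in candidates
              if c.start < region_end and c.end > region_start]
    # PNS evidence first, then feature type, then map position.
    return sorted(inside, key=lambda c: (not c.pns_evidence,
                                         KIND_RANK[c.kind], c.start))


HYPOTHETICAL = [
    Candidate("GENE_A", 110_000, 130_000, "known gene", True),
    Candidate("GENE_B", 200_000, 210_000, "known gene", True),
    Candidate("EST_1", 120_000, 121_000, "EST", False),
    Candidate("PRED_1", 150_000, 155_000, "predicted gene", False),
]

if __name__ == "__main__":
    print([c.name for c in prioritise(HYPOTHETICAL, 100_000, 163_800)])
    print([c.name for c in prioritise(HYPOTHETICAL, 140_000, 163_800)])
```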

4.1.3 Exclusion criteria for putative mutations

4.1.3.1 The exclusion criteria discussed in the literature

When screening for mutations, the main question is how to recognise a putative mutation. In an autosomal recessive disorder such as HMSNR, affected subjects have to be homozygous for the “mutant” allele. Each identified variant must pass a set of criteria that distinguish the disease-causing mutation from harmless polymorphisms.

Criteria that can be used to prove a disease-causing mutation have been discussed at length in two publications concerned with the quality of published mutation reports: by Cotton and Scriver, and by Cotton and Horaitis [223, 224]. These publications also discuss artefacts and ambiguous mutations, which need to be recognised when dealing with a putative mutation. Artefacts comprise PCR errors, which are easily avoided by a second independent analysis, and allelism in cis, which occurs when the allele actually causing the disease phenotype lies at an adjacent position in the same gene. Ambiguous mutations include changes in the third base of a codon, which may be silent, alter splicing or result in an unstable mRNA. Further ambiguities can arise from changes affecting conserved amino acids, or from stop codons located downstream of important functional protein domains. Such changes can only be proven by functional analysis [223].

The criteria for the proof of a disease-causing mutation suggested by Cotton and Scriver [223] and Cotton and Horaitis [224] are:

1. The type of mutation
2. The extent of DNA analysed
3. Co-segregation of the mutant allele with the disease
4. Prevalence of the mutant allele among unaffected controls
5. Functional analysis

The type of the mutation (criterion 1) refers to whether it is a single nucleotide exchange or an insertion/deletion. A mutation can be exonic, intronic or flanking the gene, and its effects vary with its location. As Cotton and Scriver remark, nonsense mutations are the least ambiguous. Missense changes, by contrast, need to be evaluated with regard to the properties of the amino acid changed and its conservation between species. For silent mutations, which usually affect the third base of a codon, possible effects on splicing and/or mRNA stability should be considered. However, Cotton and Scriver do not discuss mutations outside the coding sequence of a gene [223], possibly because most published mutations are located in the coding part of genes (reviewed in [186]). Nevertheless, several reports have shown that mutations outside the coding sequence can be deleterious and severe. Mutations in the 5’ untranslated region mainly affect the regulation of translation initiation through changes in mRNA secondary structure and upstream open reading frames. Mutations in the 3’ untranslated region mainly influence translation efficiency (reviewed in [225]). Diseases such as hereditary hyperferritinemia/cataract syndrome (reviewed in [225]) or hereditary thrombocythemia [226] have been associated with changes in the UTRs of the respective genes. Mutations can also be located in introns and promoters. Intronic mutations can lead to aberrant splicing by activating cryptic splice sites, as demonstrated for CCFDN [109], while mutations in the nerve-specific promoter of Cx32 have been found to cause CMTX [227]. The second criterion, which both papers also touch on, is the extent of DNA analysed. According to Cotton and Scriver and Cotton and Horaitis [223, 224], the search should not stop when a putative mutation has been found. Instead, the whole gene should be analysed, to guard against allelism in cis. Segregation analysis (criterion 3) requires that all family members be typed for the mutation in question. If penetrance is complete, co-segregation of the mutation with the disease can also be expected to be complete. With incomplete penetrance, some individuals may carry the mutant genotype without showing any signs of the disease. However, no affected individual should have a genotype different from the mutant one. To test for the prevalence of a mutation (criterion 4), Cotton and Scriver suggest screening at least 100 normal chromosomes. Their argument is that negative selection acts against mutations but not against harmless polymorphisms, so mutations are likely to be rare, with population frequencies below 1 %.
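A short calculation shows why screening 100 normal chromosomes is a minimum rather than a guarantee. The sketch below assumes that chromosomes are sampled independently and at random from the population. Under that assumption, an allele with frequency p is seen at least once among n chromosomes with probability 1 - (1 - p)^n. The function names are hypothetical.

```python
import math


def prob_seen_at_least_once(allele_freq: float, n_chromosomes: int) -> float:
    """P(an allele of frequency p appears at least once in n sampled chromosomes)."""
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


def chromosomes_needed(allele_freq: float, confidence: float) -> int:
    """Smallest n for which P(seen at least once) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - allele_freq))


if __name__ == "__main__":
    print(round(prob_seen_at_least_once(0.01, 100), 3))
    print(chromosomes_needed(0.01, 0.95))
```

Under these assumptions, a variant at 1 % frequency would be missed in about one third of screens of 100 chromosomes. Roughly 300 chromosomes are needed to observe it with 95 % probability. Absence from 100 controls is therefore only modest evidence that a variant is rare.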
Ideally, the normal chromosomes should come from healthy unrelated individuals of the same population in which the mutation has been found. However, some mutations occur at frequencies higher than 1 % in specific populations. Examples are the sickle cell mutation in haemoglobin B and the most common cystic fibrosis mutation, F508del. In the USA, the frequency of the sickle cell mutation reaches 1 in 12 in African Americans (8 %), 1 in 24 in Hispanics (4 %) and 1 in 600 in Caucasians (0.17 %) [228]. By the 1 % criterion, this change would thus theoretically be classed as a polymorphism in African Americans and Hispanics, but as a mutation in Caucasians. The F508del mutation in the cystic fibrosis transmembrane conductance regulator gene (CFTR) is most frequent in populations of European ancestry. There, it accounts for 50 to 75 % of cystic fibrosis carriers. With an overall carrier frequency of 3.8 % (reviewed in [229]), this gives a carrier frequency of about 1.9 to 2.9 % for F508del in European populations. Such high frequencies are explained by an advantage associated with carrying the mutation. Negative selection against the sickle cell mutation is thought to be balanced by the malaria resistance associated with sickle cell haemoglobin (HbS) (reviewed in [195]). The high incidence of cystic fibrosis has been proposed to result from a heterozygote advantage due to increased resistance to diseases such as cholera (reviewed in [229]). These examples show that the 1 % frequency cut-off for mutations needs to be treated with caution. Cotton and Scriver accordingly suggest that the frequency be determined for “mutations resembling polymorphism”. The final proof of a disease-causing mutation can only come from functional studies, including expression analysis (criterion 5), which elucidate the disease-causing mechanism of a mutation.
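The frequency arithmetic in the sickle cell and F508del examples can be reproduced in a few lines. The sketch below applies the 1 % cut-off mechanically, exactly the naive use that the text cautions against, to show how the same allele changes category between populations. The function names and the labels "polymorphism-like" and "mutation-like" are illustrative assumptions.

```python
ONE_PERCENT = 1.0  # frequency cut-off suggested by Cotton and Scriver, in %


def one_in(n: int) -> float:
    """Convert a '1 in n' frequency into a percentage."""
    return 100.0 / n


def naive_class(freq_percent: float) -> str:
    """Apply the 1 % rule mechanically, without population context."""
    return "polymorphism-like" if freq_percent >= ONE_PERCENT else "mutation-like"


SICKLE_CELL_USA = {"African Americans": 12, "Hispanics": 24, "Caucasians": 600}

if __name__ == "__main__":
    for group, n in SICKLE_CELL_USA.items():
        pct = one_in(n)
        print(f"{group}: {pct:.2f} % -> {naive_class(pct)}")
    # F508del: 50 to 75 % of an overall CF carrier frequency of 3.8 %
    low, high = 0.50 * 3.8, 0.75 * 3.8
    print(f"F508del carriers: {low:.2f} to {high:.2f} %")
```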

4.1.3.2 Application of the exclusion criteria to the HMSNR project

First of all, every variant of interest was confirmed by repeated experiments, ruling out analytical artefacts. For this PhD, the criteria had to allow a decision about a putative mutation without wasting too much time on harmless polymorphisms. Therefore, two criteria were applied for a first assessment of potential disease-causing mutations in the HMSNR families: 1. co-segregation of the variant with HMSNR, and 2. a modification of the population screening approach proposed by Cotton and Scriver.

HMSNR is a recessive disorder with 100 % penetrance. Co-segregation of the disease-causing mutation with the disease phenotype is therefore expected to be complete. Only individuals homozygous for the disease allele are affected by HMSNR. Based on haplotype analysis, unaffected individuals are divided into carriers (amongst them parents and offspring of HMSNR patients) and non-carriers. Carriers have one HMSNR haplotype and one non-disease haplotype, while non-carriers have two non-disease haplotypes. Segregation analysis was performed in a panel of selected individuals representing affected subjects, carriers and non-carriers. A change co-segregated completely with HMSNR only if all affected individuals were homozygous for the rare allele, all obligate carriers were heterozygous and all non-carriers were homozygous for the common allele. If typing of the panel did not exclude the variant by demonstrating lack of co-segregation, additional carriers were typed. If doubt remained, all family members were included. In parallel, the NCBI SNP database (dbSNP) was consulted to determine whether the detected change was a known polymorphism. Cotton and Scriver and Cotton and Horaitis do not directly discuss the use of databases. A likely reason is that variation databases have only very recently become complete enough to be helpful for such purposes. However, they suggest screening 100 healthy chromosomes, which serves a similar purpose in the absence of database information. When a disease-causing mutation is widespread in the general population, it may also have been deposited in the databases as a polymorphism. This is particularly relevant for complex disorders in the general European population, where susceptibility alleles may occur at relatively high frequencies. By contrast, this was hardly to be expected for a founder mutation in the Gypsy population. The database criterion was therefore useful for exclusion purposes in the case of HMSNR. On the other hand, because of their genetic isolation, the Gypsies have also acquired a number of seemingly “private” polymorphisms. Such polymorphisms are unlikely to be found in the databases, so when detected in HMSNR patients they had to be excluded by segregation analysis alone. For a first assessment of a possible HMSNR mutation, meeting either exclusion criterion was considered sufficient to exclude a variant as the disease-causing mutation. The two criteria were lack of co-segregation with the disease phenotype and deposition in the database as a polymorphism. Conversely, if neither criterion led to exclusion, the next step was to conduct a population screen.
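The first-pass triage described above can be sketched as a small decision function. The sketch assumes that each typed panel member is summarised as a status ("affected", "carrier" or "non-carrier") plus the number of rare-allele copies observed. It also assumes that the dbSNP lookup has already been done and is passed in as a boolean. No database API is called, and the function names and return strings are hypothetical. A private Gypsy polymorphism absent from dbSNP can only be excluded by the segregation test, which this logic reflects.

```python
from typing import Iterable, Tuple

# Rare-allele copies expected under complete co-segregation of a fully
# penetrant recessive mutation.
EXPECTED_RARE_COPIES = {"affected": 2, "carrier": 1, "non-carrier": 0}


def cosegregates(panel: Iterable[Tuple[str, int]]) -> bool:
    """panel: (status, observed rare-allele copies) for each typed individual."""
    return all(copies == EXPECTED_RARE_COPIES[status] for status, copies in panel)


def triage(panel: Iterable[Tuple[str, int]], known_polymorphism: bool) -> str:
    """First-pass decision for one variant using the two exclusion criteria."""
    if not cosegregates(panel):
        return "excluded: lack of co-segregation"
    if known_polymorphism:
        return "excluded: known polymorphism in dbSNP"
    return "retained: proceed to population screen"


if __name__ == "__main__":
    panel = [("affected", 2), ("affected", 2), ("carrier", 1), ("non-carrier", 0)]
    print(triage(panel, known_polymorphism=False))
    print(triage([("affected", 1), ("carrier", 1)], known_polymorphism=False))
```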
In expectation of a founder mutation in the Gypsy population, the screen considered two aspects: firstly, unrelated individuals of Gypsy ethnicity representing the groups in which the disease was identified; secondly, closely and more distantly related Gypsy groups. This approach was designed to provide data on the carrier rate in the Gypsy population and to show whether the mutation is restricted to certain groups. As shown for other monogenic disorders in the Gypsies, average carrier rates are often above 1 % and can reach 20 % in some groups, as has been demonstrated for HMSN-Lom (reviewed in [3]). Furthermore, a sample of unrelated healthy controls from the Non-Gypsy population should be tested, preferably originating from one of the four countries (Bulgaria, Romania, Spain and France) where HMSNR families reside. In this second part of the population screen, all controls were expected to carry two copies of the common normal allele, which would indicate that the mutant allele is unlikely to have been transmitted to the Gypsies by admixture with the surrounding Non-Gypsy population.

In addition, the other criteria discussed by Cotton and Scriver and Cotton and Horaitis were not disregarded. Therefore, all possible disease-causing mutations, whether in the coding sequence, in the untranslated regions, in introns or flanking the gene, were taken into consideration. By the criterion concerning the extent of DNA analysed, Cotton and Scriver mean searching the whole gene instead of stopping the mutation screen at the first mutation identified. For HMSNR, this criterion had to be expanded, as the gene in question was not known. Therefore, the analysis included all known genes in the critical region until a refinement of the region finally excluded them; in addition, ESTs and a large amount of intronic sequence were included in the analysis of the refined critical region.
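The arithmetic behind screening healthy chromosomes or control individuals can be sketched as below. Assuming chromosomes are sampled independently, a variant with allele frequency q is absent from n chromosomes with probability (1 - q)^n, and observing zero copies in n chromosomes gives an exact one-sided 95 % upper bound of 1 - 0.05^(1/n) on its frequency (roughly 3/n, the “rule of three”). For control individuals, n is twice the number of people typed. The function names and the 1 % example frequency are illustrative assumptions, not values from this study.

```python
import math


def prob_absent(freq: float, n: int) -> float:
    """Probability that none of n independent draws carries the variant."""
    return (1.0 - freq) ** n


def upper_bound_if_absent(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided (1 - alpha) upper bound on a frequency after 0 of n observed.

    Solves (1 - p) ** n == alpha for p.
    """
    return 1.0 - alpha ** (1.0 / n)


def draws_needed(freq: float, power: float = 0.95) -> int:
    """Smallest n giving at least `power` probability of seeing the variant once."""
    return math.ceil(math.log(1.0 - power) / math.log(1.0 - freq))


# A polymorphism with allele frequency 1 % is missed by a 100-chromosome
# screen about 37 % of the time.
print(round(prob_absent(0.01, 100), 3))        # 0.366
print(round(upper_bound_if_absent(100), 4))    # 0.0295
print(draws_needed(0.01))                      # 299
```

The same bound indicates how strongly an all-normal control panel argues against admixture: the larger the panel, the lower the frequency of the mutant allele that can be ruled out.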

4.2 ANALYSIS OF POSITIONAL CANDIDATES BY DIRECT SEQUENCING

The analysis of positional candidates by direct sequencing yielded a total of 229 variants. For 95 of these, HMSNR patients were homozygous for the common allele. HMSNR-affected individuals carrying a recombinant haplotype were heterozygous for 56 of the 229 variants, and these variants were used for the refined mapping. For the remaining 78, the HMSNR-affected individuals were homozygous for the rare allele, which made these variants putative mutations. Of these 78 variants, 71 were single nucleotide changes, while seven were insertion/deletions, of one base in five cases and of two bases in two cases. The analysis of positional candidates was performed in three stages. Stage one focussed on the interval between bA86K9CA1 and D10S560, including known genes, predicted genes and a number of predicted promoter regions of several known genes; stage two concentrated on the refined region of approximately 110 kb between marker D10S1647 and a SNP in the 5’ region of hexokinase 1. Finally, stage three dealt with the refined region of 63.8 kb, by direct sequencing of those parts of the remaining introns that had not been sequenced before.

The next three sections describe the three stages, and a fourth section summarises the findings.
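The three-way classification of the 229 variants by patient genotype, and the use of variants heterozygous in patients to bound the region of homozygosity, can be sketched as follows. This is a simplified illustration under stated assumptions: genotypes are encoded as rare-allele copy numbers, marker positions are arbitrary integers, the two patients are invented, and the function names are hypothetical; in practice, breakpoints were assigned by inspecting haplotypes.

```python
def triage(affected_genotypes):
    """Classify one variant from the genotypes (0, 1 or 2 rare-allele copies)
    observed in the HMSNR-affected individuals typed."""
    seen = set(affected_genotypes)
    if 1 in seen:
        return "mapping"            # heterozygous patient: recombinant haplotype
    if seen == {0}:
        return "homozygous common"  # cannot be the recessive mutation
    if seen == {2}:
        return "putative mutation"
    return "unresolved"             # mixture of 0 and 2; not covered by the text


def homozygous_interval(markers, core):
    """Nearest heterozygous markers flanking `core` in one patient.

    markers: list of (position, genotype) sorted by position.
    Returns (left, right); None means no heterozygous marker on that side.
    """
    left = max((p for p, g in markers if p < core and g == 1), default=None)
    right = min((p for p, g in markers if p > core and g == 1), default=None)
    return left, right


def critical_region(patients, core):
    """Intersect the per-patient intervals: the tightest bounds win."""
    lefts, rights = zip(*(homozygous_interval(m, core) for m in patients))
    left = max((x for x in lefts if x is not None), default=None)
    right = min((x for x in rights if x is not None), default=None)
    return left, right


# Hypothetical patients: one recombinant on each side of the core.
patient_a = [(100, 2), (200, 1), (300, 2), (400, 2), (500, 2)]
patient_b = [(100, 2), (200, 2), (300, 2), (400, 2), (500, 1)]
print(critical_region([patient_a, patient_b], core=350))  # (200, 500)
```

Each patient contributes only the bound that its own recombination defines, which is why recombinants from different families (here, Romanian and Bulgarian) were combined to narrow the interval.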

4.2.1 Positional candidate genes (stage 1)

4.2.1.1 Direct sequencing of the exons of the known genes

As described in chapter 4 (Figure 14), the order of BAC clones changed numerous times during the course of the physical mapping; consequently, the number of candidate genes in the critical region was constantly modified. As the sequencing of BAC clones progressed, gaps in the sequence were closed and a final position could be established for all of the known genes. Initially, information was collected about all the known genes and mRNAs in the 0.8 Mb interval (estimated at over 1 Mb in 2001). Using the genome interfaces at NCBI and UCSC, a total of 14 known genes (one of them an mRNA without known translation) were identified in the extended critical region between bA86K9CA1 and D10S560 (Figure 32 and Table 14). As two excellent candidates, tachykinin receptor 2 (TACR2) and tetraspan NET-7, had been sequenced and excluded by a co-worker prior to this PhD project, the best remaining candidate in the region was NEUROG3 (neurogenin 3), which is thought to be a transcription factor involved in neurogenesis in the PNS and CNS. However, the only putative mutation found in NEUROG3, a T to C change at position 754 of NM_020999 that results in a phenylalanine to serine substitution at a residue conserved in mouse and rat, was excluded due to lack of co-segregation with the HMSNR phenotype. Specifically, two obligate carriers were homozygous for the C allele, and the C allele was also seen in two non-carriers with the heterozygous C/T genotype, showing that the C allele does occur on non-HMSNR haplotypes. This variant was later also submitted to dbSNP (rs4536103) by other researchers, adding weight to it being a harmless polymorphism. Moreover, as the sequencing of the BAC clones by the Human Genome Project progressed, it became clear that this gene is in fact located telomeric to the critical HMSNR gene region.

Figure 32: Current physical map of the interval between bA86K9CA1 and D10S560 (updated from [215] using NCBI Map Viewer, build 34.3). a) Current version of the contig on chromosome 10 (10q22, NT_008583.16); b) microsatellite markers; c) minimum tiling path of BAC clones; d) in black: position of the 13 known genes and the mRNA; in grey: genes that were not analysed because they were placed into the region at the end of 2003. Note that the annotation of the 5’ end of the gene CXXC6 extended into the area between D10S2480 and bA86K9CA1; however, this part of the gene was not sequenced, as it was outside the critical region.

Although none of the remaining genes (Table 14) seemed to be a particularly good candidate, it has to be borne in mind that many other genes found to be linked to a neuropathy were not obvious candidates. Thus, all the genes of uncharacterised function, the zinc finger CXXC6 (then called KIAA1676), CCAR1 (cell cycle and apoptosis regulatory protein 1, then called FLJ10839), DDX50, DDX21, KIAA1279, VPS26 and FLJ22761, were equally good candidates. Three genes in the region, DDX21, DDX50 and SUPV3L1, were putative RNA helicases based on predictions. At that time, no connection had been established between the processes of transcription/translation and CMT phenotypes. Today, mutations in glycyl-tRNA synthetase (GARS) and in the carboxy-terminal domain phosphatase 1 (CTDP1) of RNA polymerase II are known to cause CMT2D and CCFDN, respectively [109, 127]. Moreover, the mouse DEAD-box helicase Ddx20 was recently shown to repress transcriptional activation by Egr2/Krox20 [230]. Thus, the three putative helicases might now be considered good candidates.


Table 14: Comparison of the functional knowledge in August 2001 and to date about the 14 known genes in the interval between bA86K9CA1 and D10S560, in order from centromere to telomere (source: NCBI LocusLink [231])

| Gene abbreviation | Gene name | Function (2001) | Function to date (new information) |
|---|---|---|---|
| CXXC6¹ | CXXC finger 6 | Unknown | Zinc finger transcription factor, associated with leukaemia |
| CCAR1¹ | Cell cycle and apoptosis regulator 1 | Unknown, predicted DNA binding domain | Regulation of cell division and apoptosis |
| DDX50² | DEAD box polypeptide 50 | Unknown, predicted DEAD-box helicase | Putative DEAD-box helicase, possibly in RNA synthesis or processing |
| DDX21² | DEAD box polypeptide 21 | Putative RNA helicase, unwinds dsRNA, folds ssRNA, possibly involved in ribosomal RNA biogenesis, RNA editing, RNA transport, general transcription | (unchanged) |
| KIAA1279¹ | - | Unknown | (unchanged) |
| PRG1² | Proteoglycan, secretory granule 1 | Proteoglycan stored in secretory granules, in hematopoietic cells | Proteoglycan stored in secretory granules, in hematopoietic cells; possible mediator of granule-mediated apoptosis |
| VPS26² | Vacuole protein sorting | Unknown, putative transport/sorting protein | Part of retromer complex, involved in retrograde transport of proteins from endosomes to the trans-Golgi network |
| SUPV3L1² | Suppressor of var3 like 1 | Unknown, similar to a mitochondrial RNA helicase | Mitochondrial RNA helicase, unwinds dsDNA |
| FLJ31406³ | - | Unknown | (unchanged) |
| FLJ22761² | - | Unknown, predicted hexokinase domain | (unchanged) |
| HK1² | Hexokinase 1 | Glycolytic enzyme | (unchanged) |
| TACR2² | Tachykinin receptor 2 | Receptor for tachykinin (neuropeptide substance K) | (unchanged) |
| NET-7² | - | Involved in PNS signalling and maintenance | (unchanged) |
| NEUROG3² | Neurogenin 3 | Transcription factor involved in neurogenesis in CNS and PNS | (unchanged) |

¹ In 2001 only present in NCBI; ² present in NCBI and UCSC; ³ mRNA identified in UCSC.

Furthermore, there was a putative sorting protein (VPS26), a putative DNA binding protein (CCAR1), a putative hexokinase (FLJ22761) and two genes with no information about predicted domains whatsoever (CXXC6 and KIAA1279). Subsequent research has linked PRG1 and CCAR1 with apoptosis [232, 233], which would have made them good candidates in the light of the recent finding that mutations in the GTPase mitofusin 2 and in the small heat shock protein B1, both thought to be connected to apoptotic events, cause CMT2A and CMT2F, respectively [116, 132]. VPS26 has been implicated in transport from endosomes to the Golgi [234]. While CXXC6 has now been linked to leukaemia [235], there is still no functional knowledge about KIAA1279 and FLJ22761. Like hexokinase 1, FLJ22761, with its predicted hexokinase domains, was not considered a very attractive positional candidate, because mutations in hexokinase 1, namely amino acid substitutions and large deletions of the coding sequence, have been shown to cause haemolytic anaemia [236-238]. FLJ31406, so far without any associated translation into a protein, was placed into the region in UCSC only after the sequencing of the BAC clones had been completed, and was therefore sequenced last. BLAST searches yielded no information about this mRNA, no ESTs were found to cover it, and even now FLJ31406 does not appear in the list of known genes in either NCBI or UCSC. Sequencing of the known genes included all genes located in the interval between bA86K9CA1 and D10S560. On the centromeric side, crucial recombinations were only identified while sequencing the predicted promoters and the predicted genes. Similarly, on the telomeric side, priority was given to the three excellent candidate genes TACR2, tetraspan NET-7 and NEUROG3, which were shown to lie outside the critical region when the complete sequence was published by the HGP.
In addition, after the three genes had been sequenced, heterozygous variants in HK1 were detected in HMSNR patients, which supported the conclusion that these genes were not part of the critical region. A total of 25 potential mutations (Table 15 and Table 16) were identified while sequencing the known genes. Three of them were insertion/deletions, while the remaining 22 were single nucleotide changes. One candidate mutation each was identified in CXXC6, PRG1, VPS26 and NEUROG3, while NET-7 and DDX21 each had two putative mutations. Sequencing of FLJ22761 and HK1 detected five putative mutations in each gene, and FLJ31406 had seven. No potential mutations were detected in CCAR1, DDX50, KIAA1279, SUPV3L1 and TACR2.


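Table 15: Putative mutations detected while sequencing the known genes and their position within the gene

| Gene | Putative mutations (single nt/indels) | 5’ flanking | 5’/3’ UTR | Unknown (UTR/cds) | Intron | Coding | 3’ flanking |
|---|---|---|---|---|---|---|---|
| CXXC6 | 1/0 | - | 1 | - | - | - | - |
| CCAR1 | 0/0 | - | - | - | - | - | - |
| DDX50 | 0/0 | - | - | - | - | - | - |
| DDX21 | 2/0 | - | 1 | - | 1 | - | - |
| KIAA1279 | 0/0 | - | - | - | - | - | - |
| PRG1 | 1/0 | - | 1 | - | - | - | - |
| VPS26 | 1/0 | - | - | - | - | 1 | - |
| SUPV3L1 | 0/0 | - | - | - | - | - | - |
| FLJ31406 | 7/0 | - | - | 5 | 2 | - | - |
| FLJ22761 | 5/0 | - | 2 | - | 2 | 1 | - |
| HK1 | 2/3 | - | - | - | 3 | 1 | 1 |
| TACR2 | 0/0 | - | - | - | - | - | - |
| NET-7 | 2/0 | - | 1 | - | 1 | - | - |
| NEUROG3 | 1/0 | - | - | - | - | 1 | - |
| Sum | 22+3=25 | - | 6 | 5 | 9 | 4 | 1 |
| % | 100 % | 0 % | 24 % | 20 % | 36 % | 16 % | 4 % |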

Table 16: Putative mutations detected while sequencing the known genes and their exclusion (first-round exclusion criteria)

| Gene | Putative mutations (single nt/indels) | Lack of co-segregation only | Known variant only | Both criteria fulfilled |
|---|---|---|---|---|
| CXXC6 | 1/0 | - | - | 1 |
| CCAR1 | 0/0 | - | - | - |
| DDX50 | 0/0 | - | - | - |
| DDX21 | 2/0 | 1 | 1 | - |
| KIAA1279 | 0/0 | - | - | - |
| PRG1 | 1/0 | - | 1 | - |
| VPS26 | 1/0 | - | - | 1 |
| SUPV3L1 | 0/0 | - | - | - |
| FLJ31406 | 7/0 | - | - | 7 |
| FLJ22761 | 5/0 | - | - | 5 |
| HK1 | 2/3 | 1 | 1 | 3 |
| TACR2 | 0/0 | - | - | - |
| NET-7 | 2/0 | - | - | 2 |
| NEUROG3 | 1/0 | - | - | 1 |
| Sum | 22/3 | 2 | 3 | 20 |

With regard to their position in the respective candidate gene (Table 15), the largest group, nine (36 %), was located in introns, another six (24 %) were detected in the 5’ and 3’ untranslated sequence, four (16 %) in the coding sequence and one (4 %) in the 3’ flanking sequence. Another five putative mutations (see column “Unknown”, Table 15) were identified in FLJ31406, for which no translation into a protein could be determined; it was therefore unclear whether these changes lie in the coding sequence or in the UTRs. The four putative mutations in the coding sequence were located in VPS26, FLJ22761, HK1 and NEUROG3. In VPS26, a C/T change at base pair 220 of NM_004896.2 was synonymous (serine on both alleles). Another synonymous change was the putative mutation in the coding sequence of HK1, where an A/G change at position 1524 of NM_000188.1 encoded a lysine on both alleles. In FLJ22761, a C/T change at base pair 2294 of NM_025130.2 caused an arginine to tryptophan substitution, i.e. a change from a charged polar to a non-polar amino acid. This residue is conserved in the mouse homologue of FLJ22761 and also in the closely related genes HK1 and HK2 from a number of species. The coding change in NEUROG3 occurred at base pair 754 of NM_020999 and caused a phenylalanine to serine substitution (as discussed above). Exclusion of the 25 putative mutations in known genes (Table 16) was achieved as follows: two variants were excluded solely by lack of co-segregation with the HMSNR phenotype, three were excluded because they were known polymorphisms, and the remaining 20 fulfilled both criteria. It should be noted that all three variants that were detected in the coding sequence were among the 20 that showed lack of co-segregation and were at the same time known polymorphisms. As sequencing of the exons of the known genes, including 50 to 100 bp of the flanking introns, had not yielded a mutation worth investigating further, new candidate changes needed to be identified. These might be located in a region of a known gene that had not been searched, for example the promoter, the introns or an unknown exon.
Another possibility was an unknown gene in the region. To take these possibilities into account, the database entries for the genes were searched again, but no new exons were identified at that time. Further, predicted promoters were analysed for 12 of the genes, and predicted genes were inspected as possible new genes in the region. With a critical region of approximately 1 Mb at that time, sequencing of introns was not feasible.
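The classification of coding changes used above (synonymous versus amino acid substitution, and a change in polarity class) can be sketched with the standard genetic code. The codons in the example are illustrative assumptions chosen to reproduce the reported amino acid changes; they are not taken from the cited RefSeq transcripts, and the three-way polarity grouping is a simplification (histidine in particular is often classed separately).

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Simplified polarity classes matching the wording in the text.
POLARITY = {aa: "non-polar" for aa in "GAVLIMFWP"}
POLARITY.update({aa: "polar" for aa in "STCYNQ"})
POLARITY.update({aa: "charged polar" for aa in "DEKRH"})


def snv_effect(codon, position, alt):
    """Effect of replacing codon[position] (0-based) with base `alt`."""
    mutant = codon[:position] + alt + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return ref_aa, alt_aa, kind


print(snv_effect("TTT", 1, "C"))   # ('F', 'S', 'missense')   Phe -> Ser
print(snv_effect("AAA", 2, "G"))   # ('K', 'K', 'synonymous') Lys -> Lys
print(snv_effect("CGG", 0, "T"))   # ('R', 'W', 'missense')   Arg -> Trp
print(POLARITY["R"], "->", POLARITY["W"])  # charged polar -> non-polar
```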


4.2.1.2 Direct sequencing of the predicted promoters of the known genes

The promoters were predicted by submitting up to 2 kb of sequence upstream of the transcription start site to the prediction programmes Core Promoter, NNPP, Grail and TSSW, which are available on the internet (for URLs see Methods). The outputs from the programmes were then compared, and the most likely predicted promoter regions were subjected to direct sequencing, including up to 300 bp of flanking sequence on either side of the predicted promoter, to allow for prediction error and for transcription factor binding sites not in close proximity to the predicted site. However, the programmes frequently yielded inconsistent predictions, as evident from the large differences between their results. Therefore, it was not possible to state conclusively whether the true promoter had actually been included in the analysis. Further problems occurred for genes with an incomplete 5’ end, as was the case for FLJ22761 and CXXC6, for which new 5’ coding exons were published afterwards. Because of the earlier difficulty in determining the promoters, the predictions were not repeated for these genes. The sequencing of the putative promoters identified one additional candidate mutation, a single base change in CXXC6 that is now located in the coding sequence of this gene. This A to G change at base pair 3578 of NM_030625.1, which results in a methionine to isoleucine substitution, was excluded due to lack of co-segregation with HMSNR and has meanwhile also been deposited in dbSNP by other researchers. Moreover, this residue is not conserved in the rat and the mouse, where the methionine is replaced by valine; methionine, isoleucine and valine are all non-polar amino acids. Including the known genes and their predicted promoters, the total sequencing effort amounted to 83.7 kb (Table 17), with between 2.0 kb (NEUROG3) and 11.4 kb (HK1) sequenced per gene.
Of this, 46.0 % was sequenced in introns and 29.8 % in coding sequence, while flanking and UTR sequence accounted for 9.7 % and 12.2 %, respectively. 1.933 kb (2.3 %) belonging to FLJ31406 could not be assigned to either coding or untranslated sequence, as no translation into a protein could be established.
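The comparison of promoter predictions can be sketched as a simple clustering of predicted transcription start sites, followed by padding the consensus region with 300 bp on either side as described above. This is a hypothetical illustration: the programme outputs were compared manually in this work, and the input format, the example positions, the 100 bp clustering window and the requirement of support from at least two programmes are all assumptions made for the example.

```python
def consensus_regions(predictions, window=100, min_support=2):
    """Group predicted TSS positions from several programmes.

    predictions: dict mapping programme name -> list of predicted positions
    (bp within the submitted upstream sequence). A position within `window`
    bp of the previous one joins the same cluster. Clusters supported by at
    least `min_support` distinct programmes are returned as
    (start, end, sorted list of supporting programmes).
    """
    points = sorted(
        (pos, prog) for prog, positions in predictions.items() for pos in positions
    )
    clusters = []
    for pos, prog in points:
        if clusters and pos - clusters[-1][-1][0] <= window:
            clusters[-1].append((pos, prog))
        else:
            clusters.append([(pos, prog)])
    regions = []
    for cluster in clusters:
        support = sorted({prog for _, prog in cluster})
        if len(support) >= min_support:
            regions.append((cluster[0][0], cluster[-1][0], support))
    return regions


def sequencing_target(region, flank=300, seq_len=2000):
    """Pad a consensus region with flanking sequence, clipped to the input."""
    start, end, _ = region
    return max(0, start - flank), min(seq_len, end + flank)


# Hypothetical outputs for one 2 kb upstream sequence.
predictions = {
    "Core Promoter": [1560, 900],
    "NNPP": [1500, 420],
    "Grail": [100],
    "TSSW": [1530],
}
regions = consensus_regions(predictions)
print(regions)                        # [(1500, 1560, ['Core Promoter', 'NNPP', 'TSSW'])]
print(sequencing_target(regions[0]))  # (1200, 1860)
```

When the programmes disagree, as they frequently did, no cluster reaches the support threshold, which mirrors the difficulty in deciding whether the true promoter had been sequenced.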


Table 17: Amount of sequencing performed for each gene, including promoter predictions (all values in kb)

| Gene¹ | Total per gene | Flanking sequence | UTR | Unknown (UTR/cds) | Intron | Coding | Promoter predicted⁴ |
|---|---|---|---|---|---|---|---|
| CXXC6 | 7.475 | 0.943 | 1.876 | 0 | 2.013 | 2.643 | yes² |
| CCAR1 | 11.161 | 0.644 | 0.403 | 0 | 6.582 | 3.532 | yes |
| DDX50 | 6.477 | 0.698 | 0.323 | 0 | 3.242 | 2.214 | yes |
| DDX21 | 7.484 | 0.101 | 1.150 | 0 | 4.085 | 2.148 | no |
| KIAA1279 | 5.792 | 1.126 | 0.574 | 0 | 2.223 | 1.869 | yes |
| PRG1 | 2.141 | 0.622 | 0.642 | 0 | 0.401 | 0.476 | yes |
| VPS26 | 4.894 | 0.592 | 1.670 | 0 | 1.650 | 0.982 | yes |
| SUPV3L1 | 6.087 | 0.046 | 0.083 | 0 | 3.598 | 2.360 | yes |
| FLJ31406 | 2.963 | 0.189 | 0³ | 1.933³ | 0.841 | 0³ | no |
| FLJ22761 | 8.682 | 0.266 | 0.912 | 0 | 4.830 | 2.674 | yes² |
| HK1 | 11.439 | 0.281 | 1.263 | 0 | 6.698 | 3.197 | no |
| TACR2 | 3.557 | 1.508 | 0 | 0 | 0.738 | 1.311 | yes |
| NET-7 | 3.540 | 0.416 | 0.825 | 0 | 1.413 | 0.886 | yes |
| NEUROG3 | 1.990 | 0.676 | 0.503 | 0 | 0.166 | 0.645 | yes |
| Sum | 83.682 | 8.108 | 10.224 | 1.933 | 38.480 | 24.937 | |
| % | 100 % | 9.7 % | 12.2 % | 2.3 % | 46.0 % | 29.8 % | |

¹ Genes in order from centromeric to telomeric; ² 5’ end of gene incompletely annotated at the time of prediction, prediction now invalid; ³ coding sequence and UTR could not be determined for FLJ31406, therefore given as unknown; ⁴ promoter predicted.
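The totals and percentages in Table 17 can be recomputed with a few lines of Python as a consistency check. The row values are transcribed from the table; the PRG1 intronic value is taken as 0.401 kb, the only value consistent with both the PRG1 row total (2.141 kb) and the intron column total (38.480 kb). The names `SEQUENCED_KB` and `summarise` are hypothetical.

```python
CATEGORIES = ("flanking", "UTR", "unknown", "intron", "coding")

# kb per category for each gene, centromeric to telomeric (Table 17).
SEQUENCED_KB = {
    "CXXC6":    (0.943, 1.876, 0.0,   2.013, 2.643),
    "CCAR1":    (0.644, 0.403, 0.0,   6.582, 3.532),
    "DDX50":    (0.698, 0.323, 0.0,   3.242, 2.214),
    "DDX21":    (0.101, 1.150, 0.0,   4.085, 2.148),
    "KIAA1279": (1.126, 0.574, 0.0,   2.223, 1.869),
    "PRG1":     (0.622, 0.642, 0.0,   0.401, 0.476),
    "VPS26":    (0.592, 1.670, 0.0,   1.650, 0.982),
    "SUPV3L1":  (0.046, 0.083, 0.0,   3.598, 2.360),
    "FLJ31406": (0.189, 0.0,   1.933, 0.841, 0.0),
    "FLJ22761": (0.266, 0.912, 0.0,   4.830, 2.674),
    "HK1":      (0.281, 1.263, 0.0,   6.698, 3.197),
    "TACR2":    (1.508, 0.0,   0.0,   0.738, 1.311),
    "NET-7":    (0.416, 0.825, 0.0,   1.413, 0.886),
    "NEUROG3":  (0.676, 0.503, 0.0,   0.166, 0.645),
}


def summarise(table):
    """Column totals (kb), grand total (kb) and percentage per category."""
    totals = {
        cat: round(sum(row[i] for row in table.values()), 3)
        for i, cat in enumerate(CATEGORIES)
    }
    grand = round(sum(totals.values()), 3)
    percent = {cat: round(100 * kb / grand, 1) for cat, kb in totals.items()}
    return totals, grand, percent


totals, grand, percent = summarise(SEQUENCED_KB)
print(grand)    # 83.682
print(percent)  # {'flanking': 9.7, 'UTR': 12.2, 'unknown': 2.3, 'intron': 46.0, 'coding': 29.8}
```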

4.2.1.3 Direct sequencing of predicted genes

After the promoters, gene predictions were analysed by sequencing the exons and 50 to 100 bp of flanking intronic sequence. Most of the gene predictions displayed in the UCSC Genome Browser matched already known genes. Seven predicted genes (Table 18) that appeared in the NCBI database were analysed by direct sequencing, as described for the known genes. However, in subsequent NCBI builds, after the sequencing of this region of chromosome 10 was completed, all of the predictions were withdrawn from the database. This demonstrates that gene predictions should be treated with caution, especially if the region in question is still undergoing changes in the BAC sequence and there is no support from ESTs or mRNAs.


Table 18: Names and accession numbers of the predicted genes from the NCBI database that were sequenced in the critical region

| Name of predicted gene | Accession number |
|---|---|
| LOC219737 (LOC118792) | Was XM_168766, record removed |
| LOC142851 | Was XM_084362, record removed |
| LOC159322 | Was XM_099923, record removed |
| LOC159321 | Was XM_089516, record removed |
| LOC118793 | Was XM_061150, record removed |
| LOC118622 | Was XM_061060, record removed |
| LOC159241 | Was XM_099854, record removed |

In total, 14.755 kb were sequenced for the LOC gene predictions. Because all of the prediction models were subsequently withdrawn, this sequence was assigned to a known gene if it lay within the gene or flanked it by no more than 2 kb; otherwise, it was labelled intergenic. Assignment of LOC gene sequence to known genes became possible because the BAC sequence was finished and gaps were closed by the Human Genome Project, which in turn enabled correction of previous gene annotations. In this way, 0.749 kb of flanking sequence, 0.220 kb of UTR sequence, 0.514 kb of coding sequence and 4.464 kb of intronic sequence were added to the known genes, while the remaining 8.808 kb were intergenic. The additional intragenic sequence is mostly a consequence of a large gap in the genomic sequence that was still present when the gene predictions were made. This gap prevented the complete annotation of the genes FLJ22761 and hexokinase 1. The prediction models correctly predicted five additional exons for these genes, but also a number of exons that are now located in their intronic sequence. Additionally, for some of the known genes, larger untranslated regions were annotated after new mRNAs were discovered, overlapping with some of the sequence from the prediction models. A total of 10 putative mutations (Table 19) were identified by sequencing the predicted genes: nine single base changes and one insertion/deletion of two bp. All ten candidate mutations could be assigned to known genes after the predictions had been withdrawn from the database: one is located in PRG1, five in HK1 and four in NET-7. Four of the ten putative mutations were in the untranslated regions, another four in introns, and one each in coding and flanking sequence. The putative mutation identified in the coding sequence of HK1 was an A/G change at nucleotide 519 of the HK1 transcript NM_033500.
It encoded a substitution from arginine to histidine, both of them basic, polar amino acids. It could not be determined whether this amino acid residue is conserved, as no homologous sequences from other species could be found in the databases. All ten putative mutations were excluded: one based only on lack of co-segregation with HMSNR, and nine that were known polymorphisms and, in addition, did not co-segregate with HMSNR (Table 19).
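The assignment rule described above (sequence within a known gene or no more than 2 kb from it is assigned to that gene, otherwise it is intergenic) can be sketched as follows. The gene names and coordinates are hypothetical placeholders, and choosing the nearest gene when two are within range is an assumption not stated in the text.

```python
def assign_to_gene(interval, genes, max_flank=2000):
    """Assign a sequenced interval to the nearest gene within `max_flank` bp.

    interval: (start, end); genes: dict mapping name -> (start, end).
    An overlapping interval has distance 0. Returns "intergenic" if no gene
    is close enough.
    """
    start, end = interval
    best_name, best_dist = None, None
    for name, (g_start, g_end) in genes.items():
        distance = max(g_start - end, start - g_end, 0)
        if distance <= max_flank and (best_dist is None or distance < best_dist):
            best_name, best_dist = name, distance
    return best_name if best_name is not None else "intergenic"


# Hypothetical gene coordinates, not taken from the HMSNR region.
genes = {"GENE_A": (10_000, 25_000), "GENE_B": (40_000, 60_000)}
print(assign_to_gene((26_000, 26_500), genes))  # GENE_A (1 kb downstream)
print(assign_to_gene((31_000, 32_000), genes))  # intergenic
print(assign_to_gene((39_000, 41_000), genes))  # GENE_B (overlaps)
```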

Table 19: Putative mutations identified while sequencing the predicted genes

| Prediction | Gene assigned | Putative mutations (single nt/indels) | 5’ flanking | UTR | Intron | Coding | 3’ flanking |
|---|---|---|---|---|---|---|---|
| LOC219737 | PRG1 | 1/0 | - | 1 | - | - | - |
| LOC159321 | HK1 | 5/0 | - | - | 4 | 1 | - |
| LOC159241 | NET-7 | 3/1 | - | 3 | - | - | 1 |
| Sum | | 9+1=10 | 0 | 4 | 4 | 1 | 1 |

4.2.1.4 Summary of the sequencing of positional candidate genes (stage 1)

During the sequencing of 14 known genes, their predicted promoters and seven predicted genes located between bA86K9CA1 and D10S560, 36 putative mutations were identified: 25 in the known genes, one in a predicted promoter and another ten in the gene prediction models (summarised in Table 20). A total of 32 were single nucleotide changes and four were insertion/deletions. Ten (27.8 %) of the 36 were located in the untranslated regions of the genes, 13 (36.1 %) in introns, six (16.7 %) in the coding sequences and two (5.6 %) in the 3’ flanking sequence of the genes. For five (13.9 %), located in FLJ31406, assignment to coding sequence or UTR could not be established (Table 20). Three of the 36 candidate mutations were excluded based solely on lack of co-segregation with the HMSNR phenotype, another three were known polymorphisms, and the majority (30) were excluded based on both criteria (Table 20). In summary, the HMSNR mutation had not been identified, and the search for putative mutations needed to continue.


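Table 20: Summary of the putative mutations identified by sequencing analysis. The sequencing included all known genes, their putative promoter regions and predicted genes located between markers bA86K9CA1 and D10S560. Given in the table are the position of all putative mutations in the respective gene (columns 5’ flanking to 3’ flanking) and their exclusion as a cause of HMSNR (first-round exclusion criteria, last three columns).

| Gene | Putative mutations (single nt/indels) | 5’ flanking | UTR | Unknown (UTR/cds) | Intron | Coding | 3’ flanking | Lack of co-segregation only | Known variant only | Both criteria fulfilled |
|---|---|---|---|---|---|---|---|---|---|---|
| CXXC6 | 2/0 | - | 1 | - | - | 1 | - | - | - | 2 |
| CCAR1 | 0/0 | - | - | - | - | - | - | - | - | - |
| DDX50 | 0/0 | - | - | - | - | - | - | - | - | - |
| DDX21 | 2/0 | - | 1 | - | 1 | - | - | 1 | 1 | - |
| KIAA1279 | 0/0 | - | - | - | - | - | - | - | - | - |
| PRG1 | 2/0 | - | 2 | - | - | - | - | - | 1 | 1 |
| VPS26 | 1/0 | - | - | - | - | 1 | - | - | - | 1 |
| SUPV3L1 | 0/0 | - | - | - | - | - | - | - | - | - |
| FLJ31406 | 7/0 | - | - | 5 | 2 | - | - | - | - | 7 |
| FLJ22761 | 5/0 | - | 2 | - | 2 | 1 | - | - | - | 5 |
| HK1 | 7/3 | - | - | - | 7 | 2 | 1 | 2 | 1 | 7 |
| TACR2 | 0/0 | - | - | - | - | - | - | - | - | - |
| NET-7 | 5/1 | - | 4 | - | 1 | - | 1 | - | - | 6 |
| NEUROG3 | 1/0 | - | - | - | - | 1 | - | - | - | 1 |
| Sum | 32+4=36 | - | 10 | 5 | 13 | 6 | 2 | 3 | 3 | 30 |
| % | 100 % | 0 % | 27.8 % | 13.9 % | 36.1 % | 16.7 % | 5.6 % | 8.3 % | 8.3 % | 83.3 % |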


4.2.2 Positional candidate ESTs (stage 2)

While exploring the polymorphisms identified during the sequencing of the known and predicted genes, it was possible to redefine both the centromeric and the telomeric boundaries of the region of homozygosity. On the centromeric side, SNPs that were heterozygous in HMSNR patients from the Romanian family placed a recombination breakpoint telomeric of marker D10S1647. The new telomeric end of the critical region was supported by a number of heterozygous variants occurring in affected individuals of both the Romanian and the Bulgarian families and was placed centromeric of SNP #156. With this progress in refined mapping, the region of homozygosity was reduced to ~110 kb. This part of the critical interval includes four genes, in order (Figure 33): SUPV3L1, FLJ31406, FLJ22761 and HK1. While SUPV3L1 and HK1 are not fully contained in the 110 kb, FLJ31406 and FLJ22761, which lie in the inner part of the 110 kb, have an overlapping genomic arrangement without actual exon sharing. Functionally, these four genes comprise a putative helicase, a gene of unknown function, a predicted hexokinase and a hexokinase, respectively. The HMSNR mutation had to be in one of these four genes or in a new gene yet to be identified. The latter seemed rather unlikely, as the distance between SUPV3L1 and FLJ31406 is only 6 kb, FLJ31406 and FLJ22761 overlap, and there are only 2.5 kb between the last exon of FLJ22761 and the first exon of HK1. Any new gene would have to overlap the existing ones and would possibly be transcribed from the opposite strand. Overlapping gene arrangements have been reported for only a few loci in the human genome, one of them being the MINK and CHRNE locus on chromosome 17 [239]. The little space available for another gene in the 110 kb of the HMSNR region suggests that the mutation is likely to be in one of the known genes. However, the exons of these genes had already been sequenced.
This meant that the mutation was either in an exon that had not been annotated, in an intron or, very unlikely, in an overlapping new gene. Neither UCSC nor NCBI provided any new gene predictions for the 110 kb, adding evidence to the notion that no other gene is contained in the critical region.
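The spacing argument above can be checked against the exon coordinates labelled in Figure 33. The sketch below takes, for each gene, the first and last exon shown in the figure (inclusive positions on the figure’s relative scale) and reports the gap to the next gene, with a negative value indicating overlap. SUPV3L1 and HK1 extend beyond the window, so only their inner ends matter, and comparing only consecutive genes is a simplification that assumes no gene is nested inside another. The resulting gaps of about 6.3 kb and 2.4 kb are in line with the approximate figures quoted in the text.

```python
# Outermost exon positions per gene within the ~110 kb window, read from the
# exon labels of Figure 33 (figure scale, inclusive coordinates).
SPANS = {
    "SUPV3L1": (6957, 30005),
    "FLJ31406": (36303, 53460),
    "FLJ22761": (41272, 88529),
    "HK1": (90970, 116659),
}


def neighbour_gaps(spans):
    """Gap in bp between consecutive genes ordered by start; negative = overlap."""
    ordered = sorted(spans.items(), key=lambda item: item[1][0])
    return [
        (left, right, right_start - left_end - 1)
        for (left, (_, left_end)), (right, (right_start, _)) in zip(ordered, ordered[1:])
    ]


for left, right, gap in neighbour_gaps(SPANS):
    print(f"{left} -> {right}: {gap} bp")
# SUPV3L1 -> FLJ31406: 6297 bp
# FLJ31406 -> FLJ22761: -12189 bp
# FLJ22761 -> HK1: 2440 bp
```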

Figure 33: Map of the ESTs analysed in the 110 kb in relation to the known genes (BAC clone bA227H15, accession AL569223; markers D10S1647, bA227H15CA1, bA227H15CA2 and D10S1672; genes SUPV3L1, FLJ31406, FLJ22761 and hexokinase 1). This map shows chromosome 10 nucleotide positions 6975000 to 6987000, according to the UCSC freeze of June 2002. In the scale shown at the top, “0” corresponds to position 6975000 in the UCSC map, and the positions of exons and ESTs are given relative to that scale. The following colour codes designate the different genes: dark blue, SUPV3L1; orange, FLJ31406; light green, FLJ22761; light blue, HK1. Newly identified exons of the known genes are shown in red with bold labels. The working numbers given to ESTs are highlighted in light yellow, and the corresponding accession numbers are provided at the bottom of the figure.

Nevertheless, all possibilities needed to be addressed. Even with only 110 kb left, sequencing all introns was a major undertaking and was therefore left as a last resort. Instead, the effort focussed on analysing all ESTs mapped to the 110 kb. ESTs ideally represent expressed parts of the genome. However, the quality of ESTs may be compromised by residual DNA in the RNA preparation, which can lead to priming of genomic DNA during clone library production and thus to the generation of false ESTs. Additionally, clones in a library are often sequenced in one direction only and rarely include the sequence of the entire insert. To assess the ESTs, the UCSC Genome Browser was used, which provides a very comprehensive alignment of all spliced and unspliced ESTs. From a list of several hundred ESTs, those exactly matching the exons of the already sequenced known genes were excluded. For the remaining ESTs, the alignment quality and possible other positions in the genome, as given by UCSC, were considered in order to single out misplaced ESTs. The residual ESTs were subjected to direct sequencing.
Overall, 42 ESTs were identified in this way, together comprising 32 possible exons. The ESTs did not appear to define any new gene: most were unspliced, and many did not overlap with each other. Although single ESTs may represent rare transcripts, most of these are likely to be contaminating sequences, including genomic DNA [240]. However, analysis of the ESTs revealed eight previously unsequenced alternative exons of the known genes, deduced from spliced ESTs or newly submitted mRNAs, and one predicted exon (Figure 33). In total, this part of the sequencing comprised a further 16.226 kb: the majority, 8.413 kb, belonged to the ESTs and was thus putative exonic sequence; 7.350 kb was intronic sequence flanking the newly identified exons; 0.262 kb was UTR sequence of the known genes; and 0.201 kb was coding sequence (originating from alternative exon T4 in HK1). Only five additional candidate mutations were found (Table 21), all of them single nucleotide substitutions. Relative to the known genes, one of the changes was located in an intron of SUPV3L1, two were intronic in FLJ22761 and one was intronic in HK1. The fifth change was identified in an alternative exon of HK1, specifically alternative exon T2, which was deduced from four spliced ESTs: BM686492, BG719874, BQ187332 and CK300651, the latter two originating from the same clone. Four of the five newly identified putative mutations were excluded: one due to lack of co-segregation with HMSNR, one because it was a known polymorphism, and two because they met both criteria. However, the fifth variant, a G to C change in alternative exon T2 of HK1, was not excluded by these criteria, prompting further investigation (Table 21).
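The two first-round exclusion criteria can be expressed as a small decision function. The sketch below assumes a fully penetrant recessive model, as stated for HMSNR, and codes each genotype as the number of copies of the candidate allele (0, 1 or 2). The function names and the toy family are hypothetical; a real co-segregation analysis would also have to handle missing genotypes, inferred haplotypes and obligate carriers.

```python
def cosegregates(genotypes, affected):
    """Fully penetrant recessive model: every affected individual must be
    homozygous for the candidate allele (2 copies) and no unaffected
    individual may be homozygous for it."""
    for person, copies in genotypes.items():
        if affected[person] and copies != 2:
            return False
        if not affected[person] and copies == 2:
            return False
    return True


def first_round(genotypes, affected, in_dbsnp):
    """Classify a candidate by the two first-round exclusion criteria."""
    segregates = cosegregates(genotypes, affected)
    if not segregates and in_dbsnp:
        return "both criteria fulfilled"
    if not segregates:
        return "lack of co-segregation only"
    if in_dbsnp:
        return "known variant only"
    return "not excluded"


# Hypothetical nuclear family: two unaffected parents, one affected child
affected = {"father": False, "mother": False, "child1": True, "child2": False}
candidate_a = {"father": 1, "mother": 1, "child1": 2, "child2": 1}
candidate_b = {"father": 1, "mother": 0, "child1": 1, "child2": 0}

print(first_round(candidate_a, affected, in_dbsnp=False))
print(first_round(candidate_b, affected, in_dbsnp=True))
```

Candidate A passes both filters and would, like the T2 variant, go forward for further investigation; candidate B fails co-segregation because the affected child is only heterozygous.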

Table 21: Location of the five putative mutations identified during the sequencing of the ESTs in the 110 kb and details on their exclusion

Relative to gene | Relative to EST alignment | First-round exclusion
Intronic in SUPV3L1 | Flanking AW629728 | Excluded
Intronic in FLJ22761 | In BQ638264 | Excluded
Intronic in FLJ22761 | In AA738251 | Excluded
Intronic in HK1 | In BU661976 | Excluded
Exonic in HK1 | In BQ187332, BM686492, BG719874, CK300651 | Not excluded

Number excluded by each first-round criterion: lack of co-segregation only, 1; known variant only, 1; both criteria fulfilled, 2.

4.2.3 Sequencing the final region of homozygosity of 63.8 kb (stage 3)

After a Bulgarian nuclear family was included in the refined mapping, the region of homozygosity was reduced to just 63.8 kb, containing the first exon of FLJ31406, FLJ22761 from exon 3 to the end of the gene, and several 5’ exons of HK1 (Figure 34). Because there was no functional proof for the putative mutation in alternative exon T2, it remained possible that the real mutation had not yet been identified. Therefore, a co-worker started sequencing the remaining introns and intergenic sequence. A further 34.943 kb, comprising 33.066 kb of intronic sequence and 1.877 kb of flanking sequence located between the 3’ end of FLJ22761 and the 5’ end of HK1, was analysed for putative mutations. Including previous sequencing efforts, a total of ~54 kb of the final critical HMSNR gene region of 63.8 kb has been searched for mutations.


[Figure 34 graphic: genes C10orf24, LOC398976, FLJ31406, CXXC6, CCAR1, DDX50, DDX21, KIAA1279, PRG1, VPS26, SUPV3L1, HK1, TACR2, NET-7, NGN3 and FLJ22761; the 109.3 kb and 63.8 kb regions; microsatellite markers bA227H15CA1, bA227H15CA2 and D10S1672; exon 1 of FLJ31406, exons 3 to 19 of FLJ22761 and HK1 exons T1, alt T2b, alt T2, alt T2c, T2, T3 and alt T4; ESTs; regions not sequenced]

Figure 34: Map of the final region of homozygosity of 63.8 kb. The genes located between markers bA86K9 and D10S560, according to NCBI MapViewer (August 2004), are shown at the top of the figure. The two genes in grey font, C10orf24 and LOC398974, have only recently been added to the map and were therefore not included in the analysis for this PhD. The lower part shows the genes contained in the refined critical HMSNR region of 63.8 kb, the approximate locations of ESTs and the areas that have not yet been sequenced.

Sequencing of the above-mentioned ~35 kb identified 37 additional candidate mutations (Table 22), which I subsequently analysed for co-segregation with HMSNR and which are included in this thesis. Three of the candidate mutations were single-base insertion/deletions, while 34 were single base changes. Ten of these candidate mutations were excluded solely as known polymorphisms. A further 26 were known polymorphisms that also failed to co-segregate with HMSNR (Table 22). However, one candidate mutation, located in an intron of HK1, was not excluded and consequently needed further analysis.

Table 22: Overview of the exclusion of the newly identified mutations during the sequencing of the remaining introns and intergenic regions in the 63.8 kb

Gene | Putative mutations (single nt/indels) | Lack of co-segregation only | Known variant only | Both criteria fulfilled
FLJ31406/FLJ22761 (overlap region) | 1/0 | - | - | 1
FLJ22761 | 9/2 | - | 3 | 8
HK1 | 24/1 | - | 7 | 17
Total | 37 | - | 10 | 26

Abbreviations: nt, nucleotide; indel, insertion/deletion.

4.2.4 Summary of the mutation analysis

A total of 78 possible mutations (Table 23) were identified during the analysis of positional candidates, in the course of sequencing nearly 150 kb of genomic DNA in a panel of HMSNR patients. The number of affected individuals sequenced ranged between one and five and was adjusted as new haplotype data were acquired: individuals whose haplotypes carried centromeric or telomeric recombinations were selected for sequencing.

Table 23: Overview of the findings of the mutation analysis

Stage | Amount of sequencing performed (kb) | Putative mutations (single nt/indels) | Exclusion
Stage 1 | 98.437 | 32/4 | All excluded
Stage 2 | 16.396 | 5/0 | One not excluded
Stage 3 | 34.943 | 34/3 | One not excluded
Total | 149.776 | 78 | Two not excluded

Stage 1: sequencing of all exons of all known genes, their predicted promoters and gene prediction models; Stage 2: sequencing of all ESTs in the initial 110 kb refined region; Stage 3: partial sequencing of the remaining introns of the final critical region.

The three stages of analysis, namely candidate genes (including all exons of all known genes, their predicted promoters and gene prediction models), all ESTs in the initial 110 kb refined region, and partial sequencing of the remaining introns of the final critical region of 63.8 kb, identified 36, five and 37 putative mutations, respectively. Of the 78, four were excluded solely because they did not co-segregate with the HMSNR phenotype and 14 solely because they had been deposited as polymorphisms in dbSNP, while 58 were excluded on both criteria. Two changes were not excluded and therefore needed further examination. For a subset of candidate mutations, the refined mapping provided additional confirmation of the exclusion: of the 78 changes, 31 lie outside the final region of homozygosity, while 47 are located inside the 63.8 kb. The higher number of putative mutations in the final critical region reflects the greater density of sequencing performed in this part. For 13 of the 14 putative mutations that were excluded solely because the variant had been deposited in the database, the exclusion can be supported by other means. Four of these are located outside the refined critical region of 63.8 kb. A further four can be excluded on the basis of dbSNP frequency data, which indicate that the putative disease allele is common. For five of the 14 variants, lack of co-segregation with HMSNR was established after the submission of this PhD thesis. This leaves one variant affecting a mononucleotide repeat, for which the presence of a change in the HMSNR families still needs to be confirmed, as repeated direct sequencing yielded conflicting results.
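The counts reported in Table 23 and in the paragraph above can be cross-checked with simple arithmetic. The sketch below only re-adds the published figures; sequencing amounts are taken from Table 23 and converted to bp to avoid floating-point rounding, and the variable names are illustrative.

```python
# Sequencing per stage from Table 23: (bp sequenced, single-nt changes, indels)
stages = {
    "Stage 1": (98_437, 32, 4),
    "Stage 2": (16_396, 5, 0),
    "Stage 3": (34_943, 34, 3),
}

# First-round outcome for all putative mutations
exclusion = {
    "lack of co-segregation only": 4,
    "known variant (dbSNP) only": 14,
    "both criteria": 58,
    "not excluded": 2,
}

# dbSNP-only exclusions that are supported by other evidence
supported = {"outside 63.8 kb": 4, "common in dbSNP": 4, "later co-segregation test": 5}

per_stage = {name: snv + indel for name, (_, snv, indel) in stages.items()}
total_bp = sum(bp for bp, _, _ in stages.values())
total_variants = sum(per_stage.values())

print(per_stage)
print(total_bp / 1000, "kb sequenced;", total_variants, "putative mutations")
print(sum(exclusion.values()), "classified;", sum(supported.values()), "of 14 dbSNP-only supported")
```

The check confirms that 36 + 5 + 37 = 78 variants, that the four exclusion categories also sum to 78, and that 13 of the 14 dbSNP-only exclusions have independent support.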


4.3 POPULATION SCREENING OF THE TWO PUTATIVE HMSNR MUTATIONS

4.3.1 Aim of the mutation screening

Two putative HMSNR mutations could not be excluded by the two first-round criteria. Therefore, a population screen was performed for both mutations to gather evidence as to whether either of them is in fact disease-causing. For this purpose, a Gypsy sample needed to be screened, together with a sample of non-Gypsy individuals, which would indicate whether the putative mutation had been introduced into the Gypsies through admixture with the surrounding non-Gypsy population. It was hypothesized that the population screen in the Gypsies would identify carriers of the two putative mutations in the Kalderash, the Gypsy group to which the HMSNR families belong, while possibly detecting very few carriers in other Gypsy groups. Owing to the close proximity of the two mutations (see 4.3.3 for details), they were expected to be in linkage disequilibrium with each other, so that the two mutant alleles would always occur together on the same haplotype. Nevertheless, there was a slim chance of identifying a healthy individual homozygous for one or both putative mutations; because the penetrance of HMSNR is 100 %, such a finding would exclude the corresponding mutation(s) as disease-causing.

4.3.2 Introduction to Hexokinase 1

Both putative HMSNR mutations were identified in the gene hexokinase 1 (HK1); therefore, this section gives a brief introduction to the structure of this gene and to the HK1 mutations identified in the context of haemolytic anaemia, before the next section describes the mutations and their location in HK1. The HK1 gene on chromosome 10q22 encodes one ubiquitous and several tissue-specific HK1 isoforms, all generated from the same locus by alternative splicing (Figure 35). The ubiquitously expressed isoform (NM_000188), which has also been called the “somatic” isoform by Mori et al 1993 [241] and Andreoni et al 2000 [242], contains 18 exons labelled S1 to S18. The erythrocyte-specific isoform HK1-R (NM_033496) is also encoded by exons S2 to S18 but has a different first exon, 1R. Additionally, there are a number of testis-specific isoforms, which again share exons S2 to S18 but contain additional testis-specific exons labelled T1 to T6 and alternative exons T2b, T2, T4 and T5. Three of these transcripts have been processed by NCBI gene annotation and have become reference sequences (NM_033497, NM_033498 and NM_033500), while the remaining ones (AK128226 and U38228) are not annotated, implying that they might be incomplete, which seems to be the case for U38228. An additional mRNA (AK091267) has been omitted from the figure for clarity. This mRNA, isolated from tongue tumour tissue, starts with a 289 bp exon located before T6 and further contains S2 to S13; it includes the sequence between S11 and S12 that is intronic in the other isoforms, as well as a longer, 469 bp version of S13. The refined HMSNR gene interval of 63.8 kb partially contains the 5’ region of HK1; therefore, ESTs in that area were searched using the UCSC Genome Browser. This search identified the ESTs BQ187332 and CK300651 (both derived from the clone UI-E-EJ1-ajz-j-23-0-UI), BG719874, BM686492 and BG718423 (Figure 36). BQ187332 and CK300651, derived from several foetal and adult eye tissues, start with alternative exon T2, which is followed by T2, T3 and T4, and end with alternative exon T5. EST BM686492, generated from an adult optic nerve library, contains part of alternative exon T2, followed by T2, T3, T4 and S2. EST BG719874 is derived from testis and contains exons T1, alternative T2, T2, T3, T4 and T5, while EST BG718423, from the same tissue, starts with T1, followed by alternative exon T2c, T2, T3, T4 and S2.
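To make the EST-derived exon structures easier to query, the transcripts described above can be held in a simple mapping from transcript to ordered exon list. This is an illustrative data structure only: the exon orders are copied from the description of the ESTs (Figure 36), and `transcripts_with_exon` and `next_exon` are hypothetical helpers.

```python
# Exon order of the EST-derived HK1 transcripts as described in the text
est_transcripts = {
    "BQ187332/CK300651 (eye)": ["altT2", "T2", "T3", "T4", "altT5"],
    "BM686492 (optic nerve)": ["altT2", "T2", "T3", "T4", "S2"],
    "BG719874 (testis)": ["T1", "altT2", "T2", "T3", "T4", "T5"],
    "BG718423 (testis)": ["T1", "altT2c", "T2", "T3", "T4", "S2"],
}


def transcripts_with_exon(transcripts, exon):
    """Return the names of transcripts whose exon list contains `exon`."""
    return [name for name, exons in transcripts.items() if exon in exons]


def next_exon(transcripts, name, exon):
    """Return the exon spliced immediately downstream of `exon`, if any."""
    exons = transcripts[name]
    i = exons.index(exon)
    return exons[i + 1] if i + 1 < len(exons) else None


print(transcripts_with_exon(est_transcripts, "altT2"))
print(transcripts_with_exon(est_transcripts, "altT2c"))
print(next_exon(est_transcripts, "BG718423 (testis)", "altT2c"))
```

Querying alternative exon T2 returns the eye, optic nerve and BG719874 testis ESTs, the same set that carries the first putative mutation described in 4.3.3.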
All mutations reported so far for HK1 cause nonspherocytic haemolytic anaemia with varying degrees of severity and associated symptoms. In 1995, Bianchi and Magnani were the first to publish mutations in HK1. They found a 96 bp deletion involving the complete exon S5 (nt 577 to 672 of NM_000188) and a T to C change at nt 1667 (also NM_000188), which causes a leucine to serine substitution at amino acid 529, in one and the same patient, who was a compound heterozygote for the two mutations [236]. In another patient, Kanno and colleagues identified a homozygous intragenic deletion of 9490 bp involving exons S5 to S8 [237]. In 2003, van Wijk et al published a homozygous missense mutation at amino acid residue 680, a threonine to serine substitution caused by a C to G change at nucleotide 2120 of NM_000188 [238].
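The reported nucleotide and amino-acid positions can be related by simple codon arithmetic. The sketch below assumes, hypothetically, that the coding sequence of NM_000188 starts at nt 82 (81 nt of 5’ UTR). This offset is not given in the text; it is used only because it makes both published pairs (nt 1667/codon 529 and nt 2120/codon 680) consistent, each change then falling at the second codon position, and it should be checked against the RefSeq record before use. Under the same assumption, the 96 bp exon S5 deletion (nt 577 to 672) is in frame and spans codons 166 to 197.

```python
# Hypothetical assumption: nt 82 of NM_000188 is the first base of the ATG.
# Not stated in the thesis; chosen because it reconciles both published
# nucleotide/amino-acid pairs.
CDS_OFFSET = 81


def codon_of(mrna_pos, offset=CDS_OFFSET):
    """Map a 1-based mRNA position to (codon number, position within codon)."""
    cds_pos = mrna_pos - offset
    if cds_pos < 1:
        raise ValueError("position lies in the 5' UTR")
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def deletion_effect(first, last, offset=CDS_OFFSET):
    """Length, in-frame flag and codon span of a deletion of mRNA nt first..last."""
    length = last - first + 1
    return length, length % 3 == 0, codon_of(first, offset), codon_of(last, offset)


print(codon_of(1667))             # Leu529Ser reported by Bianchi and Magnani
print(codon_of(2120))             # Thr680Ser reported by van Wijk et al
print(deletion_effect(577, 672))  # exon S5 deletion
```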


[Figure 35 graphic: genomic arrangement of hexokinase 1 (spanning ~133 kb) with exons T1, altT2b, altT2, altT2c, T2, T3, altT4, T4, altT5, T5, 1R, S1, T6 and S2 to S18, followed by the exon structures of the HK1 mRNAs: somatic transcript NM_000188 (S1 to S18); erythrocyte transcript NM_033496 (1R, S2 to S18); testis-specific transcripts NM_033497, NM_033498, NM_033500, AK128226 and U38228 (incomplete?)]

Figure 35: Genomic arrangement of human hexokinase 1 and the tissue-specific mRNAs. At the top of the figure, the genomic arrangement of human HK1, including intron sizes in kb and exon names and sizes in bp, is displayed. All reported mRNAs are shown in the lower part. mRNA accessions starting with NM are annotated sequences from NCBI RefSeq. Exons in yellow are testis-specific, exons in red erythrocyte-specific and exons in black somatic. Exons in green occur only in ESTs. The term somatic transcript (as used in [241, 242]) refers to the ubiquitously expressed isoform of HK1. Inverted triangles depict translation start sites as annotated in NCBI Sequence Viewer.


[Figure 36 graphic: genomic arrangement of hexokinase 1 (spanning ~133 kb), as in Figure 35, and the exon structures of additional HK1 transcripts derived from ESTs: BG719874 (testis): T1, altT2, T2, T3, T4, T5; BM686492 (optic nerve): altT2, T2, T3, T4, S2; BQ187332/CK300651 (eye): altT2, T2, T3, T4, altT5; BG718423 (testis): T1, altT2c, T2, T3, T4, S2]

Figure 36: Genomic arrangement of human hexokinase 1 and the ESTs. At the top of the figure, the genomic arrangement of human HK1, including intron sizes in kb and exon names and sizes in bp, is displayed. The lower part shows ESTs that contain alternative exons not so far annotated in NCBI Sequence Viewer or RefSeq. The source tissue is given in brackets for each EST. ESTs BQ187332 and CK300651 originate from the same clone and have therefore been combined. Exons in yellow are testis-specific, exons in red erythrocyte-specific and exons in black somatic. Exons in green occur only in ESTs. The term somatic transcript (as used in [241, 242]) refers to the ubiquitously expressed isoform of HK1. Inverted triangles depict translation start sites as annotated in NCBI Sequence Viewer.


4.3.3 Typing the two putative HMSNR mutations

The first putative HMSNR mutation, identified during the sequencing of the ESTs, is a G to C change (laboratory identification number #144, see Appendix A) located in alternative exon T2 of hexokinase 1 (HK1). Examining this putative mutation in a large population required a screening procedure; hence, a restriction assay using the enzyme AluI was established. The second putative HMSNR mutation is a G to A change (laboratory identification number #213, see Appendix A) located in the intron following alternative exon T2, 117 bp upstream of the intron/exon boundary of alternative exon T2c of HK1. The two candidate mutations are only 1.314 kb apart (Figure 37), so it seems likely that they are in complete linkage disequilibrium. A restriction digest with the enzyme Tsp509I was used to type the second mutation.
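The restriction assays used for typing can be illustrated with a virtual digest. AluI recognises AG^CT and Tsp509I recognises ^AATT; both sites are palindromic, so scanning one strand is sufficient. The amplicon sequences below are invented for illustration, and the assumption that the variant allele creates the site (rather than destroying it) is not taken from the thesis; the actual primers, amplicon sizes and band patterns are not given here. `digest` and `call_genotype` are hypothetical helpers.

```python
import re

# Recognition site and cut offset within the site (top strand) for each enzyme
ENZYMES = {
    "AluI": ("AGCT", 2),     # AG^CT
    "Tsp509I": ("AATT", 0),  # ^AATT
}


def digest(seq, enzyme):
    """Return fragment lengths after complete digestion of a linear amplicon."""
    site, cut = ENZYMES[enzyme]
    # Zero-width lookahead also finds overlapping occurrences of the site
    cuts = sorted({m.start() + cut for m in re.finditer(f"(?={site})", seq.upper())})
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def call_genotype(bands, ref_pattern, alt_pattern):
    """Call a genotype from observed band sizes using allele-specific bands."""
    ref_only = set(ref_pattern) - set(alt_pattern)
    alt_only = set(alt_pattern) - set(ref_pattern)
    ref_seen = bool(ref_only & set(bands))
    alt_seen = bool(alt_only & set(bands))
    if ref_seen and alt_seen:
        return "heterozygous"
    if alt_seen:
        return "homozygous variant"
    if ref_seen:
        return "homozygous reference"
    return "no call"


# Invented 20 bp amplicons: the G>C change turns AGGT into an AluI site (AGCT)
ref_amplicon = "C" * 5 + "AGGT" + "C" * 11
alt_amplicon = "C" * 5 + "AGCT" + "C" * 11
ref_bands = digest(ref_amplicon, "AluI")
alt_bands = digest(alt_amplicon, "AluI")

print(ref_bands, alt_bands)
print(call_genotype({20, 7, 13}, ref_bands, alt_bands))
```

A carrier shows both the uncut and the cut fragments; calling only on allele-specific bands keeps the logic valid when a real amplicon also contains constant, non-polymorphic sites.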

[Figure 37 graphic: #144 (G/C) in alternative exon T2 and #213 (G/A) 0.117 kb upstream of alternative exon T2c, 1.314 kb apart; a distance of 1.152 kb is also marked]

Figure 37: Locations of the two putative mutations relative to each other and to the 5’ exons of hexokinase 1. The first putative mutation (ID #144), a G to C change, is located in alternative exon T2 of HK1, which is part of the ESTs BG719874, BM686492 and BQ187332/CK300651. The second mutation (ID #213), a G to A change, is located 1.314 kb downstream of the first mutation and 0.117 kb upstream of alternative exon T2c of HK1, which is represented in EST BG718423.

4.3.4 Screening individuals of Gypsy ethnicity

4.3.4.1 Population screen

The population screen was conducted in a large sample of unrelated Bulgarian Gypsies, representing a cross-section of the genetic structure of the Gypsy population. Within this screen, 790 individuals were tested for putative mutation 1 (ID #144) and 745 for putative mutation 2 (ID #213), the two samples overlapping. Five individuals were identified as carriers of both putative mutations. Four of these carriers belong to the Kalderash, the group in which HMSNR has been identified, while the fifth is a member of the Rudari, a closely related group. The calculated carrier frequencies are 9.1 % for the Kalderash (four carriers among 44 individuals analysed) and 1.2 % for the Rudari (one carrier among 84 individuals analysed). No carriers were identified in any of the other groups. To confirm that the putative mutations in the carriers lie on the HMSNR haplotype, a co-worker typed the five carriers for a set of microsatellite markers in the critical region on chromosome 10. For all carriers, it could be inferred that the conserved HMSNR haplotype surrounds the mutations (Figure 38).
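The carrier frequencies above follow directly from the counts. As an extension not made in the thesis, the sketch also converts a carrier frequency h into an allele frequency q by solving 2q(1 - q) = h, which assumes Hardy-Weinberg proportions. That assumption is loose here: the Kalderash sample is small (44 individuals), and endogamy or consanguinity would make the true frequency of homozygotes higher than q².

```python
from math import sqrt


def carrier_frequency(carriers, tested):
    return carriers / tested


def allele_frequency_from_carriers(h):
    """Solve 2q(1 - q) = h for the rarer allele frequency q (Hardy-Weinberg)."""
    return (1 - sqrt(1 - 2 * h)) / 2


screen = {"Kalderash": (4, 44), "Rudari": (1, 84)}
for group, (carriers, tested) in screen.items():
    h = carrier_frequency(carriers, tested)
    q = allele_frequency_from_carriers(h)
    print(f"{group}: carriers {100 * h:.1f} %, q = {q:.4f}, q^2 = {q * q:.5f}")
```

For the Kalderash this gives q of roughly 0.048; the point estimate is only indicative given the sample size.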

Marker | KALD10 | KALD24 | KALD25 | KALD29 | VLAX91 | Conserved HMSNR haplotype
D10S2480 | 1 1 | 5 1 | 5 1 | 5 1 | 5 1 | 1
bA86K9-CA1 | 6 3 | 2 3 | 2 3 | 2 3 | 2 3 | 3
bA314J18-TA1 | 1 5 | 1 5 | 2 5 | 1 5 | 2 5 | 5
D10S1678 | 3 3 | 3 3 | 1 3 | 3 3 | 1 3 | 3
D10S1647 | ? ? | ? ? | ? ? | ? ? | 6 6 | 6
bA227H15-CA1 | 5 3 | 6 3 | 5 3 | 6 3 | 1 3 | 3
bA227H15-CA2 | 3 2 | 1 2 | 1 2 | 1 2 | 1 2 | 2
D10S1672 | 10 1 | 1 1 | 11 1 | 1 1 | 1 1 | 1
Put. HMSNR mutation 1 | G C | G C | G C | G C | G C | C
Put. HMSNR mutation 2 | G A | G A | G A | G A | G A | A
bA227H15-AAAC | 2 3 | 7 3 | 2 3 | 7 3 | 4 3 | 3
D10S1742 | 4 1 | 3 1 | 3 1 | 3 1 | 4 1 | 1

Figure 38: Inferred chromosome 10 haplotypes of the five HMSNR carriers identified in the population screen in the Gypsies. Individuals KALD10, 24, 25 and 29 belong to the Gypsy group of the Kalderash, while individual VLAX91 is a Gypsy from the Rudari group. The conserved HMSNR haplotype is shaded in grey. All haplotypes had to be inferred due to the lack of parental data.
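The expectation that the two putative mutations are in linkage disequilibrium can be expressed with the standard pairwise measures D, D' and r². The sketch below applies them to the inferred carrier haplotypes of Figure 38 (one C-A and one G-G chromosome per carrier). This calculation is not part of the thesis; the haplotypes are inferred and come from ascertained carriers, so the result (D' = r² = 1) simply restates that the two mutant alleles were always observed together and is not a population estimate. `pairwise_ld` is a hypothetical helper.

```python
from collections import Counter


def pairwise_ld(haplotypes, allele_a, allele_b):
    """D, signed D' and r^2 between two biallelic loci from phased haplotypes."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_a = sum(c for (a, _), c in counts.items() if a == allele_a) / n
    p_b = sum(c for (_, b), c in counts.items() if b == allele_b) / n
    p_ab = counts[(allele_a, allele_b)] / n
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else float("nan")
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, d_prime, r2


# Inferred (#144, #213) haplotypes of the five carriers in Figure 38
carrier_haplotypes = [("C", "A")] * 5 + [("G", "G")] * 5
print(pairwise_ld(carrier_haplotypes, "C", "A"))
```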

This population screen indicates that HMSNR is largely restricted to the Kalderash, with few cases detected among Spanish Gypsies. The HMSNR mutation appears to be old, enriched in the Kalderash through genetic drift while lost from other groups. Moreover, there seems to be little gene flow between Gypsy groups [4].


4.3.4.2 Screening of Gypsy families with unclassified CMTs

In addition to the population screen, which included healthy individuals of Gypsy ethnicity, families with unclassified CMTs were tested for the two putative mutations in hexokinase 1. The screening involved 74 subjects for putative mutation 1 (ID #144) and 76 for putative mutation 2 (ID #213). During this screen, the disease alleles were identified in two families and one singlet (a single individual without parental information). In total, five individuals were homozygous for both disease alleles, while seven were heterozygous for both mutations. The remaining 62 and 64 individuals, respectively, were homozygous for the common alleles. The first family, IRE-1, consists of two parents and two children, both diagnosed with a peripheral neuropathy. DNA samples from this family were sent to the laboratory by clinicians in Ireland who suspected HMSNL or HMSNR in the children. The family is of Gypsy ethnicity and recently moved from Romania to Ireland. The haplotypes of both neuropathy-affected children are identical to the conserved HMSNR haplotype for all markers typed. The children are homozygous for both putative HMSNR mutations, while the parents are carriers, suggesting that the neuropathy diagnosed in the children might be HMSNR (Figure 39). In addition to these four subjects, the DNA sample of a presumed distant male relative of the family, who is affected with a neuropathy, was tested. This individual was found to carry the conserved HMSNR haplotype, with both putative mutations in the heterozygous state (Figure 39).


Marker | IRE1-1 | IRE1-2 | Distant relative of the family? (unclassified CMT) | IRE1-3 (unclassified CMT) | IRE1-4 (unclassified CMT)
D10S2480 | 7 1 | 1 1 | 1 1 | 1 1 | 1 1
bA86K9-CA1 | 1 3 | 3 6 | 3 7 | 3 3 | 3 3
bA314J18-TA1 | 1 5 | 5 1 | 5 1 | 5 5 | 5 5
D10S1678 | 2 3 | 3 4 | 3 3 | 3 3 | 3 3
D10S1647 | 8 6 | 6 5 | 6 7 | 6 6 | 6 6
bA227H15-CA1 | 5 3 | 3 4 | 3 1 | 3 3 | 3 3
bA227H15-CA2 | 1 2 | 2 1 | 2 2 | 2 2 | 2 2
D10S1672 | 1 1 | 1 1 | 1 9 | 1 1 | 1 1
Put. HMSNR mutation 1 | G C | C G | C G | C C | C C
Put. HMSNR mutation 2 | G A | A G | A G | A A | A A
bA227H15-AAAC | 1 3 | 3 3 | 3 3 | 3 3 | 3 3
D10S1742 | 4 1 | 1 4 | 1 4 | 1 1 | 1 1

Figure 39: Pedigree of the Gypsy family IRE-1. DNA samples of this family were submitted to the laboratory for CMT mutation testing. IRE1-3 and IRE1-4 both typed homozygous for the two putative HMSNR mutations and also exhibit the conserved HMSNR haplotype. Both parents are obligate carriers, suggesting that the children are affected with HMSNR. A presumed distantly related individual with an unclassified CMT was found to be a carrier of the HMSNR haplotype. The conserved HMSNR haplotype is shaded in grey.

The second family in which additional carriers and individuals homozygous for the putative mutations were identified (Figure 40) was an extension of the large Romanian Gypsy family ROM-1. The newly identified subjects include ROM-26 and ROM-39, who were both homozygous for the mutant alleles. While ROM-26 had been diagnosed with a neuropathy, ROM-39 was free of obvious signs of a neuropathy at the time of sample collection at age 10. However, the first manifestations of HMSNR can appear between ages 7 and 10, so early symptoms might have been missed during the clinical examination (Figure 40).


[Figure 40 graphic: pedigree of family ROM-1 showing, for each typed member, genotypes for markers D10S2480, bA86K9-CA1, bA314J18-TA1, D10S1678, D10S1647, bA227H15-CA1, bA227H15-CA2, D10S1672, putative HMSNR mutations 1 and 2, bA227H15-AAAC and D10S1742, with N/N, M/N or M/M status; several members are annotated “unclassified CMT?” and one annotation reads “non-paternity for Rom-41?”]

Figure 40: Extended pedigree of the Romanian Gypsy family ROM-1. Previous genotyping included ROM-5, 8, 9, 13, 14, 15, 16, 20, 41, 42, 47, 52, 53 and 55 (Figure 26). Typing of the two putative mutations included an additional 11 members of family ROM-1. This identified six non-carriers, N/N (ROM-2, 27, 28, 40, 48 and 49), three carriers, M/N (ROM-50, 51 and 54), and two individuals homozygous for both putative mutations, M/M (ROM-26 and ROM-39). Individual ROM-26 has been diagnosed with a CMT, and it is likely that this CMT is HMSNR. Individual ROM-39 was free of obvious signs of a neuropathy at the time of sample collection at the age of 10; however, early clinical symptoms may have been missed during examination. The conserved HMSNR haplotype is shaded in grey. N denotes a normal allele and M a mutant allele carrying both putative mutations. Grey symbols denote individuals affected by an unknown CMT.

Furthermore, a single individual for whom no relatives were available was tested (Figure 41). This Bulgarian Gypsy patient (R18) was diagnosed with a CMT similar to HMSNR. Inferred haplotypes suggested that one of her chromosomes is identical in the critical region to one of the HMSNR haplotypes (specifically haplotype “f”), while the other chromosome carries a small part of the conserved HMSNR haplotype, including the two putative mutations. Typing of R18 for the two putative mutations revealed homozygosity for the presumed disease allele at both sites, implying that the neuropathy diagnosed in R18 might be HMSNR. Overall, the homozygous region surrounding the two putative mutations is about 74.7 kb; however, not all identified variants have been typed in this individual, leaving open the possibility of an even smaller homozygous region.
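The size of a homozygous segment such as the 74.7 kb in R18 is bounded by the nearest typed variants that are homozygous (minimum extent) and the nearest that are heterozygous (maximum extent). The sketch below shows this bookkeeping; the marker names, positions and genotypes are hypothetical and are not the thesis data, and `homozygous_segment` is an illustrative helper. Typing additional variants inside the segment can only reveal heterozygosity and thereby shrink it, which is the caveat noted above.

```python
def homozygous_segment(markers, target):
    """Extend outwards from `target` while markers stay homozygous.

    markers: list of (name, position_bp, allele1, allele2), sorted by position.
    Returns (first_hom_pos, last_hom_pos, left_het_pos, right_het_pos);
    the flanking heterozygous positions (None at the ends of the list)
    bound the maximum possible extent of the segment.
    """
    def is_hom(i):
        return markers[i][2] == markers[i][3]

    if not is_hom(target):
        return None
    lo = hi = target
    while lo > 0 and is_hom(lo - 1):
        lo -= 1
    while hi < len(markers) - 1 and is_hom(hi + 1):
        hi += 1
    left_het = markers[lo - 1][1] if lo > 0 else None
    right_het = markers[hi + 1][1] if hi < len(markers) - 1 else None
    return markers[lo][1], markers[hi][1], left_het, right_het


# Hypothetical marker positions (bp) and genotypes for one patient
markers = [
    ("marker_1", 0, 1, 9),
    ("marker_2", 20_000, 3, 3),
    ("mutation_1", 45_000, "C", "C"),
    ("mutation_2", 46_314, "A", "A"),
    ("marker_3", 60_000, 3, 3),
    ("marker_4", 90_000, 3, 5),
]
first, last, left, right = homozygous_segment(markers, target=2)
print("minimum extent:", last - first, "bp; maximum extent:", right - left, "bp")
```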

[Figure 41 graphic: inferred haplotypes of R18 alongside the conserved HMSNR haplotype for markers D10S581 to D10S560, including SNP #82 and the two putative HMSNR mutations; the 74.7 kb homozygous segment around the mutations is marked, and the chromosomes are labelled “f” and “a”]

Figure 41: Haplotypes of the Bulgarian Gypsy patient. Individual R18 is a Gypsy patient from Bulgaria who lives in the same community as the HMSNR families. She was diagnosed with a neuropathy similar to HMSNR. Genotyping revealed that she carries one complete HMSNR chromosome (haplotype “f”). Typing for the two putative HMSNR mutations showed homozygosity for the mutant allele at both sites. The patient is homozygous over a region of 74.7 kb surrounding the mutations. This indicates that the neuropathy diagnosed in this individual may be HMSNR. SNP alleles are coded as follows: #82: 1 = C and 2 = G. The conserved HMSNR haplotype is shaded in grey. Haplotypes had to be inferred owing to the lack of parental genotype information.

4.3.5 Screening individuals of non-Gypsy ethnicity

4.3.5.1 Population screen

Part of the population screen is the examination of non-Gypsy individuals from the country in which the Gypsies reside, to provide further evidence that the mutant allele is restricted to the Gypsy population, as would be expected for a founder mutation arising in an endogamous population such as the Gypsies. Absence of the mutant allele from the non-Gypsy population would indicate that it has not been transmitted to the Gypsies through admixture with the surrounding population. For this purpose, unrelated non-Gypsy subjects of Bulgarian origin were chosen from a study of polycystic kidney disease (PKD) conducted in the laboratory in previous years, as Bulgaria is the country where over half of the HMSNR family members reside. A total of 54 individuals (108 chromosomes) were tested for the first putative HMSNR mutation (ID #144) and 50 individuals (100 chromosomes) for the second (ID #213); the samples overlapped apart from the four additional individuals tested for the first mutation. All individuals had two normal alleles (G/G) at both putative mutation sites.
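Observing no mutant allele among 108 or 100 chromosomes does not prove absence; it bounds the allele frequency. As an illustration not given in the thesis, the exact one-sided 95 % upper bound when 0 of n chromosomes carry an allele is 1 - 0.05^(1/n), which the familiar "rule of three" approximates as 3/n. The function name is hypothetical.

```python
def upper_bound_zero(n, conf=0.95):
    """Exact one-sided upper confidence bound on a frequency when 0 of n are observed."""
    return 1 - (1 - conf) ** (1 / n)


# Chromosomes screened in the Bulgarian non-Gypsy sample
for mutation, n_chrom in (("#144", 108), ("#213", 100)):
    exact = upper_bound_zero(n_chrom)
    print(f"{mutation}: {n_chrom} chromosomes, exact bound {exact:.4f}, rule of three {3 / n_chrom:.4f}")
```

With samples of this size, the screen excludes a non-Gypsy allele frequency above roughly 3 %, but not rarer admixture.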

4.3.5.2 Screening of Non-Gypsy families with unclassified CMT

To extend the investigation of non-Gypsy individuals, additional subjects were screened from families with CMT-affected members in whom no mutation had yet been identified. As in the population screen of Bulgarian non-Gypsy individuals, it was anticipated that none of these individuals would carry the disease allele, let alone be homozygous for it. The screening involved 68 individuals for putative mutation one and 62 for putative mutation two, again largely overlapping samples. As expected, all tested family members were homozygous for the common allele (G/G) at both putative mutation sites.


4.4 SCREENING OF HEXOKINASE-1 IN A SPANISH PATIENT OF NON-GYPSY ETHNICITY

4.4.1 Clinical features of the Spanish patient

The DNA sample of this Spanish neuropathy patient was referred to the laboratory by Dr. Jaume Colomer. The patient was born to non-consanguineous healthy parents. When examined by Dr. Colomer at the age of five, the boy presented with a demyelinating neuropathy with CNS involvement. His motor development was delayed: at the time of examination he was able to sit, but could walk only with support. Motor nerve conduction velocities were reduced compared with normal values. He had hypoacusis, evident from abnormal brainstem auditory evoked potentials (BAEP). Magnetic resonance imaging (MRI) revealed hypoplasia of the cerebellar vermis. His intellectual development was retarded and he did not speak. Additionally, he presented with some dysmorphic features. Neuropathological examination performed by Dr. Rosalind King revealed features identical to those of HMSNR, suggesting that the patient could carry a different, allelic mutation in the HMSNR gene.

4.4.2 Genetic investigation of the Spanish patient

The genetic analysis included testing for the CMT1A duplication, performed by a laboratory in Spain, and typing for the two putative HMSNR mutations as part of this PhD project. The patient tested negative for the CMT1A duplication. Typing for both putative HMSNR mutations showed that the patient carries the common allele on both chromosomes. This was further supported by typing selected microsatellite markers in the region, which showed no resemblance to the HMSNR haplotype (Figure 42). In addition, the patient proved to be heterozygous for this region of chromosome 10.


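| Marker | 11029036 (G/G), haplotype 1 | 11029036 (G/G), haplotype 2 | Conserved HMSNR haplotype |
|---|---|---|---|
| D10S2480 | 1 | 1 | 1 |
| bA86K9-CA1 | 7 | 8 | 3 |
| bA314J18-TA1 | 1 | 2 | 5 |
| D10S1678 | 3 | 3 | 3 |
| D10S1647 | 7 | 6 | 6 |
| bA227H15-CA1 | 6 | 5 | 3 |
| bA227H15-CA2 | 3 | 3 | 2 |
| D10S1672 | 9 | 11 | 1 |
| Put. HMSNR mutation 1 | G | G | C |
| Put. HMSNR mutation 2 | G | G | A |
| bA227H15-AAAC | 2 | 3 | 3 |
| D10S1742 | 4 | 5 | 1 |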

Figure 42: Inferred haplotypes of the Spanish patient in comparison to the HMSNR haplotype Typing of the Spanish patient for 10 microsatellite markers and the two putative mutations revealed that his haplotypes bear no resemblance to the HMSNR haplotype and that he is heterozygous for the HMSNR region of chromosome 10.

Hexokinase 1 is the gene in which both putative HMSNR mutations were identified; HK1 was therefore considered a candidate gene for the disease of the Spanish patient, which resembled HMSNR in some features, and especially in its neuropathology. To identify putative disease-causing mutations in this patient, all exons of HK1 were screened by PCR amplification and subsequent direct sequencing. No changes were detected in exons T1, altT2b, altT2, T2, T3, R1, S1, altT6, S2 to S9, S11, and S13 to S16. In the remaining exons, a total of 18 changes from the contig sequence (NT_008583.16) were detected (Table 24). Ten of them were homozygous and eight were heterozygous. Twelve were located in introns, one in the 3’ flanking sequence and five in the coding sequence. Of the five coding changes, four were non-synonymous and one was synonymous. Conservation of the four non-synonymous amino acid changes in species other than human could not be assessed, because in all four cases the changes occurred in the so-called testis-specific exons of HK1, which are poorly characterised in other species.


Table 24: Changes detected in the Spanish patient while sequencing the exons of hexokinase 1

| Exon | Detected variant | Genotype (Lab ID) | Location | Change, if known | Genotype seen in healthy subjects? | Accession number |
|---|---|---|---|---|---|---|
| AltT4 | C/T | homo (#234) | bp 486, AK128226 | Pro/Ser | Yes | rs4746836 |
| T4 | T/C | hetero (#86) | intronic | | Yes | rs2894081 |
| T4 | G/A | hetero (#237) | intronic | | ? | rs2394545 |
| T4 | T/C | hetero (#87) | intronic | | Yes | rs4746837 |
| T4 | G/A | hetero (#238) | intronic | | ? | rs3812689 |
| T5 | C/A | homo (#88) | intronic | | Yes | rs2002905 |
| T5 | A/G | homo (#90) | bp 519, NM_033500 | His/Arg | Yes | rs906220 |
| T5 | C/T | hetero (#239) | intronic | | ? | rs1108272 |
| T5 | G/A | homo (#91) | intronic | | Yes | rs906221 |
| T5 | G/A | homo (#92) | intronic | | Yes | rs906222 |
| T5 | T/A | homo (#93) | intronic | | Yes | rs906223 |
| T5 | C/T | hetero (#94) | intronic | | Yes | rs5030951 |
| T6 | T/A | hetero (#240) | bp 421, U38228 | Ser/Arg | ? | rs7912524 |
| T6 | T/G | hetero (#242) | bp 502, U38228 | Arg/Ser | ? | rs10998724 |
| S10 | G/A | homo (#99) | bp 1524, NM_000188 | Lys/Lys | Yes | rs748235 |
| S12 | G/A | homo (#102) | intronic | | Yes | rs749105 |
| S17 | C/T | homo (#244) | intronic | | Yes | rs1227938 |
| S18 | Del T | homo (#245) | 3’ flanking HK1 | | Yes | Not listed |

Abbreviations: homo = homozygous, hetero = heterozygous, Lab ID = laboratory identification number

In alternative exon T4, a C to T change was identified at position 486 of transcript AK128266. No protein sequence was annotated for this mRNA; however, using ORF Finder (NCBI), a 952-amino-acid open reading frame could be determined, which resembled somatic HK1 except for its amino-terminal end. The identified change encodes a proline to serine substitution, i.e. the replacement of a non-polar amino acid by an uncharged polar one, and was therefore considered promising. A restriction digest with the enzyme BsmAI was hence established for further screening. A total of 68 individuals from diverse ethnic backgrounds were screened: the C/C genotype was found in 15 individuals, C/T in 36 and T/T in 17. With allele frequencies of 48.5 % for the C allele and 51.5 % for the T allele, this change is most likely a harmless polymorphism, with neither allele subject to negative selection. The second non-synonymous change was a histidine to arginine substitution encoded by an A/G change at nucleotide 519 of NM_033500; this variant had previously been identified and excluded in the HMSNR families. The other two non-synonymous changes were identified in testis-specific exon T6, at nucleotides 421 and 502 of U38228, causing a serine to arginine and an arginine to serine substitution, respectively. These two changes were not detected in the Gypsies. All but one of the changes were known variants with entries in dbSNP. Because the disorder of the Spanish patient is very rare, listing of the disease-causing variant in dbSNP (NCBI) was regarded as unlikely, so that listing of a variant in dbSNP could serve as an exclusion criterion. DNA samples of the patient's parents were unavailable; thus co-segregation of a variant with the disease could not be tested. Had the parents been available, occurrence of the same genotype in the patient as in one of his healthy parents could have been used to exclude a variant as the disease-causing mutation. Instead, a variant was regarded as excluded if the same genotype had been detected in healthy individuals from the HMSNR families, who had been sequenced in the course of analysing HK1, or, in some cases, in additional individuals analysed for this purpose. For 13 of the 18 variants, the respective genotype was shown to occur in healthy individuals, thereby excluding them as disease-causing mutations.
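The allele frequencies reported for the BsmAI screen follow directly from the genotype counts. The sketch below (an illustration, not the author's analysis script) recomputes them and, as an added check that was not performed in the thesis, compares the observed genotypes with those expected under Hardy-Weinberg equilibrium using a one-degree-of-freedom chi-square statistic. The function names are hypothetical.

```python
def allele_frequencies(n_cc: int, n_ct: int, n_tt: int) -> tuple[float, float]:
    """Return (freq_C, freq_T) from biallelic genotype counts."""
    n_alleles = 2 * (n_cc + n_ct + n_tt)
    freq_c = (2 * n_cc + n_ct) / n_alleles  # each C/C carries two C alleles
    return freq_c, 1.0 - freq_c


def hwe_expected(n_cc: int, n_ct: int, n_tt: int) -> tuple[float, float, float]:
    """Expected C/C, C/T and T/T counts under Hardy-Weinberg equilibrium."""
    n = n_cc + n_ct + n_tt
    p, q = allele_frequencies(n_cc, n_ct, n_tt)
    return n * p * p, n * 2 * p * q, n * q * q


observed = (15, 36, 17)  # C/C, C/T, T/T in the 68 screened individuals
p, q = allele_frequencies(*observed)
expected = hwe_expected(*observed)
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(f"C = {p:.1%}, T = {q:.1%}")
print("expected under HWE:", [round(e, 1) for e in expected])
print("chi-square (1 df):", round(chi_sq, 2))
```

A chi-square value well below 3.84 (the 5 % critical value for one degree of freedom) means the observed counts do not deviate detectably from equilibrium; this is consistent with, but does not by itself prove, the interpretation of the change as a neutral polymorphism.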

4.4.3 Conclusions of screening HK1 for mutations in the Spanish patient

A Spanish neuropathy patient with neuropathological features resembling HMSNR was screened for mutations. The main aim of this part of the project was to search for mutations in HK1 that could support the claim that this gene is involved in HMSNR. Such mutations would imply that defects in HK1 can cause, besides nonspherocytic haemolytic anaemia, a pathology that specifically affects the peripheral nervous system. The genetic investigations included testing for the CMT1A duplication of PMP22, which, accounting for nearly 70 % of all cases, is the most common cause of CMT1 [21]. The duplication can be inherited or arise de novo [62] and is not confined to any ethnic group. The patient tested negative for the duplication. Furthermore, the patient was examined for the two putative HMSNR mutations in HK1. He was not expected to carry either of these changes, as he is not of Gypsy ethnicity and the disease is confined to the European Gypsy population. This was confirmed by determining his haplotypes for the HMSNR region on chromosome 10q, at which he carried alleles different from the HMSNR haplotype.

Because both parents of the patient are healthy, it was assumed that his disease might be recessive. Although the patient was heterozygous for the HMSNR region, a disease-causing mutation in HK1 could not be excluded, as he could be a compound heterozygote or could harbour a de novo mutation. The next step was therefore to screen the exons and flanking intronic sequence of HK1 for putative mutations. This search identified a total of 18 putative mutations. The exclusion criteria for putative mutations were listing of the variant in dbSNP and/or detection of the same genotype as in the Spanish patient in a healthy individual. By these criteria, all detected putative mutations were excluded. However, two exons of HK1 (altT2c and altT5) still require screening; in addition, screening of the introns or promoters of HK1, or testing for large genomic deletions, might identify putative mutations. Future mutation analysis could also include screening of NDRG1, as the patient presents with BAEP abnormalities, which are typical of HMSNL. Moreover, the complex phenotype of the patient, a neuropathy with CNS involvement including mental retardation, could also have a complex genetic basis involving mutations at several loci.
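The exclusion logic applied to the 18 variants can be expressed compactly. The sketch below transcribes Table 24 into a hypothetical `Variant` record (a "?" in the table is encoded as `None`) and applies the two criteria stated above. It reproduces the counts given in the text, but it is an illustration, not the tool used in the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    lab_id: str
    exon: str
    genotype: str                     # "homo" or "hetero", as in Table 24
    location: str                     # "intronic", "coding" or "3' flanking"
    seen_in_healthy: Optional[bool]   # None where Table 24 shows "?"
    dbsnp: Optional[str]              # None where the variant is not listed


# Rows transcribed from Table 24 (Spanish patient, exons of HK1).
VARIANTS = [
    Variant("#234", "AltT4", "homo", "coding", True, "rs4746836"),
    Variant("#86", "T4", "hetero", "intronic", True, "rs2894081"),
    Variant("#237", "T4", "hetero", "intronic", None, "rs2394545"),
    Variant("#87", "T4", "hetero", "intronic", True, "rs4746837"),
    Variant("#238", "T4", "hetero", "intronic", None, "rs3812689"),
    Variant("#88", "T5", "homo", "intronic", True, "rs2002905"),
    Variant("#90", "T5", "homo", "coding", True, "rs906220"),
    Variant("#239", "T5", "hetero", "intronic", None, "rs1108272"),
    Variant("#91", "T5", "homo", "intronic", True, "rs906221"),
    Variant("#92", "T5", "homo", "intronic", True, "rs906222"),
    Variant("#93", "T5", "homo", "intronic", True, "rs906223"),
    Variant("#94", "T5", "hetero", "intronic", True, "rs5030951"),
    Variant("#240", "T6", "hetero", "coding", None, "rs7912524"),
    Variant("#242", "T6", "hetero", "coding", None, "rs10998724"),
    Variant("#99", "S10", "homo", "coding", True, "rs748235"),
    Variant("#102", "S12", "homo", "intronic", True, "rs749105"),
    Variant("#244", "S17", "homo", "intronic", True, "rs1227938"),
    Variant("#245", "S18", "homo", "3' flanking", True, None),
]


def is_excluded(v: Variant) -> bool:
    """Exclusion criteria stated in the text: the variant is listed in dbSNP
    and/or the patient's genotype was also seen in a healthy individual."""
    return v.dbsnp is not None or v.seen_in_healthy is True


excluded = [v for v in VARIANTS if is_excluded(v)]
print(f"{len(excluded)} of {len(VARIANTS)} variants excluded")
print("homozygous:", sum(v.genotype == "homo" for v in VARIANTS))
print("genotype seen in healthy subjects:", sum(v.seen_in_healthy is True for v in VARIANTS))
print("listed in dbSNP:", sum(v.dbsnp is not None for v in VARIANTS))
```

Either criterion alone suffices: the only variant absent from dbSNP (#245) is excluded because its genotype was seen in healthy subjects.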


4.5 SUMMARY AND DISCUSSION OF THE MUTATION SCREEN

The analysis of positional candidates for the HMSNR gene by direct sequencing of nearly 150 kb, of which 54 kb lie within the refined HMSNR region of 63.8 kb, identified a total of 78 candidate mutations. All but two were excluded by lack of co-segregation with the HMSNR phenotype and/or because they had previously been reported as polymorphisms. A G/C change in the untranslated alternative exon T2 of the gene hexokinase 1 and a G/A change in the following intron could not be excluded: they segregate completely with HMSNR and are not known polymorphisms. A population screen for both putative mutations was conducted in a large sample of European Gypsies from different Gypsy groups. Five carriers were identified: four belonged to the Kalderash, the Gypsy group to which all HMSNR families belong, and one to the Rudari, a closely related group. The respective carrier rates were calculated to be 9.1 % for the Kalderash and 1.2 % for the Rudari. These results can be compared with the screen performed for the CCFDN mutation in the gene CTDP1, where the carrier rate among the Rudari, the group to which the majority of the CCFDN families belong, was 6.9 %, while the average carrier rate in other Gypsy groups was 0.6 % [109]. Thus the carrier rate in the group where the disease mainly occurs is similar for CCFDN (the Rudari) and HMSNR (the Kalderash). In contrast to CCFDN, however, HMSNR appears to be much more restricted in its distribution, as only one non-Kalderash carrier was identified, whereas CCFDN carriers were found in diverse Gypsy groups. The frequencies of the two putative HMSNR mutations can also be compared with that of the major Finnish mutation in the ceroid-lipofuscinosis, neuronal 5 gene (CLN5). Mutations in this gene cause variant late infantile neuronal ceroid lipofuscinosis (vLINCL).
Screening of individuals from a high-risk area determined the carrier frequency to be 1 in 24 (4.2 %) in a specific community and 1 in 100 (1 %) in the remainder of the high-risk area. Outside this area, no carriers were found among 100 controls, indicative of a carrier rate below 1 % [197, 243]. Haplotype diversity around a mutation increases with the age of the mutation, as recombination events accumulate on the disease haplotype. Based on the diversity of haplotypes seen in the HMSNR families, the HMSNR mutation appears to be old.
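To relate the carrier rates quoted above to allele frequencies and to the expected frequency of affected individuals, the sketch below solves 2q(1 − q) = c for the disease-allele frequency q. It assumes Hardy-Weinberg proportions within each group, which is at best an approximation in endogamous populations, where consanguinity raises the proportion of homozygotes; the numbers are illustrative and were not computed in the thesis. The helper name is hypothetical.

```python
import math


def allele_freq_from_carrier_rate(carrier_rate: float) -> float:
    """Solve 2q(1 - q) = carrier_rate for the rarer allele frequency q.

    Assumes Hardy-Weinberg proportions (random mating within the group).
    Taking the smaller root of the quadratic gives q < 0.5.
    """
    return (1.0 - math.sqrt(1.0 - 2.0 * carrier_rate)) / 2.0


# Carrier rates as quoted in the text.
carrier_rates = {
    "HMSNR, Kalderash": 0.091,
    "HMSNR, Rudari": 0.012,
    "CCFDN, Rudari": 0.069,
    "vLINCL (CLN5), high-risk community": 1 / 24,
}

for label, c in carrier_rates.items():
    q = allele_freq_from_carrier_rate(c)
    affected = q * q  # expected homozygote frequency under random mating
    print(f"{label}: carriers {c:.1%}, q = {q:.4f}, about 1 affected in {1 / affected:,.0f}")
```

For small carrier rates q is close to c/2, so a Kalderash carrier rate of 9.1 % corresponds to a disease-allele frequency of roughly 4.8 %.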

Enrichment of the HMSNR mutation in the Kalderash, possibly by genetic drift, is supported by the frequency data. By contrast, HMSNR appears to be largely absent from other Gypsy groups, as underlined by the detection of only one non-Kalderash carrier. HMSNR might have been lost from these groups, or it may not have been present in their founders at the time of the group split. This restricted distribution is consistent with the strong adherence of the Gypsies to their traditional group structure, which results in little admixture between the different Gypsy groups [151]. The presence of the two putative mutations and the HMSNR haplotype was also demonstrated in additional Gypsy families in which recessive CMTs had been diagnosed. Some of these individuals are known to belong to the Kalderash, as they are an extension of the HMSNR family ROM-1, while for the Bulgarian Gypsy patient R18 and the Irish Gypsy family it is unknown whether they are Kalderash. Investigation of 100 healthy chromosomes from the Bulgarian non-Gypsy population indicated that the disease allele had not been transmitted to the Gypsies from the surrounding population. This was supported by screening a number of non-Gypsy families with unclassified CMTs, all of whom were homozygous for the common allele at the two mutation sites. Involvement of a gene in a disease is best proven by identifying a second mutation in that gene that causes a similar pathology [186]. In an attempt to gain further support for HK1 as the HMSNR gene, the exons of the gene were screened in a Spanish (non-Gypsy) patient with a complex syndrome that includes a neuropathology similar to HMSNR. This search identified 18 putative mutations, all of which were excluded as the cause of the neuropathy. These results do not completely rule out HK1 mutations as the cause of the neuropathology in the Spanish patient.
Further screening of introns and promoters, as well as of the two exons not yet sequenced, might reveal a disease-causing mutation.


5 GATHERING EVIDENCE FOR THE INVOLVEMENT OF HK1 IN HMSNR

Chapter outline

The aims of this chapter are:
1. To give an overview of the family of hexokinases, including their evolutionary conservation, their function and the pathologies caused by hexokinase mutations.
2. To explain the strategy used to investigate the involvement of HK1 in HMSNR and to consider the possible involvement of other genes in HMSNR.
3. To report the results obtained so far in this investigation.
4. To give a summary and draw conclusions.

5.1 HEXOKINASES – AN OVERVIEW

5.1.1 The four hexokinase isoenzymes in mammals

5.1.1.1 Evolutionary conservation

In mammals, four different hexokinase isoenzymes have been described: hexokinase 1, hexokinase 2, hexokinase 3 and hexokinase 4 (commonly called glucokinase). Hexokinases 1, 2 and 3 have a molecular weight of approximately 100 kDa, while glucokinase has a molecular weight of 50 kDa. Apart from mammalian glucokinase, 50 kDa hexokinases have been found in bacteria (Zymomonas mobilis), yeast (Saccharomyces cerevisiae), Drosophila melanogaster, trypanosomes, Plasmodium falciparum, shrimp, silkworm and lobster. 100 kDa hexokinases, on the other hand, were identified in locust and Ascaris suum (reviewed in [244]). In 1973, Colowick postulated that the 100 kDa hexokinase arose by duplication and fusion of an ancestral 50 kDa hexokinase. This was later supported by the discovery of extensive sequence similarity between the N-terminal and C-terminal halves of the 100 kDa hexokinases, as well as between each half and the 50 kDa hexokinases. It is also evident from the conservation of coding-exon sizes, which are highly similar between the N- and C-terminal halves of each 100 kDa hexokinase and the glucokinases (Table 25) (reviewed in [244]).


Table 25: Comparison of the exon sizes of several mammalian hexokinases (from [245])

Length of exons #1 to #18 in bp (coding sequence only):
Hs HK1: 63 163 149 120 96 100 184 156 234 305 149 120 96 100 184 156 234 142
Rn Hk1: 63 163 149 120 96 100 184 156 234 305 149 120 96 100 184 156 234 145
Hs HK2: 63 163 149 120 96 100 184 156 234 305 149 120 96 100 184 156 234 142
Rn Hk2: 63 163 149 120 96 100 184 156 234 305 149 120 96 100 184 156 234 142
Mm Hk2: 63 163 149 120 96 100 184 156 234 305 149 120 96 100 184 156 234 145
Hs HK3: 96 163 155 120 96 100 184 156 234 296 137 120 96 100 184 156 234 142
Hs GCK: 42 163 155 120 96 100 184 156 234 142
Rn Gck: 45 163 155 120 96 100 184 156 234 142

This table compares the sizes of the so-called somatic exons of the hexokinases. Erythrocyte-specific and testis-specific exons have not been included. Exons that differ in size from human HK1 are shaded in grey. Abbreviations: Mm. = Mus musculus; Rn. = Rattus norvegicus; Hs. = Homo sapiens
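The exon-size argument for duplication and fusion can be checked directly against Table 25. The sketch below transcribes the human HK1 and GCK rows and compares the N- and C-terminal halves of HK1 with each other and with GCK. The slice boundaries are chosen for illustration, and identical exon lengths are suggestive of, not proof of, common ancestry.

```python
# Coding-exon lengths in bp, transcribed from Table 25 (exons #1 to #18).
HS_HK1 = [63, 163, 149, 120, 96, 100, 184, 156, 234, 305,
          149, 120, 96, 100, 184, 156, 234, 142]
HS_GCK = [42, 163, 155, 120, 96, 100, 184, 156, 234, 142]

# Python slices are 0-based and end-exclusive: HS_HK1[2:9] is exons #3-#9.
hk1_n_half = HS_HK1[2:9]     # exons #3-#9 (N-terminal half)
hk1_c_half = HS_HK1[10:17]   # exons #11-#17 (C-terminal half)
print("HK1 exons 3-9  :", hk1_n_half)
print("HK1 exons 11-17:", hk1_c_half)
print("halves identical:", hk1_n_half == hk1_c_half)

# GCK exon 3 (155 bp) differs from HK1 exon 3 (149 bp), so compare from exon #4.
print("GCK 4-9 == HK1 4-9  :", HS_GCK[3:9] == HS_HK1[3:9])
print("GCK 4-9 == HK1 12-17:", HS_GCK[3:9] == HS_HK1[11:17])

# Exon 2 and the final exon also share their lengths.
print("exon 2:", HS_GCK[1], HS_HK1[1], "| last exon:", HS_GCK[-1], HS_HK1[-1])
```

Seven consecutive exons of identical length in both halves of HK1, and six of them again in GCK, illustrate why the exon structure is cited in support of Colowick's duplication-fusion hypothesis.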


5.1.1.2 Gene structure and protein of the four human isoenzymes

The gene structure of human HK1, localised to chromosome 10q22, has already been discussed in conjunction with the mutation screen of this gene (see Figure 35 on page 148 and Figure 36 on page 149). It should be reiterated here that hexokinase 1 exists in several tissue-specific isoforms, namely HK1, the ubiquitously expressed somatic transcript; HK1R, the erythrocyte isoform; and several testis-specific isoforms. While 17 3’ exons are shared by all transcripts, tissue specificity is generated by alternative splicing of at least 11 5’ exons. One of these is specific to erythrocytes and one is part of the somatic transcript, while the remaining nine form, in various combinations, the testis-specific transcripts. The different N-termini of the transcripts have been shown to target the protein to different intracellular compartments. The somatic form contains a porin-binding domain, which enables it to bind to mitochondria (reviewed in [244]). The erythrocyte form is soluble in the cytoplasm and also has a different half-life from somatic HK1 [246]. No localisation has been reported for the human testis-specific forms of HK1. However, studies in mouse have shown that in spermatocytes these isoforms can localise to the mitochondria, the fibrous sheath of the flagellum and the membrane of the sperm head [247]. Human HK2 is transcribed from chromosome 2p12. The annotated transcript NM_000189 contains 18 exons, and translation starts in exon 1 (Table 26). HK2 also contains a porin-binding domain and can be either soluble or bound to mitochondria. So far only one mRNA has been found to be transcribed from the human HK3 locus on chromosome 5q35.2. This transcript (NM_002115) contains 19 exons, and translation starts in the second exon (Table 26). The HK3 protein has been shown to localise to the periphery of the nucleus [248].
The three mRNAs of human glucokinase, which are between 2.4 and 2.8 kb in length, are transcribed from chromosome 7p13. Each transcript has a distinct 5’ region containing one or two exons, while the remaining nine exons are identical in all transcripts. Isoform 1 of human GCK (NM_000162) contains a unique first exon and is mainly expressed in pancreatic islet beta cells. Isoforms 2 and 3 (NM_033507 and NM_033508) represent the major and the minor liver transcript, respectively (Table 26). They share a first exon that differs from the first exon of isoform 1, and isoform 3 additionally contains a unique second exon that is not present in the other transcripts. For isoforms 1 and 2, translation starts in the first exon, while for isoform 3 the translation start site has been annotated in the unique second exon. Judging from the ESTs collected in the UniGene clusters of the four hexokinases, HK1 seems to be the most abundantly expressed, followed by HK2, while expression of HK3 and GCK seems to be restricted to specific tissues.

Table 26: Overview of the human hexokinases

| | Hexokinase 1 | Hexokinase 2 | Hexokinase 3 | Glucokinase |
|---|---|---|---|---|
| Chromosome | 10q22 | 2p12 | 5q35.2 | 7p13 |
| mRNA sequences (RefSeq, NCBI)¹ | NM_000188, NM_033496, NM_033497, NM_033498, NM_033500 | NM_000189 | NM_002115 | NM_000162, NM_033507, NM_033508 |
| Size of mRNAs² | 3.5–4 kb | 4.2 kb | 3.0 kb | 2.4–2.8 kb |
| Similarity to mouse and rat at nucleotide level² | Mm. 81.5 %, Rn. 82.1 % | Mm. 81.7 %, Rn. 87.2 % | Mm. 83.25 %, Rn. 84.3 % | Mm. 89.2 %, Rn. 89.2 % |
| Protein sequences (RefSeq, NCBI) | NP_000179, NP_277031, NP_277032, NP_277033, NP_277035 | NP_000180 | NP_002106 | NP_000153, NP_227042, NP_227043 |
| Size of protein | 917 aa | 917 aa | 923 aa | 465 aa |
| UniGene | Hs.118625 | Hs.406266 | Hs.411695 | Hs.1270 |

¹ Does not reflect the full variety of transcripts; contains only NCBI reference sequences.
² Source: GeneCards [249, 250].
Abbreviations: Mm. = Mus musculus; Rn. = Rattus norvegicus; Hs. = Homo sapiens

5.1.2 Functions of hexokinases

According to IUB (International Union of Biochemistry) nomenclature, hexokinases are called adenosine triphosphate (ATP):D-hexose 6-phosphotransferases (EC 2.7.1.1). Their principal function is the transfer of a phosphoryl group from ATP to hexoses. Hexokinases phosphorylate a variety of hexoses but typically exhibit a preference for one particular hexose, the most important being D-glucose (Figure 43) (reviewed in [244]).


Glucose (Glc) + ATP → glucose-6-phosphate (G6P) + ADP + H+ (catalysed by hexokinase in the presence of Mg2+)

Figure 43: Schematic drawing of the hexokinase reaction with glucose Hexokinases catalyse the phosphorylation of hexoses. Glucose and ATP are converted to glucose-6-phosphate and ADP + H+ in the presence of magnesium ions (Mg2+).

The hexokinase reaction is thought to follow a random Bi Bi mechanism, which means that both substrates have to be bound before the reaction occurs and that there is no preferred order of substrate addition (reviewed in [251]), although the latter has been disputed. All hexokinases have an absolute requirement for the ATP-Mg2+ chelate. A ternary complex of hexokinase, glucose and ATP-Mg2+ is required for the reaction to take place (reviewed in [244]). X-ray structures provided evidence that glucose induces a conformational change in hexokinase, in which two lobes move to form an 8 Å cleft containing the active site. Carbon atom 6 of glucose and ATP-Mg2+ are brought into immediate proximity, which is thought to confer specificity to the reaction and to prevent the simple hydrolysis of ATP to ADP and Pi, which requires less energy (reviewed in [251]).

All four isoenzymes exhibit different kinetic and regulatory properties (Table 27). HK3 has the highest affinity for glucose, as evident from its lowest Km, while GCK has the lowest affinity, shown by its high Km. HK1, HK2 and HK3 can be inhibited by their product glucose-6-phosphate (G6P), whereas GCK cannot. For HK1, this product inhibition can be counteracted by increasing levels of Pi. HK3 is the only hexokinase that can be inhibited by excess glucose in an ATP-dependent fashion (reviewed in [252] and [253]).

Table 27: Kinetic and regulatory properties of the four human hexokinases (modified from [253] and [252])

| | HK1 | HK2 | HK3 | GCK |
|---|---|---|---|---|
| Km for glucose (mmol/l) | 0.03 | 0.3 | 0.003 | > 1 |
| Inhibition by G6P | sensitive | sensitive | sensitive | insensitive |
| Antagonism of G6P inhibition by Pi | yes | no | no | N/A |
| ATP-dependent inhibition by excess glucose | no | no | yes | no |
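The practical meaning of the Km values in Table 27 can be illustrated with the Michaelis-Menten equation, v/Vmax = [S]/(Km + [S]). The sketch below is a deliberate simplification that treats glucose as the only variable substrate, ignoring ATP-Mg2+ (the random Bi Bi mechanism) and product inhibition by G6P. Because Table 27 gives the GCK Km only as "> 1 mmol/l", the value 1.0 is used as an assumed lower bound, so the GCK figures are upper bounds on its saturation. The glucose concentrations are illustrative.

```python
# Km for glucose in mmol/l, from Table 27.
KM_MMOL_PER_L = {
    "HK1": 0.03,
    "HK2": 0.3,
    "HK3": 0.003,
    "GCK": 1.0,  # Table 27 gives only "> 1"; 1.0 is an assumed lower bound
}


def fractional_saturation(glucose_mmol_per_l: float, km: float) -> float:
    """v / Vmax for single-substrate Michaelis-Menten kinetics: S / (Km + S)."""
    return glucose_mmol_per_l / (km + glucose_mmol_per_l)


for glucose in (1.0, 5.0, 10.0):  # illustrative glucose concentrations, mmol/l
    row = ", ".join(
        f"{name} {fractional_saturation(glucose, km):.2f}"
        for name, km in KM_MMOL_PER_L.items()
    )
    print(f"[glucose] = {glucose:4.1f} mmol/l: {row}")
```

With these numbers HK1 and HK3 are close to saturation even at 1 mmol/l, whereas the computed GCK saturation still rises across the whole range (and would rise more steeply with a larger Km), consistent with the role of glucokinase as a glucose sensor.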

In addition to regulation by substrates and product, other regulatory mechanisms come into play for particular isoenzymes. HK2 is regulated by insulin [254], catecholamines, cyclic AMP (cAMP) and isoproterenol [255]. The influence of insulin on HK2 transcription is mediated by sterol regulatory element-binding protein 1c (SREBP-1c), which interacts with the promoter region of HK2 [256]. Recently, it was shown that proinflammatory interleukin-1 (IL-1) can induce an increase in HK2 via the IL-1 receptor, rat sarcoma viral oncogene (Ras) and mitogen-activated protein kinase (MAPK) pathway in mesangial cells during injury and inflammation [257]. GCK is the only hexokinase that is regulated by a specific regulatory protein, the glucokinase regulator (GCKR). GCKR, which is expressed in the liver and in the hypothalamus, inhibits GCK by competing with glucose for GCK binding (reviewed in [258]). It has been hypothesised that, owing to their different kinetic and regulatory properties, each of the isoenzymes is suited to different metabolic states in different tissues and/or different cellular compartments. Thus HK1 is thought primarily to generate G6P for glycolysis, while HK2 is thought to provide G6P for anabolic reactions such as glycogen synthesis or the pentose-phosphate cycle [253]. Glucokinase, on the other hand, is the “glucose sensor” of the liver (reviewed in [259]). Recently, a new role has been emerging for hexokinases: involvement in the mitochondrial pathway of apoptosis. The permeability transition pore complex (PTPC), which contains the voltage-dependent anion channel (VDAC), the adenine nucleotide translocator (ANT) and a number of other proteins, sits in the mitochondrial membrane at contact sites between the outer and the inner membrane. Apoptotic signals lead to formation of the permeability transition pore.
This leads to loss of the inner membrane potential and osmotic swelling of the mitochondria, resulting in the release of proteins, including cytochrome c, from the mitochondria. These events trigger enzymes called caspases and ultimately lead to cell death. The BCL2-associated X protein (BAX), a major pro-apoptotic member of the B-cell CLL/lymphoma 2 (BCL2) family, binds to VDAC and induces apoptosis (reviewed in [260]). It has been demonstrated that the two hexokinase isoenzymes that bind to the outer mitochondrial membrane via VDAC, HK1 (somatic) and HK2, can counteract the pro-apoptotic influence of BAX [261, 262]. Furthermore, the serine/threonine protein kinase v-akt murine thymoma viral oncogene homolog 1 (Akt1) is activated in the presence of glucose and promotes binding of HK1/HK2 to VDAC, thus preventing apoptosis [261, 263, 264]. Moreover, glucokinase, which does not bind to VDAC, has been identified as part of a protein complex containing BCL2-antagonist of cell death (BAD), which resides at the surface of liver mitochondria. The pro-apoptotic protein BAD is dephosphorylated when glucose levels are low, which leads to apoptosis [265]. Thus, the regulation of glucose metabolism by hexokinases, and hence energy production, seems to be tightly interwoven with the regulation of cell death [266].

5.1.3 Hexokinase pathologies and associated phenotypes

5.1.3.1 Hexokinase pathologies in humans

Mutations in HK1 and GCK are associated with distinct pathologies in humans: haemolytic anaemia and diabetes, respectively. Both diseases exhibit genetic and allelic heterogeneity. No disease-causing mutations have been reported for HK3. A number of studies have tried, with limited success, to show that variants in HK2 are associated with non-insulin-dependent diabetes mellitus (NIDDM) [267-271]. Although seven missense changes (listed in HGMD [272]) have shown some relationship to NIDDM, mutations in HK2 do not appear to be a major factor in NIDDM [267, 269].

Hexokinase 1

The locations of the four disease-causing mutations identified so far in HK1 [236-238] have been described in detail in section 4.3.2. Here, more emphasis is given to the resulting phenotype. All four mutations cause hereditary nonspherocytic haemolytic anaemia (NSHA). Apart from hexokinase deficiency, there are a number of other erythrocyte enzyme deficiencies that cause hereditary NSHA, among them deficiencies of other glycolytic enzymes. The primary feature of this disease is increased destruction of red blood cells, called hemolysis, which the bone marrow tries to compensate for by producing more red blood cells. NSHA caused by HK1 mutations is generally of variable severity, ranging from foetal lethality to mild anaemia. In some patients, associated symptoms have been described, such as multiple malformations, psychomotor retardation, periventricular leukomalacia (reviewed in [252]) and intrauterine growth retardation [237]. Analysis of the mutant HK1 enzymes showed altered electrophoretic mobility, reduced stability or aberrant kinetics (reviewed in [252]).

Glucokinase

Mutations in the human glucokinase gene have been shown to cause maturity-onset diabetes of the young type II (MODY II), neonatal diabetes mellitus (NDM) and hyperinsulinemia of infancy (HI). By 2003, a total of 195 mutations had been reported in the glucokinase gene. MODY is inherited in an autosomal dominant fashion, typically has an early onset, and is associated with abnormalities in β-cell function and moderate hyperglycaemia. NDM can be caused by homozygous inactivation of glucokinase; patients often need insulin treatment within the first month of life. HI, also called persistent hyperinsulinemic hypoglycaemia of infancy (PHHI), can be due to activating mutations in the GCK gene. Affected individuals display high levels of insulin despite hypoglycaemia. All three disorders are genetically heterogeneous and can be caused by mutations in a number of different genes (reviewed in [273]).

5.1.3.2 Mouse models of hexokinase pathologies

Mouse models for hexokinase deficiency have been reported for hexokinase 1, hexokinase 2 and glucokinase. For glucokinase, there is also a model for overexpression.

Hexokinase 1

Spontaneous insertion of an early transposon (ETn) into intron 4 of Hk1 caused Hk1 deficiency in mice of the A/J inbred strain. The resulting phenotype resembled nonspherocytic haemolytic anaemia (NSHA) in humans, and the mouse model was named “downeast anaemia” (dea). As in humans, the disorder is recessive in the mouse model. The dea/dea mice present with severe anaemia, with markedly decreased red blood cell counts, haemoglobin, hematocrit and mean corpuscular haemoglobin content. The number of circulating reticulocytes is extremely elevated, and high levels of bilirubin indicate red cell hemolysis. White blood cell counts seem to be in the normal range. Other features of the dea/dea mouse include enlargement of the spleen, specifically the red pulp, and iron accumulation in liver and kidney. A marked reduction of overall hexokinase activity was detected in red blood cells, spleen and kidney of the dea/dea mouse, while the activity in the liver remained normal [274].

Chapter 5: Gathering Evidence for the Involvement of HK1 in HMSNR

Hexokinase 2

A mouse model of Hk2 deficiency was generated by homologous recombination, resulting in disruption of exon 4 of the Hk2 gene. Hk2-/- mice were severely growth retarded and died around E7.5. Heterozygous Hk2+/- mice displayed markedly lower levels of Hk2 in adipose tissue, heart and muscle than wildtype mice. Similarly, Hk2 activity was reduced in heterozygous animals, while Hk1 activity remained unchanged. Examination of insulin resistance and glucose tolerance did not reveal any differences between heterozygous and wildtype animals, even when a high-fat diet was administered [275]. A later study using the same mouse model investigated the effect of partial knockout of Hk2 on muscle glucose uptake. While no differences between heterozygous Hk2+/- and wildtype mice were seen in the sedentary state, changes in muscle glucose uptake were noted in the heterozygous mice upon exercise. Muscle glucose uptake was only impaired in tissues with a high oxidative capacity and accelerated glucose flux, such as the heart and the soleus, whereas in less oxidative muscles, such as the gastrocnemius or the superficial vastus lateralis (SVL), this effect was less pronounced, and these muscles seemed able to switch to glycogen as a source of glucose [276].

Glucokinase

A complete knockout of Gck by insertion of a neomycin resistance gene into exons 3 to 5 was reported by Grupe et al 1995 [277]. Homozygous Gck-/- mice died between days P3 and P5. These mice were smaller and displayed elevated glucose levels from birth. Histological examination revealed changes in the liver, specifically microvesicular steatosis and depletion of glycogen, and the amount of circulating triglycerides was increased. It was hypothesised that death was caused either by diabetes and the associated hyperosmolar coma or by starvation following a cessation of feeding around day P3. Heterozygous Gck+/- mice were viable, but displayed an abnormal β-cell response to glucose and hyperglycaemia, while insulin levels were normal. The lethal phenotype of the homozygous Gck-/- mouse could be rescued by transgenic expression of pancreatic β-cell Gck, which seemed sufficient to correct some of the abnormalities in the absence of functional liver Gck [277]. In a complementary approach, Terauchi et al 1995 [278] sought to disrupt only the pancreatic β-cell Gck by targeting its specific first exon. Homozygous mice died within the first week of life from dehydration caused by severe diabetes, while heterozygous mice were viable with only mild symptoms of diabetes [278]. Mouse models generated using different systems indicate that complete knockout of both Gck isoforms causes death in the embryonic stage or shortly after birth due to severe diabetes. Knockout of only the pancreatic β-cell form results in a similar phenotype, while mice lacking only hepatic Gck are viable but present with impaired insulin secretion during hyperglycaemia (reviewed in [259]). Interestingly, mice overexpressing Gck showed increased amounts of hepatic Gck, while β-cell Gck was unexpectedly in the normal range. Elevated levels of hepatic Gck were shown to increase glucose clearance and glycogen formation by the liver independently of insulin levels, thus preventing the development of type 2 diabetes [279].

5.1.4 Hexokinase 1 in the peripheral nervous system

There are very few publications that investigate the role of HK1 in the peripheral nervous system, and none of them is recent. Total hexokinase activity in various nervous system tissues was measured by Kato et al in 1973 [280]. HK activities were similar in the three PNS tissues tested: dorsal root ganglia (DRG; sensory), anterior horn (motor) and facial nucleus (motor). Compared with CNS tissues, the three PNS tissues showed HK activities higher than those of Purkinje cells or pyramidal cells, but lower than those of Deiters’ cells or giant cells. The localisation of HK1 in PNS tissues was studied by Wilkin and Wilson in 1977 [281]. Using immunostaining against HK1, they found that in rat DRGs the satellite cells stain more strongly than the neurons. Most of the immunoreactivity was found in the neuronal cytoplasm, but intense staining was also seen in the nerve terminals. In general, HK1 staining was detected in all neurons but seemed to vary between different types of neurons. Likewise, ubiquitous staining of autonomic nerves with an HK1 antibody was reported by Lawrence et al [282], who observed more intense immunostaining in the cell body than in the nerve fibre. It has also been demonstrated that HK1 moves along the axon together with mitochondria, in what is thought to be fast axonal transport. In addition, HK1 seems to accumulate both proximal and distal to a ligature of the sciatic nerve [283].

The Unigene cluster for HK1 (Hs.118625) contains 702 ESTs, ten of which originate from six clones derived from the peripheral nervous system. One of these ESTs, AA102757, contains the sequence of exon S1 (Table 28), indicating that the somatic form of HK1 is expressed in the PNS.

Table 28: HK1 ESTs originating from the peripheral nervous system

EST accession   IMAGE clone   Direction (5’/3’)
AA216625        649721        3' read
AA216677        649721        5' read
AA102660        563816        3' read
AA101221        563816        5' read
AA102757¹       563926        5' read
AA102758        563926        3' read
AA179705        613135        3' read
AA180524        613135        5' read
BF448867        4090787       3' read
AA232102        664680        3' read

¹ Contains exon S1.

5.2 STRATEGIC CONSIDERATIONS

5.2.1 The HMSNR mutation

Refined mapping of the HMSNR region on chromosome 10q22 defined an interval of 63.8 kb as the location of the HMSNR mutation. A total of 54 kb of the 63.8 kb interval was screened for mutations. This analysis included all expressed sequences along with 50 to 100 bp of flanking sequence; in addition, introns and sequences flanking the genes were partially screened for putative HMSNR mutations. Two putative HMSNR mutations were thereby identified that could not be excluded based on the criteria detailed in chapter five. The first is a G to C change located in alternative exon T2 of the gene hexokinase 1, while the second, a G to A change, was identified in the intron following alternative exon T2, 1.314 kb downstream of the first mutation. The HMSNR mutation must therefore be either one of these two putative mutations or, alternatively, be contained in the remaining 9.8 kb of intergenic and intronic sequence that has so far not been searched. The location of a putative disease-causing mutation is a major criterion for assessing its potential effect [223]. It is generally assumed that a change in an expressed sequence has more potential to cause disease than a change in a promoter, an intron or intergenic sequence, reflecting the decreasing importance of these regions to gene function. It therefore seems plausible, as a working hypothesis, to “rank” the two putative HMSNR mutations by their potential to be disease-causing: the putative mutation in alternative exon T2 appears more likely to be the HMSNR mutation than the one in the following intron. Moreover, any additional putative mutation in the 9.8 kb not yet analysed would be located in an intron or in sequence flanking the genes. According to this “ranking”, any such mutation would rank behind the mutation in alternative exon T2, which would remain the only mutation in an expressed sequence.

5.2.2 Potential effects of the two putative HMSNR mutations

Each of the two putative HMSNR mutations could potentially affect HK1, a new overlapping gene, or distant genes located on the same chromosome. The primary effect of a mutation on mRNA and/or protein changes the function of the gene through loss of function, gain of function, or decreased or increased function.

5.2.2.1 Possible effects of the two putative mutations on HK1

Primary effects

• Effects on transcription of HK1

The two putative HMSNR mutations could potentially affect the transcription of HK1. While neither of them seems to be located in a core promoter region, the normal function of non-core promoter elements could be disrupted. Non-core promoter elements comprise enhancers, silencers, boundary elements and response elements, which may contain transcription factor binding sites. Enhancers up-regulate transcription, while silencers decrease the transcription rate. Boundary elements are thought to insulate genes from the effects of enhancers and silencers, and response elements change transcription in response to stimuli (reviewed in [180]). Variants in regulatory gene regions can abolish transcription factor binding sites, generate new, spurious binding sites, or change the efficiency of transcription factor binding [284]. However, such variants have so far only rarely been associated with disease. Among the few examples are single base mutations in intron 6 of the human tryptophan oxygenase gene (TDO2) that cause psychiatric disorders by disrupting the YY-1 transcription factor binding site [285].

• Effect on splicing of HK1

Dysregulation of pre-mRNA splicing is a possible disease mechanism for both putative mutations. Such a defect could affect PNS transcripts of HK1 containing 5’ untranslated exons such as alternative exon T2 or alternative exon T2c, as the first putative mutation is located in alternative exon T2 and the second in the intron between alternative exon T2 and alternative exon T2c. The classical elements of splicing are the 5’ splice site (donor), the 3’ splice site (acceptor) and the branch site in the intron [286, 287]. Disruption of any of these sites changes the composition of the transcript. Increasing interest has also focussed on auxiliary cis-elements, such as exonic and intronic splicing enhancers (ESEs and ISEs) and exonic and intronic splicing silencers (ESSs and ISSs), which are thought to contain additional splicing information. The 3’ and 5’ splice sites and the branch sites are poorly conserved, which creates a need for supporting regulatory elements that enable the recognition of genuine splice sites; these are greatly outnumbered by so-called pseudo splice sites, which often show stronger conservation in their sequence [288]. That mutations in auxiliary splicing elements can cause pathological changes has been demonstrated for diseases such as familial isolated growth hormone deficiency type II (IGHD II), Frasier syndrome and frontotemporal dementia and parkinsonism linked to chromosome 17 (FTDP-17) (reviewed in [288]), which implies that disruption of an auxiliary element is a possible mechanism for the effect of the HMSNR mutations.

• Effect on translation of HK1

Both putative mutations could also affect translation of the HK1 mRNA. The intronic putative HMSNR mutation could only exert such an effect if altered splicing changed the composition of the transcript. By contrast, the putative mutation in alternative exon T2 could change the level, timing, tissue-specificity or initiation of translation of HK1 transcripts that contain the alternative exon. The main mechanism of translation initiation is the recruitment of the ribosome and several protein factors to the 5’ cap structure of the mRNA; the ribosome then scans the RNA until it encounters an AUG initiation codon in a suitable sequence context, where it starts translating (reviewed in [289]). Recently, a number of alternative translation initiation mechanisms have been revealed. Ribosome scanning may be leaky, so that upstream initiation codons can be recognised and translation initiated from them, proceeding until a stop codon is reached. Such upstream open reading frames have been shown to regulate translation (reviewed in [290]). An example of this is the proto-oncogene Friend-leukaemia insertion site-1 (Fli-1), where two different isoforms are generated through two upstream open reading frames in a sophisticated termination-reinitiation process [291]. Another mechanism is ribosome shunting, in which the scanning ribosome bypasses a secondary structural element in the mRNA and resumes scanning downstream of it. Furthermore, internal initiation of translation from internal ribosome entry sites (IRES) has been described for a number of genes (reviewed in [289]). IRES elements have been demonstrated for a diverse range of genes, amongst them transcription factors, growth factors and receptors, translation factors, cytoskeletal proteins, kinases and transporters. IRES-mediated translation has been shown to be of particular importance during apoptosis, the cell cycle, hypoxia and heat or cold shock, when cap-dependent translation is inhibited (reviewed in [292]). In a number of diseases, mutations in the UTRs have been shown to change translation and thereby cause pathology. For instance, a specific mutation in the 5’UTR of the nerve-specific transcript of Cx32 causes CMTX by completely abolishing translation of the transcript; further investigation demonstrated that this mutation disrupts an IRES [293]. Other translational disorders, such as hereditary hyperferritinemia/cataract syndrome and hereditary thrombocythemia, have been reviewed by Cazzola and Skoda 2000 [225].

• Effect on intracellular location of RNA or protein

The two putative mutations may also affect the intracellular localisation of the HK1 RNA or protein. It has been shown, for example with neuronal BC1 RNA (a non-coding RNA polymerase III transcript), that RNAs can be transported along the axons and dendrites of neurons by a microtubule- and actin filament-dependent process that requires specific sequence elements in the RNA [294]. These elements are thought to adopt a secondary structure that is essential for the correct localisation of the RNA, but may also prevent translation of the RNA until factors that release this suppression become available [295]. Whether the two putative mutations could cause a defect in HK1 mRNA localisation is, however, unclear, as there are no data on the transport of HK1 mRNA to specific subcellular locations.

By contrast, the transport of HK1 protein along the axons of PNS and CNS neurons has been demonstrated [296]. Nevertheless, either HMSNR mutation could only cause a defect in HK1 protein transport if it changed the amino acid sequence. As one mutation is in an untranslated exon and the other in an intron, the only feasible mechanism by which the protein could be changed involves splicing errors that disrupt one of the translated exons of the HK1 gene.

Effect on HK1 function

The possible primary effects outlined above show how the two putative HMSNR mutations could change the transcription, translation and localisation of HK1. Any of these changes could in turn alter HK1 function. As mentioned earlier in this chapter, the main functions of HK1 are catalysing the first step of glycolysis and acting as an anti-apoptotic factor. Either of these functions could conceivably be affected by one of the two putative mutations. Moreover, HK1 may have further roles that have so far not been described.

5.2.2.2 Long range effects in cis

It is also possible that the two putative mutations do not affect the gene in which they are located, HK1, but instead influence genes at a distance on the same chromosome. Reports of mutations affecting distant cis-acting regulatory sequences are scarce, and mostly involve chromosomal translocations or larger insertions/deletions rather than single nucleotide changes such as the two putative HMSNR mutations. In humans, the eye disease aniridia can be caused by interstitial deletions 11.6 and 22.1 kb away from the third poly-A site of the paired box gene 6 (PAX6), which reduce PAX6 transcription and thereby cause haploinsufficiency [297]. Similarly, long range cis-regulatory elements were found to extend over 25 kb from the 5’ end of the mouse Pax6 gene (reviewed in [297]). Another example is provided by translocation breakpoints located between 140 and 950 kb proximal of the SRY (sex determining region Y)-box 9 gene (SOX9), which result in the bone disease campomelic dysplasia that is otherwise caused by mutations within SOX9 [298]. Likewise, a translocation disrupting a regulatory element of the human sonic hedgehog homolog (SHH) gene, located ~1 Mb from the SHH locus in an intron of the limb region 1 gene (LMBR1), was shown to have no effect on LMBR1 but to affect SHH, thus causing preaxial polydactyly, a disease associated with limb malformations [299]. An example of point mutations that affect long range regulatory elements is adult-type hypolactasia, commonly known as lactose intolerance, which has been associated with a C to T change 13.9 kb and a G to A change 22 kb 5’ of the start codon of the lactase gene (LCT). Both changes are located in the minichromosome maintenance deficient 6 gene (MCM6) [300]. These examples suggest that long range cis-acting regulatory elements are located within several kb up to 1 Mb of the gene they affect. Applied to HMSNR, genes within a radius of at least 1 Mb around the two putative HMSNR mutations would have to be considered. NCBI Map Viewer currently displays 30 genes in this interval. Immediately upstream of HK1 is the predicted hexokinase FLJ22761, while the next gene downstream is TACR2. Genes such as TACR2, NET-7 or NEUROG3, which are located ~130, ~175 and ~295 kb, respectively, from the two putative mutations and were excluded by mutation analysis and refined mapping, would be good candidates for long range effects. However, long range regulation in cis is rare, and most mutations affect the gene in which they are located. Any examination of distant candidate genes should therefore await the investigation of hexokinase 1 as the gene most likely to be affected by the putative mutations.

5.2.3 Strategy for the investigation

Because two putative HMSNR mutations were identified in HK1, it was decided that efforts should concentrate on HK1 as the gene that might be affected in HMSNR. At the same time, however, the possible existence of a new gene overlapping the 5' region of HK1 had to be examined by predictions and database searches. In addition, information about the other two genes in the critical region, FLJ22761 and FLJ31406, was collected from databases and alignments, as there remains a slight possibility that the real mutation has not yet been identified and that one of these two genes is involved in HMSNR. Although both putative HMSNR mutations could also exert long range effects, such effects are rare, and it was therefore decided that they should only be investigated if no other disease-causing mechanism could be identified. To further investigate HK1, computational methods were used to estimate the evolutionary importance of the mutated nucleotides and the effect the mutations could have on gene transcription and translation. This involved examining the conservation of the nucleotides in other species, possible changes in splicing brought about by either mutation and, for the exonic mutation, changes in the secondary structure of transcripts containing alternative exon T2. Further, to establish a stronger connection between the alternative exon T2 mutation and HMSNR pathology, it was necessary to identify transcripts containing this exon in the PNS. To this end, hexokinase 1 transcripts in human and mouse needed to be examined, using methods such as RT-PCR, Northern blot, a cDNA library screen and RACE. Furthermore, immunohistochemical studies and HK activity measurements were performed by collaborators in the UK and the Netherlands.


5.3 RESULTS

5.3.1 Results of the investigation of HK1 and the two putative HMSNR mutations

5.3.1.1 Computational analysis of the two putative mutations

Conservation of the putative mutations in other species

The level of conservation of a putative mutation site in other species gives evidence of the importance of the nucleotide. For translated sequences this relates to protein structure and function, where conserved residues are likely to be of greater importance [223]. For non-coding regions, on the other hand, it has been shown that regions of high conservation between species are good candidates for regulatory elements [301]. Both putative HMSNR mutations are located in non-coding regions, one in an untranslated exon and the other in an intron. The search for homologous sequences in other species was considered useful for two reasons: firstly, conservation provides information about the importance of the mutated nucleotide, and secondly, the identification of homologous sequences in species such as mouse or rat could enable the use of animal models. Therefore, extensive BLAST [171, 172] and BLAT [302] searches against the genomes of various species were conducted using alternative exon T2 (262 bp) and a ~1.7 kb fragment containing the full alternative exon T2 with the first mutation site and the following intron up to 170 bp after the second mutation site. This identified highly homologous regions containing the putative mutation site of alternative exon T2 in chimpanzee, dog and mouse (Figure 44). Furthermore, a region homologous to the second mutation site, in the intron after alternative exon T2, was identified in the dog genome (Figure 45). For both putative mutations, the wildtype nucleotide was conserved across these species.


Putative HMSNR mutation G/C change

Human  ACTGGAAC-CCTGTGGGAGCACATAGCTGGCATTT-T-TTGCTAG--AGATTAGGAAGTCTTTT-GCTTCCT
Chimp  ACTGGAAC-CCTGTGGGAGCACATAGCTGGCATTT-T-TTGCTAG--AGATTAGGAAGTCTTTT-GCTTCCT
Dog    ACTGGAAC-CCTGTGGGAGCACATAGCTGGCATTT-T-TTGCTTG--AGAT-AGGAAGTCTTTT-GCTTCCT
Mouse  -CTGGA-CGCCTGTGGGAGCACATAGCTGGCATTTCTGTTAAGAGTGAGATTTGGAAGTCTTTTTGCTTCCT

Figure 44: Sequence comparison of the homologous regions in chimpanzee, dog and mouse for the putative HMSNR mutation in alternative exon T2 of HK1. The putative mutation is boxed and bold. Areas shaded in grey denote sequences identical between all four species.

For the second putative mutation, no homologous region was identified in chimpanzee due to a gap in the genomic sequence, whereas the mouse sequence was available but did not exhibit any sequence similarity. To verify that the dog sequence contained in the genomic clone AAEX01016323, which maps to chromosome 4, is indeed the region of dog HK1, the human HK1 sequence was blasted against the dog genome; the highest similarity was found for the chromosome 4 clones AAEX01016324 and AAEX01016325. All three clones are located adjacent to each other in the UCSC Dog Genome Browser.

Putative HMSNR mutation G/A change

Human  CATCCTTGGGCAGACACGAGGAAGGATGAAAGG-ATTGACCTGCGAGAATC--
Dog    CATCCTTGGG-AGGTAGTGGGAGGGAAGAAAGGGGTTTA--TGCAAGAGTCCA

Figure 45: Sequence comparison of the homologous region in dog for the putative HMSNR mutation in the intron after alternative exon T2 of HK1. The putative mutation is boxed and bold. Areas shaded in grey denote sequences identical between the dog and human sequences.

BLAST and BLAT searches of the cow, pig, cat, rat, sheep, chicken, zebrafish and fruit fly genomes failed to reveal any similarity to the two putative HMSNR mutation sites. Rat HK1 has been mapped cytogenetically to chromosome 20; however, the physical map of this area has a gap at the position of HK1.
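The grey shading in Figure 44 marks alignment columns in which all four species carry the same base. A minimal Python sketch of that check is given below. It assumes the aligned sequences are transcribed exactly from Figure 44 (gaps as '-') and that the putative G/C site lies at 0-based column 25 of this excerpt, the G of the TAG codon identified in Figure 46; this column is inferred, because the box marking the site does not survive in plain text. The function names are hypothetical, and the sketch only inspects an existing alignment; it does not reproduce the BLAST or BLAT searches.

```python
"""Identify alignment columns identical across species (cf. Figure 44)."""

from typing import Dict, List

# Aligned sequences transcribed from Figure 44; '-' denotes an alignment gap.
FIGURE_44 = {
    "Human": "ACTGGAAC-CCTGTGGGAGCACATAGCTGGCATTT-T-TTGCTAG--AGATTAGGAAGTCTTTT-GCTTCCT",
    "Chimp": "ACTGGAAC-CCTGTGGGAGCACATAGCTGGCATTT-T-TTGCTAG--AGATTAGGAAGTCTTTT-GCTTCCT",
    "Dog":   "ACTGGAAC-CCTGTGGGAGCACATAGCTGGCATTT-T-TTGCTTG--AGAT-AGGAAGTCTTTT-GCTTCCT",
    "Mouse": "-CTGGA-CGCCTGTGGGAGCACATAGCTGGCATTTCTGTTAAGAGTGAGATTTGGAAGTCTTTTTGCTTCCT",
}

# Assumed 0-based column of the putative G/C site (the G of 'CATAG').
MUTATION_COLUMN = 25


def identical_columns(alignment: Dict[str, str]) -> List[int]:
    """Return 0-based columns where every sequence has the same non-gap base."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    conserved = []
    for i in range(length):
        bases = {s[i].upper() for s in seqs}
        if len(bases) == 1 and "-" not in bases:
            conserved.append(i)
    return conserved


def site_is_conserved(alignment: Dict[str, str], column: int) -> bool:
    """True if the base at `column` is identical and ungapped in all species."""
    return column in identical_columns(alignment)


if __name__ == "__main__":
    wildtype = {name: seq[MUTATION_COLUMN] for name, seq in FIGURE_44.items()}
    print(wildtype, site_is_conserved(FIGURE_44, MUTATION_COLUMN))
```

Treating a column as conserved only when no species has a gap mirrors the requirement that the base be identical in all four species; a looser criterion (for example, a majority base) would shade more columns.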

Examination of translation initiation

To assess whether human transcripts containing alternative exon T2 have a different translation start site, the testis-specific transcripts published to date (listed in Table 29), which use a start codon in T3, were translated using the Translate tool available from the ExPASy server. All forward frames were inspected manually for AUG (ATG) and CUG (CTG) start codons. Transcripts both with and without T2 were examined because transcripts lacking T2 had been detected in both mouse testis and mouse brain.

Table 29: Transcripts inspected for alternative translation initiation

T1 + T2 (short) + T3
T1 + T2 (long) + T3
T1 + altT2b + T2 (short) + T3
T1 + altT2 + T2 (short) + T3 (normal and mutant)
T1 + altT2 + T3 (normal and mutant)

None of the tested transcripts contained any other upstream AUG (ATG) start codon that was in frame with the start codon in T3 and would allow continuous translation into the main ORF. Exon T3 contains two upstream stop codons in frame with the start codon of the main ORF (Figure 46, panel (a)). Upstream AUG and CUG codons were nevertheless detected in all transcripts. Furthermore, comparison of normal and mutant alternative exon T2 showed that the G to C change abolishes an upstream stop codon (TAG), replacing it with a tyrosine codon (TAC) (Figure 46, panel (b)). Two CUG codons upstream and one CUG codon downstream of this TAG stop codon are in the same frame. However, the relevance of these non-canonical start codons is not clear.

a) HK1 exon T3

gcguucaagacccagcuguugagaguagaaaagcagaagaaaggacccgaggucagcaagu
R S R P S C - E - K S R R K D P R S A S
gcccuccccacaauggggcagaucugccagcgagaaucg
A L P T M G Q I C Q R E S

b) HK1 alternative exon T2

gacggcgucuggaguuuugcacaaaagagaauugaauuguagaucagcugggaaguuacu
D G V W S F A Q K R I E L - I S W E V T
gugguaguccuggugcccugcggccuccagcgacuggaacccugugggagcacaua[g/c]cug
V V V L V P C G L Q R L E P C G S T -/Y L

Figure 46: Search for upstream ORFs and start codons. Panel (a) shows exon T3 and the canonical AUG start codon (boxed and shaded grey) that is used for translation of the testis-specific HK1 isoforms. Panel (b) shows a partial sequence of alternative exon T2. Non-canonical CUG start codons are boxed with a dashed line and shaded grey. The putative G to C mutation in alternative exon T2 (in bold) abolishes the stop codon of an upstream ORF.
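The manual inspection described above (scanning a reading frame for AUG/CUG start codons and in-frame stop codons, then comparing normal and mutant sequence) can be sketched in a few lines of Python. The sketch assumes the partial alternative exon T2 sequence printed in Figure 46, panel (b), reads the single 't' in that printed sequence as 'u', and takes frame 0 as the frame translated in the figure. The names `scan_frame` and `upstream_orfs` are hypothetical; this is not the ExPASy Translate tool used in the thesis, only an illustration of the same check.

```python
"""Scan one reading frame for AUG/CUG starts and stop codons (cf. Figure 46b)."""

from typing import List, Optional, Tuple

START_CODONS = {"aug", "cug"}          # canonical AUG plus non-canonical CUG
STOP_CODONS = {"uaa", "uag", "uga"}

# Partial alternative exon T2 as printed in Figure 46, panel (b).
NORMAL = (
    "gacggcguctggaguuuugcacaaaagagaauugaauuguagaucagcugggaaguuacu"
    "gugguaguccuggugcccugcggccuccagcgacuggaacccugugggagcacaua"
    "g"    # putative mutation site (wildtype G)
    "cug"
)
MUTATION_INDEX = 116                    # 0-based position of the G above
MUTANT = NORMAL[:MUTATION_INDEX] + "c" + NORMAL[MUTATION_INDEX + 1:]


def to_rna(seq: str) -> str:
    """Lower-case the sequence and convert DNA 't' to RNA 'u'."""
    return seq.lower().replace("t", "u")


def scan_frame(seq: str, frame: int) -> Tuple[List[int], List[int]]:
    """Return 0-based positions of start and stop codons in one reading frame."""
    rna = to_rna(seq)
    starts, stops = [], []
    for i in range(frame, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if codon in START_CODONS:
            starts.append(i)
        elif codon in STOP_CODONS:
            stops.append(i)
    return starts, stops


def upstream_orfs(seq: str, frame: int) -> List[Tuple[int, Optional[int]]]:
    """Pair each start codon with the first in-frame stop codon downstream of it.

    None means no stop occurs before the end of the supplied fragment,
    i.e. the ORF would continue into downstream sequence.
    """
    starts, stops = scan_frame(seq, frame)
    orfs = []
    for s in starts:
        downstream = [p for p in stops if p > s]
        orfs.append((s, downstream[0] if downstream else None))
    return orfs


if __name__ == "__main__":
    for label, seq in (("normal", NORMAL), ("mutant", MUTANT)):
        print(label, upstream_orfs(seq, 0))
```

In frame 0 the normal fragment has CUG codons at positions 69, 93 and 117 and stop codons at 39 and 114; in the mutant the TAG at 114 becomes TAC, so the two ORFs starting at the upstream CUG codons no longer terminate within the fragment.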

To assess whether alternative exon T2 contains functional elements, the sequence of the exon (262 bp) was analysed using the program UTRscan [179], which compares the input sequence against a database of user-submitted UTR elements. This search yielded similarity to an IRES pattern submitted by Le and Maizel [303] for base pairs 170 to 262, a region that contains the putative mutation site at base pair 188.

Predictions of splice sites and auxiliary splice elements

To evaluate whether either change would activate a cryptic splice site, splice-site predictions were performed for both mutations using NNSPLICE 0.9 [173, 174], NetGene2 [175] and GeneSplicer [176]. The sequence used for the predictions contained the mutation site and 200 bp of flanking sequence on each side. Each prediction was carried out with the normal and the mutant nucleotide, and the outputs of the programmes were compared. All three programmes predicted no changes in splicing for the putative mutation in the intron following alternative exon T2. However, for the mutation site in alternative exon T2, NNSPLICE 0.9 and GeneSplicer predicted the creation of a donor splice site on the complementary strand, while NetGene2 predicted the creation of a splice acceptor site, also on the reverse strand. These predictions have no immediate consequence, as there seem to be no transcripts generated from the complementary strand, but they need to be kept in mind in case a new gene overlapping HK1 is identified. Prediction programmes for auxiliary splice elements are still scarce. Programmes are, however, available for the prediction of ESEs and were applied to the putative HMSNR mutation in alternative exon T2. The sequence of human alternative exon T2 (262 bp) was submitted to RESCUE ESE [304, 305] and ESE finder [287]. RESCUE ESE did not predict any ESEs at or near the putative mutation site. ESE finder predicted a binding site for the protein SRp55 immediately upstream of the mutation site, but the binding motif itself was unaffected by the G to C change.
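The thesis does not describe how the normal and mutant input sequences for the prediction programmes were assembled. A minimal Python sketch of one way to do it is given below, assuming the genomic sequence is available as a plain string and positions are 0-based. `allele_windows` and `reverse_complement` are hypothetical helpers, not part of NNSPLICE, NetGene2 or GeneSplicer. The reverse complement is included because the sites predicted for the exon T2 change lay on the complementary strand; with asymmetric flanks (110 bp upstream, 75 bp downstream) the same helper would also yield an input of the shape used for RNAfold in the secondary structure analysis.

```python
"""Build normal and mutant sequence windows around a single-base change."""

from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (case preserved)."""
    return seq.translate(_COMPLEMENT)[::-1]


def allele_windows(genomic: str, pos: int, ref: str, alt: str,
                   upstream: int = 200, downstream: int = 200) -> Tuple[str, str]:
    """Return (normal, mutant) windows around a 0-based substitution site.

    Each window holds `upstream` bases 5' of the site, the site itself and
    `downstream` bases 3' of it, clipped at the ends of `genomic`.
    """
    if genomic[pos].upper() != ref.upper():
        raise ValueError(f"expected {ref!r} at position {pos}, found {genomic[pos]!r}")
    start = max(0, pos - upstream)
    end = min(len(genomic), pos + downstream + 1)
    normal = genomic[start:end]
    offset = pos - start
    mutant = normal[:offset] + alt + normal[offset + 1:]
    return normal, mutant


if __name__ == "__main__":
    # Synthetic example only: 300 A's, the G under test, 300 T's.
    toy = "A" * 300 + "G" + "T" * 300
    normal, mutant = allele_windows(toy, 300, "G", "C")
    # Predictions on the complementary strand would use the reverse complements.
    rc_normal, rc_mutant = reverse_complement(normal), reverse_complement(mutant)
    print(len(normal), normal[200], mutant[200], rc_mutant[200])
```

Checking the reference base before substituting guards against an off-by-one position, which would otherwise silently compare two windows that differ at the wrong nucleotide.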

mRNA secondary structure prediction

A prediction of the effect of the G/C change in alternative exon T2 of HK1 on mRNA secondary structure was attempted using GeneBee [177] and the Vienna Package (RNAfold) [178]. A number of changes in mRNA secondary structure have been shown to be involved in various disease pathologies (reviewed in [225]); this could also be the case for HMSNR. The sequence submitted to GeneBee contained exons T1, alternative T2, T2, T3 and T4, while a shorter sequence consisting of the putative mutation site with 110 bp of 5’ flanking and 75 bp of 3’ flanking sequence was used for the prediction with the Vienna Package. All predictions were performed with normal and mutant mRNA. While GeneBee predicted a change in secondary structure between normal and mutant mRNA, RNAfold predicted no obvious differences. Furthermore, the structures predicted by the two programmes showed no similarities for the part of the sequence used in both predictions (Figure 47 and Figure 48). This suggests that the two programmes use different algorithms, resulting in different secondary structure outputs. The secondary structure of the mRNA may therefore be affected by the putative mutation in alternative exon T2; however, further predictions and experimental evidence are needed to clarify this.


Normal: -62.6 kcal/mol

Mutant: -61.14 kcal/mol

Figure 47: Prediction of the secondary structure of the mRNA containing alternative exon T2 in normal and mutant state using the Vienna Package (RNAfold). Normal and mutant mRNA sequences containing the mutation site with 110 bp of 5’ flanking and 75 bp of 3’ flanking sequence were submitted to the Vienna Package prediction programme. While slightly different free energy values were calculated, the two predicted secondary structures exhibit no difference.


Normal

Figure 48: Prediction of the secondary structure of the mRNA containing alternative exon T2 in normal and mutant state using GeneBee. Normal and mutant mRNA sequences containing exons T1, alternative T2, T2, T3 and T4 were submitted to the GeneBee prediction programme. The arrow indicates the position of the putative HMSNR mutation at 297 bp. The predicted secondary structure and the calculated free energy differ between the two mRNAs.


5.3.1.2 Transcriptional analysis

RT-PCR analysis using mouse templates

Mouse tissues offer a practical alternative to human tissues for investigating the expression of alternative exon T2. Databases were searched for known transcripts containing the exons that had been analysed in the human tissues. In a 1993 publication, Mori and colleagues [241] described three transcripts containing four exons (Figure 49), which are the mouse counterparts of the human exons T1 to T4 described by Andreoni et al 2000 [242]. An alternative exon T3 was also described, which does not appear to have a human equivalent.

5’ exons of mouse hexokinase 1 (genomic order, 5’ to 3’): T1 (70 bp), T2 (46 bp), altT3 (125 bp), T3 (95 bp), T4 (48 bp), R1 (200 bp), S1 (120 bp), S2 (163 bp)

Hk1 transcripts from spermatocytes published by Mori et al (1993), exon sizes in bp:
Hk1-sa (L16948): altT3 125, T3 95, T4 48
Hk1-sb (L16949): T1 70, T2 46, T3 95, T4 48, S2 163
Hk1-sc (L16950): ?, T3 95, T4 48

Figure 49: Spermatocyte transcripts of mouse Hk1 published by Mori et al 1993 [241]. The upper part contains a schematic drawing of the known 5’ exons in mouse, in the order in which they occur at the genomic level. Exons are shown as boxes with exon name and exon size; Alt T3 stands for alternative exon T3. The lower part shows the spermatocyte-specific transcripts published by Mori et al in 1993. Exon names are at the top and sizes in bp are in the boxes depicting the exons. “?” stands for a small part of the transcript that maps to the region after T2. Drawings are not to scale. Exon names were not assigned by Mori et al 1993 and have therefore been matched to their human counterparts to facilitate comparison.

However, no alternative exon T2 had been described in mouse. BLAST searches showed that a region between T1 and T2 exhibits high similarity to the human alternative exon T2 (Figure 50), with the mouse counterpart predicted to be 257 bp, compared with 262 bp for human alternative T2. Primers were therefore designed within this 257 bp region with the aim of identifying the exon and connecting it to the other exons. Mouse testis cDNA was used, as the experiments with human testis cDNA had shown that alternative exon T2 is present in this tissue. RT-PCR followed by direct sequencing of the product placed the 5’ boundary of the exon a surprising 140 bp upstream of the predicted boundary, increasing the exon size from 257 bp to 397 bp. The product of primers T1-F33-52 + altT2-R362-381 was seen in testis and kidney, but not in sciatic nerve, brain, liver, spleen, eye or heart. Subsequent experiments concentrated on testis, brain and sciatic nerve and did not include the other tissues, as the aim was to identify expression of alternative T2 in the nervous system.

Human gctggaac-gtccc-gggattg-aagcttggatccgaactgttggacggcgtct-ggagt
Mouse gctgg-actg-ccctggga-tgaaagcttggatctgaactgttagacagtgtctggga-t

Human tttgcacaaaagagaattgaattgtagatcagctgggaagttactgtggtagtcctg-gt
Mouse tgtgcacaaaagagaattgaattgtagatcagctgggaaaccagtgagacagcccagcg-

Human gccctgcggcctccagcgactggaac-cctgtgggagcacatagctggcattt-t-ttgc
Mouse gccctt-ggcctccaggg-ctgga-cgcctgtgggagcacatagctggcatttctgttaa

Human tag--agattaggaagtctttt-gcttcctctgtgaaaaggcttgaattcaatggact
Mouse gagtgagatttggaagtctttttgcttcctctgtgaagtg-cttggattcaatg-act

Figure 50: Region of high identity between the mouse genome and human alternative exon T2. A BLAST search of the mouse genome with human alternative exon T2 identified a highly identical region located between exons T1 and T2 of mouse Hk1 on chromosome 10. A total of 194 of 238 bp (81.5 %) are identical between human and mouse. The location of the putative mutation in alternative exon T2 is boxed and shaded in grey.
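The identity figure in Figure 50 is the number of identical columns divided by the total number of alignment columns, gaps included. A minimal Python sketch of that calculation, assuming the two aligned strings are given with "-" for gaps; the function name is hypothetical and this is not the BLAST code itself.

```python
def percent_identity(top: str, bottom: str) -> float:
    """Identical, non-gap columns as a percentage of all alignment columns."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(
        1 for a, b in zip(top.lower(), bottom.lower()) if a == b and a != "-"
    )
    return 100.0 * identical / len(top)
```

For example, `percent_identity("ac-gt", "acggt")` gives 80.0 (four identical columns out of five). The four alignment blocks in Figure 50 span 60 + 60 + 60 + 58 = 238 columns, the denominator used in the caption, and 194 / 238 rounds to 81.5 %.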

The 3’ end of the exon proved difficult to define, as the PCR reactions could not be optimised. After testing several primer combinations, with the forward primer in the alternative exon and the reverse primer in T2, T3 or S2, it became clear that alternative splicing must occur in the tissues chosen, as no optimisation effort reduced the number of bands. The primer pair altT2-F246-265 + S2-R54-73 was therefore used to amplify all bands in testis and brain; the bands were separated on an agarose gel, excised individually and sequenced with the PCR primers. This yielded eight transcripts in testis, two of which were also found in brain (Figure 51 (a) and (b) and Figure 52). The sequences showed that alternative exon T2 can be 397 bp or 642 bp long. Furthermore, bp 398 to 539 of the 642 bp exon can be spliced out, producing two exons of 397 bp and 103 bp. Alternative splicing was also detected for exon T2, which can be 46 bp, as stated by Mori et al [241], or 181 bp through extension of its 3’ end. Although Mori et al [241] did not directly describe the longer T2, their Hk1-sc transcript contains 21 bp of the end of the 181 bp version of T2 (Figure 49, labelled “?”).
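The splice forms described above combine independently, so the expected product sizes for altT2-F246-265 + S2-R54-73 can be enumerated from the segment lengths listed for Figure 52. A minimal Python sketch, assuming that every product also contains T3 (95 bp), T4 (48 bp) and the first 73 bp of S2, as in the Figure 52 table; the dictionary labels are descriptive only.

```python
from itertools import product

# Segment lengths (bp) between primers altT2-F246-265 and S2-R54-73,
# as listed in the product table of Figure 52.
ALT_T2_FORMS = {
    "397 bp exon": [152],              # primer position 246 to exon end (bp 397)
    "642 bp exon": [152, 142, 103],    # bp 398-539 and 540-642 retained
    "397 + 103 bp exons": [152, 103],  # bp 398-539 spliced out
}
T2_FORMS = {
    "T2 absent": [],
    "T2 46 bp": [46],
    "T2 181 bp": [46, 135],  # 3' extension of exon T2
}
CONSTANT_3P = [95, 48, 73]  # T3, T4 and S2 up to the end of primer S2-R54-73

# Band sizes sequenced from mouse testis (Figure 51 (a)).
OBSERVED_TESTIS = {794, 659, 613, 549, 517, 471, 414, 368}


def splice_products() -> dict[tuple[str, str], int]:
    """Expected product size for every combination of altT2 and T2 forms."""
    sizes = {}
    for (alt_name, alt), (t2_name, t2) in product(ALT_T2_FORMS.items(),
                                                  T2_FORMS.items()):
        sizes[(alt_name, t2_name)] = sum(alt) + sum(t2) + sum(CONSTANT_3P)
    return sizes
```

All eight testis bands are reproduced by this simple model, and the only predicted combination without a matching band is the 397 + 103 bp form together with the 181 bp T2 (652 bp). This does not by itself show that such an isoform is absent.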

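[Figure 51 gel images: panels (a) and (b), RT-PCR products from mouse testis (a) and mouse brain (b) alongside a DNA ladder (501/489, 404, 331 and 242 bp), with bands of 794, 659, 613, 549, 517, 471, 414 and 368 bp in testis and of 613 and 368 bp plus a non-specific product in brain; panels (c) and (d), bands of 435, 300 and 254 bp in testis (c) and of 254 bp in brain (d).]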

Figure 51: RT-PCR results using mouse testis and mouse brain. (a) RT-PCR with primers altT2-F246-265 and S2-R54-73 in mouse testis produced eight bands of between 368 and 794 bp, which were confirmed by sequencing as splice variants of alternative exon T2 and exon T2. (b) Using the same primers as in (a), two of these bands (368 and 613 bp) were also detected in mouse brain, while a third band was found to be non-specific amplification. (c) RT-PCR using primers T1-F33-52 and S2-R54-73 detected three bands of 254, 300 and 435 bp in mouse testis. (d) Using the same primers in mouse brain, one band of 254 bp was detected. None of the products amplified in (c) and (d) contained alternative exon T2; the three products were all generated by alternative splicing of exon T2.


Primer positions: exons shown are T1 (70 bp), predicted altT2 (257 bp), T2 (46 bp), T3 (100 bp), S1 (120 bp), S2 (163 bp), S3 (149 bp) and S7 (184 bp). Forward primers: T1-F33-52, altT2-F39-60, altT2-F246-265, S1-F95-114, S3-F34-54. Reverse primers: altT2-R362-381, T2-R26-46, T3-R60-80, S2-R54-73, S7-R6-26. Detected with RT-PCR: alternative exon T2 segments of 397, 142 and 103 bp; exon T2 of 46 bp with a 135 bp 3’ extension.

PCR products (primer pair: product size; segments in bp by exon; tissues in which the transcript was detected)

RT-PCR from alternative exon T2 or bridging over exon T2
T1-F33-52 + altT2-R362-381: 418 bp; T1 38, altT2 381; testis, kidney
altT2-F246-265 + S2-R54-73:
(1) 794 bp; altT2 152 + 142 + 103, T2 46 + 135, T3 95, T4 48, S2 73; testis
(2) 659 bp; altT2 152 + 142 + 103, T2 46, T3 95, T4 48, S2 73; testis
(3) 613 bp; altT2 152 + 142 + 103, T3 95, T4 48, S2 73; testis, brain
(4) 549 bp; altT2 152, T2 46 + 135, T3 95, T4 48, S2 73; testis
(5) 517 bp; altT2 152 + 103, T2 46, T3 95, T4 48, S2 73; testis
(6) 471 bp; altT2 152 + 103, T3 95, T4 48, S2 73; testis
(7) 414 bp; altT2 152, T2 46, T3 95, T4 48, S2 73; testis
(8) 368 bp; altT2 152, T3 95, T4 48, S2 73; testis, brain
T1-F33-52 + S2-R54-73:
(1) 435 bp; T1 38, T2 46 + 135, T3 95, T4 48, S2 73; testis
(2) 300 bp; T1 38, T2 46, T3 95, T4 48, S2 73; testis
(3) 254 bp; T1 38, T3 95, T4 48, S2 73; testis, brain

RT-PCR of common part of HK1
S3-F34-54 + S7-R6-26: 458 bp; exons S3, S4, S5, S6 and S7; testis, brain, sciatic nerve

RT-PCR of somatic HK1
S1-F95-114 + S7-R6-26: 680 bp; S1 45, S2 163, S3 149, S4 120, S5 96, S6 100, S7 48; testis, brain (sciatic nerve not tested)

Figure 52: Overview of the RT-PCR experiments performed in the 5’ region of mouse HK1. The top of the figure gives an overview of all primers and their nucleotide positions in the exons. Exons in which no primers were designed have been omitted from the drawing. The lower part of the figure gives primer combinations, sizes of PCR products, the exons contained in each product as determined by sequencing, and the tissues in which the RT-PCR was successful. Splicing of alternative exon T2 is depicted in orange and yellow, with orange being the part that exhibits homology to the human counterpart and yellow the two additional parts of alternative exon T2 that were detected only in mouse.

Due to the low amount of RNA obtained from mouse sciatic nerve, the SensiScript protocol (Qiagen) was used, as it allows cDNA synthesis from as little as 50 ng of RNA. The synthesised cDNA was tested by PCR amplification with primers S3-F34-54 and S7-R6-26, which yielded a band of the expected size. However, no PCR products were amplified when combining altT2-F246-265 with T2-R26-46 or S2-R54-73. In addition, a PCR between T1 and S2 was performed in testis and brain, which yielded transcripts that did not contain alternative exon T2. Three transcripts were identified in testis, one of which was also found in brain (Figure 51 (c) and (d) and Figure 52). A striking feature of the brain transcripts from both amplifications, T1 to S2 and alternative T2 to S2, was that none of them contained exon T2. Consistent with this, no amplification was detected in brain with primers in alternative T2 and T2. The common part of HK1 (primers S3 to S7) could be amplified in testis, brain and sciatic nerve (using the SensiScript protocol [Qiagen] for the nerve). The specific somatic part (primers S1 to S7) could be amplified from testis and brain; sciatic nerve was not tested, as this PCR product was primarily used to probe the Northern blot.

Several problems were encountered. The high number of cycles needed to detect transcripts containing the alternative exon led to increased non-specific amplification from residual genomic DNA, which made DNase treatment of the RNA necessary. Furthermore, yields of mouse sciatic nerve RNA were low and the RNA was prone to degradation, which limited the number and type of experiments that could be done. Use of the SensiScript protocol did not lead to the identification of transcripts containing alternative exon T2 in sciatic nerve; in addition, it abolished the transcripts seen in testis or reduced their variety, depending on the primers used. Another approach was the use of gene-specific primers in the reverse transcription of testis and brain RNA. PCR reactions using this cDNA yielded products for the somatic part of HK1 in the positive control reaction, but not for the part containing alternative exon T2, not even in the testis control.

As stated in 5.3.1.1, the rat sequence of the genomic region containing HK1 was not available, and it was assumed that it would be most similar to the mouse sequence. RNA was isolated from rat brain, dorsal root ganglia (DRG) and sciatic nerve. The sciatic nerve RNA was limited and was used in Northern blot experiments, while cDNA was synthesised from brain and DRG RNA using random hexamers. The forward mouse primers altT2-F246-265 and altT2-F39-60 were paired with altT2-R362-381 in PCR reactions with rat brain and rat DRG cDNA, titrating MgCl2 from 1.5 to 2.5 mM. However, only non-specific amplification (confirmed by sequencing) was detected.

RT-PCR analysis using human templates

Based on the human EST sequences (ESTs BQ187332 and CK300651 [both derived from the same clone], BG719874 and BM686492), it was known that the alternative exon may be expressed in testis and in eye tissues including optic nerve. The aim of the RT-PCR experiments in human tissues and cells was to show that alternative exon T2 is expressed in the nervous system (brain/peripheral nerve). Primers were therefore designed in the alternative exon and in exons T1, T2, T3, T4 and S2. In addition, primers were designed for the part common to all HK1 transcripts and for the ubiquitous transcript containing S1 (Figure 53). cDNA synthesis was performed using random hexamers or a mixture of random hexamers and oligo dT. Random hexamers were expected to be more efficient in first-strand synthesis of RNA containing the 5’-located alternative exon T2, because they prime at random positions, whereas oligo dT primes at the 3’ end of the RNA. Because alternative exon T2 was known to be expressed in testis, all PCR reactions were first optimised with cDNA generated from human testis RNA (Clontech). All optimised reactions were then performed using cDNA from human brain (RNA from Clontech), human peripheral nerve (RNA from total human peripheral nerve) and human Schwann cells (RNA from cultured Schwann cells). For all amplifications with primers in alternative exon T2, and for the amplification from T1 to S2, products were obtained only in testis (Figure 53). In contrast to the results in mouse, alternative splicing of alternative exon T2 was not observed. The common somatic part of HK1 could be amplified from all tissues tested (testis, brain, peripheral nerve, Schwann cells). The ubiquitous somatic HK1 transcript was amplified successfully from testis and brain (Figure 53); other tissues were not tested.

Several problems were encountered during the RT-PCR analysis. All PCR reactions involving primers in the alternative exon required 40 cycles, which suggests that these transcripts may be rare. Moreover, amplification from T1 into S2 in testis resulted in a transcript that did not contain alternative exon T2 (Figure 53), as seen by RT-PCR in mouse testis, while the transcript containing alternative exon T2 was not visible on agarose gels. As a side effect of the high number of cycles, non-specific amplification of residual genomic DNA occurred frequently, prompting the introduction of DNase treatment into the RNA isolation procedure.


Exons shown: T1 (109 bp), altT2 (262 bp), T2 (122/176 bp), T3 (100 bp), T4 (48 bp), S1 (144 bp), S2 (163 bp), S3 (149 bp), S7 (184 bp).
Forward primers: T1-F10-29, altT2-F46-66, S1-F100-120, S3-F35-54.
Reverse primers: altT2-R239-258, T2-R99-119, T3-R15-34, T4-R28-48, S2-R68-87, S7-R6-26.

PCR products (primer pair: product size; segments in bp by exon)

RT-PCR from alternative exon T2 or bridging over exon T2 (successful in testis)
T1-F10-29 + altT2-R239-258: 357 bp; T1 99, altT2 258
altT2-F46-66 + T2-R99-119: 336 bp; altT2 217, T2 119
altT2-F46-66 + T3-R15-34: 373 bp; altT2 217, T2 122, T3 34
altT2-F46-66 + T4-R28-48: 487 bp; altT2 217, T2 122, T3 100, T4 48
altT2-F46-66 + S2-R68-87: 574 bp; altT2 217, T2 122, T3 100, T4 48, S2 68
T1-F10-29 + S2-R68-87: 437 bp; T1 99, T2 122, T3 100, T4 48, S2 68

RT-PCR of common part of HK1 (successful in human testis, total nerve, cultured Schwann cells and brain)
S3-F35-54 + S7-R6-26: 457 bp; S3 115, S4 120, S5 96, S6 100, S7 26

RT-PCR of somatic HK1 (successful in testis and brain; not performed in nerve or Schwann cells)
S1-F100-120 + S7-R6-26: 699 bp; S1 45, S2 163, S3 149, S4 120, S5 96, S6 100, S7 26

Figure 53: Overview of the RT-PCR experiments performed in the 5’ region of human HK1. The top of the figure gives an overview of all primers and their nucleotide positions in the exons. Exons in which no primers were designed have been omitted from the drawing. The lower part of the figure gives primer combinations, the tissues in which the product was detected, sizes of PCR products and the exons contained in each product as determined by sequencing.
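Because the primer names in Figures 52 and 53 encode nucleotide positions within their exons, the expected size of a spliced product can be derived from the exon lengths alone. A minimal Python sketch, assuming 1-based positions (so altT2-F46-66 starts at position 46 of the 262 bp human alternative exon T2 and T2-R99-119 ends at position 119 of T2); the function name is hypothetical.

```python
def product_size(first_exon_len: int, fwd_start: int,
                 internal_exons: list[int], rev_end: int) -> int:
    """Expected RT-PCR product size (bp) for a primer pair spanning exons.

    Assumes primer names give 1-based positions within their exon: the
    forward primer contributes first_exon_len - fwd_start + 1 bp of its
    exon, each exon in between contributes its full length, and the
    reverse primer contributes rev_end bp of its exon.
    """
    return (first_exon_len - fwd_start + 1) + sum(internal_exons) + rev_end
```

For example, `product_size(262, 46, [122], 34)` gives 217 + 122 + 34 = 373 bp for altT2-F46-66 + T3-R15-34, and `product_size(149, 35, [120, 96, 100], 26)` gives the 457 bp common-part product. Such a calculation can serve as a quick consistency check on the tabulated sizes.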


Northern blotting

Northern blots are used to determine the size and expression level of transcripts that are highly homologous to the probe sequence. In the HMSNR project, Northern blots were used to test expression of alternative exon T2 in different human, mouse and rat tissues. The first Northern blot experiment was performed using a human 12-Lane MTN™ Blot (Clontech) containing RNA from whole brain, heart, skeletal muscle, colon, thymus, spleen, kidney, liver, small intestine, placenta, lung and peripheral blood leukocytes. The blot hybridised well with the β-actin control probe supplied by the manufacturer. To test for expression of alternative exon T2, human alternative T2 probe 1 (Figure 54) was used, which contains the complete alternative exon (262 bp) with 30 and 8 bases of 5’ and 3’ flanking genomic sequence, respectively. No hybridisation signal was detected in repeated rounds of hybridisation. As the blot contained no human testis or spermatocyte RNA to serve as a positive control, it was not possible to determine why this experiment failed.

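Human probes: alternative exon T2 probe 1, 300 bp (30 bp 5’ flanking sequence, 262 bp alternative exon T2, 8 bp 3’ flanking sequence); alternative exon T2 probe 2, 336 bp (217 bp alternative exon T2, 119 bp exon T2).

Mouse probes: alternative exon T2 probe, 405 bp (359 bp alternative exon T2, 46 bp exon T2); somatic HK1 probe, 680 bp (S1 45, S2 163, S3 149, S4 120, S5 96, S6 100, S7 48 bp).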

Figure 54: Probes used in the Northern blot experiments. Human probes are shown in the upper part of the figure and mouse probes in the lower part. Human alternative exon T2 probe 1 was amplified from genomic DNA and contains the complete human alternative exon T2 with 30 and 8 bp of genomic sequence at the 5’ and 3’ ends, respectively. Human alternative exon T2 probe 2 and the mouse alternative exon T2 probe were amplified from cDNA and include partial sequence of alternative exon T2 and exon T2. The somatic HK1 probe was generated by amplification from exon S1 to exon S7 of HK1.

A second blot was therefore prepared containing RNA from human testis and brain, mouse testis, brain, spleen, kidney, liver and heart, and rat sciatic nerve. The human tissues on this blot were hybridised with human alternative T2 probe 2 (Figure 54), which contains 217 bp of alternative T2 and 119 bp of T2. The mouse and rat tissues were hybridised with the mouse alternative T2 probe, on the assumption that the rat sequence is closely related to the mouse sequence. Weak signals at about 4.5 kb, the size of testis-specific HK1, were detected in both mouse and human testis, while the other tissues showed very weak, ambiguous signals of similar size that did not increase in intensity with longer exposure and could not be interpreted. As a control, both parts of the blot (human and mouse/rat) were hybridised with the mouse somatic HK1 probe (Figure 54). Sequence comparisons justified this choice: 94 % of the amino acids were identical between mouse and rat, and 88 % were identical both between mouse and human and between rat and human. Signals of approximately 4 to 4.5 kb were detected in all tissues apart from liver, with mouse testis giving the strongest signal. For the final Northern blot experiment, a new blot was prepared containing human testis and brain RNA, mouse testis, brain and pooled nerve RNA, and rat brain, sciatic nerve and DRG RNA (Figure 55). Hybridisation was performed as described above, with human alternative T2 probe 1 for the human tissues and the mouse alternative T2 probe for mouse and rat tissues. Both human and mouse testis gave a strong signal at 4.5 kb (Figure 55, “a” and “b”). Human brain showed a smear over most of the lane, while none of the other tissues showed any signal even after exposure for up to 62 h.
A second hybridisation with the mouse somatic HK1 probe resulted in bands of 4 to 4.5 kb in all tissues (Figure 55, “c” and “d”). Finally, the blot was hybridised with a human β-actin probe (as supplied with the MTN™ blot, Clontech) as a control. All lanes showed bands at about 2.2 kb, as expected (Figure 55, “e” and “f”).


[Figure 55 blot images, panels (a) to (f). Lanes: human (Hs) testis and brain; mouse (Mm) testis, brain and pooled nerve; rat (Rn) brain, sciatic nerve and DRG. Size markers at 4.5 kb and 1.9 kb.]

Figure 55: Northern blots. a) Hybridised with the human alternative T2 probe 2; c) hybridised with mouse alternative T2 probe; b) and d) hybridised with mouse somatic HK1 probe; e) and f) hybridised with the human β-actin probe (Clontech). Signals for somatic HK1 at about 4-4.5 kb (b, d) and β-actin at about 2.2 kb (e, f) were detected in all tissues. Hybridisation with the human and mouse alternative exon T2 probes gave strong signals for both human and mouse testis at 4.5 kb and a smear in the lane for human brain, while all other tissues showed no signal even after prolonged exposure of up to 62 h. Mouse pooled nerve contains DRGs, sciatic nerve, Schwann cells and nerve fibroblasts.

cDNA library screen

Screening of cDNA libraries is used to identify transcripts in the source RNA that are identical or homologous to the probe sequence. Within the scope of this PhD project, a human foetal brain cDNA library was screened to identify transcripts containing alternative exon T2 in a tissue of the nervous system. The probe used for this experiment was the same as on some of the Northern blots (human alternative T2 probe 2, Figure 54), containing 217 nucleotides of alternative exon T2 and 119 nucleotides of T2, a total of 336 nucleotides. Seven positive clones (named S1 to S7) were isolated from the secondary screen. The insert of each clone was sequenced with vector-specific primers and analysed by BLAST search; the highest homologies are listed in Table 30. An alignment of the probe with the full mRNA of each positive clone was attempted using BLAST 2 sequences. However, none of the sequences showed any significant similarity to the probe, indicating that all were false positives. Subsequently, another 24 clones and serial dilutions of the library were tested by PCR amplification of the probe using primers altT2-F46-66 and T2-R99-119 (see Figure 53); however, no product could be amplified.

Table 30: Result of the cDNA library screen

Clone(s): highest homology of the insert [accession number]
S1/S2: Acyl-CoA-desaturase/stearyl-CoA-desaturase mRNA [AF389338]
S3: mRNA for KIAA0513 protein [AB011085]
S4/S5: Oxysterol binding protein-like 2 (OSBPL2) [NM_144498]
S6: Hypothetical protein FLJ31434 mRNA (moderately similar to Rattus norvegicus endo-alpha-D-mannosidase) [NM_152496.1]
S7: Clone RP11-356F4 from chromosome 1 [AL591604]

The cDNA library screen failed to give any interpretable results. There are several possible reasons. Firstly, the hybridisation conditions may not have been stringent enough, leading to the identification of false positives. Secondly, at 336 nucleotides the probe was rather short, and a longer probe might be more successful. Thirdly, the alternative exon and/or exon T2 may not be expressed in foetal brain, or may not occur in the same transcript. If only one of the exons is expressed, the effective probe length would be shortened dramatically, potentially leading to the identification of false positive clones. In addition, the probe could not be amplified from a further 24 colonies or from library dilutions, which calls into question whether the library contains any clones with both exons.

After the cDNA library screen had been performed, transcripts were identified in mouse brain that contained alternative exon T2 but not exon T2. This suggests that the two exons may not occur in the same transcripts in brain, which may also explain the identification of false positives in the library screen and why the probe could not be amplified from a library aliquot.

5’ and 3’ RACE

Rapid amplification of cDNA ends (RACE) is used to determine the full length of the transcript(s) of a gene when only partial sequence is known. One of the putative HMSNR mutations is located in a rare exon, named alternative T2, which is part of HK1 according to EST sequences. Two questions were addressed using RACE: firstly, is alternative exon T2 transcribed in human peripheral nerve? And secondly, if there is a transcript in nerve, is it part of HK1 or part of a new gene? Although there was some evidence that alternative exon T2 is in fact part of HK1, as it is contained in four ESTs that also include other HK1 exons, the possibility remained that the putative mutation is part of a new gene. 5’ and 3’ RACE were performed on total human peripheral nerve RNA by a co-worker using the GeneRacer™ kit (Invitrogen), and the resulting RACE fragments were cloned and sequenced as part of this PhD project. The gene-specific primers were designed in alternative exon T2, oriented in the 5’ and 3’ directions with respect to HK1. The 5’ RACE extended the alternative exon by 398 bp into the genomic sequence in the 5’ direction, while the 3’ RACE product started in alternative exon T2, continued into T2 using the splice sites known from the four EST sequences, and extended 419 bp into the genomic sequence after T2 (Figure 56). The two RACE sequences were combined with alternative exon T2 and compared by BLAST with the published sequence to check for sequence variation and/or putative sequence errors. Four differences from the published sequence were detected. One was a known SNP (rs5030949), while the other three were not known and therefore have to be treated with caution. Six putative open reading frames of 201, 177, 168, 126, 111 and 108 nucleotides, or 67, 59, 56, 42, 37 and 36 amino acids, respectively, were predicted with ORF Finder (NCBI).
Furthermore, a translated BLAST search of the complete sequence against the protein database (NCBI) was performed to identify any similarities to known proteins. The highest sequence identities were found for the extended part of exon T2: around 60 to 75 % identity to Alu sequences of the subfamilies SQ, SP, J and SX, but also to the unnamed proteins BAC86363 and BAC86300, whose mRNAs map to chromosomes 8 and 19, respectively. Interestingly, both sequences were derived from clones in libraries generated from human testis.
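The ORF prediction step scans the RACE-extended sequence for stretches that begin with a start codon and end in a stop codon. A simplified Python sketch of that idea, not the ORF Finder implementation: it assumes ATG as the only start codon, uses a hypothetical minimum of 30 amino acids, and scans the plus strand only, so the minus strand has to be searched separately via the reverse complement. Conventions also differ on whether the stop codon is counted in the nucleotide length.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_aa: int = 30) -> list[tuple[int, int, int, int]]:
    """Find ATG-initiated ORFs in the three forward reading frames.

    Returns (frame, start, end, n_aa) tuples, longest first. Coordinates are
    0-based and end-exclusive; `end` includes the stop codon, while `n_aa`
    counts amino acids excluding the stop. ORFs without a stop codon before
    the end of the sequence are ignored.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop in this frame
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if n_aa >= min_aa:
                    orfs.append((frame, start, i + 3, n_aa))
                start = None
    return sorted(orfs, key=lambda orf: orf[3], reverse=True)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.upper()[::-1].translate(str.maketrans("ACGTN", "TGCAN"))
```

Running `find_orfs(reverse_complement(seq))` in addition to `find_orfs(seq)` covers both strands, which matters if a new gene on the opposite strand is being considered.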

[Figure 56 schematic. Upper part: 5’ region of HK1 with alternative exon T2, showing exons T1 (104 bp), alt T2b (90 bp), alt T2 (262 bp), alt T2c (112 bp) and T2 (122/176 bp). Lower part: 5’ RACE and 3’ RACE products from peripheral nerve (660 bp and 595 bp).]

Figure 56: Result of RACE from the alternative exon in 5’ and 3’ direction. The upper part of the figure shows the known testis-specific exons (black boxes) in the 5’ region of HK1. Exon names are indicated above and sizes below each box. The lower part of the figure shows the result of 5’ and 3’ RACE with gene-specific primers in alternative exon T2 of HK1. The 5’ RACE yielded an extension of the alternative exon in the 5’ direction. The 3’ RACE resulted in a spliced product containing the 3’ end of alternative exon T2 and exon T2, which was extended into the following intron.

The RACE experiment yielded an unexpected result, which needs to be replicated to rule out an artefact. The 5’ RACE result could be due to genomic contamination of the RNA, which could be avoided by treating the RNA with DNase before the experiment. The splicing of the 3’ RACE product into T2 provides some evidence that an mRNA containing alternative exon T2 may exist in peripheral nerve. Another issue is primer design. If HK1 is the gene in question, the 3’ and 5’ directions are clear; however, to detect a new gene overlapping HK1, both the plus and minus strands need to be considered, and the gene-specific primers need to accommodate this possibility.

5.3.1.3 Hexokinase activity in cultured Schwann cells

Hexokinase activity was tested in primary cultures of Schwann cells from normal and HMSNR affected individuals by collaborators in the Netherlands. However, no differences were detected between normal and HMSNR Schwann cells.

5.3.1.4 Immunohistochemistry

Immunohistochemical studies of the distribution of HK1 in human peripheral nerve (Figure 57) were performed by collaborators in the UK. Both normal and HMSNR-affected nerves were stained using an anti-HK1 antibody raised against a peptide containing parts of the porin-binding domain and the following exon. Myelinated axons show strong staining, while unmyelinated axons and Schwann cells are much less stained. Compared with normal nerve, HMSNR nerve shows stronger staining of Schwann cells and unmyelinated axons; however, these results need further confirmation. The specificity of the antibody for the different isoforms of HK1 is unknown. Although it can be assumed to detect the ubiquitous somatic form, it is unclear whether the antibody has any affinity for the isoforms lacking the porin-binding domain.


Figure 57: Nerve biopsies from (a) normal and (b) HMSNR nerve stained with an anti-HK1 antibody. HMSNR nerve shows slightly stronger staining for HK1 than normal nerve; however, this result requires further confirmation. Nerve biopsies were taken from the sural nerve at the ankle. Sections are 7 µm frozen sections. Pictures were taken with a ×40 objective. Pictures are courtesy of Dr Rosalind King.

5.3.2 Results of examining the possible involvement of a new gene in HMSNR

To assess the possibility that a new, so far unidentified gene overlaps one or both of the mutations, ESTs and predicted genes were examined in the UCSC Genome Browser (Figure 58). Two spliced ESTs were identified that run in the opposite direction to the two known genes FLJ22761 and HK1. EST AI203682 from testis partially overlaps an exon of FLJ22761, and its genomic span extends beyond the 3’ end of FLJ22761. EST AA738258, isolated from a germ cell tumour, shows homology to the TAR1 repeat and extends over the 3’ end of FLJ22761 and the 5’ end of HK1. It cannot be concluded that these ESTs represent real genes, since a proportion of EST sequences lodged in uncurated public databases have been shown to be spurious due to mislabelling, chimeric sequences and genomic contamination [306].

[Figure 58 schematic: (a) known genes FLJ22761 and HK1, with the G/C and G/A changes and exons alt T2 and alt T2c; (b) spliced ESTs AI203682 (437 bp) and AA738258 (273 bp); (c) GeneScan prediction.]

Figure 58: Location of spliced ESTs and predicted genes relative to the known genes in the genomic region surrounding the two putative mutations. Two ESTs that run in the opposite direction to the two known genes FLJ22761 and HK1 (a) were identified in the vicinity of the two putative HMSNR mutations (indicated as the G/C and G/A changes in (a)). The first EST, AI203682 (b), contains two exons, one of which overlaps an exon of FLJ22761, while the other is located in the intergenic space between HK1 and FLJ22761. The second EST, AA738258 (b), has two exons located in an intron of HK1 and extends into FLJ22761. Sizes of the EST sequences are given in brackets. Genescan (UCSC Genome Browser) predicted one new putative exon 312 bp upstream of the putative mutation in alternative exon T2. All other predicted exons were identical to known exons of FLJ22761 and HK1 (indicated by the same shading as the exons of FLJ22761 and HK1, respectively). Drawing is not to scale.

The UCSC Genome Browser also includes gene predictions. For the area shown in Figure 58, no new exons were predicted by Twinscan, Geneid or SGP. Genescan [307] predicts a gene spanning FLJ22761 and HK1, with exons identical to those annotated for these two genes, apart from one exon that starts 312 bp upstream of the putative HMSNR mutation in alternative exon T2 and ends at the intron/exon boundary of alternative exon T2 (Figure 58). Whether this exon is real needs to be examined experimentally.

5.3.3 Results of examining the possible involvement of FLJ31406 or FLJ22761 in HMSNR

The first and foremost task for future work is to complete the sequencing of the final critical HMSNR gene region of 63.8 kb. Until this has been accomplished, it cannot be excluded that there is another putative HMSNR mutation that segregates completely with the HMSNR phenotype. So far, 54 kb of the 63.8 kb have been analysed, including all exons of the known genes, all ESTs and all predicted exons. This implies that all “prime” locations for putative mutations have been examined, and any other putative HMSNR mutation would have to lie in an intron of one of the known genes or in the flanking regions between the genes. The parts that still require sequencing are scattered over the whole 63.8 kb. Therefore, another putative mutation could be located in FLJ31406, FLJ22761, HK1 or in a new gene yet to be identified. Results of the investigation of HK1 and of the possible existence of a new gene have been given in 5.3.1 and 5.3.2; the emphasis of this part is therefore on an overview of what is known about FLJ31406 and FLJ22761 from databases and comparative alignments.

5.3.3.1 FLJ31406

FLJ31406 (accession: AK055968) comprises 1933 bp in five exons, only one of which is located in the critical HMSNR region of 63.8 kb. The sequence was isolated from a clone library generated from differentiated NT2 cells, a neural progenitor cell line derived from a human teratocarcinoma. It is unclear whether the mRNA encodes a real protein, as no protein is annotated in the databases and the longest possible open reading frame comprises just 83 amino acids, encoded by nucleotides 1024 to 1275. The gene may be incomplete, which would prevent annotation of a protein; however, there is no evidence in the form of ESTs that would hint at additional exons. BLAST searches against the non-redundant and EST databases did not reveal any significant similarities, and the same was true for translated BLAST and for BLAST searches against the dog, mouse and rat genomes. Regions of 98.6 % identity were identified in the chimpanzee genome; however, no supporting ESTs could be found.
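The ORF figure for FLJ31406 can be checked with simple codon arithmetic. Assuming that positions 1024 to 1275 are 1-based, inclusive and include the stop codon, the span is 1275 - 1024 + 1 = 252 nt, i.e. 84 codons, of which 83 encode amino acids. The Python sketch below shows that arithmetic together with a naive longest-ORF scan of the three forward frames. The function names and the toy sequence are hypothetical, and real ORF finders (which also scan the reverse strand and may accept non-ATG starts) differ in detail.

```python
# Minimal sketch (hypothetical helpers): longest ATG-to-stop ORF on the forward
# strand. Coordinates are 1-based and inclusive; the stop codon is part of the
# span but not of the amino-acid count.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def protein_length(start_nt: int, end_nt: int) -> int:
    """Amino acids encoded by a 1-based inclusive span that ends with a stop codon."""
    span = end_nt - start_nt + 1
    if span % 3 != 0:
        raise ValueError("span is not a whole number of codons")
    return span // 3 - 1  # the last codon is the stop


def longest_orf(seq: str):
    """Return (start_nt, end_nt, aa_length) of the longest forward ORF, or None."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None  # 0-based index of the open ATG in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                candidate = (start + 1, i + 3, protein_length(start + 1, i + 3))
                if best is None or candidate[2] > best[2]:
                    best = candidate
                start = None
    return best


print(protein_length(1024, 1275))  # prints: 83
print(longest_orf("CCATGAAATTTTAGGG"))  # prints: (3, 14, 3)
```

Under the stated assumption, the 83-amino-acid figure is consistent with the quoted nucleotide coordinates.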

5.3.3.2 FLJ22761

The gene FLJ22761 (accession: NM_025130), a predicted hexokinase, contains 19 exons that are subject to alternative splicing in the 3’ region, which allows the use of three alternative polyadenylation signals. According to its UniGene entry (Hs.445459), the gene is expressed in bone, brain, colon, eye, kidney, liver, lung, muscle, pancreas, stomach, testis, vascular tissue and blood, at both embryonic and adult stages. Of the 86 ESTs in the cluster, three (BF940098, IMAGE clone 3439369; BF939903, IMAGE clone 3441179; AV724583.1, IMAGE clone HTBBXA12) are derived from brain, while none originates from the peripheral nervous system. The mouse homologue (accession: NM_145419) shows 86 % identity at the nucleotide level and 89 % identity at the protein level to human FLJ22761. Its UniGene cluster (Mm.213213) contains 11 ESTs, which originate from eye, placenta and brain. Human and mouse FLJ22761 share the same genomic arrangement: the 3’ end of FLJ22761 lies in close proximity to the 5’ start of hexokinase 1, although the two genes do not overlap directly. Predictions of conserved domains suggest that FLJ22761 is a hexokinase; however, there is no experimental evidence for its catalytic function. FLJ22761 closely resembles hexokinases in a number of features. Its exon sizes are identical to those of the other hexokinases (see 5.1.1.2 and Table 25), and one of the three isoforms generated by alternative polyadenylation has the same length of 917 amino acids as HK1, HK2 and HK3. The 917-aa form of FLJ22761 was aligned with the amino acid sequences of the somatic isoform of HK1 and of HK2. This revealed that FLJ22761 shows 70 % homology to HK1 and 67 % to HK2, while HK1 and HK2 show 73 % homology to each other. Interestingly, a comparison of the first 21 amino acids, which constitute the porin binding domain of HK1 and HK2, suggests that FLJ22761 might also have such a domain and thus could be soluble or mitochondria-bound via VDAC in the same way as HK1 and HK2 (Figure 59).

HK1       MIAAQLLAYYFTELKDDQVKK
HK2       MIASHLLAYFFTELNHDQVQK
FLJ22761  MFAVHLMAFYFSKLKEDQIKK
          *:* :*:*::*::*:.**::*

Figure 59: Comparison of the PDB (porin binding domain) of HK1 and HK2 with the putative PDB in FLJ22761. The first 21 amino acids of FLJ22761, HK1 and HK2 were aligned using ClustalW. Nine positions are identical in all three proteins, a further 10 show conserved amino acid substitutions and one shows a semi-conserved substitution. The colours designate amino acid properties: red = small and hydrophobic; blue = acidic; magenta = basic; green = hydroxyl or amine group, including Q. In the consensus line, “*” denotes a complete match, “:” a conserved amino acid substitution and “.” a semi-conserved amino acid substitution.
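The consensus line in Figure 59 follows the ClustalW convention: “*” for identical columns, “:” when all residues in a column fall within one “strong” residue group and “.” when they fall within one “weak” group. The Python sketch below recomputes it from the three 21-residue sequences. The residue groups are those commonly listed in the Clustal documentation and are reproduced here as an assumption, not taken from this thesis; the sketch also ignores gaps, which this alignment does not contain.

```python
# Sketch of ClustalW-style consensus symbols for an ungapped alignment.
# Residue groups are assumed to match the Clustal documentation.
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
        "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column) -> str:
    residues = set(column)
    if len(residues) == 1:
        return "*"  # identical in all sequences
    if any(residues <= set(group) for group in STRONG):
        return ":"  # conserved substitution
    if any(residues <= set(group) for group in WEAK):
        return "."  # semi-conserved substitution
    return " "


def consensus(seqs) -> str:
    return "".join(column_symbol(col) for col in zip(*seqs))


pbd = {
    "HK1": "MIAAQLLAYYFTELKDDQVKK",
    "HK2": "MIASHLLAYFFTELNHDQVQK",
    "FLJ22761": "MFAVHLMAFYFSKLKEDQIKK",
}
line = consensus(pbd.values())
print(line)
print(line.count("*"), line.count(":"), line.count("."))  # prints: 9 10 1
```

Under these groups, the sketch reproduces the figure's consensus line and the counts given in the caption.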


5.4 SUMMARY AND DISCUSSION

Functional analysis is the final criterion for proving that a mutation is disease-causing [223]. This involves showing a difference between the normal and the disease state that is caused by the mutation in question. The design of the functional analysis depends on the type and location of the mutation and on its expected effect on the gene and/or protein. The mutation screen yielded two putative HMSNR mutations: a G to C change in the untranslated alternative exon T2 of HK1, and a G to A change in the following intron, about 1.3 kb from the first. Neither change could be shown to be causative by genetic means, because the two appear to be in linkage disequilibrium. Because of its exonic position, the putative mutation in alternative exon T2 was considered more likely to be disease-causing than the one in the following intron. It was also considered that the real mutation may not yet have been detected, in which case the other two genes in the critical region, FLJ31406 and FLJ22761, might carry it. For the two putative mutations identified, it was assumed that the so-called testis-specific transcripts of HK1 are the most likely to be affected, provided that these transcripts can be detected in the nervous system, as HMSNR has no pathology outside the nervous system. If this cannot be established, a general effect on HK1, probably on the common transcript, is most likely. More remote possibilities were that a new, so far unidentified gene overlapping the HK1 locus could be affected by one of the mutations, or that a distant gene is affected. To learn more about both putative mutations, computational methods and transcriptional analysis were employed.
The computational analysis focussed on determining whether the two putative mutations lie at positions that are conserved in the genomes of species other than human. In addition, splice-site predictions for the mutation in the untranslated exon and a prediction of the mRNA secondary structure were attempted, putative translation start sites and upstream ORFs were examined, and the sequence of the alternative exon was analysed with the program UTRscan. For the putative mutation in alternative exon T2, the common nucleotide at the mutation site was conserved in mouse, chimpanzee and dog, whereas the mutation in the following intron appeared to be conserved in dog but not in mouse, and the chimpanzee genome was incomplete at this position. No conservation was found in other genomes, which may reflect a true lack of similarity or sequence gaps. Examination of putative translation start sites showed that there is no in-frame start codon upstream of the one in exon T3 that would allow continuous translation into the annotated ORF. However, translation of the sequence around the mutation site in alternative exon T2 revealed that the G to C change could potentially create an upstream open reading frame by abolishing a stop codon. Furthermore, UTRscan analysis of alternative exon T2 identified similarity to IRES sequences in the sequence surrounding the putative mutation. Splice-site prediction indicated no change on the forward strand for either mutation, and none on the complementary strand for the intronic mutation. For the putative HMSNR mutation in alternative exon T2, creation of a new splice site on the complementary strand was predicted; this would, however, only be relevant if a transcript, and thus a new gene, overlapping HK1 on the complementary strand could be identified. Moreover, the prediction programmes did not assess branch sites, so it cannot be excluded that the putative mutation in the intron following alternative exon T2 weakens or strengthens a branch-site sequence. ESE prediction for the mutation in alternative exon T2 did not identify any ESE motif at the mutation site. Nevertheless, the possibility that either putative HMSNR mutation lies in an exonic or intronic splicing enhancer or silencer cannot be discounted until it has been excluded experimentally.
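How a single G to C substitution can abolish a stop codon and lengthen an upstream ORF can be illustrated with a toy example. The sketch below is hypothetical: the 21-nt sequence is invented and is not the sequence of HK1 alternative exon T2, and the helper names are illustrative. It counts codons from an ATG to the first in-frame stop before and after changing the G of a TAG stop to C (TAG to TAC).

```python
# Toy illustration (invented sequence, hypothetical helpers): a G>C change in a
# TAG stop codon lets the upstream ORF run on to the next in-frame stop.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_to_stop(seq: str, start: int):
    """Codons from 0-based `start` up to and including the first in-frame stop; None if none."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (i - start) // 3 + 1
    return None


def substitute(seq: str, pos: int, base: str) -> str:
    """Return `seq` with the base at 0-based position `pos` replaced by `base`."""
    return seq[:pos] + base + seq[pos + 1:]


# Hypothetical sequence: ATG AAA TAG CCC GGG TTT TGA
reference = "ATGAAATAGCCCGGGTTTTGA"
variant = substitute(reference, 8, "C")  # TAG (stop) becomes TAC (Tyr)

print(codons_to_stop(reference, 0), codons_to_stop(variant, 0))  # prints: 3 7
```

Whether such a lengthened uORF affects translation of the downstream HK1 ORF is a separate, experimental question that the toy example does not address.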
Given the complexity of alternative splicing in the 5’ region of HK1, seen in both mouse and human, it seems conceivable that a mutation in a regulatory splice element disrupts the equilibrium of alternative transcripts in a tissue-specific manner, thereby causing a specific disease pathology such as HMSNR. Cartegni et al. have discussed an interesting observation: codon usage in coding exons is much less diverse, and silent changes are in fact less common, than one would expect if such changes were neutral because they leave the protein sequence unaffected. Negative selection against silent mutations might reflect the requirements of splice regulation (reviewed in [287]). Such negative selection could also operate on non-coding exons. The surprisingly high conservation of the sequence surrounding the putative HMSNR mutation in alternative exon T2 in mouse and dog, while other 5’ exons in mouse are markedly different and in dog cannot be identified (at least by BLAST searches), could mean that auxiliary splicing elements are required for correct production of the different transcripts at the right time and in the right cell type. Prediction of the changes in mRNA secondary structure caused by the exonic mutation, the G to C change in alternative exon T2, showed that the predicted structure may depend on the length of the mRNA; predictions should therefore only be made once the composition of the mRNA has been established. Owing to the time frame of this PhD, the transcriptional analysis could only focus on transcripts containing alternative exon T2. The aim was to establish whether this exon is expressed in the nervous system. To this end, several RT-PCR assays, with one primer in alternative exon T2 and the other in a downstream or upstream exon of HK1, were established in testis, where human EST data show the exon to be expressed. However, no product was obtained from human nervous system tissues, including brain, peripheral nerve and Schwann cells. The mouse equivalent of alternative exon T2 had not been described before, but BLAST searches identified a region of high homology to the human exon, located between mouse exons T1 and T2 and therefore likely to be the mouse counterpart of alternative exon T2. RT-PCR indeed detected transcripts containing this exon in testis and also in brain. Surprisingly, the mouse alternative exon T2 was larger than the human one and showed alternative splicing not seen in human. Eight new transcripts involving alternative splicing of alternative exon T2 and the following exon T2 were identified, two of which were also found in brain. In Northern blot experiments, however, transcripts containing alternative exon T2 were detected only in mouse and human testis, and not in tissues of the nervous system.
A screen of a foetal brain cDNA library detected only false positives. The sensitivity of both the Northern blots and the cDNA library screen may have been limited by the small size of the probe, which for reasons of specificity contained only parts of alternative exon T2 and of exon T2. Moreover, exon T2 is itself subject to alternative splicing, as was later determined by RT-PCR in mouse, so the probe design was not ideal. This is especially true for the foetal brain library screen since, at least in mouse, brain transcripts that contain alternative exon T2 do not seem to contain exon T2. In addition, RACE experiments using human peripheral nerve were performed, partly by a co-worker and partly within the scope of this PhD. These yielded a transcript containing alternative exon T2 and an exon T2 extending into the genomic DNA. This result was difficult to interpret, and it was concluded that the experiment would need to be repeated to confirm or reject the transcript; further experiments are necessary to that end. Nevertheless, the exon appears to be transcribed in mouse brain. Overall, it can be concluded that transcripts containing the alternative exon are rare, as they seem difficult to detect by hybridisation-based methods and RT-PCR required 40 cycles to detect them even in testis. This is supported by the fact that PCR amplifications bridging alternative exon T2 yielded only amplicons lacking the exon. It also needs to be considered whether the mouse is a good model for human alternative exon T2, as both mouse and human transcripts from the 5’ region of HK1 contain exons without an equivalent in the other species, and the mouse shows a degree of alternative splicing of alternative exon T2 that is not seen in human. Immunohistochemistry using an antibody against HK1 on nerve biopsies from normal and HMSNR-affected individuals suggested that HK1 might be slightly elevated in the HMSNR biopsy; however, these results require confirmation to exclude artefacts. Biochemical analysis of HK activity in cultured normal and HMSNR Schwann cells showed no difference between the two samples. This might be due to a masking effect of the other hexokinase isoenzymes. Another point to consider is the cultivation of Schwann cells in media containing forskolin. In rat thyroid FRTL-5 cells, forskolin has been shown to increase HK1 mRNA levels by raising intracellular cAMP [308]. This might also be the case in Schwann cells and would complicate interpretation of the results.
In addition, it is not known whether the primary defect in HMSNR lies in the Schwann cell or in the neuron; this experiment may therefore have examined the wrong cell type, which could explain why no difference was detected between normal and mutant cells. Conversely, if the HMSNR pathology at least partly involves Schwann cells, the result could mean that the catalytic activity of HK1 is not affected, implying that, if HK1 is the HMSNR gene, another of its functions must be affected. Database searches and predictions were used to assess whether a new gene could overlap the two mutation sites and thus be affected by them. No strong evidence was obtained for the presence of such a gene. It would probably have to overlap both FLJ22761 and HK1, which would create an overlapping arrangement of four genes, as FLJ31406 and FLJ22761 already partially overlap one another. Until very recently, overlapping gene arrangements were thought to be rare in eukaryotic nuclear genomes, although they are frequent in prokaryotic genomes and in organelles such as mitochondria (reviewed in [306]). The NCBI human genome assembly build 33 contained 34,604 annotated genes, among which 774 pairs of overlapping protein-coding genes were identified, but an overlap of four genes was found only once [306], which makes this possibility unlikely for the HMSNR region. Further analysis focussed on gaining information about FLJ31406 and FLJ22761, which lie in the centromeric half of the refined critical HMSNR region of 63.8 kb. Little is known about FLJ31406, apart from the facts that it appears not to be translated, that it was isolated from a teratocarcinoma-derived library of neuronal precursor cells, and that it shows no similarity to any other known expressed sequence from human or other species. Its function remains open to speculation. Recently, gene regulation in eukaryotes by non-coding antisense RNAs acting in cis or trans has received great attention (reviewed in [309]). FLJ31406 could be a candidate for such a regulatory mechanism, as it appears to be non-coding and is partially located in the introns of FLJ22761. FLJ22761 has been predicted to be a hexokinase. It is expressed in the CNS, but there are no data on expression in the PNS. A comparison of the amino terminus of FLJ22761 with the amino termini of HK1 (somatic isoform) and HK2 showed high similarity, suggesting that FLJ22761 may possess a porin binding domain. It therefore seems possible that FLJ22761 is a functional hexokinase. This implies that all work investigating the expression and respective functions of the four hexokinase isoenzymes should also consider FLJ22761, which might be a fifth player in the game.
One could also speculate that two genes which potentially fulfil a similar function and are located immediately adjacent to each other might be subject to the same regulatory forces.

Chapter 6: General Discussion and Conclusion

6 GENERAL DISCUSSION AND CONCLUSION

Charcot-Marie-Tooth diseases (CMTs) or hereditary motor and sensory neuropathies (HMSN) are, with a prevalence of up to 1 in 2500 [44], a frequent cause of peripheral nerve pathology. The heterogeneous group of CMTs is commonly characterised by a pronounced motor deficit, associated with wasting and atrophy of the distal limb muscles, and by marked sensory loss. Deformities of the feet or hands are frequently observed (reviewed in [310]). Clinical severity varies between forms of CMT from mild impairment to severe disability and, in rare cases, death. One of the first classifications of CMT was presented by Dyck and Lambert in 1968, who distinguished a type characterised by segmental demyelination with a prominent reduction in nerve conduction velocities from a second type showing neuronal degeneration, minimal segmental demyelination and near-normal nerve conduction velocities [42, 57]. Harding and Thomas substantiated this work by designating the two types as CMT type I and type II, each of which could be inherited in an autosomal dominant or recessive manner. Additionally, they set a cut-off of 38 m/s for nerve conduction velocity to separate the two types [58]. Since the identification of the first CMT genes, PMP22 in 1992 [70] and Cx32 [110] and P0 in 1993 [71], the number of known loci has grown to over 20. Most of these were identified in the past four years, benefiting from the advances of the human genome project, and the number is still growing, which suggests enormous genetic heterogeneity in CMT. This thesis presents work on the genetics of Hereditary Motor and Sensory Neuropathy-Russe (HMSNR), a rare form of recessive CMT that occurs only in European Gypsies. HMSNR is the third recessive neuropathy to be identified in the European Gypsy population.
The first, Hereditary Motor and Sensory Neuropathy-Lom (HMSNL), is caused by mutation of the N-myc downstream regulated gene 1 (NDRG1), which introduces a premature stop codon and truncates the protein [7]. The second is Congenital Cataracts Facial Dysmorphism Neuropathy syndrome (CCFDN), which was recently shown to be due to a mutation in the gene encoding the carboxy-terminal domain phosphatase of RNA polymerase II on chromosome 18qter [109].

HMSNR patients exhibit typical symptoms of CMT disease without any associated features. Onset is between 7 and 16 years of age, and the disease progresses steadily to severe disability. Motor nerve conduction velocities are moderately reduced in the upper limbs (~32 m/s) and undetectable in the lower limbs. Peculiar to HMSNR is an increased threshold for electrical nerve stimulation, the origin of which is unknown [6, 152]. Neuropathology is by far the most distinctive feature of HMSNR. As is typical of all CMTs, there is a pronounced loss of large myelinated fibres, indicating axonal degeneration that is either primary or secondary to segmental demyelination. The distinctive feature, however, is an abundance of regenerative clusters consisting of small, thinly myelinated axons surrounded by the parent basal lamina, while there is no evidence of active axonal degeneration or atrophy [6]. Moreover, no onion bulbs were detected in sural nerve biopsies from HMSNR patients [6]. Onion bulbs normally develop through successive cycles of segmental demyelination and Schwann cell proliferation with remyelination [9]. Evidence of demyelination/remyelination in the form of onion bulbs was found in all demyelinating recessive CMTs (CMT4A to F [75, 79, 87, 90, 96, 102, 104]), whereas it was absent in recessive CMT2 (ARCMT2A and B, ARCMT, GAN and severe infantile neuropathy [137, 144, 145, 148, 149]). Interestingly, patients with mutations in GDAP1 (the CMT4A locus) that are associated with an axonal pathology occasionally show onion bulb formation [311], indicating either that CMT4A is a mixed form of CMT or that onion bulb formation can be part of the phenotypic spectrum of axonal CMT.
It seems that HMSNR differs from the recessive demyelinating forms of CMT in that there is no evidence of the repeated demyelination and remyelination typical of CMT4, while active axonal degeneration, the hallmark of autosomal recessive CMT2, is also absent. Designation of HMSNR as a demyelinating or an axonal CMT is therefore not straightforward. Furthermore, it is unclear whether the primary problem in HMSNR lies in the axon or in the Schwann cell, a fact that has to be taken into account when examining candidate genes. HMSNR was mapped to chromosome 10q23 in branches of a large Bulgarian Gypsy kindred in which HMSNL and HMSNR segregated independently, and the first refinement of the critical HMSNR gene region excluded the excellent positional candidate gene EGR2 [5]. Subsequently, additional families from Romania and Spain, in which a recessive neuropathy similar to HMSNR segregated, were included in the genetic analysis. Even though linkage to chromosome 10q23 could not be established in these families, a founder effect in the Gypsy population was strongly suspected because of the chromosome 10 haplotypes detected in these individuals, which resembled the HMSNR haplotypes while showing different historical recombinations centromeric and telomeric of the conserved part common to all haplotypes. It was therefore believed that all these individuals, although residing in different countries, share a common distant ancestor. This assumption was based both on Gypsy population history, in which founder effects are well documented, and on previous successes in gene mapping using shared haplotypes and exploiting accumulated historical recombinations to refine the critical gene region [3, 4, 151]. For HMSNR, inclusion of the additional families reduced the critical region to the interval between markers bA86K9CA1 and D10S1742, which was estimated at about 1 Mb at the start of this PhD in 2001 [6, 152]. This PhD project aimed to identify the causative HMSNR mutation using a positional cloning strategy. This necessitated the construction of a high-density integrated physical and genetic map of the HMSNR region on chromosome 10q, which was used for refined mapping of the HMSNR gene. Haplotype analysis enabled the identification of recombinant haplotypes, the mapping of recombination breakpoints and the definition of the minimum region of homozygosity shared by all HMSNR haplotypes, which provides the location of the HMSNR mutation. The final step was the identification of the two putative HMSNR mutations.

6.1 REFINED MAPPING OF THE HMSNR GENE REGION

The refined mapping of the HMSNR region (Chapter 3) reduced the critical interval from ~1 Mb to just 63.8 kb. The strategy for this part of the project was based on the assumption that the HMSNR mutation is a founder mutation in the Gypsy population. This implied that all recombinations that accumulated on the disease haplotype over the history of the population could be used for the refined mapping, which permits the inclusion of small families in which linkage to the disease cannot be demonstrated or is inconclusive [199]. The size of the refined interval, which is the part inherited identical by descent (IBD) on all disease haplotypes, depends on the age of the mutation: the older the mutation, the smaller the expected region shared by all haplotypes [196].
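The inverse relationship between mutation age and the size of the shared region can be made concrete with a standard back-of-the-envelope model, stated here as an assumption rather than as the method of [196]. If every disease haplotype descends independently from the founder over g generations (a star genealogy) and crossovers follow a Poisson process without interference, the distance from the mutation to the first crossover on each side is exponential with rate g per Morgan. The expected shared segment of a single haplotype is then 2/g Morgans, and the segment shared by all n haplotypes, bounded by the closest breakpoint on each side, has expected length 2/(ng) Morgans. The Python sketch below compares this formula with a small simulation; the function names are hypothetical.

```python
import random


def expected_shared_length(generations: int, n_haplotypes: int = 1) -> float:
    """Expected length (Morgans) of the segment around the mutation shared by all haplotypes.

    Assumes a star genealogy and Poisson crossovers (no interference): on each
    side the nearest breakpoint across n haplotypes is exponential with rate
    n * generations.
    """
    return 2.0 / (n_haplotypes * generations)


def simulate_shared_length(generations: int, n_haplotypes: int = 1,
                           trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same quantity under the same assumptions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Distance to the first crossover on each side for every haplotype;
        # the region shared by all of them ends at the closest one.
        left = min(rng.expovariate(generations) for _ in range(n_haplotypes))
        right = min(rng.expovariate(generations) for _ in range(n_haplotypes))
        total += left + right
    return total / trials


for g in (20, 100, 400):
    print(g, "generations:", round(100 * expected_shared_length(g), 2),
          "cM expected for one haplotype")
```

Converting Morgans to physical distance requires the local recombination rate. The human genome-wide average is often approximated as roughly 1 cM per Mb, but local rates vary considerably, so the sketch should not be used to date the HMSNR mutation without such data.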

Comprehensive analysis of all HMSNR haplotypes, incorporating a large number of newly identified informative variants, enabled meticulous mapping of the recombination breakpoints. A total of 17 different HMSNR haplotypes, containing 11 different centromeric and six different telomeric recombinations, were detected in just 25 affected individuals. This result suggests that the HMSNR mutation is likely to be old. The diversity of recombinations made it possible to reduce the critical HMSNR region to 63.8 kb, bounded by SNP #171 on the centromeric side (located between exon 2 of FLJ22761 and exon 1 of FLJ31406) and SNP #156 on the telomeric side (located between alternative exon T4 and exon T4 of HK1). The definition of the centromeric boundary depended critically on ruling out allele “5” at marker D10S1647 as a microsatellite mutation in both Romanian Gypsy families, which reduced the critical region by ~548 kb and prompted further investigation of individual K3, who was suspected to have HMSNR. Identification of a recent recombination on the maternal haplotype of K3 reduced the region by another 45 kb. On the telomeric side, the Bulgarian and the Romanian Gypsy families showed different recombination breakpoints within 0.7 kb of each other, supported by a number of SNPs, with SNP #156 being the SNP located furthest into the critical region. At 63.8 kb, the final critical HMSNR region is even smaller than the regions identified for HMSNL (200 kb) [7] or CCFDN (155 kb) [109], the other two neuropathies mapped in the Gypsies. The result also compares well with mapping efforts in another founder population, the Finns, in which the loci for EPM and PRO-SL were refined to 176 kb and 150 kb, respectively (reviewed in [2]). Gene mapping in founder populations offers a number of advantages.
Favourable conditions for disease gene identification are provided by the reduced allelic and locus heterogeneity that results from the founder effect and subsequent genetic isolation. Diseases that are rare in other populations may increase in prevalence (reviewed in [2]). Genetic homogeneity allows the application of shared-haplotype or linkage disequilibrium mapping to localise disease genes. Refined mapping of a disease locus is particularly rewarding in founder populations, as all affected individuals are likely to have inherited the disease haplotype from a distant ancestor, so that recombination events that took place over the history of the population can be used to restrict the disease locus (reviewed in [2]). Mapping of the HMSNR gene exemplifies these advantages. HMSNR is a genetically homogeneous disorder; all affected individuals share a small part of the ancestral haplotype containing the disease-causing mutation. Refined mapping could exploit a total of 17 different recombinations occurring in just 25 individuals.
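The logic of bounding the critical region by recombination breakpoints can be sketched as follows. At each marker, a disease haplotype either carries the ancestral allele or not; the region that must contain the mutation is the block of markers at which every haplotype is ancestral, bounded by the nearest non-ancestral markers on either side, since each breakpoint may lie anywhere between the last shared and the first non-shared marker. The Python sketch below is a hypothetical simplification with invented marker positions. It treats every non-ancestral allele as a recombination, whereas in practice, as with allele “5” at D10S1647, a discordant microsatellite allele first has to be distinguished from a microsatellite mutation.

```python
from typing import Optional, Sequence, Tuple


def critical_interval(positions_kb: Sequence[float],
                      haplotypes: Sequence[Sequence[bool]]
                      ) -> Optional[Tuple[Optional[float], Optional[float]]]:
    """Flanking bounds (kb) of the largest block of markers shared by every haplotype.

    positions_kb: sorted marker positions.
    haplotypes: one boolean list per disease haplotype; True means the allele
        matches the ancestral disease haplotype at that marker.
    Returns (left, right); None marks an open end. Returns None if no marker
    is shared by all haplotypes.
    """
    shared = [all(h[i] for h in haplotypes) for i in range(len(positions_kb))]
    best: Optional[Tuple[int, int]] = None
    i = 0
    while i < len(shared):
        if not shared[i]:
            i += 1
            continue
        j = i
        while j + 1 < len(shared) and shared[j + 1]:
            j += 1
        span = positions_kb[j] - positions_kb[i]
        if best is None or span > positions_kb[best[1]] - positions_kb[best[0]]:
            best = (i, j)
        i = j + 1
    if best is None:
        return None
    start, end = best
    left = positions_kb[start - 1] if start > 0 else None
    right = positions_kb[end + 1] if end + 1 < len(positions_kb) else None
    return left, right


# Invented example: three haplotypes with breakpoints on either side.
positions = [0, 100, 200, 300, 400, 500, 600]
haplotypes = [
    [False, True, True, True, True, True, False],
    [True, False, True, True, True, True, True],
    [True, True, True, True, True, False, True],
]
print(critical_interval(positions, haplotypes))  # prints: (100, 500)
```

In this invented example, the second haplotype sets the centromeric bound and the third sets the telomeric bound, which mirrors how individual recombinant haplotypes defined the HMSNR boundaries.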

6.2 MUTATION ANALYSIS

The mutation analysis (Chapter 4) identified two putative HMSNR mutations in the gene hexokinase 1 (HK1), which could not be excluded by applying the exclusion criteria set out at the beginning of Chapter 4. The putative mutation in alternative exon T2 was identified first, when all ESTs in the critical region of ~64 kb were searched for mutations. Screening of a large Bulgarian Gypsy sample and of more than 100 chromosomes from Bulgarian non-Gypsies did not exclude this putative mutation. However, expression studies of the transcript were problematic and fell short of demonstrating expression of alternative exon T2 in a tissue of the PNS. To support the claim that this change is the real HMSNR mutation, a co-worker started sequencing the remaining intronic and intergenic regions. The reasoning was that, if this effort failed to identify another putative mutation, the one in alternative exon T2 would have to be the real HMSNR mutation. Moreover, sequencing the whole critical region would also comply with the criteria set out by Cotton and Scriver 1998 [223] and Cotton and Horaitis 2000 [224], who draw attention to the fact that the true mutation may be missed if sequencing stops when the first mutation is identified. However, this sequencing identified a second putative mutation, which underwent the same screening procedure in the Bulgarian Gypsy and non-Gypsy samples. The two putative mutations were always detected together and appear to be in linkage disequilibrium. The fact that both are located in HK1 suggests that this gene may be involved in the pathology of HMSNR. In the final stages of HMSNR gene identification, the lack of allelic heterogeneity in the Gypsies, which had been useful for the refined mapping, proved to be a disadvantage.
As Botstein and Risch remark, the best confirmation of the involvement of a given gene in a pathology is the identification of several mutations in the gene that cause similar phenotypes [186]. HMSNR, however, is a rare disorder with only 30 known affected individuals, all of Gypsy background and all homozygous for the same two putative HMSNR mutations. To identify another mutation in the suspected HMSNR gene hexokinase 1, which would confirm the involvement of HK1 in HMSNR, a Spanish non-Gypsy patient with a complex syndrome that included a neuropathy resembling HMSNR in its neuropathology was screened for mutations in the gene. So far, no additional mutation has been identified. Given the rarity of HMSNR, support for HK1 as the HMSNR gene could not be expected to come from the identification of other mutations causing a similar pathology. It remains possible, however, that other mutations in HK1 cause neuropathies that do not resemble HMSNR. Further screening of HK1 in patients with neuropathies of unknown cause may therefore identify mutations that support the role of HK1 as a CMT gene. In the past decade, gene mapping in the Gypsies has identified a number of private founder mutations, some of which cause novel disorders, such as HMSNL and CCFDN, while others cause known Mendelian diseases, for example polycystic kidney disease or galactokinase deficiency (reviewed in [3]). So far, a total of 11 private founder mutations have been described in the Gypsies ([312, 313] and reviewed in [3]), and more are likely to be found. A problem nevertheless remains in identifying mutations for rare disorders in the Gypsies: while refined mapping benefits from accumulated historical recombinations, unusual mutations will be hard to prove. As shown in Chapter 4, the first putative mutation (laboratory identification number #144) is located in alternative exon T2 of HK1, while the second (laboratory identification number #213) lies in the intron following alternative exon T2, 1.314 kb downstream of the first putative mutation and 117 bp upstream of alternative exon T2c of HK1. Of the final critical region of 63.8 kb, 54 kb were searched for putative HMSNR mutations, including all expressed sequences and their exon-intron boundaries, which are the prime positional candidates for putative mutations.
While it cannot be excluded that the remaining 9.8 kb harbour further putative mutations fulfilling the same criteria, the two identified changes are good candidates for being causative of HMSNR, and further efforts therefore concentrated on them. As both are located in HK1, this gene was considered the most likely HMSNR gene. Because the first change is exonic, lying in the 5’ UTR, whereas the second is intronic, the exonic mutation (laboratory identification number #144) was assumed to be more likely to be disease-causing, as it has greater potential to disrupt the normal expression of an HK1 isoform.

6.3 HK1 AS THE POSSIBLE HMSNR GENE

Mutations affecting the coding sequence of HK1 have been shown to cause nonspherocytic haemolytic anaemia (NSHA) [236-238]. If HK1 is proven to be the HMSNR gene, this would indicate phenotypic heterogeneity at the HK1 locus, affecting two very different tissues. Phenotypic heterogeneity has also been noted for a number of CMT loci, although it usually involves other neuromuscular diseases. The lamin A/C gene, which is mutated in autosomal recessive CMT2A (AR-CMT2A), has also been implicated in Emery-Dreifuss muscular dystrophy, dilated cardiomyopathy type 1A, limb girdle muscular dystrophy 1B and autosomal dominant partial lipodystrophy (reviewed in [138]). Another example of phenotypic heterogeneity is CMT2D, where mutations in the GARS gene result in CMT2D and distal HMN-V [127]. Similarly, mutations affecting HSP27 can cause CMT2F and distal HMN-II [132]. In light of these examples, the involvement of a CMT gene in NSHA seems somewhat out of the ordinary. However, one example demonstrates that CMT genes can also be involved in pathologies that are not neuromuscular. Hereditary mutations in LITAF/SIMPLE have been shown to cause CMT1C [73], while acquired somatic mutations of LITAF/SIMPLE, which were not identical to the CMT mutations, have recently been detected in cancers of patients with extramammary Paget’s disease (EMPD) [314]. These examples indicate that specific mutations may cause distinct pathologies in certain tissues whilst other tissues remain unaffected. They also suggest that phenotypic heterogeneity might be common amongst CMT genes, making it plausible that different mutations in HK1 cause either NSHA or HMSNR. When HK1 is compared with the other genes implicated in CMT, it seems at first glance to be “the odd one out”, because it catalyses the first step of glycolysis and is therefore essential to the energy metabolism of the cell.
Although the past four years have been extremely fruitful for CMT gene discovery and the known CMT genes are now enormously diverse, none of them is involved in glycolysis or energy metabolism as such. CMT genes encode structural components of myelin, transcription factors, phosphatases, neurofilaments, GTPases, microtubule motors, heat shock proteins, parts of ion channels and even aminoacyl-tRNA synthetases.

Two themes emerge when the functions of CMT genes are compared: the first is transport, both vesicular and axonal, and the second is apoptosis.

6.3.1 The transport theme in CMT

A functional neurofilament (NF) network is essential for proper axonal and vesicular transport. Mutations in NF-light (NF-L) are the cause of CMT2E [128]. The mutant protein is thought to disrupt neurofilament assembly, thereby decreasing axon diameter and slowing axonal transport (reviewed in [315]). Consequently, disturbance of neurofilament assembly by other proteins is also likely to disrupt axonal transport. This is likely the case for gigaxonin, as GAN mutations are characterised by accumulation of neurofilaments [146-148]. Further, mutations in KIF1Bβ, a microtubule motor, have been implicated in abnormalities of axonal transport in CMT2A [115]. MTMR2, which is mutated in CMT4B1 [86], interacts with NF-L in peripheral nerve [315]. Studies of yeast MTMR proteins show that they are involved in vesicular trafficking [316], and examination of the function of the myotubularins MTM-6 and MTM-9 in C. elegans demonstrated that these proteins regulate the Arf6 endocytic pathway [317], which, interestingly, also seems to be affected by PMP22 mutants that cause CMT1A [318]. It has therefore been suggested that CMT4B, which is due to mutations in two MTMR proteins, and CMT1A may share similarities in disease mechanism [317]. Another example of a possible impairment of intracellular trafficking as the basis of CMT is RAB7, which is involved in transport between the late endosome and the lysosome; mutations in the RAB7 gene result in CMT2B [121]. In summary, disruption of cellular trafficking in the axon seems to be a common cause of CMT, with neurofilaments appearing to play a central role.

6.3.2 The apoptosis theme in CMT

Apart from affecting cellular trafficking, PMP22 mutants have also been shown to influence cell death [318]. Sancho and colleagues investigated proliferation and apoptosis of Schwann cells in PMP22 mutant mice and normal controls. They found that apoptosis and proliferation were comparable at early stages, but that abnormalities appeared from week 10: whereas normal mice then showed little evidence of apoptosis, mutant mice exhibited continued proliferation and apoptosis [319].

CMT1C is due to mutations in LITAF/SIMPLE (also called PIG7) [73]. In a colorectal cancer cell line, LITAF expression was detected early after expression of tumour protein p53 (TP53), and it was therefore suggested that LITAF might be directly regulated by p53 in apoptosis [320]. Furthermore, LITAF and related proteins seem to generate reactive oxygen species (ROS), which can induce apoptosis (reviewed in [314]). Like LITAF, Bax and the BH3 interacting domain death agonist (Bid), NDRG1, which is mutated in HMSNL, has been shown to be induced early in p53-mediated apoptosis in experiments using human colorectal adenocarcinoma cell lines. It has been demonstrated that p53 binds directly to elements in the NDRG1 promoter, and silencing NDRG1 abolished p53-mediated apoptosis, suggesting that NDRG1 is required for this process [321]. Research into the function of the transcription factor EGR2, mutations in which cause CMT1D and CMT4E [75], has shown that EGR2 is fundamental for the transition of non-myelinating Schwann cells to myelinating ones. Before the onset of myelination, proliferation and apoptosis of Schwann cells decline, and this has now been shown to be influenced by EGR2 in several ways. EGR2 expression reduces the influence of neuregulin 1 (NRG1) on Schwann cells, signals cell cycle exit and inactivates TGFβ, thus preventing apoptosis. Furthermore, it suppresses JNK/c-Jun signalling, consequently controlling cell division and cell death [322]. Recently, HSP27 was shown to be mutated in CMT2F. It was demonstrated in cell culture that the mutant protein reduces the survival of neuronal cells; assembly of NF-L also seemed to be affected [132]. HSP27 acts as a cytoprotective and anti-apoptotic protein, and its up-regulation is important for the survival of injured motor and sensory neurons (reviewed in [132]).
The anti-apoptotic effect of HSP27 is mediated by direct interaction with cytochrome c released from the mitochondria, which impedes caspase activation (reviewed in [323]). MFN2, which has recently been identified as the gene mutated in the majority of CMT2A cases, is a pro-apoptotic protein. It is located at the outer mitochondrial membrane, where it is involved in regulating the fusion of mitochondria [116]. MFN2 was implicated in cell death because it co-localises with Bax during apoptosis in HeLa and COS-7 cells and in myocytes [324]. For many of the proteins named here, studies of apoptosis were performed outside the nervous system, so it is unclear what effect the CMT mutations may have on apoptosis in the PNS. However, given the importance of apoptosis in the nervous system, it seems plausible that further investigation might reveal disturbed apoptotic regulation as a common theme of CMT. During the development of the nervous system, neurons are generated in excess and are set to die unless they receive vital support from neurotrophins/growth factors (reviewed in [325-327]). Pro-apoptotic Bax is essential for apoptosis in the developing neuron, as seen in mice lacking Bax, whose neurons stay alive indefinitely in the absence of growth factors (reviewed in [326]). In contrast, adult neurons are preset to survive without any neurotrophic support (reviewed in [325-327]), and their apoptosis is mediated via extrinsic death-receptor or intrinsic mitochondrial pathways, which are regulated by a tightly controlled, complex network (reviewed in [326]). The fundamental role of HK1 in energy metabolism describes its function only partially. In recent years it has been repeatedly demonstrated that the hexokinase isoenzymes HK1 and HK2 act to prevent apoptosis by counteracting the pro-apoptotic influence of Bax [261, 262]. Moreover, glucokinase has been found to be part of a protein complex containing the pro-apoptotic protein BAD, which resides at liver mitochondria; at low glucose levels BAD is dephosphorylated, which results in apoptosis [265]. It appears that glucose metabolism and apoptosis are linked via the action of hexokinases [266]. This suggests that HK1 might not actually be “the odd one out”, and that HMSNR could be due to a defect in apoptosis caused by a mutation in HK1. The involvement of HK1 in mitochondrial apoptosis makes this quite possible, especially given the interaction of HK1 with Bax, which has been clearly established as a major pro-apoptotic player in the nervous system.

6.3.3 Possible mutational mechanisms

Chapter 5 of this thesis presented the data from the computational and experimental analysis of HK1. Analysis of the conservation of the two putative mutation sites and their surrounding sequence in other species revealed higher conservation for alternative exon T2, adding weight to the view that the putative mutation in this exon affects an important nucleotide and is therefore more likely than the intronic change to be disease-causing. Furthermore, the effect of the putative mutation in alternative exon T2 on translational regulation was assessed by searching the 5’ exons of the human HK1 mRNA for upstream start codons, such as canonical AUGs and non-canonical CUGs, and for upstream open reading frames (ORFs). Upstream start codons, especially CUG codons, were abundant, suggesting that a complex regulatory mechanism might ensure that the ribosome translates the known testis-specific ORF, which starts in exon T3. It was also found that the putative mutation in alternative exon T2 abolishes the stop codon of an upstream ORF that starts with a non-canonical CUG. If this upstream ORF is in fact used to regulate translation of HK1, the putative mutation could disrupt this regulatory mechanism. To investigate further possible modes of translational regulation in HK1, alternative exon T2 was examined with UTRscan [179], a program that compares the input sequence with user-submitted data describing experimentally confirmed sequence motifs. This predicted an internal ribosome entry site (IRES) in the area of the putative mutation, indicating that cap-independent translation might be possible for HK1. Together, these data suggest that a translational regulation mechanism may operate at the 5’ UTR of HK1.
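The search for upstream start codons and upstream ORFs described above can be illustrated with a minimal Python sketch. It lists every ORF that starts with AUG or CUG upstream of a given main start codon and follows it in frame to its first stop codon. The function and class names are hypothetical, the sequences are made-up toy strings rather than HK1 sequence, and this is not the procedure actually used in chapter 5; a real analysis would also weigh factors such as start-codon context.

```python
from dataclasses import dataclass
from typing import List, Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}


@dataclass
class UpstreamORF:
    start: int           # 0-based position of the first base of the start codon
    start_codon: str     # e.g. "AUG" (canonical) or "CUG" (non-canonical)
    stop: Optional[int]  # 0-based position of the first in-frame stop, or None


def find_upstream_orfs(mrna: str, main_start: int,
                       start_codons=("AUG", "CUG")) -> List[UpstreamORF]:
    """Find ORFs that initiate upstream of the main ORF start.

    `mrna` may be given as DNA (T) or RNA (U); `main_start` is the 0-based
    position of the annotated start codon of the main ORF.
    """
    seq = mrna.upper().replace("T", "U")
    orfs = []
    # Only start codons beginning upstream of the main start are considered.
    for i in range(min(main_start, len(seq) - 2)):
        codon = seq[i:i + 3]
        if codon not in start_codons:
            continue
        stop = None
        # Walk in frame until the first stop codon (if any) is reached.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                stop = j
                break
        orfs.append(UpstreamORF(i, codon, stop))
    return orfs


if __name__ == "__main__":
    wild_type = "CUGAAAUAGCCAUGCCC"  # toy: CUG-initiated uORF ends at UAG; main AUG at 11
    mutant = "CUGAAACAGCCAUGCCC"     # toy: U->C substitution destroys the UAG stop
    print(find_upstream_orfs(wild_type, main_start=11))
    # [UpstreamORF(start=0, start_codon='CUG', stop=6)]
    print(find_upstream_orfs(mutant, main_start=11))
    # [UpstreamORF(start=0, start_codon='CUG', stop=None)]
```

In the toy mutant, the CUG-initiated uORF no longer terminates before the main AUG and instead runs past it in a different frame, which illustrates one way in which loss of a uORF stop codon could interfere with initiation at the main ORF.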
Moreover, it has been demonstrated in numerous examples that pro- and anti-apoptotic proteins can be translated by cap-independent mechanisms, which allow translation during apoptosis when cap-dependent translation is inhibited (reviewed in [292]). Given its involvement in apoptosis, it therefore seems likely that HK1 may be translated by cap-independent mechanisms, such as IRES-mediated initiation, in addition to the conventional cap-dependent mechanism. This in turn implies that HMSNR could be caused by a disruption of cap-independent translation through the destruction of an IRES. Both regulatory mechanisms, the use of upstream ORFs and the use of an IRES, may be valid for the regulation of translation in HK1 and may even be linked, as an IRES can enable the ribosome to bypass upstream ORFs and thereby evade their effect (reviewed in [290]). Further experimental work is required to establish whether one or both mechanisms are used and to investigate the influence of the putative mutation in alternative exon T2 on translation. Transcriptional analysis of alternative exon T2 was carried out in human and mouse. Using human templates, combinations of known exons containing alternative exon T2 could be amplified only from testis, not from brain or peripheral nerve. However, this does not necessarily mean that these transcripts are absent from the nervous system, as the human ESTs containing alternative exon T2 show that it is expressed at least in the optic nerve. The transcripts targeted for amplification may therefore be rare and highly specific in their localisation and/or timing of expression. Support for the existence of transcripts containing alternative exon T2 in the nervous system came from RT-PCR using RNA from mouse tissues. Prior to the experiments performed within this thesis, alternative exon T2 was not known in mouse.

The data presented in chapter 5 not only demonstrate the existence of alternative exon T2 in mouse, but also identify eight new transcripts in mouse testis, two of which were also detected in mouse brain. This significantly extends the limited knowledge of alternative splicing in the 5’ region of mouse Hk1 published by Mori et al. [241]. Moreover, by demonstrating the occurrence of specific transcripts in the nervous system, these data pave the way for implicating the mutation in alternative exon T2 in the pathology of HMSNR. The overall complexity of alternative splicing of the 5’ exons of hexokinase 1, seen in database searches for human HK1 and in the experimental evidence for mouse Hk1, indicates that alternative splicing in this region is likely to be of considerable importance. So far, limited predictions of changes in splice sites and splicing enhancers caused by the two putative mutations have not provided any evidence that this is the mechanism underlying HMSNR, although more sophisticated predictions combined with experimental data may change this. If either variant did disrupt alternative splicing, several consequences leading to disease can be envisaged. For example, the availability of sites for cap-independent translation or the usage of certain upstream ORFs may be regulated by alternative splicing. Alternative splicing could also serve to localise HK1 mRNAs. Although HK1 mRNAs have not been shown to be localised to specific places in the cell, it is known that mRNA can be transported along the axon and that the sequence composition of the 5’ and 3’ UTRs may be important for this process [294, 295]. Changes in alternative splicing, or even the putative mutation alone within a properly spliced transcript, might therefore abolish correct localisation of the mRNA and thus lead to disease.
As problems in transport are one common cause of CMT, the localisation of HK1 mRNA in the nerve should be addressed in future experiments. As stated in chapter 5, overall HK activity was indistinguishable between primary cultures of HMSNR and normal Schwann cells. This does not exclude a defect in glycolysis as the cause of HMSNR, for several reasons: culturing may change HK activity; the activity of other HK isoenzymes may mask a change in HK1 activity; and, last but not least, it is unknown whether the primary defect in HMSNR lies in the Schwann cell or in the axon. A change in HK activity therefore cannot be discounted as the problem underlying HMSNR until neurons from HMSNR patients have been examined as well.

Importantly, it has been suggested, on the basis of experimental evidence from the CNS, that Schwann cells convert glucose to pyruvate, which in turn is converted to lactate and supplied to the neuron by a shuttle mechanism; in the neuron, lactate is converted back into pyruvate, which fuels the tricarboxylic acid (TCA) cycle (reviewed in [328]). These findings warrant a detailed examination of glycolysis in the HMSNR nerve, if possible in an in vivo system such as an animal model.

6.4 FUTURE DIRECTIONS

Future work should first establish that all putative mutations in the final 63.8 kb region of homozygosity have been identified, which requires sequencing of all parts not yet analysed. Any new putative mutation must be subjected to the same exclusion criteria that were applied to the other putative mutations. If further mutations are identified outside HK1, the respective gene (FLJ22761 or FLJ31406) will need to be included in the investigation. To gain additional evidence for the existence of HK1 transcripts containing alternative exon T2, and to determine the variety of transcripts in the nervous system, experiments at the RNA level, such as 5’ RACE or RT-PCR, need to be carried out. Mouse and human nerve tissue can be used, but the dog may be worth considering as another model, for two reasons. Firstly, conservation between human and dog is much higher than between human and mouse, especially in the region of alternative exon T2, so splicing of the 5’ region of HK1 may be more similar to the human situation. Secondly, dogs have larger sciatic nerves and dorsal root ganglia (DRG), so large numbers of animals would not need to be sacrificed, as would be the case for mice, where RNA yields from sciatic nerve and DRG are extremely low. A knock-in animal model carrying the putative mutation in alternative exon T2 could also be generated. If this model developed a neuropathy, this would prove that the change is the HMSNR mutation; the model could also be used for further experiments, overcoming the limited availability of affected tissue and at the same time enabling in vivo studies. With an HMSNR animal model in place, it would be possible to examine PNS glycolysis, apoptosis and intracellular transport in vivo.
In conclusion, confirmation of one of the two putative mutations as the HMSNR mutation awaits the completion of sequencing of the HMSNR region and experimental evidence of a deleterious effect of the mutation on the gene. Key data supporting the role of HK1 as the HMSNR gene, namely the presence in the PNS of HK1 transcripts containing alternative exon T2, have yet to be obtained; if such transcripts exist, they are likely to be found in the neuron. Once these data are obtained, research may proceed to investigate the effects of mutant HK1 on glycolysis and apoptosis in neurons, the two pathways most likely to be involved in the pathogenesis of HMSNR. Meanwhile, elucidating the mechanism of HMSNR pathology is likely to contribute valuable knowledge about basic PNS functions such as apoptosis and energy metabolism, and to increase our understanding of PNS biology.


REFERENCES

1. Arcos-Burgos, M., and M. Muenke. 2002. Genetics of population isolates. Clin Genet 61:233-47.
2. Peltonen, L. 2000. Positional cloning of disease genes: advantages of genetic isolates. Hum Hered 50:66-75.
3. Kalaydjieva, L., D. Gresham, and F. Calafell. 2001. Genetic studies of the Roma (Gypsies): a review. BMC Med Genet 2:5.
4. Gresham, D., B. Morar, P. A. Underhill, G. Passarino, A. A. Lin, C. Wise, D. Angelicheva, F. Calafell, P. J. Oefner, P. Shen, I. Tournev, R. de Pablo, V. Kucinskas, A. Perez-Lezaun, E. Marushiakova, V. Popov, and L. Kalaydjieva. 2001. Origins and divergence of the Roma (gypsies). Am J Hum Genet 69:1314-31.
5. Rogers, T., D. Chandler, D. Angelicheva, P. K. Thomas, B. Youl, I. Tournev, V. Gergelcheva, and L. Kalaydjieva. 2000. A novel locus for autosomal recessive peripheral neuropathy in the EGR2 region on 10q23. Am J Hum Genet 67:664-71.
6. Thomas, P. K., L. Kalaydjieva, B. Youl, A. Rogers, D. Angelicheva, R. H. King, V. Guergueltcheva, J. Colomer, C. Lupu, A. Corches, G. Popa, L. Merlini, A. Shmarov, J. R. Muddle, M. Nourallah, and I. Tournev. 2001. Hereditary motor and sensory neuropathy-russe: new autosomal recessive neuropathy in Balkan Gypsies. Ann Neurol 50:452-7.
7. Kalaydjieva, L., D. Gresham, R. Gooding, L. Heather, F. Baas, R. de Jonge, K. Blechschmidt, D. Angelicheva, D. Chandler, P. Worsley, A. Rosenthal, R. H. King, and P. K. Thomas. 2000. N-myc downstream-regulated gene 1 is mutated in hereditary motor and sensory neuropathy-Lom. Am J Hum Genet 67:47-58.
8. Tortora, G. J. 1988. Introduction to the human body.
9. Richardson, E. P., and U. De Girolami. 1995. Pathology of the peripheral nerve.
10. Alberts, B., D. Bray, J. Lewis, M. Raff, K. Roberts, and J. D. Watson. 1994. Molecular biology of the cell, 3rd edition.
11. Morell, P. 1984. Myelin. 2nd edition. Plenum Press:1-50.
12. Garbay, B., A. M. Heape, F. Sargueil, and C. Cassagne. 2000. Myelin synthesis in the peripheral nervous system. Prog Neurobiol 61:267-304.
13. Bunge, R. P., M. B. Bunge, and M. Bates. 1989. Movements of the Schwann cell nucleus implicate progression of the inner (axon-related) Schwann cell process during myelination. J Cell Biol 109:273-84.
14. Dyck, P. J., P. K. Thomas, E. H. Lambert, and R. Bunge. 1984. Peripheral neuropathy, 2nd edition.
15. Alvarez, J., A. Giuditta, and E. Koenig. 2000. Protein synthesis in axons and terminals: significance for maintenance, plasticity and regulation of phenotype. With a critique of slow transport theory. Prog Neurobiol 62:1-62.
16. Hofmann, E. 1996. Medizinische Biochemie systematisch.
17. Peirano, R. I., D. E. Goerich, D. Riethmacher, and M. Wegner. 2000. Protein zero gene expression is regulated by the glial transcription factor Sox10. Mol Cell Biol 20:3198-209.
18. Menichella, D. M., E. J. Arroyo, R. Awatramani, T. Xu, P. Baron, J. M. Vallat, J. Balsamo, J. Lilien, G. Scarlato, J. Kamholz, S. S. Scherer, and M. E. Shy. 2001. Protein zero is necessary for E-cadherin-mediated adherens junction formation in Schwann cells. Mol Cell Neurosci 18:606-18.
19. Shames, I., A. Fraser, J. Colby, W. Orfali, and G. J. Snipes. 2003. Phenotypic differences between peripheral myelin protein-22 (PMP22) and myelin protein zero (P0) mutations associated with Charcot-Marie-Tooth-related diseases. J Neuropathol Exp Neurol 62:751-64.
20. Previtali, S. C., A. Quattrini, M. Fasolini, M. C. Panzeri, A. Villa, M. T. Filbin, W. Li, S. Y. Chiu, A. Messing, L. Wrabetz, and M. L. Feltri. 2000. Epitope-tagged P(0) glycoprotein causes Charcot-Marie-Tooth-like neuropathy in transgenic mice. J Cell Biol 151:1035-46.
21. Wise, C. A., C. A. Garcia, S. N. Davis, Z. Heju, L. Pentao, P. I. Patel, and J. R. Lupski. 1993. Molecular analyses of unrelated Charcot-Marie-Tooth (CMT) disease patients suggest a high frequency of the CMTIA duplication. Am J Hum Genet 53:853-63.
22. Brancolini, C., P. Edomi, S. Marzinotto, and C. Schneider. 2000. Exposure at the cell surface is required for gas3/PMP22 To regulate both cell death and cell spreading: implication for the Charcot-Marie-Tooth type 1A and Dejerine-Sottas diseases. Mol Biol Cell 11:2901-14.
23. Hanemann, C. O., D. D'Urso, A. A. Gabreels-Festen, and H. W. Muller. 2000. Mutation-dependent alteration in cellular distribution of peripheral myelin protein 22 in nerve biopsies from Charcot-Marie-Tooth type 1A. Brain 123:1001-6.
24. Fruttiger, M., D. Montag, M. Schachner, and R. Martini. 1995. Crucial role for the myelin-associated glycoprotein in the maintenance of axon-myelin integrity. Eur J Neurosci 7:511-5.
25. Sherman, D. L., C. Fabrizi, C. S. Gillespie, and P. J. Brophy. 2001. Specific disruption of a schwann cell dystrophin-related protein complex in a demyelinating neuropathy. Neuron 30:677-87.
26. Boerkoel, C. F., H. Takashima, P. Stankiewicz, C. A. Garcia, S. M. Leber, L. Rhee-Morris, and J. R. Lupski. 2001. Periaxin mutations cause recessive Dejerine-Sottas neuropathy. Am J Hum Genet 68:325-33.
27. Young, P., O. Boussadia, P. Berger, D. P. Leone, P. Charnay, R. Kemler, and U. Suter. 2002. E-cadherin is required for the correct formation of autotypic adherens junctions of the outer mesaxon but not for the integrity of myelinated fibers of peripheral nerves. Mol Cell Neurosci 21:341-51.
28. Givogri, M. I., E. R. Bongarzone, and A. T. Campagnoni. 2000. New insights on the biology of myelin basic protein gene: the neural-immune connection. J Neurosci Res 59:153-9.
29. Smith-Slatas, C., and E. Barbarese. 2000. Myelin basic protein gene dosage effects in the PNS. Mol Cell Neurosci 15:343-54.
30. Narayanan, V., B. Ripepi, E. W. Jabs, A. Hawkins, C. Griffin, and G. Tennekoon. 1994. Partial structure and mapping of the human myelin P2 protein gene. J Neurochem 63:2010-3.
31. Gravel, M., J. Peterson, V. W. Yong, V. Kottis, B. Trapp, and P. E. Braun. 1996. Overexpression of 2',3'-cyclic nucleotide 3'-phosphodiesterase in transgenic mice alters oligodendrocyte development and produces aberrant myelination. Mol Cell Neurosci 7:453-66.
32. Yool, D. A., J. M. Edgar, P. Montague, and S. Malcolm. 2000. The proteolipid protein gene and myelin disorders in man and animal models. Hum Mol Genet 9:987-92.
33. VanSlyke, J. K., S. M. Deschenes, and L. S. Musil. 2000. Intracellular transport, assembly, and degradation of wild-type and disease-linked mutant gap junction proteins. Mol Biol Cell 11:1933-46.
34. Mirsky, R., K. R. Jessen, A. Brennan, D. Parkinson, Z. Dong, C. Meier, E. Parmantier, and D. Lawson. 2002. Schwann cells as regulators of nerve development. J Physiol Paris 96:17-24.
35. Jessen, K. R., and R. Mirsky. 2002. Signals that determine Schwann cell identity. J Anat 200:367-76.
36. Wegner, M. 2000. Transcriptional control in myelinating glia: the basic recipe. Glia 29:118-23.
37. Verrijzer, C. P., and P. C. Van der Vliet. 1993. POU domain transcription factors. Biochim Biophys Acta 1173:1-21.
38. Arroyo, E. J., J. R. Bermingham, Jr., M. G. Rosenfeld, and S. S. Scherer. 1998. Promyelinating Schwann cells express Tst-1/SCIP/Oct-6. J Neurosci 18:7891-902.
39. Topilko, P., S. Schneider-Maunoury, G. Levi, A. Baron-Van Evercooren, A. B. Chennoufi, T. Seitanidou, C. Babinet, and P. Charnay. 1994. Krox-20 controls myelination in the peripheral nervous system. Nature 371:796-9.
40. Nagarajan, R., J. Svaren, N. Le, T. Araki, M. Watson, and J. Milbrandt. 2001. EGR2 mutations in inherited neuropathies dominant-negatively inhibit myelin gene expression. Neuron 30:355-68.
41. Musso, M., P. Balestra, E. Bellone, D. Cassandrini, E. Di Maria, L. L. Doria, M. Grandis, G. L. Mancardi, A. Schenone, G. Levi, F. Ajmar, and P. Mandich. 2001. The D355V mutation decreases EGR2 binding to an element within the Cx32 promoter. Neurobiol Dis 8:700-6.
42. Dyck, P. J., and E. H. Lambert. 1968. Lower motor and primary sensory neuron diseases with peroneal muscular atrophy. II. Neurologic, genetic, and electrophysiologic findings in various neuronal degenerations. Arch Neurol 18:619-25.
43. Combarros, O., J. Calleja, J. M. Polo, and J. Berciano. 1987. Prevalence of hereditary motor and sensory neuropathy in Cantabria. Acta Neurol Scand 75:9-12.
44. Skre, H. 1974. Genetic and clinical aspects of Charcot-Marie-Tooth's disease. Clin Genet 6:98-118.
45. Chaouch, M., Y. Allal, A. De Sandre-Giovannoli, J. M. Vallat, A. Amer-el-Khedoud, N. Kassouri, A. Chaouch, P. Sindou, T. Hammadouche, M. Tazir, N. Levy, and D. Grid. 2003. The phenotypic manifestations of autosomal recessive axonal Charcot-Marie-Tooth due to a mutation in Lamin A/C gene. Neuromuscul Disord 13:60-7.
46. Herndon, C. N. 1954. Three North Carolina surveys. Am J Hum Genet 6:65-7; discussion, 74-84.
47. Kurland, L. T. 1958. Descriptive epidemiology of selected neurologic and myopathic disorders with particular reference to a survey in Rochester, Minnesota. J Chronic Dis 8:378-418.
48. Brewis, M., D. C. Poskanzer, C. Rolland, and H. Miller. 1966. Neurological disease in an English city. Acta Neurol Scand 42:Suppl 24:1-89.
49. Gudmundsson, K. R. 1969. Prevalence and occurrence of some rare neurological diseases in Iceland. Acta Neurol Scand 45:114-8.
50. Chen, K. M., J. A. Brody, and L. T. Kurland. 1968. Patterns of neurologic diseases on guam. Arch Neurol 19:573-8.
51. Kondo, K., T. Tsubaki, and F. Sakamoto. 1970. The Ryukyuan muscular atrophy. An obscure heritable neuromuscular disease found in the islands of southern Japan. J Neurol Sci 11:359-82.
52. Davis, C. J., W. G. Bradley, and R. Madrid. 1978. The peroneal muscular atrophy syndrome: clinical, genetic, electrophysiological and nerve biopsy studies. I. Clinical, genetic and electrophysiological findings and classification. J Genet Hum 26:311-49.
53. Brooks, A. P., and A. E. Emery. 1982. A family study of Charcot-Marie-Tooth disease. J Med Genet 19:88-93.
54. Hagberg, B., and B. Westerberg. 1983. Hereditary motor and sensory neuropathies in Swedish children. I. Prevalence and distribution by disability groups. Acta Paediatr Scand 72:379-83.
55. Kurihara, S., Y. Adachi, K. Wada, E. Awaki, H. Harada, and K. Nakashima. 2002. An epidemiological genetic study of Charcot-Marie-Tooth disease in Western Japan. Neuroepidemiology 21:246-50.
56. Morocutti, C., G. B. Colazza, G. Soldati, C. D'Alessio, M. Damiano, C. Casali, and F. Pierelli. 2002. Charcot-Marie-Tooth disease in Molise, a central-southern region of Italy: an epidemiological study. Neuroepidemiology 21:241-5.
57. Dyck, P. J., and E. H. Lambert. 1968. Lower motor and primary sensory neuron diseases with peroneal muscular atrophy. I. Neurologic, genetic, and electrophysiologic findings in hereditary polyneuropathies. Arch Neurol 18:603-18.
58. Harding, A. E., and P. K. Thomas. 1980. The clinical features of hereditary motor and sensory neuropathy types I and II. Brain 103:259-80.
59. Reilly, M. M. 2000. Classification of the hereditary motor and sensory neuropathies. Curr Opin Neurol 13:561-4.
60. Harding, A. E., and P. K. Thomas. 1980. Autosomal recessive forms of hereditary motor and sensory neuropathy. J Neurol Neurosurg Psychiatry 43:669-78.
61. Lupski, J. R., R. M. de Oca-Luna, S. Slaugenhaupt, L. Pentao, V. Guzzetta, B. J. Trask, O. Saucedo-Cardenas, D. F. Barker, J. M. Killian, C. A. Garcia, and et al. 1991. DNA duplication associated with Charcot-Marie-Tooth disease type 1A. Cell 66:219-32.
62. Pentao, L., C. A. Wise, A. C. Chinault, P. I. Patel, and J. R. Lupski. 1992. Charcot-Marie-Tooth type 1A duplication appears to arise from recombination at repeat sequences flanking the 1.5 Mb monomer unit. Nat Genet 2:292-300.
63. Palau, F., A. Lofgren, P. De Jonghe, S. Bort, E. Nelis, T. Sevilla, J. J. Martin, J. Vilchez, F. Prieto, and C. Van Broeckhoven. 1993. Origin of the de novo duplication in Charcot-Marie-Tooth disease type 1A: unequal nonsister chromatid exchange during spermatogenesis. Hum Mol Genet 2:2031-5.
64. Silander, K., P. Meretoja, E. Nelis, V. Timmerman, C. Van Broeckhoven, P. Aula, and M. L. Savontaus. 1996. A de novo duplication in 17p11.2 and a novel mutation in the Po gene in two Dejerine-Sottas syndrome patients. Hum Mutat 8:304-10.
65. Thomas, P. K., W. Marques, Jr., M. B. Davis, M. G. Sweeney, R. H. King, J. L. Bradley, J. R. Muddle, J. Tyson, S. Malcolm, and A. E. Harding. 1997. The phenotypic manifestations of chromosome 17p11.2 duplication. Brain 120 (Pt 3):465-78.
66. Auer-Grumbach, M., S. Strasser-Fuchs, K. Wagner, E. Korner, and F. Fazekas. 1998. Roussy-Levy syndrome is a phenotypic variant of Charcot-Marie-Tooth syndrome IA associated with a duplication on chromosome 17p11.2. J Neurol Sci 154:72-5.
67. Valentijn, L. J., P. A. Bolhuis, I. Zorn, J. E. Hoogendijk, N. van den Bosch, G. W. Hensels, V. P. Stanton, Jr., D. E. Housman, K. H. Fischbeck, D. A. Ross, and et al. 1992. The peripheral myelin gene PMP-22/GAS-3 is duplicated in Charcot-Marie-Tooth disease type 1A. Nat Genet 1:166-70.
68. Lupski, J. R., C. A. Wise, A. Kuwano, L. Pentao, J. T. Parke, D. G. Glaze, D. H. Ledbetter, F. Greenberg, and P. I. Patel. 1992. Gene dosage is a mechanism for Charcot-Marie-Tooth disease type 1A. Nat Genet 1:29-33.
69. Chance, P. F., M. K. Alderson, K. A. Leppig, M. W. Lensch, N. Matsunami, B. Smith, P. D. Swanson, S. J. Odelberg, C. M. Disteche, and T. D. Bird. 1993. DNA deletion associated with hereditary neuropathy with liability to pressure palsies. Cell 72:143-51.
70. Valentijn, L. J., F. Baas, R. A. Wolterman, J. E. Hoogendijk, N. H. van den Bosch, I. Zorn, A. W. Gabreels-Festen, M. de Visser, and P. A. Bolhuis. 1992. Identical point mutations of PMP-22 in Trembler-J mouse and Charcot-Marie-Tooth disease type 1A. Nat Genet 2:288-91.
71. Hayasaka, K., M. Himoro, W. Sato, G. Takada, K. Uyemura, N. Shimizu, T. D. Bird, P. M. Conneally, and P. F. Chance. 1993. Charcot-Marie-Tooth neuropathy type 1B is associated with mutations of the myelin P0 gene. Nat Genet 5:31-4.
72. Street, V. A., J. D. Goldy, A. S. Golden, B. L. Tempel, T. D. Bird, and P. F. Chance. 2002. Mapping of Charcot-Marie-Tooth disease type 1C to chromosome 16p identifies a novel locus for demyelinating neuropathies. Am J Hum Genet 70:244-50.
73. Street, V. A., C. L. Bennett, J. D. Goldy, A. J. Shirk, K. A. Kleopa, B. L. Tempel, H. P. Lipe, S. S. Scherer, T. D. Bird, and P. F. Chance. 2003. Mutation of a putative protein degradation gene LITAF/SIMPLE in Charcot-Marie-Tooth disease 1C. Neurology 60:22-6.
74. Warner, L. E., M. J. Hilz, S. H. Appel, J. M. Killian, E. H. Kolodny, G. Karpati, S. Carpenter, G. V. Watters, C. Wheeler, D. Witt, A. Bodell, E. Nelis, C. Van Broeckhoven, and J. R. Lupski. 1996. Clinical phenotypes of different MPZ (P0) mutations may include Charcot-Marie-Tooth type 1B, Dejerine-Sottas, and congenital hypomyelination. Neuron 17:451-60.
75. Warner, L. E., P. Mancias, I. J. Butler, C. M. McDonald, L. Keppen, K. G. Koob, and J. R. Lupski. 1998. Mutations in the early growth response 2 (EGR2) gene are associated with hereditary myelinopathies. Nat Genet 18:382-4.
76. Hayasaka, K., M. Himoro, Y. Sawaishi, K. Nanao, T. Takahashi, G. Takada, G. A. Nicholson, R. A. Ouvrier, and N. Tachi. 1993. De novo mutation of the myelin P0 gene in Dejerine-Sottas disease (hereditary motor and sensory neuropathy type III). Nat Genet 5:266-8.
77. Parman, Y., V. Plante-Bordeneuve, A. Guiochon-Mantel, M. Eraksoy, and G. Said. 1999.
Recessive inheritance of a new point mutation of the PMP22 gene in Dejerine-Sottas disease. Ann Neurol 45:518-22. 78. Timmerman, V., P. De Jonghe, C. Ceuterick, E. De Vriendt, A. Lofgren, E. Nelis, L. E. Warner, J. R. Lupski, J. J. Martin, and C. Van Broeckhoven. 1999. Novel missense mutation in the early growth response 2 gene associated with Dejerine-Sottas syndrome phenotype. Neurology 52:1827-32. 79. Ben Othmane, K., F. Hentati, F. Lennon, C. Ben Hamida, S. Blel, A. D. Roses, M. A. Pericak-Vance, M. Ben Hamida, and J. M. Vance. 1993. Linkage of a locus (CMT4A) for autosomal recessive Charcot-Marie-Tooth disease to chromosome 8q. Hum Mol Genet 2:1625-8. 80. Baxter, R. V., K. Ben Othmane, J. M. Rochelle, J. E. Stajich, C. Hulette, S. Dew- Knight, F. Hentati, M. Ben Hamida, S. Bel, J. E. Stenger, J. R. Gilbert, M. A. Pericak-Vance, and J. M. Vance. 2002. Ganglioside-induced differentiation- associated protein-1 is mutant in Charcot-Marie-Tooth disease type 4A/8q21. Nat Genet 30:21-2. 81. Cuesta, A., L. Pedrola, T. Sevilla, J. Garcia-Planells, M. J. Chumillas, F. Mayordomo, E. LeGuern, I. Marin, J. J. Vilchez, and F. Palau. 2002. The gene encoding ganglioside-induced differentiation-associated protein 1 is mutated in axonal Charcot-Marie-Tooth type 4A disease. Nat Genet 30:22-5. 82. Nelis, E., S. Erdem, P. Y. Van Den Bergh, M. C. Belpaire-Dethiou, C. Ceuterick, V. Van Gerwen, A. Cuesta, L. Pedrola, F. Palau, A. A. Gabreels-Festen, C. Verellen, E. Tan, M. Demirci, C. Van Broeckhoven, P. De Jonghe, H. Topaloglu, and V. Timmerman. 2002. Mutations in GDAP1: autosomal recessive CMT with demyelination and axonopathy. Neurology 59:1865-72.

83. Senderek, J., C. Bergmann, V. T. Ramaekers, E. Nelis, G. Bernert, A. Makowski, S. Zuchner, P. De Jonghe, S. Rudnik-Schoneborn, K. Zerres, and J. M. Schroder. 2003. Mutations in the ganglioside-induced differentiation-associated protein-1 (GDAP1) gene in intermediate type autosomal recessive Charcot-Marie-Tooth neuropathy. Brain 126:642-9.
84. Bolino, A., V. Brancolini, F. Bono, A. Bruni, A. Gambardella, G. Romeo, A. Quattrone, and M. Devoto. 1996. Localization of a gene responsible for autosomal recessive demyelinating neuropathy with focally folded myelin sheaths to chromosome 11q23 by homozygosity mapping and haplotype sharing. Hum Mol Genet 5:1051-4.
85. Bolino, A., E. R. Levy, M. Muglia, F. L. Conforti, E. LeGuern, M. A. Salih, D. M. Georgiou, R. K. Christodoulou, I. Hausmanowa-Petrusewicz, P. Mandich, A. Gambardella, A. Quattrone, M. Devoto, and A. P. Monaco. 2000. Genetic refinement and physical mapping of the CMT4B gene on chromosome 11q22. Genomics 63:271-8.
86. Bolino, A., M. Muglia, F. L. Conforti, E. LeGuern, M. A. Salih, D. M. Georgiou, K. Christodoulou, I. Hausmanowa-Petrusewicz, P. Mandich, A. Schenone, A. Gambardella, F. Bono, A. Quattrone, M. Devoto, and A. P. Monaco. 2000. Charcot-Marie-Tooth type 4B is caused by mutations in the gene encoding myotubularin-related protein-2. Nat Genet 25:17-9.
87. Houlden, H., R. H. King, N. W. Wood, P. K. Thomas, and M. M. Reilly. 2001. Mutations in the 5' region of the myotubularin-related protein 2 (MTMR2) gene in autosomal recessive hereditary neuropathy with focally folded myelin. Brain 124:907-15.
88. Tyson, J., D. Ellis, U. Fairbrother, R. H. King, F. Muntoni, J. Jacobs, S. Malcolm, A. E. Harding, and P. K. Thomas. 1997. Hereditary demyelinating neuropathy of infancy. A genetically complex syndrome. Brain 120 (Pt 1):47-63.
89. Quattrone, A., A. Gambardella, F. Bono, U. Aguglia, A. Bolino, A. C. Bruni, M. P. Montesi, R. L. Oliveri, M. Sabatelli, O. Tamburrini, P. Valentino, C. Van Broeckhoven, and M. Zappia. 1996. Autosomal recessive hereditary motor and sensory neuropathy with focally folded myelin sheaths: clinical, electrophysiologic, and genetic aspects of a large family. Neurology 46:1318-24.
90. Othmane, K. B., E. Johnson, M. Menold, F. L. Graham, M. B. Hamida, O. Hasegawa, A. D. Rogala, A. Ohnishi, M. Pericak-Vance, F. Hentati, and J. M. Vance. 1999. Identification of a new locus for autosomal recessive Charcot-Marie-Tooth disease with focally folded myelin on chromosome 11p15. Genomics 62:344-9.
91. Senderek, J., C. Bergmann, S. Weber, U. P. Ketelsen, H. Schorle, S. Rudnik-Schoneborn, R. Buttner, E. Buchheim, and K. Zerres. 2003. Mutation of the SBF2 gene, encoding a novel member of the myotubularin family, in Charcot-Marie-Tooth neuropathy type 4B2/11p15. Hum Mol Genet 12:349-56.
92. Azzedine, H., A. Bolino, T. Taieb, N. Birouk, M. Di Duca, A. Bouhouche, S. Benamou, A. Mrabet, T. Hammadouche, T. Chkili, R. Gouider, R. Ravazzolo, A. Brice, J. Laporte, and E. LeGuern. 2003. Mutations in MTMR13, a new pseudophosphatase homologue of MTMR2 and Sbf1, in two families with an autosomal recessive demyelinating form of Charcot-Marie-Tooth disease associated with early-onset glaucoma. Am J Hum Genet 72:1141-53.
93. Gabreels-Festen, A. A., J. E. Hoogendijk, P. H. Meijerink, F. J. Gabreels, P. A. Bolhuis, S. van Beersum, T. Kulkens, E. Nelis, F. G. Jennekens, M. de Visser, B. G. van Engelen, C. Van Broeckhoven, and E. C. Mariman. 1996. Two divergent types of nerve pathology in patients with different P0 mutations in Charcot-Marie-Tooth disease. Neurology 47:761-5.
94. LeGuern, E., A. Guilbot, M. Kessali, N. Ravise, J. Tassin, T. Maisonobe, D. Grid, and A. Brice. 1996. Homozygosity mapping of an autosomal recessive form of demyelinating Charcot-Marie-Tooth disease to chromosome 5q23-q33. Hum Mol Genet 5:1685-8.
95. Guilbot, A., N. Ravise, A. Bouhouche, P. Coullin, N. Birouk, T. Maisonobe, T. Kuntzer, C. Vial, D. Grid, A. Brice, and E. LeGuern. 1999. Genetic, cytogenetic and physical refinement of the autosomal recessive CMT linked to 5q31-q33: exclusion of candidate genes including EGR1. Eur J Hum Genet 7:849-59.
96. Kessali, M., R. Zemmouri, A. Guilbot, T. Maisonobe, A. Brice, E. LeGuern, and D. Grid. 1997. A clinical, electrophysiologic, neuropathologic, and genetic study of two large Algerian families with an autosomal recessive demyelinating form of Charcot-Marie-Tooth disease. Neurology 48:867-73.
97. Senderek, J., C. Bergmann, C. Stendel, J. Kirfel, N. Verpoorten, P. De Jonghe, V. Timmerman, R. Chrast, M. H. G. Verheijen, G. Lemke, E. Battaloglu, Y. Parman, S. Erdem, E. Tan, H. Topaloglu, A. Hahn, W. Muller-Felber, N. Rizzuto, G. M. Fabrizi, M. Stuhrmann, S. Rudnik-Schoneborn, S. Zuchner, J. Michael Schroder, E. Buchheim, V. Straub, J. Klepper, K. Huehne, B. Rautenstrauss, R. Buttner, E. Nelis, and K. Zerres. 2003. Mutations in a Gene Encoding a Novel SH3/TPR Domain Protein Cause Autosomal Recessive Charcot-Marie-Tooth Type 4C Neuropathy. Am J Hum Genet 73:1106-19.
98. Gabreels-Festen, A., S. van Beersum, L. Eshuis, E. LeGuern, F. Gabreels, B. van Engelen, and E. Mariman. 1999. Study on the gene and phenotypic characterisation of autosomal recessive demyelinating motor and sensory neuropathy (Charcot-Marie-Tooth disease) with a gene locus on chromosome 5q23-q33. J Neurol Neurosurg Psychiatry 66:569-74.
99. Sikorski, R. S., M. S. Boguski, M. Goebl, and P. Hieter. 1990. A repeating amino acid motif in CDC23 defines a family of proteins and a new relationship among genes required for mitosis and RNA synthesis. Cell 60:307-17.
100. Kalaydjieva, L., J. Hallmayer, D. Chandler, A. Savov, A. Nikolova, D. Angelicheva, R. H. King, B. Ishpekova, K. Honeyman, F. Calafell, A. Shmarov, J. Petrova, I. Turnev, A. Hristova, M. Moskov, S. Stancheva, I. Petkova, A. H. Bittles, V. Georgieva, L. Middleton, and P. K. Thomas. 1996. Gene mapping in Gypsies identifies a novel demyelinating neuropathy on chromosome 8q24. Nat Genet 14:214-7.
101. Hunter, M., R. Bernard, E. Freitas, A. Boyer, B. Morar, I. J. Martins, I. Tournev, A. Jordanova, V. Guergelcheva, B. Ishpekova, I. Kremensky, G. Nicholson, B. Schlotter, H. Lochmuller, T. Voit, J. Colomer, P. K. Thomas, N. Levy, and L. Kalaydjieva. 2003. Mutation screening of the N-myc downstream-regulated gene 1 (NDRG1) in patients with Charcot-Marie-Tooth Disease. Hum Mutat 22:129-35.
102. Kalaydjieva, L., A. Nikolova, I. Turnev, J. Petrova, A. Hristova, B. Ishpekova, I. Petkova, A. Shmarov, S. Stancheva, L. Middleton, L. Merlini, A. Trogu, J. R. Muddle, R. H. King, and P. K. Thomas. 1998. Hereditary motor and sensory neuropathy--Lom, a novel demyelinating neuropathy associated with deafness in gypsies. Clinical, electrophysiological and nerve biopsy findings. Brain 121:399-408.
103. King, R. H., I. Tournev, J. Colomer, L. Merlini, L. Kalaydjieva, and P. K. Thomas. 1999. Ultrastructural changes in peripheral nerve in hereditary motor and sensory neuropathy-Lom. Neuropathol Appl Neurobiol 25:306-12.
104. Delague, V., C. Bareil, S. Tuffery, P. Bouvagnet, E. Chouery, S. Koussa, T. Maisonobe, J. Loiselet, A. Megarbane, and M. Claustres. 2000. Mapping of a new locus for autosomal recessive demyelinating Charcot-Marie-Tooth disease to 19q13.1-13.3 in a large consanguineous Lebanese family: exclusion of MAG as a candidate gene. Am J Hum Genet 67:236-43.
105. Guilbot, A., A. Williams, N. Ravise, C. Verny, A. Brice, D. L. Sherman, P. J. Brophy, E. LeGuern, V. Delague, C. Bareil, A. Megarbane, and M. Claustres. 2001. A mutation in periaxin is responsible for CMT4F, an autosomal recessive form of Charcot-Marie-Tooth disease. Hum Mol Genet 10:415-21.
106. Angelicheva, D., I. Turnev, D. Dye, D. Chandler, P. K. Thomas, and L. Kalaydjieva. 1999. Congenital cataracts facial dysmorphism neuropathy (CCFDN) syndrome: a novel developmental disorder in Gypsies maps to 18qter. Eur J Hum Genet 7:560-6.
107. Tournev, I., R. H. King, J. Workman, M. Nourallah, J. R. Muddle, L. Kalaydjieva, K. Romanski, and P. K. Thomas. 1999. Peripheral nerve abnormalities in the congenital cataracts facial dysmorphism neuropathy (CCFDN) syndrome. Acta Neuropathol (Berl) 98:165-70.
108. Tournev, I., L. Kalaydjieva, B. Youl, B. Ishpekova, V. Guergueltcheva, O. Kamenov, M. Katzarova, Z. Kamenov, M. Raicheva-Terzieva, R. H. King, K. Romanski, R. Petkov, A. Schmarov, G. Dimitrova, N. Popova, M. Uzunova, S. Milanov, J. Petrova, Y. Petkov, G. Kolarov, L. Aneva, O. Radeva, and P. K. Thomas. 1999. Congenital cataracts facial dysmorphism neuropathy syndrome, a novel complex genetic disease in Balkan Gypsies: clinical and electrophysiological observations. Ann Neurol 45:742-50.
109. Varon, R., R. Gooding, C. Steglich, L. Marns, H. Tang, D. Angelicheva, K. K. Yong, P. Ambrugger, A. Reinhold, B. Morar, F. Baas, M. Kwa, I. Tournev, V. Guerguelcheva, I. Kremensky, H. Lochmuller, A. Mullner-Eidenbock, L. Merlini, L. Neumann, J. Burger, M. Walter, K. Swoboda, P. K. Thomas, A. von Moers, N. Risch, and L. Kalaydjieva. 2003. Partial deficiency of the C-terminal-domain phosphatase of RNA polymerase II is associated with congenital cataracts facial dysmorphism neuropathy syndrome. Nat Genet 35:185-9.
110. Bergoffen, J., S. S. Scherer, S. Wang, M. O. Scott, L. J. Bone, D. L. Paul, K. Chen, M. W. Lensch, P. F. Chance, and K. H. Fischbeck. 1993. Connexin mutations in X-linked Charcot-Marie-Tooth disease. Science 262:2039-42.
111. Scherer, S. S., Y. T. Xu, E. Nelles, K. Fischbeck, K. Willecke, and L. J. Bone. 1998. Connexin32-null mice develop demyelinating peripheral neuropathy. Glia 24:8-20.
112. Nakagawa, M., H. Takashima, F. Umehara, K. Arimura, F. Miyashita, N. Takenouchi, W. Matsuyama, and M. Osame. 2001. Clinical phenotype in X-linked Charcot-Marie-Tooth disease with an entire deletion of the connexin 32 coding sequence. J Neurol Sci 185:31-7.
113. Ben Othmane, K., L. T. Middleton, L. J. Loprest, K. M. Wilkinson, F. Lennon, M. P. Rozear, J. M. Stajich, P. C. Gaskell, A. D. Roses, M. A. Pericak-Vance, et al. 1993. Localization of a gene (CMT2A) for autosomal dominant Charcot-Marie-Tooth disease type 2 to chromosome 1p and evidence of genetic heterogeneity. Genomics 17:370-5.
114. Saito, M., Y. Hayashi, T. Suzuki, H. Tanaka, I. Hozumi, and S. Tsuji. 1997. Linkage mapping of the gene for Charcot-Marie-Tooth disease type 2 to chromosome 1p (CMT2A) and the clinical features of CMT2A. Neurology 49:1630-5.
115. Zhao, C., J. Takita, Y. Tanaka, M. Setou, T. Nakagawa, S. Takeda, H. W. Yang, S. Terada, T. Nakata, Y. Takei, M. Saito, S. Tsuji, Y. Hayashi, and N. Hirokawa. 2001. Charcot-Marie-Tooth disease type 2A caused by mutation in a microtubule motor KIF1Bbeta. Cell 105:587-97.
116. Zuchner, S., I. V. Mersiyanova, M. Muglia, N. Bissar-Tadmouri, J. Rochelle, E. L. Dadali, M. Zappia, E. Nelis, A. Patitucci, J. Senderek, Y. Parman, O. Evgrafov, P. De Jonghe, Y. Takahashi, S. Tsuji, M. A. Pericak-Vance, A. Quattrone, E. Battaloglu, A. V. Polyakov, V. Timmerman, J. M. Schroder, and J. M. Vance. 2004. Mutations in the mitochondrial GTPase mitofusin 2 cause Charcot-Marie-Tooth neuropathy type 2A. Nat Genet 36:449-51.
117. Kwon, J. M., J. L. Elliott, W. C. Yee, J. Ivanovich, N. J. Scavarda, P. J. Moolsintong, and P. J. Goodfellow. 1995. Assignment of a second Charcot-Marie-Tooth type II locus to chromosome 3q. Am J Hum Genet 57:853-8.
118. Vance, J. M., M. C. Speer, J. M. Stajich, S. West, C. Wolpert, P. Gaskell, F. Lennon, R. M. Tim, M. Rozear, K. B. Othmane, et al. 1996. Misclassification and linkage of hereditary sensory and autonomic neuropathy type 1 as Charcot-Marie-Tooth disease, type 2B. Am J Hum Genet 59:258-62.
119. Bejaoui, K., C. Wu, M. D. Scheffler, G. Haan, P. Ashby, L. Wu, P. de Jong, and R. H. Brown, Jr. 2001. SPTLC1 is mutated in hereditary sensory neuropathy, type 1. Nat Genet 27:261-2.
120. Dawkins, J. L., D. J. Hulme, S. B. Brahmbhatt, M. Auer-Grumbach, and G. A. Nicholson. 2001. Mutations in SPTLC1, encoding serine palmitoyltransferase, long chain base subunit-1, cause hereditary sensory neuropathy type I. Nat Genet 27:309-12.
121. Verhoeven, K., P. De Jonghe, K. Coen, N. Verpoorten, M. Auer-Grumbach, J. M. Kwon, D. FitzPatrick, E. Schmedding, E. De Vriendt, A. Jacobs, V. Van Gerwen, K. Wagner, H. P. Hartung, and V. Timmerman. 2003. Mutations in the Small GTP-ase Late Endosomal Protein RAB7 Cause Charcot-Marie-Tooth Type 2B Neuropathy. Am J Hum Genet 72:722-7.
122. Yoshioka, R., P. J. Dyck, and P. F. Chance. 1996. Genetic heterogeneity in Charcot-Marie-Tooth neuropathy type 2. Neurology 46:569-71.
123. Dyck, P. J., W. J. Litchy, S. Minnerath, T. D. Bird, P. F. Chance, D. J. Schaid, and A. E. Aronson. 1994. Hereditary motor and sensory neuropathy with diaphragm and vocal cord paresis. Ann Neurol 35:608-15.
124. Dematteis, M., J. L. Pepin, M. Jeanmart, C. Deschaux, A. Labarre-Vila, and P. Levy. 2001. Charcot-Marie-Tooth disease and sleep apnoea syndrome: a family study. Lancet 357:267-72.
125. Klein, C. J., J. M. Cunningham, E. J. Atkinson, D. J. Schaid, S. J. Hebbring, S. A. Anderson, D. M. Klein, P. J. Dyck, W. J. Litchy, and S. N. Thibodeau. 2003. The gene for HMSN2C maps to 12q23-24: a region of neuromuscular disorders. Neurology 60:1151-6.
126. Ionasescu, V., C. Searby, V. C. Sheffield, T. Roklina, D. Nishimura, and R. Ionasescu. 1996. Autosomal dominant Charcot-Marie-Tooth axonal neuropathy mapped on chromosome 7p (CMT2D). Hum Mol Genet 5:1373-5.
127. Antonellis, A., R. E. Ellsworth, N. Sambuughin, I. Puls, A. Abel, S. Q. Lee-Lin, A. Jordanova, I. Kremensky, K. Christodoulou, L. T. Middleton, K. Sivakumar, V. Ionasescu, B. Funalot, J. M. Vance, L. G. Goldfarb, K. H. Fischbeck, and E. D. Green. 2003. Glycyl tRNA synthetase mutations in Charcot-Marie-Tooth disease type 2D and distal spinal muscular atrophy type V. Am J Hum Genet 72:1293-9.
128. Mersiyanova, I. V., A. V. Perepelov, A. V. Polyakov, V. F. Sitnikov, E. L. Dadali, R. B. Oparin, A. N. Petrin, and O. V. Evgrafov. 2000. A new variant of Charcot-Marie-Tooth disease type 2 is probably the result of a mutation in the neurofilament-light gene. Am J Hum Genet 67:37-46.
129. Jordanova, A., P. De Jonghe, C. F. Boerkoel, H. Takashima, E. De Vriendt, C. Ceuterick, J. J. Martin, I. J. Butler, P. Mancias, S. C. Papasozomenos, D. Terespolsky, L. Potocki, C. W. Brown, M. Shy, D. A. Rita, I. Tournev, I. Kremensky, J. R. Lupski, and V. Timmerman. 2003. Mutations in the neurofilament light chain gene (NEFL) cause early onset severe Charcot-Marie-Tooth disease. Brain 126:590-7.
130. Zuchner, S., M. Vorgerd, E. Sindern, and J. M. Schroder. 2004. The novel neurofilament light (NEFL) mutation Glu397Lys is associated with a clinically and morphologically heterogeneous type of Charcot-Marie-Tooth neuropathy. Neuromuscul Disord 14:147-57.
131. Ismailov, S. M., V. P. Fedotov, E. L. Dadali, A. V. Polyakov, C. Van Broeckhoven, V. I. Ivanov, P. De Jonghe, V. Timmerman, and O. V. Evgrafov. 2001. A new locus for autosomal dominant Charcot-Marie-Tooth disease type 2 (CMT2F) maps to chromosome 7q11-q21. Eur J Hum Genet 9:646-50.
132. Evgrafov, O. V., I. Mersiyanova, J. Irobi, L. Van Den Bosch, I. Dierick, C. L. Leung, O. Schagina, N. Verpoorten, K. Van Impe, V. Fedotov, E. Dadali, M. Auer-Grumbach, C. Windpassinger, K. Wagner, Z. Mitrovic, D. Hilton-Jones, K. Talbot, J. J. Martin, N. Vasserman, S. Tverskaya, A. Polyakov, R. K. Liem, J. Gettemans, W. Robberecht, P. De Jonghe, and V. Timmerman. 2004. Mutant small heat-shock protein 27 causes axonal Charcot-Marie-Tooth disease and distal hereditary motor neuropathy. Nat Genet.
133. Nelis, E., J. Berciano, N. Verpoorten, K. Coen, I. Dierick, V. Van Gerwen, O. Combarros, P. De Jonghe, and V. Timmerman. 2004. Autosomal dominant axonal Charcot-Marie-Tooth disease type 2 (CMT2G) maps to chromosome 12q12-q13.3. J Med Genet 41:193-7.
134. Tang, B. S., W. Luo, K. Xia, J. F. Xiao, H. Jiang, L. Shen, J. G. Tang, G. H. Zhao, F. Cai, Q. Pan, H. P. Dai, Q. D. Yang, J. H. Xia, and O. V. Evgrafov. 2004. A new locus for autosomal dominant Charcot-Marie-Tooth disease type 2 (CMT2L) maps to chromosome 12q24. Hum Genet 114:527-33.
135. Takashima, H., M. Nakagawa, M. Suehara, M. Saito, A. Saito, N. Kanzato, T. Matsuzaki, K. Hirata, J. D. Terwilliger, and M. Osame. 1999. Gene for hereditary motor and sensory neuropathy (proximal dominant form) mapped to 3q13.1. Neuromuscul Disord 9:368-71.
136. Takashima, H., M. Nakagawa, K. Nakahara, M. Suehara, T. Matsuzaki, I. Higuchi, H. Higa, K. Arimura, T. Iwamasa, S. Izumo, and M. Osame. 1997. A new type of hereditary motor and sensory neuropathy linked to chromosome 3. Ann Neurol 41:771-80.
137. Bouhouche, A., A. Benomar, N. Birouk, A. Mularoni, F. Meggouh, J. Tassin, D. Grid, A. Vandenberghe, M. Yahyaoui, T. Chkili, A. Brice, and E. LeGuern. 1999. A locus for an axonal form of autosomal recessive Charcot-Marie-Tooth disease maps to chromosome 1q21.2-q21.3. Am J Hum Genet 65:722-7.
138. De Sandre-Giovannoli, A., M. Chaouch, S. Kozlov, J. Vallat, M. Tazir, N. Kassouri, P. Szepetowski, T. Hammadouche, A. Vandenberghe, C. L. Stewart, D. Grid, and N. Levy. 2002. Homozygous Defects in LMNA, Encoding Lamin A/C Nuclear-Envelope Proteins, Cause Autosomal Recessive Axonal Neuropathy in Human (Charcot-Marie-Tooth Disorder Type 2) and Mouse. Am J Hum Genet 70:726-36.
139. Bonne, G., M. R. Di Barletta, S. Varnous, H. M. Becane, E. H. Hammouda, L. Merlini, F. Muntoni, C. R. Greenberg, F. Gary, J. A. Urtizberea, D. Duboc, M. Fardeau, D. Toniolo, and K. Schwartz. 1999. Mutations in the gene encoding lamin A/C cause autosomal dominant Emery-Dreifuss muscular dystrophy. Nat Genet 21:285-8.
140. Cao, H., and R. A. Hegele. 2000. Nuclear lamin A/C R482Q mutation in Canadian kindreds with Dunnigan-type familial partial lipodystrophy. Hum Mol Genet 9:109-12.
141. Muchir, A., G. Bonne, A. J. van der Kooi, M. van Meegen, F. Baas, P. A. Bolhuis, M. de Visser, and K. Schwartz. 2000. Identification of mutations in the gene encoding lamins A/C in autosomal dominant limb girdle muscular dystrophy with atrioventricular conduction disturbances (LGMD1B). Hum Mol Genet 9:1453-9.
142. Raffaele Di Barletta, M., E. Ricci, G. Galluzzi, P. Tonali, M. Mora, L. Morandi, A. Romorini, T. Voit, K. H. Orstavik, L. Merlini, C. Trevisan, V. Biancalana, I. Hausmanowa-Petrusewicz, S. Bione, R. Ricotti, K. Schwartz, G. Bonne, and D. Toniolo. 2000. Different mutations in the LMNA gene cause autosomal dominant and autosomal recessive Emery-Dreifuss muscular dystrophy. Am J Hum Genet 66:1407-12.
143. Brodsky, G. L., F. Muntoni, S. Miocic, G. Sinagra, C. Sewry, and L. Mestroni. 2000. Lamin A/C gene mutation associated with dilated cardiomyopathy with variable skeletal muscle involvement. Circulation 101:473-6.
144. Leal, A., B. Morera, G. Del Valle, D. Heuss, C. Kayser, M. Berghoff, R. Villegas, E. Hernandez, M. Mendez, H. C. Hennies, B. Neundorfer, R. Barrantes, A. Reis, and B. Rautenstrauss. 2001. A second locus for an axonal form of autosomal recessive Charcot-Marie-Tooth disease maps to chromosome 19q13.3. Am J Hum Genet 68:269-74.
145. Barhoumi, C., R. Amouri, C. Ben Hamida, M. Ben Hamida, S. Machghoul, M. Gueddiche, and F. Hentati. 2001. Linkage of a new locus for autosomal recessive axonal form of Charcot-Marie-Tooth disease to chromosome 8q21.3. Neuromuscul Disord 11:27-34.
146. Bomont, P., L. Cavalier, F. Blondeau, C. Ben Hamida, S. Belal, M. Tazir, E. Demir, H. Topaloglu, R. Korinthenberg, B. Tuysuz, P. Landrieu, F. Hentati, and M. Koenig. 2000. The gene encoding gigaxonin, a new member of the cytoskeletal BTB/kelch repeat family, is mutated in giant axonal neuropathy. Nat Genet 26:370-4.
147. Timmerman, V., P. De Jonghe, and C. Van Broeckhoven. 2000. Of giant axons and curly hair. Nat Genet 26:254-5.
148. Zemmouri, R., H. Azzedine, S. Assami, N. Kitouni, J. M. Vallat, T. Maisonobe, T. Hamadouche, M. Kessaci, B. Mansouri, E. Le Guern, D. Grid, and M. Tazir. 2000. Charcot-Marie-Tooth 2-like presentation of an Algerian family with giant axonal neuropathy. Neuromuscul Disord 10:592-8.
149. Wilmshurst, J. M., A. Bye, C. Rittey, C. Adams, A. F. Hahn, D. Ramsay, R. Pamphlett, J. D. Pollard, and R. Ouvrier. 2001. Severe infantile axonal neuropathy with respiratory failure. Muscle Nerve 24:760-8.
150. Priest, J. M., K. H. Fischbeck, N. Nouri, and B. J. Keats. 1995. A locus for axonal motor-sensory neuropathy with deafness and mental retardation maps to Xq24-q26. Genomics 29:409-12.
151. Morar, B., D. Gresham, D. Angelicheva, I. Tournev, R. Gooding, V. Guergueltcheva, C. Schmidt, A. Abicht, H. Lochmuller, A. Tordai, L. Kalmar, M. Nagy, V. Karcagi, M. Jeanpierre, A. Herczegfalvi, D. Beeson, V. Venkataraman, K. Warwick Carter, J. Reeve, R. De Pablo, V. Kucinskas, and L. Kalaydjieva. 2004. Mutation history of the Roma/Gypsies. Am J Hum Genet 75:596-609.
152. Colomer, J., C. Iturriaga, L. Kalaydjieva, T. Rogers, J. Hantke, R. H. King, I. Tournev, and P. K. Thomas. 2001. HMSN-Russe in two Spanish patients: Distinctive features of the disease and current genetic findings. Acta Myologica XX:202-9.
153. Guergueltcheva, V., I. Tournev, V. Bojinova, J. Hantke, I. Litvinenko, B. Ishpekova, A. Shmarov, J. Petrova, A. Jordanova, and L. Kalaydjieva. Early clinical and electrophysiological features of the two most common forms of Charcot-Marie-Tooth disease in the Roma (Gypsies). Submitted to Annals of Child Neurology.
154. Miller, S. A., D. D. Dykes, and H. F. Polesky. 1988. A simple salting out procedure for extracting DNA from human nucleated cells. Nucleic Acids Res 16:1215.
155. Sambrook, J., and D. W. Russell. 2001. Molecular Cloning 3rd edition.
156. Denhardt, D. T. 1966. A membrane-filter technique for the detection of complementary DNA. Biochem Biophys Res Commun 23:641-6.
157. http://eatworms.swmed.edu/~tim/primerfinder/.
158. http://rna.lundberg.gu.se/cutter2/.
159. Ye, S., S. Dhillon, X. Ke, A. R. Collins, and I. N. Day. 2001. An efficient procedure for genotyping single nucleotide polymorphisms. Nucleic Acids Res 29:E88-8.
160. Newton, C. R., A. Graham, L. E. Heptinstall, S. J. Powell, C. Summers, N. Kalsheker, J. C. Smith, and A. F. Markham. 1989. Analysis of any point mutation in DNA. The amplification refractory mutation system (ARMS). Nucleic Acids Res 17:2503-16.
161. Liu, Q., E. C. Thorland, J. A. Heit, and S. S. Sommer. 1997. Overlapping PCR for bidirectional PCR amplification of specific alleles: a rapid one-tube method for simultaneously differentiating homozygotes and heterozygotes. Genome Res 7:389-98.
162. Ye, S., S. Humphries, and F. Green. 1992. Allele specific amplification by tetra-primer PCR. Nucleic Acids Res 20:1152.
163. http://cedar.genetics.soton.ac.uk/public_html/primer1.html.
164. Ausubel, F. M. (ed.). 1987-ongoing. Current protocols in molecular biology.
165. Sherry, S. T., M. H. Ward, M. Kholodov, J. Baker, L. Phan, E. M. Smigielski, and K. Sirotkin. 2001. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29:308-11.
166. Brookes, A. J., H. Lehvaslaiho, M. Siegfried, J. G. Boehm, Y. P. Yuan, C. M. Sarkar, P. Bork, and F. Ortigao. 2000. HGBASE: a database of SNPs and other variations in and around human genes. Nucleic Acids Res 28:356-60.
167. Fredman, D., M. Siegfried, Y. P. Yuan, P. Bork, H. Lehvaslaiho, and A. J. Brookes. 2002. HGVbase: a human sequence variation database emphasizing data quality and a broad spectrum of data sources. Nucleic Acids Res 30:387-91.
168. Zhang, M. Q. 1998. Identification of human gene core promoters in silico. Genome Res 8:319-26.
169. Pearson, W. R., and D. J. Lipman. 1988. Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A 85:2444-8.
170. Pearson, W. R. 1990. Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol 183:63-98.
171. Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J Mol Biol 215:403-10.
172. Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389-402.
173. Reese, M. G., F. H. Eeckman, D. Kulp, and D. Haussler. 1997. Improved splice site detection in Genie. J Comput Biol 4:311-23.
174. Reese, M. G., and F. H. Eeckman. 1996. Splice Sites: A detailed neural network study. Proceedings of the Genome Mapping & Sequencing Meeting, Cold Spring Harbor, New York, arranged by D. Bentley, E. Green and P. Hieter.
238 References 175. Brunak, S., J. Engelbrecht, and S. Knudsen. 1991. Prediction of human mRNA donor and acceptor sites from the DNA sequence. J Mol Biol 220:49-65. 176. Pertea, M., X. Lin, and S. L. Salzberg. 2001. GeneSplicer: a new computational method for splice site prediction. Nucleic Acids Res 29:1185-90. 177. Brodskii, L. I., V. V. Ivanov, L. Kalaidzidis Ia, A. M. Leontovich, V. K. Nikolaev, S. I. Feranchuk, and V. A. Drachev. 1995. [GeneBee-NET: An Internet based server for biopolymer structure analysis]. Biokhimiia 60:1221-30. 178. Mathews, D. H., J. Sabina, M. Zuker, and D. H. Turner. 1999. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol 288:911-40. 179. Pesole, G., and S. Liuni. 1999. Internet resources for the functional analysis of 5' and 3' untranslated regions of eukaryotic mRNAs. Trends Genet 15:378. 180. Strachan, T., and A. P. Read. 1999. Human Molecular Genetics 2nd edition. 181. Royer-Pokora, B., L. M. Kunkel, A. P. Monaco, S. C. Goff, P. E. Newburger, R. L. Baehner, F. S. Cole, J. T. Curnutte, and S. H. Orkin. 1986. Cloning the gene for an inherited human disorder--chronic granulomatous disease--on the basis of its chromosomal location. Nature 322:32-8. 182. Collins, F. S. 1995. Positional cloning moves from perditional to traditional. Nat Genet 9:347-50. 183. Gulcher, J. R., A. Kong, and K. Stefansson. 2001. The role of linkage studies for common diseases. Curr Opin Genet Dev 11:264-7. 184. Pawlowitzki, I. H., J. H. Edwards, and E. A. Thompson. 1997. Genetic mapping of disease genes. 185. Morton, N. E. 1955. Sequential tests for the detection of linkage. Am J Hum Genet 7:277-318. 186. Botstein, D., and N. Risch. 2003. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat Genet 33 Suppl:228-37. 187. Haines, J. L., and M. A. Pericak-Vance. 1998. Approaches to gene mapping in complex human disease. 
188. Rocchi, A., S. Pellegrini, G. Siciliano, and L. Murri. 2003. Causative and susceptibility genes for Alzheimer's disease: a review. Brain Res Bull 61:1-24. 189. de Vries, H. G., M. A. van der Meulen, R. Rozen, D. J. Halley, H. Scheffer, L. P. ten Kate, C. H. Buys, and G. J. te Meerman. 1996. Haplotype identity between individuals who share a CFTR mutation allele "identical by descent": demonstration of the usefulness of the haplotype-sharing concept for gene mapping in real populations. Hum Genet 98:304-9. 190. Hartl, D. L., and E. W. Jones. 2002. Essential genetics: A genomics perspective (3rd edition). 191. Dib, C., S. Faure, C. Fizames, D. Samson, N. Drouot, A. Vignal, P. Millasseau, S. Marc, J. Hazan, E. Seboun, M. Lathrop, G. Gyapay, J. Morissette, and J. Weissenbach. 1996. A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature 380:152-4. 192. Broman, K. W., J. C. Murray, V. C. Sheffield, R. L. White, and J. L. Weber. 1998. Comprehensive human genetic maps: individual and sex-specific variation in recombination. Am J Hum Genet 63:861-9. 193. Kong, A., D. F. Gudbjartsson, J. Sainz, G. M. Jonsdottir, S. A. Gudjonsson, B. Richardsson, S. Sigurdardottir, J. Barnard, B. Hallbeck, G. Masson, A. Shlien, S. T. Palsson, M. L. Frigge, T. E. Thorgeirsson, J. R. Gulcher, and K. Stefansson. 2002. A high-resolution recombination map of the human genome. Nat Genet.

194. Imanishi, T., T. Itoh, Y. Suzuki, C. O'Donovan, S. Fukuchi, K. O. Koyanagi, R. A. Barrero, T. Tamura, Y. Yamaguchi-Kabata, M. Tanino, K. Yura, S. Miyazaki, K. Ikeo, K. Homma, A. Kasprzyk, T. Nishikawa, M. Hirakawa, J. Thierry-Mieg, D. Thierry-Mieg, J. Ashurst, et al. 2004. Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biol 2:E162.
195. Jobling, M. A., M. E. Hurles, and C. Tyler-Smith. 2004. Human evolutionary genetics: Origins, peoples & disease.
196. Houwen, R. H., S. Baharloo, K. Blankenship, P. Raeymaekers, J. Juyn, L. A. Sandkuijl, and N. B. Freimer. 1994. Genome screening by searching for shared segments: mapping a gene for benign recurrent intrahepatic cholestasis. Nat Genet 8:380-6.
197. Varilo, T., M. Savukoski, R. Norio, P. Santavuori, L. Peltonen, and I. Jarvela. 1996. The age of human mutation: genealogical and linkage disequilibrium analysis of the CLN5 mutation in the Finnish population. Am J Hum Genet 58:506-12.
198. Adriani, M., A. Martinez-Mir, F. Fusco, R. Busiello, J. Frank, S. Telese, E. Matrecano, M. V. Ursini, A. M. Christiano, and C. Pignata. 2004. Ancestral founder mutation of the nude (FOXN1) gene in congenital severe combined immunodeficiency associated with alopecia in southern Italy population. Ann Hum Genet 68:265-8.
199. Hästbacka, J., A. de la Chapelle, I. Kaitila, P. Sistonen, A. Weaver, and E. Lander. 1992. Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland. Nat Genet 2:204-11.
200. Lander, E. S., and D. Botstein. 1987. Homozygosity mapping: a way to map human recessive traits with the DNA of inbred children. Science 236:1567-70.
201. Finckh, U., S. Xu, G. Kumaramanickavel, M. Schurmann, J. K. Mukkadan, S. T. Fernandez, S. John, J. L. Weber, M. J. Denton, and A. Gal. 1998. Homozygosity mapping of autosomal recessive retinitis pigmentosa locus (RP22) on chromosome 16p12.1-p12.3. Genomics 48:341-5.
202. Tranebjaerg, L., T. M. Teslovich, M. Jones, M. M. Barmada, T. Fagerheim, A. Dahl, D. M. Escolar, J. M. Trent, E. M. Gillanders, and D. A. Stephan. 2003. Genome-wide homozygosity mapping localizes a gene for autosomal recessive non-progressive infantile ataxia to 20q11-q13. Hum Genet 113:293-5.
203. Hästbacka, J., I. Kaitila, P. Sistonen, and A. de la Chapelle. 1990. Diastrophic dysplasia gene maps to the distal long arm of chromosome 5. Proc Natl Acad Sci U S A 87:8056-9.
204. Hästbacka, J., A. de la Chapelle, M. M. Mahtani, G. Clines, M. P. Reeve-Daly, M. Daly, B. A. Hamilton, K. Kusumi, B. Trivedi, A. Weaver, et al. 1994. The diastrophic dysplasia gene encodes a novel sulfate transporter: positional cloning by fine-structure linkage disequilibrium mapping. Cell 78:1073-87.
205. Chandler, D., D. Angelicheva, L. Heather, R. Gooding, D. Gresham, P. Yanakiev, R. de Jonge, F. Baas, D. Dye, L. Karagyozov, A. Savov, K. Blechschmidt, B. Keats, P. K. Thomas, R. H. King, A. Starr, A. Nikolova, J. Colomer, B. Ishpekova, I. Tournev, J. A. Urtizberea, L. Merlini, D. Butinar, B. Chabrol, T. Voit, M. Baethmann, V. Nedkova, A. Corches, and L. Kalaydjieva. 2000. Hereditary motor and sensory neuropathy--Lom (HMSNL): refined genetic mapping in Romani (Gypsy) families from several European countries. Neuromuscul Disord 10:584-91.
206. Colomer, J., C. Iturriaga, L. Kalaydjieva, D. Angelicheva, R. H. King, and P. K. Thomas. 2000. Hereditary motor and sensory neuropathy-Lom (HMSNL) in a Spanish family: clinical, electrophysiological, pathological and genetic studies. Neuromuscul Disord 10:578-83.

207. Venter, J. C., M. D. Adams, E. W. Myers, P. W. Li, R. J. Mural, G. G. Sutton, H. O. Smith, M. Yandell, C. A. Evans, R. A. Holt, J. D. Gocayne, P. Amanatides, R. M. Ballew, D. H. Huson, J. R. Wortman, Q. Zhang, C. D. Kodira, X. H. Zheng, L. Chen, M. Skupski, et al. 2001. The sequence of the human genome. Science 291:1304-51.
208. Lander, E. S., L. M. Linton, B. Birren, C. Nusbaum, M. C. Zody, J. Baldwin, K. Devon, K. Dewar, M. Doyle, W. FitzHugh, R. Funke, D. Gage, K. Harris, A. Heaford, J. Howland, L. Kann, J. Lehoczky, R. LeVine, P. McEwan, K. McKernan, et al. 2001. Initial sequencing and analysis of the human genome. Nature 409:860-921.
209. Istrail, S., G. G. Sutton, L. Florea, A. L. Halpern, C. M. Mobarry, R. Lippert, B. Walenz, H. Shatkay, I. Dew, J. R. Miller, M. J. Flanigan, N. J. Edwards, R. Bolanos, D. Fasulo, B. V. Halldorsson, S. Hannenhalli, R. Turner, S. Yooseph, F. Lu, D. R. Nusskern, B. C. Shue, X. H. Zheng, F. Zhong, A. L. Delcher, D. H. Huson, S. A. Kravitz, L. Mouchard, K. Reinert, K. A. Remington, A. G. Clark, M. S. Waterman, E. E. Eichler, M. D. Adams, M. W. Hunkapiller, E. W. Myers, and J. C. Venter. 2004. Whole-genome shotgun assembly and comparison of human genome assemblies. Proc Natl Acad Sci U S A 101:1916-21.
210. Bentley, D. R., P. Deloukas, A. Dunham, L. French, S. G. Gregory, S. J. Humphray, A. J. Mungall, M. T. Ross, N. P. Carter, I. Dunham, C. E. Scott, K. J. Ashcroft, A. L. Atkinson, K. Aubin, D. M. Beare, G. Bethel, N. Brady, J. C. Brook, D. C. Burford, W. D. Burrill, et al. 2001. The physical maps for sequencing human chromosomes 1, 6, 9, 10, 13, 20 and X. Nature 409:942-3.
211. Semple, C. A. 2001. Bases and spaces: resources on the web for accessing the draft human genome - II - after publication of the draft. Genome Biol 2:2001.
212. Collins, A., J. Frezal, J. Teague, and N. E. Morton. 1996. A metric map of humans: 23,500 loci in 850 bands. Proc Natl Acad Sci U S A 93:14771-5.
213. Banchs, I., A. Bosch, J. Guimera, C. Lazaro, A. Puig, and X. Estivill. 1994. New alleles at microsatellite loci in CEPH families mainly arise from somatic mutations in the lymphoblastoid cell lines. Hum Mutat 3:365-72.
214. Brinkmann, B., M. Klintschar, F. Neuhuber, J. Huhne, and B. Rolf. 1998. Mutation rate in human microsatellites: influence of the structure and length of the tandem repeat. Am J Hum Genet 62:1408-15.
215. Hantke, J., T. Rogers, L. French, I. Tournev, V. Guergueltcheva, J. A. Urtizberea, J. Colomer, A. Corches, C. Lupu, L. Merlini, P. K. Thomas, and L. Kalaydjieva. 2003. Refined mapping of the HMSNR critical gene region--construction of a high-density integrated genetic and physical map. Neuromuscul Disord 13:729-36.
216. Antonarakis, S. E., M. Krawczak, and D. N. Cooper. 2000. Disease-causing mutations in the human genome. Eur J Pediatr 159 Suppl 3:S173-8.
217. Salisbury, B. A., M. Pungliya, J. Y. Choi, R. Jiang, X. J. Sun, and J. C. Stephens. 2003. SNP and haplotype variation in the human genome. Mutat Res 526:53-61.
218. Condit, C. M., P. J. Achter, I. Lauer, and E. Sefcovic. 2002. The changing meanings of "mutation:" A contextualized study of public discourse. Hum Mutat 19:69-75.
219. Egger, G., G. Liang, A. Aparicio, and P. A. Jones. 2004. Epigenetics in human disease and prospects for epigenetic therapy. Nature 429:457-63.
220. Knippers, R. 1997. Molekulare Genetik, 7. durchgesehene und korrigierte Auflage.
221. Rakyan, V. K., J. Preis, H. D. Morgan, and E. Whitelaw. 2001. The marks, mechanisms and memory of epigenetic states in mammals. Biochem J 356:1-10.

222. Collins, F. S. 1992. Positional cloning: let's not call it reverse anymore. Nat Genet 1:3-6.
223. Cotton, R. G., and C. R. Scriver. 1998. Proof of "disease causing" mutation. Hum Mutat 12:1-3.
224. Cotton, R. G., and O. Horaitis. 2000. Quality control in the discovery, reporting, and recording of genomic variation. Hum Mutat 15:16-21.
225. Cazzola, M., and R. C. Skoda. 2000. Translational pathophysiology: a novel molecular mechanism of human disease. Blood 95:3280-8.
226. Ghilardi, N., A. Wiestner, and R. C. Skoda. 1998. Thrombopoietin production is inhibited by a translational mechanism. Blood 92:4023-30.
227. Ionasescu, V. V. 1998. X-Linked Charcot-Marie-Tooth Disease and Connexin32. Cell Biol Int 22:807-813.
228. http://www.genesoc.com/counseling/Outlines/hemoC.htm.
229. Bobadilla, J. L., M. Macek, Jr., J. P. Fine, and P. M. Farrell. 2002. Cystic fibrosis: a worldwide analysis of CFTR mutations--correlation with incidence data and application to screening. Hum Mutat 19:575-606.
230. Gillian, A. L., and J. Svaren. 2004. The Ddx20/DP103 dead box protein represses transcriptional activation by Egr2/Krox-20. J Biol Chem 279:9056-63.
231. Pruitt, K. D., K. S. Katz, H. Sicotte, and D. R. Maglott. 2000. Introducing RefSeq and LocusLink: curated human genome resources at the NCBI. Trends Genet 16:44-7.
232. Raja, S. M., B. Wang, M. Dantuluri, U. R. Desai, B. Demeler, K. Spiegel, S. S. Metkar, and C. J. Froelich. 2002. Cytotoxic cell granule-mediated apoptosis. Characterization of the macromolecular complex of granzyme B with serglycin. J Biol Chem 277:49523-30.
233. Rishi, A. K., L. Zhang, M. Boyanapalli, A. Wali, R. M. Mohammad, Y. Yu, J. A. Fontana, J. S. Hatfield, M. I. Dawson, A. P. Majumdar, and U. Reichert. 2003. Identification and characterization of a cell cycle and apoptosis regulatory protein-1 as a novel mediator of apoptosis signaling by retinoid CD437. J Biol Chem 278:33422-35.
234. Reddy, J. V., and M. N. Seaman. 2001. Vps26p, a component of retromer, directs the interactions of Vps35p in endosome-to-Golgi retrieval. Mol Biol Cell 12:3242-56.
235. Ono, R., T. Taki, T. Taketani, M. Taniwaki, H. Kobayashi, and Y. Hayashi. 2002. LCX, leukemia-associated protein with a CXXC domain, is fused to MLL in acute myeloid leukemia with trilineage dysplasia having t(10;11)(q22;q23). Cancer Res 62:4075-80.
236. Bianchi, M., and M. Magnani. 1995. Hexokinase mutations that produce nonspherocytic hemolytic anemia. Blood Cells Mol Dis 21:2-8.
237. Kanno, H., K. Murakami, Y. Hariyama, K. Ishikawa, S. Miwa, and H. Fujii. 2002. Homozygous intragenic deletion of type I hexokinase gene causes lethal hemolytic anemia of the affected fetus. Blood 100:1930.
238. van Wijk, R., G. Rijksen, E. G. Huizinga, H. K. Nieuwenhuis, and W. W. van Solinge. 2003. HK Utrecht: missense mutation in the active site of human hexokinase associated with hexokinase deficiency and severe nonspherocytic hemolytic anemia. Blood 101:345-7.
239. Dan, I., N. M. Watanabe, E. Kajikawa, T. Ishida, A. Pandey, and A. Kusumi. 2002. Overlapping of MINK and CHRNE gene loci in the course of mammalian evolution. Nucleic Acids Res 30:2906-10.
240. Liang, F., I. Holt, G. Pertea, S. Karamycheva, S. L. Salzberg, and J. Quackenbush. 2000. Gene index analysis of the human genome estimates approximately 120,000 genes. Nat Genet 25:239-40.

241. Mori, C., J. E. Welch, K. D. Fulcher, D. A. O'Brien, and E. M. Eddy. 1993. Unique hexokinase messenger ribonucleic acids lacking the porin-binding domain are developmentally expressed in mouse spermatogenic cells. Biol Reprod 49:191-203.
242. Andreoni, F., A. Ruzzo, and M. Magnani. 2000. Structure of the 5' region of the human hexokinase type I (HKI) gene and identification of an additional testis-specific HKI mRNA. Biochim Biophys Acta 1493:19-26.
243. Klockars, T., M. Savukoski, J. Isosomppi, and L. Peltonen. 1999. Positional cloning of the CLN5 gene defective in the Finnish variant of the LINCL. Mol Genet Metab 66:324-8.
244. Wilson, J. E. 1995. Hexokinases. Rev Physiol Biochem Pharmacol 126:65-198.
245. Sebastian, S., S. Edassery, and J. E. Wilson. 2001. The human gene for the type III isozyme of hexokinase: structure, basal promoter, and evolution. Arch Biochem Biophys 395:113-20.
246. Murakami, K., H. Kanno, J. Tancabelic, and H. Fujii. 2002. Gene expression and biological significance of hexokinase in erythroid cells. Acta Haematol 108:204-9.
247. Travis, A. J., J. A. Foster, N. A. Rosenbaum, P. E. Visconti, G. L. Gerton, G. S. Kopf, and S. B. Moss. 1998. Targeting of a germ cell-specific type 1 hexokinase lacking a porin-binding domain to the mitochondria as well as to the head and fibrous sheath of murine spermatozoa. Mol Biol Cell 9:263-76.
248. Sui, D., and J. E. Wilson. 1997. Structural determinants for the intracellular localization of the isozymes of mammalian hexokinase: intracellular localization of fusion constructs incorporating structural elements from the hexokinase isozymes and the green fluorescent protein. Arch Biochem Biophys 345:111-25.
249. Rebhan, M., V. Chalifa-Caspi, J. Prilusky, and D. Lancet. 1997. GeneCards: integrating information about genes, proteins and diseases. Trends Genet 13:163.
250. Rebhan, M., V. Chalifa-Caspi, J. Prilusky, and D. Lancet. 1997. GeneCards: encyclopedia for genes, proteins and diseases. World Wide Web URL: http://bioinformatics.weizmann.ac.il/cards.
251. Voet, D., and J. G. Voet. 1995. Biochemistry, 2nd edition.
252. Kanno, H. 2000. Hexokinase: gene structure and mutations. Baillieres Best Pract Res Clin Haematol 13:83-8.
253. Wilson, J. E. 2003. Isozymes of mammalian hexokinase: structure, subcellular localization and metabolic function. J Exp Biol 206:2049-57.
254. Printz, R. L., S. Koch, L. R. Potter, R. M. O'Doherty, J. J. Tiesinga, S. Moritz, and D. K. Granner. 1993. Hexokinase II mRNA and gene structure, regulation by insulin, and evolution. J Biol Chem 268:5209-19.
255. Osawa, H., R. L. Printz, R. R. Whitesell, and D. K. Granner. 1995. Regulation of hexokinase II gene transcription and glucose phosphorylation by catecholamines, cyclic AMP, and insulin. Diabetes 44:1426-32.
256. Gosmain, Y., E. Lefai, S. Ryser, M. Roques, and H. Vidal. 2004. Sterol regulatory element-binding protein-1 mediates the effect of insulin on hexokinase II gene expression in human muscle cells. Diabetes 53:321-9.
257. Taneja, N., P. E. Coy, I. Lee, J. M. Bryson, and R. B. Robey. 2004. Proinflammatory interleukin-1 cytokines increase mesangial cell hexokinase activity and hexokinase II isoform abundance. Am J Physiol Cell Physiol 287:C548-57.
258. Brocklehurst, K. J., R. A. Davies, and L. Agius. 2004. Differences in regulatory properties between human and rat glucokinase regulatory protein. Biochem J 378:693-7.
259. Postic, C., M. Shiota, and M. A. Magnuson. 2001. Cell-specific roles of glucokinase in glucose homeostasis. Recent Prog Horm Res 56:195-217.

260. Godbole, A., J. Varghese, A. Sarin, and M. K. Mathew. 2003. VDAC is a conserved element of death pathways in plant and animal systems. Biochim Biophys Acta 1642:87-96.
261. Rathmell, J. C., C. J. Fox, D. R. Plas, P. S. Hammerman, R. M. Cinalli, and C. B. Thompson. 2003. Akt-directed glucose metabolism can prevent Bax conformation change and promote growth factor-independent survival. Mol Cell Biol 23:7315-28.
262. Pastorino, J. G., N. Shulga, and J. B. Hoek. 2002. Mitochondrial binding of hexokinase II inhibits Bax-induced cytochrome c release and apoptosis. J Biol Chem 277:7610-8.
263. Gottlob, K., N. Majewski, S. Kennedy, E. Kandel, R. B. Robey, and N. Hay. 2001. Inhibition of early apoptotic events by Akt/PKB is dependent on the first committed step of glycolysis and mitochondrial hexokinase. Genes Dev 15:1406-18.
264. Majewski, N., V. Nogueira, R. B. Robey, and N. Hay. 2004. Akt inhibits apoptosis downstream of BID cleavage via a glucose-dependent mechanism involving mitochondrial hexokinases. Mol Cell Biol 24:730-40.
265. Danial, N. N., C. F. Gramm, L. Scorrano, C. Y. Zhang, S. Krauss, A. M. Ranger, S. R. Datta, M. E. Greenberg, L. J. Licklider, B. B. Lowell, S. P. Gygi, and S. J. Korsmeyer. 2003. BAD and glucokinase reside in a mitochondrial complex that integrates glycolysis and apoptosis. Nature 424:952-6.
266. Downward, J. 2003. Cell biology: metabolism meets death. Nature 424:896-7.
267. Echwald, S. M., C. Bjorbaek, T. Hansen, J. O. Clausen, H. Vestergaard, J. R. Zierath, R. L. Printz, D. K. Granner, and O. Pedersen. 1995. Identification of four amino acid substitutions in hexokinase II and studies of relationships to NIDDM, glucose effectiveness, and insulin sensitivity. Diabetes 44:347-53.
268. Vidal-Puig, A., R. L. Printz, I. M. Stratton, D. K. Granner, and D. E. Moller. 1995. Analysis of the hexokinase II gene in subjects with insulin resistance and NIDDM and detection of a Gln142-->His substitution. Diabetes 44:340-6.
269. Laakso, M., M. Malkki, and S. S. Deeb. 1995. Amino acid substitutions in hexokinase II among patients with NIDDM. Diabetes 44:330-4.
270. Ardehali, H., G. E. Tiller, R. L. Printz, H. Mochizuki, M. Prochazka, and D. K. Granner. 1996. A novel (TA)n polymorphism in the hexokinase II gene: application to noninsulin-dependent diabetes mellitus in the Pima Indians. Hum Genet 97:482-5.
271. Malkki, M., M. Laakso, and S. S. Deeb. 1997. The human hexokinase II gene promoter: functional characterization and detection of variants among patients with NIDDM. Diabetologia 40:1461-9.
272. Stenson, P. D., E. V. Ball, M. Mort, A. D. Phillips, J. A. Shiel, N. S. Thomas, S. Abeysinghe, M. Krawczak, and D. N. Cooper. 2003. Human Gene Mutation Database (HGMD): 2003 update. Hum Mutat 21:577-81.
273. Gloyn, A. L. 2003. Glucokinase (GCK) mutations in hyper- and hypoglycemia: maturity-onset diabetes of the young, permanent neonatal diabetes, and hyperinsulinemia of infancy. Hum Mutat 22:353-62.
274. Peters, L. L., P. W. Lane, S. G. Andersen, B. Gwynn, J. E. Barker, and E. Beutler. 2001. Downeast anemia (dea), a new mouse model of severe nonspherocytic hemolytic anemia caused by hexokinase (HK(1)) deficiency. Blood Cells Mol Dis 27:850-60.
275. Heikkinen, S., M. Pietila, M. Halmekyto, S. Suppola, E. Pirinen, S. S. Deeb, J. Janne, and M. Laakso. 1999. Hexokinase II-deficient mice. Prenatal death of homozygotes without disturbances in glucose tolerance in heterozygotes. J Biol Chem 274:22517-23.
276. Fueger, P. T., S. Heikkinen, D. P. Bracy, C. M. Malabanan, R. R. Pencek, M. Laakso, and D. H. Wasserman. 2003. Hexokinase II partial knockout impairs exercise-stimulated glucose uptake in oxidative muscles of mice. Am J Physiol Endocrinol Metab 285:E958-63.
277. Grupe, A., B. Hultgren, A. Ryan, Y. H. Ma, M. Bauer, and T. A. Stewart. 1995. Transgenic knockouts reveal a critical requirement for pancreatic beta cell glucokinase in maintaining glucose homeostasis. Cell 83:69-78.
278. Terauchi, Y., H. Sakura, K. Yasuda, K. Iwamoto, N. Takahashi, K. Ito, H. Kasai, H. Suzuki, O. Ueda, N. Kamada, et al. 1995. Pancreatic beta-cell-specific targeted disruption of glucokinase gene. Diabetes mellitus due to defective insulin secretion to glucose. J Biol Chem 270:30253-6.
279. Niswender, K. D., M. Shiota, C. Postic, A. D. Cherrington, and M. A. Magnuson. 1997. Effects of increased glucokinase gene copy number on glucose homeostasis and hepatic glucose metabolism. J Biol Chem 272:22570-5.
280. Kato, T., and O. H. Lowry. 1973. Enzymes of energy-converting systems in individual mammalian nerve cell bodies. J Neurochem 20:151-63.
281. Wilkin, G. P., and J. E. Wilson. 1977. Localization of hexokinase in neural tissue: light microscopic studies with immunofluorescence and histochemical procedures. J Neurochem 29:1039-51.
282. Lawrence, G. M., D. G. Walker, and I. P. Trayer. 1984. The ubiquitous localization of type I hexokinase in rat peripheral nerves, smooth muscle cells and epithelial cells. Histochem J 16:1113-23.
283. McDougal, D. B., Jr., M. J. Yu, P. D. Gorin, and E. M. Johnson, Jr. 1981. Transported enzymes in sciatic nerve and sensory ganglia of rats exposed to maternal antibodies against nerve growth factor. J Neurochem 36:1847-52.
284. Ponomarenko, J. V., T. I. Merkulova, G. V. Orlova, O. N. Fokin, E. V. Gorshkova, A. S. Frolov, V. P. Valuev, and M. P. Ponomarenko. 2003. rSNP_Guide, a database system for analysis of transcription factor binding to DNA with variations: application to genome annotation. Nucleic Acids Res 31:118-21.
285. Vasiliev, G. V., V. M. Merkulov, V. F. Kobzev, T. I. Merkulova, M. P. Ponomarenko, and N. A. Kolchanov. 1999. Point mutations within 663-666 bp of intron 6 of the human TDO2 gene, associated with a number of psychiatric disorders, damage the YY-1 transcription factor binding site. FEBS Lett 462:85-8.
286. Blencowe, B. J. 2000. Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem Sci 25:106-10.
287. Cartegni, L., S. L. Chew, and A. R. Krainer. 2002. Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet 3:285-98.
288. Faustino, N. A., and T. A. Cooper. 2003. Pre-mRNA splicing and human disease. Genes Dev 17:419-37.
289. Hellen, C. U., and P. Sarnow. 2001. Internal ribosome entry sites in eukaryotic mRNA molecules. Genes Dev 15:1593-612.
290. Morris, D. R., and A. P. Geballe. 2000. Upstream open reading frames as regulators of mRNA translation. Mol Cell Biol 20:8635-42.
291. Sarrazin, S., J. Starck, C. Gonnet, A. Doubeikovski, F. Melet, and F. Morle. 2000. Negative and translation termination-dependent positive control of FLI-1 protein synthesis by conserved overlapping 5' upstream open reading frames in Fli-1 mRNA. Mol Cell Biol 20:2959-69.
292. Stoneley, M., and A. E. Willis. 2004. Cellular internal ribosome entry segments: structures, trans-acting factors and regulation of gene expression. Oncogene 23:3200-7.
293. Hudder, A., and R. Werner. 2000. Analysis of a Charcot-Marie-Tooth disease mutation reveals an essential internal ribosome entry site element in the connexin-32 gene. J Biol Chem 275:34586-91.

294. Muslimov, I. A., M. Titmus, E. Koenig, and H. Tiedge. 2002. Transport of Neuronal BC1 RNA in Mauthner Axons. J Neurosci 22:4293-301.
295. Kloc, M., N. R. Zearfoss, and L. D. Etkin. 2002. Mechanisms of subcellular mRNA localization. Cell 108:533-44.
296. Wagner, G., J. Kovacs, P. Low, F. Orosz, and J. Ovadi. 2001. Tubulin and microtubule are potential targets for brain hexokinase binding. FEBS Lett 509:81-4.
297. Lauderdale, J. D., J. S. Wilensky, E. R. Oliver, D. S. Walton, and T. Glaser. 2000. 3' deletions cause aniridia by preventing PAX6 gene expression. Proc Natl Acad Sci U S A 97:13755-9.
298. Pfeifer, D., R. Kist, K. Dewar, K. Devon, E. S. Lander, B. Birren, L. Korniszewski, E. Back, and G. Scherer. 1999. Campomelic dysplasia translocation breakpoints are scattered over 1 Mb proximal to SOX9: evidence for an extended control region. Am J Hum Genet 65:111-24.
299. Lettice, L. A., T. Horikoshi, S. J. Heaney, M. J. van Baren, H. C. van der Linde, G. J. Breedveld, M. Joosse, N. Akarsu, B. A. Oostra, N. Endo, M. Shibata, M. Suzuki, E. Takahashi, T. Shinka, Y. Nakahori, D. Ayusawa, K. Nakabayashi, S. W. Scherer, P. Heutink, R. E. Hill, and S. Noji. 2002. Disruption of a long-range cis-acting regulator for Shh causes preaxial polydactyly. Proc Natl Acad Sci U S A 99:7548-53.
300. Enattah, N. S., T. Sahi, E. Savilahti, J. D. Terwilliger, L. Peltonen, and I. Jarvela. 2002. Identification of a variant associated with adult-type hypolactasia. Nat Genet 30:233-7.
301. Hardison, R. C. 2000. Conserved noncoding sequences are reliable guides to regulatory elements. Trends Genet 16:369-72.
302. Kent, W. J. 2002. BLAT--the BLAST-like alignment tool. Genome Res 12:656-64.
303. Le, S. Y., and J. V. Maizel, Jr. 1997. A common RNA structural motif involved in the internal initiation of translation of cellular mRNAs. Nucleic Acids Res 25:362-69.
304. Fairbrother, W. G., G. W. Yeo, R. Yeh, P. Goldstein, M. Mawson, P. A. Sharp, and C. B. Burge. 2004. RESCUE-ESE identifies candidate exonic splicing enhancers in vertebrate exons. Nucleic Acids Res 32:W187-90.
305. Fairbrother, W. G., R. F. Yeh, P. A. Sharp, and C. B. Burge. 2002. Predictive identification of exonic splicing enhancers in human genes. Science 297:1007-13.
306. Veeramachaneni, V., W. Makalowski, M. Galdzicki, R. Sood, and I. Makalowska. 2004. Mammalian overlapping genes: the comparative perspective. Genome Res 14:280-6.
307. Burge, C., and S. Karlin. 1997. Prediction of complete gene structures in human genomic DNA. J Mol Biol 268:78-94.
308. Yokomori, N., M. Tawata, Y. Hosaka, and T. Onaya. 1992. Transcriptional regulation of hexokinase I mRNA levels by TSH in cultured rat thyroid FRTL5 cells. Life Sci 51:1613-9.
309. Munroe, S. H. 2004. Diversity of antisense regulation in eukaryotes: Multiple mechanisms, emerging patterns. J Cell Biochem 93:664-671.
310. Pareyson, D. 1999. Charcot-marie-tooth disease and related neuropathies: molecular basis for distinction and diagnosis. Muscle Nerve 22:1498-509.
311. Sevilla, T., A. Cuesta, M. J. Chumillas, F. Mayordomo, L. Pedrola, F. Palau, and J. J. Vilchez. 2003. Clinical, electrophysiological and morphological findings of Charcot-Marie-Tooth neuropathy with vocal cord palsy and mutations in the GDAP1 gene. Brain 126:2023-33.
312. Casana, P., F. Martinez, S. Haya, J. I. Lorenzo, C. Espinos, and J. A. Aznar. 2000. Q1311X: a novel nonsense mutation of putative ancient origin in the von Willebrand factor gene. Br J Haematol 111:552-5.

313. Coto, E., J. Rodriguez, N. Jeck, V. Alvarez, R. Stone, C. Loris, L. M. Rodriguez, M. Fischbach, H. W. Seyberth, and F. Santos. 2004. A new mutation (intron 9 +1 G>T) in the SLC12A3 gene is linked to Gitelman syndrome in Gypsies. Kidney Int 65:25-9.
314. Matsumura, Y., C. Nishigori, T. Horio, and Y. Miyachi. 2004. PIG7/LITAF gene mutation and overexpression of its gene product in extramammary Paget's disease. Int J Cancer 111:218-23.
315. Previtali, S. C., B. Zerega, D. L. Sherman, P. J. Brophy, G. Dina, R. H. King, M. M. Salih, L. Feltri, A. Quattrini, R. Ravazzolo, L. Wrabetz, A. P. Monaco, and A. Bolino. 2003. Myotubularin-related 2 protein phosphatase and neurofilament light chain protein, both mutated in CMT neuropathies, interact in peripheral nerve. Hum Mol Genet 12:1713-23.
316. Laporte, J., L. Liaubet, F. Blondeau, H. Tronchere, J. L. Mandel, and B. Payrastre. 2002. Functional redundancy in the myotubularin family. Biochem Biophys Res Commun 291:305-12.
317. Dang, H., Z. Li, E. Y. Skolnik, and H. Fares. 2004. Disease-related myotubularins function in endocytic traffic in Caenorhabditis elegans. Mol Biol Cell 15:189-96.
318. Chies, R., L. Nobbio, P. Edomi, A. Schenone, C. Schneider, and C. Brancolini. 2003. Alterations in the Arf6-regulated plasma membrane endosomal recycling pathway in cells overexpressing the tetraspan protein Gas3/PMP22. J Cell Sci 116:987-99.
319. Sancho, S., P. Young, and U. Suter. 2001. Regulation of Schwann cell proliferation and apoptosis in PMP22-deficient mice and mouse models of Charcot-Marie-Tooth disease type 1A. Brain 124:2177-87.
320. Polyak, K., Y. Xia, J. L. Zweier, K. W. Kinzler, and B. Vogelstein. 1997. A model for p53-induced apoptosis. Nature 389:300-5.
321. Stein, S., E. K. Thomas, B. Herzog, M. D. Westfall, J. V. Rocheleau, R. S. Jackson, M. Wang, and P. Liang. 2004. NDRG1 is necessary for p53-dependent apoptosis. J Biol Chem.
322. Parkinson, D. B., A. Bhaskaran, A. Droggiti, S. Dickinson, M. D'Antonio, R. Mirsky, and K. R. Jessen. 2004. Krox-20 inhibits Jun-NH2-terminal kinase/c-Jun to control Schwann cell proliferation and death. J Cell Biol 164:385-94.
323. Zourlidou, A., M. D. Payne Smith, and D. S. Latchman. 2004. HSP27 but not HSP70 has a potent protective effect against alpha-synuclein-induced cell death in mammalian neuronal cells. J Neurochem 88:1439-48.
324. Karbowski, M., Y. J. Lee, B. Gaume, S. Y. Jeong, S. Frank, A. Nechushtan, A. Santel, M. Fuller, C. L. Smith, and R. J. Youle. 2002. Spatial and temporal association of Bax with mitochondrial fission sites, Drp1, and Mfn2 during apoptosis. J Cell Biol 159:931-8.
325. Becker, E. B., and A. Bonni. 2004. Cell cycle regulation of neuronal apoptosis in development and disease. Prog Neurobiol 72:1-25.
326. Benn, S. C., and C. J. Woolf. 2004. Adult neuron survival strategies--slamming on the brakes. Nat Rev Neurosci 5:686-700.
327. Yeo, W., and J. Gautier. 2004. Early neural cell death: dying to become neurons. Dev Biol 274:233-44.
328. Pellerin, L. 2003. Lactate as a pivotal element in neuron-glia metabolic cooperation. Neurochem Int 43:331-8.

247 Appendix A

APPENDICES

APPENDIX A: LIST OF POLYMORPHIC VARIANTS

Lab ID = laboratory identification number
Coding seq = coding sequence
intronic/E = location intronic to the respective gene, but in an EST
SEQ = typing by direct sequencing
D + enzyme = typing by RFLP
T = typing by tetra-primer ARMS PCR
M = putative mutation
U = uninformative
R = informative for refined mapping

The two putative mutations that could not be excluded are bolded.

Lab Gene Position in Type of Position on Contig dbSNP Private to Typing method ID Gene change NT_008583.16 accession Gypsies #002 CXXC6 coding seq C/T 18957010 rs3998860 no SEQ M #003 CXXC6 intronic ins/del T 18958016 rs5030660 yes SEQ U #004 CXXC6 3'UTR C/T 19003626 rs5030882 no SEQ M #005 CCAR1 5' flanking C/G 19031937 rs5030894 yes SEQ+T R #006 CCAR1 5' flanking A/C 19032023 rs5030883 no SEQ U #007 CCAR1 intronic C/T 19048954 rs1149679 no SEQ U #008 CCAR1 coding seq A/G 19053443 rs3740594 no SEQ R #009 CCAR1 intronic C/T 19053581 rs5030884 no SEQ R #010 CCAR1 coding seq C/T 19058130 rs5030885 yes SEQ U #011 CCAR1 coding seq A/G 19064839 rs5030887 no SEQ U #012 CCAR1 intronic A/T 19072196 rs1694493 no SEQ R #013 CCAR1 intronic C/T 19076746 rs1149688 no SEQ U #014 CCAR1 3' flanking C/G 19102508 rs4746788 no SEQ U #015 DDX50 5'UTR G/T 19212211 rs5030888 yes SEQ R #016 DDX50 5'UTR C/T 19212239 rs5030889 yes SEQ U #017 DDX50 intronic C/T 19221931 rs5030890 yes SEQ U #018 DDX50 intronic A/G 19224585 rs5030891 no SEQ U #019 DDX50 intronic A/T 19246851 rs5030892 no SEQ U #020 DDX50 coding seq C/T 19247868 rs5030895 no SEQ U #021 DDX50 intronic G/T 19248063 rs1539166 no SEQ U #022 DDX50 coding seq A/G 19252099 rs5030900 no SEQ R #023 DDX21 5'UTR A/G 19267261 rs5030896 yes SEQ U #024 DDX21 intronic C/T 19274094 rs5030897 no SEQ R #025 DDX21 intronic C/T 19276283 rs3898314 no SEQ U #026 DDX21 intronic G/T 19276549 rs5030898 yes SEQ M #027 DDX21 intronic G/T 19284421 rs5030899 no SEQ U #028 DDX21 intronic A/T 19285744 rs5030901 yes SEQ U #029 DDX21 intronic A/T 19285754 rs5030902, no SEQ U rs7904835 #030 DDX21 intronic C/T 19292705 rs2429025 no SEQ R #031 DDX21 intronic A/C 19293347 rs2251911 no SEQ U #032 DDX21 3'UTR ins/del TTC 19293923 rs5030661 no SEQ R #033 DDX21 3'UTR C/T 19294577 rs4554 no SEQ M #034 intergenic C/T 19353464 rs2491026 no SEQ R #035 intergenic A/T 19353642 rs5030930 yes SEQ U #036 intergenic A/G 19353673 rs5030931 no SEQ U #037 intergenic C/T 19354187 
rs2491025 no SEQ U #038 intergenic C/T 19372830 rs5030903 yes SEQ U #039 intergenic A/G 19372892 rs5030907 no SEQ + D PvuII R #040 PRG1 5' flanking A/G 19376644 rs2394527 no SEQ R #041 PRG1 5' flanking C/T 19376759 rs5030929 yes SEQ + T R #042 PRG1 5'UTR A/G 19380431 rs5030904 yes SEQ U #043 PRG1 5'UTR A/G 19396971 rs5030905 no SEQ M #044 PRG1 5'UTR A/G 19398772 rs5030906 yes SEQ U #045 PRG1 coding seq A/G 19408007 rs2805910 no SEQ U #046 PRG1 3'UTR C/T 19415220 rs12437 no SEQ U #047 PRG1 3'UTR G/T 19415552 rs7377 no SEQ M #048 VPS26 5'UTR A/G 19435168 rs5030909 yes SEQ U #049 VPS26 coding seq C/T 19443946 rs5030932 no SEQ M #050 VPS26 3'UTR C/T 19482629 rs1802295 no SEQ U #051 VPS26 3'UTR A/G 19482996 rs1048717 no SEQ R

248 Appendix A

Lab ID | Gene | Position in gene | Type of change | Position on contig NT_008583.16 | dbSNP accession | Private to Gypsies | Typing method |
#052 | SUPV3L1 | intronic | A/G | 19496251 | rs5030908 | no | SEQ + T | R
#053 | SUPV3L1 | intronic | C/T | 19496995 | rs5030910 | yes | SEQ | U
#054 | SUPV3L1 | intronic | C/T | 19509482 | rs5030911 | yes | SEQ | U
#055 | SUPV3L1 | intronic | A/C | 19510182 | rs5030912 | yes | SEQ | U
#056 | FLJ31406 | UTR | A/G | 19526394 | rs5030933 | no | SEQ | U
#057 | FLJ31406 | UTR | C/G | 19526515 | rs5030934 | no | SEQ | M
#058 | FLJ31406 | UTR | A/G | 19526536 | rs5030935 | no | SEQ | U
#059 | FLJ31406 | UTR | C/T | 19526797 | rs5030936 | no | SEQ | M
#060 | FLJ31406 | UTR | A/G | 19527052 | rs5030937 | no | SEQ | M
#061 | FLJ31406 | UTR | A/G | 19527071 | rs5030938 | no | SEQ | M
#062 | FLJ31406 | UTR | A/C | 19527494 | rs5030939 | no | SEQ + D ApaLI | R
#063 | FLJ31406 | intronic | A/G | 19527732 | rs5030940 | yes | SEQ | U
#064 | FLJ31406 | UTR | A/G | 19533885 | rs5030941 | no | SEQ | U
#065 | FLJ31406 | intronic | G/T | 19534054 | rs5030942 | yes | SEQ | U
#066 | FLJ31406 | UTR | C/T | 19534096 | rs4746822 | no | SEQ | M
#067 | FLJ31406 | UTR | C/T | 19536800 | rs4746825 | no | SEQ | R
#068 | FLJ31406 | UTR | A/G | 19536805 | rs4746826 | no | SEQ | R
#069 | FLJ31406 | UTR | C/T | 19536818 | rs5030944 | no | SEQ | R
#070 | FLJ31406 | intronic | A/G | 19543201 | rs4746827 | no | SEQ | M
#071 | FLJ31406 | UTR | A/G | 19543296 | rs5030945 | no | SEQ | U
#072 | FLJ31406 | UTR | G/T | 19543313 | rs5030946 | no | SEQ | U
#073 | FLJ31406 | intronic | A/G | 19543404 | rs5030947 | yes | SEQ | M
#074 | FLJ31406 | intronic | C/T | 19543568 | rs874557 | no | SEQ | U
#075 | FLJ22761 | intronic | G/T | 19557286 | rs5030913 | no | SEQ | U
#076 | FLJ22761 | intronic | A/G | 19561062 | rs3740600 | no | SEQ | M
#077 | FLJ22761 | coding seq | C/T | 19561530 | rs5030948 | no | SEQ | U
#078 | FLJ22761 | intronic | C/G | 19568097 | rs5030912 | yes | SEQ | U
#079 | FLJ22761 | coding seq | C/T | 19569815 | rs1111335 | no | SEQ | M
#080 | FLJ22761 | coding seq | A/C | 19577665 | rs906219 | no | SEQ | U
#081 | FLJ22761 | 3'UTR | A/G | 19578184 | rs4746832 | no | SEQ | M
#082 | FLJ22761 | 3'UTR | C/G | 19578386 | rs2611 | no | SEQ | M
#083 | HK1 | intronic | A/C | 19589495 | rs5030949 | no | SEQ | U
#084 | HK1 | intronic | C/T | 19600253 | rs5030916 | no | SEQ | U
#085 | HK1 | intronic | G/T | 19606356 | rs5030917 | yes | SEQ | R
#086 | HK1 | intronic | C/T | 19606387 | rs2894081 | no | SEQ | R
#087 | HK1 | intronic | C/T | 19606614 | rs4746837 | no | SEQ | R
#088 | HK1 | intronic | A/C | 19611573 | rs2002905 | no | SEQ + D FokI | R
#089 | HK1 | intronic | G/T | 19611683 | rs5030950 | yes | SEQ | U
#090 | HK1 | coding seq | A/G | 19611765 | rs906220 | no | SEQ | M
#091 | HK1 | intronic | A/G | 19611789 | rs906221 | no | SEQ | M
#092 | HK1 | intronic | A/G | 19611851 | rs906222 | no | SEQ | M
#093 | HK1 | intronic | A/T | 19611862 | rs906223 | no | SEQ | M
#094 | HK1 | intronic | C/T | 19611935 | rs5030951 | yes | SEQ | M
#095 | HK1 | intronic | C/T | 19629171 | rs3812691 | no | SEQ + D HphI | R
#096 | HK1 | intronic | A/C | 19629681 | rs5030918 | yes | SEQ | U
#097 | HK1 | intronic | C/G | 19629683 | rs5030919 | yes | SEQ | U
#098 | HK1 | coding seq | C/G | 19654752 | rs3740603 | no | SEQ | U
#099 | HK1 | coding seq | A/G | 19693575 | rs748235 | no | SEQ | M
#100 | HK1 | intronic | A/G | 19695411 | rs5030915 | yes | SEQ | R
#101 | HK1 | intronic | C/T | 19695479 | rs2305196 | no | SEQ | R
#102 | HK1 | intronic | A/G | 19695857 | rs749105 | no | SEQ + D BclI | R
#103 | HK1 | intronic | A/G | 19703246 | rs2278745 | no | SEQ | R
#104 | HK1 | 3'UTR | C/T | 19712693 | rs5030886 | yes | SEQ | R
#105 | TACR2 | 3' flanking | C/T | 19715336 | rs5030893 | yes | SEQ | R
#106 | TACR2 | coding seq | A/G | 19715810 | rs2229170 | no | SEQ | R
#107 | TACR2 | coding seq | C/T | 19727167 | rs5030920 | no | SEQ | U
#108 | TACR2 | 5' flanking | A/G | 19727784 | rs3793853 | no | SEQ | R
#109 | TACR2 | 5' flanking | C/T | 19727792 | rs5030928 | no | SEQ | U
#110 | TACR2 | 5' flanking | A/G | 19728410 | rs5030921 | no | SEQ | U
#111 | - | intergenic | C/T | 19733573 | rs5030922 | no | SEQ | R
#112 | - | intergenic | A/G | 19733663 | rs5030923 | no | SEQ | R
#113 | - | intergenic | ins/del 27bp | 19736128 | rs5030626 | yes | SEQ + 4% Agar. | R
#114 | NET-7 | 5' flanking | A/T | 19762063 | rs5030924 | no | SEQ | R
#115 | NET-7 | 5' flanking | A/G | 19762301 | rs3812705 | no | SEQ | R
#116 | NET-7 | 5' flanking | A/C | 19762305 | rs5030925 | no | SEQ | R
#117 | NET-7 | intronic | A/G | 19815480 | rs1652804 | no | SEQ | M
#118 | NET-7 | 3'UTR | C/T | 19818165 | rs1052179 | no | SEQ | M
#119 | NET-7 | 3'UTR | G/T | 19818567 | rs1665583 | no | SEQ | M
#120 | NET-7 | 3'UTR | G/T | 19818572 | rs5030926, rs3750790 | no | SEQ | M
#121 | NET-7 | 3'UTR | ins/del TT | 19818573 | rs1652803 (diff. allele) | no | SEQ | M
#122 | NET-7 | 3' flanking | C/T | 19818855 | rs1652802 | no | SEQ | M
#123 | NET-7 | 3' flanking | A/T | 19818873 | rs5030927 | yes | SEQ | U
#124 | Neurog3 | coding seq | A/G | 19883359 | rs4536103 | no | SEQ + D MboII | M
#125 | FLJ22761 | coding seq | C/T | 19543819 | rs874556 | no | SEQ | U

#126 | FLJ22761 | intronic | A/T | 19544067 | rs5030970 | no | SEQ | U
#127 | FLJ22761 | intronic | A/T | 19544104 | rs5030971 | no | SEQ | U
#128 | FLJ22761 | intronic | A/G | 19544125 | rs5030972 | no | SEQ | U
#129 | FLJ22761 | intronic | A/G | 19549893 | rs4145930 | no | SEQ | M
#130 | FLJ22761 | intronic | C/T | 19550064 | rs5030973 | yes | SEQ | U
#131 | FLJ22761 | intronic | A/G | 19540215 | not listed | yes | SEQ | U
#132 | FLJ22761 | intronic/E | A/G | 19540301 | rs4072136 | no | SEQ | R
#133 | FLJ22761 | intronic/E | A/G | 19540303 | rs4072135 | no | SEQ | R
#134 | FLJ22761 | intronic/E | C/T | 19540312 | rs4075240 | no | SEQ | R
#135 | FLJ22761 | intronic/E | C/T | 19540355 | not listed | yes | SEQ | U
#136 | FLJ22761 | intronic/E | A/G | 19540425 | rs10823321 | no | SEQ | R
#137 | FLJ22761 | intronic/E | A/G | 19540693 | rs11813186 | no | SEQ | R
#138 | FLJ22761 | intronic | C/T | 19540904 | rs1983128 | no | SEQ | R
#139 | FLJ22761 | intronic/E | C/T | 19546443 | rs7075720 | no | SEQ | U
#140 | FLJ22761 | intronic/E | A/G | 19559860 | rs7918950 | no | SEQ | U
#141 | HK1 | intronic | ins/del GAGAGA | 19605947 | rs3086615 (diff. alleles) | no | SEQ | U
#142 | FLJ22761 | intronic/E | C/T | 19537340 | rs10762265 | no | SEQ + D HphI | R
#143 | SUPV3L1 | intronic | A/G | 19503270 | rs10823313 | no | SEQ | R
#144 | HK1 | 5'UTR | C/G | 19589622 | not listed | yes | SEQ + D AluI | M
#145 | SUPV3L1 | intronic | A/G | 19503186 | rs10823312 | no | SEQ | R
#146 | SUPV3L1 | intronic/E | C/T | 19503913 | rs12414176 | no | SEQ | U
#147 | SUPV3L1 | intronic/E | ins/del GGTA | 19504176 | not listed | yes | SEQ | U
#148 | SUPV3L1 | intronic/E | A/G | 19504470 | rs2135587 | no | SEQ | U
#149 | SUPV3L1 | intronic/E | A/G | 19504574 | rs2135588 | no | SEQ | U
#150 | SUPV3L1 | intronic/E | C/T | 19507263 | not listed | yes | SEQ | U
#151 | SUPV3L1 | intronic | C/G | 19508639 | not listed | yes | SEQ | U
#152 | SUPV3L1 | intronic/E | A/G | 19508680 | rs7904007 | no | SEQ | U
#153 | SUPV3L1 | intronic | A/G | 19508856 | not listed | yes | SEQ | U
#154 | FLJ22761 | intronic/E | C/T | 19549043 | rs4746829 | no | SEQ | M
#155 | FLJ22761 | intronic/E | A/G | 19549156 | rs10762269 | no | SEQ | U
#156 | HK1 | intronic | C/T | 19604835 | not listed | yes | SEQ | R
#157 | HK1 | intronic | C/T | 19604841 | rs10998699 | no | SEQ | U
#158 | HK1 | intronic/E | A/G | 19604938 | rs10762278 | no | SEQ | U
#159 | HK1 | intronic/E | G/T | 19604944 | rs10762279 | no | SEQ | U
#160 | HK1 | intronic/E | C/T | 19605023 | rs10823334 | no | SEQ | R
#161 | HK1 | intronic/E | C/T | 19605061 | rs10823335 | no | SEQ | R
#162 | HK1 | intronic/E | C/T | 19605235 | rs10998700 | no | SEQ | M
#163 | HK1 | intronic | A/G | 19605468 | rs2394543 | no | SEQ | R
#164 | FLJ22761 | intronic/E | C/T | 19560436 | rs7092167 | no | SEQ + D BsrBI | M
#165 | FLJ22761 | intronic/E | C/G | 19548887 | not listed | yes | SEQ | U
#166 | FLJ22761 | intronic | C/G | 19545884 | rs2394531 | no | SEQ | U
#167 | HK1 | intronic | C/T | 19604833 | rs7916281 | no | SEQ | U
#168 | SUPV3L1 | intronic | ins/del GGAT | 19516947 | not listed | yes | SEQ | R
#169 | SUPV3L1 | intronic | A/G | 19513121 | not listed | yes | SEQ | M
#170 | FLJ22761 | intronic | A/C | 19541056 | rs1983127 | no | SEQ | U
#171 | FLJ22761 | intronic | C/T | 19541073 | not listed | yes | SEQ | R
#172 | FLJ22761 | intronic | A/G | 19541682 | rs10762267 | no | SEQ | M
#173 | FLJ22761 | intronic | A/G | 19541959 | rs7068302 | no | SEQ | M
#174 | FLJ22761 | intronic | A/G | 19542943 | not listed | yes | SEQ | U
#175 | FLJ22761 | intronic | C/T | 19543002 | rs7091301 | no | SEQ | U
#176 | FLJ22761 | intronic | A/G | 19543006 | rs7073527 | no | SEQ | U
#180 | FLJ22761 | intronic | C/G | 19544887 | rs7083041 | no | SEQ | M
#181 | FLJ22761 | intronic | C/T | 19545455 | rs10762268 | no | SEQ | U
#182 | FLJ22761 | intronic | A/G | 19548661 | rs2394532 | no | SEQ | U
#183 | FLJ22761 | intronic | ins/del T | 19554410 | rs5785905 | no | SEQ | M
#184 | FLJ22761 | intronic | G/T | 19554565 | rs1967236 | no | SEQ | M
#185 | FLJ22761 | intronic | ins/del A | 19555587 | rs11348828 | no | SEQ | M
#186 | FLJ22761 | intronic | A/C | 19557680 | rs1472815 | no | SEQ | M
#187 | FLJ22761 | intronic | C/T | 19558006 | rs1021964 | no | SEQ | M
#188 | FLJ22761 | intronic | A/T | 19565568 | rs1027793 | no | SEQ | U
#189 | FLJ22761 | intronic | C/T | 19565979 | not listed | yes | SEQ | U
#190 | FLJ22761 | intronic | A/G | 19566152 | rs925024 | no | SEQ | M
#191 | FLJ22761 | intronic | A/G | 19570691 | rs10823326 | no | SEQ | M
#193 | FLJ22761 | intronic | A/G | 19574362 | rs4746831 | no | SEQ | M
#194 | FLJ22761 | intronic | C/T | 19575262 | rs6480393 | no | SEQ | M
#195 | FLJ22761 | 3' flanking | C/T | 19578700 | rs10998681 | no | SEQ | U
#196 | HK1 | intronic | C/T | 19582272 | rs1472819 | no | SEQ | M
#197 | HK1 | intronic | A/C | 19582342 | rs1472818 | no | SEQ | M
#198 | HK1 | intronic | C/T | 19582488 | rs7903625 | no | SEQ | M
#199 | HK1 | intronic | C/T | 19582672 | not listed | no | SEQ | U
#200 | HK1 | intronic | C/T | 19582865 | rs7095547 | no | SEQ | M
#201 | HK1 | intronic | A/C | 19582936 | not listed | yes | SEQ | U
#202 | HK1 | intronic | A/G | 19583259 | rs1472817 | no | SEQ | M
#203 | HK1 | intronic | C/T | 19583550 | rs1472816 | no | SEQ | M
#204 | HK1 | intronic | ins/del T | 19583570 | rs10716029 | no | SEQ | M
#206 | HK1 | intronic | A/G | 19584395 | rs2394535 | no | SEQ | M

#207 | HK1 | intronic | C/T | 19584511 | rs2394536 | no | SEQ | M
#208 | HK1 | intronic | C/T | 19584852 | rs2394538 | no | SEQ | M
#209 | HK1 | intronic | A/C | 19585510 | rs10823328 | no | SEQ | U
#210 | HK1 | intronic | A/G | 19586894 | rs7905200 | no | SEQ | M
#211 | HK1 | intronic | C/T | 19587901 | rs10740318 | no | SEQ | M
#212 | HK1 | intronic | C/T | 19588772 | rs10823330 | no | SEQ | M
#213 | HK1 | intronic | A/G | 19590937 | not listed | yes | SEQ + D Tsp509I | M
#215 | HK1 | intronic | A/G | 19592050 | rs7907242 | no | SEQ | M
#218 | HK1 | intronic | A/C | 19592421 | rs10998691 | no | SEQ | U
#219 | HK1 | intronic | C/G | 19592682 | rs10998692 | no | SEQ | U
#220 | HK1 | intronic | A/G | 19593916 | rs10998694 | no | SEQ | M
#221 | HK1 | intronic | C/T | 19594936 | rs10998696 | no | SEQ | M
#222 | HK1 | intronic | A/G | 19596479 | rs7894213 | no | SEQ | M
#223 | HK1 | intronic | C/T | 19596600 | not listed | yes | SEQ | U
#224 | HK1 | intronic | A/C | 19596854 | rs4469773 | no | SEQ | M
#225 | HK1 | intronic | G/T | 19596868 | rs2394539 | no | SEQ | M
#226 | HK1 | intronic | C/T | 19597100 | rs2394541 | no | SEQ | M
#227 | HK1 | intronic | C/T | 19598438 | rs2063048 | no | SEQ | M
#228 | HK1 | intronic | A/G | 19599047 | rs10823332 | no | SEQ | M
#229 | HK1 | intronic | A/T | 19601465 | rs1810023 | no | SEQ | M
#230 | HK1 | intronic | A/G | 19601608 | rs2394542 | no | SEQ | M
#234 | HK1 | coding seq | C/T | 19603238 | rs4746836 | no | SEQ + D BsmAI | U
#236 | HK1 | intronic | A/G | 19591774 | rs10762275 | no | SEQ | U
#241 | HK1 | coding seq | A/G | 19649506 | not listed | yes | SEQ | R
#243 | HK1 | intronic | ins/del GT | 19649620 | rs10536437, rs10542467, rs10578071 | no | SEQ | M
#244 | HK1 | intronic | A/G | 19709423 | rs1227938 | no | SEQ | R
#245 | HK1 | 3' flanking | ins/del T | 19712920 | not listed | yes | SEQ | M
#246 | HK1 | intronic | G/T | 19605554 | rs2394544 | no | SEQ | M
#247 | HK1 | intronic | del A | 19605620 | rs5785909 | no | SEQ | M

SNPs identified only in the Spanish patient
#237 | HK1 | intronic | A/G | 19606398 | rs2394545 | - | SEQ | -
#238 | HK1 | intronic | A/G | 19606651 | rs3812689 | - | SEQ | -
#239 | HK1 | intronic | C/T | 19611776 | rs1108272 | - | SEQ | -
#240 | HK1 | intronic | A/T | 19649450 | rs7912524 | - | SEQ | -
#242 | HK1 | coding seq | G/T | 19649531 | rs10998724 | - | SEQ | -

251 Appendix B

APPENDIX B: LIST OF PRIMERS

Microsatellite marker¹ | Primers 5’-3’ forward²; reverse, or accession number (NCBI UniSTS database)
D10S581 | UniSTS: 50476
D10S1646 | UniSTS: 64851
D10S1670 | UniSTS: 42813
BA153K11CA1 | GCAAGTTCATCCTGGCTAAC; GATCTGGGCTCAATGCAAG
D10S210 | UniSTS: 32895
BA153K11CA2 | GTTACACTAAATGGAATGAGG; GTGTTATCATGTATGGTTAGG
D10S2480 | UniSTS: 38119
BA86K9CA1 | GGCAGGAGAATCACTTGAAC; GACACTTGGTTTAGCAGCTG
BA314J18TA1 | CCTGGGCAACAAGAGTAAAAC; GATAGTGAAGAGCTCAG
D10S1678 | UniSTS: 4255
D10S1647 | UniSTS: 66820
BA227H15CA1 | GGAACAGAAAGTCTTAACC; GAATATTGAGATGAGTGGTG
BA227H15CA2 | CACATAAAGTTAAGCCTGG; CCTGGACAATATAGCAAG
D10S1672 | UniSTS: 79906
BA227H15AAAC | CAGTGAGCCAAGATTCT; CACTGGATAGAGAGACAGA
D10S1742 | UniSTS: 48298
BA404C6AC1 | CAGCCAGAAGACAGATAC; CAGTCATTCTTCTCCC
BA404C6AC2 | CAAACCTCTTCCTTTAGCC; GATAATGGTGTTGTGGC
D10S1665 | UniSTS: 33169
D10S560 | UniSTS: 17166

¹ Microsatellite markers are listed in order from centromeric to telomeric.
² Forward primers were labelled with one of the following fluorescent dyes: Fam, Tet, Hex.

Primers for Tetra-Primer-ARMS PCR

Variant #005
Inner primer F: AAGGAAAGCCCTTCCTTCAGAGCAACCC
Inner primer R: CTTAACAATGAGCGGGGCTATATGTCTAGC
Outer primer F: ACTTAACGGGTGCAGCACACCAACAT
Outer primer R: AAGCCATTTCAAACCCGTCAGCTTTAGG

Variant #041
Inner primer F: GAGGAGAAAGTTTCAGAGGGAAGCCACCC
Inner primer R: GTGTCCCTGGGGGAATTACTGCGGGA
Outer primer F: TGGGGTAAGGGACCAGAGAGCTGCATT
Outer primer R: CATTGGTGATGGAGGGAAAACCCCCTTT

Variant #052
Inner primer F: TGGTTGTCTTTCTGGATGTGTTTTAGATG
Inner primer R: CGATTCTGCACATCCAAACAGTGCGT
Outer primer F: TAAATTCCACTGAAATCTGCTCCCACTG
Outer primer R: TGCAGCACCCTGTTATACCATAGAGGAA
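In a tetra-primer ARMS assay, the two outer primers give a non-allele-specific control product, and each allele-specific inner primer pairs with the opposite outer primer to give an allele-specific product of a different size. The sketch below reads a genotype from the band sizes seen in one gel lane. It is a minimal illustration only: the product sizes are hypothetical (they are not given in this appendix), and which inner primer matches which allele is an assumption.

```python
from typing import Set


def call_genotype(bands: Set[int], control_bp: int, allele1_bp: int, allele2_bp: int) -> str:
    """Interpret one tetra-primer ARMS gel lane.

    control_bp: outer F + outer R product (expected in every successful lane)
    allele1_bp: inner F + outer R product (assumed here to mark allele 1)
    allele2_bp: outer F + inner R product (assumed here to mark allele 2)
    """
    if control_bp not in bands:
        return "failed"  # no control band, so the reaction cannot be interpreted
    has1 = allele1_bp in bands
    has2 = allele2_bp in bands
    if has1 and has2:
        return "heterozygous"
    if has1:
        return "homozygous allele 1"
    if has2:
        return "homozygous allele 2"
    return "failed"  # control band only, no allele-specific product


# Hypothetical product sizes in bp, for illustration only.
CONTROL, ALLELE1, ALLELE2 = 450, 300, 210
print(call_genotype({450, 300, 210}, CONTROL, ALLELE1, ALLELE2))  # heterozygous
print(call_genotype({450, 210}, CONTROL, ALLELE1, ALLELE2))       # homozygous allele 2
```

Design choice: a lane without the control band is reported as failed instead of being called homozygous, so an unsuccessful PCR cannot be mistaken for a genotype.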

Primers used for RT-PCR of HK1 (mouse and human)
T1-F33-52: CTCAAGGCTTGCTAGGTTAG
T1-F10-29: ATCTTGCTGTGTTTGGACAG
altT2-F362-381: CTTCCTCTGTGAAGTGCTTG
altT2-F46-66: GATTGAAGCTTGGATCCGAAC
altT2-F246-265: TAGATCAGCTGGGAAACCAG
altT2-R239-258: CAGTCCATTGAATTCAAGCC
altT2-R362-381: CAAGCACTTCACAGAGGAAG
T2-R99-119: GTCCTTGCAATCCAACTGTGG
T2-R26-46: CTTCTCAGGTGTCTGAGGAAC
T3-R15-34: GCTTTTCTACTCTCAACAGC
T3-R60-80: GTTCTGTCCCATAGTGTAGAG
T4-R28-48: CTCACTTTCAGCAAGTAGATG
S1-F95-114: AGCTGAAGGATGACCAAGTC
S1-F100-120: CTGGCCTATTACTTCACGGAG
S2-R54-73: CTTTCTTGAAGCGTGTCAGG
S2-R68-87: GCCATTCTTCATCTCCTTCC
S3-F34-54: GGTCTTCCTTTCGAATCCTGC
S3-F35-54: GTCTTCCTTTCGAATTCTGCGG
S7-R6-26: CATGTAGCAAGCATTGGTGCC
S7-R6-16: CATGTAGCAAGCATTGGTGCC
Genomic R: CAGGATGGCAGCTTTCAGAAG (use with S3-F34-54)
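The RT-PCR primers are named by strand (F/R) and position. For example, altT2-F362-381 and altT2-R362-381 are exact reverse complements of each other, so they anneal to the same 20-nt stretch on opposite strands. The sketch below checks this kind of relationship. It is a minimal illustration that assumes unambiguous A/C/G/T sequences (no IUPAC codes).

```python
# Complement table for plain DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def are_reverse_complements(a: str, b: str) -> bool:
    """True if primer b anneals to exactly the same stretch as primer a, on the other strand."""
    return reverse_complement(a).upper() == b.upper()


fwd = "CTTCCTCTGTGAAGTGCTTG"  # altT2-F362-381
rev = "CAAGCACTTCACAGAGGAAG"  # altT2-R362-381
print(reverse_complement(fwd))            # CAAGCACTTCACAGAGGAAG
print(are_reverse_complements(fwd, rev))  # True
```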

Typing of the two putative HMSNR mutations by PCR + restriction digest

Variant #144 Putative mutation 1-F/R GCTGGGAAGTTACTGTGG/ CTCATGTGCCCACCACTG

Variant #213 Putative mutation 2-F/R CAGAATGTGGCCCAATCTAC/ GGCCAGTTAAATCTGAGATGC
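Typing by PCR and restriction digest relies on one allele creating or abolishing an enzyme recognition site in the amplicon. Appendix A lists AluI for variant #144 and Tsp509I for variant #213. The sketch below predicts fragment lengths after complete digestion of a linear amplicon. It is a minimal illustration: the amplicon sequences are hypothetical toy sequences, not the real HK1 flanking sequence, and only these two enzymes are modelled. Their sites (AG^CT and ^AATT) are palindromic, so scanning one strand finds every site.

```python
from typing import Dict, List, Tuple

# Recognition site and top-strand cut offset within the site.
ENZYMES: Dict[str, Tuple[str, int]] = {
    "AluI": ("AGCT", 2),     # AG^CT
    "Tsp509I": ("AATT", 0),  # ^AATT
}


def site_positions(seq: str, site: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) occurrence of site."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def digest(seq: str, enzyme: str) -> List[int]:
    """Top-strand fragment lengths (bp) after complete digestion of a linear amplicon."""
    site, offset = ENZYMES[enzyme]
    cuts = sorted({p + offset for p in site_positions(seq, site)})
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length pieces (for example, a cut exactly at position 0).
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Hypothetical 15-bp amplicons differing by a C/G substitution inside an AluI site.
allele_c = "TTGACAGCTGATCCA"
allele_g = "TTGACAGGTGATCCA"
print(digest(allele_c, "AluI"))  # [7, 8]  (site present, amplicon cut)
print(digest(allele_g, "AluI"))  # [15]    (site abolished, amplicon uncut)
```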

Primers used for sequencing candidate genes and ESTs
Positions refer to contig NT_008583.16.
Start | End | Primers forward / reverse | Size (bp)

Gene: CXXC6
18956676 | 18957027 | CATCCAAGATTGACACC / GAAGGTGGTTTCTGTTG | 351
18957758 | 18958100 | GAGGAAGTCTGTTCATCC / CCATTCTTGTGGCAGAAC | 342
18962641 | 18962967 | CCTTGTCAAGATTCATGGTTCTG / CATCCAACTCATACTCCTGATC | 326
18963208 | 18963567 | GAGGATAAGTGAGTAGG / GATCACTCTGGCTATTG | 359
18977873 | 18978278 | CATGGAGAACATCAAAAGG / CTCAAATCCTATTCTCCC | 405
18983700 | 18984187 | CCGGTTCTTTAGGCATCCATC / GAGGAAGCCAGGGACAATTG | 487
18992212 | 18992488 | GTGCATTAACCACCATCAGCC / GATCATGCCACTGTACTGCTG | 276
18993592 | 18993959 | GCTAAGCTACACCTTCATGAC / CAGTATACGTAAGGACTGCAG | 367
18997174 | 18997784 | CTTTCCCAAAGGCAACAACAC / CCAAGAAATACTGATGGGTCAC | 610
19001594 | 19005546 | CCAACCAACACAACATCAGCG / CACTTCGCCAAGCTGTGAAATG; CATTTCACAGCTTGGCGAAGTG / CTCAATCAAAACCGAGCCGTG; CACGGCTCGGTTTTGATTG / CACTGCACTAGCAAAGGC; CCATTGGGTCTGAAGGC / GCACCAGAAGCTGTGTAG; CTCTACACAGCTTCTGG / GAATGTGAGAGGACACAG; CTGTGTCCTCTCACATTC / GCTCTTAATGGACAAAGCC; GGCTTTGTCCATTAAGAGC / CAGATACTTTGCACCAG; GCAAAAGCAAACAGTTCCAG / CAAGGGTGTATTCCTGATAG; GCTTCAGCTGCTTCAACAAG / GGTAAACTGCAACAAGCGTC; GACGCTTGTTGCAGTTTACC / GGAGGTCATGTATCAGTG | 3952

Gene: CCAR1
19031694 | 19032279 | CACACCAACATGGCACATG / GCCATTTCAAACCCGTCAG; CCTTCCACAGATTCGACAAG / GCAAGCGCAAGACATATAGC | 582
19033210 | 19033600 | GGATGTTCATTTGCCC / CTTCCCAAAGTGCTAG | 391
19047786 | 19048140 | GTGCTGGAGTGAACATGG / CTGGCCCAACTATTCTTCC | 463
19048624 | 19049017 | CAAAGACTGGCTAAGGATCTGG / GAATCAATGCCACACCAGAAC | 394
19051367 | 19051771 | GTATCACAGCTGAACCTACAAGG / GAGCTCAGGAGTTCAAGACTAC | 404
19053146 | 19053606 | GCAGTGAGTCGACTTGAGATTG / GACCTGACTCTCAAGATGTAC | 461
19057922 | 19058612 | CAGTGCTGTCAAAGG / GACTCATTTACCAAGC; GTAGATGTGTCCTATTCAGG / CAAGTAGGTGACAGAGTATTG | 690
19059849 | 19060798 | CTCTCTCTCACGACTGAAAG / CAAAGTGCCGAGATTACTGG; CCAGTCGATATCCTTCATGG / GATGGTGGCAATGGTTGTAC | 949
19064485 | 19065173 | CTCAAGAGATGAGGCAAAG / CTGGAGCTGAGATTCAAG | 689
19065471 | 19065946 | GACTGTTGAACTAGTAGCGTAC / GAGGCTGAGGTATCTAATACTG | 476
19066157 | 19066568 | GAGATTACCTGCATTGACCAGG / GACTGACAGATACCAGAGGTAG | 412
19067069 | 19067524 | GATTACCAGTCGCTGTTACCAG / GACCAACCTCAAGTTCTTGCAG | 456
19067993 | 19068446 | GCTGAGGCAAGAGAATCTAC / CACAGATCTGGCTTTGATTGG | 454
19071902 | 19072258 | CTGCTCCATATATTCAGG / CAACAGAACGAGACTCAG | 357
19076590 | 19077069 | CACCATCATGCCAATGCACTC / GAGAGTTAGGAGCCATCAAAGC | 595
19082022 | 19082322 | CGTTGTATGTGGCAAG / CATCGCCTGATTTTCTCC | 301
19083832 | 19084084 | CAGTGAATACAATGTCAGC / GGGCAATATGGCAAAAC | 253
19096906 | 19097744 | GATATTTGCCTCATCCTAGC / GTGCAGTGGTATAGTCATAG; GATTATAGGTGTGAGCCAC / GCTAGGATGAGGCAAATATC | 838
19098674 | 19099484 | GACCTAGCATATCTTCCTGC / GAAGTGGAGGTTGTAGTGAG; GCAGTGGTTGCAGTAAGTTG / GAACCATGAAGAGTCTGAGTC | 810
19100491 | 19100958 | GGAAGGTCACAATGCTATC / GAAATTGACAGACATTGCC | 468
19101953 | 19102671 | GAGACTGATCAAGAGG / GAATGTGAGTTTCTGC; GAAGGCTAGGAATTACAAGC / GAGGTAAATGCTTCGTCACC | 718

Gene: DDX50
19211550 | 19212028 | GTCAAATGGACAGCGTGAGAAG / GCCAATCTCCATAAAGGAAGC | 479
19212071 | 19212458 | GGCCCGTTGATATCTCTCAC / CATCAGGGACAGACAGTTCC | 388
19217518 | 19218058 | CTGTAGATTGAGCTGAACTGG / CTGTACAGACTCTAGTGCAAC | 541
19221160 | 19221455 | CAGTAAATGAGGAAGAGTTGAG / CTATAATCCCAACTACTCAGC | 297
19221842 | 19222237 | GTGGTAGGAATAACTGTGTC / CTAGAACTTAGGACTCGTG | 396
19223862 | 19224627 | GGAACCTCACTGAAGAGAC / GAGAAGAGAGAAATGCCACC; GTTTGATAGACGTGCAAAGGC / CAAGGACAGAACAGATACC | 765
19224832 | 19225209 | GCAATGAAGTCTGTGTG / GATTAGGCATGAAGAGG | 388
19230608 | 19231048 | GTGCTGTTGAGGAGAGTC / CAGGTCACAGGATTACAGG | 441
19244977 | 19245383 | CAGTAAGCATGCATTG / CTAAGAGCAAGAAGAG | 407
19245649 | 19245969 | GATTCCGGAATGACTGCAAC / GATCTCCACAGGCTTCACAG | 321
19246782 | 19247068 | CTGATCTTTTAATCTGCTAGGGC / CTATTTGAACTGGGAGAGCTTAC | 287
19247808 | 19248184 | CCCTATGTCTTGACC / GGAAGAGGTGATTGTC | 377
19251866 | 19252214 | GGCTCATAAGAAATGCACAG / GCATTATTCATGCCATTAGGC | 350
19254028 | 19254351 | CATTAAGCAGCCACTGAGTC / CTGAGAGAGACTCCATCTC | 324
19257143 | 19257858 | CGTTAACAGAATAGATGGCC / CCACTTCGTCTTCTACCATC; GGTAGACAGAGTCGACAAGG / CTGGGATTACAGCCATGAGC | 716

Gene: DDX21
19266975 | 19267341 | GAATTTACCACGCGGAAAGC / GTCTGGACGCGCTGTTATC | 367
19270563 | 19271280 | CCTTATCACAGCATGAAAG / CTTCATCTTCTTGGGCTTAG; GCAGAGCCTTCTGAAGTTGAC / CACAAGGTGCTGACACATCAC | 718
19272919 | 19273211 | CTCTTTCTCAGAAGGTTCTC / GAATACACAGGCCTACTTC | 293
19274070 | 19274451 | GTAGATGTTGACCAGAAGGTAG / GGTATGTACACAATGCCCAAG | 382
19276167 | 19276569 | CGTCATCATCAGCGTTAGACC / CAAGCAGGACCATGTTCTTGC | 403
19277760 | 19278199 | CTAGATTAGGAGTGCCATGAG / GGCATGAGAGTCACTTGAAC | 440
19279778 | 19280305 | GTGGTTGCTGGTGAACATTG / GGCAGGAGAATCACTTGAAC | 528
19281027 | 19281336 | GTGTTGTGCTCACTGATC / GACCAAGAGCAGATATGC | 310
19282726 | 19283197 | CAGGCCTAACTGATGAAGAG / CACTATATCCCCAAGTCACC | 472
19284319 | 19284694 | GAAGTGCTAAGGGACCTTCTC / CTTTCAGTCATTGCCACACAC | 376
19285404 | 19285784 | CAGAAACCTGGAGCTCTCTGTAC / CAACATAGTGAGACCCTGTCTC | 381
19288190 | 19288748 | CACCGTGGTGAACTTCTTG / CTGTGTCTGGCCAGAAATTG | 559
19289604 | 19290000 | CGAGGGACTGTTCGTACTAATTG / GATGAGAGAGTGAAGAGATGCAG | 397
19292155 | 19292726 | GTAGCTAATAGCCACTAGG / GAGAGCTGGATATCGTTCAC | 572
19293323 | 19294608 | GACCAGTGACTTGTTAAGCAC / GTCTGTGTCTGAAAGAGACAG; GCGGAGTTTCAGTAAAGC / GACAGGTGAACTCACAAAG; CACATGTATAAGGTGGCTC / CATACTCTTGAGCAATGC | 1286

Gene: KIAA1279
19298716 | 19299089 | GGACTTAGCTCTTGTTACCTTGG / CAAGAGGTGGAGGTTACAGTGAG | 374
19299186 | 19300272 | GATGGGTCTTGCTATGTTGC / CAATACGCTCGGCAATGAC; CCATGAGAATGGTTGCTG / GTGGAACTCGATGACTGC; GCAGTCATCGAGTTCCAC / CGAATTGCGCCTTACAAC | 1087
19311169 | 19311581 | GCTGCACTACATAAGGTC / CAGCTATGAGGTGACTTC | 413
19315710 | 19316197 | GAGGTTACAGTAAGCCGAG / GTCTCAGGTTTGAACAAGGTC | 488
19316490 | 19317101 | GGCAGGAGAATTGCTTGAAC / GTTGAGATAGCGTGCCTATG | 612
19319492 | 19320020 | CAGTTGTCTTTCAATCCGAGTGG / CTTCTCTCCAATGGTTCAGAGAG | 529
19321591 | 19322105 | GTGTTTGTGGTGCGAGAAAG / CCGGATTCAAGCAATTCGAAG | 515
19326322 | 19328095 | GATACCCTTCTAAACCAGGCTG / GCAAGCACCTTAAACAGAGCAC; GTGAGCTACTTGAGACCTTTAG / GAAGAGCTGGTAGTACTTCAGTG; GTTGGTCAACAGACAGATCC / GGCCATCTTGGTTCTGAATC; CCAGGAAATAGAAGTTGAGC / GCTTGTTACCAGGTACATTAC; CTGAGTGTTTGCTAGGATCC / GATTACAGGCATGAGACACG | 1774

Gene: PRG1
19398504 | 19399299 | CCGACAACTCATTCTTGAAGG / GCACGCCAACTTCTTGATTAG; GTTCAGGAAATTGTGACGTGTG / CATATTAAAGGGTTCAGCGACC | 796
19407970 | 19408313 | GTAACAATGTGGGTTCCTCGGGC / CTGAAATCCCAGCTGCTCCAGAG | 344
19414790 | 19415790 | CCCACATATCAAACTCCACTGG / GTGTCAAGGTGGGAAAATCCTC; GGTCAACATGGATTAGAAGAGG / GCCATTTTATGGCCATGGGAATC; GTTCAACCAACATC / GAAAAGGGGCATATC; CACAGTGAATTTGTAGAGTGGGG / GACTTCTGGAGAAACGTCTG | 1001

Gene: VPS26
19434141 | 19434351 | GGCTCACACTGAGGATTAAG / CTATAGGACCTCCAGATGAG | 211
19434748 | 19435331 | GCCTCCAGATGAATCCTTC / CTCTTCCCTCAGGAGAACTC; CTCCAAGCGGCCTCTC / GTGGCCCGACAGGCCTAAC | 584
19443759 | 19444029 | GTAGTGACTGACTAAAC / CACAGACCACTTTGATAAC | 271
19466659 | 19467029 | GGAAGCTACTAGTAGG / CAAGGCTGGTAGTTTC | 371
19467841 | 19468246 | GTGTTTGAAGTAGATGTCTGG / GTTAGAACGCTCTATTACCTC | 406
19468865 | 19469253 | GAGAAGCCTGAGTTGCAG / GCTCAGAGATCCTGTTG | 389
19473293 | 19473551 | GTGAGATATGTTCTTCC / GCCAGCAATATTAACTAGC | 259
19476886 | 19477180 | GAAGGCATAAGGCTAG / CAGTCTTCTTTGCAGCTG | 295
19479302 | 19479614 | GTGAAGCATTAGTGTCAACTC / GCTTAGAAGAGTGCTTTCGAG | 313
19481998 | 19483792 | GTGGCAATCAATAGAGATAAGGC / CTGCGCTGTAAGATAAAGGAAG; GGCCAAAACTCCATATATG / GCATGTAGAGAATAAGAGG; CATGTGAGTAGGCTGG / GCCATAATGACCTGCCAC; GGTCAGGAACCAACTTTACTGG / CATGCAAACAGCAAGCTCAG | 1795

Gene: SUPV3L1
19491177 | 19491548 | CAGACAGTGTAGAACCTGC / GATGTAGCTTCCTCCCAAC | 371
19493804 | 19494065 | GTCACCTTGCTGCCTTATTC / GGAAGAACAAAGACTACCAGG | 262
19496017 | 19496403 | GTAATTCCATTGGCAAGGAGC / CGAGTCAAATAATGCAGCACC | 387
19496804 | 19497032 | GCTGCCTTTAGTGTTATTGC / CCAAGCTGTACCGTAAGAG | 229
19497284 | 19497577 | GTGTGGTGTGTCTGTGTGC / GGATGTTACCAGACCCACTC | 294
19498459 | 19498842 | CCGCAACTTAGCTTGTTAAG / GTAAGATACCAGTGAATTGCC | 384
19500114 | 19500542 | GCTGTCTTTGCCAGGATG / CAGAGTCAGTTGGTAGCATC | 429
19502477 | 19502780 | GTGGTAGTGGAGAGAACAC / GTACCATTGCATTCCAGCC | 303
19506001 | 19506225 | GGTAGTGGTGATGGTGATG / CAGCGCAGATGTCATAAAGC | 225
19507816 | 19508090 | GCCAACGTAAAGTACTTTGGC / CTGACGCTGGCATTATCTAG | 275
19509230 | 19509578 | CACAAGAAGTTGAGTTGG / CCAATACATCAAGCAGC | 349
19509922 | 19510268 | GAAGAACAGGAGGTGAG / CCTCAATAGGTGTAGGC | 347
19511122 | 19511498 | GGAGTGAGCATTATGC / GTGAGAGTCTGGATGG | 377
19513254 | 19513536 | GCCTGAATGGGACTTTATCG / GCAACTTCTCAAAGGCAGAC | 283
19513609 | 19514142 | CCTCTCAGTATTTCAGGTGG / CTGGCTCAGTGCTTTACAC | 534
19518650 | 19519064 | GACTTAAGGGAATTCTGACAC / CTCACCTAGCCATCTTAACTAG | 415
19519427 | 19520049 | GTCATGATGTAGGTGTGCC / GTGTCCTTCTAGCTTGGCTC; CAGTCACGATTGTCAGGAACC / GCCACAATACTAGAGGAGG; CAGTCACGATTGTCAGGAACC / CAGGAACAGAAAACTAGTCCG | 623

Gene: FLJ31406
19526167 | 19527816 | GACAAGGAATGCTCAGGAAG / CTTGTGCTTCAGTCTCCTG; CCTTTCAGTCCACTCAG / CCAGAAGTCTCATTTGC; CTAACACAGTGGAAAGTGCAC / CTCCCACACAGTTTCTGATTC; GTACTGAGTATGGTGGGAG / CTGTGACCACTGACATAGC | 1650
19533717 | 19534230 | CCACCCAGCAAATATGAAAG / GACTGGTTTATGGCTGCTG | 514
19536601 | 19537001 | GCTCACTGGGATGGATTATTG / ATTTGTCTGTCGGGCTTATGC | 401
19543170 | 19543642 | GAGGGTCTACTACAGATATGGC / GTTTAGTCCCTGTCTCCTCTC | 473

Gene: FLJ22761
19531072 | 19531629 | CATCAATACCCTGCTTCC / GAAGTTCCGAGTGAAGTG | 558
19537995 | 19538377 | CCACATCTCAATGCTGTC / GAGAGAGAAGCCAAGGTAC | 383
19543568 | 19544207 | GTGGCTGTGATTTGAAGAC / CTTCCAGTTTAGTCTGTCG; GTCAGTTCTACCCAACG / CAAGTTCTCTGCCTCAC | 640
19549842 | 19550172 | GCTTAGAGAAGAGAAGGC / CTTCTCTGATTCTCCCTC | 331
19551404 | 19551745 | CTGAACCTTGATGAGGCCTTG / GAATTCACTCAGAGCTCGCTG | 342
19553995 | 19554378 | GTGGTACATTGAGGAGGACTC / GGATAGCTAGTCACAACTGG | 385
19555614 | 19555917 | GTAGTTCCAACCACTCAGG / TTGCTTGAACCCAGAAGG | 304
19556837 | 19557326 | GTTCTTGCCATCCAGCACTTTG / GAACCTCTCATCTGAGTCATGG | 490
19558189 | 19558582 | GAGAGGTGGAGAAGCTTGTTC / CAGGATAGTCCAGAAGACACTG | 394
19559090 | 19559753 | GCACAGACTGGTTGAAATG / CTCCATTAGAGACGTGAAC | 662
19560973 | 19561698 | GGACTTTACCACAGAAGTGG / GATGCACTGCACAATGTGATC; CAACAAGATCTTCGCCATCC / CTATCATGTGTTCCCAGCTGC | 726
19567842 | 19568465 | CTTGACAAGCAGCCAGCTATAG / CAAGCCAAACAGGTTTAGGAGG; CATCAAGAGGAGAAACGTAGG / GTAACTTCATCTCTGCCTTGG | 624
19569540 | 19569934 | CAAGGTTAGCCAGCTATGAC / CCTTAGGTCTGGTAGTGTTG | 395
19571890 | 19572367 | GATGATAGCTTGCAGTGC / CGATCCTGAATCAGTTGG | 478
19572651 | 19573418 | GTTTGCATCAGTATCCAGAC / CACCATTTACGATGTCTAGTG; CACTAGACATCGTAAATGGTG / GATAACATCCAGTGCTAGC | 768
19576080 | 19576900 | CTCACACCTTGTCTGTCAG / GTGGACCTTCTTGAGTCTC; CAATAGAGCAGGGAGTCAG / GCACGATTTGTGGGAAAG | 821
19577393 | 19578594 | GACTTTGAAGTATCCAGCCC / CCTGTTGGTAGGAAGTTTGG; GAAGAAGCTGTTTAGAGCC / CTCAGGACAGACACTTTG; CTCGGGTACTCTTAGTATC / CAATCAACCTCTCTGCAC | 1202

Gene: HK1
19580821 | 19581097 | GAGACACAAGGTATCTAGGC / CTGGGTAACATAGCGAGAC | 277
19581874 | 19582106 | CCATTCTTCTTCCTCACC / GGAAATGGTCAGGAACAG | 233
19589405 | 19589784 | CCAGTCCTGATTTCTGAG / CATCTTACCACACAGTCC | 380
19590943 | 19591241 | CCTGCGAGAATCTAGAGAAC / CCTGGCTAACACAGTGAAAC; ACAGAGTCTCGCACTGTC (sequencing only) | 299
19593313 | 19593722 | CAAGTCCTGGAGTAACATCC / GGACAGGAAGCTACTTAATGC | 410
19599464 | 19599788 | GTGTCTGCACTATGTCTTGG / GAGGCAGAGGTTATAGTGAG | 325
19602905 | 19603586 | CTGGATTGTTCTGGAAGCTC / CAACCTGTAAAGTGCCTC | 682
19606293 | 19606740 | CACAGTGCCTGGTACATTTG / GTTGGAGGTCAATACTGCTC | 448
19611514 | 19611956 | GTCTTGAACTCCTGACCTTG / CCACACTTGTTTAGACTGCC | 443
19626596 | 19627058 | CTCAGGAAGCTGTCTGATTG / CATGTCCACTCCATTTGTTC | 463
19629493 | 19630099 | GACGTGCAGGATGATGC / GAGACACTGACAATGCCG | 607
19648826 | 19649755 | CATGTCCAATCTTGCGTTTG / AGACACCAGACATCAGACAC; CAGGACAAGTGTCTGGAAAG / CAGCATCTCCAGATCTTTAC | 930
19654592 | 19655042 | CAGCAGTTGAATGACCTTCC / GATGGATGGATGGATGGATG | 451
19670667 | 19671159 | CATGATTGCTGCTTTCTGGC / GGAGAGCCTCTGTTTAAG | 493
19675595 | 19675921 | GTATTCCTTTCTGTGTGTGTGG / CATGCAAACCATTCTGCCAAC | 327
19679180 | 19679699 | CCACATCCACTTGCTTTC / CCTCATCATTCCACACTG | 521
19680068 | 19680637 | CTTCTGCCAAGCGCTGTTAAG / CATGTAGCAAGCATTGGTGCC; CTGTGGTGAATGACACAGTG / GTGATCACTTGGCTTTATGCC | 570
19687691 | 19688078 | GTTCAGTCACTCAAGCACATGG / GACTCAAGAAGTAACAAGGGAC | 388
19690617 | 19691195 | CTCTTCTCAGGCATTTGACAG / CTGTTCTCAAACTCCTTCCTC | 579
19693346 | 19693806 | CCAATAAATGCTCAGTCCAGCTG / CATCACCAGTGTGAAGGCATC | 461
19695158 | 19695933 | CCTTGAAAGTCTGGAGTC / CATCTAGGAATGACACTGAGG | 776
19697104 | 19697783 | GATGTTTGAACTCCGTCTCC / CAGTGGAGTCAGGAAATAGG | 689
19700027 | 19700342 | CAGTGCAACAGACAGAAC / GGAGCTGGTCAAGAAATG | 316
19702815 | 19703342 | GAAAGTCAGCTAAGCAGG / CCAGTCTTGAAACCTCTG | 528
19705593 | 19706084 | AGAGCTATCAGGCAGCTGAG / CATCACAATCCACCAGCCAG | 492
19709391 | 19709946 | CTCTGTCTGTGGATGAGTAC / CACTTCTCTGTCTCCCTAAG | 556
19711745 | 19712984 | CTTCCGATGCTTATGTCCTG / GTTGCACTGAGCTGAGATTG, sequenced with GCACACGCTCTATTCTGTAC / CATAGCATTAGCTGCTTCCTC and CTCCAGAATCATGCACCAGAC / GATGCCACTACACTTCACACG; GGTACAGAATAGAGCGTGTGC / GAGACAAGACACATTTCCGC | 1240

Gene: TACR2
19715244 | 19716091 | CAGTTTCCCTGTCTTGCTTC / GAGCTGAACTTTCAGATGTGG; GATCCCAATGTGGCAGTGTTG / CTTGTCATGGTGGAAAGTGG; CTCTGGAAGGGAGTCAGTTTC / CAACACTGCCACATTGGGATC | 847
19717896 | 19718233 | GGCTTGGGCACTAAGAAGTAAC / CAGGAATGAGGATGGATGCTTG | 338
19719654 | 19720047 | CTGGCAGACTTCGGAGTC / GGCCTTGAGACCGTAAGG | 398
19725769 | 19726102 | GAGCTGGGCTAACAGGAG / CATGGATGCATGCACGCGTG | 338
19726728 | 19728132 | CTTCATCGTCAATCTGGCGC / GCCATTCCCATGGTTCTACC; GCCAGGTCCTTTGTTCCAGAC / GCATTGAAGGCAGCCATGCAG; CTTCCATCTCTTCAGCGAAG / GGCTTCAGTCACAATGTCAC; CAGAGAACGGCAGGGAAG / CAAATCACAGTGTCAGAGCAG; GTAGACAGAGCTCAGAAAGG / CTTCCCTGCCGTTCTCTG | 1405
19728292 | 19728522 | CATCTTCACTGCTTCACACAC / CTAGGAGGAGAAGCCAGAAC | 231

Gene: NET-7
19761899 | 19762694 | GAGCAAACTTCAGACTCCAG / GGAAACGCCTGTGCTAATG; CATTAGCACAGGCGTTTCC / CTTGAGCCAGAGGTAGGAG; CTGTGATTGCCGCGCTCCAGT / GCGGCTAAGTGTGGACCCAAC | 796
19794512 | 19794897 | CACAGCAGGTACCACAGC / GGAGGTGGTTTGAGGCAG | 386
19796020 | 19796249 | CTCCAGGTGAGTCTCTGAATTC / CAGCAGAATGAAAGCTCCATGG | 230
19806397 | 19806678 | GCTGCATTGGCACAAGGCATTC / CCATCTTCTTGAGCAGG | 282
19809091 | 19809396 | CATAATAAACTGCCTCCGTCTC / CATCTTCAAGTTGGGAAGAGAG | 306
19815212 | 19815527 | GGTCGGAATCTCAGCTCCATCC / CTCTGGGTCACAGCCAGGAATG | 316
19816918 | 19817264 | CTGAGGATCACAGCGGTTG / CAGGCAAGTCCTGGACTTC | 347
19817647 | 19818523 | GACTGGCTGCTTGGGTTTCTG / GACTGAGAGGTGCTATCCCAG; CTGGGATAGCACCTCTCAGTC / CTGCCTTGAATACCACCGCCAAG; CCATTTCATCTCTGGCAGTG / GGGCAAGAATAAGACATGCG | 877

Gene: NEUROG3
19882875 | 19884773 | CAGTGACGGACTCAAACTTACC / GTACAAGCTGTGGTCCGCTATG; CTGTGAAAGGACCTGTCTGTC / GCTGAGAGACCAAACATTGG; GAGCGCAATCGAATGCACAAC / GAAGCAGAAGGAACAAGTGC; CTCGGAAGACGAAGTGACCTG / CTATGCGCAGCGTTTGAGTCAG; GCTGCTCATCGCTCTCTATTC / CAGTGCCAACTCGCTCTTAG; GAAGCAGATAAAGCGTGCCAAG / GGTAAGTTTGAGTCCGTCACTG; CTTCCTGTGCGATTTCAGAC / GATTTCCTGAGCAGCAAGTC | 1990

Gene predictions and intergenic sequence
19736344 | 19735880 | CACCTATTTCCAGATGCCTG / CTCTCTTGTTTGAGGCACTG | 465
19734094 | 19733468 | GCCAGTATTCTCGGTCTTATC / GAGAGAAAGGTCAGAGTAGCC | 627
19732373 | 19731865 | CTGCCCACTTGAAATCC / GGCAGGAGAATCACTTG | 509
19353101 | 19353566 | CTGGGAACTTACACAGACTG / CATACGTGTCAGCAACAGTTG | 466
19353294 | 19353876 | GTCAAAGGACACAGAGATG / CTGCTTTACTCAGAGTCTAC | 583
19354004 | 19354395 | CTTCTGCTAGAATTTCCAGCC / CTCTGAAAGTGCTGGGATTAC | 391
19361987 | 19362347 | CTACAGAGTGAGACGCCATC / GCTGAGAAGCACTGAAGATTG | 361
19367365 | 19367877 | GAGAGCAGTTGTTCCTTGAG / CCAACTGGTCACTGTTGTG | 513
19368024 | 19368547 | GCCTGAAGTAAACGACAGAC / GCACTGGAATCAAGAAGTCAG | 524
19368680 | 19369061 | CAAGAACCAACCACAGTGAC / GGCATAACTCTCTTGAACCTC | 382
19372510 | 19373026 | GGAATTACACAATCTCCGCTC / GTTGAGAAAGGACAGGCTCTG | 517
19376428 | 19376950 | CTTGGAGTATAGAGGAAGAGG / CTGAAACTACCTCCACACAG | 523
19380195 | 19380727 | CCAAGGCACTACAAAGGATTC / GGTCCAAGTCCTGAGATAAGC | 459
19396754 | 19397235 | CAGGAGGTGGAGGTACAC / CCTGCTGATTCCTTTACC | 482
19353101 | 19353799 | CTGGGAACTTACACAGACTGC / GTTTGACCAACAACTGGGCAC | 699
19353991 | 19354391 | GCCACAACATAAACTTCTGC / GAAAGTGCTGGGATTACAG | 401
19355881 | 19356281 | GCTGGTCTCAAATGTCTGAC / CACAGGAGAACTACAGAGTG | 401
19827753 | 19828257 | CATTAGGCACACTGAGG / CTTATTCTTGCCCTTCC | 505

Intronic in HK1
19589169 | 19589572 | CACTGCTTACACTATGGCTG / CTACCACAGTAACTTCCCAG | 404
19599912 | 19600283 | GTCTCAAACTCCTGACCTC / CTCCTATTTCCAGCACAC | 372
19611154 | 19611635 | GTCATCGTCTGGTTTCTCAC / CTGACTTCTATCACCAGAGC | 482
19603550 | 19603739 | GGTCTGGGCACTAAGATAG / CTCTGCTCTCAACTTGGAG | 190
19605643 | 19606021 | GATGCCTGTAATCCCAACTAC / GTCTGCGAAAGATGAGTTAGC | 379
19621309 | 19621559 | GCTGCTCATATTTGGCTC / GTGACAGAGCCAGACTTTG | 251
19628794 | 19629319 | GAATCCTAAAGTGGCTGTC / CCATTCCTCGGAAATGAG | 526
19639220 | 19639708 | CTGATTACAGGCTCCAG / GGAAGTGACTGCTAACG | 489
19645852 | 19646228 | CTTTGGGAGGAAGAAGGAG / CTTTCACACAGTCTGGGAAG | 377

3’ flanking of FLJ22761
19578880 | 19579228 | GCACCACCACATTCAAC / CCTTTCTGTTCTGCCAC | 349

3’ flanking of NET-7
19818511 | 19818910 | CTCCAAGAACTGCCTCAAG / GAGAGATGAGGTTGTGGTG | 400


ESTs
EST accession | Start | End | Primers forward / reverse | Size (bp)
BE887501 | 19493259 | 19493590 | GTCGCGAAATAGATGAG / CCAGTCAAAGGAGATGA | 332
AW592441 | 19497526 | 19498221 | GCTCATTTCTCTAGAAG / CATCTTAGTGGCTCTGAG; CAGCACTGAGGATACAACG / CTGAGGTTACAGGCCTTATC | 696
AV721871 | 19498322 | 19498624 | CTCAGTCTCAGCTCAGTGT / CCTTACAGTCCAACACAGG | 303
BU168259 | 19502735 | 19503425 | CAAGACAGAGTCTTGCTCTG / GGATCGCACGTTTGTCATAC; CCAAAGTACTGGGATTACAG / GTGGAGTAAACAAGCAAGTG | 691
BG778902 | 19503221 | 19504363 | CTTTGTACCACTCTCCAG / GGCAGTGTACATAACAGG; CACTTGCTTGTTTACTCCAC / CCTGTAGTTGTAGCTACTCAG; CAAGCAATCCTTCTGCCTTAGC / CCTATTGATTCCGGATAGGTGG; CAGTGGGATTACAGGTATAAC / CTACTTCGTGATCTGGATATG | 1143
AA764876 | 19504391 | 19504917 | GTCTCACTCCTGTCGTTC / CTCTCAGGAGCAGAAAGTG | 527
T68413 | 19506202 | 19506458 | CCGTGCTTTATGACATC / CTAGCATGAAACACACC | 257
AW589252 | 19506792 | 19507425 | CACAGCTGTTTGCCAACATG / CCAGCCAAGTTCAGCTATTTG; GAGTTCTTGAGGTCCATAG / CCATGAGACAGAGTCTTG | 624
BF913205 | 19508501 | 19508918 | GACAAGGTCTCACTCTGTTAC / GTCTAACCTTCTGTGACTTGTC | 418
AW812661 | 19512253 | 19512695 | CCTAACGGTGTAATGTGTAGC / GGAGGATTACTTGAGGACAG | 443
AW629728 | 19513069 | 19513647 | CACTGCCAGATCATCATAGG / CACTCCCTCATATCCTTCC | 579
BG994573 | 19516540 | 19517088 | GTTCAGGCGATTCTCCTGTC / GGCCAAGCATGAGTTCTCAAG | 549
BG979125 | 19536981 | 19537454 | CAATAATCCATCCCAGTGAGC / AGCTAGTATTACAGGCAGGTG | 474
AW813580 | 19540141 | 19540975 | GATGGGATCAGTCGTGTTCTG / GATGCTTTGTGTGGTGAGAGG; CTCCTCCTGATACAGACAAG / CAAACTCAGGACTTCTCTGG |
AW813635 | 19545801 | 19546080 | CCATCATCTATACAGGGGACAG / GATTACAGGCATGAACCACCGC | 280
| 19547713 | 19548173 | CAAGGAAGGGAAAAATGC / GGTTAAGACTTTCTGTTCC | 461
BQ638264 | 19546071 | 19546336 | GCCTGTAATCCCAGCACTTTG / GTTTCTGACAGAGTCTCGCG | 266
| 19548752 | 19549254 | CTCGCGGTGAAATTCAGAG / CAGAATGCATGGTTTGCCAG | 503
AW850137 | 19546236 | 19546680 | GGCAGGAGAATCACTTGAAC / GATAAACCATCCGTTCACACC | 447
BQ321426 | 19551610 | 19551906 | CATGATGACCTGTGCCTATG / GAAAGTTCCAGGAGCACAG | 297
BF822822 | 19551816 | 19552269 | CGACAAGTGTTTACTGAGCAC / CTCTGCCTCTCACTTGAATG | 454
AI581736 | 19552983 | 19553551 | CCATTCCTTTACCACTCC / CAAACACTTGAGCACTGG; CAGTATCTTCCCTTGGCTG (sequencing only) | 569
AI673121 | 19559552 | 19560000 | CTGGAGTATGGGCTGAAGAAG / GAGAAGCAGCAGGAAGACAAG | 449
AA018266 | 19564559 | 19564969 | TGGCAGGCACTCAAATC / TTCTGGGATGACAGGAG | 411
| 19564805 | 19565364 | CGAGACTGGCATAAAGC / GATGGATGGTGGTGATG | 560
AA018233 | 19566263 | 19567021 | GTCCAGTTTCCCAAGAAC / CACTTGGAGAAAGGACTG; CAGATTCTTACTCTGTTGCC (sequencing only) | 759
BF754448 | 19571760 | 19572159 | GTAGCACAAAGCCAAGGAG / GGAGACGCTCTGAAATCTG | 400
| 19572350 | 19572738 | CCAACTGATTCAGGATCGC / GGGAGAGGAAACCCATAG | 389
AA738258 | 19560258 | 19560756 | CCACTTGGAACGTTACCAAAG / GGCATCCAGTCAAGGAAATG | 499
| 19577011 | 19577555 | GGCTAAATGTGTGAGGTTG / GTTCCTTCACAGTTTCCTG | 545
| 19597885 | 19598290 | GGTTATGCACTCCATCAG / GGAAGGAGTTGGATCATG | 406
BE162082 | 19601679 | 19602321 | GTGATTACAGGCGTGAAC / ATTCCTGGTGGCATTCTG; GATGTCATTGCCACCTCTC / GGCTCCAGGCAATAAGAAC | 643
BU661976 | 19604791 | 19605663 | GAGACAGAGTCTCCCACTTTG / CAGCCACACTAATTCTCCTC; CAAGTCACATGGCCAGTTCTG / GTAGTTGGGATTACAGGCATCC | 873
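In many rows of the primer tables above, the listed product size is within 1 bp of |End − Start|. For example, 18956676 to 18957027 gives 351 bp. Rows in reverse orientation, such as 19736344 to 19735880, list End below Start, so the check uses the absolute difference. The sketch below flags rows that break this pattern. It is a minimal illustration that assumes the 1-bp tolerance convention. Whether a flagged row reflects a typo or a different convention (for example, a size quoted for a longer product) cannot be decided from the table alone.

```python
from typing import List, NamedTuple


class Amplicon(NamedTuple):
    start: int    # contig NT_008583.16 coordinate, as listed
    end: int      # may be smaller than start for reverse-oriented rows
    size_bp: int  # product size, as listed


def size_mismatches(rows: List[Amplicon], tolerance: int = 1) -> List[Amplicon]:
    """Return rows whose listed size differs from |end - start| by more than tolerance bp."""
    return [r for r in rows if abs(abs(r.end - r.start) - r.size_bp) > tolerance]


rows = [
    Amplicon(18956676, 18957027, 351),  # CXXC6: span 351, consistent
    Amplicon(19047786, 19048140, 463),  # CCAR1: span 354, flagged
    Amplicon(19736344, 19735880, 465),  # intergenic, reverse orientation: span 464, consistent
]
for r in size_mismatches(rows):
    print(r, "span =", abs(r.end - r.start))
# Amplicon(start=19047786, end=19048140, size_bp=463) span = 354
```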

257 Appendix C

APPENDIX C: PUBLICATIONS

Journal publications:

Colomer J, Iturriaga C, Kalaydjieva L, Rogers T, Hantke J, King RH, Tournev I, Thomas PK: HMSN-Russe in two Spanish patients: Distinctive features of the disease and current genetic findings. Acta Myologica 2001, XX:202-209.

Hantke J, Rogers T, French L, Tournev I, Guergueltcheva V, Urtizberea JA, Colomer J, Corches A, Lupu C, Merlini L, et al.: Refined mapping of the HMSNR critical gene region--construction of a high-density integrated genetic and physical map. Neuromuscul Disord 2003, 13:729-736.

Guergueltcheva V, Tournev I, Bojinova V, Hantke J, Litvinenko I, Ishpekova B, Shmarov A, Petrova J, Jordanova A, Kalaydjieva L: Early clinical and electrophysiological features of the two most common forms of Charcot-Marie-Tooth disease in the Roma (Gypsies). Annals of Child Neurology, in press.

Abstracts:

Rogers T, Hantke J, Angelicheva D, Tournev I, Colomer J, Corches A, Thomas PK, Kalaydjieva L: Refined mapping of a novel autosomal recessive Motor and Sensory Neuropathy – Russe. 25th Annual Scientific Meeting of the Human Genetics Society of Australasia, Cairns.

Hantke J, Rogers T, Chandler D, Kalaydjieva L: Looking for the gene for a severe autosomal recessive peripheral neuropathy. 28th Annual Scientific Meeting of the Human Genetics Society of Australasia, Perth.
