The Genes Involved in Hearing and Endocrine Disorders

A Thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Medical and Human Sciences.

2012 Emma Mary Jenkinson

School of Medicine

CONTENTS

List of Tables
List of Figures
Abstract
Declaration
Copyright statement
Acknowledgment
Abbreviations

Chapter 1: Introduction
1. Introduction
1.1. Sensorineural Hearing Loss
1.1.1 Genes which Regulate Ion Homeostasis in the Cochlea
1.1.2 Genes Responsible for Formation and Maintenance of the Hair bundles
1.1.3 Genes Responsible for Maintenance of the Extracellular Matrix
1.1.4 Transcription Factor Genes
1.1.5 Genes of Poorly Established Function
1.1.6 The Need for Further Research in Deafness Genetics
1.2 Hypothalamic Pituitary Gonadal axis
1.2.1 Hypogonadotropic Hypogonadism
1.2.2 Hypergonadotropic Hypogonadism: Premature Ovarian Failure (POF) and Ovarian Dysgenesis
1.2.3 Premature Ovarian Failure and Sensorineural Hearing Loss: Perrault Syndrome
1.3 The Evolution of Techniques in Genetic Medicine
1.3.1 Linkage Mapping
1.3.2 From Locus to Gene Mutation
1.3.3 Next Generation Sequencing
1.4 Aim

Chapter 2: Materials and Methods
2. Materials and Methods
2.1 Suppliers
2.2 Nucleic Acid procedures
2.2.1 DNA Extraction
2.2.2 RNA Extraction
2.2.3 Quantification of Nucleic Acids
2.2.4 Standard PCR Reaction
2.2.5 Agarose Gel Electrophoresis
2.2.6 Purification of PCR Products
2.2.7 Sequencing Reactions
2.2.8 Purification of Sequencing Products
2.2.9 Whole Genome Amplification
2.2.10 Reverse Transcription
2.2.11 Affymetrix Genome-wide Human SNP Array V6.0/250K
2.2.12 Sybr Green
2.2.13 Copy Number assay
2.2.14 Whole exome sequencing
2.2.15 Expression array
2.3 Data Analysis
2.4 Procedures
2.4.1 Extraction of Protein from Zebrafish Embryos
2.4.2 Extraction of Protein from Mammalian tissue
2.4.3 SDS Polyacrylamide Gel Electrophoresis (SDS-Page)
2.4.4 Western Blot Analysis
2.4.5 Developing Western Blot
2.4.6 Stripping Western Blots
2.5 Zebrafish Model Organism Techniques
2.5.1 Zebrafish Care and Breeding
2.5.2 De-chorionation of Embryos
2.5.3 Procedure for de-yolking embryos
2.6 Immunohistochemistry Techniques
2.6.1 Sectioning and Mounting of Embryonic Tissue
2.6.2 Immunohistochemistry of Tissue Sections

Chapter 3. Clinical Details of Patients Involved in the Study
3. Clinical Details of Patients Involved in the Study
3.1 Aim
3.2 Clinical Descriptions
3.2.1 Perrault Syndrome Cohort
3.2.2 Family with Hypogonadotropic Hypogonadism syndrome

Chapter 4. Mutational Screening in HSD17B4 and HARS2
4. Mutational Screening in HSD17B4 and HARS2
4.1 Introduction
4.2 Aim
4.3 Results
4.4 Discussion

Chapter 5. Autozygosity Mapping
5. Autozygosity Mapping
5.1 Introduction
5.2 Aim
5.3 Results
5.3.1 Perrault Syndrome
5.3.1.1 Family P1
5.3.1.2 Family P2
5.3.1.3 Family P4
5.3.1.4 Family P5
5.3.1.5 Family P8
5.3.2 Hypogonadotropic Hypogonadism Syndrome: Family HH1
5.4 Discussion
5.4.1 Identification of a Perrault syndrome locus on Chromosome 19
5.4.2 Identification of a Homozygous Deletion within the Perrault syndrome locus
5.4.3 NOBOX Haploinsufficiency in Family P5
5.4.4 Identification of a locus for Hypogonadotropic Hypogonadism syndrome on Chr3
5.5 Conclusions

Chapter 6. Copy Number Deletion Investigations
6. Copy Number Deletion Investigations
6.1 Introduction
6.2 Aim
6.3 Results
6.3.1 Investigation of the Chr19 Homozygous Deletion detected in Family P1
6.3.2 Investigation of the Chr7 Heterozygous Deletion detected in Family P5
6.4 Discussion
6.4.1 Sequencing of CDKN2D, KRI1, AP1M2 and SLC44A2 in Perrault Syndrome cohort.
6.4.2 Expression analysis in Family P1
6.4.3 Confirmation of the heterozygous deletion on chromosome 7 in Family P5
6.5 Conclusions

Chapter 7. Perrault Syndrome Investigations
7. Perrault Syndrome Investigations
7.1 Introduction
7.2 Aim
7.3 Results
7.3.1 Sanger Sequencing of Candidate Genes in 19p13.3-13.11 locus
7.3.2 Next Generation Sequencing of Family P1
7.3.3 Expression Experiments for variants detected in Family P1
7.3.4 Next Generation Sequencing of additional Perrault syndrome patients
7.4 Discussion
7.4.1 Next Generation Sequencing of Family P1
7.4.2 Expression Experiments in Family P1
7.4.3 Whole Exome Sequencing of Patients P6:II:1 and P2:II:2
7.5 Conclusion

Chapter 8. Hypogonadotropic Hypogonadism Investigations
8. Hypogonadotropic Hypogonadism Investigations
8.1 Introduction
8.2 Aim
8.3 Results
8.3.1 Sanger sequencing of candidate genes within the 3p22.1-p21.2 locus.
8.3.2 Zebrafish BSN knockdown model
8.4 Discussion
8.4.1 Identification of BSN mutations in Family HH1
8.4.2 BSN knockdown zebrafish model
8.5 Conclusion

Chapter 9. Discussion
9. Discussion
9.1 Perrault Syndrome; Summary and final conclusions.
9.2 Perrault syndrome; Future work.
9.3 Hypogonadotropic Hypogonadism syndrome; Summary and final conclusions.
9.4 Hypogonadotropic Hypogonadism syndrome; Future work.
9.5 Final conclusions
9.6 New Findings and Published Data since Submission of Thesis

Chapter 10. Appendix
10. Appendix
10.1 References
10.2 Publications

Final Word Count = 71,970


List of Tables

1.1. Table of known deafness genes.
1.2. Table of known hypogonadotropic hypogonadism genes
1.3. Table of known hypergonadotropic hypogonadism genes
1.4. Phenotypic features of reported Perrault syndrome patients.
2.1. Standard PCR reaction mixture.
2.2. Standard sequencing reaction mixture.
2.3. Primer sequences for SYBR green experiments.
2.4. Antibodies and conditions used in western blotting experiments.
2.5. Antibodies and conditions used for immunohistochemistry experiments.
3.1. Clinical features for affected individuals from Perrault syndrome cohort.
3.2. Features of affected individuals from Family HH1.
4.1. Common polymorphisms detected during mutational screening of HARS2 and HSD17B4.
4.2. List of samples that were sequenced for HARS2 and HSD17B4 from Perrault syndrome cohort.
5.1. Data for 19p13.2 homozygous deletion as displayed in ChAS software.
5.2. Data for 7q35 heterozygous deletion detected in affected individual P5:II:1, as displayed in ChAS analysis software.
5.3. All homozygous regions over 2Mb detected in affected individuals from Perrault syndrome families.
6.1. List of patients from Perrault syndrome cohort that were sequenced for AP1M2, CDKN2D, KRI1 and SLC44A2 mutations.
6.2. Details of multi species alignment for the chromosome 19 homozygous region deleted in Family P1.
6.3. Efficiency of individual primer assays as calculated from validation experiments.
6.4. Average assay efficiency based on validation experiments.
6.5 Triplicate Ct values for individuals from Family P1.
6.6. Results of the Comparative Ct method analysis of SYBR green based qPCR data.
6.7. Results of the Pfaffl method analysis of SYBR green based qPCR data.
6.8. Expression array data for 10 genes surrounding the 19p13.2 intergenic deletion.
7.1 Candidate genes from locus 19p13.3-13.11 sequenced in Family P1.
7.2 Variations detected in 19p13.3-13.11 locus after filtering using predefined criteria.
7.3 Polyphen2 and SIFT in-silico predictions for variants detected using whole exome sequencing.
7.4 Sequencing of ethnically matched control samples for novel variants.
7.5 Next generation sequencing coverage data for Perrault syndrome patients.
8.1. In silico predictions for the p.R3361W BSN mutation.
8.2. Coverage data for whole exome sequencing of sample HH1:III:4.
8.3. Next Generation exome data for 3p22.1-p21.2 locus in affected individual HH1:III:4.
8.4. Non-synonymous BSN variants detected in a cohort of IHH patients.
10.1. Primer sequences for coding exons of HSD17B4.
10.2. Primer sequences for coding exons of HARS2.
10.3. Primer sequences for the confirmation and breakpoint determination of Family P1 homozygous deletion.
10.4. Primer sequences for coding exons of AP1M2, SLC44A2, CDKN2D and KRI1.
10.5. cDNA concentrations used in validation experiment 1.
10.6. Results of GAPDH assay efficiency validation experiment 1.
10.7. Results of CDKN2D assay efficiency validation experiment 1.
10.8. Results of KRI1 assay efficiency validation experiment 1.
10.9. Results of SLC44A2 assay efficiency validation experiment 1.
10.10. Results of AP1M2 assay efficiency validation experiment 1.
10.11. cDNA concentrations used in validation experiment 2.
10.12. Results of GAPDH assay efficiency validation experiment 2.
10.13. Results of CDKN2D assay efficiency validation experiment 2.
10.14. Results of KRI1 assay efficiency validation experiment 2.
10.15. Results of SLC44A2 assay efficiency validation experiment 2.
10.16. Results of AP1M2 assay efficiency validation experiment 2.
10.17. Raw data for differentially expressed genes identified on expression array.
10.18. Raw data for NOBOX Taqman assay 1.
10.19. Raw data for NOBOX Taqman assay 2.
10.20. Primer sequences for the breakpoint of suspected 7q35 heterozygous deletion.
10.21. Primer sequences for expression analysis of CLPP, GTF2F1 and PCP2.
10.22. Whole exome SNP data for individual P6:II:1.
10.23. Whole exome Indel data for individual P6:II:1.
10.24. Primer sequences for coding exons of human BSN.
10.25. Primer sequences for zebrafish Bsn cDNA sequencing.


List of Figures

1.1 Major structures of the inner ear.
1.2 Major structures of the mouse and zebrafish inner ear
1.3 Hormonal control of the HPG axis.
1.4 Pathogenesis of Kallmann syndrome.
1.5 Schematic illustration of TACR3.
1.6 Schematic illustration of FSHR.
1.7 Identical by descent inheritance of a pathogenic allele.
1.8 Illustration of the principles of the SOLiD platform next generation sequencing chemistry.
1.9 Illustration of the principles of the Illumina Solexa platform next generation sequencing chemistry.
3.1 Pedigree of Family P1.
3.2 Pedigree of Family P2.
3.3 Pedigree of Family P3.
3.4 Pedigree of Family P4.
3.5 Pedigree of Family P5.
3.6 Pedigree of Family P6.
3.7 Pedigree of Family P8.
3.8 Pedigree of Family P9.
3.9 Pedigree of Family P10.
3.10 Pedigree of Family P11.
3.11 Pedigree of Family HH1.
3.12. Clinical photograph of affected individuals from Family HH1.
4.1 Illustration of the HARS2 gene.
4.2 Illustration of the HSD17B4 gene.
4.3. Sequence chromatogram showing a section of HARS2 for affected sisters from Family P2.
4.4. Sequence chromatogram showing a section of HARS2 for affected individuals P8:II:1 and P6:II:1.
4.5. Sequence chromatogram showing a section of HSD17B4 for affected individuals P7 and P5:II:1.
4.6. Sequence chromatogram showing a section of HSD17B4 for affected individuals P5:II:1 and P6:II:1.
5.1. Pedigree of Family P1.
5.2. Affymetrix 250K Array data for Family P1 presented in AutoSNPa.
5.3. Affymetrix 6.0 array data for Family P1 presented using AutoSNPa.
5.4. Homozygous deletion in affected members of Family P1 as shown using ChAS software.
5.5. Agarose gel electrophoresis of PCR amplicons for Family P1.
5.6. Agarose gel electrophoresis showing a representative sample of ethnically matched control samples amplified using PP1.
5.7. Chromatogram of breakpoint sequencing in affected individuals from Family P1.
5.8. Full sequence of region deleted in affected individuals from Family P1.
5.9. Pedigree of Family P2.
5.10. Affymetrix 6.0 array data for Family P2 presented using AutoSNPa.
5.11. Pedigree of Family P4.
5.12. Affymetrix 6.0 array data for Family P4 presented using AutoSNPa.
5.13. Pedigree of Family P5.
5.14. Affymetrix 6.0 array data for Family P5 presented using AutoSNPa.
5.15. Heterozygous deletion of NOBOX in affected individual P5:II:1 shown using ChAS software.
5.16. Pedigree of Family P8.
5.17. Affymetrix 6.0 array data for Family P8 presented using AutoSNPa.
5.18. Pedigree of Family HH1.
5.19. Affymetrix 250K Array data of chromosome 3 for Family HH1 presented in AutoSNPa.
5.20 Affymetrix 250K Array data of chromosome 5 for Family HH1 presented in AutoSNPa.
5.21. Affymetrix v6.0 Array data of chromosome 3 for selected members of Family HH1 presented in AutoSNPa.
5.22. Affymetrix v6.0 Array data of chromosome 5 for selected members of Family HH1 presented in AutoSNPa.
6.1. Model of long range control of transcription.
6.2. Location of chromosome 19 homozygous deletion relative to surrounding genes.
6.3. Multi species alignment of the chromosome 19 homozygous deletion identified in Family P1.
6.4 Amplification plot and standard curve of CDKN2D dilution series.
6.5. Electrophoresis gel showing GAPDH amplification in a range of embryonic cDNA samples.
6.6. Dissociation curves and amplification plots for GAPDH, CDKN2D, KRI1 and SLC44A2.
6.7. Fold change for SLC44A2, KRI1 and CDKN2D expression using different methods of data analysis.
6.8. Chromosome 19 differentially expressed genes.
6.9. Genes with a positive Log2 fold change ≥2.
6.10. Genes with a negative Log2 fold change ≤ -3.
6.11. Alignment map from ABI website showing the location of the two assays used for SYBR green analysis.
6.12. Copy number analysis of P1:III:4, shown using the Affymetrix ChAS software.
6.13. Results of Taqman Assay 1 analysed on CopyCaller software.
6.14. Results of Taqman Assay 2 analysed on CopyCaller software.
6.15 Electrophoresis gel showing PCR of heterozygous deletion breakpoint.
7.1 Criteria for filtering novel variants identified in whole exome data.
7.2 Chromatogram showing segregation of the homozygous c.433A>C CLPP variant in Family P1.
7.3 Chromatogram showing segregation with disease of the homozygous c.1328G>T GTF2F1 variant in Family P1.
7.4 Chromatogram showing segregation with disease of the homozygous c.392C>G PCP2 variant in Family P1.
7.5 Chromatogram showing segregation with disease of the homozygous c.538C>T CYP4F11 variant in Family P1.
7.6 Multi-species alignment showing CLPP p.T145P variant.
7.7 Multi-species alignment showing PCP2 p.P131R variant.
7.8 Multi-species alignment showing GTF2F1 p.G443V variant.
7.9 Multi-species alignment showing CYP4F11 p.R180C variant.
7.10. Electrophoresis gels showing PCR amplification of embryonic cDNA samples.
7.11. Electrophoresis gels showing PCR amplification of embryonic cDNA samples.
7.12 RNA-Seq next gen data indicating expression levels in various tissues of the adult inner ear.
7.13 ClpP staining of human embryonic ovarian tissue.
7.14 Rap74 staining of human embryonic ovarian tissue.
8.1. Representation of genomic BSN.
8.2. Segregation of the BSN c.10081C>T variant identified in members of Family HH1.
8.3. Multi species alignment of the BSN p.R3361W variant detected in Family HH1.
8.4. Bsn morpholino binding sites.
8.5. Development of the zebrafish brain at 24 hours post fertilization.
8.6. Photograph showing zebrafish transverse brain subdivision at 29 hours post fertilization.
8.7. Phenotype of BSN MO mix injected embryos at 1dpf.
8.8. Phenotype of BSN MO mix injected embryos at 2dpf.
8.9. Charts showing survival results and phenotype data for two representative batches of morpholino injections.
8.10. Charts showing results at 2dpf of two representative batches of zebrafish morpholino injections.
8.11. Splicing of cDNA after zebrafish morpholino injections.
8.12. Western blot showing Bsn expression in mouse synaptic junction protein preparations.
8.13. Western blot showing optimization of antibody rb2BSN1 in zebrafish protein.
8.14. A. Western blot of morpholino injected zebrafish batches using rb2BSN1.
8.15. Results of control morpholino experiments using a 1mM MO solution with a 15msec injection time.
8.16. Results of control morpholino experiments using a 1mM MO solution at 10msec injection time.
10.1. Clinical poster from ESHG describing the phenotype of Family P10.
10.2. Amplification plot and standard curve of GAPDH, CDKN2D, SLC44A2 and KRI1 dilution series for validation experiment 1.
10.3. Amplification plot for AP1M2 dilutions for validation experiment 1.
10.4. Amplification plot and standard curve of GAPDH, CDKN2D, SLC44A2 and KRI1 dilution series for validation experiment 2.
10.5. Amplification plot for AP1M2 dilutions for validation experiment 2.
10.6. Amplification plots showing gene of interest in relation to GAPDH for samples P1:III:3 and P1:II:1.
10.7. Affymetrix Human SNP array protocol


Abstract

The Genes Involved in Hearing and Endocrine Disorders. Emma Mary Jenkinson, the University of Manchester, PhD in Developmental Biomedicine, submitted 2012.

In recent years, there has been a great deal of interest in rare autosomal recessive disorders. This project, entitled ‘The Genes Involved in Hearing and Endocrine Disorders’, focuses on a group of autosomal recessive phenotypes which include symptoms such as sensorineural hearing loss, ovarian dysgenesis, hypogonadotropic hypogonadism, short stature and developmental delay. The aim of the project is to give insight into the molecular pathology of two disorders, Perrault syndrome (PS) and an unclassified hypogonadotropic hypogonadism (HH) disorder, through the identification of causative genes.

Perrault syndrome is defined as the association of sensorineural hearing loss and primary ovarian failure/ovarian dysgenesis. The phenotypic spectrum of PS is broad, with the most common additional features being neurological, including ataxia. The HH disorder presented in this thesis is novel: affected family members present with a complex combination of features including hearing loss, hypogonadism, facial dysmorphism, microcephaly and learning disability. I used a combination of genetic techniques, including autozygosity mapping and next generation sequencing, to define the causative genes.

In one consanguineous PS family I identified a locus at 19p13, and subsequent sequence analysis identified three novel missense changes in PCP2, CLPP and GTF2F1 which may be pathogenic. In the HH family, autozygosity mapping defined a locus at chromosome 3p21 and a novel missense variant in BSN was identified. Developmental biology techniques were then used to assess the pathogenicity of these variants.

In conclusion, the data presented in this thesis have contributed to current understanding of hearing and endocrine disorders in humans. Novel mutations have been identified in genes which have not previously been linked to hearing or sexual development. Future work will be aimed at determining the specific roles of these genes in disease pathogenesis and providing accurate risk estimation for the families who have taken part in this study. An additional aim will be to increase the understanding of the pathogenesis of more common disorders of hearing loss and infertility in the hope of developing novel therapeutics.


Declaration

The author declares that no portion of the work referred to in the thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.

Copyright Statement

The author of this thesis (including any appendices and/or schedules to this thesis) owns certain copyright or related rights in it (the “Copyright”) and s/he has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.

Copies of this thesis, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has from time to time. This page must form part of any such copies made.

The ownership of certain Copyright, patents, designs, trademarks and other intellectual property (the “Intellectual Property”) and any reproductions of copyright works in the thesis, for example graphs and tables (“Reproductions”), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.

Further information on the conditions under which disclosure, publication and commercialisation of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy (see http://www.campus.manchester.ac.uk/medialibrary/policies/intellectual-property.pdf), in any relevant Thesis restriction declarations deposited in the University Library, The University Library’s regulations (see http://www.manchester.ac.uk/library/aboutus/regulations) and in The University’s policy on presentation of Theses.


Acknowledgments

I would like to thank my co-supervisors Dr Bill Newman and Prof Julian Davis. Thank you for your teaching, encouragement and scientific expertise throughout this project. Special thanks to Prof Dorothy Trump for giving me the opportunity to do this PhD and for your encouragement, support and guidance throughout my first year. I would also like to thank Dr Jill Urquhart, Dr Sarah Daly, James O’Sullivan and Dr Sanjeev Bhaskar who have all directly contributed to the next generation sequencing and array data presented in this thesis. Thank you to all members of the Newman research group past and present for all of your help and for making the lab an enjoyable place to work.

Thank you to Prof Mary-Claire King for the opportunity to work in her lab for two months and for the valuable skills and experience that I gained during this time. Thank you also to the Infertility Research Trust and the Manchester Biomedical Research Centre for funding this project.

Lastly, I would like to thank my friends and family, especially my parents, who have given me endless support and encouragement; it was you who made it possible for me to complete this PhD.


Abbreviations

A - adenine
ABI - Applied Biosystems
AR - androgen receptor
ATP - adenosine triphosphate
BCA assay - bicinchoninic acid assay
BGI - Beijing Genomics Institute
bp - base pair
BRC - Biomedical Research Centre
C - cytosine
CaCl2 - calcium chloride
cat - catalogue
CAZ - cytomatrix at the active zone
CHARGE - coloboma of the eye, heart anomalies, atresia, mental and growth retardation, genito-urinary anomalies and ear abnormalities
ChAS - Chromosome analysis software
Chr - chromosome
CNV - copy number variation
CP - cribriform plate
CPP - central precocious puberty
d.p. - decimal places
DAB - 3,3′-Diaminobenzidine tetrahydrochloride
ddH2O - double distilled water
del - deletion
DGV - Database of Genomic Variants
DNA - deoxyribonucleic acid
dNTP - deoxyribonucleotide triphosphate
dpf - days post fertilization
DSD - disorders of sexual development
DTT - Dithiothreitol
E - efficiency
EDTA - ethylenediaminetetraacetic acid
ENaC - epithelial amiloride sensitive sodium channel
ESHG - European society of human genetics
FDR - false discovery rate
Fig - Figure
FISH - fluorescence in situ hybridization
FSH - follicle stimulating hormone
g - gram
G - guanine
GL - glomerular layer
GnRH - Gonadotropin releasing hormone
GOI - gene of interest
H2O - water
H2O2 - hydrogen peroxide
HCl - hydrochloric acid
HEPES - N-(2-Hydroxyethyl)piperazine-N'-(2-ethanesulfonic acid)
HGMD - human gene mutation database
HH - hypogonadotropic hypogonadism
HPG - hypothalamic pituitary gonadal
HRP - horseradish peroxidase
hrs - hours
IBD - identical by descent
IHC - Immunohistochemistry
IHC - inner hair cells
IHH - idiopathic hypogonadotropic hypogonadism
Indel - insertion deletion polymorphism
IQ - intelligence quotient
Kb - kilobases
KCl - potassium chloride
kDa - kilodaltons
L - litre
LH - luteinizing hormone
LHON - Leber hereditary optic neuropathy
LOD - logarithm (base 10) of odds
Ltd - limited
M - molar
mA - milliamps
MAQ - Mapping and assembly with qualities
Mb - megabase
MEDNIK - mental retardation, enteropathy, deafness, neuropathy, ichthyosis and keratodermia
mins - minutes
ml - millilitre
MO - morpholino
MRI - magnetic resonance imaging
mRNA - messenger ribonucleic acid
msec - millisecond
NaCl - sodium chloride
NaHCO3 - sodium bicarbonate
NaOH - Sodium hydroxide
NCBI - National Centre for Biotechnology Information
ng - nanogram
NHLBI - National Heart, Lung, and Blood Institute
NHS - National Health Service
nIHH - normosmic idiopathic hypogonadotropic hypogonadism
NKB - Neurokinin B
NTC - non template control
OFC - occipitofrontal circumference
OHC - outer hair cells
OMP - olfactory marker protein
ON - olfactory neurons
OSMED - otospondylomegaepiphyseal dysplasia
OT - olfactory tract
PBS - phosphate buffered saline
PCR - polymerase chain reaction
PMSF - Phenylmethylsulfonyl fluoride
POF - premature ovarian failure
PS - Perrault Syndrome
Ref - reference
RNA - ribonucleic acid
rpm - revolutions per minute
RT - reverse transcriptase
RT-PCR - Reverse transcription polymerase chain reaction
qPCR - Quantitative Real Time PCR
SDS - Sodium dodecyl sulfate
sec - seconds
Sh-1 - shaker-1 mutant
SNHL - sensorineural hearing loss
SNP - single nucleotide polymorphism
T - thymine
TBE - Tris/Borate/EDTA
TFIIF - General transcription factor IIF
Tris - trisamine
tRNA - transfer ribonucleic acid
UTR - untranslated region
UV - ultraviolet
V - volts
v/v - volume in volume
w/v - weight in volume
WGA - whole genome amplification
WT - wildtype
yrs - years
Z.fish - zebrafish
μg - microgram
μl - microlitre
μM - micro molar


CHAPTER 1. INTRODUCTION


1.0 Introduction

In recent years, there has been great success in gene identification for rare autosomal recessive disorders and hence an increased understanding of underlying pathways and disease mechanisms. The aim of the project was to give insight into the molecular pathology of two autosomal recessive conditions in which deafness is associated with endocrine disorders: Perrault syndrome (deafness and ovarian dysgenesis; OMIM 233400) and one unclassified disorder causing hypogonadotropic hypogonadism (HH), deafness and learning disability, through the identification of causative genes and their mutations. Recent technological advances in the field of genetic medicine, such as next generation sequencing, have facilitated gene identification. Through the identification and investigation of genes involved in hearing and endocrine disorders, it is hoped that effective therapeutic strategies can be developed.

1.1 Sensorineural Hearing Loss

Hearing loss is the most common form of sensory impairment in humans. In England the prevalence of permanent childhood hearing loss is reported to be 1.33 in every thousand newborn babies [1]. Hearing loss can be divided into three classes: conductive, sensorineural, or a combination of both, which is called mixed hearing loss. Conductive hearing loss is caused by damage to or impairment of the structures of the external ear or the ossicles of the middle ear. Sensorineural hearing loss (SNHL) is caused by impairment of the structures of the inner ear [2]. More than 50% of pre-lingual hearing loss is genetic; the remainder is caused by environmental factors such as infection or injury [2]. Hereditary causes of hearing loss are highly heterogeneous, with approximately 57 non-syndromic deafness genes having been identified to date [3]. In addition, a large number of chromosomal loci for deafness have been mapped for which causative mutations have not yet been identified. When new loci are identified they are designated DFNA, DFNB or DFN, followed by an accession number. DFNA designations are used for dominant inheritance patterns, DFNB for recessive inheritance and DFN for X-linked inheritance. Heterogeneity on this scale makes genetic screening and development of effective therapies a major challenge [1].
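The locus naming convention described above can be illustrated with a short sketch. The Python below is an illustration only and is not part of the methods of this thesis: the function name `classify_locus` and the lookup table are hypothetical, and the prefix meanings follow the description in this section (DFNA dominant, DFNB recessive, DFN X-linked) rather than any nomenclature database.

```python
import re

# Prefix -> inheritance pattern, as described in the text of this section.
# This table is an assumption made for illustration.
LOCUS_PREFIXES = {
    "DFNA": "dominant",
    "DFNB": "recessive",
    "DFN": "X-linked",
}

# The longer prefixes are listed first so that "DFNA2" is not read as "DFN".
_LOCUS_RE = re.compile(r"^(DFNA|DFNB|DFN)(\d+)$")


def classify_locus(name):
    """Return (inheritance pattern, accession number) for a deafness locus name."""
    match = _LOCUS_RE.match(name.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised deafness locus name: {name!r}")
    prefix, number = match.groups()
    return LOCUS_PREFIXES[prefix], int(number)


if __name__ == "__main__":
    # Loci mentioned in this chapter.
    for locus in ("DFNB1", "DFNA2", "DFNB4"):
        print(locus, classify_locus(locus))
```

A name that does not follow the pattern, such as a gene symbol, raises `ValueError` rather than being silently assigned an inheritance pattern.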

Understanding the complex nature of hereditary hearing loss and the genes involved in its pathogenesis will expand our knowledge of the mammalian auditory system. The class of hearing loss investigated in this study is sensorineural hearing loss, caused by impairment of the structures of the inner ear. The inner ear is a remarkable structure which can detect sound-induced vibrations of less than a nanometre and amplify them more than 100-fold via the mechanosensory hair cells of the Organ of Corti. The Organ of Corti is found in the cochlea and is made up of a range of different cellular structures, including thousands of sensory hair cells and non-sensory supporting cells. The hair cells have an extremely sensitive apical surface known as the hair bundle, which is made up of dozens of stereocilia. The tectorial membrane (an extracellular matrix) covers the apical surface of the Organ of Corti and attaches to its outer hair cells (Figure 1.1). When sound is heard, the vibrations in the air are converted to oscillations in the fluid of the inner ear. These oscillations cause deflection of the hair bundles, hair cell depolarization and neurotransmitter release [4]. The genes so far identified as carrying mutations which cause sensorineural hearing loss can be divided into five groups: those which play a role in regulating potassium homeostasis in the cochlea, those which play a role in hair bundle formation, those which form and maintain the extracellular matrix, transcription factors, and a group of genes whose function is still largely unknown. A selection of these genes and the mutations that have previously been identified are reviewed in this section (summarised in Table 1.1); for a more comprehensive review the reader is directed to [5].


Figure 1.1. Human inner ear structure showing the three major sections of the cochlea: the Scala vestibuli, the Scala media and the Scala tympani. The organ of Corti has one row of inner hair cells and three rows of outer hair cells, separated by supporting cells. Each of these hair cells has bundles of organelles known as stereocilia extending towards the tectorial membrane, which convert motion in the fluids of the inner ear into electrical signals to be conducted back to the central nervous system [6].

Figure 1.2. Mouse and zebrafish models of genetic deafness have been developed for various genes. The schematic on the left shows the structures of a mouse cochlea. Taken from Lonyai et al. [7]. The schematic on the right shows the structures of the zebrafish inner ear. Taken from Ernest et al. [8].


1.1.1 Genes which Regulate Ion Homeostasis in the Cochlea

Maintaining ion homeostasis between the different components of the cochlea is vital for depolarization of inner ear hair cells and neurotransmitter release. A large family of proteins called connexins plays a vital part in regulating this homeostasis by forming clusters of intercellular channels known as gap junctions. The first autosomal recessive form of hereditary deafness (DFNB1) was attributed in 1997 to mutations in the GJB2 gene, which encodes connexin 26 [9]. The identification of this gene as a cause of non-syndromic deafness came initially from an investigation of a Caucasian family with autosomal dominant palmoplantar keratoderma (PPK) and SNHL. Haplotype and linkage analysis suggested that the two disorders did not segregate completely together, leading the investigators to consider that they may be caused by two separate gene mutations. The deafness phenotype was the result of a missense change in GJB2 which results in p.M34T in the protein sequence. In this family the GJB2 mutation resulted in autosomal dominant deafness, and it was proposed that the mutation may act in a dominant negative fashion [9]. However, the recessive SNHL locus DFNB1 also mapped to this region of chromosome 13. GJB2 mutations were found in three consanguineous families from Pakistan, all of which mapped to DFNB1. In one family a homozygous nonsense mutation which results in p.W77X segregated with the disease and was not detected in a panel of control samples. The remaining two families were found to share the same homozygous nonsense mutation, p.W24X [9].

Mutations in this gene account for a major proportion (up to 50% in some populations) of early non-syndromic autosomal recessive deafness [10]. The most common mutation is a single nucleotide deletion leading to a frameshift and premature stop codon at position 38 [11]. Most GJB2 mutations result in the impaired formation of functional gap junction channels in in vitro studies [12, 13]. However, not all GJB2 mutations are completely inactivating. It has been demonstrated that the p.V84L mutation affects permeability to IP3, which impairs the propagation of Ca2+ waves [14]. Other members of the connexin gene family have also been identified as harbouring mutations which cause non-syndromic SNHL, such as GJB6 (encodes connexin 30), GJB3 (encodes connexin 31) and GJA1 (encodes connexin 43) [15-17].
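The way a single nucleotide deletion leads to a frameshift and a premature stop codon can be illustrated with a short sketch. The sequence used here is a hypothetical toy example chosen for illustration, not the GJB2 coding sequence, and the function names are assumptions of this sketch. It shows only the general principle: removing one base shifts every downstream codon and can bring a stop codon into frame.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding sequence from its first base until a stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":  # stop codon: translation ends here
            break
        protein.append(residue)
    return "".join(protein)


def delete_base(dna: str, index: int) -> str:
    """Return the sequence with the base at a 0-based index removed."""
    return dna[:index] + dna[index + 1:]


# Hypothetical toy sequence (NOT a real gene): ATG GGC CTG ACC AAA GTT TAA
WILD_TYPE = "ATGGGCCTGACCAAAGTTTAA"
# Deleting one G from the second codon shifts the frame: ATG GCC TGA ...
MUTANT = delete_base(WILD_TYPE, 3)

print(translate(WILD_TYPE))  # MGLTKV
print(translate(MUTANT))     # MA
```

In this toy case the wild-type sequence encodes six residues, whereas the mutant is truncated after two because the shifted frame exposes a TGA stop codon.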

Although the correct formation of gap junctions is vital, the connexins are not the only proteins that are important for the maintenance of potassium homeostasis in the inner ear. Mutations in genes encoding ion channels such as KCNQ4 and SLC26A4 can cause hearing loss in humans. KCNQ4 encodes a potassium ion channel that is expressed in the sensory outer hair cells of the cochlea. The exact role of KCNQ4 in hearing is still unclear, but mutations cause autosomal dominant SNHL with a variable phenotype in DFNA2 families [18]. Pendrin is a transmembrane anion exchange molecule that is encoded by SLC26A4. Mutations in SLC26A4 are the cause of non-syndromic hearing loss at locus DFNB4, and can also cause a syndromic form of hearing loss known as Pendred syndrome, a condition defined by SNHL and enlargement of the thyroid gland (goitre) [19]. A knockout mouse model of Slc26a4 has enlargement of the endolymphatic space, which increases free radical stress and reduces the expression of the Kcnj10 gene. Kcnj10 encodes a potassium ion channel; mutations in Slc26a4 therefore result in disruption to ion homeostasis [20].

1.1.2 Genes Responsible for Formation and Maintenance of the Hair bundles

The second group of genes which play an important role in hereditary deafness are involved in the morphogenesis of the hair bundles. This group consists of, among others, genes encoding a family of proteins known as myosins, which play a vital role in stereocilia formation and maintenance.

MYO7A encodes one member of the myosin family of proteins and, when mutated, can cause non-syndromic SNHL as well as syndromic hearing loss in patients with Usher syndrome type I. The DFNB2 locus was assigned to chromosome 11q13.5 in 1994 by homozygosity mapping of a large consanguineous family from Tunisia [21]. At this time a gene known as olfactory marker protein (OMP) had already been identified and localized within the DFNB2 locus [22]. Although OMP was not itself the pathogenic cause of SNHL, its localization was extremely useful in identifying the DFNB2 gene. In Shaker-1, a mouse model of autosomal recessive deafness, hyperactivity and vestibular defects, the causative gene sh-1 was shown to be tightly linked to the murine homologue of human OMP. This led to the proposal that the gene causing SNHL in the Tunisian family must be the human homologue of sh-1 [21]. The human homologue of sh-1 was later identified as MYO7A [23]. Since its identification there have been several families with non-syndromic recessive SNHL in whom homozygous mutations in MYO7A have been reported [24, 25]. To further highlight the importance of MYO7A in hearing, mutations in this gene can also cause non-syndromic deafness with an autosomal dominant pattern of inheritance. The first autosomal dominant mutation was identified in 1997 in a DFNA11 family with non-syndromic progressive hearing loss. All affected individuals carried a heterozygous in-frame 9 base pair deletion which resulted in the deletion of amino acids 886-888 in the protein sequence [26]. Other dominant MYO7A mutations have been identified in DFNA11 families [27-29]. These include a missense mutation causing p.R853C in a German family with slowly progressive dominant hearing loss with childhood onset. The p.R853C mutation lies in a conserved motif of the protein sequence which is important for binding with calmodulin [30].
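The distinction between an in-frame deletion, such as the 9 base pair deletion described above, and a frameshift deletion follows from simple arithmetic on the triplet code. The sketch below is illustrative only: the function name is hypothetical, and it assumes the deleted bases are aligned to codon boundaries, which is a simplification.

```python
def deletion_consequence(deleted_bp: int) -> str:
    """Classify a coding deletion by whether it preserves the reading frame.

    Assumes (as a simplification) that the deletion is codon-aligned.
    """
    if deleted_bp <= 0:
        raise ValueError("deleted_bp must be a positive integer")
    if deleted_bp % 3 == 0:
        # A multiple of three removes whole codons and keeps the frame.
        return f"in-frame deletion of {deleted_bp // 3} residue(s)"
    # Any other length shifts every downstream codon.
    return "frameshift deletion"


# Residues 886-888 inclusive are three amino acids, i.e. nine bases.
residues_lost = 888 - 886 + 1
print(residues_lost * 3)              # 9
print(deletion_consequence(9))        # in-frame deletion of 3 residue(s)
print(deletion_consequence(1))        # frameshift deletion
```

Under the same assumption, deletions of 4 or 8 base pairs are not multiples of three and would be classed as frameshifts.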

As mentioned above, not only can MYO7A mutations cause autosomal recessive and dominant forms of non syndromic hearing loss, but they can also be a cause of syndromic hearing loss. Usher syndrome (hearing loss and retinitis pigmentosa) is the most common cause of deaf-blindness in humans, and in 1992 the USH1B locus was assigned to 11q13.5 [31]. MYO7A was considered to be a good candidate gene given its identification as the human homologue of murine sh-1. Screening of nine USH1B families identified novel mutations (two nonsense, one six base pair deletion and two missense changes) in affected individuals from five families [23].

Other members of the myosin family of proteins have been identified as harbouring mutations which cause sensorineural deafness. MYO15 was determined to be the gene which caused non-syndromic recessive SNHL at locus DFNB3. The initial identification of this locus came from a study of an isolated community on the island of Bali in which approximately 2% of the population had non-syndromic congenital deafness. Homozygosity mapping was used to map the locus to chromosome 17 [32, 33]. In 1998 mutations in Myo15 were identified as the cause of an autosomal recessive mouse model of deafness known as Shaker-2 [34]. When human MYO15 was mapped within the chromosome 17 locus it was considered to be a good candidate gene, based on the Shaker-2 mouse model. Sequencing in three unrelated DFNB3 families (one from the community in Bali and two Indian) revealed two homozygous missense mutations and one homozygous nonsense mutation [35].

MYO6 mutations cause autosomal recessive deafness at locus DFNB37. This deafness locus was identified by mapping of three large Pakistani families, and sequencing revealed homozygous mutations, one missense, one frameshift and one nonsense. In the family found to have a frameshift mutation leading to a premature stop codon, additional phenotypic features were noted. Segregating along with the deafness was vestibular dysfunction, and a mild facial dysmorphology. One affected family member had retinitis pigmentosa which is seen in combination with SNHL in Usher syndrome as mentioned above [36]. Finally in a family with autosomal dominant deafness and familial hypertrophic cardiomyopathy, a heterozygous missense mutation was detected in MYO6 which causes a change in a highly conserved residue p.H246R [37].

The myosins are a family of cytoskeletal proteins that can be classed as conventional or unconventional. Conventional myosins are actin-based molecular motors that hydrolyse ATP to generate the energy needed to move along actin filaments. MYO7A, MYO15 and MYO6 are all unconventional myosins. The function of the unconventional class of myosins is less well understood. The tail domains of these proteins differ from one unconventional myosin to another, and it is this region that is thought to determine function through specific interactions with other cellular molecules [38]. For example, MYO6 has a unique function in the inner ear. It is a backward-stepping myosin which moves towards the minus end of actin filaments, unlike other myosins which move towards the plus end [39]. This could mean that MYO6 plays a role in clearing away molecular components that are released by actin treadmilling at the taper of the stereocilium [40]. MYO6 may also play a role in stabilization of the hair cell apex due to its localization at the base of the hair bundle within the cuticular plate, the actin-rich structure that supports the stereocilia [41].

The myosin family of proteins is just one example of the large number of different proteins that have been identified as being vital for hair bundle formation. Mutations in many of the genes involved in this process have been found to cause SNHL in both humans and mice.

1.1.3 Genes Responsible for Maintenance of the Extracellular Matrix

The inner ear contains two extracellular matrices, the basement membrane and the tectorial membrane, between which the sensory epithelia can be found. Disruption to the protein components of the tectorial membrane causes sensorineural deafness in humans. Mutations in the TECTA gene (encoding α-tectorin, a non-collagenous component of the tectorial membrane) cause hearing loss with recessive and dominant patterns of inheritance [42, 43]. Functional studies of a dominant p.C1509G mutation found in a Turkish DFNA12 family have shown that the tectorial membrane is reduced in size and only makes contact with the first row of outer hair cells (OHCs). This reduces the overall number of OHCs that are involved in mechanotransduction, as the second and third rows are essentially non-functional, resulting in hearing loss [44-46].

Another essential component of the tectorial membrane is Type XI collagen A2 (COL11A2). Mutations in this gene have been detected in families with non-syndromic autosomal recessive (DFNB53) and dominant (DFNA13) hearing loss, and also in patients with non-ocular Stickler syndrome and otospondylomegaepiphyseal dysplasia (OSMED) [47-50]. Stickler syndrome and OSMED belong to a group of disorders known as osteochondrodysplasias which affect skeletal development and are associated with SNHL and sometimes ocular abnormalities (OMIM 215150, OMIM 184840).

Similarly, mutations in stereocilin, otoancorin and cochlin (all components of the extracellular matrix) can cause SNHL. The STRC gene, which encodes stereocilin, is located on human chromosome 15 within the DFNB16 locus. Mutations were detected in two of the families with non-syndromic deafness that had been used in the mapping of this locus. In one consanguineous family a homozygous insertion was detected, leading to a frameshift and premature stop codon within exon 13 of the gene. The second family was non-consanguineous and affected individuals were found to be compound heterozygotes. A 4 bp deletion resulting in a frameshift and premature stop codon was detected on the paternal allele, and a large deletion, starting in intron 7 and ending in intron 16, was present on the maternal allele [51]. Mutations in the gene encoding otoancorin (OTOA) were the cause of non-syndromic recessive deafness in a consanguineous Palestinian family mapping to DFNB22. Affected members of this family were homozygous for a single base pair change at the exon/intron 12 junction which was expected to affect splicing, resulting in exon skipping or the introduction of a cryptic splice site [52]. Although mutations in OTOA are not thought to be a common cause of hereditary deafness, other mutations have been described in the Palestinian population. A novel p.D356V mutation was detected as part of a screen of Palestinian families with non-syndromic hearing loss, and a large 500 kb deletion encompassing the OTOA gene was detected as a result of homozygosity mapping in another family [53, 54]. Stereocilin and otoancorin are both responsible for correct attachment of the tectorial and otoconial membranes to sensory and non-sensory cells within the cochlea. It is disruption of this attachment which is thought to reduce mechanotransduction in patients with gene mutations [52, 55].

In 1998 sequencing of COCH (encoding the extracellular matrix protein cochlin) revealed three novel heterozygous missense variants in DFNA9 families [56]. Mutations in COCH cause a distinctive phenotype of progressive hearing loss and vestibular dysfunction, which is transmitted in an autosomal dominant fashion [57]. One COCH mutation, p.P51S is found more frequently in families originating from Belgium and the Netherlands due to a founder mutation in these populations [58].

It is clear from the range of mutations discovered in proteins of the extracellular matrix, that the acellular membranes which cover the cochlear structures in the inner ear are vitally important to the normal mechanotransduction of sound. It is unsurprising then that disruption of this matrix through gene mutations results in non syndromic SNHL in humans.

1.1.4 Transcription Factor Genes

Tissue-specific transcriptional regulation is vital to ensure the precise order of the many different components found in the highly specialized organ of Corti within the cochlea. Mutations in several transcription factor genes have been discovered to be a cause of syndromic and non-syndromic deafness in humans.

POU4F3 and POU3F4 both belong to a group of transcription factors known as the POU superfamily. They are involved in tissue-specific gene regulation and mutations are a known cause of SNHL. POU3F4 was initially considered to be a good candidate gene for the DFN3X locus after it was mapped to the X chromosome in mice, close to the Pgk1 gene and the Plp locus. This region is evolutionarily conserved between mice and humans, and when this information was combined with its expression pattern during rat embryogenesis, POU3F4 became an attractive candidate for the human deafness locus [59]. Sequencing of 14 unrelated DFN3X patients revealed five different mutations in the POU3F4 gene: three deletions leading to premature termination codons, and two missense variants. Additionally, three microdeletions and one duplication were detected 5’ to the POU3F4 gene, which may be exerting an effect via a non-coding regulatory region upstream of the gene [60]. As well as causing SNHL, mutations in POU3F4 can also cause mixed hearing loss, and alter cochlear morphology by causing bony malformations [61].

POU4F3 is also involved in tissue specific regulation of transcription, and is located on chromosome 5. Mutations in POU4F3 cause autosomal dominant progressive non syndromic hearing loss at locus DFNA15. The initial mapping of DFNA15 was carried out on an Israeli Jewish family with five generations of progressive hearing loss. POU4F3 was considered to be a good candidate for this locus because mice with a targeted deletion of this gene were completely deaf. An 8bp deletion leading to a frameshift and premature stop codon was found to be the causative mutation in the Jewish kindred [62]. Functional experiments on two missense mutations detected in other DFNA15 patients revealed that the mutant proteins did not localize correctly to the nucleus and showed a reduced capacity for DNA binding [63].

1.1.5 Genes of Poorly Established Function

There are several genes known to cause SNHL for which function and pathogenesis are still not well established. An example of such a gene is TMPRSS3 located on chromosome 21 which when mutated causes recessive deafness in DFNB8 and DFNB10 families [64, 65]. Studies in mice have provided some evidence as to the function of this protein. It is thought that the epithelial amiloride-sensitive sodium channel (ENaC), which is expressed in several sodium-reabsorbing tissues including the inner ear may be a substrate for cleavage by TMPRSS3. There is a strong correlation between lack of proteolytic activity and deafness phenotype. But at the moment our understanding of which mechanisms may be controlled by this proteolytic cleavage and how disruption of cleavage leads to hearing loss is still incomplete [66, 67].

Another example of a deafness gene whose function is not fully understood is TMIE. TMIE (Transmembrane Inner Ear expressed) is located on chromosome 3, and loss of function mutations cause recessive hearing loss [68-70]. The transmembrane inner ear expressed protein does not resemble any other known protein, but mutations which cause deafness in mice and humans indicate that it has an important function in the inner ear [68]. A zebrafish knockdown model of Tmie has also provided evidence of its vital role in the development of the structures of the inner ear [71]. The recessive mouse model Spinner carries mutations in Tmie which result in morphological defects in the outer hair cell stereocilia, indicating a possible role for this protein in hair bundle formation or maintenance [72].

Chromosome   Gene       Deafness loci         Phenotype
13p13        GJB2       DFNB1A                SNHL
13p13        GJB6       DFNB1B                SNHL
1p34.3       GJB3       DFNA2B                SNHL
1p34.2       KCNQ4      DFNA2                 SNHL
7q31.1       SLC26A4    DFNB4                 SNHL/Pendred syndrome
11q13.5      MYO7A      DFNB2/DFNA11/USH1B    SNHL/Usher syndrome
17p11.2      MYO15      DFNB3                 SNHL
6q14.1       MYO6       DFNB37                SNHL
11q23.3      TECTA      DFNA12                SNHL
6p21.32      COL11A2    DFNB53/DFNA13         SNHL/OSMED/Stickler syndrome
15q11.2      STRC       DFNB16                SNHL
16p12.2      OTOA       DFNB22                SNHL
14p11.2      COCH       DFNA9                 SNHL/Vestibular dysfunction
5q32         POU4F3     DFNA15                SNHL
Xq21.1       POU3F4     DFN3X                 SNHL
21q22.11     TMPRSS3    DFNB8/10              SNHL
3p21.31      TMIE       DFNB6                 SNHL

Table 1.1. Chromosomal location and associated phenotype of the genes discussed in Chapter 1.1.

1.1.6 The Need for Further Research in Deafness Genetics

The field of deafness genetics has achieved tremendous success in recent years. Despite all that we now know about the causes of monogenic hearing loss, however, there is still a lot to learn. Not only are there deafness genes for which function has not been fully determined, but SNHL is genetically heterogeneous and many genes are yet to be discovered. As research continues and more genes are found, we are expanding our knowledge of the mammalian auditory system. Studying rare disorders such as Perrault syndrome and the unclassified HH syndrome will provide additional information about the proteins and processes that are important for correct functioning of the auditory system in humans. Identification of genes involved in hereditary hearing loss will also translate into improved clinical care for patients. New methods of diagnosis for hearing loss are being developed which could then be used for reproductive counselling in families known to carry genetic mutations. A recent example of the diagnostic technology being developed is the Invader assay, which has been designed to simultaneously screen for 47 known hearing loss mutations in 13 known genes. This assay detected mutations in 41.8% of patients with early onset hearing loss, but in only 16% of those with later onset hearing loss [73].

1.2 Hypothalamic Pituitary Gonadal axis

Mammalian reproduction is controlled and supported by the Hypothalamic-Pituitary-Gonadal (HPG) axis. A group of highly specialized neurons in the hypothalamus activate the HPG axis via the pulsatile release of GnRH (gonadotropin releasing hormone) into the hypophyseal portal circulatory system [74]. This circulatory system delivers GnRH to the anterior pituitary, where it binds to the gonadotropes, initiating the production and release of the gonadotropins, luteinizing hormone (LH) and follicle stimulating hormone (FSH). The gonadotropins bind to receptors within the gonads and regulate steroidogenesis and gametogenesis. Both LH and FSH are dimeric glycoprotein hormones which share the same α-subunit, but have a β-subunit which is specific for each hormone [75]. In males, LH is responsible for stimulating testosterone production in the Leydig cells and androgen-binding protein in the Sertoli cells. FSH binds to receptors on the Sertoli cells and along with testosterone drives spermatogenesis. In females, FSH is the primary hormone which drives folliculogenesis, but LH still plays a pivotal role during follicular growth by stimulating the theca cells to produce androgens, which are converted to estrogens in the granulosa cells [76]. LH also plays an important role in the female menstrual cycle by controlling ovulation and stimulating progesterone production [77].

Disruption of many different stages along the HPG axis can lead to reproductive defects and lack of sexual maturation. Infertility caused by genetic mutations can be broadly divided into two main types, hypogonadotropic hypogonadism and hypergonadotropic hypogonadism. Hypogonadism is classically differentiated into one of these two groups by determining the point in the HPG axis at which the defect exerts its effect.

Figure 1.3. Hormonal control of the HPG axis.

If the release of GnRH from the hypothalamic neurons or the recognition of GnRH by the pituitary (and subsequent release of gonadotropins) is defective then the patient suffers with hypogonadotropic hypogonadism. If the production of LH and FSH from the pituitary gland is normal but the hormones are not recognized by the gonads, or the gonads cannot respond to stimulation by these hormones then the patient suffers from hypergonadotropic hypogonadism. In this type of hypogonadism the lack of response by the gonads will cause the pituitary to release more LH and FSH to try to overcome the defect, meaning that patients will have high levels of gonadotropins.
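The reasoning in this paragraph can be summarised as a small decision rule. The sketch below is a deliberate simplification for illustration, not a diagnostic tool: the category names, the function name and the use of coarse "low/normal/high" labels instead of measured hormone concentrations are all assumptions of this example. In particular, treating a normal gonadotropin level alongside low sex steroids as a central defect is an assumption that is not stated in the text above.

```python
from enum import Enum


class Level(Enum):
    """Coarse hormone level categories (hypothetical, for illustration)."""
    LOW = "low"
    NORMAL = "normal"
    HIGH = "high"


def classify_hypogonadism(sex_steroids: Level, gonadotropins: Level) -> str:
    """Place a hormone profile at a point on the HPG axis.

    sex_steroids: gonadal output (e.g. testosterone or estrogen).
    gonadotropins: pituitary output (LH and FSH considered together).
    """
    if sex_steroids is not Level.LOW:
        return "no hypogonadism indicated"
    if gonadotropins is Level.HIGH:
        # The pituitary is signalling strongly but the gonads do not respond.
        return "hypergonadotropic hypogonadism (gonadal defect)"
    # The signal from the hypothalamus or pituitary is insufficient.
    return "hypogonadotropic hypogonadism (hypothalamic or pituitary defect)"


print(classify_hypogonadism(Level.LOW, Level.HIGH))
print(classify_hypogonadism(Level.LOW, Level.LOW))
```

The first call corresponds to the hypergonadotropic case, in which LH and FSH rise because the gonads fail to respond, and the second to the hypogonadotropic case.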

1.2.1 Hypogonadotropic Hypogonadism

Hypogonadotropic hypogonadism (HH) is defined as a deficiency of the pituitary secretion of follicle stimulating hormone (FSH) and luteinizing hormone (LH), and is brought about by a defect in the hypothalamus or pituitary gland. It typically presents as a lack of secondary sexual characteristics and impaired fertility. HH can be congenital due to genetic defects or acquired due to factors such as extreme weight loss, injury, or infection. Congenital HH has an incidence of 1-10 in 100,000 births and the genetic cause is only known for about 30% of all cases [78, 79]. Pathogenesis and cause for the vast majority of cases of HH are still unknown and it is thought to be a highly heterogeneous disorder, both genetically and phenotypically. A summary of HH genes can be found in Table 1.2.

Kallmann Syndrome

Kallmann syndrome (HH and anosmia) accounts for approximately 60% of all cases of HH [79], and is much more common in males than females, with the ratio being estimated as 5:1 [80]. The genetic basis of Kallmann syndrome is highly complex and the genetic causes overlap with those of normosmic idiopathic hypogonadotropic hypogonadism (nIHH). Because Kallmann syndrome and nIHH are so closely related it is vital to understand and appreciate what is so far known about both disorders. This is especially important because anosmia is difficult to detect and is not always tested for in the clinic. The first clinical report of HH in association with anosmia was in 1856 by Maestre de San Juan, but the condition was given its name after a more detailed description of the hereditary nature of the disorder by Kallmann in 1944 [81, 82].

In 1989 it was shown that GnRH neurons in mice originate outside of the central nervous system, in a structure called the medial olfactory placode, and migrate during embryonic development to their final destination in the hypothalamus [83, 84]. In the same year the first real insights into the mechanisms underlying Kallmann syndrome came from the examination of a 19 week old Kallmann foetus with a deletion of Xp22.3. The foetus was the sibling of a child described by Bick et al who had a contiguous gene syndrome due to an X terminal deletion. This child presented with Kallmann syndrome as well as ichthyosis and chondrodysplasia punctata [85]. Examination of the Kallmann foetus showed that the olfactory, vomeronasal and terminal nerves had not completed their migration correctly. These nerves were not in contact with the brain and instead were found in an abnormal neural tangle in the meninges. The GnRH neurons had also failed to make contact with the brain and could be seen in the same neural tangle. The foetus was found to have an absence of olfactory bulbs and tracts [86]. It was therefore hypothesised that Kallmann syndrome is caused by incomplete or incorrect neuronal migration (Figure 1.4).

Figure 1.4. The pathogenesis of Kallmann syndrome. The olfactory neurons (ON) in the normal diagram have migrated through the cribriform plate (CP) into the olfactory bulb. They make synapses with mitral cells (M) within the glomerular layer (GL); the axons of the mitral cells form the olfactory tract (OT). The neurons which secrete GnRH (shown in red) originate in the olfactory placode and migrate along the olfactory nerves until they reach the forebrain. In Kallmann syndrome, the lack of KAL1 (shown in green) has disrupted migration, resulting in a neural tangle of GnRH neurons and olfactory neurons between the CP and olfactory bulb. Figure is taken from [87].

KAL1

Three modes of inheritance have been reported in families with Kallmann syndrome: X linked recessive, autosomal dominant and autosomal recessive. In 1991 the first causative gene was identified on the X chromosome, KAL1, which encodes the glycoprotein anosmin1. Deletions and translocations in Kallmann patients were found to disrupt the open reading frame of this gene, which led to its identification as a pathogenic cause of Kallmann syndrome [88, 89]. Anosmin1 directly affects the migration of GnRH neurons by acting as a chemoattractant, which supports the theory of impaired neuronal migration [90]. Various loss of function mutations in KAL1 have been shown to be the underlying cause of the X linked recessive form of this disorder [91-93]. Despite this, the overall contribution of KAL1 mutations to Kallmann syndrome pathogenesis is relatively small. The prevalence is estimated to be between 6 and 14%, with a higher prevalence in familial cases than sporadic ones [94-96].

FGFR1

The first autosomal dominant Kallmann mutations, identified in 2003, were loss of function mutations in FGFR1 (fibroblast growth factor receptor 1). This discovery was made when two individuals affected by different contiguous gene syndromes were analysed using fluorescence in situ hybridization (FISH). The authors identified a 540 kb overlapping region which contained three candidate genes. FGFR1 was considered to be the most likely of the three, and screening of a cohort of Kallmann patients revealed various heterozygous mutations in four familial and eight sporadic samples [80]. FGF signalling is involved in many pathways and processes during development, and the authors proposed several arguments for the possibility that anosmin1 plays a role in this pathway via direct interaction with the FGF receptor. In the following year, Gonzalez-Martinez et al identified the heparan sulphate dependent mechanism by which this interaction takes place, and more recent studies have shown that anosmin1 can bind to both heparan sulphate and the FGF receptor, giving it a dual role in signalling activity and complex formation [97, 98].

PROK2 and PROKR2

The third and fourth Kallmann genes, PROK2 (prokineticin 2) and its receptor PROKR2, were discovered as a pair using a direct candidate gene approach [99]. A mouse model with abnormal development of olfactory bulbs and atrophic reproductive organs pointed to the involvement of PROKR2. Sequencing of a cohort of 192 patients revealed various mutations in this gene and in its ligand PROK2. All mutations detected in PROK2 were in the heterozygous state, but in PROKR2 heterozygous, homozygous and compound heterozygous mutations were reported. The functional relevance of the various heterozygous mutations detected in PROKR2 was uncertain. Finding both heterozygous and homozygous (or compound heterozygous) individuals for the same mutations who also have the same phenotype is unusual. In addition, three of the mutations found in Kallmann patients were also found once each in a panel of healthy controls, and the degree of hypogonadism and anosmia in patients affected by PROKR2 and PROK2 mutations was variable. This raised the idea of a digenic model of inheritance for Kallmann syndrome. This was found to be the case for one patient who had missense mutations in PROKR2 and KAL1 [99]. Since the initial report of PROKR2 and PROK2 mutations, the phenotypic spectrum has been expanded and functional studies have been carried out. There is now evidence that several of the mutations found have a deleterious effect on prokineticin signalling in vitro [100]. Biallelic mutations are thought to cause a more severe reproductive phenotype in affected males, and clinical anomalies which are non-olfactory and non-reproductive are thought to be rare in patients harbouring PROKR2 and PROK2 mutations [101, 102].

CHD7

In 2008 mutations in the gene CHD7 were found in a cohort of Kallmann and nIHH patients [103]. CHD7 mutations also cause a developmental disorder known as CHARGE syndrome (OMIM 214800). Affected individuals present with coloboma of the eye, heart anomalies, choanal atresia, mental and growth retardation, genito-urinary anomalies and ear abnormalities including deafness. CHARGE syndrome is distinct from Kallmann syndrome but has the overlapping features of hypogonadism and anosmia. Screening a cohort of Kallmann and nIHH patients for mutations in CHD7 revealed that Kallmann syndrome can be considered a mild allelic variant of CHARGE. Seven mutations were detected, and it was estimated that CHD7 mutations account for approximately 6% of cases of Kallmann/nIHH [103].

FGF8

In 2008 Falardeau et al identified FGF8 as the specific FGFR1 ligand responsible for GnRH neuronal development [104]. The group used a candidate gene approach and found six missense mutations in various isoforms of FGF8. The mutations were found in Kallmann syndrome and also nIHH patients within their cohort. The severity of the phenotype and the association of additional features varied depending on the mutation. Only one nIHH individual in this study was a homozygote, and two nIHH patients were shown to have digenic inheritance after identification of mutations in FGFR1 [104]. It is still possible that some Kallmann and nIHH patients with heterozygous mutations in FGF8, PROKR2 and PROK2 will have additional mutations in as yet unidentified genes.

NELF

Nasal Embryonic LHRH Factor (NELF) was originally identified as an important regulator of olfactory and GnRH neuronal migration in mice [105]. A cDNA library from migrating primary GnRH neurons was compared with a library from non-migrating neurons in mice. Nelf was the only novel clone to be identified during the screen and was found to be expressed in peripheral and central nervous system tissues during embryogenesis. Nelf was strongly expressed in GnRH neurons as they migrated, but could not be detected postnatally once the cells had entered the forebrain. Initial characterization indicated that Nelf acts as a guidance molecule for olfactory and GnRH neuronal migration [105]. These functional data made human NELF a good candidate gene for Kallmann syndrome and nIHH. In 2004 human NELF was mapped to chromosome 9, and mutational screening in a cohort of 65 patients with HH (33 of whom had Kallmann syndrome) revealed, in a patient with sporadic nIHH, a heterozygous missense mutation which causes p.T480A in the protein sequence and creates a new splice site. This mutation was not detected in a panel of 100 control samples [106]. In 2007, a heterozygous intronic deletion of NELF which caused aberrant splicing and premature termination was found in a patient with severe Kallmann syndrome. Several members of this family were affected, with wide variability in the phenotypic features of each affected individual. The proband had severe Kallmann syndrome, the father had delayed puberty and congenital anosmia, the mother had clinodactyly, Duane ocular retraction syndrome and was menopausal, a female sibling had a bifid nose and highly arched palate, and a male sibling had clinodactyly. The proband was also shown to be heterozygous for a missense mutation in FGFR1, and a digenic model of inheritance was proposed to explain the phenotypic variation within this family [107]. More recently there has been growing evidence that nIHH and Kallmann syndrome may be caused by digenic inheritance of NELF and other gene mutations. Sequencing of NELF in a cohort of 168 nIHH and Kallmann patients identified one patient with compound heterozygous mutations in NELF, and two patients with digenic gene mutations. The first individual was heterozygous for p.A253T in NELF, and p.C163del in KAL1. The second individual was heterozygous for an intronic mutation which led to aberrant splicing and skipping of exon 10 in NELF, and a nonsense mutation p.W275X in TACR3 [108].

Additional features of Kallmann syndrome

As well as the obligatory hypogonadotropic hypogonadism and anosmia that characterize Kallmann syndrome, it is important to note that a range of additional features has been described in the literature. In Dode et al’s 2003 report of pathogenic mutations in FGFR1, affected individuals from one consanguineous family were noted to have cleft palate, agenesis of the corpus callosum, hearing loss, and fusion of the fourth and fifth metacarpal bones. Three other unrelated individuals with FGFR1 mutations had cleft lip or palate, one of whom also had iris coloboma and absence of the nasal cartilage on the right side. In another family carrying a synonymous substitution predicted to affect splicing, dental agenesis was a common feature among affected siblings and their unaffected mother [80]. Similarly, when Cole et al screened a cohort of 170 Kallmann syndrome and 154 nIHH patients for PROK2 and PROKR2 mutations they saw a range of atypical features in several of the patients in whom mutations were detected [100]. These features included epilepsy, hearing loss, sleep disorders, fibrous dysplasia, learning disability and synkinesia [100]. FGF8 mutations have been associated with cleft lip and palate, hearing loss, camptodactyly, and ocular abnormalities [104, 109], and KAL1 mutations have been associated with renal agenesis, synkinesia and hearing loss [95, 110]. Of particular interest for this study is the recurrent association of Kallmann syndrome or nIHH with hearing loss. The mechanism of hearing loss for most of the mutations detected is as yet unknown, but this highlights the association between the processes involved in hearing and endocrine function. Also worth noting are the few Kallmann syndrome patients who have been reported to have what is known as reversible Kallmann syndrome, in which gonadal function is sustained after treatment with testosterone has stopped. Mutations in KAL1, FGFR1 and PROKR2 have been found in patients with a reversible form of this syndrome [111-113].

Idiopathic Hypogonadotropic Hypogonadism (nIHH)

As previously discussed, the genes and mutations which cause Kallmann syndrome and nIHH largely overlap. Many of the genes causing nIHH have already been discussed in the previous section (FGFR1, PROK2, PROKR2, CHD7 and FGF8); however, there are some genes in which mutations cause nIHH but not Kallmann syndrome, and those genes are discussed in the following section.

Kallmann syndrome is caused by a defect in neuronal migration, and all of the genes discussed so far play some role in the regulation or guidance of this process. For nIHH there are two additional modes of pathogenicity which need to be considered: defects in the regulation of GnRH release, and defects in GnRH action. Mutations in the genes which are vital for these processes result in hypogonadotropic hypogonadism with normal olfactory function [79].

GNRHR

Given what we know about the importance of regulated GnRH release in mammalian reproduction, it would be natural to consider GNRH1 and GNRHR as candidate genes for pathogenic mutations. In 1997 de Roux et al identified compound heterozygous mutations in the GNRHR gene in a family with two siblings affected by nIHH [114]. The gonadotropin releasing hormone receptor is a G-protein coupled receptor with seven transmembrane domains. The affected family members were found to have one mutation, p.Q106R, in the first extracellular loop of the protein. This mutation was shown through hormone binding experiments to greatly reduce the binding of GnRH. The second mutation, p.R262Q, was in the third intracellular loop. Hormone binding was shown to be normal in cells transfected with this mutant receptor, but activation of phospholipase C was greatly reduced [114]. Traditionally hypogonadism due to GNRHR mutations was thought to be an autosomal recessive disorder, but studies of GNRHR variants in nIHH cohorts have indicated that this is not necessarily the case. As with all of the genes involved in the pathogenesis of hypogonadism that we have explored so far, inheritance is variable and complex. One study showed that, just as with Kallmann syndrome, nIHH has a skewed male to female ratio of approximately 5:1. The inheritance pattern for female probands could be either autosomal dominant or recessive in 60% of cases, with the remaining 40% being sporadic. The opposite was true of male cases, with the vast majority being sporadic and only 10% showing evidence of autosomal dominant or recessive inheritance [115].

GPR54 and Kisspeptin

In 2003 a novel and exciting pathway controlling reproduction in humans was identified. Mutations in the GPR54 gene were discovered to be a cause of nIHH simultaneously by two independent groups. At this time very little was known about its function. GPR54 is a receptor with seven transmembrane domains, the ligand for which is kisspeptin (KISS1). Kisspeptin is a 154 amino acid protein which is cleaved into a 54 amino acid active peptide known as metastin [116, 117]. The first group, led by Nicolas de Roux, identified a homozygous deletion in a consanguineous family. The parents were first cousins and five of the eight siblings were affected by idiopathic hypogonadotropic hypogonadism, with no sign of anosmia. The deletion was 155bp in length and spanned the end of intron 4 and the start of exon 5. This resulted in the loss of the 6th and 7th transmembrane domains, which are essential for stimulation of signal transduction by a G protein coupled receptor such as this one [118]. This group also identified a homozygous missense change, p.L102P, in another unrelated nIHH patient. The deleterious effect of this change was confirmed in 2007 when functional studies and a phenotypic analysis of individuals with this variant were carried out [119].

The second group to identify GPR54 mutations used a similar autozygosity mapping approach in a large consanguineous family from Saudi Arabia. They identified a homozygous missense change, p.L148S, which was not present in unaffected family members or in a panel of controls. Additionally, a screen of GPR54 in their cohort of nIHH patients identified one other individual with compound heterozygous mutations. The first mutation was a nonsense change, p.R331X, and the second was a nonstop change, p.X399R. The nonstop mutation leads to the continuation of the reading frame into the poly A tail without a novel stop codon. Neither mutation was detected in control samples. In vitro assays performed by this group showed reduced functionality of G protein coupled receptor 54 for all three of the mutations identified. Additionally, a Gpr54 null mouse model was shown to be a phenocopy of human nIHH in both males and females [120].

Studies of in vivo stimulation of gonadotropin release by metastin revealed it to be extremely potent when compared with other neuropeptides. At doses as low as 1 fmol metastin can elicit LH and FSH secretion in mice [121]. Similarly, a dramatic increase in gonadotropin levels could be seen in rats treated with metastin. The effect was shown in immature female rats, immature female rats that had been primed with pregnant mare serum gonadotropin, and in mature male rats. The subcutaneous administration of metastin acted through GnRH neurons to release gonadotropins and, in primed females, to induce ovulation [122]. Rat models have also shown that although metastin stimulates release of both gonadotropins, the effect is much more potent on LH release, with higher concentrations being required for statistically significant increases in serum FSH levels [123].

Not only do loss of function mutations in GPR54 result in nIHH, but at the other end of the spectrum gain of function mutations can result in precocious puberty. In 2008 a p.R386P mutation, which resulted in prolonged kisspeptin induced activation of intracellular signalling pathways, was identified in a young girl with precocious puberty [124]. So far there have been no reports of KISS1 mutations causing nIHH, but two mutations have been detected in three unrelated cases of central precocious puberty (CPP). The first was a heterozygous p.P74S missense change in a boy with idiopathic CPP, and the second was a homozygous missense change causing p.H90D in two unrelated girls also affected by idiopathic CPP. Neither of these mutations was detected in a screen of 400 control samples. In vitro studies indicated that the p.P74S mutation may produce a peptide that is more resistant to degradation. This could be consistent with a role in CPP. However, the relevance of the p.H90D variant is less clear. The amino acid at this position is not well conserved and, although none of the controls screened in the original study carried the variant, it has been detected in the heterozygous state in healthy controls in an American cohort, and in a patient with nIHH from a French cohort. Taken together with the in vitro evidence, which showed no difference in activity or rate of degradation compared with controls, this variant may turn out to be a rare polymorphism. However, for the time being it has only been reported in the homozygous state in CPP patients, meaning that a role in disease pathogenesis cannot be completely ruled out [125].

TAC3 and TACR3

The gene TAC3 encodes a member of the substance P-related tachykinin family of proteins known as Neurokinin B (NKB). Its receptor (NK3R, encoded by TACR3) is expressed on GnRH expressing neurons in mice and rats, and NKB is strongly expressed in kisspeptin expressing neurons of the sheep hypothalamus [126-128]. Neurokinin B was first highlighted as a crucial regulator of human sexual development when mutations were discovered in consanguineous nIHH families from Turkey. Autozygosity mapping was carried out on nine consanguineous families, three of which shared a common region of homozygosity on chromosome 4 in all affected individuals. TACR3 lay within this mapped region and was considered to be the strongest candidate due to its expression and co-localization with kisspeptin in GnRH neurons. Two mutations were detected, p.G93D in one family and p.P353S in the remaining two families. These missense changes were not detected in ethnically matched controls, and functional analysis revealed that both changes resulted in disruption to receptor signalling. An additional family mapped as part of this study did not have any mutations in TACR3, but did have two homozygous regions common to affected but not unaffected family members. One of these regions contained TAC3, an obvious candidate gene given the TACR3 mutations previously found. The affected individuals carried a missense mutation causing p.M90T, which alters the C terminal amino acid of the mature peptide and lies within a highly conserved motif. Once again this variant was not detected in control samples, and activity of the mutant protein was reduced compared to wild-type NKB [129]. The same group have also found another pathogenic mutation in TACR3. The novel missense mutation causes p.H148L in three siblings with nIHH, and signalling was shown to be severely disrupted in vitro [130].

More recently a screen of 345 patients with nIHH has estimated the prevalence of rare variants in TACR3 to be approximately 5.5% [131]. Among the 345 affected individuals, 13 rare variants were detected in 19 patients (Figure 1.5). Four of these were synonymous variants, one of which (p.T246T) was in the homozygous state and was predicted to affect splicing. Six variants were non-synonymous (one was the previously reported p.P353S, and five were novel), but functional studies revealed that only three of the five novel variants (p.Y256H, p.R295S and p.Y315C) resulted in loss of signalling in vitro. The p.R295S variant was detected in only one sample and was in the heterozygous state. Sequencing of other HH genes did not reveal any other novel mutations. The final three variants were nonsense mutations, one of which (p.S27X) was found in the homozygous state in one patient. The second, p.W275X, was found to be homozygous in three patients and heterozygous in an additional four patients. The third nonsense mutation, p.W208X, was detected in the heterozygous state in one sample. Sequencing of other HH genes in these samples revealed no novel mutations. The authors suggested that heterozygous mutations in this gene might contribute to the nIHH disease phenotype, as with PROKR2. It could be that a single mutation is enough to result in phenotypic features, or the contribution could be in the form of digenic inheritance with as yet unidentified genes. Interestingly, the phenotypic analysis of patients with TACR3 mutations showed partial or complete reversal of phenotype (similar to the previously described reversible Kallmann syndrome) upon completion of hormone therapy. This suggests that NKB signalling is more crucial during late gestation and the early neonatal period than during adulthood. In one consanguineous family with a homozygous TAC3 deletion resulting in a premature stop codon, there were four affected individuals. Two of the females with this mutation went on to conceive spontaneously and a third continued to have spontaneous regular menstrual cycles after hormone treatment was stopped. The authors suggest that other members of the tachykinin family may compensate for the loss of NKB, making it less essential for regulation of GnRH release in adult life [131].

Figure 1.5. Schematic illustration of TACR3 showing a range of mutations. Image taken from [131].

The identification of a number of different mutations in both TAC3 and TACR3 in nIHH cohorts highlights the importance of the NKB pathway in reproduction. However, despite the work that has been completed so far the mechanism by which this pathway influences GnRH release is still unknown. It appears that if this pathway is impaired during neonatal development the result is hypogonadism, but it plays less of a role in GnRH regulation in adult life, making recovery possible for affected individuals [131]. Further research into this receptor/ligand pair will be needed to test this hypothesis and in general could provide greater insight into the role of NKB in reproduction.

Chromosome   Gene             Phenotype
Xp22.3       KAL1             Kallmann syndrome
8p11.22      FGFR1            Kallmann syndrome/nIHH
3p13         PROK2            Kallmann syndrome/nIHH
20p12.3      PROKR2           Kallmann syndrome/nIHH
8q12.2       CHD7             Kallmann syndrome/CHARGE syndrome/nIHH
10q24.32     FGF8             Kallmann syndrome/nIHH
9q34.3       NELF             Kallmann syndrome/nIHH
4q13.2       GNRHR            nIHH
1q32.2       KISS1            nIHH
19p13.3      KISS1R (GPR54)   nIHH/precocious puberty
12q13.3      TAC3             nIHH
4q24         TACR3            nIHH

Table 1.2. Chromosomal location and associated phenotype of the hypogonadotropic hypogonadism genes discussed in Section 1.2.1.

1.2.2 Hypergonadotropic Hypogonadism: Premature Ovarian Failure (POF) and Ovarian Dysgenesis

In females hypergonadotropic hypogonadism is more commonly termed premature ovarian failure (POF). Premature ovarian failure is defined as the onset of amenorrhea combined with elevated serum gonadotropin levels before the age of 40 years, and estimates of incidence in the United States suggest that POF affects approximately 1% of women of reproductive age. An estimated 1 in 10,000 women is affected by the age of 20 and 1 in 1,000 by the age of 30 [132]. POF can present as lack of pubertal development (absence of mammary and pubic hair development) and primary amenorrhea (lack of menarche at 13 years), or as secondary amenorrhea (a premature form of menopause in which normal menarche occurred but menstruation ceased before the age of 40 years) [133]. The most severe form of POF is known as ovarian dysgenesis and leads to the complete depletion of follicles before the onset of puberty, resulting in streak gonads (small non-functional ovaries) [134]. The typical phenotypic features are primary amenorrhea, reproductive sterility, lack of secondary sexual characteristics, elevated gonadotropin levels and low oestrogen levels [135]. Pure gonadal dysgenesis is the presence of hypergonadotropic hypogonadism in a patient with a normal sex chromosome constitution, 46,XX or 46,XY [136]. The pathogenesis of POF is varied and can be a result of accelerated follicle loss, impaired follicular development or a reduced initial number of follicles in affected women. Non-genetic causes of POF include chemotherapy, radiation therapy, hypothyroidism, infection and surgery, but for many women the cause is genetic. A summary of POF genes can be seen in Table 1.3.

Copy Number Variation of the X chromosome

The X chromosome is vitally important for normal sexual development in females. This importance can be demonstrated by consideration of the genetic abnormalities associated with the X chromosome which result in impaired ovarian development.

Copy number variations and chromosomal abnormalities account for approximately 9% of cases of POF, the majority of which occur on the X chromosome [137]. Turner syndrome, or monosomy X (45,X), results from the complete loss of one copy of the X chromosome. Ovarian dysgenesis is a typical feature of this syndrome, along with short stature and congenital lymphedema. Clinical presentation can be extremely variable, ranging from prenatal presentation with cystic hygroma to presentation with moderate short stature or primary infertility. There can be various other associations such as cardiac defects, high blood pressure and progressive sensorineural deafness, and approximately 40% of patients have a mosaic karyotype [138]. Although ovarian dysgenesis is typical in cases of Turner syndrome, approximately 5-10% of affected girls do have spontaneous sexual development; around 5% have menstrual periods, and 2-5% have spontaneous pregnancies [139]. A histological study of Turner embryos at different developmental stages has shown that germ cells are present during early development [140]. At around 15-20 weeks of development the rate of apoptosis of germ cells in 45,X fetuses is greatly increased compared to karyotypically normal fetuses, and this is thought to lead to gonadal dysgenesis in later life [141]. However, the mechanism of follicular loss and the age at which ovarian failure occurs are still largely unknown in this group of patients.

Triple X syndrome is an X chromosome aneuploidy which affects approximately 1 in 900-1000 women [142, 143]. There is a wide range of phenotypic variability in women with 47,XXX: some are asymptomatic or very mildly affected, and others are significantly affected. The most common phenotypic features include tall stature, hypotonia, epicanthic folds, clinodactyly, motor and speech delay and learning difficulties [143]. For most 47,XXX women sexual maturation and fertility are not significantly affected, but there have been some suggestions of an association with POF [144]. As early as 1959 Jacobs et al reported a case of 47,XXX with secondary amenorrhea, lack of secondary sexual characteristics and reduced follicle number upon ovarian biopsy [145]. In 1983, a case report was published on two women with 47,XXX and premature ovarian failure. Both had normal sexual development in adolescence and normal secondary sexual characteristics, and both presented with irregular menses and secondary amenorrhea with elevated LH and FSH levels and low oestrogen levels [146]. More recently there was a report of a 31-year-old woman with triple X syndrome who had normal height, intelligence and sexual development, but premature ovarian failure [147]. The true prevalence of POF in women with 47,XXX is still unknown, but in one study in which a POF cohort was screened for cytogenetic abnormalities 47,XXX was detected in 3.8% of cases. Interestingly, both of the cases detected by this study also had autoimmune thyroid conditions [142].

Partial deletions or translocations of chromosome X also cause aberrations in ovarian development. These rearrangements have been shown to occur on both the long and short arms of the X chromosome, indicating that both contain genes which are important for normal ovarian development. However, it has been suggested that there is a ‘critical region’ from Xq13.3 to Xq27, and disruption of regulatory domains within this region is thought to cause altered gene transcription [148]. Many of the translocations described in the literature are intergenic, but there are examples of breakpoints which disrupt the function of specific genes [149]. DIAPH2 is the human homologue of the Drosophila gene diaphanous, which is important for cytokinesis, and mutant alleles of this gene cause sterility in male and female fruit flies [150]. In 1997, 11 X chromosome translocations in patients with POF were mapped to a 15Mb region on Xq21 [151]. One of the patients mapped to this location had a 46,X,t(X;12) balanced translocation and was described as having secondary amenorrhea and gonadal dysgenesis with no other clinical features. Her mother also carried the translocation and was diagnosed with premature menopause at 32 years of age; both had high gonadotropin levels [151, 152]. The breakpoint in this family was found to occur in the last intron of the DIAPH2 gene and segregated with POF. This translocation was hypothesized to result in either a truncated product which would be prone to nonsense mediated decay, or an altered protein formed by fusion with the chromosome 12 sequence. Expression analysis revealed DIAPH2 to be ubiquitously expressed in human adult and fetal tissues as well as during mouse embryogenesis. This includes expression in the developing ovaries and testes [153]. This translocation is the only DIAPH2 mutation described in relation to POF, and very little is known about the exact role of the gene in ovarian development. Another example of an X;autosome translocation which disturbs a gene within this ‘critical region’ was described in 2000 by Prueitt et al. The patient was described as having secondary amenorrhea, and a balanced translocation within Xq25 was found to disturb XPNPEP2, which encodes a ubiquitously expressed aminopeptidase. The breakpoint lay within intron 1 of the gene, but some XPNPEP2 mRNA was still detected in cells carrying the translocation, meaning that the gene may at least partially escape X inactivation [154]. Once again no other mutations in XPNPEP2 have been found in patients with POF and its exact role in the pathogenesis of this disease remains unclear.

It should be noted that it is not only X chromosome copy number variants that have been linked to POF pathogenesis. Microdeletions and duplications have been identified in some autosomal genes determined to be important in ovarian development. In 2010 Ledig et al carried out array-CGH analysis of a cohort of POF and ovarian dysgenesis patients and found rare or previously undescribed CNVs in 17 of 44 POF and 14 of 30 ovarian dysgenesis patients [139]. The microdeletions and microduplications described in this paper occurred predominantly in genes which have no previous association with POF. However, the authors focused their discussion on the genes most likely to be involved in ovarian development, and may have identified potential new candidate genes such as IMMP2L and PLCB1. Follow up studies will be needed to provide evidence as to the role of these genes in POF pathogenesis.

BMP15

Bone morphogenetic protein 15 (BMP15) is located on the X chromosome and is part of the transforming growth factor β (TGF-β) superfamily of proteins, which are vital for embryonic development and tissue formation [155]. BMP15 expression studies in mice and sheep have shown that it is expressed exclusively in the oocyte, and high levels can be detected throughout follicular maturation and ovulation [156, 157]. Naturally occurring BMP15 mutations have been detected in several different breeds of sheep, in which homozygotes (or compound heterozygotes when two breeds are crossed) have POF caused by arrested folliculogenesis [156, 158]. Interestingly, the Bmp15 knockout mouse model has a milder phenotype, leading to subfertility caused by defects in the ovulation process [159]. There has been evidence to suggest that BMP15 and its paralogue GDF9 have different roles in rodent and ruminant species. A study of sheep and cow granulosa cells has shown that GDF9 and BMP15 have species specific functions when administered individually. However, the overall effect of both growth factors together appears to be similar [160].

Despite all of the evidence provided by models as to the importance of these proteins in ovarian development, the role of BMP15 mutations in human POF is still controversial. Dominant negative mutations were first reported in human BMP15 in two sisters with primary amenorrhea. The p.Y235C mutation was shown to decrease growth of granulosa cells in vitro after stimulation with wild-type BMP15 [155]. Since the initial report there has been a flurry of publications reporting novel variants in BMP15 in association with POF [161-163]. However, some mutations in BMP15 which have been reported as pathogenic have later been shown to be rare polymorphisms, and caution should be taken when interpreting BMP15 mutations in relation to POF. For example, the p.A180T mutation was described as a cause of POF by two groups before it was suggested by Ledig et al that this variant was more likely to be a rare polymorphism [161, 162, 164]. In both of the initial reports p.A180T was not detected in the control cohort, but Ledig et al detected it in 1/220 control alleles. Additionally it was found that although the mutation was detected in the heterozygous state in a patient with POF, it had been inherited from her unaffected mother, and was also carried by an unaffected sister, indicating a lack of segregation with disease. Similarly, in a second family the mutation was carried by only one of two affected sisters, and was maternally inherited [164]. This all pointed to the likelihood of p.A180T being a rare polymorphism rather than a pathogenic mutation. More recently a functional luciferase reporter assay has been used to assess the impact of p.A180T on protein function. No significant difference in activity was found between this variant and wild-type [163]. Finally, Tiotiu et al detected the p.A180T variant in two healthy controls, supporting the growing body of evidence that this variant is a rare polymorphism [165].

Another good example of a controversial BMP15 mutation is the missense change p.L148P, which affects a highly conserved residue and has been detected in the heterozygous state in patients with POF by multiple groups [161, 163, 164, 166]. This variant has been considered by some as being one of the most potentially damaging in the context of POF. The same functional luciferase reporter assay mentioned previously showed a large reduction in activity for this missense change compared with wild-type BMP15 [163]. Several of the patients described as having this variant are likely to be of African ancestry, although the mutation was also detected in a few Caucasian patients. Tiotiu et al were the first group to search for the variant in a panel of sub-Saharan African controls and detected p.L148P in the heterozygous state in 6.3%, indicating that this variant is more prevalent in African populations than in Caucasian ones. Most of the studies prior to this one had used control cohorts of Caucasian ancestry, and Tiotiu et al took this opportunity to emphasize the importance of interpreting genetic variants in the context of ethnicity. The seemingly higher prevalence of this change in the African population certainly raises questions about, but does not exclude, the role of this variant in POF pathogenesis [165]. The extent of the contribution of BMP15 mutations to POF and ovarian dysgenesis remains unclear. What is clear is that BMP15 and GDF9 play an important role in ovarian development and are likely to contribute to POF pathogenesis in some affected individuals. However, the studies discussed above have highlighted the fact that functional assays and careful screening of large control cohorts are necessary to determine the relevance of any mutations suspected of being pathogenic.

FMR1

The most common form of X linked mental retardation, known as Fragile X syndrome, is the result of an expansion of a CGG trinucleotide repeat in the 5’ UTR of the FMR1 gene (OMIM 300624). The full disease causing mutation is considered to be an expansion over 200 repeats, with individuals who carry between 55 and 199 repeats being described as having a premutation [167]. The first suggestion of an association between the FMR1 premutation and POF was in 1991 in a study of phenotypic features of a large Fragile X cohort [168]. A large multi-centre study of 204 women from fragile X families showed that a significantly higher proportion of premutation carriers had POF (24%) than non mutation carriers (6%) or controls (8%) [169]. A smaller study of a cohort of 147 females with POF found premutations in 4 familial and 2 sporadic cases. A larger international collaboration which screened 760 women from fragile X families found that 16% of premutation carriers had POF, compared with no cases in full mutation carriers and 0.4% of healthy controls [170, 171]. Two hypotheses have been suggested to explain the association between FMR1 premutations and POF. The first is that, since the premutation allele is known to increase transcription, an increased production of the protein may lead to accumulation and impaired expression of genes important for ovarian development [172]. The protein product of FMR1, FMRP, is an RNA binding protein that regulates translation and is highly expressed in the germ cells of the fetal ovary in mice [173, 174]. The second hypothesis is that the longer repeat may lead to less efficient translation, and that build up of abnormal mRNA in the ovaries may have a toxic effect leading to follicular atresia [172]. Interestingly, it has also been suggested that deletions in the FMR2 gene, which lies 600Kb distal to FMR1, may be more frequent in women with POF than in healthy controls [175].
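The repeat-length thresholds quoted above can be expressed as a simple classification rule. The sketch below is illustrative only: the function name and the label for alleles below the premutation range are hypothetical, and treating exactly 200 repeats as a full mutation is an assumption, since the text specifies only "over 200" and "between 55 and 199".

```python
def classify_fmr1_allele(cgg_repeats: int) -> str:
    """Classify an FMR1 allele by CGG repeat count using the thresholds in the text.

    Assumption: an allele with exactly 200 repeats is treated as a full mutation.
    """
    if cgg_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if cgg_repeats >= 200:
        return "full mutation"
    if cgg_repeats >= 55:
        return "premutation"
    # The text does not subdivide alleles shorter than 55 repeats.
    return "below premutation range"


# Example alleles (hypothetical repeat counts, not patient data).
for repeats in (30, 55, 199, 250):
    print(repeats, classify_fmr1_allele(repeats))
```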

POF1B

POF1B, located on the X chromosome, encodes an actin binding protein which is important for germ cell division. This gene was first identified as a candidate for POF by Bione et al. in 2004. A patient with secondary amenorrhea was found to have a balanced translocation, 46,X,t(X;1), whose breakpoint lay within the third intron of POF1B. POF1B is expressed in the embryonic mouse ovary, indicating a role in early ovarian development; it escapes X inactivation and has no homologue on the Y chromosome. Bione et al. also screened a cohort of 223 POF patients for mutations in POF1B. They found five missense variants in the POF cohort, but screening of 900 control samples detected the same five variants at different frequencies. The results of this first study could not conclusively show an association between POF1B and ovarian failure [176]. The link between POF1B and POF was established more firmly in 2006 by the identification of a homozygous missense mutation in affected individuals from a consanguineous Lebanese family. The five affected females from this family all had delayed puberty and primary amenorrhea, and autozygosity mapping revealed a region of interest on the X chromosome. The homozygous point mutation found in affected individuals lay in exon 10 of POF1B and caused p.R329Q in the protein sequence. A screen of 92 ethnically matched controls detected the variant in the heterozygous state in 4 individuals, which results in an estimated allele frequency for this population of 2.2%. The predicted rate of homozygosity was 0.048%, which could explain the frequency of POF in the Lebanese population. Functional assessment of the mutant allele showed that it bound to non-muscle actin with approximately fourfold less affinity than wild-type protein [177].
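The allele frequency and predicted homozygosity rate quoted for p.R329Q can be reproduced with a short calculation. This is a minimal sketch, assuming that each of the 92 controls contributes two copies of the locus (that is, that all controls are female, since POF1B is X-linked) and that genotype frequencies follow Hardy-Weinberg proportions; the function names are hypothetical and are not taken from [177].

```python
def allele_frequency(heterozygotes: int, homozygotes: int, individuals: int) -> float:
    """Variant allele frequency, assuming two allele copies per individual."""
    return (heterozygotes + 2 * homozygotes) / (2 * individuals)


def expected_homozygote_rate(q: float) -> float:
    """Expected homozygote frequency under Hardy-Weinberg proportions (q squared)."""
    return q * q


# Figures from the text: 4 heterozygous carriers among 92 controls.
q = allele_frequency(heterozygotes=4, homozygotes=0, individuals=92)
q_rounded = round(q, 3)  # the rounded 2.2% figure quoted in the text

print(f"allele frequency: {q:.2%}")
print(f"homozygote rate from unrounded frequency: {expected_homozygote_rate(q):.4%}")
print(f"homozygote rate from rounded frequency:   {expected_homozygote_rate(q_rounded):.4%}")
```

Squaring the rounded 2.2% frequency gives the 0.048% figure quoted above, whereas the unrounded frequency (4/184) gives a marginally lower estimate of about 0.047%.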

Autosomal Gene Mutations and POF

The X chromosome unquestionably plays an important role in sexual development in females; however, autosomal gene mutations have also been detected in patients with POF. There are several autosomal genes which are known to encode proteins important in the regulation or function of the HPG-axis. A selection of these genes and the mutations that have previously been identified are reviewed in this section, but for a more comprehensive review the reader is directed to [178].

Gonadotropin Hormones FSHB and LHB

Given that FSH and LH are essential components of the HPG-axis, it is not surprising that mutations in the genes encoding the hormone-specific β-subunits of these proteins affect sexual development in humans. In 1993, a case of primary amenorrhea and infertility due to isolated FSH-β deficiency was found to be caused by a homozygous deletion in the FSHB gene [179]. The mutation was a 2 base pair deletion in exon 3 resulting in a frameshift and premature stop codon. The mutant allele was predicted to significantly reduce the ability of the protein to bind to the FSH receptor, since the C-terminal region of the peptide is vital for this function [179]. The same 2bp deletion was later detected in an Israeli woman with isolated deficiency of FSH; in this instance, treatment with gonadotropin was able to induce ovulation and resulted in pregnancy [180, 181]. An additional case of isolated FSH-β deficiency was described in 1997 in a female with delayed pubertal development and primary amenorrhea. The patient responded to GnRH treatment, as indicated by appropriate levels of LH, meaning that the HPG-axis was functioning normally. This woman had compound heterozygous mutations in the FSHB gene. The maternal allele had a 2bp deletion resulting in a premature stop codon, p.V61X, and the paternal allele contained a missense mutation which causes p.C51G. The function of each mutation was assessed by transfecting either wildtype or mutant FSHB, along with a normal α-subunit gene, into stable cell lines. The α and β subunit mRNAs were expressed at similar levels, but only wildtype protein could be detected by an immunoradiometric assay. No mutant protein was detected. Both the deletion and the missense mutation are predicted to affect the affinity of the mutant β-subunit for binding to its α counterpart and forming functional FSH [182].

In the cases of isolated FSH-β deficiency described above, all affected individuals had a complete lack of pubertal development and were infertile prior to gonadotropin treatment. In 2002, an interesting case was described in which two siblings, one male and one female, with a homozygous nonsense mutation in FSHB showed signs of pubertal development [183]. The female presented with partial breast development, but primary amenorrhea, low estradiol and FSH levels, and high LH levels. The male sibling appeared to have normal pubertal development with normal testosterone levels, but small testes. He presented with infertility and also had low FSH levels, elevated LH levels and azoospermia. In vitro analysis of the mutation found in these siblings (p.Y76X) was carried out. Mutant FSH was not detectable using several different assays, and this novel mutation was compared with the previously described p.V61X and p.C51G mutations. Despite assay results being the same for all three mutations, the phenotype in this latest family was clearly milder. The authors suggested two possible explanations for the findings: either the assays used were not sensitive enough to detect very low levels of FSH, or other environmental or genetic factors could partially compensate for loss of FSH in some cases [183].

Mutations in the LHB gene, which encodes the β-subunit of luteinizing hormone (LH), are a rare cause of female hypogonadism. Several mutations have been described as a cause of male hypogonadism, one of which also caused hypogonadism in a female sibling [184-186]. A homozygous mutation which causes aberrant splicing of the LHB mRNA was found in three siblings (two male and one female) from a consanguineous Brazilian family. The affected male siblings presented with hypogonadism, selective LH deficiency, immature Leydig cells and arrested spermatogenesis. Their sister had undergone normal pubertal development, but presented with secondary amenorrhea and infertility. She had a normal sized uterus and normal FSH levels. Treatment with exogenous oestrogen resulted in the development of a dominant follicle but could not restore ovulation, and no corpus luteum could be detected. The mutation found in this family was a single nucleotide transversion, c.183+1G>C, which affected the 5’ splice donor site of intron 2, resulting in the retention of intron 2 in the mRNA sequence [184]. The phenotype described in the female patient indicates that LH is not necessary for early preovulatory stage follicular growth, or for pubertal development in females. However, lack of LH did result in arrested follicular development and impaired ovulation, indicating an important role for LH in later stage processes. To date, only one other variation in LHB has been reported in association with female infertility. The mutation, a missense change which caused p.G102S, was initially detected in a screen of 103 unrelated individuals from Singapore, and was later shown to be associated with male and female infertility [187]. The association with female infertility was found by screening 52 unrelated women with infertility and 212 healthy fertile controls. The mutation was found in the heterozygous state in two women with infertility and endometriosis, but was not found in any healthy controls [188]. A similar study revealed that this variant also has an association with male infertility. The mutation was detected in 5/145 infertile men, but not in any of the 200 fertile control samples [189]. A functional study was carried out in which the p.G102S mutant was compared to wild-type LHB when transfected into Chinese hamster ovary cells. The results revealed that α and β subunit dimerization was unaffected, but the mutant protein had lower receptor binding activity than wild-type protein [190].
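The carrier counts reported in the two p.G102S screens can be compared between cases and controls with a one-sided Fisher's exact test. The sketch below is illustrative only: the choice of test and the function name are assumptions made here, and it is not implied that the cited studies [188, 189] analysed their data in this way.

```python
from math import comb


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact p-value for carrier enrichment among cases.

    Returns the hypergeometric probability of observing at least
    `case_carriers` carriers among the cases, with all table margins fixed.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    upper = min(carriers, n_cases)
    favourable = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return favourable / comb(total, carriers)


# Counts as reported in the text.
p_women = fisher_one_sided(2, 52, 0, 212)   # 2/52 infertile women, 0/212 controls
p_men = fisher_one_sided(5, 145, 0, 200)    # 5/145 infertile men, 0/200 controls

print(f"women: p = {p_women:.4f}")
print(f"men:   p = {p_men:.4f}")
```

With so few carriers in either screen, such a test is sensitive to a change of a single carrier, so any resulting p-value should be read cautiously.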

Gonadotropin receptors FSHR and LHR

In 1995 the first mutation in the follicle stimulating hormone receptor (FSHR) gene was reported. Through linkage analysis of Finnish families affected by ovarian dysgenesis, a mutation in exon 7 of the FSHR gene was found. Affected individuals from six different families were all homozygous for a missense change which resulted in p.A189V in a highly conserved region of the protein. A functional assay showed that this mutation significantly reduces signal transduction but does not affect ligand binding [191]. Further work on the characterization of this mutation found that the mutant protein was produced at similar levels to wildtype protein and that it was stable. However, confocal microscopy revealed that localization of the receptor to the plasma membrane was impaired and most of the mutant protein that was detected was intracellular. The p.A189V mutation is in a conserved motif of 5 amino acids that is present in all glycoprotein hormone receptors. The same mutation was produced in a human LHR construct and the same functional consequences of reduced signal transduction and intracellular localization were observed. This implies that the conserved motif in these glycoprotein hormone receptors is essential for correct targeting to the cell surface membrane [192]. A second FSHR mutation was identified in a Finnish patient in 2002. The patient, a female with primary amenorrhea and ovarian failure, was found to be a compound heterozygote for the previously described p.A189V mutation and a novel variant causing p.A419T, located in the second alpha helix of the transmembrane domain of the receptor. The novel mutation was determined to greatly reduce signal transduction but had no effect on expression at the cell surface membrane or binding to FSH [193]. The p.A189V mutation detected in the Finnish cohort has been screened for in other populations, including the UK and North America. It appears that this mutation is rare in other populations, supporting the theory that this is a founder mutation in Finland [194, 195].


Figure 1.6. Schematic illustration of FSHR showing a range of mutations. Image taken from [193].

In 2003 a novel homozygous FSHR mutation was described in a Caucasian woman with primary amenorrhea, delayed sexual development, high LH and FSH levels and small ovaries. The homozygous missense change resulted in a p.P519T mutation in the second extracellular loop of the receptor. The mutant receptor was shown to have impaired targeting to the cell surface membrane when expressed in vitro [196]. In the same year, a case of 46,XX primary amenorrhea with delayed puberty was found to be caused by a p.P384R mutation in the FSHR gene. This case was interesting because the proband was initially thought to be homozygous for the missense mutation. Sequencing of DNA from the girl’s father revealed that he was heterozygous as expected, but the mother was homozygous for the wildtype sequence. The authors suggested that this female is in fact hemizygous, and has a small deletion of FSHR which was either inherited maternally or occurred spontaneously. Alternatively, they suggested that somatic gene conversion, in which exon 10 of the paternal allele has exchanged with the same region of the maternal wildtype allele, could explain the results. Uniparental disomy was excluded based on evidence from microsatellite analysis [197].

The mutations described so far cause an ovarian dysgenesis phenotype, but partial loss of function mutations in the FSHR gene have also been described. Compound heterozygous mutations were described by Beau et al in a patient presenting with secondary amenorrhea, high gonadotropin levels (particularly FSH) and infertility [198]. The missense changes caused p.I160T in the extracellular domain and p.R573C in the third intracellular loop of the receptor. This woman had an unusual phenotype because her height, weight and pubertal development were all normal, and she had normal sized ovaries and numerous follicles of up to 5mm in size on ultrasound. Biopsy revealed that her follicular development was normal up until the stage of a small antrum, but that development was impaired beyond this. It was hypothesised that residual FSH receptor function was enough to allow follicular growth to the antral stage, but that follicular selection and pre-ovulatory development would require much stronger FSH signalling. This more intense stimulation by FSH was not facilitated by the mutant versions of the FSH receptor detected in the affected woman [198]. In a similar case, a woman presented with normal growth and sexual development but primary amenorrhea and high gonadotropin levels. The uterus and ovaries were of normal size and follicles could be detected, but once again follicular development had been arrested at the stage of a small antrum, this time 3mm in size. Sequencing of the FSHR gene revealed compound heterozygous mutations: p.D224V in the extracellular domain of the protein and p.L601V in the third extracellular loop. Functional analysis of the two mutations revealed that the biological activity of the p.L601V mutant was markedly reduced, and that the p.D224V mutant was not targeted to the cell surface membrane properly. The difference in severity between the phenotype of the patient described by Beau et al and the patient described in this study was also explored [199]. This was thought to be due to the difference in residual biological activity of the p.L601V mutation compared with the p.R573C mutation. The percentage stimulation compared to wildtype for the p.R573C mutant was 24±4%, compared with 12±3% for the p.L601V mutant [199].

As with inactivating mutations in LHB, mutations in the LHR gene are thought to be a rare cause of POF. Women with mutations in LHB or LHR share a similar clinical phenotype. They have normal female external genitalia, normal pubertal development, oligo-amenorrhea and infertility. Levels of progesterone and estradiol will be as expected for the early or mid follicular phase but will not reach the higher levels expected during ovulation. The main distinguishing clinical feature is the serum LH level in these women. In patients with LHB mutations the LH levels are low, whereas in patients with LHR mutations these levels are elevated [200, 201]. In 1996 a family with a homozygous nonsense mutation, p.R554X, in LHR was described. This mutation was found in four affected siblings. Karyotyping of affected individuals revealed three of the siblings to be 46,XY, but phenotypically female. The mutation in these individuals had caused male pseudohermaphroditism; they all had normal female external genitalia but lacked breast development and had primary amenorrhea. All three individuals had a gonadectomy and the histology results revealed Leydig cell hypoplasia. In the same family a 46,XX affected individual presented with normal sexual development but amenorrhea and high LH levels with normal levels of FSH and estradiol. It was suggested that the normal sexual development of this affected female indicates that LH is less important in pubertal development in females than in males, and that the degree of LH resistance must be severe to cause a POF phenotype in women [202]. A similar family was described with two affected siblings: a 46,XY pseudohermaphrodite with female genitalia, and her 46,XX sister with oligo-amenorrhea and infertility. The affected individuals in this family were found to have a homozygous 6 base pair deletion in LHR, which resulted in the deletion of two amino acids from the protein sequence, p.L608_V609del. Functional experiments showed that a greater proportion of the mutant receptor than of the wildtype receptor was retained intracellularly, and that steady state expression levels of the mutant were lower than those of wildtype [203].

A missense mutation was reported in 1996 which causes pseudohermaphroditism and Leydig cell hypoplasia in affected 46,XY males, and amenorrhea with infertility, but normal sexual development, in females. The mutation results in p.A593P, and was found to be homozygous in two affected 46,XY males and their 46,XX sister, whose parents were first cousins [204, 205]. Ovarian biopsy of the affected female revealed follicles at all developmental stages including primordial, preantral and large antral follicles. However, hormone measurements and primary amenorrhea in the patient indicated that ovulation did not occur. These results once again highlight the importance of LH signalling for ovulation, but show that follicular maturation can occur through FSH signalling alone [205]. The p.A593P mutation was studied in HEK293 cells, which revealed a decreased overall expression of the mutant receptor, as well as retention of the receptor in the cytoplasm instead of correct localization to the cell surface membrane. Recently, it was shown that a small molecule agonist, Org42599, binds allosterically and rescues the plasma membrane expression of p.A593P mutant constructs. This work could potentially provide interesting new approaches to therapeutic treatments for patients with LHR mutations [206].

FIGLA

FIGLA is an oocyte specific transcription factor which functions in the formation of primordial follicles in the human fetal ovary [207, 208]. In mice, Figla is expressed early in embryonic development in female, but not male, gonads. Expression is restricted to the oocytes, and targeted mutagenesis of Figla in mice causes infertility in homozygous females. In contrast, male mice homozygous for the null allele produced normal litter sizes when mated with wildtype or heterozygous females. In null females the embryonic gonads appear very similar to wildtype gonads, but primordial follicles do not form in the postnatal period and germ cells are massively depleted [209]. Information on the function of Figla in mice made FIGLA an attractive candidate gene for premature ovarian failure in humans. In 2008 a screen of 100 Chinese women with POF revealed two heterozygous deletions in FIGLA which were not found in healthy controls. The first deletion was 22bp in length and caused a frameshift and premature stop at position 66 in the protein sequence. The affected individual had irregular menstrual cycles from the age of menarche, gave birth to a healthy daughter at 23 years and presented with secondary amenorrhea at 36 years old. Pelvic ultrasonography revealed streak ovaries with no visible follicles. The second affected individual was found to have a heterozygous in frame 3 bp deletion which resulted in the loss of an asparagine residue at position 140 in the protein sequence, p.N140del. The phenotypic features in this case included menarche at 14 years of age and normal menstrual cycles until 27 years old, when secondary amenorrhea began to develop. She presented at the clinic at age 29 with secondary amenorrhea and infertility but no other clinical features. Only the right ovary was visualized using ultrasonography; it was found to be small and follicles appeared to be depleted [210]. There have been no other reports in the literature of FIGLA mutations as a cause of POF.

NOBOX

Newborn ovary homeobox gene, or NOBOX, is an oocyte specific transcriptional regulator which is required during the initial stages of folliculogenesis [211, 212]. In a knockout mouse model of Nobox, the null females were infertile and had atrophic ovaries lacking oocytes at six weeks old. The null males were phenotypically normal. It was shown that in the postnatal period there was a rapid loss of oocytes and no evidence of development of the follicles beyond the primordial stage [213]. The information gathered from animal models and the specific expression of NOBOX in ovarian tissues in both humans and mice made it an excellent candidate gene for POF. In 2007, Qin et al screened a cohort of 96 women with POF and found a novel heterozygous missense mutation in NOBOX which causes p.R355H in the protein sequence [214]. This mutation was not detected in a screen of 278 healthy ethnically matched controls. Functional experiments showed that the mutation disrupted binding in the homeodomain, and had a dominant negative effect on the wildtype protein [214]. No other mutations within NOBOX have been described in the literature, but deletions of the chromosome 7 region containing the NOBOX gene have been associated with primary amenorrhea as well as a range of other phenotypic features. In 2006, Bisgaard and colleagues described twin sisters with long QT syndrome with features of mental retardation, sensorineural hearing loss, microcephaly and growth retardation. The patients were found to have a deletion of 7q34-q36.2 of approximately 12.4Mb, which deleted several genes including KCNH2 (mutated in the dominant form of long QT syndrome) [215]. NOBOX was another of the genes found within the deleted region in these twins. There was no indication of primary amenorrhea or fertility issues in these patients at the time of clinical assessment. However, the twins were 6½ years old at the time of presentation, meaning that primary amenorrhea would not have been evident as a phenotypic feature at this time. In 2008 a patient was described with a 12Mb deletion of 7q33-q35, once again including the NOBOX gene. The patient was female and was diagnosed as having autism with severe language delay and primary amenorrhea. The two most functionally relevant genes were thought to be CNTNAP2 and NOBOX, based on associations with autism susceptibility and POF respectively [216]. More recently another deletion in the same region was described in two siblings with mental retardation, language delay, and dysmorphic facial appearance. The siblings were a brother and sister, and primary amenorrhea was an additional feature in the affected female. Once again haploinsufficiency of the NOBOX gene was considered to be the most credible explanation for POF in this patient [217].

NR5A1

NR5A1 encodes an important regulator of steroidogenic enzymes known as Steroidogenic factor-1 (SF-1). This protein is vitally important for the correct regulation of many aspects of adrenal and reproductive function in mammalian species [218, 219]. The initial mutations described in NR5A1 were in patients with adrenal insufficiency and pure 46,XY gonadal dysgenesis. This phenotype was consistent with the phenotype observed in homozygous knockout mouse models, which have female Müllerian structures in male and female animals, adrenal and gonadal agenesis and a variety of additional features [220, 221]. NR5A1 mutations as a cause of this phenotype in humans were first described in 1999 in a patient with a de novo heterozygous 2bp mutation [222]. A spectrum of phenotypes has since been attributed to mutations in NR5A1, including 46,XX POF. In 2009, mutations were detected in four families with a history of 46,XY disorders of sexual development and 46,XX premature ovarian failure. Two of the families carried heterozygous single base pair deletions which caused a frameshift leading to a premature stop codon. The third, consanguineous, family carried a homozygous missense change, and a heterozygous missense change was detected in the fourth family. This missense change was predicted to cause p.M1I, disrupting the initiation codon, which may result in absence of initiation or defective initiation using an alternative codon located downstream. The mutation was inherited from the patient’s healthy mother, who had reported no menstrual abnormalities [223]. This suggests incomplete penetrance, which is consistent with other NR5A1 mutations that have previously been described [224]. It has also been observed that wide ranging phenotypic variability can be seen within families, including the preservation of fertility with low ovarian reserve in some females with heterozygous mutations [225]. As well as being described in familial cases of 46,XX POF, mutations have also been detected in sporadic patients. In a cohort of 25 sporadic POF patients, mutations in NR5A1 were found in two individuals. The first individual had a 9bp in frame deletion which resulted in the loss of three amino acids, and the second individual had two heterozygous missense changes, both inherited on the same chromosome [223]. SF-1 is clearly an important protein, with vital regulatory roles required for sexual development and maturation in humans. Mutations in this gene are particularly interesting because of the wide spectrum of phenotypic features observed, which includes 46,XY DSD, 46,XX POF and adrenal insufficiency. Added to this complexity is the fact that a range of different mutations have been reported with dominant and recessive patterns of inheritance and variability in penetrance.


Chromosome      Gene       Phenotype
Xq21            DIAPH2     POF
Xq25            XPNPEP2    POF
Xp11.22         BMP15      POF
Xq27.3          FMR1       POF
Xq21.1-q21.2    POF1B      POF
11p14.1         FSHB       FSH β deficiency/POF
19q13.33        LHB        LH β deficiency/POF
2p16.3          FSHR       POF
2p16.3          LHR        POF/pseudohermaphroditism
2p13.3          FIGLA      POF
7q35            NOBOX      POF
9q33.3          NR5A1      46,XX POF/46,XY disorders of sexual development/adrenal insufficiency
5q23.1          HSD17B4    Perrault syndrome
5q31.3          HARS2      Perrault syndrome

Table 1.3. Chromosomal location and associated phenotype of hypergonadotropic hypogonadism genes discussed in Chapter 1.2.2.

1.2.3 Premature Ovarian Failure and Sensorineural Hearing Loss: Perrault Syndrome

Reviewing some of the vast amount of information now known about ovarian development and the pathogenesis of POF shows that this is a heterogeneous disorder, both genetically and phenotypically. As discussed above, POF can be idiopathic or syndromic. The syndrome of interest in this study is Perrault syndrome, which is defined as the association of sensorineural hearing loss and POF/ovarian dysgenesis. Since the original presentation in 1951 by Perrault (later reviewed by Josso et al in 1963) [226, 227], there have been several clinical reports of this rare syndrome, some of which will be reviewed here (Table 1.4).

In 1979 Pallister and Opitz diagnosed a sibship of three affected sisters and two affected brothers with Perrault syndrome. The diagnosis was based upon clinical examination which showed that all three of the sisters suffered from ovarian dysgenesis and severe sensorineural deafness. The two affected brothers also showed signs of severe sensorineural deafness, but possessed apparently normal testicular function [228]. This paper described the classic presentation of Perrault syndrome, and concluded that the syndrome is caused by a rare autosomal recessive mutation causing ovarian dysgenesis in female homozygotes and sensorineural deafness in female and male homozygotes. In 1983 two sisters were described as having Perrault syndrome in a report by Bösze et al. The paper described the same clinical presentation of ovarian dysgenesis with sensorineural deafness in 46,XX sisters; however, other developmental anomalies were also noted. These included blue sclerae, high arched palate, spina bifida occulta and epilepsy. It was unclear whether these anomalies were pleiotropic manifestations of Perrault syndrome or simply coincidental findings in this sibship [229].


Author (year): cases
  Case      Ovarian dysgenesis?  Sensorineural hearing loss?  Other findings

Perrault et al. (1951): 2 sisters
  Female 1  Yes                  Yes                          None
  Female 2  Yes                  Yes                          None

Christakos et al. (1969): 3 sisters and 1 brother
  Female 1  Yes                  Yes                          Mental retardation
  Female 2  Yes                  No                           Mental retardation
  Female 3  Yes                  Yes                          Mental retardation
  Male 1    N/A                  Yes                          Not described (patient deceased)

Pallister and Opitz (1979): 3 sisters and 2 brothers
  Female 1  Yes                  Yes                          None
  Female 2  Yes                  Yes                          None
  Female 3  Yes                  Yes                          None
  Male 1    N/A                  Yes                          None
  Male 2    N/A                  Yes                          None

Bösze et al (1983): 2 sisters
  Female 1  Yes                  Yes                          Blue sclerae, highly arched palate, lumbar spina bifida occulta, slightly retarded bone age
  Female 2  Yes                  Yes                          Epilepsy, micrognathia, unusual whorl patterns on fingertips

McCarthy and Opitz (1985), follow-up report by Fiumara et al (2004): 2 sisters
  Female 1  Yes                  Yes                          Highly arched palate, micrognathia, mild spastic diplegia, heel cord contractures, mild spastic dislocation of left hip, eye muscle dysfunction, progressive ataxia and signs of progressive polyneuropathy
  Female 2  Yes                  Yes                          Café-au-lait spot, slight synophrys, hypotonic mouth breather, mild residual congenital lymphedema, progressive weakness in limbs, development of pes cavus, evidence of progressive axonal type neuropathy

Table 1.4. Phenotypic features of reported Perrault syndrome patients.


Author (year): cases
  Case      Ovarian dysgenesis?  Sensorineural hearing loss?  Other findings

Nishi et al (1988): 2 sisters
  Female 1  Yes                  Yes                          Growth retardation, ataxic gait, nystagmus, limited extraocular movements, mild scoliosis, weakness of lower limbs, bilateral pes equinovarus
  Female 2  Yes                  Yes                          Abnormalities of lower limbs, nystagmus, limited extraocular movement

Cruz et al (1992): 2 sisters
  Female 1  Yes                  Yes                          None
  Female 2  Yes                  Yes                          None

Linssen et al (1994): 2 brothers and 1 sister
  Male 1    N/A                  Yes                          Amelogenesis imperfecta, hypotonia, decreased tendon reflexes, broad based gait, dyspraxia, and fine choreatic movements
  Male 2    N/A                  Yes                          Amelogenesis imperfecta, hypotonia, decreased tendon reflexes, broad based gait, mild mental retardation, dyspraxia and fine choreatic movements
  Female 1  Yes                  Yes                          Amelogenesis imperfecta, hypotonia, decreased tendon reflexes, broad based gait, mild mental retardation, myoclonic jerks, and linear skin pigmentation

Gottschalk et al (1996): 1 brother and 1 sister
  Male 1    N/A                  Yes                          Slight hyperopia and astigmatism
  Female 1  Yes                  Yes                          Cerebellar hypoplasia, ataxia, saccadic dysmetria and mild mental retardation

Nikolaou and Winston (1999): 1 sporadic case
  Female 1  Yes                  Yes                          None

Table 1.4. Continued.


Author (year): cases
  Case      Ovarian dysgenesis?  Sensorineural hearing loss?  Other findings

Fiumara et al (2004): 2 sisters
  Female 1  Yes                  Yes                          Delayed psychomotor development, ataxic gait, myopia, ophthalmoplegia, muscle hypotrophy and weakness, bilateral pes cavus, absent tendon reflexes, saccadic eye movements, corneal epithelial abnormality, arched palate
  Female 2  Yes                  Yes                          Delayed psychomotor development, ataxic gait, myopia, ophthalmoplegia, muscle weakness, corneal epithelial abnormality

Jacob et al (2007): 2 sisters
  Female 1  Yes                  Yes                          Marfanoid body proportions, high arched palate
  Female 2  Yes                  Yes                          Marfanoid body proportions, high arched palate

Marlin et al (2008): 2 sporadic cases
  Female 1  Yes                  Yes                          None
  Female 2  Yes                  Yes                          Moderate mental retardation, cerebellar ataxia, nystagmus, ophthalmoplegia, bilateral ptosis and signs of neuropathy

Marlin et al (2008): 2 sisters
  Female 3  Yes                  Yes                          None
  Female 4  Yes                  Yes                          None

Table 1.4. Continued.


Bösze et al were not the only group to observe additional phenotypic features in Perrault patients. In 1985, McCarthy and Opitz presented clinical findings for two sisters with the disorder. Patient 1 presented with lack of sexual development, moderate to severe sensorineural hearing loss and a number of other anomalies similar to those described by Bösze et al. These included a highly arched palate, micrognathia, cubitus valgus, short fifth metacarpals, mild spastic diplegia and heel cord contractures. However, it was noted at the time that these additional features could have been the result of mild perinatal brain damage caused by the use of forceps during what was described as a ‘difficult breech extraction’. Patient 2 was 4 years 1 month old at the time of examination and presented with sensorineural hearing loss, but none of the neurological anomalies shown in her sister [230]. This case is particularly interesting because at the time of initial presentation Patient 2 was too young to display symptoms of ovarian dysgenesis and displayed no neurological symptoms. It was therefore incorrectly concluded that although both sisters had Perrault syndrome, Patient 1 also had a stable neurological condition, most likely caused by the difficult birth, and that Patient 2 was neurologically normal [230, 231]. This case was updated in 2004 by Fiumara et al [231]. The follow-up report reviewed the condition of the sisters in 1996 and presented further clinical findings. Interestingly, what was thought to be a stable neurological condition in Patient 1 was in fact a progressive disorder. In her final examination she was diagnosed as having Perrault syndrome with progressive ataxia, learning problems, and severe progressive sensorimotor neuropathy [231]. Patient 2 had initially presented as neurologically normal; however, later assessments showed that she too had developed neurological features. Her phenotype was similar to that of her sister, but appeared to be less severe. It was suggested that two forms of Perrault syndrome may exist, the first being the original form, which is non-progressive, and the second being the progressive phenotype with neurological anomalies [231].

Two cases of Perrault syndrome which would fit the criteria for the hypothesised progressive form of this syndrome were described in 1988 by Nishi et al [232]. These sisters were Japanese and presented with the obligatory deafness and ovarian dysgenesis, as well as a range of neurological anomalies. These included ataxic gait, nystagmus, weak lower limbs, limited extraocular movements and short stature [232]. Once again the limited knowledge of the exact characterization of Perrault syndrome meant that the authors of this case study were unable to determine whether these were symptoms of the syndrome itself or coincidental findings. In 1994 the discussion surrounding neurological anomalies and Perrault syndrome was once again brought to attention in a case study by Linssen et al. Three siblings, two males and one female, presented with sensorineural deafness; the presence of ovarian dysgenesis in the female sibling led to the diagnosis of Perrault syndrome. There were additional findings in this sibship, the most interesting of which were abnormal dental enamel and sensory neuropathy, found in all three patients. All three presented with gait disorder, hypotonia and hypo-reflexia, which were thought to be secondary to the sensory neuropathy. Additionally, mild developmental difficulties, dyspraxia, and chorea were noted, although these features were not consistent in all affected individuals [233]. A comprehensive review of all reported cases of Perrault syndrome was published in 1996 by Gottschalk et al. The authors reviewed 28 cases from 11 families. Of these 28 cases, 10 were reported to have abnormal neurological findings, and it was suggested that this number could be even higher as several studies did not report on neurological findings at all. Although not definitively determined by the cases reviewed, it is plausible that these findings are pleiotropic manifestations of Perrault syndrome. Therefore, ovarian dysgenesis and sensorineural deafness may not be the only defining symptoms of this disorder [234].

In a case report published in 2007 by Jacob et al, a previously unreported anomaly was described in two sisters. These patients had all of the usual features of Perrault syndrome, but also displayed features similar to those seen in Marfan syndrome. The sisters both had abnormal body proportions, including long arm span, short trunk, highly arched palate and long fingers, but neither patient displayed the usual cardiac or ocular problems associated with Marfan syndrome [235]. Some of the features described in this case, such as highly arched palate, have been reported
previously [232]. Two more patients were described as having Perrault syndrome in 1992 by Cruz et al, and one sporadic case was published in 1999 by Nikolaou and Winston. No additional malformations or neurological symptoms were noted in these cases [236, 237]. More recently four new cases of Perrault syndrome, two sporadic and two familial, were reported. One of the four presented with neurological symptoms. Candidate gene analysis was performed on these patients and GJB2, POLG and FOXL2 were all excluded as candidate genes [238].

When this project began in 2008 no gene mutations causal for Perrault syndrome had been identified. During the course of our project, collaborators in Seattle identified pathogenic mutations in two Perrault families from their cohort [239, 240]. The genes identified, HSD17B4 and HARS2, are not known to be related in the function of their products [239]. HSD17B4 mutations were identified in two American sisters using whole exome sequencing [239]. The sisters in whom these mutations were detected are described in detail in the text above. The initial case report came from McCarthy and Opitz in 1985, and was then updated in 2004 by Fiumara. The sisters had progressive neurological features which were initially not detected in the younger sibling. HSD17B4 encodes D-bifunctional protein (DBP) and mutations in this gene are a cause of DBP deficiency (OMIM 261515). DBP deficiency results in the accumulation of β-oxidation substrates, and phenotypic features typically include hypotonia, seizures, hearing and vision loss, developmental delay, dysmorphic facial features and often fatality within the first two years of life. DBP deficiency can be divided into three subtypes: type I is a deficiency in both hydratase and dehydrogenase activity, type II is a deficiency in hydratase activity alone, and type III is a deficiency of dehydrogenase activity alone [241]. The affected sisters were compound heterozygous for a nonsense mutation p.Y568X (resulting in greatly reduced levels of mRNA and protein) and a missense mutation p.Y217C (predicted to destabilize the dehydrogenase domain of the DBP protein). The conclusion was that in this family the affected individuals suffered from a very mild case of DBP deficiency. Several of the features described in these girls have been described in other DBP
deficiency cases, such as hearing loss, mental retardation and progressive neuropathy. However, this is the first report of ovarian dysgenesis as a feature of DBP deficiency as previously no affected females have survived past puberty [239].

The second gene identified as a cause of Perrault syndrome was HARS2, which encodes mitochondrial histidyl tRNA synthetase. This enzyme catalyses the covalent linkage of histidine to its cognate tRNA and is required in the mitochondria for protein translation. The family in which compound heterozygous HARS2 mutations were detected is the family discussed above that was initially described by Pallister and Opitz in 1979 [228]. The sibship included three affected female and two affected male siblings, and represented a classic case of Perrault syndrome with no additional manifestations. All affected family members were compound heterozygotes for two missense mutations, p.L200V and p.V368L. The p.L200V mutation also produced a cryptic splice site which resulted in the deletion of 12 amino acids from the protein product. This meant that all affected individuals carried three mutant transcripts of this gene. Functional analysis revealed that expression and localization of the p.V368L and p.L200V mutants were similar to wildtype. Homodimerization of p.L200V was as efficient as wildtype, but p.V368L was found to be more efficient than wildtype. A pyrophosphate exchange assay demonstrated reduced enzymatic activity of both mutants, and downregulation of hars-1 in C. elegans by RNAi resulted in severe defects in the gonads and absence of oocytes or fertilized eggs [240].

The identification of HSD17B4 and HARS2 as the genes responsible for Perrault syndrome in these families presents further evidence of the complex genetic nature of this disorder. The genes have no obvious functional connection and do not interact in common biological pathways. In addition, the literature published on Perrault syndrome shows a wide and highly varied spectrum of phenotypes associated with this disorder. It is likely that this syndrome is underdiagnosed, particularly in males [230, 238]. This is because in males sensorineural deafness is the only definitive
symptom and in the absence of affected sisters the syndrome could easily be misdiagnosed as isolated hereditary deafness. Defects of the nervous system appear to be a common manifestation, as shown in several of the reported cases [231-233]. There is a definite need for further investigation into this rare and interesting phenotype. Without identification of causative gene mutations in other families it remains difficult to determine whether neurological features should be considered features of Perrault syndrome or represent a separate condition in some patients. The hypothesis of more than one form of Perrault syndrome seems plausible and evidence indicates that this is a highly heterogeneous condition, phenotypically and genetically (Table 1.4).

1.3 The Evolution of Techniques in Genetic Medicine

With the field of genetic medicine evolving at a rapid and constant pace, the methods used for identifying pathogenic mutations in affected families are also changing. Even throughout the course of this project techniques have vastly improved and it is appropriate to give consideration to techniques both old and new that are being used in the identification of gene mutations in Mendelian disorders.

1.3.1 Linkage Mapping

Traditionally, linkage analysis and positional cloning were the techniques of choice for the identification of pathogenic mutations in monogenic diseases. These strategies achieved great success initially, but there were limitations. The starting point for this technique was linkage mapping. This was carried out on large families in which the disease phenotype segregated in a Mendelian fashion. The proposal that inherited disorders could be mapped using genome wide DNA polymorphisms was first reported in 1980 [242]. The first types of polymorphism used in mapping were restriction fragment length polymorphisms. Specific sequences which are recognized by different DNA restriction enzymes could be utilized to produce fragments of DNA of a variety of sizes. Differences in the genotypes of individual people would produce different length fragments and as such a unique map
of the genome could be produced for each person by running these fragments on an agarose gel. The genotypic differences could be as simple as a change in a single base which would create a new cleavage site or eliminate an existing one. Alternatively insertions or deletions of repeat regions or other sections of DNA would alter the fragment size [242]. This idea soon progressed and simple polymorphic repeat markers were used instead of restriction digests. These repeat regions have differing lengths in different individuals and combinations of polymorphic repeats were used to form a map of the human genome [243]. Using simple polymorphic markers provided a more rapid and efficient method for mapping of disease loci. With the completion of the Human Genome Project the technique was able to evolve again to using hundreds of thousands of SNPs (single nucleotide polymorphisms) [244]. It is now possible to run micro-arrays which map the human genome in fine detail. These arrays can produce a huge amount of information because not only do they contain genotyping SNP probes but they also harbour thousands of copy number probes.
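The restriction fragment length principle described above can be illustrated with a minimal Python sketch. The two allele sequences, the recognition site and the function name `fragment_lengths` are hypothetical and chosen purely for illustration; as a simplifying assumption the cut is placed at the first base of each recognition site rather than at the true cleavage position of any particular enzyme.

```python
def fragment_lengths(sequence, site):
    """Return fragment lengths after cutting `sequence` at each occurrence of `site`.

    Simplifying assumption: the cut is made at the first base of the site.
    """
    cut_positions = []
    start = 0
    while True:
        index = sequence.find(site, start)
        if index == -1:
            break
        cut_positions.append(index)
        start = index + 1
    boundaries = [0] + cut_positions + [len(sequence)]
    # Drop zero-length fragments (for example when the sequence starts with a site).
    return [end - begin for begin, end in zip(boundaries, boundaries[1:]) if end > begin]


SITE = "GAATTC"  # illustrative recognition sequence
# Hypothetical alleles: allele_b carries a single base change that abolishes the second site.
allele_a = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TTTT"
allele_b = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "TTTT"

print(fragment_lengths(allele_a, SITE))
print(fragment_lengths(allele_b, SITE))
```

In this toy example the single base change removes one cleavage site, so the two alleles give different fragment patterns, which is the kind of difference that would be resolved on a gel.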

Traditional linkage analysis requires the ascertainment of one or more large families with multiple affected individuals. This presents a major limitation, since with rare recessive disorders it can often be difficult to find enough families with multiple affected individuals [245]. In addition, for mapping to be successful the phenotype of affected individuals must be well characterized with minimal ambiguity. Any uncertainty about the affected status of patients can obscure results and affect the mapping [246]. One method which allows sensitive and specific mapping for rare recessive conditions, and requires far fewer patients, is autozygosity mapping. Autozygosity mapping can be carried out successfully with as few as one family, or with several small families, provided that the parents are consanguineous [247, 248]. Consanguinity can lead to the inheritance of homozygous regions of genome from a single common ancestor [245]. These regions are known as identical by descent (IBD) or autozygous (Figure 1.7). In affected individuals identical copies of recessive disease-causing alleles have been inherited twice within these regions of autozygosity.

Figure 1.7. Simple diagram to demonstrate Identical By Descent inheritance of a disease causing allele (D), from a single common ancestor. Adapted from [245].

The basic principle of autozygosity mapping is that not only will the disease-causing allele be IBD in affected individuals, but any polymorphic markers which flank it will be identical too. It is therefore possible to carry out genome-wide typing of consanguineous families using SNPs or microsatellite markers to look for regions of homozygosity. Homozygous regions which are consistently common to affected individuals, but are not present in unaffected individuals from the same family, are likely to contain the disease-causing allele [245]. Autozygosity mapping has successfully identified many recessive disease loci, and has led to the identification of mutations in a large proportion of cases. A recent example of the power of this technique is the mapping of a single large Pakistani family with non-syndromic mental retardation to a locus on chromosome 11p15 [249]. However, although the advantages of this technique are great, it is not without its problems. The major problem with homozygosity mapping is that it does not account for any unexpected allelic or locus heterogeneity. The method rests on the assumption that patients from consanguineous families will be homozygous for a mutation inherited from a common ancestor.

Problems arise when more than one mutant locus or allele is present within the same consanguineous pedigree [250]. This was shown to be the case in a study of a large consanguineous pedigree from Brazil who presented with deafness. The genome wide scan of these subjects did not provide any common regions of homozygosity or linkage. Instead of discovering a single homozygous mutation as expected, this study revealed at least two mutant alleles in the MYO15A gene, with the possibility of other causes for deafness within the same pedigree [251].
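The search for shared regions of homozygosity that underlies autozygosity mapping can be sketched in a few lines of Python. This is an illustrative simplification and not the analysis pipeline used in this project: genotypes are hypothetical two-letter calls at ordered markers, the function names and the `min_markers` threshold are assumptions, and real analyses also take account of physical distance, genotyping error and allele frequencies.

```python
def is_shared_homozygous(affected, marker):
    """True if every affected individual has the same homozygous call at this marker."""
    calls = {individual[marker] for individual in affected}
    if len(calls) != 1:
        return False
    genotype = calls.pop()
    return genotype[0] == genotype[1]


def candidate_regions(affected, unaffected, min_markers=3):
    """Return (first, last) marker indices of homozygous runs shared only by affected individuals."""
    n_markers = len(affected[0])
    runs, start = [], None
    for marker in range(n_markers + 1):
        if marker < n_markers and is_shared_homozygous(affected, marker):
            if start is None:
                start = marker
        else:
            if start is not None and marker - start >= min_markers:
                runs.append((start, marker - 1))
            start = None

    def carried_by_unaffected(run):
        # Exclude a run if any unaffected relative has the identical homozygous haplotype.
        first, last = run
        return any(
            all(person[m] == affected[0][m] for m in range(first, last + 1))
            for person in unaffected
        )

    return [run for run in runs if not carried_by_unaffected(run)]


# Hypothetical genotype calls at eight ordered markers.
affected_1 = ["AB", "AA", "BB", "AA", "BB", "AB", "AA", "AA"]
affected_2 = ["AA", "AA", "BB", "AA", "BB", "BB", "AA", "AA"]
unaffected_1 = ["AB", "AA", "AB", "AA", "BB", "AB", "AA", "AA"]

print(candidate_regions([affected_1, affected_2], [unaffected_1]))
```

As the surrounding text notes, a search of this kind assumes a single ancestral mutation; it would return nothing useful for a pedigree in which affected individuals carry different mutant alleles.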

1.3.2 From Locus to Gene Mutation

Traditionally, once a disease locus had been identified, positional cloning would be used to locate the specific gene mutation. Using the pure positional cloning technique a pathogenic mutation was identified based entirely on map position, without any prior functional knowledge [252]. The most commonly used process was known as chromosome walking. This required a previously mapped gene or linked marker which was used as a starting point. The end fragments of a clone of the linked marker or known gene were used to probe the cDNA library for an adjacent second clone. Subclones of small end fragments of these clones were restriction digested and used to isolate other overlapping clones from the library, and so on. In this way overlapping clones are produced that move outwards from either end of the start point, hence the term chromosome walking. One of the best examples of pure positional cloning is the discovery of TCOF1, which encodes Treacle protein and is mutated in patients with Treacher Collins syndrome (OMIM 154500). This gene was identified by fine mapping the region in affected individuals, and then constructing a combined radiation hybrid and genetic linkage map. From this map a YAC contig of the mapped region was produced. This YAC contig was then converted to a cosmid contig, and screening of placental and skeletal muscle cDNA libraries combined with exon trapping produced a transcription map of the region. The group discovered at least 7 genes in the region and TCOF1 was found to contain mutations in five unrelated Treacher Collins families [253]. Using positional cloning in this way is a laborious process, and many of the positional cloning successes were made easier through identification of
patients with large cytogenetic rearrangements or deletions which greatly aided mapping and gene identification [254, 255]. The labour intensive nature of this process meant that it quickly gave way to a more candidate based approach. The ‘positional candidate approach’ required mapping of the disease region as described above with focus being given to any functionally or biochemically attractive candidates which had been mapped to the same region [252]. Since the completion of the Human Genome Project in 2003, it has become common to take a candidate gene approach when looking for mutations in a mapped region [256].

1.3.3 Next Generation Sequencing

The traditional approach to identification of mutations in Mendelian disease has proven extremely successful, and has identified causal variants for many recessive, dominant and X-linked conditions. However, as mentioned above, these techniques are not without limitations. The most restrictive is that traditional linkage is not suitable for extremely rare disorders or de novo mutations. As mentioned previously, autozygosity mapping can be used for some rare recessive disorders, but this technique is only suitable where consanguinity exists. For disorders in which adequate numbers of affected individuals cannot be collected and there is no evidence of consanguinity, linkage mapping is not a practical option [257]. Next generation sequencing, either whole exome or whole genome, has provided scientists with a new and exciting method by which to study extremely rare or de novo mutations. The following section will provide a brief overview of the recent developments in DNA sequencing technologies, but for a full and comprehensive review of next generation sequencing the reader is referred to Shendure and Ji, 2008 and Bick 2011 [258, 259].

Sanger established a novel technique for DNA sequencing in 1977, which has since been combined with automating technologies and improved over the ensuing years [260, 261]. The Sanger sequencing technique has been the basis of most sequencing applications since the early 90s. It now offers an affordable and accurate application capable of producing reads of up to 1000bp that is widely and routinely used in genetic research and clinical
application around the world. The modern day Sanger method uses synthetic oligonucleotide primers to generate a set of DNA fragments. Enzymatic synthesis of a new DNA strand is carried out by Taq DNA polymerase, from a specific point on the template DNA. The reaction is cycled through a number of rounds of template denaturation, primer annealing and primer extension and contains four deoxynucleotides and four fluorescently labelled dideoxynucleotide analogues. When the dideoxynucleotide analogues are incorporated into the new DNA strand they cause chain termination. A nested set of fragments is produced, each one ending with one of the four fluorescently labelled dideoxynucleotides. The fluorescent signals and corresponding bases can then be read on an automated capillary sequencer [261].
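The idea of reading a sequence from a nested set of terminated fragments can be illustrated with a minimal Python sketch. It is a deliberately idealized model: every possible termination product is assumed to be present exactly once, fragment length is counted from the first base added after the primer, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def chain_termination_fragments(template):
    """Return (length, terminal_base) for every possible terminated extension product.

    `template` is written 5'->3'; the new strand is its reverse complement.
    """
    new_strand = "".join(COMPLEMENT[base] for base in reversed(template))
    return [(position + 1, base) for position, base in enumerate(new_strand)]


def read_sequence(fragments):
    """Order fragments by size, as capillary electrophoresis does, and read the terminal labels."""
    return "".join(base for _, base in sorted(fragments))


fragments = chain_termination_fragments("ATGC")
# The order in which fragments are generated does not matter; size separation restores it.
print(read_sequence(list(reversed(fragments))))
```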

The next generation sequencing technologies are an alternative form of high throughput sequencing which enable sequencing of a whole human genome in a matter of weeks. Several alternative platforms for next generation DNA sequencing exist, such as the Roche 454, the Ion Torrent, the Illumina Solexa and the ABI SOLiD platforms. The Illumina Solexa and ABI SOLiD platforms work on the principles of cyclic array sequencing first described by Shendure et al [262]. They differ in their biochemistry and array generation but they are based upon the same strategy. Essentially, genomic DNA is sheared into random fragments and common adaptor sequences are ligated onto the end of each fragment. Different methods of library preparation can be used to produce simple shotgun fragment libraries or mate-paired libraries. One of the main advantages of using mate-paired reads is that greater accuracy can be achieved for the mapping of ambiguous or repetitive sequences. If one half of the mate pair is unique and can be aligned to a single location in the genome, then the other half of the pair, which is ambiguous, can also be accurately aligned [263].

Individual fragments of the DNA library are then amplified in an in vitro cloning step. The methods used to produce these in vitro colonies differ between platforms [259]. The SOLiD workflow uses an emulsion PCR method for amplification of the library. Individual library fragments are
captured by a micron-scale primer coated bead in an aqueous droplet. Once amplification has taken place each bead will contain amplicons of a single DNA fragment tethered to its surface. The beads which contain amplification products can be selected for and enriched. The Solexa platform uses an alternative method for library amplification. This technique relies on bridge or cluster PCR. Bridge PCR involves the production of a flowcell containing tethered forward and reverse primers with a flexible linker in between. Amplicons which are produced from an individual fragment of the template library will remain clustered to a single physical location on the flowcell. This method of amplification can produce approximately 1000 copies of each individual fragment within the library. The large number of copies of each fragment allows improved accuracy by reducing the number of random errors that would occur through single molecule sequencing [264].

The SOLiD sequencing chemistry uses a unique two base encoding system based on sequencing by ligation (Figure 1.8). The sequencing is driven by DNA ligase which ligates a series of probes to the tethered amplicon sequences on each bead. Each probe is made up of 8 bases; the first two bases represent 16 different dinucleotide combinations. There are four corresponding dyes with each colour representing four possible dinucleotide combinations. A universal primer which is designed to bind to the adaptor sequence is used to initiate the process. In each round of sequencing the fluorescently labelled octamer probes are used to extend the sequence starting from the universal primer. After each probe has ligated to the growing chain the fluorescent signal is imaged and the probe is cleaved by a cleavage agent between bases 5 and 6. This releases the fluorescent label and allows binding of the next probe. Back to back rounds of probe ligation allow sequencing of every 5th base, e.g. bases 5, 10, 15 and 20. Once this round of sequencing is complete, the newly synthesized strand is denatured from the template strand and the process begins again using a primer that is designed to bind one base back from the first primer. The second round of ligation will allow sequencing of bases 4, 9, 14 and 19, the third round will sequence bases 3, 8, 13 and 18 and so on until the entire amplicon has been sequenced. Because each fluorescent dye represents a dinucleotide
pair, instead of a single base, each base is sequenced twice. In the first round of ligation the base will be sequenced as the first of the pair, and in the second round it will be the second base in the pair. Because of its design, two base encoding identifies mis-called bases more easily than other sequencing chemistries [265].

Figure 1.8. A. The 16 Dinucleotide probes used in the SOLiD sequencing chemistry. Cleavage of the fluorescent tag occurs between three degenerate bases (N) and three universal bases (Z) B. The basic principle of 2 base encoding. Pictures adapted from Applied Biosystems webinar [266].
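The two base encoding principle can be sketched in Python as follows. The assignment of dinucleotides to the four colours is an assumption here: it follows the exclusive-or scheme commonly described for this chemistry (with A, C, G and T numbered 0 to 3), and the colours are represented as the integers 0 to 3 rather than as dyes. The sequences and function names are hypothetical.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode(sequence):
    """Convert a base sequence into colours, one colour per overlapping dinucleotide."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(sequence, sequence[1:])]


def decode(first_base, colours):
    """Recover the base sequence from a known first base and the colour calls."""
    bases = [first_base]
    for colour in colours:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ colour])
    return "".join(bases)


reference = "ATGGC"
variant = "ATCGC"  # hypothetical single base substitution at the third base

ref_colours = encode(reference)
var_colours = encode(variant)
changed = [i for i, (r, v) in enumerate(zip(ref_colours, var_colours)) if r != v]

print(ref_colours, var_colours, changed)
print(decode("A", ref_colours))
```

In this sketch a genuine substitution alters two adjacent colours because the changed base is interrogated twice, whereas a single isolated colour change is not consistent with any single base substitution and is therefore more likely to be a measurement error.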

The Solexa sequencing platform from Illumina uses a different reaction chemistry for its sequencing (Figure 1.9). The ‘reversible termination’
technique was originally invented and developed in Cambridge by Shankar Balasubramanian and David Klenerman. The basis of the sequencing chemistry is that DNA fragments are immobilized on a solid surface and these templates are sequenced using a DNA polymerase based scheme. In each cycle of sequencing one of four fluorescently labelled dNTPs is incorporated into the growing strand, with a universal primer targeted to adaptor sequences used to begin synthesis. The correct base is incorporated opposite the first free base of the template strand. This base is blocked by an azidomethyl group at the 3’ oxygen, which prevents incorporation of the next base in the sequence. This means that each cycle is limited to a single base extension, such that sequencing x bases requires x cycles. The fluorescent tag attached to the incorporated base is then imaged. Once the tag has been imaged it is cleaved by a water soluble phosphine, which simultaneously removes the azidomethyl group. Once the blocking agent has been removed the next cycle of sequencing can begin by incorporating the next base in the sequence [264].

Figure 1.9. The major steps in the reversible termination chemistry used by Illumina Solexa sequencing platforms. Step 1. Primer annealing to template strand. Step 2. Incorporation of a fluorescently labelled and chemically blocked dNTP into the growing strand, followed by imaging of the fluorescent tag. Step 3. Chemical removal of the fluorescent label and blocking agent resets the system ready for the next cycle of sequencing. Picture is adapted from [264].

The first commercially available sequencing platform from Illumina was released in 2006 and was called the Genome Analyser. The Genome Analyser was able to accurately sequence in a single experiment a billion nucleotide bases. The latest platform, released in 2010, is called the HiSeq 2000, and now offers the ability to sequence over 200 billion nucleotide bases accurately in a single experiment. This shows that in just a few short years the scale of improvement which next generation sequencing platforms have undergone (not just Illumina and SOLiD but many other technologies as well) has been enormous. The future for DNA sequencing will be fast moving and constantly evolving. Changes and improvements have been evident throughout the course of this project, and the ability for small labs to run whole genome and exome sequencing with high accuracy and low cost makes this an exciting time to work in the field of medical genetics. This technology has the potential to determine the genetic basis or susceptibility of a wide range of diseases, as shown by the range of publications that are emerging in which these technologies have been employed.

1.4 Aim

Hypothesis: Genetic variants are responsible for rare syndromal forms of SNHL and hypogonadism.

The aim of this project is to identify the causative mutations for two syndromes in which sensorineural hearing loss and infertility are key phenotypic features. The first syndrome is the rare autosomal recessive syndrome, Perrault syndrome. The second syndrome under investigation is uncharacterized, but has key features which include sensorineural hearing loss, hypogonadotropic hypogonadism, microcephaly and learning disability in all affected individuals. Please see Chapter 3 for a full description of families enrolled in this project and their phenotypic characteristics.

The techniques employed in this study will be a combination of older techniques, such as autozygosity mapping to identify disease loci followed by screening of candidate genes using Sanger sequencing, and newer techniques, including whole exome next generation sequencing.

The determination of genetic causes for these disorders will be accompanied by expression analysis and functional work where relevant. The identification of the genes involved in rare disorders and the functional effect of mutations can lead to a greater understanding of the pathogenesis of more common disorders, accurate risk estimation within affected families and the promise of therapeutic developments.

CHAPTER 2. MATERIALS AND METHODS

Ethical approval for this study was granted by the Salford and Trafford Local Research Ethics Committee 03/05/05 (05/Q1404/49).

2.1 Suppliers

Chemicals used were purchased from a range of suppliers including Sigma-Aldrich Company Ltd, Fisher Scientific and Vector Labs. All chemicals were of molecular biology grade unless indicated otherwise in the text. Suppliers and catalogue numbers (cat number) are given where appropriate in the text.

2.2 Nucleic Acid procedures

All primers used in this project were designed using the Primer 3 program V4.0 (http://frodo.wi.mit.edu/) and purchased from Invitrogen or Sigma-Aldrich unless otherwise stated in the text.

2.2.1 DNA Extraction

DNA was extracted from blood using the Qiagen Puregene Blood Core Kit C. This kit was used according to the manufacturer’s protocol on the Gentra Systems Autopure LS robot. All DNA extractions were performed as a service by the NHS National Genetic Reference Laboratory at St Mary’s Hospital.

2.2.2 RNA Extraction

RNA extraction from human tissues: RNA was extracted from human embryonic tissues using the RNeasy Mini kit (Qiagen, cat number: 74104) according to manufacturer’s instructions.

RNA extraction from human cell line: RNA was extracted from primary human lymphocytes using the EZ1 robot, and the EZ1 RNA Cell Mini Kit (Qiagen, cat number: 958034) according to manufacturer’s instructions. All RNA extractions were carried out in RNase free conditions. Work surfaces were cleaned with RNase ZAP solution (Invitrogen, cat number: AM9780).

RNA extraction from Zebrafish embryos: RNA was extracted using the TRIZOL (Invitrogen, cat number: 15596018) method according to manufacturer’s instructions.

2.2.3 Quantification of Nucleic Acids

RNA and DNA were quantified using the Thermo Scientific Nanodrop 8000 Spectrophotometer.

2.2.4 Standard PCR Reaction

DNA amplification was performed by PCR using one of the following Abgene Reddymixes (Thermo Scientific):

Custom PCR mastermix CM102 (Taq DNA polymerase, 3.7 mM MgCl2, 0.085 mg/ml BSA, 6.7 µM EDTA, 0.75 mM each of dATP, dGTP, dTTP and dCTP)
Custom PCR mastermix CM130 (Thermo-start DNA polymerase, Thermo-start buffer, 3.7 mM MgCl2 and 0.75 mM each of dATP, dGTP, dTTP and dCTP)
AB795 PCR mastermix (long range PCR master mix)

The PCR reactions were typically carried out using the components and volumes stated in Table 2.1.

Reagent                                   Volume in 10 µl reaction
Reddymix (CM102/CM130/AB795)              5 µl
5 µM primer mix (forward and reverse)     1 µl
Sterile H2O                               3 µl
Genomic DNA (5-20 ng/µl)                  1 µl

Table 2.1: Standard PCR reaction mixture.
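When several reactions are set up together, the shared components in Table 2.1 are usually combined as a master mix before the template DNA is added to each well. The Python sketch below scales the per-reaction volumes from Table 2.1; the 10% pipetting overage, the dictionary keys and the function name `master_mix` are assumptions made for illustration and are not stated in the protocol above.

```python
# Per-reaction volumes (in µl) of the shared components, taken from Table 2.1.
PER_REACTION_UL = {
    "Reddymix": 5.0,
    "Primer mix (5 uM)": 1.0,
    "Sterile H2O": 3.0,
}
TEMPLATE_UL = 1.0  # genomic DNA is added separately to each reaction


def master_mix(n_reactions, overage=0.10):
    """Return total volumes (µl) of shared components for n reactions.

    `overage` is an assumed allowance for pipetting loss, not a protocol value.
    """
    scale = n_reactions * (1 + overage)
    return {reagent: round(volume * scale, 2) for reagent, volume in PER_REACTION_UL.items()}


for reagent, volume in master_mix(20).items():
    print(f"{reagent}: {volume} µl")
print(f"Dispense {sum(PER_REACTION_UL.values())} µl of mix per well, then add {TEMPLATE_UL} µl DNA")
```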

Reactions were typically cycled on the Veriti 96 well Thermal Cycler using the following conditions:

Initial denaturation: 95°C for 5 mins
30-40 cycles of:
  Denaturation: 95°C for 30 sec
  Annealing: variable temp.* for 30 sec-1 min
  Extension: 72°C for 30 sec/500 bp
Final extension: 72°C for 7 mins

* All primers used were optimized to determine the most efficient PCR conditions. Annealing temperature was typically in the range 50-64°C. Annealing time was typically 45 seconds but could be optimized based on primer performance. Extension time was typically 30 seconds per 500 base pairs of nucleic acid being amplified.
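The extension time rule above can be expressed as a small Python helper, together with a rough estimate of programme length built from the cycling conditions stated in this section. Rounding the amplicon length up to whole 500 bp blocks is an assumption, as are the function names and the default of 35 cycles; ramping time between steps is ignored.

```python
import math


def extension_seconds(amplicon_bp, seconds_per_block=30, block_bp=500):
    """Extension time at 30 seconds per 500 bp, rounding up to whole blocks (assumed)."""
    return math.ceil(amplicon_bp / block_bp) * seconds_per_block


def programme_seconds(amplicon_bp, cycles=35, annealing_seconds=45):
    """Approximate hold time of the whole programme, excluding temperature ramping."""
    initial_denaturation = 5 * 60
    denaturation = 30
    final_extension = 7 * 60
    per_cycle = denaturation + annealing_seconds + extension_seconds(amplicon_bp)
    return initial_denaturation + cycles * per_cycle + final_extension


print(extension_seconds(1200))
print(round(programme_seconds(1200) / 60, 1), "minutes")
```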

2.2.5 Agarose Gel Electrophoresis

All chemicals were purchased from Sigma Aldrich and were molecular biology grade.

10x TBE solution (5 litres):
540 g Tris base
275 g orthoboric acid
200 ml 0.5 M EDTA

PCR products were analysed using horizontal agarose gel electrophoresis. Gels (1.5-2%) were typically formulated using the following components and volumes:

1.5-2 g agarose powder
100 ml 1x TBE
5 µl ethidium bromide

Gels were run in 1x TBE at 120-140V for approximately 15-20mins. Gels were visualised on a UV transilluminator (with a wavelength of 205nm).

Product size was estimated using standard sized DNA markers such as the Fermentas Gene Ruler 100bp DNA ladder plus (cat number: SM0321).

2.2.6 Purification of PCR Products

PCR products were purified with the Agencourt Ampure protocol in accordance with manufacturer’s instructions using Beckman Coulter Agencourt Ampure Magnetic Beads (cat number: A63881). The samples were run on the Beckman Coulter Biomek NX3 Multichannel Robot using the following volumes:

PCR product: 16 µl
Agencourt Ampure Magnetic Beads: 28.8 µl

2.2.7 Sequencing Reactions

DNA sequencing was carried out using the BigDye Terminator kit version 3.1 (Applied Biosystems, cat number: 4337454). Reactions were typically carried out using the components and volumes stated in Table 2.2.

Reagent                           Volume in 10 µl reaction
BigDye v3.1 Sequencing Kit        0.25 µl
BigDye v3.1 Sequencing Buffer     1.75 µl
Sterile H2O                       4.15 µl
2 µM primer                       1.6 µl
Purified PCR product              2 µl

Table 2.2. Standard sequencing reaction mixture.

Sequencing reactions were cycled on the Veriti 96 well Thermal Cycler using the following conditions:

35 cycles of:
  98°C for 10 secs
  55°C for 20 secs
  60°C for 4 mins

2.2.8 Purification of Sequencing Products

Products of sequencing reactions were purified using Sephadex purification (Sigma-Aldrich, cat number: S6147) according to manufacturer’s instructions. Alternatively, sequencing reactions were purified using the Agencourt Cleanseq protocol in accordance with manufacturer’s instructions with Beckman Coulter Agencourt Cleanseq Magnetic Beads (cat number: A29154). The samples were run on the Beckman Coulter Biomek NX3 Multichannel Robot using the following volumes:

Sequencing product: 10 µl
Agencourt Cleanseq Magnetic Beads: 10 µl

All sequencing was carried out on the ABI3730 DNA analyser as a service by the NHS Genetics Molecular Laboratory in St Mary’s Hospital.

2.2.9 Whole Genome Amplification

Whole genome amplification was carried out on samples that did not contain the minimum amount of DNA required for SNP array analysis. Amplification reactions were carried out using the REPLI-g Mini Kit (Qiagen, cat number: 150033) according to manufacturer’s instructions.

2.2.10 Reverse Transcription

RNA was converted to cDNA using the High capacity RNA-cDNA kit (Applied Biosystems, cat number: 4387406) according to manufacturer’s instructions.

2.2.11 Affymetrix Genome-wide Human SNP Array V6.0/250K

Genomic DNA (500ng) was digested with NspI and StyI enzymes and ligated to adaptors to allow PCR amplification using a generic primer. This was carried out using the Affymetrix Genome-Wide Human SNP Nsp/Sty Assay Kit V5.0/6.0 according to the manufacturer’s protocol. For the 250K array the Affymetrix Genome-Wide Human SNP Nsp/Sty 500K Assay Kit was used according to manufacturer’s instructions. The PCR products were purified, fragmented and labelled before being hybridized to the Human SNP Array V6.0 or the 250K Array. This procedure was all carried out according
to the manufacturer’s protocol. The Affymetrix Genome-Wide Human SNP Arrays were carried out as a part of a service provided by the BRC at St Mary’s Hospital. See Appendix Figure 10.7 for a schematic of the protocol. Taken from the Affymetrix website. http://media.affymetrix.com/support/technical/datasheets/genomewide_snp6 _datasheet.pdf

2.2.12 Sybr Green

Quantitative Real time PCR was carried out on the Applied Biosystems 7900HT Fast Real Time PCR System, using the Applied Biosystems SYBR Green PCR master mix (cat number: 4309155) in accordance with the manufacturer’s instructions. Primers for all genes were purchased from Primer Design Ltd.

Primer | Sequence | Supplier
AP1M2 Forward | CCTTCCTCACCTCTTCCTTATTC | Primer design
AP1M2 Reverse | AGCCCCTCTTCTATTTCTTCATATAA | Primer design
CDKN2D Forward | GGGGTTATGTATCAGAAGAGAGG | Primer design
CDKN2D Reverse | CAACACCTATAAGCCACAAACTG | Primer design
KRI1 Forward | CTATAACAGAACAGCATCGTCATC | Primer design
KRI1 Reverse | CCTTCCTCTCGTAGTCCTTCA | Primer design
SLC44A2 Forward | CTGCCCATTTACTGCGAAAAC | Primer design
SLC44A2 Reverse | CCGACTCACCACCGTAGAA | Primer design

Table 2.3. Primer sequences for SYBR green experiments. All primers were purchased from Primer Design Ltd.
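As a quick sanity check on primer pairs such as those in Table 2.3, base composition and an approximate melting temperature can be computed directly from the sequence. The sketch below is illustrative only and was not part of the original protocol. It assumes the Wallace rule (2°C per A/T, 4°C per G/C), which gives only a rough estimate for short oligonucleotides, and the function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Approximate melting temperature (degrees C) by the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Two of the primers listed in Table 2.3.
primers = {
    "SLC44A2 Forward": "CTGCCCATTTACTGCGAAAAC",
    "SLC44A2 Reverse": "CCGACTCACCACCGTAGAA",
}

for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, GC {gc_content(seq):.0%}, Tm ~{wallace_tm(seq)} C")
```

Nearest-neighbour models give more reliable estimates than the Wallace rule, so the values are best treated as a coarse comparison between the two primers of a pair.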

2.2.13 Copy Number assay

Taqman copy number assays for the NOBOX gene (Assay numbers: Hs03648039 and Hs03623799) were purchased from Applied Biosystems. Reactions were prepared and analysed in accordance with the manufacturer’s instructions on the Applied Biosystems 7900HT Fast Real Time PCR System.

2.2.14 Whole exome sequencing

Whole exome sequencing was carried out as a service by the Beijing Genomics Institute (BGI) in Hong Kong (http://www.genomics.cn/en/index), or as part of a collaboration with the King Lab, Genome Sciences, University of Washington, Seattle. Exomes were captured using the Agilent Sure Select 38Mb Exon Capture array and sequencing was carried out on the Illumina HiSeq 2000 platform.

2.2.15 Expression array

Expression array analysis of extracted mRNA was carried out as a service by the microarray service at the Paterson Institute for Cancer Research. The Affymetrix HG_U133 Plus 2.0 array was used in accordance with the manufacturer’s instructions.

2.3 Data Analysis

Analysis of data from the Affymetrix Human Genome-Wide SNP Array 6.0 was performed using the AutoSNPa and IBD programs, available for download from the University of Leeds Institute of Molecular Medicine: http://dna.leeds.ac.uk/autosnpa/
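The principle behind this kind of analysis can be illustrated with a short sketch. It is not the AutoSNPa algorithm; it is a simplified illustration which assumes that genotype calls are encoded as "AA", "AB" or "BB" for SNPs listed in chromosomal order, and that a candidate region is a run of consecutive SNPs at which every affected individual carries the same homozygous call. The function name, the sample identifiers and the minimum run length are hypothetical.

```python
from typing import Dict, List, Tuple


def shared_homozygous_runs(
    genotypes: Dict[str, List[str]], min_snps: int = 3
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) SNP indices of runs where all
    individuals share the same homozygous call (AA or BB)."""
    calls = list(genotypes.values())
    n = len(calls[0])
    runs: List[Tuple[int, int]] = []
    start = None
    for i in range(n):
        column = {c[i] for c in calls}
        shared = len(column) == 1 and column <= {"AA", "BB"}
        if shared and start is None:
            start = i  # a new run begins here
        elif not shared and start is not None:
            if i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and n - start >= min_snps:
        runs.append((start, n - 1))
    return runs


# Toy data for three affected siblings (not real genotypes).
affected = {
    "sib1": ["AB", "AA", "BB", "AA", "AA", "AB", "BB"],
    "sib2": ["AA", "AA", "BB", "AA", "AA", "BB", "BB"],
    "sib3": ["AB", "AA", "BB", "AA", "AA", "AB", "AB"],
}
print(shared_homozygous_runs(affected, min_snps=3))
```

On real array data the minimum run would be defined in physical or genetic distance rather than a SNP count, and occasional miscalled heterozygous SNPs would need to be tolerated.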

Analysis of sequencing data was performed using the Staden sequence analysis program and Gene Screen. Gene Screen is available for download from the University of Leeds Institute of Molecular Medicine: http://dna.leeds.ac.uk/genescreen/

2.4 Protein Procedures

2.4.1 Extraction of Protein from Zebrafish Embryos

Zebrafish embryos were collected as described in section 2.5.1. Embryos were de-chorionated and de-yolked prior to protein extraction. A minimum of 30 embryos was required for protein extraction.


Protein Extraction buffer:
20mM HEPES (pH 7.4)
0.1M NaCl
1mM DTT
1% (w/v) SDS

The relevant amount of extraction buffer (150µl of buffer is required per tube of embryos) was prepared by adding the following:

Protease Inhibitor Cocktail, Set III EDTA-free (Calbiochem, cat number: 539134): 8µl per 1ml extraction buffer
0.25M EDTA: 4µl per 1ml extraction buffer
100mM PMSF (Phenylmethylsulfonyl fluoride, Sigma Aldrich, catalogue number: P7626): 6µl per 1ml extraction buffer

150µl of prepared extraction buffer was added per tube of embryos. Embryos were homogenised with the Turboturrex homogeniser for 2 min on ice. The solution was then boiled for 3 mins at 95°C. To each tube, 150µl of extraction buffer minus SDS but containing 4% (v/v) Triton X 100 was added. Solutions were carefully mixed by pipetting up and down, followed by homogenisation with the Turboturrex homogeniser for 2 min on ice. Tubes were centrifuged on a benchtop device at maximum speed for 5 mins. The supernatant was transferred to a clean tube and snap frozen using liquid nitrogen.
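The supplement volumes scale linearly with the number of tubes, so the amounts needed for a given extraction can be tabulated with a small helper. The sketch below uses the per-millilitre volumes and the 150µl per tube stated above; the function name and the optional pipetting overage are hypothetical additions, and the calculation covers one of the two buffer additions per tube.

```python
# Supplement volumes (ul) per 1 ml of extraction buffer, as stated in the text.
SUPPLEMENTS_UL_PER_ML = {
    "protease inhibitor cocktail": 8.0,
    "0.25M EDTA": 4.0,
    "100mM PMSF": 6.0,
}
BUFFER_UL_PER_TUBE = 150.0


def supplement_volumes(n_tubes: int, overage: float = 0.0) -> dict:
    """Buffer volume (ml) and supplement volumes (ul) for n_tubes.

    overage is an assumed fractional excess (e.g. 0.1 for 10%) to allow
    for pipetting losses; it is not specified in the protocol.
    """
    buffer_ml = n_tubes * BUFFER_UL_PER_TUBE * (1.0 + overage) / 1000.0
    volumes = {"extraction buffer (ml)": round(buffer_ml, 3)}
    for name, ul_per_ml in SUPPLEMENTS_UL_PER_ML.items():
        volumes[f"{name} (ul)"] = round(ul_per_ml * buffer_ml, 2)
    return volumes


print(supplement_volumes(10))
```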

2.4.2 Extraction of Protein from Mammalian tissue

Lysis Buffer:
200mM Tris-HCl (pH 8)
150mM NaCl
1% (v/v) Nonidet P-40
10% (v/v) Glycerol
1mM EDTA
5mM Sodium Fluoride
0.1% (w/v) SDS
2mM Sodium orthovanadate

Add Inhibitor Cocktail, Set III EDTA-free (Calbiochem cat number: 539134), 8µl per 1ml.


Tissue was homogenised in Lysis Buffer (3.5 volumes of buffer to tissue) and boiled for 5 minutes at 100°C. Samples were centrifuged at maximum speed on a benchtop centrifuge for 20 minutes. The supernatant was carefully removed, pipetted into 25µl aliquots and stored at -20°C.

2.4.3 SDS Polyacrylamide Gel Electrophoresis (SDS-PAGE)

SDS-PAGE was used for protein analysis. Protein extracts were denatured prior to gel loading in a loading buffer containing SDS, for 5 minutes at 100°C. The total volume of protein loaded was 30µl. The amount of protein to be loaded was determined using the BCA assay kit (Pierce, catalogue number: PN23225) according to the manufacturer’s instructions.

4X Sample Buffer (500ml):
125ml 1M Tris HCl
40g SDS
200ml Glycerol
0.04g Bromphenol Blue
100ml 2-mercaptoethanol
Adjust pH to 6.8
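The way a BCA concentration translates into loading volumes can be sketched as follows. This is an illustration rather than the procedure actually used: it assumes that the 4X Sample Buffer is the loading buffer and is diluted to 1X in the 30µl load, that the remaining volume is made up with water, and that the target protein mass and extract concentration (here 20µg and 2µg/µl) are hypothetical values.

```python
def loading_mix(conc_ug_per_ul: float, target_ug: float,
                total_ul: float = 30.0, buffer_strength: float = 4.0) -> dict:
    """Volumes (ul) of extract, sample buffer and water for one gel lane."""
    buffer_ul = total_ul / buffer_strength      # 4X buffer diluted to 1X
    sample_ul = target_ug / conc_ug_per_ul      # extract needed for target mass
    water_ul = total_ul - buffer_ul - sample_ul
    if water_ul < 0:
        raise ValueError("extract too dilute for the requested protein mass")
    return {"sample_ul": sample_ul, "buffer_ul": buffer_ul, "water_ul": water_ul}


# Hypothetical extract at 2 ug/ul, loading 20 ug in a 30 ul lane.
print(loading_mix(conc_ug_per_ul=2.0, target_ug=20.0))
```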

BioRad 4–20% Mini-PROTEAN TGX Precast Gels (catalogue number: 456-1093EDU) were used and were run for approximately 60 minutes in BioRad 1x Tris/Glycine/SDS Electrophoresis Buffer (diluted from 10x stock, catalogue number: 161-0732EDU) at 100-120V. Protein samples were sized against the Fermentas Spectra™ Multicolor High Range Protein Ladder (catalogue number: SM1851).

2.4.4 Western Blot Analysis

All chemicals were purchased from Sigma Aldrich and were molecular biology grade.


Semi-dry Transfer Technique:

Semi-dry transfer buffer (1L):
Tris: 5.81g
Glycine: 2.92g
SDS: 1.875ml of 20% (w/v) SDS stock solution
Methanol: 200ml

Following electrophoresis, SDS-PAGE gels were soaked in transfer buffer for approximately 10 minutes. Gels were stacked between layers of filter paper and nitrocellulose membrane soaked in transfer buffer. Assembled stacks were run for 1-2 hours at room temperature with a constant current of 150mA (voltage limited to 20V) on a semi-dry electroblotter.

Wet Transfer Technique:

Wet transfer buffer pH 8.3 (1L):
Tris: 5.81g
Glycine: 2.92g
SDS: 1.875ml of 20% (w/v) SDS stock solution
pH adjusted to 8.3 using HCl

Following electrophoresis, SDS-PAGE gels were soaked in transfer buffer for approximately 10 minutes. Gels were stacked between layers of filter paper and nitrocellulose membrane soaked in transfer buffer. Assembled stacks were run for 5 hours or overnight at 4°C with a constant current of 200mA (voltage limited to 20V) using the MiniTrans Blot wet transfer tank and apparatus from BioRad.
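The masses given for the transfer buffers can be converted to final concentrations as a check on the recipes. The sketch below assumes that "Tris" refers to Tris base (molar mass 121.14 g/mol) and uses 75.07 g/mol for glycine; these molar masses are standard values and are not stated in the text, and the function names are hypothetical.

```python
def molarity_mM(mass_g: float, molar_mass_g_per_mol: float, volume_l: float) -> float:
    """Concentration in mM of mass_g of solute made up to volume_l."""
    return mass_g / molar_mass_g_per_mol / volume_l * 1000.0


def diluted_percent(stock_percent: float, stock_ml: float, final_ml: float) -> float:
    """Final percentage after diluting stock_ml of stock to final_ml."""
    return stock_percent * stock_ml / final_ml


tris_mM = molarity_mM(5.81, 121.14, 1.0)        # Tris base, assumed
glycine_mM = molarity_mM(2.92, 75.07, 1.0)
sds_percent = diluted_percent(20.0, 1.875, 1000.0)

print(f"Tris {tris_mM:.1f} mM, glycine {glycine_mM:.1f} mM, SDS {sds_percent:.4f}% (w/v)")
```

Under these assumptions the recipe corresponds to approximately 48mM Tris, 39mM glycine and 0.0375% (w/v) SDS.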

2.4.5 Developing Western Blot

PBS-Tween (2L):
10 x Phosphate Buffered saline tablets (Sigma-Aldrich, cat number: P4417)
1ml Tween 20 (Sigma-Aldrich, cat number: P1379)


Blocking Solution 1: 4% (w/v) skimmed milk powder (Marvel) in PBS-Tween
Blocking Solution 2: 1% (w/v) Bovine Serum in PBS-Tween

Following transfer nitrocellulose membranes were incubated for a minimum of 1 hour at room temperature in blocking solution. All of the following incubations were carried out on a rocking platform. Membranes were incubated in primary antibody for conditions specific to the antibody in use (Table 2.4). Membranes were washed three times for 10 minutes in PBS-Tween. Membranes were incubated in secondary antibody for conditions specific to the antibody in use (Table 2.4). Membranes were washed five times for 10 minutes in PBS-Tween. Membranes were incubated for 5 minutes in Supersignal West Pico Chemiluminescent substrate (Pierce, catalogue number: 34077) and exposed in a dark room on Amersham Hyperfilm ECL (cat number: GZ2890683) for variable exposure times. Films were developed on the AGFA Curix 60 table top processor using AGFA developer G153, AGFA Fixer G354 and washed in H2O.

Primary Antibodies | Supplier | Catalogue Number | Working dilution
rb2BSN1 (anti rabbit) | Gift from Gundelfinger group, Leibniz Institute for Neurobiology, Magdeburg, Germany | N/A | 1:1000
TAT1 (anti mouse) | Gift from Lowe group, Faculty of Life Sciences, University of Manchester | N/A | 1:1000

Secondary Antibody | Supplier | Catalogue Number | Working dilution
Unconjugated Goat Anti-Rabbit IgG | Vector Labs | AI-1000 | 1:1000
Unconjugated Horse Anti-Mouse IgG | Vector Labs | AI-2000 | 1:1000

Table 2.4. Antibodies and conditions used in western blotting experiments.


2.4.6 Stripping Western Blots

Membranes were washed three times for 5 minutes in PBS-Tween, followed by three washes of 5 minutes in H2O. Membranes were then washed in 0.2M NaOH for 5 minutes, followed by five 5 minute washes in H2O.

2.5 Zebrafish Model Organism Techniques

2.5.1 Zebrafish Care and Breeding

Zebrafish were cared for and bred within the Biological Sciences Facility at the University of Manchester. Detailed protocols for breeding zebrafish can be found in The Zebrafish Book by Monte Westerfield, 4th edition, found online at http://zfin.org/zf_info/zfbook/cont.html. For further details on breeding strategy please see Chapter 8.3.2.

2.5.2 De-chorionation of Embryos

Instant ocean stock solution (100ml):
4g instant ocean
100ml ddH2O

1 X Chorion water (1L):
1.5ml instant ocean stock
500µl methylene blue (Sigma, cat number: 319112)

Embryos were transferred into a glass dish. All chorions were intact. Chorion water was removed using a 200μl pipette. Embryos were covered with Pronase (Roche, cat number: 10165921001) using a glass pipette and incubated at 37°C in an oven for 15 minutes. A glass pipette was used to break remaining intact chorions by pipetting up and down. Embryos were transferred into 8 x chorion water and broken chorions were removed from the dish. Washes in 8 x chorion water were repeated until all broken chorions were removed and the water was clean. Embryos were transferred into 1 x chorion water.


2.5.3 Procedure for de-yolking embryos

Ginzburg Buffer (1 Litre):
6.5 g NaCl
0.25 g KCl
0.3 g CaCl2
0.2 g NaHCO3

To prepare 1ml Ginzburg Buffer for use the following were added:
8µl Protease Inhibitor Cocktail, Set III EDTA-free (Calbiochem catalogue number: 539134)
3µl 0.25M EDTA
3µl 100mM PMSF (Phenylmethylsulfonyl fluoride, Sigma Aldrich, catalogue number: P7626)

Note: PMSF was made fresh for each use: 5.9mg dissolved in 339.2µl of isopropanol.
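The PMSF preparation can be checked against the stated 100mM stock concentration with a short calculation. The sketch assumes a molar mass of 174.19 g/mol for PMSF (a standard value, not given in the text) and assumes that the supplements are added on top of 1ml of buffer when estimating the final working concentration; the variable names are hypothetical.

```python
PMSF_MOLAR_MASS = 174.19  # g/mol, assumed standard value

mass_mg = 5.9
solvent_ul = 339.2

# mg / (g/mol) gives mmol; ul converted to litres.
stock_mM = (mass_mg / PMSF_MOLAR_MASS) / (solvent_ul * 1e-6)

# 3 ul of stock in 1 ml buffer plus 8 + 3 + 3 ul of supplements.
total_ul = 1000.0 + 8.0 + 3.0 + 3.0
working_mM = stock_mM * 3.0 / total_ul

print(f"stock ~{stock_mM:.1f} mM, working ~{working_mM:.2f} mM")
```

The stock works out at very nearly 100mM, consistent with the recipe, and the working concentration at roughly 0.3mM.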

Approximately 30-100 embryos were transferred into a 2ml eppendorf tube. Chorion water was removed using a 200μl pipette. 1ml of prepared Ginzburg buffer was added and mixed by pipetting up and down 15 times. Tubes were spun at 2000rpm for 1 minute. Supernatant was carefully removed. 1ml of prepared Ginzburg buffer was added and the tube was inverted to mix. Tubes were spun at 2000rpm for 1 minute. Supernatant was carefully removed.
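Because the spin speed is given in rpm, the force applied depends on the rotor used. The conversion can be sketched as below using the standard relation RCF = 1.118 x 10^-5 x r x rpm^2, with r in centimetres; the rotor radius of 7cm is a hypothetical value, as the centrifuge and rotor are not specified.

```python
def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return 1.118e-5 * radius_cm * rpm ** 2


# 2000 rpm on a rotor with an assumed radius of 7 cm.
print(f"{rcf_from_rpm(2000, 7.0):.0f} x g")
```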

2.6 Immunohistochemistry Techniques

2.6.1 Sectioning and Mounting of Human Embryonic Tissue

Paraffin embedded tissue was sectioned using the Leica RM2235 Microtome to 5µm thickness and mounted on polylysine coated microscope slides.


2.6.2 Immunohistochemistry of Tissue Sections

Rehydration: Slides were placed into a metal rack and rehydrated by transferring into the following solutions.

Xylene Solution 1 = 3 minutes
Xylene Solution 2 = 3 minutes
100% Ethanol = 2 minutes
90% Ethanol = 2 minutes
Water = 10 seconds

Removal of Endogenous Peroxidase: Slides were placed into a plastic rack and transferred into a solution containing 250ml PBS (pH 7.4) and 750µl of 30% (v/v) stock H2O2. Incubation was carried out on a shaker for 20 minutes. Slides were washed in PBS on a shaker for 5 minutes. This wash was repeated three times.

Permeabilisation:

1L Sodium Citrate Solution:
3g Sodium citrate (pH 6)
200µl concentrated Acetic Acid

Sodium citrate solution (250ml) was heated in a microwave at full power for 2 minutes in a glass beaker. The solution was then placed onto a hot block and kept at boiling point. Slides were placed into a metal rack and submerged in boiling Sodium Citrate Solution for approximately 10 minutes. Slides were allowed to cool for 20 minutes. Slides were washed in PBS on a shaker for 5 minutes. This wash was repeated three times.

Antibody Staining

PBS + 0.1% (v/v) Triton X Buffer solution:
40ml PBS (Sigma-Aldrich, cat number: P4417)
40µl Triton X (Sigma-Aldrich, cat number: T8787)


Each section of tissue was isolated by drawing around it with an Immedge Hydrophobic Barrier pen (Vector Labs cat number: H-4000). Antibodies were diluted to appropriate concentrations (see Table 2.5 for working antibody concentrations) in PBS+0.1%(v/v) Triton X Buffer. Dilutions contained 3% serum appropriate to the species in which the secondary antibody was raised. The secondary antibody used in this project was raised in goat, therefore Vector Labs Normal Goat Serum (Vector Labs S-1000) was added to all dilutions.

Antibody dilutions were pipetted onto sections and left in a moist box overnight at 4°C. The following day slides were washed in PBS on a shaker for 5 minutes. This wash was repeated three times.

Secondary antibody was diluted to the appropriate concentration in PBS+0.1% (v/v) Triton X Buffer. Antibody was pipetted onto tissue sections and allowed to incubate for 2 hours in a moist box at 4°C. Slides were washed in PBS on a shaker for 5 minutes. This wash was repeated three times.

Primary Antibodies | Company | Catalogue Number | Working dilution
Anti-CLPP | Sigma | HPA010649 | 1:200
Anti-GTF2F1 | Sigma | HPA022793 | 1:100

Secondary Antibody | Company | Catalogue Number | Working dilution
Unconjugated Goat Anti-Rabbit IgG | Vector Labs | AI-1000 | 1:800

Table 2.5. Antibodies and conditions used for immunohistochemistry experiments.
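The working dilutions in Table 2.5 can be converted into pipetting volumes with a small helper. The sketch below is illustrative: it assumes that a dilution of 1:200 means one part antibody in 200 parts total volume, that serum makes up 3% of the total, and that 100µl is prepared per section (a hypothetical volume not given in the text). The function name is also hypothetical.

```python
def antibody_mix(dilution: int, total_ul: float, serum_fraction: float = 0.03) -> dict:
    """Volumes (ul) of antibody, serum and PBS-Triton buffer for one dilution."""
    antibody_ul = total_ul / dilution
    serum_ul = total_ul * serum_fraction
    buffer_ul = total_ul - antibody_ul - serum_ul
    return {
        "antibody_ul": round(antibody_ul, 2),
        "serum_ul": round(serum_ul, 2),
        "buffer_ul": round(buffer_ul, 2),
    }


# Anti-CLPP at 1:200 in an assumed 100 ul per section.
print(antibody_mix(200, 100.0))
```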

Peroxidase Addition: A 1:200 solution of Streptavidin HRP (Vector Labs, cat number: SA-5004) in PBS+0.1% (v/v) Triton X Buffer was prepared. The prepared solution was pipetted onto tissue sections and left for 1 hour in a moist box at 4°C.


Slides were washed in PBS on a shaker for 5 minutes. This wash was repeated three times.

Colour Staining:

Toluidine Blue Working Solution:
2.77ml Toluidine blue stock (Sigma-Aldrich, cat number: 89640)
250ml ddH2O

One tablet of 3,3′-Diaminobenzidine tetrahydrochloride (DAB: Sigma-Aldrich, cat number: D5905) was dissolved into 10ml of water. The solution was vortexed to fully dissolve the tablet. DAB solution was pipetted onto tissue sections and left for 2 minutes. DAB solution was removed from the slide by shaking. A solution containing 1ml of DAB solution plus 10μl of 3% (v/v) stock solution H2O2 was prepared. DAB + H2O2 solution was pipetted onto tissue sections and left for 3 minutes. Slides were washed in PBS on a shaker for 5 minutes in a plastic container.

Slides were placed into a plastic rack and transferred into Toluidine Blue Working Solution for 2 minutes. Slides were rinsed in de-ionised water.

Dehydration: Slides were placed into a metal rack and dehydrated by transferring into the following solutions.

70% Ethanol = 10 seconds
90% Ethanol = 10 seconds
100% Ethanol = 3 minutes
Xylene solution 1 = 2 minutes
Xylene solution 2 = 2 minutes

Slides were mounted using Entellan Mounting media (VWR catalogue number: 1.07961.0500). Slides were left for approximately 45 minutes until dry and viewed under a microscope.


CHAPTER 3. CLINICAL DETAILS OF PATIENTS INVOLVED IN THE STUDY


3.0 Clinical Details of Patients Involved in the Study.

3.1 Aim

The aim of this chapter is to describe the clinical characteristics and provide pedigree information for families taking part in this study. Ethical approval for this study was obtained from the University of Manchester (06138) and NHS ethics committees (06/Q1406/52). Written consent was obtained from affected adults, who could understand explanations of the purpose of the research. In patients with learning difficulties informed consent was obtained verbally from individuals who could understand simple explanations of the purposes of the research with additional written consent being obtained from parents when available. Clinical details were provided by the consulting clinician for each family, and DNA was made available whenever possible. In the pedigrees below only individuals with available DNA samples are numbered, with the exception of family P10. The pedigree for family P10 has been taken from the referenced poster presentation.


3.2 Clinical Descriptions

3.2.1 Perrault Syndrome Cohort

Family P1: Family P1 are a large consanguineous family of Pakistani origin. DNA is available for three affected females, unaffected parents and three unaffected siblings (Figure 3.1). The phenotype of affected individuals in this family is complex and can be broadly categorized as Perrault syndrome with neurological features. All three affected sisters have ovarian dysgenesis, sensorineural hearing loss, epilepsy, short stature, severe microcephaly (height, weight and OFC below 3rd centile in all sisters) and moderate to severe learning difficulties. Sisters P1:III:3 and P1:III:5 have additional phenotypic features including ataxia and signs of lower limb spasticity. Affected individual P1:III:2 has undergone an MRI scan which showed abnormal high signal intensity in the deep white matter and cortico-spinal tract. For a summary of the clinical details of affected individuals from this family please see Table 3.1.

Figure 3.1. Pedigree of Family P1


Family P2: Family P2 are a non-consanguineous Irish family with two affected siblings (Figure 3.2). Both affected siblings have Perrault syndrome with cerebellar hypoplasia. The younger sibling P2:II:2 also has additional features of ataxia and dyspraxia.

Figure 3.2. Pedigree of Family P2

Family P3: Family P3 are a non-consanguineous Egyptian family with affected identical twins (Figure 3.3). The sisters have Perrault syndrome without any evidence of neurological involvement or any additional phenotypic features.

Figure 3.3. Pedigree of Family P3


Family P4: Patient P4:II:1 was born to first cousin parents and phenotypic features include hypergonadotropic hypogonadism, sensorineural hypoacusia, and cerebellar ataxia (Figure 3.4). The hearing deficit and hypogonadism in this patient could be considered mild, but her neurological phenotype was severe. A full description of the complex neurological phenotype for this patient is given in [267]. In the text P4:II:1 is Case 4.

Figure 3.4. Pedigree of Family P4

Family P5: Affected siblings from consanguineous family P5 both have Perrault syndrome with learning difficulties and ichthyosis (Figure 3.5). Affected female P5:II:1 also has epilepsy.

Figure 3.5. Pedigree of Family P5


Family P6: Family P6 are a non-consanguineous family with one affected individual (Figure 3.6). Affected female P6:II:1 has Perrault syndrome without any additional phenotypic features.

Figure 3.6. Pedigree of Family P6

Family P7: The pedigree for Family P7 is unknown. This family are British Pakistani and consanguinity is thought to exist within the family. The relationship between the parents is unknown, and we do not have clinical details on any additional family members. DNA is available for one affected individual; please refer to Table 3.1 for a summary of clinical details. The affected individual from this family shall be referred to as P7 for the remainder of this text.


Family P8: DNA is available from one affected female from family P8. The family are Italian and the parents of this affected individual are first cousins. DNA samples are available for the unaffected parents and one unaffected sibling. The phenotype of this patient is well described by Fiumara et al. [231]; within that text, affected individual P8:II:1 is referred to as NM. This patient is thought to have Perrault syndrome with progressive neurological features. For a summary of clinical details please see Table 3.1.

Figure 3.7. Pedigree of Family P8

Family P9: DNA is available for one affected female from family P9. Family P9 are British and there is no evidence of consanguinity. Affected female P9:II:1 for whom DNA is available has Perrault syndrome with additional features of short stature, dyspraxia and hemi-atrophy of the left hand and foot. There is no DNA available for her affected brother, who is known to have sensorineural hearing loss and hypospadias.

Figure 3.8. Pedigree of Family P9


Family P10: DNA is available for one affected female from Family P10, P10:IV:2 (indicated by an arrow in Figure 3.9). The affected individual has Perrault syndrome with neurological defects, including subcortical atrophy and muscular hypotonia. Interestingly, individual P10:IV:2 also carries an 11778G>A mutation in mitochondrial DNA resulting in severe Leber hereditary optic neuropathy (LHON) (Appendix Figure 10.1). This mutation is found in all other affected family members, but sensorineural hearing loss, hypergonadotropic hypogonadism and neurological features are only found in individual P10:IV:2. It is thought that this individual represents a sporadic case of Perrault syndrome in an LHON family. A full clinical description can be seen in Appendix Figure 10.1, a clinical poster presentation from the ESHG conference presented by project collaborators [268]. For a summary of clinical features please see Table 3.1.

Figure 3.9. Pedigree of Family P10 taken from [268].


Family P11: DNA is available for three affected siblings, with first cousin parents of Iranian descent. Affected individuals have Perrault syndrome with mild mental retardation in sibling P11:III:1, and possible mild mental retardation in sibling P11:III:4. Sibling P11:III:3 shows no signs of mental retardation and has an IQ within the normal range. For full clinical details of this family please see Mehdipour et al [269]. For a summary of clinical details please see Table 3.1.

Figure 3.10. Pedigree of Family P11


Table 3.1. Clinical features for affected individuals from Perrault syndrome cohort.


3.2.2 Family with Hypogonadotropic Hypogonadism syndrome

Family HH1: Family HH1 are a large consanguineous family of Pakistani origin (Figure 3.11). Affected individuals have a novel hypogonadotropic hypogonadism syndrome with a complex combination of phenotypic features. Although many forms of non-syndromic deafness, hypogonadotropic hypogonadism, microcephaly and learning disability have been defined, a previous report of this particular combination of features could not be identified in a search of the literature. NCBI Pubmed and the Human Gene Mutation Database (HGMD) were both used to search for previous descriptions of this syndrome. The most striking features of this disorder include hypogonadotropic hypogonadism, sensorineural hearing loss, severe microcephaly, growth retardation, dysmorphic facial features and developmental delay. The characteristic dysmorphic facial features which are present in all affected individuals include convex nasal ridge, highly arched eyebrows, hypertelorism, and micrognathia (Figure 3.12). Three affected individuals had intrauterine growth retardation and all had neonatal feeding difficulties and iron deficiency anaemia. Skeletal abnormalities were also common to all affected individuals, including Sprengel shoulder, camptodactyly and broad halluces. Additional phenotypic characteristics include neurological defects, palatal abnormalities, epilepsy and diabetes, although these features were not consistent in all affected individuals. For a summary of phenotypic features please see Table 3.2. A clinical case report of this family has recently been published in the American Journal of Medical Genetics in an attempt to identify and recruit additional families with the same disorder [270].


Figure 3.11. Pedigree of Family HH1.

Figure 3.12. Clinical photograph of affected individuals from Family HH1. A= HH1:III:1, B= HH1:III:4, C= HH1:III:5 and D= HH1:IV:1. Dysmorphic facial features including convex nasal ridge, highly arched eyebrows, hypertelorism, and micrognathia are common to all four individuals.


Feature | Patient HH1:III:1 | Patient HH1:III:4 | Patient HH1:III:5 | Patient 4 HH1:IV:1
Age at evaluation (years) | 32 | 23 | 21 | 8
OFC | -4.6 SD | -5.3 SD | -6.2 SD | -6.3 SD
Height | -2.1 SD | -3.0 SD | -3.0 SD | -3.0 SD
Weight | -3.3 SD | -2.8 SD | -3.1 SD | -3.6
Pregnancy | IUGR, oligohydramnios | Normal | IUGR, oligohydramnios | IUGR, oligohydramnios
Birth weight | 2.14kg | 3.12kg | 2.25kg | 1.9kg
Gestation (weeks) | 37 | 41 | 40 | 40
Neonatal feeding problems | + | + | + | +
Iron deficiency anemia | + | + | + | +
Learning disability (moderate) | + | + | + | +
Age at walking | 3-4yrs | 3-4yrs | 4-5yrs | 2yrs
Speech | sentences | sentences | Single words | 3 word sentences
Seizures | 1 post-trauma episode | From 16. GM + partial complex seizures | none | 1 post-immunization convulsion
Hypogonadotropic hypogonadism | + | + | + | +
Pubertal abnormality | Delayed puberty, Gynecomastia | Primary amenorrhea | Primary amenorrhea | -
External genitalia | Atrophic testes | Normal female | Normal female | Cryptorchidism, micropenis
Sensorineural deafness | From birth | From birth | Diagnosed age 5yr | Diagnosed aged 1yr
Vision problems | Myopia | Myopia | Myopia | Myopia
Diabetes | NIDDM, onset 30yrs | - | NIDDM, onset 16 | -
Neurological problems | Peripheral neuropathy | Poor balance | - | -
Skeletal abnormalities | Sprengel shoulder, broad halluces | Stooped posture, fifth finger camptodactyly | Sprengel shoulder, genu valgum | Broad thumbs, overlapping toes
Facial dysmorphism | + | + | + | +
Palatal abnormality | Bifid uvula | Bifid uvula | - | -
Teeth | Normal | Normal | Absent roots | Normal

Table 3.2. Phenotypic features of affected individuals from Family HH1 taken from Jenkinson et al [270]. (+ = present, - = absent)

CHAPTER 4. MUTATIONAL SCREENING IN HSD17B4 AND HARS2


4.0 Mutational Screening in HSD17B4 and HARS2

4.1 Introduction

During the course of this project novel mutations were detected by Pierce et al. in affected individuals from two Perrault families in the genes HSD17B4 and HARS2 [239, 240]. A more detailed description of these mutations can be found in Chapter 1.2.3. It is hoped that the identification of these genes will provide information about gene function, biological processes and disease pathogenesis. In both of the initial reports of mutations in Perrault families, Pierce et al. screened a cohort of affected individuals but did not detect any other novel mutations. Six additional families were screened for HSD17B4 mutations, and ten Perrault families were screened for HARS2 mutations [239, 240]. The lack of additional families with mutations in these genes suggests that Perrault syndrome is a heterogeneous disorder.

4.2 Aim

The aim of this chapter is to present the results of a screen of affected individuals from a cohort of nine Perrault syndrome families for mutations in the genes HSD17B4 and HARS2. Primers were designed to amplify the coding exons of HSD17B4 and HARS2 (see Appendix Table 10.1 and 10.2 for primer sequences) and screening was carried out on affected individuals from 9 Perrault families. All primers were designed using the online software Primer 3 (http://frodo.wi.mit.edu/primer3/). Traditional Sanger DNA sequencing techniques using the Big Dye Terminator V3.1 sequencing kit (Applied Biosystems) were employed for the work presented in this chapter. For full details of the methods and materials used please see Chapter 2. The affected individuals from Families P10 and P8 were not screened for mutations as this work had already been carried out by collaborators. No pathogenic mutations were detected in HARS2 or HSD17B4 in either family.


4.3 Results

Sequencing of coding exons of HARS2 and HSD17B4 in affected individuals from our Perrault syndrome cohort did not identify any pathogenic mutations. Sequencing did detect known common polymorphisms, all of which were identified on the NCBI database of common SNPs (Table 4.1).

Figure 4.1. Representation of HARS2 gene. The primer pairs used for PCR amplification and sequencing are shown in red, coding exons are shown in blue and untranslated region (UTR) is shown in purple.

Figure 4.2. Representation of the HSD17B4 gene. Coding exons are shown in blue and UTR is shown in purple.


GENE/ALLELE/FREQ | Genotype of Affected Individuals

HSD17B4 rs26180 (MAF: 0.43 (G)) | P9:II:1 G/G, P4:II:1 G/G, P8:II:1 C/C, P3:II:1 G/C, P3:II:2 G/C, P2:II:1 G/G, P2:II:2 G/C, P1:III:2 G/G, P1:III:3 G/C, P1:III:5 G/C, P5:II:1 G/G, P6:II:1 G/G, P7 C/C

HSD17B4 rs25640 (MAF: 0.37 (A)) | P9:II:1 A/A, P4:II:1 A/G, P8:II:1 G/G, P3:II:1 A/G, P3:II:2 A/G, P2:II:1 A/A, P2:II:2 A/G, P1:III:2 A/A, P1:III:3 A/G, P1:III:5 A/G, P5:II:1 A/A, P6:II:1 A/A, P7 G/G

HSD17B4 rs11205 (MAF: 0.39 (G)) | P9:II:1 A/G, P4:II:1 G/G, P8:II:1 A/A, P3:II:1 A/G, P3:II:2 A/G, P2:II:1 A/G, P2:II:2 A/G, P1:III:2 A/G, P1:III:3 A/A, P1:III:5 A/A, P5:II:1 G/G, P6:II:1 G/G, P7 A/A

HSD17B4 rs460945 (MAF: 0.28 (T)) | P9:II:1 C/T, P4:II:1 C/T, P8:II:1 C/C, P3:II:1 C/T, P3:II:2 C/T, P2:II:1 C/T, P2:II:2 C/C, P1:III:2 C/T, P1:III:3 C/C, P1:III:5 C/C, P5:II:1 C/C, P6:II:1 C/C, P7 C/C

HSD17B4 rs28943591 (MAF: 0.01 (A)) | P9:II:1 G/G, P4:II:1 G/G, P8:II:1 G/G, P3:II:1 G/G, P3:II:2 G/G, P2:II:1 A/G, P2:II:2 A/G, P1:III:2 G/G, P1:III:3 G/G, P1:III:5 G/G, P5:II:1 G/G, P6:II:1 G/G, P7 G/G

HSD17B4 rs72111487 (Intronic Indel) | P9:II:1 TT/TT, P4:II:1 TT/TT, P8:II:1 TT/TT, P3:II:1 TT/TT, P3:II:2 TT/TT, P2:II:1 TT/TT, P2:II:2 TT/TT, P1:III:2 TT/TT, P1:III:3 TT/TT, P1:III:5 TT/TT, P5:II:1 -/-, P6:II:1 -/-, P7 -/-

HSD17B4 rs67686075 (Intronic Indel) | P9:II:1 AC/AC, P4:II:1 AC/-, P8:II:1 AC/AC, P3:II:1 AC/-, P3:II:2 AC/-, P2:II:1 AC/AC, P2:II:2 AC/-, P1:III:2 AC/AC, P1:III:3 AC/-, P1:III:5 AC/-, P5:II:1 AC/AC, P6:II:1 AC/AC, P7 AC/AC

HSD17B4 rs28943596 (MAF: 0.03 (G)) | P9:II:1 A/A, P4:II:1 G/G, P8:II:1 A/A, P3:II:1 A/A, P3:II:2 A/A, P2:II:1 A/A, P2:II:2 A/G, P1:III:2 A/A, P1:III:3 A/A, P1:III:5 A/A, P5:II:1 G/G, P6:II:1 G/G, P7 G/G

HSD17B4 rs28943593 (MAF: 0.08 (T)) | P9:II:1 C/T, P4:II:1 C/C, P8:II:1 C/C, P3:II:1 C/T, P3:II:2 C/T, P2:II:1 C/T, P2:II:2 C/T, P1:III:2 C/T, P1:III:3 T/T, P1:III:5 T/T, P5:II:1 C/C, P6:II:1 C/C, P7 C/C

HSD17B4 rs34381335 (Intronic Indel) | P9:II:1 -/-, P4:II:1 -/-, P8:II:1 -/-, P3:II:1 -/A, P3:II:2 -/A, P2:II:1 -/-, P2:II:2 -/-, P1:III:2 -/-, P1:III:3 -/-, P1:III:5 -/-, P5:II:1 -/-, P6:II:1 -/-, P7 -/-

HARS2 rs6883035 (MAF: 0.21 (A)) | P9:II:1 G/G, P4:II:1 G/G, P8:II:1 G/G, P3:II:1 G/G, P3:II:2 G/G, P2:II:1 G/G, P2:II:2 G/G, P1:III:2 G/G, P1:III:3 G/G, P1:III:5 G/G, P5:II:1 G/A, P6:II:1 G/G, P7 G/A, P11:II:1 G/G

HARS2 rs56372992 (Intronic Indel) | P9:II:1 -/-, P4:II:1 -/-, P8:II:1 -/-, P3:II:1 -/-, P3:II:2 -/-, P2:II:1 -/-, P2:II:2 -/-, P1:III:2 -/-, P1:III:3 -/-, P1:III:5 -/-, P5:II:1 TTGATATAG/TTGATATAG, P6:II:1 -/TTGATATAG, P7 -/TTGATATAG, P11:II:1 -/-

Table 4.1. Common polymorphisms detected during mutational screening of HARS2 and HSD17B4.


Figure 4.3. Sequence chromatogram showing a section of HARS2 for affected sisters from Family P2. The highlighted codon (green) shows the site of the pathogenic p.L200V variant found by Pierce et al. All individuals in our cohort are wildtype for this allele.

Figure 4.4. Sequence chromatogram showing a section of HARS2 for affected individuals P8:II:1 and P6:II:1. The highlighted codon (green) shows the site of the pathogenic p.V368L variant found by Pierce et al. All individuals in our cohort are wildtype for this allele.


Figure 4.5. Sequence chromatogram showing a section of HSD17B4 for affected individuals P7 and P5:II:1. The highlighted codon (green) shows the site of the pathogenic p.Y217C variant found by Pierce et al. All individuals in our cohort are wildtype for this allele.


Figure 4.6. Sequence chromatogram showing a section of HSD17B4 for affected individuals P5:II:1 and P6:II:1. The highlighted codon (yellow) shows the site of the pathogenic p.Y568X variant found by Pierce et al. All individuals in our cohort are wildtype for this allele.


Family | Sample/s sequenced for HARS2 coding exons | Sample/s sequenced for HSD17B4 coding exons
P1 | III:2, III:3, III:5 | III:2, III:3, III:5
P2 | II:1, II:2 | II:1, II:2
P3 | II:1, II:2 | II:1, II:2
P4 | II:1 | II:1
P5 | II:1 | II:1
P6 | II:1 | II:1
P7 | P7 | P7
P9 | II:1 | II:1
P11 | III:1 | III:1

Table 4.2. List of samples that were sequenced for HARS2 and HSD17B4 from Perrault syndrome cohort.

4.4 Discussion

Upon identification of pathogenic mutations in HSD17B4 and HARS2, Pierce et al. screened other affected individuals from their Perrault syndrome cohort. No mutations were detected in any other affected individuals [239, 240]. The finding of two independent genes in unrelated families indicates that Perrault syndrome is a genetically heterogeneous disorder. There is no obvious biological link between the two proteins identified so far: no common pathway can be identified and there are no known biological interactions between HARS2 and HSD17B4. Sequencing of the coding exons of both genes in nine Perrault syndrome families from our cohort has provided additional evidence of genetic heterogeneity in this syndrome [271]. We cannot exclude the possibility that intronic mutations in these genes are present and affect gene splicing or regulation of expression. However, it remains likely that HARS2 and HSD17B4 do not represent the pathogenic cause of Perrault syndrome for the families involved in this project. This work supports the assertion that mutations in these genes are not a common cause of Perrault syndrome and that this syndrome is genetically heterogeneous.


CHAPTER 5. AUTOZYGOSITY MAPPING


5.0 Autozygosity Mapping and Copy Number Analysis

5.1 Introduction

As discussed in Chapter 1, autozygosity mapping represents an effective and sensitive method of mapping a disease locus using far fewer families than traditional linkage techniques. One of the biggest challenges to this method of mapping is the analysis of the large amounts of data produced from the high density genotyping arrays now available. Practical alternatives are needed to the more traditional mathematical linkage analysis, which can be problematic when applied to autozygosity mapping.

A major problem with using traditional statistical analysis techniques for autozygosity mapping is that the families being mapped are consanguineous. Known allele frequencies for different populations are used in many statistical calculations. It can be incorrect to assume that the allele frequencies of consanguineous populations are the same as those of the ethnic groups from which they arose [272]. In addition, it is not always possible to calculate allele frequency from family members, because in most cases samples are only available for a few individuals, and calculating frequency in this way would create a biased result since all individuals come from the same family. Complex pedigrees with multiple consanguineous unions make calculating LOD scores difficult, and while simplifying the pedigree structure would overcome this problem, the LOD score from such an analysis may not be the most reliable. Additionally, the extent of consanguinity within a family might not be fully known or disclosed when compiling the pedigree [272].

An example of a program which uses simplified versions of known pedigrees to calculate linkage likelihoods is the Merlin program [273]. Merlin is a multipoint linkage program which uses sparse binary trees as a representation of gene flow through a pedigree. The simplification of complex pedigrees which may not be full and comprehensive is not the only problem with using the Merlin program for this project.
Merlin was designed to handle thousands of markers, which would have been suitable for analysing genotyping data from early versions of SNP arrays such as the Affymetrix 10K or 50K arrays. The memory requirements for Merlin increase in a linear manner with the number of markers, meaning that the resources required for analysing the larger modern SNP arrays would be huge [272]. The theoretical and practical problems surrounding such statistical analysis make programs such as Merlin unsuitable for use in this project.

Statistical analysis is not essential to use autozygosity mapping to analyse a consanguineous kindred. The concept of autozygosity mapping is simple, and visual detection of regions of common homozygosity is all that is required to identify candidate regions [274]. AutoSNPa is a computer program which presents large quantities of SNP data in a clear and easy to use format [272]. The graphical interface allows the user to visually detect regions of homozygosity which are common to affected individuals but not unaffected individuals. The simplicity of the program and the fact that it does not involve any kind of statistical analysis mean that the problems discussed above are avoided. This alternative way of viewing SNP array data is highly effective for autozygosity mapping.

AutoSNPa displays the data from the Affymetrix SNP arrays as colour coded SNPs according to physical position along the chromosome in Mb. Each chromosome can be seen by clicking the appropriate tab along the top of the interface. The data is shown in two groups: the affected individuals are displayed in columns on the left of the interface, and unaffected individuals are displayed in columns on the right of the interface. Each person is given a colour coded SNP genotype based on four categories: “no-call”, “heterozygous”, “common homozygous”, and “rare homozygous”. The common and rare homozygous calls are determined using genotype information from affected individuals only. If both AA and BB genotypes are present in the affected individuals then the most common genotype is designated “common homozygous”.
The “common homozygous” genotype is displayed in black on the interface, “rare homozygous” and “heterozygous” calls are displayed in yellow and “no calls” appear in grey [272]. The program allows several different graphical representations of the data. These include the “Heterozygous SNPs” view, which highlights areas of homozygosity as blue and pink boxes, and the “Homozygous Runs” view, which highlights areas of homozygosity and indicates the probability that such a region occurs by chance using either the proportion of homozygous SNPs that the individual has on the chromosome under analysis, or predetermined population allele frequencies. The data for each region of interest can be exported into text files to view individual genotype calls, either as flat text files or Excel spreadsheets [272]. AutoSNPa was used to analyse all Affymetrix array data presented in this chapter.
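The genotype categories described above can be illustrated with a minimal Python sketch. This is not AutoSNPa's code: the function name, the call labels ("AA", "BB", "AB", "NoCall") and the tie-breaking rule are assumptions made purely for illustration.

```python
from collections import Counter

# Display colours as described in the text; the dictionary itself is illustrative.
COLOURS = {
    "common homozygous": "black",
    "rare homozygous": "yellow",
    "heterozygous": "yellow",
    "no-call": "grey",
}


def classify_call(call, affected_calls):
    """Label one genotype call at a SNP relative to the affected individuals' calls."""
    if call == "NoCall":
        return "no-call"
    if call == "AB":
        return "heterozygous"
    # Only homozygous calls from affected individuals define "common".
    counts = Counter(c for c in affected_calls if c in ("AA", "BB"))
    # Assumption: a tie, or no homozygous affected call at all, defaults to "AA".
    common = max(("AA", "BB"), key=lambda g: counts[g])
    return "common homozygous" if call == common else "rare homozygous"


# Hypothetical calls for three affected individuals at one SNP.
affected = ["BB", "BB", "AA"]
for call in ("BB", "AA", "AB", "NoCall"):
    label = classify_call(call, affected)
    print(call, label, COLOURS[label])
```

Because the reference genotype is taken from affected individuals only, an unaffected relative who carries the other homozygous genotype is shown as "rare homozygous", which makes a region shared only by affected individuals stand out visually.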

The Chromosome Analysis Suite (ChAS) from Affymetrix is a software package designed for the cytogenetic analysis of Affymetrix array data. This software is simple to use and can generate data in graphical and tabular formats. The program allows the user to identify copy number gains and losses, mosaicism and loss of heterozygosity. For the purposes of this study ChAS was used for copy number variation (CNV) analysis of array data. http://media.affymetrix.com/support/downloads/manuals/chas_software_user_manual.pdf

5.2 Aim

The aim of this chapter is to map disease loci in consanguineous families using autozygosity mapping. Affymetrix Human SNP array technology in conjunction with AutoSNPa and Affymetrix Chromosome Analysis Suite (ChAS) software was used for this project.

5.3 Results

The SNP array data presented in this chapter was produced by Dr Jill Urquhart and Dr Sarah Daly as a service run by the National Genetics Reference Laboratory at St Mary’s Hospital, Manchester.

5.3.1 Perrault Syndrome

5.3.1.1 Family P1

Mapping of Family P1 to 19p13.3-13.11

Figure 5.1. Pedigree of Family P1. For full clinical details of Family P1 please see Chapter 3.

Initial mapping of Family P1 was carried out using the Affymetrix Human Genome-Wide SNP Array 250K. The data generated from these arrays were analysed using the AutoSNPa program described previously to look for areas of autozygosity common to affected individuals. Analysis of the 250K array data identified two areas of autozygosity on chromosome 19 (Figure 5.2):

19p13.3-13.11 (19:5524344-16654702 Hg18) region approximately 11.1Mb

19q13.31-13.32 (19:49053000-51267243 Hg18) region approximately 2.2Mb
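As a quick check of the quoted region sizes, the following Python sketch converts the Hg18 coordinates given above into megabases. The function and variable names are illustrative only.

```python
def region_size_mb(start, end):
    """Return the size of a genomic interval in megabases."""
    return (end - start) / 1_000_000


# Hg18 coordinates of the two chromosome 19 regions quoted in the text.
regions = {
    "19p13.3-13.11": (5_524_344, 16_654_702),
    "19q13.31-13.32": (49_053_000, 51_267_243),
}

for name, (start, end) in regions.items():
    print(f"{name}: {region_size_mb(start, end):.1f} Mb")
```

Rounded to one decimal place, the two intervals come to 11.1 Mb and 2.2 Mb, in agreement with the sizes stated for the 250K array data.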


Figure 5.2. Affymetrix 250K Array data for Family P1 presented in AutoSNPa (Hg18). Regions of homozygosity are shown in black. Two large regions of approximately 11.1Mb and 2.2Mb can clearly be seen on chromosome 19.

The Affymetrix Human Genome-Wide 250K Array has the capacity to genotype 250,000 SNPs throughout the genome. However, these SNPs are not evenly distributed between the different chromosomes, meaning that some chromosomes have a higher density of SNPs and therefore better coverage than others. Chromosome 19 is not represented by a high density of SNPs on this array. In addition, Family P1 had a higher than expected number of “no calls” (represented by grey bars on Figure 5.2). Unaffected family member P1:III:4 appears to be homozygous at the 2.2Mb locus; however, this individual has a particularly high number of “no calls” at this locus, which makes it difficult to rule this out as a region of interest.

In order to better establish the exact size of the regions of homozygosity and the start and end points of each locus, selected samples from Family P1 were mapped again using the Affymetrix Human Genome-Wide SNP Array V6.0 (Figure 5.3). This array has over 906,600 SNP probes, which increases density and coverage on all chromosomes including chromosome 19. The results of the 6.0 array mapping eliminated the smaller locus at 19q13.31-13.32 as a region of interest for this family. Using the new array it was clear that unaffected individual P1:III:4 was homozygous across this region. The mapping also re-defined the boundaries and the size of the homozygous locus 19p13.3-13.11. The higher density array data showed the locus to be 10.5Mb in size, defined by rs4366824 and rs3852916. Within the homozygous region there were far fewer “no calls” than in the 250K array data, and there was one heterozygous call for affected individuals P1:III:3 and P1:III:5 at rs8104872. This SNP was flanked by regions of homozygosity on both sides and was therefore considered to be a miscall in the array data (Figure 5.3).
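The decision to treat an isolated heterozygous call as a miscall can be expressed as a tolerance when scanning for homozygous runs. The sketch below is a hypothetical illustration and not the procedure used by AutoSNPa: the function name, the `max_gap_calls` parameter and the toy positions are all assumptions.

```python
def homozygous_runs(snps, max_gap_calls=1):
    """Find homozygous runs in position-sorted (position, is_homozygous) pairs.

    A run is kept open across up to `max_gap_calls` consecutive
    non-homozygous calls; more than that closes the run at the last
    homozygous SNP seen.
    """
    runs = []
    start = last_hom = None
    misses = 0
    for pos, hom in snps:
        if hom:
            if start is None:
                start = pos
            last_hom = pos
            misses = 0
        elif start is not None:
            misses += 1
            if misses > max_gap_calls:
                runs.append((start, last_hom))
                start = last_hom = None
                misses = 0
    if start is not None:
        runs.append((start, last_hom))
    return runs


# Toy data: one isolated heterozygous call at 400, two consecutive at 700-800.
toy = [(100, False), (200, True), (300, True), (400, False), (500, True),
       (600, True), (700, False), (800, False), (900, True)]

print(homozygous_runs(toy, max_gap_calls=1))  # isolated call bridged
print(homozygous_runs(toy, max_gap_calls=0))  # every heterozygous call splits a run
```

With a tolerance of zero, a single heterozygous call splits one region into two shorter runs, whereas a tolerance of one keeps the flanked call inside a single run.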


Figure 5.3. Affymetrix 6.0 array data for Family P1 presented using AutoSNPa (Hg18). The higher density SNP array shows clearly that unaffected individual P1:III:4 is homozygous at the 19q13.31-13.32 region. The boundaries of the homozygous region 19p13.3-13.11 are re-defined using the V6.0 array. The region is approximately 10.5Mb.

Copy Number Analysis of Family P1

The disease locus established at 19p13.3-13.11 was analysed for copy number variation. The Affymetrix Genome-wide Human SNP array 6.0 contains over 946,000 copy number probes, and analysis of data in ChAS (Chromosome Analysis Suite) software enables detection of micro deletions or duplications. Analysis of the 6.0 array data for the three affected individuals from Family P1 identified a homozygous deletion within the 19p13.3-13.11 locus. The deletion was identified using three contiguous probes: one copy number probe, CN_780301, and two SNP probes, SNP_A-8406134 and SNP_A-1810878 (Figure 5.4, Table 5.1).

Primers were used to confirm the presence of the deletion in affected family members and to identify breakpoints. In total 16 pairs of primers (PP1-16) were used to interrogate the suspected deletion, and one set of primers was used to sequence across the breakpoint. Please see Appendix Table 10.3 for primer sequences. Amplification occurred in all family members using all primer pairs with the exception of PP1 and PP12, which only amplified in unaffected family members (Figure 5.5). A combination of the PP14 forward primer and the PP15 reverse primer was used to amplify across the suspected breakpoint, and specifically designed breakpoint primers were used to sequence this amplicon. Sequencing revealed that the deletion was 1361bp in length, Chr19:10,701,423-10,702,784 (Hg19), in all affected family members (Figure 5.8).

PCR amplification using PP1 and PP12 was carried out on 97 ethnically matched control samples to test for the presence of the deletion. All control samples amplified using both primer pairs, indicating that the deletion is not present in the homozygous form in this panel of controls.
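The logic of the PCR screen and the size of the deleted interval can be summarised in a short Python sketch. The function names and returned labels are hypothetical, and the interpretation assumes that a primer pair lying inside the deleted interval fails only when both copies are deleted.

```python
def deletion_length(start, end):
    """Length implied by the two breakpoint coordinates (end minus start)."""
    return end - start


def interpret_pcr(internal_amplified, flanking_amplified):
    """Interpret presence/absence PCR for a primer pair inside a suspected deletion.

    A heterozygous carrier still amplifies from the intact allele, so this
    assay cannot distinguish a carrier from a wild-type individual.
    """
    if not flanking_amplified:
        return "uninformative"  # no control product, so the PCR may simply have failed
    if internal_amplified:
        return "no homozygous deletion"
    return "homozygous deletion"


# Hg19 breakpoint coordinates reported for Family P1.
print(deletion_length(10_701_423, 10_702_784))
print(interpret_pcr(internal_amplified=False, flanking_amplified=True))
print(interpret_pcr(internal_amplified=True, flanking_amplified=True))
```

The coordinate difference reproduces the reported 1361bp. The second function shows why amplification in all 97 controls rules out only the homozygous form of the deletion in that panel.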


Figure 5.4. Homozygous deletion in affected members of Family P1 as shown using ChAS software.

File       CN State   Type   Chr   Min        Max        Size (kbp)   Mean Marker Distance   Cytoband   Genes
P1:III:2   0          Loss   19    10701579   10701833   0.254        126                    p13.2      Intergenic
P1:III:3   0          Loss   19    10701579   10701833   0.254        126                    p13.2      Intergenic
P1:III:5   0          Loss   19    10701579   10701833   0.254        126                    p13.2      Intergenic

Table 5.1. Data for 19p13.2 homozygous deletion as displayed in ChAS software.


Figure 5.5. Agarose gel electrophoresis of PCR amplicons for Family P1. Left hand gel shows PCR products amplified using Deletion PP1 primers and the right hand gel shows PCR products amplified using Deletion PP12 primers. Note that amplification did not occur in affected individuals for these two primer pairs. Ladder sizes are given in base pairs.

Figure 5.6. Agarose gel electrophoresis showing a representative sample of ethnically matched control samples amplified using Deletion PP1 primers. Ladder sizes are given in base pairs.


Figure 5.7 (above). Chromatogram of breakpoint sequencing in affected individuals from Family P1. Breakpoint occurs at the vertical black line.

Figure 5.8 (left). Full sequence of region deleted in affected individuals from Family P1. Deleted region is 1361bp in length from 10,701,423-10,702,784 on Hg19.


5.3.1.2 Family P2

Figure 5.9. Pedigree of family P2. For full clinical details please see Chapter 3.

Mapping of family P2 was carried out on Affymetrix 6.0 arrays. Affected siblings P2:II:1 and P2:II:2 were genotyped and data was analysed using AutoSNPa as described previously. Family history did not indicate consanguinity within this family. However, the family came from a remote village in Ireland and genotyping was carried out to establish if there was any evidence of distant consanguinity. No large regions of homozygosity could be detected, which indicates that distant consanguinity does not exist in this family. Mapping could therefore not identify a locus for Perrault syndrome in family P2.

Figure 5.10. Figure showing small regions of homozygosity. This evidence does not indicate significant consanguinity within this family.

Affected individuals from family P2 had a limited amount of DNA available. Whole Genome Amplification (WGA) was carried out using the QIAgen Repli-G Mini Kit. The WGA process was suitable for genotyping samples, but analysis using ChAS software was not possible.


5.3.1.3 Family P4

Figure 5.11. Pedigree of family P4. For full clinical details please see Chapter 3.

Mapping of family P4 was carried out on Affymetrix 6.0 arrays. Affected sibling P4:II:1 and unaffected sibling P4:II:2 were genotyped and data was analysed using AutoSNPa as described previously. No single homozygous locus could be identified for this family. Affected individual P4:II:1 had many regions of homozygosity (Table 5.3) which were not shared by her unaffected brother, as shown in Figure 5.12. Affected individual P4:II:1 was not homozygous at the Chr19 locus identified in family P1.

Figure 5.12. Figure showing multiple regions of homozygosity (red boxed areas) in affected individual P4:II:1, but not in unaffected individual P4:II:2.


5.3.1.4 Family P5

Figure 5.13. Pedigree of Family P5. For full clinical details please see Chapter 3.

Mapping of family P5 was carried out on Affymetrix 6.0 arrays and analysed using AutoSNPa as described previously (Figure 5.14). DNA was available for both affected siblings for family P5, but only affected female P5:II:1 had enough DNA available for genotyping on this array. No single homozygous locus could be identified for this family. Affected individual P5:II:1 had many regions of homozygosity throughout the genome (Table 5.3). Affected individual P5:II:1 was not homozygous at the Chr19 locus identified in family P1.

Figure 5.14. Figure showing multiple regions of homozygosity in affected individual P5:II:1.

Copy Number Analysis of Family P5

Analysis of the copy number data for family P5 in ChAS identified a heterozygous deletion in affected female P5:II:1 on chromosome 7q35 (Figure 5.15). This deletion was estimated to be 188.9Kb in size. The deleted region contained several genes, one of which, NOBOX, is implicated in ovarian dysgenesis pathogenesis. Please see Chapter 1.2.2 for more details.

Figure 5.15. Heterozygous deletion of NOBOX in affected individual P5:II:1 shown using ChAS software.

Sample                 P5:II:1
CN State               1
Type                   Loss
Chromosome             7
Min                    143917588
Max                    144106496
Size (kbp)             188.908
Mean marker distance   4607
Marker count           42
Cytoband start         q35
Cytoband end           q35
Genes                  OR2A9P, OR2A42, OR2A7, OR2A1, NOBOX, OR2A9P, OR2A42, CTAGE4, OR2A20P, OR2A20P, ARHGEF5, OR2A1
OMIM                   610934 Premature_ovarian_failure_5(611548):NM_001080413 (NOBOX)

Table 5.2. Data for 7q35 heterozygous deletion detected in affected individual P5:II:1, as displayed in ChAS software.

5.3.1.5 Family P8

Figure 5.16. Pedigree of Family P8. For full clinical details please see Chapter 3.

Mapping of family P8 was carried out on Affymetrix 6.0 arrays and analysed as described previously using AutoSNPa. Affected sibling P8:II:1 and unaffected sibling P8:II:2 were both genotyped. No single homozygous locus could be identified for family P8. Affected individual P8:II:1 had many regions of homozygosity which were not shared by her unaffected brother, as shown in Figure 5.17. Affected individual P8:II:1 was not homozygous at the Chr19 locus identified in family P1.

Figure 5.17. Figure showing multiple regions of homozygosity in affected individual P8:II:1, but not in unaffected individual P8:II:2.


Table 5.3 shows all homozygous regions over 2Mb detected in affected individuals from our Perrault syndrome cohort. No additional families were homozygous for the Family P1 19p13.3-13.11 locus. One other region of homozygosity was common to affected individuals from families P4 and P5, chr12:33,778,770-37,161,420. There is only one protein coding gene which lies within this region, ALG10. The protein product of ALG10 is a glucosyltransferase responsible for adding glucose molecules to lipid linked oligosaccharide precursors prior to N-linked glycosylation [275]. The function of this gene makes it an unlikely candidate for ovarian dysgenesis or sensorineural deafness. Also importantly, genotyping of the unaffected sibling in Family P4 indicated that he was also homozygous across this region, which makes this unlikely to be a true Perrault syndrome locus.
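The shared chromosome 12 interval is simply the intersection of the homozygous regions recorded for the two families in Table 5.3. A minimal Python sketch of that calculation follows; the function name is illustrative.

```python
def overlap(region_a, region_b):
    """Return the (start, end) overlap of two same-chromosome intervals, or None."""
    start = max(region_a[0], region_b[0])
    end = min(region_a[1], region_b[1])
    return (start, end) if start < end else None


# Chromosome 12 homozygous regions listed in Table 5.3.
p4_chr12 = (33_459_460, 37_161_420)
p5_chr12 = (33_778_770, 39_107_740)

shared = overlap(p4_chr12, p5_chr12)
print(shared, shared[1] - shared[0])
```

The intersection is chr12:33,778,770-37,161,420, a span of 3,382,650bp, which matches the common region quoted in the text.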


Family  Homozygous region              Size (bps)    Family  Homozygous region              Size (bps)
P1      chr1:8,201,516-11,694,930      3,493,411     P5      chr17:38,722,930-41,647,900    2,924,976
P1      chr1:16,796,740-19,015,540     2,218,802     P5      chr19:23,653,450-33,188,930    9,535,478
P1      chr2:239,501,200-242,683,200   3,182,016     P5      chr20:33,943,970-36,207,920    2,263,944
P1      chr19:5,716,869-6,947,640      1,230,771     P8      chr1:182,820,200-185,796,900   2,976,720
P1      chr19:6,950,420-16,253,160     9,302,743     P8      chr1:185,801,200-190,821,200   5,020,064
P4      chr1:34,526,930-48,209,410     13,682,480    P8      chr1:191,213,400-196,195,200   4,981,760
P4      chr1:48,217,800-65,631,700     17,413,900    P8      chr1:228,751,200-231,572,000   2,820,816
P4      chr2:121,142,800-133,464,000   12,321,260    P8      chr1:231,698,200-234,216,300   2,518,080
P4      chr4:109,705,800-137,473,600   27,767,750    P8      chr2:177,681,000-180,058,600   2,377,632
P4      chr5:38,751,210-113,347,200    74,595,990    P8      chr3:61,116,070-64,072,900     2,956,824
P4      chr5:156,958,300-168,331,600   11,373,300    P8      chr3:64,451,680-67,534,740     3,083,056
P4      chr7:131,114,400-133,775,600   2,661,160     P8      chr3:89,122,740-95,453,800     6,331,056
P4      chr8:53,338,200-60,205,680     6,867,484     P8      chr3:150,350,600-152,764,800   2,414,192
P4      chr8:77,523,580-96,492,430     18,968,850    P8      chr3:157,630,100-172,002,300   14,372,270
P4      chr9:137,811,500-140,191,300   2,379,760     P8      chr6:27,420,830-29,473,400     2,052,574
P4      chr11:79,720,490-88,206,740    8,486,256     P8      chr6:123,128,100-129,503,000   6,374,904
P4      chr11:88,223,810-90,889,550    2,665,744     P8      chr8:41,531,090-51,927,410     10,396,320
P4      chr12:33,459,460-37,161,420    3,701,956     P8      chr9:8,521,899-11,002,670      2,480,768
P4      chr15:37,601,240-55,431,820    17,830,590    P8      chr9:14,252,700-16,896,360     2,643,661
P4      chr15:77,953,650-91,275,180    13,321,540    P8      chr9:16,902,370-23,298,830     6,396,458
P4      chr18:65,480,890-76,116,030    10,635,140    P8      chr9:26,823,140-30,269,610     3,446,472
P4      chr20:9,795-5,326,665          5,316,870     P8      chr9:35,428,300-38,711,040     3,282,748
P4      chr22:21,275,890-31,851,340    10,575,450    P8      chr10:95,335,420-99,274,450    3,939,032

Table 5.3. All homozygous regions over 2Mb detected in affected individuals from Perrault syndrome families. Family P1 19p13.3-13.11 locus is shown in red, this locus has been divided by one heterozygous call on AutoSNPa which is likely to be a false call. The only homozygous region common to more than one family is on chromosome 12 and is highlighted in yellow.


Family  Homozygous region              Size (bps)    Family  Homozygous region              Size (bps)
P5      chr6:46,382,160-48,428,610     2,046,452     P8      chr10:99,296,490-103,294,100   3,997,616
P5      chr6:155,441,300-157,876,300   2,435,072     P8      chr10:109,433,400-112,534,600  3,101,240
P5      chr6:157,876,500-160,475,700   2,599,248     P8      chr10:112,539,400-115,057,600  2,518,224
P5      chr11:47,323,950-50,374,120    3,050,176     P8      chr11:47,841,420-54,596,900    6,755,480
P5      chr12:33,778,770-39,107,740    5,328,976     P8      chr12:56,411,070-64,487,040    8,075,972
P5      chr12:46,025,820-48,035,040    2,009,224     P8      chr12:64,502,340-67,617,980    3,115,644
P5      chr12:68,020,640-71,174,660    3,154,016     P8      chr13:41,050,420-43,814,650    2,764,224
P5      chr12:71,179,800-73,408,390    2,228,592     P8      chr13:48,072,130-54,106,520    6,034,392
P5      chr17:16,806,980-23,399,590    6,592,614     P8      chr13:54,107,910-56,428,060    2,320,152
P5      chr17:24,926,100-29,266,150    4,340,048     P8      chr13:56,438,760-58,546,810    2,108,048
P5      chr17:32,333,800-38,566,380    6,232,586     P8      chr14:102,555,900-106,174,300  3,618,416

Table 5.3. Continued.


5.3.2 Hypogonadotropic Hypogonadism Syndrome: Family HH1

Initial mapping of family HH1 was carried out using Affymetrix Human Genome-Wide SNP Array 250K.

Analysis of 250K array data identified two areas of homozygosity, on chromosome 3 and chromosome 5 (Figures 5.19 and 5.20):

3p22.1-p21.2 region approximately 10.1Mb

5p15.2 region approximately 2Mb

Figure 5.18. Pedigree of family HH1. For full clinical details please see Chapter 3.


Figure 5.19. Region of homozygosity of approximately 10.1Mb found on chromosome 3 in all affected individuals from Family HH1.


Figure 5.20. Region of homozygosity of approximately 2Mb found on chromosome 5 in all affected individuals from Family HH1.

In order to better establish the exact size of the regions of homozygosity and the start and end points of each locus, selected samples from Family HH1 were mapped again using the Affymetrix Human Genome-Wide SNP Array 6.0. At the same time an additional unaffected family member, who consented to take part in the study after the original arrays had been run, was also genotyped using the 6.0 array (Figure 5.21).

Figure 5.21. Affymetrix 6.0 array genotyping data for selected members of Family HH1 displayed using AutoSNPa. Homozygous region on chromosome 3 can clearly be seen and start and end points defined using the higher density SNP array.


Figure 5.22. Affymetrix 6.0 array genotyping data for selected members of Family HH1 displayed using AutoSNPa. The homozygous region which was detected using the 250K array data on chromosome 5 can clearly be seen in unaffected family member HH1:II:1.

The mapping data from unaffected individual HH1:II:1 showed a large region of homozygosity at the chromosome 5 locus. This means that the 5p15.2 locus is no longer a region of interest in family HH1. Analysis of copy number data for family HH1 did not identify any novel copy number variations within the 3p22.1-p21.2 region.
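The exclusion step applied to the 5p15.2 locus follows a simple rule: a region shared by affected individuals stops being a candidate once an unaffected relative is found to be homozygous across it. The Python sketch below illustrates the rule; the function name and data layout are hypothetical, and regions are matched by label only for simplicity.

```python
def exclude_unaffected_homozygous(shared_in_affected, homozygous_in_unaffected):
    """Drop candidate regions that any unaffected relative is also homozygous for.

    shared_in_affected: region labels homozygous in all affected individuals.
    homozygous_in_unaffected: mapping of unaffected individual -> region labels.
    """
    excluded = set()
    for regions in homozygous_in_unaffected.values():
        excluded |= set(regions)
    return sorted(r for r in shared_in_affected if r not in excluded)


# Family HH1: two regions from the 250K data, one later seen in HH1:II:1.
hh1_shared = ["3p22.1-p21.2", "5p15.2"]
hh1_unaffected = {"HH1:II:1": ["5p15.2"]}

print(exclude_unaffected_homozygous(hh1_shared, hh1_unaffected))
```

Applied to Family HH1, only the chromosome 3 region remains as a candidate, mirroring the conclusion drawn from the 6.0 array data.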

5.4 Discussion

The aim of the work presented in this chapter was to identify disease loci in families affected by Perrault syndrome and the unclassified hypogonadotropic hypogonadism disorder. To address this objective, families with evidence of consanguinity were genotyped using Affymetrix genotyping arrays and autozygosity mapping was used to identify disease loci. The results generated by this work are discussed below.

5.4.1 Identification of a Perrault syndrome locus on Chromosome 19

The largest family in our Perrault cohort is family P1. This consanguineous family was mapped using Affymetrix SNP arrays and a large locus was identified at 19p13.3-13.11. This work has identified a novel locus for Perrault syndrome, which is considered to be a highly heterogeneous disorder. As discussed in Chapter 1, two genes have previously been identified which cause Perrault syndrome. These genes are HSD17B4 and HARS2, both located on chromosome 5. These genes were identified in a single family each, and additional mutations have not been found in any other affected individuals [271]. The identification of a novel locus in family P1 adds further evidence towards the heterogeneous nature of this disorder.

The chromosome 19 locus contains 298 protein coding genes (Ensembl database build Hg19). Among these genes there are several which stand out as interesting candidates for Perrault syndrome. The first candidate gene is the Cyclin-dependent kinase 4 inhibitor D, CDKN2D, which is the human homologue of the Ink4d gene in mice. Ink4d is important in maintaining the post mitotic state of sensory hair cells in the inner ear. In 2003 Chen et al. created a null mouse model with a homozygous gene knockout which displayed a progressive deafness phenotype. The sensory hair cells within the Organ of Corti were shown to re-enter the cell cycle, resulting in apoptosis. The hair cells in the inner ear do not regenerate, which means that apoptosis of these cells leads to progressive hearing loss [276]. Another knockout model of Ink4d showed evidence that this gene may be important for reproductive function. Both male and female null mice were fertile, but male mice had a mild reproductive phenotype with noticeable testicular atrophy and increased apoptosis of germ cells [277]. Despite the apparent importance of this protein for maintenance of hearing and endocrine function in mice, there have so far been no links to human disease. CDKN2D lies within the deafness locus DFNB68, but sequencing of the coding exons in families mapping to this locus has not revealed any mutations [278].

Two other interesting candidate genes lie within our mapped locus, MYO1F and SLC44A2, both of which have previously been linked to human hearing loss. The unconventional myosin MYO1F is part of a large family of proteins, several of which have been found to harbour mutations which cause hearing loss in humans. This was discussed in more detail in Chapter 1. MYO1F has a wide range of tissue expression, including expression within the cochlea, and lies within one of the two possible mapped regions for DFNB15. In the initial DFNB15 family, sequencing of MYO1F did not reveal any mutations [279]. However, in 2008 screening of a large cohort of 450 patients with non-syndromic bilateral deafness revealed six missense variations in the heterozygous state. Of these six mutations, five were evolutionarily conserved, and homology modelling predicted a pathogenic role for three of the mutations. A dominant mode of inheritance could be shown within a Spanish family with a p.D776N mutation [280]. SLC44A2 is an inner ear membrane glycoprotein, the function of which is not fully understood. It has been implicated in antibody induced hearing loss in humans and interacts with Cochlin in the guinea pig inner ear, making it an interesting candidate for hearing loss in our family [281, 282]. So far no pathogenic mutations in SLC44A2 have been described in relation to hearing loss in humans.

Another candidate gene which lies within our region is ZNF653. This zinc finger protein interacts with NR5A1, which encodes the protein Steroidogenic Factor 1 (SF1). Mutations in NR5A1 have been found as a cause of premature ovarian failure as well as 46XY disorders of sexual development [223]. There have been no descriptions of mutations in ZNF653 as a cause of premature ovarian failure, but its interaction with SF1, which is key to the transcriptional regulation of genes in the HPG axis, makes it noteworthy as a candidate gene [283].

Finally, the last gene within the chromosome 19 locus which represents a good candidate for Perrault syndrome is FARSA. FARSA, which encodes the phenylalanyl-tRNA synthetase alpha chain, forms an active tetramer with two alpha and two beta chains. This protein belongs to a family of enzymes whose function is to charge their relevant amino acids to tRNA. FARSA encodes the alpha chain of the cytoplasmic version of this enzyme, which attaches phenylalanine to its cognate tRNA. The Perrault gene HARS2, mentioned above, is another member of this family of enzymes. HARS2 attaches histidine to its cognate tRNA, but this enzymatic reaction takes place within the mitochondria rather than the cytoplasm. Given the relationship between HARS2 and Perrault syndrome, FARSA could be considered an interesting candidate gene, despite the difference in localization of the active proteins [240].

5.4.2 Identification of a Homozygous Deletion within the Perrault syndrome locus

Copy number analysis of the Affymetrix 6.0 array data for family P1 identified a small homozygous deletion within the 19p13.3-13.11 Perrault syndrome locus. This deletion was one of three homozygous deletions detected within the region (data not shown), but was the only one not previously described as a common copy number variant. The presence of the deletion in all affected and no unaffected family members was confirmed using primers to selectively amplify regions of genome around the suspected deletion. The breakpoint was determined using PCR amplification and traditional sequencing techniques. The exact size of the deletion is 1361bp.

The deletion does not directly disrupt any known coding genes. However, the possibility of long distance effects on transcription by disruption of a regulatory region still exists. As such, the fact that this deletion does not encompass coding regions of the genome does not exclude it as a potential pathogenic cause of Perrault syndrome in family P1. An example of the long range effect of deletions on gene function can be seen in the X-linked deafness gene POU3F4. Micro deletions as far as 900Kb distal to this gene are thought to be pathogenic in patients mapping to the DFN3 locus. Molecular analysis of families with X-linked deafness has identified a deletion hotspot on Xq21.1 which contains regulatory regions controlling transcription of POU3F4 [284-286]. Another example of this type of gene disruption was described in a family with juvenile onset progressive cataracts. A t(5;16) translocation was identified which caused separation of the basic region leucine zipper (bZIP) transcription factor, MAF, from a regulatory domain, which was thought to result in altered gene expression [287]. There are many other examples of position effect genes which can cause human disease; a good review is provided by Kleinjan 2005 [288].

Further experiments will be needed to determine the effect, if any, of the homozygous deletion found in family P1. However, the identification of this deletion highlights the importance of copy number variation analysis of array data. The analysis software ChAS from Affymetrix allows the user to look for small variations by changing the analysis settings to have minimal deletion sizes or a minimal number of probes which detect the variation. Due to this technology it is no longer just large chromosomal rearrangements that can be detected but also micro deletions and insertions, expanding the use of microarray experiments in the identification of pathogenic mutations in rare disorders.
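The effect of such analysis settings can be illustrated with a hypothetical filter over copy number segments. This sketch does not use any ChAS interface: the `Segment` class, the `filter_segments` function and the threshold values are assumptions, while the segment sizes and marker counts are those reported for Families P1 and P5 in this chapter.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    sample: str
    cn_state: int
    size_kbp: float
    marker_count: int


def filter_segments(segments, min_markers, min_size_kbp):
    """Keep segments meeting both the marker-count and size thresholds."""
    return [s for s in segments
            if s.marker_count >= min_markers and s.size_kbp >= min_size_kbp]


segments = [
    Segment("P1:III:2", 0, 0.254, 3),     # homozygous deletion, three probes
    Segment("P5:II:1", 1, 188.908, 42),   # heterozygous 7q35 deletion
]

# Hypothetical stringent settings: only the large deletion is reported.
print([s.sample for s in filter_segments(segments, min_markers=25, min_size_kbp=100)])
# Hypothetical relaxed settings: the small three-probe deletion is also reported.
print([s.sample for s in filter_segments(segments, min_markers=3, min_size_kbp=0)])
```

Relaxing the thresholds is what allows a deletion supported by only three probes to be reported, at the cost of a greater risk of false positive calls that then require confirmation by PCR.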

5.4.3 NOBOX Haploinsufficiency in Family P5

Mapping of the affected female P5:II:1 could not identify a locus for Perrault syndrome. Unfortunately the DNA provided by the affected male sibling (P5:II:2) in this family was not of a high enough concentration or quality to be genotyped using the Affymetrix 6.0 array. Without samples from additional family members, identification of a novel disease locus is unlikely. However, it was appropriate for this study to genotype individual P5:II:1 to ascertain whether homozygosity existed at the previously identified 19p13.3-13.11 locus. Potentially this could help to reduce the size of the locus, making identification of the pathogenic mutation more likely. Individual P5:II:1 did not show homozygosity at 19p13.3-13.11. This is unsurprising given that mounting evidence indicates heterogeneity in this rare syndrome. Although mapping could not provide a disease locus, analysis of the copy number data from the Affymetrix 6.0 array identified a potentially interesting heterozygous deletion on chromosome 7q35. Using the Affymetrix ChAS software the deletion was estimated to be approximately 188.9kb in length, deleting coding exons of 12 genes. The most interesting of the genes disrupted by this deletion is NOBOX. As discussed in detail in Chapter 1, large deletions of chromosome 7 have been reported in patients with premature ovarian failure, as well as a range of other phenotypic features [216, 217]. The small deletion detected in the affected female from family P5 encompasses the entire coding region of NOBOX and would lead to haploinsufficiency as seen in the previously reported cases.

However, unlike the homozygous deletion described in family P1, heterozygous deletions such as this can be more difficult to confirm. The log 2 ratio intensities of the probes in the deleted region are variable, which may be due to the heterozygous nature of the deletion or could indicate the possibility of a false positive result. It is also important to note that the density of copy number probes on the 6.0 array within this region is low, and there are no copy number probes that lie within the NOBOX gene itself. Added to this there have also been reports of common copy number variations in this region of chromosome 7 [289-291]. Copy number variation data from approximately 80 other samples that have been genotyped for various diseases within our department did not show a common tendency for drop out of probe signal in this region. Some samples appeared to carry deletions within the same region but none covered the NOBOX gene (Data not shown). This work may have identified a pathogenic cause for premature ovarian failure in this patient; however, further investigations will be needed to confirm this result.
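The difficulty of confirming a heterozygous deletion from probe intensities can be illustrated with a minimal sketch. It assumes the idealised relationship log2(copies/2) for a diploid reference; real array data are noisier and often compressed towards zero. The probe values, the function names and the cutoff of -0.5 are hypothetical and are not ChAS settings.

```python
import math
from statistics import mean

def expected_log2_ratio(copies, reference_copies=2):
    """Idealised log2 ratio for a region present in `copies` copies."""
    if copies == 0:
        return float("-inf")  # homozygous deletion: no signal expected
    return math.log2(copies / reference_copies)

def classify_region(probe_log2_ratios, het_cutoff=-0.5):
    """Crude call from the mean probe log2 ratio (illustrative cutoff only)."""
    avg = mean(probe_log2_ratios)
    if avg <= het_cutoff:
        return "possible heterozygous deletion", avg
    return "no deletion called", avg

# Hypothetical, variable probe values across a candidate region
probes = [-0.9, -0.4, -1.2, -0.7, -0.3]
call, avg = classify_region(probes)
print(expected_log2_ratio(1))   # one copy versus two
print(call, round(avg, 2))
```

With few probes and variable intensities, as described above, the mean can sit close to any chosen cutoff, which is why independent confirmation of a heterozygous call is needed.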


5.4.4 Identification of a locus for Hypogonadotropic Hypogonadism syndrome on Chr3

Mapping of the large consanguineous family HH1 has identified a disease locus of 13.1Mb on 3p22.1-p21.2. This family have a complex phenotype which is described in detail in Chapter 3. The locus contains 217 coding genes, one of which stands out as an interesting candidate gene. TMIE lies within our mapped region and mutations in this gene have previously been described in patients with recessive hearing loss. TMIE is discussed in more detail in Chapter 1. TMIE is the only known deafness gene within the locus, and the locus contains no known hypogonadotropic hypogonadism genes, making TMIE the most promising candidate gene. Analysis of copy number variation data within the mapped region using the ChAS software did not identify any potentially pathogenic variations.

5.5 Conclusions

The aim of this chapter was to use autozygosity mapping to identify novel loci for genes involved in hearing and endocrine development. The major limitation of this approach was that only consanguineous families could be mapped using this technique. This means that for families P2, P3, P6, P9 and P10, this strategy was not appropriate to identify Perrault syndrome loci. Autozygosity mapping is a good technique for mapping smaller families as it requires fewer samples than traditional linkage analysis; however, sample size can still be a limiting factor. For several of the Perrault families in this study an individual locus could not be identified due to a lack of available samples from affected and unaffected individuals. Families P4, P5 and P8 are all small families with only one or two affected individuals and a limited number of samples available for mapping.

The work in this chapter has identified a disease locus for a novel hypogonadotropic hypogonadism syndrome and has also reinforced previous findings that suggest Perrault syndrome is a highly heterogeneous disorder. A novel locus for Perrault syndrome was identified in the largest family from our cohort, family P1. However, none of the other Perrault families from this study mapped to the same chromosome 19 locus.


CHAPTER 6. COPY NUMBER DELETION INVESTIGATIONS


6.0 Investigation of The Novel Copy Number Variants Detected In Families P1 and P5

6.1 Introduction

Copy number analysis of Affymetrix V6.0 array data has identified two families with Perrault syndrome with possible pathogenic deletions. Affected members of family P1 harbour a homozygous intergenic deletion within the 19p13.3-13.11 locus identified in Chapter 5. This deletion does not directly disrupt the coding region of any genes. The first aim of the work in this chapter is to investigate the possibility that this deletion contains a cis-acting regulatory domain which, when deleted, affects the transcription of neighbouring genes. The second copy number variant identified using Affymetrix V6.0 array technology was a heterozygous deletion of 7q35 in affected individual P5:II:1 (see Chapter 5 for more details). This suspected deletion contains a gene called NOBOX. NOBOX haploinsufficiency has been reported as a pathogenic cause of ovarian dysgenesis [216, 217]. For a more detailed description of these reports please refer to Chapter 1. The second aim of the work in this chapter is to confirm the presence of a heterozygous deletion on 7q35 and haploinsufficiency of NOBOX in individual P5:II:1.

Control of gene expression in mammalian species is a complex process involving a cascade of signal transduction pathways, over 2000 transcription factors and hundreds of thousands of transcription factor binding sites [292]. Maintaining unique patterns of expression for individual cell types within an organism requires careful temporo-spatial control and integration of signal transduction pathways. A large amount of transcriptional regulation is controlled from within the promoter region close to the transcription start site, but complex organisms also require a higher level of control. As such distal regulatory sites which can be large distances from the gene being transcribed and lie within non-coding regions of the genome, are also used to control expression (Figure 6.1) [293].


Figure 6.1. One possible model of long range control of transcription. A distal regulatory element (yellow oval) located many kilobases upstream of the transcription start site is brought close to the RNA polymerase II holoenzyme by looping of intervening genomic DNA [292].

There are several different types of regulatory domains that can control gene expression from large distances (enhancers, silencers and insulators), all of which are important for maintaining the normal transcriptome of any given cell type [292, 293]. Deletions which encompass distal regulatory regions can result in pathogenic disorders. A good example of this is the X-linked deafness gene POU3F4 (see Chapter 1 for more details).

Abnormal differential gene expression can be used as an indication of transcriptional mis-regulation. Quantitative real time PCR (qPCR) is a powerful technique for investigation of differential gene expression between samples. The basis of qPCR is that the fluorescent signal generated by the accumulating PCR product is monitored in real time over a set number of PCR cycles. The cycle at which the fluorescent signal rises above a pre-defined threshold, set within the exponential phase of amplification, is known as the Ct value. The smaller the Ct value, the greater the amount of initial target. This technique can be carried out using a variety of different chemistries, such as SYBR green or Taqman probes. It can be used to quantify gene expression (by looking at mRNA levels after reverse transcription of RNA samples to cDNA), or genomic copy number. An alternative method for evaluating gene expression on a more global scale is to use micro-array technology. Expression micro-arrays have the power to analyse thousands of genes simultaneously using oligonucleotide probes to capture targeted regions of cDNA sequences.
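The inverse relationship between Ct and the amount of starting template can be sketched under an idealised model in which product grows as N0 x E^cycle, with E = 2 for perfect doubling. The threshold and template quantities below are arbitrary illustrative numbers, and the function name is hypothetical.

```python
import math

def ct_for_template(n0, threshold, efficiency=2.0):
    """Cycle at which n0 * efficiency**cycle reaches the threshold."""
    return math.log(threshold / n0) / math.log(efficiency)

threshold = 1e10  # arbitrary detection threshold (illustrative)
for n0 in (1e5, 1e4, 1e3):
    # Each ten-fold dilution delays the Ct by the same number of cycles
    print(f"{n0:.0e} starting copies -> Ct {ct_for_template(n0, threshold):.2f}")

# Expected standard curve slope (Ct against log10 of quantity) at E = 2
print(-1 / math.log10(2.0))
```

Under this model a ten-fold dilution shifts the Ct by log2(10), roughly 3.32 cycles, which is why a standard curve slope close to -3.3 indicates an amplification efficiency close to 100%.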

6.2 Aim

The aim of this chapter is to further investigate the deletions identified from copy number analysis of the Affymetrix V6.0 array data presented in Chapter 5. Experiments are designed to assess the pathogenic potential of a 1361bp homozygous deletion on 19p13.2 detected in family P1, and to confirm the presence of an approximately 188.9kb heterozygous deletion of 7q35 identified in family P5.

6.3 Results

6.3.1 Investigation of the Chr19 Homozygous Deletion detected in Family P1

Sequencing of Genes Located Proximal to the Homozygous Deletion

The homozygous deletion detected in family P1 is located at chr19:10,701,423-10,702,784 (Hg19). This is an intergenic region of the genome which lies between the genes AP1M2 and SLC44A2 (Fig 6.2.).

[Figure 6.2: schematic showing the 1361bp homozygous deletion and the neighbouring genes KRI1, CDKN2D, AP1M2 and SLC44A2]

Fig 6.2. Location of chromosome 19 homozygous deletion relative to surrounding genes.
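As a small check on the coordinates quoted for the deletion, the span can be computed directly from the coordinate string. This is only a sketch: the helper name is hypothetical, and the convention used here (end minus start) is an assumption chosen because it reproduces the reported 1361bp; a fully inclusive 1-based count would give one base more.

```python
def parse_region(region):
    """Parse a string such as 'chr19:10,701,423-10,702,784'."""
    chrom, span = region.split(":")
    start, end = (int(x.replace(",", "")) for x in span.split("-"))
    return chrom, start, end

chrom, start, end = parse_region("chr19:10,701,423-10,702,784")
deletion_size = end - start  # assumed convention: end minus start
print(chrom, deletion_size)  # chr19 1361
```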

Upon initial identification of this deletion, sequencing of four neighbouring genes was carried out in Perrault patients from families within our cohort. Collection of samples for this project was an ongoing process and at the time that this work was completed samples were only available for families P2, P3, P8 and P9. All coding exons and exon-intron boundaries were amplified using standard PCR techniques, with primers designed using Primer 3 (http://frodo.wi.mit.edu/primer3/). Amplified exons were sequenced using Sanger sequencing technology. For full details of the methods used please see Chapter 2. For a full list of primer sequences please see Appendix Table 10.4. No pathogenic mutations were identified (Table 6.1).

Sample     Sequencing carried out for all coding exons of AP1M2, CDKN2D, KRI1 and SLC44A2?   Mutations detected?
P9:II:1    Y                                                                                  N
P4:II:1    Y                                                                                  N
P8:II:1    Y                                                                                  N
P3:II:1    Y                                                                                  N
P3:II:2    Y                                                                                  N
P2:II:1    Y                                                                                  N
P2:II:2    Y                                                                                  N

Table 6.1. List of patients from Perrault syndrome cohort that were sequenced for AP1M2, CDKN2D, KRI1 and SLC44A2 mutations.


Multi-species Alignment of the 19p13.2 Deleted Region

Using the Ensembl web browser a multi-species alignment of the deleted region was carried out. Alignment of the human sequence to the corresponding region of four other species showed that the region was not highly conserved. Some small sections of conserved sequence were identified and these are shown in blue in Figure 6.3. A region was considered to be conserved if greater than 50% of bases in the alignment were the same. Details of the four species used in the alignment and the aligned regions are shown in Table 6.2. The same alignments were repeated using the ClustalWeb alignment tool and a pairwise alignment score was calculated for each species compared to the human sequence. The pairwise alignment score for the chimpanzee sequence was 98%, for mouse 54%, for cow 53% and for dog 51%.

Species                          Aligned region
Homo sapiens (Human)             chromosome:GRCh37:19:10701423:10702784:1
Pan troglodytes (Chimpanzee)     chromosome:CHIMP2.1:19:10856157:10857517:1
Mus musculus (Mouse)             chromosome:NCBIM37:9:21117453:21118530:1
Bos taurus (Cow)                 chromosome:UMD3.1:7:16317109:16318193:1
Canis lupus familiaris (Dog)     chromosome:BROADD2:20:53595715:53596865:-1

Table 6.2. Details of multi species alignment for the chromosome 19 homozygous region deleted in Family P1.
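The conservation criterion described above (more than 50% of aligned bases identical) can be sketched as a simple percent identity calculation over two rows of an alignment. The exact scoring used by the Ensembl and Clustal tools is not reproduced here; excluding gapped columns from the comparison is an assumption of this sketch, and the function names are hypothetical. The example rows are short fragments taken from the human and cow sequences in Figure 6.3.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical bases over columns where neither row is a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are not scored
        compared += 1
        identical += (a == b)
    return 100.0 * identical / compared if compared else 0.0

def is_conserved(seq_a, seq_b, cutoff=50.0):
    """Apply the 'greater than 50% identical' rule to one aligned block."""
    return percent_identity(seq_a, seq_b) > cutoff

human = "CGTGCCATCTTATAAAGGTAGAAGCA"
cow = "CATGTCCACTCACAGAGAAAGAAGCA"
print(round(percent_identity(human, cow), 1), is_conserved(human, cow))
```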


Homo sapiens TTCTCCTGCCTCAGCCTCCCAAGTAGCTGGGACTACAGGTGCACTCCACCATGCCCAGCT Pan troglodytes TTCTCCTGCCTCAGCCTCCCAAGTAGCTGGGACTACAGGTGCGCTCCATCATGCCCAGCT Mus musculus ------Bos taurus ------Canis lupus familiaris ------

Homo sapiens AATTTTTGTATTTTTATTAGAGATGGGGTTTCACCATGTTGACCAGGATGCTGTCGAACT Pan troglodytes AATTTTTGTATTTTTATTAGAGATGGGGTTTCACCATGTTGACCAGGATGGTGTCGAACT Mus musculus ------Bos taurus ------Canis lupus familiaris ------

Homo sapiens CTTGACCTCGTGATCCGCCCACCTCAGCCTCCCAAAGTGCGGTAGTTTTGAGGACAGGGC Pan troglodytes CTTGACCTCGTGATCCGCCCGCCTCAGCCTCCCAAAGTGCGGTAGTTTTGAGGACAGGGC Mus musculus ------Bos taurus ------Canis lupus familiaris ------

Homo sapiens AGAAGGAAAGGGAAAGTAAACCT------CAGCCTTGCTTAGA-ACCACCTTGCTTGGC- Pan troglodytes AGAAGGAAAAGGAAAGTAAACCT------CAGCCTTGCTTAGA-ACCACCTTGCTTGGC- Mus musculus ------Bos taurus ---AGGAGAGGGAAAATAAATCT------CAGGCT-GGTTGGA-GTGATCTTGCTCTACA Canis lupus familiaris ---AGGGGAAGGAAAGGAGCACTCCCTTCCTGCCT-GACTGGAACTGATCCTACTTAAGA

Homo sapiens ------Pan troglodytes ------Mus musculus ------Bos taurus GAGAGTCCCTTGGACAGCAAGATCAAACCAGTCAATCCTAAAGGAAATCAGCCTGAATAT Canis lupus familiaris GAGGG------

Homo sapiens ------Pan troglodytes ------Mus musculus ------Bos taurus TTTTTGGAAGGACTGATCCTGAAGCTGAAGCTCCAATACTTTGCCCACCTGATGTGAAGA Canis lupus familiaris ------

Homo sapiens ------Pan troglodytes ------Mus musculus ------Bos taurus GCTGACTCATTAGACAAACCCTGATGCTGGGCGAGATTGAGGGCAGGAGGAGAAGAGGGT Canis lupus familiaris ------AGAG------

Homo sapiens ------Pan troglodytes ------Mus musculus ------Bos taurus GACAGAGGATGAGATGGTTGGATGGCATCACCGACTCAATGGACATGAGACTGAGCAAAC Canis lupus familiaris ------

Homo sapiens ------Pan troglodytes ------Mus musculus ------Bos taurus TCCGGGAGATAGTGAAGGACAGGGAAGCCTGGTGTGCTTCAGTCCATGGGGTGGCAAAGA Canis lupus familiaris ------

Homo sapiens ------TGAGGGAGGGAGATGTCCTGCAGGGCCTG Pan troglodytes ------TGAGGGAGGGAGATGTCCTGCAGGGCCTG Mus musculus ------Bos taurus GTAGCACGCGGCTGAGCAGCTGAACAACCACCGAGGGAGGGAGA-ATCCTGGAAGGCCAC Canis lupus familiaris ------GCTCAA--AGGGCCGGGGGTGGGGGT-GGGGTGGGGGGGATG

Homo sapiens CA-GGGAACAAGACTCAGCCTACCCAGACCTCACTTGTCCTTTCCCCAAACCCTGCCCAG Pan troglodytes CA-GGGAACAAGACTCAGCCTACCCAGACCTCACCTGTCCTTTACCCAAACCCTGCCCAG Mus musculus ------Bos taurus AA-GGGGGCGACATTCAGCTCTCCCAGGCCTAGTGCTTGG-TCCGCCAAACCCAGCCCGG Canis lupus familiaris GGGGGGGTGGACACTCAGCTCTCCCAGGCCTCATACTTCC-TTCTGCAAACCCAGCCCCA

Homo sapiens CCCTTTATCAGCCCCTGTAGGTGGGG------A-CTTA------G Pan troglodytes CCCTTTATCAGCCCCTGTAGGTGGGG------A-CTTA------G Mus musculus ------Bos taurus -TCTTTATCAGCCCCTGAGGGTGAGATGTGCGTCCCCTCAGGGTACTTGAGGAAGGGAGG Canis lupus familiaris CACTCTATCAGCCTCTGGGGATGGGGAGGGCGTCCCCACAGGG-A-CTGAGGGAGGGAGG

Homo sapiens AG-----AACAGCAGCTATCAGGATCATCCACTTGCCTC------Pan troglodytes AG-----AACAGCAGCTATCAGGATCATCCACTTGCCTC------Mus musculus ------Bos taurus AG-----GACAGCAGCTGTCAGAGCCATCCACTGC-TT-CTGGC-GTCATTTGCTGAGCA Canis lupus familiaris CGAGGGTGGCAGCTGCTATCAGAACTACCCACTCCCCTCCTCGCTGGCATCTAATGGGGA

Homo sapiens ------TCCACCTGAAATTAACATCAACATTGACAGCAGCACTGCACTTTA Pan troglodytes ------TCCACCTGAAATTAACATCAACATTGACAGCAGCACTGCACTTTA Mus musculus ------Bos taurus ------Canis lupus familiaris TAATATCACAGCAAC------GAAAGCAGAAATGCACTC--

Homo sapiens ATTAAAACATTATCATCTGGGCCAGGAGCAATGTTTCATGCCTGT-ATCCCAGCACTTTG Pan troglodytes ATTAAAACATTATCATCTGGGCCAGGAGCAGTGTTTCATGCCTGT-ATCCCAGCACTTTG Mus musculus ------GGCGGTGGCGCACGCCTTTAATCCCAGCACT-CG Bos taurus ------Canis lupus familiaris ------

Homo sapiens GGAGGCCAAGGTGGACAGAGCA-----CCTGAGGCCAG------GAGTTTG Pan troglodytes GGAGGCCAACGTGGACAGAGCA-----CCTGAGGCCAG------GAGTTTG Mus musculus GGAGGCAGAGGCAGGCGAATTTCTGAGTTCGAGGCCAGCCTGGTCTACAGAGTGAGTTCC Bos taurus ------Canis lupus familiaris ------

Homo sapiens AGACCAGCCTGGCCAACATGGCGAAACCCTGTCTCCACTAAAAATACAAAAATTAGCCGG Pan troglodytes AGACCAGCCTGGCCAACATGGCGAAACCCTGTCTCCACTAAAAATACAAAAATTAGCCGG Mus musculus AGGACAGCCAGGGCTATACAGAGAAACCCTGTCTCGAAAAACCATAAAAAAA------Bos taurus ------Canis lupus familiaris ------

Homo sapiens GCGTGGTGACGCATGCCTGTAATTCCAGCTACTTGGGAGGCTGAGGCAGAAGAATCACTT Pan troglodytes GCGTGGTGACGCACGCCTGTAATTCCAGCTACTTGGGAGGCTGAGGCAGAAGAATCACTT Mus musculus ------Bos taurus ------Canis lupus familiaris ------

Homo sapiens GAACTGGGGAGGCGGAGGTTATAGTGAGCCGAGATGGCACCACTGCACTCCAGCCTGGGT Pan troglodytes GAACTGGGGAGGCGGAGGTTATAGTGAGCCGAGATGGCACCACTGCACTCCAGCCTGGGT Mus musculus ------Bos taurus ------Canis lupus familiaris ------

Homo sapiens GACAGAGTGAGACTCCGTCTCAAGAAAATAAATAAATAGAATGAAAAATAAAAAGTTATC Pan troglodytes GACAGAGTGAGACTCCATCTCAAGAAAATAAATAAATAGAATGAAAAATAAAAAGTTATC Mus musculus ------AAAAAAAAAAAAAAAAATCTAGAAGAAA-GGTTGAG Bos taurus ------Canis lupus familiaris ------

Homo sapiens ------Pan troglodytes ------Mus musculus ATAGCCCGGATGGCCTGGAGCTCATTGTGTGGTCCAGACTGGCCTCAGCCTTACAAGTTC Bos taurus ------Canis lupus familiaris ------

Homo sapiens ------Pan troglodytes ------Mus musculus TGTGATTAGTCACGAACCAACATGTCTGACTTTGTTTTCCGTTTTCTAATTTCATTCTTT Bos taurus ------Canis lupus familiaris ------

Homo sapiens ------Pan troglodytes ------Mus musculus TTTTTTTAATATCCCAGGCTACCCTTAAACTCAAGATCATTCTGCTTCCCCATTGCTGGG Bos taurus ------Canis lupus familiaris ------

Homo sapiens ------Pan troglodytes ------Mus musculus ATACCGTTTTGCGGAAGGTGATGGGGAAAAAAACCCACTCTAAACCTTAGCCTAGCCATC Bos taurus ------Canis lupus familiaris ------

Homo sapiens ------Pan troglodytes ------Mus musculus TACATGACGCTCCGCTGTGGAAGAAACGTTTCCTGTGCAGCCGCCAGGGGGCGACAAAGC Bos taurus ------Canis lupus familiaris ------

Homo sapiens ------Pan troglodytes ------Mus musculus CCGTTTGGGCATCGGTGCGGGCTCTGACCTTTAACAGGCCGGGAGTCCAGATGCAGCTAT Bos taurus ------Canis lupus familiaris ------

Homo sapiens ------ATCTG----GCACAAGTCTCC Pan troglodytes ------ATCTG----GCACAAGTCTCC Mus musculus GAGGATCTCCCAAGGCAATTGTCCTAAAGCAGAGGGCCTGTGTGAAGAGCTT------Bos taurus ------Canis lupus familiaris ------

Homo sapiens TT------Pan troglodytes TT------Mus musculus AT------Bos taurus ------Canis lupus familiaris --CTTGAGCATGTCTCACCCGATACTCCTTTCTGTCTTCAATTTTTTTTTCAAGTTAGTT

Homo sapiens ------Pan troglodytes ------Mus musculus ------Bos taurus ------Canis lupus familiaris ATTATTATTATTATTTTTTTTTTTTTTGCTGTTCAACTCTGGTCTACCTATCTTTGGTGT

Homo sapiens ------CGTGCCATCTTATAAAGGTAGAAGCA Pan troglodytes ------CGTGCCATCTTATAAAGGTAGAAGCA Mus musculus ------CA---GCCACCACCTGGGAGGAAGCA Bos taurus ------CATGTCCACTCACAGAGAAAGAAGCA Canis lupus familiaris TCCTGAGAATTCAGGACCCGAGGGGAAAACATTTCATGTCTTCCTACCTTGGAAGAAGCA

Homo sapiens GGGACACAGAGAAGTGAAGGCACCTGCCCAAGCCCACGGCCCCAGAGCCAGGAGGCAGTT Pan troglodytes GGGACACAGAGAAGTGAAGGCACCTGCCCAAGCCCACGGCCCCAGAGCCAGGAGGCAGTT Mus musculus GA-GCACAGACAGT------CTCAGGGCCA------CACAGCCGGGTTA---TC Bos taurus GAGGCACAGAGAGTTGAAGCCACTTGCCCAAGGCCA------CACAA----G----AGAT Canis lupus familiaris GAGGCACAGAAAGGTGAAGCCATTCGCCTGGGGTTG------CACAGCTGGGAGGCGGCT

Homo sapiens GACCCAGAATTC---GAATCTGAGTCCCGGGATTTAAGAATCTG------Pan troglodytes GATCCAGAATTC---GAATCTGAGTCCCGGGATTTAAGAATCTG------Mus musculus AGGCCACAATTCTCTTATTCTG-GTTGT----TTTGACAGT---CCTAGGGGGTGGAACG Bos taurus GGGCCAGAATTT---GAACCTGAGTTGTGGAGTTTGAGAGCTGA------Canis lupus familiaris GGGCCAGAATTT---GAACCTGAGTCCTCATGTTCAAGAGTTCA------

Homo sapiens ------GTCCCAGCCTGCTTCC Pan troglodytes ------GTCCCAGCCTGCTTCC Mus musculus CCCACCTTTGCTGGTGACAGGCAAGCCCTGTGCCCCTGATTACATTCCCAGCCCTT---- Bos taurus ------GTCCCAGTCCTCATCC Canis lupus familiaris ------ACCCAAACCCACTCCC

Homo sapiens G-CCTGCTTTGC------TAT--AT-TGCTGGATGCAAATATTACTATTAGTAACA Pan troglodytes G-CCTGCTTTGC------TAT--AT-TGCTGGATGCAAATATTACTATTAGTAACA Mus musculus -AATGTTTTTG-TCTGGGGCCTCAT--GTATAGCACA------Bos taurus A-ATGGCTCTGT------TATGTGT--AACACATACAAGCATTACTGTTACTAACA Canis lupus familiaris A-ACTGCTCTGC------CCCGTAT-TGCCAGATACGAGCATTACTATTATCAACA

Homo sapiens CTAACTCTGGCTTAAATGGTGCTCACCATAAGTGAGGGTG------Pan troglodytes CTAACTCTGGCTTAAATGGTGCTCACCATAAGTGAGGGTG------Mus musculus ------GGCT------Bos taurus ATAGCTAACCCCTGTGTGGTGCTCACTTT------ACCCAGGGGCTGGTTTCCTC Canis lupus familiaris ATAGGCCCTGCTGGTGTGGTGCCCATTAT------

Homo sapiens ------Pan troglodytes ------Mus musculus ------Bos taurus TATGACTCAGTTGGTAAAGAATCTGCCTGAAATGCAGGAGGCATGGGAGTACACGTTCAT Canis lupus familiaris ------

Homo sapiens ------Pan troglodytes ------Mus musculus ------Bos taurus CCCTGGGTGGGGAAGATTCCCCTGGAGGAGGAAACGGCAACCCACTCCAGTATTCATGCC Canis lupus familiaris ------

Homo sapiens ------Pan troglodytes ------Mus musculus ------Bos taurus TGGGCAATCCCATGGACAGAGGAGCTTTGTGGGCTACAGCCCATGGGGTCGCAAAGAGTC Canis lupus familiaris ------

Homo sapiens ------GCCCTAAATGTCG Pan troglodytes ------GCCCTAAATGTCG Mus musculus ------AACCTTAAATTC- Bos taurus AGACACGTCTGAGAGGCTAAGCACACGTTCATGCACACCCAGGGGCTACCCTGAAAGTCT Canis lupus familiaris ------GCCCTAAATGTCT

Homo sapiens CCCATGCAG------Pan troglodytes CCCATGCAG------Mus musculus ------CCCCTTTTTTC------TCTTTTTT Bos taurus TCCAAGTGGGA------Canis lupus familiaris CCCCTGTGGGAATGCTTCCGGAAACAATGCCTTTTTTTTGTTTGTCTTAAGATTTTTTTT

Homo sapiens ------Pan troglodytes ------Mus musculus TTCCTTTTAAATCTTTTTAAAGATTCATTTATTTA------Bos taurus ------Canis lupus familiaris TTAAAGATTAATTTTTTTTAATTTTTATTTATTTATGATAGTCACAGAGAGAGAGAGAGG

Homo sapiens ------Pan troglodytes ------Mus musculus ------Bos taurus ------Canis lupus familiaris CAGAGACACAAGCAGAGGGAGAAGCAGGCTCCATGCACCGGGAGCCCGACGTGGGATTCG

Homo sapiens ------Pan troglodytes ------Mus musculus ------Bos taurus ------Canis lupus familiaris ATCCCAGGTCTCCAGGATTGCGCCCTGGGCCAAAGGCAGGCGCTAAACTGCTGCGCCACT

Homo sapiens ------Pan troglodytes ------Mus musculus ------Bos taurus ------Canis lupus familiaris CAGGGATCCCCTGTCTTAAGATTTTATTTATTTATTGGAGACACACAGAGAGAGGCAGAG

Homo sapiens ------Pan troglodytes ------Mus musculus ------Bos taurus ------Canis lupus familiaris ACATAGGCAGAGGGAGAACAGGCTCCTAGCAAGGAGCCCAATGTGGGACTCGATCCCAGG

Homo sapiens ------Pan troglodytes ------Mus musculus ------TTATATGTAAGTACACTGTAGCTGTC Bos taurus ------Canis lupus familiaris ACTCCAGGATCATGCCCTGAGCCCAAGGCAGACG------

Homo sapiens ------Pan troglodytes ------Mus musculus TTCAGACAATCCAGAAGAGGGAGTCAGATCTTTTTACAGATGGTTGTGAGCCACCATGTG Bos taurus ------Canis lupus familiaris ------

Homo sapiens ------GAA Pan troglodytes ------GAA Mus musculus GTTGCTGGGATTTAAACTCCGAACCTTCGGAAGAGCAGTCGGGTGCTCTTACCCACTGAG Bos taurus ------Canis lupus familiaris ------CTCAACCACTGAG

Homo sapiens ------TCCTCAAGACAGCCCCTTGAATTACAGAAGCCTATTTT-TTTTTTTTA Pan troglodytes ------TCCTCAAGACAGCCCCTTGAATTAGGGAAGCCTATTATCTTTTTTTTA Mus musculus ------CCATCTCACCAGCCC------Bos taurus ------GGTAATGTTTTGAGTTCAGGAACCTGATGTTG------Canis lupus familiaris CCACCCAGGCGTCCCTGGAAACAATGCTTTGAGTTAGGGAGGCCTATCATC------

Homo sapiens TTTATTTGAGACAAGGACTTGCTTTGTCGCCCAGGCTGCGGTGCAGTGGTGCAAACACGG Pan troglodytes TTTATTTGAGACAAGGACTTGCTTTGTCGCCCAGGCTGTGGTGCAGTGGTGCAAACACGG Mus musculus ------Bos taurus ------Canis lupus familiaris ------

Homo sapiens CTCACTGCAGCCTCAACCTTCTGAACTCAAGAGATCCTCTCACCTAGGCCTCTCCAGTAG Pan troglodytes CTCACTGCAGCCTCAACCTTCTGAACTCAAGAGATCCTCTCACCTAGGCCTCTCCAGTAG Mus musculus ------Bos taurus ------Canis lupus familiaris ------

Homo sapiens CTGGGACTACAGGTGAGCAACACCACTCCCAGCTAATTTTTAAATTTTTTTTTTTTTTTT Pan troglodytes CTGGGACTACAGGTGAGCACCACCACTCCCAGCTAATTTTTAAATTTTCTTTTTTTT--T Mus musculus ------TCTTTTTTTCTTTTT Bos taurus ------Canis lupus familiaris ------

Homo sapiens TTTTGAGACAGAGTCTTGCTCTGTCGCCCAGGCTGGAGTGCAGTGGCGCAATCTCGGCTC Pan troglodytes TTTTTAGATGGAGTCTTGCTCTGTCACCCAGGCTGGAGTGCAGTGGCGCAATCTTGGCTC Mus musculus TTTCAAGACAGGGTTTCTCTGTATAGCCCTGGCTGT------Bos taurus ------Canis lupus familiaris ------

Homo sapiens ACTGCAACTTCCGCCTCACAGGTTCACACCCT Pan troglodytes ACTGCAACTTCCGCCTCACGGGTTCACACCCT Mus musculus ------Bos taurus ------Canis lupus familiaris ------

Figure 6.3. Multi species alignment of the chromosome 19 homozygous deletion identified in Family P1. The alignment was carried out using the Ensembl genome browser (http://www.ensembl.org/index.html). Regions of conserved sequence are highlighted in blue.


SYBR Green analysis of Genes Surrounding Homozygous Deletion in Family P1

RNA was extracted from lymphocyte primary cell cultures for two members of Family P1, affected individual P1:III:3 and unaffected individual P1:II:1. Cells were cultured to approximately 10 million cells and pelleted. RNA was extracted from the cell pellets and converted to cDNA using reverse transcription PCR. For a full description of the methods and materials used see Chapter 2. Lymphocytes were chosen for cell culture as they are practical to work with and are easily available. However, it is important to note when analysing results that lymphocytes may not have the same pattern of gene expression as tissues which are more relevant to the disease phenotype. They could, however, give an indication of differential expression if regulation of transcription has been disrupted.

Experiments were carried out to assess the efficiency of SYBR green reactions for each of the gene assays. Two experiments were carried out with different dilutions of control cDNA. The cDNA sample used for all validation experiments was P1:II:1 (265ng/ul). Figure 6.4 shows an example of the amplification curve and standard curve for the CDKN2D assay. Data tables and plots for all gene assay validation experiments can be seen in Appendix Tables 10.5-10.16 and Appendix Figures 10.2-10.


Figure 6.4 Amplification plot and standard curve of CDKN2D dilution series using ABI SDS software. Red line on amplification plot indicates Ct threshold. Slope of curve is -3.4(1d.p.)

Throughout the validation experiments amplification was consistently seen in non-template control (NTC) samples for GAPDH primers. These primers were designed and purchased from Primer Design Ltd and the sequence of the primers was not provided. The NTC amplification occurs at approximately 35 cycles, indicating the possibility that this amplification was due to primer-dimer formation. In order to assess this, PCR amplification of a range of human embryonic cDNA samples for GAPDH using the Primer Design assay was carried out (Figure 6.5). This amplification was carried out for 35 cycles on the Veriti Thermal cycler, using 10ng of cDNA as starting material. In Figure 6.5 weak bands which are smaller than 100bp can be seen in every sample amplified. These bands represent primer-dimer formation at 35 cycles. It is therefore likely that the amplification seen in the NTC samples for GAPDH is due to primer-dimer formation and not contamination.


Figure 6.5. Electrophoresis gel showing GAPDH amplification in a range of embryonic cDNA samples. Weak primer-dimer bands of less than 100bp can be seen in all samples amplified.

Primer-dimer formation can sometimes cause errors in the quantification of mRNA levels using qPCR. However, it is considered unlikely that this will be the case for these experiments given that the bands detected in Figure 6.5 are relatively weak at 35 cycles. In addition, a significant peak representing primer-dimer formation was not detected on the dissociation curve for these GAPDH primers (Figure 6.6). Efficient amplification with the AP1M2 primers could not be achieved during assay validation. It was therefore not possible to assess relative expression of AP1M2 transcripts. Please see Appendix Tables 10.10 and 10.16 for data and Appendix Figures 10.3 and 10.5 for amplification plots. Primer efficiencies for the remaining assays were calculated from standard curve slopes using the Agilent qPCR slope to efficiency calculator (http://www.genomics.agilent.com/CollectionSubpage.aspx?PageType=Tool&SubPageType=ToolBioCalculator&PageID=14#2).


Gene       Experiment   Slope         Assay efficiency
GAPDH      1            -3.209548     104.913%
CDKN2D     1            -3.3489196    98.886%
KRI1       1            -3.3639436    98.276%
SLC44A2    1            -3.3459854    99.006%
AP1M2      1            N/A           N/A
GAPDH      2            -3.3304083    99.647%
CDKN2D     2            -3.4367142    95.423%
KRI1       2            -3.4832575    93.681%
SLC44A2    2            -3.412971     96.336%
AP1M2      2            N/A           N/A

Table 6.3. Efficiency of individual primer assays as calculated from validation experiments.

Gene       Average assay efficiency (1 d.p.)
GAPDH      102.3%
CDKN2D     97.2%
KRI1       96.0%
SLC44A2    97.7%

Table 6.4. Average assay efficiency based on validation experiments.
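The efficiencies in Tables 6.3 and 6.4 are consistent with the standard slope to efficiency relationship, efficiency = 10^(-1/slope) - 1. The sketch below assumes that this is the relationship implemented by the online calculator, which the text does not state explicitly; the function and variable names are illustrative.

```python
from statistics import mean

def efficiency_percent(slope):
    """Amplification efficiency (%) from a standard curve slope."""
    return (10 ** (-1.0 / slope) - 1.0) * 100.0

# Standard curve slopes from the two validation experiments (Table 6.3)
slopes = {
    "GAPDH": [-3.209548, -3.3304083],
    "CDKN2D": [-3.3489196, -3.4367142],
    "KRI1": [-3.3639436, -3.4832575],
    "SLC44A2": [-3.3459854, -3.412971],
}

averages = {gene: mean(efficiency_percent(s) for s in values)
            for gene, values in slopes.items()}

for gene, avg in averages.items():
    print(f"{gene}: {avg:.1f}%")
```

A slope of approximately -3.32 corresponds to 100% efficiency, so slopes shallower than this give values above 100% (as for GAPDH in experiment 1) and steeper slopes give values below 100%.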

In order to use the comparative Ct method, the primer efficiencies of the endogenous control and the genes of interest need to be similar. The maximum difference in average efficiency was between GAPDH and KRI1 at 6.3%. The minimum difference in average efficiency was between GAPDH and SLC44A2 at 4.6%. These efficiencies were considered to be similar enough to continue with the comparative Ct method. An alternative method of calculating expression differences between two sets of data is the Pfaffl method. The difference between the Pfaffl method and the comparative Ct method is that the Pfaffl method incorporates a correction for differences in PCR efficiency [294]. Because of the range of differences in efficiency between our genes of interest and the housekeeping gene GAPDH, the Pfaffl method was also used to evaluate the data generated by these experiments.


Expression Analysis of Family P1 using SYBR Green based qPCR

Figure 6.6. Dissociation curves and amplification plots for GAPDH (A), CDKN2D (B), KRI1 (C) and SLC44A2 (D). Three DNA samples were analysed using SYBR green, affected individual P1:III:3, unaffected individual P1:II:1 and control biobank DNA purchased from Primer design Ltd. All subsequent analysis was carried out on samples from Family P1.

Sample                     GAPDH Ct value   CDKN2D Ct value   KRI1 Ct value   SLC44A2 Ct value
P1:III:3                   17.34            25.07             21.95           25.02
P1:III:3                   17.44            25.11             21.95           24.96
P1:III:3                   17.49            25.05             21.90           25.00
Average Ct for P1:III:3    17.42 (±0.08)    25.08 (±0.03)     21.93 (±0.03)   24.99 (±0.03)
P1:II:1                    17.50            25.98             22.93           25.10
P1:II:1                    17.46            25.92             22.85           25.04
P1:II:1                    17.30            25.96             22.84           24.97
Average Ct for P1:II:1     17.42 (±0.11)    25.95 (±0.03)     22.87 (±0.05)   25.04 (±0.07)

Table 6.5. Triplicate Ct values for individuals from Family P1.


Comparative Ct Method Calculations

2^-(∆∆Ct) = FOLD CHANGE IN AFFECTED INDIVIDUAL RELATIVE TO UNAFFECTED INDIVIDUAL

∆Ct = Ct(gene of interest) – Ct(endogenous reference gene)
∆∆Ct = ∆Ct(sample A) – ∆Ct(sample B), where sample A is the affected individual and sample B is the unaffected individual.

Sample                    ∆Ct CDKN2D   ∆Ct KRI1   ∆Ct SLC44A2
P1:III:3                  7.65         4.51       7.57
P1:II:1                   8.53         5.45       7.62
∆∆Ct                      -0.88        -0.94      -0.05
Fold change 2^-(∆∆Ct)     1.84         1.91       1.04

Table 6.6. Results of the Comparative Ct method analysis of SYBR green based qPCR data.
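The comparative Ct calculation can be reproduced from the triplicate Ct values in Table 6.5. The sketch below is illustrative rather than the software actually used; the function names are hypothetical, and because it works from unrounded means the final digit can differ slightly from the rounded values in Table 6.6.

```python
from statistics import mean

# Triplicate Ct values from Table 6.5
ct = {
    "P1:III:3": {  # affected individual
        "GAPDH": [17.34, 17.44, 17.49],
        "CDKN2D": [25.07, 25.11, 25.05],
        "KRI1": [21.95, 21.95, 21.90],
        "SLC44A2": [25.02, 24.96, 25.00],
    },
    "P1:II:1": {  # unaffected individual
        "GAPDH": [17.50, 17.46, 17.30],
        "CDKN2D": [25.98, 25.92, 25.96],
        "KRI1": [22.93, 22.85, 22.84],
        "SLC44A2": [25.10, 25.04, 24.97],
    },
}

def delta_ct(sample, gene, reference="GAPDH"):
    """Mean Ct of the gene of interest minus mean Ct of the reference gene."""
    return mean(ct[sample][gene]) - mean(ct[sample][reference])

def fold_change(gene, affected="P1:III:3", unaffected="P1:II:1"):
    """2^-(ddCt): expression in the affected relative to the unaffected sample."""
    ddct = delta_ct(affected, gene) - delta_ct(unaffected, gene)
    return 2 ** (-ddct)

for gene in ("CDKN2D", "KRI1", "SLC44A2"):
    print(f"{gene}: {fold_change(gene):.2f}")
```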

Pfaffl Method Calculations

(E GOI)^∆Ct(GOI) / (E Ref)^∆Ct(Ref) = FOLD CHANGE IN AFFECTED INDIVIDUAL RELATIVE TO UNAFFECTED INDIVIDUAL

E = Efficiency as calculated in validation experiments
GOI = Gene of interest
Ref = Reference gene (GAPDH)
∆Ct = Ct(P1:II:1) – Ct(P1:III:3)

Gene       Ct of P1:III:3   Ct of P1:II:1   ∆Ct (Ct P1:II:1 – Ct P1:III:3)   Efficiency (E)   E^∆Ct   Fold change
GAPDH      17.422           17.420          -0.002                           2.023            0.998
CDKN2D     25.075           25.954          0.879                            1.972            1.816   1.819
KRI1       21.934           22.874          0.940                            1.960            1.882   1.885
SLC44A2    24.995           25.035          0.040                            1.977            1.028   1.030

Table 6.7. Results of the Pfaffl method analysis of SYBR green based qPCR data
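The fold changes in Table 6.7 follow from dividing the efficiency-corrected term for the gene of interest by the corresponding term for GAPDH. A minimal sketch using the tabulated efficiencies and ∆Ct values is given below; the function name is hypothetical and small differences in the third decimal place can arise from rounding of the inputs.

```python
def pfaffl_ratio(e_goi, dct_goi, e_ref, dct_ref):
    """Efficiency-corrected expression ratio.

    dct = Ct(unaffected) - Ct(affected) for the respective gene.
    """
    return (e_goi ** dct_goi) / (e_ref ** dct_ref)

# Efficiency (E) and dCt values taken from Table 6.7
E_REF, DCT_REF = 2.023, -0.002  # GAPDH
genes = {
    "CDKN2D": (1.972, 0.879),
    "KRI1": (1.960, 0.940),
    "SLC44A2": (1.977, 0.040),
}

for gene, (eff, dct) in genes.items():
    print(f"{gene}: {pfaffl_ratio(eff, dct, E_REF, DCT_REF):.3f}")
```

When both efficiencies equal 2 this expression reduces to the comparative Ct result, which is why the two methods give similar fold changes for assays with efficiencies close to 100%.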


Figure 6.7. Fold change for SLC44A2, KRI1 and CDKN2D expression in sample P1:III:3 compared to P1:II:1 using different methods of data analysis.

Expression Array Analysis of Family P1

At the same time as running the SYBR green qPCR experiments, expression array analysis was carried out. The SYBR green experiments were used to analyse expression changes in the genes immediately surrounding the homozygous deletion. However, as discussed previously, cis-acting regulatory regions can exert effects on genes from several hundred kilobases away. Expression array technology was therefore used to assess gene expression on a whole genome scale.

The HG_U133 Plus 2.0 array from Affymetrix was selected for this set of experiments. See Chapter 2, for full details of methods used. As with the qPCR experiments, gene expression was compared for unaffected individual P1:II:1 and affected individual P1:III:3 from Family P1. The HG_U133 Plus 2.0 array contains 54,675 probes which analyse approximately 38,500 unique genes, and 47,400 transcripts. Out of these transcripts, 179 displayed statistically significant differential expression between the two samples.

The raw data generated from the HG_U133 array was analysed using the Limma fit model from the Limma Bioconductor package (http://www.bioconductor.org/packages/release/bioc/html/limma.html). This analysis was carried out by a bioinformatician, Mark Wappett, as a service provided by the Paterson Institute. The Limma fit model was used to calculate an adjusted P value, which allows for the large number of tests performed in array based experiments and controls the false discovery rate. Because of the multiplicity of statistical tests carried out on microarray data, issues can occur with the number of false positive calls. Statistical analysis of a single test would often use a P value threshold of 0.05. This means that, when there is no true effect, the chance of obtaining a false positive result is 5%. The false positive rate is the proportion of truly negative results that are declared positive (in the case of expression analysis, this is the proportion of genes which are truly not differentially expressed but are declared differentially expressed in the analysis). However, 5% of the total number of genes tested can represent a much larger proportion of the genes that are declared differentially expressed. A good example of this is given by Gusnanto et al [295]. In a panel of 10,000 genes being assessed for differential expression, the false positive rate is 0.05 (475/9500); that is, 475 of the 9,500 genes that are truly not differentially expressed are declared positive. If 875 genes are declared as differentially expressed, all false positives are within this group. Using the false positive rate above, this would be 475/875, which is 54% false positives. This value is the false discovery rate (FDR) [295]. To deal with the problem caused by multiplicity, an FDR adjusted p value is generated. The FDR adjusted p value is only applied to significant results. For example, an FDR adjusted p value threshold of 0.05 would mean that 5% of the genes declared as differentially expressed are expected to be false positives, which is a much smaller number of genes.
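The arithmetic in the example from Gusnanto et al [295] can be checked with a few lines of Python. This is only an illustration of the difference between the false positive rate and the false discovery rate using the numbers quoted above; the variable names are illustrative, and it does not reproduce the Limma adjustment itself.

```python
# Numbers taken from the worked example quoted in the text.
truly_not_differential = 9_500   # genes that are truly not differentially expressed
false_positives = 475            # of those, genes wrongly declared differentially expressed
declared_differential = 875      # all genes declared differentially expressed

# False positive rate: false positives as a fraction of all truly negative genes.
false_positive_rate = false_positives / truly_not_differential

# False discovery rate: false positives as a fraction of all genes declared positive.
false_discovery_rate = false_positives / declared_differential

print(f"False positive rate:  {false_positive_rate:.2f}")
print(f"False discovery rate: {false_discovery_rate:.2f}")
```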

A selection of the large amount of data generated by the expression array experiments is presented below. Due to the way in which bioinformatic analysis of the data was carried out at the Paterson Institute, University of Manchester, all values represent the level of expression in unaffected individual P1:II:1 when compared to affected individual P1:III:3. This means that a positive fold change value indicates that unaffected individual P1:II:1 has a higher level of expression than affected individual P1:III:3. Therefore, the positive fold change genes are at a lower level of expression in patient P1:III:3 when compared to the unaffected control. A negative fold change value indicates that unaffected individual P1:II:1 has a lower level of expression than affected individual P1:III:3. Therefore, the negative fold change genes are at a higher level of expression in the affected individual compared to the unaffected family member.
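Because the array values are reported for the unaffected individual relative to the affected individual, it can help to convert them to the direction used for the qPCR results. The Python sketch below shows that conversion; the function name and the two example values are hypothetical and are not taken from the array data.

```python
def fold_change_in_affected(log2_fc_unaffected_vs_affected):
    """Convert an array Log2 fold change (unaffected relative to affected)
    into a linear fold change for the affected individual relative to the
    unaffected individual."""
    return 2.0 ** (-log2_fc_unaffected_vs_affected)


# Hypothetical values, for illustration only.
positive_example = fold_change_in_affected(2.0)    # lower expression in the affected individual
negative_example = fold_change_in_affected(-3.0)   # higher expression in the affected individual

print(positive_example, negative_example)
```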

The SYBR green analysis indicated that KRI1 and CDKN2D had higher levels of expression in affected individual P1:III:3 compared to control individual P1:II:1. This result was not supported by the array data. Data for the ten genes closest to the intergenic deletion, including KRI1 and CDKN2D, are shown in Table 6.8. As seen from the values given in Table 6.8, there was no statistically significant difference in the expression levels of any of these genes. For several genes more than one expression probe is present on the array; the data for all expression probes for each gene are given in Table 6.8.

Figure 6.8 displays all genes on chromosome 19 which showed statistically significant differential expression between the two samples analysed. Two genes (AXL and KANK2) on chromosome 19 showed a positive Log2 fold change, meaning that they have a lower level of expression in affected individual P1:III:3. Four genes (CD22, LILRB1, NAPSB and SIGLEC6) showed a negative Log2 fold change, meaning that they have a higher level of expression in affected individual P1:III:3. Figure 6.9 displays genes on all chromosomes with a statistically significant positive Log2 fold change (adjusted P value ≤0.05). This positive Log2 fold change means that these genes are at a lower level of expression in affected individual P1:III:3 compared with unaffected individual P1:II:1. Figure 6.9 only represents the genes with a fold change ≥ 2. For a full list of differentially expressed genes please see Appendix Table 10.17.

Figure 6.10 displays genes on all chromosomes with a statistically significant negative Log2 fold change (adjusted P value ≤0.05). A negative Log2 fold change means that these genes are at a higher level of expression in affected individual P1:III:3 than in the unaffected individual P1:II:1. Figure 6.10 only displays the genes with a fold change ≤ -3. For a full list of differentially expressed genes please see Appendix Table 10.17.


Probeset Name   Gene Symbol   P1:III:3_r1   P1:III:3_r2   P1:III:3_r3   P1:II:1_r1   P1:II:1_r2   P1:II:1_r3
221417_x_at     S1PR5         -0.010        -0.011        -0.204        -0.055       -0.608       -0.368
230464_at       S1PR5         7.313         6.703         6.993         6.216        6.154        6.334
233743_x_at     S1PR5         -0.203        -0.009        0.450         0.007        -0.145       -0.376
226871_s_at     ATG4D         8.501         8.288         8.494         8.343        8.210        8.360
218798_at       KRI1          5.633         4.745         5.562         4.871        4.796        5.301
227587_at       KRI1          8.855         8.803         8.778         8.776        8.629        9.027
210240_s_at     CDKN2D        5.506         5.540         5.640         4.885        5.315        4.940
213586_at       CDKN2D        4.098         3.761         4.201         0.785        1.218        3.173
218261_at       AP1M2         0.067         0.568         0.531         1.076        0.625        0.580
65517_at        AP1M2         3.060         3.006         2.615         2.980        3.323        1.490
224609_at       SLC44A2       5.901         6.118         5.837         6.283        5.865        5.793
225175_s_at     SLC44A2       7.508         7.355         7.035         6.984        7.377        7.591
208930_s_at     ILF3          8.352         8.378         8.583         8.333        8.133        8.378
208931_s_at     ILF3          6.449         6.348         6.426         6.442        6.500        6.227
211375_s_at     ILF3          8.703         8.535         8.525         8.311        8.309        8.055
217804_s_at     ILF3          6.209         5.835         6.580         6.451        6.016        6.619
217805_at       ILF3          5.010         6.453         6.596         4.954        5.663        6.087
221270_s_at     QTRT1         5.378         4.966         5.669         5.711        5.183        4.903
1555895_at      DNM2          4.196         4.968         4.299         4.530        4.283        4.300
202253_s_at     DNM2          2.885         4.024         3.782         3.485        2.629        3.331
216024_at       DNM2          6.995         7.089         7.043         6.969        7.117        6.951
203679_at       TMED1         5.894         6.306         6.373         5.116        5.668        5.846

Table 6.8. Expression array data for 10 genes surrounding the 19p13.2 intergenic deletion. Triplicate datasets, labelled r1-r3, are shown for the affected and unaffected individuals. Numerical values represent the level of expression for each probe.


[Chart: "Chr19 Differentially Expressed Genes". Y-axis: Log2 fold change (-3 to 3). X-axis: gene name (KANK2, AXL, SIGLEC6, NAPSB, LILRB1 and three CD22 probes). Plotted Log2 fold change values: 1.89872, 1.89092, -1.9975, -2.0298, -2.0472, -2.4518, -2.5287, -2.7463.]

Figure 6.8. Chromosome 19 Differentially Expressed Genes.


[Chart: "Genes with positive Log2 Fold Change". Y-axis: Log2 fold change (0 to 12). X-axis: gene and probe names, including EIF1AY, JARID1D, CYorf15A, CYorf15B, ZFY, DDX3Y, USP9Y, RPS4Y1, UTY, TTTY15, TTTY10, PRKY, APBA1, STON1, ENOX1, NTRK2, SLC35F3, TMEM200A, C10orf10, CDR1, C14orf91, AHSA2, MYH6, SLC8A1, FANK1, UMODL1, KAL1, FGD6, VASH1, HSD17B7P2 and several uncharacterised LOC loci.]

Figure 6.9. Genes with a positive Log2 fold change ≥2. These genes have a lower level of expression in affected individual P1:III:3 than unaffected individual P1:II:1. The genes with the largest differential expression are those located on the Y chromosome. This is as expected when comparing male and female samples and is not relevant to the disease phenotype.


[Chart: "Genes with negative Log2 Fold Change". Y-axis: Log2 fold change (0 to -14). X-axis: gene and probe names: SYK, FOS, XIST (six probes), CNR2, DTNB, CYCS, OCLN, BATF2, ENPP1, TDRD9, FCRLA, CXCL1, SCN3A, TMCC3, MACC1, YTHDC1, FCGR2B, SLC44A5, PCBP2, HLA-DRB1 and IGK@ /// IGKC.]

Figure 6.10. Genes with a negative Log2 fold change ≤ -3. These genes have a higher level of expression in affected female P1:III:3 than unaffected male P1:II:1. The largest difference in expression can be seen in the gene XIST, which is located on the X chromosome. Differential levels of expression in genes located on the X or Y chromosome are to be expected when comparing male and female samples and are not relevant to the disease phenotype.


6.3.2 Investigation of the Chr7 Heterozygous Deletion detected in Family P5

qPCR analysis using Taqman Copy Number Probes

Quantitative real-time PCR (qPCR) was used to investigate the possible heterozygous deletion detected in the Affymetrix V6.0 Human SNP array data for affected individual P5:II:1. Two separate Taqman copy number assays were used to determine NOBOX copy number. Both assays lie within intron 2 of the genomic sequence (Figure 6.11).

Figure 6.11. Alignment map from the ABI website showing the location of the two Taqman copy number assays used. http://www5.appliedbiosystems.com/tools/alignMap/alignMap.rb?viewerAxisEntityType=CNV&viewerAxisEntity=Hs03648039_cn&gene_id=135935

In order to assess the copy number of this gene in affected individual P5:II:1, a control sample with a known copy number of 2 was needed. Unaffected individual P1:III:4 from family P1 was previously genotyped using Affymetrix V6.0 Human SNP array (see Chapter 5). Copy number analysis using ChAS software indicated that this individual has a NOBOX copy number of 2 (Fig 6.12). Individual P1:III:4 was used as the control sample for the following experiments.


Figure 6.12. Copy number analysis of P1:III:4, shown using the Affymetrix ChAS software. This unaffected individual has a copy number of 2 for the deleted region on chromosome 7q35.

qPCR using Taqman copy number assays for NOBOX was carried out on genomic DNA samples from affected individual P5:II:1, her affected male sibling P5:II:2 and control sample P1:III:4. Raw data can be found in Appendix Tables 10.18 and 10.19. Data analysis was carried out using the Applied Biosystems CopyCaller software to determine copy number relative to the control sample P1:III:4. Analysed data are shown in Figures 6.13 and 6.14. These experiments were repeated using alternative control samples from within the department. The results were consistent in repeat experiments (data not shown).
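The principle of estimating copy number relative to a control sample of known copy number can be sketched as follows. This Python example assumes the relative quantification (∆∆Ct) approach commonly used for copy number assays, with a reference assay of known copy number and approximately 100% amplification efficiency. It is not a description of the exact algorithm implemented in the CopyCaller software, and all Ct values shown are hypothetical.

```python
def estimate_copy_number(ct_target, ct_reference,
                         ct_target_control, ct_reference_control,
                         control_copies=2):
    """Estimate target copy number relative to a control (calibrator) sample.

    ∆Ct is Ct(target assay) - Ct(reference assay) within each sample;
    ∆∆Ct is the test sample ∆Ct minus the control sample ∆Ct.
    Assumes both assays amplify with about 100% efficiency.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return control_copies * 2 ** (-delta_delta_ct)


# Hypothetical Ct values: the target amplifies one cycle earlier (relative to
# the reference assay) in the test sample than in the two-copy control.
gain_example = estimate_copy_number(26.0, 25.0, 27.0, 25.0)

# Hypothetical Ct values for a heterozygous deletion (one cycle later).
loss_example = estimate_copy_number(28.0, 25.0, 27.0, 25.0)

print(gain_example, loss_example)
```

Under these assumptions a heterozygous deletion would be expected to give an estimate close to 1, whereas a shift of one cycle in the opposite direction corresponds to an estimate of 4.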


Figure 6.13. Results of Taqman Assay 1 analysed on CopyCaller software. Con = P1:III:4, Female = P5:II:1 and Male = P5:II:2.

Figure 6.14. Results of Taqman Assay 2 analysed on CopyCaller software. Con2 = P1:III:4, Female2 = P5:II:1 and Male2 = P5:II:2.

Analysis of the qPCR data using the Applied Biosystems CopyCaller software indicates that both affected individuals in Family P5 have a NOBOX copy number greater than 2. The Affymetrix V6.0 array data previously suggested that affected female P5:II:1 may have a heterozygous deletion of approximately 188.9 kb on Chr7q35. The results of the qPCR experiments do not support this original finding. Taqman assay 1 indicates that P5:II:1 has a NOBOX copy number of 4, and assay 2 indicates that the same individual has a copy number of 5. There are several possible explanations for the lack of consistency in these results, which are discussed in more detail in section 6.4 of this chapter.

PCR amplification of Heterozygous Deletion

An alternative method of investigating the presence of the 7q35 heterozygous deletion is to amplify across the deletion breakpoints using PCR primers. All primers were designed using Primer 3 software; for detailed information on the PCR methods used please see Chapter 2. Primers were designed to bind to genomic DNA sequence outside of the 7q35 deletion as defined by the Affymetrix V6.0 array data. For full primer sequences please see Appendix Table 10.20.

The suspected deletion is heterozygous, meaning that individual P5:II:1 would have one wild-type allele and one mutant allele. The wild-type allele is full length, and producing an amplicon of such a large size (>188.9 kb) would not be possible using standard PCR amplification. If the deletion is present on the mutant allele, a small amplicon could theoretically be produced and detected using agarose gel electrophoresis, because the primers lie much closer together once the section of DNA between them has been deleted. No amplification would be expected in a control sample with two wild-type alleles.

Standard PCR amplification of control DNA and DNA from individual P5:II:1 was carried out at a range of annealing temperatures (52°C to 62°C in 2°C increments), with a 1 minute extension for 35 cycles. Specific amplification could not be achieved for any of the three primer pairs. Non-specific amplification, which was present in both the control sample and sample P5:II:1, can be seen for primer pair 2 and primer pair 3 (Fig 6.15).


Figure 6.15. A. Amplification of PCR primer pairs 1-3 over a temperature range of 52-62°C in control DNA. B. Amplification of PCR primer pairs 1-3 over a temperature range of 52-62°C in DNA for affected individual P5:II:1. C. Amplification of PCR primer pair 3 at 58°C in individual P5:II:1 and control DNA. Amplification was unsuccessful for all primer pairs. Non-specific binding can be seen for primer pairs 2 (A and B) and 3 (C) in sample P5:II:1 and control DNA.

6.4 Discussion

6.4.1 Sequencing of CDKN2D, KRI1, AP1M2 and SLC44A2 in Perrault Syndrome cohort

Sanger sequencing was used to screen four genes in close proximity to the chromosome 19 homozygous deletion detected in Family P1. The coding exons and exon-intron boundaries of the genes CDKN2D, KRI1, AP1M2 and SLC44A2 were sequenced in seven affected individuals from five families with Perrault syndrome. The purpose of this sequencing was to investigate the possibility that mis-regulation of one of these genes could be the pathogenic cause of Perrault syndrome in Family P1. Identification of novel mutations in additional unrelated individuals with Perrault syndrome would have provided strong evidence that the homozygous deletion was exerting a pathogenic effect. As discussed in detail in Chapter 5, CDKN2D was the most promising candidate gene for Perrault syndrome due to mouse models which show both hearing and reproductive phenotypes [276, 277, 296].

Pathogenic mutations were not detected in CDKN2D, AP1M2, KRI1 or SLC44A2 in any patients from our Perrault syndrome cohort. It remains possible that disruption of one of these genes represents the pathogenic cause of Perrault syndrome in Family P1, but that a second family could not be identified because of genetic heterogeneity.

6.4.2 Expression analysis in Family P1

Two different methods were used to investigate mRNA expression levels in an affected and an unaffected individual from Family P1. As discussed in the introduction section of this chapter, the purpose of these experiments was to determine the likelihood that the homozygous deletion detected in Family P1 had eliminated a key cis-acting regulatory region.

Real time PCR using SYBR green chemistry was used to investigate expression level differences in four genes (AP1M2, CDKN2D, KRI1 and SLC44A2) surrounding the deletion. Initial validation experiments revealed that the AP1M2 assay (designed by Primer Design Ltd) did not produce efficient amplification of cDNA at a range of concentrations. Two possibilities could explain the lack of efficient amplification in sample P1:II:1. The first is that the primers used in the assay did not bind efficiently to the target cDNA sequence. The second possibility is that AP1M2 may not be expressed in adult lymphocytes. The RNA which was extracted from individual P1:II:1 came from a lymphocyte primary cell culture, and if AP1M2 has tissue specific expression then it may not be detected by these experiments. In order to investigate the expression of this gene using SYBR green based qPCR, new primers would need to be designed, or an alternative cell line which is known to express AP1M2 would be needed. All other primer assays were validated and shown to have similar amplification efficiencies. In order to use the comparative Ct method to analyse sample data, primer efficiencies need to be similar. This is because the comparative Ct calculation does not correct for inter-assay differences in efficiency. This method of calculation is best used for applications such as Taqman assays, in which all assays have been pre-validated by the manufacturer and efficiency can be assumed to be 100%. However, for SYBR green analysis care should be taken to calculate assay efficiency and ensure that inter-assay differences are small. Typically 5% is considered to be the maximum acceptable difference between the gene of interest assay and the housekeeping assay (the average difference between CDKN2D and GAPDH is 5.1%, and the average difference between SLC44A2 and GAPDH is 4.6%). However, for a difference that is greater than 5% (the average difference between KRI1 and GAPDH is 6.3%), corrections should be made within the fold change calculation. One method of analysis which corrects for such differences is the Pfaffl method. Both the Comparative Ct method and the Pfaffl method were used to calculate the fold change of each gene in affected individual P1:III:3 compared to unaffected individual P1:II:1. The fold change for each gene was the same to 1 decimal place using both methods. This indicates that the efficiencies of all of the gene of interest assays were close enough to the GAPDH assay efficiency to be comparable.
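The inter-assay efficiency differences quoted above can be recovered from the efficiency values in Table 6.7. The Python sketch below assumes that the quoted percentages are differences in percentage efficiency, where an amplification factor (E) of 2.0 corresponds to 100% efficiency; this interpretation is an inference from the tabulated values, and the names used are illustrative.

```python
# Amplification factors (E) transcribed from Table 6.7.
efficiency = {"GAPDH": 2.023, "CDKN2D": 1.972, "KRI1": 1.960, "SLC44A2": 1.977}
REFERENCE_GENE = "GAPDH"


def percent_efficiency(amplification_factor):
    """Convert an amplification factor to percentage efficiency (2.0 -> 100%)."""
    return (amplification_factor - 1.0) * 100.0


# Absolute difference in percentage efficiency between each assay and GAPDH.
efficiency_difference = {
    gene: abs(percent_efficiency(e) - percent_efficiency(efficiency[REFERENCE_GENE]))
    for gene, e in efficiency.items()
    if gene != REFERENCE_GENE
}

for gene, difference in efficiency_difference.items():
    print(f"{gene} vs {REFERENCE_GENE}: {difference:.1f}% difference in efficiency")
```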

There was no difference in expression levels of SLC44A2 for the affected individual compared to the unaffected family member (SLC44A2 fold change = 1.0 to 1 d.p.). Expression of KRI1 and CDKN2D was found to be higher in the affected member of Family P1 compared to the unaffected family member (KRI1 fold change = 1.9 to 1 d.p., CDKN2D fold change = 1.8 to 1 d.p.). This is nearly a 2 fold difference in expression of these genes in the affected individual. The significance of this result would need to be explored by repeating the experiment and by testing additional samples. A panel of control samples could help to identify an average level of expression of these genes in the general population, and if samples from additional family members became available, expression levels in these individuals would be analysed. However, global expression analysis of Family P1 samples using the Affymetrix HG_U133 Plus 2.0 expression array did not support the qPCR result. This analysis indicated no significant difference in levels of KRI1 or CDKN2D between the affected and unaffected individual. It seems unlikely therefore that altered transcription of either of these genes is the pathogenic cause of Perrault syndrome in Family P1.

The Affymetrix HG_U133 Plus 2.0 array was used to give a more global picture of expression differences between the two samples. If long distance regulation of genes on chromosome 19 had been disrupted by the homozygous deletion, this array would be expected to detect significant up or down regulation of the affected gene in sample P1:III:3. Six genes on chromosome 19 displayed significant differential expression (Fig 6.8). Two of these genes, KANK2 and AXL, were down regulated in affected individual P1:III:3. AXL encodes a tyrosine kinase receptor which binds to growth factor GAS6, and its over expression has been implicated in a range of cancers including breast and thyroid carcinoma [297-300]. KANK2 encodes the protein KN motif and ankyrin repeat domain-containing protein 2 (also known as SRC-1 interacting protein). KANK2 expression is highest in cervix, colon, heart, kidney and lung [301]. One function of KANK2 is to sequester steroid receptor co-activators (SRC) into the cytoplasm, thus playing a role in the regulation of transcription [302]. Four genes on chromosome 19 were significantly up-regulated in individual P1:III:3: NAPSB, SIGLEC6, LILRB1 and CD22. NAPSB encodes the napsin B aspartic peptidase pseudogene, which is a large non-coding RNA with no protein function [303]. SIGLEC6 encodes the protein Sialic acid-binding Ig-like lectin 6, a sialic acid dependent adhesion molecule which is thought to mediate cell-cell recognition [304]. LILRB1 encodes Leukocyte immunoglobulin-like receptor subfamily B member 1, a receptor for class 1 MHC antigens, and plays a role in down regulation of the immune response [305]. Finally, CD22 encodes B-Cell receptor 22, which mediates B-cell interaction in a sialic acid dependent manner, and belongs to the same family of proteins as SIGLEC6 [306]. None of the differentially expressed genes on chromosome 19 appear to be good candidates for Perrault syndrome based on their known functions. Three genes have functions in the innate immune system (LILRB1, CD22 and SIGLEC6), one is a non-functional pseudogene (NAPSB), one is a tyrosine-kinase receptor implicated in tumour pathogenesis (AXL), and the final gene plays a role in the regulation of transcription (KANK2). The combination of qPCR and array data does not indicate differential expression of any genes which could be functionally relevant to the pathogenesis of Perrault syndrome. Based on this evidence it appears that the homozygous deletion detected in Family P1 does not encompass a crucial regulatory domain. Multi-species alignment did not show significant conservation of the region, and no pathogenic mutations have been detected in proximal genes in other individuals with Perrault syndrome. Additionally, we sought advice from Dr Dirk-Jan Kleinjan, who works at the MRC Medical and Developmental Genetics department in Edinburgh. Dr Kleinjan has experience in looking at long distance regulation of gene expression, and analysed our deleted region for sequence conservation and known histone marks that could indicate the presence of a promoter or regulatory elements. He found no evidence of either. This combination of evidence suggests that the chromosome 19 homozygous deletion is not likely to be the pathogenic cause of Perrault syndrome in Family P1.

6.4.3 Confirmation of the heterozygous deletion on chromosome 7 in Family P5

Copy number analysis of Affymetrix V6.0 SNP array data identified a possible heterozygous deletion on 7q35 in an affected individual from Family P5. Confirmation of this deletion is made difficult by its size (188.9 kb) and heterozygous nature. This deletion encompasses the gene NOBOX, and haploinsufficiency of NOBOX has previously been reported as a cause of ovarian dysgenesis [216, 217]. In an attempt to confirm the presence of this deletion, qPCR using Taqman copy number probes was carried out on affected individual P5:II:1 and an unaffected control sample, P1:III:4, who is known to have a NOBOX copy number of 2.

Analysis of the Taqman copy number data revealed an unexpected result. Two copy number assays were used, both of which lie within intron 2 of the NOBOX gene. Copy number assay 1 indicated that individual P5:II:1 has a copy number of 4, and that her affected brother (P5:II:2) has a copy number of 3, relative to control individual P1:III:4. This result did not support the array data, which indicated a heterozygous deletion of the region. Copy number assay 2 indicated that individual P5:II:1 has a copy number of 5, and individual P5:II:2 has a copy number of 4. Once again this result showed a copy number gain rather than a loss, but did not agree with the results from assay 1. Although the two Taqman assays did not give the same NOBOX copy number for each patient, they did show the same difference in copies between the two affected individuals in Family P5: the affected female P5:II:1 displayed one more copy than her affected brother P5:II:2 in both assays. The aim of this set of experiments was to confirm the presence of the deletion, or to establish whether the result from the Affymetrix array data was a false positive. The array data itself was not conclusive evidence of the presence of the deletion, because the density of copy number probes in this region is low. Of particular interest to our phenotype is the gene NOBOX, and there are no copy number probes which lie within the NOBOX gene on this array. In addition, the Log2 ratio intensities for the probes in the deleted region are variable. It is therefore unsurprising that the Taqman copy number assays did not identify a deletion of the NOBOX gene, suggesting that the array data result was a false positive. What was surprising was that individuals P5:II:1 and P5:II:2 appeared to have additional copies of NOBOX. One possible explanation is that this region on 7q35 has previously been reported as being a common site of copy number variation on the DGV database (variation_3708, variation_4573 and variation_37284). Copy number gains, losses and inversions have been identified in multiple independent investigations of human genome global copy number variation [289, 291].

Amplification across the breakpoint of the suspected deletion was also unsuccessful. This result, together with the Taqman assay data, means that haploinsufficiency of NOBOX is unlikely to be the pathogenic cause of ovarian dysgenesis in Family P5. It is likely that the deletion detected by the Affymetrix V6.0 SNP array was a false positive result.

6.5 Conclusions

The aim of this chapter was to further investigate the deletions identified from copy number analysis of the Affymetrix V6.0 array data presented in Chapter 5. The 1361 bp homozygous deletion identified on 19p13.2 is present in all affected individuals and segregates with disease in Family P1. However, the deletion is intergenic, does not directly disrupt the coding region of any genes and lies in a poorly conserved region with no evidence of regulatory sequences. Mutations were not identified in any genes contiguous with the deletion in unrelated patients with Perrault syndrome. The results presented in this chapter could not demonstrate compelling evidence of transcriptional mis-regulation through expression analysis. The deletion does not appear to be having a long range effect on the transcriptional regulation of genes on chromosome 19. It is therefore unlikely that this homozygous deletion is the pathogenic cause of Perrault syndrome in Family P1.

The heterozygous deletion of 7q35 identified in an affected individual from Family P5 could not be confirmed. qPCR analysis of the NOBOX gene using Taqman assays identified a gain in copy number for both affected individuals from Family P5, and PCR amplification of the breakpoint could not be achieved. The evidence presented in this chapter suggests that the identification of a heterozygous deletion on the Affymetrix array was a false positive result. Haploinsufficiency of NOBOX is not likely to be the pathogenic cause of ovarian dysgenesis in individual P5:II:1.


CHAPTER 7. PERRAULT SYNDROME INVESTIGATIONS


7.1 Introduction

In Chapter 5 autozygosity mapping was used to identify a Perrault syndrome locus at 19p13.3-13.11 in Family P1. This is a novel Perrault syndrome locus, containing 298 genes. The aim of the following chapter is to investigate potential candidates within this locus and identify novel pathogenic mutations for Perrault syndrome. Several genes within the chromosome 19 locus are considered to be good candidates, and these are discussed in more detail in Chapter 5. Following initial autozygosity mapping, a traditional candidate gene approach to identifying the causal mutation(s) was taken. However, throughout the course of the project there have been technological advances in the field of medical genetics. One such advance is the ready availability of next generation sequencing platforms, which can be utilized in several ways such as whole exome sequencing or transcriptome sequencing (RNA-Seq). This is discussed in more detail in Chapter 1.3.3.

7.2 Aim

The aim of the work in this chapter is to identify pathogenic mutations in the previously identified 19p13.3-13.11 locus in Family P1, and in other affected individuals from our Perrault syndrome cohort. This chapter also aims to investigate any variants detected using relevant techniques.

7.3 Results

7.3.1 Sanger Sequencing of Candidate Genes in 19p13.3-13.11 locus

Within the 19p13.3-13.11 locus several genes were identified as good candidates for Perrault syndrome. The genes listed in Table 7.1 were all screened for pathogenic mutations in affected individuals from Family P1. Primers were designed using Primer 3 software to amplify all coding exons and exon-intron boundaries. Traditional Sanger sequencing techniques were used; for a more detailed description of the methods and materials used please see Chapter 2.


Gene      Potential Links to Perrault Syndrome
FARSA     Encodes the Phenylalanyl-tRNA synthetase alpha chain. The previously identified Perrault syndrome gene HARS2 is a member of the same family of enzymes.
CDKN2D    Murine homologue is vital for maintaining the post mitotic state of sensory hair cells in the inner ear. Mild reproductive phenotype in male knockout mice.
MYO1F     Heterozygous mutations identified in a large cohort of non syndromic deafness patients.
SLC44A2   Inner ear membrane glycoprotein. Implicated in antibody induced hearing loss in humans. Interacts with Cochlin in the guinea pig inner ear.
ZNF653    Interacts with NR5A1, which encodes the protein Steroidogenic Factor 1 (SF1). Mutations in NR5A1 have been found as a cause of premature ovarian failure as well as 46XY disorders of sexual development.

Table 7.1 Candidate genes from locus 19p13.3-13.11 sequenced in Family P1

No novel variants were detected in any of the candidate genes sequenced in members of Family P1. All variants detected were known SNPs and were listed on the NCBI SNP database.

7.3.2 Next Generation Sequencing of Family P1

As part of a collaboration with the King lab at the University of Washington, Seattle, next generation sequencing of affected individual P1:III:3 was carried out. Initially, whole genome sequencing was attempted using the ABI SOLiD next generation platform. I spent two months in Seattle in Summer 2009 preparing and amplifying a fragment library and a mate paired library for affected individual P1:III:3 in accordance with the manufacturer's protocols. Despite several attempts and regular advice from ABI technical specialists, the mate paired library preparation was unsuccessful. The fragment library preparation was more successful and was sequenced using the SOLiD v3 platform. However, analysis of the resulting data indicated poor coverage of the genome, and no pathogenic variants could be identified within the 19p13.3-13.11 mapped region. It was decided that our collaborators would repeat the experiment using an alternative platform. This time a whole exome library was prepared using the Nimblegen SeqCapEZ Human Exome Library V2.0 kit and the library was sequenced using the Illumina Genome Analyzer IIx. Initial bioinformatic analysis was carried out in Seattle by the King lab using the Illumina pipeline v1.6 (default parameters). All high quality reads were mapped to the reference human genome sequence (GRCh37, UCSC hg19) using Mapping and Assembly with Qualities (MAQ) v0.7.1 [307]. Potential single nucleotide variants and indels were identified using the MAQ Perl-based filter after alignment (-map), assembly (-assembly), and consensus calling (-cns2snp) features.

At this stage we received the analyzed data and filtered the results using predefined criteria as laid out in Figure 7.1.

Figure 7.1 Criteria for filtering novel variants identified in whole exome data


Chr   Genomic Co-ordinates   Ref > Var   Zygosity   Read Num   Variant reads   Ratio   Gene      Predicted Protein Effect   *Segregate with Disease?
19    6,364,528              A > C/C     Hom        150        88              59      CLPP      T145P                      YES
19    6,380,605              C > A/A     Hom        580        219             38      GTF2F1    G443V                      YES
19    7,696,593              G > A/A     Hom        149        134             90      PCP2      P132P                      YES
19    7,696,594              G > C/C     Hom        154        134             87      PCP2      P131R                      YES
19    8,145,935              T > G/T     Het        560        250             45      FBN3      T2469P                     False positive
19    10,218,236             T > G/T     Het        178        66              37      PPAN      V82G                       False positive
19    11,123,647             T > G/T     Het        489        187             38      SMARCA4   V766G                      False positive
19    16,035,680             G > A/A     Hom        177        47              27      CYP4F11   R180C                      YES
19    17,122,423             C > A/A     Hom        161        74              46      CPAMD8    V185L                      N**
19    17,571,476             A > T/T     Hom        224        108             48      NXNL1     L68Q                       N**

*Segregation confirmed using traditional Sanger sequencing techniques.
**Variants lie just outside of the 19p13.3-13.11 locus; however, due to homozygosity, segregation was checked. As expected, the variants did not segregate with disease in the family.

Table 7.2 Variations detected in 19p13.3-13.11 locus after filtering using predefined criteria.
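The Ratio column of Table 7.2 is not defined in the text. The short sketch below assumes it is the percentage of reads carrying the variant allele (variant reads divided by total read number, rounded to a whole number), and checks that assumption against the values transcribed from the table. The function name is illustrative only.

```python
# Rows transcribed from Table 7.2:
# (gene / protein effect, total read number, variant reads, reported ratio)
ROWS = [
    ("CLPP T145P", 150, 88, 59),
    ("GTF2F1 G443V", 580, 219, 38),
    ("PCP2 P132P", 149, 134, 90),
    ("PCP2 P131R", 154, 134, 87),
    ("FBN3 T2469P", 560, 250, 45),
    ("PPAN V82G", 178, 66, 37),
    ("SMARCA4 V766G", 489, 187, 38),
    ("CYP4F11 R180C", 177, 47, 27),
    ("CPAMD8 V185L", 161, 74, 46),
    ("NXNL1 L68Q", 224, 108, 48),
]


def variant_read_ratio(total_reads, variant_reads):
    """Percentage of reads supporting the variant allele (assumed definition)."""
    return round(100 * variant_reads / total_reads)


for label, total, variant, reported in ROWS:
    computed = variant_read_ratio(total, variant)
    print(f"{label:15s} computed {computed:3d}  table {reported:3d}")
```

Under this assumption the computed values agree with every row of the table.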

All variants which met the predefined criteria were confirmed using standard Sanger sequencing (Table 7.2). Segregation of variants with disease phenotype was checked in affected and unaffected family members (Figures 7.2-7.5). Of the variants detected using next generation sequencing, five were confirmed to be present in all affected individuals and segregated with disease in additional family members. These variants are CLPP c.433A>C, GTF2F1 c.1328G>T, PCP2 c.392C>G, PCP2 c.393C>T and CYP4F11 c.538C>T. All variants are given in the forward direction. The variant PCP2 c.393C>T was synonymous and therefore not considered likely to be pathogenic.
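As a worked check of how the coding-sequence positions of the four missense variants relate to the affected amino acid residues, the sketch below converts a cDNA position to a codon number. It assumes standard HGVS coding numbering, in which c.1 is the A of the ATG start codon, and it assumes the reference transcripts used here follow that convention; the function name is hypothetical.

```python
def codon_position(cdna_pos):
    """Map a 1-based coding position (c. numbering) to
    (codon number, position of the base within that codon)."""
    codon = (cdna_pos - 1) // 3 + 1
    base_in_codon = (cdna_pos - 1) % 3 + 1
    return codon, base_in_codon


# Missense variants reported in this chapter: cDNA position and reported residue.
MISSENSE = {
    "CLPP c.433A>C": (433, 145),
    "GTF2F1 c.1328G>T": (1328, 443),
    "PCP2 c.392C>G": (392, 131),
    "CYP4F11 c.538C>T": (538, 180),
}

for name, (pos, reported_residue) in MISSENSE.items():
    codon, base = codon_position(pos)
    print(f"{name}: codon {codon}, base {base} (reported residue {reported_residue})")
```

For each of the four missense variants the computed codon number matches the residue number given in the protein-level description.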

Figure 7.2 Sanger sequencing results showing segregation with disease of the homozygous c.433A>C CLPP variant in Family P1. Please refer to Chapter 3 for family pedigree.

Figure 7.3 Sanger sequencing results showing segregation with disease of the homozygous c.1328G>T GTF2F1 variant in Family P1.

Figure 7.4 Sanger sequencing results showing segregation with disease of the homozygous c.392C>G PCP2 variant in Family P1.

Figure 7.5 Sanger sequencing results showing segregation with disease of the homozygous c.538C>T CYP4F11 variant in Family P1.

Multi-species alignment of each of the variants was carried out using ClustalW [308]. These alignments can be seen in Figures 7.6-7.9.

Figure 7.6 Multi-species alignment showing CLPP p.T145P variant.

Figure 7.7 Multi-species alignment showing PCP2 p.P131R variant.

Figure 7.8 Multi-species alignment showing GTF2F1 p.G443V variant.

Figure 7.9 Multi-species alignment showing CYP4F11 p.R180C variant.

In silico prediction tools PolyPhen2 [309] and SIFT [310] were used to predict the pathogenic potential of each of the four variants. The results are displayed below in Table 7.3. Two pairs of datasets called HumDiv and HumVar were used to train and test the PolyPhen2 software algorithms. The user can choose between the two scores given, depending on the variant being investigated. For Mendelian diseases in which mutations are expected to have a drastic effect (such as in Perrault syndrome), it is recommended that the HumVar trained version of the software is used [309].

Gene      Variant   Polyphen2 HumDiv score     Polyphen2 HumVar score     SIFT prediction
CLPP      p.T145P   Probably Damaging 1.000    Probably Damaging 0.997    *DAMAGING (0)
GTF2F1    p.G443V   Probably Damaging 0.999    Probably Damaging 0.954    TOLERATED (0.2)
PCP2      p.P131R   Probably Damaging 0.998    Probably Damaging 0.922    DAMAGING (0)
PCP2      p.P132P   Synonymous                 Synonymous                 Synonymous
CYP4F11   p.R180C   Probably Damaging 0.97     Possibly Damaging 0.704    TOLERATED (0.06)

*DAMAGING - Low confidence predictions with Median conservation above 3.25

Table 7.3 Polyphen2 and SIFT in-silico predictions for variants detected using whole exome sequencing.
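The SIFT calls in Table 7.3 can be related to the bracketed scores with a simple threshold. The sketch below assumes the commonly quoted SIFT cutoff of 0.05 (scores at or below the cutoff called damaging); the exact handling of a score equal to the cutoff is an assumption, and the function name is illustrative.

```python
SIFT_CUTOFF = 0.05  # commonly quoted cutoff; treated as an assumption here


def sift_call(score, cutoff=SIFT_CUTOFF):
    """Return the SIFT-style label for a score between 0 and 1."""
    return "DAMAGING" if score <= cutoff else "TOLERATED"


# SIFT scores transcribed from Table 7.3.
SIFT_SCORES = {
    "CLPP p.T145P": 0.0,
    "GTF2F1 p.G443V": 0.2,
    "PCP2 p.P131R": 0.0,
    "CYP4F11 p.R180C": 0.06,
}

for variant, score in SIFT_SCORES.items():
    print(f"{variant}: {score} -> {sift_call(score)}")
```

With this cutoff the CYP4F11 score of 0.06 falls only just on the tolerated side, whereas the GTF2F1 score of 0.2 is clearly above it.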

A cohort of 193 ethnically matched control samples was sequenced for each of the four non-synonymous variants. Some samples from the cohort did not sequence successfully. Final numbers of control samples sequenced for each variant can be seen in Table 7.4. CLPP c.433A>C, GTF2F1 c.1328G>T and PCP2 c.392C>G were not detected in any of the samples sequenced. CYP4F11 c.538C>T was found in the heterozygous state in two samples from this cohort (Table 7.4). In addition, each variant was entered into the NHLBI Exome Sequencing Project Exome Variant Server (http://evs.gs.washington.edu/EVS/). The NHLBI Exome Sequencing Project is a large collaboration of groups using next generation sequencing to discover novel disease genes in a range of heart, lung and blood disorders. All datasets generated by this group are uploaded to an internet based server to create a database of variants available to the wider research community. The only variant listed in this database was CYP4F11 c.538C>T, which was found in 58/10700 alleles. Since this variation was first identified in our family it has also been annotated on the NCBI SNP database as rs148197835. Based on the finding of this variant in our control cohort, on the NHLBI database and more recently on the NCBI SNP database, it is considered unlikely that CYP4F11 c.538C>T is a pathogenic variant. It is more likely to be a rare polymorphism. The remainder of the investigations in this chapter focus on the three variants most likely to be pathogenic in Family P1: CLPP c.433A>C, GTF2F1 c.1328G>T and PCP2 c.392C>G.

Gene      Number of control    Number of control      Number of control          Number of control
          samples sequenced    samples homozygous WT  samples heterozygous       samples homozygous
                                                      for variant                for variant
CLPP      180                  180                    0                          0
PCP2      183                  183                    0                          0
GTF2F1    189                  189                    0                          0
CYP4F11   184                  182                    2                          0

Table 7.4 Sequencing of ethnically matched control samples for novel variants detected in Family P1.
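The control data for CYP4F11 c.538C>T can be expressed as an allele frequency and compared with the Exome Variant Server count. The sketch below is a minimal illustration that assumes diploid autosomal genotypes (two alleles per sample); the function names are hypothetical.

```python
def cohort_allele_counts(n_samples, n_het, n_hom_variant):
    """Return (variant alleles, total alleles) for a diploid cohort."""
    return n_het + 2 * n_hom_variant, 2 * n_samples


def allele_frequency(variant_alleles, total_alleles):
    return variant_alleles / total_alleles


# CYP4F11 row of Table 7.4: 184 samples sequenced, 2 heterozygous, 0 homozygous.
control_variant, control_total = cohort_allele_counts(184, 2, 0)
control_af = allele_frequency(control_variant, control_total)

# Exome Variant Server count quoted in the text: 58 of 10700 alleles.
evs_af = allele_frequency(58, 10700)

print(f"Control cohort: {control_variant}/{control_total} alleles = {control_af:.4%}")
print(f"EVS:            58/10700 alleles = {evs_af:.4%}")
```

Both estimates are close to 0.5%, which is in keeping with the interpretation of this variant as a rare polymorphism.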

Primers were designed for all coding exons and exon-intron boundaries of the genes PCP2, CLPP and GTF2F1. Twelve affected individuals from families P2-11 were sequenced for all three genes. No novel variants were detected in any of the genes sequenced. The only variants detected were common polymorphisms listed on the NCBI SNP database.

7.3.3 Expression Experiments for variants detected in Family P1

In order to establish the potential for pathogenicity of each of the three variants, expression analysis was carried out. Given the obligatory deafness and ovarian dysgenesis of Perrault syndrome it is reasonable to expect expression of whichever gene is responsible for this condition in cochlea and ovarian tissue. It is also possible that disrupted function of more than one of the genes identified in this family is responsible for this complex phenotype. Establishing robust expression patterns for all three genes may help to determine the most likely cause of the phenotypic features seen in Family P1.

Tissue dissection was carried out by Dr Andrew Berry on human embryo M374, at approximately 8.5-9 weeks gestation and 6mm foot length. A range of tissues were dissected including gonad and cochlea tissue. RNA was extracted from all samples; for full details of methods and materials used please see Chapter 2.2.2. Reverse transcription PCR (RT-PCR) of RNA samples extracted from embryonic tissue was carried out to obtain cDNA. Primers were designed to amplify sections of cDNA transcripts for the genes CLPP, PCP2 and GTF2F1. Standard PCR amplification of transcripts was carried out in all embryonic samples at approximately 10ng/µl. For details of primer sequences please see Appendix Table 10.21. Figure 7.10 shows the PCR amplification of embryonic tissue for GAPDH, CLPP and GTF2F1. GAPDH is a standard housekeeping gene which was used to confirm successful RNA extraction and reverse transcription of all tissues. Expression of GAPDH was uniform in all tissues except for lung. This means that for most tissues semi-quantitative approximations of expression levels can be made. CLPP was expressed in all tissues sampled, with no apparent difference in expression levels. GTF2F1 was also expressed in all tissues, but expression appeared to be higher in adrenal and liver tissue. Results of PCR amplification of PCP2 transcripts can be seen in Figure 7.11. Initial results indicated a lack of expression in cochlea tissue (Figure 7.11, A, column 8). However, a repeat experiment demonstrated amplification in cochlea tissue but not in skin (Figure 7.11, B, column 5). Amplification using these primers appears to be variable in efficiency; however, over the two experiments amplification was achieved in all tissues tested. At 8.5-9 weeks of embryonic development, dissection of cochlea tissue was technically challenging given the relatively small size of this particular organ. It is unlikely that a clean dissection was obtained, and surrounding epithelial tissue is likely to be present in the sample. This means that positive amplification of transcript in this sample is not definitive evidence of expression in the cochlea and merely suggests that expression is possible in this organ during development.


Figure 7.10 PCR amplification of embryonic cDNA samples. A = GAPDH, B = CLPP and C = GTF2F1. Embryonic sample 1 = Lungs, 2 = Adrenal, 3 = Liver, 4 = Kidney, 5 = Skin, 6 = Heart, 7 = Gonad and 8 = Cochlea.


Figure 7.11 A = Experiment 1: PCR amplification of PCP2 cDNA transcripts in a panel of embryonic samples. 1 = Lungs, 2 = Adrenal, 3 = Liver, 4 = Kidney, 5 = Skin, 6 = Heart, 7 = Gonad and 8 = Cochlea. B = Experiment 2: PCR amplification of GAPDH, CLPP and PCP2 cDNA transcripts for the same panel of embryonic samples.

Expression analysis of CLPP, PCP2 and GTF2F1 in tissues of the inner ear was carried out by the Huentelman group at the Translational Genomics Research Institute in Arizona [311]. Next generation RNA-Seq technology was utilized to characterize the transcriptome of healthy human inner ear tissue from four adult donors. Sequencing was carried out using the Illumina HiSeq 2000 platform, and the data was analysed using the Cufflinks quantitation software. The resulting data was shared with us as part of a collaboration on this project and showed expression of GTF2F1 in most inner ear tissue samples, including ampulla, cochlea, saccule, utricle and vestibular tissue (Figure 7.12). CLPP was expressed in vestibular and saccule tissue, but PCP2 was not detected in any of the tissues of the inner ear.

Figure 7.12 RNA-Seq data indicating expression levels in various tissues of the adult inner ear.

To further assess the expression of each of our genes in the developing ovary, immunohistochemistry (IHC) using horseradish peroxidase and diaminobenzidine (DAB) was carried out. Ovarian tissue was dissected from human embryo M363 (approximately 18 weeks gestation, 20mm foot length) by Dr Andrew Berry. The tissue was embedded and sectioned on a microtome. Commercial antibodies were only available for two of the proteins of interest, ClpP and Rap74 (GTF2F1). For full details of the methods used for IHC staining please see Chapter 2.6. Negative control experiments in which no antibody was added to the tissue section were carried out. For both antibodies specific staining due to protein expression could be demonstrated. Results of the IHC staining can be seen in Figures 7.13 and 7.14. Both Rap74 and ClpP were expressed in the germ cells of embryonic ovarian tissue, with expression being absent for both genes in the stromal layers. Figure 7.13 shows ClpP staining surrounding the nucleus of germ cells, which is in keeping with known protein localization in the mitochondria. Figure 7.14 shows strong Rap74 staining in the nucleus of ovarian germ cells, which is consistent with its function as a general transcription factor.

Figure 7.13 ClpP staining of human embryonic ovarian tissue. Antibody dilution is 1:200. Strong staining can be seen surrounding the nucleus of ovarian germ cells.

Figure 7.14 Rap74 staining of human embryonic ovarian tissue. Antibody dilution is 1:100. Strong staining can be seen in the nucleus of the germ cells.

7.3.4 Next Generation Sequencing of additional Perrault syndrome patients

Whole exome next generation sequencing was carried out on two additional affected individuals from our Perrault syndrome cohort, P6:II:1 and P2:II:2. P6:II:1 is an affected individual from a non-consanguineous family of British origin who has Perrault syndrome with no additional features. Perrault syndrome is an autosomal recessive condition, which means that given the lack of consanguinity compound heterozygous mutations are most likely in this individual, but homozygous mutations should also be considered. Whole exome sequencing of P6:II:1 was carried out as a service at the Beijing Genomics Institute (BGI) in Hong Kong. The exome was captured using the Agilent SureSelect 38Mb Exon Capture array and sequencing was carried out on the Illumina HiSeq 2000 platform. Details of the levels of coverage for sample P6:II:1 can be seen in Table 7.5.

                                     P2:II:2    P6:II:1
Number of bases with 1x coverage     81.50%     95.00%
Number of bases with 10x coverage    58.70%     75.80%
Number of bases with 20x coverage    48.00%     63.00%
Mean coverage                        50x        49x

* Exome size = 38000000bp

Table 7.5 Next generation sequencing coverage data for Perrault syndrome patients.

The resulting data was filtered to identify potential pathogenic changes. The SNP data was filtered using the criteria laid out in Figure 7.1, after which all single heterozygous calls were removed, leaving just possible compound heterozygous and homozygous variants. All intergenic and intronic variants were removed. A search of remaining variants on the NHLBI Exome Sequencing Project Exome Variant Server was carried out and all commonly identified variants were removed. For a full list of remaining single nucleotide variants please see Appendix Table 10.22. The indel data that was provided by the BGI after initial bioinformatic analysis was presented in a different format to the data provided by our Seattle collaborators. Therefore the criteria described in Figure 7.1 could not be applied. Instead all indel variants with a read number of less than 5 were removed from the analysis and remaining variants were included. All intergenic and intronic variants were removed. For a full list of remaining indel variants please see Appendix Table 10.23. Of the remaining candidate genes, 11 contain predicted homozygous or compound heterozygous single nucleotide variants, 1 gene contains a predicted heterozygous deletion and heterozygous single nucleotide variant, and 3 genes contain predicted homozygous or compound heterozygous indel variants. Given that there were no additional family members available for sequencing and no locus had been identified in this patient, it was not possible to identify a single gene responsible for Perrault syndrome in individual P6:II:1.
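The recessive filtering logic described above can be sketched as follows. This is an illustration only and not the pipeline that was used: the record fields, gene names and variant identifiers are hypothetical, the Figure 7.1 quality criteria are not reproduced, and the region and frequency filters are applied before grouping by gene, which is one reasonable ordering and may differ from the order of the original analysis.

```python
from collections import defaultdict

EXCLUDED_REGIONS = {"intergenic", "intronic"}


def filter_snvs(variants, common_ids):
    """Keep genes with a homozygous variant or at least two heterozygous
    variants (possible compound heterozygotes) after removing non-coding
    and common variants."""
    kept = [
        v for v in variants
        if v["region"] not in EXCLUDED_REGIONS and v["id"] not in common_ids
    ]
    by_gene = defaultdict(list)
    for v in kept:
        by_gene[v["gene"]].append(v)

    candidates = {}
    for gene, gene_variants in by_gene.items():
        has_hom = any(v["zygosity"] == "hom" for v in gene_variants)
        n_het = sum(v["zygosity"] == "het" for v in gene_variants)
        if has_hom or n_het >= 2:
            candidates[gene] = gene_variants
    return candidates


def filter_indels(indels, min_reads=5):
    """Drop indels supported by fewer than min_reads reads, and non-coding indels."""
    return [
        v for v in indels
        if v["reads"] >= min_reads and v["region"] not in EXCLUDED_REGIONS
    ]


# Toy input with made-up genes and identifiers.
snvs = [
    {"id": "v1", "gene": "GENE_A", "region": "exonic", "zygosity": "hom"},
    {"id": "v2", "gene": "GENE_B", "region": "exonic", "zygosity": "het"},
    {"id": "v3", "gene": "GENE_B", "region": "exonic", "zygosity": "het"},
    {"id": "v4", "gene": "GENE_C", "region": "exonic", "zygosity": "het"},
    {"id": "v5", "gene": "GENE_D", "region": "intronic", "zygosity": "hom"},
    {"id": "v6", "gene": "GENE_E", "region": "exonic", "zygosity": "hom"},
]
indels = [
    {"id": "i1", "gene": "GENE_F", "region": "exonic", "reads": 12},
    {"id": "i2", "gene": "GENE_G", "region": "exonic", "reads": 3},
    {"id": "i3", "gene": "GENE_H", "region": "intronic", "reads": 20},
]

snv_candidates = filter_snvs(snvs, common_ids={"v6"})
indel_candidates = filter_indels(indels)
print(sorted(snv_candidates))
print([v["id"] for v in indel_candidates])
```

In the toy data, GENE_A is retained as a homozygous candidate and GENE_B as a possible compound heterozygote, while the single heterozygous, intronic and common variants are discarded.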

Whole exome sequencing for sample P2:II:2 was carried out as an in house service at the National Genetics Reference Laboratory (NGRL) at St Mary’s Hospital in Manchester. The exome was captured using the Agilent SureSelect Human All Exon 50MB array and sequenced using the SOLiD 4 system. Data on the coverage for this sample can be seen in Table 7.5. The mean coverage for sample P2:II:2 is actually higher than that of sample P6:II:1. However, more detailed breakdown of the coverage levels revealed that only 48% of bases were covered at 20X, and only just over half of the total number of bases were covered at 10X. This level of coverage is not considered to be high quality which makes analysis of the data unreliable. Given this information it is unsurprising that causative mutations for Perrault Syndrome in this family could not be established.
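The difference between mean coverage and the proportion of bases covered at a given depth can be illustrated with a toy example. The per-base depths below are invented purely for illustration and are not taken from either sample; the function name is hypothetical.

```python
def coverage_summary(depths, thresholds=(1, 10, 20)):
    """Return mean depth and the fraction of bases covered at each threshold."""
    n = len(depths)
    mean_depth = sum(depths) / n
    breadth = {t: sum(d >= t for d in depths) / n for t in thresholds}
    return mean_depth, breadth


# Two made-up sets of per-base read depths with similar mean coverage.
even_depths = [50] * 10
uneven_depths = [0, 0, 5, 5, 8, 12, 20, 50, 150, 260]

for name, depths in (("even", even_depths), ("uneven", uneven_depths)):
    mean_depth, breadth = coverage_summary(depths)
    print(f"{name}: mean {mean_depth:.0f}x, "
          + ", ".join(f">={t}x {frac:.0%}" for t, frac in breadth.items()))
```

The unevenly covered example has a slightly higher mean depth, yet far fewer of its bases reach 10x or 20x, because a small number of very deeply covered positions inflate the mean. A high mean coverage alone therefore does not guarantee that most of the exome can be reliably analysed.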

7.4 Discussion

7.4.1 Next Generation Sequencing of Family P1

Despite the presence of several good candidate genes within the 19p13.3-13.11 locus, no pathogenic mutations were detected using a traditional screening approach in Family P1. As discussed in Chapter 1, next generation sequencing is revolutionising the way in which genetic research is carried out. It is no longer necessary to select candidates and systematically work through hundreds of genes within a defined locus looking for mutations. Whole exome sequencing is a faster and much more efficient method of identifying novel mutations. Whole exome capture and sequencing of affected individual P1:III:3 identified three novel homozygous variants in different genes which segregate with disease in Family P1. High quality data is essential to maximise the likelihood of identifying any given pathogenic mutation. The first attempt at whole genome sequencing of sample P1:III:3 using a fragment library did not identify any of the three variants detected using the exome capture array. Next generation sequencing technology is still relatively new and is continuing to advance at a rapid pace. During the time that I spent in Seattle working on whole genome libraries the SOLiD system 3 was the latest platform. The reagent kits were still incredibly new and the practical challenges that faced us when preparing the libraries became evident from the lack of results obtained. Applied Biosystems have since released the SOLiD system 4 and their most recent system, the 5500 Genetic Analyzer; the kits have been optimised and the workflow is much more efficient and effective. Given the speed with which companies are producing improved versions of sequencing platforms, researchers can expect the efficiency with which data is produced and the overall quality of next generation data to continue to improve.

The three genes with novel missense variants identified using next generation sequencing are CLPP, PCP2 and GTF2F1. CLPP encodes the proteolytic component of human ATP-dependent Clp protease (ClpP). The active protease (ClpXP) is a complex made up of two components, ClpP and ClpX (encoded by the CLPX gene on chromosome 15). The function of this protease complex is to catalyse the unfolding and degradation of misfolded or specially tagged proteins within the mitochondria. Both components of the protein are well conserved throughout evolution and can be found in all eubacterial species. In humans ClpP forms a stable heptamer which interacts with ClpX to form the active protein in the presence of ATP [312]. Germline mutations in CLPP have not been described in humans. However, in 2008, a study of mitochondrial gene expression in a patient with dominant hereditary spastic paraplegia (SPG13) found that expression of ClpP was reduced at the mRNA and protein level in the affected individual when compared to control samples [313]. This link to spastic paraplegia is interesting in relation to Family P1 given that two of the sisters have ataxia and signs of lower limb spasticity. Also relevant to this project is the fact that mitochondrial function has already been shown to be important in the context of Perrault syndrome pathogenesis, as mutations have previously been identified in the mitochondrial protein HARS2 [240].

GTF2F1 encodes the Rap74 subunit of human general transcription factor IIF (TFIIF). TFIIF is an initiator of transcription and acts by recruiting RNA polymerase II to the initiation complex [314]. As well as interacting with RNA polymerase II, Rap74 also binds to other transcription factors such as the androgen receptor (AR). The AR is a steroid hormone receptor which when bound to its ligands testosterone and dihydrotestosterone, regulates the transcription of genes important for male sexual development. Rap74 binding to the AF1 domain of the AR protein causes a conformational change to the protein structure and activation of transcription [315-317]. Additionally in the context of Perrault syndrome, it is important to note that mutations in transcription factors are a known cause of deafness in humans. As discussed in detail in Chapter 1, mutations in transcription factors such as POU3F4 and POU4F3 can cause non syndromic deafness through mis-regulation of gene expression within the specialized structures of the inner ear.

The third missense variant that was detected in individual P1:III:3 was in the gene PCP2, which encodes the Purkinje cell protein 2/L7. Purkinje cell protein 2 is thought to be expressed exclusively in Purkinje cells and in rod bipolar cells of the retina [318]. PCP2 contains a GoLoco motif and is predicted to have guanine nucleotide dissociation inhibitor (GDI) activity [319]. The exact function of Purkinje cell protein 2 is not completely understood but it is thought to play a role in Purkinje cell development or regulation [320]. The localization and predicted role of this protein do not immediately highlight it as a good candidate for Perrault syndrome. However, an alternative name for PCP2 is GPSM4, or G-protein signalling modulator-4. Recently mutations in another member of the G-protein signalling modulator family, GPSM2, were identified in patients with autosomal recessive deafness [321].

Each of the three variants identified in Family P1 is a potential candidate for Perrault syndrome. All three variants segregate with the disease in family members. Multi-species alignment indicates that each of the three residues is well conserved. GTF2F1 p.G443V is the least well conserved and CLPP p.T145P is the most highly conserved. The in silico prediction tool Polyphen2 predicted all three variants to be ‘probably damaging’, but the SIFT in silico algorithm predicted that the GTF2F1 mutation would be ‘tolerated’. This could be due to the fact that the p.G443 residue is the least well conserved of the three in the multi-species alignments, and is not conserved in mice (Figure 7.8). None of the variants were detected in a panel of ethnically matched control samples, or in unrelated Perrault syndrome patients from within our cohort. The results presented in this chapter indicate that one or more of the variants identified using next generation sequencing is the pathogenic cause of Perrault syndrome in Family P1.

One other variant was detected in Family P1 which segregated with disease and at the time was not present on the NCBI SNP database. The c.538C>T variant in CYP4F11 was detected in the heterozygous form in two samples from a panel of ethnically matched controls, and was present in the NHLBI Exome Sequencing Project database. This variant was later annotated and can now be found on the NCBI SNP database too. This project was carried out during a period of fast paced technological advancement. Whole exome/genome sequencing is becoming more and more common, but this was not the case at the start of the project when this technology was still very much in its prototypic years. As more and more people use next generation sequencing as a research tool, the amount of information available about rare variants within the human genome will increase. The c.538C>T variant in CYP4F11 is a good example of the importance of keeping up to date with the latest information available on databases such as NHLBI and NCBI, as this will allow researchers to distinguish pathogenic variants from rare polymorphisms.

7.4.2 Expression Experiments in Family P1

Experiments were carried out to determine expression patterns of the three genes in which novel variants were detected using next generation sequencing. Expression of these genes in tissues which are of particular interest for Perrault syndrome (e.g. ovarian and cochlea tissue) could provide clues as to which variants have the potential to be pathogenic. Tissue samples were collected from a human embryo of approximately 8.5-9 weeks gestation, RNA was extracted and reverse transcription PCR was carried out to obtain cDNA for each tissue. Primers were designed to target the cDNA sequence of each of the three genes, PCP2, GTF2F1 and CLPP, and PCR amplification was carried out in each of the tissues. All three transcripts were expressed in all tissues tested. Cochlea tissue was extracted from the embryo and included in the panel of tissues. However, given the early gestation and the size of the developing cochlea at this stage in development it was technically challenging to obtain a clean dissection. The cochlea tissue was contaminated with surrounding epithelial cells. Expression of all three transcripts was observed in skin tissue, which means that the expression seen in the cochlea samples may be due to epithelial contamination.

In order to better establish expression of CLPP, PCP2 and GTF2F1 in tissues of the inner ear we collaborated with the Huentelman group at the Translational Genomics Research Institute in Arizona. Huentelman used next generation RNA-Seq technology to characterize the transcriptome of the human inner ear. The profiled tissue came from four donors undergoing surgery to remove a tumour located close to the inner ear, but the tissue that was profiled was itself healthy. The sequencing was carried out using the Illumina HiSeq 2000 platform and data was analysed using the Cufflinks quantitation software. The results of the RNA-Seq analysis showed expression of GTF2F1 in most inner ear tissue samples, including ampulla, cochlea, saccule, utricle and vestibular tissue. Lower levels of CLPP expression were detected in vestibular and saccule tissue, but PCP2 was not detected in any of the tissues of the inner ear.

Expression of two of the proteins of interest within embryonic ovarian tissue was established using immunohistochemistry of sectioned tissue. Ovarian tissue was collected from a human embryo M363 of approximately 18 weeks gestation (20mm foot length). Commercial antibodies were available from Sigma Aldrich for ClpP and Rap74 (GTF2F1). The results of this staining indicate that both proteins are expressed in germ cells at this stage in ovarian development. As expected ClpP appeared to localize to the mitochondria and Rap74 appeared to be most strongly expressed in the nucleus of the germ cells.

The three different methods of expression profiling indicate that all three proteins are important during early embryonic development. This was shown by the expression of all three in a panel of tissue samples using RT-PCR. GTF2F1 was shown to have the most prominent expression in the tissues of the adult ear using RNA-Seq analysis. This highlights GTF2F1 as a potential candidate for the sensorineural hearing loss in Perrault syndrome. Expression of Rap74 protein was also shown using immunohistochemistry (IHC) in the germ cells of embryonic ovarian tissue. This means that GTF2F1 is also a reasonable candidate for the ovarian dysgenesis phenotype of Perrault syndrome. However, expression of ClpP was also detected by IHC in developing germ cells, and some limited expression of CLPP was detected in the tissues of the adult inner ear, although not in the cochlea. Analysis of Pcp2 expression in the developing ovary was not possible as no commercial antibodies were available for IHC. However, there was no detection of PCP2 transcript in the RNA-Seq analysis of human inner ear tissue, which reduces the likelihood that this gene is causing the obligatory deafness of Perrault syndrome. Overall the expression data generated highlights GTF2F1 as the most promising candidate for Perrault syndrome in this family, and PCP2 as the least promising candidate. Further work is needed to identify which variant is pathogenic, and it remains a real possibility that a combination of these three variants is causing this complex phenotype.

7.4.3 Whole Exome Sequencing of Patients P6:II:1 and P2:II:2

Next generation sequencing of two additional unrelated patients from our Perrault syndrome cohort was carried out. Sequencing of sample P2:II:2 did not generate sufficient coverage across the exome and analysis of resulting data could not be relied upon to identify pathogenic variants. As such this line of investigation could not be pursued.

Sequencing of sample P6:II:1 was carried out as a service by BGI in Hong Kong. The data was analysed as previously described and potential pathogenic variants were identified. Family P6 is non-consanguineous and mapping of a disease locus had not been carried out. In addition, only one affected individual was available for sequencing and no other family members. This means that isolating and identifying a single candidate gene was not possible. However, analysis of the data has produced a list of candidate genes which will require further investigation; these can be seen in Appendix Tables 10.22 and 10.23. The analysis of data from sample P6:II:1 is a good example of one of the biggest limitations of using next generation sequencing: sequencing of one affected individual is not always enough to identify the pathogenic mutations. Sequencing of several affected individuals from the same family would increase the chances of identifying a pathogenic mutation, and linkage mapping or autozygosity mapping to identify a disease locus can also greatly increase the chances. Perrault syndrome is a particularly heterogeneous disorder, as demonstrated by the identification of mutations in two unrelated genes in individual families [239, 240, 271]. This means that comparison of sequencing data from unrelated individuals with Perrault syndrome is less likely to identify a single causative gene. However, for other less heterogeneous syndromes this would be a practical and effective strategy of gene identification.

7.5 Conclusion

The aim of this chapter was to use next generation sequencing to identify pathogenic mutations in Perrault syndrome patients. Three novel missense variants were identified in the 19p13.3-13.11 locus in affected members of Family P1: CLPP c.433A>C, GTF2F1 c.1328G>T and PCP2 c.392C>G. Expression analysis has indicated that each of the variants detected has the potential to be pathogenic for this disorder and it remains a possibility that a combination of these variants is responsible for the complex phenotype observed in Family P1. Based on the evidence presented in this chapter it appears unlikely that PCP2 mutations could be responsible for sensorineural hearing loss, which is a defining characteristic of Perrault syndrome. However, a role in ovarian development cannot be ruled out without further work and, given the localisation of this protein in Purkinje cells, it is possible that the PCP2 variant could be responsible for the severe neurological features observed in Family P1. There are no informative animal models available for any of the three candidate genes. At this stage in the investigation it is not possible to determine which variant or combination of variants is responsible for causing Perrault syndrome; however, it is highly likely that one or more of these three genes is vital for hearing and endocrine function in humans. Further work will be needed to clarify the exact role of each of the three proteins identified within the context of this disorder.

CHAPTER 8. HYPOGONADOTROPIC HYPOGONADISM INVESTIGATIONS

Chapter 8. Hypogonadotropic Hypogonadism syndrome Investigations

8.1 Introduction

In Chapter 5 autozygosity mapping was used to identify a disease locus at 3p22.1-p21.2 in a large consanguineous family with a novel hypogonadotropic hypogonadism (HH) syndrome. The mapping defined a 13.1Mb locus containing 217 protein coding genes. The aim of the following chapter is to investigate potential candidate genes within this locus and identify the pathogenic cause for this novel syndrome.

I decided to undertake functional investigation of any potential disease causing mutations. One method of learning more about disease pathogenesis is through the use of animal models. Zebrafish are used as a model organism for a wide range of human disorders including cancer, liver disease, cardiovascular disease and neurodegenerative disorders. Zebrafish represent an attractive model for human disease for two main reasons: simplicity and speed. Zebrafish lay large numbers of externally fertilized eggs which develop rapidly. The eggs are transparent and easily manipulated, allowing researchers to study gene expression and function through early embryonic development. Within one day of fertilization, development of the central nervous system can be visualized without the need for dissection. In addition, the zebrafish genome shows a high degree of synteny across vertebrates and has 50-80% homology with many human sequences. However, the zebrafish genome is still not fully annotated and identifying human homologues can be difficult because many genes are duplicated [322]. Despite this, zebrafish have been used successfully to expand current knowledge about a variety of rare genetic disorders and determine pathogenicity of rare variants. An example of one rare autosomal recessive disorder in which a zebrafish model has been informative is MEDNIK syndrome. MEDNIK syndrome is characterized by mental retardation, enteropathy, deafness, neuropathy, ichthyosis and keratodermia [323]. Mutations in the gene AP1S1 are the pathogenic cause of MEDNIK syndrome in humans, and a zebrafish model was used to demonstrate the altered function of the human protein in affected individuals. The knockdown model was created using antisense morpholino (MO) technology [324]. Morpholinos are short chain oligos consisting of approximately 25 subunits. Each subunit is made up of a nucleic acid base with a morpholine ring, and subunits are linked via non-ionic phosphorodiamidate bonds. Morpholinos can block translation initiation or can affect pre-mRNA splicing via steric blocking. Morpholino based knockdown of Ap1s1 caused a severe morphological and behavioural phenotype in zebrafish embryos, including defects in skin and fin formation. The morphant phenotype was rescued by injection of human wildtype AP1S1 mRNA, but could not be rescued with mRNA containing the most common mutation found in MEDNIK patients, a splice site mutation leading to loss of exon 3 [324]. This is a good example of the use of this model organism for demonstrating loss of function of a human mutation, and highlighting which developmental pathways may be important in disease pathogenesis.

8.2 Aim

The aim of the work in this chapter is to identify pathogenic mutations in the previously identified 3p22.1-p21.2 locus in Family HH1. This chapter also aims to investigate any variants detected using relevant techniques.

8.3 Results

8.3.1 Sanger sequencing of candidate genes within the 3p22.1-p21.2 locus

Within the 3p22.1-p21.2 locus one deafness gene was identified as a good candidate for this disorder. The gene TMIE lies within the mapped locus and mutations in this gene have previously been described in patients with recessive hearing loss. Primers were designed to amplify all coding exons and exon-intron boundaries. Traditional Sanger sequencing techniques were used; for a more detailed description of the methods and materials please see Chapter 2.2.7. No novel mutations were detected in the coding regions of TMIE.

An additional candidate gene of interest within the 3p22.1-p21.2 locus was BSN. BSN encodes the pre-synaptic protein Bassoon, which is approximately 420KDa in size [325, 326]. Bassoon is a scaffolding protein and plays a vital role in the correct organization of the cytoskeleton at the pre-synaptic cytomatrix at the active zone (CAZ). Bassoon shares significant sequence similarity with another large scaffolding protein known as Piccolo, and both are conserved throughout evolution in vertebrate species [327]. At the synapse of neuronal cells the carefully controlled release of neurotransmitter occurs at a specialized region known as the active zone. The active zone is tightly associated with the plasma membrane and contains a variety of molecular components which form the CAZ. Bassoon and Piccolo are two structural components of the CAZ which help to organize this important structure [328]. Studies of a mouse model lacking the central region of the bassoon protein revealed that bassoon is crucial for anchoring to the CAZ. Investigation of hippocampal sections also suggested that a significant fraction of the glutamatergic synapses are functionally inactive in mutant mice [328]. Bassoon has also been shown to be present in ribbon synapses. Ribbon synapses are highly specialized to cope with precise and reliable high throughput neurotransmitter release. The ribbons are specialized versions of the CAZ. They contain all of the CAZ specific proteins, including Bassoon, and are found in the photoreceptor and bipolar cells of the retina and the hair cells of the inner ear [329]. In the Bsn mutant mice discussed previously, the synaptic ribbons of photoreceptor cells in the retina are not associated with the active zone as they are in wildtype cells. Instead they are found floating freely in the pre-synaptic terminal, highlighting an important role for Bassoon in anchoring ribbons to the cell membrane [330]. Impaired anchoring of ribbons in Bsn mutants was also shown in the hair cells of the cochlea. This led to hearing impairment, as synchronized auditory signalling was defective in homozygous mutants [331]. This auditory phenotype in the mouse model is of particular interest given the hearing loss in Family HH1. Also of interest in relation to the phenotype of Family HH1 is the presence of Bassoon at the neurosecretory active zones of GnRH releasing neurons [332]. As discussed in Chapter 1.2, carefully controlled pulsatile release of gonadotropin releasing hormone (GnRH) is vital for reproductive function and sexual development in humans.

Primers were designed to amplify all coding exons and exon-intron boundaries (Figure 8.1). For primer sequences please see Appendix Table 10.24. Sequencing identified a homozygous missense mutation in exon 6 of the BSN gene (Figure 8.2). The novel variant is c.10081C>T, which predicts the amino acid substitution p.R3361W in the protein sequence. This variant is not present in the NCBI SNP database or the NHLBI exome variant database (last accessed 29 April 2012, when it contained 5379 exomes), and in a screen of 284 ethnically matched controls it was found in only 1 sample, in the heterozygous state.
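The relationship between the cDNA position and the predicted residue change can be checked with a short calculation. The sketch below is illustrative only and was not part of the original analysis; the function name and the reduced codon table are introduced here for the example, and the reference codon it reports is an inference from the standard genetic code, not a sequence taken from the BSN transcript.

```python
# Map a coding-DNA (c.) position to its codon number and position within the codon.
def codon_of(cdna_pos: int) -> tuple[int, int]:
    """Return (codon number, position within codon), both 1-based."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1

# Only the codons needed for this check (standard genetic code).
CODON_TABLE = {
    "CGT": "R", "CGC": "R", "CGA": "R", "CGG": "R",
    "TGT": "C", "TGC": "C", "TGA": "*", "TGG": "W",
}

codon_number, offset = codon_of(10081)

# Apply a C>T change at that offset to each CGN arginine codon and keep
# the reference codons for which the mutant codon encodes tryptophan (W).
compatible = [
    codon for codon in ("CGT", "CGC", "CGA", "CGG")
    if CODON_TABLE[codon[:offset - 1] + "T" + codon[offset:]] == "W"
]

print(codon_number, offset, compatible)  # 3361 1 ['CGG']
```

Position c.10081 is the first base of codon 3361, which agrees with the p.R3361W annotation. Under the standard genetic code, a C>T change at the first base of an arginine codon gives tryptophan only if the reference codon is CGG.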

Figure 8.1. Representation of human genomic BSN.


Figure 8.2. Chromatograms showing the c.10081C>T variant identified in members of Family HH1. Pedigree shows segregation of the variant with disease.

Gene  Variant   Polyphen2 HumDiv score     Polyphen2 HumVar score     SIFT prediction
BSN   p.R3361W  Probably Damaging (1.000)  Probably Damaging (0.997)  DAMAGING (0)

Table 8.1. In silico predictions for the p.R3361W BSN variant.


Figure 8.3. Multi species alignment carried out using ClustalWeb. The arginine residue at position 3361 is conserved in vertebrate species. The first sequence in the alignment labelled BSN is human, the sequence labelled x is Xenopus, all other species are labelled in full.

In order to provide additional evidence that the BSN c.10081C>T variant is the pathogenic cause of the phenotype described in Family HH1, whole exome next generation sequencing of one affected family member was carried out. The purpose of this experiment was to determine whether any other potentially pathogenic variants were present within the 3p22.1-p21.2 locus. Whole exome sequencing was carried out as a service at the Beijing Genomics Institute (BGI) in Hong Kong. The exome was captured using the Agilent SureSelect 38Mb Exon Capture array and sequencing was carried out on the Illumina HiSeq 2000 platform.

Sample                             HH1:III:4
Number of bases with 1x coverage   96.10%
Number of bases with 10x coverage  80.20%
Number of bases with 20x coverage  68.00%
Mean coverage                      54x

Table 8.2. Coverage data for whole exome sequencing of sample HH1:III:4

Whole exome sequencing of affected individual HH1:III:4 did not identify any other potential pathogenic variants within the 3p22.1-p21.2 locus. The novel variants identified within the mapped region are given in Table 8.3.


Only two coding variants were identified: the first is a synonymous change in the gene RPL29, and the second is the previously identified BSN variant which predicts p.R3361W. This region of chromosome 3 was mapped using Affymetrix array technology and is known to be homozygous throughout. Heterozygous variants were detected in this region using whole exome sequencing; however, the read numbers and quality scores for these variants are low, indicating that they are false positive results (Table 8.3).

Chr    Genomic position  Ref  Genotype  Predicted zygosity  Gene name  Codon  Residue change  Function
chr 3  41952994          A    G/G       Hom                 ULK4       -      -               Intron
chr 3  44872315          G    A/A       Hom                 KIF15      -      -               Intron
chr 3  45588774          T    T/G       Het                 LARS2      -      -               Intron
chr 3  47155641          C    T/T       Hom                 SETD2      -      -               Intron
chr 3  47318755          A    G/G       Hom                 KIF9       -      -               Intron
chr 3  47777763          T    C/C       Hom                 SMARCC1    -      -               Intron
chr 3  48266044          A    A/G       Het                 CAMP       -      -               Intron
chr 3  49699359          C    T/T       Hom                 BSN        3361   R=>W            Missense
chr 3  52021129          G    A/A       Hom                 ACY1       -      -               Intron
chr 3  52027876          G    G/T       Het                 RPL29      123    A=>A            Synon
chr 3  53774195          T    T/G       Het                 CACNA1D    -      -               Intron
chr 3  53814310          G    G/A       Het                 CACNA1D    -      -               Intron

Table 8.3. Next Generation exome data for 3p22.1-p21.2 locus in affected individual HH1:III:4.

Given the novel combination of features seen in Family HH1 it was not possible to screen for mutations in additional families with the same phenotype. This syndrome is likely to be rare; an estimate for population frequency of this mutation in the south Asian population is 3 in a million. Attempts were made to ascertain additional families with the same phenotype but none could be identified, which may possibly be explained by the low population frequency.
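The derivation of the 3 in a million figure is not given in the text. One calculation that reproduces it, shown here purely as an illustrative assumption, is to estimate the allele frequency from the control screen (1 heterozygous carrier among 284 controls) and to apply Hardy-Weinberg proportions to obtain the expected frequency of homozygotes. Variable names are introduced for the example only.

```python
# Assumed inputs, taken from the control screen described earlier in this chapter.
controls = 284          # ethnically matched control individuals
het_carriers = 1        # controls heterozygous for c.10081C>T

# Each control contributes two chromosomes.
chromosomes = 2 * controls
allele_freq = het_carriers / chromosomes       # q

# Under Hardy-Weinberg equilibrium the homozygote frequency is q squared.
homozygote_freq = allele_freq ** 2
per_million = homozygote_freq * 1_000_000

print(f"q = {allele_freq:.5f}, homozygotes per million = {per_million:.1f}")
# q = 0.00176, homozygotes per million = 3.1
```

An estimate based on a single carrier has a very wide confidence interval, and Hardy-Weinberg proportions underestimate homozygote frequency in populations with consanguinity, so the figure is best read as an order of magnitude.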


Hypogonadotropic hypogonadism is a distinct feature of the syndrome described in Family HH1, and for this reason BSN was sequenced in a cohort of 109 idiopathic hypogonadotropic hypogonadism (IHH) patients to establish whether variants in BSN could cause IHH without the additional syndromal features. In total 38 unique variants were identified in this cohort: 12 synonymous variants, 24 non-synonymous variants, 1 duplication and 1 deletion. Synonymous variants were discarded, leaving the 26 variants summarised in Table 8.4, all of which were heterozygous. All single nucleotide substitutions were checked against the NHLBI exome variant database, and frequency data for each of these variants are given in Table 8.4. Of the 109 samples screened only three had two heterozygous variants; these are highlighted in different colours in Table 8.4. The first sample contained a novel duplication, p.Thr577_Pro598dup, and a p.P885S substitution. Although the duplication appears to be novel, this was not the case for the p.P885S substitution: the T allele was seen at a frequency of 1.2% in the NHLBI exome variant database cohort, so this substitution is most likely a rare SNP. The second sample contained two substitutions, p.A2475T and p.P620L. Sequencing of the parents of this individual revealed that both variants were transmitted on the maternal allele, ruling out compound heterozygous inheritance. The final sample with more than one BSN variant was one of two affected siblings. One of the affected siblings had two substitutions, p.P1249R and p.K2697E. However, the other sibling had only the p.P1249R variant and not p.K2697E, meaning that these variants did not segregate with disease in this family.
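The allele frequencies quoted from the exome variant database can be reproduced from the genotype counts listed in Table 8.4. The following is a minimal sketch of that arithmetic for p.P885S (TT=0/TC=131/CC=5247); the function name is introduced here for illustration.

```python
def minor_allele_frequency(hom_minor: int, het: int, hom_major: int) -> float:
    """Allele frequency from genotype counts (two alleles per individual)."""
    total_alleles = 2 * (hom_minor + het + hom_major)
    minor_alleles = 2 * hom_minor + het
    return minor_alleles / total_alleles

# Genotype counts for c.2653C>T (p.P885S) as listed in Table 8.4.
p885s_t_freq = minor_allele_frequency(hom_minor=0, het=131, hom_major=5247)

print(f"{p885s_t_freq:.1%}")  # 1.2%
```

The same function can be applied to the other rows of Table 8.4 that report genotype counts.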


Sample            BSN variant       Zygosity  Type       Amino acid substitution  Number of samples with variant  Found in EVS database
Individual A-C    c.863C>T          Het       Non synon  p.P288L                  3                               Y (TT=0/TC=23/CC=5320)
Individual D      c.1795_1861dup66  Het       Indel      p.Thr577_Pro598dup       1                               N/A
Individual D      c.2653C>T         Het       Non synon  p.P885S                  1                               Y (rs150021639) (TT=0/TC=131/CC=5247)
Individual E      c.2160C>G         Het       Non synon  p.S720R                  1                               N
Individual F      c.3695C>T         Het       Non synon  p.T1232I                 1                               Y (rs112787310)
Individual G      c.3509G>A         Het       Non synon  p.G1170D                 1                               N
Individual H      c.4445C>T         Het       Non synon  p.P1482L                 1                               Y (rs145880557) (TT=1/TC=32/CC=5346)
Individual I      c.4679C>T         Het       Non synon  p.T1560M                 1                               N
Individual J      c.6409G>A         Het       Non synon  p.A2137T                 1                               N
Individual K      c.7241G>A         Het       Non synon  p.R2414Q                 1                               Y (rs148674638) (AA=0/AG=14/GG=5351)
Individual L      c.7541C>G         Het       Non synon  p.A2514G                 1                               N
Individual M      c.7969A>G         Het       Non synon  p.T2657A                 1                               N
Individual N      c.8528T>C         Het       Non synon  p.M2843T                 1                               N
Individual O      c.8837G>A         Het       Non synon  p.R2946Q                 3                               Y (rs116113662) (AA=0/AG=105/GG=5265)
Individual P      c.9220G>A         Het       Non synon  p.G3074R                 1                               Y (rs147485557) (AA=0/AG=40/GG=5335)
Individual Q      c.9632C>A         Het       Non synon  p.A3211D                 1                               Y (rs146448461) (AA=0/AC=16/CC=5363)
Individual R      c.9919A>G         Het       Non synon  p.S3307G                 1                               Y (rs145931220) (GG=0/GA=17/AA=5362)
Individual S      c.10991G>A        Het       Non synon  p.R3664Q                 1                               Y (rs140013456) (AA=0/AG=41/GG=5330)
Individual T      c.11014C>G        Het       Non synon  p.R3672G                 1                               N
Individual U      c.1064C>T         Het       Non synon  p.A355V                  1                               Y (rs150570799) (TT=0/TC=1/CC=5378)

Table 8.4. Non-synonymous BSN variants detected in a cohort of IHH patients.


Sample            BSN variant       Zygosity  Type       Amino acid substitution  Number of samples with variant  Found in EVS database
Individual V      c.1063G>A         Het       Non synon  p.A355T                  1                               Y (rs148125466) (AA=1/AG=36/GG=5342)
Individual W      c.7423G>A         Het       Non synon  p.A2475T                 1                               N
Individual W      c.1859C>T         Het       Non synon  p.P620L                  1                               N (rs185089105, no freq data)
Individuals X, Y  c.3746C>G         Het       Non synon  p.P1249R                 2                               N
Individual X      c.8089A>G         Het       Non synon  p.K2697E                 1                               N
Individual Z      c.186_203del      Het       Indel      p.Pro63_Gly68del         1                               N/A

Table 8.4. Continued.

8.3.2 Zebrafish BSN knockdown model

In order to further investigate BSN, attempts were made to create a knockdown zebrafish model of disease using morpholino technology. If a distinct and reproducible phenotype could be established, the model could potentially be used in rescue experiments and as a method of assessing the pathogenicity of rare variants such as the c.10081C>T variant identified in Family HH1. The zebrafish genome Zv9 on the Ensembl Genome Browser contains two copies of the bsn gene. The first bsn gene is on chr8:55,800,321-55,825,214 (Zv9) and shares 42% identity with the human protein sequence. From here on this gene will be referred to as bsna. The second gene is on chr11:38,359,899-38,373,594 (Zv9) and shares 44% identity with the human sequence; from here on it is referred to as bsnb. As mentioned previously, the annotation of the zebrafish genome is incomplete and the start codons for the bsna and bsnb genes are unknown. This may explain why both genes currently show less than 50% identity with the human sequence. Translation blocking MOs cannot be designed for these genes because the position of the start codon is still unknown. Three splice site blocking MOs were designed, two for bsna (BSNa ex10 = AAATGAATTACCCATACCTGACTCT, BSNa ex13 = TATATTGCCCATAAGAACTAGCCGT) and one for bsnb (TAAATTCAGCACCAGTTACCTTCAC), using the genomic sequence provided by Ensembl (Figure 8.4).
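Because morpholinos are antisense oligos, the sense-strand sequence each one is expected to bind can be recovered by taking its reverse complement. The sketch below illustrates this for the three oligos listed above. It assumes each oligo is written 5' to 3' and is perfectly complementary to its target; the dictionary and function names are introduced for the example and do not come from the original methods.

```python
# Complement lookup for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

# Morpholino sequences as given in the text.
MORPHOLINOS = {
    "BSNa ex10": "AAATGAATTACCCATACCTGACTCT",
    "BSNa ex13": "TATATTGCCCATAAGAACTAGCCGT",
    "BSNb": "TAAATTCAGCACCAGTTACCTTCAC",
}

for name, oligo in MORPHOLINOS.items():
    target = reverse_complement(oligo)
    # The sense-strand target can then be searched for in the genomic sequence.
    print(f"{name}: {len(oligo)} bases, expected sense-strand target {target}")
```

Each oligo is 25 bases long, in line with the typical morpholino length described in Section 8.1. Searching the genomic sequence for the printed target is one way to confirm that an oligo spans the intended exon-intron boundary.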

Figure 8.4. Bsn morpholino binding sites. Exons are shown in capital letters and introns are shown in lower case. Position of morpholino binding is shown in red.

To produce eggs, male and female adult zebrafish were paired in the evening and eggs were collected the following morning within 1 hr of the aquarium lights being switched on. Knockdown was carried out via microinjection of the morpholino antisense oligos (GeneTools, LLC, Philomath, OR, USA) described above (Figure 8.4) into embryos at the 1 or 2 cell stage. Microinjection was optimized to deliver 1nl of MO per injection with a 10msec injection time. Functional effects were observed at 1 day post fertilization (1dpf) and 2 days post fertilization (2dpf).

Initial optimization was carried out for each MO individually. A range of doses from 100μM-750μM was tested for toxicity and morphant phenotype. Morpholinos were found to be effective and non-toxic at 750μM. The concentration of a 750μM BSNa ex10 MO solution was calculated as 6.27ng/nl, meaning that 6.27ng was delivered with each injection. The concentration of a 750μM solution of BSNb MO is 6.25ng/nl, which means that 6.25ng of morpholino was delivered with each injection. The BSNa ex10 MO (750μM) and BSNb MO (750μM) were co-injected as a BSN MO mix (approximately 12.52ng of MO delivered in total in a 1nl injection). Sequencing of zebrafish cDNA using primers designed with Primer 3 software (see Appendix Table 10.25 for primer sequences) identified errors in the Ensembl genomic sequence. The annotations of the exon-intron boundaries were incorrect in the region surrounding exon 13, meaning that the BSNa ex13 morpholino was not suitable for use.
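The conversion between molar concentration and mass delivered per injection can be sketched as follows. The molecular weights of the oligos are not stated in the text, so the example back-calculates the values implied by the quoted concentrations; these implied weights and the function names are assumptions made for illustration only.

```python
def implied_molecular_weight(ng_per_nl: float, conc_um: float) -> float:
    """Molecular weight (g/mol) implied by a mass and a molar concentration.

    1 uM of a 1 g/mol compound is 1e-6 g/L, and 1 g/L equals 1 ng/nl.
    """
    return ng_per_nl / conc_um * 1e6

def mass_delivered_ng(ng_per_nl: float, volume_nl: float) -> float:
    """Mass of morpholino delivered by one injection."""
    return ng_per_nl * volume_nl

# Mass concentrations of the 750 uM solutions, as quoted in the text.
stocks_ng_per_nl = {"BSNa ex10": 6.27, "BSNb": 6.25}

for name, conc in stocks_ng_per_nl.items():
    mw = implied_molecular_weight(conc, conc_um=750)
    print(f"{name}: implied molecular weight of about {mw:.0f} g/mol")

# Co-injection of both MOs in a single 1 nl injection.
total_ng = sum(mass_delivered_ng(c, volume_nl=1.0) for c in stocks_ng_per_nl.values())
print(f"Total delivered per 1 nl injection: {total_ng:.2f} ng")  # 12.52 ng
```

The implied molecular weights are approximately 8360 and 8333 g/mol, which is a plausible range for a 25-base morpholino.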

In wildtype embryos at 24hrs development, formation of the structures of the brain can be seen. This is shown in Figure 8.5, taken from [333]. At this stage in development clear definition of the forebrain, mid-brain and hind-brain can be expected in healthy wildtype embryos (Figure 8.6).

Figure 8.5. Development of the zebrafish brain at 24 hours post fertilization. At this stage in development many of the structures of the brain can clearly be seen including the telencephalon (T), diencephalon (D), epiphysis (E), dorsal mid-brain (M), cerebellum (C), floor plate (FP) and hind-brain rhombomeres (r1-7). Image taken from [333].


Figure 8.6. Photograph showing transverse brain subdivision at 29 hours post fertilization. The mid-brain (M) is clearly separated from the hind-brain (H) by the cerebellum (C) at the mid/hind brain boundary. Taken from [333].

A morphant phenotype was observed in knockdown embryos (Figures 8.9 and 8.10). At 1dpf the morphant embryos appeared to have necrosis of the brain, and definition of the neural structures could not clearly be observed (Figure 8.7). At 2dpf morphant embryos could be categorized as being mild or extreme (Figure 8.8). The same definitions were used to characterize all injected batches. The phenotype was considered mild if embryos had either a severely reduced head size (particularly around the fore and mid brain region) with normal or slightly small eyes, or small or underdeveloped eyes with only a slightly reduced or normal head size. Pigmentation was normal or only slightly reduced in the mild phenotype. The phenotype was described as extreme if embryos had a severely reduced head size (particularly around the fore and mid brain region) and very small eyes, with or without signs of necrosis or reduced pigmentation. Severe non-specific developmental defects were also seen, in which no clear formation of embryonic structures could be observed, often coupled with severe oedema. These severe defects of embryogenesis were also seen in uninjected and mock embryos and were not considered to be part of the morphant phenotype.


Figure 8.7. Phenotype of BSN MO mix injected embryos at 1dpf. Morphant embryos have necrosis and less defined neural structures than mock injected embryos at the same developmental stage.

Figure 8.8. Phenotype of BSN MO mix injected embryos at 2dpf. At this stage in development MO injected embryos can be divided into two groups based on the severity of the morphant phenotype, including head and eye size.


Figure 8.9. Charts showing survival results and phenotype data for two representative batches of morpholino injections. Expected percentage of death in uninjected embryos is approximately 10% and approximately 30% in injected embryos. Death rate in batch 1 was higher than expected, indicating that a sub-optimal batch of eggs had been laid.


Figure 8.10. Charts showing results at 2dpf of two representative batches of zebrafish morpholino injections.

In order to establish if splicing of zebrafish bsn transcript had been disrupted by the BSNa ex10 MO and the BSNb MO, RNA was extracted from two batches of zebrafish which were injected with MO mix. RNA was extracted at 2dpf from mock injected embryos, uninjected embryos, and MO injected embryos which had been categorized into mild, extreme and normal phenotype as described previously. RNA was converted to cDNA and primers were designed to amplify bsn transcript around the MO binding sites.

As shown in Figure 8.11, splicing was disrupted for both MOs. Wildtype transcript was present in all samples but in MO injected samples additional bands representing mis-spliced transcripts can be seen. Bands were separated using gel electrophoresis and extracted from the agarose gel.


Sequencing of the smaller band produced by the BSNa ex10 MO revealed a deletion of 42bp caused by mis-splicing of exon 10. Sequencing of the products of the BSNb MO revealed two mis-splicing events. The first resulted in a loss of 64bp from the cDNA transcript at the end of exon 9 and is predicted to cause a frameshift and premature termination of the protein sequence. Sequencing of the second mis-spliced product revealed that the entire intron between exon 9 and exon 10 had been retained. No mis-spliced products were detected in mock or uninjected embryos. Amplification was also carried out under the same conditions using negative reverse transcriptase (RT) samples produced during the RNA to cDNA conversion. These were used as negative controls to demonstrate that there was no contamination from genomic DNA. Please see Appendix Table 10.25 for primer sequences.
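The predicted consequence of each deletion for the reading frame follows from whether its length is a multiple of three. A minimal sketch of this check is given below; the function name is introduced for illustration, and the check considers deletion length only, not the sequence context.

```python
def frame_consequence(deleted_bp: int) -> str:
    """Classify a coding-sequence deletion by its effect on the reading frame."""
    if deleted_bp % 3 == 0:
        return f"in-frame (loss of {deleted_bp // 3} codons)"
    return "frameshift"

# Deletion sizes observed in the mis-spliced cDNA products.
observed = {"BSNa ex10 MO": 42, "BSNb MO": 64}

for morpholino, size in observed.items():
    print(f"{morpholino}: {size}bp deletion is {frame_consequence(size)}")
# BSNa ex10 MO: 42bp deletion is in-frame (loss of 14 codons)
# BSNb MO: 64bp deletion is frameshift
```

The 42bp deletion therefore leaves the downstream reading frame intact, whereas the 64bp deletion does not, which is consistent with the frameshift predicted for the BSNb product.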

Figure 8.11. Splicing of cDNA after morpholino injections. 1 = batch 1 ‘extreme’, 2 = batch 1 ‘mild’, 3 = batch 1 ‘normal’, 4 = batch 2 ‘extreme’, 5 = batch 2 ‘mild’, 6 = batch 2 ‘normal’, 7 = mock, 8 = uninjected.

Evidence from the RT PCR experiments demonstrated that the BSNa ex10 and BSNb MOs were disrupting splicing of zebrafish bsn transcript. However, wildtype transcript was still present in MO injected embryos, which means that some wildtype protein could be produced. In order to demonstrate reduced protein levels in MO injected embryos a zebrafish specific antibody is required. There are no antibodies for zebrafish bassoon available commercially. During the course of this project we have collaborated with the Gundelfinger group based at the Leibniz Institute for Neurobiology, Magdeburg, Germany. This group are responsible for creating the Bsn knockout mouse model discussed earlier and have spent years investigating the function of this protein in the synaptic active zone. The group kindly allowed us to try one of the antibodies (rb2BSN1) which they routinely use for western blotting of mouse protein, to discover whether it would cross react with the zebrafish bassoon protein.

Initially rb2BSN1 was optimized using protein isolated from whole mouse brain. Bands of the correct size for murine Bsn were easily detected at a primary antibody dilution of 1:1000 (Figure 8.12). For full details of methods used please see Chapter 2.

Figure 8.12. Western blot on the left showing Bsn expression in mouse synaptic junction protein preparations; image taken from [326]. Two major protein bands of >400KDa and 350KDa are detected. Mature Bassoon is thought to migrate at >400KDa and other bands represent degradation products. Lanes 1-7= mouse synaptic junctional protein preparations, Lane 7’ = a short exposure of lane 7. Western blot on the right shows successful detection of Bassoon using antibody rb2BSN1 (1:1000) in our lab in whole mouse brain protein extract.


Figure 8.13. Western blot showing optimization of antibody rb2BSN1 at 1:1000 dilution in zebrafish protein. A single band was detected in zebrafish protein extract at approximately 240KDa.

The predicted size of the bsna peptide is 415KDa and the bsnb peptide is predicted to be 342KDa on the Ensembl website. The band detected by rb2BSN1 is approximately 240KDa in size (Figure 8.13). This is smaller than predicted and may indicate that rb2BSN1 is not cross reacting with bassoon protein in zebrafish and is binding to an alternative protein. However, the antibody was highly specific, detecting only one band in the zebrafish samples. In addition, as discussed previously, annotation of the zebrafish genome is incomplete and the start codons for bsna and bsnb are as yet unidentified, which means that the estimated protein sizes given by the Ensembl website may not be accurate.

In order to investigate this further, protein was extracted from zebrafish embryos at 1dpf. Extractions were carried out using uninjected, mock injected and MO mix injected embryos. Figure 8.14 shows a western blot using rb2BSN1. Initial results indicated that protein levels were reduced in MO injected morphant zebrafish at 1dpf (Figure 8.14 A). However, when this experiment was repeated with tubulin as a loading control (TAT1 mouse anti tubulin was given as a kind gift by the Lowe group, Faculty of Life Sciences, University of Manchester, and was used at 1:1000 dilution), no difference in protein level could be seen between mock and MO injected samples (Figure 8.14 B). The results of the western blotting experiments using antibody rb2BSN1 were therefore inconclusive. The evidence presented here suggests that rb2BSN1 may not be suitable for detecting bassoon in zebrafish. It is clear from the RT PCR experiments that splicing of bsn transcripts has been disrupted by the morpholino injection; however, to confirm knockdown of protein in injected embryos an alternative antibody would be required.

Figure 8.14. A. Western blot using rb2BSN1. Expression of protein in MO injected embryos appears to be reduced compared to wildtype and mock injected embryos. B. Blot showing a repeat western experiment using rb2BSN1. No significant difference in expression of mock and MO injected embryos could be seen. TAT1 (Tubulin) loading control in the same gel. Tubulin could not clearly be visualised in the mouse or wildtype sample, thought to be due to a bubble during transfer. Loading of mock and MO injected samples appears equal.

The co-injection of two morpholinos at 750μM results in an overall morpholino dose of approximately 12.52ng. This is considered to be a high dose; 6-8ng is more typical of most morpholino experiments in zebrafish models. At this dose it is important to investigate the possibility that generalized non-specific toxicity could be contributing to the phenotype seen in morphant embryos. In order to determine if this was the case for the bsn morphant embryos, a standard control MO was purchased from Gene Tools (CCTCTTACCTCAGTTACAATTTATA). The control MO was diluted to a stock concentration of 1mM. In previous experiments 1nl of BSN MO mix (10msec injection) was injected at a total concentration of 12.52ng/nl. New dilutions of the BSN MO mix were created with each MO at 500μM (a 1mM total, equivalent to the control MO stock). This new solution has a total MO concentration of 8.35ng/nl. A 15msec injection will deliver approximately 1.5nl of MO mix, meaning that the same total amount is delivered per injection as in previous experiments (12.52ng). The results of the control morpholino experiment at this dose can be seen in Figure 8.15. The results indicate that at this high dose the control MO injected embryos display signs of neural necrosis very similar to our morphant embryos. This experiment was repeated using the new MO mix with a 10msec (1nl) injection, so that a total of 8.35ng was delivered per injection. At this dose the percentage of control MO injected embryos which display a morphant phenotype at 1dpf is less than 10%, whereas nearly 80% of BSN MO injected embryos still have a morphant phenotype at 1dpf (Figure 8.16). These results suggest that the phenotype seen in BSN MO injected embryos is specific, but that at the higher dose this effect was being masked by non-specific toxic effects. For all future experiments the lower dose of MO mix (500μM of each MO to create a 1mM total) should be used. Further experiments to determine whether this phenotype is specific will also need to be carried out.
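The dose arithmetic for the control experiments can be summarised in a short calculation. The sketch assumes that the volume delivered scales linearly with injection time at the calibrated rate of 1nl per 10msec, which is consistent with the approximately 1.5nl quoted for a 15msec injection; the constant and function names are introduced for illustration only.

```python
# Calibration assumed from the text: a 10 msec injection delivers 1 nl.
NL_PER_MSEC = 1.0 / 10.0

def dose_ng(conc_ng_per_nl: float, injection_msec: float) -> float:
    """Mass of MO delivered, assuming volume scales linearly with injection time."""
    return conc_ng_per_nl * injection_msec * NL_PER_MSEC

ORIGINAL_MIX = 12.52   # ng/nl, each MO at 750 uM
DILUTED_MIX = 8.35     # ng/nl, each MO at 500 uM (1 mM total)

# Diluting each MO from 750 uM to 500 uM scales the mass concentration by 500/750.
expected_diluted = ORIGINAL_MIX * 500 / 750

print(f"Diluted mix concentration: {expected_diluted:.2f} ng/nl")         # 8.35
print(f"Original mix, 10 msec: {dose_ng(ORIGINAL_MIX, 10):.2f} ng")       # 12.52
print(f"Diluted mix, 15 msec:  {dose_ng(DILUTED_MIX, 15):.2f} ng")
print(f"Diluted mix, 10 msec:  {dose_ng(DILUTED_MIX, 10):.2f} ng")        # 8.35
```

A 15msec injection of the diluted mix delivers approximately 12.5ng, matching the original dose to within rounding, and the 10msec injection of the diluted mix gives the lower dose of 8.35ng.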


Figure 8.15. Control Morpholino experiments using a 1mM MO solution at 15msec (total dose delivered is approximately 12.52ng per injection)

Figure 8.16. Control Morpholino experiments using a 1mM MO solution at 10msec injection time (total dose delivered is approximately 8.35ng per injection)


8.4 Discussion

8.4.1 Identification of BSN mutations in Family HH1

Candidate gene sequencing of affected individuals from Family HH1 identified a novel homozygous missense mutation in the gene BSN. Bassoon is a structural component of the cytomatrix at the active zone (CAZ), which is a highly specialized structure found in the synapse at the site of neurotransmitter release [328]. Knockout mouse models have demonstrated an important function for Bsn in auditory signalling, and bassoon has also been detected in GnRH releasing neurons at the neurosecretory active zones [331, 332]. For these reasons human BSN is an attractive candidate gene for pathogenicity in Family HH1. The homozygous missense mutation c.10081C>T predicts a non-synonymous amino acid substitution p.R3361W. This amino acid residue is well conserved in vertebrate species and the p.R3361W substitution is predicted to be damaging by the SIFT and Polyphen2 in silico algorithms. This variant is not described on the NCBI SNP database, the NHLBI exome variant database or the 1000 genomes project database, and was found in the heterozygous state in 1 sample out of 284 ethnically matched controls.

In order to exclude the possibility of an alternative mutation within the 3p22.1-p21.2 locus, whole exome sequencing of affected individual HH1:III:4 was carried out. Only two coding variants were detected within the mapped region, the first was a synonymous variant in the gene RPL29 and the second was the BSN c.10081C>T variant. All other variants within the region were intronic. The evidence presented here suggests that BSN c.10081C>T is the pathogenic cause of the novel phenotype described in Family HH1.

A large cohort of idiopathic HH (IHH) patients was screened for mutations in BSN. There are examples in the literature of genes in which mutations can cause rare disorders and also more common conditions. One example of such a gene is TREX1. Mutations in TREX1 are the pathogenic cause of Aicardi-Goutieres syndrome (AGS) and can also confer susceptibility to the more common disorder systemic lupus erythematosus [334, 335]. Systemic lupus erythematosus and AGS show phenotypic and pathological overlap. It was hoped that by screening a cohort of patients with phenotypic overlap with Family HH1 we would identify novel pathogenic mutations in BSN. A range of variants was found in the IHH cohort, but none was considered likely to be pathogenic. These results therefore provide no evidence that variants in BSN contribute to the aetiology of non-syndromal IHH.

8.4.2 BSN knockdown zebrafish model

In order to evaluate the pathogenicity of the c.10081C>T variant a knockdown model was created in zebrafish. The zebrafish genome Zv9 (Ensembl Genome Browser) contains two copies of the bsn gene, bsna (42% identity with the human protein) and bsnb (44% identity with the human sequence). Both genes show less than 50% identity with the human sequence; however, as discussed previously, the zebrafish genome is still not fully annotated and the exact start site of each of the bsn genes is unknown. Once the sequence is fully annotated the percentage identity with human BSN may increase. Knockdown of Bassoon protein was attempted using antisense morpholino (MO) technology. Splice blocking morpholinos were used for both bsn genes, and knockdown induced a consistent neurological phenotype including neural necrosis and a reduced head and eye size. Altered splicing could be demonstrated for both transcripts by sequencing cDNA from morphant embryos; however, reduced levels of protein could not be demonstrated consistently using western blotting. Attempts were made to optimize an antibody (rb2BSN1) used routinely by our collaborators for the detection of bassoon in mice. This antibody detected a protein band at approximately 240KDa in zebrafish samples. This is smaller than predicted for zebrafish bassoon, and it remains likely that this antibody is not cross reacting with zebrafish bassoon. In order to continue with this model a zebrafish specific antibody would be required to demonstrate knockdown of the protein in morphant embryos. It is also important to highlight that although the BSNa exon 10 MO did alter splicing (resulting in the deletion of 42bp from exon 10), it did not cause skipping of the entire exon, and the remainder of the transcript stayed in frame. The protein produced from the splice modified RNA may be non-functional, which would explain the observed phenotype. However, there remains a possibility of residual protein function. Additional experiments to validate this model would include designing a second splice blocking morpholino for bsna, in the hope that the same phenotype would be observed. Ideally the second morpholino would target an exon whose mis-splicing causes a frameshift and a premature stop in the coding sequence. This would be expected to trigger nonsense-mediated decay of the transcript, eliminating protein function.

Initially co-injection of both BSN morpholinos was carried out with each MO at 750μM. However, experiments using a commercial control MO purchased from Gene Tools demonstrated that this dose was too high. At this dose a similar non-specific toxic phenotype could be seen in embryos injected with control morpholino as well as in the BSN MO injected fish. Experiments carried out using a BSN MO mix in which both morpholinos were at 500μM still produced a consistent and reproducible phenotype, and importantly at this dose fewer than 10% of control MO injected embryos displayed a morphant phenotype. It is clear from these data that further experiments using the model should be carried out at a dose of 1nl from a 1mM total MO mix to avoid non-specific toxic effects. Additionally, further experiments could be carried out in a p53 null zebrafish line. Most non-specific off target effects of antisense MOs are exerted via the p53 apoptosis pathway. In p53 null fish, it is hoped that the phenotype will still be seen, indicating a specific effect. If the phenotype is no longer observed when the MO is injected into this mutant strain, then the MO is likely to be having non-specific effects.

The results presented in this chapter are preliminary data for a zebrafish bsn knockdown model. The initial results are promising, with morphant fish displaying a neurological phenotype; however, additional control experiments as described above, using alternative morpholinos and the p53 mutant zebrafish, would be needed to provide more evidence that this is a specific effect. From here this model could be used for rescue experiments using human BSN mRNA constructs. If rescue with the WT construct is successful, it would then be informative in relation to the HH1 phenotype to see whether injecting a BSN mRNA construct containing the c.10081C>T variant also rescues the phenotype.

8.5 Conclusion

The aim of this chapter was to identify pathogenic mutations in the 3p22.1-p21.2 homozygous locus in Family HH1 and to investigate any variants detected using relevant techniques. A novel missense variant was identified in the gene BSN in all affected individuals. This variant segregates with disease, is predicted to be damaging in silico and was the only non-synonymous variant detected within the mapped locus using whole exome sequencing. Based on the evidence presented here, it is likely that the pathogenic cause of the novel phenotype described in Family HH1 is the homozygous c.10081C>T BSN variant. This variant is more likely to cause loss of function of the Bassoon protein than to be hypomorphic, given the phenotypic similarity (hearing loss) with knockout mice; however, further functional experiments will be needed to prove loss of function. Unfortunately, to date a second family with a similar phenotype has not been identified to confirm the causative relationship between biallelic BSN mutations and the complex phenotype. Preliminary data have been presented for a zebrafish knockdown model of bassoon which will allow future functional experiments to be carried out.


CHAPTER 9. DISCUSSION


9.0 Discussion

The aim of this project was to identify the causative mutations for two syndromes which have impaired fertility and hearing loss as key phenotypic features. Through the use of classic genetic techniques and cutting edge sequencing technology novel variants have been identified which are likely to be important in the context of hearing and endocrine dysfunction in humans.

9.1 Perrault Syndrome; Summary and final conclusions.

Perrault syndrome is a rare autosomal recessive disorder characterized by sensorineural hearing loss and ovarian dysgenesis in affected females. A range of additional phenotypic features have been reported in cases of Perrault syndrome, with the most common additional features being neurological. The purpose of this project was to gain additional information on the pathogenesis of Perrault syndrome through the identification of novel causative genes in affected families. As well as being rare, Perrault syndrome is genetically heterogeneous [271]. So far two families with mutations have been identified. The first family has mutations in the gene HSD17B4 and the second has mutations in the seemingly unrelated gene HARS2 [239, 240]. These mutations and their functional effects are discussed in more detail in Chapter 1. After excluding mutations in HSD17B4 and HARS2 as the cause of Perrault syndrome in our cohort of patients, a range of techniques were employed to identify and investigate novel genes involved in Perrault syndrome pathogenesis.

The techniques employed throughout this project include autozygosity mapping, next generation sequencing, copy number variation analysis, quantitative PCR and immunohistochemistry. As discussed in Chapter 1, the techniques being developed and used in genetic medicine are constantly evolving. The combination of more classical techniques, such as autozygosity mapping, with the newest next generation sequencing technology has successfully identified novel mutations for Perrault syndrome in Family P1. Three novel variants within the 19p13.3-13.11 mapped locus have been identified in affected individuals: CLPP c.433A>C, GTF2F1 c.1328G>T and PCP2 c.392C>G. The evidence presented throughout this project indicates that one or more of these variants is the pathogenic cause of Perrault syndrome in this family. Expression patterns for Rap74 (encoded by GTF2F1), Pcp2 and ClpP were investigated during this project as a means of determining the most likely candidates for disease pathogenesis. The results of this project have also provided evidence of the heterogeneous nature of Perrault syndrome. So far three genes/loci have been identified, each in only one family. The possibility should be considered that the cohort of patients gathered for this project may represent a combination of more than one disorder. A wide range of phenotypic features have been described in patients with Perrault syndrome, and it is unclear whether these manifestations are all part of the same disorder or whether more than one syndrome may involve the combination of hearing loss and premature ovarian failure. This highlights the importance of accurate phenotyping within cohorts of patients with rare disorders. This project has identified three proteins with previously unknown potential roles in hearing and endocrine function. It is hoped that further investigation of the genes identified here will increase our understanding of this rare and interesting disorder as well as expanding our current knowledge of endocrine and auditory biology in humans.

9.2 Perrault syndrome; Future work.

In order to take the Perrault syndrome project forward, several experiments could be carried out. In Chapter 5 the expression pattern of ClpP and Rap74 (GTF2F1) protein was assessed in human embryonic ovarian sections (approximately 18 weeks gestation, 20mm foot length) using immunohistochemistry (IHC). The expression of Pcp2 protein could not be assessed because a commercial antibody was not available for this protein. It is possible to raise a custom made antibody to carry out this work; an alternative experiment which would determine expression patterns in a similar way is in situ hybridization. In situ hybridization does not require an antibody; instead, labelled oligonucleotide probes are designed to target the mRNA within selected tissue sections. Using this technique, the expression of PCP2 transcript in embryonic ovarian sections can be investigated. Another experiment which would add to the expression data presented here would be to obtain and embed human embryonic cochlea of a similar gestation and carry out IHC and in situ hybridization on this tissue. Establishing expression patterns for the three proteins of interest is important for identifying potential roles in disease pathogenesis. The results of the RNA-Seq experiments presented in Chapter 5 indicate that PCP2 is not expressed in the organs of the inner ear in adults. However, carrying out the in situ experiment described here will give an indication of whether this protein is important for embryonic development of the human cochlea.

The c.433A>C variant identified in the gene CLPP is of particular interest due to the spastic paraplegia phenotype seen in affected individuals in Family P1. Patients with dominant hereditary spastic paraplegia (HSP) have been shown to have reduced levels of ClpP, indicating a possible link with HSP pathogenesis [313]. In order to investigate this further, it would be interesting to sequence a cohort of HSP patients for mutations in CLPP. Identification of novel mutations in this cohort may provide evidence that the c.433A>C variant is contributing to the disease phenotype in Family P1. This work is currently ongoing through collaborations with Professor Tom Warner at University College London who has access to a large cohort of HSP patients with likely recessive inheritance.

The variant detected in GTF2F1 is of particular interest to the Perrault syndrome phenotype due to the relationship between Rap74 and the androgen receptor (AR). The AR is a steroid hormone receptor which, among other things, regulates the transcription of genes important for sexual development. In 2008 an AR knockout mouse model was established. The female knockout mice exhibited a reduced number of pups per litter compared to wild type mice. Histological examination revealed that folliculogenesis was impaired in AR deficient ovaries. This indicates that the AR is not only essential for male reproductive function but may also play an important role in female folliculogenesis [336]. Binding of Rap74 (encoded by GTF2F1) to the AF1 domain of the androgen receptor causes a conformational change in the protein structure and activation of transcription [315-317]. The binding of Rap74 to AF1-AR is therefore a potential link to the pathogenesis of ovarian dysgenesis. One way of investigating this hypothesis would be to compare the ability of wild type and p.G443V mutant Rap74 to bind the AF1 domain of the AR. To do this, an experiment similar to that described by Lavery et al. could be used [316]. This group radiolabelled wildtype and mutant C-terminal domain fragments of Rap74 and incubated them with immobilized AF1-AR. After a series of washes the bound radioactivity was measured [316]. A similar technique could be used in the investigation of the p.G443V mutation, as this also lies within the C-terminal domain which is crucial for AR binding.

Finally it is important to continue to screen new cases of Perrault syndrome for mutations in these genes. Although this disorder is genetically heterogeneous it is still possible that additional mutations in the same genes may be found in other affected families. Similarly it is important to map new families where possible and identify novel loci. Autozygosity mapping is currently being carried out using Affymetrix arrays for Family P11.

9.3 Hypogonadotropic Hypogonadism syndrome; Summary and final conclusions.

In Chapter 3 a previously undescribed complex syndrome characterized by hypogonadotropic hypogonadism (HH) was described in a large consanguineous kindred. The affected individuals from this family had a novel combination of phenotypic features including HH, sensorineural hearing loss, learning disability and characteristic facial dysmorphism. Autozygosity mapping of this family identified a disease locus at 3p22.1-p21.2. Candidate gene sequencing was carried out and a homozygous missense mutation, c.10081C>T in the gene BSN, was identified which segregates with disease. Exome sequencing of one of the affected family members did not identify any other potentially pathogenic variants within the locus. The data presented throughout this project suggest that the c.10081C>T variant, which causes p.R3361W in the bassoon protein, is the pathogenic cause of this novel syndrome.
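The correspondence between a coding-sequence position and the affected residue follows directly from the triplet code, and can be used as a quick consistency check on variant nomenclature. The sketch below is illustrative; the function name `codon_of` is hypothetical, and it assumes positions are numbered from the A of the start codon, as in standard c. notation.

```python
def codon_of(c_position: int) -> tuple[int, int]:
    """Return (codon number, position within codon) for a 1-based coding position."""
    if c_position < 1:
        raise ValueError("coding positions are numbered from 1")
    codon = (c_position - 1) // 3 + 1      # which amino acid residue is affected
    offset = (c_position - 1) % 3 + 1      # 1st, 2nd or 3rd base of that codon
    return codon, offset


for variant, c_pos in [("BSN c.10081C>T", 10081), ("GTF2F1 c.1328G>T", 1328)]:
    codon, offset = codon_of(c_pos)
    print(f"{variant}: codon {codon}, base {offset} of the codon")
```

Position 10081 falls in codon 3361 and position 1328 in codon 443, in agreement with the protein-level descriptions p.R3361W and p.G443V used in this chapter.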

The function of Bassoon has been well characterized through investigation of knockout mouse models, but no human mutations or links to disease have been described in the literature. The main function of bassoon is to act as a scaffolding protein and ensure the correct organization of the cytoskeleton at the pre-synaptic CAZ (cytomatrix at the active zone). As discussed in Chapter 8, one of the phenotypic features of Bsn null mice is hearing impairment due to impaired anchoring of ribbon synapses. As well as this there is the potential for disrupted GnRH signalling as bassoon is also found at the active zone of GnRH neurosecretory cells. As part of this project attempts were made to establish a zebrafish knockdown model. The advantage of working with a zebrafish model over a mouse model is that zebrafish lay large numbers of externally fertilized eggs and have a rapid embryonic development. This means that functional experiments can be carried out quickly and can easily be repeated multiple times to verify results. Preliminary data for an anti-sense morpholino induced knockdown model of bassoon were presented in Chapter 8. Preliminary data indicates that bsn knockdown fish do have a reproducible neurological phenotype and morpholino injection has disrupted splicing but further experiments will be needed to fully validate the model before rescue experiments can be carried out. As well as rescue experiments zebrafish models can be used to investigate and track gene expression using fluorescent reporter genes, and compare wildtype and mutant expression in vivo.


9.4 Hypogonadotropic Hypogonadism syndrome; Future work.

In order to continue with the bsn zebrafish model further validation of the phenotype will be required. Repeat experiments need to be carried out in which larger numbers of embryos are injected in order to increase confidence in the neurological phenotype. The possibility of non-specific off target effects being the cause of this phenotype would also need to be investigated. One way of establishing specificity is to inject the morpholino mix into a p53 null zebrafish line. The majority of non specific effects induced by morpholino injection act via the p53 apoptosis pathway. If the same phenotype can be observed in morpholino injected p53 null embryos then the phenotype is more likely to be specific.

Once the morphant phenotype has been established as being robust and specific functional analysis of the c.10081C>T mutation can be carried out. Through collaborations with the Gundelfinger group in Magdeburg we have access to wild type BSN construct and also to a construct containing the c.10081C>T mutation. Differences in the ability of the two constructs to rescue the morphant phenotype will give an indication of whether the mutation is causing loss of protein function.

As well as the zebrafish model further experiments on the knockout mouse model could be carried out in collaboration with the Gundelfinger group. The hypothesis that BSN is important for the pulsatile release of GnRH from GnRH releasing neurons can be investigated by collecting blood samples from wild type and null mice. Hormone assays can then be used to measure and compare LH and FSH levels in the mice. Personal correspondence with the Gundelfinger group has indicated that homozygous null mice do not breed and that fertility may be impaired. As well as carrying out hormone assays hypogonadism can also be established by comparing the gonads of mutant and wild type mice. Ovaries and testes can be dissected and in hypogonadal mice should be atrophic and small compared with wild type mice [337].


Further functional investigations into the c.10081C>T BSN variant have already started. The Gundelfinger group has recently started to carry out some exciting and innovative research into mental retardation syndromes. Samples from our family will be included in this project and will allow us to determine the effects of the c.10081C>T mutation in patient cells. The aim of this project is to study the effect that the BSN mutation is having in HH1 family members by using neurons derived from the differentiation of induced pluripotent stem (iPS) cells obtained from patient skin biopsies. Fibroblasts and keratinocytes will be collected from our patients and reprogrammed to generate pluripotent stem cells by using a retrovirus to stably express a minimum of three genes (Oct4, Sox2 and Klf4). These stem cells can then be differentiated to form excitatory and inhibitory neuronal cells which will be assessed for changes in morphology, biochemistry and function, and hopefully will provide information relevant to pathogenesis in vivo.

Finally it is important to this project to continue to try to identify additional families with the same phenotype as Family HH1. We have recently been contacted by a group at Great Ormond Street Hospital who have a family with a similar phenotype to ours. The family is a consanguineous British Pakistani family with two affected children and two unaffected children. Autozygosity mapping has already been carried out and a single large locus has been identified which overlaps with the 3p22.1-p21.2 locus. Mutation analysis is currently ongoing.

9.5 Final conclusions

The aim of this project was to give insight into the molecular pathology of two disorders (Perrault syndrome and a novel Hypogonadotropic Hypogonadism (HH) disorder) through the identification of causative genes and their mutations. I believe that by using a range of techniques this aim has been achieved. Novel mutations have been identified in genes which have not previously been linked to hearing or sexual development in humans. There are still many exciting experiments to be carried out in order to fully understand the roles that BSN, CLPP, PCP2 and GTF2F1 may be playing in disease pathogenesis for the families described here. However, it is hoped that by identifying these genes this project has contributed to what is currently known about hearing and endocrine disorders.

9.6 New Findings and Published Data since Submission of Thesis

Following the submission of this thesis additional data has become available. Two additional Perrault syndrome families, both from Pakistan with evidence of consanguinity, have been identified. Autozygosity mapping has shown that affected individuals from both families have homozygous regions which overlap with the Perrault locus, 19p13.3-13.11 originally mapped in family P1. Subsequently, whole exome sequencing has identified mutations in the gene CLPP in all affected individuals, a homozygous missense mutation segregates with disease in one family and a homozygous splice site mutation can be found in the second family. This work has been carried out in collaboration with Dr Tom Friedman at the National Institute on Deafness and Other Communication Disorders (NIDCD) in Bethesda. A manuscript detailing this work including description of mutations and predicted functional effects is currently being prepared for submission to the American Journal of Human Genetics.

Additionally, a fourth gene for Perrault syndrome has been identified by our collaborator, Professor Mary-Claire King at the University of Washington, Seattle. The King lab have identified mutations in the gene LARS2 in two Perrault syndrome families. A poster detailing these findings was recently presented at the American Society of Human Genetics Meeting 2012. LARS2 encodes the mitochondrial leucyl-tRNA synthetase (from the same family of proteins as the previously identified HARS2), and mutation of lars-2 in Caenorhabditis elegans results in abnormal germ cell differentiation and sterility.


CHAPTER 10. APPENDIX


10.0 Appendix

Appendix Figure 10.1. Clinical poster from ESHG describing the phenotype of Family P10.


Primer                        Sequence
HSD17B4 exon 1 Forward        TAGATGAACGCAAGGTGTCG
HSD17B4 exon 1 Reverse        TAACAATCGATGCCCACAGA
HSD17B4 exon 2 Forward        GGTTGAGAATGTCAGTGATAGGA
HSD17B4 exon 2 Reverse        TCTCGCACCAGTAGACAAACC
HSD17B4 exon 3 Forward        TAGGCATTGGCTTTTTCTCC
HSD17B4 exon 3 Reverse        TTAAGTATGCGATGGCCACA
HSD17B4 exon 4 Forward        TGAAAATGGCTGTGTTGTGTT
HSD17B4 exon 4 Reverse        AACTTTGGCAGACTTAGAAAATCA
HSD17B4 exon 5 Forward        TGTGAGAATTGTTAAAACTTTTGATG
HSD17B4 exon 5 Reverse        TCCATAAAATTGCCACCTCA
HSD17B4 exon 6&7 Forward      TGATACTTAGGCTTTTGTGAGTCAA
HSD17B4 exon 6&7 Reverse      ACTCATCATTTATATTAGCAGCAAAA
HSD17B4 exon 8 Forward        GCATAACTGGAATAAAGGCAAAA
HSD17B4 exon 8 Reverse        CAGCTCAGCCAATCTGTGAC
HSD17B4 exon 9 Forward        AAGTCTAAGATCATTTGGTTCTGG
HSD17B4 exon 9 Reverse        TGCTTTTCTAAATTTTCCACAATG
HSD17B4 exon 10 Forward       GCCCTTTAGAAATGGCTCAG
HSD17B4 exon 10 Reverse       GCCACATTTTCATTTGGTAGG
HSD17B4 exon 11 Forward       CAAGCCTTGGTCTCTGACATC
HSD17B4 exon 11 Reverse       GTCTTGAAAGGGCCACAGAC
HSD17B4 exon 12 Forward       TTTTCCCTTCAGCTTCAAATG
HSD17B4 exon 12 Reverse       GCAGAAAATATGCTATAGACGATTCA
HSD17B4 exon 13 Forward       GGAAGGAGGTGGCTTTCAAC
HSD17B4 exon 13 Reverse       GAGACATGGGCTTCCTTCTG
HSD17B4 exon 14 Forward       CCTGACATACATTCAGTTCATGAGT
HSD17B4 exon 14 Reverse       TGCTTCTCTCCATTATTAGAGACTG
HSD17B4 exon 15 Forward       TGTCTGTTAGCAAGAAGCAAGC
HSD17B4 exon 15 Reverse       CTTTTTGCTGGCATTTTGAAC
HSD17B4 exon 16 Forward       AGGCTTTATGAACGCCAAGA
HSD17B4 exon 16 Reverse       GGACAAAGCTTAAGGTGACCA
HSD17B4 exon 17 Forward       TCCTCTGCAGCATCTGTTGT
HSD17B4 exon 17 Reverse       CTTCTCCCCCTCCTTCTTTA
HSD17B4 exon 18 Forward       GCTCCTCTTCTCTCTGCCTTT
HSD17B4 exon 18 Reverse       GGAGGAAGAAGAGGGACTGG
HSD17B4 exon 19 Forward       GAAAGGTCATTTCAGGCAAATC
HSD17B4 exon 19 Reverse       TTTTGGTATCTAGTGGGAAAACA
HSD17B4 exon 20 Forward       TTTTTCCCTCCCACTGATTTT
HSD17B4 exon 20 Reverse       GGAACTTCCCTCCACCATAA
HSD17B4 exon 21 Forward       TGGAGAGAGAGCAAGGAACTG
HSD17B4 exon 21 Reverse       TGATTGCCAGGTCAATGAAA
HSD17B4 exon 22 Forward       TGAAAGACAAAGAATTGGCTTACTT
HSD17B4 exon 22 Reverse       TTGAAAACACCAGACAAGCTG
HSD17B4 exon 23 Forward       GCAGCCTTTTATTTTATCTGGA
HSD17B4 exon 23 Reverse       CTCCAAAACGCTGTTTGCTT
HSD17B4 exon 24 Forward       TGAAAGACACATTGTATGAAGAAAA
HSD17B4 exon 24 Reverse       TGCATATAGCCAAGATGACTGTTT
HSD17B4 exon 25 Forward       GTGGTTGCAGACCATGACC
HSD17B4 exon 25 Reverse       CCTGGCTTATAAATCAGAATTTGG

Appendix Table 10.1. Primer sequences for coding exons of HSD17B4.


Primer                   Sequence
HARS2 EXON 1 FOR         CTGGCTACTAAGGGAACTTG
HARS2 EXON 1 REV         AAATCAAAACTCCAACCTCTC
HARS2 EXON 2 FOR         TTGTGTGGTGAAGACCTGAC
HARS2 EXON 2 REV         AAAGCAGCCATAGTGAAAAC
HARS2 EXON 3&4 FOR       AGGACTGACCTCTGCCTTGC
HARS2 EXON 3&4 REV       CCAAATGTCTGTGCTTCTGC
HARS2 EXON 5&6 FOR       TGCTTTCTGCTGAACTTTTAG
HARS2 EXON 5&6 REV       AAACATCCCATCCACAATCC
HARS2 EXON 7 FOR         ATTTCCATCCTTTTTTGTGTG
HARS2 EXON 7 REV         AGTATGCCTCCTTACCTTCC
HARS2 EXON 8 FOR         TTGGGGTGGAAGGTAAGGAG
HARS2 EXON 8 REV         GCTTTGCCATCTGAAGACAC
HARS2 EXON 9&10 FOR      AAAGAAAATGAGGAAGACTAGC
HARS2 EXON 9&10 REV      TCACCTTTGGTCTAATAAGAGAG
HARS2 EXON 11 FOR        GCAGAGGATGAAGGTAGGTC
HARS2 EXON 11 REV        AGAAAGCCAGAGATAAGTGG
HARS2 EXON 12 FOR        TTATGTCTGGGGTGGAACTC
HARS2 EXON 12 REV        CGGATTTCATTTGTCTTTCTTC
HARS2 EXON 13 FOR        ACAAATGAAAATCCGAATGGG
HARS2 EXON 13 REV        GGAGTCCTCAGGGTCTTCTAC

Appendix Table 10.2. Primer sequences for coding exons of HARS2.


Primer                           Sequence
DELETION PP1 FORWARD             ACCAGGATGCTGTCGAACTC
DELETION PP1 REVERSE             TGGGATACAGGCATGAAACA
DELETION PP2 FORWARD             GAACCTCCTCCAGCAGCAT
DELETION PP2 REVERSE             GCAGGCTCAGAGCAGACC
DELETION PP3 FORWARD             GTCTGGGGCCAGAGACAATA
DELETION PP3 REVERSE             ATGGCTCATCTGGTCAGTCC
DELETION PP4 FORWARD             GCAGTGGCTCATGCCTGT
DELETION PP4 REVERSE             GCGTGTTAGCATCGCTTCTT
DELETION PP5 FORWARD             TTACAGCATGTGCCACCA
DELETION PP5 REVERSE             TGACAACGACCGCACCAT
DELETION PP6 FORWARD             GCCTAGACATTGCTGAACCTC
DELETION PP6 REVERSE             GGGTTCTGCTTACTGCCTGA
DELETION PP7 FORWARD             TTTTCTTTGAGATGGAGTCTCG
DELETION PP7 REVERSE             CTTGAGCCCAGGAGTTTGAG
DELETION PP8 FORWARD             CATCCATTCACCTTCTTTCTCC
DELETION PP8 REVERSE             AGCCAAAATCATTCCACTGC
DELETION PP9 FORWARD             GGTGACAAGGCGAGACTCTG
DELETION PP9 REVERSE             GCCAAAGCTGCTCTCAAACT
DELETION PP10 FORWARD            AAGTCCATGAGCTCGTCCAG
DELETION PP10 REVERSE            GAAGGCTGAGGCAGGAGAAT
DELETION PP11 FORWARD            TGGTCAGGCTGGTCTCAAA
DELETION PP11 REVERSE            CTATAGGCACCAGCCACCAC
DELETION PP12 FORWARD            GCGAAACCCTGTCTCCACTA
DELETION PP12 REVERSE            TATAGCAAAGCAGGCGGAAG
DELETION PP13 FORWARD            ACCTCAGCCTCTCAAAGTGC
DELETION PP13 REVERSE            GATCACACCACTCCACTCCA
DELETION PP14 FORWARD            AGAGCACAGATGCTGGGTTT
DELETION PP14 REVERSE            AAAATTAGCTGGGCATGGTG
DELETION PP15 FORWARD            GCAGCCTCAACCTTCTGAAC
DELETION PP15 REVERSE            CATCTTGGCTAACACGGTGA
DELETION PP16 FORWARD            GTCCCAAAGGGCTGAGATT
DELETION PP16 REVERSE            CTCACATCTGTCATCCTAGCAA
DELETION BREAKPOINT FORWARD      AATGGAGTCTCAGTCTGTCG
DELETION BREAKPOINT REVERSE      CAGAAGCTCAAGACCAGCTT

Appendix Table 10.3. Primer sequences for the confirmation and breakpoint determination of Family P1 homozygous deletion.


Primer                     Sequence
AP1M2 EXON 1 FOR           ATCTTCAAGTTGGCCGACAG
AP1M2 EXON 1 REV           TGTTCCCAGACCTCTTCCTG
AP1M2 EXON 2&3 FOR         AAGCCCCTCAGCTAGGAAGT
AP1M2 EXON 2&3 REV         GAAGACCCAGGGAGATCTGG
AP1M2 EXON 4&5 FOR         TAAGTGATCCTCGTGCCTTG
AP1M2 EXON 4&5 REV         GAGGGAGGACGTTGGGTATAG
AP1M2 EXON 6 FOR           CGCTCCGAGGGTATCAAGTA
AP1M2 EXON 6 REV           GCGAGCGCTTATTTCAAAAC
AP1M2 EXON 7 FOR           ACTGTGCCAGTTTTTGCTCA
AP1M2 EXON 7 REV           CCTCCAAAATTGTTGGGATT
AP1M2 EXON 8 FOR           GCGAAGAAGCGAGACTCTGT
AP1M2 EXON 8 REV           CAAAGTCCTGGGCTTAAGTGA
AP1M2 EXON 9 FOR           CTGGGCGACAGAGTGAGACT
AP1M2 EXON 9 REV           CACACCTGACTGGGTCTCTG
AP1M2 EXON 10 FOR          TCTGTGAAATGGGCTGTGAG
AP1M2 EXON 10 REV          GCAGCCTTCAGAGGAGTGTT
AP1M2 EXON 11 FOR          CTTGGCCTAAGCCGTCTCTT
AP1M2 EXON 11 REV          CTGAGCCAGAATGTGGTGAA
AP1M2 EXON 12 FOR          CTCCCAAAGTGCTGGGATT
AP1M2 EXON 12 REV          AAGCAAAGGCAACAGCAGAG
KRI1 EXON 1 FOR            GAAAGGGATCCCGGAAAAG
KRI1 EXON 1 REV            GCACTGTCTGTCCCTCATGG
KRI1 EXON 2 FOR            CTCCTGCTGTTTGACCACCT
KRI1 EXON 2 REV            CCCTGAGGGGAGATGCTAC
KRI1 EXON 3 FOR            ATTGGTGGATTTGGGATTTG
KRI1 EXON 3 REV            CTTCTGTCCTCAAGCGATCC
KRI1 EXON 4&5 FOR          AATCGCAGATAGCCCTGAGA
KRI1 EXON 4&5 REV          CCCCGTTTAACTGATGAGGA
KRI1 EXON 6&7 FOR          GCCCAGGAGTTCAAGATCAG
KRI1 EXON 6&7 REV          TCCCATCTTCAGAGGGTTACTT
KRI1 EXON 8 FOR            CTACTCCAGAGGCTGGCTGT
KRI1 EXON 8 REV            TGTGGCTCTGCCTTACAGAA
KRI1 EXON 9&10 FOR         CGTGCACCTTGTAGGGTTCT
KRI1 EXON 9&10 REV         CAGCTGCTTGAGCTCTTCCT
KRI1 EXON 11 FOR           AGAGACTCGGGAGCGAAAG
KRI1 EXON 11 REV           CAGGTGATCTGCCTGCCTTA
KRI1 EXON 12&13 FOR        GGGAGCCCTAGCTGTAGGAG
KRI1 EXON 12&13 REV        GTCGTAGTCGGCGTCCAT
KRI1 EXON 14 FOR           CACTGTGAGGACCCCAACTT
KRI1 EXON 14 REV           TTGTGGCATTTCTGAGCAAC
KRI1 EXON 15&16 FOR        CAAGGATCCGTTCTCTGAGC
KRI1 EXON 15&16 REV        GATTCGAACCCAGGTCTCAG
KRI1 EXON 17 FOR           CTGTGCGGCTGCTTCACTAT
KRI1 EXON 17 REV           CTCCCAAAGTGCTGGGATTA
SLC44A2 EXON 23 FOR        GGTCCCAGTGTGTCTGCTTT
SLC44A2 EXON 23 REV        CTCCACGTCCAGAAACTGGT
CDKN2D EXON 1 FORWARD      AGGGTGAGTTAGGGGGAGAC
CDKN2D EXON 1 REVERSE      CTGGGGTCTCGATCCTCAT
CDKN2D EXON 2 FORWARD      TTCCTGTTTCTGGGAGATGC
CDKN2D EXON 2 REVERSE      AAGCCACAAACTGTGCTCCT

Appendix Table 10.4. Primer sequences for coding exons of AP1M2, SLC44A2, CDKN2D and KRI1.


Primer                       Sequence
SLC44A2 EXON 1 FOR           GTGTTCCCAGGGTGAAGC
SLC44A2 EXON 1 REV           GGGATAAGTGGGGTGAAGG
SLC44A2 EXON 2 FOR           GAGGCAGAAAATTCCACAGG
SLC44A2 EXON 2 REV           GTGAGCCGAGATTGAGATCG
SLC44A2 EXON 3&4 FOR         CCTGGATGACGGAGTGAGAC
SLC44A2 EXON 3&4 REV         GCCCAGCCTATAAGCACTTG
SLC44A2 EXON 5&6 FOR         AGGCAGGAGAATTGCTTGAA
SLC44A2 EXON 5&6 REV         AGATGGAGGGAGGTGAGTCC
SLC44A2 EXON 7&8 FOR         TTCTGTGTTCCTGGCTTCAA
SLC44A2 EXON 7&8 REV         GACAAGGCTCGGGTCAGA
SLC44A2 EXON 9&10 FOR        GGCGCCAAGTGAGGATATT
SLC44A2 EXON 9&10 REV        TTTTCCTGATGGGCCATTT
SLC44A2 EXON 11&12 FOR       AAAGTCCCTGAGGCAGAAGC
SLC44A2 EXON 11&12 REV       AGCACATGACGTATCCCACA
SLC44A2 EXON 13&14 FOR       AGGATGGAGCTGTCCCTAGA
SLC44A2 EXON 13&14 REV       TCCAAGTGGACATGAGAGGTT
SLC44A2 EXON 15&16 FOR       TCAATCCCTATGTCTCCTGTCC
SLC44A2 EXON 15&16 REV       GAACTCCACCCCGTCATAGA
SLC44A2 EXON 17&18 FOR       TAGCTCACTGCAGCCTCAAA
SLC44A2 EXON 17&18 REV       AAGTTTGCCCAACAGGAAGA
SLC44A2 EXON 19&20 FOR       TCCCACTCTCCTCCAGATTG
SLC44A2 EXON 19&20 REV       ATTCCCCACCTCTGACCTCT
SLC44A2 EXON 21 FOR          GAGGGGTTGGGATGTCACTA
SLC44A2 EXON 21 REV          GCTTCTAAGCGCAAAAGGAA
SLC44A2 EXON 22 FOR          GGCTGCCACTAACTCTGGTC
SLC44A2 EXON 22 REV          ACACACAGGATCCCCACACT

Appendix Table 10.4. Continued.

Dilution    Approx. cDNA concentration
1           26.5ng/ul
2           2.65ng/ul
3           0.265ng/ul
4           0.0265ng/ul
5           0.00265ng/ul

Appendix Table 10.5. cDNA concentrations used in validation experiment 1. The DNA sample P1:II:1 (265ng/ul) was used for all validation experiments.
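The concentrations in this table form a ten-fold serial dilution. As a small illustration, the series can be regenerated from the first dilution; the helper name `tenfold_series` is hypothetical, and treating dilution 1 as a 1:10 dilution of the 265ng/ul stock is an inference from the numbers rather than something the caption states.

```python
import math


def tenfold_series(top_ng_per_ul: float, steps: int = 5) -> list[float]:
    """Concentrations of a ten-fold serial dilution, starting from the top point."""
    return [top_ng_per_ul / 10 ** i for i in range(steps)]


STOCK_NG_PER_UL = 265.0                                 # sample P1:II:1, from the caption
experiment_1 = tenfold_series(STOCK_NG_PER_UL / 10)     # assumed 1:10 first step

for dilution, conc in enumerate(experiment_1, start=1):
    print(f"Dilution {dilution}: {conc:g} ng/ul")

# Floating-point values are compared with a tolerance rather than exactly.
assert math.isclose(experiment_1[-1], 0.00265)
```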


Dilution    Detector Name    Reporter    Ct            Threshold
1           GAPDH            SYBR        16.16426      0.199526
1           GAPDH            SYBR        16.16251      0.199526
1           GAPDH            SYBR        16.43153      0.199526
2           GAPDH            SYBR        17.46672      0.199526
2           GAPDH            SYBR        17.16734      0.199526
2           GAPDH            SYBR        17.08981      0.199526
3           GAPDH            SYBR        20.39598      0.199526
3           GAPDH            SYBR        20.37149      0.199526
3           GAPDH            SYBR        20.37806      0.199526
4           GAPDH            SYBR        23.55157      0.199526
4           GAPDH            SYBR        23.60425      0.199526
4           GAPDH            SYBR        23.55021      0.199526
5           GAPDH            SYBR        26.93932      0.199526
5           GAPDH            SYBR        26.84164      0.199526
5           GAPDH            SYBR        26.85156      0.199526
NTC         GAPDH            SYBR        34.81509      0.199526
NTC         GAPDH            SYBR        35.64138      0.199526
NTC         GAPDH            SYBR        35.0569       0.199526

Appendix Table 10.6. Results of GAPDH assay efficiency validation experiment 1.

Dilution    Detector Name    Reporter    Ct              Threshold
1           CDKN2D           SYBR        25.32768        0.199526
1           CDKN2D           SYBR        25.34454        0.199526
1           CDKN2D           SYBR        25.38192        0.199526
2           CDKN2D           SYBR        25.54595        0.199526
2           CDKN2D           SYBR        25.51082        0.199526
2           CDKN2D           SYBR        25.55342        0.199526
3           CDKN2D           SYBR        28.86085        0.199526
3           CDKN2D           SYBR        28.78603        0.199526
3           CDKN2D           SYBR        28.73898        0.199526
4           CDKN2D           SYBR        31.87979        0.199526
4           CDKN2D           SYBR        31.89124        0.199526
4           CDKN2D           SYBR        31.97875        0.199526
5           CDKN2D           SYBR        35.77805        0.199526
5           CDKN2D           SYBR        36.76592        0.199526
5           CDKN2D           SYBR        35.69967        0.199526
NTC         CDKN2D           SYBR        Undetermined    0.199526
NTC         CDKN2D           SYBR        Undetermined    0.199526
NTC         CDKN2D           SYBR        Undetermined    0.199526

Appendix Table 10.7. Results of CDKN2D assay efficiency validation experiment 1.
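For a ten-fold dilution series amplified at 100% efficiency, successive dilutions are expected to differ by log2(10), roughly 3.32 cycles. The sketch below applies that check to the CDKN2D Ct values from validation experiment 1; the variable names are illustrative, and the code only reports the spacing between dilutions without attempting to explain it.

```python
import math
from statistics import mean

# Triplicate Ct values per dilution, copied from the CDKN2D table above.
cdkn2d_ct = {
    1: [25.32768, 25.34454, 25.38192],
    2: [25.54595, 25.51082, 25.55342],
    3: [28.86085, 28.78603, 28.73898],
    4: [31.87979, 31.89124, 31.97875],
    5: [35.77805, 36.76592, 35.69967],
}

IDEAL_STEP = math.log2(10)  # cycles per ten-fold dilution at 100% efficiency

mean_ct = {dilution: mean(values) for dilution, values in cdkn2d_ct.items()}
steps = {d: mean_ct[d + 1] - mean_ct[d] for d in range(1, 5)}

for d, delta in steps.items():
    print(f"Dilution {d} -> {d + 1}: delta Ct = {delta:.2f} (ideal {IDEAL_STEP:.2f})")
```

In these data the step between dilutions 1 and 2 is far smaller than the ideal spacing, whereas the later steps are close to it.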


Dilution    Detector Name    Reporter    Ct              Threshold
1           KRI1             SYBR        21.8249         0.199526
1           KRI1             SYBR        22.08993        0.199526
1           KRI1             SYBR        22.07756        0.199526
2           KRI1             SYBR        22.66372        0.199526
2           KRI1             SYBR        22.04627        0.199526
2           KRI1             SYBR        22.62885        0.199526
3           KRI1             SYBR        25.80992        0.199526
3           KRI1             SYBR        25.74606        0.199526
3           KRI1             SYBR        25.76635        0.199526
4           KRI1             SYBR        29.10536        0.199526
4           KRI1             SYBR        29.23571        0.199526
4           KRI1             SYBR        29.40145        0.199526
5           KRI1             SYBR        32.25787        0.199526
5           KRI1             SYBR        32.25739        0.199526
5           KRI1             SYBR        32.98962        0.199526
NTC         KRI1             SYBR        Undetermined    0.199526
NTC         KRI1             SYBR        Undetermined    0.199526
NTC         KRI1             SYBR        Undetermined    0.199526

Appendix Table 10.8. Results of KRI1 assay efficiency validation experiment 1.

Dilution    Detector Name    Reporter    Ct              Threshold
1           SLC44A2          SYBR        23.961138       0.199526
1           SLC44A2          SYBR        23.931314       0.199526
1           SLC44A2          SYBR        24.220728       0.199526
2           SLC44A2          SYBR        25.38897        0.199526
2           SLC44A2          SYBR        25.11055        0.199526
2           SLC44A2          SYBR        25.061855       0.199526
3           SLC44A2          SYBR        28.5109         0.199526
3           SLC44A2          SYBR        28.187294       0.199526
3           SLC44A2          SYBR        28.49635        0.199526
4           SLC44A2          SYBR        31.645468       0.199526
4           SLC44A2          SYBR        31.642591       0.199526
4           SLC44A2          SYBR        32.0485         0.199526
5           SLC44A2          SYBR        36.93869        0.199526
5           SLC44A2          SYBR        34.98138        0.199526
5           SLC44A2          SYBR        35.49578        0.199526
NTC         SLC44A2          SYBR        Undetermined    0.199526
NTC         SLC44A2          SYBR        Undetermined    0.199526
NTC         SLC44A2          SYBR        36.6534         0.199526

Appendix Table 10.9. Results of SLC44A2 assay efficiency validation experiment 1.


Dilution    Detector Name    Reporter    Ct              Threshold
1           AP1M2            SYBR        39.492405       0.199526
1           AP1M2            SYBR        Undetermined    0.199526
1           AP1M2            SYBR        39.463963       0.199526
2           AP1M2            SYBR        38.088657       0.199526
2           AP1M2            SYBR        38.113613       0.199526
2           AP1M2            SYBR        38.082123       0.199526
3           AP1M2            SYBR        Undetermined    0.199526
3           AP1M2            SYBR        Undetermined    0.199526
3           AP1M2            SYBR        39.141968       0.199526
4           AP1M2            SYBR        38.265667       0.199526
4           AP1M2            SYBR        38.905006       0.199526
4           AP1M2            SYBR        Undetermined    0.199526
5           AP1M2            SYBR        37.596066       0.199526
5           AP1M2            SYBR        Undetermined    0.199526
5           AP1M2            SYBR        Undetermined    0.199526
NTC         AP1M2            SYBR        38.464645       0.199526
NTC         AP1M2            SYBR        Undetermined    0.199526
NTC         AP1M2            SYBR        Undetermined    0.199526

Appendix Table 10.10. Results of AP1M2 assay efficiency validation experiment 1.


Appendix Figure 10.2. Amplification plot and standard curve of GAPDH, CDKN2D, SLC44A2 and KRI1 dilution series for validation experiment 1. Red line on amplification plot indicates Ct threshold.


Appendix Figure 10.3. Amplification plot for AP1M2 dilutions for validation experiment 1. This plot shows that AP1M2 was not efficiently amplified at these dilutions.

Dilution    Approx. cDNA concentration
1           2.65ng/ul
2           0.265ng/ul
3           0.0265ng/ul
4           0.00265ng/ul
5           0.000265ng/ul

Appendix Table 10.11. cDNA concentrations used in validation experiment 2. The DNA sample P1:II:1 (265ng/ul) was used for all validation experiments.

Dilution    Detector Name    Reporter    Ct            Threshold
1           GAPDH            SYBR        17.26014      0.2
1           GAPDH            SYBR        17.21402      0.2
1           GAPDH            SYBR        17.18488      0.2
2           GAPDH            SYBR        20.22608      0.2
2           GAPDH            SYBR        20.25404      0.2
2           GAPDH            SYBR        20.16522      0.2
3           GAPDH            SYBR        23.52998      0.2
3           GAPDH            SYBR        23.32445      0.2
3           GAPDH            SYBR        23.45436      0.2
4           GAPDH            SYBR        26.47966      0.2
4           GAPDH            SYBR        26.72437      0.2
4           GAPDH            SYBR        26.94725      0.2
5           GAPDH            SYBR        30.41008      0.2
5           GAPDH            SYBR        30.52611      0.2
5           GAPDH            SYBR        30.926        0.2
NTC         GAPDH            SYBR        34.9857       0.2
NTC         GAPDH            SYBR        34.35886      0.2
NTC         GAPDH            SYBR        35.26433      0.2

Appendix Table 10.12. Results of GAPDH assay efficiency validation experiment 2.
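Assay efficiency is conventionally estimated from the slope of the standard curve (mean Ct against log10 of template concentration) as E = 10^(-1/slope) - 1, where a slope of about -3.32 corresponds to 100% efficiency. The following is a minimal sketch of that calculation using the GAPDH values from validation experiment 2 and the concentrations in Appendix Table 10.11. The helper `fit_line` is hypothetical, and this is not necessarily the exact procedure or software used to produce the standard curves in the thesis.

```python
import math
from statistics import mean

# Approximate cDNA concentration (ng/ul) per dilution, validation experiment 2.
concentration = {1: 2.65, 2: 0.265, 3: 0.0265, 4: 0.00265, 5: 0.000265}

# Triplicate GAPDH Ct values per dilution (NTC wells are excluded from the fit).
gapdh_ct = {
    1: [17.26014, 17.21402, 17.18488],
    2: [20.22608, 20.25404, 20.16522],
    3: [23.52998, 23.32445, 23.45436],
    4: [26.47966, 26.72437, 26.94725],
    5: [30.41008, 30.52611, 30.926],
}


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


dilutions = sorted(concentration)
xs = [math.log10(concentration[d]) for d in dilutions]
ys = [mean(gapdh_ct[d]) for d in dilutions]

slope, intercept = fit_line(xs, ys)
efficiency = 10 ** (-1 / slope) - 1  # 1.0 corresponds to 100% efficiency

print(f"slope = {slope:.3f}, intercept = {intercept:.2f}, efficiency = {efficiency:.1%}")
```

Wells reported as "Undetermined" have no Ct value and would need to be dropped before averaging if the same sketch were applied to the other detectors.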


Dilution    Detector Name    Reporter    Ct              Threshold
1           CDKN2D           SYBR        25.5092         0.2
1           CDKN2D           SYBR        25.42436        0.2
1           CDKN2D           SYBR        25.45773        0.2
2           CDKN2D           SYBR        29.09358        0.2
2           CDKN2D           SYBR        28.64218        0.2
2           CDKN2D           SYBR        28.6299         0.2
3           CDKN2D           SYBR        32.47784        0.2
3           CDKN2D           SYBR        32.04047        0.2
3           CDKN2D           SYBR        31.98743        0.2
4           CDKN2D           SYBR        36.52485        0.2
4           CDKN2D           SYBR        35.41995        0.2
4           CDKN2D           SYBR        36.33194        0.2
5           CDKN2D           SYBR        38.63617        0.2
5           CDKN2D           SYBR        Undetermined    0.2
5           CDKN2D           SYBR        Undetermined    0.2
NTC         CDKN2D           SYBR        Undetermined    0.2
NTC         CDKN2D           SYBR        Undetermined    0.2
NTC         CDKN2D           SYBR        Undetermined    0.2

Appendix Table 10.13. Results of CDKN2D assay efficiency validation experiment 2.

Dilution  Detector Name  Reporter  Ct            Threshold
1         KRI1           SYBR      22.80605      0.2
1         KRI1           SYBR      22.84676      0.2
1         KRI1           SYBR      22.67541      0.2
2         KRI1           SYBR      25.81638      0.2
2         KRI1           SYBR      25.90042      0.2
2         KRI1           SYBR      25.64673      0.2
3         KRI1           SYBR      29.03657      0.2
3         KRI1           SYBR      28.94906      0.2
3         KRI1           SYBR      29.20165      0.2
4         KRI1           SYBR      32.89526      0.2
4         KRI1           SYBR      32.99276      0.2
4         KRI1           SYBR      32.82018      0.2
5         KRI1           SYBR      36.56302      0.2
5         KRI1           SYBR      36.94241      0.2
5         KRI1           SYBR      Undetermined  0.2
NTC       KRI1           SYBR      Undetermined  0.2
NTC       KRI1           SYBR      Undetermined  0.2
NTC       KRI1           SYBR      Undetermined  0.2

Appendix Table 10.14. Results of KRI1 assay efficiency validation experiment 2.


Dilution  Detector Name  Reporter  Ct            Threshold
1         SLC44A2        SYBR      25.09432      0.2
1         SLC44A2        SYBR      25.03334      0.2
1         SLC44A2        SYBR      25.00832      0.2
2         SLC44A2        SYBR      28.3916       0.2
2         SLC44A2        SYBR      28.32836      0.2
2         SLC44A2        SYBR      28.37659      0.2
3         SLC44A2        SYBR      31.20297      0.2
3         SLC44A2        SYBR      31.18634      0.2
3         SLC44A2        SYBR      31.36222      0.2
4         SLC44A2        SYBR      35.54755      0.2
4         SLC44A2        SYBR      34.4162       0.2
4         SLC44A2        SYBR      36.41696      0.2
5         SLC44A2        SYBR      Undetermined  0.2
5         SLC44A2        SYBR      Undetermined  0.2
5         SLC44A2        SYBR      Undetermined  0.2
NTC       SLC44A2        SYBR      39.99793      0.2
NTC       SLC44A2        SYBR      Undetermined  0.2
NTC       SLC44A2        SYBR      Undetermined  0.2

Appendix Table 10.15. Results of SLC44A2 assay efficiency validation experiment 2. One of the no template control (NTC) samples showed amplification of SLC44A2 at approximately 36 cycles. This may indicate weak primer-dimer formation in this well or contamination during experimental set-up.

Dilution  Detector Name  Reporter  Ct            Threshold
1         AP1M2          SYBR      37.42623      0.2
1         AP1M2          SYBR      37.50364      0.2
1         AP1M2          SYBR      37.48217      0.2
2         AP1M2          SYBR      39.79408      0.2
2         AP1M2          SYBR      39.24894      0.2
2         AP1M2          SYBR      39.53138      0.2
3         AP1M2          SYBR      38.02202      0.2
3         AP1M2          SYBR      39.61375      0.2
3         AP1M2          SYBR      Undetermined  0.2
4         AP1M2          SYBR      39.63942      0.2
4         AP1M2          SYBR      39.96813      0.2
4         AP1M2          SYBR      39.63198      0.2
5         AP1M2          SYBR      37.13278      0.2
5         AP1M2          SYBR      38.33423      0.2
5         AP1M2          SYBR      38.64352      0.2
NTC       AP1M2          SYBR      39.61118      0.2
NTC       AP1M2          SYBR      36.78981      0.2
NTC       AP1M2          SYBR      38.78393      0.2

Appendix Table 10.16. Results of AP1M2 assay efficiency validation experiment 2.


Appendix Figure 10.4. Amplification plot and standard curve of GAPDH, CDKN2D, SLC44A2 and KRI1 dilution series for validation experiment 2. The red line on the amplification plot indicates the Ct threshold.


Appendix Figure 10.5. Amplification plot for AP1M2 dilutions for validation experiment 2. This plot shows that AP1M2 was not efficiently amplified at these dilutions.

Appendix Figure 10.6. Amplification plots showing gene of interest in relation to GAPDH for samples P1:III:3 and P1:II:1. A = SLC44A2, B = CDKN2D and C = KRI1.


Probeset Name  Gene Symbol  Log2 Fold Change  P Value  Adjusted P Value  P1:III:3_r1  P1:III:3_r2  P1:III:3_r3  P1:II:1_r1  P1:II:1_r2  P1:II:1_r3
204409_s_at  EIF1AY  10.658  0.000004  0.001589  -1.233  2.043  -0.543  10.779  10.838  10.624
230760_at  LOC100130829 /// ZFY  9.474  0.000002  0.001058  -3.743  -0.981  -1.915  7.238  7.295  7.250
228492_at  LOC100130216 /// USP9Y  8.967  0.000310  0.010446  -2.942  -2.252  2.482  8.109  8.038  8.043
201909_at  LOC100133662 /// RPS4Y1  8.955  0.000102  0.006235  -0.108  0.913  4.556  10.713  10.681  10.831
211149_at  LOC100130224 /// UTY  8.194  0.000046  0.004246  0.975  -1.607  -2.878  6.891  7.123  7.058
206700_s_at  JARID1D  7.939  0.000046  0.004286  1.774  -1.674  1.466  8.426  8.396  8.560
232618_at  CYorf15A  7.524  0.000047  0.004313  0.463  -1.562  2.047  7.925  7.894  7.701
205000_at  DDX3Y  7.420  0.000006  0.001752  3.338  1.944  0.762  9.430  9.571  9.302
206624_at  LOC100130216 /// USP9Y  7.410  0.000007  0.001883  -2.664  -1.920  -0.123  5.849  6.071  5.603
236694_at  CYorf15A  7.396  0.000371  0.011428  1.918  1.211  -2.721  7.509  7.742  7.344
207246_at  LOC100130829 /// ZFY  6.892  0.000055  0.004752  -1.437  -2.186  1.043  6.186  6.110  5.799
223645_s_at  CYorf15B  5.633  0.000093  0.006013  2.972  0.298  2.852  7.744  7.603  7.673
223646_s_at  CYorf15B  5.537  0.000300  0.010269  2.373  2.918  -0.444  7.095  7.249  7.115
205001_s_at  DDX3Y /// LOC100130220  5.064  0.000000  0.000274  3.522  4.196  3.502  8.777  8.841  8.795
214983_at  TTTY15  4.713  0.000006  0.001690  2.039  1.209  0.477  5.844  5.915  6.105
206679_at  APBA1  3.951  0.001385  0.022734  0.731  3.275  3.770  6.487  6.167  6.973
1570360_s_at  DDX3Y /// LOC100130220  3.718  0.000036  0.003844  1.661  2.072  3.264  6.005  6.180  5.968
231795_at  STON1  3.614  0.001364  0.022555  -0.475  2.511  1.235  4.424  4.647  5.043
235942_at  LOC401629 /// LOC401630  3.514  0.000007  0.001800  1.731  1.390  2.082  5.808  5.034  4.903
1566465_at  LOC728987  3.395  0.011731  0.070021  -1.167  1.670  3.158  4.492  4.486  4.867
219501_at  ENOX1  3.319  0.007144  0.053636  4.939  1.516  1.707  6.319  5.956  5.843
221796_at  NTRK2  3.300  0.000021  0.002939  2.863  1.972  3.238  6.206  5.914  5.853
229065_at  SLC35F3  3.221  0.000005  0.001682  5.404  5.781  6.407  9.147  8.978  9.129
234994_at  TMEM200A  2.907  0.000001  0.000827  5.388  5.100  4.726  7.923  8.007  8.005
209183_s_at  C10orf10  2.845  0.000002  0.001031  3.925  3.441  4.061  6.851  6.566  6.546
236181_at  LOC100132181  2.793  0.000269  0.009702  1.242  1.470  0.563  4.680  3.758  3.214

Appendix Table 10.17. Raw data for differentially expressed genes identified on expression array.
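The "Log2 Fold Change" column in Appendix Table 10.17 is consistent with the difference between the mean of the three P1:II:1 replicates and the mean of the three P1:III:3 replicates (checked against the first and last rows). The short sketch below reproduces that arithmetic for the EIF1AY probeset 204409_s_at; it assumes the replicate columns are log2-scale expression values, and the function name is illustrative rather than taken from the analysis software.

```python
from statistics import mean


def log2_fold_change(reference_reps, comparison_reps):
    """Difference of replicate means; valid only if inputs are already log2 values."""
    return mean(comparison_reps) - mean(reference_reps)


# Replicate values for probeset 204409_s_at (EIF1AY) from Appendix Table 10.17.
eif1ay_p1_iii_3 = [-1.233, 2.043, -0.543]
eif1ay_p1_ii_1 = [10.779, 10.838, 10.624]

fc = log2_fold_change(eif1ay_p1_iii_3, eif1ay_p1_ii_1)
print(f"log2 fold change (P1:II:1 relative to P1:III:3) = {fc:.3f}")
```

Positive values therefore indicate higher expression in P1:II:1, and negative values higher expression in P1:III:3.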


Probeset Name  Gene Symbol  Log2 Fold Change  P Value  Adjusted P Value  P1:III:3_r1  P1:III:3_r2  P1:III:3_r3  P1:II:1_r1  P1:II:1_r2  P1:II:1_r3
207276_at  CDR1  2.770  0.013322  0.074744  4.294  3.736  0.920  6.176  5.685  5.399
1570162_at  C14orf91  2.763  0.000700  0.015631  0.114  2.011  1.303  4.338  3.670  3.708
224293_at  TTTY10  2.716  0.004925  0.043996  0.543  2.518  3.268  4.718  5.277  4.481
210322_x_at  UTY  2.609  0.000545  0.014042  4.974  3.188  4.181  6.944  6.763  6.462
228425_at  LOC654433  2.607  0.000987  0.018938  3.749  3.603  1.903  5.752  5.764  5.560
226665_at  AHSA2  2.604  0.000008  0.001904  4.239  3.615  4.184  6.920  6.471  6.458
214468_at  MYH6  2.596  0.013201  0.074356  4.748  1.756  4.748  6.321  6.441  6.277
238546_at  SLC8A1  2.508  0.021513  0.097370  0.076  3.148  2.937  3.692  5.003  4.990
232968_at  FANK1  2.465  0.000026  0.003264  4.154  4.059  4.736  6.490  6.683  7.173
1553183_at  UMODL1  2.450  0.024193  0.104057  -1.384  1.672  2.013  3.478  3.094  3.077
205206_at  KAL1  2.447  0.010930  0.067592  0.134  2.567  2.996  4.558  4.096  4.384
208067_x_at  LOC100130224 /// UTY  2.404  0.000028  0.003377  6.486  6.335  5.710  8.324  8.889  8.530
213413_at  STON1  2.370  0.000310  0.010446  1.451  2.814  2.573  4.480  4.583  4.886
206279_at  PRKY  2.341  0.000005  0.001690  3.637  4.003  4.284  6.395  6.400  6.153
1555136_at  FGD6  2.259  0.019276  0.092221  1.693  0.284  3.452  4.018  4.510  3.678
242679_at  LOC100131039  2.228  0.000041  0.004109  5.847  5.105  5.862  7.956  7.977  7.564
230546_at  VASH1  2.213  0.000230  0.009002  4.063  2.925  3.363  5.550  5.395  6.045
1556046_a_at  LOC157627  2.044  0.000613  0.014823  5.351  3.963  4.933  6.770  6.973  6.635
224058_s_at  HSD17B7P2  2.015  0.000436  0.012502  2.804  2.936  1.756  4.452  4.350  4.737
212999_x_at  HLA-DQB1  1.973  0.000029  0.003431  6.442  5.949  6.686  8.196  8.338  8.463
221730_at  COL5A2  1.919  0.011703  0.069889  3.409  1.046  2.697  4.564  4.282  4.063
218418_s_at  KANK2  1.899  0.005792  0.048038  1.763  3.681  3.340  4.556  4.922  5.002
224685_at  MLLT4  1.896  0.000006  0.001711  6.242  5.751  5.801  7.844  7.826  7.812
202686_s_at  AXL  1.891  0.066808  0.191258  4.472  4.208  1.059  4.816  5.102  5.493
223168_at  RHOU  1.881  0.000015  0.002551  6.808  6.410  6.168  8.362  8.285  8.383
218976_at  DNAJC12  1.840  0.000021  0.002939  6.431  5.881  5.908  8.067  7.779  7.892
208059_at  CCR8  1.838  0.000010  0.002143  4.415  4.045  4.104  6.274  5.890  5.916

Appendix Table 10.17. Continued.


Probeset Name  Gene Symbol  Log2 Fold Change  P Value  Adjusted P Value  P1:III:3_r1  P1:III:3_r2  P1:III:3_r3  P1:II:1_r1  P1:II:1_r2  P1:II:1_r3
230097_at  GART  1.832  0.000791  0.016742  3.341  3.919  4.637  5.821  5.997  5.576
214129_at  PDE4DIP  1.795  0.001354  0.022473  5.588  5.782  6.938  8.023  7.898  7.773
219179_at  DACT1  1.790  0.000008  0.001907  5.184  5.029  5.404  6.779  7.122  7.084
240744_at  CPA5  1.787  0.000013  0.002333  4.356  4.700  4.268  6.363  6.015  6.309
221156_x_at  CCPG1  1.784  0.000004  0.001662  4.350  4.114  3.978  5.793  6.030  5.970
209473_at  ENTPD1  1.763  0.000001  0.000755  7.766  7.580  7.587  9.429  9.360  9.433
232509_at  PDE4DIP  1.758  0.007372  0.054664  5.303  3.519  3.579  5.758  5.995  5.921
230337_at  SOS1  1.736  0.000097  0.006079  6.558  6.049  5.702  7.859  7.813  7.845
1562226_at  VWDE  1.735  0.012192  0.071169  4.445  2.208  3.470  5.226  5.057  5.044
205229_s_at  COCH  1.734  0.000074  0.005406  6.205  5.779  6.581  7.903  7.860  8.005
238871_at  MLLT4  1.732  0.000013  0.002333  5.590  5.258  5.125  7.055  7.205  6.910
1560378_at  C21orf41  1.716  0.001559  0.024369  2.870  1.660  2.929  4.144  4.334  4.130
231437_at  SLC35D2  1.702  0.000463  0.012893  3.081  2.978  3.976  5.143  4.874  5.124
207691_x_at  ENTPD1  1.686  0.000049  0.004460  5.979  6.521  6.191  8.033  7.642  8.076
205619_s_at  MEOX1  1.682  0.000256  0.009563  5.394  4.713  5.515  6.680  6.817  7.171
234673_at  HHLA2  1.672  0.000119  0.006741  3.729  4.223  4.264  6.049  5.751  5.431
239810_at  VASH1  1.670  0.012781  0.073000  4.254  3.236  2.433  4.547  5.669  4.718
1560814_a_at  C15orf57  1.664  0.000119  0.006741  4.030  4.780  4.699  6.295  6.098  6.107
235238_at  SHC4  1.648  0.000007  0.001841  3.864  3.622  3.779  5.274  5.618  5.314
223915_at  BCOR  1.616  0.000011  0.002187  6.271  6.012  6.443  7.743  7.958  7.872
213083_at  SLC35D2  1.615  0.000234  0.009070  6.127  5.368  5.757  7.078  7.389  7.631
233675_s_at  LOC374491  1.610  0.000079  0.005604  4.061  4.734  4.615  5.989  6.041  6.209

Appendix Table 10.17. Continued.


Probeset Name  Gene Symbol  Log2 Fold Change  P Value  Adjusted P Value  P1:III:3_r1  P1:III:3_r2  P1:III:3_r3  P1:II:1_r1  P1:II:1_r2  P1:II:1_r3
1563209_a_at  MACROD2  -1.601  0.000614  0.014834  6.962  7.388  7.890  5.481  6.107  5.848
231821_x_at  LOC100128460 /// LOC100133177 /// LOC349114 /// LOC388312 /// LOC399744 /// LOC441124 /// LOC729021 /// LOC729218 /// LOC729660 /// LOC730235 /// tcag7.907  -1.611  0.000027  0.003347  7.737  8.127  8.114  6.342  6.603  6.200
229487_at  EBF1  -1.614  0.000136  0.007093  5.825  6.508  5.814  4.312  4.402  4.591
223767_at  GPR84  -1.616  0.000002  0.001031  6.669  6.507  6.532  5.058  4.957  4.843
241495_at  CCNL1  -1.616  0.000009  0.001919  9.271  9.328  8.910  7.584  7.551  7.525
228377_at  KLHL14  -1.617  0.000283  0.009899  5.029  4.233  4.299  2.662  3.023  3.023
210587_at  INHBE  -1.625  0.003142  0.035017  4.479  4.240  4.621  2.677  3.646  2.141
1565715_at  FUS  -1.629  0.000004  0.001528  8.194  8.096  8.273  6.734  6.498  6.445
1559975_at  BTG1  -1.633  0.000187  0.008087  6.870  7.287  6.441  5.349  5.312  5.038
219841_at  AICDA  -1.634  0.000002  0.001140  9.541  9.670  9.422  7.969  7.962  7.800
220051_at  PRSS21  -1.637  0.000466  0.012924  6.038  6.076  6.199  5.022  4.433  3.946
215393_s_at  COBLL1  -1.638  0.000378  0.011559  4.931  5.313  5.131  3.334  3.093  4.033
208092_s_at  FAM49A  -1.641  0.000126  0.006781  6.510  6.580  6.193  5.215  4.562  4.584
214777_at  IGKV4-1  -1.644  0.004326  0.041288  5.309  5.509  5.342  4.119  4.333  2.777
233182_x_at  ATXN3  -1.651  0.000512  0.013658  6.590  7.377  6.931  5.673  5.386  4.885
213005_s_at  KANK1  -1.655  0.033863  0.126204  4.380  4.409  4.819  1.827  4.415  2.401
204533_at  CXCL10  -1.660  0.000002  0.001031  9.324  9.293  9.256  7.600  7.505  7.789
219517_at  ELL3  -1.662  0.000141  0.007196  7.162  7.045  7.011  5.027  5.881  5.325
39318_at  TCL1A  -1.668  0.000097  0.006079  6.400  6.165  6.418  4.436  5.105  4.438

Appendix Table 10.17. Continued.


Probeset Name  Gene Symbol  Log2 Fold Change  P Value  Adjusted P Value  P1:III:3_r1  P1:III:3_r2  P1:III:3_r3  P1:II:1_r1  P1:II:1_r2  P1:II:1_r3
236995_x_at  TFEC  -1.670  0.002170  0.028725  3.772  4.210  3.160  2.311  2.394  1.428
211506_s_at  IL8  -1.677  0.000713  0.015771  6.407  6.426  6.343  5.096  5.033  4.016
209138_x_at  IGL@  -1.685  0.000002  0.001031  8.686  8.568  8.752  7.097  6.858  6.998
238164_at  USP6NL  -1.690  0.001863  0.026440  6.155  5.786  5.937  4.703  3.444  4.660
230245_s_at  LOC283663  -1.693  0.000001  0.000885  6.996  7.092  7.148  5.404  5.472  5.280
217979_at  TSPAN13  -1.696  0.000060  0.004923  8.270  7.988  7.815  5.978  6.516  6.491
203561_at  FCGR2A  -1.698  0.000023  0.003109  6.623  6.748  7.087  5.048  4.968  5.348
1562657_a_at  C10orf90  -1.699  0.060548  0.179928  4.110  4.468  3.970  3.428  3.455  0.568
214976_at  RPL13  -1.704  0.000295  0.010172  5.665  5.708  5.340  3.489  3.702  4.410
228592_at  MS4A1  -1.710  0.000001  0.000959  10.253  10.446  10.319  8.735  8.539  8.614
217649_at  ZFAND5  -1.717  0.002346  0.030030  5.626  5.046  5.323  4.410  3.465  2.971
202625_at  LYN  -1.732  0.000008  0.001910  7.680  7.801  7.343  5.888  5.779  5.961
221766_s_at  FAM46A  -1.735  0.000045  0.004246  5.862  5.768  5.467  4.257  3.990  3.644
232884_s_at  ZNF853  -1.737  0.005719  0.047811  4.623  3.550  3.452  2.852  2.111  1.452
213002_at  MARCKS  -1.746  0.000166  0.007797  9.462  10.237  10.015  8.251  8.369  7.856
229584_at  LRRK2  -1.756  0.000070  0.005325  5.567  5.847  5.545  3.847  3.550  4.294
205267_at  POU2AF1  -1.757  0.000001  0.000746  9.872  9.848  9.912  8.176  8.011  8.176
209200_at  MEF2C  -1.758  0.000510  0.013642  7.468  7.739  7.915  5.377  5.983  6.489
1565717_s_at  FUS  -1.761  0.000008  0.001910  8.270  7.912  8.389  6.335  6.484  6.468
210347_s_at  BCL11A  -1.777  0.000173  0.007846  5.952  5.839  5.196  4.207  3.760  3.689
214090_at  LOC100133105  -1.798  0.000017  0.002713  6.461  6.924  6.545  4.622  4.913  4.998
209098_s_at  JAG1  -1.817  0.000726  0.015948  6.170  6.396  6.678  4.506  5.254  4.034
207113_s_at  TNF  -1.817  0.000033  0.003641  7.996  8.050  7.897  6.014  6.575  5.903
1560647_at  TSPYL1  -1.823  0.000067  0.005304  5.569  5.389  5.003  3.799  3.519  3.175
203753_at  TCF4  -1.828  0.000142  0.007212  8.157  8.218  8.407  5.910  6.548  6.841
228599_at  MS4A1  -1.837  0.000002  0.001058  7.288  7.266  7.600  5.587  5.500  5.558
212099_at  RHOB  -1.846  0.000000  0.000743  8.826  8.893  8.800  7.040  6.891  7.052

Appendix Table 10.17. Continued.


Probeset Name  Gene Symbol  Log2 Fold Change  P Value  Adjusted P Value  P1:III:3_r1  P1:III:3_r2  P1:III:3_r3  P1:II:1_r1  P1:II:1_r2  P1:II:1_r3
215078_at  SOD2  -1.855  0.000096  0.006079  6.146  6.395  6.297  4.744  4.619  3.911
226068_at  SYK  -1.859  0.000072  0.005354  6.697  6.822  6.645  4.482  4.772  5.333
222891_s_at  BCL11A  -1.876  0.000480  0.013149  6.352  5.753  6.319  4.854  4.156  3.787
44790_s_at  C13orf18 /// LOC728970  -1.878  0.000000  0.000710  8.549  8.610  8.565  6.635  6.742  6.713
241843_at  SNORA28  -1.879  0.000192  0.008196  6.127  6.621  5.886  4.463  4.638  3.897
1559361_at  MACC1  -1.888  0.002353  0.030048  3.581  4.009  3.594  1.962  2.607  0.951
226459_at  PIK3AP1  -1.889  0.000015  0.002535  8.827  8.636  8.616  6.870  7.072  6.469
1554676_at  SRGN  -1.891  0.000012  0.002241  7.697  7.772  7.486  5.880  5.956  5.445
221651_x_at  IGK@ /// IGKC  -1.910  0.000001  0.000746  9.833  9.886  9.972  8.039  7.852  8.071
204689_at  HHEX  -1.912  0.000156  0.007544  6.066  6.692  6.501  4.896  4.063  4.563
203922_s_at  CYBB  -1.919  0.000005  0.001682  6.703  6.849  6.510  4.851  4.545  4.909
214677_x_at  IGL@  -1.930  0.000005  0.001682  9.132  9.255  9.143  7.185  7.026  7.529
204005_s_at  PAWR  -1.931  0.000002  0.001107  6.572  6.439  6.557  4.360  4.665  4.752
220266_s_at  KLF4  -1.942  0.001254  0.021482  4.694  4.976  5.375  2.977  2.409  3.831
204866_at  PHF16  -1.944  0.000135  0.007069  6.868  6.766  6.691  4.392  4.702  5.400
1555827_at  CCNL1  -1.963  0.000001  0.000746  9.987  9.921  9.795  8.017  7.965  7.832
226818_at  MPEG1  -1.976  0.000110  0.006565  5.654  5.154  5.262  3.848  3.322  2.971
201462_at  SCRN1  -1.991  0.009701  0.063136  5.564  5.275  5.039  3.996  1.911  3.996
228056_s_at  NAPSB  -1.997  0.000094  0.006013  5.217  5.734  5.797  3.582  3.195  3.980
210356_x_at  MS4A1  -2.002  0.000001  0.000746  9.702  9.697  9.679  7.867  7.581  7.624
217607_x_at  EIF4G2  -2.022  0.000006  0.001762  9.023  9.085  8.902  7.143  7.151  6.651
229597_s_at  WDFY4  -2.024  0.000018  0.002778  7.259  6.893  7.053  4.997  5.392  4.743
210796_x_at  SIGLEC6  -2.030  0.000007  0.001790  5.332  5.770  5.639  3.752  3.565  3.335
221671_x_at  IGK@ /// IGKC  -2.033  0.000001  0.000850  9.868  9.887  9.893  7.794  7.690  8.063
212386_at  TCF4  -2.036  0.000217  0.008733  6.627  6.934  6.560  5.237  4.676  4.100
229937_x_at  LILRB1  -2.047  0.000001  0.000903  8.678  8.754  8.962  6.893  6.750  6.609
216491_x_at  IGHM  -2.058  0.000115  0.006634  5.683  5.934  6.068  4.422  3.545  3.546

Appendix Table 10.17. Continued.


Probeset Name  Gene Symbol  Log2 Fold Change  P Value  Adjusted P Value  P1:III:3_r1  P1:III:3_r2  P1:III:3_r3  P1:II:1_r1  P1:II:1_r2  P1:II:1_r3
212225_at  EIF1  -2.089  0.000004  0.001528  7.737  8.260  7.857  5.906  5.823  5.858
233937_at  GGNBP2  -2.096  0.002217  0.029046  4.581  4.540  4.248  1.291  3.063  2.729
204249_s_at  LMO2  -2.104  0.000001  0.000755  7.848  7.870  7.670  5.603  5.603  5.870
232737_s_at  ENPP3  -2.106  0.000720  0.015880  3.090  3.496  3.511  2.105  0.956  0.718
1565674_at  FCGR2A /// FCGR2B /// FCGR2C  -2.119  0.000010  0.002071  6.460  6.177  6.253  3.895  4.114  4.524
228158_at  LOC645166  -2.121  0.000304  0.010364  5.290  5.527  5.771  2.891  4.104  3.231
217418_x_at  MS4A1  -2.134  0.000001  0.000959  9.596  9.645  9.494  7.682  7.383  7.267
225021_at  ZNF532  -2.146  0.000042  0.004109  4.606  4.801  4.673  2.922  2.032  2.688
222915_s_at  BANK1  -2.147  0.000001  0.000746  7.114  7.137  7.009  4.841  5.123  4.855
224795_x_at  IGK@ /// IGKC  -2.150  0.000001  0.000755  10.248  10.222  10.223  8.226  7.864  8.152
207655_s_at  BLNK  -2.169  0.000010  0.002047  8.376  8.290  7.878  6.153  5.731  6.154
1553906_s_at  FGD2  -2.174  0.000000  0.000710  9.927  9.854  9.833  7.791  7.547  7.756
224405_at  FCRL5  -2.179  0.000020  0.002868  6.947  6.552  6.838  4.718  4.891  4.190
219498_s_at  BCL11A  -2.185  0.000021  0.002939  7.111  6.922  6.905  5.097  4.958  4.327
241446_at  ADAM28  -2.214  0.000017  0.002713  5.829  5.333  5.393  3.593  2.974  3.345
1565716_at  FUS  -2.242  0.000000  0.000710  9.715  9.847  9.625  7.465  7.390  7.605
212592_at  IGJ  -2.258  0.000000  0.000710  8.141  8.001  8.036  5.805  5.952  5.645
209199_s_at  MEF2C  -2.263  0.000000  0.000710  8.183  8.199  8.205  6.118  5.892  5.789
205997_at  ADAM28  -2.278  0.000001  0.000755  6.981  6.714  6.610  4.415  4.442  4.613
203923_s_at  CYBB  -2.284  0.000176  0.007874  6.435  6.859  7.252  5.140  4.413  4.141
1554508_at  PIK3AP1  -2.307  0.000097  0.006079  6.310  6.205  5.792  3.856  4.294  3.237
224404_s_at  FCRL5  -2.314  0.000070  0.005325  6.670  6.035  6.391  4.569  3.659  3.927
201445_at  CNN3  -2.327  0.000057  0.004796  5.152  4.598  4.155  2.344  2.512  2.069
207540_s_at  SYK  -2.330  0.000259  0.009571  5.191  5.498  5.403  3.835  2.522  2.745
208650_s_at  CD24  -2.346  0.000022  0.003021  6.942  6.556  6.828  4.564  4.767  3.956
202975_s_at  RHOBTB3  -2.360  0.000541  0.014027  4.263  3.640  3.983  1.041  2.484  1.281
206641_at  TNFRSF17  -2.366  0.000092  0.006005  5.774  5.540  5.824  2.922  3.099  4.020

Appendix Table 10.17. Continued.


Probeset Name  Gene Symbol  Log2 Fold Change  P Value  Adjusted P Value  P1:III:3_r1  P1:III:3_r2  P1:III:3_r3  P1:II:1_r1  P1:II:1_r2  P1:II:1_r3
231647_s_at  FCRL5  -2.366  0.000125  0.006781  6.664  6.630  5.976  3.478  4.237  4.456
211276_at  TCEAL2  -2.380  0.003906  0.039270  4.720  4.280  3.862  0.588  2.630  2.505
223343_at  MS4A7  -2.381  0.017812  0.088279  3.537  3.719  3.236  2.161  -0.823  2.011
219497_s_at  BCL11A  -2.422  0.000119  0.006741  7.337  6.770  6.142  4.250  4.623  4.110
1563674_at  FCRL2  -2.425  0.000057  0.004824  7.104  7.110  6.807  4.757  5.021  3.968
1558185_at  CLLU1  -2.429  0.000017  0.002745  6.454  7.107  6.856  4.170  4.754  4.206
227388_at  TUSC1  -2.429  0.024317  0.104417  4.741  4.484  4.039  3.216  2.880  -0.120
38521_at  CD22  -2.452  0.000001  0.000755  7.627  7.619  7.897  5.486  5.123  5.178
230276_at  FAM49A  -2.490  0.000114  0.006633  4.220  4.069  3.832  0.875  2.147  1.628
227646_at  EBF1  -2.497  0.000419  0.012231  6.242  6.582  6.795  4.440  3.124  4.563
222040_at  HNRNPA1 /// LOC728844  -2.498  0.000000  0.000274  9.148  9.114  9.129  6.662  6.641  6.595
204581_at  CD22  -2.529  0.000001  0.000755  7.332  7.649  7.194  4.744  4.973  4.871
235673_at  TAF1B  -2.531  0.000025  0.003171  5.312  5.458  5.488  3.312  2.322  3.030
233261_at  EBF1  -2.551  0.000015  0.002581  7.598  7.572  7.435  4.573  4.879  5.499
220146_at  TLR7  -2.579  0.000007  0.001769  7.382  7.729  7.229  5.110  4.495  4.998
219667_s_at  BANK1  -2.601  0.000003  0.001528  6.488  6.385  6.194  3.676  4.128  3.459
203641_s_at  COBLL1  -2.667  0.002299  0.029715  5.524  5.830  5.933  1.702  3.613  3.971
1558662_s_at  BANK1  -2.669  0.000008  0.001907  6.471  6.398  6.691  3.557  4.337  3.660
204959_at  MNDA  -2.699  0.000000  0.000743  6.459  6.923  6.565  3.945  3.895  4.012
212387_at  TCF4  -2.710  0.000023  0.003065  7.846  8.202  7.260  5.099  5.319  4.762
217422_s_at  CD22  -2.746  0.000000  0.000274  7.612  7.626  7.791  4.982  4.939  4.869
1566764_at  MACC1  -2.782  0.000041  0.004109  4.363  4.678  4.658  2.463  1.304  1.587
235372_at  FCRLA  -2.818  0.000000  0.000581  7.552  7.281  7.440  4.444  4.757  4.617
215933_s_at  HHEX  -2.838  0.000002  0.000978  5.867  6.116  5.890  2.870  3.489  3.002
220120_s_at  EPB41L4A  -2.894  0.024616  0.105076  4.506  4.613  4.692  3.311  -0.847  2.664
216436_at  PIK3R4  -2.898  0.000241  0.009182  4.649  5.022  4.095  2.202  0.810  2.058
205504_at  BTK  -2.910  0.000006  0.001711  6.650  6.465  5.808  3.588  3.270  3.334

Appendix Table 10.17. Continued.


Probeset Name  Gene Symbol  Log2 Fold Change  P Value  Adjusted P Value  P1:III:3_r1  P1:III:3_r2  P1:III:3_r3  P1:II:1_r1  P1:II:1_r2  P1:II:1_r3
216662_at  MYO7B  -2.951  0.000027  0.003305  4.075  4.646  3.547  1.309  0.798  1.309
238900_at  HLA-DRB1 /// HLA-DRB2 /// HLA-DRB3 /// HLA-DRB4 /// HLA-DRB5 /// LOC100133484 /// LOC100133661 /// LOC100133811 /// LOC730415 /// RNASE2 /// ZNF749  -3.005  0.000006  0.001690  4.180  4.854  4.209  1.693  1.446  1.089
239905_at  YTHDC1  -3.009  0.002827  0.033114  4.856  5.202  4.871  0.376  3.196  2.328
206586_at  CNR2  -3.036  0.000026  0.003258  5.824  5.606  5.426  2.379  3.267  2.102
244023_at  SYK  -3.052  0.001176  0.020718  4.858  4.315  4.982  2.691  0.342  1.966
1552782_at  SLC44A5  -3.060  0.008176  0.057564  3.375  4.030  3.434  -1.453  2.037  1.073
209189_at  FOS  -3.165  0.000638  0.014996  5.874  5.781  6.060  1.404  3.430  3.384
211644_x_at  IGK@ /// IGKC /// IGKV3-20 /// IGKV3D-11 /// IGKV3D-15 /// LOC440871  -3.198  0.000109  0.006565  6.881  6.552  6.724  4.458  3.310  2.796
215565_at  DTNB  -3.236  0.000000  0.000710  8.429  7.963  8.083  5.113  4.730  4.924
1566766_a_at  MACC1  -3.248  0.001947  0.027124  5.855  5.590  5.470  3.486  2.945  0.742
228439_at  BATF2  -3.356  0.001497  0.023720  6.048  6.535  6.115  2.565  4.426  1.638
229088_at  ENPP1  -3.401  0.000201  0.008358  5.516  5.747  5.748  2.895  1.096  2.818
210889_s_at  FCGR2B  -3.507  0.000150  0.007400  6.068  6.050  5.745  2.745  1.341  3.254
228285_at  TDRD9  -3.768  0.000814  0.017043  4.337  4.763  4.270  0.253  -0.473  2.286
210432_s_at  SCN3A  -3.786  0.000006  0.001690  5.190  5.717  5.097  1.918  0.934  1.792
226489_at  TMCC3  -4.003  0.000392  0.011777  5.248  5.314  5.444  2.899  0.529  0.568
235401_s_at  FCRLA  -4.054  0.000004  0.001606  6.267  5.611  5.917  2.165  2.221  1.245
237813_at  PCBP2 /// PCBP2P2  -4.175  0.000583  0.014512  5.629  5.619  5.129  -0.402  1.803  2.450
244546_at  CYCS  -4.233  0.000171  0.007838  3.347  4.758  3.380  0.698  -1.002  -0.912

Appendix Table 10.17. Continued.


Probeset Name  Gene Symbol  Log2 Fold Change  P Value  Adjusted P Value  P1:III:3_r1  P1:III:3_r2  P1:III:3_r3  P1:II:1_r1  P1:II:1_r2  P1:II:1_r3
204470_at  CXCL1  -4.796  0.000012  0.002321  5.489  5.324  5.383  1.648  0.231  -0.072
209925_at  OCLN  -5.777  0.000005  0.001690  4.756  5.691  5.379  0.250  -0.327  -1.428
243712_at  XIST  -8.346  0.000012  0.002321  9.448  9.365  9.475  2.930  0.449  -0.129
224590_at  XIST  -8.895  0.000000  0.000274  8.192  8.204  8.015  -0.953  -1.355  0.034
214218_s_at  XIST  -9.952  0.000004  0.001606  10.947  11.024  10.848  2.615  -0.638  0.985
224588_at  XIST  -10.793  0.000000  0.000274  11.047  11.123  11.080  -0.362  -0.120  1.353
227671_at  XIST  -11.353  0.000023  0.003098  8.699  8.569  8.417  -2.867  -0.309  -5.199
221728_x_at  XIST  -12.095  0.000000  0.000004  11.306  11.341  11.212  -0.415  -0.859  -1.152

Appendix Table 10.17. Continued.


Sample Name  Assay  Detector Name  Reporter  Task     Ct            Threshold
P5:II:1      1      NOBOX          FAM       Unknown  27.240421     0.2
P5:II:1      1      NOBOX          FAM       Unknown  27.31159      0.2
P5:II:1      1      NOBOX          FAM       Unknown  27.456335     0.2
P5:II:2      1      NOBOX          FAM       Unknown  26.797153     0.2
P5:II:2      1      NOBOX          FAM       Unknown  26.869265     0.2
P5:II:2      1      NOBOX          FAM       Unknown  26.939661     0.2
P1:III:4     1      NOBOX          FAM       Unknown  27.905571     0.2
P1:III:4     1      NOBOX          FAM       Unknown  28.000534     0.2
P1:III:4     1      NOBOX          FAM       Unknown  27.968296     0.2
NEG          1      NOBOX          FAM       NTC      Undetermined  0.2
NEG          1      NOBOX          FAM       NTC      Undetermined  0.2
NEG          1      NOBOX          FAM       NTC      Undetermined  0.2
P5:II:1      1      RNaseP         VIC       Unknown  27.511667     0.2
P5:II:1      1      RNaseP         VIC       Unknown  27.45318      0.2
P5:II:1      1      RNaseP         VIC       Unknown  27.647167     0.2
P5:II:2      1      RNaseP         VIC       Unknown  26.747095     0.2
P5:II:2      1      RNaseP         VIC       Unknown  26.86929      0.2
P5:II:2      1      RNaseP         VIC       Unknown  26.609753     0.2
P1:III:4     1      RNaseP         VIC       Unknown  27.28803      0.2
P1:III:4     1      RNaseP         VIC       Unknown  27.272635     0.2
P1:III:4     1      RNaseP         VIC       Unknown  27.282894     0.2
NEG          1      RNaseP         VIC       NTC      Undetermined  0.2
NEG          1      RNaseP         VIC       NTC      Undetermined  0.2
NEG          1      RNaseP         VIC       NTC      Undetermined  0.2

Appendix Table 10.18. Raw data for NOBOX Taqman assay 1.
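Duplex Taqman copy number data of this kind are commonly interpreted with the comparative Ct (delta-delta Ct) approach, in which the target (NOBOX, FAM) is normalised to the reference assay (RNaseP, VIC) and then to a calibrator sample assumed to carry two copies. The sketch below applies that arithmetic to the Ct values in Appendix Table 10.18. The choice of P5:II:2 as calibrator is purely an assumption made for illustration, as are the function names; the thesis may have used dedicated software and a different calibrator.

```python
from statistics import mean

# Ct replicates from Appendix Table 10.18 (NOBOX Taqman assay 1).
nobox_ct = {
    "P5:II:1": [27.240421, 27.31159, 27.456335],
    "P5:II:2": [26.797153, 26.869265, 26.939661],
    "P1:III:4": [27.905571, 28.000534, 27.968296],
}
rnasep_ct = {
    "P5:II:1": [27.511667, 27.45318, 27.647167],
    "P5:II:2": [26.747095, 26.86929, 26.609753],
    "P1:III:4": [27.28803, 27.272635, 27.282894],
}

CALIBRATOR = "P5:II:2"   # hypothetical two-copy calibrator (assumption)
CALIBRATOR_COPIES = 2


def delta_ct(sample):
    """Mean target Ct minus mean reference Ct for one sample."""
    return mean(nobox_ct[sample]) - mean(rnasep_ct[sample])


def relative_copies(sample):
    """Comparative Ct estimate of copy number relative to the calibrator."""
    ddct = delta_ct(sample) - delta_ct(CALIBRATOR)
    return CALIBRATOR_COPIES * 2 ** (-ddct)


copies = {sample: relative_copies(sample) for sample in nobox_ct}
for sample, value in copies.items():
    print(f"{sample}: estimated copies = {value:.2f}")
```

The estimates depend entirely on which sample is taken as the two-copy calibrator, so they should be read as an illustration of the calculation rather than as results.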

Sample Name  Assay  Detector Name  Reporter  Task     Ct            Threshold
P5:II:1      2      NOBOX          FAM       Unknown  27.65323      0.2
P5:II:1      2      NOBOX          FAM       Unknown  27.564613     0.2
P5:II:1      2      NOBOX          FAM       Unknown  27.664677     0.2
P5:II:2      2      NOBOX          FAM       Unknown  27.168919     0.2
P5:II:2      2      NOBOX          FAM       Unknown  27.24205      0.2
P5:II:2      2      NOBOX          FAM       Unknown  27.216745     0.2
P1:III:4     2      NOBOX          FAM       Unknown  28.647362     0.2
P1:III:4     2      NOBOX          FAM       Unknown  28.616907     0.2
P1:III:4     2      NOBOX          FAM       Unknown  28.634872     0.2
NEG          2      NOBOX          FAM       NTC      Undetermined  0.2
NEG          2      NOBOX          FAM       NTC      Undetermined  0.2
NEG          2      NOBOX          FAM       NTC      Undetermined  0.2
P5:II:1      2      RNaseP         VIC       Unknown  27.635933     0.2
P5:II:1      2      RNaseP         VIC       Unknown  27.588327     0.2
P5:II:1      2      RNaseP         VIC       Unknown  27.744024     0.2
P5:II:2      2      RNaseP         VIC       Unknown  26.731928     0.2
P5:II:2      2      RNaseP         VIC       Unknown  26.62388      0.2
P5:II:2      2      RNaseP         VIC       Unknown  26.79544      0.2
P1:III:4     2      RNaseP         VIC       Unknown  27.117102     0.2
P1:III:4     2      RNaseP         VIC       Unknown  27.348299     0.2
P1:III:4     2      RNaseP         VIC       Unknown  27.306683     0.2
NEG          2      RNaseP         VIC       NTC      Undetermined  0.2
NEG          2      RNaseP         VIC       NTC      Undetermined  0.2
NEG          2      RNaseP         VIC       NTC      Undetermined  0.2

Appendix Table 10.19. Raw data for NOBOX Taqman assay 2.


Primer                              Sequence
NOBOX DELETION PRIMER PAIR 1 FOR    GTTGGTACCCTCCCCAGTTT
NOBOX DELETION PRIMER PAIR 1 REV    ACCTCCCATACCTTGCTGTG
NOBOX DELETION PRIMER PAIR 2 FOR    AGAGGCTTCATGGCAACATC
NOBOX DELETION PRIMER PAIR 2 REV    AGCCGCCTTCTTCTGGTTAT
NOBOX DELETION PRIMER PAIR 3 FOR    TTGAATGAATGAATGAATGAGC
NOBOX DELETION PRIMER PAIR 3 REV    CAATGACCTGATGCTGCCTA

Appendix Table 10.20. Primer sequences for the breakpoint of suspected 7q35 heterozygous deletion.

Primer         Sequence
CLPP cDNA F    GTGGCCCGGAATATTGGTAG
CLPP cDNA R    TAGCTGGGACAGGTTCTGCT
GTF2F1 cDNA F  GAAGGGTTCAGACGACGAGG
GTF2F1 cDNA R  CGTCTTCTTCTTCGCCATG
PCP2 cDNA F    GAGGCCAGCAGAAAAGTGAC
PCP2 cDNA R    GAGACCCAGGATGCCTCAG

Appendix Table 10.21. Primer sequences for expression analysis of CLPP, GTF2F1 and PCP2.


Chr  Genomic co-ordinates  Ref>Var  Reads  Variant reads  % variant reads  Predicted zygosity  Effect  Gene
1  155708020  G>A/A  12  12  100.0  Hom  C395Y  DAP3
3  75679937  G>G/A  40  11  27.5  Het  utr-5  hsa-mir-1324
3  75679973  A>A/G  35  11  31.4  Het  utr-5  hsa-mir-1324
3  12915026  A>A/G  18  5  27.8  Het  utr-3  MBD4
3  129150264  T>T/G  21  11  52.4  Het  utr-3  MBD4
15  526624829  G>G/A  68  32  47.1  Het  R984W  MYO5A
15  52720718  G>A/G  110  63  57.3  Het  R63X  MYO5A
1  148004422  T>T/A  79  22  27.8  Het  utr-3  NBPF14
1  14800447  C>C/G  146  36  24.7  Het  utr-3  NBPF14
4  103826755  T>T/C  192  53  27.6  Het  T416A  NHEDC1
4  103826767  G>G/A  177  46  26.0  Het  R412X  NHEDC1
4  103826789  T>T/G  143  44  30.8  Het  L405L  NHEDC1
1  695118  A>G/G  54  54  100.0  Hom  T141A  OR4F5
21  14982716  T>C/T  42  26  61.9  Het  M56T  POTED
21  14982746  G>A/G  25  17  68.0  Het  R66H  POTED
21  14982786  C>C/T  27  9  33.3  Het  N79N  POTED
1  13037877  T>C/C  52  52  100.0  Hom  C314C  PRAMEF22
19  33493715  C>C/T  34  7  20.6  Het  spliceSite  RHPN2
19  33493722  A>A/G  35  9  25.7  Het  A315A  RHPN2
10  51613231  T>C/C  39  39  100.0  Hom  utr-5  TIMM23
10  51613269  G>A/A  45  45  100.0  Hom  utr-5  TIMM23

Appendix Table 10.22. Whole exome SNP data for individual P6:II:1. All single heterozygous variants removed, leaving only possible compound heterozygous and homozygous variants. Variants detected on NHLBI Exome Sequencing Project Exome Variant Server were removed.
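The filter described in the caption of Appendix Table 10.22 (discard genes with a single heterozygous variant, keep homozygous variants and genes with two or more heterozygous variants as possible compound heterozygotes) can be sketched as follows. The record layout, the function name and the `GENE_X` entry are hypothetical and exist only to illustrate the logic; the actual pipeline used for the thesis is not described in this appendix.

```python
from collections import defaultdict

# A few rows in the spirit of Appendix Table 10.22; GENE_X is a made-up
# single-heterozygous example that the filter should discard.
variants = [
    {"gene": "DAP3", "zygosity": "Hom"},
    {"gene": "MBD4", "zygosity": "Het"},
    {"gene": "MBD4", "zygosity": "Het"},
    {"gene": "GENE_X", "zygosity": "Het"},  # hypothetical
]


def keep_recessive_candidates(records):
    """Keep homozygous variants and heterozygous variants in genes with >= 2 hets."""
    het_counts = defaultdict(int)
    for record in records:
        if record["zygosity"] == "Het":
            het_counts[record["gene"]] += 1
    return [
        record
        for record in records
        if record["zygosity"] == "Hom" or het_counts[record["gene"]] >= 2
    ]


kept = keep_recessive_candidates(variants)
kept_genes = sorted({record["gene"] for record in kept})
print(kept_genes)
```

Two heterozygous variants in one gene are only candidate compound heterozygotes, since read data alone do not show whether they lie on different alleles.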


Chr  Genomic co-ordinate start  Genomic co-ordinate end  Reads  Indel type  Indel sequence  Predicted zygosity  Gene  Exon  Effect
10  72434642  72434650  7  -9  AGGATTTTC  Hete  ADAMTS14  2  CI
20  60883769  60883769  16  +9  GACACGAAG  Hete  ADRM1  9  CI
5  55466560  55466574  7  -15  CGGCCATTTTTATCT  Hete  ANKRD55  8  CI
5  17276054  17276059  7  -6  CTCTCT  Hete  BASP1  utr-3
6  136582194  136582194  27  +3  AAA  Hete  BCLAF1  utr-3
10  103717493  103717493  8  +1  A  Hete  C10orf76  Adj exon 6  SS
17  35873013  35873026  10  -14  TCTGCTCCCCGCCG  Hete  DUSP14  utr-3
19  57669786  57669787  37  -2  GA  Hete  DUXA  3  FS
3  184039868  184039868  5  +1  G  Hete  EIF4G1  6  FS
4  15938178  15938178  5  -1  T  Hete  FGFBP1  61  FS
1  240370914  240370946  14  -33  GCCCCCTCTACCCGGAGCGGGAATACCTCCTCC  Hete  FMN2  5  CI
3  15613279  15613279  16  +1  A  Hete  HACL1  Adj exon 6  SS

Appendix Table 10.23. Whole exome Indel data for individual P6:II:1. Data was filtered using greater than 5 reads. CI = coding indel, FS = frameshift, SS = splice site.


Chr  Genomic co-ordinate start  Genomic co-ordinate end  Reads  Indel type  Indel sequence  Predicted zygosity  Gene  Exon  Effect
12  26492315  26492315  5  +1  T  Hete  ITPR2  utr-3
21  46047050  46047053  5  -4  ACTT  Hete  KRTAP10-  utr-5
17  39165358  39165358  7  +3  AGA  Hete  KRTAP39  utr-5
5  162945388  162945388  61  -1  T  Homo  MAT2B  utr-3
13  25832832  25832837  20  -6  TAAAAT  Hete  MTMR6  Adj exon 8  SS
16  46744683  46744689  5  -7  TGGGGAC  Hete  MYLK3  3  FS
X  92928035  92928035  5  +1  C  Hete  NAP1L3  1  FS
2  206641240  206641243  10  -4  CGCA  Hete  NRP2  16  FS
2  206641245  206641245  10  +1  A  Hete  NRP2  16  FS
9  125273385  125273385  38  +1  T  Hete  OR1J2  1  FS
11  4608564  4608570  16  -7  GAGTATG  Hete  OR52I2  1  FS
11  5809807  5809807  14  -1  A  Hete  OR52N1  1  FS
11  7818383  7818383  28  +21  ATATGGTTACCAGGTAGATGC  Hete  OR5P2  1  CI
9  112900341  112900341  10  +6  GAAGCT  Hete  PALM2-AKAP2  8  CI
2  55863360  55863360  15  +1  A  Hete  PNPT1  utr-3
1  12939546  12939546  60  +1  G  Hete  PRAMEF4  1  FS

Appendix Table 10.23. Continued.


Chr  Genomic co-ordinate start  Genomic co-ordinate end  Reads  Indel type  Indel sequence  Predicted zygosity  Gene  Exon  Effect
3  52027854  52027859  5  -6  CCTTGG  Hete  RPL29  1  CI
15  63447933  63447933  6  -1  A  Hete  RPS27L  Adj exon 2  SS
2  175292581  175292593  28  -13  TCAAATTTATCAG  Hete  SCRN3  7  FS
X  100531414  100531419  23  -6  TCATCC  Hete  TAF7L  4  CI
1  154514536  154514536  7  +1  C  Hete  TDRD10  6  FS
19  4817288  4817288  9  +3  AGG  Hete  TICAM1  1  CI
19  21607415  21607415  52  -1  A  Hete  ZNF493  1  FS
19  53116934  53117017  156  -84  CCACACTCATTACATTTGTAAGGTTTCTCTCCAGTGTGGATTCTCTGATGTTGTGCAAGGTGTGAAATATGATGGAAGACCTTT  Homo  ZNF83  1  CI

Appendix Table 10.23. Continued


Primer          Sequence
BSN EXON 1-F    GGCTCCTTCTCAGCATGATAC
BSN EXON 1-R    GACGTCGAACCTCGCTGT
BSN EXON 2-F    AGGAAGAGGGTGGTGATGG
BSN EXON 2-R    TTGGTGAAAGAGGGGAAGG
BSN EXON 3A-F   AGTGCCATCTGTCCTTCAGC
BSN EXON 3A-R   CTGGGCTCAGCTGTGGAA
BSN EXON 3B-F   GGAAGCCAGACCAAGAGAGA
BSN EXON 3B-R   ATTCTAGCCCCAGGCTCAGT
BSN EXON 3C-F   AGGGCCTCACTGGTAAGCTC
BSN EXON 3C-R   GCTGCAGACCCAGAACTCTT
BSN EXON 4A-F   ATTTGGGGGCAGTAAATTGG
BSN EXON 4A-R   GGGACTCGGGTCTTCTTTTC
BSN EXON 4B-F   GAAGCAGAAAGGGCCACAG
BSN EXON 4B-R   GCGTGCCAGAACCTCATCTA
BSN EXON 4C-F   GTCAAGGCTGTTCCAGAAGC
BSN EXON 4C-R   ACCCAATCCTCCCTGTCAAT
BSN EXON 5A-F   AAGGCCAGAAGGAGATAGGG
BSN EXON 5A-R   CCAGCTCCTCATCAGAGTCA
BSN EXON 5B-F   CCTCCGAGATCCACAAGGT
BSN EXON 5B-R   CTCCTCATAGCCCGTGGTG
BSN EXON 5C-F   GGGCAGCAGAACTGACTGAT
BSN EXON 5C-R   GCTGCTGCTTTGCTTCTTCT
BSN EXON 5D-F   CCTGAGCTGGAGATGGAGAG
BSN EXON 5D-R   GCCTCAGCCTCAGAGTCAAG
BSN EXON 5E-F   GCGGGACAAGGAAGAACTG
BSN EXON 5E-R   CATGCGGACTATGCTCTGTG
BSN EXON 5F-F   AGCTGAGCTGCTCCAGAGG
BSN EXON 5F-R   AGCTGGACAAGGAGCCACAG
BSN EXON 5G-F   TTCTCTACCCCCACCTCCTC
BSN EXON 5G-R   TGGACTTGGCGTCTGTGTAG
BSN EXON 5H-F   GCTGGACGAGCTGCTAGAGA
BSN EXON 5H-R   GGCACCCCAGCCATATAGT
BSN EXON 5I-F   CCTCTCAAGAGGCTCCCTTT
BSN EXON 5I-R   CTGCTTTTGTTGAGCCATGA
BSN EXON 5J-F   TGGACCTCACCTCTCTTGCT
BSN EXON 5J-R   ACCACTGTGTGTGGCTTCCT
BSN EXON 5K-F   GAGGCCAAGTTTGCCAGATA
BSN EXON 5K-R   TTGGTGTCAGACATGGAGGA
BSN EXON 5L-F   TGAGAAGAGCATGGCAGATG
BSN EXON 5L-R   ACTGGTCCATGGAGTTGAGG
BSN EXON 5M-F   GGCTTGCAGTATGGCTCAGT
BSN EXON 5M-R   ATTCCACCAGATGCGTAAGG
BSN EXON 5N-F   CTCAGACCTGGACTCCTTGG
BSN EXON 5N-R   CATCTGCTGGTGGCTTCTG
BSN EXON 5O-F   CTATCTGGGGAAACCTGCTG
BSN EXON 5O-R   GCTGCTTCTGCTCCTCTAGC

Appendix Table 10.24. Primer sequences for coding exons of human BSN.


Primer            Sequence
BSN EXON 5P-F     GACAACTTCGGCTGCAAGAG
BSN EXON 5P-R     AATCCTCCTGACCACACAGG
BSN EXON 5Q-F     CCCTTACACATGCAGCCTTC
BSN EXON 5Q-R     ACCTGACGATCTCCACCTTG
BSN EXON 5R-F     GCCACTCAGACTCAGGCTCT
BSN EXON 5R-R     TCACGTGAGCTTTGTTCAGC
BSN EXON 5S-F     GATCCCCTGGAGATTGGGTA
BSN EXON 5S-R     GGGAGCATCTGCTCATTGAC
BSN EXON 6A-F     CTGGTAAATCAGGGCACCAG
BSN EXON 6A-R     GCAAGGGTGGGTAGTCACG
BSN EXON 6B-F     ACCACTGTCCCTGCTACCAA
BSN EXON 6B-R     GTGTTTGGCCTGCATAGTGG
BSN EXON 6C-F     CAGTATTCTGCAGGCAGTGG
BSN EXON 6C-R     GGTGCTGTCCTTGGTCAGTT
BSN EXON 6D-F     TCTTCAGCCCCATCTGAAAC
BSN EXON 6D-R     TGGATATCTTTTGCTCCATGC
BSN EXON 6E-F     GCTGCTATGCCAGAGGAGAA
BSN EXON 6E-R     GCTTTTTGAAGTCGCTCCAC
BSN EXON 6F-F     ATGGGCTCAAGAAGAACGTG
BSN EXON 6F-R     AGACCCTGCTGGGACATCTT
BSN EXON 7A-F     CTGATCTGCTTGTGGCTGTG
BSN EXON 7A-R     ATACCCAGGCTGACCCTTCT
BSN EXON 7B-F     GACTACGATGAACCCCCTGA
BSN EXON 7B-R     GTGGTTGGCTGGCTCTGT
BSN EXON 7C&8-F   TGCAGTCAAAGGCAGAACC
BSN EXON 7C&8-R   GGGCACAGGTGGTATAGAGAG
BSN EXON 9&10-F   CCCTAGCCTCCTTCTCACCT
BSN EXON 9&10-R   GGTACTGAACCAGCCGAAGA
BSN EXON 11-F     CCTGGCTTTCAAACCATCTG
BSN EXON 11-R     ACCATCTCAGGCAGCTCTGT

Appendix Table 10.24. Continued.


Primer                     Sequence
Zfish BSNa ex10 PP1 For    AGCCATCCCAGAGACATCAG
Zfish BSNa ex10 PP1 Rev    GGGAAATGCAGCTGACAGG
Zfish BSNa ex10 PP2 For    AGCCATCCCAGAGACATCAG
Zfish BSNa ex10 PP2 Rev    CTTGGCTTTTGTTGACTGACC
Zfish BSNa ex10 PP3 For    AGCCATCCCAGAGACATCAG
Zfish BSNa ex10 PP3 Rev    GGGTGACCATAGGCATCTGT
Zfish BSNa ex10 PP4 For    AGCCATCCCAGAGACATCAG
Zfish BSNa ex10 PP4 Rev    AGGTACCGTCGGGAGACTTT
Zfish BSNa ex13 PP1 For    TCCGGTCAGTCAACAAAAGC
Zfish BSNa ex13 PP1 Rev    TCCTTGGGACACTCTGCTCT
Zfish BSNa ex13 PP2 For    CCTATCCGGTCAGTCAACAAA
Zfish BSNa ex13 PP2 Rev    CGTCTCCCAGAATGACCACT
Zfish BSNa ex13 PP3 For    ACCACTGCCCAAGAAATCAG
Zfish BSNa ex13 PP3 Rev    CGTCTCCCAGAATGACCACT
Zfish BSNa ex13 PP4 For    CCTATCAGGTCCAAGGGTCA
Zfish BSNa ex13 PP4 Rev    GCTCATCAGAGTGGCGTGTA
Zfish BSNb PP1 For         TCAGACAGTGAACTGAATAATTTGA
Zfish BSNb PP1 Rev         TCCCATGGTGTTTAGAACGAG
Zfish BSNb PP2 For         GAGCCCTCTGAGTCCTGTTG
Zfish BSNb PP2 Rev         ATGACGGGAAGAGGTGTGTC
Zfish BSNb PP3 For         AGCCGTAGAGCAAGAATCCA
Zfish BSNb PP3 Rev         ATGACGGGAAGAGGTGTGTC

Appendix Table 10.25. Primer sequences for zebrafish Bsn cDNA sequencing.

Appendix Figure 10.7. Overview of the Affymetrix Human SNP array V6.0 protocol. Taken from the Affymetrix website.


10.1 References

1. Morton, C.C. and W.E. Nance, Newborn hearing screening--a silent revolution. N Engl J Med, 2006. 354(20): p. 2151-64. 2. Kochhar, A., M.S. Hildebrand, and R.J. Smith, Clinical aspects of hereditary hearing loss. Genet Med, 2007. 9(7): p. 393-408. 3. Shearer, A.E., et al., Deafness in the genomics era. Hear Res, 2011. 282(1- 2): p. 1-9. 4. Schwander, M., B. Kachar, and U. Muller, Review series: The cell biology of hearing. J Cell Biol, 2010. 190(1): p. 9-20. 5. Hilgert, N., R.J. Smith, and G. Van Camp, Function and expression pattern of nonsyndromic deafness genes. Curr Mol Med, 2009. 9(5): p. 546-64. 6. Vlastarakos, P.V., et al., Novel approaches to treating sensorineural hearing loss. Auditory genetics and necessary factors for stem cell transplant. Med Sci Monit, 2008. 14(8): p. RA114-25. 7. Lonyai, A., et al., Fetal Hox11 expression patterns predict defective target organs: a novel link between developmental biology and autoimmunity. Immunol Cell Biol, 2008. 86(4): p. 301-9. 8. Ernest, S., et al., Localization of anosmin-1a and anosmin-1b in the inner ear and neuromasts of zebrafish. Gene Expr Patterns, 2007. 7(3): p. 274-81. 9. Kelsell, D.P., et al., Connexin 26 mutations in hereditary non-syndromic sensorineural deafness. Nature, 1997. 387(6628): p. 80-3. 10. Kenneson, A., K. Van Naarden Braun, and C. Boyle, GJB2 (connexin 26) variants and nonsyndromic sensorineural hearing loss: a HuGE review. Genet Med, 2002. 4(4): p. 258-74. 11. Denoyelle, F., et al., Prelingual deafness: high prevalence of a 30delG mutation in the connexin 26 gene. Hum Mol Genet, 1997. 6(12): p. 2173-7. 12. Mohamed, M.R., et al., Functional analysis of a novel I71N mutation in the GJB2 gene among Southern Egyptians causing autosomal recessive hearing loss. Cell Physiol Biochem, 2010. 26(6): p. 959-66. 13. Choi, S.Y., et al., Different functional consequences of two missense mutations in the GJB2 gene associated with non-syndromic hearing loss. Hum Mutat, 2009. 30(7): p. E716-27. 14. 
Beltramello, M., et al., Impaired permeability to Ins(1,4,5)P3 in a mutant connexin underlies recessive hereditary deafness. Nat Cell Biol, 2005. 7(1): p. 63-9. 15. del Castillo, I., et al., A deletion involving the connexin 30 gene in nonsyndromic hearing impairment. N Engl J Med, 2002. 346(4): p. 243-9. 16. Liu, X.Z., et al., Mutations in connexin31 underlie recessive as well as dominant non-syndromic hearing loss. Hum Mol Genet, 2000. 9(1): p. 63- 7. 17. Yang, J.J., et al., Identification of mutations in members of the connexin gene family as a cause of nonsyndromic deafness in Taiwan. Audiol Neurootol, 2007. 12(3): p. 198-208. 18. Kubisch, C., et al., KCNQ4, a novel potassium channel expressed in sensory outer hair cells, is mutated in dominant deafness. Cell, 1999. 96(3): p. 437-46. 19. Pourova, R., et al., Spectrum and frequency of SLC26A4 mutations among Czech patients with early hearing loss with and without Enlarged Vestibular Aqueduct (EVA). Ann Hum Genet, 2010. 74(4): p. 299-307. 292

20. Singh, R. and P. Wangemann, Free radical stress-mediated loss of Kcnj10 protein expression in stria vascularis contributes to deafness in Pendred syndrome mouse model. Am J Physiol Renal Physiol, 2008. 294(1): p. F139-48. 21. Guilford, P., et al., A human gene responsible for neurosensory, non- syndromic recessive deafness is a candidate homologue of the mouse sh-1 gene. Hum Mol Genet, 1994. 3(6): p. 989-93. 22. Evans, K.L., et al., Human olfactory marker protein maps close to tyrosinase and is a candidate gene for Usher syndrome type I. Hum Mol Genet, 1993. 2(2): p. 115-8. 23. Weil, D., et al., Defective myosin VIIA gene responsible for Usher syndrome type 1B. Nature, 1995. 374(6517): p. 60-1. 24. Liu, X.Z., et al., Mutations in the myosin VIIA gene cause non-syndromic recessive deafness. Nat Genet, 1997. 16(2): p. 188-90. 25. Hildebrand, M.S., et al., Variable hearing impairment in a DFNB2 family with a novel MYO7A missense mutation. Clin Genet, 2010. 77(6): p. 563- 71. 26. Liu, X.Z., et al., Autosomal dominant non-syndromic deafness caused by a mutation in the myosin VIIA gene. Nat Genet, 1997. 17(3): p. 268-9. 27. Di Leva, F., et al., Identification of a novel mutation in the myosin VIIA motor domain in a family with autosomal dominant hearing loss (DFNA11). Audiol Neurootol, 2006. 11(3): p. 157-64. 28. Luijendijk, M.W., et al., Identification and molecular modelling of a mutation in the motor head domain of myosin VIIA in a family with autosomal dominant hearing impairment (DFNA11). Hum Genet, 2004. 115(2): p. 149-56. 29. Sun, Y., et al., Novel missense mutations in MYO7A underlying postlingual high- or low-frequency non-syndromic hearing impairment in two large families from China. J Hum Genet, 2011. 56(1): p. 64-70. 30. Bolz, H., et al., Impaired calmodulin binding of myosin-7A causes autosomal dominant hearing loss (DFNA11). Hum Mutat, 2004. 24(3): p. 274-5. 31. 
Kimberling, W.J., et al., Linkage of Usher syndrome type I gene (USH1B) to the long arm of chromosome 11. Genomics, 1992. 14(4): p. 988-94. 32. Friedman, T.B., et al., A gene for congenital, recessive deafness DFNB3 maps to the pericentromeric region of chromosome 17. Nat Genet, 1995. 9(1): p. 86-91. 33. Winata, S., et al., Congenital non-syndromal autosomal recessive deafness in Bengkala, an isolated Balinese village. J Med Genet, 1995. 32(5): p. 336-43. 34. Probst, F.J., et al., Correction of deafness in shaker-2 mice by an unconventional myosin in a BAC transgene. Science, 1998. 280(5368): p. 1444-7. 35. Wang, A., et al., Association of unconventional myosin MYO15 mutations with human nonsyndromic deafness DFNB3. Science, 1998. 280(5368): p. 1447-51. 36. Ahmed, Z.M., et al., Mutations of MYO6 are associated with recessive deafness, DFNB37. Am J Hum Genet, 2003. 72(5): p. 1315-22.

293

37. Mohiddin, S.A., et al., Novel association of hypertrophic cardiomyopathy, sensorineural deafness, and a mutation in unconventional myosin VI (MYO6). J Med Genet, 2004. 41(4): p. 309-14. 38. Crozet, F., et al., Cloning of the genes encoding two murine and human cochlear unconventional type I myosins. Genomics, 1997. 40(2): p. 332-41. 39. Wells, A.L., et al., Myosin VI is an actin-based motor that moves backwards. Nature, 1999. 401(6752): p. 505-8. 40. Frolenkov, G.I., et al., Genetic insights into the morphogenesis of inner ear hair cells. Nat Rev Genet, 2004. 5(7): p. 489-98. 41. Hasson, T., et al., Unconventional myosins in inner-ear sensory epithelia. J Cell Biol, 1997. 137(6): p. 1287-307. 42. Verhoeven, K., et al., Mutations in the human alpha-tectorin gene cause autosomal dominant non-syndromic hearing impairment. Nat Genet, 1998. 19(1): p. 60-2. 43. Alasti, F., et al., A novel TECTA mutation confirms the recognizable phenotype among autosomal recessive hearing impairment families. Int J Pediatr Otorhinolaryngol, 2008. 72(2): p. 249-55. 44. Gueta, R., et al., Structural and mechanical analysis of tectorial membrane tecta mutants. Biophys J, 2011. 100(10): p. 2530-8. 45. Xia, A., et al., Deficient forward transduction and enhanced reverse transduction in the alpha tectorin C1509G human hearing loss mutation. Dis Model Mech, 2010. 3(3-4): p. 209-23. 46. Pfister, M., et al., A genotype-phenotype correlation with gender-effect for hearing impairment caused by TECTA mutations. Cell Physiol Biochem, 2004. 14(4-6): p. 369-76. 47. McGuirt, W.T., et al., Mutations in COL11A2 cause non-syndromic hearing loss (DFNA13). Nat Genet, 1999. 23(4): p. 413-9. 48. Chen, W., et al., Mutation of COL11A2 causes autosomal recessive non- syndromic hearing loss at the DFNB53 locus. J Med Genet, 2005. 42(10): p. e61. 49. Vikkula, M., et al., Autosomal dominant and recessive osteochondrodysplasias associated with the COL11A2 locus. Cell, 1995. 80(3): p. 431-7. 50. 
Brunner, H.G., et al., A Stickler syndrome gene is linked to chromosome 6 near the COL11A2 gene. Hum Mol Genet, 1994. 3(9): p. 1561-4. 51. Verpy, E., et al., Mutations in a new gene encoding a protein of the hair bundle cause non-syndromic deafness at the DFNB16 locus. Nat Genet, 2001. 29(3): p. 345-9. 52. Zwaenepoel, I., et al., Otoancorin, an inner ear protein restricted to the interface between the apical surface of sensory epithelia and their overlying acellular gels, is defective in autosomal recessive deafness DFNB22. Proc Natl Acad Sci U S A, 2002. 99(9): p. 6240-5. 53. Shahin, H., et al., Five novel loci for inherited hearing loss mapped by SNP-based homozygosity profiles in Palestinian families. Eur J Hum Genet, 2010. 18(4): p. 407-13. 54. Walsh, T., et al., Genomic analysis of a heterogeneous Mendelian phenotype: multiple novel alleles for inherited hearing loss in the Palestinian population. Hum Genomics, 2006. 2(4): p. 203-11. 55. Jovine, L., J. Park, and P.M. Wassarman, Sequence similarity between stereocilin and otoancorin points to a unified mechanism for

294

mechanotransduction in the mammalian inner ear. BMC Cell Biol, 2002. 3: p. 28. 56. Robertson, N.G., et al., Mutations in a novel cochlear gene cause DFNA9, a human nonsyndromic deafness with vestibular dysfunction. Nat Genet, 1998. 20(3): p. 299-303. 57. Street, V.A., et al., A novel DFNA9 mutation in the vWFA2 domain of COCH alters a conserved cysteine residue and intrachain disulfide bond formation resulting in progressive hearing loss and site-specific vestibular and central oculomotor dysfunction. Am J Med Genet A, 2005. 139A(2): p. 86-95. 58. Fransen, E., et al., A common ancestor for COCH related cochleovestibular (DFNA9) patients in Belgium and The Netherlands bearing the P51S mutation. J Med Genet, 2001. 38(1): p. 61-5. 59. Douville, P.J., et al., The brain-specific POU-box gene Brn4 is a sex-linked transcription factor located on the human and mouse X chromosomes. Mamm Genome, 1994. 5(3): p. 180-2. 60. de Kok, Y.J., et al., Association between X-linked mixed deafness and mutations in the POU domain gene POU3F4. Science, 1995. 267(5198): p. 685-8. 61. Bitner-Glindzicz, M., et al., Further mutations in Brain 4 (POU3F4) clarify the phenotype in the X-linked deafness, DFN3. Hum Mol Genet, 1995. 4(8): p. 1467-9. 62. Vahava, O., et al., Mutation in transcription factor POU4F3 associated with inherited progressive hearing loss in humans. Science, 1998. 279(5358): p. 1950-4. 63. Collin, R.W., et al., Missense mutations in POU4F3 cause autosomal dominant hearing impairment DFNA15 and affect subcellular localization and DNA binding. Hum Mutat, 2008. 29(4): p. 545-54. 64. Scott, H.S., et al., Insertion of beta-satellite repeats identifies a transmembrane protease causing both congenital and childhood onset autosomal recessive deafness. Nat Genet, 2001. 27(1): p. 59-63. 65. Ben-Yosef, T., et al., Novel mutations of TMPRSS3 in four DFNB8/B10 families segregating congenital autosomal recessive deafness. J Med Genet, 2001. 38(6): p. 396-400. 66. 
Guipponi, M., et al., The transmembrane (TMPRSS3) mutated in deafness DFNB8/10 activates the epithelial sodium channel (ENaC) in vitro. Hum Mol Genet, 2002. 11(23): p. 2829-36. 67. Lee, Y.J., et al., Pathogenic mutations but not polymorphisms in congenital and childhood onset autosomal recessive deafness disrupt the proteolytic activity of TMPRSS3. J Med Genet, 2003. 40(8): p. 629-31. 68. Naz, S., et al., Mutations in a novel gene, TMIE, are associated with hearing loss linked to the DFNB6 locus. Am J Hum Genet, 2002. 71(3): p. 632-6. 69. Sirmaci, A., et al., A founder TMIE mutation is a frequent cause of hearing loss in southeastern Anatolia. Clin Genet, 2009. 75(6): p. 562-7. 70. Yang, J.J., et al., Identification of novel variants in the TMIE gene of patients with nonsyndromic hearing loss. Int J Pediatr Otorhinolaryngol, 2010. 74(5): p. 489-93.

295

71. Shen, Y.C., et al., The transmembrane inner ear (tmie) gene contributes to vestibular and lateral line development and function in the zebrafish (Danio rerio). Dev Dyn, 2008. 237(4): p. 941-52. 72. Mitchem, K.L., et al., Mutation of the novel gene Tmie results in sensory cell defects in the inner ear of spinner, a mouse model of human hearing loss DFNB6. Hum Mol Genet, 2002. 11(16): p. 1887-98. 73. Usami, S., et al., Simultaneous screening of multiple mutations by invader assay improves molecular diagnosis of hereditary hearing loss: a multicenter study. PLoS One, 2012. 7(2): p. e31276. 74. Seeburg, P.H. and J.P. Adelman, Characterization of cDNA for precursor of human luteinizing hormone releasing hormone. Nature, 1984. 311(5987): p. 666-8. 75. Salisbury, T.B., A.K. Binder, and J.H. Nilson, Welcoming beta-catenin to the gonadotropin-releasing hormone transcriptional network in gonadotropes. Mol Endocrinol, 2008. 22(6): p. 1295-303. 76. Messinis, I.E., C.I. Messini, and K. Dafopoulos, The role of gonadotropins in the follicular phase. Ann N Y Acad Sci, 2010. 1205: p. 5-11. 77. Kumar, P. and S.F. Sait, Luteinizing hormone and its dilemma in ovulation induction. J Hum Reprod Sci, 2011. 4(1): p. 2-7. 78. Topaloglu, A.K. and L.D. Kotan, Molecular causes of hypogonadotropic hypogonadism. Curr Opin Obstet Gynecol, 2010. 22(4): p. 264-70. 79. Bianco, S.D. and U.B. Kaiser, The genetic and molecular basis of idiopathic hypogonadotropic hypogonadism. Nat Rev Endocrinol, 2009. 5(10): p. 569-76. 80. Dode, C., et al., Loss-of-function mutations in FGFR1 cause autosomal dominant Kallmann syndrome. Nat Genet, 2003. 33(4): p. 463-5. 81. Kallmann F, S.W., Barrera SE, The Genetic Aspects of Primary Eunuchoidism. American Journal of Mental Deficiency, 1944. 48: p. 203- 236. 82. Maestre de San Juan, A., Falta total de los nervios olfactorios con anosmia en un individuo en quien existia una atrofia congenita de los testiculos y miembro viril. Siglo Medico, 1856. 131: p. 211. 83. 
Wray, S., P. Grant, and H. Gainer, Evidence that cells expressing luteinizing hormone-releasing hormone mRNA in the mouse are derived from progenitor cells in the olfactory placode. Proc Natl Acad Sci U S A, 1989. 86(20): p. 8132-6. 84. Schwanzel-Fukuda, M. and D.W. Pfaff, Origin of luteinizing hormone- releasing hormone neurons. Nature, 1989. 338(6211): p. 161-4. 85. Bick, D., et al., Male infant with ichthyosis, Kallmann syndrome, chondrodysplasia punctata, and an Xp chromosome deletion. Am J Med Genet, 1989. 33(1): p. 100-7. 86. Schwanzel-Fukuda, M., D. Bick, and D.W. Pfaff, Luteinizing hormone- releasing hormone (LHRH)-expressing cells do not migrate normally in an inherited hypogonadal (Kallmann) syndrome. Brain Res Mol Brain Res, 1989. 6(4): p. 311-26. 87. Rugarli, E.I., Kallmann syndrome and the link between olfactory and reproductive development. Am J Hum Genet, 1999. 65(4): p. 943-8. 88. Franco, B., et al., A gene deleted in Kallmann's syndrome shares homology with neural cell adhesion and axonal path-finding molecules. Nature, 1991. 353(6344): p. 529-36.

296

89. Legouis, R., et al., The candidate gene for the X-linked Kallmann syndrome encodes a protein related to adhesion molecules. Cell, 1991. 67(2): p. 423- 35. 90. Cariboni, A., et al., The product of X-linked Kallmann's syndrome gene (KAL1) affects the migratory activity of gonadotropin-releasing hormone (GnRH)-producing neurons. Hum Mol Genet, 2004. 13(22): p. 2781-91. 91. Jap, T.S., et al., Identification of two novel missense mutations in the KAL1 gene in Han Chinese subjects with Kallmann Syndrome. J Endocrinol Invest, 2011. 34(1): p. 53-9. 92. Tang, K.F., et al., Molecular analysis of KAL-1 in a series of Kallmann syndrome and normosmic idiopathic hypogonadotropic hypogonadism patients from Northwestern China. Asian J Androl, 2009. 11(6): p. 711-5. 93. Salenave, S., et al., Kallmann's syndrome: a comparison of the reproductive phenotypes in men carrying KAL1 and FGFR1/KAL2 mutations. J Clin Endocrinol Metab, 2008. 93(3): p. 758-63. 94. Bhagavath, B., et al., KAL1 mutations are not a common cause of idiopathic hypogonadotrophic hypogonadism in humans. Mol Hum Reprod, 2007. 13(3): p. 165-70. 95. Albuisson, J., et al., Kallmann syndrome: 14 novel mutations in KAL1 and FGFR1 (KAL2). Hum Mutat, 2005. 25(1): p. 98-9. 96. Oliveira, L.M., et al., The importance of autosomal genes in Kallmann syndrome: genotype-phenotype correlations and neuroendocrine characteristics. J Clin Endocrinol Metab, 2001. 86(4): p. 1532-8. 97. Gonzalez-Martinez, D., et al., Anosmin-1 modulates fibroblast growth factor receptor 1 signaling in human gonadotropin-releasing hormone olfactory neuroblasts through a heparan sulfate-dependent mechanism. J Neurosci, 2004. 24(46): p. 10384-92. 98. Hu, Y., et al., Novel mechanisms of fibroblast growth factor receptor 1 regulation by extracellular matrix protein anosmin-1. J Biol Chem, 2009. 284(43): p. 29905-20. 99. Dode, C., et al., Kallmann syndrome: mutations in the genes encoding prokineticin-2 and prokineticin receptor-2. PLoS Genet, 2006. 
2(10): p. e175. 100. Cole, L.W., et al., Mutations in prokineticin 2 and prokineticin receptor 2 genes in human gonadotrophin-releasing hormone deficiency: molecular genetics and clinical spectrum. J Clin Endocrinol Metab, 2008. 93(9): p. 3551-9. 101. Abreu, A.P., et al., Loss-of-function mutations in the genes encoding prokineticin-2 or prokineticin receptor-2 cause autosomal recessive Kallmann syndrome. J Clin Endocrinol Metab, 2008. 93(10): p. 4113-8. 102. Sarfati, J., et al., A comparative phenotypic study of kallmann syndrome patients carrying monoallelic and biallelic mutations in the prokineticin 2 or prokineticin receptor 2 genes. J Clin Endocrinol Metab, 2010. 95(2): p. 659-69. 103. Kim, H.G., et al., Mutations in CHD7, encoding a chromatin-remodeling protein, cause idiopathic hypogonadotropic hypogonadism and Kallmann syndrome. Am J Hum Genet, 2008. 83(4): p. 511-9. 104. Falardeau, J., et al., Decreased FGF8 signaling causes deficiency of gonadotropin-releasing hormone in humans and mice. J Clin Invest, 2008. 118(8): p. 2822-31.

297

105. Kramer, P.R. and S. Wray, Novel gene expressed in nasal region influences outgrowth of olfactory axons and migration of luteinizing hormone- releasing hormone (LHRH) neurons. Genes Dev, 2000. 14(14): p. 1824-34. 106. Miura, K., J.S. Acierno, Jr., and S.B. Seminara, Characterization of the human nasal embryonic LHRH factor gene, NELF, and a mutation screening among 65 patients with idiopathic hypogonadotropic hypogonadism (IHH). J Hum Genet, 2004. 49(5): p. 265-8. 107. Pitteloud, N., et al., Digenic mutations account for variable phenotypes in idiopathic hypogonadotropic hypogonadism. J Clin Invest, 2007. 117(2): p. 457-63. 108. Xu, N., et al., Nasal embryonic LHRH factor (NELF) mutations in patients with normosmic hypogonadotropic hypogonadism and Kallmann syndrome. Fertil Steril, 2011. 95(5): p. 1613-20 e1-7. 109. Trarbach, E.B., et al., Nonsense mutations in FGF8 gene causing different degrees of human gonadotropin-releasing deficiency. J Clin Endocrinol Metab, 2010. 95(7): p. 3491-6. 110. Kaplan, J.D., et al., Clues to an early diagnosis of Kallmann syndrome. Am J Med Genet A, 2010. 152A(11): p. 2796-801. 111. Sinisi, A.A., et al., Homozygous mutation in the prokineticin-receptor2 gene (Val274Asp) presenting as reversible Kallmann syndrome and persistent oligozoospermia: case report. Hum Reprod, 2008. 23(10): p. 2380-4. 112. Pitteloud, N., et al., Reversible kallmann syndrome, delayed puberty, and isolated anosmia occurring in a single family with a mutation in the fibroblast growth factor receptor 1 gene. J Clin Endocrinol Metab, 2005. 90(3): p. 1317-22. 113. Ribeiro, R.S., T.C. Vieira, and J. Abucham, Reversible Kallmann syndrome: report of the first case with a KAL1 mutation and literature review. Eur J Endocrinol, 2007. 156(3): p. 285-90. 114. de Roux, N., et al., A Family with Hypogonadotropic Hypogonadism and Mutations in the Gonadotropin-Releasing Hormone Receptor. N Engl J Med, 1997. 337(22): p. 1597-1603. 115. 
Cerrato, F., et al., Coding sequence analysis of GNRHR and GPR54 in patients with congenital and adult-onset forms of hypogonadotropic hypogonadism. Eur J Endocrinol, 2006. 155 Suppl 1: p. S3-S10. 116. Kotani, M., et al., The metastasis suppressor gene KiSS-1 encodes kisspeptins, the natural ligands of the orphan G protein-coupled receptor GPR54. J Biol Chem, 2001. 276(37): p. 34631-6. 117. Muir, A.I., et al., AXOR12, a novel human G protein-coupled receptor, activated by the peptide KiSS-1. J Biol Chem, 2001. 276(31): p. 28969-75. 118. de Roux, N., et al., Hypogonadotropic hypogonadism due to loss of function of the KiSS1-derived peptide receptor GPR54. Proc Natl Acad Sci U S A, 2003. 100(19): p. 10972-6. 119. Tenenbaum-Rakover, Y., et al., Neuroendocrine phenotype analysis in five patients with isolated hypogonadotropic hypogonadism due to a L102P inactivating mutation of GPR54. J Clin Endocrinol Metab, 2007. 92(3): p. 1137-44. 120. Seminara, S.B., et al., The GPR54 gene as a regulator of puberty. N Engl J Med, 2003. 349(17): p. 1614-27.

298

121. Gottsch, M.L., et al., A role for kisspeptins in the regulation of gonadotropin secretion in the mouse. Endocrinology, 2004. 145(9): p. 4073-7. 122. Matsui, H., et al., Peripheral administration of metastin induces marked gonadotropin release and ovulation in the rat. Biochem Biophys Res Commun, 2004. 320(2): p. 383-8. 123. Navarro, V.M., et al., Effects of KiSS-1 peptide, the natural ligand of GPR54, on follicle-stimulating hormone secretion in the rat. Endocrinology, 2005. 146(4): p. 1689-97. 124. Teles, M.G., et al., A GPR54-activating mutation in a patient with central precocious puberty. N Engl J Med, 2008. 358(7): p. 709-15. 125. Silveira, L.G., et al., Mutations of the KISS1 gene in disorders of puberty. J Clin Endocrinol Metab, 2010. 95(5): p. 2276-80. 126. Krajewski, S.J., et al., Morphologic evidence that neurokinin B modulates gonadotropin-releasing hormone secretion via neurokinin 3 receptors in the rat median eminence. J Comp Neurol, 2005. 489(3): p. 372-86. 127. Todman, M.G., S.K. Han, and A.E. Herbison, Profiling neurotransmitter receptor expression in mouse gonadotropin-releasing hormone neurons using green fluorescent protein-promoter transgenics and microarrays. Neuroscience, 2005. 132(3): p. 703-12. 128. Goodman, R.L., et al., Kisspeptin neurons in the arcuate nucleus of the ewe express both dynorphin A and neurokinin B. Endocrinology, 2007. 148(12): p. 5752-60. 129. Topaloglu, A.K., et al., TAC3 and TACR3 mutations in familial hypogonadotropic hypogonadism reveal a key role for Neurokinin B in the central control of reproduction. Nat Genet, 2009. 41(3): p. 354-8. 130. Guran, T., et al., Hypogonadotropic hypogonadism due to a novel missense mutation in the first extracellular loop of the neurokinin B receptor. J Clin Endocrinol Metab, 2009. 94(10): p. 3633-9. 131. 
Gianetti, E., et al., TAC3/TACR3 mutations reveal preferential activation of gonadotropin-releasing hormone release by neurokinin B in neonatal life followed by reversal in adulthood. J Clin Endocrinol Metab, 2010. 95(6): p. 2857-67. 132. Coulam, C.B., S.C. Adamson, and J.F. Annegers, Incidence of premature ovarian failure. Obstet Gynecol, 1986. 67(4): p. 604-6. 133. Beck-Peccoz, P. and L. Persani, Premature ovarian failure. Orphanet J Rare Dis, 2006. 1: p. 9. 134. Wieacker, P., Genetic Aspects of Premature Ovarian Failure. J Reproduktionsmed Endokrinol, 2009. 6(1): p. 17-18. 135. Christakos, A.C., et al., Gonadal dysgenesis as an autosomal recessive condition. Am J Obstet Gynecol, 1969. 104(7): p. 1027-30. 136. Aittomaki, K., The genetics of XX gonadal dysgenesis. Am J Hum Genet, 1994. 54(5): p. 844-51. 137. Portnoi, M.F., et al., Molecular cytogenetic studies of Xq critical regions in premature ovarian failure patients. Hum Reprod, 2006. 21(9): p. 2329-34. 138. Sybert, V.P. and E. McCauley, Turner's syndrome. N Engl J Med, 2004. 351(12): p. 1227-38. 139. Ledig, S., A. Ropke, and P. Wieacker, Copy number variants in premature ovarian failure and ovarian dysgenesis. Sex Dev, 2010. 4(4-5): p. 225-32.

299

140. Singh, R.P. and D.H. Carr, The anatomy and histology of XO human embryos and fetuses. Anat Rec, 1966. 155(3): p. 369-83. 141. Modi, D.N., S. Sane, and D. Bhartiya, Accelerated germ cell apoptosis in sex chromosome aneuploid fetal human gonads. Mol Hum Reprod, 2003. 9(4): p. 219-25. 142. Goswami, R., et al., Prevalence of the triple X syndrome in phenotypically normal women with premature ovarian failure and its association with autoimmune thyroid disorders. Fertil Steril, 2003. 80(4): p. 1052-4. 143. Tartaglia, N.R., et al., A review of trisomy X (47,XXX). Orphanet J Rare Dis, 2010. 5: p. 8. 144. Artini, P.G., et al., Chromosomal abnormalities in women with premature ovarian failure. Gynecol Endocrinol, 2010. 26(10): p. 717-24. 145. Jacobs, P.A., et al., Evidence for the existence of the human "super female". Lancet, 1959. 2(7100): p. 423-5. 146. Villanueva, A.L. and R.W. Rebar, Triple-X syndrome and premature ovarian failure. Obstet Gynecol, 1983. 62(3 Suppl): p. 70s-73s. 147. Skalba, P., A. Cygal, and Z. Gierzynska, A case of premature ovarian failure (POF) in a 31-year-old woman with a 47,XXX karyotype. Endokrynol Pol, 2010. 61(2): p. 217-9. 148. Persani, L., et al., Primary Ovarian Insufficiency: X chromosome defects and autoimmunity. J Autoimmun, 2009. 33(1): p. 35-41. 149. Mumm, S., et al., X/autosomal translocations in the Xq critical region associated with premature ovarian failure fall within and outside genes. Genomics, 2001. 76(1-3): p. 30-6. 150. Castrillon, D.H. and S.A. Wasserman, Diaphanous is required for cytokinesis in Drosophila and shares domains of similarity with the products of the limb deformity gene. Development, 1994. 120(12): p. 3367- 77. 151. Sala, C., et al., Eleven X chromosome breakpoints associated with premature ovarian failure (POF) map to a 15-Mb YAC contig spanning Xq21. Genomics, 1997. 40(1): p. 123-31. 152. Philippe, C., et al., Physical mapping of DNA markers in the q13-q22 region of the human X chromosome. Genomics, 1993. 
17(1): p. 147-52. 153. Bione, S., et al., A human homologue of the Drosophila melanogaster diaphanous gene is disrupted in a patient with premature ovarian failure: evidence for conserved function in oogenesis and implications for human sterility. Am J Hum Genet, 1998. 62(3): p. 533-41. 154. Prueitt, R.L., J.L. Ross, and A.R. Zinn, Physical mapping of nine Xq translocation breakpoints and identification of XPNPEP2 as a premature ovarian failure candidate gene. Cytogenet Cell Genet, 2000. 89(1-2): p. 44- 50. 155. Di Pasquale, E., P. Beck-Peccoz, and L. Persani, Hypergonadotropic ovarian failure associated with an inherited mutation of human bone morphogenetic protein-15 (BMP15) gene. Am J Hum Genet, 2004. 75(1): p. 106-11. 156. Galloway, S.M., et al., Mutations in an oocyte-derived growth factor gene (BMP15) cause increased ovulation rate and infertility in a dosage- sensitive manner. Nat Genet, 2000. 25(3): p. 279-83. 157. Dube, J.L., et al., The bone morphogenetic protein 15 gene is X-linked and expressed in oocytes. Mol Endocrinol, 1998. 12(12): p. 1809-17.

300

158. Hanrahan, J.P., et al., Mutations in the genes for oocyte-derived growth factors GDF9 and BMP15 are associated with both increased ovulation rate and sterility in Cambridge and Belclare sheep (Ovis aries). Biol Reprod, 2004. 70(4): p. 900-9. 159. Yan, C., et al., Synergistic roles of bone morphogenetic protein 15 and growth differentiation factor 9 in ovarian function. Mol Endocrinol, 2001. 15(6): p. 854-66. 160. McNatty, K.P., et al., Bone morphogenetic protein 15 and growth differentiation factor 9 co-operate to regulate granulosa cell function in ruminants. Reproduction, 2005. 129(4): p. 481-7. 161. Di Pasquale, E., et al., Identification of new variants of human BMP15 gene in a large cohort of women with premature ovarian failure. J Clin Endocrinol Metab, 2006. 91(5): p. 1976-9. 162. Dixit, H., et al., Missense mutations in the BMP15 gene are associated with ovarian failure. Hum Genet, 2006. 119(4): p. 408-15. 163. Rossetti, R., et al., BMP15 mutations associated with primary ovarian insufficiency cause a defective production of bioactive protein. Hum Mutat, 2009. 30(5): p. 804-10. 164. Ledig, S., et al., BMP15 mutations in XX gonadal dysgenesis and premature ovarian failure. Am J Obstet Gynecol, 2008. 198(1): p. 84 e1-5. 165. Tiotiu, D., et al., Variants of the BMP15 gene in a cohort of patients with premature ovarian failure. Hum Reprod, 2010. 25(6): p. 1581-7. 166. Laissue, P., et al., Mutations and sequence variants in GDF9 and BMP15 in patients with premature ovarian failure. Eur J Endocrinol, 2006. 154(5): p. 739-44. 167. Allen, E.G., et al., Examination of reproductive aging milestones among women who carry the FMR1 premutation. Hum Reprod, 2007. 22(8): p. 2142-52. 168. Cronister, A., et al., Heterozygous fragile X female: historical, physical, cognitive, and cytogenetic features. Am J Med Genet, 1991. 38(2-3): p. 269-74. 169. Schwartz, C.E., et al., Obstetrical and gynecological complications in fragile X carriers: a multicenter study. 
Am J Med Genet, 1994. 51(4): p. 400-2. 170. Allingham-Hawkins, D.J., et al., Fragile X premutation is a significant risk factor for premature ovarian failure: the International Collaborative POF in Fragile X study--preliminary data. Am J Med Genet, 1999. 83(4): p. 322-5. 171. Murray, A., et al., Studies of FRAXA and FRAXE in women with premature ovarian failure. J Med Genet, 1998. 35(8): p. 637-40. 172. Persani, L., R. Rossetti, and C. Cacciatore, Genes involved in human premature ovarian failure. J Mol Endocrinol, 2010. 45(5): p. 257-79. 173. Jin, P. and S.T. Warren, Understanding the molecular basis of fragile X syndrome. Hum Mol Genet, 2000. 9(6): p. 901-8. 174. Bachner, D., et al., Enhanced expression of the murine FMR1 gene during germ cell proliferation suggests a special function in both the male and the female gonad. Hum Mol Genet, 1993. 2(12): p. 2043-50. 175. Murray, A., et al., Microdeletions in FMR2 may be a significant cause of premature ovarian failure. J Med Genet, 1999. 36(10): p. 767-70.

176. Bione, S., et al., Mutation analysis of two candidate genes for premature ovarian failure, DACH2 and POF1B. Hum Reprod, 2004. 19(12): p. 2759- 66. 177. Lacombe, A., et al., Disruption of POF1B binding to nonmuscle actin filaments is associated with premature ovarian failure. Am J Hum Genet, 2006. 79(1): p. 113-9. 178. Cordts, E.B., et al., Genetic aspects of premature ovarian failure: a literature review. Arch Gynecol Obstet, 2011. 283(3): p. 635-43. 179. Matthews, C.H., et al., Primary amenorrhoea and infertility due to a mutation in the beta-subunit of follicle-stimulating hormone. Nat Genet, 1993. 5(1): p. 83-6. 180. Matthews, C. and V.K. Chatterjee, Isolated deficiency of follicle- stimulating hormone re-revisited. N Engl J Med, 1997. 337(9): p. 642. 181. Rabinowitz, D., et al., Isolated follicle-stimulating hormone deficiency revisited. Ovulation and conception in presence of circulating antibody to follicle-stimulating hormone. N Engl J Med, 1979. 300(3): p. 126-8. 182. Layman, L.C., et al., Delayed puberty and hypogonadism caused by mutations in the follicle-stimulating hormone beta-subunit gene. N Engl J Med, 1997. 337(9): p. 607-11. 183. Layman, L.C., et al., FSH beta gene mutations in a female with partial breast development and a male sibling with normal puberty and azoospermia. J Clin Endocrinol Metab, 2002. 87(8): p. 3702-7. 184. Lofrano-Porto, A., et al., Luteinizing hormone beta mutation and hypogonadism in men and women. N Engl J Med, 2007. 357(9): p. 897- 904. 185. Weiss, J., et al., Hypogonadism caused by a single amino acid substitution in the beta subunit of luteinizing hormone. N Engl J Med, 1992. 326(3): p. 179-83. 186. Valdes-Socin, H., et al., Hypogonadism in a patient with a mutation in the luteinizing hormone beta-subunit gene. N Engl J Med, 2004. 351(25): p. 2619-25. 187. Roy, A.C., et al., Identification of seven novel mutations in LH beta-subunit gene by SSCP. Mol Cell Biochem, 1996. 165(2): p. 151-3. 188. 
Liao, W.X., et al., A new molecular variant of luteinizing hormone associated with female infertility. Fertil Steril, 1998. 69(1): p. 102-6. 189. Ramanujam, L.N., et al., Association of molecular variants of luteinizing hormone with male infertility. Hum Reprod, 2000. 15(4): p. 925-8. 190. Liao, W.X., H.H. Goh, and A.C. Roy, Functional characterization of a natural variant of luteinizing hormone. Hum Genet, 2002. 111(2): p. 219-24. 191. Aittomaki, K., et al., Mutation in the follicle-stimulating hormone receptor gene causes hereditary hypergonadotropic ovarian failure. Cell, 1995. 82(6): p. 959-68. 192. Rannikko, A., et al., Functional characterization of the human FSH receptor with an inactivating Ala189Val mutation. Mol Hum Reprod, 2002. 8(4): p. 311-7. 193. Doherty, E., et al., A Novel mutation in the FSH receptor inhibiting signal transduction and causing primary ovarian failure. J Clin Endocrinol Metab, 2002. 87(3): p. 1151-5.

194. Conway, G.S., et al., Mutation screening and isoform prevalence of the follicle stimulating hormone receptor gene in women with premature ovarian failure, resistant ovary syndrome and polycystic ovary syndrome. Clin Endocrinol (Oxf), 1999. 51(1): p. 97-9. 195. Layman, L.C., et al., The Finnish follicle-stimulating hormone receptor gene mutation is rare in North American women with 46,XX ovarian failure. Fertil Steril, 1998. 69(2): p. 300-2. 196. Meduri, G., et al., Delayed puberty and primary amenorrhea associated with a novel mutation of the human follicle-stimulating hormone receptor: clinical, histological, and molecular studies. J Clin Endocrinol Metab, 2003. 88(8): p. 3491-8. 197. Allen, L.A., et al., A novel loss of function mutation in exon 10 of the FSH receptor gene causing hypergonadotrophic hypogonadism: clinical and molecular characteristics. Hum Reprod, 2003. 18(2): p. 251-6. 198. Beau, I., et al., A novel phenotype related to partial loss of function mutations of the follicle stimulating hormone receptor. J Clin Invest, 1998. 102(7): p. 1352-9. 199. Touraine, P., et al., New natural inactivating mutations of the follicle- stimulating hormone receptor: correlations between receptor function and phenotype. Mol Endocrinol, 1999. 13(11): p. 1844-54. 200. Arnhold, I.J., et al., Clinical features of women with resistance to luteinizing hormone. Clin Endocrinol (Oxf), 1999. 51(6): p. 701-7. 201. Arnhold, I.J., A. Lofrano-Porto, and A.C. Latronico, Inactivating mutations of luteinizing hormone beta-subunit or luteinizing hormone receptor cause oligo-amenorrhea and infertility in women. Horm Res, 2009. 71(2): p. 75- 82. 202. Latronico, A.C., et al., Brief report: testicular and ovarian resistance to luteinizing hormone caused by inactivating mutations of the luteinizing hormone-receptor gene. N Engl J Med, 1996. 334(8): p. 507-12. 203. 
Latronico, A.C., et al., A homozygous microdeletion in helix 7 of the luteinizing hormone receptor associated with familial testicular and ovarian resistance is due to both decreased cell surface expression and impaired effector activation by the cell surface receptor. Mol Endocrinol, 1998. 12(3): p. 442-50. 204. Kremer, H., et al., Male pseudohermaphroditism due to a homozygous missense mutation of the luteinizing hormone receptor gene. Nat Genet, 1995. 9(2): p. 160-4. 205. Toledo, S.P., et al., An inactivating mutation of the luteinizing hormone receptor causes amenorrhea in a 46,XX female. J Clin Endocrinol Metab, 1996. 81(11): p. 3850-4. 206. Newton, C.L., et al., Rescue of expression and signaling of human luteinizing hormone G protein-coupled receptor mutants with an allosterically binding small-molecule agonist. Proc Natl Acad Sci U S A, 2011. 108(17): p. 7172-6. 207. Bayne, R.A., S.J. Martins da Silva, and R.A. Anderson, Increased expression of the FIGLA transcription factor is associated with primordial follicle formation in the human fetal ovary. Mol Hum Reprod, 2004. 10(6): p. 373-81. 208. Joshi, S., et al., Ovarian gene expression in the absence of FIGLA, an oocyte-specific transcription factor. BMC Dev Biol, 2007. 7: p. 67.

209. Soyal, S.M., A. Amleh, and J. Dean, FIGalpha, a germ cell-specific transcription factor required for ovarian follicle formation. Development, 2000. 127(21): p. 4645-54. 210. Zhao, H., et al., Transcription factor FIGLA is mutated in patients with premature ovarian failure. Am J Hum Genet, 2008. 82(6): p. 1342-8. 211. Suzumori, N., et al., Nobox is a homeobox-encoding gene preferentially expressed in primordial and growing oocytes. Mech Dev, 2002. 111(1-2): p. 137-41. 212. Huntriss, J., M. Hinkins, and H.M. Picton, cDNA cloning and expression of the human NOBOX gene in oocytes and ovarian follicles. Mol Hum Reprod, 2006. 12(5): p. 283-9. 213. Rajkovic, A., et al., NOBOX deficiency disrupts early folliculogenesis and oocyte-specific gene expression. Science, 2004. 305(5687): p. 1157-9. 214. Qin, Y., et al., NOBOX homeobox mutation causes premature ovarian failure. Am J Hum Genet, 2007. 81(3): p. 576-81. 215. Bisgaard, A.M., et al., Twins with mental retardation and an interstitial deletion 7q34q36.2 leading to the diagnosis of long QT syndrome. Am J Med Genet A, 2006. 140(6): p. 644-8. 216. Rossi, E., et al., A 12Mb deletion at 7q33-q35 associated with autism spectrum disorders and primary amenorrhea. Eur J Med Genet, 2008. 51(6): p. 631-8. 217. Sehested, L.T., et al., Deletion of 7q34-q36.2 in two siblings with mental retardation, language delay, primary amenorrhea, and dysmorphic features. Am J Med Genet A, 2010. 152A(12): p. 3115-9. 218. Parker, K.L. and B.P. Schimmer, Steroidogenic factor 1: a key determinant of endocrine development and function. Endocr Rev, 1997. 18(3): p. 361- 77. 219. Ferraz-de-Souza, B., L. Lin, and J.C. Achermann, Steroidogenic factor-1 (SF-1, NR5A1) and human disease. Mol Cell Endocrinol, 2011. 336(1-2): p. 198-205. 220. Luo, X., Y. Ikeda, and K.L. Parker, A cell-specific nuclear receptor is essential for adrenal and gonadal development and sexual differentiation. Cell, 1994. 77(4): p. 481-90. 221. 
Majdic, G., et al., Knockout mice lacking steroidogenic factor 1 are a novel genetic model of hypothalamic obesity. Endocrinology, 2002. 143(2): p. 607-14. 222. Achermann, J.C., et al., A mutation in the gene encoding steroidogenic factor-1 causes XY sex reversal and adrenal failure in humans. Nat Genet, 1999. 22(2): p. 125-6. 223. Lourenco, D., et al., Mutations in NR5A1 associated with ovarian insufficiency. N Engl J Med, 2009. 360(12): p. 1200-10. 224. Philibert, P., et al., Mutational analysis of steroidogenic factor 1 (NR5a1) in 24 boys with bilateral anorchia: a French collaborative study. Hum Reprod, 2007. 22(12): p. 3255-61. 225. Warman, D.M., et al., Three new SF-1 (NR5A1) gene mutations in two unrelated families with multiple affected members: within-family variability in 46,XY subjects and low ovarian reserve in fertile 46,XX subjects. Horm Res Paediatr, 2011. 75(1): p. 70-7.

226. Perrault, M.K., B.; Housset, E., Deux cas de syndrome de Turner avec surdi-mutite dans une meme fratrie. Bulletins et mémoires de la Société médicale des hôpitaux de Paris, 1951. 16: p. 79-84. 227. Josso, N., et al., [Familial Turner's Syndrome. Study of 2 Families with Xo and Xx Karyotypes.]. Ann Pediatr (Paris), 1963. 10: p. 163-7. 228. Pallister, P.D. and J.M. Opitz, The Perrault syndrome: autosomal recessive ovarian dysgenesis with facultative, non-sex-limited sensorineural deafness. Am J Med Genet, 1979. 4(3): p. 239-46. 229. Bosze, P., et al., Perrault's syndrome in two sisters. Am J Med Genet, 1983. 16(2): p. 237-41. 230. McCarthy, D.J. and J.M. Opitz, Perrault syndrome in sisters. Am J Med Genet, 1985. 22(3): p. 629-31. 231. Fiumara, A., et al., Perrault syndrome: evidence for progressive nervous system involvement. Am J Med Genet A, 2004. 128A(3): p. 246-9. 232. Nishi, Y., et al., The Perrault syndrome: clinical report and review. Am J Med Genet, 1988. 31(3): p. 623-9. 233. Linssen, W.H., et al., Deafness, sensory neuropathy, and ovarian dysgenesis: a new syndrome or a broader spectrum of Perrault syndrome? Am J Med Genet, 1994. 51(1): p. 81-2. 234. Gottschalk, M.E., S.B. Coker, and L.A. Fox, Neurologic anomalies of Perrault syndrome. Am J Med Genet, 1996. 65(4): p. 274-6. 235. Jacob, J.J., et al., Perrault syndrome with Marfanoid habitus in two siblings. J Pediatr Adolesc Gynecol, 2007. 20(5): p. 305-8. 236. Cruz, O.L., M.E. Pedalini, and C.A. Caropreso, Sensorineural hearing loss associated to gonadal dysgenesis in sisters: Perrault's syndrome. Am J Otol, 1992. 13(1): p. 82-3. 237. Nikolaou, D.S. and R.M. Winston, Sporadic Perrault syndrome. J Obstet Gynaecol, 1999. 19(4): p. 436-7. 238. Marlin, S., et al., Perrault syndrome: report of four new cases, review and exclusion of candidate genes. Am J Med Genet A, 2008. 146A(5): p. 661-4. 239. 
Pierce, S.B., et al., Mutations in the DBP-deficiency protein HSD17B4 cause ovarian dysgenesis, hearing loss, and ataxia of Perrault Syndrome. Am J Hum Genet, 2010. 87(2): p. 282-8. 240. Pierce, S.B., et al., Mutations in mitochondrial histidyl tRNA synthetase HARS2 cause ovarian dysgenesis and sensorineural hearing loss of Perrault syndrome. Proc Natl Acad Sci U S A, 2011. 108(16): p. 6543-8. 241. Huyghe, S., et al., Peroxisomal multifunctional protein-2: the enzyme, the patients and the knockout mouse model. Biochim Biophys Acta, 2006. 1761(9): p. 973-94. 242. Botstein, D., et al., Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet, 1980. 32(3): p. 314-31. 243. Weber, J.L. and P.E. May, Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Am J Hum Genet, 1989. 44(3): p. 388-96. 244. Sachidanandam, R., et al., A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature, 2001. 409(6822): p. 928-33.

245. Lander, E.S. and D. Botstein, Homozygosity mapping: a way to map human recessive traits with the DNA of inbred children. Science, 1987. 236(4808): p. 1567-70. 246. Botstein, D. and N. Risch, Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat Genet, 2003. 33 Suppl: p. 228-37. 247. Lugassy, J., et al., Rapid detection of homozygous mutations in congenital recessive ichthyosis. Arch Dermatol Res, 2008. 300(2): p. 81-5. 248. Wang, S., et al., Genome-wide autozygosity mapping in human populations. Genet Epidemiol, 2008. 249. Rehman, S.U., et al., Autozygosity mapping of a large consanguineous Pakistani family reveals a novel non-syndromic autosomal recessive mental retardation locus on 11p15-tel. Neurogenetics, 2011. 250. Miano, M.G., et al., Pitfalls in homozygosity mapping. Am J Hum Genet, 2000. 67(5): p. 1348-51. 251. Lezirovitz, K., et al., Unexpected genetic heterogeneity in a large consanguineous Brazilian pedigree presenting deafness. Eur J Hum Genet, 2008. 16(1): p. 89-96. 252. Collins, F.S., Positional cloning moves from perditional to traditional. Nat Genet, 1995. 9(4): p. 347-50. 253. Group, T.T.C.S.C., Positional cloning of a gene involved in the pathogenesis of Treacher Collins syndrome. Nat Genet, 1996. 12(2): p. 130-6. 254. Worton, R.G., et al., Duchenne muscular dystrophy involving translocation of the dmd gene next to ribosomal RNA genes. Science, 1984. 224(4656): p. 1447-9. 255. Bassi, M.T., et al., Cloning of the gene for ocular albinism type 1 from the distal short arm of the X chromosome. Nat Genet, 1995. 10(1): p. 13-9. 256. Collins, F.S., M. Morgan, and A. Patrinos, The Human Genome Project: lessons from large-scale biology. Science, 2003. 300(5617): p. 286-90. 257. Ku, C.S., N. Naidoo, and Y. Pawitan, Revisiting Mendelian disorders through exome sequencing. Hum Genet, 2011. 129(4): p. 351-70. 258. Bick, D. and D. Dimmock, Whole exome and whole genome sequencing. 
Curr Opin Pediatr, 2011. 23(6): p. 594-600. 259. Shendure, J. and H. Ji, Next-generation DNA sequencing. Nat Biotechnol, 2008. 26(10): p. 1135-45. 260. Sanger, F., et al., Nucleotide sequence of bacteriophage phi X174 DNA. Nature, 1977. 265(5596): p. 687-95. 261. Hunkapiller, T., et al., Large-scale and automated DNA sequence determination. Science, 1991. 254(5028): p. 59-67. 262. Shendure, J., et al., Accurate multiplex polony sequencing of an evolved bacterial genome. Science, 2005. 309(5741): p. 1728-32. 263. Fullwood, M.J., et al., Next-generation DNA sequencing of paired-end tags (PET) for transcriptome and genome analyses. Genome Res, 2009. 19(4): p. 521-32. 264. Balasubramanian, S., Sequencing nucleic acids: from chemistry to medicine. Chem Commun (Camb), 2011. 47(26): p. 7281-6. 265. McKernan, K.J., et al., Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding. Genome Res, 2009. 19(9): p. 1527-41.

266. Rhodes, M.D. The Fundamentals of 2 Base Encoding and Colour Space. 2008 [cited 2008. 267. De Michele, G., et al., Heterogeneous findings in four cases of cerebellar ataxia associated with hypogonadism (Holmes' type ataxia). Clin Neurol Neurosurg, 1993. 95(1): p. 23-8. 268. M. Badura-Stronka, W.G., B. Męczekalski, A. Wawrocka, K. and M.K. Zawieja, A. Latos-Bieleńska. Perrault syndrome in a female manifesting carrier of mtDNA 11778G>A mutation. in European Society of Human Genetics. 2011. Amsterdam. 269. Mehdipour, P., Karimi, A.R., Bastanhagh, MM., PERRAULT'S SYNDROME: A CLINICAL AND GENETIC INVESTIGATION OF THREE SISTERS. Acta Medica Iranica, 1999. 37(2): p. 78-85. 270. Jenkinson, E.M., et al., Newly recognized recessive syndrome characterized by dysmorphic features, hypogonadotropic hypogonadism, severe microcephaly, and sensorineural hearing loss maps to 3p21.3. Am J Med Genet A, 2011. 271. Jenkinson, E.M., et al., Perrault syndrome: further evidence for genetic heterogeneity. J Neurol, 2011. 272. Carr, I.M., et al., Interactive visual analysis of SNP data for rapid autozygosity mapping in consanguineous families. Hum Mutat, 2006. 27(10): p. 1041-6. 273. Abecasis, G.R., et al., Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet, 2002. 30(1): p. 97-101. 274. Gibbs, J.R. and A. Singleton, Application of genome-wide single nucleotide polymorphism typing: simple association and beyond. PLoS Genet, 2006. 2(10): p. e150. 275. Oriol, R., et al., Common origin and evolution of glycosyltransferases using Dol-P-monosaccharides as donor substrate. Mol Biol Evol, 2002. 19(9): p. 1451-63. 276. Chen, P., et al., Progressive hearing loss in mice lacking the cyclin- dependent kinase inhibitor Ink4d. Nat Cell Biol, 2003. 5(5): p. 422-6. 277. Buchold, G.M., P.L. Magyar, and D.A. O'Brien, Mice lacking cyclin- dependent kinase inhibitor p19Ink4d show strain-specific effects on male reproduction. Mol Reprod Dev, 2007. 74(8): p. 1008-20. 278. 
Santos, R.L., et al., DFNB68, a novel autosomal recessive non-syndromic hearing impairment locus at chromosomal region 19p13.2. Hum Genet, 2006. 120(1): p. 85-92. 279. Chen, A.H., et al., MYO1F as a candidate gene for nonsyndromic deafness, DFNB15. Arch Otolaryngol Head Neck Surg, 2001. 127(8): p. 921-5. 280. Zadro, C., et al., Are MYO1C and MYO1F associated with hearing loss? Biochim Biophys Acta, 2009. 1792(1): p. 27-32. 281. Nair, T.S., et al., Identification and characterization of choline transporter-like protein 2, an inner ear glycoprotein of 68 and 72 kDa that is the target of antibody-induced hearing loss. J Neurosci, 2004. 24(7): p. 1772-9. 282. Kommareddi, P.K., et al., Cochlin isoforms and their interaction with CTL2 (SLC44A2) in the inner ear. J Assoc Res Otolaryngol, 2007. 8(4): p. 435-46. 283. Borud, B., et al., Cloning and characterization of a novel zinc finger protein that modulates the transcriptional activity of nuclear receptors. Mol Endocrinol, 2003. 17(11): p. 2303-19.

284. de Kok, Y.J., et al., Identification of a hot spot for microdeletions in patients with X-linked deafness type 3 (DFN3) 900 kb proximal to the DFN3 gene POU3F4. Hum Mol Genet, 1996. 5(9): p. 1229-35. 285. Naranjo, S., et al., Multiple enhancers located in a 1-Mb region upstream of POU3F4 promote expression during inner ear development and may be required for hearing. Hum Genet, 2010. 128(4): p. 411-9. 286. Song, M.H., et al., Clinical evaluation of DFN3 patients with deletions in the POU3F4 locus and detection of carrier female using MLPA. Clin Genet, 2010. 78(6): p. 524-32. 287. Jamieson, R.V., et al., Domain disruption and mutation of the bZIP transcription factor, MAF, associated with cataract, ocular anterior segment dysgenesis and coloboma. Hum Mol Genet, 2002. 11(1): p. 33-42. 288. Kleinjan, D.A. and V. van Heyningen, Long-range control of gene expression: emerging mechanisms and disruption in disease. Am J Hum Genet, 2005. 76(1): p. 8-32. 289. Kidd, J.M., et al., Mapping and sequencing of structural variation from eight human genomes. Nature, 2008. 453(7191): p. 56-64. 290. Wong, K.K., et al., A comprehensive analysis of common copy-number variations in the human genome. Am J Hum Genet, 2007. 80(1): p. 91-104. 291. Redon, R., et al., Global variation in copy number in the human genome. Nature, 2006. 444(7118): p. 444-54. 292. Long, X. and J.M. Miano, Remote control of gene expression. J Biol Chem, 2007. 282(22): p. 15941-5. 293. Bulger, M. and M. Groudine, Functional and mechanistic diversity of distal transcription enhancers. Cell, 2011. 144(3): p. 327-39. 294. Pfaffl, M.W., A new mathematical model for relative quantification in real- time RT-PCR. Nucleic Acids Res, 2001. 29(9): p. e45. 295. Gusnanto, A., S. Calza, and Y. Pawitan, Identification of differentially expressed genes and false discovery rate in microarray studies. Curr Opin Lipidol, 2007. 18(2): p. 187-93. 296. 
Laine, H., et al., p19(Ink4d) and p21(Cip1) collaborate to maintain the postmitotic state of auditory hair cells, their codeletion leading to DNA damage and p53-mediated apoptosis. J Neurosci, 2007. 27(6): p. 1434-44. 297. O'Bryan, J.P., et al., axl, a transforming gene isolated from primary human myeloid leukemia cells, encodes a novel receptor tyrosine kinase. Mol Cell Biol, 1991. 11(10): p. 5016-31. 298. Mark, M.R., et al., Characterization of Gas6, a member of the superfamily of G domain-containing proteins, as a ligand for Rse and Axl. J Biol Chem, 1996. 271(16): p. 9785-9. 299. Berclaz, G., et al., Estrogen dependent expression of the receptor tyrosine kinase axl in normal and malignant human breast. Ann Oncol, 2001. 12(6): p. 819-24. 300. Ito, M., et al., Expression of receptor-type tyrosine kinase, Axl, and its ligand, Gas6, in pediatric thyroid carcinomas around chernobyl. Thyroid, 2002. 12(11): p. 971-5. 301. Zhu, Y., et al., Kank proteins: a new family of ankyrin-repeat domain-containing proteins. Biochim Biophys Acta, 2008. 1780(2): p. 128-33. 302. Zhang, Y., et al., SIP, a novel ankyrin repeat containing protein, sequesters steroid receptor coactivators in the cytoplasm. Embo J, 2007. 26(11): p. 2645-57.

303. Cook, M., et al., Pronapsin A and B gene expression in normal and malignant human lung and mononuclear blood cells. Biochim Biophys Acta, 2002. 1577(1): p. 10-6. 304. Patel, N., et al., OB-BP1/Siglec-6. a leptin- and sialic acid-binding protein of the immunoglobulin superfamily. J Biol Chem, 1999. 274(32): p. 22729- 38. 305. Cosman, D., et al., A novel immunoglobulin superfamily receptor for cellular and viral MHC class I molecules. Immunity, 1997. 7(2): p. 273-82. 306. Wilson, G.L., et al., cDNA cloning of the B cell membrane protein CD22: a mediator of B-B cell interactions. J Exp Med, 1991. 173(1): p. 137-46. 307. Li, H., J. Ruan, and R. Durbin, Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res, 2008. 18(11): p. 1851-8. 308. Larkin, M.A., et al., Clustal W and Clustal X version 2.0. Bioinformatics, 2007. 23(21): p. 2947-8. 309. Adzhubei, I.A., et al., A method and server for predicting damaging missense mutations. Nat Methods, 2010. 7(4): p. 248-9. 310. Kumar, P., S. Henikoff, and P.C. Ng, Predicting the effects of coding non- synonymous variants on protein function using the SIFT algorithm. Nat Protoc, 2009. 4(7): p. 1073-81. 311. Corneveaux J. J., O.J., White C., Allen A. N.,Van Camp G., Friedman R.,Huentelman M. J. Identification of genes expressed in human inner ear tissue using next generation RNA sequencing. in European Society of Human Genetics Conference. 2011. Amsterdam: Nature. 312. Kang, S.G., et al., Human mitochondrial ClpP is a stable heptamer that assembles into a tetradecamer in the presence of ClpX. J Biol Chem, 2005. 280(42): p. 35424-32. 313. Hansen, J., et al., Decreased expression of the mitochondrial matrix Lon and ClpP in cells from a patient with hereditary spastic paraplegia (SPG13). Neuroscience, 2008. 153(2): p. 474-82. 314. Flores, O., et al., The small subunit of transcription factor IIF recruits RNA polymerase II into the preinitiation complex. Proc Natl Acad Sci U S A, 1991. 
88(22): p. 9999-10003. 315. Kumar, R., et al., Induced alpha-helix structure in AF1 of the androgen receptor upon binding transcription factor TFIIF. Biochemistry, 2004. 43(11): p. 3008-13. 316. Lavery, D.N. and I.J. McEwan, Functional characterization of the native NH2-terminal transactivation domain of the human androgen receptor: binding kinetics for interactions with TFIIF and SRC-1a. Biochemistry, 2008. 47(11): p. 3352-9. 317. Reid, J., et al., The androgen receptor interacts with multiple regions of the large subunit of general transcription factor TFIIF. J Biol Chem, 2002. 277(43): p. 41247-53. 318. Berrebi, A.S., et al., Cerebellar Purkinje cell markers are expressed in retinal bipolar neurons. J Comp Neurol, 1991. 308(4): p. 630-49. 319. Willard, F.S., C.R. McCudden, and D.P. Siderovski, G-protein alpha subunit interaction and guanine nucleotide dissociation inhibitor activity of the dual GoLoco motif protein PCP-2 (Purkinje cell protein-2). Cell Signal, 2006. 18(8): p. 1226-34.

320. Zhang, X., H. Zhang, and J. Oberdick, Conservation of the developmentally regulated dendritic localization of a Purkinje cell-specific mRNA that encodes a G-protein modulator: comparison of rodent and human Pcp2(L7) gene structure and expression. Brain Res Mol Brain Res, 2002. 105(1-2): p. 1-10. 321. Walsh, T., et al., Whole exome sequencing and homozygosity mapping identify mutation in the cell polarity protein GPSM2 as the cause of nonsyndromic hearing loss DFNB82. Am J Hum Genet, 2010. 87(1): p. 90- 4. 322. Kabashi, E., et al., Zebrafish models for the functional genomics of neurogenetic disorders. Biochim Biophys Acta, 2011. 1812(3): p. 335-45. 323. Saba, T.G., et al., An atypical form of erythrokeratodermia variabilis maps to chromosome 7q22. Hum Genet, 2005. 116(3): p. 167-71. 324. Montpetit, A., et al., Disruption of AP1S1, causing a novel neurocutaneous syndrome, perturbs development of the skin and spinal cord. PLoS Genet, 2008. 4(12): p. e1000296. 325. Winter, C., et al., The presynaptic cytomatrix protein Bassoon: sequence and chromosomal localization of the human BSN gene. Genomics, 1999. 57(3): p. 389-97. 326. tom Dieck, S., et al., Bassoon, a novel zinc-finger CAG/glutamine-repeat protein selectively localized at the active zone of presynaptic nerve terminals. J Cell Biol, 1998. 142(2): p. 499-509. 327. Fenster, S.D., et al., Piccolo, a presynaptic zinc finger protein structurally related to bassoon. Neuron, 2000. 25(1): p. 203-14. 328. Altrock, W.D., et al., Functional inactivation of a fraction of excitatory synapses in mice deficient for the active zone protein bassoon. Neuron, 2003. 37(5): p. 787-800. 329. Sterling, P. and G. Matthews, Structure and function of ribbon synapses. Trends Neurosci, 2005. 28(1): p. 20-9. 330. Dick, O., et al., The presynaptic active zone protein bassoon is essential for photoreceptor ribbon synapse formation in the retina. Neuron, 2003. 37(5): p. 775-86. 331. 
Khimich, D., et al., Hair cell synaptic ribbons are essential for synchronous auditory signalling. Nature, 2005. 434(7035): p. 889-94. 332. Fuenzalida, L.C., K.L. Keen, and E. Terasawa, Colocalization of FM1-43, Bassoon, and GnRH-1: GnRH-1 release from cell bodies and their neuroprocesses. Endocrinology, 2011. 152(11): p. 4310-21. 333. Kimmel, C.B., et al., Stages of embryonic development of the zebrafish. Dev Dyn, 1995. 203(3): p. 253-310. 334. Lee-Kirsch, M.A., et al., Mutations in the gene encoding the 3'-5' DNA exonuclease TREX1 are associated with systemic lupus erythematosus. Nat Genet, 2007. 39(9): p. 1065-7. 335. Lehtinen, D.A., et al., The TREX1 double-stranded DNA degradation activity is defective in dominant mutations associated with autoimmune disease. J Biol Chem, 2008. 283(46): p. 31649-56. 336. Matsumoto, T., et al., Androgen receptor functions in male and female physiology. J Steroid Biochem Mol Biol, 2008. 109(3-5): p. 236-41. 337. Charlton, H., Neural transplantation in hypogonadal (hpg) mice - physiology and neurobiology. Reproduction, 2004. 127(1): p. 3-12.

10.2 Publications

The following papers include work from Chapters 3 and 4.
