University of Groningen

Unravelling the genetic basis of hereditary disorders by high-throughput exome sequencing strategies Jazayeri, Omid

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below. Document Version Publisher's PDF, also known as Version of record

Publication date: 2016

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA): Jazayeri, O. (2016). Unravelling the genetic basis of hereditary disorders by high-throughput exome sequencing strategies. University of Groningen.

Copyright Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).

The publication may also be distributed here under the terms of Article 25fa of the Dutch Copyright Act, indicated by the “Taverne” license. More information can be found on the University of Groningen website: https://www.rug.nl/library/open-access/self-archiving-pure/taverne- amendment.

Take-down policy If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum.

Download date: 07-10-2021

Unravelling the genetic basis of hereditary disorders by high-throughput exome sequencing strategies

Omid Jazayeri

|

Omid Jazayeri

Unravelling the genetic basis of hereditary disorders by high-throughput exome sequencing strategies

The research presented in this thesis was performed at the Department of Genetics, Uiversity Medical Center Groningen, University of Groningen, Groningen, The Netherlands.

Cover design by Matin Mehrban ([email protected])

Page layout by Omid Jazayeri

Printed by IPSKAMP printing

© 2016 Omid Jazayeri. All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means without permission of the auther.

ISBN: 978-90-367-8972-1 (Print) ISBN: 978-90-367-8971-4 (e-book)

Unravelling the genetic basis of hereditary disorders by high-throughput exome sequencing strategies

PhD thesis

to obtain the degree of PhD at the University of Groningen on the authority of the Rector Magnificus Prof. E. Sterken and in accordance with the decision by the College of Deans.

This thesis will be defended in public on

Monday 20 June 2016 at 9:00 hours

By

Omid Jazayeri

born on 27 February 1973 in Tehran, Iran

Promotor Prof. R. J. Sinke

Co- promotor Dr. B. Z. Alizadeh

Assessment Committee Prof. H. M. Boezen Prof. V. V. A. Knoers Prof. F. J. van Spronsen

Paranymphs:

Eduard Nicolass de Boer Lenard Fredericus Johansson Ali Saber

Contents

Chapter 1 General Introduction 9

Chapter 2 Whole-exome sequencing is a powerful approach for establishing 21 the etiological diagnosis in patients with intellectual disability and microcephaly

Chapter 3 A novel homozygous insertion and review of published mutations 49 in the NNT causing familial glucocorticoid deficiency (FGD)

Chapter 4 Identification of novel candidate for L1 syndrome by 71 X-exome sequencing

Chapter 5 Composite biological network analysis provides novel clues for 91 hereditary congenital microcephaly

Chapter 6 General discussion 145

Appendices Summary (English) 163 Samenvatting (Dutch) 169 Acknowledgements 175

Chapter 1

General introduction

General introduction

Since the discovery of the DNA double helix in 1953 [1], several milestones have marked the path of the field of genomic research. One of these was the introduction of Sanger sequencing in the late 1970s, which provided a novel and effective technique to analyze and improve our insight into the [2], but the most significant was the completion of The Human Genome Project (HGP), 1 conducted by many laboratories around the world, in 2001 [3]. In concert with the completion of the HGP, an exciting new era of sequencing has begun. Driven by the technical challenge posed by the HGP, techniques for DNA sequencing have changed dramatically and led to the advent of next-generation sequencing (NGS) methods. These methods rely on the massively parallel sequencing of very short DNA fragments [4]. Within a relatively short period of time, NGS technologies have revolutionized research on the human genome and also that of other organisms [5] in various fields of genomics including DNA sequencing, expression studies through RNA sequencing (RNA-Seq) and epigenetic investigations (ChIP-Seq) [6,7]. With respect to DNA sequencing in humans, NGS-based techniques can be applied to entire genomes or to select parts of the genome (exome sequencing or gene panels). The methods are categorized into three groups based on their sample preparation methodology. First, in Whole Genome Sequencing (WGS), the entire genome of individual humans, which is composed of 3.2 billion nucleotides, can be sequenced directly within a week [8]. Second, in Whole Exome Sequencing (WES) or gene panels in which capture-by-hybridization selectively enriches regions or genes of interest, followed by the sequencing procedure. WES allows sequencing all of ~20,000 human genes following enrichment of ~180,000 exons, in dozens of individuals, within a few days [8]. Based on a similar approach, dedicated disease- specific gene panels may be used for targeted sequencing of particular gene sets of interest. In the third method, amplicon enrichment, genes (or regions) of interest are enriched by a multiplex-PCR, the products of which are then subjected to the sequencing procedure. This method is frequently applied for cancer genetics to detect causative mutations in tumour samples derived from, e.g., breast cancer [9] and thyroid cancer [10] patients. A simplified overview of the application of NGS techniques to detect/identify disease causing mutations is depicted in Figure 1. Importantly, introduction of NGS techniques provided unprecedented possibilities for genome analyses. These advances and the sharp decreases in the cost of sequencing technologies [12] have paved the way for widespread use of WES for disease-gene discovery. Miller syndrome was the first rare Mendelian disorder for which the causal gene was identified using WES [13].

11

Chapter 1

12

General introduction

Figure 1: A simplified workflow of a next generation sequencing analysis (adapted from Lohmann & Klein [11]). 1. For the sequencing analysis, DNA is required. This can be derived from large families with many affected patients, sporadic patients and their healthy parents (trios), cohorts of small families with the same disease (for gene discovery) or from individual patients or tissues (e.g. tumor samples). 2. DNA is prepared for the sequencing by fragmentation. Specific target sequences need to be enriched 1 (by hybridization or by PCR). 3. Fragments are ligated to universal adapters and amplified clonally, and then loaded onto a chip. 4. The sequencing reaction is monitored by a light signal (fluorescence) or by the release of a proton resulting in a pH change. 5. The signals are translated into a sequence and aligned to the reference genome. 6. Mismatches with the reference sequence are annotated with respect to the coding part of the genome. 7. Distinction are made between likely benign or possibly pathogenic variants using In silico methods such as predictor algorithms or mutation databases which comprehensively collect pathogenic mutations. 8. Finally, candidate variants classified as potential disease-causing need to be validated by Sanger sequencing along with functional studies and knockout animal models.

Since then, the list of rare Mendelian disorders for which WES was used to identify the genetic causes is rapidly growing [14]. WES has also been valuable in identifying causal and predisposing variants in common diseases and complex traits such as cardiovascular diseases, obesity and diabetes, hypertension and cancers [15]. The rationale for applying WES is evident: the 20,687 protein-coding genes [16] constitute only ~1% of the human genome but harbor about 85% of the known disease-associated mutations. In fact, most Mendelian disorders are caused by exonic mutations or splice-site mutations [17]. Accordingly, by sequencing no more than about 50Mb of our genome, the entire exome can be effectively analyzed. In recent years, WES has become the routine diagnostic application of NGS to identify the causes of mostly monogenic disorders. In particular, exome sequencing has helped to overcome difficulties and limitations in the identification of causal mutations in diseases with extreme locus heterogeneity such as intellectual disabilities [18], hearing loss [19] and retinitis pigmentosa [20]. One of the unique capabilities NGS methods offer is the detection of de novo variants across the entire genome by application of a trio-based sequencing strategy. This has unraveled the genetic cause, including identification of novel disease- associated genes, in sporadic cases of autism [21], intellectual disability [22] and schizophrenia [23]. Although WES may lead to identification of causal mutation(s), and despite its unique ability to detect de novo variants genome-wide, limitations of the technique such as incomplete capturing of all exons and the complexity of variant interpretation still pose challenges for application of WES in the clinic [24–26]. In parallel to applying WES, a broad range of dedicated gene panels have been implemented in germ-line and tumor diagnostics. These provide a sensitivity and specificity that equals the current standard of Sanger sequencing [27] and, to 13

Chapter 1 date, outperform WES in this respect. Comprehensive gene panels also allow more samples to be analyzed in a single run (making this technique cost-efficient) with a shorter turn-around time than clinical exome sequencing [28–30]. Gene panels have been implemented to diagnose mitochondrial disorders [31], epilepsy [32], retinal dystrophies [29], inflammatory bowel disease [33], cardiomyopathies [34] as well as amplicon-based targeted sequencing of breast cancer [9] and myeloid malignancies [35]. In addition to the reliable detection of mutations, such panels are able to detect even low-percentage mosaicisms like those seen in tumor materials [28,36]. Given that sequencing of the entire human genome has not been a generally accessible option until recently, the first real success of WGS was that it became possible to accurately detect and measure fetal DNA in maternal serum to screen for fetal aneuploidies using a noninvasive method [37–39]. More recently, however, with the availability of the newer generation of high-capacity sequencers, together with a further reduction in sequencing costs, WGS is being considered more and more often as the ultimate option for diagnostic testing. In addition to exomic variants, WGS is able to detect pathogenic structural variations like inter- and intrachromosomal translocation or chromosomal rearrangements [40–42]. With all these exiting possibilities at hand, and further technological improvements expected, the biggest challenge in using NGS techniques lies in the data-interpretation. For instance, exome sequencing yields approximately between 20,000 and 50,000 variants per sequenced exome. These figures vary enormously among different sequencing studies depending on the ethnicity of the patients [43], the method used for exome enrichment, the sequencing platform, sequencing depth, and the algorithms used for mapping and variant calling [44]. Identifying causal mutation(s) against a huge background of non-pathogenic polymorphisms and sequencing errors is a key challenge when interpreting NGS results. For instance, each human genome harbors about 100 genuine loss-of-function variants with around 20 genes completely inactivated [45]. Thus, having as much complementary information as possible, such as the pedigree or population structure, inheritance pattern, phenotypic features, known or unknown etiology, knowing whether a phenotype arises owing to de novo or inherited variants, linkage and/or homozygosity mapping data, is of utmost importance for optimal customized variant interpretation and prioritization strategies.

14

General introduction

Aim of this thesis

The research presented in this thesis focuses on WES applications for unravelling the genetic basis of human hereditary disorders by (i) screening of a comprehensive set of known disease-causing genes for mutations; (ii) identifying novel candidate 1 disease genes in genetically heterogeneous disorders with different possible inheritance patterns; and (iii) developing bioinformatic tools to improve on the possibilities for selecting putative candidate genes by in silico analyses. A diagram summarizing the structure of this thesis is shown in Figure 2. In Chapter 2, I examine the utility of WES as a diagnostic approach for establishing a molecular diagnosis in a highly heterogeneous group of patients with microcephaly and variable intellectual disability. Many patients with microcephaly and intellectual disability have a rather indistinguishable and non-specific phenotype. Diagnostic testing through candidate gene testing by Sanger sequencing is rather limited and the diagnostic yield is generally low. We therefore set out to apply WES as a tool to survey all microcephaly genes reported thus far in a single test and to determine WES’s usefulness in diagnosing patients.

Figure 2: Structure of this thesis.

15

Chapter 1

In Chapter 3 we set out to identify the disease gene in a Dutch family with a boy affected by familiar glucocorticoid deficiency (FGD) by applying trio-exome sequencing. Chapter 4 aims to identify a novel disease causing gene for L1 syndrome, an X-linked disorder, by application of a variant of WES. To date, L1 syndrome is known to be caused by mutations in the L1CAM gene, but many unsolved cases remain after mutation analysis of this gene. Pursuing the hypothesis that the disorder is considered to be X-linked, i.e. that the causative gene should be located on the X , we initiated X-exome sequencing in a cohort of male patients. Chapter 5 describes the development of an in silico composite network to predict novel candidate genes that might be involved in congenital microcephaly. Unsolved cases are a prevalent problem in diagnostics and researchers are often confronted with the limitations of WES to detect pathogenic mutations. Non- coding variants or inadequate quality of sequence in true causative gene may lead investigators to miss causal mutations. In addition, and importantly, undiscovered gene(s) associated with a given disorder are clearly a main explanation for unsolved cases. In the latter situation, applying in silico methods to create composite biological network from publically available database-resources could provide new, additional candidate genes to be considered in identifying disease-genes. Finally in Chapter 6 I will discuss the effectiveness and limitations of applying WES for molecular genetic studies on human disease, as well as discussing possible improvements and presenting my future perspectives on high-throughput sequencing approaches.

16

General introduction

References 1. Watson JD, Crick FH, others (1953) Molecular structure of nucleic acids. Nature 171: 737–738. 2. Sanger F, Nicklen S, Coulson AR (1977) DNA sequencing with chain- terminating inhibitors. Proceedings of the National Academy of Sciences 74: 1 5463–5467. 3. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, et al. (2001) The sequence of the human genome. Science 291: 1304–1351. 4. Shendure J, Ji H (2008) Next-generation DNA sequencing. Nature biotechnology 26: 1135–1145. 5. Ekblom R, Galindo J (2011) Applications of next generation sequencing in molecular ecology of non-model organisms. Heredity (Edinb) 107: 1–15. 6. Fouse SD, Nagarajan RP, Costello JF (2010) Genome-scale DNA methylation analysis. Epigenomics 2: 105–117. 7. Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics 10: 57–63. 8. Marian AJ (2012) Challenges in medical applications of whole exome/genome sequencing discoveries. Trends in cardiovascular medicine 22: 219–223. 9. Tung N, Battelli C, Allen B, Kaldate R, Bhatnagar S, et al. (2015) Frequency of mutations in individuals with breast cancer referred for BRCA1 and BRCA2 testing using next-generation sequencing with a 25-gene panel. Cancer 121: 25–33. 10. Nikiforova MN, Wald AI, Roy S, Durso MB, Nikiforov YE (2013) Targeted next-generation sequencing panel (ThyroSeq) for detection of mutations in thyroid cancer. The Journal of Clinical Endocrinology & Metabolism 98: 1852–1860. 11. Lohmann K, Klein C (2014) Next Generation Sequencing and the Future of Genetic Diagnosis. Neurotherapeutics 11: 699–707. 12. Sboner A, Mu XJ, Greenbaum D, Auerbach RK, Gerstein MB, et al. (2011) The real cost of sequencing: higher than you think! Genome Biology 12: 125. 13. Ng SB, Buckingham KJ, Lee C, Bigham AW, Tabor HK, et al. (2010) Exome sequencing identifies the cause of a mendelian disorder. Nature genetics 42: 30–35. 14. Rabbani B, Mahdieh N, Hosomichi K, Nakaoka H, Inoue I (2012) Next- generation sequencing: impact of exome sequencing in characterizing Mendelian disorders. Journal of human genetics 57: 621–632. 15. Rabbani B, Tekin M, Mahdieh N (2014) The promise of whole-exome sequencing in medical genetics. Journal of human genetics 59: 5–15. 16. Pennisi E (2012) ENCODE project writes eulogy for junk DNA. Science 337: 1159–1161. 17. Majewski J, Schwartzentruber J, Lalonde E, Montpetit A, Jabado N (2011) What can exome sequencing do for you? Journal of medical genetics 48: 580– 589.

17

Chapter 1

18. Hamdan FF, Srour M, Capo-Chichi J-M, Daoud H, Nassif C, et al. (2014) De Novo Mutations in Moderate or Severe Intellectual Disability. PLoS genetics 10: e1004772. 19. McClellan J, King M-C (2010) Genetic heterogeneity in human disease. Cell 141: 210–217. 20. Daiger SP, Sullivan LS, Bowne SJ (2013) Genes and mutations causing retinitis pigmentosa. Clinical genetics 84: 132–141. 21. O’Roak BJ, Deriziotis P, Lee C, Vives L, Schwartz JJ, et al. (2011) Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nature genetics 43: 585–589. 22. De Ligt J, Willemsen MH, van Bon BW, Kleefstra T, Yntema HG, et al. (2012) Diagnostic exome sequencing in persons with severe intellectual disability. New England Journal of Medicine 367: 1921–1929. 23. Rees E, Kirov G, O’Donovan MC, Owen MJ (2012) De novo mutation in schizophrenia. Schizophrenia bulletin 38: 377–381. 24. Xuan J, Yu Y, Qing T, Guo L, Shi L (2013) Next-generation sequencing in the clinic: promises and challenges. Cancer letters 340: 284–295. 25. De Leeneer K, Hellemans J, De Schrijver J, Baetens M, Poppe B, et al. (2011) Massive parallel amplicon sequencing of the breast cancer genes BRCA1 and BRCA2: opportunities, challenges, and limitations. Human mutation 32: 335– 344. 26. Frebourg T (2014) The Challenge for the Next Generation of Medical Geneticists. Human mutation 35: 909–911. 27. Sikkema-Raddatz B, Johansson LF, Boer EN, Almomani R, Boven LG, et al. (2013) Targeted Next-Generation Sequencing can Replace Sanger Sequencing in Clinical Diagnostics. Human mutation 34: 1035–1042. 28. Chong HK, Wang T, Lu H-M, Seidler S, Lu H, et al. (2014) The Validation and Clinical Implementation of BRCAplus: A Comprehensive High-Risk Breast Cancer Diagnostic Assay. PLoS One 9: e97408. 29. Glӧckle N, Kohl S, Mohr J, Scheurenbrand T, Sprecher A, et al. (2013) Panel- based next generation sequencing as a reliable and efficient technique to detect mutations in unselected patients with retinal dystrophies. European Journal of Human Genetics 22: 99–104. 30. Group SM (2015) Comprehensive gene panels provide advantages over clinical exome sequencing for Mendelian diseases. Genome biology 16: 134. 31. Platt J, Cox R, Enns GM (2014) Points to Consider in the Clinical Use of NGS Panels for Mitochondrial Disease: An Analysis of Gene Inclusion and Consent Forms. Journal of genetic counselling 23: 594–603. 32. Wang J, Gotway G, Pascual JM, Park JY (2014) Diagnostic Yield of Clinical Next-Generation Sequencing Panels for Epilepsy. JAMA neurology 71: 650– 651. 33. Kammermeier J, Drury S, James CT, Dziubak R, Ocaka L, et al. (2014) Targeted gene panel sequencing in children with very early onset inflammatory bowel disease—evaluation and prospective analysis. Journal of medical genetics 51: 748–755.

18

General introduction

34. D’Argenio V, Frisso G, Precone V, Boccia A, Fienga A, et al. (2014) DNA sequence capture and next-generation sequencing for the molecular diagnosis of genetic cardiomyopathies. The Journal of molecular diagnostics 16: 32–44. 35. Cheng DT, Cheng J, Mitchell TN, Syed A, Zehir A, et al. (2014) Detection of mutations in myeloid malignancies through paired-sample analysis of microdroplet-PCR deep sequencing data. The Journal of Molecular 1 Diagnostics 16: 504–518. 36. Friedman E, Efrat N, Soussan-Gutman L, Dvir A, Kaplan Y, et al. (2015) Low-level constitutional mosaicism of a de novo BRCA1 gene mutation. British Journal of Cancer 112: 765–768. 37. Lo YD (2013) Non-invasive prenatal testing using massively parallel sequencing of maternal plasma DNA: from molecular karyotyping to fetal whole-genome sequencing. Reprod Biomed Online 27: 593–598. 38. Lo YD, Chan KA, Sun H, Chen EZ, Jiang P, et al. (2010) Maternal plasma DNA sequencing reveals the genome-wide genetic and mutational profile of the fetus. Science translational medicine 2: 61ra91. 39. Chen EZ, Chiu RWK, Sun H, Akolekar R, Chan KCA, et al. (2011) Noninvasive prenatal diagnosis of fetal trisomy 18 and trisomy 13 by maternal plasma DNA sequencing. PLoS One 6: e21791. 40. Nishiguchi KM, Tearle RG, Liu YP, Oh EC, Miyake N, et al. (2013) Whole genome sequencing in patients with retinitis pigmentosa reveals pathogenic DNA structural changes and NEK2 as a new disease gene. Proceedings of the National Academy of Sciences 110: 16139–16144. 41. Gilissen C, Hehir-Kwa JY, Thung DT, van de Vorst M, van Bon BW, et al. (2014) Genome sequencing identifies major causes of severe intellectual disability. Nature 511: 344–348. 42. Wei L, Liu S, Conroy J, Wang J, Papanicolau-Sengos A, et al. (2015) Whole- genome sequencing of a malignant granular cell tumor with metabolic response to pazopanib. Molecular Case Studies 1: a000380. 43. Bamshad MJ, Ng SB, Bigham AW, Tabor HK, Emond MJ, et al. (2011) Exome sequencing as a tool for Mendelian disease gene discovery. Nature Reviews Genetics 12: 745–755. 44. Gilissen C, Hoischen A, Brunner HG, Veltman JA (2012) Disease gene identification strategies for exome sequencing. European Journal of Human Genetics 20: 490–497. 45. MacArthur DG, Balasubramanian S, Frankish A, Huang N, Morris J, et al. (2012) A systematic survey of loss-of-function variants in human protein- coding genes. Science 335: 823–828.

19

Chapter 2

Whole-exome sequencing is a powerful approach for establishing the etiological diagnosis in patients with intellectual disability and microcephaly

Patrick Rump1#, Omid Jazayeri1#, Krista K. van Dijk-Bos1, Lennart F. Johansson1,3, Anthonie J. van Essen1, Johanna B.G.M. Verheij1, Hermine E. Veenstra-Knol1, Egbert J.W. Redeker2, Marcel M.A.M. Mannens2, Morris A. Swertz3, Behrooz Z. Alizadeh4, Conny M.A. van Ravenswaaij-Arts1, Richard J. Sinke1, and Birgit Sikkema-Raddatz1

1University of Groningen, University Medical Centre Groningen, Department of Genetics, Groningen, the Netherlands 2University of Amsterdam, Academic Medical Centre, Department of Clinical Genetics, Amsterdam, the Netherlands. 3University of Groningen, University Medical Center Groningen, Genomics Coordination Centre, Department of Genetics, Groningen, the Netherlands 4University of Groningen, University Medical Centre Groningen, Department of Epidemiology, Groningen, the Netherlands

# both are shared first authors

BMC Medical Genomics (2016) 9:7

WES approach in diagnosis of patients with microcephaly

Abstract Clinical and genetic heterogeneity in monogenetic disorders represents a major diagnostic challenge. Although the presence of particular clinical features may aid in identifying a specific cause in some cases, the majority of patients remain undiagnosed. Here, we investigated the utility of whole-exome sequencing as a diagnostic approach for establishing a molecular diagnosis in a highly heterogeneous group of patients with varied intellectual disability and microcephaly. Whole-exome sequencing was performed in 38 patients, including three sib- 2 pairs, in addition to or in parallel with genetic analyses that were performed during the diagnostic work-up of the study participants. In ten out of these 35 families (29%), we found mutations in genes already known to be related to a disorder in which microcephaly is a main feature. Two unrelated patients had mutations in the ASPM gene. In seven other patients we found mutations in RAB3GAP1, RNASEH2B, KIF11, ERCC8, CASK, DYRK1A and BRCA2. In one of the sib- pairs, mutations were found in the RTTN gene. Mutations were present in seven out of our ten families with an established etiological diagnosis with recessive inheritance. We demonstrate that whole-exome sequencing is a powerful tool for the diagnostic evaluation of patients with highly heterogeneous neurodevelopmental disorders such as intellectual disability with microcephaly. Our results confirm that autosomal recessive disorders are highly prevalent among patients with microcephaly.

23

Chapter 2

Introduction Microcephaly, a disproportionally small head size defined as occipitofrontal circumference at or below -3 standard deviations (SD), is an important neurological sign usually associated with developmental delays and intellectual disability [1]. Microcephaly can be congenital or may develop later after birth. It can occur as a more or less isolated finding or as one of the features of a more complex syndrome. To date, according to the London Medical Database [2], more than 800 syndromes with microcephaly have been described, reflecting an enormous number of possible diagnoses. Microcephaly may stem from a wide variety of genetic or non-genetic conditions and is considered to be a consequence of abnormal brain development. Non-genetic causes include foetal alcohol exposure, perinatal infections, asphyxia or haemorrhage. Microcephaly can also occur in patients with inborn errors of metabolism, but this explains only a small proportion of the total cases [3]. In the majority of patients a genetic cause is suspected [4]. Chromosomal abnormalities may account for 15-20% of patients [5]. In a retrospective study of 680 children with microcephaly [6], a specific genetic cause was detected in 15.3% of them, with numerical chromosome aberrations accounting for 6.8% and microdeletions/duplications and monogenetic disorders accounting for the other 8.5%. Over the last few years the number of genes related to microcephaly has been rising, primarily as a result of the increased use of whole-exome sequencing (for an overview see [7]). This clinical and genetic heterogeneity represents a major diagnostic challenge. Although the presence of additional clinical findings may aid in identifying a specific cause in some cases, the majority of patients with microcephaly remain undiagnosed. Yet an early and specific diagnosis is important because it can provide relevant information about disease prognosis, appropriate medical or supportive care, reproductive consequences for parents and other family members, and prenatal diagnostic options. It may also preclude unnecessary further, and possibly invasive, diagnostic tests and evaluations. The diagnostic work-up of microcephaly is extensive and includes brain imaging, metabolic and ophthalmologic evaluations, skin or muscle biopsies, karyotyping or microarray analysis, and mutation analyses of microcephaly genes. However, the use of targeted approaches, such as candidate gene testing by Sanger sequencing, is rather limited and their diagnostic yield is generally low. In contrast, because whole-exome sequencing does not require a priori knowledge of the gene or genes responsible for the disorder under investigation [8], it has been proven to be a very effective technique for identifying new genetic causes of microcephaly [9, 10].

24

WES approach in diagnosis of patients with microcephaly

In this study, we investigated the utility of whole-exome sequencing as a diagnostic approach for establishing a molecular diagnosis in a highly heterogeneous group of patients with microcephaly and intellectual disability.

Materials and Methods Study subjects We performed whole-exome sequencing in 38 patients, including three sib-pairs, 2 with severe microcephaly and varied intellectual disabilities (Table 1). Patients were recruited, both prospectively and retrospectively, from the population of patients with intellectual disability referred to the genetics departments of the University Medical Centre Groningen and the Academic Medical Centre in Amsterdam. Informed consent was obtained from the participating families and the study protocol was approved by the Ethics Committee of the University Medial Centre Groningen. The clinical characteristics of all 38 patients are summarized in Table 1. The average age of the patients was 10 years (range 0 to 57 years). The majority (87%) were children, with only five patients over 18 years of age. Severe microcephaly was defined as an occipital frontal head circumference of at least 3 SDs below the age- related mean according to Dutch national reference curves. The mean occipital frontal head circumference Z-score for our patients was -4.5 SD (range -3 to -8 SD). Consanguinity was reported for the parents of six patients (patients# 16, 23, 27, 28, 29, 35), including one of the three sib-pairs. Phenotype information was retrieved from medical records and provided by the referring physicians. All patients had undergone genetic testing during their routine diagnostic work-up. In addition to whole-exome sequencing, the work-up may have included standard chromosome analysis, metabolic screening, or DNA sequencing of individual or multiple genes. On average, 2.8 DNA tests (range 0 to 9 tests) were performed per family (supplemental Table S1). The most frequently tested gene was ASPM. This gene was analysed in seven (20%) of the families.

25

Chapter 2

26

WES approach in diagnosis of patients with microcephaly

2

27

Chapter 2

28

WES approach in diagnosis of patients with microcephaly

Microarray analysis SNP array analysis was performed in all patients prior to inclusion using the IlluminaHumanCytoSNP-12 v2.1 DNA Analysis BeadChip following the manufacturer’s instructions (Illumina, San Diego, CA, USA). Normalized intensity and allelic ratios were analysed using GenomeStudio Data Analysis Software and the cnvPartition v3.1.6 algorithm to assess copy number variation. No causal copy number variants were found (data not shown). Homozygosity was determined as were, if possible, any shared haplotypes within a family. Shared homozygous 2 stretches between the siblings with consanguineous parents were detected by selecting regions in which both siblings shared the same homozygous alleles; for the sib-pair with a recessive disorder we selected regions in which they shared the same combination of alleles, and for the sib-pair with a dominant disorder we selected regions in which they shared at least one allele at each SNP position. Stretches of homozygosity were detected within a single patient by selecting regions that only showed homozygous alleles.

Whole-exome sequencing Library preparation was based on the SureSelect All Exon V4 bait protocol (Agilent Technologies, Inc., Santa Clara, CA, USA), and carried out on a PerkinElmer® SciClone NGS workstation (PerkinElmer, Inc., Waltham, MA, USA). Briefly, 3 micrograms of genomic DNA were randomly fragmented with ultrasound using a Covaris® instrument (Covaris, Inc., Woburn, MA, USA). Adapters were ligated to both ends of the resulting fragments. Fragments with an average insert size of 220 bp were excised using the LabChip XT® gel system (PerkinElmer) and amplified by PCR (98 oC 2 min; 11cycli of 98 oC 30s, 65 oC 30s, 72 oC 1 min; 72 oC 10 min). The quality of the product was verified by capillary electrophoresis on the LabChip GX instrument (PerkinElmer). Of this product, 500 ng was subsequently hybridized to the Agilent SureSelect® All Exon V4 bait pool (Agilent Technologies, Inc.). During subsequent enrichment with PCR (98 oC 2 min; 11 cycli of 98 oC 30s, 57 oC 30s, 72 oC 1 min; 72 oC 10 min), individual samples were bar-coded and the quality of the product was again verified on the LabChip GX instrument. The product had an average fragment size of 360 bp, and was sequenced using 100 reads paired-end on an Illumina® HiSeq2000 in pools of four samples per lane (Illumina). Resulting image files were processed including demultiplexing using standard Illumina® base calling software. All lanes of sequence data were aligned to the human reference genome build b37, as released by the 1000 Genomes Project [11], using Burrows-Wheeler Aligner [12]. Subsequently the duplicate reads were marked. Using the Genome Analysis

29

Chapter 2

Toolkit (GATK) [13], re-alignment around insertions and deletions detected in the sequence data and in the 1000 Genomes Project pilot [11] was performed, followed by base quality score recalibration. During the full process the quality of the data was assessed by performing Picard [14], GATK Coverage and custom scripts. SNP and indel discovery was done using GATK Unified Genotyper and Pindel [15], respectively, followed by annotation using SnpEff [16]. This production pipeline was implemented using the MOLGENIS compute [17] platform for job generation, execution and monitoring. For each sample, a vcf file was generated that included all variants. Mean target coverage of 64x was achieved, with more than 82% of targeted bases having at least 20x coverage (supplemental Table S2). Sample swab was excluded by a concordance check with the SNP information from the array.

DNA interpretation For data interpretation we uploaded the vcf files to Cartagenia NGS bench (Cartagenia, Leuven, Belgium) and then filtered the variants. We removed variants covered by less than five sequence reads and possibly benign variants annotated in dbSNP133, the 1000 Genomes project, the Seattle exome database [18], or the GoNL database [19] with an allele frequency above 2%. Additionally, variants that were annotated in eight control samples, and for which exome sequencing was performed in the same sequence run, were also removed because they were considered to be artefacts. The remaining variants were filtered on any occurrence in known genes related to the phenotype, using a filter with selected phenotype traits [HPO-terms for microcephaly (HP:0011451, HP:0005484, HP:0000253), abnormalities of the nervous system (HP:0002011, HP:0000707), and abnormality of the head (HP:0000234) ][20] in the Cartagenia filter tree. The remaining variants were assumed to be related to the phenotype and further filtered on an inheritance model. First, an autosomal recessive inheritance model was applied for gene identification, then an autosomal dominant or X-linked inheritance model was applied. We considered variants resulting in transcriptional or splice site effects as potentially pathogenic. For the remaining group of variants, including those without a phenotypic match, we checked for their presence in the professional HGMD database (Biobase-international, Beverly, MA, USA) and manually checked the relationships between variants and possible phenotypes. The effect of potentially pathogenic variants was explored in the Alamut prediction program (Alamut, Rouen, France) or based on information from available databases and literature.

30

WES approach in diagnosis of patients with microcephaly

Validation of mutations by Sanger sequencing Candidate pathogenic variants were validated by Sanger sequencing. When available, we studied segregation of these confirmed variants by investigating DNA samples from additional family members. Sequencing analysis was carried out using flanking intronic primers (primer sequences are available upon request). The forward primer was designed with a PT1 tail (5’-TGTAAAACGACGGCCAGT-3’) and the reverse primer with a PT2 tail (5’-CAGGAAACAGCTATGACC-3’). PCR was performed in a total volume of 10 µl containing 5 µl AmpliTaq Gold ®Fast PCR Master Mix 2 (Applied Biosystems, Foster City, CA, USA), 1.5µl of each primer with a concentration of 0.5 pmol/μl (Eurogentec, Serian, Belgium) and 2 µl genomic DNA in a concentration of 40 ng/μl. Samples were PCR amplified and sequenced according to our standard diagnostic protocols (available upon request).

Causality criteria To designate candidate pathogenic variants as causative, variants confirmed by Sanger sequencing were required to occur in genes in which mutations are known to cause a disorder with clinical features consistent with the patient’s phenotype. All variants were either reported in the literature or HGMD database [21] to be deleterious, classified as truncating or having a splice site effect, or predicted to be deleterious at least in three prediction algorithms in Alamut. The presence of candidate pathogenic variants was checked in the 1000 Genomes project, the Seattle exome database [18], Exome Variant Server [22], and Exome Aggregation Consortium (ExAC) [23]. Furthermore, when possible, familial segregation was investigated to determine whether this was consistent with the expected mode of inheritance. Causative variants were then reported to the referring physicians.

Results On average 34,580 single-nucleotide variants and small insertions or deletions were found in each patient by whole-exome sequencing (see supplemental Table S3). A molecular diagnosis could be established in 11 patients from 10 families, corresponding to a diagnostic yield of 29%. The inheritance of these mutations were autosomal recessive (N=7 families), autosomal dominant (N=2) and X-linked (N=1). In three patients, a molecular diagnosis was reached during their diagnostic work-up while performing exome sequencing (patient 1, patient 26 and patient 32). All three of these diagnoses were also reached by whole-exome sequencing. In the remaining 25 families, we identified five possibly candidate genes (data not shown). However, for none of these candidates a causal relationship to the phenotype of the patients could be proven. 31

Chapter 2

Mutations were detected in nine different previously identified microcephaly genes: ASPM, RAB3GAP1, RNASEH2B, KIF11, RTTN, ERCC8, CASK, DYRK1A and BRCA2 (Table 2). None of these mutations was homo- or heterozygous present in the databases of benign variants. One exception was the variant in RNASEH2B [c.529G>A]. This mutation has been described before [24] and is predicted to be pathogenic by Alamut .It was heterozygous reported in the ExAC database with a frequency of 0.13%. We identified ASPM mutations related to primary microcephaly type 5 [25] in two independent families. Patient 6 had two novel mutations in the ASPM gene: a deletion of four base pairs resulting in a frameshift and a splice site mutation. Sanger sequencing of the parents confirmed segregation. The other ASPM mutations (in patient 32) had already been detected by Sanger sequencing: a stop mutation and an insertion of one base pair resulting in a frameshift. In two patients with clinically recognizable syndromes (i.e. Warburg Micro syndrome and Aicardi-Goutières syndrome), mutations in RAB3GAP1 (patient 1) [26] and RNASEH2B (patient 26) [24] related to these conditions were identified during their diagnostic work-up as well as by whole-exome sequencing. In RAP3GAP1 a homozygous deletion of four base pairs in exon 6 was detected. This deletion creates a frameshift starting at codon Thr159. The new reading frame ends in a stop codon 18 positions downstream. A frame-shift mutation in the KIF11 gene was identified in patient 4, who was known to have microcephaly and retinopathy, consistent with the phenotype. The mutation is an insertion of one base pair resulting in a frameshift and after eight amino acids causing a stop codon. The mutation in this patient proved to be inherited from the mother, who also had microcephaly and mild learning problems, but who was not found to have any ophthalmologic abnormalities. The unaffected maternal aunt of patient 4 did not carry the mutation. We found a compound heterozygous frame-shift and missense mutation in the RTTN gene encoding the centrosome-associated protein Rotatin in one of the three sib-pairs (patients 8 and 9). The missense mutation was predicted as deleterious by several programs (including Polyphen, SIFT, and Mutation taster). In the ERCC8 gene a homozygous frame-shift mutation was found in patient 16, located in an 8 Mb homozygous region causing a stop codon further in the exon. Sanger sequencing of the parents confirmed that both were carriers of this heterozygous mutation. We further found a splice-site mutation in the CASK gene in patient 19, and analysis of the parents confirmed that this was a de novo mutation. This mutation results in a predicted change of the splice site. 32

WES approach in diagnosis of patients with microcephaly

We identified a frame-shift mutation in the DYRK1A gene in patient 23, which was not inherited from the patient’s mother. Because DNA of the patient’s father was not available for testing, we were unable to confirm a de novo event in our patient, but the mutation is highly likely to be pathogenic. A homozygous mutation in the BRCA2 gene was detected in patient 24. Biallelic BRCA2 mutations cause Fanconi anaemia complementation group D1 [27], a phenotype consistent with the phenotype of this patient. Sanger sequencing of the parents confirmed segregation. Since both parents of our patient were heterozygous 2 for the mutation, and because of the associated increased risk for cancer, both of their families were offered appropriate genetic counselling and mutation screening. To date, no other family members have been reported to have cancer.

33

Chapter 2

34

WES approach in diagnosis of patients with microcephaly

Discussion Overall, a molecular diagnosis was established in 29% of the families with microcephaly studied using whole-exome sequencing. This diagnostic yield is comparable to the diagnostic yield of 25% for whole-genome sequencing reported in a study of patients with a wide range of suspected genetic disorders, mostly related to neurologic conditions [28]. Our results are also in line with other studies on the diagnostic utility of whole-exome sequencing in genetically highly heterogeneous disorders. Whole-exome sequencing had a higher diagnostic yield 2 than Sanger sequencing in patients with deafness (44% vs. 10%), blindness (52% vs. 25%), mitochondrial diseases (16% vs. 11%) and movement disorders (20% vs. 5%) [29]. In a study of 188 probands from families with consanguineous parents and two or more affected children, mutations in known genes were found in 27% of the patients [8]. A similar diagnostic yield of approximately 25% was obtained in a study on consanguineous families from Qatar [30]. We found mutations that are highly likely to be causative in nine different genes known to be associated with a disorder in which microcephaly is an important phenotypic feature. The ASPM gene is related to primary microcephaly type 5, which is the most common cause of autosomal recessive inherited primary microcephaly [31]. Although ASPM was the most frequently tested gene during the diagnostic work-up of the patients (Table S1), in one of these two families this gene had not been tested. Mutations in the KIF11 gene are known to cause autosomal dominant inherited microcephaly with or without chorioretinopathy, lymphedema, or mental retardation [32], a phenotype that is consistent with the clinical features of the patient in which we identified a KIF11 mutation. Homozygous missense mutations in the RTTN gene have recently been identified in patients with microcephaly, intellectual disability, epilepsy and bilateral polymicrogyria [33]. The p.H865R mutation identified in our family results in a substitution of an amino acid that is located in the second Armadillo-like domain of Rotatin, a protein-protein interacting domain that is highly conserved among species. Sanger sequencing in both parents and two unaffected siblings confirmed that the mutations segregated with the clinical phenotype in this family, further proving causality. In our patients, the type of cortical malformations differed and the degree of microcephaly was more severe than in these published cases. It is known that such heterogeneity in distribution, severity and type of cortical malformations can occur in relation to mutations in one specific gene, as has been observed with mutations in the GPR56 gene [33, 34]. ERCC8 gene mutations cause Cockayne syndrome type A [35]. Although, in retrospect, this diagnosis is consistent with the clinical features of our patient, it had not been considered prior to our discovery of the ERCC8 gene 35

Chapter 2 mutation by whole-exome sequencing. When the patient was re-evaluated, his facial features had changed substantially and had become more characteristic of Cockayne syndrome. Mutations in the CASK gene cause mental retardation with microcephaly and pontocerebellar hypoplasia [36]. This X-linked condition is seen more frequently in females than in males [37]. The presence of severe intellectual disability, spasticity, seizures, absence of speech, and prominent upper incisors reported in our CASK patient are known clinical features associated with this disorder [38]. Unfortunately, no brain MRI was available from this patient so it remains unknown whether she also had pontine or cerebellar hypoplasia. The absence of this information may also explain why this diagnosis had not been considered prior to whole-exome sequencing. DYRK1A mutations cause autosomal dominant mental retardation type 7, in which microcephaly is an important clinical finding [39, 40]. Typically, de novo DYRK1A mutations are found [39] as was most likely the case in our patient. Biallelic BRCA2 mutations are known to cause Fanconi anaemia type D1 [41], and we found a homozygous mutation in this gene. Secondary microcephaly and hepatocellular carcinoma, as seen in our patient, are known features of Fanconi anaemia [27]. This particular BRCA2 mutation has been identified in families with familial breast cancer [42]. In our cohort an autosomal recessive disorder was detected in 7/10 of the families with an established diagnosis, while only eight out of 35 (23%) families had been suspected of having a recessive disorder (six consanguineous families and two families with affected siblings). This is consistent with the high empirical recurrence risk in sibs (approximately 15-20%) of patients with severe primary microcephaly without an identified cause [43]. Our results suggest that autosomal recessive inheritance is far more frequent in patients with microcephaly than in patients with intellectual disability overall [28, 44]. In a study of 100 patients with severe intellectual disability, mostly de novo mutations with an assumed dominant effect were identified and only one patient was found to have an autosomal recessive disorder [45]. Our results confirm the clinical and genetic heterogeneity in patients with microcephaly. When whole-exome sequencing is performed early in the diagnostic work-up of patients, it may prevent other, unnecessary, diagnostic evaluations. Most of the patients in our study had had extensive diagnostic evaluations but these had not led to a specific diagnosis in the majority of cases. Neveling et al calculated that when three or more genes per patient are investigated by Sanger sequencing, whole- exome sequencing is more cost effective and has a higher diagnostic yield [29].

36

WES approach in diagnosis of patients with microcephaly

Conclusion Our results demonstrate that whole-exome sequencing is a powerful tool for the diagnostic evaluation of highly heterogeneous neurodevelopmental disorders. We performed whole-exome sequencing in 35 families with intellectual disability and microcephaly. A molecular diagnosis could be established in 29% of the studied families. In 70% of the families an autosomal recessive condition was present, confirming that this inheritance mode is more prevalent among patients with intellectual disability who have microcephaly. 2

Acknowledgements This study was supported by a grant from the University Medical Centre Groningen (Doelmatigheidsfonds). We thank all patients and families for their participation, their physicians for submitting patient samples and providing clinical information, and Jackie Senior and Kate Mc Intyre for editing the manuscript.

37

Chapter 2

References 1. Woods CG (2004) Human microcephaly. Current Opinion in Neurobiology 14:112–117. 2. London Medical Database. [http://www.lmdatabases.com]. 3. Prasad AN, Bunzeluk K, Prasad C, Chodirker BN, Magnus KG, Greenberg CR (2007) Agenesis of the corpus callosum and cerebral anomalies in inborn errors of metabolism. Congenital Anomalies 47:125–135. 4. Piro E, Antona V, Consiglio V, Ballacchino A, Graziano F (2013) Microcephaly a clinical-genetic and neurologic approach. Acta Medica Mediterranea 29:327–331. 5. Cooper GM, Coe BP, Girirajan S, Rosenfeld JA, Vu TH, Baker C et el. (2011) A copy number variation morbidity map of developmental delay. Nature Genetics 43:838–846. 6. Hagen M, Pivarcsi M, Liebe J, Bernuth H, Didonato N, Hennermann JB et al. (2014) Diagnostic approach to microcephaly in childhood. a two-center study and review of the literature. Developmental Medicine and Child Neurology. 56:732–741. 7. Alcantara D, O’Driscoll M (2014) Congenital microcephaly. American Journal of Medical Genetics Part C. 166:124–139. 8. Dixon-Salazar TJ, Silhavy JL, Udpa N, Schroth J, Bielas S, Schaffer AE et al. (2012) Exome sequencing can improve diagnosis and alter patient management. Science Translational Medicine 4:138ra78. 9. Dyment DA, Sawyer SL, Chardon JW, Boycott KM (2013) Recent Advances in the Genetic Etiology of Brain Malformations. Current Neurology and Neuroscience Reports 13:1–10. 10. Rabbani B, Mahdieh N, Hosomichi K, Nakaoka H, Inoue I (2012) Next- generation sequencing: impact of exome sequencing in characterizing Mendelian disorders. Journal of Human Genetics 57:621–632. 11. Abecasis G, Altshuler D, Auton A, Brooks LD, Durbin RM and GRA, Hurles ME et al. (2010) A map of human genome variation from population- scale sequencing. Nature 467:1061–1073. 12. Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows- Wheeler transform. Bioinformatics 26:589–595. 13. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A et al. (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20:1297– 1303. 14. Picard. [http://picard.sourceforge.net]. 15. Ye K, Schulz MH, Long Q, Apweiler R, Ning Z: Pindel (2009) A pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25:2865–2871. 16. Cingolani P, Platts A, Wang L, Coon M, Nguyen T, Wang L et al. (2012) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w; iso-2; iso-3. Fly 6:1–13. 38

WES approach in diagnosis of patients with microcephaly

17. Byelas H, Dijkstra M, Neerincx PB, Van Dijk F, Kanterakis A, Deelen P et al. (2013) Scaling Bio-Analyses from Computational Clusters to Grids. In IWSG. Zurich. Proceedings of the 5th International Workshop on Science Gateways. 18. 1000 Genomes project, Seattle exome database. [http://www.evs.gs.washington.edu]. 19. Francioli LC, Menelaou A, Pulit SL, van Dijk F, Palamara PF, Elbers CC et al. (2014) Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nature Genetics 46:818–825. 20. Human phenotype ontology. [http://www.human-phenotype-ontology.org/]. 2 21. HGMD, Human Gene Mutation Database. [https://portal.biobase- international.com/cgi-bin]. 22. Exome Variant Server. (http://evs.gs.washington.edu/EVS/) 23. Exome Aggregation Consortium (ExAC). (http://ExAC.broadinstitute.org). 24. Crow YJ, Leitch A, Hayward BE, Garner A, Parmar R, Griffith E et al. (2006) Mutations in genes encoding ribonuclease H2 subunits cause Aicardi- Goutieres syndrome and mimic congenital viral brain infection. Nature Genetics 38:910–916. 25. Passemard S, Titomanlio L, Elmaleh M, Afenjar A, Alessandri J-L, Andria G et al. (2009) Expanding the clinical and neuroradiologic phenotype of primary microcephaly due to ASPM mutations. Neurology 73:962–969. 26. Aligianis IA, Johnson CA, Gissen P, Chen D, Hampshire D, Hoffmann K et al. (2005) Mutations of the catalytic subunit of RAB3GAP1 cause Warburg Micro syndrome. Nature Genetics 37:221–224. 27. Alter BP, Rosenberg PS, Brody LC. (2007) Clinical and molecular features associated with biallelic mutations in FANCD1/BRCA2. Journal of Medical Genetics 44:1–9. 28. Yang Y, Muzny DM, Reid JG, Bainbridge MN, Willis A, Ward PA et al. (2013) Clinical whole-exome sequencing for the diagnosis of mendelian disorders.The New England Journal of Medicine. 369:1502–1511. 29. Neveling K, Feenstra I, Gilissen C, Hoefsloot LH, Kamsteeg E-J, Mensenkamp AR et al. (2013) A Post-Hoc Comparison of the Utility of Sanger Sequencing and Exome Sequencing for the Diagnosis of Heterogeneous Diseases. Human Mutation 34:1721–1726. 30. Fahiminiya S, Almuriekhi M, Nawaz Z, Staffa A, Lepage P, Ali R et al. (2013) Whole exome sequencing unravels disease-causing genes in consanguineous families in Qatar. Clinical Genetics 86:134–141. 31. Nicholas AK, Swanson EA, Cox JJ, Karbani G, Malik S, Springell K et al. (2009) The molecular landscape of ASPM mutations in primary microcephaly. Journal of Medical Genetics 46:249–253. 32. Ostergaard P, Simpson MA, Mendola A, Vasudevan P, Connell FC, van Impel A et al. (2012) Mutations in KIF11 Cause Autosomal-Dominant Microcephaly Variably Associated with Congenital Lymphedema and Chorioretinopathy. American Journal of Human Genetics 90:356–362. 33. Kheradmand Kia S, Verbeek E, Engelen E, Schot R, Poot RA, de Coo IF et al. (2012) RTTN Mutations Link Primary Cilia Function to Organization of

39

Chapter 2

the Human Cerebral Cortex. American Journal of Human Genetics 91:533– 540. 34. Barkovich AJ, Guerrini R, Kuzniecky RI, Jackson GD, Dobyns WB. (2012) A developmental and genetic classification for malformations of cortical development: update 2012. Brain 135:1348–1369. 35. Laugel V, Dalloz C, Durand M, Sauvanaud F, Kristensen U, Vincent M-C et al. (2010) Mutation update for the CSB/ERCC6 and CSA/ERCC8 genes involved in Cockayne syndrome. Human Mutation 31:113–126. 36. Najm J, Horn D, Wimplinger I, Golden JA, Chizhikov VV, Sudi J et al. (2008) Mutations of CASK cause an X-linked brain malformation phenotype with microcephaly and hypoplasia of the brainstem and cerebellum. Nature Genetics 40:1065–1067. 37. Hackett A, Tarpey PS, Licata A, Cox J, Whibley A, Boyle J et al. (2009) CASK mutations are frequent in males and cause X-linked nystagmus and variable XLMR phenotypes. European Journal of Human Genetics 18:544– 552. 38. Moog U, Kutsche K, Kortüm F, Chilian B, Bierhals T, Apeshiotis N et al. (2011) Phenotypic spectrum associated with CASK loss-of-function mutations. Journal of Medical Genetics 48:741–751. 39. O’Roak BJ, Vives L, Fu W, Egertson JD, Stanaway IB, Phelps IG et al. (2012) Multiplex targeted sequencing identifies recurrently mutated genes in autism spectrum disorders. Science 338:1619–1622. 40. Van Bon B, Hoischen A, Hehir-Kwa J, de Brouwer A, Ruivenkamp C, Gijsbers A et al. (2011) Intragenic deletion in DYRK1A leads to mental retardation and primary microcephaly. Clinical Genetics 79:296–299. 41. Howlett NG, Taniguchi T, Olson S, Cox B, Waisfisz Q, de Die-Smulders C et al. (2002) Biallelic inactivation of BRCA2 in Fanconi anemia. Science 297:606–609. 42. Tea M-KM, Kroiss R, Muhr D, Fuerhauser-Rappaport C, Oefner P, Wagner TM et al. (2014) Central European BRCA2 mutation carriers: Birth cohort status correlates with onset of breast cancer. Maturitas77:68–72. 43. Firth HV, Hurst JA, Hall JG. (2005) Oxford desk reference: Clinical genetics. Oxford University Press. 44. Veltman JA, Brunner HG. (2012) De novo mutations in human genetic disease. Nature Review Genetics 13:565–575. 45. De Ligt J, Willemsen MH, van Bon BW, Kleefstra T, Yntema HG, Kroes T et al. (2012) Diagnostic exome sequencing in persons with severe intellectual disability. The New England Journal of Medicine 367:1921–1929.

40

WES approach in diagnosis of patients with microcephaly

Chapter 2 2

Supplementary materials

41

Chapter 2

Supplementary Table S1: Genetic tests performed during the diagnostic work-up of the study participants. Patient Number of Genetic tests tests 1 1 RAB3GAP 2 1 UBE3A 3 8 ASPM, NBN, MCPH1, WDR62, CDK5RAP2, CEP152, CENJP, STIL 4 0 - 5 5 SHH, SIX3, GLI2, ZIC2, TGIF 6 0 - 7 7 UBE3A, SNRPN methylation, MECP2, MCT8, mtDNA (Mito chip), TUBA1A, WDR62 8 0 - 9 7 DCX, LIS1,TUBA1A, FLNA, POMT1, POMGnT1, mtDNA (m3243A>G; m8344A>G; m8993T>G/C) 10 0 - 11 0 - 12 1 FMR1 13 3 MCPH1, ASPM, FMR1 14 3 ZEB2, FMR1, MCT8 15 0 - 16 6 FMR1, MCT8, POLG, TIMM8A, ECGF1(MNGIE), mtDNA (Affymetrix resequencing array 2.0) 17 0 - 18 1 MECP2 19 2 MECP2, CDKL5 20 1 MYCN 21 1 Leber Congenital Amaurosis chip (Asper version 2010) 22 7 WDR62, CDK5RAP2, CENPJ, CEP152, STIL, MCPH1, ASPM 23 0 - 24 3 CDKL5, MECP2, FMR1 25 7 KCNJ11, ABCC8, GCK, H19/LIT1 methylation, GHR, STAT5B, IGF1 26 9 MECP2, CDKL5, UBE3A, TCF4, FOXG1, RNASEH2A, RNASEH2B, RNASEH2C, TREX1 27 1 GPR56 28 0 - 29 0 - 30 2 ASPM, MCPH1 31 4 CHD7, KAT6B, MED12, UBE3B 32 2 ASPM, MCPH1

42

WES approach in diagnosis of patients with microcephaly

Patient Number of Genetic tests tests 33 6 HNF1β, PKHD1, NPHP3, Bardet-Biedl syndrome micro- array (Asper version 5), mtDNA (Mito chip), BCS1L 34 0 - 35 4 MCPH1, ASPM, NBS, Fanconi anemia (mitomycine C test) 36 3 ASPM, STIL, MCPH1 37 0 - 2 38 3 ASPM, STIL, MCPH1

43

Chapter 2

Supplementary Table S2: Specification of the sequence data.

Samples Mean Min Max Genome size (bp) 3101804739 3101804739 3101804739 Bait size (bp) 51695938 51374954 51756122 Target size (bp) 51695938 51374954 51756122 Total number of reads 119523589 75682208 174102599 On target bases (Mb) 3306.90 2141.65 4660.77 Aligned bases (Mb) 8863.88 5714.48 12854.38 Mean bait coverage 62.51 40.11 87.64 Mean target coverage 63.97 41.38 90.05 Fraction bp on bait 0.28 0.20 0.37 Fraction bp near bait 0.12 0.08 0.22 Fraction bp off bait 0.38 0.26 0.49 Fraction bp not aligned 0.11 0.05 0.16 Capture specificity 0.62 0.51 0.74 Fraction target covered ≥ 2x 0.96 0.95 0.99 Fraction target covered≥ 10x 0.90 0.84 0.96 Fraction target covered ≥ 20x 0.82 0.68 0.88 Fraction target covered ≥ 30x 0.71 0.52 0.83 Fraction usable bases on target 0.28 0.20 0.37 Mean read length 95 95 95 Strand balance 0.50 0.5 0.5 Median insert size 224 175 369 Mean insert size 226.49 172.72 353.71 Standard deviation insert size 52.22 25.41 99.69 Number of SNPs for concordance 2635 2103 3311 Concordance 1 1 1 Name of the bait set(s) used in the hybrid selection for this project: SureSelect All Exon 50MB baits hg19 human g1k v37

44

WES approach in diagnosis of patients with microcephaly

2

45

Chapter 2

46

WES approach in diagnosis of patients with microcephaly

2

47

Chapter 3

A novel homozygous insertion and review of published mutations in the NNT gene causing familial glucocorticoid deficiency (FGD)

Omid Jazayeri 1, Xuanzhu Liu 2, Cleo C. van Diemen 1, Willie M. Bakker-van Waarde 3, Birgit Sikkema-Raddatz 1, Richard J. Sinke 1, Jianguo Zhang 2, Conny M.A. van Ravenswaaij-Arts 1

1 University of Groningen, University Medical Centre Groningen, Department of Genetics, Groningen, The Netherlands 2 BGI-Shenzhen, Shenzhen 518083, China 3 University of Groningen, University Medical Centre Groningen, Beatrix Children’s Hospital, Department of Pediatric Endocrinology, The Netherlands

European Journal of Medical Genetics (2015) 58: 642-649

Glucocorticoid deficiency and NNT mutations

Abstract Familial glucocorticoid deficiency (FGD) is an autosomal recessive disorder characterized by low levels of cortisol despite high adrenocorticotropin (ACTH) levels, due to the reduced ability of the adrenal cortex to produce cortisol in response to stimulation by ACTH. FGD is a heterogeneous disorder for which causal mutations have been identified in MC2R, MRAP, MCM4 and TXNRD2. Also mutations in STAR and CYP11A1 can sometimes present with a phenotype resembling FGD. Recently, it has been indicated that FGD can also be caused by mutations in NNT (nicotinamide nucleotide transhydrogenase). We identified a 6.67 Mb homozygous region harboring the NNT gene by 3 SNP haplotyping in a 1-year old Dutch boy presenting with FGD, but without mutations in MC2R and MRAP. Exome-sequencing revealed a novel homozygous mutation (NM_012343.3: c.1259dupG) in NNT that was predicted to be disease- causing. The mutation is located in exon 9 and creates a frameshift leading to a premature stop-codon (p.His421Serfs*4) that is known to result in FGD. Both parents were shown to be heterozygous carriers. We reviewed the literature for all the reported NNT mutations and their clinical presentation. The median age of disease onset in 23 reported patients, including the present patient, was 12 months (range 3 days to 39 months). There was no difference in age of disease onset between truncating and non-truncating NNT mutations. Based on recent literature, we advise to monitor patients with FGD due to NNT mutations for possible combined mineralocorticoid insufficiency and extra-adrenal manifestations.

51

Chapter 3

Inroduction Familial glucocorticoid deficiency (FGD; MIM 202200) is a rare, autosomal recessive disorder characterized by undetectable or low levels of plasma cortisol despite high plasma adrenocorticotropin (ACTH) levels. The first symptoms generally occur during early infancy. FGD can present as an acute adrenal crisis precipitated by an intercurrent illness, or with recurrent hypoglycemia (often associated with seizures). Nonspecific additional symptoms might be: lethargy, failure to thrive, pallor and delayed developmental milestones. In addition, hyperpigmentation due to elevated ACTH levels is often seen. Strikingly, serum electrolytes are usually normal because aldosterone production is regulated mainly by the renin-angiotensin system. Plasma cortisol levels can range from undetectable to low-normal in the presence of high ACTH levels and do not respond to exogenous ACTH, demonstrating that patients are specifically resistant to ACTH [1]. FGD is a genetically heterogeneous disorder resulting from known mutations in MC2R (Melanocortin 2 receptor; MIM 607397), MRAP (MC2R accessory protein; MIM 609196), and MCM4 (Minichromosome maintenance 4; MIM 602638) [2,3]. Also mutations in STAR (Steroidogenic acute regulatory protein; MIM 600617) and CYP11A1 (Cytochrome P450, Family 11, Subfamily A, MIM 613743; MIM 118485) can sometimes present with a phenotype resembling FGD [4,5]. It has recently been reported that mutations in NNT (nicotinamide nucleotide transhydrogenase; MIM 607878) and TXNRD2 (Thioredoxin Reductase 2; MIM 606448) can also result in FGD [3,6]. While mutations in MC2R, MRAP, STAR, and MCM4 account for ~50% of FGD patients, NNT mutations are found in 5- 10% of FGD patients [3,7]. Mutations in TXNRD2, has been reported in only one Kashmiri family yet [2]. Mutations in MC2R, MRAP, STAR and NNT have been found in patients from different ethnic origins. In contrast, only one private mutation for MCM4 has been reported, in patients from an Irish travelling community [8,9] who presented with a more complex phenotype, including natural killer cell deficiency, a DNA repair disorder, and FGD (MIM 609981). When ACTH resistance is associated with alacrima and achalasia of the cardia resulting in dysphagia, this constitutes a separate condition known as triple A syndrome (AAAS or Allgrove syndrome, MIM 231550) which is caused by mutations in AAAS (MIM 605378). The NNT gene is located at chromosome 5p12. It consists of 22 exons and codes for a 1086 amino acid protein. NNT is a highly conserved enzyme integrated in the inner mitochondrial membrane. It comprises three domains: two mitochondrial matrix domains and one transmembrane domain (including 14 α- 52

Glucocorticoid deficiency and NNT mutations helixes) that spans the mitochondrial inner membrane (see Fig. 1). The mitochondrial matrix domains 1 and 2 contain the binding sites for NAD/NADH and NADP/NADPH, respectively, and protrude from the inner membrane into the mitochondrial matrix [10]. NNT utilizes the electrochemical proton gradient across the mitochondrial inner membrane to produce high concentrations of NADPH from NADH. NADPH is needed by glutathione peroxidase for the detoxification of reactive oxygen species (ROS) in mitochondria. Thus, NNT deficiency results in decreased NADPH production and impaired ROS detoxification [3,6]. Knockdown of NNT in a human adrenocortical cell line results in increased apoptosis and lowered glutathione peroxidase function. Lower levels of NNT thus make the 3 adrenal cortex more vulnerable to ROS damage caused by defective ROS detoxification. Here we report on a consanguineous Dutch family in whom a novel homozygous mutation in NNT led to FGD. We then summarize the pathogenic mutations reported in NNT thus far, their localization within the gene and protein domains, and their clinical effect.

Materials and methods Clinical report Our patient, the son of consanguineous Caucasian parents (his paternal great- grandmother was a sister of his maternal great-grandfather, see Figure 2A), was born in breech presentation, after an uneventful pregnancy, at a gestational age of 39 weeks. His birth weight was 3104 grams and birth length 48 cm. In the first postnatal months, he had feeding difficulties and vomited frequently. From the age of 4 months, his skin pigmentation increased. His psychomotor development was within the normal range. At the age of 12 months, he had a long-lasting seizure and a blood glucose level of 1.6 mmol/l. Physical examination revealed mild plagiocephaly, hyperpigmented skin and thin, light blonde hair. He cried with tears, excluding triple A syndrome. His height (78 cm; +0.56 SD), weight (9.7 kg; -1 SD) and head circumference (47.8 cm; +0.4 SD) were within the normal range. He had dark pigmentation of the nipples and external genitalia, a penile length of 5.2 cm (+1.8 SD) and normal testes (2 ml). Blood sampling for biochemical and endocrinological studies was performed in the morning and revealed normal sodium and potassium concentrations (Na 139 mmol/l, K 4.5 mmol/l), normal urine sodium and potassium levels (Na 17 mmol/l, N 33 mmol/l; measured in a normal hydration state), a low cortisol level (5 nmol/l, normal 50-800 nmol/l) and an elevated ACTH level (> 270 ng/l, normal < 46 ng/l). 53

Chapter 3

Figure 1: Schematic representation of the NNT protein. Amino acid positions and predicted protein domains have been adapted from UniProtKB database (Q13423). Purple circles represent the transit peptide. Blue circles represent NADH- and green circles NADPH-binding sites. Red circles indicate the amino acids where mutations have been reported. Truncating mutations are shown in red boxes and non-truncating mutations in black boxes. aPredicted result if exon skipped.

Serum 17-hydroxyprogesterone (0.6 nmol/l, normal 0,5-10 nmol/l) and androstenedione concentrations (0.1 nmol/l, normal < 1 nmol/l) were low-normal, while the plasma renin activity was within the normal range (3.9 nmol/l/h , normal 0.8-4 nmol/l/h). Urinary steroid analysis detected no tetrahydrocortisone and no abnormal steroid metabolites. No antibodies against the adrenal cortex could be detected and serum very long-chain fatty acids were normal. An abdominal ultrasound showed normal adrenal glands for the boy’s age. He was treated with hydrocortisone 13-15 mg/m2/day to normalize his serum ACTH concentrations. He became more active and had no more signs of hypoglycemia or seizures. At the start of this study, the NNT gene was not firmly associated with FGD and Sanger sequencing in the proband had failed to identify a pathogenic mutation in the MC2R and MRAP genes. Therefore, we performed a SNP analysis to identify regions of identity-by-descent, followed by exome sequencing with special attention paid to variants in the homozygous regions we had identified.

54

Glucocorticoid deficiency and NNT mutations

Molecular studies Genomic DNA was extracted from peripheral blood samples from the patient and his parents after informed consent was obtained. SNP array analysis was performed using the HumanCytoSNP-12 v2.1 DNA BeadChip (Illumina, San Diego, CA, USA) and analyzed using GenomeStudio Data Analysis Software and the cnvPartition v3.1.6 algorithm DNA samples were enriched using the SureSelect Human All Exon V2 Kit (Agilent Technologies, Inc., Santa Clara, CA, USA) according to the manufacturer’s instructions and sequenced on an Illumina HiSeq2000. Alignment to the human genome reference (UCSC, hg 19 version, build 37.1) was done using SOAP aligner [11]. Duplicated reads were removed and an average coverage of 3 71.85 was obtained for the target regions. The Soapsnp software [12] was used to assemble the consensus sequence and to call genotypes in target regions and flanking regions. For insertions or deletions (indels) in the target regions, reads were aligned to the reference genome via the Burrows-Wheeler Aligner [13] and indels were identified by Genome Analysis Toolkit [14]. Variants were annotated using a BGI internal pipeline (BGI, Shenzhen, China). We excluded synonymous variants or those in the 3’-UTR, 5’-UTR, intronic and intergenic regions. Under a recessive model, we retained homozygous variants and compound heterozygous variants in the patient. Finally, we excluded common variants (minor allele frequency ≥ 0.5% in both the 1000 Genomes Project and ESP6500). The homozygous variant we identified in the NNT gene was validated by amplification of the relevant part of exon 9 using 5’-GTGAACATAGGGTGATAGAC-3’ and 5’- TCAGCATAAAGCTGGCATAC-3’ primers and sequencing using standard Sanger sequencing protocols.

Review of the literature All NNT mutations and their clinical phenotypes reported in the literature up to October 12, 2015 were retrieved from PubMed using NNT [Title/Abstract] AND mutation [Title/Abstract] as a query. Subsequently, papers were selected describing patients with adrenal insufficiency due to NNT mutations. Additionally, the ClinVar database (see Web Links) was searched for pathogenic mutations in NNT. Fig. 1 shows the reported disease-causing mutations in NNT and their localization within the protein domains using information adapted from the UniProtKB database (Q13423). One of the reported mutations, c.1A>G, was re-analyzed using ORF Finder (see Web Links). This program predicted that the mutation affected NNT function (supplement figure A). As part of our review analysis, the age of disease onset among three groups with a different number of truncating alleles in NNT were compared with Kruskal– 55

Chapter 3

Wallis ANOVA. For the two sib-pairs [15] described in Table 1, their mean age of onset was used in this analysis. Truncating mutations were defined as mutations causing a premature termination of transcription, i.e nonsense and frameshift mutations, while missense mutations were classified as non-truncating. Splice site mutations were categorised depending on the predicted effect (see table 1).

Results Molecular studies Genome-wide SNP analysis did not reveal any putative disease-associated copy number variants in the patient. However, we identified 25.7 Mb (~0.8%) copy- neutral regions of homozygosity, including a 6.67 Mb region on chromosome 5 located at 39548842-46228333 bp, between SNP markers rs6870776 and rs7293466 (GRCh19/hg37), and harboring the NNT gene (Fig. 2B). Subsequent exome- sequencing revealed a novel homozygous frameshift mutation in NNT (NM_012343.3: c.1259dupG). This mutation was then confirmed by Sanger sequencing and both parents were found to be heterozygous carriers (Fig. 2C). The mutation is located in exon 9 and creates a frameshift starting at codon His421, leading to a premature stop-codon (p.His421Serfs*4). This particular mutation has not been annotated in dbSNP or mutation databases, including those of the 1000 Genomes Project, the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project, nor the Genome of the Netherlands (see Web Links).

Review of the literature Our literature review yielded 29 NNT disease-causing mutations in 23 patients with FGD, including our patient, two reported sib-pairs and one unrelated patient with a recurrent mutation, shown to be a founder mutation by Weinberg-Shukron et al. 2015). Fifteen patients from 14 families had homozygous mutations and eight patients from seven families had compound heterozygous mutations (Table 1). Remarkably, almost all mutations are unique and there is a preponderance of missense mutations: 17 missense, 7 frameshift (or predicted-to-be frameshift), and 5 nonsense mutations. For one mutation (c.1A>G; first patient in Table 1), a substitution at the initial methionine, the exact effect is unknown. Applying ORF Finder showed there is no ATG codon upstream of the one used normally in the primary NNT transcript, whereas the closest in-frame ATG codon is located down- stream at position c.394, and if used, would result in in-frame translation (supplement figure A). However, if this is the case, the first 131 amino acids of NNT, including the signal peptide (amino acids 1-43), will be missed. As this signal peptide is necessary for importing NNT into the mitochondrial membrane, its 56

Glucocorticoid deficiency and NNT mutations absence may result in no functional NNT in mitochondria. Moreover, the first translation-initiating methionine in NNT is an evolutionarily conserved amino acid among human, chimpanzee, rat, mouse, cow, chicken, frog and tetraodon (Alamut Interactive Biosoftware, supplement figure B). Therefore, we categorized the c.1A>G mutation as truncating. The 29 mutations are scattered over the gene. Figure 1 visualizes their location in the NNT protein. Twenty mutations are located in the mitochondrial matrix domains (the largest part of the protein), seven in the transmembrane regions and two in the transit peptide. None of the 13 mutations in mitochondrial matrix domain 1 are located at the NAD binding site, while two out of the five mutations 3 in mitochondrial matrix domain 2 are at the NADP binding site. The clinical features of all the reported patients, including ours, are summarized in Table 1. Information on the size of the adrenal gland was only known for three reported patients, all being normal like in our patient. In three of the reviewed families the combination of FGD with mineralocorticoid deficiency occurred. The sib-pair of a consanguineous Palestinian family has a homozygous NNT mutation (c.598 G>A, p.Gly200Ser). The same authors describe a non-related patient with an identical homozygous mutation and shared haplotype, indicating a founder mutation. Combined mineralocorticoid and glucocorticoid deficiency was documented in all three affected individuals [16]. Additionally, combined adrenal failure, precocious puberty and testicular adrenal rest tumor was reported in a patient with another homozygous NNT mutation (c.1163 G>A, p.Tyr388Ser) [17]. For 21 patients, belonging to 19 families, the age of disease presentation was known (Figure 3). The median age of onset was 12 months in the 19 families with a range of 3 days to 39 months in the 21 individual patients. In four out of the 16 families with isolated glycocorticoid deficiency, hyperpigmentation of the skin was reported as the first clinical presentation. The median age of onset in the six families with two truncating mutations was 9 months (range 4-18 months, n=7 patients), in the four families with one truncating and one non-truncating mutation it was 13.5 months (range 7-29 months, n=4 patients), and in the nine families with two non- truncating mutations it was 12 months (range 3 days to 39 months, n=10 patients) (figure 3). These differences are not statistically significant (p=0. 5147). Not surprisingly, the age of onset was earlier in those patients presenting with a combined adrenal insufficiency. However, if we exclude these patients from our analysis the differences in age of onset (9, 13.5 and 15.5 months, respectively) was still not significant (p=0.2161)(see Figure 3, dashed line).

57

Chapter 3

Figure 2: (A) Partial pedigree of our proband with FGD, demonstrating his consanguineous parents. (B) SNP array data showing a 6.67 Mb region of homozygosity in chromosome 5 at chr5: 39548842- 46228333 bounded by SNP markers rs6870776 and rs7293466 (GRCh19/hg37). (C) Sequence chromatograms of part of NNT exon 9 obtained by Sanger sequencing and showing a homozygous insertion in the proband. The parents both carry a heterozygous insertion at the same position.

58

Glucocorticoid deficiency and NNT mutations

3

59

Chapter 3

Levels Levels before

c

ee ee Fig. 1.);

retation retation original of authors

perpigmentation perpigmentation of the skin reported as first

.

The The position of each mutation is given as the

a

hy

e

deficiency. deficiency.

= = undetectable

Abbreviations: Abbreviations: hmz=homozygous, MM= molecular matrix,

levels levels measured after steroid withdrawal under intravenous fluid

Und

NNT protein domains according to UniProtKB database (Q13423),

predicted predicted effect not known;

d

predicted predicted

b b

and and mineralocorticoid

-

50 pg/ml;

-

ADP binding site (see Fig. 1.).

NM_012343.3; NM_012343.3;

plasma renin activity and aldosterone

g

have have combined gluco

the transcript

bold

between between two protein parts of the N

are sib pairs, Patients in

h h

predicted predicted result if exon skipped;

f

italics

number number of bases from the start codon of between square brackets: truncated transcipt, most likely resulting in nonsense mediated decay, for explanation of domains (s normaltreatment, reference values: for aldosteron renin and dependent on of patientage the and interp therefore assay used, is indicated (arrows), for cortisol >200 nmol/l, for ACTH 0 clinical feature; administration; domain, U=unknown, NADP=NADP TM=transmembrane binding N=normal, site, Patients in

60

Glucocorticoid deficiency and NNT mutations

Discussion We identified a novel frameshift mutation in NNT in a Dutch patient with FGD. The patient showed hypoglycemia and hyperpigmentation, similar to what is observed in other patients with a NNT mutation (see Table 1) [6]. DNA diagnostics for NNT was at that time not available in the Netherlands. However, the mutation could easily be identified thanks to the combination of SNP haplotyping and whole exome sequencing in this distantly-related family. Especially in clinical phenotypes with heterogeneous causes, whole exome sequencing has proven to be a very powerful diagnostic approach [18,19]. The question is whether based on phenotype a clinical distinction can be 3 made between the different causes of FGD. Clinically, cortisol deficiencies caused by mutations in MCM4 and AAAS are not pure FGD syndromes. They both have additional features: natural killer cell deficiency and short stature in the MCM4- related syndrome, and alacrima with achalasia in the AAAS-related syndrome. Patients with a disease-causing mutation in STAR usually present with lipoid congenital adrenal hyperplasia (LCAH, MIM 201701), which is a more severe phenotype than FGD. These patients have adrenal and gonadal insufficiency, with high ACTH and renin levels, and low cortisol and aldosterone levels. As a consequence, a failure of androgenization occurs in patients with a 46,XY karyotype, resulting in complete or partial sex reversal. Atypical or mild LCAH may present as FGD [4,20]. Similar to nonclassic LCAH, a partial defect in CYP11A1 may also lead to misdiagnosis of FGD [5,21]. FGD due to MRAP and MC2R mutations is clinically indistinguishable from FGD due to NNT mutations, although it has been suggested, but not yet confirmed, that obesity may be part of the phenotype in MRAP mutations [22]. Obesity was not reported in the patients with NNT or MC2R mutations. Chung et al. [23] showed that MRAP-related FGD (n=22) usually has an earlier age of onset (median 1 month; range birth – 1.6 years) than FGD due to MC2R mutations (n=40, median 2 years, range 1 week – 16 years). In the 21 patients with NNT mutations and an available age of onset presented here, the median age of onset was 12 months (range 3 days – 3.25 years), thus falling in-between the MRAP- and MC2R-related phenotypes.

61

Chapter 3

Type of mutations in NNT-related FGD Following the observed 31% residual NNT activity in a Japanese patient due to a homozygous non-truncating mutation (c.644 T>C; F215S) we studied the relationship between age of disease onset and type of NNT mutations. Since truncating mutations are most likely to result in no protein being produced at all, it was not surprising that they were located throughout the NNT gene. The NNT non-truncating mutations, however, were often located in or near important domains that were likely to affect the function of the NNT enzyme (four out of 14 were near binding sites and four were within the transmembrane domain). However, the age of disease onset between truncating and non-truncating NNT mutations was not statistically significant (Figure 3) and it was not possible to determine a reliable correlation between the type of NNT mutations and cortisol or ACTH levels. Although the observed residual enzyme activity in the Japanese patient who carried a homozygous non-truncating mutation may support the idea that non- truncating mutations delay the age of disease presentation, we cannot extend this finding to all non-truncating mutations. A possible explanation for this is that the amount of residual NNT activity might be highly dependent of the location of the mutation within the protein. The age of disease onset may also depend on additional comorbidity, such as the combined mineralocorticoid insufficiency, usually resulting in symptoms occurring earlier in life, as illustrated in Figure 3. The missense mutations resulting in mineralocoritocoid deficiency were located in the mitochondrial matrix I domain and both resulted in a Serine amino acid. However, a similarly located missense mutation resulting in a Serine residue was found in a patient without mineralocoritocoid deficiency (age of onset 19 months). A limiting factor of our study is the small patient cohort and larger groups of patients are needed to investigate whether the type of NNT mutation indeed is correlated with age of disease onset.

Phenotypic heterogeneity Among the reviewed patients are four patients from three families with two different homozygous missense mutations and the combination of mineralocorticoid and glucocorticoid deficiency [16-17]. Mineralocorticoid deficiency was excluded in our patient (plasma renin and sodium were last checked at the age of 6 years) and not reported in any of the other patients.

62

Glucocorticoid deficiency and NNT mutations

3 Figure 3: NNT mutations and age of disease presentation. Age of onset of disease in patients with (AA) two truncating mutations (●), (AB) one truncating and one non-truncating muation (■), and (BB) two non-truncating mutations (▲). The solid symbols represent patients with isolated glucocorticoid deficiency, while the open symbols (∆) are patients with combined gluco- and mineralocorticoid deficiency. The solid horizontal line represents the median age in each group (for sib pairs the mean age was used when calculating the median age of onset). In column BB, the dashed line represents the median age when patients with combined adrenal deficiency were excluded from the analysis. The difference in age of presentation between groups (AA, AB and BB) were not significant according to Kruskal-Wallis test with and without combined mineralocorticoid and glucocorticoid deficiency; the p-values were 0.51 and 0.22, respectively. in any of the other patients (see Table 1). As explained above, we could not identify a genetic explanation for this broader adrenal phenotype, e.g. the missense mutations were not located at a specific functional domain. The patient reported by Hershkovitz also had a testicular adrenal rest tumor in combination with testicular enlargement, precocious virilization and skin hyperpigmentation. The symptoms regressed after intensification of glucocorticoid treatment. It should be noted that chronic elevation of ACTH may result in adrenal rest tumors, which may regress when glucocorticoid therapy is intensified. Although this is not specific for FGD, it is worth mentioning and has implications for the surveillance of patients with an NNT mutation. From recent reports it can be concluded that the phenotypic spectrum of NNT mutation is not restricted to FGD but may also include combined adrenal insufficiency and extra-adrenal manifestations like gonadal adrenal rest tumors. Moreover, an association with left ventricular noncompaction (LVNC) was recently reported for heterozygous NNT loss of function mutations [24]. The authors identified a single allele NNT mutation in two patients presenting with LVNC, without adrenal manifestations. They showed that suppression of Nnt in zebrafish caused early ventricular malformation and contractility defects. However, cardiac problems were not described in the patients listed in Table 1 who had biallelic mutations in NNT. Our patient had no features of a cardiac disease, a good physical 63

Chapter 3 condition and a normal heart contour on an X-ray of the thorax at the age of 5 years. No cardiac disease has been documented in the families of the reviewed patients, however under-reporting is very likely. More clinical studies are needed to investigate the prevalence of LVNC in heterozygous carriers of an NNT mutation. Altogether, these recent findings in NNT mutation patients expand our knowledge of the phenotypic spectrum of NNT mutations and this has implications for the surveillance of these patients. They should be closely monitored for combined adrenal insufficiency and extra-adrenal manifestations, e.g. precocious puberty, adrenal rest tumors, possibly cardiac manifestations and other symptoms we still may not be aware of being associated with NNT mutations. The diagnosis of FGD due to NNT mutations is also important for the family, to check newborns at an early age, preventing serious illness or death due to an Addisonian crise, or preventing cerebral damage due to hypoglycemia and hypotension.

NNT pathomechanism in FGD Although NNT is widely expressed in humans – with its expression most readily detectable in adrenal, heart, kidney, thyroid and adipose tissues [6] – we do not know why NNT mutations apparently only affect adrenocortical cells. Production of ROS within mitochondria is one of the major internal triggers for apoptotic cell death [25]. Mice carrying Nnt mutations show slightly disorganized zona fasciculata with higher levels of apoptosis than wild-type mice [6]. As Yamaguchi et al. suggested, the adrenocortical cells of the zona fasciculata that produce a large amount of cortisol may be specifically vulnerable to oxidative stress caused by impaired redox potential and increased ROS [7]. Weinberg-Shukron et al. were able to provide patient-based evidence that NNT mutations indeed increase cellular oxidative stress and impair mitochondrial function and morphology [16]. They observed that ROS levels were 40% higher in NNT_p.Gly200Ser homozygous fibroblasts compared with control fibroblasts. They also found a significant reduction in ATP content (25% less than in healthy control). Additionally, in 50% of the patients’ cells the mitochondria had a pathological punctate appearance which is known to be related to various apoptotic stimuli [26]. It is possible that functional compensation by overlapping antioxidant defense mechanisms protects other cell types or tissues in patients with NNT mutations [27].

Conclusion We have identified a novel NNT mutation in a Dutch patient with FGD, using SNP haplotyping and exome sequencing, an efficient approach in the heterogenous and 64

Glucocorticoid deficiency and NNT mutations clinically not easily distinguishable group of FGD disorders. We also synthesized the, still limited, data on FGD caused by NNT mutations to date and noticed a recent broadening of the phenotype. As a consequence, patients who carry NNT mutations should be closely monitored for likely extra-adrenal manifestations. We showed that the type of mutation is not apparently correlated with age of disease onset. However, larger studies are needed to further investigate this.

Web Links 1000 Genomes Project: http://www.1000genomes.org/ ClinVar database: http://www.ncbi.nlm.nih.gov/clinvar/ 3 NHLBI Exome Sequencing Project (ESP) Exome Variant Server (ESP6500SI-V2- SSA137): http://evs.gs.washington.edu/EVS/ Uniprot: http://www.uniprot.org/ ORF Finder: http://www.ncbi.nlm.nih.gov/gorf/gorf.html Alamut Interactive Biosoftware: http://www.interactive-biosoftware.com/ Genome of the Netherlands: http://www.genoomvannederland.nl/

65

Chapter 3

References 1. Clark AJ, Chan LF, Chung T-T, Metherell LA (2009) The genetics of familial glucocorticoid deficiency. Best Practice & Research Clinical Endocrinology & Metabolism 23: 159–165. 2. Prasad R, Chan LF, Hughes CR, Kaski JP, Kowalczyk JC, et al. (2014) Thioredoxin reductase 2 (TXNRD2) mutation associated with familial glucocorticoid deficiency (FGD). The Journal of Clinical Endocrinology & Metabolism 99: 1556–1563. 3. Meimaridou E, Hughes CR, Kowalczyk J, Guasti L, Paul Chapple J, et al. (2013) Familial glucocorticoid deficiency: new genes and mechanisms. Molecular and cellular endocrinology 371: 195–200. 4. Metherell LA, Naville D, Halaby G, Begeot M, Huebner A, et al. (2009) Nonclassic lipoid congenital adrenal hyperplasia masquerading as familial glucocorticoid deficiency. The Journal of Clinical Endocrinology & Metabolism 94: 3865–3871. 5. Sahakitrungruang T, Tee MK, Blackett PR, Miller WL (2010) Partial defect in the cholesterol side-chain cleavage enzyme P450scc (CYP11A1) resembling nonclassic congenital lipoid adrenal hyperplasia. The Journal of Clinical Endocrinology & Metabolism 96: 792–798. 6. Meimaridou E, Kowalczyk J, Guasti L, Hughes CR, Wagner F, et al. (2012) Mutations in NNT encoding nicotinamide nucleotide transhydrogenase cause familial glucocorticoid deficiency. Nature genetics 44: 740–742. 7. Yamaguchi R, Kato F, Hasegawa T, Katsumata N, Fukami M, et al. (2013) A novel homozygous mutation of the nicotinamide nucleotide transhydrogenase gene in a Japanese patient with familial glucocorticoid deficiency. Endocrine journal 60: 855–859. 8. Casey JP, Nobbs M, McGettigan P, Lynch S, Ennis S (2012) Recessive mutations in MCM4/PRKDC cause a novel syndrome involving a primary immunodeficiency and a disorder of DNA repair. Journal of medical genetics 49: 242–245. 9. Hughes CR, Guasti L, Meimaridou E, Chuang C-H, Schimenti JC, et al. (2012) MCM4 mutation causes adrenal failure, short stature, and natural killer cell deficiency in humans. Journal of Clinical Investigation 122: 814–820. 10. White SA, Peake SJ, McSweeney S, Leonard G, Cotton NP, et al. (2000) The high-resolution structure of the NADP (H)-binding component (dIII) of proton-translocating transhydrogenase from human heart mitochondria. Structure 8: 1–12. 11. Li R, Yu C, Li Y, Lam T-W, Yiu S-M, et al. (2009) SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25: 1966–1967. 12. Li R, Li Y, Fang X, Yang H, Wang J, et al. (2009) SNP detection for massively parallel whole-genome resequencing. Genome research 19: 1124–1132. 13. Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows- Wheeler transform. Bioinformatics 26: 589–595.

66

Glucocorticoid deficiency and NNT mutations

14. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, et al. (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next- generation DNA sequencing data. Genome research 20: 1297–1303. 15. Novoselova TV, Rath SR, Carpenter K, Pachter N, Dickinson JE, et al. (2015) NNT Pseudoexon Activation as a Novel Mechanism for Disease in Two Siblings with Familial Glucocorticoid Deficiency. The Journal of Clinical Endocrinology & Metabolism 100: 350–354. 16. Weinberg-Shukron A, Abu-Libdeh A, Zhadeh F, Carmel L, Kogot-Levin A, et al. (2015) Combined mineralocorticoid and glucocorticoid deficiency is caused by a novel founder nicotinamide nucleotide transhydrogenase mutation that alters mitochondrial morphology and increases oxidative stress. Journal of medical genetics 52: 636–641. 3 17. Hershkovitz E, Arafat M, Loewenthal N, Haim A, Parvari R (2015) Combined adrenal failure and testicular adrenal rest tumor in a patient with nicotinamide nucleotide transhydrogenase deficiency. Journal of Pediatric Endocrinology and Metabolism 28: 1187–1190. 18. Rabbani B, Tekin M, Mahdieh N (2014) The promise of whole-exome sequencing in medical genetics. Journal of human genetics 59: 5–15. 19. Yang Y, Muzny DM, Reid JG, Bainbridge MN, Willis A, et al. (2013) Clinical whole-exome sequencing for the diagnosis of mendelian disorders. New England Journal of Medicine 369: 1502–1511. 20. Baker BY, Lin L, Kim CJ, Raza J, Smith CP, et al. (2006) Nonclassic congenital lipoid adrenal hyperplasia: a new disorder of the steroidogenic acute regulatory protein with very late presentation and normal male genitalia. The Journal of Clinical Endocrinology & Metabolism 91: 4781–4785. 21. Parajes S, Kamrath C, Rose IT, Taylor AE, Mooij CF, et al. (2011) A novel entity of clinically isolated adrenal insufficiency caused by a partially inactivating mutation of the gene encoding for P450 side chain cleavage enzyme (CYP11A1). The Journal of Clinical Endocrinology & Metabolism 96: 1798–1806. 22. Rumie H, Metherell L, Clark A, Beauloye V, Maes M (2007) Clinical and biological phenotype of a patient with familial glucocorticoid deficiency type 2 caused by a mutation of melanocortin 2 receptor accessory protein. European Journal of Endocrinology 157: 539–542. 23. Chung T-TL, Chan LF, Metherell LA, Clark AJ (2010) Phenotypic characteristics of familial glucocorticoid deficiency (FGD) type 1 and 2. Clinical Endocrinology 72: 589–594. 24. Bainbridge MN, Davis EE, Choi W-Y, Dickson A, Martinez HR, et al. (2015) Loss of Function Mutations in NNT Are Associated with Left Ventricular Noncompaction. Circulation Cardiovascular Genetics: 544–552. 25. Bras M, Queenan B, Susin S (2005) Programmed cell death via mitochondria: different modes of dying. Biochemistry 70: 231–239.

67

Chapter 3

26. Frank S, Gaume B, Bergmann-Leitner ES, Leitner WW, Robert EG, et al. (2001) The role of dynamin-related protein 1, a mediator of mitochondrial fission, in apoptosis. Devopmental Cell 1: 515–525. 27. Prasad R, Metherell L, Clark A, Storr H (2013) Deficiency of ALADIN impairs redox homeostasis in human adrenal cells and inhibits steroidogenesis. Endocrinology 154: 3209–3218.

68

Glucocorticoid deficiency and NNT mutations

Chapter 3

3

Supplementary materials

69

Chapter 3

Supplement figure A: ORF Finder result for NNT transcript (NM_012343.3). The yellow boxes display first ATG start codon and second in-frame potential ATG, respectively. The red boxes represent three out-of-frame potential translational start sites which might result in non-functional NNT. The green box shows a transit peptide (amino acids 1-43).

Supplement figure B: Evolutionary conservation of the M1 residue translation-initiating methionine in NNT is outlined in red (extracted from Alamut Interactive Biosoftware).

70

Chapter 4

Identification of novel candidate genes for L1 syndrome by X-exome sequencing.

Omid Jazayeri, Krista K. Van Dijk-Bos, Birgit Sikkema-Raddatz, Yvonne J. Vos, Richard J. Sinke

University of Groningen, University Medical Centre Groningen, Department of Genetics, Groningen, the Netherlands

Novel candidate genes for L1 syndrome

Abstract L1 syndrome is an X-linked recessive disease caused by mutations in the L1CAM gene. It is characterized by hydrocephalus, aqueduct stenosis, adducted thumbs, spastic paraplegia, intellectual disability, agenesis/dysgenesis of the corpus callosum and aphasia. Although the causality of L1CAM gene in L1 syndrome is well- established, and more than 200 different pathogenic L1CAM mutations have been linked to this disorder, the number of unsolved cases remains high. Since 95% of all L1 syndrome patients (with and without an L1CAM mutation) are males and familial cases mostly show an X-linked recessive inheritance, we focused on the X chromosome to find new candidate gene(s). Accordingly, we performed X-exome sequencing in a cohort of 58 individuals who are suspected to have L1 syndrome but do not carry mutations in the L1CAM gene. 4 We identified mutations in 7 patients implying five genes as possible candidates for L1 syndrome: DACH2, TENM1, AMER1, PQBP1 and CXorf38. Three previously identified independent mutations in DACH2 imply this gene as the most promising novel candidate gene for L1 syndrome. PQBP1 and AMER1 have already been linked to known diseases whose clinical spectrum overlap L1 syndrome. Although a single gene can be associated with more than one disease, differences in characteristic features in L1 syndrome suggest that a common genetic basis for these disorders may be less likely. An initial follow-up analysis of these five genes in an additional 24 patients did not reveal extra mutations indicating that, if causative, these genes would probably explain only a small proportion of L1 syndrome patients. Functional studies are necessary to firmly establish the causative association of either of the candidate genes.

73

Chapter 4

Introduction L1 syndrome is an X-linked recessive neurological disorder with clinical pleiotropic abnormalities. It comprises four clinically distinct but overlapping neurological syndromes [1–3]. The first and most severe form is X-linked hydrocephalus, also referred to as hydrocephalus, due to stenosis of the aqueduct of Sylvius (HSAS; MIM 307000) [4]. In addition to congenital hydrocephalus, several other phenotypic features including adducted thumbs, intellectual disability and lower limb spasticity have been documented in this syndrome. Neonatal or infant death has been also documented for this subtype [3]. The second form is MASA syndrome (MIM 303350), a milder form of the disease characterized by light to moderate intellectual disability, aphasia, shuffling gait and adducted thumbs [5]. Variable phenotypic presentation has been reported in this form of disease even for carriers of the same mutation in a single family [6]. The other two forms, which occur much less frequently, are X-linked complicated hereditary spastic paraplegia type 1 (SPG1; MIM 303350) and (X-linked complicated corpus callosum agenesis (X-linked ACC; MIM 304100). These complicated forms may also present with mild to moderate intellectual disability [4]. To date, there is only a single gene known to cause L1 syndrome, L1CAM. The L1CAM gene is located on the X-chromosome at Xq28 and consists of 29 exons which encode a protein of 1,257 amino acids, comprising a signal peptide of 19 amino acids and a functional L1 protein of 1,238 amino acids [4]. This protein, the neuronal cell adhesion molecule L1, is a transmembrane glycoprotein, belongs to the immunoglobulin superfamily and is essential in the development of the nervous system [2]. Although the causality of L1CAM in L1 syndrome is well-known and more than 200 different pathogenic L1CAM mutations have been linked to this disorder [4], many unsolved cases remain. Vos et al. (2010) reported a diagnostic yield of 22.8% [7] from sequencing the coding regions of the L1CAM gene in a cohort of 320 patients with L1 syndrome, and causative mutations were identified in 73 patients. We cannot exclude the possibility of more than one gene being involved in the development of L1 syndrome in humans. To test this hypothesis we have focused on the X chromosome, since 95% of all L1 syndrome patients (with and without an L1CAM mutation) are males and familial cases mostly show an X-linked recessive inheritance [8]. Additionally, in a cohort of 98 patients with MASA syndrome, 94% of patients were males [3]. In the same study, 70 patients (100%) with HSAS (a severe form of L1 syndrome) were males. Accordingly, we set out to perform an X-exome analysis to allow a detailed analysis of all X-chromosomal 74

Novel candidate genes for L1 syndrome genes in a cohort of L1 syndrome patients for whom the L1CAM mutation analysis was negative.

Materials and Methods 1. Study subjects We selected a cohort of L1 syndrome patients who were referred for L1CAM analysis. All patients presented with hydrocephalus and at least one additional common phenotypic feature: macrocephaly, adducted thumb and/or agenesis of the corpus callosum. The patients are mainly male, but also included a few females diagnosed with L1 syndrome. We included these because female L1 syndrome patients with mutations in the L1CAM gene have also been reported [9,10]. Previous mutation analysis of the coding regions of L1CAM was negative in all patients. Initially, 58 patients were subject to X-exome sequencing. Subsequently, an 4 additional 24 patients were tested for targeted analysis of the most interesting candidates. All patients or their parents gave consent for diagnostic testing and subsequent use of DNA to improve diagnostic testing procedures.

2. X-exome sequencing 2.1. X-exome kit design and library preparation We designed an X-exome custom Agilent SureSelect Enrichment kit using SureDesign (https://earray.chem.agilent.com/suredesign/). Library preparation was performed according to manufacturer’s instructions (Agilent Technologies, Inc., Santa Clara, CA, USA) and carried out on a BRAVO workstation (Agilent Technologies, Inc.). In brief, 3 µg of genomic DNA were randomly fragmented with ultrasound using a Covaris® instrument (Covaris, Inc., Woburn, MA, USA). Fragments with an average insert size of 250 bp were selected using AMPure Xp beads (Beckman Coulter, Inc.). Adapters were ligated to both ends of the resulting fragments and amplified by PCR (98o C 2 min; 12 cycles of 98oC 30s, 65oC 30s, 72oC 1 min; 72oC 10 min). The quality of the product after size selection and amplification was verified by capillary electrophoresis on the Tapestation 2200 (Agilent Technologies, Inc.). Of this product, 750 ng was subsequently hybridized to the SureSelect XT Custom capture library (S0692382, Agilent Technologies, Inc.). During subsequent enrichment with PCR (98oC 2 min; 14 cycles of 98oC 30s, 57oC 30s, 72oC 1 min; 72oC 10 min), individual samples were bar-coded and the quality of the product was again verified on the Tapestation 2200 instrument. Finally, 24 samples were pooled and subjected to 2x100 base pair reads

75

Chapter 4 paired-end sequencing on an Illumina® HiSeq2500. Resulting image files were processed including demultiplexing using standard Illumina® base calling software.

2.2. Data processing Sequencing data were aligned to the human reference genome build 37 [11], using Burrows-Wheeler Aligner [12]. Subsequently duplicate reads were marked. Using the Genome Analysis Toolkit (GATK) [13], re-alignment around insertions and deletions detected in the sequence data and in the 1000 Genomes Project pilot [11] was performed, followed by base quality score recalibration. During the entire process, the quality of the data was assessed by performing Picard (see Web Links), GATK Coverage and custom scripts. SNP and indel discovery was done using GATK Unified Genotyper and Pindel [14], respectively, followed by annotation using SnpEff [15]. This production pipeline was implemented using the MOLGENIS compute [16] platform for job generation, execution and monitoring. For each sample, a vcf file was generated that included all variants.

2.3. Data filtering Figure 1 displays a flow diagram of our filtering strategy, and each step described in detail below. For data filtering and interpretation, we uploaded the vcf files into Cartagenia NGS bench 4.1.2 (Cartagenia, Leuven, Belgium) and a cohort analysis was performed. As an in-house control, we used sequencing data from four male samples with a completely different phenotype for whom a pathogenic variant in an autosomal gene was detected and two healthy males.

2.3.1. Filtering based on in-house control and frequency in our patient cohort We filtered out the variants present in in-house controls or ≥5x in our patient cohort. With this filtering step sequencing artefacts, i.e. systematic errors caused by our procedures and pipelines, are effectively removed.

2.3.2. Filtering based on allele frequencies In this step of our filtering strategy we removed common, possibly benign, variants annotated in the 1000 Genomes project (see Web Links), ESP6500 (see Web Links), ExAC (see Web Links) and db SNP (see Web Links). All variants with allele frequencies ≥1% for all ethnicity groups were excluded from further analysis. Afterward, the variants were further filtered according to allele frequency in European ethnicities. Indeed, the variants with allele frequency > 1% were excluded as possibly benign in the European population including the European American

76

Novel candidate genes for L1 syndrome ethnicity (ESP6500), the non-Finnish European ethnicity (ExAC) and the European ethnicity (1000 Genomes phase 1 and GoNL-V4 (see Web Links); [17]).

2.3.3. Filtering based on variant position in genome In this step the variants located in exonic regions or ± 5 bps around exon-intron boundary were selected for further evaluation.

2.3.4. Filtering based on impact on protein Next, the remaining variants were filtered according to predicted loss-of-function effects. Frameshift, start-loss, stop-gain, stop-loss and nonsynonymous missense mutations were selected for down-stream interpretation. Putative splice site mutations detected in the 5 bps exon-flanking intronic regions were also included. Nonsynonymous variants were filtered out based on CADD score (see Box 1). The 4 CADD score is a relatively new selection tool in variant interpretation pipelines. It is becoming more and more evident that genes have different, specific thresholds [18] and thus a gene-specific calibration should be carried out to determine the optimal threshold per gene. Since such specific threshold values are not (yet) available, we decided to consider variant(s) with CADD score ≥ 15 as potentially damaging variants. This cut off value has already been applied in other studies [19,20]. It is worth noting that we considered all truncating variants regardless of their CADD scores.

2.3.5. Variant interpretation and selection of candidate genes Using Alamut software (Alamut, Rouen, France), we checked the impact of chosen variants on protein and also their localization in protein domains. The effect of splice site variants on splicing process was also predicted by Alamut software. Predicted deleterious variants were explored for additional clues to provide evidence that these variants may be in genes that can be considered as likely candidate genes for L1 syndrome. Therefore we prepared a general summary of biological processes in which L1CAM is involved (see Box 2). Additionally, the auxiliary information for each candidate gene has been summarized in Table 2.

3. Validation of mutations by Sanger sequencing Candidate pathogenic variants were validated by Sanger sequencing. Sequencing analysis was carried out using primers for all exons in which likely pathogenic mutations were identified upon X-exome sequencing (primer sequences are available upon request). Forward primers were designed with a PT1 tail (5’- TGTAAAACGACGGCCAGT-3’) and reverse primers with a PT2 tail (5’- 77

Chapter 4

CAGGAAACAGCTATGACC-3’). PCR was performed in a total volume of 10 µl containing 5 µl AmpliTaq Gold ®Fast PCR Master Mix (Applied Biosystems, Foster City, CA, USA), 1.5µl of each primer with a concentration of 0.5 pmol/μl (Eurogentec, Serian, Belgium) and 2 µl genomic DNA in a concentration of 40 ng/μl. Samples were PCR amplified and sequenced according to routine protocols.

4. Follow-up Screening of candidate genes In this step five candidate genes were analysed in 24 additional patients affected with L1 syndrome.

Figure 1: Flow diagram of variant selection strategy.

78

Novel candidate genes for L1 syndrome

Box 1: Glossary CADD: Combined Annotation–Dependent Depletion (CADD) scores, a method for objectively integrating many diverse annotations into a single measure (C score) for each variant. CADD integrates several well-known tools such as PolyPhen, SIFT and GERP.

GeneCards: The Human Gene Database is an integrative database that provides comprehensive user-friendly information on all annotated and predicted human genes. It automatically integrates gene-centric data from >100 web sources, including genomic, transcriptomic, proteomic, genetic, clinical and functional information. 4

Mou se Genome Informatics (MGI): MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic and biological data to facilitate the study of human health and disease.

Box 2: An overview of biological processes in which L1CAM is involved

1 Axon outgrowth and pathfinding [46] 2 Axon fasciculation (bundling) [47]

3 Axonal survival [48] 4 Neuronal migration [49] 5 Adhesion between neurons and between neurons and Schwann cells [50,51] 6 Regeneration of damaged nerve tissue [52] 7 Involvement in integrin-mediated cell–cell and cell–matrix interactions [53] 8 Synapse formation [54]

9 Involvement in intracellular second messenger systems [2] 10 Regulates topographic mapping of retinal axon [55]

79

Chapter 4

Results We generated X-exome data including the L1CAM gene for 58 L1 syndrome patients. On average 9,825,718 reads per patient were produced. The mean coverage per target was 74 reads with > 93% covered more than 20 times. The fraction of bases on target was on average 27%. The cohort analysis by Cartagenia NGS bench 4.1.2 (Cartagenia, Leuven, Belgium) along with 6 in-house controls resulted in 9364 variants. For L1CAM the coverage was > 20X in 97% of the gene and we did not detect mutations that were missed in previous Sanger sequencing analyses [7]. Next, stepwise-filtering finally resulted in seven confirmed mutations, including one nonsense and six missense mutations in five different genes (CXorf38, TENM1, PQBP1, DACH2 and AMER1) (Table 1). One mutation could not be confirmed by Sanger sequencing and the variant’s raw data showed that the calling quality was just above threshold, and we therefore re-classified this variant as a false-positive. Four variants are novel, not previously observed or reported in any database to our knowledge, and three have been reported with very low allele frequencies in the ExAC database (Table 1). Auxiliary information for each candidate gene has been summarized in Table 2. Two genes (PQBP1 and AMER1) have already been documented to be associated with diseases that show clinical overlap with L1 syndrome phenotypic features. Interestingly, we observed nonsynonymous mutations in DACH2 in three patients. No additional putative mutations could be detected in these five candidate genes when screening 24 additional patients with L1 syndrome.

Discussion This study set out to identify novel causal X-chromosomal genes in a panel of 58 L1 syndrome patients for whom the L1CAM mutation analysis was negative. Since 95% of all L1 syndrome patients (with and without an L1CAM mutation) are males, and familial cases mostly show an X-linked recessive inheritance, we hypothesized that in addition to L1CAM, other X-chromosomal genes are likely to be involved in this disorder. Upon X-exome sequencing we identified mutations in five novel candidate genes.

80

Novel candidate genes for L1 syndrome

4

81

Chapter 4

82

Novel candidate genes for L1 syndrome

AMER1 (also known as WTX) We identified a missense mutation (c.2689G>A, p.Asp897Asn) in the AMER1 gene in one female patient. APC Membrane Recruitment Protein 1, AMER1, is a protein with 1135 amino acids. Germline inactivation of the AMER1 gene leads to Osteopathia striata with cranial sclerosis (MIM 300647), a rare X-linked dominant disorder which exhibits overlapping features with L1 syndrome such as macrocephaly, hydrocephalus, speech delay, hypotonia, mild intellectual disability and partial agenesis of corpus callosum. AMER1 ablation in early mesenchymal progenitors causes aberrant osteoblastogenesis leading to bone overgrowth in mouse [22]. It might be an explanation for observed craniofacial dysmorphism (frontal and occipital bossing and macrocephaly) in Osteopathia striata with cranial sclerosis (OSCS; MIM 300647). Hearing loss is the most common neurological manifestation (46%) and overall 20% of patients present defects of the central 4 nervous system (ventricular dilatation, abnormal gyration, or agenesis of corpus callosum) and developmental delay. Variable phenotypic presentations have been reported even within a family [23,24]. Hirschsprung disease and Laryngotracheomalacia have been also observed in some patients [25]. Hemizygous males are usually more severely affected than heterozygous females. In males, the disorder is usually associated with fetal or neonatal lethality [26]. To date only truncating mutation in WTX are known to cause OCSC [27]. Since it is known that different mutations and different inheritance types may result in disorders with distinct clinical entities, we cannot exclude that missense mutations in this gene cause a disorder with an L1 syndrome phenotype and X-linked recessive inheritance.

PQBP1 We identified a missense mutation in PQBP1 (c.91G>C, p.Glu31Gln) located in exon 2 and affecting an evolutionary strongly conserved amino acid. PQBP1 encodes a polyglutamine-binding protein, PQBP1, whose function is not well understood but is thought to play a role in mRNA splicing regulation and mRNA translation. PQBP1 mediates neural development by regulating mRNA splicing. [28,29]. Defects in PQPB1 lead to Renpenning syndrome (MIM 309500), the common features of which include variable intellectual disability, microcephaly, craniofacial dysmorphism, velar dysfunction, lean body habitus, sparse hair, short stature, selective muscular atrophy and genital anomalies [30,31]. Strabismus, spasticity and intellectual disability are the clinical manifestations in Renpenning syndrome which show overlap with the phenotypic features of L1 syndrome. Microcephaly has been reported in 89% (58/65) of patients who had mutations in 83

Chapter 4

PQBP1 genes, and 75% (64/85) of patients also presented with a lean body habitus [31]. The most common mutations in PQBP1 are deletions and duplications of AG dinucleotides within an (AG)6 tract in exon 4, which cause frameshift mutations [32]. Only one pathogenic missense mutation (NM_005710.2: c.194A>G, p.Tyr65Cys) has been reported [33]. This missense mutation maps within a functional region of the protein known as the WW domain (tryptophan-tryptophan domain) [34]. The WW domain is a domain composed of 38 amino acids, from amino acid 46 to 84. The identified missense mutation in our patient, p.Glu31Gln, does not belong to this domain. Probably this mutation causes a disorder with an L1 syndrome phenotype.

DACH2 Interestingly, we identified three missense mutations in DACH2 (c.1105G>A, p.Glu369Lys; c.1580G>A, p.Arg527Gln; c.514C>T, p.Arg172Cys) in three patients (two males and one female). To our knowledge the first two mutations are novel and have not been documented in literature or databases. The third mutation, p.Arg172Cys, has been observed with a very low allele frequency (0.004%) in the ExAC database. Mutations in DACH2 have been already reported in two brothers [35]. They presented a few phenotypic features similar to those seen in L1 syndrome including language disorder, intellectual disability and distinct manifestations like difficulty in standing and walking, and urinary and fecal incontinence. They also appeared to carry mutations in ABCD1 explaining the observed increased plasma levels of very- long-chain fatty acids. However, the similar phenotypic features for L1 syndrome in these patients might also be related to loss of function in DACH2. Additionally, it has been predicted that a putative Dach2-knockout mouse model may show abnormal axon guidance as was also observed in patients with L1 syndrome [36] (see Box 2 and Table 2). Altogether, the detection of three mutations in our cohort suggests DACH2 is an interesting novel candidate gene for L1 syndrome. The functional effect of the DACH2 mutation in our one female patient could perhaps be explained by the occurrence of non-random X-inactivation, i.e. if the X- chromosome carrying the wild-type DACH2 allele is preferentially inactivated.

84

Novel candidate genes for L1 syndrome

TENM1 (also known as ODZ1 or Teneurin-1) We identified a missense mutation (c.4211G>A, p.Arg1404Gln) in the TENM1 gene. The mutation affects a highly evolutionarily conserved amino acid. This gene has not yet been associated with human disease. Teneurins are a family of transmembrane glycoproteins conserved from Caenorhabditis elegans and Drosophila melanogaster to vertebrates [37]. Teneurin domain architecture is highly conserved between invertebrates and vertebrates. In vertebrates, Teneurins are expressed most prominently in the brain [38]. In the developing chick brain, TENM1 is expressed in many parts of brain including the retina, optic tectum, olfactory bulb and cerebellum. It is functionally involved in neurite outgrowth, axon guidance and synaptic connectivity in animal models [38,39]. The TENM1 intracellular domain was found to contain a nuclear localization signal that is required for its translocation to the nucleus [37]. Evidence 4 of TENM1 proteolytic cleavage and nuclear translocation of the intracellular domain has been reported in chick embryo fibroblasts [40]. Additionally, interaction of the teneurin-1 intracellular domain with MBD-1, a methyl-CpG-binding protein, suggests the signalling and/or transcriptional regulation role for teneurin-1 [38,40]. Altogether, the similar biological functions between TENM1 and L1CAM (see Box 2) implicate it as a potential candidate gene to further studies.

CXorf38 A nonsense mutation ( c.937C>T, p.Q313*) in CXorf38 has been observed in one patient. The mutation is located in last coding exon and predicts premature termination of protein translation 7 amino acids before normal stop codon. Both human CXorf38 and its mouse ortholog (1810030O07Rik) escape X-inactivation [41,42], therefore the inheritance pattern is expected to be similar to normal recessive inheritance in females. There is not much information available about this uncharacterized protein, hence, for now, supportive biological evidence is lacking which would connect CXorf38 to L1 syndrome.

Conclusion Our X-exome analysis revealed possible disease causing mutations in 7 out of 58 L1 syndrome patients. This finding implicates another five X-chromosomal genes as candidate genes for L1 syndrome. We consider DACH2 as most promising novel candidate gene for L1 syndrome based on the identification of nonsynonymous mutations in three patients from our cohort. Two candidate genes (AMER1 and PQBP1) have already been linked to known diseases whose clinical spectrum

85

Chapter 4 overlaps with that of L1 syndrome. Such overlap could be explained by the notion that different mutations in single gene may result in disorders with distinct clinical entities [43–45]. A common basis for L1 syndrome and Osteopathia stiata seems less likely because the latter is characterized by an X-linked dominant inheritance and frequent hearing loss (42% of patients). The high incidence of microcephaly (89%) and lean body habitus (75%) in Renpenning syndrome while these features have not been reported or are rare in individuals with L1 syndrome may indicate that a common basis of Renpenning syndrome and L1 syndrome is less likely. Still, there is a clear rationale for a systematic clinical re-evaluation mutation carriers and an extensive mutation analysis of these putative candidates (AMER1 and PQBP1) in many more patients. Functional studies are essential to firmly confirm pathogenicity of these putative candidate genes (DACH2, TENM1 and CXorf38) for L1 syndrome. However, even if functionally confirmed, our finding indicates that these genes will probably not explain a substantial proportion of L1 patients.

Acknowledgements We thank all patients and families for their physicians for participating in this study. We also thank Kate Mc Intyre and Jackie Senior for editing the manuscript.

Web Links Picard: http://picard.sourceforge.net MOLGENIS: http://www.molgenis.org/wiki/ComputeStart 1000 Genomes Project: http://www.1000genomes.org/ Genome of the Netherlands project: http://www.nlgenome.nl GeneCards: http://www.genecards.org/ GeneNetwork: http://genenetwork.nl:8080/GeneNetwork/ MIM: Mendelian inheritance in man number (http://omim.org) The Exome Aggregation Consortium (ExAC): http://exac.broadinstitute.org/ NHLBI Exome Sequencing Project (ESP) Exome Variant Server (ESP6500SI-V2- SSA137): http://evs.gs.washington.edu/EVS/ Mouse Genome Informatics (MGI): http://www.informatics.jax.org/ The Human Gene Mutation Database (HGMD): http://www.hgmd.cf.ac.uk/ac/index.php

86

Novel candidate genes for L1 syndrome

References 1. Schäfer MK, Nam Y-C, Moumen A, Keglowich L, Bouché E, et al. (2010) L1 syndrome mutations impair neuronal L1 function at different levels by divergent mechanisms. Neurobiology of disease 40: 222–237. 2. Fransen E, Van Camp G, Vits L, Willems PJ (1997) L1-associated diseases: clinical geneticists divide, molecular geneticists unite. Human molecular genetics 6: 1625–1632. 3. Schrander-Stumpel C, Hӧweler C, Jones M, Sommer A, Stevens C, et al. (1995) Spectrum of X-linked hydrocephalus (HSAS), MASA syndrome, and complicated spastic paraplegia (SPG1): Clinical review with six additional families. American journal of medical genetics 57: 107–116. 4. Vos YJ, Hofstra RM (2010) An updated and upgraded L1CAM mutation database. Human Mutation 31: 1102–1109. 5. Schrander-Stumpel C, Legius E, Fryns J-P, Cassiman J-J (1990) MASA syndrome: new clinical features and linkage analysis using DNA probes. 4 Journal of medical genetics 27: 688–692. 6. Fryns J-P, Spaepen A, Cassiman J-J, Van den Berghe H (1991) X linked complicated spastic paraplegia, MASA syndrome, and X linked hydrocephalus owing to congenital stenosis of the aqueduct of Sylvius: variable expression of the same mutation at Xq28. Journal of medical genetics 28: 429. 7. Vos YJ, de Walle HEK, Bos KK, Stegeman JA, Ten Berge AM, et al. (2010) Genotype-phenotype correlations in L1 syndrome: a guide for genetic counselling and mutation analysis. Journal of Medical Genetics 47: 169–175. 8. Vos YJ (2010) Genetics of L1 syndrome. PhD thesis: University of Groningen. 9. Ruiz JC, Cuppens H, Legius E, Fryns J-P, Glover T, et al. (1995) Mutations in L1-CAM in two families with X linked complicated spastic paraplegia, MASA syndrome, and HSAS. Journal of medical genetics 32: 549–552. 10. Kaepernick L, Legius E, Higgins J, Kapur S (1994) Clinical aspects of the MASA syndrome in a large family, including expressing females. Clinical genetics 45: 181–185. 11. Abecasis G, Altshuler D, Auton A, Brooks LD, Durbin RM and GRA, et al. (2010) A map of human genome variation from population-scale sequencing. Nature 467: 1061–1073. 12. Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows- Wheeler transform. Bioinformatics 26: 589–595. 13. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, et al. (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next- generation DNA sequencing data. Genome research 20: 1297–1303. 14. Ye K, Schulz MH, Long Q, Apweiler R, Ning Z (2009) Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25: 2865–2871. 15. Cingolani P and PA, Wang LL and CM and NT and WL and LSJ and RDM, Lu X (2012) A program for annotating and predicting the effects of single

87

Chapter 4

nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w; iso-2; iso-3. Fly (Austin) 6: 1–13. 16. Byelas H, Dijkstra M, Neerincx PB, Van Dijk F, Kanterakis A, et al. (2013) Scaling Bio-Analyses from Computational Clusters to Grids. IWSG. Zurich, Switzerland: Proceedings of the 5th International Workshop on Science Gateways. p. ISSN: 1613–0073. 17. Francioli LC, Menelaou A, Pulit SL, van Dijk F, Palamara PF, et al. (2014) Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nature Genetics 46: 818–825. 18. Velde KJ, Kuiper J, Thompson BA, Plazzer J-P, Valkenhoef G, et al. (2015) Evaluation of CADD scores in curated mismatch repair gene variants yields a model for clinical validation and prioritization. Human mutation 36: 712–719. 19. Santos-Cortez RLP, Chiong CM, Reyes-Quintos MRT, Tantoco MLC, Wang X, et al. (2015) Rare A2ML1 variants confer susceptibility to otitis media. Nature genetics 47: 917–920. 20. Li J, Meeks H, Feng B-J, Healey S, Thorne H, et al. (2016) Targeted massively parallel sequencing of a panel of putative breast cancer susceptibility genes in a large cohort of multiple-case breast and ovarian cancer families. Journal of medical genetics 53: 34–42. 21. Comai G, Boutet A, Neirijnck Y, Schedl A (2010) Expression patterns of the Wtx/Amer gene family during mouse embryonic development. Developmental Dynamics 239: 1867–1878. 22. Moisan A, Rivera MN, Lotinun S, Akhavanfard S, Coffman EJ, et al. (2011) The WTX tumor suppressor regulates mesenchymal progenitor cell fate specification. Developmental Cell 20: 583–596. 23. Savarirayan R, Nance J, Morris L, Haan E, Couper R (1997) Osteopathia striata with cranial sclerosis: highly variable phenotypic expression within a family. Clinical genetics 52: 199–205. 24. Winter RM, Crawfurd MD, Meire HB, Mitchell N (1980) Osteopathia striata with cranial sclerosis: highly variable expression within a family including cleft palate in two neonatal cases. Clinical genetics 18: 462–474. 25. Viot G, Lacombe D, David A, Mathieu M, de Broca A, et al. (2002) Osteopathia striata cranial sclerosis: non-random X-inactivation suggestive of X-linked dominant inheritance. American Journal of Medical Genetics 107: 1– 4. 26. Zicari AM, Tarani L, Perotti D, Papetti L, Nicita F, et al. (2012) WTX R353X mutation in a family with osteopathia striata and cranial sclerosis (OS-CS): case report and literature review of the disease clinical, genetic and radiological features. Italian Journal of Pediatrics 38: 27. 27. Holman SK, Daniel P, Jenkins ZA, Herron RL, Morgan T, et al. (2011) The male phenotype in osteopathia striata congenita with cranial sclerosis. American Journal of Medical Genetics Part A 155: 2397–2408. 28. Wan D, Zhang ZC, Zhang X, Li Q, Han J (2015) X-chromosome-linked intellectual disability protein PQBP1 associates with and regulates the translation of specific mRNAs. Human molecular genetics 24: 4599–4614.

88

Novel candidate genes for L1 syndrome

29. Kunde S, Musante L, Grimme A, Fischer U, Müller E, et al. (2011) The X- chromosome-linked intellectual disability protein PQBP1 is a component of neuronal RNA granules and regulates the appearance of stress granules. Human molecular genetics 20: 4916–4931. 30. Friez MJ (2014) Phenotypes and Variants in Cases Submitted for X-Linked Intellectual Disability (XLID) Gene Panel Testing. PhD thesis: University of South Carolina. 31. Germanaud D, Rossi M, Bussy G, Gérard D, Hertz-Pannier L, et al. (2011) The Renpenning syndrome spectrum: new clinical insights supported by 13 new PQBP1-mutated males. Clinical genetics 79: 225–235. 32. Kalscheuer VM, Freude K, Musante L, Jensen LR, Yntema HG, et al. (2003) Mutations in the polyglutamine binding protein 1 gene cause X-linked mental retardation. Nature genetics 35: 313–315. 33. Lubs H, Abidi F, Echeverri R, Holloway L, Meindl A, et al. (2006) Golabi-Ito- Hall syndrome results from a missense mutation in the WW domain of the 4 PQBP1 gene. Journal of medical genetics 43: e30. 34. Tapia VE, Nicolaescu E, McDonald CB, Musi V, Oka T, et al. (2010) Y65C missense mutation in the WW domain of the Golabi-Ito-Hall syndrome protein PQBP1 affects its binding activity and deregulates pre-mRNA splicing. Journal of biological chemistry 285: 19391–19401. 35. Zhang Y, Liu Y, Li Y, Duan Y, Zhang K, et al. (2014) Exome sequencing identifies combined mutations in ABCD1 and DACH2 in two siblings with a distinct phenotype. BMC medical genetics 15: 105. 36. Engle EC (2010) Human genetic disorders of axon guidance. Cold Spring Harbor perspectives in biology 2: a001784. 37. Kenzelmann D, Chiquet-Ehrismann R, Leachman NT, Tucker RP (2008) Teneurin-1 is expressed in interconnected regions of the developing brain and is processed in vivo. BMC developmental biology 8: 30. 38. Kenzelmann D, Chiquet-Ehrismann R, Tucker R (2007) Teneurins, a transmembrane protein family involved in cell communication during neuronal development. Cellular and molecular life sciences 64: 1452–1456. 39. Mӧrck C, Vivekanand V, Jafari G, Pilon M (2010) C. elegans ten-1 is synthetic lethal with mutations in cytoskeleton regulators, and enhances many axon guidance defective mutants. BMC developmental biology 10: 55. 40. Nunes SM, Ferralli J, Choi K, Brown-Luedi M, Minet AD, et al. (2005) The intracellular domain of teneurin-1 interacts with MBD1 and CAP/ponsin resulting in subcellular codistribution and translocation to the nuclear matrix. Experimental Cell Research 305: 122–132. 41. Yang F, Babak T, Shendure J, Disteche CM (2010) Global survey of escape from X inactivation by RNA-sequencing in mouse. Genome research 20: 614–622. 42. Peeters SB, Cotton AM, Brown CJ (2014) Variable escape from X- chromosome inactivation: Identifying factors that tip the scales towards expression. BioEssays 36: 746–756.

89

Chapter 4

43. Seri M, Pecci A, Di Bari F, Cusano R, Savino M, et al. (2003) MYH9-related disease: May-Hegglin anomaly, Sebastian syndrome, Fechtner syndrome, and Epstein syndrome are not distinct entities but represent a variable expression of a single illness. Medicine 82: 203–215. 44. Synofzik M, Gonzalez MA, Lourenco CM, Coutelier M, Haack TB, et al. (2014) PNPLA6 mutations cause Boucher-Neuhäuser and Gordon Holmes syndromes as part of a broad neurodegenerative spectrum. Brain: 69–77. 45. Oostra BA, Willemsen R (2009) FMR1: a gene with three faces. Biochimica et Biophysica Acta (BBA)-General Subjects 1790: 467–477. 46. Fischer G, Kunemund V, Schachner M (1986) Neurite outgrowth patterns in cerebellar microexplant cultures are affected by antibodies to the cell surface glycoprotein L1. The Journal of neuroscience 6: 605–612. 47. Kunz S, Ziegler U, Kunz B, Sonderegger P (1996) Intracellular signaling is changed after clustering of the neural cell adhesion molecules axonin-1 and NgCAM during neurite fasciculation. Journal of Cell Biology 135: 253–267. 48. Haney CA, Sahenk Z, Li C, Lemmon V, Roder J, et al. (1999) Heterophilic binding of L1 on unmyelinated sensory axons mediates Schwann cell adhesion and is required for axonal survival. Journal of Cell Biology 146: 1173–1184. 49. Asou H, Miura M, Kobayashi M, Uyemura K, Itoh K (1992) Cell adhesion molecule L1 guides cell migration in primary reaggregation cultures of mouse cerebellar cells. Neuroscience letters 144: 221–224. 50. Rathjen FG, Schachner M (1984) Immunocytological and biochemical characterization of a new neuronal cell surface component (L1 antigen) which is involved in cell adhesion. The EMBO journal 3: 1. 51. Persohn E, Schachner M (1987) Immunoelectron microscopic localization of the neural cell adhesion molecules L1 and N-CAM during postnatal development of the mouse cerebellum. Journal of Cell Biology 105: 569–576. 52. Martini R, Schachner M (1988) Immunoelectron microscopic localization of neural cell adhesion molecules (L1, N-CAM, and myelin-associated glycoprotein) in regenerating adult mouse sciatic nerve. Journal of Cell Biology 106: 1735–1746. 53. Kadmon G, Altevogt P (1997) The cell adhesion molecule L1: species-and cell-type-dependent multiple binding mechanisms. Differentiation 61: 143– 150. 54. Maness PF, Schachner M (2007) Neural recognition molecules of the immunoglobulin superfamily: signaling transducers of axon guidance and neuronal migration. Nature neuroscience 10: 19–26. 55. Buhusi M, Schlatter MC, Demyanenko GP, Thresher R, Maness PF (2008) L1 interaction with ankyrin regulates mediolateral topography in the retinocollicular projection. The Journal of Neuroscience 28: 177–188.

90

Chapter 5

Composite biological network analysis provides novel clues for hereditary congenital microcephaly

Omid Jazayeri1,2, Birgit Sikkema-Raddatz1, Richard J. Sinke1 and Behrooz Z. Alizadeh2

1University of Groningen, University Medical Centre Groningen, Department of Genetics, Groningen, the Netherlands. 2University of Groningen, University Medical Centre Groningen, Department of Epidemiology, Groningen, the Netherlands.

Submitted

In silico network analysis of congenital microcephaly

Abstract The molecular pathomechanisms underlying congenital microcephaly (CM) are multifactorial while the etiology of this heterogeneous disorder has still not been fully addressed. A marked number of CM cases remain genetically unsolved even when using the highly advanced Next Generation Sequencing (NGS)-based diagnostic technologies. We aimed therefore to identify biological processes underlying CM in relation to already well-established CM-causing genes that may also suggest candidate genes for CM. We performed a systematic literature review that yielded 76 definite protein-coding CM-causing genes. Next, we carried out an integrative composite biological network analysis by using various databases. The genes were used for functional enrichment analysis against Gene Ontology (GO) annotations, and then the enrichment in each biological function group was calculated. GO annotations highlighted two previously less addressed biological functions underlying CM, telomere biology and tRNA metabolic process, with a 16.75 and 19.17 fold enrichment, respectively, compared to the entire genome. The 5 results also confirmed previously known biological processes such as centrosome, spindle pole and microtubule-related-processes, DNA damage repair and DNA replication involved in CM. Additionally, the 100 top ranked candidate genes related to CM were suggested and evaluated using OMIM, MGI and HGMD databases, of which 16 candidates represent a relationship with microcephaly. These 16 candidate genes support the idea that our network can potentially suggest CM candidate genes. The genes that displayed relationship to CM in mouse models (MGI-based features) would probably provide prime candidate for gene-discovery research. Among the 100 candidate genes, 14 important genes for (neuro)developmental functions might be of notable interest and useful for gene identification in unsolved cases presenting CM. More generally, this research provides an in silico framework to explore candidate genes in any rare genetically heterogeneous Mendelian disease and may facilitate gene discovery and mechanistic understanding of Mendelian disease.

93

Chapter 5

Introduction Microcephaly is most commonly defined as an occipitofrontal head circumference (OFC) of less than three standard deviations (SD) below the mean for age, sex, and ethnicity. In the original document on the nosology of microcephaly by Opitz and Holt, a cut-off of two standard deviations was used [1], while Woods and Parker have proposed a cut-off of three standard deviations [2]. A special form of microcephaly is autosomal recessive primary microcephaly (MCPH). It is characterized by an OFC less than -4 SD at birth and intellectual impairment that is not related to other neurological findings, although mild seizures may occasionally be present. Most MCPH patients have a normal height, weight and appearance, normal chromosome analysis and brain scan results [3]. Congenital microcephaly (CM) can be a feature of different multiple congenital anomaly syndromes such as Nijmegen breakage syndrome, Seckel syndrome, or Warsaw breakage syndrome [4], but can also present as an isolated feature in MCPH. To date, 15 MCPH-causing genes have been identified [5–8]. The pathomechanism of MCPH and CM is highly heterogeneous. Several studies have indicated that common molecular pathomechanisms, such as defects in centrosome, spindle microtubule organization, DNA damage repair and cell cycle checkpoint control may underlie MCPH [4,5,7,9]. Additionally, MFSD2A, which is required for omega-3 fatty acid transport in the endothelium of blood-brain barrier has been recently reported as MCPH15 [8]. The other two recently identified MCPH genes, PHC1 [10] and ZNF335 [11], appear to be involved in “chromatin remodeling”, further suggesting this process is involved also in the pathomechanism of CM. Recent advances in Next Generation Sequencing (NGS)-based strategies have accelerated and revolutionized the identification of causal genes in monogenic disorders [12] and also for CM. For example, Abou Jamra et al. (2011) identified three CM-causing genes by whole exome sequencing (WES) in three consanguineous families [13]. Despite these advances, the causal genetic mutations remain unidentified in many of patients with CM or MCPH. In a cohort of 35 families with patients suffering from microcephaly and variable intellectual disability, we could not identify the disease-causing gene in 20 families (70%) even when using WES [14]. Similar results have been reported by other researchers [15]. Explanations for these genetically unsolved cases (see Box 1) include the current limitations of NGS or WES techniques [16] and the fact that not all disease-causing genes have yet been identified. One way to identify novel candidate genes for unsolved cases, is to focus on understanding the underlying pathways and the genes that may display biological 94

In silico network analysis of congenital microcephaly connections with already established CM-causing genes. Composite biological network analysis (see Box 1) aims to identify relevant underlying mechanisms and also facilitates the discovery of potential novel causal genes. This can be achieved by using a systematic strategy that integrates biological information from many different publically available biological databases. In a recent study, Novarino et al. (2014) demonstrated the feasibility of such in silico approaches by identifying three additional candidate disease genes in families with hereditary spastic paraplegias. Putative mutations were found to co-segregate with the disease phenotype by applying a network analysis of results obtained from WES [17]. Though the etiology of CM is not yet fully understood, this approach has not been applied to CM. Here, we carried out a systematic review to compile a list of established causal genes for CM. Next, we performed an in silico composite biological network study to determine which biological processes may underlie CM. A ranked list of candidate genes that are likely involved in CM was also generated. Further, the candidate genes were investigated to compile supportive evidence for their relation with microcephaly using the following databases: Online Mendelian Inheritance in Man 5 (OMIM), Mouse Genome Informatics (MGI), Human Gene Mutation Database (HGMD) and Database of Essential Genes (DEG; see Web Links).

Method We performed a composite biological network analysis, as previously described[18] to identify the biological mechanisms underlying CM and to predict candidate genes in which mutations may potentially cause CM. We carried out this analysis in three distinct phases as described below and depicted in Figure 1.

Phase 1: Selection of congenital microcephaly genes We performed a systematic search in PubMed with a query of MeSH terms “microcephaly” restricted to subheading “genetics” and an extra filter on “human” species. Candidate genes retrieved from the OMIM database where then grouped into four classes based on the a priori defined criteria described in supplementary Table 1. The first class includes genes which have been clearly mentioned in OMIM as definite causes of CM (or microcephaly in neonates). The second class included genes that have been reported repeatedly as causing CM (or microcephaly in neonates) in the literature, but are not yet specified in OMIM. The third class included possible CM genes in which the onset of disease presentation (i.e. microcephaly at birth) has not been reported in either OMIM or literature, or was reported in only one patient (i.e. a case report) without additional confirmation by functional or animal studies. 95

Chapter 5

Box 1:

Chromosome localization: Any process in which a chromosome is transported to, or maintained in, a specific location (Gene Ontology Consortium; see Web Links). Co-localization: Genes which are co-expressed in the same tissue, or proteins found in the same cellular location (GeneMANIA help page; see Web Links). Composite biological network: An integrated network derived from multiple biological association networks such as a protein-protein network, co-expression network, or pathway network. Functional enrichment analysis: A method to identify classes of genes that are over- represented in a large set of genes. For example, GO-based functional enrichment analysis showing which biological functions have been enriched in a large set of genes. Genetic interaction: A genetic interaction between two genes generally indicates that the phenotype of a double mutant differs from what is expected from each individual mutant. GO term (Gene Ontology term): An ontology of defined terms representing gene product properties. The ontology covers three domains: cellular component, molecular function and biological process. GO annotation biological function group enrichment: The enrichment in a group of GO terms, which are involved in a specific biological function. The word “group” refers to a number of GO terms which represent similar biological functions. Each GO term consist of several genes which may overlap with the other GO terms in the same group. It is calculated by compiling GO terms with similar function into a group and then extracting unique related genes in a proposed network. Performing the same process for the entire genome, group enrichment will then be calculated by comparing the number of genes in a proposed network to entire genome (see a descriptive example in Figure 2) Hypergeometric probability: The probability of k successes in n draws, without replacement, from a population of size N that contains exactly K successes, wherein each draw is either a success or a failure (see Web Links for explanation with an example). False Discovery Rate (FDR): One of family wise error rate, i.e. the probability of making one or more false discoveries among all the hypotheses when performing multiple hypothesis testing. The FDR is the rate that significant features are truly null. For example, a false positive rate of 5% means that on average 5% of the truly null hypothesis (H0) in the study will be called significant. MeSH term: MeSH (Medical Subject Headings) is the controlled vocabulary thesaurus used for indexing articles for PubMed. Pathway: Two gene products are linked if they participate in the same reaction within a pathway. These data are collected from various source databases, such as Reactome and BioCyc via PathwayCommons (GeneMANIA help page; see Web Links). Telomere maintenance: Telomere maintenance mechanisms contribute to the maintenance of proper telomeric length and structure by affecting and monitoring the activity of telomeric proteins and the length of telomeric DNA. Unsolved case: Refers to patients who are clinically diagnosed but genetically remained undetermined.

96

In silico network analysis of congenital microcephaly

CM genes with incomplete penetrance in patients were also included in the third class. The fourth class included genes which did not fall into the first three classes, for instance if the onset of microcephaly was not at birth but instead in infancy or early childhood. The search process was completed on 15 March 2015.

Phase 2: Building a composite biological network Phase 2.1: Selection of in-silico functional data We utilized a large set of databases contained within the GeneMANIA package (Database version 12-08-2014, Homo sapiens) [19] to build a composite biological network. This package comprises 542 networks that collectively utilize more than 172 million gene-gene associations based on the following databases (see Box 1) of co-localization, genetic interaction, pathways, physical protein-protein interactions, shared protein domain, and co-expression [19]. Additionally, to insure we explored co-expressed genes during brain development in the fetus, we input the following five human fetal brain expression data from one expression data set obtained from six human fetuses between 13-16 weeks of gestation (Gene Expression Omnibus 5 database: accession no. GSE38805) [20] and four distinct fetal brain expression data sets from 15-21 weeks of gestation extracted from the atlas of developing human brain (see Web Links).

Phase 2.2: Gene Ontology (GO) annotation biological function group enrichment and hypergeometric probability test Genes from classes 1 and 2 identified in phase 1 (Table 1) were selected as input gene to start the analysis (i.e. they are our training gene set) and to build a composite biological network (see Box 1) using the default weighting method. As the initial query, we selected “zero″ for the number of related genes. Each CM- causing gene can be involved to one or more Gene Ontology terms (GO terms; see Box 1). The training gene set composed of CM-causing genes was then used for functional enrichment analysis against GO terms to identify the most relevant GO terms for CM genes. As has been previously advised [19,21,22], we applied a FDR<0.05 threshold (see Box 1) to select significant GO annotations. These significant GO terms were then categorized based on their corresponding biological functions. Those GO terms associated with similar biological functions were grouped together and hereafter called a ‘biological function group’. For example, there were three GO terms which are related to the telomere biology group (supplement Dataset 1). Since each CM-causing gene can be linked to different GO terms, duplicate or multiple CM genes were linked to more than one biological function. Since this would introduce a false counting of the number of 97

Chapter 5

CM genes and eventually lead to overestimation of enrichment analysis for the affected biological group, we removed therefore duplicated genes in each of the biological function group. To calculate biological function group enrichment (see Box 1), associated genes within each biological function group were extracted from the resulting network. Further, across the entire genome , the associated-genes corresponding to each biological function group were also extracted (see Web Links). Therefore, we generated two gene sets for each biological function group: the first set of associated-genes originated from our network analysis and the second set of genes was extracted from the entire genome (supplementary Dataset 1 and 2). For instance, four out of 76 CM genes are associated with telomere biology from our network analyses, whereas 65 out of 20,687 genes in entire genome were associated with this biological function group [23]. Biological function group enrichment [24] was thus calculated using the following formula:

Number of genes in each biological function group in our network [ ] Total genes in our network (N=76 in phase 2.2 or 176 in phase 2.3)

[ Number of genes in each biological function group in entire genome ] Total genes in entire genome (N=20687)

To show that these enriched biological processes within our network are highly unlikely to have occurred by the chance alone, we performed the exact hypergeometric probability test as previously applied [24] (see Box 1 and Web Links). Population size was set at 20,687, with is the total number of protein coding genes [23]. Sample sizes were 76 genes and 176 genes for phase 2.2 and phase 2.3, respectively. The details and number of genes associated with each biological group within the entire genome, and within the constructed network are presented in the supplementary Dataset S1 for Phase 2.2 and supplementary Dataset S2 for Phase 2.3. The approach to calculate biological function group enrichment and hypergeometric probability test is exemplified in more details for the telomere biology group enrichment in Figure 2. The estimates of these analyses are presented as enrichment score.

Phase 2.3: GO annotation biological function group enrichment and identification of candidate genes for CM We input the initially identified genes from Phase 1 as the training gene set and performed a composite biological network analysis. The algorithm predicted the candidate genes that were most likely to share function with the training gene set. 98

In silico network analysis of congenital microcephaly

This led to the prediction of the 100 top-ranked genes which have a functional association to at least one of the CM genes of the training set (supplementary Table 4). To calculate the enrichment of each of the predicted biological function groups, we considered the total number of the possible 176 CM associated genes (i.e. 76 from phase 1 and 100 predicted from phase 2.3) in our network, and the total number of genes in the entire genome was set at 20,687 [23] (see also Box 1).

Phase 3: Confirmation of candidate genes proposed by our network To obtain insight into whether genes predicted by our network study may indeed be related to CM, the 100 top related genes’ functional properties and reported relationships to diseases were studied step-wised in OMIM, MGI, HGMD and DEG (see Web Links). We should note that DEG (see Web Links) is a continuously updated on-line database which extracts and compiles essential genes from different studies [25]. In OMIM the inclusion of “microcephaly” as an important feature in the clinical synopsis was considered as supportive evidence. When there was no affirmative documentation in OMIM, the phenotype section of 5 the MGI database was checked for descriptions of cerebral cortex abnormalities or abnormal brain morphology, which may be associated with microcephaly in mouse mutants. All candidate genes were further studied in HGMD® professional release 2015.4 (see Web Links), where we checked the clinical synopsis for details on whether patients with pathogenic mutations in these candidate genes are affected by diseases that include microcephaly as a manifestation. For a given gene, when we found evidence of a relation to CM in OMIM, MGI or HGMD, we considered this as affirmative indication for the causal role of the gene in CM (supplementary Table 4). Scine human orthologs of mouse essential genes are associated with a wide spectrum of diseases [26], we also further investigated the candidate genes which are embryonic lethal genes in mice models (MGI) in DEG version 13.3 to check their essentiality in human. Essential genes are indispensable for the survival of an organism and necessary for basic developmental functions. Loss-of-function mutations in essential genes cause lethality during embryogenesis or shortly after birth [27]. The Cytoscape software [28] was used to visualize the relations between the genes and biological functions.

99

Chapter 5

Figure 1: Overview of the study set-up and study phases, and their relationships as implemented in the present study

100

In silico network analysis of congenital microcephaly

Results Selection of congenital microcephaly genes Our systematic PubMed review resulted in 770 articles, in which 397 candidate genes were reported as apparently CM-associated genes. We categorized the genes identified through our systematic review into four classes (supplementary Table 1). Combination of gene sets from class 1 and 2 led to 77 definite CM-causing genes (Table 1): 76 protein coding genes and one small nuclear RNA (SnRNA), RNU4ATAC. These 76 genes were then used as input query in phase 2 with the exclusion of RNU4ATAC because there was no information about this gene in the applied databases.

Building a composite biological network, GO term annotation, and group enrichment Our approach yielded a list of the top 51 significant GO terms (supplementary Dataset S1). We excluded six non-specific GO terms: protein C-terminus binding, brain development, organelle assembly, in utero embryonic development, organelle 5 fission and nuclear division. The remaining 45 GO terms were categorized into nine biological function groups that are important in the context of CM, namely (i) Centrosome, spindle pole and microtubule related (14 GO terms); (ii) DNA damage repair and cell cycle checkpoint (7 GO terms); (iii) Nuclease activity (5 GO terms); (iv) Helicase activity (5 GO terms); (v) Telomere biology (3 GO terms); (vi) DNA replication (3 GO terms); (vii) DNA metabolic process (3 GO terms); (viii) chromosome localization (2 GO terms) and; (ix) DNA conformation change (2 GO terms). Additionally, there were three singleton GO terms of (x) tRNA metabolic process, (xi) DNA-dependent ATPase activity and (xii) DNA recombination (Table 2). The enrichment scores of each biological function group and their corresponding significance obtained from the exact hypergeometric probability test are presented in Table 2. Of highest interest and novelty, were two previously rarely addressed biological functions, telomere biology and tRNA metabolic process, with 16.75-fold (P=9.2×10-5) and 19.17-fold (P=6.07×10-6) enrichment, respectively. Furthermore, and as positive validation, previously well-known CM-associated biological functions scored highly, these include centrosome, spindle pole, microtubule-related functions (13.12-fold enrichment; P=2.61×10-17) and DNA damage repair/cell cycle checkpoint-related function (14.23 folds; P=1.13×10-20) displayed the highest number of annotated GO terms in our network analysis. Notably, helicase activity including unwinding of DNA duplexes resulting in an open conformation (18.9-fold; P=6.5×10-06), nuclease activity including cleavage DNA in a damaged site (21.43-fold; P=3.48×10-11) and DNA recombination (10.82- 101

Chapter 5 folds-; P=1.80×10-05) were significantly enriched in our network (Table 2, and supplementary Table 3). Chromosome localization and DNA conformation change also displayed 37.12-fold (P=6.80×10-5) and 12.1-fold (P=9.73×10-6) enrichments, respectively. Repeating functional enrichment analysis against GO terms (Phase 2.3; using 176 CM-associated genes), yielded a list of 192 top significant GO terms (supplementary Dataset S2). Excluding 26 non-specific GO terms (supplementary Dataset S2) left 166 GO terms which were categorized into 21 biological function groups according to their similarities in related biological function and 10 singleton GO terms (Table 2 and supplementary Table 3). As expected, a higher number of GO terms and thus biological function groups emerged (supplementary Table 3) due the increased number of genes. Comparing the results of phase 2.2 and 2.3, we observed all 12 biological function groups in phase 2.2 appeared once more in phase 2.3 (Table 2). Telomere biology showed a significant enrichment score of 17.85-fold at P=2.95×10-12 as did tRNA metabolic processes with an enrichment score 9.93-fold at P=2.91×10-05 in phase 2.3. Previously well-known CM-associated biological functions, such as centrosome, spindle pole, microtubule-related functions (10-fold; P=2.15×10-44), DNA damage repair, cell cycle checkpoint-related functions (9.68- fold; P=1.88×10-47), nuclease activity including cleavage DNA damaged site (7.98; P=6.58×10-10), DNA conformation change (11.32; P=1.30×10-10), helicase activity including unwinding of DNA duplexes (13.06; P=1.66×10-07), DNA recombination (13.23; P=1.25×10-14), and chromosome localization (21.37; P=3.19×10-05) were significantly enriched again in phase 2.3. In addition to the GO terms identified in phase 2.2, several additional biological functions were indicated in phase 2.3 (supplementary Table 3). Of these, two top biological functions with many annotated GO terms showed the highest statistical significance in relation with CM: i.) Chromosome and sister chromatid segregation (23.88-fold; P=1.03×10-28) and ii.) Kinetochore, centromere and condensed chromosome (14.82-fold; P=7.77×10-25) (supplementary Table 3).

102

In silico network analysis of congenital microcephaly

Table 1: Systematic review of causal genes reported for congenital microcephaly. Gene Name of Relation with Reference source disease/syndrome microcephaly MIM literature ID 1 AP3B1* Hermansky-Pudlak syndrome 2 Microcephaly at birth 608233 2 AP4B1* Spastic paraplegia 47 Microcephaly; Onset at 614066 birth 3 AP4E1* Spastic paraplegia 51 Microcephaly; Onset at 613744 birth 4 AP4M1* Spastic paraplegia 50 Microcephaly; Prenatal 612936 onset 5 AP4S1* Spastic paraplegia 52 Microcephaly; Onset at 614067 birth 6 ASNS* Asparagine synthetase deficiency Microcephaly; Onset in 615574 utero or at birth 7 ASPM* Autosomal recessive primary Microcephaly; Onset at 608716 microcephaly 5 birth 8 C2CD3* Orofaciodigital syndrome XIV Microcephaly; Onset in 615948 utero 9 CENPJ* Autosomal recessive primary Microcephaly; Onset at 609279 microcephaly 6;Seckel syndrome 4 birth 10 CEP135* Autosomal recessive primary Microcephaly; Onset at 614673 microcephaly 8 birth 11 CHKB* Muscular dystrophy, congenital, Microcephaly at birth 602541 5 megaconial type 12 CLP1* Pontocerebellar hypoplasia type 10 Microcephaly; Onset at 615803 birth 13 EPG5* Vici syndrome Microcephaly at birth 242840 14 ERCC1* Cerebrooculofacioskeletal syndrome Microcephaly; Onset at 610758 4 birth 15 IBA57* Multiple mitochondrial dysfunctions Microcephaly, Onset in 615330 syndrome 3 utero 16 IGF1* Growth retardation with deafness Microcephaly, onset in 608747 and mental retardation due utero to IGF1 deficiency 17 IGF1R* Resistance to Insulin-like growth Microcephaly; Onset in 270450 factor I utero 18 KIAA1279* Goldberg-Shprintzen megacolon Microcephaly; Prenatal 609460 syndrome onset 19 KIF5C* Cortical dysplasia with other brain Microcephaly; Onset at 615282 malformation birth 20 MCM4* Natural killer cell and glucocorticoid Microcephaly; growth 609981 deficiency with DNA repair defect retardation onset in utero 21 MCT8* Allan-Herndon-Dudley syndrome Microcephaly at birth 300523

22 NDE1* Lissencephaly 4 (with microcephaly) Microcephaly; Onset in 614019 utero 23 OCLN* Band-like calcification with simplified Congenital microcephaly 251290 gyration and polymicrogyria 24 PHGDH* Phosphoglycerate dehydrogenase Congenital microcephaly 601815 deficiency 25 PLK4* Microcephaly and chorioretinopathy, Microcephaly; Onset at 616171 autosomal recessive, 2 birth 26 PNKP* Microcephaly, seizures, and Microcephaly; Onset 613402 developmental delay prenatally or at birth 27 POMT1* Muscular dystrophy- Microcephaly; Onset 236670 dystroglycanopathy (congenital with prenatally or at birth brain and eye anomalies), type A, 1

103

Chapter 5

Gene Name of Relation with Reference source disease/syndrome microcephaly MIM literature ID 28 POMT2* Muscular dystrophy- Microcephaly; Onset 613150 dystroglycanopathy (congenital with prenatally or at birth brain and eye anomalies), type A, 2 29 PSAT1* Neu-Laxova syndrome 2 Microcephaly; Onset in 616038 utero 30 QARS* Microcephaly, progressive, seizures, Microcephaly; Onset at 615760 and cerebral and cerebellar atrophy birth or early infancy 31 RARS2* Pontocerebellar hypoplasia, type 6 Microcephaly; Onset at 611523 birth or in first days 32 RBBP8* Jawad syndrome Congenital microcephaly 251255 33 SLC25A19* Microcephaly, Amish type Microcephaly; Onset at 607196 birth 34 SLC25A22* Epileptic encephalopathy, early Microcephaly; Onset in 609304 infantile, 3 first hours to days of life 35 SSR4* Congenital disorder of glycosylation, Microcephaly; Onset at 300934 type Iy birth 36 STAMBP* Microcephaly-capillary malformation Microcephaly; Onset at 614261 syndrome birth 37 TSEN2* Pontocerebellar hypoplasia type 2B Microcephaly; Onset at 612389 birth 38 TSEN54* Pontocerebellar hypoplasia type 2A Microcephaly; Onset at 277470 birth 39 TUBB* Cortical dysplasia, complex, with Microcephaly; Onset at 615771 other brain malformations 6 birth 40 TUBGCP6* Microcephaly and chorioretinopathy, Microcephaly; Onset at 251270 autosomal recessive, 1 birth 41 WDR62* Autosomal recessive primary Microcephaly; Onset in 604317 microcephaly 2 utero 42 ZNF335* Autosomal recessive primary Microcephaly; Onset at 615095 microcephaly 10 birth 43 ZNF592* Spinocerebellar ataxia, autosomal Microcephaly at birth 606937 recessive 5 44 AMPD2* Pontocerebellar hypoplasia, type 9 Microcephaly at birth 615809 45 ATR♣ Seckel syndrome 1 Severe microcephaly at 210600 [37,74] birth; Prenatal growth retardation 46 CASC5♣ Autosomal recessive primary Congenital microcephaly 604321 [60] microcephaly 4 47 CDK5RAP2♣ Autosomal recessive primary Primary microcephaly 604804 [75] microcephaly 3 48 CDK6♣ Autosomal recessive primary Primary microcephaly 616080 [76] microcephaly 12 49 CENPE♣ Autosomal recessive primary Congenital microcephaly 616051 [56] microcephaly 13 50 CENPF♣ Ciliary dyskinesia, primary, 31 Primary microcephaly 600236 [58] 51 CEP152♣ Autosomal recessive primary Primary microcephaly 613529 [77] microcephaly 9; Seckel syndrome 5 52 CEP63♣ Seckel syndrome 6 Primary microcephaly 614728 [78] 53 CKAP2L♣ Filippi syndrome Microcephaly at birth 272440 [62] 54 CLPB♣ 3-methylglutaconic aciduria, type VII, Congenital microcephaly 616254 [79] with cataracts, neurologic involvement and neutropenia 55 DDX11♣ Warsaw breakage syndrome Congenital microcephaly 613398 [80,81] 56 DNA2♣ Seckel syndrome 8 Severe congenital 615807 [82] microcephaly

104

In silico network analysis of congenital microcephaly

Gene Name of Relation with Reference source disease/syndrome microcephaly MIM literature ID 57 DYRK1A♣ Mental retardation, autosomal Primary microcephaly 614104 [83,84] dominant 7 58 EFTUD2♣ Mandibulofacial dysostosis, Guion- Microcephaly at birth 610536 [85] Almeida type 59 EOMES♣ Prenatal-onset 604615 [86,87] microcephaly

60 IER3IP1♣ Microcephaly, epilepsy, and diabetes Primary microcephaly 614231 [88,89] syndrome 61 KAT6A♣ Mental retardation, autosomal Primary microcephaly 601408 [52] dominant 32 62 KIF11♣ Microcephaly with or without Congenital microcephaly 152950 [90,91] chorioretinopathy, lymphedema, or mental retardation 63 KIF14♣ Meckel syndrome 12 Microcephaly; intrauterine 616258 [92] growth retardation 64 KIF2A♣ Cortical dysplasia, complex, with Congenital microcephaly 615411 [93] other brain malformations 3 65 LIG4♣ LIG4 syndrome Congenital microcephaly 606593 [94] 66 MCPH1♣ Autosomal recessive primary Primary microcephaly 251200 [95,96] microcephaly 1 5 67 NBS1 Nijmegen breakage syndrome Microcephaly at birth; 251260 [39,97,98] (NBN)♣ Prenatal growth retardation 68 NHEJ1♣ Severe combined immunodeficiency Microcephaly at birth 611291 [99] with microcephaly, growth retardation, and sensitivity to ionizing radiation 69 PCNT♣ Microcephalic osteodysplastic Microcephaly at birth 210720 [100] primordial dwarfism, type II 70 PHC1♣ Autosomal recessive primary Primary microcephaly 615414 [10] microcephaly 11 71 RAD50 Nijmegen breakage syndrome-like Congenital microcephaly 613078 [41] disorder 72 RNU4ATAC Microcephalic osteodysplastic Microcephaly at birth 210710 [101] ‡♣ primordial dwarfism, type I 73 SASS6♣ Autosomal recessive primary Primary microcephaly 609321 [6] microcephaly14 74 STIL♣ Autosomal recessive primary Primary microcephaly 612703 [102] microcephaly 7 75 TRMT10A♣ Microcephaly, short stature, and Primary microcephaly 616033 [53] impaired glucose metabolism 76 XRCC2♣ Microcephaly at birth 600375 [103] 77 XRCC4♣ Short stature, microcephaly, and Prenatal-Onset Growth 194363 [104,105] endocrine dysfunction Failure with Disproportionate Microcephaly ‡GeneMANIA cannot recognize RNU4ATAC so it was not used as an input gene in initial gene set; *emerged from class 1; ♣emerged from class 2.

As Figure 3 shows, CM-causing genes normally exhibit multifunctional roles and represent overlapping biological functions. Several CM-causing genes overlap in two, three or even more biological functions suggested to be involved in CM. ERCC1, for example, is involved in five biological functions: DNA damage repair, 105

Chapter 5 telomere biology, the DNA metabolic process, DNA recombination and nuclease activity. NBN (NBS1) shows association with eight biological functions. Of interest, we observed the category “tRNA metabolic process; the chemical reactions and pathways involved in tRNA processes such as aminoacylation, tRNA end turnover” as a distinct biological function (Figure 3), and the genes RARS2, TSEN54, TSEN2, CLP1 and QARS are categorized in this group.

Evaluation of the validity of network prediction According to OMIM, mutations in seven out of 100 top related genes identified in phase 2.3 (MRE11A, BUB1B, PRKDC, PARS, TUBG1, ERCC4 and ATRIP) cause disorders in which microcephaly is of one the phenotypic features (supplementary Table 4). These genes were not detected in our extensive literature review because of e.g. incomplete penetrance or onset of microcephaly in infancy and we have grouped them in class 3 of our literature review (see supplementary Table 2). Inspection of MGI (see Web Links) for the 93 remaining genes revealed that mutations in four genes (BMI1, AGTPBP1, DISC1 and KIF3A) also cause microcephaly with phenotypic features such as cerebral cortex abnormalities or abnormal brain morphology in knockout mice (supplementary Table 4). Twenty two out of the remaining 89 genes cause prenatal lethality in mice models according to the MGI database (supplementary Table 4). Because these genes may represent essential genes in mice [27], we analyzed these 22 in DEG to check whether these genes may also be essential genes in humans. This turned out to be the case for 20 of the 22 embryonic lethal genes apart from IGFBP4 and GTSE1 (supplementary Table 4). Additionally, five probable CM candidate genes were identified by analyzing HGMD. Defects in TSEN15 [29], TSEN34 [30], KIF23 [31], IGFBP4 [31] and KIF4A [32] have already been reported in human disorders in which microcephaly is one of the phenotypic features (supplementary Table 4). There is no apparent relation between the remaining 64 candidate genes and CM based on OMIM, MGI or HGMD other than that these putative candidate genes show significant relationships with well-established CM-causing genes within our network (supplementary Table 4). Although the majority of these remaining 64 candidate genes are not associated with established disorders in the OMIM database, some have been related to diseases in human. RAD51D, for instance, has been associated with breast-ovarian cancer, PTCH1 with basal cell nevus syndrome and BRCA1 with breast cancer (supplementary Table 4).

106

In silico network analysis of congenital microcephaly

5

107

Chapter 5

see see

group

ouped together. For

genes associated genes withassociated

with telomere inwith biology

genes genes were extracted from

-

associated

-

lity.

upplement upplement Dataset 1). Next, duplicated

function

-

bability bability test was calculated via GeneProf software (

are associated with telomere biology. Biological function

[23]

output output and biological

of of 20,687 genes of the entire genome

related group, there are three GO terms which are related to telomere biology (S

-

).

was then calculated using the formula shown. The exact hypergeometric pro

[24]

An example calculation of biological function group enrichment and exact hypergeometric probabi hypergeometric exact function and groupenrichment biological of calculation An example

Box 1 and Web links Web and 1 Box Figure 2:Figure To calculate biological function group enrichment (see Box 1), all GO terms related to each biological function group were gr example, in the telomere in each genes biological groupwere to removed produce finala unique set gene per biological function In group. our network, each biological function group were extracted directly from GeneMANIA entire genome the a from fouracross For publicly genes database. 76 out wereof available instance, found associated be to our network analyses, whereas 65 out enrichment

108

In silico network analysis of congenital microcephaly

5

Figure 3: In silico composite network analysis highlighting possible biological functions underlying congenital microcephaly (phase 2.2). Green nodes indicate CM-causing genes involved in two biological functions. Orange nodes indicate CM-causing genes involved in three biological functions. Purple nodes indicate CM-causing genes involved in four or more biological functions.

Discussion Despite the recent advances in NGS technology, the etiology of CM is still not fully understood and the causal genetic mutations remain unidentified in many of patients with CM. We aimed to identify biological processes underlying CM which may suggest potential candidate genes in which defects may cause CM. In addition to well-known biological processes underlying CM, our network analysis identified two less addressed biological processes: “telomere biology” and “tRNA metabolic process”. The candidate genes were also further evaluated in silico for CM causality. Among 100 candidate genes, 12 genes have already reported for microcephaly in human (according to OMIM or HGMD) and four genes were also associated with brain abnormalities which are expected to cause microcephaly in mice (according to MGI).

Novel biological processes that may be relevant in the etiology of CM Our network analysis suggested that two previously less addressed biological functions of “telomere biology” and “tRNA metabolic process” may be involved in the pathomechanism of CM. The biological processes related to telomere biology 109

Chapter 5 were enriched 16.75 times in CM network. Several lines of knowledge support a role for telomere biology in the pathomechanism of CM. Among 76 established CM- causing genes (Table 1), seven (MCPH1, ATR, RBBP8, NBS1, RAD50, DNA2 and ERCC1) are associated with telomere biology. MCPH1 regulates DNA-damage response at the telomeres [33], maintains telomere structure and represses human telomerase reverse transcriptase [34,35]. Similarly, the ATR gene is involved in telomere protection and stability [36]. Patient primary fibroblasts deficient for ATR spontaneously accumulated DNA damage at telomeres [37]. Likewise, mutations in RBBP8 causes Jawad syndrome (MIM 251255) and CM is one of its phenotypic features. In fact, RBBP8 promotes alternative non-homologous end-joining repair at uncapped telomeres [38]. Also, mutations in the NBS1 (also called NBN) gene cause Nijmegen breakage syndrome (NBS; MIM 251260) which is characterized by microcephaly at birth combined with immunodeficiency and predisposition to malignancies [39]. NBS1 is part of the MRE11/RAD50/NBS1 complex (MRN complex), which is involved in the DNA double-strand break response pathway [40] that plays a role in telomere maintenance (see Box 1) and its organization. Both RAD50 and NBS1 cause CM [39,41], and, interestingly, mutations in the first part of heterotrimeric MRN complex, MRE11, can also cause microcephaly [42]. The MRN complex serves as shelterin accessory factors which are implicated in the generation of telomere G-overhangs [43], recombination [44] and telomere length control [45]. This finding led us to take a closer look at the other shelterin accessory factors listed in the study by Diotti and Loayza [46], and we noticed that ERCC1, MCPH1 and ATR are also shelterin accessory factors and mutations in these genes lead to CM (Table 1). Additionally, mutations in the other shelterin accessory factors (ORC1, XPF and RTEL1) cause microcephaly. However, we grouped these genes in class 3 of our systematic review (supplementary Table 2), as the onset of microcephaly caused by these genes is in the infancy period. As part of origin recognition complex (ORC), ORC1 has a role in telomere maintenance [47]. Both XPF [48] and RTEL1 [49] also play a role in telomere maintenance. Furthermore, two GO terms, “telomere maintenance” and “telomere organization” which were enriched in our network, were associated with DNA2, ERCC1, RAD50 and NBS1. “Nuclear chromosome, telomeric region” were also enriched in our analysis, as ERCC1, RAD50 and NBS1 categorized into this GO term (supplementary Dataset S1). Taken together, our findings further support a possible involvement of telomere biology in the pathomechanism of CM. The observed 19.17-fold enrichment of tRNA metabolic process in our network highlights a possible relationship of this biological process with CM. Our network analysis has linked seven (CLP1, QARS, RARS2, TSEN2, TSEN54, 110

In silico network analysis of congenital microcephaly

KAT6A and TRMT10A) out of the 76 CM-causing genes into the tRNA metabolic process in which the first five genes represent a separate gene-network cluster among the CM-related genes (Figure 3). Except for QARS, these genes cause different subtypes of pontocerebellar hypoplasia (Table 1) that are characterized by mostly prenatal onset of stunted growth, decay of cerebral structures and CM [50]. Mutations in QARS, encoding Glutaminyl-tRNA Synthetase, cause progressive microcephaly associated with cerebral-cerebellar atrophy [51]. Additionally, mutations in KAT6A [52], a Lysine acetyl-transferase gene, and TRMT10A [53], a tRNA modifying enzyme with methyltransferase activity, cause syndromes associated with CM. Yet, the mechanism by which neuronal cells are vulnerable to defects in tRNA metabolic processes is not well understood. It has been suggested that defects in tRNA processing leading to tRNA fragment accumulation, and might trigger the development of neurodegeneration [54] and apoptosis of neuroprogenitor cells [55]. Although the majority of CM-causing genes exhibit overlapping biological functions, the separate cluster of tRNA metabolic process may indicate that this process is involved in a distinct pathway underlying CM 5 (Figure 3).

Well-known biological processes underlying CM Our network analyses identified previously well-known biological functions implicated in CM such as defects in centrosome, spindle pole-related, DNA damage repair and cell cycle checkpoint-related functions [4,5,7] and chromosome localization [56–58] which have been discussed comprehensively elsewhere. “DNA conformation change” was enriched 12.10-fold in our CM network, in which a gene set of six genes, (CASC5, DDX11, DNA2, KAT6A, NBN, and RAD50) were linked to this biological process. Notably, four of these genes exhibit helicase activity. Similarly, chromatin remodeling, which changes DNA conformation, has been recently linked to CM [7]. Further support was given by the identification of ZNF335 and PHC1 as MCPH genes, called MCPH10 and MCPH11, respectively. These genes are associated with chromatin remodeling as well [7,11]. Clinically, loss of ZNF335 in neuron progenitors disrupts progenitor cell proliferation, leading to a severe reduction in forebrain structure [11]. However, the PHC1 pathomechanism in MCPH is not yet completely clear. It has been suggested that mutations in PHC1 modify chromatin conformation and as a consequence, defects in DNA damage repair and cell cycle progression may then lead to MCPH [10]. Two extra top enriched biological functions, (i) chromosome and sister chromatid segregation and (ii) Kinetochore, centromere and condensed chromosome, which were identified in phase 2.3, have already been reported as biological functions underlying CM [59– 111

Chapter 5

62]. Indeed, definite CM-causing genes accompanied by the top 100 candidate genes could evoke a number of already known biological functions in a more efficient manner than using only the 76 definite CM-causing genes.

Identification of novel candidate genes for CM Network analysis resulted in a gene set of 100 loci that may be involved in CM, out of these different groups of genes can be distinguished. The first group of genes is comprised of 16 genes with experimentally/clinically represented association with microcephaly. According to OMIM, mutations in seven genes out of top 100 genes represent diseases in which microcephaly has been reported as a phenotypic feature of their clinical synopsis. These genes have been categorized in class 3 of our systematic review, and were not included in the initial set of 76 CM causing genes. The other set of four genes has been confirmed as potentially causal genes for microcephaly in mice models (MGI). These are plausible candidates to screen in unsolved cases with microcephaly or for investigation of their functional role in neural progenitor cells, as defects in progenitor cell proliferation may lead to CM. It is worth mentioning that mutations in DISC1 have been reported in Schizophrenia [63,64] and in patients with agenesis of corpus callosum, a developmental abnormality of the brain often associated with microcephaly as well [65,66]. Additionally, five genes identified in our analyses (TSEN15, TSEN34, KIF23, IGFBP4 and KIF4A) were found to be related to microcephaly in HGMD. In fact, only one or two pathogenic mutations have been found in these five genes. Among them, interestingly, TSEN15 and TSEN34 are involved in the tRNA metabolic process. The second group includes the remaining 84 candidate genes for which we found no relation with microcephaly in OMIM, MGI or HGMD. Although the majority of these are not associated with established disorders, a few genes such as BRCA1, RAD51D, PTCH1, SERPINA7, AMN, BARD1 and DCLRE1C have been reported for complex diseases in which microcephaly has not been recorded as one of their related clinical synopses. Therefore, these genes can probably be ruled out as potential CM-causing genes. Interestingly, 22 genes that localize predominantly in the center of our network (shown in supplementary Figure 1) are associated with prenatal lethality in mice and their characteristics were summarized in supplementary Table 4. In addition, based on DEG, 14 human orthologs out of these 22 mouse lethal genes not only are essential genes in human but there were no survived pathogenic mutation recorded for them in HGMD. Human orthologs of such essential genes are considered to have important roles in (neuro)development and to be more likely associated with disease based on the observed strong cross- 112

In silico network analysis of congenital microcephaly species conservation, phenotypes in other species and lower mutation rates [27,67– 70]. Dickerson et al. (2011) revealed that the human orthologs of mouse essential genes are associated with a wide spectrum of diseases affecting diverse tissues [26]. It should be noted that null or severe loss-of-function mutations in essential genes may indeed contribute to embryonic lethality, while, hypomorphic mutations (partial loss-of-function) in the same genes can contribute to less severe abnormalities that are recognized as human disease [26]. Furthermore, it has been shown that human disease genes which are also lethal in the mouse tend to be inherited in dominant pattern of inheritance and cause developmental abnormalities [26]. Putting all this evidence together, examining heterozygous or de novo mutations in orthologs of these 22 essential genes is a diagnostic option for unsolved CM patients.

Strength and limitation of the study and further applicability Our study implemented an in silico framework for a rare heterogeneous Mendelian disease. Such an approach is currently common practice in common diseases, but our study is among the first to apply this for a Mendelian disease and, in particular, 5 in CM. Previous studies usually only discussed a single biological network or just limited biological information [17,71,72], whereas we utilized highly diverse biological information to build a comprehensive composite biological network. Further strengths of our study are the calculation of biological function group enrichment and the in silico evaluation of CM candidate genes. The fact that our analysis was able to move from genes back to known biological pathomechanisms involved in CM, underlines the feasibility of in silico functional network analysis to illustrate (novel) biological processes underlying CM. Of note is that incompleteness of GO annotations is a limitation in GO- based studies. For example, despite involvement of ASPM – a well-established MCPH-causal gene – in spindle pole orientation and organization [73], ASPM was not among the group “centrosome, spindle pole and microtubule related” gene set selected by the GO term. In addition, due to lack of information in applied databases, it was not possible to input RNU4ATAC as a query gene, even though RNU4ATAC is a SnRNA and defects in this gene causes microcephaly at birth.

Conclusion This study aimed to provide novel insights to the pathomechanism of CM. By performing an in silico systematic analyses, we built a composite biological network for CM from which we implied the biological processes of telomere biology and tRNA metabolic processes involved in CM. Using this approach, we confirmed already known biological processes underlying CM such as defects in DNA damage 113

Chapter 5 repair and cell cycle control, and centrosome-, spindle pole- and microtubule-related processes. We further compiled a list of 100 biologically plausible candidate genes for CM, which was backed by existing information in OMIM, MGI or HGMD from both human and mouse models. We propose that a similar approach is applicable to other rare Mendelian diseases.

Acknowledgments We thank Kate Mc Intyre for editing the manuscript. Omid Jazayeri was financially supported by a scholarship from Graduate School of Medical Sciences (GSMS), University of Groningen, Groningen, the Netherlands.

Web Links Online Mendelian Inheritance in Man, http://omim.org/ GeneMANIA, http://www.genemania.org and also http://pages.genemania.org/help/ Atlas of developing human brain, http://www.brainspan.org GO terms associated gene list for entire genome extracted from GeneMANIA, http://www.genemania.org/data/r10_GoCategories/ Mouse Genome Informatics (MGI), http://www.informatics.jax.org/ Calculation of Hypergeometric probability, http://www.geneprof.org/GeneProf/tools/hypergeometric.jsp Gene Ontology Consortium, http://geneontology.org/ Database of Essential Genes (DGE), http://www.essentialgene.org/ Human Gene Mutation Database (HGMD) http://www.hgmd.cf.ac.uk/ac/index.php

114

In silico network analysis of congenital microcephaly

References 1. Opitz J, Holt M (1989) Microcephaly: general considerations and aids to nosology. Journal of craniofacial genetics and developmental biology 10: 175– 204. 2. Woods CG, Parker A (2013) Investigating microcephaly. Archives of disease in childhood 98: 707–713. 3. Woods CG, Bond J, Enard W (2005) Autosomal recessive primary microcephaly (MCPH): a review of clinical, molecular, and evolutionary findings. The American Journal of Human Genetics 76: 717–728. 4. Alcantara D, O’Driscoll M (2014) Congenital microcephaly. American Journal of Medical Genetics Part C: Seminars in Medical Genetics 166: 124– 139. 5. Morris-Rosendahl DJ, Kaindl AM (2015) What next-generation sequencing (NGS) technology has enabled us to learn about primary autosomal recessive microcephaly (MCPH). Molecular and cellular probes 29: 271–281. 6. Khan MA, Rupp VM, Orpinell M, Hussain MS, Altmüller J, et al. (2014) A 5 missense mutation in the PISA domain of HsSAS-6 causes autosomal recessive primary microcephaly in a large consanguineous Pakistani family. Human molecular genetics 23: 5940–5949 . 7. Barbelanne M, Tsang WY (2014) Molecular and Cellular Basis of Autosomal Recessive Primary Microcephaly. BioMed Research International: Article ID 547986. 8. Guemez-Gamboa A, Nguyen LN, Yang H, Zaki MS, Kara M, et al. (2015) Inactivating mutations in MFSD2A, required for omega-3 fatty acid transport in brain, cause a lethal microcephaly syndrome. Nature genetics 47: 809–813. 9. Kaindl AM, Passemard S, Kumar P, Kraemer N, Issa L, et al. (2010) Many roads lead to primary autosomal recessive microcephaly. Progress in neurobiology 90: 363–383. 10. Awad S, Al-Dosari MS, Al-Yacoub N, Colak D, Salih MA, et al. (2013) Mutation in PHC1 implicates chromatin remodeling in primary microcephaly pathogenesis. Human molecular genetics 22: 2200–2213. 11. Yang YJ, Baltus AE, Mathew RS, Murphy EA, Evrony GD, et al. (2012) Microcephaly gene links trithorax and REST/NRSF to control neural stem cell proliferation and differentiation. Cell 151: 1097–1112. 12. Rabbani B, Mahdieh N, Hosomichi K, Nakaoka H, Inoue I (2012) Next- generation sequencing: impact of exome sequencing in characterizing Mendelian disorders. Journal of human genetics 57: 621–632.

115

Chapter 5

13. Jamra RA, Philippe O, Raas-Rothschild A, Eck SH, Graf E, et al. (2011) Adaptor protein complex 4 deficiency causes severe autosomal-recessive intellectual disability, progressive spastic paraplegia, shy character, and short stature. The American Journal of Human Genetics 88: 788–795. 14. Rump P, Jazayeri O, van Dijk-Bos KK, Johansson LF, van Essen AJ, et al. (2016) Whole-exome sequencing is a powerful approach for establishing the etiological diagnosis in patients with intellectual disability and microcephaly. BMC Medical Genomics 9: 7. 15. De Ligt J, Willemsen MH, van Bon BW, Kleefstra T, Yntema HG, et al. (2012) Diagnostic exome sequencing in persons with severe intellectual disability. New England Journal of Medicine 367: 1921–1929. 16. Lohmann K, Klein C (2014) Next Generation Sequencing and the Future of Genetic Diagnosis. Neurotherapeutics 11: 699–707. 17. Novarino G, Fenstermaker AG, Zaki MS, Hofree M, Silhavy JL, et al. (2014) Exome sequencing links corticospinal motor neuron disease to common neurodegenerative disorders. Science 343: 506–511. 18. Vaez A, Jansen R, Prins BP, Hottenga J-J, de Geus EJ, et al. (2015) An in silico Post-GWAS Analysis of C-Reactive Protein Loci Suggests an Important Role for Interferons. Circulation: Cardiovascular Genetics 8: 487–497. 19. Zuberi K, Franz M, Rodriguez H, Montojo J, Lopes CT, et al. (2013) GeneMANIA prediction server 2013 update. Nucleic Acids Research 41: 115–122. 20. Fietz SA, Lachmann R, Brandl H, Kircher M, Samusik N, et al. (2012) Transcriptomes of germinal zones of human and mouse fetal neocortex suggest a role of extracellular matrix in progenitor self-renewal. Proceedings of the National Academy of Sciences 109: 11836–11841. 21. Garg R, Caino MC, Kazanietz MG (2013) Regulation of Transcriptional Networks by PKC Isozymes: Identification of c-Rel as a Key Transcription Factor for PKC-Regulated Genes. PLoS One 8: e67319. 22. Farris SP, Miles MF (2013) Fyn-dependent gene networks in acute ethanol sensitivity. PLoS One 8: e82435. 23. Pennisi E (2012) ENCODE project writes eulogy for junk DNA. Science 337: 1159–1161 . 24. Kleefstra T, Schenck A, Kramer JM, van Bokhoven H (2014) The genetics of cognitive epigenetics. Neuropharmacology 80: 83–94. 25. Luo H, Lin Y, Gao F, Zhang C-T, Zhang R (2014) DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements. Nucleic Acids Research 42: 574–80. 116

In silico network analysis of congenital microcephaly

26. Dickerson JE, Zhu A, Robertson DL, Hentges KE (2011) Defining the role of essential genes in human disease. PLoS One 6: e27368. 27. Georgi B, Voight BF, Bucan M (2013) From mouse to human: evolutionary genomics analysis of human orthologs of essential genes. PLOS Genetics 9: e1003484. 28. Smoot ME, Ono K, Ruscheinski J, Wang P-L, Ideker T (2011) Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27: 431–432. 29. Alazami AM, Patel N, Shamseldin HE, Anazi S, Al-Dosari MS, et al. (2015) Accelerating novel candidate gene discovery in neurogenetic disorders via whole-exome sequencing of prescreened multiplex consanguineous families. Cell Report 10: 148–161. 30. Budde BS, Namavar Y, Barth PG, Poll-The BT, Nürnberg G, et al. (2008) tRNA splicing endonuclease mutations cause pontocerebellar hypoplasia. Nature Genetics 40: 1113–1118. 31. Karaca E, Harel T, Pehlivan D, Jhangiani SN, Gambin T, et al. (2015) Genes 5 that Affect Brain Structure and Function Identified by Rare Variant Analyses of Mendelian Neurologic Disease. Neuron 88: 499–513. 32. Willemsen MH, Ba W, Wissink-Lindhout WM, de Brouwer AP, Haas SA, et al. (2014) Involvement of the kinesin family members KIF4A and KIF5C in intellectual disability and synaptic function. Journal of medical genetics 51: 487–494. 33. Kim H, Lee O-H, Xin H, Chen L-Y, Qin J, et al. (2009) TRF2 functions as a protein hub and regulates telomere maintenance by recognizing specific peptide motifs. Nature structural & molecular biology 16: 372–379. 34. Shi L, Li M, Su B (2012) MCPH1/BRIT1 represses transcription of the human telomerase reverse transcriptase gene. Gene 495: 1–9. 35. Venkatesh T, Suresh PS (2014) Emerging roles of MCPH1: Expedition from primary microcephaly to cancer. European Journal of Cell Biology 93: 98– 105. 36. McNees CJ, Tejera AM, Martinez P, Murga M, Mulero F, et al. (2010) ATR suppresses telomere fragility and recombination but is dispensable for elongation of short telomeres by telomerase. Journal of Cell Biology 188: 639–652. 37. Mokrani-Benhelli H, Gaillard L, Biasutto P, Le Guen T, Touzot F, et al. (2013) Primary microcephaly, impaired DNA replication, and genomic instability caused by compound heterozygous ATR mutations. Human mutation 34: 374–384. 117

Chapter 5

38. Badie S, Carlos AR, Folio C, Okamoto K, Bouwman P, et al. (2015) BRCA1 and CtIP promote alternative non-homologous end-joining at uncapped telomeres. The EMBO journal 34: 410–424. 39. Chrzanowska KH, Gregorek H, Dembowska-Bagi’nska B zenna, Kalina MA, Digweed M (2012) Nijmegen breakage syndrome (NBS). Orphanet Journal of Rare Diseases 7: 13. 40. Berardinelli F, di Masi A, Salvatore M, Banerjee S, Myung K, et al. (2007) A case report of a patient with microcephaly, facial dysmorphism, chromosomal radiosensitivity and telomere length alterations closely resembling “Nijmegen breakage syndrome” phenotype. European journal of medical genetics 50: 176–187. 41. Waltes R, Kalb R, Gatei M, Kijas AW, Stumm M, et al. (2009) Human RAD50 deficiency in a Nijmegen breakage syndrome-like disorder. American Journal of Human Genetics 84: 605–616. 42. Matsumoto Y, Miyamoto T, Sakamoto H, Izumi H, Nakazawa Y, et al. (2011) Two unrelated patients with MRE11A mutations and Nijmegen breakage syndrome-like severe microcephaly. DNA Repair (Amst) 10: 314– 321. 43. Chai W, Sfeir AJ, Hoshiyama H, Shay JW, Wright WE (2006) The involvement of the Mre11/Rad50/Nbs1 complex in the generation of G- overhangs at human telomeres. EMBO Report 7: 225–230. 44. Wang RC, Smogorzewska A, de Lange T (2004) Homologous recombination generates T-loop-sized deletions at human telomeres. Cell 119: 355–368. 45. Wu Y, Xiao S, Zhu X-D (2007) MRE11-RAD50-NBS1 and ATM function as co-mediators of TRF1 in telomere length control. Nature Structural Molecular Biology 14: 832–840. 46. Diotti R, Loayza D (2011) Shelterin complex and associated factors at human telomeres. Nucleus 2: 119–135. 47. Deng Z, Norseen J, Wiedmer A, Riethman H, Lieberman PM (2009) TERRA RNA binding to TRF2 facilitates heterochromatin formation and ORC recruitment at telomeres. Molecular Cell 35: 403–413. 48. Wu Y, Mitchell TR, Zhu X-D (2008) Human XPF controls TRF2 and telomere length maintenance through distinctive mechanisms. Mechanisms of Ageing and Development 129: 602–610. 49. Uringa E-J, Youds JL, Lisaingo K, Lansdorp PM, Boulton SJ (2011) RTEL1: an essential helicase for telomere maintenance and the regulation of homologous recombination. Nucleic Acids Research 39: 1647–1655.

118

In silico network analysis of congenital microcephaly

50. Rudnik-Schneborn S, Barth PG, Zerres K (2014) Pontocerebellar hypoplasia. American Journal of Medical Genetics Part C: Seminars in Medical Genetics 166: 173–183. 51. Zhang X, Ling J, Barcia G, Jing L, Wu J, et al. (2014) Mutations in QARS, encoding glutaminyl-tRNA synthetase, cause progressive microcephaly, cerebral-cerebellar atrophy, and intractable seizures. The American Journal of Human Genetics 94: 547–558. 52. Arboleda VA, Lee H, Dorrani N, Zadeh N, Willis M, et al. (2015) De Novo Nonsense Mutations in KAT6A, a Lysine Acetyl-Transferase Gene, Cause a Syndrome Including Microcephaly and Global Developmental Delay. The American Journal of Human Genetics 96: 498––506. 53. Gillis D, Krishnamohan A, Yaacov B, Shaag A, Jackman JE, et al. (2014) TRMT10A dysfunction is associated with abnormalities in glucose homeostasis, short stature and microcephaly. Journal of medical genetics 51: 581–586. 54. Weitzer S, Hanada T, Penninger JM, Martinez J (2015) CLP1 as a novel 5 player in linking tRNA splicing to neurodegenerative disorders. Wiley Interdisciplinary Reviews: RNA 6: 47–63. 55. Koch L (2014) Disease genetics: tRNA splicing defect underlies brain disorder. Nature Reviews Genetics 15: 361–361. 56. Mirzaa GM, Vitre B, Carpenter G, Abramowicz I, Gleeson JG, et al. (2014) Mutations in CENPE define a novel kinetochore-centromeric mechanism for microcephalic primordial dwarfism. Human genetics 133: 1023–1039. 57. Feng Y, Walsh CA (2004) Mitotic Spindle Regulation by Nde1 Controls Cerebral Cortical Size. Neuron 44: 279–293. 58. Waters AM, Asfahani R, Carroll P, Bicknell L, Lescai F, et al. (2015) The kinetochore protein, CENPF, is mutated in human ciliopathy and microcephaly phenotypes. Journal of medical genetics 52: 147–156. 59. Lizarraga SB, Margossian SP, Harris MH, Campagna DR, Han A-P, et al. (2010) Cdk5rap2 regulates centrosome function and chromosome segregation in neuronal progenitors. Development 137: 1907–1917. 60. Genin A, Desir J, Lambert N, Biervliet M, Van Der Aa N, et al. (2012) Kinetochore KMN network gene CASC5 mutated in primary microcephaly. Human molecular genetics 21: 5306–5317. 61. Ghongane P, Kapanidou M, Asghar A, Elowe S, Bolanos-Garcia VM (2014) The dynamic protein Knl1-a kinetochore rendezvous. Journal of cell science 127: 1–9.

119

Chapter 5

62. Hussain MS, Battaglia A, Szczepanski S, Kaygusuz E, Toliat MR, et al. (2014) Mutations in CKAP2L, the Human Homolog of the Mouse Radmis Gene, Cause Filippi Syndrome. The American Journal of Human Genetics 95: 622– 632. 63. Song W, Li W, Feng J, Heston LL, Scaringe WA, et al. (2008) Identification of high risk DISC1 structural variants with a 2% attributable risk for schizophrenia. Biochemical and biophysical research communications 367: 700–706. 64. Mitchell K, Porteous D (2011) Rethinking the genetic architecture of schizophrenia. Psychological medicine 41: 19–32. 65. Prasad AN, Bunzeluk K, Prasad C, Chodirker BN, Magnus KG, et al. (2007) Agenesis of the corpus callosum and cerebral anomalies in inborn errors of metabolism. Congenital Anomalies (Kyoto) 47: 125–135. 66. Osbun N, Li J, O’Driscoll MC, Strominger Z, Wakahiro M, et al. (2011) Genetic and functional analyses identify DISC1 as a novel callosal agenesis candidate gene. American Journal of Medical Genetics A 155A: 1865–1876. 67. Hart T, Chandrashekhar M, Aregger M, Steinhart Z, Brown KR, et al. (2015) High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype- Specific Cancer Liabilities. Cell 163: 1515–1526. 68. Wang T, Birsoy K, Hughes NW, Krupczak KM, Post Y, et al. (2015) Identification and characterization of essential genes in the human genome. Science 350: 1096–1101. 69. Blomen VA, Májek P, Jae LT, Bigenzahn JW, Nieuwenhuis J, et al. (2015) Gene essentiality and synthetic lethality in haploid human cells. Science 350: 1092–1096. 70. Lek M, Karczewski K, Minikel E, Samocha K, Banks E, et al. (2015) Analysis of protein-coding genetic variation in 60,706 humans. bioRxiv: early release. 71. Ercan-Sencicek AG, Jambi S, Franjic D, Nishimura S, Li M, et al. (2015) Homozygous loss of DIAPH1 is a novel cause of microcephaly in humans. European Journal of Human Genetics 23: 165–172 . 72. Ala U, Piro RM, Grassi E, Damasco C, Silengo L, et al. (2008) Prediction of human disease genes by human-mouse conserved coexpression analysis. PLoS computational biology 4: e1000043. 73. Higgins J, Midgley C, Bergh A-M, Bell S, Askham J, et al. (2010) Human ASPM participates in spindle organisation, spindle orientation and cytokinesis. BMC Cell Biology 11: 85. 74. Ogi T, Walker S, Stiff T, Hobson E, Limsirichaikul S, et al. (2012) Identification of the First ATRIP-Deficient Patient and Novel Mutations in 120

In silico network analysis of congenital microcephaly

ATR Define a Clinical Spectrum for ATR-ATRIP Seckel Syndrome. PLoS genetics 8: e1002945. 75. Issa L, Mueller K, Seufert K, Kraemer N, Rosenkotter H, et al. (2013) Clinical and cellular features in patients with primary autosomal recessive microcephaly and a novel CDK5RAP2 mutation. Orphanet Journal of Rare Disease 8: 59. 76. Hussain MS, Baig SM, Neumann S, Peche VS, Szczepanski S, et al. (2013) CDK6 associates with the centrosome during mitosis and is mutated in a large Pakistani family with primary microcephaly. Human molecular genetics 22: 5199–5214. 77. Guernsey DL, Jiang H, Hussin J, Arnold M, Bouyakdan K, et al. (2010) Mutations in centrosomal protein CEP152 in primary microcephaly families linked to MCPH4. The American Journal of Human Genetics 87: 40–51. 78. Sir J-H, Barr AR, Nicholas AK, Carvalho OP, Khurshid M, et al. (2011) A primary microcephaly protein complex forms a ring around parental centrioles. Nature genetics 43: 1147–1153. 5 79. Capo-Chichi J-M, Boissel S, Brustein E, Pickles S, Fallet-Bianco C, et al. (2015) Disruption of CLPB is associated with congenital microcephaly, severe encephalopathy and 3-methylglutaconic aciduria. Journal of medical genetics 52: 303–311. 80. Capo-Chichi J-M, Bharti SK, Sommers JA, Yammine T, Chouery E, et al. (2013) Identification and biochemical characterization of a novel mutation in DDX11 causing Warsaw breakage syndrome. Human mutation 34: 103–107. 81. Van der Lelij P, Chrzanowska KH, Godthelp BC, Rooimans MA, Oostra AB, et al. (2010) Warsaw breakage syndrome, a cohesinopathy associated with mutations in the XPD helicase family member DDX11/ChlR1. The American Journal of Human Genetics 86: 262–266. 82. Shaheen R, Faqeih E, Ansari S, Abdel-Salam G, Al-Hassnan ZN, et al. (2014) Genomic analysis of primordial dwarfism reveals novel disease genes. Genome Research 24: 291–299. 83. Courcet J-B, Faivre L, Malzac P, Masurel-Paulet A, Lopez E, et al. (2012) The DYRK1A gene is a cause of syndromic intellectual disability with severe microcephaly and epilepsy. Journal of medical genetics 49: 731–736. 84. Van Bon B, Hoischen A, Hehir-Kwa J, de Brouwer A, Ruivenkamp C, et al. (2011) Intragenic deletion in DYRK1A leads to mental retardation and primary microcephaly. Clinical genetics 79: 296–299. 85. Lines MA, Huang L, Schwartzentruber J, Douglas SL, Lynch DC, et al. (2012) Haploinsufficiency of a spliceosomal GTPase encoded by EFTUD2 121

Chapter 5

causes mandibulofacial dysostosis with microcephaly. American Journal of Human Genetics 90: 369–377. 86. Arnold SJ, Huang G-J, Cheung AF, Era T, Nishikawa S-I, et al. (2008) The T- box transcription factor Eomes/Tbr2 regulates neurogenesis in the cortical subventricular zone. Genes & development 22: 2479–2484. 87. Baala L, Briault S, Etchevers HC, Laumonnier F, Natiq A, et al. (2007) Homozygous silencing of T-box transcription factor EOMES leads to microcephaly with polymicrogyria and corpus callosum agenesis. Nature genetics 39: 454–456. 88. Shalev SA, Tenenbaum-Rakover Y, Horovitz Y, Paz VP, Ye H, et al. (2014) Microcephaly, epilepsy, and neonatal diabetes due to compound heterozygous mutations in IER3IP1: insights into the natural history of a rare disorder. Pediatric Diabetes 15: 252–256. 89. Poulton CJ, Schot R, Kia SK, Jones M, Verheijen FW, et al. (2011) Microcephaly with simplified gyration, epilepsy, and infantile diabetes linked to inappropriate apoptosis of neural progenitors. The American Journal of Human Genetics 89: 265–276. 90. Mirzaa GM, Enyedi L, Parsons G, Collins S, Medne L, et al. (2014) Congenital microcephaly and chorioretinopathy due to de novo heterozygous KIF11 mutations: Five novel mutations and review of the literature. American Journal of Medical Genetics Part A 164: 2879–2886. 91. Ostergaard P, Simpson MA, Mendola A, Vasudevan P, Connell FC, et al. (2012) Mutations in KIF11 Cause Autosomal-Dominant Microcephaly Variably Associated with Congenital Lymphedema and Chorioretinopathy. The American Journal of Human Genetics 90: 356–362. 92. Filges I, Nosova E, Bruder E, Tercanli S, Townsend K, et al. (2014) Exome sequencing identifies mutations in KIF14 as a novel cause of an autosomal recessive lethal fetal ciliopathy phenotype. Clinical genetics 86: 220–228. 93. Poirier K, Lebrun N, Broix L, Tian G, Saillour Y, et al. (2013) Mutations in TUBG1, DYNC1H1, KIF5C and KIF2A cause malformations of cortical development and microcephaly. Nature genetics 45: 639–647. 94. Murray JE, Bicknell LS, Yigit G, Duker AL, Kogelenberg M, et al. (2014) Extreme growth failure is a common presentation of ligase IV deficiency. Human mutation 35: 76–85. 95. Jackson AP, Eastwood H, Bell SM, Adu J, Toomes C, et al. (2002) Identification of microcephalin, a protein implicated in determining the size of the human brain. The American Journal of Human Genetics 71: 136–142.

122

In silico network analysis of congenital microcephaly

96. Jackson AP, McHale DP, Campbell DA, Jafri H, Rashid Y, et al. (1998) Primary autosomal recessive microcephaly (MCPH1) maps to chromosome 8p22-pter. The American Journal of Human Genetics 63: 541–546. 97. Van Der Burgt I, Chrzanowska KH, Smeets D, Weemaes C (1996) Nijmegen breakage syndrome. Journal of medical genetics 33: 153–156. 98. Chrzanowska K, Kleijer W, Krajewska-Walasek M, Bialecka M, Gutkowska A, et al. (1995) Eleven Polish patients with microcephaly, immunodeficiency, and chromosomal instability: the Nijmegen breakage syndrome. American journal of medical genetics 57: 462–471. 99. Buck D, Malivert L, de Chasseval R, Barraud A, Fondanèche M-C, et al. (2006) Cernunnos, a novel nonhomologous end-joining factor, is mutated in human immunodeficiency with microcephaly. Cell 124: 287–299. 100. Bober MB, Niiler T, Duker AL, Murray JE, Ketterer T, et al. (2012) Growth in individuals with Majewski osteodysplastic primordial dwarfism type II caused by pericentrin mutations. American Journal of Medical Genetics A 158A: 2719–2725. 5 101. Edery P, Marcaillou C, Sahbatou M, Labalme A, Chastang J, et al. (2011) Association of TALS developmental disorder with defect in minor splicing component U4atac snRNA. Science 332: 240–243. 102. Kumar A, Girimaji SC, Duvvari MR, Blanton SH (2009) Mutations in STIL, Encoding a Pericentriolar and Centrosomal Protein, Cause Primary Microcephaly. The American Journal of Human Genetics 84: 286–290. 103. Shamseldin HE, Elfaki M, Alkuraya FS (2012) Exome sequencing reveals a novel Fanconi group defined by XRCC2 mutation. Journal of medical genetics 49: 184–186. 104. Rosin N, Elcioglu NH, Beleggia F, Isgüven P, Altmüller J, et al. (2015) Mutations in XRCC4 causes primary microcephaly, short stature, and increased genomic instability. Human molecular genetics 24: 3708–3717. 105. Murray JE, van der Burg M, IJspeert H, Carroll P, Wu Q, et al. (2015) Mutations in the NHEJ Component XRCC4 Cause Primordial Dwarfism. The American Journal of Human Genetics 96: 412–424.

123

Chapter 5 Chapter 5

Supplementary materials

124

In silico network analysis of congenital microcephaly

5

125

Chapter 5

Supplementary Table 2: The name of genes grouped in class 3 in alphabetic order and the reason to exclude them from analysis. Gene Phenotype Reason to exclude Referenc* MIM number 1 ALG1 608540 Microcephaly neonatal and also postnatal [1] 2 ALG3 601110 Microcephaly at birth is not clear in OMIM; but head [2] circumference is normal at birth based on literature 3 ASXL1 605039 3/7 patients with ASXL1 had primary microcephaly [3]

4 ATRIP 606605 Microcephaly in one patient without confirmation by [4] animal model 5 ATRX 301040 Microcephaly at birth is not clear in OMIM; 75% of [5] ATRX mutations show microcephaly 6 B3GALTL 261540 Microcephaly (22% of patients); onset is not clear in OMIM OMIM 7 BCAP31 300475 Microcephaly onset is not clear in both OMIM and OMIM literature 8 BLM 210900 Microcephaly onset is not clear in both OMIM and OMIM; [6] literature 9 BRCA2 600185 Microcephaly at birth has been reported in a few [7–10] patients, but incomplete penetrance 10 BRP44L 614741 Microcephaly, progressive only in 1 patient OMIM 11 BUB1B 257300 Microcephaly onset is not clear in both OMIM and [11] literature 12 CASK 300749 Based on OMIM, microcephaly and onset at birth or [12] early infancy; but postnatal microcephaly based on Seltzer (2014) 13 CDC6 613805 Microcephaly at birth is not clear in OMIM; [13] incomplete penetrance 14 CDT1 613804 Microcephaly at birth is not clear in OMIM; [13] incomplete penetrance 15 CEP57 614114 Microcephaly onset is not clear in both OMIM and OMIM literature 16 CLPP 614129 Microcephaly (in few patients); onset of disease is not OMIM clear in OMIM 17 COG4 613489 One patient had onset at age 4 months after normal OMIM development 18 COG6 615328 Microcephaly at birth but with incomplete penetrance OMIM 19 COX7B 300887 Microcephaly (3/4 patients); onset is not clear OMIM 20 CRIPT 615789 Microcephaly onset is not clear in literature [7] 21 CTCF 615502 Microcephaly onset is not clear in both OMIM and OMIM literature 22 CTSD 610127 Based on Schulz et al.(2013) age of onset is from [14] birth to adulthood 23 CYB5R3 250800 Microcephaly onset is not clear in both OMIM and OMIM literature 24 DKC1 305000 Microcephaly at birth is not clear in OMIM and OMIM microcephaly seen in Hoyeraal–Hreidarsson syndrome variant 25 DLD 246900 Onset usually in the neonatal period; later onset has OMIM been reported as well. 26 DNM1L 614388 Based on OMIM, microcephaly and onset in first [15] days of life reported only in one patient without animal model 27 DPAGT1 191350 Two diseases, one microcephaly at birth and the OMIM second without microcephaly 28 EFEMP2 614437 Microcephaly (rare): onset is not clear OMIM 29 EHMT1 610253 Microcephaly onset is not clear in both OMIM and OMIM literature 30 EIF2AK3 226980 Onset is not clear in OMIM; incomplete penetrance [16] 31 ERCC2 601675 Microcephaly onset is not clear in both OMIM and OMIM 126

In silico network analysis of congenital microcephaly

Gene Phenotype Reason to exclude Referenc* MIM number literature 32 ERCC3 610651 Microcephaly onset is not clear in both OMIM and OMIM literature 33 ERCC4 615272 Microcephaly; onset in infancy OMIM (XPF) 34 ESCO2 609353 Microcephaly onset is not clear in both OMIM and OMIM literature 35 FAM20C 259775 Microcephaly onset is not clear in both OMIM and OMIM literature 36 FBXL4 615471 Onset at birth or early infancy; microcephaly (in some OMIM patients) 37 FRAS1 219000 Microcephaly onset is not clear in both OMIM and OMIM literature 38 FREM2 219000 Microcephaly onset is not clear in both OMIM and OMIM literature 39 FRMD4A 616305 Congenital microcephaly based on one family; no [17] functional study or animal model 40 FTO 612938 Microcephaly onset is not clear in both OMIM and OMIM literature 41 G6PC3 612541 Microcephaly onset is not clear in both OMIM and OMIM literature 42 GFM1 609060 Mild microcephaly at birth OMIM; [18] 43 GLI2 610829 Microcephaly at birth is not clear in OMIM; OMIM incomplete penetrance 5 44 GLI2 165230 Two disease: one microcephaly (but unclear onset of OMIM disease) the other without microcephaly 45 GMPPB 615320 Three diseases with microcephaly but with different OMIM age of onset from birth to 4 years 46 GNPAT 222765 Microcephaly onset is not clear in both OMIM and OMIM; [19] literature 47 GNPAT 222765 Microcephaly onset is not clear in both OMIM and OMIM literature 48 HCCS 309801 Onset is not clear in OMIM; incomplete penetrance [20] 49 HDAC8 300269 Two diseases: one microcephaly (but unclear onset of OMIM disease) the other without microcephaly 50 HSPD1 118190 Two diseases: one without microcephaly the other OMIM with microcephaly (onset between birth and 3 months of age) 51 KCNJ2 600681 Three diseases: one microcephaly (unclear onset), one OMIM without microcephaly 52 KCNT1 608167 Two diseases: one microcephaly (but unclear onset of OMIM disease) the other without microcephaly 53 LARP7 615071 Microcephaly onset is not clear in both OMIM and OMIM literature 54 LIAS 614462 Microcephaly, Onset in first days of life; only in one [21] patient 55 LINS1 614340 Microcephaly (in some patients); onset is not clear in OMIM OMIM 56 MASP1 257920 Microcephaly onset is not clear in both OMIM and OMIM literature 57 MBTPS2 300294 Three diseases: two without microcephaly, one OMIM microcephaly at birth

58 MKS1 249000 Microcephaly at birth is not clear in OMIM; [22] incomplete penetrance 59 MLL2 147920 Microcephaly onset is not clear in OMIM; Some [23] patients did not present microcephaly 60 MPLKIP 234050 Microcephaly onset is not clear in OMIM and OMIM literature

127

Chapter 5

Gene Phenotype Reason to exclude Referenc* MIM number 61 MRE11A 604391 Two unrelated patients with congenital microcephaly [24] but normally Ataxia-telangiectasia-like disorder exhibit without microcephaly 62 MYCN 164280 Microcephaly at birth is not clear in OMIM; OMIM microcephaly (79% of cases) 63 NALCN 615419 Onset at birth or in infancy(OMIM); but brain MRI [25] in six patients was normal 64 NIN 614851 Microcephaly onset is not clear in both OMIM and OMIM; literature [26,27] 65 NIPBL 122470 Microcephaly onset is not clear in OMIM; some [28] patients did not show microcephaly 66 NOLA3 224230 Microcephaly onset is not clear in both OMIM and [29] literature 67 NSDHL 300831 Microcephaly onset is not clear in both OMIM and [30] literature 68 NSMCE2 NA Two reported patients, one microcephaly at birth and [31] for the other OFC at birth was not clear 69 ORC1 224690 Microcephaly onset is not clear in both OMIM and [13] literature 70 ORC4 613800 Microcephaly at birth is not clear in OMIM; [13] incomplete penetrance 71 ORC6 613803 Microcephaly at birth is not clear in OMIM; [13] incomplete penetrance 72 PAK3 300558 Microcephaly onset is not clear in both OMIM and [32] literature 73 PARS 616140 Microcephaly (1 patient); onset in the first year of life OMIM 74 PDHX 245349 Onset at birth or in early childhood(OMIM); 3/26 [33] patients reviewed presented with microcephaly 75 PEX7 215100 Microcephaly onset is not clear in both OMIM and [34] literature 76 PGAP2 614207 Microcephaly at birth but incomplete penetrance OMIM 77 PHF6 301900 Microcephaly at birth is not clear in OMIM; [35] incomplete penetrance 78 PIGO 614749 Microcephaly onset at birth; incomplete penetrance [36] 79 PMM2 212065 Onset is not clear and microcephaly (50% of patients) OMIM 80 POMGNT1 606822 Three diseases: two microcephaly at birth and one OMIM without microcephaly 81 POR 201750 Microcephaly onset is not clear in both OMIM and OMIM literature 82 PORCN 305600 Microcephaly onset is not clear in both OMIM and OMIM literature 83 PQBP1 309500 Microcephaly at birth is not clear in OMIM; [37] incomplete penetrance 84 PRKDC 615966 Immunodeficiency 26, with or without neurologic OMIM abnormalities in OMIM; therefore incomplete penetrance 85 PTDSS1 151050 Microcephaly (in some patients); onset is not clear in OMIM OMIM 86 PTF1A 609069 Microcephaly onset is not clear in both OMIM and OMIM literature 87 RAD21 614701 Microcephaly onset is not clear in both OMIM and OMIM; [38] literature 88 RBPJ 614814 Microcephaly onset is not clear in both OMIM and OMIM literature

89 RPS6KA3 303600 Microcephaly onset is not clear in both OMIM and OMIM literature 90 RTEL1 615190 Microcephaly onset in early childhood OMIM 91 RTTN 614833 Microcephaly onset is not clear in both OMIM and [39] literature 128

In silico network analysis of congenital microcephaly

Gene Phenotype Reason to exclude Referenc* MIM number 92 SALL1 107480 Microcephaly onset is not clear in both OMIM and OMIM literature 93 SATB2 612313 Microcephaly onset is not clear in both OMIM and OMIM literature 94 SC4MOL 607545 Microcephaly onset is not clear in both OMIM and [40] literature 95 SC5DL 607330 Microcephaly onset is not clear in both OMIM and OMIM literature 96 SF3B4 154400 Microcephaly onset is not clear in both OMIM and OMIM literature 97 SIX3 157170 Microcephaly onset is not clear in both OMIM and [41] literature 98 SKI 182212 Microcephaly onset is not clear in both OMIM and OMIM literature 99 SLC35A3 615553 Microcephaly onset is not clear in both OMIM and OMIM literature 100 SNAP29 609528 Onset in first months of life OMIM 101 SOX2 206900 Microcephaly onset is not clear in both OMIM and OMIM literature 102 SPOCK1 602264 Microcephaly at birth but only in one patient without [42] animal model 103 STT3A 615596 Microcephaly; at birth is not clear; one family; no [43] animal model 104 STT3B 615597 Microcephaly at birth is not clear; one patient; no [43] 5 animal model 105 TBCE 241410 Microcephaly onset is not clear in both OMIM and [44] literature; 21 patients with similar mutation and no animal model and functional study 106 TFAP2A 113620 Microcephaly onset is not clear in both OMIM and OMIM literature 107 THOC6 613680 Microcephaly onset is not clear in both OMIM and [45] literature 108 TUBA1A 611603 Microcephaly at birth is not clear in OMIM; three [46] patients with normal neonate parameter 109 TUBA8 613180 Microcephaly at birth but incomplete penetrance OMIM 110 TUBB2B 610031 Microcephaly onset is not clear in both OMIM and [47] literature; Incomplete penetrance 111 TUBB3 614039 Microcephaly at birth but incomplete penetrance OMIM 112 TUBG1 615412 Microcephaly at birth is not clear in OMIM; OMIM incomplete penetrance [48] 113 UBE3B 244450 Microcephaly onset is not clear in both OMIM and OMIM literature 114 UBR1 243800 Microcephaly onset is not clear in both OMIM and OMIM literature 115 VIPAR 613404 Microcephaly onset is not clear in both OMIM and OMIM; [49] literature 116 VPS33B 208085 Microcephaly onset is not clear in both OMIM and OMIM; [50] literature 117 ZBTB18 612337 Onset is not clear in OMIM; incomplete penetrance; OMIM one patient 118 ZEB2 235730 Microcephaly at birth is not clear in OMIM; [51] incomplete penetrance *Data were obtained from references (numbers) or OMIM database.

129

Chapter 5

Supplementary Table 3: Additional biological functions which are likely to be involved in CM. These biological functions resulted from phase 2.3, analysis of 176 genes including 76 definite CM genes along with 100 predicted genes. These functions did not emerge from previous phase 2.2, which analyzed only 76 definite CM genes.

Based on 76 definite congenital microcephaly genes along with 100 predicted genes Biological function Number of Group Hypergeometric observed enrichment* probability‡ GO terms # 1 Chromosome and sister chromatid 5 (2.6%) 23.88 1.03×10-28 segregation 2 Kinetochore, centromere and 9 (4.6%) 14.82 7.77×10-25 condensed chromosome 3 DNA ligation 1 (0.5%) 58.77 1.02×10-08 4 Ubiquitin ligase complex 1 (0.5%) 19.06 6.27×10-07 5 Viral latency 2 (1%) 42.74 1.58×10-06 6 Response to ionizing radiation and 3 (1.5%) 3.79 2.18×10-06 stress 7 Midbody 1 (0.5%) 10.16 5.42×10-06 8 Meiosis 4 (2%) 9.46 8.66×10-06 9 Regulation of (1%) 9.14 1.08×10-05 organization 10 Antigen processing and presentation 7 (3.6%) 5.44 1.43×10-05 11 Regulation of ligase activity 2 (1%) 8.23 2.13×10-05 12 Aging 2 (1%) 10.37 2.28×10-05 13 Establishment or maintenance of cell 1 (0.5%) 9.40 3.96×10-05 polarity 14 Fibroblast proliferation 1 (0.5%) 18.08 6.30×10-05 15 Inositol lipid-mediated signaling 2 (1%) 5.44 1.02×10-04 16 Protein serine/threonine kinase 1 (0.5%) 4.05 1.57×10-04 activity 17 Insulin receptor 1 (0.5%) 27.12 1.59×10-04 18 Activation of signaling protein 1 (0.5%) 9.04 2.11×10-04 activity involved in unfolded protein response 19 Protein autophosphorylation 2 (1%) 5.52 2.49×10-04 #Number of GO terms with FDR<0.05 in each group. *Biological function group enrichment calculated according to formula in material method, phase 2. ‡Hypergeometric probability of drawing exactly number of successes in each biological function group from a sample size of 176 genes (See Web Links).

130

In silico network analysis of congenital microcephaly

5

131

Chapter 5

132

In silico network analysis of congenital microcephaly

5

133

Chapter 5

134

In silico network analysis of congenital microcephaly

5

135

Chapter 5

136

In silico network analysis of congenital microcephaly

5

137

Chapter 5

.

o o extra top biological functions

e models as models e well.

causing genes input as a query in phase 2. Purple nodes Purple 2. phase in as query input a genes causing

-

causing genes have been studied along with 100 predicted genes. To reduce To reduce 100 predicted genes. with have along genes studied been causing

-

human human (extracted from OMIM) or result in brain abnormalities in mouse models (extracted

known/possible biological functions underlying congenital microcephaly and involved genes identified in phase 2.3 identifiedphase in genes andmicrocephaly involved functions congenital underlying biological known/possible

In silico In phase 2.3 in which associated GO terms in 76 definite CM in definite GO terms 2.3 76 associated which in phase

The network created in in created network The

Supplementary Figure 1: Figure Supplementary the we complexity, visualized 12 common biological functions that emerged from both 2.2 and phase 2.3. phase Additionally, tw CM nodes blue dark display Green a with border. boxes two been have by displayed 2.3 phase identified display predicted genes in which defects lead to microcephaly in lethality exhibitingmous in prenatal genes Redmicrocephaly. display cause to suspected nodes abnormalities MGI);from these ‡ 138

In silico network analysis of congenital microcephaly

Supplementary references 1. Morava E, Vodopiutz J, Lefeber DJ, Janecke AR, Schmidt WM, et al. (2012) Defining the phenotype in congenital disorder of glycosylation due to ALG1 mutations. Pediatrics 130: e1034–9. 2. Denecke J, Kranz C, Von Kleist-Retzow JC, Bosse K, Herkenrath P, et al. (2005) Congenital disorder of glycosylation type Id: clinical phenotype, molecular analysis, prenatal diagnosis, and glycosylation of fetal proteins. Pediatric research 58: 248–253. 3. Hoischen A, van Bon BW, Rodriguez-Santiago B, Gilissen C, Vissers LE, et al. (2011) De novo nonsense mutations in ASXL1 cause Bohring-Opitz syndrome. Nature genetics 43: 729–731. 4. Ogi T, Walker S, Stiff T, Hobson E, Limsirichaikul S, et al. (2012) Identification of the First ATRIP-Deficient Patient and Novel Mutations in ATR Define a Clinical Spectrum for ATR-ATRIP Seckel Syndrome. PLoS genetics 8: e1002945. 5. Moncini S, Bedeschi M, Castronovo P, Crippa M, Calvello M, et al. (2013) ATRX mutation in two adult brothers with non-specific moderate intellectual disability identified by exome sequencing. Meta Gene 1: 102–108. 5 6. Arora H, Chacon AH, Choudhary S, McLeod MP, Meshkov L, et al. (2014) Bloom syndrome. International Journal of Dermatology 53: 798–802. 7. Shaheen R, Faqeih E, Ansari S, Abdel-Salam G, Al-Hassnan ZN, et al. (2014) Genomic analysis of primordial dwarfism reveals novel disease genes. Genome Research 24: 291–299. 8. Alter BP, Rosenberg PS, Brody LC (2007) Clinical and molecular features associated with biallelic mutations in FANCD1/BRCA2. Journal of medical genetics 44: 1–9. 9. Alcantara D, O’Driscoll M (2014) Congenital microcephaly. American Journal of Medical Genetics Part C: Seminars in Medical Genetics 166: 124–139. 10. McKinnon PJ (2009) DNA repair deficiency and neurological disease. Nature Reviews Neuroscience 10: 100–112. 11. Hanks S, Coleman K, Reid S, Plaja A, Firth H, et al. (2004) Constitutional aneuploidy and cancer predisposition caused by biallelic mutations in BUB1B. Nature genetics 36: 1159–1161. 12. Seltzer LE, Paciorkowski AR (2014) Genetic disorders associated with postnatal microcephaly. American Journal of Medical Genetics Part C: Seminars in Medical Genetics 166: 140–155. 13. De Munnik SA, Bicknell LS, Aftimos S, Al-Aama JY, van Bever Y, et al. (2012) Meier-Gorlin syndrome genotype-phenotype studies: 35 individuals with pre-replication complex gene mutations and 10 without molecular diagnosis. European Journal of Human Genetics 20: 598–606. 14. Schulz A, Kohlschütter A, Mink J, Simonati A, Williams R (2013) NCL Diseases-Clinical Perspectives. Biochimica et Biophysica Acta (BBA)- Molecular Basis of Disease: 1801–1806.

139

Chapter 5

15. Waterham HR, Koster J, van Roermund CW, Mooyer PA, Wanders RJ, et al. (2007) A lethal defect of mitochondrial and peroxisomal fission. New England Journal of Medicine 356: 1736–1741. 16. De Wit MCY, de Coo IFM, Julier C, Delépine M, Lequin MH, et al. (2006) Microcephaly and simplified gyral pattern of the brain associated with early onset insulin-dependent diabetes mellitus. Neurogenetics 7: 259–263. 17. Fine D, Flusser H, Markus B, Shorer Z, Gradstein L, et al. (2015) A syndrome of congenital microcephaly, intellectual disability and dysmorphism with a homozygous mutation in FRMD4A. European Journal of Human Genetics 23: 1729––1734. 18. Valente L, Tiranti V, Marsano RM, Malfatti E, Fernandez-Vizarra E, et al. (2007) Infantile encephalopathy and defective mitochondrial DNA translation in patients with mutations of mitochondrial elongation factors EFG1 and EFTu. The American Journal of Human Genetics 80: 44–58. 19. Buchert R, Tawamie H, Smith C, Uebe S, Innes AM, et al. (2014) A Peroxisomal Disorder of Severe Intellectual Disability, Epilepsy, and Cataracts Due to Fatty Acyl-CoA Reductase 1 Deficiency. The American Journal of Human Genetics 95: 602–610. 20. Van Rahden VA, Rau I, Fuchs S, Kosyna FK, de Almeida Jr HL, et al. (2014) Clinical spectrum of females with HCCS mutation: from no clinical signs to a neonatal lethal form of the microphthalmia with linear skin defects (MLS) syndrome. Orphanet Journal of Rare Diseases 9: 53. 21. Mayr JA, Zimmermann FA, Fauth C, Bergheim C, Meierhofer D, et al. (2011) Lipoic acid synthetase deficiency causes neonatal-onset epilepsy, defective mitochondrial energy metabolism, and glycine elevation. The American Journal of Human Genetics 89: 792–797. 22. Khaddour R, Smith U, Baala L, Martinovic J, Clavering D, et al. (2007) Spectrum of MKS1 and MKS3 mutations in Meckel syndrome: a genotype- phenotype correlation. Human mutation 28: 523–524. 23. Makrythanasis P, van Bon BW, Steehouwer M, Rodríguez-Santiago B, Simpson M, et al. (2013) MLL2 mutation detection in 86 patients with Kabuki syndrome: a genotype-phenotype study. Clinical Genetics 84: 539– 545. 24. Matsumoto Y, Miyamoto T, Sakamoto H, Izumi H, Nakazawa Y, et al. (2011) Two unrelated patients with MRE11A mutations and Nijmegen breakage syndrome-like severe microcephaly. DNA Repair (Amst) 10: 314–321. 25. Al-Sayed MD, Al-Zaidan H, Albakheet A, Hakami H, Kenana R, et al. (2013) Mutations in NALCN cause an autosomal-recessive syndrome with severe 140

In silico network analysis of congenital microcephaly

hypotonia, speech impairment, and cognitive delay. The American Journal of Human Genetics 93: 721–726. 26. Grosch M, Grüner B, Spranger S, Stütz AM, Rausch T, et al. (2013) Identification of a Ninein (NIN) mutation in a family with spondyloepimetaphyseal dysplasia with joint laxity (leptodactylic type)-like phenotype. Matrix Biology 32: 387–392. 27. Dauber A, LaFranchi SH, Maliga Z, Lui JC, Moon JE, et al. (2012) Novel microcephalic primordial dwarfism disorder associated with variants in the centrosomal protein ninein. The Journal of Clinical Endocrinology & Metabolism 97: E2140–E2151. 28. Mannini L, Cucco F, Quarantotti V, Krantz ID, Musio A (2013) Mutation spectrum and genotype-phenotype correlation in Cornelia de Lange syndrome. Human Mutation 34: 1589–1596. 29. Walne AJ, Vulliamy T, Marrone A, Beswick R, Kirwan M, et al. (2007) Genetic heterogeneity in autosomal recessive dyskeratosis congenita with one subtype due to mutations in the telomerase-associated protein NOP10. 5 Human molecular genetics 16: 1619–1629. 30. McLarren KW, Severson TM, du Souich C, Stockton DW, Kratz LE, et al. (2010) Hypomorphic temperature-sensitive alleles of NSDHL cause CK syndrome. American journal of human genetics 87: 905–914. 31. Payne F, Colnaghi R, Rocha N, Seth A, Harris J, et al. (2014) Hypomorphism for human NSMCE2/MMS21 causes primordial dwarfism and insulin resistance. The Journal of Clinical Investigation 124: 4028–4038. 32. Magini P, Pippucci T, Tsai I-C, Coppola S, Stellacci E, et al. (2014) A mutation in PAK3 with a dual molecular effect deregulates the RAS/MAPK pathway and drives an X-linked syndromic phenotype. Human molecular genetics 23: 3607–3617. 33. Tajir M, Arnoux JB, Boutron A, Elalaoui SC, De Lonlay P, et al. (2012) Pyruvate dehydrogenase deficiency caused by a new mutation of PDHX gene in two Moroccan patients. European journal of medical genetics 55: 535–540. 34. Motley AM, Brites P, Gerez L, Hogenhout E, Haasjes J, et al. (2002) Mutational spectrum in the PEX7 gene and functional analysis of mutant alleles in 78 patients with rhizomelic chondrodysplasia punctata type 1. The American Journal of Human Genetics 70: 612–624. 35. Carter MT, Picketts DJ, Hunter AG, Graham GE (2009) Further clinical delineation of the Bӧ rjeson-Forssman-Lehmann syndrome in patients with PHF6 mutations. American Journal of Medical Genetics Part A 149: 246– 250. 141

Chapter 5

36. Krawitz PM, Murakami Y, Hecht J, Krüger U, Holder SE, et al. (2012) Mutations in PIGO, a member of the GPI-anchor-synthesis pathway, cause hyperphosphatasia with mental retardation. American journal of human genetics 91: 146–151. 37. Germanaud D, Rossi M, Bussy G, Gérard D, Hertz-Pannier L, et al. (2011) The Renpenning syndrome spectrum: new clinical insights supported by 13 new PQBP1-mutated males. Clinical genetics 79: 225–235. 38. Deardorff MA, Wilde JJ, Albrecht M, Dickinson E, Tennstedt S, et al. (2012) RAD21 mutations cause a human cohesinopathy. The American Journal of Human Genetics 90: 1014–1027. 39. Kheradmand Kia S, Verbeek E, Engelen E, Schot R, Poot RA, et al. (2012) RTTN mutations link primary cilia function to organization of the human cerebral cortex. American journal of human genetics 91: 533–540. 40. He M, Smith LD, Chang R, Li X, Vockley J (2014) The role of sterol-C4- methyl oxidase in epidermal biology. Biochimica et Biophysica Acta (BBA)- Molecular and Cell Biology of Lipids 1841: 331–335. 41. Lacbawan F, Solomon BD, Roessler E, El-Jaick K, Domené S, et al. (2009) Clinical spectrum of SIX3-associated mutations in holoprosencephaly: correlation between genotype, phenotype and function. Journal of medical genetics 46: 389–398. 42. Dhamija R, Graham Jr JM, Smaoui N, Thorland E, Kirmani S (2014) Novel de novo SPOCK1 mutation in a proband with developmental delay, microcephaly and agenesis of corpus callosum. European journal of medical genetics 57: 181–184. 43. Shrimal S, Ng BG, Losfeld M-E, Gilmore R, Freeze HH (2013) Mutations in STT3A and STT3B cause two congenital disorders of glycosylation. Human molecular genetics 22: 4638–4645. 44. Naguib K, Gouda S, Elshafey A, Mohammed F, Bastaki L, et al. (2009) Sanjad-Sakati syndrome/Kenny-Caffey syndrome type 1: a study of 21 cases in Kuwait. Eastern Mediterranean Health Journal 15: 345–352. 45. Boycott KM, Beaulieu C, Puffenberger EG, McLeod DR, Parboosingh JS, et al. (2010) A novel autosomal recessive malformation syndrome associated with developmental delay and distinctive facies maps to 16ptel in the Hutterite population. American Journal of Medical Genetics Part A 152: 1349–1356. 46. Poirier K, Saillour Y, Fourniol F, Francis F, Souville I, et al. (2013) Expanding the spectrum of TUBA1A-related cortical dysgenesis to Polymicrogyria. European Journal of Human Genetics 21: 381–385. 142

In silico network analysis of congenital microcephaly

47. Romaniello R, Arrigoni F, Cavallini A, Tenderini E, Baschirotto C, et al. (2014) Brain malformations and mutations in alpha and beta tubulin genes: a review of the literature and description of two new cases. Developmental Medicine & Child Neurology 56: 354–360. 48. Poirier K, Lebrun N, Broix L, Tian G, Saillour Y, et al. (2013) Mutations in TUBG1, DYNC1H1, KIF5C and KIF2A cause malformations of cortical development and microcephaly. Nature genetics 45: 639–647. 49. Cullinane AR, Straatman-Iwanowska A, Zaucker A, Wakabayashi Y, Bruce CK, et al. (2010) Mutations in VIPAR cause an arthrogryposis, renal dysfunction and cholestasis syndrome phenotype with defects in epithelial polarization. Nature genetics 42: 303–312. 50. Gissen P, Johnson CA, Morgan NV, Stapelbroek JM, Forshew T, et al. (2004) Mutations in VPS33B, encoding a regulator of SNARE-dependent membrane fusion, cause arthrogryposis-renal dysfunction-cholestasis (ARC) syndrome. Nature genetics 36: 400–404. 51. Yamada Y, Nomura N, Yamada K, Matsuo M, Suzuki Y, et al. (2014) The 5 spectrum of ZEB2 mutations causing the Mowat-Wilson syndrome in Japanese populations. American Journal of Medical Genetics Part A 164: 1899–1908. 52. Alazami AM, Patel N, Shamseldin HE, Anazi S, Al-Dosari MS, et al. (2015) Accelerating novel candidate gene discovery in neurogenetic disorders via whole-exome sequencing of prescreened multiplex consanguineous families. Cell Report 10: 148–161. 53. Budde BS, Namavar Y, Barth PG, Poll-The BT, Nürnberg G, et al. (2008) tRNA splicing endonuclease mutations cause pontocerebellar hypoplasia. Nature Genetics 40: 1113–1118. 54. Matsuura S, Matsumoto Y, Morishima K, Izumi H, Matsumoto H, et al. (2006) Monoallelic BUB1B mutations and defective mitotic-spindle checkpoint in seven families with premature chromatid separation (PCS) syndrome. American Journal of Medical Genetics Part A 140: 358–367. 55. Georgi B, Voight BF, Bucan M (2013) From mouse to human: evolutionary genomics analysis of human orthologs of essential genes. PLOS Genetics 9: e1003484. 56. Woodbine L, Neal JA, Sasi N-K, Shimada M, Deem K, et al. (2013) PRKDC mutations in a SCID patient with profound neurological abnormalities. Journal of Clinical Investion 123: 2969–2980.

143

Chapter 5

57. Hart T, Chandrashekhar M, Aregger M, Steinhart Z, Brown KR, et al. (2015) High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype- Specific Cancer Liabilities. Cell 163: 1515–1526. 58. Wang T, Birsoy K, Hughes NW, Krupczak KM, Post Y, et al. (2015) Identification and characterization of essential genes in the human genome. Science 350: 1096–1101. 59. Blomen VA, Májek P, Jae LT, Bigenzahn JW, Nieuwenhuis J, et al. (2015) Gene essentiality and synthetic lethality in haploid human cells. Science 350: 1092–1096. 60. Lek M, Karczewski K, Minikel E, Samocha K, Banks E, et al. (2015) Analysis of protein-coding genetic variation in 60,706 humans. bioRxiv: early release. 61. Bogliolo M, Schuster B, Stoepker C, Derkunt B, Su Y, et al. (2013) Mutations in ERCC4, encoding the DNA-repair endonuclease XPF, cause Fanconi anemia. American journal of human genetics 92: 800–806. 62. Kashiyama K, Nakazawa Y, Pilz DT, Guo C, Shimada M, et al. (2013) Malfunction of nuclease ERCC1-XPF results in diverse clinical manifestations and causes Cockayne syndrome, xeroderma pigmentosum, and Fanconi anemia. The American Journal of Human Genetics 92: 807–819. 63. Karaca E, Harel T, Pehlivan D, Jhangiani SN, Gambin T, et al. (2015) Genes that Affect Brain Structure and Function Identified by Rare Variant Analyses of Mendelian Neurologic Disease. Neuron 88: 499–513. 64. Osbun N, Li J, O’Driscoll MC, Strominger Z, Wakahiro M, et al. (2011) Genetic and functional analyses identify DISC1 as a novel callosal agenesis candidate gene. Am J Med Genet A 155A: 1865–1876. 65. Pao GM, Zhu Q, Perez-Garcia CG, Chou S-J, Suh H, et al. (2014) Role of BRCA1 in brain development. Proceedings of the National Academy of Sciences 111: E1240–E1248. 66. Willemsen MH, Ba W, Wissink-Lindhout WM, de Brouwer AP, Haas SA, et al. (2014) Involvement of the kinesin family members KIF4A and KIF5C in intellectual disability and synaptic function. Journal of medical genetics 51: 487–494. 67. Zhang J, Sarge KD (2009) Identification of a polymorphism in the RING finger of human Bmi-1 that causes its degradation by the ubiquitin- proteasome system. FEBS letters 583: 960–964.ssion of genes involved in tWatanabe2, Takayuki Tohge2, Malcolm J. Hawkesford3, Rainer Hoefgen2, J. Theo M. Elzenga1 and Luit J. De Kok1

144

Chapter 6

General discussion

General discussion

The advent of next-generation sequencing (NGS) has revolutionized genomic research/diagnostics. NGS provides unprecedented means to carry out the most comprehensive genomic analyses. Indeed, the massive capacity of NGS has led to a paradigm shift from gene by gene analyses to whole genome analysis. As a result, the number of phenotypes with a demonstrated molecular basis in the Online Mendelian Inheritance in Man (OMIM) database has more than doubled from 2,048 in January 2007 [1] to 4,515 in 31th July 2015, mostly as a direct result of the rapid adaptation of researchers to using whole exome sequencing (WES) in their gene- hunting studies. Consequently, WES is now also being rapidly implemented for routine diagnostic testing. Clearly, for genetically highly heterogeneous disorders like deafness, blindness, mitochondrial disease and movement disorders [2], WES results in a significantly higher diagnostic yield than Sanger sequencing. This thesis focused on application of WES-based techniques as a molecular tool to diagnose/discover pathogenic variants in monogenic disorders. In the course of my studies, I have applied current state-of-the-art technological approaches by using WES in order to establish a concrete genetic diagnosis for unsolved patients. These unsolved cases had already been evaluated extensively by series of diagnostic tests including karyotyping, FISH, array-CGH and conventional Sanger sequencing, 6 yet the causal mutations remained to be identified. By solving these cases, I have shown that these NGS applications can be applied to identify pathogenic mutations in genetically heterogeneous diseases (chapter 2 and 3) and to identify novel disease genes (chapter 4). Despite these promising WES results in diagnostic and research settings [1,3], unsolved cases still remain. In chapter 2 we achieved a diagnostic yield of 29% in individuals who presented microcephaly with severe intellectual disability. The outcome seems promising in this phenotypically and genetically heterogeneous cohort. Our yield falls within a range previously reported for neurologic disorders (Table 1). The variations in diagnostic yield can be probably attributed to, among other causes, differences in disease characteristics and analytical differences with respect to exome capturing and filtering strategies and in sample size. The high number of unresolved cases points to the importance of potential technical limitations in performing WES and the interpretation of its results summarized below.

Technical limitations One of the known shortcomings of WES is coverage, which is the number of unique reads that overlap a specific genomic region. While an average coverage of 248X is readily achieved for all exons in a targeted resequencing methods [12]. 147

Chapter 6

Table 1: Diagnostic yield of WES in neurologic disorders Disease Number Average coverage Diagnostic Reference of patients yield Inherited peripheral 110 45.5X 18.2% [4] neuropathy* Range(27.5X-88X) Cohort in which 250 130X 25% [5] approximately 80% were children with neurologic phenotypes Cerebellar ataxia 76 NA 21% [6] Various unexplained 78 NA 41% [7] neurodevelopmental disorders Severe intellectual 100 trios Median coverage 64X, 87% of 16% [8] disability target bases covered ≥10X Sporadic non- 51 trios NA 45-55%‡ [9] syndromic intellectual disability (study of de novo variants) Moderate or severe 41 trios Median coverage 135X, 95% of 29% [10] intellectual disability target bases covered ≥10X (study of de novo variants) Paediatric-onset 28 106X 46% [11] ataxia Range(55X-162X) Microcephaly with 35 kindreds 64X 29% Current study intellectual disability Range(41X-90X) (chapter 2) * WES with focus on 75 known genes ‡ Because of the extensive locus heterogeneity, final conclusions about the pathogenicity of each individual mutation cannot be made. NA, not available

Current WES protocols generally capture no more than approximately 80-90% of the exons, even at an average coverage of 90X. Hence, this incompleteness of WES is a potential source for missed diagnoses [13]. Even assuming 99% of the targeted regions are covered above 10X, it means that 0.3 Mb (0.01 x 30 Mb) or 200 genes (0.01 x 20,000 genes) will lack sufficient coverage to enable sensitive variant detection. From a technical perspective, one the factors contributing to the coverage is the GC content. GC is a limiting factor for efficient hybridization and amplification during target enrichment [14]. Existing high or low GC contents in a genomic region lead, in general, to a less efficient capture probe hybridization [15], which ultimately results in low coverage [16,17]. A systematic comparison of exon- intron structure and GC content for all known genes in the human genome revealed that first introns and exons are more GC-rich than last and internal ones [18]. This highlights the importance of paying particular attention to the sequencing quality of 148

General discussion such problematic exons, at least in known genes for a given disorder. In a study of genomic sequences of seven ataxia genes targeted on a 2-Mb sequence-capture array, for example, two exons (63.6% GC content for exon 1 of the FXN gene; 76.1% GC content for exon 2 of the SACS gene) showed no coverage at all [19]. An improvement in coverage for such problematic gene regions in current WES panels would therefore help to fully capture the exome, and this seems feasible as there are only small numbers of problematic exons. If we could obtain better coverage in WES, it would most likely further improve our diagnostic yield and ultimately solve at least some of the currently unsolved cases. A second source of uncertainty is that WES does not allow extensive analysis of non-coding regions. Non-coding regions of the genome such as introns, miRNAs, lncRNAs, snRNA, promoters, enhancers and UTRs are not captured and sequenced when using WES. This may be of particular relevance for neurological diseases. There is increasing evidence for associations between non-coding RNAs and brain development in mice [20–22] or even neurodegenerative and neuropsychiatric disorders in humans. Indeed, the association of miRNAs with Alzheimer’s [23], schizophrenia and autism [24,25] has been documented. Many miRNAs [20] and lncRNAs [26] are highly expressed in the nervous system, and 6 miRNAs have been implicated in synaptic plasticity [24], neurogenesis and regulating cerebral cortex size [25,27], neural precursor self-renewal and differentiation [28] and maintaining the neural progenitor pool [29]. Functional studies in mouse models have confirmed these findings. Blocking miR-7 function in mouse model, for example, caused microcephaly-like brain defects [30]. As I noted in chapter 5, at least two CM-causing genes, MCPH1 and DYRK1A, are regulated by miRNAs [31,32]. Likewise, lncRNAs are involved in epigenetic regulation, synaptic and neural network connectivity and plasticity, differentiation of neural stem cells and progenitors and brain development [33–35]; There are several reports on the involvement of lncRNAs in neurodevelopmental disorders such as Prader– Willi syndrome, Angelmann syndrome, micropthalmia syndrome 3, spinocerebellar ataxia 8 and Rett syndrome [34]. The high proportion of identified miRNAs and lncRNAs in the brain [36] and their roles in brain functions and developmental processes, along with WES-based undiagnosed cases (Table 1), highlight the importance of these molecules, at least in neurologic disorders. These observations support the idea that a closer look into the non-coding RNAs may lead to the discovery of more pathogenic mutations. This notion is supported by the finding that defects in snRNA RNU4ATAC lead to congenital microcephaly [37,38]. Accordingly it would be an attractive option to extend current WES-based studies with RNA-seq. However, this may not be readily applicable for patients with 149

Chapter 6 neurologic disorders because mRNA and non-coding RNA profiles are tissue, time and anatomical brain-region-specific [26,39,40], and extracting non-coding RNAs from other tissues such as blood or fibroblasts may not provide useful expression information. Nonetheless, the potential role of non-coding RNA or cis-regulatory elements in Mendelian disease is intriguing, posing a clear challenge that needs to be further addressed. The third source of uncertainty, which is of lower impact, is the inability of WES to detect pseudo-exons. These are exonic-like sequences that are present within intronic regions. Insertion of a pseudo-exon may introduce an extraneous exon within the mature mRNA that may cause either the disruption of or the insertion of a novel stretch of amino acid in the translational reading frame. As a result, the normal biological properties of the resulting protein could be disrupted [41]. Although pseudo-exons are rarely recognized as a cause of human disease, they have been reported in various cancers [41]. Interestingly, as mentioned in chapter 3, a previous study of two siblings with familiar glucocorticoid deficiency proved a pseudo-exon insertion as cause of the disease [42]. As another example, deep intronic sequencing in the STAMBP gene in congenital microcephaly patients (chapter 5) found a casual role for a homozygous intronic mutation (c.1005+358A>G), which leads to the splicing of a 108bp pseudo-exon containing a premature stop codon [43]. While growing evidence indicates that pseudo-exon events could explain some of the unsolved cases, these pseudo-exons are generally not covered by exome-seq. Therefore, deep resequencing in intronic regions for the identification of pseudo-exon insertions may improve current detection of gene- mutations in unsolved cases. However, in order to be able at all to interpret variants identified in the intronic regions, additional cDNA sequencing will be required to correctly interpret and define pseudo-exon insertions because there are no effective tools as of yet to predict such alterations based solely on the intron-variant information.

Challenges in interpretation of variants Next to the technical limitations, current interpretations of WES results are also far from perfect, and this further contributes to the high number of unsolved cases. Translating the results obtained from WES is complicated and relies heavily on our current state of knowledge about disease-causing genes and application of a series of software-prediction tools to assess the possible pathogenicity of variants. Here, there is already an apparent clear lack of knowledge about the potential effects of synonymous mutations in disease. Synonymous mutations represent a significant 150

General discussion proportion of variants detected by WES; it is estimated that 6-8% of single nucleotide changes in oncogenes are synonymous [44], and this raises the question of whether these mutations may lead to disease. Despite the general concept that synonymous mutations do not change the amino acid composition, and are therefore less relevant for disease pathology, a number of studies have shown that synonymous mutations may cause the disease. Associations with synonymous mutations have been reported for at least 50 diseases [45]; of which three have been proposed to represent CM-causing genes (NBS1, ERCC1 and IGF1R, see chapter 5). While the mechanisms by which synonymous substitutions may cause disease are not fully understood, particular synonymous mutations may affect splicing accuracy, translation fidelity, protein folding, mRNA stability, mRNA secondary structure and expression level [46–48]. Among these mechanisms, defects in splicing accuracy have been reported more frequently in association with synonymous mutations in the literature [45]. We, however, evaluated splicing effects of synonymous mutations using in silico analyses throughout this study, but found none of them to be predicted as a possible splicing effect variant. Accordingly, the potential impact of synonymous mutations remains to be investigated by future in vitro techniques such as quantitative RT-PCR, circular dichroism spectroscopy, Western blot and 6 ribosome profiling along with in silico analyses [46]. Taken together, this affirms that the possible impact of synonymous mutations should not be neglected in WES variant interpretation. A second challenge is the interpretation of variants located in pseudogenes. Pseudogenes are dysfunctional copies of genes that have lost their functional expression due to i.) mutations in, or integration into, silent regions of the genome and ii.) their ability to encode functional protein because of the presence of frameshifts or premature stop codons [49]. Many pseudogenes are transcribed into RNA, with some exhibiting a tissue-specific pattern of activation. Pseudogene transcripts can be processed into short interfering RNAs that regulate coding genes through the RNAi pathway. It has been shown that pseudogenes are capable of regulating tumour suppressors and oncogenes by acting as microRNA decoys and are able to inhibit/alter the mRNA level of their parent gene [50]. For example, PTEN is a tumour suppressor that is often mutated at one allele at cancer presentation. The severity and susceptibility to cancer is influenced by the dosage of PTEN [51]. A gene-pseudogene pair (here, PTEN-PTENP1) was shown to be co- regulated by the same miRNA [52]. The miRNA can bind to 3’-UTR of both PTEN and PTENP1 where they show 95% similarity in sequence. PTENP1 pseudogene acts as a ‘‘miRNA decoy’’, binding to and thereby reducing the effective cellular concentration of miRNAs, thus allowing PTEN to escape miRNA-mediated 151

Chapter 6 repression. A similar relationship was also demonstrated between the oncogene KRAS and its pseudogene KRASP1 [52]. Dramatic changes in the expression of pseudogenes were seen during the transition from pluripotent stem cells to early differentiating neurons [53]. Indeed, this study identified 1371 pseudogenes that were up- or down-regulated significantly during this transition. The number of pseudogenes with significant changes in expression during neural differentiation may highlight the role of pseudogenes in brain development. It is possible that the exact molecular pathomechanism underlying PTEN-PTENP1 or KRAS-KRASP1 occurs similarly in the genes involved in brain development to down-regulate them. Despite these indications for the pathogenicity of pseudogenes in human disease, variants in pseudogenes are not always considered in WES analysis. Currently, most efforts are directed at distinguishing mutations that occur in the active gene-copy from mutations that occur in the pseudo-gene copies, and, in the latter case, these are then disregarded as potential disease-causing variants. This may turn out to be erroneous in some cases and lead to missing of disease-causing mutations in WES experiments. Therefore, variants in pseudogenes deserve more careful consideration during the interpretation of WES results to avoid missing pathogenic variants. A final limitation in interpreting WES results is our current knowledge of established disease-genes. For many disorders there will be yet undiscovered causal genes. As previously discussed, unresolved cases are a widespread observation in WES studies and; in addition to the issues already mentioned, the most significant factor causing this discrepancy may be that our filtering strategies are focused on and limited to known disease-genes and -pathways. I strongly believe that alternative approaches, such as biological network analysis, will provide novel clues and candidate genes for follow-up. In an interesting example, Novarino et al. (2014) demonstrated the potential of such alternative approaches by identifying three additional candidate disease genes in families with hereditary spastic paraplegias by applying a network analysis after WES. In these genes, putative mutations were detected that co-segregated with the disease-phenotype [54]. This alternative approach highlights the importance of understanding molecular pathomechanisms, biological functions and interaction network(s) of established disease genes to resolve unsolved cases. Here, bioinformatics tools are a means to combine results from the different domains of genetics and genomics, to pinpoint and prioritize the potential causal genes for a disease of interest and to augment our knowledge about the underlying disease mechanism. To further illustrate the added value of such approaches, in chapter 5, I made a first attempt at extending our capacity to identify potential CM-related genes by using a composite biological network analysis. Starting from 76 established CM-causing genes, as expected, our 152

General discussion network analysis identified previously well-known biological functions implicated in CM such as defects in the centrosome, spindle pole-related functions, DNA damage repair and cell cycle checkpoint control. Additionally, our network analysis identified another two biological processes, (i.) telomere biology and (ii.) tRNA metabolic processes, both of which are probably important pathomechanisms underlying CM. Moreover, our analysis identified a further 100 potential candidate genes for congenital microcephaly. In silico biological networks are also able to display possible biological connections between candidate genes and established disease genes. In our research on L1 syndrome, in silico biological network analysis enabled us to provide supportive evidence for DACH2 being a novel candidate gene for L1 syndrome (chapter 4). DACH2 showed biological relationship with L1CAM, the only gene currently known to cause L1 syndrome. This biological relationship has been mediated via the LAMC1 gene by two edges (Figure 1). Similar to L1CAM, LAMC1 is involved in the same pathway (axon guidance).In addition, there is also a genetic interaction between DACH2 and LAMC1. Genetic interaction implies that these two genes share a functional relationship [55] and may be involved in the same biological process or pathway, or they may also be involved in compensatory pathways with unrelated apparent functions [55]. A genetic interaction between two 6 genes generally indicates that the phenotype of a double mutant differs from what is expected from each individual mutant. Further support for such a relation may be provided by the fact that mutations in LAMC1 cause Dandy–Walker malformation [56], a disease characterized by some overlapping phenotypic features with L1 syndrome such as hydrocephalus, corpus callosum agenesis and skull defects [57,58]. Altogether, our in silico network study highlights DACH2 as a promising candidate gene for L1 syndrome. In summary, composite biological network analysis provides a valuable in silico framework to identify putative candidate genes for any rare, genetically heterogeneous Mendelian disease and may facilitate mechanistic understanding of the disease. For our CM-research, future studies should concentrate on the investigation of both these proposed genes in unsolved CM-kindreds and using novel suggested biological processes such as studying the role of candidate genes involved in telomere biology in neural stem cell proliferation/division.

153

Chapter 6 Figure 1: Biological relationship between a candidate gene for L1 syndrome (DACH2) and L1CAM. This biological relationship has been mediated via LAMC1 by two edges (pathway and genetic interaction), i.e. both L1CAM and LAMC1 are involved in same pathway (axon guidance; as has been experimentally proven and reported in Pathwaycommons database). The gene-gene interaction between DACH2 and LAMC1 has also been experimentally proven [71]. This network was generated by the GeneMANIA algorithm, which uses existing knowledge to visualize the biological relationships among genes. Light blue lines represent pathways. Dark blue lines indicate co-localization. Red lines reflect protein-protein interaction. Light brown lines show shared protein domains. Green lines display genetic interactions.

Future perspectives NGS technologies create exiting new opportunities to study human disease both to gain much better insight into the underlying mechanisms and to improve clinical care. In a relatively short time, NGS technologies have found their way into research and diagnostics. Gene panel sequencing and WES have become routine in clinical applications, particularly for the diagnosis of monogenic disorders. However, as discussed in previous sections, the utilization of WES in the diagnostic setting is still suboptimal, and currently has some limitations. Further development is needed to fully utilize NGS capacities and improve NGS-based diagnostic yield. Because of current routine application of WES (coding region), I believe that the majority of disease-causing genes will be identified soon. Afterward, the remained unsolved cases will lead to a new milestone in human genomic disease in future, which I named “Non-codingomics”. In particular, for neurologic/neurodevelopmental disorders, the abundance of non-coding RNA in the brain and the relatively large number of unsolved cases (Table 1) emphasize that special consideration urgently needs to be given to interpreting variants in non- coding RNAs. To enable the inclusion of non-coding regions, it is expected that whole genome sequencing (WGS) will soon replace WES [59,60]. This will provide unprecedented opportunities to study possible disease-causing variations outside the exome and to call all structural variants, and will probably lead to a significant reduction in test costs and turnaround time [61].

154

General discussion

To further improve the NGS-based diagnostic yield, we will need other layers of information to use as a type of functional read-out for genomic variations. Combining genome sequencing with RNA-seq may provide the first of such layers, although it will also pose new challenges with respect to obtaining and interpreting reliable and relevant expression data. Subsequent integration of these outcomes with other omics such as proteomics and metabolomics could provide further layers to help in deciphering genomic variants which would enable us to study associations of, e.g., post-translational modification, epigenetic alterations or metabolite compounds with gene expression and disease causality. Obviously, shifting from WES to WGS and adding extra layers of information will improve diagnostic possibilities. However, in parallel, there is also an urgent need for novel bioinformatics tools to better interpret variants on top of what we are able to do now. For example, the ability to predict functional effects of variants in non-coding RNA or of intronic variants leading to possible pseudo-exon insertions would provide a further improvement of current variant interpretation. In the interpretation of variants in coding regions many un-resolved areas still exist and, when extending toward the entire genome, many more question marks will appear on the horizon. 6 Thanks to NGS technologies, the acceleration in identifying novel causal genes will gradually help to minimize the “diagnostic odyssey” for patients. Hopes are high, especially for complex disorders such as intellectual disability (ID) with developmental delay [62,63]. Despite the extensive genetic heterogeneity of ID, WGS is now expected to identify the major causes of severe ID in patients for whom the diagnostic result upon microarray-based CNV and WES appeared inconclusive [64]. Increasing our knowledge of the aetiology of ID by shifting from WES to WGS will pave the way for personalized medicine for this clinically and genetically heterogeneous disorder. Personalized medicine refers to customization of healthcare in which NGS technologies can provide a comprehensive genome-wide view of the genetic variations in patients. This may be instrumental for designing better individual treatment, more effective diagnostics, personalized risk-prediction and tailored therapies. Examples already have been reported: early intervention combined with targeted treatment in children with fragile X syndrome improved their behaviour and cognition [65], and enzyme replacement therapy when coupled with stem cell therapy improved treatment for Hurler syndrome [66]. In the next few years, WGS will become part of the standard repertoire of techniques to guide the treatment of cancer patients [67] and genetic disorders such as inherited ocular diseases [68], autism [69] and genetic epilepsies [70]. Accordingly, the use of genetic information is undoubtedly expected to become a major player in 155

Chapter 6 medicine, but, before getting there, we still need to address many challenges, including technical limitations, interpretation of variants and better exchanges of information between the clinical and genomics disciplines to optimally expand and integrate our knowledge.

156

General discussion

References 1. Miyatake S, Matsumoto N (2014) Genetics: clinical exome sequencing in neurology practice. Nature Reviews Neurology 10: 676–678. 2. Neveling K, Feenstra I, Gilissen C, Hoefsloot LH, Kamsteeg E-J, et al. (2013) A Post-Hoc Comparison of the Utility of Sanger Sequencing and Exome Sequencing for the Diagnosis of Heterogeneous Diseases. Human mutation 34: 1721–1726. 3. Sun Y, Ruivenkamp CA, Hoffer MJ, Vrijenhoek T, Kriek M, et al. (2015) Next-Generation Diagnostics: Gene Panel, Exome, or Whole Genome? Human mutation 36: 648–655. 4. Drew AP, Zhu D, Kidambi A, Ly C, Tey S, et al. (2015) Improved inherited peripheral neuropathy genetic diagnosis by whole-exome sequencing. Molecular genetics & genomic medicine 3: 143–154. 5. Yang Y, Muzny DM, Reid JG, Bainbridge MN, Willis A, et al. (2013) Clinical whole-exome sequencing for the diagnosis of mendelian disorders. New England Journal of Medicine 369: 1502–1511. 6. Fogel BL, Lee H, Deignan JL, Strom SP, Kantarci S, et al. (2014) Exome sequencing in the clinical diagnosis of sporadic or familial cerebellar ataxia. JAMA neurology 71: 1237–1246. 7. Srivastava S, Cohen JS, Vernon H, Barañano K, McClellan R, et al. (2014) Clinical whole exome sequencing in child neurology practice. Annals of neurology 76: 473–483. 6 8. De Ligt J, Willemsen MH, van Bon BW, Kleefstra T, Yntema HG, et al. (2012) Diagnostic exome sequencing in persons with severe intellectual disability. New England Journal of Medicine 367: 1921–1929. 9. Rauch A, Wieczorek D, Graf E, Wieland T, Endele S, et al. (2012) Range of genetic mutations associated with severe non-syndromic sporadic intellectual disability: an exome sequencing study. The Lancet 380: 1674–1682. 10. Hamdan FF, Srour M, Capo-Chichi J-M, Daoud H, Nassif C, et al. (2014) De Novo Mutations in Moderate or Severe Intellectual Disability. PLoS genetics 10: e1004772. 11. Sawyer SL, Schwartzentruber J, Beaulieu CL, Dyment D, Smith A, et al. (2014) Exome Sequencing as a Diagnostic Tool for Pediatric-Onset Ataxia. Human mutation 35: 45–49. 12. Sikkema-Raddatz B, Johansson LF, Boer EN, Almomani R, Boven LG, et al. (2013) Targeted Next-Generation Sequencing can Replace Sanger Sequencing in Clinical Diagnostics. Human mutation 34: 1035–1042. 13. Kiezun A, Garimella K, Do R, Stitziel NO, Neale BM, et al. (2012) Exome sequencing and the genetic basis of complex traits. Nature genetics 44: 623– 630. 14. Knies K, Schuster B, Ameziane N, Rooimans M, Bettecken T, et al. (2012) Genotyping of fanconi anemia patients by whole exome sequencing: advantages and challenges. PLoS One 7: e52648.

157

Chapter 6

15. Chilamakuri CSR, Lorenz S, Madoui M-A, Vodák D, Sun J, et al. (2014) Performance comparison of four exome capture systems for deep sequencing. BMC Genomics 15: 449. 16. Gnirke A, Melnikov A, Maguire J, Rogov P, LeProust EM, et al. (2009) Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nature biotechnology 27: 182–189. 17. Meienberg J, Zerjavic K, Keller I, Okoniewski M, Patrignani A, et al. (2015) New insights into the performance of human whole-exome capture platforms. Nucleic Acids Research 43: e76. 18. Kalari KR, Casavant M, Bair TB, Keen HL, Comeron JM, et al. (2006) First exons and introns-a survey of GC content and gene structure in the human genome. In Silico Biology 6: 237–242. 19. Hoischen A, Gilissen C, Arts P, Wieskamp N, van der Vliet W, et al. (2010) Massively parallel sequencing of ataxia genes after array-based enrichment. Human mutation 31: 494–499. 20. Mehler MF, Mattick JS (2007) Noncoding RNAs and RNA editing in brain development, functional diversification, and neurological disease. Physiological reviews 87: 799–823. 21. Barry G (2014) Integrating the roles of long and small non-coding RNA in brain function and disease. Molecular Psychiatry 19: 410–416. 22. Knauss JL, Sun T (2013) Regulatory mechanisms of long noncoding RNAs in vertebrate central nervous system development and function. Neuroscience 235: 200–214. 23. Femminella GD, Ferrara N, Rengo G (2015) The emerging role of microRNAs in Alzheimer’s disease. Frontiers in physiology 6: Article 40. 24. Wang W, Kwon EJ, Tsai L-H (2012) MicroRNAs in learning, memory, and neurological diseases. Learning & Memory 19: 359–368. 25. Sun E, Shi Y (2015) MicroRNAs: Small molecules with big roles in neurodevelopment and diseases. Experimental neurology 268: 46–53. 26. Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, et al. (2012) The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome research 22: 1775–1789. 27. Tuoc TC, Pavlakis E, Tylkowski MA, Stoykova A (2014) Control of cerebral size and thickness. Cellular and Molecular Life Sciences 71: 3199–3218. 28. Hudish LI, Appel B (2014) microRNA regulation of neural precursor self- renewal and differentiation. Neurogenesis 1: e976018. 29. Chen Y, Bian S, Zhang J, Zhang H, Tang B, et al. (2014) The silencing effect of microRNA miR-17 on p21 maintains the neural progenitor pool in the developing cerebral cortex. Frontiers in neurology 5: Article 132. 30. Pollock A, Bian S, Zhang C, Chen Z, Sun T (2014) Growth of the developing cerebral cortex is controlled by microRNA-7 through the p53 pathway. Cell Rep 7: 1184–1196. 31. Zhang Y, Liao J-M, Zeng SX, Lu H (2011) p53 downregulates Down syndrome-associated DYRK1A through miR-1246. EMBO Report 12: 811– 817.

158

General discussion

32. Wang N, Lu H, Chen W, Gan M, Cao X, et al. (2014) Primary microcephaly gene MCPH1 shows a novel molecular biomarker of human renal carcinoma and is regulated by miR-27a. International journal of clinical and experimental pathology 7: 4895. 33. Wu P, Zuo X, Deng H, Liu X, Liu L, et al. (2013) Roles of long noncoding RNAs in brain development, functional diversification and neurodegenerative diseases. Brain research bulletin 97: 69–80. 34. Van de Vondervoort IIGM, Gordebeke PM, Khoshab N, Tiesinga PHE, Buitelaar JK, et al. (2013) Long non-coding RNAs in neurodevelopmental disorders. Frontiers in molecular neuroscience 6: 53. 35. Qureshi IA, Mehler MF (2014) An evolving view of epigenetic complexity in the brain. Philosophical Transactions of the Royal Society B: Biological Sciences B 369: Article ID 20130506. 36. Kosik KS, Krichevsky AM (2005) The elegance of the microRNAs: a neuronal perspective. Neuron 47: 779–782. 37. Nagy R, Wang H, Albrecht B, Wieczorek D, Gillessen-Kaesbach G, et al. (2012) Microcephalic osteodysplastic primordial dwarfism type I with biallelic mutations in the RNU4ATAC gene. Clinical genetics 82: 140–146. 38. Edery P, Marcaillou C, Sahbatou M, Labalme A, Chastang J, et al. (2011) Association of TALS developmental disorder with defect in minor splicing component U4atac snRNA. Science 332: 240–243. 39. Roth RB, Hevezi P, Lee J, Willhite D, Lechner SM, et al. (2006) Gene 6 expression analyses reveal molecular relationships among 20 regions of the human CNS. Neurogenetics 7: 67–80. 40. Mercer TR, Dinger ME, Sunkin SM, Mehler MF, Mattick JS (2008) Specific expression of long noncoding RNAs in the mouse brain. Proceedings of the National Academy of Sciences 105: 716–721. 41. Romano M, Buratti E, Baralle D (2013) Role of pseudoexons and pseudointrons in human cancer. International Journal of Cell Biology: Article ID 810572. 42. Novoselova TV, Rath SR, Carpenter K, Pachter N, Dickinson JE, et al. (2015) NNT Pseudoexon Activation as a Novel Mechanism for Disease in Two Siblings with Familial Glucocorticoid Deficiency. The Journal of Clinical Endocrinology & Metabolism 100:350–354. 43. McDonell LM, Mirzaa GM, Alcantara D, Schwartzentruber J, Carter MT, et al. (2013) Mutations in STAMBP, encoding a deubiquitinating enzyme, cause microcephaly-capillary malformation syndrome. Nature genetics 45: 556–562. 44. Supek F, Miñana B, Valcárcel J, Gabaldón T, Lehner B (2014) Synonymous mutations frequently act as driver mutations in human cancers. Cell 156: 1324–1335. 45. Sauna ZE, Kimchi-Sarfaty C (2011) Understanding the contribution of synonymous mutations to human disease. Nature Reviews Genetics 12: 683– 691. 46. Hunt RC, Simhadri VL, Iandoli M, Sauna ZE, Kimchi-Sarfaty C (2014) Exposing synonymous mutations. Trends in Genetics 30: 308–321.

159

Chapter 6

47. Kudla G, Murray AW, Tollervey D, Plotkin JB (2009) Coding-sequence determinants of gene expression in Escherichia coli. Science 324: 255–258. 48. Bartoszewski RA, Jablonsky M, Bartoszewska S, Stevenson L, Dai Q, et al. (2010) A synonymous single nucleotide polymorphism in Delta F508 CFTR alters the secondary structure of the mRNA and the expression of the mutant protein. Journal of Biological Chemistry 285: 28741–28748. 49. Pei B, Sisu C, Frankish A, Howald C, Habegger L, et al. (2012) The GENCODE pseudogene resource. Genome Biology 13: R51. 50. Pink RC, Wicks K, Caley DP, Punch EK, Jacobs L, et al. (2011) Pseudogenes: pseudo-functional or key regulators in health and disease? RNA 17: 792–798. 51. Alimonti A, Carracedo A, Clohessy JG, Trotman LC, Nardella C, et al. (2010) Subtle variations in Pten dose determine cancer susceptibility. Nature genetics 42: 454–458. 52. Poliseno L, Salmena L, Zhang J, Carver B, Haveman WJ, et al. (2010) A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature 465: 1033–1038. 53. Lin M, Pedrosa E, Shah A, Hrabovsky A, Maqbool S, et al. (2011) RNA-Seq of human neurons derived from iPS cells reveals candidate long non-coding RNAs involved in neurogenesis and neuropsychiatric disorders. PLoS One 6: e23356. 54. Novarino G, Fenstermaker AG, Zaki MS, Hofree M, Silhavy JL, et al. (2014) Exome sequencing links corticospinal motor neuron disease to common neurodegenerative disorders. Science 343: 506–511. 55. Boucher B, Jenna S (2013) Genetic interaction networks: better understand to better predict. Frontiers in Genetics 4: Article 290. 56. Poretti A, Boltshauser E, Doherty D (2014) Cerebellar hypoplasia: Differential diagnosis and diagnostic approach. American Journal of Medical Genetics Part C: Seminars in Medical Genetics 166C: 211–226. 57. Al-Turkistani, Hatim K (2014) Dandye-Walker syndrome. Journal of Taibah University Medical Sciences 9: 209–212. 58. Darbro BW, Mahajan VB, Gakhar L, Skeie JM, Campbell E, et al. (2013) Mutations in Extracellular Matrix Genes NID1 and LAMC1 Cause Autosomal Dominant Dandy-Walker Malformation and Occipital Cephaloceles. Human mutation 34: 1075–1079. 59. Stranneheim H, Engvall M, Naess K, Lesko N, Larsson P, et al. (2014) Rapid pulsed whole genome sequencing for comprehensive acute diagnostics of inborn errors of metabolism. BMC Genomics 15: 1090. 60. Saunders CJ, Miller NA, Soden SE, Dinwiddie DL, Noll A, et al. (2012) Rapid whole-genome sequencing for genetic disease diagnosis in neonatal intensive care units. Science translational medicine 4: 154ra135. 61. Feero WG, Guttmacher AE (2014) Genomics, Personalized Medicine, and Pediatrics. Academic pediatrics 14: 14–22. 62. Picker JD, Walsh CA (2013) New innovations: therapeutic opportunities for intellectual disabilities. Annals of Neurology 74: 382–390.

160

General discussion

63. Vissers LE, Gilissen C, Veltman JA (2016) Genetic studies in intellectual disability and related disorders. Nature Reviews Genetics 17: 9–18. 64. Gilissen C, Hehir-Kwa JY, Thung DT, van de Vorst M, van Bon BW, et al. (2014) Genome sequencing identifies major causes of severe intellectual disability. Nature 511: 344–348. 65. Leigh MJS, Nguyen DV, Mu Y, Winarni TI, Schneider A, et al. (2013) A randomized double-blind, placebo-controlled trial of minocycline in children and adolescents with fragile x syndrome. Journal of developmental and behavioral pediatrics: JDBP 34: 147–155. 66. Eisengart JB, Rudser KD, Tolar J, Orchard PJ, Kivisto T, et al. (2013) Enzyme replacement is associated with better cognitive outcomes after transplant in Hurler syndrome. The Journal of pediatrics 162: 375–380. 67. Rodriguez-Antona C, Taron M (2015) Pharmacogenomic biomarkers for personalized cancer treatment. Journal of internal medicine 277: 201–217. 68. Porter L, Black G (2014) Personalized ophthalmology. Clinical genetics 86: 1– 11. 69. Geschwind DH, State MW (2015) Gene hunting in autism spectrum disorder: on the path to precision medicine. The Lancet Neurology 14: 1109–1120. 70. EpiPM-Consortium (2015) A roadmap for precision medicine in the epilepsies. The Lancet Neurology 14: 1219–1228. 71. Lin A, Wang RT, Ahn S, Park CC, Smith DJ (2010) A genome-wide map of human genetic interactions inferred from radiation hybrid genotypes. Genome 6 research 20: 1122–1132.

161

Appendices

Summary (English) Samenvatting (Dutch) Acknowledgements

Summary

The unprecedented increase in the throughput of DNA sequencing driven by Next- Generation Sequencing technologies (NGS) now allows for the cost-efficient analysis of the entire genome or selected parts of the genome (exome or gene panels) for multiple samples in a single sequencing run with a short turn-around time. With all these possibilities at hand, the limitations of this technique, such as incomplete capturing of all exons and the complexity of variant interpretation, still pose major challenges for application of Whole Exome Sequencing (WES) in the clinic. The research presented in this thesis focuses on using WES to unravel the genetic basis of human hereditary disorders by (i) screening of a comprehensive set of known disease-causing genes for mutations; (ii) identifying novel candidate disease genes in genetically heterogeneous disorders with different possible inheritance patterns; and (iii) developing bioinformatic tools to improve our capacity to select putative candidate genes by in silico analyses. In chapter 1 we provided background information on NGS-based techniques and presented a simplified overview of the main phases in NGS. This chapter introduced whole genome sequencing, WES and disease-specific gene panels and their applications. The potential of using WES in genome analyses to identify causal mutations in diseases with extreme locus heterogeneity is briefly described along with the challenges presented by the data-interpretation necessary to identify causal mutations(s) against the huge background of non-pathogenic variants. In chapter 2 we set out to apply WES as a diagnostic approach for establishing a molecular diagnosis in a highly heterogeneous group of patients with microcephaly and varied intellectual disability. We achieved a diagnostic yield of 29% (10 out of 35 kindreds) and found mutations in known disease-genes. Our results were in line with other studies on the diagnostic utility of WES in genetically highly heterogeneous disorders. Our results confirmed that many microcephaly cases are explained by autosomal recessive inheritance (7 out of 10 families). We concluded that WES is a powerful tool for the diagnostic evaluation of patients with highly heterogeneous neurodevelopmental disorders. In chapter 3 we set out to identify the disease gene in a Dutch family with a boy affected by familial glucocorticoid deficiency (FGD) by applying trio-exome sequencing. Exome-sequencing revealed a novel homozygous mutation (NM_012343.3: c.1259dupG) in NNT that was predicted to be disease-causing. The mutation is located in exon 9 and creates a frameshift leading to a premature stop- codon (p.His421Serfs*4) that is known to result in FGD. We also reviewed the literature for all reported NNT mutations and their clinical presentation. The median age of disease onset was 12 months (range 3 days-39 months). There was no significant difference in age of disease onset between truncating and non- truncating 165

Appendices

NNT mutations. Based on recent literature, we advise that patients with FGD resulting from NNT mutations be monitored for possible combined mineralocorticoid insufficiency and extra-adrenal manifestations. In chapter 4 we aimed to identify novel disease causing genes for L1 syndrome, an X-linked disorder, by application of a variant of WES. L1 syndrome is known to be caused by mutations in the L1CAM gene, but many unsolved cases remain even after mutation analysis of this gene. Since 95% of all L1 syndrome patients (with and without a L1CAM mutation) are male, and familial cases mostly show an X-linked recessive inheritance, we focused on the X chromosome and performed customized X-exome sequencing in a cohort of 58 patients in whom the L1CAM mutation analysis was negative. We identified mutations in 7 patients implicating 5 genes as possible novel candidates for L1 syndrome: DACH2, TENM1, AMER1, PQBP1 and CXorf38. Three independent mutations in DACH2 suggest this gene as the most promising candidate gene for L1 syndrome. Two of the remaining candidate genes (PQBP1 and AMER1) had already been linked to known diseases whose clinical spectrum overlap L1 syndrome. While a single gene can be associated with more than one disease, differences in typical phenotypes in L1 syndrome suggest that these disorders may be less likely to have a common genetic basis. In a first attempt to identify further mutation carriers, we next performed targeted screening of an additional 24 patients for these five candidate genes but could not identify an extra variant. In the end, functional studies will be essential to firmly establish the causative role of DACH2, TENM1 and CXorf38 in L1 syndrome. The number of putative pathogenic mutations in our cohort indicates that, if causative, the identified genes can only explain a small proportion of L1 patients. In chapter 5 we aimed to identify molecular pathomechanisms underlying congenital microcephaly (CM) and predicted additional candidate CM-causing genes based on their relations with well-established causative genes. A systematic literature review identified 76 definite CM-causing genes. We then carried out an in silico composite biological network analysis on the basis of these genes by integration of data from many different publically available biological databases. Our analysis highlighted known biological processes (i.e. centrosome and mitotic microtubule spindle, DNA damage repair and DNA replication) involved in congenital microcephaly. Additionally, it suggested two biological functions, telomere biology and tRNA metabolic process, that were previously less addressed in the literature. We also ranked the 100 top genes found to be related to any of the initial well- established CM-causing genes. Evaluation of these proposed candidate genes showed that defects in 7 out of the 100 top related genes are already known to be 166

Summary involved in disorders which present microcephaly as one of their phenotypic features. Additionally, studying the 93 remaining genes in the Mouse Genome Informatics database revealed that mutations in 4 of these genes also cause cerebral cortex abnormalities or abnormal brain morphology in knockout mice that are expected to result in microcephaly. Subsequent consultation of the Human Gene Mutation Database found 5 additional genes associated with microcephaly as a phenotypic feature. Altogether, our network strategy provides an interesting 16 candidate genes that we considered to be important in follow-up studies of unsolved cases exhibiting CM. This result also demonstrates the potential of such a network approach to facilitate gene discovery in this genetically heterogeneous disease. The genes that displayed a relationship to CM in mouse models (MGI-based features) and the 14 genes important for (neuro)developmental functions in both mouse and human might be of particular interest and prime candidates for gene-discovery research. Likewise, similar use of this in silico framework tool would enable identification of novel candidate genes for any rare genetically heterogeneous Mendelian disease and may facilitate a more detailed mechanistic understanding of the disease. However, despite the promising results of WES in both diagnostics and research settings, unsolved cases remain. In chapter 6, I focused on the major limitations of WES as a diagnostic tool, these being (i) technical limitations and (ii) interpretation challenges. Insufficient coverage is one of the main shortcomings of WES, as current protocols generally capture no more than 80-90% of the exons even with an average coverage of 90X. In addition, non-coding regions of the genome are not captured and sequenced using WES, which may be of particular relevance for neurological diseases given the increasing evidence for associations between non-coding RNAs and brain development. Ultimately, we still do not know all disease-associated genes and hence our filtering strategies are focused on and limited to known disease-genes and -pathways. This leaves us with ample challenges that are addressed in the last chapter, which is devoted to a discussion of future perspectives for NGS applications and how I believe that adding extra layers of information would support and accelerate the next stage of NGS in personalized medicine. In conclusion, WES-based diagnostic testing is a proven and powerful innovation for human genetics the application of which will drive further developments in the fast-expanding field of human disease research.

167

Summary

Appendices

Summary (English) Samenvatting (Dutch) Acknowledgements

169

Samenvatting

De ongeëvenaarde toename van de capaciteit van op Next-Generation Sequencing technologieën (NGS) gebaseerde DNA sequencing maakt een kostenefficiënte analyse van het totale genoom of van delen van het genoom (exoom of genpanels), voor meerdere monsters in één enkele sequencing run met een korte doorlooptijd mogelijk. Echter, ondanks al deze mogelijkheden, vormen beperkingen van de techniek, zoals incomplete exon-capturing en complexiteit van variantinterpretatie, nog steeds een grote uitdaging voor de toepassing van Whole Exome Sequencing (WES) in de kliniek. Het onderzoek dat in dit proefschrift wordt gepresenteerd, focust op het gebruik van WES voor het ontrafelen van de genetische basis van humane erfelijke aandoeningen door (i) screening van een omvangrijke set bekende ziekteveroorzakende genen op mutaties; (ii) identificatie van nieuwe kandidaat-ziektegenen in genetisch heterogene aandoeningen met verschillende mogelijke overervingspatronen; en (iii) ontwikkeling van bioinformatica programma’s om via in silico analyses betere mogelijkheden te hebben voor de selectie van kandidaatgenen. In hoofdstuk 1 gaven we achtergrondinformatie over NGS technieken en een versimpeld overzicht van de verschillende NGS- processtappen. Dit hoofdstuk introduceert whole genome sequencing, WES, ziektespecifieke genpanels en hun toepassingen. De mogelijkheid om WES in genoomanalyses toe te passen voor identificatie van causale mutaties in ziekten met een extreme locusheterogeniteit is kort beschreven, samen met de uitdagingen van data-interpretatie, die noodzakelijk is voor de identificatie van de causale mutatie(s) in een achtergrond van niet-pathogene varianten. In hoofdstuk 2 hebben we uiteengezet hoe WES is toegepast als een diagnostische benadering bij het vaststellen van een moleculaire diagnose in een zeer heterogene groep patiënten met microcefalie en verstandelijke beperkingen. We konden een diagnose stellen in 29% van de onderzoeken (10 van de 35 families) en vonden mutaties in bekende ziektegenen. Onze resultaten lagen in lijn met andere studies over het diagnostisch nut van gebruik van WES bij genetisch heterogene aandoeningen. Onze resultaten bevestigden dat veel gevallen van microcefalie verklaard worden door autosomaal recessieve overerving (7 van de 10 families). We concludeerden dat WES een effectieve toepassing is voor de diagnostiek van patiënten met een zeer heterogene neurologische aandoening. 171

Appendices

In hoofdstuk 3 is trio-exoomsequencing toegepast om het ziektegen te identificeren in een Nederlandse familie met een jongen met familiale glucocorticoïd deficiëntie (FGD). Exoomsequencing toonde een nieuwe ziekteveroorzakende homozygote mutatie (NM_012343.3:c.1259dupG) in NNT aan. De mutatie ligt in exon 9 en veroorzaakt een frameshift die tot een vervroegd stopcodon leidt (p.His421Serfs*4). Ook analyseerden we de literatuur voor alle gerapporteerde NNT mutaties en hun klinische presentatie. De mediaan van de leeftijd waarop de ziekte zich openbaarde was 12 maanden (variërend van 3 dagen tot 39 maanden). Er bleek geen significant verschil in de leeftijd waarop de ziekte zich openbaarde tussen patiënten met NNT mutaties die een vervroegd stopcodon veroorzaken en patiënten met andere typen NNT mutaties. Gebaseerd op recente literatuur, adviseren wij dat patiënten met door NNT mutaties veroorzaakte FGD gescreend worden voor mogelijke gecombineerde mineralocorticoïd insufficiëntie en extra-adrenale manifestaties. In hoofdstuk 4 richtten we ons op het identificeren van nieuwe ziekteveroorzakende genen voor het L1-syndroom, een X-gebonden aandoening, met behulp van een variant WES toepassing. Het is bekend dat L1-syndroom kan worden veroorzaakt door mutaties in het L1CAM gen, maar zelfs na mutatie-analyse van dit gen blijven veel onopgeloste gevallen over. Omdat 95% van alle L1-syndroom patiënten (met en zonder een L1CAM mutatie) mannelijk zijn en familiale gevallen meestal een X- gebonden recessieve overerving laten zien, hebben we ons gefocust op het X-chromosoom en een op maat gemaakte X-exoom sequencing uitgevoerd in een cohort van 58 patiënten waarin de L1CAM mutatie-analyse negatief was. We identificeerden mutaties in 7 patiënten die 5 genen aanwezen als mogelijke nieuwe kandidaten voor L1-syndroom: DACH2, TENM1, AMER1, PQBP1 en CXorf38. Drie onafhankelijke mutaties in DACH2 suggereren dit gen als de meest veelbelovende kandidaat voor het L1 syndroom. Twee overige kandidaatgenen (PQBP1 en AMER1) waren eerder al gekoppeld aan ziekten waarvan het klinisch spectrum overlapt met het L1 syndroom. Hoewel een enkel gen geassocieerd kan zijn met meer dan één ziekte, suggereren verschillen in typische fenotypes in L1 syndroom dat het minder waarschijnlijk is dat deze aandoeningen een gezamenlijke genetische basis hebben. In een eerste poging meer mutatiedragers te identificeren, 172

Samenvatting voerden we vervolgens een gerichte analyse van deze 5 kandidaatgenen uit bij nog eens 24 patiënten, maar konden geen aanvullende mutaties identificeren. Uiteindelijk zullen functionele studies onontbeerlijk zijn om de oorzakelijke rol van DACH2, TENM1 en CXorf38 in L1 syndroom definitief vast te stellen. Het aantal mutaties in ons cohort wijst erop dat, wanneer causaal, de geïdentificeerde genen toch slechts een klein gedeelte van de L1 patiënten kunnen verklaren. In hoofdstuk 5 streefden we naar het identificeren van onderliggende moleculaire pathomechanismen voor congenitale microcefalie (CM) en voorspelden we kandidaatgenen gebaseerd op hun relaties met de bekende CM-genen. Een systematisch literatuuronderzoek identificeerde 76 genen waarvan het zeker is dat ze CM veroorzaken. Hierna voerden we een in silico biologische netwerk analyse uit gebaseerd op deze genen, via integratie van data van vele verschillende publiek beschikbare biologische databases. Onze analyse markeerde bekende biologische processen (d.w.z. centrosoom en mitotische spoeldraden, DNA-schade herstel en DNA replicatie) als betrokken bij congenitale microcefalie. Verder wees de analyse op betrokkenheid van twee biologische functies, telomeerbiologie en het tRNA metabolische proces, die minder worden besproken in de literatuur. Ook rangschikten we de top 100 genen die zijn gerelateerd aan één van de met CM-geassocieerde genen. Evaluatie van deze voorgestelde kandidaatgenen toonde aan dat van 7 van de top 100 genen reeds bekend is dat ze betrokken zijn bij aandoeningen waarvan microcefalie één van de fenotypische kenmerken is. Daarnaast liet vervolgonderzoek van de 93 overige genen met behulp van de Mouse Genome Informatics database zien dat knockout-muizen voor 4 van deze genen cerebrale-cortexafwijkingen of een abnormale hersenmorfologie tonen, en waarvan dus verwacht wordt dat ze in microcefalie resulteren. Vervolgens bleek uit onderzoek van de Human Gene Mutation Database dat 5 extra genen geassocieerd zijn met microcefalie als fenotypisch kenmerk. In conclusie leverde onze netwerkstrategie 16 interessante kandidaatgenen op die we belangrijk achten voor vervolgstudies van onopgeloste CM gevallen. Tevens laten deze resultaten het potentieel zien van een dergelijke netwerkbenadering als ondersteuning voor het ontdekken van genen in genetisch heterogene ziekten. De genen die een relatie met CM lieten zien in muismodellen (MGI gebaseerde kenmerken) en 173

Appendices de 14 genen die in zowel muis als mens belangrijk zijn voor neurologische functies of ontwikkelingsfuncties, zouden van bijzonder belang kunnen zijn en de voornaamste kandidaten voor onderzoek naar nieuwe CM-genen. Eveneens zou vergelijkbaar gebruik van dit in silico framework programma de identificatie van nieuwe kandidaatgenen mogelijk maken voor iedere zeldzame genetisch heterogene Mendeliaanse ziekte en kan het programma bijdragen aan een gedetailleerder mechanistisch begrip van ziekte. Echter, ondanks de veelbelovende WES resultaten in zowel diagnostiek als research, blijven er onopgeloste gevallen over. In hoofdstuk 6 ging ik in op de belangrijkste beperkingen van WES als diagnostische tool. Dit zijn (i) technische beperkingen en (ii) uitdagingen met betrekking tot de interpretatie. Onvoldoende dekking is een van de grootste tekortkomingen van WES, aangezien bij huidige protocollen zelfs met een gemiddelde coverage van 90X of hoger meestal niet meer dan 80-90% van de exonen goed analyseerbaar zijn. Daarnaast worden met WES niet-coderende genomische gebieden niet verrijkt en gesequenced, wat van bijzonder belang kan zijn voor neurologische ziekten, gegeven het groeiende bewijs voor associaties tussen niet-coderend RNA en hersenontwikkeling. Tenslotte kennen we nog niet alle ziektegerelateerde genen en dus zijn onze filterstrategieën gericht op, en beperkt tot bekende ziektegenen en ziekte- processen. Dit levert zeker uitdagingen, die worden besproken in het laatste hoofdstuk, dat is gewijd aan een discussie van toekomstperspectieven voor NGS-toepassingen en hoe ik geloof dat toevoeging van extra lagen informatie de volgende stap van NGS in personalized medicine zou ondersteunen en versnellen. Concluderend: op WES gebaseerd diagnostisch testen is een bewezen en krachtige innovatie, waarvan de toepassing verdere ontwikkelingen zal stimuleren in het snel groeiende veld van onderzoek naar humane ziekten.

174

Appendices

Summary (English) Samenvatting (Dutch) Acknowledgements

Acknowledgements

Though only my name appears on the cover of this thesis, a great many people have contributed to its production. I most sincerely appreciate every person and organization that provided me with the opportunity to carry out my work and acknowledge the fact that without them this could not have been possible. Firstly, I would like to express my sincere gratitude to my promoter Prof. Richard J. Sinke for his continuous support during my PhD study and related research, for his patience and for his immense knowledge. His guidance helped me in performing my research and writing my thesis. Dear Richard, I deeply appreciate you for helping me to acquire new knowledge. A special thanks also goes to my co-promoter, Dr. Behrooz Z. Alizadeh, for his patience, critical comments on manuscripts and for helping me to enrich my ideas about network in silico analysis. Behrooz, I learned from you how to start and follow a scientific question, a skill which will definitely be useful for my future career. I would also like to acknowledge Prof. H. M. Boezen, Prof. V. V. A. Knoers and Prof. F. J. van Spronsen for being a member of the assessment committee for my thesis, for taking the time to read it and for giving their approval for the defense. I would like to thank Prof. Cisca Wijmenga, head of the genetics department, for creating a convivial diverse atmosphere in which vibrant scientific intellectual activities can take place. I would like to mention specifically my gratitude for the support given by Dr. Birgit Sikkema-Raddatz. Thank you for commenting on manuscripts but also for helping me to understand the principles of Comparative Genome Hybridization and SNP array. It is my great pleasure to thank Prof. C.M.A. van Ravenswaaij-Arts. I do appreciate all the time and effort you spend commenting on the manuscript and your quick reactions during the submissions process for our FGD project, as well as for your collaboration in our microcephaly studies. I should thank Dr. Patrick Rump for his help and collaboration on phenotyping microcephaly patients and in manuscript preparation. I would like to appreciate Dr. Cleo. van Diemen. Dear Cleo, I learned from you how to filter variants in exome data. Thank you for “babysitting” me at the beginning of my journey through the exome analysis in my FGD project. I should thank Dr. Yvonne Vos. Dear Yvonne, I appreciate you sharing your experience in L1 syndrome and your commenting on the manuscript clearly improved this chapter of my thesis. I should thank Dr.V.K. Magadi Gopalaiah. Dear Vinod, I really appreciate your warmly listening to questions and scientific answers in a friendly way. 177

Appendices

I also would like to really extend my appreciation to Kate Mc Intyre and Jackie Senior for editing the entire thesis and for their kind cooperation. Very special thanks to my spouse who was always with me during difficult times. I really appreciate your patience, support and encouragement. This would be my pleasure to dedicate my thesis to you. I also would like to thank Azadeh who brought happiness with her during her stay with us in the Netherlands. I would like to thank my parents for all the support they have provided me over the years and for allowing me to realize my own potential. Special thanks to Theo and Arike, Bahram, Elizabet, Judith, and Martin Waslander. My family and I owe you for your help, efforts and support. We will never forget all your kindness. I should express my deepest thank to my mentor, Dr. M.J. Soleimani. Dear Dr. Soleimani, you are my role model in life, your personality which is combination of knowledge, wisdom, kindness, ethics and power was always interesting to me. I also appreciate your continued help, support, and encouragement. I would like to thank the University of Mazandaran for supporting and preparing this opportunity to travel to the Netherlands to obtain my PhD. Special thanks to my Paranimfs, Ali Saber, Lennart Johansson and Eddy de Boer, thank you for helping me prepare for today. I am happy to have you at my side today and hope to see you soon in Iran. My warmest thanks to my friends and lab-mates for making my years at the UMCG great and enjoyable: Rodrigo Coutinho de Almeida, Krista Bos, Mohamed Zahir Alimohammad, Juha Karjalainen, Isis Ricano-Ponce, Suzanne van Sommeren, Barbara Hrdlickova, Javier Gutierrez-Achury, Olga Sin, Ferronika Paranita, Esther Nibbeling, Cleo Smeets, Rudi Alberts, Jan Osinga, Monica Wong, Maria Zorro, Urmo Vosa, Ludolf Boven, Mathieu Platteel, Jurjen Knapper, Itty Oostendorp, Nettie Rietema, Klaas Heeres, Jakob van der Vlag, Marcel Burger, Astrid Maatman, Marloes Benjamins, Michiel Fokkens, Rutger Modderman, Desiree Weening, Jelkje Bergsma, Bart Mol, Mariska Oortwijn, Dharmendra and Pulkesh. I would like to thank Bote, Mentje, Helene, Joke, Aukje and Marijke Hanania, for their kind cooperation in dealing with paperwork and visa issues. I would like also to thank members of GUIDE for their support and help during my PhD, with special thanks to Maaike Bansema and Riekje Banus. Many thanks to all my close friends and the Iranian community of PhD students and their respective families for the joyful gatherings and all their supports: Bahram and Mitra, Ali and Matin, Ahmad Sattarzadeh, Mehran and Fatemeh, Morteza and Elham, Hesam and Zahra, Davood and Zohreh, Ahmad Vaez, Seyed and Ida, Yazdan and Marzyeh, and Shifteh Abedian. 178

Acknowledgements

I have tried to thank everyone who helped me in completing this thesis but if I have forgotten someone, I extend my thanks to you as well.

At the end I would like to share with you my life experience during this time:

Listen with the ears of tolerance … See through the eyes of compassion … Speak with the language of love… (Rumi, Persian poet, 1207-1273)

Omid May 2016 Groningen

179

List of courses and publications

Courses (1) Genetic epidemiological research and data analysis. (theory and practice).University of Groningen. 2011; 1.5 ECT

(2) Technical & ethical aspects of digital image manipulation. University of Groningen. 2011; 0.5 ECT.

(3) Moleular cytogenetics (theory and practice). University of Groningen. 2011-2012; 4 ECTs

(4) Introductory course cytogentics (theory and practice). University of Groningen. 2012; 3 ECTs

(5) English presentation skill for PhD student. University of Groningen. 2012; 1 ECT

(6) Principles of Next generation sequencing data analysis and interpretation. Erasmus Medical Center University, Rotherdam. 2012; 0.4 ECT

(7) Techniques in molecular biology. University of Groningen. 2012; 4 ECTs

(8) SNP and human disease. Erasmus Medical Center University, Rotherdam. 2012; 2 ECTs

(9) High-throughput next generation biology (theory and practice). University of Groningen. 2013; 1 ECT

(10) Molecular diagnostics. Erasmus Medical Center University, Rotherdam. 2013; 1 ECT

(11) Project management for scientific research. University of Groningen. 2011-2013; 2 ECTs

(12) Microbiological safety. University of Groningen. 2013; 1 ECT

181

Appendices

(13) Publishing in English for PhD student (Scientific writing). University of Groningen. 2013; 2 ECTs

(14) Mass Spectrometry. University of Groningen. 2013; 0.4 ECT

(15) Next generation sequencing (NGS) in DNA diagnostics. Erasmus Medical Center University, Rotherdam. 2014; 1 ECT

(16) Ethics of Research and Scientific Integrity for Researchers. University of Groningen. 2013-2014; 1.5 ECT

(17) Mutation detection by sequencing (theory and practice). University of Groningen. 2014-2015; 4 ECTs

(18) Cancer genetics & cytogenetic diagnostics. Radboud University Medical Center, Nijmegen. 2015; 1.5 ECT

Publications

Jazayeri O, Liu X, Diemen CC van, Bakker-van Waarde WM, Sikkema-Raddatz B, Sinke RJ, Zhang J, Ravenswaaij-Arts CM van (2015) A novel homozygous insertion and review of published mutations in the NNT gene causing familial glucocorticoid deficiency (FGD). European journal of medical genetics 58: 642–649.

Rump P, Jazayeri O, Dijk-Bos KK van, Johansson LF, Essen AJ van, Verheij JB, Veenstra-Knol HE, Redeker EJ, Mannens MM, Swertz MA, Alizadeh BZ, Ravenswaaij-Arts CMA, Sinke RJ and Sikkema-Raddatz B (2016) Whole-exome sequencing is a powerful approach for establishing the etiological diagnosis in patients with intellectual disability and microcephaly. BMC Medical Genomics 9: 7.

Jazayeri O, Sikkema-Raddatz B, Sinke RJ and Alizadeh BZ. Composite biological network analysis provides novel clues for hereditary congenital microcephaly. Submitted.

182