Thesis for the degree Doctor of Philosophy

Submitted to the Scientific Council of the Weizmann Institute of Science, Rehovot, Israel

By Danit Oz Levi

Next Generation Genomic discovery of undeciphered monogenic diseases

Advisor: Prof. Doron Lancet

March 2016 (Adar 5776)

LIST OF ABBREVIATIONS

NGS       Next Generation Sequencing
HSP       Hereditary Spastic Paraparesis
IDIS      Intractable Diarrhea of Infancy Syndrome
THES      Trichohepatoenteric Syndrome
CLS       Capillary Leak Syndrome
SNP       Single Nucleotide Polymorphism
SNV       Single Nucleotide Variant
InDel     Insertion-Deletion
VUS       Variant of Uncertain Significance
WGS       Whole Genome Sequencing
CNV       Copy Number Variation
VCF       Variant Call Format
MAF       Minor Allele Frequency
LOF       Loss Of Function
SPG       Spastic Paraplegia
SCA       Spinocerebellar Ataxia
EEC       Enteroendocrine Cells
TPN       Total Parenteral Nutrition
TF        Transcription Factor
RNAPII    RNA polymerase II
SD        Syndromic Diarrhea
MGUS      Monoclonal Gammopathy of Undetermined Significance
IVIG      Intravenous Immunoglobulins
BWA       Burrows-Wheeler Aligner
GATK      Genome Analysis Toolkit
SVA       Sequence Variant Analyzer
NHLBI     National Heart, Lung and Blood Institute
ExAC      Exome Aggregation Consortium
CHGV      Center for Human Genome Variation
DMEM      Dulbecco's Modified Eagle Medium
FBS       Fetal Bovine Serum
EBSS      Earle's Balanced Salt Solution
TEM       Transmission Electron Microscopy
ERDS      Estimation by Read Depth with Single-nucleotide variants
EEG       Electroencephalogram
EMG       Electromyography
WD        Tryptophan-Aspartate repeat
LOD       Logarithm Of Odds
ICR       Intestinal Critical Region
ENCODE    Encyclopedia of DNA Elements
UCSC      University of California Santa Cruz
CHGA      Chromogranin A
iPSC      Induced Pluripotent Stem Cells
HIO       Human Intestinal Organoids
GTEx      Genotype-Tissue Expression
BLAST     Basic Local Alignment Search Tool
ORF       Open Reading Frame
KO        Knockout
WT        Wild-Type
MRI/MRS   Magnetic Resonance Imaging/Spectroscopy
CBC       Complete Blood Count
CRP       C-reactive protein
ECM       Extracellular Matrix
FA        Focal Adhesion
VEGF      Vascular Endothelial Growth Factor
GO        Gene Ontology
DEP       Differentially Expressed Proteins
FDR       False Discovery Rate
ALS       Amyotrophic Lateral Sclerosis
HSAN      Hereditary Sensory-Autonomic Neuropathy
ER        Endoplasmic Reticulum
HUVEC     Human Umbilical Vein Endothelial Cells
CMA       Chromosomal Microarray

ACKNOWLEDGMENTS

First and foremost, I express my deepest gratitude to my supervisor, Prof. Doron Lancet, who has been and will continue to be my mentor for life. I thank him for his continuous support of my PhD studies and research, for his patience and willingness to teach me, and for his motivation and immense knowledge. During my PhD studies I collaborated with experts in genetics and genomics who all contributed greatly to the success of my work, and from whom I learned a great deal. I would especially like to thank Prof. Elon Pras, head of the genetic institute at Sheba Medical Center, for his continuous interest and assistance, for fruitful and intelligent discussions and for the highly appreciated personal support. I thank Prof. Yair Anikster and Prof. Bruria Ben-Zeev from Sheba Medical Center for initiating the highly interesting genetic projects I have been involved in and for helping me make them even greater. I thank Prof. Zvulun Elazar for his collaboration in the autophagy study, and his lab members, especially Dr. Amir Gelman, for welcoming me into their lab as one of them while I worked on the TECPR2 project. I thank Prof. David Goldstein for a brilliant collaboration, and all his group members for sharing their expertise in NGS analysis and bioinformatics. I thank Prof. Len Pennacchio for his collaboration in the IDIS project and for all the excellent scientific work he has done to make this story a successful publication. I thank Dr. Tsviya Olender for teaching and guiding me throughout my PhD. I thank Dr. Anna Alkelai and Dr. Gil Stelzer for the stimulating scientific discussions and a lot of moral support, but above all for being wonderful friends. I thank all members of the Lancet group throughout my PhD period for the motivating discussions and for making my time extremely enjoyable.
An enormous thank you to my loving and extremely supportive family: my husband Sagi, for his encouragement and for being a wonderful husband and father, allowing me to grow professionally; my father, who always encourages me to think forward and be great; and my mother, whose incredible grandmother skills enabled me to be where I am today.


TABLE OF CONTENTS

ABSTRACT
ABSTRACT (HEBREW)
INTRODUCTION
    Next generation sequencing analysis
    Analysis of sequence variants
        Databases of control frequencies
        Protein damage prediction
        Variant prioritization
    Potential impact of this thesis
    Hereditary Spastic Paraparesis (HSP)
        Autophagy and neurodegeneration
    Intractable Diarrhea of Infancy Syndrome (IDIS)
        Enhancer activity and connection to diseases
    Trichohepatoenteric syndrome (THES)
    Capillary Leak Syndrome (CLS)
        Pathophysiology
        Treatment
METHODS
    General methods
        1.1 Subjects
        1.2 Exome sequencing and variant identification
        1.3 Bioinformatics analysis
    Methods for the study of HSP
        2.1 Homozygosity mapping
        2.2 Semi-quantitative RT-PCR
        2.3 Cell culture and transfection
        2.4 Immunoblots
        2.5 Immunofluorescence analyses
        2.6 Transmission Electron Microscopy (TEM)
    Methods for the study of IDIS
        3.1 Whole genome sequencing
        3.2 Biopsy collection
        3.3 RNA extraction from biopsies
        3.4 RNA sequencing of human samples
        3.5 Quantitative Real-Time Reverse Transcriptase Polymerase Chain Reaction (qPCR)
        3.6 Serum Collection
        3.7 ELISA
        3.8 Linkage analysis and homozygosity mapping
        3.9 Deletion analysis
        3.10 Mouse transgenic assays
        3.11 Generation of enhancer null mice
        3.12 Genotyping of enhancer null mice
        3.13 RNA sequencing of mouse tissues
        3.14 RT-PCR, RT-qPCR and 5’, 3’-RACE
        3.15 Histological analysis of human biopsies
        3.16 Generation of induced pluripotent stem cells (iPSCs) from patient lymphocytes
        3.17 Differentiation of iPSCs into intestinal organoids
        3.18 Circular Chromosome Conformation Capture (4C) analysis
        3.19 Deletion knockout by CRISPR/Cas9 system
    Methods for the study of THES
        4.1 Immunology function
        4.2 Polarizing microscopy
    Methods for the study of CLS
        5.1 Immunofluorescence of cultured cells and microscopy
        5.2 Western blots
        5.3 Platelets adhesion tests
        5.4 Endothelial cells isolation
        5.5 Endothelial cells assays with serum
        5.6 Transcriptome analysis
        5.7 Proteomics sample preparation
        5.8 Liquid Chromatography
        5.9 Mass Spectrometry
        5.10 Data Processing and Analysis
RESULTS
    1. Hereditary Spastic Paraparesis (HSP)
        1.1 Clinical description
        1.2 Exome sequencing and mutation discovery
        1.3 Connecting the mutation to changes in Autophagy pathway proteins
        1.4 Using siRNA knockdown as model for the mutation
        1.5 Additional patients with TECPR2 mutations
    2. Intractable Diarrhea of Infancy Syndrome (IDIS)
        2.1 Identification of two deletion alleles in IDIS patients
        2.2 The intergenic deletion removes a distant-acting enhancer
        2.3 Deletion of the enhancer in mice leads to a human-like phenotype
        2.4 Assessing levels by RNA sequencing
        2.5 Histopathological analysis of patient's biopsies
        2.6 Reprogrammed intestinal cells from patient-derived induced pluripotent stem cells
        2.7 Circular Chromosome Conformation Capture (4C)
        2.8 Reprocessed RNA-seq analysis in KO mice identified a new differentially expressed transcript in stomach
        2.9 LOC105731045 protein search
        2.10 Deletion of the LOC105731045 ORF in mice leads to a less severe phenotype of IDIS
        2.11 Cross lines of Enhancer KO and ORF KO showed the phenotype varies widely
    3. Trichohepatoenteric Syndrome (THES)
        3.1 Clinical description
        3.2 Exome-sequencing and mutation discovery
        3.3 Clinical diagnosis of THES following exome-sequencing
    4. Capillary Leak Syndrome (CLS)
        4.1 Clinical description
        4.2 TLN1 as a strong candidate for CLS
        4.3 The splice site mutation affects the TLN1 transcripts
        4.4 Transcriptome analysis on patients’ skin fibroblasts
        4.5 Proteome analysis on patients’ skin fibroblasts
        4.6 Intercellular junctions integrity is impaired in TLN1 hemizygous endothelial cells
        4.7 Seeking other TLN1-related phenotypes in the CLS patient
    5. Additional genome studies
        5.1 Trios project
        5.2 Chromosomal translocation
        5.3 Intellectual disability with microcephaly
DISCUSSION
    1. Next generation sequencing: past, present and future
        1.1 Whole genome versus whole-exome sequencing
        1.2 The future of clinical exome and genome sequencing
    2. Hereditary Spastic Paraparesis
    3. Intractable Diarrhea of Infancy Syndrome
    4. Trichohepatoenteric Syndrome
    5. Capillary Leak Syndrome
BIBLIOGRAPHY
List of Publications
Declaration on independent collaboration

ABSTRACT

The identification of the causative mutation for a monogenic disease enables molecular diagnosis and sheds new light on the molecular and cellular mechanisms of genetic diseases. Next generation DNA sequencing (NGS) makes it possible to identify mutations responsible for rare disorders even with a small number of affected individuals and no available genetic linkage data. In my thesis, I employed this paradigm along with novel bioinformatics approaches to aid in the identification of new disease-causing genes. I mainly focused on four rare diseases in which I identified novel genes and mutations and investigated their biological significance. For Hereditary Spastic Paraparesis (HSP) I found three different variants in the autophagy-related gene TECPR2. This novel association of autophagy with HSP was corroborated by follow-up studies in our group and in external laboratories. For Intractable Diarrhea of Infancy Syndrome (IDIS) I identified a homozygous deletion of an intergenic region with a regulatory signature. This region proved to have enhancer activity in early gastrointestinal development. A candidate target gene for the deleted enhancer is a neighboring unannotated predicted gene, LOC105731045, with high similarity to DAXX, a death-associated transcription repressor implicated in embryonic development. Targeted deletion in mice of the enhancer and/or its target gene recapitulated many aspects of the human condition. For a case of undeciphered atypical diarrhea, I found a novel mutation in TTC37, leading to a definitive diagnosis of trichohepatoenteric syndrome. For Capillary Leak Syndrome (CLS) I found a dominant negative splice mutation in TLN1, a cytoskeletal gene with a role in adhesion-dependent capillary function. Experiments involving mouse endothelial cells with heterozygously deleted TLN1 showed disruption of endothelial integrity and paracellular junctions.
In addition, I have been involved in a large trio project for rare and undiagnosed conditions, in which the disease gene was identified for 46% of the cases. In another case I identified the breakpoints of a balanced genomic translocation in a fetus using whole-genome sequencing. In sum, my work highlights the importance of combining genome-wide sequencing methods, advanced bioinformatics analyses and insights from model experimental systems to enhance the capacity for clinical genetic decipherment.


ABSTRACT (HEBREW)

Identifying the genes that cause rare monogenic diseases enables molecular diagnosis and sheds light on the biological and cellular processes involved in these diseases. The revolutionary technologies of next-generation DNA sequencing now make it possible to identify the mutations causing these diseases even when they appear in a small number of families and in the absence of a defined linkage region. In my doctoral work I focused on this approach, combined with novel and sophisticated bioinformatics for analyzing the sequencing results, in order to identify new disease-causing genes. My work focused mainly on four rare monogenic diseases, for which I identified the causative gene and mutation and investigated their biological significance in the context of the disease mechanisms. In a neurodegenerative disease causing ataxia and paralysis of the lower limbs, I found 3 different pathogenic mutations in the gene TECPR2, which is linked to the cellular autophagy pathway. The link between this gene, the autophagy pathway and the neurodegenerative disease was confirmed by further studies carried out in our research group and in other laboratories, and opened a new field of research that is now widely studied. In congenital diarrhea of infancy (IDIS) I identified a homozygous deletion of a DNA segment in an intergenic region of the genome that was identified as having a regulatory signature. This region was shown to have enhancer activity at an early developmental stage in the digestive system. The candidate target gene of the enhancer is a novel, predicted and unannotated gene, LOC105731045, which is similar in sequence to DAXX, a transcription repressor associated with embryonic development. Deleting the enhancer and/or its target gene in mice led to phenotypic features similar to those observed in humans. For an additional case of undeciphered diarrheal disease, I found a novel mutation in the gene TTC37, which is known to cause trichohepatoenteric syndrome, and this led to a full diagnosis of the patient. In a life-threatening disease with episodes of plasma leakage from the blood capillaries into the interstitial fluid, I found a rare heterozygous mutation causing a splicing change in the cytoskeletal gene TLN1, which is known to function in intercellular connections in the capillary endothelium. Experiments in mouse endothelial cells with a heterozygous TLN1 deficiency showed impaired function of endothelial intercellular junctions. In another large study, comprising trios of a child affected by a severe and sometimes undiagnosed disease and healthy parents, we used exome sequencing to identify the cause of the disease in 64% of the cases. In an additional case I identified the breakpoints of a balanced chromosomal translocation in a fetus by genome sequencing.
In conclusion, my doctoral work highlights the paramount importance of combining next-generation genome sequencing, sound bioinformatic analysis and the use of model systems in order to understand the genetic basis of rare diseases and to enhance medical and diagnostic decipherment capacity in genetic diseases.


INTRODUCTION

Next generation sequencing analysis

Mendelian disorders are caused by abnormalities in an individual's genome, including single-nucleotide polymorphisms (SNPs), small insertions and deletions (InDels) and copy number variations. To date, the molecular basis for nearly 3,500 Mendelian phenotypes has been uncovered1. However, there are many more rare (or “orphan”) disorders for which the genetic cause remains unknown (the NIH lists over 7,500 rare disorders in total; http://rarediseases.info.nih.gov/). Rare monogenic diseases are of substantial interest because identification of their genetic bases provides important knowledge about disease mechanisms, biological pathways, diagnosis and potential therapeutic targets. While many rare monogenic disorders are amenable to linkage analysis, quite a few others present a challenge to this method because they appear in only a few affected individuals and/or families, which results in statistically underpowered analyses and often restricts the analysis to a priori-identified candidate genes. In recent years, the growing use of next-generation sequencing methodologies has allowed causal variants in Mendelian disorders to be identified with much higher efficiency and with considerably fewer constraints. Two unbiased sequencing approaches are available for detecting genetic variation within an individual: whole genome sequencing and whole exome sequencing. With genome sequencing, about 4 million variants per individual can be detected, whereas exome sequencing (covering mainly the 1.5% protein-coding part of the genome) yields about 20,000 variants2. Nevertheless, sequencing the exome rather than the entire genome is well justified for detecting all protein-coding variation in a patient in a single experiment, and has become a very powerful approach for identifying causal mutations in rare human disorders3; 4. This is because a large fraction of the thousands of mutations that underlie monogenic diseases are rare missense or nonsense variations in protein-coding exons5-7. Splice acceptor and donor sites represent an additional class of sequences that are enriched for highly functional variation and are therefore targeted here as well.

Recent studies have demonstrated the diagnostic power of exome sequencing, especially in cases of undeciphered rare conditions or in atypical forms of a heterogeneous disease seen in very few affected individuals in one family only8-12. The availability of whole exome sequence data provides the opportunity to perform additional genotyping within the founder families or in large ethnically matched control cohorts, thus contributing to the generation of community- and ancestry-specific reference databases for the analysis of the many future “variants of unknown significance” (VUS) emerging from exome sequencing. However, this approach fails to interrogate the remaining non-coding 98% of the human genome, and there are emerging indications that a significant proportion of disease-associated variants lie in non-protein-coding regions13; 14, as well as in non-coding RNA genes. With the increasing number of well documented cases in which mutations in human regulatory elements are associated with severe phenotypes, and despite the greater difficulty of functional interpretation15, there is a clear worldwide trend towards whole genome sequencing. Importantly, a further reason for moving to whole genome sequencing is its much more uniform sequence coverage, which makes it more adequate for detecting copy number variations (CNVs) of clinical significance. In my thesis I carried out several WGS experiments when justified.

Analysis of sequence variants

Primary analysis of disease NGS results includes sequence read mapping and variant calling, with the results stored in a Variant Call Format (VCF) file. This file typically contains 20,000-50,000 positions that differ from the reference genome, providing a “long list” that must be further annotated and filtered using an appropriate analysis pipeline. The filtering process uses annotations such as variant frequency in the general population, variant sequencing quality and the predicted protein impact of missense variants, as well as the predicted mode of inheritance according to the studied phenotype and family.
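As an illustration of the first filtering steps applied to such a “long list”, the following minimal Python sketch keeps only records that pass a call-quality threshold and are rare in controls. It is not the pipeline used in this work: the cutoffs (`min_qual`, `max_af`), the assumption that a control-population frequency has been annotated under the INFO key `AF`, and the restriction to biallelic records are all illustrative choices.

```python
# Minimal sketch: filter a VCF "long list" by call quality and control frequency.
# Assumptions (not from the thesis): cutoffs below, a control frequency annotated
# as INFO key "AF", and one ALT allele per record.

def parse_info(info_field):
    """Turn 'AF=0.002;DP=35' into {'AF': '0.002', 'DP': '35'}; bare flags map to True."""
    info = {}
    for item in info_field.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def filter_vcf_lines(lines, min_qual=30.0, max_af=0.01):
    """Yield (chrom, pos, ref, alt) for records passing the quality and frequency filters."""
    for line in lines:
        if line.startswith("#"):  # meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, qual, flt, info_field = fields[:8]
        if flt not in ("PASS", "."):
            continue
        if qual == "." or float(qual) < min_qual:
            continue
        info = parse_info(info_field)
        # A variant absent from the control database carries no AF and is kept.
        af = float(info["AF"]) if "AF" in info else 0.0
        if af > max_af:
            continue
        yield chrom, int(pos), ref, alt


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tG\t50\tPASS\tAF=0.0005",  # rare, good quality: kept
    "1\t2000\t.\tC\tT\t50\tPASS\tAF=0.20",    # common in controls: dropped
    "2\t3000\t.\tG\tA\t10\tPASS\t.",          # low quality: dropped
    "2\t4000\t.\tT\tC\t99\tPASS\t.",          # not in control database: kept
]
kept = list(filter_vcf_lines(example))
```

In practice these steps are usually delegated to dedicated annotation and filtering tools; the sketch only makes the logic of the frequency and quality filters explicit.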

Databases of control frequencies

Several databases currently catalogue the growing number of variants that are continuously being discovered in the human genome. Whenever a variant is identified in the analysis of a rare Mendelian disorder, it is crucial to scrutinize these databases carefully in order to determine the rarity of the identified variant. Population databases are useful for obtaining the frequencies of variants in large populations; nevertheless, they cannot be assumed to include only healthy individuals and are known to contain pathogenic variants. These databases do not contain extensive information regarding the functional effect of the reported variants or any possible associated phenotypes. In addition, when using such databases, one must determine the degree to which data are validated for analytical accuracy (e.g., low-pass versus Sanger-validated variants) and evaluate any quality metrics that are provided to assess data accuracy, including searching the scientific and medical literature, with due regard to the count of individuals reported. Clinical laboratories and research institutions generating large amounts of sequencing data often implement an internal system to track all sequence variants identified in each gene, together with clinical assertions when reported. Such a system plays a pivotal role in the filtering of rare variants, as it usually contains polymorphisms reported in the ethnically matched population studied in that laboratory, in contrast to the large population datasets whose controls span multiple ethnicities. If a variant has not been reported in any control database so far, it is defined as “novel”. If the variant has been reported, its rarity is determined according to its reported minor allele frequency (MAF) in the population. A variant is typically defined as a polymorphism when its MAF is above 1%.
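The frequency-based labels described above can be written down as a short Python sketch. The function names are hypothetical, the label “rare” for reported variants at or below the cutoff is an assumption added for illustration, and the 1% cutoff is the conventional polymorphism threshold quoted in the text.

```python
# Minimal sketch of frequency-based labelling; function names and the "rare"
# label are illustrative assumptions, the 1% cutoff comes from the text.

def minor_allele_frequency(allele_count, total_alleles):
    """Frequency of the less common allele at a biallelic site."""
    freq = allele_count / total_alleles
    return min(freq, 1.0 - freq)


def classify_by_frequency(maf, polymorphism_cutoff=0.01):
    """Label a variant as 'novel', 'rare' or 'polymorphism'.

    maf is None when the variant is absent from every control database consulted.
    """
    if maf is None:
        return "novel"
    if maf > polymorphism_cutoff:
        return "polymorphism"
    return "rare"


# 3 alternate alleles observed among 1,000 control chromosomes (500 individuals)
label_rare = classify_by_frequency(minor_allele_frequency(3, 1000))
# 50 alternate alleles observed among 1,000 control chromosomes
label_common = classify_by_frequency(minor_allele_frequency(50, 1000))
# never observed in controls
label_novel = classify_by_frequency(None)
```

Because an in-house database and a large public dataset can report different frequencies for the same variant, such a function would typically be applied to each source separately.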

Protein damage prediction

A variety of computational tools are currently available for interpreting the effect of a sequence variant at the nucleotide and amino acid level. The two main categories are tools that predict whether a missense change is damaging to the function or structure of the encoded protein, and tools that predict whether there is an effect on splicing. Tools for addressing noncoding sequences are now beginning to emerge16. In addition, the ExAC RVIS (Residual Variation Intolerance Score based on the Exome Aggregation Consortium data) was mined from the Genic Intolerance database17 (http://genic-intolerance.org/index.jsp), providing an intolerance scoring system that assesses whether a gene has relatively more or less functional genetic variation than expected based on the apparently neutral variation found in the gene. Genes responsible for Mendelian diseases are significantly more intolerant to functional genetic variation than genes that do not cause any known disease. The impact of a missense change depends on criteria such as the evolutionary conservation of the amino acid or nucleotide, the location and context within the protein sequence, and the biochemical consequence of the amino acid substitution. In general, most algorithms for missense variant prediction are 65–80% accurate when examining known disease variants18. Most tools also tend to have low specificity, resulting in overprediction of missense changes as deleterious, and are less reliable at predicting missense variants with a milder effect. The three most commonly used tools are currently PolyPhen19, SIFT20 and MutationTaster21. Combining these methods into a unified score aids in distinguishing pathogenic from benign variants. In the case of splicing variants, most of the available software programs, such as GeneSplicer, NetGene2 and Human Splicing Finder, have higher sensitivity (~90–100%) than specificity (~60–80%) in predicting splice site abnormalities22-24.
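One simple way to combine several predictors into a unified call is a majority vote, sketched below in Python. This is an illustrative scheme and not the scoring used in this work. The cutoffs (SIFT at or below 0.05, PolyPhen at or above 0.85) and the MutationTaster label string `disease_causing` are assumptions that should be checked against the version of each tool actually used; the only property relied upon is that lower SIFT scores and higher PolyPhen scores indicate a more damaging change.

```python
# Minimal sketch of a majority-vote consensus across three missense predictors.
# Cutoffs and the MutationTaster label string are assumptions for illustration.

def consensus_damage_call(sift_score, polyphen_score, mutationtaster_label,
                          sift_cutoff=0.05, polyphen_cutoff=0.85):
    """Return the number of 'damaging' votes and the majority decision."""
    votes = [
        sift_score <= sift_cutoff,                  # SIFT: low score = damaging
        polyphen_score >= polyphen_cutoff,          # PolyPhen: high score = damaging
        mutationtaster_label == "disease_causing",  # categorical call
    ]
    damaging_votes = sum(votes)
    return {
        "damaging_votes": damaging_votes,
        "tools": len(votes),
        "consensus_damaging": damaging_votes >= 2,  # at least two of three agree
    }


unanimous = consensus_damage_call(0.01, 0.95, "disease_causing")
split = consensus_damage_call(0.01, 0.20, "disease_causing")
benign = consensus_damage_call(0.40, 0.10, "polymorphism")
```

Given the low specificity noted above, a consensus call is best treated as a ranking aid rather than as evidence of pathogenicity on its own.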

Variant prioritization

Filtering by the above-mentioned “secondary analysis” criteria helps generate a “variant medium list” with a few dozen to a few hundred entries, depending on the assumed mode of inheritance and on the filtering cutoffs employed. A major challenge of NGS technology is the transition from such a list of variants, each residing in a particular gene, to the very few most viable disease-causing candidates. A typical gene can have a multitude of variants that have not yet been documented to have a relationship with a disease or a phenotype, so in most cases none of the annotated variant-disease relations appears relevant to the sequenced subject. Furthermore, it has been shown that every genome harbors about 100 loss-of-function (LOF) variants25, so frequency and impact filtering is not sufficient to identify causative variants. To address these challenges, a gene-based interpretation process becomes necessary. This interpretation strategy entails finding disease or phenotype relationships for the gene, so as to address yet undiscovered variant-phenotype relations.
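The dependence of the medium list on the assumed mode of inheritance can be illustrated with a small Python sketch for a parent-child trio. The two models shown (homozygous recessive and de novo), the function names and the placeholder gene names are illustrative assumptions; genotypes use the VCF convention in which `0` is the reference allele and `.` a missing call.

```python
# Minimal sketch of inheritance-model filtering in a trio.
# Model definitions, function names and gene names are hypothetical.

def alt_allele_count(genotype):
    """Count non-reference alleles in a genotype such as '0/1' or '1|1'.

    Returns None when any allele is missing ('.').
    """
    alleles = genotype.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for allele in alleles if allele != "0")


def fits_model(child, mother, father, model):
    """Check whether trio genotypes are compatible with an inheritance model."""
    counts = [alt_allele_count(g) for g in (child, mother, father)]
    if None in counts:  # missing genotypes cannot support a model
        return False
    c, m, f = counts
    if model == "recessive_homozygous":
        return c == 2 and m == 1 and f == 1
    if model == "de_novo":
        return c == 1 and m == 0 and f == 0
    raise ValueError(f"unknown model: {model}")


medium_list = [
    {"gene": "GENE_A", "child": "1/1", "mother": "0/1", "father": "0/1"},
    {"gene": "GENE_B", "child": "0/1", "mother": "0/1", "father": "0/0"},
    {"gene": "GENE_C", "child": "0/1", "mother": "0/0", "father": "0/0"},
    {"gene": "GENE_D", "child": "1/1", "mother": "./.", "father": "0/1"},
]


def select(model):
    """Return the genes whose variant fits the given model."""
    return [v["gene"] for v in medium_list
            if fits_model(v["child"], v["mother"], v["father"], model)]


recessive_candidates = select("recessive_homozygous")
de_novo_candidates = select("de_novo")
```

A fuller implementation would also handle compound heterozygous and X-linked models, which require grouping variants by gene and knowing the sex of the child.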


In my work I have performed many exome data interpretations for different diseases, using the above-mentioned criteria for secondary analysis, including filtering by frequency, variant pathogenicity, mode of inheritance, sequencing quality parameters and other project-specific parameters. For the highly important tertiary analysis, I contributed to the construction and development in our laboratory of VarElect (ve.genecards.org), a phenotype-dependent NGS variant prioritizer that leverages the wealth of information in GeneCards, MalaCards and their affiliated databases26-28. The VarElect algorithm infers direct as well as indirect links between genes and diseases/phenotypes. The direct mode relies strongly on the combined power of GeneCards and MalaCards. When scrutinizing the results, the user benefits from an extensive portrayal of gene-phenotype evidence ("MiniCards"), with hyperlinks to the parent databases. VarElect compares favorably with several often-used NGS phenotyping tools, thus providing a robust facility for ranking genes by their likelihood of being related to a patient’s disease. To date, VarElect has helped solve ~20 clinical cases in our own laboratory and is heavily used in more than 50 NGS centers worldwide. VarElect's capacity to process numerous NGS cases automatically, either in web format or in a VCF-analyzer API mode, is indispensable for emerging clinical projects that involve thousands of whole exome/genome NGS analyses. In the indirect mode, VarElect benefits from GeneCards’ diverse gene-to-gene relationships. These include STRING protein-protein interactions and integrated pathway information in PathCards, another member of the GeneCards Suite. PathCards was constructed because of the pronounced heterogeneity in pathway naming and gene content across different pathway sources. It uses an algorithm that unifies pathways into SuperPaths29, integrating 3,215 pathways from 12 sources into 1,073 SuperPaths, thereby reducing redundancy and maximizing informativeness. The indirect mode further capitalizes on GenesLikeMe, a tool that relates genes to each other by shared paralogy, protein domains, mouse phenotypes, tissue RNA/protein expression patterns and publications.
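The distinction between direct and indirect gene-phenotype links can be illustrated with a toy Python sketch. It is not the VarElect algorithm and does not use its API or scoring; the function `rank_genes` and the data structures are hypothetical. The toy data borrow the TECPR2-LC3 interaction and the autophagic proteins SQSTM1 and MAP1LC3B mentioned in the HSP section, deliberately leave TECPR2 itself unannotated so that it can only be reached indirectly, and add a placeholder gene, GENE_Y, with no links.

```python
# Toy sketch of direct versus indirect gene-phenotype linking.
# NOT the VarElect algorithm: names, data and ranking rule are illustrative.

def rank_genes(candidates, phenotype, gene_annotations, gene_links):
    """Return (gene, mode, via) tuples: direct hits first, then indirect, then unlinked."""
    ranked = []
    for gene in candidates:
        if phenotype in gene_annotations.get(gene, set()):
            ranked.append((gene, "direct", None))
            continue
        # Indirect: a related gene (e.g. an interaction partner) carries the phenotype term.
        partners = sorted(
            partner for partner in gene_links.get(gene, set())
            if phenotype in gene_annotations.get(partner, set())
        )
        if partners:
            ranked.append((gene, "indirect", partners[0]))
        else:
            ranked.append((gene, "none", None))
    order = {"direct": 0, "indirect": 1, "none": 2}
    return sorted(ranked, key=lambda row: order[row[1]])  # stable within each mode


gene_annotations = {
    "SQSTM1": {"autophagy"},
    "MAP1LC3B": {"autophagy"},
}
gene_links = {
    "TECPR2": {"MAP1LC3B"},  # interaction partner
}
ranking = rank_genes(["GENE_Y", "TECPR2", "SQSTM1"], "autophagy",
                     gene_annotations, gene_links)
```

In this toy setting, a candidate gene with no phenotype annotation of its own is still surfaced through its annotated partner, which is the purpose the indirect mode serves for poorly characterized genes.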


Potential impact of this thesis

The main goal of my PhD thesis was to identify the genetic basis of undeciphered rare Mendelian disorders. Identification of disease-causing variants through pedigree-based gene discovery in specific populations will likely inform similar or identical disease phenotypes in other populations, which may show alternate mutations in the same gene or in different but functionally related genes. The impact of such findings will likely translate into a deeper understanding of the molecular basis of these disorders and, furthermore, offer translational promise in the form of genetic counseling, prenatal diagnosis and customized therapy for whole communities with the same ethnic background, and more generally for the public in Israel and worldwide. Insights into the genetic basis of novel Mendelian diseases may also provide new drug targets for both rare and related common disorders. These findings may also expand the use of current drugs to treat patients who have been diagnosed and, most importantly, may enable the development of new drugs in cases where the disease-causing gene encodes a non-functional protein that can be replaced, as in enzyme replacement therapy. In the case of gain-of-function mutations, or certain other categories of mutations, novel approaches closer to the genetic abnormality may be more appropriate (e.g. antisense oligonucleotides, mRNA read-through facilitators, genome editing in tissues that undergo somatic self-renewal by clonal expansion, and other approaches under development). Furthermore, deeper understanding of monogenic diseases may inform the development of novel therapeutics for related common diseases. My PhD thesis focused on four rare debilitating genetic diseases that have gone undeciphered for many years. These disorders mainly affect infants and children at a very early stage of life, and are therefore considered life-threatening. For these, I have identified the disease-causing mutation and performed further functional assays in order to decipher the molecular mechanism underlying each disorder.


Hereditary Spastic Paraparesis (HSP)

HSP comprises a diverse group of neurodegenerative disorders estimated to affect 9.6 per 100,000 individuals, characterized by axonal degeneration of the corticospinal or pyramidal motor and sensory tracts that control the lower extremities, resulting in progressive spasticity and severe paralysis of the lower limbs30-33. The disorder exhibits clinical variability, with onset ranging from early childhood to 70 years of age. Pure HSP is characterized by progressive spasticity and leg weakness and is often associated with autosomal dominant inheritance, whereas complicated HSP involves lower limb spasticity accompanied by other neurological symptoms such as ataxia, dystonia, tremor, epileptic seizures, mental retardation and cognitive decline, optic atrophy and skin abnormalities, and tends to be autosomal recessive34; 35. Prior to our findings, 48 SPG loci for different HSP types (including X-linked) had been reported, of which 27 had been associated with identified genes36. Because of the heterogeneity and the clinical overlap between the different groups of spastic paraparesis, diagnosis and recommendation for genetic testing in these disorders have been a daunting task. In my work I identified a novel form of recessive complicated HSP, accompanied by autonomic features and respiratory symptoms, in Jewish Bukharian children. This form of the disease had never been reported before, and we thus named it SPG49. Exome sequencing showed that the disease is caused by a single base deletion resulting in a premature stop codon in the gene TECPR2, leading to full degradation of the protein. A recent proteomic analysis of the autophagy interaction network demonstrated positive regulation of autophagy by TECPR2 via interaction with the six human Atg8 orthologs, including the MAP1LC3 (LC3) subfamily37. We thus examined the autophagy-related fate of two key autophagic proteins, SQSTM1 (p62) and MAP1LC3B (LC3), in skin fibroblasts of an affected individual, as compared to a healthy control.

Autophagy and neurodegeneration

Autophagy is a major intracellular mechanism for degradation of compromised proteins and organelles in the lysosome, and is essential for maintaining cellular homeostasis and survival. Autophagy impairment has been implicated in the pathogenesis of several neurodegenerative and muscle diseases, such as Huntington, Alzheimer, and Parkinson diseases, spinocerebellar ataxias (SCAs), and amyotrophic lateral sclerosis. In these diseases, the buildup of protein aggregates leads to a decline in proteasome activity, making nerve cells more dependent on autophagy for degradation, and often resulting in upregulated autophagy. Neurons are highly vulnerable to impairment of autolysosomal clearance and must continually traffic autophagy-related compartments over long distances back to the cell body, where substrate clearance by lysosomes is most active. Constitutive autophagy is highly active in motor neurons and has a protective role by preventing the accumulation of cytosolic unsequestered cargo and potentially toxic proteins that lead to disruption of the transport mechanisms in axons. Even a minor inhibition of lysosomal proteolytic activity disrupts the transport of autophagic vacuoles and causes them to accumulate selectively, creating an axonal “traffic jam”. Thus, autophagy failure, depending on where the defect lies along the pathway, can specifically trigger neuronal cell death and bring about a neurodegenerative phenotype. These mechanisms of decreased autophagosome formation or autophagosomal aggregation, and their contribution to neurodegeneration, remain to be further investigated.

Intractable Diarrhea of Infancy Syndrome (IDIS)

Congenital diarrhea disorders are a heterogeneous group of inherited diseases of the gastrointestinal tract starting within the first few weeks of life, often immediately after birth38-40. For many of these conditions, severe chronic diarrhea represents the main clinical manifestation, while in other (syndromic) cases diarrhea is only a component of a more systemic disease. Very often therapy must be started at birth to prevent life-threatening complications. Milder progressive forms with late onset have been described. In recent years, molecular analysis has become a major asset in the diagnostic approach to a patient41. This heterogeneous group of diseases comprises rare enteropathies that are classified by etiology and pathogenesis into defects in: (i) absorption and transport of nutrients and electrolytes; (ii) maintenance and differentiation of enterocytes (the most prevalent gut cells); (iii) differentiation and function of enteroendocrine cells (EECs), which secrete over 30 gastrointestinal hormone peptides and constitute 1% of the intestinal epithelial cell population42; 43; and (iv) modulation of the intestinal immune response40.

Congenital diarrheas are not only major problems of clinical management. The great importance of these disorders, when genetically unraveled, lies in the information that they provide about normal small-intestinal function in humans. The patients may be considered the human equivalents of the 'knock-out' mice, in which targeted gene disruption allows sometimes unexpected insight into the regulation of complex biological function44. Table 1 provides a simplified view of the genetically deciphered sub-universe of congenital diarrhea, whereby the mutated genes are highly diverse in function, including ion transporters, proteases, transcription factors, cytoskeleton and cell adhesion proteins, second-messenger transduction enzymes and mitochondrial respiratory chain components. Many other forms of the disease await genetic scrutiny.

Table 1: Congenital diarrhea diseases with their associated gene symbols and names. Class: (i) absorption and transport of nutrients and electrolytes; (ii) maintenance and differentiation of enterocytes; (iii) differentiation and function of enteroendocrine cells; (iv) modulation of the intestinal immune response. APECED, autoimmune polyendocrinopathy-candidiasis-ectodermal dystrophy; IPEX, immunodysregulation polyendocrinopathy enteropathy X-linked syndrome. The first 6 are defined as Diarrhea-N (N=1-6), abbreviated here as DiaN. Parenthesized gene symbols are suspected causative genes. Table construction is based on OMIM and our own disease database MalaCards (http://www.malacards.org/).

In my PhD thesis, I studied in detail the molecular-genetic etiology of a specific group of 8 non-syndromic congenital diarrhea patients belonging to 7 families, many with known or suspected consanguinity, and showing likely autosomal recessive inheritance. These patients were historically defined as suffering from intractable diarrhea of infancy syndrome (IDIS)39, and the genetic basis of their affliction remained undeciphered for nearly 2 decades. IDIS is a potentially life-threatening condition in young infants and children, defined as congenital, severe and protracted non-infectious diarrhea lasting more than two weeks, with consequent malabsorption, multiple food intolerance and failure to thrive38; 39. Since this condition cannot be successfully treated, affected individuals depend on lifelong Total Parenteral Nutrition (TPN) and in some cases small bowel transplantation45. The results I have obtained throughout this project provide indications that at least some of the IDIS patients belong to class (iii), namely malfunction of hormonal secretion, which in an extreme form is termed enteric anendocrinosis46. The most prominent example is that of diarrhea patients with mutations in neurogenin3 (NEUROG3)46-48, a key transcription factor of enteroendocrine cell development. Through careful evaluation of exome sequencing and linkage analysis results, we found a shared deleted region and characterized it as a yet-unreported, evolutionarily conserved enhancer, which is active mainly in specific regions of the gastrointestinal tract in mouse – parts of the duodenum, stomach and (IDIS enhancer). Targeted deletion of this region in mice caused symptoms recapitulating all major aspects of the human condition.

Enhancer activity and connection to diseases

Enhancers are DNA elements usually located in nucleosome-free regions of non-coding DNA that are more readily accessible, or ‘open’, compared to nucleosome-dense regions. Open chromatin is sensitive to enzymes that digest DNA, and therefore enzyme sensitivity assays were among the earliest methods used to detect enhancers. DNase I is the enzyme most frequently used in assays to identify regulatory elements, and genome-wide DNase I hypersensitivity data are now available for over a hundred different cell and tissue types. While the majority of DNase I hypersensitivity sites are common to at least two different cell types, some are cell-type specific49; 50. In addition, open chromatin regions have a high rate of histone replacement and are enriched for the less stable histone variants H2A.Z and H3.3 51-53. Enhancers activate gene transcription by recruiting tissue-specific transcription factors (TF), RNA polymerase II (RNAPII) and other co-factors involved in transcriptional activation. In addition, enhancer sequences often contain clusters of TF binding sites. Enhancer-bound TFs can bind co-factors that remodel chromatin by excluding nucleosomes or bending DNA, which makes the associated DNA accessible to other proteins that are involved in facilitating transcriptional initiation or elongation54. However, not all non-coding DNA regions that are bound by TFs are enhancers. Many reproducible sites with low TF binding do not have a specific known function, and are therefore considered benign55. Enhancers serve as a primary contact for transcriptional activators, and initiate physical contact with remote gene promoters by means of a chromatin loop54; 56. This interaction is necessary for recruiting the transcriptional machinery. Although a large fraction of the thousands of mutations that underlie monogenic diseases are rare missense or nonsense variations in protein-coding exons or in splice junctions57, there are a few well documented cases in which mutations (including copy number variations) in enhancer gene-regulatory regions are found to be causative58-62. Interaction between enhancer and promoter elements has been shown to control spatiotemporal gene expression in multicellular organisms, often in development.

Trichohepatoenteric syndrome (THES)

Within the framework of the IDIS project, our collaboration has identified and recruited several other sporadic patients with unexplained diarrhea. These patients were exome sequenced either as singletons or within a family trio, in order to identify a mutation in a gene that could serve as the target gene of the enhancer. Here we report the genetic resolution of an unusual case of congenital diarrhea by whole-exome sequencing, with implications for the routine use of such analysis. Trichohepatoenteric syndrome (THES, [MIM 222470]), also referred to as syndromic diarrhea (SD), is a rare form of congenital diarrhea requiring total parenteral nutrition, with an estimated frequency of 1 in 450,000 live births. The clinical features of this syndrome are highly variable and include hair abnormalities, facial dysmorphism, growth retardation and immunodeficiency, the latter seen in only 90% of the patients. Liver dysfunction is present in about half of the patients and varies in severity, from no abnormalities to hepatic hemangiomas, chronic hepatitis and cirrhosis in severe cases63; 64. Most reported cases are caused by mutations in the tetratricopeptide repeat domain 37 (TTC37) gene on chromosome 5q14.3-q21.2. Hartley et al. identified homozygosity or compound heterozygosity for nine different mutations in TTC37 in families of variable ethnic origins65. Some of these mutations were also found in another study that analyzed TTC37 in 12 THES patients from 11 different families66. TTC37 is reported as the ortholog of yeast Ski3p, which along with Ski2p and Ski8p forms the Ski complex required for exosome-mediated RNA surveillance, mediating the 3’→5’ decay of aberrant mRNAs67; 68. The absence of a functional Ski complex is not lethal, probably because of the existence of the alternative XRN1-based mRNA decay pathway in the 5’→3’ direction. More recently, causative mutations for THES were also identified in SKIV2L, the human ortholog of yeast Ski2p69.

Capillary Leak Syndrome (CLS)

CLS, also known as “Clarkson disease”, was first described in 1960 in a case report of cyclical edema and shock due to increased capillary permeability70. It is an extremely rare disease characterized by acute life-threatening attacks of markedly increased capillary permeability, leading to plasma extravasation, severe edema, vascular collapse and hypotensive shock. The disease is frequently misdiagnosed as sepsis, systemic anaphylaxis or angioedema. Recent publications and case reports have increased awareness of this disorder and have enabled more patients to be correctly diagnosed. Despite this greater recognition, no genetic familial case has ever been described. Since the first description, roughly 250 similar cases of CLS have been reported71. Most reported cases are sporadic and involve previously healthy adult patients, with a median age of onset of 45 years. There is also a slight excess of male over female patients. Only 10 cases of clear-cut idiopathic CLS have been reported previously in children, none of them familial. The disease is associated with substantial morbidity and mortality, with a 5-year overall survival of 73% to 76%72; 73.

Pathophysiology

The molecular cause of CLS is unknown, and systematic and genetic studies have so far been limited by its rarity. Although it is believed to be caused by transient endothelial dysfunction, the triggering factor is as yet unknown and most patients are asymptomatic between episodes70; 74. Some patients show minor signs of systemic inflammation such as fevers, arthralgias, or myalgias at the onset of an episode. The most typical presenting signs are the combination of hemoconcentration and severe hypotension. The standard protocol for management of an acute attack mainly involves early recognition of the disorder, timely fluid repletion and hemodynamic support. Immune dysregulation may play a role in the pathogenesis of the disease. Two separate case reports described increased numbers of circulating CD25+ and CD25+ T cells75, but no further immunophenotyping was performed. Among adult patients, an estimated 79% to 82% have monoclonal gammopathy of undetermined significance (MGUS), defined as a serum monoclonal immunoglobulin level less than 30 g/L, less than 10% bone marrow plasma cells and the absence of end-organ damage, such as lytic bone lesions or renal failure. Previous studies have suggested that in CLS, monoclonal immunoglobulins could bind and inhibit a factor crucial for endothelial barrier function. An example of such a mechanism is acquired angioedema, a major symptom of a CLS attack, which has been diagnosed in the setting of MGUS and other lymphoproliferative disorders76. Although it is abundant among the adult patients, MGUS has not been reported among affected children.

Treatment

CLS symptoms reverse almost as quickly as they arise, and the attack usually ends with intravascular fluid overload and rapid polyuria to maintain low normal blood pressure with or without high normal heart rate in patients with normal cardiac function. During the leak phase of CLS it is crucial to maintain normal central venous pressure, and thus conservative fluid replacement is given. So far, all treatment strategies have been based on single case reports. The use of oral corticosteroids has helped to minimize the severity of the attacks by countering the inflammatory triggers of the disease, but did not prevent attacks. Recently it has been suggested that administration of prophylactic therapy (including theophylline and terbutaline) and/or intravenous immunoglobulins (IVIG) on a monthly basis could be effective in preventing an attack77. In severe attacks, patients may require mechanical ventilation because of flash pulmonary edema. Maintenance therapy may reduce the severity of an attack and is mostly empirical. Terbutaline and theophylline on a daily basis have been used in several affected children; however, the dosage must be individualized on the basis of peak serum concentrations. The primary limitations of such treatment are the side effects, which vary among patients after prolonged exposure. Nevertheless, all of the mentioned compounds remain unproven and should be used with corresponding caution.

METHODS

General methods

1.1 Subjects

Patients were recruited at the Schneider, Sheba and Wolfson medical centers in Israel. For whole-exome sequencing, DNA was extracted from a peripheral blood sample. The molecular studies were approved by the ethical committee of either Sheba Medical Center or Wolfson Medical Center and the Israeli Ministry of Health. Written informed consent was obtained from all participants or their respective legal guardians.

1.2 Exome sequencing and variant identification

Whole-exome sequencing was performed using either the NimbleGen SeqCap EZ exome library (Roche NimbleGen, Madison, WI, USA), the SureSelect Human All Exon kit 37-50 Mb (Agilent Technologies, Santa Clara, CA), or the TruSeq Exome enrichment kit (Illumina Inc. San Diego, CA, USA). Samples were subsequently sequenced using the Illumina Genome Analyzer IIx or the HiSeq2000 platforms (Illumina, Inc. San Diego, CA). The resulting reads were aligned to the reference genome (GRCh37/hg19) using the Burrows-Wheeler Alignment tool (BWA-0.5.10)4. Polymerase chain reaction duplicates were removed using picard-tools-1.59 (http://picard.sourceforge.net). Genetic differences relative to the reference genome were called using the UnifiedGenotyper of the Genome Analysis Toolkit (GATK-1.6–11)5 or the SAMtools variant calling program78, which identifies both single nucleotide variants (SNVs) and small insertion-deletions (InDels). High quality SNVs were obtained using the following criteria: consensus score ≥20, SNP quality score ≥20, and reads supporting SNP ≥3. High quality indels were obtained using the following criteria: consensus score ≥20, indel quality score ≥50, ratio of (reads supporting variant)/(reads supporting reference): 0.2-5.0, and reads supporting indel ≥3. Annotation was performed using either SnpEff-3.3 (Ensembl 73 database)6, the SequenceVariantAnalyzer software (SVA)79, DNAnexus software (Palo Alto, CA, USA), or an in-house script using ANNOVAR80 and the GeneCards database annotation26.
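The quality criteria listed above can be expressed as a simple predicate. The following is a minimal sketch only, not the code used in this work: the `VariantCall` record and its field names are hypothetical, and the handling of indels with no reference-supporting reads is an assumption, since the stated ratio bound is undefined in that case.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical record; field names are illustrative, not a caller's schema."""
    kind: str         # "snv" or "indel"
    consensus: float  # consensus score
    quality: float    # SNP or indel quality score
    alt_reads: int    # reads supporting the variant
    ref_reads: int    # reads supporting the reference


def is_high_quality(call: VariantCall) -> bool:
    # Criteria shared by SNVs and indels: consensus >= 20, supporting reads >= 3.
    if call.consensus < 20 or call.alt_reads < 3:
        return False
    if call.kind == "snv":
        return call.quality >= 20
    if call.kind == "indel":
        if call.quality < 50:
            return False
        # Assumption: with zero reference reads the ratio is undefined,
        # so the call is treated as failing the 0.2-5.0 bound.
        if call.ref_reads == 0:
            return False
        return 0.2 <= call.alt_reads / call.ref_reads <= 5.0
    raise ValueError(f"unknown variant kind: {call.kind!r}")
```

Keeping the thresholds in one function makes it straightforward to apply the same criteria to calls from either variant caller.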

1.3 Bioinformatics analysis

For all patients’ exomes, variant filtering was performed using a similar pipeline. Variants called at less than 8X coverage and those with a quality score of <20 were excluded. In terms of functional annotation, only protein-altering variants (stop gain/loss, start loss, frameshift, missense, splice-site) were included. The dbNSFP database was used to access functional predictions for non-synonymous SNPs. We primarily focused on genotypes absent in control data sets including dbSNP138-142, the 1000 Genomes Project, the NHLBI GO Exome-sequencing Project (http://evs.gs.washington.edu/EVS/), the ExAc browser (http://exac.broadinstitute.org), the Hadassah in-house database, 240 in-house controls of different Israeli ethnic origins and the internal control cohort comprised of 3,027 subjects enrolled in the Center for Human Genome Variation (CHGV) through Duke institutional review board-approved protocols. Among all heterozygous variants only de novo or compound heterozygous variants were kept. Available protein-effect prediction tools such as PolyPhen2 19, SIFT20, MutationTaster21 and LRT81 were used to predict the deleteriousness of mutations. Variants that survived the filtering process were subjected to phenotype-driven screening and prioritization utilizing VarElect within GeneCards82, a variant election tool for disease/phenotype-dependent gene variant prioritization (http://varelect.genecards.org/). The deleterious effect of the candidate variants was further assessed by PredictProtein83, which is based on amino acid composition and protein considerations.
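The rule for heterozygous variants (keep only de novo or compound heterozygous calls) can be illustrated for the trio case. This is a hedged sketch under stated assumptions, not the pipeline used here: it assumes genotypes are available for the child and both parents as alternate-allele counts (0, 1 or 2), the dictionary keys are hypothetical, and a compound heterozygote is approximated as a gene carrying at least one variant inherited only from the mother and at least one inherited only from the father. Singleton exomes, which lack parental genotypes, are not covered by this sketch.

```python
from collections import defaultdict


def select_candidates(variants):
    """Keep homozygous, de novo, and compound heterozygous variants.

    variants: list of dicts with hypothetical keys "id", "gene", "child",
    "mother", "father"; genotypes are alternate-allele counts (0, 1, 2).
    """
    kept = []
    maternal = defaultdict(list)  # het variants seen only in the mother
    paternal = defaultdict(list)  # het variants seen only in the father
    for v in variants:
        child, mother, father = v["child"], v["mother"], v["father"]
        if child == 2:
            kept.append(v)                    # homozygous in the child
        elif child == 1:
            if mother == 0 and father == 0:
                kept.append(v)                # de novo
            elif mother >= 1 and father == 0:
                maternal[v["gene"]].append(v)
            elif father >= 1 and mother == 0:
                paternal[v["gene"]].append(v)
    # A gene hit once from each parent is a compound heterozygous candidate.
    for gene, from_mother in maternal.items():
        if gene in paternal:
            kept.extend(from_mother + paternal[gene])
    return kept
```

Heterozygous variants present in both parents are dropped here because their parental origin cannot be resolved without phasing; a stricter or looser treatment would be an equally defensible choice.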

Methods for the study of HSP

2.1 Homozygosity mapping

All of the variants identified by SAMtools were filtered by SVA for those with high quality (coverage ≥5, reads supporting SNV ≥3, SNV quality ≥20, SNV consensus ≥20). All filtered dbSNP SNVs and novel SNVs falling in the exome target regions were chosen for the analysis. To search for runs of homozygosity within any one of the exome-sequenced individuals and to identify shared homozygous regions, we used the PLINK 1.07 software84. We used a sliding window of 10 SNVs within 1000 kb and defined individual homozygous regions as stretches of at least 1 kb that contain 10 SNVs. All other parameters were set to default.
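The idea behind a run of homozygosity can be sketched with a simplified scan. This is not PLINK's algorithm, which scores sliding windows and tolerates a limited number of heterozygous or missing calls; the sketch below only finds uninterrupted stretches of homozygous SNVs on one chromosome and applies the two thresholds quoted above, read here as "at least 10 SNVs" and "at least 1 kb" (an assumption). The function name and input layout are hypothetical.

```python
def runs_of_homozygosity(snvs, min_snvs=10, min_length_bp=1000):
    """Return (start, end, n_snvs) for each uninterrupted homozygous run.

    snvs: iterable of (position_bp, is_homozygous) for a single chromosome,
    sorted by position.
    """
    runs, current = [], []
    # A trailing heterozygous sentinel closes a run that reaches the last SNV.
    for position, is_homozygous in list(snvs) + [(None, False)]:
        if is_homozygous:
            current.append(position)
            continue
        if len(current) >= min_snvs and current[-1] - current[0] >= min_length_bp:
            runs.append((current[0], current[-1], len(current)))
        current = []
    return runs
```

Regions shared between affected individuals would then be obtained by intersecting the per-individual runs.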

2.2 Semi-quantitative RT-PCR

Full-length cDNA was derived from the human kidney cell line HEK-293. We used PCR-directed mutagenesis to generate the 3642∆T form of the gene. FLAG-tagged TECPR2 was made by two-step PCR-directed mutagenesis. We transfected the constructs into the monkey kidney cell line COS-7, derived total cDNA after 24 h using the RNeasy Plus Mini Kit, RNase-Free DNase Set (QIAGEN) and High Capacity cDNA Reverse Transcription Kit, and visualized expression by semi-quantitative RT-PCR.

2.3 Cell culture and transfection

Skin fibroblasts were cultured in Dulbecco's modified Eagle medium (DMEM) (Invitrogen, Carlsbad, CA) supplemented with 20% Fetal Bovine Serum (FBS), 1% penicillin-streptomycin (Sigma) and 1% L-Glutamine at 37ºC in 5% CO2. For starvation conditions, cells were incubated in Earle's balanced salt solution (EBSS) medium at 37ºC for 6 h. HeLa cells were grown in αMEM medium supplemented with 10% FBS and 1% penicillin-streptomycin (Sigma) at 37ºC in 5% CO2. For siRNA silencing, subconfluent HeLa cells were transfected using DharmaFect 1 (Dharmacon) with 50 nM siRNA SMARTpool, consisting of four RNA duplexes each targeting TECPR2, or with a non-targeting siRNA control; all were purchased from Dharmacon. Experiments were performed 72 h after transfection. Starvation conditions were obtained as previously described.

2.4 Immunoblots

For LC3 and p62, the lysates were subjected to SDS–PAGE, and immunoblotting was performed as described85. The following antibodies were used. Primary antibodies: rabbit polyclonal anti-LC3, produced by immunization of a rabbit with a peptide corresponding to the 14 N-terminal amino acids of LC3 with an additional cysteine; mouse monoclonal anti-p62 (Santa-Cruz Biotechnology); and anti-GAPDH (Millipore, Billerica, MA). Secondary antibodies: HRP-conjugated goat anti-rabbit for LC3 and HRP-conjugated goat anti-mouse for p62. For TECPR2, the membranes were incubated with anti-FLAG M2 monoclonal (Sigma Aldrich), anti-human TECPR2 antibody (Sigma Aldrich) or anti-GAPDH (Cell Signaling Technology, Danvers, MA). Proteins were visualized with the enhanced chemiluminescence (ECL) Plus western blotting detection system (GE Healthcare, Piscataway, NJ).

2.5 Immunofluorescence analyses

Immunofluorescence was performed as previously described85. The primary antibodies were the same as those used for immunoblots. Secondary antibodies were FITC-conjugated goat anti-rabbit for LC3 and rhodamine-conjugated goat anti-mouse for p62. Confocal images were taken with an FV500 laser-scanning confocal microscope equipped with a PLAPO 60x 1.4 NA oil immersion lens and analyzed with Fluoview software (Olympus). Cell surface immunofluorescence intensity quantification was performed by image analysis using MetaMorph software (Universal Imaging, Westchester, PA).

2.6 Transmission Electron Microscopy (TEM)

Fibroblasts were fixed with 3% paraformaldehyde and 2.5% glutaraldehyde in 0.1 M cacodylate buffer (pH 7.4), washed in the same buffer and post-fixed with 1% osmium tetroxide. After en bloc staining with 2% uranyl acetate in water for 1 h at RT, the samples were dehydrated in graded ethanol solutions and embedded in Epon 812. Ultrathin sections (70–90 nm thickness) were prepared using a Leica UCT ultramicrotome and analysed at 120 kV on a Spirit Transmission Electron Microscope (FEI).

Methods for the study of IDIS

3.1 Whole genome sequencing

WGS of individual 2.1 was performed at CHGV, using the Illumina HiSeq platform (Illumina, Inc. San Diego, CA) and analyzed as described in section 1.2 of the methods. A total of 275 unrelated whole-genome sequenced CHGV samples were used as controls. To detect copy number variants from WGS we used the Estimation by read depth with single-nucleotide variants (ERDS) tool86.

3.2 Biopsy collection

Subjects underwent gastro-duodenoscopy following Institutional Review Board (IRB) approval (No. 9881-12-SMC) at Sheba Medical Center, and written informed consent of the patients and family members.

3.3 RNA extraction from biopsies

RNA isolation from frozen biopsies was performed using the TRI Reagent® method (Sigma-Aldrich Inc.) according to the manufacturer’s instructions or using the Qiagen RNeasy Mini Kit (Qiagen, Valencia, CA, USA). Sample quality was assessed by measuring concentration and purity using a NanoDrop® Spectrophotometer (Nanodrop Technologies, Wilmington, DE, USA).

3.4 RNA sequencing of human samples

Total RNA was prepared according to the Illumina RNA-seq protocol: briefly, globin reduction, polyA enrichment, chemical fragmentation of the polyA RNA, cDNA synthesis, and size selection of 200 bp cDNA fragments were performed. Next, the size-selected libraries were used for cluster generation on the flow cell, and the prepared flow cells were run on the Illumina HiSeq2000 (Illumina, Inc. San Diego, CA). We obtained a total of 74.18 million paired-end reads of 100 bp for the affected sample and 72.53 million reads for the healthy sample. Reads were aligned to the human genome (NCBI37/hg19) using Tophat v2.0.4 86 with the default parameters. Gene expression quantification was performed with cuffdiff87 using the Illumina iGenome project UCSC annotation file as a reference.

3.5 Quantitative Real-Time Reverse Transcriptase Polymerase Chain Reaction (qPCR)

RNA extracted from the biopsies was used for qPCR expression analyses. qPCR was performed using TaqMan® Gene Expression Assays (Applied Biosystems, Foster City, CA, USA) on the Applied Biosystems StepOnePlus (Applied Biosystems). From 1 µg of biopsy RNA, cDNA was synthesized using the SuperScript® First-Strand Synthesis System for RT-PCR (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. A total of 20 µl of cDNA was added with 30 µl of water to 50 µl of TaqMan® Universal PCR Master Mix (Applied Biosystems) and the resulting 100 µl reaction mixtures were loaded onto a 96-well PCR plate. We used 14 different TaqMan® Gene Expression Assays, including three housekeeping genes, with the following assay IDs: Hs00757713_m1 (MLN), Hs01074053_m1 (GHRL), Hs00175048_m1 (NTS), Hs00356144_m1 (SST), Hs00174945_m1 (PYY), Hs01062283_m1 (GAST), Hs00292465_m1 (ARX), Hs00174937_m1 (CCK), Hs00175030_m1 (GIP), Hs00219734_m1 (GKN1), Hs00699389_m1 (GKN2). The housekeeping genes we used were HMBS (Hs00609297_m1), ACTB (Hs99999903_m1) and GAPDH (Hs99999905_m1). Reference cDNA samples were synthesized using 200 ng of RNA extracted from stomach and duodenum tissues of two healthy controls (BioCat GmbH, Heidelberg, Germany) for use in the normalization calculations. Quantitative RT-PCR for expression analysis of the missing exons in C16ORF91 was done using cDNA from the Human Digestive System MTC™ Panel (Clontech Laboratories, Inc. Mountain View, CA).
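The text does not specify the normalization formula. One common choice for this kind of design is the comparative Ct (2^-ΔΔCt) calculation, and the sketch below illustrates it under that assumption only: the target Ct is normalized to the mean Ct of the housekeeping genes in both the patient sample and the healthy reference, and amplification efficiency is assumed to be 100%. The function name and the Ct values in the usage note are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_housekeeping_sample,
                        ct_target_reference, ct_housekeeping_reference):
    """Fold change of a target gene versus a reference sample (2^-ddCt).

    Housekeeping arguments are lists of Ct values, one per housekeeping gene.
    Assumes a doubling of product per cycle (100% efficiency).
    """
    delta_sample = ct_target_sample - mean(ct_housekeeping_sample)
    delta_reference = ct_target_reference - mean(ct_housekeeping_reference)
    return 2 ** -(delta_sample - delta_reference)
```

For example, with illustrative values, `relative_expression(28, [20, 21, 19], 25, [20, 20, 20])` gives a ΔΔCt of 3 and therefore a fold change of 0.125, i.e. an eight-fold lower expression in the sample than in the reference.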

3.6 Serum Collection

Whole blood was withdrawn into a Vacutainer serum tube without anticoagulant. The blood was immediately treated with 1 µM AEBSF (protease inhibitor) and left at room temperature for 30 min to clot before centrifugation (15 min at 2500 rpm at 4°C).

3.7 ELISA

Serum hormone levels were determined by sandwich ELISA using the following commercial kits according to the manufacturer’s instructions: Human Ghrelin (Total) ELISA COLD PACKS (Millipore, USA), Human PYY (Total) ELISA Kit (Millipore), and Human gastric inhibitory polypeptide (GIP) ELISA Kit (ENCO).

3.8 Linkage analysis and homozygosity mapping

Genome-wide SNP genotyping of DNA from 6 affected children and 22 relatives from families 1-5 was performed using the Illumina HumanCytoSNP-12v2-1_H, according to the manufacturer’s recommendations (Illumina, Inc. San Diego, CA), in conjunction with SNP genotypes retrieved from whole exome data. For linkage studies, 35,845 informative, equally spaced SNP markers were chosen after filtering for Mendelian errors and unlikely genotypes. Genotypes were examined using multipoint parametric linkage analysis and haplotype reconstruction for an autosomal recessive model with complete penetrance and a disease allele frequency of 0.001, as previously described88. Homozygosity mapping was performed using PLINK84 with the default parameters (length 1000 kb, SNP(N) 100, SNP density 50 kb/SNP, largest gap 1000 kb).

3.9 Deletion analysis

Boundaries for the two deletion alleles were determined by PCR using amplified DNA and Sanger sequencing. In parallel, we used polymorphic markers that were identified by electronically screening genomic clones located on Chr16 0.86-2.8Mb. Primers were designed with the Primer3 software (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi/ from the Whitehead Institute, Massachusetts Institute of Technology, Cambridge, MA). Amplification of the polymorphic markers was performed in a 25 µl reaction containing 50 ng of DNA, 13.4 ng of each primer, and 1.5 mM dNTPs in 1.5 mM MgCl2 PCR buffer with 1.2 U Taq polymerase (Bio-Line, London, UK). After an initial denaturation of 5 minutes at 95°C, 30 cycles were performed (94°C for 2 minutes, 56°C for 3 minutes, and 72°C for 1 minute), followed by a final step of 7 minutes at 72°C. PCR products were electrophoresed on an automated genetic analyzer (Prism 3100; Applied Biosystems, Inc. [ABI], Foster City, CA). The breakpoint coordinates were: ∆L, chr16:1475365-1482378; ∆S, chr16:1480850-1483951; with an overlapping region at chr16:1480850-1482378 (ICR).
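The overlapping region follows directly from the two sets of breakpoints as an interval intersection. The short sketch below reproduces that arithmetic; the helper name `overlap` is hypothetical, and both intervals are assumed to lie on the same chromosome and to use the same coordinate convention.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # (start, end) on the same chromosome


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the interval shared by a and b, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


# Breakpoint coordinates on chr16 as reported in the text.
DELTA_L = (1475365, 1482378)
DELTA_S = (1480850, 1483951)

icr = overlap(DELTA_L, DELTA_S)
print(icr)  # (1480850, 1482378)
```

The shared interval starts at the ∆S start and ends at the ∆L end, which matches the ICR coordinates given above.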

3.10 Mouse transgenic assays

The candidate sequence containing the expected enhancer (chr16: 1479875 – 1480992) was PCR amplified from human genomic DNA and cloned, using Gateway (Invitrogen) cloning, into an Hsp68-lacZ vector containing a minimal Hsp68 promoter coupled to a lacZ reporter gene. The construct was microinjected into fertilized FVB/N mouse oocytes, which were implanted into pseudopregnant foster females, and embryos were collected at E11.5 through E14.5. Enhancer reporter activity was determined by X-gal staining to detect β-galactosidase activity. Only patterns observed in at least three different embryos resulting from independent transgenic events were considered reproducible positive enhancers.

3.11 Generation of enhancer null mice

Homologous arms were generated by PCR and cloned into the ploxPN2T vector, which contains a neomycin resistance cassette flanked by loxP sites for positive selection, and an HSV-tk cassette for negative selection. Constructs were linearized and electroporated (20 µg) into W4/129S6 mouse embryonic stem cells (Taconic). The electroporated cells were selected under G418 (150 µg/ml) and 0.2 µM FIAU for a week. Surviving colonies were picked and expanded on 96-well plates, and screened both by PCR and by sequencing with primers outside but flanking the homologous arms. Clones that were correctly targeted were electroporated with 20 µg of the Cre recombinase-expressing plasmid TURBO-Cre. TURBO-Cre was provided by Dr. Timothy Ley of the Embryonic Stem Cell Core of the Siteman Cancer Center, Washington University Medical School. Clones positive for Neo removal were screened by PCR and checked for G418 sensitivity. PCR products covering the deleted region and part of the homologous arms were gel purified and sequenced to confirm the deletion of the ICR enhancer. Correctly targeted clones were subsequently injected into C57BL/6J blastocyst stage embryos. Chimeric mice were then crossed to C57BL/6J mice (Charles River) as well as 129S6/SvEvTac (Taconic) to generate heterozygous enhancer null mice, followed by breeding of heterozygous littermates to generate homozygous enhancer null mice.

3.12 Genotyping of enhancer null mice

Genomic DNA was extracted from a 0.2 to 0.3-cm section of tail that was incubated overnight in lysis buffer (containing 100 mM Tris-HCl pH 8.5, 5 mM EDTA, 0.2% SDS, 200 mM NaCl and 50 µg Proteinase K) at 55 °C. Genotyping was carried out using standard PCR techniques. One to two microliters of 50- to 100-fold diluted tail lysate was used in a 20 µl PCR containing 200 µM dNTP, 1.5 mM MgCl2, 5 pmole of each forward and reverse primer and 0.5 U of Taq polymerase.

3.13 RNA sequencing of mouse tissues

Total RNA was extracted from different intestinal regions and stomach of mice at E11.5, P1, P5, P10, P15 and P20 using TRIzol® Reagent (ThermoFisher Scientific). RNA-seq libraries were then constructed using the Illumina TruSeq Stranded Total RNA Sample Preparation Kit following the manufacturer’s recommendations. The libraries were sequenced using a 50 bp single-end strategy with four samples per lane on an Illumina HiSeq instrument, and data were analyzed using the same protocols as described for human, though with the mm9 mouse reference and Illumina iGenome project mouse genome annotation data.

3.14 RT-PCR, RT-qPCR and 5’, 3’-RACE

Total RNA was extracted from intestine and stomach of mice using TRIzol® Reagent (ThermoFisher Scientific) and treated with DNase (Promega). First strand cDNA was generated from the DNase-treated total RNA using the SuperScript™ First-Strand Synthesis System (ThermoFisher Scientific). RT-qPCR was performed using KAPA SYBR® FAST Roche LightCycler® 480 2X qPCR Master Mix (KAPA Biosystems). To identify deletion transcript(s), regular RT-PCR, 5’-RACE and 3’-RACE were performed. Regular RT-PCR used Platinum Taq DNA Polymerase High Fidelity (ThermoFisher Scientific). 5’ and 3’ RACE used the SMARTer® RACE 5’/3’ Kit (Clontech) and the manufacturer’s recommended protocol. PCR products were gel purified and sequenced.

3.15 Histological analysis of human biopsies

FFPE blocks were sectioned at a thickness of 4 μm and a positive control was added on the right side of the slides. All immunostainings were fully calibrated on a Benchmark XT staining module (Ventana Medical Systems Inc., USA). Briefly, after sections were dewaxed and rehydrated, a CC1 Standard Benchmark XT pretreatment for antigen retrieval (Ventana Medical Systems) was selected for all immunostainings: Chromogranin A (1:500, Dako, Denmark) and Synaptophysin (1:200, Life Technologies, Invitrogen, USA). Detection was performed with the iView DAB Detection Kit (Ventana Medical Systems Inc., USA) and sections were counterstained with hematoxylin (Ventana Medical Systems Inc., USA). After the run on the automated stainer was completed, slides were dehydrated in ethanol solutions (70%, 96%, and 100%) for one minute each. Sections were then cleared in xylene for 2 minutes, mounted with Entellan and cover slips were added. Chromogranin A and Synaptophysin showed cytoplasmic staining.

3.16 Generation of induced pluripotent stem cells (iPSCs) from patient lymphocytes

Whole blood was collected by routine venipuncture from patient 2.1 and two healthy siblings (2.3, heterozygous carrier; 2.4, unaffected WT) at Sheba Medical Center in Israel, in preservative-free 0.9% sodium chloride containing 100 U/mL heparin. Blood was then shipped overnight to Cincinnati Children’s Hospital Medical Center for iPS cell generation. Peripheral blood mononuclear cells (PBMCs) were isolated from whole blood by Ficoll centrifugation as previously described89 and were used to derive iPSCs. Briefly, PBMCs were cultured for 4 days in DMEM containing 10% FCS, 100ng/ml SCF, 100ng/ml TPO, 100ng/ml IL3, 20ng/ml IL6, 100ng/ml Flt3L, 100ng/ml GM-CSF, and 50ng/ml M-CSF (Peprotech). Transduction using a polycistronic lentivirus expressing Oct4, Sox2, Klf4, cMyc and dTomato was performed90 following the second day of culture in this medium. Transduced cells were then cultured for an additional 4 days in DMEM containing 10% FCS, 100ng/ml SCF, 100ng/ml TPO, 100ng/ml IL3, 20ng/ml IL6, and 100ng/ml Flt3L. The medium was changed every other day. PBMCs were then plated on 0.1% gelatin-coated dishes containing 2 x 10^4 irradiated MEFs/cm^2 (GlobalStem, Rockville, MD), and cultured in hESC medium containing 20% knockout serum replacement, 1mM L-glutamine, 0.1mM β-mercaptoethanol, 1x non-essential amino acids, and 4ng/ml bFGF until iPSC colony formation. Putative iPSC colonies were then manually excised and re-plated in feeder-free culture conditions consisting of matrigel (BD BioSciences, San Jose, CA) and mTeSR1 (STEMCELL Technologies, Vancouver, BC). Lines exhibiting robust proliferation and maintenance of stereotypical human pluripotent stem cell morphology were then expanded and cryopreserved before use in experiments. Standard metaphase spreads and G-banded karyotypes were determined by the CCHMC Cytogenetics Laboratory.

3.17 Differentiation of iPSCs into intestinal organoids

The differentiation of induced human pluripotent stem cells was performed as previously described91-93 with minor modifications. Briefly, two clonal iPSC lines from each donor were dispase-passaged into a Matrigel-coated 24-well tissue culture plate and cultured for 3 days in mTeSR1. Following definitive endoderm differentiation, the monolayers were treated for 4 days with RPMI medium 1640 (Gibco) containing 2% defined fetal calf serum, 1x non-essential amino acids, 3µM CHIR99021 (Stemgent) and 500ng/mL rhFGF4 (R&D Systems) to induce hindgut spheroid morphogenesis. After the 4th day, “day 0” HIOs were collected, embedded in Matrigel matrix and cultured in Advanced DMEM/F12 (Gibco) containing 100 U/mL penicillin/streptomycin (Gibco), 2mM L-Glutamine (Gibco), 15mM HEPES (Gibco), N2 Supplement (Gibco), B27 Supplement (Gibco), and 100ng/mL rhEGF (R&D Systems) for up to 42 days, with periodic splitting, passaging, and medium changes. HIOs collected for immunofluorescence analysis were fixed in 4% paraformaldehyde for 1-2 h at room temperature, washed overnight at 4°C in PBS, and embedded in O.C.T. Compound (Sakura). Sections 8-10 µm thick were incubated with primary antibodies overnight at 4°C in 10% normal donkey serum / 0.05% Triton X-100-PBS solution and subsequently incubated with secondary antibodies for 1 h at room temperature. The primary antibodies used were: FoxA2 (1:500; Novus), E-Cadherin (1:500; R&D Systems), Synaptophysin (1:1000; Synaptic Systems), CDX2 (1:500; Biogenex), Pdx1 (1:5000; Abcam; data not shown). All secondary antibodies (AlexaFluor; Invitrogen) were used at 1:500 dilutions. Confocal microscopy images were captured with a 20x Plan Apo objective on a Nikon A1Rsi Inverted microscope, using settings of 0.5 pixel dwell time, 1024 resolution, 2 line averaging, and 2.0 A1 plus scan. Total RNA was extracted from HIOs using a NucleoSpin RNA II kit (Macherey-Nagel), and cDNA was synthesized with SuperScript VILO (Invitrogen) using 300ng RNA. qPCR analysis was performed with TaqMan Fast Advanced Master Mix and custom-designed TaqMan Array 96-Well FAST Plates (Applied Biosystems) consisting of the following targets: 18S-Hs99999901_s1; GAPDH-Hs99999905_m1; ARX-Hs00292465_m1; CHGA-Hs00900370_m1; SYP-Hs00300531_m1; NTS-Hs00175048_m1.

3.18 Circular Chromosome Conformation Capture (4C) analysis

4C templates were prepared as previously described94. Briefly, 10⁷ cells per sample were crosslinked in 2% formaldehyde for 10 min and quenched with 1M glycine. Cells were lysed in 150 mM NaCl/50 mM Tris-HCl (pH 7.5)/5 mM EDTA/0.5% NP-40/1% Triton X-100. First digests were performed with 200 U DpnII (NEB) and 600 U HindIII (Roche) for 4x4 and 6x4 4Cs, respectively, followed by ligation at 16 °C with 50 U T4 DNA ligase (Roche) in 7 mL. Ligated samples were decrosslinked with Proteinase K (0.5 µg/µL), purified, and digested with 50 U Csp6I (Thermo) each, followed by a second ligation with 100 U T4 DNA ligase in 14 mL and purification. The resulting products were used as PCR template. Primers for PCR were designed using guidelines described previously94; 95, and contained adapters for single-end sequencing by Illumina. Reads were mapped to the mm9 genome96. To calculate the 4C signal, the reads were trimmed and aligned to the mm9 genome using bowtie with the parameters -a -m 1 --best --strata -v 2. The read count per restriction site was calculated with BEDTools.
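The per-restriction-site counting step can be illustrated with a short Python sketch. This is not the BEDTools workflow used here; it is a minimal stand-in that assumes hypothetical inputs (a sorted list of restriction-site coordinates on one chromosome and a list of mapped read start positions) and an illustrative tolerance window, `max_distance`, which is not a parameter taken from the protocol.

```python
from bisect import bisect_left


def count_reads_per_site(site_positions, read_starts, max_distance=2):
    """Count reads whose start lies within max_distance bp of a restriction site.

    site_positions: sorted restriction-site coordinates on one chromosome
                    (hypothetical input, e.g. DpnII or HindIII sites).
    read_starts:    mapped read start coordinates on the same chromosome.
    max_distance:   illustrative tolerance in bp (an assumption).
    """
    counts = {site: 0 for site in site_positions}
    for start in read_starts:
        i = bisect_left(site_positions, start)
        # Candidate sites: the one just before the read start and the one at or after it.
        candidates = site_positions[max(i - 1, 0):i + 1]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda s: abs(s - start))
        if abs(nearest - start) <= max_distance:
            counts[nearest] += 1
    return counts
```

Sites with no reads are kept with a count of zero, so the output can be used directly as a per-site signal track.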

3.19 Deletion knockout by CRISPR/Cas9 system

3.19.1 sgRNA clone synthesis

Guide RNA clones are generated using George Church Lab vectors and protocol. A 19-bp stretch of the selected target sequence is incorporated into two 60-mer oligonucleotides as indicated below:

Insert_F: TTTCTTGGCTTTATATATCTTGTGGAAAGGACGAAACACCGNNNNNNNNNNNNNNNNNNN

Insert_R: GACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAACNNNNNNNNNNNNNNNNNNNC
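The construction of the two 60-mers from a 19-bp target can be sketched as follows. The constant flanks are copied from the oligonucleotide templates above; the assumption, not stated explicitly in the templates, is that the N stretch of Insert_R is the reverse complement of the target, which is what gives the two oligonucleotides a 20-nt complementary 3' overlap and a 100-bp product after extension. The function names are hypothetical.

```python
# Constant 5' portions copied from the Insert_F / Insert_R templates.
F_CONSTANT = "TTTCTTGGCTTTATATATCTTGTGGAAAGGACGAAACACCG"  # 41 nt, ends with G
R_CONSTANT = "GACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC"   # 40 nt


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def design_insert_oligos(target19):
    """Build the two 60-mer oligonucleotides for a 19-bp target sequence.

    Assumption: Insert_R carries the reverse complement of the target,
    followed by a terminal C that pairs with the G preceding the target
    in Insert_F.
    """
    target19 = target19.upper()
    if len(target19) != 19 or set(target19) - set("ACGT"):
        raise ValueError("target must be exactly 19 nt of A, C, G or T")
    insert_f = F_CONSTANT + target19
    insert_r = R_CONSTANT + reverse_complement(target19) + "C"
    return insert_f, insert_r
```

Under this assumption the last 20 nt of Insert_F are exactly complementary to the last 20 nt of Insert_R, so annealing and extension yield 60 + 60 - 20 = 100 bp of double-stranded DNA.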

3.19.2 In vitro Transcription of Cas9 & sgRNA

T7promoter-Cas9-polyA and T7promoter-sgRNA amplicons are PCR amplified from pDD921 and sgRNA clones, respectively, using Phusion polymerase (New England Biolabs). Cas9 RNA is generated by in vitro transcription from the T7promoter-Cas9-polyA amplicon using the mMESSAGE mMACHINE T7 Kit (ThermoFisher Scientific), following the manufacturer’s instructions. sgRNA is generated by in vitro transcription from the T7promoter-sgRNA amplicon using the MEGAshortscript Kit (ThermoFisher Scientific), following the manufacturer’s instructions. In vitro transcribed RNA is cleaned using the MEGAclear™ Kit (ThermoFisher Scientific), following the manufacturer’s instructions.


RNA is eluted into RNase-free Microinjection Buffer (10 mM Tris, pH 7.5; 0.1 mM EDTA). The RNA quality is then checked on a 10% TBE-urea PAGE gel.

3.19.3 Microinjection and generation of genetically modified mice

An RNA mixture of 100 ng/µl Cas9 RNA and 50 ng/µl sgRNA (if multiple sgRNAs are used, they are divided evenly to give a total concentration of 50 ng/µl) is injected into the cytoplasm of fertilized mouse eggs. Pups generated (F0) are screened by PCR. PCR products are sequenced to identify deletion(s). The deletion(s) are verified again by sequencing in F1 pups before establishing the line.

3.19.4 Deletion knockout generated

Deletion knockouts of the LOC105371045 ORF and its flanking genes Ccdc154, C16orf91, UNKL and Rps2 were generated. The two oligos are annealed and extended using Phusion polymerase (New England Biolabs) to make a 100-bp double-stranded DNA fragment, which is then cloned into the AflII-linearized gRNA cloning vector (Church lab) using Gibson assembly and transformed into TOP10 Chemically Competent E. coli (ThermoFisher Scientific). Clones are sequence-verified for correct assembly of the sgRNA.

Methods for the study of THES

4.1 Immunology function

Cell surface markers of peripheral blood mononuclear cells (PBMCs) were determined by immunofluorescent staining and flow cytometry (Epics V; Coulter Electronics, Hialeah, FL) with antibodies purchased from Coulter Diagnostics. Lymphocyte proliferation in response to phytohemagglutinin and anti-CD3 was determined by tritiated thymidine incorporation, and serum concentration of immunoglobulins was measured by nephelometry, as previously described97.

4.2 Polarizing microscopy

Hair microscopy was done using dry mount and polarized microscopy as previously described98. Polarized light microscopy is of particular value in trichothiodystrophy, where the characteristic alternating dark and white bands of the hair shaft can be seen. This sign, known as the “tiger tail” appearance, is not visible under ordinary light microscopic examination.

Methods for the study of CLS

5.1 Immunofluorescence of cultured cells and microscopy

Immunofluorescence was performed as previously described99. Briefly, fibroblasts were plated on gelatin-, fibronectin- and collagen-coated coverslips and grown to semi-confluence. Cells were incubated with anti-TLN1 (1:100, BD-Biosciences), anti-TLN2 (1:100, Invitrogen) and anti-TD77 (1:100, Sigma) antibodies for 1 h at room temperature. Cells were then washed with PBS and incubated with Alexa-Fluor-conjugated secondary antibodies (1:100, Invitrogen) for 45 min at room temperature before being mounted on microscope slides using Prolong Gold anti-fade reagent (Invitrogen). Fluorescence microscopy was performed using a 40x objective on a Nikon microscope. Images were analyzed with IMARIS software to obtain the area and the number of focal adhesions (FAs) as well as the surface area of each cell. Quantification of FAs was performed for every cell separately.

5.2 Western blots

Cells were lysed in sample buffer (65mM Tris pH 6.8, 60mM sucrose, 3% SDS) containing complete inhibitor cocktail (Roche). Lysed samples were sonicated briefly and total protein was quantified using the DC Protein Assay (Bio-Rad). Beta-mercaptoethanol was added to the samples, which were then separated on 6% sodium dodecyl sulphate-polyacrylamide gel electrophoresis gels and transferred to nitrocellulose membranes (Millipore). Following blocking with milk in Tris-buffered saline 0.1% Tween-20 for 1 hour, membranes were incubated with primary antibodies at 1:1000 dilution in 3% BSA Tris-buffered saline overnight at 4°C. Membranes were then washed three times before incubation with HRP-conjugated secondary antibodies in blocking buffer for 1 hour at room temperature. Development was performed using Luminata HRP substrate (Millipore).


5.3 Platelet adhesion tests

Platelet adhesion was measured by the cone and plate(let) analyzer (CPA) as previously described 100. Briefly, whole blood was placed in polystyrene plates and subjected to a shear rate of 1,800 s⁻¹ using a rotating Teflon cone. The plates were then thoroughly washed with distilled water, stained and analyzed with an inverted microscope connected to an image analysis system. Two parameters of platelet adhesion were evaluated: percent of platelet surface coverage (SC, %) and the average size (AS, μm²) of the platelet aggregates bound to the surface.
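The two adhesion parameters can be written out as a short calculation. The sketch below is only an illustration of the definitions of SC and AS; it assumes that the image analysis system reports the area of each surface-bound aggregate and the area of the analyzed field, and the function and argument names are hypothetical rather than part of the CPA software.

```python
def platelet_adhesion_parameters(aggregate_areas_um2, field_area_um2):
    """Return (SC in %, AS in um^2) for one analyzed image field.

    aggregate_areas_um2: areas of individual surface-bound aggregates (um^2).
    field_area_um2:      total area of the analyzed field (um^2).
    """
    if field_area_um2 <= 0:
        raise ValueError("field area must be positive")
    if not aggregate_areas_um2:
        return 0.0, 0.0
    covered = sum(aggregate_areas_um2)
    surface_coverage = 100.0 * covered / field_area_um2  # SC, percent
    average_size = covered / len(aggregate_areas_um2)    # AS, um^2
    return surface_coverage, average_size
```

For example, three aggregates of 20, 30 and 50 μm² in a 1,000 μm² field give an SC of 10% and an AS of about 33.3 μm².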

5.4 Endothelial cell isolation

Briefly, lungs were minced, digested with 0.1% collagenase type I (Invitrogen) in PBS for 1 h, passed through a 70-μm pore size cell strainer (BD Biosciences), re-suspended in mouse lung endothelial cell medium (Ham’s F-12/DMEM 1:1, containing antibiotics, L-Glutamine, 20% fetal calf serum and endothelial mitogen; AbD Serotec), and plated onto tissue-culture flasks pre-coated with a mixture of collagen I (BD Biosciences), human plasma fibronectin (Millipore), and 0.1% porcine skin gelatin (Sigma). Endothelial cells were purified by magnetic immunosorting, using antibodies to intercellular adhesion molecule 2 (ICAM-2). All samples were tested for endothelial purity by flow cytometric analysis for PECAM-1, VE-cadherin, endoglin, and ICAM-2 and were only used when purity was 98-99%. Experiments were performed in early passages (2-6).

5.5 Endothelial cell assays with serum

Mouse primary endothelial cells were seeded in 8-well coated slides (5 × 10⁴ cells/well) and after 5 days of Tamoxifen treatment were assessed for adherens junction formation. In detail, on the 6th day of culture cells were starved in Opti-MEM + 2% FCS for 2 h, followed by stimulation with P48 basal or attack serum (20% of either in Opti-MEM + 2% FCS) for 2 h, or thrombin (at 0.2 U/ml in Opti-MEM + 2% FCS) for 30 minutes, or mTNF (at 10 ng/ml in Opti-MEM + 2% FCS) for 1 h. Afterwards, cells were washed twice with PBS and fixed with 4% PFA for subsequent immunofluorescence with adherens junction markers.


5.6 Transcriptome analysis

Total RNA was prepared according to the Illumina RNA-seq protocol: briefly, globin reduction, polyA enrichment, chemical fragmentation of the polyA RNA, cDNA synthesis, and size selection of 200bp cDNA fragments were performed. Next, the size-selected libraries were used for cluster generation on the flow cell, and prepared flow cells were run on the Illumina HiSeq2000 (Illumina, Inc. San Diego, CA). We obtained a total of xx million paired-end reads of 100 bp for the affected sample and xx million reads for the healthy sample. Reads were aligned to the human genome (NCBI37/hg19) using TopHat v2.0.4 86 with the default parameters. Gene expression quantification was performed with Cuffdiff 87 using the Illumina iGenome project UCSC annotation file as a reference.

5.7 Proteomics sample preparation

For serum proteomics, proteins were reduced by addition of dithiothreitol (Sigma) to a final concentration of 5 mM and incubation for 30 min at 60°C, and alkylated with 10 mM iodoacetamide (Sigma) in the dark for 30 min at 21 °C. The proteins were then digested using trypsin (Promega; Madison, WI, USA) at a ratio of 1:50 (w/w trypsin/protein) for 16 hours at 37°C. Digestions were stopped by addition of 1% trifluoroacetic acid (TFA). The samples were stored at -80°C in aliquots. For fibroblast proteomics, samples were subjected to in-solution tryptic digestion using a modified filter-aided sample preparation (FASP) protocol. Sodium dodecyl sulfate buffer (SDT) included 4% (w/v) SDS, 100mM Tris/HCl pH 7.6 and 0.1M DTT. Urea buffer (UB) consisted of 8 M urea (Sigma, U5128) in 0.1 M Tris/HCl pH 8.0, and UC buffer of 2M urea, pH 7.6-8.0 (UB diluted fourfold with 0.1M Tris-HCl pH 7.6). Cells were dissolved in 100μL SDT buffer, lysed for 3min at 95°C and then spun down at 16,000 RCF for 10min. A 30μl aliquot was mixed with 200μl UB, loaded onto 30 kDa molecular weight cutoff filters and spun down. Next, 200μl of A were added to the filter unit, which was centrifuged at 14,000 x g for 40 min. Trypsin was then added and samples were incubated at 37°C overnight. Digested proteins were then spun down, acidified with trifluoroacetic acid and stored at -80°C until analysis.


5.8 Liquid Chromatography

ULC/MS grade solvents were used for all chromatographic steps. Each sample was loaded using split-less nano-Ultra Performance Liquid Chromatography (10kpsi nanoAcquity; Waters, Milford, MA, USA) in high-pH/low-pH reversed phase (RP) two-dimensional liquid chromatography mode. 7.5μg of digested protein from each sample was loaded onto a C18 column (XBridge, 0.3x50mm, 5μm particles, Waters). Buffers used were: A) 20mM ammonium formate, pH 10 and B) acetonitrile (ACN). Peptides were released from the column in a step gradient of increasing acetonitrile composition. Each fraction flowed directly to the second dimension of chromatography. The buffers used in the low pH RP were: A) H2O + 0.1% formic acid and B) acetonitrile + 0.1% formic acid. Desalting of samples was performed online using a reverse-phase C18 trapping column (180µm i.d. 20mm length, 5µm particle size; Waters). The peptides in samples were separated using a C18 T3 HSS nano-column (75µm i.d. 200mm length, 1.8µm particle size; Waters) at 0.4µL/minute. Peptides were eluted from the column and into the mass spectrometer using the following gradient: 3% to 30%B in 60min, 30% to 95%B in 5min, maintained at 95% for 7min and then back to initial conditions.
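The second-dimension elution gradient can be expressed as a piecewise-linear function of time. The sketch below encodes only the breakpoints stated above; because the duration of the return to initial conditions is not specified, it assumes, purely for illustration, that %B drops back to 3% immediately after the 7-min hold.

```python
# Breakpoints (time in min, %B) taken from the gradient described in the text.
GRADIENT = [(0.0, 3.0), (60.0, 30.0), (65.0, 95.0), (72.0, 95.0)]
INITIAL_PERCENT_B = 3.0  # assumed to be restored instantly after 72 min


def percent_b(t_min):
    """Return the programmed %B at time t_min (minutes from gradient start)."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            # Linear interpolation within the current gradient segment.
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return INITIAL_PERCENT_B
```

Under these assumptions the program reaches 16.5% B at 30 min, 30% B at 60 min and 95% B at 65 min, and holds 95% B until 72 min.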

5.9 Mass Spectrometry

The nanoLC was coupled online through a nanoESI emitter (7 cm length, 10 μm tip; New Objective; Woburn, MA, USA) to a quadrupole ion mobility time-of-flight mass spectrometer (Synapt G2 HDMS, Waters) tuned to 20,000 mass resolution (full width at half height). Data were acquired using Masslynx version 4.1 in HDMSE positive ion mode, in which the quadrupole is set to transfer all ions. Ions were then separated in the T-Wave ion mobility chamber and transferred into the collision cell. Collision energy was alternated from low to high throughout the acquisition time. In low-energy (MS1) scans, the collision energy was set to 5 eV, and it was ramped from 27 to 50 eV for high-energy scans. For both scans, the mass range was set to 50-2,000 Da with a scan time of 1 second. A reference compound (Glu-Fibrinopeptide B; Sigma) was infused continuously for external calibration using a LockSpray and scanned every 30 seconds.


5.10 Data Processing and Analysis

For serum proteomics, raw data processing and database searching were performed using Proteinlynx Global Server (IdentityE) version 2.5.2. Database searching was carried out using the Ion Accounting algorithm as previously described 101. Briefly, the algorithm detects the 250 most abundant peptides and performs an initial pass through the database in order to identify these peptides (with a mass tolerance of 7ppm for precursor ions and 15ppm for fragment ions). These peptides are then depleted from the database and the remaining peptides are searched. The cycle continues to the next most abundant peptides, which are identified and then depleted from the database. These tentative peptide identifications are ranked and scored based on how well they conform to 14 predetermined models of specific physicochemical attributes (such as retention time and fragmentation prediction, fragment to precursor ratios and others). For fibroblast proteomics, raw data were imported into the Expressionist® software (Genedata) and processed as described 102. The software was used for retention time alignment and peak detection of precursor peptides. A master peak list was generated from all MS/MS events and sent for database searching using Mascot v2.5 (Matrix Science). Data were searched against the human sequences of UniprotKB (http://www.uniprot.org/) appended with 125 common laboratory contaminant proteins. Fixed modification was set to carbamidomethylation of cysteines and variable modification was set to oxidation of methionines. Search results were then filtered using the PeptideProphet algorithm 103 to achieve a maximum false discovery rate of 1% at the protein level. Peptide identifications were imported back to Expressionist to annotate identified peaks. Quantification of proteins from the peptide data was performed using an in-house script1. Data were normalized based on the total ion current. Protein abundance was obtained by summing the three most intense, unique peptides per protein.
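The final quantification step (total-ion-current normalization followed by summing the three most intense unique peptides per protein) can be sketched as below. This is not the in-house script itself, which is not reproduced here; it is a minimal illustration that assumes a hypothetical input of (protein, peptide, intensity) tuples for one sample and approximates the total ion current by the sum of all peptide intensities in that sample.

```python
from collections import defaultdict


def top3_protein_abundance(peptides):
    """Compute normalized top-3 protein abundances for one sample.

    peptides: iterable of (protein_id, peptide_sequence, intensity) tuples,
              one entry per unique peptide (hypothetical input format).
    Returns a dict mapping protein_id to the sum of its three most intense
    peptides after normalization to the sample total.
    """
    peptides = list(peptides)
    # Assumption: total ion current is approximated by the summed peptide intensity.
    total = sum(intensity for _, _, intensity in peptides)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    normalized = defaultdict(list)
    for protein, _sequence, intensity in peptides:
        normalized[protein].append(intensity / total)
    return {
        protein: sum(sorted(values, reverse=True)[:3])
        for protein, values in normalized.items()
    }
```

Proteins with fewer than three unique peptides are quantified from the peptides that are available.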


RESULTS

1. Hereditary Spastic Paraparesis (HSP)

The following data have been published in AJHG104 and in Autophagy105. We identified five individuals from three apparently unrelated Jewish Bukharian families, independently referred to the pediatric neurological clinic at the Sheba Medical Center in Israel. Affected individuals presented with what appeared as autosomal recessive HSP of the complicated form, however with additional autonomic features and respiratory symptoms. Four of the five individuals were available for DNA sequencing analysis (Fig. 1).

Figure 1. (A) Family pedigrees of the three Bukharian families with HSP. Roman numerals indicate generations and numbers are serial within the family. Blue circles indicate exome-sequenced individuals. The slash mark represents the deceased individual, who was not available for exome sequencing. (B) T1 sagittal MRI image from individual II-1 in family A showing thin corpus callosum at the age of 3 years. (C) Same, with progressive cerebellar vermis atrophy at the age of 10 years. (D) T2 axial MRI image from individual II-2 in family C at the age of 7 years showing enlarged lateral ventricles and deep sulci indicating cerebral atrophy. (E) Same, with sagittal MRI image, showing deep cerebral sulci indicating cerebral atrophy, thin corpus callosum and vermian atrophy.

1.1 Clinical description

Affected individuals had short stature, with mild brachycephalic microcephaly, round face and a low anterior hairline, dental crowding, short broad neck and a chubby habitus. The neurological phenotype included motor and cognitive delay, followed by moderate to severe intellectual disability and hypotonia that evolved until the end of the first decade of life into a spastic, rigid ataxic gait in four out of five individuals who developed independent walking. Speech, when present, was dysarthric, their faces were hypomimic although with a friendly disposition, and deep tendon reflexes were absent. The more cooperative individuals were dysmetric in the finger-to-nose maneuver. All affected individuals had recurrent pulmonary infections due to gastroesophageal reflux disease (GERD [MIM 109350]). Ongoing severe central apnea episodes were characteristic of all, initially during sleep, evolving with age into the wake state. Four of the individuals had recurrent episodes of decreased alertness, aggravation of hypotonia, and inefficient respiration requiring mechanical ventilation, with spontaneous remission to baseline in the younger affected individuals. The oldest affected individual (Fig. 1 family B: II-2), a 20-year-old female, did not recover from her last episode and currently requires mechanical ventilation. Some of the acute deterioration episodes were related to inter-current infections but there was no evidence of metabolic decompensation. The two siblings from family B (II-1, II-2) had infrequent short generalized tonic clonic seizures. Individual II-1 from family A had recurrent ankle pressure ulcers and individual II-2 from family B had transient severe encephalopathy with intermittent transaminase elevation. Extensive metabolic testing was normal for all. Individual II-1 from family B died at the age of 5.5 years due to aspiration. MRI scans from two individuals (Family A: II-1, Family C: II-2) showed thin corpus callosum and cerebral and cerebellar atrophy (mainly vermian) (Figure 1B-E). Muscle biopsy from three individuals (Family B: II-1, II-2; Family C: II-2) showed normal histology, and respiratory chain studies revealed normal function. Polysomnograms showed very frequent central apneas (>90/hour) accompanied by hypoxemia with poor response to oxygen enrichment. EEG showed diffuse slowing with poor background organization and no epileptiform activity. EMG and nerve conduction studies were normal.

1.2 Exome sequencing and mutation discovery

We performed exome sequencing in the four available individuals. Based on previous studies indicating that Bukhara Jewish families are distantly related106 and on the inferred autosomal recessive inheritance in all three families, we expected a founder mutation, and thus focused on homozygous variants shared among all affected individuals. We identified five shared homozygous variants among the four affected individuals. Only one homozygous variant had a zero minor allele frequency in controls. This variant is a single-base deletion of T in exon 16 within TECPR2 (c.3416delT). The wild-type TECPR2 protein contains 1411 amino acids, and the variation leads to a frameshift in codon 1138 (p.Leu1139Argfs*75), resulting in a premature stop codon at amino acid position 1212 and hence the loss of 4 out of 6 TECPR domains (Fig. 2). The variant displayed perfect co-segregation in all three studied families. Furthermore, our results were strengthened by homozygosity mapping using the exome sequencing data. Applying PLINK analysis84 to all high quality exome sequence variants, we identified a 1 Mb region of homozygosity at chromosomal region 14q:102,356,475-103,388,999 (NCBI build 37/hg19) shared among all four sequenced affected individuals and encompassing TECPR2. We also interrogated the exomes of these affected individuals to determine whether copy number variants (CNVs) could explain the observed idiosyncratic HSP phenotype. We therefore looked for exons with very low or zero coverage, which would suggest homozygous (CNV) deletions, but found none. In addition, we applied CoNIFER107, an algorithm that attempts to detect deletions and duplications in exome data, but did not detect any CNVs shared by all affected individuals. While CNV detection from exome data is known to be error prone108, this significantly decreases the probability that the causative variation is a genomic deletion or duplication. To better estimate the frequency of the c.3416delT variant in healthy individuals, we genotyped 1,098 additional DNA samples from healthy controls that were not ethnically matched, and found that none were carriers. Thus, overall this truncation was not found in any of the 2,007 non-Bukharian controls that we have checked. We also genotyped 150 Jewish Bukharian controls and found the mutation in a heterozygous form in 4 out of the 300 chromosomes tested, accounting for an allele frequency of 0.013 in this ethnic group. This implies that the disease could be seen in approximately 1.7 out of 10,000 offspring of endogamous couples, consistent with our identification of three affected families by an admittedly incomplete screen in a community of 150,000 Israeli Bukharian Jews with likely partial endogamy.
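The allele-frequency arithmetic behind this estimate can be reproduced with a few lines of Python. The sketch assumes Hardy-Weinberg proportions within the community and that every carrier found among the controls is heterozygous; the function name is hypothetical.

```python
def carrier_screen_summary(heterozygous_carriers, individuals_genotyped):
    """Estimate allele, carrier and affected frequencies from a control screen.

    Assumes Hardy-Weinberg proportions and that all carriers are heterozygous.
    """
    chromosomes = 2 * individuals_genotyped
    q = heterozygous_carriers / chromosomes       # mutant allele frequency
    return {
        "allele_frequency": q,
        "carrier_frequency": 2 * q * (1 - q),     # expected heterozygotes
        "affected_per_10000": q * q * 10_000,     # expected homozygotes per 10,000
    }


summary = carrier_screen_summary(heterozygous_carriers=4, individuals_genotyped=150)
```

With 4 carriers among 150 controls (300 chromosomes) the allele frequency is about 0.0133. Squaring the rounded value of 0.013 gives approximately 1.7 affected per 10,000, whereas the unrounded frequency gives approximately 1.8.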


Figure 2. Protein attributes of TECPR2. Domain structure and mutation according to UniProt and NCBI conserved domains. Top, conserved domains in the wild-type TECPR2 according to UniProt. Yellow, WD (tryptophan-aspartic acid dipeptide) repeats; purple, TECPR domains; blue, poly-lysine tract. Bottom, the effect of the mutation, with orange indicating modified protein sequence due to the frameshift at position 1138, resulting in a premature stop codon at position 1212.

To test the effect of the mutation, we derived full-length cDNA from the human kidney cell line HEK-293 and used PCR-directed mutagenesis to generate the c.3416delT form of the gene. C-terminus or N-terminus FLAG-tagged TECPR2 was made by two-step PCR-directed mutagenesis. We then transfected the constructs into the monkey kidney cell line COS-7, the human kidney cell line HEK-293 and the human epithelial cell line HeLa, derived total cDNA and visualized it by semi-quantitative RT-PCR. Transcripts were seen for both the normal and mutated forms (Fig. 3A). We then used anti-FLAG M2 monoclonal antibody, anti-human TECPR2 polyclonal antibody or anti-GAPDH monoclonal antibody to detect TECPR2 protein forms. TECPR2 was detected by anti-FLAG and anti-TECPR2 immunoblotting only in the wild-type but not in the mutant transfection (Fig. 3B and C). This suggests that the mistranslated and truncated protein is degraded in these cells. Inhibiting the proteasome degradation pathway by MG132 and lactacystin rescued the mutated protein (Fig. 3D-G). This indicates that the mutated protein is targeted for proteasome degradation.


Figure 3. Effect of TECPR2 mutation. (A) Semiquantitative RT-PCR analysis of TECPR2 mRNA expression in COS-7 transfectants. Mock, empty vector; T, wild-type; ∆T, mutant; RT(+), with reverse transcriptase; RT(-), no reverse transcriptase. The data reveal no effect of the mutation on PCR-amplified mRNA levels. (B) Immunoblotting with anti-FLAG monoclonal antibody. Same notation as in A. GAPDH, loading control. The data reveal major disappearance of the mutated protein. Wild-type TECPR2 is seen at the expected molecular weight (154 kDa). (C) Immunoblotting using the anti-TECPR2 antibody. 293T and HeLa are unmodified cell lines; endogenous TECPR2 is not detectable. Labels as in A and B. The wild-type protein, but not the mutated form, is seen in transfected COS-7 cells. (D) Effect of proteasome inhibition with MG132 and Lactacystin. Immunoblotting with anti-FLAG monoclonal antibody for COS-7 transfectants of C-terminus FLAG-tagged TECPR2. (-), mock transfected; T, wild-type; ∆T, mutant; GAPDH, loading control. The data reveal rescue of the truncated mutant TECPR2 and enhancement of wild-type TECPR2 upon proteasome inhibition. (E) Effect of proteasome inhibition with MG132 and Lactacystin. Immunoblotting with anti-FLAG monoclonal antibody for COS-7 transfectants of N-terminus FLAG-tagged TECPR2. Same notation as in D. (F) Expression of each allele of N-terminus or C-terminus FLAG-tagged TECPR2 and the effect of proteasome inhibition with MG132 and Lactacystin. Immunoblotting with anti-FLAG monoclonal antibody for HEK293 transfectants of N-terminus or C-terminus FLAG-tagged TECPR2. (G) Same, for HeLa transfectants of N-terminus or C-terminus FLAG-tagged TECPR2. The data support the results in Figure 3D, E, with the same notation.

1.3 Connecting the mutation to changes in Autophagy pathway proteins

The protein encoded by TECPR2 has recently been shown by immunoprecipitation-based proteomic analysis to interact with the six human Atg8 orthologs, including the MAP1LC3 group (LC3). In parallel, it was shown to be a positive regulator of autophagosome accumulation by a TECPR2 siRNA approach37. Autophagy is a major catabolic intracellular pathway. It controls the degradation of the majority of long-lived cytosolic proteins and bulky cellular constituents (protein aggregates and organelles), contributing to the maintenance of intracellular homeostasis and cell survival109; 110. In this process, newly formed membranes, termed phagophores, engulf parts of the cytoplasm, leading to the production of double-membraned autophagosomes that are delivered to lysosomes for content degradation. Dysfunction of autophagy has been proposed as an underlying mechanism for numerous neurodegenerative and muscle diseases111; 112. We also note that TECPR2 is highly expressed in the human brain, especially in the prefrontal cortex113; 114, providing a potential basis for the organ-specific outcome of the deleterious TECPR2 mutation. Two evolutionarily conserved proteins are necessary for the autophagy process and work in parallel (Fig. 4). The first is the cargo-recruiting polyubiquitin-binding protein SQSTM1 (p62), which is degraded in the autolysosome during autophagy. The second is the ubiquitin-like autophagy-initiation protein MAP1LC3B (LC3), which exists in cytosolic and phosphatidylethanolamine-conjugated forms, is highly associated with the autophagosomal membrane and functions after the isolation membrane of autophagosomes has formed115-118.


Figure 4. Ubiquitylation of protein aggregates triggers binding of the adaptor protein p62, which also binds LC3 conjugated with lipids in the double membrane of the forming autophagosome. HDAC6, histone deacetylase 6; PE, phosphatidylethanolamine.

We thus hypothesized that the truncated TECPR2 leads to modifications of autophagy, potentially resulting in the observed phenotype. To test this, we derived skin fibroblasts from an affected individual (Fig. 1: Family B II-2) and from an unrelated healthy control and induced autophagy by starvation in nutrient-poor medium for 6h. We used immunoblotting of cell lysates to measure the protein amount of SQSTM1 (p62) and MAP1LC3B (LC3). For LC3, the activation of autophagy leads to a transition from the cytosolic form (LC3I) to the autophagosome-associated phosphatidylethanolamine-conjugated form (LC3II), an effect enhanced in the presence of bafilomycin A, an inhibitor of autophagosome-lysosome fusion. Bafilomycin A is also expected to enhance p62 levels, as the protein is degraded in the autolysosome together with the cargo proteins. The variously treated fibroblast lysates were subjected to SDS–PAGE, followed by immunoblotting. The anti-TECPR2 antibody used for Figure 3C was ineffective in detecting endogenous protein in the present experiment. The presence of autophagosomes in the starvation and bafilomycin A condition is evidenced by Transmission Electron Microscopy (TEM) (Fig. 5C). In general, while starvation alone led to a slight diminution in the amount of both proteins, the addition of bafilomycin A led to a considerable augmentation of the amounts of p62 as well as LC3II in the control. In comparison, the affected individual sample showed an across-the-board diminution of p62 and LC3II levels under all conditions (Fig. 5A and B).


Figure 5. Effect of mutation on autophagy markers. (A) Immunoblotting with anti-p62 antibody and anti-LC3 antibody for skin fibroblast lysates of both affected and control. Basal, rich medium; str, starvation; baf, with the lysosomal inhibitor bafilomycin A. LC3I is the cytoplasmic form, LC3II is the phosphatidylethanolamine-conjugated form. GAPDH was used as loading control. (B) Summary of three biological replicates, each in duplicate, for experiments as shown in A, with same notations, using immunoblot scan quantitation and normalization by GAPDH control. Error bars are standard deviation for the four replicates. (C) Skin fibroblasts of an affected individual and a healthy control were incubated under starvation conditions in the presence of bafilomycin A for 6h, and thin sections were visualized by transmission electron microscopy. Arrows show representative autophagosomes/autolysosomes, with a possible slight diminution of organelle accumulation within autophagic bodies in the affected fibroblasts.

1.4 Using siRNA knockdown as model for the mutation

Furthermore, we performed siRNA knockdown of TECPR2 in HeLa cells and examined their autophagy-related phenotype. To validate the knockdown of TECPR2, quantitative real-time PCR was carried out. By 72h, only a minimal level of TECPR2 could be detected in comparison to the non-targeting siRNA control (Fig. 6A). In cells in which TECPR2 was knocked down, we observed a major reduction of the bafilomycin-induced LC3II immunoreactivity. Of note, p62 amounts were much less affected by the knockdown (Fig. 6B). Using confocal immunofluorescence analysis, we confirmed these observations for both proteins (Fig. 6C, D). These combined results suggest that the presumed knockdown of TECPR2 in the spastic paraparesis subjects brings about a decreased accumulation of LC3II-labeled autophagosomes and attenuates the delivery of LC3II and p62 to lysosomal degradation. This indicates that the autophagy pathway is impaired, but not completely eliminated, as we observe a lower impact on p62 protein levels. Of note, p62 is selectively recruited into autophagosomes and therefore even partial autophagic activity may be sufficient to deliver this protein to the lysosome.

Figure 6. TECPR2 knockdown in HeLa cells. (A) Real-time PCR amplification of cDNA from HeLa cells transfected with either non-targeting siRNA (NT-siRNA) or a pool of 4 TECPR2 siRNAs (TP-siRNA), using primers specific for TECPR2 (exons 16-17). (B) Immunoblotting with anti-p62 and anti-LC3 antibodies of HeLa cell lysates. Notation as in Figure 5A. (C) LC3II and p62 protein levels in HeLa cells in which TECPR2 is knocked down by siRNA, viewed by immunofluorescence confocal microscopy under different conditions. Green - LC3 antibody; red - p62 antibody; yellow - merger of both signals. Notation as in B. For non-targeting siRNA, punctate perinuclear structures stained for both p62 and LC3II are likely autophagosomes.

1.5 Additional patients with TECPR2 mutations

The following data have been published in Eur J Paediatric Neurology119.

Our collaboration has identified three additional non-Bukharian patients whose phenotypic presentation is consistent with that of the SPG49 patients, harboring two novel mutations in TECPR2.

Patient 1 (Fig. 7A, family 1, II:1) was the first of four children of healthy, unrelated Ashkenazi Jewish parents. Patient 2 (Fig. 7A, family 2, II:2) is the second of 3 children of healthy parents of Ashkenazi origin with no consanguinity. Patient 3 (Fig. 7A, family 3, II:1) is the eldest of 2 children of parents of mixed Ashkenazi/Tunisian-Yemenite/Kurdish origin, with no consanguinity and no neurological or developmental diseases in the extended family.

Figure 7. Pedigrees and TECPR2 mutations. (A) Family pedigrees of patients 1, 2 and 3 (corresponding to families 1, 2 and 3). WT: wild type allele. (B) Chromatograms of the sites of the TECPR2 mutations; the altered base is indicated with an arrow.

1.5.1 Molecular diagnosis

Patient 1 was found to have a pair of compound heterozygous variants in TECPR2. The first was a missense variant at chr14: 102881058 C>T, NM_001172631 (c.C566T, p.Thr189Ile) and the second was a 1 bp deletion of T at chr14: 102898367, causing a frameshift (c.1319delT, p.Leu440Argfs*19) leading to a premature stop codon (Fig. 7B, top and middle). Both mutations were predicted to be deleterious according to PolyPhen2 and SIFT, and perfect segregation was demonstrated within the family. Neither mutation has been previously reported in SPG49 patients, and neither was present in any control dataset.

Patient 2 was exome-sequenced within a trio along with his two healthy parents. We found the same c.1319delT (p.Leu440Argfs*19) frameshift deletion that was found in the heterozygous state in patient 1 (Fig. 7B, middle). Both parents were found to be heterozygous carriers of this deletion.

The unique phenotype of patient 3 raised the suspicion of TECPR2 involvement. Because the child was of both Ashkenazi and Kurdish origin (having genetic similarities to Bukharians), we initially screened for the two mutations previously identified in these ethnicities: c.1319delT (p.Leu440Argfs*19), identified in the Ashkenazi patient 2, and c.3416delT (p.Leu1139Argfs*75), identified in our previously reported Bukharian patients104. Using Sanger sequencing, we found these two variants in a compound heterozygous state (Fig. 7B, middle and bottom). Segregation analysis within the family confirmed that the father, who is of Ashkenazi/Tunisian origin, was heterozygous for the c.1319delT mutation and the mother, who is of Yemenite/Kurdish origin, was heterozygous for the c.3416delT mutation.

1.5.2 Bioinformatics considerations

TECPR2 encodes a 1411-amino-acid protein with three WD (tryptophan-aspartic acid) repeat domains located at the N-terminus and six TECPR domains concentrated around amino acids 1000 and 1250. The p.Leu1139Argfs variant would result in a loss of four out of six TECPR domains. The p.Leu440Argfs variant reported here is predicted to cause an even more radical truncation of the protein, resulting in a loss of more than half its size and of all six TECPR domains (Fig. 8).
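The size of each predicted truncation can be worked out directly from the protein-level variant names. The sketch below assumes the usual reading of the frameshift notation, in which the number after `fs*` counts from the first changed residue (position 1) to the new stop codon; the helper names are hypothetical, and domain boundaries are deliberately not modelled because the text gives only approximate positions.

```python
import re

PROTEIN_LENGTH = 1411  # full-length TECPR2, as stated in the text

def parse_frameshift(hgvs_p):
    """Parse a protein-level frameshift such as 'p.Leu440Argfs*19'.

    Returns (first_changed_position, stop_offset), where stop_offset counts
    from the first changed residue (position 1) to the new stop codon.
    """
    match = re.fullmatch(r"p\.[A-Z][a-z]{2}(\d+)[A-Z][a-z]{2}fs\*(\d+)", hgvs_p)
    if match is None:
        raise ValueError(f"not a frameshift description: {hgvs_p}")
    return int(match.group(1)), int(match.group(2))

def truncation_summary(hgvs_p, full_length=PROTEIN_LENGTH):
    first_changed, stop_offset = parse_frameshift(hgvs_p)
    native = first_changed - 1   # residues still matching the wild type
    novel = stop_offset - 1      # out-of-frame residues before the stop
    return {
        "native_residues": native,
        "truncated_length": native + novel,
        "fraction_native_lost": 1 - native / full_length,
    }

for variant in ("p.Leu440Argfs*19", "p.Leu1139Argfs*75"):
    print(variant, truncation_summary(variant))
```

Under these assumptions the earlier frameshift retains fewer than a third of the native residues, which agrees with the statement that more than half of the protein is lost.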

Figure 8. Functional prediction of TECPR2 mutations. Schematic representation of the TECPR2 gene and its predicted encoded protein product. Exons are represented with boxes. The WD (tryptophan-aspartic-acid dipeptide) repeats are indicated in yellow, TECPR domains are shown in purple, and blue represents the polylysine tract. The locations of p.Thr189Ile, p.Leu440Argfs*19 and p.Leu1139Argfs*75 within the protein are indicated with arrows.

Thr189 is part of the third WD domain and is extremely conserved throughout evolution. PredictProtein83 assigns high scores to Thr189 substitutions in general and to Thr189Ile in particular (Fig. 9A). A model of the WD region of the protein is available in ModBase120. The model covers amino acids 1-304 of the protein. Although the similarity to the template protein (3fm0A; human CIAO1, pdb 3fm0) is relatively low (15% sequence identity), both proteins are WD domain proteins, so there is good reason to believe that the basic folds are conserved. The WD domain of TECPR2 is a mainly β domain that forms a β-propeller structure with seven-fold symmetry (Fig. 9B). Thr189 is located in the central β strand of a WD repeat, close to a connector of two anti-parallel strands, and thus presumably plays a role in stabilization of the WD fold or in interactions with other molecules.

Figure 9. Model and prediction for the Thr189Ile mutation. (A) Analysis of amino acid substitutions within the WD region of TECPR2 using PredictProtein. Position 189 is highly sensitive to substitution as indicated by the reddish colors (black rectangle). The actual Thr189Ile mutation is circled. (B) Homology model of amino acids 1-304 in the protein reveals the classic seven-fold symmetry of the WD domain. Thr189 is colored in red and located in the core of a WD repeat.

2. Intractable Diarrhea of Infancy Syndrome (IDIS)

We studied eight patients from seven unrelated families of Jewish Iraqi origin with an autosomal recessive pattern of severe congenital malabsorptive diarrhea, originally defined as having congenital intractable diarrhea40 (Fig. 10, Fig. 12a,b,c).

Figure 10. Filled black symbols are affected individuals, and deletion genotypes are indicated in red. Exome sequencing was done for individuals 1.1, 2.1, 3.1, 4.1, 4.2; whole genome sequencing was done for individual 2.1. Transcriptome analysis was done for 2.1 and 2.4. Patient 1.1 (*) was found to have uniparental disomy (UPD).

2.1 Identification of two deletion alleles in IDIS patients

Exome sequencing analysis of five patients (Fig. 10) revealed no rare exonic sequence variants with the appropriate patient segregation. Whole genome linkage analysis (Fig. 11) and haplotype reconstruction using SNP genotyping performed on 6 of the patients in families 1-5 and their 22 relatives detected a single significant (LOD score = 4.26) telomeric linkage interval on chr16 with flanking marker rs2074359 (chr16: 2,984,868). Recombination analysis using both SNP genotyping and exome data (when available) reduced the linkage interval to an 800 kb region at chr16: 1,050,877 – 1,849,916 in the 4 patients of families 1, 2, 3 and 5. To identify possible structural genomic changes at this locus, we further examined all exome sequencing data sets, as well as WGS data from one of the patients. In the exome sequencing data, we observed an absence of coverage of three consecutive exons of a predicted transcript of C16orf91 in a subset of patients, suggesting the presence of a deletion (Fig. 12d, Fig. 13). PCR amplification and Sanger sequencing in these families revealed a 7,013 bp deletion, termed ΔL. Further scrutiny revealed that none of the three computationally predicted exons within the deleted interval is supported by quantitative RT-PCR or by public transcription resources (UCSC genome browser, Illumina Body Map, ENCODE), i.e. they were mistakenly included in the exome capture kit. This suggests that the ΔL region is intergenic and provides a first line of evidence that a non-coding function may be affected by the deletion. Targeted PCR and sequencing of the locus showed that the two patients in family 4, who did not share the region of homozygosity, were compound heterozygotes for ΔL and a distinct allelic variant, ΔS, a partially overlapping 3,101 bp deletion. The overlap between the two deletions defines a minimal sequence of 1,528 bp, termed the intestine-critical region (ICR) (Fig. 12d). All eight patients in this study showed ΔS/ΔS, ΔS/ΔL or ΔL/ΔL genotypes, resulting in homozygous deletion of the ICR (Fig. 10). Neither of these deletions was found in several large control samples, including 200 ethnicity-matched controls and >3,000 WGS data sets from diverse sources. Patient 1.1 showed uniparental isodisomy for the maternal chromosome carrying the ΔL allele.
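The logic by which two partially overlapping deletions define a shared critical region, and by which every patient genotype lacks that region on both alleles, can be illustrated with a short sketch. The interval offsets below are hypothetical: they are measured from the start of ΔL and chosen only to reproduce the reported sizes (7,013 bp, 3,101 bp and a 1,528 bp overlap); they are not genomic coordinates, and the side on which ΔS extends beyond ΔL is an arbitrary choice.

```python
from itertools import combinations_with_replacement

# Half-open intervals, offsets relative to the start of ΔL.
# Hypothetical values that only reproduce the reported lengths.
DELETIONS = {
    "ΔL": (0, 7013),      # 7,013 bp
    "ΔS": (5485, 8586),   # 3,101 bp
}

def overlap(a, b):
    """Intersection of two half-open intervals, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None

icr = overlap(DELETIONS["ΔL"], DELETIONS["ΔS"])
icr_length = icr[1] - icr[0]

def covers(deletion, region):
    return deletion[0] <= region[0] and region[1] <= deletion[1]

def icr_copies_remaining(genotype):
    """Count alleles in a genotype that still carry an intact ICR."""
    return sum(
        0 if allele in DELETIONS and covers(DELETIONS[allele], icr) else 1
        for allele in genotype
    )

genotypes = list(combinations_with_replacement(["+", "ΔL", "ΔS"], 2))
null_genotypes = [g for g in genotypes if icr_copies_remaining(g) == 0]
print(icr_length, null_genotypes)
```

Only the three genotypes built from deletion alleles leave no intact copy of the ICR, which matches the genotypes observed in the eight patients.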

Figure 11. Analysis of SNP genotyping performed on six of the patients in families 1-5 and their 22 relatives detected a single significant telomeric linkage interval on chr16 with a maximum LOD score of 4.26. Haplotype reconstruction confirmed this interval with flanking marker rs207435 (chr16: 2,984,868) and showed two distinct disease haplotypes, present in affected individuals either in a homozygous setting for disease allele 1 (i.e. ΔL) in families 2, 3 and 5, or in a compound heterozygous setting for disease alleles 1 and 2 (i.e. ΔS) in family 4. All affected individuals carrying disease allele 1 showed an identical disease haplotype from rs533184 (chr16: 1,155,025) to rs397435 (chr16: 2,010,138).
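To give a feel for what a LOD score of 4.26 represents, the sketch below converts a LOD score to odds and evaluates the textbook two-point LOD formula for phase-known meioses. It is a simplified illustration and not the linkage software or model used in this study; the example of 14 informative meioses without recombinants is hypothetical.

```python
from math import log10

def two_point_lod(recombinants, meioses, theta):
    """LOD = log10[L(theta) / L(0.5)] for phase-known informative meioses."""
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    nonrecombinants = meioses - recombinants
    if theta == 0:
        if recombinants:
            return float("-inf")  # a recombinant is impossible at theta = 0
        log_l_theta = 0.0
    else:
        log_l_theta = (recombinants * log10(theta)
                       + nonrecombinants * log10(1 - theta))
    return log_l_theta - meioses * log10(0.5)

def lod_to_odds(lod):
    """Odds in favour of linkage implied by a LOD score."""
    return 10 ** lod

reported_odds = lod_to_odds(4.26)
# Hypothetical family structure, for illustration only.
example_lod = two_point_lod(recombinants=0, meioses=14, theta=0.0)
print(round(reported_odds), round(example_lod, 2))
```

A LOD of 4.26 corresponds to odds of roughly 18,000 to 1 in favour of linkage, above the conventional threshold of 3.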

Figure 12. Overview of human and mouse locus and key findings. a/b, Family pedigrees and genotyping results for patients compound heterozygous for the two deletion alleles (a) and homozygous for one of the deletion alleles (b). c, Patient 4.2 at birth and at age 2y with total parenteral nutrition (TPN). d/e, Genomic map of the deletion alleles in human (d) and mouse (e), indicating the location of ΔL and ΔS, as well as their minimal overlapping region ICR. Exome sequencing data is capped at up to 5 overlapping tags; vertebrate conservation is 100-vertebrate PhyloP; only selected transcription factor binding sites and DHS clusters with signal in >20/125 ENCODE cell types are shown.

Figure 13. Schematic of reads covering exons in the C16orf91 gene, for the five exome-sequenced patients and for three controls sequenced under identical conditions. The first three patients, with a ΔL/ΔL genotype, had zero coverage in the three upstream exons (right). The last two patients, with a ΔL/ΔS genotype, had non-zero coverage in these exons, but significantly lower than controls. The downstream exons (left) had high coverage in all subjects. Numbers indicate scale in sequencing reads per base.
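The coverage pattern in this figure suggests a simple screen: look for consecutive exons with no reads in a patient that are well covered in controls sequenced under the same conditions. The sketch below is a hypothetical illustration of that idea, with invented per-exon depths and arbitrary thresholds; it is not the procedure used in this study. Note that it would miss compound heterozygous patients, whose coverage is reduced but not zero.

```python
def zero_coverage_runs(patient_depth, control_depths, min_control_depth=10, min_run=2):
    """Return runs of consecutive exons with zero patient coverage that are
    well covered in every control.

    patient_depth  -- mean read depth per exon, in genomic order
    control_depths -- list of per-exon depth lists, one per control
    """
    runs, current = [], []
    for i, depth in enumerate(patient_depth):
        covered_in_controls = all(c[i] >= min_control_depth for c in control_depths)
        if depth == 0 and covered_in_controls:
            current.append(i)
        else:
            if len(current) >= min_run:
                runs.append(current)
            current = []
    if len(current) >= min_run:
        runs.append(current)
    return runs

# Hypothetical per-exon depths (not the thesis data): five exons, last three lost.
patient = [42, 38, 0, 0, 0]
controls = [[40, 41, 35, 30, 33], [45, 39, 28, 31, 36], [37, 44, 32, 29, 30]]
candidate_deletions = zero_coverage_runs(patient, controls)
print(candidate_deletions)
```

Requiring good coverage in every control guards against exons that are simply hard to capture, which would otherwise look like deletions in all samples.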

Whole-genome sequencing of patient 2.1 confirmed the ΔL attributes and showed that it is the only homozygous genomic deletion in the linkage region. Neither deletion was present in 200 ethnically matched Iraqi control chromosomes or in 122 in-house Caucasian WGS samples. In addition, we searched >3000 WGS data sets from diverse sources in the KAVIAR dataset121 and found no deletions overlapping those reported here. Further, 1092 individuals from the 1000 Genomes Project122 were scanned within the integrated variant call file for overlaps with the ΔL and ΔS regions, and none were observed. Searching the Database of Genomic Variants123; 124 for large deletions that span the ΔL and ΔS regions identified several heterozygous deletions with a combined allele frequency <0.004.

2.2 The intergenic deletion removes a distant-acting enhancer

Analysis of the ΔV overlap region using Encyclopedia of DNA Elements (ENCODE)125 data revealed a 400 bp region with high evolutionary conservation across vertebrates that shows CpG island and DNase hypersensitivity signatures and encompasses a cluster of binding sites for multiple transcription factors identified by ChIP-seq (Fig. 12d). The strongest ChIP-seq signals were observed for FOXA1 and FOXA2, both of which are known enhancer interactors, acting as 'pioneers' whose binding enables chromatin access for other tissue-specific transcription factors56; 126. These results raised the possibility that the ICR is a distant-acting enhancer. To experimentally validate that this noncoding fragment has enhancer activity and to learn in which tissues it is active, we collaborated with Prof. Len Pennacchio and the VISTA Enhancer Browser team at Lawrence Berkeley National Laboratory, CA, who examined the enhancer activity of the minimal critical human interval in a transgenic mouse enhancer assay127. In short, the entire ΔV region was cloned into an entry vector together with a weak promoter coupled to the LacZ reporter gene and microinjected into fertilized mouse eggs; embryos were harvested at different gestational stages and visualized by whole-mount and section LacZ staining (Fig. 14). In transgenic embryos ranging from embryonic day (E) 11.5 to E14.5, we observed robust and reproducible reporter activity in the stomach, pancreas and duodenum (Fig. 14). All three of these organs contain many distinct enteroendocrine cell types that control gastrointestinal and metabolic function via hormone peptides42. These results support the notion that the ICR sequence deleted in congenital diarrhea patients contains an enhancer active in vivo in the developing digestive system, and may thus be directly linked to the disease etiology.

Figure 14. Enhancer reporter activity in E13.5 and E14.5 transgenic mouse embryos. Cross-sections showing X-gal staining for β-galactosidase activity in E13.5 stomach, pancreas and duodenum as marked.

2.3 Deletion of the enhancer in mice leads to a human-like phenotype

To examine whether deletion of the minimal ICR sequence was sufficient to cause the in vivo phenotypes observed in human patients, we removed a 1,512 bp mouse sequence orthologous to the human 1,528 bp ICR from the mouse genome using homologous recombination in embryonic stem cells (Fig. 12e). When heterozygous chr17+/ΔICR mice were interbred, homozygous chr17ΔICR/ΔICR offspring were born at the expected Mendelian frequency. At birth, the pups showed no gross phenotypes and had normal suckling behavior. However, starting within the first few days of life, chr17ΔICR/ΔICR mice displayed overall reduced size (Fig. 15A), low body weight (Fig. 15B) and substantially decreased survival (Fig. 15C). Only 40% of chr17ΔICR/ΔICR mice survived to weaning at ~20 days of age, and by two months after birth, surviving chr17ΔICR/ΔICR mice showed a 60% reduction in weight compared to wild-type or heterozygous littermates. Examination of fecal pellets and internal organs revealed abnormal digestive tract function in chr17ΔICR/ΔICR mice. The stomach content of chr17ΔICR/ΔICR mice during the first weeks of life did not show gross deviations from wild-type controls in volume or appearance and consisted of normal amounts of milk. However, the intestinal content was abnormal, with a pale, undigested appearance, much softer consistency, and failure to form discrete fecal pellets (Fig. 16A-B). These results indicate that deletion of the ICR enhancer in mice causes substantial disruption of intestinal function, consistent with the in vivo activity of the enhancer in the developing intestinal tract and recapitulating the congenital diarrhea phenotype observed in human patients carrying homozygous ICR deletions.

Figure 15. (A) chr17ΔICR/ΔICR offspring are viable but show a reduction in size and weight compared to wild-type littermates. (B) Reduction in body weight among surviving offspring. (C) Increased mortality of chr17ΔICR/ΔICR mice compared to wild-type.

Figure 16. (A) Modified intestinal content in the wild-type (left) and the chr17ΔICR/ΔICR mouse (right). (B) Abnormal appearance of fecal pellets from chr17ΔICR/ΔICR mice.

2.4 Assessing gene expression levels by RNA sequencing

To explore the molecular basis of the phenotypes observed upon ICR deletion, we examined possible changes in gene transcription in human and mouse digestive tract tissues. Such changes may reflect dysregulation of direct target genes of the ICR enhancer, indirect downstream regulatory events, or the absence or general dysfunction of intestinal cell populations. We performed RNA sequencing of duodenal and stomach biopsies obtained from a ΔL/ΔL patient, as well as from a non-diseased sibling. Among the genes showing the strongest down-regulation genome-wide in at least one of these tissues, eight encode gastrointestinal peptide hormones secreted by enteroendocrine cells128, and four have other relationships to gastrointestinal function (Fig. 17 and Table 2). Particularly pronounced changes were observed for five peptide hormones: gastric inhibitory polypeptide (GIP), motilin (MLN) and ghrelin (GHRL) in the duodenum and gastrin (GAST) and somatostatin (SST) in the stomach, all of which showed >100-fold reduction in expression. In addition, MBOAT4129; 130, a ghrelin-modifying enzyme, and ARX, a transcription factor controlling enteroendocrine cell development131 and associated with syndromic congenital diarrhea132; 133, showed 20- to 30-fold down-regulation in the ΔL/ΔL small intestine. These results are consistent with abnormal development or function of enteroendocrine cells134. Among the genes showing the largest increase in expression, eight are related to the gastrointestinal tract, including gastrokines 1 and 2 (GKN1, GKN2), crucial for homeostasis of gastric epithelial cells and maintenance of gastric mucosa integrity135, pepsin precursor (PGA3) and motilin receptor (MLNR). Quantitative RT-PCR of selected candidates, including seven gastrointestinal peptide hormones and ARX, confirmed their dysregulation in ΔL/ΔL samples. In addition, serum GAST, CCK, GIP, GHRL and PYY were undetectable in ΔL/ΔL and ΔL/ΔS patients, in contrast to healthy controls.
Consistent with these observations in human patients, RNA sequencing of a panel of mouse digestive tract biopsies taken at different stages of development showed that nearly all of these genes are dysregulated in chr17ΔICR/ΔICR mice. For the genes shown in Table 2, across all profiled mouse digestive tract tissues, 121 of 191 valid comparisons showed significant changes in expression (p < 0.05), the vast majority of which (105 of 121; 87%) were in the same direction as in human biopsies. Together, these results are consistent with major disruptions of normal intestinal physiology in ICR-deleted patients and chr17ΔICR/ΔICR mice and highlight the close resemblance between the human disease condition and the mouse knockout model.

Figure 17. Significant expression changes in human small intestine biopsies. Log2 (FPKM) values are presented for the IDIS patient and his healthy brother (Fig. 10; 2.1 and 2.4, respectively). Fold-change values are indicated in red above columns. Undef, hormone is not expressed in small intestine.

                                             Fold changes
Gene     Description                         Human stomach  Human small intestine  Mouse   P         Mouse tissue

Down-regulated in human patients / chr17ΔICR/ΔICR mice
SST      somatostatin                        10             683                    36      <0.01     colon/rectum (P1)
GIP      gastric inhibitory peptide          277            n.e.                   768     <0.001    intestine (P5)
MLN      motilin                             206            n.e.                   -       -         (no mouse ortholog)
GHRL     ghrelin/obestatin prepropeptide     125            5.2                    896     <0.001    stomach (P10, bottom)
CEL      carboxyl ester lipase               1.1            135                    144     <0.001    intestine (P1, top)
ARX      aristaless related homeobox         30             6                      23      <0.05     stomach (P10, bottom)
PYY      peptide YY                          25             n.e.                   223     <0.001    rectum (P5)
MBOAT4   ghrelin O-acyltransferase           22             1.4                    9.4     <0.01     stomach (P20, bottom)
NTS      neurotensin                         (0.62)         15                     674     <0.001    intestine (P1, bottom)
GAST     gastrin                             11             123                    52      <0.001    stomach (P5)
CCK      cholecystokinin                     8.2            6.7                    109     <0.001    intestine (P5, top)
SLC26A7  solute carrier family 26, member 7  7.4            2.9                    6.2     (>0.05)   stomach (P1)

Up-regulated in human patients / chr17ΔICR/ΔICR mice
GKN1     gastrokine 1                        256            n.e.                   25      <0.001    colon (P5)
PGA3     pepsinogen A3                       113            6.96                   -       -         (no mouse ortholog)
GKN2     gastrokine 2                        60             (0.81)                 22      <0.001    colon (P5)
DUOX2    dual oxidase 2                      51             (0.34)                 19      <0.001    intestine (P1, top)
RBP2     retinol binding protein 2           (0.89)         20                     8       <0.001    colon (P5)
REG1B    regenerating islet-derived 1 beta   14             n.e.                   1946    <0.001    stomach (P10, bottom)
MLNR     motilin receptor                    1.0            12                     -       -         (no mouse ortholog)
ATP4B    ATPase, H+/K+ exchanging, beta      7.6            4.5                    345     <0.001    intestine (P1, top)

Table 2. Significant expression changes in human and mouse intestinal tissues. Fold changes are calculated as the expression ratio of non-affected human or wild-type mice over homozygous ΔICR/ΔICR patients or mouse littermates. n.e., not expressed. n/a, not applicable. Fold-change and p-value for the mouse tissue with quantitatively strongest genotype-dependent regulation in same direction as human tissue shown. p-values are Bonferroni-corrected for multiple hypothesis testing across 16 mouse tissues.
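The quantities named in the table legend can be illustrated with a short sketch: a fold change computed as the ratio of non-affected over affected expression, and a Bonferroni correction across the 16 mouse tissues. The input values are hypothetical and the function names are illustrative; returning `None` for a zero denominator is an assumption made here, not the handling used for the table. The last line reproduces the concordance percentage quoted in the text.

```python
def fold_change(control_fpkm, affected_fpkm):
    """Ratio of control over affected expression; None when the gene is not
    expressed in the affected sample and the ratio is undefined."""
    if affected_fpkm == 0:
        return None
    return control_fpkm / affected_fpkm

def bonferroni(p_values, n_tests):
    """Multiply each raw p-value by the number of tests, capping at 1."""
    return [min(1.0, p * n_tests) for p in p_values]

N_MOUSE_TISSUES = 16  # number of tissues stated in the table legend

# Hypothetical inputs, not values taken from Table 2.
example_fc = fold_change(control_fpkm=250.0, affected_fpkm=2.0)
adjusted = bonferroni([0.00002, 0.004, 0.2], N_MOUSE_TISSUES)

# Concordance reported in the text: 105 of 121 significant comparisons.
concordance_percent = round(100 * 105 / 121)
print(example_fc, adjusted, concordance_percent)
```

With 16 tests, a raw p-value must be below about 0.003 to remain under 0.05 after correction.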

Seeking an enhancer target gene, we could not find significant RNA-seq dysregulation for any of the genes within ±1Mb of the enhancer (Fig. 18). The apparent differential expression of TPSG1 turned out to be a false positive.

Figure 18. Expression levels (FPKM) of genes surrounding the enhancer. No obvious differential expression was observed.

2.5 Histopathological analysis of patient biopsies

To further explore the pathophysiology associated with ICR deletions, biopsies obtained from two ΔL/ΔL homozygous patients were subjected to immunohistochemical staining for chromogranin A (CHGA), an early marker of enteroendocrine cell development. Increased immunoreactivity, as compared to healthy controls, was seen in the duodenal villi and stomach pyloric mucosae, a hyperplastic change that further supports the notion that ICR deletions cause abnormal development of enteroendocrine cells (Fig. 19). Further, negative staining was observed for gastrin-, somatostatin- and glucagon-expressing cells in the patients' duodenal and stomach tissues, whereas control tissue showed the expected staining pattern, consistent with the downregulation observed in the transcriptome data (data not shown).

Figure 19. Increased immunoreactivity of Chromogranin A stained enteroendocrine cells in duodenal biopsy (villi and intestinal glands) of patient 7.1 (A) as compared with the number in a control sample (C), and in the antral glands of stomach (pyloric mucosae) biopsy of patient 2.1 (B) as compared with the number in a control sample (D).

2.6 Reprogrammed intestinal cells from patient-derived induced pluripotent stem cells

In collaboration with Prof. James Wells from the Cincinnati Children's Hospital, we generated induced pluripotent stem cell (iPSC) lines from a ΔL/ΔL patient, a heterozygous +/ΔL sibling, and an unaffected +/+ sibling and differentiated them into human intestinal organoids (HIOs). Differentiation of iPSCs into intestinal tissues in vitro is highly similar to development of the embryonic intestine, and after 21 and 42 days in culture, HIOs from all three genotypes formed an intestinal epithelium that expressed CDH1, FOXA2 (Fig. 20A) and CDX2 (data not shown). Analysis of enteroendocrine cells with the markers Synaptophysin (SYP, Fig. 20A) and Chromogranin A (CHGA, not shown) indicated that after 21 days in culture these cells were more readily detected in the ΔL/ΔL iPSC HIOs than in the HIOs generated from carrier or control iPSC lines, similar to biopsy specimens. In contrast, the number of enteroendocrine cells at the later (42 day) time point was severely reduced in ΔL/ΔL HIOs. These results were confirmed by quantitative RT-PCR, where ΔL/ΔL HIOs showed a substantial decrease in the expression of the enteroendocrine markers CHGA and SYP, as well as ARX (Fig. 20B). These results suggest that specification of enteroendocrine cells during development and in adults is normal or even precocious in ΔL/ΔL patients, but that later stages of development and differentiation are impaired. We note that patient biopsies showed increased immunoreactivity of CHGA (Fig. 19), which may indicate that in vivo these tissues acquire a steady state, whereas the in vitro HIO model recapitulates the initial emergence of enteroendocrine cells during embryonic development134.

Figure 20. Human enteroendocrine cell development is impaired in iPSC-derived intestinal organoid cultures. (A) Human intestinal organoids (HIOs) were generated from control (+/+), carrier (+/ΔL), and patient (ΔL/ΔL) iPSC lines and analyzed at 21 days and 42 days of culture. Intestinal epithelial development was interrogated by expression of the epithelial markers FOXA2 (blue) and CDH1 (red). Synaptophysin (SYP - green) was used to mark developing enteroendocrine cells. Representative examples from two separate iPSC lines from each patient run in triplicate are shown. (B) Analysis of 42 day HIOs by quantitative RT-PCR for the enteroendocrine markers ARX, Chromogranin A (CHGA) and synaptophysin (SYP). Error bars show standard error of the mean. Control vs. carrier was not significant. Carrier vs. patient was significant at p<0.05 in all cases (Student’s t-test, one-tailed). Results are from two separate iPSC lines from each patient run in triplicate.

2.7 Circular Chromosome Conformation Capture (4C)

To identify interactions between the identified gut enhancer and the potential target gene promoter, we performed circular chromosome conformation capture (4C-seq) analyses with the deleted enhancer genomic region as a viewpoint, in gut and stomach biopsies of E15 mouse embryos. This technique was developed to identify physical interactions genome-wide from any given genomic location136.

The only significant cis-interacting segments genome-wide were seen within ±100 kb of the viewpoint. Among these, there were two signals well separated from the immediate vicinity of the viewpoint and occurring in the immediate upstream regions of genes. The first was at the coiled-coil domain-containing 154 (CCDC154) gene and the second at the unkempt family zinc finger like (UNKL) gene (Fig. 21). These two genes may therefore be considered as likely enhancer target genes.

Previously, a spontaneous novel mutation involving a homozygous genomic ~5kb deletion that includes exons 1-6 of CCDC154 was observed in a “new toothless” (ntl) mouse, showing aberrant tooth development and osteopetrosis137; 138. In this knockout mouse line, runted progeny that died shortly after weaning were identified. The functional role of certain coiled-coil domain-containing proteins in the regulation of DNA-DNA connections relevant, among other things, to the control of gene expression has recently been reviewed139. However, our RNA-seq experiments in enhancer-deleted human and mouse (Fig. 18) did not show significant down-regulation of CCDC154 expression. Taken together with the reported phenotype of mice lacking CCDC154, these results weaken the likelihood that this gene is the enhancer target.

Mammalian UNKL and its paralog UNK are orthologous to the Drosophila unkempt (unk) gene, which encodes a zinc finger/RING domain-containing transcription factor protein. Drosophila unk encodes a set of mRNAs that are abundant only during early stages of embryogenesis and show distinct expression patterns in the early embryo140. More recently, a mouse unkempt ortholog was shown to play a role in chromatin remodeling, to serve as a negative regulator of the timing of photoreceptor differentiation141 and to act as an RNA-binding protein that controls a neuronal morphology program142. However, UNKL, like CCDC154, also showed negligible expression change in our current RNA-seq results for enhancer-deleted human and mouse. Following the 4C results pointing at CCDC154 and UNKL as the two top candidate target genes for the enhancer, we used the CRISPR/Cas9 system for genome engineering to delete the four genes closest to the enhancer from the mouse genome, using microinjection of an RNA mixture of Cas9 and sgRNA into the cytoplasm of fertilized mouse eggs.

Figure 21. 4C data for the ICR sequence in E15 mouse stomach and gut biopsies. 4C signal using the ICR sequence as viewpoint (indicated by blue arrow) on mouse chromosome 17. The signal is presented without the exact viewpoint peak and was smoothed using the MATLAB smooth function based on the LOWESS method with a span of 30. Scales corresponding to read count are shown on the left.
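For readers unfamiliar with LOWESS smoothing, the sketch below shows the underlying idea in plain Python: each point is replaced by the value of a locally fitted line, with nearby points weighted more heavily (tricube weights). It is an illustrative approximation for equally spaced data and is not a reimplementation of the MATLAB function; window placement at the edges and the handling of the span there may differ.

```python
def lowess_smooth(y, span=30):
    """Smooth equally spaced values with local linear regression.

    For each index, `span` neighbouring points are fitted by weighted least
    squares with tricube weights, and the fitted value at that index is kept.
    """
    n = len(y)
    span = min(span, n)
    if span < 3:
        return [float(v) for v in y]  # too few neighbours to fit a line
    smoothed = []
    for i in range(n):
        # Window of `span` consecutive indices, centred on i where possible.
        start = min(max(i - span // 2, 0), n - span)
        d_max = max(i - start, start + span - 1 - i)
        sw = swu = swuu = swy = swuy = 0.0
        for j in range(start, start + span):
            u = j - i                               # position relative to i
            w = (1 - (abs(u) / d_max) ** 3) ** 3    # tricube weight
            sw += w
            swu += w * u
            swuu += w * u * u
            swy += w * y[j]
            swuy += w * u * y[j]
        denom = sw * swuu - swu * swu
        if abs(denom) < 1e-12:
            smoothed.append(swy / sw)   # degenerate fit: weighted mean
        else:
            # Intercept of the weighted line, i.e. the fitted value at u = 0.
            smoothed.append((swuu * swy - swu * swuy) / denom)
    return smoothed

# A straight line should pass through the smoother unchanged.
line = [2.0 * x + 1.0 for x in range(50)]
smoothed_line = lowess_smooth(line, span=30)
```

Because the fit is linear within each window, broad trends are preserved while isolated read-count spikes are damped.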

2.8 Reprocessed RNA-seq analysis in KO mice identified a new differentially expressed transcript in stomach

After identifying no potential targets in the vicinity of the distant-acting enhancer, we performed a further, focused RNA-seq analysis of the digestive tract tissues of the KO mice in comparison with healthy control mice. In small intestine tissues, no cis gene expression change was observed in KO compared to control. However, in stomach tissue, we noted a suggestive expression difference in each stomach pairwise comparison (Fig. 22).

Figure 22. Tissue- and strand-specific RNA expression around the deleted enhancer in stomach tissue. P refers to the post-natal day of the mouse. WT, wild-type; KO, chr17ΔICR/ΔICR mice. The location of the peak differences is indicated with a green arrow. The ΔV region deleted in the mice is indicated with a red box.

This differential expression pattern is located on the opposite strand from CCDC154 in stomach tissue. Looking further into that region and performing various BLAST searches, we identified a region right next to the deletion that has an extended open reading frame (ORF) of 900 bp (Fig. 23). This ORF lies within a newly reported transcript defined as LOC105371045 containing two exons: exon 1 is non-coding and exon 2 contains the ORF. This transcript was only reported in the latest updated version of the genome browser (GRCh38.p2). The coding region of LOC105371045 is located upstream of ΔV, the region in which the enhancer was found to be active. Thus, this coding region was not deleted in our KO mice lacking the enhancer.
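The notion of an extended ORF can be made concrete with a small scanner that looks for the longest ATG-to-stop stretch in the three reading frames of a strand; applying it to the reverse complement covers the opposite strand, which matters here because the signal lies on the strand opposite CCDC154. This is a hypothetical illustration on a toy sequence, not the BLAST-based procedure described above and not the LOC105371045 sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(sequence):
    """Return (start, end, length_bp) of the longest ATG...stop ORF on the
    given strand, scanning all three reading frames. `end` is exclusive and
    includes the stop codon; (0, 0, 0) means no ORF was found."""
    seq = sequence.upper()
    best = (0, 0, 0)
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                length = pos + 3 - start
                if length > best[2]:
                    best = (start, pos + 3, length)
                start = None
    return best

def reverse_complement(sequence):
    return sequence.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Toy sequence for illustration only.
toy = "CCATGAAATTTGGGTAACCATGCCCTAGG"
forward_hit = longest_orf(toy)
reverse_hit = longest_orf(reverse_complement(toy))
print(forward_hit, reverse_hit)
```

An ORF found this way is only a candidate; as the following paragraphs describe, transcript support and coding propensity are needed before treating it as a gene.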

Figure 23. Schematic view of the location of the LOC105371045 transcript within the significantly expressed RNA-seq peak in stomach biopsy, in relation to the boundaries of the three deletion alleles.

LOC105371045 is an uncharacterized transcript that was predicted based on RNA-seq evidence, with alignments that fully cover the model and 3 samples providing support for all introns. Based on these data, NCBI generated a predicted protein-coding model transcript and protein at this location because there is evidence of transcription and there is an open reading frame of reasonable size (266 aa) and coding propensity (an internal statistical calculation of how likely the ORF is to be coding, based on details like codon bias and other sequence composition characteristics). According to NCBI RNA-seq tracks, there is support for expression of this transcript in prostate and stomach (Fig. 24A). In addition, our RNA-seq data from the human biopsies of affected individuals and controls, compared with the available data from the Illumina Body Map, and from the chr17ΔICR/ΔICR mice showed this transcript to be highly expressed specifically in the stomach and small intestine, with higher expression seen in control (wild-type) biopsies, as expected (Fig. 24B,C).

In addition, we searched the Genotype-Tissue Expression (GTex) Project for any reported expression data on the LOC105371045 transcript. At that time, the GTex portal provided expression levels only for genes that were annotated in GENCODE v12 (based on NCBI37/hg19); therefore, the LOC105371045 transcript had not been computed and aligned. By contacting the developers of GTex and sending them the coordinates of the human LOC105371045 transcript, I was able to obtain full annotation and computed FPKM values, displaying the expression pattern across all tissues (Diego Garrido and Roderic Guigo, personal communication) (Fig. 25 A-B).

Figure 24. (A) Screenshot from NCBI (version GRCh38.p2) of the RNA-seq expression pattern of different tissues around LOC105371045 (marked with a box), showing higher expression levels in stomach and prostate. (B) RNA-seq data for LOC105371045 expression in the stomach and small intestine biopsies of the IDIS patient and controls and in the 16 tissues represented in the Illumina body map. Expression values are in FPKM. Aff=affected individual with IDIS, het=carrier of the ΔL deletion. (C) Summary of 6 biological replicates of RNA-seq data for LOC105371045 expression in two different areas of the stomach and small intestine biopsies of the Chr17ΔICR/ΔICR mice and controls at postnatal days 10 and 15.

Figure 25. LOC105371045 expression across GTex tissues. (A) Expression pattern across all tissues available in the GTex database, showing highest expression in brain, prostate and stomach. (B) Average FPKM values for the top expressing tissues as indicated in A. Expression in stomach is four times higher than in small intestine and pancreas.

2.9 LOC105371045 protein search

We used the blastp algorithm of BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins) to identify relationships between the ORF sequence and any known protein based on sequence similarity. The blastp search produced several results, with the highest amino acid similarity (88%) to the gene DAXX, encoding death-associated protein 6 (Fig. 26). This is a multifunctional protein that resides in multiple locations in the nucleus and in the cytoplasm. It interacts with a wide variety of proteins, such as apoptosis antigen Fas, centromere protein C, and transcription factor erythroblastosis virus E26 oncogene homolog 1. In the nucleus, the encoded protein functions as a potent transcription repressor that binds to sumoylated transcription factors. In addition, DAXX modulates the activity of PAX5, a transcription factor crucial for embryonic development.

Figure 26. Blastp search of amino acid sequence similarity between LOC105371045 and DAXX, showing 88% sequence similarity.

2.10 Deletion of the LOC105371045 ORF in mice leads to a less severe phenotype of IDIS

To evaluate whether LOC105371045 is the enhancer’s target gene and to examine whether deletion of the ORF sequence leads to the same in vivo phenotype observed in the enhancer KO mice, we removed an 815 bp mouse sequence orthologous to the human 804 bp ORF from the mouse genome using microinjection of a Cas9 and sgRNA RNA mixture into the cytoplasm of fertilized mouse eggs. When heterozygous chr17+/ΔORF mice were interbred, homozygous chr17ΔORF/ΔORF offspring were born at the expected Mendelian frequency. Starting within the first few days of life, chr17ΔORF/ΔORF mice displayed a mild reduction in size and in body weight (Fig. 27A) and decreased survival. Chr17ΔORF/ΔORF mice had a 10% death rate by weaning at 21 days of age, and by two months after birth, surviving chr17ΔORF/ΔORF mice showed a 20% reduction in weight compared to wild-type or heterozygous littermates. Examination of fecal pellets and internal organs revealed abnormal digestive tract function in some of the chr17ΔORF/ΔORF mice. The intestine from around the ileum to the rectum was swollen in some cases, but not as severely as seen in chr17ΔICR/ΔICR mice (Fig. 27B). The gut content was pale with undigested appearance in younger pups (p15) had normal colored feces and shape. Diarrhea phenotype was only seen in younger pups (p15). Overall, the observed phenotype in the chr17ΔORF/ΔORF mice was much less severe and less prominent than that observed in the mice with a homozygous deletion of the enhancer, chr17ΔICR/ΔICR (Fig. 28).

Figure 27. (A) ORF KO mice showing reduced size and body weight compared to controls (B) p10 guts. KO pups had solid feces in the rectum. The feces pellets in rectum of null pups were lighter in color than those in wt. Null intestines were slightly swollen in the area adjacent to caecum (red arrows).

Figure 28. Comparison of chr17ΔICR/ΔICR (bottom) versus chr17ΔORF/ΔORF (top) mice. Weight for male mice is indicated in grams from birth to weaning (~p20). Homozygous male mice for the enhancer KO had significantly lower growth rate than WT.

2.11 Cross lines of Enhancer KO and ORF KO showed the phenotype varies widely

After generating the chr17ΔORF/ΔORF mice, we performed enhancer KO het x ORF KO het breedings, creating compound heterozygous lines for the two deletions. 67 pups from 7 litters were genotyped: 17 pups (9 males, 8 females) were enhancer KO het and ORF KO het; 18 pups (9 males, 9 females) were enhancer WT and ORF WT; the remaining 32 pups were either enhancer het/ORF WT or enhancer WT/ORF het, consistent with the expected Mendelian ratio. We dissected all 17 enhancer KO het / ORF KO het pups, aged from p10 to p13. The overall phenotype varied widely among these pups: the body weight of enhancer het / ORF het pups ranged from a great reduction (~40% of WT littermates' weight) to normal size compared to WT littermates, and the observed diarrhea phenotype of enhancer het / ORF het pups was mild in the majority of cases, similar to what was seen for the ORF KO null pups. Feces in the rectum had a pellet shape, but were soft and yellow colored, while in wild-type the feces were dark brown and much harder. However, in a few cases, the compound heterozygous pups had a relatively severe diarrhea phenotype, with no pellet-shaped feces in the rectum. One case showed a swollen intestine to a degree as obvious as in enhancer KO null mice. In addition, the cecum was knotted up, a phenotype which was also seen for some of the ORF null pups, but not for enhancer null pups (Fig. 29).
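
The agreement of these genotype counts with Mendelian expectation can be checked with a chi-square goodness-of-fit test. The sketch below is illustrative only and is not an analysis reported in this work. It assumes that each parent carries exactly one of the two deletions, so that the three reported classes (double het, double WT, single het at either locus) are expected at 1:1:2. The function names are hypothetical.

```python
import math


def chi_square_gof(observed, expected_ratio):
    """Return the chi-square statistic and the expected counts for a given ratio."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected


def p_value_df2(stat):
    """Chi-square upper-tail probability; exact closed form for 2 degrees of freedom."""
    return math.exp(-stat / 2)


# Counts from the text: double het, double WT, single het (either locus).
observed = [17, 18, 32]
stat, expected = chi_square_gof(observed, [1, 1, 2])  # assumed 1:1:2 ratio
p_value = p_value_df2(stat)  # three classes -> 2 degrees of freedom
```

Under these assumptions the expected counts are 16.75, 16.75 and 33.5, and the resulting p-value is far above 0.05, in line with the statement that the observed counts match the expected ratio.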

Figure 29. Diarrhea phenotype of enhancer het / ORF het pups is mild in the majority of cases, as shown for B5371 (top). Feces in the rectum had a pellet shape, but were soft and yellow colored, while in wild-type the feces were dark brown and much harder (B5374, bottom). B5370 (middle) is one of the very few pups which had relatively severe diarrhea, with no pellet-shaped feces in the rectum. Diarrhea was squeezed out when the pup was cervically dislocated (arrow).

3. Trichohepatoenteric Syndrome (THES)

The following data have been published in Clinical Genetics143.

3.1 Clinical description

Patient 12.1 (Fig. 30A), a 4-year-old girl from a Middle-Eastern Arab consanguineous family with no gastrointestinal history, developed persistent secretory diarrhea at 18 days of age, which did not improve upon cessation of feedings, nor under a trial of Galactomin 19 formula (for suspicion of glucose-galactose malabsorption), and she has been completely dependent on TPN ever since. She had minor dysmorphic features but no obvious developmental delay (Fig. 30B). Despite mild elevation of liver transaminases, abdominal ultrasounds suggested no major hepatic phenotype. Duodenal biopsies excluded tufting enteropathy and microvillous inclusion disease. Brain MRI/MRS and metabolic screens, including urine reducing substances, blood and urine amino acid profiles, blood homocysteine levels and lipid profile, were normal. There were no cardiac, cutaneous, platelet or immune deficiency phenotypes; immunoglobulin levels were normal and the patient responded well to vaccinations. Sweat test and genetic testing excluded cystic fibrosis.

3.2 Exome-sequencing and mutation discovery

Due to consanguinity in the family we suspected recessive inheritance. Exome sequencing found a total of 590 rare homozygous variants (SNPs and InDels) with a MAF < 0.02 in databases for healthy controls. These variants were prioritized using VarElect82, which produced six candidate genes with association to diarrhea or to the gastrointestinal tract. The gene with the strongest phenotype implication was TTC37, a known gene for trichohepatoenteric syndrome, harboring a homozygous mutation, c.2282 A>G, encoding a missense change (p.Leu761Pro). This mutation had zero frequency in the scrutinized datasets, involving nearly 8000 exomes. Genotyping by Sanger sequencing confirmed this variant as truly homozygous in the proband and showed perfect segregation within the family (Fig. 30A,C). The c.2282 A>G variant was further genotyped in 100 healthy Arab-Muslims and no homozygotes or heterozygous carriers were found. The mutation was predicted to be possibly damaging according to several predictors including PolyPhen219 and SIFT20, and damaging according to MutationTaster21 and LRT81. Moreover, the mutation is located in an evolutionarily conserved region, based on a comparison of 46 vertebrates144. These results provide convincing support for identifying the patient as afflicted with THES.
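
The first filtering step described above (rare, homozygous variants below a MAF cutoff) can be sketched as follows. This is a minimal illustration, not the pipeline used in this study. The `Variant` record, its field names, the genotype encoding, and the toy entries `GENE_A` and `GENE_B` are hypothetical; only the 0.02 cutoff and the zero control frequency of the TTC37 variant come from the text.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    genotype: str  # "hom" or "het" (hypothetical encoding)
    maf: float     # minor allele frequency in control databases


def rare_homozygous(variants, maf_cutoff=0.02):
    """Keep homozygous variants whose control MAF is below the cutoff."""
    return [v for v in variants if v.genotype == "hom" and v.maf < maf_cutoff]


# Toy input: one variant per filter outcome.
toy_variants = [
    Variant("TTC37", "hom", 0.0),    # rare and homozygous: kept
    Variant("GENE_A", "het", 0.0),   # rare but heterozygous: dropped
    Variant("GENE_B", "hom", 0.15),  # homozygous but common: dropped
]
kept = rare_homozygous(toy_variants)
```

Phenotype-based prioritization (here done with VarElect) would then operate on the much shorter list that survives this filter.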

3.3 Clinical diagnosis of THES following exome-sequencing

For further confirmation of THES in the patient, microscopic examination of the hair was performed, showing hair shafts of varying sizes, some discontinuous with areas of thinning and breaks (trichorrhexis nodosa, Fig. 30D). THES is a heterogeneous disease with a widely varying spectrum of phenotypes. The most prevalent symptoms are persistent secretory diarrhea, hair abnormalities and facial dysmorphism, with a rather frequent appearance of immune deficiency. In retrospect, our patient appears to have a rather atypical form of THES. While presenting certain facial dysmorphism and mild hair phenotypes, she showed none of the reported hepatic symptoms such as hepatomegaly, chronic hepatitis, cirrhosis or progressive liver failure, and had no immune deficiency64. Further, none of the less frequent THES symptoms were observed, including cardiac, cutaneous and platelet abnormalities63. The hair abnormalities observed prior to the microscopic examination were minor and could have occurred due to severe malnutrition caused by the diarrhea. Such an atypical clinical presentation accounts for the fact that THES was not suspected in the present case. The definitive molecular diagnosis will allow prenatal testing and genetic counseling for the family in future pregnancies. It will also guide preventive treatment and therapeutic approaches, and focus care on essential medical procedures rather than unnecessary invasive examinations. This might include effectively addressing the potential future appearance of cardiac, liver and other progressive abnormalities associated with THES.

Figure 30. (A) Family pedigree. Affected girl (filled symbol) TTC37 genotype status for the c.2282 A> G variant is indicated by G/G for homozygote, G/A for heterozygote and A/A for a wild-type form. (B) The affected girl at age 4y, showing dysmorphic features compared to her healthy 4 year-old twin brother. (C) A> G missense variant in TTC37 in the affected girl (middle panel). Genotyping by Sanger sequencing validated homozygosity in the patient and showed perfect segregation within the family, confirming the WT form (bottom) and heterozygous (top). (D) Light microscopic examination of the hair of the affected girl with areas of thinning and breaks.

4. Capillary Leak Syndrome (CLS)

We identified the first familial case of CLS, an Israeli family with ten potentially affected individuals, showing suspected autosomal dominant inheritance with incomplete penetrance. The proband (patient 48, Fig. 31) is a 14-year-old boy presenting with severe and frequent CLS attacks from infancy, with a remarkable family history of 9 more affected relatives. The family is of Jewish Ashkenazi origin and the great-grandparents were first degree cousins. They had 10 children, 5 of whom died in infancy between the ages of 6 months and 1 year from sudden death of unknown cause. We suspect, through discussions with the family, that the deceased children from the second generation might have been affected with CLS and died from an acute attack that was not diagnosed. This can be neither confirmed nor ruled out, thus we refer to these cases as “affected individuals” in the pedigree. Within the third generation, there are three affected individuals: patients 16 and 26 died in adolescence from similar CLS attacks, and the only other CLS patient who is currently alive besides the proband is patient 21, the father’s cousin, who was affected in childhood and has recovered from the attacks as an adult. In addition, patient 47 is the proband’s sister, who passed away at 5 months of age from an acute CLS attack. For this girl, it was the first identified attack. This patient passed away more than 10 years prior to this study; however, we were able to extract DNA from a post-mortem biopsy and include her in the analysis. The proposed mode of inheritance according to this pedigree is autosomal dominant with incomplete penetrance, due to the large number of affected individuals in the second generation and also to the fact that affected children were born to healthy parents. All other monogenic modes of inheritance (recessive, X-linked) were ruled out.

Figure 31. CLS Family pedigree. Individuals in the first generation are first degree cousins. The TLN1 splice mutation genotypes are indicated in red; HET, mutation carriers, WT, wild-type. Proband is indicated by an arrow. Filled in black symbols represent CLS affected individuals. Grey symbols represent healthy carriers.

4.1 Clinical description

Patient 48. This 14-year-old boy was originally described in 2010 by his primary physicians145. He was diagnosed with CLS at the age of five months and has had many episodes since, some of which were very severe and necessitated full resuscitation. This case was clinically reported for the following reasons: first, for being the fourth case of CLS in children reported in the world; second, for his exceptional presentation, in which he exhibited substantial neurologic involvement with cerebellar edema and autonomic dysfunction; and last but most outstanding, for being the first case of familial CLS reported in the world. Since the 2010 report, the patient has experienced many episodes of CLS, with an increasing frequency since reaching puberty at the age of 13. Through the years 2009-2013 he experienced on average 2-3 episodes a year. Through 2014, when he was 13 years old, he had 6 severe episodes. The first clinical signs of an impending episode consist of mild abdominal pain, nausea and occasionally vomiting. On admission the patient presents with tachycardia of around 130/min, normal blood pressure values for age, and hematocrit at about 60% (between episodes he has a normal hematocrit of 40%). CBC, blood chemistry, renal function tests, liver enzymes, CRP and blood cultures are in the normal range. The patient is treated according to an internal protocol that has been assembled through the years by his physicians, which consists of 2-3 boluses of crystalloids followed by 1.5 maintenance of fluids, albumin 1 g/kg, and a continuous drip of aminophylline. Hydrocortisone and immunoglobulins are given in most episodes but not in all. Usually after a few hours hematocrit normalizes and the patient’s condition stabilizes. Each admission lasts on average 2-3 days. Between CLS episodes the patient is a perfectly healthy adolescent with high intellectual skills.

Patient 47. This is the sister of patient 48, who died at the age of 6 months after arriving at the emergency room in irreversible shock, which was subsequently suspected to be a CLS episode that started at home with vomiting.

Patients 16 and 21. The two other cousins are siblings, both diagnosed with CLS at the age of 9 years. Patient 16 died during his second episode of CLS at the age of 11 years. The other brother is alive and is currently healthy. He experienced three severe episodes of CLS, at the ages of 10, 12 and 15 years. In each episode, he was brought to the intensive care unit in shock, after the same prodrome of abdominal pain, nausea and vomiting. Blood analysis revealed hemoconcentration of 70%. He was treated successfully with intravenous fluids, steroids and inotropic support and recovered completely. Since the age of 15 years he has not had any further episodes.

The extended family history revealed 5 siblings of the grandfather from the father’s side who died as children, with no details on the cause of death. In addition, three more cousins of the patient’s father exhibited episodes of CLS. One of them is a female (patient 26, Fig. 31), suspected to have had a CLS episode of which she died at the age of 9 years.

4.2 TLN1 as a strong candidate for CLS

Whole-exome sequencing was carried out on the two available affected family members, the proband (patient 48) and his father's cousin (patient 21) (Fig. 31). Due to the expected autosomal dominant mode of inheritance with incomplete penetrance, analysis was aimed at identifying a deleterious heterozygous mutation inherited from the proband father’s side. Our analysis identified a total of 37 heterozygous rare variants (SNPs and InDels) shared between the two affected individuals, with a MAF < 0.01 in databases for healthy controls. Among these variants, only one heterozygous SNP showed perfect co-segregation within the family (Fig. 31): it was found in the two affected sequenced individuals (#48, #21), in the deceased sister for whom we obtained DNA from a post-mortem muscle biopsy (#47), and in all obligate carriers from the father’s side including the father (#38, #14, #10, #2), and was absent in the proband’s mother (#39). The variant is an A>G substitution at a splice donor site (+2) of intron 54 (c.7543+2A>G) of TLN1, encoding the talin1 protein. Talin1 is a large (2541 amino acids) cytoskeletal protein consisting of an N-terminal head domain and a 220 kDa C-terminal rod domain that contains multiple binding sites for F-actin. This protein is a component of adhesion complexes between the signaling pathways that regulate the affinity and avidity of integrins for extracellular matrix (ECM) proteins such as collagen and fibronectin, integrin recycling, and the structure and dynamics of the actin and microtubule networks146. Talin1 is reported to have a pivotal role in mediating adhesion-dependent shape changes, pericyte contractility and cellular stiffness147. Pericytes surround the capillary endothelium, and their interactions with endothelial cells control microvascular remodeling and capillary tonus by creating a mechanical force that regulates endothelial dynamics. Macromolecular focal adhesion complexes coordinate such dynamics and participate in force transduction. Among these focal adhesion molecules is talin1, which, via its integrin binding, provides a direct link between ECM proteins and the cytoskeleton (Fig. 32).
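
Returning to the co-segregation criterion used to single out the TLN1 variant, the check can be sketched as below. This is an illustrative sketch under stated assumptions, not the analysis code used in this study. The individual numbers and genotypes are those given in the text and in Fig. 31; the function name and data layout are hypothetical. Unaffected carriers are deliberately allowed, to reflect incomplete penetrance.

```python
# Genotypes for the TLN1 c.7543+2A>G variant, as stated in the text.
genotypes = {
    48: "HET", 21: "HET", 47: "HET",            # affected individuals with DNA
    38: "HET", 14: "HET", 10: "HET", 2: "HET",  # obligate carriers (father's side)
    39: "WT",                                   # proband's mother
}
affected = {48, 21, 47}
obligate_carriers = {38, 14, 10, 2}


def segregates_dominant(genotypes, affected, obligate_carriers):
    """True if every genotyped affected individual and obligate carrier is heterozygous.

    Healthy carriers do not break segregation (incomplete penetrance);
    individuals without a genotype are skipped.
    """
    must_carry = affected | obligate_carriers
    return all(genotypes[i] == "HET" for i in must_carry if i in genotypes)


tln1_segregates = segregates_dominant(genotypes, affected, obligate_carriers)
```

A candidate variant that is wild-type in any affected individual or obligate carrier fails this test, which is how the other shared rare variants would be excluded.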

Figure 32. Interactions between talin and its binding partners in focal adhesion assembly. Talin binds to PIPKIγ, which produces PIP2 that binds to talin, strengthening its interaction with integrin. PIP2-associated vinculin can transiently bind to activated Arp2/3 complex, which nucleates actin polymerization. The c.7543+2A>G splice mutation is indicated in a red arrow.

4.3 The splice site mutation affects the TLN1 transcripts

To investigate the potential aberrant transcripts in the affected individuals, total RNA was extracted from peripheral blood of patients and controls. cDNA was PCR amplified with specific primers for individually amplifying exons 53, 54 and 55. Since the variant was present in a heterozygous state in our patients, both the normal and the mutated allele were expected. PCR products derived from cDNA harboring the mutant c.7543+2 G-allele lacked exon 54 of the TLN1 gene (Fig. 33A). As a result of this skipping of exon 54, which comprises 63 bp, the encoded variant protein is predicted to have an in-frame deletion of 21 amino acids. In addition, the mutant allele showed retention of intron 54, comprising 111 bp (Fig. 33A). This aberrant transcript is expected to encode a protein with a shifted frame beginning at codon 2397 and a premature stop at codon 2402 (Fig. 33C). In addition, western blot analysis of TLN1, TLN2, Vinculin and C-terminal Talin in affected fibroblasts did not show any remarkable protein degradation (data not shown).
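
The in-frame prediction for exon 54 skipping follows from simple arithmetic on the exon length, sketched below. The function name is hypothetical and the sketch only considers segment length: it cannot detect stop codons introduced inside a retained sequence, so it is applied here to the exon-skipping event alone.

```python
def deletion_consequence(length_bp: int) -> str:
    """Classify removal of a coding segment by its length alone."""
    if length_bp <= 0:
        raise ValueError("segment length must be positive")
    if length_bp % 3 == 0:
        # A multiple of three removes whole codons and preserves the frame.
        return f"in-frame deletion of {length_bp // 3} amino acids"
    return "frameshift"


# Exon 54 of TLN1 comprises 63 bp (from the text).
exon54_skipping = deletion_consequence(63)
```

For the 63 bp exon this yields an in-frame loss of 21 residues, matching the prediction stated above.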

Figure 33. (A) PCR reaction of both skipping of exon 54 (primer pair “D”) and retention of intron 54 (primer pair “E”). M, marker; individuals are numbered as in pedigree. X- Total DNA segment; - No DNA. (B) Schematic of designed segments for PCR of cDNA indicating fragments length with and without the genetic aberrations. (C) Protein Attributes of TLN1. Domain structure and mutation according to UniProt and NCBI conserved domains. Top: conserved domains in the wild- type TLN1. Middle: exon skipping caused by the mutation creating a shorter version of the protein. Bottom: intron retention caused by the mutation. Orange indicates the modified protein sequence resulting from a frameshift (at position 2,379) leading to a premature stop codon (at position 2,402).

4.4 Transcriptome analysis on patients’ skin fibroblasts

To explore the molecular basis of the phenotype observed upon the TLN1 mutation, we examined possible changes in gene transcription that may reflect dysregulation of genes within the talin pathway, or an indirect downstream effect. We performed RNA sequencing on skin fibroblasts derived from the proband and his healthy mother. Overall, 135 genes were identified as downregulated in the patient, and 111 as upregulated (Table 3). Gene Ontology (GO) enrichment analysis using GeneAnalytics within the GeneCards suite82 revealed that the downregulated genes play important roles in diverse cellular processes. The most significantly enriched GO terms with the highest number of matched genes were cell adhesion and extracellular matrix organization, in which talin1 plays a pivotal role (Fig. 34). Interestingly, among the upregulated genes, the highest number of matched genes was also seen for cell adhesion and extracellular matrix organization.

Upregulated genes: AMPH, APOBEC3F, ARHGEF4, ASF1B, B4GALNT1, BDH1, BTRC, CADPS, CDA, CDC45, CDCA2, CDCP1, CENPI, CENPW, CKAP2L, CLSPN, CSMD2, DEPDC1B, DES, DIRAS3, DMC1, DNA2, DOK5, DUSP5, EFNB2, ENSG00000224677, ENSG00000226593, ENSG00000229230, ENSG00000231632, ENSG00000234160, ENSG00000244198, ENSG00000249319, EXO1, FAM111B, FAM64A, FANCD2, FBN2, FOXF1, GAL, GLE1, GLIPR1, HERC2P2, HMGB3P6, IGFBP5, IL32, ITGA2, KRBOX1, KRTAP2-3, LIN9, LMNB1, LOC389831, LY6K, LYSMD3, MCAM, MCM10, MCM4, MCM8, MLF1IP, MMP1, MMP3, NCAM1, NDNF, NME1-NME2, NOV, NOVA1, ORC1, PARP8, PAX8, PCDHGA12, PCOLCE2, PEG10, PITPNM3, PLXNC1, PODXL, POLE2, PPARG, PRKAR2B, PRKG2, PTGS2, PTPRN, RAD51, RAD54L, RGS4, RMI2, RPS28, S100A2, SEMA5A, SEPT7P3, SERPINB2, SHC3, SHCBP1, SLC38A5, SLC43A3, SLIT2, SNORA73B, SSTR1, STEAP1B, THBD, TMEFF1, TMSB4XP2, TMTC1, TNFRSF11B, TNFRSF21, TNIP3, TOX2, TRAIP, TRNP1, TRPC4, TSPAN13, UBE2T, WNK4, ZNF367

Downregulated genes: A2M, ABI3BP, ADAMTS7, ADCY4, AFF3, AHR, APOE, ARHGAP28, ASPN, C17orf103, C8orf44, CA5BP1, CASS4, CBLN3, CDKN2B, CHI3L1, CLEC14A, CLIC2, COL15A1, COMP, CPA4, CPZ, CRELD1, CRIP1, CRLF1, CTH, CXCR7, CYP4V2, DACT1, DEPTOR, DGCR6, DNALI1, DPT, ECM2, EFEMP1, ENSG00000212153, ENSG00000223908, ENSG00000225410, ENSG00000229927, ENSG00000230838, ENSG00000232495, ENSG00000248594, ENSG00000252531, ENSG00000258430, EPB41L4A-AS1, F10, FAM115B, FAM20A, FAM70B, FBXL2, FOXP2, FXYD1, GALNTL1, GDF15, GGT5, GXYLT2, HOXB6, HOXB7, HOXC4, HOXC6, HOXC8, HOXC9, HOXD3, HOXD-AS1, IFI30, IFIT1, IGFBP7, IL20RB, IL6, INHBE, JPH2, KCNE4, KDM5B-AS1, KIAA1462, KRT18, KRT7, KRT8, LBH, LOC100128252, LYNX1, MED28, MKX, MMP11, NEDD9, NEGR1, NLGN1, NREP, ODZ4, OLFML2A, OMD, P2RX6, PALM, PCDHGA7, PCDHGB6, PDPN, PEAR1, PIM1, PIR, PLA2G16, PLAC9, PMS2P4, PPL, PRR5L, PRSS23, PTGDS, RN7SK, RPS4XP13, SAMD14, SCRG1, SEMA3C, SFRP1, SLC12A8, SLC6A6, SLIT3, SNHG5, SNORD3A, SPON1, SRGN, STAG3L1, SULF1, SULF2, SYTL2, THNSL2, TMEM45A, TMEM9B-AS1, TNFAIP6, TNXB, TPD52L1, TRIB2, TSSC2, UNC5B, VIT, WNT2, ZBTB43, ZNF888

Table 3. All downregulated and upregulated genes seen in transcriptome analysis of patients' skin fibroblasts

Figure 34. GO biological processes for all downregulated and upregulated genes in the fibroblasts’ transcriptome analysis from the patient and healthy control. Ordered according to score from highest to low. All genes were submitted into “GeneAnalytics”. Matched genes indicated genes within the given list, total genes is total number of genes in this entity.

4.5 Proteome analysis on patients’ skin fibroblasts

We attempted to identify proteins with relevance to CLS pathogenesis due to the TLN1 splice mutation, and performed a proteomics comparison of the affected and control skin fibroblasts. Data analysis enabled characterization of 85 differentially expressed proteins that were statistically significant (false discovery rate (FDR) = 1%). This list of proteins was entered into VarElect using keywords corresponding to the CLS phenotype. In addition, we looked for a connection between talin1 function and pathway and these differentially expressed proteins (Table 4). All terms used gave either direct or indirect results, indicating a strong connection to the CLS phenotypic description and to the talin mechanism. We note that all proteins that were found to be directly related to a phenotype known to be malfunctioning in CLS were those that were downregulated in the patient.

Table 4. Keywords used in VarElect to detect both direct and indirect connection with the top down and upregulated proteins in transcriptomics analysis using the patient’s skin fibroblasts.

4.6 Intercellular junctions integrity is impaired in TLN1 hemizygous endothelial cells

TLN1 has previously been shown to be involved in focal adhesion. To provide more direct support for a role of this gene in paracellular transport in endothelial cells, we examined mouse endothelial cells from a mouse strain with heterozygously deleted TLN1, generated in our collaborators’ laboratory148. We visualized VE-cadherin, a vascular endothelium-specific transmembrane protein, whose calcium-mediated homotypic interactions between adjacent cells are essential for endothelial barrier function149; 150. The cells showed significant attenuation of the localization of VE-cadherin in adherens junctions, discontinuous cell-cell junctions and disrupted transport-mediating paracellular junctions (Fig. 35). We further observed an increase in cytoplasmic VE-cadherin, as it is presumably removed from the adherens junctions and internalized (Fig. 35C).

Figure 35. (A-B) Immunostaining with VE-cadherin antibody (green) in normal and talin1 hemizygous mouse endothelial cells. Colour-scaled VE-Cad shows the length of continuous structures of VE-cadherin adherens junctions. Red: big adherent areas; blue: small structures of disrupted or internalized staining. (C) Increased cytoplasmic VE-cadherin in talin1 het endothelial cells.

4.7 Seeking other TLN1-related phenotypes in the CLS patient

4.7.1 Integrity of focal adhesion assembly

Talin was originally discovered as a component of focal adhesions in cells in culture and is required for both cell spreading and focal adhesion assembly146; 151. To investigate the effect of the splice site mutation in TLN1 on this mechanism, skin fibroblasts were extracted from the proband and his healthy mother. Fibroblasts were fixed and stained for the focal adhesion marker Talin1 with and without manganese (Mn2+), which enhances cell spreading and adhesion assembly through direct activation of integrins152 (Fig. 36A), for Talin2 (Fig. 36B), and for Vinculin and beta-integrin with and without Mn2+ (Fig. 37). Results showed that the localization of Talin1 was unaffected in both wild-type and affected cells, and that integrin activation is intact. In addition, the cells were well spread and did not differ in focal adhesion number and size in the presence or absence of 1 mM Mn2+ for 1 hour.

Figure 36. Quantitative analysis of Focal adhesions. Skin fibroblasts from the patient and a healthy control relative were plated on gelatin, fibronectin and collagen coated coverslips and grown to semi confluence. (A) Cells were fixed with MEOH and subjected to immunofluorescence staining for Talin 1 (green) to visualize focal adhesions. DNA content was stained with dapi (blue). Images were taken using 40x objective / Nikon microscope. Mn, manganese. (B) Cells were fixed with MEOH and subjected to immunofluorescence staining for Talin 2 (red) to visualize focal adhesions. DNA content was stained with dapi (blue). Scale bar 30um. (C) Images analyzed with IMARIS software show the area and the number of focal adhesion as well as the surface of each cell. Quantification of focal adhesions was performed for every cell separately. Scale bar 30um.

Figure 37. Quantitative analysis of Focal adhesions with Vinculin and active beta-integrin staining. Immunostaining of human fibroblasts isolated from a CLS patient and a control donor, with or without manganese (Mn2+) stimulation, for beta-integrin (Active b1; green) and Vinculin (red) to visualize focal adhesion sites. DNA content was stained with dapi (blue). Images analyzed with IMARIS software show the adhesion area and the number of focal adhesion sites (F.As) per cell. N=3, student’s t-test, with *p < 0.05, **p < 0.01, ***p < 0.001.

4.7.2 TLN1 splice site mutation does not affect platelets function

Previous studies have shown that disruption of TLN1 in knockout mice resulted in embryonic lethality around gastrulation153. However, it has been reported that platelets from talin1-deficient mice are defective in retracting a thrombin-induced clot, since platelet retraction of fibrin clots requires integrins to be connected to the actin cytoskeleton154. One important role of platelets is the plugging of capillary leaks that may otherwise become an entry point for germs, an action that could also prevent blood loss. Although the proband did not show any clinical sign of aberrant clotting, we note that this does not fully account for the integrity of platelet aggregation. To examine whether our TLN1 splice site mutation affects the patient’s normal platelet activity, we performed ex vivo platelet adhesion studies. Both wild-type and TLN1 deficient platelets were of normal size and showed normal aggregation to the surface (Fig. 38).

Figure 38. Platelets aggregation assay. Testing both the affected (patient) and a normal blood sample (control) results in aggregates formation on the well surface. SC; surface coverage. AS; average size.

4.7.3 CLS episodic serum induces endothelial permeability in both wild-type and hemizygous Talin1 endothelial cells

The main hypothesis regarding the disease mechanism is the impairment of endothelial function leading to hyperpermeability. Previous studies have shown that CLS serum increases the microvascular endothelial barrier dysfunction and permeability74. To test whether CLS serum from our patient leads to a similar phenotype, we applied serum obtained from patient 48 during an acute attack and at a quiescent (basal) period to confluent mouse talin1 hemizygous endothelial cells and to endothelial cells derived from WT mice, and stained for VE-cadherin. We observed that when applying both episodic serum and its basal counterpart to normal endothelial cells, the junctional localization of VE-cadherin is attenuated, leading to the disruption of endothelial adherens junctions. These data strongly suggest that specific factor(s) present in the serum of the CLS patient provoke vascular leak symptoms by eliciting endothelial hyper-permeability. Similar to this observation, we show that talin1 hemizygous endothelial cells also have disrupted endothelial adherens junctions under no stimulation, and that for these cells, further application of basal or episodic serum does not change the original effect caused by the mutation (Fig. 39), indicating that the talin1 hemizygous mutation affects the endothelial junctions in a very similar manner as does the serum from the patient.

Figure 39. Immunostaining with VE-cadherin antibody (green) in normal and talin1 hemizygous mouse endothelial cells. Annotation is the same as in Figure 35. Serum from a CLS patient during an attack elicits permeability of endothelial monolayers.

4.7.4 Elevated immune related proteins in CLS serum

Previous studies have shown that serum obtained from a subject during an acute CLS attack induced endothelial barrier dysfunction and increased permeability when applied in vitro74. The authors suggested that this is caused by soluble inflammatory factors present in CLS serum during an attack, such as vascular endothelial growth factor (VEGF) and angiopoietin 2 (Ang2), which may increase the susceptibility of endothelial cells to vascular leakage. To assess the role of inflammation in acute CLS attacks and to discover new mediators that may contribute to understanding the disease mechanism, we used mass spectrometry to analyze serum proteins in samples obtained from the proband during an acute attack (episodic) and during a quiescent period (basal), compared to serum obtained from his healthy brother (Fig. 31, patient 49). After depletion of the 6 most abundant proteins, we identified 198 proteins that passed the analysis criteria. Overall, the patient’s episodic serum showed significant elevation of several immune related factors when compared to both the basal and the control samples (Fig. 40). We focused on proteins with a fold change of >2 and examined their biological relation. Gene ontology (GO) enrichment analysis showed that most of these proteins were involved in complement and coagulation cascades. Keratin 8, Type II (KRT8), one of the most highly elevated proteins in episodic serum, is a component of the intermediate-filament cytoskeleton of simple epithelia, and increased levels have been associated with inflammation, fibrosis and edema155. The second most elevated protein in episodic serum is fibronectin 1 (FINC), a glycoprotein involved in cell adhesion, cell motility, opsonization, wound healing, and maintenance of cell shape. In contrast, no significant elevation was observed when comparing the patient’s basal serum to the healthy control. The highest fold change observed between these two samples was 1.7, for Immunoglobulin D (IGHD) and Paraoxonase 1 (PON1).
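The fold-change selection described above can be illustrated with a minimal sketch. It assumes that each sample is reduced to a mapping from protein name to a single intensity value; the function names and all intensity values below are invented for illustration and are not the measured data. Only the protein names and the threshold of 2 come from the text.

```python
def fold_changes(case, reference):
    """Return {protein: case / reference} for proteins quantified in both samples."""
    return {
        protein: case[protein] / reference[protein]
        for protein in case
        if protein in reference and reference[protein] > 0
    }


def elevated(case, reference, threshold=2.0):
    """Proteins whose fold change exceeds the threshold, highest first."""
    fc = fold_changes(case, reference)
    hits = [protein for protein, value in fc.items() if value > threshold]
    return sorted(hits, key=lambda protein: fc[protein], reverse=True)


# Hypothetical intensities (arbitrary units), NOT the measured values.
control = {"KRT8": 100.0, "FINC": 110.0, "PON1": 100.0, "IGHD": 50.0}
basal = {"KRT8": 100.0, "FINC": 120.0, "PON1": 170.0, "IGHD": 85.0}
episodic = {"KRT8": 900.0, "FINC": 600.0, "PON1": 180.0, "IGHD": 90.0}

# Proteins elevated in episodic serum relative to BOTH comparison samples.
vs_basal = elevated(episodic, basal)
vs_control = elevated(episodic, control)
elevated_in_both = [p for p in vs_basal if p in vs_control]

# Basal serum against the healthy control: nothing passes the threshold.
basal_hits = elevated(basal, control)
```

In this toy data set the basal-versus-control comparison peaks at a fold change of 1.7, so no protein passes the threshold, mirroring the pattern reported in the text.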

Figure 40. Intensities of abundant proteins in the patient’s serum at basal level and during an episode, and in control serum. All proteins that passed the fold-change threshold of 2 are displayed.

5. Additional genome studies

5.1 Trios project

We performed whole-exome sequencing on 57 patients along with their unaffected parents (parent-child trios). The patients were all young children with congenital anomalies and/or intellectual disabilities due to unexplained conditions presumed to be genetic. Importantly, we did not seek patients with similar phenotypes: patients were not selected for genetic tractability or phenotypic homogeneity, so that the cohort is representative of the undiagnosed genetic conditions seen in a typical genetics clinic. The overall goal was to provide a diagnosis based on mutations found in already known genes, but also to provide pointers toward novel disease genes. The trios presented here were largely published within a larger project of 119 trios analyzed both at the Weizmann Institute and at Duke University, as part of the multi-institutional Duke-Weizmann-Sheba collaboration156.

On average, 93.19% of the consensus coding sequence (CCDS, release 14) was covered at a depth of at least 10X. We identified an average of 15 candidate variants per trio that passed the filtering process, comprising on average 0.69 de novo, 1.29 hemizygous (X-linked), 5.73 homozygous, and 7.44 compound heterozygous variants (Table 5). This number is larger than expected from whole-exome trio analysis, but is consistent with the high rate of consanguinity among the trios in our study (~20% of the families were consanguineous).

Measure | Trios (n=57)
Percentage of CCDS r14 covered (%) | 93.19
De novo mutations | 0.69
X-linked genotypes | 1.29
Homozygous mutations | 5.73
Compound heterozygous mutations | 7.44
Total | 15.15

Table 5. Average numbers of qualifying genotypes identified in the analyzed trios. Synonymous variant effects have been excluded from these counts.

Using exome sequencing analysis followed by Sanger validation, I found the disease-causing gene in 26 patients (46% success rate) (Table 6). For 19 of these patients a full diagnosis was established, and for 7, further functional studies are currently under way to support the results. Of the 26 resolved cases, 17 involved known disease genes and 9 involved new genes, the latter having an existing OMIM or PubMed disease association but a less consistent clinical phenotypic overlap. Overall, four cases (15%) were due to a de novo mutation, twelve (46%) to a homozygous mutation, three (12%) to a hemizygous (X-linked) genotype, and seven (27%) to compound heterozygous mutations. The percentage of cases diagnosed with recessive conditions is higher than that reported in previous studies157, presumably because of the higher level of consanguinity in our cohort compared with the populations in most other published WES diagnostic studies. In three cases the genetic diagnosis led to an immediate change in management. For one of these patients, the genetic diagnosis informed a specific pharmacotherapy: the patient of trio 53 (Table 6), who has a de novo missense mutation in KCNQ2, was prescribed Retigabine158, which helped reduce seizure frequency despite the lack of an observable positive effect on development. In two additional patients (trio 91 and trio 94), the genetic diagnoses led to specific dietary interventions that significantly improved the patients’ metabolic conditions.
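The proportions quoted in this paragraph follow directly from the case counts, as the short calculation below shows. It is only an arithmetic check; percentages are rounded to the nearest whole number, so they can differ by one point from truncated figures. The variable and function names are illustrative.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to the nearest integer."""
    return round(100 * part / whole)


total_trios = 57

# Resolved cases by mode of inheritance, as counted in the text and in Table 6.
resolved_by_mode = {
    "de novo": 4,
    "homozygous": 12,
    "hemizygous (X-linked)": 3,
    "compound heterozygous": 7,
}

resolved = sum(resolved_by_mode.values())            # 26 resolved trios
diagnostic_yield = percent(resolved, total_trios)    # share of trios resolved
unresolved = percent(total_trios - resolved, total_trios)
mode_share = {mode: percent(n, resolved) for mode, n in resolved_by_mode.items()}

for mode, share in mode_share.items():
    print(f"{mode}: {resolved_by_mode[mode]} of {resolved} resolved cases ({share}%)")
print(f"resolved: {resolved} of {total_trios} trios ({diagnostic_yield}%)")
```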

Trio | Disease | Gene | Mode of inheritance for the syndrome | Known gene/New gene | Outcome
45 | epilepsy with MR and hemolytic anemia | GPI | recessive | Known gene | diagnosis determined
50 | epileptic encephalopathy | PIGN | compound heterozygote | Known gene | diagnosis determined
53 | MR, macrocephaly, epileptic encephalopathy | KCNQ2 | de novo | Known gene | diagnosis determined
63 | apnea and Leigh disease | NDUFAF2 | recessive | Known gene | diagnosis determined
68 | malignant migratory epilepsy | WDR81 | compound heterozygote | New gene | diagnosis determined
72 | microcephaly with liver failure | SLC1A4 | recessive | New gene | diagnosis determined
74 | epileptic encephalopathy | PRODH | recessive | Known gene | diagnosis determined
78 | microcephaly and spasm | SLC1A4 | compound heterozygote | New gene | diagnosis determined
86 | auditory neuropathy, progressive cerebellar ataxia | AIFM1 | X-linked | Known gene | diagnosis determined
89 | microcephaly with non-epileptic spasm | FOXG1 | de novo | Known gene | diagnosis determined
91 | Leigh-like | PDHA1 | X-linked | Known gene | diagnosis determined
94 | basal ganglia necrosis | GCDH | compound heterozygote | Known gene | diagnosis determined
96 | RP epilepsy | CLN6 | recessive | Known gene | diagnosis determined
97 | MR, autistic features, Kabuki-like | DYNC1H1 | de novo | Known gene | diagnosis determined
98 | pontocerebellar hypoplasia and cataract | KIF11 | recessive | Known gene | diagnosis determined
100 | purine/pyrimidine disorder | DPYD | recessive | Known gene | diagnosis determined
101 | skeletal deformities, dysmorphism | NOTCH2 | de novo | Known gene | diagnosis determined
107 | hereditary spastic paraplegia | TPI1 | recessive | Known gene | diagnosis determined
71 | dystonia, MR and autonomic dysfunction | TECPR2 | recessive | Known gene | diagnosis determined
44 | Melkersson-Rosenthal syndrome | SLC17A1 | recessive | New gene | in functional assays
59 | carnitine deficiency | SLC9C2 | compound heterozygote | New gene | in functional assays
67 | microcephaly with MR | GPM6B, ATPA4 | X-linked | Known gene | in functional assays
80 | microcephaly with brain atrophy | TRAPPC9 | recessive | New gene | in functional assays
82 | congenital diarrhea | KCTD10 | recessive | New gene | in functional assays
87 | Rett-like with a simplified gyral pattern | CASC5 | compound heterozygote | New gene | in functional assays
103 | Cockayne syndrome-like | ALMS1 | compound heterozygote | New gene | in functional assays
47 | Pitt-Hopkins-like | - | - | - | No good candidates
48 | juvenile parkinson and GH deficiency | - | - | - | No good candidates
49 | progressive cerebelocerebellar atrophy | - | - | - | No good candidates
51 | Leigh disease | - | - | - | No good candidates
54 | thrombocytopenia, anemia | - | - | - | No good candidates
55 | MR/cerebellar malformations | - | - | - | No good candidates
56 | pontocerebellar hypoplasia | - | - | - | No good candidates
58 | rhabdomyolysis | - | - | - | No good candidates
62 | Hallermann-Streiff syndrome | - | - | - | No good candidates
69 | severe MR and epilepsy | - | - | - | No good candidates
52 | leukoencephalopathy | - | - | - | No good candidates
60 | autonomic neuropathy, FTT | - | - | - | No good candidates
61 | O-linked disorder | - | - | - | No good candidates
73 | cerebellar ataxia | - | - | - | No good candidates
75 | early epileptic encephalopathy | - | - | - | No good candidates
76 | myoclonic epileptic encephalopathy | - | - | - | No good candidates
77 | hypsarrhythmia, hypomyelination | - | - | - | No good candidates
79 | infantile neuroaxonal dystrophy (INAD) + deafness | - | - | - | No good candidates
81 | cyanotic heart disease | - | - | - | No good candidates
83 | congenital adrenal hyperplasia-like | - | - | - | No good candidates
84 | cardiomyopathy | - | - | - | No good candidates
85 | normocephalic leukoencephalopathy with subcortical cysts | - | - | - | No good candidates
88 | microcephaly, MR, chorea | - | - | - | No good candidates
90 | movement disorder, MR | - | - | - | No good candidates
92 | Rett-like | - | - | - | No good candidates
93 | AT-like, cerebellar ataxia | - | - | - | No good candidates
95 | myoclonus, cerebellar atrophy | - | - | - | No good candidates
99 | Troyer-like | - | - | - | No good candidates
102 | ketotic hypoglycemia | - | - | - | No good candidates
104 | early onset strokes and epilepsy | - | - | - | No good candidates
106 | hereditary spastic paraplegia and ataxia | - | - | - | No good candidates

Table 6. List of all 57 trios that were sequenced, and the findings for each.

It is important to note the difference between perfect controls (ethnically matched and screened for personal and family history of any relevant illness) and controls of convenience, such as those provided by the Exome Variant Server (EVS). The number of candidate mutations in our cohort was higher than that previously published, and this reflects the benefits of having an ethnically matched control population for comparison. The EVS includes subjects selected for specific diseases such as early-onset heart disease and stroke, as well as for extreme phenotypes such as very high or low cholesterol. This must be borne in mind when using controls of convenience for screening out candidate variants. One illustrative example is the patient in trio 100, who has a homozygous DPYD genotype (c.1905+1G>A) that has been reported to cause dihydropyrimidine dehydrogenase (DPD) deficiency (OMIM 274270), an autosomal recessive disorder of pyrimidine metabolism159. Our patient had failure to thrive, global developmental delay, and high urine uracil levels, all consistent with DPD deficiency. However, we also found the same homozygous genotype in 1 of 3,027 internal controls not known to have DPD deficiency. Examination of the literature shows that DPD deficiency is characterized by a highly variable phenotype, and that some individuals with known pathogenic genotypes can be asymptomatic160. This fact, together with the observation that the mutation has already been reported among unrelated patients with DPD deficiency, strongly supports the pathogenic nature of the genotype in our patient despite its occurrence in a control.

It is also important to bear in mind quality control differences when using controls sequenced elsewhere. For example, the patient of trio 71 (of Ashkenazi origin) is homozygous for a frameshift mutation in TECPR2, the gene I originally identified as causing SPG49 (see HSP chapter). This patient had manifestations overlapping with SPG49, including severe hypotonia, gastroesophageal reflux disease, areflexia, intellectual disability, and breathing abnormalities; however, no candidate genes emerged during the initial analysis. Given the strong clinical evidence for SPG49, the patient’s treating clinician requested that TECPR2 be screened more liberally than the qualifying genotype criteria allow. As a result, I identified a homozygous variant (p.Leu440ArgfsTer19) that had not been previously reported in SPG49 patients. In the EVS control database, the same frameshift variant was found in the homozygous state in one subject (a European American). Because the EVS European-American genotypic distribution deviated from Hardy–Weinberg equilibrium (A1A1 = 1/A1R = 5/RR = 3,861, P = 0.0027)161, I asked the Exome Sequencing Project directly about this homozygous genotype and was informed that the InDel genotype was likely heterozygous and had been mistakenly called as homozygous because of the low EVS sequencing coverage at this locus (Qian Yi, personal communication). The EVS homozygous call was therefore regarded as a false positive. This case argues for careful evaluation of putatively pathogenic variants based on all available lines of evidence.
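The Hardy–Weinberg deviation quoted above can be checked with a short calculation. The text does not state which test produced the P value; the sketch below assumes a standard exact test that conditions on the observed allele counts and sums the probabilities of all genotype configurations that are no more likely than the observed one. The function and argument names are illustrative.

```python
import math


def hwe_exact_p(n_hom_alt, n_het, n_hom_ref):
    """Exact Hardy-Weinberg test P value, conditioning on the allele counts."""
    n = n_hom_alt + n_het + n_hom_ref
    n_alt = 2 * n_hom_alt + n_het      # copies of the variant allele
    n_ref = 2 * n - n_alt              # copies of the reference allele

    def log_weight(het):
        # log of 2^het / (hom_alt! * het! * hom_ref!); factors shared by all
        # configurations are dropped because they cancel after normalisation.
        hom_alt = (n_alt - het) // 2
        hom_ref = (n_ref - het) // 2
        return (het * math.log(2.0)
                - math.lgamma(hom_alt + 1)
                - math.lgamma(het + 1)
                - math.lgamma(hom_ref + 1))

    # The heterozygote count must have the same parity as the allele count.
    het_counts = range(n_alt % 2, min(n_alt, n_ref) + 1, 2)
    logs = {het: log_weight(het) for het in het_counts}
    peak = max(logs.values())
    weights = {het: math.exp(value - peak) for het, value in logs.items()}
    total = sum(weights.values())
    observed = weights[n_het]
    # Sum every configuration that is no more probable than the observed one.
    return sum(w for w in weights.values() if w <= observed * (1 + 1e-9)) / total


# Genotype counts quoted in the text: A1A1 = 1, A1R = 5, RR = 3,861.
p_value = hwe_exact_p(n_hom_alt=1, n_het=5, n_hom_ref=3861)
print(f"P = {p_value:.4f}")
```

With seven variant alleles among 3,867 individuals, essentially all of the probability lies on the configuration with seven heterozygotes and no homozygote, which is why a single homozygous call is enough to produce a significant deviation.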

We note that among the sequenced trios, we were not able to identify the disease-causing gene in 31 cases (54%). These trios had either no good candidate genes, or no candidate genes whose variants segregated with the disease within the family.

5.2 Chromosomal translocation

A healthy 29-year-old woman carrying a female fetus underwent amniocentesis at week 23+1. The results indicated that the fetus has a 46,XX karyotype with an apparently balanced translocation between chromosomes 4 and 15 (46,XX,t(4;15)(q22;q21)) (Fig. 41). A more detailed structural variation analysis with a chromosomal microarray (CMA) did not reveal any copy number variations at the translocation site, down to the resolution of this method. Karyotyping of the parents showed that the translocation is not present in either of them, suggesting a de novo event. Based on an empirical estimate of a 7% probability of severe pathology in de novo balanced translocations162, the couple was advised to terminate the pregnancy.

Figure 41. Fetal karyotype from amniocentesis showing the 4;15 translocation. The affected chromosomes are circled in red.

The methods used up to this point do not answer two important questions: (a) Are there CNVs smaller than those detectable by the CMA method? (b) What is the exact translocation breakpoint, and what could be its potential impact on gene integrity? Upon the couple’s further decision, the fetus’s DNA was sent to the Israel National Center for Personalized Medicine at the Weizmann Institute for whole genome sequencing, and to the Center for Human Genome Variation at Duke University for structural variant calling. The breakpoints were mapped to chr4:98340534 and chr15:56439149 (hg19) (Fig. 42). The disrupted genes were STPG2 on chr4 and RFX7 on chr15. RFX7 is a transcription factor that belongs to the regulatory factor X gene family. This gene family comprises highly divergent regulatory factors that function in a diverse spectrum of systems, including the control of the immune response in mammals, brain development and more. The RFX7 gene was discovered by computational methods in 2008 and its specific function is still largely unknown, except for a single study reporting an association with chronic lymphocytic leukemia risk. STPG2 encodes a largely uncharacterized protein. A recent study associated this gene with azoospermia and mild dysmorphia. While STPG2 has 3 isoforms, only one has a complete coding sequence. Our analysis suggests that this isoform, which is expressed in testes, is not disrupted by the translocation. The interval is annotated to harbor additional transcripts coded on the opposite strand, but there is no experimental evidence addressing their function.

Figure 42. The breakpoints on chromosome 15 (left) and chromosome 4 (right)

Overall, these findings indicated that the translocation affected two genes whose functions are largely unknown. Their level of pathogenicity could not be fully assessed, but all available tools indicated that the effect is likely to be benign. Additional follow-ups during the pregnancy showed no abnormalities in the fetus, and the pregnancy was carried to term. The woman gave birth at week 38 to a healthy girl.

5.3 Intellectual disability with microcephaly

The following data have been published in Clinical Genetics163.

Two unrelated patients, presenting with significant global developmental delay, severe progressive microcephaly, seizures, spasticity and thin corpus callosum (CC) underwent trio whole-exome sequencing.

The first patient (Fig. 43a, II:3) is a 4.5-year-old girl of Ashkenazi-Iraqi Jewish origin with no consanguinity and no known genetic diseases in the family. MRI demonstrated a thin corpus callosum, delayed myelination and cerebral atrophy. She currently exhibits daily myoclonic seizures, irritability and hyperactivity, a sleep disorder, reduced purposeful hand movements and severe global developmental delay. The second patient (Fig. 43b, II:5) is a 6-year-old girl of Ashkenazi Jewish origin with no consanguinity, suffering from global developmental delay and febrile seizures. MRI also demonstrated mild cerebral atrophy and a thin corpus callosum, with normal spectroscopy.

Figure 43. Pedigree and segregation of the mutations in both families. (a) Patient II:3 in family 1 is compound heterozygous for the missense c.766G>A, p.(E256K) mutation and the frameshift c.945delTT, p.(Leu315Hisfs*42) mutation. (b) Patient II:5 in family 2 is homozygous for the c.766G>A, p.(E256K) mutation. We were unable to obtain DNA from the deceased patient in family 2.

Trio exome sequencing of the first patient yielded 2 de novo variants, 1 newly homozygous variant and 15 compound heterozygous variants that passed the filtering process. However, none of these had previously been reported in the context of the patient’s phenotype. Trio exome sequencing of the second patient yielded 3 de novo variants, 1 newly homozygous variant and 11 compound heterozygous variants; again, none had been reported in the context of the patient’s phenotype. Intersecting the data from both patients revealed only one gene, SLC1A4, in which both patients had pathogenic variants. The second patient, of Ashkenazi origin, was homozygous for a missense c.766G>A, p.(E256K) mutation (NM_003038.4), whereas the first patient, of Ashkenazi-Iraqi origin, was compound heterozygous for the same missense mutation and a novel frameshift c.945delTT, p.(Leu315Hisfs*42) mutation (NM_003038.4). Both mutations were validated using Sanger sequencing and demonstrated perfect segregation within the families (Fig. 43a,b). The E256K mutation was predicted to be damaging and is located in a region that is highly sensitive to amino acid changes (Fig. 44A). In addition, multiple sequence alignment demonstrated that E256 is evolutionarily conserved (Fig. 44B).
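The cross-patient comparison that singled out SLC1A4 amounts to intersecting the sets of genes carrying qualifying variants in each proband. A minimal sketch follows; apart from SLC1A4, the gene names are hypothetical placeholders, and the function name is illustrative rather than part of the actual analysis pipeline.

```python
def shared_candidate_genes(*gene_sets):
    """Genes that carry qualifying variants in every one of the given patients."""
    sets = [set(genes) for genes in gene_sets]
    return set.intersection(*sets) if sets else set()


# Placeholder candidate lists; only SLC1A4 is taken from the text.
patient_1_candidates = {"SLC1A4", "GENE_A", "GENE_B", "GENE_C"}
patient_2_candidates = {"GENE_D", "SLC1A4", "GENE_E"}

shared = shared_candidate_genes(patient_1_candidates, patient_2_candidates)
print(sorted(shared))
```

The same function extends to more than two unrelated probands with an overlapping phenotype, where each added patient can only shrink the shared set.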

Figure 44. (A) Global analysis of the effect of mutations in ASCT1 using the SNAP2 program on the PredictProtein site. For each position, the effect of all possible substitutions is shown. The E256K mutation (marked by a blue square) is classified as deleterious. In addition, the entire domain shows relatively high sensitivity to mutations, as indicated by the reddish colors. (B) CLUSTAL 2.1 multiple sequence alignment demonstrating that the glutamate at position 256 is conserved in all species. ‘*’ (asterisk) indicates positions which have a single fully conserved residue. ‘:’ (colon) indicates conservation between groups of strongly similar properties, scoring >0.5 in the Gonnet PAM250 matrix. ‘.’ (period) indicates conservation between groups of weakly similar properties, scoring ≤0.5 in the Gonnet PAM250 matrix.

The E256K mutation is extremely rare in the general population, with a MAF of 0.0001 among European Americans and zero in our in-house exome database of sequenced Israeli individuals. In addition, no homozygous individuals were found. The p.Leu315Hisfs*42 mutation was completely absent from all available control datasets. Among 100 Ashkenazi Jewish controls, we found one subject heterozygous for the E256K mutation, while among 100 healthy Iraqi Jewish controls no carriers of the p.Leu315Hisfs*42 mutation were detected. Further, we constructed a structural model of the human ASCT1 protein using Swiss-model164 in order to study the structural and functional effects of the mutations. The predicted structure (Fig. 45a) demonstrated that the p.Leu315Hisfs*42 mutation causes modification and truncation of a central part of the polypeptide (Fig. 45b). The E256K mutation, in which Glu256 is substituted by lysine, is predicted to be located within a long membrane helix, close to the cytoplasmic part of the transporter. Assuming that the protein is a homotrimer, as are many homologous transporters, Glu256 is also expected to be positioned at the interface with the other subunits (Fig. 45c,d). Given this, the Glu256 residue might play a role in stabilization of the complex or in the attraction and transfer of the transported, co-transported or anti-transported molecules (such as sodium).

Figure 45. Structural mapping of the SLC1A4/ASCT1 pathogenic mutations. (a) Structural model of the human ASCT1 protein (Uniprot ID P43007). The mutation (E256K) is predicted to be located within a long membrane helix. (b) The frameshift mutation changes the sequence of the 41 amino acids that follow the mutation site (colored in red, amino acids 315–355), up to an out-of-frame stop codon. The rest of the protein (colored in blue, amino acids 356–523) is truncated, most probably leading to a misfolded, non-functional protein. (c) The mature protein probably adopts a symmetrical homo-multimeric state, most likely a trimer, with E256 predicted to be located close to the interface. (d) Side view of the complex from within the membrane plane, demonstrating that E256 is close to the cytoplasmic side. In all panels the location of E256 is shown in black and in stick representation.

DISCUSSION

1. Next generation sequencing: past, present and future

NGS represents an entirely new principle of sequencing technology following the previously standard Sanger sequencing methodology, which was first used in 1977165. In the Sanger method, test-tube “sequencing-by-synthesis” reactions produce DNA chains that are randomly terminated at each of the different positions by the irreversible incorporation of a fluorescent dideoxynucleotide, and the resulting pool is then separated by size using capillary electrophoresis. In NGS a similar chemistry is applied, except that test tubes and capillaries are replaced by single molecules amplified into clusters on a flowcell surface, and reversibly terminating nucleotides are used, allowing a scanner to visualize as many as 1 billion reactions in cycles of nucleotide blockade, detection and removal of the block. Each cycle is monitored directly by the sequencing machine, so that about 100 Gb can be read from a single flowcell, enough to sequence a whole human genome at 30X in one run. Remarkably, the increasing sequencing capacity has been paralleled by a gradually decreasing cost of sequencing a human genome, and indeed in the last year of my thesis the desired goal of the “$1000 genome” was reached. NGS became available to the community in 2008–09, when the first NGS machines entered the market. Applying NGS in a research or diagnostic setting comprises a wet laboratory workflow followed by a dry laboratory workflow, which involves bioinformatic analyses (sequence alignment, variant calling) as well as variant filtering and interpretation (annotation, mapping against variant databases). To date, NGS is the best available tool to elucidate disease-causing mutations. It is useful not only in large extended families, where linkage analysis provides information about the disease locus, but can also be applied to detect disease-causing de novo mutations in sporadic patients, a research and diagnostic question that is impossible to address by conventional Sanger sequencing without a candidate gene166.

1.1 Whole genome versus whole-exome sequencing

In the early days of NGS, whole genome sequencing was rarely feasible, because sequencing an entire genome required dozens of flowcells, which made the method inefficient and extremely expensive. For that reason, target capture was developed, allowing researchers to focus on small parts of the genome, on a limited linkage region in a given family, or on the entire exome, the latter being the most widely used. In exome sequencing, nearly all exonic sequences are enriched, mainly using commercially available enrichment kits. In addition, custom probe sets or microarrays are used for target enrichment of a specific genomic region of interest. The average coverage per exome has today reached a standard of 100X, which has proved more than sufficient for identifying disease-causing genes. At a later stage, gene panels for different disorders became available, enabling deeper sequencing of a specific set of genes of interest and faster analysis owing to the smaller amount of data. Within the last few years, the number of newly identified disease genes has grown exponentially through the application of exome sequencing in all fields of medicine.

The major game changer was the increase in sequencing capacity per flowcell. It rapidly became possible to generate about 1 billion reads, or roughly 100 Gb, per flowcell, and thus to sequence a whole genome on one flowcell. This enabled a fast decrease in the price of whole-genome sequencing. Whole-genome sequencing assumes no specific hypothesis in a genetic study and requires no capture step. A major expense in NGS is in fact the capture kits used for whole-exome sequencing. Once it became possible to use a single flowcell for a WGS run, prices for WGS and WES were almost the same. Currently, the standard mean sequencing depth for whole genome sequencing is about 20X–40X.
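As a back-of-the-envelope check of the depth figures quoted in this section, mean depth can be approximated as the sequenced bases divided by the size of the target. The sketch below assumes a haploid human genome of about 3.1 Gb (an approximation introduced here, not a figure from the text) and ignores duplicates, unmapped reads and off-target capture, so real yields would be lower. The function names are illustrative.

```python
def mean_depth(sequenced_bases, target_size):
    """Approximate mean depth of coverage (X) over a target of the given size."""
    return sequenced_bases / target_size


def bases_needed(depth, target_size):
    """Sequenced bases required to reach a given mean depth over a target."""
    return depth * target_size


HUMAN_GENOME_BP = 3.1e9     # assumption: approximate haploid genome size
FLOWCELL_YIELD_BP = 100e9   # ~100 Gb per flowcell, as quoted in the text
EXOME_TARGET_BP = 70e6      # ~70 Mb capture target, as quoted in the text

# One flowcell spent on a single genome gives roughly 30X.
wgs_depth = mean_depth(FLOWCELL_YIELD_BP, HUMAN_GENOME_BP)

# A 100X exome needs about 7 Gb, so one flowcell could in principle hold
# about 14 such exomes (idealised: every read assumed to be on target).
exome_bases = bases_needed(100, EXOME_TARGET_BP)
exomes_per_flowcell = int(FLOWCELL_YIELD_BP // exome_bases)

print(f"WGS depth from one flowcell: {wgs_depth:.1f}X")
print(f"100X exomes per flowcell: {exomes_per_flowcell}")
```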

Initial NGS analysis usually focuses on known disease-related genes, and for that purpose exome sequencing provides sufficient answers. Exome capture kits began with a target size of ~37 Mb and have now reached over 70 Mb; some also include UTRs for a broader sequenced region. In addition, sporadic capture of intergenic regions is frequently seen in exome data, so this method actually yields more than just the coding regions. Another option when studying a well-defined clinical case is the use of gene panels that include only a limited set of genes of interest, depending on the phenotype. This increases the coverage of the sequenced genes and decreases the number of false positive calls. A further way to optimize the ratio of sequencing depth to output (with respect to cost) is the parallel analysis of several patients in one run. This is made possible by the integration of “molecular barcodes” during library preparation. These barcodes tag all sequences derived from a single patient and thus allow pooled analyses of up to 400 samples.

The major advantage of genome sequencing over exome sequencing is the unbiased analysis of the genome. Potential protein-coding regions that have not yet been annotated as genes, as well as regulatory elements such as noncoding RNAs or transcription factor binding sites, are included in the sequence analysis. In this context, and as we have seen in the IDIS project, a regulatory role might be assigned to previously uncharacterized non-coding regions of the human genome. It is highly likely that in the near future exome capture kits will also include regulatory regions such as enhancers and promoters and other newly defined regions; however, the identification of novel regulatory regions will only be achieved through WGS. Another great advantage of WGS is the homogeneity of coverage, which enables more accurate identification of CNVs. CNVs can be identified by the specific signatures they leave in the aligned sequence data, for example changes in read depth (RD), discordantly mapped read pairs (RP, or paired-end mapping, PEM) or split reads (SRs); homogeneous coverage is therefore crucial when analyzing CNVs. Exome capture does not generate homogeneous coverage, which makes CNV analysis much more difficult with this method. A common practice among clinical centers these days is to perform WGS in order to detect CNVs, while restricting the variant analysis to the exonic regions. The capacity to fully annotate and interpret the data generated by WGS is still quite poor. Much of the genome remains unannotated, making interpretation a daunting task. In addition, although it is possible to obtain and store massive amounts of sequencing data, the ability to analyze whole genome data remains somewhat limited.
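The read-depth signature mentioned above can be illustrated with a toy example. The sketch assumes that depth has already been summarised per fixed-size genomic bin for a test sample and for a matched reference, that all depths are positive, and that the log2-ratio cut-offs are arbitrary illustrative values. It is not the ERDS method or any other published caller, which additionally handle GC bias, mappability and segmentation.

```python
import math
import statistics


def log2_ratios(sample_depth, reference_depth):
    """Per-bin log2 ratio of median-normalised sample depth to reference depth."""
    sample_median = statistics.median(sample_depth)
    reference_median = statistics.median(reference_depth)
    return [
        math.log2((s / sample_median) / (r / reference_median))
        for s, r in zip(sample_depth, reference_depth)
    ]


def call_bins(ratios, loss_cutoff=-0.6, gain_cutoff=0.4):
    """Label each bin as a loss, a gain or neutral (cut-offs are assumptions)."""
    calls = []
    for ratio in ratios:
        if ratio < loss_cutoff:
            calls.append("loss")
        elif ratio > gain_cutoff:
            calls.append("gain")
        else:
            calls.append("neutral")
    return calls


# Invented per-bin depths: bins 3-4 mimic a heterozygous deletion (about half
# the depth) and bin 6 mimics a duplication (about 1.5 times the depth).
sample = [30, 31, 29, 15, 14, 30, 46, 30]
reference = [30, 30, 30, 30, 30, 30, 30, 30]

calls = call_bins(log2_ratios(sample, reference))
print(calls)
```

Uneven capture efficiency in exome data adds bin-to-bin noise of a size comparable to these ratios, which is one way to see why homogeneous WGS coverage makes read-depth calling more reliable.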

Today, both genome and exome sequencing are performed in either a research or a clinical diagnostic setting. The decision whether to run whole-genome or whole-exome sequencing still depends on the capacity of the genomic center that is handling the data and on the availability of means to perform sophisticated bioinformatics analyses. Since pricing is no longer a critical issue, the call is usually made according to the biological question and the specific case. When a defined disorder is presented and the physician wants to focus on a small set of candidate genes, panel or exome sequencing should be sufficient. When a broader, uncharacterized phenotype is studied, WGS should be considered, as the researcher would want to search for the culprit variant by all available means.

1.2 The future of clinical exome and genome sequencing

In my work, I have deciphered the disease-causing mutation for four rare diseases: for three of them, new disease-associated genes were identified, and in one case a patient with an undiagnosed condition was successfully diagnosed and management was changed accordingly. This was done using exome sequencing combined with bioinformatics tools and functional assays that allowed rapid discovery and shed new light on the mechanism of each of these rare diseases. A solid diagnosis often has value for patients and families even when no new treatment results from it, and it can help improve patient management. In the case of the translocation, whole genome sequencing of a fetus allowed the identification of the exact breakpoints, thus offering much more precise information on the nature of the translocation and its possible outcome, and helped both the physicians and the family make life-changing decisions. It is reasonable to assume that, with gradually decreasing cost, whole genome sequencing will become a routine test even in prenatal cases. In summary, whole-exome and whole-genome sequencing can be used successfully as diagnostic tools for undiagnosed patients with an atypical presentation of a known disorder, and should be strongly considered as routine tests in all cases where a genetic condition is suspected but traditional clinical genetic testing has proven negative. In addition, in some cases at least, NGS is likely to prove faster and less expensive than the long diagnostic odyssey many families now endure. Furthermore, given the rapid pace of new gene discovery, it is essential to reanalyze patient exomes periodically and to consider performing whole-genome sequencing in cases where the exome did not yield any result.

2. Hereditary Spastic Paraparesis

Here I studied three independent Jewish Bukharian families with a unique spectrum of symptoms accounting for a newly described form of complicated spastic paraparesis, resulting in the identification of a novel gene and mutation responsible for the disease. Because of the incomplete resemblance to any other known form of HSP, we proposed to term the present disease SPG49, and it was added to the full SPG list of phenotypes. Our functional assays showed that TECPR2 inactivation brings about a decrease in the accumulation of LC3II-labeled autophagosomes, but not their complete elimination.

For a group of progressive neurodegenerative diseases such as Huntington’s disease, Alzheimer’s disease, Parkinson’s disease, spinocerebellar ataxia, and amyotrophic lateral sclerosis (ALS), mutant proteins become aggregate-prone, hence less accessible to proteasome degradation, thus becoming more dependent on autophagy for their clearance.112; 167; 168 These studies have shown that constitutive autophagy is essential to the survival of motor neurons and that failure to induce autophagosome formation results in cytosolic persistence of unsequestered cargo, which could promote aggregation of intracellular components leading to cytotoxicity and disrupt neural function.169; 170 Alternatively, malformation of autophagosomes could lead to malfunction at downstream cellular pathways, suggesting that subsequent autophagic steps are modified.

The core neuropathology of HSP is distal degeneration of the lateral corticospinal tract.171 Many biochemical mechanisms contribute to axonal function within the corticospinal tracts, including mitochondrial function, endoplasmic reticulum (ER) shaping, endosomal trafficking, and microtubule stability, each affecting anterograde and retrograde axonal transport.172 One of the rare autosomal dominant forms of HSP (SPG8) is caused by mutations in KIAA0196, encoding the protein strumpellin. This protein is involved in the fission of endosomes and interacts directly with VCP, a member of the AAA-ATPase family that acts through SQSTM1 and has multiple cellular functions, including vesicular trafficking and degradation of proteins by the ubiquitin-proteasome system. Missense mutations in VCP have recently been identified as causing a late-onset form of autosomal dominant complicated HSP. Another form of recessive, early-onset complicated spastic paraparesis, SPG20 (Troyer syndrome [MIM 275900]), shows some similarity to the presently described SPG49, as both syndromes include short stature and dysmorphic features but no eye involvement or EMG abnormalities. Interestingly, SPG20 involves a homozygous truncation of the spartin protein,173 which is implicated in regulating endosomal trafficking,174 an allied intracellular membrane transport pathway, thus supporting the involvement of autophagy in the cellular mechanism of spastic paraparesis.

TECPR2 has a relatively high molecular mass (1411 amino acids), hinting at a molecular linker function. It is unique in having both WD (tryptophan-aspartic acid repeat) domains and TECPR domains, both implicated in protein-protein interactions involved in diseases.175; 176 In its domain disposition, TECPR2 appears to be mammalian-specific, with lower organisms showing orthologs with either WD or TECPR domains, but not both. The TECPR2 protein shows relatively weak but significant similarity (by BLAST via PairsDB177) to two other human proteins. One is TECPR1, implicated in selective recruitment of bacteria into autophagosomes110; 178; 179 and in autophagosome assembly.180 The other is the HPS5 protein, underlying a specific type of Hermansky-Pudlak syndrome that involves dysfunction of lysosome-related organelles.181 This information further strengthens the functional classification of TECPR2 and its disease affiliation.

The discovery of TECPR2 as a disease-related gene led to the identification of three additional SPG49 patients of non-Bukharian origin harboring two novel mutations in TECPR2, providing further evidence implicating this gene in this newly defined syndrome. All three patients shared a distinctive phenotype corresponding to SPG49, but with a prominent feature of hyperreflexia, in contrast to the areflexia in the Bukharian patients. We thus later suggested that TECPR2 causes a subtype of autosomal recessive hereditary sensory-autonomic neuropathy (HSAN) rather than HSP, and that there should therefore be increased awareness of this unique constellation of signs and symptoms among pediatricians in general. Additionally, in light of the new Ashkenazi mutations in TECPR2, we propose that patients found negative for the known Ashkenazi sensory-autonomic neuropathies, such as familial dysautonomia, should be evaluated for the two Ashkenazi mutations reported here.

The role of TECPR2 as an autophagy-related and neurodegeneration-associated gene was further evaluated in several follow-up studies. One report identified another SPG gene, ZFYVE26, as a key determinant of autophagosome maturation, which is impaired when the protein is defective or absent. These results further supported the relevance of defective autophagy in neurodegeneration.182 Another follow-up study showed that TECPR2 functions as a molecular scaffold linking the early secretory pathway and autophagy, and is required for the stabilization of SEC24D, a specific trafficking component, for the maintenance of functional ER exit sites (ERES), and for efficient ER export in a manner dependent on binding to lipidated LC3C. Further, TECPR2-deficient HSP patient cells display alterations in SEC24D abundance and ER export efficiency.183 In addition, a novel non-synonymous variant in TECPR2 was detected in Spanish water dogs, causing a previously uncharacterized juvenile-onset neuroaxonal dystrophy (NAD),184 a phenotype that is etiologically parallel to the human SPG49 we identified in the Bukharian patients. These follow-up studies provide further genetic and biological evidence for the relation between TECPR2 and HSP, and emphasize the association between impaired autophagy and neurodegeneration.

3. Intractable Diarrhea of Infancy Syndrome

The involvement of distant-acting regulatory regions in human diseases remains poorly understood, and few cases of disease-causing variations that affect transcriptional enhancers have been documented.58-62 Only one of these examples constitutes a complete deletion of an enhancer,60 and it remains unclear whether deletion of the homologous sequence in mice produces a phenotype mimicking the human condition. In the present study we show that a deletion of a developmental enhancer sequence is the cause of a severe, recessively inherited gastrointestinal disease. Enhancer activity is highly tissue-specific, and the tissues with enhancer activity in vivo are consistent with the gastrointestinal disease etiology. The observed molecular and physiological phenotypes suggest that the enhancer deletion affects normal development of enteroendocrine cells and thereby normal enteroendocrine hormone secretion. This is supported by the striking phenotypic similarity between chr17ΔICR/ΔICR mice and mice with an intestinal-specific deletion of Neurog3, a proendocrine transcription factor required for development of enteroendocrine cells.185 Since chr17ΔICR/ΔICR mice resemble human patients homozygous for ICR deletions in all disease aspects examined in this study, these mice are likely to provide an accurate model for studying the human condition and exploring therapeutic interventions in the future.

Our 4C data gave inconclusive results. The two top genes that emerged, CCDC154 and UNKL, did not show any significant differential expression in RNA-seq assays of either human or mouse biopsies. In addition, complete knockout of CCDC154 gave a distinctive phenotype that is unrelated to IDIS. We note that the 4C assay may not have captured the cell type in which the enhancer is active in the mouse gut. For a better evaluation, enhancer-expressing cells should be FACS-sorted in future 4C assays. It is also possible that the enhancer is active only at a developmental stage earlier than the one we were able to obtain (E15). More generally, although 4C is useful as a discovery tool for linking enhancers to likely target loci, the data cannot be interpreted in a simple spatial context and should be treated with caution. Views about genome organization should be further validated by independent methods such as fluorescence in situ hybridization (FISH); in our case, however, we tested for short-distance connections between the enhancer and the target gene, so FISH cannot be used as an effective tool.

The search for the enhancer target gene led to the identification of a novel predicted transcript with a 900 bp ORF (LOC105371045). However, we did not identify this differentially expressed transcript when first analyzing the RNA-seq data, because the transcript was not annotated in the genome assembly version available at the time of analysis (GRCh37/hg19). Most RNA-seq analysis programs provide expression scores only for annotated genes, since they calculate reads per kb along the transcript. The transcript was first annotated only in the later assembly (GRCh38), which allowed reanalysis and identification of the different expression pattern. A second reason it was missed in the first RNA-seq analysis is that the original analysis focused mainly on intestinal tissues, whereas the expression difference was observed only in the lower parts of the stomach. Only after reanalyzing the data were we able to pick up this signal in the stomach.

Blastp searches on the ORF showed high amino acid sequence similarity to DAXX, a transcription factor regulating the activity of Pax5. A known paralog of Pax5 is Pax4, which is involved in the differentiation of enteroendocrine cells into different hormone-producing subtypes.186 These findings indicate a possible mechanistic role for this newly defined transcript and emphasize a strong connection to the proposed IDIS mechanism, as observed in our previous findings. However, we note that this observation should still be treated as a theoretical model, as there is no definitive evidence that the protein is indeed expressed.

Knocking out the ORF of LOC105371045 in mice (chr17ΔORF/ΔORF) gave a much less severe diarrhea phenotype than that seen in the enhancer KO mice. This phenotypic difference could be explained mainly by differences in mouse strains, i.e., different genetic backgrounds that can lead to very different responses following KO of a gene-targeted allele or of a certain region in the genome.187 In addition, it is possible that the enhancer regulates more than just the predicted ORF, or that the intergenic region we defined as an enhancer has other important functions beyond regulation.

Beyond congenital diarrhea, our results highlight the potential role that distant-acting regulatory elements may play in the pathology of other Mendelian diseases. While WGS approaches identify increasing numbers of disease-associated non-coding variants, their functional interpretation remains challenging. The current work demonstrates the importance of detailed experimental follow-up of such findings through in vivo models, an approach that will benefit from the emerging suite of highly efficient genome editing tools.188
In view of the genetic results, the transcriptome analyses, and previous reports on congenital diarrhea stemming from failed enteroendocrine cells, we conclude that the IDIS enhancer deletion is the causative variation for the studied diarrhea patients, acting via an enteroendocrine mechanism. We further believe that it may be a fine-tuning mechanism for enteroendocrine cell subtype specification.
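The annotation dependence noted above for the RNA-seq analysis (expression scores computed as reads per kb along annotated transcripts) can be illustrated with a minimal sketch. This is not the pipeline used in this work: the transcript names, coordinates, read positions and library size below are invented for illustration, and the simple start-position overlap rule is an assumption. The sketch only shows why reads falling on an unannotated ORF produce no score until the annotation includes that interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count / (length_bp / 1_000) / (total_mapped_reads / 1_000_000)


def quantify(read_starts, annotation, total_mapped_reads):
    """Score annotated transcripts only; reads outside every interval are dropped."""
    counts = {t.name: 0 for t in annotation}
    for chrom, pos in read_starts:
        for t in annotation:
            if t.chrom == chrom and t.start <= pos < t.end:
                counts[t.name] += 1
    return {
        t.name: rpkm(counts[t.name], t.end - t.start, total_mapped_reads)
        for t in annotation
    }


# Toy data: all coordinates, counts and names are hypothetical.
reads = [("chr17", 5_100), ("chr17", 5_700)]
reads += [("chr17", 1_000 + 100 * i) for i in range(9)]  # reads on the novel ORF
total_mapped = 1_000_000  # assumed library size, not len(reads)

old_annotation = [Transcript("GENE_A", "chr17", 5_000, 6_000)]
new_annotation = old_annotation + [Transcript("NOVEL_ORF", "chr17", 1_000, 1_900)]

old_scores = quantify(reads, old_annotation, total_mapped)
new_scores = quantify(reads, new_annotation, total_mapped)
print("old annotation:", old_scores)
print("new annotation:", new_scores)
```

With the older annotation the nine reads over the 900 bp interval are silently discarded, whereas re-quantifying the same reads against the updated annotation yields a score for the new transcript without changing the score of the existing gene.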

4. Trichohepatoenteric Syndrome

THES is a heterogeneous disease with a widely varying spectrum of phenotypes. The most prevalent symptoms are persistent secretory diarrhea, hair abnormalities and facial dysmorphism, with a rather frequent appearance of immune deficiency and hepatic phenotypes. In retrospective analysis, our patient appears to have a rather atypical form of THES. While presenting certain facial dysmorphism and mild hair phenotypes, she showed none of the reported hepatic symptoms, such as hepatomegaly, chronic hepatitis, cirrhosis or progressive liver failure, and had no immune deficiency.64 Further, none of the less frequent THES symptoms were observed, including cardiac, cutaneous and platelet abnormalities.63 The hair abnormalities observed prior to the microscopic examination were minor and could have resulted from severe malnutrition caused by the diarrhea. This atypical clinical presentation explains why THES was not suspected in the present case. Whole-exome sequencing together with VarElect variant analysis was instrumental in defining the presently studied phenotype as belonging to the THES spectrum. The homozygous TTC37 missense mutation revealed is extremely rare in the general population (<6×10⁻⁵) and was not previously reported in any THES study. This is potentially related to the unusual syndromic disposition. We note that this new mutation in TTC37 is also notable in that it represents the first report of THES in a new population. THES has so far been identified in families of Pakistani, Kurdish, Italian, North African and French origin,65; 66 but not in a person of Middle Eastern Arab origin.
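The rarity criterion mentioned above can be sketched as a simple frequency filter over annotated variants. This is only an illustrative sketch and not the VarElect-based analysis used in this work: the record layout, the gene names, the list of consequence types and the use of 6×10⁻⁵ as a cut-off are all assumptions made for the example, and the toy variants are invented.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    gene: str
    genotype: str          # "hom" or "het"
    consequence: str       # e.g. "missense", "synonymous"
    maf: Optional[float]   # None means absent from the reference database


# Hypothetical set of consequence types treated as potentially damaging.
DAMAGING = {"missense", "nonsense", "frameshift", "splice_site"}


def rare_homozygous(variants, max_maf=6e-5):
    """Keep homozygous, potentially damaging variants that are rare or unseen."""
    kept = []
    for v in variants:
        is_rare = v.maf is None or v.maf < max_maf
        if v.genotype == "hom" and is_rare and v.consequence in DAMAGING:
            kept.append(v)
    return kept


# Toy input: placeholder genes and frequencies, not data from this study.
toy_variants = [
    Variant("GENE_A", "hom", "missense", 0.12),      # common, filtered out
    Variant("GENE_B", "hom", "missense", 2e-5),      # rare and homozygous, kept
    Variant("GENE_C", "het", "frameshift", None),    # heterozygous, filtered out
    Variant("GENE_D", "hom", "synonymous", 1e-5),    # not damaging, filtered out
]

candidates = rare_homozygous(toy_variants)
print([v.gene for v in candidates])
```

Under a recessive model, a filter of this kind typically reduces an exome to a short candidate list, which is then prioritized against the phenotype.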

This work emphasizes the power of exome sequencing and variant analysis in clinical genetics. The definitive molecular diagnosis will allow prenatal testing and genetic counseling to be offered to the family for future pregnancies. It will also guide preventive treatment and therapeutic approaches, focusing on essential medical procedures rather than unnecessary invasive examinations. This might include effectively addressing the potential future appearance of cardiac, liver and other progressive abnormalities associated with THES.

5. Capillary Leak Syndrome

We have identified the first case of familial CLS, with a most likely dominant mode of inheritance with incomplete penetrance. Other Mendelian inheritance modes can be excluded in this pedigree, because members of the second generation of the family married spouses of different ethnic backgrounds, who could not all have been carriers of the same TLN1 mutation. We performed exome sequencing on two family members presenting with a similar clinical course of CLS attacks, including hemoconcentration, severe abdominal pain, vomiting, edema and hypovolemic shock, with fast deterioration to organ failure requiring full resuscitation. Analysis revealed a novel splice-site mutation in TLN1 creating two mutated forms of the protein, one through exon skipping and the other through intron retention, the latter causing a frameshift and creating a stop codon. Talin1 is a cytoskeletal protein that binds to integrin beta tails and regulates cell adhesion, cell migration and assembly of the extracellular matrix. We observed that the mutated form of TLN1 does not affect talin1 protein expression in patients’ skin fibroblasts. We also showed that the mutation did not affect focal adhesion integrity or proper integrin binding. The latter observation was further strengthened by the fact that platelet activity in the patient was normal. One possible explanation for the lack of a clear junctional phenotype in our mutation-carrying fibroblasts is the previously reported upregulation of talin2, which appears to compensate for the loss of talin1.146 However, we did not detect any upregulation of talin2 in the patient’s fibroblasts. Another possible explanation for the lack of phenotype in the tested fibroblasts is that fibroblasts do not share the relevant mechanisms with vascular endothelial cells.
As the fibroblast assays gave no indication of the possible effect of the TLN1 mutation, we used endothelial cells from a TLN1 hemizygous mouse that had previously been generated by our collaborators and was available to us. We did not have a mouse carrying the exact same splice mutation. The rationale for using endothelial cells lies in the fact that previous studies on talin1 and talin2 expression in human umbilical vein endothelial cells (HUVEC) showed that these cells express only talin1. Thus, it is expected that in endothelial cells any form of malfunctioning talin1 would have a more severe effect on the organism and generate a stronger phenotype. The function of talin has been studied extensively in cells in culture, and it has become apparent that while talin1 is not required for initiating cell adhesion and cell spreading, it is required for cells to maintain their spread morphology and for cell migration. In our study we found that endothelial cells from TLN1 hemizygous mice show impaired and disrupted cell-cell junctions, a phenotype that might lead to the permeability effect causing the plasma leakage. This observation is similar to what was previously seen in HUVECs, where knockdown of talin1 prevented adherens junction formation and, as a result, the cells were unable to maintain spread morphology and showed defects in cell migration.146 In addition, our collaborators have observed that hemizygous TLN1 mouse endothelial cells show a decrease in vessel formation capacity and embryonic angiogenesis.148 So far, the effect of malfunctioning talin1 on junctional integrity has not been reported. Here we note for the first time that endothelial cells harboring only one functioning allele of TLN1 show impaired junctions, which may lead to increased paracellular permeability and eventually to the capillary leak phenotype. Furthermore, we propose that patients’ endothelial cells should be assessed for junctional integrity and endothelial permeability to substantiate our findings and shed more light on the mechanistic effect of the splice mutation.

In this study we also found that serum taken from a patient, both during an acute attack and at quiescence, induces endothelial permeability through remodeling of endothelial cell-cell junctions. Previous studies have already shown this phenomenon for patient serum during an acute attack, but not at the basal level.74 This observation could be explained by the hypothesis that the serum of a CLS patient harbors factors that differ from those of healthy individuals and that could increase endothelial monolayer permeability and disrupt endothelial adherens junctions. These factors could also account for the trigger of the unexplained attacks in the patient.
Hence, we propose that factors within the serum of the CLS patient work in synergy with the TLN1 mutation, leading to the observed attacks and to a much more severe phenotype that is not present in the healthy TLN1 carriers in the family. This could explain the incomplete penetrance and may also explain the difference in phenotypic severity between the proband (Fig. 31, patient 48) and his father's cousin, who was affected in childhood and experienced no attacks after the age of 19 years (Fig. 31, patient 21). In addition, a digenic mode of inheritance is possible in this case, in which an additional mutation in another gene in the patients could account for the more severe CLS phenotype observed. Further assays should be considered to support this hypothesis, including sequencing of additional family members.

Transcriptome and proteome analysis of the patient’s skin fibroblasts, compared to those of his healthy mother as control, indicated that most differentially expressed genes that were either up- or downregulated in the patient are involved in biological processes in which TLN1 plays a pivotal role. The serum proteomics we performed on the patient’s basal and episodic serum, compared to a healthy control, demonstrated that immune- and inflammation-related proteins are significantly elevated in serum during an attack. In addition, we found that the patient’s basal serum did not differ much from the control serum in terms of protein expression, indicating that episodic serum harbors specific factors that contribute to the observed phenotype of CLS. However, as we have already shown, basal serum applied to wild-type endothelial cells also affected endothelial permeability and cell junctions. Thus we suggest that the patient’s serum contains factors that lead to CLS but that could not be detected in the proteomics analyses of the serum, most probably because of the presence of other abundant proteins and low initial expression levels. We propose that further assays on the episodic and basal serum should be performed in order to isolate the specific factors that caused the observed phenotype in the endothelial cells. A previous study on adult CLS patients showed that plasma drawn during an acute attack contained elevated levels of specific growth factors such as VEGF and Ang2.74 VEGF signaling plays a major role in promoting the proliferation and differentiation of the endothelial lineage from the earliest stages of development, whereas the angiopoietin pathway acts somewhat later, to promote the recruitment of supporting cells and vessel stabilization.189
These factors are known to be able to induce rapid leakage from blood vessels.70; 190 In addition, it has been shown that one member of the VEGF family, VEGFA, activates FAK (focal adhesion kinase) and paxillin in human umbilical vein endothelial cells (HUVEC) through VEGF receptor 2, leading to recruitment of actin-anchoring proteins such as talin and vinculin to the focal adhesion plaque, which are essential for VEGFA-induced actin reorganization.191 These data, together with our findings on the TLN1 mutation, suggest that elevated VEGF levels in CLS patients could be a consequence of talin1 deficiency in endothelial cells, a phenotype that cannot be rescued by the presence of talin2, generating a positive feedback on the formation of VEGF. Our transcriptome and proteome findings in the patient’s fibroblasts were closely concordant and implicated the same mechanisms that might be impaired in CLS. Both assays suggested that inflammation or infection may have a pivotal role in triggering acute CLS attacks.

In summary, the main characteristic of CLS is its life-threatening attacks, which probably follow a trigger that remains unknown. The proband in our family has been suffering from acute attacks, usually following strenuous physical activity or over-excitement. On the other hand, some of the carriers of the heterozygous mutation in TLN1 never presented with any symptoms resembling an attack and are considered healthy. Thus, we suggest that the TLN1 mutation causes a predisposition to CLS, but that the pathology and the actual attacks depend on other factors, either environmental or present in the serum, or on other genetic variations, which could be explained by a digenic model.

BIBLIOGRAPHY 1. Stitziel, N.O., Kiezun, A., and Sunyaev, S. (2011). Computational and statistical approaches to analyzing variants identified by exome sequencing. Genome Biol 12, 227. 2. Lohmann, K., and Klein, C. (2014). Next generation sequencing and the future of genetic diagnosis. Neurotherapeutics 11, 699-707. 3. Gilissen, C., Hoischen, A., Brunner, H.G., and Veltman, J.A. (2011). Unlocking Mendelian disease using exome sequencing. Genome biology 12, 228. 4. Bamshad, M.J., Ng, S.B., Bigham, A.W., Tabor, H.K., Emond, M.J., Nickerson, D.A., and Shendure, J. (2011). Exome sequencing as a tool for Mendelian disease gene discovery. Nature reviews Genetics 12, 745-755. 5. Cirulli, E.T., and Goldstein, D.B. (2010). Uncovering the roles of rare variants in common disease through whole-genome sequencing. Nat Rev Genet 11, 415-425. 6. Biesecker, L.G. (2010). Exome sequencing makes medical genomics a reality. Nat Genet 42, 13-14. 7. Choi, M., Scholl, U.I., Ji, W., Liu, T., Tikhonova, I.R., Zumbo, P., Nayir, A., Bakkaloglu, A., Ozen, S., Sanjad, S., et al. (2009). Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc Natl Acad Sci U S A 106, 19096-19101. 8. Nishri, D., Edvardson, S., Lev, D., Leshinsky-Silver, E., Ben-Sira, L., Henneke, M., Lerman-Sagie, T., and Blumkin, L. (2014). Diagnosis by whole exome sequencing of atypical infantile onset Alexander disease masquerading as a mitochondrial disorder. European journal of paediatric neurology : EJPN : official journal of the European Paediatric Neurology Society. 9. Gibson, J., Gilbert, R.D., Bunyan, D.J., Angus, E.M., Fowler, D.J., and Ennis, S. (2013). Exome analysis resolves differential diagnosis of familial kidney disease and uncovers a potential confounding variant. Genetics research 95, 165-173. 10. Gripp, K.W., Ennis, S., and Napoli, J. (2013). Exome analysis in clinical practice: expanding the phenotype of Bartsocas-Papas syndrome. 
American journal of medical genetics Part A 161A, 1058-1063. 11. Lieber, D.S., Vafai, S.B., Horton, L.C., Slate, N.G., Liu, S., Borowsky, M.L., Calvo, S.E., Schmahmann, J.D., and Mootha, V.K. (2012). Atypical case of Wolfram syndrome revealed through targeted exome sequencing in a patient with suspected mitochondrial disease. BMC medical genetics 13, 3. 12. Need, A.C., Shashi, V., Hitomi, Y., Schoch, K., Shianna, K.V., McDonald, M.T., Meisler, M.H., and Goldstein, D.B. (2012). Clinical application of exome sequencing in undiagnosed genetic conditions. Journal of medical genetics 49, 353-361. 13. Manolio, T.A., Collins, F.S., Cox, N.J., Goldstein, D.B., Hindorff, L.A., Hunter, D.J., McCarthy, M.I., Ramos, E.M., Cardon, L.R., Chakravarti, A., et al. (2009). Finding the missing heritability of complex diseases. Nature 461, 747-753. 14. Visel, A., Rubin, E.M., and Pennacchio, L.A. (2009). Genomic views of distant-acting enhancers. Nature 461, 199-205. 15. Dickel, D.E., Visel, A., and Pennacchio, L.A. (2013). Functional anatomy of distant-acting mammalian enhancers. Philosophical transactions of the Royal Society of London Series B, Biological sciences 368, 20120359. 16. Kircher, M., Witten, D.M., Jain, P., O'Roak, B.J., Cooper, G.M., and Shendure, J. (2014). A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 46, 310-315.

17. Petrovski, S., Wang, Q., Heinzen, E.L., Allen, A.S., and Goldstein, D.B. (2013). Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet 9, e1003709. 18. Thusberg, J., Olatubosun, A., and Vihinen, M. (2011). Performance of mutation pathogenicity prediction methods on missense variants. Hum Mutat 32, 358-368. 19. Adzhubei, I.A., Schmidt, S., Peshkin, L., Ramensky, V.E., Gerasimova, A., Bork, P., Kondrashov, A.S., and Sunyaev, S.R. (2010). A method and server for predicting damaging missense mutations. Nat Methods 7, 248-249. 20. Hu, J., and Ng, P.C. (2013). SIFT Indel: predictions for the functional effects of amino acid insertions/deletions in proteins. PLoS One 8, e77940. 21. Schwarz, J.M., Rodelsperger, C., Schuelke, M., and Seelow, D. (2010). MutationTaster evaluates disease-causing potential of sequence alterations. Nat Methods 7, 575-576. 22. Houdayer, C., Caux-Moncoutier, V., Krieger, S., Barrois, M., Bonnet, F., Bourdon, V., Bronner, M., Buisson, M., Coulet, F., Gaildrat, P., et al. (2012). Guidelines for splicing analysis in molecular diagnosis derived from a set of 327 combined in silico/in vitro studies on BRCA1 and BRCA2 variants. Hum Mutat 33, 1228-1238. 23. Vreeswijk, M.P., Kraan, J.N., van der Klift, H.M., Vink, G.R., Cornelisse, C.J., Wijnen, J.T., Bakker, E., van Asperen, C.J., and Devilee, P. (2009). Intronic variants in BRCA1 and BRCA2 that affect RNA splicing can be reliably selected by splice-site prediction programs. Hum Mutat 30, 107-114. 24. Jian, X., Boerwinkle, E., and Liu, X. (2014). In silico tools for splicing defect prediction: a survey from the viewpoint of end users. Genet Med 16, 497-503. 25. MacArthur, D.G., Balasubramanian, S., Frankish, A., Huang, N., Morris, J., Walter, K., Jostins, L., Habegger, L., Pickrell, J.K., Montgomery, S.B., et al. (2012). A systematic survey of loss-of-function variants in human protein-coding genes. Science 335, 823-828. 26. 
Safran, M., Dalah, I., Alexander, J., Rosen, N., Iny Stein, T., Shmoish, M., Nativ, N., Bahir, I., Doniger, T., Krug, H., et al. (2010). GeneCards Version 3: the human gene integrator. Database (Oxford) 2010, baq020. 27. Rebhan, M., Chalifa-Caspi, V., Prilusky, J., and Lancet, D. (1997). GeneCards: integrating information about genes, proteins and diseases. Trends Genet 13, 163. 28. Rappaport, N., Twik, M., Nativ, N., Stelzer, G., Bahir, I., Stein, T.I., Safran, M., and Lancet, D. (2014). MalaCards: A Comprehensive Automatically-Mined Database of Human Diseases. Curr Protoc Bioinformatics 47, 1 24 21-21 24 19. 29. Belinky, F., Nativ, N., Stelzer, G., Zimmerman, S., Iny Stein, T., Safran, M., and Lancet, D. (2015). PathCards: multi-source consolidation of human biological pathways. Database (Oxford) 2015. 30. Stevanin, G., Santorelli, F.M., Azzedine, H., Coutinho, P., Chomilier, J., Denora, P.S., Martin, E., Ouvrard-Hernandez, A.M., Tessa, A., Bouslam, N., et al. (2007). Mutations in SPG11, encoding spatacsin, are a major cause of spastic paraplegia with thin corpus callosum. Nat Genet 39, 366-372. 31. Rosulescu, E., Stanoiu, C., Buteica, E., Stanoiu, B., Burada, F., and Zavaleanu, M. (2009). Hereditary spastic paraplegia. Rom J Morphol Embryol 50, 299-303. 32. Schule, R., and Schols, L. (2011). Genetics of hereditary spastic paraplegias. Semin Neurol 31, 484-493. 33. Blackstone, C. (2012). Cellular pathways of hereditary spastic paraplegia. Annu Rev Neurosci 35, 25-47. 34. Depienne, C., Stevanin, G., Brice, A., and Durr, A. (2007). Hereditary spastic paraplegias: an update. Curr Opin Neurol 20, 674-680.


List of Publications

1. Stelzer G, Dalah I, Stein TI, Satanower Y, Rosen N, Nativ N, Oz Levi D, Olender T, Belinky F, Bahir I, Krug H, Perco P, Mayer B, Kolker E, Safran M, and Lancet D (2011) In-silico human genomics with GeneCards. Hum Genomics 5, 709-717.

2. Oz Levi D*, Ben-Zeev B*, Ruzzo EK, Hitomi Y, Gelman A, Pelak K, Anikster Y, Reznik-Wolf H, Bar-Joseph I, Olender T, Alkelai A, Weiss M, Ben-Asher E, Ge D, Shianna KV, Elazar Z, Goldstein DB, Pras E, and Lancet D (2012) Mutation in TECPR2 Reveals a Role for Autophagy in Hereditary Spastic Paraparesis. Am J Hum Genet 91, 1-8.

3. Oz Levi D, Gelman A, Elazar Z, and Lancet D (2013). TECPR2: a new autophagy link for neurodegeneration. Autophagy 9, 801-802.

4. Ruzzo EK, Capo-Chichi JM, Ben-Zeev B, Chitayat D, Mao H, Pappas AL, Hitomi Y, Lu YF, Yao X, Hamdan FF, Pelak K, Reznik-Wolf H, Bar-Joseph I, Oz Levi D, Lev D, Lerman-Sagie T, Leshinsky-Silver E, Anikster Y, Ben-Asher E, Olender T, Colleaux L, Décarie JC, Blaser S, Banwell B, Joshi RB, He XP, Patry L, Silver RJ, Dobrzeniecka S, Islam MS, Hasnat A, Samuels ME, Aryal DK, Rodriguiz RM, Jiang YH, Wetsel WC, McNamara JO, Rouleau GA, Silver DL, Lancet D, Pras E, Mitchell GA, Michaud JL, Goldstein DB (2013). Deficiency of asparagine synthetase causes congenital microcephaly and a progressive form of encephalopathy. Neuron 80, 429-441.

5. Oz Levi D, Weiss B, Lahad A, Greenberger S, Pode-Shakked B, Somech R, Olender T, Tatarsky P, Marek-Yagel D, Pras E, Anikster Y, Lancet D (2014). Exome sequencing as a differential diagnosis tool: resolving mild trichohepatoenteric syndrome. Clinical genetics.

6. Zhu X, Petrovski S, Xie P, Ruzzo EK, Lu YF, McSweeney KM, Ben-Zeev B, Nissenkorn A, Anikster Y, Oz Levi D, Dhindsa RS, Hitomi Y, Schoch K, Spillmann RC, Heimer G, Marek-Yagel D, Tzadok M, Han Y, Worley G, Goldstein J, Jiang YH, Lancet D, Pras E, Shashi V, McHale D, Need AC, Goldstein DB (2015). Whole-exome sequencing in undiagnosed genetic diseases: interpreting 119 trios. Genetics in medicine: official journal of the American College of Medical Genetics.

7. Heimer G, Marek-Yagel D, Eyal E, Barel O, Oz Levi D, Hoffmann C, Ruzzo EK, Ganelin-Cohen E, Lancet D, Pras E, Rechavi G, Nissenkorn A, Anikster Y, Goldstein DB, Ben Zeev B (2015). SLC1A4 mutations cause a novel disorder of intellectual disability, progressive microcephaly, spasticity and thin corpus callosum. Clinical genetics.

8. Heimer G*, Oz Levi D*, Mai-Zahav M, Nissenkorn N, Ruzzo EK, Axelrod F, Maayan C, Edvardson S, Pras E, Reznik-Wolf H, Lancet D, Goldstein DB, Anikster Y, Shalev SA, Elpeleg O, Ben Zeev B (2015). TECPR2 mutations cause a new subtype of familial dysautonomia like hereditary sensory autonomic neuropathy with intellectual disability. European Journal of Paediatric Neurology. In Press

9. Ekhilevitch, N., Kurolap, A., Oz-Levi, D., Mory, A., Hershkovitz, T., Ast, G., Mandel, H., and Baris, H.N. (2015). Expanding the MYBPC1 phenotypic spectrum: a novel homozygous mutation causes arthrogryposis multiplex congenita. Clin Genet.

10. Oz-Levi D, Olender T, Bar-Joseph I, Zhu Y, Marek-Yagel D, Alkelai A, Ruzzo EK, Han Y, Vos E, Tatarskyy P, Reznik-Wolf H, Milgrom R, Weiss B, Pode-Shakked B, Schvimer M, Barshack I, Hartman C, Shapiro R, Shamir R, Imai DM, Coleman-Derr D, Dickel DE, Nord AS, Wu H, Afzal V, Lammerts van Bueren K, Barnes RM, Visel A, Black BL, Mayhew CN, Kuhar MF, Pitstick A, Tekman M, Stanescu HC, Wells JM, Kleta R, de Laat W, Goldstein DB, Pras E, Pennacchio LA, Lancet D and Anikster Y (2015). Gut Enhancer Deletions Cause Severe Intractable Diarrhea. Nature. Under revision.

Declaration on independent collaboration

All bioinformatics analyses presented in this study, including those of whole-exome, whole-genome and transcriptome data, were done by me. All experiments presented in this study were calibrated, performed and analyzed by me, except for the following procedures:

1. Recruitment of subjects and all clinical evaluations were done at either Sheba Medical Center or Wolfson Medical Center.

2. High-throughput sequencing on Illumina machines and basic bioinformatics were done at several locations: the Atlas and Macrogen companies, the Weizmann Institute Biological Services and INCPM, and David Goldstein's lab at Duke University, NC.

3. For the IDIS project, enhancer transgenic assays, knockout and transgenic mice, and the biological assays on these mice were performed by Prof. Len Pennacchio of Lawrence Berkeley Laboratories, Berkeley, CA.

4. Generation of induced pluripotent stem cells for the IDIS project was performed in collaboration with Prof. James Wells of Cincinnati Children's Hospital, USA. All experiments on these cells were done at his lab.

5. Many of the IDIS 4C analyses were done in collaboration with Prof. Wouter de Laat of Utrecht University, The Netherlands, some with contributions by Amos Tanay at WIS.

6. Immunofluorescence of cultured cells, microscopy and endothelial cell assays for the CLS project were done in collaboration with Dr. Vassiliki Kostourou of BSRC Fleming, Greece.

March 2, 2016

To: Feinberg Graduate School

Re: Danit Oz-Levi shared authorship papers

Supervisor's Letter: In cases where a thesis includes work in which the student submitting the thesis is one of multiple equal-contributing first authors, the mentor must add a letter explaining the contribution of the student in such papers. The mentor must also declare that this work will not be included in another Ph.D. thesis, except in the case of the other equal-contributing authors. This letter is not part of the thesis document.

Danit Oz Levi is one of two equal-contributing first authors in two papers included in her thesis.

In the first, “Mutation in TECPR2 Reveals a Role for Autophagy in Hereditary Spastic Paraparesis”, published in American J Human Genet 2012, Danit shares first authorship with Prof. Bruria Ben Zeev, head of the Pediatric Neurology department at Sheba Medical Center, who clinically examined the patients and defined their symptoms. Danit performed all sequencing analysis procedures and all experimental procedures in this study, in order to prove the pathogenicity of the TECPR2 mutation and to explore the relation between the autophagy process and the disease. In addition, Danit wrote the manuscript, except for the clinical part, which was written by Prof. Ben-Zeev.

In the second, “TECPR2 mutations cause a new subtype of familial dysautonomia like hereditary sensory autonomic neuropathy with intellectual disability”, published in Eur. J Pediatric Neurol. 2015, Danit shares first authorship with Dr. Gali Heimer, a neurologist at Sheba Medical Center, who saw the patients and performed the clinical evaluation. Danit analyzed the exomes of the trio reported in this project. In addition, Danit and Dr. Heimer wrote the manuscript jointly.

The work presented in both papers will not be included in another Ph.D. thesis.

Truly,

Doron Lancet, Ph.D. Professor