bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Loss of UGP2 in brain leads to a severe epileptic encephalopathy, emphasizing that bi- 2 allelic isoform specific start-loss mutations of essential can cause genetic diseases

3 Elena Perenthaler1, Anita Nikoncuk1*, Soheil Yousefi1*, Woutje M. Berdowski1*, Ivan Capo2, Herma C. 4 van der Linde1, Paul van den Berg1, Edwin H. Jacobs1, Darija Putar1, Mehrnaz Ghazvini3, Eleonora 5 Aronica4,5, Wilfred F.J. van IJcken6, Walter G. de Valk1, Evita Medici–van den Herik7, Marjon van 6 Slegtenhorst1, Lauren Brick8, Mariya Kozenko8, Jennefer N. Kohler 9, Jonathan A. Bernstein10, Kristin G. 7 Monaghan11, Amber Begtrup11, Rebecca Torene11, Amna Al Futaisi12, Fathiya Al Murshedi13, Renjith 8 Mani12, Faisal Al Azri14, Erik-Jan Kamsteeg15, Majid Mojarrad16,17,18, Atieh Eslahi16,19, Zaynab Khazaei20, 9 Fateme Massinaei Darmiyan20, Mohammad Doosti21, Ehsan Ghayoor Karimiani22, Jana Vandrovcova23, 10 Faisal Zafar24, Nuzhat Rana24, Krishna K. Kandaswamy25, Jozef Hertecant26, Peter Bauer25, Stephanie 11 Efthymiou23, Henry Houlden23, Aida M. Bertoli-Avella25, Reza Maroofian23, Kyle Retterer11, Alice S. 12 Brooks1, Tjakko J. van Ham1 and Tahsin Stefan Barakat1, #

13

14 1Department of Clinical Genetics, Erasmus MC University Medical Center, Rotterdam, The Netherlands 15 2Department for Histology and Embryology, Faculty of Medicine Novi Sad, University of Novi Sad, Serbia 16 3iPS cell core facility, Erasmus MC University Medical Center, Rotterdam, The Netherlands 17 4Amsterdam UMC, University of Amsterdam, Department of (Neuro)pathology, Amsterdam, Amsterdam 18 Neuroscience, The Netherlands 19 5Stichting Epilepsie Instellingen Nederland (SEIN), The Netherlands 20 6Center for Biomics, Department of Cell Biology, Erasmus MC University Medical Center, Rotterdam, The 21 Netherlands 22 7Department of Neurology, Erasmus MC University Medical Center, Rotterdam, The Netherlands 23 8Division of Genetics, McMaster Children’s Hospital, Hamilton, Ontario, L8S 4J9, Canada. 24 9Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA, 94035, USA 25 10Department of Pediatrics, Division of Medical Genetics, Stanford University School of Medicine, Stanford, CA 26 94035, USA 27 11GeneDx, Gaithersburg, MD, 20877, USA 28 12Department of Child health, college of Medicine and Health Sciences, Sultan Qaboos University, Muscat, 29 Oman 30 13Genetic and Developmental Medicine Clinic, Sultan Qaboos University Hospital, Muscat, Oman 31 14Department of Radiology and molecular imaging, Sultan Qaboos University Hospital, Muscat, Oman 32 15Department of Clinical Genetics, Radboud University, Nijmegen, The Netherlands 33 16Department of Medical Genetics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, 34 Iran 35 17Medical Genetics Research Center, Mashhad University of Medical Sciences, Mashhad, Iran 36 18Genetic Center of Khorasan Razavi, Mashhad, Iran 37 19Student Research Committee, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran 38 20Genetic Counseling Center, Welfare Organization of Sistan and Baluchestan, Zahedan, Iran 39 21Department of Modern Sciences and Technologies, Faculty of Medicine, Mashhad University of Medical 40 Sciences, Mashhad, Iran. 41 22Genetics Research Centre, Molecular and Clinical Sciences Institute, St. George’s, University, London, SW17 42 ORE, United Kingdom 43 23Department of Neuromuscular Disorders, UCL Queen Square Institute of Neurology, London, WC1N 3BG, 44 United Kingdom. 45 24Department of Paediatric Neurology, Children’s hospital and institute of Child health, Multan 60000, Pakistan 46 25CENTOGENE AG, Rostock, Germany 47 26Department of Pediatrics, Tawam Hospital, and College of Medicine and Health Sciences, UAE University, Al- 48 Ain, UAE 49 50 * equal contribution, shared second author 51 # corresponding Author: [email protected]

1

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

52 53 Running title: Loss of UGP2 causes a severe epileptic encephalopathy 54 55 56 Key words: epileptic encephalopathy, UGP2, ATG mutations, start-loss mutation, genetics, whole 57 exome sequencing, microcephaly, recurrent mutation, founder mutation, essential 58

59

60 Abstract:

61 Developmental and/or epileptic encephalopathies (DEEs) are a group of devastating genetic 62 disorders, resulting in early onset, therapy resistant seizures and developmental delay. Here we 63 report on 12 individuals from 10 families presenting with a severe form of intractable epilepsy, 64 severe developmental delay, progressive microcephaly and visual disturbance. Whole exome 65 sequencing identified a recurrent, homozygous variant (chr2:64083454A>G) in the essential UDP- 66 glucose pyrophosphorylase (UGP2) gene in all probands. This rare variant results in a tolerable 67 Met12Val missense change of the longer UGP2 isoform but causes a disruption of the start 68 codon of the shorter isoform. We show that the absence of the shorter isoform leads to a reduction 69 of functional UGP2 in brain cell types, leading to altered glycogen metabolism, upregulated 70 unfolded protein response and premature neuronal differentiation, as modelled during pluripotent 71 stem cell differentiation in vitro. In contrast, the complete lack of all UGP2 isoforms leads to 72 differentiation defects in multiple lineages in human cells. Reduced expression of Ugp2a/Ugp2b in 73 vivo in zebrafish mimics visual disturbance and mutant animals show a behavioral phenotype. Our 74 study identifies a recurrent start codon mutation in UGP2 as a cause of a novel autosomal recessive 75 DEE. Importantly, it also shows that isoform specific start-loss mutations causing expression loss of a 76 tissue relevant isoform of an essential protein can cause a genetic disease, even when an organism- 77 wide protein absence is incompatible with life. We provide additional examples where a similar 78 disease mechanism applies.

79

80

81

82

83

84

85

86

87

2

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

88 Introduction:

89 Developmental and/or epileptic encephalopathies (DEEs) are a heterogeneous group of genetic 90 disorders, characterized by severe epileptic seizures in combination with developmental delay or 91 regression1. Genes involved in multiple pathophysiological pathways have been implicated in DEEs, 92 including synaptic impairment, ion channel alterations, transporter defects and metabolic processes 93 such as disorders of glycosylation2. Mostly, dominant acting, de novo mutations have been identified 94 in children suffering from DEEs3, and only a limited number of genes with a recessive mode of 95 inheritance are known so far, with a higher occurrence rate in consanguineous populations4. A recent 96 cohort study on DEEs employing whole exome sequencing (WES) and copy-number analysis, 97 however, found that up to 38% of diagnosed cases might be caused by recessive genes, indicating 98 that the importance of this mode of inheritance in DEEs has been underestimated5.

99 The contains ~20,000 genes of which more than 5,000 have been implicated in 100 genetic disorders. Wide-scale population genomics studies and CRISPR-Cas9 based loss-of-function 101 (LoF) screens have identified around 3000-7000 genes that are essential for the viability of the 102 human organism or result in profound loss of fitness when mutated. In agreement with that they are 103 depleted for LoF variants in the human population6. For some of these essential genes it is believed 104 that LoF variants are incompatible with life and are therefore unlikely to be implicated in genetic 105 disorders presenting in postnatal life7. One such example is the UDP-glucose pyrophosphorylase 106 (UGP2) gene at 2. UGP2 is an essential octameric enzyme in nucleotide-sugar 107 metabolism8-10, as it is the only known enzyme capable of catalyzing the conversion of glucose-1- 108 phosphate to UDP-glucose11,12. UDP-glucose is a crucial precursor for the production of glycogen by 109 (GYS)13,14, and also serves as a substrate for UDP-glucose:glycoprotein 110 (UGGT) and UDP-glucose-6-dehydrogenase (UGDH), thereby playing important roles in glycoprotein 111 folding control, glycoconjugation and UDP-glucuronic acid synthesis. The latter is an obligate 112 precursor for the synthesis of and proteoglycans of the extracellular matrix15,16, 113 of which aberrations have been associated with DEEs and neurological disorders17-20. UGP2 has 114 previously been identified as a marker protein in various types of malignancies including gliomas 115 where its upregulation is correlated with a poor disease outcome21-28, but has so far not been 116 implicated in genetic diseases and it has been speculated that this is given its essential role in 117 metabolism8.

118 Many genes are differentially expressed amongst tissues, regulated by non-coding regulatory 119 elements29. In addition, it has become clear that there are more than 40,000 protein isoforms 120 encoded in the human genome, whose expression levels vary amongst tissues. Although there are 121 examples of genetic disorders caused by the loss of tissue specific protein isoforms30-33, it is unknown 122 whether a tissue-relevant loss of an essential gene can be involved in human disease. Here, we 123 report on such a scenario, providing evidence that a novel form of a severe DEE is caused by the 124 brain relevant loss of the essential gene UGP2 due to an isoform specific and germ line transmitted 125 start codon mutation. We present data that this is likely a more frequent disease mechanism in 126 human genetics, illustrating that essential genes for which organism-wide loss is lethal can still be 127 implicated in genetic disease when only absent in certain tissues due to expression misregulation.

128

3

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

129 Results:

130 A recurrent ATG mutation in UGP2 in 12 individuals presenting with a severe DEE

131 We encountered a three-month old girl (Figure 1A, family 1, individual 1), that was born as the first 132 child to healthy non-consanguineous Dutch parents, by normal vaginal delivery after an uneventful 133 pregnancy conceived by ICSI. She presented in the first weeks of life with irritability and jitteriness, 134 that developed into infantile spasms and severe epileptic activity on multiple 135 electroencephalograms, giving rise to a clinical diagnosis of West syndrome (Figure 1B). Despite the 136 use of multiple anti-epileptic drugs, including ACTH and a ketogenic diet, seizures remained 137 intractable and occurred daily. Severe developmental delay was evident without acquisition of any 138 noticeable developmental milestones, causing the need for gastrointestinal tube feeding. Visual 139 tracking was absent, and foveal hypopigmentation, hypermetropia and mild nystagmus were noticed 140 upon ophthalmological investigation. MRI brain imaging showed no gross structural abnormalities or 141 migration disorders at the age of 4 months, but displayed reduced white matter, that further 142 developed into global atrophy with wide sulci and wide pericerebral liquor spaces at the age of 17 143 months (Figure 1C, Supplementary Figure 1B). At that time, she had become progressively 144 microcephalic, with a head circumference of -2.96 SD at the last investigation at 23 months of age 145 (Supplementary Figure 1A). She showed a number of minor dysmorphisms, including a sloping 146 forehead, elongated head with suture ridging, bitemporal narrowing, a relatively small mouth and 147 large ears (Figure 1A). Neurological examination showed brisk, symmetric deep tendon reflexes, 148 more pronounced at the upper limbs. Routine investigations, including metabolic screening in urine, 149 plasma and cerebrospinal fluid were normal. A SNP-array showed a normal female chromosomal 150 profile, with a large, ~30 Mb run of homozygosity (ROH) at chromosome 2, and a few smaller ROH 151 regions, adding up to 50 Mb ROH regions in total, pointing to an unrecognized common ancestor of 152 both parents (coefficient of inbreeding 1/64). Subsequent trio WES did not show any disease-causing 153 variants in known DEE genes, but identified a homozygous variant (chr2:64083454A>G) in UGP2, 154 located in the large ROH region (Figure 1D), with no other disease implicated variants observed in 155 that region. Both parents were heterozygous carriers of the same variant. Via Genematcher34 and our 156 network of collaborators, we identified 11 additional individuals from 9 unrelated families (of which 157 8 were consanguineous), harboring the exact same homozygous variant and presenting with an 158 almost identical clinical phenotype of intractable seizures, severe developmental delay, visual 159 disturbance, microcephaly and similar minor dysmorphisms (Figure 1A, C, E, Supplementary Figure 160 1B, Supplementary Case reports, Supplementary Table 1 for detailed information on 11 cases). Six 161 of these individuals passed away before the age of 3.5 years. In 4 families, at least 4 already 162 deceased siblings had a similar phenotype but could not be investigated. Two families were of Indian 163 descent (both with ancestors from regions currently belonging to Pakistan), living in Canada (family 164 2) and the USA (family 3), with the remaining families coming from Oman (family 4, originally from 165 Pakistan), Pakistan (family 5), Iran (family 6, 7 and 8) and UAE (family 9). One additional case in a 166 family from Oman was identified presenting with intractable seizures and microcephaly, but no 167 detailed medical information could be obtained at this point.

168 Having identified at least 12 individuals with an almost identical clinical phenotype and an identical 169 homozygous variant in the same gene, led us to pursue UGP2 as a candidate gene for a new genetic 170 form of DEE. UGP2 is highly expressed in various brain regions (Figure 1F), and also widely expressed 171 amongst other tissues, including liver and muscle according to the data from the GTEx portal35

4

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

172 (Supplementary Figure 1D). The (chr2:64083454A>G) variant is predicted to cause a missense variant 173 (c.34A>G, p.Met12Val) in UGP2 isoform 1 (NM_006759), and to cause a translation start loss (c.1A>G, 174 p.?.) of UGP2 isoform 2 (NM_001001521), referred to as long and short isoform, respectively. The 175 variant has not been reported in the Epi25 web browser36, ClinVar37, LOVD38, Exome Variant Server39, 176 DECIPHER40, GENESIS41 , GME variome42 or Iranome databases43, is absent from our in-house data 177 bases and is found only 15 times in a heterozygous, but not homozygous, state in the 280,902 alleles 178 present in gnomAD (MAF: 0.00005340)44. In the GeneDx unaffected adult cohort, the variant was 179 found heterozygous 10 times out of 173,502 alleles (MAF: 0.00005764), in the ~10,000 exomes of the 180 Queen Square Genomic Center database two heterozygous individuals were identified, and out of 181 45,921 individuals in the Centogene cohort, 10 individuals are heterozygous for this variant. The 182 identified variant has a CADD score (v1.4) of 19.2245 and Mutation Taster46 predicted this variant as 183 disease causing. The nucleotide is strongly conserved over multiple species (Figure 1G). Analysis of 184 WES data from 6 patients did provide evidence of a shared ROH between patients from different 185 families, indicating that this same variant might represent an ancient mutation that originated some 186 26 generations ago (Supplementary Figure 1C). Interestingly, since most families originally came 187 from regions of India, Pakistan and Iran, overlapping with an area called Balochistan, this could 188 indicate that the mutation has originated there around 600 years ago. As Dutch traders settled in 189 that area in the 17th century, it is tempting to speculate that this could explain the co-occurrence of 190 the variant in these distant places47.

191 Short UGP2 isoform is predominantly expressed in brain and absent in patients with ATG mutations

192 Both UGP2 isoforms only differ by 11 amino acids at the N-terminal (Figure 2A) and are expected to 193 be functionally equivalent8. To investigate how the A>G variant may cause DEE, we first obtained 194 fibroblasts from individual 1 (homozygous for the A>G variant) and her heterozygous parents and 195 analyzed the isoform expression by Western blotting (Figure 2B). Whereas the two isoforms were 196 equally expressed in wild type fibroblasts, the expression of the shorter isoform was diminished to 197 ~25% of total UGP2 in heterozygous parents, both of individual 1 (Figure 2B, C) and of individual 2 198 and 3 (Supplementary figure 2A, B), and was absent in cells from the affected individual 1 (Figure 2 199 B, C; fibroblasts of the affected children in family 2 or other families were not available). Total UGP2 200 levels were not significantly different between the affected child and her parents, or between 201 parents and wild type controls (Figure 2D, Supplementary Figure 2C). This indicates that the long 202 isoform harboring the Met12Val missense variant is upregulated in fibroblast when the short isoform 203 is missing. Moreover, this indicates that Met12Val does not affect the stability of the long isoform at 204 the protein or transcript level (Supplementary Figure 2D, E, F). RNA-seq on peripheral blood samples 205 of family 1 did not identify altered splicing events of UGP2 and the global transcriptome of the 206 proband was not different from her parents, although only a limited analysis could be performed as 207 only a single sample was available for each individual (Supplementary Figure 2G, H). Both 208 homozygous and heterozygous fibroblasts had a similar proliferation rate compared to wild type 209 fibroblasts (Figure 2E, Supplementary Figure 2I), and immunocytochemistry confirmed a similar 210 subcellular localization of UGP2 in mutant and wild type cells (Figure 2F). We then measured the 211 enzymatic activity of UGP2 in wild type, heterozygous and homozygous fibroblasts, and found that 212 mutant fibroblast had a similar capacity to produce UDP-glucose in the presence of exogenously 213 supplied glucose-1-phosphate and UTP (Figure 2G). Altogether, this indicates that the long UGP2 214 isoform harboring the Met12Val missense change is functional and is therefore unlikely to contribute 215 to the patient phenotype.

5

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

216 As the A>G variant results in a functional long UGP2 isoform but abolishes the translation of the 217 shorter UGP2 isoform, we next investigated whether the ratio between short and long isoform 218 differs amongst tissues. If so, the homozygous A>G variant would lead to depletion of UGP2 in tissues 219 where mainly the short isoform is expressed, possibly below a threshold that is required for normal 220 development or function. Western blotting on cellular extracts derived from wild type H9 human 221 embryonic stem cells (ESCs), commercially acquired H9-derived neural stem cells (NSCs) and 222 fibroblasts (Figure 3A) showed that, whereas the ratio between short and long isoform in fibroblasts 223 was around 0.5, in ESCs it was 0.14 and in NSCs 0.77, indicating that the shorter UGP2 isoform is the 224 predominant one in NSCs (Figure 3B). A similar trend was observed when assessing the transcript 225 level, both by multiplex RT-PCR and RT-qPCR, using primers detecting specifically the short and long 226 transcript isoform (Figure 3C, D, E). This indicates that differential isoform expression between cell 227 types is regulated at the transcriptional level, possibly hinting at tissue-specific regulatory elements 228 driving isoform expression. We next analyzed RNA-seq data from human fetal tissues48-51 to 229 determine the fraction of reads covering short versus total UGP2 transcripts (Figure 3F). This showed 230 that in human fetal brain the short transcript isoform is predominantly expressed. To gain more 231 insight into the cell type-specific expression of UGP2, we performed immunohistochemistry on 232 human fetal brain tissues from the first to third trimester of pregnancy (Figure 3G). In the 233 first trimester we found pale labeling of neuropil in the proliferative neuroepithelium of the 234 hypothalamic, cortical, mesencephalic and thalamic regions (Figure 3G-A/I, II, III, IV), as well as the 235 marginal zone of the spinal cord (Figure 3G-A/V) and cuboidal epithelial cells of choroid plexus 236 (Figure 3G-A/VI). During the second trimester, UGP2 positivity was detected in neurons from the 237 subplate region of the cerebral cortex (Figure 3G-B/I, II) and still in some of the cells in the 238 neuroepithelium and subventricular zone (Figure 3G-B/III). Almost the same pattern of UGP2 239 distribution was found in the cerebral cortex of fetuses from the 3rd trimester. Also, we found clear 240 cytoplasmatic UGP2 expression in neurons from mesencephalic, inferior olivary and cerebellar nuclei 241 during the second (Figure 3G-B/IV, V, and VI) and third trimester, respectively (Figure 3G-C/IV, V). In 242 the white matter of the cerebellum in the third trimester, we identified single positive glial cells 243 (Figure 3G-C/VI). In the cerebellar cortex we did not find specific positivity of cells on UGP2 (Figure 244 3G-B, C/VII). Cuboidal epithelial cells of choroid plexus preserved UGP2 positivity during the 245 second trimester (Figure 3G-B/VIII) but lost it in the third trimester (Figure 3G-C/VIII). Together this 246 indicates that UGP2 can be detected in a broad variety of cell types during brain development. On 247 Western blotting, we noticed preferential expression of the shorter UGP2 isoform in the developing 248 cortex and cerebellum from gestational weeks 14, 20 and 28 (Figure 3H) and in the frontal cortex of 249 brains from weeks 21 and 23 (Supplementary Figure 2J). Together, this supports the hypothesis that 250 the DEE phenotype in patients is caused by a major loss of functional UGP2 in the brain, as the short 251 isoform represents virtually all UGP2 produced in this tissue.

252 Lack of the short UGP2 isoform leads to transcriptome changes upon differentiation into neural stem 253 cells

254 To model the disease in vitro, we first engineered the homozygous A>G mutation in H9 ESCs to study 255 the mutation in a patient independent genetic background and compare it to isogenic parental cells. 256 We obtained two independent clones harboring the homozygous A>G change (referred to as knock- 257 in, KI, mutant) and two cell lines harboring an insertion of an additional A after nucleotide position 42 258 of UGP2 transcript 1 (chr2:64083462_64083463insA) (Supplementary Figure 3A, B) (referred to as 259 knockout, KO). This causes a premature stop codon at position 47 (D15Rfs*33), leading to

6

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

260 nonsense mediated mRNA decay and complete absence of UGP2 protein (Supplementary Figure 3C). 261 All derived ESCs had a normal morphology and remained pluripotent as assessed by marker 262 expression (Supplementary Figure 3D, E), indicating that the absence of UGP2 in ESCs is tolerated, in 263 agreement with genome-wide LoF CRISPR screens which did not identify UGP2 as an essential gene 264 in ESCs52,53. We differentiated wild type, KI and KO ESCs into NSCs, using dual SMAD inhibition 265 (Supplementary Figure 4 A-C). Wild type cells could readily differentiate into NSCs, having a normal 266 morphology and marker expression, whereas differentiation of KI and KO cells was more variable and 267 not all differentiations resulted in viable, proliferating NSCs. KO cells could not be propagated for 268 more than 5 passages under NSC culture conditions (data not shown), which could indicate that the 269 total absence of UGP2 protein is not tolerated in NSCs. When assessed by Western blotting, total 270 UGP2 protein levels were reduced in KI cells and depleted in KO cells compared to wild type 271 (Supplementary Figure 4D, E).

272 Next, we performed RNA-seq of wild type, KI and KO ESCs and NSCs to assess how depletion of UGP2 273 upon NSCs differentiation would impact on the global transcriptome (Figure 4, Supplementary 274 Figure 5, Supplementary Table 2). In agreement with normal proliferation and morphology of KI and 275 KO ESCs, all ESCs shared a similar expression profile of pluripotency associated genes and only few 276 genes were differentially expressed between the three genotypes (Supplementary Figure 5C, 277 Supplementary Table 3). This indicates that the absence of UGP2 in ESCs does not lead to major 278 transcriptome alterations despite the central role of this enzyme in metabolism. Upon 279 differentiation, cells from all genotypes expressed NSC markers (Supplementary Figure 5F), but when 280 comparing wild type and KO cells, we observed noticeable changes, that were less pronounced in KI 281 NSCs but still followed a similar trend (Figure 4A, B, Supplementary Figure 5D, E). Gene enrichment 282 analysis showed that genes downregulated in KO and KI cells were implicated in processes related to 283 the extra-cellular matrix, cell-cell interactions and metabolism, while genes upregulated in KO and KI 284 cells were enriched for synaptic processes and genes implicated in epilepsy (Figure 4C, 285 Supplementary Table 4). Both KO and KI cells showed an upregulation of neuronal expressed genes, 286 indicating a tendency to differentiate prematurely. To validate RNA-seq findings, we tested several 287 genes by RT-qPCR in wild type, KI and KO cells (Figure 4D). We also included KO rescue cells, in which 288 we had restored the expression of either the wild type or the mutant UGP2 long isoform, leading 289 each to an approximately 4-fold UGP2 overexpression at the NSC state compared to WT 290 (Supplementary Figure 4F). Amongst the tested genes was NNAT, which showed a significant 291 upregulation in KI and KO cells, which was rescued by restoration of UGP2 expression in KO NSCs. 292 NNAT encodes neuronatin that stimulates glycogen synthesis by upregulating glycogen synthase and 293 was previously found to be upregulated in Lafora disease. This lethal teen-age onset 294 neurodegenerative disorder presenting with myoclonic epilepsy is caused by mutations in the 295 ubiquitin ligase malin, leading to accumulation of altered polyglucosans54. Malin can ubiquitinate 296 neuronatin leading to its degradation. As reduced UGP2 expression might impact on glycogen 297 production, it seems plausible that this results in compensatory NNAT upregulation and in 298 downstream aberrations contributing to the patient phenotypes. Indeed, neuronatin upregulation 299 was shown to cause increased intracellular Ca2+ signaling, ER stress, proteasomal dysfunction and cell 300 death in Lafora disease55,56, and was shown to be a stress responsive protein in the outer segment of 301 retina photoreceptors57,58. Another interesting gene upregulated in KI and KO NSCs and 302 downregulated in rescue cell lines was the autism candidate gene FGFBP359. This secreted 303 proteoglycan that enhances FGF signaling is broadly expressed in brain60, and functions as an

7

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

304 extracellular chaperone for locally stored FGFs in the ECM, thereby influencing glucose metabolism 305 by regulating rate-limiting in gluconeogenesis61. Other potentially relevant genes displaying 306 the same expression trend were the heparan sulphate proteoglycan GPC2 (a marker of immature 307 neurons62,63), the helix-loop-helix transcription factor ID4 (a marker of postmitotic neurons64), and 308 the signaling molecule FGFR3 that has been implicated in epilepsy65. Genes downregulated in KO 309 cells and upregulated in rescue cells included urokinase-type plasminogen activator PLAU (deficiency 310 in mouse models increases seizure susceptibility66), the glycoprotein GALNT7 (upregulation of which 311 has been found to promote glioma cell invasion67) and the brain tumor gene MYBL1 (that has been 312 shown to be regulated by O-linked N-acetylglucosamine68. Similar expression changes were observed 313 in NSCs differentiated from induced pluripotent stem cells (iPSCs) that we had generated from family 314 1 (Supplementary Figure 6). Together, RNA-seq showed that whereas the absence of UGP2 is 315 tolerated in ESCs, its complete absence or reduced expression results in global transcriptome 316 changes in NSCs, with many affected genes implicated in DEE relevant pathways.

317

318 Absence of short UGP2 isoform leads to metabolic defects in neural stem cells

319 To investigate how reduced UGP2 expression levels in KO and KI cells would impact on NSC 320 metabolism, we investigated the capacity to produce UDP-glucose in the presence of exogenously 321 supplied glucose-1-phospate and UTP. KO NSCs showed a severely reduced ability to produce UDP- 322 glucose (Figure 5A). This reduction was rescued by ectopic overexpression of both long wild type and 323 long mutant UGP2. KI cells showed a slightly reduced activity in ESCs (Supplementary Figure 7A), but 324 a more strongly reduced activity in NSCs compared to wild type (Figure 5A), correlating with total 325 UGP2 expression levels (Supplementary Figure 4D, E). Surprisingly, contrary to KO NSCs, KO ESC 326 showed some residual capacity to produce UDP-glucose despite the complete absence of UGP2 327 (Supplementary Figure 7A). This could indicate that a yet to be identified enzyme can partially take 328 over the function of UGP2 in ESCs but not NSCs, which might explain the lack of expression changes 329 in this cell type upon UGP2 loss. iPSCs showed similar results (Supplementary Figure 7B). We next 330 assessed the capacity to synthesize glycogen under low oxygen conditions by PAS staining, as it was 331 previously shown that hypoxia triggers increased glycogen synthesis69. As expected, wild type ESCs 332 cultured for 48 hours under hypoxia showed an intense cytoplasmic PAS staining in most cells 333 (Supplementary Figure 7C, D), while KO ESCs showed a severely reduced staining intensity. This 334 indicates that under hypoxia conditions, the residual capacity of ESC to produce UDP-glucose in the 335 absence of UGP2 is insufficient to produce glycogen. KI ESCs were indistinguishable from wild type 336 (Supplementary Figure 7D). At the NSC state, many KO cells kept at low oxygen conditions for 48 337 hours died (data not shown) and those KO cells that did survive were completely depleted from 338 glycogen granules (Figure 5B, C). This could be rescued by overexpression of both wild type or 339 mutant long UGP2 isoform. KI NSCs showed a more severe reduction in PAS staining compared to the 340 ESC state (Figure 5B, C), and we observed similar findings in patient iPSC derived NSCs 341 (Supplementary Figure 7E). Together, this further indicates that upon neural differentiation the 342 isoform expression switch renders patient cells depleted of UGP2, leading to a reduced capacity to 343 synthesize glycogen. This can directly be involved in the DEE phenotype, as, besides affecting energy 344 metabolism, reduction of glycogen in brain has been shown to result in I) impairment of synaptic 345 plasticity70; II) reduced clearance of extracellular potassium ions leading to neuronal 346 hypersynchronization and seizures71-73; and III) altered glutamate metabolism74. To investigate how

8

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

347 reduced UDP-glucose levels would impact on glycosylation, we next, investigated glycosylation levels 348 by means of LAMP2, a lysosomal protein known to be extensively glycosylated both by N-linked and 349 O-linked glycosylation75. We found that KO NSCs show hypoglycosylation of LAMP2 that is rescued by 350 the over expression of both WT and mutant long isoform (Figure 5D). In contrast, in ESCs no 351 glycosylation defects were noticed (Supplementary Figure 7F). Finally, we investigated whether the 352 absence of UGP2, affecting protein glycosylation, could induce ER stress and thus unfolded protein 353 response (UPR). Whereas in ESCs, the absence of UGP2 did not result in a detectable effect on UPR 354 markers (Supplementary Figure 7G), in NSCs we noticed an increased expression of these genes both 355 in KO and in KI cells (Figure 5E). This indicates that NSCs having UGP2 levels under a certain threshold 356 are more prone to ER-stress and UPR. In agreement with this, we did not observe upregulation of 357 UPR markers in patient derived fibroblast, which have similar total UGP2 expression levels compared 358 to controls (Supplementary Figure 7H). Together this indicates that upon differentiation to NSCs, KI 359 cells become sufficiently depleted of UGP2 to have reduced synthesis of UDP-glucose, leading to 360 defects in glycogen synthesis and protein glycosylation and to the activation of UPR response. 361 Alterations of these crucial processes are likely to be implicated in the pathogenesis leading to 362 increased seizure susceptibility, altered brain microstructure and progressive microcephaly.

363 Ugp2a and Ugp2b double mutant zebrafish recapitulate metabolic changes during brain 364 development, have an abnormal behavioral phenotype, visual disturbance, and increased seizure 365 susceptibility

366 Finally, to model the consequences of the lack of UGP2 in vivo, we generated zebrafish mutants for 367 both ugp2a and ugp2b, the zebrafish homologs of UGP2, using CRISPR-Cas9 injections in fertilized 368 oocytes in a background of a radial glia/neural stem cell reporter76. Double homozygous mutant lines 369 having frameshift deletions for both genes confirmed by Sanger sequencing could be generated but 370 the only viable combination, obtained with ugp2a loss, created a novel ATG in 2 of ugp2b, 371 leading to a hypomorphic allele (Figure 6A). Homozygous ugp2a/b mutant zebrafish had a normal 372 gross morphology of brain and radial glial cells (Figure 6B), showed a largely diminished activity to 373 produce UDP-glucose in the presence of exogenously supplied glucose-1-phospate and UTP (Figure 374 6C), and showed a reduction in c-FOS expression levels, indicating reduced global neuronal activity 375 (Figure 6D). To monitor possible spontaneous seizures, we performed video tracking experiments of 376 developing larvae under light-dark cycling conditions at 5 days post fertilization (dpf). Control larvae 377 show increased locomotor activity under light conditions, and although ugp2 double mutant larvae 378 still responded to increasing light conditions, they showed a strongly reduced activity (Figure 6E, F). 379 This could indicate that their capability to sense visual cues is diminished, or that their tectal 380 processing of visual input is delayed, resulting in reduced movements. Strikingly, upon careful 381 inspection, we noticed that ugp2 double mutant larvae did not show spontaneous eye movements, 382 in contrast to age-matched control larvae (Figure 6G, Supplemental Movie 1 and 2). Whereas we did 383 not observe an obvious spontaneous epilepsy phenotype in these double mutant larvae, upon 384 stimulation with 4-aminopyridine (4-AP), a potent convulsant, double mutant larvae showed an 385 increased frequency and duration of movements at high velocity compared to controls, which might 386 indicate an increased seizure susceptibility (Figure 6H, I). Taken together, severely reduced 387 Ugp2a/Ugp2b levels result in a behavior defect with reduced eye movements, indicating that also in 388 zebrafish Ugp2 plays an important role in brain function.

9

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

389 UGP2 is an essential gene in humans and ATG mutations of tissue specific isoforms of essential genes 390 potentially cause more rare genetic diseases

391 Several lines of evidence argue that UGP2 is essential in humans. First, no homozygous LoF variants 392 or homozygous exon-covering deletions for UGP2 are present in gnomAD or GeneDx controls, and 393 homozygous variants in this gene are limited to non-coding changes, synonymous variants and 5 394 missense variants, together occurring only 7 times homozygous (Supplementary Table 5). Also, no 395 homozygous or compound heterozygous UGP2 LoF variants were found in published studies on 396 dispensable genes in human knockouts77-79, or in the Centogene (CentoMD) or GeneDx patient 397 cohorts, encompassing together many thousands of individuals, further indicating that this gene is 398 intolerant to loss-of-function in a bi-allelic state. In addition, no homozygous deletions of the region 399 encompassing UGP2 are present in DECIPHER40 or ClinVar37. Second, UGP2 has been identified as an 400 essential gene using gene-trap integrations80 and in CRISPR-Cas9 LoF screens in several human cell 401 types81-85. Finally, studies in yeast86,87, fungus88 and plants89-91 consider the orthologs of UGP2 as 402 essential, and the absence of Ugp2 in mice is predicted to be lethal92. In flies, homozygous UGP 403 knock-outs are lethal while only hypomorphic compound heterozygous alleles are viable but have a 404 severe movement defect with altered neuromuscular synaptogenesis due to glycosylation defects93. 405 To further investigate the essentiality of UGP2, we performed differentiation experiments of our WT, 406 KO and rescue ESCs. Differentiation of KO ESCs into hematopoietic stem cells (HSCs) resulted in 407 severe downregulation of GATA2 compared to wild type cells, and this was restored in rescue cell 408 lines (Figure 7A). GATA2 is a key transcription factor in the developing blood system, and knockout of 409 Gata2 is embryonic lethal in mice due to defects in HSC generation and maintenance94,95. 410 Differentiation of ESCs into cardiomyocytes similarly affected key marker gene expression in KO cells, 411 and these changes were restored upon UGP2 rescue (Figure 7B, C). Whereas WT ESCs could generate 412 beating cardiomyocytes after 10 days, these were not seen in KO ESCs. Taken together this argues 413 that the complete absence of UGP2 in humans is probably incompatible with life, a hypothesis that 414 cannot be tested directly. However, if true, this could well explain the occurrence of the unique 415 recurrent mutation in all cases presented herein. Given the structure of the UGP2 locus (Figure 2A), 416 every LoF variant would affect either the long isoform, when located in the first 33 nucleotides of the 417 cDNA sequence, or both the short and long isoform when downstream to the ATG of the short 418 isoform. Therefore, the short isoform start codon is the only mutational target that can disrupt 419 specifically the short isoform. In this case, the Met12Val change introduced into the long isoform 420 does not seem to disrupt UGP2 function to such an extent that this is intolerable and therefore 421 allows development to proceed for most tissues. However, the lack of the short UGP2 isoform 422 caused by the start codon mutation results in a depletion of functional UGP2 in tissues where 423 normally the short isoform is predominantly expressed. In brain this reduction diminishes total UGP2 424 levels below a threshold for normal development, causing a severe epileptic encephalopathy 425 syndrome. Given the complexity of the human genome with 42976 transcripts with RefSeq peptide 426 IDs, perhaps also other genetic disorders might be caused by such tissue restricted depletion of 427 essential . Using a computational homology search of human proteins encoded by different 428 isoforms, we have identified 1766 genes that share a similar structure to the UGP2 locus (e.g. a 429 shorter protein isoform that is largely identical to the longer protein isoform, translated from an ATG 430 that is contained within the coding sequence of the long isoform) (Figure 7D). When filtering these 431 genes for 1) those previously shown to be essential6, 2) not associated with disease (e.g. no OMIM 432 phenotype) and 3) those proteins where the shorter isoform is no more than 50 amino acids

10

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

433 truncated at the N-terminal compared to the longer isoform, we identified 247 genes 434 (Supplementary Table 6). When comparing the ratios of isoform specific reads obtained from 435 different fetal RNA-seq data48-51 we noticed that many of these genes show differential isoform 436 expression amongst multiple tissues, with many genes showing either expression of the long or the 437 short isoform in a particular tissue (Figure 7E). Homozygous LoF variants or start codon altering 438 mutations in these genes are rare in gnomAD (Supplementary Table 7), and it is tempting to 439 speculate that mutations in start codons of these genes could be associated with human genetic 440 diseases, as is the case for UGP2. Using mining of data from undiagnosed patients from our own 441 exome data base, the Queen Square Genomic Center database and those from Centogene and 442 GeneDx, we found evidence for several genes out of the 247 having rare, bi-allelic variants affecting 443 the start codon of one of the isoforms that could be implicated in novel disorders (unpublished 444 observations) and give one such example in the Supplementary Note. Together, these findings 445 highlight the relevance of mutations resulting in tissue-specific protein loss of essential genes for 446 genetic disorders.

447

448

449

450

451

452

453

454

455

456

457

458

459

460

461

462

11

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

463 Discussion:

464 Here we describe a recurrent variant in 12 individuals from 10 families, affecting the start codon of 465 the shorter isoform of the essential gene UGP2 as a novel cause of a severe DEE. Using in vitro and in 466 vivo disease modeling, we provide evidence that the reduction of UGP2 expression in brain cells 467 leads to global transcriptome changes, a reduced ability to produce glycogen, alterations in 468 glycosylation and increased sensitivity to ER stress, which together can explain the phenotype 469 observed in the patients. Most likely our findings in vitro underestimate the downstream effects in 470 patient cells, as in fetal brain the longer isoform expression is almost completely silenced and 471 virtually all UGP2 comes from the shorter isoform, which in patient cells cannot be translated. During 472 our in vitro NSC differentiation this isoform switch is less complete, leaving cells with the patient 473 mutation with some residual UGP2. Strikingly, the clinical phenotype seems to be very similar in all 474 cases, including intractable seizures, absence of developmental milestones, progressive microcephaly 475 and a disturbance of vision, with retinal pigment changes observed in all patients who had 476 undergone ophthalmological examination. Also, all patients seem to share similar, although mild, 477 dysmorphisms, possibly making this condition a recognizable syndrome.

478 The involvement of UGP2 in genetic disease is surprising. Given its central role in nucleotide-sugar 479 metabolism it is expected that loss of this essential protein would be incompatible with life, and 480 therefore loss-of-function should not be found in association with postnatal disease. Our data argue 481 that indeed a total absence of UGP2 in all cells is lethal, but that tissue-specific loss, as caused here 482 by the start codon alteration of an isoform important for brain, can be compatible with postnatal 483 development but still results in a severe phenotype. Given that any other LoF variant across this gene 484 would most likely affect both protein isoforms, this could also explain why only a single mutation is 485 found in all individuals. The fact that the Met12Val long isoform was able to rescue the full KO 486 phenotype indicates that the missense change introduced to the long protein isoform does not affect 487 UGP2 function. As other variants at this start codon, even heterozygous, are not found, possibly 488 missense variants encoding for leucine, lysine, threonine, arginine or isoleucine (e.g. amino acids that 489 would be encoded by alternative changes affecting the ATG codon) at this amino acid location in the 490 long isoform could not produce a functional protein and are therefore not tolerated. Although start 491 codon mutations have previously been implicated in disease96,97, there are no reports, to our 492 knowledge, on disorders describing start codon alterations of other essential genes, leading to 493 alterations of tissue specific isoforms. Using a genome-wide homology search, we have identified a 494 large list of other essential genes with a similar locus structure and variable isoform expression 495 amongst tissues, where similar ATG altering variants could affect tissue-relevant expression. An 496 intriguing question is why evolution has resulted in a large number of genes encoding almost 497 identical protein isoforms. It will be interesting to further explore the mutational landscape of these 498 genes in cohorts of currently unexplained patients.

499

500

501

502

12

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

503 Author contributions:

504 EP performed molecular biology experiments, with help from AN and DP. HvdL, WB and TvH 505 performed zebrafish work. PvdB and EHJ performed enzymatic analyses. IC performed brain 506 immunohistochemistry and supplied tissues samples. EA supplied tissue samples. MG generated 507 iPSCs. WvI and WGdV performed and SY analysed RNA-seq. SY performed gene homology search. 508 Patient recruitment and diagnosing was performed in the different families as follows: Family 1: TSB, 509 ASB, and EM phenotyped patient 1, MvS analysed WES; Family 2: LB and MK phenotyped patient 2 510 and 3, KGM, AB, KR analyzed WES; Family 3: JNK and JB phenotyped patient 4, KGM, AB, KR analyzed 511 WES. Family 4: AaF, FaM, RM and FaA phenotyped patient 5, EJK analyzed WES; Family 5: FZ and NR 512 phenotyped patient 6, SE, HH analyzed WES; Family 6, family 7 and family 8: MM, AE, ZK, FMD, MD, 513 EGK phenotyped patients 7-10, JV, RM, HH analyzed WES; Family 9: JH phenotyped patient 11, KKK, 514 ABA analysed WES; RT, KR, KKK, PB, ABA, RM, HH provided genetic data for population analysis. TSB 515 identified patient 1, conceived the study, obtained funding, supervised the lab work and wrote the 516 manuscript, with input from all main authors. All authors approved the final version of the 517 manuscript.

518 Acknowledgements

519 We are indebted to the parents of the patients for their kind cooperation. We thank Virginie 520 Verhoeven and Gerben Schaaf for critically reading our manuscript and Grazia Mancini for helpful 521 discussions. We thank Gerben Schaaf for providing the LAMP2 antibody, and Eskeatnaf Mulugeta for 522 bioinformatics advice. DP was supported by an Erasmus+ Traineeship Programme. HH is supported 523 by the Rosetree Trust, Ataxia UK, MSA Trust, Brain Research UK, Muscular Dystrophy UK, Muscular 524 Dystrophy Association (MDA USA), Higher Education Commission of Pakistan, The MRC 525 (MR/S01165X/1, MR/S005021/1, G0601943), Wellcome Trust (WT093205MA, WT104033AIA and the 526 Synaptopathies Strategic Award, 165908) and the National Institute for Health Research University 527 College London Hospitals Biomedical Research Centre. Families 5, 6, 7 and 8 were collected as part of 528 the SYNaPS Study Group collaboration funded by The Wellcome Trust and strategic award 529 (Synaptopathies) funding. Research for these families was conducted as part of the Queen Square 530 Genomics group at University College London, supported by the National Institute for Health 531 Research University College London Hospitals Biomedical Research Centre. TVH is supported by an 532 Erasmus University Rotterdam (EUR) fellowship. TSB’s lab is supported by the Netherlands 533 Organisation for Scientific Research (ZonMW Veni, grant 91617021), a NARSAD Young Investigator 534 Grant from the Brain & Behavior Research Foundation, an Erasmus MC Fellowship 2017 and Erasmus 535 MC Human Disease Model Award 2018. TSB, IC and EA acknowledge support from COST action 536 CA16118 that facilitated this collaboration.

537 Disclosure:

538 KGM, AB, RT and KR are employees of GeneDx, Inc. KR holds stock in OPKO Health, Inc. KKK, PB and 539 ABA are employees of CENTOGENE AG.

540

541

13

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

542 Online Methods:

543 Patient recruitment

544 All affected probands were investigated by their referring physicians and all genetic analysis was 545 performed in a diagnostic setting. Legal guardians of affected probands gave informed consent for 546 genomic investigations and publication of their anonymized data.

547 Next generation sequencing of index patients

548 Individual 1:

549 Genomic DNA was isolated from peripheral blood leukocytes of proband and both parents and 550 exome-coding DNA was captured with the Agilent Sure Select Clinical Research Exome (CRE) kit (v2). 551 Sequencing was performed on an Illumina HiSeq 4000 with 150bp paired end reads. Reads were 552 aligned to hg19 using BWA (BWA-MEM v0.7.13) and variants were called using the GATK haplotype 553 caller (v3.7 (reference: http://www.broadinstitute.org/gatk/)98. Detected variants were annotated, 554 filtered and prioritized using the Bench lab NGS v5.0.2 platform (Agilent technologies). Initially, only 555 genes known to be involved in epilepsy were analyzed, followed by a full exome analysis revealing 556 the homozygous UGP2 variant

557 Individuals 2, 3 and 4:

558 Using genomic DNA from the proband and parents (individual 4) or the proband, parents, and 559 affected sibling (individual 2 and 3), the exonic regions and flanking splice junctions of the genome 560 were captured using the SureSelect Human All Exon V4 (50 Mb) (individual 4) or the IDT xGen Exome 561 Research Panel v1.0 (individual 2 and 3). Massively parallel (NextGen) sequencing was done on an 562 Illumina system with 100bp or greater paired-end reads. Reads were aligned to human genome build 563 GRCh37/UCSC hg19, and analyzed for sequence variants using a custom-developed analysis tool. 564 Additional sequencing technology and variant interpretation protocol has been previously 565 described99. The general assertion criteria for variant classification are publicly available on the 566 GeneDx ClinVar submission page (http://www.ncbi.nlm.nih.gov/clinvar/submitters/26957/)

567 Individual 5 (Nijmegen)

568 Diagnostic exome sequencing was done at the Departments of Human Genetics of the Radboud 569 University Medical Center Nijmegen, The Netherlands and performed essentially as described 570 previously100.

571 Individual 6, 7, 8, 9 and 10

572 After informed consent, we collected blood samples from the probands, their parents and unaffected 573 siblings, and extracted DNA using standard procedures. To investigate the genetic cause of the 574 disease, WES was performed in the affected proband. Nextera Rapid Capture Enrichment kit 575 (Illumina) was used according to the manufacturer instructions. Libraries were sequenced in an 576 Illumina HiSeq3000 using a 100-bp paired-end reads protocol. Sequence alignment to the human 577 reference genome (UCSC hg19), and variants calling, and annotation were performed as described 578 elsewhere101. After removing all synonymous changes, we filtered single nucleotide variants (SNVs)

14

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

579 and indels, only considering exonic and donor/acceptor splicing variants. In accordance with the 580 pedigree and phenotype, priority was given to rare variants [<1% in public databases, including 1000 581 Genomes project, NHLBI Exome Variant Server, Complete Genomics 69, and Exome Aggregation 582 Consortium (ExAC v0.2)] that were fitting a recessive or a de novo model.

583 Individual 11 and 12

584 Whole exome sequencing was performed at CENTOGENE AG, as previously described102.

585

586 Human brain samples

587 Tissue was obtained, upon informed consent, and used in a manner compliant with the Declaration 588 of Helsinki and the Research Code provided by the local ethical committees. Fetal brains were 589 preserved after spontaneous or induced abortions with appropriate maternal written consent for 590 brain autopsy and use of rest material for research. We performed a careful histological and 591 immunohistochemical analysis and evaluation of clinical data (including genetic data, when 592 available). We only included specimens displaying a normal cortical structure for the corresponding 593 age and without any significant brain pathology.

594 Brain tissue immunohistochemistry

595 For immunohistochemical analysis, we used 2 cases from the first trimester (GW6 and GW9), 4 cases 596 from the second trimester (GW21, GW23, GW24 and GW26) and 2 cases from the third trimester 597 (GW33 and GW36). Anatomical regions were determined according to the atlas of human brain 598 development103-106. We cut 4 µm sections from formalin-fixed, paraffin embedded whole fetuses 599 (GW6 and GW9) and brain tissue from cerebral, mesencephalic, cerebellar and brain stem region 600 (from GW21 to GW36). Slides were stained with mouse anti-UGP2 (C-6) in a 1:150 dilution (Santa 601 Cruz) and visualized using Mouse and Rabbit Specific HRP/DAB (ABC) Detection IHC kit (Abcam). 602 Mayer’s hematoxylin was used as a counterstain for immunohistochemistry followed by mounting 603 and coverslipping (Bio-Optica) for slides. Prepared slides were analyzed and scanned under a 604 VisionTek® Live Digital Microscope (Sakura).

605

606 Cloning of UGP2 cDNA

607 RNA was isolated using TRI reagent (Sigma) from whole peripheral blood of index patient 1 and her

608 parents, after red blood cell depletion with RBC lysis buffer (168mM NH4Cl, 10mM KHCO3, 0.1mM 609 EDTA). cDNA was synthesized following the iSCRIPT cDNA Synthesis Kit (Bio-Rad) protocol, and the 610 coding sequence of the long and short UGP2 isoform (wild type or mutant) was PCR-amplified 611 together with homology arms for Gibson assembly (see Supplementary Table 8 for primer 612 sequences) using Phusion High-Fidelity DNA polymerase (NEB). PCR amplified DNA was then cloned 613 by Gibson assembly as previously described107 in a pPyCAG-IRES-puro plasmid (a kind gift of Ian 614 Chambers, Edinburgh) opened with EcoRI for experiments in mammalian cells. All obtained plasmids 615 were sequenced verified by Sanger sequencing (complete plasmid sequences available upon 616 request).

15

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

617 Fibroblast cell culture

618 Fibroblasts from index patient 1 and her parents were obtained using a punch biopsy according to 619 standard procedures, upon informed consent (IRB approval MEC-2017-341). Fibroblasts from the 620 parents of index patient 2 and 3 were also obtained upon informed consent at McMaster Children’s 621 Hospital. All fibroblasts were cultured in standard DMEM medium supplemented with 15% Fetal calf 622 serum, MEM Non-Essential amino acids (Sigma), 100 U/ml penicillin and 100 µg/ml streptomycin, as 623 done previously108, in routine humidified cell culture incubators at 20% O2. Fibroblast cell lines were 624 transfected using Lipofectamine 3000 (Invitrogen) with the indicated plasmid constructs. All the cell 625 lines used in this report were regularly checked for the presence of mycoplasma and were negative 626 during all experiments.

627 Genome engineering in human embryonic stem cells

628 H9 human embryonic stem cells were cultured as previously described107,109. In short, cells were 629 maintained on feeder free conditions in mTeSR-1 medium (STEMCELL technologies) on Matrigel 630 (Corning) coated culture dishes. To engineer the patient specific UGP2 mutation by homologous 631 recombination110, ESC were transfected using Lipofectamine 3000 with a plasmid expressing 632 eSpCas9-t2a-GFP (a kind gift of Feng Zhang) and a gRNA targeting the UGP2 gene (see 633 Supplementary Table 8 for the sequence), together with a 60 bp single stranded oligonucleotide 634 (ssODN) homology template encoding the patient mutation (synthesized at IDT). To increase the 635 stability of the ssODN and therefore homologous recombination efficiency, the first two 5’ and 3’ 636 nucleotides were synthesized using phosphorothiorate bonds111. 48 hours post transfection, GFP 637 expressing cells were sorted, and 6000 single GFP-positive cells were plated on a Matrigel coated 6- 638 well plate in the presence of 10µM ROCK-inhibitor (Y27632, Millipore). After approximately 10 days, 639 single colonies where manually picked, expanded and genotyped using Sanger sequencing (see 640 Supplementary Table 8 for primer sequences). As a by-product of non-homologous end joining, 641 knock-out clones were identified which showed a single nucleotide A insertion at position 42 of UGP2 642 transcript 1 (chr2:64083462_64083463insA), leading to an out of frame transcript and a premature 643 termination of the protein at amino acid position 47 (D15Rfs*33). Western blotting confirmed the 644 absence of all UGP2 protein in knock-out clones and the loss of the short UGP2 isoform in clones with 645 the patient mutation. To produce a stable rescue cell line, ESC cells were transfected as previously 646 described with the pPyCAG-IRES-puro plasmid expressing either the long WT or mutant UGP2 647 isoform. After 48 hours, the population of cells with the transgene integration was selected with 648 1µg/ml puromycin. Engineered ESC clones had a normal colony morphology and pluripotency factor 649 expression.

650 Patient specific Induced pluripotent stem cell generation

651 Patient fibroblast cell lines were reprogrammed using the CytoTune™-iPS 2.0 Sendai Reprogramming 652 Kit (Thermo Scientific, A16517) expressing the reprogramming factors OCT4, SOX2, KLF4 and C-MYC 653 on matrigel coated cell culture plates, upon informed consent (IRB approval MEC-2017-341). After 654 approximately 4-5 weeks, emerging colonies were manually picked and expanded. Multiple clones 655 were assessed for their karyotype, pluripotency factor expression and three lineage differentiation 656 potential (Stem Cell Technologies, #05230), following the routine procedures of the Erasmus MC iPS

16

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

657 Cell facility, as previously described108. Sanger sequencing was used to verify the genotype of each 658 obtained iPSC line. We used three validated clones for each individual in our experiments.

659 Neural stem cell differentiation

660 Pluripotent cells were differentiated in neural stem cells (NSCs), using a modified dual SMAD 661 inhibition protocol112. In short, 18000 cells/cm2 were plated on matrigel coated cell culture dishes in 662 mTeSR-1 medium in the presence of 10µM Y27632. When cells reached 90% confluency, the medium 663 was switched to differentiation medium (KnockOut DMEM (Gibco), 15% KnockOut serum 664 replacement (Gibco), 2mM L-glutamine (Gibco), MEM Non-Essential amino acids (Sigma), 0.1 mM β- 665 mercaptoethanol, 100U/ml penicillin and 100 µg/ml streptomycin) supplemented with 2µM A 83-01 666 (Tocris) and 2µM Dorsomorphin (Sigma-Aldrich). At day 6, medium was changed to an equal ratio of 667 differentiation medium and NSC medium (KnockOut DMEM-F12 (Gibco), 2mM L-glutamine (Gibco), 668 20ng/ml bFGF (Peprotech), 20ng/ml EGF (Peprotech), 2% StemPro Neural supplement (Gibco), 669 100U/ml penicillin and 100µg/ml streptomycin) supplemented with 2µM A 83-01 (Tocris) and 2µM 670 Dorsomorphin (Sigma-Aldrich). At day 10, cells were passaged (NSC p=0) using Accutase (Sigma) and 671 maintained in NSC medium. We used commercially available H9-derived NSCs (Gibco) as a control (a 672 kind gift of Raymond Poot, Rotterdam).

673 Other stem cell differentiation experiments

674 ESCs were differentiated into hematopoietic stem cells and cardiomyocyte using commercially 675 available STEMCELL technologies kits (STEMdiff Hematopoietic kit #05310, STEMdiff Cardiomyocyte 676 differentiation kit #05010) according to manufacturer’s instructions. Cells were finally harvested and 677 lysed with TRI reagent to isolate RNA for further RT-qPCR analysis.

678 RNA-sequencing and data analysis

679 For RNA-seq on blood derived patient RNA, peripheral blood was obtained from index patient 1 and 680 her parents, collected in PAX tubes and RNA was isolated following standard diagnostic procedures in 681 the diagnostics unit of the Erasmus MC Clinical Genetics department. RNA-seq occurred in a 682 diagnostic setting, and sequencing was performed at GenomeScan (Leiden, The Netherlands). For 683 RNA-seq of in vitro cultured cell lines, RNA was obtained from 6-well cultures using TRI reagent, and 684 further purified using column purification (Qiagen, #74204). mRNA capture, library prep including 685 barcoding and sequencing on an Illumina HiSeq2500 machine were performed according to standard 686 procedures of the Erasmus MC Biomics facility. Approximately 20 million reads were obtained per 687 sample. For the cell line experiments, two independent H9 wild type cultures, two independent 688 knock-out clones harboring the same homozygous UGP2 genetic alteration and two independent 689 clones harboring the patient homozygous UGP2 mutation were used. Each cell line was sequenced in 690 two technical replicates at ESC state and differentiated NSC state (at passage 5). FASTQ files obtained 691 after de-multiplexing of single-end, 50 bp sequencing reads were trimmed by removing possible 692 adapters using Cutadapt after quality control checks on raw data using the FastQC tool. Trimmed 693 reads were aligned to the human genome (hg38) using the HISAT2 aligner113. To produce Genome 694 Browser Tracks, aligned reads were converted to bedgraph using bedtools genomecov, after which 695 the bedGraphToBigWig tool from the UCSC Genome Browser was used to create a bigwig file. 696 Aligned reads were counted for each gene using htseq-count114 and GenomicFeatures115 was used to 697 determine the gene length by merging all non-overlapping per gene from the

17

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

698 Homo_sapiens.GRCh38.92.gtf file (Ensemble). Differential gene expression and RPKM (Reads Per 699 Kilobase per Million) values were calculated using edgeR116 after removing low expressed genes and 700 normalizing data. The threshold for significant differences in the gene expression was FDR < 0.05. To 701 obtain a list of ESC and NSC reference genes used in Supplementary Figure 6F, we retrieved genes 702 annotated in the following GO terms using GSEA/MSigDB web site v7.0: 703 GO_FOREBRAIN_NEURON_DEVELOPMENT (GO:0021884), GO_CEREBRAL_CORTEX_DEVELOPMENT 704 (GO:0021987), GO_NEURAL_TUBE_DEVELOPMENT (GO:0021915), 705 BHATTACHARYA_EMBRYONIC_STEM_CELL (PMID: 15070671) and BENPORATH_NOS_TARGETS 706 (PMID: 18443585).

707 Functional enrichment analysis

708 Metascape117, g:profiler118 and Enrichr119 were used to assess functional enrichment of differential 709 expressed genes. Supplementary Table 4 reports all outputs in LogP, log(q-value) and Adjusted p- 710 value (q-value) for Metascape and g:profiler, and in p-value, Adjusted p-value (q-value) and 711 combined-score (which is the estimation of significance based on the combination of Fisher's exact 712 test p-value and z-score deviation from the expected rank) for Enrichr. All tools were used with 713 default parameters and whole genome set as background.

714 Genome-wide homology search:

715 To make a genome-wide list of transcripts sharing a similar structure as UGP2 transcripts, 42976 716 transcripts from 21522 genes (Human genes GRCh38.p12) were extracted using BioMart of Ensembl 717 (biomaRt R package). 11056 out of 21522 genes had only 1 transcript and the remaining 31920 718 transcripts from 10466 genes were selected, the protein sequences were obtained with biomaRt R 719 package and homology analysis was performed using the NCBI`s blastp (formatting option: - 720 outfmt=6) command line. We grouped longest and shorter transcript based on coding sequence 721 length and only kept those that matched a pairwise homology comparison between the longest and 722 the shorter transcript with the following criteria: complete 100 percent identity, without any gap and 723 mismatch, and starting ATG codon of shortest transcript being part of the longest transcript(s). 1766 724 genes meet these criteria. We then filtered these genes for published essential genes6, leaving us 725 with 1197 genes. Using BioMart (Attributes: Phenotype description and Study external reference) of 726 Ensembl we then evaluated the probability that these genes were implicated in disease and 727 identified 850 genes that did not have an association with disease phenotype/OMIM number. Of 728 those, 247 genes encoded proteins of which the shorter isoform differed less than 50 amino acids 729 from the longer isoform. We chose this arbitrary threshold to exclude those genes where both 730 isoforms could encode proteins differing largely in size and might therefore encode functionally 731 completely differing proteins (although we cannot exclude that this will also hold true for some of 732 the genes in our selection).

733 Differential isoform expression in fetal tissues

734 Publically available RNA-seq data from various fetal tissue samples (Supplementary Table 2) were 735 analyzed using the same workflow as described for the RNA-seq data analysis above. To determine 736 differential isoform expression in these tissues, we calculated a ratio between the unique exon(s) of 737 the shortest and longest transcript for each gene and assessed its variability across different fetal 738 tissue samples. The number of reads for each unique exon of a transcript was calculated by mapping

18

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

739 aligned RNA-seq reads against the unique exon coordinate using bedtools multicov. The longest and 740 shortest transcripts were separated and the transcript ratio (number of counts of shortest transcript 741 / (number of counts of shortest transcript + number of counts of longest transcript)) for each gene 742 was obtained from the average reads of RNA-seq samples per tissue. 382 genes out of 1197 genes 743 showed high variability across different samples (defined as a difference between highest and lowest 744 ratio > 0.5), 277 of those high variable genes were not associated with a disease phenotype/OMIM 745 number and of these 83 genes had a length less than 50 amino acids (a subset of the 247 genes with 746 no OMIM and length less than 50 amino acids)

747 Haplotype Analysis

748 The 30 MB region surrounding UGP2 was extracted from exome sequencing VCF files to include both 749 common and rare polymorphisms. Variants were filtered for a minimum depth of coverage of at least 750 10 reads and a genotype quality of at least 50. The filtered variants, were then used as input in PLINK 751 (v1.07) with the following settings:

752 --homozyg-snp 5 753 --homozyg-kb 100 754 --homozyg-gap 10000 755 --homozyg-window-het 0

756 ROH around the UGP2 variant were identified in all 5 probands examined. The minimum ROH in 757 common between all samples was a 5Mb region at chr2: 60679942-65667235. We note that targeted 758 sequencing leads to uneven SNP density, so the shared ROH may, in fact, be larger or smaller. Next, 759 we used recombination maps from deCODE to estimate the size of the region in centiMorgans (cM). 760 We then used the region size in cM to estimate the time to event in generations using methods 761 previously described120.

762

763 qPCR analysis

764 RNA was obtained using TRI reagent, and cDNA prepared using iSCRIPT cDNA Synthesis Kit according 765 to manufacturer’s instructions. qPCR was performed using iTaq universal SYBR Green Supermix in a 766 CFX96RTS thermal cycler (Bio-Rad). Supplementary Table 8 summarizes all primers used in this 767 study. Relative gene expression was determined following the ΔΔct method. To calculate the ratio of 768 the short isoform, we performed absolute quantification as previously described121. Briefly, we 769 performed qPCR on known copy numbers, ranging from 10^3 to 10^8 copies, of a plasmid containing 770 the short UGP2 isoform (5’ UTR included) using primers detecting specifically either the total or the 771 short isoform. After plotting the log copy number versus the ct, we obtained a standard curve that 772 we used to extrapolate the copy number of the unknown samples. To test for significance, we used 773 Student’s T-test and considered p<0.05 as significant.

774 Western blotting

775 Proteins were extracted with NE buffer (20mM Hepes pH 7.6, 1.5mM MgCl2, 350mM KCl, 0.2mM 776 EDTA and 20% glycerol) supplemented with 0.5% NP40, 0.5mM DTT, cOmplete Protease Inhibitor 777 Cocktail (Roche) and 150U/ml benzonase Protein concentration was determined by BCA (Pierce) and

19

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

778 20-50µg of proteins were loaded onto a 4–15% Criterion TGX gel (Bio-Rad). Proteins were then 779 transferred to a nitrocellulose membrane using the Trans-Blot Turbo Transfer System (Bio-Rad). The 780 membrane was blocked in 5% milk in PBST and subsequently incubated overnight at 4°C with primary 781 antibody diluted in milk. After PBST washes, the membrane was incubated 1 hour at RT with the 782 secondary antibody and imaged with an Odyssey CLX scanning system (Li-Cor). Band intensities were 783 quantified using Image Studio (Li-cor). Antibodies used were: Ms-α-UGP2 (sc-514174) 1:250; Ms-α- 784 Vinculin (sc-59803) 1:10000; Gt-α-actin (sc-1616) 1:500; Ms-α-LAMP2 (H4B4) 1:200; IRDye 800CW 785 Goat anti-Mouse (926-32210) 1:5000; IRDye 680 Donkey anti-Goat (926-32224) 1:5000.

786

787 Zebrafish disease modelling

788 Animal experiments were approved by the Animal Experimentation Committee at Erasmus MC, 789 Rotterdam. Zebrafish embryos and larvae were kept at 28°C on a 14–10‐hour light–dark cycle in 1 M

790 HEPES buffered (pH 7.2) E3 medium (34.8 g NaCl, 1.6 g KCl, 5.8 g CaCl2 · 2H2O, 9.78 g MgCl2 · 6 H2O). 791 For live imaging, the medium was changed at 1 dpf to E3 + 0.003% 1‐phenyl 2‐thiourea (PTU) to 792 prevent pigmentation. Ugp2a and ugp2b were targeted by Cas9/gRNA RNP-complex as we did 793 before76. Briefly, fertilized oocytes from a tgBAC(slc1a2b:Citrine)re01tg reporter line76 maintained on 794 an TL background strain were obtained, and injected with Cas9 protein and crRNA and tracrRNA 795 synthesized by IDT (Alt-R CRISPR-Cas9 System), targeting the open reading frame of zebrafish ugp2a 796 and ugp2b. DNA was extracted from fin clips and used for genotyping using primers flanking the 797 gRNA location (Supplementary Table 8) followed by sequencing. Mutants with a high level of out of 798 frame indels in both genes were identified using TIDE122 and intercrossed to obtain germ line 799 transmission. Upon re-genotyping, mutant zebrafish with the following mutations as indicated in 800 Figure 6 were selected and further intercrossed. In this study, we describe two new mutant fish lines 801 containing deletions in ugp2a (ugp2aΔ/Δ) and ugp2b (ugp2bΔ/Δ): ugp2are08/re08 containing a 37 bp 802 deletion in exon 2 and ugp2bre09/re09 containing a 5 bp deletion in exon 2. Intravital imaging, and 803 analysis of eye movement, was performed as previously described76. Briefly, zebrafish larvae 804 anesthetized in tricaine were mounted in low melting point agarose containing tricaine and imaged 805 using a Leica SP5 intravital imaging setup with a 20×/1.0 NA water-dipping lens. To assess the 806 locomotor activity of zebrafish larvae from 3 to 5 dpf, locomotor activity assays were performed 807 using an infrared camera system (DanioVision™ Observation chamber, Noldus) and using EthoVision® 808 XT software (Noldus) as described76. Briefly, control (n = 24) and ugp2aΔ/Δ; ugp2bΔ/Δ (n = 24) zebrafish 809 larvae, in 48 well plates, were subjected to gradually increasing (to bright light) and decreasing light 810 conditions (darkness) as in Kuil et al76. Distance traveled (mm) per second was measured. For 4-AP 811 (Sigma) stimulation animals were treated with 4-AP dissolved in DMSO 30 minutes before the onset 812 of the experiments. For these experiments locomotor activity was measured over 35 minutes, with 813 the first 5 minutes going from dark to light, followed by 30 minutes under constant light exposure.

814

815 Periodic acid- schiff (PAS) staining

816 ESCs or differentiated NSCs (wild type, KO, KI or rescue) were incubated under hypoxia conditions 817 (3% O2) for 48 hours. Cells were fixed with 5.2% formaldehyde in ethanol, incubated 10 min with 1% 818 Periodic acid, 15 min at 37°C with Schiff’s reagent (Merck) and 5 min with Hematoxylin solution

20

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

819 (Klinipath) prior to air drying and mounting. Every step of the protocol is followed by a 10 minutes 820 wash with tap water. Imaging occurred on an Olympus BX40 microscope. Images were acquired at a 821 100x magnification, and ImageJ software was used for quantification. For ESCs, we used a minimum 822 of 20 images per genotype for the quantification, containing on average 20 cells each, calculating the 823 percentage of PAS positive area. For NSCs, we imaged between 80 to 100 cells per genotype, 824 counting the number of glycogen granules in the cytoplasm. We report the average of two 825 independent experiments at 48 hours low oxygen.

826

827 UGP2 enzymatic activity

828 The measurement of UGP2 enzyme activity was performed according to a modified GALT 829 enzyme activity assay as described previously123. Frozen cell pellets were defrosted and 830 homogenized on ice. 10 µl of each cell homogenate (around 0.5 mg protein/ml as 831 established by BSA protein concentration determination) was pre-incubated with 10 µl of 832 dithiothreitol (DDT) for 5 min at 25°C. 80 µl of a mixture of glucose-1-phosphate (final 833 concentration 1 mM), UTP (0.2 mM), magnesium chloride (1 mM), glycine (125 mM) and 834 Tris-HCl (pH8) (40 mM) was added and incubated for another 15 min at 25°C. The reaction 835 was stopped by adding 150 µl of 3.3% perchloric acid. After 10 min on ice the mixture was 836 centrifuged (10,000 rpm for 5 min at 4°C), the supernatant isolated and neutralized with ice 837 cold 8 µl potassium carbonate for 10 min on ice. After centrifugation the supernatant was 838 isolated and 1:1 diluted with eluent B (see below) after which the mixture was added to a 839 MilliPore Amicon centrifugal filter unit. After centrifugation the supernatant was stored at - 840 20°C until use. The separation was performed by injection of 10 µl of the defrosted 841 supernatant onto a HPLC system with UV/VIS detector (wave length 262 nm) equipped with 842 a reversed phase Supelcosil LC-18-S 150 mm x 4.6 mm, particle size 5 µm, analytical column 843 and Supelguard LC18S guard column (Sigma-Aldrich). During the experiments the 844 temperature of the column was maintained at 25°C. The mobile phase consisted of eluent A 845 (100% methanol) and eluent B (50 mM ammonium phosphate buffer pH7.0 and 4 mM 846 tetrabutylammonium bisulphate). A gradient of 99% eluent B (0-20 min), 75% eluent B (20- 847 30 min) and 99% eluent B (30-45 min) at a flow rate of 0.5 m/min was used. The reaction 848 product UDP-glucose was quantified using a calibration curve with known concentrations of 849 UDP-glucose. UGP2 activity was expressed as the amount of UDP-glucose formed per mg 850 protein per min. Experiments were performed in duplicate and for every cell line two 851 independently grown cell pellets were used.

852 Immunostaining / Immunohistochemistry

853 For immuonofluorescence staining, cells were seeded on coverslips coated with 100µg/ml poly-D- 854 lysine (Sigma) overnight. For ESC, coverslips were further coated with Matrigel (Corning) for one hour 855 at 37°C. When cells reached about 70% confluency, they were fixed with 4% PFA for 15 min at RT. 856 Cells were then permeabilized with 0.5% triton in PBS, incubated one hour in blocking solution (3% 857 BSA in PBS) and then overnight at 4°C with the primary antibody diluted in blocking solution. The

21

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

858 following day the coverslips were incubated one hour at room temperature in the dark with a Cy3- 859 conjugated secondary antibody and mounted using ProLong Gold antifade reagent with DAPI 860 (Invitrogen) to counterstain the nuclei. Images were acquired with a ZEISS Axio Imager M2 using a 861 63X objective.

862 Data availability:

863 RNA-Seq of in vitro studies are publicly available through the National Center for Biotechnology 864 Information (NCBI) Gene Expression Omnibus (GEO) under accession number GSE137129. A token for 865 reviewer access is present in the supplement. Due to privacy regulations and consent, raw RNA-seq 866 data from patient blood cannot be made available. To retrieve tissue wide expression levels of UGP2, 867 the GTEx Portal was accessed on 16/07/2019 (https://gtexportal.org/home/). RNA-seq data from 868 various tissues were downloaded from various publications48-51. All publically available data that were 869 re-analyzed here are summarized in Supplementary Table 2.

870

871

872

873

874

875

876

877

878

879

880

881

882

883

884

885

886

887

888

22

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

889 Figure Legends

890 Figure 1: UGP2 homozygous variants in 10 individuals with severe epileptic encephalopathy

891 A) Facial pictures of individual 1 (at 3, 18 and 23 month), individual 5 (at 9 years), individual 6 (at 11 892 month) and individual 10 (at 2 years). Note the progressive microcephaly with sloping forehead, 893 suture ridging, bitemporal narrowing, high hairline, arched eyebrows, pronounced philtrum, a 894 relatively small mouth and large ears. 895 B) Electroencephalogram of individual 1 at the age of 8 month showing a highly disorganized 896 pattern with high voltage irregular slow waves intermixed with multifocal spikes and polyspikes. 897 C) T1-weighted mid sagittal brain MRI of individual 1 (age 17 month) and individual 4 (age 24 898 month) illustrating global atrophy and microcephaly but no major structural anomalies. 899 D) Sanger sequencing traces of family 1, confirming the chr2:64083454A>G variant in UGP2 in a 900 heterozygous and homozygous state in parents and affected individual 1, respectively. 901 E) Family pedigrees of ascertained patients. Affected individuals and heterozygous parents are 902 indicated in black and half black, respectively. Affected individuals with confirmed genotype are 903 indicated with an arrow, and numbers. Other affected siblings presenting with similar 904 phenotypes are indicated with a question mark. Consanguineous parents are indicated with a 905 double connection line. Male are squares, females circles; unknown sex indicated with rotated 906 squares; deceased individuals are marked with a line. 907 F) Violin plots showing distribution of gene expression (in TPM) amongst male and female samples 908 from the GTEx portal43 for various brain regions. Outliers are indicated by dots 909 G) Multiple species sequence alignment from the UCSC browser, showing that the ATG start site is 910 highly conserved. 911

912 Figure 2: UGP2 homozygous variant leads to a loss of the shorter protein isoform in patient 913 fibroblasts

914 A) Schematic drawing of the human UGP2 locus, with both long and short transcript isoforms. 915 Boxes represent exons, with coding sequences indicated in green. The location of the 916 recurrent mutation is indicated in both transcripts. 917 B) Western blotting of cellular extracts derived from control fibroblasts or fibroblasts obtained 918 from family 1, detecting the housekeeping control vinculin or UGP2. Note the two separated 919 isoforms of UGP2 that have a similar intensity in wild type cells. The shorter isoform is less 920 expressed in fibroblasts from heterozygous parents and absent in fibroblasts from the 921 affected proband. 922 C) Western blot quantification of the fraction of the short UGP2 protein isoform compared to 923 total UGP2 expression in control, parental heterozygous and proband homozygous 924 fibroblasts, as determined in three independent experiments. Error bars represent SEM. 925 D) Western blot quantification of total UGP2 protein levels, as determined by the relative 926 expression to the housekeeping control vinculin. Bar plot showing the results from three 927 independent experiments. Error bars represent SEM; no significant differences were found 928 between parents and proband, t-test, two-tailed.

23

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

929 E) Cell proliferation experiment of fibroblast from heterozygous parents and homozygous 930 proband from family 1, during a 5 days period, determined in three independent 931 experiments. Error bars represent SEM. 932 F) Immunocytochemistry on cultured control and UGP2 heterozygous and homozygous mutant 933 fibroblast derived from family 1, detecting UGP2 (red). Nuclei are stained with DAPI. Scale 934 bar = 50 µm. 935 G) Enzymatic activity of UGP2 as measured in control and UGP2 heterozygous and homozygous 936 mutant fibroblast derived from family 1. Shown is the mean of two independent 937 experiments. Error bars represent SEM; no significant differences were found, unpaired t- 938 test, two-tailed.

939 Figure 3: UGP2 short isoform is predominant in brain related cell types

940 A) Western blotting showing UGP2 expression in H9 human embryonic stem cells (ESCs), H9 941 derived neural stem cells (NSCs) and fibroblasts (Fibro). Vinculin is used as a housekeeping 942 control. Note the changes in relative expression between the two UGP2 isoforms in the 943 different cell types. L, ladder. 944 B) Western blot quantification of the fraction of the short UGP2 protein isoform compared to 945 total UGP2 expression, as determined in three independent experiments. Error bars 946 represent SEM. 947 C) Multiplex RT-PCR of ESCs, NSCs and fibroblasts, showing a similar variability in isoform 948 expression at the transcript as at the protein level. Each cell line was tested in triplicates. 949 D) Quantification of the fraction of the short UGP2 transcript isoform compared to total UGP2 950 expression, from the multiplex RT-PCR from C). Error bars represent SEM 951 E) Quantification of the fraction of the short UGP2 transcript isoform compared to total UGP2 952 expression by qRT-PCR in three independent experiments. Error bars represent SEM. 953 F) Ratio of RNA-seq reads covering the short transcript isoform compared to the total reads 954 (covering both short and long isoforms), in multiple fetal tissues. In RNA-seq samples derived 955 from brain, virtually all UGP2 expression comes from the short isoform. Error bars represent 956 SD. 957 G) Immunohistochemistry detecting UGP2 in human fetal brains from the first, second and third 958 trimester (gestational week (GW) 6, 9, 23 and 36). See text for details. 959 H) Western blotting detecting UGP2 in various human brain regions at week 14, 20 and 28 of 960 gestation, showing the virtual absence of the long isoform expression in fetal brain. Vinculin 961 is used as a housekeeping control. L, ladder.

962

963 Figure 4: RNA-seq of UGP2 mutant H9 derived cell lines

964 A) Venn diagram showing the overlap between differentially expressed genes in UGP2 KO or KI 965 NSCs that are upregulated (upper panel genes with FDR<0.05 and LogFC>1) or 966 downregulated (lower panel, genes with FDR<0.05 and LogFC<-1) compared to wild type 967 NSCs. 968 B) Box plot showing the distribution of gene expression levels (in Log2(RPKM+1)) from RNA-seq 969 for the groups of genes displayed in A), in wild type, UGP2 KI or KO NSCs. Boxes are IQR; line

24

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

970 is median; and whiskers extend to 1.5x the IQR (*=p<0.05; **=p<0.01,***=p<0.001, unpaired 971 t-test, two-tailed). 972 C) Enrichment analysis using Enrichr119 of up- or downregulated genes in NSCs from A) for 973 selected sets, showing the 5 most enriched terms per set. Combined score 974 and p-value calculated by Enrichr are depicted (*=p<0.05; **=p<0.01; ***=p<0.001). 975 D) qRT-PCR validation of differentially expressed genes from RNA-seq in wild type, UGP2 KI, 976 UGP2 KO NSCs and KO NSCs that were rescued with either WT or MUT (Met12Val) transcript 977 isoform 1, at p5 of NSC differentiation. Bar plot showing the mean fold change for the 978 indicated genes compared to wild type, normalized for the housekeeping gene TBP. Results 979 of two biological and two independent technical replicates are plotted. Colors match the 980 Venn diagram group to which the tested genes belong to from A). Error bars represent SEM; 981 (*=p<0.05; **=p<0.01,***=p<0.001, unpaired t-test, one-tailed).

982

983 Figure 5: metabolic changes upon UGP2 loss

984 A) UGP2 enzymatic activity in WT, UGP2 KI, KO and KO NSCs rescued with wildtype or mutant 985 Met12Val isoform 1 of UGP2. Bar plot showing the mean of two replicate experiments, error 986 bar is SEM. *=p<0.05; ***=p<0.001, unpaired t-test, two-tailed. 987 B) Representative pictures of PAS staining in WT, KI, KO and rescue NSCs. Nuclei are 988 counterstained with hematoxylin (blue). Inserts show zoom-in of part of the cytoplasm. Note 989 the presence of glycogen granules in WT NSCs, their diminished number in KI NSCs, their 990 absence in KO NSCs and their reappearance upon rescue with both wild type long UGP2 as 991 with Met12Val long UGP2. 992 C) Quantification of the number of glycogen granules per cell in WT, UGP2 KI, KO and rescue 993 NSCs, after 48 hours culture under low-oxygen conditions. Shown is the average number of 994 glycogen granules per cell, n=80-100 cells per genotype. Error bars represent the SD. 995 ***=p<0.001, unpaired t-test, two-tailed. 996 D) Western blotting detecting LAMP2 (upper panel) and the housekeeping control ACTIN (lower 997 panel) in cellular extracts from ESC-derived NSCs, that are wt, UGP2 KI, KO and KO cells 998 rescued with either the long wildtype isoform 1 or the mutant Met12Val isoform 1. 999 Glycosylated LAMP2 runs at ~110 kDa, whereas hypo-glycosylated LAMP2 is detected around 1000 75 kDa. The absence of detectable changes in LAMP2 glycosylation in KI cells is likely 1001 explained by a non-complete isoform switch upon in vitro NSC differentiation, resulting in 1002 residual UGP2 levels (c.p. Supplementary Figure 5D). 1003 E) qRT-PCR expression analysis for UPR marker genes (spliced XBP1, HSPA5, ATF4 and EDEM) in 1004 WT, KI, KO and rescue NSCs. Shown is the mean fold change for the indicated genes 1005 compared to wild type, normalized for the housekeeping gene TBP. Results of two biological 1006 and two independent technical replicates are plotted, from two experiments. Error bars 1007 represent SEM; *=p<0.05; **=p<0.01,***=p<0.001, unpaired t-test, two-tailed

1008

1009

1010

25

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1011 Figure 6: zebrafish disease modelling

1012 A) Schematic drawing of the ugp2a and ugp2b loci in zebrafish and the generated mutations 1013 indicated. 1014 B) Confocal images (Maximum projection of confocal Z-stacks) of the brain of wild type (left) 1015 and ugp2aΔ/Δ; ugp2bΔ/Δ mutant zebrafish larvae (right), both in an slc1a2b-citrine reporter 1016 background, at 4 days post fertilization (dpf). IThe lower panels are higher magnifications of 1017 the boxed regions indicated in the upper panels. Scale bar in upper panel is 100 µm, in lower 1018 panel 20 µm. In upper panel, Z = 45 with step size 4 µm; In lower panel, Z = 30 with step size 1019 2 µm. 1020 C) Enzymatic activity in ugp2 double mutant zebrafish larvae at 4 and 5 dpf, compared to wild 1021 type age matched controls, showing reduced Ugp2 enzyme activity in double mutant 1022 zebrafish. 1023 D) qRT-PCR for the neuronal activity marker c-FOS in wild type and ugp2 double mutant larvae 1024 at 3 dpf. For each group, 2 batches of 12 larvae were pooled. Shown is the mean fold change 1025 for the indicated genes compared to wild type, normalized for the housekeeping gene gapdh. 1026 Error bars represent SEM; ***= p<0.001, unpaired t-test, two-tailed. 1027 E) Representative graph of a locomotion assay showing the total distance moved by larvae 1028 during the dusk-dawn routine (total time: 3 hr 12 min), n = 24 larvae per genotype. Grey 1029 shading shows the standard error of the mean. 1030 F) Quantification of the total distance moved throughout the experiment from E) excluding the 1031 dark period. 1032 G) Quantification of the number of observed spontaneous eye movements during a 2 minutes 1033 observation in wild type and ugp2 double mutant larvae at 4 dpf. Each dot represents one 1034 larva; shown is the average and SD; ***p<0.001, t-test, two tailed. 1035 H) Quantification of the frequency of movements at a speed of > 15 mm/s, for wild type control 1036 and ugp2 double mutant zebrafish larvae at 4 dpf, treated with mock control or with 0.04 1037 nM or 0.4 nM 4-AP during a 35 minutes observation. Each dot represents a single larva; 1038 results of two experiments are shown, with in total 24 larvae per condition. 1039 I) As H, but now assessing movement duration at a speed of > 15 mm/s. * =p<0.05, two way 1040 ANOVA with Bonferoni post-test.

1041 Figure 7: Essentiality of UGP2 and other disease candidate genes with a similar mutation mechanism

1042 A) qRT-PCR analysis of the hematopoietic stem cell markers GATA2, LMO2 and RUNX1, after 12 1043 days of differentiation of wild type, UGP2 KO and UGP2 KO rescue ESCs. Shown is the mean 1044 fold change for the indicated genes compared to wild type, normalized for the housekeeping 1045 gene TBP. Results of two biological and two technical replicates are plotted. Error bars 1046 represent SEM; *=p<0.05; **=p<0.01,***=p<0.001, unpaired t-test, two-tailed. 1047 B) As A), but now for cardiomyocyte differentiation at day 15, assessing expression of the 1048 cardiomyocyte markers TNNT2, MYL2 and MYL7. 1049 C) Bright-field image of cardiomyocyte cultures of wild type, UGP2 KO and rescue cells. Note 1050 the elongated organized monolayer structures cardiomyocytes capable of beating in wild 1051 type and rescue cells, that are absent in KO cultures. Scale bar is 400 µm. 1052 D) Scheme showing the homology search to identify genes with a similar structure as UGP2, 1053 where ATG altering mutations could affect a tissue specific isoform causing genetic disease.

26

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1054 E) Heat map showing the ratio of short isoform expression over total isoform expression from 1055 published RNA-seq data amongst 20 tissues for 83 out 247 essential genes that are not yet 1056 implicated in disease and in which the short and longer protein isoform differ by less than 50 1057 amino acids at the N-terminal.

1058 Supplementary Figure 1, related to Figure 1:

1059 A) Growth chart from individual 1 for length (left) and head circumference (right) in cm. 1060 Reference chart from the Dutch population are used (TNO) and regions between -2 and + 2 1061 SD are shaded. 1062 B) MRI studies of individual 5 (at the age of 12 month) and individual 6, showing global brain 1063 atrophy. 1064 C) ROH comparison between affected individuals from family 1, 4, 5, 6 and 7, carrying the 1065 homozygous chr2:64083454A>G mutation. The red line indicates the UGP2 variant, and the 1066 blue lines demark the shared ROH region between the individuals (chr2:60679942- 1067 65667235). 1068 D) Violin plots showing distribution of gene expression (in TPM) amongst samples from the 1069 GTEx portal35 for tissues and cell lines. Samples are sorted with the highest median TPM on 1070 the right. Outliers are indicated by dots.

1071 Supplementary Figure 2, related to Figure 2:

1072 A) Western blotting of cellular extracts derived from control fibroblasts or fibroblasts obtained 1073 from heterozygous parents of family 2, detecting the house keeping control vinculin or UGP2. 1074 Note the two separated isoforms of UGP2 that have a similar intensity in wild type cells. The 1075 shorter isoform shows reduced expression in fibroblasts from heterozygous parents. 1076 B) Quantification of the fraction of the short UGP2 protein isoform compared to total UGP2 1077 expression in control, and heterozygous fibroblasts from family 2, as determined in three 1078 independent experiments. Error bars represent SEM. 1079 C) Western blot quantification of total UGP2 protein levels, as determined by the relative 1080 expression to the housekeeping control vinculin. Bar graph showing the results from three 1081 independent experiments. Error bars represent SEM; no significant differences between 1082 control and parent samples, unpaired t-test, two-tailed. 1083 D) qRT-PCR analysis of total UGP2 or the short isoform in fibroblast from heterozygous parents 1084 or homozygous proband from family 1, normalized for the housekeeping control TBP. The 1085 mean fold change compared to heterozygous parents of two biological replicates and two 1086 technical replicates is shown; error bars represent SEM no significant differences between 1087 control and parent samples, unpaired t-test, two-tailed. 1088 E) Multiplex RT-PCR detecting relative expression of UGP2 isoform 1 and isoform 2 in peripheral 1089 blood from family 1 and unrelated wild type controls. 1090 F) Sanger sequencing of RT-PCR products from E), showing the expression of the homozygous 1091 and heterozygous chr2:64083454A>G UGP2 variant in the index proband, her parents and an 1092 unrelated control. 1093 G) Heat map showing genome-wide gene expression levels (in log2(RPKM+1)) in peripheral 1094 blood from heterozygous parents and homozygous proband from family 1.

27

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1095 H) Gene expression levels (in log2(RPKM+1)) from RNA-seq in peripheral blood for a selected 1096 number of genes involved in metabolism. 1097 I) Cell proliferation experiment of fibroblast from heterozygous parents from family 2 and wild 1098 type controls, during a 5 days period. Error bars represent SEM, **= p<0.01, unpaired t-test, 1099 two-tailed. 1100 J) Western blotting detecting UGP2 in human frontal cortex from week 21 and 23 of gestation, 1101 showing the virtual absence of the long isoform expression in fetal brain. Vinculin is used as a 1102 housekeeping control. 1103

1104 Supplementary Figure 3, related to Figure 4: generation of mutant UGP2 H9 cell lines

1105 A) Nucleotide sequence encompassing the ATG of UGP2 transcript isoform 2. Indicated are the 1106 coding sequence, the location of the gRNA, PAM sequence and ssODN used to introduce the 1107 C.1A>G, p.? mutation. 1108 B) Sanger sequencing traces of part of the UGP2 gene from wild type, UGP2 knock-out (KO) and 1109 UGP2 knock-in H9 ESCs (KI). The A at the start of the coding sequence of UGP2 isoform 2 1110 (short isoform) is highlighted. The homozygous insertion of an additional A in knockout and 1111 the mutation into a G in knock-in cells are indicated. 1112 C) Western blot detecting UGP2 and vinculin in wild type ESC, heterozygous and homozygous 1113 knockout and knock-in ESCs, as indicated. Note the complete loss of UGP2 in KO cells, and 1114 the loss of the short isoform in KI cells. 1115 D) RT-qPCR detecting the pluripotency factors OCT4, NANOG and REX1 in H9 wild type, UGP2 1116 knock-in (KI) and UGP2 knock-out (KO) ESCs, normalized for the house keeping control TBP. 1117 Mean fold change compared to wild type of two biological replicates and three technical 1118 replicates is shown; error bars represent SEM, *= p<0.05, unpaired t-test, two-tailed. 1119 E) Bright field image of a representative ESC colony from wild type parental and UGP2 KO ESCs.

1120 Supplementary Figure 4, related to Figure 4, NSC differentiation

1121 A) Schematic drawing of the differentiation procedure, see online methods for details. 1122 B) Bright field image showing representative pictures from ESCs and differentiated NSCs. 1123 C) qRT-PCR analysis for pluripotency markers (NANOG, OCT4 (POU5F1), REX1) and genes 1124 expressed in NSCs (PAX6, GFAP) in WT, UGP2 KO and KI differentiated NSCs at p1 and p5. 1125 Mean fold change compared to wild type of two biological replicates and two technical 1126 replicates is shown; error bars represent SEM. 1127 D) Western blotting showing UGP2 expression in WT, UGP2 KI and KO differentiated NSCs. 1128 Vinculin is used as a housekeeping control. 1129 E) Quantification of total UGP2 protein levels by Western blot, as determined by the relative 1130 expression to the housekeeping control vinculin. Bar graph showing the results from two 1131 independent experiments; error bars represent SEM. 1132 F) qRT-PCR analysis of UGP2 in NSCs or KO NSCs rescued with either the long wild type or long 1133 mutant UGP2 isoform. Mean fold change compared to wild type is shown for two biological 1134 replicates and three technical replicates; error bars represent SEM.

1135

28

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1136 Supplementary Figure 5: RNA-seq, related to Figure 4

1137 A) Scatter plot showing the pair wise correlation between biological replicates. 1138 B) Heat map displaying Pearson correlation between biological replicates. 1139 C) Table summarizing up- (FDR<0.05 and LogFC>1) and down regulated (FDR<0.05 and LogFC<- 1140 1) genes in WT, KO and KI ESCs. 1141 D) Table summarizing up- (FDR<0.05 and LogFC>1) and down regulated (FDR<0.05 and LogFC<- 1142 1) genes in WT, KO and KI ESC upon differentiation in NSCs. 1143 E) Table summarizing up- (FDR<0.05 and LogFC>1) and down regulated (FDR<0.05 and LogFC<- 1144 1) genes in WT, KO and KI NSCs. 1145 F) Heat map visualizing gene expression (in log2(RPKM+1)) and clustering of WT, KO and KI ESCs 1146 and NSCs, for a panel of ESC and NSC specific genes (see methods)

1147

1148 Supplementary Figure 6, related to Figure 4: UGP2 mutant induced pluripotent stem cells

1149 A) Immunofluorescence of iPSC clones used in this study derived from Family 1 (three clones 1150 per individual) showing iPSC colonies stained for the pluripotency markers TRA1-81 (red) and 1151 OCT4 (green) (left panel) or SSEA4 (red) and NANOG (green) (right panel). Nuclei are stained 1152 with DAPI (blue). 1153 B) qRT-PCR expression analysis for the indicated pluripotency associated genes in 4 wild type 1154 control human embryonic stem cell lines and the iPSCs derived from family 1. Mean fold 1155 change compared to human embryonic stem cells of three biological replicates (e.g. 1156 individual clones from A) and three technical replicates is shown; error bars represent SEM. 1157 No statistically significant differences were found, unpaired t-test, two-tailed. 1158 C) Sanger sequencing of representative iPSC clones confirming the presence of the 1159 chr2:64083454A>G UGP2 mutation in a heterozygous state in clones derived from parents 1160 and homozygous state in clones derived from the affected child. 1161 D) qRT-PCR PCR expression analysis upon differentiation for pluripotency (NANOG, OCT4 1162 (POUF51), REX1) and NSC markers (PAX6, GFAP), for H9 ESC control and heterozygous and 1163 homozygous iPSCs derived from family 1. Mean fold change compared to human embryonic 1164 stem cells of three biological replicates (e.g. individual clones from A)and two technical 1165 replicates is shown; normalized to TBP; error bars represent SEM. 1166 E) qRT-PCR expression analysis in iPSC-derived NSCs for genes that showed differential 1167 expression in RNA-seq experiments, e.g. NNAT, FGFBP3, ID4 and PLAU. Mean fold change for 1168 cells obtained from the affected child compared to cells obtained from its parents (set to 1) 1169 of three biological replicates (e.g. individual clones from A) and two technical replicates is 1170 shown; normalized to TBP; error bars represent SEM. 1171

1172 Supplementary Figure 7: related to figure 5

1173 A) UGP2 enzymatic activity in WT, UGP2 KI, KO and KO ESCs rescued with wildtype isoform 1 or 1174 mutant Met12Val isoform 1 of UGP2. Plotted is the mean from two replicate experiments, 1175 error bar is SEM. ***=p<0.001, unpaired t-test, two-tailed.

29

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1176 B) UGP2 enzymatic activity in iPSC derived NSCs from family 1. Plotted is the mean from two 1177 replicate experiments, measuring each the results for the three clones for each individual, 1178 error bar is SEM. *=p<0.05; unpaired t-test, two-tailed. 1179 C) PAS staining in WT and UGP2 KO ESCs. Nuclei are counterstained with hematoxylin (blue). 1180 D) Quantification of the PAS stained area in WT, KI and KO ESCs. Shown is the average PAS 1181 positive area per genotype from two biological replicates, each stained in two experiments; 1182 error bars are SD. ***=p<0.001, unpaired t-test, two-tailed. 1183 E) Glycogen granules detected by PAS staining in iPSC-derived NSCs from family 1 after 48 hours 1184 culture under low-oxygen conditions. Number of granules for paternal cell line are set at 1185 100%. Average of three biological and two technical replicates per genotype, with each n=80- 1186 100 cells counted. Error bars represent SD, ***=p<0.001, unpaired t-test, two-tailed. 1187 F) Western blotting detecting LAMP2 (upper panel) and the house keeping control actin (lower 1188 panel) in cellular extracts from ESCs, that are WT, UGP2 KI, or KO. Compare to Figure 5D. 1189 G) qRT-PCR expression analysis for UPR marker genes (spliced XBP1, HSPA5, ATF4 and EDEM) in 1190 WT, UGP2 KI, KO and rescue ESCs. Shown is the mean fold change for the indicated genes 1191 compared to wild type, normalized for the housekeeping gene TBP. Results of two biological 1192 and three technical replicates are plotted from two experiments. Error bars represent SEM; 1193 *= p<0.05, unpaired t-test, two-tailed). 1194 H) qRT-PCR expression analysis for UPR marker genes (spliced XBP1, HSPA5, ATF4 and EDEM) in 1195 in primary fibroblasts from family 1. Shown is the mean fold change for the indicated genes 1196 compared to wild type, normalized for the housekeeping gene TBP. Results of two 1197 experiments with each three technical replicates are plotted. Error bars represent SEM; *= 1198 p<0.05, unpaired t-test, two-tailed.

1199 Supplementary Table 1: Extended clinical characteristics of 11 patients with homozygous UGP2 1200 variants

1201 Supplementary Table 2: RNA-seq data used in this study

1202 Supplementary Table 3: differentially expressed genes

1203 Supplementary Table 4: enrichment analysis

1204 Supplementary Table 5: UGP2 variants in gnomAD

1205 Supplementary Table 6: genome-wide homology search results

1206 Supplementary Table 7: gnomAD data of 247 disease candidate genes

1207 Supplementary Table 8: Oligonucleotides used in this study

1208 Supplemental Movie 1: wild type zebrafish eye movements

1209 Supplemental Movie 2: Ugp2a/b double mutant zebrafish eye movements

1210 Supplementary Case Reports

1211 Supplementary Note: additional example of an isoform start codon alteration of an essential gene

30

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1212 References:

1213 1. Kalser, J. & Cross, J.H. The epileptic encephalopathy jungle - from Dr West to the concepts of 1214 aetiology-related and developmental encephalopathies. Curr Opin Neurol 31, 216-222 1215 (2018). 1216 2. McTague, A., Howell, K.B., Cross, J.H., Kurian, M.A. & Scheffer, I.E. The genetic landscape of 1217 the epileptic encephalopathies of infancy and childhood. Lancet Neurol 15, 304-16 (2016). 1218 3. Epi, K.C. et al. De novo mutations in epileptic encephalopathies. Nature 501, 217-21 (2013). 1219 4. Nashabat, M. et al. The landscape of early infantile epileptic encephalopathy in a 1220 consanguineous population. Seizure 69, 154-172 (2019). 1221 5. Papuc, S.M. et al. The role of recessive inheritance in early-onset epileptic encephalopathies: 1222 a combined whole-exome sequencing and copy number study. Eur J Hum Genet 27, 408-421 1223 (2019). 1224 6. Bartha, I., di Iulio, J., Venter, J.C. & Telenti, A. Human gene essentiality. Nat Rev Genet 19, 51- 1225 62 (2018). 1226 7. Robbins, S.M., Thimm, M.A., Valle, D. & Jelin, A.C. Genetic diagnosis in first or second 1227 trimester pregnancy loss using exome sequencing: a systematic review of human essential 1228 genes. J Assist Reprod Genet 36, 1539-1548 (2019). 1229 8. Fuhring, J. et al. Octamerization is essential for enzymatic function of human UDP-glucose 1230 pyrophosphorylase. Glycobiology 23, 426-37 (2013). 1231 9. Fuhring, J.I. et al. A quaternary mechanism enables the complex biological functions of 1232 octameric human UDP-glucose pyrophosphorylase, a key enzyme in cell metabolism. Sci Rep 1233 5, 9618 (2015). 1234 10. Yu, Q. & Zheng, X. The crystal structure of human UDP-glucose pyrophosphorylase reveals a 1235 latch effect that influences enzymatic activity. Biochem J 442, 283-91 (2012). 1236 11. Turnquist, R.L., Gillett, T.A. & Hansen, R.G. Uridine diphosphate glucose pyrophosphorylase. 1237 Crystallization and properties of the enzyme from rabbit liver and species comparisons. J Biol 1238 Chem 249, 7695-700 (1974). 1239 12. Flores-Diaz, M. et al. Cellular UDP-glucose deficiency caused by a single point mutation in the 1240 UDP-glucose pyrophosphorylase gene. J Biol Chem 272, 23784-91 (1997). 1241 13. Higuita, J.C., Alape-Giron, A., Thelestam, M. & Katz, A. A point mutation in the UDP-glucose 1242 pyrophosphorylase gene results in decreases of UDP-glucose and inactivation of glycogen 1243 synthase. Biochem J 370, 995-1001 (2003). 1244 14. Adeva-Andany, M.M., Gonzalez-Lucan, M., Donapetry-Garcia, C., Fernandez-Fernandez, C. & 1245 Ameneiros-Rodriguez, E. Glycogen metabolism in humans. BBA Clin 5, 85-100 (2016). 1246 15. Magee, C., Nurminskaya, M. & Linsenmayer, T.F. UDP-glucose pyrophosphorylase: up- 1247 regulation in hypertrophic cartilage and role in hyaluronan synthesis. Biochem J 360, 667-74 1248 (2001). 1249 16. Vigetti, D., Viola, M., Karousou, E., De Luca, G. & Passi, A. Metabolic control of hyaluronan 1250 synthases. Matrix Biol 35, 8-13 (2014). 1251 17. Perkins, K.L., Arranz, A.M., Yamaguchi, Y. & Hrabetova, S. Brain extracellular space, 1252 hyaluronan, and the prevention of epileptic seizures. Rev Neurosci 28, 869-892 (2017). 1253 18. Soleman, S., Filippov, M.A., Dityatev, A. & Fawcett, J.W. Targeting the neural extracellular 1254 matrix in neurological disorders. Neuroscience 253, 194-213 (2013). 1255 19. Cope, E.C. & Gould, E. Adult Neurogenesis, Glia, and the Extracellular Matrix. Cell Stem Cell 1256 24, 690-705 (2019). 1257 20. Arranz, A.M. et al. Hyaluronan deficiency due to Has3 knock-out causes altered neuronal 1258 activity and seizures via reduction in brain extracellular space. J Neurosci 34, 6164-76 (2014). 1259 21. Zeng, C., Xing, W. & Liu, Y. Identification of UGP2 as a progression marker that promotes cell 1260 growth and motility in human glioma. J Cell Biochem 120, 12489-12499 (2019).

31

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1261 22. Li, S., Hu, Z., Zhao, Y., Huang, S. & He, X. Transcriptome-Wide Analysis Reveals the Landscape 1262 of Aberrant Alternative Splicing Events in Liver Cancer. Hepatology 69, 359-375 (2019). 1263 23. Li, Y. et al. Multiomics Integration Reveals the Landscape of Prometastasis Metabolism in 1264 Hepatocellular Carcinoma. Mol Cell Proteomics 17, 607-618 (2018). 1265 24. Wang, L. et al. Expression of UGP2 and CFL1 expression levels in benign and malignant 1266 pancreatic lesions and their clinicopathological significance. World J Surg Oncol 16, 11 (2018). 1267 25. Wang, Q. et al. SHP2 and UGP2 are Biomarkers for Progression and Poor Prognosis of 1268 Gallbladder Cancer. Cancer Invest 34, 255-64 (2016). 1269 26. Tan, G.S. et al. Novel proteomic biomarker panel for prediction of aggressive metastatic 1270 hepatocellular carcinoma relapse in surgically resectable patients. J Proteome Res 13, 4833- 1271 46 (2014). 1272 27. Thorsen, K. et al. Tumor-specific usage of alternative transcription start sites in colorectal 1273 cancer identified by genome-wide exon array analysis. BMC Genomics 12, 505 (2011). 1274 28. de Jonge, H.J. et al. Gene expression profiling in the leukemic stem cell-enriched CD34+ 1275 fraction identifies target genes that predict prognosis in normal karyotype AML. Leukemia 1276 25, 1825-33 (2011). 1277 29. Perenthaler, E., Yousefi, S., Niggl, E. & Barakat, T.S. Beyond the Exome: The Non-coding 1278 Genome and Enhancers in Neurodevelopmental Disorders and Malformations of Cortical 1279 Development. Front Cell Neurosci 13, 352 (2019). 1280 30. Jenkins, Z.A. et al. Differential regulation of two FLNA transcripts explains some of the 1281 phenotypic heterogeneity in the loss-of-function filaminopathies. Hum Mutat 39, 103-113 1282 (2018). 1283 31. Gostynska, K.B. et al. Mutation in exon 1a of PLEC, leading to disruption of plectin isoform 1a, 1284 causes autosomal-recessive skin-only epidermolysis bullosa simplex. Hum Mol Genet 24, 1285 3155-62 (2015). 1286 32. Li, J. et al. Point Mutations in Exon 1B of APC Reveal Gastric Adenocarcinoma and Proximal 1287 Polyposis of the Stomach as a Familial Adenomatous Polyposis Variant. Am J Hum Genet 98, 1288 830-842 (2016). 1289 33. Ta-Shma, A. et al. Mutations in TMEM260 Cause a Pediatric Neurodevelopmental, Cardiac, 1290 and Renal Syndrome. Am J Hum Genet 100, 666-675 (2017). 1291 34. Sobreira, N., Schiettecatte, F., Valle, D. & Hamosh, A. GeneMatcher: a matching tool for 1292 connecting investigators with an interest in the same gene. Hum Mutat 36, 928-30 (2015). 1293 35. Consortium, G.T. The Genotype-Tissue Expression (GTEx) project. Nat Genet 45, 580-5 1294 (2013). 1295 36. Epi25 Collaborative. Electronic address, s.b.u.e.a. & Epi, C. Ultra-Rare Genetic Variation in the 1296 Epilepsies: A Whole-Exome Sequencing Study of 17,606 Individuals. Am J Hum Genet (2019). 1297 37. Landrum, M.J. et al. ClinVar: public archive of relationships among sequence variation and 1298 human phenotype. Nucleic Acids Res 42, D980-5 (2014). 1299 38. Fokkema, I.F. et al. LOVD v.2.0: the next generation in gene variant databases. Hum Mutat 1300 32, 557-63 (2011). 1301 39. Exome Variant Server NHLBI GO Exome Sequencing Project (ESP) Seattle WA. (accessed Juli 1302 2019). 1303 40. Firth, H.V. et al. DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans 1304 Using Ensembl Resources. Am J Hum Genet 84, 524-33 (2009). 1305 41. Gonzalez, M. et al. Innovative genomic collaboration using the GENESIS (GEM.app) platform. 1306 Hum Mutat 36, 950-6 (2015). 1307 42. Scott, E.M. et al. Characterization of Greater Middle Eastern genetic variation for enhanced 1308 disease gene discovery. Nat Genet 48, 1071-6 (2016). 1309 43. Fattahi, Z. et al. Iranome: A catalog of genomic variations in the Iranian population. Hum 1310 Mutat (2019). 1311 44. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285- 1312 91 (2016).

32

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1313 45. Rentzsch, P., Witten, D., Cooper, G.M., Shendure, J. & Kircher, M. CADD: predicting the 1314 deleteriousness of variants throughout the human genome. Nucleic Acids Res 47, D886-D894 1315 (2019). 1316 46. Schwarz, J.M., Cooper, D.N., Schuelke, M. & Seelow, D. MutationTaster2: mutation 1317 prediction for the deep-sequencing age. Nat Methods 11, 361-2 (2014). 1318 47. BALUCHISTAN i. Geography, History and Ethnography. in Encyclopædia Iranica Vol. III fasc. 6, 1319 pp 598-632 (2010). 1320 48. Roadmap Epigenomics, C. et al. Integrative analysis of 111 reference human epigenomes. 1321 Nature 518, 317-30 (2015). 1322 49. Yan, L. et al. Epigenomic Landscape of Human Fetal Brain, Heart, and Liver. J Biol Chem 291, 1323 4386-98 (2016). 1324 50. Hwang, T. et al. Dynamic regulation of RNA editing in human brain development and disease. 1325 Nat Neurosci 19, 1093-9 (2016). 1326 51. Shih, H.P. et al. A Gene Regulatory Network Cooperatively Controlled by Pdx1 and Sox9 1327 Governs Lineage Allocation of Foregut Progenitor Cells. Cell Rep 13, 326-36 (2015). 1328 52. Mair, B. et al. Essential Gene Profiles for Human Pluripotent Stem Cells Identify 1329 Uncharacterized Genes and Substrate Dependencies. Cell Rep 27, 599-615 e12 (2019). 1330 53. Yilmaz, A., Peretz, M., Aharony, A., Sagi, I. & Benvenisty, N. Defining essential genes for 1331 human pluripotent stem cells by CRISPR-Cas9 screening in haploid cells. Nat Cell Biol 20, 610- 1332 619 (2018). 1333 54. Turnbull, J. et al. Lafora disease. Epileptic Disord 18, 38-62 (2016). 1334 55. Sharma, J., Rao, S.N., Shankar, S.K., Satishchandra, P. & Jana, N.R. Lafora disease ubiquitin 1335 ligase malin promotes proteasomal degradation of neuronatin and regulates glycogen 1336 synthesis. Neurobiol Dis 44, 133-41 (2011). 1337 56. Sharma, J. et al. Neuronatin-mediated aberrant calcium signaling and endoplasmic reticulum 1338 stress underlie neuropathology in Lafora disease. J Biol Chem 288, 9482-90 (2013). 1339 57. Shinde, V., Pitale, P.M., Howse, W., Gorbatyuk, O. & Gorbatyuk, M. Neuronatin is a stress- 1340 responsive protein of rod photoreceptors. Neuroscience 328, 1-8 (2016). 1341 58. Sel, S. et al. Temporal and spatial expression pattern of Nnat during mouse eye development. 1342 Gene Expr Patterns 23-24, 7-12 (2017). 1343 59. Salyakina, D. et al. Copy number variants in extended autism spectrum disorder families 1344 reveal candidates potentially involved in autism risk. PLoS One 6, e26049 (2011). 1345 60. Li, Y. et al. Temporal and spatial expression of fgfbp genes in zebrafish. Gene 659, 128-136 1346 (2018). 1347 61. Tassi, E. et al. Fibroblast Growth Factor Binding Protein 3 (FGFBP3) impacts carbohydrate and 1348 lipid metabolism. Sci Rep 8, 15973 (2018). 1349 62. Oikari, L.E. et al. Cell surface heparan sulfate proteoglycans as novel markers of human 1350 neural stem cell fate determination. Stem Cell Res 16, 92-104 (2016). 1351 63. Lugert, S. et al. Glypican-2 levels in cerebrospinal fluid predict the status of adult 1352 hippocampal neurogenesis. Sci Rep 7, 46543 (2017). 1353 64. Diotel, N., Beil, T., Strahle, U. & Rastegar, S. Differential expression of id genes and their 1354 potential regulator znf238 in zebrafish adult neural progenitor cells and neurons suggests 1355 distinct functions in adult neurogenesis. Gene Expr Patterns 19, 1-13 (2015). 1356 65. Okazaki, T. et al. Epileptic phenotype of FGFR3-related bilateral medial temporal lobe 1357 dysgenesis. Brain Dev 39, 67-71 (2017). 1358 66. Kyyriainen, J. et al. Deficiency of urokinase-type plasminogen activator and its receptor 1359 affects social behavior and increases seizure susceptibility. Epilepsy Res 151, 67-74 (2019). 1360 67. Hua, S. et al. High expression of GALNT7 promotes invasion and proliferation of glioma cells. 1361 Oncol Lett 16, 6307-6314 (2018). 1362 68. Guo, H. et al. O-Linked N-Acetylglucosamine (O-GlcNAc) Expression Levels Epigenetically 1363 Regulate Colon Cancer Tumorigenesis by Affecting the Cancer Stem Cell Compartment via 1364 Modulating Expression of Transcriptional Factor MYBL1. J Biol Chem 292, 4123-4137 (2017).

33

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1365 69. Pescador, N. et al. Hypoxia promotes glycogen accumulation through hypoxia inducible 1366 factor (HIF)-mediated induction of glycogen synthase 1. PLoS One 5, e9644 (2010). 1367 70. Duran, J., Saez, I., Gruart, A., Guinovart, J.J. & Delgado-Garcia, J.M. Impairment in long-term 1368 memory formation and learning-dependent synaptic plasticity in mice lacking glycogen 1369 synthase in the brain. J Cereb Blood Flow Metab 33, 550-6 (2013). 1370 71. Lopez-Ramos, J.C., Duran, J., Gruart, A., Guinovart, J.J. & Delgado-Garcia, J.M. Role of brain 1371 glycogen in the response to hypoxia and in susceptibility to epilepsy. Front Cell Neurosci 9, 1372 431 (2015). 1373 72. Choi, H.B. et al. Metabolic communication between astrocytes and neurons via bicarbonate- 1374 responsive soluble adenylyl cyclase. Neuron 75, 1094-104 (2012). 1375 73. Xu, J. et al. Requirement of glycogenolysis for uptake of increased extracellular K+ in 1376 astrocytes: potential implications for K+ homeostasis and glycogen usage in brain. 1377 Neurochem Res 38, 472-85 (2013). 1378 74. Schousboe, A., Sickmann, H.M., Walls, A.B., Bak, L.K. & Waagepetersen, H.S. Functional 1379 importance of the astrocytic glycogen-shunt and glycolysis for maintenance of an intact 1380 intra/extracellular glutamate gradient. Neurotox Res 18, 94-9 (2010). 1381 75. Wang, X. et al. Histone H3K4 methyltransferase Mll1 regulates protein glycosylation and 1382 tunicamycin-induced apoptosis through transcriptional regulation. Biochim Biophys Acta 1383 1843, 2592-602 (2014). 1384 76. Kuil, L.E. et al. Hexb enzyme deficiency leads to lysosomal abnormalities in radial glia and 1385 microglia in zebrafish brain development. Glia 67, 1705-1718 (2019). 1386 77. Narasimhan, V.M. et al. Health and population effects of rare gene knockouts in adult 1387 humans with related parents. Science 352, 474-7 (2016). 1388 78. Saleheen, D. et al. Human knockouts and phenotypic analysis in a cohort with a high rate of 1389 consanguinity. Nature 544, 235-239 (2017). 1390 79. Sulem, P. et al. Identification of a large set of rare complete human knockouts. Nat Genet 47, 1391 448-52 (2015). 1392 80. Blomen, V.A. et al. Gene essentiality and synthetic lethality in haploid human cells. Science 1393 350, 1092-6 (2015). 1394 81. Bakke, J. et al. Genome-wide CRISPR screen reveals PSMA6 to be an essential gene in 1395 pancreatic cancer cells. BMC Cancer 19, 253 (2019). 1396 82. Bertomeu, T. et al. A High-Resolution Genome-Wide CRISPR/Cas9 Viability Screen Reveals 1397 Structural Features and Contextual Diversity of the Human Cell-Essential Proteome. Mol Cell 1398 Biol 38(2018). 1399 83. Wang, T. et al. Identification and characterization of essential genes in the human genome. 1400 Science 350, 1096-101 (2015). 1401 84. Wang, X. et al. BRD9 defines a SWI/SNF sub-complex and constitutes a specific vulnerability 1402 in malignant rhabdoid tumors. Nat Commun 10, 1881 (2019). 1403 85. Hart, T. et al. High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype-Specific 1404 Cancer Liabilities. Cell 163, 1515-26 (2015). 1405 86. Daran, J.M., Bell, W. & Francois, J. Physiological and morphological effects of genetic 1406 alterations leading to a reduced synthesis of UDP-glucose in Saccharomyces cerevisiae. FEMS 1407 Microbiol Lett 153, 89-96 (1997). 1408 87. Daran, J.M., Dallies, N., Thines-Sempoux, D., Paquet, V. & Francois, J. Genetic and 1409 biochemical characterization of the UGP1 gene encoding the UDP-glucose 1410 pyrophosphorylase from Saccharomyces cerevisiae. Eur J Biochem 233, 520-30 (1995). 1411 88. Li, M. et al. UDP-glucose pyrophosphorylase influences polysaccharide synthesis, cell wall 1412 components, and hyphal branching in Ganoderma lucidum via regulation of the balance 1413 between glucose-1-phosphate and UDP-glucose. Fungal Genet Biol 82, 251-63 (2015). 1414 89. Chen, R. et al. Rice UDP-glucose pyrophosphorylase1 is essential for pollen callose deposition 1415 and its cosuppression results in a new type of thermosensitive genic male sterility. Plant Cell 1416 19, 847-61 (2007).

34

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1417 90. Park, J.I. et al. UDP-glucose pyrophosphorylase is rate limiting in vegetative and reproductive 1418 phases in Arabidopsis thaliana. Plant Cell Physiol 51, 981-96 (2010). 1419 91. Woo, M.O. et al. Inactivation of the UGPase1 gene causes genic male sterility and endosperm 1420 chalkiness in rice (Oryza sativa L.). Plant J 54, 190-204 (2008). 1421 92. Tian, D. et al. Identifying mouse developmental essential genes using machine learning. Dis 1422 Model Mech 11(2018). 1423 93. Jumbo-Lucioni, P.P., Parkinson, W.M., Kopke, D.L. & Broadie, K. Coordinated movement, 1424 neuromuscular synaptogenesis and trans-synaptic signaling defects in Drosophila 1425 galactosemia models. Hum Mol Genet 25, 3699-3714 (2016). 1426 94. Tsai, F.Y. et al. An early haematopoietic defect in mice lacking the transcription factor GATA- 1427 2. Nature 371, 221-6 (1994). 1428 95. de Pater, E. et al. Gata2 is required for HSC generation and survival. J Exp Med 210, 2843-50 1429 (2013). 1430 96. Binder, J. et al. Clinical and molecular findings in a patient with a novel mutation in the 1431 deafness-dystonia peptide (DDP1) gene. Brain 126, 1814-20 (2003). 1432 97. Caridi, G. et al. A novel mutation in the albumin gene (c.1A>C) resulting in analbuminemia. 1433 Eur J Clin Invest 43, 72-8 (2013). 1434 98. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next- 1435 generation DNA sequencing data. Genome Res 20, 1297-303 (2010). 1436 99. Retterer, K. et al. Clinical application of whole-exome sequencing across clinical indications. 1437 Genet Med 18, 696-704 (2016). 1438 100. Snoeijen-Schouwenaars, F.M. et al. Diagnostic exome sequencing in 100 consecutive patients 1439 with both epilepsy and intellectual disability. Epilepsia 60, 155-164 (2019). 1440 101. Mencacci, N.E. et al. De Novo Mutations in PDE10A Cause Childhood-Onset Chorea with 1441 Bilateral Striatal Lesions. Am J Hum Genet 98, 763-71 (2016). 1442 102. Trujillano, D. et al. Clinical exome sequencing: results from 2819 samples reflecting 1000 1443 families. Eur J Hum Genet 25, 176-182 (2017). 1444 103. Bayer SA, A.J. (ed.) Atlas of human central nervous system development, Vol 2: The human 1445 brain during the third trimester (CRC Press, Boca Raton, 2004). 1446 104. Bayer SA, A.J. (ed.) Atlas of human central nervous system development, Vol 3: The human 1447 brain during the second trimester, (CRC Press, Boca Raton, 2005). 1448 105. Bayer SA, A.J. (ed.) Atlas of human central nervous system development, Vol 4: The human 1449 brain during the late first trimester, (CRC Press, Boca Raton, 2006). 1450 106. Bayer SA, A.J. (ed.) Atlas of human central nervous system development, Vol 5: The human 1451 brain during the early first trimester, (CRC Press, Boca Raton, 2008). 1452 107. Barakat, T.S. et al. Functional Dissection of the Enhancer Repertoire in Human Embryonic 1453 Stem Cells. Cell Stem Cell 23, 276-288 e8 (2018). 1454 108. Barakat, T.S. et al. Stable X chromosome reactivation in female human induced pluripotent 1455 stem cells. Stem Cell Reports 4, 199-208 (2015). 1456 109. Barakat, T.S. & Gribnau, J. X chromosome inactivation and embryonic stem cells. Adv Exp 1457 Med Biol 695, 132-54 (2010). 1458 110. Barakat, T.S. & Gribnau, J. Generation of knockout alleles by RFLP based BAC targeting of 1459 polymorphic embryonic stem cells. Methods Mol Biol 1227, 143-80 (2015). 1460 111. Renaud, J.B. et al. Improved Genome Editing Efficiency and Flexibility Using Modified 1461 Oligonucleotides with TALEN and CRISPR-Cas9 Nucleases. Cell Rep 14, 2263-2272 (2016). 1462 112. Chambers, S.M. et al. Highly efficient neural conversion of human ES and iPS cells by dual 1463 inhibition of SMAD signaling. Nat Biotechnol 27, 275-80 (2009). 1464 113. Kim, D., Langmead, B. & Salzberg, S.L. HISAT: a fast spliced aligner with low memory 1465 requirements. Nat Methods 12, 357-60 (2015). 1466 114. Anders, S., Pyl, P.T. & Huber, W. HTSeq--a Python framework to work with high-throughput 1467 sequencing data. Bioinformatics 31, 166-9 (2015).

35

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1468 115. Lawrence, M. et al. Software for computing and annotating genomic ranges. PLoS Comput 1469 Biol 9, e1003118 (2013). 1470 116. Robinson, M.D., McCarthy, D.J. & Smyth, G.K. edgeR: a Bioconductor package for differential 1471 expression analysis of digital gene expression data. Bioinformatics 26, 139-40 (2010). 1472 117. Zhou, Y. et al. Metascape provides a biologist-oriented resource for the analysis of systems- 1473 level datasets. Nat Commun 10, 1523 (2019). 1474 118. Raudvere, U. et al. g:Profiler: a web server for functional enrichment analysis and 1475 conversions of gene lists (2019 update). Nucleic Acids Res 47, W191-W198 (2019). 1476 119. Kuleshov, M.V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 1477 update. Nucleic Acids Res 44, W90-7 (2016). 1478 120. Ying, D. et al. HaploShare: identification of extended haplotypes shared by cases and 1479 evaluation against controls. Genome Biol 16, 92 (2015). 1480 121. Turton, K.B., Esnault, S., Delain, L.P. & Mosher, D.F. Merging Absolute and Relative 1481 Quantitative PCR Data to Quantify STAT3 Splice Variant Transcripts. J Vis Exp (2016). 1482 122. Brinkman, E.K., Chen, T., Amendola, M. & van Steensel, B. Easy quantitative assessment of 1483 genome editing by sequence trace decomposition. Nucleic Acids Res 42, e168 (2014). 1484 123. Lindhout, M., Rubio-Gozalbo, M.E., Bakker, J.A. & Bierau, J. Direct non-radioactive assay of 1485 galactose-1-phosphate:uridyltransferase activity using high performance liquid 1486 chromatography. Clin Chim Acta 411, 980-3 (2010).

1487

36

bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Figure 1 A Pictures of patients will be made available upon publication of the peer-reviewed version of this preprint

B C Individual 1, 17 month Individual 4, 24 month

G C AA T G T D E Family 1 Family 2 Family 3 Family 4 WT Netherlands Canada USA Oman (India) (India) (Pakistan)

G C A R T G T

c.34 A>G > >

het. > G C A G T G T 1 >2 >3 4 c.34 A>G ? 5 brain hom. malformation seizures Family 5 Family 6 Family 7 Family 8 Family 9

Pakistan Iran Iran Iran UAE

> >

> > >

? 6 7> 8 ? 9 ? 10 11 GDD similar similar seizures a ected a ected F G Gene expression for UGP2 (ENSG00000169764.10) female male in Human Brain

200200

150150 TPM 100100

5050

AmygdalaAnterior cingulateCaudate cortex (basalCerebellar (BA24) ganglia)Cerebellum HemisphereCortex Frontal CortexHippocampus (BA9)HypothalamusNucleus accumbensPutamen (basal Spinal(basal ganglia) cordganglia)Substantia (cervical c-1) nigra bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which Figure 2 was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. A UGP2 Exon Ia Ib II III IV V VI VII VIII IX X

Exon size (bp) 269 325 128 108 186 134 298 198 243 105 443 Transcript variant 1 NM006759 c.34A>G, p.Met12Val 1527 bp, 509 AA Transcript variant 2 NM001001521 c.1A>G, p.? 1494 bp, 498 AA B C D 1.0 1.5 Control Mother Father Proband 0.8 1.0 0.6 Vinculin 0.4 long 0.5

short UGP2 Fraction of short isoform short 0.2

0.0 Total UGP2 protein 0.0

Control ParentsProband ParentsProband E Parents F UGP2 UGP2 /DAPI 6.0×104 Proband 2

4.0×104

2.0×104 Control Cells per cm per Cells 0 0 1 2 3 4 5 Days G

UGP2 activity in fibroblast Mother 15

10

5 Father (nmol/min/mg)

Enzymatic activity Enzymatic 0

father control mother proband Proband bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 3 Long A C Short L ESC L NSC Fibro 150 kDa L ESC NSC Fibro Vinculin 1000 bp long long UGP2 850 bp short 50 kDa short

B D E F 1.00 Transcript Transcript Protein RT-PCR qPCR 0.75 1.0 1.0 1.0 0.8 0.8 0.8 0.50 0.6 0.6 0.6 0.4 0.4 0.4 0.25 0.2 0.2

0.2 Fetal tissue RNA−seq

0.0 0.0 Fraction short / total UGP2 0.00 Fraction short isoform Fraction short Fraction short isoform Fraction short 0.0 Fraction short isoform Fraction short rge Heart Liver Brain Thymus ancreasKidneySpleen ESC NSC ESC NSC ESC NSC cle Back Lung Left StomachP Lung Right MuscleMus ArmMuscle Leg Intestine Small Fibroblast Fibroblast Fibroblast Intestine La G

A) first trimester B) second trimester GW23 C) third trimester GW36

H

L Frontal Temporal Parietal Occipital Cerebellum L ESC GW: 14 20 28 14 20 28 14 20 28 14 20 28 14 20 28 150 kDa Vinculin

long 50 kDa short UGP2 bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Figure 4 *** *** *** *** *** *** A B 10.0 * 10.0 10.0 KO KI 7.5 7.5 7.5

1176 153 220 5.0 5.0 5.0 2.5 2.5 2.5 Log2(RPKM + 1) Log2(RPKM + 1) Log2(RPKM + 1)

0.0 0.0 0.0 Up compared to WT WT KI KO WT KI KO WT KI KO UP, n=1176 genes Up, n=153 genes Up, n=220 genes *** *** *** ** *** *** 10.0 10.0 10.0 KO KI 7.5 7.5 7.5 982 71 136 5.0 5.0 5.0

2.5 2.5 2.5 Log2(RPKM + 1) Log2(RPKM + 1) Log2(RPKM + 1)

0.0 0.0 0.0 Down compared to WT WT KI KO WT KI KO WT KI KO C Down, n=982 genes Down, n=71 genes Down, n=136 genes 1176 up GO Biological Process 2018 153 up GO Biological Process 2018 220 up GO Cellular Component 2015 modulation of chemical synaptic transmission 118 *** cell migration involved in sprouting angiogenesis 247 *** neuronal cell body membrane 130 ** neuron cell-cell adhesion 103 *** cell-cell adhesion mediated by cadherin 240 *** cell body membrane 130 ** glutamate receptor signaling pathway 97 *** negative regulation of vascular permeability 211 ** sarcoplasmic reticulum membrane 74 * amino acid import across plasma membrane 77 *** neuronal ion channel clustering 180 ** gap junction 54 * DNA replication checkpoint 74 *** cyclic nucleotide catabolic process 180 ** photoreceptor disc membrane 54 * MGI Mammalian Phenotype Lev. 4 OMIM Expanded Rare Diseases GeneRIF ARCHS4 abnormal synaptic transmission 121 *** epilepsy 12 Episodic ataxia 56 *** seizures 54 *** ataxia 11 Juvenile myoclonic epilepsy 56 *** abnormal emotion/affect behavior 52 *** deafness 4 Juvenile retinoschisis 56 *** abnormal learning/memory/cond 44 *** mental retardation 4 Mental retardation epilepsy 56 *** abnormal brain morphology 40 *** epiphyseal dysplasia, multiple 1 Ohtahara syndrome 56 *** 982 down KEGG 2019 Human 71 down GO Cellular Component 2017 136 down GO Cellular Component 2017 ECM-receptor interaction 43 *** neuronal cell body membrane 274 ** neuronal cell body membrane 116 ** Renin-angiotensin system 39 *** invadopodium membrane 228 * neurofilament 100 * Arrhythmogenic right ventricular cardiomyopathy 37 *** CD95 death-inducing signaling complex 228 * CD95 death-inducing signaling complex 100 * Caffeine metabolism 31 * death-inducing signaling complex 181 * SAGA-type complex 100 * Hypertrophic cardiomyopathy 26 *** integral component of synaptic vesicle membrane 149 * multivesicular body membrane 100 * GO Molecular Function 2015 HumanCyc 2016 WikiPathways 2019 Mouse laminin binding 71 *** chondroitin sulfate biosynthesis (late stages) 126 * Synthesis and Degradation of Ketone Bodies 100 * extracellular matrix binding 46 *** fatty acid β-oxidation (peroxisome) 56 Prostaglandin Synthesis and Regulation 96 ** sodium channel regulator activity 44 *** chondroitin sulfate biosynthesis 47 Robo4 and VEGF Signaling Pathways Crosstalk 79 * Rho guanyl-nucleotide exchange factor activity 40 *** gluconeogenesis 29 Heart Development 57 ** glucose binding 40 ** mucin core 1 and core 2 O-glycosylation 24 Dysregulated miRNA Targeting in Insulin/PI3K-AKT 49 *

NNAT FGFBP3 GPC2 ID4 D 10 8 5 40 ** ** * *** 8 6 4 30

6 hange 3 hange * 4 hange hange 20 c c 4 c 2 c ld ld ld ld 2 2 1 10 Fo Fo Fo Fo 0 0 0 0 ** ** KI KI KI KI WT KO WT KO WT KO WT KO

rescue WT rescue WT rescue WT rescue WT rescue MUT rescue MUT rescue MUT rescue MUT FGFR3 PLAU GALNT7 MYBL1 6 25 * 8 15 * * *** 20 6 *** 4 15 10 hang e hang e hange hang e 4 * c c c c 10 ** ld ld ld ld 2 5 5 2 Fo Fo Fo Fo **

KI KI KI KI WT KO WT KO WT KO WT KO

rescue WT rescue WT rescue WT rescue WT rescue MUT rescue MUT rescue MUT rescue MUT bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which Figure 5 was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. *** *** A NS C NS * 40 500 ***

30 400 300 20 200

(nmol/min/mg) 10 100 enzymatic activity enzymatic

0 number of Average glycogen granules per cell 0 WT KI KO rescue rescue WT KI KO rescuerescue WT MUT WT MUT B NSC WT NSC UGP2 KI NSC UGP2 KO

NSC UGP2 KO rescue WT NSC UGP2 KO rescue MUT

D rescue rescue L WT KI1 KI2 KO1 KO2 L WT MUT

100 kDa glycosylated LAMP2 75kDa hypo- glycosylated 50kDa Actin 37kDa

E 10 WT 8 ** ** * ** KI 6 KO 4 * rescue WT * * ** * *** 2 ** * rescue MUT Fold change Fold 0 sXBP1 HSPA5 ATF4 EDEM bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 6 Control UGP2 mutant A ugp2a B

GAGAACTCCATGCACACAGAGCTGGAGAAGCTCTTGAGTACCGCCAAAACACCAGAGGCCGAGgt protein: 507 aa ---ENSMHTELEKLLSTAKTPEAE--- gRNA PAM

Ugp2aΔ/Δ GAGAACTCCATGCACACAGAGC------CGAGgt protein: 34+46 aa ---ENSMHTEPRSLGRTLRASRSFSTVSFRRRGRLLTGPRSRDHPKTRSSRMRRSS*

ugp2b

gRNA PAM GTCTCAGTAAAGAGTTATTGAGCAACTCCAACGGGATGGCTCAGTTTCAGGAAGTGATGCGACAG protein: 492aa / 506 aa ---LSKELLSNSNGMAQFQEVMRQQLESSMHVELEKLLSTAKGAGAE

Ugp2bΔ/Δ GTCTCAGTAAAGAGTTATTGAGCAACTCCAA------TGGCTCAGTTTCAGGAAGTGATGCGACAG protein: none /14+16 aa ---LSKELLSNSNGSVSGSDATAAGELYAC*

C 10 D c-FOS qPCR 1.5 8 *** mg) /

min 6 / 1.0 4

2 0.5 SEA (nmol SEA 0

Relative expression normalized for GAPDH 0.0

Δ/Δ Δ/Δ 5 dpf Δ/Δ 6 dpf Control Control 5 dpf Control 6 dpf UGP2 E UGP2 UGP2 F 140 Control 15000 Ugp2 Δ/Δ min) 120 / 100 10000 80 60 5000 40

20 (mm) moved Distance Distance Distance moved (mm 0 0

3 h Δ/Δ Control UGP2 G H I *** * 20 20 20

15 15 15

10 10 10

movements 5 5 5 Number of eye Number Movement frequency Movement

0 0 (s) duration Movement 0

Δ/Δ Δ/Δ 0.4 Δ/Δ 0.4 Control Δ/Δ 0.04 Δ/Δ 0.04 Control 0.4 Control 0.4 UGP2 Control 0.04 Control 0.04 Δ/Δ untreated UGP2 Δ/Δ untreated UGP2 UGP2 UGP2 Control untreated Control untreated UGP2 UGP2 bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which Figure 7was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. ESC-Hematopoietic stem cell di erentiation A GATA2 LMO2 RUNX1 D 5 8 8 ** * * 4 * 6 6 Long 3 *** 4 4 2 Short 1 2 2 Fold change Fold Fold change Fold * change Fold 0 0 0 <50 AA

WT KO WT KO WT KO 21522 genes rescue WT rescue WT rescue WT (42976 isoforms) rescue MUT rescue MUT rescue MUT B ESC-Cardiomyocyte di erentiation TNNT2 MYL2 MYL7 10466 genes with >1 isoform (31920 isoforms) 10 150 * 2.5 8 2.0 6 100 1.5 ** 1766 genes with short isoform 4 50 1.0 2 0.5 largely identical to long C-term Fold change Fold 0 ** change Fold 0 * change Fold 0.0 *

WT KO WT KO WT KO 1197 essential genes rescue WT rescue WT rescue WT C rescue MUT rescue MUT rescue MUT Wild type CTRL UGP2 KO UGP2 KO Rescue WT 247 genes <50 AA di erence between short and long and no OMIM phenotype

83 genes with high variability across tissues

E Ratio (shortest isoform /total isoforms)

0 0.4 1

long isoform predominant

short isoform predominant 83 essential genes without OMIM phenotype83 essential expression isoform and variable H9_WT NSC_WT Fetal_Liver Fetal_Brain Fetal_Heart Fetal_Ovary Fetal_Kidney Fetal_Spleen Whole_Blood Fetal_Thymus etal_Lung_Left Fetal_Stomach Fetal_Pancreas Fetal_Fibroblast F etal_Lung_Right F Fetal_Muscle_Leg Fetal_Muscle_Arm Fetal_Muscle_Back Fetal_Intestine_Small Fetal_Intestine_Large bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Supplementary Figure 1 Family 7 Family 6 Family 5 Family 4 Family 1 A C length (cm) head circumference (cm) head circumference

Age (Years) Age (Years) B Individual 6 Individual 5 Individual 5

D Gene expression for UGP2 (ENSG00000169764.10) 600

500

400

TPM 300

200

100

PancreasBrainWhole - SpinalBrain BloodBrain - cordHypothalamusTestis - Substantia Brain(cervicalKidney - HippocampusBrain c-1) nigraStomach- Cortex - PutamenPituitarySpleenBrain (basalSkin - AmygdalaProstate ganglia)- NotOvary SunSkin SmallExposed - SunBrain Intestine ExposedCells (Suprapubic)- CaudateAdrenal - EBV-transformed Thyroid- (LowerTerminalBrain Gland(basalBrain leg)- CerebellumEsophagus Ileumganglia) - CerebellarBrain lymphocytesFallopian - NucleusCervix - Mucosa HemisphereNerve Tube -Vagina Ectocervixaccumbens - CervixTibialLung -Heart Endocervix (basalMinor - AtrialBrain ganglia) SalivaryHeart Appendage- AnteriorUterus - GlandEsophagusLeftBladder VentriclecingulateColonColon - Gastroesophageal -cortex EsophagusSigmoid - BrainTransverse (BA24)Breast - CortexBrain - Muscularis -Adipose Mammary -Junction FrontalArteryArtery - Visceral- CortexAdiposeAortaTissue - LiverCoronary (BA9)Cells(Omentum) - SubcutaneousArtery - TransformedMuscle - Tibial - Skeletal fibroblasts bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Supplementary Figure 2 1.0 0.8 A B 0.6 0.4 CTRL ♀ CTRL ♂ mother father

Fraction Fraction of 0.2 Vinculin isoform short 0.0 UGP2 Control Parents Parents C D F AA G C AA T G T C 1.5 1.5 Proband Control 1.0 1.0

0.5 0.5 AA G C A GT G T C Fold change Fold 0.0 0.0

Total UGP2 protein Proband

Control Parents AA G C A R T G T C Total UGP2 Short UGP2 E O 2 marker proband father mother control control control control H Father

AA G C A R T G T C 630bp UGP2 variant 2 512bp UGP2 variant 1 Mother

0 2 4 G Log2(RPKM+1) I 5 2 1.0×10 CTRL 4 8.0×10 Parents 6.0×104 ** 4.0×104 2.0×104 Cells per cm per Cells 0 0 1 2 3 4 5 Days

Father Mother Proband

H 8 Proband J Father L Frontal cortex Mother 6 GW: 21+123+5 21+1 150 kDa Vinculin 4 UGP2 (RPKM + 1) 2

Log 2 long 50 kDa short UGP2

0 AGL GALE GALT GYS1 PGM1 bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Supplementary Figure 3 A X X

B C KO KI WT hESC het 1 2 het 1 2

Vinculin UGP2 KO UGP2

UGP2 KI E ESC KO ESC

D 2.0 CTRL KI 1.5 KO 1.0

Fold change Fold 0.5

0.0 OCT4 NANOG REX1 bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Supplementary Figure 4 A B ESC NSC 2 µM A8301 + 2 µM Dorsomorphin mTeSR Differentiation NSC

Day 0 Day 6 Day 10 seed ESC 90% 1:1 Diff : NSC Split ESC confluence NSC p=0

C NANOG P1 OCT4 P1 REX1 P1 PAX6 P1 GFAP P1 1.5 1.5 1.5 10000 12 1000 9 1.0 1.0 1.0 100 6 0.5 0.5 0.5 10 1 3 Fold change Fold 0.0 change Fold 0.0 change Fold 0.0 change Fold 0.1 change Fold 0

ESCNSC ESCNSC ESCNSC ESCNSC ESCNSC NSC KI NSC KI NSC KI NSC KI NSC KI NSC KO NSC KO NSC KO NSC KO NSC KO

NANOG P5 OCT4 P5 REX1 P5 PAX6 P5 GFAP P5 1.5 1.5 1.5 10000 25 1000 20 1.0 1.0 1.0 100 15 10 0.5 0.5 0.5 10 1 5 Fold change Fold Fold change Fold 0.0 change Fold 0.0 change Fold 0.0 0.1 change Fold 0

ESCNSC ESCNSC ESCNSC ESCNSC ESCNSC NSC KI NSC KI NSC KI NSC KI NSC KI NSC KO NSC KO NSC KO NSC KO NSC KO

UGP2 protein upon NSC differentiation D NSC KI1 KI2 KO1 KO2 Vinculin UGP2

E Total UGP2 PROTEIN F Total UGP2 TRANSCRIPT 6 1.5 4 1.0 2 0.5 Fold change Fold Fold change Fold 0 0.0 KI KO NSC NSC

rescue WT rescue MUT bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Supplementary Figure 5 A B

R=1, p<2.2e-16 R=0.99,R = 0.99 , p<2.2e-16p < 2.2e-16 RR=1, = 1 p<2.2e-16, p < 2.2e-16

10 10 10 0.6 0.8 1 Pearson Correlation H9_WT_B1 5 5 5 H9_WT_B2 H9_KO_B1 H9_KO_B2 Log2(RPKM+1) H9_KI_B2 Log2(RPKM+1) H9_KO_B2 Log2(RPKM+1) H9_WT_B2 0 0 0 H9_KI_B2 0 5 10 0 5 10 0 5 10 H9_KI_B1 Log2(RPKM+1) H9_WT_B1 Log2(RPKM+1) H9_KO_B1 Log2(RPKM+1) H9_KI_B1 NSC_WT_B2 R=0.99, , p<2.2e-16 RR=0.97, = 0.97 , pp<2.2e-16 < 2.2e-16 R=0.99,R = 0.99 , p<2.2e-16p < 2.2e-16 NSC_WT_B1 10 10 10 NSC_KI_B1 NSC_KI_B2 NSC_KO_B2 5 5 5 NSC_KO_B1 Log2(RPKM+1) NSC_KI_B2 Log2(RPKM+1) NSC_KO_B2 Log2(RPKM+1) NSC_WT_B2 0 0 0 H9_KI_B1 H9_KI_B2 H9_KO_B2 H9_KO_B1 H9_WT_B2 H9_WT_B1 0 5 10 0 5 10 0 5 10 NSC_KI_B2 NSC_KI_B1 NSC_KO_B1 NSC_KO_B2 Log2(RPKM+1) NSC_WT_B1 Log2(RPKM+1) NSC_KO_B1 Log2(RPKM+1) NSC_KI_B1 NSC_WT_B1 NSC_WT_B2

C D E UP 3413 UP UP ESCs WT NSCs DOWN 3164 DOWN DOWN 1 3025 1329 KO KO KO 8 2363 1053 NSCs 5 57 3502 KI KI KI 373 1160 18 56 3081 207 1167 WT KO WT KO KI WT KO ESCs F ESC genes NSC genes

0 1 2 3 KI_B1 KI_B2 WT_B1 WT_B2 KO_B2 KO_B1 WT_B2 WT_B1 KI_B1 KI_B2 KO_B1 KO_B2 Log2(RPKM + 1) ESCs NSCs bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which Supplementarywas not certified Figure by peer review) 6 is the author/funder. All rights reserved. No reuse allowed without permission.

A TRA1-81 SSEA4 C DAPI TRA1-81 OCT4 MERGE DAPI SSEA4 NANOG MERGE OCT4 NANOG

male clone 10 AGC AAR T G TCCT A clone 16

UGP2 +/- male c.34 A>G het clone 18 clone 10 female clone 1 AGC AAR T G TCCT A clone 10 c.34 A>G het UGP2 +/- female clone 14 clone 1 female clone 16 AGC AAGT G TCCT A clone 17

UGP2 -/- female c.34 A>G hom clone 18 clone 16

ESC B 5 Mother 4 Father 3 Child 2 1 Fold change Fold 0

KLF4 cMYC SOX2 GDF3 REX1 FGF4 ESG1 TERT OCT3/4 NANOG D NANOG OCT4 REX1 PAX6 GFAP 1.5 1.5 1.5 1500 8 1.0 1.0 1.0 1000 6 4 0.5 0.5 0.5 500 2 Fold change Fold Fold change Fold change Fold change Fold Fold change Fold 0.0 0.0 0.0 0 0

ESC ESC ESC ESC ESC

Child NSC Child NSC Child NSC Child NSC Child NSC Parents NSC Parents NSC Parents NSC Parents NSC Parents NSC NNAT FGFBP3 ID4 PLAU E 2.0 3 5 1.5 1.5 4 2 3 1.0 1.0 1 2 0.5 0.5 1 Fold change Fold Fold change Fold 0.0 change Fold 0 0 change Fold 0.0

Child Child Child Child Parents Parents Parents Parents bioRxiv preprint doi: https://doi.org/10.1101/799841; this version posted October 10, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Supplementary Figure 7 A UGP2 activity ESC C *** NS ESC WT ESC UGP2 KO 25 ***

20

15

10 (nmol/min/mg)

enzymatic activity enzymatic 5

0 WT KI KO rescue rescue WT Mut B UGP2 activity iPSC-NSC D E * *** *** *** 10 40 100 8 30 75 6 20 50 4

(nmol/min/mg) 10 25

enzymatic activity enzymatic 2 PAS stained area (%) PAS in ESCs 0 0 Glycogen granules (%) in iPSC derived NSCs 0 parents proband WT KI KO father mother child iPSC-NSCs Family 1 F ESCs L WT KI1 KI2 KO1 KO2 150 kDa LAMP2 100 kDa

50 kDa ACTIN 37 kDa

WT KI G KO H Control ESC rescue WT Fibroblast Parents 4 rescue MUT 2.0 Proband 3 1.5 * 2 1.0 1 0.5 Fold change Fold Fold change Fold 0 0.0 sXBP1 HSPA5 ATF4 EDEM sXBP1 HSPA5 ATF4 EDEM