EXPLORING MOLECULAR PATHOGENESIS TO

STREAMLINE FUTURE THERAPEUTICS IN

RARE DISEASES USING GSD1A AS A MODEL

By

KATHLEEN LYNN PLONA

Submitted in partial fulfillment of the requirements for the degree

of Doctor of Philosophy

Dissertation Advisor: Mitchell L. Drumm, PhD

Department of Genetics and Genome Sciences

CASE WESTERN RESERVE UNIVERSITY

August 2021

Case Western Reserve University

School of Graduate Studies

We hereby approve the thesis of

Kathleen Lynn Plona

Candidate for the degree of Doctor of Philosophy*

Committee Chair

Paul Tesar

Committee Member

Anna Mitchell

Committee Member

Colleen Croniger

Committee Member

Mitchell L. Drumm

Date of Defense

May 27, 2021

* We also certify that written approval has been obtained for any proprietary material

contained therein

Dedication

This work is dedicated to everyone who has ever experienced the confusion and struggle of finding a diagnosis.

Table of Contents

List of Tables

List of Figures

Acknowledgements

Abstract

Chapter 1: Background and Significance

1.1 Rare diseases

1.2 Role of sequencing in diagnostics

1.3 Variant interpretation

1.4 G6PC Family

1.5 Slc37a4 (G6PT)

1.6 Glycogen storage disease type 1

1.7 Glycogen storage disease type 1a

1.8 Significance

1.9 Figures for Chapter 1

1.10 Tables for Chapter 1

Chapter 2: Classifying molecular phenotypes of G6PC variants for pathogenic properties and to guide therapeutic development

2.1 Abstract

2.2 Synopsis

2.3 Introduction

2.4 Materials and Methods

2.5 Results

2.6 Discussion

2.7 Figures for Chapter 2

2.8 Tables for Chapter 2

Chapter 3: Investigating localization and movement of G6PC family members and G6PC1 variants

3.1 Abstract

3.2 Synopsis

3.3 Introduction

3.4 Materials and Methods

3.5 Results

3.6 Discussion

3.7 Figures for Chapter 3

Chapter 4: Discussion

4.1 Figures for Chapter 4

References

List of Tables

Table 1-1 Associated diseases and phenotype overlay for G6PC family members and G6PT

Table 2-1 Summary of G6PC variant molecular phenotype results

Table 3-1 Plasmids used in Localization and Movement Analysis

List of Figures

Figure 1-1. Functional relationship of G6PC and G6PT in G6P metabolism

Figure 1-2. Tissue expression and cellular localization of G6PC family and G6PT

Figure 1-3. Potential molecular consequences of DNA variants

Figure 2-1 Selection of G6PC variants

Figure 2-2 Protein levels are significantly reduced for a subset of G6PC variants

Figure 2-3 G6PC N-linked glycosylation is altered in several G6PC variants

Figure 2-4 Several G6PC variants have altered protein localization

Figure 2-5 C109Y, a Variant of Uncertain Significance, shows features consistent with pathogenicity

Figure 3-1 Localization and movement of G6PC family members

Figure 3-2 Movement and colocalization of G6PC family members with G6PT

Figure 3-3 Movement of G6PC1 variants

Figure 3-4 Abnormal movement in G6PC1 variants

Figure 4-1. Protein levels of G6PC1 active site variants in HepG2 cells

Figure 4-2. Protein levels of G6PC1 missense variants in HepG2 cells

Acknowledgements

I am grateful for the expertise and assistance of Yuriy Fedorov and for equipment from the Small Molecule and Drug Development (SMDD) core facility. I would like to acknowledge BioRender for creation of figures and SnapGene for plasmid map generation.

This work was supported by grants from the Children’s Fund for Glycogen Storage

Disease, the Research Institute for Children’s Health, and NIH grant T32 GM007250.

I thank my advisor, Dr. Mitchell Drumm, for welcoming me into his lab and giving me the opportunity to work on a project I was passionate about. For caring about rare diseases, finding new ways to investigate them, and giving them the attention they deserve.

I am thankful to my thesis committee for their insights and support along the way, and especially to Dr. Anna Mitchell for supporting me in my clinical interests and mentoring me in both the clinical and research worlds.

I am grateful to all of my colleagues in the lab and program who supported me through this journey, sharing reagents, expertise, and kindness whenever needed. So many have helped me through hard times and celebrated my successes, and this community of support carried me along the way.

I want to thank all of the scientists and friends who graced my couch with their presence and shared science and support along the way. The cross-discipline discussions helped build the science into its final form. I especially thank Dr. Tristan de Jesús, Sarah

Holden, and Dr. Zach Stanfield, for their many years of friendship, patience, and ideas, and Dr. Doug Oswald for being the best glycobiologist I have ever met.

I thank Anjali and the entire Prabhakaran family for sharing their experiences with me and teaching me the most valuable lessons about living with GSD1a. And I thank Anjali for being a stellar young scientist and helping shape this project inside and outside of the lab.

I thank Katherine Halloran, Little Katie, for starting as my little shadow and ending as my colleague and friend. I am so grateful you stayed with me the whole way adding ideas, editing writing, and keeping the energy alive in the lab.

I thank the rest of the summer dream team students, Henry Stitzel and Noah Taylor, for tirelessly sequencing, cloning, and putting mCherry where it belonged, and for being the extra sets of hands and heads for everything going on in the lab.

I am thankful to my undergraduate mentors who brought me into research and gave me the skills I needed to build upon. Dr. Sue Simon-Westendorf for starting my interest in medicine when I accidentally signed up for her intro to biology class. Dr.

Soichi Tanda, who guided my first lab experience and taught me plasmid mutagenesis a decade ago which gave me the research skills and foundation for this thesis. And Dr.

Helaine Burstein, who supported me in every way possible to get me admitted to an

MSTP program so I could pursue my wild dream of doing medicine and research.

I thank my family who cheered me the whole way and helped me believe in myself for as long as I can remember. And my in-laws who have made the cheering section bigger and louder and the celebrations even more fun.

I thank my parents for giving me everything I needed, and more, to get this far. For showing me how important and freeing learning can be and helping me down the path I chose to be an eternal student. This has been a long journey and their love and support made it much easier to keep going and reach this finish line.

I thank Erin Ponting for being my other half in lab. For being my calculator when

I forgot math, for being my memory when I– what was I doing again?, for talking through every good and bad idea and always making everything better. This work would not have been possible without her sitting next to me through it all.

I thank Dr. Ian Bayles for being my other half in everything else. For always listening and supporting me. For knowing when I needed to focus or needed to be distracted for a while. For tirelessly reading over this thesis with me (the truest testament to love) and barely ever complaining. For being such a great journal club partner that I decided to marry him, and for then transitioning from “Mr. and Mrs.” to “Dr. and Dr.” with me. For being a wonderful husband and scientist in more ways than I have words for.

Exploring Molecular Pathogenesis to Streamline Future

Therapeutics in Rare Diseases Using GSD1a as a Model

Abstract

by

KATHLEEN LYNN PLONA

Increasing use of DNA sequencing has led to a large unmet need for variant interpretation in diagnostic and prognostic research and medicine. The combination of patient data and in silico analysis provides useful but often inconclusive information that can be bolstered by in vitro analysis. Uniform, rapid, and cost-effective analysis of novel DNA variants can be used to determine if and how variants contribute to pathogenicity. Understanding molecular phenotypes can lay the groundwork for personalized therapeutic strategies and treatment approvals, especially in rare diseases where clinical trials are limited by small patient numbers. In this work, the rare Mendelian disease glycogen storage disease 1a was used as a model for exploring variant analysis and classification in vitro. Four molecular phenotypes were examined based on characteristics of the G6PC protein. Of 29 G6PC variants analyzed, six revealed a complex abnormal phenotype of reduced protein amount, abnormal glycosylation, abnormal localization, and abnormal intracellular trafficking. Of these six, five were known pathogenic variants, and one was a variant of uncertain significance (VUS). This provides novel data on the molecular mechanisms of pathogenicity for these variants and provides evidence for definitive classification of the VUS p.C109Y as a pathogenic variant.

In addition to variant classification, the G6PC family and the substrate transporter G6PT were compared to enhance foundational knowledge of their localization and movement within cells. When assessed in a HepG2 cell model, G6PC1, G6PC3, and G6PT display identical localization and movement patterns, while G6PC2 has more diffuse localization and does not appear to be trafficked similarly within the cell. This body of work helps demonstrate that uniform in vitro systems for classifying variants, paired with strong foundational knowledge, can provide informed avenues for therapeutic development and treatment in both common and rare genetic diseases.

Chapter 1: Background and Significance

1.1 Rare Diseases

Rare diseases are defined as those that affect fewer than 1 in 2,000 people, or fewer than 200,000 people in the USA. Because they are rare and poorly tracked, accurate estimates of how extensively rare diseases burden humanity are difficult to obtain¹. Though each disease individually affects a small number of people, it is estimated that there are more than 7,000 rare diseases, which collectively affect more than 25 million people in the USA alone and many more worldwide². Of these, more than 80% are suspected to be single-gene genetic disorders, known as Mendelian diseases.

There are many challenges to studying and treating these rare diseases³. The first is that rare diseases are inherently rare, meaning patient populations are very small, often consisting of only one patient or a small family group. Even when there is access and consent to study these patients, the studies lack power and the ability to replicate investigation and expand interpretation². In therapeutics development, clinical studies typically require large patient numbers, and the gold standard of clinical trials, the randomized controlled trial, is impossible with such small patient numbers⁴. In the trials that do occur, there remain barriers to gaining valuable insight from them. Rees et al. tracked rare disease clinical trials initiated over a two-year span starting in 2010. Nearly 700 clinical trials were initiated, but they found that over half never yielded meaningful, actionable results. Thirty percent of the trials were discontinued with inadequate patient recruitment listed as the primary cause, 30% of completed trials took 2-4 years to publish, and 30% remained unpublished 4 years after completion⁵. Additionally, there is little funding or incentive to study these diseases, as research requires a large expenditure of time and money but offers little financial gain from treating or curing such a small group. This earned them the term “orphan diseases” because drug development companies had no interest in pursuing their investigation. Numerous rare disease groups, collectives, and policy measures have been established to counteract the financial challenge, the most prominent in the USA being the Orphan Drug Act of 1983. Some of the relatively more common rare diseases have established foundations dedicated to one disease or a group of related diseases to facilitate private funding and research interest; however, this still leaves numerous rare diseases with no meaningful ongoing research in either pathological mechanisms or therapeutic development.

1.2 Role of sequencing in diagnostics

Since the invention of DNA sequencing in the 1970s, the technologies and role of DNA sequencing in scientific and clinical studies have expanded substantially, making it a cornerstone of research and medicine⁶. Though our knowledge of the human genome and its vast variation has increased exponentially with the completion of the Human Genome Project in 2003 and the rapid expansion to the 1000 Genomes Project in 2012⁷⁻⁹, clinical applications have been hampered by cost, time, data storage, and interpretation. Technological advancements, especially next generation sequencing (NGS), have reduced the time and cost of whole genome sequencing (WGS), and techniques like Sanger sequencing and microarray are still frequently employed when a particular diagnosis is suspected. Clinically, sequencing is most commonly and prominently used in prenatal screening and to improve cancer care, but its role in diagnostics for Mendelian disorders is rapidly expanding as well⁶. Whole genome and whole exome sequencing (WES) in Mendelian disorders serve both to diagnose patients with known conditions and to discover previously unknown causes of new conditions. This has been most prominent in the field of intellectual disability and neurodevelopmental research, where it is estimated that up to 25% of cases can be diagnosed with WES¹⁰. Certainly, with the increasing use of NGS in clinical practice, and as our library of known genetic diseases expands, the diagnostic rate will continue to increase and provide invaluable benefit to rare disease discovery and management.

1.3 Variant interpretation

With the increasing use of WGS and WES, novel variant identification is unavoidable. Each individual has about 3-4 million single nucleotide variants (SNVs) and more than 300,000 small insertions and deletions (indels) when compared to a reference genome, and every genome has more than 100 premature stop codons in protein coding genes¹¹; ¹². Many of these variants can be filtered out as insignificant by comparing to population data, as those frequently occurring in the population are likely to have no detrimental effect. However, numerous de novo and rare variants, defined as those found in less than 1% of the population, leave challenges in interpretation and diagnostics. In silico programs exist to help sort this surplus of variants and narrow down diagnoses. In addition to population frequency, they assess evolutionary sequence conservation and homology to determine how likely the change is to be tolerated. For variants in protein coding regions, the effect on the final protein product is considered: nonsense and frameshift variants are most likely to be deleterious, while silent variants are more likely to have no effect. Missense variants are assessed for physicochemical similarity of the new amino acid and for predicted structural changes in the protein based on known structural data and homology to similar proteins. These factors allow variants to be categorized on a spectrum from benign to pathogenic; however, many variants do not have enough evidence to support their role either way and are termed variants of uncertain significance (VUS). ACMG guidelines suggest using several predictive programs, such as SIFT, CADD, and PolyPhen, when assessing pathogenicity, but treat agreement among multiple in silico predictors only as supporting evidence rather than as a final classification, and up to 35% of variants remain classified as VUS or have conflicting interpretations of pathogenicity¹³⁻¹⁵. More in vivo and in vitro data are required to interpret the molecular effects of these variants and provide definitive diagnostic and prognostic information for clinicians and patients.
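The filtering logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an ACMG classifier and not a method used in this work: the `Variant` fields, the `triage` function, and the returned labels are hypothetical names introduced here, and the only threshold used is the 1% population frequency cutoff for rare variants given in the text.

```python
from dataclasses import dataclass

# Cutoff taken from the text: "rare" variants occur in less than 1% of the population.
RARE_FREQUENCY_CUTOFF = 0.01


@dataclass
class Variant:
    """Hypothetical minimal record for one coding variant."""
    name: str
    population_frequency: float  # fraction of the population carrying the variant
    consequence: str             # "nonsense", "frameshift", "missense", or "silent"


def triage(variant: Variant) -> str:
    """Return a coarse first-pass label; this is not a clinical classification."""
    # Step 1: variants that are frequent in the population are likely tolerated.
    if variant.population_frequency >= RARE_FREQUENCY_CUTOFF:
        return "common: likely tolerated"
    # Step 2: among rare variants, consider the effect on the protein product.
    if variant.consequence in ("nonsense", "frameshift"):
        return "rare: likely deleterious"
    if variant.consequence == "silent":
        return "rare: likely no effect"
    # Missense and anything else need conservation, structural, or functional data.
    return "rare: needs further evidence"
```

In this sketch a rare missense variant falls through to "needs further evidence", which corresponds to the situation where in vitro data become valuable.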

1.4 G6PC Family

The G6PC family is composed of three members, G6PC1, G6PC2, and G6PC3. These three paralogs share 93 identical amino acid residues, giving 39% protein homology for the family, where G6PC1 and G6PC2 share the highest homology at 50%, and G6PC1 and G6PC3 have 33% homology. In all three family members the phosphatase domain is present and the homologous active site residues of G6PC1 (R83, H119, R170, and H176) are conserved. All three are commonly reported in databases to be integral transmembrane proteins found in the endoplasmic reticulum (ER) of the cell, with their catalytic site facing the lumen of the ER¹⁶; ¹⁷. Interestingly, both G6PC1 and G6PC2 contain the ER retention signal KKXX peptide motif on their C-terminal cytoplasmic domains, while G6PC3 does not¹⁸. G6PC3 also notably lacks any reported glycosylated residues, while the UniProt database lists N-linked glycosylation at N96 for G6PC1 and N92 for G6PC2¹⁶.

Each G6PC family member has a distinct tissue expression pattern, which gave rise to its original name. G6PC1, named for its function, was the first family member identified and is highly expressed in liver and kidney tissue, and is found at lower levels in the intestines and pancreatic islet cells. This mirrors its function in glycogenolysis, which is most crucial in the primary organs of glycogen storage: the liver and kidneys. Contribution of other tissues to glycogen storage and metabolism has been shown to lesser degrees, and there is notable contribution and compensation from other organs when the liver is malfunctioning¹⁹⁻²¹. The liver appears to be sufficient for normal glucose regulation, but it is not necessary. Liver transplant is essentially curative for GSD1a, as it restores glucose homeostasis and corrects metabolic dysregulation in affected individuals, though data on the liver alone are limited since many patients receive combined liver and kidney transplants²². Conversely, in mice, selective knock-out (KO) of G6PC1 in only the liver still allows normal glucose homeostasis mirroring that of control mice, showing the liver is not necessary for whole body glucose regulation¹⁹. Additionally, it has been reported that glucose regulation becomes more manageable after puberty and with aging of individuals with GSD1a, with enhanced skeletal muscle mass at puberty being theorized to play a role in glycogen storage and release²¹; ²³.

G6PC2, originally named islet-specific G6PC-related protein (IGRP), has very restricted tissue expression and is found mainly in beta cells of the pancreatic islet and testis²⁴; ²⁵. Though it shares closest sequence homology with G6PC1, its functional similarity is less clearly defined, with controversy over whether it has phosphohydrolase function¹⁷; ²⁴; ²⁶. G6PC2’s ability to hydrolyze glucose-6-phosphate (G6P) has proven more challenging to study than that of the other family members, potentially due to relatively higher instability of the protein structure²⁴. Most sources report that it does not have G6P hydrolysis function²⁶⁻²⁹, while the UniProt database lists it as functioning with a Km of 0.45 mM¹⁶, and two other groups report potential function²⁴; ³⁰. Evaluation of function typically relies on membrane disruption, leading Hutton et al. to postulate that G6PC2 relies on other cellular components to maintain activity²⁴. While G6PC2 is not currently linked to any Mendelian disease, it is implicated in type 1 diabetes as a target of cell-mediated autoimmunity, and genome-wide association studies (GWAS) link G6PC2 SNPs to fasting blood glucose variations, supporting a retained role in glucose homeostasis at the organism level³¹. As the pancreas is not known to contribute substantially to whole organism gluconeogenesis or glycogenolysis, G6PC2 is potentially carrying out a role as a glucose sensor in the pancreas²⁴. Overall, it appears to be the family member with the most conflicting reports and least clearly defined function.

The third family member, G6PC3, is found in nearly every tissue analyzed, originally earning the name ubiquitously expressed G6PC-related protein (UGRP). Though it lacks the ER localization motif, G6PC3 is also reported to be found in the ER and to catalyze G6P hydrolysis. Its enzymatic function has been reported to be similar to that of G6PC1, with a similar Km of about 2 mM but a 6-fold lower Vmax. It is also reported to act upon a wider range of phosphate ester metabolites and to have a lesser overall preference for hydrolysis of G6P relative to G6PC1³². G6PC3 deficiency is the cause of severe congenital neutropenia type 4 (SCN4) in humans, and similarly causes neutropenia and increased bacterial infection rates in G6PC3-/- mouse models. The neutropenia has been attributed to both a decrease in released mature neutrophils and increased destruction of peripheral blood neutrophils³³. It has been shown that the neutrophil destruction is caused by buildup of a G6P analog, 1,5-anhydroglucitol-6-phosphate (1,5AG6P), in mice with G6PC3 deficiency³².
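The kinetic comparison reported above for G6PC1 and G6PC3 (similar Km of about 2 mM, 6-fold lower Vmax for G6PC3) can be made concrete with a short sketch. It assumes simple Michaelis-Menten kinetics and exactly equal Km values, neither of which is stated in the source; the function name, the arbitrary Vmax units, and the example substrate concentrations are illustrative choices only.

```python
def michaelis_menten(substrate_mM: float, vmax: float, km_mM: float) -> float:
    """Initial reaction rate under the Michaelis-Menten model."""
    return vmax * substrate_mM / (km_mM + substrate_mM)


KM_MM = 2.0                       # reported Km of about 2 mM, assumed equal for both enzymes
VMAX_G6PC1 = 1.0                  # arbitrary units; only the ratio matters here
VMAX_G6PC3 = VMAX_G6PC1 / 6.0     # reported 6-fold lower Vmax

# Illustrative G6P concentrations (mM), chosen below, at, and above Km.
for g6p_mM in (0.5, 2.0, 10.0):
    rate_1 = michaelis_menten(g6p_mM, VMAX_G6PC1, KM_MM)
    rate_3 = michaelis_menten(g6p_mM, VMAX_G6PC3, KM_MM)
    print(f"[G6P] = {g6p_mM} mM: G6PC3/G6PC1 rate ratio = {rate_3 / rate_1:.3f}")
```

Under these assumptions the substrate terms cancel, so the rate ratio stays at one sixth at every substrate concentration; a difference in Km would make the ratio concentration dependent.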

1.5 Slc37a4 (G6PT)

Slc37a4, also known as G6PT, is not a member of the G6PC family but is functionally linked to it in glucose metabolism, as it transports the G6PC substrate. It is an ER membrane antiporter responsible for G6P transport into the ER in exchange for inorganic phosphate transport from the ER into the cytoplasm. It is also able to transport other G6P analog metabolites, including 1,5AG6P³². Its expression is widespread throughout the body, like that of G6PC3, with the liver being one of the highest expressing organs¹⁷. Deficiency of G6PT is the cause of glycogen storage disease type 1b (GSD1b).

1.6 Glycogen storage disease type 1

Glycogen storage disease type 1 is one of 11 glycogen storage diseases and is divided into two subtypes: 1a and 1b. Historically, GSD1a and GSD1b were indistinguishable and simply grouped together as GSD1 due to their shared clinical presentation of hypoglycemia and hepatomegaly. The functions of G6PC1 and G6PT in the terminal steps of glycogen breakdown are so closely linked that their deficiencies result in identical symptoms (Figure 1-1). Over time, subgroups of patients were identified who had the classic symptoms of GSD1a (hypoglycemia, dysregulated metabolism, and liver damage) with the addition of immune deficiency in the form of neutropenia. Advances in molecular and genetic testing were eventually able to distinguish these as two separate diseases, where only GSD1b has the added neutropenia pathology. Notably, the immune defects of GSD1b are identical to those seen in G6PC3 deficiency³². This overlap in symptoms between GSD1b (Slc37a4 aka G6PT-/-) and both GSD1a (G6PC1-/-) and SCN4 (G6PC3-/-) reflects both the expression pattern and known role of G6PT as a transporter for G6PC family substrates (Figure 1-2 and Table 1-1).

Before genetic testing was common in clinical laboratories, the distinction could be made by assessing phosphatase function in microsomal membranes from a liver biopsy. A reduction of phosphatase function with an intact microsomal membrane gives the diagnosis of GSD type 1 but cannot differentiate GSD1a from GSD1b. Disrupting the membrane artificially removes the compartmentalization of G6PC and eliminates the need for substrate transport across the membrane via G6PT, allowing for differentiation between the two subtypes. If phosphatase activity is restored upon membrane disruption, then the defect was in G6P transport via G6PT, and the diagnosis is GSD1b. If the phosphatase activity remains deficient, then G6PC itself is dysfunctional and the diagnosis is GSD1a³⁴.
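The two-step reasoning of this microsomal assay can be written as a small decision function. This is only a sketch of the logic as described in the paragraph above; the function and parameter names are hypothetical, and the boolean inputs stand in for laboratory judgments about what counts as reduced or restored phosphatase activity, which the source does not quantify.

```python
def interpret_microsomal_assay(reduced_with_intact_membrane: bool,
                               restored_after_disruption: bool) -> str:
    """Interpret the historical liver-biopsy phosphatase assay (illustrative only)."""
    # Normal activity in intact microsomes does not point to GSD type 1.
    if not reduced_with_intact_membrane:
        return "not consistent with GSD1"
    # Disruption bypasses G6PT; restored activity means transport was the defect.
    if restored_after_disruption:
        return "GSD1b (G6PT transport defect)"
    # Activity still deficient without the membrane barrier: G6PC itself is defective.
    return "GSD1a (G6PC defect)"
```

For example, `interpret_microsomal_assay(True, False)` corresponds to the GSD1a outcome described in the text.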

1.7 Glycogen storage disease type 1a

GSD1a is a rare Mendelian disease caused by homozygous or compound heterozygous deleterious variants in the G6PC1 gene. Its incidence is estimated to be 1 in 100,000 live births, and it is the most common type of glycogen storage disease³⁵. The disease typically presents at several months of age with an enlarged liver and characteristics of hypoglycemia, including irritability, fatigue, and seizures. The pathophysiology is an inability to break down glycogen stores to free glucose into the bloodstream. In a fed state, glucose utilization and metabolism are normal, but the lack of glucose entering the blood leads to the hallmark hypoglycemia during periods of fasting. This leads to metabolic dysregulation including hypoketosis due to enhanced fatty acid oxidation, lactic acidosis from G6P shunting to other pathways, hyperuricemia via decreased renal clearance, and hyperlipidemia from enhanced triglyceride synthesis. Additionally, the affected person experiences the external symptoms of low blood sugar: dizziness, shakiness, confusion, and fatigue, which can rapidly progress to life-threatening seizures, coma, and death. When the disease is managed properly, fasting states are avoided, and the occurrence and symptoms of hypoglycemia and metabolic dysregulation are minimal. The accumulation of glycogen stores in the liver and kidneys leads to downstream effects as well. Hepatic adenomas are common in patients 20-30 years of age, and about 10% of these progress to malignancy³⁶. Renal enlargement and damage can lead to proteinuria, hematuria, increased incidence of kidney stones, hypertension, and progressive renal insufficiency³⁷.
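As a rough illustration of what the incidence estimate above implies for carriers, the following sketch applies the Hardy-Weinberg relationship. It assumes random mating, a single autosomal recessive locus with all pathogenic alleles pooled, and that incidence equals the frequency of affected genotypes; none of these assumptions comes from the source, and the function name is hypothetical.

```python
import math


def carrier_frequency(incidence: float) -> float:
    """Estimate carrier frequency (2pq) from recessive disease incidence (q squared)."""
    q = math.sqrt(incidence)   # pooled frequency of pathogenic alleles
    p = 1.0 - q                # frequency of non-pathogenic alleles
    return 2.0 * p * q


INCIDENCE = 1 / 100_000        # estimated GSD1a incidence from the text
carriers = carrier_frequency(INCIDENCE)
print(f"Estimated carrier frequency: about 1 in {1 / carriers:.0f}")

# For two carrier parents, each child has a 1 in 4 chance of inheriting both variants.
RISK_PER_CHILD_TWO_CARRIERS = 0.5 * 0.5
```

The estimate is only as good as its assumptions; populations with founder variants or non-random mating can differ substantially.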

The current core treatment of this metabolic disease centers on preventing affected individuals from ever being in a state of fasting³⁵; ³⁸; ³⁹. This requires regular ingestion of meals, snacks, and cornstarch, which slowly releases glucose between meals and supports a constant fed state metabolically. Without this constant ingestion, patients are at risk of critically low blood sugar, which can lead to seizures, loss of consciousness, coma, and death. The dietary regimen required for patients to survive involves waking up once or more through the night to ingest cornstarch, and it makes common illnesses that feature vomiting potentially fatal, as patients cannot sustain those forced periods of fasting. The constant, lifelong hypervigilance and adherence to a strict dietary regimen is a massive physical and psychological burden on these patients and the family and friends supporting them. Thus, any treatment or intervention that can even prolong the periods between necessary meals could be a substantial quality of life improvement for those affected by GSD1.

1.8 Significance

The growing use of DNA sequencing in a clinical setting is uncovering novel genetic variants faster than they can be interpreted. This causes confusion and uncertainty for the clinicians, patients, and family members struggling to understand and apply these results to their health and lives. Sequencing is often used for exploratory diagnostic purposes in cases where there is already uncertainty, and return of unclear results only amplifies that strain. Clinicians are unsure what, if any, results to report to patients, and many clinics and sequencing companies have different policies on what results should be returned to the patient or consumer. With whole exome and whole genome sequencing, large parts of the genome are being viewed, and consequently incidental findings unrelated to the original diagnostic question are prominent⁴⁰. To address this, in 2013 the American College of Medical Genetics and Genomics (ACMG) issued recommendations on the genes for which incidental findings should be reported to patients. The list, initially comprising 56 genes, contains genes for which defects would be actionable: those that are highly likely to cause disease and have defined interventions and treatments that can reduce morbidity and mortality in affected individuals⁴¹. Targeted single-gene or multi-gene panels remove the possibility of incidental findings, but still uncover variants whose significance and role in disease are unknown. Direct-to-consumer genetic testing has increased, which also puts a burden of interpretation on the companies providing it and on the healthcare professionals users seek out after obtaining their results. There is high diversity in patient views on whether they wish to know about these results, ranging from patients who request to know all results, regardless of actionability, to patients who only want definitive primary findings, to those who elect to avoid testing entirely due to the uncertainty. Standard practices suggest or require genetic counseling sessions prior to any genetic testing, in which these options are reviewed and patients are able to make the choice that is right for them⁴². Additionally, as this deals with heritable genetic information, the findings have an impact not only on the patient but on blood relatives as well. The impact and scope of incidental and uncertain findings on patient wellbeing and behavior is a popular field for ethical debate and study⁴³⁻⁴⁶. Receiving genetic information can influence reproductive choices, risk-taking behaviors, and lifestyle choices such as diet, exercise, smoking, and drinking⁴⁷. Increasing our ability to interpret results and deliver certainty would remove much of the distress and discomfort around genetic testing and enhance our ability to provide diagnostic and prognostic information to patients so people can make more informed choices in their lives.

In this work, glycogen storage disease 1a (GSD1a), caused by pathogenic variants in the G6PC1 gene, was used as a model for studying rare variants in rare diseases. It is a rare Mendelian disease with autosomal recessive inheritance and over 100 reported associated variants, more than half of which were found in the coding sequence. At the time of study conception, 33% of the coding variants were reported as variants of uncertain significance (VUS), meaning it was unclear if they caused glycogen storage disease 1a or were simply rare, non-impactful population variants. The variants reported in databases such as ClinVar can come from a variety of sources, including incidental findings from a wider genome search, or uncertain findings from testing targeted to diagnose the specific condition. It was unclear what proportion of the variants reported for GSD1a were found in someone with a clinical presentation consistent with GSD1. It is possible that some did represent clinicians and patients waiting for clarity on whether that variant was the cause of a patient’s GSD1a, or potential parents who are carriers and do not know the likelihood that a child would have GSD1a if they inherited that variant in combination with another. One goal of this work was to set up a system for testing variants, especially in rare diseases, to provide the missing pieces of functional, empiric data that can bolster variant databases and improve interpretation and disease understanding overall. This information will both help clarify disease process and pathophysiology on a molecular level and help anyone in those impossible positions trying to make life decisions based on unclear information.

In addition to understanding if a variant is pathogenic, this process can also yield information on how the variant is pathogenic. The molecular effects of a given variant can be wide reaching (Figure 1-3). Some effects are common to any protein coding variant: protein quantity, location, and structure are inherent and essential features of any protein’s function. There are also more specific features that may vary based on protein type, like post-translational modification and movement within the cell. Enzymes can be tested for function, and subunits can be tested for proper binding to partners. In each case, categorizing the effect of a variant on specific protein phenotypes paves the way for informed drug screening to target the specific defect. Doing so in an in vitro system allows for relatively low cost, high throughput assessment of variants for a variety of diseases, bringing much needed empiric data to variant interpretation. Having the system be modular in design, where the specific characteristics of the protein in question are identified, helps to direct investigation to likely routes of molecular pathophysiology.

Making the system uniform in analysis of variants for any given disease allows for direct comparison between variants and comparative assessments of pathogenicity. In rare diseases where clinical trials are challenged by low patient numbers, having in vitro data on the molecular effects of variants can vastly streamline therapeutic application and approval. Advancements in cystic fibrosis (CF) treatment are especially notable for utilization of this approach, where each variant is categorized into classes based on molecular effect. As the CFTR protein is a membrane channel protein, the defects include absent protein, processing defects, channel gating defects, conduction defects, and insufficient protein defects48. Drugs that increase protein activity and correct misfolding defects have been identified for specific, relatively common CF variants and approved via clinical trials49; 50. For the rarer CF variants, clinical trials were less feasible, and so data were generated in vitro on variants in the same classes responding to the drugs. Recently, the FDA approved therapeutic use for these variants based entirely on in vitro data, the first approval of its kind51. This success in CF treatment exemplifies how such data can be, and are being, used to streamline treatment and improve the lives of patients with rare diseases.

Additionally, a large part of our ability to interpret variant effect relies on our base understanding of the biological function and properties of human genes, which remains woefully incomplete. While there is still disagreement on the true number, current estimates hover around 40,000 genes in the human genome, 20,000 of which are protein coding52. Of those protein coding genes, about 20% do not have known functions53. There is also a bias in investigation, where some genes are investigated disproportionately more than others with no regard to their relative physiologic importance54. Much of the research continues to focus on the smaller portion of proteins that were already identified before the human genome was sequenced, leaving large portions of basic biology understudied55. Expanding our foundational knowledge of the role of any protein broadens our understanding and ultimately allows for better application to human health and wellbeing.

1.9 Figures for Chapter 1

Figure 1-1. Functional relationship of G6PC and G6PT in G6P metabolism. Both G6PT and G6PC are embedded in the ER membrane, with the catalytic core of G6PC facing the ER lumen. G6PT transports the substrate G6P into the ER, where G6PC catalyzes the terminal step in glycogen breakdown, hydrolyzing G6P to free glucose and inorganic phosphate. This occurs primarily in the major organs of glycogen storage, the liver and kidneys, where glycogen stores are broken down during periods of fasting. The free glucose is then able to leave the cell and enter the bloodstream, providing energy throughout the body. Defects in G6PC function cause glycogen storage disease 1a, while defects in G6PT cause glycogen storage disease 1b, both characterized by hypoglycemia during fasting and abnormal glycogen accumulation in the liver and kidneys. Image generated using BioRender.


Figure 1-2. Tissue expression and cellular localization of G6PC family and G6PT. Human body maps showing organ and tissue level expression of each protein. G6PC1 is restricted primarily to the major organs of glycogen storage: liver, kidneys, and small intestine. G6PC2 has highly restricted expression in the pancreatic islet cells. G6PC3 and G6PT have widespread expression throughout the whole body. Organelle expression reports are more variable, with all proteins being primarily listed as ER membrane bound, but some data pointing to more widespread localization within the cytoplasm and other membranes of the cell, with varying degrees of certainty for each.


Figure 1-3. Potential molecular consequences of DNA variants. Depicted is the range of consequences possible for any protein coding gene, using a normally ER-localized protein like G6PC as an example. Protein could be absent, reduced, mislocalized, or otherwise unable to carry out its given function. More specific defects can arise depending on a protein’s specific features, like abnormal post-translational modifications, loss of enzymatic functions, and abnormal binding to partner proteins. Image generated using BioRender.


1.10 Tables for Chapter 1

Table 1-1. Associated diseases and phenotype overlap for G6PC family members and G6PT

                              G6PC1                     G6PC2   G6PC3                 G6PT (Slc37a4)
Associated Mendelian Disease  Glycogen storage disease  None    Severe congenital     Glycogen storage disease
                              type 1a (GSD1a)                   neutropenia type 4    type 1b (GSD1b)
                                                                (SCN4)
Disease Characteristics       Abnormal glycogen                 Neutropenia           Abnormal glycogen
                              metabolism                                              metabolism and neutropenia

Chapter 2. Classifying molecular phenotypes of G6PC variants for pathogenic properties and to guide therapeutic development

Kathleen L. Plona1, Jean F. Eastman1, Mitchell L. Drumm1*

1Genetics and Genome Sciences Department, School of Medicine, Case Western Reserve University, Cleveland, Ohio, 44106, United States

Published in Journal of Inherited Metabolic Disease Reports March 28, 2021

2.1 Abstract

Due to advances in sequencing technologies, identification of genetic variants is rapid. However, the functional consequences of most genomic variants remain unknown. Consequently, variants of uncertain significance (VUSs) that appear in clinical DNA diagnostic reports lack sufficient data for interpretation. Algorithms exist to aid prediction of a variant’s likelihood of pathogenicity, but these predictions usually lack empiric evidence. To examine the feasibility of generating functional evidence in vitro for a given variant’s role in disease, a panel of 29 coding sequence variants in the G6PC gene was assessed. G6PC encodes the glucose-6-phosphatase enzyme, and reduction in its function causes the rare metabolic disease glycogen storage disease type 1a (GSD1a). Variants were heterologously expressed as fusion proteins in a hepatocyte-derived cell line and examined for effects on steady-state protein levels, biosynthetic processing, and intracellular distribution. The screen revealed variant effects on protein levels, N-linked glycosylation status, and cellular distribution. Of the 8 VUSs tested, 7 behaved similarly to wild type protein while 1 VUS, p.Cys109Tyr, exhibited features consistent with pathogenicity for all molecular phenotypes assayed, including significantly reduced protein levels, alteration in protein glycosylation status, and an abnormally diffuse protein localization pattern, and has recently been reported in a patient with GSD1a. Thus, our results show that such a screen adds in vitro evidence to existing databases to aid in diagnostics, and also provides further classification for molecular phenotypes that could be used in future therapeutic screening approaches for small molecule or gene editing strategies directed at specific variants.

2.2 Synopsis

Systematic analysis of variants of uncertain significance (VUS) and known disease-causing gene variants yields information on pathogenicity and potential variant-specific routes of therapeutic intervention.

2.3 Introduction

Increasing use of DNA sequencing, paired with technologies such as small molecule screening and genome editing, is providing hope that gene-specific therapies will exist for an increasing number of disorders. However, with the decreasing cost of DNA sequencing, variant interpretation has become the limiting factor in its clinical use56; 57. In silico predictive programs like CADD, SIFT, and PolyPhen use multiple factors to assess pathogenic potential of variants, including sequence conservation and homology, physicochemical similarity, and predicted structural changes58-61. Variants are annotated as a spectrum, from pathogenic to benign, and those variants lacking evidence for their role in disease are listed as variants of unknown or uncertain significance (VUS). Guidelines from ACMG suggest only utilizing results when multiple in silico predictors agree on a classification, which leaves up to 35% of variants unclassified13-15. Functional, empiric data regarding putative disease-causing variants could provide direct evidence of pathogenicity outside the capabilities of current in silico prediction programs to address these cases of uncertainty and conflicting interpretations. Molecular and biochemical characterization of variants’ effects could not only have diagnostic value in establishing pathogenicity, but could also guide variant-specific therapeutic development in the emerging era of personalized medicine.

The effects of genetic variants on protein function have proven to be an important aspect of therapeutic development, and in vitro characterizations have allowed the development of drugs and their indications to move at a rapid pace. For example, in cystic fibrosis research variants have been classified according to their effect on the protein product, and drugs have been identified that increase activity49 and correct misfolding50 for specific variants. Initially, clinical trials were required to obtain approval for drug use in each variant; however, clinical trials are inherently complicated for rare diseases with small patient populations4. Recently, FDA approval was granted based on in vitro research, which is accelerated by the variant classifications helping predict which variants will respond to particular drugs51. This therapeutic success demonstrates the utility of heterologous variant analysis and provides an example of how such data can be integrated and applied to genetic disease management.

As a model genetic disease for testing methods to improve variant interpretation, we investigated glycogen storage disease 1a (GSD1a [MIM232200]), caused by variants in the glucose-6-phosphatase gene (G6PC [MIM613742]). The G6PC gene encodes a nine-transmembrane-domain protein expressed in the liver, kidney, and small intestine, where it functions in the endoplasmic reticulum (ER) to cleave glucose-6-phosphate in the terminal step of glycogenolysis and gluconeogenesis24; 62-64. The catalytic site for phosphohydrolase function has been homology mapped with other phosphatases, identifying key catalytic residues at Arg83, His119, Arg170 and His17665, and its biosynthesis includes N-linked glycosylation at N9666. While much of the cell biology of G6PC is understood and many disease-causing variants have been cataloged24; 34; 67-71, there still remain numerous VUS, and within the known disease-causing variants the molecular cause of pathogenicity is unclear. At the time of writing, the ClinVar database listed 145 variants, 87 of which are in the coding sequence, and 26 of those listed as VUS. For some known pathogenic variants, genotype/phenotype correlations have been explored on a clinical level in an attempt to elucidate the role of specific genotypes in patient outcomes and improve precision medicine; however, extensive heterogeneity and external factors complicated this evaluation72. Here, we provide a uniform approach to classifying variants in the G6PC gene, in vitro, according to their impact on biosynthesis of a readily detectable fusion protein. The goal of this study was to use a controlled system to evaluate the molecular phenotypes of variants on multiple characteristics affecting protein function, such as total protein at steady state, post-translational modification, and subcellular localization. The factors analyzed here provide valuable in vitro data for understanding the variant spectrum and could be broadly applicable to other genetic disorders, informing diagnostics, prognostics, and therapeutic development.


2.4 Materials and Methods

Cell Culture and Transfection

HepG2 cells (ATCC HB-8065), which are routinely used as a robust in vitro model for the liver in metabolic studies73, were cultured using DMEM/F-12 growth medium with 10% fetal bovine serum, 1% (10,000 U/ml) penicillin/streptomycin and 1% (110 mg/L) L-glutamine at 37°C with 5% CO2. Cells were passaged at 80-100% confluence using 0.25% (w/v) trypsin-0.53 mM EDTA. For transfection, nearly confluent cells were passaged 24 hours prior, then transfected following the Lipofectamine 3000 (ThermoFisher #L3000001) protocol for 24-well plates using 500 ng of each plasmid and 1.5 µl Lipofectamine 3000 reagent per well. Transfection mixture was left on cells for 48 hours before imaging, and for 72 hours before protein harvest for western blot analysis.

Plasmid Construction

The fusion protein approach has been used for this enzyme by others, who have shown that N-terminal fusions do not appear to disrupt enzyme stability or function64; 66; 74, nor do C-terminal fusions2; 12. N-terminal G6PC-EGFP fusion plasmid base constructs were purchased from VectorBuilder. The base plasmids were a G6PC-EGFP fusion plasmid containing the G6PC coding sequence with the stop codon (TAA) removed and fused to EGFP with a single glycine (GGA) linker (VB190719-1039cgw pRP[Exp]-Neo-SV40>[G6PC]:EGFP), and an EGFP-only control plasmid (VB170206-1119ntc pRP[Exp]-Neo-Sv40>EGFP). These plasmids were altered with restriction enzyme cloning to replace the NeoR selectable marker with the mCherry fluorophore between SacI and XhoI sites. Individual variants were introduced to the plasmids using GeneArt site-directed mutagenesis reagents and protocol (ThermoFisher #A13282) and confirmed via Sanger sequencing. Additionally, an unmodified G6PC-FLAG construct (VB190521-1108jbf pRP[Exp]-mCherry-SV40>hG6PC[NM_000151.3]/FLAG) was used as the WT construct in co-localization analysis.

Fixing and Staining Cells

For steady-state protein expression imaging, HepG2 cells were transfected as described above in 24-well, clear-bottom, black-walled plates. At 48 hours, cells were rinsed with 1x PBS and fixed in 4% paraformaldehyde for 10 minutes. After three washes in 1x PBS, cells were stained with 1 µg/ml DAPI nuclear stain for 5 minutes then stored at 4°C in 1x PBS. For colocalization analysis, cells transfected with the G6PC-FLAG plasmid were fixed as described, then permeabilized with 0.1% PBS-TritonX for 10 minutes, blocked in 10% donkey serum at room temperature for 1 hour, incubated with mouse anti-FLAG primary antibody diluted 1:500 in 10% donkey serum (Sigma #F1804, lot SLBX2256) at 4°C overnight, rinsed five times in 1x PBS, incubated with donkey anti-mouse Alexa 647 secondary antibody diluted 1:1000 in 10% donkey serum (Jackson ImmunoResearch #715-605-150) for 1 hour, rinsed five times, and stained with 1 µg/ml DAPI for 5 minutes, followed by storage in 1x PBS at 4°C in the dark until imaging.

Operetta Imaging and Analysis

Cells were imaged and analyzed for steady-state protein expression level using the Perkin Elmer Operetta System and Columbus software. Transfected cells fixed and DAPI stained on 24-well plates were imaged at 20x magnification with 60 images taken per variant, in triplicate, for a total of n=180 images analyzed per variant. Each image was analyzed for cell number via DAPI staining, transfected cell number via mCherry fluorescence, and G6PC-expressing cell number via EGFP fluorescence. Output was given as percent of transfected (mCherry positive) cells expressing detectable levels of G6PC (EGFP positive) at a preset threshold determined from background levels in negative controls, and compared to WT using Welch’s two-tailed t-tests for statistical significance.
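The per-image readout and the comparison to WT described above can be illustrated with a minimal Python sketch. It is not the Columbus software pipeline: the function names are hypothetical, the per-image values are invented purely for illustration, and only the t statistic and Welch-Satterthwaite degrees of freedom are computed. Converting these to a p-value would additionally require a t-distribution CDF, for example via `scipy.stats.ttest_ind` with `equal_var=False`.

```python
import math
from statistics import mean, variance


def percent_egfp_positive(n_mcherry_positive, n_double_positive):
    """Percent of transfected (mCherry+) cells that are also EGFP+ in one image."""
    if n_mcherry_positive == 0:
        raise ValueError("no transfected cells detected in this image")
    return 100.0 * n_double_positive / n_mcherry_positive


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each mean, from the unbiased sample variance.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical per-image percentages for illustration only (not data from this study).
wt_images = [42.0, 46.5, 43.1, 45.2]
variant_images = [14.0, 16.2, 13.5, 15.9]
t_stat, dof = welch_t(variant_images, wt_images)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Welch's form is the natural choice here because it does not assume that WT and variant images share the same variance.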

Western Blot

Cells were lysed in 10 mM HEPES pH 7.3, NaCl, 1% NP-40 with protease inhibitor (Millipore Sigma #4693132001). Collected protein was denatured at 37°C for 30 minutes without addition of 2-mercaptoethanol, and 20 µg total protein lysate per lane was run on western blot. Primary antibody was rat anti-GFP (Biolegend #338002) at 1:2000 and secondary antibody was goat anti-rat IgG HRP conjugated (Millipore Sigma #AP183P) at 1:5000, both diluted in 5% milk in 1x PBS with 0.1% Tween20.

Deglycosylation

Cells were transfected as described above for 48 hours and 30 µg of total protein lysate per sample was deglycosylated with the PNGase Fast Kit (Sigma-Aldrich #EMS0001) following the manufacturer’s protocol with the denaturing conditions modified to 50°C for 10 minutes. The entire treated sample was run on western blot as described above. Antibodies used for this blot were rat anti-GFP (Biolegend #338002) at 1:2000, goat anti-rat IgG HRP conjugated (Millipore Sigma #AP183P) at 1:5000, mouse anti-mCherry (Novus Biologicals #NBP1-96752) at 1:2000, goat anti-mouse IgG HRP conjugated (Millipore Sigma #AP181) at 1:5000, rabbit anti-vinculin (Abcam #ab129002) at 1:10,000, and mouse anti-rabbit IgG HRP conjugated (Millipore Sigma #AP188P) at 1:5000.

High magnification imaging and co-localization analysis

HepG2 cells grown on 24-well glass bottom plates were transfected, fixed, and stained as described above and imaged on the Zeiss Axio Observer 7 Scope with Zen Pro 3.0 Software (Carl Zeiss Microscopy, 2019, Germany). Each variant modeled on the G6PC-EGFP (green) plasmid was co-transfected with a WT G6PC-FLAG (far red) construct to assess co-localization. A plasmid expressing EGFP alone was used as a reference for cytoplasmic localization, and each plasmid independently expressed mCherry to allow identification of transfected cells (Figure 2-4a). Images were taken of 9 cells per variant at 63x magnification for analysis using the Zeiss co-localization module software, and a Pearson correlation coefficient was reported for each individual cell.

2.5 Results

Fluorescent and epitope-tagged G6PC fusion proteins were used for large-scale characterization of several aspects of G6PC, including quantity, biosynthesis, and intracellular localization. In total, 29 variants spanning all 5 exons of G6PC (Figure 2-1a) were selected to represent a variety of pathogenicity ratings (Figure 2-1b), DNA alterations (Figure 2-1c), and protein alterations (Figure 2-1d). As the majority of reported variants in G6PC are substitutions, our panel reflects that by testing 26 substitutions, 2 deletions, and 1 duplication. Two known benign variants were included as controls along with a pathogenic variant, p.Leu216=, that affects splicing but not protein coding, and thus should be functionally benign in our model using the intronless cDNA sequence. The remaining 10 pathogenic variants were tested to further elucidate their molecular mechanisms of pathogenicity, while the properties of variants annotated as “likely pathogenic” and “VUS” help in their classification as well.

The first step of this study was to quantify overall G6PC content of a panel of variants in a uniform context using a tagged fusion protein construct (Figure 2-2a). The percent of transfected cells expressing EGFP was calculated for each of the 29 variants and compared to WT (Figure 2-2b). In this screen, variants with premature termination codons were essentially undetectable, with the highest being 6.7% (p.Gln27Ter) of WT level, while missense variants displayed a wide range of values from 29.1% (p.Gln54Pro) to 148.0% (p.Val304Ile) of WT level, with six showing significantly less G6PC-EGFP than WT. Two VUS, p.Val304Ile and p.Val308Ile, showed significantly higher G6PC-EGFP protein level than WT G6PC-EGFP, while 15 variants had significantly lower protein level. Notably, one of the eight VUS tested, p.Cys109Tyr, showed reduced protein expression at roughly 33.7% of WT level. Example images (Figure 2-2c) are included showing WT (positive) and mock transfected (negative) controls, along with representative low, medium, and high protein level variants.
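The percent-of-WT values quoted above follow from the mean percent EGFP-positive values listed in the summary table for this chapter. A minimal Python sketch of that normalization is given here for reference; the dictionary and function names are illustrative assumptions, and only a handful of table entries are copied in.

```python
# Mean percent EGFP-positive transfected cells, copied from the chapter's summary table.
PROTEIN_LEVEL = {
    "WT": 44.03,
    "Q27X": 2.94,
    "Q54P": 12.81,
    "C109Y": 14.85,
    "V304I": 65.18,
}


def percent_of_wt(code, levels=PROTEIN_LEVEL):
    """Express a variant's protein level as a percentage of the WT level."""
    return 100.0 * levels[code] / levels["WT"]


for code in ("Q27X", "Q54P", "C109Y", "V304I"):
    print(f"{code}: {percent_of_wt(code):.1f}% of WT")
```

Run as written, this gives 6.7%, 29.1%, 33.7%, and 148.0% of WT for p.Gln27Ter, p.Gln54Pro, p.Cys109Tyr, and p.Val304Ile respectively.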

As G6PC is reported to have N-linked glycosylation at the N96 position, we next qualitatively examined all variants in the G6PC-EGFP fusion protein construct by western blot. WT G6PC-EGFP shows a distinct double band, and when treated with PNGase F to deglycosylate the protein only a single band is seen, indicating the higher molecular weight band is the glycosylated form of G6PC (Figure 2-3a). Analysis of the variants revealed that the banding pattern is altered in some, likely due to an alteration in protein glycosylation (Figure 2-3b). This confirmed there was little to no detectable G6PC-EGFP protein for the three frameshift and six nonsense variants tested. The three synonymous variants displayed a 2-band pattern consistent with WT, while five of the sixteen missense variants appear to have only a single band indicative of a glycosylation defect. Notably, these five variants with a single band are all among the six variants shown to have significantly reduced protein amount in Figure 2-2b. The remaining variant with reduced protein amount, p.Asp38Val, did retain a doublet banding pattern; however, the higher molecular weight band is visibly fainter.

We next examined intracellular distribution of the G6PC-EGFP tagged fusion protein. Localization was analyzed by co-transfecting WT G6PC-FLAG with each protein-expressing variant in G6PC-EGFP (Figure 2-4a), and an average Pearson co-localization score was plotted (Figure 2-4b). Co-transfection of two WT constructs had an average Pearson co-localization score of 0.847, while WT G6PC-FLAG with the EGFP negative control had an average of 0.164. Representative images (Figure 2-4c) show the diffuse cytoplasmic pattern of EGFP alone, in contrast to the restricted, punctate appearance of WT G6PC localization. Variants R83C and E110Q demonstrate visual differences, where p.Glu110Gln (Pearson 0.761) retains a restricted punctate pattern and co-localizes well with WT, while p.Arg83Cys (Pearson 0.7494) appears more diffuse and abnormal in localization. Variants with a mean Pearson score below that of p.Arg83Cys all showed visible trends of abnormal localization and had a wider variance, reflecting some abnormality in the localization of these variants that warrants further analysis.
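For readers less familiar with the co-localization score, the Pearson correlation coefficient between two fluorescence channels can be sketched in a few lines of Python. This is a generic textbook calculation on paired pixel intensities, not the Zeiss co-localization module itself (whose thresholding and region selection are not reproduced here); the function name and the intensity values are hypothetical.

```python
import math


def pearson(channel_a, channel_b):
    """Pearson correlation between paired pixel intensities from two channels."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (at least 2 pixels)")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    ss_a = sum((a - mean_a) ** 2 for a in channel_a)
    ss_b = sum((b - mean_b) ** 2 for b in channel_b)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return cov / math.sqrt(ss_a * ss_b)


# Hypothetical flattened pixel intensities for one cell (illustration only).
egfp_pixels = [10, 40, 200, 180, 15, 12]
flag_pixels = [12, 35, 190, 170, 20, 10]
print(f"Pearson r = {pearson(egfp_pixels, flag_pixels):.3f}")
```

A score near 1 means the two signals rise and fall together pixel by pixel, as expected for two co-localized ER proteins, while a diffuse cytoplasmic signal paired with a punctate one yields a much lower score.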

Of the eight VUS tested, p.Cys109Tyr was notable for showing an abnormal phenotype in each assay and was re-listed as having conflicting interpretations of pathogenicity in the ClinVar database as of Sept. 2019. In our analysis, high throughput cell imaging showed protein level was 33.7% of WT (Figure 2-5a), and only a single band was seen on western blot (Figure 2-5b). In examining C109Y’s localization, it was noted that localization was not uniformly aberrant in every cell analyzed. There was some variation in localization from cell to cell, where the majority of cells analyzed showed the abnormal diffuse cytoplasmic localization (Figure 2-5c), and some appeared to retain WT localization patterning. We feel the wide standard deviation for this variant in Figure 2-4b reflects an important finding for the pathophysiology of this variant that was not otherwise quantifiable.

2.6 Discussion

The goal of this study was to systematically examine molecular phenotypes of a panel of G6PC variants, and to apply this methodology to inform about the cause of pathogenicity and the pathogenic potential of VUSs. The 29 variants tested provide a representative selection of the types and proportions of those found in the GSD1a patient population and should provide insight into diagnostic ambiguities as well as help guide future “personalized” therapeutic strategies. By examining steady-state protein levels, glycosylation status, and localization within the cell, we have yielded new information on how each variant affects protein behavior at a molecular level, summarized in Table 1. This information can supplement current databases combining in silico predictions, patient data, and other in vitro reports to give a more comprehensive picture of pathogenicity and its underlying molecular cause for each variant. Using a uniform approach allows us to compare variants and make molecular groupings that could streamline prognostics and therapeutic approvals for like-variants, as seen in recent FDA approvals for cystic fibrosis medications based on in vitro classification data51.

When examining protein levels, these results support the computational prediction that variants causing premature stop codons in disease-linked genes are almost universally pathogenic, and confirm that the pathogenicity is due to protein absence. Thus, for nonsense and frameshift variants, in silico analysis is likely sufficient and additional in vitro data is not warranted. Generally, the best therapeutic options to address nonsense and frameshift variants would be those that restore protein production, such as gene editing to return to the reference sequence, and gene/mRNA/protein replacement to provide alternate production sources. In the case of GSD1a, protein replacement therapy is complicated by it being a membrane-bound ER protein; however, advances are being made in clinical trials with gene replacement therapy75-79. Additional therapies for premature stop codon read-through could be applied to nonsense variants, such as nonsense suppressor tRNA or therapeutic compounds like aminoglycosides.

Our identification of variants with reduced protein amount represents a new molecular phenotype that may be contributing to their pathogenicity. Reduced enzyme quantity could be the underlying cause of lower enzymatic activity. While variants in active site residues (p.Arg83Cys and p.Arg83His) are likely catalytically inactive regardless of protein levels, missense variants with reduced total protein may benefit from therapeutic strategies to increase protein production or decrease degradation.

As previous reports showed substituting the N96 residue to prevent N-linked glycosylation reduced enzyme activity66, we hypothesized that other variants may alter glycosylation and be an underlying cause of pathogenicity. Our results showed this was a prominent phenotype for missense variants in G6PC and that it warrants examination as a contributor to pathogenicity in glycosylated proteins.

Given that G6PC resides within the endoplasmic reticulum, we examined localization as a potential contributor to pathogenicity. The abnormal localization is an especially crucial molecular phenotype for a compartmentalized enzyme, where increasing expression or using therapeutics that modulate or enhance function may have severe and unexpected side effects due to its location within cells and altered access to potential substrates.

Interestingly, the missense variants with lower protein level also had abnormal glycosylation and were among the lowest localization scores. The consistent results for 6 of the missense variants showing significantly decreased protein level, abnormal glycosylation, and trending toward altered localization suggest a connection among these phenotypes. It is possible each phenotype is distinct, but it is more likely they are linked in a cause/effect fashion. Likely, the altered glycosylation and/or localization within the cells is tagging the protein for degradation, leading to decreased levels. However, the trafficking differences could be due to, or the cause of, abnormal glycosylation. It may be possible for these variants that a singular therapeutic intervention could alleviate all three abnormalities; as such, further investigation is warranted to determine the root cause of the complex molecular phenotype.

Finally, in the case of the missense VUS p.Cys109Tyr, this study demonstrates that disease annotation will be an iterative process, requiring continual updates of sequencing and clinical data with in vitro lab analysis to help establish pathogenicity status and its cause for a given variant. When this study began, p.Cys109Tyr was classified as a VUS in the ClinVar database, and as it progressed, clinical data emerged for a GSD1a patient homozygous for the p.Cys109Tyr variant, transitioning its status to be likely, but not definitively, pathogenic80. Combined with the data presented here, which show that this variant, and not a closely linked undetected variant(s), affects function and is consistent with other known pathogenic variants in each assessment performed, we assert that this variant could be conclusively re-classified, providing closure to current and future carriers of the variant on its potential to cause disease.

2.7 Figures for Chapter 2

Figure 2-1. Selection of G6PC variants

[Panel A: schematic of the G6PC coding sequence. Panels B-D: pie charts. (B) Reported pathogenicity: benign 2, VUS 8, likely pathogenic 8, pathogenic 11. (C) DNA alteration: substitution 26, deletion 2, duplication 1. (D) Protein alteration: missense 16, nonsense 6, frameshift 3, synonymous 3, synonymous/splice site alteration 1.]

Characteristics of selected G6PC variants. (A) Schematic of G6PC coding sequence with select panel of 29 G6PC variants shown with reported pathogenicity (color) and protein domain location (shape). Number of mutations represented from each type of (B) reported pathogenicity rating (compiled from ClinVar and Ensembl databases), (C) DNA sequence mutation, and (D) protein mutation.

Figure 2-2. Protein levels are significantly reduced for a subset of G6PC variants

[Panel C: image grid with columns WT, Mock, Q347X, R83C, R170Q and rows mCherry, G6PC-EGFP, Merge.]

Protein expression of G6PC variant panel. (A) Schematic of the WT G6PC-EGFP fusion construct. (B) Mean ± SD percentage of G6PC-EGFP positive transfected cells detected in n=180 images per variant tested using the Perkin Elmer Operetta System at 20x magnification. Statistical analysis compared each individual variant to WT with Welch’s unpaired two-tailed t-test; all unmarked are not significant. (C) Representative DAPI-stained 20x magnification images from controls and representative variants with low (Q347X), medium (R83C), and high (R170Q) G6PC-EGFP expression in mCherry-positive transfected cells.

Figure 2-3. G6PC N-linked glycosylation is altered in several G6PC variants

[Panels A and B: western blots with EGFP, mCherry, and vinculin probes; lanes in panel B cover the variant panel plus WT and untransfected controls.]

G6PC glycosylation status is altered for multiple missense variants of known and uncertain pathogenicity. (A) Western blot of whole cell lysate from HepG2 cells transfected with G6PC-EGFP +/- treatment with PNGase F to deglycosylate the protein. (B) Representative western blots for all protein-expressing variants showing variations of the WT double-band pattern.

Figure 2-4. Several G6PC variants have altered protein localization

[Panel C: image grid with columns Variant G6PC-EGFP, WT G6PC-FLAG, mCherry, Merge and rows EGFP control, WT G6PC-EGFP, R83C G6PC-EGFP, E110Q G6PC-EGFP.]

Abnormal localization may contribute to pathogenicity in some variants. (A) Schematic of the fusion constructs with EGFP-only control plasmid (top), G6PC-FLAG WT construct (middle), and G6PC-EGFP construct (bottom) mutagenized with each variant. (B) Mean ± SD Pearson correlation coefficient calculated by Zeiss Zen Pro software colocalization module in n=9 cells imaged at 63x magnification for each variant. Statistical analysis compared each individual variant to WT with Welch’s unpaired two-tailed t-test; all unmarked are not significant. (C) Representative images showing protein localization compared to WT in controls and two select variants (R83C, Pearson 0.7494; E110Q, Pearson 0.7597).

Figure 2-5. C109Y, a Variant of Uncertain Significance, shows features consistent with pathogenicity

[Panel A: images of WT and C109Y with mCherry, G6PC-EGFP, and Merge channels. Panel B: western blot of WT and C109Y. Panel C: image grid with columns C109Y G6PC-EGFP, WT G6PC-FLAG, mCherry, Merge.]

VUS Variant C109Y shows features consistent with pathogenicity. (A) HepG2 transfected cells showing reduced G6PC-EGFP protein expression for C109Y (14.85%) compared to WT (44.03%). (B) Western blot showing a single band for C109Y compared to double banding for WT. (C) Transfected HepG2 cells showing C109Y localization having a diffuse cytoplasmic pattern compared to a more restricted pattern of WT G6PC-FLAG expression.

2.8 Tables for Chapter 2

Table 1. Summary of G6PC variant molecular phenotype results

Protein Code  cDNA Alteration  Protein Alteration  Pathogenicity      Protein Level  Protein Localization
EGFP          NA               NA                  Control            89.39          0.164
WT            Refseq           None                Control            44.03          0.847
Q27Rfs        c.79delC         p.Gln27fs           Pathogenic         4.36****       NA
Q27X          c.79C>T          p.Gln27Ter          Likely pathogenic  2.94****       NA
D38V          c.113A>T         p.Asp38Val          Pathogenic         20.70****      0.775
Y44Y          c.132C>T         p.Tyr44=            Benign             51.62          0.789
L46Sfs        c.136delC        p.Leu46fs           Likely pathogenic  2.08****       NA
Q54P          c.161A>C         p.Gln54Pro          Pathogenic         12.81****      0.438***
W63X          c.189G>A         p.Trp63Ter          Likely pathogenic  1.06****       NA
R83C          c.247C>T         p.Arg83Cys          Pathogenic         22.95****      0.749
R83H          c.248G>A         p.Arg83His          Pathogenic         28.46***       0.690**
C109Y         c.326G>A         p.Cys109Tyr         VUS                14.85****      0.666
E110K         c.328G>A         p.Glu110Lys         Likely pathogenic  47.23          0.803
E110Q         c.328G>C         p.Glu110Gln         Likely pathogenic  58.39          0.761
H119L         c.356A>T         p.His119Leu         Likely pathogenic  45.78          0.868
Y128Tfs       c.379_380dupTA   p.Tyr128fs          Likely pathogenic  0.93****       NA
P144P         c.432G>A         p.Pro144=           Benign             56.40          0.875
R170Q         c.509G>A         p.Arg170Gln         Likely pathogenic  57.70          0.811
R170X         c.508C>T         p.Arg170Ter         Pathogenic         0.83****       NA
G188S         c.562G>A         p.Gly188Ser         Pathogenic         24.37****      0.730
E193Q         c.577G>C         p.Glu193Gln         VUS                53.03          0.881
T213I         c.638C>T         p.Thr213Ile         VUS                49.35          0.812
L216L         c.648G>T         p.Leu216=           Pathogenic         52.25          0.872
G222A         c.665G>C         p.Gly222Ala         VUS                60.76          0.871
Q242X         c.724C>T         p.Gln242Ter         Pathogenic         0.72****       NA
F294L         c.882C>A         p.Phe294Leu         VUS                55.10          0.899
V304I         c.910G>A         p.Val304Ile         VUS                65.18*         0.840
V308I         c.922G>A         p.Val308Ile         VUS                61.97*         0.893
Y323X         c.969C>A         p.Tyr323Ter         Pathogenic         0.72****       NA
A331A         c.993G>C         p.Ala331=           VUS                59.92          0.902
Q347X         c.1039C>T        p.Gln347Ter         Pathogenic         1.16****       NA

Summary chart of protein level and localization consequences of 29 G6PC variants. Protein level value is the mean percent EGFP positive transfected cells from n=180 images analyzed at 20x magnification per variant. Localization value is the mean Pearson correlation coefficient between WT and variant co-localization in n=9 cells analyzed at 63x magnification. Significance levels are shown as compared to WT in a Welch’s two-tailed t-test.
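Because the cDNA and protein notations in a table like this are entered by hand, a quick consistency check can catch transcription slips. The sketch below assumes single-nucleotide substitutions only, where coding position n falls in codon (n - 1) // 3 + 1; deletions and duplications are described by different HGVS conventions and are not handled. The function names are hypothetical, and the second example pair is deliberately mismatched for illustration.

```python
import re

def codon_from_cdna(position):
    """Codon number containing a 1-based coding-sequence position."""
    return (position - 1) // 3 + 1

def substitution_is_consistent(cdna, protein):
    """Check that a cDNA substitution (e.g. 'c.247C>T') falls in the codon
    named by the protein change (e.g. 'p.Arg83Cys')."""
    c_match = re.search(r"c\.\s*(\d+)", cdna)
    p_match = re.search(r"p\.[A-Za-z]{3}(\d+)", protein)
    if c_match is None or p_match is None:
        raise ValueError("unrecognized variant notation")
    return codon_from_cdna(int(c_match.group(1))) == int(p_match.group(1))

print(substitution_is_consistent("c.247C>T", "p.Arg83Cys"))  # position 247 is in codon 83
print(substitution_is_consistent("c.432G>A", "p.Pro114="))   # mismatched pair: 432 is in codon 144
```

A check of this kind only confirms that the two columns agree with each other; it says nothing about whether either matches the reference transcript.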

Chapter 3. Investigating localization and movement of G6PC family members and G6PC1 variants

3.1 Abstract

The three G6PC family proteins, along with G6PT (Slc37a4), function to maintain glucose homeostasis. Each paralog has unique features and disease associations that reflect localization and functional differences within the body. In this work we sought to assess the subcellular localization and behavior of these proteins in a human liver cell line to investigate similarities and differences among the G6PC family members. N-terminal fluorescent fusion protein constructs were created for each of the 4 proteins, and live cell imaging was done to assess localization and movement. We found that G6PC1, G6PC3, and G6PT all localize to the nuclear membrane, perinuclear membranes, and small vesicles throughout the entirety of the cytoplasm. These vesicle structures exhibit rapid, multi-directional movement, which was previously only reported for G6PC1, and both G6PC1 and G6PC3 co-localize strongly with G6PT in all structures. G6PC2 displays a strikingly different localization pattern, with more diffuse coverage throughout the cytoplasm and nucleus; only occasionally were a few vesicles observed, and these showed little to no movement. Additionally, we expanded on prior work categorizing molecular phenotypes of G6PC1 variants. Twenty-one variants related to GSD1a were observed for movement throughout the cell, and 6 were found to be abnormal. These six G6PC1 variants (D38V, Q54P, R83C, R83H, C109Y, and G188S) were previously found to be abnormal in localization, protein level, and glycosylation, indicating possible linkage among these phenotypes and a singular root cause.

3.2 Synopsis

The subcellular localization and trafficking show important differences among the three G6PC family members and provide a novel molecular phenotype for dysfunction in G6PC1 variants responsible for GSD1a.

3.3 Introduction

The glucose-6-phosphatase catalytic subunit (G6PC) gene family consists of three members, G6PC1, G6PC2, and G6PC3. The G6PC family members are integral membrane-bound phosphatase enzymes that catalyze the hydrolysis of glucose-6-phosphate (G6P) to free glucose and inorganic phosphate [81]. When considering G6PC and its role in metabolism and disease processes, a gene outside the family, Slc37a4 (G6PT), becomes notable as well and will be discussed alongside the G6PC gene family. G6PT is the transport protein that regulates trafficking of G6P into the lumen of the ER, where it is accessible to G6PC and hydrolysis can occur [34, 62, 82].

The three G6PC family members share 39% sequence homology, with G6PC1 and G6PC2 being the most similar in sequence. The phosphatase domain and homologous active site residues are shared among all three; however, only G6PC1 and G6PC2 are reported to have N-linked glycosylation and an ER retention signal [16, 66]. Despite the lack of an ER retention signal in G6PC3, all three are reported to be transmembrane proteins in the ER membrane, where they function in glucose homeostasis.

G6PC1 is most highly expressed in liver, kidney, and intestines, where it carries out its well-defined role in the release of free glucose from G6P [83-85]. Dysfunction of G6PC1 is the cause of glycogen storage disease type 1a, characterized by an inability to maintain normal blood glucose levels during fasting and an abnormal buildup of glycogen stores in the liver and kidneys [34, 35, 68, 86-88]. G6PC2 is highly restricted in expression and found primarily in the pancreatic islet and testis, where it may have a role as a glucose sensor. While G6PC2 is not linked to a Mendelian disease, it has been identified in GWAS as being related to variation in blood glucose levels. G6PC3 is expressed widely throughout the body, at highest levels in skeletal muscle, though its dysfunction causes a prominent phenotype of neutrophil dysfunction in severe congenital neutropenia type 4 (SCN4).

Slc37a4, also known as G6PT, is functionally linked to the G6PC family, as it is the transporter responsible for G6PC substrate transport into the ER. Its expression is also widespread throughout the body, with highest expression in the liver and kidneys. Malfunction of G6PT causes glycogen storage disease type 1b, with a phenotype that is consistent with a combination of G6PC1 and G6PC3 loss [89]. The phenotype overlap is consistent with G6PT’s essential role in making substrate available within the ER for both G6PC1 and G6PC3 to hydrolyze.

A study from Soty et al. examined the intracellular localization and trafficking of G6PC1 and G6PT within the cell, identifying movement of G6PC1 for the first time [62]. We sought to extend this evaluation to the entire G6PC family, and to build upon our previous work by assessing movement as a potential molecular phenotype of G6PC1 variants in GSD1a [90].


3.4 Materials and Methods

Cell Culture and Transfection

HepG2 cells (ATCC HB-8065) were cultured using DMEM/F-12 growth medium with 10% fetal bovine serum, 1% (10,000 U/ml) penicillin/streptomycin, and 1% (110 mg/L) L-glutamine at 37°C with 5% CO2. Cells were passaged at 80-100% confluence using 0.25% (w/v) Trypsin-0.53 mM EDTA. For transfection, nearly confluent cells were passaged 24 hours prior, then transfected following the Lipofectamine 3000 (ThermoFisher #L3000001) protocol for 24-well plates using 500 ng of each plasmid and 1.5 µl Lipofectamine 3000 reagent per well. Transfection mixture was left on cells for 48 hours before imaging, and for 72 hours before protein harvest for western blot analysis.
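When scaling the per-well amounts above to a plate, the arithmetic can be sketched as follows. The per-well quantities (500 ng of each plasmid, 1.5 µl Lipofectamine 3000) come from the text; the 10% pipetting overage, the function name, and its parameters are assumptions for illustration, and other protocol components are deliberately left out because they are not specified here.

```python
DNA_NG_PER_PLASMID_PER_WELL = 500.0  # from the Methods text
LIPOFECTAMINE_UL_PER_WELL = 1.5      # from the Methods text

def transfection_mix(wells, n_plasmids=1, overage=0.10):
    """Total plasmid DNA and Lipofectamine 3000 for a set of wells.
    `overage` is an ASSUMED pipetting allowance, not part of the protocol."""
    if wells < 1 or n_plasmids < 1 or overage < 0:
        raise ValueError("wells and n_plasmids must be >= 1 and overage >= 0")
    scale = wells * (1 + overage)
    return {
        "dna_ng_per_plasmid": DNA_NG_PER_PLASMID_PER_WELL * scale,
        "total_dna_ng": DNA_NG_PER_PLASMID_PER_WELL * n_plasmids * scale,
        "lipofectamine_ul": LIPOFECTAMINE_UL_PER_WELL * scale,
    }

# Co-transfection of two plasmids (e.g. a G6PC-EGFP construct plus
# G6PT-mcherry) across six wells.
print(transfection_mix(wells=6, n_plasmids=2))
```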

Plasmid Construction

N-terminal fusion plasmid constructs were obtained from VectorBuilder for the G6PC family, G6PT (Slc37a4), and an EGFP-only control vector.

The base plasmid G6PC1-EGFP was altered with restriction enzyme cloning to replace the NeoR selectable marker with the mCherry fluorophore between SacI and XhoI sites. Individual variants were introduced into the G6PC1-EGFP/mcherry plasmid using GeneArt site-directed mutagenesis reagents and protocol (ThermoFisher #A13282) and confirmed via Sanger sequencing. Plasmids with the additional selectable marker mCherry were used for solo transfection, while plasmids with NeoR were used for co-transfection and colocalization experiments.


Table 3-1. Plasmids used in Localization and Movement Analysis

Name                 VectorBuilder ID   VectorBuilder Name
EGFP Control         VB170206-1119ntc   pRP[Exp]-Neo-Sv40>EGFP
G6PC1-FLAG           VB190521-1108jbf   pRP[Exp]-mCherry-SV40>hG6PC[NM_000151.3]/FLAG
G6PC1-EGFP           VB190719-1039cgw   pRP[Exp]-Neo-SV40>[G6PC]:EGFP
G6PC1-EGFP/mcherry   Cloned from G6PC1-EGFP base construct
G6PC2-EGFP           VB190923-1424vdy   pRP[Exp]-Neo-SV40>hG6PC2[NM_021176.2]*/EGFP
G6PC2-EGFP/mcherry   VB190923-1355evn   pRP[Exp]-mCherry-SV40>hG6PC2[NM_021176.2]*/EGFP
G6PC3-EGFP           VB1707204-1077ugy  pRP[Exp]-Neo-SV40>[hG6PC3]:EGFP
G6PC3-EGFP/mcherry   VB190923-1330eyn   pRP[Exp]-mCherry-SV40>hG6PC3[NM_138387.3]*/EGFP
G6PT-mcherry         VB190923-1402gkb   pRP[Exp]-Neo-SV40>hSLC37A4[NM_001164277.1]/mCherry
G6PT-EGFP/mcherry    VB190923-1387buw   pRP[Exp]-mCherry-SV40>hSLC37A4[NM_001164277.1]/EGFP

High magnification imaging

HepG2 cells grown on 24-well glass bottom plates were transfected as described above and imaged on the Zeiss Axio Observer 7 Scope with Zen Pro 3.0 Software (Carl Zeiss Microscopy, 2019, Germany). Live cell video was taken at 63x magnification for 30-60 seconds per construct, and still images were taken from the video clips. The cells imaged were representative of the most common features seen after observing multiple transfected cells per construct.

3.5 Results

In Figure 3-1 each member of the G6PC family, along with G6PT and an EGFP only control, are shown individually transfected into HepG2 cells. The still images are taken from live cell video at three timepoints: 0, 15, and 30 seconds, with arrows to indicate the position of representative points of tagged protein at timepoint zero to facilitate visualization of movement. G6PC1, G6PC3, and G6PT all display similar localization patterns and will be discussed together, while G6PC2 displays a strikingly different pattern. The localization of G6PC2 appears distributed through the entire cell including the nucleus and cytoplasm and mirrors the patterning of EGFP control. Rarely, a few (less than 5) higher intensity punctate spots were observed in the cytoplasm of cells, but that was not representative of the majority of G6PC2 transfected cells observed.

The other 3 fusion proteins, G6PC1, G6PC3, and G6PT, display similar localization patterns with distinct sub-localization in the cell. Each has some larger membranous perinuclear aggregation, consistent with ER localization, as well as numerous small punctate vesicular aggregates. The larger membranous aggregates appear to stay mostly stationary, with some small shifting of the organization of the structure. The smaller vesicle-like structures exhibit rapid, omni-directional movement throughout the entire cell with the exclusion of the nucleus. Nearly all of the small vesicles exhibit movement, but the speed and distance traveled are highly variable. The representative example in Figure 3-1 does not display perinuclear localization, though it was noted in observation of other G6PC1, G6PC3, and G6PT transfected cells and can be seen in the Figure 3-2 images for these proteins.

Considering their functional linkage in glucose handling and disease phenotype, we wanted to examine co-localization of each family member with G6PT. Additionally, after observing that G6PT does exhibit movement, it was important to see if the intracellular movement was unique or echoed among G6PC1, G6PC3, and G6PT. Figure 3-2 shows co-transfection of each G6PC family member (green) and G6PT (red) with still images at 0, 15, and 30 second timepoints from live cell video. Figure 3-2a shows strong co-localization between G6PC1 and G6PT, with arrows pointing out spots where G6PC1 and G6PT appear strongly co-localized at time 0 to facilitate visualization of movement. It is worth noting that there are some distinct points of red and green alone in the merged image, though overall co-localization is strong. Figure 3-2b shows G6PC2 and G6PT with some co-localization. As noted for Figure 3-1, the typical presentation of G6PC2 localization was diffuse throughout the cytoplasm and lacked the distinct sub-localization patterning. This cell was selected as one of the few that displayed the punctate vesicle pattern, in order to examine movement when vesicles are observed. This patterning remains distinct from G6PC1 and G6PC3 in that there is wider spread of G6PC2 localization, with fewer punctate aggregates and discrete spots of G6PC2. There was minimal movement noted in the several discrete aggregates, though the co-localization in those spots was strong. The vesicles of G6PT in the lower left of the cell shifted slightly but did not display the same range and speed of movement seen when G6PT was transfected alone or with any other construct. Figure 3-2c shows strong co-localization between G6PC3 and G6PT, mirroring what is described above for G6PC1 and G6PT. Some distinct red-only and green-only spots are again noted, showing strong, but not perfect, co-localization between the two. Overall, we see co-localization of each G6PC family member with G6PT, though for G6PC2 it is limited, and G6PC2 is a clear outlier in localization and movement for the G6PC family.

In Figure 3-3 we move on to examine the trafficking of G6PC as a potential point of aberration for variants. Overall, we examined 21 variants compared to WT positive and EGFP negative controls. As described above, WT G6PC1 localized throughout the cell in punctate spots and some larger conglomerates consistent with ER structure, and nuclear membrane localization is often, but not always, present. The small vesicle structures exhibit rapid, omni-directional movement throughout the entire cell. In this analysis 6 missense variants displayed strikingly abnormal movement patterns: D38V, Q54P, R83C, R83H, C109Y, and G188S. In each case, the decreased or absent movement was directly linked to abnormal localization patterning, where the variant was distributed more diffusely throughout the cytoplasm and nucleus. For the remaining 15 variants assessed, localization and movement of punctate vesicles were consistent with WT.

The abnormal variants are highlighted with two examples each in Figure 3-4 to illustrate the variable irregularities seen for each. The first, D38V, showed mixed results, with some cells looking similar to WT in distribution and movement, and others appearing entirely abnormal, with more cytoplasmic distribution and a membranous structure extending through the entire cell with no discrete vesicles. There is some shifting of the membranous structures, but no movement similar to that of the punctate spots in WT cells. This shifting membranous structure is a recurring theme for the remaining variants that showed abnormality. In Q54P an additional distribution pattern is seen in which numerous areas specifically exclude G6PC1. These vesicle areas can be seen to shift to some degree, showing there is still movement and traffic occurring within the cells; however, the G6PC1 itself does not appear to be moving. This vesicle exclusion pattern was also seen for variants R83C, R83H, C109Y, and G188S. Variant R83C was notable for retaining some mobile punctate vesicles within a cell, while also containing exclusionary vesicles. R83H, C109Y, and G188S all had the membranous appearance with some larger conglomerates that shifted somewhat but did not show nearly the same speed and mobility that WT does.

Overall, the enduring theme observed is that these variants displayed a range of variable abnormality distinct from WT and the other variants examined. This is consistent with all prior examination of these variants for phenotypes including protein level, localization, and glycosylation, and the known or suspected pathogenicity of each.

3.6 Discussion

The goal of this work was to expand the analysis of localization and movement within the G6PC family to identify similarity and divergence, and to assess movement in G6PC variants, particularly those our lab previously identified with abnormal localization [90]. To our knowledge, the analyses of G6PC2 and G6PC3 are novel in the field and represent the first time intracellular movement was examined for these proteins, while the results of the G6PC1 and G6PT analyses both validate and contradict findings from the work by Soty et al. and what is commonly established in the field of G6PC biology [62]. G6PC1 is canonically listed as an exclusively ER-localized protein in databases and publications. There are reports of nuclear membrane localization based on studies done before 1980 [62, 91, 92], though this is typically not mentioned or examined in more recent publications. In recent years, the GeneCards database has included more intracellular compartments with a medium to low degree of confidence for G6PC1. These are primarily identified through text mining, and the specific evidence for inclusion of the other organelles is unclear. Papers specifically focused on G6PC1 and GSD1a research almost universally list it as an ER membrane component, with the exception of Soty et al., which identified it localizing in the nuclear membrane and in perinuclear structures consistent with ER, and reported the novel finding of punctate vesicles throughout the cytoplasm. In addition to this expanded range of localization to include cytoplasmic vesicles for both G6PC1 and G6PT, Soty et al. also examined intracellular movement. Their reports indicate the G6PC1 vesicles move, which our analyses confirm, but they reported that G6PT shows no mobility, which is in direct conflict with our findings. It is unclear if the conflict is due to utilization of a different cell type (HeLa vs HepG2), differences in methodology, or technical limitations in imaging, but it certainly warrants further follow-up to understand the discrepancy. Though the function and necessity of the vesicle structures have yet to be established, they likely represent an important part of physiology for these proteins that should be studied further.

Soty et al. also tested four G6PC1 variants (E110K, A241T, G184R, and G222R) for movement phenotype and concluded that these GSD1a variants do not affect G6PC localization. In this work we expanded the panel to test twenty-one variants and saw reduced or absent movement for six of them. Previously we reported on abnormal molecular phenotypes for these same six G6PC1 missense variants, five of which are known to be pathogenic in GSD1a, and one of which is a VUS with conflicting reports of pathogenicity [90]. All previously displayed reduced protein amount, abnormal glycosylation, and abnormal localization, and now present with absent or drastically diminished mobility throughout the cell. The co-incidence of these abnormal molecular phenotypes for the same six variants, and only these six variants, suggests there may be a singular root cause for all phenotypes seen. Notably, none of the variants that exhibit normal localization in punctate vesicles were seen to have any obvious qualitative differences in movement.

As noted for Figure 3-2, which shows G6PC family co-localization with G6PT, the merged image co-localization of G6PC1 and G6PC3 with G6PT was strong but imperfect. There are potential biological and technical explanations for this. As it remains unclear if G6PC family members form any physical link with G6PT, it could be that they are separate proteins that are localized and move similarly but remain independent enough to have distinct spots. Additionally, this is a system using over-expression, where the ratio of these proteins does not reflect natural biological levels, which could skew co-localization and cause the intensity of one to outshine the other. It may also simply be due to technical artifact given the nature of imaging. Though the microscope takes the images in rapid succession, there are milliseconds between each red and green color channel image (reflected in the timepoints sometimes differing for each color channel). These vesicles move so rapidly that it is possible they appear separated in the time it took to image each channel. Overall, this reflects the importance of determining whether G6PT and the G6PC family members form a physical linkage, which would be a potential route for pathogenicity when binding sites are disturbed, and whether this is a potential point of divergence in G6PC1, G6PC2, and G6PC3 function.

Overall, G6PC2 has been notoriously harder to study for enzymatic function, and has been theorized to be more sensitive to pH and other conditions within the cell [81]. It is possible this artificial over-expression system and non-native cell type are enough to disrupt the typical localization and movement of G6PC2. Soty et al. found that addition of the fluorophore to the N-terminal end of G6PT was enough to obstruct its ability to integrate into intracellular membranes and localize [62], so it may be possible a similar phenomenon is happening with G6PC2 despite the C-terminal fusion being tolerable for the other family members. More comprehensive examination of conditions, and assessment of endogenous G6PC2 once appropriate antibodies are available, are warranted to gain better understanding.

The finding of more widespread localization in vesicles throughout the cytoplasm raises questions about the scope of G6PC family function, and the range of metabolic consequences when it goes awry. Previously, several groups have attempted genotype-phenotype correlation on metabolic markers in GSD1a patients, but analysis was limited by patient heterogeneity and the small patient population [34, 72]. Paired with a more comprehensive understanding of molecular phenotypes, it may be possible to revisit this type of analysis to see if patients with abnormal G6PC1 localization present with different metabolic profiles than those with absent or reduced G6PC1.

3.7 Figures for Chapter 3

[Figure 1 panels: (A) construct schematic; (B) image rows G6PC1, G6PC2, G6PC3, G6PT, and EGFP, with columns 0 seconds, 15 seconds, and 30 seconds.]

Figure 1. Localization and movement of G6PC family members. (A) Schematic of fusion constructs of G6PC family tagged with EGFP on the C-terminal end, G6PT-mCherry, and an EGFP only control plasmid. (B) Representative images of each fusion protein construct transfected individually into HepG2 cells. Still images were taken at 63x magnification from live cell video at three time points: 0, 15, and 30 seconds. Where appropriate, arrows are used to indicate representative discrete points at time 0 which move position in the following images. Notably, G6PC2 and EGFP only control constructs had no visible movement and are distributed more uniformly throughout the cells.

[Figure 2 panels: (A) construct schematic; (B) G6PC1, (C) G6PC2, and (D) G6PC3, each co-transfected with G6PT. Image rows in each panel: Merge, G6PT, and the G6PC family member; columns 0 seconds, 15 seconds, and 30 seconds.]

Figure 2. Movement and colocalization of G6PC family members with G6PT. (A) Schematic of fusion constructs of G6PC family tagged with EGFP on the C-terminal end, G6PT-mCherry, and an EGFP only control plasmid. (B-D) Representative images of each G6PC family-EGFP fusion construct co-transfected with G6PT-mcherry into HepG2 cells. Still images were taken at 63x magnification from live cell video at three time points: 0, 15, and 30 seconds. Arrows are used to indicate representative discrete points at time 0 which move position in the following images.

[Figure 3 panels: image rows WT, EGFP, Q27R, D38V, Y44Y, Q54P, R83C, R83H, C109Y, E110K, E110Q, H119L, P144P, R170Q, G188S, E193Q, T213I, L216L, G222A, F294L, V304I, V308I, and A331A, with columns 0 seconds, 15 seconds, and 30 seconds.]

Figure 3. Movement of G6PC1 variants. Representative images of each G6PC1-EGFP variant fusion construct transfected into HepG2 cells. Still images were taken at 63x magnification from live cell video at three time points: 0, 15, and 30 seconds. Arrows are used to indicate representative discrete points at time 0 which move position in the following images.

[Figure 4 panels: image rows D38V, Q54P, R83C, R83H, C109Y, and G188S, with columns 0 seconds, 15 seconds, and 30 seconds.]

Figure 4. Abnormal movement of G6PC1 variants. Additional representative images of each G6PC1-EGFP variant fusion construct identified as having abnormal movement. Still images were taken at 63x magnification from live cell video in HepG2 cells at three time points: 0, 15, and 30 seconds. Arrows are used to indicate representative discrete points at time 0 which move position in the following images when appropriate. Two examples from each variant are shown side by side to emphasize the variety of irregularities seen for each.

Chapter 4. Discussion

The goal of this work was to develop a systematic approach to analyze genetic variants and determine if they cause disease, and if so, how they cause disease. This lays the foundation for informed therapeutic discovery and application, especially in rare diseases where patient populations are small and funding is low. In addition to assessing variants using glycogen storage disease 1a as a rare Mendelian disease model, this work also sought to expand current knowledge of G6PC biology to lay a better foundation for assessing molecular phenotypes in its associated disease.

In Chapter 1 we examined the pertinent background and significance for studying novel DNA variants identified through genetic testing and sequencing. The current need for identifying variants puts much of the focus on determining if a variant contributes to disease, but in addition to providing diagnostic information, variant analysis may also be able to provide prognostic information. Even within a single-gene disorder, different variants may lead to a spectrum of disease severity. Genotype-phenotype correlation is a valuable analysis to help predict outcomes and plan preventive measures for patients with a particular genetic makeup. As seen in the therapeutic successes in cystic fibrosis, in vivo and in vitro data can be utilized to apply therapeutics based on patient genotype and its unique molecular phenotype [49-51]. An example of prognostic application comes from a study where specific Fibrillin-1 variants were used to risk stratify patients with Marfan syndrome when it was found that patients with protein truncating variants have more aortic events than patients with missense variants [93]. This highlights how the classification of protein consequence alone was able to yield meaningful results for disease management and preventive screening, and echoes a prominent concept that premature truncating variants cause more severe disease. However, that is not always the case. First, truncating variants can have two results: they may lead to completely absent protein, or they may lead to an incomplete protein. Most nonsense and frameshift variants will lead to nonsense-mediated decay at the mRNA stage and never be translated to protein. However, some, especially those within the last exon, are able to escape nonsense-mediated decay and produce truncated protein products. Thus, truncating variants within a single disease could lead to two very different molecular results, which have the potential to differ in severity. Second, deletion or absence of a protein is not always the most severe phenotype. There can be cases, like Pelizaeus-Merzbacher Disease, where complete deletion of the gene, and thus completely absent protein, causes a milder phenotype than some missense variants. Thus, though my work concluded that nonsense and frameshift variants can almost universally be considered pathogenic, additional analysis can still be done for these types of variants. The subset of nonsense and frameshift variants was confirmed to not produce full-length protein, but the model system was not designed to detect truncated protein, which could be an important point of differentiation in the molecular effect of different premature truncating variants. Utilization of both N-terminal and C-terminal fluorescent tags, when the protein physiology allows for it, or antibodies near the N- and C-terminal ends, could provide further distinction, especially in cases where in silico analysis suggests escape of nonsense-mediated decay. The analysis in this work also does not provide enough information to determine if lack of full-length protein would be expected to cause more severe GSD1a phenotypes, which is an important next step of this work to apply results in a prognostic manner.
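The in silico prediction of escape from nonsense-mediated decay mentioned above is often approximated with a positional heuristic: a premature termination codon (PTC) in the last exon, or within roughly 50-55 nucleotides upstream of the last exon-exon junction, is expected to escape decay. The sketch below encodes only that heuristic, which has known exceptions. The function name, the 50-nucleotide default window, and the exon lengths are assumptions for illustration; the exon lengths are not the real G6PC1 gene structure, and all junctions are assumed to lie within the coding sequence.

```python
def predicted_to_escape_nmd(ptc_position, exon_coding_lengths, window=50):
    """Heuristic NMD-escape prediction for a premature termination codon.

    ptc_position: 1-based coding position of the first base of the PTC.
    exon_coding_lengths: coding length of each exon, in transcript order.
    window: ASSUMED distance (nt) upstream of the last exon-exon junction
            within which a PTC is treated as escaping decay.
    """
    if len(exon_coding_lengths) < 2:
        return True  # no exon-exon junction downstream of the PTC
    last_junction = sum(exon_coding_lengths[:-1])  # last base before final exon
    if ptc_position > last_junction:
        return True  # PTC lies in the last exon
    return (last_junction - ptc_position) <= window

# HYPOTHETICAL exon structure, for illustration only.
exons = [150, 120, 200, 250]  # last junction falls after coding position 470
for ptc in (100, 440, 600):
    print(ptc, predicted_to_escape_nmd(ptc, exons))
```

Variants flagged by such a rule would be the natural first candidates for the dual-tag or dual-antibody experiments proposed above, since they are the ones most likely to yield a detectable truncated product.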

Within glycogen storage disease type 1a and 1b, genotype-phenotype analysis has been attempted at the full physiologic level. In GSD1a, several case reports have highlighted the specific variants of patients with mild or atypical phenotypes, increasing interest in determining genotype-phenotype correlation in G6PC variants [34, 94-98]. A more comprehensive study of 20 patients with GSD1a sought to compare metabolic measures commonly taken in clinic for GSD1a patients to their genotypes. The blood markers examined were lactate, uric acid, triglycerides, and cholesterol, along with standard urinalysis, BMI, and height; the authors note that the amount of time between measures was not constant because the study was performed retrospectively. They concluded that nonsense and active site variants seem to have a more severe clinical phenotype; however, they also reported extensive heterogeneity in GSD1a presentation, even among siblings with the same genotype [72]. A study on GSD1b sought to examine patients who were homozygous for a given Slc37a4 variant to reduce the confounding effects of compound heterozygosity. The authors assessed the neutropenia phenotype of 6 patients to stratify them as mild or severe, and noted additional prominent disease symptoms. Patients 1-3 shared a frameshift variant (p.L348V*58fs), patient 4 had a missense variant (p.G122E), and patients 5-1 and 5-2 shared a nearly whole gene deletion. The patients with the frameshift variant and large deletion were identified as being more severe in phenotype, with more pronounced neutropenia and more hospitalizations compared to the patient with a missense variant. This provided some evidence for correlation between genotype and myeloid phenotype in GSD1b; however, the authors note that small sample size and high phenotypic variability make it difficult to draw definitive conclusions. Another study examined patients with homozygous and compound heterozygous genotypes and did find a stronger correlation trend of specific genotypes with severe neutropenia, but again larger sample sizes would be needed to make definitive statements [99]. Each of these studies emphasizes the potential utility of establishing genotype-phenotype correlations to better personalize disease management for individual patients. Dietary management in GSD type 1 patients is critical to their survival, and any information to predict or personalize that can have profound impacts on the wellbeing of those living with this disease. These studies also serve to exemplify that study design is not consistent, as different parameters were examined each time in an attempt to establish these correlations. Comprehensive data are difficult to obtain when patients are monitored at different clinics around the world, and much of this analysis requires a combination of quantitative and qualitative measures of disease severity, along with consideration for patient lifestyle and adherence to treatment.

These challenges in genotype-phenotype correlation at the clinical phenotype level, along with considerations like cost and invasiveness of human studies, help demonstrate the value of in vitro models. Examination in a uniform system allows for direct comparison of variant effects, which may be otherwise indistinguishable, especially in autosomal recessive diseases where compound heterozygosity is common. Genotype-phenotype correlation on an organism scale is also challenging in rare diseases where individual disease variants may only be reported in a single patient. Many of the strengths of artificial systems can also be argued to be weaknesses, however, and the unique interplay of compound heterozygous variants and other modifier genes should not be discounted. Analysis in a uniform in vitro system is a start to understanding the complexities of how variants interact with each other and within each individual’s unique genetic background. In vitro analysis remains an important foundational step toward understanding how variants individually contribute to disease.

Chapter 2 highlights novel molecular phenotypes identified for G6PC1, including reduced protein amount, abnormal glycosylation, and abnormal localization, and Chapter 3 introduces the additional novel phenotype of abnormal movement throughout the cell. These phenotypes were assessed with the goal of categorizing variants to determine whether they showed signs of pathogenicity and, if so, in what molecular manner they were pathogenic.

The first, reduced protein amount, can have a variety of underlying causes, which most simply reduce to decreased production and/or increased degradation. The model system in this work used an artificial promoter to over-express the fusion protein construct in cells and does not reflect physiologic levels of G6PC, but the comparison between expression of different variants in the same system is still informative. Interestingly, reduced protein amount appeared to show no correlation with phosphohydrolase activity levels reported by other groups. Most notable are the active site missense variants included in the analysis (Figure 4-1), where R83C and R83H had reduced amounts, but H119L and R170Q had normal levels. Both R83C and R170Q have been reported to have zero phosphohydrolase activity, and the remaining active site variants would be presumed to be equally catalytically inactive, as other missense variants at those positions (H119A and H176A) have no activity65. What contributes to the lesser amount of the R83 variants compared to the others would be interesting to investigate further. Though the catalytic core variants are most likely pathogenic due to their disruption of residues essential for the enzyme’s function, other missense variants with reduced protein amount may benefit from therapeutic strategies to increase protein production or reduce protein degradation. Such an elegant solution is not, however, applicable to the four non-active site missense variants reported in this work, because all were found to be abnormal in additional assays and thus have a more complex phenotype.

The second molecular phenotype addressed in Chapter 2, glycosylation, is an important post-translational modification to consider for any glycoprotein. Glycosylation has roles in protein conformation, stability, and function100. One possibility for the complex phenotype of the abnormal variants reported in this work is that the variants somehow hindered glycosylation of the protein. This could be sufficient to lead to the other effects of abnormal localization and reduced amount through misfolding and/or instability of the final protein product. G6PC1 carries the most common form of glycosylation, N-linked glycosylation, on residue 96. Mutagenesis of this residue to p.N96A is reported to reduce enzyme activity to about 40% of WT levels but to have no effect on protein quantity measured via western blot66. This provides evidence against the theory that abnormal glycosylation could be the cause of reduced protein amounts; however, quantification via western blot may not be as sensitive as the method used in this body of work to compare steady state protein amount in cells, and mild reductions could have been missed.

The third phenotype, localization, is especially important for a compartmentalized protein like G6PC1. G6PC family members are typically reported as ER transmembrane proteins, with the catalytic core facing the ER lumen. Though, as introduced in Chapter 3, there are some discrepancies about the extent of G6PC localization to other membrane structures and to vesicles within the cell, the compartmentalization of its function cleaving G6P is well established. The compartmentalization is further reinforced by the overlapping phenotype between GSD1a (G6PC1-/-) and GSD1b (Slc37a4-/-, also known as G6PT-/-); we would not expect such an overlap if G6PT were not governing transport of the substrate G6P into a separate compartment for G6PC1 catalysis. While studies have shown that G6PC1 has a very strong preference for G6P as a substrate, it is to a lesser extent able to hydrolyze other sugar-6-phosphate substrates, and multiple studies have assessed its enzymatic function and range of substrate preference to varying degrees32; 85; 101; 102. Much of the analysis of enzymatic function and substrate specificity has been done within the context of microsomal membranes. Mithieux et al. note that these are “artificial revesiculated membranes, far from the natural environment of glucose-6-phosphatase”. They specifically note that enzyme kinetics change drastically in differing conditions, and that sugar-6-phosphate analogs of G6P are also metabolized in disrupted microsomal membranes83. This suggests that compartmental localization and G6PT handling of substrate transport are crucial for the specific affinity and enzyme kinetics of G6PC1 and, paired with data from Veiga-da-Cunha et al., for G6PC3 as well32. Disruption of this compartmentalization could alter the profile of enzymatic function, especially as the protein appears freely within the cytoplasm of the cells for several variants in our results. It is likely that the 3D structure of this transmembrane protein is so disrupted in these variants that enzymatic function is impossible, but should any function remain intact, it could lead to a very different gain-of-function phenotype profile.

Lastly, intracellular trafficking is a phenotype that, to our knowledge, has been reported for G6PC1 only one other time62. Thus far there has been no speculation or research reported on the function or purpose of this movement. The original reporting group showed G6PC1 movement in HeLa cells, whereas the experiments in Chapter 3 of this work showed movement of G6PC1, G6PC3, and G6PT in HepG2 cells. The discrepancy between their report that G6PT does not exhibit movement in HeLa cells and the obvious movement we report in HepG2 cells could reflect the difference in cell type or other experimental or biological differences. Continued investigation in additional cell types, specifically those where the proteins are endogenously expressed, would provide more solid groundwork for establishing the prominence of this discovery in G6PC biology. Primary cell lines and animal models would be invaluable tools for further examining this phenotype, especially for G6PC2, to determine whether its lack of mobility and dissimilarity to the rest of the family are due to expression outside its native cell type. This work has identified protein variants with abolished movement, which may provide another route for investigating the role of movement in G6PC biology; however, this approach is confounded by the additional molecular phenotypes each variant displays and does not provide a clean system to assess movement alone.

In each assay performed in this work, variants were identified with abnormal molecular phenotypes, but as alluded to before, it was the same variants in every assay. The exception was D38V, which had a borderline phenotype across assays. D38V had significantly reduced protein amount, a noticeably fainter upper band on western blot, and a more pronounced mixed display of regular and irregular localization and movement. Though not all of these variants reached statistical significance for abnormal localization, they were qualitatively abnormal, and that was echoed in the analysis of their movement. There were no instances of a variant being abnormal in only one of the performed assays. The principle of Occam’s razor would suggest that this complex phenotype is due to a single underlying cause. As the movement phenotype is seen only in the vesicles, and there were no observed cases of the vesicles being present but not moving, I would suspect that the abnormal movement phenotype is secondary to abnormal localization.

Though much more work is required to parse out the specifics of this complex phenotype interplay, I would propose two major models to test. First, as glycosylation can play a major role in protein structure and function, the nature of these variants may prevent glycosylation and alter the overall structure of the protein, causing it to be mislocalized and misfolded. This likely makes the abnormal protein the target of enhanced degradation and leads to the overall reduced protein amount seen. The first experiment I would propose to test this is to examine variants at the N96 residue within this pipeline to determine whether ablation of the glycosylation site causes the same complex phenotype. Should such a variant be confirmed to have normal protein levels and to localize normally, it would negate this proposed model. The second possibility is that translocation of the nascent protein to the ER is abnormal, causing abnormal localization that inhibits glycosylation and overall leads to enhanced degradation of the protein. Though the known ER localization signal motif remains intact in each of these variants, there are likely added layers of complexity to ER translocation signaling within the G6PC family, especially as G6PC3 completely lacks the standard KKXX motif and is still clearly localized in the same manner23. Further research into the ER translocation of G6PC1 and which residues are essential for its specific translational processing would shed further light on this possibility. Both models could be partially tested by determining whether the reduced protein amount results from degradation secondary to these abnormalities, or whether there is reduced synthesis in addition to them. Inhibiting the proteasome, or inhibiting translation and examining the half-life of these variants compared to WT, could differentiate the cause of the reduced amounts at steady state.
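The half-life comparison proposed above can be illustrated with a minimal sketch. It assumes, hypothetically, that once translation is inhibited the remaining protein signal decays with first-order kinetics, so that a straight-line fit of ln(signal) against time gives the decay constant k and the half-life ln(2)/k. The function name, the time points, and the signal values are illustrative placeholders, not data from this work.

```python
import math

def estimate_half_life(times_h, signal):
    """Estimate half-life (hours) from a least-squares fit of ln(signal) vs time.

    Assumes single-exponential (first-order) decay; this is a modelling
    assumption, not a measured property of G6PC1.
    """
    if len(times_h) != len(signal) or len(times_h) < 2:
        raise ValueError("need at least two paired time/signal values")
    logs = [math.log(s) for s in signal]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx  # equals -k for a decaying signal
    if slope >= 0:
        raise ValueError("signal does not decay; half-life undefined")
    return math.log(2) / -slope

# Hypothetical chase time points (hours) and synthetic, noise-free signals.
times = [0, 2, 4, 8]
wt_signal = [100 * 0.5 ** (t / 6.0) for t in times]       # simulated 6 h half-life
variant_signal = [100 * 0.5 ** (t / 2.0) for t in times]  # simulated 2 h half-life

wt_half_life = estimate_half_life(times, wt_signal)
variant_half_life = estimate_half_life(times, variant_signal)
```

Under these assumptions, a variant half-life shorter than WT after translation is inhibited would point toward enhanced degradation, whereas similar half-lives alongside a lower steady state amount would point toward reduced synthesis.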

In analyzing the results, other potential similarities among these abnormal variants were explored. First, no specific trend in amino acid polarity change was found: the variants involved nonpolar, polar, positively charged, and negatively charged amino acids being changed to different states, and included R83H, which is a change from one positively charged residue to another. Next, position within the protein was examined. These variants span 150 amino acids within the first half of the G6PC1 protein, with five of six in the first third of the protein. Figure 4-2 displays this pattern in the results of the protein amount analysis; however, when overlaid with the reported pathogenicity of the variants, it reveals an unintentional bias of this study in missense variant selection. Known pathogenic variants were over-selected in the N terminal half of the protein, while VUS were the only missense variant type represented in the C terminal half. This flaw could be rectified by testing additional known pathogenic missense variants across the entirety of the protein. Protein domain was also assessed to look for association with position in transmembrane helix, ER lumen, or cytoplasmic facing portions of the protein. Three of the variants, R83C, R83H, and C109Y, are located in two different ER lumenal domains; D38V and G188S are located in two different transmembrane domains; and Q54P resides in a cytoplasmic domain. Thus, no clear association with any of these features was seen, and though the variants trended toward the N terminal half of the protein, this may be an artifact of uneven pathogenic variant selection.

Of the six missense variants with abnormal phenotype, five were known pathogenic variants, while p.C109Y was listed as VUS. As discussed in Chapter 2, additional clinical data for p.C109Y were added to the ClinVar database over the course of this work. In silico processing identifies some degree of pathogenic potential for the variant because it is an ultra-rare SNP, G6PC1 is highly conserved at the C109 residue, and cysteine and tyrosine residues are highly physicochemically dissimilar. In addition, in 2018 a patient with clinical symptoms of GSD1a was found to be homozygous for the p.C109Y variant. Despite the strong pairing of in silico prediction and clinical data from an unambiguously homozygous individual with the associated disease, the variant remains listed as “conflicting interpretations of pathogenicity” nearly three years after the clinical report was made. Caution and diligence are necessary in making definitive calls of pathogenicity, as premature classification can have as much detrimental effect as uncertainty. As additional patients are unlikely to present with rare variants in rare diseases to provide further evidence, in vitro data are clearly necessary to provide additional layers of evidence and support definitive classifications.

In order to study variants and their pathogenicity and pathophysiology, it is important to have a strong foundation in the normal biology and function of each gene and protein product. In genes like the G6PC family, while the general function and role of these proteins are understood, there still remains some discrepancy and uncertainty. Much of our functional knowledge of genes comes from knockout experiments, where the consequences of the loss of a gene’s function inform what its purpose is. In GSD1a and 1b it is obvious that glucose release and breakdown of glycogen are critical roles of G6PC and G6PT. In GSD1b and SCN4 it is obvious that G6PT and G6PC3 are essential to healthy function of neutrophils. For G6PC2 there is a less obvious phenotype, which requires deeper examination of G6PC2’s biological role. In the cases of the obvious disease phenotypes, it is possible that such profound metabolic dysregulation is masking smaller roles of G6PC1 and G6PT that we have yet to uncover. Work from Veiga-da-Cunha et al. examined a wider variety of substrates for G6PC1 and G6PC3 and identified unique profiles for each. While G6PC1 activity is strongest for G6P, it also metabolized mannose-6-phosphate (M6P) and, to a small extent, 1,5-anhydro-D-glucitol-6-phosphate (1,5AG6P), fructose-6-phosphate (F6P), and ribose-5-phosphate (R5P). G6PC3 metabolized G6P to a relatively lesser extent and has a stronger preference for R5P, 1,5AG6P, ribitol-5P (Rol5P), M6P, and eight of the remaining nine substrates tested. The expanded substrate profile, paired with the unexplored intracellular trafficking and under-reported localization to vesicles, suggests there is more to G6PC biology than we currently understand.

One component of G6PC and G6PT biology that is surprisingly lacking is exploration of the physical interaction between these proteins. Reports commonly state that these proteins form a complex, but whether this is a true physically interacting complex or a purely functional association is unclear. Soty et al. investigated this with FRET, but it is unclear whether their negative result is due to true physiological separation or to technical limitations62. G6PC1 has been shown to be about 37 kDa by immunoblotting and SDS-PAGE103 and through examination of tagged proteins64, while a group using radiation inactivation analysis identified a larger 75 kDa functional unit that may be supporting evidence for a G6PC-G6PT physical complex104. Our results confirm that these proteins co-localize in all cellular structures in which they are found and remain co-localized in their movement throughout the cell. The potential for physical linkage between these proteins is interesting, as it provides another possible route for molecular phenotyping and comparison among the three G6PC family members. With the fusion protein models, immunoprecipitation pulldown assays could be performed to investigate this possibility, and if a physical complex is identified, variants could be explored in that new context.

A larger goal of this work was to lay groundwork for future therapeutic interventions, and the clear role of variant classification in doing so is emphasized throughout. However, not all therapeutic interventions require a personalized approach, and the G6PC family of paralogs provides an interesting potential treatment approach. Other congenital metabolic diseases have been successfully treated with enzyme replacement therapies105. Typically administered by IV infusion every few weeks, these therapies simply provide functional copies of the deficient enzyme directly into the bloodstream. However, the complexities of G6PC and G6PT being compartmentalized transmembrane proteins, along with the complex workings and regulation of glucose and glycogen metabolism, make simply delivering complete copies of the enzyme unfeasible. Additionally, humoral immunity and antibody production against the enzyme, specifically in patients who do not naturally produce any enzyme, are potential complicating factors38. DNA and RNA replacement therapies are another feasible route, with a clinical trial of a G6PC1-carrying recombinant adeno-associated virus currently underway106, though immune response remains a potential issue in patients with no residual G6PC expression. The unique features of the G6PC family, with three enzymes reported to catalyze the same reaction, raise the possibility of paralogous gene treatment. As these paralogs are already expressed and recognized in the patients, the chance of immune reaction should be minimal. With current knowledge, it seems G6PC2 has functionally diverged most from the G6PC family function and maintains a separate, more limited, role in glucose metabolism. With varying reports on its ability to catalyze G6P hydrolysis, a highly restricted expression pattern, and strikingly different localization and movement patterns in HepG2 model liver cells, it is an unlikely candidate to compensate for loss of G6PC1. However, G6PC3 may be a feasible candidate. As the ubiquitously expressed G6PC family member, it is already present in the liver and kidney cells where G6PC function is crucial, and it has been theorized to contribute to residual glucose release from other tissues in GSD1a patients20; 21; 23. This is loosely supported by the mild improvement in glucose regulation experienced by many GSD1a patients after puberty, which correlates with an increase in muscle mass. Though the liver is the primary organ for glycogen storage, muscle tissue contains a higher total quantity of glycogen when considered collectively, and the activity of G6PC3 in the muscle may allow for modest levels of glucose release. This is insufficient to naturally alleviate disease but provides an intriguing theoretical framework to test and build upon. The lower relative expression, 6-fold lower Vmax, and lesser affinity for G6P as a substrate may all combine to limit the ability of G6PC3 to naturally compensate for G6PC1 loss in GSD1a patients. When gene therapy was applied in mouse models, levels of G6PC1 as low as 1% of WT were reported to maintain glucose homeostasis and a favorable metabolic profile, while levels as low as 3% prevented liver adenoma and carcinoma development107. With such a low level of G6PC1 activity required, modest increases in G6PC3 may be sufficient to reduce or alleviate symptoms in patients with GSD1a. Small molecules could be screened for a range of effects on G6PC3 physiology, including increased expression, decreased degradation, protein stabilization, improved Km or Vmax, and altered substrate specificity. G6PC3 could be investigated to determine whether its overexpression functionally compensates for G6PC1 and whether it causes adverse side effects due to its expanded substrate profile. This again emphasizes the need to fully understand the foundational biology of these proteins in order to predict and assess outcomes. If feasible, this would provide an elegant route of treatment, uniquely made possible by utilizing the genome’s own redundancy of paralogous genes that have not functionally diverged.
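The kinetic argument above can be made concrete with the Michaelis-Menten rate law, v = Vmax[S]/(Km + [S]). The sketch below is illustrative only. The 6-fold difference in Vmax and the 1% of WT level come from the text, but the Km values, the G6P concentration, the relative G6PC3 amount, and the assumption that flux scales linearly with enzyme amount are all hypothetical placeholders chosen for the example, and the function names are not from any library.

```python
def mm_rate(vmax, km, substrate):
    """Michaelis-Menten rate v = Vmax * [S] / (Km + [S])."""
    if km <= 0 or substrate < 0:
        raise ValueError("Km must be positive and [S] non-negative")
    return vmax * substrate / (km + substrate)

def fraction_of_g6pc1_flux(g6pc3_amount, g6pc3_vmax, g6pc3_km,
                           g6pc1_vmax, g6pc1_km, substrate):
    """G6PC3 flux as a fraction of WT G6PC1 flux.

    g6pc3_amount is the G6PC3 enzyme amount relative to WT G6PC1;
    flux is assumed (hypothetically) to scale linearly with amount.
    """
    g6pc3_flux = g6pc3_amount * mm_rate(g6pc3_vmax, g6pc3_km, substrate)
    return g6pc3_flux / mm_rate(g6pc1_vmax, g6pc1_km, substrate)

# Arbitrary units. Only the 6-fold Vmax ratio is taken from the text.
G6PC1_VMAX = 6.0
G6PC3_VMAX = 1.0
G6PC1_KM = 2.0       # hypothetical placeholder
G6PC3_KM = 4.0       # hypothetical: a higher Km models the lesser affinity for G6P
G6P_CONC = 1.0       # hypothetical substrate concentration
G6PC3_AMOUNT = 0.05  # hypothetical G6PC3 amount relative to WT G6PC1

baseline_fraction = fraction_of_g6pc1_flux(
    G6PC3_AMOUNT, G6PC3_VMAX, G6PC3_KM, G6PC1_VMAX, G6PC1_KM, G6P_CONC
)  # roughly 0.005 with these placeholder values

# Fold increase in G6PC3 amount needed to reach 1% of WT G6PC1 flux,
# the level reported in the text to maintain glucose homeostasis in mice.
fold_needed = 0.01 / baseline_fraction
```

With these placeholder values, a roughly 2-fold increase in G6PC3 amount would reach the 1% level. The actual requirement depends entirely on measured kinetic constants and expression levels, which this sketch does not supply.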

In summary, establishing a strong foundational understanding of proteins in the human body is instrumental in predicting and understanding the consequences of their variation. Determining if and how variants lead to disease can improve personalized diagnostics, prognostics, and informed therapeutic discovery for any genetic ailment. Employing techniques that yield results quickly, are low cost, and can be done without the need for invasive procedures will improve the availability of this information for all diseases, especially rare diseases, which suffer from less funding and research power.

4.1 Figures for Chapter 4

Figure 4-1. Protein levels of G6PC1 active site variants in HepG2 cells. Six active site variants in the four reported G6PC1 catalytic core residues are shown compared to WT and negative controls. Values are the mean and SD of the percentage of G6PC1-EGFP positive transfected cells detected in n=180 images per variant, acquired using the Perkin Elmer Operetta System at 20x magnification in fixed cells. Statistics show Welch’s two-tailed t test analysis of each variant compared to WT protein levels. * p <0.05; ** p <0.01; *** p <0.001; **** p <0.0001.

[Figure 4-2 plot, “Aggregate by AA Position, Missense Only”. Y-axis: Percent EGFP Positive Transfected Cells (0 to 110). X-axis: Mutation, showing controls (WT, Mock, None, GFP/RFP) and variants Q27R, D38V, Q54P, R83C, R83H, C109Y, E110K, E110Q, H119L, R170Q, H176L, G188S, E193Q, T213I, G222A, F294L, V304I, V308I. Legend: Pathogenic, Not Reported, VUS, Positive Control.]

Figure 4-2. Protein levels of G6PC1 missense variants in HepG2 cells. Eighteen missense variants in the G6PC1 protein, organized by amino acid residue position, are shown compared to WT and negative controls. Values are the mean and SD of the percentage of G6PC1-EGFP positive transfected cells detected in n=180 images per variant, acquired using the Perkin Elmer Operetta System at 20x magnification in fixed cells. Statistics show Welch’s two-tailed t test analysis of each variant compared to WT protein levels.
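The Welch’s two-tailed t test named in the captions above compares each variant with WT without assuming equal variances. A minimal sketch of the test statistic and the Welch-Satterthwaite degrees of freedom is given below, together with the star notation defined in the Figure 4-1 caption. The function names, the "ns" label, and the per-image percentages are hypothetical placeholders, not data from this work. Obtaining the p value additionally requires the Student t distribution, for example through `scipy.stats.ttest_ind` with `equal_var=False`.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(sample_a), len(sample_b)
    if na < 2 or nb < 2:
        raise ValueError("each sample needs at least two values")
    # Squared standard error of each sample mean.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    if va + vb == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

def significance_stars(p_value):
    """Map a p value to the star notation in the Figure 4-1 caption."""
    thresholds = ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*"))
    for threshold, stars in thresholds:
        if p_value < threshold:
            return stars
    return "ns"  # assumed label; the caption defines only the stars

# Hypothetical percentages of EGFP positive cells per image (placeholders).
wt_images = [80, 82, 78, 84, 76]
variant_images = [40, 45, 35, 42, 38]
t_stat, dof = welch_t(wt_images, variant_images)
```

The degrees of freedom are generally not an integer, which is expected for Welch’s test and is handled by the t distribution when the p value is computed.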

References

1. Blencowe, H., Moorthie, S., Petrou, M., Hamamy, H., Povey, S., Bittles, A., Gibbons, S., Darlison, M., and Modell, B. (2018). Rare single gene disorders: estimating baseline prevalence and outcomes worldwide. Journal of Community Genetics 9, 397-406.
2. Mitani, A.A., and Haneuse, S. (2020). Small Data Challenges of Studying Rare Diseases. JAMA Network Open 3, e201965.
3. Stoller, J.K. (2018). The Challenge of Rare Diseases. Chest 153, 1309-1314.
4. Augustine, E.F., Adams, H.R., and Mink, J.W. (2013). Clinical Trials in Rare Disease. Journal of Child Neurology 28, 1142-1150.
5. Rees, C.A., Pica, N., Monuteaux, M.C., and Bourgeois, F.T. (2019). Noncompletion and nonpublication of trials studying rare diseases: A cross-sectional analysis. PLOS Medicine 16, e1002966.
6. Shendure, J., Balasubramanian, S., Church, G.M., Gilbert, W., Rogers, J., Schloss, J.A., and Waterston, R.H. (2017). DNA sequencing at 40: past, present and future. Nature 550, 345-353.
7. (2010). A map of human genome variation from population-scale sequencing. Nature 467, 1061-1073.
8. (2015). A global reference for human genetic variation. Nature 526, 68-74.
9. (2012). An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56-65.
10. Yang, Y., Muzny, D.M., Xia, F., Niu, Z., Person, R., Ding, Y., Ward, P., Braxton, A., Wang, M., Buhay, C., et al. (2014). Molecular Findings Among Patients Referred for Clinical Whole-Exome Sequencing. JAMA 312, 1870.
11. Lappalainen, T., Scott, A.J., Brandt, M., and Hall, I.M. (2019). Genomic Analysis in the Age of Human Genome Sequencing. Cell 177, 70-84.
12. Lek, M., Karczewski, K.J., Minikel, E.V., Samocha, K.E., Banks, E., Fennell, T., O’Donnell-Luria, A.H., Ware, J.S., Hill, A.J., Cummings, B.B., et al. (2016). Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285-291.
13. De La Campa, E.Á., Padilla, N., and De La Cruz, X. (2017). Development of pathogenicity predictors specific for variants that do not comply with clinical guidelines for the use of computational evidence. BMC Genomics 18.
14. Nykamp, K., Anderson, M., Powers, M., Garcia, J., Herrera, B., Ho, Y.-Y., Kobayashi, Y., Patil, N., Thusberg, J., Westbrook, M., et al. (2017). Sherloc: a comprehensive refinement of the ACMG–AMP variant classification criteria. Genetics in Medicine 19, 1105-1117.
15. Richards, S., Aziz, N., Bale, S., Bick, D., Das, S., Gastier-Foster, J., Grody, W.W., Hegde, M., Lyon, E., Spector, E., et al. (2015). Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genetics in Medicine 17, 405-423.

16. Bateman, A., Martin, M.-J., Orchard, S., Magrane, M., Agivetova, R., Ahmad, S., Alpi, E., Bowler-Barnett, E.H., Britto, R., Bursteinas, B., et al. (2021). UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Research 49, D480-D489.
17. Stelzer, G., Rosen, N., Plaschkes, I., Zimmerman, S., Twik, M., Fishilevich, S., Stein, T.I., Nudel, R., Lieder, I., Mazor, Y., et al. (2016). The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses. Current Protocols in Bioinformatics 54, 1.30.31-31.30.33.
18. Jackson, M.R., Nilsson, T., and Peterson, P.A. (1990). Identification of a consensus motif for retention of transmembrane proteins in the endoplasmic reticulum. The EMBO Journal 9, 3153-3162.
19. Mutel, E., Abdul-Wahed, A., Ramamonjisoa, N., Stefanutti, A., Houberdon, I., Cavassila, S., Pilleul, F., Beuf, O., Gautier-Stein, A., Penhoat, A., et al. (2011). Targeted deletion of liver glucose-6 phosphatase mimics glycogen storage disease type 1a including development of multiple adenomas. Journal of Hepatology 54, 529-537.
20. Battezzati, A., Caumo, A., Martino, F., Sereni, L.P., Coppa, J., Romito, R., Ammatuna, M., Regalia, E., Matthews, D.E., Mazzaferro, V., et al. (2004). Nonhepatic glucose production in humans. American Journal of Physiology-Endocrinology and Metabolism 286, E129-E135.
21. Shieh, J.-J., Pan, C.-J., Mansfield, B.C., and Chou, J.Y. (2004). A Potential New Role for Muscle in Blood Glucose Homeostasis. Journal of Biological Chemistry 279, 26215-26219.
22. Boers, S.J., Visser, G., Smit, P.G., and Fuchs, S.A. (2014). Liver transplantation in glycogen storage disease type I. Orphanet J Rare Dis 9, 47.
23. Shieh, J.J., Pan, C.J., Mansfield, B.C., and Chou, J.Y. A glucose-6-phosphate hydrolase, widely expressed outside the liver, can explain age-dependent resolution of hypoglycemia in glycogen storage disease type Ia.
24. Hutton, J.C., and O'Brien, R.M. (2009). Glucose-6-phosphatase catalytic subunit gene family. J Biol Chem 284, 29241-29245.
25. Frigeri, C., Martin, C.C., Svitek, C.A., Oeser, J.K., Hutton, J.C., Gannon, M., and O'Brien, R.M. (2004). The Proximal Islet-Specific Glucose-6-Phosphatase Catalytic Subunit-Related Protein Autoantigen Promoter Is Sufficient to Initiate but not Maintain Transgene Expression in Mouse Islets in Vivo. Diabetes 53, 1754-1764.
26. Arden, S.D., Zahn, T., Steegers, S., Webb, S., Bergman, B., O'Brien, R.M., and Hutton, J.C. (1999). Molecular cloning of a pancreatic islet-specific glucose-6-phosphatase catalytic subunit-related protein. Diabetes 48, 531.
27. Shieh, J.-J., Pan, C.-J., Mansfield, B.C., and Chou, J.Y. (2004). The islet-specific glucose-6-phosphatase-related protein, implicated in diabetes, is a glycoprotein embedded in the endoplasmic reticulum membrane. FEBS Letters 562, 160-164.
28. Martin, C., Oeser, J., Svitek, C., Hunter, S., Hutton, J., and O'Brien, R. (2002). Identification and characterization of a human cDNA and gene encoding a ubiquitously expressed glucose-6-phosphatase catalytic subunit-related protein. Journal of Molecular Endocrinology 29, 205-222.

29. Martin, C.C., Bischof, L.J., Bergman, B., Hornbuckle, L.A., Hilliker, C., Frigeri, C., Wahl, D., Svitek, C.A., Wong, R., Goldman, J.K., et al. (2001). Cloning and Characterization of the Human and Rat Islet-specific Glucose-6-phosphatase Catalytic Subunit-related Protein (IGRP) Genes. Journal of Biological Chemistry 276, 25197-25207.
30. Petrolonis, A.J., Yang, Q., Tummino, P.J., Fish, S.M., Prack, A.E., Jain, S., Parsons, T.F., Li, P., Dales, N.A., Ge, L., et al. (2004). Enzymatic Characterization of the Pancreatic Islet-specific Glucose-6-Phosphatase-related Protein (IGRP). Journal of Biological Chemistry 279, 13976-13983.
31. Hutton, J.C., and Eisenbarth, G.S. (2003). A pancreatic β-cell-specific homolog of glucose-6-phosphatase emerges as a major target of cell-mediated autoimmunity in diabetes. Proceedings of the National Academy of Sciences 100, 8626.
32. Veiga-Da-Cunha, M., Chevalier, N., Stephenne, X., Defour, J.-P., Paczia, N., Ferster, A., Achouri, Y., Dewulf, J.P., Linster, C.L., Bommer, G.T., et al. (2019). Failure to eliminate a phosphorylated glucose analog leads to neutropenia in patients with G6PT and G6PC3 deficiency. Proceedings of the National Academy of Sciences 116, 1241-1250.
33. McDermott, D.H., De Ravin, S.S., Jun, H.S., Liu, Q., Priel, D.A.L., Noel, P., Takemoto, C.M., Ojode, T., Paul, S.M., Dunsmore, K.P., et al. (2010). Severe congenital neutropenia resulting from G6PC3 deficiency with increased neutrophil CXCR4 expression and myelokathexis. Blood 116, 2793-2802.
34. Matern, D., Seydewitz, H.H., Bali, D., Lang, C., and Chen, Y.T. (2002). Glycogen storage disease type I: diagnosis and phenotype/genotype correlation. Eur J Pediatr 161 Suppl 1, S10-19.
35. Kishnani, P.S., Austin, S.L., Abdenur, J.E., Arn, P., Bali, D.S., Boney, A., Chung, W.K., Dagli, A.I., Dale, D., Koeberl, D., et al. (2014). Diagnosis and management of glycogen storage disease type I: a practice guideline of the American College of Medical Genetics and Genomics. Genet Med 16, e1.
36. Bianchi, L. (1993). Glycogen storage disease I and hepatocellular tumours. Eur J Pediatr 152 Suppl 1, S63-70.
37. Reitsma-Bierens, W.C.C. (1993). Renal complications in glycogen storage disease type I. European Journal of Pediatrics 152, 60-62.
38. Koeberl, D.D., Kishnani, P.S., and Chen, Y.T. (2007). Glycogen storage disease types I and II: treatment updates. Journal of inherited metabolic disease 30, 159-164.
39. Davis, M.K., and Weinstein, D.A. (2008). Liver transplantation in children with glycogen storage disease: controversies and evaluation of the risk/benefit of this procedure. Pediatr Transplant 12, 137-145.
40. Wolf, S.M. (2013). Return of Individual Research Results and Incidental Findings: Facing the Challenges of Translational Science. Annual Review of Genomics and Human Genetics 14, 557-577.
41. Kalia, S.S., Adelman, K., Bale, S.J., Chung, W.K., Eng, C., Evans, J.P., Herman, G.E., Hufnagel, S.B., Klein, T.E., Korf, B.R., et al. (2017). Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2016 update (ACMG SF v2.0): a policy statement of the American College of Medical Genetics and Genomics. Genetics in Medicine 19, 249-255.

42. Clift, K.E., Halverson, C.M.E., Fiksdal, A.S., Kumbamu, A., Sharp, R.R., and McCormick, J.B. (2015). Patients' views on incidental findings from clinical exome sequencing. Applied & Translational Genomics 4, 38-43.
43. Facio, F.M., Eidem, H., Fisher, T., Brooks, S., Linn, A., Kaphingst, K.A., Biesecker, L.G., and Biesecker, B.B. (2013). Intentions to receive individual results from whole-genome sequencing among participants in the ClinSeq study. European Journal of Human Genetics 21, 261-265.
44. Fernandez, C.V., Bouffet, E., Malkin, D., Jabado, N., O’Connell, C., Avard, D., Knoppers, B.M., Ferguson, M., Boycott, K.M., Sorensen, P.H., et al. (2014). Attitudes of parents toward the return of targeted and incidental genomic research findings in children. Genetics in Medicine 16, 633-640.
45. Rigter, T., Henneman, L., Kristoffersson, U., Hall, A., Yntema, H.G., Borry, P., Tönnies, H., Waisfisz, Q., Elting, M.W., Dondorp, W.J., et al. (2013). Reflecting on Earlier Experiences with Unsolicited Findings: Points to Consider for Next-Generation Sequencing and Informed Consent in Diagnostics. Human Mutation 34, 1322-1328.
46. Clarke, A.J. (2014). Managing the ethical challenges of next-generation sequencing in genomic medicine. British Medical Bulletin 111, 17-30.
47. Christensen, K.D., and Green, R.C. (2013). How could disclosing incidental information from whole-genome sequencing affect patient behavior? Per Med 10, 10.2217/pme.2213.2224.
48. Elborn, S., and Vallieres, E. (2014). Cystic fibrosis gene mutations: evaluation and assessment of disease severity. Advances in Genomics and Genetics, 161.
49. Ramsey, B.W., Davies, J., McElvaney, N.G., Tullis, E., Bell, S.C., Dřevínek, P., Griese, M., McKone, E.F., Wainwright, C.E., Konstan, M.W., et al. (2011). A CFTR Potentiator in Patients with Cystic Fibrosis and the G551D Mutation. New England Journal of Medicine 365, 1663-1672.
50. Boyle, M.P., Bell, S.C., Konstan, M.W., McColley, S.A., Rowe, S.M., Rietschel, E., Huang, X., Waltz, D., Patel, N.R., and Rodman, D. (2014). A CFTR corrector (lumacaftor) and a CFTR potentiator (ivacaftor) for treatment of patients with cystic fibrosis who have a phe508del CFTR mutation: a phase 2 randomised controlled trial. The Lancet Respiratory Medicine 2, 527-538.
51. Durmowicz, A.G., Lim, R., Rogers, H., Rosebraugh, C.J., and Chowdhury, B.A. The U.S. Food and Drug Administration's Experience with Ivacaftor in Cystic Fibrosis. Establishing Efficacy Using In Vitro Data in Lieu of a Clinical Trial.
52. Pertea, M., Shumate, A., Pertea, G., Varabyou, A., Chang, Y.-C., Madugundu, A.K., Pandey, A., and Salzberg, S.L. (2018). Thousands of large-scale RNA sequencing experiments yield a comprehensive new human gene list and reveal extensive transcriptional noise. bioRxiv, 332825.
53. Wood, V., Lock, A., Harris, M.A., Rutherford, K., Bähler, J., and Oliver, S.G. (2019). Hidden in plain sight: what remains to be discovered in the eukaryotic proteome? Open Biology 9, 180241.
54. Stoeger, T., Gerlach, M., Morimoto, R.I., and Nunes Amaral, L.A. (2018). Large-scale investigation of the reasons why potentially important genes are ignored. PLOS Biology 16, e2006643.

55. Edwards, A.M., Isserlin, R., Bader, G.D., Frye, S.V., Willson, T.M., and Yu, F.H. (2011). Too many roads not taken. Nature 470, 163-165.
56. Muir, P., Li, S., Lou, S., Wang, D., Spakowicz, D.J., Salichos, L., Zhang, J., Weinstock, G.M., Isaacs, F., Rozowsky, J., et al. (2016). The real cost of sequencing: scaling computation to keep pace with data generation. Genome Biology 17.
57. Yohe, S., and Thyagarajan, B. (2017). Review of Clinical Next-Generation Sequencing. Archives of Pathology & Laboratory Medicine 141, 1544-1557.
58. Kumar, P., Henikoff, S., and Ng, P.C. (2009). Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nature Protocols 4, 1073-1081.
59. Kircher, M., Witten, D.M., Jain, P., O'Roak, B.J., Cooper, G.M., and Shendure, J. (2014). A general framework for estimating the relative pathogenicity of human genetic variants. Nature Genetics 46, 310-315.
60. Grimm, D.G., Azencott, C.A., Aicheler, F., Gieraths, U., Macarthur, D.G., Samocha, K.E., Cooper, D.N., Stenson, P.D., Daly, M.J., Smoller, J.W., et al. (2015). The Evaluation of Tools Used to Predict the Impact of Missense Variants Is Hindered by Two Types of Circularity. Human Mutation 36, 513-523.
61. Tang, H., and Thomas, P.D. (2016). Tools for Predicting the Functional Impact of Nonsynonymous Genetic Variation. Genetics 203, 635-647.
62. Soty, M., Chilloux, J., Casteras, S., Grichine, A., Mithieux, G., and Gautier-Stein, A. (2012). New insights into the organisation and intracellular localisation of the two subunits of glucose-6-phosphatase. Biochimie 94, 695-703.
63. Chou, J.Y., and Mansfield, B.C. (2008). Mutations in the glucose-6-phosphatase-alpha (G6PC) gene that cause type Ia glycogen storage disease. Hum Mutat 29, 921-930.
64. Pan, C.J., Lei, K.J., Annabi, B., Hemrika, W., and Chou, J.Y. (1998). Transmembrane topology of glucose-6-phosphatase. The Journal of biological chemistry 273, 6144-6148.
65. Ghosh, A., Shieh, J.J., Pan, C.J., Sun, M.S., and Chou, J.Y. (2002). The catalytic center of glucose-6-phosphatase. HIS176 is the nucleophile forming the phosphohistidine-enzyme intermediate during catalysis. J Biol Chem 277, 32837-32842.
66. Pan, C.J., Lei, K.J., and Chou, J.Y. (1998). Asparagine-linked oligosaccharides are localized to a luminal hydrophilic loop in human glucose-6-phosphatase. J Biol Chem 273, 21658-21662.
67. Shieh, J.J., Terzioglu, M., Hiraiwa, H., Marsh, J., Pan, C.J., Chen, L.Y., and Chou, J.Y. (2002). The molecular basis of glycogen storage disease type 1a: structure and function analysis of mutations in glucose-6-phosphatase. J Biol Chem 277, 5047-5053.
68. Chou, J.Y., Jun, H.S., and Mansfield, B.C. (2015). Type I glycogen storage diseases: disorders of the glucose-6-phosphatase/glucose-6-phosphate transporter complexes. Journal of Inherited Metabolic Disease 38, 511-519.
69. Beyzaei, Z., and Geramizadeh, B. (2019). Molecular diagnosis of glycogen storage disease type I: a review. EXCLI J 18, 30-46.

70. Seydewitz, H.H., and Matern, D. (2000). Molecular genetic analysis of 40 patients with glycogen storage disease type Ia: 100% mutation detection rate and 5 novel mutations. Human Mutation 15, 115-116.
71. Rake, J.P., Ten Berge, A.M., Visser, G., Verlind, E., Niezen-Koning, K.E., Buys, C.H.C.M., Smit, G.P.A., and Scheffer, H. (2000). Glycogen storage disease type Ia: recent experience with mutation analysis, a summary of mutations reported in the literature and a newly developed diagnostic flowchart. European Journal of Pediatrics 159, 322-330.
72. Peeks, F., Steunenberg, T.A.H., de Boer, F., Rubio-Gozalbo, M.E., Williams, M., Burghard, R., Rajas, F., Oosterveer, M.H., Weinstein, D.A., and Derks, T.G.J. (2017). Clinical and biochemical heterogeneity between patients with glycogen storage disease type IA: the added value of CUSUM for metabolic control. J Inherit Metab Dis 40, 695-702.
73. Castell, J.V., Jover, R., Martínez-Jiménez, C.P., and Gómez-Lechón, M.J. (2006). Hepatocyte cell lines: their use, scope and limitations in drug metabolism studies. Expert Opinion on Drug Metabolism & Toxicology 2, 183-212.
74. Pan, C.J., Lin, B., and Chou, J.Y. (1999). Transmembrane topology of human glucose 6-phosphate transporter. The Journal of biological chemistry 274, 13865-13869.
75. Chou, J.Y., Zingone, A., and Pan, C.J. (2002). Adenovirus-mediated gene therapy in a mouse model of glycogen storage disease type 1a. Eur J Pediatr 161 Suppl 1, S56-61.
76. Weinstein, D.A., Correia, C.E., Conlon, T., Specht, A., Verstegen, J., Onclin-Verstegen, K., Campbell-Thompson, M., Dhaliwal, G., Mirian, L., Cossette, H., et al. (2010). Adeno-Associated Virus-Mediated Correction of a Canine Model of Glycogen Storage Disease Type Ia. Human Gene Therapy 21, 903-910.
77. Yiu, W.H., Lee, Y.M., Peng, W.T., Pan, C.J., Mead, P.A., Mansfield, B.C., and Chou, J.Y. (2010). Complete normalization of hepatic G6PC deficiency in murine glycogen storage disease type Ia using gene therapy. Mol Ther 18, 1076-1084.
78. Clar, J., Mutel, E., Gri, B., Creneguy, A., Stefanutti, A., Gaillard, S., Ferry, N., Beuf, O., Mithieux, G., Nguyen, T.H., et al. (2015). Hepatic lentiviral gene transfer prevents the long-term onset of hepatic tumours of glycogen storage disease type 1a in mice. Human Molecular Genetics 24, 2287-2296.
79. Roseman, D.S., Khan, T., Rajas, F., Jun, L.S., Asrani, K.H., Isaacs, C., Farelli, J.D., and Subramanian, R.R. (2018). G6PC mRNA Therapy Positively Regulates Fasting Blood Glucose and Decreases Liver Abnormalities in a Mouse Model of Glycogen Storage Disease 1a. Molecular Therapy 26, 814-821.
80. Miao, H., Zhou, J., Yang, Q., Liang, F., Wang, D., Ma, N., Gao, B., Du, J., Lin, G., Wang, K., et al. (2018). Long-read sequencing identified a causal structural variant in an exome-negative case and enabled preimplantation genetic diagnosis. Hereditas 155, 32.
81. Hutton, J.C., and O'Brien, R.M. (2009). Glucose-6-phosphatase Catalytic Subunit Gene Family. Journal of Biological Chemistry 284, 29241-29245.
82. Chou, J.Y., and Mansfield, B.C. (2014). The SLC37 family of sugar-phosphate/phosphate exchangers. Curr Top Membr 73, 357-382.

83. Mithieux, G. (1997). New knowledge regarding glucose-6 phosphatase gene and protein and their roles in the regulation of glucose metabolism. Eur J Endocrinol 136, 137-145.
84. Rajas, F., Jourdan-Pineau, H., Stefanutti, A., Mrad, E.A., Iynedjian, P.B., and Mithieux, G. (2007). Immunocytochemical localization of glucose 6-phosphatase and cytosolic phosphoenolpyruvate carboxykinase in gluconeogenic tissues reveals unsuspected metabolic zonation. Histochem Cell Biol 127, 555-565.
85. Mithieux, G., Rajas, F., and Gautier-Stein, A. (2004). A novel role for glucose 6-phosphatase in the small intestine in the control of glucose homeostasis. J Biol Chem 279, 44231-44234.
86. Chou, J.Y. (2001). The molecular basis of type 1 glycogen storage diseases. Curr Mol Med 1, 25-44.
87. Froissart, R., Piraud, M., Boudjemline, A.M., Vianey-Saban, C., Petit, F., Hubert-Buron, A., Eberschweiler, P.T., Gajdos, V., and Labrune, P. (2011). Glucose-6-phosphatase deficiency. Orphanet J Rare Dis 6, 27.
88. Bali, D.S., Chen, Y.T., Austin, S., and Goldstein, J.L. (1993). Glycogen Storage Disease Type I. In GeneReviews(®), M.P. Adam, H.H. Ardinger, R.A. Pagon, S.E. Wallace, L.J.H. Bean, K. Stephens, and A. Amemiya, eds. (Seattle (WA), University of Washington, Seattle Copyright © 1993-2020, University of Washington, Seattle. GeneReviews is a registered trademark of the University of Washington, Seattle. All rights reserved.
89. Sim, S.W., Weinstein, D.A., Lee, Y.M., and Jun, H.S. (2020). Glycogen storage disease type Ib: role of glucose-6-phosphate transporter in cell metabolism and function. FEBS Lett 594, 3-18.
90. Plona, K.L., Eastman, J.F., and Drumm, M.L. (2021). Classifying molecular phenotypes of G6PC variants for pathogenic properties and to guide therapeutic development. JIMD Reports.
91. Shin, T.S., Chung, I.H., and Kim, S.S. (1978). Electron Microscopy on Activity and Localization of Glucose-6-phosphatase in Liver Cells. Yonsei Medical Journal 19, 1.
92. Chiquoine, A.D. (1953). THE DISTRIBUTION OF GLUCOSE-6-PHOSPHATASE IN THE LIVER AND KIDNEY OF THE MOUSE. Journal of Histochemistry & Cytochemistry 1, 429-435.
93. Becerra-Muñoz, V.M., Gómez-Doblas, J.J., Porras-Martín, C., Such-Martínez, M., Crespo-Leiro, M.G., Barriales-Villa, R., De Teresa-Galván, E., Jiménez-Navarro, M., and Cabrera-Bueno, F. (2018). The importance of genotype-phenotype correlation in the clinical management of Marfan syndrome. Orphanet Journal of Rare Diseases 13.
94. Takahashi, K., Akanuma, J., Matsubara, Y., Fujii, K., Kure, S., Suzuki, Y., Wataya, K., Sakamoto, O., Aoki, Y., Ogasawara, M., et al. (2000). Heterogeneous mutations in the glucose-6-phosphatase gene in Japanese patients with glycogen storage disease type Ia. Am J Med Genet 92, 90-94.
95. Shieh, J.J., Lu, Y.H., Huang, S.W., Huang, Y.H., Sun, C.H., Chiou, H.J., Liu, C., Lo, M.Y., Lin, C.Y., and Niu, D.M. (2012). Misdiagnosis as steatohepatitis in a family with mild glycogen storage disease type 1a. Gene 509, 154-157.

96. Cassiman, D., Libbrecht, L., Verslype, C., Meersseman, W., Troisi, R., Zucman-Rossi, J., and Van Vlierberghe, H. (2010). An adult male patient with multiple adenomas and a hepatocellular carcinoma: mild glycogen storage disease type Ia. J Hepatol 53, 213-217.
97. Nakamura, T., Ozawa, T., Kawasaki, T., Nakamura, H., and Sugimura, H. (2001). Glucose-6-phosphatase gene mutations in 20 adult Japanese patients with glycogen storage disease type 1a with reference to hepatic tumors. J Gastroenterol Hepatol 16, 1402-1408.
98. Keller, K.M., Schütz, M., Podskarbi, T., Bindl, L., Lentze, M.J., and Shin, Y.S. (1998). A new mutation of the glucose-6-phosphatase gene in a 4-year-old girl with oligosymptomatic glycogen storage disease type 1a. J Pediatr 132, 360-361.
99. Sarajlija, A., Djordjevic, M., Kecman, B., Skakic, A., Pavlovic, S., Pasic, S., and Stojiljkovic, M. (2020). Impact of genotype on neutropenia in a large cohort of Serbian patients with glycogen storage disease type Ib. European Journal of Medical Genetics 63, 103767.
100. Varki, A. (2017). Biological roles of glycans. Glycobiology 27, 3-49.
101. van Schaftingen, E., and Gerin, I. (2002). The glucose-6-phosphatase system. Biochem J 362, 513-532.
102. Arion, W.J., Wallin, B.K., Carlson, P.W., and Lange, A.J. (1972). The Specificity of Glucose 6-Phosphatase of Intact Liver Microsomes. Journal of Biological Chemistry 247, 2558-2565.
103. Daniele, N., Rajas, F., Payrastre, B., Mauco, G., Zitoun, C., and Mithieux, G. (1999). Phosphatidylinositol 3-Kinase Translocates onto Liver Endoplasmic Reticulum and May Account for the Inhibition of Glucose-6-phosphatase during Refeeding. Journal of Biological Chemistry 274, 3597-3601.
104. Ness, G.C., Sukalski, K.A., Sample, C.E., Pendleton, L.C., McCreery, M.J., and Nordlie, R.C. (1989). Radiation inactivation analysis of rat liver microsomal glucose-6-phosphatase. Journal of Biological Chemistry 264, 7111-7114.
105. Solomon, M., and Muro, S. (2017). Lysosomal enzyme replacement therapies: Historical development, clinical outcomes, and future perspectives. Advanced Drug Delivery Reviews 118, 109-134.
106. Zhang, L., Lee, C., Arnaoutova, I., Anduaga, J., Starost, M.F., Mansfield, B.C., and Chou, J.Y. Gene therapy using a novel G6PC-S298C variant enhances the long-term efficacy for treating glycogen storage disease type Ia.
107. Kim, G.Y., Lee, Y.M., Kwon, J.H., Cho, J.H., Pan, C.J., Starost, M.F., Mansfield, B.C., and Chou, J.Y. Glycogen storage disease type Ia mice with less than 2% of normal hepatic glucose-6-phosphatase-α activity restored are at risk of developing hepatic tumors.
