
APPLYING FORWARD GENETIC APPROACHES TO

RARE MENDELIAN DISORDERS AND COMPLEX

TRAITS

By

ANLU CHEN

Submitted in partial fulfillment of the requirements

For the degree of Doctor of Philosophy

Dissertation Advisor: Dr. David Buchner

Department of Biochemistry

CASE WESTERN RESERVE UNIVERSITY

August, 2018

CASE WESTERN RESERVE UNIVERSITY

SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis/dissertation of

ANLU CHEN

Candidate for the degree of Doctor of Philosophy*.

Committee Chair

Hung-Ying Kao

Committee Member

David Buchner

Anna Mitchell

Anthony Wynshaw-Boris

Eckhard Jankowsky

Date of Defense

July 5th, 2018

*We also certify that written approval has been obtained for any proprietary material contained therein

TABLE OF CONTENTS

TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ACKNOWLEDGEMENTS
ABSTRACT

Chapter 1. Background and Significance
Background
1. Forward genetics and reverse genetics
1.1. History of genetic studies
1.2. New era of forward genetics using next-generation sequencing
2. Complex traits and diseases
2.1. Genome-wide association study (GWAS)
2.2. Missing heritability
3. Rare Mendelian disorders
3.1. Rare disorders
3.2. Consanguineous families
Significance

Chapter 2. Mutations in the Mitochondrial Ribosomal Protein MRPS22 Lead to Primary Ovarian Insufficiency
(Adapted from Chen A. et al. Human Molecular Genetics 2018)
Abstract
Introduction
Results
1. Identification of mutations in MRPS22 in patients with POI
2. Cellular studies of POI patient-derived fibroblasts
3. Embryonic lethality of Mrps22 deficient mice
4. mRpS22 in Drosophila germ cells is required for fertility
Discussion
Materials and Methods

Chapter 3. Mutations in PIK3C2A Cause Syndromic Short Stature Associated with Cataracts and Skeletal Abnormalities
(Manuscript in preparation)
Abstract
Introduction
Results
1. Identification of mutations in PIK3C2A in patients with syndromic short stature
2. Identification of cellular defects in patient-derived fibroblasts
3. Pik3c2a deficiency causes cataracts in zebrafish model
Discussion
Materials and Methods

Chapter 4. Widespread Epistasis Regulates Glucose Homeostasis and Gene Expression
(Adapted from Chen A. et al. PLoS Genet. 2017)
Abstract
Introduction
Results
1. Contribution of epistasis to metabolic traits
2. Regulation of gene expression by epistasis
3. Context-dependent effects on gene expression
4. Significant contribution of epistasis to trait heritability
Discussion
Materials and Methods

Chapter 5. Summary and Future Direction
Summary
Future Directions
1. Researchers are not alone in battles against genetic diseases
2. Gene therapy to cure the diseases
3. Strategies to predict disease risk loci
4. Strategies to better understand current data
5. What’s beyond genetic studies in understanding human disorders?

Reference

LIST OF TABLES

Table 2.1. Hormone levels in four individuals with POI
Table 2.2. Adrenocorticotropic hormone stimulation test for patients with POI
Table 2.3. List of gaps in WES coverage in Family I with POI and primers used for Sanger sequencing of these regions
Table 2.4. Plasma adrenal levels in two individuals with POI
Table 2.5. Survival of offspring from an Mrps22 heterozygous knockout mouse (+/-) intercross
Table 2.6. Phenotypes of RNAi-mediated mRpS22 tissue-specific knockdown in Drosophila
Table 3.1. Phenotypic characteristics of patients in three families with Syndromic Short Stature
Table 3.2. Candidate variants identified by WES in patients with Syndromic Short Stature
Table 3.3. Survival of offspring from pik3c2a heterozygous knockout zebrafish (+/-) crosses
Table 3.4. List of primers used in the study of Syndromic Short Stature with PIK3C2A mutations
Table 3.5. List of antibodies used in the study of Syndromic Short Stature
Table 4.1. Number of mice used for analysis of body weight and plasma glucose
Table 4.2. Main and average effects on phenotypes
Table 4.3. Main effects on gene expression
Table 4.4. Summary of genes with multiple meQTLs
Table 4.5. Genes examined by RNA-Seq and RT-qPCR for epistasis and additive interactions
Table 4.6. Interaction effects on gene expression
Table 4.7. Summary of genes with multiple ieQTLs
Table 4.8. Identification of fasting glucose QTLs using a combined linear model
Table 4.9. Identification of body weight QTLs using a combined linear model
Table 4.10. Primer sequences for RT-qPCR detection


LIST OF FIGURES

Fig. 2.1. Pedigrees of two consanguineous families with POI
Fig. 2.2. Absence of germ cells in the ovary of a female patient with the MRPS22 p.R202H mutation
Fig. 2.3. Independent mutations in MRPS22 identified in two consanguineous families with POI
Fig. 2.4. Molecular analysis of fibroblasts from patients with the MRPS22 (p.R202H) mutation
Fig. 2.5. Oxidative phosphorylation is normal in fibroblasts from patients with the MRPS22 (p.R202H) mutation
Fig. 2.6. mRpS22 is required for female germ cell development in Drosophila
Fig. 2.7. Structural analysis of disease-causing missense mutations in MRPS22
Fig. 3.1. Pedigrees and phenotypic characteristics of patients with Syndromic Short Stature
Fig. 3.2. Detailed phenotypic characteristics of individuals with PIK3C2A deficiency
Fig. 3.3. Loss-of-function mutations in PIK3C2A
Fig. 3.4. Protein and mRNA levels of PIK3C2A in patient-derived cells
Fig. 3.5. Cilia defects in patient-derived PIK3C2A fibroblasts
Fig. 3.6. PIK3C2A exon skipping in individual III-II-2 with Syndromic Short Stature
Fig. 3.7. Localization of ciliary markers in patient-derived PIK3C2A deficient fibroblasts
Fig. 3.8. Pik3c2a deficiency in zebrafish causes cataracts
Fig. 4.1. Body weight and glucose levels in all CSS and control mice
Fig. 4.2. Schematic diagram of CSS and control crosses
Fig. 4.3. Identification of 5 inter-chromosomal epistatic interactions that regulate fasting glucose levels in mice
Fig. 4.4. Inter-chromosomal epistasis regulates fasting glucose levels
Fig. 4.5. Identification of meQTLs that regulate hepatic gene expression
Fig. 4.6. Positive correlation between cis-meQTLs and trans-meQTLs
Fig. 4.7. Identification of 5 trans-meQTLs that regulate the hepatic expression of Brca2
Fig. 4.8. Regulation of hepatic Zkscan3 expression by additive meQTLs
Fig. 4.9. Positive correlation between cis-ieQTLs and trans-ieQTLs
Fig. 4.10. Identification of 4 ieQTLs that regulate the hepatic expression of Agt
Fig. 4.11. Schematic diagram illustrating the categorization of epistasis as either synergistic or antagonistic
Fig. 4.12. Examples of synergistic and antagonistic ieQTLs
Fig. 4.13. Contribution of epistasis to the genetic regulation of hepatic gene expression
Fig. 4.14. No differences in mapping efficiency of RNA-Seq reads between B6 and CSSs


LIST OF ABBREVIATIONS

HD        Huntington's disease
ENU       N-ethyl-N-nitrosourea
RNAi      RNA interference
WGS       whole genome sequencing
WES       whole exome sequencing
GWAS      genome-wide association study
POI       primary ovarian insufficiency
LH        luteinizing hormone
FSH       follicle-stimulating hormone
E2        estradiol
ACTH      adrenocorticotropic hormone
CAH       congenital adrenal hyperplasia
SNP       single nucleotide polymorphism
MAF       minor allele frequency
CSS       chromosome substitution strains
QTL       quantitative trait loci
me-QTL    main expression QTL
ie-QTL    interaction expression QTL


ACKNOWLEDGEMENTS

I would like to express my sincere gratitude to my advisor, Dr. David Buchner, for his unwavering support and encouragement during my Ph.D. odyssey. His lab was the third lab I rotated in during the first semester of my Ph.D. program. Even with great experiences in the previous labs, I still felt something special in him and truly believed he would be a great mentor. I appreciate that he fully trusted my ability to conduct three intriguing projects. With his foresight on the projects, his extraordinary communication with collaborators, my work and some luck, all of them turned out well.

Since I mainly worked on human genetics projects, which require wide collaboration, I sincerely thank our collaborators. Without their help and support, I could not have completed these three diverse projects.

I would like to thank all my lab members, Dr. Alyssa Charrier, Li Wang and Rachel Stegemann, who helped build the best lab ever.

I wish to sincerely acknowledge my committee members, Dr. Hung-Ying Kao, Dr. Anthony Wynshaw-Boris, Dr. Anna Mitchell and Dr. Eckhard Jankowsky. They have provided valuable suggestions and comments on my research and have fully supported my career.

I thank the students and faculty in the Department of Biochemistry for the stimulating discussions and for all the fun that we had together, and especially for choosing me as this year’s student representative and for their full support. I also would like to thank the Department of Genetics for its harmonious research atmosphere.


Last but not least, I would like to thank my parents for their unselfish love and support.


Applying Forward Genetic Approaches to Rare Mendelian

Disorders and Complex Traits

Abstract

By

ANLU CHEN

Forward genetics utilizes unbiased genetic approaches to locate causal genetic variants of heritable traits. Classic forward genetics approaches were tremendously successful, but are now widely considered too time-consuming and laborious. Technological and methodological innovations, such as next-generation sequencing, have ushered in a new era of forward genetics studies to better understand the genetic basis of disease. In this dissertation, I provide examples of forward genetic approaches that successfully identified novel genetic causes of two rare Mendelian disorders (Chapters 2 and 3) and further characterized the genetic architecture of metabolism-related complex traits (Chapter 4).

Mendelian disorders are caused by variants in a single gene, while complex traits are regulated by multiple genes, which either work independently or interact with each other.

In Chapters 2 and 3, we explored the exome sequences of patients from consanguineous families to identify causal genetic variants, which were further studied using cellular and animal models. Our findings unveiled novel functions of the genes MRPS22 and PIK3C2A, and facilitated a better understanding of normal and pathological development in their associated disorders. Our study expanded the phenotypic spectrum of MRPS22 mutations from mitochondrial diseases to now also include primary ovarian insufficiency and elucidated its cell-autonomous role in germ cell development. Our study on syndromic short stature associated with cataracts and skeletal abnormalities also identified the first Mendelian disorder associated with mutations in PIK3C2A, a gene whose in vivo role was poorly understood. In Chapter 4, we identified widespread epistatic interactions using double chromosome substitution strains in mice and provided strong evidence for the controversial contribution of epistasis to genetically complex traits and diseases. Our findings demonstrated that epistatic interactions controlled the majority of the heritable variation in both fasting plasma glucose levels and hepatic gene expression, exceeding even the additive effects on these traits. These findings may partially explain the phenomenon of ‘missing heritability’ in complex traits. We also found that the epistatic interactions tended to keep trait levels at their “normal” level. We hypothesize that this is evolutionarily advantageous, enabling genetic variants to be stored in the genome without reducing fitness while allowing for rapid adaptation to future environmental challenges.


Chapter 1: Introduction


Background

1. Forward genetics and reverse genetics

1.1. History of modern genetic studies

Gregor Johann Mendel is widely known as the ‘father of modern genetics’ for his work on pea plants, published in 1866, which first characterized the inheritance patterns of certain traits, now known as ‘Mendelian inheritance’. Although this profound discovery went largely unnoticed for over 30 years, its value was ultimately recognized. In the early 1900s, three geneticists, Hugo de Vries, Carl Correns and Erich von Tschermak, independently “rediscovered” Mendel’s findings and inspired researchers to further investigate the nature of the phenomena. With the discovery of the double helical structure of DNA by James Watson and Francis Crick in 1953, with contributions from Rosalind Franklin and Maurice Wilkins, the development of a DNA sequencing technique by Frederick Sanger and colleagues in 1977, and the invention of the polymerase chain reaction (PCR) by Kary Mullis in 1983, we entered a golden era of identifying the molecular link between genotype and phenotype.

To identify phenotype-genotype correlations, classic forward genetic approaches were brought to the field. These approaches attempt to locate the causal variants of a heritable trait in an unbiased manner, based on the position of the variant in the genome rather than on the function of the variant or of the gene through which it acts. One such method to identify the genetic basis of a measurable phenotype is linkage mapping, which localizes a causal mutation to a defined chromosomal region. This can then facilitate the identification of the causal variant using PCR-based sequencing methods, followed by genetic and/or biological approaches to further validate the causal relationship.

With so many biomedically relevant phenotypes to study, but little knowledge of gene function, it has been tremendously valuable to identify the genes that cause various phenotypes using classic forward genetic approaches. For example, the trembler strain, a naturally occurring mutant mouse strain, was used as a model for peripheral neuropathy and facilitated the discovery of disease-causing mutations in Pmp22[1]. In humans, studies of Huntington's disease (HD) also benefited from this approach. Using haplotype analysis of linkage disequilibrium, expansions of CAG repeats in the coding region of the huntingtin (HTT) gene were identified[2]. To facilitate additional forward genetic studies, a variety of animal models have been created to improve the efficiency of disease gene identification. For example, X-ray and ENU (N-ethyl-N-nitrosourea) exposure generate mutations with over 100-fold greater efficiency than spontaneous mutation[3]. X-rays trigger large deletions or chromosomal translocations and have been used to study phenotypes induced by null mutations, whereas ENU induces point mutations and provides the opportunity to dissect phenotypes caused by missense mutations, nonsense mutations and splicing variants in parallel[4]. As a result of classic forward genetic approaches, many genes are named after their mutant phenotype. For example, the rosy gene (ry) encodes xanthine dehydrogenase in Drosophila, but was so named because flies with homozygous recessive mutations in this gene develop a rosy eye color[5].


In contrast to forward genetics, reverse genetic approaches focus on manipulating a specific gene and studying the consequences of this manipulation at the molecular and phenotypic levels. So far, 6,035 eukaryotic species have had their genomes sequenced and made available through the National Center for Biotechnology Information. Genome-wide sequencing revealed a large number of genes whose functions are unknown and cannot be predicted, but whose sequence availability enables studies based on reverse genetic approaches.

RNA interference (RNAi), the process by which expression of a target gene is inhibited by antisense and sense RNAs, is one such method for studying gene function by reverse genetics. RNAi was first discovered in C. elegans as an efficient mechanism for silencing gene expression or [6] and has been applied in many organisms as a high-throughput approach to generate genome-wide loss-of-function phenotypes [7], including in agricultural applications [8]. However, its use is severely limited by the ability to deliver siRNAs to their targets. In Chapter 2 of this thesis, we applied RNAi to explore the phenotype of a gene of interest in Drosophila. In Xenopus and zebrafish, morpholino oligonucleotides are more commonly applied. Morpholino oligonucleotides are chemically modified to mimic nucleotide sequences and thereby alter mRNA splicing or translation. In terms of systematic approaches, in addition to

homologous recombination, insertional mutagenesis-based approaches were widely applied in , yeast, mice and other mammals. For example, by random insertion of a vector into the genome of mouse ES cells, gene trapping has generated over 121,000 ES cell lines so far, representing ~40% of all genes. Other targeted genetic editing


approaches were also applied, such as targeted knockouts, transgenics and so on. So far, in the mouse genome, 13,302 genes with spontaneous, induced or genetically engineered mutations have been phenotyped, and associated with 1,464 human disorders according to Mouse Genome Informatics (http://www.informatics.jax.org/).

1.2. New era of forward genetics using next-generation sequencing

Building upon the Sanger chain-termination sequencing method, capillary electrophoresis-based sequencing instruments, the first-generation sequencing platforms, were developed by Applied Biosystems and played a key role in the Human Genome Project, which took 15 years and cost nearly three billion dollars to sequence the first human genome. Since then, technological and methodological innovations have provided the opportunity for a second surge of forward genetics studies. For example, the HiSeq X Ten System was released by Illumina in 2014. This system can generate 1.8 terabases (Tb) of data in a single sequencing run. With the help of barcoding methods to tag individual samples within a pool of DNA samples, this platform enables high-quality sequencing of over 45 human genomes per run at a cost of less than $1000 per genome.

Using whole genome sequencing (WGS), the multinational 1000 Genomes Project identified genetic and structural variants, such as copy number variation, in 2,504 individuals from multiple ethnic groups [9–12]. This project turned out to be a great success in rare variant discovery. In total, ~64 million variants with less than 0.5% allele frequency were discovered, accounting for ~72% of all variants. Another interesting finding is that the majority of variants observed in a single genome are common variants. This leads to the question: which variants are associated with human fitness? Although the protein-coding portion of the genome (the exome) accounts for just ~2% of the whole human genome, 85% of disorders for which genetic causes have been identified thus far are associated with variants in the exome [13,14]. As an alternative strategy to WGS, whole exome sequencing (WES) has become an affordable yet efficient and reliable way to identify the molecular etiology of human diseases.

Additionally, there are publicly available databases of exome sequences from healthy individuals, such as the Exome Aggregation Consortium database[15] and the Greater Middle East Variome Project[16], that can serve as controls for patient studies. For example, these databases facilitated the studies in Chapters 2 and 3 of this thesis.

2. Complex traits and human diseases

2.1. Genome-wide association study (GWAS)

GWAS are based on the hypothesis that a variant causing a complex trait is present more frequently in individuals with that trait (the case group) than in individuals without the trait (the control group). The aim of GWAS is to scan the genome in an unbiased manner to identify genetic variants associated with complex traits and disease susceptibility. The causal variants can then be discovered either directly from the common SNP markers genotyped in the study, or based on linkage disequilibrium between common SNP markers and other linked variants. In 1996, Risch and Merikangas first claimed that GWAS have greater power than linkage studies; however, the major limitation was the need to identify a large number of SNP markers (up to one million) to substantially reduce the required number of samples for screening [17]. In 2005, the multinational HapMap project (Phase I) developed a haplotype map of the human genome with over one million SNPs from a variety of human populations for initial screening [27]. This study also determined that approximately 500,000 common SNPs in the human genome are sufficient to tag causal variants in non-African populations. With the ability to quickly scan SNPs in the genome using commercial SNP chips, together with large numbers of case-control samples and population cohorts, GWAS became more feasible. In addition to SNP chips, WGS data can also serve as an important source of genotype information for GWAS, with higher coverage and more SNP markers. So far, over 10,000 GWAS have reported significant associations between genetic variants and diseases, including heart disease, diabetes, autoimmune diseases and psychiatric disorders, as well as quantitative traits [19].
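The case-control comparison that underlies GWAS can be sketched as an allelic chi-square test at a single SNP. This is only a minimal illustration under simplifying assumptions (one marker, no correction for multiple testing or population structure), not the method of any study cited here; the function name and the allele counts are hypothetical.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 d.f.) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are alternate and reference alleles.
    Returns (chi_square_statistic, p_value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele frequency were equal in both groups
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: the alternate allele is seen on 60 of 100 case
# chromosomes but only 40 of 100 control chromosomes.
chi2, p = allelic_chi_square(60, 40, 40, 60)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

With these made-up counts the statistic is 8.0 and the p-value is roughly 0.005. A real GWAS repeats a test of this kind across hundreds of thousands of markers, so a far more stringent significance threshold is needed.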

2.2. Missing heritability

The variants currently identified by GWAS typically each have a small effect size, meaning that each variant accounts for only a small portion of the phenotypic variation. Even combined, these variants can explain only 10%–20% of the phenotypic variance for most complex traits, with the remaining unidentified 80%–90% often referred to as “missing heritability”. By definition, heritability refers to the proportion of phenotypic variance due to heritable genetic factors. For example, the estimated heritability for height in humans is 0.8, suggesting that inherited genetic variants account for 80% of the variation in height, with the remaining 20% due to environmental factors[20]. Among the most important unanswered questions in the current field of genetics is where to find the missing heritability.
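The definition of heritability given above can be written compactly. The decomposition below is the usual textbook simplification, which assumes no covariance or interaction between genetic and environmental factors; the symbols V_G (genetic variance), V_E (environmental variance) and V_P (total phenotypic variance) are introduced here for illustration and are not taken from the original text.

```latex
H^2 = \frac{V_G}{V_P}, \qquad V_P = V_G + V_E
% Height example from the text: H^2 = 0.8,
% so V_G = 0.8\,V_P and V_E = 0.2\,V_P.
```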

2.2.1. Sample size

Although the relationship is not linear, increasing the number of samples used in GWAS benefits the detection of causal variants. For example, a GWAS of approximately 700,000 individuals boosted the number of variants strongly associated with height to 3,290 [21]. Altogether, these variants explained 24.6% of heritability. In contrast, only 54 variants were discovered in a study of approximately 63,000 individuals, which together accounted for just 5% of heritability [22,23]. With efforts from multinational consortia, it is reasonable to predict that a continuously growing number of variants will be discovered, owing to the additional power to detect variants with larger sample sizes.

2.2.2. Rare variants

The underlying rationale for GWAS is the ‘common disease, common variant’ hypothesis. However, rare variants also contribute to common diseases in some cases. Rare variants are defined, somewhat arbitrarily, as variants with a minor allele frequency (MAF) below 1%, while common variants are defined as those with MAF >5% in the population. In GWAS, rare variants are generally difficult to test for association with a trait due to a lack of statistical power. However, next-generation sequencing has dramatically boosted the discovery of rare variants that contribute to common complex traits[24–28]. Taking Alzheimer's disease as an example, rare variants in TREM2, PLD3, UNC5C and AKAP9 have been associated with Alzheimer's disease susceptibility due to their large effect sizes[26]. In one such study, Thorlakur Jonsson and colleagues imputed variants from the genome sequences of 2,261 Icelanders into the genomes of patients with Alzheimer's disease and control participants and then performed GWAS. As a result, a rare variant, rs75932628-T, in the gene TREM2 was discovered[29].

2.2.3. Epistatic interactions

Even with ever-larger sample sizes and the ability to detect all genetic variants, structural variations and epigenetic factors, there is a theoretical pitfall diminishing our ability to make progress towards the identification of ‘missing heritability’. This pitfall was described by Eric Lander and colleagues in 2012 as ‘phantom heritability’ [30]. The percentage of explained heritability is calculated as the ratio of the phenotypic variance explained by the additive effects of discovered variants (the numerator) to the variance explained by the additive effects of all variants, including those not yet discovered (the denominator). However, Lander and colleagues argued that the denominator might be overestimated, so that even when all variants are discovered, the percentage of explained heritability remains far below 100%; the apparent shortfall is the ‘phantom heritability’. Such ‘phantom heritability’ can be caused by epistatic interactions among already discovered variants. Instead of additively affecting the phenotype, these variants interact with each other and trigger either diminished or exaggerated outcomes relative to those expected under additivity. So far, a number of genome-wide interaction-based association studies and post-GWAS gene-gene interaction studies in humans have provided evidence for epistasis in a variety of complex traits and diseases[31–37]. For example, SNP rs2106261 (G/A substitution) in ZFHX3, rs2200733 (C/T substitution) near PITX2c, and rs3807989 (A/G substitution) in CAV1 were previously identified by GWAS as being associated with atrial fibrillation. Yufeng Huang and colleagues found that rs2200733 and rs2106261 epistatically interact with each other, resulting in a synergistic effect that increases the risk of atrial fibrillation[31]. Therefore, a better understanding of epistasis can help solve the ‘missing heritability’ problem and uncover novel disease pathophysiology.
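The idea of an outcome that is diminished or exaggerated relative to additivity can be made concrete with a two-locus example. The sketch below is a simplified illustration rather than the statistical model used in any study cited here: it assumes four group means (a baseline, each variant alone, and both together), and the function names, the tolerance and the trait values are all hypothetical.

```python
def interaction_deviation(baseline, single_a, single_b, double):
    """Observed combined effect minus the effect expected under additivity."""
    effect_a = single_a - baseline
    effect_b = single_b - baseline
    expected_combined = effect_a + effect_b
    observed_combined = double - baseline
    return observed_combined - expected_combined


def classify_interaction(baseline, single_a, single_b, double, tol=1e-9):
    """Label the combined effect relative to the additive expectation.

    Simplification: only the magnitudes of the observed and expected
    combined effects are compared, so a combined effect of equal size but
    opposite sign is labeled 'diminished'.
    """
    expected = (single_a - baseline) + (single_b - baseline)
    observed = double - baseline
    if abs(observed - expected) <= tol:
        return "additive"
    return "exaggerated" if abs(observed) > abs(expected) else "diminished"


# Hypothetical trait means: baseline 100, variant A alone 110,
# variant B alone 120.
for double_mean in (130, 150, 105):
    label = classify_interaction(100, 110, 120, double_mean)
    deviation = interaction_deviation(100, 110, 120, double_mean)
    print(double_mean, label, deviation)
```

Under additivity the two single effects (+10 and +20) predict a combined mean of 130. A combined mean of 150 overshoots that expectation, while 105 falls short of it.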

3. Rare Mendelian disorders

3.1. Rare disorders

A rare disease is defined as a disease that affects fewer than 200,000 people in the United States[38]. There are an estimated 7,000 rare genetic diseases[39]. Collectively, between 25 and 30 million Americans suffer from a rare disease[38], which is nearly 1 in 10. Moreover, approximately 75% of rare diseases occur in children under ten years of age and threaten their lives[40]. An early and accurate genetic diagnosis is critical to the optimal care of a child with a rare genetic disease. It is also important for pregnant women and couples when family history is a concern. Rare diseases are particularly susceptible to misdiagnosis because physicians often lack understanding of or familiarity with the disease. Even one reported case may inspire the discovery of additional cases, and thus these initial studies can aid the diagnosis of patients and point toward potential treatments for them.

3.2. Consanguineous families


A consanguineous family refers to a family with offspring from a marriage between two individuals with at least one ancestor in common. In general, consanguinity does not increase the risk for autosomal dominant conditions in offspring when one of the parents is affected, nor for X-linked recessive conditions if neither parent is affected[41].

However, consanguineous families have an increased risk for autosomal recessive disorders because of the inheritance of autosomal recessive gene mutations from a common ancestor. The pre-reproductive mortality in offspring of first cousins is ~3.5% higher than that of non-consanguineous offspring[42]. The highest rates of consanguineous marriage, up to 50%, occur in North and sub-Saharan Africa, the Middle East, and West Asia. In these regions, consanguineous marriages are traditionally preferred and respected. The studies described below are based on offspring from consanguineous families of Middle Eastern origin.

Consanguineous families are an efficient genetic model for studying rare disease. In a consanguineous family, a recessive mutation is carried and inherited within the family. If two family members are cousins and each inherits one copy of the recessive mutation from their common ancestor, theoretically 25% of their offspring will inherit both copies of the mutation and develop a recessive disorder[43]. Moreover, current studies find that in consanguineous families, 30% of the genes found to cause rare disease represent a novel gene function, compared with around 8% in studies of the general population[44].
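The 25% figure follows from enumerating the equally likely allele combinations when both parents are carriers. The short sketch below works through that enumeration; the allele symbols ("A" for the normal allele, "a" for the recessive mutation) and the function name are illustrative choices, not notation from the original text.

```python
from fractions import Fraction
from itertools import product


def offspring_genotype_probs(parent1, parent2):
    """Return genotype probabilities for a cross of two diploid parents.

    Each parent is a two-character string of alleles, e.g. "Aa".
    Each parent transmits either allele with equal probability.
    """
    counts = {}
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that "aA" and "Aa" are counted as the same genotype
        genotype = "".join(sorted(allele1 + allele2))
        counts[genotype] = counts.get(genotype, 0) + 1
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


# Two carrier cousins, each heterozygous for the recessive mutation
probs = offspring_genotype_probs("Aa", "Aa")
for genotype, prob in sorted(probs.items()):
    print(genotype, prob)
```

Of the four equally likely combinations, one is homozygous for the mutation (aa), two are carriers (Aa) and one carries no copy (AA), which gives the expected 1/4 of affected offspring.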


Significance

1. Studies of complex traits

GWAS on type 2 diabetes and other metabolic complex traits have dramatically advanced the identification of associated variants; however, a large portion of heritability still cannot be explained. Presumably, such missing heritability is partially due to epistasis, but evidence for large-scale epistasis in complex, multicellular organisms has been lacking. In this dissertation, we used mouse chromosome substitution strains to facilitate epistasis detection, and successfully identified widespread epistatic interactions that regulate fasting glucose levels and hepatic gene expression.

2. Studies of rare Mendelian disorders

Compared with common variants, rare variants are typically harder to detect in GWAS. However, the effect sizes of these variants tend to be much larger than those of common variants, suggesting a more significant effect on gene function. Recently, more rare variants have been detected by next-generation sequencing. However, phenotype-genotype correlation studies of rare variants depend on well-established filtering strategies to identify the causal variant among all variants within an entire genome. In this dissertation, we studied consanguineous families, utilizing population-specific allele frequency databases among other information to guide our variant filtering strategies, and identified a novel genetic basis for two genetic diseases. Importantly, the causal variants revealed novel gene functions as they relate to normal physiological development.
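A variant filtering strategy of the general kind described above might look like the following sketch. It is not the pipeline used in this dissertation: the record fields, the 1% frequency cutoff (borrowed from the conventional definition of a rare variant), the gene names and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str                      # hypothetical gene label
    population_maf: float          # allele frequency in a reference database
    homozygous_in_affected: bool   # homozygous in every affected relative
    predicted_deleterious: bool    # result of an in-silico effect prediction


def filter_candidates(variants, max_maf=0.01):
    """Keep variants consistent with a rare recessive cause of disease.

    A candidate must be homozygous in all affected individuals, rare in
    the reference population (MAF below max_maf), and predicted to be
    deleterious to protein function.
    """
    return [
        v for v in variants
        if v.homozygous_in_affected
        and v.population_maf < max_maf
        and v.predicted_deleterious
    ]


# Hypothetical variant calls from an affected family
calls = [
    Variant("GENE_A", 0.12, True, True),      # too common in the population
    Variant("GENE_B", 0.0001, False, True),   # not homozygous in patients
    Variant("GENE_C", 0.0002, True, True),    # passes every filter
    Variant("GENE_D", 0.0005, True, False),   # predicted benign
]
print([v.gene for v in filter_candidates(calls)])
```

In practice, each filter removes a large share of the variants in an exome, and candidates that survive still require segregation analysis and functional validation.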

12

Overall, our studies on complex traits and rare Mendelian disorders contributed to a

better understanding of the genetic basis of phenotype-genotype correlations, improved genetic diagnosis and counseling as well as built a foundation for precision medicine.


Chapter 2. Mutations in the mitochondrial ribosomal protein MRPS22 lead to primary ovarian insufficiency.

This is a pre-copyedited, author-produced version of an article accepted for publication in Human Molecular Genetics following peer review. The version of record (Anlu Chen, Dov Tiosano, Tulay Guran, Hagit N. Baris, Yavuz Bayram, Adi Mory, Laura Shapiro-Kulnane, Craig A. Hodges, Zeynep Coban Akdemir, Serap Turan, Shalini N. Jhangiani, Focco van den Akker, Charles L. Hoppel, Helen K. Salz, James R. Lupski, and David A. Buchner. Mutations in the mitochondrial ribosomal protein MRPS22 lead to primary ovarian insufficiency. Human Molecular Genetics, 2018, Vol. 27, No. 11, 1913–1926) is available online at: https://doi.org/10.1093/hmg/ddy098.


Abstract

Primary ovarian insufficiency (POI) is characterized by amenorrhea and loss or dysfunction of ovarian follicles prior to the age of 40. POI has been associated with autosomal recessive mutations in genes involving hormonal signaling and folliculogenesis; however, the genetic etiology of POI most often remains unknown. Here we report MRPS22 homozygous missense variants c.404G>A (p.R135Q) and c.605G>A (p.R202H) identified in four females from two independent consanguineous families as a novel genetic cause of POI in adolescents. Both missense mutations identified in MRPS22 are rare, occur at highly evolutionarily conserved residues, and are predicted to be deleterious to protein function. In contrast to prior reports of mutations in MRPS22 associated with severe mitochondrial disease, the POI phenotype is far less severe. Consistent with this phenotype-genotype correlation, mitochondrial defects in oxidative phosphorylation or rRNA levels were not detected in fibroblasts derived from the POI patients, suggesting a non-bioenergetic or tissue-specific mitochondrial defect. Furthermore, we demonstrate in a Drosophila model that mRpS22 deficiency specifically in somatic cells of the ovary had no effect on fertility, whereas flies with mRpS22 deficiency specifically in germ cells were infertile and agametic, demonstrating a cell-autonomous requirement for mRpS22 in germ cell development. These findings collectively indicate that MRPS22, a component of the small mitochondrial ribosome subunit, is critical for ovarian development and may therefore provide insight into the pathophysiology and treatment of ovarian dysfunction.


Introduction

Primary ovarian insufficiency (POI) is defined by the loss or dysfunction of ovarian follicles associated with amenorrhea before the age of 40 [45]. POI is a major cause of female infertility with a prevalence greater than 1%. There is a strong genetic component to the development of POI, both in the form of monogenic and multigenic disorders; however, in most cases the genetic etiology of POI remains unclear [46]. Among the most common genetic defects associated with POI are defects, which collectively account for approximately 10-25% of POI cases [46]. These include Turner’s syndrome, Triple X syndrome, and . A number of monogenic disorders resulting in POI have also been identified, including those with variants in BMP15 and PGRMC1, both located on Chromosome X, as well as those with variants in GDF9, FOXO3, FIGLA, and NR5A1, among others, that are autosomal [46]. These variants are each estimated to account for 1-2% of POI cases. Thus, the majority of POI cases remain classified as idiopathic.

Although there remains much to understand about the genetic basis of POI, much has already been learned about both normal and pathological ovarian development from cellular and molecular studies of the genes that have been associated with POI. For example, SYCE1, STAG3, and HFM1 are members of the synaptonemal complex that are required for chromosomal segregation during meiosis [47–51], and NUP107 is a component of the nuclear pore complex that is important for maintaining the communication between gonadal somatic cells and oocytes [52]. Furthermore, autosomal recessive mutations in genes that affect DNA repair, such as MCM8 [53] and MCM9 [54,55], and in genes encoding factors, such as FIGLA [56], SOHLH1 [57], and NOBOX [58,59], have recently been reported in POI. Moreover, eukaryotic translation initiation factor 4E nuclear import factor 1 (eIF4ENIF1) has recently been identified in cases of dominantly inherited POI [60]. However, there are still many cases of unexplained POI, suggesting that new causative genes are yet to be discovered.

Here we present the identification of two different homozygous missense mutations in the nuclear-encoded gene mitochondrial ribosomal protein S22, MRPS22, as another genetic cause of POI in four adolescent females from two independent consanguineous families. Drosophila modeling demonstrated a cell-autonomous function of the MRPS22 ortholog in germ cells that is required for female germ cell viability, thus collectively demonstrating the importance of MRPS22 in reproduction and ovarian development.


Results

Identification of mutations in MRPS22 in patients with POI

To identify novel genetic causes of POI, we focused on an extended Israeli-Christian Arab consanguineous family, in which two distinct genetic conditions with autosomal recessive inheritance patterns were suspected (Fig. 2.1A). The presence of two distinct genetic conditions is not uncommon in this particular patient population [61]. The first genetic condition presented as 46,XY females with inguinal hernias that contained testes in twin sisters at the age of 3 years (Fig. 2.1A. F1-IV-7 and F1-IV-8). Genetic evaluation revealed that both sisters had a 46,XY karyotype and were homozygous for a missense mutation in the hydroxysteroid 17-beta dehydrogenase 3 gene (NM_000197.1 (HSD17B3): c.239G>A; p.Arg80Gln), thus resulting in a diagnosis of 17-beta hydroxysteroid dehydrogenase type III deficiency (OMIM:605573) [62]. HSD17B3 converts androstenedione to testosterone and is expressed predominantly in the testes [63]. One girl underwent bilateral gonadectomy at the age of 3 years and continued to be raised as a female. Her sister underwent unilateral gonadectomy at the age of 3 years and was raised as female until the age of 9 years, when, due to adrenarche and the presence of one testis in the inguinal canal, signs of masculinization appeared. The parents decided to raise the child as a boy. All of the other girls in the extended family (Fig. 2.1A) were evaluated and found to have a normal 46,XX karyotype.


Fig. 2.1. Pedigrees of two consanguineous families with POI.

(A) Family I pedigree structure. Below the pedigree are the genotypes corresponding to MRPS22, HSD17B3, and the karyotyping results. MRPS22 mutation refers to c.605G>A; p.Arg202His. HSD17B3 mutation refers to c.239G>A; p.Arg80Gln. (B) Family II pedigree structure. Below the pedigree are the MRPS22 genotypes, which refer to c.404G>A; p.Arg135Gln, and the karyotyping results.

The second genetic condition in this family was suspected when the proband (Fig. 2.1A. F1-IV-9) presented with delayed puberty at the age of 16 years, with delay in breast development (B) and pubic hair (P) that were at Tanner stage 1 and 2, respectively. Hormonal testing revealed hypergonadotrophic hypogonadism as evidenced by high basal gonadotropin levels including luteinizing hormone (LH) and follicle-stimulating hormone (FSH), as well as undetectable estradiol (E2) (Table 2.1). Medical history revealed normal pregnancy and delivery (birth weight 3000 grams) and normal growth and development during childhood. Genetic analysis revealed that this individual was homozygous for the p.Arg80Gln variant in the HSD17B3 gene. However, all previously described 46,XX female patients who were homozygous for the p.Arg80Gln variant were asymptomatic [62,64]. Thus, a second unrelated genetic condition was suspected [61]. Family history revealed that in the extended family there is another 19-year-old girl (Fig. 2.1A. F1-IV-5) with (B1P2) and a similar profile of POI with hypergonadotrophic hypogonadism (Table 2.1). MRI of the proband demonstrated a very small uterus, measuring 5x12x14 mm. Adrenocorticotropic hormone (ACTH) testing in both patients confirmed normal cortisol production, without any evidence of an enzymatic block (Table 2.2). Urinary steroid analysis by gas chromatography / mass spectrometry (GC/MS) revealed low levels of etiocholanolone (Et) and androsterone (An). Methemoglobin levels were normal. Echocardiography revealed a normal heart in both sisters. Lactate levels (1.0 and 1.4 mmol/L; normal range 0.5-1.6 mmol/L) and blood pH (pH=7.33 and 7.31) were both normal. Bone age measurements taken at chronological age 18 years and 3 months were compatible with a bone age of 13 years, thus demonstrating delayed bone age.


Table 2.1. Hormone levels in four individuals with POI.

Measurement    F1-IV-5        F1-IV-9        F1-IV-11   F2-IV-2   Normal pubertal range
Age (years)    19             16             12         14
LH (mIU/L)     17.3           21.0           9.1        20.0      1.0 – 14.7
FSH (mIU/L)    41.3           78.0           21.0       99.4      3.0 – 21.0
E2 (pmol/L)    Not detected   Not detected   141        0.04      55 – 1250

Table 2.2. Adrenocorticotropic hormone stimulation test for patients with POI.

Individual          F1-IV-9   F1-IV-9   F2-IV-2   F2-IV-2
ACTH test (min)     0         60        0         60
Cortisol (nmol/l)   323       797       750       1080
17OH (nmol/l)       1.8       9.7       4.8       8.1
CS (nmol/l)         7.1       14.6      N.D.      N.D.

N.D., not done


The proband’s sister (Fig. 2.1A. F1-IV-11), at the age of 9 years, also showed an elevated LH level of 10.4 mIU/ml with low FSH of 0.27 mIU/ml and undetectable . At the age of 12 years, physical examination revealed Tanner stage 3 breast development with detectable but elevated (Table 2.1). Anti-Müllerian hormone was undetectable (<0.16 µg/L). MRI showed a small uterus with normal cervix (21x9x9 mm) and small ovaries (10x6x4 mm). An ovarian biopsy at the age of 14 years, taken for ovarian tissue preservation, demonstrated fibrotic ovaries without follicles (Fig. 2.2). All three affected individuals are the offspring of first-degree cousins, consistent with an autosomal recessive inheritance pattern (Fig. 2.1A). Genetic evaluation in all three patients revealed a normal female karyotype 46,XX. One patient was homozygous for the p.Arg80Gln variant in HSD17B3, one was heterozygous, and one did not carry the HSD17B3 variant. Given that mutations in HSD17B3 are not associated with POI [62,64] and that the POI phenotype did not segregate with the p.Arg80Gln variant in HSD17B3 in this family, these findings suggested an independent genetic etiology (Fig. 2.1A).


Fig. 2.2. Absence of germ cells in the ovary of a female patient with the MRPS22 p.R202H mutation.

H & E stained ovarian tissue from individual F1-IV-9.


To identify the genetic basis of the POI, linkage analysis was performed with affected individuals F1-IV-5 and F1-IV-9 and the unaffected sibling F1-IV-1. Based on the predicted autosomal recessive inheritance pattern, we focused on regions of homozygosity in the affected daughters F1-IV-5 and F1-IV-9 that were heterozygous in the unaffected sibling F1-IV-1. SNP genotyping analysis identified 19 loci greater than 1 Mb on 10 different chromosomes that segregated with the POI.
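The search for shared homozygous intervals can be illustrated with a short sketch. This is not the linkage software used in the study; it is a simplified illustration that assumes SNPs on one chromosome sorted by position, two-letter genotype strings, and a rule that the unaffected sibling must be heterozygous at one or more SNPs in the run. The function name, sample labels, and toy genotypes are all hypothetical.

```python
def shared_homozygous_runs(snps, affected, unaffected, min_length_bp=1_000_000):
    """Return (start, end) runs in which all affected individuals share the
    same homozygous genotype at every SNP.

    snps: list of (position, {sample: genotype}) sorted by position.
    """
    runs, current = [], []

    def close_run():
        if not current:
            return
        start, end = current[0][0], current[-1][0]
        # Simplifying assumption: the unaffected sibling must be
        # heterozygous at one or more SNPs within the run.
        sibling_het = any(g[unaffected][0] != g[unaffected][1] for _, g in current)
        if end - start >= min_length_bp and sibling_het:
            runs.append((start, end))

    for pos, genotypes in snps:
        calls = {genotypes[s] for s in affected}
        shared_hom = len(calls) == 1 and all(c[0] == c[1] for c in calls)
        if shared_hom:
            current.append((pos, genotypes))
        else:
            close_run()
            current.clear()
    close_run()
    return runs


# Toy genotypes (hypothetical, for illustration only).
toy = [
    (100_000,   {"affected_1": "AA", "affected_2": "AA", "unaffected_sib": "AG"}),
    (900_000,   {"affected_1": "CC", "affected_2": "CC", "unaffected_sib": "CC"}),
    (1_500_000, {"affected_1": "TT", "affected_2": "TT", "unaffected_sib": "CT"}),
    (2_000_000, {"affected_1": "AG", "affected_2": "GG", "unaffected_sib": "AG"}),
    (2_400_000, {"affected_1": "GG", "affected_2": "GG", "unaffected_sib": "GG"}),
]
runs = shared_homozygous_runs(toy, ["affected_1", "affected_2"], "unaffected_sib")
print(runs)  # [(100000, 1500000)]
```

In the toy data, the run is broken at position 2,000,000, where the two affected individuals no longer share a homozygous genotype, and the final single-SNP run is discarded because it is shorter than 1 Mb.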

To identify the causal genetic mutation within these candidate intervals, we performed WES in affected individual F1-IV-5 and the unaffected individual F1-IV-1 (Fig. 2.1A). Candidate variants encoding nonsynonymous changes discovered by WES were filtered to remove variants with an allele frequency greater than 0.01 and variants that were not predicted by either SIFT [65] or Polyphen2 [66] to be damaging or possibly damaging to protein function. After filtering, four candidate variants from Family I remained and were validated by Sanger sequencing. However, only the variant in mitochondrial ribosomal protein S22 (MRPS22, NM_020191.2: c.605G>A: p.R202H; ClinVar Variation ID SCV000607729) segregated with the POI phenotype in Family I, which included 30 unaffected and three affected family members. Based on the SNP linkage analysis, this variant was within a 3 Mb interval defined by the SNP markers rs2737735 and rs16850488 on Chromosome 3 that segregated with POI. Analysis of polymorphisms within the WES data revealed an absence of heterozygosity within this interval (Fig. 2.3A). To ensure that there were no other nonsynonymous variants within this interval segregating with disease, the 11 coding regions of genes that were not covered by WES were analyzed by Sanger sequencing. These 11 intervals totaled 2,167 bp of coding sequence (average size: 197 bp; range 53 bp – 411 bp) (Table 2.3). No additional variants were identified, demonstrating that the MRPS22 (p.R202H) variant was the only change that segregated with disease. This variant is present in dbSNP (rs753345594), but has an allele frequency of 0.00001218 (3/246,212), with no homozygotes present in the Genome Aggregation Database (gnomAD, v2.0). The Arg202 residue is highly evolutionarily conserved (Fig. 2.3B). Collectively, these data suggested that the MRPS22 (p.R202H) variant was a strong candidate as a novel genetic cause of POI.
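The filtering logic described above can be sketched as follows. This is an illustration rather than the pipeline used in the study: the `Variant` class, its field names, and the `GENE_X` entry are hypothetical, and the sketch assumes that a variant is kept when SIFT, Polyphen2, or both flag it as damaging.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    consequence: str          # e.g. "missense" or "synonymous"
    allele_frequency: float   # population allele frequency
    sift_damaging: bool       # SIFT call
    polyphen_damaging: bool   # Polyphen2 "damaging" or "possibly damaging"


def passes_filter(v: Variant, max_af: float = 0.01) -> bool:
    """Keep rare nonsynonymous variants flagged by one or both predictors."""
    nonsynonymous = v.consequence != "synonymous"
    rare = v.allele_frequency <= max_af
    predicted_damaging = v.sift_damaging or v.polyphen_damaging
    return nonsynonymous and rare and predicted_damaging


# Allele frequency reported in the text for MRPS22 p.R202H: 3 of 246,212 alleles.
mrps22 = Variant("MRPS22", "missense", 3 / 246_212, True, True)
# Hypothetical common variant, included only to show a rejected case.
common = Variant("GENE_X", "missense", 0.12, True, False)

kept = [v.gene for v in (mrps22, common) if passes_filter(v)]
print(kept)  # ['MRPS22']
```

The ratio 3/246,212 evaluates to approximately 1.218 x 10^-5, matching the reported allele frequency of 0.00001218.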

To identify additional POI patients with a mutation in MRPS22, this gene was submitted to GeneMatcher [67]. A second family was identified in which WES revealed a different homozygous missense mutation in MRPS22 (NM_020191; c.404G>A; p.R135Q; ClinVar Variation ID SCV000693853) in the proband F2-IV-2. This variant was confirmed by Sanger sequencing and was located within a genomic interval defined by an absence of heterozygosity (Fig. 2.3A). Exome variant analysis had led to initial prioritization of five homozygous candidate variants in Family II, in BCL6, KDM1A, MRPS22, PLXND1, and TRIM62. This patient presented with a POI phenotype similar to that described for the three patients in Family I, as described in detail below. The MRPS22 variant was not present in the Greater Middle East Variome Project WES database [16], the Genome Aggregation Database (gnomAD) [15], or the BCM-HGSC internal database that consists of more than 6,500 exomes including ~1,100 Turkish exomes. Moreover, similar to MRPS22 (p.R202H), this variant affects a highly evolutionarily conserved residue (Fig. 2.3B) and is predicted to be deleterious to protein function by SIFT and Polyphen2 [65,66].


Table 2.3. List of gaps in WES coverage in Family I with POI and primers used for

Gene located   Primer type      Start (hg19)   End (hg19)   Size (bp)   Sequence (5' to 3')
RASA2          Forward          141205926      141206058    132         GCGTTAGGAGATGTGCAGGT
RASA2          Reverse          141205926      141206058    132         AATTACTGCTCGGCTCCCAC
RASA2          Nested forward   141205926      141206058    132         GAGTACGGTTCTCTGCAGGG
RASA2          Nested reverse   141205926      141206058    132         CCCTTCCAGCCTCAACCG
NMNAT3         Forward          139294836      139294425    411         TGTTGTCTTTCCTGGCAGTG
NMNAT3         Reverse          139294836      139294425    411         TTCCTGGGACAGAAGACTCC
NMNAT3         Forward          139294836      139294425    411         ACCTCCTCCAACAAGCTCCT
NMNAT3         Reverse          139294836      139294425    411         GCAGGCAGACAATGGTTTCT
ACPL2/PXYLP1   Forward          140950667      140950753    87          GGCAGTGTCCTCTCAGCAAC
ACPL2/PXYLP1   Reverse          140950667      140950753    87          TCAGTCTCCCAACCTCGGAC
BPESC1         Forward          138823027      138823175    149         CTTTCTTCTGCCAGGGTTCTT
BPESC1         Reverse          138823027      138823175    149         CCAGGAGTTTAGGCATTAGCC
BPESC1         Forward          138824938      138825263    326         TCTGGGCCAGCTCTATGC
BPESC1         Reverse          138824938      138825263    326         CCCCACGCTAAACCGTCT
ATP1B3         Forward          141595679      141595731    53          TAACGAGGAGGTGTTCTCGG
ATP1B3         Reverse          141595679      141595731    53          AATGAATGGGGCCGCACT
CLSTN2         Forward          139654213      139654325    113         GCTAGAAGCGCACCCAT
CLSTN2         Reverse          139654213      139654325    113         GCACAGACAGCCCTCAAA
RNF7           Forward          141461486      141461749    264         TTCCCCAAGCCAACGTCT
RNF7           Reverse          141461486      141461749    264         TTGTTTAACTCCGTTTATTGCCC
SPSB4          Forward          140770244      140770585    342         TAGTGGCCTTCAGGGATGAG
SPSB4          Reverse          140770244      140770585    342         GAGGAATTCTCAGGGACTGG
TFDP2          Forward          141719094      141719195    102         AGTCTGCAGTGTTTTCCTCTCT
TFDP2          Reverse          141719094      141719195    102         AGTCAATCTGCTCACAGGGT
ZBTB38         Forward          141146476      141146663    188         GGGTCTTGCCTGGATGTTGA
ZBTB38         Reverse          141146476      141146663    188         TGATGGATCTGGGCAAAGCA
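As a quick arithmetic check on the interval sizes listed in Table 2.3, the following sketch tallies one size per coordinate pair. The dictionary layout is an illustrative choice; the sizes themselves are copied from the table.

```python
# Interval sizes in bp, one entry per coordinate pair listed in Table 2.3.
gap_sizes = {
    "RASA2": [132],
    "NMNAT3": [411],
    "ACPL2/PXYLP1": [87],
    "BPESC1": [149, 326],
    "ATP1B3": [53],
    "CLSTN2": [113],
    "RNF7": [264],
    "SPSB4": [342],
    "TFDP2": [102],
    "ZBTB38": [188],
}

sizes = [size for per_gene in gap_sizes.values() for size in per_gene]
print(len(sizes), sum(sizes), sum(sizes) / len(sizes), min(sizes), max(sizes))
# 11 2167 197.0 53 411
```

The tally gives 11 intervals totaling 2,167 bp, with a mean of 197 bp and a range of 53 to 411 bp.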


Fig. 2.3. Independent mutations in MRPS22 identified in two consanguineous families with POI.

(A) Sanger sequencing confirms the presence of the MRPS22 mutations detected by WES. Intervals with an absence of heterozygosity were identified based on calculated B-allele frequencies from the WES data. Gray shaded areas indicate regions with an absence of heterozygosity. The location of MRPS22 within an interval with an absence of heterozygosity is indicated by a vertical black line. (B) Alignment of protein sequences of MRPS22 from multiple species demonstrates the evolutionary conservation of the MRPS22 residues Arg135 and Arg202 that are altered in patients with POI.


In Family II, the proband (F2-IV-2) was born to first-degree cousins of Turkish descent (Fig. 2.1B). She presented with amenorrhea at 14 years and 8 months of age. Her previous medical history revealed normal pregnancy and delivery at term, but she was small for gestational age with a birth weight of 1900 grams (-3.9 SDS). Family history was not consistent with a sexual development, puberty, or infertility disorder. Physical examination revealed mild facial dysmorphism including deep-set eyes with mild hypotelorism and mild ptosis, thin upper lip, and hypoplastic nares. Height was 145.3 cm (-2.1 SDS) and weight was 45 kg (+0.4 SDS). At chronological age of 14 years and 8 months, bone age was 10 years. Fundoscopy was normal. She had Tanner stage 3 breast development, without pubic or axillary hair development. Laboratory evaluation revealed a normal 46,XX female karyotype and normal blood count and chemistry except for an impaired glucose tolerance test. Blood pH levels were normal (pH=7.42); however, blood lactate concentrations were slightly elevated (3 mmol/L; normal range 0.5-1.6 mmol/L). Nerve conduction studies and electromyography testing revealed bilateral axonal polyneuropathy of the lower extremities, as indicated by absent evoked potentials from bilateral sural, peroneal, and tibial nerves. Brain and heart morphology was normal, as revealed by cranial and pituitary MRIs and echocardiography, respectively. Endocrine evaluation revealed hypergonadotrophic hypogonadism (Table 2.1). Abdomino-pelvic ultrasound showed a small uterus of 15x12x3 mm, and ovaries could not be visualized. ACTH test was normal (Table 2.2). Plasma adrenal by LC-MS/MS confirmed very low keto steroids (Table 2.4). Methemoglobin levels were elevated (1.9%, normal range: 0 - 1.5%), and bone densitometry revealed mild osteoporosis (L2-L4, Z-score -2.1, BMD: 0.736 g/cm2). She was treated with combined estrogen and progesterone supplementation and had menarche at 16 years and 8 months. Her final height is 157 cm (+0.4 SDS) and weight is 58 kg (+1 SDS). Pelvic ultrasound at 20 years of age showed a uterus of 42x21x8 mm and hypoplastic ovaries (right ovary: 14x6x4 mm; left ovary: 11x8x4 mm).


Table 2.4. Plasma adrenal steroid levels in two individuals with POI.

Steroid                    F1-IV-9    F2-IV-2   Normal range
Progesterone (nmol/l)      0.9        0.54      0.5 – 2.3
DHEAS (µmol/l)             0.7        0.9       1.8 – 10.3
DHEA (nmol/l)              Not done   1.1       3.5 – 41.1
Androstenedione (nmol/l)   3.3        0.02      1.7 – 16.3
Testosterone (nmol/l)      <0.3       0.3       0.3 – 3.8


Cellular studies of POI patient-derived fibroblasts

Collectively, the identification of two independent families with POI and predicted deleterious missense mutations in MRPS22 was highly suggestive that pathogenic variants in this gene represent a novel genetic cause of POI. To examine the effect of the MRPS22 (p.R202H) variant on gene function, MRPS22 mRNA and protein expression levels were examined in patient-derived fibroblasts. No detectable changes were identified in protein or mRNA expression levels between control- and patient-derived primary fibroblasts (Fig. 2.4A, B). Because MRPS22 is a component of the small subunit of the mitochondrial ribosome, defects in MRPS22 have also been shown to reduce the levels of mitochondrial rRNAs [68,69]. However, there were no differences between control- and patient-derived fibroblasts in the expression levels of the 12S and 16S rRNAs (Fig. 2.4C, D).

To evaluate the effect of MRPS22 (p.R202H) on mitochondrial function, we performed mitochondrial function studies in control- and patient-derived primary fibroblasts. Measurements of electron transport chain complex activities from cultured skin fibroblasts and OXPHOS activity measured with permeabilized cells both failed to identify significant differences in activity between fibroblasts from the POI patients and control individuals (Fig. 2.5). Thus, MRPS22 (p.R202H) has no detectable effect on mitochondrial function in primary fibroblasts, although we were unable to directly examine its function in ovarian tissue.


Fig. 2.4. Molecular analysis of fibroblasts from patients with the MRPS22 (p.R202H) mutation.

(A) Western blot analysis of MRPS22 and the loading control alpha-tubulin from control- and patient-derived primary fibroblasts. Levels of MRPS22 protein were determined by ImageJ analysis and normalized to alpha-tubulin. (B) Levels of MRPS22 mRNA, (C) 12S rRNA, and (D) 16S rRNA were detected by RT-qPCR and were unchanged between control- and patient-derived fibroblasts. Each cell line was seeded in triplicate. NS, not significant.


Fig. 2.5. Oxidative phosphorylation is normal in fibroblasts from patients with the MRPS22 (p.R202H) mutation.

(A) Intact mitochondria isolated from fibroblasts were supplied with electron donor substrates and respiration, as indicated by oxygen uptake, was measured with a Clark electrode. (B) Rate of oxidative phosphorylation in fibroblasts determined using protocol 1 as described [70]. (C) Rate of oxidative phosphorylation in fibroblasts determined using protocol 2 as described [70]. Control samples denote a genetically unrelated fibroblast line of approximately equal passage number that was analyzed concurrently with the samples derived from Family I. Average denotes the historical averages of control samples analyzed from a reference population (electron transport chain activity, n=144; oxidative phosphorylation, n=57). Average sample is shown as mean ± standard deviation. None of the data from the POI individuals are significantly different from the controls.


Embryonic lethality of Mrps22 deficient mice

Given the inability to study the impact of the MRPS22 mutations in patient-derived ovarian tissue, we generated two animal models to better investigate the in vivo function of MRPS22. First, we examined a homozygous Mrps22 knockout mouse model that was generated by complete deletion of all exons and intervening sequences. Heterozygous Mrps22 knockout mice (+/-) were fertile and showed no overt signs of abnormalities. However, among 3-week-old offspring of a heterozygous intercross, no homozygous knockout mice (-/-) were detected (Table 2.5). Similarly, when offspring of a heterozygous intercross were genotyped at embryonic day 18.5 (E18.5), again no -/- offspring were detected (Table 2.5). Thus, complete deficiency of Mrps22 results in embryonic lethality, demonstrating the crucial role of Mrps22 in development, but preventing functional studies in adult ovarian tissue.


Table 2.5. Survival of offspring from a Mrps22 heterozygous knockout mouse (+/-) intercross.

Age        E18.5                  3 weeks
Genotype   +/+    +/-    -/-      +/+    +/-    -/-
Observed   12     25     0        10     22     0
Expected   9.25   18.5   9.25     8      16     8
p value    0.0021                 0.0046
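The p values in Table 2.5 are consistent with a chi-square goodness-of-fit test against the expected 1:2:1 Mendelian ratio. The table does not name the test, so this is an assumption. With three genotype classes there are two degrees of freedom, and in that case the chi-square tail probability has the closed form exp(-x/2), so the check needs only the standard library. The function name is illustrative.

```python
import math


def chi_square_gof_df2(observed, ratio=(1, 2, 1)):
    """Chi-square goodness-of-fit test for three classes (2 degrees of freedom).

    Returns the expected counts, the chi-square statistic, and the p value.
    For 2 degrees of freedom the survival function is exactly exp(-x / 2).
    """
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi2, math.exp(-chi2 / 2)


# Observed +/+, +/-, -/- counts from Table 2.5.
expected_e18, chi2_e18, p_e18 = chi_square_gof_df2([12, 25, 0])
expected_3wk, chi2_3wk, p_3wk = chi_square_gof_df2([10, 22, 0])

print(expected_e18, round(p_e18, 4))  # [9.25, 18.5, 9.25] 0.0021
print(expected_3wk, round(p_3wk, 4))  # [8.0, 16.0, 8.0] 0.0046
```

Under these assumptions, the computed expected counts and p values match the entries in the table.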


mRpS22 in Drosophila germ cells is required for fertility

The Drosophila melanogaster genome encodes a single ortholog, mRpS22, with significant homology to the human MRPS22 gene. To evaluate the role of mRpS22 in vivo, we used an inducible tissue-specific RNA interference (RNAi)-mediated knockdown approach. In these experiments, knockdown was achieved by expressing a short hairpin RNA under the control of the upstream activator sequence (UAS) with the following drivers: tub-Gal4, which uniformly drives expression in all tissues [71]; c587-Gal4 and bab-Gal4, which drive expression in the somatic cells of the ovary [72]; and the germline-specific nos-Gal4::VP16 driver [73]. We found that ubiquitous mRpS22 knockdown (tub>mRpS22RNAi) resulted in larval death. Interestingly, we found that knockdown in germ cells (nos>mRpS22RNAi), but not in the somatic cells of the ovary, led to female sterility (Table 2.6).


Table 2.6. Phenotypes of RNAi-mediated mRpS22 tissue-specific knockdown in Drosophila

Driver      Conditional RNAi knockdown   Survival condition   Female fertility
tub-Gal4    Whole body                   Larval lethality     -
bab-Gal4    Ovary: somatic cells         Viable               Fertile
c587-Gal4   Ovary: somatic cells         Viable               Fertile
nos-Gal4    Ovary: germ cells            Viable               Infertile


To identify the defect underlying the female sterility, ovaries were stained with an antibody against Vasa, which labels all germ cells, and the DNA stain DAPI to monitor germ cell differentiation. Adult ovaries are composed of 15-20 individual strands of progressively developing egg chambers called ovarioles (Fig. 2.6A). Egg chambers are assembled within the germarium, a structure at the anterior end of each ovariole. Each egg chamber contains 16 interconnected germ cells, one of which will become an oocyte and the others nurse cells with polyploid nuclei. Once each 16-cell cyst is surrounded by an epithelial monolayer of somatic cells, it buds off from the germarium, forming a chain of individualized egg chambers of progressive age. The bab>mRpS22RNAi mutant ovarioles were indistinguishable from those in wild-type flies, suggesting that loss of mRpS22 in the somatic cells of the ovary did not alter cell viability or ovarian development (Fig. 2.6B). However, nos>mRpS22RNAi mutant ovarioles lacked strings of developing egg chambers (Fig. 2.6C). Moreover, no germ cells were detected, even at the tip of the ovariole where the germline stem cells normally reside (Fig. 2.6C). This agametic phenotype suggests a defect in germ cell survival.


Fig. 2.6. mRpS22 is required for female germ cell development in Drosophila.

Representative confocal images of ovarioles from (A) control, (B) bab>mRpS22RNAi, and (C) nos>mRpS22RNAi females stained for the cytoplasmic Vasa protein (green) to visualize germ cells, and the DNA stain DAPI (red). Scale bars: 50 μm.


Discussion

Here we describe the identification of four individuals from two independent consanguineous families with missense mutations in MRPS22 that result in autosomal recessive POI. The conclusion of pathogenicity for the MRPS22 variants was based on the cumulative evidence stemming from the identification of two different homozygous missense variants in independent families together with functional data from a Drosophila model of germ cell-specific mRpS22 deficiency. The genetic data support the causal role of the MRPS22 variants based on the following ACMG criteria: absence in population databases (strength of criteria = pathogenic moderate), multiple lines of computational evidence supporting a deleterious effect on the gene (pathogenic supporting), and co-segregation with disease in multiple affected family members (pathogenic moderate). In addition, despite the absence of mitochondrial defects in functional studies of patient-derived fibroblasts (benign strong), the animal modeling studies in Drosophila demonstrated a deleterious effect of mRpS22 deficiency on fertility and ovarian development (pathogenic strong). Thus, the cumulative evidence supports the pathogenicity of homozygous MRPS22 missense variants as a novel cause of POI in adolescents.

MRPS22 encodes a component of the small 28S mitochondrial ribosome subunit that is found in species including mammals, fruit flies, and nematodes, but lacks a direct ortholog in fungi, yeast, plants, or bacteria [74]. Mitochondrial translation is required to synthesize the 13 polypeptides encoded in the mitochondrial genome that are essential components of all mitochondrial respiratory chain complexes, excluding complex II, which is entirely nuclear encoded [75]. The mitochondrial ribosome is composed of 80 proteins and three rRNA molecules divided between two subunits, a large 39S subunit and a small 28S subunit [76,77]. Beyond MRPS22, mutations in other nuclear-encoded proteins involved in mitochondrial translation are associated with impaired ovarian development. Homozygous mutations in another component of the 28S subunit, MRPS7, were associated with primary hypogonadism and primary adrenal failure, as well as sensorineural deafness and lactic acidemia [78]. Mutations in the mitochondrial tRNA synthetases HARS2 and LARS2 both result in Perrault syndrome, consisting of sensorineural hearing loss and ovarian dysfunction [79,80]. Mutations in the mitochondrial tRNA synthetase AARS2 cause progressive leukoencephalopathy with ovarian failure [81]. Finally, mutations in any of the five subunits of EIF2B can lead to ovarioleukodystrophy, which in addition to vanishing white matter in the nervous system is associated with ovarian failure in female carriers [82]. Thus, the causal relationship between mutations in many genes involved in mitochondrial translation and impaired ovarian development highlights the critical role of mitochondrial translation in this tissue.

In addition to mutations in MRPS22 causing POI, rare mutations in MRPS22 have also previously been reported to cause severe mitochondrial disease with features including cardiomyopathy, lactic acidosis, and brain abnormalities [68,69,83,84]. Features related to ovarian or germ cell development were not previously reported in these patients. Therefore, this report extends the phenotypic spectrum of disorders associated with impaired MRPS22 function. The only previously reported female patients with MRPS22 mutations were three infants who were homozygous for an MRPS22 (p.R170H) allele, presented with severe hypotonia, hypertrophic cardiomyopathy, and lactic acidosis, and died in infancy. Patients described in three other case reports were all males, and also presented in critical condition with heart and brain abnormalities. Molecular analysis of mitochondrial function in patient-derived fibroblasts from these patients identified decreased enzyme activities for the oxidative phosphorylation complexes and decreased levels of mitochondrial 12S and 16S rRNAs. Thus, the pathogenic variants p.R170H and p.L215P in MRPS22 in these patients compromised mitochondrial energy production. This is in contrast to the POI patient fibroblasts carrying the MRPS22 c.605G>A: p.R202H mutation, which demonstrated no defects in OXPHOS activity or mitochondrial rRNA levels. This is consistent with the relatively milder phenotype of POI and the absence of lactic acidosis. Collectively, this suggests that the MRPS22 mutations p.R202H and p.R135Q associated with POI primarily affect the mitochondrial role in the reproductive system, but not in global energy production.

Structural analysis of the human mitochondrial ribosome [76] suggests a potential

mechanism for the more severe phenotypes associated with the missense mutations

p.L215P [69] and p.R170H [85] relative to the relatively milder POI phenotype associated

with the p.R135Q and p.R202H mutations (Fig. 2.7). Amino acids R135 and R202 are

buried and situated in an internal region of MRPS22, between an α-helical subdomain

and a β-sheet + 1 α-helix subdomain as shown above and below the cluster containing

R202 and R135 (Fig. 2.7). Both R135 and R202 form hydrogen bond interactions and

46

van der Waals interactions that will be disrupted by their respective mutations causing a

localized disturbance of that particular MRPS22 structure, possibly also affecting its

interaction with residues near F177 of MRPS18B. The L215 residue is located in a

hydrophobic region comprised of 3 α-helices. The p.L215P mutation likely not only causes a disruption of this hydrophobic core, but the change to Pro is also predicted to disrupt both the hydrogen bonding of the α-helix to which L215 belongs and van der

Waals interactions with an α-helix of MRPS18B (Fig. 2.7). The R170 residue is situated in the β-sheet + 1 α-helix subdomain (Fig. 2.7). The R170 side chain forms a hydrogen

bond with the backbone oxygen of S162. This S162 residue itself is in hydrogen bonding

distance with D71 of protein MRPS16 (Fig. 2.7). We anticipate that the p.R170H

mutation will cause loss of the interaction with S162 of MRPS22 and thereby lead to an

altered conformation of this region. In summary, the anticipated structural consequences

of the R135Q and R202H mutations are likely similar, as they are in close proximity to each other (~8.5 Å) and are part of the same buried charge cluster in between two subdomains of

MRPS22 (Fig. 2.7). In contrast, the R170H and L215P mutations are postulated to indirectly cause disruption of protein:protein interfaces (Fig. 2.7).

Fig. 2.7. Structural analysis of disease-causing missense mutations in MRPS22.

Close-up view of the human mitochondrial ribosome structure containing the 4 identified disease-causing missense mutations in MRPS22. The mitochondrial ribosome structure

(39) (PDB ID: 3J9M) is shown in cartoon representation with the mutations shown in ball-and-stick and nearby key residues in stick representation. The ribosomal proteins MRPS22, MRPS18B, MRPS16, and MRPS25 and the 12S rRNA are colored grey, green, magenta, blue, and orange, respectively. Hydrogen bonds are depicted by dashed lines. The figure was generated using PyMOL (https://pymol.org).

The etiology of ovarian dysfunction due to mutations in MRPS22, and other proteins that

function in mitochondrial translation, remains unclear [86]. However, the fact that germ-

cell specific deletion of mRpS22 in Drosophila results in agametic ovaries suggests a cell

autonomous phenotype within the female germline. Interestingly, the precursors to these

cells, primordial germ cells, demonstrate significantly elevated OXPHOS activity relative

to other cell types [87]. Furthermore, this activity is important for the eventual

specification of these stem cells [87]. The critical dependence of primordial germ cells on high levels of OXPHOS may contribute to the specific ovarian dysgenesis phenotype in

the context of relatively mild impairments in mitochondrial translation, as other cell types

are unaffected by the subtle mitochondrial defects. Thus, the identification of mutations

in specific genes that cause these mild mitochondrial defects and are therefore critical for

oocyte formation will lead to a better understanding of normal ovarian development and

potentially a better understanding of the molecular basis of premature ovarian failure.

Materials and Methods

Human studies. These studies were approved by the ethics committees of Rambam

Health Care Campus, Marmara University, and the Baylor-Hopkins Center for Mendelian

Genomics. Informed consent was obtained from all participants. Peripheral blood was collected from affected individuals, parents, and unaffected relatives if available.

Genomic DNA was extracted from blood leukocytes according to standard procedures.

Linkage analysis. Individuals F1-IV-5, F1-IV-9, and F1-IV-1 were genotyped at loci across the genome using the Illumina Omni 250K SNP chip. Based on the presumed autosomal recessive model of inheritance, regions of homozygosity greater than 1 megabase were identified that were shared between the affected individuals F1-IV-5 and F1-IV-9, but not the control individual F1-IV-1.
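The homozygosity-mapping logic described above can be illustrated with a short Python sketch. This is not the analysis code used in the study: the genotype encoding ("AA", "AB", "BB"), the function names, and the per-chromosome input layout are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical input: SNPs on one chromosome, sorted by position (bp).
# Each record is (position, affected_1, affected_2, control).
Snp = Tuple[int, str, str, str]

MIN_LENGTH_BP = 1_000_000  # "greater than 1 megabase" in the text


def is_hom(gt: str) -> bool:
    """A genotype such as "AA" or "BB" is homozygous; "AB" is not."""
    return gt[0] == gt[1]


def shared_roh(snps: List[Snp], min_len: int = MIN_LENGTH_BP) -> List[Tuple[int, int]]:
    """Runs where both affected individuals are homozygous for the same
    allele and the run spans more than min_len base pairs."""
    runs, start, prev = [], None, None
    for pos, a1, a2, _ctrl in snps:
        if is_hom(a1) and a1 == a2:
            if start is None:
                start = pos
            prev = pos
        else:
            if start is not None and prev - start > min_len:
                runs.append((start, prev))
            start = None
    if start is not None and prev - start > min_len:
        runs.append((start, prev))
    return runs


def control_shares(snps: List[Snp], region: Tuple[int, int]) -> bool:
    """True if the control carries the same homozygous genotypes across the region."""
    lo, hi = region
    return all(c == a1 for pos, a1, _a2, c in snps if lo <= pos <= hi)


def candidate_regions(snps: List[Snp], min_len: int = MIN_LENGTH_BP) -> List[Tuple[int, int]]:
    """Shared runs of homozygosity in the affected pair that the control does not share."""
    return [r for r in shared_roh(snps, min_len) if not control_shares(snps, r)]
```

In this sketch, a region is retained only if the unaffected control differs from the affected individuals at one or more SNPs within it, which mirrors the exclusion criterion stated in the text.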

Whole exome sequencing. Whole exome sequencing (WES) of patient F1-IV-5 and control individual F1-IV-1 was performed at The Technion Institute Sequencing Core.

DNA was extracted from whole blood and sequenced using the Illumina TruSeq .

Samples were sequenced with paired-end 100 bp reads totaling 19,290,546 for sample

F1-IV-5 and 18,997,336 for sample F1-IV-1. The fastq sequence files were mapped to the reference human genome GRCh37 using BWA (v. 0.7.5) [88]. Sequence was analyzed

following the GATK (v. 2.8-1) best practices including removal of duplicate reads by

Picard (v. 1.105), local realignment, and base quality score recalibration [89]. The

resulting numbers of unique mapped reads for F1-IV-5 and F1-IV-1 were 17,785,392 and

14,074,730, respectively. The average read depth was 15.3x and 14.7x, respectively.

HaplotypeCaller was used to call SNPs and indels [89]. For SNPs, the filters used were

QD < 2.0, FS > 60.0, and MQ < 40.0. For indels, the filters used were QD < 2.0 and FS >

200.0. The variants passing these filters were annotated with Annovar [90] using the

following databases: nonsyn_splicing, esp6500si_all, 1000g2012apr_all,

snp137NonFlagged, ljb2_sift, and ljb_pp2.
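The hard-filter thresholds listed above can be expressed as a small Python sketch. It only restates the cutoffs given in the text; the function name, the dictionary layout, and the handling of missing annotations are assumptions, and the sketch is not a substitute for the GATK tools themselves.

```python
from typing import Dict

# Thresholds quoted in the text. A variant is filtered out if ANY condition holds.
SNP_FILTERS = {"QD": ("<", 2.0), "FS": (">", 60.0), "MQ": ("<", 40.0)}
INDEL_FILTERS = {"QD": ("<", 2.0), "FS": (">", 200.0)}


def passes_hard_filters(annotations: Dict[str, float], is_indel: bool) -> bool:
    """Return True if the variant survives the hard filters."""
    filters = INDEL_FILTERS if is_indel else SNP_FILTERS
    for key, (op, threshold) in filters.items():
        value = annotations.get(key)
        if value is None:
            continue  # assumption: a missing annotation does not fail the variant
        if (op == "<" and value < threshold) or (op == ">" and value > threshold):
            return False
    return True
```

Note that the same strand-bias value (FS) can fail a SNP but pass an indel, because the indel threshold is more permissive.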

WES was performed on the patient F2-IV-2 at the Baylor College of Medicine Human

Genome Sequencing Center (BCM-HGSC) according to a previously described protocol

[91]. In brief, genomic DNA samples were prepared into Illumina paired-end libraries and underwent whole-exome capture via the BCM-HGSC core VCRome 2.1 design [92] (42

Mb, NimbleGen, Cat. No. 06266380001) according to the manufacturer’s protocol

(NimbleGen SeqCap EZ Exome Library SR User’s Guide), followed by sequencing on

the HiSeq 2000 platform (Illumina) with a sequencing yield of 8.4 Gb. The samples

achieved 96% of the targeted exome bases covered to a depth-of-coverage of 20 or

greater [93]. Data produced were aligned and mapped to the human genome reference

sequence (Genome Reference Consortium GRCh37, hg19) with the Mercury in-house

bioinformatics pipeline [94]. Variants were called using the ATLAS (an integrative

variant analysis pipeline optimized for variant discovery) variant calling method and the

Sequence Alignment/Map (SAMtools) suites and annotated with the in-house-developed

“Cassandra” annotation pipeline that uses ANNOVAR (Annotation of Genetic Variants) and additional tools and databases.

Variant prioritization for indels was as follows: If an indel is reported as a variant in the Human Gene Mutation Database (HGMD) and its frequency is less than 5 percent in the 1000 Genomes Project (1000GP) data, it is included. If it is not present in HGMD, it has to pass all of the variant quality filters and its frequency has to be less than 2 percent in 1000GP data to be prioritized and investigated further as a potential pathogenic variant. In a second filtering step, a further filter is applied based on the number of samples having the variant in the Atherosclerosis Risk in Communities Study (ARIC) database: this number should be less than 120 out of 10,940 samples in total.

Variant prioritization for single nucleotide variants (SNVs) was as follows: If an SNV is reported as a variant in the HGMD or it has a clinical variant value between 3 and 8 in the Single Nucleotide Polymorphism database (dbSNP), its frequency has to be less than 5 percent in both 1000GP data and NHLBI GO Exome Sequencing Project (ESP5400) African and European populations. If it is not present in HGMD and does not have a clinical variant value between 3 and 8 in dbSNP, it has to pass all of the variant quality filters and its frequency has to be less than 1 percent in both 1000GP data and ESP5400 African and European populations to be prioritized and investigated further as a potential pathogenic variant. As with indels, the SNVs that pass these filters were further

filtered based on the number of samples having the variant in the ARIC database (should

be less than 120 out of 10,940 samples in total). These filtering steps typically yield ~800 variants per sample [94].
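The two-step prioritization rules for indels and SNVs can be summarized as a decision function. The following Python sketch is illustrative only: the `Variant` fields, the inclusive reading of "between 3 and 8", and the collapsing of the ESP5400 African and European frequencies into a single maximum are assumptions, not details taken from the pipeline.

```python
from dataclasses import dataclass

ARIC_MAX_CARRIERS = 120  # out of 10,940 ARIC samples, as stated in the text


@dataclass
class Variant:
    kind: str                   # "indel" or "snv"
    in_hgmd: bool
    passes_quality: bool
    freq_1000gp: float          # allele frequency as a fraction
    freq_esp5400: float = 0.0   # assumed: max of African and European frequencies (SNVs)
    dbsnp_clinical: int = 0     # assumed encoding: 0 means no clinical value
    aric_carriers: int = 0


def prioritized(v: Variant) -> bool:
    """Apply the first-step frequency rules, then the ARIC carrier-count filter."""
    if v.kind == "indel":
        if v.in_hgmd:
            step1 = v.freq_1000gp < 0.05
        else:
            step1 = v.passes_quality and v.freq_1000gp < 0.02
    else:
        # assumption: "between 3 and 8" is read as inclusive
        known = v.in_hgmd or 3 <= v.dbsnp_clinical <= 8
        cutoff = 0.05 if known else 0.01
        step1 = (known or v.passes_quality) and max(v.freq_1000gp, v.freq_esp5400) < cutoff
    return step1 and v.aric_carriers < ARIC_MAX_CARRIERS
```

Previously reported variants are thus held to a more lenient frequency threshold than unreported ones, while the ARIC filter applies to both classes.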

Given the apparent autosomal recessive inheritance pattern in the pedigree, we focused on

the homozygous variants in the parsed and filtered WES data. Out of ~800 variants, we

selected the homozygous variants that have an allele frequency below 0.1% in our

internal database (CMG), which consists of more than 6,500 exomes including ~ 1,100

Turkish exomes. Then potential pathogenic variants including indels, nonsense and splice

site variants, and missense variants that were predicted as deleterious in at least three out

of five computational algorithms including SIFT, Polyphen2, LRT, Mutation Taster and

PROVEAN, were selected as candidate variants. After these filtering steps, five

homozygous candidate variants in BCL6, KDM1A, MRPS22, PLXND1, and TRIM62

remained for manual inspection.
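The candidate-selection criteria for Family 2 (homozygosity, internal allele frequency below 0.1%, and a deleterious prediction from at least three of five algorithms for missense variants) can be sketched as follows. The dictionary keys and effect labels are hypothetical and were chosen only for this illustration.

```python
PREDICTORS = ("SIFT", "Polyphen2", "LRT", "MutationTaster", "PROVEAN")
LOF_EFFECTS = {"indel", "nonsense", "splice_site"}
MAX_INTERNAL_AF = 0.001  # 0.1% in the internal (CMG) database


def is_candidate(variant: dict) -> bool:
    """variant keys (hypothetical): zygosity, cmg_af, effect,
    deleterious (set of predictor names that call the variant deleterious)."""
    if variant["zygosity"] != "hom" or variant["cmg_af"] >= MAX_INTERNAL_AF:
        return False
    if variant["effect"] in LOF_EFFECTS:
        return True
    if variant["effect"] == "missense":
        calls = sum(1 for p in PREDICTORS if p in variant["deleterious"])
        return calls >= 3
    return False
```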

Sanger Sequencing. The MRPS22 (c.404G>A) variant was amplified by PCR using the

following primers: 5’-ATG GCC TTA GTG GGA CAC AG-3’ and 5’-AGG AGC GAA

ACT CCA TTT CA-3’. The 12 PCR amplicons that were not covered by WES in Family

I on between rs2737735 and rs16850488 were amplified and sequenced

with the primers listed in Table 2.3. Sanger sequencing was performed by GenScript. The data were visualized and analyzed using FinchTV (Geospiza).

Genotyping. The MRPS22 (c.404G>A) variant was amplified by PCR using the

following primers: 5’-GAA AAT TAT TGG TGT CAA AAT TGT A-3’ and 5’-ATG

GCC TTA GTG GGA CAC AG-3’. The resulting PCR product was digested with the restriction enzyme RsaI and analyzed by agarose gel electrophoresis. Restriction digest of the wild-type PCR product resulted in a single 200 bp product, whereas restriction digest of the MRPS22 (c.404G>A) variant resulted in a 176 bp and a 24 bp product.
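The interpretation of the restriction digest can be written out as a simple rule. In this Python sketch the band sizes come from the text, whereas the heterozygous pattern (both the 200 bp and 176 bp bands), the size tolerance, and the decision to ignore the small 24 bp fragment are assumptions made for illustration.

```python
UNCUT_BP = 200   # wild-type product remains uncut
CUT_BP = 176     # larger fragment of the digested variant product


def call_genotype(band_sizes_bp, tolerance_bp=5):
    """Call a genotype from estimated band sizes on the gel.
    The 24 bp fragment is ignored (assumption: it may not resolve on agarose)."""
    def has(size):
        return any(abs(b - size) <= tolerance_bp for b in band_sizes_bp)

    uncut, cut = has(UNCUT_BP), has(CUT_BP)
    if uncut and cut:
        return "heterozygous"
    if cut:
        return "homozygous variant"
    if uncut:
        return "wild-type"
    return "no call"
```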

Cell culture. For fibroblast cell lines, punch biopsies of skin were obtained from patients

and controls. Patient and control fibroblast cell lines were cultured in high glucose

DMEM (Dulbecco's Modified Eagle's Medium) (Thermo Fisher #11965-092) supplemented with 10% fetal bovine serum (Sigma #F2442) and 1% penicillin and streptomycin (Thermo Fisher #15140-122).

Western Blotting. Protein was extracted from cultured primary fibroblast cells with

RIPA buffer (Sigma #R0278) and a protease inhibitor (Roche #05892791001). Protein was quantified using the BCA method. Western blotting was performed and quantitated using ImageJ as described [95]. Primary antibodies used were an anti-MRPS22

monoclonal antibody (1:1,000, Proteintech #10984-1-AP) and an anti-alpha-tubulin

antibody (1:10,000, Sigma, #T9026). Secondary antibodies used were anti-rabbit (1:5,000,

Thermo Fisher #31460) and anti-mouse (1:5,000, Thermo Fisher #31430).

Quantitative PCR (qPCR). Total RNA was isolated from primary fibroblast cells using

the PureLink RNA purification kit (Thermo Fisher) and reverse transcribed using the

high capacity cDNA reverse transcription kit (Applied Biosystems). The sequences for

qPCR primers are as follows: MRPS22 forward primer 5’-TGA TAA TCA TGG CGC

CCC TC-3’, MRPS22 reverse primer 5’-CTA CCA GAT TCT GCG GCC T-3’; 12S rRNA forward primer 5’-TAG ATA CCC CAC TAT GCT TAG C-3’, 12S rRNA reverse primer 5’-CGA TTA CAG AAC AGG CTC C-3’; 16S rRNA forward primer 5’-CCA AAC CCA CTC CAC CTT AC-3’, 16S rRNA reverse primer 5’-TCA TCT TTC CCT TGC GGT AC-3’; GAPDH forward primer 5’-AAT CCC ATC ACC ATC TTC CA-3’, GAPDH reverse primer 5’-TGG ACT CCA CGA CGT ACT CA-3’. The qPCR reactions were performed with the Power SYBR Green PCR Master Mix (Thermo Fisher) and run on a Bio-Rad CFX Connect Real-Time System (Bio-Rad). Expression levels were calculated using the ΔΔCt method relative to the GAPDH control gene.
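For reference, the ΔΔCt calculation can be sketched in Python as follows. The function name and the Ct values in the usage note are hypothetical, and the sketch assumes approximately 100% amplification efficiency (a doubling of product per cycle) for both the target and the GAPDH reference.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of the target gene in a sample relative to a control,
    normalized to a reference gene, using the 2^-ΔΔCt method."""
    delta_sample = ct_target_sample - ct_ref_sample      # ΔCt in the sample
    delta_control = ct_target_control - ct_ref_control   # ΔCt in the control
    delta_delta = delta_sample - delta_control           # ΔΔCt
    return 2.0 ** (-delta_delta)
```

For example, with hypothetical Ct values of 26.0 (target) and 18.0 (GAPDH) in a patient sample and 24.0 and 18.0 in a control, ΔΔCt is 2 and the relative expression is 0.25.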

Fibroblast oxidative phosphorylation and electron transport chain activity. Oxidative phosphorylation was measured in permeabilized skin fibroblasts using an O2K (Oroboros Instruments) with 2 protocols as previously described [70]. The activities of electron transport chain complexes in skin fibroblasts were measured spectrophotometrically at 37 °C as previously described [96,97].

Mice. Heterozygous Mrps22 knockout mice (B6N(Cg)-Mrps22tm1.1(KOMP)Vlcg/J, stock #028462) were purchased from The Jackson Laboratory and maintained by brother-sister matings. All mice used for experiments were obtained from breeder colonies at

Case Western Reserve University. Mice were housed in ventilated racks with access to

food and water ad libitum and maintained at 21°C on a 12-hour light/12-hour dark cycle.

All mice were cared for as described under the Guide for the Care and Use of Laboratory Animals,

eighth edition (2011) and all experiments were approved by IACUC and carried out in an

AAALAC approved facility. The IACUC protocol number is 2014-0132. Mice were weaned at 3 weeks of age and genotyped. The Mrps22 knockout allele was genotyped by

PCR based on the presence of a 380 bp product (wild-type allele) or a 491 bp product

(knockout allele). The sequences of the genotyping primers are as follows: Mrps22 wild-type forward primer 5’-GCT GTG GGC AGT GTT ATT GT-3’, Mrps22 wild-type reverse primer 5’-TCT CAC ACC TAG TAC CGC AGT C-3’; Mrps22 mutant forward primer 5’-CGG TCG CTA CCA TTA CCA GT-3’, Mrps22 mutant reverse primer 5’-TCA GTA AGT ACC TTT TAA TCC CAA GA-3’.

Drosophila stocks and culture conditions. All Drosophila strains used in this study

were obtained from the Bloomington Drosophila Stock Center (BDSC), except for c587-

Gal4 which was a kind gift of T. Xie (Stowers Institute, Kansas City, Mo.). The stocks obtained from BDSC include nos-Gal4 (BDSC #4937), bab1-Gal4 (BDSC #6802), tubP-

Gal4 (BDSC #5138) and mRpS22-P{TRiP.HMC06144} (BDSC #65882). HMC RNAi lines are constructed in the VALIUM20 vector, designed for strong expression in both somatic and germline cells [98,99]. To maximize knockdown expression, animals were

raised at 29°C.

Immunofluorescence and image analysis. Drosophila ovaries from 2-3 day old females were fixed and stained by standard methods [100]. The primary Vasa antibody was obtained from the Developmental Studies Hybridoma Bank, and used at 1:100.

Secondary antibody conjugated to Alexa Fluor 555 (Thermo Fisher) was used at 1:200.

Images were acquired on a Leica TCS SP8 confocal microscope and assembled using

Photoshop (Adobe) and PowerPoint (Microsoft).

Chapter 3. Mutations in PIK3C2A Cause Syndromic Short

Stature Associated with Cataracts and Skeletal Abnormalities

The study of this chapter was submitted to

Dov Tiosano#, Hagit N. Baris#, Anlu Chen#, Markus Schueler#, Marrit M. Hitzert#, Antje Wiesener, Antonio Berguaz, Adi Mory, Alex Yuan, Brett Copeland, Joseph G. Gleeson, Patrick Rump, Hester van Meer, Deborah A. Sival, Karl X. Knaup, Andre Reis, Nadine N. Hauer, Christian T. Thiel, Brian M. McDermott, Brian D. Perkins, Ronald Roepman, Rolph Pfundt, Michael S. Wiesener, Mariam G. Aslanyan, and David A. Buchner. American Journal of Human Genetics (# denotes co-first authorship)

Abstract

PIK3C2A is a class II member of the phosphoinositide 3-kinase (PI3K) family of proteins that catalyze the phosphorylation of phosphoinositide (PI) into PI(3)P and the phosphorylation of PI(4)P into PI(3,4)P2. PIK3C2A is critical for the formation of cilia and for receptor-mediated endocytosis, among other biological functions. We identified loss-of-function mutations in PIK3C2A in children with short stature, coarse facial features, cataracts with secondary glaucoma, multiple skeletal abnormalities, and other findings from three independent consanguineous families. Cellular studies of patient-derived fibroblasts confirmed the loss of PIK3C2A function as evidenced by the lack of

PIK3C2A protein, impaired cilia formation, and decreased levels of PI(3,4)P2.

Additionally, Pik3c2a deficiency in zebrafish also causes cataract formation. Thus, the genetic and molecular data collectively implicate mutations in PIK3C2A in a new

Mendelian disorder of PI metabolism. Identifying the genetic basis for this novel genetic syndrome sheds light on the critical role of a class II PI3K member in growth, vision, skeletal formation and neurological development. In particular, the considerable phenotypic overlap between this syndrome and Lowe syndrome, which also includes cataracts and skeletal malformations and is caused by mutations in OCRL, a gene encoding PI-5-phosphatase, highlights the key role of PI metabolism in specific developmental processes, while demonstrating the unique non-redundant functions of each enzyme. This discovery, together with studies of other disorders of PI metabolism, will enable future studies to discover the molecular and mechanistic basis of this syndrome to better understand the role of PIK3C2A and class II PI3Ks in disease.

Introduction

Identifying the genetic basis of diseases with Mendelian inheritance provides insights into gene function and susceptibility to disease, and can guide the development of new

therapeutics. To date, ~50% of the genes underlying Mendelian phenotypes have yet to

be discovered[101]. The disease genes that have been identified thus far have led to a

better understanding of the pathophysiological pathways and to the development of

medicinal products approved for the clinical treatment of such rare disorders[39].

Furthermore, technological advances in DNA sequencing allowed for the identification of

novel genetic mutations that result in rare Mendelian disorders[67,102]. We have applied

these next-generation sequencing technologies to discover mutations in PIK3C2A that

cause a newly identified genetic syndrome consisting of dysmorphic features, short

stature, cataracts and skeletal abnormalities.

PIK3C2A is a class II member of the phosphoinositide 3-kinase (PI3K) family of lipid

kinases that catalyze the phosphorylation of phosphatidylinositol (PtdIns)[103]. The functions of class II PI3Ks are poorly understood; however, they are generally thought to catalyze the phosphorylation of PtdIns to generate PtdIns(3)P, although this remains controversial[104]. PIK3C2A has been attributed a wide range of biological functions

including glucose transport, angiogenesis, Akt activation, endosomal trafficking,

phagosome maturation, exocytosis, and autophagy[105–113]. In addition, PIK3C2A is

critical for the formation and function of primary cilia [109,112]. However, there is as yet

no causal link between PIK3C2A, or any class II PI3K, and human disease. Here, we

describe the evidence that loss-of-function mutations in PIK3C2A are associated with a

novel syndromic disorder involving neurological, visual, skeletal, growth, and

occasionally conductive hearing impairments.

Results

Identification of mutations in PIK3C2A in patients with syndromic short stature.

Five individuals between the ages of 8 and 21 were identified from three unrelated

consanguineous families who presented with a similar constellation of clinical features including cataracts, secondary glaucoma, skeletal abnormalities, and dysmorphic facial

features (Fig. 3.1, Table 3.1). The dysmorphic facial features included coarse facies, low

hairline, epicanthal folds, flat and broad nasal bridges, and retrognathia (Fig. 3.1B).

Skeletal findings included scoliosis, delayed bone age, diminished ossification of femoral

heads, cervical lordosis, shortened fifth digits with mild metaphyseal dysplasia and

clinodactyly, as well as dental findings such as broad maxillary incisors, narrow

mandibular teeth, and dental enamel defects (Figs. 3.1C, 3.1D, Fig. 3.2). Other recurrent

features included hearing loss, short stature, stroke, developmental delay, and

nephrocalcinosis. For example, individual I-II-2 recently started having seizures, with an

EEG demonstrating sharp waves in the central areas of the right hemisphere and short

sporadic generalized epileptic seizures. Her brain MRI showed a previous stroke in the

right corpus striatum (Fig. 3.1F). In addition, brain MRI of patient II-II-3 showed multiple small frontal and periventricular lacunar infarcts (Fig. 3.2E). Unclear episodes of syncope also led to neurological investigations including EEG in individual III-II-2, without any signs of epilepsy. Her brain MRI showed symmetrical structures and normal cerebrospinal fluid spaces but pronounced lesions of the white matter (Fig. 3.2E).

Fig. 3.1. Pedigrees and pictures of the individuals studied.

(A) Pedigree of three consanguineous families studied. Black shapes indicate affected individuals. Roman numerals representing the generation are indicated on the left and

Arabic numerals representing the individual are indicated below each pedigree symbol.

(B) Photographs of affected individuals under their corresponding pedigree symbol indicate coarse facial features, including a broad nasal bridge, thick columella, and thick alae nasi. Of note, the left eye of patient II-II-2 shows phthisis bulbi of unknown etiology,

as evidenced by an atrophic non-functional eye. Representative images are shown. (C)

An X-ray indicates square-shaped vertebral bodies and a flat pelvis, subluxation of the hips, and meta- and epiphyseal dysplasia of the femoral heads in patient III-II-2. (D) The teeth in patient II-II-3 show broad maxillary incisors, narrow mandibular teeth, and dental enamel defects. (E) The eye with a visible cataract (Cataracta polaris anterior), as indicated by a white arrow, in individual III-II-2. (F) A brain MRI demonstrating areas of altered signal intensity, as indicated by the white arrow, in individual I-II-2.

Table 3.1. Phenotypic characteristics of patients in three families.

Family                            I                       I                       II       II       III
Patient                           II-1                    II-2                    II-2     II-3     II-2
Age (years)                       11                      8                       12       10       20
Gender                            female                  female                  male     male     female
Origin                            Israel (Muslim-Arabic)  Israel (Muslim-Arabic)  Syria    Syria    Tunisia
Consanguineous                    +                       +                       +        +        +
Height                            -1.2 SD                 -2.3 SD                 -2.5 SD  -4.8 SD  -1.9 SD
Weight                            -0.2 SD                 -1.7 SD                 -0.2 SD  -3.9 SD  -1.9 SD
Head circumference                -0.25 SD                N.D.                    +0.9 SD  -1.1 SD  N.D.
Congenital cataract               +                       +                       +        +        +
Secondary glaucoma                +                       +                       +        +        -
Hearing loss                      +                       -                       -        +        +
Scoliosis/Skeletal abnormalities  +                       +                       +        +        +
Teeth abnormalities               +                       +                       +        +        +
Developmental delay               +                       +                       N.D.     N.D.     N.D.
Stroke                            + + + N.D.

"+" indicates presence of trait, "-" indicates absence of trait, N.D., not done. GAG, glycosaminoglycan.

Fig. 3.2. Images of individuals with PIK3C2A deficiency.

Photographic images of (A) teeth, (B) hands, and (C) feet are shown from the five individuals with PIK3C2A deficiency. (D) X-Ray images of the pelvis and (E) MRI

images of the brain are shown when available. White arrows in the MRI images indicate regions of altered signal intensity.

In addition to the shared syndromic features described above in all three families, both affected daughters in Family I were diagnosed with congenital adrenal hyperplasia

(CAH), due to 17-alpha-hydroxylase deficiency, and were found to have a homozygous familial mutation: NM_000102.3:c.286C>T; p.Arg96Trp in the CYP17A1 gene (OMIM

#202110)[114,115]. The affected individuals in Families II and III do not carry mutations in CYP17A1 or have CAH, suggesting the presence of two independent and unrelated conditions in Family I. The co-occurrence of multiple monogenic disorders is not uncommon among this highly consanguineous population[116].

All five affected individuals were born to healthy first-degree cousins, with the exception of hypothyroidism in the mother in Family II (II-1-1), suggesting an autosomal recessive inheritance pattern (Fig. 3.1A). To identify the genetic basis of this disorder, enzymatic assays related to the mucopolysaccharidosis subtypes MPS I, MPS IVA, MPS IVB, and

MPS VI were tested in Families I and II and found to be normal. Enzymatic assays for mucolipidosis II/III were also normal and no pathogenic mutations were found in galactosamine-6-sulfate sulfatase (GALNS) in Family I. Additionally, since some of the features of patient II-II-3 were reminiscent of Noonan syndrome, Hennekam syndrome, and Aarskog-Scott syndrome, individual genes involved in these disorders were analyzed in Family II, but no pathogenic mutation was identified. An atypical presentation of

Williams-Beuren disease and Leri-Weill syndrome were excluded as evidenced by

molecular in patient III-II-2 and a chromosomal analysis, microarray and

molecular testing of FGFR3 were also normal.

Given the unsuccessful targeted genetic testing, WES was performed for the affected

individuals from all three families. After technical and biological filtering of the variants

identified by WES, five candidate variants were identified in Family I, including the

CYP17A1 (p.R96W) mutation that is the cause of the CAH[114,115], but is not known to

cause the other phenotypes. The remaining four variants were in the genes ATF4,

DNAH14, PLEKHA7, and PIK3C2A (Table 3.2). In Family II, sequence and CNV

analysis of the exome data revealed homozygous missense variants in KIAA1549L,

METAP1, and PEX2, in addition to a homozygous deletion in PIK3C2A that

encompassed exons 1-24 out of 32 total exons (Table 3.2, Fig. 3.3B). The deletion was

limited to PIK3C2A and did not affect the neighboring genes. Sequence analysis of

Family III showed a missense variant in PTH2R, a nonsense variant in DPRX, and a

splice site variant in PIK3C2A (Table 3.2).

WES analysis revealed that all affected family members in Families I, II, and III were

homozygous for predicted loss-of-function variants in PIK3C2A, and none of the

unaffected family members was homozygous for the PIK3C2A variants. The initial link

between these three families with rare mutations in PIK3C2A was made possible through

the sharing of information via the GeneMatcher website[67]. The single nucleotide

PIK3C2A variants in Families I and III were confirmed by Sanger sequencing (Fig. 3.3C,

D).

Table 3.2. Candidate variants identified by WES.

Gene       SNP ID       MAF     Type  Effect       Transcript    cDNA                            Protein                   SIFT      SIFT Score  Polyphen2 HVAR  Polyphen2 Score

Family I
ATF4       rs144769713  8.1e-6  SNV   Missense     NM_001675     c.512C>T                        p.Ser171Phe               Damaging  0           Benign          0.188
CYP17A1    rs104894138  3.6e-5  SNV   Missense     NM_000102     c.286C>T                        p.Arg96Trp                Damaging  0           Prob Dam        1
DNAH14     .            .       SNV   Missense     NM_001373     c.5135T>A                       p.Leu1712His              Damaging  0           Prob Dam        0.998
PIK3C2A    .            .       SNV   Nonsense     NM_002645     c.585T>G                        p.Tyr195Ter               .         .           .               .
PLEKHA7    .            .       SNV   Missense     NM_175058     c.2899C>T                       p.Arg967Trp               Damaging  0           Prob Dam        1

Family II
KIAA1549L  rs761694178  4.1e-6  SNV   Missense     NM_012194     c.2132C>A                       p.Pro717Leu               Damaging  0.02        Poss Dam        0.837
METAP1     .            .       SNV   Missense     NM_015143     c.408A>G                        p.Ile136Met               Damaging  0           Prob Dam        0.985
PEX2       rs35689779   7.4e-4  SNV   Missense     NM_000318     c.209A>G                        p.Tyr70Cys                Damaging  0.04        Prob Dam        0.989
PIK3C2A    .            .       DEL   Deletion     NM_002645     c.(0+1_1-1)_(4007+1_4008-1)del  p.0                       .         .           .               .

Family III
PTH2R      .            7.1e-6  SNV   Missense     NM_005048     c.773G>A                        p.Gly258Asp               Damaging  0           Prob Dam        1
DPRX       rs201435914  3.3e-4  SNV   Nonsense     NM_001012728  c.466C>T                        p.Arg156Ter               .         .           .               .
PIK3C2A    .            .       SNV   Splice site  NM_002645     c.1640+1G>T                     p.Asn483_Arg547delinsLys  .         .           .               .

SNV, single nucleotide variant. DEL, deletion. Prob Dam, probably damaging. Poss Dam, possibly damaging. MAF, minor allele frequency (from gnomAD v2.0.2). Overlapped genes among three families are indicated in bold.

Fig. 3.3. Loss-of-function mutations in PIK3C2A.

(A) Diagram of the intron/exon and protein domain structures of PIK3C2A, indicating the location of mutations identified in three independent consanguineous families with homozygous loss-of-function mutations in PIK3C2A. (B) CNV analysis confirmed a homozygous deletion encompassing exons 1-24 out of 32 total exons of PIK3C2A, indicated with the red line. (C) Sanger sequencing confirmed homozygosity for the

PIK3C2A c.585T>G variant in Family I. (D) Sanger sequencing confirmed homozygosity for the PIK3C2A c.1640+1G>T variant in Family III.

In Family I, the nonsense mutation in PIK3C2A (p.Y195*) results in the deletion of 1,492

amino acids from a protein that is 1,686 amino acids long. This is predicted to eliminate nearly

all functional domains, including the catalytic kinase domain, and is expected to trigger

nonsense-mediated mRNA decay[111]. Accordingly, levels of PIK3C2A mRNA are significantly decreased in both heterozygous and homozygous individuals carrying the p.Y195* variant (Fig. 3.4A). The deletion in Family II eliminates the first 24 exons of a

32-exon gene and is therefore not predicted to express any protein. This is consistent with a lack of PIK3C2A mRNA expression (Fig. 3.4B). The PIK3C2A variant in Family III disrupts an essential splice site (c.1640+1G>T) that leads to decreased mRNA levels (Fig.

3.4C) and exon skipping of both exons 5 and 6 (Fig. 3.5). Although this transcript remains in-frame, no PIK3C2A protein was detected by Western blotting (Fig. 3.4D).

This is consistent with Families I and II, for which western blotting also failed to detect

any full-length PIK3C2A in fibroblasts from the affected homozygous children (Fig.

3.4E). Thus, all three PIK3C2A variants likely encode loss-of-function alleles.

Importantly, among the 141,352 WES and WGS from control individuals in the Genome

Aggregation Database (gnomAD)[117], none are homozygous for loss-of-function

mutations in PIK3C2A, which is consistent with total PIK3C2A deficiency causing

severe early onset disease.

Fig. 3.4. Protein and mRNA levels of PIK3C2A in patient-derived cells.

PIK3C2A mRNA levels were detected by qRT-PCR in patient-derived fibroblasts from

(A) Family I, (B) Family II, and (C) Family III. Whole cell lysates from fibroblasts of healthy controls (WT), heterozygous parents, and affected individuals from (D) Family

III and (E) Families I and II were analyzed by Western blotting for PIK3C2A and the loading controls Actin or GAPDH. Epitopes of anti-PIK3C2A antibodies (AB1-AB4) are

detailed in Table 3.5. * indicates p < 0.05. ** indicates p < 0.01. *** indicates p < 0.0001.

qRT-PCR data are represented as mean ± SEM (n=3-4 technical replicates per sample).

Fig. 3.5. PIK3C2A exon skipping in individual III-II-2.

The c.1640+1G>T mutation in PIK3C2A disrupts the splice donor site in intron 6 and leads to skipping of exons 5 and 6. Chromatograms are from sequenced RT-PCR products from cDNA of fibroblasts from wild-type control and patient fibroblasts using primers located in exons 3 and 10. Positions of primers are indicated by orange arrows and the position of the splice site mutation is indicated by a black arrow.

Identification of cellular defects in patient-derived fibroblasts.

To test whether cellular phenotypes were consistent with loss-of-function mutations in

PIK3C2A, we examined cellular and cilia-localized PI(3,4)P2 levels as well as cilia length in control- and patient-derived primary fibroblasts. PIK3C2A deficiency in the patient-derived fibroblasts profoundly decreased PI(3,4)P2 throughout the cell in non-ciliated cells (Fig. 3.6A) and within primary cilia (Fig. 3.6B). Cilia length was also reduced in PIK3C2A-deficient cells relative to control cells (Fig. 3.6C), although the percentage of ciliated cells was not altered (Fig. 3.6D). Despite the reduction of PI(3,4)P2 in cilia, the localization of other ciliary components was not affected (Fig. 3.7).

Fig. 3.6. Cilia defects in patient-derived fibroblasts.

Immunofluorescence studies on (A) non-ciliated and (B) ciliated fibroblasts using anti-

clathrin heavy chain (CHC), anti-α-Tubulin, and anti-PI(3,4)P2 antibodies demonstrate

decreased enrichment of the PIK3C2A product PI(3,4)P2 in non-ciliated cells and a loss

of the ciliary localization in affected individuals. Nuclei are stained with DAPI. (C) Cilia

length and (D) cilia number in primary fibroblasts from affected individuals and unrelated control cells. *** indicates p < 0.0001. Data are represented as mean ± SEM

(n>300/sample).

Fig. 3.7. Localization of ciliary markers in patient-derived PIK3C2A deficient fibroblasts.

Co-immunofluorescence studies on ciliated human fibroblasts using anti-IFT88, anti-PC1, anti-IFT54, anti-Rab11, and anti-PI(3)P antibodies shown in green and anti-α-Tubulin antibody shown in red. No difference in ciliary localization between wild-type and affected individuals was detected.

Pik3c2a deficiency causes cataracts in a zebrafish model.

To determine what features are caused by Pik3c2a deficiency in a model organism, we generated and examined two zebrafish models with nonsense mutations in pik3c2a.

Embryos with the alleles sa10124 and sa12328 were created as part of the Zebrafish

Mutation Project[118] and were obtained by in vitro fertilization from frozen sperm

samples by the Zebrafish International Resource Center. The alleles sa10124 and sa12328 encode nonsense mutations in pik3c2a at amino acids 585 and 1236, respectively, that were confirmed by Sanger sequencing. Both nonsense mutations occur prior to the end of the catalytic domain and are thus predicted to encode null alleles. We generated homozygous pik3c2asa12328/sa12328 and pik3c2asa10124/sa10124 zebrafish, as well as

compound heterozygous pik3c2asa12328/sa10124 mutants by intercrossing heterozygous

adults. As these alleles were generated by random ENU mutagenesis[118], analysis of the

compound heterozygous pik3c2asa12328/sa10124 mutants minimized the likelihood of homozygosity for any unlinked ENU-induced mutations. The frequency of offspring genotypes from each pik3c2a+/- intercross was expected to follow a Mendelian (1:2:1)

ratio. The 1:2:1 genotype ratio was observed for offspring at both 1- and 3-weeks post fertilization (Table 3.3) and the gross morphology of the mutants at these ages was indistinguishable from that of control fish. However, no pik3c2a-/- zebrafish survived beyond 3 months, demonstrating that pik3c2a is required for viability into adulthood

(Table 3.3).

Table 3.3. Survival of offspring from pik3c2a heterozygous knockout zebrafish (+/-) crosses.

Cross                      Age          pik3c2a genotype        p value
                                        -/-     +/-     +/+
sa10124+/- x sa12328+/-    1 week       33      83      31      0.29
sa10124+/- x sa12328+/-    3 weeks      16      25      10      0.49
sa10124+/- x sa12328+/-    3-5 months   0       33      11      0.0003
sa10124+/- x sa10124+/-    1 week       9       15      10      0.77
sa10124+/- x sa10124+/-    3-5 months   0       5       3       0.25
sa12328+/- x sa12328+/-    1 week       5       12      7       0.33
sa12328+/- x sa12328+/-    3-5 months   0       13      7       0.04
Combined                   1 week       47      110     48      0.5748
Combined                   3-5 months   0       51      21      0.0001
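
The test used to compare the observed genotype counts with the expected 1:2:1 ratio is not named in the text. Most of the p values in Table 3.3 are consistent with a chi-square goodness-of-fit test with two degrees of freedom, so the following Python sketch assumes that test. The function name `mendelian_chi_square` is hypothetical and is used only for illustration.

```python
import math


def mendelian_chi_square(hom_mut, het, hom_wt):
    """Chi-square goodness-of-fit test of genotype counts against a 1:2:1 ratio."""
    observed = [hom_mut, het, hom_wt]
    total = sum(observed)
    expected = [total * 0.25, total * 0.5, total * 0.25]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With three classes there are 2 degrees of freedom, for which the
    # upper-tail chi-square probability is exactly exp(-chi2 / 2).
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value


# Counts (-/-, +/-, +/+) from the sa10124+/- x sa12328+/- cross in Table 3.3.
for age, counts in [
    ("1 week", (33, 83, 31)),
    ("3 weeks", (16, 25, 10)),
    ("3-5 months", (0, 33, 11)),
]:
    chi2, p = mendelian_chi_square(*counts)
    print(f"{age}: chi2 = {chi2:.2f}, p = {p:.4f}")
```

Under this assumption, the three rows of the compound heterozygous cross give p values of approximately 0.29, 0.49, and 0.0003, matching the values reported in the table.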

Given the extended timeframe of pik3c2a-/- viability in zebrafish relative to the mouse

Pik3c2a knockout model, which is embryonic lethal[106], we used the zebrafish model of

Pik3c2a deficiency to test for phenotypic similarities with the features of PIK3C2A deficiency in humans. We focused on the presence of cataracts, given that it was a robust and early phenotype present in all affected individuals described above. Zebrafish were screened for lenticular abnormalities by coaxial illumination. A masked grader evaluated digital photographs and videos of each animal and graded the lens as normal or abnormal.

Each animal was genotyped following the cataract grading. All pik3c2a-/- animals

evaluated (n=7) displayed lenticular abnormalities, whereas only one control pik3c2a+/- or

wild-type zebrafish (n=7) displayed any lenticular abnormalities (p < 0.005, two-tailed

Fisher’s exact test) (Fig. 3.8). The mutant animals had a circular defect which was more obvious in the posterior aspect of the lens, reminiscent of posterior lenticonus (Fig. 3.8B).
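
The two-tailed Fisher's exact test reported above can be reproduced from the counts given in the text (7 of 7 mutants and 1 of 7 controls with lenticular abnormalities). The following Python sketch is a minimal illustration using only the standard library; the function name `fisher_exact_two_tailed` is hypothetical, and the two-tailed p value is computed with the common convention of summing all tables that are no more probable than the observed one.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= p_observed * (1 + 1e-9)
    )


# Rows: pik3c2a-/- and control animals; columns: abnormal lens, normal lens.
p = fisher_exact_two_tailed(7, 0, 1, 6)
print(f"p = {p:.4f}")
```

For these counts the result is 14/3003, or approximately 0.0047, in agreement with the reported p < 0.005.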

Fig. 3.8. Pik3c2a deficiency in zebrafish causes cataracts.

Coaxial illumination of a (A) wild-type eye, and a (B) pik3c2a-/- zebrafish eye resembling posterior lenticonus. Toluidine blue staining of a (C) wild-type eye, and a (D) pik3c2a-/- zebrafish eye.

Discussion

Here we describe the identification of three independent families with loss-of-function

mutations in PIK3C2A, resulting in a novel syndrome displaying short stature, cataracts,

secondary glaucoma, and skeletal abnormalities among other features (Table 3.1).

Interestingly, we observed in patient-derived fibroblasts shortening of the cilia and

decreased levels of ciliary PI(3,4)P2 (Fig. 3.6). Thus, based on the loss-of-function

mutations in PIK3C2A, the phenotypic overlap between the three independent families,

the patient-derived cellular data consistent with PIK3C2A deficiency, and the presence of

cataracts in both patients and pik3c2a-/- zebrafish, we conclude that loss-of-function

mutations in PIK3C2A cause this novel syndrome.

The PIK3C2A loss-of-function mutations identified here are the first mutations with Mendelian inheritance identified in any class II PI-3-kinase, and thus shed light on the biological role of this poorly understood class of PI-3-kinases[104,119]. This is significant not only for understanding the role of PIK3C2A in

rare monogenic disorders, but also for the potential contribution of common variants in

PIK3C2A in more genetically complex disorders. Often, severe mutations in rare

Mendelian disorders can highlight the biological function of genes in a developmental

process in which other less severe variants in that gene are likely to contribute[120,121].

For example, severe mutations in PPARG cause monogenic lipodystrophy, whereas less severe variants are associated with complex polygenic forms of lipodystrophy[122,123].

In the case of PIK3C2A deficiency, the identification of delayed neurological

development in Family I may provide biological insight into the mechanisms underlying

the association between common variants in PIK3C2A and schizophrenia[124–126].

Likewise, the short stature in PIK3C2A deficient patients calls attention to the SNPs

rs1330 and rs757081 that are both less than 125 kilobases from the PIK3C2A gene and

are significantly associated with human height[127,128]. Of note, PIK3C2A is required for sonic hedgehog signaling[112], and variation in this pathway has previously been implicated in the regulation of human height[129].

Other monogenic disorders of phosphoinositide metabolism include Lowe syndrome, which shares many features with PIK3C2A deficiency, including congenital cataracts, secondary glaucoma, kidney defects, skeletal abnormalities, developmental delay, and short stature[130,131]. The enzyme defective in Lowe syndrome, OCRL, is a

5-phosphatase that is required for membrane trafficking and ciliogenesis, similar to

PIK3C2A [132]. The similarities between Lowe syndrome and PIK3C2A deficiency suggest that similar defects in phosphatidylinositol metabolism, perhaps related to

deficiency of PI(3,4)P2, which was greatly reduced in PIK3C2A patient-derived

fibroblasts (Fig. 3.6), may underlie both disorders. In addition to Lowe syndrome, there is

partial overlap between PIK3C2A deficiency and other Mendelian disorders of PI

metabolism, such as the early-onset cataracts in patients with INPP5K

deficiency[133,134], demonstrating the importance of PI metabolism in lens development.

The viability of patients with PIK3C2A deficiency and pik3c2a deficient zebrafish

suggests differences between the biological functions of human PIK3C2A and the mouse

ortholog. Mouse knockout models of Pik3c2a result in growth retardation by e8.5 and

embryonic lethality between e10.5-11.5 due to vascular defects[106]. It remains to be

determined whether the species viability differences associated with PIK3C2A deficiency result from altered PIK3C2A function between humans and mice or from altered compensation by other PI-metabolizing enzymes. For instance, there are species-specific differences between humans and mice in the transcription and splicing of the

OCRL homolog INPP5B that may uniquely contribute to PI metabolism in each species[135]. Interestingly, the splicing pattern of inpp5b in zebrafish is similar to that in

humans[135], and the pik3c2a-/- zebrafish survives considerably longer than the mouse

Pik3c2a knockout. However, other biological functions appear to be conserved between

humans and mice. For instance, deletion of Pik3c2a in adult mice resulted in prolonged

bleeding time and demonstrated that Pik3c2a is required for platelet function[136]. The

brain MRIs in two of the PIK3C2A deficient patients also detected evidence of bleeding

within the brain (Fig. 3.2E), suggesting that PIK3C2A is required for maintaining proper

hemostasis.

It is intriguing that both PIK3C2A and OCRL have important roles in cilia formation.

Primary cilia are evolutionarily conserved microtubule-derived cellular organelles that

protrude from the surface of most mammalian cell types. They play a pivotal role in a

number of processes, such as left-right patterning during embryonic development, cell growth, and differentiation. The importance of primary cilia in embryonic development

and tissue homeostasis has become evident over the past two decades, as a number of

proteins which localize to the cilium harbor defects causing syndromic diseases,

collectively known as ciliopathies. Hallmark features of ciliopathies include skeletal

abnormalities, progressive vision and hearing loss, mild to severe intellectual disabilities,

polydactyly, and kidney phenotypes. Primary cilia formation is initiated by a cascade of

processes involving the targeted trafficking and docking of Golgi-derived vesicles near

the mother centriole. Interestingly, phosphatidylinositol metabolism has been linked to

ciliary dysfunction[137] and PIK3C2A loss has been associated with impaired

ciliogenesis in MEFs, likely due to defective trafficking of ciliary components[112].

Further work and the identification of additional PIK3C2A patients will be needed to

better understand the phenotype-genotype correlation associated with PIK3C2A deficiency. However, the identification of the first patients with PIK3C2A deficiency establishes that this enzyme is not required for viability in humans. Additionally, the clinical presentations of the five PIK3C2A deficient patients identified thus far clearly establish a role for PIK3C2A in neurological and skeletal development, as well as in vision and growth.

Materials and Methods

Human studies. The study was approved by the ethics committees of Rambam Hospital,

Haifa, Israel, of the University Medical Center, Groningen, Netherlands, and University

Hospital, Erlangen, Germany. Informed consent was obtained from all participants.

Whole exome sequencing (WES). WES of two patients from Family I was performed using 1 µg of DNA that was extracted from whole blood, then fragmented and enriched using the TruSeq DNA PCR-Free kit (Illumina). Samples were sequenced on one lane of a HiSeq2500 (Illumina) with 2x100bp read length and analyzed as described[138]. Raw fastq files were mapped to the reference human genome GRCh37, and the two mapped SAI files were combined using the BWA package[139] (v. 0.7.12) and output as BAM files. To preprocess the data, duplicate reads were marked and removed by Picard (v. 1.119). In addition, local realignment and base quality score recalibration were performed following the GATK pipeline[140] (v. 3.3). The resulting numbers of sequencing reads for the older sister and younger sister were 32,358,405 and 38,507,957, respectively. The average read depth was 98x and 117x, respectively. Subsequently, HaplotypeCaller was used to call SNPs and indels. For SNPs, the filters used were QD < 2; MQ < 60; FS > 40; MQRankSum < −12.5; ReadPosRankSum < −8; DP <= 10. For indels, the filters used were QD < 2; FS > 200; ReadPosRankSum < −20; DP <= 10. The variants that passed these filters were further annotated with Annovar[141]. Databases used in Annovar were RefSeq[142], EXAC[117] (v. exac03), CLINVAR[143] (v. clinvar_20150330) and LJB

database[144] (v. ljb26_all). Exome variants in Family I were filtered out if they were not homozygous in both affected individuals, had a population allele frequency greater than 0.1% in either the Exome Aggregation Consortium database[15] or the Greater Middle East Variome Project[16], or were not predicted to be deleterious by either SIFT[145] or Polyphen2[146].
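
The filtering logic for Family I can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline that was actually run: the function names, dictionary keys, and example annotation values are hypothetical, every annotation is assumed to be present for every call, and a variant is treated as excluded if it meets any one of the listed thresholds. A variant is also assumed to count as deleterious if at least one of SIFT or Polyphen2 predicts it to be.

```python
def fails_hard_filter(variant_type, ann):
    """Return True if the call meets any exclusion threshold listed in the text."""
    if variant_type == "SNP":
        return (
            ann["QD"] < 2
            or ann["MQ"] < 60
            or ann["FS"] > 40
            or ann["MQRankSum"] < -12.5
            or ann["ReadPosRankSum"] < -8
            or ann["DP"] <= 10
        )
    if variant_type == "INDEL":
        return (
            ann["QD"] < 2
            or ann["FS"] > 200
            or ann["ReadPosRankSum"] < -20
            or ann["DP"] <= 10
        )
    raise ValueError(f"unknown variant type: {variant_type}")


def is_family1_candidate(variant, max_af=0.001):
    """Keep rare, predicted-deleterious variants homozygous in both affected sisters."""
    homozygous_in_both = all(gt == "hom_alt" for gt in variant["genotypes"])
    rare = variant["exac_af"] <= max_af and variant["gme_af"] <= max_af
    deleterious = variant["sift_deleterious"] or variant["polyphen2_deleterious"]
    return homozygous_in_both and rare and deleterious


# Hypothetical annotation values, for illustration only.
good_snp = {"QD": 25.0, "MQ": 60.0, "FS": 1.2,
            "MQRankSum": 0.0, "ReadPosRankSum": 0.3, "DP": 95}
low_depth_snp = dict(good_snp, DP=8)
print(fails_hard_filter("SNP", good_snp))
print(fails_hard_filter("SNP", low_depth_snp))

candidate = {"genotypes": ["hom_alt", "hom_alt"], "exac_af": 0.0, "gme_af": 0.0,
             "sift_deleterious": True, "polyphen2_deleterious": False}
print(is_family1_candidate(candidate))
```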

WES was performed on the two affected individuals of Family II and both their parents

essentially as previously described[147]. Target regions were enriched using the Agilent

SureSelectXT Human All Exon 50Mb Kit. Whole-exome sequencing was performed on

the Illumina HiSeq platform (BGI, Copenhagen, Denmark) followed by data processing

with BWA[139] (read alignment) and GATK[140] (variant calling) software packages.

Variants were annotated using an in-house developed pipeline. Prioritization of variants

was done by an in-house designed ‘variant interface’ and manual curation.

The DNAs of Family III were enriched using the SureSelect Human All Exon Kit v6

(Agilent, Santa Clara, CA) and sequenced on an Illumina HiSeq 2500 (Illumina, San

Diego, CA). Image analysis and base calling were performed using HiSeq instrument

control software with default parameters. After demultiplexing with bcl2fastq v1.8.4

from Illumina, read alignment was performed with BWA[139] version 0.7.8 using the

bwa mem algorithm with the human genome assembly hg19 (GRCh37) as a reference.

Duplicate reads were marked with Picard (version 1.111). The average read depth was

95x (III-II-2), 119x (III-I-1) and 113x (III-I-2). Single-nucleotide variants and small

insertions and deletions (indels) were detected using five different callers:

HaplotypeCaller and UnifiedGenotyper of the aforementioned Genome Analysis Toolkit,

SNVer[148], freeBayes[149], and Platypus[150]. Variant annotation was performed using

ANNOVAR[141], CLINVAR[143], OMIM and MedGen. Variants were selected that

were covered by at least 10% of the average coverage of each exome and for which at

least 5 novel alleles were detected from 2 or more callers. All modes of inheritance were

analyzed[151]. Variants were prioritized based on a population frequency of 10^-3 or below (based on the ExAC database[117] and an in-house variant database), on evolutionary conservation, and on the mutation severity prediction (CADD score 15 or higher). All remaining variants and their segregation in the family were confirmed by Sanger sequencing.

Copy number variants analysis. Microarray analysis for copy number variant detection in Family I was performed using a HumanOmni5-Quad chip (Illumina). SNP array raw

data was mapped to the reference human genome GRCh37 and analyzed using

GenomeStudio (v. 2011/1). Signal intensity files with Log R ratio and B-allele frequency

were further analyzed with PennCNV[152] (v. 2014/5/7) to detect copy number variants.

CNV analysis on the WES data of Family II was performed by CNV calling using

CoNIFER[153]. Variants were annotated using an in-house developed pipeline.

Prioritization of variants was done by an in-house designed ‘variant interface’ and

manual curation as described before[154]. Subsequent segregation analysis of the

pathogenic CNV in Family II was performed with Multiplex Amplicon Quantification (MAQ; Multiplicom, Niel, Belgium) using a targeted primer set with primers in exons 3, 10, 20, and 24, which are located within the deletion, and in exons 28, 32, and 34, which are located outside of the deletion.

Sanger Sequencing. DNA was extracted from patients’ and controls’ blood cells or fibroblasts. Sanger sequencing was performed by GenScript. The data were visualized and analyzed using FinchTV (Geospiza). Candidate variants identified by WES were

PCR amplified and sequenced with the primers listed in Table 3.4.

Cell culture. Human dermal fibroblasts were obtained from sterile skin punches cultured in DMEM (Dulbecco's Modified Eagle's Medium) supplemented with 10 - 20% Fetal

Calf Serum, 1% Sodium Pyruvate and 1% Penicillin and streptomycin (P/S) in 5% CO2 at 37°C. Control fibroblasts were obtained from healthy age-matched volunteers.

Fibroblasts from passages 4–8 were used for the experiments.

Western Blotting. Protein was extracted from cultured primary fibroblast cells as described[155,156]. Extracts were quantified using the DC protein assay (BioRad) or the

BCA method. Equal amounts of protein were separated by SDS-PAGE and electrotransferred onto polyvinylidene difluoride membranes (Millipore, Billerica,

Massachusetts, USA). Membranes were blocked with TBST/5% fat-free dried milk and

stained with antibodies as detailed in Table 3.5. Secondary antibodies were goat anti-rabbit (1:5,000, Thermo Fisher #31460), goat anti-mouse (1:5,000, Thermo Fisher

#31430), goat anti-rabbit (1:2,000, Dako #P0448), and goat anti-mouse (1:2,000, Dako

#P0447).

Cilia analysis. To induce ciliogenesis, cells were grown in DMEM with 0 - 0.2% FCS for 48 hours. Immunofluorescence staining was performed in a biological triplicate.

Briefly, cells were washed in PBS, then fixed and permeabilized in ice-cold methanol for

5 minutes, followed by extensive washing with PBS. After blocking in 5% Bovine Serum

Albumin, cells were incubated with primary antibodies for 1.5 hours at room temperature

(RT) and extensively washed in PBS-T. Primary antibodies used for Centrin and ARL13B are detailed in Table 3.5. Subsequently, cells were incubated with secondary antibodies, Alexa Fluor 488 (1:800, Invitrogen) and Alexa Fluor 568 (1:800, Invitrogen), for 45 min followed by washing with PBS-T. Finally, cells were briefly rinsed in ddH2O and samples were mounted using Vectashield with DAPI. Images were taken using an Axio Imager Z2 microscope with an Apotome (Zeiss) at 63x magnification. Cilia were measured manually using Fiji software, taking the whole length of the cilium based on ARL13B staining. At least 300 cilia were measured per sample. Cilia lengths were pooled for 3 control cell lines and compared to 2 patient-derived samples. Statistical significance was calculated using a Student's t-test.

cDNA and quantitative real-time PCR. Total RNA was purified from primary fibroblasts using the PureLink RNA purification kit (ThermoFisher) or RNAPure peqGOLD (Peqlab, Darmstadt, Germany). RNA was reverse transcribed into complementary DNA with random hexamers using a High-Capacity cDNA Reverse Transcription Kit (Thermo Fisher Scientific, Waltham, USA). RT-PCR to detect exon skipping in Family III was performed using primers flanking exon 6 (Fig. 3.5). Gene expression was quantified by SYBR Green real-time PCR using the CFX Connect Real-Time System (BioRad, München, Germany). Primers used are detailed in Table 3.4.

Expression levels were calculated using the ΔΔCT method relative to GAPDH.
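
As a brief illustration of the ΔΔCT calculation, the Python sketch below normalizes the target gene to the reference gene (GAPDH) within each sample and then compares the patient sample with the control sample. It assumes that the amount of product doubles with each PCR cycle; the function name `relative_expression` and the CT values are hypothetical and are not data from this study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the delta-delta-CT method."""
    # Normalize the target gene to the reference gene within each sample.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized sample with the normalized control.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    # Assumes the amount of product doubles with every cycle.
    return 2 ** (-delta_delta_ct)


# Hypothetical CT values: target at 27.0 cycles in the patient sample and
# 24.0 cycles in the control, with the reference gene at 18.0 cycles in both.
fold_change = relative_expression(27.0, 18.0, 24.0, 18.0)
print(fold_change)
```

In this hypothetical example the target transcript crosses the threshold three cycles later in the patient sample, which corresponds to one eighth of the control expression level.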

Immunostaining. Cells were grown on glass coverslips to approximately 80%-90% confluency in DMEM + 10% FCS + 1% P/S, at which time the medium was replaced with DMEM without FCS for 48 hours to induce ciliogenesis. Cells were fixed in either methanol for 10 minutes at -20°C or 4% paraformaldehyde for 10 minutes at RT. Fixed cells were washed in PBS, and incubated with 10% normal goat serum, 1% bovine serum albumin in PBS for 1 hour at RT. If cells were fixed with paraformaldehyde, blocking solutions contained 0.5% Triton X-100. Cells were incubated with primary antibody overnight at 4°C, washed in PBS, and incubated with secondary antibody including

Diamidino-2-Phenylindole (DAPI) to stain nuclei for 1 hour at RT. Coverslips were mounted on glass slides with fluoromount (Science Services, München, Germany) and imaged on a confocal laser scanning system with a 63x objective (LSM 710, Carl Zeiss

MicroImaging, Jena, Germany). Primary antibodies are detailed in Table 3.5.

Zebrafish (Danio rerio). Zebrafish strains carrying the sa10124 and sa12328 alleles were purchased from the Zebrafish International Resource Center (ZIRC). Zebrafish were kept with the approval of the Case Western Reserve University Institutional Animal Care and Use Committee (protocol number 2015-0139) in a 16h light/8h dark cycle. Fish were euthanized by chilling at 4 degrees Celsius.

The pik3c2a knockout allele was genotyped using the dCAPS method[157]. The primers

were designed using the web-based software program dCAPS Finder 2.0 (Table 3.4)[158].

The 2nd PCR introduced a restriction site for the enzyme DdeI in the wild-type allele of

pik3c2a but not the sa10124 allele, or introduced a restriction site for the enzyme TaqI

in the wild-type allele of pik3c2a but not the sa12328 allele. The digested PCR products

were analyzed by electrophoresis on a 3.5% agarose gel. For the sa10124 allele, PCR and

digestion of the mutant pik3c2a allele produced a 223 bp product, whereas the wild-type

allele produced a 206 bp and 17 bp product. For the sa12328 allele, PCR and digestion of

the mutant pik3c2a allele produced a 228 bp product, whereas the wild-type allele produced a 205 bp and 23 bp product.
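
The band patterns described above translate into genotype calls as sketched below in Python. The fragment sizes and enzymes are taken from the text; the names `DCAPS_ASSAYS` and `call_genotype` are hypothetical, and the sketch makes the simplifying assumptions that band sizes are read exactly and that the larger digestion fragment alone is sufficient to indicate the wild-type allele.

```python
# Expected fragment sizes (bp) after digestion of the second-round PCR product.
# The restriction site is present only in the wild-type allele, so the mutant
# allele remains uncut.
DCAPS_ASSAYS = {
    "sa10124": {"enzyme": "DdeI", "uncut_bp": 223, "cut_bp": (206, 17)},
    "sa12328": {"enzyme": "TaqI", "uncut_bp": 228, "cut_bp": (205, 23)},
}


def call_genotype(allele, observed_bands_bp):
    """Call a pik3c2a genotype from the band sizes seen on the gel."""
    assay = DCAPS_ASSAYS[allele]
    bands = set(observed_bands_bp)
    has_mutant = assay["uncut_bp"] in bands
    # Assumption: the larger cut fragment is used to score the wild-type allele.
    has_wild_type = assay["cut_bp"][0] in bands
    if has_mutant and has_wild_type:
        return "+/-"
    if has_mutant:
        return "-/-"
    if has_wild_type:
        return "+/+"
    return "no call"


print(call_genotype("sa10124", [223]))
print(call_genotype("sa10124", [223, 206, 17]))
print(call_genotype("sa12328", [205, 23]))
```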

Cataracts evaluation in zebrafish. Adult zebrafish were anesthetized with tricaine and examined. Coaxial illumination using a Leica M841 surgical microscope was used to visualize lenticular defects. Digital video recordings were made using a Panasonic GP-

US932A HD camera system and later reviewed by an ophthalmologist without

knowledge of the genotype of each animal. Optical sectioning by changing the z-axis

focus was used to aid in the identification of cataracts. All zebrafish examined for cataracts were offspring of a cross between pik3c2asa10124/+ and pik3c2asa12328/+ fish.

Table 3.4. List of primers used in this study.

Primer                                Forward                    Reverse                      Purpose
PIK3C2A_cDNA Exon3                    GACATTGAAGGATTTCAGCTACC    -                            Splicing effect
PIK3C2A_cDNA Exon10                   -                          GCACAGTCTGTAGGACTCCTACC      Splicing effect
PIK3C2A cDNA Exon 1-2                 CTCAGCTTGCAAAAGCCCAG       CTGGGTTTGTGCGGTGATTG         Gene expression
PIK3C2A_cDNA Exon24                   GTGCTGACCTCTGATATGGC       CAAGTTGTAGGCCTGACAGC         Gene expression
ATF4                                  TAGATGACCTGGAAACCATGC      GGGCTCATACAGATGCCACTA        Sequencing
DNAH14                                GGTGGAGTAGAGCTCCCAGA       GGTACAGTCCCAGGTCATCC         Sequencing
PLEKHA7                               CACTCCCCGAACTCTACAGC       CAGCTCAGGCTCACTGACAT         Sequencing
PIK3C2A                               ACAGTGGCCACCTGGATTAC       TCAGTCCTTGCTTTCCCATT         Sequencing, Family I
PIK3C2A                               TTATTGTGGCTGAAGGATGC       GACAATAGAAAGACCAAAGAGTGG     Sequencing, Family III
GAPDH cDNA                            AATCCCATCACCATCTTCCA       TGGACTCCACGACGTACTCA         Gene expression
pik3c2a sa10124 (1st primer pair)     GCAACTCCACAGATGCGATA       ACCTCTGGTGAGCGTGTTCT         Genotyping (dCAPS)
pik3c2a sa10124 (2nd primer pair)     GCAACTCCACAGATGCGATA       TCAACTTCATCCAGAGCTCA         Genotyping (dCAPS)
pik3c2a sa12328 (1st primer pair)     AACCTCACTCCCATGACCTC       CAACAGAACTGCTGCCATGT         Genotyping (dCAPS)
pik3c2a sa12328 (2nd primer pair)     AACCTCACTCCCATGACCTC       TGTCCTTGAAGGAACCCGTCACTCG    Genotyping (dCAPS)

Table 3.5. List of antibodies used in this study.

Antigen                        Host     Catalog #     Source                                                 Dilution IF   Dilution IB
PIK3C2A (AB1)                  rabbit   ---           Gift from Prof. Haucke (Berlin), epitope: a.a. 2-365   1:200         1:1,000
PIK3C2A (AB2)                  mouse    Sc-365290     Santa Cruz, epitope: a.a. 61-360                       1:50          1:1,000
PIK3C2A (AB3)                  rabbit   12402         Cell Signaling, epitope: ~ a.a. 717                    -             1:1,000
PIK3C2A (AB4)                  rabbit   22028-1-AP    Proteintech                                                          1:1000
Acetylated α-Tubulin (Lys40)   mouse    T7451         Sigma                                                  1:300         --
Acetylated α-Tubulin (Lys40)   rabbit   5335          Cell Signaling                                         1:200         --
Clathrin heavy chain           rabbit   Ab21679       Abcam                                                  1:200         1:1,000
IFT88                          rabbit   13967-1-AP    ProteinTech                                            1:50          --
PI(3,4)P2                      mouse    Z-P034b       Echelon                                                1:150         --
PI(3)P                         mouse    Z-P003        Echelon                                                1:100         --
Polycystin1                    rabbit   Ab74115       Abcam                                                  1:50          --
RAB11A                         rabbit   Ab65200       Abcam                                                  1:100         --
TRAF3IP1 (IFT54)               rabbit   A104577       Atlas                                                  1:50          --
GAPDH                          mouse    MA5-15738     Thermo Fisher                                          --            1:10,000
Centrin                        mouse    04-1624       Millipore                                              1:500         --
ARL13B                         rabbit   17711-1-AP    Proteintech                                            1:500         --

IF, immunofluorescence; IB, immunoblot; a.a., amino acid

Chapter 4. Widespread epistasis regulates glucose homeostasis and gene expression

The study in this chapter was published as:

Anlu Chen, Yang Liu, Scott M. Williams, Nathan Morris, David A. Buchner. Widespread epistasis regulates glucose homeostasis and gene expression. PLoS Genetics. 2017, 13(9): e1007025.

Abstract

The relative contributions of additive versus non-additive interactions in the regulation of

complex traits remain controversial. This may be in part because large-scale epistasis

has traditionally been difficult to detect in complex, multi-cellular organisms. We

hypothesized that it would be easier to detect interactions using mouse chromosome

substitution strains that simultaneously incorporate allelic variation in many genes on a

controlled genetic background. Analyzing metabolic traits and gene expression levels in

the offspring of a series of crosses between mouse chromosome substitution strains

demonstrated that inter-chromosomal epistasis was a dominant feature of these complex

traits. Epistasis typically accounted for a larger proportion of the heritable effects than

those due solely to additive effects. These epistatic interactions typically resulted in trait

values returning to the levels of the parental CSS host strain. Due to the large epistatic

effects, analyses that did not account for interactions consistently underestimated the true

effect sizes due to allelic variation or failed to detect the loci controlling trait variation.

These studies demonstrate that epistatic interactions are a common feature of complex

traits and thus identifying these interactions is key to understanding their genetic regulation.

Introduction

The genetic basis of complex traits and diseases results from the combined action of

many genetic variants [159]. However, it remains unclear whether these variants act

individually in an additive manner or via non-additive epistatic interactions. Epistasis has

been widely observed in model organisms such as S. cerevisiae [160,161], C. elegans

[162], D. melanogaster [163] and M. musculus [164]. However, it has been more difficult

to detect in humans, potentially due to their diverse genetic backgrounds, low allele

frequencies, limited sample sizes, complexity of interactions, insufficient effect sizes, and

methodological limitations [165,166]. Nonetheless, a number of genome-wide interaction-

based association studies in humans have provided evidence for epistasis in a variety of

complex traits and diseases [31–37]. However, concerns remain over whether observed

epistatic interactions are due to statistical or experimental artifacts [167,168].

To better understand the contribution of epistasis to complex traits, we studied mouse chromosome substitution strains (CSSs) [169]. For each CSS, a single chromosome in a

host strain is replaced by the corresponding chromosome from a donor strain. This

provides an efficient model for mapping quantitative trait loci (QTLs) on a fixed genetic

background. This is in contrast to populations with many segregating variants such as

advanced intercross lines [170], heterogeneous stocks [171], or typical analyses in humans.

Given the putative importance of genetic background effects in complex traits [172,173],

we hypothesized the fixed genetic backgrounds of CSSs can provide a novel means for

detecting genetic interactions on a large scale [169,174]. Previous studies of CSSs with

only a single substituted chromosome suggested that non-additive epistatic interactions

between loci were a dominant feature of complex traits [164]. However, identifying the interacting loci, or at least their chromosomal locations, requires the analysis of genetic variation in multiple genomic contexts [175]. We thus extended the analysis of single

chromosome substitutions by analyzing a series of CSSs with either one or two

substituted chromosomes, collectively representing the pairwise interactions between

genetic variants on the substituted chromosomes. This experimental design can directly

identify and map loci that are regulated by epistasis by analyzing the phenotypic effects

of genetic variants on multiple fixed genetic backgrounds. Here we report the widespread

effects of epistasis in controlling complex traits and gene expression. The detection of

true epistatic interactions will improve our understanding of trait heritability and genetic

architecture, as well as provide insights into the biological pathways that underlie disease

pathophysiology [176]. Knowing about epistasis will also be essential for guiding precision medicine-based decisions by interpreting specific variants in appropriate contexts.

Results

Contribution of epistasis to metabolic traits.

Body weight and fasting plasma glucose levels were measured in a total of 766 control

and CSS mice (Table 4.1, Fig. 4.1). Raw data of body weight and plasma glucose measurements is available at https://doi.org/10.1371/journal.pgen.1007025.s011. The

CSSs included 240 mice that were heterozygous for one A/J-derived chromosome and

444 mice that were heterozygous for two different A/J-derived chromosomes, both on otherwise B6 backgrounds. The CSSs with two A/J-derived chromosomes represented all pairwise interactions between the individual A/J-derived chromosomes. For example, comparisons were made between strain B6, strains (B6.A3 x B6)F1 and (B6 x B6.A10)F1, which were both heterozygous for a single A/J-derived chromosome (Chr. 3 and 10, respectively), and strain (B6.A3 x B6.A10)F1, which was heterozygous for A/J-derived chromosomes 3 and 10 (Fig. 4.2). A complete list of the strains analyzed is shown in

Table 4.1. Quantitative trait loci (QTLs) were identified for both body weight and plasma glucose levels that were due to main effects and interaction effects. Of note, due to the nature of the CSS experimental design, the regions defined by the identified QTLs correspond to the entire substituted chromosome. Additionally, due to the study design, only QTLs with dominant or semi-dominant effects could be assessed.

Table 4.1. Number of mice used for analysis of body weight and plasma glucose.

Paternal     Maternal Genotype
Genotype     B6            B6.A3         B6.A6         B6.A14        B6.A17
B6           82 (43,39)    20 (9,11)     34 (16,18)    32 (14,18)    35 (20,15)
B6.A4        24 (13,11)    3 (1,2)       36 (18,18)    19 (10,9)     30 (16,14)
B6.A5        37 (16,21)    21 (15,6)     33 (17,16)    35 (17,18)    41 (20,21)
B6.A8        27 (16,11)    8 (3,5)       24 (13,11)    10 (6,4)      29 (12,17)
B6.A10       31 (17,14)    41 (21,20)    33 (15,18)    39 (22,17)    42 (21,21)

Total number of mice is shown, with the numbers of male (left) and female (right) mice indicated in parentheses.

Fig. 4.1 Body weight and glucose levels in all CSS and control mice.

Body weight and plasma glucose levels were measured in 5-week-old mice that were fasted overnight. Each dot represents the data from an individual mouse. Females (F) are

shown in red. Males (M) are shown in blue. Outliers, as described in the Trait Analysis paragraph in the Methods section, are not shown but all data are available at https://doi.org/10.1371/journal.pgen.1007025.s011.

Fig. 4.2. Schematic diagram of CSS and control crosses.

Crosses were used to generate control, single CSS, and double CSS mice to examine main effects and interaction effects on various traits and gene expression levels. The four crosses used (top) to generate the control and CSS offspring (bottom) to study the substitution of chromosomes 3 and 10 are provided as an example of the crosses that were performed. Each rectangle represents a chromosome, with the substituted chromosomes 3 and 10 diagramed in this figure, on B6 background in all mice. The

control B6 mice were generated from Cross I. The single CSS mice were generated from crosses II and III. The double CSS mice were generated from cross IV. M, Male. F,

Female.

Joint F-tests for main effects on body weight indicated that the chromosome substitutions

influenced body weight (males p=0.0028; females p=0.0008; meta p=1.4e-05). Similarly,

joint F-tests for main effects on plasma glucose levels demonstrated a significant

effect of the chromosome substitutions (males p=0.0082; females p=0.00011; meta

p=1.4e-05). QTLs with main effects on body weight were mapped to chromosomes 8

(main effect: 1.23g; average effect: 1.02g) and 17 (main effect: -1.13g; average effect: -1.11g) (Table 4.2). Note that we define main effects as the effect of a chromosome

substitution as estimated by a model which includes all pairwise interaction terms, thus

taking into account context-dependent genetic background effects. In contrast, the

average effect is estimated using a model that does not include any interaction terms; the

latter is similar to the analyses performed in a typical GWAS. QTLs with main effects on

fasting glucose were mapped to chromosomes 3 (main effect: 25.0 mg/dL; average effect:

9.61 mg/dL), 5 (main effect: 15.6 mg/dL; average effect: 6.02 mg/dL), and 4 (main effect:

17.5 mg/dL; average effect: 6.61 mg/dL) (Table 4.2).
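
The distinction between main effects and average effects can be illustrated with a simplified case of two substituted chromosomes, A and B. The Python sketch below is not the model fitted in this study, which included all substituted chromosomes and all pairwise interaction terms. It assumes four groups of equal size (control, single CSS A, single CSS B, and double CSS), works directly from hypothetical group means, and uses the hypothetical function name `effects_from_group_means`.

```python
def effects_from_group_means(control, single_a, single_b, double_ab):
    """Main, interaction, and average effects from the four means of a 2x2 design."""
    # Model with an interaction term: each main effect is the single
    # substitution compared with the control background.
    main_a = single_a - control
    main_b = single_b - control
    interaction = double_ab - single_a - single_b + control
    # Model without an interaction term: with equal group sizes, the
    # least-squares effect of a substitution is the difference between the
    # mean of all groups carrying it and the mean of all groups lacking it.
    average_a = ((single_a + double_ab) - (control + single_b)) / 2
    average_b = ((single_b + double_ab) - (control + single_a)) / 2
    return {
        "main_a": main_a,
        "main_b": main_b,
        "interaction": interaction,
        "average_a": average_a,
        "average_b": average_b,
    }


# Hypothetical group means (mg/dL): substitution A raises the trait by 20,
# but the double CSS returns to the control level.
effects = effects_from_group_means(control=100, single_a=120,
                                   single_b=100, double_ab=100)
for name, value in effects.items():
    print(name, value)
```

In this hypothetical example the main effect of A is 20 mg/dL, whereas its average effect is only 10 mg/dL, because the negative interaction in the double CSS is absorbed into the estimate when the interaction term is omitted. In general, for this balanced 2x2 case, the average effect equals the main effect plus half of the interaction.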

Joint F-tests for interaction effects on body weight were not significant (males p=0.19; females p=0.83; meta p=0.44), and therefore epistatic interactions on body weight were not further investigated. However, joint F-tests for interaction effects on plasma glucose demonstrated the importance of epistasis in regulating this trait (males p=0.002; females p=0.003; meta p=8.99e-05). In fact, among the males and females respectively, epistasis accounted for 43% (95% confidence interval: 23%-75%) and 72% (95% confidence interval: 37%-97%) of the heritable effects on plasma glucose levels. The differing contributions of interactions to body weight and plasma glucose are likely reflected in which model, the main effect model or the average effect model, detected the QTLs for each trait (Table 4.2). For plasma glucose, only 1 of the 3 QTLs identified using the main effect model was also identified using the average effect model, and no new QTLs were identified with the average effect model. In contrast, both of the QTLs for body weight identified using the main effect model were also identified using the average effect model, and 2 new QTLs were identified on chromosomes 6 and 10. This suggests that for a trait regulated by epistatic interactions, the ability to successfully identify QTLs is greatly enhanced by accounting for these interactions. However, for a trait regulated primarily by additive effects, a model incorporating interactions can be detrimental to QTL identification.

To identify specific epistatic interactions, we tested explicit hypotheses for inter-chromosomal pairwise interactions on plasma glucose levels. Among the 15 CSS crosses analyzed, 5 crosses demonstrated inter-chromosomal epistatic interactions that altered plasma glucose levels (Figs. 4.3, 4.4). Interestingly, in all 5 crosses demonstrating interactions, one chromosome substitution increased fasting glucose levels relative to the control B6 strain. These main effects raised plasma glucose levels by an average of 12.3 mg/dL in males and 17.8 mg/dL in females. However, in all 5 observed interactions the average plasma glucose levels in the double CSSs were closer to the control B6 strain than any single CSS was. Furthermore, in 4 of the 5 interactions, the plasma glucose levels in the double CSS did not differ statistically from the control strain B6 (p value > 0.1). Thus, the chromosome substitution driving the increase in plasma glucose on a B6 background had no effect on glucose levels when the genetic background was altered by the second chromosome substitution.

Table 4.2. Main and average effects on phenotypes

Plasma glucose (mg/dL)

Substituted                       95% CI         95% CI                   Meta      Adjusted
chromosome    Effect   Estimate   Lower          Upper         T          P         P
14            Main     9.74488    -0.265762283   19.81374709   2.192947   0.0436    0.1854
14            Average  -0.86608   -6.083174116   4.307270988   -0.34777   0.7405    1
17            Main     -1.65693   -7.808561114   4.721277732   -0.38763   0.6486    1
17            Average  -2.57627   -6.141044821   0.946939948   -1.13129   0.2104    0.8329
3             Main     25.01456   19.21034134    31.30942924   4.762928   <0.0001   0.0001
3             Average  9.606284   5.540854949    13.74347862   3.322242   2e-04     0.0057
6             Main     11.54314   -0.592916635   24.1972221    2.684844   0.0288    0.0645
6             Average  -3.84284   -7.802632063   0.299592968   -1.64152   0.083     0.4532
10            Main     12.55446   5.207282095    20.01631742   2.780221   0.0039    0.0519
10            Average  2.23361    -1.792949821   6.369265612   0.984374   0.3067    0.9125
4             Main     17.47872   10.51503985    24.72698899   3.564996   1e-04     0.0067
4             Average  6.614857   1.258687295    12.25451985   2.514275   0.0163    0.0666
5             Main     15.62627   7.183748327    24.03397682   3.740554   2e-04     0.0045
5             Average  6.024702   1.809464596    10.38360837   2.614923   0.0087    0.0508
8             Main     8.74165    1.750181134    15.99788547   1.857149   0.0386    0.3628
8             Average  -1.46336   -5.284059866   2.529461511   -0.52701   0.5403    0.9978

CI, confidence interval.

Body weight (g)

Substituted                       95% CI         95% CI                   Meta      Adjusted
chromosome    Effect   Estimate   Lower          Upper         T          P         P
14            Main     0.318969   -0.586592987   1.183791215   1.030359   0.3986    0.9161
14            Average  -0.0193    -0.348753138   0.32173969    -0.11239   0.9127    1
17            Main     -1.13483   -1.844105803   -0.410376532  -3.68475   7e-04     0.0037
17            Average  -1.11218   -1.382123889   -0.842540405  -7.07123   <0.0001   <0.0001
3             Main     0.087931   -0.532102491   0.681676768   0.237511   0.8047    1
3             Average  0.271599   -0.144578572   0.694418149   1.363997   0.1902    0.7084
6             Main     0.634872   0.210502972    1.05681823    2.08434    0.0134    0.2464
6             Average  0.933015   0.665819837    1.203822428   5.794178   <0.0001   <0.0001
10            Main     0.517743   -0.005528751   1.024677598   1.618801   0.0789    0.531
10            Average  0.436682   0.112330084    0.756676888   2.800148   0.006     0.0331
4             Main     0.203255   -0.537046654   0.92737608    0.57813    0.5782    0.9969
4             Average  0.312415   -0.046415952   0.656068643   1.719168   0.0856    0.444
5             Main     -0.22006   -0.689000914   0.236982218   -0.74791   0.4137    0.9862
5             Average  0.035833   -0.248051049   0.31682035    0.22635    0.8076    1
8             Main     1.232247   0.703835183    1.772528999   3.616917   <0.0001   0.0051
8             Average  1.024028   0.673792153    1.369461021   5.321076   <0.0001   <0.0001

CI, confidence interval.

Fig. 4.3. Identification of 5 inter-chromosomal epistatic interactions that regulate fasting glucose levels in mice.

Multiple testing adjusted p-values for interaction effects on fasting plasma glucose levels among 15 crosses each involving two A/J-derived chromosome substitutions with the

substituted chromosomes indicated below the chart. Inverse-variance meta-analysis was

used to combine the effects from males and females. The horizontal line indicates the

significance threshold of 0.05.
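As a rough illustration of how sex-specific estimates can be combined by inverse-variance meta-analysis, the sketch below pools two estimates with a fixed-effect weighting. The effect sizes and standard errors are hypothetical placeholders, and the normal approximation for the p-value is an assumption; the exact procedure is the one described in the Methods section.

```python
import math

def inverse_variance_meta(estimates, std_errors):
    """Fixed-effect inverse-variance pooling of independent estimates."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical interaction estimates (mg/dL) for males and females
# with their standard errors; values are for illustration only.
pooled, pooled_se = inverse_variance_meta([-20.0, -10.0], [5.0, 5.0])

# Two-sided p-value under a normal approximation (an assumption of this sketch)
z = pooled / pooled_se
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"pooled estimate = {pooled:.2f}, SE = {pooled_se:.3f}, p = {p_two_sided:.2e}")
```

Because each estimate is weighted by the reciprocal of its variance, the more precisely estimated sex contributes more to the pooled effect; with equal standard errors, as here, the pooled estimate is the simple mean.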

Fig. 4.4. Inter-chromosomal epistasis regulates fasting glucose levels.

Plasma glucose levels were measured in 5-week-old female (left) and male (right) mice that were fasted overnight. Each dot represents the glucose level of a single mouse.

“Others” represents the data from all mice in this study excluding the other 4 strains shown in that panel. The black horizontal line indicates the mean glucose level for each group. The red horizontal line indicates the predicted trait level based on a model of additivity.

Regulation of gene expression by epistasis.

As hepatic gluconeogenesis is a key determinant of plasma glucose levels in healthy

insulin-sensitive mice [177], the hepatic gene expression patterns of control and CSS male

mice were analyzed to better understand the molecular mechanisms underlying the

epistatic regulation of plasma glucose. The RNA-Seq data were filtered for genes

expressed in the liver, leaving 13,289 genes that were tested for differential expression

associated with both main and interaction effects. A total of 6,101 main effect expression

QTLs (meQTLs) were identified (FDR < 0.05) (Fig. 4.5). The full list of meQTLs is

available at https://doi.org/10.1371/journal.pgen.1007025.s014. Those meQTL genes

located on the substituted chromosome were classified as cis-meQTLs (Fig. 4.5, red)

whereas the meQTL genes not located on the substituted chromosome were classified as

trans-meQTLs (Fig. 4.5, blue). Among the genes located on a substituted chromosome, and thus potentially regulated by a cis-meQTL, on average 11.48% in each strain had a cis-meQTL (range: 5.54% to 22.09%) (Table 4.3). Similarly, among the genes located elsewhere in the genome, and thus potentially regulated by a trans-meQTL, on average 5.42% (range: 0.08% to 19.26%) had a trans-meQTL (Table 4.3). The percentages of cis- and trans-meQTLs in each strain demonstrated a strong positive correlation (Spearman’s r = 1.0), but the proportion of cis-meQTLs was always greater than the proportion of trans-meQTLs. Strain (B6 x B6.A8)F1 had the highest percentage of genes with both cis-meQTLs (22.09%) and trans-meQTLs (19.26%), whereas strain (B6 x B6.A5)F1 had the lowest percentage of genes with both cis-meQTLs (5.54%) and trans-meQTLs (0.08%). This suggests that trans-meQTLs are being driven by the cumulative action of many cis-effects rather than a single or small number of major transcriptional regulators (Fig. 4.6). Among the genes regulated by one or more meQTLs, 41.98% (1615 out of 3847) were regulated by multiple meQTLs (range: 2-6) (Table 4.4). The

full list of genes with multiple meQTLs is available at

https://doi.org/10.1371/journal.pgen.1007025.s017. For example, Brca2 is regulated by 5

trans-meQTLs mapped to chromosomes 4, 6, 8, 10 and 14 (Fig. 4.7), demonstrating that

hepatic Brca2 expression is regulated by allelic variation throughout the genome. In addition to the well-known role of Brca2 in breast cancer susceptibility, Brca2 has been

implicated in hepatocellular carcinoma risk [178–180].
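The rank correlation between the cis- and trans-meQTL percentages can be checked directly from the per-strain values in Table 4.3. The sketch below uses a hand-written Spearman calculation that assumes no tied values (true for these eight strains); the helper names are illustrative only.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman rank correlation for untied data: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Percentage of genes with cis- and trans-meQTLs per strain, in the row order
# of Table 4.3: A8, A14, A4, A10, A6, A3, A17, A5
cis_pct = [22.09, 19.27, 17.20, 9.46, 9.41, 6.92, 5.85, 5.54]
trans_pct = [19.26, 10.05, 8.47, 3.54, 1.01, 0.69, 0.09, 0.08]

rho = spearman_rho(cis_pct, trans_pct)
cis_always_higher = all(c > t for c, t in zip(cis_pct, trans_pct))
print(rho, cis_always_higher)
```

The two columns rank the eight strains identically, which is why the rank correlation is exactly 1 even though the percentages themselves differ considerably.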

Fig. 4.5. Identification of meQTLs that regulate hepatic gene expression.

A circos plot of meQTL locations in the genome where each layer of the circle represents the comparison between a CSS strain and control B6 mice. From the inner circle, the CSS strains are (B6 x B6.A5)F1, (B6.A17 x B6)F1, (B6.A3 x B6)F1, (B6.A6 x B6)F1, (B6 x B6.A10)F1, (B6 x B6.A4)F1, (B6.A14 x B6)F1 and (B6 x B6.A8)F1. Cis-meQTLs and trans-meQTLs are marked with red and blue, respectively. The width of each chromosome is proportional to its physical size. The height of each meQTL bar is proportional to the number of meQTLs in that genomic interval.

Fig. 4.6. Positive correlation between cis-meQTLs and trans-meQTLs.

(A) Scatter plot of the relationship between the percentage of cis-meQTLs and trans-meQTLs in each of 8 CSS strains with one substituted chromosome. The strains are labelled on the graph with only their substituted chromosome, for example strain (B6 x B6.A8)F1 is shown for simplicity as A8. Data are shown on a log scale. (B) Histogram illustrating the percentage of cis-meQTLs and trans-meQTLs in each of 8 CSS strains with one substituted chromosome.

Table 4.3. Main effects on gene expression.

                   Cis-meQTLs                                        Trans-meQTLs
                   Percentage of   Number of    Number of genes     Percentage of   Number of      Number of genes
                   genes with      cis-meQTLs   on substituted      genes with      trans-meQTLs   on other
Strain             cis-meQTL                    chromosome          trans-meQTL                    chromosomes
(B6 x B6.A8)F1     22.09%          154          697                 19.26%          2425           12592
(B6.A14 x B6)F1    19.27%          84           436                 10.05%          1292           12853
(B6 x B6.A4)F1     17.20%          139          808                 8.47%           1057           12481
(B6 x B6.A10)F1    9.46%           58           613                 3.54%           449            12676
(B6.A6 x B6)F1     9.41%           67           712                 1.01%           127            12577
(B6.A3 x B6)F1     6.92%           51           737                 0.69%           86             12552
(B6.A17 x B6)F1    5.85%           37           632                 0.09%           11             12657
(B6 x B6.A5)F1     5.54%           54           974                 0.08%           10             12315
All                11.48%          644          5609                5.42%           5457           100703

Table 4.4. Summary of genes with multiple meQTLs.

Number of meQTLs   Number of genes   Percentage of genes
0                  9442              71.05%
1                  2232              16.80%
2                  1138              8.56%
3                  346               2.60%
4                  102               0.77%
5                  27                0.20%
6                  2                 0.02%
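The counts in Table 4.4 can be tallied to recover the totals quoted in the text (13,289 genes tested, 3,847 with one or more meQTLs, and 1,615 with multiple meQTLs). The short sketch below does this arithmetic; the variable names are illustrative only.

```python
# Number of genes by the number of meQTLs regulating them (Table 4.4)
genes_by_meqtl_count = {0: 9442, 1: 2232, 2: 1138, 3: 346, 4: 102, 5: 27, 6: 2}

total_genes = sum(genes_by_meqtl_count.values())
genes_with_any = total_genes - genes_by_meqtl_count[0]
genes_with_multiple = sum(n for k, n in genes_by_meqtl_count.items() if k >= 2)
pct_multiple = 100 * genes_with_multiple / genes_with_any

print(total_genes, genes_with_any, genes_with_multiple, round(pct_multiple, 2))
```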

Fig. 4.7. Identification of 5 trans-meQTLs that regulate the hepatic expression of Brca2.

Gene expression levels of Brca2 in the liver are shown for strain B6 and 8 single CSS strains. Each dot represents Brca2 expression in an individual mouse. The mean value for each strain is indicated by a solid line. The Brca2 gene is located on mouse chromosome 5. ** indicates p<0.01 relative to strain B6. *** indicates p<0.001 relative to strain B6.

In addition to the meQTLs regulated by substitution of a single chromosome, the analysis of double CSSs enabled the detection of eQTLs with additive and interaction effects between the substituted chromosomes. The expression of Zkscan3 represents an example of additivity, with the substitution of A/J-derived chromosomes 8 and 17 each individually increasing the expression of Zkscan3 relative to control B6 mice (Fig. 4.8). In the double CSS strain (B6.A17 x B6.A8)F1, the effects of each individual chromosome substitution are combined in an additive manner to result in yet higher expression than either of the single CSSs (Fig. 4.8A). The additive effects of the Zkscan3 meQTLs detected by RNA-Seq were confirmed by quantitative reverse transcription PCR (Fig. 4.8B), as were 4/5 additional meQTLs demonstrating additivity (Table 4.5).

Fig. 4.8. Regulation of hepatic Zkscan3 expression by additive meQTLs.

Gene expression of Zkscan3 in the liver was analyzed by (A) RNA-Seq and (B) RT-qPCR. Each dot represents Zkscan3 expression levels in an individual mouse. RT-qPCR data shown are relative to the control gene Rplp0. The mean value for each strain is indicated by a black line. The expected expression level of Zkscan3 based on a model of additivity is indicated with a red line. The p value from a test for interactions is shown. A p > 0.05 is suggestive of regulation by additivity rather than interaction.

Table 4.5. Genes examined by RNA-Seq and RT-qPCR for epistasis and additive interactions.

                                  RNA-seq (fold change)                                   qPCR (fold change)
Type           Gene name  Cross   CSS1/B6  CSS2/B6  CSS1 & CSS2/B6  P-value for           CSS1/B6  CSS2/B6  CSS1 & CSS2/B6  P-value for
                                                                    interaction                                             interaction
Antagonistic   Agxt       A6:A8   0.87     0.55     1.01            5.00469E-10           1.20     0.65     1.24            0.04112
Antagonistic   Pcx        A14:A8  0.68     0.59     0.90            3.18347E-11           0.69     0.53     0.99            0.000363
Antagonistic   Slc6a12    A14:A8  0.72     0.65     1.00            4.15536E-13           0.83     0.64     1.04            0.00532
Antagonistic   Serpinf2   A6:A8   0.87     0.66     0.92            1.00661E-06           1.25     0.66     1.17            0.1911
Antagonistic   Zbtb20     A14:A4  1.11     1.50     1.05            8.96403E-06           1.13     1.15     1.44            0.568
Antagonistic   Zbtb20     A17:A4  1.22     1.50     1.06            7.06135E-07           1.11     1.21     1.46            0.623
Antagonistic   Raph1      A14:A4  1.09     1.39     0.98            5.02787E-07           0.85     0.72     1.35            0.0043
Antagonistic   Raph1      A17:A4  1.09     1.39     1.01            6.41372E-06           0.76     0.72     1.05            0.0661
Antagonistic   Dnajb9     A14:A4  0.97     0.82     1.60            2.58697E-09           1.01     0.77     1.94            1.22E-05
Antagonistic   Cers6      A14:A4  0.72     0.79     1.52            3.60415E-08           0.74     0.62     1.72            1.24E-05
Antagonistic   Ldha       A6:A8   0.92     0.69     1.22            3.13268E-08           1.26     0.77     1.37            0.1441
Antagonistic   Sec23b     A14:A4  0.88     0.86     1.22            5.12386E-08           0.97     0.79     1.83            0.000144
Antagonistic   Eif2ak3    A14:A4  0.87     0.89     1.33            1.3656E-07            0.87     0.80     1.96            2.57E-06
Synergistic    Cyp3a16    A14:A4  1.50     1.54     16.17           0.001333533           1.47     1.55     5.87            6.44E-05
Synergistic    Syvn1      A14:A4  1.05     1.01     1.58            0.001266066           0.80     0.58     1.84            5.58E-05
Synergistic    Gstm1      A14:A8  0.97     0.99     1.65            0.000669705           0.99     1.01     1.78            0.00506
Synergistic    Usp18      A6:A8   1.08     0.86     1.68            0.00027991            1.43     0.98     2.07            0.0481
Synergistic    Pik3c2a    A14:A8  0.98     1.09     0.75            9.74301E-05           1.11     1.09     0.90            0.107
Synergistic    Stat5a     A14:A8  0.96     1.01     1.36            0.006677835           1.18     0.96     1.49            0.105
Synergistic    Tecr       A6:A8   0.95     0.94     1.28            2.88342E-05           1.37     1.00     1.66            0.2259
Synergistic    Mcm10      A14:A4  1.10     0.98     0.54            0.000183216           1.42     0.87     1.56            0.4847
Synergistic    Nol8       A14:A4  0.95     0.97     1.27            0.000767168           1.03     0.80     1.98            1.69E-05
Synergistic    Rprl3      A14:A4  686.25   561.52   742.32          1.52828E-51           1.02     0.78     1.97            1.49E-05
Additive       Zkscan3    A17:A8  1.47     1.66     2.12            0.111901094           1.36     1.50     2.24            0.18559
Additive       Asns       A3:A5   0.41     0.36     0.14            0.912510043           0.46     0.41     0.16            0.37253
Additive       Asns       A14:A5  0.31     0.36     0.13            0.767984888           0.48     0.40     0.19            0.37105
Additive       Slc12a2    A14:A4  1.25     1.25     1.36            0.185386891           1.17     0.99     2.08            0.00217
Additive       Igfbp3     A14:A4  0.74     0.76     0.63            0.24590094            1.02     0.70     0.90            0.33164
Additive       Ldha       A14:A4  0.70     0.80     0.65            0.143861309           0.84     0.72     0.93            0.165

In addition to examples of additivity, interaction expression QTLs (ieQTLs) were

identified that were jointly regulated by genetic variation on two substituted

chromosomes. The ieQTLs, similar to the meQTLs, were divided into cis-ieQTLs and

trans-ieQTLs, with cis-ieQTLs defined by differentially expressed genes located on either

one of the two substituted chromosomes and trans-ieQTLs representing differentially expressed genes that are not located on either substituted chromosome. A total of 4,283 ieQTLs were identified. The full list of ieQTLs is available at https://doi.org/10.1371/journal.pgen.1007025.s019. Among all possible genes regulated by a cis-ieQTL or trans-ieQTL, 2.01% and 2.16% of genes were regulated by a cis- or

trans-ieQTL respectively (Table 4.6). The combination of A/J-derived chromosomes 8

and 14 yielded the most ieQTLs (n=2,305) including cis-ieQTLs regulating the

expression of 17.56% of all genes on chromosomes 8 or 14 and trans-ieQTLs regulating

the expression of 17.32% of all genes throughout the remainder of the genome. Overall,

the ieQTLs demonstrated a similar positive correlation as the meQTLs (Spearman’s r =

0.92) (Fig. 4.9), although there was no enrichment for cis-ieQTLs. Among the genes

regulated by an ieQTL(s), 32.35% (945 out of 2921) were regulated by multiple ieQTLs

(Range: 2-7) (Table 4.7). The full list of genes with multiple ieQTLs is available at

https://doi.org/10.1371/journal.pgen.1007025.s020. For example, expression of Agt,

which codes for angiotensinogen and is involved in blood pressure regulation, is decreased

in strain (B6.A8 x B6)F1 relative to control B6 mice; however, interactions between

alleles on chromosome 8 and chromosomes 6, 3, 17, and 14 all result in expression levels

of Agt that did not differ from the control strain (Fig. 4.10).
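The cis/trans rule described above depends only on where the gene sits relative to the substituted chromosome(s). A minimal sketch of that rule follows; the function name is hypothetical and chromosomes are represented as strings purely for illustration. The gene locations used in the examples (Brca2 on chromosome 5, Agt on chromosome 8) are those stated in the figure legends.

```python
def classify_eqtl(gene_chromosome, substituted_chromosomes):
    """Return 'cis' if the gene lies on any substituted chromosome, else 'trans'.

    For a meQTL, pass the single substituted chromosome; for an ieQTL,
    pass both substituted chromosomes of the double CSS cross.
    """
    return "cis" if gene_chromosome in set(substituted_chromosomes) else "trans"

# Brca2 (chromosome 5) in a strain carrying substituted chromosome 8
brca2_class = classify_eqtl("5", ["8"])

# Agt (chromosome 8) in the cross carrying substituted chromosomes 14 and 8
agt_class = classify_eqtl("8", ["14", "8"])

print(brca2_class, agt_class)
```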

Table 4.6. Interaction effects on gene expression.

          Cis-ieQTLs                                       Trans-ieQTLs                                      Subtypes of ieQTLs
          Genes with      Number of    Number of genes    Genes with        Number of      Number of genes  Genes with        Number of     Genes with         Number of
          cis-ieQTL (%)   cis-ieQTLs   on substituted     trans-ieQTL (%)   trans-ieQTLs   on other         synergistic       synergistic   antagonistic       antagonistic
Cross                                  chromosomes                                         chromosomes      ieQTLs (%)        ieQTLs        ieQTLs (%)         ieQTLs
A14:A8    17.56%          199          1133               17.32%            2106           12156            6%                129           94%                2176
A6:A8     6.81%           96           1409               5.89%             700            11880            2%                15            98%                781
A14:A4    3.86%           48           1244               3.57%             430            12045            6%                31            94%                447
A6:A10    1.81%           24           1325               0.58%             69             11964            1%                1             99%                92
A14:A10   1.62%           17           1049               2.05%             251            12240            1%                3             99%                265
A6:A4     0.86%           13           1520               0.58%             68             11769            0%                0             100%               81
A3:A8     0.77%           11           1434               0.77%             91             11855            2%                2             98%                100
A3:A10    0.44%           6            1350               0.56%             67             11939            0%                0             100%               73
A17:A8    0.23%           3            1329               0.00%             0              11960            0%                0             100%               3
A17:A4    0.14%           2            1440               0.48%             57             11849            2%                1             98%                58
A17:A10   0.08%           1            1245               0.03%             4              12044            0%                0             100%               5
A14:A5    0.00%           0            1410               0.17%             20             11879            0%                0             0%                 0
A17:A5    0.00%           0            1606               0.00%             0              11683            0%                0             0%                 0
A3:A5     0.00%           0            1711               0.00%             0              11578            0%                0             0%                 0
A6:A5     0.00%           0            1686               0.00%             0              11603            0%                0             100%               20
All       2.01%           420          20891              2.16%             3863           178444           4%                182           96%                4101

Fig. 4.9. Positive correlation between cis-ieQTLs and trans-ieQTLs.

(A) Scatter plot of the relationship between the percentage of cis-ieQTLs and trans-ieQTLs identified among 15 pairwise CSS crosses. The data points are labelled on the graph with the two substituted chromosomes for each pairwise cross. Data are shown on a log scale. (B) Histogram illustrating the percentage of cis-ieQTLs and trans-ieQTLs in each of 15 pairwise CSS crosses.

Table 4.7. Summary of genes with multiple ieQTLs.

Number of ieQTLs   Number of genes   Percentage of genes
0                  10368             78.02%
1                  1976              14.87%
2                  662               4.98%
3                  185               1.39%
4                  71                0.53%
5                  19                0.14%
6                  7                 0.05%
7                  1                 0.01%

Fig. 4.10. Identification of 4 ieQTLs that regulate the hepatic expression of Agt.

Gene expression levels of Agt in the liver are shown for strain B6, 5 single CSS strains, and 4 double CSS strains. Each dot represents Agt expression in an individual mouse.

The mean value for each strain is indicated by a solid line. The expected expression level of Agt in the double CSS strains based on a model of additivity is indicated with a red line. The Agt gene is located on mouse chromosome 8.

Context-dependent effects on gene expression.

We next tested whether the interaction effects on gene expression were synergistic (positive epistasis) or antagonistic (negative epistasis) (Fig. 4.11). Synergistic refers to an increased difference in gene expression levels between the double CSS and the control B6 strain beyond that expected based on an additive model, whereas antagonistic refers to a decreased difference. The regulation of Agxt was an example of an antagonistic interaction, with main effects from substituted chromosomes 6 and 8 each individually decreasing Agxt expression, whereas this effect was lost in the double chromosome substitution strain (Fig. 4.12A). In contrast, the regulation of Cyp3a16 represented an example of a synergistic interaction, with the detection of an ieQTL in the absence of any meQTL (Fig. 4.12B). Among the ieQTLs, antagonistic interactions accounted for 96% (n=4101) while synergistic interactions accounted for 4% (n=182) (Table 4.6). Remarkably, for 80% of the antagonistic interactions (3285/4101), gene expression in one or both of the single CSSs differed from the control B6 strain (a meQTL), whereas expression in the double CSS reverted to control levels (p > 0.1 relative to strain B6). To again validate the RNA-Seq data using an independent method, RT-qPCR was performed for a subset of genes with antagonistic (n=13) and synergistic (n=10) interactions. Replication by RT-qPCR confirmed the detection of epistasis (p < 0.05) in 61% of the genes tested (Antagonistic: 8/13; Synergistic: 6/10) (Table 4.5).
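The synergistic/antagonistic definition can be sketched as a comparison between the observed double-CSS deviation from control and the deviation expected under additivity. The sketch below assumes additivity on the linear fold-change scale with the control set to 1, which is a simplifying assumption of this example and not necessarily the scale used in the statistical model; it also performs no significance testing. The fold changes are the RNA-Seq values for Agxt and Cyp3a16 from Table 4.5, and the function name is hypothetical.

```python
def classify_interaction(css1, css2, double, control=1.0):
    """Classify a double-CSS expression value relative to the additive expectation.

    css1, css2, double: expression relative to control (fold change).
    Additivity is assumed on the linear fold-change scale (an assumption).
    """
    expected = control + (css1 - control) + (css2 - control)
    observed_diff = abs(double - control)
    expected_diff = abs(expected - control)
    if observed_diff > expected_diff:
        return "synergistic"
    if observed_diff < expected_diff:
        return "antagonistic"
    return "additive"

# RNA-Seq fold changes relative to B6 from Table 4.5: CSS1, CSS2, double CSS
agxt_type = classify_interaction(0.87, 0.55, 1.01)      # cross A6:A8
cyp3a16_type = classify_interaction(1.50, 1.54, 16.17)  # cross A14:A4

print(agxt_type, cyp3a16_type)
```

For Agxt the additive expectation is well below control while the observed double CSS value is essentially at control, so the deviation shrinks; for Cyp3a16 the observed deviation far exceeds the additive expectation.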

Fig. 4.11. Schematic diagram illustrating the categorization of epistasis as either synergistic or antagonistic.

Hypothetical mean expression levels are shown with black lines for the strains B6 and the two single CSS strains (CSSa x B6)F1 and (B6 x CSSb)F1, where a and b represent any two different substituted chromosomes. The predicted expression levels based on a model of additivity in the double CSS strain (CSSa x CSSb)F1 are shown with a red line. Synergistic epistasis is represented by a difference in trait values between the double CSS and control B6 strain that is greater than that predicted by additivity. Antagonistic epistasis is represented by a difference in trait values between the double CSS and control B6 strain that is less than that predicted by additivity. (A) Illustrates the case where only one single CSS strain shows expression differences relative to the control. (B) Illustrates the case where both single CSS strains show expression differences relative to the control. (C) Illustrates the case where both single CSS strains show expression differences relative to the control, but in opposite directions. (D) Illustrates the case where neither single CSS strain shows expression differences relative to the control.

Fig. 4.12. Examples of synergistic and antagonistic ieQTLs. Each dot represents the gene expression data from one mouse. The horizontal bar indicates the mean value for each strain. (A) An antagonistic ieQTL regulates the expression of Agxt in the liver. (B) A synergistic ieQTL regulates the expression of Cyp3a16 in the liver. The red horizontal line indicates the predicted trait level based on a model of additivity.

Significant contribution of epistasis to trait heritability.

Given that the ieQTLs regulated approximately 2% of all genes expressed in the liver

(Table 4.6), we sought to quantify the contribution of genetic interactions to the heritable

component of gene expression. First, an empirical Bayes quasi-likelihood F-test identified 6,684 genes out of the 12,325 genes expressed in the liver for which there was evidence of genetic control within the population of CSSs (FDR<0.05). The average proportion of heritable variation attributable to interactions across these genes was 0.56 (1st quartile: 0.43, 3rd quartile: 0.68) (Fig. 4.13A). When the same analysis was restricted to only

genes with a statistically significant (FDR<0.05) contribution of interactions to gene

expression levels (n=3,236 genes), the proportion of heritable variation attributable to

interactions increased to 0.66 (1st quartile: 0.56, 3rd quartile: 0.74) (Fig. 4.13B). For

comparison, a simulation study was conducted using artificial data to model pure

additivity in the absence of interactions, with a resulting estimate of this proportion of 0.13

(1st quartile: 0.05, 3rd quartile: 0.19) (Fig. 4.13C), which provides an estimate of the

background noise in this measurement. Thus, genetic interactions are a major contributor

to the regulation of gene expression.
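To make the idea of a "proportion of heritable variation attributable to interactions" concrete, the sketch below partitions the variation among four strain means of a single 2x2 cross into an additive part and an interaction part. This is a simplified illustration with hypothetical means and noise-free data; it is not the empirical Bayes procedure used for the estimates above, and the function name is an assumption of this example.

```python
import numpy as np

def interaction_fraction(mean_b6, mean_a, mean_b, mean_ab):
    """Fraction of between-strain variation not captured by an additive model.

    Inputs are the mean trait values of the control, the two single CSSs,
    and the double CSS in one 2x2 cross.
    """
    y = np.array([mean_b6, mean_a, mean_b, mean_ab], dtype=float)
    a = np.array([0.0, 1.0, 0.0, 1.0])  # carries substituted chromosome a
    b = np.array([0.0, 0.0, 1.0, 1.0])  # carries substituted chromosome b

    ss_total = np.sum((y - y.mean()) ** 2)
    if ss_total == 0:
        return 0.0  # no variation among strains to partition

    # Additive model: y = b0 + ba*a + bb*b (no interaction term)
    X_add = np.column_stack([np.ones(4), a, b])
    coef, *_ = np.linalg.lstsq(X_add, y, rcond=None)
    ss_interaction = np.sum((y - X_add @ coef) ** 2)
    return ss_interaction / ss_total

# Hypothetical means: chromosome a raises the trait only on the B6 background
frac_epistatic = interaction_fraction(100.0, 125.0, 100.0, 100.0)
# Hypothetical means for a purely additive pair of substitutions
frac_additive = interaction_fraction(100.0, 110.0, 105.0, 115.0)

print(round(frac_epistatic, 3), round(frac_additive, 3))
```

With real, noisy data a purely additive trait would not give exactly zero, which is the motivation for estimating the background level by simulation as described above.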

Fig. 4.13. Contribution of epistasis to the genetic regulation of hepatic gene expression. Diagrams representing the estimated proportion of genetic variation due to interactions for (A) all genes expressed in the mouse liver whose expression was under genetic control in the CSS strains studied, (B) the same data segregated based on the statistical evidence supporting an effect of interaction on gene expression, and (C) a comparison of the genes with the most significant evidence for regulation by genetic interactions (FDR < 0.05) and a simulation study with artificial data that models the absence of any genetic interactions.

Discussion

CSSs, which have a simplified and fixed genetic background, were used to identify

widespread and likely concurrent epistatic interactions. This systematic analysis of

mammalian double CSSs demonstrated that epistatic interactions controlled the majority

of the heritable variation in both fasting plasma glucose levels and hepatic gene

expression (Fig. 4.13). Among genes expressed in the liver, the expression levels of 24%

were regulated, at least in part, by epistasis (Fig. 4.13). This number is remarkable

considering that only dominant or semi-dominant effects were tested, only a single tissue and time point were examined, allelic variation from only two inbred strains of mice was included, and only 15 randomly selected pairwise strain combinations of A/J-derived CSSs were tested out of a possible 462 combinations of double CSSs. The prevalence of epistatic interactions provides a potential molecular mechanism underlying the highly dependent nature of complex traits on genetic background [172,173,181,182]. Interpreting

the effect of individual allelic variants will thus be severely limited by population-style

analyses that fail to account for possible contextual effects. Nonetheless, progress is

being made in this field, including in diseases such as multiple sclerosis (MS), which is a

complex genetic disease whose risk is highly associated with family history [183]. For

example, MS risk alleles in DDX39B (rs2523506A) and IL7R (rs6897932C) together

increase MS risk considerably more than either variant independently [37].

Based on the considerable number of interactions detected in the CSS crosses, context-

dependent interactions such as that between DDX39B and IL7R in MS are likely

widespread and may therefore represent a significant source of missing heritability for

complex traits and diseases [23,30].

Although epistasis was a dominant factor regulating fasting glucose levels, the same effect was not detected in the regulation of body weight. It is not clear if this is due to different genetic architectures between these two traits or whether this was due to the

limited genetic variation between the B6 and A/J strains. The body weight studies were

conducted in mice fed a standard rodent chow, whereas differences in body weight

between strains B6 and A/J are significantly more pronounced when challenged with a

high-fat diet [184,185]. Alternately, a recent meta-analysis of trait heritability in twin

studies identified significant variation in the role of additive and non-additive variation

among different traits, with suggestive evidence for non-additive effects in 31% of traits

[186]. Among the traits analyzed, neurological, cardiovascular, and ophthalmological traits were among the most consistent with solely additive effects,

whereas traits related to reproduction and dermatology were more often consistent with

non-additive interactions. Among the 464 metabolic traits studied, 40%

were consistent with a contribution of non-additive interactions [186]. It is

interesting to speculate whether some traits that may have a more direct effect on fitness

(e.g. reproduction) are more likely to involve multiple non-additive effectors in order to

maintain a narrow phenotypic or developmental range [187].

Although many inter-chromosomal non-additive interactions were identified in mice, it

remains unclear whether these interactions are attributable to bigenic gene-gene

interactions or to higher-order epistasis involving multiple loci located on a substituted

chromosome. Studies in yeast that dissected the genetic architecture of epistasis

demonstrated that gene-gene interactions played a minor role among the heritable effects

attributable to epistasis, thus primarily implicating higher order interactions [160]. Yet,

other studies in yeast that methodically tested pairs of gene knockouts for interactions

identified a number of gene-gene interactions [188]. Additional evidence for both higher-order epistasis with three, four, and even more mutations [189] as well as bigenic gene-gene interactions [190] has been identified, and it seems likely that both will underlie

interactions detected in the CSS studies. This is because the use of CSSs to study the

allelic variation found on an entire chromosome in tandem equally enables the detection

of bigenic and higher-order interactions. This property of CSSs may contribute to the

robust detection of epistasis using the CSS experimental platform relative to genetic

mapping studies in populations with many independently segregating variants, which are

often underpowered to identify higher-order interactions [191]. However, to formally test

this and determine the relative contribution of each, higher resolution genetic mapping of

the epistatic interactions will be necessary to better understand their molecular nature

[192]. Higher resolution mapping studies should eventually shed light on whether the chromosome-level properties discovered in this study are consistent with those for SNP-level interactions. Based on previous studies of complex trait QTLs in single-CSS studies, chromosome-level QTLs demonstrated a genetic architecture similar to that found in higher resolution QTLs, including large effect sizes, similar direction of effects, and suggestive evidence of widespread epistasis [174,193]. Thus, it seems likely that discoveries made based on chromosome-level analysis of epistasis will apply equally to studies involving individual genetic variants. For example, higher resolution mapping studies of chromosome-level QTLs in CSSs identified genetic variants in Cntnap2 that were associated with opposing effects on body weight depending on epistatic interactions with intra-chromosomal variation in the genetic background [194].

Perhaps the most significant outcome of the epistasis detected was the high degree of

constancy in the light of context dependence, such that the interactions usually returned

trait values to the levels detected in control mice. Remarkably, this is just as Waddington

predicted 75 years ago, a phenomenon he referred to as canalization [195], which has been observed in previous studies [196–200]. Canalization refers to the tendency of an organism to proceed towards one developmental outcome, despite variation in the process along the way. This variation can be influenced by, among other things, the numerous functional genetic variants present in a typical human genome, which may contain thousands of variants that alter gene function [201]. We find that the overwhelming

majority of genetic interactions return trait values to levels seen in control strains, which

would act to reduce phenotypic variation among developmental outcomes. Studies of

epistasis in tomato plants detected by analyzing short chromosomal regions on different

genetic backgrounds identified a similar bias towards antagonistic epistasis relative to

synergistic epistasis [199]. A bias towards antagonistic interactions was also detected in

large-scale gene-gene interactions studies in yeast, although with a lower frequency of

antagonistic relative to synergistic interactions [198,202]. Thus, our results are concordant

with other studies showing that the majority of epistatic interactions are antagonistic, and together

suggest that when larger tracts of DNA are assessed for interactions the effects are even

more likely to be antagonistic. This robustness in the face of considerable genetic

variation is central to the underlying properties of canalization. These genetic interactions

therefore represent a mechanism for storing genetic variation within a population, without

reducing individual fitness. This stored genetic variation could then enable populations to

more quickly adapt to environmental changes [203].

Finally, the consistently greater effect sizes of main effects relative to average effects

suggest that GWAS-type studies, in both humans and model organisms, consistently

underestimate true effect sizes in at least a subset of individuals. For example, a large F2

intercross between inbred mice carrying a mutation that results in a nonfunctional allele

of the growth hormone releasing hormone receptor gene (Ghrhr) on either a B6 or C3H

genetic background identified widespread antagonistic epistasis, albeit with small

contributions to overall trait heritability relative to additive effects [196]. Similarly,

epistatic interactions were identified in the Diversity Outbred mice resulting in small

contributions to the overall heritability of metabolic-related traits [204]. These studies

contrast with the large contribution of epistasis to trait heritability identified using the CSS paradigm (Fig. 4.13), mirroring the contrasting portraits of genetic architecture identified based on differing genetic structures of these experimental populations [174]. The CSS

paradigm examines context-dependent effects on individual genotypes and typically

identifies QTLs with large effect sizes. Alternatively, GWAS-type studies average effects

across a population of heterogeneous genotypes and typically identify QTLs with small

phenotypic effects. However, perhaps most relevant is that the relatively simpler

genotypes of CSSs enable greater depth in analyzing fewer unique genotypes, potentially

capturing what would be rare genotypic combinations in a segregating cross or human

population. Therefore, the key to enabling precision medicine, which like the CSS studies

is focused on the effect of a variant on one specific genetic background, is to identify in

which subset of individuals a particular variant has a significant effect. The consideration

of epistasis in treatment, although in its infancy, remains a promising avenue for improving clinical treatment regimens, including predicting drug response in tumors [205] and guiding antibiotic drug-resistance [206]. However, true precision medicine will necessitate a more comprehensive understanding of how genetic background, across many loci, affects single variant substitutions.

Materials and Methods

Mice. Chromosome substitution strains (CSS) and control strains were purchased from

The Jackson Laboratory. These strains include C57BL/6J-Chr3A/J/NaJ mice (Stock

#004381) (B6.A3), C57BL/6J-Chr4A/J/NaJ mice (Stock #004382) (B6.A4), C57BL/6J-

Chr5A/J/NaJ mice (Stock #004383) (B6.A5), C57BL/6J-Chr6A/J/NaJ mice (Stock #004384)

(B6.A6), C57BL/6J-Chr8A/J/NaJ mice (Stock #004386) (B6.A8), C57BL/6J-Chr10A/J/NaJ mice (Stock #004388) (B6.A10), C57BL/6J-Chr14A/J/NaJ mice (Stock #004392)

(B6.A14), C57BL/6J-Chr17A/J/NaJ mice (Stock #004395) (B6.A17) and C57BL/6J

(Stock #000664). Mice were maintained by brother-sister matings. All mice used for

experiments were obtained from breeder colonies at Case Western Reserve University.

Mice were housed in ventilated racks with access to food and water ad libitum and

maintained at 21°C on a 12-hour light/12-hour dark cycle. All mice were cared for as described under the Guide for the Care and Use of Laboratory Animals, eighth edition (2011) and all experiments were approved by IACUC and carried out in an AAALAC-approved facility.

The IACUC protocol numbers were 2013-0098 and 2016-0064. Male mice from strains

B6, B6.A4, B6.A5, B6.A8 and B6.A10 were bred with female mice from strains

B6, B6.A3, B6.A6, B6.A14 and B6.A17. The offspring were weaned at 3 weeks of age. The number of offspring analyzed from each cross is shown in Table 4.1 for both body weight and plasma glucose, although glucose levels were not measured in one mouse each from the following strains: (B6 x B6.A10)F1, (B6.A14 x B6)F1, (B6.A17 x

B6.A10)F1, (B6.A3 x B6.A10)F1, (B6.A6 x B6.A4)F1, (B6.A14 x B6.A5)F1 and (B6.A6 x B6.A5)F1. The mice analyzed from each cross were derived from at least three

independent breeding cages. No blinding to the genotypes was undertaken.

Mouse phenotyping. At 5 weeks of age, mice were fasted 16 hours overnight and body

weight was measured. Mice were anesthetized with isoflurane and fasting blood glucose

levels were measured via retro-orbital bleeds using a OneTouch Ultra2 meter (LifeScan,

Milpitas, CA, USA). Mice were subsequently sacrificed by cervical dislocation and the

caudate lobe of the liver was collected and immediately placed in RNAlater (Thermo

Fisher Scientific, Waltham, MA, USA).

Trait analysis. To analyze the body weight and fasting plasma glucose data, linear regression was used with a main effects term and a term for each pairwise interaction for the males and females separately. In the glucose data, 5 observations were Winsorized by setting a ceiling of 4 median absolute deviations above the median. Any values larger than the ceiling (165 mg/dL) were set to the ceiling. Additionally, interactions where one of the crosses contained fewer than 5 mice were not analyzed, leading to the removal of the

(B6.A4 x B6.A3)F1 mice, the female (B6.A8 x B6.A14)F1 and the male (B6.A8 x

B6.A3)F1 mice. For each trait and for each sex, we estimated a linear model with the following predictors: (1) maternal substitution, (2) paternal substitution and (3) the interaction of maternal by paternal substitution. In these models, the reference strain was

B6. The sexes may potentially differ in residual variance and in the effect of the chromosome substitutions (i.e. gene by sex interaction). To handle these differences transparently, we estimated and reported models for each sex separately. Within each of the above models, two joint linear hypothesis tests were performed of the following

hypothesis: (a) there were no main effects (i.e. terms (1) and (2) in the model above were

all 0), and (b) there were no interaction effects (i.e. terms (3) in above model were all 0).

These linear hypothesis tests were carried out using the “linearHypothesis” function in

the “car” package [207] and with the anova function in R. Fisher’s method was used to

combine these p-values from males and females [208]. Similar results were obtained

using a full 3-way interaction model including all interactions between sex, maternal

substitution and paternal substitution. In this approach, the test of the null hypothesis that

all main effects in males and females were 0 had a p-value of 3.168e-05 and 1.17e-05 for

weight and glucose respectively, while the overall test for interaction had a p-value of

0.44 and 0.00011 for weight and glucose respectively. Inverse-variance meta-analysis

was used to combine the coefficient estimates from the males and females. If $\hat{\beta}_m$ and $\hat{\beta}_f$

are estimated genetic effects for males and females respectively then the IVW estimator

is $\hat{\beta}_{IVW} = w\hat{\beta}_f + (1 - w)\hat{\beta}_m$ where $w = \frac{1/\mathrm{var}(\hat{\beta}_f)}{1/\mathrm{var}(\hat{\beta}_f) + 1/\mathrm{var}(\hat{\beta}_m)}$. Thus, while the genetic

effects may potentially differ between males and females, the combined results represent

a weighted average of the effect in males and in females. To account for potential non-normality, heteroscedasticity and multiple testing, we created 10,000 bootstrap data sets by sampling with replacement from each cross and sex combination. Studentized bootstraps (i.e. using pivotal statistics) were used to create confidence intervals for the coefficients and p-values. Multiple tests were adjusted for by comparing the observed test statistics to the maximum bootstrap test statistic as described elsewhere [209]. P-values were adjusted for multiple comparisons separately for each trait and separately for the main effects and interactions. As an alternative to the meta-analysis approach, we also fit

a linear model adjusting for sex as a covariate. Results of this analysis are reported in

Tables 4.8 and 4.9. The proportion of the genetic variance explained by interactions was estimated as $(R^2_{Full} - R^2_{Additive}) / R^2_{Full}$ where $R^2_{Additive}$ and $R^2_{Full}$ are the adjusted coefficients of

determination for the model with only main effects and for the full interaction model

respectively. The adjusted coefficients of determination are an estimate of the proportion

of variation in the trait which is explained by the model. Note that $R^2_{Full}$ and $R^2_{Additive}$ share the same denominator (i.e. the total trait variation). Thus, total trait variation cancels out of the quantity $(R^2_{Full} - R^2_{Additive}) / R^2_{Full}$ so that the quantity represents the amount of genetic variation that cannot be explained by main effects only. Using the adjusted version of the

coefficient of determination helps account for potential overfitting. Bootstrap confidence

intervals of this proportion were calculated.
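The arithmetic behind several of the steps above (the Winsorizing ceiling, the inverse-variance weighted combination of male and female estimates, Fisher's method, and the share of explained variance attributable to interactions) can be illustrated with a minimal Python sketch. This is not the code used in this study, which was carried out in R. The function names and the input values are hypothetical, the median absolute deviation is assumed to be unscaled, and the variance returned for the combined estimate is the standard fixed-effect result rather than something stated in the text.

```python
import math
import statistics


def winsorize_ceiling(values, n_mads=4.0):
    """Cap values above median + n_mads * MAD; lower values are untouched."""
    med = statistics.median(values)
    # Assumption: unscaled MAD (median of absolute deviations from the median).
    mad = statistics.median([abs(v - med) for v in values])
    ceiling = med + n_mads * mad
    return [min(v, ceiling) for v in values], ceiling


def ivw_combine(beta_f, var_f, beta_m, var_m):
    """Inverse-variance weighted average of the female and male estimates."""
    w = (1.0 / var_f) / (1.0 / var_f + 1.0 / var_m)
    beta_ivw = w * beta_f + (1.0 - w) * beta_m
    var_ivw = 1.0 / (1.0 / var_f + 1.0 / var_m)  # standard fixed-effect variance
    return beta_ivw, var_ivw, w


def fisher_combine(p_values):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2k df under the null."""
    half_stat = -sum(math.log(p) for p in p_values)
    k = len(p_values)
    # Closed-form chi-square survival function for an even number of df.
    tail = sum(half_stat ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_stat) * tail


def interaction_share(r2_full, r2_additive):
    """Fraction of the explained variance not captured by main effects alone."""
    return (r2_full - r2_additive) / r2_full


if __name__ == "__main__":
    glucose = [70, 72, 75, 68, 74, 71, 73, 300]  # made-up values in mg/dL
    capped, ceiling = winsorize_ceiling(glucose)
    print("ceiling:", ceiling, "capped maximum:", max(capped))
    print("IVW estimate, variance, weight:", ivw_combine(2.0, 1.0, 4.0, 3.0))
    print("combined p-value:", fisher_combine([0.05, 0.05]))
    print("interaction share:", interaction_share(0.4, 0.1))
```

Because the weight w grows as the variance of the female estimate shrinks, the combined estimate is pulled towards whichever sex was measured more precisely.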

Sample preparation for RNA-Seq. Liver tissue stored in RNAlater was homogenized using a Tissumizer Homogenizer (Tekmar, Cincinnati, OH, USA). Total RNA was isolated using the PureLink RNA purification kit (Thermo Fisher Scientific, Waltham,

MA, USA). A sequencing library was generated using the TruSeq Stranded Total RNA kit (Illumina, San Diego, CA, USA). RNA samples were sequenced on Illumina

HiSeq2500s with single-end 50 bp reads [210]. Library preparation and RNA

sequencing were performed by the CWRU genomics core (Director, Dr. Alex Miron). A

total of 7,269,450,186 reads were generated across four flow cells, with an average of

47,204,222 ± 928,913 [range: 14,561,990 – 76,538,825] reads per sample. Sequencing

quality was assessed by FastQC [211], which identified an average per base quality score

of 35.46.

RNA-Seq data analysis. To maximize statistical power, 20 samples were selected for

analysis from the control B6 group, 8 samples were selected from the single CSS groups,

and 5 samples were selected from the double CSS groups. A total of 154 control and CSS

mice were analyzed, including 20 B6 mice, 63 mice that were heterozygous for one A/J-

derived chromosome, and 71 mice that were heterozygous for two different A/J-derived chromosomes. Only male mice were analyzed to avoid complications due to sex

differences in gene expression. The B6.A4 x B6.A3 and B6.A8 x B6.A3 crosses were

poor breeders and thus we did not obtain 5 samples to analyze from these crosses.

Reads were aligned using TopHat2 (2.0.10) [212] to the reference mm10 genome with the

GENCODE vM7 annotations as a guide. Because the reference genome is comprised of

sequence from strain B6, sequencing reads from a B6-derived chromosome are more

accurately mapped than reads from an A/J-derived chromosome [213]. To avoid potential

mapping biases, we created an “individualized genome” of the A/J mouse strain using the

program Seqnature [213] with variant calls from the Mouse Genomes Project that were

downloaded from The Sanger Institute [214]. Reads that were not mapped to the B6

genome were then mapped to the individualized A/J genome with TopHat2. HTSeq-count

[215] and the GENCODE vM7 gene annotations[46] were used to count the number of

reads for each gene feature. After filtering to remove duplicate reads, unmapped reads,

low quality reads, and reads mapped to non-GENCODE regions of the genome, an

average of 16,506,775 ± 439,754 [range: 4,638,701 – 30,465,477] reads were mapped to

GENCODE regions per sample. There was no significant difference in the mapping

efficiency (number of mapped reads / total number of reads) between the control B6

samples and any of the CSS strains either genome-wide (Fig. 4.14A) or on the substituted

chromosome (Fig. 4.14B). This suggests that the sequence differences on the A/J chromosomes did not reduce mapping efficiency in the CSSs.

Graphical depictions of the distribution of CPM (counts per million) were used to identify and remove the following 3 outlier samples: E171, E305, and E570. Genes for which fewer than 75% of the samples had a count greater than or equal to 15 were considered to be expressed at low levels in liver and were removed, leaving 13,289 genes that were considered expressed. To enhance reproducibility and reduce the dependence between the genes, svaseq [61] was used to create 5 surrogate variables that served as covariates in subsequent modeling.

EdgeR [58] was used to fit a model with main effects and pairwise interactions between each chromosome substitution. EdgeR uses a log link function, and thus departure from additivity in EdgeR is departure from a multiplicative model on the gene expression level.

For each gene an interaction model was fit which included the following terms: (1) maternal substitution, (2) paternal substitution, (3) the interaction of maternal by paternal substitution, and (4) the SVA covariates. For all models, “B6” was used as the reference for the categorical chromosome substitution predictors.
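The low-expression filter and the CPM scaling described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline used here: the gene names and counts are invented, the function names are hypothetical, and CPM is computed against the simple column sum without the library-size normalization that EdgeR can apply.

```python
def library_sizes(counts):
    """Total read count per sample; counts maps gene -> list of per-sample counts."""
    n_samples = len(next(iter(counts.values())))
    return [sum(row[j] for row in counts.values()) for j in range(n_samples)]


def counts_per_million(counts):
    """Scale each count by its sample's library size (no further normalization)."""
    sizes = library_sizes(counts)
    return {gene: [1e6 * c / sizes[j] for j, c in enumerate(row)]
            for gene, row in counts.items()}


def expressed_genes(counts, min_count=15, min_fraction=0.75):
    """Keep genes with count >= min_count in at least min_fraction of samples."""
    kept = {}
    for gene, row in counts.items():
        fraction = sum(1 for c in row if c >= min_count) / len(row)
        if fraction >= min_fraction:
            kept[gene] = row
    return kept


if __name__ == "__main__":
    # Hypothetical counts for three genes across four samples.
    counts = {
        "GeneA": [20, 30, 15, 40],
        "GeneB": [0, 16, 3, 20],
        "GeneC": [15, 15, 14, 15],
    }
    print(sorted(expressed_genes(counts)))
    print(counts_per_million(counts)["GeneA"])
```

In this toy input, GeneB reaches the count threshold in only half of the samples and is dropped, while GeneC sits exactly at the 75% boundary and is retained.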

Fig. 4.14. No differences in mapping efficiency of RNA-Seq reads between B6 and

CSSs.

(A) Genome-wide mapping efficiency was calculated as the number of unique reads

mapped to the GENCODE coding portion of the genome divided by the total number of

reads per sample. (B) Mapping efficiency was calculated as above for the individual substituted chromosomes in each CSS as indicated.

A stratified FDR approach was used for the analysis of both meQTLs and ieQTLs [219].

For meQTLs, we tested for associations between every combination of chromosome substitutions in the study with every unfiltered gene in the RNA-Seq data. These hypothesis tests were stratified by chromosome and cis vs. trans. The method of

Benjamini and Hochberg [220] was applied within each stratum to control the false discovery rate. Similarly, the hypothesis tests for the ieQTLs were stratified by each chromosome combination and cis/trans. The stratified FDR approach has been shown to be more powerful when the proportion of true hypotheses differs by stratum. The chromosome-chromosome interactions with FDR < 0.05 were divided into the categories synergistic and antagonistic based on the gene expression differences between the double

CSS strain and the control strain relative to that predicted by an additive model (Fig.

4.11). Spearman’s r was used to summarize the association between several variables in the analysis. A Spearman’s r of 1 implies that the rank order of the values for two variables is the same. To estimate the amount of variation attributable to interaction, we fit an additive model in EdgeR which did not include any interaction terms. We then calculated for each individual and gene the fitted values assuming that the individual’s covariates (i.e. the SVA surrogate variables) were set to 0 and thus do not contribute to the variation. We calculated SSFull as the sum of the mean-centered and squared fitted values for the full model including interaction; SSAdditive was calculated similarly for the additive model. We calculated the proportion of the genetic variation explained by interactions as (SSFull - SSAdditive) / SSFull. This proportion is only meaningful when there is genetic variation to be explained. To retain only genes with evidence of genetic control, using the full model for each gene, we tested the overall joint null hypothesis that

all mouse strains had the same average expression level using the empirical Bayes quasi-likelihood F-test as implemented in EdgeR. This allowed us to classify some genes as showing evidence of genetic control. Only these genes were analyzed further. The estimator (SSFull - SSAdditive) / SSFull may be slightly biased upward due to overfitting.

However, the mean value for this statistic among the genes with no significant interaction

(FDR > 0.5) was 0.25 (1st quartile: 0.20, 3rd quartile: 0.32) (Fig. 4.13B), which gives one

estimate of the upper bound on the possible bias. Here, the overall test that the interaction

terms were all 0 was carried out using the Bayes quasi-likelihood F-test as

implemented in EdgeR. To assess any potential bias stemming from the arbitrary

selection of an FDR > 0.5, we performed a simulation study to independently

approximate the upper limit on this bias. Using the fitted values (i.e. predicted mean)

from the additive model described above, we simulated counts for each gene and

individual from a Poisson distribution. The full and additive models were fit to the

simulated data set, and the variance explained (SSFull - SSAdditive) / SSFull was calculated for

each gene. The simulation was repeated 100 times and the variance explained by

interaction was averaged across all simulations for each gene. The mean for the amount

of genetic variance explained by interaction under this simulated additive model was 0.13

(1st quartile: 0.05, 3rd quartile: 0.19) (Fig. 4.13C). This gives another estimate of the

upper bound on the possible bias.
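The stratified FDR step described in this section can be sketched as follows. This is a minimal Python illustration rather than the analysis code used here: the stratum labels and p-values are invented, and the function names are hypothetical. The Benjamini-Hochberg adjustment is applied separately within each stratum (for example, one chromosome substitution combined with cis or trans), so that a stratum enriched for true effects is not penalized by strata with few.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def stratified_fdr(tests):
    """tests: list of (stratum, p_value). BH is applied within each stratum."""
    by_stratum = {}
    for idx, (stratum, p) in enumerate(tests):
        by_stratum.setdefault(stratum, []).append((idx, p))
    q_values = [None] * len(tests)
    for members in by_stratum.values():
        adjusted = benjamini_hochberg([p for _, p in members])
        for (idx, _), q in zip(members, adjusted):
            q_values[idx] = q
    return q_values


if __name__ == "__main__":
    # Hypothetical strata: chromosome substitution combined with cis/trans.
    tests = [("chr3:cis", 0.01), ("chr3:trans", 0.01), ("chr3:trans", 0.5)]
    print(stratified_fdr(tests))
```

The same p-value can therefore receive different adjusted values depending on how many tests share its stratum.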

Multiple testing correction. For both the analysis of mouse phenotypes and RNA-Seq

data it is necessary to account for multiple testing in order to avoid a large number of

false positive findings. The approaches to multiple testing for the mouse phenotypes and

RNA-Seq data are fundamentally different because the number of hypotheses being tested was very different. For the mouse phenotype data, there were a relatively small number of targeted hypotheses, and thus the conservative and more confirmatory approach of controlling the family-wise type I error was applied. In this case, the genetic scan for each of the small number of traits was considered to be a separate question (i.e. the main effects for each trait and interaction effects for each trait were considered a separate

“family” of hypotheses). For the large number of traits analyzed in the RNA-Seq data, a less conservative and more hypothesis-generating approach known as the stratified FDR was applied.

Quantitative PCR (qPCR). Tissue was homogenized using TissueLyser II (Qiagen,

Valencia, CA, USA) and total RNA was isolated using the PureLink RNA purification kit with TRIzol protocol (Thermo Fisher Scientific, Waltham, MA, USA). Total RNA was reverse transcribed using the high capacity cDNA reverse transcription kit (Applied

Biosystems, Carlsbad, CA, USA). The sequences for each primer are listed in Table 4.10.

The qPCR reactions were performed with the power SYBR green PCR Master Mix

(Thermo Fisher Scientific, Waltham, MA, USA) and run on a Bio Rad CFX Connect

Real Time System (Bio Rad, Hercules, CA, USA). Expression levels were calculated using the ΔΔCt method relative to the Rplp0 control gene.
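For readers unfamiliar with the ΔΔCt calculation, a minimal Python sketch is given below. It assumes the commonly used form of the method, in which relative expression equals 2 raised to the power of minus ΔΔCt and amplification efficiency is taken to be 100%; the function name and Ct values are hypothetical and are not taken from this study.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_control, ct_reference_control):
    """Fold change of a target gene versus a control sample by the ddCt method.

    Each Ct is first normalized to the reference gene (e.g. Rplp0), and the
    normalized value is then compared with that of the control sample.
    """
    delta_sample = ct_target - ct_reference
    delta_control = ct_target_control - ct_reference_control
    delta_delta = delta_sample - delta_control
    # Assumption: perfect doubling per cycle (100% amplification efficiency).
    return 2.0 ** -delta_delta


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses threshold one cycle earlier
    # in the test sample than in the control after normalization.
    print(relative_expression(24.0, 18.0, 25.0, 18.0))
```

A ΔΔCt of zero corresponds to no change relative to the control sample, and each one-cycle decrease corresponds to a doubling of the estimated expression.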

Table 4.8. Identification of fasting glucose QTLs using a combined linear model.

Model Terms                    Estimate    Std. Error   t value   Pr(>|t|)    Significant
(Intercept)                    72.30       2.46         29.37     < 2e-16     *
Maternal A14                   9.35        4.44         2.10      0.035712
Maternal A17                   -1.73       4.25         -0.41     0.683665
Maternal A3                    24.59       5.25         4.68      3.40E-06    *
Maternal A6                    11.30       4.30         2.63      0.00873
Paternal A10                   13.26       4.49         2.95      0.003263    *
Paternal A4                    16.96       4.89         3.47      0.000549    *
Paternal A5                    15.28       4.17         3.66      0.00027     *
Paternal A8                    8.23        4.67         1.76      0.078739
Sex (Male)                     9.36        1.54         6.08      1.95E-09    *
Maternal A14 : Paternal A10    -17.91      6.77         -2.64     0.00838
Maternal A17 : Paternal A10    0.04        6.61         0.01      0.995736
Maternal A3 : Paternal A10     -29.66      7.31         -4.06     5.52E-05    *
Maternal A6 : Paternal A10     -16.45      6.83         -2.41     0.016317
Maternal A14 : Paternal A4     -11.64      7.84         -1.48     0.13834
Maternal A17 : Paternal A4     -5.76       7.17         -0.80     0.422139
Maternal A3 : Paternal A4      -6.31       13.93        -0.45     0.650844
Maternal A6 : Paternal A4      -24.91      7.04         -3.54     0.000431    *
Maternal A14 : Paternal A5     -16.18      6.69         -2.42     0.015864
Maternal A17 : Paternal A5     -7.50       6.39         -1.17     0.24095
Maternal A3 : Paternal A5      -3.28       7.81         -0.42     0.674979
Maternal A6 : Paternal A5      -24.34      6.66         -3.66     0.000275    *
Maternal A14 : Paternal A8     -11.1885    8.97221      -1.247    0.21279
Maternal A17 : Paternal A8     -2.8734     7.06517      -0.407    0.684347
Maternal A3 : Paternal A8      -34.6259    9.97441      -3.471    0.000548    *
Maternal A6 : Paternal A8      -22.5194    7.30439      -3.083    0.002126    *

Model used: Glucose = [maternal substitution] + [paternal substitution] + [maternal substitution]*[paternal substitution] + [sex]

* indicates statistical significance (p<0.05) following Bonferroni correction

Table 4.9. Identification of body weight QTLs using a combined linear model.

Model Terms                    Estimate    Std. Error   t value   Pr(>|t|)    Significant
(Intercept)                    13.93       0.19         74.82     < 2e-16     *
Maternal A14                   0.37        0.33         1.12      0.262578
Maternal A17                   -1.13       0.32         -3.50     0.000486    *
Maternal A3                    0.27        0.40         0.67      5.05E-01
Maternal A6                    0.67        0.33         2.07      0.038688
Paternal A10                   0.60        0.34         1.77      0.076771
Paternal A4                    0.36        0.37         0.97      0.335033
Paternal A5                    -0.10       0.32         -0.32     0.746572
Paternal A8                    1.29        0.35         3.64      0.000297    *
Sex (Male)                     2.80        0.12         24.13     < 2e-16     *
Maternal A14 : Paternal A10    -0.83       0.51         -1.65     0.100242
Maternal A17 : Paternal A10    -0.03       0.50         -0.06     0.951586
Maternal A3 : Paternal A10     -0.33       0.55         -0.61     5.44E-01
Maternal A6 : Paternal A10     0.11        0.51         0.22      0.826329
Maternal A14 : Paternal A4     0.00        0.59         0.00      0.999939
Maternal A17 : Paternal A4     -0.18       0.54         -0.34     0.733663
Maternal A3 : Paternal A4      0.48        1.05         0.46      0.645792
Maternal A6 : Paternal A4      0.34        0.53         0.64      0.520308
Maternal A14 : Paternal A5     -0.41       0.50         -0.81     0.416534
Maternal A17 : Paternal A5     0.30        0.48         0.62      0.538963
Maternal A3 : Paternal A5      0.54        0.59         0.92      0.356591
Maternal A6 : Paternal A5      1.28        0.50         2.55      0.0111
Maternal A14 : Paternal A8     -0.9913     0.6768879    -1.464    0.143483
Maternal A17 : Paternal A8     -0.03062    0.5344762    -0.057    0.954337
Maternal A3 : Paternal A8      0.401273    0.7545657    0.532     0.595029
Maternal A6 : Paternal A8      0.005279    0.5525797    0.01      0.99238

Model used: Weight = [maternal substitution] + [paternal substitution] + [maternal substitution]*[paternal substitution] + [sex]

* indicates statistical significance (p<0.05) following Bonferroni correction

Table 4.10. Primer sequences for RT-qPCR detection.

Interaction Primer Gene name Sequence(5' to 3') category type Forward TGCTTCAGATCATGGAGGAGA Agxt Reverse TGGTTCCGGTTAGAAAGGAGT Forward AGGCCATGAAGGAGATGCAC Pcx Reverse CTTAGCCACCTTGTCCCCTG Forward TCTGGGAGAGACGGGTTTTG Slc6a12 Reverse GAAGACGATGCCCTGGTAGG Forward CACAGTGTCGGTGGACATGA Serpinf2 Reverse GGGGAAATGAGCCACCTGTA Forward CCCGGTCTGTCCACCTTTAC Zbtb20 Reverse TGGGGCTTCTCACCTGTATG Forward CTCTGCTTCGCCGACTACG Tmem245 Reverse CAATGTCCAGATCCACAGGCT Antagonistic Forward AGTATCCCGGAGTCTCAGTCAA Raph1 Reverse TAGTTTGAGGGGACAGAGGGG Forward CGGGGCGCACAGGTTATTAG Dnajb9 Reverse CTCTGAGGCAGACTTTGGCA Forward AACAACATGGCCCGAGTAGG Cers6 Reverse TGCCATTTTGGCAGCCTCTA Forward GGAGTGGTGTGAATGTTGCC Ldha Reverse TCACCTCGTAGGCACTGTCC Forward AGAACGAGATGGTGTGCGTT Sec23b Reverse GCATATGCTGGAGGGAACTGA Forward AGCAAGCCAGAGGTGTTTGG Eif2ak3 Reverse GGAAGATTCGAGCAGGGACTC Forward AGTGGGGATAATGAGTAAATCCAT Cyp3a16 Reverse GGCACCTAACACATCTTTCACAG Forward CCACCAGTACAGCCGTTTCT Syvn1 Reverse TACCCATCCAAGGAGGAGGG Forward GATACACCATGGGTGACGCT Gstm1 Reverse TCTCCATCCAGGTGGTGCTT Forward CCCTCATGGTCTGGTTGGTTT Usp18 Reverse GCACTCCGAGGCACTGTTAT Synergistic Forward GCGGGAGAAAAACATGGCTC Pik3c2a Reverse AATACCAGGACCTCACGCTG Forward CTACGTGTTCCCAGACCGAC Stat5a Reverse TGACGAACTCAGGGACCACT Forward GCACTGGCCGTTTTTGTGAT Tecr Reverse TCCAGGAGCCCACCTCATAA Forward AGAGAAAACCAGCGAGGAGC Mcm10 Reverse GGCTGCAGAGATGAATCAGGT Nol8 Forward GACGACAGACTTCGTGGTTCT

Reverse CTTGTTCGGGCTTCCCAAGA Forward ACTCTTCGGCCCCTGAGAAG Rprl3 Reverse GCTCTCTGGGAATTCACCTCC Forward GGAGTCTTTGGGATCCCTGC Zkscan3 Reverse TCCATTTTCAGCAACCCCTGT Forward GCCATCTATGACAGCGTGGA Asns Reverse AGTCCAGGCCCCCTGATAAA Forward GCAAAATCTCCAGGATGGCG Additive Slc12a2 Reverse CATATGTGAGCAACGCAGCC Forward AACCTGCTCCAGGAAACATCA Igfbp3 Reverse ACTTGGAATCGGTCACTCGG Forward GGAGTGGTGTGAATGTTGCC Ldha Reverse TCACCTCGTAGGCACTGTCC

Chapter 5. Summary and Future Direction

Summary

In this dissertation, we have identified the genetic basis of two rare Mendelian disorders

(Chapters 2 and 3) and gained a better understanding of the genetic architecture of complex metabolic traits

(Chapter 4), both using forward genetic approaches.

In Chapter 2, we explored the exome of four patients with primary ovarian insufficiency from two independent consanguineous families and identified causal missense mutations in MRPS22, which was further investigated using a Drosophila model. The infertility and lack of germ cells phenotype in both humans and flies unveiled a novel function of

MRPS22 in germ cell development, which differed from all previous cases with MRPS22 mutations.

Similarly, in Chapter 3, we identified three novel loss-of-function mutations in PIK3C2A underlying a previously unidentified syndrome in five patients from three independent consanguineous families, providing us with the opportunity to investigate its novel functions in phosphatidylinositol metabolism, cilia function and cataract formation, using patient-derived fibroblasts and a zebrafish model of Pik3c2a deficiency.

In Chapter 4, we identified widespread epistatic interactions using double chromosome substitution strains in mouse, thus providing strong evidence for the importance of epistasis in complex traits, which remains a controversial phenomenon. Our findings demonstrated that epistatic interactions controlled the majority of the heritable variation

in both fasting plasma glucose levels and hepatic gene expression, even greater than the additive effects. This finding may partially explain the ‘missing heritability’ phenomenon due to the difficulty of discovering epistatic interactions in humans. We also identified an interesting effect of epistatic interactions, which were prone to maintain fasting glucose levels at control levels. This might be an evolutionary strategy that stores genetic variants in individuals without reducing their fitness and allows them to quickly adapt to new environmental challenges.

Future Direction

1. Researchers are not alone in battles against genetic diseases

Adrenoleukodystrophy is a disease due to the accumulation of very long chain fatty acids that leads to demyelination and eventual death before adulthood. Back in the

1990s, Lorenzo’s Oil, a 4:1 mix of oleic acid and erucic acid, was a trial treatment for this disease. Surprisingly, it was proposed by the parents of Lorenzo, who were desperately looking for a cure for their young son. Though the clinical trial ended with mixed results and failed[221], it did promote studies of this rare disease and ultimately led to the discovery of causal variants in ABCD1[222]. This case demonstrated the enthusiasm and effort that patients and their relatives can contribute to the study of rare human disorders.

In fact, it’s a trend nowadays that efforts directly from patients and their relatives

contribute to an increasing proportion of patient-centered genetic studies. For example,

SPARK is an ambitious project aiming to enroll 50,000 individuals with autism and their families in the United States to accelerate autism research. In this study, cases are

reported by patients or their parents, participants have online access to study results,

and researchers reserve the ability to recontact participants for new research studies

and to potentially provide first-hand updates and treatment guidelines[223]. With the

access to personal sequencing data, individuals can easily retrieve their DNA report

according to newly annotated variants using web-based literature retrieval systems, such as Promethease.

In our studies, we also benefited from positive patient-researcher interactions. With their

consent for skin punch biopsies and the application of that in our research, we were able

to culture primary fibroblast cells, which served as the cellular models in our studies. On

the other hand, patients benefited from our studies as well. For example, the secondary

glaucoma was not diagnosed in Family I until we found a second family with PIK3C2A

deficiency that was already known to have glaucoma. Due to this discovery, the patients

in Family I were referred to an ophthalmologist for a detailed eye exam, which confirmed

the presence of glaucoma. In addition to improving the clinical diagnosis, our

identification of the genetic basis for this syndrome can also be used to provide prenatal

screening for any future offspring from these affected families and their relatives.

2. Gene therapy to cure the diseases

Since Jesse Gelsinger’s death during a gene therapy trial in 1999, gene therapy research had been almost ‘frozen’ for more than a decade. However, the discovery of a precise gene-editing tool, CRISPR/Cas9, has ‘broken the ice’ of gene therapy research. In 2015, China launched CRISPR trials in cancer patients. Earlier this year, a CRISPR/Cas9-based clinical trial to treat beta-thalassemia was approved in Europe. It appears that in the near future, we could conquer any genetic disease. However, evidence has shown that this assumption might be too optimistic. As Eric Lander said when asked for his opinion on how CRISPR should be applied, ‘We are terrible predictors of the consequences of the changes we make’. A good example is that a mutation in CCR5 that protects against HIV was unexpectedly associated with increased risk for West Nile virus[224], in which case genome editing of that mutation in an attempt to lower the risk of West Nile virus could result in an increased risk of HIV.

In addition to complex traits, treatments for rare disorders have also been applied in humans. Although the FDA has approved over 500 orphan drugs for rare disorders, and these orphan drugs accounted for 17% of prescription drug market share in 2017, none of them takes the form of gene therapy. Finally, last year, gene therapy using zinc finger nucleases was applied to a 44-year-old man with a rare disease referred to as Hunter syndrome, which is characterized by accumulation of glycosaminoglycans due to mutations in the gene IDS. According to ClinicalTrials.gov, this treatment (SB-913) inserts a correct copy of the IDS gene under the control of the strong albumin promoter

and produces functional IDS enzyme in patients’ liver. So far, his symptoms have diminished without any negative consequences.

This treatment sheds light on the potential application of gene therapy to our patients, who have a phenotype similar to Hunter syndrome. With a well-designed strategy to

restore a functional copy of PIK3C2A, patients may recover from the disease, even with

just 50% of gene function like their healthy parents. However, it’s likely wise to remain

skeptical towards this new trend as we have little knowledge of the long-term effects of

these gene therapy approaches.

3. Strategies to predict disease risk loci

Over the past two decades, the healthcare system has increasingly shifted its attention

from treatment options to predictive and/or preventive strategies to both improve patient

care and reduce overall cost [225]. To better predict disease risk, even before the onset of

disease, machine learning-based studies on electronic health records (EHR) are a

promising direction. Compared with traditional algorithms [226], machine learning-based

algorithms have improved the accuracy, precision and trade-off between false positive

and true positive rates and have been successfully applied in risk predictions of type 2

diabetes [227] and heart disease [228]. More interestingly, recent applications of machine

learning with big autism spectrum disorder datasets are able to provide scores for each

gene to identify novel autism genes. Collectively, application of new algorithms will

dramatically speed up the identification of novel disease genes.

In addition, machine learning-based imaging studies can facilitate genetic syndrome diagnoses [229–231]. Rare disorder patients on average visit 7.3 physicians before receiving an accurate diagnosis. However, with machine learning-based facial recognition software programs, like Face2Gene, physicians may be able to diagnose rare disorders simply by snapping a photo of a patient’s face. For example, such software can be applied to syndromic disorders with craniofacial malformations, such as that of our patients with

PIK3C2A mutations.

4. Strategies to better understand current data

In contrast to predictive studies, retrospective studies on raw sequencing data also shed light on the diagnosis of rare disorders. In a recent study, 156 cases that failed to provide definitive diagnoses were reanalyzed with raw sequencing data. Surprisingly, 24 of 156 cases turned out to be definitively diagnosed after just 1 year had passed since the initial analysis [232]. The improved diagnostic rates were mainly due to improved informatics approaches and a better understanding of the variants of unknown significance [233].

Therefore, reanalysis of an individual’s genome-wide sequencing data is recommended every 1–2 years until a diagnosis is reached [233].
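The reported yield and the idea of periodic reanalysis can be made concrete with a short Python sketch. The counts (24 of 156) come from the study cited above; the variant identifiers, classification labels and function names are hypothetical, and the lookup table merely stands in for an updated variant database.

```python
def diagnostic_yield(newly_diagnosed, reanalyzed):
    """Fraction of previously undiagnosed cases solved on reanalysis."""
    if reanalyzed <= 0:
        raise ValueError("reanalyzed must be positive")
    return newly_diagnosed / reanalyzed


def reclassified_variants(case_variants, updated_classifications):
    """Return stored variants that are now classified as (likely) pathogenic.

    case_variants: variant identifiers kept from the original analysis
    updated_classifications: dict mapping identifier -> current classification
    """
    actionable = {"pathogenic", "likely pathogenic"}
    return [
        v for v in case_variants
        # Variants absent from the updated table remain of uncertain significance.
        if updated_classifications.get(v, "uncertain significance") in actionable
    ]


print(f"Reanalysis yield: {diagnostic_yield(24, 156):.1%}")

# Hypothetical case: three stored variants checked against an updated table.
stored = ["variant_A", "variant_B", "variant_C"]
updated = {"variant_A": "benign", "variant_B": "likely pathogenic"}
print(reclassified_variants(stored, updated))
```

The first line printed corresponds to a yield of roughly 15%; the second shows how a variant that was uninformative at the time of the initial analysis could surface once its classification has been updated.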

In addition to publishing our work in scientific journals, we have submitted the pathogenic variants and supporting data to ClinVar, a widely used database for variant interpretation, which will help identify similar cases in the future.

5. What’s beyond genetic studies in understanding human disorders?

Our studies described in this thesis focused on identifying the genetic basis of complex traits and human disorders. However, studies on epigenetics and environmental factors are also indispensable.

In contrast to genetic studies, epigenetic studies focus on heritable changes in the regulation of gene activity and expression that do not alter the DNA sequence. The underlying mechanisms include DNA methylation, histone modification and long non-coding RNAs, all of which play crucial roles in development, tissue homeostasis, cell identity and genome stability [234]. As a result, aberrant epigenetic regulation has been associated with cancer, cardiovascular diseases, neurological diseases, metabolic disorders and imprinting disorders [235]. In addition to mutations in genes involved in epigenetic regulation [236], mutations in non-coding regions can also trigger human disorders through epigenetic changes. For example, in elderly type 2 diabetes patients, a polymorphism in the NDUFB6 promoter region that creates a new DNA methylation site was identified and shown to be associated with increased DNA methylation and decreased NDUFB6 expression, a known risk factor for insulin resistance [237]. These studies provide evidence that epigenetic factors are associated with human disorders.

In addition, environmental factors can influence human fitness. Environmental factors include nutrition, toxins, infectious agents and lifestyle. It is now well known that prenatal exposure to environmental factors can have a long-term impact not only on organ development but also on fitness in adulthood. A famous example is the Dutch famine studies. This series of studies found that individuals conceived during the famine showed higher rates of obesity and cardiovascular disorders when examined in their 50s, as well as a greater age-associated decline in cognitive function [238].

Similarly, prenatal smoke exposure has been strongly associated with an increased risk of impaired lung function development and asthma [239]. A further study showed changes in SAT2 methylation in the peripheral blood of exposed offspring [240]. These studies support the hypothesis that environmental factors contribute to developmental reprogramming through epigenetic regulation, with the resulting epigenetic changes acting as an “epigenetic memory” of the exposure.

References

1. Suter U, Welcher AA, Özcelik T, Snipes GJ, Kosaras B, Francke U, et al. Trembler mouse carries a point mutation in a myelin gene. Nature. 1992;356: 241–244. doi:10.1038/356241a0

2. A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes. The Huntington’s Disease Collaborative Research Group. Cell. 1993;72: 971–983.

3. Rinchik EM. Chemical mutagenesis and fine-structure functional analysis of the mouse genome. Trends Genet. 1991;7: 15–21. doi:10.1016/0168-9525(91)90016-J

4. Arnold CN, Barnes MJ, Berger M, Blasius AL, Brandl K, Croker B, et al. ENU-induced phenovariance in mice: inferences from 587 mutations. BMC Res Notes. 2012;5: 577. doi:10.1186/1756-0500-5-577

5. Reaume AG, Knecht DA, Chovnick A. The Rosy Locus in Drosophila Melanogaster: Xanthine Dehydrogenase and Eye Pigments. Genetics. 1991;129: 1099–1109.

6. Fire A, Xu S, Montgomery MK, Kostas SA, Driver SE, Mello CC. Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature. 1998;391: 806–811. doi:10.1038/35888

7. Kuttenkeuler D, Boutros M. Genome-wide RNAi as a route to gene function in Drosophila. Brief Funct Genomic Proteomic. 2004;3: 168–176.

8. Waterhouse PM, Graham MW, Wang MB. Virus resistance and gene silencing in plants can be induced by simultaneous expression of sense and antisense RNA. Proc Natl Acad Sci U S A. 1998;95: 13959–13964.

9. 1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, et al. A map of human genome variation from population-scale sequencing. Nature. 2010;467: 1061–1073. doi:10.1038/nature09534

10. 1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, et al. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491: 56–65. doi:10.1038/nature11632

11. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526: 68–74. doi:10.1038/nature15393

12. An integrated map of structural variation in 2,504 human genomes | Nature [Internet]. [cited 5 Jun 2018]. Available: https://www.nature.com/articles/nature15394

13. Botstein D, Risch N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat Genet. 2003;33: 228–237. doi:10.1038/ng1090

14. Majewski J, Schwartzentruber J, Lalonde E, Montpetit A, Jabado N. What can exome sequencing do for you? J Med Genet. 2011; jmedgenet–2011–100223. doi:10.1136/jmedgenet-2011-100223

15. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536: 285–291. doi:10.1038/nature19057

16. Scott EM, Halees A, Itan Y, Spencer EG, He Y, Azab MA, et al. Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery. Nat Genet. 2016;48: 1071–1076. doi:10.1038/ng.3592

17. Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science. 1996;273: 1516–1517.

18. International HapMap Consortium, Altshuler D, Donnelly P. A haplotype map of the human genome. Nature. 2005;437: 1299–1320. doi:10.1038/nature04226

19. Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 2014;42: D1001–1006. doi:10.1093/nar/gkt1229

20. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, et al. Common SNPs explain a large proportion of heritability for human height. Nat Genet. 2010;42: 565–569. doi:10.1038/ng.608

21. Yengo L, Sidorenko J, Kemper KE, Zheng Z, Wood AR, Weedon MN, et al. Meta-analysis of genome-wide association studies for height and body mass index in ~700,000 individuals of European ancestry. bioRxiv. 2018; 274654. doi:10.1101/274654

22. Visscher PM. Sizing up human height variation. Nat Genet. 2008;40: 489–490. doi:10.1038/ng0508-489

23. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature. 2009;461: 747–753. doi:10.1038/nature08494

24. Rivas MA, Beaudoin M, Gardet A, Stevens C, Sharma Y, Zhang CK, et al. Deep resequencing of GWAS loci identifies independent rare variants associated with inflammatory bowel disease. Nat Genet. 2011;43: 1066–1073. doi:10.1038/ng.952

25. Gudmundsson J, Sulem P, Gudbjartsson DF, Masson G, Agnarsson BA, Benediktsdottir KR, et al. A study based on whole-genome sequencing yields a rare variant at 8q24 associated with prostate cancer. Nat Genet. 2012;44: 1326–1329. doi:10.1038/ng.2437

26. Del-Aguila JL, Koboldt DC, Black K, Chasse R, Norton J, Wilson RK, et al. Alzheimer’s disease: rare variants with large effect sizes. Curr Opin Genet Dev. 2015;33: 49–55. doi:10.1016/j.gde.2015.07.008

27. Lee S, Abecasis GR, Boehnke M, Lin X. Rare-Variant Association Analysis: Study Designs and Statistical Tests. Am J Hum Genet. 2014;95: 5–23. doi:10.1016/j.ajhg.2014.06.009

28. Marouli E, Graff M, Medina-Gomez C, Lo KS, Wood AR, Kjaer TR, et al. Rare and low-frequency coding variants alter human adult height. Nature. 2017;542: 186–190. doi:10.1038/nature21039

29. Jonsson T, Stefansson H, Steinberg S, Jonsdottir I, Jonsson PV, Snaedal J, et al. Variant of TREM2 Associated with the Risk of Alzheimer’s Disease. N Engl J Med. 2013;368: 107–116. doi:10.1056/NEJMoa1211103

30. Zuk O, Hechter E, Sunyaev SR, Lander ES. The mystery of missing heritability: Genetic interactions create phantom heritability. Proc Natl Acad Sci. 2012;109: 1193–1198. doi:10.1073/pnas.1119675109

31. Huang Y, Wang C, Yao Y, Zuo X, Chen S, Xu C, et al. Molecular Basis of Gene-Gene Interaction: Cyclic Cross-Regulation of Gene Expression and Post-GWAS Gene-Gene Interaction Involved in Atrial Fibrillation. PLoS Genet. 2015;11: e1005393. doi:10.1371/journal.pgen.1005393

32. Liu Y, Xu H, Chen S, Chen X, Zhang Z, Zhu Z, et al. Genome-wide interaction-based association analysis identified multiple new susceptibility Loci for common diseases. PLoS Genet. 2011;7: e1001338. doi:10.1371/journal.pgen.1001338

33. Kirino Y, Bertsias G, Ishigatsubo Y, Mizuki N, Tugal-Tutkun I, Seyahi E, et al. Genome-wide association analysis identifies new susceptibility loci for Behçet’s disease and epistasis between HLA-B*51 and ERAP1. Nat Genet. 2013;45: 202–207. doi:10.1038/ng.2520

34. Verma SS, Cooke Bailey JN, Lucas A, Bradford Y, Linneman JG, Hauser MA, et al. Epistatic Gene-Based Interaction Analyses for Glaucoma in eMERGE and NEIGHBOR Consortium. PLoS Genet. 2016;12: e1006186. doi:10.1371/journal.pgen.1006186

35. Hemani G, Shakhbazov K, Westra H-J, Esko T, Henders AK, McRae AF, et al. Detection and replication of epistasis influencing transcription in humans. Nature. 2014;508: 249–253. doi:10.1038/nature13005

36. Hu T, Chen Y, Kiralis JW, Collins RL, Wejse C, Sirugo G, et al. An information-gain approach to detecting three-way epistatic interactions in genetic association studies. J Am Med Inform Assoc JAMIA. 2013;20: 630–636. doi:10.1136/amiajnl-2012-001525

37. Galarza-Muñoz G, Briggs FBS, Evsyukova I, Schott-Lerner G, Kennedy EM, Nyanhete T, et al. Human Epistatic Interaction Controls IL7R Splicing and Increases Multiple Sclerosis Risk. Cell. 2017;169: 72–84.e13. doi:10.1016/j.cell.2017.03.007

38. Brewer GJ. Drug development for orphan diseases in the context of personalized medicine. Transl Res J Lab Clin Med. 2009;154: 314–322. doi:10.1016/j.trsl.2009.03.008

39. Boycott KM, Vanstone MR, Bulman DE, MacKenzie AE. Rare-disease genetics in the era of next-generation sequencing: discovery to translation. Nat Rev Genet. 2013;14: 681–691. doi:10.1038/nrg3555

40. Marešová P, Mohelská H, Kuča K. Cooperation Policy of Rare Diseases in the European Union. Procedia - Soc Behav Sci. 2015;171: 1302–1308. doi:10.1016/j.sbspro.2015.01.245

41. Hamamy HA, Masri AT, Al-Hadidy AM, Ajlouni KM. Consanguinity and genetic disorders. Profile from Jordan. Saudi Med J. 2007;28: 1015–1017.

42. Bittles AH, Black ML. Evolution in health and medicine Sackler colloquium: Consanguinity, human evolution, and complex diseases. Proc Natl Acad Sci U S A. 2010;107 Suppl 1: 1779–1786. doi:10.1073/pnas.0906079106

43. Hamamy H. Consanguineous marriages. J Community Genet. 2012;3: 185–192. doi:10.1007/s12687-011-0072-y

44. Beaulieu CL, Majewski J, Schwartzentruber J, Samuels ME, Fernandez BA, Bernier FP, et al. FORGE Canada Consortium: outcomes of a 2-year national rare-disease gene-discovery project. Am J Hum Genet. 2014;94: 809–817. doi:10.1016/j.ajhg.2014.05.003

45. Committee opinion no. 605: primary ovarian insufficiency in adolescents and young women. Obstet Gynecol. 2014;124: 193–197. doi:10.1097/01.AOG.0000451757.51964.98

46. Rossetti R, Ferrari I, Bonomi M, Persani L. Genetics of primary ovarian insufficiency: Genetics of POI. Clin Genet. 2017;91: 183–198. doi:10.1111/cge.12921

47. Bolcun-Filas E, Hall E, Speed R, Taggart M, Grey C, de Massy B, et al. Mutation of the mouse Syce1 gene disrupts synapsis and suggests a link between synaptonemal complex structural components and DNA repair. PLoS Genet. 2009;5: e1000393. doi:10.1371/journal.pgen.1000393

48. Costa Y, Speed R, Ollinger R, Alsheimer M, Semple CA, Gautier P, et al. Two novel proteins recruited by synaptonemal complex protein 1 (SYCP1) are at the centre of meiosis. J Cell Sci. 2005;118: 2755–2762. doi:10.1242/jcs.02402

49. Caburet S, Arboleda VA, Llano E, Overbeek PA, Barbero JL, Oka K, et al. Mutant cohesin in premature ovarian failure. N Engl J Med. 2014;370: 943–949. doi:10.1056/NEJMoa1309635

50. Guiraldelli MF, Eyster C, Wilkerson JL, Dresser ME, Pezza RJ. Mouse HFM1/Mer3 is required for crossover formation and complete synapsis of homologous chromosomes during meiosis. PLoS Genet. 2013;9: e1003383. doi:10.1371/journal.pgen.1003383

51. Wang J, Zhang W, Jiang H, Wu B-L, Primary Ovarian Insufficiency Collaboration. Mutations in HFM1 in recessive primary ovarian insufficiency. N Engl J Med. 2014;370: 972–974. doi:10.1056/NEJMc1310150

52. Weinberg-Shukron A, Renbaum P, Kalifa R, Zeligson S, Ben-Neriah Z, Dreifuss A, et al. A mutation in the nucleoporin-107 gene causes XX gonadal dysgenesis. J Clin Invest. 2015;125: 4295–4304. doi:10.1172/JCI83553

53. AlAsiri S, Basit S, Wood-Trageser MA, Yatsenko SA, Jeffries EP, Surti U, et al. Exome sequencing reveals MCM8 mutation underlies ovarian failure and chromosomal instability. J Clin Invest. 2015;125: 258–262. doi:10.1172/JCI78473

54. Fauchereau F, Shalev S, Chervinsky E, Beck-Fruchter R, Legois B, Fellous M, et al. A non-sense MCM9 mutation in a familial case of primary ovarian insufficiency. Clin Genet. 2016;89: 603–607. doi:10.1111/cge.12736

55. Wood-Trageser MA, Gurbuz F, Yatsenko SA, Jeffries EP, Kotan LD, Surti U, et al. MCM9 mutations are associated with ovarian failure, short stature, and chromosomal instability. Am J Hum Genet. 2014;95: 754–762. doi:10.1016/j.ajhg.2014.11.002

56. Zhao H, Chen Z-J, Qin Y, Shi Y, Wang S, Choi Y, et al. Transcription factor FIGLA is mutated in patients with premature ovarian failure. Am J Hum Genet. 2008;82: 1342–1348. doi:10.1016/j.ajhg.2008.04.018

57. Bayram Y, Gulsuner S, Guran T, Abaci A, Yesil G, Gulsuner HU, et al. Homozygous loss-of-function mutations in SOHLH1 in patients with nonsyndromic hypergonadotropic hypogonadism. J Clin Endocrinol Metab. 2015;100: E808–814. doi:10.1210/jc.2015-1150

58. Qin Y, Choi Y, Zhao H, Simpson JL, Chen Z-J, Rajkovic A. NOBOX homeobox mutation causes premature ovarian failure. Am J Hum Genet. 2007;81: 576–581. doi:10.1086/519496

59. Bouilly J, Bachelot A, Broutin I, Touraine P, Binart N. Novel NOBOX loss-of-function mutations account for 6.2% of cases in a large primary ovarian insufficiency cohort. Hum Mutat. 2011;32: 1108–1113. doi:10.1002/humu.21543

60. Kasippillai T, MacArthur DG, Kirby A, Thomas B, Lambalk CB, Daly MJ, et al. Mutations in eIF4ENIF1 are associated with primary ovarian insufficiency. J Clin Endocrinol Metab. 2013;98: E1534–1539. doi:10.1210/jc.2013-1102

61. Kurolap A, Orenstein N, Kedar I, Weisz Hubshman M, Tiosano D, Mory A, et al. Is one diagnosis the whole story? patients with double diagnoses. Am J Med Genet A. 2016;170: 2338–2348. doi:10.1002/ajmg.a.37799

62. Rösler A, Silverstein S, Abeliovich D. A (R80Q) mutation in 17 beta-hydroxysteroid dehydrogenase type 3 gene among Arabs of Israel is associated with pseudohermaphroditism in males and normal asymptomatic females. J Clin Endocrinol Metab. 1996;81: 1827–1831. doi:10.1210/jcem.81.5.8626842

63. Geissler WM, Davis DL, Wu L, Bradshaw KD, Patel S, Mendonca BB, et al. Male pseudohermaphroditism caused by mutations of testicular 17β–hydroxysteroid dehydrogenase 3. Nat Genet. 1994;7: 34–39. doi:10.1038/ng0594-34

64. Mendonca BB, Arnhold IJP, Bloise W, Andersson S, Russell DW, Wilson JD. 17β-Hydroxysteroid Dehydrogenase 3 Deficiency in Women. J Clin Endocrinol Metab. 1999;84: 802–804. doi:10.1210/jcem.84.2.5477

65. Ng PC, Henikoff S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003;31: 3812–3814.

66. Adzhubei I, Jordan DM, Sunyaev SR. Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet. 2013;Chapter 7: Unit7.20. doi:10.1002/0471142905.hg0720s76

67. Sobreira N, Schiettecatte F, Valle D, Hamosh A. GeneMatcher: a matching tool for connecting investigators with an interest in the same gene. Hum Mutat. 2015;36: 928–930. doi:10.1002/humu.22844

68. Saada A, Shaag A, Arnon S, Dolfin T, Miller C, Fuchs-Telem D, et al. Antenatal mitochondrial disease caused by mitochondrial ribosomal protein (MRPS22) mutation. J Med Genet. 2007;44: 784–786. doi:10.1136/jmg.2007.053116

69. Smits P, Saada A, Wortmann SB, Heister AJ, Brink M, Pfundt R, et al. Mutation in mitochondrial ribosomal protein MRPS22 leads to Cornelia de Lange-like phenotype, brain abnormalities and hypertrophic cardiomyopathy. Eur J Hum Genet EJHG. 2011;19: 394–399. doi:10.1038/ejhg.2010.214

70. Ye F, Hoppel CL. Measuring oxidative phosphorylation in human skin fibroblasts. Anal Biochem. 2013;437: 52–58. doi:10.1016/j.ab.2013.02.010

71. Lee T, Luo L. Mosaic analysis with a repressible cell marker for studies of gene function in neuronal morphogenesis. Neuron. 1999;22: 451–461.

72. Cabrera GR, Godt D, Fang P-Y, Couderc J-L, Laski FA. Expression pattern of Gal4 enhancer trap insertions into the bric à brac locus generated by P element replacement. Genes N Y N 2000. 2002;34: 62–65. doi:10.1002/gene.10115

73. Van Doren M, Williamson AL, Lehmann R. Regulation of zygotic gene expression in Drosophila primordial germ cells. Curr Biol CB. 1998;8: 243–246.

74. Smits P, Smeitink JAM, van den Heuvel LP, Huynen MA, Ettema TJG. Reconstructing the evolution of the mitochondrial ribosomal proteome. Nucleic Acids Res. 2007;35: 4686–4703. doi:10.1093/nar/gkm441

75. Greber BJ, Ban N. Structure and Function of the Mitochondrial Ribosome. Annu Rev Biochem. 2016;85: 103–132. doi:10.1146/annurev-biochem-060815-014343

76. Amunts A, Brown A, Toots J, Scheres SHW, Ramakrishnan V. Ribosome. The structure of the human mitochondrial ribosome. Science. 2015;348: 95–98. doi:10.1126/science.aaa1193

77. Greber BJ, Bieri P, Leibundgut M, Leitner A, Aebersold R, Boehringer D, et al. Ribosome. The complete structure of the 55S mammalian mitochondrial ribosome. Science. 2015;348: 303–308. doi:10.1126/science.aaa3872

78. Menezes MJ, Guo Y, Zhang J, Riley LG, Cooper ST, Thorburn DR, et al. Mutation in mitochondrial ribosomal protein S7 (MRPS7) causes congenital sensorineural deafness, progressive hepatic and renal failure and lactic acidemia. Hum Mol Genet. 2015;24: 2297–2307. doi:10.1093/hmg/ddu747

79. Pierce SB, Chisholm KM, Lynch ED, Lee MK, Walsh T, Opitz JM, et al. Mutations in mitochondrial histidyl tRNA synthetase HARS2 cause ovarian dysgenesis and sensorineural hearing loss of Perrault syndrome. Proc Natl Acad Sci U S A. 2011;108: 6543–6548. doi:10.1073/pnas.1103471108

80. Pierce SB, Gersak K, Michaelson-Cohen R, Walsh T, Lee MK, Malach D, et al. Mutations in LARS2, encoding mitochondrial leucyl-tRNA synthetase, lead to premature ovarian failure and hearing loss in Perrault syndrome. Am J Hum Genet. 2013;92: 614–620. doi:10.1016/j.ajhg.2013.03.007

81. Dallabona C, Diodato D, Kevelam SH, Haack TB, Wong L-J, Salomons GS, et al. Novel (ovario) leukodystrophy related to AARS2 mutations. Neurology. 2014;82: 2063–2071. doi:10.1212/WNL.0000000000000497

82. Fogli A, Rodriguez D, Eymard-Pierre E, Bouhour F, Labauge P, Meaney BF, et al. Ovarian failure related to eukaryotic initiation factor 2B mutations. Am J Hum Genet. 2003;72: 1544–1550. doi:10.1086/375404

83. Baertling F, Haack TB, Rodenburg RJ, Schaper J, Seibt A, Strom TM, et al. MRPS22 mutation causes fatal neonatal lactic acidosis with brain and heart abnormalities. Neurogenetics. 2015;16: 237–240. doi:10.1007/s10048-015-0440-6

84. Kılıç M, Oğuz K-K, Kılıç E, Yüksel D, Demirci H, Sağıroğlu MŞ, et al. A patient with mitochondrial disorder due to a novel mutation in MRPS22. Metab Brain Dis. 2017; doi:10.1007/s11011-017-0074-5

85. Saada A, Shaag A, Arnon S, Dolfin T, Miller C, Fuchs-Telem D, et al. Antenatal mitochondrial disease caused by mitochondrial ribosomal protein (MRPS22) mutation. J Med Genet. 2007;44: 784–786. doi:10.1136/jmg.2007.053116

86. May-Panloup P, Boucret L, Chao de la Barca J-M, Desquiret-Dumas V, Ferré-L’Hotellier V, Morinière C, et al. Ovarian ageing: the role of mitochondria in oocytes and follicles. Hum Reprod Update. 2016;22: 725–743. doi:10.1093/humupd/dmw028

87. Hayashi Y, Otsuka K, Ebina M, Igarashi K, Takehara A, Matsumoto M, et al. Distinct requirements for energy metabolism in mouse primordial germ cells and their reprogramming to embryonic germ cells. Proc Natl Acad Sci U S A. 2017;114: 8289–8294. doi:10.1073/pnas.1620915114

88. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinforma Oxf Engl. 2009;25: 1754–1760. doi:10.1093/bioinformatics/btp324

89. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20: 1297–1303. doi:10.1101/gr.107524.110

90. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38: e164. doi:10.1093/nar/gkq603

91. Lupski JR, Gonzaga-Jauregui C, Yang Y, Bainbridge MN, Jhangiani S, Buhay CJ, et al. Exome sequencing resolves apparent incidental findings and reveals further complexity of SH3TC2 variant alleles causing Charcot-Marie-Tooth neuropathy. Genome Med. 2013;5: 57. doi:10.1186/gm461

92. Bainbridge MN, Wang M, Wu Y, Newsham I, Muzny DM, Jefferies JL, et al. Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities. Genome Biol. 2011;12: R68. doi:10.1186/gb-2011-12-7-r68

93. Challis D, Yu J, Evani US, Jackson AR, Paithankar S, Coarfa C, et al. An integrative variant analysis suite for whole exome next-generation sequencing data. BMC Bioinformatics. 2012;13: 8. doi:10.1186/1471-2105-13-8

94. Reid JG, Carroll A, Veeraraghavan N, Dahdouli M, Sundquist A, English A, et al. Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline. BMC Bioinformatics. 2014;15: 30. doi:10.1186/1471-2105-15-30

95. Charrier A, Wang L, Stephenson EJ, Ghanta SV, Ko C-W, Croniger CM, et al. Zinc finger protein 407 overexpression upregulates PPAR target gene expression and improves glucose homeostasis in mice. Am J Physiol Endocrinol Metab. 2016;311: E869–E880. doi:10.1152/ajpendo.00234.2016

96. Hoppel CL, Kerr DS, Dahms B, Roessmann U. Deficiency of the reduced nicotinamide adenine dinucleotide dehydrogenase component of complex I of mitochondrial electron transport. Fatal infantile lactic acidosis and hypermetabolism with skeletal-cardiac myopathy and encephalopathy. J Clin Invest. 1987;80: 71–77. doi:10.1172/JCI113066

97. Krähenbühl S, Talos C, Wiesmann U, Hoppel CL. Development and evaluation of a spectrophotometric assay for complex III in isolated mitochondria, tissues and fibroblasts from rats and humans. Clin Chim Acta Int J Clin Chem. 1994;230: 177–187.

98. Ni J-Q, Zhou R, Czech B, Liu L-P, Holderbaum L, Yang-Zhou D, et al. A genome-scale shRNA resource for transgenic RNAi in Drosophila. Nat Methods. 2011;8: 405–407. doi:10.1038/nmeth.1592

99. Perkins LA, Holderbaum L, Tao R, Hu Y, Sopko R, McCall K, et al. The Transgenic RNAi Project at Harvard Medical School: Resources and Validation. Genetics. 2015;201: 843–852. doi:10.1534/genetics.115.180208

100. Shapiro-Kulnane L, Smolko AE, Salz HK. Maintenance of Drosophila germline stem cell sexual identity in oogenesis and tumorigenesis. Dev Camb Engl. 2015;142: 1073–1082. doi:10.1242/dev.116590

101. Chong JX, Buckingham KJ, Jhangiani SN, Boehm C, Sobreira N, Smith JD, et al. The Genetic Basis of Mendelian Phenotypes: Discoveries, Challenges, and Opportunities. Am J Hum Genet. 2015;97: 199–215. doi:10.1016/j.ajhg.2015.06.009

102. Koboldt DC, Steinberg KM, Larson DE, Wilson RK, Mardis ER. The next-generation sequencing revolution and its impact on genomics. Cell. 2013;155: 27–38. doi:10.1016/j.cell.2013.09.006

103. Cantley LC. The Phosphoinositide 3-Kinase Pathway. Science. 2002;296: 1655–1657. doi:10.1126/science.296.5573.1655

104. Jean S, Kiger AA. Classes of phosphoinositide 3-kinases at a glance. J Cell Sci. 2014;127: 923–928. doi:10.1242/jcs.093773

105. Devereaux K, Dall’Armi C, Alcazar-Roman A, Ogasawara Y, Zhou X, Wang F, et al. Regulation of mammalian autophagy by class II and III PI 3-kinases through PI3P synthesis. PloS One. 2013;8: e76405. doi:10.1371/journal.pone.0076405

106. Yoshioka K, Yoshida K, Cui H, Wakayama T, Takuwa N, Okamoto Y, et al. Endothelial PI3K-C2α, a class II PI3K, has an essential role in angiogenesis and vascular barrier function. Nat Med. 2012;18: 1560–1569. doi:10.1038/nm.2928

107. Leibiger B, Moede T, Uhles S, Barker CJ, Creveaux M, Domin J, et al. Insulin-feedback via PI3K-C2alpha activated PKBalpha/Akt1 is required for glucose-stimulated insulin secretion. FASEB J Off Publ Fed Am Soc Exp Biol. 2010;24: 1824–1837. doi:10.1096/fj.09-148072

108. Krag C, Malmberg EK, Salcini AE. PI3KC2α, a class II PI3K, is required for dynamin-independent internalization pathways. J Cell Sci. 2010;123: 4240–4250. doi:10.1242/jcs.071712

109. Falasca M, Maffucci T. Regulation and cellular functions of class II phosphoinositide 3-kinases. Biochem J. 2012;443: 587–601. doi:10.1042/BJ20120008

110. Behrends C, Sowa ME, Gygi SP, Harper JW. Network organization of the human autophagy system. Nature. 2010;466: 68–76. doi:10.1038/nature09204

111. Campa CC, Franco I, Hirsch E. PI3K-C2α: One enzyme for two products coupling vesicle trafficking and signal transduction. FEBS Lett. 2015;589: 1552–1558. doi:10.1016/j.febslet.2015.05.001

112. Franco I, Gulluni F, Campa CC, Costa C, Margaria JP, Ciraolo E, et al. PI3K Class II α Controls Spatially Restricted Endosomal PtdIns3P and Rab11 Activation to Promote Primary Cilium Function. Dev Cell. 2014;28: 647–658. doi:10.1016/j.devcel.2014.01.022

113. Posor Y, Eichhorn-Gruenig M, Puchkov D, Schöneberg J, Ullrich A, Lampe A, et al. Spatiotemporal control of endocytosis by phosphatidylinositol-3,4-bisphosphate. Nature. 2013;499: 233–237. doi:10.1038/nature12360

114. Laflamme N, Leblanc JF, Mailloux J, Faure N, Labrie F, Simard J. Mutation R96W in cytochrome P450c17 gene causes combined 17 alpha-hydroxylase/17-20-lyase deficiency in two French Canadian patients. J Clin Endocrinol Metab. 1996;81: 264–268. doi:10.1210/jcem.81.1.8550762

115. Martin RM, Lin CJ, Costa EMF, de Oliveira ML, Carrilho A, Villar H, et al. P450c17 deficiency in Brazilian patients: biochemical diagnosis through progesterone levels confirmed by CYP17 genotyping. J Clin Endocrinol Metab. 2003;88: 5739–5746. doi:10.1210/jc.2003-030988

116. Kurolap A, Orenstein N, Kedar I, Weisz Hubshman M, Tiosano D, Mory A, et al. Is one diagnosis the whole story? patients with double diagnoses. Am J Med Genet A. 2016;170: 2338–2348. doi:10.1002/ajmg.a.37799

117. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536: 285–291. doi:10.1038/nature19057

118. Kettleborough RNW, Busch-Nentwich EM, Harvey SA, Dooley CM, de Bruijn E, van Eeden F, et al. A systematic genome-wide analysis of zebrafish protein-coding gene function. Nature. 2013;496: 494–497. doi:10.1038/nature11992

119. Vanhaesebroeck B, Whitehead MA, Piñeiro R. Molecules in medicine mini-review: isoforms of PI3K in biology and disease. J Mol Med. 2016;94: 5–11. doi:10.1007/s00109-015-1352-5

120. Lupski JR, Belmont JW, Boerwinkle E, Gibbs RA. Clan genomics and the complex architecture of human disease. Cell. 2011;147: 32–43. doi:10.1016/j.cell.2011.09.008

121. Blair DR, Lyttle CS, Mortensen JM, Bearden CF, Jensen AB, Khiabanian H, et al. A Nondegenerate Code of Deleterious Variants in Mendelian Loci Contributes to Complex Disease Risk. Cell. 2013;155: 70–80. doi:10.1016/j.cell.2013.08.030

122. Lotta LA, Gulati P, Day FR, Payne F, Ongen H, van de Bunt M, et al. Integrative genomic analysis implicates limited peripheral adipose storage capacity in the pathogenesis of human insulin resistance. Nat Genet. 2016;49: 17–26. doi:10.1038/ng.3714

123. Semple RK, Savage DB, Cochran EK, Gorden P, O’Rahilly S. Genetic syndromes of severe insulin resistance. Endocr Rev. 2011;32: 498–514. doi:10.1210/er.2010-0020

124. Goes FS, McGrath J, Avramopoulos D, Wolyniec P, Pirooznia M, Ruczinski I, et al. Genome-wide association study of schizophrenia in Ashkenazi Jews. Am J Med Genet Part B Neuropsychiatr Genet Off Publ Int Soc Psychiatr Genet. 2015;168: 649–659. doi:10.1002/ajmg.b.32349

125. Ruderfer DM, Fanous AH, Ripke S, McQuillin A, Amdur RL, Schizophrenia Working Group of Psychiatric Genomics Consortium, et al. Polygenic dissection of diagnosis and clinical dimensions of bipolar disorder and schizophrenia. Mol Psychiatry. 2014;19: 1017–1024. doi:10.1038/mp.2013.138

126. Schizophrenia Psychiatric Genome-Wide Association Study (GWAS) Consortium. Genome-wide association study identifies five new schizophrenia loci. Nat Genet. 2011;43: 969–976. doi:10.1038/ng.940

127. Wood AR, Esko T, Yang J, Vedantam S, Pers TH, Gustafsson S, et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat Genet. 2014;46: 1173–1186. doi:10.1038/ng.3097

128. Lango Allen H, Estrada K, Lettre G, Berndt SI, Weedon MN, Rivadeneira F, et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010;467: 832–838. doi:10.1038/nature09410

129. Lettre G, Jackson AU, Gieger C, Schumacher FR, Berndt SI, Sanna S, et al. Identification of ten loci associated with height highlights new biological pathways in human growth. Nat Genet. 2008;40: 584–591. doi:10.1038/ng.125

130. Bökenkamp A, Ludwig M. The oculocerebrorenal syndrome of Lowe: an update. Pediatr Nephrol Berl Ger. 2016;31: 2201–2212. doi:10.1007/s00467-016-3343-3

131. Staiano L, De Leo MG, Persico M, De Matteis MA. Mendelian disorders of PI metabolizing enzymes. Biochim Biophys Acta BBA - Mol Cell Biol Lipids. 2015;1851: 867–881. doi:10.1016/j.bbalip.2014.12.001

132. Mehta ZB, Pietka G, Lowe M. The cellular and physiological functions of the Lowe syndrome protein OCRL1. Traffic Cph Den. 2014;15: 471–487. doi:10.1111/tra.12160

133. Wiessner M, Roos A, Munn CJ, Viswanathan R, Whyte T, Cox D, et al. Mutations in INPP5K, Encoding a Phosphoinositide 5-Phosphatase, Cause Congenital Muscular Dystrophy with Cataracts and Mild Cognitive Impairment. Am J Hum Genet. 2017;100: 523–536. doi:10.1016/j.ajhg.2017.01.024

134. Osborn DPS, Pond HL, Mazaheri N, Dejardin J, Munn CJ, Mushref K, et al. Mutations in INPP5K Cause a Form of Congenital Muscular Dystrophy Overlapping Marinesco-Sjögren Syndrome and Dystroglycanopathy. Am J Hum Genet. 2017;100: 537–545. doi:10.1016/j.ajhg.2017.01.019

135. Bothwell SP, Farber LW, Hoagland A, Nussbaum RL. Species-specific difference in expression and splice-site choice in Inpp5b, an inositol polyphosphate 5-phosphatase paralogous to the enzyme deficient in Lowe Syndrome. Mamm Genome. 2010;21: 458–466. doi:10.1007/s00335-010-9281-7

136. Mountford JK, Petitjean C, Putra HWK, McCafferty JA, Setiabakti NM, Lee H, et al. The class II PI 3-kinase, PI3KC2α, links platelet internal membrane structure to shear-dependent adhesive function. Nat Commun. 2015;6: 6535. doi:10.1038/ncomms7535

137. Bielas SL, Silhavy JL, Brancati F, Kisseleva MV, Al-Gazali L, Sztriha L, et al. Mutations in INPP5E, encoding inositol polyphosphate-5-phosphatase E, link phosphatidyl inositol signaling to the ciliopathies. Nat Genet. 2009;41: 1032–1036. doi:10.1038/ng.423

138. Chen A, Tiosano D, Guran T, Baris HN, Bayram Y, Mory A, et al. Mutations in the mitochondrial ribosomal protein MRPS22 lead to primary ovarian insufficiency. Hum Mol Genet. 2018; doi:10.1093/hmg/ddy098

139. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinforma Oxf Engl. 2009;25: 1754–1760. doi:10.1093/bioinformatics/btp324

140. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20: 1297–1303. doi:10.1101/gr.107524.110

141. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38: e164. doi:10.1093/nar/gkq603

142. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007;35: D61–65. doi:10.1093/nar/gkl842

143. Landrum MJ, Lee JM, Benson M, Brown G, Chao C, Chitipiralla S, et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 2016;44: D862–868. doi:10.1093/nar/gkv1222

144. Liu X, Jian X, Boerwinkle E. dbNSFP: a lightweight database of human nonsynonymous SNPs and their functional predictions. Hum Mutat. 2011;32: 894–899. doi:10.1002/humu.21517

145. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nature Protocols [Internet]. [cited 21 Mar 2016]. Available: http://www.nature.com/nprot/journal/v4/n7/abs/nprot.2009.86.html

146. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nat Methods. 2010;7: 248–249. doi:10.1038/nmeth0410-248

147. Neveling K, Feenstra I, Gilissen C, Hoefsloot LH, Kamsteeg E-J, Mensenkamp AR, et al. A post-hoc comparison of the utility of sanger sequencing and exome sequencing for the diagnosis of heterogeneous diseases. Hum Mutat. 2013;34: 1721–1726. doi:10.1002/humu.22450

148. Wei Z, Wang W, Hu P, Lyon GJ, Hakonarson H. SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data. Nucleic Acids Res. 2011;39: e132. doi:10.1093/nar/gkr599

149. Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv:1207.3907v2 [q-bio.GN]. 2012;

150. Rimmer A, Phan H, Mathieson I, Iqbal Z, Twigg SRF, WGS500 Consortium, et al. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nat Genet. 2014;46: 912–918. doi:10.1038/ng.3036

151. Hauer NN, Popp B, Schoeller E, Schuhmann S, Heath KE, Hisado-Oliva A, et al. Clinical relevance of systematic phenotyping and exome sequencing in patients with short stature. Genet Med. 2017; doi:10.1038/gim.2017.159

152. Wang K, Li M, Hadley D, Liu R, Glessner J, Grant SFA, et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 2007;17: 1665–1674. doi:10.1101/gr.6861907

153. Copy number variation detection and genotyping from exome sequence data [Internet]. [cited 28 Aug 2017]. Available: http://genome.cshlp.org/content/early/2012/05/14/gr.138115.112.abstract

154. Pfundt R, Del Rosario M, Vissers LELM, Kwint MP, Janssen IM, de Leeuw N, et al. Detection of clinically relevant copy-number variants by exome sequencing in a large cohort of genetic disorders. Genet Med Off J Am Coll Med Genet. 2017;19: 667–675. doi:10.1038/gim.2016.163

155. Buchner DA, Charrier A, Srinivasan E, Wang L, Paulsen MT, Ljungman M, et al. Zinc Finger Protein 407 (ZFP407) Regulates Insulin-stimulated Glucose Uptake and Glucose Transporter 4 (Glut4) mRNA. J Biol Chem. 2015;290: 6376–6386. doi:10.1074/jbc.M114.623736

156. Knaup KX, Guenther R, Stoeckert J, Monti JM, Eckardt K-U, Wiesener MS. HIF is not essential for suppression of experimental tumor growth by mTOR inhibition. J Cancer. 2017;8: 1809–1817. doi:10.7150/jca.16486

157. Neff MM, Neff JD, Chory J, Pepper AE. dCAPS, a simple technique for the genetic analysis of single nucleotide polymorphisms: experimental applications in Arabidopsis thaliana genetics. Plant J Cell Mol Biol. 1998;14: 387–392.

158. Neff MM, Turk E, Kalishman M. Web-based primer design for single nucleotide polymorphism analysis. Trends Genet TIG. 2002;18: 613–615.

159. Fu W, O’Connor TD, Akey JM. Genetic architecture of quantitative traits and complex diseases. Curr Opin Genet Dev. 2013;23: 678–683. doi:10.1016/j.gde.2013.10.008

160. Bloom JS, Ehrenreich IM, Loo WT, Lite T-LV, Kruglyak L. Finding the sources of missing heritability in a yeast cross. Nature. 2013;494: 234–237. doi:10.1038/nature11867

161. Jasnos L, Korona R. Epistatic buffering of fitness loss in yeast double deletion strains. Nat Genet. 2007;39: 550–554. doi:10.1038/ng1986

162. Lehner B, Crombie C, Tischler J, Fortunato A, Fraser AG. Systematic mapping of genetic interactions in Caenorhabditis elegans identifies common modifiers of diverse signaling pathways. Nat Genet. 2006;38: 896–903. doi:10.1038/ng1844

163. Huang W, Richards S, Carbone MA, Zhu D, Anholt RRH, Ayroles JF, et al. Epistasis dominates the genetic architecture of Drosophila quantitative traits. Proc Natl Acad Sci. 2012;109: 15553–15559. doi:10.1073/pnas.1213423109

164. Shao H, Burrage LC, Sinasac DS, Hill AE, Ernest SR, O’Brien W, et al. Genetic architecture of complex traits: Large phenotypic effects and pervasive epistasis. Proc Natl Acad Sci. 2008;105: 19910–19914. doi:10.1073/pnas.0810388105

165. Mackay TFC. Epistasis and Quantitative Traits: Using Model Organisms to Study Gene-Gene Interactions. Nat Rev Genet. 2014;15: 22–33. doi:10.1038/nrg3627

166. Huang W, Mackay TFC. The Genetic Architecture of Quantitative Traits Cannot Be Inferred from Variance Component Analysis. PLoS Genet. 2016;12: e1006421. doi:10.1371/journal.pgen.1006421

167. Wood AR, Tuke MA, Nalls MA, Hernandez DG, Bandinelli S, Singleton AB, et al. Another explanation for apparent epistasis. Nature. 2014;514: E3–E5. doi:10.1038/nature13691

168. Fish AE, Capra JA, Bush WS. Are Interactions between cis-Regulatory Variants Evidence for Biological Epistasis or Statistical Artifacts? Am J Hum Genet. 2016;99: 817–830. doi:10.1016/j.ajhg.2016.07.022

169. Nadeau JH, Singer JB, Matin A, Lander ES. Analysing complex genetic traits with chromosome substitution strains. Nat Genet. 2000;24: 221–225. doi:10.1038/73427

170. Darvasi A, Soller M. Advanced intercross lines, an experimental population for fine genetic mapping. Genetics. 1995;141: 1199–1207.

171. Talbot CJ, Nicod A, Cherny SS, Fulker DW, Collins AC, Flint J. High-resolution mapping of quantitative trait loci in outbred mice. Nat Genet. 1999;21: 305–308. doi:10.1038/6825

172. Sackton TB, Hartl DL. Genotypic Context and Epistasis in Individuals and Populations. Cell. 2016;166: 279–287. doi:10.1016/j.cell.2016.06.047

173. Chow CY. Bringing genetic background into focus. Nat Rev Genet. 2016;17: 63–64. doi:10.1038/nrg.2015.9

174. Buchner DA, Nadeau JH. Contrasting genetic architectures in different mouse reference populations used for studying complex traits. Genome Res. 2015;25: 775–791. doi:10.1101/gr.187450.114

175. Rapp JP, Garrett MR, Deng AY. Construction of a double congenic strain to prove an epistatic interaction on blood pressure between rat chromosomes 2 and 10. J Clin Invest. 1998;101: 1591–1595. doi:10.1172/JCI2251

176. Mackay TF, Moore JH. Why epistasis is important for tackling complex human disease genetics. Genome Med. 2014;6: 125. doi:10.1186/gm561

177. Brown MS, Goldstein JL. Selective versus total insulin resistance: a pathogenic paradox. Cell Metab. 2008;7: 95–96. doi:10.1016/j.cmet.2007.12.009

178. Stoppa-Lyonnet D. The biological effects and clinical implications of BRCA mutations: where do we go from here? Eur J Hum Genet EJHG. 2016;24 Suppl 1: S3–9. doi:10.1038/ejhg.2016.93

179. Wang K, Lim HY, Shi S, Lee J, Deng S, Xie T, et al. Genomic landscape of copy number aberrations enables the identification of oncogenic drivers in hepatocellular carcinoma. Hepatol Baltim Md. 2013;58: 706–717. doi:10.1002/hep.26402

180. Kan Z, Zheng H, Liu X, Li S, Barber TD, Gong Z, et al. Whole-genome sequencing identifies recurrent mutations in hepatocellular carcinoma. Genome Res. 2013;23: 1422–1433. doi:10.1101/gr.154492.113

181. Gerke J, Lorenz K, Cohen B. Genetic Interactions Between Transcription Factors Cause Natural Variation in Yeast. Science. 2009;323: 498–501. doi:10.1126/science.1166426

182. Gerke J, Lorenz K, Ramnarine S, Cohen B. Gene–Environment Interactions at Nucleotide Resolution. PLOS Genet. 2010;6: e1001144. doi:10.1371/journal.pgen.1001144

183. Sawcer S, Franklin RJM, Ban M. Multiple sclerosis genetics. Lancet Neurol. 2014;13: 700–709. doi:10.1016/S1474-4422(14)70041-9

184. Buchner DA, Burrage LC, Hill AE, Yazbek SN, O’Brien WE, Croniger CM, et al. Resistance to diet-induced obesity in mice with a single substituted chromosome. Physiol Genomics. 2008;35: 116–122. doi:10.1152/physiolgenomics.00033.2008

185. Hill-Baskin AE, Markiewski MM, Buchner DA, Shao H, DeSantis D, Hsiao G, et al. Diet-induced hepatocellular carcinoma in genetically predisposed mice. Hum Mol Genet. 2009;18: 2975–2988. doi:10.1093/hmg/ddp236

186. Polderman TJC, Benyamin B, de Leeuw CA, Sullivan PF, van Bochoven A, Visscher PM, et al. Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nat Genet. 2015;47: 702–709. doi:10.1038/ng.3285

187. Siegal ML, Bergman A. Waddington’s canalization revisited: Developmental stability and evolution. Proc Natl Acad Sci. 2002;99: 10528–10532. doi:10.1073/pnas.102303999

188. Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Pagé N, et al. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science. 2001;294: 2364–2368. doi:10.1126/science.1065810

189. Sailer ZR, Harms MJ. Detecting High-Order Epistasis in Nonlinear Genotype-Phenotype Maps. Genetics. 2017;205: 1079–1088. doi:10.1534/genetics.116.195214

190. Lagator M, Igler C, Moreno AB, Guet CC, Bollback JP. Epistatic Interactions in the Arabinose Cis-Regulatory Element. Mol Biol Evol. 2016;33: 761–769. doi:10.1093/molbev/msv269

191. Taylor MB, Ehrenreich IM. Higher-order genetic interactions and their contribution to complex traits. Trends Genet TIG. 2015;31: 34–40. doi:10.1016/j.tig.2014.09.001

192. Nadeau JH, Forejt J, Takada T, Shiroishi T. Chromosome substitution strains: gene discovery, functional analysis, and systems studies. Mamm Genome Off J Int Mamm Genome Soc. 2012;23: 693–705. doi:10.1007/s00335-012-9426-y

193. Yazbek SN, Buchner DA, Geisinger JM, Burrage LC, Spiezio SH, Zentner GE, et al. Deep congenic analysis identifies many strong, context-dependent QTLs, one of which, Slc35b4, regulates obesity and glucose homeostasis. Genome Res. 2011;21: 1065–1073. doi:10.1101/gr.120741.111

194. Buchner DA, Geisinger JM, Glazebrook PA, Morgan MG, Spiezio SH, Kaiyala KJ, et al. The juxtaparanodal proteins CNTNAP2 and TAG1 regulate diet-induced obesity. Mamm Genome Off J Int Mamm Genome Soc. 2012;23: 431–442. doi:10.1007/s00335-012-9400-8

195. Waddington CH. Canalization of development and the inheritance of acquired characters. Nature. 1942;150: 563–565.

196. Tyler AL, Donahue LR, Churchill GA, Carter GW. Weak Epistasis Generally Stabilizes Phenotypes in a Mouse Intercross. PLOS Genet. 2016;12: e1005805. doi:10.1371/journal.pgen.1005805

197. Gonzalez PN, Pavlicev M, Mitteroecker P, Pardo-Manuel de Villena F, Spritz RA, Marcucio RS, et al. Genetic structure of phenotypic robustness in the collaborative cross mouse diallel panel. J Evol Biol. 2016;29: 1737–1751. doi:10.1111/jeb.12906

198. Costanzo M, VanderSluis B, Koch EN, Baryshnikova A, Pons C, Tan G, et al. A global genetic interaction network maps a wiring diagram of cellular function. Science. 2016;353. doi:10.1126/science.aaf1420

199. Guerrero RF, Muir CD, Josway S, Moyle LC. Pervasive antagonistic interactions among hybrid incompatibility loci. PLoS Genet. 2017;13: e1006817. doi:10.1371/journal.pgen.1006817

200. Bastepe M, Fröhlich LF, Linglart A, Abu-Zahra HS, Tojo K, Ward LM, et al. Deletion of the NESP55 differentially methylated region causes loss of maternal GNAS imprints and pseudohypoparathyroidism type Ib. Nat Genet. 2005;37: 25–27. doi:10.1038/ng1487

201. Chun S, Fay JC. Identification of deleterious mutations within three human genomes. Genome Res. 2009;19: 1553–1561. doi:10.1101/gr.092619.109

202. Segrè D, DeLuna A, Church GM, Kishony R. Modular epistasis in yeast metabolism. Nat Genet. 2005;37: 77–83. doi:10.1038/ng1489

203. Carlborg O, Jacobsson L, Ahgren P, Siegel P, Andersson L. Epistasis and the release of genetic variation during long-term selection. Nat Genet. 2006;38: 418–420. doi:10.1038/ng1761

204. Tyler AL, Ji B, Gatti DM, Munger SC, Churchill GA, Svenson KL, et al. Epistatic Networks Jointly Influence Phenotypes Related to Metabolic Disease and Gene Expression in Diversity Outbred Mice. Genetics. 2017;206: 621–639. doi:10.1534/genetics.116.198051

205. Weigelt B, Reis-Filho JS. Epistatic interactions and drug response. J Pathol. 2014;232: 255–263. doi:10.1002/path.4265

206. Wong A. Epistasis and the Evolution of Antimicrobial Resistance. Front Microbiol. 2017;8. doi:10.3389/fmicb.2017.00246

207. Fox J, Weisberg S. An R Companion to Applied Regression, Second Edition. Sage Publications; 2011.

208. Dewey M. metap: meta-analysis of significance values. R package version 0.7. 2016.

209. Westfall PH, Young SS. Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. Wiley; 1993.

210. Chhangawala S, Rudy G, Mason CE, Rosenfeld JA. The impact of read length on quantification of differentially expressed genes and splice junction detection. Genome Biol. 2015;16: 131. doi:10.1186/s13059-015-0697-y

211. Andrews S. FastQC: a quality control tool for high throughput sequence data [Internet]. 2010. Available: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

212. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14: R36. doi:10.1186/gb-2013-14-4-r36

213. Munger SC, Raghupathy N, Choi K, Simons AK, Gatti DM, Hinerfeld DA, et al. RNA-Seq Alignment to Individualized Genomes Improves Transcript Abundance Estimates in Multiparent Populations. Genetics. 2014;198: 59–73. doi:10.1534/genetics.114.165886

214. Keane TM, Goodstadt L, Danecek P, White MA, Wong K, Yalcin B, et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature. 2011;477: 289–294. doi:10.1038/nature10413

215. Anders S, Pyl PT, Huber W. HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31: 166–169. doi:10.1093/bioinformatics/btu638

216. Mudge JM, Harrow J. Creating reference gene annotation for the mouse C57BL6/J genome assembly. Mamm Genome Off J Int Mamm Genome Soc. 2015;26: 366–378. doi:10.1007/s00335-015-9583-x

217. Leek JT. svaseq: removing batch effects and other unwanted noise from sequencing data. Nucleic Acids Res. 2014;42: e161–e161. doi:10.1093/nar/gku864

218. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26: 139–140. doi:10.1093/bioinformatics/btp616

219. Sun L, Craiu RV, Paterson AD, Bull SB. Stratified false discovery control for large-scale hypothesis testing with application to genome-wide association studies. Genet Epidemiol. 2006;30: 519–530. doi:10.1002/gepi.20164

220. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J R Stat Soc Ser B Methodol. 1995;57: 289–300.

221. Aubourg P, Adamsbaum C, Lavallard-Rousseau M-C, Rocchiccioli F, Cartier N, Jambaque I, et al. A Two-Year Trial of Oleic and Erucic Acids (“Lorenzo’s Oil”) as Treatment for Adrenomyeloneuropathy. N Engl J Med. 1993;329: 745–752. doi:10.1056/NEJM199309093291101

222. Putative X-linked adrenoleukodystrophy gene shares unexpected homology with ABC transporters. Nature [Internet]. [cited 11 Jun 2018]. Available: https://www.nature.com/articles/361726a0

223. SPARK Consortium. SPARK: A US Cohort of 50,000 Families to Accelerate Autism Research. Neuron. 2018;97: 488–493. doi:10.1016/j.neuron.2018.01.015

224. Lim JK, McDermott DH, Lisco A, Foster GA, Krysztof D, Follmann D, et al. CCR5 Deficiency is a Risk Factor for Early Clinical Manifestations of West Nile Virus Infection, but not for Infection per se. J Infect Dis. 2010;201: 178–185. doi:10.1086/649426

225. Golubnitschaja O, Kinkorova J, Costigliola V. Predictive, Preventive and Personalised Medicine as the hardcore of “Horizon 2020”: EPMA position paper. EPMA J. 2014;5: 6. doi:10.1186/1878-5085-5-6

226. Hripcsak G, Albers DJ. Next-generation phenotyping of electronic health records. J Am Med Inform Assoc JAMIA. 2013;20: 117–121. doi:10.1136/amiajnl-2012-001145

227. Zheng T, Xie W, Xu L, He X, Zhang Y, You M, et al. A machine learning-based framework to identify type 2 diabetes through electronic health records. Int J Med Inf. 2017;97: 120–127. doi:10.1016/j.ijmedinf.2016.09.014

228. Dai W, Brisimi TS, Adams WG, Mela T, Saligrama V, Paschalidis IC. Prediction of hospitalization due to heart diseases by supervised learning methods. Int J Med Inf. 2015;84: 189–197. doi:10.1016/j.ijmedinf.2014.10.002

229. Kermany DS, Goldbaum M, Cai W, Valentim CCS, Liang H, Baxter SL, et al. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell. 2018;172: 1122–1131.e9. doi:10.1016/j.cell.2018.02.010

230. Pantel JT, Zhao M, Mensah MA, Hajjir N, Hsieh T-C, Hanani Y, et al. Advances in computer-assisted syndrome recognition by the example of inborn errors of metabolism. J Inherit Metab Dis. 2018;41: 533–539. doi:10.1007/s10545-018-0174-3

231. Gurovich Y, Hanani Y, Bar O, Fleischer N, Gelbman D, Basel-Salmon L, et al. DeepGestalt - Identifying Rare Genetic Syndromes Using Deep Learning. arXiv:1801.07637 [cs]. 2018; Available: http://arxiv.org/abs/1801.07637

232. Reanalysis of clinical whole-exome sequence data yields multiple new diagnoses. Am J Med Genet A. 176: 264–265. doi:10.1002/ajmg.a.38608

233. Costain G, Jobling R, Walker S, Reuter MS, Snell M, Bowdin S, et al. Periodic reanalysis of whole-genome sequencing data enhances the diagnostic advantage over standard clinical genetic testing. Eur J Hum Genet. 2018;26: 740–744. doi:10.1038/s41431-018-0114-6

234. Barrero MJ, Boué S, Belmonte JCI. Epigenetic Mechanisms that Regulate Cell Identity. Cell Stem Cell. 2010;7: 565–570. doi:10.1016/j.stem.2010.10.009

235. Berdasco M, Esteller M. Genetic syndromes caused by mutations in epigenetic genes. Hum Genet. 2013;132: 359–383. doi:10.1007/s00439-013-1271-x

236. Klein CJ, Botuyan M-V, Wu Y, Ward CJ, Nicholson GA, Hammans S, et al. Mutations in DNMT1 cause hereditary sensory neuropathy with dementia and hearing loss. Nat Genet. 2011;43: 595–600. doi:10.1038/ng.830

237. Ling C, Poulsen P, Simonsson S, Rönn T, Holmkvist J, Almgren P, et al. Genetic and epigenetic factors are associated with expression of respiratory chain component NDUFB6 in human skeletal muscle. J Clin Invest. 2007;117: 3427–3435. doi:10.1172/JCI30938

238. Roseboom T, de Rooij S, Painter R. The Dutch famine and its long-term consequences for adult health. Early Hum Dev. 2006;82: 485–491. doi:10.1016/j.earlhumdev.2006.07.001

239. Krauss-Etschmann S, Meyer KF, Dehmel S, Hylkema MN. Inter- and transgenerational epigenetic inheritance: evidence in asthma and COPD? Clin Epigenetics. 2015;7. doi:10.1186/s13148-015-0085-1

240. Flom JD, Ferris JS, Liao Y, Tehranifar P, Richards CB, Cho YH, et al. Prenatal smoke exposure and genomic DNA methylation in a multiethnic birth cohort. Cancer Epidemiol Biomark Prev Publ Am Assoc Cancer Res Cosponsored Am Soc Prev Oncol. 2011;20: 2518–2523. doi:10.1158/1055-9965.EPI-11-0553
