APPLYING FORWARD GENETIC APPROACHES TO
RARE MENDELIAN DISORDERS AND COMPLEX
TRAITS
By
ANLU CHEN
Submitted in partial fulfillment of the requirements
For the degree of Doctor of Philosophy
Dissertation Advisor: Dr. David Buchner
Department of Biochemistry
CASE WESTERN RESERVE UNIVERSITY
August, 2018
CASE WESTERN RESERVE UNIVERSITY
SCHOOL OF GRADUATE STUDIES
We hereby approve the thesis/dissertation of
ANLU CHEN
Candidate for the degree of Doctor of Philosophy*.
Committee Chair
Hung-Ying Kao
Committee Member
David Buchner
Anna Mitchell
Anthony Wynshaw-Boris
Eckhard Jankowsky
Date of Defense
July 5th, 2018
*We also certify that written approval has been obtained for any proprietary material contained therein
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ACKNOWLEDGEMENTS
ABSTRACT
Chapter 1. Background and Significance
Background
1. Forward genetics and reverse genetics
1.1. History of genetic studies
1.2. New era of forward genetics using next-generation sequencing
2. Complex traits and human diseases
2.1. Genome-wide association study (GWAS)
2.2. Missing heritability
3. Rare Mendelian disorders
3.1. Rare disorders
3.2. Consanguineous families
Significance
Chapter 2. Mutations in the Mitochondrial Ribosomal Protein MRPS22 Lead to Primary Ovarian Insufficiency
(Adapted from Chen A. et al. Human Molecular Genetics 2018)
Abstract
Introduction
Results
1. Identification of mutations in MRPS22 in patients with POI
2. Cellular studies of POI patient-derived fibroblasts
3. Embryonic lethality of Mrps22 deficient mice
4. mRpS22 in Drosophila germ cells is required for fertility
Discussion
Materials and Methods
Chapter 3. Mutations in PIK3C2A Cause Syndromic Short Stature Associated with Cataracts and Skeletal Abnormalities
(Manuscript in preparation)
Abstract
Introduction
Results
1. Identification of mutations in PIK3C2A in patients with syndromic short stature
2. Identification of cellular defects in patient-derived fibroblasts
3. Pik3c2a deficiency causes cataracts in zebrafish model
Discussion
Materials and Methods
Chapter 4. Widespread Epistasis Regulates Glucose Homeostasis and Gene Expression
(Adapted from Chen A. et al. PLoS Genet. 2017)
Abstract
Introduction
Results
1. Contribution of epistasis to metabolic traits
2. Regulation of gene expression by epistasis
3. Context-dependent effects on gene expression
4. Significant contribution of epistasis to trait heritability
Discussion
Materials and Methods
Chapter 5. Summary and Future Directions
Summary
Future Directions
1. Researchers are not alone in battles against genetic diseases
2. Gene therapy to cure the diseases
3. Strategies to predict disease risk loci
4. Strategies to better understand current data
5. What’s beyond genetic studies in understanding human disorders?
Reference
LIST OF TABLES
Table 2.1. Hormone levels in four individuals with POI
Table 2.2. Adrenocorticotropic hormone stimulation test for patients with POI
Table 2.3. List of gaps in WES coverage in Family I with POI and primers used for Sanger sequencing of these regions
Table 2.4. Plasma adrenal steroid levels in two individuals with POI
Table 2.5. Survival of offspring from a Mrps22 heterozygous knockout mouse (+/-) intercross
Table 2.6. Phenotypes of RNAi-mediated mRpS22 tissue-specific knockdown in Drosophila
Table 3.1. Phenotypic characteristics of patients in three families with Syndromic Short Stature
Table 3.2. Candidate variants identified by WES in patients with Syndromic Short Stature
Table 3.3. Survival of offspring from pik3c2a heterozygous knockout zebrafish (+/-) crosses
Table 3.4. List of primers used in the study of Syndromic Short Stature with PIK3C2A mutations
Table 3.5. List of antibodies used in the study of Syndromic Short Stature
Table 4.1. Number of mice used for analysis of body weight and plasma glucose
Table 4.2. Main and average effects on phenotypes
Table 4.3. Main effects on gene expression
Table 4.4. Summary of genes with multiple meQTLs
Table 4.5. Genes examined by RNA-Seq and RT-qPCR for epistasis and additive interactions
Table 4.6. Interaction effects on gene expression
Table 4.7. Summary of genes with multiple ieQTLs
Table 4.8. Identification of fasting glucose QTLs using a combined linear model
Table 4.9. Identification of body weight QTLs using a combined linear model
Table 4.10. Primer sequences for RT-qPCR detection
LIST OF FIGURES
Fig. 2.1. Pedigrees of two consanguineous families with POI
Fig. 2.2. Absence of germ cells in the ovary of a female patient with the MRPS22 p.R202H mutation
Fig. 2.3. Independent mutations in MRPS22 identified in two consanguineous families with POI
Fig. 2.4. Molecular analysis of fibroblasts from patients with the MRPS22 (p.R202H) mutation
Fig. 2.5. Oxidative phosphorylation is normal in fibroblasts from patients with the MRPS22 (p.R202H) mutation
Fig. 2.6. mRpS22 is required for female germ cell development in Drosophila
Fig. 2.7. Structural analysis of disease-causing missense mutations in MRPS22
Fig. 3.1. Pedigrees and phenotypic characteristics of patients with Syndromic Short Stature
Fig. 3.2. Detailed phenotypic characteristics of individuals with PIK3C2A deficiency
Fig. 3.3. Loss-of-function mutations in PIK3C2A
Fig. 3.4. Protein and mRNA levels of PIK3C2A in patient-derived cells
Fig. 3.5. Cilia defects in patient-derived PIK3C2A fibroblasts
Fig. 3.6. PIK3C2A exon skipping in individual III-II-2 with Syndromic Short Stature
Fig. 3.7. Localization of ciliary markers in patient-derived PIK3C2A deficient fibroblasts
Fig. 3.8. Pik3c2a deficiency in zebrafish causes cataracts
Fig. 4.1. Body weight and glucose levels in all CSS and control mice
Fig. 4.2. Schematic diagram of CSS and control crosses
Fig. 4.3. Identification of 5 inter-chromosomal epistatic interactions that regulate fasting glucose levels in mice
Fig. 4.4. Inter-chromosomal epistasis regulates fasting glucose levels
Fig. 4.5. Identification of meQTLs that regulate hepatic gene expression
Fig. 4.6. Positive correlation between cis-meQTLs and trans-meQTLs
Fig. 4.7. Identification of 5 trans-meQTLs that regulate the hepatic expression of Brca2
Fig. 4.8. Regulation of hepatic Zkscan3 expression by additive meQTLs
Fig. 4.9. Positive correlation between cis-ieQTLs and trans-ieQTLs
Fig. 4.10. Identification of 4 ieQTLs that regulate the hepatic expression of Agt
Fig. 4.11. Schematic diagram illustrating the categorization of epistasis as either synergistic or antagonistic
Fig. 4.12. Examples of synergistic and antagonistic ieQTLs
Fig. 4.13. Contribution of epistasis to the genetic regulation of hepatic gene expression
Fig. 4.14. No differences in mapping efficiency of RNA-Seq reads between B6 and CSSs
LIST OF ABBREVIATIONS
HD Huntington's disease
ENU N-ethyl-N-nitrosourea
RNAi RNA interference
WGS whole genome sequencing
WES whole exome sequencing
GWAS genome-wide association study
POI primary ovarian insufficiency
LH luteinizing hormone
FSH follicle-stimulating hormone
E2 estradiol
ACTH adrenocorticotropic hormone
CAH congenital adrenal hyperplasia
SNP single nucleotide polymorphism
MAF minor allele frequency
CSS chromosome substitution strain
QTL quantitative trait locus
me-QTL main expression QTL
ie-QTL interaction expression QTL
ACKNOWLEDGEMENTS
I would like to express my sincere gratitude to my advisor, Dr. David Buchner, for his unwavering support and encouragement during my Ph.D. odyssey. His lab was the third lab I rotated in during the first semester of my Ph.D. program. Although I had a great experience in my previous rotations, I felt something special in him and truly believed he would be a great mentor. I appreciate that he fully trusted my ability to conduct three intriguing projects. Thanks to his foresight on the projects, his extraordinary communication with collaborators, my own work and some luck, all of them turned out well.
Since I mainly worked on human genetics projects, which require wide collaboration, I would like to sincerely thank our collaborators. Without their help and support, I could not have completed these three diverse projects.
I would like to thank all my lab members, Dr. Alyssa Charrier, Li Wang and Rachel Stegemann, who helped build the best lab ever.
I wish to sincerely acknowledge my committee members, Dr. Hung-Ying Kao, Dr. Anthony Wynshaw-Boris, Dr. Anna Mitchell and Dr. Eckhard Jankowsky. They have provided valuable suggestions and comments on my research and have fully supported my career.
I would like to thank the students and faculty of the Department of Biochemistry for the stimulating discussions and for all the fun that we had together, and especially for choosing me as this year’s student representative and for their full support. I would also like to thank the Department of Genetics for the harmonious atmosphere for conducting research.
Finally, I would like to thank my parents for their unselfish love and support.
Applying Forward Genetic Approaches to Rare Mendelian
Disorders and Complex Traits
Abstract
By
ANLU CHEN
Forward genetics utilizes unbiased genetic approaches to locate causal genetic variants of heritable traits. Classic forward genetic approaches were tremendously successful, but are now widely considered too time-consuming and laborious. Technological and methodological innovations, such as next-generation sequencing, have ushered in a new era of forward genetic studies to better understand the genetic basis of disease. In this dissertation, I provide examples of forward genetic approaches that successfully identified novel genetic causes of two rare Mendelian disorders (Chapters 2 and 3) and further characterized the genetic architecture of metabolism-related complex traits (Chapter 4).
Mendelian disorders are caused by variants in a single gene, while complex traits are regulated by multiple genes, which either work independently or interact with each other.
In Chapters 2 and 3, we explored the exome sequence of patients from consanguineous families to identify causal genetic variants, which were further studied using cellular and animal models. Our findings unveiled novel functions of the genes MRPS22 and
PIK3C2A, and facilitated a better understanding of normal and pathological development of their associated disorders. Our study expanded the phenotypic spectrum of MRPS22
mutations from mitochondrial diseases to include primary ovarian insufficiency and elucidated the cell-autonomous role of MRPS22 in germ cell development. Our study of syndromic short stature associated with cataracts and skeletal abnormalities also identified the first Mendelian disorder associated with mutations in PIK3C2A, a gene whose in vivo role was poorly understood. In Chapter 4, we identified widespread epistatic interactions using double chromosome substitution strains in mice and provided strong evidence for the controversial contribution of epistasis to genetically complex traits and diseases. Our findings demonstrated that epistatic interactions controlled the majority of the heritable variation in both fasting plasma glucose levels and hepatic gene expression, exceeding the additive effects on these traits. These findings may partially explain the phenomenon of ‘missing heritability’ in complex traits. We also found that the epistatic interactions tended to keep trait levels at their “normal” level. We hypothesize that this is evolutionarily advantageous, enabling genetic variants to be stored in the genome without reducing fitness while allowing for rapid adaptation to future environmental challenges.
Chapter 1: Introduction
Background
1. Forward genetics and reverse genetics
1.1 History of modern genetic studies
Gregor Johann Mendel is widely known as the ‘father of modern genetics’ due to his work on pea plants published in 1866, which first characterized the inheritance patterns of certain traits, now known as ‘Mendelian inheritance’. Although his work went largely unrecognized for over 30 years, it ultimately proved to be a landmark discovery. In the early 1900s, three geneticists, Hugo de Vries, Carl Correns and Erich von Tschermak, independently “rediscovered” Mendel’s findings and inspired researchers to further investigate the nature of the phenomena. With the discovery of the double helical structure of DNA by James Watson and Francis Crick in 1953, with contributions from Rosalind Franklin and Maurice Wilkins, the development of a DNA sequencing technique by Frederick Sanger and colleagues in 1977, and the invention of the polymerase chain reaction (PCR) by Kary Mullis in 1983, we entered a golden era of identifying the molecular link between genotype and phenotype.
To identify phenotype-genotype correlations, classic forward genetic approaches were brought to the field, which attempt to locate the causal variants of a heritable trait in an unbiased manner that is based on the position of the variant in the genome rather than the function of the variant or the gene through which the variant acts. One such method to identify the genetic basis of a measurable phenotype is to perform linkage mapping to localize a causal mutation to a defined chromosomal region. This can then facilitate the
identification of the causal variant using PCR-based sequencing methods, followed by
genetic and/or biological approaches to further validate the causal relationship.
With so many biomedically relevant phenotypes to study, but little knowledge of gene function, it has been tremendously valuable to identify the genes that cause various phenotypes using classic forward genetic approaches. For example, the trembler strain, a naturally occurring mutant mouse strain, was used as a model for peripheral neuropathy and facilitated the discovery of disease-causing mutations in Pmp22[1]. In humans, studies of Huntington's disease (HD) also benefited from this approach. With haplotype analysis of linkage disequilibrium, expansions of CAG repeats in the coding region of the huntingtin (HTT) gene were identified[2]. To facilitate additional forward genetic studies, a variety of animal models have been created to improve the efficiency of disease gene identification. For example, X-ray and ENU (N-ethyl-N-nitrosourea) exposure generate mutations with over 100-fold increased efficiency compared with that of spontaneous mutations[3]. X-rays trigger large deletions or chromosomal translocations and have been used to study phenotypes induced by null mutations, whereas ENU induces point mutations and provides the opportunity to dissect phenotypes with missense mutations, nonsense mutations and splicing variants in parallel[4]. As a result of classic forward genetic approaches, many genes are named after their phenotype. For example, the rosy gene (ry) encodes xanthine dehydrogenase in Drosophila, but was so named because flies with homozygous recessive mutations in this gene developed rosy eye color[5].
In contrast to forward genetics, reverse genetic approaches have an emphasis on
manipulating a specific gene and studying the consequences of this manipulation at the
molecular and phenotypic levels. So far, the genomes of 6,035 eukaryotic species have been
sequenced and are currently available through the National Center for Biotechnology
Information. Genome-wide sequencing revealed a large number of genes whose functions
are unknown and cannot be predicted, but whose sequence availability can enable studies
based on reverse genetic approaches.
RNA interference (RNAi) is the process by which expression of a target gene is inhibited
by antisense and sense RNAs, and is one such method for studying gene function by
reverse genetics. RNAi was first discovered in C. elegans as an efficient mechanism for
silencing gene expression or translation [6] and has been applied in many organisms as a
high-throughput approach to generate genome-wide loss-of-function phenotypes [7],
including agricultural applications [8]. However, its use is limited by the difficulty of delivering siRNAs to their targets. In Chapter 2 of this thesis, we applied RNAi to explore the phenotype of a gene of interest in Drosophila. In Xenopus and zebrafish, morpholino oligonucleotides are more commonly used. Morpholino oligonucleotides are chemically modified analogs of nucleic acids that bind complementary RNA sequences and thereby alter mRNA splicing or translation. In terms of systematic approaches, in addition to
homologous recombination, insertional mutagenesis-based approaches were widely applied in bacteria, yeast, mice and other mammals. For example, by random insertion of a vector into the genome of mouse ES cells, gene trapping has generated over 121,000 ES cell lines so far, representing ~40% of all genes. Other targeted genetic editing
approaches, such as targeted knockouts and transgenics, were also applied. So far, in the mouse genome, 13,302 genes with spontaneous, induced or genetically engineered mutations have been phenotyped and associated with 1,464 human disorders according to Mouse Genome Informatics (http://www.informatics.jax.org/).
1.2. New era of forward genetics using next-generation sequencing
Building upon the Sanger chain-termination sequencing method, capillary electrophoresis-based sequencing instruments were developed by Applied Biosystems as a first-generation sequencing technology and played a key role in the Human Genome Project, which took 15 years and cost nearly three billion dollars to sequence the first human genome. Since then, technological and methodological innovations have provided the opportunity for a second surge of forward genetics studies. For example, the HiSeq X Ten System, released by Illumina in 2014, can generate 1.8 terabases (Tb) of data in a single sequencing run. With the help of barcoding methods to tag individual samples within a pool of DNA samples, this platform enables high-quality sequencing of over 45 human genomes per run at a cost of less than $1000 per genome.
Using whole genome sequencing (WGS), the multinational 1000 Genomes Project identified genetic and structural variants, such as copy number variation, in 2,504 individuals from multiple ethnic groups [9–12]. This project turned out to be a great success in rare variant discovery. In total, ~64 million variants with less than 0.5% allele frequency were discovered, accounting for ~72% of all variants. Another
interesting finding is that the majority of variants observed in a single genome are common variants. This leads to the question: Which ones are associated with human fitness? Though the protein-coding portion of the genome (exome) accounts for just ~2% of the whole human genome, 85% of disorders for which genetic causes have been identified thus far are associated with variants in the exome [13,14]. As an alternative strategy to WGS, whole exome sequencing (WES) has become an affordable, efficient and reliable way to identify the molecular etiology of human diseases.
Additionally, there are publicly available databases of exome sequences from healthy individuals, such as the Exome Aggregation Consortium database[15] and the Greater Middle East Variome Project[16], that can serve as controls for patient studies. For example, the databases mentioned above facilitated the studies in Chapters 2 and 3 of this thesis.
2. Complex traits and human diseases
2.1. Genome-wide association study (GWAS)
GWAS are based on the hypothesis that the variant causing a complex trait is more frequently present in individuals with that trait (the case group) than in individuals without the trait (the control group). The aim of GWAS is to scan the genome in an unbiased manner to identify genetic variants associated with complex traits and disease susceptibility. The causal variants could then be discovered either directly from the common SNP markers genotyped in the study, or based on linkage disequilibrium between common SNP markers and other linked variants. In 1996, Risch and Merikangas
first claimed that GWAS have greater power than linkage studies. However, a major limitation was the need to identify a large number of SNP markers (up to one million) in order to substantially reduce the number of samples required for screening [17]. In 2005, the multinational HapMap project (Phase I) developed a haplotype map of the human genome with over one million SNPs from a variety of human populations for initial screening [27]. This study also determined that approximately 500,000 common SNPs in the human genome are sufficient to tag causal variants in non-African populations. With the ability to quickly scan SNPs in the genome using commercialized SNP chips and a large number of case-control samples and population cohorts, GWAS became more feasible. In addition to SNP chips, WGS data can also serve as an important source of genotype information for GWAS, with higher coverage and more SNP markers. So far, over 10,000 GWAS have reported significant associations between genetic variants and diseases, including heart disease, diabetes, autoimmune diseases and psychiatric disorders, as well as quantitative traits [19].
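The case-control logic behind GWAS can be illustrated with a test of a single SNP. The following is a minimal sketch, not a method used in this dissertation: the function name and the allele counts are hypothetical, and an actual GWAS would apply such a test across hundreds of thousands of SNPs with correction for multiple testing and population structure.

```python
import math

def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Returns (odds_ratio, chi2, p_value). No continuity correction is applied.
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_case = case_alt + case_ref
    row_ctrl = ctrl_alt + ctrl_ref
    col_alt = case_alt + ctrl_alt
    col_ref = case_ref + ctrl_ref
    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / (
        row_case * row_ctrl * col_alt * col_ref
    )
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Upper-tail probability of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value

# Hypothetical counts: 1,000 cases and 1,000 controls, i.e. 2,000 alleles per group
odds_ratio, chi2, p_value = allelic_association(600, 1400, 500, 1500)
print(f"OR={odds_ratio:.2f}, chi2={chi2:.2f}, p={p_value:.2e}")
```

In this illustrative table the alternate allele is more frequent in cases (30%) than in controls (25%), which is the pattern a GWAS looks for at each marker.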
2.2. Missing heritability
The variants currently identified by GWAS typically each have a small effect size, meaning each variant accounts for only a small portion of the phenotypic variation. Even combined, these variants can only explain 10%–20% of the phenotypic variance for most complex traits, with the remaining unidentified 80%–90% often referred to as “missing heritability”. By definition, heritability refers to the proportion of phenotypic variance due to heritable genetic factors. For example, the estimated heritability for height in
humans is 0.8, suggesting that inherited genetic variants account for 80% of the variation in height with the remaining 20% due to environmental factors[20]. Among the most
important unanswered questions in the current field of genetics is where to find the missing heritability.
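The definition of heritability as a ratio of variances can be made concrete with a small simulation. This is an illustrative sketch only: the function name, sample size and random seed are assumptions, and the model (phenotype = genetic value + independent environmental deviation) ignores gene-environment covariance and interaction.

```python
import random
import statistics

def simulate_heritability(h2_true=0.8, n=20000, seed=0):
    """Simulate phenotype = genetic value + environmental deviation.

    Returns the estimated heritability Var(G) / Var(P).
    """
    rng = random.Random(seed)
    # Variances are scaled so that Var(G) = h2_true and Var(E) = 1 - h2_true
    genetic = [rng.gauss(0.0, h2_true ** 0.5) for _ in range(n)]
    environment = [rng.gauss(0.0, (1.0 - h2_true) ** 0.5) for _ in range(n)]
    phenotype = [g + e for g, e in zip(genetic, environment)]
    return statistics.pvariance(genetic) / statistics.pvariance(phenotype)

h2_estimate = simulate_heritability()
print(f"estimated heritability: {h2_estimate:.2f}")
```

With the default setting, the estimate should fall close to the value of 0.8 quoted above for human height.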
2.2.1. Sample size
Although the relationship is not linear, increasing the number of samples used in GWAS improves the detection of causal variants. For example, a GWAS of approximately 700,000 individuals boosted the number of variants strongly associated with height to 3,290 [21]. Altogether, these variants explained 24.6% of heritability. In contrast, only 54 variants were discovered in a study of approximately 63,000 individuals, which together accounted for just 5% of heritability [22,23]. With efforts from multinational consortia, it is reasonable to predict that a continuously growing number of variants will be discovered owing to the additional power provided by larger sample sizes.
2.2.2. Rare variants
The underlying rationale for GWAS is the ‘common disease, common variant’ hypothesis.
However, rare variants also contribute to common diseases in some cases. Rare variants are conventionally defined as variants with a minor allele frequency (MAF) below 1%, while common variants are defined as those with MAF >5% in the population. In GWAS, rare variants are
generally difficult to test for an association with a trait due to a lack of statistical power.
However, next-generation sequencing has dramatically boosted the discovery of rare
variants that contribute to common complex traits[24–28]. Taking Alzheimer's disease as
an example, rare variants in TREM2, PLD3, UNC5C and AKAP9 have been associated with Alzheimer's disease susceptibility due to their large effect size[26]. In one such study, Thorlakur Jonsson and colleagues imputed variants from the genome sequences of 2,261 Icelanders into the genomes of patients with Alzheimer's disease and control participants and then performed GWAS. As a result, a rare variant, rs75932628-T, in the gene TREM2 was discovered[29].
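The frequency cutoffs described in this section can be expressed in a few lines. The sketch below is illustrative: the function names are hypothetical, the label "low-frequency" for the 1%–5% range is an assumption not used in the text above, and the allele counts are invented (5,008 alleles corresponds to 2,504 diploid individuals).

```python
def minor_allele_frequency(alt_count, total_alleles):
    """Frequency of the less common allele at a biallelic site."""
    freq = alt_count / total_alleles
    return min(freq, 1.0 - freq)

def classify_variant(maf, rare_cutoff=0.01, common_cutoff=0.05):
    """Apply the MAF < 1% (rare) and MAF > 5% (common) cutoffs."""
    if maf < rare_cutoff:
        return "rare"
    if maf > common_cutoff:
        return "common"
    return "low-frequency"  # assumed label for the range between the two cutoffs

# Hypothetical allele counts out of 5,008 chromosomes
for alt_count in (30, 150, 1200):
    maf = minor_allele_frequency(alt_count, 5008)
    print(alt_count, round(maf, 4), classify_variant(maf))
```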
2.2.3. Epistatic interactions
Even with ever-larger sample sizes and the ability to detect all genetic variants, structural variations and epigenetic factors, there is a theoretical pitfall that limits our ability to account for the ‘missing heritability’. This pitfall was described by Eric Lander and colleagues in 2012 as ‘phantom heritability’ [30]. The percentage of explained heritability is calculated as the ratio of the phenotypic variance explained by the additive effects of the discovered variants (the numerator) to that explained by the additive effects of all variants, including those not yet discovered (the denominator). However, Lander and colleagues argued that the denominator might be overestimated, so that even when all variants are discovered, the percentage of explained heritability remains far below 100%; the shortfall is the ‘phantom heritability’. Such ‘phantom heritability’ can be caused by epistatic interactions among already discovered variants. Instead of additively affecting the phenotype, these variants interact with each other and produce either diminished or exaggerated outcomes relative to those expected under additivity. So far, a number of genome-wide interaction-based association studies and post-GWAS gene-gene interaction studies in humans have provided evidence for epistasis in a variety of complex traits and diseases[31–37]. For
example, SNP rs2106261 (G/A substitution) in ZFHX3, rs2200733 (C/T substitution) near PITX2c, and rs3807989 (A/G substitution) in CAV1 were previously identified by GWAS as associated with atrial fibrillation. Yufeng Huang and colleagues found that rs2200733 and rs2106261 epistatically interact with each other, resulting in a synergistic effect that increases the risk of atrial fibrillation[31]. Therefore, a better understanding of epistasis can help solve the ‘missing heritability’ problem and uncover novel disease pathophysiology.
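The idea of an outcome that is diminished or exaggerated relative to additivity can be shown with a two-variant example. This is a simplified sketch with hypothetical trait means; it is not the statistical model used in Chapter 4, and the function name is an assumption.

```python
def interaction_deviation(control, single_a, single_b, double_ab):
    """Deviation of the double-variant phenotype from the additive expectation.

    Under additivity, the double variant should differ from the control by the
    sum of the two single-variant effects. A non-zero result suggests epistasis.
    """
    effect_a = single_a - control
    effect_b = single_b - control
    expected_additive = control + effect_a + effect_b
    return double_ab - expected_additive

# Hypothetical trait means in arbitrary units
deviation = interaction_deviation(
    control=100.0, single_a=110.0, single_b=115.0, double_ab=112.0
)
print(deviation)
```

Here the additive expectation for the double variant is 125, so the observed value of 112 is a diminished outcome; an observed value above 125 would be an exaggerated one.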
3. Rare Mendelian disorders
3.1. Rare disorders
A rare disease refers to a disease that affects fewer than 200,000 people in the United
States[38]. There are an estimated 7,000 rare genetic diseases[39]. Collectively, between
25 and 30 million Americans suffer from a rare disease[38], which is nearly 1 in 10.
Moreover, approximately 75% of rare diseases occur in children under ten years of age
and threaten their lives[40]. An early and accurate genetic diagnosis is critical to the
optimal care for a child with a rare genetic disease. It’s also important for pregnant
women and couples if family history is a concern. Rare diseases are particularly susceptible to misdiagnosis because physicians often lack understanding of or familiarity with them. Even one reported case may inspire the discovery of additional cases, and thus these initial studies can benefit the diagnosis of patients and point to potential treatments.
3.2. Consanguineous families
A consanguineous family refers to a family with offspring from a marriage between two individuals with at least one ancestor in common. In general, consanguinity does not increase the risk for autosomal dominant conditions in offspring when one of the parents is affected, nor for X-linked recessive conditions if neither parent is affected[41].
However, consanguineous families have an increased risk for autosomal recessive disorders because of the inheritance of autosomal recessive gene mutations from a common ancestor. The pre-reproductive mortality in offspring of first cousins is ~3.5% higher than that of non-consanguineous offspring[42]. The highest rates of consanguineous marriage, up to 50%, occur in North and sub-Saharan Africa, the Middle East, and West Asia. In these regions, consanguineous marriages are traditionally preferred and respected. The studies described below are based on offspring from consanguineous families of Middle Eastern origin.
Consanguineous families are an efficient genetic model to study rare disease. In a consanguineous family, the recessive mutation is carried and inherited within the family.
If two family members are cousins and each inherits one copy of the recessive mutation from their common ancestor, theoretically 25% of their offspring will inherit both copies of the mutation and develop a recessive disorder[43]. Moreover, current studies find that in consanguineous families, 30% of the rare disease-causing genes identified reveal a novel gene function, compared with around 8% in studies of the general population[44].
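The 25% figure follows from enumerating the equally likely allele combinations that two carrier parents can transmit. The sketch below is illustrative; the function name and the allele labels ("A" for the reference allele, "a" for the recessive mutation) are assumptions.

```python
from fractions import Fraction
from itertools import product

def offspring_genotype_probs(parent1, parent2):
    """Enumerate a Punnett square for one biallelic locus.

    Each parent is a two-character genotype string and transmits either
    allele with equal probability.
    """
    weight = Fraction(1, len(parent1) * len(parent2))
    probs = {}
    for allele1, allele2 in product(parent1, parent2):
        genotype = "".join(sorted((allele1, allele2)))
        probs[genotype] = probs.get(genotype, Fraction(0)) + weight
    return probs

# Both parents are heterozygous carriers of the recessive mutation
probs = offspring_genotype_probs("Aa", "Aa")
print(probs)
```

The homozygous "aa" genotype receives one of the four equally likely combinations, which gives the theoretical 25% risk for each child of two carriers.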
Significance
1. Studies of complex traits
GWAS of type 2 diabetes and other metabolic complex traits have greatly
advanced the identification of associated variants; however, a large portion of heritability
still cannot be explained. Presumably, such missing heritability is partially due to epistasis,
but there was a lack of evidence for large-scale epistasis in complex, multi-cellular
organisms. In this dissertation, we used mouse chromosome substitution strains to
facilitate epistasis detection, and successfully identified widespread epistatic interactions
that regulate fasting glucose levels and hepatic gene expression.
2. Studies of rare Mendelian disorders
Compared with common variants, rare variants are typically harder to detect in GWAS.
However, the effect sizes of these variants tend to be much larger than those of common variants,
suggesting a more significant effect on gene function. Recently, more rare variants have been detected by next-generation sequencing. However, phenotype-genotype correlation studies of rare variants depend on well-established filtering strategies to identify the causal variant among all variants within an entire genome. In this dissertation we studied consanguineous families, utilizing specific population allele frequency databases among other information to guide our variant filtering strategies, and identified a novel genetic basis for two genetic diseases. Importantly, the causal variants revealed novel gene functions as they relate to normal physiological development.
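A variant filtering strategy of the kind described above can be sketched as a sequence of simple tests. The example is hypothetical throughout: the record fields, thresholds, sample identifiers and gene names are invented for illustration and do not reproduce the filters applied in Chapters 2 and 3. It assumes a recessive model in a consanguineous family, so candidates must be homozygous in every affected individual.

```python
# Assumed set of protein-altering consequence labels
DAMAGING = {"missense", "nonsense", "frameshift", "splice_site"}

def filter_candidates(variants, affected_ids, max_maf=0.01):
    """Keep rare, protein-altering variants homozygous in all affected individuals."""
    kept = []
    for variant in variants:
        homozygous_in_all = all(
            variant["genotypes"].get(sample) == "hom_alt" for sample in affected_ids
        )
        is_rare = variant["population_maf"] < max_maf
        is_damaging = variant["consequence"] in DAMAGING
        if homozygous_in_all and is_rare and is_damaging:
            kept.append(variant)
    return kept

# Invented variant records for two affected siblings
variants = [
    {"gene": "GENE_A", "consequence": "missense", "population_maf": 0.0001,
     "genotypes": {"sib1": "hom_alt", "sib2": "hom_alt"}},
    {"gene": "GENE_B", "consequence": "missense", "population_maf": 0.12,
     "genotypes": {"sib1": "hom_alt", "sib2": "hom_alt"}},
    {"gene": "GENE_C", "consequence": "synonymous", "population_maf": 0.0002,
     "genotypes": {"sib1": "hom_alt", "sib2": "hom_alt"}},
    {"gene": "GENE_D", "consequence": "nonsense", "population_maf": 0.0,
     "genotypes": {"sib1": "hom_alt", "sib2": "het"}},
]
candidates = filter_candidates(variants, affected_ids=["sib1", "sib2"])
print([v["gene"] for v in candidates])
```

Each filter removes one class of unlikely candidates: variants that are common in the reference population, variants that do not alter the protein, and variants that do not segregate with the disease.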
Overall, our studies on complex traits and rare Mendelian disorders contributed to a
better understanding of the genetic basis of phenotype-genotype correlations, improved genetic diagnosis and counseling, and built a foundation for precision medicine.
Chapter 2. Mutations in the mitochondrial ribosomal protein
MRPS22 lead to primary ovarian insufficiency.
This is a pre-copyedited, author-produced version of an article accepted for publication in Human Molecular Genetics following peer review. The version of record (Anlu Chen, Dov Tiosano, Tulay Guran, Hagit N. Baris, Yavuz Bayram, Adi Mory, Laura Shapiro-Kulnane, Craig A. Hodges, Zeynep Coban Akdemir, Serap Turan, Shalini N. Jhangiani, Focco van den Akker, Charles L. Hoppel, Helen K. Salz, James R. Lupski, and David A. Buchner. Mutations in the mitochondrial ribosomal protein MRPS22 lead to primary ovarian insufficiency. Human Molecular Genetics, 2018, Vol. 27,
No. 11 1913–1926) is available online at: https://doi.org/10.1093/hmg/ddy098.
Abstract
Primary ovarian insufficiency (POI) is characterized by amenorrhea and loss or
dysfunction of ovarian follicles prior to the age of 40. POI has been associated with
autosomal recessive mutations in genes involving hormonal signaling and
folliculogenesis; however, the genetic etiology of POI most often remains unknown. Here we report MRPS22 homozygous missense variants c.404G>A (p.R135Q) and c.605G>A
(p.R202H) identified in four females from two independent consanguineous families as a novel genetic cause of POI in adolescents. Both missense mutations identified in
MRPS22 are rare, occurred in highly evolutionarily conserved residues, and are predicted to be deleterious to protein function. In contrast to prior reports of mutations in MRPS22
associated with severe mitochondrial disease, the POI phenotype is far less severe.
Consistent with this phenotype-genotype correlation, mitochondrial defects in oxidative phosphorylation or rRNA levels were not detected in fibroblasts derived from the POI patients, suggesting a non-bioenergetic or tissue specific mitochondrial defect.
Furthermore, we demonstrate in a Drosophila model that mRpS22 deficiency specifically in somatic cells of the ovary had no effect on fertility, whereas flies with mRpS22 deficiency specifically in germ cells were infertile and agametic, demonstrating a cell autonomous requirement for mRpS22 in germ cell development. These findings collectively identify that MRPS22, a component of the small mitochondrial ribosome subunit, is critical for ovarian development and may therefore provide insight into the pathophysiology and treatment of ovarian dysfunction.
Introduction
Primary ovarian insufficiency (POI) is defined by the loss or dysfunction of ovarian
follicles associated with amenorrhea before the age of 40 [45]. POI is a major cause of
female infertility with a prevalence greater than 1%. There is a strong genetic component
to the development of POI, both in the form of monogenic and multigenic disorders;
however, in most cases the genetic etiology of POI remains unclear [46]. Among the most common genetic defects associated with POI are X chromosome defects, which collectively account for approximately 10-25% of POI cases [46]. These include Turner’s syndrome, Triple X syndrome, and Fragile X syndrome. A number of monogenic disorders resulting in POI have also been identified, including those with variants in
BMP15 and PGRMC1, both located on Chromosome X, as well as those with variants in
GDF9, FOXO3, FIGLA, and NR5A1, among others, that are autosomal [46]. These variants are each estimated to account for between 1-2% of POI cases. Thus, the majority of POI cases remain classified as idiopathic.
Although there remains much to understand about the genetic basis of POI, much has been learned already about both normal and pathological ovarian development based on cellular and molecular studies of the genes that have been associated with POI. For example, SYCE1, STAG3, and HFM1 are members of the synaptonemal complex that are required for chromosomal segregation during meiosis [47–51] and NUP107 is a component of the nuclear pore complex that is important for maintaining the communication between gonadal somatic cells and oocytes [52]. Furthermore, autosomal
recessive disorders that affect DNA repair, such as MCM8 [53] and MCM9 [54,55] and
genes encoding transcription factors, such as FIGLA [56], SOHLH1 [57], and NOBOX
[58,59] have been recently reported in POI. Moreover, eukaryotic translation initiation
factor 4E nuclear import factor 1 (eIF4ENIF1) has been recently identified in cases of
dominantly inherited POI [60]. However, there are still many cases with unexplained POI
suggesting that new causative genes are yet to be discovered.
Here we present the identification of two different homozygous missense mutations in the nuclear encoded gene mitochondrial ribosomal protein S22, MRPS22, as another genetic cause of POI in four adolescent females from two independent consanguineous families.
Drosophila modeling demonstrated a cell autonomous function of the MRPS22 ortholog in germ cells that is required for female germ cell viability, thus collectively demonstrating the importance of MRPS22 in reproduction and ovarian development.
Results
Identification of mutations in MRPS22 in patients with POI
To identify novel genetic causes of POI, we focused on an extended Israeli-Christian
Arab consanguineous family, in which two distinct genetic conditions with autosomal-recessive inheritance patterns were suspected (Fig. 2.1A). The presence of two distinct genetic conditions is not uncommon in this particular patient population [61]. The first
genetic condition presented as 46,XY females with inguinal hernias that contained
testicles in two twin sisters at the age of 3 years (Fig. 2.1A. F1-IV-7 and F1-IV-8).
Genetic evaluation revealed that both sisters had a 46,XY karyotype and were
homozygous for a missense mutation in the hydroxysteroid 17-beta dehydrogenase 3
gene (NM_000197.1 (HSD17B3): c.239G>A; p.Arg80Gln), thus resulting in a diagnosis
of 17-beta hydroxysteroid dehydrogenase type III deficiency (OMIM:605573) [62].
HSD17B3 converts androstenedione to testosterone and is expressed predominantly in the
testes [63]. One girl underwent bilateral gonadectomy at the age of 3 years and continued
to be raised as a female. Her sister underwent unilateral gonadectomy at the age of 3
years and was raised as female until the age of 9 years, when due to adrenarche and the
presence of one testicle in the inguinal canal, signs of masculinization appeared. The parents
decided to raise the child as a boy. All of the other girls in the extended family (Fig. 2.1A)
were evaluated and found to have a normal 46,XX karyotype.
Fig. 2.1. Pedigrees of two consanguineous families with POI.
(A) Family I pedigree structure. Below the pedigree are the genotypes corresponding to
MRPS22, HSD17B3, and the karyotyping results. MRPS22 mutation refers to c.605G>A; p.Arg202His. HSD17B3 mutation refers to c.239G>A; p.Arg80Gln. (B) Family II pedigree structure. Below the pedigree are the MRPS22 genotypes, which refer to c.404G>A; p.Arg135Gln, and the karyotyping results.
The second genetic condition in this family was suspected when the proband (Fig. 2.1A.
F1-IV-9) presented with delayed puberty at the age of 16 years, with delay in breast
development (B) and pubic hair (P) that were at Tanner stage 1 and 2, respectively.
Hormonal testing revealed hypergonadotrophic hypogonadism as evidenced by high
basal gonadotropin levels including luteinizing hormone (LH) and follicle-stimulating
hormone (FSH), as well as undetectable estradiol (E2) (Table 2.1). Medical history
revealed normal pregnancy and delivery (birth weight 3000 grams) and normal growth
and development during childhood. Genetic analysis revealed that this individual was
homozygous for the p.Arg80Gln variant in the HSD17B3 gene. However, all previously
described 46,XX female patients that were homozygous for the p.Arg80Gln variant were
asymptomatic [62,64]. Thus, a second unrelated genetic condition was suspected [61].
Family history revealed that in the extended family there is another 19-year-old girl (Fig.
2.1A. F1-IV-5) with delayed puberty (B1P2) and a similar profile of POI with hypergonadotrophic hypogonadism (Table 2.1). MRI of the proband demonstrated a very small uterus, measuring 5x12x14 mm. Adrenocorticotropic hormone (ACTH) testing in both patients confirmed normal cortisol production, without any evidence of an enzymatic block (Table 2.2). Urinary steroid analysis by gas chromatography/mass spectrometry (GC/MS) revealed low levels of etiocholanolone (Et) and androsterone (An). Methemoglobin levels were normal. Echocardiography revealed a normal heart in both sisters. Lactate levels (1.0 and 1.4 mmol/L; normal range 0.5-1.6 mmol/L) and
blood pH (pH=7.33 and 7.31) were both normal. Bone age measurements taken at
chronological age 18 years and 3 months were compatible with bone age of 13 years, thus
demonstrating delayed bone age.
Table 2.1. Hormone levels in four individuals with POI.
Individual     F1-IV-5        F1-IV-9        F1-IV-11   F2-IV-2   Normal pubertal range
Age (years)    19             16             12         14
LH (mIU/L)     17.3           21.0           9.1        20.0      1.0 – 14.7
FSH (mIU/L)    41.3           78.0           21.0       99.4      3.0 – 21.0
E2 (pmol/L)    Not detected   Not detected   141        0.04      55 – 1250
Table 2.2. Adrenocorticotropic hormone stimulation test for patients with POI.
Individual                   F1-IV-9          F2-IV-2
ACTH test (min)              0       60       0       60
Cortisol (nmol/l)            323     797      750     1080
17OH progesterone (nmol/l)   1.8     9.7      4.8     8.1
CS (nmol/l)                  7.1     14.6     N.D.    N.D.
N.D., not done
The proband’s sister (Fig. 2.1A. F1-IV-11), at the age of 9 years also showed elevated
LH level of 10.4 mIU/ml with low FSH 0.27 mIU/ml and undetectable estrogens. At the age of 12 years, physical examination revealed Tanner stage 3 breast development with detectable estrogen but elevated gonadotropins (Table 2.1). Anti-Müllerian hormone was undetectable (< 0.16 µg/L). MRI showed a small uterus with normal cervix (21x9x9 mm) and small ovaries (10x6x4 mm). An ovarian biopsy at the age of 14 years, taken for ovarian tissue preservation, demonstrated fibrotic ovaries without follicles (Fig. 2.2). All three affected individuals are the offspring of first-degree cousins, consistent with an autosomal recessive inheritance pattern (Fig. 2.1A). Genetic evaluation in all three patients revealed a normal female karyotype 46,XX. One patient was homozygous for the p.Arg80Gln variant in HSD17B3, one was heterozygous, and one did not carry the
HSD17B3 variant. Given that mutations in HSD17B3 are not associated with POI [62,64] and that the POI phenotype did not segregate with the p.Arg80Gln variant in HSD17B3 in this family, these findings suggested an independent genetic etiology (Fig. 2.1A).
Fig. 2.2. Absence of germ cells in the ovary of a female patient with the MRPS22
p.R202H mutation.
H & E stained ovarian tissue from individual F1-IV-9.
To identify the genetic basis of the POI, linkage analysis was performed with affected
individuals F1-IV-5, F1-IV-9 and the unaffected sibling F1-IV-1. Based on the predicted
autosomal recessive inheritance pattern, we focused on regions of homozygosity in the
affected daughters F1-IV-5 and F1-IV-9 that were heterozygous in the unaffected sibling
F1-IV-1. SNP genotyping analysis identified 19 loci greater than 1 Mb on 10 different
chromosomes that segregated with the POI.
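The segregation logic described above can be sketched as follows. This is an illustrative outline only: the genotype encoding, the `find_candidate_intervals` function, and the toy SNP positions are assumptions made for the example, not the analysis pipeline used in this study. As a simplification, it reports runs of consecutive SNPs at which all affected individuals share the same homozygous genotype, keeping runs that span more than 1 Mb and contain at least one SNP that is heterozygous in the unaffected sibling.

```python
from typing import List, NamedTuple, Tuple

class Snp(NamedTuple):
    chrom: str
    pos: int                    # base-pair position
    affected: Tuple[str, ...]   # genotypes of affected individuals, e.g. ("AA", "AA")
    unaffected: str             # genotype of the unaffected sibling, e.g. "AG"

def is_homozygous(genotype: str) -> bool:
    return genotype[0] == genotype[1]

def find_candidate_intervals(snps: List[Snp], min_length: int = 1_000_000):
    """Return (chrom, start, end) for runs of shared homozygosity in the
    affected individuals that exceed min_length and include at least one
    SNP heterozygous in the unaffected sibling."""
    intervals = []
    run: List[Snp] = []

    def close_run():
        if run:
            length = run[-1].pos - run[0].pos
            sibling_het = any(not is_homozygous(s.unaffected) for s in run)
            if length > min_length and sibling_het:
                intervals.append((run[0].chrom, run[0].pos, run[-1].pos))
        run.clear()

    for snp in sorted(snps, key=lambda s: (s.chrom, s.pos)):
        shared_hom = is_homozygous(snp.affected[0]) and len(set(snp.affected)) == 1
        if run and snp.chrom != run[0].chrom:
            close_run()          # a run never crosses chromosomes
        if shared_hom:
            run.append(snp)
        else:
            close_run()
    close_run()
    return intervals

# Invented genotypes for illustration only.
toy_snps = [
    Snp("3", 138_000_000, ("AA", "AA"), "AG"),
    Snp("3", 139_500_000, ("CC", "CC"), "CC"),
    Snp("3", 141_000_000, ("TT", "TT"), "CT"),
    Snp("3", 142_000_000, ("AG", "GG"), "AG"),  # breaks the run
    Snp("5", 10_000_000, ("GG", "GG"), "AG"),
    Snp("5", 10_400_000, ("TT", "TT"), "CT"),   # run shorter than 1 Mb
]
print(find_candidate_intervals(toy_snps))
# [('3', 138000000, 141000000)]
```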
To identify the causal genetic mutation within these candidate intervals, we performed
WES in affected individual F1-IV-5 and the unaffected individual F1-IV-1 (Fig. 2.1A).
Candidate variants encoding nonsynonymous changes discovered by WES were filtered
to remove variants with an allele frequency greater than 0.01 as well as variants that were not predicted
by either SIFT [65] or PolyPhen-2 [66] to be damaging or possibly damaging to protein
function. After filtering, four candidate variants from Family I remained and were
validated by Sanger sequencing. However, only the variant in mitochondrial ribosomal
protein S22 (MRPS22, NM_020191.2: c.605G>A: p.R202H; ClinVar Variation ID
SCV000607729), segregated with the POI phenotype in Family I with 30 unaffected and
three affected family members. Based on the SNP linkage analysis, this variant was
within a 3 Mb interval defined by the SNP markers rs2737735 and rs16850488 on
Chromosome 3 that segregated with POI. Analysis of polymorphisms within the WES
data revealed an absence of heterozygosity within this interval (Fig. 2.3A). To ensure that
there were no other nonsynonymous variants within this interval segregating with disease,
the 12 coding regions of genes that were not covered by WES were analyzed by Sanger
sequencing. These 12 intervals totaled 2,167 bp of coding sequence (average size: 197 bp;
range 53 bp – 411 bp) (Table 2.3). No additional variants were identified, demonstrating
that the MRPS22 (p.R202H) variant was the only amino acid change that segregated with
disease. This variant is present in dbSNP (rs753345594), but has an allele frequency of
0.00001218 (3/246,212), with no homozygotes present in the Genome Aggregation
Database (gnomAD, v2.0). The Arg202 residue is highly evolutionarily conserved (Fig.
2.3B). Collectively, these data suggested that the MRPS22 (p.R202H) variant was a
strong candidate as a novel genetic cause of POI.
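The filtering steps described above can be expressed as a short predicate. The sketch below is an illustration under stated assumptions: the `Variant` record, its field names, the placeholder genes, and the prediction labels are hypothetical and do not reproduce the output format of SIFT or PolyPhen-2. Only the allele count of 3 in 246,212 is taken from the text. The filter keeps nonsynonymous variants with an allele frequency of at most 0.01 that at least one predictor calls damaging or possibly damaging.

```python
from dataclasses import dataclass
from typing import Optional

DAMAGING_LABELS = {"damaging", "possibly_damaging"}  # illustrative labels

@dataclass
class Variant:
    gene: str
    nonsynonymous: bool
    allele_count: int       # alternate alleles observed in the reference database
    allele_number: int      # total alleles genotyped in the reference database
    sift: Optional[str]
    polyphen2: Optional[str]

    @property
    def allele_frequency(self) -> float:
        return self.allele_count / self.allele_number if self.allele_number else 0.0

def passes_filter(v: Variant, max_af: float = 0.01) -> bool:
    """Keep rare nonsynonymous variants flagged by at least one predictor."""
    predicted_damaging = v.sift in DAMAGING_LABELS or v.polyphen2 in DAMAGING_LABELS
    return v.nonsynonymous and v.allele_frequency <= max_af and predicted_damaging

# Hypothetical records; GENE_A to GENE_C are placeholders, not real findings.
variants = [
    Variant("MRPS22", True, 3, 246_212, "damaging", "possibly_damaging"),
    Variant("GENE_A", True, 5_000, 240_000, "damaging", "damaging"),  # too common
    Variant("GENE_B", True, 2, 240_000, "tolerated", "benign"),       # not predicted damaging
    Variant("GENE_C", False, 1, 240_000, None, None),                 # not nonsynonymous
]
candidates = [v.gene for v in variants if passes_filter(v)]
print(candidates)                             # ['MRPS22']
print(f"{variants[0].allele_frequency:.8f}")  # 0.00001218
```

In practice, segregation with the phenotype and location within a region of homozygosity would be applied as further filters after this step.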
To identify additional POI patients with a mutation in MRPS22, this gene was submitted
to GeneMatcher [67]. A second family was identified in which WES revealed a different
homozygous missense mutation in MRPS22 (NM_020191; c.404G>A; p.R135Q; ClinVar
Variation ID SCV000693853) in the proband F2-IV-2. This variant was confirmed by
Sanger sequencing and was located within a genomic interval defined by an absence of
heterozygosity (Fig. 2.3A). Exome variant analysis had led to initial prioritization of five
homozygous candidate variants in Family II, in BCL6, KDM1A, MRPS22, PLXND1, and
TRIM62. This patient presented with a POI phenotype similar to that described for the three
patients in Family I, as described in detail below. The MRPS22 variant was not
present in the Greater Middle East Variome Project WES database [16], the Genome
Aggregation Database (gnomAD) [15], or the BCM-HGSC internal database that consists of more than 6,500 exomes including ~1,100 Turkish exomes. Moreover, similar to
MRPS22 (p.R202H), this variant affects a highly evolutionarily conserved residue (Fig. 2.3B) and is predicted to be deleterious to protein function by SIFT and PolyPhen-2 [65,66].
Table 2.3. List of gaps in WES coverage in Family I with POI and primers used for
Gene           Start (hg19)   End (hg19)   Size (bp)   Primer type      Sequence (5' to 3')
RASA2          141205926      141206058    132         Forward          GCGTTAGGAGATGTGCAGGT
                                                       Reverse          AATTACTGCTCGGCTCCCAC
                                                       Nested forward   GAGTACGGTTCTCTGCAGGG
                                                       Nested reverse   CCCTTCCAGCCTCAACCG
NMNAT3         139294836      139294425    411         Forward          TGTTGTCTTTCCTGGCAGTG
                                                       Reverse          TTCCTGGGACAGAAGACTCC
                                                       Forward          ACCTCCTCCAACAAGCTCCT
                                                       Reverse          GCAGGCAGACAATGGTTTCT
ACPL2/PXYLP1   140950667      140950753    87          Forward          GGCAGTGTCCTCTCAGCAAC
                                                       Reverse          TCAGTCTCCCAACCTCGGAC
BPESC1         138823027      138823175    149         Forward          CTTTCTTCTGCCAGGGTTCTT
                                                       Reverse          CCAGGAGTTTAGGCATTAGCC
BPESC1         138824938      138825263    326         Forward          TCTGGGCCAGCTCTATGC
                                                       Reverse          CCCCACGCTAAACCGTCT
ATP1B3         141595679      141595731    53          Forward          TAACGAGGAGGTGTTCTCGG
                                                       Reverse          AATGAATGGGGCCGCACT
CLSTN2         139654213      139654325    113         Forward          GCTAGAAGCGCACCCAT
                                                       Reverse          GCACAGACAGCCCTCAAA
RNF7           141461486      141461749    264         Forward          TTCCCCAAGCCAACGTCT
                                                       Reverse          TTGTTTAACTCCGTTTATTGCCC
SPSB4          140770244      140770585    342         Forward          TAGTGGCCTTCAGGGATGAG
                                                       Reverse          GAGGAATTCTCAGGGACTGG
TFDP2          141719094      141719195    102         Forward          AGTCTGCAGTGTTTTCCTCTCT
                                                       Reverse          AGTCAATCTGCTCACAGGGT
ZBTB38         141146476      141146663    188         Forward          GGGTCTTGCCTGGATGTTGA
                                                       Reverse          TGATGGATCTGGGCAAAGCA
Fig. 2.3. Independent mutations in MRPS22 identified in two consanguineous families with POI.
(A) Sanger sequencing confirms the presence of the MRPS22 mutations detected by WES.
Intervals with an absence of heterozygosity were identified based on calculated B-allele
frequencies from the WES data. Gray shaded areas indicate regions with an absence of heterozygosity. The location of MRPS22 within an interval with an absence of heterozygosity is indicated by a vertical black line. (B) Alignment of protein sequences of
MRPS22 from multiple species demonstrates the evolutionary conservation of the
MRPS22 residues Arg135 and Arg202 that are altered in patients with POI.
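As a rough illustration of how B-allele frequencies can flag an absence of heterozygosity, the sketch below computes the fraction of reads supporting the alternate allele at each site and reports whether a window contains any site with an intermediate value. The read counts, the 0.2 to 0.8 heterozygosity band, and the function names are assumptions chosen for the example; the thresholds and window definitions used for Fig. 2.3 are not stated here.

```python
from typing import List, Tuple

def b_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads supporting the alternate (B) allele at one site."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no read coverage")
    return alt_reads / total

def lacks_heterozygosity(sites: List[Tuple[int, int]],
                         het_band: Tuple[float, float] = (0.2, 0.8)) -> bool:
    """True if no site in the window has a B-allele frequency inside het_band."""
    low, high = het_band
    return all(not (low < b_allele_frequency(ref, alt) < high) for ref, alt in sites)

# Hypothetical (ref_reads, alt_reads) pairs for two windows.
homozygous_window = [(0, 40), (55, 1), (0, 32), (60, 0)]
mixed_window = [(0, 40), (21, 19), (30, 0)]

print(lacks_heterozygosity(homozygous_window))  # True
print(lacks_heterozygosity(mixed_window))       # False
```

Heterozygous sites cluster near a B-allele frequency of 0.5, so a long stretch in which every site sits near 0 or 1 is the pattern shaded gray in the figure.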
In Family II, the proband (F2-IV-2) was born to first-degree cousins of Turkish descent
(Fig. 2.1B). She presented with amenorrhea at 14 years and 8 months of age. Her
previous medical history revealed normal pregnancy and delivery at term, but she was
small for gestational age with 1900 grams birth weight (-3.9 SDS). Family history was
not consistent with a sexual development, puberty, or infertility disorder. Physical
examination revealed mild facial dysmorphism including deep-set eyes with mild
hypotelorism and mild ptosis, thin upper lip, and hypoplastic nares. Height was 145.3 cm
(-2.1 SDS) and weight was 45 kg (+0.4 SDS). At chronological age of 14 years and 8 months, bone age was 10 years. Fundoscopy was normal. She had Tanner stage 3 breast development, without pubic or axillary hair development. Laboratory evaluation revealed a normal 46,XX female karyotype and normal blood count and chemistry except for
an impaired glucose tolerance test. Blood pH levels were normal (pH=7.42); however, blood
lactate concentrations were slightly elevated (3 mmol/L; normal range 0.5-1.6 mmol/L). Nerve conduction studies and electromyography testing revealed bilateral axonal polyneuropathy of the lower extremities, as indicated by absent evoked potentials from bilateral sural, peroneal, and tibial nerves. Brain and heart morphology were normal, as revealed by cranial and pituitary MRIs and echocardiography, respectively. Endocrine evaluation revealed hypergonadotrophic hypogonadism (Table 2.1). Abdomino-pelvic ultrasound showed a small uterus of 15x12x3 mm, and ovaries could not be visualized.
The ACTH test was normal (Table 2.2). Plasma adrenal steroids by LC-MS/MS confirmed very low ketosteroids (Table 2.4). Methemoglobin levels were elevated (1.9%; normal range: 0 - 1.5%), and bone densitometry revealed mild osteoporosis (L2-L4, Z-score -2.1,
BMD: 0.736 g/cm2). She was treated with combined estrogen and progesterone
supplementation and had menarche at 16 years and 8 months. Her final height is 157 cm
(+0.4 SDS) and weight is 58 kg (+1 SDS). Pelvic ultrasound at 20 years of age showed a uterus measuring 42x21x8 mm and hypoplastic ovaries (right ovary: 14x6x4 mm; left ovary:
11x8x4 mm).
Table 2.4. Plasma adrenal steroid levels in two individuals with POI.
Steroid                    F1-IV-9    F2-IV-2   Normal range
Progesterone (nmol/l)      0.9        0.54      0.5 – 2.3
DHEAS (µmol/l)             0.7        0.9       1.8 – 10.3
DHEA (nmol/l)              Not done   1.1       3.5 – 41.1
Androstenedione (nmol/l)   3.3        0.02      1.7 – 16.3
Testosterone (nmol/l)      <0.3       0.3       0.3 – 3.8
Cellular studies of POI patient-derived fibroblasts
Collectively, the identification of two independent families with POI and predicted
deleterious missense mutations in MRPS22 was highly suggestive that pathogenic
variants in this gene represent a novel genetic cause of POI. To examine the role of the
MRPS22 (p.R202H) variant on gene function, MRPS22 mRNA and protein expression
levels were examined in patient-derived fibroblasts. No detectable changes were
identified in protein or mRNA expression levels between control- and patient-derived
primary fibroblasts (Fig. 2.4A, B). Beyond potential differences in MRPS22 expression,
defects in MRPS22, a component of the small subunit of the mitochondrial ribosome,
have been shown to reduce the levels of mitochondrial rRNAs [68,69]. However,
there were no differences between control- and patient-derived fibroblasts in the
expression levels of the 12S and 16S rRNAs (Fig. 2.4C, D).
To evaluate the effect of MRPS22 (p.R202H) on mitochondrial function, we performed mitochondrial function studies in control- and patient-derived primary fibroblasts.
Measurements of electron transport chain complex enzyme activities from cultured skin
fibroblasts and OXPHOS activity measured with permeabilized cells both failed to
identify significant differences in activity between fibroblasts from the POI patients and
control individuals (Fig. 2.5). Thus, MRPS22 (p.R202H) has no detectable effect on mitochondrial function in primary fibroblasts, although we were unable to directly examine its function in ovarian tissue.
Fig. 2.4. Molecular analysis of fibroblasts from patients with the MRPS22 (p.R202H) mutation.
(A) Western blot analysis of MRPS22 and the loading control alpha-tubulin from control- and patient-derived primary fibroblasts. Levels of MRPS22 protein were determined by
ImageJ analysis and normalized to alpha-tubulin. (B) Levels of MRPS22 mRNA, (C) 12S rRNA, and (D) 16S rRNA were detected by RT-qPCR and were unchanged between control- and patient-derived fibroblasts. Each cell line was seeded in triplicate. NS, not significant.
Fig. 2.5. Oxidative phosphorylation is normal in fibroblasts from patients with the
MRPS22 (p.R202H) mutation.
(A) Intact mitochondria isolated from fibroblasts were supplied with electron donor substrates and respiration, as indicated by oxygen uptake, was measured with a Clark
electrode. (B) Rate of oxidative phosphorylation in fibroblasts determined using protocol
1 as described [70]. (C) Rate of oxidative phosphorylation in fibroblasts determined using protocol 2 as described [70]. Control samples denote a genetically unrelated fibroblast line of approximately equal passage number that was analyzed concurrently with the samples derived from Family I. Average denotes the historical averages of control samples analyzed from a reference population (electron transport chain activity, n=144; oxidative phosphorylation, n=57). Average sample is shown as mean ± standard deviation. None of the data in the POI individuals is significantly different from the controls.
Embryonic lethality of Mrps22 deficient mice
Given the inability to study the impact of the MRPS22 mutations in patient-derived ovarian tissue, we generated two animal models to better investigate the in vivo function of MRPS22. First, we examined a homozygous Mrps22 knockout mouse model that was generated by complete deletion of all exons and intervening sequences. Heterozygous
Mrps22 knockout mice (+/-) were fertile and showed no overt signs of abnormalities.
However, among 3-week-old offspring of a heterozygous intercross, no homozygous knockout mice (-/-) were detected (Table 2.5). Similarly, when offspring of a heterozygous intercross were genotyped at embryonic day 18.5 (e18.5), again no -/- offspring were detected (Table 2.5). Thus, complete deficiency of Mrps22 results in embryonic lethality, demonstrating the crucial role of Mrps22 in development, but preventing functional studies in adult ovarian tissue.
Table 2.5. Survival of offspring from a Mrps22 heterozygous knockout mouse (+/-) intercross.
Age        E18.5                   3 weeks
Genotype   +/+     +/-     -/-     +/+     +/-     -/-
Observed   12      25      0       10      22      0
Expected   9.25    18.5    9.25    8       16      8
p value    0.0021                  0.0046
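The p values in Table 2.5 are consistent with a chi-square goodness-of-fit test of the observed genotype counts against the 1:2:1 ratio expected from a heterozygous intercross. The test is not named in this section, so that choice is an assumption; the sketch below simply shows that it reproduces the tabulated values. With three genotype classes there are two degrees of freedom, for which the chi-square upper-tail probability has the closed form exp(-x/2), so no statistics library is needed. The function name is illustrative.

```python
import math

MENDELIAN_RATIO = (0.25, 0.5, 0.25)  # +/+, +/-, -/- from a +/- x +/- intercross

def chi_square_mendelian(observed):
    """Chi-square goodness-of-fit test against a 1:2:1 genotype ratio.

    Returns (expected counts, chi-square statistic, p value). The p value
    uses the closed form exp(-x / 2), valid only for 2 degrees of freedom.
    """
    total = sum(observed)
    expected = [total * ratio for ratio in MENDELIAN_RATIO]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.exp(-statistic / 2)
    return expected, statistic, p_value

for label, counts in (("E18.5", (12, 25, 0)), ("3 weeks", (10, 22, 0))):
    expected, statistic, p_value = chi_square_mendelian(counts)
    print(label, expected, round(statistic, 2), round(p_value, 4))
# E18.5 [9.25, 18.5, 9.25] 12.35 0.0021
# 3 weeks [8.0, 16.0, 8.0] 10.75 0.0046
```

Nearly all of the statistic comes from the empty -/- class: the +/+ and +/- counts alone stay close to their expected 1:2 proportion.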
mRpS22 in Drosophila germ cells is required for fertility
The Drosophila melanogaster genome encodes a single ortholog, mRpS22, with
significant homology to the human MRPS22 gene. To evaluate the role of mRpS22 in vivo, we used an inducible tissue-specific RNA interference (RNAi)-mediated
knockdown approach. In these experiments, knockdown was achieved by expressing a
short hairpin RNA under the control of the upstream activator sequence (UAS) with the
following drivers: tub-Gal4, which uniformly drives expression in all tissues [71]; c587-Gal4
and bab-Gal4, which drive expression in the somatic cells of the ovary [72]; and the germline-specific nos-Gal4::VP16 driver [73]. We found that ubiquitous mRpS22
knockdown (tub>mRpS22RNAi) resulted in larval death. Interestingly, we found that
knockdown in germ cells (nos>mRpS22RNAi), but not in the somatic cells of the ovary, led
to female sterility (Table 2.6).
Table 2.6. Phenotypes of RNAi-mediated mRpS22 tissue-specific knockdown in
Drosophila
RNAi driver   Conditional knockdown   Survival condition   Female fertility
tub-Gal4      Whole body              Larval lethality     -
bab-Gal4      Ovary: somatic cells    Viable               Fertile
c587-Gal4     Ovary: somatic cells    Viable               Fertile
nos-Gal4      Ovary: germ cells       Viable               Infertile
To identify the defect underlying the female sterility, ovaries were stained with an
antibody against Vasa, which labels all germ cells, and the DNA stain DAPI to monitor
germ cell differentiation. Adult ovaries are composed of 15-20 individual strands of progressively developing egg chambers called ovarioles (Fig. 2.6A). Egg chambers are assembled within the germarium, a structure at the anterior end of each ovariole. Each egg chamber contains 16 interconnected germ cells, one of which will become an oocyte and the others polyploid nurse cells. As each 16-cell cyst is surrounded by an epithelial monolayer of somatic cells, it will bud off from the germarium to form a chain of individualized egg chambers of progressive age. The bab>mRpS22RNAi mutant ovarioles
were indistinguishable from those in wild-type flies, suggesting that loss of mRpS22 in
the somatic cells of the ovary did not alter cell viability or ovarian development (Fig.
2.6B). However, nos>mRpS22RNAi mutant ovarioles lacked strings of developing egg
chambers (Fig. 2.6C). Moreover, no germ cells were detected, even at the tip of the
ovariole where the germline stem cells normally reside (Fig. 2.6C). This agametic
phenotype suggests a defect in germ cell survival.
Fig. 2.6. mRpS22 is required for female germ cell development in Drosophila.
Representative confocal images of ovarioles from (A) control, (B) bab>mRpS22RNAi, and
(C) nos>mRpS22RNAi females stained for the cytoplasmic Vasa protein (green) to visualize germ cells, and the DNA stain DAPI (red). Scale bars: 50 μm.
Discussion
Here we describe the identification of four individuals from two independent
consanguineous families with missense mutations in MRPS22 that result in autosomal
recessive inheritance of POI. The conclusion of pathogenicity for the MRPS22 variants was based on the cumulative evidence stemming from the identification of two different homozygous missense variants in independent families together with functional data from a Drosophila model of germ cell specific mRpS22 deficiency. The genetic data support the causal role of the MRPS22 variants based on the following ACMG criteria: absence in population databases (strength of criteria = pathogenic moderate), multiple lines of computational evidence supporting a deleterious effect on the gene (pathogenic supporting), and co-segregation with disease in multiple affected family members
(pathogenic moderate). In addition, despite the absence of mitochondrial defects in functional studies of patient-derived fibroblasts (benign strong), the animal modeling studies in Drosophila demonstrated a deleterious effect of mRpS22 deficiency on fertility and ovarian development (pathogenic strong). Thus, the cumulative evidence together supports the pathogenicity of homozygous MRPS22 missense variants as a novel cause of
POI in adolescents.
MRPS22 encodes a component of the small 28S mitochondrial ribosome subunit that is found in species including mammals, fruit flies, and nematodes, but lacks a direct ortholog in fungi, yeast, plants, or bacteria [74]. Protein translation in mitochondria is
required to translate the 13 polypeptides encoded in the mitochondrial genome that are essential components of all mitochondrial respiratory chain complexes, excluding complex II, which is entirely nuclear encoded [75]. The mitochondrial ribosome is composed of 80 proteins and three rRNA molecules divided between two subunits, a large 39S subunit and a small 28S subunit [76,77]. Beyond MRPS22, mutations in other nuclear-encoded proteins involved in mitochondrial translation are associated with impaired ovarian development. Homozygous mutations in another component of the 28S subunit, MRPS7, were associated with primary hypogonadism and primary adrenal failure, as well as sensorineural deafness and lactic acidemia [78]. Mutations in the mitochondrial tRNA synthetases HARS2 and LARS2 both result in Perrault syndrome, consisting of sensorineural hearing loss and ovarian dysfunction [79,80]. Mutations in the mitochondrial tRNA synthetase AARS2 cause progressive leukoencephalopathy with ovarian failure [81]. Finally, mutations in any of the five subunits of EIF2B can lead to ovarioleukodystrophy, which in addition to vanishing white matter in the nervous system is associated with ovarian failure in female carriers [82]. Thus, the causal relationship between mutations in many genes involved in mitochondrial translation and ovarian development highlights the critical role of mitochondrial translation in this tissue.
In addition to mutations in MRPS22 causing POI, rare mutations in MRPS22 have also previously been reported to cause severe mitochondrial disease with features including cardiomyopathy, lactic acidosis, and brain abnormalities [68,69,83,84]. Features related to ovarian or germ cell development were not previously reported in these patients.
Therefore, this report extends the phenotypic spectrum of disorders associated with
impaired MRPS22 function. The only previous case reports of female patients with
MRPS22 mutations were three female infants who were homozygous for an MRPS22
(p.R170H) allele and presented with severe hypotonia, hypertrophic cardiomyopathy,
lactic acidosis, and died in infancy. Patients described in three other case reports were all
males, and also presented in critical condition with heart and brain abnormalities.
Molecular analysis of mitochondrial function in patient-derived fibroblasts from these patients identified decreased enzyme activities for the oxidative phosphorylation complexes and decreased levels of mitochondrial 12S and 16S rRNAs. Thus, the pathogenic variants p.R170H and p.L215P in MRPS22 in these patients compromised mitochondrial energy production. This is in contrast to the POI patient fibroblasts carrying the MRPS22 c.605G>A: p.R202H mutation, which demonstrated no defects in
OXPHOS activity or mitochondrial rRNA levels. This is consistent with the relatively milder phenotype of POI and the absence of lactic acidosis. Collectively, this suggests that the MRPS22 mutations p.R202H and p.R135Q associated with POI affect primarily the mitochondrial role in the reproductive system, but not in global energy production.
Structural analysis of the human mitochondrial ribosome [76] suggests a potential
mechanism for the more severe phenotypes associated with the missense mutations
p.L215P [69] and p.R170H [85] compared with the milder POI phenotype associated
with the p.R135Q and p.R202H mutations (Fig. 2.7). Amino acids R135 and R202 are
buried and situated in an internal region of MRPS22, between an α-helical subdomain
and a β-sheet + 1 α-helix subdomain as shown above and below the cluster containing
R202 and R135 (Fig. 2.7). Both R135 and R202 form hydrogen bond interactions and
van der Waals interactions that will be disrupted by their respective mutations causing a
localized disturbance of that particular MRPS22 structure, possibly also affecting its
interaction with residues near F177 of MRPS18B. The L215 residue is located in a
hydrophobic region comprised of 3 α-helices. The p.L215P mutation likely not only causes a disruption of this hydrophobic core, but the change to Pro is also predicted to disrupt both the hydrogen bonding of the α-helix to which L215 belongs and van der
Waals interactions with an α-helix of MRPS18B (Fig. 2.7). The R170 residue is situated in the β-sheet + 1 α-helix subdomain (Fig. 2.7). The R170 side chain forms a hydrogen
bond with the backbone oxygen of S162. This S162 residue itself is in hydrogen bonding
distance with D71 of protein MRPS16 (Fig. 2.7). We anticipate that the p.R170H
mutation will cause loss of the interaction with S162 of MRPS22 and thereby lead to an
altered conformation of this region. In summary, the anticipated structural consequences
of the R135Q and R202H mutations are likely similar, as they are in close proximity to each other
(~8.5Å) and are part of the same buried charge cluster in between two subdomains of
MRPS22 (Fig. 2.7). In contrast, the R170H and L215P mutations are postulated to indirectly cause disruption of protein:protein interfaces (Fig. 2.7).
Fig. 2.7. Structural analysis of disease-causing missense mutations in MRPS22.
Close-up view of the human mitochondrial ribosome structure containing the 4 identified disease-causing missense mutations in MRPS22. The mitochondrial ribosome structure
(39) (PDBid=3j9m) is shown in cartoon representation with the mutations shown in ball-and-stick and nearby key residues in stick representation. The ribosomal proteins MRPS22,
MRPS18B, MRPS16, and MRPS25 and the 12S rRNA are colored grey, green, magenta, blue, and orange, respectively. Hydrogen bonds are depicted by dashed lines. The figure was generated using PyMOL (https://pymol.org).
The etiology of ovarian dysfunction due to mutations in MRPS22, and other proteins that
function in mitochondrial translation, remains unclear [86]. However, the fact that germ-
cell specific deletion of mRpS22 in Drosophila results in agametic ovaries suggests a cell
autonomous phenotype within the female germline. Interestingly, the precursors to these
cells, primordial germ cells, demonstrate significantly elevated OXPHOS activity relative
to other cell types [87]. Furthermore, this activity is important for the eventual
specification of these stem cells [87]. The critical dependence of primordial germ cells on high levels of OXPHOS may contribute to the specific ovarian dysgenesis phenotype in
the context of relatively mild impairments in mitochondrial translation, as other cell types
are unaffected by the subtle mitochondrial defects. Thus, the identification of mutations
in specific genes that cause these mild mitochondrial defects and are therefore critical for
oocyte formation will lead to a better understanding of normal ovarian development and
potentially a better understanding of the molecular basis of premature ovarian failure.
Materials and Methods
Human studies. These studies were approved by the ethics committees of Rambam
Health Care Campus, Marmara University, and the Baylor-Hopkins Center for Mendelian
Genomics. Informed consent was obtained from all participants. Peripheral blood was collected from affected individuals, parents, and unaffected relatives if available.
Genomic DNA was extracted from blood leukocytes according to standard procedures.
Linkage analysis. Individuals F1-IV-5, F1-IV-9, and F1-IV-1 were genotyped at loci across the genome using the Illumina Omni 250K SNP chip. Based on the presumed autosomal recessive model of inheritance, regions of homozygosity greater than 1 megabase were identified that were shared between the affected individuals F1-IV-5 and
F1-IV-9, but not the control individual F1-IV-1.
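The homozygosity mapping step described above can be sketched in Python as follows. This is an illustrative sketch only, not the analysis pipeline used in the study: the function name `shared_homozygous_regions`, the tuple layout, and the "AA"/"AB"/"BB" genotype encoding are assumptions made for the example. Region length is approximated as the distance between the first and last marker in a run, and "not shared by the control" is interpreted as the control differing from the affected genotype at one or more markers in the run.

```python
from typing import List, Tuple

# One marker: (position_bp, affected_1, affected_2, control); genotypes are "AA", "AB" or "BB".
Marker = Tuple[int, str, str, str]


def shared_homozygous_regions(
    markers: List[Marker], min_len_bp: int = 1_000_000
) -> List[Tuple[int, int]]:
    """Return (start, end) spans where both affected individuals share the same
    homozygous genotype over more than min_len_bp and the control differs somewhere."""
    regions: List[Tuple[int, int]] = []
    run: List[Tuple[int, str, str]] = []  # (position, affected genotype, control genotype)

    def close_run() -> None:
        # Evaluate the current run, then reset it.
        if run:
            start, end = run[0][0], run[-1][0]
            control_differs = any(ctrl != aff for _, aff, ctrl in run)
            if end - start > min_len_bp and control_differs:
                regions.append((start, end))
            run.clear()

    for pos, aff1, aff2, ctrl in sorted(markers):
        if aff1 == aff2 and aff1 in ("AA", "BB"):
            run.append((pos, aff1, ctrl))
        else:
            close_run()
    close_run()
    return regions
```

Real SNP-array data would also need handling of missing calls and genotyping errors, which this sketch omits.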
Whole exome sequencing. Whole exome sequencing (WES) of patient F1-IV-5 and control individual F1-IV-1 was performed at The Technion Institute Sequencing Core.
DNA was extracted from whole blood and sequenced using the Illumina TruSeq kit.
Samples were sequenced with paired-end 100 bp reads totaling 19,290,546 for sample
F1-IV-5 and 18,997,336 for sample F1-IV-1. The fastq sequence files were mapped to the reference human genome GRCh37 using BWA (v. 0.7.5) [88]. Sequence was analyzed
following the GATK (v. 2.8-1) best practices including removal of duplicate reads by
Picard (v. 1.105), local realignment, and base quality score recalibration [89]. The
resulting number of unique mapped reads for F1-IV-5 and F1-IV-1 were 17,785,392 and
14,074,730, respectively. The average read depth was 15.3x and 14.7x, respectively.
HaplotypeCaller was used to call SNPs and indels [89]. For SNPs, the filters used were
QD < 2.0, FS > 60.0, and MQ < 40.0. For indels, the filters used were QD < 2.0 and FS >
200.0. The variants passing these filters were annotated with Annovar [90] using the
following databases: nonsyn_splicing, esp6500si_all, 1000g2012apr_all,
snp137NonFlagged, ljb2_sift, and ljb_pp2.
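The hard-filter thresholds listed above can be expressed as a small predicate. This is a minimal sketch, assuming the thresholds describe the conditions under which a call is flagged and removed (the usual reading of such expressions); the function name `fails_hard_filter` and its argument names are hypothetical and are not part of GATK.

```python
def fails_hard_filter(variant_type: str, qd: float, fs: float, mq: float = None) -> bool:
    """Return True if a call would be flagged by the thresholds quoted in the text.

    variant_type: "SNP" or "INDEL"
    qd: QualByDepth, fs: FisherStrand, mq: RMS mapping quality (used for SNPs only)
    """
    if variant_type == "SNP":
        low_mq = mq is not None and mq < 40.0
        return qd < 2.0 or fs > 60.0 or low_mq
    if variant_type == "INDEL":
        return qd < 2.0 or fs > 200.0
    raise ValueError(f"unknown variant type: {variant_type!r}")
```

For example, a SNP with QD 10, FS 100, and MQ 60 would be flagged on strand bias alone, whereas an indel with the same QD and FS would be retained.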
WES was performed on the patient F2-IV-2 at the Baylor College of Medicine Human
Genome Sequencing Center (BCM-HGSC) according to a previously described protocol
[91]. In brief, genomic DNA samples were prepared into Illumina paired-end libraries and underwent whole-exome capture via the BCM-HGSC core VCRome 2.1 design [92] (42
Mb, NimbleGen, Cat. No. 06266380001) according to the manufacturer’s protocol
(NimbleGen SeqCap EZ Exome Library SR User’s Guide), followed by sequencing on
the HiSeq 2000 platform (Illumina) with a sequencing yield of 8.4 Gb. The samples
achieved 96% of the targeted exome bases covered to a depth-of-coverage of 20 or
greater [93]. Data produced were aligned and mapped to the human genome reference
sequence (Genome Reference Consortium GRCh37, hg19) with the Mercury in-house
bioinformatics pipeline [94]. Variants were called using the ATLAS (an integrative
variant analysis pipeline optimized for variant discovery) variant calling method and the
Sequence Alignment/Map (SAMtools) suites and annotated with the in-house-developed
“Cassandra” annotation pipeline that uses Annotation of Genetic Variants (ANNOVAR) and additional tools and databases.
Variant prioritization for indels was as follows: If an indel is reported as a variant in the
Human Gene Mutation Database (HGMD) and its frequency is less than 5 percent in the
1000 Genomes Project (1000GP) data, it was included. If it is not present in HGMD, it has to pass through all of the variant quality filters and its frequency has to be less than 2 percent in 1000GP data to be prioritized and investigated further as a potential pathogenic variant. In a second filtering step, another filter is further applied based on the number of samples having the variant in Atherosclerosis Risk in Communities Study (ARIC) database. The number of samples having the variant in the ARIC database should be less than 120 out of 10,940 samples in total.
Variant prioritization for single nucleotide variants (SNVs) was as follows: If a SNV is reported as a variant in the HGMD or it has a clinical variant value between 3 and 8 in the Single Nucleotide Polymorphism database (dbSNP), its frequency has to be less than
5 percent in both 1000GP data and NHLBI GO Exome Sequencing Project (ESP5400)
African and European populations. If it is not present in HGMD and does not have a clinical variant value between 3 and 8 in dbSNP, it has to pass through all of the variant quality filters and its frequency has to be less than 1 percent in both 1000GP data and
ESP5400 African and European populations to be prioritized and investigated further as a potential pathogenic variant. Like indels, the SNVs that pass these filters were further
filtered based on the number of samples having the variant in the ARIC database (should
be less than 120 out of 10,940 samples in total). These filtering steps typically obtain
~800 variants per sample [94].
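The two prioritization schemes above (indels and SNVs) follow the same shape, so they can be sketched with one function. This is a hedged illustration rather than the Mercury or Cassandra implementation: the function name, argument names, and the use of allele frequencies as fractions are assumptions. "Between 3 and 8" is assumed to be inclusive, and "less than X percent in both" is interpreted as every listed population falling below the cutoff.

```python
from typing import Optional, Sequence

ARIC_MAX_CARRIERS = 120  # out of 10,940 ARIC samples, as stated in the text


def is_prioritized(
    kind: str,                        # "indel" or "snv"
    in_hgmd: bool,
    passes_quality_filters: bool,
    population_afs: Sequence[float],  # 1000GP, plus ESP5400 African/European for SNVs
    aric_carriers: int,
    dbsnp_clinical_value: Optional[int] = None,
) -> bool:
    """Apply the frequency and ARIC filters described in the text to one variant."""
    if kind == "indel":
        known, known_cutoff, novel_cutoff = in_hgmd, 0.05, 0.02
    elif kind == "snv":
        clinical = dbsnp_clinical_value is not None and 3 <= dbsnp_clinical_value <= 8
        known, known_cutoff, novel_cutoff = in_hgmd or clinical, 0.05, 0.01
    else:
        raise ValueError(f"unknown variant kind: {kind!r}")

    max_af = max(population_afs)  # the highest frequency must be below the cutoff
    if known:
        first_step = max_af < known_cutoff
    else:
        first_step = passes_quality_filters and max_af < novel_cutoff
    # Second step: limit on the number of ARIC samples carrying the variant.
    return first_step and aric_carriers < ARIC_MAX_CARRIERS
```

Under these assumptions, a previously unreported SNV at 0.5% in 1000GP would pass the first step only if it also stays under 1% in both ESP5400 populations.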
Given the apparent autosomal recessive inheritance pattern in the pedigree we focused on
the homozygous variants in the parsed and filtered WES data. Out of ~800 variants, we
selected the homozygous variants that have an allele frequency below 0.1% in our
internal database (CMG), which consists of more than 6,500 exomes including ~ 1,100
Turkish exomes. Then potential pathogenic variants including indels, nonsense and splice
site variants, and missense variants that were predicted as deleterious in at least three out
of five computational algorithms including SIFT, Polyphen2, LRT, Mutation Taster and
PROVEAN, were selected as candidate variants. After these filtering steps five
homozygous candidate variants in BCL6, KDM1A, MRPS22, PLXND1, and TRIM62
remained for manual inspection.
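The candidate selection logic in this paragraph can be summarized in a short sketch. The dictionary keys (`homozygous`, `internal_af`, `effect`, `predictions`) and the label "deleterious" are hypothetical names chosen for illustration; only the thresholds (allele frequency below 0.1%, at least three of five algorithms) come from the text.

```python
PREDICTORS = ("SIFT", "Polyphen2", "LRT", "MutationTaster", "PROVEAN")
ALWAYS_KEPT = {"indel", "nonsense", "splice_site"}


def is_candidate(variant: dict) -> bool:
    """Return True if a variant survives the homozygosity, frequency and effect filters."""
    if not variant["homozygous"] or variant["internal_af"] >= 0.001:
        return False
    if variant["effect"] in ALWAYS_KEPT:
        return True
    if variant["effect"] == "missense":
        # Count how many of the five algorithms call the change deleterious.
        votes = sum(
            1 for name in PREDICTORS
            if variant["predictions"].get(name) == "deleterious"
        )
        return votes >= 3
    return False
```

A missense variant called deleterious by only two of the five algorithms would be dropped at this step, even if it is rare and homozygous.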
Sanger Sequencing. The MRPS22 (c.404G>A) variant was amplified by PCR using the
following primers: 5’-ATG GCC TTA GTG GGA CAC AG-3’ and 5’-AGG AGC GAA
ACT CCA TTT CA-3’. The 12 PCR amplicons that were not covered by WES in Family
I on chromosome 3 between rs2737735 and rs16850488 were amplified and sequenced
with the primers listed in Table 2.3. Sanger sequencing was performed by GenScript. The data were visualized and analyzed using FinchTV (Geospiza).
Genotyping. The MRPS22 (c.404G>A) variant was amplified by PCR using the
following primers: 5’-GAA AAT TAT TGG TGT CAA AAT TGT A-3’ and 5’-ATG
GCC TTA GTG GGA CAC AG-3’. The resulting PCR product was digested with the restriction enzyme RsaI and analyzed by agarose gel electrophoresis. Restriction digest of the wild-type PCR product resulted in a single 200 bp product, whereas restriction digest of the MRPS22 (c.404G>A) variant resulted in a 176 bp and a 24 bp product.
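The genotype readout from this digest can be illustrated with a small helper. This is a sketch under stated assumptions: the function name `call_genotype` and the 5 bp size tolerance are hypothetical, and the 24 bp fragment is ignored because fragments that small are often hard to resolve on an agarose gel.

```python
from typing import Iterable


def call_genotype(band_sizes_bp: Iterable[float], tolerance_bp: float = 5.0) -> str:
    """Infer the c.404G>A genotype from the fragment sizes seen after digestion."""
    bands = list(band_sizes_bp)

    def has_band(expected: float) -> bool:
        return any(abs(size - expected) <= tolerance_bp for size in bands)

    uncut = has_band(200)  # undigested product from the wild-type allele
    cut = has_band(176)    # larger digestion fragment from the variant allele
    if uncut and cut:
        return "heterozygous"
    if cut:
        return "homozygous variant"
    if uncut:
        return "homozygous wild-type"
    return "no call"
```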
Cell culture. For fibroblast cell lines, punch biopsies of skin were obtained from patients
and controls. Patient and control fibroblast cell lines were cultured in high glucose
DMEM (Dulbecco's Modified Eagle's Medium) (Thermo Fisher #11965-092) supplemented with 10% fetal bovine serum (Sigma, #F2442) and 1% penicillin and streptomycin (Thermo Fisher 15140-122).
Western Blotting. Protein was extracted from cultured primary fibroblast cells with
RIPA buffer (Sigma #R0278) and a protease inhibitor (Roche #05892791001). Protein was quantified using the BCA method. Western blotting was performed and quantitated using ImageJ as described [95]. Primary antibodies used were an anti-MRPS22
monoclonal antibody (1:1,000, Proteintech #10984-1-AP) and an anti-alpha-tubulin
antibody (1:10,000, Sigma, #T9026). Secondary antibodies used were anti-rabbit (1:5,000,
Thermo Fisher #31460) and anti-mouse (1:5,000, Thermo Fisher #31430).
Quantitative PCR (qPCR). Total RNA was isolated from primary fibroblast cells using
the PureLink RNA purification kit (Thermo Fisher) and reverse transcribed using the
high capacity cDNA reverse transcription kit (Applied Biosystems). The sequences for
qPCR primers are as follows: MRPS22 forward primer 5’-TGA TAA TCA TGG CGC
CCC TC-3’, MRPS22 reverse primer 5’-CTA CCA GAT TCT GCG GCC T-3’; 12S rRNA forward primer 5’-TAG ATA CCC CAC TAT GCT TAG C-3’, 12S rRNA reverse primer 5’-CGA TTA CAG AAC AGG CTC C-3’; 16S rRNA forward primer 5’-CCA
AAC CCA CTC CAC CTT AC-3’, 16S rRNA reverse primer 5’-TCA TCT TTC CCT
TGC GGT AC-3’; GAPDH forward primer 5’-AAT CCC ATC ACC ATC TTC CA-3’,
GAPDH reverse primer 5’-TGG ACT CCA CGA CGT ACT CA-3’. The qPCR reactions were performed with the Power SYBR Green PCR Master Mix (Thermo Fisher) and run on a Bio-Rad CFX Connect Real-Time System (Bio-Rad). Expression levels were calculated using the ΔΔCt method relative to the GAPDH control gene.
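For readers unfamiliar with the ΔΔCt calculation, a minimal sketch is given below. It assumes the standard 2^(-ΔΔCt) form with perfect amplification efficiency (a doubling per cycle), which this section does not state explicitly; the function and argument names are hypothetical.

```python
def relative_expression(
    ct_target_sample: float,
    ct_reference_sample: float,
    ct_target_control: float,
    ct_reference_control: float,
) -> float:
    """Fold change of a target gene in a sample relative to a control sample.

    The reference gene plays the role of GAPDH in the text.
    """
    delta_sample = ct_target_sample - ct_reference_sample     # ΔCt in the sample
    delta_control = ct_target_control - ct_reference_control  # ΔCt in the control
    delta_delta = delta_sample - delta_control                # ΔΔCt
    return 2.0 ** (-delta_delta)
```

With hypothetical Ct values of 26 (target) and 18 (reference) in a patient sample and 25 and 18 in a control, ΔΔCt is 1 and the relative expression is 0.5, i.e. half the control level.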
Fibroblast oxidative phosphorylation and electron transport chain activity. Studies were performed with an O2K (Oroboros Instruments) on permeabilized skin fibroblasts using 2 protocols as previously described [70]. Electron transport chain complexes in skin
fibroblasts were measured spectrophotometrically at 37 ºC as previously described [96,97].
Mice. Heterozygous Mrps22 knockout mice (B6N(Cg)-Mrps22tm1.1(KOMP)Vlcg/J, stock #028462) were purchased from The Jackson Laboratory and maintained by brother-sister matings. All mice used for experiments were obtained from breeder colonies at
Case Western Reserve University. Mice were housed in ventilated racks with access to
food and water ad libitum and maintained at 21°C on a 12-hour light/12-hour dark cycle.
All mice were cared for as described under the Guide for the Care and Use of Laboratory Animals,
eighth edition (2011) and all experiments were approved by IACUC and carried out in an
AAALAC approved facility. The IACUC protocol number is 2014-0132. Mice were weaned at 3 weeks of age and genotyped. The Mrps22 knockout allele was genotyped by
PCR based on the presence of a 380 bp product (wild-type allele) or a 491 bp product
(knockout allele). The sequences of the genotyping primers are as follows: Mrps22 wild-type forward primer 5’- GCT GTG GGC AGT GTT ATT GT-3’, Mrps22 wild-type
reverse primer 5’- TCT CAC ACC TAG TAC CGC AGT C-3’; Mrps22 mutant forward
primer 5’- CGG TCG CTA CCA TTA CCA GT-3’, Mrps22 mutant reverse primer 5’-
TCA GTA AGT ACC TTT TAA TCC CAA GA-3’.
Drosophila stocks and culture conditions. All Drosophila strains used in this study
were obtained from the Bloomington Drosophila Stock Center (BDSC), except for c587-
Gal4 which was a kind gift of T. Xie (Stowers Institute, Kansas City, Mo.). The stocks obtained from BDSC include nos-Gal4 (BDSC #4937), bab1-Gal4 (BDSC #6802), tubP-
Gal4 (BDSC #5138) and mRpS22-P{TRiP.HMC06144} (BDSC #65882). HMC RNAi lines are constructed in the VALIUM20 vector, designed for strong expression in both somatic and germline cells [98,99]. To maximize knockdown expression, animals were
raised at 29°C.
Immunofluorescence and image analysis. Drosophila ovaries from 2-3 day old females were fixed and stained by standard methods [100]. The primary Vasa antibody was obtained from the Developmental Studies Hybridoma Bank, and used at 1:100.
Secondary antibody conjugated to Alexa Fluor 555 (Thermo Fisher) was used at 1:200.
Images were acquired on a Leica TCS SP8 confocal microscope and assembled using
Photoshop (Adobe) and PowerPoint (Microsoft).
Chapter 3. Mutations in PIK3C2A Cause Syndromic Short
Stature Associated with Cataracts and Skeletal Abnormalities
The study of this chapter was submitted to
Dov Tiosano#, Hagit N. Baris#, Anlu Chen#, Markus Schueler#, Marrit M. Hitzert#, Antje Wiesener, Antonio Berguaz, Adi Mory, Alex Yuan, Brett Copeland, Joseph G. Gleeson, Patrick Rump, Hester van Meer, Deborah A. Sival, Karl X. Knaup, Andre Reis, Nadine N. Hauer, Christian T. Thiel, Brian M. McDermott, Brian D. Perkins, Ronald Roepman, Rolph Pfundt, Michael S. Wiesener, Mariam G. Aslanyan, and David A. Buchner. American Journal of Human Genetics (# denotes co-first authorship)
Abstract
PIK3C2A is a class II member of the phosphoinositide 3-kinase (PI3K) family proteins that catalyze the phosphorylation of phosphoinositide (PI) into PI(3)P and the phosphorylation of PI(4)P into PI(3,4)P2. PIK3C2A is critical for the formation of cilia and for receptor-mediated endocytosis, among other biological functions. We identified loss-of-function mutations in PIK3C2A in children with short stature, coarse facial features, cataracts with secondary glaucoma, multiple skeletal abnormalities, and other findings from three independent consanguineous families. Cellular studies of patient-derived fibroblasts confirmed the loss of PIK3C2A function as evidenced by the lack of
PIK3C2A protein, impaired cilia formation, and decreased levels of PI(3,4)P2.
Additionally, Pik3c2a deficiency in zebrafish also causes cataract formation. Thus, the genetic and molecular data collectively implicate mutations in PIK3C2A in a new
Mendelian disorder of PI metabolism. Identifying the genetic basis for this novel genetic syndrome sheds light on the critical role of a class II PI3K member in growth, vision, skeletal formation and neurological development. In particular, the considerable phenotypic overlap between this syndrome and Lowe syndrome, which also includes cataracts and skeletal malformations and is caused by mutations in OCRL, a gene encoding PI-5-phosphatase, highlights the key role of PI metabolizing enzymes in specific developmental processes, while demonstrating the unique non-redundant functions of each enzyme. This discovery, together with studies of other disorders of PI metabolism, will enable future studies to discover the molecular and mechanistic basis of this syndrome to better understand the role of PIK3C2A and class II PI3Ks in disease.
Introduction
Identifying the genetic basis of diseases with Mendelian inheritance provides insights
into gene function and susceptibility to disease, and can guide the development of new
therapeutics. To date, ~50% of the genes underlying Mendelian phenotypes have yet to
be discovered[101]. The disease genes that have been identified thus far have led to a
better understanding of the pathophysiological pathways and to the development of
medicinal products approved for the clinical treatment of such rare disorders[39].
Furthermore, technological advances in DNA sequencing allowed for the identification of
novel genetic mutations that result in rare Mendelian disorders[67,102]. We have applied
these next-generation sequencing technologies to discover mutations in PIK3C2A that
cause a newly identified genetic syndrome consisting of dysmorphic features, short
stature, cataracts and skeletal abnormalities.
PIK3C2A is a class II member of the phosphoinositide 3-kinase (PI3K) family of lipid
kinases that catalyze the phosphorylation of phosphatidylinositol (PtdIns)[103]. The functions of class II PI3Ks are poorly understood; however, they are generally thought to catalyze the phosphorylation of PtdIns to generate PtdIns(3)P, although this remains controversial[104]. PIK3C2A has been attributed a wide range of biological functions
including glucose transport, angiogenesis, Akt activation, endosomal trafficking,
phagosome maturation, exocytosis, and autophagy[105–113]. In addition, PIK3C2A is
critical for the formation and function of primary cilia [109,112]. However, there is as yet
no causal link between PIK3C2A, or any class II PI3K, and human disease. Here, we
describe the evidence that loss-of-function mutations in PIK3C2A are associated with a
novel syndromic disorder involving neurological, visual, skeletal, growth, and
occasionally conductive hearing impairments.
Results
Identification of mutations in PIK3C2A in patients with syndromic short stature.
Five individuals between the ages of 8 and 21 were identified from three unrelated
consanguineous families who presented with a similar constellation of clinical features including cataracts, secondary glaucoma, skeletal abnormalities, and dysmorphic facial
features (Fig. 3.1, Table 3.1). The dysmorphic facial features included coarse facies, low
hairline, epicanthal folds, flat and broad nasal bridges, and retrognathia (Fig. 3.1B).
Skeletal findings included scoliosis, delayed bone age, diminished ossification of femoral
heads, cervical lordosis, shortened fifth digits with mild metaphyseal dysplasia and
clinodactyly, as well as dental findings such as broad maxillary incisors, narrow
mandibular teeth, and dental enamel defects (Figs. 3.1C, 3.1D, Fig. 3.2). Other recurrent
features included hearing loss, short stature, stroke, developmental delay, and
nephrocalcinosis. For example, individual I-II-2 recently started having seizures, with an
EEG demonstrating sharp waves in the central areas of the right hemisphere and short
sporadic generalized epileptic seizures. Her brain MRI showed a previous stroke in the
right corpus striatum (Fig. 3.1F). In addition, brain MRI of patient II-II-3 showed multiple small frontal and periventricular lacunar infarcts (Fig. 3.2E). Unclear episodes of syncope also led to neurological investigations including EEG in individual III-II-2, without any signs of epilepsy. Her brain MRI showed symmetrical structures and normal cerebrospinal fluid spaces but pronounced lesions of the white matter (Fig. 3.2E).
Fig. 3.1. Pedigrees and pictures of the individuals studied.
(A) Pedigree of three consanguineous families studied. Black shapes indicate affected individuals. Roman numerals representing the generation are indicated on the left and
Arabic numerals representing the individual are indicated below each pedigree symbol.
(B) Photographs of affected individuals under their corresponding pedigree symbol indicate coarse facial features, including a broad nasal bridge, thick columella, and thick alae nasi. Of note, the left eye of patient II-II-2 shows phthisis bulbi of unknown etiology,
as evidenced by an atrophic non-functional eye. Representative images are shown. (C)
An X-ray indicates square-shaped vertebral bodies and a flat pelvis, subluxation of the hips, and meta- and epiphyseal dysplasia of the femoral heads in patient III-II-2. (D) The teeth of patient II-II-3 show broad maxillary incisors, narrow mandibular teeth, and dental enamel defects. (E) The eye with a visible cataract (Cataracta polaris anterior), as indicated by a white arrow, in individual III-II-2, and (F) a brain MRI demonstrating areas of altered signal intensity as indicated by the white arrow in individual I-II-2.
Table 3.1. Phenotypic characteristics of patients in three families.
Family                            I                       I                       II        II        III
Patient                           II-1                    II-2                    II-2      II-3      II-2
Age (years)                       11                      8                       12        10        20
Gender                            female                  female                  male      male      female
Origin                            Israel (Muslim-Arabic)  Israel (Muslim-Arabic)  Syria     Syria     Tunisia
Consanguineous                    +                       +                       +         +         +
Height                            -1.2 SD                 -2.3 SD                 -2.5 SD   -4.8 SD   -1.9 SD
Weight                            -0.2 SD                 -1.7 SD                 -0.2 SD   -3.9 SD   -1.9 SD
Head circumference                -0.25 SD                N.D.                    +0.9 SD   -1.1 SD   N.D.
Congenital cataract               +                       +                       +         +         +
Secondary glaucoma                +                       +                       +         +         -
Hearing loss                      +                       -                       -         +         +
Scoliosis/Skeletal abnormalities  +                       +                       +         +         +
Teeth abnormalities               +                       +                       +         +         +
Developmental delay               +                       +                       N.D.      N.D.      N.D.
Stroke: +, +, +, N.D.
"+" indicates presence of trait, "-" indicates absence of trait, N.D., not done. GAG, glycosaminoglycan.
Fig. 3.2. Images of individuals with PIK3C2A deficiency.
Photographic images of (A) teeth, (B) hands, and (C) feet are shown from the five individuals with PIK3C2A deficiency. (D) X-Ray images of the pelvis and (E) MRI
images of the brain are shown when available. White arrows in the MRI images indicate regions of altered signal intensity.
In addition to the shared syndromic features described above in all three families, both affected daughters in Family I were diagnosed with congenital adrenal hyperplasia
(CAH), due to 17-alpha-hydroxylase deficiency, and were found to have a homozygous familial mutation: NM_000102.3:c.286C>T; p.Arg96Trp in the CYP17A1 gene (OMIM
#202110)[114,115]. The affected individuals in Families II and III do not carry mutations in CYP17A1 or have CAH, suggesting the presence of two independent and unrelated conditions in Family I. The co-occurrence of multiple monogenic disorders is not uncommon among this highly consanguineous population[116].
All five affected individuals were born to healthy first-degree cousins, with the exception of hypothyroidism in the mother in Family II (II-I-1), suggesting an autosomal recessive inheritance pattern (Fig. 3.1A). To identify the genetic basis of this disorder, enzymatic assays related to the mucopolysaccharidosis subtypes MPS I, MPS IVA, MPS IVB, and
MPS VI were tested in Families I and II and found to be normal. Enzymatic assays for mucolipidosis II/III were also normal, and no pathogenic mutations were found in galactosamine-6-sulfate sulfatase (GALNS) in Family I. Additionally, since some of the features of patient II-II-3 were reminiscent of Noonan syndrome, Hennekam syndrome, and Aarskog-Scott syndrome, individual genes involved in these disorders were analyzed in Family II, but no pathogenic mutation was identified. Atypical presentations of
Williams-Beuren disease and Leri-Weill syndrome were excluded by molecular genetic testing in patient III-II-2; chromosomal analysis, microarray, and
molecular testing of FGFR3 were also normal.
Given the unsuccessful targeted genetic testing, WES was performed for the affected
individuals from all three families. After technical and biological filtering of the variants
identified by WES, five candidate variants were identified in Family I, including the
CYP17A1 (p.R96W) mutation that is the cause of the CAH[114,115], but is not known to
cause the other phenotypes. The remaining four variants were in the genes ATF4,
DNAH14, PLEKHA7, and PIK3C2A (Table 3.2). In Family II, sequence and CNV
analysis of the exome data revealed homozygous missense variants in KIAA1549L,
METAP1, and PEX2, in addition to a homozygous deletion in PIK3C2A that
encompassed exons 1-24 out of 32 total exons (Table 3.2, Fig. 3.3B). The deletion was
limited to PIK3C2A and did not affect the neighboring genes. Sequence analysis of
Family III showed a missense variant in PTH2R, a nonsense variant in DPRX, and a
splice site variant in PIK3C2A (Table 3.2).
WES analysis revealed that all affected family members in Families I, II, and III were
homozygous for predicted loss-of-function variants in PIK3C2A, and none of the
unaffected family members was homozygous for the PIK3C2A variants. The initial link
between these three families with rare mutations in PIK3C2A was made possible through
the sharing of information via the GeneMatcher website[67]. The single nucleotide
PIK3C2A variants in Families I and III were confirmed by Sanger sequencing (Fig. 3.3C,
D).
Table 3.2. Candidate variants identified by WES.
Gene       SNP ID        MAF      Type  Effect       Transcript    cDNA                            Protein                   SIFT      SIFT Score  Polyphen2 HVAR  Polyphen2 Score
Family I
ATF4       rs144769713   8.1e-6   SNV   Missense     NM_001675     c.512C>T                        p.Ser171Phe               Damaging  0           Benign          0.188
CYP17A1    rs104894138   3.6e-5   SNV   Missense     NM_000102     c.286C>T                        p.Arg96Trp                Damaging  0           Prob Dam        1
DNAH14     .             .        SNV   Missense     NM_001373     c.5135T>A                       p.Leu1712His              Damaging  0           Prob Dam        0.998
PIK3C2A    .             .        SNV   Nonsense     NM_002645     c.585T>G                        p.Tyr195Ter               .         .           .               .
PLEKHA7    .             .        SNV   Missense     NM_175058     c.2899C>T                       p.Arg967Trp               Damaging  0           Prob Dam        1
Family II
KIAA1549L  rs761694178   4.1e-6   SNV   Missense     NM_012194     c.2132C>A                       p.Pro717Leu               Damaging  0.02        Poss Dam        0.837
METAP1     .             .        SNV   Missense     NM_015143     c.408A>G                        p.Ile136Met               Damaging  0           Prob Dam        0.985
PEX2       rs35689779    7.4e-4   SNV   Missense     NM_000318     c.209A>G                        p.Tyr70Cys                Damaging  0.04        Prob Dam        0.989
PIK3C2A    .             .        DEL   Deletion     NM_002645     c.(0+1_1-1)_(4007+1_4008-1)del  p.0                       .         .           .               .
Family III
PTH2R      .             7.1e-6   SNV   Missense     NM_005048     c.773G>A                        p.Gly258Asp               Damaging  0           Prob Dam        1
DPRX       rs201435914   3.3e-4   SNV   Nonsense     NM_001012728  c.466C>T                        p.Arg156Ter               .         .           .               .
PIK3C2A    .             .        SNV   Splice site  NM_002645     c.1640+1G>T                     p.Asn483_Arg547delinsLys  .         .           .               .
SNV, single nucleotide variant. DEL, deletion. Prob Dam, probably damaging. Poss Dam, possibly damaging. MAF, minor allele frequency
(from gnomAD v2.0.2). Overlapped genes among three families are indicated in bold.
Fig. 3.3. Loss-of-function mutations in PIK3C2A.
(A) Diagram of the intron/exon and protein domain structures of PIK3C2A, indicating the location of mutations identified in three independent consanguineous families with homozygous loss-of-function mutations in PIK3C2A. (B) CNV analysis confirmed a homozygous deletion encompassing exons 1-24 out of 32 total exons of PIK3C2A, indicated with the red line. (C) Sanger sequencing confirmed homozygosity for the
PIK3C2A c.585T>G variant in Family I. (D) Sanger sequencing confirmed homozygosity for the PIK3C2A c.1640+1G>T variant in Family III.
In Family I, the nonsense mutation in PIK3C2A (p.Y195*) results in the deletion of 1,492
amino acids from a protein that is 1,686 amino acids. This is predicted to eliminate nearly
all functional domains, including the catalytic kinase domain, and is expected to trigger
nonsense-mediated mRNA decay[111]. Accordingly, levels of PIK3C2A mRNA are significantly decreased in both heterozygous and homozygous individuals carrying the p.Y195* variant (Fig. 3.4A). The deletion in Family II eliminates the first 24 exons of a
32-exon gene and is therefore not predicted to express any protein. This is consistent with a lack of PIK3C2A mRNA expression (Fig. 3.4B). The PIK3C2A variant in Family III disrupts an essential splice site (c.1640+1G>T) that leads to decreased mRNA levels (Fig.
3.4C) and exon skipping of both exons 5 and 6 (Fig. 3.5). Although this transcript remains in-frame, no PIK3C2A protein was detected by Western blotting (Fig. 3.4D).
This is consistent with Families I and II, for which Western blotting also failed to detect
any full-length PIK3C2A in fibroblasts from the affected homozygous children (Fig.
3.4E). Thus, all three PIK3C2A variants likely encode loss-of-function alleles.
Importantly, among the 141,352 WES and WGS from control individuals in the Genome
Aggregation Database (gnomAD)[117], none are homozygous for loss-of-function
mutations in PIK3C2A, which is consistent with total PIK3C2A deficiency causing
severe early onset disease.
Fig. 3.4. Protein and mRNA levels of PIK3C2A in patient-derived cells.
PIK3C2A mRNA levels were detected by qRT-PCR in patient-derived fibroblasts from
(A) Family I, (B) Family II, and (C) Family III. Whole cell lysates from fibroblasts of healthy controls (WT), heterozygous parents, and affected individuals from (D) Family
III and (E) Families I and II were analyzed by Western blotting for PIK3C2A and the loading controls Actin or GAPDH. Epitopes of anti-PIK3C2A antibodies (AB1-AB4) are
detailed in Table 3.5. * indicates p < 0.05. ** indicates p < 0.01. *** indicates p < 0.0001.
qRT-PCR data is represented as mean ± SEM (n=3-4 technical replicates per sample).
Fig. 3.5. PIK3C2A exon skipping in individual III-II-2.
The c.1640+1G>T mutation in PIK3C2A disrupts the splice donor site in intron 6 and leads to skipping of exons 5 and 6. Chromatograms are from sequenced RT-PCR products from cDNA of fibroblasts from wild-type control and patient fibroblasts using primers located in exons 3 and 10. Positions of primers are indicated by orange arrows and position of the splice site mutation is indicated by a black arrow.
Identification of cellular defects in patient-derived fibroblasts.
To test whether cellular phenotypes were consistent with loss-of-function mutations in
PIK3C2A, we examined cellular and cilia-localized PI(3,4)P2 levels as well as cilia length in control- and patient-derived primary fibroblasts. PIK3C2A deficiency in the patient-derived fibroblasts profoundly decreased PI(3,4)P2 throughout the cell in non-ciliated cells (Fig. 3.6A) and within primary cilia (Fig. 3.6B). Cilia length was also reduced in PIK3C2A deficient cells relative to control cells (Fig. 3.6C), although the percentage of ciliated cells was not altered (Fig. 3.6D). Despite the reduction of PI(3,4)P2 in cilia, the localization of other ciliary components was not affected (Fig. 3.7).
Fig. 3.6. Cilia defects in patient-derived fibroblasts.
Immunofluorescence studies on (A) non-ciliated and (B) ciliated fibroblasts using anti-
clathrin heavy chain (CHC), anti-α-Tubulin, and anti-PI(3,4)P2 antibodies demonstrate
decreased enrichment of the PIK3C2A product PI(3,4)P2 in non-ciliated cells and a loss
of the ciliary localization in affected individuals. Nuclei are stained with DAPI. (C) Cilia
length and (D) cilia number in primary fibroblasts from affected individuals and unrelated
control cells. *** indicates p < 0.0001. Data is represented as mean ± SEM
(n>300/sample).
Fig. 3.7. Localization of ciliary markers in patient-derived PIK3C2A deficient fibroblasts.
Co-immunofluorescence studies on ciliated human fibroblasts using anti-IFT88, anti-PC1, anti-IFT54, anti-Rab11, and anti-PI(3)P antibodies shown in green and anti-α-Tubulin antibody shown in red. No difference in ciliary localization between wild-type and affected individuals was detected.
Pik3c2a deficiency causes cataracts in a zebrafish model.
To determine what features are caused by Pik3c2a deficiency in a model organism, we generated and examined two zebrafish models with nonsense mutations in pik3c2a.
Embryos with the alleles sa10124 and sa12328 were created as part of the Zebrafish
Mutation Project[118] and were obtained by in vitro fertilization from frozen sperm
samples by the Zebrafish International Resource Center. The alleles sa10124 and sa12328 encode nonsense mutations in pik3c2a at amino acids 585 and 1236, respectively, that were confirmed by Sanger sequencing. Both nonsense mutations occur prior to the end of the catalytic domain and are thus predicted to encode null alleles. We generated homozygous pik3c2a^sa12328/sa12328 and pik3c2a^sa10124/sa10124 zebrafish, as well as
compound heterozygous pik3c2a^sa12328/sa10124 mutants by intercrossing heterozygous
adults. As these alleles were generated by random ENU mutagenesis[118], analysis of the
compound heterozygous pik3c2a^sa12328/sa10124 mutants minimized the likelihood of homozygosity for any unlinked ENU-induced mutations. The frequency of offspring genotypes from each pik3c2a+/- intercross was expected to follow a Mendelian (1:2:1)
ratio. The 1:2:1 genotype ratio was observed for offspring at both 1- and 3-weeks post fertilization (Table 3.3) and the gross morphology of the mutants at these ages was indistinguishable from that of control fish. However, no pik3c2a-/- zebrafish survived beyond 3 months, demonstrating that pik3c2a is required for viability into adulthood
(Table 3.3).
Table 3.3. Survival of offspring from pik3c2a heterozygous knockout zebrafish (+/-) crosses.
                                          pik3c2a genotype
Cross                      Age            -/-    +/-    +/+    p value
sa10124+/- x sa12328+/-    1 week          33     83     31    0.29
sa10124+/- x sa12328+/-    3 weeks         16     25     10    0.49
sa10124+/- x sa12328+/-    3-5 months       0     33     11    0.0003
sa10124+/- x sa10124+/-    1 week           9     15     10    0.77
sa10124+/- x sa10124+/-    3-5 months       0      5      3    0.25
sa12328+/- x sa12328+/-    1 week           5     12      7    0.33
sa12328+/- x sa12328+/-    3-5 months       0     13      7    0.04
Combined                   1 week          47    110     48    0.5748
Combined                   3-5 months       0     51     21    0.0001
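The text does not name the test behind the p values in Table 3.3, but most of them are consistent with a chi-square goodness-of-fit test against the expected 1:2:1 ratio. The following is a minimal sketch under that assumption. The function name is illustrative, and the closed-form p value holds only because three genotype classes give 2 degrees of freedom.

```python
import math


def mendelian_chi_square(hom_mut, het, hom_wt):
    """Chi-square goodness-of-fit test against a 1:2:1 genotype ratio.

    Assumes three genotype classes (2 degrees of freedom), for which the
    chi-square survival function has the closed form exp(-x / 2).
    """
    observed = [hom_mut, het, hom_wt]
    total = sum(observed)
    expected = [total * 0.25, total * 0.5, total * 0.25]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value


# Counts from the 1-week sa10124+/- x sa12328+/- row of Table 3.3
chi2, p = mendelian_chi_square(33, 83, 31)
print(f"chi2 = {chi2:.2f}, p = {p:.2f}")
```

For this row the sketch gives a p value of about 0.29, in line with the tabulated value.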
Given the extended timeframe of pik3c2a-/- viability in zebrafish relative to the mouse
Pik3c2a knockout model, which is embryonic lethal[106], we used the zebrafish model of
Pik3c2a deficiency to test for phenotypic similarities with the features of PIK3C2A deficiency in humans. We focused on the presence of cataracts, given that it was a robust and early phenotype present in all affected individuals described above. Zebrafish were screened for lenticular abnormalities by coaxial illumination. A masked grader evaluated digital photographs and videos of each animal and graded the lens as normal or abnormal.
Each animal was genotyped following the cataract grading. All pik3c2a-/- animals
evaluated (n=7) displayed lenticular abnormalities, whereas only one control pik3c2a+/- or
wild-type zebrafish (n=7) displayed any lenticular abnormalities (p < 0.005, two-tailed
Fisher’s exact test) (Fig. 3.8). The mutant animals had a circular defect which was more obvious in the posterior aspect of the lens, reminiscent of posterior lenticonus (Fig. 3.8B).
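The two-tailed Fisher's exact test reported above can be reproduced from the counts in the text (7 of 7 pik3c2a-/- fish and 1 of 7 control fish with lenticular abnormalities). The sketch below is a plain-Python illustration, not the software used in the study. It assumes the common two-sided definition that sums the probabilities of all tables no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 animals in column 1 fall in row 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Rows: pik3c2a-/- fish, control fish; columns: abnormal lens, normal lens
p = fisher_exact_two_tailed(7, 0, 1, 6)
print(f"p = {p:.4f}")
```

With these counts the result is 14/3003, or roughly 0.0047, which agrees with the reported p < 0.005.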
Fig. 3.8. Pik3c2a deficiency in zebrafish causes cataracts.
Coaxial illumination of a (A) wild-type eye, and a (B) pik3c2a-/- zebrafish eye resembling posterior lenticonus. Toluidine blue staining of a (C) wild-type eye, and a (D) pik3c2a-/- zebrafish eye.
Discussion
Here we describe the identification of three independent families with loss-of-function
mutations in PIK3C2A, resulting in a novel syndrome displaying short stature, cataracts,
secondary glaucoma, and skeletal abnormalities among other features (Table 3.1).
Interestingly, we observed in patient-derived fibroblasts shortening of the cilia and
decreased levels of ciliary PI(3,4)P2 (Fig. 3.6). Thus, based on the loss-of-function
mutations in PIK3C2A, the phenotypic overlap between the three independent families,
the patient-derived cellular data consistent with PIK3C2A deficiency, and the presence of
cataracts in both patients and pik3c2a-/- zebrafish, we conclude that loss-of-function
mutations in PIK3C2A cause this novel syndrome.
The identification of PIK3C2A loss-of-function mutations in humans represents the first mutations identified in any class II PI-3-kinase with Mendelian inheritance, and thus sheds light on the biological role of this poorly understood class of PI-3-kinases[104,119]. This is significant not only for understanding the role of PIK3C2A in
rare monogenic disorders, but also for the potential contribution of common variants in
PIK3C2A in more genetically complex disorders. Often, severe mutations in rare
Mendelian disorders can highlight the biological function of genes in a developmental
process in which other less severe variants in that gene are likely to contribute[120,121].
For example, severe mutations in PPARG cause monogenic lipodystrophy, whereas less severe variants are associated with complex polygenic forms of lipodystrophy[122,123].
In the case of PIK3C2A deficiency, the identification of delayed neurological
development in Family I may provide biological insight into the mechanisms underlying
the association between common variants in PIK3C2A and schizophrenia[124–126].
Likewise, the short stature in PIK3C2A deficient patients calls attention to the SNPs
rs1330 and rs757081 that are both less than 125 kilobases from the PIK3C2A gene and
are significantly associated with human height[127,128]. Of note, PIK3C2A is required
for sonic hedgehog signaling[112], and variation in this pathway has previously been
implicated in the regulation of human height[129].
Other monogenic disorders of phosphoinositide metabolism include Lowe syndrome, which shares many of the same features with PIK3C2A deficiency, including congenital cataracts, secondary glaucoma, kidney defects, skeletal abnormalities, developmental delay, and short stature[130,131]. The enzyme defective in Lowe syndrome, OCRL, is a
5-phosphatase that is required for membrane trafficking and ciliogenesis, similar to
PIK3C2A [132]. The similarities between Lowe syndrome and PIK3C2A deficiency suggest that similar defects in phosphatidylinositol metabolism, perhaps related to
deficiency of PI(3,4)P2, which was greatly reduced in PIK3C2A patient-derived
fibroblasts (Fig. 3.6), may underlie both disorders. In addition to Lowe syndrome, there is
partial overlap between PIK3C2A deficiency and other Mendelian disorders of PI
metabolism, such as the early-onset cataracts in patients with INPP5K
deficiency[133,134], demonstrating the importance of PI metabolism in lens development.
The viability of patients with PIK3C2A deficiency and pik3c2a deficient zebrafish
suggests differences between the biological functions of human PIK3C2A and the mouse
ortholog. Mouse knockout models of Pik3c2a result in growth retardation by e8.5 and
embryonic lethality between e10.5-11.5 due to vascular defects[106]. It remains to be
determined whether the species viability differences associated with PIK3C2A deficiency
result from altered PIK3C2A function between humans and mice or from altered
compensation by other PI-metabolizing enzymes. For instance, there are species-specific differences between humans and mice in the transcription and splicing of the
OCRL homolog INPP5B that may uniquely contribute to PI metabolism in each species[135]. Interestingly, the splicing pattern of inpp5b in zebrafish is similar to that in
humans[135], and the pik3c2a-/- zebrafish survives considerably longer than the mouse
Pik3c2a knockout. However, other biological functions appear to be conserved between
humans and mice. For instance, deletion of Pik3c2a in adult mice resulted in prolonged
bleeding time and demonstrated that Pik3c2a is required for platelet function[136]. The
brain MRIs in two of the PIK3C2A deficient patients also detected evidence of bleeding
within the brain (Fig. 3.2E), suggesting that PIK3C2A is required for maintaining proper
hemostasis.
It is intriguing that both PIK3C2A and OCRL have important roles in cilia formation.
Primary cilia are evolutionarily conserved microtubule-derived cellular organelles that
protrude from the surface of most mammalian cell types. They play a pivotal role in a
number of processes, such as left-right patterning during embryonic development, cell growth, and differentiation. The importance of primary cilia in embryonic development
and tissue homeostasis has become evident over the past two decades, as a number of
proteins which localize to the cilium harbor defects causing syndromic diseases,
collectively known as ciliopathies. Hallmark features of ciliopathies include skeletal
abnormalities, progressive vision and hearing loss, mild to severe intellectual disabilities,
polydactyly, and kidney phenotypes. Primary cilia formation is initiated by a cascade of
processes involving the targeted trafficking and docking of Golgi-derived vesicles near
the mother centriole. Interestingly, phosphatidylinositol metabolism has been linked to
ciliary dysfunction[137] and PIK3C2A loss has been associated with impaired
ciliogenesis in MEFs, likely due to defective trafficking of ciliary components[112].
Further work and the identification of additional PIK3C2A patients will be needed to
better understand the phenotype-genotype correlation associated with PIK3C2A deficiency. However, the identification of the first patients with PIK3C2A deficiency establishes that this enzyme is not required for viability in humans. Additionally, the clinical presentations of the five PIK3C2A-deficient patients identified thus far clearly establish a role for PIK3C2A in neurological and skeletal development, as well as vision and growth.
Materials and Methods
Human studies. The study was approved by the ethics committees of Rambam Hospital,
Haifa, Israel, of the University Medical Center, Groningen, Netherlands, and University
Hospital, Erlangen, Germany. Informed consent was obtained from all participants.
Whole exome sequencing (WES). WES of two patients from Family I was performed using 1µg of DNA that was extracted from whole blood and fragmented and enriched using the Truseq DNA PCR Free kit (Illumina). Samples were sequenced on one lane of a
HiSeq2500 (Illumina) with 2x100bp read length and analyzed as described[138]. Raw fastq files were mapped to the reference human genome GRCh37 and the two mapped
SAI files were combined using the BWA package[139] (v. 0.7.12) and output as BAM files. As a pre-processing step, duplicate reads were marked and removed with Picard (v. 1.119). In addition, local realignment and base quality score recalibration were performed following the GATK pipeline[140] (v. 3.3). The resulting numbers of sequencing reads for the older sister and younger sister were 32,358,405 and 38,507,957, respectively. The average read depth was 98x and 117x, respectively. Subsequently, HaplotypeCaller was used to call SNPs and indels. For SNPs, the filters used were QD < 2; MQ < 60; FS > 40; MQRankSum < −12.5; ReadPosRankSum < −8; DP <= 10. For indels, the filters used were QD < 2; FS > 200; ReadPosRankSum < −20; DP <= 10. The variants that passed these filters were further annotated with Annovar[141]. Databases used in Annovar were
RefSeq[142], EXAC[117] (v. exac03), CLINVAR[143] (v. clinvar_20150330) and LJB
database[144] (v. ljb26_all). Exome variants in Family I were filtered out if they were not
homozygous in both affected individuals, had a population allele frequency greater than
0.1% in either the Exome Aggregation Consortium database[15] or the Greater Middle
East Variome Project[16], or were not predicted to be deleterious by either SIFT[145] or
Polyphen2[146].
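The filtering logic for Family I can be summarized in a short sketch. It is an illustration only: the study applied these thresholds through GATK and Annovar, whereas the dictionary-based records and field names below (for example `exac_af` and `homozygous_in_both_affected`) are hypothetical. It also assumes that a variant is excluded when any one hard-filter condition is met, and that a prediction of deleteriousness by either SIFT or Polyphen2 is sufficient to retain a variant.

```python
# Hard-filter thresholds as listed in the text: a variant is excluded when
# any listed condition is true.
SNP_FILTERS = {
    "QD": lambda v: v < 2,
    "MQ": lambda v: v < 60,
    "FS": lambda v: v > 40,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8,
    "DP": lambda v: v <= 10,
}
INDEL_FILTERS = {
    "QD": lambda v: v < 2,
    "FS": lambda v: v > 200,
    "ReadPosRankSum": lambda v: v < -20,
    "DP": lambda v: v <= 10,
}


def passes_hard_filters(annotations, is_indel=False):
    """Return True when no hard-filter condition is met."""
    filters = INDEL_FILTERS if is_indel else SNP_FILTERS
    for name, fails in filters.items():
        value = annotations.get(name)
        # Assumption: an annotation absent from the record is not evaluated.
        if value is not None and fails(value):
            return False
    return True


def is_candidate(variant, max_allele_frequency=0.001):
    """Keep rare, homozygous, predicted-deleterious variants (Family I).

    The dictionary keys are hypothetical field names chosen for illustration.
    A variant missing from a population database is treated as frequency 0.
    """
    rare = (variant.get("exac_af", 0.0) <= max_allele_frequency
            and variant.get("gme_af", 0.0) <= max_allele_frequency)
    deleterious = variant["sift_deleterious"] or variant["polyphen2_deleterious"]
    return variant["homozygous_in_both_affected"] and rare and deleterious


# Hypothetical SNP record, not data from this study
snp = {"QD": 15.0, "MQ": 60.0, "FS": 2.0, "MQRankSum": 0.0,
       "ReadPosRankSum": 0.5, "DP": 90}
print(passes_hard_filters(snp))
```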
WES was performed on the two affected individuals of Family II and both their parents
essentially as previously described[147]. Target regions were enriched using the Agilent
SureSelectXT Human All Exon 50Mb Kit. Whole-exome sequencing was performed on
the Illumina HiSeq platform (BGI, Copenhagen, Denmark) followed by data processing
with BWA[139] (read alignment) and GATK[140] (variant calling) software packages.
Variants were annotated using an in-house developed pipeline. Prioritization of variants
was done by an in-house designed ‘variant interface’ and manual curation.
The DNAs of Family III were enriched using the SureSelect Human All Exon Kit v6
(Agilent, Santa Clara, CA) and sequenced on an Illumina HiSeq 2500 (Illumina, San
Diego, CA). Image analysis and base calling were performed using HiSeq instrument
control software with default parameters. After demultiplexing with bcl2fastq v1.8.4
from Illumina, read alignment was performed with BWA[139] version 0.7.8 using the
bwa mem algorithm with the human genome assembly hg19 (GRCh37) as a reference.
Duplicate reads were marked with Picard (version 1.111). The average read depth was
95x (III-II-2), 119x (III-I-1) and 113x (III-I-2). Single-nucleotide variants and small
insertions and deletions (indels) were detected using five different callers:
HaplotypeCaller and UnifiedGenotyper of the aforementioned Genome Analysis Toolkit,
SNVer[148], freeBayes[149], and Platypus[150]. Variant annotation was performed using
ANNOVAR[141], CLINVAR[143], OMIM and MedGen. Variants were selected that
were covered by at least 10% of the average coverage of each exome and for which at
least 5 novel alleles were detected from 2 or more callers. All modes of inheritance were
analyzed[151]. Variants were prioritized based on a population frequency of 10⁻³ or below
(based on the ExAC database[117] and an in-house variant database), on the evolutionary conservation, and on the mutation severity prediction (CADD score 15 or higher). All remaining variants and the segregation in the family were confirmed by Sanger sequencing.
Copy number variants analysis. Microarray analysis for copy number variant detection in Family I was performed using a HumanOmni5-Quad chip (Illumina). SNP array raw
data was mapped to the reference human genome GRCh37 and analyzed using
GenomeStudio (v. 2011/1). Signal intensity files with Log R ratio and B-allele frequency
were further analyzed with PennCNV[152] (v. 2014/5/7) to detect copy number variants.
CNV analysis on the WES data of Family II was performed by CNV calling using
CoNIFER[153]. Variants were annotated using an in-house developed pipeline.
Prioritization of variants was done by an in-house designed ‘variant interface’ and
manual curation as described before[154]. Subsequent segregation analysis of the
pathogenic CNV in Family II was performed by Multiplex Amplicon Quantification (MAQ; Multiplicom, Niel, Belgium) using a targeted primer set with primers in exons 3, 10, 20 and 24, which are located within the deletion, and exons 28, 32 and 34, which are located outside of the deletion.
Sanger Sequencing. DNA was extracted from patients’ and controls’ blood cells or fibroblasts. Sanger sequencing was performed by GenScript. The data were visualized and analyzed using FinchTV (Geospiza). Candidate variants identified by WES were
PCR amplified and sequenced with the primers listed in Table 3.4.
Cell culture. Human dermal fibroblasts were obtained from sterile skin punches cultured in DMEM (Dulbecco's Modified Eagle's Medium) supplemented with 10 - 20% Fetal
Calf Serum, 1% Sodium Pyruvate and 1% Penicillin and streptomycin (P/S) in 5% CO2 at 37°C. Control fibroblasts were obtained from healthy age-matched volunteers.
Fibroblasts from passages 4–8 were used for the experiments.
Western Blotting. Protein was extracted from cultured primary fibroblast cells as described[155,156]. Extracts were quantified using the DC protein assay (BioRad) or the
BCA method. Equal amounts of protein were separated by SDS-PAGE and electrotransferred onto polyvinylidene difluoride membranes (Millipore, Billerica,
Massachusetts, USA). Membranes were blocked with TBST/5% fat-free dried milk and
stained with antibodies as detailed in Table 3.5. Secondary antibodies were goat anti-rabbit (1:5,000, Thermo Fisher #31460), goat anti-mouse (1:5,000, Thermo Fisher #31430), goat anti-rabbit (1:2,000, Dako #P0448), and goat anti-mouse (1:2,000, Dako #P0447).
Cilia analysis. To induce ciliogenesis, cells were grown in DMEM with 0 - 0.2% FCS for 48 hours. Immunofluorescence staining was performed in a biological triplicate.
Briefly, cells were washed in PBS, then fixed and permeabilized in ice-cold methanol for
5 minutes, followed by extensive washing with PBS. After blocking in 5% Bovine Serum
Albumin, cells were incubated with primary antibodies for 1.5 hours at room temperature
(RT) and extensively washed in PBS-T to remove the primary antibody. Primary antibodies used for Centrin and ARL13B are detailed in Table 3.5. Subsequently, cells were incubated with secondary antibodies, Alexa Fluor 488 (1:800, Invitrogen) and Alexa Fluor 568 (1:800, Invitrogen), for 45 min followed by washing with PBS-T. Finally, cells were briefly rinsed in ddH2O and samples were mounted using Vectashield with DAPI. Images were taken using an
Axio Imager Z2 microscope with an Apotome (Zeiss) at 63x magnification. Cilia were measured manually using Fiji software taking the whole length of the cilium based on
ARL13B staining. At least 300 cilia were measured per sample. Cilia lengths were pooled for 3 control cell lines and compared to 2 patient-derived samples. Statistical significance was calculated using a Student's t-test.
cDNA and quantitative real time-PCR. Total RNA was purified from primary fibroblasts using the PureLink RNA purification kit (ThermoFisher) or RNAPure peqGOLD (Peqlab, Darmstadt, Germany). RNA was reverse transcribed into complementary DNA with random hexamers using a High-Capacity cDNA Reverse Transcription Kit (Thermo Fisher Scientific, Waltham, USA). RT-PCR to detect exon skipping in Family III was performed using primers flanking exon 6 (Fig. 3.5). Gene expression was quantified by SYBR Green real-time PCR using the CFX Connect Real-Time System (BioRad, München, Germany). Primers used are detailed in Table 3.4.
Expression levels were calculated using the ΔΔCT method relative to GAPDH.
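As an illustration of the ΔΔCT calculation, the sketch below computes a fold change for a target gene normalized to a reference gene such as GAPDH. It assumes the standard form of the method with 100% amplification efficiency (a doubling per cycle). The function name and the CT values are hypothetical and are not data from this study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change by the delta-delta-CT method, assuming 100% PCR efficiency."""
    # Normalize the target gene to the reference gene within each condition
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized sample value to the normalized control value
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** (-delta_delta_ct)


# Hypothetical CT values: target and reference in a patient and a control sample
fold = relative_expression(26.0, 18.0, 24.0, 18.0)
print(fold)
```

In this hypothetical case the target gene crosses threshold two cycles later in the patient sample after normalization, which corresponds to a fold change of 0.25.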
Immunostaining. Cells were grown on glass coverslips to approximately 80%-90% confluency in DMEM + 10% FCS + 1% P/S, at which time the medium was replaced with DMEM without FCS for 48 hours to induce ciliogenesis. Cells were fixed in either methanol for 10 minutes at -20°C or 4% paraformaldehyde for 10 minutes at RT. Fixed cells were washed in PBS, and incubated with 10% normal goat serum, 1% bovine serum albumin in PBS for 1 hour at RT. If cells were fixed with paraformaldehyde, blocking solutions contained 0.5% Triton X-100. Cells were incubated with primary antibody overnight at 4°C, washed in PBS, and incubated with secondary antibody including
Diamidino-2-Phenylindole (DAPI) to stain nuclei for 1 hour at RT. Coverslips were mounted on glass slides with fluoromount (Science Services, München, Germany) and imaged on a confocal laser scanning system with a 63x objective (LSM 710, Carl Zeiss
MicroImaging, Jena, Germany). Primary antibodies are detailed in Table 3.5.
Zebrafish (Danio rerio). Zebrafish strains carrying the sa10124 and sa12328 alleles were purchased from the Zebrafish International Resource Center (ZIRC). Zebrafish were kept with the approval of the Case Western Reserve University Institutional Animal Care and Use Committee (protocol number 2015-0139) in a 16h light/8h dark cycle. Fish were euthanized by chilling at 4 degrees Celsius.
The pik3c2a knockout allele was genotyped using the dCAPS method[157]. The primers
were designed using the web-based software program dCAPS Finder 2.0 (Table 3.4)[158].
The 2nd PCR introduced a restriction site for the enzyme DdeI in the wild-type allele of
pik3c2a but not the sa10124 allele, or introduced a restriction site for the enzyme TaqI
in the wild-type allele of pik3c2a but not the sa12328 allele. The digested PCR products
were analyzed by electrophoresis on a 3.5% agarose gel. For the sa10124 allele, PCR and
digestion of the mutant pik3c2a allele produced a 223 bp product, whereas the wild-type
allele produced a 206 bp and 17 bp product. For the sa12328 allele, PCR and digestion of
the mutant pik3c2a allele produced a 228 bp product, whereas the wild-type allele produced a 205 bp and 23 bp product.
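The banding patterns described above translate into a simple genotype call. The sketch below is a hypothetical helper, not part of the published protocol. It scores only the large fragments, on the assumption that the 17 bp and 23 bp cut fragments are too small to score reliably on the gel, and the size tolerance is an arbitrary illustrative value.

```python
# Expected large-fragment sizes (bp) after digestion, taken from the text.
FRAGMENTS = {
    "sa10124": {"mutant": 223, "wild_type": 206},
    "sa12328": {"mutant": 228, "wild_type": 205},
}


def call_genotype(allele, band_sizes, tolerance=5):
    """Infer the pik3c2a genotype from estimated band sizes on the gel.

    tolerance is the allowed deviation in bp between an estimated band size
    and the expected fragment size (an illustrative assumption).
    """
    expected = FRAGMENTS[allele]
    has_mutant = any(abs(b - expected["mutant"]) <= tolerance for b in band_sizes)
    has_wild_type = any(abs(b - expected["wild_type"]) <= tolerance for b in band_sizes)
    if has_mutant and has_wild_type:
        return "+/-"
    if has_mutant:
        return "-/-"
    if has_wild_type:
        return "+/+"
    raise ValueError("no band matches the expected fragment sizes")


print(call_genotype("sa10124", [223, 206]))
```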
Cataracts evaluation in zebrafish. Adult zebrafish were anesthetized with tricaine and examined. Coaxial illumination using a Leica M841 surgical microscope was used to visualize lenticular defects. Digital video recordings were made using a Panasonic GP-
US932A HD camera system and later reviewed by an ophthalmologist without
knowledge of the genotype of each animal. Optical sectioning by changing the z-axis
focus was used to aid in the identification of cataracts. All zebrafish examined for cataracts were offspring of a cross between pik3c2asa10124/+ and pik3c2asa12328/+ fish.
Table 3.4. List of primers used in this study.
Primer                             Forward                   Reverse                     Source
PIK3C2A_cDNA Exon3                 GACATTGAAGGATTTCAGCTACC   -                           Splicing effect
PIK3C2A_cDNA Exon10                -                         GCACAGTCTGTAGGACTCCTACC     Splicing effect
PIK3C2A cDNA Exon 1-2              CTCAGCTTGCAAAAGCCCAG      CTGGGTTTGTGCGGTGATTG        Gene expression
PIK3C2A_cDNA Exon24                GTGCTGACCTCTGATATGGC      CAAGTTGTAGGCCTGACAGC        Gene expression
ATF4                               TAGATGACCTGGAAACCATGC     GGGCTCATACAGATGCCACTA       Sequencing
DNAH14                             GGTGGAGTAGAGCTCCCAGA      GGTACAGTCCCAGGTCATCC        Sequencing
PLEKHA7                            CACTCCCCGAACTCTACAGC      CAGCTCAGGCTCACTGACAT        Sequencing
PIK3C2A                            ACAGTGGCCACCTGGATTAC      TCAGTCCTTGCTTTCCCATT        Sequencing, Family I
PIK3C2A                            TTATTGTGGCTGAAGGATGC      GACAATAGAAAGACCAAAGAGTGG    Sequencing, Family III
GAPDH cDNA                         AATCCCATCACCATCTTCCA      TGGACTCCACGACGTACTCA        Gene expression
pik3c2asa10124 (1st primer pair)   GCAACTCCACAGATGCGATA      ACCTCTGGTGAGCGTGTTCT        Genotyping (dCAPS)
pik3c2asa10124 (2nd primer pair)   GCAACTCCACAGATGCGATA      TCAACTTCATCCAGAGCTCA        Genotyping (dCAPS)
pik3c2asa12328 (1st primer pair)   AACCTCACTCCCATGACCTC      CAACAGAACTGCTGCCATGT        Genotyping (dCAPS)
pik3c2asa12328 (2nd primer pair)   AACCTCACTCCCATGACCTC      TGTCCTTGAAGGAACCCGTCACTCG   Genotyping (dCAPS)
Table 3.5. List of antibodies used in this study.
Antigen                        Host     Catalog #     Source                                                  Dilution IF / Dilution IB
PIK3C2A (AB1)                  rabbit   ---           Gift from Prof. Haucke (Berlin), epitope: a.a. 2-365    1:200 / 1:1,000
PIK3C2A (AB2)                  mouse    Sc-365290     Santa Cruz, epitope: a.a. 61-360                        1:50 / 1:1,000
PIK3C2A (AB3)                  rabbit   12402         Cell Signaling, epitope: ~ a.a. 717                     - / 1:1,000
PIK3C2A (AB4)                  rabbit   22028-1-AP    Proteintech,                                            1:1000
Acetylated α-Tubulin (Lys40)   mouse    T7451         Sigma                                                   1:300 / --
Acetylated α-Tubulin (Lys40)   rabbit   5335          Cell Signaling                                          1:200 / --
Clathrin heavy chain           rabbit   Ab21679       Abcam                                                   1:200 / 1:1,000
IFT88                          rabbit   13967-1-AP    ProteinTech                                             1:50 / --
PI(3,4)P2                      mouse    Z-P034b       Echelon                                                 1:150 / --
PI(3)P                         mouse    Z-P003        Echelon                                                 1:100 / --
Polycystin1                    rabbit   Ab74115       Abcam                                                   1:50 / --
RAB11A                         rabbit   Ab65200       Abcam                                                   1:100 / --
TRAF3IP1 (IFT54)               rabbit   A104577       Atlas                                                   1:50 / --
GAPDH                          mouse    MA5-15738     Thermo Fisher                                           -- / 1:10,000
Centrin                        mouse    04-1624       Millipore                                               1:500 / --
ARL13B                         rabbit   17711-1-AP    Proteintech                                             1:500 / --
IF, immunofluorescence; IB, immunoblot; a.a., amino acid
Chapter 4. Widespread epistasis regulates glucose homeostasis and gene expression
The study of this chapter was published in
Anlu Chen, Yang Liu, Scott M. Williams, Nathan Morris, David A. Buchner. Widespread epistasis regulates glucose homeostasis and gene expression. PLoS Genetics. 2017, 13(9): e1007025.
Abstract
The relative contributions of additive versus non-additive interactions in the regulation of
complex traits remain controversial. This may be in part because large-scale epistasis
has traditionally been difficult to detect in complex, multi-cellular organisms. We
hypothesized that it would be easier to detect interactions using mouse chromosome
substitution strains that simultaneously incorporate allelic variation in many genes on a
controlled genetic background. Analyzing metabolic traits and gene expression levels in
the offspring of a series of crosses between mouse chromosome substitution strains
demonstrated that inter-chromosomal epistasis was a dominant feature of these complex
traits. Epistasis typically accounted for a larger proportion of the heritable effects than
those due solely to additive effects. These epistatic interactions typically resulted in trait
values returning to the levels of the parental CSS host strain. Due to the large epistatic
effects, analyses that did not account for interactions consistently underestimated the true
effect sizes due to allelic variation or failed to detect the loci controlling trait variation.
These studies demonstrate that epistatic interactions are a common feature of complex
traits and thus identifying these interactions is key to understanding their genetic regulation.
Introduction
The genetic basis of complex traits and diseases results from the combined action of
many genetic variants [159]. However, it remains unclear whether these variants act
individually in an additive manner or via non-additive epistatic interactions. Epistasis has
been widely observed in model organisms such as S. cerevisiae [160,161], C. elegans
[162], D. melanogaster [163] and M. musculus [164]. However, it has been more difficult
to detect in humans, potentially due to their diverse genetic backgrounds, low allele
frequencies, limited sample sizes, complexity of interactions, insufficient effect sizes, and
methodological limitations [165,166]. Nonetheless, a number of genome-wide interaction-
based association studies in humans have provided evidence for epistasis in a variety of
complex traits and diseases [31–37]. However, concerns remain over whether observed
epistatic interactions are due to statistical or experimental artifacts [167,168].
To better understand the contribution of epistasis to complex traits, we studied mouse chromosome substitution strains (CSSs) [169]. For each CSS, a single chromosome in a
host strain is replaced by the corresponding chromosome from a donor strain. This
provides an efficient model for mapping quantitative trait loci (QTLs) on a fixed genetic
background. This is in contrast to populations with many segregating variants such as
advanced intercross lines [170], heterogeneous stocks [171], or typical analyses in humans.
Given the putative importance of genetic background effects in complex traits [172,173],
we hypothesized the fixed genetic backgrounds of CSSs can provide a novel means for
detecting genetic interactions on a large scale [169,174]. Previous studies of CSSs with
only a single substituted chromosome suggested that non-additive epistatic interactions
between loci were a dominant feature of complex traits [164]. However, identifying the
interacting loci, or at least their chromosomal locations, requires the analysis of genetic variation in multiple genomic contexts [175]. We thus extended the analysis of single
chromosome substitutions by analyzing a series of CSSs with either one or two
substituted chromosomes, collectively representing the pairwise interactions between
genetic variants on the substituted chromosomes. This experimental design can directly
identify and map loci that are regulated by epistasis by analyzing the phenotypic effects
of genetic variants on multiple fixed genetic backgrounds. Here we report the widespread
effects of epistasis in controlling complex traits and gene expression. The detection of
true epistatic interactions will improve our understanding of trait heritability and genetic
architecture, as well as provide insights into the biological pathways that underlie disease
pathophysiology [176]. Knowing about epistasis will also be essential for guiding precision medicine-based decisions by interpreting specific variants in appropriate contexts.
Results
Contribution of epistasis to metabolic traits.
Body weight and fasting plasma glucose levels were measured in a total of 766 control
and CSS mice (Table 4.1, Fig. 4.1). Raw body weight and plasma glucose data are available at https://doi.org/10.1371/journal.pgen.1007025.s011. The
CSSs included 240 mice that were heterozygous for one A/J-derived chromosome and
444 mice that were heterozygous for two different A/J-derived chromosomes, both on otherwise B6 backgrounds. The CSSs with two A/J-derived chromosomes represented all pairwise interactions between the individual A/J-derived chromosomes. For example, comparisons were made between strain B6, strains (B6.A3 x B6)F1 and (B6 x B6.A10)F1, which were both heterozygous for a single A/J-derived chromosome (Chr. 3 and 10, respectively), and strain (B6.A3 x B6.A10)F1, which was heterozygous for A/J-derived chromosomes 3 and 10 (Fig. 4.2). A complete list of the strains analyzed is shown in
Table 4.1. Quantitative trait loci (QTLs) were identified for both body weight and plasma glucose levels that were due to main effects and interaction effects. Of note, due to the nature of the CSS experimental design, the regions defined by the identified QTLs correspond to the entire substituted chromosome. Additionally, due to the study design, only QTLs with dominant or semi-dominant effects could be assessed.
Table 4.1. Number of mice used for analysis of body weight and plasma glucose.
                                      Maternal Genotype
Paternal Genotype   B6           B6.A3        B6.A6        B6.A14       B6.A17
B6                  82 (43,39)   20 (9,11)    34 (16,18)   32 (14,18)   35 (20,15)
B6.A4               24 (13,11)   3 (1,2)      36 (18,18)   19 (10,9)    30 (16,14)
B6.A5               37 (16,21)   21 (15,6)    33 (17,16)   35 (17,18)   41 (20,21)
B6.A8               27 (16,11)   8 (3,5)      24 (13,11)   10 (6,4)     29 (12,17)
B6.A10              31 (17,14)   41 (21,20)   33 (15,18)   39 (22,17)   42 (21,21)
Total number of mice is shown with the numbers of male (left) and female (right) mice indicated in parentheses.
Fig. 4.1 Body weight and glucose levels in all CSS and control mice.
Body weight and plasma glucose levels were measured in 5-week-old mice that were fasted overnight. Each dot represents the data from an individual mouse. Females (F) are
shown in red. Males (M) are shown in blue. Outliers, as described in the Trait Analysis paragraph in the Methods section, are not shown but all data are available at https://doi.org/10.1371/journal.pgen.1007025.s011.
Fig. 4.2. Schematic diagram of CSS and control crosses.
Crosses were used to generate control, single CSS, and double CSS mice to examine main effects and interaction effects on various traits and gene expression levels. The four crosses used (top) to generate the control and CSS offspring (bottom) to study the substitution of chromosomes 3 and 10 are provided as an example of the crosses that were performed. Each rectangle represents a chromosome, with the substituted chromosomes 3 and 10 diagrammed in this figure, on a B6 background in all mice. The
control B6 mice were generated from Cross I. The single CSS mice were generated from crosses II and III. The double CSS mice were generated from cross IV. M, Male. F,
Female.
Joint F-tests for main effects on body weight indicated that the chromosome substitutions
influenced body weight (males p=0.0028; females p=0.0008; meta p=1.4e-05). Similarly,
joint F-tests for main effects on plasma glucose levels demonstrated a significant
effect of the chromosome substitutions (males p=0.0082; females p=0.00011; meta
p=1.4e-05). QTLs with main effects on body weight were mapped to chromosomes 8
(main effect: 1.23g; average effect: 1.02g) and 17 (main effect: -1.13g; average effect: -
1.11g) (Table 4.2). Note that we define main effects as the effect of a chromosome
substitution as estimated by a model which includes all pairwise interaction terms, thus
taking into account context-dependent genetic background effects. In contrast, the
average effect is estimated using a model that does not include any interaction terms; the
latter is similar to the analyses performed in a typical GWAS. QTLs with main effects on
fasting glucose were mapped to chromosomes 3 (main effect: 25.0 mg/dL; average effect:
9.61 mg/dL), 5 (main effect: 15.6 mg/dL; average effect: 6.02 mg/dL), and 4 (main effect:
17.5 mg/dL; average effect: 6.61 mg/dL) (Table 4.2).
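The distinction between main and average effects can be illustrated with a reduced example. The sketch below considers only one pair of substitutions (A and B) in a balanced 2x2 design and works from the four group means. This is a simplification of the full model, which includes all substituted chromosomes and their pairwise interactions. The function name and the glucose values are hypothetical and were chosen so that the double substitution returns to the control level.

```python
def two_locus_effects(control, single_a, single_b, double_ab):
    """Effects of substitution A from the four group means of a balanced 2x2 design.

    main_a      effect of A on the control background (model with interaction)
    interaction change in the effect of A when B is also substituted
    average_a   effect of A averaged over both backgrounds (model without
                interaction, which is what an additive fit estimates when
                the four groups are the same size)
    """
    main_a = single_a - control
    effect_a_with_b = double_ab - single_b
    interaction = effect_a_with_b - main_a
    average_a = (main_a + effect_a_with_b) / 2
    return main_a, interaction, average_a


# Hypothetical mean fasting glucose (mg/dL): control, A only, B only, A and B
main_a, interaction, average_a = two_locus_effects(100.0, 125.0, 100.0, 100.0)
print(main_a, interaction, average_a)
```

In this hypothetical case the main effect of A is 25 mg/dL, but the additive estimate is only 12.5 mg/dL because the effect disappears on the B-substituted background.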
Joint F-tests for interaction effects on body weight were not significant (males p= 0.19; females p= 0.83; meta p= 0.44), and therefore epistatic interactions on body weight were not further investigated. However, joint F-tests for interaction effects on plasma glucose demonstrated the importance of epistasis in regulating this trait (males p= 0.002; females p= 0.003; meta p= 8.99e-05). In fact, among the males and females respectively, epistasis accounted for 43% (95% confidence interval: 23%-75%) and 72% (95% confidence interval: 37%-97%) of the heritable effects on plasma glucose levels. The discrepant results for the contribution of interactions to body weight and plasma glucose are likely reflected in the difference between whether QTLs for these traits were detected using the
main effect model or the average effect model (Table 4.2). For plasma glucose, only 1 of the 3 QTLs identified using the main effect model was also identified using the average effect model, and no new QTLs were identified with the average effect model. In contrast, both of the QTLs for body weight identified using the main effect model were also identified using the average effect model, and 2 new QTLs were identified on chromosomes 6 and 10. This suggests that for a trait regulated by epistatic interactions, the ability to successfully identify QTLs is greatly enhanced by accounting for these interactions. However, for a trait regulated primarily by additive effects, a model incorporating interactions can be detrimental to QTL identification.
To identify specific epistatic interactions, we tested explicit hypotheses for inter-chromosomal pairwise interactions on plasma glucose levels. Among the 15 CSS crosses analyzed, 5 crosses demonstrated inter-chromosomal epistatic interactions that altered plasma glucose levels (Figs. 4.3, 4.4). Interestingly, in all 5 crosses demonstrating interactions, one chromosome substitution increased fasting glucose levels relative to the control B6 strain. These main effects raised plasma glucose levels by an average of 12.3 mg/dL in males and 17.8 mg/dL in females. However, in all 5 observed interactions the average plasma glucose levels in the double CSSs were closer to the control B6 strain than any single CSS was. Furthermore, in 4 of the 5 interactions, the plasma glucose levels in the double CSS did not differ statistically from the control strain B6 (p value >
0.1). Thus, the chromosome substitution driving the increase in plasma glucose on a B6 background had no effect on glucose levels when the genetic background was altered by the second chromosome substitution.
Table 4.2. Main and average effects on phenotypes

Plasma glucose

| Substituted chromosome | Effect | Meta estimate | 95% CI lower | 95% CI upper | T | P | Adjusted P |
|---|---|---|---|---|---|---|---|
| 14 | Main | 9.74488 | -0.265762283 | 19.81374709 | 2.192947 | 0.0436 | 0.1854 |
| 14 | Average | -0.86608 | -6.083174116 | 4.307270988 | -0.34777 | 0.7405 | 1 |
| 17 | Main | -1.65693 | -7.808561114 | 4.721277732 | -0.38763 | 0.6486 | 1 |
| 17 | Average | -2.57627 | -6.141044821 | 0.946939948 | -1.13129 | 0.2104 | 0.8329 |
| 3 | Main | 25.01456 | 19.21034134 | 31.30942924 | 4.762928 | <0.0001 | 0.0001 |
| 3 | Average | 9.606284 | 5.540854949 | 13.74347862 | 3.322242 | 2e-04 | 0.0057 |
| 6 | Main | 11.54314 | -0.592916635 | 24.1972221 | 2.684844 | 0.0288 | 0.0645 |
| 6 | Average | -3.84284 | -7.802632063 | 0.299592968 | -1.64152 | 0.083 | 0.4532 |
| 10 | Main | 12.55446 | 5.207282095 | 20.01631742 | 2.780221 | 0.0039 | 0.0519 |
| 10 | Average | 2.23361 | -1.792949821 | 6.369265612 | 0.984374 | 0.3067 | 0.9125 |
| 4 | Main | 17.47872 | 10.51503985 | 24.72698899 | 3.564996 | 1e-04 | 0.0067 |
| 4 | Average | 6.614857 | 1.258687295 | 12.25451985 | 2.514275 | 0.0163 | 0.0666 |
| 5 | Main | 15.62627 | 7.183748327 | 24.03397682 | 3.740554 | 2e-04 | 0.0045 |
| 5 | Average | 6.024702 | 1.809464596 | 10.38360837 | 2.614923 | 0.0087 | 0.0508 |
| 8 | Main | 8.74165 | 1.750181134 | 15.99788547 | 1.857149 | 0.0386 | 0.3628 |
| 8 | Average | -1.46336 | -5.284059866 | 2.529461511 | -0.52701 | 0.5403 | 0.9978 |

CI, confidence interval.

Body weight

| Substituted chromosome | Effect | Meta estimate | 95% CI lower | 95% CI upper | T | P | Adjusted P |
|---|---|---|---|---|---|---|---|
| 14 | Main | 0.318969 | -0.586592987 | 1.183791215 | 1.030359 | 0.3986 | 0.9161 |
| 14 | Average | -0.0193 | -0.348753138 | 0.32173969 | -0.11239 | 0.9127 | 1 |
| 17 | Main | -1.13483 | -1.844105803 | -0.410376532 | -3.68475 | 7e-04 | 0.0037 |
| 17 | Average | -1.11218 | -1.382123889 | -0.842540405 | -7.07123 | <0.0001 | <0.0001 |
| 3 | Main | 0.087931 | -0.532102491 | 0.681676768 | 0.237511 | 0.8047 | 1 |
| 3 | Average | 0.271599 | -0.144578572 | 0.694418149 | 1.363997 | 0.1902 | 0.7084 |
| 6 | Main | 0.634872 | 0.210502972 | 1.05681823 | 2.08434 | 0.0134 | 0.2464 |
| 6 | Average | 0.933015 | 0.665819837 | 1.203822428 | 5.794178 | <0.0001 | <0.0001 |
| 10 | Main | 0.517743 | -0.005528751 | 1.024677598 | 1.618801 | 0.0789 | 0.531 |
| 10 | Average | 0.436682 | 0.112330084 | 0.756676888 | 2.800148 | 0.006 | 0.0331 |
| 4 | Main | 0.203255 | -0.537046654 | 0.92737608 | 0.57813 | 0.5782 | 0.9969 |
| 4 | Average | 0.312415 | -0.046415952 | 0.656068643 | 1.719168 | 0.0856 | 0.444 |
| 5 | Main | -0.22006 | -0.689000914 | 0.236982218 | -0.74791 | 0.4137 | 0.9862 |
| 5 | Average | 0.035833 | -0.248051049 | 0.31682035 | 0.22635 | 0.8076 | 1 |
| 8 | Main | 1.232247 | 0.703835183 | 1.772528999 | 3.616917 | <0.0001 | 0.0051 |
| 8 | Average | 1.024028 | 0.673792153 | 1.369461021 | 5.321076 | <0.0001 | <0.0001 |

CI, confidence interval.
Fig. 4.3. Identification of 5 inter-chromosomal epistatic interactions that regulate fasting glucose levels in mice.
Multiple testing adjusted p-values for interaction effects on fasting plasma glucose levels among 15 crosses each involving two A/J-derived chromosome substitutions with the
substituted chromosomes indicated below the chart. Inverse-variance meta-analysis was
used to combine the effects from males and females. The horizontal line indicates the
significance threshold of 0.05.
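As an illustration of how sex-specific estimates can be combined, the following is a minimal sketch of a fixed-effect inverse-variance meta-analysis with a normal approximation for the p-value. The two estimates are the average male and female main effects on plasma glucose reported in the Results; the standard errors are hypothetical placeholders, and the function name is illustrative rather than the code used in the study.

```python
import math

def inverse_variance_meta(estimates, std_errors):
    """Fixed-effect inverse-variance pooling of independent estimates."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value, normal approximation
    return pooled, pooled_se, z, p

# Estimates from the text (mg/dL); the standard errors of 4.0 are hypothetical.
pooled, pooled_se, z, p = inverse_variance_meta([12.3, 17.8], [4.0, 4.0])
print(round(pooled, 2), round(pooled_se, 3), round(z, 2), p)
```

With equal standard errors the pooled estimate is simply the mean of the two inputs; unequal standard errors would weight the more precise estimate more heavily.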
Fig. 4.4. Inter-chromosomal epistasis regulates fasting glucose levels.
Plasma glucose levels were measured in 5-week-old female (left) and male (right) mice that were fasted overnight. Each dot represents the glucose level of a single mouse.
“Others” represents the data from all mice in this study excluding the other 4 strains shown in that panel. The black horizontal line indicates the mean glucose level for each group. The red horizontal line indicates the predicted trait level based on a model of additivity.
Regulation of gene expression by epistasis.
As hepatic gluconeogenesis is a key determinant of plasma glucose levels in healthy
insulin-sensitive mice [177], the hepatic gene expression patterns of control and CSS male
mice were analyzed to better understand the molecular mechanisms underlying the
epistatic regulation of plasma glucose. The RNA-Seq data were filtered for genes
expressed in the liver, leaving 13,289 genes that were tested for differential expression
associated with both main and interaction effects. A total of 6,101 main effect expression
QTLs (meQTLs) were identified (FDR < 0.05) (Fig. 4.5). The full list of meQTLs is
available at https://doi.org/10.1371/journal.pgen.1007025.s014. Those meQTL genes
located on the substituted chromosome were classified as cis-meQTLs (Fig. 4.5, red)
whereas the meQTL genes not located on the substituted chromosome were classified as
trans-meQTLs (Fig. 4.5, blue). Among all possible genes regulated by a cis-meQTL, on average 11.48% of these genes in each strain had a cis-meQTL (range: 5.54% - 22.09%)
(Table 4.3). Similarly, among all possible genes regulated by a trans-meQTL, on average
5.42% (range: 0.08% to 19.26%) of these genes were regulated by a trans-meQTL (Table
4.3). The percentages of cis- and trans-meQTLs in each strain demonstrated a strong positive correlation (Spearman’s r = 1.0), but the proportion of cis-meQTLs was always greater than the proportion of trans-meQTLs. Strain (B6 x B6.A8)F1 had the highest percentage of genes with both cis-meQTLs (22.09%) and trans-meQTLs (19.26%), whereas strain (B6 x B6.A5)F1 had the lowest percentage of genes with both cis-meQTLs (5.54%) and trans-meQTLs (0.08%). This suggests that trans-meQTLs are driven by the cumulative action of many cis-effects rather than by a single or small number of major transcriptional regulators (Fig. 4.6). Among the genes regulated by one or more meQTLs, 41.98%
(1615 out of 3847) were regulated by multiple meQTLs (Range: 2-6) (Table 4.4). The
full list of genes with multiple meQTLs is available at
https://doi.org/10.1371/journal.pgen.1007025.s017. For example, Brca2 is regulated by 5
trans-meQTLs mapped to chromosomes 4, 6, 8, 10 and 14 (Fig. 4.7), demonstrating that
hepatic Brca2 expression is regulated by allelic variation throughout the genome. In addition to the well-known role of Brca2 in breast cancer susceptibility, Brca2 has been
implicated in hepatocellular carcinoma risk [178–180].
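The FDR threshold applied to the meQTL tests can be illustrated with a short sketch of the Benjamini-Hochberg adjustment. This is an assumption for illustration only: this section does not state which FDR procedure was used, and the p-values below are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-gene p-values (not study data).
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
n_significant = sum(q < 0.05 for q in adj)
print(adj, n_significant)
```

In this toy example only the smallest p-value remains below 0.05 after adjustment, even though three of the four raw p-values are below 0.05.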
Fig. 4.5. Identification of meQTLs that regulate hepatic gene expression.
A circos plot of meQTL locations in the genome where each layer of the circle represents the comparison between a CSS strain and control B6 mice. From the inner circle, the CSS strains are (B6 x B6.A5)F1, (B6.A17 x B6)F1, (B6.A3 x B6)F1, (B6.A6 x B6)F1, (B6 x B6.A10)F1, (B6 x B6.A4)F1, (B6.A14 x B6)F1 and (B6 x B6.A8)F1. Cis-meQTLs and trans-meQTLs are marked with red and blue, respectively. The width of each chromosome is proportional to its physical size. The height of each meQTL bar is proportional to the number of meQTLs in that genomic interval.
Fig. 4.6. Positive correlation between cis-meQTLs and trans-meQTLs.
(A) Scatter plot of the relationship between the percentage of cis-meQTLs and trans-meQTLs in each of 8 CSS strains with one substituted chromosome. The strains are labelled on the graph with only their substituted chromosome, for example strain (B6 x
B6.A8)F1 is shown for simplicity as A8. Data are shown on a log scale. (B) Histogram
illustrating the percentage of cis-meQTLs and trans-meQTLs in each of 8 CSS strains with one substituted chromosome.
Table 4.3. Main effects on gene expression.

| Strain | Percentage of genes with cis-meQTL | Number of cis-meQTLs | Number of genes on substituted chromosome | Percentage of genes with trans-meQTL | Number of trans-meQTLs | Number of genes on other chromosomes |
|---|---|---|---|---|---|---|
| (B6 x B6.A8)F1 | 22.09% | 154 | 697 | 19.26% | 2425 | 12592 |
| (B6.A14 x B6)F1 | 19.27% | 84 | 436 | 10.05% | 1292 | 12853 |
| (B6 x B6.A4)F1 | 17.20% | 139 | 808 | 8.47% | 1057 | 12481 |
| (B6 x B6.A10)F1 | 9.46% | 58 | 613 | 3.54% | 449 | 12676 |
| (B6.A6 x B6)F1 | 9.41% | 67 | 712 | 1.01% | 127 | 12577 |
| (B6.A3 x B6)F1 | 6.92% | 51 | 737 | 0.69% | 86 | 12552 |
| (B6.A17 x B6)F1 | 5.85% | 37 | 632 | 0.09% | 11 | 12657 |
| (B6 x B6.A5)F1 | 5.54% | 54 | 974 | 0.08% | 10 | 12315 |
| All | 11.48% | 644 | 5609 | 5.42% | 5457 | 100703 |
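The rank correlation between cis- and trans-meQTL percentages can be recomputed from the counts in Table 4.3. The sketch below uses the tie-free form of Spearman's coefficient, which is applicable here because no two strains share a percentage; the helper names are illustrative.

```python
def ranks(values):
    """Ranks (1 = smallest) for a list without ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_no_ties(x, y):
    """Spearman's rank correlation, valid only when neither list has ties."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

# Counts from Table 4.3, in the row order A8, A14, A4, A10, A6, A3, A17, A5.
cis_counts = [154, 84, 139, 58, 67, 51, 37, 54]
cis_genes = [697, 436, 808, 613, 712, 737, 632, 974]
trans_counts = [2425, 1292, 1057, 449, 127, 86, 11, 10]
trans_genes = [12592, 12853, 12481, 12676, 12577, 12552, 12657, 12315]

cis_pct = [100.0 * c / g for c, g in zip(cis_counts, cis_genes)]
trans_pct = [100.0 * c / g for c, g in zip(trans_counts, trans_genes)]

rho = spearman_no_ties(cis_pct, trans_pct)
cis_always_greater = all(c > t for c, t in zip(cis_pct, trans_pct))
print(rho, cis_always_greater)
```

Because the strains fall in the same order for both percentages, the rank differences are all zero and the coefficient is exactly 1.0.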
Table 4.4. Summary of genes with multiple meQTLs.

| Number of meQTLs | Number of genes | Percentage of genes |
|---|---|---|
| 0 | 9442 | 71.05% |
| 1 | 2232 | 16.80% |
| 2 | 1138 | 8.56% |
| 3 | 346 | 2.60% |
| 4 | 102 | 0.77% |
| 5 | 27 | 0.20% |
| 6 | 2 | 0.02% |
Fig. 4.7. Identification of 5 trans-meQTLs that regulate the hepatic expression of
Brca2.
Gene expression levels of Brca2 in the liver are shown for strain B6 and 8 single CSS strains. Each dot represents Brca2 expression in an individual mouse. The mean value for each strain is indicated by a solid line. The Brca2 gene is located on mouse chromosome
5. ** indicates p<0.01 relative to strain B6. *** indicates p<0.001 relative to strain B6.
In addition to the meQTLs regulated by substitution of a single chromosome, the analysis of double CSSs enabled the detection of eQTLs with additive and interaction effects between the substituted chromosomes. The expression of Zkscan3 represents an example of additivity, with the substitution of A/J-derived chromosomes 8 and 17 each individually increasing the expression of Zkscan3 relative to control B6 mice (Fig. 4.8).
In the double CSS strain (B6.A17 x B6.A8)F1, the effects of each individual chromosome substitution are combined in an additive manner to result in yet higher expression than either of the single CSSs (Fig. 4.8A). The additive effects of the Zkscan3 meQTLs detected by RNA-Seq were confirmed by quantitative reverse transcription PCR
(Fig. 4.8B), as were 4/5 additional meQTLs demonstrating additivity (Table 4.5).
Fig. 4.8. Regulation of hepatic Zkscan3 expression by additive meQTLs.
Gene expression of Zkscan3 in the liver was analyzed by (A) RNA-Seq and (B) RT-qPCR. Each dot represents Zkscan3 expression levels in an individual mouse. RT-qPCR data shown are relative to the control gene Rplp0. The mean value for each strain is indicated by a black line. The expected expression level of Zkscan3 based on a model of additivity is indicated with a red line. The p value from a test for interactions is shown. A p > 0.05 is suggestive of regulation by additivity rather than interaction.
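Table 4.5. Genes examined by RNA-Seq and RT-qPCR for epistasis and additive interactions.

Fold changes are shown relative to B6. "CSS1 & CSS2/B6" denotes the double CSS relative to B6.

| Type | Gene name | Cross | RNA-seq CSS1/B6 | RNA-seq CSS2/B6 | RNA-seq CSS1 & CSS2/B6 | RNA-seq P-value for interaction | qPCR CSS1/B6 | qPCR CSS2/B6 | qPCR CSS1 & CSS2/B6 | qPCR P-value for interaction |
|---|---|---|---|---|---|---|---|---|---|---|
| Antagonistic | Agxt | A6:A8 | 0.87 | 0.55 | 1.01 | 5.00469E-10 | 1.20 | 0.65 | 1.24 | 0.04112 |
| Antagonistic | Pcx | A14:A8 | 0.68 | 0.59 | 0.90 | 3.18347E-11 | 0.69 | 0.53 | 0.99 | 0.000363 |
| Antagonistic | Slc6a12 | A14:A8 | 0.72 | 0.65 | 1.00 | 4.15536E-13 | 0.83 | 0.64 | 1.04 | 0.00532 |
| Antagonistic | Serpinf2 | A6:A8 | 0.87 | 0.66 | 0.92 | 1.00661E-06 | 1.25 | 0.66 | 1.17 | 0.1911 |
| Antagonistic | Zbtb20 | A14:A4 | 1.11 | 1.50 | 1.05 | 8.96403E-06 | 1.13 | 1.15 | 1.44 | 0.568 |
| Antagonistic | Zbtb20 | A17:A4 | 1.22 | 1.50 | 1.06 | 7.06135E-07 | 1.11 | 1.21 | 1.46 | 0.623 |
| Antagonistic | Raph1 | A14:A4 | 1.09 | 1.39 | 0.98 | 5.02787E-07 | 0.85 | 0.72 | 1.35 | 0.0043 |
| Antagonistic | Raph1 | A17:A4 | 1.09 | 1.39 | 1.01 | 6.41372E-06 | 0.76 | 0.72 | 1.05 | 0.0661 |
| Antagonistic | Dnajb9 | A14:A4 | 0.97 | 0.82 | 1.60 | 2.58697E-09 | 1.01 | 0.77 | 1.94 | 1.22E-05 |
| Antagonistic | Cers6 | A14:A4 | 0.72 | 0.79 | 1.52 | 3.60415E-08 | 0.74 | 0.62 | 1.72 | 1.24E-05 |
| Antagonistic | Ldha | A6:A8 | 0.92 | 0.69 | 1.22 | 3.13268E-08 | 1.26 | 0.77 | 1.37 | 0.1441 |
| Antagonistic | Sec23b | A14:A4 | 0.88 | 0.86 | 1.22 | 5.12386E-08 | 0.97 | 0.79 | 1.83 | 0.000144 |
| Antagonistic | Eif2ak3 | A14:A4 | 0.87 | 0.89 | 1.33 | 1.3656E-07 | 0.87 | 0.80 | 1.96 | 2.57E-06 |
| Synergistic | Cyp3a16 | A14:A4 | 1.50 | 1.54 | 16.17 | 0.001333533 | 1.47 | 1.55 | 5.87 | 6.44E-05 |
| Synergistic | Syvn1 | A14:A4 | 1.05 | 1.01 | 1.58 | 0.001266066 | 0.80 | 0.58 | 1.84 | 5.58E-05 |
| Synergistic | Gstm1 | A14:A8 | 0.97 | 0.99 | 1.65 | 0.000669705 | 0.99 | 1.01 | 1.78 | 0.00506 |
| Synergistic | Usp18 | A6:A8 | 1.08 | 0.86 | 1.68 | 0.00027991 | 1.43 | 0.98 | 2.07 | 0.0481 |
| Synergistic | Pik3c2a | A14:A8 | 0.98 | 1.09 | 0.75 | 9.74301E-05 | 1.11 | 1.09 | 0.90 | 0.107 |
| Synergistic | Stat5a | A14:A8 | 0.96 | 1.01 | 1.36 | 0.006677835 | 1.18 | 0.96 | 1.49 | 0.105 |
| Synergistic | Tecr | A6:A8 | 0.95 | 0.94 | 1.28 | 2.88342E-05 | 1.37 | 1.00 | 1.66 | 0.2259 |
| Synergistic | Mcm10 | A14:A4 | 1.10 | 0.98 | 0.54 | 0.000183216 | 1.42 | 0.87 | 1.56 | 0.4847 |
| Synergistic | Nol8 | A14:A4 | 0.95 | 0.97 | 1.27 | 0.000767168 | 1.03 | 0.80 | 1.98 | 1.69E-05 |
| Synergistic | Rprl3 | A14:A4 | 686.25 | 561.52 | 742.32 | 1.52828E-51 | 1.02 | 0.78 | 1.97 | 1.49E-05 |
| Additive | Zkscan3 | A17:A8 | 1.47 | 1.66 | 2.12 | 0.111901094 | 1.36 | 1.50 | 2.24 | 0.18559 |
| Additive | Asns | A3:A5 | 0.41 | 0.36 | 0.14 | 0.912510043 | 0.46 | 0.41 | 0.16 | 0.37253 |
| Additive | Asns | A14:A5 | 0.31 | 0.36 | 0.13 | 0.767984888 | 0.48 | 0.40 | 0.19 | 0.37105 |
| Additive | Slc12a2 | A14:A4 | 1.25 | 1.25 | 1.36 | 0.185386891 | 1.17 | 0.99 | 2.08 | 0.00217 |
| Additive | Igfbp3 | A14:A4 | 0.74 | 0.76 | 0.63 | 0.24590094 | 1.02 | 0.70 | 0.90 | 0.33164 |
| Additive | Ldha | A14:A4 | 0.70 | 0.80 | 0.65 | 0.143861309 | 0.84 | 0.72 | 0.93 | 0.165 |
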
In addition to examples of additivity, interaction expression QTLs (ieQTLs) were
identified that were jointly regulated by genetic variation on two substituted
chromosomes. The ieQTLs, similar to the meQTLs, were divided into cis-ieQTLs and
trans-ieQTLs, with cis-ieQTLs defined by differentially expressed genes located on either
one of the two substituted chromosomes and trans-ieQTLs representing differentially expressed genes that are not located on either substituted chromosome. A total of 4,283 ieQTLs were identified. The full list of ieQTLs is available at https://doi.org/10.1371/journal.pgen.1007025.s019. Among all possible genes regulated by a cis-ieQTL or trans-ieQTL, 2.01% and 2.16% of genes were regulated by a cis- or
trans-ieQTL respectively (Table 4.6). The combination of A/J-derived chromosomes 8
and 14 yielded the most ieQTLs (n=2,305) including cis-ieQTLs regulating the
expression of 17.56% of all genes on chromosomes 8 or 14 and trans-ieQTLs regulating
the expression of 17.32% of all genes throughout the remainder of the genome. Overall,
the ieQTLs demonstrated a similar positive correlation as the meQTLs (Spearman’s r =
0.92) (Fig. 4.9), although there was no enrichment for cis-ieQTLs. Among the genes
regulated by an ieQTL(s), 32.35% (945 out of 2921) were regulated by multiple ieQTLs
(Range: 2-7) (Table 4.7). The full list of genes with multiple ieQTLs is available at
https://doi.org/10.1371/journal.pgen.1007025.s020. For example, expression of Agt,
which codes for angiotensinogen and is involved in blood pressure regulation, is decreased
in strain (B6.A8 x B6)F1 relative to control B6 mice; however, interactions between
alleles on chromosome 8 and chromosomes 6, 3, 17, and 14 all result in expression levels
of Agt that did not differ from the control strain (Fig. 4.10).
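The cis/trans classification used for both meQTLs and ieQTLs reduces to a membership test on the chromosome carrying the gene. The following minimal sketch uses a hypothetical helper name; the gene locations (Brca2 on chromosome 5, Agt on chromosome 8) are those given in the legends of Figs. 4.7 and 4.10.

```python
def classify_eqtl(gene_chromosome, substituted_chromosomes):
    """Return 'cis' if the gene lies on a substituted chromosome, else 'trans'."""
    return "cis" if gene_chromosome in set(substituted_chromosomes) else "trans"

# Brca2 (chromosome 5): meQTLs mapped to chromosomes 4, 6, 8, 10 and 14.
brca2 = [classify_eqtl(5, [chrom]) for chrom in (4, 6, 8, 10, 14)]

# Agt (chromosome 8): an ieQTL from the cross of chromosomes 6 and 8.
agt = classify_eqtl(8, [6, 8])

print(brca2, agt)
```

For an ieQTL, the gene counts as cis if it lies on either of the two substituted chromosomes, which is why the same helper accepts a list of one or two chromosomes.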
Table 4.6. Interaction effects on gene expression.

| Cross | Genes with cis-ieQTL (%) | Number of cis-ieQTLs | Number of genes on substituted chromosomes | Genes with trans-ieQTL (%) | Number of trans-ieQTLs | Number of genes on other chromosomes | Genes with synergistic ieQTLs (%) | Number of synergistic ieQTLs | Percentage of genes with antagonistic ieQTLs | Number of antagonistic ieQTLs |
|---|---|---|---|---|---|---|---|---|---|---|
| A14:A8 | 17.56% | 199 | 1133 | 17.32% | 2106 | 12156 | 6% | 129 | 94% | 2176 |
| A6:A8 | 6.81% | 96 | 1409 | 5.89% | 700 | 11880 | 2% | 15 | 98% | 781 |
| A14:A4 | 3.86% | 48 | 1244 | 3.57% | 430 | 12045 | 6% | 31 | 94% | 447 |
| A6:A10 | 1.81% | 24 | 1325 | 0.58% | 69 | 11964 | 1% | 1 | 99% | 92 |
| A14:A10 | 1.62% | 17 | 1049 | 2.05% | 251 | 12240 | 1% | 3 | 99% | 265 |
| A6:A4 | 0.86% | 13 | 1520 | 0.58% | 68 | 11769 | 0% | 0 | 100% | 81 |
| A3:A8 | 0.77% | 11 | 1434 | 0.77% | 91 | 11855 | 2% | 2 | 98% | 100 |
| A3:A10 | 0.44% | 6 | 1350 | 0.56% | 67 | 11939 | 0% | 0 | 100% | 73 |
| A17:A8 | 0.23% | 3 | 1329 | 0.00% | 0 | 11960 | 0% | 0 | 100% | 3 |
| A17:A4 | 0.14% | 2 | 1440 | 0.48% | 57 | 11849 | 2% | 1 | 98% | 58 |
| A17:A10 | 0.08% | 1 | 1245 | 0.03% | 4 | 12044 | 0% | 0 | 100% | 5 |
| A14:A5 | 0.00% | 0 | 1410 | 0.17% | 20 | 11879 | 0% | 0 | 0% | 0 |
| A17:A5 | 0.00% | 0 | 1606 | 0.00% | 0 | 11683 | 0% | 0 | 0% | 0 |
| A3:A5 | 0.00% | 0 | 1711 | 0.00% | 0 | 11578 | 0% | 0 | 0% | 0 |
| A6:A5 | 0.00% | 0 | 1686 | 0.00% | 0 | 11603 | 0% | 0 | 100% | 20 |
| All | 2.01% | 420 | 20891 | 2.16% | 3863 | 178444 | 4% | 182 | 96% | 4101 |

Fig. 4.9. Positive correlation between cis-ieQTLs and trans-ieQTLs.
(A) Scatter plot of the relationship between the percentage of cis-ieQTLs and trans-ieQTLs identified among 15 pairwise CSS crosses. The data points are labelled on the graph with the two substituted chromosomes for each pairwise cross. Data are shown on a log scale. (B) Histogram illustrating the percentage of cis-ieQTLs and trans-ieQTLs in each of 15 pairwise CSS crosses.
Table 4.7. Summary of genes with multiple ieQTLs.

| Number of ieQTLs | Number of genes | Percentage of genes |
|---|---|---|
| 0 | 10368 | 78.02% |
| 1 | 1976 | 14.87% |
| 2 | 662 | 4.98% |
| 3 | 185 | 1.39% |
| 4 | 71 | 0.53% |
| 5 | 19 | 0.14% |
| 6 | 7 | 0.05% |
| 7 | 1 | 0.01% |
Fig. 4.10. Identification of 4 ieQTLs that regulate the hepatic expression of Agt.
Gene expression levels of Agt in the liver are shown for strain B6, 5 single CSS strains, and 4 double CSS strains. Each dot represents Agt expression in an individual mouse.
The mean value for each strain is indicated by a solid line. The expected expression level of Agt in the double CSS strains based on a model of additivity is indicated with a red line. The Agt gene is located on mouse chromosome 8.
Context-dependent effects on gene expression.
We next tested whether the interaction effects on gene expression were synergistic
(positive epistasis) or antagonistic (negative epistasis) (Fig. 4.11). Synergistic refers to an increased difference in gene expression levels between the double CSS and the control
B6 strain beyond that expected based on an additive model, whereas antagonistic refers to a decreased difference. The regulation of Agxt was an example of an antagonistic interaction, with main effects from substituted chromosomes 6 and 8 each individually decreasing Agxt expression, whereas this effect was lost in the double chromosome substitution strain (Fig. 4.12A). In contrast, the regulation of Cyp3a16 represented an example of a synergistic interaction, with the detection of an ieQTL in the absence of any meQTL (Fig. 4.12B). Among the ieQTLs, antagonistic interactions accounted for 96%
(n=4101) while synergistic interactions accounted for 4% (n=182) (Table 4.6).
Remarkably, for 80% of the antagonistic interactions (3285/4101), gene expression in one or both of the single CSSs differed from the control B6 strain (a meQTL), whereas expression in the double CSS reverted to control levels (p > 0.1 relative to strain B6). To again validate the RNA-Seq data using an independent method, RT-qPCR was performed for a subset of genes with antagonistic (n=13) and synergistic (n=10) interactions.
Replication by RT-qPCR confirmed the detection of epistasis in 61% (p <0.05) of the genes tested (Antagonistic: 8/13; Synergistic: 6/10) (Table 4.5).
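The definitions of synergistic and antagonistic epistasis given above can be expressed as a simple rule on fold changes relative to B6. The sketch below is a simplification under stated assumptions: additivity is evaluated on the log fold-change scale (the scale is not specified in this section), strain-level fold changes stand in for the full statistical model, and a nominal threshold of 0.05 on the interaction p-value is used for illustration. It is not the classification procedure of the study and may not reproduce every label in Table 4.5. The input values are the RNA-Seq entries for Agxt, Cyp3a16 and Zkscan3 from Table 4.5.

```python
import math

def classify_interaction(fc_a, fc_b, fc_double, p_interaction, alpha=0.05):
    """Classify a double-CSS effect from fold changes relative to B6.

    Assumes additivity on the log scale, so the expected double-CSS
    log fold change is the sum of the two single-CSS log fold changes.
    """
    if p_interaction >= alpha:
        return "additive"
    expected_log = math.log2(fc_a) + math.log2(fc_b)
    observed_log = math.log2(fc_double)
    # Synergistic: further from B6 than expected; antagonistic: closer to B6.
    if abs(observed_log) > abs(expected_log):
        return "synergistic"
    return "antagonistic"

agxt = classify_interaction(0.87, 0.55, 1.01, 5.00469e-10)
cyp3a16 = classify_interaction(1.50, 1.54, 16.17, 0.001333533)
zkscan3 = classify_interaction(1.47, 1.66, 2.12, 0.111901094)
print(agxt, cyp3a16, zkscan3)
```

For Agxt the double CSS returns to roughly B6 levels although both single substitutions lower expression, whereas for Cyp3a16 the double CSS departs from B6 far more than the two modest single-CSS changes would predict.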
Fig. 4.11. Schematic diagram illustrating the categorization of epistasis as either synergistic or antagonistic.
Hypothetical mean expression levels are shown with black lines for the strains B6 and the two single CSS strains (CSSa x B6)F1 and (B6 x CSSb)F1, where a and b represent any two different substituted chromosomes. The predicted expression levels based on a model of additivity in the double CSS strain (CSSa x CSSb)F1 are shown with a red line.
Synergistic epistasis is represented by a difference in trait values between the double CSS and control B6 strain that is greater than that predicted by additivity. Antagonistic epistasis is represented by a difference in trait values between the double CSS and control
B6 strain that is less than that predicted by additivity. (A) Illustrates the case where only one single CSS strain shows expression differences relative to the control. (B) Illustrates the case where both single CSS strains show expression differences relative to the control.
(C) Illustrates the case where both single CSS strains show expression differences relative to the control, but in opposite directions. (D) Illustrates the case where neither single CSS strain shows expression differences relative to the control.
Fig. 4.12. Examples of synergistic and antagonistic ieQTLs. Each dot represents the gene expression data from one mouse. The horizontal bar indicates the mean value for each strain. (A) An antagonistic ieQTL regulates the expression of Agxt in the liver. (B) A synergistic ieQTL regulates the expression of Cyp3a16 in the liver. The red horizontal line indicates the predicted trait level based on a model of additivity.
Significant contribution of epistasis to trait heritability.
Given that the ieQTLs regulated approximately 2% of all genes expressed in the liver
(Table 4.6), we sought to quantify the contribution of genetic interactions to the heritable
component of all genes. First, an empirical Bayes quasi-likelihood F-test identified 6,684 genes out of the 12,325 genes expressed in the liver for which there was evidence of genetic control within the population of CSSs (FDR<0.05). The average proportion of heritable variation attributable to interactions across these genes was 0.56 (1st quartile: 0.43, 3rd quartile: 0.68) (Fig. 4.13A). When the same analysis was restricted to only
genes with a statistically significant (FDR<0.05) contribution of interactions to gene
expression levels (n=3,236 genes), the proportion of heritable variation attributable to
interactions increased to 0.66 (1st quartile: 0.56, 3rd quartile: 0.74) (Fig. 4.13B). For
comparison, a simulation study was conducted using artificial data to model pure
additivity in the absence of interactions, yielding an estimated interaction proportion of 0.13
(1st quartile: 0.05, 3rd quartile: 0.19) (Fig. 4.13C), which provides an estimate of the
background noise in this measurement. Thus, genetic interactions are a major contributor
to the regulation of gene expression.
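The idea of partitioning heritable variation into additive and interaction components can be sketched for a single two-chromosome cross. This is a noise-free simplification based on strain means only; the estimator used in the study operates on the full set of strains and is not specified in this section. The function name and the trait values are hypothetical.

```python
def interaction_proportion(b6, css_a, css_b, double):
    """Share of between-strain variation due to the a-by-b interaction.

    Uses the four strain means of a balanced 2 x 2 design. The additive
    model leaves residuals of +/- contrast/4 in each cell, so the
    interaction sum of squares is contrast**2 / 4.
    """
    means = [b6, css_a, css_b, double]
    grand = sum(means) / 4.0
    ss_genetic = sum((m - grand) ** 2 for m in means)
    if ss_genetic == 0.0:
        return 0.0
    contrast = double - css_a - css_b + b6
    ss_interaction = contrast ** 2 / 4.0
    return ss_interaction / ss_genetic

# Hypothetical strain means (not study data).
epistatic = interaction_proportion(100.0, 120.0, 100.0, 100.0)
purely_additive = interaction_proportion(100.0, 120.0, 110.0, 130.0)
print(round(epistatic, 4), purely_additive)
```

With real, noisy measurements the interaction contrast is rarely exactly zero even under pure additivity, which is one reason a simulated additive baseline is a useful reference for this kind of estimate.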
Fig. 4.13. Contribution of epistasis to the genetic regulation of hepatic gene expression. Diagrams representing the estimated proportion of genetic variation due to interactions for (A) all genes expressed in the mouse liver whose expression was under genetic control in the CSS strains studied, (B) the same data segregated based on the statistical evidence supporting an effect of interaction on gene expression, and (C) a comparison of the genes with the most significant evidence for regulation by genetic interactions (FDR < 0.05) and a simulation study with artificial data that models the absence of any genetic interactions.
Discussion
CSSs, which have a simplified and fixed genetic background, were used to identify
widespread and likely concurrent epistatic interactions. This systematic analysis of
mammalian double CSSs demonstrated that epistatic interactions controlled the majority
of the heritable variation in both fasting plasma glucose levels and hepatic gene
expression (Fig. 4.13). Among genes expressed in the liver, the expression levels of 24%
were regulated, at least in part, by epistasis (Fig. 4.13). This number is remarkable
considering that only dominant or semi-dominant effects were tested, only a single tissue and time point were examined, allelic variation from only two inbred strains of mice was included, and only 15 randomly selected pairwise strain combinations of A/J-derived
CSSs were tested out of a possible 462 combinations of double CSSs. The prevalence of epistatic interactions provides a potential molecular mechanism underlying the highly dependent nature of complex traits on genetic background [172,173,181,182]. Interpreting
the effect of individual allelic variants will thus be severely limited by population-style
analyses that fail to account for possible contextual effects. Nonetheless, progress is
being made in this field, including in diseases such as multiple sclerosis (MS), which is a
complex genetic disease whose risk is highly associated with family history [183]. For
example, MS risk alleles in DDX39B (rs2523506) and IL7R (rs6897932) together
increase MS risk considerably more than either variant independently [37].
Based on the considerable number of interactions detected in the CSS crosses, context-
dependent interactions such as that between DDX39B and IL7R in MS are likely
widespread and may therefore represent a significant source of missing heritability for
complex traits and diseases [23,30].
Although epistasis was a dominant factor regulating fasting glucose levels, the same effect was not detected in the regulation of body weight. It is not clear if this is due to different genetic architectures between these two traits or whether this was due to the
limited genetic variation between the B6 and A/J strains. The body weight studies were
conducted in mice fed a standard rodent chow, whereas differences in body weight
between strains B6 and A/J are significantly more pronounced when challenged with a
high-fat diet [184,185]. Alternatively, a recent meta-analysis of trait heritability in twin
studies identified significant variation in the role of additive and non-additive variation
among different traits, with suggestive evidence for non-additive effects in 31% of traits
[186]. Among the traits analyzed, the genetic regulation of neurological, cardiovascular, and ophthalmological traits was among the most consistent with solely additive effects,
whereas traits related to reproduction and dermatology were more often consistent with
non-additive interactions. Among the 464 metabolic traits studied, 40% were
consistent with a contribution of non-additive interactions [186]. It is
interesting to speculate whether some traits that may have a more direct effect on fitness
(e.g. reproduction) are more likely to involve multiple non-additive effectors in order to
maintain a narrow phenotypic or developmental range [187].
Although many inter-chromosomal non-additive interactions were identified in mice, it
remains unclear whether these interactions are attributable to bigenic gene-gene
interactions or to higher-order epistasis involving multiple loci located on a substituted
chromosome. Studies in yeast that dissected the genetic architecture of epistasis
demonstrated that gene-gene interactions played a minor role among the heritable effects
attributable to epistasis, thus primarily implicating higher order interactions [160]. Yet,
other studies in yeast that methodically tested pairs of gene knockouts for interactions
identified a number of gene-gene interactions [188]. Additional evidence has been reported for both higher-order epistasis involving three, four, or even more mutations [189] and bigenic gene-gene interactions [190], and it seems likely that both will underlie
interactions detected in the CSS studies. This is because the use of CSSs to study the
allelic variation found on an entire chromosome in tandem equally enables the detection
of bigenic and higher-order interactions. This property of CSSs may contribute to the
robust detection of epistasis using the CSS experimental platform relative to genetic
mapping studies in populations with many independently segregating variants, which are
often underpowered to identify higher-order interactions [191]. However, to formally test
this and determine the relative contribution of each, higher resolution genetic mapping of
the epistatic interactions will be necessary to better understand their molecular nature
[192]. Higher resolution mapping studies should eventually shed light on whether the chromosome-level properties discovered in this study are consistent with those for SNP- level interactions. Based on previous studies of complex trait QTLs in single-CSS studies, chromosome-level QTLs demonstrated a similar genetic architecture as that found in higher resolution QTLs including large effect sizes, similar direction of effects, and suggestive evidence of widespread epistasis [174,193]. Thus, it seems likely that
discoveries made based on chromosome-level analysis of epistasis will apply equally to studies involving individual genetic variants. For example, higher resolution mapping studies of chromosome-level QTLs in CSSs identified genetic variants in Cntnap2 that were associated with opposing effects on body weight depending on epistatic interactions with intra-chromosomal variation in the genetic background [194].
Perhaps the most significant outcome of the epistasis detected was the high degree of
constancy in the light of context dependence, such that the interactions usually returned
trait values to the levels detected in control mice. Remarkably, this is just as Waddington
predicted 75 years ago, a phenomenon he referred to as canalization [195] that has also been observed in previous studies [196–200]. Canalization refers to the tendency of an organism to proceed towards one developmental outcome despite variation in the process along the way. This variation can be influenced by, among other things, the numerous functional genetic variants present in a typical human genome, which may contain thousands of variants that alter gene function [201]. We find that the overwhelming
majority of genetic interactions return trait values to levels seen in control strains, which
would act to reduce phenotypic variation among developmental outcomes. Studies of
epistasis in tomato plants detected by analyzing short chromosomal regions on different
genetic backgrounds identified a similar bias towards antagonistic epistasis relative to
synergistic epistasis [199]. A bias towards antagonistic interactions was also detected in
large-scale gene-gene interaction studies in yeast, although with a lower frequency of
antagonistic relative to synergistic interactions [198,202]. Thus, our results are concordant
with other studies showing that the majority of epistatic interactions are antagonistic, and together
suggest that when larger tracts of DNA are assessed for interactions the effects are even
more likely to be antagonistic. This robustness in the face of considerable genetic
variation is central to the underlying properties of canalization. These genetic interactions
therefore represent a mechanism for storing genetic variation within a population, without
reducing individual fitness. This stored genetic variation could then enable populations to
more quickly adapt to environmental changes [203].
Finally, the consistently greater effect sizes of main effects relative to average effects
suggest that GWAS-type studies, in both humans and model organisms, consistently
underestimate true effect sizes in at least a subset of individuals. For example, a large F2
intercross between inbred mice carrying a mutation that results in a nonfunctional allele
of the growth hormone releasing hormone receptor (Ghrhr) on either a B6 or C3H
genetic background identified widespread antagonistic epistasis, albeit with small
contributions to overall trait heritability relative to additive effects [196]. Similarly,
epistatic interactions were identified in the Diversity Outbred mice resulting in small
contributions to the overall heritability of metabolic-related traits [204]. These studies
contrast with the large contribution of epistasis to trait heritability identified using the CSS paradigm (Fig. 4.13), mirroring the contrasting portraits of genetic architecture identified based on differing genetic structures of these experimental populations [174]. The CSS
paradigm examines context-dependent effects on individual genotypes and typically
identifies QTLs with large effect sizes. Alternatively, GWAS-type studies average effects
across a population of heterogeneous genotypes and typically identify QTLs with small
phenotypic effects. However, perhaps most relevant is that the relatively simpler
genotypes of CSSs enable greater depth in analyzing fewer unique genotypes, potentially
capturing what would be rare genotypic combinations in a segregating cross or human
population. Therefore, the key to enabling precision medicine, which like the CSS studies
is focused on the effect of a variant on one specific genetic background, is to identify in
which subset of individuals a particular variant has a significant effect. The consideration
of epistasis in treatment, although in its infancy, remains a promising avenue for improving clinical treatment regimens, including predicting drug response in tumors [205] and guiding the treatment of antibiotic drug resistance [206]. However, true precision medicine will necessitate a more comprehensive understanding of how genetic background, across many loci, affects single variant substitutions.
Materials and Methods
Mice. Chromosome substitution strains (CSS) and control strains were purchased from
The Jackson Laboratory. These strains include C57BL/6J-Chr3A/J/NaJ mice (Stock
#004381) (B6.A3), C57BL/6J-Chr4A/J/NaJ mice (Stock #004382) (B6.A4), C57BL/6J-
Chr5A/J/NaJ mice (Stock #004383) (B6.A5), C57BL/6J-Chr6A/J/NaJ mice (Stock #004384)
(B6.A6), C57BL/6J-Chr8A/J/NaJ mice (Stock #004386) (B6.A8), C57BL/6J-Chr10A/J/NaJ mice (Stock #004388) (B6.A10), C57BL/6J-Chr14A/J/NaJ mice (Stock #004392)
(B6.A14), C57BL/6J-Chr17A/J/NaJ mice (Stock #004395) (B6.A17) and C57BL/6J
(Stock #000664). Mice were maintained by brother-sister matings. All mice used for
experiments were obtained from breeder colonies at Case Western Reserve University.
Mice were housed in ventilated racks with access to food and water ad libitum and
maintained at 21°C on a 12-hour light/12-hour dark cycle. All mice were cared for as described under the Guide for the Care and Use of Laboratory Animals, eighth edition (2011) and all experiments were approved by IACUC and carried out in an AAALAC-approved facility.
The IACUC protocol numbers were 2013-0098 and 2016-0064. Male mice from strains
B6, B6.A4, B6.A5, B6.A8 and B6.A10 were bred with female mice from strains
B6, B6.A3, B6.A6, B6.A14 and B6.A17. The offspring were weaned at 3 weeks of age. The number of offspring analyzed from each cross is shown in Table 4.1 for both body weight and plasma glucose, although glucose levels were not measured in one mouse each from the following strains: (B6 x B6.A10)F1, (B6.A14 x B6)F1, (B6.A17 x
B6.A10)F1, (B6.A3 x B6.A10)F1, (B6.A6 x B6.A4)F1, (B6.A14 x B6.A5)F1 and (B6.A6 x B6.A5)F1. The mice analyzed from each cross were derived from at least three
independent breeding cages. No blinding to the genotypes was undertaken.
Mouse phenotyping. At 5 weeks of age, mice were fasted 16 hours overnight and body
weight was measured. Mice were anesthetized with isoflurane and fasting blood glucose
levels were measured via retro-orbital bleeds using a OneTouch Ultra2 meter (LifeScan,
Milpitas, CA, USA). Mice were subsequently sacrificed by cervical dislocation and the
caudate lobe of the liver was collected and immediately placed in RNAlater (Thermo
Fisher Scientific, Waltham, MA, USA).
Trait analysis. To analyze the body weight and fasting plasma glucose data, linear regression was used with a main effects term and a term for each pairwise interaction for the males and females separately. In the glucose data, 5 observations were Winsorized by setting a ceiling of 4 median absolute deviations from the median. Any values larger than the ceiling (165 mg/dL) were set to the ceiling. Additionally, interactions where one of the crosses contained fewer than 5 mice were not analyzed, leading to the removal of the
(B6.A4 x B6.A3)F1 mice, the female (B6.A8 x B6.A14)F1 and the male (B6.A8 x
B6.A3)F1 mice. For each trait and for each sex, we estimated a linear model with the following predictors: (1) maternal substitution, (2) paternal substitution and (3) the interaction of maternal by paternal substitution. In these models, the reference strain was
B6. The sexes may potentially differ in residual variance and in the effect of the chromosome substitutions (i.e. gene by sex interaction). To handle these differences transparently, we estimated and reported models for each sex separately. Within each of the above models, two joint linear hypothesis tests were performed of the following
hypotheses: (a) there were no main effects (i.e. terms (1) and (2) in the model above were
all 0), and (b) there were no interaction effects (i.e. the terms in (3) in the above model were all 0).
These linear hypothesis tests were carried out using the “linearHypothesis” function in
the “car” package [207] and with the anova function in R. Fisher’s method was used to
combine these p-values from males and females [208]. Similar results were obtained
using a full 3-way interaction model including all interactions between sex, maternal
substitution and paternal substitution. In this approach, the test of the null hypothesis that
all main effects in males and females were 0 had a p-value of 3.168e-05 and 1.17e-05 for
weight and glucose respectively, while the overall test for interaction had a p-value of
0.44 and 0.00011 for weight and glucose respectively. Inverse-variance meta-analysis
was used to combine the coefficient estimates from the males and females. If $\hat{\beta}_m$ and $\hat{\beta}_f$
are estimated genetic effects for males and females respectively then the IVW estimator
is $\hat{\beta}_{IVW} = w\hat{\beta}_f + (1 - w)\hat{\beta}_m$ where $w = \frac{1/\mathrm{var}(\hat{\beta}_f)}{1/\mathrm{var}(\hat{\beta}_f) + 1/\mathrm{var}(\hat{\beta}_m)}$. Thus, while the genetic
effects may potentially differ between males and females, the combined results represent
a weighted average of the effect in males and in females. To account for potential non-normality, heteroscedasticity and multiple testing, we created 10,000 bootstrap data sets by sampling with replacement from each cross and sex combination. Studentized bootstraps (i.e. using pivotal statistics) were used to create confidence intervals for the coefficients and p-values. Multiple tests were adjusted for by comparing the observed test statistics to the maximum bootstrap test statistic as described elsewhere [209]. P-values were adjusted for multiple comparisons separately for each trait and separately for the main effects and interactions. As an alternative to the meta-analysis approach, we also fit
a linear model adjusting for sex as a covariate. Results of this analysis are reported in
Tables 4.8 and 4.9. The proportion of the genetic variance explained by interactions was estimated as (RFull – RAdditive)/ RFull where RAdditive and RFull are the adjusted coefficients of
determination for the model with only main effects and for the full interaction model
respectively. The adjusted coefficients of determination are an estimate of the proportion
of variation in the trait which is explained by the model. Note that RFull and RAdditive share the same denominator (i.e. the total trait variation). Thus, total trait variation cancels out of the quantity (RFull - RAdditive)/ RFull so that the quantity represents the amount of genetic variation that cannot be explained by main effects only. Using the adjusted version of the
coefficient of determination helps account for potential overfitting. Bootstrap confidence
intervals of this proportion were calculated.
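The three small estimators described in this paragraph (Winsorizing at a ceiling, Fisher's method, and the inverse-variance weighted average) can be sketched in a few lines. This is an illustrative sketch only, not the R code used for the analysis: the function names are hypothetical, the median absolute deviation is assumed to be unscaled (the text does not say whether a scaling constant was applied), and the closed-form chi-square tail is used only because Fisher's statistic always has an even number of degrees of freedom.

```python
import math
import statistics


def winsorize_upper(values, n_mads=4.0):
    """Cap values above median + n_mads * MAD at that ceiling.

    Assumption: MAD is the raw (unscaled) median absolute deviation.
    Returns the capped values and the ceiling that was applied.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    ceiling = med + n_mads * mad
    return [min(v, ceiling) for v in values], ceiling


def fisher_combine(p_values):
    """Fisher's method for combining independent p-values.

    X = -2 * sum(ln p) follows a chi-square distribution with 2k degrees
    of freedom under the joint null; for even degrees of freedom the upper
    tail has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    tail = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
    return x, tail


def ivw_combine(beta_f, var_f, beta_m, var_m):
    """Inverse-variance weighted average of female and male estimates.

    w is the weight on the female estimate, as in the formula in the text.
    """
    w = (1.0 / var_f) / (1.0 / var_f + 1.0 / var_m)
    return w * beta_f + (1.0 - w) * beta_m, w
```

For example, `fisher_combine([p_male, p_female])` would combine the sex-specific p-values for one hypothesis, and `ivw_combine` would combine the corresponding coefficient estimates; the estimate with the smaller variance receives the larger weight.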
Sample preparation for RNA-Seq. Liver tissue stored in RNAlater was homogenized using a Tissumizer Homogenizer (Tekmar, Cincinnati, OH, USA). Total RNA was isolated using the PureLink RNA purification kit (Thermo Fisher Scientific, Waltham,
MA, USA). A sequencing library was generated using the TruSeq Stranded Total RNA kit (Illumina, San Diego, CA, USA). RNA samples were sequenced on Illumina
HiSeq2500s with single-end 50 base pair reads [210]. Library preparation and RNA
sequencing were performed by the CWRU genomics core (Director, Dr. Alex Miron). A
total of 7,269,450,186 reads were generated across four flow cells, with an average of
47,204,222 ± 928,913 [range: 14,561,990 – 76,538,825] reads per sample. Sequencing
quality was assessed by FastQC [211], which identified an average per base quality score
of 35.46.
RNA-Seq data analysis. To maximize statistical power, 20 samples were selected for
analysis from the control B6 group, 8 samples were selected from the single CSS groups,
and 5 samples were selected from the double CSS groups. A total of 154 control and CSS
mice were analyzed, including 20 B6 mice, 63 mice that were heterozygous for one A/J-
derived chromosome, and 71 mice that were heterozygous for two different A/J-derived chromosomes. Only male mice were analyzed to avoid complications due to sex
differences in gene expression. The B6.A4 x B6.A3 and B6.A8 x B6.A3 crosses were
poor breeders and thus we did not obtain 5 samples to analyze from these crosses.
Reads were aligned using TopHat2 (2.0.10) [212] to the reference mm10 genome with the
GENCODE vM7 annotations as a guide. Because the reference genome is comprised of
sequence from strain B6, sequencing reads from a B6-derived chromosome are more
accurately mapped than reads from an A/J-derived chromosome [213]. To avoid potential
mapping biases, we created an “individualized genome” of the A/J mouse strain using the
program Seqnature [213] with variant calls from the Mouse Genomes Project that were
downloaded from The Sanger Institute [214]. Reads that were not mapped to the B6
genome were then mapped to the individualized A/J genome with TopHat2. HTSeq-count
[215] and the GENCODE vM7 gene annotations[46] were used to count the number of
reads for each gene feature. After filtering to remove duplicate reads, unmapped reads,
low quality reads, and reads mapped to non-GENCODE regions of the genome, an
average of 16,506,775 ± 439,754 [range: 4,638,701 – 30,465,477] reads were mapped to
GENCODE regions per sample. There was no significant difference in the mapping
efficiency (number of mapped reads / total number of reads) between the control B6
samples and any of the CSS strains either genome-wide (Fig. 4.14A) or on the substituted
chromosome (Fig. 4.14B). This suggests that the sequence differences on the A/J chromosomes did not reduce mapping efficiency in the CSSs.
Graphical depictions of the distribution of CPM (counts per million) were used to identify and remove the following 3 outlier samples: E171, E305, and E570. Genes where fewer than 75% of the samples had a count greater than or equal to 15 were considered to be expressed at low levels in liver and were removed, leaving 13,289 genes that were considered expressed. To enhance reproducibility and reduce the dependence between the genes, svaseq [61] was used to create 5 surrogate variables that served as covariates in subsequent modeling.
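The expression filter and the CPM transformation described above can be illustrated as follows. This is a minimal sketch under stated assumptions rather than the pipeline that was run: the function names and the list-of-rows data layout (one row per gene, one column per sample) are hypothetical, the filter is applied to raw counts as the text states, and CPM is computed from simple column totals without any library-size normalization factors.

```python
def counts_per_million(counts):
    """Convert a genes x samples table of raw counts to CPM.

    Assumption: the library size of a sample is the plain column total.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [[1e6 * row[j] / lib_sizes[j] for j in range(n_samples)]
            for row in counts]


def expressed_genes(counts, gene_ids, min_count=15, min_fraction=0.75):
    """Keep genes where at least min_fraction of samples have count >= min_count."""
    kept = []
    for gene, row in zip(gene_ids, counts):
        n_pass = sum(1 for c in row if c >= min_count)
        if n_pass >= min_fraction * len(row):
            kept.append(gene)
    return kept
```

With the thresholds given in the text (a count of at least 15 in at least 75% of samples), a gene observed at counts of 20, 30, 15 and 3 in four samples would be retained, because three of the four samples meet the threshold.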
EdgeR [58] was used to fit a model with main effects and pairwise interactions between each chromosome substitution. EdgeR uses a log link function, and thus departure from additivity in EdgeR is departure from a multiplicative model on the gene expression level.
For each gene an interaction model was fit which included the following terms: (1) maternal substitution, (2) paternal substitution, (3) the interaction of maternal by paternal substitution, and (4) the SVA covariates. For all models, “B6” was used as the reference for the categorical chromosome substitution predictors.
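To make the consequence of the log link concrete, the interaction model can be written out for a gene $g$ in offspring of maternal strain $i$ and paternal strain $j$. The notation below is introduced here for illustration and is not taken from the original analysis: $\mu$ is the expected count, $s_k$ are the 5 surrogate variables, and $N$ is the library size entering as an offset.

```latex
\log \mu_{g,ij} = \beta_{g,0} + \beta^{M}_{g,i} + \beta^{P}_{g,j} + \beta^{MP}_{g,ij}
                  + \sum_{k=1}^{5} \gamma_{g,k}\, s_{k} + \log N
\qquad\Longrightarrow\qquad
\frac{\mu_{g,ij}}{\mu_{g,\mathrm{B6}}} = e^{\beta^{M}_{g,i}} \; e^{\beta^{P}_{g,j}} \; e^{\beta^{MP}_{g,ij}}
```

The ratio on the right holds at fixed covariates and library size. As a hypothetical worked example, if one substitution alone doubles expression and the other alone raises it 1.5-fold, the model with no interaction ($\beta^{MP}_{g,ij} = 0$) predicts a 3-fold change in the double CSS, not 2.5-fold; an observed 2-fold change would then correspond to $e^{\beta^{MP}_{g,ij}} = 2/3$.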
Fig. 4.14. No differences in mapping efficiency of RNA-Seq reads between B6 and
CSSs.
(A) Genome-wide mapping efficiency was calculated as the number of unique reads
mapped to the GENCODE coding portion of the genome divided by the total number of
reads per sample. (B) Mapping efficiency was calculated as above for the individual substituted chromosomes in each CSS as indicated.
A stratified FDR approach was used for the analysis of both meQTLs and ieQTLs [219].
For meQTLs, we tested for associations between every combination of chromosome substitutions in the study with every unfiltered gene in the RNA-Seq data. These hypothesis tests were stratified by chromosome and cis vs. trans. The method of
Benjamini and Hochberg [220] was applied within each stratum to control the false discovery rate. Similarly, the hypothesis tests for the ieQTLs were stratified by each chromosome combination and cis/trans. The stratified FDR approach has been shown to be more powerful when the proportion of true hypotheses differs by stratum. The chromosome-chromosome interactions with FDR < 0.05 were divided into the categories synergistic and antagonistic based on the gene expression differences between the double
CSS strain and the control strain relative to that predicted by an additive model (Fig.
4.11). Spearman’s r was used to summarize the association between several variables in the analysis. A Spearman’s r of 1 implies that the rank order of the values for two variables is the same. To estimate the amount of variation attributable to interaction, we fit an additive model in EdgeR which did not include any interaction terms. We then calculated for each individual and gene the fitted values assuming that the individual’s covariates (i.e. the SVA surrogate variables) were set to 0 and thus did not contribute to the variation. We calculated SSFull as the sum of the mean-centered and squared fitted values for the full model including interaction; SSAdditive was calculated similarly for the additive model. We calculated the proportion of the genetic variation explained by interactions as (SSFull - SSAdditive) / SSFull. This proportion is only meaningful when there is genetic variation to be explained. To retain only genes with evidence of genetic control, using the full model for each gene, we tested the overall joint null hypothesis that
all mouse strains had the same average expression level using the empirical Bayes quasi-likelihood F-test as implemented in EdgeR. This allowed us to classify some genes as showing evidence of genetic control. Only these genes were analyzed further. The estimator (SSFull - SSAdditive) / SSFull may be slightly biased upward due to overfitting.
However, the mean value for this statistic among the genes with no significant interaction
(FDR > 0.5) was 0.25 (1st quartile: 0.20, 3rd quartile: 0.32) (Fig. 4.13B), which gives one
estimate of the upper bound on the possible bias. Here, the overall test that the interaction
terms were all 0 was carried out using the empirical Bayes quasi-likelihood F-test as
implemented in EdgeR. To assess any potential bias stemming from the arbitrary
selection of an FDR > 0.5, we performed a simulation study to independently
approximate the upper limit on this bias. Using the fitted values (i.e. predicted mean)
from the additive model described above, we simulated counts for each gene and
individual from a Poisson distribution. The full and additive models were fit to the
simulated data set, and the variance explained (SSFull - SSAdditive) / SSFull was calculated for
each gene. The simulation was repeated 100 times and the variance explained by
interaction was averaged across all simulations for each gene. The mean for the amount
of genetic variance explained by interaction under this simulated additive model was 0.13
(1st quartile: 0.05, 3rd quartile: 0.19) (Fig. 4.13C). This gives another estimate of the
upper bound on the possible bias.
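The proportion of genetic variation attributed to interaction reduces to a ratio of two sums of squares over fitted values. The sketch below shows only that arithmetic; it assumes the fitted values from the full and additive models are already available as plain lists (in the analysis above they come from EdgeR fits), and both the function names and the toy numbers in the usage note are hypothetical.

```python
def centered_sum_of_squares(fitted):
    """Sum of squared deviations of the fitted values from their mean."""
    mean = sum(fitted) / len(fitted)
    return sum((v - mean) ** 2 for v in fitted)


def interaction_fraction(fitted_full, fitted_additive):
    """Proportion of fitted-value variation not captured by the additive model.

    Computes (SS_full - SS_additive) / SS_full, which is only meaningful
    when SS_full is clearly greater than zero.
    """
    ss_full = centered_sum_of_squares(fitted_full)
    ss_additive = centered_sum_of_squares(fitted_additive)
    return (ss_full - ss_additive) / ss_full
```

As a toy illustration with one value per group in a balanced two-by-two design (control, two single substitutions, double substitution), full-model fitted values of 10, 14, 12 and 10 have a least-squares additive fit of 11.5, 12.5, 10.5 and 11.5, so `interaction_fraction([10, 14, 12, 10], [11.5, 12.5, 10.5, 11.5])` gives 9/11, or about 0.82.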
Multiple testing correction. For both the analysis of mouse phenotypes and RNA-Seq
data it is necessary to account for multiple testing in order to avoid a large number of
false positive findings. The approaches to multiple testing for the mouse phenotypes and
RNA-Seq data are fundamentally different because the numbers of hypotheses being tested were very different. For the mouse phenotype data, there were a relatively small number of targeted hypotheses, and thus the conservative and more confirmatory approach of controlling the family-wise type I error was applied. In this case, the genetic scan for each of the small number of traits was considered to be a separate question (i.e. the main effects for each trait and interaction effects for each trait were considered a separate
“family” of hypotheses). For the large number of traits analyzed in the RNA-Seq data, a less conservative and more hypothesis-generating approach known as the stratified FDR was applied.
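The stratified FDR idea amounts to running the Benjamini-Hochberg adjustment separately within each stratum instead of once over the pooled tests. The following is a minimal sketch, not the implementation used in this work: the function names are hypothetical, and the stratum labels stand in for the chromosome and cis/trans combinations described in the text.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def stratified_bh(p_values, strata):
    """Apply the BH adjustment independently within each stratum label."""
    groups = {}
    for idx, label in enumerate(strata):
        groups.setdefault(label, []).append(idx)
    adjusted = [None] * len(p_values)
    for indices in groups.values():
        stratum_adjusted = benjamini_hochberg([p_values[i] for i in indices])
        for i, q in zip(indices, stratum_adjusted):
            adjusted[i] = q
    return adjusted
```

When one stratum (for example cis effects) is enriched for true signals, adjusting within strata keeps its small p-values from being diluted by the much larger number of tests in the other strata, which is the source of the power gain mentioned above.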
Quantitative PCR (qPCR). Tissue was homogenized using TissueLyser II (Qiagen,
Valencia, CA, USA) and total RNA was isolated using the PureLink RNA purification kit with TRIzol protocol (Thermo Fisher Scientific, Waltham, MA, USA). Total RNA was reverse transcribed using the high capacity cDNA reverse transcription kit (Applied
Biosystems, Carlsbad, CA, USA). The sequences for each primer are listed in Table 4.10.
The qPCR reactions were performed with the power SYBR green PCR Master Mix
(Thermo Fisher Scientific, Waltham, MA, USA) and run on a Bio Rad CFX Connect
Real Time System (Bio Rad, Hercules, CA, USA). Expression levels were calculated using the ΔΔCt method relative to the Rplp0 control gene.
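For reference, the standard ΔΔCt calculation can be sketched as below. This is an illustrative sketch, not the analysis script used here: the function name and argument names are hypothetical, the reference gene plays the role of Rplp0, and the formula assumes roughly 100% amplification efficiency (a doubling of product per cycle).

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_control, ct_ref_control):
    """Relative expression by the delta-delta-Ct method.

    delta Ct = Ct(target) - Ct(reference gene), computed per group;
    delta-delta Ct = delta Ct(sample) - delta Ct(control);
    fold change = 2 ** (-delta-delta Ct).
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    ddct = delta_sample - delta_control
    return 2.0 ** (-ddct)
```

With hypothetical Ct values of 24 (target) and 18 (reference) in a CSS sample and 26 and 18 in the control strain, ΔΔCt is -2, corresponding to a 4-fold higher expression in the CSS sample.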
Table 4.8. Identification of fasting glucose QTLs using a combined linear model.
Model Terms                    Estimate    Std. Error   t value   Pr(>|t|)   Significant
(Intercept)                    72.30       2.46         29.37     < 2e-16    *
Maternal A14                   9.35        4.44         2.10      0.035712
Maternal A17                   -1.73       4.25         -0.41     0.683665
Maternal A3                    24.59       5.25         4.68      3.40E-06   *
Maternal A6                    11.30       4.30         2.63      0.00873
Paternal A10                   13.26       4.49         2.95      0.003263   *
Paternal A4                    16.96       4.89         3.47      0.000549   *
Paternal A5                    15.28       4.17         3.66      0.00027    *
Paternal A8                    8.23        4.67         1.76      0.078739
Sex (Male)                     9.36        1.54         6.08      1.95E-09   *
Maternal A14 : Paternal A10    -17.91      6.77         -2.64     0.00838
Maternal A17 : Paternal A10    0.04        6.61         0.01      0.995736
Maternal A3 : Paternal A10     -29.66      7.31         -4.06     5.52E-05   *
Maternal A6 : Paternal A10     -16.45      6.83         -2.41     0.016317
Maternal A14 : Paternal A4     -11.64      7.84         -1.48     0.13834
Maternal A17 : Paternal A4     -5.76       7.17         -0.80     0.422139
Maternal A3 : Paternal A4      -6.31       13.93        -0.45     0.650844
Maternal A6 : Paternal A4      -24.91      7.04         -3.54     0.000431   *
Maternal A14 : Paternal A5     -16.18      6.69         -2.42     0.015864
Maternal A17 : Paternal A5     -7.50       6.39         -1.17     0.24095
Maternal A3 : Paternal A5      -3.28       7.81         -0.42     0.674979
Maternal A6 : Paternal A5      -24.34      6.66         -3.66     0.000275   *
Maternal A14 : Paternal A8     -11.1885    8.97221      -1.247    0.21279
Maternal A17 : Paternal A8     -2.8734     7.06517      -0.407    0.684347
Maternal A3 : Paternal A8      -34.6259    9.97441      -3.471    0.000548   *
Maternal A6 : Paternal A8      -22.5194    7.30439      -3.083    0.002126   *
Model used: Glucose = [maternal substitution] + [paternal substitution] + [maternal substitution]*[paternal substitution] + [sex]
* indicates statistical significance (p<0.05) following Bonferroni correction
Table 4.9. Identification of body weight QTLs using a combined linear model.
Model Terms                    Estimate    Std. Error   t value   Pr(>|t|)   Significant
(Intercept)                    13.93       0.19         74.82     < 2e-16    *
Maternal A14                   0.37        0.33         1.12      0.262578
Maternal A17                   -1.13       0.32         -3.50     0.000486   *
Maternal A3                    0.27        0.40         0.67      5.05E-01
Maternal A6                    0.67        0.33         2.07      0.038688
Paternal A10                   0.60        0.34         1.77      0.076771
Paternal A4                    0.36        0.37         0.97      0.335033
Paternal A5                    -0.10       0.32         -0.32     0.746572
Paternal A8                    1.29        0.35         3.64      0.000297   *
Sex (Male)                     2.80        0.12         24.13     < 2e-16    *
Maternal A14 : Paternal A10    -0.83       0.51         -1.65     0.100242
Maternal A17 : Paternal A10    -0.03       0.50         -0.06     0.951586
Maternal A3 : Paternal A10     -0.33       0.55         -0.61     5.44E-01
Maternal A6 : Paternal A10     0.11        0.51         0.22      0.826329
Maternal A14 : Paternal A4     0.00        0.59         0.00      0.999939
Maternal A17 : Paternal A4     -0.18       0.54         -0.34     0.733663
Maternal A3 : Paternal A4      0.48        1.05         0.46      0.645792
Maternal A6 : Paternal A4      0.34        0.53         0.64      0.520308
Maternal A14 : Paternal A5     -0.41       0.50         -0.81     0.416534
Maternal A17 : Paternal A5     0.30        0.48         0.62      0.538963
Maternal A3 : Paternal A5      0.54        0.59         0.92      0.356591
Maternal A6 : Paternal A5      1.28        0.50         2.55      0.0111
Maternal A14 : Paternal A8     -0.9913     0.6768879    -1.464    0.143483
Maternal A17 : Paternal A8     -0.03062    0.5344762    -0.057    0.954337
Maternal A3 : Paternal A8      0.401273    0.7545657    0.532     0.595029
Maternal A6 : Paternal A8      0.005279    0.5525797    0.01      0.99238
Model used: Weight = [maternal substitution] + [paternal substitution] + [maternal substitution]*[paternal substitution] + [sex]
* indicates statistical significance (p<0.05) following Bonferroni correction
Table 4.10. Primer sequences for RT-qPCR detection.
Interaction category   Gene name   Primer type   Sequence (5' to 3')
                       Agxt        Forward       TGCTTCAGATCATGGAGGAGA
                       Agxt        Reverse       TGGTTCCGGTTAGAAAGGAGT
                       Pcx         Forward       AGGCCATGAAGGAGATGCAC
                       Pcx         Reverse       CTTAGCCACCTTGTCCCCTG
                       Slc6a12     Forward       TCTGGGAGAGACGGGTTTTG
                       Slc6a12     Reverse       GAAGACGATGCCCTGGTAGG
                       Serpinf2    Forward       CACAGTGTCGGTGGACATGA
                       Serpinf2    Reverse       GGGGAAATGAGCCACCTGTA
                       Zbtb20      Forward       CCCGGTCTGTCCACCTTTAC
                       Zbtb20      Reverse       TGGGGCTTCTCACCTGTATG
                       Tmem245     Forward       CTCTGCTTCGCCGACTACG
                       Tmem245     Reverse       CAATGTCCAGATCCACAGGCT
Antagonistic
                       Raph1       Forward       AGTATCCCGGAGTCTCAGTCAA
                       Raph1       Reverse       TAGTTTGAGGGGACAGAGGGG
                       Dnajb9      Forward       CGGGGCGCACAGGTTATTAG
                       Dnajb9      Reverse       CTCTGAGGCAGACTTTGGCA
                       Cers6       Forward       AACAACATGGCCCGAGTAGG
                       Cers6       Reverse       TGCCATTTTGGCAGCCTCTA
                       Ldha        Forward       GGAGTGGTGTGAATGTTGCC
                       Ldha        Reverse       TCACCTCGTAGGCACTGTCC
                       Sec23b      Forward       AGAACGAGATGGTGTGCGTT
                       Sec23b      Reverse       GCATATGCTGGAGGGAACTGA
                       Eif2ak3     Forward       AGCAAGCCAGAGGTGTTTGG
                       Eif2ak3     Reverse       GGAAGATTCGAGCAGGGACTC
                       Cyp3a16     Forward       AGTGGGGATAATGAGTAAATCCAT
                       Cyp3a16     Reverse       GGCACCTAACACATCTTTCACAG
                       Syvn1       Forward       CCACCAGTACAGCCGTTTCT
                       Syvn1       Reverse       TACCCATCCAAGGAGGAGGG
                       Gstm1       Forward       GATACACCATGGGTGACGCT
                       Gstm1       Reverse       TCTCCATCCAGGTGGTGCTT
                       Usp18       Forward       CCCTCATGGTCTGGTTGGTTT
                       Usp18       Reverse       GCACTCCGAGGCACTGTTAT
Synergistic
                       Pik3c2a     Forward       GCGGGAGAAAAACATGGCTC
                       Pik3c2a     Reverse       AATACCAGGACCTCACGCTG
                       Stat5a      Forward       CTACGTGTTCCCAGACCGAC
                       Stat5a      Reverse       TGACGAACTCAGGGACCACT
                       Tecr        Forward       GCACTGGCCGTTTTTGTGAT
                       Tecr        Reverse       TCCAGGAGCCCACCTCATAA
                       Mcm10       Forward       AGAGAAAACCAGCGAGGAGC
                       Mcm10       Reverse       GGCTGCAGAGATGAATCAGGT
                       Nol8        Forward       GACGACAGACTTCGTGGTTCT
                       Nol8        Reverse       CTTGTTCGGGCTTCCCAAGA
                       Rprl3       Forward       ACTCTTCGGCCCCTGAGAAG
                       Rprl3       Reverse       GCTCTCTGGGAATTCACCTCC
                       Zkscan3     Forward       GGAGTCTTTGGGATCCCTGC
                       Zkscan3     Reverse       TCCATTTTCAGCAACCCCTGT
                       Asns        Forward       GCCATCTATGACAGCGTGGA
                       Asns        Reverse       AGTCCAGGCCCCCTGATAAA
Additive               Slc12a2     Forward       GCAAAATCTCCAGGATGGCG
                       Slc12a2     Reverse       CATATGTGAGCAACGCAGCC
                       Igfbp3      Forward       AACCTGCTCCAGGAAACATCA
                       Igfbp3      Reverse       ACTTGGAATCGGTCACTCGG
                       Ldha        Forward       GGAGTGGTGTGAATGTTGCC
                       Ldha        Reverse       TCACCTCGTAGGCACTGTCC
Chapter 5. Summary and Future Direction
Summary
In this dissertation, we have identified the genetic basis of two rare Mendelian disorders
(Chapters 2 and 3) and gained a better understanding of the genetic architecture of complex metabolic traits
(Chapter 4), both using forward genetic approaches.
In Chapter 2, we explored the exome of four patients with primary ovarian insufficiency from two independent consanguineous families and identified causal missense mutations in MRPS22, which was further investigated using a Drosophila model. The infertility and lack of germ cells phenotype in both humans and flies unveiled a novel function of
MRPS22 in germ cell development, which differed from all previous cases with MRPS22 mutations.
Similarly, in Chapter 3, we identified three novel loss-of-function mutations in PIK3C2A in a previously unidentified syndrome in five patients from three independent consanguineous families, providing us the opportunity to investigate its novel function in phosphatidylinositol metabolism, cilia function and cataract formation, using patient-derived fibroblasts and a zebrafish model of Pik3c2a deficiency.
In Chapter 4, we identified widespread epistatic interactions using double chromosome substitution strains in mouse, thus providing strong evidence for the importance of epistasis in complex traits, which remains a controversial phenomenon. Our findings demonstrated that epistatic interactions controlled the majority of the heritable variation
in both fasting plasma glucose levels and hepatic gene expression, even greater than the additive effects. This finding may partially explain the ‘missing heritability’ phenomenon due to the difficulty of discovering epistatic interactions in humans. We also identified an interesting effect of epistatic interactions, which were prone to maintain fasting glucose levels at control levels. This might be an evolutionary strategy that stores genetic variants in individuals without reducing their fitness and allows them to quickly adapt to new environmental challenges.
Future Direction
1. Researchers are not alone in battles against genetic diseases
Adrenoleukodystrophy is a disease due to the accumulation of very long chain fatty acids that leads to demyelination and eventual death before adulthood. Back in the
1990s, Lorenzo’s Oil, a 4:1 mix of oleic acid and erucic acid, was a trial treatment for this disease. Surprisingly, it was proposed by parents of Lorenzo, who desperately looked for a cure for their young son. Though the clinical trial ended with mixed results and failed[221], it did promote studies of this rare disease and ultimately led to the discovery of causal variants in ABCD1[222]. This case strongly demonstrated the enthusiasm and potential efforts that patients and their relatives could contribute to the studies of rare human disorders.
In fact, it’s a trend nowadays that efforts directly from patients and their relatives
contribute to an increasing proportion of patient-centered genetic studies. For example,
SPARK is an ambitious project aiming to enroll 50,000 individuals with autism and their families in the United States to accelerate autism research. In this study, cases are
reported by patients or their parents, participants have online access to study results,
and researchers retain the ability to recontact participants for new research studies
and to potentially provide first-hand updates and treatment guidelines[223]. With the
access to personal sequencing data, individuals can easily retrieve their DNA report
according to newly annotated variants using web-based literature retrieval systems, such as Promethease.
In our studies, we also benefited from positive patient-researcher interactions. With the patients’
consent to skin punch biopsies and to the use of these samples in our research, we were able
to culture primary fibroblast cells, which served as the cellular models in our studies. On
the other hand, patients benefited from our studies as well. For example, the secondary
glaucoma was not diagnosed in Family I until we found a second family with PIK3C2A
deficiency that was already known to have glaucoma. Due to this discovery, the patients
in Family I were referred to an ophthalmologist for a detailed eye exam, which confirmed
the presence of glaucoma. In addition to improving the clinical diagnosis, our
identification of the genetic basis for this syndrome can also be used to provide prenatal
screening for any future offspring from these affected families and their relatives.
2. Gene therapy to cure the diseases
Since Jesse Gelsinger’s death during a gene therapy trial in 1999, gene therapy research had been almost ‘frozen’ for decades. However, the discovery of a precise gene-editing tool, CRISPR/Cas9, has ‘broken the ice’ of gene therapy research. In 2015, China launched CRISPR trials in cancer patients. Earlier this year, a CRISPR/Cas9-based clinical trial to treat beta-thalassemia was approved in Europe. It may appear that in the near future we could conquer any genetic disorder. However, evidence has shown that this assumption might be too optimistic. As Eric Lander said when asked for his opinion on how CRISPR should be applied, ‘We are terrible predictors of the consequences of the changes we make’. A good example is that a mutation in CCR5 that protects against HIV was unexpectedly associated with increased risk for West Nile virus[224], in which case genome editing of that mutation in an attempt to lower the risk of West Nile virus could result in increased risk of HIV.
In addition to complex traits, treatments for rare disorders have also been applied in humans. Though the FDA has approved over 500 orphan drugs for rare disorders, and these orphan drugs accounted for 17% of prescription drug market share in 2017, none of them take the form of gene therapy. Last year, however, gene therapy using zinc finger nucleases was applied to a 44-year-old man with a rare disease referred to as Hunter syndrome, which is characterized by accumulation of glycosaminoglycans due to mutations in the gene IDS. According to ClinicalTrials.gov, this treatment (SB-913) inserts a correct copy of the IDS gene under the control of the strong albumin promoter
and produces functional IDS enzyme in the patient’s liver. So far, his symptoms have diminished without any negative consequences.
This treatment sheds light on the potential application of gene therapy to our patients, who have a phenotype similar to that of Hunter syndrome. With a well-designed strategy to
restore a functional copy of PIK3C2A, patients may recover from the disease, even with
just 50% of gene function like their healthy parents. However, it’s likely wise to remain
skeptical towards this new trend as we have little knowledge of the long-term effects of
these gene therapy approaches.
3. Strategies to predict disease risk loci
Over the past two decades, the healthcare system has increasingly shifted its attention
from treatment options to predictive and/or preventive strategies to both improve patient
care and reduce overall cost [225]. To better predict disease risk, even before the onset of
disease, machine learning-based studies on electronic health records (EHR) are a
promising direction. Compared with traditional algorithms [226], machine learning-based
algorithms have improved the accuracy, precision and trade-off between false positive
and true positive rates and have been successfully applied in risk predictions of type 2
diabetes [227] and heart disease [228]. More interestingly, recent applications of machine
learning with big autism spectrum disorder datasets are able to provide scores for each
gene to identify novel autism genes. Collectively, application of new algorithms will
dramatically speed up the identification of novel disease genes.
In addition, machine learning-based imaging studies can facilitate genetic syndrome diagnoses [229–231]. Rare disorder patients on average visit 7.3 physicians before receiving an accurate diagnosis. However, with machine learning-based facial recognition software programs, like Face2Gene, physicians may be able to diagnose rare disorders simply by snapping a photo of a patient’s face. For example, such software can be applied to syndromic disorders with craniofacial malformations, such as those of our patients with
PIK3C2A mutations.
4. Strategies to better understand current data
In contrast to predictive studies, retrospective studies on raw sequencing data also shed light on the diagnosis of rare disorders. In a recent study, 156 cases that failed to provide definitive diagnoses were reanalyzed with raw sequencing data. Surprisingly, 24 of 156 cases turned out to be definitively diagnosed after just 1 year had passed since the initial analysis [232]. The improved diagnostic rates were mainly due to improved informatics approaches and a better understanding of the variants of unknown significance [233].
Therefore, reanalysis of an individual’s genome-wide sequencing data every 1–2 years until a diagnosis is reached is recommended [233].
In addition to publishing our work in scientific journals, we have submitted the pathogenic variants and supporting data to ClinVar, a widely used database for variant interpretation, which will further promote the identification of similar cases in the future.
5. What’s beyond genetic studies in understanding human disorders?
Our studies described in this thesis focused on identifying the genetic basis of complex traits and human disorders. However, studies on epigenetics and environmental factors are also indispensable.
In contrast to genetic studies, epigenetic studies focus on heritable changes in the regulation of gene activity and expression that do not alter the DNA sequence. The underlying mechanisms include DNA methylation, histone modification and long non-coding RNAs, all of which play crucial roles in development, tissue homeostasis, cell identity and genome stability [234]. As a result, aberrant epigenetic regulation has been associated with cancer, cardiovascular and neurological diseases, metabolic disorders, and imprinting disorders [235]. In addition to mutations in genes involved in epigenetic regulation [236], mutations in non-coding regions can also trigger human disorders through epigenetic changes. For example, in elderly type 2 diabetes patients, a polymorphism in the NDUFB6 promoter region that creates a new DNA methylation site was identified and confirmed to be associated with increased DNA methylation and decreased NDUFB6 expression, which is a known risk factor for insulin resistance [237]. These studies provide evidence that epigenetic factors are associated with human disorders.
In addition, environmental factors can influence human fitness. Environmental factors include nutrition, toxins, infectious agents and lifestyle. It is now well established that prenatal exposure to environmental factors can have a long-term impact not only on organ development but also on fitness in adulthood. A famous example is the Dutch famine studies. This series of studies found that individuals conceived during the famine showed higher rates of obesity and cardiovascular disorders when examined in their 50s, as well as a greater age-associated decline in cognitive function as adults [238].
Similarly, prenatal smoke exposure has been strongly associated with an increased risk of impaired lung function development and asthma [239]. A further study showed changes in SAT2 methylation in the peripheral blood of offspring [240]. These studies support the hypothesis that environmental factors play an important role in developmental reprogramming via epigenetic regulation, which then acts as an “epigenetic memory” of the exposure.
References
1. Suter U, Welcher AA, Özcelik T, Snipes GJ, Kosaras B, Francke U, et al. Trembler mouse carries a point mutation in a myelin gene. Nature. 1992;356: 241–244. doi:10.1038/356241a0
2. A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes. The Huntington’s Disease Collaborative Research Group. Cell. 1993;72: 971–983.
3. Rinchik EM. Chemical mutagenesis and fine-structure functional analysis of the mouse genome. Trends Genet. 1991;7: 15–21. doi:10.1016/0168-9525(91)90016-J
4. Arnold CN, Barnes MJ, Berger M, Blasius AL, Brandl K, Croker B, et al. ENU-induced phenovariance in mice: inferences from 587 mutations. BMC Res Notes. 2012;5: 577. doi:10.1186/1756-0500-5-577
5. Reaume AG, Knecht DA, Chovnick A. The Rosy Locus in Drosophila Melanogaster: Xanthine Dehydrogenase and Eye Pigments. Genetics. 1991;129: 1099–1109.
6. Fire A, Xu S, Montgomery MK, Kostas SA, Driver SE, Mello CC. Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature. 1998;391: 806–811. doi:10.1038/35888
7. Kuttenkeuler D, Boutros M. Genome-wide RNAi as a route to gene function in Drosophila. Brief Funct Genomic Proteomic. 2004;3: 168–176.
8. Waterhouse PM, Graham MW, Wang MB. Virus resistance and gene silencing in plants can be induced by simultaneous expression of sense and antisense RNA. Proc Natl Acad Sci U S A. 1998;95: 13959–13964.
9. 1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, et al. A map of human genome variation from population-scale sequencing. Nature. 2010;467: 1061–1073. doi:10.1038/nature09534
10. 1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, et al. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491: 56–65. doi:10.1038/nature11632
11. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526: 68–74. doi:10.1038/nature15393
12. An integrated map of structural variation in 2,504 human genomes. Nature [Internet]. [cited 5 Jun 2018]. Available: https://www.nature.com/articles/nature15394
13. Botstein D, Risch N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat Genet. 2003;33: 228–237. doi:10.1038/ng1090
14. Majewski J, Schwartzentruber J, Lalonde E, Montpetit A, Jabado N. What can exome sequencing do for you? J Med Genet. 2011; jmedgenet–2011–100223. doi:10.1136/jmedgenet-2011-100223
15. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536: 285–291. doi:10.1038/nature19057
16. Scott EM, Halees A, Itan Y, Spencer EG, He Y, Azab MA, et al. Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery. Nat Genet. 2016;48: 1071–1076. doi:10.1038/ng.3592
17. Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science. 1996;273: 1516–1517.
18. International HapMap Consortium, Altshuler D, Donnelly P. A haplotype map of the human genome. Nature. 2005;437: 1299–1320. doi:10.1038/nature04226
19. Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 2014;42: D1001–1006. doi:10.1093/nar/gkt1229
20. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, et al. Common SNPs explain a large proportion of heritability for human height. Nat Genet. 2010;42: 565–569. doi:10.1038/ng.608
21. Yengo L, Sidorenko J, Kemper KE, Zheng Z, Wood AR, Weedon MN, et al. Meta-analysis of genome-wide association studies for height and body mass index in ~700,000 individuals of European ancestry. bioRxiv. 2018; 274654. doi:10.1101/274654
22. Visscher PM. Sizing up human height variation. Nat Genet. 2008;40: 489–490. doi:10.1038/ng0508-489
23. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature. 2009;461: 747–753. doi:10.1038/nature08494
24. Rivas MA, Beaudoin M, Gardet A, Stevens C, Sharma Y, Zhang CK, et al. Deep resequencing of GWAS loci identifies independent rare variants associated with inflammatory bowel disease. Nat Genet. 2011;43: 1066–1073. doi:10.1038/ng.952
25. Gudmundsson J, Sulem P, Gudbjartsson DF, Masson G, Agnarsson BA, Benediktsdottir KR, et al. A study based on whole-genome sequencing yields a rare variant at 8q24 associated with prostate cancer. Nat Genet. 2012;44: 1326–1329. doi:10.1038/ng.2437
26. Del-Aguila JL, Koboldt DC, Black K, Chasse R, Norton J, Wilson RK, et al. Alzheimer’s disease: rare variants with large effect sizes. Curr Opin Genet Dev. 2015;33: 49–55. doi:10.1016/j.gde.2015.07.008
27. Lee S, Abecasis GR, Boehnke M, Lin X. Rare-Variant Association Analysis: Study Designs and Statistical Tests. Am J Hum Genet. 2014;95: 5–23. doi:10.1016/j.ajhg.2014.06.009
28. Marouli E, Graff M, Medina-Gomez C, Lo KS, Wood AR, Kjaer TR, et al. Rare and low-frequency coding variants alter human adult height. Nature. 2017;542: 186–190. doi:10.1038/nature21039
29. Jonsson T, Stefansson H, Steinberg S, Jonsdottir I, Jonsson PV, Snaedal J, et al. Variant of TREM2 Associated with the Risk of Alzheimer’s Disease. N Engl J Med. 2013;368: 107–116. doi:10.1056/NEJMoa1211103
30. Zuk O, Hechter E, Sunyaev SR, Lander ES. The mystery of missing heritability: Genetic interactions create phantom heritability. Proc Natl Acad Sci. 2012;109: 1193–1198. doi:10.1073/pnas.1119675109
31. Huang Y, Wang C, Yao Y, Zuo X, Chen S, Xu C, et al. Molecular Basis of Gene-Gene Interaction: Cyclic Cross-Regulation of Gene Expression and Post-GWAS Gene-Gene Interaction Involved in Atrial Fibrillation. PLoS Genet. 2015;11: e1005393. doi:10.1371/journal.pgen.1005393
32. Liu Y, Xu H, Chen S, Chen X, Zhang Z, Zhu Z, et al. Genome-wide interaction-based association analysis identified multiple new susceptibility Loci for common diseases. PLoS Genet. 2011;7: e1001338. doi:10.1371/journal.pgen.1001338
33. Kirino Y, Bertsias G, Ishigatsubo Y, Mizuki N, Tugal-Tutkun I, Seyahi E, et al. Genome-wide association analysis identifies new susceptibility loci for Behçet’s disease and epistasis between HLA-B*51 and ERAP1. Nat Genet. 2013;45: 202–207. doi:10.1038/ng.2520
34. Verma SS, Cooke Bailey JN, Lucas A, Bradford Y, Linneman JG, Hauser MA, et al. Epistatic Gene-Based Interaction Analyses for Glaucoma in eMERGE and NEIGHBOR Consortium. PLoS Genet. 2016;12: e1006186. doi:10.1371/journal.pgen.1006186
35. Hemani G, Shakhbazov K, Westra H-J, Esko T, Henders AK, McRae AF, et al. Detection and replication of epistasis influencing transcription in humans. Nature. 2014;508: 249–253. doi:10.1038/nature13005
36. Hu T, Chen Y, Kiralis JW, Collins RL, Wejse C, Sirugo G, et al. An information-gain approach to detecting three-way epistatic interactions in genetic association studies. J Am Med Inform Assoc JAMIA. 2013;20: 630–636. doi:10.1136/amiajnl-2012-001525
37. Galarza-Muñoz G, Briggs FBS, Evsyukova I, Schott-Lerner G, Kennedy EM, Nyanhete T, et al. Human Epistatic Interaction Controls IL7R Splicing and Increases Multiple Sclerosis Risk. Cell. 2017;169: 72–84.e13. doi:10.1016/j.cell.2017.03.007
38. Brewer GJ. Drug development for orphan diseases in the context of personalized medicine. Transl Res J Lab Clin Med. 2009;154: 314–322. doi:10.1016/j.trsl.2009.03.008
39. Boycott KM, Vanstone MR, Bulman DE, MacKenzie AE. Rare-disease genetics in the era of next-generation sequencing: discovery to translation. Nat Rev Genet. 2013;14: 681–691. doi:10.1038/nrg3555
40. Marešová P, Mohelská H, Kuča K. Cooperation Policy of Rare Diseases in the European Union. Procedia - Soc Behav Sci. 2015;171: 1302–1308. doi:10.1016/j.sbspro.2015.01.245
41. Hamamy HA, Masri AT, Al-Hadidy AM, Ajlouni KM. Consanguinity and genetic disorders. Profile from Jordan. Saudi Med J. 2007;28: 1015–1017.
42. Bittles AH, Black ML. Evolution in health and medicine Sackler colloquium: Consanguinity, human evolution, and complex diseases. Proc Natl Acad Sci U S A. 2010;107 Suppl 1: 1779–1786. doi:10.1073/pnas.0906079106
43. Hamamy H. Consanguineous marriages. J Community Genet. 2012;3: 185–192. doi:10.1007/s12687-011-0072-y
44. Beaulieu CL, Majewski J, Schwartzentruber J, Samuels ME, Fernandez BA, Bernier FP, et al. FORGE Canada Consortium: outcomes of a 2-year national rare-disease gene-discovery project. Am J Hum Genet. 2014;94: 809–817. doi:10.1016/j.ajhg.2014.05.003
45. Committee opinion no. 605: primary ovarian insufficiency in adolescents and young women. Obstet Gynecol. 2014;124: 193–197. doi:10.1097/01.AOG.0000451757.51964.98
46. Rossetti R, Ferrari I, Bonomi M, Persani L. Genetics of primary ovarian insufficiency: Genetics of POI. Clin Genet. 2017;91: 183–198. doi:10.1111/cge.12921
47. Bolcun-Filas E, Hall E, Speed R, Taggart M, Grey C, de Massy B, et al. Mutation of the mouse Syce1 gene disrupts synapsis and suggests a link between synaptonemal complex structural components and DNA repair. PLoS Genet. 2009;5: e1000393. doi:10.1371/journal.pgen.1000393
48. Costa Y, Speed R, Ollinger R, Alsheimer M, Semple CA, Gautier P, et al. Two novel proteins recruited by synaptonemal complex protein 1 (SYCP1) are at the centre of meiosis. J Cell Sci. 2005;118: 2755–2762. doi:10.1242/jcs.02402
49. Caburet S, Arboleda VA, Llano E, Overbeek PA, Barbero JL, Oka K, et al. Mutant cohesin in premature ovarian failure. N Engl J Med. 2014;370: 943–949. doi:10.1056/NEJMoa1309635
50. Guiraldelli MF, Eyster C, Wilkerson JL, Dresser ME, Pezza RJ. Mouse HFM1/Mer3 is required for crossover formation and complete synapsis of homologous chromosomes during meiosis. PLoS Genet. 2013;9: e1003383. doi:10.1371/journal.pgen.1003383
51. Wang J, Zhang W, Jiang H, Wu B-L, Primary Ovarian Insufficiency Collaboration. Mutations in HFM1 in recessive primary ovarian insufficiency. N Engl J Med. 2014;370: 972–974. doi:10.1056/NEJMc1310150
52. Weinberg-Shukron A, Renbaum P, Kalifa R, Zeligson S, Ben-Neriah Z, Dreifuss A, et al. A mutation in the nucleoporin-107 gene causes XX gonadal dysgenesis. J Clin Invest. 2015;125: 4295–4304. doi:10.1172/JCI83553
53. AlAsiri S, Basit S, Wood-Trageser MA, Yatsenko SA, Jeffries EP, Surti U, et al. Exome sequencing reveals MCM8 mutation underlies ovarian failure and chromosomal instability. J Clin Invest. 2015;125: 258–262. doi:10.1172/JCI78473
54. Fauchereau F, Shalev S, Chervinsky E, Beck-Fruchter R, Legois B, Fellous M, et al. A non-sense MCM9 mutation in a familial case of primary ovarian insufficiency. Clin Genet. 2016;89: 603–607. doi:10.1111/cge.12736
55. Wood-Trageser MA, Gurbuz F, Yatsenko SA, Jeffries EP, Kotan LD, Surti U, et al. MCM9 mutations are associated with ovarian failure, short stature, and chromosomal instability. Am J Hum Genet. 2014;95: 754–762. doi:10.1016/j.ajhg.2014.11.002
56. Zhao H, Chen Z-J, Qin Y, Shi Y, Wang S, Choi Y, et al. Transcription factor FIGLA is mutated in patients with premature ovarian failure. Am J Hum Genet. 2008;82: 1342–1348. doi:10.1016/j.ajhg.2008.04.018
57. Bayram Y, Gulsuner S, Guran T, Abaci A, Yesil G, Gulsuner HU, et al. Homozygous loss-of-function mutations in SOHLH1 in patients with nonsyndromic hypergonadotropic hypogonadism. J Clin Endocrinol Metab. 2015;100: E808–814. doi:10.1210/jc.2015-1150
58. Qin Y, Choi Y, Zhao H, Simpson JL, Chen Z-J, Rajkovic A. NOBOX homeobox mutation causes premature ovarian failure. Am J Hum Genet. 2007;81: 576–581. doi:10.1086/519496
59. Bouilly J, Bachelot A, Broutin I, Touraine P, Binart N. Novel NOBOX loss-of-function mutations account for 6.2% of cases in a large primary ovarian insufficiency cohort. Hum Mutat. 2011;32: 1108–1113. doi:10.1002/humu.21543
60. Kasippillai T, MacArthur DG, Kirby A, Thomas B, Lambalk CB, Daly MJ, et al. Mutations in eIF4ENIF1 are associated with primary ovarian insufficiency. J Clin Endocrinol Metab. 2013;98: E1534–1539. doi:10.1210/jc.2013-1102
61. Kurolap A, Orenstein N, Kedar I, Weisz Hubshman M, Tiosano D, Mory A, et al. Is one diagnosis the whole story? patients with double diagnoses. Am J Med Genet A. 2016;170: 2338–2348. doi:10.1002/ajmg.a.37799
62. Rösler A, Silverstein S, Abeliovich D. A (R80Q) mutation in 17 beta-hydroxysteroid dehydrogenase type 3 gene among Arabs of Israel is associated with pseudohermaphroditism in males and normal asymptomatic females. J Clin Endocrinol Metab. 1996;81: 1827–1831. doi:10.1210/jcem.81.5.8626842
63. Geissler WM, Davis DL, Wu L, Bradshaw KD, Patel S, Mendonca BB, et al. Male pseudohermaphroditism caused by mutations of testicular 17β–hydroxysteroid dehydrogenase 3. Nat Genet. 1994;7: 34–39. doi:10.1038/ng0594-34
64. Mendonca BB, Arnhold IJP, Bloise W, Andersson S, Russell DW, Wilson JD. 17β-Hydroxysteroid Dehydrogenase 3 Deficiency in Women. J Clin Endocrinol Metab. 1999;84: 802–804. doi:10.1210/jcem.84.2.5477
65. Ng PC, Henikoff S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003;31: 3812–3814.
66. Adzhubei I, Jordan DM, Sunyaev SR. Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet. 2013;Chapter 7: Unit7.20. doi:10.1002/0471142905.hg0720s76
67. Sobreira N, Schiettecatte F, Valle D, Hamosh A. GeneMatcher: a matching tool for connecting investigators with an interest in the same gene. Hum Mutat. 2015;36: 928–930. doi:10.1002/humu.22844
68. Saada A, Shaag A, Arnon S, Dolfin T, Miller C, Fuchs-Telem D, et al. Antenatal mitochondrial disease caused by mitochondrial ribosomal protein (MRPS22) mutation. J Med Genet. 2007;44: 784–786. doi:10.1136/jmg.2007.053116
69. Smits P, Saada A, Wortmann SB, Heister AJ, Brink M, Pfundt R, et al. Mutation in mitochondrial ribosomal protein MRPS22 leads to Cornelia de Lange-like phenotype, brain abnormalities and hypertrophic cardiomyopathy. Eur J Hum Genet EJHG. 2011;19: 394–399. doi:10.1038/ejhg.2010.214
70. Ye F, Hoppel CL. Measuring oxidative phosphorylation in human skin fibroblasts. Anal Biochem. 2013;437: 52–58. doi:10.1016/j.ab.2013.02.010
71. Lee T, Luo L. Mosaic analysis with a repressible cell marker for studies of gene function in neuronal morphogenesis. Neuron. 1999;22: 451–461.
72. Cabrera GR, Godt D, Fang P-Y, Couderc J-L, Laski FA. Expression pattern of Gal4 enhancer trap insertions into the bric à brac locus generated by P element replacement. Genes N Y N 2000. 2002;34: 62–65. doi:10.1002/gene.10115
73. Van Doren M, Williamson AL, Lehmann R. Regulation of zygotic gene expression in Drosophila primordial germ cells. Curr Biol CB. 1998;8: 243–246.
74. Smits P, Smeitink JAM, van den Heuvel LP, Huynen MA, Ettema TJG. Reconstructing the evolution of the mitochondrial ribosomal proteome. Nucleic Acids Res. 2007;35: 4686–4703. doi:10.1093/nar/gkm441
75. Greber BJ, Ban N. Structure and Function of the Mitochondrial Ribosome. Annu Rev Biochem. 2016;85: 103–132. doi:10.1146/annurev-biochem-060815-014343
76. Amunts A, Brown A, Toots J, Scheres SHW, Ramakrishnan V. Ribosome. The structure of the human mitochondrial ribosome. Science. 2015;348: 95–98. doi:10.1126/science.aaa1193
77. Greber BJ, Bieri P, Leibundgut M, Leitner A, Aebersold R, Boehringer D, et al. Ribosome. The complete structure of the 55S mammalian mitochondrial ribosome. Science. 2015;348: 303–308. doi:10.1126/science.aaa3872
78. Menezes MJ, Guo Y, Zhang J, Riley LG, Cooper ST, Thorburn DR, et al. Mutation in mitochondrial ribosomal protein S7 (MRPS7) causes congenital sensorineural deafness, progressive hepatic and renal failure and lactic acidemia. Hum Mol Genet. 2015;24: 2297–2307. doi:10.1093/hmg/ddu747
79. Pierce SB, Chisholm KM, Lynch ED, Lee MK, Walsh T, Opitz JM, et al. Mutations in mitochondrial histidyl tRNA synthetase HARS2 cause ovarian dysgenesis and sensorineural hearing loss of Perrault syndrome. Proc Natl Acad Sci U S A. 2011;108: 6543–6548. doi:10.1073/pnas.1103471108
80. Pierce SB, Gersak K, Michaelson-Cohen R, Walsh T, Lee MK, Malach D, et al. Mutations in LARS2, encoding mitochondrial leucyl-tRNA synthetase, lead to premature ovarian failure and hearing loss in Perrault syndrome. Am J Hum Genet. 2013;92: 614–620. doi:10.1016/j.ajhg.2013.03.007
81. Dallabona C, Diodato D, Kevelam SH, Haack TB, Wong L-J, Salomons GS, et al. Novel (ovario) leukodystrophy related to AARS2 mutations. Neurology. 2014;82: 2063–2071. doi:10.1212/WNL.0000000000000497
82. Fogli A, Rodriguez D, Eymard-Pierre E, Bouhour F, Labauge P, Meaney BF, et al. Ovarian failure related to eukaryotic initiation factor 2B mutations. Am J Hum Genet. 2003;72: 1544–1550. doi:10.1086/375404
83. Baertling F, Haack TB, Rodenburg RJ, Schaper J, Seibt A, Strom TM, et al. MRPS22 mutation causes fatal neonatal lactic acidosis with brain and heart abnormalities. Neurogenetics. 2015;16: 237–240. doi:10.1007/s10048-015-0440-6
84. Kılıç M, Oğuz K-K, Kılıç E, Yüksel D, Demirci H, Sağıroğlu MŞ, et al. A patient with mitochondrial disorder due to a novel mutation in MRPS22. Metab Brain Dis. 2017; doi:10.1007/s11011-017-0074-5
85. Saada A, Shaag A, Arnon S, Dolfin T, Miller C, Fuchs-Telem D, et al. Antenatal mitochondrial disease caused by mitochondrial ribosomal protein (MRPS22) mutation. J Med Genet. 2007;44: 784–786. doi:10.1136/jmg.2007.053116
86. May-Panloup P, Boucret L, Chao de la Barca J-M, Desquiret-Dumas V, Ferré-L’Hotellier V, Morinière C, et al. Ovarian ageing: the role of mitochondria in oocytes and follicles. Hum Reprod Update. 2016;22: 725–743. doi:10.1093/humupd/dmw028
87. Hayashi Y, Otsuka K, Ebina M, Igarashi K, Takehara A, Matsumoto M, et al. Distinct requirements for energy metabolism in mouse primordial germ cells and their reprogramming to embryonic germ cells. Proc Natl Acad Sci U S A. 2017;114: 8289–8294. doi:10.1073/pnas.1620915114
88. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinforma Oxf Engl. 2009;25: 1754–1760. doi:10.1093/bioinformatics/btp324
89. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20: 1297–1303. doi:10.1101/gr.107524.110
90. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38: e164. doi:10.1093/nar/gkq603
91. Lupski JR, Gonzaga-Jauregui C, Yang Y, Bainbridge MN, Jhangiani S, Buhay CJ, et al. Exome sequencing resolves apparent incidental findings and reveals further complexity of SH3TC2 variant alleles causing Charcot-Marie-Tooth neuropathy. Genome Med. 2013;5: 57. doi:10.1186/gm461
92. Bainbridge MN, Wang M, Wu Y, Newsham I, Muzny DM, Jefferies JL, et al. Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities. Genome Biol. 2011;12: R68. doi:10.1186/gb-2011-12-7-r68
93. Challis D, Yu J, Evani US, Jackson AR, Paithankar S, Coarfa C, et al. An integrative variant analysis suite for whole exome next-generation sequencing data. BMC Bioinformatics. 2012;13: 8. doi:10.1186/1471-2105-13-8
94. Reid JG, Carroll A, Veeraraghavan N, Dahdouli M, Sundquist A, English A, et al. Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline. BMC Bioinformatics. 2014;15: 30. doi:10.1186/1471-2105-15-30
95. Charrier A, Wang L, Stephenson EJ, Ghanta SV, Ko C-W, Croniger CM, et al. Zinc finger protein 407 overexpression upregulates PPAR target gene expression and improves glucose homeostasis in mice. Am J Physiol Endocrinol Metab. 2016;311: E869–E880. doi:10.1152/ajpendo.00234.2016
96. Hoppel CL, Kerr DS, Dahms B, Roessmann U. Deficiency of the reduced nicotinamide adenine dinucleotide dehydrogenase component of complex I of mitochondrial electron transport. Fatal infantile lactic acidosis and hypermetabolism with skeletal-cardiac myopathy and encephalopathy. J Clin Invest. 1987;80: 71–77. doi:10.1172/JCI113066
97. Krähenbühl S, Talos C, Wiesmann U, Hoppel CL. Development and evaluation of a spectrophotometric assay for complex III in isolated mitochondria, tissues and fibroblasts from rats and humans. Clin Chim Acta Int J Clin Chem. 1994;230: 177–187.
98. Ni J-Q, Zhou R, Czech B, Liu L-P, Holderbaum L, Yang-Zhou D, et al. A genome-scale shRNA resource for transgenic RNAi in Drosophila. Nat Methods. 2011;8: 405–407. doi:10.1038/nmeth.1592
99. Perkins LA, Holderbaum L, Tao R, Hu Y, Sopko R, McCall K, et al. The Transgenic RNAi Project at Harvard Medical School: Resources and Validation. Genetics. 2015;201: 843–852. doi:10.1534/genetics.115.180208
100. Shapiro-Kulnane L, Smolko AE, Salz HK. Maintenance of Drosophila germline stem cell sexual identity in oogenesis and tumorigenesis. Dev Camb Engl. 2015;142: 1073–1082. doi:10.1242/dev.116590
101. Chong JX, Buckingham KJ, Jhangiani SN, Boehm C, Sobreira N, Smith JD, et al. The Genetic Basis of Mendelian Phenotypes: Discoveries, Challenges, and Opportunities. Am J Hum Genet. 2015;97: 199–215. doi:10.1016/j.ajhg.2015.06.009
102. Koboldt DC, Steinberg KM, Larson DE, Wilson RK, Mardis ER. The next-generation sequencing revolution and its impact on genomics. Cell. 2013;155: 27–38. doi:10.1016/j.cell.2013.09.006
103. Cantley LC. The Phosphoinositide 3-Kinase Pathway. Science. 2002;296: 1655–1657. doi:10.1126/science.296.5573.1655
104. Jean S, Kiger AA. Classes of phosphoinositide 3-kinases at a glance. J Cell Sci. 2014;127: 923–928. doi:10.1242/jcs.093773
105. Devereaux K, Dall’Armi C, Alcazar-Roman A, Ogasawara Y, Zhou X, Wang F, et al. Regulation of mammalian autophagy by class II and III PI 3-kinases through PI3P synthesis. PloS One. 2013;8: e76405. doi:10.1371/journal.pone.0076405
106. Yoshioka K, Yoshida K, Cui H, Wakayama T, Takuwa N, Okamoto Y, et al. Endothelial PI3K-C2α, a class II PI3K, has an essential role in angiogenesis and vascular barrier function. Nat Med. 2012;18: 1560–1569. doi:10.1038/nm.2928
107. Leibiger B, Moede T, Uhles S, Barker CJ, Creveaux M, Domin J, et al. Insulin-feedback via PI3K-C2alpha activated PKBalpha/Akt1 is required for glucose-stimulated insulin secretion. FASEB J Off Publ Fed Am Soc Exp Biol. 2010;24: 1824–1837. doi:10.1096/fj.09-148072
108. Krag C, Malmberg EK, Salcini AE. PI3KC2α, a class II PI3K, is required for dynamin-independent internalization pathways. J Cell Sci. 2010;123: 4240–4250. doi:10.1242/jcs.071712
109. Falasca M, Maffucci T. Regulation and cellular functions of class II phosphoinositide 3-kinases. Biochem J. 2012;443: 587–601. doi:10.1042/BJ20120008
110. Behrends C, Sowa ME, Gygi SP, Harper JW. Network organization of the human autophagy system. Nature. 2010;466: 68–76. doi:10.1038/nature09204
111. Campa CC, Franco I, Hirsch E. PI3K-C2α: One enzyme for two products coupling vesicle trafficking and signal transduction. FEBS Lett. 2015;589: 1552–1558. doi:10.1016/j.febslet.2015.05.001
112. Franco I, Gulluni F, Campa CC, Costa C, Margaria JP, Ciraolo E, et al. PI3K Class II α Controls Spatially Restricted Endosomal PtdIns3P and Rab11 Activation to Promote Primary Cilium Function. Dev Cell. 2014;28: 647–658. doi:10.1016/j.devcel.2014.01.022
113. Posor Y, Eichhorn-Gruenig M, Puchkov D, Schöneberg J, Ullrich A, Lampe A, et al. Spatiotemporal control of endocytosis by phosphatidylinositol-3,4-bisphosphate. Nature. 2013;499: 233–237. doi:10.1038/nature12360
114. Laflamme N, Leblanc JF, Mailloux J, Faure N, Labrie F, Simard J. Mutation R96W in cytochrome P450c17 gene causes combined 17 alpha-hydroxylase/17-20-lyase deficiency in two French Canadian patients. J Clin Endocrinol Metab. 1996;81: 264–268. doi:10.1210/jcem.81.1.8550762
115. Martin RM, Lin CJ, Costa EMF, de Oliveira ML, Carrilho A, Villar H, et al. P450c17 deficiency in Brazilian patients: biochemical diagnosis through progesterone levels confirmed by CYP17 genotyping. J Clin Endocrinol Metab. 2003;88: 5739–5746. doi:10.1210/jc.2003-030988
116. Kurolap A, Orenstein N, Kedar I, Weisz Hubshman M, Tiosano D, Mory A, et al. Is one diagnosis the whole story? patients with double diagnoses. Am J Med Genet A. 2016;170: 2338–2348. doi:10.1002/ajmg.a.37799
117. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536: 285–291. doi:10.1038/nature19057
118. Kettleborough RNW, Busch-Nentwich EM, Harvey SA, Dooley CM, de Bruijn E, van Eeden F, et al. A systematic genome-wide analysis of zebrafish protein-coding gene function. Nature. 2013;496: 494–497. doi:10.1038/nature11992
119. Vanhaesebroeck B, Whitehead MA, Piñeiro R. Molecules in medicine mini-review: isoforms of PI3K in biology and disease. J Mol Med. 2016;94: 5–11. doi:10.1007/s00109-015-1352-5
120. Lupski JR, Belmont JW, Boerwinkle E, Gibbs RA. Clan genomics and the complex architecture of human disease. Cell. 2011;147: 32–43. doi:10.1016/j.cell.2011.09.008
121. Blair DR, Lyttle CS, Mortensen JM, Bearden CF, Jensen AB, Khiabanian H, et al. A Nondegenerate Code of Deleterious Variants in Mendelian Loci Contributes to Complex Disease Risk. Cell. 2013;155: 70–80. doi:10.1016/j.cell.2013.08.030
122. Lotta LA, Gulati P, Day FR, Payne F, Ongen H, van de Bunt M, et al. Integrative genomic analysis implicates limited peripheral adipose storage capacity in the pathogenesis of human insulin resistance. Nat Genet. 2016;49: 17–26. doi:10.1038/ng.3714
123. Semple RK, Savage DB, Cochran EK, Gorden P, O’Rahilly S. Genetic syndromes of severe insulin resistance. Endocr Rev. 2011;32: 498–514. doi:10.1210/er.2010-0020
124. Goes FS, McGrath J, Avramopoulos D, Wolyniec P, Pirooznia M, Ruczinski I, et al. Genome-wide association study of schizophrenia in Ashkenazi Jews. Am J Med Genet Part B Neuropsychiatr Genet Off Publ Int Soc Psychiatr Genet. 2015;168: 649–659. doi:10.1002/ajmg.b.32349
125. Ruderfer DM, Fanous AH, Ripke S, McQuillin A, Amdur RL, Schizophrenia Working Group of Psychiatric Genomics Consortium, et al. Polygenic dissection of diagnosis and clinical dimensions of bipolar disorder and schizophrenia. Mol Psychiatry. 2014;19: 1017–1024. doi:10.1038/mp.2013.138
126. Schizophrenia Psychiatric Genome-Wide Association Study (GWAS) Consortium. Genome-wide association study identifies five new schizophrenia loci. Nat Genet. 2011;43: 969–976. doi:10.1038/ng.940
127. Wood AR, Esko T, Yang J, Vedantam S, Pers TH, Gustafsson S, et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat Genet. 2014;46: 1173–1186. doi:10.1038/ng.3097
128. Lango Allen H, Estrada K, Lettre G, Berndt SI, Weedon MN, Rivadeneira F, et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010;467: 832–838. doi:10.1038/nature09410
129. Lettre G, Jackson AU, Gieger C, Schumacher FR, Berndt SI, Sanna S, et al. Identification of ten loci associated with height highlights new biological pathways in human growth. Nat Genet. 2008;40: 584–591. doi:10.1038/ng.125
130. Bökenkamp A, Ludwig M. The oculocerebrorenal syndrome of Lowe: an update. Pediatr Nephrol Berl Ger. 2016;31: 2201–2212. doi:10.1007/s00467-016-3343-3
131. Staiano L, De Leo MG, Persico M, De Matteis MA. Mendelian disorders of PI metabolizing enzymes. Biochim Biophys Acta BBA - Mol Cell Biol Lipids. 2015;1851: 867–881. doi:10.1016/j.bbalip.2014.12.001
132. Mehta ZB, Pietka G, Lowe M. The cellular and physiological functions of the Lowe syndrome protein OCRL1. Traffic Cph Den. 2014;15: 471–487. doi:10.1111/tra.12160
133. Wiessner M, Roos A, Munn CJ, Viswanathan R, Whyte T, Cox D, et al. Mutations in INPP5K, Encoding a Phosphoinositide 5-Phosphatase, Cause Congenital Muscular Dystrophy with Cataracts and Mild Cognitive Impairment. Am J Hum Genet. 2017;100: 523–536. doi:10.1016/j.ajhg.2017.01.024
134. Osborn DPS, Pond HL, Mazaheri N, Dejardin J, Munn CJ, Mushref K, et al. Mutations in INPP5K Cause a Form of Congenital Muscular Dystrophy Overlapping Marinesco-Sjögren Syndrome and Dystroglycanopathy. Am J Hum Genet. 2017;100: 537–545. doi:10.1016/j.ajhg.2017.01.019
135. Bothwell SP, Farber LW, Hoagland A, Nussbaum RL. Species-specific difference in expression and splice-site choice in Inpp5b, an inositol polyphosphate 5-phosphatase paralogous to the enzyme deficient in Lowe Syndrome. Mamm Genome. 2010;21: 458–466. doi:10.1007/s00335-010-9281-7
136. Mountford JK, Petitjean C, Putra HWK, McCafferty JA, Setiabakti NM, Lee H, et al. The class II PI 3-kinase, PI3KC2α, links platelet internal membrane structure to shear-dependent adhesive function. Nat Commun. 2015;6: 6535. doi:10.1038/ncomms7535
137. Bielas SL, Silhavy JL, Brancati F, Kisseleva MV, Al-Gazali L, Sztriha L, et al. Mutations in INPP5E, encoding inositol polyphosphate-5-phosphatase E, link phosphatidyl inositol signaling to the ciliopathies. Nat Genet. 2009;41: 1032–1036. doi:10.1038/ng.423
138. Chen A, Tiosano D, Guran T, Baris HN, Bayram Y, Mory A, et al. Mutations in the mitochondrial ribosomal protein MRPS22 lead to primary ovarian insufficiency. Hum Mol Genet. 2018; doi:10.1093/hmg/ddy098
139. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinforma Oxf Engl. 2009;25: 1754–1760. doi:10.1093/bioinformatics/btp324
140. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20: 1297–1303. doi:10.1101/gr.107524.110
141. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38: e164. doi:10.1093/nar/gkq603
142. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007;35: D61–65. doi:10.1093/nar/gkl842
143. Landrum MJ, Lee JM, Benson M, Brown G, Chao C, Chitipiralla S, et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 2016;44: D862–868. doi:10.1093/nar/gkv1222
144. Liu X, Jian X, Boerwinkle E. dbNSFP: a lightweight database of human nonsynonymous SNPs and their functional predictions. Hum Mutat. 2011;32: 894–899. doi:10.1002/humu.21517
145. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nature Protocols [Internet]. [cited 21 Mar 2016]. Available: http://www.nature.com/nprot/journal/v4/n7/abs/nprot.2009.86.html
146. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nat Methods. 2010;7: 248–249. doi:10.1038/nmeth0410-248
147. Neveling K, Feenstra I, Gilissen C, Hoefsloot LH, Kamsteeg E-J, Mensenkamp AR, et al. A post-hoc comparison of the utility of sanger sequencing and exome sequencing for the diagnosis of heterogeneous diseases. Hum Mutat. 2013;34: 1721–1726. doi:10.1002/humu.22450
148. Wei Z, Wang W, Hu P, Lyon GJ, Hakonarson H. SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data. Nucleic Acids Res. 2011;39: e132. doi:10.1093/nar/gkr599
149. Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv:1207.3907v2 [q-bio.GN]. 2012.
150. Rimmer A, Phan H, Mathieson I, Iqbal Z, Twigg SRF, WGS500 Consortium, et al. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nat Genet. 2014;46: 912–918. doi:10.1038/ng.3036
151. Hauer NN, Popp B, Schoeller E, Schuhmann S, Heath KE, Hisado-Oliva A, et al. Clinical relevance of systematic phenotyping and exome sequencing in patients with short stature. Genet Med. 2017; doi:10.1038/gim.2017.159
152. Wang K, Li M, Hadley D, Liu R, Glessner J, Grant SFA, et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 2007;17: 1665–1674. doi:10.1101/gr.6861907
153. Copy number variation detection and genotyping from exome sequence data [Internet]. [cited 28 Aug 2017]. Available: http://genome.cshlp.org/content/early/2012/05/14/gr.138115.112.abstract
154. Pfundt R, Del Rosario M, Vissers LELM, Kwint MP, Janssen IM, de Leeuw N, et al. Detection of clinically relevant copy-number variants by exome sequencing in a large cohort of genetic disorders. Genet Med Off J Am Coll Med Genet. 2017;19: 667–675. doi:10.1038/gim.2016.163
155. Buchner DA, Charrier A, Srinivasan E, Wang L, Paulsen MT, Ljungman M, et al. Zinc Finger Protein 407 (ZFP407) Regulates Insulin-stimulated Glucose Uptake and Glucose Transporter 4 (Glut4) mRNA. J Biol Chem. 2015;290: 6376–6386. doi:10.1074/jbc.M114.623736
156. Knaup KX, Guenther R, Stoeckert J, Monti JM, Eckardt K-U, Wiesener MS. HIF is not essential for suppression of experimental tumor growth by mTOR inhibition. J Cancer. 2017;8: 1809–1817. doi:10.7150/jca.16486
157. Neff MM, Neff JD, Chory J, Pepper AE. dCAPS, a simple technique for the genetic analysis of single nucleotide polymorphisms: experimental applications in Arabidopsis thaliana genetics. Plant J Cell Mol Biol. 1998;14: 387–392.
158. Neff MM, Turk E, Kalishman M. Web-based primer design for single nucleotide polymorphism analysis. Trends Genet TIG. 2002;18: 613–615.
159. Fu W, O’Connor TD, Akey JM. Genetic architecture of quantitative traits and complex diseases. Curr Opin Genet Dev. 2013;23: 678–683. doi:10.1016/j.gde.2013.10.008
160. Bloom JS, Ehrenreich IM, Loo WT, Lite T-LV, Kruglyak L. Finding the sources of missing heritability in a yeast cross. Nature. 2013;494: 234–237. doi:10.1038/nature11867
161. Jasnos L, Korona R. Epistatic buffering of fitness loss in yeast double deletion strains. Nat Genet. 2007;39: 550–554. doi:10.1038/ng1986
162. Lehner B, Crombie C, Tischler J, Fortunato A, Fraser AG. Systematic mapping of genetic interactions in Caenorhabditis elegans identifies common modifiers of diverse signaling pathways. Nat Genet. 2006;38: 896–903. doi:10.1038/ng1844
163. Huang W, Richards S, Carbone MA, Zhu D, Anholt RRH, Ayroles JF, et al. Epistasis dominates the genetic architecture of Drosophila quantitative traits. Proc Natl Acad Sci. 2012;109: 15553–15559. doi:10.1073/pnas.1213423109
164. Shao H, Burrage LC, Sinasac DS, Hill AE, Ernest SR, O’Brien W, et al. Genetic architecture of complex traits: Large phenotypic effects and pervasive epistasis. Proc Natl Acad Sci. 2008;105: 19910–19914. doi:10.1073/pnas.0810388105
165. Mackay TFC. Epistasis and Quantitative Traits: Using Model Organisms to Study Gene-Gene Interactions. Nat Rev Genet. 2014;15: 22–33. doi:10.1038/nrg3627
166. Huang W, Mackay TFC. The Genetic Architecture of Quantitative Traits Cannot Be Inferred from Variance Component Analysis. PLoS Genet. 2016;12: e1006421. doi:10.1371/journal.pgen.1006421
167. Wood AR, Tuke MA, Nalls MA, Hernandez DG, Bandinelli S, Singleton AB, et al. Another explanation for apparent epistasis. Nature. 2014;514: E3–E5. doi:10.1038/nature13691
168. Fish AE, Capra JA, Bush WS. Are Interactions between cis-Regulatory Variants Evidence for Biological Epistasis or Statistical Artifacts? Am J Hum Genet. 2016;99: 817–830. doi:10.1016/j.ajhg.2016.07.022
169. Nadeau JH, Singer JB, Matin A, Lander ES. Analysing complex genetic traits with chromosome substitution strains. Nat Genet. 2000;24: 221–225. doi:10.1038/73427
170. Darvasi A, Soller M. Advanced intercross lines, an experimental population for fine genetic mapping. Genetics. 1995;141: 1199–1207.
171. Talbot CJ, Nicod A, Cherny SS, Fulker DW, Collins AC, Flint J. High-resolution mapping of quantitative trait loci in outbred mice. Nat Genet. 1999;21: 305–308. doi:10.1038/6825
172. Sackton TB, Hartl DL. Genotypic Context and Epistasis in Individuals and Populations. Cell. 2016;166: 279–287. doi:10.1016/j.cell.2016.06.047
173. Chow CY. Bringing genetic background into focus. Nat Rev Genet. 2016;17: 63–64. doi:10.1038/nrg.2015.9
174. Buchner DA, Nadeau JH. Contrasting genetic architectures in different mouse reference populations used for studying complex traits. Genome Res. 2015;25: 775–791. doi:10.1101/gr.187450.114
175. Rapp JP, Garrett MR, Deng AY. Construction of a double congenic strain to prove an epistatic interaction on blood pressure between rat chromosomes 2 and 10. J Clin Invest. 1998;101: 1591–1595. doi:10.1172/JCI2251
176. Mackay TF, Moore JH. Why epistasis is important for tackling complex human disease genetics. Genome Med. 2014;6: 125. doi:10.1186/gm561
177. Brown MS, Goldstein JL. Selective versus total insulin resistance: a pathogenic paradox. Cell Metab. 2008;7: 95–96. doi:10.1016/j.cmet.2007.12.009
178. Stoppa-Lyonnet D. The biological effects and clinical implications of BRCA mutations: where do we go from here? Eur J Hum Genet EJHG. 2016;24 Suppl 1: S3–9. doi:10.1038/ejhg.2016.93
179. Wang K, Lim HY, Shi S, Lee J, Deng S, Xie T, et al. Genomic landscape of copy number aberrations enables the identification of oncogenic drivers in hepatocellular carcinoma. Hepatol Baltim Md. 2013;58: 706–717. doi:10.1002/hep.26402
180. Kan Z, Zheng H, Liu X, Li S, Barber TD, Gong Z, et al. Whole-genome sequencing identifies recurrent mutations in hepatocellular carcinoma. Genome Res. 2013;23: 1422–1433. doi:10.1101/gr.154492.113
181. Gerke J, Lorenz K, Cohen B. Genetic Interactions Between Transcription Factors Cause Natural Variation in Yeast. Science. 2009;323: 498–501. doi:10.1126/science.1166426
182. Gerke J, Lorenz K, Ramnarine S, Cohen B. Gene–Environment Interactions at Nucleotide Resolution. PLOS Genet. 2010;6: e1001144. doi:10.1371/journal.pgen.1001144
183. Sawcer S, Franklin RJM, Ban M. Multiple sclerosis genetics. Lancet Neurol. 2014;13: 700–709. doi:10.1016/S1474-4422(14)70041-9
184. Buchner DA, Burrage LC, Hill AE, Yazbek SN, O’Brien WE, Croniger CM, et al. Resistance to diet-induced obesity in mice with a single substituted chromosome. Physiol Genomics. 2008;35: 116–122. doi:10.1152/physiolgenomics.00033.2008
185. Hill-Baskin AE, Markiewski MM, Buchner DA, Shao H, DeSantis D, Hsiao G, et al. Diet-induced hepatocellular carcinoma in genetically predisposed mice. Hum Mol Genet. 2009;18: 2975–2988. doi:10.1093/hmg/ddp236
186. Polderman TJC, Benyamin B, de Leeuw CA, Sullivan PF, van Bochoven A, Visscher PM, et al. Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nat Genet. 2015;47: 702–709. doi:10.1038/ng.3285
187. Siegal ML, Bergman A. Waddington’s canalization revisited: Developmental stability and evolution. Proc Natl Acad Sci. 2002;99: 10528–10532. doi:10.1073/pnas.102303999
188. Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Pagé N, et al. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science. 2001;294: 2364–2368. doi:10.1126/science.1065810
189. Sailer ZR, Harms MJ. Detecting High-Order Epistasis in Nonlinear Genotype-Phenotype Maps. Genetics. 2017;205: 1079–1088. doi:10.1534/genetics.116.195214
190. Lagator M, Igler C, Moreno AB, Guet CC, Bollback JP. Epistatic Interactions in the Arabinose Cis-Regulatory Element. Mol Biol Evol. 2016;33: 761–769. doi:10.1093/molbev/msv269
191. Taylor MB, Ehrenreich IM. Higher-order genetic interactions and their contribution to complex traits. Trends Genet TIG. 2015;31: 34–40. doi:10.1016/j.tig.2014.09.001
192. Nadeau JH, Forejt J, Takada T, Shiroishi T. Chromosome substitution strains: gene discovery functional analysis and systems studies. Mamm Genome Off J Int Mamm Genome Soc. 2012;23: 693–705. doi:10.1007/s00335-012-9426-y
193. Yazbek SN, Buchner DA, Geisinger JM, Burrage LC, Spiezio SH, Zentner GE, et al. Deep congenic analysis identifies many strong, context-dependent QTLs, one of which, Slc35b4, regulates obesity and glucose homeostasis. Genome Res. 2011;21: 1065–1073. doi:10.1101/gr.120741.111
194. Buchner DA, Geisinger JM, Glazebrook PA, Morgan MG, Spiezio SH, Kaiyala KJ, et al. The juxtaparanodal proteins CNTNAP2 and TAG1 regulate diet-induced obesity. Mamm Genome Off J Int Mamm Genome Soc. 2012;23: 431–442. doi:10.1007/s00335-012-9400-8
195. Waddington CH. Canalization of development and the inheritance of acquired characters. Nature. 1942;150: 563–565.
196. Tyler AL, Donahue LR, Churchill GA, Carter GW. Weak Epistasis Generally Stabilizes Phenotypes in a Mouse Intercross. PLOS Genet. 2016;12: e1005805. doi:10.1371/journal.pgen.1005805
197. Gonzalez PN, Pavlicev M, Mitteroecker P, Pardo-Manuel de Villena F, Spritz RA, Marcucio RS, et al. Genetic structure of phenotypic robustness in the collaborative cross mouse diallel panel. J Evol Biol. 2016;29: 1737–1751. doi:10.1111/jeb.12906
198. Costanzo M, VanderSluis B, Koch EN, Baryshnikova A, Pons C, Tan G, et al. A global genetic interaction network maps a wiring diagram of cellular function. Science. 2016;353. doi:10.1126/science.aaf1420
199. Guerrero RF, Muir CD, Josway S, Moyle LC. Pervasive antagonistic interactions among hybrid incompatibility loci. PLoS Genet. 2017;13: e1006817. doi:10.1371/journal.pgen.1006817
200. Bastepe M, Fröhlich LF, Linglart A, Abu-Zahra HS, Tojo K, Ward LM, et al. Deletion of the NESP55 differentially methylated region causes loss of maternal GNAS imprints and pseudohypoparathyroidism type Ib. Nat Genet. 2005;37: 25–27. doi:10.1038/ng1487
201. Chun S, Fay JC. Identification of deleterious mutations within three human genomes. Genome Res. 2009;19: 1553–1561. doi:10.1101/gr.092619.109
202. Segrè D, DeLuna A, Church GM, Kishony R. Modular epistasis in yeast metabolism. Nat Genet. 2005;37: 77–83. doi:10.1038/ng1489
203. Carlborg O, Jacobsson L, Ahgren P, Siegel P, Andersson L. Epistasis and the release of genetic variation during long-term selection. Nat Genet. 2006;38: 418–420. doi:10.1038/ng1761
204. Tyler AL, Ji B, Gatti DM, Munger SC, Churchill GA, Svenson KL, et al. Epistatic Networks Jointly Influence Phenotypes Related to Metabolic Disease and Gene Expression in Diversity Outbred Mice. Genetics. 2017;206: 621–639. doi:10.1534/genetics.116.198051
205. Weigelt B, Reis-Filho JS. Epistatic interactions and drug response. J Pathol. 2014;232: 255–263. doi:10.1002/path.4265
206. Wong A. Epistasis and the Evolution of Antimicrobial Resistance. Front Microbiol. 2017;8. doi:10.3389/fmicb.2017.00246
207. Fox J, Weisberg S. An R Companion to Applied Regression, Second Edition. Sage Publications; 2011.
208. Dewey M. metap: meta-analysis of significance values. R package version 0.7. 2016.
209. Westfall PH, Young SS. Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. Wiley; 1993.
210. Chhangawala S, Rudy G, Mason CE, Rosenfeld JA. The impact of read length on quantification of differentially expressed genes and splice junction detection. Genome Biol. 2015;16: 131. doi:10.1186/s13059-015-0697-y
211. Andrews S. FastQC: a quality control tool for high throughput sequence data [Internet]. 2010. Available: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
212. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14: R36. doi:10.1186/gb-2013-14-4-r36
213. Munger SC, Raghupathy N, Choi K, Simons AK, Gatti DM, Hinerfeld DA, et al. RNA-Seq Alignment to Individualized Genomes Improves Transcript Abundance Estimates in Multiparent Populations. Genetics. 2014;198: 59–73. doi:10.1534/genetics.114.165886
214. Keane TM, Goodstadt L, Danecek P, White MA, Wong K, Yalcin B, et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature. 2011;477: 289–294. doi:10.1038/nature10413
215. Anders S, Pyl PT, Huber W. HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31: 166–169. doi:10.1093/bioinformatics/btu638
216. Mudge JM, Harrow J. Creating reference gene annotation for the mouse C57BL6/J genome assembly. Mamm Genome Off J Int Mamm Genome Soc. 2015;26: 366–378. doi:10.1007/s00335-015-9583-x
217. Leek JT. svaseq: removing batch effects and other unwanted noise from sequencing data. Nucleic Acids Res. 2014;42: e161–e161. doi:10.1093/nar/gku864
218. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26: 139–140. doi:10.1093/bioinformatics/btp616
219. Sun L, Craiu RV, Paterson AD, Bull SB. Stratified false discovery control for large-scale hypothesis testing with application to genome-wide association studies. Genet Epidemiol. 2006;30: 519–530. doi:10.1002/gepi.20164
220. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J R Stat Soc Ser B Methodol. 1995;57: 289–300.
221. Aubourg P, Adamsbaum C, Lavallard-Rousseau M-C, Rocchiccioli F, Cartier N, Jambaque I, et al. A Two-Year Trial of Oleic and Erucic Acids (“Lorenzo’s Oil”) as Treatment for Adrenomyeloneuropathy. N Engl J Med. 1993;329: 745–752. doi:10.1056/NEJM199309093291101
222. Putative X-linked adrenoleukodystrophy gene shares unexpected homology with ABC transporters. Nature [Internet]. [cited 11 Jun 2018]. Available: https://www.nature.com/articles/361726a0
223. SPARK Consortium. SPARK: A US Cohort of 50,000 Families to Accelerate Autism Research. Neuron. 2018;97: 488–493. doi:10.1016/j.neuron.2018.01.015
224. Lim JK, McDermott DH, Lisco A, Foster GA, Krysztof D, Follmann D, et al. CCR5 Deficiency is a Risk Factor for Early Clinical Manifestations of West Nile Virus Infection, but not for Infection per se. J Infect Dis. 2010;201: 178–185. doi:10.1086/649426
225. Golubnitschaja O, Kinkorova J, Costigliola V. Predictive, Preventive and Personalised Medicine as the hardcore of “Horizon 2020”: EPMA position paper. EPMA J. 2014;5: 6. doi:10.1186/1878-5085-5-6
226. Hripcsak G, Albers DJ. Next-generation phenotyping of electronic health records. J Am Med Inform Assoc JAMIA. 2013;20: 117–121. doi:10.1136/amiajnl-2012-001145
227. Zheng T, Xie W, Xu L, He X, Zhang Y, You M, et al. A machine learning-based framework to identify type 2 diabetes through electronic health records. Int J Med Inf. 2017;97: 120–127. doi:10.1016/j.ijmedinf.2016.09.014
228. Dai W, Brisimi TS, Adams WG, Mela T, Saligrama V, Paschalidis IC. Prediction of hospitalization due to heart diseases by supervised learning methods. Int J Med Inf. 2015;84: 189–197. doi:10.1016/j.ijmedinf.2014.10.002
229. Kermany DS, Goldbaum M, Cai W, Valentim CCS, Liang H, Baxter SL, et al. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell. 2018;172: 1122–1131.e9. doi:10.1016/j.cell.2018.02.010
230. Pantel JT, Zhao M, Mensah MA, Hajjir N, Hsieh T-C, Hanani Y, et al. Advances in computer-assisted syndrome recognition by the example of inborn errors of metabolism. J Inherit Metab Dis. 2018;41: 533–539. doi:10.1007/s10545-018-0174-3
231. Gurovich Y, Hanani Y, Bar O, Fleischer N, Gelbman D, Basel-Salmon L, et al. DeepGestalt - Identifying Rare Genetic Syndromes Using Deep Learning. arXiv:1801.07637 [cs]. 2018. Available: http://arxiv.org/abs/1801.07637
232. Reanalysis of clinical whole-exome sequence data yields multiple new diagnoses. Am J Med Genet A. 176: 264–265. doi:10.1002/ajmg.a.38608
233. Costain G, Jobling R, Walker S, Reuter MS, Snell M, Bowdin S, et al. Periodic reanalysis of whole-genome sequencing data enhances the diagnostic advantage over standard clinical genetic testing. Eur J Hum Genet. 2018;26: 740–744. doi:10.1038/s41431-018-0114-6
234. Barrero MJ, Boué S, Belmonte JCI. Epigenetic Mechanisms that Regulate Cell Identity. Cell Stem Cell. 2010;7: 565–570. doi:10.1016/j.stem.2010.10.009
235. Berdasco M, Esteller M. Genetic syndromes caused by mutations in epigenetic genes. Hum Genet. 2013;132: 359–383. doi:10.1007/s00439-013-1271-x
236. Klein CJ, Botuyan M-V, Wu Y, Ward CJ, Nicholson GA, Hammans S, et al. Mutations in DNMT1 cause hereditary sensory neuropathy with dementia and hearing loss. Nat Genet. 2011;43: 595–600. doi:10.1038/ng.830
237. Ling C, Poulsen P, Simonsson S, Rönn T, Holmkvist J, Almgren P, et al. Genetic and epigenetic factors are associated with expression of respiratory chain component NDUFB6 in human skeletal muscle. J Clin Invest. 2007;117: 3427–3435. doi:10.1172/JCI30938
238. Roseboom T, de Rooij S, Painter R. The Dutch famine and its long-term consequences for adult health. Early Hum Dev. 2006;82: 485–491. doi:10.1016/j.earlhumdev.2006.07.001
239. Krauss-Etschmann S, Meyer KF, Dehmel S, Hylkema MN. Inter- and transgenerational epigenetic inheritance: evidence in asthma and COPD? Clin Epigenetics. 2015;7. doi:10.1186/s13148-015-0085-1
240. Flom JD, Ferris JS, Liao Y, Tehranifar P, Richards CB, Cho YH, et al. Prenatal smoke exposure and genomic DNA methylation in a multiethnic birth cohort. Cancer Epidemiol Biomark Prev Publ Am Assoc Cancer Res Cosponsored Am Soc Prev Oncol. 2011;20: 2518–2523. doi:10.1158/1055-9965.EPI-11-0553