Washington University in St. Louis Washington University Open Scholarship

Arts & Sciences Electronic Theses and Dissertations Arts & Sciences

Spring 5-15-2020

Genetic Interactions and Maternal Modulate Congenital Heart Disease Risk

Ehiole Ogboma Akhirome Washington University in St. Louis

Follow this and additional works at: https://openscholarship.wustl.edu/art_sci_etds

Part of the Genetics Commons, and the Systems Engineering Commons

Recommended Citation Akhirome, Ehiole Ogboma, "Genetic Interactions and Maternal Genes Modulate Congenital Heart Disease Risk" (2020). Arts & Sciences Electronic Theses and Dissertations. 2159. https://openscholarship.wustl.edu/art_sci_etds/2159

This Dissertation is brought to you for free and open access by the Arts & Sciences at Washington University Open Scholarship. It has been accepted for inclusion in Arts & Sciences Electronic Theses and Dissertations by an authorized administrator of Washington University Open Scholarship. For more information, please contact [email protected]. WASHINGTON UNIVERSITY IN ST. LOUIS

Division of Biology and Biomedical Sciences Development, Regenerative and Stem Cell Biology

Dissertation Examination Committee: Patrick Jay, Chair Shin-ichiro Imai Heather Lawson Michael Province Stacey Rentschler

Genetic Interactions and Maternal Genes Modulate

Congenital Heart Disease Risk

by

Ehiole Akhirome

A dissertation presented to The Graduate School of Washington University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

May 2020 St. Louis, Missouri

© 2020 Ehiole Akhirome

Table of Contents

List of Figures ...... iv List of Tables ...... vi Acknowledgements ...... vii Abstract ...... x Chapter 1 ...... 1 Introduction ...... 2 Goals of the Thesis...... 16 Figures...... 17 Table ...... 21 References ...... 23 Chapter 2 ...... 30 Introduction ...... 32 Results ...... 35 Discussion ...... 39 Methods...... 42 Figures...... 51 Tables ...... 65 References ...... 66 Chapter 3 ...... 69 Introduction ...... 70 Results ...... 73 Discussion ...... 79 Methods...... 83 Figures...... 89 Tables ...... 102 References ...... 111 Chapter 4 ...... 114

ii

Synthesis and implications for CHD research ...... 115 References ...... 118 Appendix I ...... 119 Curriculum Vitae ...... 135

iii

List of Figures

Chapter 1: Introduction

Figure 1.1 The causes of congenital heart disease ...... 17

Figure 1.2 Advanced intercross ...... 18

Figure 1.3 Coronal section of a wildtype mouse heart ...... 19

Figure 1.4 Reciprocal ovary transplants between young and old mice shows that the maternal age effect is a function of old mothers, not old eggs ...... 20

Chapter 2: Genetic interactions shape the phenotype and severity of congenital heart defects

Figure 2.1 The severity and incidence of a heart defect are negatively correlated...... 51

Figure 2.2 Breeding scheme for inbred strain intercrosses ...... 52

Figure 2.3 The quantitative effect of G×GNkx interactions on risk correlates with the severity of a heart defect ...... 53

Figure 2.4 The correlation between defect severity and effect size is not attributable to trivial explanations ...... 55

Figure 2.5 G×G×GNkx effects correlate with the severity of a heart defect ...... 56

Figure 2.6 Every possible interaction between G×G×GNkx modifiers for each defect type ...... 58

Figure 2.7 Genetic compatibility between two loci involved in G×G×GNkx interactions lowers the risk of heart defects in Nkx2-5+/- mice ...... 63

Chapter 3: High-resolution mapping of the maternal age-associated risk of congenital heart disease

Figure 3.1 Genetic variation in the maternal age effect indicates it is a quantitative trait...... 89

Figure 3.2 Breeding scheme for the F56 LG/J×SM/J advanced intercross line ...... 90 iv

Figure 3.3 Experimental cross diagram for the F56 mapping population ...... 91

Figure 3.4 ASD and VSD risk is subject to strong maternal age effects ...... 92

Figure 3.5 The effect of maternal age on each defect varies broadly ...... 93

Figure 3.6 Estimates of the maternal age effect, per mother ...... 94

Figure 3.7 Distribution of 18 GBS SNPs compared to the Reference ...... 95

Figure 3.8 Manhattan plots of the maternal age effect on ASD and Mus. VSD show many loci, but no obvious overlap ...... 96

Figure 3.9 Combining the ASD and Mus. VSD analysis reveals common maternal age loci..... 98

Figure 3.10 Manhattan plots of the maternal age effect on Mbr. VSD and AVSD show only 1 suggestive locus ...... 99

Figure 3.11 Analyzing the maternal age effect for all defects combined reveals the best consensus candidate loci ...... 100

Figure 3.12 Effect plots illustrating the maternal age effect for the most significant loci ...... 101

v

List of Tables

Chapter 1: Introduction

Table 1.1 Congenital heart disease-associated genes ...... 21

Chapter 2: Genetic interactions shape the phenotype and severity of congenital heart defects

Table 2.1 Modifier loci by cross and defect type ...... 65

Chapter 3: High-resolution mapping of the maternal age-associated risk of congenital heart disease

Table 3.1 P-values from the GLM for between-mother variation in the age effect ...... 102

Table 3.2 List of LSAI mothers genotyped with the Illumina GigaMUGA array ...... 103

Table 3.3 Number of LSAI mothers that did not produce a pup with a given heart defect ...... 104

Table 3.4 List of LSAI mothers without DNA samples for genotyping ...... 105

Table 3.5 Counts of LSAI mothers and pups used in the various mapping analyses ...... 106

Table 3.6 All suggestive and significant loci from all mapping analyses of the maternal age effect on CHD risk ...... 107

Table 3.7 All -coding genes from the Chr. 3 loci ...... 109

Table 3.8 All protein-coding genes from the ASD Chr. 10 locus ...... 110

vi

Acknowledgements

If it takes a village to raise a child, it most certainly takes a village for that child to earn a

PhD. Here are some people from my village.

First, I thank my family. My parents, Godwin and Omoh, for their hardwork and sacrifices that gave me an opportunity to be here. I thank my brothers, Ezeomo and Obaseki, for their constant support since the beginning. I also thank my extended family in Nigeria who always had me in their thoughts and prayers.

Second, I thank all the fantastic educators that I have had throughout the years, who helped develop my love for science: Mrs. Bridget Townsend, Brent Csehy, Pam Perry, Phillip Powell,

Greg Marr, José Soria, and Paul Spearman. I owe my deepest gratitude to Nicole Gerardo, who gave me my first shot as a scientist and let me work independently as a bright-eyed college freshman. Without her, I would not have chosen this path.

Third, I thank my many friends, many of whom are like family. The epic adventures we have together are the perfect respite to hard work. As Yo Gotti said (paraphrase), I’m not going to shout out 1000 people’s names; if you support me, I already know it and if I support you, you already know it.

Fourth, I want to express my most sincere gratitude to all the final members of the Jay

Laboratory. Suk Dev Regmi, Nelson Ugwu, Yidan Qin, Paul Krucylak and Patrick Jay. I will always cherish all the good times we shared.

vii

This work was supported by the WashU MSTP training grant an NIH predoctoral training grant (T32 HL007873) and a Ruth L. Kirschstein National Research Service Award (F30

HL136077).

Ehiole Akhirome

Washington University in St. Louis

May 2020

viii

Dedication

This dissertation is dedicated to my incredible fiancé, Daniela, who has been with me since the

beginning of this journey called graduate school.

A new scientific truth does not triumph by convincing its opponents and making them see the

, but rather because its opponents eventually die, and a new generation grows up that is

familiar with it.

- Max Karl Ernst Ludwig Planck

ix

ABSTRACT OF THE DISSERTATION

Genetic Interactions and Maternal Genes Modulate Congenital Heart Disease Risk

by

Ehiole Akhirome

Doctor of Philosophy in Biology and Biomedical Sciences

Developmental, Regenerative, and Stem Cell Biology

Washington University in St. Louis, 2020

Associate Professor Patrick Jay, Chair

Congenital heart disease (CHD) is the most common congenital anomaly, which makes it a leading cause of infant mortality. Congenital heart defects are a cluster of distinct developmental malformations that affect the vasculature, musculature and organization of the heart, each with varying clinical severity. Although medical and surgical advances have reduced CHD mortality in newborns and children, these patients grow up and many experience serious morbidity and early mortality. The first step toward reducing this burden is to understand the causes of CHD.

Surprisingly, environmental insults and de novo mutations are estimated to explain less than one- third of CHD cases. In many cases, even when a vital cardiac is mutated, a heart defect does not occur. This highlights the critical role of genetic and environmental modifiers in CHD pathogenesis. Attempts to identify these modifiers have had marginal success in humans. This motivated us to model CHD in mice, in which we can control the effects of environment and genetics.

Using nearly 20,000 Nkx2-5 heterozygous mutant mice from multiple inbred strain crosses, my work provides three key findings that describe how genetic and environmental risk factors x modulate CHD risk. First, severe heart defects are rare because they require interactions between multiple risk alleles to manifest disease. Contrarily, mild heart defects can be caused by the Nkx2-

5 mutation alone, which allows these defects to be common. Second, genetic robustness to deleterious mutations can result from well-integrated or coadapted genetic networks. In our mouse model, we found that epistatic interaction effects tend to suppress heart defect risk when the interacting alleles originate from the ancestral mouse strain. This suggests that the incomplete penetrance of human CHD-associated mutations is a result of varying levels of robustness to disease across individuals. Third, there is significant genetic variation in the maternal age associated risk of CHD, suggesting that the underlying genes can be identified. We recapitulated the human maternal age risk using a 56th generation advanced intercross mouse mother population and identified one genome-wide significant locus that modulates the age effect across different heart defect types. Modulating the associated molecular pathway may become a fruitful therapeutic target to suppress CHD risk.

In conclusion, my work has uncovered multiple factors that contribute to congenital heart disease risk in humans. The importance of epistasis in CHD risk emphasizes the need to consider oligogenic disease models in whole-exome/genome and clinical genetics studies of CHD.

Furthermore, maternal effects such as the maternal age effect may help identify modifiable molecular pathways that can suppress CHD risk in human populations. Future studies on the maternal age effect will focus on finalizing our statistical models and validating candidate genes in animal models.

xi

Chapter 1

1

Introduction

Congenital heart disease (CHD) makes up one-third of all congenital defects, making it the most common birth defect. This does not include the number of cases that cause miscarriages and stillborn births. Outcomes for CHD cases have improved drastically over the past several decades because of innovative surgical techniques. However, the surgical interventions only delay morbidity and mortality. Studies show two-thirds of CHD patients are now adults and the number of hospital admissions for adult CHD complications exceed those of pediatric cases1,2. This highlights the need for new strategies that can prevent CHD altogether. Although molecular genetic techniques have been useful in identifying high penetrance CHD genes, most cases of CHD are sporadic and not associated with classical cardiac developmental genes. A better approach would be to identify genetic and molecular mechanisms that can be modified to reduce CHD risk.

This introduction chapter will provide a historical perspective of CHD genetics and move toward present day techniques that are used to investigate complex genetic traits. Finally, this chapter will discuss how these concepts can be applied toward understanding the genetic determinants of CHD and potentially finding preventative interventions.

A historical perspective on the genetics of congenital heart disease

The earliest records of congenital heart defects date back to the 1600s, but detailed studies did not begin until the 1800s3. In 1858, physician Thomas Bevill Peacock wrote the first comprehensive study on congenital heart disease (CHD) entitled, “On Malformations of the

Human Heart.” In this monograph, Peacock was the first to suggest a hereditary predisposition to

CHD, even before Mendel published his treatise on genetic inheritance4,5. Despite Peacock’s prescient assertion, other cardiologists focused on finding causative environmental factors or

2 developing surgical interventions because genetic studies were not feasible6. It was not until the mid-1900s that genetics became a focus in CHD research. In 1968, James Nora, a prominent pediatric cardiologist, proposed a multifactorial inheritance model of CHD7. He hypothesized that a heart defect is caused by a specific mixture of susceptibility genes combined with certain environmental stressors in the mother that shift an elevated genetic risk toward a true defect. This is still a popular hypothesis among pediatric cardiologists today.

Helen Taussig, founder of the field of pediatric cardiology and co-inventor of the Blalock-

Thomas-Taussig shunt, did not completely agree with Nora. She found that human populations worldwide all had similar rates of CHD, so she discounted the influence of specific environmental stressors3. Furthermore, she compiled data showing that other mammals developed CHD at similar rates as human. At the end of her career, she proposed the idea that genetic elements common to all mammals influenced the risk of CHDs and the defects we see today are “primeval hearts—

‘stepping stones’ in the evolution of the ‘normal’ heart”. She supports this idea by stating how certain heart defects do not reduce fitness as much, and therefore may permit the underlying genetic variants to escape selection. This theory is not widely popular today, but it spurred others to study cardiac development in other organisms with the hopes that their findings will recapitulate human development.

In the present age of molecular genetics, investigators have benefitted greatly from animal models. Animal models have facilitated the discovery of various critical transcription factors. In

1993, Rolf Bodmer identified the gene, tinman, as a key factor responsible for cardiac specification in the drosophila visceral mesoderm8. Just a few months later, the mammalian orthologue, Nkx2-5 (Csx), was identified in mice9,10. Soon, several other cardiac transcription factors were identified in animals. It was not until 1998 that genome-wide linkage analysis proved 3 that NKX2-5 gene mutations caused CHD in humans11. Since then, more studies have linked over

300 genes to the pathogenesis of CHD (Table 1.1). Surprisingly, only about 25% of CHD cases harbor de novo mutations in these genes12, suggesting that genetic polymorphisms in other genes are functionally significant in CHD pathogenesis (Figure 1.1).

The inheritance pattern for CHD has been difficult to decipher, partly because of variable penetrance of CHD-causing variants. Some have tried to approach this problem by studying the risk of CHD recurrence in the offspring when a parent also has a CHD. Sir John Burn and his colleagues used British CHD data and mathematical genetic models to determine that the various types of heart defects are inherited through different mechanisms; some are due to single gene influences, while others are caused by up to 100 genetic loci13. A series of studies conducted in

Denmark showed that there is significant familial clustering of CHD, in general, but the specific

CHD phenotypes found within families are highly discordant14,15. These studies suggest two non- exclusive possibilities: (1) individuals with CHD inherit a single disease-causing locus and genetic variants at other loci control the penetrance and expressivity of the phenotype; (2) individuals inherit a combination of genetic factors that increase the risk of impaired heart development and the heart defect is determined by a combination of environmental and genetic modifiers. Although there may be overlap between the mechanisms of inheritance, the experimental approaches to discover CHD genes require a distinction between simple/Mendelian and complex/polygenic inheritance.

Simple Mendelian inheritance suggests a single genetic locus with a large effect on CHD risk. Classically, researchers approach this scenario using linkage analysis on family pedigrees.

Linkage analysis is a form of genetic study that tracks recombination events between SNP markers, allowing the detection of the trait loci that are located close to the SNP marker because of linkage 4 disequilibrium16. Linkage disequilibrium (LD) is the tendency for physically close markers (on the same chromosome) to be inherited together17. Deviation from the expected 50% recombination required by the Law of Independent Assortment is called LD. The LD between SNP markers and the disease locus allows for statistical inferences that can determine its physical location. Modern linkage analysis requires SNP genotype data for probands, unaffected siblings and parents, and high-quality phenotype data for everyone. Including extended family members or other affected families can improve statistical power and reduce false positives16,18. Linkage analysis revealed the location of Mendelian mutations in genes such as NOTCH1, TBX5 and GATA4 because of the autosomal dominant inheritance pattern19-21.

In the early 2010s, genome-wide association studies (GWAS) were expected to unveil how common genetic variants affect risk for CHD. In simple terms, a GWAS is a regression of the phenotype on the genotypes at each locus. This quantifies how strongly each single nucleotide polymorphism (SNP) is associated with the phenotype of interest22. GWAS is particularly useful for traits with low penetrance or polygenic inheritance, with small phenotypic effect per locus.

Though GWAS appeared very promising, it has yet to unearth new causal genes for CHD. Because the effect sizes of associated loci are usually small and there are often several loci to validate, it becomes difficult to hone in on the inciting gene or genetic network18.

There have been two exemplary studies that tempered expectations about the potential of

GWAS in CHD research. In 2013, Cordell et. al. published results on a large GWAS including nearly 2000 CHD cases and over 5000 controls23. Despite the large sample size and “careful” phenotyping, the authors were unable to reach genome-wide significance in the major CHD categories. They detected a suggestive locus associated with the risk for a special type of atrial septal defect23. Another similar study in a Han Chinese population found that the risk of atrial 5 septal defects associates with SNP markers near TBX15 and MAML3 [Ref. 24]. Overall, human

GWAS has had minimal success in identifying causal gene variants. Aside from the weak effects of common SNPs, the extreme genetic heterogeneity in human populations reduces the power of

GWAS. This means that polymorphisms that are associated with risk in one population, are not in other populations. As a result, some scientists have decided to limit genetic heterogeneity by using animal models to study how genetic variation influences complex traits like CHD.

Using QTL mapping to study complex traits

The characteristics that define an individual, e.g. corporal features, physiology and temperament can be defined as quantitative traits. A quantitative trait locus (QTL) is a genetic region that influences the phenotypic variation in a trait. QTL analysis or mapping is a method that uses linkage to nearby genetic markers to identify a locus, or location on a chromosome, containing a gene that influences the trait of interest. QTLs are mapped using one of two general methods: (1)

Linkage mapping and (2) Association mapping. Linkage mapping is typically used to analyze small family pedigrees or experimental crosses of divergent inbred animals, where relationships between individuals are known. It utilizes recombination events to isolate the genetic markers that predict the trait phenotype in an F2 or later generation. This method is effective at detecting loci with large effect size using a dozen to a few hundred individuals. Association mapping is similar because it also uses recombination, but this method is used for analyzing outbred populations with several generations of recombination, such as human or wild organism populations. Myriad historical recombination events allow only markers that are closely linked to QTLs to show significant association to the trait phenotype. Therefore, association mapping leads to higher resolution (much smaller QTL intervals) compared to linkage mapping because of the higher 6 number of recombination events. However, association mapping requires a greater sample size to detect QTLs25. An important consideration for any QTL analysis is that it is only possible to map

QTLs for traits that vary due to genetic differences within a population.

Early work on QTL analysis took place mainly in agriculture, where people made associations between a “neutral” marker phenotype, like color, and a trait of interest. Karl Sax was one of the first to study this type of linkage quantitatively when he linked pigmentation pattern to seed size in Phaseolus vulgaris, the common bean26. These types of analyses were necessary because we lacked neutral, molecular genetic markers that would enable the physical location of trait loci. In 1980, David Botstein and his colleagues developed a method to create a genetic linkage map using restriction fragment length polymorphisms (RFLPs). A “complete” genetic linkage map allowed scientists to use a physical property of the DNA molecule as a marker for its physical location on the chromosome27. Nine years later, Botstein collaborated with Eric Lander to write a seminal paper proposing the interval mapping method for QTL mapping that makes use of a genome-wide RFLP linkage map28. A major strength of interval mapping is its ability to infer the presence of a QTL even if it is located between genetic markers. They optimized their method for crosses between inbred strains of model organisms, such that the alleles at a marker could take one of two possible values. This allowed interval mapping to gain traction rapidly amongst experimental biologists and geneticists alike.

Within a few years, researchers used interval mapping to find QTLs responsible for hypertension in rats29,30, type 1 diabetes in mice31, male genitalia shape differences in flies32 and more. Investigators even began to use QTL mapping on human data. The link between the APOE4 allele of the Apolipoprotein E and Alzheimer’s disease susceptibility was identified using QTL interval mapping33. This inspired others to try their luck at QTL mapping their favorite human 7 complex trait. Many QTL mapping attempts in humans ended in failure and were attributed to a lack of power34. However, the expectation that human data would yield similar results as those from inbred model organisms was naïve because human populations are very outbred relative to model organisms34,35. This realization spurred the development of SNPs and the GWAS after the was finished in 2001.

When studying complex traits, it is essential to limit the influence of environmental factors and control the genetic structure of the experimental populations. Even moderate control of these variables is very difficult, if not impossible, in human populations. Therefore, animal models offer a distinct advantage over human populations because confounding variables, like genetic structure and allele frequency, can be controlled. With animal models, it is possible to address the provocative questions in the field of genetics. One of those questions is about the genetic architecture of complex traits35,36. How do genetic variants control the expression of complex traits? The answer is still unclear, but recent studies are building the foundation needed to understand the genetic interactions that regulate complex traits.

In order to make full use of the genetic code, we must first understand the genetic architecture of the phenotypes it encodes. Understanding the complete genetic architecture of even one complex trait is far from a trivial task. It requires knowledge of all the genes that underlie a phenotype, the portion of those genetic loci that control the trait’s variation, the influence of additive, dominance and epistatic genetic interactions, the effects of polymorphisms, environment and mutations, the role of pleiotropy, the significance of indirect genetic effects, the role of reproductive barriers, and the sum effect of all these factors on the molecular mechanisms that determine the quantitative values for the trait37. Although it is not possible to discuss all these

8 components in this format, it is important to realize that every phenotype or trait is the product of intricate and interwoven genetic and environmental networks.

Most of the complex traits that have been studied to date have only addressed a portion of the genetic architecture because of the daunting complexity38-44. To simplify the natural complexity of these genetic networks, it is essential to limit the scope of search and optimize the experimental methods for a specific aspect of the genetic architecture. For example, detecting epistatic interactions between QTL requires a large sample size because of the small proportion of individuals that will have the double homozygote genotypes. The diminished sample size increases the variance of the mean phenotype, which makes interactions more difficult to detect44. Cheverud and colleagues found epistatic QTL interactions for mouse litter size by using over 1600 mice in their study45. Instead of a brute force approach, using model organisms that enable high-throughput methods, such as yeast, can save time and money at the expense of translatability to mammalian phenotypes46. Another strategy for detecting epistatic QTL uses congenic strains, namely chromosome substitution or genetic introgression lines. These organisms have isogenic backgrounds with specific chromosomes or genetic loci “inserted” from another strain, restricting the source of phenotypic variation to smaller genetic regions. Spiezio et. al. (2012) demonstrated the effectiveness of this approach when they used a chromosome substitution line of mice to detect epistatic QTL interactions for dozens of traits while using only a few hundred mice47. Experimental designs like this one increase the power to detect QTLs, but they are limited to a subset of the important genetic variants available in other populations48.

Historically, a key weakness of QTL linkage mapping is the lack of mapping precision35,49.

Many studies used F2 or backcross breeding schemes to generate their study populations, but this only allows two or three recombination events per chromosome. As a result, significant QTL 9 intervals typically span several megabases, containing hundreds of genes. It is also possible that a

QTL with a large effect is actually the sum of two separate QTLs in tight linkage. To address these issues, Darvasi and Soller (1995) proposed the use of an experimental population called an

50 advanced intercross line (AIL) . The concept behind the AIL is to take an F2 generation from an intercross of two inbred strains and continue randomly intercrossing for several generations

(Figure 1.2). AILs are a special case of outbred populations because the allele frequencies a start

0.5 at every marker, which maximizes the power to detect genetic effects of each allele49,50. The

AIL population will accumulate increasing numbers of recombination events per generation and reduce the size of QTL intervals, which permits fine-scale mapping of QTL. A QTL interval mapped in an F2 generation will be narrowed by a factor of t/2, where t is the generation number; hence, a 30cM QTL interval in the F2, will be 6cM in the F10 [Ref. 50]. As the cost of genotyping falls and the density of SNP panels increases, the feasibility and popularity of AILs has surged.

An important consideration with AILs is population structure. Population structure means that there are subpopulations that comprise the whole population49. This is because individuals from the same family, i.e. siblings, are more similar genetically. Therefore, there are increased correlations between phenotypes within a family because they are closely related (K. Broman,

Personal Communication, 2014). This will result in an elevated Type 1 Error rate because some genotypes will show significant effects only because they are associated with a particular family with extreme trait values, not the actual trait per se51,52. Fortunately, association tests that incorporate linear mixed models (LMMs) adequately control the false positive rate52. The LMM incorporates a random effect, which uses a covariance matrix that accounts for the patterns of genetic similarity between all individuals in the population. Freely available LMM software such

10 as GEMMA53 and the R function lmekin54 allow genetic mapping in structured populations in human and model organism studies.

The Cheverud Laboratory was among the first to publish studies that mapped QTLs using

AILs55. Since the turn of the century, James Cheverud has been breeding an advanced intercross of SM/J and LG/J inbred strains. Now his population of LG/J-SM/J advanced intercross (LSAI) is

th at the 56 generation (J. Cheverud, Personal Communication, 2014). The F56 LSAI reduces the confidence intervals for mapped QTL to ~1Mb, compared to dozens of megabases in an F2 cross.

These mice have aided the discovery of genes related to obesity, regeneration, bone length, metabolism and responses to trauma56-60. Animal models and quantitative genetics tools, like those cited here, enhance our ability to define the genetic interactions that control complex traits and diseases alike. Continued work in this field is essential for us to build our understanding of the genetic architecture of important traits and develop targeted approaches to modify them.

Animal models of congenital heart disease genetics

The study of CHD in animals has helped identify numerous genes that are involved in normal heart development. Many of the experiments were performed by removing a gene via knockouts and studying the resulting cardiac phenotypes61. However, this sort of information does not reveal the cause of a majority of CHD cases because most cases lack errors in key CHD genes62. The present state of knowledge can be likened to figuring out what makes a car stop running by removing random parts. That sort of knowledge is difficult to apply if we need to find the faulty part when the car suddenly stops running. Instead, we need to understand both the function of each part and be able to recognize suboptimal and broken parts. These concepts apply to mechanics and doubly to those studying CHD and heart development. We must develop 11 methods that identify how broken or suboptimal combinations of genes and environments influence the risk for CHD. Advances in this area can help shift the field of CHD toward preventative interventions.

Surprisingly, relatively few published studies address how genetic modifiers influence

CHD risk when a known mutation is present. Some of these published papers address the risk for

CHD in individuals with Down syndrome (DS). People with DS have a ~50% incidence of CHD63.

However, since half have normal hearts, it suggests that environmental and genetic modifiers affect the CHD risk in this sensitized population. One study found that DS patients with deleterious variants of CRELD1 or HEY2 have a significantly higher risk for CHD64. In a DS mouse model, they showed that haplo-insufficiency of either modifier increased CHD risk. Another study on DS and CHD used genomic rearrangements to show that the genes in a 3.7Mb region on Hsa21 are responsible for the elevated risk for CHD65. Therapies that modify the dosage of these genes may prove beneficial for individuals with DS or some sporadic cases of CHD.

Like the risk of CHD in Down syndrome, mutations in archetypal cardiac transcription factors (NKX2-5, GATA4, TBX5, etc.) do not always lead to CHD66-68. Hence, it is sensible to consider that there are genetic and environmental modifiers that control the manifestation of CHD.

For this reason, the Jay Lab decided to search for modifiers of the cardiac master regulator, Nkx2-

5, in a CHD mouse model. Murine cardiac anatomy is very similar to human and is an established model for mammalian heart development69,70.

Winston & Jay et. al. (2010) characterized >3000 Nkx2-5+/- mouse hearts to demonstrate that the incidence, types and severity of cardiac defects strongly depended on their genetic background71. First, they demonstrated that the pure C57BL/6 genetic background was highly susceptible to ventricular (VSD) and atrial (ASD) septal defects in the presence of an Nkx2-5+/- 12 mutation. The incidence of each defect type was near 40%. When Nkx2-5+/- C57BL/6 mice were crossed to either an FVB/N or A/J mouse, the incidence of VSDs in their Nkx2-5+/- offspring

+/- +/- dropped to less than 3%, and ASDs to less than 8%. Relative to Nkx2-5 F1 animals, Nkx2-5 F2 intercross mice of either strain saw a substantial increase in the incidence of both VSDs and ASDs, but lower than in the Nkx2-5+/- C57BL/6 background. Interestingly, Nkx2-5+/- C57BL/6×A/J Nkx2-

+/- 5 F2 had a greater risk of developing ASDs and severe atrioventricular septal defects (AVSDs)

+/- than the C57BL/6×FVB/N Nkx2-5 F2. Primarily, this demonstrates the protective effect of heterozygosity on the risk for CHD. Furthermore, these data underscore the effect of polymorphic genetic modifiers on CHD risk. A segregation analysis of the C57BL/6×FVB/N crosses suggested that at least two polymorphic loci and one epistatic interaction predicted the risk for an ASD or a muscular VSD. Even though all the mice in the study are Nkx2-5+/- mutants, their risk for developing a heart defect is at least partly dependent on genotypes at other loci.

In a follow-up study, Jay & his colleagues addressed the genetic architecture of ventricular septal defects (VSD) by QTL linkage mapping in the same Nkx2-5+/- mouse model72. They mapped

+/- significant QTL for membranous and muscular types of VSDs in the Nkx2-5 F2 offspring of an

+/- +/- Nkx2-5 C57BL/6×FVB/N F1intercross. Remarkably, the risk for VSDs in the Nkx2-5 offspring increases incrementally each month as the mother ages. This maternal age-associated risk for CHD was not explained by aneuploidy. This finding supports Nora’s prediction that multifactorial influences control the risk for CHD7. It also recapitulates epidemiological data that implicate maternal age as a CHD risk factor in humans73,74.

The Jay Lab continued to explore the maternal age-associated risk factor in their most recent publication75. Prior to this study, it was not known whether the maternal age risk came from old ovaries or factors within an aged mother. To address this, they performed reciprocal ovary 13 transplants between young (< 100 days) and old (>300 days) C57BL/6 mothers and assessed the incidence of CHDs in the Nkx2-5+/- offspring (Figure 1.4). The results determined that an increased incidence of VSDs tracked with old mothers with young ovaries; young mothers with old ovaries maintained a baseline incidence of defects. An analysis of different maternal strains revealed that the presence and extent of the maternal age-associated risk varied by genetic background. Another set of experiments showed that the maternal age effect was not simply due to obesity or insulin resistance in old mothers. Most importantly, they discovered that mice mothers that exercised ad libitum were able to eliminate the increased risk of CHD in their Nkx2-5+/- offspring.

There are a few conclusions to draw from this study. First, the results indicate that there are maternal genetic and environmental mechanisms that can meaningfully modify the risk for cardiac defects in predisposed offspring. Because the maternal age risk followed old mothers, it is likely that physiological changes, in response to exercise, restore facets of a youth-like state in the old mothers. The maternal age risk was absent in some strains and pronounced in others, implying a maternal-fetal gene × gene interaction that can be modified by the maternal behaviors and environment. The fetal risk for CHD is high due to an Nkx2-5+/- mutation, but that risk may be elevated because of the mother’s genetic susceptibility to age-related physiological changes. Since the magnitude of the age-related risk is different across mouse strains, there must be maternal genes that regulate this effect. Therefore, we can use these strain-specific differences to map these genes and identify the responsible molecular pathways.

Maternal-fetal interactions, like the maternal age-associated risk of CHD, prove that the genetic architecture of CHD is quite complicated, but this is likely true of all complex traits.

Indirect effects are an important, yet underappreciated, aspect of genetic architecture76,77. Maternal effects are a type of indirect genetic effect where the mother contributes to the phenotypic variation 14 of traits in the offspring through mechanisms that are independent of inherited maternal genes78.

In order to construct a trait’s genetic architecture, it is important to determine the extent of trait heritability. Heritability specifies the extent of correlation between the individual’s genes and the expressed trait. Failure to account for indirect effects can lead to inflated heritability estimates, meaning that estimates of QTL effects may be overestimated79. Maternal effects are inherently difficult to estimate because they are confounded by the 50% genetic similarity between mother and offspring80. Confounding, in this respect, can lead to a correlation between maternal phenotype and offspring phenotype for the same trait, which can be erroneously attributed solely to genetic inheritance.

In the case of the maternal age-associated risk of CHD, maternal genetic effects are most relevant because environment is uniform and paternal age is not a risk factor72. The age-associated risk for CHD is not purely a maternal genetic phenotype because its effects are also a function of maternal age. Therefore, this trait will only exist when a mother has the combination of genetic polymorphisms that make her susceptible to the ageing effect and the offspring has a genetic predisposition to CHD. It is unlikely that there will be confounding between the effects of maternal genotype and offspring genotype because they control different traits. QTL mapping experiments designed to identify the maternal genetic pathways that control the age-associated risk for CHD hold significant promise because the risk is modifiable by exercise. Therapies directed at the mother to target deficiencies in the offspring are particularly attractive because they avoid the perilous hurdles of in utero interventions. For this to be possible, we must develop a better understanding of how maternal genetics and environment interact to influence fetal cardiac development, and how genetic polymorphisms dictate that interaction.

15

Goals of the Thesis

One interesting feature of congenital heart disease, and other genetic diseases, is that strikingly different phenotypes can result from one, ostensibly Mendelian, mutation. NKX2-5 mutations elicit a range of incompletely penetrant heart defects that vary in anatomic location and severity. This suggests that interactions between a mutation and other factors, environmental or genetic, determine the phenotypic outcome. The first aim of my thesis is to dissect the genetic architecture of phenotypic variability in CHD through animal models. Our mouse model replicates

CHD risk in susceptible people through a heterozygous mutation in Nkx2-5, an essential cardiac . We used inbred strain crosses and quantitative genetic methods to identify genetic polymorphisms that modify CHD risk in Nkx2-5, haploinsufficient mice. I then assessed how epistatic interactions between genetic modifiers of Nkx2-5 influence the CHD phenotype that occurs.

The second goal of my thesis stems from the brilliant discovery that maternal age increases the offspring CHD risk, both in humans and in our Nkx2-5 mouse model. Prior studies in the Jay laboratory showed that this risk is due to factors within aging mouse mothers, not old ovaries.

Further, voluntary exercise is sufficient to reduce her CHD risk to baseline levels. These experiments showed that the maternal age risk varied according to the genetic background of the mice, indicating that it is a quantitative genetic trait. The second aim of my thesis is to identify genes that influence the variation in the maternal age effect. We partnered with the laboratory of

James Cheverud to use a 56th generation, advanced intercross mouse population to identify CHD modifier loci to nearly single-gene resolution. The mapping analysis requires developing models that account for the nuances of our mouse population. 16

Figures

Figure 1.1 The causes of congenital heart disease

Nearly two-thirds of all CHD cases are unexplained. Most of the cases with a known cause are the result of de novo CNVs and gene mutations. Inherited mutations are still expected to play a small role. (Reproduced from Ref. 12)

17

Figure 1.2 Advanced intercross chromosomes

After several generations of random intercrossing, advanced intercross chromosomes have accumulated many more recombination events that allow fine-scale mapping experiments.

18

Figure 1.3 Coronal section of a wildtype mouse heart

This histological section shows all 4 heart chambers. The anatomical structure is nearly identical to the human heart. The areas with slight differences are located around the venous pole of the heart. The human heart has 4 pulmonary veins that enter the left atrium, while the mouse has the pulmonary veins join together before abutting the left atrium81. Ao, Aorta; RA, LA, right, left atrium; RV, LV, right, left ventricle. (Adapted from Ref. 71)

19

A

B

Figure 1.4 Reciprocal ovary transplants between young and old mice shows that the maternal age effect is a function of old mothers, not old eggs

Panel A) shows a schematic for the ovary transplant surgeries and the sample sizes used. B) Old mothers with young ovaries have more than twice the incidence of VSDs compared to young mothers with old ovaries. (Adapted from Ref. 75)

20

Table

Table 1.1 Congenital heart disease-associated genes

These 334 genes were compiled from Zaidi et al. 2013 and Li et al. 2015 (Refs. 82 and 83).

Acan BMPR2 Dnm2 GJA9 LEFTY2 NKX3-2 PTPN11 TBX20 ACP6 BUB1B DOLK Gm572 LEMD3 NODAL RAB10 TBX3 ACTA2 C1orf106 DPPA4 GPC3 LLPH NOTCH1 RAB23 TBX5 ACTC1 Cc2d2a DQ983818 GPR161 Lox NOTCH2 RAI1 TCF21 ACVR1 Ccdc151 Drc1 GPRC6A LPIN1 NOTCH2NL RAI2 TDGF1 ACVR2B CCDC39 DVL1 GSK3B Lrp1 NOTCH3 RAPGEF5 TFAP2B Adamts6 CCDC40 DVL2 HAND1 Lrp2 NOTCH4 REL TGFBR2 AHSA2 CCT4 Dync2h1 HAND2 LRRC50 NOTO RFX2 TLL1 AKAP9 CDH2 Dyx1c1 Hectd1 LRRC6 NPHP3 RFX3 TMBIM4 ANKRD1 Cep110 DZIP1 HES1 Ltbp1 NPPA Robo1 TMEM195 Anks6 Cep290 EED HES4 MARK2 NSD1 ROCK2 Tmem67 Ap1b1 CER1 EHMT1 HEY2 MAX NUMBL ROR2 TNFRSF21 Ap2b1 CFC1 ELN HOXA1 MED13L NUP188 ROTATIN TSC1 APOBEC2 Cfc1 EP300 ID2 MEF2A OFD1 RPGRIP1L TSEN15 ARL13B CHD1L ESCO2 IER2 MEF2C OSR1 RUNX2 TTC21B Armc4 CHRAC1 EVC IFT122 Megf8 PAPOLG S100Z TTC30A ASXL2 CHRD EVC2 Ift140 METT10D PCMTD2 SALL1 TWIST1 ATE1 CITED2 EZH1 IFT172 MGAT1 PCNT SALL4 TXNDC3 ATP1A2 CLDN7 FBN1 IFT20 MGP PCSK5 SDC2 UBR1 ATP4A CLUL1 FBN2 IFT57 MID1 Pcsk5 SEMA3E USP34 ATP4B Cml5 FGF8 Ift74 MKKS Pde2a SESN1 VANGL2 BBS1 CREBBP FGFR1 IFT88 MKS1 PEX1 SHH VEGFA BBS10 CRELD1 FIBP1 IGFBP4 Mmp21 PEX13 SHOC2 VEGFC BBS11 CRHBP FLNA IGFBP5 MNDA PHYHD1 SIL VIT BBS12 CRX FMO5 IHH MSX1 PIFO SLC2A10 WNT3A BBS2 CSRP1 FOXA2 INVERSIN MSX2 PITX2 SMAD2 XPO1 BBS3 CTNNA3 FOXC1 IPPK MYH10 Pkd1 SMAD5 ZAC1 BBS4 Cxcr4 FOXH1 ISL1 Myh10 PKD1L1 Smad6 Zbtb14 BBS5 DAND5 FOXJ1 JAG1 MYH11 Pkd1l1 Smarca4 ZEB2 BBS6 Daw1 FOXL2 JAZF1 MYH6 PKD2 SMARCD3 ZFPM1 BBS7 Dctn5 Frem2 JBTS17 Ndst1 PLAGL1 SMO ZIC3 BBS8 DHCR7 FTO KCNE1 NEK2 Plxnd1 SNAI1 ZNF480 BBS9 DLL1 Fuz KCNJ2 Nek8 PPM1K Snx17 ZNF528 21

BCL11A DMRT2 GADL1 KCNQ1 NF1 PPP3CA SOS1 ZNF534 BCL6 Dnaaf3 GALNT11 KIAA1841 NFATC1 PQPB1 SOX17 ZNF610 BCL9 Dnah11 GATA4 KIF3A NFATC3 Prdm1 SRF ZNF638 BCOR DNAH2 GATA5 KIF3B NFATC4 Prickle1 STIL ZNHIT3 BICC1 DNAH5 GATA6 KIF3C NIPBL PRKAB2 SUFU Bicc1 Dnah5 GDF1 Kif7 NKD1 PROX1 Sufu BMP4 DNAI1 GJA1 KIFAP3 NKX2-5 Pskh1 SUPT3H BMP7 Dnai1 GJA5 KLF13 NKX2-6 PTCH1 Tab1 BMPR1A DNAI2 GJA8 LBR PTCH2 Tbc1d32 BMPR1B LEFTY1 Ptk7 TBX1

22

References

1. Steiner, J. M. & Kovacs, A. H. Adults with congenital heart disease – Facing morbidities and uncertain early mortality. Progress in Pediatric Cardiology 48, 75-81, doi:https://doi.org/10.1016/j.ppedcard.2018.01.006 (2018).

2. O'Leary, J. M., Siddiqi, O. K., de Ferranti, S., Landzberg, M. J. & Opotowsky, A. R. The Changing Demographics of Congenital Heart Disease Hospitalizations in the United States, 1998 Through 2010. JAMA 309, 984-986, doi:10.1001/jama.2013.564 (2013).

3. Taussig, H. B. World survey of the common cardiac malformations: developmental error or genetic variant? Am J Cardiol 50, 544-559 (1982).

4. Peacock, T. B. On malformations of the human heart. etc. with original cases and illustrations. 2 edn, (John Churchill and Sons, 1866).

5. Mendel, G. Versuche über Pflanzen-Hybriden. (Brünn : Im Verlage des Vereines, 1866).

6. Gelb, B. D. History of Our Understanding of the Causes of Congenital Heart Disease. Circ Cardiovasc Genet 8, 529-536, doi:10.1161/CIRCGENETICS.115.001058 (2015).

7. Nora, J. J. Multifactorial inheritance hypothesis for the etiology of congenital heart diseases. The genetic-environmental interaction. Circulation 38, 604-617 (1968).

8. Bodmer, R. The gene tinman is required for specification of the heart and visceral muscles in Drosophila. Development 118, 719-729 (1993).

9. Komuro, I. & Izumo, S. Csx: a murine homeobox-containing gene specifically expressed in the developing heart. Proc Natl Acad Sci U S A 90, 8145-8149 (1993).

10. Lints, T. J., Parsons, L. M., Hartley, L., Lyons, I. & Harvey, R. P. Nkx-2.5: a novel murine homeobox gene expressed in early heart progenitor cells and their myogenic descendants. Development 119, 419-431 (1993).

11. Schott, J. J. et al. Congenital heart disease caused by mutations in the transcription factor NKX2-5. Science 281, 108-111 (1998).

12. Akhirome, E., Walton, N. A., Nogee, J. M. & Jay, P. Y. The Complex Genetic Basis of Congenital Heart Defects. Circulation Journal 81, 629-634, doi:10.1253/circj.CJ-16- 1343 (2017).

13. Burn, J. et al. Recurrence risks in offspring of adults with major heart defects: results from first cohort of British collaborative study. Lancet 351, 311-316 (1998).

23

14. Øyen, N. et al. Recurrence of congenital heart defects in families. Circulation 120, 295- 301, doi:10.1161/CIRCULATIONAHA.109.857987 (2009).

15. Øyen, N. et al. Recurrence of discordant congenital heart defects in families. Circ Cardiovasc Genet 3, 122-128, doi:10.1161/CIRCGENETICS.109.890103 (2010).

16. Ott, J., Wang, J. & Leal, S. M. Genetic linkage analysis in the age of whole-genome sequencing. Nat Rev Genet 16, 275-284, doi:10.1038/nrg3908 (2015).

17. Weiss, K. M. & Clark, A. G. Linkage disequilibrium and the mapping of complex human traits. Trends Genet 18, 19-24 (2002).

18. Kathiresan, S. & Srivastava, D. Genetics of human cardiovascular disease. Cell 148, 1242-1257, doi:10.1016/j.cell.2012.03.001 (2012).

19. Basson, C. T. et al. Mutations in human TBX5 [corrected] cause limb and cardiac malformation in Holt-Oram syndrome. Nat Genet 15, 30-35, doi:10.1038/ng0197-30 (1997).

20. Garg, V. et al. GATA4 mutations cause human congenital heart defects and reveal an interaction with TBX5. Nature 424, 443-447, doi:10.1038/nature01827 (2003).

21. Garg, V. et al. Mutations in NOTCH1 cause aortic valve disease. Nature 437, 270-274, doi:10.1038/nature03940 (2005).

22. Clarke, G. M. et al. Basic statistical analysis in genetic case-control studies. Nat Protoc 6, 121-133, doi:10.1038/nprot.2010.182 (2011).

23. Cordell, H. J. et al. Genome-wide association study of multiple congenital heart disease phenotypes identifies a susceptibility locus for atrial septal defect at chromosome 4p16. Nat Genet 45, 822-824, doi:10.1038/ng.2637 (2013).

24. Hu, Z. et al. A genome-wide association study identifies two risk loci for congenital heart malformations in Han Chinese populations. Nat Genet 45, 818-821, doi:10.1038/ng.2636 (2013).

25. Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747-753 (2009).

26. Sax, K. The Association of Size Differences with Seed-Coat Pattern and Pigmentation in PHASEOLUS VULGARIS. Genetics 8, 552-560 (1923).

27. Botstein, D., White, R. L., Skolnick, M. & Davis, R. W. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32, 314- 331 (1980).

24

28. Lander, E. S. & Botstein, D. Mapping mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121, 185-199 (1989).

29. Hilbert, P. et al. Chromosomal mapping of two genetic loci associated with blood- pressure regulation in hereditary hypertensive rats. Nature 353, 521-529, doi:10.1038/353521a0 (1991).

30. Jacob, H. J. et al. Genetic mapping of a gene causing hypertension in the stroke-prone spontaneously hypertensive rat. Cell 67, 213-224 (1991).

31. Todd, J. A. et al. Genetic analysis of autoimmune type 1 diabetes mellitus in mice. Nature 351, 542 (1991).

32. Liu, J. et al. Genetic analysis of a morphological shape difference in the male genitalia of Drosophila simulans and D. mauritiana. Genetics 142, 1129-1145 (1996).

33. Corder, E. H. et al. Gene dose of apolipoprotein E type 4 allele and the risk of Alzheimer's disease in late onset families. Science 261, 921-923 (1993).

34. Risch, N. & Merikangas, K. The future of genetic studies of complex human diseases. Science 273, doi:10.1126/science.273.5281.1516 (1996).

35. Flint, J. & Mackay, T. F. Genetic architecture of quantitative traits in mice, flies, and humans. Genome Res 19, 723-733, doi:10.1101/gr.086660.108 (2009).

36. Liti, G. & Louis, E. J. Advances in quantitative trait analysis in yeast. PLoS Genet 8, e1002912, doi:10.1371/journal.pgen.1002912 (2012).

37. Mackay, T. F. The genetic architecture of quantitative traits. Annu Rev Genet 35, 303- 339, doi:10.1146/annurev.genet.35.102401.090633 (2001).

38. Slate, J. From beavis to beak color: a simulation study to examine how much qtl mapping can reveal about the genetic architecture of quantitative traits. Evolution 67, 1251-1262, doi:10.1111/evo.12060 (2013).

39. Famoso, A. N. et al. Genetic architecture of aluminum tolerance in rice (Oryza sativa) determined through genome-wide association analysis and QTL mapping. PLoS Genet 7, e1002221, doi:10.1371/journal.pgen.1002221 (2011).

40. Huang, W. et al. Epistasis dominates the genetic architecture of Drosophila quantitative traits. Proc Natl Acad Sci U S A 109, 15553-15559, doi:10.1073/pnas.1213423109 (2012).

41. Dzur-Gejdosova, M., Simecek, P., Gregorova, S., Bhattacharyya, T. & Forejt, J. Dissecting the genetic architecture of F1 hybrid sterility in house mice. Evolution 66, 3321-3335, doi:10.1111/j.1558-5646.2012.01684.x (2012).

25

42. Poissant, J., Davis, C. S., Malenfant, R. M., Hogg, J. T. & Coltman, D. W. QTL mapping for sexually dimorphic fitness-related traits in wild bighorn sheep. Heredity (Edinb) 108, 256-263, doi:10.1038/hdy.2011.69 (2012).

43. Waszak, S. M. et al. Population Variation and Genetic Control of Modular Chromatin Architecture in Humans. Cell 162, 1039-1050, doi:10.1016/j.cell.2015.08.001 (2015).

44. Mackay, T. F. Epistasis and quantitative traits: using model organisms to study gene-gene interactions. Nat Rev Genet 15, 22-33, doi:10.1038/nrg3627 (2014).

45. Peripato, A. C. et al. Epistasis affecting litter size in mice. J Evol Biol 17, 593-602, doi:10.1111/j.1420-9101.2004.00702.x (2004).

46. Gerke, J., Lorenz, K. & Cohen, B. Genetic interactions between transcription factors cause natural variation in yeast. Science 323, 498-501, doi:10.1126/science.1166426 (2009).

47. Spiezio, S. H., Takada, T., Shiroishi, T. & Nadeau, J. H. Genetic divergence and the genetic architecture of complex traits in chromosome substitution strains of mice. BMC Genet 13, 38, doi:10.1186/1471-2156-13-38 (2012).

48. Mackay, T. F., Stone, E. A. & Ayroles, J. F. The genetics of quantitative traits: challenges and prospects. Nat Rev Genet 10, 565-577, doi:10.1038/nrg2612 (2009).

49. Gonzales, N. M. & Palmer, A. A. Fine-mapping QTLs in advanced intercross lines and other outbred populations. Mamm Genome 25, 271-292, doi:10.1007/s00335-014-9523-1 (2014).

50. Darvasi, A. & Soller, M. Advanced intercross lines, an experimental population for fine genetic mapping. Genetics 141, 1199-1207 (1995).

51. Peirce, J. L. et al. Genome Reshuffling for Advanced Intercross Permutation (GRAIP): simulation and permutation for advanced intercross population analysis. PLoS One 3, e1977, doi:10.1371/journal.pone.0001977 (2008).

52. Cheng, R. et al. Genome-wide association studies and the problem of relatedness among advanced intercross lines and other highly recombinant populations. Genetics 185, 1033- 1044, doi:10.1534/genetics.110.116863 (2010).

53. Zhou, X. & Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nat Genet 44, 821-824, doi:10.1038/ng.2310 (2012).

54. Therneau, T. (2018).

55. Ehrich, T. H. et al. Fine-mapping gene-by-diet interactions on chromosome 13 in a LG/J x SM/J murine model of obesity. Diabetes 54, 1863-1872 (2005).

26

56. Fawcett, G. L. et al. Fine-mapping of obesity-related quantitative trait loci in an F9/10 advanced intercross line. Obesity (Silver Spring) 18, 1383-1392, doi:10.1038/oby.2009.411 (2010).

57. Cheverud, J. M. et al. Fine-mapping quantitative trait loci affecting murine external ear tissue regeneration in the LG/J by SM/J advanced intercross line. Heredity (Edinb) 112, 508-518, doi:10.1038/hdy.2013.133 (2014).

58. Norgard, E. A. et al. Genetic factors and diet affect long-bone length in the F34 LG,SM advanced intercross. Mamm Genome 22, 178-196, doi:10.1007/s00335-010-9311-5 (2011).

59. Lawson, H. A. et al. Genetic, epigenetic, and gene-by-diet interaction effects underlie variation in serum lipids in a LG/JxSM/J murine model. J Lipid Res 51, 2976-2984, doi:10.1194/jlr.M006957 (2010).

60. Rai, M. F., Schmidt, E. J., Hashimoto, S., Cheverud, J. M. & Sandell, L. J. Genetic loci that regulate ectopic calcification in response to knee trauma in LG/J by SM/J advanced intercross mice. J Orthop Res 33, 1412-1423, doi:10.1002/jor.22944 (2015).

61. Andersen, T. A., Troelsen Kde, L. & Larsen, L. A. Of mice and men: molecular genetics of congenital heart disease. Cell Mol Life Sci 71, 1327-1352, doi:10.1007/s00018-013- 1430-1 (2014).

62. Fahed, A. C., Gelb, B. D., Seidman, J. G. & Seidman, C. E. Genetics of congenital heart disease: the glass half empty. Circ Res 112, 707-720, doi:10.1161/CIRCRESAHA.112.300853 (2013).

63. Nisli, K. et al. Congenital heart disease in children with Down's syndrome: Turkish experience of 13 years. Acta Cardiol 63, 585-589 (2008).

64. Li, H. et al. Genetic Modifiers Predisposing to Congenital Heart Disease in the Sensitized Down Syndrome Population. Circulation. Cardiovascular Genetics 5, 301-308, doi:10.1161/CIRCGENETICS.111.960872 (2012).

65. Liu, C. et al. Engineered chromosome-based genetic mapping establishes a 3.7 Mb critical genomic region for Down syndrome-associated heart defects in mice. Hum Genet 133, 743-753, doi:10.1007/s00439-013-1407-z (2014).

66. Abou Hassan, O. K. et al. NKX2-5 mutations in an inbred consanguineous population: genetic and phenotypic diversity. Sci Rep 5, 8848, doi:10.1038/srep08848 (2015).

67. Misra, C. et al. Congenital heart disease-causing Gata4 mutation displays functional deficits in vivo. PLoS Genet 8, e1002690, doi:10.1371/journal.pgen.1002690 (2012).

27

68. Polk, R. C. et al. The pattern of congenital heart defects arising from reduced Tbx5 expression is altered in a Down syndrome mouse model. BMC Dev Biol 15, 30, doi:10.1186/s12861-015-0080-y (2015).

69. Rossant, J. Mouse mutants and cardiac development: new molecular insights into cardiogenesis. Circ Res 78, 349-353 (1996).

70. Conway, S. J., Henderson, D. J. & Copp, A. J. Pax3 is required for cardiac neural crest migration in the mouse: evidence from the splotch (Sp2H) mutant. Development 124, 505-514 (1997).

71. Winston, J. B. et al. Heterogeneity of genetic modifiers ensures normal cardiac development. Circulation 121, 1313-1321, doi:10.1161/CIRCULATIONAHA.109.887687 (2010).

72. Winston, J. B. et al. Complex trait analysis of ventricular septal defects caused by Nkx2- 5 mutation. Circ Cardiovasc Genet 5, 293-300, doi:10.1161/CIRCGENETICS.111.961136 (2012).

73. Hollier, L. M., Leveno, K. J., Kelly, M. A., DD, M. C. & Cunningham, F. G. Maternal age and malformations in singleton births. Obstet Gynecol 96, 701-706 (2000).

74. Materna-Kiryluk, A. et al. Parental age as a risk factor for isolated congenital malformations in a Polish population. Paediatr Perinat Epidemiol 23, 29-40, doi:10.1111/j.1365-3016.2008.00979.x (2009).

75. Schulkey, C. E. et al. The maternal-age-associated risk of congenital heart disease is modifiable. Nature 520, 230-233, doi:10.1038/nature14361 (2015).

76. Wolf, J. B., Brodie Iii, E. D., Cheverud, J. M., Moore, A. J. & Wade, M. J. Evolutionary consequences of indirect genetic effects. Trends Ecol Evol 13, 64-69 (1998).

77. Wolf, J. & Cheverud, J. M. Detecting maternal-effect loci by statistical cross-fostering. Genetics 191, 261-277, doi:10.1534/genetics.111.136440 (2012).

78. Wolf, J. B. & Wade, M. J. What are maternal effects (and what are they not)? Philos Trans R Soc Lond B Biol Sci 364, 1107-1115, doi:10.1098/rstb.2008.0238 (2009).

79. Kruuk, L. E. & Hadfield, J. D. How to separate genetic and environmental causes of similarity between relatives. J Evol Biol 20, 1890-1903, doi:10.1111/j.1420- 9101.2007.01377.x (2007).

80. Berenos, C., Ellis, P. A., Pilkington, J. G. & Pemberton, J. M. Estimating quantitative genetic parameters in wild populations: a comparison of pedigree and genomic approaches. Mol Ecol 23, 3434-3451, doi:10.1111/mec.12827 (2014).

28

81. Wessels, A. & Sedmera, D. Developmental anatomy of the heart: a tale of mice and man. Physiological genomics 15, 165-176 (2003).

82. Zaidi, S. et al. De novo mutations in histone-modifying genes in congenital heart disease. Nature 498, 220-223, doi:10.1038/nature12141 (2013).

83. Li, Y. et al. Global genetic analysis in mice unveils central role for cilia in congenital heart disease. Nature 521, 520-524, doi:10.1038/nature14269 (2015).

29

Chapter 2

Genetic interactions shape the phenotype and severity of congenital heart defects

30

The majority of this chapter was extracted from the manuscript titled “The severity of a heart defect shapes its genetic architecture,” which is in revision as of May 2019 at Scientific Reports.

Akhirome E., Regmi S.D., Magnan R.A., Ugwu N., Qin Y., Schulkey C.E., Cheverud J.M., Jay

P.Y. (2018). The impact of epistasis increases with the fitness cost of a congenital heart defect.

Submitted.

31

Introduction

Congenital heart disease (CHD) is recognized as the most common birth defect, but it actually encompasses multiple, anatomically distinct phenotypes. The types of heart defects show an inverse relationship between their severity and incidence. For example, secundum atrial septal defects (ASDs), which are compatible with survival into middle age, are three times more common than atrioventricular septal defects (AVSD), which can cause death during infancy1,2. The inverse relationship has no simple explanation, such as by the relative frequencies of the causes of mild and severe defects. The relationship does imply, however, that each defect has a liability threshold related to its severity. The higher the threshold, the rarer a defect is. The developmental pathways that are affected in a severe defect must be more robust to perturbation than the pathways involved in a mild defect.

CHD could thus be a compelling model to examine the connections between the fitness cost and genetic architecture of a phenotype. As a rule, the same deleterious mutation can manifest as a mild, moderate or severe defect in different individuals. Modifier genes affect the phenotypic outcome3. Genetic heterogeneity and rare alleles hinder their characterization in humans, but inbred strain crosses in mutant mouse models can circumvent these limitations. The number of alleles at a locus can be limited to two, and the frequencies of both alleles can be kept equally common. We have shown through such crosses that the incidence of particular defects in Nkx2-

5+/- mice varies between genetic backgrounds. NKX2-5 mutation causes non-syndromic CHD in humans4,5. Genetic interactions modulate the susceptibility of the cardiac developmental pathways

6,7 to a deleterious mutation . In addition, first-generation (F1) hybrids of the inbred strains

C57BL/6N and FVB/N or A/J have a lower incidence of heart defects than Nkx2-5+/- mice in the

32

C57BL/6N background or the second-generation (F2) offspring of backcrosses to the parental strains or intercrosses (F1×F1). The observations suggest hybrid vigor; that is, genome-wide heterozygosity increases the average robustness of a population6.

The observations also suggest that variation in the robustness to the Nkx2-5 mutation – and conversely the liability to a heart defect – results from variation in the effects of genetic interactions. Epistasis contributes quantifiably to the variation of fitness traits in simple organisms, such as yeast8, Drosophila9 and C. elegans10. On the other hand, statistical genetic evidence from higher organisms, such as mice or humans, is scant, and theoretical analyses arrive at disparate conclusions regarding the significance of epistasis in complex traits or disease11,12. The reasons why the results conflict is hotly debated13. One, less commonly discussed reason is that the contribution of epistasis could vary with the fitness cost of a phenotype12. Selection indisputably eliminates deleterious mutations, but selection may also increase the robustness of pathways to perturbation by shaping genetic interactions and networks14. We thus reasoned that a comparative analysis of the genetic architecture of four CHD phenotypes under the same mutation could offer insight into the problem. The four defects – ASDs, muscular and membranous ventricular septal defects (VSD), and AVSD – have fitness costs that range from mild to severe, as defined by the likelihood of survival to reproductive age in humans. We determined the number and effects of quantitative trait loci (QTLs) on the risk of each heart defect in Nkx2-5+/- mice from two different inbred strain crosses. Each QTL represents a pairwise interaction between a modifier gene and

Nkx2-5 (G×GNkx). In addition, we examined the effects of higher-order interactions between two

QTLs and Nkx2-5 (G×G×GNkx). Empirical data are sorely needed to understand the nature of genetic risk in CHD. More fundamentally, a quantitative analysis of epistasis in a set of mutant

33 phenotypes that exact different fitness costs should shed light on the complex genetic basis of mutational robustness15.

34

Results

We phenotyped ~10,000 Nkx2-5+/- newborns from intercrosses between the inbred strains

A/J and C57BL/6N (A×B, N = 2999) and FVB/N and C57BL/6N (F×B; N = 6958). An ASD is the most common diagnosis associated with human NKX2-5 mutation5,16. An ASD is also most common in the mouse crosses, followed by membranous and muscular ventricular septal defects

(VSD) and AVSD (Figure 2.1A). The two anatomic types of VSD have different developmental bases, but similar pathological consequences. We note this because epidemiological data generally count membranous and muscular VSDs together, making VSD appear more common than ASD1.

Both intercrosses recapitulate the epidemiological relationship between the severity and incidence of a heart defect (Figure 2.1B). The attrition of more severely affected embryos does not explain the distribution of defects because Nkx2-5+/- mice are born at the expected Mendelian ratio6. In fact, very severe defects, such as double outlet right ventricle and tricuspid atresia, were observed, but there were too few for quantitative genetic analysis. About 70% of Nkx2-5+/- mice from either intercross have a normal heart, implying that the mutation is modestly deleterious overall.

The total effect of genetic modifiers on quantitative risk could be a function of the number of genes, their effect sizes or both. To distinguish these possibilities, we mapped the modifiers in the F2 intercross of A×B and in the combined F2 and advanced intercross generations of F×B

(Figure 2.2). Mapping in an F2 intercross has greater power to detect loci, whereas the combined analysis of an F2 and later generations has greater mapping resolution17. We identified QTLs for

ASD, membranous and muscular VSD, and AVSD (Fig. 2.3A, Table 2.1). Each set of QTLs represents G×GNkx interactions in developmental pathways leading to a defect. QTLs that overlap between defects may share a gene involved in the development of multiple cardiac structures. 35

The number of detected loci is similar between the four heart defects (Figure 2.4A). G×GNkx effect sizes, however, vary with the severity of a defect. G×GNkx interactions have a small effect on ASD risk and increasingly larger effects for VSDs and AVSD. The correlation holds whether the effect is calculated as the risk associated with a susceptibility allele or the phenotypic variance explained by a locus (Figure 2.3B, C). To account for undetected loci, we estimated the phenotypic variability explained by all genotyped SNPs. The heritability is similar between defects (Figure

2.4B). The correlation between G×GNkx effect size and severity is not due to a larger total genetic contribution to the risk of severe defects.

While each locus was mapped according to its individual effect on risk, two loci may interact. Representative examples compare the observed defect incidences at two-locus genotypes to the expected incidences under the null model of non-interacting loci18. Small differences between the observed and expected incidences of ASD and VSD indicate that each locus exerts mainly independent or additive effects (Figure 2.5A, B; Figure 2.6). In contrast, the incidence of

AVSD at a two-locus genotype can deviate substantially from the null expectation, indicating a non-linear effect. Some differences are especially large relative to the population or the unweighted average incidence of a defect across the nine genotypes (Figure 2.5C, D; Figure 2.6).

The latter calculation of incidence permits comparisons between defects of higher-order epistatic effects independently of genotype frequencies. The population and unweighted average incidences are correlated in the inbred strain crosses because there are only two common alleles per locus. In natural populations, multiple and rare alleles disrupt the correlation and make epistasis difficult to assess.

36

Similar to G×GNkx interactions, G×G×GNkx effect sizes increase with the severity of a defect

(Figure 2.5E). We calculated the G×G×GNkx contribution to the observed incidence of a defect at each of the nine genotypes between every pair of significant and suggestive loci (or the next most significant locus in the two cases where there is just one suggestive locus). Statistically significant

G×G×GNkx effect sizes vary between defects. AVSD effects are larger than ASD and VSD effects.

The relationship is not secondary to an underlying correlation, such as between the observed incidence at a two-locus genotype and the G×G×GNkx effect size or the severity of a defect.

Patterns of G×G×GNkx effects suggest that, during the inbreeding of mouse strains, fitness pressures positively selected genotypes, resulting in the coadaptation of interacting genes, as opposed to the elimination of intrinsically deleterious alleles. Among significant G×G×GNkx effects, two-locus genotypes that are homozygous at both loci for alleles from the same inbred strain or “syn-homozygous”, e.g., BB at both loci, are more likely to lower risk. Conversely, anti- homozygous G×G×GNkx effects, e.g., homozygous FF at one locus and BB at the other, are more likely to raise risk. Heterozygosity at either locus is equally likely to raise or lower risk (Figure

2.7A). Syn- and anti-homozygous G×G×GNkx interactions can have major effects on the risk of a severe defect. For example, one anti-homozygous, two-locus genotype accounts for 50% of all the

AVSDs in the F×B intercross (Figure 2.5D). The genotypes at either locus are not intrinsically deleterious because Nkx2-5 wild-type mice, including littermates of affected mutants, do not develop AVSDs. Conversely, both syn-homozygous genotypes at two loci in the A×B intercross effectively suppress AVSD risk (Figure 2.5C). In general, no one- or two-locus genotype skews

ASD or VSD risk as extremely (Figure 2.6).

37

Selection for coadapted genotypes likely increased the robustness of cardiac developmental pathways to subsequent perturbations – Nkx2-5 mutation in this case. Internally consistent

G×G×GNkx effects further suggest that selection for coadaptation was a recurrent event in each of the inbred strains examined. Protective and deleterious effects of syn- and anti-homozygosity, respectively, are associated with congruent effects at the other syn- and anti-homozygous genotypes (Fig. 2.7B–E).

38

Discussion

Normal cardiac development is crucial for survival to reproductive age, but CHD phenotypes exact widely ranging costs on fitness with severe defects being rarer than mild ones.

One must wonder whether there is an evolutionary basis for these observations19. Highly penetrant, mostly de novo genetic abnormalities account for about one third of all CHD cases3, but they do not obviously account for the inverse relationship between defect severity and incidence. For example, ASDs represented more than 10% of CHD cases in two large, whole-exome sequencing studies20,21. The present results suggest instead that the inverse relationship is a consequence of the genetic architectures of individual phenotypes. The results offer novel insights into the nature of genetic risk in CHD and the origins of mutational robustness.

Genetic heterogeneity and rare alleles obscure the biological significance of epistasis in human diseases. In contrast, inbred strain crosses offer more power to detect genetic interactions in the setting of naturally occurring variation. Through such crosses, we show that the effects of pairwise and higher-order epistasis correlate with the severity of a heart defect in Nkx2-5+/- mice.

Other features of the genetic architecture of a defect, such as the number of QTLs, do not correlate.

The QTLs for mild ASDs exert mainly small, additive effects on risk, while the QTLs for severe

AVSDs exert large, additive and non-linear effects. The QTLs for moderately severe, ventricular septal defects have intermediate effects. Strikingly non-linear G×G×GNkx effects at certain two- locus genotypes can effectively suppress the risk of an AVSD or account for a large fraction of

AVSDs in the Nkx2-5+/- population. These observations support a complex genetic model for the inverse relationship between the severity and incidence of a defect in humans. Whereas a modestly deleterious mutation may be sufficient to cause a mild defect, additional interactions with other 39 loci are necessary to surpass the liability threshold for a severe defect. Severe defects are hence rarer because the relevant multi-locus genotypes are improbable. Consistent with this model, a small study of non-syndromic AVSD found that among 34 persons who carried a mutation of one of six AVSD genes, NIPBL, CHD7, CEP152, BMPR1A, ZFPM2, or MDM4, 8 carried two or more, rare or rare, damaging nonsynonymous variants of the six genes. Larger studies are necessary to confirm the result and validate the potential interactions22.

The model further suggests a corollary for strongly deleterious mutations. Epistasis should matter less because the mutations place individuals near or beyond the liability threshold for a defect. To exert the same phenotypic effect as a weak mutation and its associated genetic interactions, a strong mutation must either obstruct one critical pathway or perturb several simultaneously. The dramatic effect of de novo mutations of histone-modifying genes, which regulate the expression of multiple developmental genes, is consistent with the latter possibility23.

Although genome-wide heterozygosity is associated with an overall lower risk of CHD in

Nkx2-5+/- mice, which indicates hybrid vigor, the effects of syn- and anti-homozygosity at interacting loci suggest genetic coadaptation. Dobzhansky introduced the concept of selection for alleles in gene complexes by virtue of the fitness benefits that their interaction confers in 1948

[Refs. 24,25]. Coadapted genes are probably widespread but under-detected because the alleles at either locus are not intrinsically deleterious. Recent examples relate to growth under different conditions in yeast8, male fertility in Drosophila9, body composition phenotypes in mice26, and vitamin D and skin color gene coadaptation in humans27. An especially elegant example demonstrates genetic coadaptation in the regulation of fasting plasma glucose levels, a diabetes- related trait. In a set of mouse crosses involving chromosomal substitution strains, one or two,

40 different chromosomes from the A/J strain were introduced into the C57BL/6J background. Five interactions were discovered between pairs of A/J chromosomes that promoted normal glucose homeostasis. Compared to the control, C57BL/6J strain, mice that carried either A/J chromosome of an interacting pair had elevated glucose levels. Yet, the levels fell back to normal when mice carried both A/J chromosomes rather than rise higher as an additive model would predict28. A/J and C57BL/6J appear to have arrived at different genetic solutions to maintain normal plasma glucose levels, just as the three strains in the present work must promote the robustness of cardiac developmental pathways.

Finally, whether robustness to mutation arises by selection or as an inherent consequence of an adapted trait has been a longstanding debate14,29. The results thus offer a rare example of selection for genetic robustness15. Depending upon the experimental design, investigators have lacked either the ancestral population to assess the pre-robust state or the years required to detect selection for robustness in an evolving population14. Our application of inbred strains, which were selected for fitness under genome-wide homozygosity, addresses these challenges. Unknown pressures appear to have selected strain-specific genotypes at interacting loci that enhance the robustness of cardiac developmental pathways to subsequent perturbation. Nkx2-5 mutation was probably not the original selection pressure because there are many potential causes of CHD.

Nevertheless, outbreeding in our intercrosses produces individual variation in the robustness to

Nkx2-5 mutation. Modifier locus genotypes permit us to observe the analog of pre- and post- selected states in the same population. If humans likewise vary in robustness, then the genetic basis of CHD is a function not only of deleterious variants but also of the degree of coadaptation of genes in an individual.

41

Methods

Mouse strains and crosses

Inbred C57BL/6N (B) and FVB/N (F) mice were purchased from Charles River

(Wilmington, Massachusetts) and A/J (A) from the Jackson Laboratory (Bar Harbor, Maine). The

+/- 6,30 Nkx2-5 mutant line is maintained in the C57BL/6N background . To produce F1 hybrids,

+/- +/- Nkx2-5 C57BL/6N males were crossed to wildtype A/J or FVB/N females. The Nkx2-5 F1 was then crossed to produce A×B and F×B F2 progeny. We also produced F10 and F14 advanced intercross generations of F×B by random mating from the F2 generation onward. Mating between

31 siblings and first cousins was avoided from the F3 and F4 generations onward . Mice were housed under standard conditions in the same room with access to water and chow ad libitum. The

Washington University School of Medicine Animal Studies Committee approved the experiments.

Heart collection and phenotyping

We collected and phenotyped hearts as previously described 6. Briefly, pups were collected within hours of birth. Thoraxes were fixed in 10% neutral buffered formalin. After Nkx2-5 genotyping, all Nkx2-5+/- and a subset of wild-type hearts were dissected and embedded in paraffin.

Hearts were completely sectioned in the frontal plane at 6 µm thickness. Every section was collected, stained with hematoxylin and eosin, and evaluated by two individuals. All defects were diagnosed by the morphology of the atrial and ventricular septae, semilunar and atrioventricular valves, chambers and the anatomic relationships between them. Secundum ASDs were distinguished from a patent foramen ovale based on the size of the latter in wild-type hearts.

42

Single nucleotide polymorphism genotyping

Genomic DNA was isolated from tissues by phenol-chloroform extraction. Single nucleotide polymorphism (SNP) genotyping was performed on a subset of the phenotyped animals.

Samples were selected to maximize the number of mice with heart defects while maintaining a comparable number of control mice. Control mice had normal hearts and were selected semi- randomly in the AILs to avoid over-representation of individuals from low-incidence families. For the heart defect cases, we SNP genotyped mice with the largest sized defects relative to anatomical landmarks in each heart. See Table 1 for the sample sizes of SNP genotyped mice per population.

For the A×B6N and FVB×B6N F2 crosses, we created custom SNP panels on the Sequenom

MassARRAY system6. We ascertained 102 and 129 SNPs evenly distributed across all 19 autosomes from the A×B6N and FVB×B6N F2 crosses, respectively. The average marker density ranged from 19-24 Mbp. The X chromosome was not included in this study because preliminary studies did not show any suggestive loci on the X chromosome.

Mice from the FVB×B6N F10 and F14 AIL populations were genotyped using the Illumina

Mouse MD Linkage Panel. 1359 SNPs were ascertained across all autosomes. We used the UCSC

Genome Browser (genome.ucsc.edu) to update the SNP identifiers and locations to the

GRCm38/mm10 (Dec. 2011) mouse genome build. We removed 4 SNPs that were not found in dbSNP (build 141). We removed SNPs with > 10% missing or with low median quality scores

(GC < 0.35), and removing SNPs that are not polymorphic between B6N and FVB strains. After

QC, 761 SNPs remained. The genotype data from the FVB×B6N F2 and AIL crosses were

43 combined and 3 markers were genotyped in all crosses, leaving a total of 887 markers with an average marker density of 3 Mbp. Appendix I provides the full list of SNPs.

Genotype probabilities and imputation

17 To increase the mapping resolution and enhance power , we combined the F2 and AILs from the FVB×B6N mouse crosses. This required imputing the SNPs that were not ascertained in the alternate mouse population.

30 In the F2 mice, we used R/qtl (version 1.40-8) to calculate the genotype probabilities of

758 SNPs that were ascertained only in the AIL mice30. For the missing genotypes in the AILs, we used QTLRel31 (version 0.2-15) to calculate the genotype probabilities of 126 SNPs that were only ascertained in the F2 mice. The genotype probabilities calculated by QTLRel account for the recombination patterns based on the AIL generation, therefore we calculated the genotype probabilities separately for the F10 and F14 crosses. The genotype probability data from all three crosses was combined and then converted to imputed allele dosages or posterior mean genotypes in the BIMBAM genotype file format [see GEMMA32 manual, http://www.xzlab.org/software/GEMMAmanual.pdf]. The BIMBAM genotype file uses genotype probabilities during association testing to account for uncertainty in the imputed genotypes.

We imputed missing genotypes for each locus that was used in the epistatic effect analyses.

For the F2 populations, we used R/qtl to impute all missing genotypes to that with the maximum marginal probability. With this process, 80% of missing genotypes were imputed with > 70% confidence; only 2% of missing genotypes were imputed with less than 50% confidence. For the

44

AIL population, we used QTLRel to impute the missing genotypes separately for the F10 and F14 crosses. 99.9% of missing AIL genotypes were imputed with > 70% confidence.

Genetic mapping

We performed case-control genome-wide association tests separately for each of the 4 heart defects: atrial septal defects, membranous ventricular septal defects, muscular ventricular septal defects, and atrioventricular septal defects. Although we found other rare and severe heart defects, like double outlet right ventricle, their sample sizes (< 20) were insufficient for genetic analyses.

Mapping in the combined FVB×B6N F2 and AIL mice crosses required special considerations, as discussed in Parker et al. 2014 [Ref. 17]. The main issue is unequal relatedness between individuals in the AIL populations, which leads to elevated false-positive rates if not properly controlled. To account for relatedness, we used the univariate linear mixed model implemented in the GEMMA32 (version 0.94.1). We modeled specific heart defects as binary phenotypes using genome-wide SNPs and a “random” or “polygenic” effect that accounts for relatedness:

푦 = 휇 + 푥훽 + 푢 + 휖;

where y is a vector of binary phenotypes (0 = control, 1 = heart defect), μ is the mean, x is a vector of SNP marker genotypes, β is a vector of marker effects, u is a vector of random effects, and ϵ is a vector of errors.

Linear mixed models account for relatedness by including a genetic covariance or relatedness matrix (GRM), which captures the variance in the phenotype that is due to the genome- 45 wide similarity between individuals. For each case-control analysis, we used GEMMA to estimate centered GRMs using the SNP genotypes from the combined F2 and AIL FVB×B6N population.

However, we found that there was little power to detect genetic associations. This was because the average pairwise relatedness between F2 mice was estimated to be near 0, which suggests they are unrelated. Because all F2 individuals are effectively full-siblings and share on average 50% of their genotypes, we substituted the F2 - F2 relatedness coefficients in the GRM with 0.5, the expected value for a full sibship, and 1 on the diagonal. This substitution allowed us to dramatically improve our power to detect genetic associations in the combined F2 and AIL population. F2-AIL relatedness coefficients were not changed. Then we adjusted the entire matrix (F2 + AIL) to the nearest positive definite matrix using the nearPD function in the “Matrix” R package (version 1.2-

6). This adjustment adds a small amount of noise to the GRM to remove negative eigenvalues, which is necessary for stability of the mixed model. A Mantel test for similarity between the pre- and post-adjusted GRMs showed > 95% similarity in all cases.

In the combined FVB×B6N F2 and AIL, we used the likelihood ratio test in GEMMA to test the alternative hypothesis that a SNP has a non-zero effect against the null hypothesis that is has zero effect. We set genome-wide significance thresholds by Bonferroni-correction. The number of tests for the correction was calculated with the simpleM software33, which estimates the effective number of independent tests by accounting for linkage between SNP markers. The number of effective tests or markers was 542. Therefore, significant loci had P values < 9.2×10-4

(0.05/542) and suggestive loci had P values < 1.8×10-3 (1/542)34.

We performed interval mapping separately for each heart defect in the A×B6N F2 population using the binary trait model in R/qtl. We calculated genotype probabilities at missing

46 genotypes and 9cM intervals. Statistical significance was determined by 5000 permutations.

Significant loci (α = 0.05) had LOD scores > 3.3, and suggestive loci (α = 0.20) had LOD scores

> 2.7 [Ref. 34].

Estimation of heritability

The percent phenotypic variance explained by all genotyped SNP that interact with Nkx2-

5+/- (genome-wide PVE) can be estimated using restricted maximum likelihood (REML) in the linear mixed model in GEMMA. This method uses the GRM to estimate the genome-wide PVE and its standard error in each case-control population. In the absence of Nkx2-5 mutation, this method would be similar to estimating the narrow-sense heritability (h2) from genotyped SNPs.

We used GEMMA to estimate the GRMs for the A×B6N F2 population and used the same matrices used for mapping the combined FVB×B6N population.

We corrected the PVE estimates for ascertainment and transformed them to the liability scale using the equation35:

푃2(1−푃)2 푃푉퐸; 퐴(1−퐴)푧2 where P is the population disease incidence, A is incidence in the ascertained cohort, and z is the height of the standard normal distribution at the population liability threshold36. Our genome-wide

PVE estimates may slightly underestimate the true PVE because we did not include the sex chromosomes, and there may be incomplete linkage between the SNP markers and some modifier variants.

47

Analysis of G×GNkx effects

We estimated the percent variance explained by each modifier SNP (SNP PVE) using the equation:

[(MAF × (1 − MAF)) × β2] ; Phenotype Variance

where MAF is the SNP minor allele frequency and β is the effect of the SNP in a linear model. The SNP PVE estimates were corrected and transformed to the liability scale in the same manner as the genome-wide PVE estimates. We used GEMMA to estimate βs for FVB×B6N SNPs using the GRMs from the mapping procedure. The effects of A×B6N SNPs were also estimated in

GEMMA while using a GRM with all individuals set as full-siblings (relatedness coefficients set to 0.5).

We estimated odds ratios for the G×GNkx effects using logistic regression models. Odds ratios for each locus identified in the A×B6N F2 population were obtained using the base glm function in R. Odds ratios for each locus identified in the combined FVB×B6N population were obtained using the logistic mixed model implemented in GMMAT37. This mixed model allows us to control for relatedness in the AIL using the same relatedness matrices that were used in the mapping procedure.

Analysis of G×G×GNkx effects

+/- Third-order epistais effects (G×G×GNkx), between two modifier loci and Nkx2-5 , can be estimated from the pairwise interactions between modifiers. Therefore, we quantified the

48 contribution of G×G×GNkx epistasis to the observed risk for each two-locus genotype using

Cheverud & Routman’s physiological epistasis model18, which is not weighted by allele frequencies. Two-locus genotype refers to one of the nine possible genotypic combinations between interacting modifier loci (see Figure 2.5). In short, we computed epistatic effects as the difference between the observed defect incidence and the expected incidence in the absence of

G×G×GNkx epistasis. The expected incidence is estimated from the two modifier loci by assuming they affect risk independently. This expected incidence can be calculated as the unweighted regression of the two-locus genotype incidences onto the two single-locus incidences. The epistatic effects are measured in units of heart defect incidence, with the sign depending on whether the epistatic effect enhances (+) or suppresses (-) incidence at that two-locus genotype. To identify the significant epistatic effects, we arcsine-transformed the data to obtain unbiased variances as described by Sokal & Rohlf38. Using these variances, we conducted two-tailed T-tests to determine which epistatic effects were significantly different from zero. The P value threshold for the T-test was a Bonferroni-corrected 0.05 significance threshold. The number of tests was equal to the number of pairwise combinations of loci multiplied by nine, the number of two-locus genotypes.

Significant epistatic effects were rescaled by dividing by the unweighted population incidence, which is the arithmetic mean of the nine incidences at each two-locus interaction18.

These rescaled epistatic effects show the change in observed risk as a percentage of the defect’s population incidence. This allowed us to compare the relative magnitudes of the epistatic effects even though the heart defect incidences are different. Custom R code was developed to perform this analysis. Because we only identified one G×GNkx modifier locus for ASD and Mus. VSD in the FVB×B6N population, we assessed G×G×GNkx effects in combination with the next most

49 significant locus; ASD: rs13478997, Chr. 6, 55cM, P = 1.94×10-3; Mus. VSD: gnf13.115.241, Chr.

13, 64cM, P = 2.39×10-3.

Note: It was not possible to estimate G×G×GNkx effects using logistic regression models as in the G×GNkx case because of convergence failures. These failures were caused by two-locus genotype groups with 0 heart defects, causing undefined values after logit transformation.

Statistical Analyses

GEMMA was run in Ubuntu 14.04. All other statistical analyses were performed in the R

Statistical Computing environment (version 3.3.1). T-tests and Z-tests used a two-sided alternative hypothesis. We used the R package “ppcor”39 to calculate Kendall’s partial correlation40,41 (τ) with a two-sided alternative hypothesis. Partial correlations allowed us to calculate non-parametric correlation while controlling for the mouse population and the number of defect cases. ANOVA models controlled for the number of defect cases as a covariate.

50

Figures

A

B

Figure 2.1 The severity and incidence of a heart defect are negatively correlated.

A) In humans, mild defects are more common than severe ones. The median survival without surgical intervention defines the severity of a defect 2. The panels show representative heart defects from Nkx2-5+/- pups. Arrows point to a secundum ASD, membranous and muscular VSDs, and the common atrioventricular canal in an AVSD. B) As in humans, the incidences of heart defects in Nkx2-5+/- newborns are inversely related with their severity (Kendall’s partial rank correlation  = -0.845, P < 0.01). Double outlet right ventricle (DORV) was present in 18 and 3 Nkx2-5+/- hearts from the F×B and A×B intercrosses respectively. One case of tricuspid atresia was found in each intercross.

51

Figure 2.2 Breeding scheme for inbred strain intercrosses

+/- We produced F1 hybrids by crossing wild-type FVB/N (F) or A/J (A) females to Nkx2-5 +/- C57BL/6N (B) males. We phenotyped newborn Nkx2-5 F2 pups from both A×B and F×B intercrosses and F10 and F14 pups from the F×B advanced intercross. The advanced intercross was produced by random mating of non-siblings and non-first cousins beginning in the F3 and F4 generations. The number of Nkx2-5+/- pups phenotyped in each generation is shown.

52

A

B C

Figure 2.3 The quantitative effect of G×GNkx interactions on risk correlates with the severity of a heart defect

A) Genetic loci modify the risk of specific heart defects in Nkx2-5+/- mice from the F×B and A×B intercrosses. The number of affected Nkx2-5+/- newborns genotyped for each defect is indicated. Normal Nkx2-5+/- littermates were genotyped as controls (N = 330 A×B and 1024 F×B). Suggestive and genome-wide significance thresholds are indicated in blue and red. B) and C) G×GNkx effects on risk, quantified as the percent variance explained by the most significant SNP at a locus or the odds ratio for each risk allele, correlate with defect severity. Muscular () and 53 membranous (•) VSD loci are analyzed together because the two VSD types have similar severity. The hatch marks indicate the mean.

54

A B

Figure 2.4 The correlation between defect severity and effect size is not attributable to trivial

explanations

Neither A), the number of significant and suggestive loci detected nor B), the phenotypic variability explained by all genotyped SNPs varies with the severity of a defect. Error bars are 95% C.I.

55

A B

C D

E

Figure 2.5 G×G×GNkx effects correlate with the severity of a heart defect

Representative plots depict the observed and expected incidence of a defect at a two-locus genotype. The expected incidence is calculated under the null hypothesis of non-interacting loci. The difference between the observed and expected incidences is the G×G×GNkx effect at a genotype. Not all differences are significant. A), B) ASD and VSD loci mainly contribute to risk independently, as indicated by similar observed and expected incidences. Significant G×G×GNkx

56 effects are small relative to the unweighted average of incidences (UWA, scale bars) across all nine genotypes. C), D) Epistatic interactions between AVSD loci exert large G×G×GNkx effects. Consider two examples (*). The syn-homozygous, two-locus genotypes (AA-AA and BB-BB) in the A×B intercross effectively suppress risk. In contrast, half of the AVSDs in the F×B intercross are associated with an anti-homozygous, Chr 5 BB-Chr 16 FF genotype. e, G×G×GNkx effect sizes correlate with defect severity (Kendall’s τ = 0.195, P = 0.0017). AVSD effects are larger than ASD and VSD (Mann-Whitney U test). Significant G×G×GNkx effects were divided by the UWA and shown as absolute values because the net deviation of observed and expected incidences is zero. The sign indicates whether a significant effect increases (+) or decreases (-) risk. The hatch marks indicate the mean. Effects are color-coded by their two-locus genotypes, as given in the Punnett square for syn- and anti-homozygous, double heterozygous, and mixed genotypes. Alleles: B, C57BL/6N; X, A/J or FVB/N.

57

Figure 2.6 Every possible interaction between G×G×GNkx modifiers for each defect type

The observed incidences at two-locus genotypes of ASD, membranous and muscular VSD, and AVSD are shown between every pair of suggestive or genome-wide significant modifier loci found in both crosses. In the two cases where there is only one suggestive locus, the locus is plotted against the next most significant locus in the cross. ASD: F×B

ASD: A×B

Membranous VSD: F×B

58

Membranous VSD: A×B 59

Muscular VSD: F×B 60

Muscular VSD: A×B

61

AVSD: F×B

AVSD: A×B

62

A B

C D

E F

Figure 2.7 Genetic compatibility between two loci involved in G×G×GNkx interactions lowers the risk of heart defects in Nkx2-5+/- mice

A), Statistically significant syn-homozygous G×G×GNkx effects between two modifier loci are more likely to be protective than expected by chance or compared to anti-homozygous effects. Anti-homozygous effects are less likely to be protective than expected by chance. Double- heterozygous and mixed genotype effects are equally likely to be protective or deleterious. The fractions of significant effects that are protective are indicated. *, not equal to 50%, P < 0.05. B– 63

F) A significant G×G×GNkx effect at a syn-homozygous genotype that is protective (risk ) or at an anti-homozygous genotype that is deleterious (risk ) is associated with congruent effects at the other syn- and anti-homozygous genotypes. Proportions were compared in two-sided z-tests.

64

Tables Cross Defect Peak SNP Locus interval LOD or -log(p) Risk Allele Candidate genes

A×B ASD rs6238909 chr1:125665904-182054904 2.88 B Csrp1, Lefty1, Lefty2, Pitx2, Vangl2

A×B ASD rs31135898 chr3:52939690-158052844 3.22 A Bmpr1b, Notch2, Pifo, Pitx2

A×B ASD rs13478067* chr4:124165816-154991070 2.69 A

A×B Mbr.VSD rs3659784 chr2:31172022-152809607 3.42 A Actc1, Acvr1, Bbs5, Jag1, Lrp2, Mkks, Ttc21b, Zeb2

A×B Mbr.VSD rs27610338 chr4:100332046-154991070 5.60 A Ptch2

A×B Mbr.VSD rs13479701 chr8:14759093-62208384 3.81 A Hand2, Vegfc,

A×B Mbr.VSD rs13482119 chr14:17628076-67124319 2.92 B Bmp4, Bmpr1a, Esco2, Gata4, Ift88, Myh6

A×B Mbr.VSD rs30415214 chr19:23376527-53761523 3.55 B Acta2, Fgf8

A×B Mus.VSD rs13479794 chr8:14759093-108947107 3.77 A Bbs2, Gdf1, Hand2, Nfatc3, Sall1, Vegfc

A×B Mus.VSD rs13482037 chr13:96099181-118789220 5.87 B Fgf10, Isl1

A×B AVSD rs13477291 chr3:75914491-136196844 4.98 A Notch2, Pifo, Pitx2

A×B AVSD rs27610338* chr4:100332046-154991070 3.40 A Ptch2 Cc2d2a, Evc, Evc2. Pkd2, Ptpn11, Med13l, Tbx3,

A×B AVSD rs13478337 chr5:35506765-129965381 4.08 B Tbx5

F×B ASD rs6304156 chr9:121915911-123153990 3.02 B

F×B Mbr.VSD rs6292642 chr6:94055997-111299794 6.12 B Foxp1

F×B Mbr.VSD rs3657963 chr8:15279371-17579118 4.98 F Csmd1

F×B Mus.VSD rs6349084 chr6:85961013-104553235 3.13 B Prickle2

F×B AVSD rs6255362 chr5:116106456-134045067 4.74 B Ptpn11, Med13l, Tbx3, Tbx5

F×B AVSD rs4217260 chr16:85097188-93538107 2.73 F Table 2.1 Modifier loci by cross and defect type

F×B, -log(p). A×B, LOD. Bold, genome-wide significant. Non-bold, suggestive. *, estimated marker. The closest SNP is indicated.

The locus interval is bounded by the 1 LOD or -log(p) drop from the peak SNP

65

References

1. Hoffman, J. I. & Kaplan, S. The incidence of congenital heart disease. Journal of the American college of cardiology 39, 1890-1900 (2002).

2. Hoffman, J. I. E. The natural and unnatural history of congenital heart disease. (Wiley- Blackwell, 2009).

3. Akhirome, E., Walton, N. A., Nogee, J. M. & Jay, P. Y. The Complex Genetic Basis of Congenital Heart Defects. Circulation Journal 81, 629-634, doi:10.1253/circj.CJ-16- 1343 (2017).

4. Benson, D. W. et al. Mutations in the cardiac transcription factor NKX2. 5 affect diverse cardiac developmental pathways. The Journal of clinical investigation 104, 1567-1573 (1999).

5. Schott, J. J. et al. Congenital heart disease caused by mutations in the transcription factor NKX2-5. Science 281, 108-111 (1998).

6. Winston, J. B. et al. Heterogeneity of genetic modifiers ensures normal cardiac development. Circulation 121, 1313-1321, doi:10.1161/CIRCULATIONAHA.109.887687 (2010).

7. Panzer, A. A. et al. Nkx2-5 and Sarcospan genetically interact in the development of the muscular ventricular septum of the heart. Scientific reports 7, 46438 (2017).

8. Forsberg, S. K. G., Bloom, J. S., Sadhu, M. J., Kruglyak, L. & Carlborg, O. Accounting for genetic interactions improves modeling of individual quantitative trait phenotypes in yeast. Nat Genet 49, 497-503, doi:10.1038/ng.3800 (2017).

9. Corbett-Detig, R. B., Zhou, J., Clark, A. G., Hartl, D. L. & Ayroles, J. F. Genetic incompatibilities are widespread within species. Nature 504, 135-137 (2013).

10. Noble, L. M. et al. Polygenicity and epistasis underlie fitness-proximal traits in the Caenorhabditis elegans multiparental experimental evolution (CeMEE) panel. Genetics, genetics. 300406.302017 (2017).

11. Hill, W. G., Goddard, M. E. & Visscher, P. M. Data and theory point to mainly additive genetic variance for complex traits. PLoS genetics 4, e1000008 (2008).

12. Wilkins, J. F., McHale, P. T., Gervin, J. & Lander, A. D. Survival of the Curviest: Noise- Driven Selection for Synergistic Epistasis. PLOS Genetics 12, e1006003, doi:10.1371/journal.pgen.1006003 (2016).

66

13. Sackton, Timothy B. & Hartl, Daniel L. Genotypic Context and Epistasis in Individuals and Populations. Cell 166, 279-287, doi:10.1016/j.cell.2016.06.047.

14. de Visser, J. A. et al. Perspective: Evolution and detection of genetic robustness. Evolution 57, 1959-1972 (2003).

15. Siegal, M. L. & Leu, J.-Y. On the nature and evolutionary impact of phenotypic robustness mechanisms. Annual review of ecology, evolution, and systematics 45, 495- 517 (2014).

16. Maury, P. et al. Cardiac Phenotype and Long-Term Follow-Up of Patients With Mutations in NKX2-5 Gene. Journal of the American College of Cardiology 68, 2389- 2390 (2016).

17. Parker, C. C. et al. High-resolution genetic mapping of complex traits from a combined analysis of F2 and advanced intercross mice. Genetics 198, 103-116, doi:10.1534/genetics.114.167056 (2014).

18. Cheverud, J. M. & Routman, E. J. Epistasis and its contribution to genetic variance components. Genetics 139, 1455-1461 (1995).

19. Dobzhansky, T. Nothing in biology makes sense except in the light of evolution. The american biology teacher 75, 87-91 (1973).

20. Jin, S. C. et al. Contribution of rare inherited and de novo variants in 2,871 congenital heart disease probands. Nature genetics 49, 1593 (2017).

21. Sifrim, A. et al. Distinct genetic architectures for syndromic and nonsyndromic congenital heart defects identified by exome sequencing. Nat Genet 48, 1060-1065, doi:10.1038/ng.3627 (2016).

22. D’Alessandro, L. C. et al. Exome sequencing identifies rare variants in multiple genes in atrioventricular septal defect. Genetics in Medicine 18, 189 (2016).

23. Zaidi, S. et al. De novo mutations in histone-modifying genes in congenital heart disease. Nature 498, 220-223, doi:10.1038/nature12141 (2013).

24. Dobzhansky, T. Genetics of natural populations. XVIII. Experiments on chromosomes of Drosophila pseudoobscura from different geographic regions. Genetics 33, 588 (1948).

25. Dobzhansky, T. Genetics of natural populations. XIX. Origin of heterosis through natural selection in populations of Drosophila pseudoobscura. Genetics 35, 288 (1950).

26. Tyler, A. L., Donahue, L. R., Churchill, G. A. & Carter, G. W. Weak epistasis generally stabilizes phenotypes in a mouse intercross. PLoS genetics 12, e1005805 (2016).

67

27. Tiosano, D. et al. Latitudinal clines of the human and skin color- genes. G3: Genes, Genomes, Genetics, g3. 115.026773 (2016).

28. Chen, A., Liu, Y., Williams, S. M., Morris, N. & Buchner, D. A. Widespread epistasis regulates glucose homeostasis and . PLoS genetics 13, e1007025 (2017).

29. Wagner, A. Robustness and evolvability in living systems. (2005).

30. Broman, K. W., Wu, H., Sen, S. & Churchill, G. A. R/qtl: QTL mapping in experimental crosses. Bioinformatics 19, 889-890 (2003).

31. Cheng, R., Abney, M., Palmer, A. A. & Skol, A. D. QTLRel: an R package for genome- wide association studies in which relatedness is a concern. BMC Genet 12, 66, doi:10.1186/1471-2156-12-66 (2011).

32. Zhou, X. & Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nat Genet 44, 821-824, doi:10.1038/ng.2310 (2012).

33. Gao, X. Multiple testing corrections for imputed SNPs. Genet Epidemiol 35, 154-158, doi:10.1002/gepi.20563 (2011).

34. Lander, E. & Kruglyak, L. Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat Genet 11, doi:10.1038/ng1195-241 (1995).

35. Zhou, X., Carbonetto, P. & Stephens, M. Polygenic Modeling with Bayesian Sparse Linear Mixed Models. PLoS Genet 9, e1003264, doi:10.1371/journal.pgen.1003264 (2013).

36. Falconer, D. S. & Mackay, T. F. C. Introduction to Quantitative Genetics. 4th edn, (Longman, 1996).

37. Chen, H. et al. Control for Population Structure and Relatedness for Binary Traits in Genetic Association Studies via Logistic Mixed Models. The American Journal of Human Genetics 98, 653-666, doi:10.1016/j.ajhg.2016.02.012. (2016).

38. Sokal, R. R. & Rohlf, F. J. Biometry: the principles of statistics in biological research. 3rd edn, (New York, NY: WH Freeman and Co, 1995).

39. Kim, S. ppcor: An R Package for a Fast Calculation to Semi-partial Correlation Coefficients. Communications for statistical applications and methods 22, 665-674, doi:10.5351/CSAM.2015.22.6.665 (2015).

40. Kendall, M. G. A NEW MEASURE OF RANK CORRELATION. Biometrika 30, 81-93, doi:10.1093/biomet/30.1-2.81 (1938).

41. Kendall, M. G. Partial Rank Correlation. Biometrika 32, 277-283, doi:10.2307/2332130 (1942). 68

Chapter 3

High-resolution mapping of the maternal age-associated risk of congenital heart disease

69

Introduction

Advanced maternal age is an underappreciated risk factor for congenital heart disease (CHD) in humans. Some may attribute this risk solely to the age-related decline in oocyte quality and an accumulation of genetic mutations. However, the risk remains even after excluding children with major syndromes or aneuploidy1.

The laboratory of Patrick Jay was the first to demonstrate an analogous maternal age- related CHD risk in mice. Using an Nkx2-5+/- mouse model of CHD, they demonstrated that each additional month of maternal age increased the offspring CHD risk by ~10% (Ref. 2). Copy number analysis in the mice excluded any meaningful effect of aneuploidy or large copy number variants. To determine if the maternal age risk is due to old eggs or physiological factors within old mothers, the ovaries of young mouse mothers were reciprocally exchanged for those of old mothers. The results conclusively showed that the elevated risk of CHD, specifically ventricular septal defects, in Nkx2-5+/- pups is due to physiological factors from the aging mother, not time- dependent damage to the eggs3. Subsequent experiments showed that the age effect can be mitigated if the mother voluntarily exercises by running on a mouse wheel. The positive effect of aerobic exercise highlights the possibility that if the causal factors are identified, therapeutics, targeted at the mother, can mimic the effects of exercise and suppress age-associated CHD risk in humans.

Following these profound discoveries, the next steps were not so clear because the maternal effect could be anything in the mother that changes with age. The most prudent approach is to use an unbiased method to identify the driving factor in the mother, rather than pursuing one of many

70 possible angles. Fortunately, there was one key observation that guided the path forward: the magnitude of the maternal age effect varies by the mother’s genetic background (Figure 3.1) This variation establishes that the age-related risk is a quantitative genetic trait that is amenable to genetic mapping. To associate our phenotype to specific genes, we need a genetic tool that enables sufficiently high mapping resolution; this tool is an advanced intercross line Darvasi and Soller4.

High-resolution genetic mapping in an advanced intercross line

The resolution of genetic mapping depends on the number of meiotic crossover events that have occurred since the original founder pair. It is the random shuffling of chromosome segments during meiosis that allows trait-associated genotypes to segregate with phenotypes5,6.

Experimental backcrosses and F2 intercross analyses only have one generation of recombination events, thus the mapped loci have confidence intervals that typically comprise hundreds of candidate genes7. Contrarily, advanced intercross lines (AILs) boost the recombination frequency by randomly crossing F2 progeny for several generations, without mating individuals with common grandparents4. This process results in a panel of genetically unique mice. The mapping resolution increases linearly with additional intercross generations.

th Dr. James Cheverud has developed a 56 generation advanced intercross (F56) by intercrossing Large and Small (LG/J×SM/J) mouse strains. Many generations of meiotic recombination events have expanded the F56 genetic map 28-fold relative to an F2 population

(Figure 3.2). Linkage mapping in the F56 population will yield loci spanning ~600kb, an interval that contains an average of three genes. Earlier generations of this LG/J×SM/J advanced intercross have been used successfully to map narrow genetic loci contributing to various complex traits 71 including obesity8, bone length9, metabolism10, anxiety7 and tissue regeneration11. Cheverud’s AIL was used to identify candidate genes (Aff3, Fam18a, Syn3 and Ank) for ectopic calcification12.

Mapping the maternal age-associated risk of CHD in the F56 LG/J×SM/J dams presents an unprecedented opportunity to locate modifiers of CHD risk at almost single-gene resolution.

72

Results

Our team collected a total 17,568 mouse pups from 289 LG/J×SM/J dams 56th generation advanced intercross (LSAI) dams, 17,425 were available for analysis. The experimental cross was optimized to generate Nkx2-5+/- pups, while allowing for homozygous genotypes at all potential

+/- modifier loci (Figure 3.3). We crossed LSAI sister pairs (littermates) to an Nkx2-5 F1 hybrid male that was either B6;LG or B6;SM. Using these males ensures the offspring population will express any recessive CHD risk modifiers. From the total pups collected, there were 8634 Nkx2-5 mutants. A one sample z-test showed that the number of heterozygous pups did not differ from the expected 50% ratio (P = 0.234).

Atrial septum and muscular septum defects reveal a large maternal age effect

CHD phenotyping began on a cohort of 7242 Nkx2-5+/- pups. First, I determined the incidence of the four most common heart defects: atrial septal (ASD), membranous ventricular septal (Mbr. VSD), muscular ventricular septal (Mus. VSD) and atrioventricular septal (AVSD) defects (Figure 3.4A). It was clear that ASDs are common and AVSDs are the least common, as we expected from previous experiments.

To recapitulate the maternal age-dependent risk of CHD in humans, it is essential that CHD risk increases proportionally with maternal age. To determine whether the maternal age effect exists, we compared CHD incidence of mouse pups from young mothers (≤120 days old; N =

1613) to pups from old mothers (≥250 days old; N = 3276) (Figure 3.4B). I found significant

73 maternal age effects for all defect types, but Mbr. VSD and AVSD maternal age effects were weaker. A logistic regression model further supports these observations (Figure 3.5).

The maternal age effect on CHD risk varies significantly between mothers

Although there is a robust maternal age effect for ASD and Mus. VSD, the most important factor in mapping the maternal age effect is variation in the effect among LSAI mothers. To test if there is significant variation in the age effect, I used a logistic regression model (see Methods) to determine if the incidence of heart defects is dependent on an interaction between mothers and the maternal age. A significant interaction term provides evidence for mappable variation in the maternal age effect. The logistic regression model showed significant interactions between mother and maternal age for all defects (Table 3.1).

Using a similar model as before, I estimated the change in the age effect for each mother individually. The resulting odds ratios (ORs) are point estimates describing the CHD risk per month of maternal age. These ORs can reach as high as 3 (Figure 3.6). However, it important to note that many of the point estimates have large standard errors because of the relatively small number of offspring available per mother. This explains the lack of error bars in Figure 3.6. There is likely a substantial amount of noise in these point estimates. However, this should not matter for two reasons: 1) The point estimates provide a reference for the magnitude and range of the age effects, 2) these results confirm the logistic regression model demonstrating that there is significant variation in the maternal age effect. The agreement between the two models suggests that genetic mapping will yield fruitful results.

74

Genotyping by sequencing identifies over 30,000 high quality SNPs

Single nucleotide polymorphism (SNP) genotyping was performed through two methods:

1) Genotyping by sequencing method (GBS)13,14 and 2) Illumina GeneSeek Genomic Profiler

GigaMUGA15. Over 90% of LSAI mothers were genotyped with GBS, the remaining that were genotyped with the Illumina array were to circumvent poor DNA quality.

GBS uses restriction enzymes to systematically reduce the size of the genome, so that the same DNA regions are sequenced in all samples. This allows high sequencing depth, which is essential for SNP variant calling. As a visual assessment of the sequencing and variant calling quality, I plotted the distribution of variants called by GBS in comparison to the regions known to be polymorphic between LG/J and SM/J (Figure 3.7). The 36,220 SNPs we detected closely fit the distribution of expected SNPs. Our SNPs achieve roughly 90% genome coverage.

The Illumina GigaMUGA array contains 143,000 SNPs that captures polymorphisms from over 150 different mouse strains. After quality control with the same thresholds as used for the

GBS samples and removing duplicate or invalid SNPs, there were 24,683 SNPs remaining for 20 samples (Table 3.2). I tested the concordance between GBS and GigaMUGA for two samples that were genotyped on both platforms (LSAI 101058 and 101080). I found that only 243 SNPs were genotyped on both platforms. After removing rows with missing data, I found that 94% and 82% of SNP calls overlapped between the two replicates, respectively. It appeared that the disagreements were almost entirely due to the GBS method missing a heterozygous call, only detecting one allele.

75

Because most samples were genotyped using the GBS platform, I imputed the SNPs called by GBS in the samples genotype with the GigaMUGA platform. Imputation was performed using the R package, QTLRel, to calculate genotype probabilities, and then the probabilities were converted to an allele dosage/mean genotype, which incorporates the level of certainty in the genotype value. After all the QC and processing steps, there were 36,220 SNPs available for 262 total LSAI mothers with 7242 pups.

Mapping the maternal age-associated risk of CHD

Using a linear mixed model (see Methods), I modeled the phenotype for each pup in a case–control or binary format. I classified controls as pups that lacked any sort of heart defect, and cases were the pups that have the specific defect under analysis. This means that the pups with other heart defects were excluded from that specific analysis. I found that this was necessary because including pups with other defects as controls, was reducing power by confounding the control cohort.

The independent variables are a combination of fixed and random effects. I estimated the fixed effects of maternal age, the mothers’ genetic effects (additive and dominant) at each locus, and pairwise interactions between maternal age and genetic effects on each pup’s risk of developing a specific heart defect. The interactions between maternal age and the genetic effects

(Maternal Age × Additive effect; Maternal Age × Dominance effect) are the most important terms in the analysis because it will pinpoint the loci that drive the maternal age effect. Our model also controlled for the paternal type as a binary covariate.

76

In early iterations of this analysis, I noticed that several LSAI mothers had offspring with defect incidences much lower than the population average, which greatly reduces the likelihood of detecting a maternal age effect when the offspring have effectively no risk of developing a specific defect (Table 3.3). To control for this scenario, I included the defect incidence from all offspring per mother as a covariate to control for the variation in the maternal age effect estimates that could be due to low CHD risk. I also controlled for unequal relationships between mothers by including a random effect of relatedness. This model was fit with custom scripts for the lmekin16 R function, which is a specialized wrapper for the coxme package that adds the capacity to model random effects with a genetic relatedness matrix (GRM) or a covariance matrix. I calculated covariance matrices using the functions in GEMMA. Genome-wide significance was set by Bonferroni correction based on the number of independent SNP markers17, 10,764 effective markers. Thus, genome-wide significance was set at p < 4.6×10-6 (0.05/10764) and the suggestive threshold was p < 9.30×10-5 (1/10764).

The maternal age effect greatly amplified ASD and Mus. VSD risk. One parsimonious hypothesis would suggest that the same maternal age effect loci will appear in both ASD and Mus.

VSD cohorts. Thus, I conducted separate analyses for each defect to identify genetic modifiers of the maternal age effect (Figure 3.8A, B). Upon inspecting the results, two things became clear: 1) there are several significant loci, particularly those for the ASD maternal age effect, and 2) there is no clear overlap between the loci from the ASD and Mus. VSD analyses. To remedy the latter observation, I combined the ASD and Mus. VSD cases into one analysis to ensure that the maternal age effect loci that I detect will affect both ASD and Mus. VSD. This analysis identified a significant locus on Chromosome 3, centered at 72,433,899 bp (Figure 3.9, Table 3.7). The some

77 of the nearest genes to this marker are Rapgef2, Fnip2, Ppid and Etfdh. These loci will be the target of future investigation, particularly those on chromosomes 3 and 8 because they are supported most strongly by the data.

Other important data is summarized in Figures 3.10 – 3.12 and Tables 3.5 – 3.8. These data include manhattan plots for other maternal age defect analyses, effect plots, loci intervals and protein coding genes in the significant loci.

78

Discussion

Congenital heart disease is a very complex biological phenomenon. Aside from the effects of mutations and genetic modifiers within an individual, there are external factors that can also have a large impact on CHD risk. Environment is frequently used as a catch-all term to include all heritability that is not explained by an individual’s genes. Although this simplifying assumption is useful at times, it does not allow a full investigation of the genetic architecture of complex traits.

In our work here, we go against the grain and explore the role of maternal effects in CHD.

Maternal effects are instances where a phenotype in the mother alters a phenotype in the offspring, independently of the genes she transmits to the offspring. There are many examples of maternal effects throughout nature. For instance, seasonal changes in temperature and light cycle determine whether an aphid mother will produce parthenogenetic or sexual offspring, and maternal sensitivity to these environmental changes varies within populations18. In mouse embryos, Tgf-β1 null mutations are lethal only when the mother is also Tgf-β1 null. Maternal expression of Tgf-β1 during gestation is sufficient to rescue normal development. Perhaps the most well-known maternal effect in humans is the “thrifty phenotype”, also known as the Barker hypothesis19. Barker noticed that adults that were in utero during the Irish potato famine suffer a substantially elevated risk for metabolic syndrome. He hypothesized that maternal malnutrition during that period caused lasting changes to the embryonic metabolism as an adaptation to scavenge as many nutrients as possible to maintain development. In adults, this can lead to an overly anabolic metabolism, which increases the risk for several cardiometabolic diseases.

79

The maternal age-associated risk of CHD was first observed in humans and recapitulated in our mouse model of CHD. The mice showed that the maternal age effect is ameliorable with voluntary exercise. This gives us hope that it will be possible to create therapeutics that target this age effect and potentially other risk factors for CHD. Although our work is still in progress, I have identified loci that potentially contain genetic modifiers of the maternal age effect. I hypothesize that the loci we may identify will contain genes that are components of classical aging pathways such as Mtor, FoxO3a, Sirt1 or Igf1. Because exercise mitigates the age effect, I may identify exercise responsive myokines20, or energy homeostasis genes such as Leptin. I may also find unexpected genes to be associated with the maternal age effect. These genes may not be recognized for roles in offspring development possibly because of antagonistic pleiotropy. Antagonistic pleiotropy occurs when gene effects that enhance fitness early in life, also have pernicious effects later in life21. For instance, a gene variant that affects reproductive hormone homeostasis to allow earlier reproduction, may negatively affect the female body at advanced age, thus enhancing the maternal age effect on CHD.

Current state and future directions for genotyping

At our current stage of progress in this project, we are in the process of genotyping all the remaining LSAI mother. This includes 61 LSAI mother samples that had missing or poor-quality spleen DNA, whose livers were used for GBS. There were also 25 LSAI mothers whose carcasses were lost in the Chicago shuffle; we are using GBS for ~10 pups from each missing mother to infer her genotypes. Once these genotypes are available we can start to finalize our observations from mapping.

80

Current state and future directions for statistical modeling (technical details)

As important as the genotyping effort, is the work on the statistical model used to map the maternal age effect. Early in the analysis, I noticed that the model may have some issues with pseudoreplication because the response variable in my model are individual pups that are treated as independent samples, when truly the pups from the same mother are correlated.

Pseudoreplication is the statistical phenomenon where the samples from which observation are collected are assumed to be independent, when in reality the samples show some sort of correlation. A simple example of pseudoreplication gives the rationale for the paired T-test22: when taking repeated measures from the same sample of individuals, the subsequent measurements are going to be correlated within an individual. Many researchers, including myself, assumed that the using a linear mixed model with a random intercept would fully account for the non-independence of individuals. Failure to account for the correlation between the rates of change in non- independent samples can inflate the Type 1 error rate because the standard errors are underestimated23.

Dr. Cheverud also noticed that there is an issue with pseudoreplication and suggested two remedies: 1) use the difference in the early and late incidence of CHD per LSAI mother as the response variable or 2) adjust the degrees of freedom for the model to match the number of mothers in the study, not the number of pups. These are great options that would alleviate the issue of pseudoreplication, however, they introduce other challenges. Option 1 becomes difficult for mothers with fewer pups because the estimates of their age effect will be poor because they have few samples, but their age effect (the difference) is weighted equally with other mothers more pups

81 and more confident estimates. Option 2 becomes difficult because the lmekin R function does not permit manually dictating the degrees of freedom. It would require using SAS PROC GLIMMIX, which I do not have available. If this software becomes available this may become a good option.

A well-referenced24 study by Schielzeth and Forstmeier23 found that including a random slope term, along with the random intercept term in the linear mixed model, will account for issues with pseudoreplication. This is supported by the fact that the random slope accounts for the non- randomness in the rates of change that are inherent in groups of correlated samples. Pursuant to this information, I did a small test to compare the performance of the linear mixed model with random intercept only to a model that added a random slope of maternal age. I used one SNP and the same overall model used for mapping and compared the p-values and standard errors. I found that the results differed negligibly, as in SE and beta differences of 1 millionth or less. Although this is a very crude approximation for the genome-wide set of SNPs, it suggests that pseudoreplication does not have a profound effect on the genetic mapping.

82

Methods

Mouse strains and crosses

The Cheverud laboratory produced and housed 292 LG/J×SM/J F56 advanced intercross female mice (LSAI mothers). The LSAI mothers were generated as sister pairs to be crossed to a

Nkx2-5+/- hybrid male of either C57BL6/J;LG/J or C57BL6/J;SM/J, see Figure 3.3 for the cross schematic. Three mothers either died prematurely or failed to produce offspring, bringing the final number of LSAI mothers to 289. We bred LSAI mothers with the hybrid males for the duration of her reproductive lifespan, expecting an average 10 litters with roughly 8 pups per litter, a total

23,120 pups.

Collection and phenotyping of hearts

We collected 17,568 total offspring from the LSAI mothers, with maternal age at birth ranging from 55 – 429 days. We collected the pups at P0 to avoid the loss of those with severe heart defects. After excluding pups with low DNA concertation or missing samples, there were

17,425 pups remaining. We expect that 50% (8,712) of the pups will carry the Nkx2-5 mutation; we recovered 8,634 mutants (49.54%), which was not significantly different from the expectation

(one-sample z-test, P = 0.23).

We fixed the thoraces of all Nkx2-5 mutants in 10% neutral formalin. Hearts were completely sectioned in the frontal plane at 6 µm thickness. Every section was collected, stained with hematoxylin and eosin, and evaluated by two highly trained individuals. All defects were diagnosed by the morphology of the atrial and ventricular septae, semilunar and atrioventricular 83 valves, chambers and the anatomic relationships between them. Secundum ASDs were distinguished from a patent foramen ovale based on the size of the latter in wild-type hearts. ASDs were counted in these analyses if they reached an internal Jay lab sizing threshold of 5 (50% of the length of the atrial septum).

Detecting variation in the maternal age effect using a generalized linear model

To test for variation in maternal age effects between mothers, I used the following generalized linear model:

푪푯푫 = 푷푻 + 푴푰 + 푴푨 + (푴푰 ∗ 푴푨) where CHD is a binary case-control indicator that denotes the presence of a specific defect or the a completely normal heart, PT accounts for the effects of B6×LG versus B6×SM F1 hybrid father,

MI is the maternal identity, MA is the effect of the mother’s age, and MA*MI is the interaction of maternal age and maternal identity. The interaction term is critical because it represents the variation in individual maternal age effects for mapping. I tested the significance of each term with an analysis of deviance using a likelihood ratio test.

Single nucleotide polymorphism genotyping

Genomic DNA was isolated by phenol-chloroform extraction and dissolved in 1.2mL of

0.5x TE buffer. SNP genotyping was performed with two methods 1) genotyping by sequencing

(GBS) and 2) Illumina GGP GigaMUGA array.

84

GBS services were performed by Beijing Genomics Institute (BGI) in Shenzhen, China.

We prepared the DNA samples by diluting them to ~ 300ng/μL with 0.25x TE and depositing them in 96-well plates, along with 3 samples with known genotypes. We used an Nkx2-5+/-

C57BL6/J male, and pure LG/J and SM/J samples. After quality control steps at BGI, the DNA was digested with ApeKI , size filtered for ~200 – 300bp fragments, and paired-end sequencing was performed on the Illumina HiSeq 4000.

SNP-calling on the Illumina GGP GigaMUGA array was performed by Neogen

Corporation in Lincoln, NE. DNA samples were prepared the same way as for GBS. However, we included two LSAI mother samples (101058 and 101080) that had been genotyped with GBS as a technical replicate and to serve as a comparison between the two methods. The GigaMUGA arrays were analyzed on Illumina Goldengate platform and yields 143,000 SNPs.

Bioinformatics processing and analyses

Sequence data from GBS was provided on 500GB hard drives. The paired FASTQ files provided are already demultiplexed and adapter trimmed. The read counts from the samples ranged from 1.5 – 5 million reads. Using BWA MEM (version 0.7.5a-r405) on Linux Ubuntu (version

14.04 LTS), I aligned the files to the GRCm38 mouse reference (called

“Mus_musculus.GRCm38.dna_sm.chromosome.1.fa”). Note: the file is named , which is a naming error on the Ensembl FTP site because this file contains the entire genome. The

SAM files were converted to BAM files and were indexed. The BAM files were fed into a customized version of the Fast-GBS25 analytical pipeline run on the Washington University CHPC

High Performance Computing Cluster, which runs Linux of unknown version. Fast-GBS 85 incorporates the program Platypus for variant calling. Fast-GBS produces a VCF file that I converted into a text file for downstream processing in R. I used custom scripts in R to clean and parse the GBS data for mapping.

I identified nearly 400,000 polymorphic nucleotides. After filtering for known LG/J vs.

SM/J polymorphisms26, minor allele frequency (MAF) > 0.20, < 20% missing per SNP and Hardy-

Weinberg equilibrium there were 36,220 SNPs remaining for 188 LSAI samples. On average, there is one SNP per ~100 kbp. Such a high SNP density allows very high-resolution mapping, despite only ~5000 SNPs being needed to map in this F56 population.

We received the GigaMUGA data via a download link containing text and Excel files. The file containing the raw data is called “Washington_Universit_Akhirome_MURGIGV01_

20180426_FinalReport.txt”. The R package argyle27 was specially designed to perform QC steps on data from the Illumina GGP series of mouse genotyping arrays. The data showed an average

90% call rate across all samples. I extracted the SNP data from the QC software for use in mapping analyses.

Imputing SNP data and merging datasets

Because >90% of LSAI mothers were genotyped with GBS and only 243 SNPs overlapped with GigaMUGA, I imputed the GBS SNP positions in the 20 GigaMUGA mice to merge the two datasets. Using QTLRel (version 0.2-15), I calculated the genotype probabilities of the missing SNPs and converted them to BIMBAM format, which uses the mean genotypes or allelic dosage values. BIMBAM format accounts for the uncertainty in imputed genotype values

86 by allowing any range of numbers between 0 and 2 (numerical coding for the 3 genotypes). After combining the data with the GBS samples, there were 262 total mothers with SNP data available for analysis.

Genetic association analyses

To map the maternal age effect, we conduct genetic association tests in R, using a custom parallelized version of the lmekin16 function in R (version 3.3.1). I tested associations between

SNPs in the mother and her maternal age effect using a linear mixed model:

푪푯푫 = 푷푻 + 푨 + 푫 + 푴푨 + 푰풏풄 + (푨 ∗ 푴푨) + (푫 ∗ 푴푨) + 푲 + 풆

where CHD is a binary indicator (0,1) for the absence or presence of a heart defect in an offspring, A (D) is the additive (dominance) effect of the genotypes at the maternal locus, K is the random effect of kinship between mothers, Inc is the incidence covariate that accounts for each mother’s intrinsic CHD susceptibility that is transmitted to the offspring, and e is the error vector.

(A × MA) and (D × MA) are the interactions of maternal genotype and maternal age. These interaction terms are critical because they pinpoint the genetic regions that alter how much the maternal age effect alters CHD risk.

Statistical analyses

All statistical analyses and functions not otherwise specified were performed in the R

Statistical Computing environment (version 3.3.1).

87

88

Figures

Figure 3.1 Genetic variation in the maternal age effect indicates it is a quantitative trait

Female mice with different genetic backgrounds have varying levels of a maternal age effect on VSD risk, suggesting that the differences are genetically determined. (Schulkey et al. 2015, Ref. 3)

89

Figure 3.2 Breeding scheme for the F56 LG/J×SM/J advanced intercross line

The LSAI advanced intercross began with an SM/J male and LG/J female28. (Adapted from Ref. 29)

90

Figure 3.3 Experimental cross diagram for the F56 mapping population

The Nkx2-5+/- P0-aged offspring are collected and examined for heart defects.

91

A B

Figure 3.4 ASD and VSD risk is subject to strong maternal age effects

A) We have phenotyped 7242 pups (84% of the total). These pups carry the most leverage in our analyses because they are at the extremes of maternal age; B) the pups shown in the Young mom (<120 days) and Old mom (>250 days) groups comprise all pups that fit that category. ***** represents a Z-Test P < 0.0001.

92

Figure 3.5 The effect of maternal age on each defect varies broadly

The odds ratios were estimated using the following generalized linear model:

Defect ~ Father + Maternal Age, where Defect is the binary indicator for the presence of the specific defect vs. all other pups (irrespective of defect status), Father is a covariate for the hybrid male background, and Maternal Age is the mother’s age in months. (N = 5763)

93

Figure 3.6 Estimates of the maternal age effect, per mother

Defects with larger maternal age effects, ASD and Mus. VSD, also had significantly more between- mother variation. A color gradient was added to emphasize that there are data points from many (~180) LSAI mothers. I excluded mothers with fewer than 15 Nkx2-5+/- pups.

94

0 Mb 40 Mb 80 Mb

Figure 3.7 Distribution of Chromosome 18 GBS SNPs compared to the Reference

The red bars show the SNP counts from GBS per 2 Megabases compared to the reference SNP density distribution, which is based on whole genome sequence of both LG/J and SM/J26. One representative chromosome is shown here for the sake of legibility.

95

A

B

Figure 3.8 Manhattan plots of the maternal age effect on ASD and Mus. VSD show many loci, but no obvious overlap

96

A) There is 1 locus that surpassed the genome-wide significance (red line) and 11 others that only exceed the suggestive threshold (blue line) for the ASD maternal age effect. B) Mus. VSD has no significant locus and 7 suggestive loci.

97

Figure 3.9 Combining the ASD and Mus. VSD analysis reveals common maternal age loci

Chromosome 3 shows strong evidence for a shared QTL and there are several other suggestive loci.

98

A

B

Figure 3.10 Manhattan plots of the maternal age effect on Mbr. VSD and AVSD show only 1 suggestive locus

A) No Mbr. VSD loci were detected and B) only 1 locus was detected on Chr. 5 for AVSD defects.

99

Figure 3.11 Analyzing the maternal age effect for all defects combined reveals the best consensus candidate loci

Although only one significant locus was detected, this is on Chr. 3 72Mb, which overlaps with loci detected in Mus. VSD, ASD + Mus. VSD combined analyses. There are 2 suggestive loci.

100

A B

C D

Figure 3.12 Effect plots illustrating the maternal age effect for the most significant loci

Young and Old Mom are as previously described. OR (Odds Ratio) is from a logistic regression using all data used in mapping (not just young vs. old) with terms for Maternal Age and Father type.

101

Tables

Table 3.1 P-values from the GLM for between-mother variation in the age effect

Mother Defect Type Mother Paternal Type Maternal Age × Maternal Age ASD 0.011 0.42 2.2E-16 9.9E-14 Mus. VSD 9.2E-7 0.465 2.2E-16 2.2E-16 ASD + Mus. VSD 9.7E-13 0.86 2.2E-16 2.2E-16 Mbr. VSD 1.6E-4 0.49 5.5E-15 1.8E-6 AVSD 0.0086 0.58 0.0087 8.122E-16 All Defects 2.2E-16 0.91 2E-16 3.9E-8

Variation in the maternal age effect for each heart defect was performed using a GLM with the same terms as given in the table (see Methods). Maternal age explained a significant amount of the offspring’s risk for all defect types, but not paternal type. All defects had significant variation in the age effect based on maternal identity (Mother × Maternal Age interaction term). Insignificant p-values are highlighted in red.

102

Table 3.2 List of LSAI mothers genotyped with the Illumina GigaMUGA array

LSAI DNA Mother Source 99874 Spleen 100044 Liver 100135 Spleen 100227 Liver 100260 Liver 100267 Liver 100434 Liver 100465 Spleen 100481 Liver 100531 Liver 100553 Liver 100602 Spleen 100634 Liver 100745 Spleen 100924 Liver 100935 Liver 100962 Liver 101094 Liver 101181 Liver 101200 Liver

103

# Mothers w/o # Mothers w/o defect Defect defect (≥ 15 total pups)

ASD 42 18

Mus. VSD 39 20

Mbr. VSD 51 27

AVSD 79 27

Total 206 157

Table 3.3 Number of LSAI mothers that did not produce a pup with a given heart defect

For the LSAI mothers with no pups that have a specific defect make it impossible to estimate the maternal age effect. This data suggests that mapping power could be improved by excluding females that lack sufficient offspring with a given defect.

104

Table 3.4 List of LSAI mothers without DNA samples for genotyping

LSAI mothers without DNA 99781 99796 99972 99998 100084 100116 100286 100313 100317 100445 100501 100509 100511 100518 100650 100671 100705 100857 100868 100886 100922 100964 100993 101212 101243

Instead of directly genotyping these females, we will sequence their offspring and use their genotypes to infer the maternal genotypes.

105

Table 3.5 Counts of LSAI mothers and pups used in the various mapping analyses

Defect # LSAI Mothers # Pups

All Data 262 7242

All Defects 252 7171

ASD 193 4047

Mus. VSD 214 4816

ASD + Mus. VSD 235 5506

Mbr. VSD 201 4356

AVSD 161 3369

These data started with the All Data, which includes mothers with <10 pups. After removing those mothers, All Defects were those mothers and pups used in the All Defects mapping analysis. All analyses compared control pups (no defects) to those with the named defect.

106

Table 3.6 All suggestive and significant loci from all mapping analyses of the maternal age effect on CHD risk

Risk Analysis Chr Peak Lower Upper -log10(p) Locus Allele Mus. VSD 2 27059965 27058824 27062432 4.252 S Suggest ASD+MUS 3 72433899 70232592 76330685 5.432 S Signif All Defects 3 72433899 71312230 72564314 5.000 S Suggest ASD+MUS 3 78160123 77705412 78341524 5.149 S Suggest Mus. VSD 3 78160123 77705412 78341524 4.796 S Suggest ASD 4 10412170 9985083 10534025 4.149 S Suggest ASD+MUS 5 10187223 10186856 10200719 4.699 S Suggest Mus. VSD 5 10338120 10204461 10370589 4.721 S Suggest AVSD 5 122679822 122628998 122707862 4.456 S Suggest Mus. VSD 5 148575239 148493190 148591437 4.854 L Suggest ASD+MUS 5 151301096 151074156 151642464 4.638 L Suggest Mus. VSD 5 151301096 151074156 151642464 4.444 L Suggest ASD 6 93617919 93553398 93641297 4.076 S Suggest ASD 6 136331809 136292833 136347816 4.409 L Suggest ASD 7 70787371 70625510 71400860 5.000 S Suggest ASD+MUS 8 95062332 95056619 95082495 5.000 L Suggest ASD 8 95107702 95056619 95121718 4.745 L Suggest ASD 10 27826308 27387809 27844166 4.523 S Suggest ASD 10 57815137 57755032 57848120 4.523 L Suggest ASD 10 89558992 89111526 89616052 5.357 L Signif ASD 11 33993249 33767594 34385148 4.770 S Suggest ASD 13 38200712 38074675 38580221 4.886 S Suggest Mus. VSD 14 87458271 87423575 87467896 4.066 L Suggest All Defects 17 27045339 27038183 27057323 4.161 S Suggest ASD+MUS 17 27319123 27249150 27335308 4.071 S Suggest ASD+MUS 17 27514130 27511552 27519073 4.260 S Suggest ASD+MUS 17 27804653 27796945 27811334 4.638 S Suggest ASD 17 49245677 49074142 65778923 4.432 S Suggest ASD+MUS 18 52816516 52617697 52838921 4.174 L Suggest

107

ASD 18 52826440 52485633 53103241 4.347 L Suggest Mus. VSD 18 78669929 76871126 78779875 4.328 L Suggest All Defects X 168241499 161397927 168700752 4.538 S Suggest

This table is organized by chromosome and locus to emphasize overlapping loci from different analyses. The significant loci are highlighted in red. The lower and upper bounds were determined by a flanking drop of 1 “log10(p) unit” from the peak SNP. In the Risk Allele column, L stands for the LG/J allele and S stands for the SM/J allele.

108

Table 3.7 All protein-coding genes from the Chr. 3 loci

Chr. 3 70 – 77 MB Peak: 72433899 LOD 5.4318 | Risk allele: S Gene Chr Start End Otol1 3 70007613 70028708

Sis 3 72888557 72967863

Slitrk3 3 73047265 73057803

Bche 3 73635808 73708415

Zbbx 3 75037907 75165034

Serpini2 3 75242370 75270078

Wdr49 3 75274988 75482156

Pdcd10 3 75516490 75556856

Serpini1 3 75557547 75643495

Gm29133 3 75748197 75787653

Golim4 3 75875084 75956949

Fstl5 3 76074270 76710019

109

Table 3.8 All protein-coding genes from the ASD Chr. 10 locus

Chr. 10 89 – 90 MB Peak: 89558992 LOD 5.3565 | Risk allele: L Gene Chr Start End Ano4 10 88948994 89344762

Gas2l3 10 89408823 89443967

Nr1h4 10 89454234 89533585

Slc17a8 10 89574020 89621253

Scyl2 10 89638721 89686285

Actr6 10 89711971 89732295

Uhrf1bp1l 10 89744991 89819871

Anks1b 10 89873509 90973300

110

References

1. Miller, A., Riehle-Colarusso, T., Siffel, C., Frias, J. L. & Correa, A. Maternal age and prevalence of isolated congenital heart defects in an urban area of the United States. Am J Med Genet A 155A, 2137-2145, doi:10.1002/ajmg.a.34130 (2011).

2. Winston, J. B. et al. Complex trait analysis of ventricular septal defects caused by Nkx2- 5 mutation. Circ Cardiovasc Genet 5, 293-300, doi:10.1161/CIRCGENETICS.111.961136 (2012).

3. Schulkey, C. E. et al. The maternal-age-associated risk of congenital heart disease is modifiable. Nature 520, 230-233, doi:10.1038/nature14361 (2015).

4. Darvasi, A. & Soller, M. Advanced intercross lines, an experimental population for fine genetic mapping. Genetics 141, 1199-1207 (1995).

5. Morgan, T. H. The mechanism of Mendelian heredity. (Holt, 1915).

6. Sturtevant, A. H. A Third Group of Linked Genes in Drosophila Ampelophila. Science 37, 990-992, doi:10.1126/science.37.965.990 (1913).

7. Parker, C. C. et al. High-resolution genetic mapping of complex traits from a combined analysis of F2 and advanced intercross mice. Genetics 198, 103-116, doi:10.1534/genetics.114.167056 (2014).

8. Fawcett, G. L. et al. Fine-mapping of obesity-related quantitative trait loci in an F9/10 advanced intercross line. Obesity (Silver Spring) 18, 1383-1392, doi:10.1038/oby.2009.411 (2010).

9. Norgard, E. A. et al. Genetic factors and diet affect long-bone length in the F34 LG,SM advanced intercross. Mamm Genome 22, 178-196, doi:10.1007/s00335-010-9311-5 (2011).

10. Lawson, H. A. et al. Genetic, epigenetic, and gene-by-diet interaction effects underlie variation in serum lipids in a LG/JxSM/J murine model. J Lipid Res 51, 2976-2984, doi:10.1194/jlr.M006957 (2010).

11. Cheverud, J. M. et al. Fine-mapping quantitative trait loci affecting murine external ear tissue regeneration in the LG/J by SM/J advanced intercross line. Heredity (Edinb) 112, 508-518, doi:10.1038/hdy.2013.133 (2014).

12. Rai, M. F., Schmidt, E. J., Hashimoto, S., Cheverud, J. M. & Sandell, L. J. Genetic loci that regulate ectopic calcification in response to knee trauma in LG/J by SM/J advanced intercross mice. J Orthop Res 33, 1412-1423, doi:10.1002/jor.22944 (2015).

111

13. Baird, N. A. et al. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PloS one 3, e3376 (2008).

14. Elshire, R. J. et al. A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species. PLOS ONE 6, e19379, doi:10.1371/journal.pone.0019379 (2011).

15. Morgan, A. P. et al. The mouse universal genotyping array: from substrains to subspecies. G3: Genes, Genomes, Genetics, g3. 115.022087 (2015).

16. Therneau, T. (2018).

17. Gao, X. Multiple testing corrections for imputed SNPs. Genet Epidemiol 35, 154-158, doi:10.1002/gepi.20563 (2011).

18. Moran, N. A. The evolution of aphid life cycles. Annual review of entomology 37, 321- 348 (1992).

19. Barker, D. J. et al. Fetal nutrition and cardiovascular disease in adult life. The Lancet 341, 938-941 (1993).

20. Pedersen, B. K. & Febbraio, M. A. Muscles, exercise and obesity: skeletal muscle as a secretory organ. Nat Rev Endocrinol 8, 457-465 (2012).

21. Giaimo, S. & d'Adda di Fagagna, F. Is cellular senescence an example of antagonistic pleiotropy? Aging Cell 11, 378-383, doi:10.1111/j.1474-9726.2012.00807.x (2012).

22. Millar, R. B. & Anderson, M. J. Remedies for pseudoreplication. Fisheries Research 70, 397-407 (2004).

23. Schielzeth, H. & Forstmeier, W. Conclusions beyond support: overconfident estimates in mixed models. Behavioral Ecology 20, 416-420 (2008).

24. Wilson, A. J. et al. An ecologist’s guide to the animal model. Journal of Animal Ecology 79, 13-26 (2010).

25. Torkamaneh, D., Laroche, J., Bastien, M., Abed, A. & Belzile, F. Fast-GBS: a new pipeline for the efficient and highly accurate calling of SNPs from genotyping-by- sequencing data. BMC bioinformatics 18, 5 (2017).

26. Nikolskiy, I. et al. Using whole-genome sequences of the LG/J and SM/J inbred mouse strains to prioritize quantitative trait genes and nucleotides. BMC Genomics 16, 415, doi:10.1186/s12864-015-1592-3 (2015).

27. Morgan, A. P. argyle: an R package for analysis of Illumina genotyping arrays. G3: Genes, Genomes, Genetics, g3. 115.023739 (2015).

112

28. Cheverud, J. M. et al. Quantitative trait loci for murine growth. Genetics 142, 1305-1319 (1996).

29. Peters, L. L. et al. The mouse as a model for human biology: a resource guide for complex trait analysis. Nat Rev Genet 8, 58-69, doi:10.1038/nrg2025 (2007).

113

Chapter 4

Conclusions and perspectives

114

Synthesis and implications for CHD research

The inheritance and manifestation of congenital heart disease is a remarkably complex biological process. My research shows that complex genetic interactions within a predisposed individual can dictate whether the disease appears or not. I also show the profound impact that the developmental milieu can have on the risk of certain heart defects. However, the question remains as to how our findings will drive the future of CHD research.

My work on epistasis in CHD shows that the outcome of a de novo mutation is highly dependent on the combination of other inherited polymorphisms across the genome. This is the oligogenic model for CHD. Unfortunately, the methodology used in many large-scale sequencing studies automatically ignores the role of potentially deleterious variants transmitted from unaffected parents. This simplifying assumption is useful for reducing the complexity of large whole genome datasets, but it fails to represent the full genetic architecture of CHD. My results suggest that if we identify well-annotated networks of genes that physically interact during cardiac development (a feat in and of itself), when a de novo mutation is identified, we can query the list of gene interactions to look for inherited variants in those genes.

One study recently applied a similar method to study genes with potential physical interactions in a CHD cohort1. on combined WES and array-CGH data in a study of trios with atrioventricular septal defects. The authors used a novel rare-disease inheritance model to detect de novo mutations as well as inherited compound heterozygous and rare homozygous gene variants across the genome. This model also assessed the burden of potentially deleterious variation in genes that had likely physical interactions. They found that their model was able to identify CHD-

115 associated variants in 23% of trios with sporadic CHD. Interestingly, many of the trios had multiple predicted-deleterious variants in genes that have not been previously tied to CHD. Additionally, this study only considered inherited variants that potentially functioned by recessive disease mechanisms. This is likely because it is difficult to work with the large number of rare heterozygous variants that are inherited from the parents.

Most cases of CHD are not caused by a single genetic event. Using my work as a proof of concept, studies of human CHD must change the assumptions away from a monogenic model to an oligogenic model. This assertion is supported by the fact that evolution favors robust or resilient genetic networks. If it only takes one mutation to derail cardiac development in all cases, the incidence of CHD would be much higher. My work shows that the combinations of genetic variants within a genetic interaction network can substantially influence whether a heart defect appears in susceptible individuals. This is even more true for severe heart defects, for which risk is more dependent on other interacting polymorphisms than for mild defects.

Interestingly, I observed that AVSDs are rare in Nkx2-5+/- mice and in humans, suggesting that atrioventricular septum development is quite robust to genetic insults. I also found that the maternal age effect did not affect the risk of AVSD. This suggests that the atrioventricular compartment is robust to both genetic and environmental insults. This is logical because selection can exert heavier pressure on traits with higher fitness costs. These assertions are partially supported by the observation made by Brodwall et al. 2017 (Ref. 2), showing that the risk of severe

CHD is 7 times higher when an older sibling has a severe heart defect compared to when the sibling has a normal heart. Contrarily, the risk is only 2.3 times higher than normal when the older sibling has a mild defect. Once one family member manifests a severe CHD, it indicates that there has

116 been a breakdown in the resilience mechanisms. Since there are likely many broken safeguards, it is likely that the siblings are each inheriting many risk factors from their parents, thus enhancing the recurrence risks in the family. Therefore, I hypothesize that severe CHD risk has a significantly higher heritability from inherited rather than de novo variants. Future work in partitioning these contributions will be necessary to validate this idea.

Because the effects of evolution are pervasive throughout biology3, it is likely that robustness influences all diseases with variable expressivity and penetrance. It has been suggested that robustness is a ubiquitous biological phenomenon4,5, therefore it is not surprising that evolution has shaped the architecture of biological networks to prevent the most severe phenotypes. Our finding that coadapted gene variants in mice can suppress CHD risk shows that it is the specific interacting variants that determine the level of robustness. Harmonious allele combinations generate favorable epistatic effects that can suppress the effects of deleterious mutations. If we identify modifiable factors that influence this risk suppression, then we would finally have an intervention to prevent CHD. Similarly, our work on the maternal age effect seeks to identify the physiological mechanisms through which aerobic exercise suppresses the maternal age effect. It is possible that these mechanisms could include robustness to stressors associated with aging.

In conclusion, my work on CHD explored the intrinsic and extrinsic factors that influence

CHD risk. While there is much left to be done, progress is imminent. As the costs of sequencing and analytical assays decrease, it will become possible to do the large-scale tests in human populations. Results from these studies will eventually lead to interventions that will better treat and even prevent congenital heart disease.

117

References

1. Priest, J. R. et al. De Novo and Rare Variants at Multiple Loci Support the Oligogenic Origins of Atrioventricular Septal Heart Defects. PLoS Genet 12, e1005963, doi:10.1371/journal.pgen.1005963 (2016).

2. Brodwall, K. et al. Recurrence of congenital heart defects among siblings—a nationwide study. American Journal of Medical Genetics Part A 173, 1575-1585, doi:10.1002/ajmg.a.38237 (2017).

3. Dobzhansky, T. Nothing in biology makes sense except in the light of evolution. The american biology teacher 75, 87-91 (1973).

4. Tyler, A. L., Asselbergs, F. W., Williams, S. M. & Moore, J. H. Shadows of complexity: what biological networks reveal about epistasis and pleiotropy. Bioessays 31, 220-227 (2009).

5. Felix, M.-A. & Barkoulas, M. Pervasive robustness in biological systems. Nat Rev Genet 16, 483-496, doi:10.1038/nrg3949 (2015).

118

Appendix I

List of SNPs used in the A/J×C57BL/6N and FVB/N×C57BL/6N experiments

[119]

A/J×C57BL/6N SNPs

F2 cM, Cox F2 cM, Cox SNP ID Chr. SNP ID Chr. sex-avg. sex-avg. rs13475818 1 16.4470 rs30458137 9 32.7528 rs3163007 1 32.3070 rs33720457 9 40.8760 rs13475934 1 39.5110 rs3710939 9 57.9830 rs3684360 1 54.7000 rs13480488 10 3.0060 rs3701630 1 69.1070 rs13459187 10 5.1430 rs6238909 1 84.9310 rs29329268 10 5.8055 rs13476354 2 10.3282 rs13480521 10 7.8605 rs13459062 2 21.8060 rs13459119 10 9.7540 rs3659784 2 62.0980 rs29325765 10 47.0500 rs27349879 2 75.4070 rs13480773 10 62.2764 rs3659658 2 89.4670 rs26864121 11 13.3116 rs3676540 3 6.8289 rs13480991 11 25.4920 rs29807239 3 14.2680 rs13481091 11 43.7211 rs3716258 3 24.0970 rs27059631 11 58.9000 rs3706378 3 33.7550 rs3706128 11 70.2894 rs13477291 3 43.5124 rs13481256 11 82.9570 rs31135898 3 53.0931 rs3678500 12 9.9405 rs30509660 3 63.0360 rs13481388 12 15.5330

[120]

rs3685116 3 72.3035 rs13481459 12 22.8396 rs13477517 3 81.6622 rs6333493 12 28.7260 rs27682879 4 7.5557 rs13481595 12 45.4220 rs27807782 4 14.0000 rs3705279 13 7.2572 rs3719378 4 45.7060 rs13459140 13 17.8688 rs6195490 4 57.6654 rs13481783 13 21.2100 rs27610338 4 69.0510 rs29237897 13 35.7893 rs13478067 4 86.1670 rs29229165 13 50.4979 rs13478133 5 9.9680 rs13482037 13 67.2100 rs13478182 5 18.1590 rs30637674 14 7.0800 rs13478245 5 26.8220 rs13482119 14 18.6570 rs3684754 5 36.0449 rs3715961 14 23.9629 rs13478337 5 41.8192 rs3653455 14 34.6140 rs3655541 5 51.4340 rs6306220 14 44.0387 rs3715788 5 60.5724 rs13482356 14 56.1650 rs6237258 5 68.2580 rs32275246 15 8.5380 rs13478585 5 85.2381 rs13482486 15 9.4930 rs6172481 6 1.8060 rs13482587 15 21.1460 rs6271324 6 15.4780 rs32164372 15 38.0150 rs13478830 6 32.5560 rs3695692 15 48.0369 rs6292642 6 48.9280 rs32677126 16 9.6360

[121]

rs6361894 6 63.4350 rs4169065 16 21.7827 rs13479152 7 12.4855 rs4189683 16 34.3570 rs32082888 7 25.4805 rs4210345 16 45.7803 rs3675728 7 39.8674 rs13459152 17 22.5930 rs4137968 7 54.4590 rs13483049 17 29.7243 rs13479530 7 73.1870 rs33633158 17 45.4911 rs3023176 8 7.5870 rs6358426 18 10.1000 rs13479701 8 22.3824 rs29976462 18 23.8762 rs13479794 8 31.5500 rs3656892 18 42.5727 rs13479955 8 50.0740 rs30997811 19 18.3527 rs13479979 8 56.1970 rs30415214 19 35.9722 rs13480121 9 16.3358 rs30931570 19 48.4583

[122]

FVB/N×C57BL/6N SNPs

F2 cM, F2 cM, Illumina SNP ID SNP ID Chr. Cox sex- Illumina SNP ID SNP ID Chr. Cox sex- avg. avg.

rs6269442 rs6269442 1 1.614 rs3667475 rs3667475 8 34.488

rs13475701 rs13475701 1 1.744 rs13479830 rs13479830 8 35.127 CZECH- rs13475706 rs13475706 1 1.889 8_78291790 rs36926924 8 37.450

rs3684358 rs3684358 1 2.060 rs13479871 rs13479871 8 41.475

rs13475729 rs13475729 1 3.749 rs6257357 rs6257357 8 41.611

rs3671256 rs3671256 1 5.009 rs13479882 rs13479882 8 42.129

rs13475750 rs13475750 1 6.153 rs13479884 rs13479884 8 42.330

rs6404446 rs6404446 1 6.442 rs6391152 rs6391152 8 43.853

mCV23695025 rs31949292 1 8.486 rs13479922 rs13479922 8 44.877

rs6173215 rs6173215 1 8.692 rs4137596 rs4137596 8 46.663

mCV24784983 rs31917015 1 10.425 rs6285803 rs6285803 8 48.120

rs6166266 rs6166266 1 10.731 rs6287320 rs6287320 8 48.493

rs3677683 rs3677683 1 10.932 rs13479947 rs13479947 8 49.503

rs6384194 rs6384194 1 11.319 rs3706149 rs3706149 8 50.061

rs3695988 rs3695988 1 11.345 rs13479955 rs13479955 8 50.074

rs4137502 rs4137502 1 11.461 rs3669235 rs3669235 8 50.253

rs31837951 rs31837951 1 12.249 rs13479956 rs13479956 8 50.340

rs3707642 rs3707642 1 12.426 rs3662808 rs3662808 8 56.991

rs3681732 rs3681732 1 12.953 rs6237645 rs6237645 8 57.389

rs3683997 rs3683997 1 14.842 gnf08.118.027 rs33401493 8 63.315

rs4222295 rs4222295 1 17.369 rs3708073 rs3708073 8 70.716

gnf01.036.770 rs13475823 1 18.581 rs6310696 rs6310696 8 70.824

rs13475827 rs13475827 1 20.081 rs6377872 rs6377872 8 71.000

rs13475834 rs13475834 1 20.661 rs3705725 rs3705725 8 72.309

rs13475847 rs13475847 1 23.851 rs31833030 rs31833030 8 72.339

rs13475851 rs13475851 1 25.709 gnf09.012.310 rs13480087 9 7.064

CEL-1_49993068 rs31383619 1 25.876 rs13480092 rs13480092 9 7.305

rs3724092 rs3724092 1 25.910 rs13480095 rs13480095 9 7.408

rs13475866 rs13475866 1 26.334 rs8270115 rs8270115 9 7.927

rs6322485 rs6322485 1 32.063 rs13480103 rs13480103 9 9.511

rs3163007 rs3163007 1 32.307 rs3088801 rs3088801 9 10.106

rs6288543 rs6288543 1 34.187 rs6404775 rs6404775 9 14.314

rs6312657 rs6312657 1 34.680 rs6385855 rs6385855 9 15.145

rs13475931 rs13475931 1 39.196 rs3655898 rs3655898 9 18.165

rs13475934 rs13475934 1 39.511 UT_9_35.918713 rs30437276 9 20.114

gnf01.075.385 rs13475936 1 39.780 rs6413270 rs6413270 9 20.806 [123]

rs13475939 rs13475939 1 40.572 rs33646953 rs33646953 9 24.426

rs13475946 rs13475946 1 41.565 rs13462199 rs13462199 9 24.739

gnf01.079.218 rs13475952 1 42.110 rs13480173 rs13480173 9 25.485

rs3723062 rs3723062 1 43.848 rs6395817 rs6395817 9 25.803

UT_1_89.100476 rs32600524 1 44.067 gnf09.042.496 rs13480179 9 26.352

rs6195073 rs6195073 1 44.700 rs13480186 rs13480186 9 27.114

rs31435980 rs31435980 1 44.979 rs3723670 rs3723670 9 28.839

rs13475980 rs13475980 1 46.189 rs13480208 rs13480208 9 29.932

rs6268443 rs6268443 1 47.188 rs13480217 rs13480217 9 31.394

rs3694065 rs3694065 1 48.814 rs6224703 rs6224703 9 32.333

CEL-1_103251925 rs30596698 1 49.306 rs30458137 rs30458137 9 32.753

CEL-1_105423103 rs31680156 1 49.601 rs3714012 rs3714012 9 33.180

rs3685919 rs3685919 1 50.674 rs3655717 rs3655717 9 35.340

rs13476061 rs13476061 1 50.919 rs6174757 rs6174757 9 36.954

rs3674655 rs3674655 1 52.113 rs6355445 rs6355445 9 39.122

rs3678634 rs3678634 1 52.718 rs3721056 rs3721056 9 39.700

rs4137908 rs4137908 1 54.799 rs13480271 rs13480271 9 40.240

rs13476104 rs13464873 1 55.969 rs33720457 rs33720457 9 40.876

rs3699561 rs3699561 1 56.934 rs3724833 rs3724833 9 41.977

rs3724826 rs3724826 1 57.000 rs13480285 rs13480285 9 42.583

rs13476119 rs13476119 1 58.159 gnf09.074.193 rs13480298 9 43.863

rs8250053 rs8250053 1 58.391 rs3676124 rs3676124 9 45.286

gnf01.137.841 rs30968449 1 60.722 rs3695050 rs3695050 9 46.021

rs13476141 rs13476141 1 61.841 rs3700596 rs3700596 9 46.512

rs6382880 rs6382880 1 62.409 rs3669564 rs3669564 9 47.114

rs13476152 rs13476152 1 62.560 gnf09.087.298 rs13480340 9 47.798

rs3672697 rs3672697 1 63.104 rs13480345 rs13480345 9 48.379

rs30986578 rs30986578 1 63.320 rs13480351 rs13480351 9 48.893

rs6411476 rs6411476 1 63.404 CEL-9_95875215 rs29789189 9 50.284

rs13476184 rs13476184 1 65.303 rs3689336 rs3689336 9 50.613

UT_1_113.684537 rs31544365 1 67.667 rs13480365 rs13480365 9 51.376

rs31299650 rs31299650 1 69.032 rs3717654 rs3717654 9 54.162

rs13476229 rs13476229 1 76.767 rs30343415 rs30343415 9 55.078

rs3707910 rs3707910 1 79.926 rs13480399 rs13480399 9 57.487

rs3669108 rs3669108 1 81.044 rs13480407 rs13480407 9 59.514

rs13476265 rs13476265 1 83.595 gnf09.110.002 rs13480428 9 63.934

rs13476273 rs13476273 1 84.910 rs6320810 rs6320810 9 67.636

rs3667164 rs3667164 1 93.199 rs3669563 rs3669563 9 69.841

rs13476300 rs13476300 1 96.082 rs13459114 rs13459114 9 72.670

rs6180557 rs6180557 1 97.905 rs6304156 rs6304156 9 73.664 [124]

rs27096879 rs27096879 2 2.238 rs6185923 rs6185923 10 0.438

rs6213083 rs6213083 2 2.316 CEL-10_3989939 rs29368538 10 1.298

rs4136817 rs4136817 2 2.515 rs3721803 rs3721803 10 1.696

rs6359983 rs6359983 2 3.790 rs13480488 rs13480488 10 3.006

rs6240512 rs6240512 2 8.426 rs3664101 rs3664101 10 3.154

rs3674936 rs3674936 2 14.520 rs4228112 rs4228112 10 4.417

rs13459062 rs13459062 2 21.806 rs13459187 rs13459187 10 5.143

CEL-2_32981944 rs32868446 2 22.274 rs29329268 rs29329268 10 5.806

gnf02.035.469 rs13476442 2 22.742 rs3699409 rs3699409 10 5.979

rs13476439 rs13476439 2 24.323 rs13480521 rs13480521 10 7.861

rs13476454 rs13476454 2 24.770 rs13459119 rs13459119 10 9.293

rs13476467 rs13476467 2 26.974 rs13480534 rs13480534 10 9.796

rs13476472 rs13476472 2 27.478 rs3679120 rs3679120 10 10.400

rs6265423 rs6265423 2 27.693 rs13480547 rs13480547 10 11.755

CEL-2_50605053 rs27901975 2 29.025 rs13480554 rs13480554 10 12.941

rs3725341 rs3725341 2 29.897 gnf10.026.889 rs13480566 10 16.478

rs13476503 rs13476503 2 30.511 rs13480578 rs13480578 10 18.922

rs27929044 rs27929044 2 30.694 rs13480579 rs13480579 10 19.237

rs3718711 rs3718711 2 31.734 rs13480601 rs13480601 10 23.190

rs3672719 rs3672719 2 33.018 rs13480605 rs13480605 10 23.824

CEL-2_63143553 rs33395966 2 36.722 rs3696307 rs3696307 10 27.314

rs6222797 rs6222797 2 38.403 rs13480621 rs13480621 10 30.574

rs4136610 rs4136610 2 41.073 rs13480627 rs13480627 10 34.736

rs3664661 rs3664661 2 42.714 rs13480630 rs13480630 10 35.028

rs13476580 rs13476580 2 44.165 CEL-10_71736974 rs29332616 10 37.031

rs3722345 rs3722345 2 48.304 rs13480652 rs13480652 10 38.380

rs3667007 rs3667007 2 49.029 rs13480657 rs13480657 10 38.917

rs8273639 rs8273639 2 49.444 rs3717445 rs3717445 10 40.766

rs13476621 rs13476621 2 49.887 rs13480678 rs13480678 10 41.500

rs6404809 rs6404809 2 50.023 rs3679902 rs3679902 10 41.674

rs13476636 rs13476636 2 50.636 rs3089366 rs3089366 10 45.073

rs13476639 rs13476639 2 51.239 rs13480702 rs13480702 10 45.499

rs13476649 rs13476649 2 52.255 rs29325765 rs29325765 10 47.050

rs6378047 rs6378047 2 53.194 gnf10.091.965 rs13480710 10 47.545

rs27359225 rs27359225 2 53.374 rs13480712 rs13480712 10 48.536

rs13476666 rs13476666 2 53.746 rs13480716 rs13480716 10 48.973

rs13476689 rs13476689 2 56.063 rs13480722 rs13480722 10 50.350

rs13476691 rs13476691 2 56.306 rs3710293 rs3710293 10 51.414

rs13476698 rs13476698 2 56.748 rs3688351 rs3688351 10 54.723

CEL-2_111981058 rs33283452 2 56.948 rs6290359 rs6290359 10 56.295 [125]

rs4138562 rs4138562 2 57.704 rs6243755 rs6243755 10 56.692

rs3699172 rs3699172 2 58.312 rs29353606 rs29353606 10 61.545

rs3149106 rs3149106 2 59.399 rs3680872 rs3680872 10 61.556 CEL- rs8279354 rs8279354 2 59.823 10_119602638 rs29360855 10 67.701

rs3658927 rs3658927 2 60.369 rs3697243 rs3697243 10 73.832

rs6228179 rs6228179 2 60.677 rs3676330 rs3676330 10 76.442

rs6401493 rs6401493 2 61.384 rs13480836 rs13480836 11 2.495

gnf02.126.027 rs13476762 2 62.081 rs13480837 rs13480837 11 2.735

rs3661596 rs3661596 2 63.204 rs13480863 rs13480863 11 5.918

rs3665528 rs3665528 2 63.278 rs26925478 rs26925478 11 6.126

rs6363071 rs6363071 2 71.292 rs3023249 rs3023249 11 6.662

rs6209325 rs6209325 2 73.369 rs13480875 rs13480875 11 7.335

rs27349879 rs27349879 2 75.407 gnf11.017.294 rs29389113 11 10.513

rs13476860 rs13476860 2 77.376 rs13480910 rs13480910 11 14.219

rs3695266 rs3695266 2 78.550 rs3700830 rs3700830 11 17.794

rs13476874 rs13476874 2 80.116 rs13459123 rs13459123 11 18.321

rs3664408 rs3664408 2 81.738 rs26822994 rs26822994 11 18.868

rs13476889 rs13476889 2 85.007 rs3723833 rs3723833 11 19.079

CEL-2_168586738 rs27311433 2 88.761 rs13480968 rs13480968 11 22.232

rs3726974 rs3726974 2 89.685 rs6359329 rs6359329 11 23.638

rs13476928 rs13476928 2 97.782 rs13480996 rs13480996 11 25.809

rs13476932 rs13476932 2 98.140 rs3654344 rs3654344 11 25.922

rs3679483 rs3679483 2 102.647 rs26898753 rs26898753 11 27.389

rs27681559 rs27681559 2 102.929 rs13481009 rs13481009 11 27.986

rs13476963 rs13476963 3 2.104 rs13481011 rs13481011 11 28.080

rs13476969 rs13476969 3 2.406 rs13481033 rs13481033 11 32.472

rs13476973 rs13476973 3 2.935 rs4228731 rs4228731 11 33.040

rs6248752 rs6248752 3 2.967 rs3684076 rs3684076 11 33.700

rs13476985 rs13476985 3 3.230 rs3697686 rs3697686 11 36.250

rs3659988 rs3659988 3 4.094 rs3711357 rs3711357 11 37.947

gnf03.015.035 rs13476997 3 4.990 rs13481076 rs13481076 11 40.590

rs6235984 rs6235984 3 7.393 rs6197743 rs6197743 11 42.960

rs13477030 rs13477030 3 10.990 rs13481109 rs13481109 11 46.074

rs3660588 rs3660588 3 13.426 rs13481119 rs13481119 11 46.826

rs29807239 rs29807239 3 14.268 rs13481145 rs13481145 11 52.843

rs13477043 rs13477043 3 15.290 rs3714299 rs3714299 11 58.424

gnf03.030.222 rs29919622 3 16.101 rs27059631 rs27059631 11 58.900

rs4223883 rs4223883 3 16.434 rs3661058 rs3661058 11 62.745

rs6246699 rs6246699 3 17.452 rs6393948 rs6393948 11 67.403

rs6324747 rs6324747 3 19.211 rs3698446 rs3698446 11 68.645 [126]

CEL-3_45598106 rs30263314 3 20.388 rs6386362 rs6386362 11 70.093

rs13477097 rs13477097 3 21.275 rs6370458 rs6370458 11 71.910

rs30057783 rs30057783 3 21.726 rs13481227 rs13481227 11 73.596

rs4139913 rs4139913 3 21.749 rs3672597 rs3672597 11 75.820

rs6241331 rs6241331 3 23.549 rs3699056 rs3699056 11 79.467

rs4223969 rs4223969 3 26.260 rs13481256 rs13481256 11 82.957

rs13477126 rs13477126 3 27.287 rs3712881 rs3712881 11 84.892

rs13477138 rs13477138 3 29.092 rs13481303 rs13481303 12 5.626

rs13477143 rs13477143 3 29.302 rs3706330 rs3706330 12 7.880

rs13477154 rs13477154 3 30.058 rs13481321 rs13481321 12 7.923

rs6212539 rs6212539 3 30.168 rs13481324 rs13481324 12 8.117

rs31430997 rs31430997 3 30.245 rs3717860 rs3717860 12 8.495

gnf03.063.824 rs13477167 3 30.735 rs13481363 rs13481363 12 10.543

rs6198234 rs6198234 3 32.419 rs3089800 rs3089800 12 11.630

rs6264454 rs6264454 3 32.571 rs13481371 rs13481371 12 12.768

rs13477201 rs13477201 3 33.735 rs6223000 rs6223000 12 14.927

rs3706378 rs3706378 3 33.755 rs13481388 rs13481388 12 15.533

rs13477210 rs13477210 3 34.419 rs6311081 rs6311081 12 16.814

rs13477217 rs13477217 3 34.881 rs3658100 rs3658100 12 17.537

rs13477224 rs13477224 3 35.847 rs13481406 rs13481406 12 18.110

rs13477233 rs13477233 3 37.284 rs13481408 rs13481408 12 18.369

rs13477251 rs13477251 3 38.955 CEL-12_40646532 rs29148707 12 20.993

rs13477268 rs13477268 3 40.211 rs3670749 rs3670749 12 22.242

rs3022964 rs3022964 3 42.089 rs13481459 rs13481459 12 22.840

rs3726226 rs3726226 3 42.961 rs3706319 rs3706319 12 25.397

rs4138887 rs4138887 3 45.048 mCV22351241 rs29158995 12 25.941

rs13477299 rs13477299 3 45.247 gnf12.061.363 rs13481496 12 26.705

rs3701653 rs3701653 3 45.461 rs13481514 rs13481514 12 28.892

rs29690864 rs29690864 3 47.761 rs13481527 rs13481527 12 30.340

rs13477321 rs13477321 3 47.987 rs3655558 rs3655558 12 32.585

rs3676545 rs3676545 3 48.528 rs13481541 rs13481541 12 33.626

rs3698700 rs3698700 3 49.118 rs3662628 rs3662628 12 35.986

gnf03.119.970 rs13477357 3 50.381 rs29176418 rs29176418 12 36.457

rs3708412 rs3708412 3 50.969 rs13481565 rs13481565 12 38.807

CEL-3_120379605 rs30157172 3 52.425 CEL-12_84750094 rs29184538 12 43.833

rs13477404 rs13477404 3 60.271 rs3670410 rs3670410 12 44.326

rs13477421 rs13477421 3 62.120 rs13481588 rs13481588 12 45.007

rs30509660 rs30509660 3 63.036 rs13481592 rs13481592 12 45.198

rs3676039 rs3676039 3 63.124 rs6207869 rs6207869 12 45.478

rs13477439 rs13477439 3 64.048 rs13481599 rs13481599 12 46.455 [127]

rs3090379 rs3090379 3 66.233 rs13481604 rs13481604 12 49.820

rs6290401 rs6290401 3 66.587 rs3716084 rs3716084 12 50.269

rs3657112 rs3657112 3 72.448 rs13481614 rs13481614 12 51.183

rs13477494 rs13477494 3 76.859 rs13481632 rs13481632 12 56.892

rs13477506 rs13477506 3 79.346 rs6390948 rs6390948 12 59.558

gnf03.160.599 rs30801216 3 81.070 rs13481655 rs13481655 12 61.147

rs13477517 rs13477517 3 81.662 rs3023711 rs3023711 12 62.619

CEL-3_159340478 rs30353245 3 82.066 mCV24244050 rs29249127 13 1.733

rs3660863 rs3660863 4 3.374 rs6215262 rs6215262 13 2.100

rs6287606 rs6287606 4 7.098 rs13481668 rs13481668 13 2.487

rs27682879 rs27682879 4 7.532 rs13481673 rs13481673 13 2.837

CEL-4_19508304 rs27682826 4 7.534 rs6348604 rs6348604 13 5.492

rs13477596 rs13477596 4 8.599 rs3023379 rs3023379 13 6.365

gnf04.019.134 rs13477602 4 9.709 rs13481715 rs13481715 13 6.898

mCV22919271 rs27748877 4 10.325 rs6314295 rs6314295 13 7.214

rs13477617 rs13477617 4 11.276 rs3705279 rs3705279 13 7.257

rs3701432 rs3701432 4 12.487 rs3679784 rs3679784 13 7.543

CEL-4_30653207 rs27801920 4 13.275 rs3663223 rs3663223 13 9.387

rs3663744 rs3663744 4 13.517 rs13481767 rs13481767 13 16.310

rs27807782 rs27807782 4 14.000 rs13481783 rs13481783 13 21.210

rs3674908 rs3674908 4 14.005 rs3712907 rs3712907 13 21.327

rs13477643 rs13477643 4 17.003 rs3688207 rs3688207 13 21.883

rs3698283 rs3698283 4 23.004 rs6411274 rs6411274 13 24.301

rs27856136 rs27856136 4 24.490 rs6244558 rs6244558 13 24.703

rs3677770 rs3677770 4 29.364 rs3693942 rs3693942 13 29.453

rs3712541 rs3712541 4 32.814 rs3690198 rs3690198 13 32.531

rs27935605 rs27935605 4 33.958 rs4229817 rs4229817 13 34.532

rs6254381 rs6254381 4 34.002 rs29730990 rs29730990 13 34.544

rs13477838 rs13477838 4 42.520 rs13481871 rs13481871 13 37.601

rs13477866 rs13477866 4 45.650 rs13481880 rs13481880 13 40.224

rs28076769 rs28076769 4 45.667 rs13481883 rs13481883 13 40.871

rs13477883 rs13477883 4 47.779 rs13481908 rs13481908 13 43.360

rs6324470 rs6324470 4 49.475 rs29247893 rs29247893 13 43.678

rs6226080 rs6226080 4 50.643 mCV24625340 rs29565666 13 44.743

rs3710617 rs3710617 4 51.378 gnf13.088.732 rs29932904 13 45.235

rs6381371 rs6381371 4 53.085 rs6288319 rs6288319 13 46.504

rs3692563 rs3692563 4 54.658 rs6316213 rs6316213 13 47.483

rs3675629 rs3675629 4 56.727 gnf13.093.328 rs13481943 13 47.760

rs6195490 rs6195490 4 57.665 rs29229165 rs29229165 13 50.498

rs13477959 rs13477959 4 57.707 rs13481961 rs13481961 13 50.925 [128]

rs6355837 rs6355837 4 63.346 rs13481968 rs13481968 13 52.195

UT_4_132.137715 rs32281313 4 66.216 rs13481983 rs13481983 13 55.812

rs3663950 rs3663950 4 67.994 rs13481996 rs13459786 13 58.109

rs13478002 rs13478002 4 68.924 rs13482005 rs13482005 13 59.736

rs27610338 rs27610338 4 69.051 gnf13.115.241 rs13482022 13 64.043

rs4224864 rs4224864 4 70.210 rs6247696 rs6247696 13 64.656

rs3719891 rs3719891 4 75.668 rs13482028 rs13482028 13 64.842

rs3023025 rs3023025 4 77.015 rs13482035 rs13482035 13 66.772

rs13478089 rs13478089 4 84.321 rs13482037 rs13482037 13 67.210

rs13478067 rs13478067 4 85.611 rs4230144 rs4230144 14 4.710

rs13478068 rs13478068 4 86.105 rs6322899 rs6322899 14 5.762

rs6279100 rs6279100 4 88.605 rs6290836 rs6290836 14 6.421

rs3664617 rs3664617 5 2.537 rs30637674 rs30637674 14 7.080

rs6402980 rs6402980 5 2.883 rs3687889 rs3687889 14 11.062

rs6223482 rs6223482 5 3.199 rs30721414 rs30721414 14 11.936

rs3714258 rs3714258 5 3.933 rs13482096 rs13482096 14 12.574

rs3687916 rs3687916 5 6.151 rs3692121 rs3692121 14 14.111

rs13478139 rs13478139 5 10.162 rs13482104 rs13482104 14 15.674

rs13478154 rs13478154 5 13.069 rs6396829 rs6396829 14 16.608

rs13469943 rs13469943 5 16.960 rs13482119 rs13482119 14 18.657

rs3700706 rs3700706 5 17.391 rs13482143 rs13482143 14 22.258

rs13459085 rs13459085 5 17.941 rs13482159 rs13482159 14 23.452

rs13478182 rs13478182 5 18.159 rs3715961 rs3715961 14 23.963

rs6256504 rs6256504 5 21.165 rs13482170 rs13482170 14 25.173

rs13478205 rs13478205 5 22.295 rs3666583 rs3666583 14 26.733

rs6215373 rs6215373 5 24.090 rs13482191 rs13482191 14 27.710

rs13478223 rs13478223 5 25.023 rs13482194 rs13482194 14 28.170

rs3659933 rs3659933 5 25.740 rs13482216 rs13482216 14 33.165

CEL-5_45872918 rs29521905 5 25.980 rs3672425 rs3672425 14 35.550

rs3714001 rs3714001 5 26.890 rs13482247 rs13482247 14 37.396

rs3664008 rs3664008 5 29.397 rs6284633 rs6284633 14 37.542

mCV23125912 rs259632697 5 30.567 rs13459144 rs13459144 14 40.316

CEL-5_56167948 rs29584438 5 30.975 rs13482265 rs13482265 14 42.277

rs3090667 rs3090667 5 31.824 rs13482284 rs13482284 14 43.806

rs6187409 rs6187409 5 32.229 rs3708535 rs3708535 14 44.007

mCV23386455 rs33138184 5 32.845 rs13482296 rs13482296 14 44.438

rs3707918 rs3707918 5 38.702 rs6395984 rs6395984 14 45.189

rs3153758 rs3153758 5 40.491 rs6291434 rs6291434 14 45.459

mCV22996021 rs29678187 5 40.926 rs13482311 rs13482311 14 45.801

rs3721607 rs3721607 5 42.897 rs30835762 rs30835762 14 45.957 [129]

CEL-5_87173557 rs29560457 5 44.025 rs13482327 rs13482327 14 47.839

gnf05.084.686 rs13478384 5 44.297 rs3692362 rs3692362 14 49.712

rs3673049 rs3673049 5 44.318 CEL-14_94845626 rs30942806 14 52.610

rs13478388 rs13478388 5 44.712 rs13482354 rs13482354 14 56.151

rs6232866 rs6232866 5 45.474 rs3654132 rs3654132 14 56.158

rs13478402 rs13478402 5 47.492 rs13482375 rs13482375 14 57.817

CEL-5_93945748 rs29502845 5 47.559 rs6256423 rs6256423 14 58.780

rs3661241 rs3661241 5 47.740 rs6169105 rs6169105 14 60.717

rs3705458 rs3705458 5 48.326 rs3665550 rs3665550 14 62.625

rs13478416 rs13478416 5 48.461 rs13482398 rs13482398 14 63.475

rs13478451 rs13478451 5 53.197 rs13482407 rs13482407 14 65.852

gnf05.106.197 rs13478463 5 54.517 rs31594308 rs31594308 15 1.843

rs13478473 rs13478473 5 55.949 rs13482418 rs13482418 15 1.849

rs6362200 rs6362200 5 55.994 rs3665826 rs3665826 15 2.032

gnf05.110.207 rs13478479 5 56.285 CEL-15_8331158 rs31671106 15 3.823

rs6255362 rs6255362 5 60.089 CEL-15_9687257 rs31927778 15 4.440

CEL-5_117374791 rs32146173 5 60.445 rs13482431 rs13482431 15 5.575

rs13478518 rs13478518 5 66.363 rs3711814 rs3711814 15 6.248

rs13478540 rs13478540 5 74.362 CEL-15_14786403 rs31608298 15 7.706

rs13478546 rs13478546 5 76.004 CEL-15_15482356 rs32187076 15 7.773

rs4225536 rs4225536 5 77.062 rs13482455 rs13482455 15 7.869

rs6298689 rs6298689 5 78.785 rs3715857 rs3715857 15 8.139

rs3656197 rs3656197 5 80.281 rs13482477 rs13482477 15 8.543

rs13478567 rs13478567 5 81.309 rs32275246 rs32275246 15 8.617

rs13478573 rs13478573 5 82.564 rs13482497 rs13482497 15 11.032

rs13478583 rs13478583 5 85.175 rs13482504 rs13482504 15 12.282

rs3718776 rs3718776 5 89.193 rs32084485 rs32084485 15 12.542

rs3661828 rs3661828 6 1.448 rs6188239 rs6188239 15 14.103

rs13478602 rs13478602 6 1.650 rs13482529 rs13482529 15 15.274

rs6172481 rs6172481 6 1.888 CEL-15_43206205 rs32312277 15 16.751

rs3699833 rs3699833 6 2.513 rs13482543 rs13482543 15 16.809

rs13478617 rs13478617 6 3.436 rs13482549 rs13482549 15 17.395

rs13478631 rs13478631 6 4.899 rs3707900 rs3707900 15 19.919

rs13478641 rs13478641 6 7.199 rs13482585 rs13482585 15 21.175

rs3655269 rs3655269 6 8.073 rs13482595 rs13482595 15 25.030

rs13478656 rs13478656 6 8.808 rs31824103 rs31824103 15 29.031

rs3671709 rs3671709 6 11.736 rs13482628 rs13482628 15 29.421

rs13478677 rs13478677 6 11.989 rs13482637 rs13482637 15 32.182

gnf06.026.418 rs13478684 6 12.318 rs13482661 rs13482661 15 37.583

rs30945099 rs30945099 6 15.021 rs3665030 rs3665030 15 37.745 [130]

rs13478705 rs13478705 6 15.022 rs32164372 rs32164372 15 38.015

rs13478709 rs13478709 6 15.265 rs6276391 rs6276391 15 38.245

rs3704635 rs3704635 6 15.430 rs13482695 rs13482695 15 43.934

rs6330932 rs6330932 6 16.438 rs13482704 rs13482704 15 45.108

rs13478719 rs13478719 6 17.777 rs13482712 rs13482712 15 46.484

rs6238771 rs6238771 6 21.815 rs6285067 rs6285067 15 49.907

rs13478732 rs13478732 6 21.865 rs13482751 rs13482751 15 58.021

rs13478745 rs13478745 6 23.816 rs4152638 rs4152638 16 2.426

rs13478753 rs13478753 6 24.558 rs4152790 rs4152790 16 2.475

CEL-6_57082524 rs30910255 6 27.764 rs4231060 rs4231060 16 2.889

rs13478819 rs13478819 6 32.381 rs4157412 rs4157412 16 3.415

rs3672029 rs3672029 6 32.530 rs4158907 rs4158907 16 3.509

rs13478830 rs13478830 6 32.556 rs32677126 rs32677126 16 9.636

rs3699367 rs3699367 6 33.047 rs4165081 rs4165081 16 12.309

rs6377140 rs6377140 6 35.012 rs4165279 rs4165279 16 12.645

rs6181382 rs6181382 6 35.526 rs4167031 rs4167031 16 19.714

CEL-6_83434907 rs31002289 6 35.969 rs4168640 rs4168640 16 21.374

rs13459097 rs13459097 6 36.950 rs4170308 rs4170308 16 22.749

CEL-6_86289708 rs30605552 6 37.538 rs4171440 rs4171440 16 24.104

rs13478882 rs13478882 6 39.970 rs4177651 rs4177651 16 27.831

rs13478891 rs13478891 6 40.922 rs4179075 rs4179075 16 28.385

rs13478893 rs13478893 6 41.108 rs4182243 rs4182243 16 30.376

rs6239023 rs6239023 6 42.752 rs4183448 rs4183448 16 30.604

rs6349084 rs6349084 6 44.804 rs4185639 rs4185639 16 31.431

rs13478919 rs13478919 6 45.115 rs4186744 rs4186744 16 32.099

rs4138572 rs4138572 6 45.600 rs3696661 rs3696661 16 32.767

rs6292642 rs6292642 6 48.559 rs6163640 rs6163640 16 32.971

rs6208251 rs6208251 6 48.778 rs4189277 rs4189277 16 34.227

rs13478971 rs13478971 6 51.615 rs4189683 rs4189683 16 34.357

rs13478977 rs13478977 6 52.302 rs4191367 rs4191367 16 34.819

rs6393943 rs6393943 6 53.019 rs4193069 rs4193069 16 36.067

rs6401637 rs6401637 6 53.749 rs4194417 rs4194417 16 36.445

rs13478997 rs13478997 6 55.551 rs4197416 rs4197416 16 37.958

rs13479014 rs13479014 6 59.240 rs4197715 rs4197715 16 38.865

rs6389420 rs6389420 6 61.973 rs4200124 rs4200124 16 40.292

rs3681620 rs3681620 6 62.834 rs4201998 rs4201998 16 40.906

CEL-6_130920075 rs30316697 6 63.358 rs6317052 rs6317052 16 45.623

rs3722480 rs3722480 6 63.612 rs4211364 rs4211364 16 45.892

rs6339546 rs6339546 6 64.542 rs4211664 rs4211664 16 46.759

rs13479053 rs13479053 6 64.794 rs3680665 rs3680665 16 46.916 [131]

rs30015657 rs30015657 6 65.878 rs4212102 rs4212102 16 47.413

rs3023102 rs3023102 6 66.215 rs4213268 rs4213268 16 49.112

rs13479071 rs13479071 6 68.530 rs4214396 rs4214396 16 49.568

rs3672808 rs3672808 6 69.744 rs4217260 rs4217260 16 51.637

rs6329892 rs6329892 6 74.038 rs4219905 rs4219905 16 54.059

rs3658783 rs3658783 6 76.543 rs3712360 rs3712360 16 54.488

rs30160101 rs30160101 6 77.568 rs3696981 rs3696981 16 57.427

rs30848564 rs30848564 6 77.701 rs4136382 rs4136382 17 2.001

rs6265387 rs6265387 6 78.094 rs13482845 rs13482845 17 2.515

gnf06.149.306 rs30950360 6 78.865 UT_17_3.221024 rs33373629 17 3.756

rs3700068 rs3700068 7 2.536 rs3672987 rs3672987 17 17.806

rs3714915 rs3714915 7 2.617 rs6390174 rs6390174 17 19.311

rs13479105 rs13479105 7 2.689 rs3702604 rs3702604 17 19.559

rs6384973 rs6384973 7 3.058 CEL-17_39772541 rs33178797 17 19.602

rs3658362 rs3658362 7 7.760 rs6409750 rs6409750 17 22.923

rs3689218 rs3689218 7 9.441 rs13483016 rs13483016 17 25.749

rs13479153 rs13479153 7 13.060 rs6272475 rs6272475 17 27.907

rs31465998 rs31465998 7 15.866 rs3714226 rs3714226 17 28.823

CEL-7_17114061 rs32390275 7 16.524 rs3727008 rs3727008 17 38.866

rs6239372 rs6239372 7 20.611 rs3675634 rs3675634 17 42.179

rs32082888 rs32082888 7 25.481 rs3691628 rs3691628 17 44.650

gnf07.032.889 rs13479208 7 27.892 rs13483103 rs13483103 17 45.436

rs3719256 rs3719256 7 28.860 rs13483117 rs13483117 17 49.683

CEL-7_36725559 rs31893514 7 31.277 rs13483119 rs13483119 17 50.304

rs3699938 rs3699938 7 32.593 gnf17.083.842 rs13483131 17 51.965

rs13479243 rs13479243 7 32.844 rs13483135 rs13483135 17 53.085

rs13479251 rs13479251 7 32.929 rs13483144 rs13483144 17 56.362

rs3663313 rs3663313 7 33.441 rs4231720 rs4231720 17 56.873

mCV23672419 rs31944466 7 33.556 rs33117354 rs33117354 17 59.733

rs3714908 rs3714908 7 34.645 rs6288047 rs6288047 17 60.459

rs13479319 rs13479319 7 38.358 rs6397044 rs6397044 17 60.640

rs3676254 rs3676254 7 41.543 rs13483183 rs13483183 18 2.302

rs13479339 rs13479339 7 43.462 CEL-18_5565618 rs29628302 18 3.578

rs13479347 rs13479347 7 43.881 rs4231742 rs4231742 18 5.653

rs13479355 rs13479355 7 44.319 rs13483212 rs13483212 18 6.068

rs13479358 rs13479358 7 45.126 mCV23617245 rs30271401 18 8.002

rs13479363 rs13479363 7 45.693 rs6358426 rs6358426 18 10.034

rs6213614 rs6213614 7 47.085 rs13483244 rs13483244 18 11.555

rs13479385 rs13479385 7 48.729 rs4138020 rs4138020 18 11.996

rs13479395 rs13479395 7 50.443 rs3722973 rs3722973 18 17.694 [132]

CEL-7_77850273 rs32517623 7 50.648 rs3664831 rs3664831 18 19.514

rs4226783 rs4226783 7 51.622 rs6220251 rs6220251 18 21.395

rs13479414 rs13479414 7 52.150 rs3676196 rs3676196 18 22.966

rs3707067 rs3707067 7 53.897 rs4231834 rs4231834 18 23.625

rs13479427 rs13479427 7 54.309 rs29976462 rs29976462 18 23.876

rs4137968 rs4137968 7 54.459 CEL-18_46082735 rs30176009 18 24.277

rs3713052 rs3713052 7 54.621 gnf18.051.412 rs13483364 18 28.334

rs6357312 rs6357312 7 54.711 CEL-18_60214752 rs29626928 18 34.209

rs13479461 rs13479461 7 58.644 rs8237402 rs8237402 18 35.096

rs6275579 rs6275579 7 67.508 rs3688789 rs3688789 18 37.483

rs13479506 rs13479506 7 68.039 rs3691542 rs3691542 18 39.503

rs3682038 rs3682038 7 69.108 rs13483432 rs13483432 18 45.629

rs3716088 rs3716088 7 76.319 rs13483436 rs13483436 18 48.825

rs13479540 rs13479540 7 77.809 rs13459193 rs13459193 18 50.842

CEL-7_122752866 rs31680975 7 81.137 rs13483446 rs13483446 18 51.811

rs3663988 rs3663988 7 84.321 rs3720876 rs3720876 18 53.237

rs31198744 rs31198744 7 88.854 rs13483466 rs13483466 18 55.727

rs6153168 rs6153168 8 3.092 mCV24836796 rs29667966 18 57.837

rs3023176 rs3023176 8 7.587 rs13483482 rs13483482 18 58.209

rs13479619 rs13479619 8 7.914 rs30456650 rs30456650 19 3.378

rs3657963 rs3657963 8 8.446 rs6236348 rs6236348 19 4.441

gnf08.014.711 rs13479625 8 9.266 UT_19_10.709331 rs30517739 19 7.103

mCV24845756 rs33566717 8 10.583 rs6316813 rs6316813 19 8.177

rs4140004 rs4140004 8 11.326 CEL-19_12595293 rs30697936 19 8.771

rs3661760 rs3661760 8 11.480 rs31280015 rs31280015 19 10.150

rs13479656 rs13479656 8 14.032 rs3686467 rs3686467 19 12.273

rs13479673 rs13479673 8 16.533 rs30604569 rs30604569 19 13.204

rs4227096 rs4227096 8 18.218 gnf19.017.711 rs13483556 19 13.705

rs3706948 rs3706948 8 18.826 rs6309315 rs6309315 19 18.426

rs13479701 rs13479701 8 22.382 rs6293693 rs6293693 19 20.923

rs3665028 rs3665028 8 22.606 rs3090325 rs3090325 19 21.258

rs3667738 rs3667738 8 23.504 rs30746021 rs30746021 19 21.788

rs13479716 rs13479716 8 23.710 rs13459194 rs13459194 19 26.625

rs3703811 rs3703811 8 23.870 rs6237466 rs6237466 19 27.057

rs3666140 rs3666140 8 24.316 rs30396271 rs30396271 19 29.792

rs6386110 rs6386110 8 24.757 CEL-19_34542259 rs30364728 19 29.987

rs3719401 rs3719401 8 27.887 gnf19.035.019 rs30920120 19 32.228

rs13479755 rs13479755 8 28.863 rs3655407 rs3655407 19 34.251

CEL-8_51607005 rs31276910 8 29.030 rs30415214 rs30415214 19 35.972

rs13479768 rs13479768 8 29.246 rs3656289 rs3656289 19 36.170 [133]

rs3707439 rs3707439 8 30.761 rs3687275 rs3687275 19 36.617

rs3672639 rs3672639 8 30.936 rs13461374 rs13461374 19 36.698 rs13479793 rs13479793 8 31.514 rs13483643 rs13483643 19 38.427 rs13479794 rs13479794 8 31.550 rs3023496 rs3023496 19 41.195 rs13479811 rs13479811 8 33.219 rs6304326 rs6304326 19 47.937

rs3656875 rs3656875 8 33.567 rs30931570 rs30931570 19 48.458

rs3699406 rs3699406 8 34.067 rs13483682 rs13483682 19 51.073

UT_4_59.889271 rs249259485 19 56.384

[134]

Ehiole O. Akhirome

Department of Pediatrics Campus Box # 8208 Washington University School of Medicine 660 South Euclid Avenue Laboratory of Patrick Jay Saint Louis, MO 63110 314-286-2787 [email protected]

EDUCATION

Washington University School of Medicine St. Louis, MO MD, PhD (Medical Scientist Training Program) 2012-2020 Emory University Atlanta, GA B.S. Biology (Summa Cum Laude), B.A. Chemistry 2008-2012

RESEARCH EXPERIENCE Washington University School of Medicine Doctor of Philosophy (PhD) in Developmental, Regenerative and Stem Cell 2014-2018 Biology Advisor: Patrick Jay, MD, PhD “Genetic interactions and maternal genes modulate congenital heart disease risk”

Emory University Department of Biology Honor’s Thesis Advisor: Nicole Gerardo, PhD 2010-2012 “The impact of defensive symbionts and host plants on fitness and population dynamics of pea aphids, Acyrthosiphon pisum” SIRE Research Fellow Advisor: Nicole Gerardo, PhD “Competition in aphid populations”

Emory University School of Medicine SURE/CPFAR Fellow Advisor: Paul Spearman, MD 2009 & 2010 “The role of Rab11a and Tetherin in HIV-1 replication”

PEER-REVIEWED PUBLICATIONS 1. Akhirome E, Regmi SD, Magnan RA, Ugwu N, Qin Y, Schulkey CE, Cheverud JM, Jay PY. The impact of epistasis increases with the fitness cost of a congenital heart defect. [Manuscript in press] 2. Akhirome E, Walton NA, Nogee JM, & Jay PY. The Complex Genetic Basis of Congenital Heart Defects. Circulation Journal, 2017, 81.5:629-634. 3. Jay PY, Akhirome E, Magnan RA, Zhang R, Kang L, Qin Y, Ugwu N, Regmi SD, Nogee JM, Cheverud JM. Transgenerational cardiology: One way to a baby's heart is through the mother. Molecular and Cellular Endocrinology, 2016, 435:94-102. 4. Akhirome E & Jay PY. Rhythm genes sing more than one tune: Noncanonical functions of cardiac ion channels. Editorial. Circulation: Arrhythmia and Electrophysiology, 2015, 8(2):261-2. 5. Qi M, Williams JA, Chu H, Chen X, Wang JJ, Ding L, Akhirome E, Wen X, Lapierre LA, Goldenring JR, Spearman P. Rab11-FIP1C and Rab14 direct plasma membrane sorting and particle incorporation of the HIV - 1 envelope glycoprotein complex. PLoS Pathogens, 2013, 9(4):e1003278.

ABSTRACTS & PRESENTATIONS 1. Akhirome E. Gene-gene Interactions Promote Robust Heart Development. Oral Presentation at Cardiovascular Research Symposium at Washington University, 5 th Annual Meeting, 2016, St. Louis, MO. 2. Akhirome E, Regmi SD, Jay PY. Forces of Selection Shape the Genetic Architecture of Congenital Heart Disease. Poster Presentation at Weinstein Cardiovascular Development Conference, 22 nd Annual Meeting, 2015, Boston, MA. 3. Akhirome E, Regmi SD, Jay PY. An Evolutionary Perspective on the Genetic Architecture of Congenital Heart Defects. Poster Presentation at Cardiovascular Research Symposium at Washington University, 3 rd Annual Meeting, 2014, St. Louis, MO. 4. Harris EV, Akhirome E, Parker BJ, Gerardo NM. Cost of immunity in pea aphids associated with host plant interactions. Poster presentation by EVH at Ecological Society of America Conference, 98 th Annual Meeting 2013, Minneapolis, MN.

135

5. Akhirome E, Parker BJ, Gerardo NM. Genotypic Competition in Aphid Populations. Poster Presentation at SIRE Research Symposium at Emory University, 2011, Atlanta, GA. 6. Akhirome E, Parker BJ, Gerardo NM. Dynamics of defensive symbionts in aphid populations. Poster Presentation at SURE Symposium at Emory University, 2011, Atlanta, GA. 7. Akhirome, E, Domanksi P, Hammonds J, Spearman P. Identification of Tetherin-associated Using Tandem Affinity Purification. Poster Presentation at SURE Research Symposium at Emory University, 2009, Atlanta, GA. 8. Akhirome E, Qi M, Hammonds J, Ding L, Spearman P. The Role of Rab11a in HIV -1 Envelope incorporation. Poster Presentation at SURE Research Symposium at Emory University, 2010, Atlanta, GA.

HONORS & AWARDS NIH/NHLBI F30 Individual MD/PhD Fellowship (HL136077) 2017-2020 - PI: Ehiole Akhirome / Sponsors: Patrick Jay, MD, PhD & James Cheverud, PhD HHMI/Jackson Laboratory McKusick Genetics Course Scholarship 2017 Weinstein Cardiovascular Development Conference Travel Award 2016 Initiative for Maximizing Student Development (IMSD) Travel Award 2015 Phi Beta Kappa Society 2010 Emory College Dean’s Achievement Scholarship 2010-2012 Emory College Dean’s List 2008-2012 Emory College Liberal Arts Scholarship 2008-2012

ACTIVITIES McKinsey and Company, Silicon Valley Office Associate Intern: Healthcare Systems and Services Summer 2018 Washington University Department of Biology Graduate Teaching Assistant in Evolutionary Biology for Dr. Kenneth Olsen 2014-2015

CSMB Mentorship Program Mentored students at the Collegiate School of Medicine and Bioscience in St. Louis 2014-2015

Emory University Department of Chemistry Undergraduate Teaching Assistant in Organic Chemistry for Dr. Jose Soría 2009-2010

PROFESSIONAL MEMBERSHIPS American Society of Human Genetics 2017-2018 American Medical Association 2012-Present Missouri Medical Student Association 2012-2016

RESEARCH MENTEES Washington University School of Medicine – Jay Lab Paul Krucylak 2018 - Current: Undergraduate, Washington University in St. Louis Ida Qin 2015-2017 - Current: PhD candidate, California Institute of Technology Nelson Ugwu 2015-2016 - Current: MD student, Yale University School of Medicine

136