Replication and Functional Validation of SNPs Previously Associated with Coronary Artery Disease

Matthew B. Sellers, MD

A Thesis Submitted to the Graduate Faculty of

WAKE FOREST UNIVERSITY GRADUATE SCHOOL OF ARTS AND SCIENCES

in Partial Fulfillment of the Requirements for the Degree of

MASTER OF SCIENCE

in Clinical and Population Translational Science

Winston-Salem, North Carolina

August 2013

Approved by:

David M. Herrington, M.D., M.H.S., Advisor

Examining Committee:

Beverly M. Snively, Ph.D., Examining Chair

Donald W. Bowden, Ph.D., Chairman

Daniel Beavers, Ph.D.

Timothy D. Howard, Ph.D.

ACKNOWLEDGEMENTS

I would like to sincerely thank the following people for their guidance, support and mentorship, without whom this thesis submission would not have been possible.

To Dr. Herrington, thank you for your mentorship and guidance throughout the thesis process, as well as your continued mentorship during my cardiology fellowship. Your commitment to the cardiology research fellows, as well as the general clinical fellows, in our pursuit of knowledge has been unparalleled.

To Allie Richardson and Karen Blinson, thank you for keeping me in line with all my abstract and poster deadlines, appointment times, and professional obligations. You were instrumental in getting me settled into fellowship as well as providing general advice and support for settling into life in Winston-Salem.

To Georgia Saylor, thank you for your computer and statistical guidance. You were always available on an immediate basis for help, and have spent countless hours running long SAS programs for these genetic analyses.

To Drs. Howard and Liu, thank you for your guidance with several abstracts and with writing the Translational Science Institute grant. You have been very approachable and patient when discussing genetic concepts and basic laboratory and analysis techniques. Your mentorship was not only instrumental in helping me develop the tools to complete this research project but also forged a friendship through the process.

I would also like to take this opportunity to thank my wife, Holly, for her continued understanding as I moved her from city to city during my training and decided to undertake an additional year of fellowship, and for giving me my newborn son, for whom I am eternally grateful.


TABLE OF CONTENTS

List of Abbreviations

List of Illustrations

Abstract

Chapter

1. Introduction

1. Specific Aims

2. Replication of the Association of PSRC1 and MTHFD1L with Coronary Artery Disease: The Multi-Ethnic Study of Atherosclerosis

3. Functional Validation of Previously Identified SNPs Associated with Coronary Artery Disease

4. Curriculum Vitae


LIST OF ABBREVIATIONS

CAD = Coronary Artery Disease

CHD = Coronary Heart Disease

GWAS = Genome-Wide Association Studies

Health ABC = Health, Aging and Body Composition Study

LDL = Low Density Lipoprotein

mRNA = Messenger RNA (ribonucleic acid)

SNP = Single Nucleotide Polymorphism


LIST OF ILLUSTRATIONS

Chapter 1

I. Previously published SNPs associated with CAD with genome-wide significance

Chapter 2

I. Manhattan Plot representing the MESA GWAS for all ethnic groups

II. Replication of previously published SNPs with genome-wide significance within the 4 ethnic groups MESA GWAS

III. MESA SNP associations replicated within Health ABC and meta-analysis

IV. Hazard ratios with 95% CI of SNP associated with MTHFD1L in African Americans (top) and SNP associated with PSRC1 in Caucasians (bottom)

Chapter 3

I. Previously published SNPs associated with CAD in the literature that were available for expression analysis

II. SNP associations with mRNA expression levels

III. Statistically significant SNP associations stratified by race


Abstract

Introduction

Genome-wide association studies have identified ~30 SNPs associated with CHD in several large cohorts of predominantly European descent. We sought additional replication of these associations in more diverse cohorts, including the Multi-Ethnic Study of Atherosclerosis (MESA) cohort (N = 6,425) and the Health, Aging and Body Composition (Health ABC) cohort (N = 2,800). To provide additional functional validation of a subset of 16 of these SNPs, we also examined associations of these SNPs with expression of ~25,000 gene transcripts in purified monocytes from a subset (N = 1264) of MESA participants.

Methods

Cox proportional hazards models were used to measure the association between SNPs and time to first CHD event after adjustment for age, gender, study site, and genetic ancestral principal components in both the MESA and Health ABC cohorts. Random effects meta-analysis was used to evaluate the association across both cohorts. Genome-wide mRNA expression profiles were generated with the Illumina HumanHT-12 BeadChip platform in purified monocytes from 1264 MESA subjects sampled to achieve balance across age, race/ethnicity and sex. Associations of 16 of these SNPs with individual gene transcripts were assessed using generalized linear models adjusting for age, race, sex, study site and technical factors, including the percent of cell contamination with neutrophils, natural killer cells, B cells and T cells.

Results

Two of the 30 previously published SNP associations with coronary artery disease, rs599839 in close proximity to the PSRC1 gene on chromosome 1 (p value 0.01, HR 1.45) and rs6922269 (p value 0.03, HR 1.52) within the MTHFD1L gene on chromosome 6, reached nominal significance in the MESA cohort, and these findings were replicated within the Health ABC cohort (PSRC1 p value 0.009, HR 1.25 and MTHFD1L p value 0.016, HR 1.19). Of the 16 previously reported SNPs associated with CAD, 5 were statistically significantly associated with differing mRNA expression levels after Bonferroni adjustment (p values < 3.17 x 10⁻⁶), and rs599839 was associated with differing mRNA expression of the PSRC1 gene (p value 3 x 10⁻¹⁴). An increasing number of copies of the rs599839 minor allele was associated with decreased expression of the PSRC1 transcripts.

Conclusion

Two SNPs in proximity to PSRC1 (rs599839) and MTHFD1L (rs6922269) were validated in 2 large independent cohorts, strengthening the association of these SNPs with coronary artery disease. Five SNPs with GWAS evidence of associations with coronary artery disease are associated with differing mRNA expression, suggesting that the mechanism of some of these associations may involve modulation of gene expression. More research is warranted to determine the full mechanisms of these associations.


Chapter 1

Introduction

In addition to environmental and lifestyle factors¹, coronary disease tends to cluster in families, suggesting that genetic factors play a crucial role in the development of cardiovascular disease, especially premature coronary heart disease. In an early case-control study in the Honolulu Heart Study cohort, the relative risk of coronary heart disease death was 11.3 for fathers of CHD cases with early onset CHD, and the relative risk of developing CHD for siblings of early onset CHD cases was 2.52. In a prospective study of coronary heart disease in males aged 39-59, a reported parental history of coronary heart disease was statistically significantly (p = 0.01) associated with an increased risk of the combined incidence of symptomatic and angina pectoris in subjects under 50 years of age after adjustment for known CHD risk factors³. Further support for genetic influences on the development of coronary heart disease, as well as the increased risk of CHD mortality, came from a landmark publication in the New England Journal of Medicine that studied monozygotic and dizygotic twins. The relative hazard of death from CHD when a male twin died of CHD prior to age 55 was 8.1 (95% CI 2.7 to 24.5) for monozygotic twins and 3.8 (95% CI 1.4 to 10.5) for dizygotic twins. Similar risks were seen when female twins died prior to the age of 65 years, and these risks were not significantly decreased when adjusting for known CHD risk factors⁴. In a subsequent twin study, the heritability of death was 0.57 (95% CI, 0.45-0.69) amongst male twins, and 0.38 (0.26-0.50) amongst female twins⁵.

As the genetic influence on coronary heart disease has been well established for several decades, since the sequencing of the human genome researchers have focused on identifying specific regions within the genome, with hopes of identifying new pathophysiologic mechanisms and risk factors for the development of cardiovascular disease, as well as other complex diseases.

Genome-Wide Association Studies

Cardiovascular epidemiology and clinical research have been unable to fully explain an individual’s risk for developing disease. The portion of cardiovascular disease risk not explained by traditional risk factors has been termed the “missing risk” or “missing heritability”⁶. Recent efforts to explain this missing risk have focused on genetic research, and with new technological advancements, such as genome-wide association studies (GWAS), there has been an explosion of new genetic information. GWAS seek to characterize the association between single nucleotide polymorphisms (SNPs), which serve as markers for specific regions within the genome, and the complex mechanisms leading to coronary artery disease, with the aim of highlighting novel mechanisms for disease, as well as providing additional data for risk stratification and preventive cardiology.

Several GWAS have identified single nucleotide polymorphisms (SNPs) associated with coronary artery disease, and the most replicated region identified is a 100 kb region on chromosome 9⁷⁻¹¹. The major SNP identified by McPherson, rs10757274 located on chromosome 9, revealed a graded risk with a 15-20% increased risk of coronary heart disease in heterozygotes and a 30-40% increased risk of coronary heart disease in homozygotes¹⁰. In parallel, Helgadottir identified the major SNP rs10757278 on the same 9p21 locus, which was associated with coronary artery disease with OR 1.29 (p = 3.6 x 10⁻¹⁴) and also revealed a graded risk with an OR of 1.26 and 1.64 for heterozygotes and homozygotes, respectively⁹. The risk allele was not associated with any of the risk factors for coronary artery disease, and has been associated with coronary artery calcium¹⁰. It remains unclear how this locus results in an increased risk for coronary artery disease. The closest genes to these SNPs are two cyclin dependent kinase inhibitors, CDKN2A and CDKN2B. These genes encode proteins which play an important role in cell cycle regulation and interact with TGF-β, which has been implicated in the pathogenesis of atherosclerosis¹². Since these two sentinel studies established the locus on chromosome 9 associated with coronary artery disease, several other studies have strengthened this association by replicating these results in Caucasians, Japanese, Hispanics and East Asians⁷,¹³⁻¹⁶, and 9p21 remains the most well replicated and validated SNP association with coronary artery disease. After this locus on chromosome 9 was identified as associated with coronary artery disease, other GWAS publications implicated additional SNPs as possibly being related to CHD events, with 30 of the 34 SNPs implicating specific genes [Table 1].


Table 1. Previously published SNPs associated with CAD with genome-wide significance

The Wellcome Trust Case Control Consortium (WTCCC) replicated the association on chromosome 9, and identified additional SNPs on , 16 and 22 using 2,000 cases with a history of myocardial infarction or coronary revascularization before the age of 66 (identified by records and/or physician reports) against 3,000 controls⁷. Other than the replicated association with chromosome 9, the other regions found to be associated at genome-wide significance levels with coronary artery disease were intergenic. However, the WTCCC had some modest associations with the MTHFD1L gene and the ADAMTS17 gene, and these associations were further strengthened in subsequent studies⁷.

Samani et al. sought to provide more robust evidence for associations of genetic loci with coronary artery disease and myocardial infarction by analyzing the WTCCC cohort in conjunction with data from the German Myocardial Infarction Family Study. The WTCCC consisted of 1988 subjects with a strong family history of coronary artery disease and a previous myocardial infarction or history of revascularization before the age of 66 years, and the German MI Family Study consisted of subjects with myocardial infarction before the age of 60 years and a first degree relative with premature coronary artery disease. Using this approach, they were able to identify new loci on chromosomes 1, 2, 6, 10, and 15¹¹. In the primary analysis, the replication of the locus on chromosome 9p21.3 revealed the strongest signal with a combined p value of 2.91 x 10⁻¹⁹, with a 36% risk increase per copy of the risk allele (95% CI, 27 to 46). In addition, the locus on chromosome 6q25.1 revealed a combined p value of 2.90 x 10⁻⁸ for the association with coronary artery disease, with a 23% increase per risk allele (95% CI, 15 to 33).

The third locus for the primary analysis on chromosome 2q36.3 had a combined p value of 1.61 x 10⁻⁷, with a 21% increase per copy of the risk allele (95% CI, 13 to 30). The loci on chromosomes 9 and 6 were not affected by adjustment for known cardiac risk factors; however, the odds ratio for chromosome 2 was reduced. In the combined analysis, loci on chromosomes 1, 10 and 15 were also identified¹¹.
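The per-allele estimates above can be translated into genotype-level odds ratios. The following is a minimal sketch, assuming a log-additive (multiplicative) genetic model in which each additional copy of the risk allele multiplies the odds by the same factor, and treating the quoted percentage increases as odds ratios. The function name `genotype_odds_ratios` is hypothetical, and whether the cited studies fitted exactly this model is an assumption rather than something stated here.

```python
def genotype_odds_ratios(per_allele_or: float) -> dict[int, float]:
    """Odds ratios for 0, 1 and 2 risk-allele copies under a log-additive model.

    Assumption: each copy multiplies the odds by `per_allele_or`,
    with non-carriers (0 copies) as the reference group.
    """
    if per_allele_or <= 0:
        raise ValueError("an odds ratio must be positive")
    return {copies: per_allele_or ** copies for copies in (0, 1, 2)}


# Per-allele estimates quoted in the text (36%, 23% and 21% increases),
# treated here as odds ratios of 1.36, 1.23 and 1.21 (an assumption).
for locus, per_allele in {"9p21.3": 1.36, "6q25.1": 1.23, "2q36.3": 1.21}.items():
    ors = genotype_odds_ratios(per_allele)
    print(f"{locus}: heterozygote OR {ors[1]:.2f}, homozygote OR {ors[2]:.2f}")
```

Under this assumption, carrying two copies of the 9p21.3 risk allele would correspond to an odds ratio of roughly 1.85 (1.36 squared) relative to non-carriers.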

This analysis provided evidence to support new genetic loci associations with coronary artery disease on chromosome 6 (MTHFD1L gene), chromosome 1 (PSRC1 and MIA3 genes), chromosome 10 (CXCL12 gene) and chromosome 15 (SMAD3 gene)¹¹. The SNP associated with chromosome 6 maps to the MTHFD1L gene, which encodes the mitochondrial isozyme of C1-tetrahydrofolate (THF) synthase. This family of C1-THF synthases is a part of a variety of cellular processes involved in the synthesis of purine and methionine¹¹. The SNPs associated with the PSRC1 gene have been implicated in pathways with plausible pathophysiologic mechanisms related to metabolic profile and atherosclerosis. The region of the genome around the SNPs on chromosome 1 near PSRC1 has been associated with LDL cholesterol in some recent genome-wide association studies¹⁷. These results were further validated in a study investigating the association of rs599839 with traditional cardiovascular risk factors. In this study, rs599839 was associated with higher LDL cholesterol levels per copy of the risk allele in over 2,000 adults (p = 3.84 x 10⁻⁶)¹⁸. In addition, liver expression levels of PSRC1, as well as of the genes cadherin, EGF LAG seven-pass G-type receptor 2 (CELSR2) and sortilin 1 (SORT1), have been significantly associated with LDL cholesterol levels in mouse models and humans. Combined data suggest that PSRC1, CELSR2 and sortilin 1 are tightly co-regulated and may operate in a conserved network associated with the metabolic phenotypes of dyslipidemia, obesity, diabetes, and atherosclerosis¹⁹. The MIA3 and SMAD3 genes are also involved in the cell cycle and cell growth and inhibition.

Kathiresan et al. evaluated associations with early myocardial infarction identified by autopsy or clinical presentation in close to 3,000 subjects and replicated these findings in close to 20,000 controls, identifying new loci on chromosomes 1, 2, 6, 19, and 21²⁰. This was a four stage GWAS replication study design of early myocardial infarction in the Myocardial Infarction Genetics Consortium (MIGen), a cohort of 2,967 cases of early myocardial infarction, defined as men less than 50 years old and women less than 60 years old, and over 3,000 age and sex matched controls. This analysis revealed nine SNPs associated with myocardial infarction at a pre-specified genome-wide significance level of 5 x 10⁻⁸, of which four represented replication of previously reported associations and five represented novel associations. The four previous associations mapped to chromosome 9p21, 1p13 near CELSR2-PSRC1-SORT1, 10q11 near CXCL12 and 1q41 in MIA3. Of the novel associations, two (19p13 near LDLR and 1p32 near PCSK9) are possibly involved in the pathophysiology of risk factors for myocardial infarction. Mutations in both of these genes have been found to influence LDL cholesterol levels, and these loci may increase coronary artery disease risk by acting through LDL cholesterol, a very well known risk factor for coronary artery disease and myocardial infarction. The other three novel associations were an intergenic region on chromosome 21, an intron in PHACTR1 (phosphatase and actin regulator 1) on chromosome 6 and an intron in WDR12 (WD repeat domain 12) on chromosome 2, with OR of 1.19, 1.13 and 1.17, respectively²⁰. The mechanisms by which these regions may influence coronary artery disease risk are unknown; however, PHACTR1 has been found to be associated with coronary artery calcium in over 10,000 participants, and WDR12 is involved in cell proliferation²⁰.

In a follow up GWAS, Erdmann et al. used a three stage analysis model which identified one new region of association with CAD with strong statistical evidence on chromosome 3 and another region with suggestive statistical evidence on chromosome 12. In this approach, 1222 German subjects from the German MI Family Study II with premature coronary disease and a positive family history were analyzed against 1298 population based controls, and the preliminary data underwent replication in three additional genome-wide datasets of CAD and subsequent replication in roughly 25,000 subjects. After replication and follow up analysis, SNP rs9818870 in the MRAS gene on chromosome 3 revealed an aggregate p value of 7.44 x 10⁻¹³ with an OR of 1.15 (95% CI, 1.11 to 1.19), and the SNP rs2259816 in the HNF1A-C12orf43 region on chromosome 12 revealed an aggregate p value of 4.81 x 10⁻⁷ with an OR of 1.08 (95% CI, 1.05 to 1.11). Neither of these two regions was associated with traditional risk factors upon further analysis. The MRAS gene encodes the M-ras protein, which belongs to the family of ras GTP-binding proteins and is widely expressed in the cardiovascular system. The authors provide evidence that suggests a role for M-ras in adhesion signaling, a vital process in the pathophysiology of atherosclerosis²¹.

In addition to the MRAS gene, there was statistical evidence supporting an association of a region of linkage disequilibrium encompassing the HNF1A and C12orf43 genes. HNF1A encodes a transcription factor that regulates gene expression in the liver, and variants of this gene have been implicated in maturity-onset diabetes of the young (MODY), plasma levels of C-reactive protein, and higher plasma levels of low-density lipoprotein cholesterol. The C12orf43 gene is also highly expressed in cardiovascular tissues²¹.

Erdmann et al. sought to identify additional coronary artery disease loci by applying a stepwise approach, initially performing a GWAS on 1157 cases and 1748 controls within the German MI Family Study III (KORA). The SNPs with liberal statistical significance were replicated in two independent samples, the German MI Family Study I and II, revealing nine SNP associations that were further analyzed in a wet lab approach followed by a meta-analysis, ultimately yielding a novel association of rs3739998 on chromosome 10 with a combined p value of 1.27 x 10⁻¹¹ (OR 1.15, 95% CI 1.11 to 1.20). This SNP leads to a missense variant in the KIAA1462 gene, which encodes a protein with unknown function²².

A recent publication has built upon the GWAS by including SNP associated gene expression analysis, as well as cardiovascular risk factors and subclinical disease markers and their association with mRNA expression levels. Using this approach, Wild et al. identified a novel locus within the LIPA gene as associated with coronary artery disease in a case-control design with 5,031 cases, followed by a two stage replication analysis and a final meta-analysis of 59,789 cases and control subjects. Two SNPs within the LIPA gene on chromosome 10 reached genome-wide significance, rs1412444 (p value 3.71 x 10⁻⁸; OR 1.1; 95% CI 1.07 to 1.14) and rs2246833 (p value 4.35 x 10⁻⁸; OR 1.1; 95% CI 1.06 to 1.14). Both the rs1412444 and rs2246833 SNPs are located within the intronic regions of the LIPA (lysosomal acid lipase A) gene, and both showed a strong association with higher LIPA mRNA expression, with p values of 1.3 x 10⁻⁹⁶ and 4.0 x 10⁻⁹⁶, respectively. In further analysis, LIPA expression was found to be significantly associated with lower HDL cholesterol and impaired endothelial function measured by flow-mediated vasodilation. In addition, two previously reported loci associated with coronary artery disease, PSRC1 on chromosome 1 and WDR12 on chromosome 2, were found to affect transcription levels. The risk alleles for the two SNPs rs599839 and rs629301 on chromosome 1 were associated with decreased levels of PSRC1 transcripts, and the risk allele for rs6725887 on chromosome 2 was associated with decreased FAM117B transcript levels²³. Increased PSRC1 levels were found to be associated with lower LDL levels, higher HDL levels, lower blood pressure, and improved endothelial function, and in support of these findings the risk alleles of the PSRC1 SNPs have been associated with increasing LDL levels and extent of atherosclerotic plaques²³.

Lysosomal acid lipase (LAL) is encoded by the LIPA gene and plays an essential role in lipoprotein homeostasis and metabolism, and mutations in LIPA may lead to cholesterol ester storage disease or Wolman disease and can result in premature atherosclerosis²⁴. LDL cholesterol binds to LDL receptors on the cellular membrane and is ultimately endocytosed and transported to lysosomes, where LAL hydrolyzes triglycerides and cholesteryl esters. With nonfunctioning LAL, cholesteryl esters accumulate in cells, leading to foam cells which play a crucial role in the development of atherosclerosis²⁵. It has been postulated that increased LAL activity may lead to increased generation of enzymatically modified LDL and free cholesterol, resulting in foam cell formation and inflammation²³. Therefore, it seems that LAL activity must remain in balance, and that abnormally high or abnormally low expression may lead to the development of atherosclerosis.

In a novel approach, Slavin et al. used a two-marker GWAS approach and identified six separate genome regions, three located within genes, in association with coronary artery disease using the WTCCC²⁶. Hemochromatosis type 2 (HFE2), the cause of a form of juvenile hemochromatosis which can cause cardiomyopathy from iron overload, was associated with coronary artery disease with p = 1.75 x 10⁻⁷ for a two-marker logistic model. Serine/threonine kinase 32B (STK32B) and disco-interacting protein 2 homolog C (DIP2C) were also statistically significantly associated with coronary artery disease (p = 2.19 x 10⁻¹¹ and 1.18 x 10⁻⁸, respectively); however, there is no known plausible pathophysiologic link between these gene products and cardiovascular disease. In addition, three noncoding regions were found to be associated with coronary artery disease, supporting the known association with chromosome 9, and identifying a region relatively close to heat shock protein 90 kDa beta (HSP90B1). HSP90B1 is a highly conserved molecular chaperone protein, and evidence is provided supporting a role in cell differentiation and regulation, as well as an association with nitric oxide synthase in rat myocardial ischemic reperfusion injury²⁶.

Subsequent GWAS have been done, including a meta-analysis from the CARDIoGRAM consortium and an additional GWAS from the Coronary Artery Disease Genetics Consortium, which have implicated other SNPs as associated with coronary artery disease²⁷⁻²⁸. The transatlantic Coronary ARtery DIsease Genome wide Replication and Meta-analysis (CARDIoGRAM) consortium undertook a large meta-analysis of fourteen GWAS of coronary artery disease, including 22,233 cases with coronary artery disease and 64,762 controls of European descent, followed by genotyping the top hits in 56,682 subjects, revealing thirteen new loci associated with coronary artery disease and further supporting 10 loci previously identified in the literature as associated with coronary artery disease [Table 1]. This large meta-analysis strengthened the majority of previously reported SNP associations, but also identified a large number of new associations, with the majority of the risk loci appearing to act via novel mechanisms leading to increased risk of coronary artery disease; a large portion were also associated with other disease traits²⁷.

In addition to identifying SNPs associated with coronary artery disease, the CARDIoGRAM consortium attempted to explore the possible mechanisms by which the newly identified regions impart their increased risk. In line with prior GWAS of coronary artery disease, only three of the thirteen newly identified SNPs were associated with traditional risk factors. The regions on chromosomes 11 and 9 were associated with dyslipidemia, with rs964184 associated with increased levels of LDL and decreased levels of HDL, and rs579459 within the ABO gene associated with increased LDL levels. The locus on chromosome 10, rs12413409 within the CYP17A1-CNNM2-NT5C2 gene region, was associated with hypertension. Four additional loci were found to result in non-synonymous coding changes or were in linkage disequilibrium with other SNPs which resulted in coding changes of ZC3HC1 on , ADAMTS7 on chromosome 15, HHIPL1 on and GIP on chromosome 17. Three loci were found to result in changes in gene expression of TCF21 on chromosome 6, and RASD1-SMCR3-PEMT and UBE2Z on chromosome 17. Lastly, five of the new loci have been implicated in multiple other disease traits²⁷.


The Coronary Artery Disease Genetics Consortium combined a cohort of Europeans with a cohort of South Asians and completed a meta-analysis identifying additional new loci associated with coronary artery disease²⁸. This analysis evaluated 8,424 Europeans and 6,996 South Asians against 15,062 controls in a meta-analysis of four large GWAS with replication in an independent sample of 21,408 cases and 19,185 controls, revealing five loci: LIPA on chromosome 10, PDGFD on chromosome 11, ADAMTS7-MORF4L1 on chromosome 15, a gene rich region on chromosome 7 and KIAA1462 on chromosome 10. The SNP rs1412444 is in an intron of LIPA (lysosomal acid lipase gene), and has also been associated with increased expression levels of LIPA mRNA in monocytes²⁹; of note, both the LIPA association and the KIAA1462 association had been identified in roughly parallel publications²²⁻²³. Genetic mutations within LIPA can result in a Mendelian disorder with hyperlipidemia and increased atherosclerosis²⁹, but it is not clear how this SNP results in an increased risk of coronary artery disease. The SNP rs974819 was in linkage disequilibrium with the platelet-derived growth factor D (PDGFD) gene, and this analysis also revealed SNP associated mRNA expression differences in the PDGFD gene. PDGFD may play a role in atherosclerosis, as it has been shown to stimulate matrix metalloproteinase activity and influence monocyte migration, important inciting events in the development of atherosclerotic plaques³⁰.

As technology for accurately performing GWAS became widely available, massive amounts of genetic data have accumulated in the literature, as outlined above in an abridged cardiovascular GWAS review. With this accumulation of raw genetic data at a pace far exceeding that of the pre-GWAS era, there have been evolving statistical questions and analytical concerns with regard to how to interpret this ocean of new information. One of the most debated concerns regarding GWAS is the issue of multiple testing. Multiple testing has been known to potentially lead to false positive results, but until the era of the GWAS very few studies had come close to analyzing 500,000 to 1 million hypotheses, which is now not uncommon with GWAS. There is continued debate as to what can be declared a significant result, and for this reason, it becomes increasingly important to have very large sample sizes and to replicate previously identified associations, especially in samples representative of the population. The NCI-NHGRI working group has extensively published on the importance of replication of GWAS data³¹.
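The scale of the multiple testing problem can be made concrete with a Bonferroni correction. The sketch below is illustrative only: it assumes a family-wise error rate of 0.05, the function names are hypothetical, and it is not presented as the procedure used by any of the cited studies.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that holds the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of false positives if every null hypothesis is true
    and each test is judged at the uncorrected level alpha."""
    return alpha * n_tests


# The 500,000 to 1 million hypotheses mentioned in the text.
for n_tests in (500_000, 1_000_000):
    cutoff = bonferroni_threshold(0.05, n_tests)
    false_hits = expected_false_positives(0.05, n_tests)
    print(f"{n_tests:>9,} tests: cutoff {cutoff:.1e}, "
          f"about {false_hits:,.0f} false positives expected without correction")
```

With one million tests the corrected cutoff is 0.05 divided by 1,000,000, which coincides with the genome-wide significance level of 5 x 10⁻⁸ quoted earlier in this chapter.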

In addition to the problems with interpretation of GWAS results in the setting of multiple testing, the majority of data supporting these SNP-trait associations are derived from predominately Caucasian populations and not all have been replicated in independent cohorts.

There have been a few studies identifying novel SNPs in other ethnic groups, specifically in populations of Japanese and Chinese descent. Aoki et al. identified a novel SNP, located on chromosome 5, that was not previously reported in any of the large GWAS of European and Caucasian descent. This locus on chromosome 5 was associated with myocardial infarction with a combined p value of 5.3 x 10⁻¹³ and an OR of 0.80 [95% CI, 0.75 to 0.85], suggesting a protective genetic factor for myocardial infarction in this Japanese population³². In addition, Wang et al. performed a GWAS in the Chinese Han population, identifying a novel region on chromosome 6 within the C6orf105 gene associated with coronary artery disease with a combined p value of 4.87 x 10⁻¹² and OR 1.51 [95% CI, 1.34 to 1.70]. The minor risk allele of this SNP was also found to be associated with decreased mRNA expression of the C6orf105 gene³³.

Gene Expression

Genome-wide association studies have contributed to closing the knowledge gap termed above as the missing heritability. Unfortunately, GWAS have only been able to explain a fraction of the risk for disease, with the average single nucleotide polymorphism (SNP) association having an odds ratio of 1.2 for complex diseases. In addition, GWAS are limited to evaluation of the fixed DNA sequence and do not capture the fluidity of genetic regulation, tissue specific gene expression and the environment.

Using GWAS to identify SNPs associated with cardiovascular disease has allowed researchers to scan the genome and identify regions of interest; however, this approach may not provide direct information on how these risk loci contribute to disease processes. In an attempt to elucidate more functional genetic information, analyzing gene expression data and identifying the underlying transcriptome may prove more logical. Analyzing the genetic code, whether via GWAS or other direct assessment, may provide prognostic information; however, analysis of dynamic genetic activity, such as mRNA expression, may provide insight into prognosis, disease severity, response to specific interventions, and monitoring of disease activity. Within the relatively novel field of transcriptomics, differing gene expression data have been utilized in an attempt to further understand the pathophysiology of cardiovascular disease.

With cardiovascular tissues not readily available or routinely sampled in diseased subjects, there has been a focus on utilizing peripheral blood, which is easily obtained, for expression analysis. In this direction, Taurino et al. used whole blood to identify differing expression profiles in subjects with CAD compared to controls, and also analyzed changes in expression after participation of subjects in cardiopulmonary rehabilitation³⁴. Between CAD subjects and controls, 365 genes were differentially expressed, and through biological pathway analysis, expression of NDUFB3 [NADH dehydrogenase (ubiquinone) 1B subcomplex 3], UQCRQ (ubiquinol-cytochrome c reductase, complex III subunit VII), and ATP5I (ATP synthase, H+ transporting, mitochondrial F0 complex, subunit E) was greater in subjects with CAD. In addition, completion of cardiopulmonary rehabilitation was associated with a downregulation of genes in these pathways. Further, micro-RNA (miRNA) expression profiling revealed that hsa-miR-140-3p and hsa-miR-182 were differentially expressed, and gene expression of the above miRNA target genes was reduced. These gene pathways are involved in mitochondrial dysfunction and oxidative phosphorylation, and mitochondrial dysfunction, particularly generation of ROS by mitochondria, has been implicated in oxidative stress and cardiovascular disease³⁵.

In an additional study analyzing peripheral whole blood, Sinnaeve et al. evaluated gene expression data on 120 cases, defined on the basis of their coronary artery disease index (CAD-Index), and 121 controls without angiographic evidence of coronary artery stenosis36. In the univariate analysis, 160 genes were found to correlate with CAD-Index, and in the multivariate ANOVA 19 probe sets were statistically significant. In addition, these differential gene expression patterns accurately separated subjects on the basis of the severity of aortic atherosclerosis. Most of the genes identified are involved in bone marrow cell differentiation, cell growth or arrest, cell adhesion and matrix modulation, and inflammatory and immune response, all processes that have been implicated in the pathophysiology of atherosclerosis.

It is widely accepted that the pathophysiology of atherosclerosis involves, at least in part, a systemic inflammatory component driven by monocytes/macrophages and T cells. Additional studies of expression profiles in cardiovascular disease have therefore isolated monocytes and performed similar expression analyses. Wingrove et al. performed a stepwise analysis to identify expression profiles and elucidate genes that distinguish between subjects with angiographically confirmed CAD and control subjects using isolated monocytes from peripheral blood samples37. Genes were identified via microarray and replicated using RT-PCR in the Siegburg cohort, and after multivariate analysis 14 genes were statistically significant. These genes were then tested within the CATHGEN cohort, which confirmed 11 of the 14. A gene expression score created by summing the expression levels of the fourteen genes was proportional to the severity of coronary artery disease. The biological pathways incorporating these 11 genes involved oxidation, extracellular matrix, cell motility proteins, signaling receptors and transcription factors.

Several other studies utilizing peripheral leukocytes have identified altered gene expression that may have a role in the atherosclerotic process. Lysosome-associated membrane protein-2 (LAMP-2) expression was found to be increased in peripheral blood leukocytes of coronary artery disease patients38. LAMP-2 is a protein found in the lysosomal membrane and is required for appropriate lysosomal fusion with autophagosomes and phagosomes, lysosomal mobility, and chaperone-mediated autophagy38. In a separate analysis, microtubule-associated protein 1 light chain 3 (LC3) expression was decreased in coronary artery disease subjects. LC3 is involved in autophagosome formation, suggesting autophagy in peripheral blood leukocytes may be involved in the pathophysiology of atherosclerosis39.

The above gene expression cardiovascular research provides evidence that expression patterns may differ among cardiovascular disease patients and controls. As SNPs provide markers for possible genetic regions of interest, and with the knowledge that expression patterns probably differ in subjects with disease phenotypes compared to control subjects, it may be plausible that these SNPs play an undetermined role in mRNA expression.

Summary

It has been well established that an individual’s genetic architecture plays an integral role in determining which patients develop clinically significant coronary artery disease. Recently, GWAS have identified SNPs that are associated with the development of coronary artery disease, and in most cases have identified novel regions of interest. The mechanisms by which these SNPs lead to an increased risk of CAD remain largely unknown. The goals of this thesis are to explore the associations of single nucleotide polymorphisms with coronary artery disease in more ethnically diverse populations, and to investigate one possible mechanism by which these static markers may alter gene expression.

Specific Aims

• Primary Specific Aim 1: To replicate previously reported SNPs associated with myocardial infarction or coronary heart disease events in the literature with incident All CHD events in the Multi-Ethnic Study of Atherosclerosis (MESA) cohort [Phase 1].

• Secondary Specific Aim 1: Further support the above findings through replication within the Health, Aging and Body Composition (Health ABC) cohort and perform a meta-analysis of these results [Phase 2].

• Hypothesis: One or more previously reported SNPs associated with MI or CHD events will also be associated with incident All CHD events in the MESA cohort, and these findings will be strengthened by replication within another large independent cohort.

• Primary Specific Aim 2: Determine the association between SNPs associated with coronary heart disease and mRNA expression levels in the MESA cohort.

• Hypothesis 2: One or more SNPs previously associated with coronary heart disease will be associated with differing mRNA expression levels, suggesting these SNPs affect gene expression, which may be one mechanism of their increased risk of coronary heart disease.

This research is important because there remains a large void of knowledge regarding which regions of the genome are associated with an increased risk of developing coronary artery disease, how these regions differ across ethnicities, and how the static genetic code leads to changes in disease phenotype. Specifically, an improved understanding of the gene regions responsible for an increase in coronary artery disease risk may help elucidate key genetic pathways associated with the pathogenesis of one of the largest burdens on global health, and illuminate potential targets that could be exploited by novel therapeutics.

Chapter 2

Replication of the Association of PSRC1 and MTHFD1L with Coronary Artery Disease: The Multi-Ethnic Study of Atherosclerosis

Matthew B. Sellers, MD1, Yongmei Liu, MD, PhD2, Gregory Burke, MD, MSc3, Jasmin Divers, PhD4, Pamela Ouyang, MBBS5, Walter Palmas, MD6, Wendy Post, MD5, Kurt Lohman, PhD4, David Herrington, MD, MHS1

1Department of Internal Medicine-Cardiology and 2Department of Epidemiology and Prevention and 3Division of Public Health Sciences and 4Department of Biostatistical Sciences, Wake Forest School of Medicine; 5Department of Internal Medicine -Cardiology, Johns Hopkins University; 6Department of Internal Medicine, Columbia University

Abstract

Introduction: GWAS have identified thirty single nucleotide polymorphisms (SNPs) associated with coronary heart disease (CHD) with genome-wide statistical significance to date; however, the majority of these analyses are in subjects of Caucasian ancestry. We analyzed these previously reported SNPs associated with CHD events within four separate ethnic groups in the Multi-Ethnic Study of Atherosclerosis (MESA). Associations were validated in the Health, Aging and Body Composition (Health ABC) cohort.

Methods: A Cox proportional hazards model assuming an additive genetic effect, adjusted for age, gender, study site, and genetic ancestral principal components, was fit to 2,528 Caucasians, 1,673 African Americans, 1,449 Hispanics, and 775 Chinese subjects with 263 total events. SNPs with nominal statistical significance (p < 0.05) in Caucasians and African Americans were further validated using the Health, Aging and Body Composition (Health ABC) cohort consisting of an additional 2,800 subjects (1661 Caucasians and 1139 African Americans with 954 total events).

Results: Among the three SNPs within the Caucasian group with nominal significance, rs599839 of the PSRC1 gene on chromosome 1 (p = 0.01, HR 1.45) replicated within the Health ABC Caucasian cohort (p = 0.009, HR 1.25). Among the five SNPs within the African American group with nominal significance, rs6922269 of the MTHFD1L gene on chromosome 6 (p = 0.03, HR 1.52) replicated within the Health ABC African American cohort (p = 0.016, HR 1.19).

Conclusion: Two of the 30 previously published SNP associations with coronary artery disease, rs599839 within the PSRC1 gene on chromosome 1 and rs6922269 within the MTHFD1L gene on chromosome 6, revealed nominal significance in the MESA cohort and these findings were replicated within the Health ABC cohort.

Introduction

Cardiovascular disease continues to be one of the leading causes of death in the United States and other industrialized countries. The American Heart Association estimates that 16.8 million individuals have ischemic heart disease and 1 in 3 American adults have cardiovascular disease. With the aging population in the United States, these numbers are likely only to increase.

Potentially modifiable risk factors, such as hypertension, diabetes, abdominal obesity, smoking, alcohol use and exercise habits have been well established as contributors that predispose the general population to myocardial infarction regardless of sex or age1. In addition to environmental and lifestyle factors, coronary disease tends to cluster in families suggesting genetic factors play a crucial role in the development of cardiovascular disease, especially in premature coronary heart disease4. Since the sequencing of the human genome, genome-wide association studies (GWAS) are trying to characterize the role of genetics and potentially further illustrate the pathophysiology of coronary artery disease, as well as provide additional data in risk stratification and preventive cardiology.

Several GWAS have identified single nucleotide polymorphisms (SNPs) associated with coronary artery disease, and the most replicated region identified is a 100kb region on chromosome 9 (9p21)7-11. This locus has been strengthened by replication in Caucasians, Japanese, Hispanics and East Asians7, 13-16, and 9p21 remains the most well replicated and validated SNP association with coronary artery disease.

Since this sentinel cardiovascular GWAS, other GWAS publications have implicated additional SNPs as possibly being related to CHD events, with 27 of the 30 SNPs implicating specific genes [Table 1]. The Wellcome Trust Case Control Consortium (WTCCC) replicated the association on chromosome 9, and identified additional SNPs on chromosomes 1, 16 and 22 using 2,000 cases with a history of myocardial infarction or coronary revascularization before the age of 66, identified by records and/or physician reports, against 3,000 controls7. Samani et al.11 used the WTCCC cohort of close to 2,000 subjects and replicated these findings in the German MI cohort, composed of patients with a history of myocardial infarction before the age of 60 and at least one first degree relative with premature coronary artery disease. Using this approach, they were able to identify new loci on chromosomes 1, 2, 6, 10, and 15. Kathiresan et al.20 evaluated associations with early myocardial infarction identified by autopsy or clinical presentation in close to 3,000 subjects and replicated these findings in close to 20,000 controls, identifying new loci on chromosomes 1, 2, 6, 19, and 21. Subsequent replications, including a meta-analysis, have implicated other SNPs as associated with coronary artery disease21, 27.

However, the majority of data supporting these SNP-trait associations are derived from predominantly Caucasian populations, and not all have been replicated in independent cohorts.

The primary aim was to support previously reported SNPs associated with MI/CHD events and to examine the variation in the associations across race/ethnicity groups in MESA.

Methods

Study Participants

The Multi-Ethnic Study of Atherosclerosis (MESA) is a prospective observational cohort of subjects free from cardiovascular disease at baseline. The study design and objectives of MESA have been previously described40. Briefly, MESA participants were recruited from six field sites in the United States: Forsyth County, NC; Northern Manhattan/Bronx, NY; Baltimore/Baltimore County, MD; St Paul, MN; Chicago, IL; and Los Angeles County, CA. The MESA cohort comprises 6,814 men and women of diverse ethnic background who were 45–84 years old at the baseline exam and free of clinically overt cardiovascular disease. The cohort was 53% women with a racial/ethnic composition of 38% white, 28% African American, 23% Hispanic, and 11% Asian, primarily of Chinese descent. Genome-wide association study analysis was performed on MESA cohort members who provided DNA samples (n = 6,425; 2528 Whites, 1449 Hispanics, 1673 African-Americans, and 775 Chinese).

The outcome was time to first All CHD event and its individual subcomponents (MI, resuscitated cardiac arrest, definite angina, probable angina (if followed by revascularization), and CHD death), analyzed using a proportional hazards association model. The original GWAS was completed in this cohort, and SNPs previously reported as associated with coronary artery disease were then located within the MESA GWAS results. The SNPs with nominal statistical significance (p < 0.05) in Caucasians and African Americans were further validated using the Health, Aging and Body Composition (Health ABC) cohort.

The Health ABC Study is a longitudinal, prospective study investigating the associations among body composition, weight-related health conditions, and incident functional limitations in 2,800 subjects (1661 Caucasians and 1139 African Americans with 954 total events)29. Outcomes were incident CHD events, defined as hospitalization for MI, angina or elective coronary revascularization. The Health ABC study cohort consists of 3,075 well-functioning black and white men and women aged 70–79 at baseline, with baseline data collection in 1997–1998. Eligibility included reporting no difficulties performing activities of daily living (ADL), walking a quarter of a mile, or climbing 10 steps without resting. In addition, all participants were free of a terminal diagnosis and had no intention to move from the area for at least 3 years. White participants were recruited from a random sample of Medicare beneficiaries in the zip codes in and surrounding Pittsburgh, Pennsylvania, and Memphis, Tennessee. Black participants were recruited from all age-eligible residents of the areas in and surrounding Pittsburgh and Memphis. All study participants provided written informed consent prior to participation.

Genotyping

Participants were typed on the Affymetrix 6.0 SNP array at Affymetrix Research Services Lab. 6880 samples passed initial genotyping QC, as outlined below. An additional 1738 African American samples genotyped at the Broad Institute as part of the CARe project passed genotyping QC. Affymetrix performed plate-based genotype calling using Birdseed v2. Sample QC was based on call rates and contrast QC (cQC) statistics. The Broad Institute performed similar QC for CARe samples. Additional sample and SNP QC was carried out at University of Virginia, including sample call rate, sample cQC, and sample heterozygosity by race at the sample level as well as outlier plate checking by call rate, median cQC or heterozygosity at plate level. Four samples were removed due to low call rate (<95%). Plate-based heterozygosity check found no evidence of contamination, so all plates that passed other QC metrics were retained. Cryptic sample duplicates based on IBD/IBS were dropped. We excluded monomorphic SNPs across all samples; SNPs with missing rate > 5% or observed heterozygosity > 53% were also excluded.
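The SNP-level exclusion rules above can be illustrated with a minimal sketch. The record layout (`SnpStats`), the function name `passes_qc`, and the example SNP identifiers are hypothetical and are not taken from the MESA pipeline; only the three thresholds (monomorphic, missing rate > 5%, observed heterozygosity > 53%) come from the text.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Per-SNP summary statistics (hypothetical layout for illustration)."""
    snp_id: str
    missing_rate: float      # fraction of samples with no genotype call
    heterozygosity: float    # observed fraction of heterozygous calls
    minor_allele_count: int  # 0 means monomorphic across all samples


def passes_qc(s: SnpStats, max_missing: float = 0.05, max_het: float = 0.53) -> bool:
    """Apply the three SNP exclusion rules described in the text."""
    if s.minor_allele_count == 0:      # monomorphic SNP
        return False
    if s.missing_rate > max_missing:   # missing rate > 5%
        return False
    if s.heterozygosity > max_het:     # observed heterozygosity > 53%
        return False
    return True


# Made-up example records, one per exclusion rule plus one that passes.
snps = [
    SnpStats("snp_a", 0.01, 0.42, 310),
    SnpStats("snp_b", 0.08, 0.40, 250),  # too many missing calls
    SnpStats("snp_c", 0.02, 0.61, 400),  # excess heterozygosity
    SnpStats("snp_d", 0.00, 0.00, 0),    # monomorphic
]
kept = [s.snp_id for s in snps if passes_qc(s)]
print(kept)
```

The comparisons are strict, so a SNP sitting exactly at a threshold is retained, matching the ">" wording used in the text.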

Additional genotypes were imputed separately in each ethnic group using the program IMPUTE2. Prior to imputation, SNPs that were recommended for exclusion were dropped, and therefore imputed if they were in the HM1+2 reference panel. Additionally, IMPUTE dropped monomorphic SNPs within each ethnic group. HapMap CEU was used as the reference population for the CAU sample, while the HapMap I + II CEU+YRI+CHB+JPT panel (rel#22, NCBI Build 36, dbSNP b126) was used as the reference population for the non-Caucasian groups.

Figure 1. Manhattan Plot representing the MESA GWAS for all ethnic groups with the Q-Q plot of the data in the bottom right corner.

Calculation of ethnic-specific PCs

Principal components (PCs) of ancestry were computed for each ethnic group using the program SMARTPCA, which is distributed with EIGENSTRAT41-42. The PC analysis was performed using SNPs selected for minimal linkage disequilibrium (LD) within each of the four ethnic groups. Outliers identified by 5 iterations using 10 sigma thresholds (113 CAU, 65 CHN, 21 AA, and 75 HIS) were removed.
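The iterative outlier rule described above (up to 5 rounds, 10 sigma threshold) can be sketched as follows. This is a simplified illustration rather than the SMARTPCA implementation: it assumes the PC scores are already computed and held fixed, whereas the actual program recomputes the principal components after each round of removal. The function name `remove_pc_outliers` and the input layout are hypothetical.

```python
from statistics import mean, pstdev


def remove_pc_outliers(scores, n_sigma=10.0, max_iter=5):
    """Iteratively drop samples lying more than n_sigma SDs from the mean on any PC.

    scores maps a sample ID to its list of PC scores (hypothetical layout).
    Returns the retained samples and the list of removed sample IDs.
    """
    kept = dict(scores)
    removed = []
    for _ in range(max_iter):
        if len(kept) < 2:
            break
        n_pcs = len(next(iter(kept.values())))
        flagged = set()
        for k in range(n_pcs):
            values = [pcs[k] for pcs in kept.values()]
            mu, sd = mean(values), pstdev(values)
            if sd == 0:
                continue  # no spread on this PC, nothing to flag
            flagged.update(
                sid for sid, pcs in kept.items() if abs(pcs[k] - mu) > n_sigma * sd
            )
        if not flagged:
            break  # nothing left beyond the threshold
        for sid in sorted(flagged):
            removed.append(sid)
            del kept[sid]
    return kept, removed


# Made-up data: 200 samples tightly clustered on one PC plus one extreme sample.
scores = {f"s{i}": [1.0 if i % 2 else -1.0] for i in range(200)}
scores["outlier"] = [1000.0]
kept, removed = remove_pc_outliers(scores)
print(removed)
```

Repeating the check matters because an extreme sample inflates the standard deviation and can mask less extreme outliers until it has been removed.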

Statistical Analysis

A Cox proportional hazards model assuming an additive genetic effect, adjusted for age, gender, study site, and genetic ancestral principal components, was fit to 2,528 Caucasians, 1,673 African Americans, 1,449 Hispanics, and 775 Chinese subjects with 263 total events [Figure 1]. The two SNPs identified with nominal statistical significance (p < 0.05) in Caucasians and African Americans were further validated using the Health, Aging and Body Composition (Health ABC) cohort consisting of an additional 2,800 subjects (1661 Caucasians and 1139 African Americans with 954 total events). As only 2 SNPs were tested within the Health ABC cohort, statistical significance was defined using a Bonferroni correction of 0.05/2 (p < 0.025).
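Two pieces of this analysis plan lend themselves to a short illustration: the additive genetic coding (each subject contributes 0, 1, or 2 copies of the coded allele as the model covariate) and the Bonferroni threshold for the two SNPs tested in Health ABC. The sketch below is illustrative only; the function names and the two-character genotype strings are assumptions, and the Cox model itself is not fitted here. The two p values checked at the end are the Health ABC replication p values reported in the Results.

```python
def additive_dosage(genotype: str, coded_allele: str) -> int:
    """Count copies (0, 1, or 2) of the coded allele in a two-character genotype."""
    if len(genotype) != 2:
        raise ValueError("expected a biallelic genotype such as 'AG'")
    return sum(1 for allele in genotype if allele == coded_allele)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Additive coding for a SNP whose coded allele is G (hypothetical genotypes).
dosages = [additive_dosage(g, "G") for g in ("AA", "AG", "GG")]
print(dosages)

# Two SNPs were carried forward to Health ABC, so the threshold is 0.05 / 2.
threshold = bonferroni_threshold(0.05, 2)
print(threshold)

# Health ABC replication p values reported in the Results.
replication_p = {"rs599839": 0.009, "rs6922269": 0.016}
significant = {snp: p < threshold for snp, p in replication_p.items()}
print(significant)
```

Under the additive model the fitted coefficient is the change in log hazard per additional copy of the coded allele.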

All previously reported SNPs identified by literature review as associated with coronary artery disease with genome-wide significance (defined as p value < 5 x 10⁻⁶) were identified within the MESA GWAS. Any SNP with nominal significance, defined as p < 0.05, was selected for further replication in Health ABC. A meta-analysis was performed on all SNPs remaining after phase 1 (initial MESA GWAS) and phase 2 (replication within Health ABC) using METAL software (Goncalo Abecasis and Cristen Willer, 2007)43.
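As an illustration of how per-cohort estimates can be combined, the sketch below implements a fixed-effect inverse-variance weighted meta-analysis, which is one of the weighting schemes METAL offers. Which scheme was used for this thesis is not stated in the text, so the choice here is an assumption, and the input effect sizes and standard errors are made-up values rather than the MESA or Health ABC estimates.

```python
import math


def normal_two_sided_p(z: float) -> float:
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def inverse_variance_meta(estimates):
    """Fixed-effect meta-analysis of (beta, standard_error) pairs.

    Each study is weighted by 1 / SE^2; returns the pooled beta, its
    standard error, and the two-sided p value.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, normal_two_sided_p(beta / se)


# Hypothetical log hazard ratios and standard errors from two cohorts.
pooled_beta, pooled_se, pooled_p = inverse_variance_meta([(0.30, 0.12), (0.20, 0.08)])
print(round(pooled_beta, 3), round(pooled_se, 3), pooled_p)
```

Because the weights are inverse variances, the more precisely estimated cohort dominates the pooled estimate, and the pooled standard error is smaller than either input.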

Results

At the time of the GWAS, there were a total of 263 events in the MESA cohort, with 118 events within the Caucasian group, 64 events in the African American group, 59 events in the Hispanic group and 22 in the Chinese group. Genome-wide association analysis was performed on the MESA subjects stratified by ethnicity. As expected given the small sample size and low power, no SNP within the original GWAS reached genome-wide significance, defined as p < 5 x 10⁻⁸. As specified a priori, single nucleotide polymorphisms previously reported in the literature as associated with coronary artery disease with genome-wide significance were located within the MESA GWAS results for each ethnic group [Table 1].

Table 1. Replication of previously published SNPs with genome-wide significance within the 4 ethnic groups of the MESA GWAS. Columns "Prev. P value" through "OR [CI]" are the previously reported values; the remaining columns are MESA results. Events/subjects by group: CAU 118/2528, AFA 64/679, HIS 59/1449, CHN 22/775. AF = allele frequency.

| SNP | Chr | Position | Gene | Prev. P value | Author | Prev. allele | Prev. AF | OR [CI] | MESA coded allele | CAU AF | CAU P | CAU Beta | AFA AF | AFA P | AFA Beta | HIS AF | HIS P | HIS Beta | CHN AF | CHN P | CHN Beta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs17672135 | 1 | 238,512,219 | intergenic | 2.4 x 10⁻⁶ | WTCCC | T | 0.87 | 1.43 [1.23-1.64] | T | 0.89 | 0.56 | 0.13 | 0.91 | 0.36 | -0.25 | 0.94 | 0.56 | -0.23 | 0.89 | 0.12 | -0.65 |
| rs599839 | 1 | 109,623,689 | PSRC1 | 4.0 x 10⁻⁹ | Samani | A | 0.77 | 1.29 [1.18-1.4] | G | 0.21 | 0.01 | 0.37 | 0.71 | 0.03 | -0.42 | 0.30 | 0.41 | 0.16 | 0.08 | 0.19 | 0.56 |
| rs17114036 | 1 | 56,735,409 | PPAP2B | 3.8 x 10⁻¹⁹ | Schunkert | A | 0.91 | 1.17 [1.13-1.22] | A | 0.88 | 0.70 | -0.09 | 0.83 | 0.43 | -0.23 | 0.91 | 0.49 | -0.30 | 0.96 | 0.80 | -0.26 |
| rs17465637 | 1 | 220,890,152 | MIA3 | 1.0 x 10⁻⁶ | Samani | C | 0.71 | 1.20 [1.12-1.3] | C | 0.71 | 0.67 | 0.06 | 0.28 | 0.92 | 0.02 | 0.47 | 0.53 | 0.12 | 0.60 | 0.15 | 0.49 |
| rs11206510 | 1 | 55,268,627 | PCSK9 | 9.6 x 10⁻⁹ | Kathiresan | T | 0.81 | 1.15 [1.1-1.21] | T | 0.82 | 0.80 | -0.04 | 0.87 | 0.34 | -0.24 | 0.88 | 0.97 | 0.01 | 0.96 | 0.10 | 17.32 |
| rs2943634 | 2 | 226,776,324 | pseudogene | 2.0 x 10⁻⁷ | Samani | C | 0.65 | 1.21 [1.13-1.30] | C | 0.67 | 0.14 | 0.21 | 0.44 | 0.53 | 0.12 | 0.73 | 0.64 | 0.11 | 0.92 | 0.87 | -0.09 |
| rs6725887 | 2 | 203,454,130 | WDR12 | 1.3 x 10⁻⁸ | Kathiresan | C | 0.14 | 1.17 [1.11-1.23] | T | 0.88 | 0.73 | 0.07 | 0.97 | 0.17 | 1.00 | 0.92 | 0.48 | 0.27 | 0.98 | 0.99 | 0.03 |
| rs9818870 | 3 | 139,604,812 | MRAS | 7.4 x 10⁻¹³ | Erdmann | T | 0.15 | 1.15 [1.11-1.19] | T | 0.15 | 0.74 | -0.06 | 0.09 | 0.43 | 0.23 | 0.08 | 0.32 | 0.28 | 0.01 | 0.10 | -15.15 |
| rs17609940 | 6 | 35,142,778 | ANKS1A | 1.4 x 10⁻⁸ | Schunkert | G | 0.75 | 1.07 [1.05-1.10] | C | 0.17 | 0.37 | 0.16 | 0.05 | 0.01 | -0.81 | 0.15 | 0.33 | 0.27 | 0.01 | 0.99 | 16.29 |
| rs12190287 | 6 | 134,256,218 | TCF21 | 1.1 x 10⁻¹² | Schunkert | C | 0.62 | 1.08 [1.06-1.10] | C | 0.63 | 0.61 | -0.07 | 0.89 | 0.22 | -0.41 | 0.63 | 0.50 | -0.13 | 0.58 | 0.73 | -0.11 |
| rs6922269 | 6 | 151,294,678 | MTHFD1L | 3.0 x 10⁻⁸ | Samani | A | 0.25 | 1.23 [1.15-1.33] | G | 0.74 | 0.71 | -0.06 | 0.46 | 0.03 | 0.42 | 0.63 | 0.65 | -0.08 | 0.98 | 0.01 | -1.45 |
| rs12526453 | 6 | 13,035,530 | PHACTR1 | 1.3 x 10⁻⁹ | Kathiresan | C | 0.65 | 1.12 [1.08-1.17] | G | 0.35 | 0.49 | 0.09 | 0.16 | 0.33 | -0.25 | 0.33 | 0.11 | -0.34 | 0.90 | 0.10 | -15.11 |
| rs11556924 | 7 | 129,450,732 | ZC3HC1 | 9.2 x 10⁻¹⁸ | Schunkert | C | 0.62 | 1.09 [1.07-1.12] | C | 0.61 | 0.52 | 0.08 | 0.91 | 0.98 | 0.01 | 0.81 | 0.42 | 0.19 | 0.95 | 0.99 | -18.20 |
| rs579459 | 9 | 135,143,989 | ABO | 4.1 x 10⁻¹⁴ | Schunkert | C | 0.21 | 1.10 [1.07-1.13] | C | 0.22 | 0.13 | -0.22 | 0.15 | 0.94 | -0.02 | 0.17 | 0.43 | 0.23 | 0.20 | 0.65 | 0.18 |
| rs1333049 | 9 | 22,086,055 | CDKN2A, CDKN2B | 1.8 x 10⁻¹⁴ | WTCCC | C | 0.47 | 1.47 [1.27-1.70] | G | 0.51 | 0.44 | -0.10 | 0.74 | 0.11 | -0.33 | 0.56 | 0.53 | -0.12 | 0.51 | 0.78 | -0.08 |
| rs12413409 | 10 | 104,709,086 | CYP17A1, CNNM2, NT5C2 | 1.0 x 10⁻⁹ | Schunkert | G | 0.89 | 1.12 [1.08-1.16] | A | 0.10 | 0.56 | -0.12 | 0.06 | 0.43 | 0.33 | 0.17 | 0.63 | -0.12 | 0.30 | 0.86 | 0.06 |
| rs501120 | 10 | 44,073,873 | CXCL12 | 9.0 x 10⁻⁸ | Samani | T | 0.87 | 1.33 [1.20-1.48] | T | 0.85 | 0.20 | 0.26 | 0.58 | 0.44 | -0.15 | 0.73 | 0.84 | 0.04 | 0.66 | 0.22 | -0.36 |
| rs964184 | 11 | 116,154,127 | ZNF259, APOA5-A4-C3-A1 | 1.0 x 10⁻¹⁷ | Schunkert | G | 0.13 | 1.13 [1.10-1.16] | C | 0.86 | 0.14 | 0.26 | 0.79 | 0.81 | 0.05 | 0.71 | 0.35 | 0.19 | 0.77 | 0.96 | -0.02 |
| rs2259816 | 12 | 119,919,970 | HNF1A-C12orf43 | 4.8 x 10⁻⁷ | Erdmann | T | 0.36 | 1.08 [1.05-1.11] | T | 0.38 | 0.91 | 0.02 | 0.15 | 0.55 | -0.16 | 0.36 | 0.76 | -0.06 | 0.45 | 0.36 | 0.28 |
| rs4773144 | 13 | 109,758,713 | COL4A1, COL4A2 | 3.8 x 10⁻⁹ | Schunkert | G | 0.44 | 1.07 [1.05-1.09] | A | 0.56 | 0.91 | -0.01 | 0.60 | 0.33 | -0.19 | 0.58 | 0.57 | -0.11 | 0.60 | 0.69 | -0.12 |
| rs2895811 | 14 | 99,203,695 | HHIPL1 | 1.1 x 10⁻¹⁰ | Schunkert | C | 0.43 | 1.07 [1.05-1.10] | C | 0.41 | 0.26 | 0.15 | 0.25 | 0.008 | -0.58 | 0.30 | 0.18 | -0.28 | 0.23 | 0.04 | -0.64 |
| rs3825807 | 15 | 76,876,166 | ADAMTS7 | 1.1 x 10⁻¹² | Schunkert | A | 0.57 | 1.08 [1.06-1.10] | A | 0.57 | 0.11 | -0.22 | 0.84 | 0.92 | 0.03 | 0.74 | 0.22 | -0.30 | 0.87 | 0.06 | 0.76 |
| rs17228212 | 15 | 65,245,693 | SMAD3 | 2.0 x 10⁻⁷ | Samani | C | 0.30 | 1.21 [1.13-1.30] | T | 0.73 | 0.65 | 0.07 | 0.88 | 0.81 | 0.07 | 0.85 | 0.15 | -0.33 | 1.00 | 0.10 | 14.67 |
| rs8055236 | 16 | 81,769,899 | intergenic | 5.6 x 10⁻⁶ | WTCCC | G | 0.80 | 1.91 [1.33-2.74] | T | 0.19 | 0.86 | 0.03 | 0.50 | 0.03 | -0.43 | 0.20 | 0.03 | 0.50 | 0.10 | 0.65 | -0.24 |
| rs12936587 | 17 | 17,484,447 | RASD1, SMCR3, PEMT | 4.45 x 10⁻¹⁰ | Schunkert | G | 0.56 | 1.07 [1.05-1.09] | A | 0.45 | 0.03 | 0.29 | 0.29 | 0.40 | -0.17 | 0.35 | 0.73 | -0.07 | 0.13 | 0.63 | -0.21 |
| rs216172 | 17 | 2,073,254 | SMG6, SRR | 1.15 x 10⁻⁹ | Schunkert | C | 0.37 | 1.07 [1.05-1.09] | C | 0.36 | 0.12 | 0.22 | 0.37 | 0.90 | 0.02 | 0.34 | 0.72 | 0.07 | 0.26 | 0.59 | 0.19 |
| rs46522 | 17 | 44,343,596 | UBE2Z, GIP, ATP5G1, SNF8 | 1.81 x 10⁻⁸ | Schunkert | T | 0.53 | 1.06 [1.04-1.08] | C | 0.49 | 0.15 | 0.19 | 0.84 | 0.10 | 0.38 | 0.59 | 0.77 | -0.05 | 0.31 | 0.22 | 0.44 |
| rs1122608 | 19 | 11,024,601 | LDLR | 1.9 x 10⁻⁹ | Kathiresan | G | 0.75 | 1.15 [1.10-1.20] | T | 0.34 | 0.05 | -0.33 | 0.06 | 0.18 | -0.57 | 0.14 | 0.08 | 0.39 | 0.10 | 0.93 | -0.05 |
| rs9982601 | 21 | 34,520,998 | SLC5A3-MRPS6-KCNE2 | 6.4 x 10⁻¹¹ | Kathiresan | T | 0.13 | 1.20 [1.14-1.27] | T | 0.13 | 0.46 | -0.15 | 0.19 | 0.82 | -0.07 | 0.10 | 0.30 | -0.43 | 0.90 | 0.41 | -25.34 |
| rs688034 | 22 | 25,019,635 | intergenic | 3.75 x 10⁻⁶ | WTCCC | T | 0.31 | 1.11 [0.99-1.25] | T | 0.34 | 0.65 | 0.06 | 0.13 | 0.25 | 0.28 | 0.16 | 0.55 | 0.15 | 0.90 | 0.10 | -12.73 |

Among the three SNPs within the Caucasian group with nominal significance, defined as a p value of < 0.05, rs599839 of the PSRC1 gene on chromosome 1 (p = 0.01) replicated within the Health ABC Caucasian cohort (p = 0.009). Among the five SNPs within the African American group with nominal significance, rs6922269 of the MTHFD1L gene on chromosome 6 (p = 0.03) replicated within the Health ABC African American cohort (p = 0.016) [Table 2 and Figure 2]. A meta-analysis was performed on each of these SNPs within its ethnic group, as presented in Table 2.

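| SNP | Chr | Gene | MESA ethnicity | MESA p value | MESA HR | Health ABC ethnicity | Health ABC p value | Health ABC HR | Meta-analysis p value |
|---|---|---|---|---|---|---|---|---|---|
| rs599839 | 1 | PSRC1 | Caucasian | 0.01 | 1.45 | Caucasian | 0.009 | 1.25 | 0.0058 |
| rs6922269 | 6 | MTHFD1L | African American | 0.03 | 1.52 | African American | 0.016 | 1.19 | 0.0013 |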

Table 2. Replicated SNP Associations

Figure 2. Hazard ratios with 95% CI of SNP associated with MTHFD1L in African Americans (top) and SNP associated with PSRC1 in Caucasians (bottom)
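The MESA hazard ratios in Table 2 follow from the Beta values in Table 1, since in a Cox model the hazard ratio per copy of the coded allele is the exponential of the regression coefficient. The short check below uses the Table 1 Beta for rs599839 in Caucasians (0.37) and for rs6922269 in African Americans (0.42); the dictionary keys are labels chosen here for illustration.

```python
import math

# Cox regression coefficients (log hazard ratios) taken from Table 1.
betas = {"rs599839 (CAU)": 0.37, "rs6922269 (AFA)": 0.42}

# Hazard ratio per copy of the coded allele: HR = exp(beta).
hazard_ratios = {snp: round(math.exp(beta), 2) for snp, beta in betas.items()}
print(hazard_ratios)
```

Rounded to two decimals, these agree with the MESA hazard ratios of 1.45 and 1.52 reported in Table 2.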

Discussion

The association of 30 SNPs previously documented in the literature as associated with coronary artery disease in predominantly Caucasian cohorts was examined within the previously completed MESA GWAS. Two of these published SNP associations, rs599839 on chromosome 1 within the PSRC1 gene and rs6922269 on chromosome 6 within the MTHFD1L gene, revealed nominal significance within the Caucasian group and African American group, respectively. These associations were replicated within the Health ABC cohort, further validating these findings in an additional large, independent cohort. In addition, a meta-analysis strengthened these results, revealing a combined p value of 0.0058 for rs599839 within the PSRC1 gene in Caucasians and a combined p value of 0.0013 for rs6922269 within the MTHFD1L gene in African Americans.

Samani et al. originally described the association of rs599839 and rs6922269 with coronary artery disease using the Wellcome Trust Case Control Consortium and combining these data with the German MI Family Study to implicate these genetic loci in the risk of coronary artery disease11. Subsequently, rs599839 was one of a group of SNPs further validated in a large scale meta-analysis of multiple GWAS revealing associations with coronary artery disease27.

These results continue to provide much needed replication to confirm GWAS SNP associations and support that rs599839 and rs6922269 are not likely false positive findings. In addition, rs6922269 was originally described in predominantly Caucasian populations, and this provides evidence of support of an association within the African American population.

In addition to the replication studies supporting the association of rs599839 with coronary artery disease, this SNP has been implicated in pathways with plausible pathophysiologic mechanisms related to metabolic profile and atherosclerosis. rs599839 is located in the 3’ untranslated region (UTR) of the proline/serine-rich coiled coil protein (PSRC1) gene. The region of the genome around this SNP has been associated with LDL cholesterol in some recent genome-wide association studies17. These results were further validated in a study investigating the association of rs599839 with traditional cardiovascular risk factors. In this study, rs599839 was associated with higher LDL cholesterol levels per copy of the risk allele in over 2,000 adults (p = 3.84 x 10⁻⁶)18. In addition, liver expression levels of PSRC1, as well as of the genes cadherin EGF LAG seven-pass G-type receptor 2 (CELSR2) and sortilin 1 (SORT1), have been significantly associated with LDL cholesterol levels in mouse models and humans. Combined data suggest that PSRC1, CELSR2 and SORT1 are tightly co-regulated and may operate in a conserved network associated with the metabolic phenotypes of dyslipidemia, obesity, diabetes, and atherosclerosis19.

Despite these promising results, more investigation is needed to clarify how rs599839 increases risk for coronary artery disease, or its high risk metabolic phenotypes, and continued replication is needed to further strengthen these data. Interestingly, rs599839 has limited linkage disequilibrium with other variants within the PSRC1 gene, and has stronger linkage disequilibrium with variants in other adjacent genes. Therefore, rs599839 may not be exerting its effects via the PSRC1 gene, and further functional analysis is needed to determine how this SNP increases risk for coronary artery disease.

One of the strengths of the MESA cohort is the ethnic diversity of subjects, including Caucasian, Chinese, African American, and Hispanic, and the MESA GWAS provides genomic information in ethnic groups with limited research data. However, replication was only possible within the Caucasian and African American cohorts, as the sample size and number of events most likely limited our ability to replicate findings within the Hispanic and Chinese groups.

Conclusion

Two of the 30 previously published SNP associations with coronary artery disease, rs599839 within the PSRC1 gene on chromosome 1 and rs6922269 within the MTHFD1L gene on chromosome 6, revealed nominal significance in the MESA cohort and these findings were replicated within the Health ABC cohort. Together these data from two large independent cohorts provide further support for an association of these SNPs with CHD and justify additional research into the possible pathophysiologic roles of these genes, or genes within linkage disequilibrium to these SNPs.

Chapter 3

Functional Validation of Previously Identified SNPs Associated with Coronary Artery Disease

Matthew B. Sellers, MD1, Yongmei Liu, MD, PhD2, Timothy Howard, PhD3, Gregory Burke, MD, MSc4, Jasmin Divers, PhD5, Pamela Ouyang, MBBS6, Walter Palmas, MD7, Wendy Post, MD6, Kurt Lohman, PhD5, David Herrington, MD, MHS1

1Department of Internal Medicine-Cardiology and 2Department of Epidemiology and Prevention and 3Center for Genomics and Personalized Medicine Research and 4Division of Public Health Sciences and 5Department of Biostatistical Sciences, Wake Forest School of Medicine; 6Department of Internal Medicine -Cardiology, Johns Hopkins University; 7Department of Internal Medicine, Columbia University

Abstract

Introduction: Genome-wide association studies have identified multiple SNPs as statistically significantly associated with coronary artery disease. To provide additional functional validation of these SNPs, we examined their association with expression of ~25,000 gene transcripts in purified monocytes from a subset (N=1264) of participants from the Multi-Ethnic Study of Atherosclerosis (MESA).

Methods: Genome-wide mRNA expression profiles were generated with the Illumina HumanHT-12 BeadChip platform in purified monocytes from 1264 MESA subjects sampled to achieve balance across age, race/ethnicity and sex. Associations of the previously reported SNPs with individual gene transcripts were assessed using generalized linear models adjusting for age, race, sex, study site and technical factors.

Results: Of the 16 previously reported SNPs associated with CAD, 5 were found to be statistically significantly associated with differing mRNA expression levels. Of these 5 SNP associations, rs599839 located near the PSRC1 gene is significantly associated with differing mRNA expression of the PSRC1 gene using 2 separate probes (p = 4.9 x 10⁻²² and p = 2.4 x 10⁻¹⁸).

Conclusion: A subset of the previously reported SNPs associated with CAD is associated with differing gene expression. Of these SNPs, rs599839 is associated with differing mRNA expression of the PSRC1 gene; it lies only several base pairs from the 3’ UTR of PSRC1, suggesting there may be a direct and local effect of this SNP on gene expression.

Introduction

Cardiovascular disease is a complex polygenic disorder that remains a pandemic worldwide. There is a large body of literature trying to identify genetic components that lead to increased risk for cardiovascular disease, most recently and most commonly in the form of genome-wide association studies. It has become apparent, however, that identification of SNPs associated with coronary artery disease has led to new challenges in understanding how these nucleotide changes increase the risk of CAD. When localized, these regions have largely not been associated with genes involved in known pathophysiologic mechanisms of cardiovascular disease. In addition, one of the most replicated and strongly associated regions, on chromosome 9, is intergenic, and the mechanism by which it increases risk for disease remains elusive9-10.

There are roughly 25,000 genes in human DNA, and the importance of gene regulation, transcriptomics, proteomics, and the fluidity of the human genome is increasingly emphasized. The use of peripheral whole blood or isolated monocytes has been a promising tool for analysis of gene expression and cardiovascular disease [36-38, 44-45]. In this study, we analyze gene expression profiles in isolated mononuclear cells from peripheral whole blood samples in subjects within the Multi-Ethnic Study of Atherosclerosis (MESA) cohort as a function of SNPs previously identified in the literature as associated with coronary artery disease.

Methods

Study Participants

The Multi-Ethnic Study of Atherosclerosis (MESA) is a prospective observational cohort of subjects free from cardiovascular disease at baseline. The study design and objectives of MESA have been previously described [40]. Briefly, MESA participants were recruited from six field sites in the United States: Forsyth County, NC; Northern Manhattan/Bronx, NY; Baltimore/Baltimore County, MD; St Paul, MN; Chicago, IL; and Los Angeles County, CA. The MESA cohort comprises 6,814 men and women of diverse ethnic background who were 45–84 years old at the baseline exam and free of clinically overt cardiovascular disease. The cohort was 53% women with a racial/ethnic composition of 38% white, 28% African American, 23% Hispanic, and 11% Asian, primarily of Chinese descent. Genome-wide association study analysis was performed on MESA cohort members who provided DNA samples (n = 6,425; 2528 Whites, 1449 Hispanics, 1673 African-Americans, and 775 Chinese). Outcomes were time to first CHD event of any type (All CHD) and to each of its individual subcomponents (MI, resuscitated cardiac arrest, definite angina, probable angina if followed by revascularization, and CHD death), analyzed using a proportional hazards model.

DNA collection


Cell pellets collected from routine MESA peripheral blood draws during exam 5 were used to isolate monocytes and to extract DNA and RNA simultaneously. During scheduled clinic visits in MESA exam 5, 32 mL of peripheral blood was collected in sodium citrate-containing Vacutainer CPT cell separation tubes (Becton Dickinson, Rutherford, NJ). Subsequently, monocytes were isolated using anti-CD14 coated magnetic beads according to the study protocol (modified from the manufacturer’s instructions); samples were stored in RNAlater solution (Ambion, Austin, TX) and frozen on site; and samples were shipped to the Wake Forest Human Genetics Laboratory on a quarterly basis. To establish proper QA and QC and to ensure that standardized procedures were followed across the four MESA field centers for sample collection, handling and shipment, the Wake Forest laboratory provided standardized protocols, ordered project supplies (e.g. the same lot number for laboratory supplies) and shipping material, and provided centralized training for all the field centers. At Wake Forest, DNA and RNA were isolated simultaneously from the monocytes using the AllPrep DNA/RNA Mini Kit (Qiagen, Inc.). RNA underwent quality control testing with an Agilent 2100 Bioanalyzer with RNA 6000 Nano chips (Agilent Technology, Inc., Santa Clara, CA) according to the manufacturer’s instructions.

mRNA Expression Profiles

Genome-wide mRNA expression profiles of 1264 monocyte samples from the MESA cohort described above were generated using the Illumina HumanHT-12 BeadChip, which targets ~25,000 genes. A general linear model was fitted with mRNA expression level as the outcome variable and each previously reported SNP as the explanatory variable, controlling for age, race, gender, study site, chip, well, and cell contamination (B cells, T cells, natural killer cells and neutrophils). Standardized QC/QA procedures were used for all expression analyses.
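The model described above can be sketched in Python for a single SNP and a single probe. This is an illustration only and not the program used for the analysis. It assumes an additive coding of the risk allele (0, 1 or 2 copies), runs on simulated data, includes only two stand-in covariates (age and sex) in place of the full adjustment set, and reports a t statistic rather than a p value. The function name fit_expression_model and all variable names are hypothetical.

```python
import numpy as np

def fit_expression_model(expression, genotype, covariates):
    """Least-squares fit of expression on SNP allele count plus covariates.

    expression : (n,) probe intensities (outcome)
    genotype   : (n,) risk-allele counts coded 0/1/2 (additive coding is an assumption)
    covariates : (n, k) numeric or dummy-coded adjustment variables
    Returns the per-allele SNP coefficient and its t statistic.
    """
    n = expression.shape[0]
    # Design matrix: intercept, SNP, then adjustment covariates
    X = np.column_stack([np.ones(n), genotype, covariates])
    beta, _, _, _ = np.linalg.lstsq(X, expression, rcond=None)
    resid = expression - X @ beta
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof                    # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta[1], beta[1] / se[1]

# Simulated data, for illustration only (not MESA data)
rng = np.random.default_rng(0)
n = 500
genotype = rng.integers(0, 3, size=n).astype(float)
age = rng.uniform(45, 84, size=n)
sex = rng.integers(0, 2, size=n).astype(float)
covariates = np.column_stack([age, sex])
expression = 8.0 + 0.5 * genotype + 0.01 * age + rng.normal(0, 0.3, size=n)

snp_beta, snp_t = fit_expression_model(expression, genotype, covariates)
print(f"per-allele effect = {snp_beta:.3f}, t = {snp_t:.1f}")
```

In a genome-wide setting the same fit would be repeated for every SNP-probe pair, with categorical covariates such as study site, chip and well entered as dummy variables.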


SNP         Chr  Gene                    P value         Author      Risk Allele
rs599839    1    PSRC1                   4 x 10^-9       Samani      A
rs17465637  1    MIA3                    1 x 10^-6       Samani      C
rs11206510  1    PCSK9                   9.6 x 10^-9     Kathiresan  T
rs2943634   2    pseudogene              2 x 10^-7       Samani      C
rs6725887   2    WDR12                   1.3 x 10^-8     Kathiresan  C
rs12190287  6    TCF21                   1.07 x 10^-12   Schunkert   C
rs6922269   6    MTHFD1L                 3 x 10^-8       Samani      A
rs12526453  6    PHACTR1                 1.3 x 10^-9     Kathiresan  C
rs1333049   9    CDKN2A, CDKN2B          1.8 x 10^-14    WTCCC       C
rs12413409  10   CYP17A1, CNNM2, NT5C2   1.03 x 10^-9    Schunkert   G
rs501120    10   CXCL12                  9 x 10^-8       Samani      T
rs974819    11   PDGFD                   2.41 x 10^-9    Peden       T
rs4773144   13   COL4A1, COL4A2          3.84 x 10^-9    Schunkert   G
rs17228212  15   SMAD3                   2 x 10^-7       Samani      C
rs8055236   16   intergenic              5.6 x 10^-6     WTCCC       G
rs1122608   19   LDLR                    1.9 x 10^-9     Kathiresan  G
rs9982601   21   SLC5A3-MRPS6-KCNE2      6.4 x 10^-11    Kathiresan  T

Table 1. Previously published SNPs associated with CAD in the literature that were available for gene expression analysis.

Results

Of the 16 previously reported SNPs associated with CAD, 5 were found to be statistically significantly associated with differing mRNA expression levels after Bonferroni adjustment [Table 2]. Of these 5 SNP associations, rs599839, located near the PSRC1 gene, is significantly associated with differing mRNA expression of the PSRC1 gene. In addition, the SNP associated with the WDR12 gene (rs6725887) is statistically significantly associated with mRNA expression levels of the FAM117B gene, which is also located on chromosome 2. A stratified analysis by race of the statistically significant SNP associations was performed [Table 3].
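The Bonferroni adjustment mentioned above can be illustrated with a short Python sketch. The text does not state how many tests were counted, so the denominator used here (16 SNPs, each tested against roughly 25,000 transcripts) is purely an assumption made for illustration, as is the 0.05 family-wise alpha. The p values are the smallest uncorrected value listed for each SNP in Table 2; the function name bonferroni_threshold is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

# ASSUMPTION: the number of tests is not given in the text.
# 16 SNPs x ~25,000 transcripts is used here only as an example.
N_SNPS = 16
N_TRANSCRIPTS = 25_000
threshold = bonferroni_threshold(0.05, N_SNPS * N_TRANSCRIPTS)

# Smallest uncorrected p value per SNP, copied from Table 2
table2_p = {
    "rs6725887": 7.0e-49, "rs599839": 4.9e-22, "rs12413409": 3.8e-19,
    "rs12526453": 3.1e-8, "rs501120": 6.6e-8, "rs11206510": 1.9e-5,
    "rs17465637": 3.8e-5, "rs1122608": 2.2e-4, "rs2943634": 3.8e-4,
    "rs12190287": 1.4e-3, "rs6922269": 2.4e-3, "rs17228212": 2.6e-3,
    "rs1333049": 8.8e-3, "rs974819": 2.6e-2, "rs4773144": 2.6e-2,
    "rs9982601": 1.8e-1,
}

# SNPs whose best p value falls below the assumed per-test threshold
significant = [snp for snp, p in table2_p.items() if p < threshold]
print(f"threshold = {threshold:.3g}")
print("significant:", significant)
```

A different count of tests (for example, the number of probes rather than genes) would move the threshold and could change which borderline associations pass.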


SNP          Chr  Gene                    Probe         Target Gene       Chr  P value
rs6725887    2    WDR12                   ILMN3244607   FAM117B           2    7.0 x 10^-49
rs6725887    2    WDR12                   ILMN1739942   FAM117B           2    3.7 x 10^-45
rs599839     1    PSRC1                   ILMN2315964   PSRC1             1    4.9 x 10^-22
rs599839     1    PSRC1                   ILMN1671843   PSRC1             1    2.4 x 10^-18
rs12413409*  10   CYP17A1, CNNM2, NT5C2   ILMN2151056   C10orf32          10   3.8 x 10^-19
rs12526453   5    PHACTR1                 ILMN_2379718  RAB24             6    3.1 x 10^-8
rs501120     10   CXCL12                  ILMN1730454   FOLR3             11   6.6 x 10^-8
rs11206510   1    PCSK9                   ILMN1703686   BRF1              14   1.9 x 10^-5
rs17465637   19   MIA3                    ILMN_2181469  ZNF611            1    3.8 x 10^-5
rs1122608    19   SMARCA4                 ILMN_1806275  ADAT3             19   2.2 x 10^-4
rs2943634    1    intergenic              ILMN_1772712  UBE2J2            2    3.8 x 10^-4
rs12190287*  1    NA, SGK1                ILMN_1804174  FCGR2B            6    1.4 x 10^-3
rs6922269*   11   MTHFD1L                 ILMN_3241870  FRMD8             6    2.4 x 10^-3
rs17228212   4    SMAD3                   ILMN_1710873  ZNF330            15   2.6 x 10^-3
rs1333049    7    DMRTA1, CDKN2BAS        ILMN_2259292  DMRTA1, CDKN2BAS  9    8.8 x 10^-3
rs974819*    10   PDGFD, DYNC2H1          ILMN_1699574  NRP1              11   2.6 x 10^-2
rs4773144*   20   COL4A1, IRS2            ILMN_1654118  BCL2L1            13   2.6 x 10^-2
rs9982601*   21   SLC5A3-MRPS6-KCNE2      ILMN2112915   DRD4              11   1.8 x 10^-1

Table 2. SNP associations with mRNA expression levels uncorrected for multiple comparisons

SNP          Probe        White          Black          Hispanic       Overall
rs6725887    ILMN3244607  3.36 x 10^-25  7.10 x 10^-6   1.87 x 10^-17  7.01 x 10^-49
rs6725887    ILMN1739942  2.34 x 10^-23  5.30 x 10^-4   6.12 x 10^-18  3.66 x 10^-45
rs599839     ILMN2315964  6.69 x 10^-21  1.80 x 10^-6   3.06 x 10^-10  4.90 x 10^-22
rs599839     ILMN1671843  3.58 x 10^-12  6.06 x 10^-4   7.18 x 10^-20  2.43 x 10^-18
rs12413409*  ILMN2151056  2.52 x 10^-8   4.77 x 10^-5   2.42 x 10^-9   3.82 x 10^-19
rs12526453   ILMN2379718  2.94 x 10^-4   7.23 x 10^-2   7.17 x 10^-3   3.09 x 10^-8
rs501120     ILMN1730454  1.10 x 10^-1   3.94 x 10^-3   2.84 x 10^-2   6.62 x 10^-8

Table 3. Statistically significant SNP associations stratified by race

The functional significance of these SNPs was evaluated using the University of California Santa Cruz (UCSC) Genome Browser. Only 4 SNPs (rs6725887, rs599839, rs12413409, and rs1122608) are located cis to the gene whose mRNA expression they are associated with, providing a much more plausible location for direct effects on gene expression. Of these SNPs, rs599839 is located only a few base pairs downstream of the 3’ UTR of the PSRC1 gene and is in linkage disequilibrium (LD) with several SNPs located within the CELSR2 gene. SNP rs6725887 is located within the untranslated region of the WDR12 gene and lies within a large block of LD with multiple SNPs within the WDR12 gene as well as several genes in close proximity. However, rs6725887 is only in mild LD with 2 SNPs located within the FAM117B gene, the gene with whose mRNA expression it is associated. SNP rs12413409 is located within the untranslated region of the CNNM2 gene and is in LD with 2 SNPs located within the untranslated region of the C10orf32 gene, the gene with whose mRNA expression it is associated. None of the significant SNPs above was located within documented histone binding sites or other known regulatory regions.

Discussion

Most of the GWAS that have identified highly replicated SNPs associated with CAD provide little insight into the role these markers play in the physiology of complex diseases. Therefore, DNA and RNA isolated from monocyte samples of subjects within the MESA cohort were used to evaluate the association between SNPs reported in the literature as associated with CAD at genome-wide significance and genome-wide mRNA transcript levels, in order to investigate any role these SNPs may have in gene regulation and expression. Several previously reported SNPs were associated with gene expression. Among these statistically significant findings, rs599839 was associated with PSRC1 expression, which has been implicated in the pathophysiology of atherosclerosis, and the SNP lies in close proximity to the 3’ UTR of the PSRC1 gene.

The rs599839 SNP has been associated with LDL cholesterol levels, myocardial infarction and angiographically documented coronary artery disease in previous studies [19, 46-48]. Schadt et al. studied human liver tissue and found that rs599839 was associated with mRNA expression of PSRC1, SORT1, CELSR2 and SYPL2. This association was further validated using genetically engineered mice designed to study metabolic traits associated with cardiovascular disease. In this validation analysis, PSRC1, SORT1 and CELSR2 liver expression levels were statistically significantly associated with LDL cholesterol levels. These genes were found to be involved in a causal network of tightly co-regulated genes within a subnetwork that includes a larger group of genes implicated in obesity, diabetes, cholesterol levels, and cardiovascular disease [19]. The PSRC1 gene encodes proline/serine-rich coiled-coil protein 1 and CELSR2 encodes cadherin EGF LAG seven-pass G-type receptor 2; the function of both of these proteins has yet to be fully elucidated. However, SORT1 encodes sortilin, a cell surface receptor that has been shown to bind the LDL-receptor associated protein (RAP) in vitro [46]. PSRC1 expression levels were positively correlated with LDL cholesterol in these mouse models, which may provide a pathophysiologic mechanism, as the rs599839 risk allele was associated with increased PSRC1 expression in the MESA monocyte cohort. In our analysis, rs599839 was not statistically significantly associated with SORT1 or CELSR2 expression levels; however, these expression association analyses used isolated monocyte samples and not liver tissue.

It is plausible that the mechanism for the association of rs599839 with coronary artery disease is a co-regulated multi-gene network that may involve PSRC1, CELSR2 and SORT1, and there is evidence that this pathophysiologic process works through LDL cholesterol homeostasis. Supporting this hypothesis are several GWAS that have identified rs599839 as associated with LDL cholesterol levels [17, 46, 49]. Linsel-Nitschke et al. found that each copy of the G allele of rs599839 (the non-risk allele in GWAS) was associated with a mean difference in LDL-C levels of 0.14 mmol/L (90% CI 0.09 to 0.17 mmol/L; p = 2.6 x 10^-11) in a pooled analysis of multiple cohorts after adjusting for age and gender. In parallel, a case-control analysis of the association of rs599839 with CAD in 4287 cases and 7572 controls found a 9% decrease in CAD risk for each copy of the G allele. In addition, in a pooled meta-analysis of recent GWAS on CAD, each copy of the G allele of rs599839 was associated with a 13% decreased risk of coronary artery disease [46]. The changes in LDL cholesterol concentrations associated with rs599839 allele copies are modest; however, 33.5% and 5.6% of the Caucasian population are heterozygous and homozygous for the G allele, respectively.
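The genotype frequencies quoted above can be turned into a short worked calculation. The Python sketch below uses only the three figures given in the text (33.5% heterozygous, 5.6% homozygous, 0.14 mmol/L per copy of the G allele) and assumes a strictly additive per-allele effect, which is an assumption of this illustration. All variable names are hypothetical.

```python
# Figures quoted in the text for the G allele of rs599839
het_freq = 0.335             # fraction carrying one copy of G
hom_freq = 0.056             # fraction carrying two copies of G
per_allele_ldl_diff = 0.14   # mmol/L difference in LDL-C per copy of G

# Allele frequency: homozygotes contribute two copies, heterozygotes one
g_allele_freq = hom_freq + het_freq / 2

# Fraction of the population carrying at least one G allele
carrier_freq = het_freq + hom_freq

# Average number of G copies per person
mean_g_copies = 1 * het_freq + 2 * hom_freq

# ASSUMPTION: the per-allele difference adds linearly across copies
mean_ldl_diff = mean_g_copies * per_allele_ldl_diff

print(f"G allele frequency: {g_allele_freq:.4f}")
print(f"carriers of at least one G allele: {carrier_freq:.3f}")
print(f"population-average LDL-C difference: {mean_ldl_diff:.4f} mmol/L")
```

Under these assumptions roughly 39% of this population carries at least one copy of the G allele, which is why a modest per-allele difference can still matter at the population level.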

This is a large analysis of gene expression in cells known to be directly linked to the pathophysiology of atherosclerosis, and it provides some insight into the mechanisms underlying the association of SNPs with coronary artery disease. One limitation of these data is that they remain an indirect assessment of gene expression within the atherosclerotic plaque: peripheral blood monocytes were used as a surrogate, and their mRNA expression may differ substantially from that of monocytes directly involved in the atherosclerotic process. In addition, these data provide associations of SNPs with mRNA expression, but do not provide evidence regarding how these SNPs directly affect mRNA transcription.

Of the 16 SNPs that have been consistently identified as statistically significantly associated with CAD in GWAS, 5 were identified as associated with differing mRNA expression levels. Of these SNPs, rs599839 is associated with differing mRNA expression of the PSRC1 gene, to which this SNP lies in close proximity, suggesting there may be a direct and local effect of this SNP on gene expression.


References

1. Yusuf S, Hawken S, Ounpuu S, et al. Effect of potentially modifiable risk factors associated with myocardial infarction in 52 countries (the INTERHEART study): case-control study. Lancet 2004;364:937-52.
2. Phillips RL, Lilienfeld AM, Diamond EL, Kagan A. Frequency of coronary heart disease and cerebrovascular accidents in parents and sons of coronary heart disease index cases and controls. Am J Epidemiol 1974;100:87-100.
3. Sholtz RI, Rosenman RH, Brand RJ. The relationship of reported parental history to the incidence of coronary heart disease in the Western Collaborative Group Study. Am J Epidemiol 1975;102:350-6.
4. Marenberg ME, Risch N, Berkman LF, Floderus B, de Faire U. Genetic susceptibility to death from coronary heart disease in a study of twins. N Engl J Med 1994;330:1041-6.
5. Zdravkovic S, Wienke A, Pedersen NL, Marenberg ME, Yashin AI, De Faire U. Heritability of death from coronary heart disease: a 36-year follow-up of 20 966 Swedish twins. J Intern Med 2002;252:247-54.
6. Marian AJ. The enigma of genetics etiology of atherosclerosis in the post-GWAS era. Curr Atheroscler Rep 2012;14:295-9.
7. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007;447:661-78.
8. Broadbent HM, Peden JF, Lorkowski S, et al. Susceptibility to coronary artery disease and diabetes is encoded by distinct, tightly linked SNPs in the ANRIL locus on chromosome 9p. Hum Mol Genet 2008;17:806-14.
9. Helgadottir A, Thorleifsson G, Manolescu A, et al. A common variant on chromosome 9p21 affects the risk of myocardial infarction. Science 2007;316:1491-3.
10. McPherson R, Pertsemlidis A, Kavaslar N, et al. A common allele on chromosome 9 associated with coronary heart disease. Science 2007;316:1488-91.
11. Samani NJ, Erdmann J, Hall AS, et al. Genomewide association analysis of coronary artery disease. N Engl J Med 2007;357:443-53.
12. Kalinina N, Agrotis A, Antropova Y, et al. Smad expression in human atherosclerotic lesions: evidence for impaired TGF-beta/Smad signaling in smooth muscle cells of fibrofatty lesions. Arterioscler Thromb Vasc Biol 2004;24:1391-6.
13. Assimes TL, Knowles JW, Basu A, et al. Susceptibility locus for clinical and subclinical coronary artery disease at chromosome 9p21 in the multi-ethnic ADVANCE study. Hum Mol Genet 2008;17:2320-8.
14. Bjorck HM, Lanne T, Alehagen U, et al. Association of genetic variation on chromosome 9p21.3 and arterial stiffness. J Intern Med 2009;265:373-81.
15. Hinohara K, Nakajima T, Takahashi M, et al. Replication of the association between a chromosome 9p21 polymorphism and coronary artery disease in Japanese and Korean populations. J Hum Genet 2008;53:357-9.
16. Hiura Y, Fukushima Y, Yuno M, et al. Validation of the association of genetic variants on chromosome 9p21 and 1q41 with myocardial infarction in a Japanese population. Circ J 2008;72:1213-7.


17. Sandhu MS, Waterworth DM, Debenham SL, et al. LDL-cholesterol concentrations: a genome-wide association study. Lancet 2008;371:483-91.
18. Samani NJ, Braund PS, Erdmann J, et al. The novel genetic variant predisposing to coronary artery disease in the region of the PSRC1 and CELSR2 genes on chromosome 1 associates with serum cholesterol. J Mol Med 2008;86:1233-41.
19. Schadt EE, Molony C, Chudin E, et al. Mapping the genetic architecture of gene expression in human liver. PLoS Biol 2008;6:e107.
20. Kathiresan S, Voight BF, Purcell S, et al. Genome-wide association of early-onset myocardial infarction with single nucleotide polymorphisms and copy number variants. Nat Genet 2009;41:334-41.
21. Erdmann J, Grosshennig A, Braund PS, et al. New susceptibility locus for coronary artery disease on chromosome 3q22.3. Nat Genet 2009;41:280-2.
22. Erdmann J, Willenborg C, Nahrstaedt J, et al. Genome-wide association study identifies a new locus for coronary artery disease on chromosome 10p11.23. Eur Heart J 2011;32:158-68.
23. Wild PS, Zeller T, Schillert A, et al. A Genome-Wide Association Study Identifies LIPA as a Susceptibility Gene for Coronary Artery Disease. Circ Cardiovasc Genet 2011;4:403-12.
24. Klima H, Ullrich K, Aslanidis C, Fehringer P, Lackner KJ, Schmitz G. A splice junction mutation causes deletion of a 72-base exon from the mRNA for lysosomal acid lipase in a patient with cholesteryl ester storage disease. J Clin Invest 1993;92:2713-8.
25. Zschenker O, Illies T, Ameis D. Overexpression of lysosomal acid lipase and other proteins in atherosclerosis. J Biochem 2006;140:23-38.
26. Slavin TP, Feng T, Schnell A, Zhu X, Elston RC. Two-marker association tests yield new disease associations for coronary artery disease and hypertension. Hum Genet 2011.
27. Schunkert H, Konig IR, Kathiresan S, et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat Genet 2011;43:333-8.
28. A genome-wide association study in Europeans and South Asians identifies five new loci for coronary artery disease. Nat Genet 2011;43:339-44.
29. Zeller T, Wild P, Szymczak S, et al. Genetics and beyond--the transcriptome of human monocytes and disease susceptibility. PLoS One 2010;5:e10693.
30. Wagsater D, Zhu C, Bjorck HM, Eriksson P. Effects of PDGF-C and PDGF-D on monocyte migration and MMP-2 and MMP-9 expression. Atherosclerosis 2009;202:415-23.
31. Chanock SJ, Manolio T, Boehnke M, et al. Replicating genotype-phenotype associations. Nature 2007;447:655-60.
32. Aoki A, Ozaki K, Sato H, et al. SNPs on chromosome 5p15.3 associated with myocardial infarction in Japanese population. J Hum Genet 2011;56:47-51.
33. Wang F, Xu CQ, He Q, et al. Genome-wide association identifies a susceptibility locus for coronary artery disease in the Chinese Han population. Nat Genet 2011;43:345-9.
34. Taurino C, Miller WH, McBride MW, et al. Gene expression profiling in whole blood of patients with coronary artery disease. Clin Sci (Lond) 2010;119:335-43.
35. Victor VM, Apostolova N, Herance R, Hernandez-Mijares A, Rocha M. Oxidative stress and mitochondrial dysfunction in atherosclerosis: mitochondria-targeted antioxidants as potential therapy. Curr Med Chem 2009;16:4654-67.
36. Sinnaeve PR, Donahue MP, Grass P, et al. Gene expression patterns in peripheral blood correlate with the extent of coronary artery disease. PLoS One 2009;4:e7037.
37. Wingrove JA, Daniels SE, Sehnert AJ, et al. Correlation of peripheral-blood gene expression with the extent of coronary artery stenosis. Circ Cardiovasc Genet 2008;1:31-8.
38. Wu G, Huang J, Wei G, Liu L, Pang S, Yan B. LAMP-2 gene expression in peripheral leukocytes is increased in patients with coronary artery disease. Clin Cardiol 2011;34:239-43.


39. Wu G, Wei G, Huang J, Pang S, Liu L, Yan B. Decreased gene expression of LC3 in peripheral leucocytes of patients with coronary artery disease. Eur J Clin Invest 2011;41:958-63.
40. Bild DE, Bluemke DA, Burke GL, et al. Multi-ethnic study of atherosclerosis: objectives and design. Am J Epidemiol 2002;156:871-81.
41. Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet 2006;2:e190.
42. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 2006;38:904-9.
43. Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 2010;26:2190-1.
44. Muller O, Delrue L, Hamilos M, et al. Transcriptional fingerprint of human whole blood at the site of coronary occlusion in acute myocardial infarction. EuroIntervention 2011;7:458-66.
45. Colombo G, Gertow K, Marenzi G, et al. Gene expression profiling reveals multiple differences in platelets from patients with stable angina or non-ST elevation acute coronary syndrome. Thromb Res 2011;128:161-8.
46. Linsel-Nitschke P, Heeren J, Aherrahrou Z, et al. Genetic variation at chromosome 1p13.3 affects sortilin mRNA expression, cellular LDL-uptake and serum LDL levels which translates to the risk of coronary artery disease. Atherosclerosis 2010;208:183-9.
47. Muendlein A, Geller-Rhomberg S, Saely CH, et al. Significant impact of chromosomal locus 1p13.3 on serum LDL cholesterol and on angiographically characterized coronary atherosclerosis. Atherosclerosis 2009;206:494-9.
48. Samani NJ, Braund PS, Erdmann J, et al. The novel genetic variant predisposing to coronary artery disease in the region of the PSRC1 and CELSR2 genes on chromosome 1 associates with serum cholesterol. J Mol Med (Berl) 2008;86:1233-41.
49. Wallace C, Newhouse SJ, Braund P, et al. Genome-wide association study identifies genes for biomarkers of cardiovascular disease: serum urate and dyslipidemia. Am J Hum Genet 2008;82:139-49.


MEDICAL TRAINING

Cardiology Fellowship 7/2010-present Wake Forest University Baptist Medical Center Winston-Salem, NC

Internal Medicine Residency 6/2007-7/2010 Duke University Medical Center Durham, NC

EDUCATION

Master of Science, Clinical Population Translational Science 6/2010-present Wake Forest University Pending Thesis Presentation

Doctor of Medicine 5/2007 Medical University of South Carolina Charleston, SC Class rank: 3 of 138 Alpha Omega Alpha

Bachelor of Science, Genetics 5/2003 University of Georgia Athens, GA Magna cum Laude

EMPLOYMENT

LeBauer Cardiology 12/2012-present Moses Cone Hospital Cardiology Moonlighting Greensboro, NC

Lexington Cardiology 9/2012-present Lexington Memorial Hospital Cardiology Moonlighting Lexington, NC

Novant Inpatient Care Specialists 12/2010-1/2012 Forsyth Medical Center Internal Medicine Moonlighting Winston-Salem, NC

High Point Regional Hospitalists 12/2010-8/2012 High Point Regional Hospital Internal Medicine Moonlighting High Point, NC


HONORS and AWARDS

Alpha Omega Alpha (AOA) Honor Medical Society 2006-2007 South Carolina Alpha Chapter Medical University of South Carolina, School of Medicine Fourth Year Medical Student Inductee Charleston, SC

MUSC Class Rank 3 of 138, Graduating Class of 2007 5/2007 Medical University of South Carolina, School of Medicine Recognizes and awards top 5 of each graduating class Charleston, SC

Outstanding Supplemental Instructor Award 2007 5/2007 Medical University of South Carolina Presented each year to the top medical student supplemental instructor for superior teaching and guidance to medical, dental, and physical therapy students Charleston, SC

Honors, Surgery 2006
Honors, Internal Medicine 2006
Honors, Neurology 2006
Honors, Psychiatry 2006
Honors, Pediatrics 2005
Collegiate All-American Scholar 2005

PUBLICATIONS

1. Sellers MB, Tricoci P, Harrington RA. A new generation of antiplatelet agents. Curr Opin Cardiol 2009; 24:307-12.

2. Sellers MB, Newby LK. Atrial Fibrillation, Anticoagulation, Fall Risk and Outcomes in the Elderly. Am Heart J 2011; 161(2):241-6.

3. Sellers MB, Liu Y, Burke G, Divers J, Ouyang P, Palmas W, Post W, Lohman K, Herrington D. Replication of the Association of PSRC1 and MTHFD1L with Coronary Artery Disease: The Multi-Ethnic Study of Atherosclerosis. American Heart Association- Abstract 15437

4. Sellers MB, Liu Y, Howard T, Shimmin LC, Hixson JE, Herrington D. Epigenetic Analysis of Lipid Phenotypes in Liver Tissue from the Pathobiological Determinants of Atherosclerosis in Youth (PDAY) Study. American Heart Association- Abstract 16471

GRANTS/FUNDING

1. Co-Principal Investigators: Sellers MB, Herrington D. Epigenetic Analysis of Lipid Phenotypes in Liver Tissue from the Pathobiological Determinants of Atherosclerosis in Youth (PDAY) Study. TSI Pilot Funding Program ($35,000) and the Dubie Holeman Cancer and Heart Fund ($10,000).


RESEARCH

Clinical Research: Utilizing the ECG through EpiCare to identify high risk features in subjects for differing cardiovascular phenotypes. Wake Forest University School of Medicine is the electrocardiogram reading center for more than 20 national and international epidemiologic studies and clinical trials.

Translational Research: Identifying genetic and epigenetic markers for coronary artery disease and utilizing transcriptomics via mRNA expression profiles to illuminate genetic mechanisms that lead to complex disease, specifically cardiovascular disease. Master of Science thesis is a replication analysis of single nucleotide polymorphisms associated with coronary artery disease and a functional validation using monocyte mRNA expression profiles to illustrate associations of these SNPs with coronary artery disease.

CERTIFICATIONS

USMLE Step 1, 253 6/2005
USMLE Step 2, 244 12/2006
USMLE Step 2 Clinical Skills Pass 1/2007
USMLE Step 3 Pass (transcript requested) 1/2009

ABIM, Internal Medicine 8/2010

Seminar on the Epidemiology and Prevention of Cardiovascular Disease 7/2010 American Heart Association, 10 Day Seminar Selected from postdoctoral applicants Granlibakken Conference Center Tahoe City, California

ACLS/BLS Active

LICENSURE

State of North Carolina 2010-present

PROFESSIONAL EXPERIENCE

Peer Review Committee, Duke University
• Committee member involved in an IRB-approved project instituting a new resident-to-resident evaluation process, with the ultimate goal of initiating a “360 degree” evaluation process including residents, attendings, nurses, students and staff

Duke Medical Student Standardized Patient Exam Evaluator Duke University Medical School Durham, NC, September 2008 and September 2009
• Graded Duke Medical Student physical diagnosis patient histories for their clinical skills exam

Duke Medical Student EKG Tutorials


Duke University Medical School Durham, NC, July 2008-September 2008
• Collected patient EKGs and created a clinical PowerPoint EKG tutorial representing common EKG abnormalities for Duke second year medical students

Duke Medical School CAPSTONE Project Duke University Medical School Durham, NC Spring 2008
• Led a discussion with the graduating fourth year medical students pursuing internal medicine residency regarding expectations and concerns relating to intern year

Supplemental Instructor for the College of Health Professions, Physician Assistant, Physiology Center for Academic Excellence (CAE), Medical University of South Carolina Charleston, SC, January 2007-May 2007
• Tutored five sophomore physician assistant students twice a week for 1 ½ hours in their 2nd semester Physiology course
• Reviewed material before each session, prepared learning materials, lectured on topics in physiology covered each week in the classroom, quizzed students on important topics, submitted weekly reports to the CAE, and maintained contact with the course director

Supplemental Instructor for the College of Medicine Center for Academic Excellence (CAE), Medical University of South Carolina Charleston, SC, August 2006-December 2006
• Anatomy: August 2006-December 2006
• Immunology and Microbiology: August 2006-December 2006
• Pathology: January 2006-May 2006
• Anatomy: August 2005-December 2005
• Neuroscience: January 2005-May 2005
• Anatomy: August 2004-December 2004
• Anatomy: June 2004-August 2004 (Dental Medicine)
(duties as above)

Physical Diagnosis Small Group Leader Medical University of South Carolina Charleston, SC, September 2006-December 2006
• Paired with a physician to help with the second year Physical Diagnosis Doctoring course
• Covered the important aspects of physical diagnosis of the cardiovascular, pulmonary, musculoskeletal, neurological, abdominal, and ENT exams for a small group of second year medical students in conjunction with the physician associate professor

USMLE Step I Review Small Group Leader Center for Academic Excellence (CAE), Medical University of South Carolina Charleston, SC, May 2005-June 2005
• Led small group sessions for five classmates in the College of Medicine twice a week for 1½ hours on important topics in preparation for USMLE Step I boards


• Prepared learning materials, lectured and led discussions on topics identified by the students, and discussed review test questions in different board review materials

PROFESSIONAL ORGANIZATIONS

American Heart Association
American Medical Association
American College of Physicians
