IT’S COMPLICATED: ANALYZING THE ROLE OF GENETICS AND GENOMICS IN CARDIOVASCULAR DISEASE

by JEFFREY HSU

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Dissertation Adviser: Dr. Jonathan D Smith

Department of Molecular Medicine

CASE WESTERN RESERVE UNIVERSITY

January 2014

CASE WESTERN RESERVE UNIVERSITY SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis/dissertation of:

Jeffrey Hsu

candidate for the Doctor of Philosophy degree*.

Thomas LaFramboise, Thesis Committee Chair

Mina Chung

David Van Wagoner

David Serre

Jonathan D Smith

Date: 5/24/2013

*We also certify that written approval has been obtained for any proprietary material contained therein.

Contents

Abstract

Acknowledgments

1 Introduction
1.1 Heritability of cardiovascular diseases
1.2 Genome wide era and complex diseases
1.3 Functional genomics of disease associated loci
1.4 Animal models for functional studies of CAD

2 Genetic-Genomic Replication to Identify Candidate Mouse Atherosclerosis Modifier Genes
2.1 Introduction
2.2 Methods
2.2.1 Mouse and cell studies
2.3 Results
2.3.1 Atherosclerosis QTL replication in a new cross
2.3.2 coding differences between AKR and DBA mice residing in ATH QTLs
2.3.3 eQTL in bone marrow derived macrophages and endothelial cells
2.3.4 eQTL replication between different crosses and different platforms

3 Transcriptome Analysis of Genes Regulated by Cholesterol Loading in Two Strains of Mouse Macrophages Associates Lysosome Pathway and ER Stress Response with Atherosclerosis Susceptibility
3.1 Introduction
3.2 Methods
3.2.1 Mice
3.2.2 Total, free, and esterified cholesterol quantification
3.2.3 Loading of macrophages with acetylated LDL for transcriptome profiling
3.2.4 Isolation of total RNA from BMM cell pellets
3.2.5 Hybridization and detection of transcripts
3.2.6 Microarray data analysis
3.2.7 Real-Time quantitative PCR (qPCR)
3.2.8 Western blot
3.3 Results and Discussion
3.3.1 AKR and DBA/2 macrophages respond differently to cholesterol loading
3.3.2 Hierarchical clustering
3.3.3 Strain differences on BMM transcriptome
3.3.4 Cholesterol loading effect on BMM transcriptome
3.3.5 Cholesterol loading–strain interaction effect on BMM transcriptome
3.3.6 Validation of data by quantitative Real-Time PCR (qPCR)
3.3.7 Western Blot Analysis

4 Whole Genome Expression Differences in Left and Right Atria Ascertained by RNA-Sequencing
4.1 Introduction and Background
4.2 Methods
4.2.1 RNA-sequencing for left-right pairs
4.2.2 Paired-end read analysis
4.2.3 RT-PCR
4.3 Results
4.3.1 RNA-seq of left-right atrial appendages
4.3.2 miRNA differences between the left and the right atria
4.3.3 mRNA gene expression differences between the left and the right atria
4.3.4 Left-right expression differences in poorly annotated transcripts
4.4 Discussion

5 Conclusion: How to unravel complicated traits
5.1 Roadmap to Identification of Mouse Atherosclerosis Modifier Genes
5.2 Roadmap to Identification of causal variants for AF, and their mechanism of action
5.3 The utility of functional genomics

List of Tables

2.1 Table 1: Aortic root lesion (log 10) QTLs in DBA/2 x AKR F2 cohort
2.2 Corresponding GWAS hits in human orthologous regions to mouse AthQTL
2.3 Polyphen2 scores within Ath intervals
2.4 Top 25 cis-eQTLs by LOD score in BMMs
2.5 Top 25 trans-eQTLs by LOD score in BMMs
2.6 Top 25 cis-eQTLs by LOD score in endothelial cells
2.7 Top 25 trans-eQTLs by LOD score in endothelial cells
2.8 cis-eQTLs that are found in both ECCs and BMMs at < 5% FDR that also reside within the AthQTLs
2.9 trans-eQTLs that are found in both endothelial cells (EC) and bone marrow macrophages (BMM) at < 30% FDR
2.10 Summary statistics and replication of bone marrow macrophage cis and trans-eQTLs for the prior and new crosses using the restricted set of common probes
2.11 Replicated trans-eQTL between crosses at the 5% FDR level
2.12 Replicated cis-eQTL within replicated Ath QTL intervals that have replicated direction of expression-lesion correlation

3.1 Top 10 differentially expressed transcripts between AKR and DBA/2 unloaded
3.2 Top 10 differentially expressed transcripts between AKR and DBA/2 unloaded
3.3 Significantly regulated transcripts upon cholesterol loading involved in lysosome pathway, ranked by fold change (loaded/unloaded)
3.4 Differentially expressed transcripts conserved in both experiments that reside within Ath28, Ath22 and Ath26 QTLs

4.1 Expression differences of miRNAs between the left and right atria at FDR <0.08 ranked by p-value
4.2 The top 20 left-right differentially expressed atrial genes ranked by p-value
4.3 Top left-right atria differentially regulated genesets by presence of transcription factor binding motifs ± 2 kb from the start site of transcription of atrial expressed genes
4.4 Top left-right atria differentially regulated genesets by presence of conserved miRNA motif in the 3’ UTR of atrial expressed genes
4.5 Novel non-Ensembl annotated transcripts that were differentially expressed between the left and right atria ranked by p-value

List of Figures

1.1 cis-action on gene expression
1.2 Example of a cis-eQTL
1.3 GWAS SNPs linked to putative functional variant
1.4 A schematic of how allelic imbalance arises

2.1 Combined AthQTL
2.2 SNPs within probes may lead to false eQTLs
2.3 Replicated cis-eQTL
2.4 Combined cis-overlap
2.5 Replicated trans-eQTL

3.1 Hierarchical clustering analysis of 32 samples included in the study
3.2 Conservation of cholesterol induced changes in macrophage gene expression in two independent experiments
3.3 Gene expression and validation of microarray data by quantitative real-time PCR in experiment 1 and 2 macrophages
3.4 Examples of transcripts with strain-loading interaction effect in experiment 1 samples
3.5 Examples of transcripts with strain-loading interaction effect in experiment 2 samples
3.6 Total, esterified, and free cholesterol mass in unloaded and loaded macrophages
3.7 Validation of microarray expression data

4.1 Size-distribution of small-RNA reads
4.2 Multidimensional scaling (MDS) of gene expression of Left-Right atrial pairs
4.3 HAMP and PITX2 display inverse expression pattern

It’s complicated: Analyzing the role of Genetics and Genomics in Cardiovascular Disease

Abstract

by JEFFREY HSU

Coronary artery disease (CAD) and atrial fibrillation (AF) are two common complex diseases of the heart that both have strong genetic components. Although genome wide association studies in humans and genetic mapping studies in mouse models have identified genomic loci associated with these diseases, the functional genetic variants and the mechanisms by which they alter disease susceptibility are largely unknown. In this thesis we use mouse models and human atrial tissues to perform functional genomic studies to gain insight into these disease associated loci, with the central hypothesis that many disease associated variants act by regulating the expression of nearby genes. The major methods used in these studies are transcriptome profiling by expression microarrays and RNA sequencing followed by rigorous statistical analysis. Transcript expression can be considered an intermediate phenotype to disease susceptibility, and the identification of genetic variants that alter transcript expression may lead to the mechanism of disease association. In the first study we performed an intercross between AKR and DBA/2 inbred mouse lines to identify and replicate atherosclerosis modifier genes. Quantitative trait loci (QTL) for atherosclerosis on chromosomes 2, 15, and 17 were replicated between studies. We quantified whole-genome expression of bone marrow derived macrophages and endothelial cells and found that much of the gene expression was under genetic control, including at the atherosclerosis QTLs, and that this genetic regulation of gene expression was reproducible between experiments. In the second study we identified genes that are regulated differentially in macrophages from these two mouse strains upon cholesterol loading. We found that the atherosclerosis resistant strain had much higher up regulation of genes in the lysosomal pathway. Combining these two studies, we identified several atherosclerosis candidate modifier genes, such as Sys1. In the third study we quantified the differences in gene expression between human left and right atria using RNA sequencing and found many genes differentially expressed at loci associated with AF in genome-wide association studies. In addition we identified a long non-coding RNA adjacent to PITX2, at the locus that has the highest association with AF. In the future these loci can be studied further using reporter assays to identify the causal variant as well as using genomic editing techniques to study the gene regulatory mechanism.

Acknowledgments

First and foremost among the many people to thank is Dr. Jonathan Smith, without whom none of this would be possible. He is a mentor second to none and a great friend. A great number of people have helped me in completing this dissertation. Shamone Gore-Panter and Stela Berisha have contributed greatly to discussions as well as the running of experiments. Thank you Peggy for making me help you move even though I’m probably the smallest person you know besides Ethan. Thank you Greg Tchou for getting me excited about sequencing and for your long albeit enjoyable stories. Kai Smith, a summer student in the lab, contributed greatly to a Python package for genetic analysis. Thank you Brandon Pereman for helping me extract RNA from several hundred samples. The rest of the Smith Lab have also made great contributions too numerous to list. A number of principal investigators have helped me as well. I owe a great deal of thanks to the atrial fibrillation workgroup and especially Mina Chung, David Van Wagoner, John Barnard, Dave Serre and Peter Hanna, without whom none of this would be possible. Thomas LaFramboise and discussions with him have contributed to this dissertation.

Chapter 1

Introduction

1.1 Heritability of cardiovascular diseases

Cardiovascular disease (CVD) remains the number one cause of mortality globally [1]. Although mortality from CVD has decreased dramatically in the past few decades, disease rates remain stubbornly high, a reflection of the complex etiologies involved [2]. Complex diseases have many underlying causes, and although there are various highly heritable familial forms of these diseases, for the majority of the population they have many explanatory variables with variable penetrance. Complex traits and diseases that have an appreciable genetic component are likely to be polygenic. In comparison, Mendelian monogenic disorders are driven by a single gene with a large effect size and have a very distinctive inheritance pattern. Although there are many types of CVD, this thesis focuses on coronary artery disease and atrial fibrillation, both of which are chronic disorders that can result in acute events through the formation of blood clots, causing significant morbidity and mortality.

Coronary artery disease (CAD) is the buildup of lipids and calcified plaques in the blood vessels supplying the heart. CAD involves multiple tissues and cell types, including the blood, heart, vascular smooth muscles, liver, and macrophages. Due to its complexity, only a cursory background on CAD is given here, mainly for definitional purposes. Plaque progression starts with the formation of fatty streaks containing modified lipoproteins, primarily oxidized low-density lipoprotein (oxLDL), as well as monocytes and macrophages that have become laden with these lipids. Lipoproteins bind lipids and act as their main carriers in the circulatory system. Scavenger receptors on macrophages, such as scavenger receptor A (SCARA1) and CD36, recognize oxLDL and other modified LDL species and mediate their uptake into the macrophage. The narrowing of the arteries itself is not lethal and is often asymptomatic, but it may lead to angina pectoris and shortness of breath. The atherosclerotic plaque can rupture and trigger a clotting response and complete blockage of the coronaries, known as a myocardial infarction. The resulting ischemia of the heart often causes necrosis of the cardiac muscle and, if left untreated, heart failure and death. The Framingham Heart Study (FHS), a multi-generational epidemiological study of heart disease, found that blood pressure, total cholesterol, LDL-cholesterol, high density lipoprotein cholesterol (HDL-C), diabetes, and smoking contribute to the 10-year risk of developing coronary heart disease [3].

Atrial fibrillation (AF) is another disorder that can ultimately lead to clotting and ischemia. Atrial fibrillation is the most common form of cardiac arrhythmia, with a lifetime risk of 1 in 4 [4]. It is characterized by rapid beating and uncoordinated firing of the atria, the two upper chambers of the heart. The arrhythmia results in poor pumping of blood from the atria into the ventricles. If left untreated, this abnormal beating can result in shortness of breath, dizziness and palpitations.
Pooling of the blood in the atria can result in the formation of blood clots that may travel up into the carotid arteries, leading to an ischemic stroke and sudden death. The presence of AF increases the risk of ischemic stroke roughly 5-fold at all ages [5][6]. In addition, undetected intermittent asymptomatic AF likely accounts for a significant share of unexplained strokes [7]. Risk factors for AF include age, gender, hypertension, valve disease, and congestive heart failure [8]. Lone AF is defined as AF without overt signs of heart disease such as heart failure or valve disease. Anticoagulants such as warfarin are often prescribed to patients with AF to prevent blood clotting. Radio frequency catheter ablation is used as a longer lasting treatment for AF itself.

Both of these diseases, despite being complex, have a fairly large genetic component in determining disease predisposition. Early Mendelian genetics played a large role in understanding cardiovascular disease, linking hypercholesterolemia, CAD, and sudden cardiac arrest [9]. The genetic basis for familial hypercholesterolemia was later identified as mutations in the LDL receptor (LDLR) by Brown and Goldstein (see review [10]). Beyond Mendelian disorders, the Framingham study also found that a family history of cardiovascular disease is a risk factor independent of the traditional factors [11]. A family history of premature CAD in a parent or a sibling increases the risk of CAD roughly 2-fold. Heritability (h²) is the fraction of the phenotypic variance in a given environment that is explained by genetics, and it is often estimated via twin studies. CAD, as ascertained by coronary angiograms, is highly heritable, but only at proximal locations (50%), while distal disease and plaque number were not heritable [12].

Many of the CAD risk factors are also highly heritable [13]. Fasting total cholesterol levels and were found to be highly heritable, around 60% and 75% respectively [14]. More specifically, LDL and HDL are both found to be just as heritable [15].
Although much has been made of the conflation of genetic effects with shared environmental factors, adoption studies in which twins are reared apart show that the genetic component remains just as high for lipid traits [16]. Smoking initiation and nicotine addiction are estimated to be 50-75% heritable [17][18][19]. The genetic component in smoking behavior remained even after smoking rates halved between two cohorts born at different times [20]. Genetics explains roughly 26% of the variation in diabetes mellitus (DM) [21]. Other marginal covariates such as adult BMI are 50-90% heritable, with the average study producing roughly 75% heritability.

For atrial fibrillation, family studies suggest that AF is roughly 22% heritable [22][23]. AF also tends to cluster in families [24], and progeny on average have an 8- to 9-fold increased relative risk for atrial fibrillation [25]. Lone AF is highly heritable, as it is not secondary to other CV diseases. In addition, there exist several monogenic familial forms of atrial fibrillation involving rare mutations in the genes KCNE2, HCN4 and KCNQ1.
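As an illustration of how a twin study can yield a heritability estimate, the sketch below applies Falconer's formula, h² = 2(r_MZ − r_DZ), where r_MZ and r_DZ are the trait correlations within monozygotic and dizygotic twin pairs. This is one common textbook estimator and not necessarily the method used in the studies cited above; the function name and the correlation values are hypothetical and chosen only for illustration.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Falconer's estimate of heritability: h2 = 2 * (r_MZ - r_DZ).

    r_mz: trait correlation within monozygotic twin pairs
    r_dz: trait correlation within dizygotic twin pairs
    """
    for r in (r_mz, r_dz):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlations must lie in [-1, 1]")
    return 2.0 * (r_mz - r_dz)


if __name__ == "__main__":
    # Hypothetical twin correlations, not taken from any cited study.
    h2 = falconer_heritability(r_mz=0.70, r_dz=0.40)
    print(f"estimated h2 = {h2:.2f}")
```

The estimator assumes that monozygotic and dizygotic pairs share their environment to the same degree, which is the assumption that the adoption studies mentioned above are designed to test.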

1.2 Genome wide era and complex diseases

The common disease common variant hypothesis posits that common variants play a role in common diseases. A common variant is defined as a polymorphic locus with a minor allele frequency of more than 5% in the population. Genome-wide association studies (GWAS) genotype single-nucleotide polymorphisms (SNPs) across the whole genome, usually by microarrays, and test the association of each SNP with the phenotype. SNPs are overwhelmingly biallelic and constitute the largest class of human genetic variation. SNPs often segregate together with other SNPs due to linkage disequilibrium (LD), the coinheritance of ancestral chromosome segments. Hundreds of GWAS have been published, using both case-control and longitudinal study designs, and have found thousands of variants that associate with disease or anthropometric traits. A comprehensive catalog of genome-wide association studies found that risk alleles are common, with a median frequency of 36% [26]. In coronary artery disease, a very large scale GWAS containing more than 100,000 patients discovered 13 loci associated with coronary artery disease [27]. The strongest of these hits occurs at 9p21. Many of these hits occur in intergenic regions of the genome. Likewise in atrial fibrillation, GWAS have identified several strong novel loci involved in AF. The strongest of these hits occurs at the 4q25 locus at the SNP rs2200733 [28][29], which is 171 kilobases away from the closest gene, PITX2. GWAS of several related cardiac electrophysiological traits, such as PR interval, have found that some risk loci are shared between PR interval and AF [30][31]. As in CAD, many of these associated loci occur within intergenic regions.
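Because a GWAS SNP may only tag the functional variant through LD, it can help to see how LD between two biallelic SNPs is commonly quantified. The sketch below computes the r² statistic from phased haplotypes, using D = p_AB − p_A p_B and r² = D² / (p_A(1 − p_A) p_B(1 − p_B)). The function name and the toy haplotypes are hypothetical; real analyses typically work from unphased genotypes with dedicated software.

```python
def ld_r2(haplotypes):
    """Return r^2 between two biallelic SNPs.

    haplotypes: list of (a, b) tuples, one per phased chromosome,
    with the allele at each SNP coded 0 or 1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


if __name__ == "__main__":
    # Toy data: the two SNPs are always inherited together.
    linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
    # Toy data: the two SNPs are independent.
    unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
    print(ld_r2(linked), ld_r2(unlinked))
```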

1.3 Functional genomics of disease associated loci

Once a locus is identified in a genetic study, understanding the mechanism by which the genetic variation leads to an altered phenotype is useful both in confirming the hit and in potentially developing therapeutics. Since the SNP discovered in a GWAS is not necessarily the causative variant, finding the causal mechanism often requires finding the exact variant. With the advent of inexpensive sequencing technologies, finding the causative variant may be possible, as all the variants at a locus can be identified by direct sequencing. There are several mechanisms that can explain how intergenic associations uncovered in GWAS can affect a phenotype: first, the non-coding SNP could be in LD with a coding SNP; second, the SNP may be linked to a variant that alters the sequence of an RNA gene such as a micro-RNA (miRNA) or a long non-coding RNA (lncRNA) [32]; and third, the SNP could directly affect the expression of an adjacent or nearby gene, or be linked with a SNP that does.

The first explanation was often thought to be the most plausible explanation for GWAS results due to the popular understanding of Mendelian genetics [33]. In addition, due to the large effect sizes of nonsense mutations, elucidating these hits was relatively easy. With the advent of cheap sequencing, more rare variants in the coding regions that may be tagged by common variants are likely to be discovered. Resequencing of the PCSK9 gene, which was associated with LDL-C levels in GWAS and familial studies [34], found additional rare variants within the coding region of PCSK9. The PCSK9 gene has both gain-of-function and loss-of-function mutations that alter plasma cholesterol levels in opposite directions. GWAS have not accounted for all the heritability of cardiovascular disease traits. The proportion of the heritability of lipid traits that is explained increases when these rarer variants are included by fine mapping [35].

Figure 1.1: cis-action on gene expression. In this example the A allele (top) in the promoter region leads to more mRNA expression than the G allele (bottom).

There is not much known about the second explanation, changes in RNA genes. However, the third explanation is probably the mechanism responsible for the majority of GWAS hits. Thus, the central hypothesis for my functional genomics studies is that regulatory SNPs control gene expression in cis (Figure 1.1). An expression quantitative trait locus (eQTL) is a locus that strongly associates with gene transcript levels. A cis-eQTL is one in which the eQTL controls the expression levels directly by acting as a binding site for a DNA binding protein such as a transcription factor. Practically, this usually means that the eQTL occurs in the vicinity of the transcript which it affects. An example of a cis-eQTL from my mouse studies is shown in Figure 1.2, where a SNP linked to the Sys1 gene is associated with the expression of this gene. Gene expression can be an intermediate phenotype between genes and disease, and as such a relationship between gene expression and genetics can increase the a priori belief that the genetics at particular loci affect a phenotype, or offer additional evidence that a particular association is causal. Studies that look at the genetic control of gene expression are known as expression GWAS (eGWAS) or eQTL studies.

The ENCODE Consortium has catalogued many noncoding regions of the genome and has found that the majority of GWAS hits have at least one linked SNP that falls within a putative functional element (Figure 1.3) [36][37]. A putative non-coding functional region is defined as a region that is annotated by the various ENCODE projects either by DNase hypersensitivity or by chromatin immunoprecipitation sequencing (ChIP-seq) of various transcription factors. Regions of the genome that are actively occupied by transcription factors are free of nucleosomes and are susceptible to digestion by DNase. In cardiovascular research, many studies have confirmed that gene expression differences are caused by genetic polymorphisms.

Figure 1.2: Example of a cis-eQTL. Genetics control gene expression levels, as the association between the two is significant. rs3671849 is a SNP located adjacent to the Sys1 gene.
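The association between a SNP and transcript levels that defines an eQTL can be sketched as a simple linear regression of expression on genotype, with genotype coded as the number of copies of one allele (0, 1 or 2). The code below is a minimal illustration under that additive-coding assumption; the function name and the toy values are hypothetical, and real eQTL analyses add covariates and multiple-testing control.

```python
def eqtl_regression(genotypes, expression):
    """Least-squares fit of expression on allele dosage.

    genotypes: allele dosages (0, 1 or 2), one per sample
    expression: normalized expression values, one per sample
    Returns (slope, r_squared).
    """
    n = len(genotypes)
    if n != len(expression) or n < 3:
        raise ValueError("need matched genotype and expression vectors")
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype and expression must both vary")
    slope = sxy / sxx                  # expression change per allele copy
    r_squared = sxy ** 2 / (sxx * syy) # fraction of expression variance explained
    return slope, r_squared


if __name__ == "__main__":
    # Toy data in which each extra allele copy raises expression by about 1 unit.
    dosage = [0, 0, 1, 1, 2, 2]
    expr = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
    slope, r2 = eqtl_regression(dosage, expr)
    print(f"slope = {slope:.2f}, r^2 = {r2:.3f}")
```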

Figure 1.3: The majority of GWAS SNPs are linked to other variants (r² ≥ 0.8) that have putative function as determined by DNase hypersensitivity or ChIP-seq in the ENCODE project (in blue). Image adapted from [37].

In addition to levels of expression in population studies, the genetic control of gene expression can also be assayed by allelic expression imbalance (AEI). If a transcript contains a heterozygous SNP, referred to as the indicator SNP, and another genetic variant affects the expression of the transcript, then the two alleles should be expressed in an imbalanced manner that can be detected by cDNA sequencing (Figure 1.4). A significant difference between the counts of the two alleles of the indicator SNP suggests that other heterozygous polymorphisms are acting in cis on the transcript. AEI offers benefits over eQTLs in that the test is run within a sample and is therefore not confounded by other variables such as sex, age, and disease phenotype.

A good example of a GWAS locus where the functional SNP mechanism has been determined is the LDL-C GWAS hit at SORT1. The 1p13.3 locus, and in particular rs599839, found in CAD GWAS, occurs in a non-coding region of the genome [38]. CELSR2, PSRC1, MYBPHL, and SORT1 are all genes that reside at the 1p13.3 locus. The major allele at this CAD locus correlated with higher CAD risk, and also correlated with lower levels of expression of CELSR2 and SORT1 in humans. In addition, expression of CELSR2 and SORT1 correlates inversely with LDL-cholesterol levels in a mouse model [39]. Additional work in humans found that the major allele in this region was most strongly associated with higher levels of very small LDL. Overexpression and knockdown studies of SORT1 in mouse liver [40] confirmed its effect in altering LDL levels. A single causative SNP, rs12740374, was found to affect the expression of SORT1 when all the nearby SNPs were tested sequentially in enhancer reporter assays. The minor allele at this SNP creates a binding site for the C/EBP transcription factor. Relevant to atrial fibrillation, a variant within the promoter region of GJA5, the gene encoding the channel protein connexin-40, affects its expression in human left atria [41]. In addition to expression levels, AEI was found for GJA5, strengthening the belief in this eQTL.

Figure 1.4: A schematic of how allelic imbalance arises. A transcription factor binds differentially at an enhancer/promoter element due to a polymorphism. One chromosome has a T allele that is a better match for a transcription factor. This binding initiates a higher level of transcription of the adjacent gene compared to the homologous chromosome with the G allele at the same position. We can measure this effect by counting the reads carrying each allele of a heterozygous indicator SNP and testing for a significant difference between the two.
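The AEI test described in this section can be sketched as an exact two-sided binomial test on the read counts of the two alleles of the indicator SNP, with a null hypothesis of equal expression (p = 0.5). This is a simplified illustration and not necessarily the test used in this thesis: it ignores reference mapping bias and overdispersion, and the function name and counts are hypothetical.

```python
from math import comb


def aei_binomial_p(ref_count: int, alt_count: int) -> float:
    """Two-sided exact binomial p-value for allelic imbalance.

    Null hypothesis: both alleles of the heterozygous indicator SNP
    are expressed equally (probability 0.5 for each read).
    """
    if ref_count < 0 or alt_count < 0:
        raise ValueError("counts must be non-negative")
    n = ref_count + alt_count
    if n == 0:
        raise ValueError("at least one read is required")
    k = min(ref_count, alt_count)
    # With p = 0.5 the distribution is symmetric, so the two-sided
    # p-value is twice the lower tail, capped at 1.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * lower_tail)


if __name__ == "__main__":
    # Hypothetical read counts at an indicator SNP in one sample.
    print(aei_binomial_p(ref_count=70, alt_count=30))  # imbalanced
    print(aei_binomial_p(ref_count=50, alt_count=50))  # balanced
```

Because the comparison is made between the two alleles within a single sample, no between-sample covariates enter the test, which is the advantage over eQTL mapping noted above.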

1.4 Animal models for functional studies of CAD

In chapters 2 and 3 we discuss studies using mouse models of atherosclerosis to identify modifier genes using genetics and eQTL studies. Animal models offer the opportunity to tease out causal genes and pathways, which is much harder to do in human association studies. The majority of GWAS are cross-sectional rather than longitudinal and thus may be prone to sampling issues. GWAS also require a large number of samples due to the statistical burden of meeting genome-wide significance. Like humans, animals have a large amount of genetic variation that may alter phenotype, and by breeding inbred strains we can map and find genetic variants altering complex traits. Quantitative trait locus (QTL) mapping is similar to GWAS, but instead of using atherosclerosis disease as a binary variable, a measurable quantitative trait such as LDL levels or aortic root lesion area is used. One complication is that mice do not spontaneously get CAD. One way to induce CAD in mice is to use hyperlipidemic ApoE -/- mice. In previous studies, it was found that DBA/2J and AKR/J ApoE -/- inbred mouse strains had the greatest differential in aortic root lesion area [42], and we used these two strains to perform our functional genomic studies.
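To make the idea of QTL mapping in an F2 intercross concrete, the sketch below computes a LOD score at a single marker by comparing a model with one mean per genotype class against a model with a single overall mean, LOD = (n/2) log10(RSS0/RSS1). This is plain marker regression shown as an illustration; it is not the interval-mapping procedure of any particular software, and the function name, genotype labels and trait values are hypothetical.

```python
from math import log10


def marker_lod(genotypes, phenotypes):
    """LOD score for a quantitative trait at a single marker.

    genotypes: genotype class per animal, e.g. "AA", "AB", "BB"
    phenotypes: quantitative trait values, one per animal
    """
    n = len(phenotypes)
    if n != len(genotypes) or n < 3:
        raise ValueError("need matched genotype and phenotype vectors")
    # Null model: one overall mean.
    mean_y = sum(phenotypes) / n
    rss0 = sum((y - mean_y) ** 2 for y in phenotypes)
    # QTL model: a separate mean for each genotype class.
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    rss1 = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        rss1 += sum((y - group_mean) ** 2 for y in ys)
    if rss1 == 0:
        raise ValueError("residual variance is zero; LOD is undefined")
    return (n / 2) * log10(rss0 / rss1)


if __name__ == "__main__":
    # Toy F2 data: the trait (e.g. log10 lesion area) rises with each B allele.
    geno = ["AA", "AA", "AB", "AB", "BB", "BB"]
    trait = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
    print(f"LOD = {marker_lod(geno, trait):.2f}")
```

A genome scan repeats this calculation at every marker, and significance thresholds are then set by permuting the phenotypes relative to the genotypes.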

Chapter 2

Genetic-Genomic Replication to Identify Candidate Mouse Atherosclerosis Modifier Genes

As published in: Hsu J, Smith JD. Genetic-Genomic Replication to Identify Candidate Mouse Atherosclerosis Modifier Genes. Journal of the American Heart Association 2013 Jan 23;2(1):e005421

2.1 Introduction

Atherosclerosis is a complex disease with both environmental and genetic susceptibility components. The heritability of atherosclerotic coronary artery disease in humans is evident from family history being a significant risk factor [43][44]. In addition, GWAS have identified multiple loci associated with CAD [45]. However, these studies have a tremendous statistical burden to overcome to meet the threshold of genome wide significance, and thus much of the genetic contribution may be underreported.

Also, GWAS do not ascertain rare variants, and it is becoming increasingly clear that rare variants in aggregate can account for a significant portion of population variance for complex traits such as plasma triglycerides [46]. Thus, there is still impetus to identify novel genes and pathways that play a role in atherosclerosis susceptibility.

Genetics also plays a role in lesion development in mouse models of atherosclerosis, as different inbred strains have markedly different aortic lesion areas [42]. Mouse models provide an opportunity to tease out the potential genetic modifiers for multigenic phenotypes. We have previously shown that AKR ApoE -/- mice have 10-fold smaller aortic root lesions compared to DBA/2 ApoE -/- mice when fed a chow diet [42]. A previous intercross between these two strains identified two significant and four suggestive quantitative trait loci (QTL) for aortic root lesion area. Just as lesion area is a quantitative trait that can be used for gene mapping studies, gene expression levels can likewise be treated as a quantitative trait to map expression QTLs (eQTL), the loci that control the expression of specific transcripts [47]. We had previously performed an eQTL analysis using macrophages from the F2 cohort of the AKR ApoE -/- X DBA/2 ApoE -/- strain intercross [48]. Here, we report atherosclerosis (Ath) QTL and eQTL findings from a second independent intercross of these same two strains.

2.2 Methods

2.2.1 Mouse and cell studies

A DBA/2J ApoE-/- X AKR/J ApoE-/- reciprocal strain intercross was performed to generate an F1 cohort and the subsequent F2 cohort of 89 males and 77 females. The F2 mice were weaned at 21 days and placed on a regular chow diet. Mice were sacrificed at 16 weeks of age. Femurs were collected from all mice and the descending aortas were removed from males for culture of endothelial cells as described below.

Tail-tip DNA was prepared from each F2 mouse by Proteinase K digestion followed by ethanol precipitation. Lesion areas of the aortic root were quantified as previously described [49]. Genotyping was performed using Illumina Golden Gate mouse genotyping arrays. Genotyping calls were made using Illumina’s Genome Studio. A total of 599 informative markers between AKR and DBA/2 were used for QTL analyses. Atherosclerosis QTLs were mapped using the R package qtl [50]. False discovery rates were estimated by permutation within this software.

Bone marrow macrophages were derived as previously described. Descending aortas were isolated from the F2 mice, cut into 2 mm sections, placed on top of Matrigel-coated plates, and grown to confluence (10-14 days) in DMEM supplemented as previously described [51]. Cells were treated with dispase and passaged twice before RNA isolation. RNA was isolated using Qiagen’s total RNA kits and digested with DNase I (Qiagen Catalogue #79254) for 12 minutes at room temperature to remove genomic DNA. RNA integrity was confirmed by agarose gel electrophoresis. cDNA was synthesized using Illumina protocols and reagents and hybridized on Illumina Mouse Ref-8 v2 microarrays. All expression and phenotype data are available in GEO (GSE35676).

Gene expression data were loaded into the R package lumi [52], log2 transformed, and quantile normalized. eQTLs were mapped using the scanone function in the R package qtl [50]. Probes were mapped using BLAT [53] against the mouse reference genome mm9. Probes that matched multiple locations and different annotated transcripts from Ensembl release 63 were discarded. Probes containing polymorphisms, either an indel or SNP, or mapping to known structural variants between DBA and AKR [54][55] were discarded, as polymorphisms between the two strains mapping to a probe location could lead to identification of false cis-eQTLs [56].
This was done by taking the blat locations and using Tabix [57] to retrieve sequence variants from the mouse genomes sequencing project vari-

16 ant calls. This filtering of probes resulted in the removal of 2,749 probes from the dataset. Human-Mouse alignments generated previously by Schwarz et. al [58] were used to obtain the human regions corresponding to our mouse Ath QTLs and the human-mouse alignment was used to obtain the corresponding human GWAS [59] related to coronary artery disease. To compare eQTLs from the current and prior studies [48], matches between Affymetrix and Illumina’s probes were provided by Il- lumina (http://www.switchtoi.com/probemapping.ilmn). The prior cross SNPs were imputed to the new SNPs with the simple assumption of no double crossovers. A liberal 20 mb window was used between studies to ascertain if an eQTL overlapped. A combined eQTL analysis was done using data from both studies, using cross mem- bership as an additive covariate and sex as an interactive covariate.
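The log2 transformation and quantile normalization step described above can be illustrated with a minimal sketch. This is not the lumi implementation; the function name, the toy intensity matrix, and the simple tie handling are assumptions made only for illustration.

```python
import numpy as np

def log2_quantile_normalize(intensities):
    """Log2-transform a probes x samples matrix, then quantile normalize.

    Every sample (column) is forced onto one shared distribution: the mean
    of the sorted columns. Ties are resolved by sort order, a simplification.
    """
    x = np.log2(np.asarray(intensities, dtype=float))
    order = np.argsort(x, axis=0, kind="stable")   # per-sample rank order
    reference = np.sort(x, axis=0).mean(axis=1)    # mean value at each rank
    normalized = np.empty_like(x)
    for j in range(x.shape[1]):
        normalized[order[:, j], j] = reference     # write reference values back by rank
    return normalized

# Hypothetical intensities: 4 probes measured in 3 samples.
raw = np.array([[ 64.0, 128.0,  32.0],
                [256.0, 512.0, 128.0],
                [ 16.0,  32.0,  64.0],
                [128.0,  64.0, 256.0]])
norm = log2_quantile_normalize(raw)
print(norm)
```

After normalization every sample has the same set of values, while the ranking of probes within each sample is unchanged.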

2.3 Results

2.3.1 Atherosclerosis QTL replication in a new cross

F2 mice were generated using a reciprocal cross strategy from ApoE-/- mice on the AKR and DBA/2 backgrounds. Lesion areas in the aortic root were quantified in 166 F2 mice, 77 female and 89 male. A genome scan was performed for each F2 mouse using 599 informative SNPs. We defined significant QTLs as those with a genome wide false discovery rate (FDR) of < 10% by permutation analysis. Combining both sexes and using sex as an interactive covariate, we identified three suggestive atherosclerosis (Ath) QTLs at a LOD score threshold of 2.0 on chromosomes 2, 15, and 17 with FDRs of 15, 30, and 16%, respectively (Table 2.1). Although these Ath QTLs only met the suggestive threshold due to small sample size, each of them was detected in a prior AKR x DBA/2 strain intercross [60]: Ath28, a suggestive QTL on chromosome 2; Ath22, a significant QTL on chromosome 15; and Ath26, a significant QTL on chromosome 17. We performed a combined Ath QTL analysis using both crosses with cohort membership as an additive covariate and sex as an interactive covariate. As the platform and markers used to genotype the two crosses differed, R/QTL [61] was used to impute the genotypes between the two crosses. On chromosome 2, Ath28 was replicated with a combined LOD score of 5.9; on chromosome 15, Ath22 was replicated with a combined LOD score of 5.3; and on chromosome 17, Ath26 was replicated with a combined LOD score of 5.6 (Figure 2.1). All of these combined LOD scores met the genome wide FDR threshold of < 10%, and in fact all met < 5% FDR. However, the suggestive Ath QTLs identified in the first cross using sex as an interactive covariate on chromosomes 5, 3, and 13 [60] were not replicated in the second cross. The approximate 95% Bayesian credible interval was obtained for all three loci (Table 2.1). Thus, the clinical trait mouse QTL model is partially reproducible for a phenotype as complex as lesion area, which has a fairly large coefficient of variation, even within inbred strains.

Table 2.1: Aortic root lesion (log10) QTLs in DBA/2 x AKR F2 cohort

Symbol  Chr  Bayesian Confidence Interval (Mb)  Second Cross LOD Score  FDR   Combined Cross LOD  FDR
Ath28   2    165.1-179.3                        2.8                     0.15  5.9                 < 0.05
Ath22   15   3.6-31.9                           2.0                     0.30  5.3                 < 0.05
Ath26   17   12.4-64.3                          2.7                     0.16  5.6                 < 0.05
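To make the LOD scores reported for the Ath QTLs concrete, the following is a minimal sketch of a single-marker LOD computed by regression, with sex as an interactive covariate. It is a simplified marker regression, not the R/qtl scanone procedure used in this work; the genotype coding, the function names, and the simulated data are assumptions for illustration only.

```python
import numpy as np

def rss(y, design):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def marker_lod(y, genotype, sex):
    """LOD score for one marker in an F2 intercross.

    genotype: 0, 1, or 2 copies of one parental allele; sex: 0 or 1.
    Null model: intercept + sex. Full model: adds additive and dominance
    genotype terms plus their interactions with sex.
    LOD = (n / 2) * log10(RSS_null / RSS_full).
    """
    y = np.asarray(y, dtype=float)
    g = np.asarray(genotype, dtype=float)
    s = np.asarray(sex, dtype=float)
    add = g - 1.0                   # additive coding: -1, 0, 1
    dom = (g == 1).astype(float)    # dominance coding: heterozygote indicator
    ones = np.ones_like(y)
    null = np.column_stack([ones, s])
    full = np.column_stack([ones, s, add, dom, add * s, dom * s])
    return (len(y) / 2.0) * np.log10(rss(y, null) / rss(y, full))

# Simulated (not real) data: 100 F2 mice with a trait linked to the marker.
rng = np.random.default_rng(0)
genotype = np.tile([0, 1, 1, 2], 25)
sex = np.repeat([0, 1], 50)
lesion = 1.0 * genotype + rng.normal(0.0, 0.5, size=100)

lod_linked = marker_lod(lesion, genotype, sex)
lod_unlinked = marker_lod(lesion, rng.permutation(genotype), sex)  # shuffled genotypes
print(lod_linked, lod_unlinked)
```

Repeating the shuffled-genotype calculation many times across all markers is the idea behind the permutation-based thresholds mentioned in the text, although the exact FDR procedure used here is not reproduced by this sketch.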

We identified the human chromosome segments orthologous to these mouse loci. We examined whether these human orthologous regions contained common genetic variants associated with coronary artery disease (CAD) or related risk factors by searching the National Human Genome Research Institute Genome Wide Association Study (GWAS) Catalog [26] (Table 2.2). There were no human CAD GWAS hits in the region in synteny with Ath28, although 21% of the Ath28 interval on chromosome 2 displayed no synteny due to complex expansion in the mouse genome after species divergence. The Ath22 locus on chromosome 15 contains the corresponding segment on human chromosome 5 that has been associated with subclinical atherosclerosis [62]. The Ath26 locus on chromosome 17 corresponds to human chromosomes 6 (primarily) and 19, including the MHC locus, and overlaps with multiple human GWAS loci for CAD and related risk factors (Table 2.2).

Figure 2.1: Combined AthQTL. Aortic root lesion (log10) QTLs in DBA/2 x AKR F2 cohort. The pink and blue lines show the LOD plots for the prior and new crosses of AKR ApoE-/- and DBA/2 ApoE-/- mice, respectively. The black line shows the LOD plot for the combined analysis using cross as an additive covariate. In all analyses, sex was used as an interactive covariate.

Table 2.2: Corresponding GWAS hits in human orthologous regions to mouse AthQTL

Chr    Position   rsID        Author        Trait                                       Nearest gene

chr 2 (Ath28): No hits

chr 15 (Ath22)
chr5   13764419   rs2896103   O'Donnell CJ  Subclinical atherosclerosis traits (other)  DNAH5
chr5   13769974   rs7715811   O'Donnell CJ  Subclinical atherosclerosis traits (other)  DNAH5
chr5   13779743   rs1502050   O'Donnell CJ  Subclinical atherosclerosis traits (other)  DNAH5

chr 17 (Ath26)
chr6   160578859  rs1564348   Teslovich TM  LDL cholesterol                             LPA
chr6   160578859  rs1564348   Teslovich TM  Cholesterol, total                          LPA
chr6   160741621  rs3120139   Qi Q          Lp (a) levels                               SLC22A3
chr6   160863531  rs2048327   Tregouet DA   Coronary heart disease                      SLC22A3,LPAL2,LPA
chr6   160863531  rs2048327   Tregouet DA   Coronary heart disease                      SLC22A3,LPAL2,LPA
chr6   160907133  rs3127599   Tregouet DA   Coronary heart disease                      SLC22A3,LPAL2,LPA
chr6   160907133  rs3127599   Tregouet DA   Coronary heart disease                      SLC22A3,LPAL2,LPA
chr6   160910516  rs12214416  Qi Q          Lp (a) levels                               LPAL2
chr6   160960358  rs6919346   Ober C        Lp (a) levels                               LPA
chr6   160961136  rs3798220   Schunkert H   Coronary heart disease                      LPA
chr6   160962502  rs7767084   Tregouet DA   Coronary heart disease                      SLC22A3,LPAL2,LPA
chr6   160962502  rs7767084   Tregouet DA   Coronary heart disease                      SLC22A3,LPAL2,LPA
chr6   160969737  rs10755578  Tregouet DA   Coronary heart disease                      SLC22A3,LPAL2,LPA
chr6   160969737  rs10755578  Tregouet DA   Coronary heart disease                      SLC22A3,LPAL2,LPA
chr6   161010117  rs10455872  Chasman DI    Response to therapy (LDL-C)                 LPA
chr6   161010117  rs10455872  Qi Q          Lp (a) levels                               LPA
chr6   161089816  rs1084651   Teslovich TM  HDL cholesterol                             LPA
chr6   161137989  rs783147    Qi Q          Lp (a) levels                               PLG
chr6   34546560   rs2814982   Teslovich TM  Cholesterol, total                          C6orf106
chr6   34552797   rs2814944   Teslovich TM  HDL cholesterol                             C6orf106
chr6   35034800   rs17609940  Schunkert H   Coronary heart disease                      ANKS1A
chr19  8433196    rs7255436   Teslovich TM  HDL cholesterol                             ANGPTL4
chr19  8469738    rs2967605   Kathiresan S  HDL cholesterol                             ANGPTL4
chr6   31184196   rs3869109   Davies RW     Coronary heart disease                      HCG27, HLA-C
chr6   32412435   rs3177928   Teslovich TM  LDL cholesterol                             HLA
chr6   32412435   rs3177928   Teslovich TM  Cholesterol, total                          HLA
chr6   32669373   rs11752643  Takeuchi F    Coronary heart disease                      HLA, DRB-DQB
chr6   33143948   rs2254287   Willer CJ     LDL cholesterol                             B3GALT4
chr6   43758873   rs6905288   Davies RW     Coronary heart disease                      VEGFA

2.3.2 Protein coding differences between AKR and DBA mice residing in ATH QTLs

We used a variety of bioinformatic and genomic methods in order to identify candidate genes that may be responsible for the three replicated Ath QTLs. Using the mouse sequence data from 15 common inbred strains [54][55], we identified all of the nonsynonymous protein changes in these three loci. These strain variable genes on chromosomes 2 (11 genes), 15 (23 genes), and 17 (258 genes) are listed in Supplementary Table 1, with many genes having more than one substitution between these two strains. We identified many more strain variant genes for Ath26 on chromosome 17, as it is a very large 52 Mb gene dense interval that contains the highly polymorphic mouse H2 major histocompatibility region. After exclusion of the major histocompatibility genes, we used Polyphen 2 [63] to ascertain in silico the likelihood that each protein coding change would impair protein function, and we found numerous potential protein functional differences between the two strains (Table 2.3 and Supplementary Table 1).

Table 2.3: Polyphen2 scores within Ath intervals.

Chr  Position   Ref. Allele  Alt. Allele  AKR^1  DBA^1  Gene Symbol  AA position  AA alteration  Polyphen2 score^2
2    165177862  C            T            1/1    0/0    Zfp663       646          G>R            0.98
2    165179205  G            A            1/1    0/0    Zfp663       198          T>I            0.969
2    172377285  T            C            0/0    1/1    Tcfap2c      143          Y>H            0.917
2    173034738  G            C            1/1    0/0    Zbp1         280          T>S            0.944
15   3929418    C            T            0/0    1/1    Fbxo4        3            G>R            1
15   4705267    G            A            1/1    0/0    C6           205          R>K            1
15   4741106    G            T            1/1    0/0    C6           533          R>M            0.591
15   4888423    G            T            1/1    0/0    Heatr7b2     981          Q>H            0.989
15   4984193    G            A            1/1    0/0    C7           242          T>M            0.78
15   9297067    T            G            1/1    0/0    Ugt3a2       380          H>Q            0.966
15   11834935   C            T            1/1    0/0    Npr3         154          A>T            1
15   27492564   C            T            0/0    1/1    Ank          201          A>V            0.932
15   28275570   C            T            0/0    1/1    Dnahc5       2434         R>W            0.908
17   12243519   G            C            0/0    1/1    Park2        397          E>Q            1
17   17987504   C            T            0/0    1/1    Has1         40           A>T            0.638
17   18048331   T            G            1/1    0/0    Gm7535       194          D>A            0.958
17   18048648   C            T            1/1    0/0    Gm7535       88           M>I            0.875
17   18395099   A            G            1/1    0/0    Vmn2r94      117          S>P            1
17   18719527   C            T            0/0    1/1    Vmn2r96      53           A>V            1
17   19084957   G            T            1/1    0/0    Vmn2r97      836          L>F            0.951
17   19203418   G            T            0/0    1/1    Vmn2r98      405          A>S            0.937
17   19204356   G            A            0/0    1/1    Vmn2r98      496          E>K            0.979
17   19814743   C            G            1/1    0/0    Vmn2r102     352          P>R            0.806
17   19910406   G            C            1/1    0/0    Vmn2r103     27           C>S            1
17   19949143   G            T            0/0    1/1    Vmn2r103     738          K>N            1
17   20405284   C            T            0/0    1/1    Vmn2r106     606          V>I            0.986
17   20415678   A            C            0/0    1/1    Vmn2r106     312          Y>D            0.975
17   20608324   C            T            1/1    0/0    Vmn2r108     300          M>I            0.793
17   20639402   T            G            1/1    0/0    Vmn1r225     47           L>R            0.998
17   21050607   A            T            0/0    1/1    Vmn1r232     232          F>I            0.774
17   21423632   C            T            0/0    1/1    Vmn1r236     16           T>I            0.953
17   23496190   G            A,T          0/0    1/1    Vmn2r115     557          V>F            0.892
17   23802717   A            T            0/0    1/1    Ccdc64b      170          T>S            0.961
17   24558772   G            A            1/1    0/0    Rnps1        165          G>E            0.953
17   24701163   A            G            1/1    0/0    Pkd1         95           E>G            0.985
17   25240572   C            T            1/1    0/0    Telo2        664          V>M            0.946
17   25242093   C            T            0/0    1/1    Telo2        531          R>H            1
17   25247204   G            A            0/0    1/1    Telo2        301          R>C            0.998
17   25305356   A            T            0/0    1/1    Ccdc154      380          Q>L            0.924
17   25305357   G            T            0/0    1/1    Ccdc154      380          Q>H            0.996
17   25436769   T            C            0/0    1/1    Prss34       260          L>P            0.996
17   25457599   A            G            1/1    0/0    Prss29       74           T>A            0.997
17   29018352   G            A            1/1    0/0    Pnpla1       416          S>N            0.766
17   29018363   T            A            1/1    0/0    Pnpla1       420          L>M            0.94
17   29275100   G            T            1/1    0/0    Rab44        86           Q>H            1
17   29296944   C            T            1/1    0/0    Cpne5        506          V>M            0.862
17   29994582   C            T            1/1    0/0    Mdga1        54           D>N            0.997
17   30814214   C            A,T          0/0    1/1    Dnahc8       886          L>M            1
17   31145549   A            T            0/0    1/1    Umodl1       1304         I>F            0.933
17   31372493   G            A            1/1    0/0    Ubash3a      450          V>M            0.948
17   32838830   C            T            1/1    0/0    Cyp4f15      378          S>F            0.827
17   33062306   A            G            1/1    0/0    Cyp4f13      453          S>P            0.739
17   33731241   C            T            1/1    0/0    Myo1f        658          R>C            0.993
17   34048493   A            G            1/1    0/0    Daxx         179          N>S            0.734
17   34057211   C            G            0/0    1/1    Tapbp        79           L>V            0.669
17   34069629   C            T            0/0    1/1    Rgl2         234          A>V            0.999
17   34072961   G            A            0/0    1/1    Rgl2         665          V>I            0.969
17   34102686   A            T            0/0    1/1    Vps52        663          S>C            0.996
17   34165995   C            T            1/1    0/0    Slc39a7      393          V>M            0.968
17   34195857   G            A            1/1    0/0    Col11a2      1044         A>T            0.961
17   34196094   G            A            0/0    1/1    Col11a2      1079         V>M            0.996
17   34499618   G            A            0/0    1/1    Btnl2        242          E>K            0.904
17   34518474   A            C            0/0    1/1    Btnl3        300          K>Q            0.81
17   34597021   A            C            1/1    0/0    BC051142     264          Q>P            0.663
17   34645055   G            A,T          2/2    0/0    Btnl6        482          A>D            0.996
17   34645142   C            G            0/0    1/1    Btnl6        453          S>T            0.936
17   34645839   T            C            0/0    1/1    Btnl6        337          Q>R            0.752
17   34652479   C            G            0/0    1/1    Btnl6        85           E>Q            0.893
17   34670298   A            G            1/1    0/0    Btnl7        426          F>L            0.999
17   34670922   G            A            1/1    0/0    Btnl7        334          R>C            0.999
17   34679441   C            T            1/1    0/0    Btnl7        132          G>R            0.997
17   34721459   G            A            0/0    1/1    Notch4       1469         R>Q            0.997
17   34724653   A            G            1/1    0/0    Notch4       1873         Y>C            0.995
17   34735125   G            A            1/1    0/0    Ager         31           G>E            1
17   34782359   C            T            0/0    1/1    Fkbpl        52           P>L            0.957
17   34787670   G            T            0/0    1/1    Atf6b        233          A>S            0.975
17   34808447   A            G            0/0    1/1    Tnxb         273          Q>R            0.963
17   34831297   A            C            1/1    0/0    Tnxb         1903         E>A            0.944
17   34832575   C            G            1/1    0/0    Tnxb         2020         H>D            0.794
17   34833521   C            T            0/0    1/1    Tnxb         2167         T>I            0.993
17   34840315   C            T            0/0    1/1    Tnxb         2494         P>S            0.992
17   34867829   C            T            0/0    1/1    C4b          1442         R>K            0.613
17   35163365   G            A            0/0    1/1    Ng23         129          T>M            0.941

Table excludes the MHC genes and several zinc finger protein genes.
^1 0 is the reference allele; 1 is the first or only alternate allele; and 2 is the second alternate allele.
^2 Probability of detrimental amino acid substitution, using >0.5 as the threshold.

On chromosome 2, only 3 proteins have changes that are predicted to be damaging in one strain relative to the other strain. Zbp1 is a Z-DNA binding protein, Tcfap2c is an AP-2 transcription factor involved in early development, and Zfp663 is a zinc-finger protein; none of these proteins has been previously implicated in atherosclerosis susceptibility. On chromosome 15, there are several protein coding changes that are predicted to be detrimental in one strain vs. the other. Two components of the complement system, C6 and C7, have predicted functional differences, both with the AKR version being detrimental. The complement system has potential roles in cardiovascular disease, as previously reviewed [64]. On chromosome 17, there were 50 genes with predicted detrimental changes, and many additional changes in MHC complex genes that were not subjected to the Polyphen analysis. Some notable protein changes were found in Rab44, Collagen 11a2, and Notch4, which can alter cellular vesicular trafficking, extracellular matrix, and signal transduction, respectively, all potential atherosclerosis modifiers.

2.3.3 eQTL in bone marrow derived macrophages and endothelial cells

The global profile of gene expression in bone marrow derived macrophages was assayed using Illumina microarrays from 79 female and 81 male F2 mice. Using the same 599 SNPs used to map the Ath QTLs, we mapped expression QTLs (eQTLs), or loci that are associated with the expression of each transcript, using sex as an additive covariate. An eQTL was defined as a cis-eQTL if it mapped within 20 Mb of the probe position. A trans-eQTL was defined as an eQTL mapping anywhere else in the genome. In order to eliminate spurious eQTLs, we filtered out 2,479 expression array probes that contained a SNP or insertion/deletion between these two strains, which could lead to altered probe hybridization, impairing accurate measurement of gene expression. We validated that these strain variant probes would indeed lead to false cis-eQTLs, with an average LOD score that was double the LOD score of comparable probes containing no variant (LOD 14.7 vs. 7.6, p-value < 2.2x10^-6). In addition, the transcripts with the non-reference SNP allele overlapping the probes were overwhelmingly called with lower expression values vs. the transcripts containing the reference allele (Figure 2.2). After filtering out probes that were not expressed above background in at least 25% of the samples, 9600 probes were evaluated for eQTLs. We used a stringent FDR cutoff of 5% to identify cis-eQTLs, which corresponded to a LOD score threshold of 2.4, and found 937 cis-eQTLs (Supplementary Table 2). Table 2.4 shows the top 25 cis-eQTLs ranked by LOD score. Since trans-eQTLs are indirect, and often not as strong as cis-eQTLs, we applied both a liberal FDR cutoff of 30% and a stringent 5% cutoff. The 30% and 5% FDR thresholds corresponded to LOD scores of 2.81 and 3.75, with 3797 and 551 trans-eQTLs identified, respectively (Supplementary Table 3). Table 2.5 shows the top 25 trans-eQTLs ranked by LOD score.
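The cis/trans definition above reduces to a distance check between the QTL marker and the probe. The sketch below is illustrative only: the function name is hypothetical, and treating a distance of exactly 20 Mb as cis is an assumption, since the text does not state how the boundary was handled.

```python
def classify_eqtl(marker_chr, marker_mb, probe_chr, probe_mb, window_mb=20.0):
    """Label an eQTL as 'cis' or 'trans' from marker and probe positions (Mb)."""
    same_chromosome = str(marker_chr) == str(probe_chr)
    if same_chromosome and abs(marker_mb - probe_mb) <= window_mb:
        return "cis"
    return "trans"

# Gdpd3 in Table 2.9: marker on chr 7 at 111 Mb, probe on chr 7 at 134 Mb.
print(classify_eqtl(7, 111, 7, 134))          # 23 Mb apart on the same chromosome
# Klf2 in Table 2.9: marker on chr 15 at 34 Mb, probe on chr 8 at 75 Mb.
print(classify_eqtl(15, 34, 8, 75))
# Hypothetical positions less than 1 Mb apart on chr 2.
print(classify_eqtl("2", 163.2, "2", 164.0))
```

The first case shows why a fixed window is a judgment call: an eQTL 23 Mb from its probe is labeled trans even though a local effect is plausible.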

Table 2.4: Top 25 cis-eQTLs by LOD score in BMMs

Gene Symbol    QTL Marker       QTL Chr  QTL Marker Pos (Mb)  LOD
Rnps1          rs3719497        17       24.1                 55.9
Fblim1         rs3688566        4        140.0                54
Zfp277         rs13481408       12       35.5                 44.7
2210012G02Rik  rs3709486        4        110.0                43.9
Atg9b          CEL-5_24211033   5        24.2                 42.5
Vill           rs3669563        9        117.9                41.2
Gjb4           gnf04.123.367    4        126.4                35.3
Gm962          CEL-19_5283144   19       5.3                  35
Insl6          rs3090325        19       26.0                 33.3
Gm962          CEL-19_5283144   19       5.3                  32.8
Hint2          CEL-4_40541402   4        40.5                 32.6
Agpat5         rs3657963        8        16.6                 32.1
Prss22         rs3726555        17       15.2                 31.5
Ccdc163        rs3709486        4        110.0                30.7
Sys1           rs3671849        2        163.2                30.4
Abhd1          rs13469943       5        29.5                 30.1
Tuft1          rs13477261       3        92.3                 28
Usp2           rs4135590        9        43.1                 28
H2-Gs10        rs3682923        17       34.3                 27.7
Abhd1          rs13469943       5        29.5                 27.5
Zfp420         rs4226520        7        18.8                 27.5
H2-T10         rs6298471        17       35.1                 26.9
Pdxdc1         rs4163196        16       13.1                 26.5
Fgr            rs3663950        4        134.2                26.1
Scamp5         rs13480208       9        55.4                 25.7

Figure 2.2: SNPs within probes may lead to false eQTLs due to altered hybridization of transcripts to probes. The figure above shows 306 cis-eQTLs identified that contained AKR vs. DBA/2 polymorphisms that overlapped the sequence of the microarray probe. The SNPs that differed from the C57Bl/6 reference genome and were also detected with lower expression are shown in blue, while those that were detected with higher expression are shown in pink. Thus, probes designed to the C57Bl/6 reference genome were more likely to hybridize less strongly to transcripts that contain a SNP differing from the reference sequence.
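Removing probes that overlap a strain polymorphism amounts to an interval lookup. The sketch below is an in-memory stand-in for that idea, not the Tabix-based retrieval used in this work; the probe coordinates, variant positions, and function name are hypothetical.

```python
from bisect import bisect_left

def probes_overlapping_variants(probes, variant_positions):
    """Return IDs of probes whose genomic interval contains a variant.

    probes: list of (probe_id, chrom, start, end), 1-based inclusive.
    variant_positions: dict mapping chrom -> sorted list of variant positions.
    """
    flagged = []
    for probe_id, chrom, start, end in probes:
        positions = variant_positions.get(chrom, [])
        i = bisect_left(positions, start)   # first variant at or after the probe start
        if i < len(positions) and positions[i] <= end:
            flagged.append(probe_id)
    return flagged

# Hypothetical 50-mer probes and variant calls between the two strains.
probes = [("probe_A", "chr1", 100, 149),
          ("probe_B", "chr1", 200, 249),
          ("probe_C", "chr2", 100, 149)]
variants = {"chr1": [120, 500], "chr2": [150]}

flagged = probes_overlapping_variants(probes, variants)
kept = [p[0] for p in probes if p[0] not in flagged]
print(flagged, kept)
```

Only probe_A contains a variant, so it alone would be discarded before eQTL mapping; a variant one base past the end of probe_C does not count as an overlap.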

Table 2.5: Top 25 trans-eQTLs by LOD score in BMMs

Gene Symbol    QTL Chr  QTL Marker Pos (Mb)  QTL Marker  LOD   Probe Chr  Probe Location (Mb)
Eif4h          1        4.5                  rs13475701  2.76  5          13.5
Eri3           1        4.5                  rs13475701  2.91  4          117.2
Rabep1         1        4.5                  rs13475701  3.12  11         70.8
Crebl2         1        4.5                  rs13475701  2.91  6          134.8
Spn            1        4.5                  rs13475701  2.88  7          134.3
Arrdc3         1        11.5                 rs3722996   2.82  13         81.0
Ppap2a         1        11.5                 rs3722996   3.61  13         113.7
Ppap2a         1        11.5                 rs3722996   3.76  13         113.6
Zfand2b        1        11.5                 rs3722996   3.02  1          75.2
Eif2c3         1        11.5                 rs3722996   3.65  4          12.6
Ppap2a         1        16.4                 rs3671256   2.76  13         113.6
Rpsa           1        16.4                 rs3671256   3.15  9          12.0
Ang            1        16.4                 rs3671256   3.41  14         51.7
Cnot7          1        19.7                 rs13475750  2.78  8          41.6
Tmed3          1        19.7                 rs13475750  3.05  9          89.6
Stxbp2         1        19.7                 rs13475750  3.17  8          3.6
Ankzf1         1        19.7                 rs13475750  4.55  1          75.2
2310047M10Rik  1        19.7                 rs13475750  3.04  11         68.9
Sntb2          1        19.7                 rs13475750  2.96  8          109.5
Ankrd17        1        19.7                 rs13475750  2.76  5          90.7
Anks6          1        19.7                 rs13475750  3.31  4          47.0
Odf2           1        19.7                 rs13475750  2.94  2          29.8
Glis2          1        19.7                 rs13475750  2.75  16         4.6
Clec1a         1        19.7                 rs13475750  3.21  6          129.4

Lusis has proposed that mouse strain effects on endothelial cell (EC) function may underlie some strain effects on atherosclerosis [51]. We successfully cultured primary aortic ECs from 48 male F2 mice used in the atherosclerosis study and assayed global gene expression by microarray. As expected, these cells expressed high levels of canonical EC transcripts, such as Tie2, VEGFR, and VWF, all of which were lowly expressed in BMM. We applied the same FDR thresholds as in the macrophage analysis to identify EC cis- and trans-eQTLs. At the 5% FDR threshold, corresponding to a LOD score of 2.47, we identified 458 cis-eQTLs (Supplementary Table 4 and top 25 in Table 2.6). For trans-eQTLs, the 30% and 5% FDR thresholds corresponded to LOD scores of 2.70 and 3.92, with 4894 and 365 trans-eQTLs identified, respectively (Supplementary Table 5 and top 25 in Table 2.7).

Table 2.6: Top 25 cis-eQTLs by LOD score in endothelial cells

Gene Symbol    QTL Chr  QTL Marker Position (Mb)  QTL marker       LOD
Thumpd1        7        107.4                     rs3709679        31.7
Mod1           9        90.5                      gnf09.087.298    29.2
Mff            1        85.9                      rs3723062        28
Paip1          13       115.1                     rs13482035       27.5
Ercc5          1        44.7                      CEL-1_44668113   24.3
Lrrc57         2        117.1                     rs13476723       24.1
G430022H21Rik  3        122.0                     rs3659836        23.1
Aqr            2        111.1                     rs13476698       22.2
Atpbd3         7        33.6                      rs8255275        21.1
Abhd1          5        29.5                      rs13469943       18.9
Scoc           8        82.2                      rs13479863       17.1
Zfp277         12       35.0                      rs13481406       16.7
Grwd1          7        33.6                      rs8255275        16.2
Ugt1a6a        1        87.1                      UT_1_89.100476   16
2610019P18Rik  5        135.8                     rs4225536        15
Rnf41          10       122.9                     rs13480803       15
Slc25a3        10       93.3                      rs13480712       14.9
Pdxdc1         16       12.2                      rs4162800        14.8
4930455C21Rik  16       30.9                      rs4168640        14.6
Il3ra          14       7.4                       rs3150398        14.6
Tpmt           13       46.6                      rs6411274        14.5
Arid4b         13       13.0                      rs13481697       14.2
Pdxdc1         16       12.2                      rs4162800        14
BC031748       X        129.8                     rs13484031       13

Table 2.7: Top 25 trans-eQTLs by LOD score in endothelial cells.

Gene Symbol  QTL Marker     QTL Chr  QTL Marker Pos (Mb)  LOD   Probe Chr  Probe Location (Mb)
Tob2         rs13480854     11       7.5                  7.24  15         81.7
Adi1         rs6344105      12       63.0                 6.9   12         29.4
Mrgprg       rs13482407     14       114.4                6.46  7          151.0
Rb1          rs3663355      4        47.4                 6.39  14         73.6
P2ry6        rs3707067      7        86.0                 6.2   7          108.1
Slc10a7      rs13477617     4        27.1                 5.99  8          81.2
Trp53bp1     rs13482418     15       3.5                  5.94  2          121.0
Mgst3        rs6279930      1        137.3                5.92  1          169.3
Fam168a      rs13479324     7        57.9                 5.89  7          108.0
Eml1         rs6393948      11       103.3                5.84  12         109.7
Mrvi1        rs13476928     2        174.2                5.82  7          118.0
Tprn         rs3669563      9        117.9                5.78  2          25.1
Cdk2ap2      rs6290836      14       9.1                  5.71  19         4.1
Spcs3        gnf18.051.412  18       53.6                 5.64  8          55.6
Trap1        rs3699561      1        131.0                5.54  16         4.0
Eral1        gnf04.123.367  4        126.4                5.5   11         77.9
BC013529     rs13477617     4        27.1                 5.47  10         7.5
Tcf20        rs13483085     17       67.0                 5.43  15         82.6
Prpsap2      rs13482673     15       82.6                 5.42  11         61.5
Rin3         rs3721056      9        71.3                 5.38  12         103.6
Dhcr24       rs3664408      2        161.4                5.37  4          106.3
Nxt1         rs3663950      4        134.2                5.36  2          148.5
Rtn2         rs13476928     2        174.2                5.34  7          19.9
Mlst8        rs13482035     13       115.1                5.33  17         24.6

Figure 2.3: Replicated cis-eQTL. Example of a cis-eQTL replicated between tissues (A and C) and between studies (A and B) for Sys1, a gene encoding an integral golgi associated membrane protein. Means +/- SEM are shown adjacent to the individual values. P-values and R-squared values were obtained by linear regression with sex as an additive covariate.

In order to evaluate cross tissue eQTLs, we counted the number of cis-eQTLs (5% FDR) and trans-eQTLs (30% FDR) that were found in both macrophages and ECs. We identified 156 cis-eQTLs (Supplementary Table 5) common to both tissues, although our power was limited by the relatively small number of EC samples. We identified 12 cross-tissue cis-eQTLs that were located in the three replicated Ath QTLs on chromosomes 2, 15, and 17 (Table 2.8). An example of a cross-tissue cis-eQTL within an Ath QTL interval is Sys1, coding for the golgi-localized integral membrane protein homolog (Figure 2.3). In contrast to the 156 cross tissue cis-eQTLs, there were only 18 trans-eQTLs at the 30% FDR threshold that overlapped between the two tissues (Table 2.9). A replicated trans-eQTL is defined as one in which the trans-eQTL markers map within 10 Mb of each other. Upon inspection, it appears that three of these cross tissue trans-eQTLs on chromosome 7 may in fact be cis-eQTLs, as the gene and the markers were both on chromosome 7 and only slightly farther apart than the 20 Mb cutoff used to classify cis-eQTLs. The small number of cross-tissue trans-eQTLs has been noted in prior studies [65]. One of these cross tissue trans-eQTLs mapped to the Ath22 interval on chr 15 and was associated with the expression of the Klf2 transcription factor on chromosome 8.
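The replication rule for trans-eQTLs (same transcript, markers within 10 Mb of each other) can be sketched as a lookup between two result sets. The data structure and function name below are assumptions; the two ILMN entries use the BMM and EC marker positions listed in Table 2.9, and the third entry is hypothetical.

```python
def replicated_eqtls(set_a, set_b, max_gap_mb=10.0):
    """Return probe IDs with an eQTL in both sets at nearby markers.

    set_a, set_b: dict mapping probe_id -> (marker chromosome, marker position in Mb).
    """
    shared = []
    for probe_id, (chrom_a, mb_a) in set_a.items():
        if probe_id in set_b:
            chrom_b, mb_b = set_b[probe_id]
            if chrom_a == chrom_b and abs(mb_a - mb_b) <= max_gap_mb:
                shared.append(probe_id)
    return sorted(shared)

bmm = {"ILMN_2604029": ("15", 34.0),        # Klf2
       "ILMN_3038459": ("19", 21.0),        # Morf4l1
       "hypothetical_probe": ("4", 10.0)}
ec = {"ILMN_2604029": ("15", 41.0),
      "ILMN_3038459": ("19", 30.0),
      "hypothetical_probe": ("4", 60.0)}

shared = replicated_eqtls(bmm, ec)
print(shared)
```

The hypothetical probe has an eQTL on the same chromosome in both tissues, but the markers are 50 Mb apart, so it is not counted as replicated.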

Table 2.8: cis-eQTLs that are found in both ECs and BMMs at < 5% FDR that also reside within the AthQTLs

Probe ID      Gene Symbol  QTL Chrom  BMM eQTL position (Mb)  EC eQTL position (Mb)  BMM LOD  EC LOD
ILMN_2674425  Sys1         2          163.2                   163.2                  30.45    10.91
ILMN_1216029  Cstf1        2          174.2                   174.2                  3.87     3.67
ILMN_2869312  Fbxo4        15         5.7                     4.2                    7.06     10.81
ILMN_3123120  Rnps1        17         24.1                    15.2                   55.9     11.24
ILMN_2601877  Brd4         17         34.3                    34.3                   2.97     4.6
ILMN_2810539  H2-Gs10      17         34.3                    39.8                   27.72    9.78
ILMN_2691360  Mrps10       17         39.8                    43.9                   6.98     3.99
ILMN_1256171  Tmem63b      17         43.9                    43.9                   16.68    8.49
ILMN_2734045  Mrpl14       17         45.5                    35.1                   20.28    5.97
ILMN_2804487  Aif1         17         45.5                    39.8                   2.62     3.49
ILMN_2761876  Fez2         17         62.1                    72.7                   3.69     3.25
ILMN_3090123  Dync2li1     17         82.1                    87.5                   22.79    6.35

Table 2.9: trans-eQTLs that are found in both endothelial cells (EC) and bone marrow macrophages (BMM) at < 30% FDR

ProbeID       Gene Symbol    QTL Chrom  BMM eQTL position (Mb)  EC eQTL position (Mb)  BMM LOD  EC LOD  Probe Chr  Probe Position (Mb)
ILMN_2759167  Gtpbp5         1          153                     160                    4.6      2.9     2          180
ILMN_2629375  1110038F14Rik  3          42                      42                     3.3      4.03    15         77
ILMN_2740285  Fancl          3          127                     132                    2.8      2.8     11         26
ILMN_2602837  Akr1e1         4          142                     134                    28.6     2.8     13         5
ILMN_1245307  Fbln2          5          40                      40                     3        2.7     6          91
ILMN_2685329  Hspg2          5          40                      40                     2.8      2.8     4          137
ILMN_2622057  Tsen2          5          117                     112                    3.1      4.7     6          116
ILMN_2791578  Gspt1          5          132                     136                    3.1      2.9     16         11
ILMN_1255175  Unc45a         7          67                      65                     6        3.3     7          87
ILMN_2893879  Gdpd3          7          111                     112                    16.1     5.2     7          134
ILMN_2689056  Cd2bp2         7          112                     112                    6.1      5       7          134
ILMN_2751354  Pde4dip        8          84                      86                     4.1      3.1     3          98
ILMN_2604029  Klf2           15         34                      41                     3.1      3.1     8          75
ILMN_1256434  Pigk           15         72                      72                     3        3.8     3          152
ILMN_2836924  Wdr45          15         96                      93                     2.9      2.7     20         7
ILMN_3114998  Zfp658         15         96                      93                     3.4      3.7     7          51
ILMN_2672778  Abhd1          17         67                      70                     3.1      3.1     5          31
ILMN_3038459  Morf4l1        19         21                      30                     2.8      2.7     9          90

Table 2.10: Summary statistics and replication of bone marrow macrophage cis and trans-eQTLs for the prior and new crosses using the restricted set of common probes.

                                                cis-eQTLs  trans-eQTLs
# of eQTLs (5% FDR) prior cross                 738        281
# of eQTLs (5% FDR) new cross                   482        274
# of eQTLs common between old and new (5%)      265        5
# of eQTLs common between old and new (30%)     ND         23
# of eQTLs in combined analysis (5% FDR)^1      783        703
# of eQTLs in combined analysis (30% FDR)^1     ND         3158

^1 combined sex eQTL analysis in both crosses using sex as an additive covariate

2.3.4 Macrophage eQTL replication between different crosses and different platforms

A macrophage eQTL study was performed in the previous AKR x DBA/2 F2 intercross; however, different genetic markers and different expression array platforms were used. In order to examine replication of macrophage eQTLs between the current and prior study, we re-analyzed the prior data by imputing to the currently used 599 strain specific SNPs and mapping the Affymetrix gene expression probes to the currently used Illumina probes. After filtering out probes not mapped to the Illumina platform or those that were excluded in our new cross, only 5678 probes remained for analysis. We then performed the eQTL analysis of the prior dataset using sex as an additive covariate and obtained cis- and trans-eQTLs at the same FDRs as described above (summary statistics in Table 2.10). Of the 738 and 482 cis-eQTLs identified in the prior and new crosses, respectively, 265 were replicated, representing 36% and 55% of the input cis-eQTLs in the old and new cross, respectively (Figure 2.4, Supplementary Table 7). The cis-eQTL replication percentage range (36 to 55%) in our study is somewhat lower than that of a previously published replication study by van Nas et al., which found a cis-eQTL replication rate of around 50-60% [65]. However, van Nas et al. used the same platform and genotyping markers across their two studies, while we used separate platforms. In addition, van Nas et al. probably overestimated the replication rate, as they did not remove probes containing strain polymorphic SNPs as we did in our study, and we showed above that inclusion of the strain-polymorphic probes leads to strong false positive eQTLs. Sys1 not only had a cross-tissue cis-eQTL, it is also an example of a cross-study replicated cis-eQTL in BMM (Figure 2.3). The SNP rs3671849, within the Ath28 locus, displayed a strong additive effect on the expression of Sys1, with the DBA/2 allele expressed higher. This marker was associated with 51% and 42% of the variation in BMM Sys1 gene expression in the new and prior crosses, respectively, and 63% of the variation in EC Sys1 gene expression.
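The cis-eQTL replication percentages quoted above follow directly from the counts in the text: the number of replicated eQTLs divided by the number found in each cross. A short worked check, with a hypothetical helper name:

```python
def replication_rate(n_replicated, n_input):
    """Percentage of input eQTLs that were also found in the other cross."""
    return 100.0 * n_replicated / n_input

prior_rate = replication_rate(265, 738)   # replicated / cis-eQTLs in the prior cross
new_rate = replication_rate(265, 482)     # replicated / cis-eQTLs in the new cross
print(round(prior_rate), round(new_rate))
```

Because the denominator differs between crosses, the same 265 shared cis-eQTLs yield two different rates, which is why the text reports a range rather than a single value.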

Figure 2.4: Combined cis-overlap. Venn diagram of the overlap between the cis-eQTLs in the new cross and the old cross. Transcripts were limited to those that were called present in both crosses and had a corresponding probe on both platforms.

There were only 5 trans-eQTLs that replicated between the two crosses, or 0.9% and 0.6% of the old cross and new cross trans-eQTLs, respectively, at a 5% FDR LOD score cutoff (Table 2.11). The LOD plots and allele effects on gene expression for the Lamb2 gene, which had a replicated trans-eQTL, are shown in Figure 2.5. Relaxing the FDR to 30% in both crosses resulted in 23 trans-eQTLs that replicated between the studies, or 6% and 4% of the old cross and new cross trans-eQTLs, respectively. This is lower than the 19% trans-eQTL replication rate observed by van Nas et al.; however, the same caveats apply to our analysis concerning our use of two expression array and SNP platforms [65].

Figure 2.5: Replicated trans-eQTL. An example of a replicating trans-eQTL on chromosome 4 for the Lamb2 gene residing on chromosome 9.

46 Table 2.11: Replicated trans-eQTL between crosses at the %5 FDR level

| Prior Probe ID | New Probe ID | QTL Chr | Prior QTL Marker Position (Mb) | New QTL Marker Position (Mb) | Prior LOD Score | New LOD Score | Gene Symbol | Probe Chr | Probe Position (Mb) |
|---|---|---|---|---|---|---|---|---|---|
| 1416513_at | ILMN_2699488 | 1 | 11.3 | 4.5 | 5.0 | 6.4 | Lamb2 | 9 | 108.4 |
| 1419423_at | ILMN_2737368 | 13 | 74.5 | 63.9 | 6.3 | 10.7 | Stab2 | 10 | 86.3 |
| 1437470_at | ILMN_2780759 | 1 | 169.2 | 188.4 | 7.1 | 4.1 | Pknox1 | 17 | 31.7 |
| 1448609_at | ILMN_2493175 | 1 | 8.1 | 13.0 | 13.9 | 5.1 | Tst | 15 | 78.2 |
| 1451343_at | ILMN_1240149 | 8 | 44.5 | 43.9 | 5.8 | 4.1 | Vps36 | 8 | 23.4 |

As an alternative to examining replication of eQTLs, we combined the data from both F2 cohorts and performed a combined analysis of cis- and trans-macrophage eQTLs using gender and cross as additive covariates. The combined method has more power to identify eQTLs than the replication method because it uses a larger sample size and is thus not penalized by a near-miss false negative result in one of the two crosses. In the combined analysis there were 783 cis-eQTLs at a 5% FDR threshold (Supplementary Table 8). An example of a significant cis-eQTL found in the combined analysis, but not in the replicated analysis, is an eQTL for Wdr70, a WD40 repeat adapter protein. In the combined analysis there were 160 cis-eQTLs that were not found in either single-cross analysis. Furthermore, there were 703 and 3158 trans-eQTLs at the 5% and 30% FDR thresholds in the combined analysis, respectively (Supplementary Table 9).

We systematically searched for replicated eQTLs within the Ath QTL regions in order to identify potential atherosclerosis modifier candidate genes. In total there were 14 genes that met these criteria, and for each we determined the correlation of macrophage gene expression and lesion area within the F2 mice of the prior and current crosses. Twelve of these correlations had conserved directions in the two crosses (Table 2.12). At the Ath28 locus on chromosome 2, we identified two replicated macrophage cis-eQTLs, of which Sys1 may have a connection to cholesterol ester metabolism. Sys1, whose expression was positively associated with lesion area, is a Golgi-localized integral membrane protein that is essential for the targeting of several proteins to the Golgi complex and membrane vesicles [66], including the small GTPases Arl3p and Arfrp1. Deletion of Arfrp1 results in loss of lipid droplet formation in adipocytes [67]; lipid droplets in macrophages store cholesterol esters, and thus may play an important role in modifying atherosclerosis.

At the Ath26 locus on chromosome 17, there were 9 replicated eQTLs with a shared direction of lesion area correlation, two of which have some prior link to atherosclerosis. Prss22 is a serine protease that converts pro-urokinase-type plasminogen activator into its enzymatically active form, abbreviated as uPA [68]. We found that Prss22 expression was inversely correlated with atherosclerosis, thus we would predict that uPA activity may also be inversely correlated with atherosclerosis. However, this is not the case, as prior studies have shown that macrophage expression of uPA is positively associated with atherosclerosis in apoE-deficient mice [69][70]. Ltb, encoding lymphotoxin beta (a member of the tumor necrosis factor gene family), resides in the Ath26 locus, and its expression was positively correlated with lesion area. Lymphotoxin beta receptor signalling in the arterial media beneath atherosclerotic plaques has been found to promote tertiary lymphoid organogenesis [71]. In addition, circulating levels of lymphotoxin beta receptor in humans were positively associated with coronary artery calcium scores [72]. However, it is difficult to interpret whether these findings are relevant to our observed correlation of macrophage Ltb expression and lesion area. None of the other replicated eQTLs at the Ath loci had obvious known connections to pathways implicated in atherosclerosis.

Table 2.12: Replicated cis-eQTL within replicated Ath QTL intervals that have replicated direction of expression-lesion correlation

| Illumina Probe ID | Affy Probe ID | Gene Symbol | QTL chr | Expression-lesion correlation, new cross¹ | Expression-lesion correlation, old cross¹ |
|---|---|---|---|---|---|
| ILMN_2674425 | 1450057_at | Sys1 | 2 | 0.06 | 0.18 |
| ILMN_1216029 | 1448597_at | Cstf1 | 2 | -0.16 | -0.13 |
| ILMN_2710121 | 1416441_at | Pgcp | 15 | -0.19 | -0.12 |
| ILMN_2688287 | 1420352_at | Prss22 | 17 | -0.04 | -0.18 |
| ILMN_2615207 | 1418321_at | Eci1 | 17 | -0.29 | -0.15 |
| ILMN_1219908 | 1418344_at | Tmem8 | 17 | 0.11 | 0.18 |
| ILMN_1218891 | 1419547_at | Fahd1 | 17 | -0.06 | -0.13 |
| ILMN_2667889 | 1417173_at | Atf6b | 17 | 0.02 | 0.13 |
| ILMN_1241923 | 1449537_at | Msh5 | 17 | -0.18 | -0.26 |
| ILMN_2726308 | 1449021_at | Rpp21 | 17 | -0.26 | -0.10 |
| ILMN_1258283 | 1419135_at | Ltb | 17 | 0.16 | 0.16 |
| ILMN_2761876 | 1434348_at | Fez2 | 17 | -0.08 | -0.07 |

¹ Pearson’s correlation R value.
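The direction-of-correlation criterion behind Table 2.12 can be illustrated with a short sketch. The R values below are copied from the table; the helper names (`pearson_r`, `same_direction`) are hypothetical and the sign test is only an assumption about how "conserved direction" was scored, not the analysis code used in this work.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between expression values x and lesion areas y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def same_direction(r_new, r_old):
    """True when both correlations are non-zero and share a sign (assumed criterion)."""
    return r_new * r_old > 0

# Expression-lesion R values from Table 2.12: gene -> (new cross, old cross).
correlations = {
    "Sys1": (0.06, 0.18), "Cstf1": (-0.16, -0.13), "Pgcp": (-0.19, -0.12),
    "Prss22": (-0.04, -0.18), "Eci1": (-0.29, -0.15), "Tmem8": (0.11, 0.18),
    "Fahd1": (-0.06, -0.13), "Atf6b": (0.02, 0.13), "Msh5": (-0.18, -0.26),
    "Rpp21": (-0.26, -0.10), "Ltb": (0.16, 0.16), "Fez2": (-0.08, -0.07),
}

concordant = [gene for gene, (r_new, r_old) in correlations.items()
              if same_direction(r_new, r_old)]
positive = [gene for gene in concordant if correlations[gene][0] > 0]
print(len(concordant), "genes with conserved direction; positive:", positive)
```

In a real analysis `pearson_r` would be applied per gene to the F2 expression and lesion-area vectors of each cross before the sign comparison.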

Chapter 3

Transcriptome Analysis of Genes Regulated by Cholesterol Loading in Two Strains of Mouse Macrophages Associates Lysosome Pathway and ER Stress Response with Atherosclerosis Susceptibility

As published in: Berisha S¹, Hsu J¹, Robinet P, Smith JD. Transcriptome Analysis of Genes Regulated by Cholesterol Loading in Two Strains of Mouse Macrophages Associates Lysosome Pathway and ER Stress Response with Atherosclerosis Susceptibility. PLoS ONE 2013 In Press

¹Co-first authors

3.1 Introduction

Atherosclerosis is a complex and progressive disease of arteries that can be initiated by the accumulation and entrapment of lipoproteins in the extracellular matrix of the sub-endothelial intima layer [73][74]. Monocytes migrate across the endothelial monolayer into the intimal layer, where they differentiate into macrophages and take up modified low density lipoproteins by scavenger receptors [75][76]. Once inside the cells, modified LDL is degraded in lysosomes, depositing free cholesterol in the lysosomes that is transported to the endoplasmic reticulum for conversion into cholesterol esters and storage in lipid droplets [77][78]. Cholesterol esters inside the lipid droplets can be hydrolyzed back into free cholesterol; recently, this hydrolysis has been shown to be mediated via autophagy delivering cholesterol esters to lysosomal acid lipase [79]. Free cholesterol can be exported out of the cells through passive diffusion and/or protein-mediated efflux via ATP-binding cassette transporters ABCA1, ABCG1 or scavenger receptor SR-BI [80][81][82][83][84]. Macrophages cannot limit their uptake of cholesterol, as the expression of scavenger receptors is not subject to feedback regulation by cellular cholesterol content [85][86]. An imbalance between cholesterol influx and efflux leads to excessive accumulation of cholesterol in macrophages, transforming them into cholesterol-engorged foam cells characteristic of fatty streaks [87][88][89]. The genetic background strain has been shown to affect atherosclerotic lesion area in hyperlipidemic ApoE-/- mice [42]. We have previously shown that chow-diet fed DBA/2 ApoE-/- mice have aortic root lesion areas that are 10-fold larger than those in AKR ApoE-/- mice at 16 weeks of age [60].

To gain insight on the effects of cholesterol loading on macrophage gene expression and its relation to atherosclerosis susceptibility, we incubated bone-marrow derived macrophages from atherosclerosis resistant AKR ApoE-/- and atherosclerosis susceptible DBA/2 ApoE-/- mice with acetylated LDL (AcLDL). We found that DBA/2 cells accumulate more total and esterified cholesterol, but less free cholesterol, than the AKR macrophages. We investigated the effect of cholesterol on the macrophage transcriptome and identified transcripts that were differentially regulated by strain or cholesterol loading, and those transcripts in which there was a strain-loading interaction effect. Gene set enrichment analysis showed that some of the identified transcripts are involved in the lysosome pathway, and we found several transcripts that play a role in the endoplasmic reticulum stress response. We hypothesized that some of these differentially regulated transcripts could be responsible for the observed difference in atherosclerosis susceptibility between the AKR and DBA/2 strains. We identified differentially expressed transcripts as possible atherosclerosis modifier candidate genes residing within atherosclerosis QTLs previously characterized in a strain intercross between these two strains.

3.2 Methods

3.2.1 Mice

Age matched AKR and DBA/2 ApoE-/- mice were maintained on chow diet and sacrificed at 16 weeks of age. Plasma total cholesterol levels at time of sacrifice were 372.8 ± 47.78 mg/dl and 912.7 ± 104.3 mg/dl for AKR ApoE-/- and DBA/2 ApoE-/- mice, respectively. Femurs were collected to isolate and culture bone marrow monocytes.

3.2.2 Total, free, and esterified cholesterol quantification

Bone marrow derived macrophages (BMM) from AKR and DBA/2 ApoE-/- mice were cultured in p100 cell culture dishes for 13 days with 15% L-cell conditioned media, a source of macrophage colony stimulating factor (M-CSF). We performed control studies to compare BMM cholesterol loading by 24 h incubation with 50 µg/ml AcLDL, copper oxidized LDL, or native LDL; we only observed robust cholesterol loading with AcLDL at this dose, justifying its use in our subsequent studies. The cells were loaded for 48 h with 50 µg/ml AcLDL to allow foam cell formation, with unloaded cells used as controls. Lipids were extracted and total, free and esterified cholesterol levels were quantified and normalized to cell protein as previously described [90]. Quadruplicate samples were assayed in triplicate and significance was determined by ANOVA with Newman-Keuls posttest using GraphPad Prism software.
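The normalization step can be sketched as follows. This is a minimal illustration that assumes esterified cholesterol is obtained as total minus free cholesterol and that all values are expressed per mg of cell protein; the assay details are those of reference [90] and are not reproduced here. The function name and the example numbers are hypothetical.

```python
def cholesterol_fractions(total_ug, free_ug, protein_mg):
    """Return total, free and esterified cholesterol in ug per mg cell protein.

    Assumption of this sketch: esterified cholesterol = total - free.
    """
    if protein_mg <= 0:
        raise ValueError("protein mass must be positive")
    esterified_ug = total_ug - free_ug
    return {
        "total": total_ug / protein_mg,
        "free": free_ug / protein_mg,
        "esterified": esterified_ug / protein_mg,
    }

# Hypothetical dish: 30 ug total and 12 ug free cholesterol, 0.5 mg protein.
example = cholesterol_fractions(total_ug=30.0, free_ug=12.0, protein_mg=0.5)
print(example)
```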

3.2.3 Loading of macrophages with acetylated LDL for transcriptome profiling

Differentiated BMM in p100 dishes from AKR and DBA/2 ApoE-/- mice were incubated for 24 hours in the presence or absence of 50 (experiment 2) or 100 µg/ml (experiment 1) AcLDL in quadruplicate. Different AcLDL doses were used due to altered efficiency of cholesterol loading of independent AcLDL preparations. After the incubation, the media and cholesterol were aspirated off, and cells were washed twice, scraped in 1 ml ice-cold PBS and transferred to microcentrifuge tubes. Cell pellets were obtained by centrifugation in a microcentrifuge at 5,000 rpm for 2 minutes at RT and kept on ice for subsequent use.

3.2.4 Isolation of total RNA from BMM cell pellets

Total RNA was isolated from cell pellets using the RNeasy Mini Kit (Qiagen, Valencia, CA), following the manufacturer’s protocol. On-column digestion with RNase-free DNase (Qiagen, Valencia, CA) was performed to remove genomic DNA. DNase was removed in the subsequent washing steps. RNA integrity was tested by overnight incubation of 200-500 ng of total RNA at 37°C and observation of the 18S and 28S ribosomal RNA bands on a 1.2% agarose/ethidium bromide gel.

3.2.5 Hybridization and detection of gene transcripts

An aliquot of total RNA for each sample (~2 µg) was submitted to the Genomic Core at Lerner Research Institute. Total RNA was used as template to synthesize single stranded cDNA following the Illumina protocol. Single stranded cDNA was then converted into double stranded cDNA, purified and in vitro transcribed (in the presence of biotinylated UTP and CTP) to produce biotin-labeled cRNA. Purified labeled cRNA from experiment 1 and 2 samples was hybridized to MouseRef-8 v1 and Ref-8 v2 microarray chips (Illumina), respectively, for 16 hours at 58°C, according to the manufacturer’s instructions. Eight samples were profiled on one chip with either 24,613 or 25,697 transcript probes per sample. To reduce the chip-to-chip variability, two control and two cholesterol-loaded samples from each strain were put on each microarray chip. After hybridization, the arrays were washed several times and stained with streptavidin-Cy3. After the staining, the arrays were washed and scanned on the Illumina BeadArray Reader using Illumina BeadScan image data acquisition software.

3.2.6 Microarray data analysis

Expression data were quantile normalized and log2 transformed using the R package lumi [91]. 1,342 unique probes that hybridized to transcript sequences containing a strain-specific polymorphism or an indel (insertion/deletion) were excluded from further gene expression-related analysis. Sequence variations between AKR and DBA/2 (SNPs and insertions-deletions) were obtained from sequencing data generated by the Mouse Genomes Project: http://www.sanger.ac.uk/resources/mouse/genomes/. Probes were also removed if no sample had a detection p-value less than 0.01. 9,316 and 11,919 probes remained for further analysis in experiments 1 and 2, respectively. To determine strain effects on the transcriptome, only the results of unloaded bone marrow macrophages were analyzed. Log2 fold changes, unadjusted p-values, and false discovery rate adjusted p-values were determined separately for the two independent datasets with the limma package in R [92][93] using a fitted linear model with the following equation: $Y_i = \beta_1\,\mathrm{Strain} + \beta_2\,\mathrm{Loading} + \beta_3\,(\mathrm{Strain} \times \mathrm{Loading})$, with the strain and loading βs as additive covariates and the strain-loading interaction β as an interactive covariate. Gene set enrichment analysis was performed for all expressed transcripts to identify possible pathways altered by strain, loading and strain-loading interactions. The romer function in the limma package was used to test for gene set enrichment [94][95] with the Molecular Signature Database from the Broad Institute as gene sets [96]. We present and discuss further only the pathways that were significantly enriched using the permutation test p-value threshold of 1.0E-04. The MIAME compliant microarray dataset from this study is available through the Gene Expression Omnibus server, accession number GSE38736.
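To make the strain, loading and interaction terms of this model concrete, the following sketch fits the model for a single probe. It is not limma: it omits the empirical Bayes moderation, adds an intercept, and assumes 0/1 indicator coding (Strain = 1 for DBA/2, Loading = 1 for AcLDL-loaded), under which the least-squares coefficients of this saturated 2x2 design reduce to differences of group means. The function name and expression values are hypothetical.

```python
from statistics import mean

def fit_two_by_two(akr_unloaded, akr_loaded, dba_unloaded, dba_loaded):
    """Least-squares coefficients for one probe in a saturated 2x2 design.

    Coding (an assumption of this sketch): Strain = 1 for DBA/2, 0 for AKR;
    Loading = 1 for AcLDL-loaded, 0 for unloaded cells.
    """
    m00, m01 = mean(akr_unloaded), mean(akr_loaded)
    m10, m11 = mean(dba_unloaded), mean(dba_loaded)
    return {
        "intercept": m00,                          # AKR unloaded baseline
        "strain": m10 - m00,                       # strain effect in unloaded cells
        "loading": m01 - m00,                      # loading effect in AKR cells
        "interaction": (m11 - m10) - (m01 - m00),  # difference in loading response
    }

# Hypothetical log2 expression values, four replicates per group.
coef = fit_two_by_two(
    akr_unloaded=[8.0, 8.2, 7.9, 8.1],
    akr_loaded=[10.1, 10.0, 9.9, 10.2],
    dba_unloaded=[9.0, 9.1, 8.9, 9.2],
    dba_loaded=[9.6, 9.5, 9.4, 9.7],
)
print(coef)
```

A negative interaction coefficient, as in this toy probe, corresponds to a transcript whose induction by loading is weaker in DBA/2 than in AKR cells.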

3.2.7 Real-Time quantitative PCR (qPCR)

To validate the findings of gene transcript measurements by microarray, we assessed the expression of selected gene transcripts using a qPCR approach. The expression levels of tribbles homolog 3 (Trib3), activating transcription factor 4 (Atf4), ceroid-lipofuscinosis neuronal 5 (Cln5), and ATP-binding cassette sub-family G member 1 (Abcg1) were quantified relative to the endogenous control gene, ubiquitin C (Ubc), using pre-designed TaqMan gene expression assays (Applied Biosystems). Mean fold changes for each sample were calculated as previously described [97].
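As an illustration of relative quantification against an endogenous control, the sketch below uses the common 2^-ΔΔCt calculation. This is an assumption: the exact calculation used here is the one described in reference [97], which may differ. The function name and Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt method (assumed, not confirmed by the text)."""
    d_ct_treated = ct_target_treated - ct_ref_treated    # normalize to reference gene
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control                  # treated relative to control
    return 2.0 ** (-dd_ct)

# Hypothetical Ct values: target Trib3, reference Ubc,
# cholesterol-loaded (treated) versus unloaded (control) cells.
fc = fold_change_ddct(ct_target_treated=24.0, ct_ref_treated=20.0,
                      ct_target_control=27.0, ct_ref_control=20.0)
print(fc)
```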

3.2.8 Western blot

Unloaded and cholesterol loaded BMM from AKR and DBA/2 were lysed with 0.5 mL of M-PER mammalian protein extraction reagent (Pierce). Cell lysates (45 µg protein/lane) were loaded, separated on a 4–15% gradient polyacrylamide gel and transferred to PVDF membranes by semi-dry electroblotting. After blocking with rapid blocking buffer (Amresco, Cat # M325) for 1 hour, the membrane was incubated overnight with rabbit antibody to TRIB3 (Proteintech, Cat # 13300) at 4°C. The membrane was washed and further incubated with HRP goat anti-rabbit IgG (Amresco, Cat # N791) secondary antibody for 1 hour. The membrane was then exposed to an enhanced chemiluminescent system, and bands were visualized by exposure to X-ray film. After stripping, GAPDH protein was visualized as loading control using goat antibody to GAPDH (Abcam, Cat # ab9483), followed by HRP rabbit anti-goat IgG as described above. TRIB3 band density was quantified using Total Lab TL120 software (Nonlinear Dynamics) and normalized to the respective GAPDH band density.

3.3 Results and Discussion

3.3.1 AKR and DBA/2 macrophages respond differently to cholesterol loading

Bone marrow derived macrophages of the two strains were incubated with 50 µg/ml AcLDL for 48 hours. We observed that macrophages from the atherosclerosis susceptible DBA/2 strain had significantly higher levels of total cholesterol (p-value < 0.0001) and cholesterol esters (p-value < 0.0001) compared to atherosclerosis resistant AKR macrophages (Figure 3.6A and 3.6B). In contrast, AcLDL-loaded DBA/2 macrophages had significantly lower levels of free cholesterol (p-value < 0.05, Figure 3.6C). We have recently found that the strain difference in cholesterol ester metabolism is due to reduced cholesterol ester turnover in DBA/2 cells, which we attribute to decreased autophagic flux; autophagy is the primary mechanism responsible for the degradation of cholesterol esters in lipid droplets [98]. The current study focuses on the transcriptome differences between macrophages from the DBA/2 and AKR strains and the changes in response to cholesterol loading in these macrophages. Our objective was to identify possible genes and pathways that may play a role in the observed strain-specific differences in response to cholesterol loading and atherosclerosis.

3.3.2 Hierarchical clustering

Bone marrow derived macrophages from mice of both strains were incubated with or without AcLDL for 1 day (n=4 per group, total of 8 groups) in two independent experiments. Total RNA was applied to Illumina expression arrays and analyzed as described in Materials and Methods. Hierarchical clustering analysis was performed on all 32 samples (Figure 3.1). The two independent experiments were assayed on different versions of the Illumina array, using cells from different sexes with different batches of AcLDL, and were treated and analyzed in different years. Based on the observed clustering profile of these two groups, we decided not to pool the samples, and thus we analyzed the expression data from the two experiments separately for strain, loading and strain-loading interaction. Furthermore, we used the two studies to identify the overlapping group of genes and pathways that consistently respond to cholesterol loading in a strain-shared or strain-specific fashion.
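The way a batch or platform effect shows up in sample clustering can be sketched with a small agglomerative clustering routine. The text does not state the distance metric or linkage used, so Euclidean distance and average linkage are assumptions, and the sample names and expression values are hypothetical.

```python
from itertools import combinations
from math import dist

def average_linkage(samples, n_clusters):
    """Agglomerative clustering with average linkage on Euclidean distance.

    samples: dict mapping sample name -> list of log2 expression values.
    Returns a sorted list of clusters, each a sorted list of sample names.
    """
    clusters = [[name] for name in samples]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average pairwise distance between members of the two clusters.
            d = sum(dist(samples[x], samples[y])
                    for x in clusters[a] for y in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)

# Hypothetical profiles: experiment 2 carries a platform/batch offset.
toy = {
    "exp1_AKR_ctrl": [8.0, 6.0, 7.0], "exp1_AKR_load": [8.3, 6.4, 7.1],
    "exp1_DBA_ctrl": [8.1, 6.9, 7.2], "exp1_DBA_load": [8.4, 7.0, 7.4],
    "exp2_AKR_ctrl": [10.0, 4.1, 9.0], "exp2_AKR_load": [10.2, 4.4, 9.2],
    "exp2_DBA_ctrl": [10.1, 4.8, 9.1], "exp2_DBA_load": [10.5, 5.0, 9.3],
}
groups = average_linkage(toy, n_clusters=2)
print(groups)
```

When the top-level split separates the experiments rather than strain or treatment, as in this toy data, analyzing the experiments separately is the conservative choice.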

3.3.3 Strain differences on BMM transcriptome

Experiment 1 samples: 3,059 transcripts were identified as differentially expressed between AKR and DBA/2 unloaded macrophages at a stringent false discovery rate (FDR) adjusted p-value < 0.01 in experiment 1 samples. Table 3.1 shows the top 10 most significant differentially expressed transcripts, while the entire list is shown in Table S1. 229 differentially expressed transcripts were found with a 2- or higher fold change in expression between the two strains, of which 137 transcripts were expressed higher in DBA/2 and 92 transcripts were expressed higher in AKR macrophages.

Figure 3.1: Hierarchical clustering analysis of the 32 samples included in the study. Four replicates each for control and cholesterol loaded macrophages from the AKR and DBA/2 strains from both independent experiments were included (loaded; unloaded samples).

Experiment 2 samples: A similar analysis was performed on the experiment 2 samples and 1,703 transcripts were found to be differentially expressed between AKR and DBA/2 macrophages (FDR adjusted p-value < 0.01; Table 3.1 top ten and Table S2 for all results). Of the 418 differentially expressed transcripts with a 2- or higher fold change, 220 transcripts were expressed higher in DBA/2 and 198 were expressed higher in AKR macrophages.
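The two-step call used above (an FDR threshold, then a 2-fold change filter split by direction) can be sketched as follows. The Benjamini-Hochberg adjustment is a stand-in chosen for illustration; the adjusted p-values reported in this chapter come from the limma analysis and may have been derived differently. Probe names and values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def call_strain_differences(results, fdr=0.01, min_abs_log2fc=1.0):
    """results: list of (probe, log2 fold change DBA/2 vs AKR, raw p-value)."""
    adj = benjamini_hochberg([p for _, _, p in results])
    significant = [(probe, lfc)
                   for (probe, lfc, _), q in zip(results, adj) if q < fdr]
    # A log2 fold change of 1 corresponds to a 2-fold difference.
    higher_dba = [p for p, lfc in significant if lfc >= min_abs_log2fc]
    higher_akr = [p for p, lfc in significant if lfc <= -min_abs_log2fc]
    return significant, higher_dba, higher_akr

# Hypothetical probes (names and values are illustrative only).
toy = [("probe_a", 3.3, 1e-6), ("probe_b", -3.1, 2e-5),
       ("probe_c", 0.4, 1e-3), ("probe_d", 0.1, 0.60)]
sig, up_dba, up_akr = call_strain_differences(toy)
print(len(sig), up_dba, up_akr)
```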

Table 3.1: Top 10 differentially expressed transcripts between AKR and DBA/2 unloaded macrophages

| Experiment | Gene Symbol | Log2 Fold Change¹ | P-Value | Adjusted P-Value² | Description |
|---|---|---|---|---|---|
| 1 | H2-D1 | 7.7 | 2.70E-23 | 2.50E-19 | Histocompatibility 2, D region locus 1 |
| 1 | H2-Q6 | 5.6 | 1.10E-20 | 5.10E-17 | Histocompatibility 2, Q region locus 6 |
| 1 | Gpr137b | 4.2 | 1.50E-19 | 4.60E-16 | G protein-coupled receptor 137B |
| 1 | Ncf2 | 6.1 | 5.00E-19 | 1.20E-15 | Neutrophil cytosolic factor 2 |
| 1 | Sfi1 | 3.7 | 7.00E-19 | 1.30E-15 | Sfi1 homolog, spindle assembly associated (yeast) |
| 1 | Psmb6 | -3.7 | 9.40E-19 | 1.30E-15 | Proteasome (prosome, macropain) subunit, beta type 6 |
| 1 | Ccl5 | -4.5 | 9.20E-19 | 1.30E-15 | Chemokine (C-C motif) ligand 5 |
| 1 | Baat1 | 3.2 | 1.80E-18 | 2.10E-15 | BRCA1-associated ATM activator 1 |
| 1 | Gpnmb | -3.7 | 2.80E-18 | 2.90E-15 | Glycoprotein (transmembrane) nmb |
| 1 | Prcp | 3.3 | 5.70E-18 | 5.30E-15 | Prolylcarboxypeptidase (angiotensinase C) |
| 2 | H2-K1 | 4.3 | 9.40E-14 | 1.10E-09 | Histocompatibility 2, K region locus 1 |
| 2 | Pilrb1 | 5.2 | 5.20E-13 | 1.40E-09 | Paired immunoglobin-like type 2 receptor beta 1 |
| 2 | H2-Q5 | 4.7 | 7.10E-13 | 1.40E-09 | Histocompatibility 2, Q region locus 5 |
| 2 | H2-Q8 | 4.6 | 2.80E-12 | 1.40E-09 | Histocompatibility 2, Q region locus 8 |
| 2 | Eif2s3y | 4.4 | 3.90E-13 | 1.40E-09 | Eukaryotic translation initiation factor 2, subunit 3, structural gene Y-linked |
| 2 | Prcp | 3.3 | 6.50E-13 | 1.40E-09 | Prolylcarboxypeptidase (angiotensinase C) |
| 2 | Lrrc57 | 3.6 | 5.00E-12 | 8.50E-09 | Leucine rich repeat containing 57 |
| 2 | Ogfrl1 | 3.1 | 6.30E-12 | 9.30E-09 | Opioid growth factor receptor-like 1 |
| 2 | Napsa | -3.1 | 7.20E-12 | 9.50E-09 | Napsin A aspartic peptidase |
| 2 | Baat1 | 3.3 | 8.10E-12 | 9.60E-09 | BRCA1-associated ATM activator 1 |

¹ Positive log2 fold change, transcripts expressed higher in DBA/2; negative log2 fold change, transcripts expressed higher in AKR macrophages.
² FDR adjusted p-value based upon permutation.

Combined results and discussion: 522 differentially expressed transcripts overlapped among the 3,059 and 1,703 significant strain differences identified in experiments 1 and 2, respectively (significance threshold set at 0.01 FDR, Table S3), making these the transcript changes in which we are most confident. Among the top ten differentially expressed transcripts, two and three were in the histocompatibility gene family in experiments 1 and 2, respectively; however, different members of this gene family were observed in the two experiments. The only gene to make it to the top ten in both experiments was Prcp, encoding prolylcarboxypeptidase (also known as angiotensinase C), expressed significantly higher in DBA/2 macrophages. Transcripts encoding transmembrane glycoprotein nmb (Gpnmb) and napsin A aspartic peptidase (Napsa) were expressed significantly higher in AKR macrophages.

Gene set enrichment analysis: To identify the common biological pathways most relevant to the genes that differ in expression between AKR and DBA/2 BMM, transcriptomes from both experiments were subjected to Gene Set Enrichment Analysis (GSEA) using KEGG pathways, and we report here only the pathways identified as significantly enriched in both. Strain effects on gene set enrichment were found for the hematopoietic cell lineage, chemokine signaling, Toll-like receptor signaling and aldosterone regulated sodium reabsorption pathways (permutation test p-value < 0.0001, Table S4). In conclusion, significant basal gene expression differences were observed between the two strains in both experiments, which need to be considered in the following analysis of gene expression changes in the AKR and DBA/2 macrophages in response to cholesterol loading.

3.3.4 Cholesterol loading effect on BMM transcriptome

To identify the differentially regulated transcripts in response to cholesterol loading, the expression data were fitted in a linear model with strain as an additive variable and strain-loading interaction as an interactive variable. This model identified transcripts whose expression was either up-regulated or down-regulated in one or both strains upon cholesterol loading.

Experiment 1 samples: 3,758 transcripts were identified as differentially expressed in response to cholesterol loading in AKR and DBA/2 macrophages at an FDR adjusted p-value < 0.01 (Table 3.2 top ten and Table S5 for all results). There were 261 differentially expressed transcripts with a 2-fold or higher change in expression upon cholesterol loading in one or both strains, of which 127 transcripts were up-regulated and 134 down-regulated.

Experiment 2 samples: A similar number of significantly differentially expressed transcripts were found in macrophages in response to cholesterol loading (3,308 transcripts; FDR adjusted p-value < 0.01; Table 3.2, Table S6). Of the 567 differentially expressed transcripts with a 2-fold or higher change, 236 transcripts were up-regulated upon cholesterol loading versus 331 that were down-regulated in one or both strains.

Table 3.2: Top 10 differentially expressed transcripts in response to cholesterol loading

| Experiment | Gene Symbol | Log2 Fold Change¹ | P-Value | Adjusted P-Value² | Description |
|---|---|---|---|---|---|
| 1 | P2ry13 | -2.8 | 7.8E-020 | 2.7E-016 | Purinergic receptor P2Y, G-protein coupled 13 |
| 1 | Clec4a3 | -2.5 | 8.7E-020 | 2.7E-016 | C-type lectin domain family 4, member A3 |
| 1 | Npy | 2.1 | 6.1E-020 | 2.7E-016 | Neuropeptide Y |
| 1 | Ms4a6c | -2.3 | 4.6E-019 | 5.9E-016 | Membrane-spanning 4-domains, subfamily A, member 6C |
| 1 | H2-Ab1 | -2.3 | 3.7E-019 | 5.9E-016 | Histocompatibility 2, class II antigen A, beta 1 |
| 1 | Hyal1 | 2.3 | 5.7E-019 | 5.9E-016 | Hyaluronoglucosaminidase 1 |
| 1 | Trib3 | 2.6 | 5.1E-019 | 5.9E-016 | Tribbles homolog 3 (Drosophila) |
| 1 | Ppap2b | 2.7 | 4.9E-019 | 5.9E-016 | Phosphatidic acid phosphatase type 2B |
| 1 | Chac1 | 2.9 | 3.6E-019 | 5.9E-016 | ChaC, cation transport regulator-like 1 (E. coli) |
| 1 | Ly6a | -2.2 | 6.7E-019 | 6.2E-016 | Lymphocyte antigen 6 complex, locus A |
| 2 | Ccr5 | -2.9 | 3.8E-016 | 2.8E-012 | Chemokine (C-C motif) receptor 5 |
| 2 | Ccr5 | -3 | 6E-016 | 2.8E-012 | Chemokine (C-C motif) receptor 5 |
| 2 | Ifit3 | -3.3 | 7.1E-016 | 2.8E-012 | Interferon-induced protein with tetratricopeptide repeats 3 |
| 2 | Fcgr1 | -2 | 1.2E-015 | 3.7E-012 | Fc receptor, IgG, high affinity I |
| 2 | Ccr5 | -2.9 | 1.8E-015 | 4.4E-012 | Chemokine (C-C motif) receptor 5 |
| 2 | P2ry14 | -1.9 | 3.2E-015 | 6.4E-012 | Purinergic receptor P2Y, G-protein coupled, 14 |
| 2 | Trib3 | 3.7 | 4.8E-015 | 8.1E-012 | Tribbles homolog 3 (Drosophila) |
| 2 | Ifit3 | -3.2 | 6.4E-015 | 9.6E-012 | Interferon-induced protein with tetratricopeptide repeats 3 |
| 2 | Ly6a | -3.9 | 7.4E-015 | 9.8E-012 | Lymphocyte antigen 6 complex, locus A |
| 2 | Vegfa | 2.7 | 9.9E-015 | 1.2E-011 | Vascular endothelial growth factor A |

¹ Positive log2 fold change, transcripts up-regulated upon cholesterol loading; negative log2 fold change, transcripts down-regulated upon cholesterol loading.
² FDR adjusted p-value based upon permutation.

Combined results and discussion: The cholesterol loading effect on gene expression was largely reproducible in both experiments, despite microarray platform differences. There were 2,475 cholesterol regulated transcripts with identical probes on the two array platforms, and of these, 1,140 transcripts were significantly regulated by cholesterol loading in both experiments (Table S7). The cholesterol-loading induced fold changes of these transcripts were also well correlated between the two experiments (R² = 0.68, Figure 3.2).
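The comparison of fold changes between the two experiments amounts to a linear regression over the shared probes. A minimal sketch, with hypothetical probe names and fold changes standing in for the 1,140 overlapping transcripts, is:

```python
def r_squared(x, y):
    """Coefficient of determination of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)

def shared_probe_fold_changes(exp1, exp2):
    """Pair log2 fold changes for probes significant in both experiments."""
    shared = sorted(set(exp1) & set(exp2))
    return shared, [exp1[p] for p in shared], [exp2[p] for p in shared]

# Hypothetical log2 fold changes (loaded vs unloaded) per experiment.
exp1 = {"g1": 1.0, "g2": 2.0, "g3": -1.0, "g4": -2.0}
exp2 = {"g1": 1.5, "g2": 2.5, "g3": -0.5, "g4": -2.5, "g5": 0.3}

shared, x, y = shared_probe_fold_changes(exp1, exp2)
r2 = r_squared(x, y)
print(shared, round(r2, 3))
```

For a simple linear regression with one predictor, this R² equals the square of the Pearson correlation between the two sets of fold changes.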

Figure 3.2: Conservation of cholesterol induced changes in macrophage gene expression in two independent experiments. Linear regression analysis of log2 fold changes of the 1,140 overlapping transcripts between the experiment 1 and experiment 2 datasets that are significantly regulated by cholesterol loading in one or both strains. P-value of linear regression < 0.0001.

Transcripts that were significantly up-regulated in response to cholesterol loading in both experiments include tribbles homolog 3 (Trib3), hyaluronoglucosaminidase 1 (Hyal1), and vascular endothelial growth factor A (Vegfa), among others. Transcripts that were significantly down-regulated in both experiments include chemokine (C-C motif) receptor 5 (Ccr5), lymphocyte antigen 6 complex, locus A (Ly6a), and interferon-induced protein with tetratricopeptide repeats 3 (Ifit3), among others. The related purinergic receptors P2ry13 and P2ry14 were also down-regulated in both experiments, with P2ry13 identified as the most significantly altered transcript in response to cholesterol loading in experiment 1, while P2ry14 was identified among the top 10 most significant transcripts in experiment 2. P2ry14 encodes a G-protein coupled receptor expressed in a subpopulation of bone-marrow hematopoietic stem cells [99]. P2ry13 has been shown to play a role in hepatic uptake of holo-HDL particles, particularly in SR-BI deficient liver cells; P2ry13 knockout mice are reported to have decreased reverse cholesterol transport [100]. However, whether P2ry13 deficiency has an effect on plasma HDL levels in mice is controversial [100][101]. The most significantly altered transcript in response to cholesterol loading in experiment 2 was chemokine (C-C motif) receptor 5 (Ccr5), which was down-regulated by this treatment. Emerging evidence from both human and mouse studies supports important role(s) played by the Ccr5 receptor and its ligand Ccl5 in atherogenesis, a detailed description of which is provided in a recent review by Jones et al. [102].

Gene set enrichment analysis: Eight pathways were significantly enriched in transcripts regulated by cholesterol loading in both experiments: lysosome, cytokine-cytokine receptor interaction, primary bile acid biosynthesis, allograft rejection, aminoacyl tRNA biosynthesis, autoimmune thyroid disease, hematopoietic cell lineage, and type I diabetes mellitus (permutation test p-value < 0.0001, Table S4). We looked more carefully at the lysosome pathway because it had the highest number of genes involved and because of the recent discovery that macrophage cholesterol ester is mobilized for efflux via autophagy, with the cholesterol ester hydrolyzed by lysosomal acid lipase [79]. The number of significant cholesterol regulated transcripts in this pathway was 43 and 45 in experiments 1 and 2, respectively (Table 3.3), with 25 in common in both experiments. GSEA allows both the few strongly differentially expressed transcripts and the many weakly differentially expressed transcripts to factor into the enrichment analysis. Thus, most of the lysosomal pathway genes in Table 3.3 are only modestly regulated by cholesterol loading. Nevertheless, many minor changes in gene expression in the same pathway may add up to a large overall effect in that pathway, in this instance lysosome function.

Table 3.3: Significantly regulated transcripts upon cholesterol loading involved in lysosome pathway, ranked by fold change (loaded/unloaded).

| Expt. 1 Gene Symbol | AKR Log2 Fold Change¹ | DBA Log2 Fold Change² | Adjusted P-Value | Expt. 2 Gene Symbol | AKR Log2 Fold Change¹ | DBA Log2 Fold Change² | Adjusted P-Value |
|---|---|---|---|---|---|---|---|
| **Hyal1** | 2.3 | 0.6 | 5.9E-016 | **Hyal1** | 1.3 | 1.1 | 2.8E-007 |
| **Igf2r** | 1.4 | 0.6 | 2.3E-012 | Atp6v1h | 0.8 | 0.7 | 1E-006 |
| Ctsk | 1.3 | 0.4 | 9.6E-012 | **Sort1** | 0.8 | 1.6 | 4.8E-005 |
| **Tcirg1** | 1.3 | 0.2 | 2.9E-012 | **Gla** | 0.7 | 0 | 0.00059 |
| Ctsb | 0.9 | 0.1 | 1.1E-008 | Ctsl | 0.7 | 0.3 | 0.0005 |
| **Gla** | 0.7 | 0.1 | 1.5E-006 | **Ctsz** | 0.7 | 0.3 | 9.3E-005 |
| Naglu | 0.7 | 0.1 | 2.8E-008 | **Igf2r** | 0.6 | 1.3 | 0.00022 |
| Gga2 | 0.7 | 0.3 | 1.8E-009 | Cltb | 0.6 | 0.3 | 0.00012 |
| **Cd68** | 0.6 | 0.1 | 8.6E-005 | **Mcoln1** | 0.6 | 0.5 | 8.6E-005 |
| Gnptab | 0.6 | 0.1 | 2E-007 | **Slc11a1** | 0.6 | 0.5 | 1.2E-005 |
| Slc17a5 | 0.6 | -0.1 | 1.5E-006 | **Dnase2a** | 0.6 | 0.5 | 8.2E-005 |
| **Cln5** | 0.6 | 0.3 | 6.8E-008 | **Tcirg1** | 0.6 | 0.6 | 2E-005 |
| Atp6v0b | 0.5 | 0.2 | 1.2E-006 | **Cln5** | 0.6 | 0.3 | 4.1E-005 |
| **Ctsz** | 0.5 | 0.2 | 6.8E-007 | Ctsa | 0.5 | 0 | 0.0079 |
| Ctns | 0.5 | -0.1 | 6.9E-005 | **Ap1s1** | 0.5 | 0.3 | 0.0019 |
| Ap3d1 | 0.5 | -0.1 | 8.8E-006 | Atp6v0a2 | 0.5 | 0.1 | 0.00011 |
| Neu1 | 0.4 | 0.1 | 0.00029 | Atp6v0d1 | 0.5 | 0.1 | 0.0004 |
| **Mcoln1** | 0.3 | 0.1 | 0.00042 | **Cd68** | 0.5 | 0.6 | 0.005 |
| Ctsd | 0.3 | 0.1 | 0.00093 | Gaa | 0.4 | 0.3 | 0.0058 |
| Abca2 | 0.3 | 0.1 | 0.00059 | Gba | 0.3 | 0.3 | 0.0021 |
| **Manba** | 0.3 | 0.1 | 0.0032 | **Psap** | 0.3 | 0.4 | 0.0023 |
| **Dnase2a** | 0.3 | 0 | 0.0014 | Glb1 | -0.3 | 0.1 | 0.0034 |
| **Ap1s1** | 0.3 | 0.4 | 0.0011 | Lgmn | -0.4 | -0.2 | 0.0027 |
| Atp6v0a1 | 0.2 | 0.2 | 0.0052 | Man2b1 | -0.4 | 0.3 | 0.00046 |
| **Psap** | 0.2 | 0 | 0.0051 | **Ap4m1** | -0.4 | -0.3 | 0.005 |
| **Acp2** | -0.3 | -0.1 | 0.0021 | Pla2g15 | -0.4 | -0.2 | 0.00025 |
| **Ap4m1** | -0.3 | 0.1 | 0.0014 | **Lamp2** | -0.4 | -0.7 | 0.00078 |
| **Ap1b1** | -0.3 | -0.2 | 0.0022 | **Ap3m2** | -0.4 | -0.3 | 0.00041 |
| **Ap3m2** | -0.3 | 0.2 | 0.0021 | Arsb | -0.4 | 0.1 | 0.0068 |
| Ppt2 | -0.3 | 0 | 0.0001 | Smpd1 | -0.5 | -0.5 | 0.0044 |
| Ctsh | -0.4 | -0.1 | 2.2E-005 | **Acp2** | -0.5 | -1 | 0.00025 |
| **Sort1** | -0.4 | -0.3 | 0.00056 | **Clta** | -0.5 | -0.1 | 1.3E-005 |
| Ap4s1 | -0.4 | 0 | 1.5E-005 | Ctss | -0.5 | 0.1 | 0.0027 |
| **Asah1** | -0.5 | -0.9 | 0.0018 | Ctsf | -0.5 | -0.2 | 0.0046 |
| **Clta** | -0.5 | 0.3 | 0.0016 | **Manba** | -0.6 | 0.1 | 3.5E-005 |
| **Lamp2** | -0.5 | 0 | 4.6E-006 | Sgsh | -0.6 | 0.4 | 0.0037 |
| **Slc11a1** | -0.5 | -0.2 | 1.6E-006 | Gusb | -0.6 | -0.2 | 0.00023 |
| Ap1s2 | -0.5 | 0.1 | 2.6E-006 | **Ppt1** | -0.7 | -0.5 | 4.1E-005 |
| **Napsa** | -0.5 | -0.1 | 3.1E-007 | **Ctsc** | -0.7 | -1.4 | 0.00012 |
| **Ctse** | -0.7 | -0.4 | 7.7E-008 | Hgsnat | -0.8 | -0.4 | 5.4E-005 |
| **Ppt1** | -0.7 | -0.3 | 1.4E-008 | Ids | -0.8 | -0.5 | 6.7E-007 |
| Laptm4a | -0.8 | -0.3 | 1.1E-008 | **Ctse** | -0.8 | -0.8 | 4E-007 |
| **Ctsc** | -1.2 | -0.1 | 8.9E-013 | **Asah1** | -0.9 | -1.1 | 6.9E-006 |
| | | | | **Napsa** | -0.9 | -0.1 | 8E-006 |
| | | | | **Ctsc** | -1.2 | -1.6 | 1.7E-008 |
| | | | | **Ap1b1** | -1.3 | -0.3 | 9.9E-010 |

The 25 overlapping genes between the two experiments are shown in bold.
¹ Calculated as log2 of AKR loaded/AKR unloaded average expression; positive numbers higher in loaded, negative numbers higher in unloaded.
² Calculated as log2 of DBA/2 loaded/DBA/2 unloaded average expression; positive numbers higher in loaded, negative numbers higher in unloaded.

Most of the lysosome pathway genes regulated by cholesterol showed a larger fold change in macrophages from the atherosclerosis resistant AKR than the susceptible DBA/2 strain, and a systematic analysis of the strain-cholesterol loading interaction is provided below. These experimentally validated regulated transcripts include the following lysosomal acid hydrolases: 1) proteases represented by cathepsins (Ctsc, Ctse and Ctsz); 2) the peptidase napsin (Napsa); 3) glycosidases (Hyal1 and Gla); 4) palmitoyl-protein thioesterase (Ppt1); 5) phosphatase (Acp2); and 6) ceramidase (Asah1). Point mutations in the ASAH1 gene lead to the lysosomal storage disorder Farber disease; Asah1+/- mice develop an advanced lipid storage disease in many organs, most prominently in liver [103]. The conserved set of cholesterol regulated genes also includes: 1) several major and minor lysosomal membrane proteins (Lamp2, Cd68 and Cln5); 2) adapter-related protein complex subunits beta, mu and sigma (Ap3d1, Ap3m2 and Ap1s1) that are involved in the transport between lysosomes and Golgi; 3) sortilin 1 (Sort1); and 4) mucolipin 1 (Mcoln1). Of these non-hydrolytic lysosomal pathway genes, two stood out as potentially relevant to atherosclerosis: Mcoln1 and Sort1. Mcoln1 encodes a protein that co-localizes with endocytosed material that accumulates in lysosomes, and it plays a role in the exit of lipids from the lysosome and the trafficking of MHCII to the plasma membrane [104]. In addition, Mcoln1-deficient neurons have defective autophagic flux leading to increased levels of LC3-II and P62 [105]. Sort1 is a genome wide association study (GWAS) hit for LDL-cholesterol and coronary artery disease, and has been shown to play a role in hepatic apoB lipoprotein secretion and LDL uptake [106].

SREBP and LXR motifs in cholesterol regulated transcripts: Since GSEA for sequence motifs was not very informative for the cholesterol loading datasets (data not shown), we examined motifs for two well known sterol regulated transcription factors. The classical example of cholesterol regulation of gene expression involves the down regulation of genes containing the sterol responsive element (SRE). This regulation is mediated by sterol control of sterol regulatory element binding protein (SREBP) processing in the ER and Golgi, such that high sterols repress its processing and low sterols permit its processing into a positively acting transcription factor [107]. Thus, we searched for the V$SREBP1_02 motif (KATCACCCCAC target sequence) within 2 kb of the transcription start site among all expressed transcripts in the two experiments. We identified 24 expressed transcripts associated with the V$SREBP1_02 motif in the first experiment, 11 of which (46%) were significantly regulated by cholesterol loading, with 6 up regulated and 5 down regulated (Table S8). In the second experiment, we identified 35 expressed transcripts associated with the V$SREBP1_02 motif, 12 of which (34%) were significantly regulated by cholesterol loading, with 4 up regulated and 8 down regulated. Seven replicated transcripts with this motif met our criteria for significant regulation by cholesterol loading in both experiments, two up and five down regulated (Table S8); however, the overlap between the two sets of cholesterol regulated SREBP1-associated transcripts was not significantly different from the overlap of all cholesterol regulated genes (one-tailed Fisher’s exact test, p=0.15). These seven replicated transcripts are not directly involved in cholesterol biosynthesis. A similar finding was reported in mouse macrophages deficient for SREBP1a, in which the most highly regulated transcripts did not include those coding for classical cholesterol biosynthesis genes [108]. Overall, we were surprised that several of the SREBP motif-containing genes were induced upon loading, a state where SREBP is expected to be low and the SRE-containing genes are repressed.
However, there are now well known examples of SREBP acting as a transcriptional repressor, such as SREBP repression of IRS2 transcription in liver [109].

The oxysterol activated transcription factor LXR heterodimerizes with RXR and binds to genes harboring LXR responsive elements, often leading to sterol mediated up-regulated gene expression, as demonstrated for Abca1, Abcg1, and apoE [110]. We searched for the LXR DR4 motif TGACCGNNAGTRACCC within 2 kb of the start site of expressed transcripts. We identified 32 expressed transcripts associated with the LXR DR4 motif in the first experiment, 17 of which (53%) were significantly regulated by cholesterol loading, with 6 up regulated and 11 down regulated (Table S9). In the second experiment, we identified 31 expressed transcripts associated with the LXR DR4 motif, of which 11 transcripts (55%) were significantly regulated by cholesterol loading, with 7 up regulated and 4 down regulated. Six transcripts with this motif met our criteria for significant regulation by cholesterol loading in both experiments, two up and four down regulated (Table S9); however, the overlap between the two sets of cholesterol regulated LXR-associated transcripts was not significantly different from the overlap of all cholesterol regulated genes (one-tailed Fisher’s exact test, p=0.45). The two replicated up-regulated genes were Abcg1 and Stac2. Abcg1 is a well-known LXR target gene, but we did not identify Abca1, another well-known target, and apoE is not expressed in our apoE-deficient macrophages. The expression of Abca1 on our microarrays was very low, possibly due to poor probe design, so we quantified its expression in the experiment 1 samples by qPCR, and determined that it is indeed up-regulated by cholesterol loading in both strains (Figure 3.3). In a combined ChIP and gene expression study in human THP1 macrophages, LXR agonist treatment was found to lead to widespread up and down regulation of genes adjacent to confirmed LXR binding sites [111], confirming that LXR may act as either a transcriptional activator or repressor.
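The motif searches described above can be illustrated with a short Python sketch. This is not the procedure used in this study: the function names are hypothetical, the test sequences are made-up toy strings, and scanning both strands is an assumption. The two consensus sequences are the ones quoted in the text, written in IUPAC nucleotide codes (K = G or T, R = A or G, N = any base).

```python
import re

# IUPAC nucleotide codes mapped to regular expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (for example K = G/T pairs with M = C/A).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Consensus sequences quoted in the text.
SREBP_MOTIF = "KATCACCCCAC"
LXR_MOTIF = "TGACCGNNAGTRACCC"


def reverse_complement(motif):
    """Reverse complement of an IUPAC motif."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Translate an IUPAC motif into a regular expression string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(sequence, motif):
    """Return sorted 0-based start positions of the motif on either strand.

    Searching both strands is an assumption of this sketch.
    """
    sequence = sequence.upper()
    hits = set()
    for variant in (motif, reverse_complement(motif)):
        # A lookahead lets overlapping occurrences be reported as well.
        lookahead = re.compile("(?=" + motif_to_regex(variant) + ")")
        hits.update(match.start() for match in lookahead.finditer(sequence))
    return sorted(hits)
```

In practice the sequence passed to `find_motif` would be the window within 2 kb of each transcription start site; how that window is extracted from the genome is outside the scope of this sketch.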

Figure 3.3: Gene expression and validation of microarray data by quantitative real-time PCR in experiment 1 and 2 macrophages: Abca1 (A) and Abcg1 (B) expression was calculated relative to ubiquitin C (Ubc) gene expression, an endogenous control whose expression remained unchanged under the conditions of the experiment. Values are expressed as mean ± SD (N = 4). Different numbers above bars show p<0.001 (A) or p<0.05 (B) by Newman-Keuls ANOVA posttest, while similar numbers above bars show no significant differences.

3.3.5 Cholesterol loading–strain interaction effect on BMM transcriptome

A linear model fitted with strain and loading as additive covariates was used to identify the transcripts with a significant cholesterol loading-strain interaction effect. This interaction effect identifies transcripts that have different directions or degrees of cholesterol regulation between the two strains, for example, a transcript that is up regulated in AKR and down regulated in DBA/2, or a transcript that is highly up regulated in AKR but only moderately up regulated in DBA/2, and vice versa.

Experiment 1 samples: 1,929 probes were identified with a significant loading-strain interaction effect at an FDR adjusted p-value < 0.01, with several transcripts independently identified by multiple probes (Table S10). The top 10 most significant transcripts for strain-loading interaction effect were: Slamf9, Chac1, Trib3, Vwf, Gja1, Tgfbi, Dok2, Gadd45a, Gdf15 and Dner. The response of three of these transcripts to cholesterol loading in both strains is shown in Figure 3.4, with Slamf9 highly down-regulated by loading in AKR (-5.4 fold) and not significantly changed in DBA/2; Trib3 highly up-regulated in AKR (6.1 fold) and moderately up-regulated in DBA/2 (1.2 fold); and Dner down-regulated in AKR (-2.5 fold) and up-regulated in DBA/2 (1.4 fold).
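To make the meaning of the interaction effect concrete, the following Python sketch computes it for a single transcript as a difference of differences between group means. In a two-by-two design (strain by loading) with an interaction term, this quantity equals the least-squares interaction coefficient. The function name and the expression values are hypothetical and are not taken from our data, and the sketch does not include the significance testing or FDR adjustment.

```python
from statistics import mean


def interaction_effect(log2_expr):
    """Return (AKR change, DBA/2 change, interaction) on the log2 scale.

    log2_expr maps (strain, condition) to a list of log2 expression values.
    The interaction is the loading effect in AKR minus the loading effect
    in DBA/2, which is zero when both strains respond identically.
    """
    cell_mean = {group: mean(values) for group, values in log2_expr.items()}
    akr_change = cell_mean[("AKR", "loaded")] - cell_mean[("AKR", "unloaded")]
    dba_change = cell_mean[("DBA/2", "loaded")] - cell_mean[("DBA/2", "unloaded")]
    return akr_change, dba_change, akr_change - dba_change


# Hypothetical log2 expression values, four replicates per group.
example = {
    ("AKR", "unloaded"): [8.0, 8.2, 7.8, 8.0],
    ("AKR", "loaded"): [10.5, 10.7, 10.6, 10.6],
    ("DBA/2", "unloaded"): [8.1, 7.9, 8.0, 8.0],
    ("DBA/2", "loaded"): [8.3, 8.2, 8.4, 8.3],
}
akr_change, dba_change, interaction = interaction_effect(example)
```

With these made-up values the transcript is strongly induced in AKR and barely changed in DBA/2, so the interaction term is large even though both strains respond in the same direction.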

Figure 3.4: Examples of transcripts with strain-loading interaction effect in experiment 1 samples: Slamf9 (A), Trib3 (B), and Dner (C) expression in unloaded and loaded macrophages. Values are expressed as mean ± SD (N = 4). Different numbers above bars show p<0.01 (A), p<0.05 (B, C) by Newman-Keuls ANOVA posttest, while similar numbers above bars show no significant differences.

Experiment 2 samples: 965 probes were identified with a significant loading-strain interaction effect at an FDR adjusted p-value < 0.01, with several transcripts independently identified by multiple probes (Table S11). The top 10 most significant probes for strain-loading interaction effect were: Ddit3, Aqp9, Trib3, Nupr1, Cox6a2, Asns, Ccr5 (represented by three independent probes), and Slc1a4. The response of three of these transcripts to cholesterol loading in both strains is shown in Figure 3.5, with Ddit3 up-regulated by loading in AKR (3.0 fold) and down-regulated in DBA/2 (-1.4 fold); Trib3 highly up-regulated in AKR (5.6 fold) and unchanged in DBA/2; and Ccr5 highly down-regulated in AKR (-7.6 fold) and moderately down-regulated in DBA/2 (-1.6 fold).

Figure 3.5: Examples of transcripts with strain-loading interaction effect in experiment 2 samples: Ddit3 (A), Ccr5 (B), and Trib3 (C) expression in unloaded and loaded macrophages. Values are expressed as mean ± SD (N = 4). Different numbers above bars show p<0.05 (A), p<0.001 (B), or p<0.001 (C) by Newman-Keuls ANOVA posttest, while similar numbers above bars show no significant differences. (D) Proteins isolated from unloaded and cholesterol loaded AKR and DBA/2 BMM were subjected to Western blot analysis, showing TRIB3 band density normalized to GAPDH.

Combined results and discussion: There were 213 probes with highly significant strain-loading interactions that were conserved in both experiments (presented in Table S12). Ddit3, identified as the transcript with the most significant loading-strain interaction effect in the second experiment samples, was also observed with a significant strain-loading interaction effect in the first experiment. This gene encodes the DNA-damage inducible transcript 3 protein, more widely known as CHOP (C/EBP homologous protein). CHOP is a transcription factor whose expression is up-regulated by the endoplasmic reticulum (ER) stress response and the unfolded protein response (UPR), which under prolonged stress can trigger apoptosis [112][113]. Two other genes known to participate in the ER stress pathway, Trib3 and Atf4 [114][115][116], were also among the genes with strain-loading interactions in both experiments, with regulation similar to that seen for Ddit3: highly up regulated by cholesterol loading in AKR, and either down regulated or unchanged in loaded DBA/2 cells. Tabas and colleagues have previously shown that free cholesterol loading of macrophages induces ER stress including CHOP production [117][118]. Thus, our finding of increased expression of ER stress related genes in AKR, the strain with increased free cholesterol loading (Figure 3.6), fits well with the effects of free cholesterol on ER stress described by Tabas [117][118].

Figure 3.6: Total (A), esterified (B), and free (C) cholesterol mass in unloaded and AcLDL loaded AKR and DBA/2 ApoE-/- macrophages. All data are mean ± SD, N=4 per group using the average of triplicate assays per sample. P-values were calculated by ANOVA with Newman-Keuls posttest, showing only the strain differences after loading. There were no significant strain differences in unloaded cells.

ER stress, CHOP, and Trib3 have previously been implicated in atherosclerosis. In ApoE-/- mice fed a western-type diet to induce advanced lesions, CHOP deficiency led to 35% smaller aortic lesions and 50% less plaque necrosis when compared to controls, despite similar levels of plasma lipoproteins [119]. Similar results were obtained on the Ldlr-deficient background [119]. Adenoviral knockdown of Trib3 in ApoE-Ldlr double knockout mice also led to smaller aortic lesions [120]. Our finding of increased ER stress response gene expression in cholesterol loaded macrophages from the atherosclerosis resistant AKR strain appears to differ from the above findings, where decreased ER stress response was associated with smaller lesions. There are several potential reasons that might explain this discrepancy. One explanation is that the atherosclerosis studies comparing AKR and DBA/2 strains were performed in chow fed mice at an early time point prior to lesion necrosis, and Tabas has postulated that lesion macrophage apoptosis in early stage atherosclerosis is associated with increased phagocytic clearance of apoptotic cells and diminished lesion progression. Another explanation may be that ER stress can also trigger autophagy, which can be protective against lesion progression [121]. Autophagy also plays a role in cholesterol ester hydrolysis by delivering foam cell lipid droplets to the lysosome, where lysosomal acid lipase cleaves cholesterol esters into free cholesterol, which is the substrate for cholesterol efflux [79].

3.3.6 Validation of data by quantitative Real-Time PCR (qPCR)

To confirm the microarray data we performed quantitative real-time PCR for four selected transcripts: three highly regulated ones, Trib3, Atf4 and Cln5, along with an LXR regulated transcript, Abcg1. The expression levels for each transcript in the unloaded and cholesterol loaded samples from experiment 1 were quantified relative to the ubiquitin C (Ubc) control, which was found to be the least variable transcript in this study among the list of endogenous qPCR controls available from Applied Biosystems (3% coefficient of variation among the 16 samples in experiment 1). For each of these four transcripts, the expression levels quantified by qPCR were consistent with the microarray data. Linear regression analysis revealed a highly significant positive correlation for each tested transcript (R² values ranged from 0.66 to 0.97, Figure 3.7).
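As a reminder of how the R² value in such a comparison is obtained, the sketch below computes the squared Pearson correlation between two paired measurement series, which for a simple linear regression with one predictor equals the coefficient of determination. The function name and the numbers are hypothetical and do not reproduce the values in Figure 3.7.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between paired measurements x and y."""
    mean_x, mean_y = mean(x), mean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return (s_xy * s_xy) / (s_xx * s_yy)


# Hypothetical paired values: microarray signal and qPCR relative expression.
microarray = [1.0, 2.0, 3.0, 4.0]
qpcr = [1.0, 3.0, 2.0, 4.0]
agreement = r_squared(microarray, qpcr)
```

An R² close to 1 indicates that the qPCR measurements rise and fall almost in proportion to the microarray signal across samples.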

Figure 3.7: Validation of microarray expression data: Linear regression analysis of microarray expression and qPCR data for Trib3 (A), Abcg1 (B), Atf4 (C) and Cln5 (D) performed in experiment 1 unloaded and loaded AKR and DBA/2 macrophages. Microarray data were not log2 transformed for this analysis.

3.3.7 Western Blot Analysis

We compared Trib3 mRNA levels (microarray) and protein levels (Western blot) in unloaded and cholesterol loaded AKR and DBA/2 BMM (Figure 3.5C and D). A large increase in Trib3 upon cholesterol loading in AKR cells was observed at both the mRNA and protein levels. In contrast, upon cholesterol loading in DBA/2 cells there was no appreciable change in Trib3 mRNA or protein. Thus, the strain-specific response to cholesterol was reproducibly observed at both the mRNA and protein levels. However, the basal levels of Trib3 mRNA in the AKR and DBA/2 cells were similar, while Trib3 protein levels were much higher in DBA/2 cells. Differences observed between mRNA and protein levels are not uncommon in complex biological samples [122]. Previous studies comparing mRNA and protein abundance among different mouse strains have found limited correlations (mean R = 0.27), such that the mean association between protein and mRNA levels is only 7.3% comparing various strains [123]. Furthermore, this study showed that clinical traits among the various mouse strains were more strongly correlated with transcript levels than protein levels [123].

Identification of Candidate Genes for Ath22, Ath26 and Ath28 QTLs

Several studies have integrated genetics and genomics to identify plausible candidate genes for complex diseases. Previous studies in our lab have identified atherosclerosis QTLs from an AKRxDBA/2 ApoE-/- intercross [60], of which the Ath28 QTL on chromosome 2, Ath22 on chromosome 15 and Ath26 on chromosome 17 were recently validated in a second independent F2 cohort [124]. We searched the list of genes differentially expressed by strain, cholesterol loading, or the cholesterol loading-strain interaction, whose differential expression was conserved in both experiments. Table 3.4 shows this list of differentially regulated atherosclerosis modifier candidate genes that reside within these three QTL intervals, each defined as a 1-LOD score drop from the QTL peak. More candidates were identified within the Ath26 QTL interval on chromosome 17, as this is the largest and most gene-dense QTL interval.
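The positional filter described above amounts to an interval lookup, sketched here in Python. The interval boundaries are the confidence intervals given in the footnotes of Table 3.4; the function name and the example positions are hypothetical and are used only to show the logic.

```python
# QTL confidence intervals (chromosome, start Mb, end Mb) from Table 3.4.
QTL_INTERVALS_MB = {
    "Ath28": ("2", 160.0, 181.0),
    "Ath22": ("15", 3.0, 33.0),
    "Ath26": ("17", 12.0, 65.0),
}


def qtl_for_position(chromosome, position_mb):
    """Return the name of the QTL interval containing a position, or None."""
    for name, (qtl_chrom, start_mb, end_mb) in QTL_INTERVALS_MB.items():
        if chromosome == qtl_chrom and start_mb <= position_mb <= end_mb:
            return name
    return None


# Hypothetical transcript positions (chromosome, Mb), for illustration only.
example_positions = {
    "transcript_a": ("17", 30.0),
    "transcript_b": ("2", 100.0),
    "transcript_c": ("15", 10.5),
}
candidates = {
    transcript: qtl_for_position(chrom, pos)
    for transcript, (chrom, pos) in example_positions.items()
    if qtl_for_position(chrom, pos) is not None
}
```

Only transcripts that both pass this positional filter and show conserved differential expression in the two experiments are retained as candidates.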

Table 3.4: Differentially expressed transcripts conserved in both experiments that reside within Ath28, Ath22 and Ath26 QTLs.

Columns: Chromosome 2(1) (Ath28 QTL); Chromosome 15(2) (Ath22 QTL); Chromosome 17(3) (Ath26 QTL). Footnote markers follow each gene symbol in parentheses.

Slc13a3(4), Capsl(4), Tmem63b(a,5), Srpk1(5), Sdc4(4), Myo10(4), C2(4,5), Gtpbp2(5,6), Mmp9(4), Oxct1(4), Tmem8(4), Nme4(5), Rnf114(5), Fam134b(4), Ltb(a,b,6), Cfb(5), Gnas(5), Rai14(5), Prss22(4), Nrm(5,6), Ctsz(5), Il7r(5), Rpl7l1(4), Mapk14(5), Gnas(5), Ank(6), Enpp4(4), Angptl4(5), Cebpb(5,6), Itpr3(4), Cdkn1a(5), Rae1(5), Tnfrsf21(4), H2-T23, Amdhd2(4), B430306N03Rik(5), Fpr1(4,5), Brd2(5), Cul7(4), Cyp4f16(5), Fahd1(4), Rrp1b(5), Emr1(4,5,6), Tcf19(5,6), Mrpl14(4), Aurka(5), Dusp1(4), Pex6(5,6), Hagh(4), Vars(5), Ubxn6(4), Hmga1(5), Tubb5(4), Trip10(5), Fpr2(5), Klc4(5), Aif1(5,6), Plcl2(5), Itpr3(5), Cul7(5,6), Fgd2(5,6), Slc35b2(5), Ebi3(5), Mocs1(5), Tnfrsf12a(5), Nfkbie(5), H2-Eb1(5), Dpp9(5), 9030025P20Rik(5), Vegfa(5,6), Ccnd3(5), Atp6v0e(5), Uhrf1(5,6), Gpr108(5), Cnpy3(5), Alkbh7(5), Flywch2(5), Chaf1a(6), D17H6S56E-5(5), 2410011O22Rik(6), Lemd2(5)

1 Ath28 QTL confidence interval on chromosome 2 extends from 160 to 181 Mb
2 Ath22 QTL confidence interval on chromosome 15 extends from 3 to 33 Mb
3 Ath26 QTL confidence interval on chromosome 17 extends from 12 to 65 Mb
4 Transcripts significant for strain effect
5 Transcripts significant for cholesterol loading effect
6 Transcripts significant for cholesterol loading-strain interaction effect

Chapter 4

Whole Genome Expression Differences in Human Left and Right Atria Ascertained by RNA-Sequencing

As published in Hsu J, Hanna P, Van Wagoner D, Barnard J, Serre D, Chung MK, Smith JD. Whole Genome Expression Differences in Human Left and Right Atria Ascertained by RNA-Sequencing. Circulation: Cardiovascular Genetics 2012 Jun;5(3):327-35.

4.1 Introduction and Background

The electrophysiological properties of the heart are determined by the expression of ion channels, gap junctions and other accessory proteins, for example, the expression of hyperpolarization-activated, cyclic nucleotide-gated ion channels (HCN2, HCN4), which contribute to the pacemaker activity of the sinoatrial node [125]. The molecular basis of some differences in action potentials between the cardiac ventricles, atria, and SA node has been previously elucidated [126]; however, left-to-right differences, particularly in the atria, are less well characterized. This is important, as the right and left atria have different susceptibilities towards developing arrhythmias, such as AF. Left-to-right asymmetries in expression [127] may promote reentry and have a role in the development of AF [128]. GWAS have identified some of the genetic factors that are associated with an increased risk of developing AF [129]; however, the mechanisms by which these genetic factors lead to AF are not currently known. Specifically, the strongest and most widely replicated single nucleotide polymorphisms (SNPs) associated with AF lie in an intergenic region of chromosome 4q25, which may play an as yet undiscovered role in regulating the expression of nearby genes, such as PITX2, a gene known to be involved in cardiac development and left/right patterning [130].

Expression of miRNAs has also been shown to play a crucial role in cardiac development and disease, in part through the attenuation of expression [131]. miRNAs are RNA polymerase II transcribed non-coding RNAs that bind to target transcripts in a sequence specific manner. This binding results in the repression of translation and can lead to transcript degradation [132], resulting in lower steady state mRNA levels. miRNAs have been shown to play an important role in determining the core transcriptional network and thus may play a role in cell type differentiation [133]. miRNAs such as mir-1 modulate a wide array of transcripts important in cardiac function (see [134] for a review).

Previously, many significant left-right differences in mRNA expression were found in mouse atria using microarrays [135]. In the current study we utilized next generation sequencing to establish a comprehensive catalogue of differentially expressed transcripts and miRNAs in the human left and right atria, which can serve as the basis for further investigations into the genetic etiology of AF.

4.2 Methods

4.2.1 RNA-sequencing for left-right pairs

Total RNA was extracted using QIAGEN miRNeasy Mini Kits from four paired 20 mg atrial tissue samples processed simultaneously. mRNA sequencing libraries were made following the Illumina mRNA protocol. Briefly, RNA was purified by poly-A selection using oligo(dT) beads and chemically fragmented. First and second strand cDNA synthesis was followed by end-repair and 3’ adenylation. 5’ and 3’ Illumina adapters were ligated, and the products were size-selected by gel purification and amplified by PCR. All eight samples were 50 bp paired-end sequenced on a single flow-cell using the Illumina Genome Analyzer IIx. Using the same total RNA preparations, a small RNA-sequencing library was constructed following Illumina’s Alternative v1.5 small RNA-seq protocol. Briefly, 5’ and 3’ adapters were ligated onto the total RNA and amplified. The cDNA with adapters was size selected by agarose gel electrophoresis between 93 and 100 bp, and 36 bp single-end sequencing was performed on all eight samples on a single flow-cell. RNA reads are deposited in the GEO database, accession GSE31999.

4.2.2 Paired-end read analysis

mRNA paired-end reads were aligned to the UCSC hg19 build using TopHat [136], with Reference Sequence (RefSeq) annotation as a guide (-G option). Counts at known RefSeq genes were generated from the read alignments using custom Python scripts, where a read and its mate were counted only once if the read pair mapped to within 6 standard deviations of the average fragment size across reads. Genes with 10 or fewer reads summed across the samples were excluded from further analysis. Count data were loaded into the R package edgeR [137] [138] and were normalized between samples using the trimmed mean of M-values (TMM), which calculates the average library size after trimming the top and bottom 5% expressed transcripts, and by trimming the transcripts with the top and bottom 30% log-fold changes [139]. To take into account the left-right paired experimental design, Cox-Reid conditional inference was used to estimate the tagwise dispersion for each of the genes [137]. A modified Fisher’s exact test was then used to determine differential expression between the left and right atria [137]. P-values were then adjusted by the method of Benjamini-Hochberg to derive FDRs (16). In order to determine if the one atrial donor sample (#3) significantly affected the results, we repeated the analysis after removing this sample. The top differentially expressed miRNAs, mRNAs and non-coding RNAs were largely conserved (Supplemental Tables 1-3).

Gene set enrichment was done using the romer function in the R-package limma. Romer uses a model that is better suited for microarray data than for RNA-seq counts, as it does not estimate the expression and biological variation from a negative binomial distribution as edgeR does. Romer was used nonetheless because it accounts for the correlation structure of genes and uses a novel rotation approach for calculating p-values that is applicable to small studies, unlike the permutation approach used in other gene-set methods. A pseudo count was first created by adding 0.5 counts to the counts of each RefSeq transcript. Counts were then converted to reads per kilobase of exon model per million mapped reads (RPKM), multiplied by the TMM normalization factor and log base 2 transformed. Gene sets were obtained from the Molecular Signatures Database from The Broad Institute [140]. A parametric resampling method for generalized linear models [Majewski2010a] was used to obtain p-values (9999 iterations).
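The transformation applied before the romer test can be written out as a small Python sketch. It follows the wording of the text literally (pseudo count of 0.5, conversion to RPKM, multiplication by the TMM normalization factor, log2 transformation). The function name, argument names and example numbers are hypothetical, and other software conventions may apply the normalization factor to the library size instead.

```python
import math


def log2_rpkm(count, transcript_length_bp, mapped_reads,
              tmm_factor=1.0, pseudo_count=0.5):
    """Log2 of TMM-scaled RPKM for one transcript in one sample.

    RPKM = reads * 1e9 / (transcript length in bp * total mapped reads),
    which is the same as reads per kilobase per million mapped reads.
    """
    rpkm = (count + pseudo_count) * 1e9 / (transcript_length_bp * mapped_reads)
    return math.log2(rpkm * tmm_factor)


# Hypothetical transcript: no observed reads, 1 kb long, 1 million mapped reads.
value_without_reads = log2_rpkm(0, 1000, 1_000_000)
```

The pseudo count keeps the logarithm defined for transcripts with zero reads, which would otherwise give an undefined value.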

4.2.3 RT-PCR

A custom designed TaqMan primer and probe set was used for qRT-PCR of the PITX2c transcript (Supplemental Table 7), normalized to cardiac actin (ACTC1) expression using a primer limited probe set (assay number Hs00606316_m1 from Applied Biosystems). RT-PCR reactions were run in duplex and relative PITX2c expression was calculated by the ∆∆CT method.
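For readers unfamiliar with the ∆∆CT calculation, a minimal Python sketch is given below. It assumes 100% amplification efficiency (a doubling per cycle) and treats the right atrial sample as the calibrator; both choices, the function name and the CT values are assumptions made for illustration and are not taken from our measurements.

```python
def relative_expression(ct_target_sample, ct_reference_sample,
                        ct_target_calibrator, ct_reference_calibrator):
    """Fold expression of the target in the sample relative to the calibrator.

    Delta CT normalizes the target to the reference gene within each tissue;
    delta-delta CT compares the sample with the calibrator tissue.
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical CT values: PITX2c and ACTC1 in a left (sample) and
# right (calibrator) atrial appendage from the same subject.
fold_left_vs_right = relative_expression(24.0, 18.0, 32.0, 18.0)
```

Because a lower CT means more template, a target that reaches threshold 8 cycles earlier in the left atrium, after normalization, corresponds to a 256-fold higher expression.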

4.3 Results

4.3.1 RNA-seq of left-right atrial appendages

Left-right atrial appendage pairs were obtained from four human subjects, three of whom underwent bilateral Maze surgery for the treatment of atrial fibrillation and valve disease, while the fourth pair of atrial appendages was obtained postmortem from an unused heart transplant donor. Supplemental Table 1 describes the characteristics of these subjects. Total RNA was extracted from all samples. To determine left-right atrial gene expression differences, whole-genome expression analysis of RNA samples was performed using RNA-sequencing for both the small RNA fraction containing miRNAs (36 bp single-end sequencing) and the poly-A enriched mRNA fraction (51 bp paired-end sequencing). miRNA-seq yielded between 7.7 and 17.1 million reads per sample that mapped to known miRNAs in miRBase [138]. mRNA-seq yielded between 16.3 and 29.9 million reads per sample that mapped to known Reference Sequence (RefSeq) transcripts; all mapped reads were collapsed to their unique RefSeq transcript regardless of potential splice isoforms in order to obtain digital counts. The full summary statistics of the sequencing results are described in Supplemental Tables 2 and 3.

4.3.2 miRNA gene expression differences between the left and the right atria

Figure 4.1: Size-distribution of the small-RNA reads in one representative sample. The mode read length post adapter trimming occurs at 22bp which is expected from reads generated from miRNAs. The majority of sequences that did not read into the Illumina adapters aligned to tRNAs and rRNAs. Each read was classified as mapping uniquely to the human genome (Unique, red); having multiple alignments to the genome (Multi, blue); failing quality control (QC, green); or not mapping to the genome (NM, grey). The majority of the multiple aligned 20 - 23 bp reads correspond to miRNA families, which were collapsed into a single miRNA species for subsequent analyses.

After trimming the Illumina adapters from the miRNA reads, the majority of the reads were distributed around 22 bp (Figure 4.1). Of the total number of reads, 24.5 ± 5.7% (mean ± SD) mapped uniquely to known miRNAs (miRBase release 17). In addition, many sequences that mapped to more than one genomic locus represent valid miRNAs, as determined by alignment to the hairpin pre-miRNAs, which were collapsed into unique mature miRNAs, such that the majority of 22 bp reads that were multiple-mapped were in fact miRNAs. Overall for the 8 samples, 39 ± 8% of the total reads were mapped to miRNAs (Supplemental Table 2). Most of the longer 36-bp reads that did not read into the Illumina adapters mapped to annotated tRNAs and rRNAs. The most highly expressed miRNA in the atria was mir-143, which represented, on average, 32.7 ± 1.5% and 26.7 ± 1.8% of all mapped reads in the left and right atria, respectively (Supplemental Table 4). The miRNA dataset was subjected to multidimensional scaling (MDS) (Figure 4.2), showing that left-right sidedness is measurably associated with the miRNA transcriptome.

Using a generalized linear model (GLM) in edgeR software [137] to fit the pair-wise data, followed by removal of miRNAs whose average expression over the total library size in the left and right atria fell below 7.63 × 10⁻⁶, 32 miRNAs were differentially expressed between the left and right atria at a P-value ≤ 0.01 and a false discovery rate (FDR) < 0.08 (Table 4.1). Of the top 32 differentially expressed miRNAs, 18 were more highly expressed in the left atria than in the right. For example, hsa-mir-135b had 4.9-fold higher expression in the right vs. left side (FDR = 4.02 × 10⁻⁵). In contrast, hsa-mir-100 was expressed 3.2-fold higher in the left side (FDR = 7.28 × 10⁻⁹).
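The relationship between the raw P-values and the FDR values reported here follows the Benjamini-Hochberg adjustment named in the Methods. The Python sketch below shows that adjustment on a hypothetical list of P-values; the function name and the numbers are illustrative only, and the values in Table 4.1 were produced by edgeR, not by this code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order.

    Each P-value is multiplied by (number of tests / its rank), and a running
    minimum from the largest P-value downward keeps the result monotone.
    """
    n_tests = len(p_values)
    order = sorted(range(n_tests), key=lambda i: p_values[i])
    adjusted = [0.0] * n_tests
    running_min = 1.0
    for rank in range(n_tests, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n_tests / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw P-values for four tests.
example_fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A transcript is then called differentially expressed when its adjusted value falls below the chosen FDR threshold.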

Figure 4.2: Multidimensional scaling (MDS) of gene expression differences between the left and right atria. A. MDS of the miRNA data reveals that the first dimension of miRNA expression imperfectly separated left (pink symbols) and right (blue symbols) atrial tissue, with the left atria tending to cluster on the left side of the plot. Different symbol shapes represent the four subjects, with sample 3 (triangle) representing the sinus rhythm donor. B. MDS of the mRNA data showing segregation of left and right atria in the first dimension.

Table 4.1: Expression differences of miRNAs between the left and right atria at FDR <0.08 ranked by p-value.

mirBase ID1        Atrial Concentration2   Absolute Fold-Change   Expressed Higher   P-Value3     FDR3
hsa-miR-10b        0.000605                3.94                   Left               2.13E-011    5.91E-009
hsa-miR-100        0.0138                  3.23                   Left               3.5E-011     7.28E-009
hsa-miR-135b       8.58E-006               4.99                   Right              5.79E-007    4.02E-005
hsa-miR-487a       1.38E-005               2.25                   Left               8.77E-006    0.000365
hsa-miR-4448       1.34E-005               2.28                   Left               3.59E-005    0.0013
hsa-miR-585        1.39E-005               2.27                   Right              4.68E-005    0.00156
hsa-miR-1275       1.58E-005               2.09                   Right              7.83E-005    0.00228
hsa-miR-483-5p     2.43E-005               2.2                    Right              8.08E-005    0.00228
hsa-miR-4284       2.18E-005               2.45                   Left               9.24E-005    0.00239
hsa-miR-9          0.000143                1.92                   Left               9.48E-005    0.00239
hsa-miR-1973       1.89E-005               2.48                   Right              0.000105     0.00249
hsa-miR-125b-1*    2.31E-005               1.9                    Left               0.000245     0.00475
hsa-miR-4497       3.63E-005               2                      Right              0.000288     0.00533
hsa-miR-425        0.000938                1.76                   Right              0.000625     0.01
hsa-miR-125b       0.00658                 1.78                   Left               0.000627     0.01
hsa-miR-92b        8.89E-005               1.87                   Right              0.000818     0.0124
hsa-miR-150        0.000125                1.72                   Right              0.00116      0.0164
hsa-miR-708        5.8E-005                1.77                   Left               0.00118      0.0164
hsa-miR-495        1.51E-005               1.76                   Left               0.00164      0.021
hsa-miR-3123       3.73E-005               2.04                   Left               0.00176      0.0222
hsa-miR-24-1*      0.000119                1.78                   Left               0.00195      0.0239
hsa-miR-202*       9.28E-006               1.86                   Left               0.00198      0.0239
hsa-miR-155        0.000103                1.72                   Right              0.0022       0.0248
hsa-miR-376c       6.82E-005               1.67                   Right              0.0041       0.0417
hsa-miR-4792       0.000289                1.84                   Left               0.00475      0.0471
hsa-miR-766        1.01E-005               1.67                   Left               0.00505      0.0495
hsa-miR-675*       1.94E-005               1.64                   Right              0.00715      0.067
hsa-miR-378f       0.000106                1.85                   Left               0.00761      0.0692
hsa-miR-133a       0.0111                  1.63                   Left               0.00895      0.0777
hsa-miR-423-5p     0.0011                  1.56                   Left               0.00922      0.0792
hsa-miR-146a       0.000624                1.55                   Right              0.0094       0.0799

1 miR-X* names represent the miR* or passenger strand of the primary miRNA stem-loop transcript.
2 Average fraction of total mapped miRNA reads in the left and right atria.
3 P-values and FDRs are based on the EdgeR pairwise analysis.

4.3.3 mRNA gene expression differences between the left and the right atria

The majority of the mRNA reads mapped to the human reference genome hg19 (85 ± 6.2%) and more specifically to known RefSeq genes (67.7 ± 10.6%). The MDS plot of the transcripts showed separation between the left and right atria in dimension 1 (Figure 4.2B). 746 genes were called differentially expressed between the left and right atria at the stringent FDR of 0.001, while 2292 genes were differentially expressed at an FDR of ≤ 0.05. The top 20 differentially expressed genes ranked by p-value are shown in Table 4.2. PITX2 was expressed 116-fold higher in the left vs. right atria with the 3rd most significant p-value (p = 8.72 × 10⁻⁷²), accounting for 0.0032% of reads mapping to the transcriptome in the left atria, without appreciable expression in right atrial samples (Figure 4.3). Of the 6 RefSeq annotated PITX2 RNA isoforms, only the PITX2c RNA isoform was detected (Supplemental Figure 1), although a 3’ library generation bias might under represent expression of other transcripts. To validate the PITX2 sequence results, we performed an internally-normalized quantitative RT-PCR using a TaqMan expression assay specific for PITX2c on 19 additional left-right atrial pairs (17 surgical samples and 2 unused donor hearts). PITX2c was expressed 232 ± 165-fold (mean ± SD) higher in the left vs. right atria, confirming the RNA-seq results. In contrast, HAMP, encoding hepcidin antimicrobial protein, was expressed 121-fold higher in the right vs. left atria (Figure 4.3). BMP10, a gene known to be down-regulated by PITX2c [137], was expressed 282-fold higher in the right atria, the side opposite to PITX2c expression, as expected (Table 4.2). Two cardiomyocyte specific transcripts were differentially expressed among the top 25 significant genes: MYL2, a slow cardiac myosin regulatory light chain, was expressed 10 fold higher in the left atria (p = 5.11 × 10⁻³³), and HCN4, a pacemaker ion-channel, was expressed more than 7 fold higher in the right atria (p = 1.28 × 10⁻²⁴).

Figure 4.3: HAMP and PITX2 display inverse expression patterns between left and right atria. Data are normalized to one million reads per library. The lines connect each subject’s paired samples.

Table 4.2: The top 20 left-right differentially expressed atrial genes ranked by p-value.

| Gene Symbol | Atrial Concentration¹ | Absolute Fold-Change | Expressed Higher | P-Value² | FDR² |
|---|---|---|---|---|---|
| HAMP | 6.42E-005 | 121.18 | Right | 1.55E-111 | 2.77E-107 |
| BMP10 | 0.000551 | 282.13 | Right | 1.39E-096 | 1.24E-092 |
| PITX2 | 9.19E-006 | 116.33 | Left | 8.72E-072 | 5.2E-068 |
| C2orf14 | 4.32E-006 | 126.97 | Right | 1.15E-048 | 5.13E-045 |
| C19orf33 | 4.83E-006 | 20.81 | Left | 2.7E-039 | 9.66E-036 |
| LOC100144602 | 1.6E-006 | 106.15 | Left | 1.32E-033 | 3.95E-030 |
| MYL2 | 0.000208 | 10.35 | Left | 5.11E-033 | 1.31E-029 |
| BDKRB1 | 3.61E-006 | 41.17 | Left | 1.1E-031 | 2.46E-028 |
| SALL1 | 2.57E-006 | 28.67 | Right | 6.53E-031 | 1.3E-027 |
| DNASE1L3 | 4.95E-006 | 12.47 | Right | 8.28E-031 | 1.48E-027 |
| KRT7 | 7.34E-006 | 14.19 | Left | 1.53E-030 | 2.48E-027 |
| FAM84A | 4.89E-006 | 17.27 | Right | 5.67E-030 | 8.46E-027 |
| IRX3 | 1.07E-005 | 8.39 | Right | 3.75E-029 | 5.16E-026 |
| THBS4 | 5.75E-005 | 9.54 | Left | 2.55E-027 | 3.27E-024 |
| ANKRD30BL | 2.26E-006 | > 1000 | Left | 1.97E-026 | 2.17E-023 |
| SYT4 | 7.43E-006 | 27.11 | Left | 2.05E-026 | 2.17E-023 |
| ALOX15 | 1.85E-005 | 15.03 | Left | 2.06E-026 | 2.17E-023 |
| CLDN18 | 8.92E-006 | 12.62 | Left | 5.19E-026 | 5.16E-023 |
| RBP4 | 1.5E-005 | 6.52 | Left | 1.1E-024 | 1.03E-021 |
| HCN4 | 1.99E-005 | 7.53 | Right | 1.28E-024 | 1.14E-021 |

¹ Average fraction of total mapped miRNA reads in the left and right atria.
² P values and FDRs are based on the EdgeR pairwise analysis.
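The relationship between the P-Value and FDR columns of Table 4.2 can be illustrated with a short sketch. It assumes that the FDR values are Benjamini-Hochberg adjusted p-values, which is the default adjustment in edgeR but is not stated explicitly here. The function name `benjamini_hochberg` and the toy p-values are hypothetical. Under that assumption, the ratio FDR/p for the top-ranked gene (HAMP) would equal the number of genes tested, roughly 17,900; this figure is an inference from the table, not a number reported in the text.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    # Indices of the p-values, smallest p first.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example with hypothetical p-values.
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_fdr = benjamini_hochberg(toy_p)
print(toy_fdr)

# For the rank-1 gene, adjusted p = p * n / 1, so FDR / p estimates n.
# Values are taken from the HAMP row of Table 4.2.
implied_tests = 2.77e-107 / 1.55e-111
print(round(implied_tests))
```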

To better characterize the differences between the left and right atria, gene set enrichment analysis was performed using the Molecular Signatures Database (MSigDB) from the Broad Institute [141], and testing was done with the competitive geneset test function romer from the limma R package [140]. Various genes relating to the Gene Ontology (GO) terms signal transduction and transcription were significantly up-regulated in the right atria compared to the left. For example, of the 1391 atrial-expressed RefSeq genes belonging to the signal transduction pathway, 205 were expressed higher in the right atria in our prior pairwise edgeR analysis and 94 in the left, a significant enrichment (p = 0.0001 by re-sampling in limma; Supplemental Table 5). The is an example of a GO term geneset that was enriched in genes that were more highly expressed in the left atria (p = 0.0001). Various additional GO sets relating to the mitochondria were significantly enriched in the left atria, including components of the NADH dehydrogenase chain, mitochondrial matrix and mitochondrial ribosomal complex (data not shown).

Using gene sets generated from shared conserved cis-regulatory motifs from MSigDB [142], we found that genes containing or adjacent to several transcription factor binding motifs, or conserved motifs not yet associated with a specific transcription factor, were enriched in right or left atrial expression. Table 4.3 shows the list of 21 and 16 motifs that were associated with gene sets enriched in the right and left atria, respectively (at p = 0.0001, with at least 5 genes enriched on one side). For example, 163 atrial-expressed RefSeq genes were close to a p300 binding element, and this set of genes was expressed at significantly higher levels in the right vs. left atria (p = 0.0001 by re-sampling analysis). We compared this atrial-expressed p300 motif-containing geneset with our prior pairwise edgeR analysis of differentially expressed genes and found that, of these 163 genes, 25 were more highly expressed in the right atria versus 7 in the left atria.
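The logic behind a competitive geneset test by re-sampling can be sketched as follows. This is a simplified permutation analogue written for illustration; it is not the rotation-based algorithm implemented by romer, and the gene names, statistics, and the function name `competitive_set_pvalue` are hypothetical. It also shows why the smallest attainable p-value is 1/(resamples + 1): with 9,999 resamples (an assumption, not a figure stated in the text) the floor is 0.0001, the same value as the minimum p-value reported above.

```python
import random


def competitive_set_pvalue(gene_stats, gene_set, n_resamples=9999, seed=0):
    """Upper-tail p-value comparing the mean statistic of gene_set with
    random gene sets of the same size drawn from all genes."""
    rng = random.Random(seed)
    genes = list(gene_stats)
    members = [g for g in genes if g in gene_set]
    k = len(members)
    if k == 0:
        raise ValueError("gene_set shares no genes with gene_stats")
    observed = sum(gene_stats[g] for g in members) / k
    hits = 0
    for _ in range(n_resamples):
        sample = rng.sample(genes, k)
        if sum(gene_stats[g] for g in sample) / k >= observed:
            hits += 1
    # Adding 1 to both counts keeps the p-value strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical data: 1,000 genes with random right-vs-left statistics,
# of which 30 are shifted strongly toward the right atria.
data_rng = random.Random(1)
stats = {f"gene{i}": data_rng.gauss(0.0, 1.0) for i in range(1000)}
shifted = {f"gene{i}" for i in range(30)}
for gene in shifted:
    stats[gene] += 3.0

p_shifted = competitive_set_pvalue(stats, shifted)
print(p_shifted)
```

Because the reference distribution is built from other genes in the same experiment, the test asks whether the set is more right-biased than a typical set of its size, rather than whether any member of the set is differentially expressed.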

Table 4.3: Top left-right atria differentially regulated genesets by presence of transcription factor binding motifs ± 2 kb from the start site of transcription of atrial expressed genes.

| Motif | Right¹ | Left¹ | Hypothesis |
|---|---|---|---|
| V$AP4_01 | 19 | 10 | Right |
| AACTTT_UNKNOWN | 195 | 74 | Right |
| AACWWCAANK_UNKNOWN | 10 | 1 | Right |
| V$P300_01 | 25 | 7 | Right |
| V$CREBP1_01 | 14 | 6 | Right |
| V$MYOGNF1_01 | 6 | 3 | Right |
| V$IRF1_01 | 23 | 9 | Right |
| V$IRF2_01 | 10 | 5 | Right |
| V$TAL1BETAE47_01 | 24 | 12 | Right |
| V$TAL1ALPHAE47_01 | 24 | 11 | Right |
| V$HEN1_01 | 21 | 12 | Right |
| V$TAL1BETAITF2_01 | 26 | 14 | Right |
| V$GATA3_01 | 24 | 13 | Right |
| V$EVI1_04 | 34 | 6 | Right |
| V$EVI1_05 | 17 | 7 | Right |
| V$MZF1_02 | 19 | 6 | Right |
| V$ZID_01 | 21 | 8 | Right |
| V$IK2_01 | 28 | 7 | Right |
| V$CDP_01 | 7 | 4 | Right |
| ACCTGTTG_UNKNOWN | 15 | 5 | Right |
| V$ELK1_02 | 3 | 8 | Right |
| V$CREB_01 | 14 | 16 | Left |
| V$EVI1_02 | 7 | 7 | Left |
| V$NRF2_01 | 3 | 8 | Left |
| ACTAYRNNNCCCR_UNKNOWN | 9 | 10 | Left |
| V$USF_C | 12 | 14 | Left |
| V$PPARA_01 | 3 | 4 | Left |
| V$ATF_B | 11 | 13 | Left |
| V$GABP_B | 3 | 8 | Left |
| V$TEL2_Q6 | 14 | 4 | Left |
| V$SF1_Q6 | 19 | 13 | Left |
| V$COUP_DR1_Q6 | 13 | 11 | Left |
| GGAANCGGAANY_UNKNOWN | 0 | 9 | Left |
| V$CREB_Q4_01 | 11 | 12 | Left |
| V$AP4_Q6_01 | 16 | 23 | Left |
| V$ER_Q6_02 | 18 | 25 | Left |

Inclusion in this table was performed by filtering genesets that contained at least 5 genes that were significantly up-regulated in the right or left atria, and by having the minimum p-value (10^-4) under the model-free hypothesis including all genes near the specified DNA element.
¹ Number of genes from the geneset that are contained within those found to be expressed significantly higher in the left or right atria via the edgeR analysis (FDR < 0.05), not including those with moderate expression bias.

We repeated this analysis using shared conserved miRNA binding motifs in the 3’ UTR of atrial-expressed genes. We found 16 and 1 miRNA motif genesets that were expressed higher in the right and left atria, respectively (p = 0.0001, with at least 5 genes enriched on one side; Table 4.4). For example, 165 atrial-expressed genes contained a miR-133a binding site in their 3’ UTRs, and this set of genes was expressed significantly higher in the right vs. left atria (p = 0.0001 by re-sampling analysis). We compared this atrial-expressed miR-133a binding site motif-containing geneset with our prior pairwise edgeR analysis of differentially expressed genes and found that, of these genes, 11 were expressed higher in the right atria and 1 in the left atria. This is consistent with the observed higher expression of miR-133a in the left vs. right atria (1.63-fold difference, Table 4.1): higher left-sided expression of miR-133a would decrease expression of its mRNA target genes on that side, resulting in higher right-sided expression of these targets.

Table 4.4: Top left-right atria differentially regulated genesets by presence of conserved miRNA motif in the 3’ UTR of atrial expressed genes.

| Motif | Right¹ | Left¹ | Hypothesis |
|---|---|---|---|
| CACTGCC,MIR-34A,MIR-34C,MIR-449 | 20 | 13 | Right |
| GGGACCA,MIR-133A,MIR-133B | 11 | 1 | Right |
| ATGCTGC,MIR-103,MIR-107 | 18 | 7 | Right |
| AGGGCCA,MIR-328 | 8 | 1 | Right |
| CTGAGCC,MIR-24 | 17 | 7 | Right |
| CCTGTGA,MIR-513 | 11 | 5 | Right |
| GGCAGCT,MIR-22 | 24 | 13 | Right |
| CCTGCTG,MIR-214 | 19 | 3 | Right |
| GACAGGG,MIR-339 | 8 | 2 | Right |
| GAGCCAG,MIR-149 | 16 | 5 | Right |
| CCCAGAG,MIR-326 | 12 | 1 | Right |
| GCAAGGA,MIR-502 | 10 | 3 | Right |
| CAGTCAC,MIR-134 | 8 | 0 | Right |
| TCCAGAG,MIR-518C | 14 | 4 | Right |
| CACCAGC,MIR-138 | 17 | 5 | Right |
| CAGCCTC,MIR-485-5P | 16 | 3 | Right |
| CCATCCA,MIR-432 | 4 | 6 | Left |

Inclusion in this table was performed by filtering genesets that contained at least 5 genes that were significantly up-regulated in the right or left atria, and by having the minimum p-value (10^-4) under the model-free hypothesis including all genes near the specified DNA element.
¹ Number of genes from the geneset that are contained within those found to be expressed significantly higher in the left or right atria via the edgeR analysis (FDR < 0.05), not including those with moderate expression bias.

We examined AF and PR interval GWAS hits and found three nearby coding genes, PITX2, SULF2, and WNT11, that had significant left-right atrial gene expression differences (Supplemental Table 6), reinforcing that these genes may play a functional role in AF pathogenesis.

4.3.4 Left-right expression differences in poorly annotated transcripts

Cufflinks [143] was used to assemble transcripts from aligned sequencing reads in order to identify potentially novel transcripts. We found 521 differentially expressed transcripts that were classified either as novel or as a novel processed transcript by the Ensembl database (release 61). After manual curation to filter out alternative splice variants of known genes and very lowly expressed transcripts (i.e., those expressed below 5 fragments per kilobase of transcript per million mapped reads of the transcriptome (FPKM)), thirteen novel transcripts, ranging from 2 to 4 exons, were found to be differentially expressed that were not found in the Ensembl annotation; all of them appear to function as non-coding RNAs based upon in silico analysis. To check that these transcripts did not arise from misalignment of sequencing reads, we realigned the reads using BLAT [53]: for eleven of the novel transcripts, 95-100% of the sequencing reads mapped uniquely to the genome in an exon-contiguous fashion (Table 4.5), whereas reads from the other two mapped to multiple sites. The most significantly differentially expressed novel transcript (p < 10^-16) is adjacent to the TBX5 gene, which encodes a heart-specific transcription factor.
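The expression filter described above can be made concrete with a short sketch of the standard FPKM definition. The threshold of 5 FPKM is taken from the text; the transcript names, fragment counts, and lengths below are hypothetical, and Cufflinks itself estimates FPKM with a more elaborate statistical model than this direct calculation.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


MIN_FPKM = 5.0  # expression threshold stated in the text

# Hypothetical candidates:
# name -> (fragments, transcript length in bp, total mapped fragments)
candidates = {
    "novel_A": (300, 1500, 20_000_000),
    "novel_B": (40, 2000, 20_000_000),
}

kept = [name for name, args in candidates.items() if fpkm(*args) >= MIN_FPKM]
print(kept)
```

Dividing by transcript length as well as library size keeps long transcripts, which collect more fragments at the same abundance, from passing the filter on length alone.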

Table 4.5: Novel non-Ensembl annotated transcripts that were differentially expressed between the left and right atria ranked by p-value.

| Transcript Boundary | Nearest Gene | Exons | Absolute Fold Change | Expressed Higher | P-Value |
|---|---|---|---|---|---|
| chr12:114883593-114885094 | TBX5 | 2 | 2.01 | Right | ≤ 10^-16 |
| chr6:36810354-36812333 | CPNE5 | 2 | 1.84 | Left | ≤ 10^-16 |
| chr12:58325277-58329330 | XRCC6BP1 | 2 | 6.69 | Right | 2.2E-016 |
| chr2:27939711-27961143 | SLC4A1 | 2 | 2.94 | Left | 1.4E-014 |
| chr2:50999236-51003553 | NRXN1¹ | 2 | 1.77 | Left | 1.2E-012 |
| chr19:50989460-51003531 | JOSD2 | 3 | 4.95 | Right | 0.00011 |
| chr15:25243485-25247622 | SNOR108 | 4 | 2.32 | Right | 0.00047 |
| chr19:50990647-50999173 | C19orf63 | 2 | 1.75 | Right | 0.00076 |
| chr16:58467328-58496760 | NDRG4 | 3 | 1.58 | Left | 0.0012 |
| chr13:114054026-114066562 | ADPRHL1 | 2 | 1.39 | Left | 0.0025 |
| chr19:50991540-51005189 | JOSD2 | 2 | 3.13 | Left | 0.015 |

¹ Transcript is intronic

4.4 Discussion

We found large differences in miRNA and mRNA expression levels between the left and the right atria in our genome-wide analysis of total RNA expression, including the well-known left-right patterning genes PITX2 and BMP10. Of the 17 named genes in our top 20 differentially expressed transcripts (Table 4.2), six have recently been validated to be differentially expressed by microarray and/or RT-PCR in the same direction in mouse and/or human atria, including the top three transcripts HAMP, BMP10, and PITX2 [144]. Several novel non-coding RNAs were asymmetrically expressed as well. These data suggest that the left and the right atria have significantly different gene expression profiles, which may have electrophysiological and pathophysiological consequences. However, a limitation of this study is the combination of diseased and non-diseased subjects. Differences in transcription between AF and non-AF in the left atria are of great interest, but we were not powered to detect these differences.

The top locus associated with AF maps to an intergenic region on chromosome 4q25, and PITX2 is the closest adjacent gene. Previous work in humans has shown that PITX2c is expressed primarily in the left atria [144][145]. We confirmed this result and found practically no PITX2c expression in the right atria, nor did we detect expression of any other PITX2 isoform in the left atria. Kirchhof et al. reported that PITX2c was the only isoform expressed in mouse atria, and that it was expressed only in the left atria [146], in agreement with our data in humans. However, a prior study in humans detected multiple isoforms of PITX2 in both atria using RT-qPCR [135]. RNA-seq has been reported to be more reliable than RT-qPCR for low-abundance transcripts [147], although our failure to detect other PITX2 isoforms may be due to insufficient read coverage to detect transcripts expressed at less than 10 mRNA copies per million mRNA. In contrast to PITX2c, which was expressed only in the left atria, BMP10, a transcript known to be directly repressed by PITX2, was expressed only in the right atria, in agreement with recently published data [144]. PITX2 has been reported to repress the expression of hsa-mir-1-1 [145]. Although our sample size is small, we observed a trend toward an inverse correlation between PITX2 expression and hsa-mir-1-1 expression in the left atria (r = -0.84, p = 0.099). However, several other atrial-expressed genes known to be regulated by PITX2c, such as SHOX2, TBX3 and NKX2-5 [148][135], showed no difference in expression levels between the left and the right atria (data not shown). PITX2 expression is high throughout the left atria just after birth in mice, but it is only expressed in a small subset of left atrial cells in adult mice [135]. We speculate that developmental epigenetic influences or other overriding transcription factors might also regulate SHOX2, TBX3 and NKX2-5 expression in the adult atria, diminishing the effect of PITX2. Although PITX2c is expressed only at low levels in these adult samples, the differential expression of PITX2c and BMP10 reported here suggests that in adults these factors may play a continuing role, potentially in the pathogenesis of AF.

In addition to genes with well-known roles in cardiac , we found that the HAMP gene was expressed exclusively in the right atria. HAMP encodes hepcidin antimicrobial protein, a protein that is mainly produced by the liver and that controls iron absorption in the intestine. Mutations in HAMP cause hemochromatosis type 2B, leading to iron overload in many organ systems. Furthermore, hemochromatosis type 2B often presents with cardiomyopathy, heart failure, and/or major arrhythmias, which are a prominent cause of death in the absence of treatment, suggesting that HAMP may have a local role in iron regulation in the heart [149].

Hsa-miR-143 was the most abundantly expressed miRNA in human atria in our analysis, accounting for 30% of all mapped miRNA reads in the atria. However, it was previously reported, by sequencing a library of cloned miRNA cDNAs, that the most abundant miRNA expressed in the mouse heart is miR-1, accounting for 45% of miRNA expression [150]. There are several potential explanations for this discrepancy. First, we specifically looked at miRNA expression in the atria and not the whole heart, and it has been shown that the atria and ventricles have vastly different expression profiles [151]. Second, miRNA expression in the heart may not be conserved across species; for example, cardiac p300 binding site enhancer regions are weakly conserved between mice and humans [152].
The high expression level of miR-143 in the atria is concordant with its role in development in mice, where it has been shown to play a critical role in the formation of the outflow tract by repressing KLF4 and promoting differentiation [153]. Recent work in zebrafish has shown that miR-143 expression during cardiogenesis is dependent on the beating of the heart, although miR-143 is not expressed in zebrafish atria, but rather in the outflow tract and ventricles [154]. miR-143 has also been shown to be critical for cardiac chamber formation in zebrafish through the direct repression of adducin3, which encodes an F-actin capping protein [155].

We found increased expression of miR-133 in the left atria and a corresponding decrease in expression of miR-133 targets in the left atria. miR-133 specifies a cardiac progenitor lineage by preventing the expression of non-muscle genes, but it also inhibits further differentiation into cardiomyocytes [156][157], which may suggest a more progenitor-like state of the left atria when compared to the right atria.

Polyadenylated long intergenic non-coding RNAs (lincRNAs) have been shown to regulate gene expression in trans by acting as scaffolds for chromatin-modifying proteins such as the polycomb repressive complex 2 (PRC2), which regulates gene expression via histone modification [32]. siRNA-mediated down-regulation of specific PRC2-associated lincRNAs, such as HOTAIR, leads to the up-regulation of a specific set of 100 to 300 PRC2-repressed genes. We found a conserved lincRNA, located 47 kb upstream of the gene encoding transcription factor TBX5, that was differentially expressed between the left and right atria. There are 2 SNPs near TBX5 associated with PR interval [31], which may be relevant to AF pathogenesis. We speculate that the lincRNA adjacent to TBX5 may also be regulated by these SNPs, altering expression of downstream genes through chromatin remodeling. We found that many lincRNAs were expressed in the heart, suggesting the potential for widespread cardiac regulation of gene expression by these non-coding RNAs.

Chapter 5

Conclusion: How to unravel complicated traits

5.1 Roadmap to Identification of Mouse Atherosclerosis Modifier Genes

QTL mapping of aortic root lesion area, a complex quantitative trait, in a new strain intercross reproduced the loci on chromosomes 2, 15 and 17 found in a previous DBA/2 ApoE-/- × AKR ApoE-/- F2 intercross cohort. The hits on chromosomes 2 and 17 each accounted for 6% of the total variance in aortic root lesion area, with the DBA/2 allele increasing the root lesion area. As in a GWAS, replication is essential in QTL experiments for removing false positives that arise from the large number of statistical tests being run. It should be noted, however, that not every hit replicated. One hit that was not replicated in the new study was the one on chromosome 5. Given the sample size, it may be that we were not powered to detect the locus, or it may be that the first hit was a false positive. Congenic mice made after this study did not validate the QTL on chromosome 5, and therefore the chromosome 5 QTL found in the first study is likely to be a false positive. Of the three positive QTLs, the chromosome 17 congenic strain shows the same strong, significant effect in the same direction (unpublished), and the other two congenic strains are still being bred.

The replication for cis-eQTLs was high. We found several hundred cis-eQTLs that replicated between the two studies, roughly half of the eQTLs. Along with many other studies, this suggests that genetic variation has pervasive control of gene expression levels. The replication of trans-eQTLs was low, with the list limited to only a few genes. Our focus has thus shifted to cis-elements. Of course, expression alterations are not the only possible changes; we have also identified all the coding nonsynonymous SNPs at the QTLs.

One component of genetics that was not addressed directly in our eQTL studies is the genetic interaction with environment. In Chapter 3, we studied the gene expression response due to an interaction between cholesterol loading and strain. Pathways involving lysosomal processes differed significantly between the two strains, with higher expression in the atherosclerosis-resistant strain. We have verified several of these gene × treatment interactions, such as that for Atf4, using qRT-PCR.

Taken altogether, we made a comprehensive list of genes that are strong candidates for being modifier genes of atherosclerosis. Combining these different genomic and transcriptomic studies, we identified Sys1 on chromosome 2 as a strong candidate atherosclerosis modifier gene for the following 3 reasons: 1) it is located in a replicated atherosclerosis QTL; 2) it is a strong, replicated, cross-tissue cis-eQTL; and 3) it is involved in the lysosomal pathway implicated in the strong differences in autophagy we have discovered in these 2 strains.

In all scientific journeys, there is always more. We have uncovered these loci, but verifying the direct downstream effect is still an ongoing investigation.
We are in the process of knocking down some of these genes and looking at an intermediate phenotype. Sys1 encodes a protein that is essential in moving Arfrp1 protein into the lysosome, and it has a potential role in the metabolism of cholesterol ester to free cholesterol via lysosomal acid lipase. Peggy Robinet and Brian Ritchey in the lab have found that the AKR strain has much higher free cholesterol and less cholesterol ester in acetylated-LDL-loaded macrophages. We are attempting to see if knockdown or overexpression of Sys1 alters this phenotype. Backcrosses are being performed to fine-map the identified loci. In addition, we can test candidates by using genomic engineering to convert the DBA/2 alleles into AKR alleles in DBA/2 embryonic stem cells, which allows us to make “allele replacement” mice. This is a laborious task, but it would be the gold-standard method to prove causality for an atherosclerosis modifier gene, as it isolates the effect of a single variant without other variables.

5.2 Roadmap to Identification of causal variants for AF, and their mechanism of action

We have found significant differences in expression between the left and right atria that are likely to play a role in the etiology of AF. Many of these differences occurred at loci that are significantly associated with atrial fibrillation in GWAS, for example at the PITX2 and TBX5 loci. We have also found several long non-coding RNAs that are differentially regulated at these loci. One of these occurs adjacent to PITX2. Our studies have shown that both PITX2 and the lincRNA adjacent to PITX2 have a cis-eQTL within another adjacent gene. However, the expression of neither correlates with AF (unpublished). In addition, we have identified an eQTL by both microarray and allelic expression imbalance (AEI) at CAV1, both associated with the AF GWAS SNP at that locus. We are in the process of assessing AEI across hundreds of RNA-seq samples from left atria, especially for those genes that are located near GWAS hits. Once eQTLs and AEI have been confirmed at these GWAS loci, potential causative variants can be restricted to those that are in tight LD with GWAS hits for further study of function. We are in the process of testing for enhancer or repressor activity in these regions by reporter assays in cardiomyocytes derived from H9 stem cell lines. Once an effect has been observed, genomic editing between the bi-allelic polymorphisms can be performed to test for different effects on gene expression and cardiomyocyte differentiation and function.
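The allelic expression imbalance assessment mentioned above can be illustrated with a minimal sketch: at a heterozygous SNP in one RNA-seq sample, reads carrying the reference and alternate alleles are compared against the 50:50 ratio expected without cis-regulatory differences. The exact binomial test, the function names, and the read counts below are illustrative assumptions rather than the method used in our analyses; a real AEI pipeline would also need to handle reference mapping bias and overdispersion across samples.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= threshold))


def allelic_imbalance(ref_reads, alt_reads):
    """Return (reference allele fraction, p-value against a 50:50 ratio)."""
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this SNP")
    return ref_reads / n, binomial_two_sided_p(ref_reads, n)


# Hypothetical allele counts at two heterozygous SNPs in one sample.
skewed_ratio, skewed_p = allelic_imbalance(ref_reads=80, alt_reads=20)
balanced_ratio, balanced_p = allelic_imbalance(ref_reads=52, alt_reads=48)
print(skewed_ratio, skewed_p)
print(balanced_ratio, balanced_p)
```

Because both alleles are measured within the same sample, this comparison is internally controlled for trans-acting and environmental effects, which is what makes AEI a useful complement to eQTL mapping.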

5.3 The utility of functional genomics

As sequencing gets cheaper and more rare mutations are discovered, prediction of their functional role will be highly important. Considering that non-coding functional regions of the genome appear to play a large role in creating phenotypic variation, ab initio prediction of the effects of novel variants in these regions will be essential for developing personalized treatments and therapeutics. Although non-coding variants are often ignored, I hypothesize that predicting the changes they cause will become easier once some of their downstream mechanisms have been characterized. The first step is to identify the functional regions; in our lab, reporter gene assays have been used for this purpose, as well as bioinformatic approaches such as using the ENCODE data. Statistical genetics in these regions, by fine mapping/resequencing studies or multiethnic epidemiological studies, will then both confirm and narrow down the region of interest. Once the potential causative variants are narrowed down to a manageable number, one can use genomic engineering techniques to change a single variant and determine its effect in functional studies. Stem cells are ideal for such studies because of their increased ability to repair DNA after damage [158], which makes them amenable to genome-editing approaches such as CRISPR, zinc finger, and TALEN [159][160][161]. Stem cells can also be directed to differentiate into hard-to-obtain cell types. The technology now exists to test specifically for a functional role of variants in various diseases, especially heart disease.

Bibliography

[1] Valentin Fuster, Bridget B Kelly, and Rajesh Vedanthan. “Promoting Global Cardiovascular Health”. In: Circulation 123.15 (2011), pp. 1671–1678.
[2] Alan S Go et al. “Executive Summary: Heart Disease and Stroke Statistics—2013 Update: A Report From the American Heart Association”. In: Circulation 127.1 (2013), pp. 143–152.
[3] Peter WF Wilson et al. “Prediction of coronary heart disease using risk factor categories”. In: Circulation 97.18 (1998), pp. 1837–1847.
[4] Donald M Lloyd-Jones et al. “Lifetime risk for development of atrial fibrillation: the Framingham Heart Study”. In: Circulation 110.9 (2004), pp. 1042–1046.
[5] Thomas J Wang et al. “A risk score for predicting stroke or death in individuals with new-onset atrial fibrillation in the ”. In: JAMA: the journal of the American Medical Association 290.8 (2003), pp. 1049–1056.
[6] Philip A Wolf, Robert D Abbott, and William B Kannel. “Atrial fibrillation as an independent risk factor for stroke: the Framingham Study.” In: Stroke 22.8 (1991), pp. 983–988.
[7] Lucas Elijovich et al. “Intermittent atrial fibrillation may account for a large proportion of otherwise cryptogenic stroke: a study of 30-day cardiac event monitors.” In: Journal of stroke and cerebrovascular diseases: the official journal of National Stroke Association 18.3 (2009), p. 185.
[8] Emelia J Benjamin et al. “Independent risk factors for atrial fibrillation in a population-based cohort”. In: JAMA: the journal of the American Medical Association 271.11 (1994), pp. 840–844.
[9] Carl Müller. “Xanthomata, Hypercholesterolemia, Angina Pectoris.” In: Acta Medica Scandinavica 95.S89 (1938), pp. 75–84. issn: 0954-6820. doi: 10.1111/j.0954-6820.1938.tb19279.x.
[10] Michael S Brown and Joseph L Goldstein. “Lipoprotein metabolism in the macrophage: implications for cholesterol deposition in atherosclerosis”. In: Annual review of 52.1 (1983), pp. 223–261.
[11] Richard H Myers et al. “Parental history is an independent risk factor for coronary artery disease: the Framingham Study”. In: American heart journal 120.4 (1990), pp. 963–969.

[12] Marcus Fischer et al. “Distinct heritable patterns of angiographic coronary artery disease in families with myocardial infarction”. In: Circulation 111.7 (2005), pp. 855–862.
[13] Melissa A Austin et al. “Risk factors for coronary heart disease in adult female twins: genetic heritability and shared environmental influences”. In: American Journal of 125.2 (1987), pp. 308–318.
[14] Steven C Hunt et al. “Genetic heritability and common environmental components of resting and stressed blood pressures, lipids, and body mass index in Utah pedigrees and twins”. In: American Journal of Epidemiology 129.3 (1989), pp. 625–638.
[15] Marian Beekman et al. “Heritabilities of and lipid levels in three countries”. In: Twin Research 5.2 (2002), pp. 87–97.
[16] Debra A Heller et al. “Genetic and environmental influences on serum lipid levels in twins”. In: New England Journal of Medicine 328.16 (1993), pp. 1150–1156.
[17] Jacqueline M Vink, Gonneke Willemsen, and Dorret I Boomsma. “Heritability of smoking initiation and nicotine dependence”. In: Behavior genetics 35.4 (2005), pp. 397–406.
[18] Dorit Carmelli et al. “Genetic influence on smoking—a study of male twins”. In: New England Journal of Medicine 327.12 (1992), pp. 829–833.
[19] Christina N Lessov-Schlaggar et al. “Heritability of cigarette smoking and alcohol use in Chinese male twins: the Qingdao twin registry”. In: International journal of epidemiology 35.5 (2006), pp. 1278–1285.
[20] Jacqueline M Vink and Dorret I Boomsma. “Interplay between heritability of smoking and environmental conditions? A comparison of two birth cohorts”. In: BMC public health 11.1 (2011), p. 316.
[21] P Poulsen et al. “Heritability of type II (non--dependent) diabetes mellitus and abnormal glucose tolerance–a population-based twin study”. In: Diabetologia 42.2 (1999), pp. 139–145.
[22] Ingrid Elisabeth Christophersen et al. “Familial Aggregation of Atrial Fibrillation: A Study in Danish Twins”. In: Circulation: Arrhythmia and Electrophysiology 2.4 (2009), pp. 378–383.
[23] Patrick T Ellinor et al. “Familial aggregation in lone atrial fibrillation”. In: Human genetics 118.2 (2005), pp. 179–184.
[24] David O Arnar et al. “Familial aggregation of atrial fibrillation in Iceland”. In: European heart journal 27.6 (2006), pp. 708–712.
[25] Patrick T Ellinor et al. “Locus for atrial fibrillation maps to chromosome 6q14–16”. In: Circulation 107.23 (2003), pp. 2880–2883.

[26] Lucia A Hindorff et al. “Potential etiologic and functional implications of genome-wide association loci for human diseases and traits.” In: Proceedings of the National Academy of Sciences of the United States of America 106.23 (June 2009), pp. 9362–7. issn: 1091-6490. doi: 10.1073/pnas.0903103106.
[27] Heribert Schunkert et al. “Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease.” In: Nature genetics 43.4 (Jan. 2011), pp. 333–8. issn: 1546-1718. doi: 10.1038/ng.784.
[28] Daniel F Gudbjartsson et al. “A sequence variant in ZFHX3 on 16q22 associates with atrial fibrillation and ischemic stroke.” In: Nature genetics 41.8 (Aug. 2009), pp. 876–8. issn: 1546-1718. doi: 10.1038/ng.417.
[29] Patrick T Ellinor et al. “Common variants in KCNN3 are associated with lone atrial fibrillation.” In: Nature genetics 42.3 (Mar. 2010), pp. 240–4. issn: 1546-1718. doi: 10.1038/ng.537.
[30] Raha Pazoki et al. “SNPs Identified as Modulators of ECG Traits in the General Population Do Not Markedly Affect ECG Traits during Acute Myocardial Infarction nor Ventricular Fibrillation Risk in This Condition”. In: PloS one 8.2 (2013), e57216.
[31] Hilma Holm et al. “Several common variants modulate heart rate, PR interval and QRS duration.” In: Nature genetics 42.2 (Feb. 2010), pp. 117–122. issn: 1546-1718. doi: 10.1038/ng.511.
[32] Ahmad M Khalil et al. “Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression.” In: Proceedings of the National Academy of Sciences of the United States of America 106.28 (July 2009), pp. 11667–11672. issn: 1091-6490. doi: 10.1073/pnas.0904715106.
[33] Samuel P Dickson et al. “Rare variants create synthetic genome-wide associations”. In: PLoS biology 8.1 (2010), e1000294.
[34] Jonathan C Cohen et al. “Sequence variations in PCSK9, low LDL, and protection against coronary heart disease”. In: New England Journal of Medicine 354.12 (2006), pp. 1264–1272.
[35] Mario Falchi et al. “A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia”. In: The American Journal of Human Genetics 75.6 (2004), pp. 1015–1031.
[36] Jennifer Harrow et al. “GENCODE: The reference human genome annotation for The ENCODE Project”. In: Genome research 22.9 (2012), pp. 1760–1774.
[37] Marc A Schaub et al. “Linking disease associations with regulatory information in the human genome”. In: Genome research 22.9 (2012), pp. 1748–1759.
[38] NJ Samani et al. “Genomewide association analysis of coronary artery disease”. In: New England Journal of Medicine 357.5 (2007), pp. 443–453.

[39] Eric E Schadt et al. “Mapping the genetic architecture of gene expression in human liver.” In: PLoS biology 6.5 (May 2008), e107. issn: 1545-7885. doi: 10.1371/journal.pbio.0060107.
[40] Kiran Musunuru et al. “From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus”. In: Nature 466.7307 (Aug. 2010), pp. 714–719.
[41] Robert C Wirka et al. “A Common -40 Gene Promoter Variant Affects Connexin-40 Expression in Human Atria and Is Associated With Atrial Fibrillation”. In: Circulation: Arrhythmia and Electrophysiology 4.1 (2011), pp. 87–93.
[42] J. D. Smith et al. “In Silico Quantitative Trait Locus Map for Atherosclerosis Susceptibility in -Deficient Mice”. In: Arteriosclerosis, thrombosis, and vascular biology 23.1 (Nov. 2002), pp. 117–122. issn: 10795642. doi: 10.1161/01.ATV.0000047461.18902.80.
[43] DM Lloyd-Jones et al. “Parental cardiovascular disease as a risk factor for cardiovascular disease in middle-aged adults”. In: JAMA 291.18 (2004), pp. 2204–2211.
[44] S. Zdravkovic et al. “Heritability of death from coronary heart disease: a 36-year follow-up of 20 966 Swedish twins”. In: Journal of Internal Medicine 252.3 (Sept. 2002), pp. 247–254. issn: 0954-6820. doi: 10.1046/j.1365-2796.2002.01029.x.
[45] MP Reilly et al. “Novel locus for coronary atherosclerosis and association of ABO with myocardial infarction in the presence of coronary atherosclerosis: two genome-wide association”. In: Lancet 377.9763 (2011), pp. 383–392. doi: 10.1016/S0140-6736(10)61996-4.Identification.
[46] Christopher T Johansen et al. “Excess of rare variants in non-genome-wide association study candidate genes in patients with hypertriglyceridemia”. In: Circulation. Cardiovascular genetics 5.1 (Feb. 2012), pp. 66–72. issn: 1942-3268. doi: 10.1161/CIRCGENETICS.111.960864.
[47] EE Schadt et al. “Genetics of gene expression surveyed in maize, mouse and man”. In: Nature 422.October 2002 (2003), pp. 297–302. doi: 10.1038/nature01482.1..
[48] Jeffrey M Bhasin et al. “Sex specific gene regulation and expression QTLs in mouse macrophages from a strain intercross.” In: PloS one 3.1 (Jan. 2008), e1435. issn: 1932-6203. doi: 10.1371/journal.pone.0001435.
[49] Julie Baglione and Jonathan D Smith. “Quantitative assay for mouse atherosclerosis in the aortic root”. In: Methods in Molecular Medicine 129 (2006), pp. 83–95.
[50] KW Broman et al. “R/qtl: QTL mapping in experimental crosses”. In: Bioinformatics 19.7 (2003), pp. 889–890. doi: 10.1093//btg112.

112 [51] W. Shi et al. “Endothelial Responses to Oxidized Lipoproteins Determine Ge- netic Susceptibility to Atherosclerosis in Mice”. In: Circulation 102.1 (July 2000), pp. 75–81. issn: 0009-7322. doi: 10.1161/01.CIR.102.1.75. [52] Pan Du, Warren a Kibbe, and Simon M Lin. “lumi: a pipeline for processing Illumina microarray.” In: Bioinformatics (Oxford, England) 24.13 (July 2008), pp. 1547–8. issn: 1367-4811. doi: 10.1093/bioinformatics/btn224. [53] W.J. Kent. “BLAT — The BLAST-Like Alignment Tool”. In: Genome research 12.4 (2002), p. 656. doi: 10.1101/gr.229202.. [54] Thomas M Keane et al. “Mouse genomic variation and its effect on phenotypes and gene regulation.” In: Nature 477.7364 (Sept. 2011), pp. 289–94. issn: 1476- 4687. doi: 10.1038/nature10413. [55] Binnaz Yalcin et al. “Sequence-based characterization of structural variation in the mouse genome.” In: Nature 477.7364 (Sept. 2011), pp. 326–9. issn: 1476-4687. doi: 10.1038/nature10432. [56] Rudi Alberts et al. “Sequence polymorphisms cause many false cis eQTLs.” In: PloS one 2.7 (Jan. 2007). Ed. by John Storey, e622. [57] Heng Li et al. “The Sequence Alignment/Map format and SAMtools.” In: Bioinformatics (Oxford, England) 25.16 (Aug. 2009), pp. 2078–9. issn: 1367- 4811. doi: 10.1093/bioinformatics/btp352. [58] Scott Schwartz et al. “Human – Mouse Alignments with BLASTZ”. In: Genome research 13.1 (2003), pp. 103–107. doi: 10.1101/gr.809403.. [59] LA Hindorof et al. A Catalog of Published Genome-Wide Association Studies. [60] Jonathan D Smith et al. “Atherosclerosis susceptibility loci identified from a strain intercross of apolipoprotein E-deficient mice via a high-density genome scan.” In: Arteriosclerosis, thrombosis, and vascular biology 26.3 (Mar. 2006), pp. 597–603. issn: 1524-4636. doi: 10.1161/01.ATV.0000201044.33220.5c. [61] Danny Arends et al. “R/qtl: high-throughput multiple QTL mapping.” In: Bioinformatics (Oxford, England) 26.23 (Dec. 2010), pp. 2990–2. issn: 1367- 4811. 
doi: 10.1093/bioinformatics/btq565. [62] Christopher J O’Donnell et al. “Genome-wide association study for subclinical atherosclerosis in major arterial territories in the NHLBI’s Framingham Heart Study”. In: BMC Medical Genetics 8.Suppl 1 (2007), S4. issn: 14712350. doi: 10.1186/1471-2350-8-S1-S4. [63] Ivan Adzhubei et al. “A method and server for predicting damaging missense mutations.” In: Nature methods 7.4 (Apr. 2010), pp. 248–9. issn: 1548-7105. doi: 10.1038/nmeth0410-248. [64] Dorian O Haskard, Joseph J Boyle, and Justin C Mason. “The role of comple- ment in atherosclerosis.” In: Current opinion in lipidology 19.5 (Oct. 2008), pp. 478–82. issn: 0957-9672. doi: 10.1097/MOL.0b013e32830f4a06.

113 [65] Atila van Nas et al. “Expression Quantitative Trait Loci: Replication, Tissue- and Sex-Specificity in Mice.” In: Genetics 185.3 (May 2010), pp. 1059–1068. issn: 1943-2631. doi: 10.1534/genetics.110.116087. [66] Rudy Behnia et al. “Targeting of the Arf-like GTPase Arl3p to the Golgi requires N-terminal acetylation and the membrane protein Sys1p.” In: Na- ture 6.5 (May 2004), pp. 405–13. issn: 1465-7392. doi: 10.1038/ ncb1120. [67] Angela Hommel et al. “The ARF-like GTPase ARFRP1 is essential for lipid droplet growth and is involved in the regulation of .” In: Molecular and cellular biology 30.5 (Mar. 2010), pp. 1231–42. issn: 1098-5549. doi: 10. 1128/MCB.01269-09. [68] Shinsuke Yasuda et al. “Urokinase-type plasminogen activator is a preferred substrate of the human epithelium serine protease /PRSS22”. In: Blood 105.10 (2005), pp. 3893–3901. [69] Aaron E Cozen et al. “Macrophage-targeted overexpression of urokinase causes accelerated atherosclerosis, coronary artery occlusions, and premature death”. In: Circulation 109.17 (2004), pp. 2129–2135. [70] Ranjini Krishnan et al. “Level of macrophage uPA expression is an important determinant of atherosclerotic lesion growth in Apoe-/- mice”. In: Arterioscle- rosis, thrombosis, and vascular biology 29.11 (2009), pp. 1737–1744. [71] Rolf Gr¨abneret al. “Lymphotoxin β receptor signaling promotes tertiary lym- phoid organogenesis in the aorta adventitia of aged ApoE-/- mice”. In: The Journal of experimental medicine 206.1 (2009), pp. 233–248. [72] Andrew W Owens et al. “Circulating lymphotoxin β receptor and atheroscle- rosis: Observations from the Dallas Heart Study”. In: Atherosclerosis 212.2 (2010), pp. 601–606. [73] Peter Libby, Paul M Ridker, and G¨oranK Hansson. “Progress and challenges in translating the biology of atherosclerosis”. In: Nature 473.7347 (2011), pp. 317–325. [74] Christopher K Glass and Joseph L Witztum. “Atherosclerosis: The Road Ahead Review”. In: Cell 104 (2001), pp. 503–516. 
[75] Kathryn J Moore and Mason W Freeman. “Scavenger receptors in atheroscle- rosis beyond lipid uptake”. In: Arteriosclerosis, thrombosis, and vascular biol- ogy 26.8 (2006), pp. 1702–1711. [76] David R Greaves and Siamon Gordon. “The macrophage scavenger receptor at 30 years of age: current knowledge and future challenges”. In: Journal of lipid research 50.Supplement (2009), S282–S286. [77] Raymond E Soccio and Jan L Breslow. “Intracellular cholesterol transport”. In: Arteriosclerosis, thrombosis, and vascular biology 24.7 (2004), pp. 1150– 1160.

114 [78] RL Tiwari, V Singh, and MK Barthwal. “Macrophages: an elusive yet emerg- ing therapeutic target of atherosclerosis”. In: Medicinal research reviews 28.4 (2008), pp. 483–544. [79] Mireille Ouimet et al. “Autophagy regulates cholesterol efflux from macrophage foam cells via lysosomal acid lipase”. In: Cell metabolism 13.6 (2011), pp. 655– 667. [80] Khalid Alwaili et al. “High-density lipoproteins and cardiovascular disease: 2010 update”. In: Expert review of cardiovascular therapy 8.3 (2010), pp. 413– 423. [81] Pedro R Moreno, Javier Sanz, and Valentin Fuster. “Promoting Mechanisms of Vascular HealthCirculating Progenitor Cells, Angiogenesis, and Reverse Cholesterol Transport”. In: Journal of the American College of Cardiology 53.25 (2009), pp. 2315–2323. [82] Xun Wang and Daniel J Rader. “Molecular regulation of macrophage reverse cholesterol transport”. In: Current opinion in cardiology 22.4 (2007), pp. 368– 372. [83] Laurent Yvan-Charvet, Nan Wang, and Alan R Tall. “Role of HDL, ABCA1, and ABCG1 transporters in cholesterol efflux and immune responses”. In: Ar- teriosclerosis, thrombosis, and vascular biology 30.2 (2010), pp. 139–143. [84] Yves L Marcel, Mireille Ouimet, and Ming-Dong Wang. “Regulation of choles- terol efflux from macrophages”. In: Current opinion in lipidology 19.5 (2008), pp. 455–461. [85] Marieke Pennings et al. “Regulation of cholesterol in macrophages and consequences for atherosclerotic lesion development”. In: FEBS letters 580.23 (2006), pp. 5588–5596. [86] Saara Vainio and Elina Ikonen. “Macrophage cholesterol transport: a critical player in foam cell formation”. In: Annals of medicine 35.3 (2003), pp. 146– 155. [87] Pavel Shashkin, Bojan Dragulev, and Klaus Ley. “Macrophage differentiation to foam cells”. In: Current pharmaceutical design 11.23 (2005), pp. 3061–3072. [88] AJ Lusis. “Atherosclerosis”. In: Nature 407.6801 (2000), pp. 233–41. [89] Aldons J Lusis, Rebecca Mar, and P¨aiviPajukanta. “Genetics of atheroscle- rosis”. 
In: Annu. Rev. Genomics Hum. Genet. 5 (2004), pp. 189–218. [90] Peggy Robinet et al. “A simple and sensitive enzymatic method for cholesterol quantification in macrophages and foam cells”. In: Journal of lipid research 51.11 (2010), pp. 3364–3369. [91] Pan Du, Warren A Kibbe, and Simon M Lin. “lumi: a pipeline for processing Illumina microarray”. In: Bioinformatics 24.13 (2008), pp. 1547–1548. [92] Gordon K Smyth et al. “Linear models and empirical bayes methods for as- sessing differential expression in microarray experiments”. In: Stat Appl Genet Mol Biol 3.1 (2004), p. 3.

115 [93] Robert Gentleman et al. Bioinformatics and solutions using R and . Vol. 746718470. Springer New York, 2005. [94] Guro Dørum et al. “Rotation testing in gene set enrichment analysis for small direct comparison experiments”. In: Statistical Applications in Genetics and 8.1 (2009), pp. 1–24. [95] Ian J Majewski et al. “Opposing roles of polycomb repressive complexes in hematopoietic stem and progenitor cells.” In: Blood 116.5 (Aug. 2010), pp. 731– 739. issn: 1528-0020. doi: 10.1182/blood-2009-12-260760. [96] Aravind Subramanian et al. “Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles”. In: Proceedings of the National Academy of Sciences of the United States of America 102.43 (2005), pp. 15545–15550. [97] Kenneth J Livak and Thomas D Schmittgen. “Analysis of Relative Gene Ex- pression Data Using Real-Time Quantitative PCR and the 2¡ sup¿- ∆∆CT¡/sup¿ Method”. In: methods 25.4 (2001), pp. 402–408. [98] Peggy Robinet, Brian Ritchey, and Jonathan D Smith. “Physiological Dif- ference in Autophagic Flux in Macrophages From 2 Mouse Strains Regulates Cholesterol Ester Metabolism”. In: Arteriosclerosis, Thrombosis, and Vascular Biology (2013). [99] Byeong-Chel Lee et al. “P2Y-like receptor, GPR105 (P2Y14), identifies and mediates of bone-marrowhematopoietic stem cells”. In: Genes & development 17.13 (2003), pp. 1592–1604. [100] Aur´elieC Fabre et al. “P2Y13 receptor is critical for reverse cholesterol trans- port”. In: Hepatology 52.4 (2010), pp. 1477–1483. [101] Dani¨elBlom et al. “Altered lipoprotein metabolism in P2Y¡ sub¿ 13¡/sub¿ knockout mice”. In: Biochimica et Biophysica Acta (BBA)-Molecular and Cell Biology of Lipids 1801.12 (2010), pp. 1349–1360. [102] KL Jones, JJ Maguire, and AP Davenport. “Chemokine receptor CCR5: from AIDS to atherosclerosis”. In: British journal of 162.7 (2011), pp. 1453–1469. [103] Chi-Ming Li et al. 
“Insertional mutagenesis of the mouse acid ceramidase gene leads to early embryonic lethality in homozygotes and progressive lipid storage disease in heterozygotes”. In: Genomics 79.2 (2002), pp. 218–224. [104] Eric G Thompson et al. “Lysosomal trafficking functions of mucolipin-1 in murine macrophages”. In: BMC cell biology 8.1 (2007), p. 54. [105] Cyntia Curcio-Morelli et al. “Macroautophagy is defective in mucolipin-1- deficient mouse neurons”. In: Neurobiology of disease 40.2 (2010), pp. 370– 377. [106] Alanna Strong et al. “Hepatic sortilin regulates both se- cretion and LDL catabolism”. In: The Journal of clinical investigation 122.8 (2012), p. 2807.

116 [107] Jin Ye and Russell A DeBose-Boyd. “Regulation of cholesterol and synthesis”. In: Cold Spring Harbor Perspectives in Biology 3.7 (2011). [108] Seung-Soon Im et al. “Linking to the innate immune response in macrophages through sterol regulatory element binding protein-1a”. In: Cell metabolism 13.5 (2011), pp. 540–549. [109] Tomohiro Ide et al. “SREBPs suppress IRS-2-mediated insulin signalling in the liver”. In: Nature cell biology 6.4 (2004), pp. 351–357. [110] Chunyan Zhao and Karin Dahlman-Wright. “Liver X receptor in cholesterol metabolism”. In: Journal of Endocrinology 204.3 (2010), pp. 233–240. [111] Petri Pehkonen et al. “Genome-wide landscape of liver X receptor chromatin binding and gene regulation in human macrophages”. In: BMC genomics 13.1 (2012), p. 50. [112] David Ron and Peter Walter. “Signal integration in the endoplasmic reticu- lum unfolded protein response”. In: Nature reviews Molecular cell biology 8.7 (2007), pp. 519–529. [113] Sophie C Cazanave et al. “CHOP and AP-1 cooperatively mediate PUMA ex- pression during lipoapoptosis”. In: American Journal of Physiology-Gastrointestinal and Liver Physiology 299.1 (2010), G236–G243. [114] Yuan-yuan Shang et al. “TRB3, upregulated by ox-LDL, mediates human -derived macrophage apoptosis”. In: Febs Journal 276.10 (2009), pp. 2752–2761. [115] Yuan-Yuan Shang et al. “Tribble 3, a novel oxidized low-density lipoprotein- inducible gene, is induced via the activating transcription factor 4–C/EBP homologous protein pathway”. In: Clinical and Experimental Pharmacology and Physiology 37.1 (2010), pp. 51–55. [116] Sabrina Prudente et al. “The Mammalian Tribbles Homolog TRIB3, Glucose Homeostasis, and Cardiovascular Diseases”. In: Endocrine reviews 33.4 (2012), pp. 526–546. [117] Tracie DeVries-Seimon et al. “Cholesterol-induced macrophage apoptosis re- quires ER stress pathways and engagement of the type A scavenger receptor”. In: The Journal of cell biology 171.1 (2005), pp. 61–73. [118] Ira Tabas. 
“Consequences and therapeutic implications of macrophage apopto- sis in atherosclerosis the importance of lesion stage and phagocytic efficiency”. In: Arteriosclerosis, thrombosis, and vascular biology 25.11 (2005), pp. 2255– 2264. [119] Edward Thorp et al. “Reduced apoptosis and plaque necrosis in advanced atherosclerotic lesions of Apoe-/- and Ldlr-/- mice lacking CHOP”. In: Cell Metabolism 9.5 (2009), p. 474. [120] Zhi-hao Wang et al. “Silence of TRIB3 Suppresses Atherosclerosis and Stabi- lizes Plaques in Diabetic ApoE-/-/LDL Receptor-/- Mice”. In: Diabetes 61.2 (2012), pp. 463–473.

117 [121] Xianghai Liao et al. “Macrophage autophagy plays a protective role in ad- vanced atherosclerosis”. In: Cell metabolism (2012). [122] Tobias Maier, Marc G¨uell,and Luis Serrano. “Correlation of mRNA and pro- tein in complex biological samples”. In: FEBS letters 583.24 (2009), pp. 3966– 3973. [123] Anatole Ghazalpour et al. “Comparative analysis of proteome and transcrip- tome variation in mouse”. In: PLoS genetics 7.6 (2011), e1001393. [124] Jeffrey Hsu and Jonathan D Smith. “Genetic-Genomic Replication to Identify Candidate Mouse Atherosclerosis Modifier Genes”. In: Journal of the American Heart Association 2.1 (2013). [125] A Ludwig et al. “Two pacemaker channels from human heart with profoundly different activation kinetics.” In: The EMBO journal 18.9 (May 1999), pp. 2323– 2329. issn: 0261-4189. doi: 10.1093/emboj/18.9.2323. [126] Gernot Schram et al. “Differential Distribution of Cardiac Ion Channel Ex- pression as a Basis for Regional Specialization in Electrical Function”. In: Circulation research 90.9 (May 2002), pp. 939–950. issn: 00097330. doi: 10. 1161/01.RES.0000018627.89528.6F. [127] Niels Voigt et al. “Left-to-right atrial inward rectifier potassium current gra- dients in patients with paroxysmal versus chronic atrial fibrillation.” In: Cir- culation Arrhythmia and Electrophysiology 3.5 (Oct. 2010), pp. 472–480. issn: 1941-3084. doi: 10.1161/CIRCEP.110.954636. [128] Felipe Atienza et al. “Activation of inward rectifier potassium channels ac- celerates atrial fibrillation in humans: evidence for a reentrant mechanism.” In: Circulation 114.23 (Dec. 2006), pp. 2434–2442. issn: 1524-4539. doi: 10. 1161/CIRCULATIONAHA.106.633735. [129] Daniel F Gudbjartsson et al. “Variants conferring risk of atrial fibrillation on chromosome 4q25.” In: Nature 448.7151 (July 2007), pp. 353–7. issn: 1476- 4687. doi: 10.1038/nature06007. [130] Daniela Galli et al. 
“Atrial myocardium derives from the posterior region of the second heart field, which acquires left-right identity as Pitx2c is expressed.” In: Development 135.6 (Mar. 2008), pp. 1157–1167. issn: 0950-1991. doi: 10. 1242/dev.014563. [131] Ankur Saxena and Clifford J Tabin. “miRNA-processing enzyme Dicer is nec- essary for cardiac outflow tract alignment and chamber septation.” In: Proceed- ings of the National Academy of Sciences of the United States of America 107.1 (Jan. 2010), pp. 87–91. issn: 1091-6490. doi: 10.1073/pnas.0912870107. [132] Antonio J Giraldez et al. “Zebrafish MiR-430 promotes deadenylation and clearance of maternal mRNAs.” In: Science 312.5770 (Apr. 2006), pp. 75–79. issn: 1095-9203. doi: 10.1126/science.1122689.

118 [133] Alexander Marson et al. “Connecting microRNA genes to the core transcrip- tional regulatory circuitry of embryonic stem cells.” In: Cell 134.3 (Aug. 2008), pp. 521–533. issn: 1097-4172. doi: 10.1016/j.cell.2008.07.020. [134] Chijen R Lin et al. “Pitx2 regulates lung asymmetry, cardiac positioning and pituitary and tooth morphogenesis”. In: Nature 401.6750 (1999), pp. 279–282. [135] Jun Wang et al. “Pitx2 prevents susceptibility to atrial arrhythmias by in- hibiting left-sided pacemaker specification.” In: Proceedings of the National Academy of Sciences of the United States of America 107.21 (May 2010), pp. 9753–9758. issn: 1091-6490. doi: 10.1073/pnas.0912585107. [136] Cole Trapnell, Lior Pachter, and Steven L Salzberg. “TopHat: discovering splice junctions with RNA-Seq.” In: Bioinformatics (Oxford, England) 25.9 (May 2009), pp. 1105–1111. issn: 1367-4811. doi: 10.1093/bioinformatics/ btp120. [137] Mark D Robinson, Davis J McCarthy, and Gordon K Smyth. “edgeR: a Bio- conductor package for differential expression analysis of digital gene expression data.” In: Bioinformatics 26.1 (Jan. 2010), pp. 139–140. issn: 1367-4811. doi: 10.1093/bioinformatics/btp616. [138] Sam Griffiths-Jones. “The microRNA Registry.” In: Nucleic acids research 32.Database issue (Jan. 2004), pp. D109–11. issn: 1362-4962. doi: 10.1093/ nar/gkh023. [139] Alessandra Tessari et al. “Myocardial Pitx2 differentially regulates the left atrial identity and ventricular asymmetric remodeling programs.” In: Circula- tion research 102.7 (Apr. 2008), pp. 813–22. issn: 1524-4571. doi: 10.1161/ CIRCRESAHA.107.163188. [140] Di Wu et al. “ROAST: rotation gene set tests for complex microarray exper- iments.” In: Bioinformatics (Oxford, England) 26.17 (Sept. 2010), pp. 2176– 2182. issn: 1367-4811. doi: 10.1093/bioinformatics/btq401. [141] Aravind Subramanian et al. 
“Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.” In: Proceedings of the National Academy of Sciences of the United States of America 102.43 (Oct. 2005), pp. 15545–15550. issn: 0027-8424. doi: 10.1073/pnas.0506580102. [142] Xiaohui Xie et al. “Systematic discovery of regulatory motifs in human pro- moters and 3’ UTRs by comparison of several mammals.” In: Nature 434.7031 (Mar. 2005), pp. 338–345. issn: 1476-4687. doi: 10.1038/nature03441. [143] Cole Trapnell et al. “Transcript assembly and quantification by RNA-Seq re- veals unannotated transcripts and isoform switching during cell differentia- tion”. In: Nature biotechnology 28.5 (May 2010), pp. 516–520. issn: 1087-0156. doi: 10.1038/nbt.1621.

119 [144] Peter C. Kahr et al. “Systematic Analysis of Gene Expression Differences between Left and Right Atria in Different Mouse Strains and in Human Atrial Tissue”. In: PLoS ONE 6.10 (Oct. 2011). Ed. by Leon J. de Windt, e26389. issn: 1932-6203. doi: 10.1371/journal.pone.0026389. [145] Ana Chinchilla et al. “PITX2 Insufficiency Leads to Atrial Electrical and Struc- tural Remodeling Linked to Arrhythmogenesis.” In: Circulation. Cardiovascu- lar genetics 4.3 (Apr. 2011), pp. 269–279. issn: 1942-3268. doi: 10.1161/ CIRCGENETICS.110.958116. [146] Paulus Kirchhof et al. “PITX2c is Expressed in the Adult Left Atrium, and Re- ducing Pitx2c Expression Promotes Atrial Fibrillation Inducibility and Com- plex Changes in Gene Expression.” In: Circulation. Cardiovascular genetics 4 (Jan. 2011), pp. 123–133. issn: 1942-3268. doi: 10.1161/CIRCGENETICS.110. 958058. [147] Scot J Matkovich et al. “Deep mRNA sequencing for in vivo functional analysis of cardiac transcriptional regulators: application to Galphaq.” In: Circulation research 106.9 (May 2010), pp. 1459–1467. issn: 1524-4571. doi: 10.1161/ CIRCRESAHA.110.217513. [148] M Faucourt et al. “The pitx2 homeobox protein is required early for endoderm formation and nodal signaling.” In: 229.2 (Jan. 2001), pp. 287–306. issn: 0012-1606. doi: 10.1006/dbio.2000.9950. [149] Antonella Roetto et al. “Mutant antimicrobial peptide hepcidin is associated with severe juvenile hemochromatosis.” In: Nature genetics 33.1 (Jan. 2003), pp. 21–22. issn: 1061-4036. doi: 10.1038/ng1053. [150] Mariana Lagos-Quintana et al. “Identification of tissue-specific microRNAs from mouse.” In: Current biology : CB 12.9 (Apr. 2002), pp. 735–9. issn: 0960-9822. [151] Andreas S Barth et al. “Functional profiling of human atrial and ventricular gene expression.” In: Pflugers Archiv : European journal of physiology 450.4 (July 2005), pp. 201–208. issn: 0031-6768. doi: 10.1007/s00424-005-1404- 8. [152] Matthew J Blow et al. 
“ChIP-Seq identification of weakly conserved heart enhancers”. In: Nature genetics 42.9 (Aug. 2010), pp. 818–822. issn: 1061- 4036. doi: 10.1038/ng.650. [153] Kimberly R Cordes et al. “miR-145 and miR-143 regulate smooth muscle cell fate and plasticity.” In: Nature 460.7256 (Aug. 2009), pp. 705–10. issn: 1476- 4687. doi: 10.1038/nature08195. [154] Kota Y Miyasaka et al. “Heartbeat regulates cardiogenesis by suppressing retinoic acid signaling via expression of miR-143.” In: Mechanisms of develop- ment 128.1-2 (Sept. 2010), pp. 18–28. issn: 1872-6356. doi: 10.1016/j.mod. 2010.09.002.

120 [155] Dekker C Deacon et al. “The miR-143-adducin3 pathway is essential for cardiac chamber morphogenesis.” In: Development 137.11 (June 2010), pp. 1887–1896. issn: 1477-9129. doi: 10.1242/dev.050526. [156] Kathryn N Ivey et al. “MicroRNA regulation of cell lineages in mouse and human embryonic stem cells.” In: Cell stem cell 2.3 (Mar. 2008), pp. 219–229. issn: 1875-9777. doi: 10.1016/j.stem.2008.01.016. [157] Yoshio Kato et al. “Real-time functional imaging for monitoring miR-133 during myogenic differentiation.” In: The international journal of biochem- istry & cell biology 41.11 (Nov. 2009), pp. 2225–2231. issn: 1878-5875. doi: 10.1016/j.biocel.2009.04.018. [158] Li Z Luo et al. “DNA repair in human pluripotent stem cells is distinct from that in non-pluripotent human cells”. In: PloS one 7.3 (2012), e30541. [159] Dirk Hockemeyer et al. “Genetic engineering of human pluripotent cells using TALE nucleases”. In: Nature biotechnology 29.8 (2011), pp. 731–734. [160] Qiurong Ding et al. “Enhanced Efficiency of Human Pluripotent Stem Cell Genome Editing through Replacing TALENs with CRISPRs.” In: Cell stem cell 12.4 (2013), p. 393. [161] Le Cong et al. “Multiplex genome engineering using CRISPR/Cas systems”. In: Science 339.6121 (2013), pp. 819–823.

121