An Inter-Vascular Bed and Inter-Species Investigation of Epigenetic Regulatory Elements in Endothelial Cells

by

Lina Antounians

A thesis submitted in conformity with the requirements for the degree of Master’s of Science

Department of Molecular Genetics University of

© Copyright by Lina Antounians 2015

Abstract

A major challenge in human genetics is to understand the mechanisms that control gene expression. To identify gene regulatory regions required for vascular homeostasis, we performed chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) for a variety of histone modifications and the transcription factor JUN in primary aortic endothelial cells (ECs) isolated from human, rat (Rattus norvegicus) and bovine (Bos taurus). We generated a chromatin state map for human aortic ECs and found that the vast majority of regulatory regions in aortic ECs were also active in venous ECs. By comparing the genomic occupancy of JUN and a histone modification indicative of active enhancers (H3K27ac) between species, we identified a set of conserved regulatory regions that were enriched for EC-specific pathways and human regulatory disease variants. Overall, we demonstrate that comparative epigenomics is a viable strategy to identify functionally important vascular gene regulatory elements.

Acknowledgements

I would like to express my sincere gratitude to my supervisor Dr. Michael Wilson for his mentorship and encouragement throughout my Master’s degree. I have learned many technical and analytical skills in your lab, but none as important as the passion for learning, teaching, and collaborating with my peers. A heartfelt thanks to Alejandra Medina-Rivera for her insights and the hard work she put into our joint project. It has been a pleasure to work alongside you as a team and to become great friends outside of science. I would also like to thank Dr. Johanna Rommens and Dr. Julie Claycomb, my thesis supervisory committee, for their support, enthusiasm, and critical feedback in both exciting and difficult times. To Liis, I am thankful for the long conversations about science and life, and for your constant encouragement. I had the pleasure to work alongside other brilliant minds in the Wilson lab, both past and present, especially Huayun, Minggao, Xuefei, Maisam, and Azad. Thank you for listening to and critiquing my long practice talks and teaching me everything I know about computational biology (including how to get over the fear of using Terminal). Thank you to Xiaoli, Lindsy, and Ted, who were instrumental in helping me set up my project. I am forever grateful to have shared my graduate school experience with wonderful colleagues, including many past and current members in the Gagnon lab, Pearson lab, Minassian lab, Bazett-Jones lab, Rommens lab, Weksberg lab, Justice lab, and the Fish lab. Thank you to all my peers and mentors in the Genetics and Genome Biology Program at SickKids and the Department of Molecular Genetics at University of Toronto for the memorable experience. I am fortunate that support from my family has been a key to my success in graduate school; this thesis is dedicated to you. To my parents, who have always supported me and pushed me to become a better version of myself every day, I endeavor to make you proud. 
To Leo, my little big brother, thank you for helping me realize and focus on what is most important in life. I am sincerely grateful to my extended family, for their love and for always encouraging me to continue striving and working hard in different aspects of my life. And lastly, to my other half, Keon, thank you for being my inspiration and my motivation, we will continue climbing together.

Data Attribution

The data presented here were acquired and analyzed in collaboration with post-doctoral fellow Dr. Alejandra Medina-Rivera. I conducted all experimental procedures, including 1) expanding primary ECs in cell culture, 2) conducting immunofluorescence and quality control assays, and 3) performing ChIP-seq and ChIP-qPCR experiments. Using scripts written by Dr. Medina-Rivera and others in the lab, I analyzed experimental data. This included 1) aligning sequencing reads, conducting quality control, and calling peaks, 2) building chromatin states, 3) conducting enrichment analysis for biological pathways, human disease variants, and transcription factor binding sites, and 4) defining biochemically conserved sites across multiple species. I interpreted the results together with Dr. Medina-Rivera.

Table of Contents

Abstract
Acknowledgements
Data Attribution
List of Figures
List of Tables
List of Supplementary Figures
List of Supplementary Tables
List of Abbreviations
List of Genes

Chapter 1 Introduction
Endothelial cells play prominent roles in cardiovascular health and disease
ECs retain their vascular-bed specific gene expression profiles in tissue culture
Ex vivo EC models are essential for understanding EC biology
Arterial and venous specification is controlled in part by sets of transcription factors
Integrating multiple ChIP-seq datasets is a powerful way to characterize the epigenome
Thesis Rationale for Chapter 1

Chapter 1 Materials and Methods

Chapter 1 Results
1.1 Characterizing the epigenetic landscape of human aortic endothelial cells
1.2 Active enhancer and regions can distinguish biological pathways between HAEC and HUVEC
1.3 Searching for functional regulatory elements in HAEC-only and HUVEC-only regions
1.4 Prominent regulatory regions private to endothelial cells highlight endothelial cell-specific biology

Chapter 1 Summary

Chapter 2 Introduction
Identifying functional cis-regulatory elements using comparative genomics
Non-coding regions carry crucial information related to human disease gene regulation
Epigenetic constraint between species reveals biologically relevant pathways and diseases
Cell models from multiple species are commonly used to study vascular biology
Thesis Rationale for Chapter 2

Chapter 2 Materials and Methods

Chapter 2 Results
2.1 A minority of human aortic EC H3K27ac and JUN binding sites are shared with rat and bovine aortic ECs
2.2 Determining the distribution of biochemically-conserved sites in EC-private and pan-tissue regulatory regions
2.3 Using H3K27ac as a baseline for identifying biochemically conserved TF binding sites
2.4 Biochemically conserved regulatory regions harbour many known human regulatory
2.5 Regulatory mutations can alter transcription factor binding sites and cause disease
2.6 Many regulatory variants in conserved TF-bound sites are tissue specific
2.7 Conserved H3K27ac regions are functionally active in blood vessels of mouse embryos
2.8 DNA sequence constraint is not a good predictor of biochemically conserved JUN binding

Chapter 2 Summary

Discussion and Future Directions

References

Appendix

List of Figures

Figure 1: Anatomical location of EC origin is an important determinant of ex vivo gene expression patterns.
Figure 2: Transcription factors drive endothelial cell specification.
Figure 3: Histone modifications and TF binding sites illustrate different epigenetic features.
Figure 4: Epigenetic profiling of HAEC and HUVEC at the CCL2 locus, which is active in both HAEC and HUVEC, and HEY2 locus, which plays an important role in arterial gene regulation.
Figure 5: Integrating multiple histone modification and CTCF binding data reveals that the epigenome is similarly regulated between HAEC and HUVEC.
Figure 6: There are relatively few HAEC or HUVEC-specific gene regulatory regions.
Figure 7: Pathway enrichment analysis of JUN and H3K27ac peak regions that were classified as HAEC-only or HUVEC-only.
Figure 8: EC-private H3K27ac and JUN sites are enriched for biological pathways involved in inflammation and angiogenesis.
Figure 9: There is strong JUN binding within an active enhancer marked by H3K27ac at the JUN locus in rat, bovine, and human aortic ECs.
Figure 10: Few human H3K27ac and JUN peaks are biochemically conserved in other species.
Figure 11: Conserved JUN sites are more likely to be private to ECs, and conserved enhancers are preferentially found in pan-tissue sites.
Figure 12: Defining leniently conserved JUN sites between multiple species.
Figure 13: AP-1 monomers have overrepresented transcription factor binding motifs in HAEC peaks conserved with either rat and/or bovine.
Figure 14: Conserved regulatory regions enrich for tissue-specific biological pathways relevant to ECs.
Figure 15: HGMD regulatory variant enrichment analysis for biochemically conserved H3K27ac and JUN peaks.
Figure 16: Network of genes annotated to regulatory HGMD variants that overlap leniently conserved JUN sites.
Figure 17: Regulatory variants alter JUN binding motifs in promoter region of IL6 and intronic region of RAD51B.
Figure 18: Known human disease variants enrich for conserved TF binding sites in a cell type-dependent manner.
Figure 19: The majority of conserved H3K27ac aortic EC and liver sites enrich for the same genes associated to HGMD variants.
Figure 20: Six biochemically-conserved H3K27ac or JUN sites have positive enhancer activity in blood vessels of mouse embryos.
Figure 21: Biochemically-conserved JUN peaks are enriched for different biological pathways than the top ranked DNA sequence constraint peaks.
Figure 22: Peak intensity scores are better predictors of true biochemical conservation than DNA sequence constraint methods GERP and fitCons.
Figure 23: Target site of CRISPR/Cas9 genome editing in a biochemically-conserved JUN site of the NOS3 proximal promoter region.
Figure 24: Nuclear translocation of RELA occurs within 15 minutes post TNFA stimulation with sustained nuclear localization for approximately one hour.
Figure 25: RELA and JUN co-regulate expression of TNFA-induced protein 3 (TNFAIP3) by binding to a nucleosome depleted region in the proximal promoter.

List of Tables

Table 1: TF binding sites in HAEC-only and HUVEC-only JUN and H3K27ac peaks reveal possible binding partners of JUN.

List of Supplementary Figures

Supplementary Figure 1: Inter-replicate cross correlation shows that HAEC and HUVEC libraries cluster based on ChIP-seq factor.
Supplementary Figure 2: Two variants within the proximal promoter of NOS3 are within a HAEC-only H3K27ac and HAEC-only JUN peak.
Supplementary Figure 3: Number of human aortic EC peaks shared with rat and/or bovine aortic ECs.
Supplementary Figure 4: Number of leniently conserved JUN sites with rat and bovine aortic ECs.
Supplementary Figure 5: The conservation status of aortic EC peaks tested by VISTA enhancers is not predictive of positive enhancer activity.

List of Supplementary Tables

Supplementary Table 1: HAEC ChIP-seq library sequencing statistics
Supplementary Table 2: HUVEC and HepG2 ChIP-seq library information
Supplementary Table 3: HAEC ChIP-seq quality control
Supplementary Table 4: Regulatory HGMD variants in HAEC-only and HUVEC-only H3K27ac and JUN sites.
Supplementary Table 5: List of regulatory region definitions
Supplementary Table 6: RAEC and BAEC ChIP-seq library sequencing statistics
Supplementary Table 7: RAEC and BAEC ChIP-seq quality control
Supplementary Table 8: Top 10 g:Profiler enrichments in GO Terms used to describe genes from Supplementary Table 4.
Supplementary Table 9: Top 10 g:Profiler enrichments in GO Terms used to describe genes from Figure 19 show common genes enrich for cellular functions.
Supplementary Table 10: List of closest genes to HAEC-only regions that are in an active chromatin state and repressed or poised state in HUVEC, and vice versa.
Supplementary Table 11: Regulatory HGMD disease variants in conserved JUN sites annotated as disease causing.

List of Abbreviations

ALS Amyotrophic Lateral Sclerosis
API Application programming interface
AUC Area under curve
BAEC Bovine aortic endothelial cells
BWA Burrows-Wheeler Aligner
CRISPR Clustered regularly interspaced short palindromic repeats
CRM Cis-regulatory module
DAPI 4',6-diamidino-2-phenylindole
DiI 1,1'-dioctadecyl-3,3,3',3'-tetramethylindocarbocyanine perchlorate
DFP Disease functional polymorphism
DM Disease mutation
DP Disease polymorphism
EC Endothelial cell
ENCODE Encyclopaedia of DNA Elements
FDR False discovery rate
FP Functional polymorphism
GAT Genomic association test
GERP Genomic Evolutionary Rate Profiling
GO Gene Ontology
GREAT Genomic Regions Enrichment of Annotations Tool
GWAS Genome Wide Association Studies
HAEC Human aortic endothelial cells
HGMD Human Gene Mutation Database
HUVEC Human umbilical vein endothelial cells
IDR Irreproducible Discovery Rate
LDL Low density lipoprotein
MAEC Mouse aortic endothelial cells
MAPK Mitogen-activated protein kinases
MSA Multiple sequence alignment
NSC Normalized strand coefficient
OMIM Online Mendelian Inheritance in Man
PAEC Porcine aortic endothelial cells
RAEC Rat aortic endothelial cells
ROC Receiver operating characteristic
RSAT Regulatory Sequence Analysis Tools
RSC Relative strand correlation
TF Transcription factor
SNP Single nucleotide polymorphism
TSS Transcription start site
UTR Untranslated region
WCE Whole cell extract

List of Genes

ATF2 Activating transcription factor 2
BCAN Brevican proteoglycan
CEBPA CCAAT/enhancer-binding protein alpha
COUP-TFII Chicken ovalbumin upstream promoter transcription factor 2
CTCF CCCTC-binding factor
EHF ETS homologous factor
ERG ETS-related gene
ETS E26 transformation-specific family of transcription factors
FEV Fifth Ewing Variant Protein, ETS family
FOS FBJ Murine Osteosarcoma Viral Oncogene Homolog
GATA2 GATA binding protein 2
JDP Jun dimerization protein
JUN Jun Proto-Oncogene
JUNB Jun B Proto-Oncogene
JUND Jun D Proto-Oncogene
MAF v-maf avian musculoaponeurotic fibrosarcoma oncogene homolog
NF-κB Nuclear Factor Of Kappa Light Polypeptide Gene Enhancer In B-Cells 1
NOS3 Nitric Oxide Synthase 3
PDGFR Platelet-derived growth factor receptor
POR P450 oxidoreductase
PROCR Protein C Receptor
PTEN Phosphatase And Tensin Homolog
RELA V-Rel Avian Reticuloendotheliosis Viral Oncogene Homolog A
TGFBR Transforming growth factor beta receptor
TNFA Tumor necrosis factor alpha
VEGFA Vascular endothelial growth factor alpha
VWF Von Willebrand factor

Chapter 1 Introduction

Endothelial cells play prominent roles in cardiovascular health and disease

The endothelium is a monolayer of cells that forms the inner cellular lining of arteries, veins, and capillaries in the body. The endothelium is highly metabolically active and a principal player in the control of several physiological processes, including vascular homeostasis, response to injury and inflammation, and the formation of new blood vessels (Aird 2012). The dysfunction of endothelial cells (ECs) contributes to a myriad of human diseases such as atherosclerosis, thrombosis, irritable bowel disease, and diabetes (Morange and Trégouët 2013; Steyers and Miller 2014; Francescone et al. 2015; Maezawa et al. 2015). For this reason, ECs are extensively studied in vivo and in culture. Understanding how EC genes function is an active area of research with a considerable emphasis on understanding how ECs respond to various stimuli such as cytokines, growth factors, and flow dynamics (Chiu and Chien 2011; Aranguren et al. 2013; Brown et al. 2014; Kaikkonen et al. 2014). Since ECs line the lumen of all blood vessels in the body, they function in a variety of distinct microenvironments. For example, aortic ECs experience high shear stress and pressure from the heart pumping oxygenated blood, whereas most venous ECs experience relatively low shear stress and pressure while circulating deoxygenated blood. Indeed, characterizing the unique EC biology that occurs at different anatomical locations is a major focus of cardiovascular disease research (Chi et al. 2003; Sana et al. 2005; Scott et al. 2013). In vivo studies of ECs in different tissues and vascular beds have revealed tissue-specific disease processes (e.g. in vivo von Willebrand Factor (VWF) studies (Aird et al. 1997), in vivo thrombosis studies (Koskinas et al. 2012), and Factor 8 (F8) expression in liver ECs (Everett et al. 2014)). However, most of the fundamental mechanisms of ECs have been elucidated in ex vivo tissue culture systems.

ECs retain their vascular-bed specific gene expression profiles in tissue culture

A landmark paper that compared gene expression patterns among multiple endothelial cell types showed that ECs cultured ex vivo retain artery- and vein-specific gene expression programs (Chi et al. 2003). Unsupervised hierarchical clustering of mRNA from 53 different cultured EC samples showed a striking consistency of expression patterns based on the original anatomical location of the cultured ECs (Figure 1). This study showed that ECs retain their arteriovenous identity ex vivo and suggested that this specificity was inherited and propagated epigenetically.

In my thesis, I will investigate the epigenetic mechanisms that underlie artery- and vein-specific endothelial gene regulation.

Ex vivo EC models are essential for understanding EC biology

The most widely studied human ex vivo EC model is derived from umbilical veins (HUVECs). Due to their availability following birth, HUVECs are used to model diverse aspects of vascular biology (Akiyama et al. 2000; Clanchy and Hamilton 2012; Zhang et al. 2013). HUVECs are also the best characterized EC at the level of the epigenome. The Encyclopedia of DNA Elements (ENCODE) Consortium has generated an epigenetic map of HUVECs by profiling several histone modifications (N=11), transcription factors (TFs) (N=5), and chromatin modulators (N=3) (Consortium 2012). Collectively, this work serves as a foundation for all biologists who work with HUVECs. Another EC cell type commonly used in vascular biology is aortic endothelial cells (AECs), which have been used to study the inflammatory response (Liu et al. 2012), the blood coagulation cascade (Miwa et al. 2010), and relevant diseases like atherosclerosis and diabetes (Wang et al. 2012; Hakimi et al. 2014). Bovine aortic endothelial cells (BAEC) were one of the first EC model systems, used in the pioneering studies that discovered endothelial nitric oxide synthase (eNOS) (Nishida et al. 1992). Given that arterial and venous ECs isolated from large vessels have reproducibly different gene expression patterns ex vivo (Chi et al. 2003), there is a need for a comparable comprehensive epigenetic map of regulatory elements in human aortic ECs (HAECs). Having a representative map for HAECs will allow us to identify common and EC type-specific gene regulatory regions. AECs are actively studied and commercially available from various species (e.g. bovine [BAEC; Du et al. 2012], mouse [MAEC; James et al. 2014], rat [RAEC; Nguyen et al. 2013], and pig [PAEC; Kourtzelis et al. 2015]). 
In addition to the biological importance of understanding ECs from the aorta, the practical and economical advantages of AECs allow for comparative epigenomic studies that can provide new insights into understanding vascular gene regulatory mechanisms. In contrast, one limitation of the HUVEC model is that cells from similar anatomical locations do not exist or are not readily obtainable.

Figure 1: Anatomical location of EC origin is an important determinant of ex vivo gene expression patterns. Figure 1 Legend: Unsupervised hierarchical clustering of gene expression patterns of 53 cultured primary EC explants. Four distinct clusters are shown and colour-coded Artery, Tissue (I), Vein, and Tissue (II). Reprinted from PNAS (Chi et al. 2003) copyright 2003 for noncommercial and educational use.

Arterial and venous specification is controlled in part by sets of transcription factors

Endothelial cells are developmentally derived from FLK1- and VEGFR2-expressing cells of the mesoderm layer (Shalaby et al. 1995; Carmeliet et al. 1996). There are key TFs that control EC development from hemangiogenic mesoderm cells to differentiated adult arterial, venous, and lymphatic ECs (Figure 2, reproduced with permission from (Park et al. 2013)). Arterial EC specification is primarily controlled by the Notch signaling pathway, which drives the expression of HEY1, HEY2, FOXC1, and FOXC2 TFs (Park et al. 2013). Venous EC specification involves the down-regulation of the Notch signaling pathway and upregulation of the COUP-TFII (NR2F2) transcription factor. COUP-TFII (NR2F2) represses FOXC1 expression in vein ECs and upregulates the vein-specific transcription factors SOX7 and SOX18 (Chen et al. 2012). There are also many TFs that are expressed in both artery and vein ECs. Notably, GATA2, a known regulator of the blood lineage and endothelial cells (Dorfman et al. 1992), and JUN bind together to form cis-regulatory modules (CRMs) and function to control EC phenotypes and inflammatory responses (Linnemann et al. 2011). A recent study analyzed the transcriptome of human umbilical artery ECs and HUVECs and identified 8 transcription factors (HEY2, PRDM16, TOX2, SOX17, NKX2.3, AFF3, EMX2, and MSX1) that co-determine the arterial EC phenotype (Aranguren et al. 2013). Overexpressing these 8 artery-specific TFs in HUVECs can induce arterial-like characteristics. Furthermore, ECs are highly sensitive to the environment and can change their gene expression profiles in response to a number of different stimuli. For example, delta-like 4 (DLL4) encodes a NOTCH ligand and is an early marker of arterial endothelial cells. Although HUVECs express DLL4 at low levels, they can be induced to have arterial levels of DLL4 by overexpression of endothelial-enriched ETS TFs (Wythe et al. 2013). 
These studies show that TFs can reprogram ECs and that environmental conditions can naturally lead to expression of genes normally restricted to an alternate vascular bed. Studying these TFs and their target genes will shed light on both genetic and environmental factors that control EC gene regulation.

Figure 2: Transcription factors drive endothelial cell specification. Figure 2 Legend: Different sets of transcription factors regulate endothelial development from progenitor cells to differentiated arterial and venous ECs. Reprinted from Circulation Research with permission (Park et al. 2013), copyright 2013 (See Appendix page 133).

To facilitate inter-vascular bed and inter-species comparison of EC TF binding, I chose to focus on TFs known to participate in EC-specific biological processes in all EC cell types. The genome-wide binding of a limited number of TFs has been characterized in HUVECs. Two TFs that have been characterized by ENCODE (Bernstein et al. 2012) and others (Linnemann et al. 2011) and that are known to participate in EC-specific functions are JUN and GATA2. My thesis work in AECs focused on JUN for important biological and practical reasons (I was only able to obtain JUN ChIP-seq data in AECs from multiple species). JUN is a monomer of the Activator Protein-1 (AP-1) complex and is ubiquitously expressed in multiple tissues (Uhlen et al. 2015). The AP-1 complex regulates cellular proliferation, apoptosis, and early response to inflammatory stimulus (Gerritsen and Bloor 1993; Shaulian and Karin 2002). JUN forms JUN/JUN homodimers or JUN/FOS heterodimers in the AP-1 complex, and also binds with members of the ATF2 and JDP families (Shaulian and Karin 2002; Hess et al. 2004). There are two paralogs of JUN, JUNB and JUND, which have a nearly identical C-terminus where dimerization and DNA binding occur, but have a different N-terminal domain that controls transcriptional activation (Castellazzi et al. 1991). The phosphorylation of JUN at Ser63 by c-Jun N-terminal kinase (JNK) in response to extracellular stimuli causes an increase in transcription of JUN target genes (Nateri et al. 2005). In addition to regulating the early response to a wide variety of external signals (e.g. UV radiation, oxidative stress, and cytokines), JUN plays a key role in G1 cell cycle progression that is independent of Ser63 phosphorylation (Angel and Karin 1991; Wisdom et al. 1999). A homozygous mouse knock out model (Jun-/-) is lethal in the embryonic period between day E11.5 and 15.5 and displays severe edema in the brain, impaired hepatogenesis, and altered fetal liver erythropoiesis (Hilberg et al. 1993). 
The majority of JUN binding sites in HUVECs (70%) co-localize with GATA2, a master regulator of hematopoiesis and a key player in the development and maintenance of ECs (De Val and Black 2009; Linnemann et al. 2011). JUN is regulated by highly conserved mitogen-activated protein kinase (MAPK) pathways (Davis 1994; Seger and Krebs 1995), making it an ideal TF for inter-vascular and inter-species analyses. Genome-wide binding of JUN has also been profiled in other cells including HepG2, a hepatocarcinoma cell line, which provides us with a unique opportunity to compare binding of a ubiquitous TF to identify cell-type specific regulatory regions. TFs bind gene regulatory regions either individually or in combination with co-factors to regulate spatial and temporal gene expression patterns (Zinzen et al. 2009). TFs regulate the expression of genes by binding to 200-bp stretches of DNA called cis-regulatory modules (CRMs). It is generally not possible to profile all TFs that regulate gene expression in a given tissue; however, bioinformatics tools can be used to predict TF motifs within CRMs and uncover new gene regulatory mechanisms (Ernst and Kellis 2010; Kaikkonen et al. 2014; Villar et al. 2015). Thus, by mapping a single TF like the ubiquitously expressed JUN, one can identify the binding sites of several tissue-specific TFs. Locating TF-bound sites and finding mutations that alter TF binding preference is an important area of human disease research (Spivakov et al. 2012; Farh et al. 2014). Considering that 93% of variants associated with human disease traits through genome-wide association studies (GWAS) fall outside of protein coding sequences (Maurano et al. 2012), it is important to map the functional regulatory elements in the non-coding genome to better understand the impact of non-coding variants on disease (Boyle et al. 2012; Ward and Kellis 2012). 
In my thesis, I aim to experimentally define regulatory regions in ECs and look for known human regulatory disease variants within these regions; I am interested to learn whether variants may be causing disease phenotypes that are linked with EC biology.
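
As an illustration of what looking for variants within experimentally defined regulatory regions involves computationally, the following is a minimal sketch and not the analysis pipeline used in this thesis. The function name `variants_in_regions`, the half-open coordinate convention, and all coordinates are hypothetical and chosen only for the example; it also assumes that regions on the same chromosome do not overlap.

```python
from bisect import bisect_right

def variants_in_regions(regions, variants):
    """Return the variants that fall inside any region.

    regions:  list of (chrom, start, end), half-open [start, end),
              assumed non-overlapping within a chromosome (illustrative assumption).
    variants: list of (chrom, position).
    """
    # Group region intervals by chromosome and sort them by start coordinate.
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    starts = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts[chrom] = [s for s, _ in intervals]

    hits = []
    for chrom, pos in variants:
        if chrom not in by_chrom:
            continue
        # Find the last region starting at or before the variant position.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0:
            start, end = by_chrom[chrom][i]
            if start <= pos < end:
                hits.append((chrom, pos))
    return hits

# Hypothetical peak regions and variant positions, for illustration only.
peaks = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 250)]
snps = [("chr1", 150), ("chr1", 300), ("chr1", 1100), ("chr2", 49), ("chr3", 10)]
overlapping = variants_in_regions(peaks, snps)
print(overlapping)
```

In practice this kind of intersection is usually done with dedicated genome-interval tools; the sketch only makes the underlying containment test explicit.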

Integrating multiple ChIP-seq datasets is a powerful way to characterize the epigenome

Although TFs are essential for EC development and function, their expression alone does not explain how ECs can retain their gene expression profiles ex vivo. One important aspect of EC biology comes from the field of epigenetics. Epigenetics is the study of chromatin-based regulation that influences variable gene expression patterns despite unaltered DNA sequence (Bird 2007). Epigenetic marks such as histone posttranslational modifications, DNA methylation, and RNA-based mechanisms can be used to understand fundamental determinants of EC gene expression (Matouk and Marsden 2008). There is major interest in identifying epigenetic interactions between regulatory regions and the genes they regulate, especially with the recent publication of the largest catalogue of epigenetic elements for multiple human cells and tissues (Roadmap Epigenomics et al. 2015; Romanoski et al. 2015). In my thesis, I profiled several epigenetic marks and generated a HAEC atlas comparable to the datasets that exist for HUVEC. There are over 30 epigenetic marks that have been profiled genome-wide for various cells and tissues (Ernst and Kellis 2015); however, it is not practical or necessary to profile all of these. Indeed, a small subset can be used to describe the rest of the epigenomic marks. In my thesis I chose to profile eight marks that have been shown to sufficiently annotate the epigenome based on previous work (Figure 3) (Ernst et al. 2011). We also included CTCF, a transcriptional regulator and key architectural component of the genome, to predict long-range chromatin interactions (Zuin et al. 2014). CTCF is found at the borders of chromosomal domains and loops, and is often used to make predictions of chromatin conformation and looping (Phillips and Corces 2009; Corradin et al. 2014). There are numerous representative histone modifications that can distinguish essential genomic features such as active enhancers, promoters, repressed regions and gene bodies. I will use H3K27ac (acetylation of the 27th lysine residue on the H3 histone tail), which marks active enhancers and promoters, and H3K4me2, which marks active and latent enhancers and promoters (Figure 3) (Orford et al. 2008; Creyghton et al. 2010). H3K36me3-enriched regions occur due to polymerase II activity and thus denote active transcriptional elongation (Barski et al. 2007). Combinations of different histone modifications can be used to further characterize regulatory sites; for example, H3K27me3- and H3K4me1/2-enriched regions mark poised enhancers, which are enhancers that are repressed but primed for activation, in contrast to H3K27me3-enriched regions that broadly mark gene repression (Vastenhouw and Schier 2012). H3K4me3 marks promoters and, similarly, H3K27me3- and H3K4me3-enriched regions mark poised promoters that are primed for activation. ENCODE profiled enhancer and promoter marks in HUVEC and found active enhancer states were highly enriched for genes in biological pathways such as angiogenesis, blood vessel morphogenesis, and vascular development (Ernst et al. 2011).
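
The mark combinations described above can be summarized as a small lookup, sketched here in Python. This is a deliberately simplified, rule-based illustration and not the chromatin state model used in this thesis; the function name `annotate_region`, the label strings, the order in which rules are applied, and the choice to call H3K27ac with H3K4me3 an "active promoter" and H3K27ac without it an "active enhancer" are assumptions made for the example.

```python
def annotate_region(marks):
    """Assign a coarse label to a region from the set of marks enriched there.

    Toy rules paraphrasing the text; the rule priority is an assumption.
    """
    marks = set(marks)
    has_k4me1_or_2 = bool(marks & {"H3K4me1", "H3K4me2"})

    if "H3K27me3" in marks:
        # Repressive mark together with promoter or enhancer marks means "poised".
        if "H3K4me3" in marks:
            return "poised promoter"
        if has_k4me1_or_2:
            return "poised enhancer"
        return "repressed"
    if "H3K27ac" in marks:
        # Assumption: use H3K4me3 to separate promoters from enhancers.
        return "active promoter" if "H3K4me3" in marks else "active enhancer"
    if "H3K4me3" in marks:
        return "promoter"
    if "H3K36me3" in marks:
        return "transcriptional elongation"
    if has_k4me1_or_2:
        return "latent enhancer or promoter"
    return "unannotated"

examples = {
    "a": ["H3K27ac", "H3K4me2"],
    "b": ["H3K27me3", "H3K4me1"],
    "c": ["H3K27me3", "H3K4me3"],
    "d": ["H3K36me3"],
    "e": ["H3K27me3"],
}
labels = {name: annotate_region(m) for name, m in examples.items()}
print(labels)
```

Real chromatin state maps are learned from genome-wide signal rather than written as fixed rules, so the sketch is only a reading aid for the combinations listed in the text.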

Figure 3: Histone modifications and TF binding sites illustrate different epigenetic features. Figure 3 Legend: Schematic representation of regulatory regions at a transcription start site (TSS) of a typical gene. DNA is wrapped around histones which have modifications on their tails, denoted by coloured shapes in the legend. These modifications mark different features of the regulatory landscape, including enhancers, promoters, and active gene bodies. Adapted from Huayun Hou (Rzeczkowska et al. 2014).

Integrating multiple genome-wide ChIP-seq assays to annotate genomes and epigenomes can be done by creating ‘chromatin state maps’ (Hoffman et al. 2012; Ernst and Kellis 2013). For example, chromatin state maps can be used to provide an interpretable summary of numerous massive histone modification and TF binding datasets. Chromatin state maps were used to define chromatin states for six human cell lines including HUVECs (Ernst and Kellis 2013). Recently, a large collection of chromatin states was published for mouse tissues (Yue et al. 2014), and human tissues (Roadmap Epigenomics et al. 2015), which included active regulatory regions in 111 human tissues and cells. The integration of massive amounts of data is critical for defining biologically meaningful regulatory regions.

Thesis Rationale for Chapter 1

Epigenetic mechanisms control artery- and vein-specific EC gene expression. The epigenetic profiles of human umbilical vein ECs published by ENCODE have revealed many insights into EC gene regulation; however, a comparable epigenetic map of an arterial model of ECs does not exist. In Chapter 1, we used ChIP-seq to experimentally determine the genome-wide enrichments for several representative histone modifications and the transcription factors JUN and CTCF in primary HAECs. We integrated these data with published HUVEC data to generate a chromatin state map. My objective was to identify shared and vascular-bed-specific gene regulatory regions and ask if these regulatory regions are associated with unique functional properties, such as biological pathways and regions of the genome linked to vascular diseases.


Chapter 1 Materials and Methods

Human aortic endothelial cell culture

Two biological replicates of HAECs were obtained commercially and were isolated from donors aged 15 and 60 years (Cell Applications catalogue #304-05a). Both donors were listed as healthy males, and each lot of cells was tested for expression of the cell surface marker CD31. Endothelial cells express CD31 at cell junctions, and this surface protein is important for EC-leukocyte adhesion. The supplier tested these ECs for their ability to take up and degrade DiI fluorescently labeled acetylated low-density lipoproteins (ac-LDL). ECs regularly take up ac-LDL in vivo through the scavenger cell pathway of LDL metabolism, and this functional test has become a hallmark for confirming EC identity (Voyta et al. 1984; Adams et al. 2013). HAECs were grown in supplier-recommended Endothelial Cell Growth Media (Cell Applications, catalogue #211-500) and cultured at 37° C in a 5% CO2 humidified incubator. We expanded HAECs and checked the cells daily for cobblestone morphology and for changes in the media indicating contamination. We split cells consistently at a 4:1 ratio for 3 passages, for an estimated 10 total population doublings (33-hour doubling time). Early passage HAECs (P4) were frozen down for future functional experiments. HAECs were used at passage 6, within the recommended expansion limit of 15 population doublings. At the time of harvest, HAECs had proper cobblestone morphology as assessed by visual inspection with a light microscope.

ChIP-seq Methods

Formaldehyde fixation

HAECs were grown to 100% confluency in 15-cm plates, washed with phosphate buffered saline (PBS) to remove serum, and fixed with 10 mL of freshly prepared 1% formaldehyde (Canemco & Marivac, catalogue #0173) in Solution A (50 mM HEPES–KOH pH 7.4, 100 mM NaCl, 1 mM EDTA, 0.5 mM EGTA) for 10 minutes at room temperature following previously established protocols (Schmidt et al. 2009). The formaldehyde was quenched with 0.5 mL of 2.5 M glycine at room temperature for 5 minutes, then the cells were washed twice with ice cold PBS and kept at 4° C. The cells were scraped off the plates, collected by spinning at 2000g for 5 minutes at 4° C, aliquoted in 10 million cell pellets, frozen on dry ice, and stored at -80° C.


Chromatin immunoprecipitation

Chromatin immunoprecipitation experiments were conducted as previously described, with any modifications noted below (Schmidt et al. 2009). 100 uL of magnetic protein-G Dynabeads (LifeTechnologies, catalogue #10004D) were washed three times and pre-blocked with 1 mL of 0.5% (w/v) BSA (Sigma, catalogue #A8412) in PBS. Antibodies used for ChIP were rabbit anti-H3K4me1 (Abcam, ab8895 polyclonal); rabbit anti-H3K4me2 (Millipore, 07-030 polyclonal); mouse anti-H3K4me3 (Millipore, 17-678 monoclonal); mouse anti-H3K27ac (Millipore, 05-1334 monoclonal); mouse anti-H3K27me3 (Abcam, ab6002 monoclonal); rabbit anti-H3K36me3 (Abcam, ab9050 polyclonal); rabbit anti-CTCF (Millipore, 07-729 polyclonal); rabbit anti-Rad21 (Abcam, ab992 polyclonal); rabbit anti-Jun (Santa Cruz Biotechnology, sc1694 polyclonal); rabbit anti-NFKB(p65) (Santa Cruz Biotechnology, sc372 polyclonal). 10 µg of antibody was incubated with pre-blocked Dynabeads for at least 6 hours, rotating at 4° C, then washed three times with 1 mL blocking solution to clear excess unbound antibody. 10-50 million fixed cells were thawed on ice and treated with a series of lysis buffers supplemented with protease inhibitors (Roche, catalogue #05056489001). Cells were lysed in 10 mL of lysis buffer 1 (LB1, 50 mM HEPES–KOH, pH 7.5; 140 mM NaCl; 1 mM EDTA; 10% Glycerol; 0.5% NP-40 or Igepal CA-630; 0.25% Triton X-100) for 10 minutes rotating at 4° C and centrifuged at 2000g for 5 minutes at 4° C. Pelleted nuclei were washed with 10 mL of LB2 (10 mM Tris–HCl, pH 8.0; 200 mM NaCl; 1 mM EDTA; 0.5 mM EGTA) for 5 minutes rotating at 4° C, followed by centrifugation at 2000g for 5 minutes. The pellet was then resuspended in 3 mL of LB3 (10 mM Tris–HCl, pH 8; 100 mM NaCl; 1 mM EDTA; 0.5 mM EGTA; 0.1% Na–Deoxycholate; 0.5% N-lauroylsarcosine) and kept on ice. The samples remained in LB3 for the sonication.
A microtip Misonix Sonicator was used to sonicate nuclei for 3 minutes and 20 seconds of total ON time (10 seconds ON, 1 minute OFF for 20 cycles) to fragment DNA to between 200 and 400 bp. The total amount of energy output was recorded for each sonication step, and ranged from 5000 to 6500 J. 150 uL of 20% Triton X-100 was added to the sonicated chromatin, followed by centrifugation at 20,000g for 10 minutes at 4° C. 150 uL of the sonicated lysate, or whole cell extract (WCE), was taken at this point as the input control for the ChIP experiment and stored at -20° C. Supernatants containing the sonicated DNA were distributed according to the cell number required for each ChIP. 10-50 million fixed and sonicated cells were used per transcription factor immunoprecipitation and a minimum of 2 million cells was used for histone modification immunoprecipitation experiments. Lysates were added to protein-G bound antibodies and incubated overnight at 4° C on a rotator. Immunoprecipitated DNA was collected using a magnetic stand and washed seven times with 1 mL RIPA buffer (50 mM HEPES–KOH, pH 7.5; 500 mM LiCl; 1 mM EDTA; 1% NP-40 or Igepal CA-630; 0.7% Na–Deoxycholate) and once with 1 mL TBS (20 mM Tris-HCl pH 7.6; 150 mM NaCl). Samples were centrifuged at 950g for 3 minutes at 4° C to remove residual TBS buffer, then resuspended in 200 uL ChIP elution buffer (50 mM Tris–HCl, pH 8; 10 mM EDTA; 1% SDS) and reverse cross-linked at 65° C overnight. 150 uL of ChIP elution buffer was added to 50 uL of WCE and reverse cross-linked together with the ChIP samples. After reverse cross-linking, the supernatant was separated from the magnetic beads and diluted with 200 uL TE. Samples were treated with 8 uL of 1 mg/mL RNaseA (Ambion, catalogue #AM2271) for 30 minutes at 37° C, then 3 uL of 15 mg/mL Proteinase K (Invitrogen, catalogue #100005393) for at least 2 hours at 55° C.
Phenol-chloroform-isoamyl alcohol extraction and ethanol precipitation (100% EtOH, 200 mM NaCl, and 20 µg of Glycoblue (Ambion, catalogue #AM9516)) were used to recover purified ChIP DNA. Purified DNA (200-500 ng, depending on the immunoprecipitation) was resuspended in 44 uL of elution buffer (10 mM Tris-HCl, pH 8.5) and stored at -20° C.

ChIP-seq library preparation

ChIP DNA was prepared for Illumina sequencing by blunt-end repair, dA-tailing, and ligation of Illumina adaptors using a NEBNext DNA library preparation kit (New England Biolabs, catalogue #E6040L). Total ChIP DNA (approximately 200-500 ng) and 220 ng of DNA input (WCE) were end-repaired for 30 minutes at room temperature, and then purified using column purification with either DNA Clean and Concentrator (Zymo Research, catalogue #D4014) or PCR purification columns (Qiagen, catalogue #28106) as recommended by the manufacturer’s protocol. Blunt-end repaired DNA was dA-tailed for 40 minutes at 37° C, then column purified. dA-tailed DNA was ligated to Illumina adaptors (final concentration 6.67 nM) that have a T-overhang and a proprietary uracil hairpin. USER enzyme was used to cleave the uracil hairpin of the Illumina adaptor, and adaptor-ligated DNA was column purified. The library was PCR amplified for 16-18 cycles using a universal primer and a barcoded primer (New England Biolabs, catalogue #E7335L). Each library made on the same day had a unique barcode that could be used to identify sequencing reads from a pool of libraries. PCR-amplified DNA was purified using PCR column purification and eluted in 20 uL of elution buffer in preparation for gel extraction, or in 30 uL of TE in preparation for Pippin Prep size selection. Size selection was performed in one of two ways. Some HAEC basal samples (H3K27ac, Jun, H3K4me2, and Input) were gel purified by casting a 2% ultra-pure ChIP-grade agarose gel (BioRad, catalogue #161-3107) and using a Dark Reader to cut out DNA reaction products of 200-300 bp. These samples were then purified using the MinElute Gel Extraction Kit (Qiagen, catalogue #28604). All other HAEC library samples were size selected from 200-350 bp using a 2% agarose dye-free automated size selection cassette for the Pippin Prep (Sage Science, catalogue #CDF2010).

ChIP-qPCR and sequencing samples

As a routine backup of ChIP-seq libraries, samples were reamplified by PCR for 25 cycles [98° C 2 min, (98° C 30 s, 65° C 30 s, 72° C 1 min) x 25 cycles, 72° C 5 min, 10° C hold] using universal primer pairs designed against the Illumina adaptors (1.3 AATGATACGGCGACCACCGAGA and 2.4 CAAGCAGAAGACGGCATACGAGAT). Reamplified samples were purified using PCR purification columns, quantified using NanoDrop or Qubit, and stored at -20° C. In some cases ChIP-seq libraries were tested for enrichment by qPCR. The qPCR reaction contained 11 uL total: 5.5 uL of SybrGreen Master Mix (LifeTechnologies, catalogue #4367659), 0.3 uL of 25 nM primer pair, 0.2 uL ddH2O, and 5 uL of 0.5 ng/uL sample. Each reaction was performed in triplicate against multiple primer pairs including gene desert, positive control, and known EC-specific amplicons. qPCR was performed on the ViiA 7 Real-Time PCR System (LifeTechnologies), using default thermocycling conditions [95° C 10 min, (95° C 15 s, 60° C 1 min) x 40 cycles] to amplify 80-120 bp amplicons. The delta-delta Ct method was used to calculate the fold enrichment of each test amplicon relative to a gene desert, with the non-immunoprecipitated input sample (WCE) used for calibration. That is, for each genomic locus, the Ct of each ChIP and input sample was first calibrated to a gene desert, and the ChIP value was then calibrated to the value of the corresponding input sample. We used an empirically based lower cutoff of 20-fold enrichment between ChIP libraries and input controls to assess the quality of the experiment before submitting samples for sequencing. Samples were submitted for quality control analysis to the Donnelly Sequencing Centre (University of Toronto) or The Centre for Applied Genomics (Hospital for Sick Children) for Bioanalyzer analysis and library quantification using KAPA Biosystems. Libraries were sequenced using Illumina HiSeq2500. We obtained approximately 21 million raw reads per sample (Supplementary Table 1).
The flowcells were prepared and processed by the sequencing facility according to the manufacturer's protocol, with 100-bp single-end sequencing for 75 cycles.
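The delta-delta Ct calculation described above can be illustrated with a minimal sketch. It assumes 100% amplification efficiency (a doubling of product per cycle), and the Ct values and the function name are hypothetical, chosen only to show the arithmetic rather than taken from our experiments.

```python
def fold_enrichment(ct_chip_target, ct_chip_desert, ct_input_target, ct_input_desert):
    """Fold enrichment of a test amplicon by the delta-delta Ct method.

    Assumes perfect PCR efficiency (product doubles every cycle).
    """
    # Step 1: calibrate each sample to the gene desert amplicon.
    delta_chip = ct_chip_target - ct_chip_desert
    delta_input = ct_input_target - ct_input_desert
    # Step 2: calibrate the ChIP sample to the input (WCE) sample.
    delta_delta = delta_chip - delta_input
    # A lower Ct means more template, hence the negative exponent.
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values for one test amplicon.
example = fold_enrichment(
    ct_chip_target=24.0, ct_chip_desert=30.0,
    ct_input_target=27.0, ct_input_desert=27.5,
)
# Compare against the 20-fold lower cutoff used before sequencing.
passes_cutoff = example >= 20.0
print(f"fold enrichment = {example:.1f}, passes cutoff: {passes_cutoff}")
```

In practice the Ct values would be the means of the triplicate reactions for each amplicon.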

ChIP-seq data processing

Alignment and Peak Calling

ChIP-seq and input reads from ECs were aligned to the hg19 [GRCh37] genome assembly with the Burrows-Wheeler Aligner (BWA), using default parameters (Li and Durbin 2009). Reads that mapped to more than one place in the genome were randomly assigned to a single locus. The FastQC pipeline was used to analyze the quality of sequencing data and to spot problems that originated with the starting library material or the sequencer (Andrews 2010). Sequencing reads (100-bp reads) were automatically trimmed based on the quality of the bases called at each position. Peaks were called for each sample relative to the WCE input using MACS2 with a false-discovery rate cut-off of q ≤ 0.05 (Zhang et al. 2008). In order to identify nucleosome-depleted regulatory regions for histone modifications, narrow peaks were stitched to form broad peaks using the broad option and default parameters in MACS2.
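For orientation, the peak-calling step can be sketched as a small helper that assembles a MACS2 command line. This is an illustrative reconstruction rather than the exact invocation used for this thesis: the file names and the helper name are hypothetical, and the BAM input format and the human genome size shortcut (`-g hs`) are assumptions.

```python
def macs2_callpeak_args(chip_bam, input_bam, name, q_cutoff=0.05, broad=False):
    """Build an argument list for `macs2 callpeak` (the command is not run here)."""
    args = [
        "macs2", "callpeak",
        "-t", chip_bam,      # ChIP (treatment) alignments
        "-c", input_bam,     # WCE input (control) alignments
        "-f", "BAM",         # assumed alignment format
        "-g", "hs",          # assumed effective genome size shortcut for human
        "-n", name,          # prefix for output files
        "-q", str(q_cutoff), # FDR (q-value) cut-off
    ]
    if broad:
        # Broad mode merges nearby enriched regions, as used for histone marks.
        args.append("--broad")
    return args


# Hypothetical file names: a narrow call for JUN, a broad call for H3K27ac.
jun_cmd = macs2_callpeak_args("HAEC_JUN.bam", "HAEC_input.bam", "HAEC_JUN")
k27_cmd = macs2_callpeak_args("HAEC_H3K27ac.bam", "HAEC_input.bam", "HAEC_H3K27ac", broad=True)
print(" ".join(jun_cmd))
print(" ".join(k27_cmd))
```

The resulting lists could be passed to `subprocess.run` on a system where MACS2 is installed.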

Accessing Data

All signal tracks for ChIP-seq libraries generated for this thesis are available through a private link. To access these data, go to http://genome.ucsc.edu, My Data, Track Hubs, and under the “My Hubs” tab, paste the URL http://wilsonlab.org/Lina_Hub/hub.txt and click Add Hub. The default page is for hg19. For other species generated for this thesis (rat and bovine), change the “genome” dropdown menu. Click on Genome Browser and the LA Hub will be displayed for all libraries in that assembly. Change the viewing option in the dropdown menu from “hide” to “full” to view data. Raw and processed data files will also be made available on ArrayExpress (Kolesnikov et al. 2015) at the time of paper publication.

Quality control

The ENCODE consortium proposed several quality control measures to help evaluate ChIP-seq experiments (Landt et al. 2012). We used these metrics to describe the quality of our data (Supplementary Table 1). ENCODE guidelines suggest that libraries with a PCR bottleneck coefficient < 0.5 should be reviewed. All of our libraries passed this criterion. Landt et al. (2012) also developed a cross-correlation analysis that is independent of peak calling to measure the quality of ChIP-seq samples: libraries with normalized strand cross-correlation coefficient (NSC) values below 1.05 and relative strand cross-correlation coefficient (RSC) values below 0.8 may be of poor quality. All of our samples met these criteria. As a second independent measure of quality, we performed the Irreproducible Discovery Rate (IDR) analysis described by Landt et al. (2012), shown in Supplementary Table 3. Based on the quality control measures taken above, we merged sequencing reads from biological replicates, called peaks, and used merged peaks for all downstream analyses.
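The thresholds above can be summarized in a short sketch. The metric definitions follow our reading of the ENCODE guidelines (PBC as the fraction of distinct genomic positions covered by exactly one read; NSC and RSC derived from the strand cross-correlation at the fragment length, at the read-length "phantom" peak, and at its minimum). The function names and all numeric inputs are hypothetical.

```python
def pbc(n_positions_with_one_read, n_distinct_positions):
    """PCR bottleneck coefficient: positions with exactly one read / distinct positions."""
    return n_positions_with_one_read / n_distinct_positions


def nsc(cc_fragment, cc_min):
    """Normalized strand cross-correlation coefficient."""
    return cc_fragment / cc_min


def rsc(cc_fragment, cc_phantom, cc_min):
    """Relative strand cross-correlation coefficient."""
    return (cc_fragment - cc_min) / (cc_phantom - cc_min)


def flag_library(pbc_value, nsc_value, rsc_value):
    """Return the names of the metrics that fall below the thresholds in the text."""
    flags = []
    if pbc_value < 0.5:
        flags.append("PBC")
    if nsc_value < 1.05:
        flags.append("NSC")
    if rsc_value < 0.8:
        flags.append("RSC")
    return flags


# Hypothetical library: 18 of 20 million distinct positions carry a single read.
good = flag_library(
    pbc(18_000_000, 20_000_000),
    nsc(cc_fragment=0.30, cc_min=0.20),
    rsc(cc_fragment=0.30, cc_phantom=0.28, cc_min=0.20),
)
# Hypothetical low-complexity, low-enrichment library.
poor = flag_library(0.4, 1.02, 0.5)
print(good, poor)
```

A library returning an empty list passes all three thresholds; any flagged metric would prompt a closer review.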

Functional gene enrichment analysis using GREAT

Functional enrichment analysis of ChIP-seq data was performed using the Genomic Regions Enrichment of Annotations Tool (GREAT) version 3.0 API (McLean et al. 2010). GREAT has pre-computed regulatory domains for each gene in the genome and assumes that the noncoding sequences that lie within a regulatory domain regulate that gene. Regulatory domains are defined as 5 kb upstream and 1 kb downstream from the TSS (basal regulatory region), extended up to the basal regulatory domain of the nearest upstream and downstream genes (extended regulatory region) but no further than 1 Mb in each direction. The TSS of the ‘canonical isoform’ is used as defined by the UCSC Known Genes track (Hsu et al. 2006), and the basal regulatory regions of genes can overlap. Using the pre-computed regulatory domains, GREAT determines the total fraction of the genome annotated for the genes within any given ontology term, and counts the number of user-defined peak regions that fall into the regulatory domains of each ontology term (see Figure 1 in McLean et al. 2010). The binomial test statistic measures the likelihood of observing a peak in the regulatory domain of the genes within an ontology term given the total number of peaks in the input set. The hypergeometric test is also used to associate proximal (≤ 5 kb of TSS) binding events, and calculates the probability that one or more peaks fall within the regulatory domain of a gene given all the genes in the genome. Peak regions were submitted to GREAT using the API with the default association rule parameters. GREAT outputs were plotted using an R tool developed by our lab.
To create summary figures, I chose to display the results from two important databases: 1) GO Biological Processes, which describes the associated biological processes of genes (Gene Ontology 2015), and 2) Mouse Phenotype Database, which associates phenotypic data from over 20,000 knock-out mouse strains to genes (Eppig et al. 2015). The genes annotated to each ontology term in GO Biological Processes and Mouse Phenotype Database can be accessed through (Gene Ontology 2015) and (Eppig et al. 2015), respectively. We displayed the binomial fold enrichment, which is the ratio of the observed number of query peaks falling within the regulatory domains of the genes in an ontology term to the number expected by chance. Enrichments are considered significant by GREAT if they return a binomial FDR q-value ≤ 0.05, a significant hypergeometric FDR q-value (≤ 0.05), and a minimum binomial fold enrichment of 2 over the genome.
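The binomial statistics described above can be sketched as follows. This is a simplified illustration under stated assumptions, not GREAT's own code: each peak is treated as an independent trial that lands in the regulatory domains of an ontology term with probability equal to the fraction of the genome those domains cover. The annotated fraction and the hit count below are hypothetical, and no multiple-testing correction is applied.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for numerical stability."""
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def binomial_fold_enrichment(k, n, p):
    """Observed hits divided by the number expected by chance (n * p)."""
    return k / (n * p)


# Hypothetical term: its regulatory domains cover 0.2% of the genome, and
# 42 of 4,363 peaks fall inside them.
n_peaks, annotated_fraction, hits = 4363, 0.002, 42
p_value = binomial_upper_tail(hits, n_peaks, annotated_fraction)
fold = binomial_fold_enrichment(hits, n_peaks, annotated_fraction)
print(f"fold enrichment = {fold:.2f}, binomial p = {p_value:.3g}")
```

GREAT additionally converts the raw p-values to FDR q-values across all tested terms, which this sketch omits.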

Other publicly available datasets used in thesis

In order to compare our data with publicly available ENCODE data, we downloaded and processed ChIP-seq samples for the same histone modifications, CTCF, and JUN from HUVECs (Linnemann et al. 2011; Bernstein et al. 2012). To discriminate enhancer and JUN regions that are private to ECs, we downloaded and processed the raw sequencing files for H3K27ac and JUN from HepG2 for comparison (ENCODE Project Consortium 2012). We aligned sequencing reads and called peaks as above. All samples passed the same quality control measures used to evaluate ChIP libraries above. Information for these libraries, including the links to raw data and the number of reads and peaks, is shown in Supplementary Table 2.

Chromatin state map construction

To generate a chromatin state map in HAEC and HUVEC we used ChromHMM (Ernst and Kellis 2012). One must first establish the number of chromatin state annotations to use for the whole genome. The ENCODE consortium (Ernst and Kellis 2013) and others (Roadmap Epigenomics et al. 2015) typically construct several models with the number of states ranging from 10 to 60, depending on which factors are used. The states can then be assigned biologically intuitive labels by the user (e.g. active promoter). We trained several models with the number of states ranging from 8 to 20 to determine the optimal number of states to define. We selected a 13-state model for all further analyses because it provided sufficient resolution to capture biologically meaningful patterns, while a larger number of states did not capture substantially new interactions.
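The model sweep described above can be sketched as a helper that lists one ChromHMM `LearnModel` command per candidate number of states. The argument order (input directory, output directory, number of states, assembly) follows the ChromHMM manual; the directory names, the jar location, and the helper name are hypothetical, and the binarization step that must precede training is not shown.

```python
def learn_model_commands(binarized_dir, output_root, assembly="hg19",
                         min_states=8, max_states=20, jar="ChromHMM.jar"):
    """Build one `LearnModel` command per candidate number of states (not run here)."""
    commands = []
    for n_states in range(min_states, max_states + 1):
        commands.append([
            "java", "-jar", jar, "LearnModel",
            binarized_dir,                        # binarized signal for HAEC and HUVEC
            f"{output_root}/states_{n_states}",   # separate output folder per model
            str(n_states),
            assembly,
        ])
    return commands


# Hypothetical paths; one command per model from 8 to 20 states.
commands = learn_model_commands("binarized", "chromhmm_models")
for command in commands:
    print(" ".join(command))
```

Comparing the emission parameters across the resulting models is one way to judge whether additional states add new combinations of marks.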

Defining differentially bound regions

To define differential enhancer and Jun binding sites, sequencing reads normalized to read depth (reads per kilobase per million reads) were calculated for each library and compared between HAEC, HUVEC, and HepG2 enhancers and Jun sites using the R/Bioconductor package DiffBind (Stark and Brown 2011). All HAEC and HUVEC peak coordinates were queried for differential enrichment of signal among the three cell lines with a false discovery rate cutoff of 0.05. This allowed us to determine whether peaks were shared between endothelial cells, were found in all cell lines (pan-tissue), or were specific to each cell line. Heat maps were generated showing the signal at EC peak coordinates centered on the highest peak summit, and separated by differential binding status (either common between ECs, specific to aortic EC, or specific to vein EC).
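The read-depth normalization mentioned above can be illustrated with a minimal sketch of the reads per kilobase per million reads calculation. It only shows the arithmetic; DiffBind performs its own counting and normalization internally, and the peak length, read counts, and function name below are hypothetical.

```python
def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    kilobases = region_length_bp / 1_000
    millions_of_reads = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions_of_reads)


# Hypothetical 2 kb peak with 420 reads in a library of 21 million mapped reads.
haec_signal = rpkm(read_count=420, region_length_bp=2_000, total_mapped_reads=21_000_000)
# The same peak in a deeper hypothetical library with proportionally more reads.
huvec_signal = rpkm(read_count=840, region_length_bp=2_000, total_mapped_reads=42_000_000)
print(haec_signal, huvec_signal)
```

Because both the region length and the library size are divided out, the two hypothetical libraries give the same normalized signal despite their different sequencing depths.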


Chapter 1 Results

1.1 Characterizing the epigenetic landscape of human aortic endothelial cells

To generate an epigenetic map of HAECs, I used ChIP-seq to profile histone modifications H3K4me3 (mark of active promoter), H3K4me2 (mark of active/latent promoter and enhancer), H3K27ac (mark of active enhancers and promoters), H3K36me3 (mark of active gene bodies), H3K4me1 (mark of active enhancers), H3K27me3 (mark of polycomb-repressed regions), and transcription factors JUN (AP-1 member) and CTCF (found at the borders of chromosomal domains and loops (Phillips and Corces 2009)) in HAECs. These experiments were conducted on HAECs from two different donors, and the raw sequencing data from both replicates were merged since they passed multiple quality control measures (see page 15 and Supplementary Table 3). We called peaks for histone modifications (average peak length 1.8 kb), JUN (average peak length 253 bp) and CTCF libraries (average peak length 320 bp) (see Supplementary Table 3 for peak numbers). We next compared our aortic EC experiments to publicly available datasets for HUVECs, which were generated by ENCODE (Linnemann et al. 2011; Bernstein et al. 2012), to identify regulatory elements that are common between ECs as well as those unique to each EC cell type. We processed HUVEC data (Supplementary Table 3, for data access see page 15) using the same methods and showed that the quality of the data is comparable to that of our ChIP-seq experiments (Supplementary Figure 1). Two representative genomic loci are highlighted: 1) the EC-expressed CCL2 gene, and 2) the arterial EC-specific HEY2 gene (Figure 4). CCL2 encodes a chemokine involved in immunoregulation and the inflammatory process, and has marks of activation in both ECs that are consistent with the chemokine’s functions (Han et al. 2004; Bonello et al. 2011). HEY2 encodes a TF under the control of the Notch signaling pathway that is only expressed in arterial ECs and has a marked repression signal in HUVEC (H3K27me3) (Aranguren et al. 2013).


Figure 4: Epigenetic profiling of HAEC and HUVEC at the CCL2 locus, which is active in both HAEC and HUVEC, and HEY2 locus, which plays an important role in arterial gene regulation. Figure 4 Legend: UCSC Browser screenshots of HAEC (red) and HUVEC (black) signal at CCL2 and HEY2. The total size of genomic window (x-axis) is 30kb per screenshot, and the y-axis represents number of sequencing reads per library (50bp). Boxed regions of CCL2 show similar marks of active enhancers in both ECs, and boxed regions of HEY2 show specificity for activation in arterial ECs and repression in venous ECs.

While each ChIP-seq library is independently informative, integrating data from multiple ChIP-seq experiments allows for higher-confidence categorization of regulatory elements. Chromatin states were defined de novo on the basis of different combinations of histone modifications that occur genome-wide using ChromHMM (Ernst and Kellis 2012) (see page 17). We manually annotated a model of 13 chromatin states based on the relative enrichment of each library per state (Figure 5, Panel A), corresponding to previously published annotations (Ernst et al. 2011; Roadmap Epigenomics et al. 2015). By calculating the relative enrichment of each state throughout the genome, we categorized 35% of the genome into functional regulatory classes, and 65% as having low ChIP-seq signal (Figure 5, Panel B). A side-by-side comparison of HAEC states and HUVEC states generally shows high similarity and overlap of genomic features in each state (Figure 5, Panel C). With genome-wide categorization of various regulatory annotations, we sought to identify how similar the location of each annotation is in each cell type. To estimate the significance of the similarity between the chromatin states of HAECs and HUVECs, we calculated the log2 fold enrichment of the observed number of base pair overlaps relative to the expected number of base pair overlaps using the Genome Association Tester, GAT (Heger et al. 2013). We generated an expected background model by conducting 1,000,000 simulations and controlled for multiple testing by using a false discovery rate of 10^-6. The log2-fold enrichments from the HAEC to HUVEC perspective, and vice versa, are very similar to each other (Figure 5, Panel D). This means that the observed base pair overlap between the EC cell lines is reciprocal, with a similar number of base pairs overlapping from each perspective. Active TSS (State 1) has a high fold enrichment value: there is 171-fold (7.42 log2-fold) more base pair overlap than expected by chance.
Thus, perhaps as expected, active TSSs are highly similar between HAEC and HUVEC. Interestingly, the lowest fold-enrichments were obtained for polycomb-repressed (State 7) and weakly transcribed (State 11) regions, which indicates that these regions are less similar between the two cell types but still have highly significant overlap. We did not generate fold enrichments of similarity between HAEC and HUVEC for State 8, as it covered the vast majority of the genome and was too computationally intensive. We expected genes like VWF, which is generally used as an EC marker (Aird et al. 1997), to have similar chromatin states between HAEC and HUVEC (Figure 5, Panel E).

Indeed, VWF contains mainly active chromatin states including “active transcription” and “genic enhancers” in both HAEC and HUVEC. Another example where regulation between HAEC and HUVEC is similar is within the HOXD cluster of genes (Figure 5, Panel F). This region is broadly marked by a repressive chromatin state in both HAEC and HUVEC, with the exception of the HOXD8 locus, which has an active enhancer mark (H3K27ac) in HUVEC. This is expected, since HOXD8 is required for differentiation of lymphatic endothelial cells that derive from vein ECs (Harada et al. 2009) (see Figure 2). Overall, chromatin state analysis shows that the majority of the epigenome is similarly regulated in HAEC and HUVEC. This outcome is expected, since both are the same cell type and perform similar biological functions. However, given that there are differences in the transcriptional programs that control artery- and vein-specific gene expression that persist ex vivo in cell culture conditions (Chi et al. 2003), we sought to identify and characterize differences in the active regulatory regions and JUN sites of arterial and venous ECs.
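The relationship between the fold enrichment and log2 fold enrichment values reported for the chromatin state overlaps (Figure 5, Panel D) can be checked with a short sketch. It only reproduces the conversion; the expected overlap in our analysis came from GAT simulations, which are not reimplemented here, and the function name and base pair counts are hypothetical.

```python
import math


def log2_fold_enrichment(observed_overlap_bp, expected_overlap_bp):
    """log2 of the ratio of observed to expected base pair overlap."""
    return math.log2(observed_overlap_bp / expected_overlap_bp)


# A 171-fold excess of observed over expected overlap, as reported for State 1,
# expressed with hypothetical base pair counts.
state1 = log2_fold_enrichment(observed_overlap_bp=17_100_000, expected_overlap_bp=100_000)
# Equal observed and expected overlap gives a log2 fold enrichment of zero.
no_enrichment = log2_fold_enrichment(50_000, 50_000)
print(round(state1, 2), no_enrichment)
```

Positive values therefore indicate more overlap between HAEC and HUVEC states than expected by chance, and each unit corresponds to a doubling.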


Figure 5: Integrating multiple histone modification and CTCF binding data reveals that the epigenome is similarly regulated between HAEC and HUVEC.


Figure 5 Legend: Panel A: Chromatin states trained jointly across HAECs and HUVECs by ChromHMM. Relative enrichment in emissions denotes the frequency of each specific chromatin mark along the x-axis observed at each state on the y-axis. Panel B: Genomic coverage of each state for HAEC and HUVEC. Panel C: Annotation enrichments of each state in HAEC and HUVEC for various genomic features from RefSeq (x-axis). Panel D: Log2 fold-enrichment of observed/expected number of base pair overlaps between states. Panel E and F: Chromatin state analysis at VWF (expressed) and HOXD cluster (repressed) with matching colours indicating states from Panel A. Active enhancer mark H3K27ac, polycomb-repression mark H3K27me3, and active gene body mark H3K36me3 ChIP-seq sequencing reads (y-axis, 50bp) for HAEC (red) and HUVEC (black) at corresponding genomic window of 350kb (x-axis).


1.2 Active enhancer and promoter regions can distinguish biological pathways between HAEC and HUVEC

We have shown that the epigenomes of HAEC and HUVEC are highly similar using chromatin state maps (Figure 5 Panel A-D and Supplementary Figure 1). Given that EC vascular-bed-specific gene expression profiles persist ex vivo in cell culture (Chi et al. 2003) (Figure 1), we sought to identify artery- and vein-specific gene regulation. To distinguish between HAEC and HUVEC, we identified differentially bound H3K27ac and JUN sites using the R/Bioconductor package DiffBind (Stark and Brown 2011) (see page 17). This analysis used ChIP-seq reads normalized to input libraries (as opposed to simply overlapping peaks) to identify the occurrence and intensity of differential peaks. Of the 64,489 H3K27ac EC peaks (HAEC plus HUVEC), 4,363 (6.8%) were differentially bound in HAEC (“HAEC-only”) and 3,756 (5.8%) were differentially bound in HUVEC (“HUVEC-only”). The remaining 56,370 peaks (87.4%) were considered not significantly different. Of the 78,409 JUN EC peaks (HAEC plus HUVEC), 1,536 (2.0%) were differentially bound in HAEC and 1,383 (1.8%) were differentially bound in HUVEC (Figure 6).
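The proportions of differentially bound peaks follow directly from the counts reported above, as the following sketch shows. The peak counts are those given in the text; the helper name and the one-decimal rounding are choices made for this illustration.

```python
def percent(part, total):
    """Percentage of `total` represented by `part`."""
    return 100.0 * part / total


# Peak counts reported in the text (HAEC plus HUVEC union sets).
h3k27ac = {"total": 64489, "HAEC-only": 4363, "HUVEC-only": 3756}
jun = {"total": 78409, "HAEC-only": 1536, "HUVEC-only": 1383}

# Peaks that are not differentially bound are what remains of the union set.
h3k27ac_shared = h3k27ac["total"] - h3k27ac["HAEC-only"] - h3k27ac["HUVEC-only"]
jun_shared = jun["total"] - jun["HAEC-only"] - jun["HUVEC-only"]

for label, counts, shared in (("H3K27ac", h3k27ac, h3k27ac_shared), ("JUN", jun, jun_shared)):
    for category in ("HAEC-only", "HUVEC-only"):
        print(label, category, round(percent(counts[category], counts["total"]), 1))
    print(label, "not differentially bound", round(percent(shared, counts["total"]), 1))
```

The same helper can be applied to any other subset of peaks expressed as a fraction of its parent set.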


Figure 6: There are relatively few HAEC or HUVEC-specific gene regulatory regions. Figure 6 Legend: Differential binding analysis of H3K27ac and JUN sites shows that there are relatively few differentially bound gene regulatory regions between HAECs and HUVECs. H3K27ac and JUN peaks from HAEC and HUVEC were compared based on signal intensity normalized to respective input (q-value = 0.05), and categorized as not differentially bound, HAEC-only, and HUVEC-only. The signal for each peak centered on the highest peak summit with a window of (+/-) 5kb is plotted in the heat map for HAEC and HUVEC reads. Green horizontal lines separate the HAEC or HUVEC specific peak regions from the not differentially bound regions.

We then asked whether HAEC- and HUVEC-only H3K27ac and JUN peaks were enriched for specific genes and pathways that are consistent with their vascular bed of origin. We used our chromatin state maps to identify the HAEC and HUVEC peaks that marked regions considered to be in active states (active promoter or enhancer states 1-4, and 12 from Figure 5). Through this analysis, we found that 4,017 (92%) of 4,363 HAEC-only and 3,605 (96%) of 3,756 HUVEC-only H3K27ac peaks are in promoter and enhancer states. Among JUN binding sites, 1,209 (79%) of 1,536 HAEC-only and 623 (45%) of 1,383 HUVEC-only peaks are in active promoter and enhancer states. Using the Genomic Regions Enrichment of Annotations Tool (GREAT) we found biological pathways that were enriched in HAEC-only and HUVEC-only H3K27ac and JUN sites (Figure 7). We observed that HAEC-only H3K27ac regions were enriched for the Notch signaling pathway, with the top GO Biological Process category being “Notch signaling involved in heart development” (binomial FDR q-value = 1.01e-7, fold enrichment = 4.8). The top Mouse Phenotype database enrichments included abnormal vasculature (e.g. thick myocardium, binomial FDR q-value = 6.95e-9, fold enrichment = 5.3). HUVEC-only H3K27ac regions were enriched for sprouting angiogenesis (binomial FDR q-value = 1.90e-5, fold enrichment = 2.10). JUN peaks differentially bound in HAEC were also enriched for Notch signaling (binomial FDR q-value = 2.8e-2, fold enrichment = 5.04). HUVEC-only regions were enriched for cellular proliferation through the PI3K pathway (binomial FDR q-value = 4.85e-2, fold enrichment = 1.3), a known regulator of tumor angiogenesis and vascular remodeling (Suzuki et al. 2007). We observed Notch signaling pathway target genes HEY1, HEY2, NOTCH1, and NOTCH2 among the top HAEC-only biological enrichments. Genes giving rise to HUVEC-only biological pathway enrichments include EC-related inflammatory genes like VEGFA, ROBO1, and SLIT2 (Verissimo et al. 2009; Zhao et al. 2014). By comparing our HAEC epigenetic atlas to HUVEC, we found that the vast majority of active enhancers and JUN binding regions are shared, yet our resource can still be used to detect vascular-bed-specific pathways.

27 We then asked whether HAEC- and HUVEC-only H3K27ac and JUN peaks enriched for specific genes and pathways that are consistent with their vascular bed of origin. We used our chromatin state maps to identify the HAEC and HUVEC peaks that marked regions that were considered to be in active states (active promoter or enhancer states 1-4, and 12 from Figure 5). Through this analysis, we found that 4017 (92%) of 4,363 HAEC-only and 3,605 (96%) of 3,756 HUVEC-only H3K27ac peaks are in promoter and enhancer states. In JUN binding sites, 1,209 (79%) of 1,536 HAEC-only and 623 (45%) of 1,383 HUVEC-only peaks are in active promoter and enhancer states. Using Genomic Regions Enrichment of Annotations Tool (GREAT) we found biological pathways that were enriched in HAEC-only and HUVEC-only H3K27ac and JUN sites (Figure 7). We observed that HAEC-only H3K27ac regions were enriched for the Notch signaling pathway with the top GO Biological Process category being “Notch signaling involved in heart development” (binomial FDR q-value 1.01e-7, fold enrichment = 4.8). The top Mouse Phenotype database enrichments included abnormal vasculature (e.g. thick myocardium binomial FDR q-value = 6.95e-9, fold enrichment = 5.3). HUVEC-only H3K27ac regions were enriched for sprouting angiogenesis (binomial FDR q- value = 1.90e-5, fold enrichment = 2.10). JUN peaks differentially bound in HAEC were also enriched for Notch signaling (binomial FDR q-value = 2.8e-2, fold enrichment = 5.04). HUVEC-only regions were enriched for the cellular proliferation through PI3K pathway (binomial FDR q-value = 4.85e-2, fold enrichment = 1.3), a known regulator of tumor angiogenesis and vascular remodeling (Suzuki et al. 2007). We observed Notch signaling pathway target genes HEY1, HEY2, NOTCH1, and NOTCH2 among the top HAEC-only biological enrichments. Genes giving rise to HUVEC-only biological pathway enrichments include EC-related inflammatory genes like VEGFA, ROBO1, and SLIT2 genes (Verissimo et al. 
2009; Zhao et al. 2014). By comparing our HAEC epigenetic atlas to HUVEC we found that although the vast majority of active enhancers and JUN binding regions are shared, we use our resource to detect vascular-bed-specific pathways.


Figure 7: Pathway enrichment analysis of JUN and H3K27ac peak regions that were classified as HAEC-only or HUVEC-only

Figure 7 Legend: Pathway enrichments for HAEC-only and HUVEC-only regions as described in Figure 6. The top 5 enrichments (y-axis) for each category (HAEC-only and HUVEC-only JUN and H3K27ac sites), if available, are shown for each database and the corresponding enrichments are shown across the x-axis for all three categories of conservation. Entries are ranked by –log10 binomial FDR q-values (x-axis). The size of the asterisk is proportional to the binomial fold change obtained for the given database.


1.3 Searching for functional regulatory elements in HAEC-only and HUVEC-only regions

Evidence of transcription factor motifs within peaks reveals possible binding partners of JUN

We conducted a motif enrichment analysis using the matrix-based motif discovery tool peak-motifs in RSAT (Thomas-Chollier et al. 2012) to identify overrepresented TF binding motifs within H3K27ac and JUN peaks in HAEC-only and HUVEC-only regions. We identified several primary and secondary motifs significantly overrepresented in each category that match the JASPAR database of TF binding motifs (Mathelier et al. 2014) (Table 1). As expected, JUN peaks were enriched for binding sites of the AP-1 monomers JUN, JUND, JUNB, FOS, and FOSL2 in all categories, which serves as a positive control to show that the JUN ChIP-seq library is enriched for JUN and other AP-1 related cofactor binding sites. Interestingly, ERG binding sites were predicted within both HAEC-only and HUVEC-only JUN peaks. ERG is a known regulator of angiogenesis and controls vascular stability (Birdsey et al. 2015). Other transcription factor binding sites that were enriched in both HAEC-only and HUVEC-only regions belonged to the ETS family of transcription factors (FEV, FLI1, and ELF1), which regulate cellular differentiation, apoptosis, and angiogenesis (Sharrocks 2001). We also examined broader regulatory regions marked by H3K27ac and found motifs for GATA3 and for ETS family TFs including FEV, FLI1, and ETS1. To find HAEC-only secondary motifs, which are overrepresented nucleotide sequences that match JASPAR matrices when primary binding motifs of JUN are masked within the peaks, we used control sequences from HUVEC-only regions to mask potential regulators of HUVECs. Secondary motifs are more distinct than primary motifs. HAEC-only secondary motifs in JUN peaks include EGR1, a transcription factor that responds to inflammatory stimuli and regulates apoptosis in vascular cells (Fu et al. 2003; Azahri and Kavurma 2013), and SP1 and SP2, which are ubiquitously expressed housekeeping transcription factors (Solomon et al. 2008). Interestingly, HUVEC-only secondary motifs identified in JUN peaks included MEF2C, which is a strong inhibitor of angiogenesis and inflammatory response in ECs (Sturtzel et al. 2014; Xu et al. 2015). Secondary motifs in HAEC-only H3K27ac peaks included TCF3, which has been shown to directly enhance the Notch signaling gene HES1 (Ikawa et al. 2006). Thus, by analyzing overrepresented TF motifs within JUN and H3K27ac sites that were specific to each EC cell type, we identified potential co-regulators of JUN in vascular-bed-specific regions.

Table 1: TF binding sites in HAEC-only and HUVEC-only JUN and H3K27ac peaks reveal possible binding partners of JUN.

Table 1 Legend: The RSAT tool peak-motifs was used to mine overrepresented TF motifs with a background model of the whole genome, and de novo discovered motifs were matched to the JASPAR database of transcription factor binding motifs (Mathelier et al. 2014). The motif matches to the JASPAR database are displayed for each category. Secondary motifs are defined as motifs found in a peak sequence that has been modified to mask the “correction category” sequences from each test set. That is, secondary motifs represent other potential cofactors besides the predominant binding motifs (primary motifs) within the JUN and H3K27ac HAEC-only and HUVEC-only regions.

Secondary motifs Peaks in active Correction with correction Factor Category chromatin state Primary motifs category category FOSL2, JUNB, FOSL1, KLF5, SP1, SP2, Erg, FLI1, ELF1, FOS, BATF::JUN, JUN_(var.2), HUVEC-only EGR1, SP1, SP2, Jun HAEC-only 1,209 JUN::FOS, BATF::JUN, JUND (623 peaks) JUND, BATF::JUN, FOSL2, RELA, Erg, FEV, MEF2C, JUNB, FOS, SRY, Sox6, SOX10, FLI1, ELF1, JUND, FOSL2, HUVEC- JUN::FOS, JUND_(var.2), CREB1, JUN, HAEC-only FOXP1, Sox3, only 623 JUNB, JUN, NFE2L1::MafG (1,209 peaks) FOXO3, , , TEAD1, REL, SPIB, NFIC, ZEB1, Tcf3, GATA3, Mecom, FEV, Erg, FLI1, Ets1, FEV, HUVEC-only EWSR1-FLI1, EHF, H3K27ac HAEC-only 4,017 ELF5, FOSL1, JUN::FOS, FOSL2, (3,605 peaks) Erg TEAD1, REL, SPIB, E2F1, E2F4, Gata1, GATA3, Mecom, SRY, Sox3, Sox5, Prrx2, CREB1, JUN, Pdx1, HOXA5, JUN::FOS, BATF::JUN, JUND_(var.2), HUVEC- FOSL2, FEV, Erg, FLI1, GABPA, FLI1, ELK4, HAEC-only HIF1A::ARNT, only 3,605 ELF1, NRF1 (4,017 peaks) Pax2,

Known human regulatory mutations in EC gene regulatory regions are associated with disease

Variants in regulatory regions can have direct functional consequences by altering a TF-binding motif. For example, in the proximal promoter of the Factor 9 (F9) gene, G>A and A>G conversions at positions -6 and -5, respectively, altered the transcription factor binding motif of ONECUT1/2 and led to diminished levels of F9 (Funnell et al. 2013). These two distinct point mutations are found in more than half of all individuals with hemophilia B Leyden and are not found in non-hemophilic populations. Similarly, we were interested in finding known human regulatory mutations that could potentially alter TF binding sites in HAEC-only and HUVEC-only regions. We used the Human Gene Mutation Database (HGMD), a manually curated catalogue of mutations that underlie or are associated with human inherited disease (Stenson et al. 2014). From a total of 134,151 mutations across the genome, the HGMD has 2,896 annotated non-coding regulatory mutations, of which 1,000 are disease-causing mutations, 597 are disease-associated polymorphisms with significant association (p<0.05), 767 are disease-associated polymorphisms with additional supporting evidence pointing to direct functional importance, and 532 are functional polymorphisms with evidence from in vitro or in vivo studies. Collectively, these various mutations will be referred to as HGMD variants. While there are still considerable challenges in associating regulatory DNA variants with diseases, the HGMD is currently the most comprehensive catalog of these mutation types. We asked which regulatory HGMD variants were found in HAEC-only and HUVEC-only H3K27ac and JUN peak regions (Supplementary Table 4). Within HAEC-only H3K27ac peaks (N=4,363) we found 37 regulatory HGMD variants annotated to 25 unique genes. 
For example, there are 3 regulatory mutations around transforming growth factor beta receptor II (TGFBR2), which is associated with Marfan syndrome and Loeys-Dietz aortic aneurysm syndrome. These connective tissue disorders are marked by aortic aneurysms due to weakening layers of the aorta (Singh et al. 2006). Furthermore, there are 3 variants near nitric oxide synthase-3 (NOS3), a gene that produces nitric oxide and mediates vascular endothelial growth factor-induced angiogenesis and blood clotting through the activation of platelets (Morbidelli et al. 2003). Interestingly, in HAEC-only JUN peaks (N=1,536), we found 16 regulatory HGMD variants, 2 of which were the same NOS3 variants as in HAEC-only H3K27ac peaks. This means that 2 regulatory variants (rs2070744 and rs3918226) at NOS3 are within both HAEC-only H3K27ac and JUN peak

regions (Supplementary Figure 2). In HUVEC-only H3K27ac peaks (N=3,756), we found 36 HGMD variants annotated to 21 unique genes, and no HGMD variants in HUVEC-only JUN peaks. Many of the regulatory variants in HUVEC-only H3K27ac regions are associated with cancer, including colorectal, lung, thyroid, and renal cancers. Tumor cells initiate EC sprouting and angiogenesis and provide essential nutrients to the tumor through vascularization (Nakatsu et al. 2003; Quail and Joyce 2013). A gene set enrichment analysis with g:Profiler (Reimand et al. 2007) using HAEC-only H3K27ac sites revealed biological pathway enrichments for processes like morphogenesis of a branching epithelium (hypergeometric p-value = 7.4e-6) and the regulation of systemic arterial blood pressure by endothelin (hypergeometric p-value = 3.3e-2) (Supplementary Table 8, Section A). HAEC-only JUN sites harboured variants related to response to external stimuli (hypergeometric p-value = 3.1e-5) and regulation of cell communication (hypergeometric p-value = 5.5e-3) (Supplementary Table 8, Section B). Biological pathway enrichments for genes annotated to HGMD variants that fall in HUVEC-only H3K27ac peaks were not significant. Our findings suggest the possible involvement of ECs in these and other diseases. Furthermore, these regulatory regions can be prioritized for functional experiments, as disease-causing variants have already been identified within these regions. There are many more regulatory variants that require functional validation. For example, the DFP category of variants has supporting evidence from reporter gene constructs and gene editing experiments; however, the mechanisms and the relationship to disease causality are not yet clear. The ratio of disease-causing mutations (DM) to disease-associated polymorphisms with additional supporting functional evidence (DFP) is 0.35, 0.58, and 0.4 for HAEC-only H3K27ac peaks, HUVEC-only H3K27ac peaks, and HAEC-only JUN peaks, respectively. This means that further experimentation is required to link these variants to disease. Future experiments should include determining which TF binding motifs are altered as a consequence of these variants and functionally dissecting these regions using genome-editing techniques.


1.4 Prominent regulatory regions private to endothelial cells highlight endothelial cell-specific biology

We established that the vast majority of active regulatory regions and JUN sites are common between artery and vein ECs. This raises the question of which of these regulatory regions are important for fundamental EC biological processes. To search for EC-specific pathways that are enriched for endothelial processes, we defined EC-private H3K27ac and JUN sites by comparing to a non-mesoderm derived cell type for which the same data were collected (Consortium 2012) (Supplementary Table 3). We chose H3K27ac and JUN data from ENCODE ascertained for a liver cancer cell line (HepG2), since the liver and ECs are both involved in maintaining hemostasis and responding to inflammatory stimuli. We compiled a set of EC peaks for H3K27ac (N=64,489) and JUN (N=78,409) and asked which ones were significantly enriched in ECs relative to HepG2. We termed these regions “EC-private” (Supplementary Table 5). If there was no significant difference in ChIP-seq reads between ECs and HepG2, we labeled these EC peaks as “pan-tissue” (Supplementary Table 5). A total of 12,747 (19.8%) of 64,489 H3K27ac peaks and 6,549 (8.4%) of 78,409 JUN peaks were found to be EC-private. The rest of the regions were not significantly different between the three cell types and were categorized as pan-tissue (51,742 (80.2%) of 64,489 H3K27ac peaks and 71,860 (91.6%) of 78,409 JUN peaks). To find pathways associated with EC-private and pan-tissue H3K27ac and JUN regions, we used GREAT to link regions to genes and identify biologically enriched pathways (McLean et al. 2010) (Figure 8). There were too many pan-tissue H3K27ac and JUN peaks to effectively look for enrichments (N=51,742 and N=71,860, respectively, as GREAT recommends fewer than 15,000 regions). However, we were able to look for pathways enriched for EC-private regulatory regions. 
GREAT annotates each peak region to the nearest gene by default extension parameters (see page 16), and calculates the binomial fold enrichment of genes within a given ontology database. We found that EC-private H3K27ac peaks were enriched for biological pathways related to EC biology. For example, the most significant enrichment we observed in the GO Biological Process database was for the platelet-derived growth factor receptor (PDGFR) signaling pathway (binomial FDR q-value = 3.0e-12, fold enrichment = 2.1). Many of the genes in the PDGFR pathway are closely related to vascular endothelial growth factors (Ball et al. 2007; Tran et al. 2013). Additional EC-related enrichments were observed in the Mouse Phenotype database. These included phenotypes associated with EC-private genes such as pulmonary fibrosis that occurs post-EC injury (Leach et al. 2013) (binomial FDR q-value = 3.1e-9, fold enrichment = 2.1) and disrupted neovascularization (binomial FDR q-value = 3.0e-8, fold enrichment = 2.1). EC-private JUN regions showed more detailed EC enrichments, including genes that are positive regulators of angiogenesis (binomial FDR q-value = 2.0e-11, fold enrichment = 2.0) from the GO Biological Process database and abnormal cardiovascular morphogenesis (binomial FDR q-value = 4.7e-9, fold enrichment = 2.13) from the Mouse Phenotype database. Importantly, we show that these categories are not similarly enriched across pan-tissue regions based on the high number of pan-tissue regions. Thus, defining regulatory regions that are private to ECs has allowed us to identify EC-specific biological pathways. However, it should be noted that the large number of EC-private regions decreases the probability of detecting specific enrichments using tools such as GREAT. Thus, a principled method to categorize this kind of epigenetic data is needed. In Chapter 2, I will explore the utility of comparative epigenomics to further dissect EC gene regulatory regions.
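The binomial fold enrichment and p-value reported for each term can be sketched as follows. This is a simplified illustration rather than GREAT's implementation: it assumes that the fraction of the genome assigned to the regulatory domains of a term's genes (`genome_fraction`) has already been computed, it omits the FDR correction, and all names are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binomial_enrichment(n_regions, n_hits, genome_fraction):
    """Fold enrichment and upper-tail binomial p-value for one ontology term.

    n_regions: total number of input regions (e.g. EC-private peaks)
    n_hits: regions falling in the regulatory domains of the term's genes
    genome_fraction: assumed fraction of the genome covered by those domains
                     (must be strictly between 0 and 1)
    """
    expected = n_regions * genome_fraction
    fold = n_hits / expected

    def log_pmf(k):
        # log of C(n, k) * p^k * (1 - p)^(n - k), computed in log space so
        # that large peak sets do not overflow
        return (
            lgamma(n_regions + 1) - lgamma(k + 1) - lgamma(n_regions - k + 1)
            + k * log(genome_fraction)
            + (n_regions - k) * log1p(-genome_fraction)
        )

    # P(X >= n_hits) for X ~ Binomial(n_regions, genome_fraction)
    p_value = sum(exp(log_pmf(k)) for k in range(n_hits, n_regions + 1))
    return fold, p_value
```

For instance, 5 hits among 10 regions for a term covering a quarter of the genome would correspond to a 2-fold enrichment under this definition.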


Figure 8: EC-private H3K27ac and JUN sites are enriched for biological pathways involved in inflammation and angiogenesis. Figure 8 Legend: The top 5 enrichments (y-axis) for each category (EC-private and pan-tissue H3K27ac and JUN sites), if available, are shown for each database and the corresponding enrichments are shown across the x-axis for all three categories of conservation. Entries are ranked by –log10 binomial FDR q-values (x-axis). Significant enrichments are labeled with an asterisk relative to the binomial fold enrichment of the gene set.


Chapter 1 Summary

We generated an epigenetic map of the human aortic endothelial cell genome. This map can serve as a resource for researchers in the endothelial cell biology community and a platform from which to build cross-species transcription factor and epigenetic comparisons (Chapter 2). We have profiled histone modifications that are widely used to mark active enhancers, promoters, and gene bodies, as well as repressive histone marks and a chromatin looping factor. By integrating multiple ChIP-seq experiments, we built a chromatin state map that allows us to confidently define chromatin domains and make comparisons between HAEC and HUVEC epigenomes. We showed that the majority of the epigenome is similarly regulated in ECs by comparing the location of chromatin state annotations genome-wide. We dissected active enhancer and JUN sites and identified many common elements between HAEC and HUVEC. For example, we found that regulatory regions that distinguished between artery and vein ECs were related in part to peaks close to Notch signaling target genes in HAEC. Given that we identified many common elements, we asked which of the regulatory regions were fundamental to endothelial functions by comparing to a commonly used, non-mesoderm derived cell line, HepG2. We found that regulatory regions private to ECs were enriched for genes in the inflammatory response and angiogenesis, while regions common among all three tissues (pan-tissue) were enriched for genes in general cellular processes like actin filamentation and mRNA nuclear export. In order to position our findings in relation to known human diseases, we searched for variants in HAEC-only and HUVEC-only peaks. We found variants related to diseases with plausible EC phenotypes, for example Marfan syndrome and Loeys-Dietz syndrome in HAEC-only peaks and cancer-related mutations in HUVEC-only peaks, suggesting there is inherent value in performing vascular-bed-specific epigenetic profiling and building vascular-bed-specific chromatin state maps. Overall, our results showed that the vast majority of regulatory regions were shared among ECs, although a remaining open question is whether these regions are functional. This question will be addressed in Chapter 2, where we use evolutionary conservation of active enhancer and JUN sites to further distill these regulatory regions into potential functional units.


Chapter 2 Introduction

Identifying functional cis-regulatory elements using comparative genomics

Regions of the genome that are functional often have conserved DNA sequence, as is typically observed in exonic regions that make up protein-coding genes. These regions can have detectable sequence conservation and can be aligned between distantly related species that are separated by hundreds of millions of years. While ~1.5% of the genome is protein coding, ~5% of the genome has been shown to be directly under stabilizing selection, which covers many conserved functional regulatory elements (Cooper et al. 2005; Lindblad-Toh et al. 2011; Gulko et al. 2015). There are various strategies to identify functional non-coding regulatory regions of the genome. One way is to use evolutionary conservation of DNA sequence across species, as is commonly done for protein-coding genes. Using multiple sequence DNA alignments to find conserved regions of non-coding DNA is called phylogenetic footprinting and was first described by Tagle et al. at the gamma-globin gene HBG1 in primates (Tagle et al. 1988). Using multiple species DNA sequence alignments, Tagle et al. identified “invariant” DNA sequence regions in 5’ UTR, promoter, and intronic regions, which led them to ultimately identify target sites for trans-acting factors that regulate gamma-globin gene expression. When this technique is used between closely related species (such as primates) and an outgroup, it can be used to find DNA sequence that accounts for uniquely primate traits; this is known as phylogenetic shadowing and it has been used to identify many functional regulatory elements that would likely have been missed by human-rodent comparisons alone (Boffelli et al. 2003). Computational methods have since been developed and are widely used to assess the DNA sequence constraint among multiple species (Siepel et al. 2005; Cooper et al. 2010; Gulko et al. 2015). 
These methods take mutation rates of important non-coding regulatory regions into account and assign scores to each base pair to describe the level of functional constraint. These scoring methods have been widely used to recognize and prioritize causal disease variants in exons (Cooper et al. 2010), to compare and contrast regulatory versus protein-coding variants between populations (Vernot et al. 2012), to identify ultra-conserved gene regulatory regions (Pennacchio et al. 2006), and to annotate distinct chromatin states of various epigenomes (Roadmap Epigenomics et al. 2015).


Non-coding regions carry crucial information related to human disease gene regulation

Non-coding regions of the genome are increasingly being associated with complex human disease traits. Genome Wide Association Studies (GWAS) have revealed that more than 80% of single nucleotide polymorphisms (SNPs) associated with human phenotypes and disease are found within non-coding regions, which makes understanding the function of the non-coding genome a priority (Hindorff et al. 2009). Specifically, disease-associated SNPs are enriched in tissue-specific regulatory regions of the appropriate disease cell type (Ernst et al. 2011; Maurano et al. 2012). Despite the importance of non-coding regions, understanding their function remains a challenge. Thus, I am interested in developing methods to uncover functional regulatory regions that are specific to ECs in order to understand more about EC gene regulation in health and disease.

Epigenetic constraint between species reveals biologically relevant pathways and diseases

Although multiple sequence alignments of gene regulatory regions and in silico derived TF binding sites (TFBS) can be used to predict in vivo bound TFBS, the locations of the majority of TF binding sites are species-specific (Schmidt et al. 2010; Ballester et al. 2014; Villar et al. 2014; Yue et al. 2014). Importantly, these species-specific TF binding sites often occur in genomic regions that can be accurately aligned in multiple species (Schmidt et al. 2010; Ballester et al. 2014). Conservation of TF binding in orthologous regions of multiple species can identify functional regulatory elements in the human genome (see Wilson and Odom 2009; Odom 2011; Villar et al. 2014). I refer to in vivo TF binding or histone modifications that occur in orthologous regions of two or more species (defined as a minimum of 1 bp overlap in multiple sequence alignments) as “biochemical conservation” for the remainder of my thesis (Supplementary Table 5). This term refers to the covalent bonds between protein and DNA that occur during the ChIP experiment. Although there are an increasing number of studies comparing human-mouse epigenomes (Heinz et al. 2013; Yue et al. 2014; Gjoneska et al. 2015), there are currently few studies that compare more than 3 vertebrate species (Ballester et al. 2014; Villar et al. 2014). Other species like the worm (Gerstein et al. 2010) and fly (Consortium et al. 2010) have also been instrumental to the current understanding of epigenetic regulation. Studies have reported TF binding conservation ranging from 2-14% in liver across multiple vertebrate species separated by 300 million years (Schmidt et al. 2010). This aspect of gene regulation is particularly interesting, as biochemically conserved regulatory regions enrich for tissue-specific pathways and processes (Loh et al. 2006; Odom et al. 2006; Conboy et al. 2007; Ballester et al. 2014). One of the first multi-species epigenetic comparisons using histone modifications was performed by Xiao et al., who compared H3K27ac, H3K4me3, and H3K36me3 between human, mouse, and pig pluripotent stem cells (Xiao et al. 2012). They observed that approximately 20% of enhancers and promoters are conserved between human and mouse, and between human and pig. Connections to disease variants within conserved regions have also been published; most recently, cross-species strategies that compare differences in histone modifications in mouse models of disease to a collection of normal human tissues revealed that Alzheimer’s disease-associated GWAS SNPs were found in active enhancers of the mouse that altered gene expression during disease progression (Gjoneska et al. 2015). This and other work suggest that performing cross-species epigenomic comparisons can help characterize and contextualize gene regulatory regions. Biochemically conserved TF binding sites are enriched for tissue-specific functions and pathways. For example, our group recently showed that clusters CRMs of four liver-specific TFs that are shared between human, macaque, mouse, rat, and dog were enriched for liver diseases and liver-associated GWAS disease traits (Ballester et al. 2014). These tissue-specific enrichments were not obtained for human-specific CRMs. This supports the hypothesis that biochemical conservation is indicative of regulatory DNA function. However, this hypothesis has not been directly addressed in different tissues. Furthermore, Ballester et al. only looked at liver-enriched transcription factors (HNF4A, CEBPA, ONECUT1 and FOXA1), and it remains to be seen whether conserved occupancy of ubiquitously expressed TFs such as JUN also enriches for tissue-specific function. 
Here we ask whether regulatory regions shared among multiple species control biological pathways and functions specific to ECs.

Cell models from multiple species are commonly used to study vascular biology

ECs obtained from rodents and cattle are often used as models for understanding EC biology. Rat ECs have been used to study hypertension and endothelial dysfunction (Nguyen et al. 2013), vascular remodeling (Fu et al. 2014), and the role of estrogen in cardiovascular health (Ding et al. 2015). Bovine aortic endothelial cells (BAEC) are used to research angiogenesis (Du et al. 2012), hypoxia (Wang et al. 2014), and vascular remodeling via fluid shear stress (Zeng and Tarbell 2014). Using these model organisms has helped to uncover key points of EC gene control. Notably, through the characterization of endothelial cell nitric oxide synthase (NOS), BAECs have been instrumental in the discovery of human cardiovascular disease mechanisms (Nishida et al. 1992). However, unlike the genomic resources for HUVEC, there are few epigenetic resources for ECs from these model organisms.

Thesis Rationale for Chapter 2

I sought to provide an evolutionary interpretation of the human aortic EC epigenomic atlas (Chapter 1) by studying TF binding and epigenetic modifications indicative of active gene regulatory regions in aortic ECs from widely used EC model organisms, rat and bovine. My motivation is that biochemically-conserved regions are likely to be functionally important. This empirical strategy to classify gene regulatory regions has recently been shown to enrich for tissue-specific processes in liver (Ballester et al. 2014). However, since ECs are found in many tissues throughout the body, it was not clear what comparative epigenomics analyses in ECs would reveal. We asked whether biological pathways and known disease mutations are enriched in biochemically-conserved sites. Importantly, we showed that multi-species binding data are essential datasets because neither DNA sequence constraint alone nor ChIP-seq signal alone can accurately predict biochemically-conserved interactions.


Chapter 2 Materials and Methods

Rat Aortic ECs

Two different pools of distinct biological replicates of Sprague-Dawley rat aortic endothelial cells (RAEC, CellBiologics catalogue #R2196) were grown in Endothelial Cell Growth Media (Cell Applications, catalogue #211-500) and cultured at 37 °C in a 5% CO2 humidified incubator. The supplier tested the two lots of 6-week-old pooled male RAECs for DiI-ac-LDL uptake and ZO-1 expression. ZO-1 is a cell surface marker for ECs that controls cell-cell adhesion (Li and Poznansky 1990; Tornavaca et al. 2015). We expanded RAECs in the same media as the HAECs. Cells were checked daily for proper morphology. We split the cells with a 4:1 split ratio for 2 passages, with an estimated 8-10 population doublings (20-hour doubling time). Early passage RAECs (P5) were frozen down for future functional experiments. At the time of harvest, RAECs had proper cobblestone morphology as assessed by visual inspection with a light microscope.

Bovine Aortic ECs

As part of a collaboration with Dr. Jason Fish’s lab, a post-doctoral fellow, Dr. Lan Dang, expanded two different lots of bovine aortic endothelial cells (BAEC, Cell Applications, catalogue #B304-05B) in Endothelial Cell Medium (ScienCell, catalogue #1001). Dr. Dang followed the growing and fixing protocols outlined in the Chapter 1 Materials and Methods section and provided fixed cell pellets for the ChIP-seq experiments.

Experimental Methods

ChIP-seq experiments for RAECs and BAECs were conducted as described in Chapter 1 Materials and Methods using the same antibodies: mouse anti-H3K27ac (Millipore, 05-1334 monoclonal) and rabbit anti-JUN (Santa Cruz Biotechnology, sc1694 polyclonal). The ChIP-seq library preparation and sequencing were conducted as described in Chapter 1. RAEC and BAEC libraries were size selected from 200-350bp using a 2% agarose dye-free automated size selection cassette from Pippin Prep (Sage Sciences, catalogue #CDF2010).

ChIP-seq data alignment and quality control

ChIP-seq library and input reads from RAEC and BAEC were aligned to rn5 [RGSC 5.0] (Shimoyama et al. 2015) and btau6 [Bos_taurus_UMD_3.1] (Bovine Genome et al. 2009), respectively, with the Burrows-Wheeler Aligner (BWA) using default parameters (Li and Durbin 2009). Sequencing reads were automatically trimmed based on the quality of the bases called at each position (100-bp reads). We aligned sequencing reads to the same rat and bovine genome assemblies that were in the 13-way eutherian mammals Enredo-Pecan-Ortheus multiple sequence alignment (Paten et al. 2008). We used the same quality control metrics as in Chapter 1 to describe the quality of our data. ENCODE guidelines suggest a PCR bottleneck coefficient greater than 0.5, NSC values greater than 1.05, and RSC values greater than 0.8. All of the RAEC and BAEC libraries reported here passed the Landt et al. criteria (Supplementary Table 6 and Supplementary Table 7). We called peaks for each sample relative to the WCE input using default settings in MACS2 with a false-discovery rate threshold (q ≤ 0.05). As in Chapter 1 Methods, narrow peaks were stitched to form broad peaks using the broad option and default parameters in MACS2 for H3K27ac libraries. We performed the Irreproducible Discovery Rate (IDR) analysis described by Landt et al., shown in Supplementary Table 7. The majority of peaks called in each biological replicate are consistent between both replicates. Based on the quality control measures taken above, we merged sequencing reads from biological replicates, called peaks, and used merged peaks for all downstream analyses.
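The library-level quality thresholds quoted above can be expressed as a simple check. The sketch below is only an illustration: the thresholds come from the text, but the metric values in the usage note are hypothetical and are not taken from Supplementary Tables 6 or 7, and the function names are my own.

```python
# Minimum values quoted in the text; a library must exceed each of them.
QC_MINIMUMS = {
    "PBC": 0.5,   # PCR bottleneck coefficient
    "NSC": 1.05,  # normalized strand cross-correlation coefficient
    "RSC": 0.8,   # relative strand cross-correlation coefficient
}


def failed_qc_metrics(metrics):
    """Return the names of metrics that do not exceed their minimum value."""
    return [
        name
        for name, minimum in QC_MINIMUMS.items()
        if metrics[name] <= minimum
    ]


def passes_qc(metrics):
    """True if PBC, NSC, and RSC are all greater than their thresholds."""
    return not failed_qc_metrics(metrics)
```

For example, a hypothetical library with PBC = 0.9, NSC = 1.2, and RSC = 0.7 would be flagged on RSC alone.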

Cross-species overlap of enhancers and JUN binding sites

We defined biochemically-conserved JUN and H3K27ac sites from the perspective of the human genome. Our definition of a biochemically-conserved peak is a human peak that has a peak in the orthologous location of the rat or bovine genome that overlaps by at least one base pair. Orthologous regions are retrieved from a 13-way eutherian mammals Enredo-Pecan-Ortheus (EPO) (Paten et al. 2008) multiple sequence alignment (MSA) from the Ensembl Compara database, which includes genomic assemblies with comprehensively annotated features (Flicek et al. 2013). We used the Ensembl Compara API to query the orthologs in the MSA for rat and bovine. When more than one multiple sequence alignment was present for a given human region (as would happen for paralogous regions present in rat or bovine), we used the longest alignment block in the EPO MSA to assess biochemical conservation. Approximately 8.5% of human H3K27ac and 4.4% of JUN peaks fell in this category. While this decision allowed us to identify additional biochemically conserved regions, to properly understand biochemical conservation in duplicated regions we would need to closely inspect and redo parts of these MSAs, which is beyond the scope of this thesis project. H3K27ac-centric conservation is defined as a human H3K27ac peak that has an H3K27ac peak in the rat and/or bovine MSA, and JUN-centric conservation is defined as a human JUN peak that has a JUN peak in the rat and/or bovine MSA (Supplementary Table 5).
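The definition of biochemical conservation given above (at least 1 bp of overlap between the orthologous location of a human peak and a peak called in the other species) can be sketched in Python as follows. The sketch assumes that the orthologous coordinates have already been retrieved from the EPO alignment; that retrieval step is not shown, and the function names and data layout are hypothetical.

```python
def overlap_bp(a, b):
    """Shared base pairs between two half-open intervals (chrom, start, end)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def has_orthologous_peak(orthologous_regions, species_peaks, min_overlap=1):
    """True if any orthologous region overlaps any peak by >= min_overlap bp.

    orthologous_regions: regions in the other species' coordinates that align
        to one human peak (assumed to come from the multiple sequence alignment)
    species_peaks: peaks called in that species for the same factor or mark
    """
    return any(
        overlap_bp(region, peak) >= min_overlap
        for region in orthologous_regions
        for peak in species_peaks
    )


def conservation_status(orthologs_by_species, peaks_by_species):
    """Per-species conservation calls for one human peak, plus an 'any' call."""
    calls = {
        species: has_orthologous_peak(regions, peaks_by_species.get(species, []))
        for species, regions in orthologs_by_species.items()
    }
    calls["any"] = any(calls.values())
    return calls
```

A human peak would then be counted as conserved with "rat and/or bovine" when the `any` entry is true, and as conserved in all three species when both the rat and the bovine entries are true.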


Chapter 2 Results

2.1 A minority of human aortic EC H3K27ac and JUN binding sites are shared with rat and bovine aortic ECs

We determined biochemical conservation of regulatory regions between human, rat, and bovine ECs. We conducted H3K27ac and JUN ChIP-seq experiments with rat and bovine aortic ECs and called peaks in each experiment (for access to data see page 15). We identified 48,289 BAEC and 47,393 RAEC H3K27ac peaks, and 4,964 BAEC and 16,147 RAEC JUN peaks (Supplementary Table 7). We highlight a representative example of our 3-species ChIP-seq data at the JUN locus (Figure 9). Here JUN binds within an H3K27ac-enriched region at its proximal promoter in all three species. Indeed, JUN has long been known to engage in positive auto-regulatory feedback (Angel et al. 1988). Across our entire data set we found that 33.6% (N=19,643) of human H3K27ac peaks and 5.8% (N=3,687) of human JUN peaks are conserved with rat and/or bovine (Figure 10). The percentage of TF-centric conservation is similar to previous comparative genomics findings (Schmidt et al. 2010; Ballester et al. 2014; Villar et al. 2014). We also determined if HAEC peaks were biochemically conserved with rat and bovine separately (Supplementary Figure 3, Panel A and B), as well as H3K27ac and JUN sites conserved across all three species (Supplementary Figure 3, Panel C). We found that 13.2% (N=7,749) of H3K27ac sites and 0.83% (N=539) of JUN sites are conserved with both rat and bovine. Overall, we identified that a minority of human aortic EC active enhancers and JUN sites share orthologous binding in rat and bovine aortic ECs.

Figure 9: There is strong JUN binding within an active enhancer marked by H3K27ac at the JUN locus in rat, bovine, and human aortic ECs. Figure 9 Legend: UCSC screenshot of rat, bovine, and human aortic ECs showing H3K27ac, JUN, and input library signal in native genomes. Sequencing read counts corresponding to the highest summit are shown on the y-axis and a genomic window of approximately 9.5kb is shown on the x-axis. Nucleosome-depleted regions within H3K27ac-marked active enhancers have strong JUN ChIP-seq signal. This example represents conserved human H3K27ac and JUN sites that have a rat and bovine site in the orthologous genomic loci.

Figure 10: Few human H3K27ac and JUN peaks are biochemically conserved in other species. Figure 10 Legend: A schematic of multiple sequence alignment. Black region shows location of a conserved peak (purple ChIP-seq signal for H3K27ac and green ChIP-seq signal for JUN) in a simplified MSA of human, rat, and bovine. Dark grey regions denote alignable DNA in the MSA, and light grey regions show unalignable regions. Strict conservation is TF-centric, where a human JUN peak has a corresponding rat and/or bovine JUN peak overlapping by at least 1 base pair. Grey numbers above/below each species represent original number of peaks in library. The white numbers in blue boxes denote integrated conservation data showing number of human peaks that are conserved in either rat and/or bovine.

2.2 Determining the distribution of biochemically-conserved sites in EC-private and pan-tissue regulatory regions

We asked if HAEC biochemically conserved JUN and H3K27ac sites were preferentially found in “EC-private” or “pan-tissue” regulatory DNA regions. In Chapter 1, we defined H3K27ac or JUN sites as being “EC-private” if they had significantly different ChIP-seq signal compared to a non-EC cell type, HepG2 (false-discovery rate ≤ 0.05, N=12,747 for H3K27ac, N=6,549 for JUN). “Pan-tissue” sites were considered as any H3K27ac or JUN EC peak that did not have significantly different ChIP-seq signal compared to HepG2 (N=51,742 for H3K27ac and N=71,860 for JUN). We found that EC-private regions were highly enriched for biological pathways related to ECs, while pan-tissue regions were enriched for cellular processes (see page 34). To test for a preference, we created a contingency table categorizing HAEC peaks in EC-private and pan-tissue regions by whether they were conserved with rat and/or bovine (Figure 11). We observe a 1.9-fold enrichment of conserved JUN sites in EC-private over pan-tissue regions (Bonferroni adjusted p-value = 2.9e-32). In contrast, there is a 1.3-fold enrichment of conserved H3K27ac sites in pan-tissue regions, meaning that conserved enhancers are more likely to fall in regions shared between HAEC, HUVEC, and HepG2 than in regions that are private to ECs (Bonferroni adjusted p-value = 7.0e-86). Our findings suggest that biochemically-conserved enhancers can have binding sites for different sets of tissue-specific TFs, while biochemically-conserved JUN sites preferentially regulate functions that are private to ECs.

Figure 11: Conserved JUN sites are more likely to be private to ECs, and conserved enhancers are preferentially found in pan-tissue sites. Figure 11 Legend: Differential binding analysis of active enhancer and JUN marks between HAEC, HUVEC, and HepG2 was conducted by comparing signal intensity normalized to respective input (q-value ≤ 0.05). Sites were categorized as EC-private or pan-tissue. These peaks were further categorized by their conservation status between human and rat and/or bovine for JUN sites (left) and H3K27ac sites (right). Fisher’s exact test was conducted on each contingency table to determine whether the distribution of peaks across the four categories (EC-private or pan-tissue, conserved or not conserved) departed from independence. The Bonferroni-adjusted p-values are shown under each contingency table and represent the probability of observing a distribution at least this extreme if the EC-private or pan-tissue status of a HAEC peak were independent of its biochemical conservation status. Conserved JUN sites are enriched in EC-private regions, while conserved H3K27ac sites are enriched in pan-tissue regions.
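The contingency-table test described in the Figure 11 legend can be sketched as follows. This is a standard-library illustration of a two-sided Fisher's exact test with Bonferroni adjustment, not the code used in the thesis. The cell counts are made up (the real counts are in Figure 11), and the number of tests used for the correction is an assumption.

```python
from math import exp, lgamma


def _log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    lo, hi = max(0, col1 - row2), min(row1, col1)
    log_probs = {
        x: _log_comb(row1, x) + _log_comb(row2, col1 - x) - _log_comb(n, col1)
        for x in range(lo, hi + 1)
    }
    log_obs = log_probs[a]
    p = sum(exp(lp) for lp in log_probs.values() if lp <= log_obs + 1e-7)
    return min(1.0, p)


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    return min(1.0, p * n_tests)


# Hypothetical counts: rows = EC-private / pan-tissue,
# columns = conserved / not conserved.
table = (60, 340, 150, 1850)
p_raw = fisher_exact_two_sided(*table)
p_adj = bonferroni(p_raw, n_tests=2)  # assumed: one test each for JUN and H3K27ac
print(p_raw, p_adj)
```

For routine use, `scipy.stats.fisher_exact` provides the same test; the hand-written version is shown only to keep the sketch free of dependencies.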

2.3 Using H3K27ac as a baseline for identifying biochemically conserved TF binding sites

Comparing ChIP-seq data for JUN and H3K27ac between species provides the opportunity to identify functionally relevant protein-DNA interactions in orthologous regions. However, there is inherent difficulty in finding antibodies that can efficiently ChIP TFs in multiple species. Our JUN antibody is one exception, yet even here we obtain different numbers of JUN peaks in each species (Figure 10). It is not clear whether this difference is technical or biological (see Discussion). Given that the H3K27ac libraries produced similar numbers of peaks in all three species, and that the majority of JUN sites that are conserved between human, rat, and bovine fall within pan-tissue enhancers marked by H3K27ac, we asked which CRMs would be recovered by using H3K27ac as the comparator in multi-species comparisons. This approach is advantageous since the regions are broader and cover potential CRMs, and robust data for H3K27ac across species is easier to attain. We refer to this strategy as “lenient” JUN conservation, in contrast to the “strict” JUN conservation based on JUN binding in the other species (Panel B of Figure 10; Supplementary Table 5). We defined JUN peaks in human that have an H3K27ac peak in the rat and/or bovine MSA that overlaps by at least one base pair as “lenient” conserved JUN sites (Figure 12). We find that 27.7% (N=17,863) of human JUN peaks are leniently conserved with rat and/or bovine. Ten percent of JUN peaks are leniently conserved with both rat and bovine (N=6,376) (Supplementary Figure 4). The lenient category of biochemical conservation recovered 2,340 JUN sites that were not originally in H3K27ac peaks in human; that is, JUN sites that fall within an active enhancer in the orthologous location of rat and/or bovine but were not within an active enhancer in human aortic ECs. As expected, almost all (95%) strictly conserved JUN sites are within the leniently conserved JUN set. Thus, we find that more than a quarter of human JUN binding events fall within an evolutionarily conserved enhancer.
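The distinction between strict and lenient JUN conservation can be summarized in a short sketch. It assumes the per-species overlap calls have already been made for each human JUN peak; the `HumanJunPeak` record and the toy peaks are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set, Tuple


@dataclass
class HumanJunPeak:
    peak_id: str
    # Species (rat and/or bovine) whose orthologous region carries a JUN peak.
    jun_species: Set[str] = field(default_factory=set)
    # Species whose orthologous region carries an H3K27ac peak.
    h3k27ac_species: Set[str] = field(default_factory=set)


def conservation_sets(peaks: Iterable[HumanJunPeak]) -> Tuple[Set[str], Set[str]]:
    """Return (strict, lenient) sets of peak IDs.

    strict:  orthologous JUN peak in at least one other species
    lenient: orthologous H3K27ac peak in at least one other species
    """
    peaks = list(peaks)
    strict = {p.peak_id for p in peaks if p.jun_species}
    lenient = {p.peak_id for p in peaks if p.h3k27ac_species}
    return strict, lenient


# Toy data (made up) covering the four possible outcomes.
toy_peaks = [
    HumanJunPeak("p1", {"rat"}, {"rat", "bovine"}),  # strict and lenient
    HumanJunPeak("p2", set(), {"bovine"}),           # lenient only
    HumanJunPeak("p3", {"bovine"}, set()),           # strict only
    HumanJunPeak("p4"),                              # not conserved
]
strict, lenient = conservation_sets(toy_peaks)
fraction_strict_in_lenient = len(strict & lenient) / len(strict)
print(sorted(strict), sorted(lenient), fraction_strict_in_lenient)
```

The last quantity corresponds to the kind of summary reported in the text (the share of strictly conserved sites that are also leniently conserved), here computed on toy data only.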

Figure 12: Defining leniently conserved JUN sites between multiple species. Figure 12 Legend: Lenient conservation is defined as a human JUN site that overlaps an active enhancer mark in the orthologous region of either rat and/or bovine aortic ECs by at least one base pair. Black region shows location of a leniently conserved JUN peak (green ChIP-seq signal for JUN and blue ChIP-seq signal for H3K27ac) in a simplified MSA of human, rat, and bovine. Dark grey region denotes alignable DNA in the MSA, and light grey regions show unalignable regions. Grey numbers under each species represent original number of peaks in library. White number in blue box shows the number of leniently conserved human JUN peaks.

To determine which co-factors of JUN have overrepresented binding motifs within biochemically conserved sites, we used the peak-motifs tool in RSAT (Thomas-Chollier et al. 2012). Interestingly, we identified binding motifs for KLF5 and the ETS family member FEV as overrepresented only in leniently conserved JUN sites and conserved enhancers (Figure 13). We highlighted KLF5 as a potential co-regulator with JUN in Chapter 1 (Table 1). These motifs were not recovered in strictly conserved sites, perhaps because of the low number of peaks categorized as strictly conserved. Other overrepresented TF binding motifs in conserved sites included known regulators of EC gene expression: 1) the AP-1 family members FOSL1, JUNB, and JUND, 2) SP1 and SP2, and 3) EGR1.

Figure 13: AP-1 monomers have overrepresented transcription factor binding motifs in HAEC peaks conserved with either rat and/or bovine. Figure 13 Legend: The RSAT tool peak-motifs was used to mine overrepresented transcription factor binding motifs in biochemically-conserved regions using the whole genome as a background model. De novo discovered motifs were matched to the JASPAR database of motifs, which are displayed for each category of biochemical conservation.

The top biological pathway enrichments for biochemically-conserved H3K27ac and JUN sites (both strict and lenient) are related to the transforming growth factor beta receptor (TGFBR) signaling pathway and cell-substrate junction assembly (Figure 14). These categories are highly enriched in conserved H3K27ac sites (binomial FDR q-value = 6.31e-47, fold enrichment = 2.01 for TGFBR signaling, and binomial FDR q-value = 3.01e-40, fold enrichment = 3.04 for cell-substrate junction assembly), strictly conserved JUN sites (binomial FDR q-value = 2.17e-26, fold enrichment = 2.91 for TGFBR signaling, and binomial FDR q-value = 9.40e-20, fold enrichment = 4.68 for cell-substrate junction assembly), and leniently conserved JUN sites (binomial FDR q-value = 7.70e-63, fold enrichment = 2.26 for TGFBR signaling, and binomial FDR q-value = 3.15e-43, fold enrichment = 3.25 for cell-substrate junction assembly). Since conserved H3K27ac and leniently conserved JUN sites cover approximately one-third of the datasets, categories like angiogenesis and response to reactive oxygen species, which are enriched for strictly conserved JUN sites, have small q-values but are not enriched over the background of the whole genome. Thus, we have shown that biologically relevant pathways are enriched in “leniently conserved” sites by defining the ChIP-seq binding landscape of a TF of interest in human and using H3K27ac as a proxy for active enhancers in additional species. This idea could be a novel strategy that would be particularly useful in the field of comparative genomics, where multi-species TF data is difficult to attain. For example, enhancer sites for liver have recently been published for 20 mammalian species (Villar et al. 2015), but a similar experiment for TFs would be resource intensive and require large amounts of starting material. 
A way to test this new strategy could be to use TF binding data in more than three species (for example, in liver) and determine the number of strictly conserved sites that were predicted to be leniently conserved. We propose that this approach can be used to 1) study regulatory elements that bind different combinations of TFs in multiple species and 2) circumvent variation in multi-species TF ChIP-seq data.

Figure 14: Conserved regulatory regions enrich for tissue-specific biological pathways relevant to ECs.

Figure 14 Legend: Pathway enrichments for biochemically conserved H3K27ac sites, strictly and leniently conserved JUN sites are shown for GO Biological Process and Mouse Phenotype databases. The top 5 enrichments (y-axis) for each category of conservation (conserved H3K27ac sites and strictly/leniently conserved JUN sites), if available, are shown for each database and the corresponding enrichments are shown across the x-axis for all three categories of conservation. Entries are ranked by –log10 binomial FDR q-values (x-axis). Significant enrichments are labeled with an asterisk relative to the binomial fold enrichment of the gene set.

2.4 Biochemically conserved regulatory regions harbour many known human regulatory mutations

Now that we have defined biochemical conservation for H3K27ac and JUN, we wanted to further characterize these conserved regions in the context of human disease mutations. We searched the Human Gene Mutation Database (HGMD; see page 30) and generated contingency tables categorizing conserved and non-conserved peaks by whether they overlapped a “regulatory” HGMD variant (Figure 15). We asked if regulatory disease variants were preferentially found in conserved over non-conserved H3K27ac or JUN sites. We found 5.2 times as many regulatory mutations in conserved as in non-conserved H3K27ac sites (Bonferroni p-value = 1.1e-48); nearly 5.9 times as many in strictly conserved as in non-conserved JUN sites (Bonferroni p-value = 1.9e-13); and 7.2 times as many in leniently conserved as in non-conserved JUN sites (Bonferroni p-value = 3.4e-25).

Figure 15: HGMD regulatory variant enrichment analysis for biochemically conserved H3K27ac and JUN peaks. Figure 15 Legend: Contingency tables showing conservation status of human peaks, either shared with rat and/or bovine or not shared, and the number of peaks that have a regulatory HGMD variant for H3K27ac (left), strictly conserved JUN sites (middle), and leniently conserved JUN sites (right). The fold enrichment value is calculated as the proportion of conserved peaks with HGMD variants divided by the proportion of non-conserved peaks with HGMD variants. Fisher’s exact test was performed on these contingency tables to test whether shared sites were enriched for HGMD variants, with the Bonferroni-corrected p-values listed at the bottom of each table. There are more variants from the regulatory subset of HGMD found in conserved sites than non-conserved sites.
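The fold enrichment defined in the Figure 15 legend is a ratio of two proportions. A minimal sketch of that arithmetic follows; the function name and the counts are hypothetical, since the actual cell counts appear only in the figure.

```python
def fold_enrichment(conserved_with_variant: int, conserved_total: int,
                    nonconserved_with_variant: int, nonconserved_total: int) -> float:
    """Proportion of conserved peaks with a variant divided by the
    proportion of non-conserved peaks with a variant."""
    if conserved_total <= 0 or nonconserved_total <= 0:
        raise ValueError("peak totals must be positive")
    if nonconserved_with_variant == 0:
        raise ValueError("fold enrichment is undefined when no non-conserved peak has a variant")
    conserved_prop = conserved_with_variant / conserved_total
    nonconserved_prop = nonconserved_with_variant / nonconserved_total
    return conserved_prop / nonconserved_prop


# Made-up counts: 40 of 160 conserved peaks and 20 of 160 non-conserved
# peaks carry a regulatory variant.
example_fold = fold_enrichment(40, 160, 20, 160)
print(example_fold)
```

Because the two groups of peaks differ greatly in size, comparing proportions rather than raw counts keeps the larger non-conserved set from dominating the comparison.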

To investigate relationships between the genes annotated to HGMD variants that fall in leniently conserved JUN sites, we used GeneMania to generate a gene network (Figure 16) (Warde-Farley and Donaldson 2010). The 89 leniently conserved JUN sites contained the 32 strictly conserved JUN sites (Figure 15). Functional pathways enriched in this network include the EC-related pathways nitric oxide metabolism (Hypergeometric q-value = 1.24e-5), angiogenesis (Hypergeometric q-value = 6.31e-4), and regulation of chemotaxis (Hypergeometric q-value = 8.51e-6). Notably, VEGFA, TGFBR2, CCL2, and F3 have regulatory variants in conserved JUN peaks and are part of the angiogenesis pathway. Knowing that NOS3 shares pathways with JUNB, HSP90AB1, and CALM1, and that NOS3 has physical interactions with PSEN1 and CALM1, provides more evidence to prioritize the functional dissection of this specific locus. Interestingly, these NOS3 mutations were also highlighted in Chapter 1 as being in HAEC-only regions. We have now shown that they fall within conserved JUN sites and belong to the nitric oxide metabolism pathway with other query genes like HSP90AB1 and CALM1. Since HGMD has relatively few regulatory variants and most of these variants are annotated close to the proximal promoter of known genes, we suggest that many more regulatory regions that affect EC gene regulation remain to be discovered.

Figure 16: Network of genes annotated to regulatory HGMD variants that overlap leniently conserved JUN sites.

Figure 16 Legend: GeneMania was used to draw a network of genes and highlight functional pathways that connect each leniently conserved JUN site with a regulatory HGMD variant. There are 89 leniently conserved JUN sites with 86 uniquely annotated HGMD genes, 81 of which are protein coding. Functional pathway enrichments between 81 unique genes annotated to HGMD variants are shown. Black nodes represent queried genes that are not enriched for functional pathways, and colored nodes represent genes in pathways indicated in the functions legend. Black nodes with red outlines represent genes annotated to HGMD variants that overlapped a strictly conserved JUN site. Grey nodes are genes that connect query genes through pathway, physical, or predicted interactions, shown by different colours of edges as denoted by the networks legend. FDR values for functional pathways are shown beside each category.

2.5 Regulatory mutations can alter transcription factor binding sites and cause disease

Discovering which transcription factor binding sites are affected by non-coding regulatory DNA mutations is of great interest in human genetics (Funnell et al. 2013; Farh et al. 2014). Knowing which TFs are affected by a regulatory DNA mutation can reveal new ways that disease genes are controlled. As our JUN ChIP-seq peaks were on average ~200 bp, it is quite likely that most regulatory mutations overlapping JUN peaks did not directly affect JUN binding. However, regulatory mutations predicted to cause a loss or gain of a JUN motif within our biochemically conserved JUN or H3K27ac peaks represent relevant candidates that should be prioritized for further functional testing. We looked for potential examples where known regulatory disease variants altered AP-1 binding by using JUN position weight matrices from HAEC, HUVEC, BAEC, and RAEC peaks, as well as conserved HAEC peaks. Using the RSAT tool variation-scan, which scans peak sequences bearing variants in order to predict whether binding motifs may be affected by the variant (see www.rsat.eu and Medina-Rivera et al. 2015), we predicted that 17 of the 89 regulatory variants in leniently conserved JUN sites alter a JUN binding motif. Interestingly, our top prediction of a JUN-altering variant was in the promoter region of IL6, a gene encoding a pro- and anti-inflammatory cytokine that regulates the inflammatory response (Figure 17) (Scheller et al. 2011). This particular variant, -278A>C, is associated with systemic lupus erythematosus in Korean populations and has been shown to increase expression of IL6 in vitro (Jeon et al. 2010). This increase in expression of IL6 can cause an imbalance of the anti- and pro-inflammatory response, though more functional work needs to be conducted to identify the causal relationship between the variant and systemic lupus erythematosus. 
Thus, using variation-scan within our JUN peak regions, we were able to predict an example of a known mechanism that causes misregulation of IL6 through JUN binding motif alteration (Grassl et al. 1999; Faggioli et al. 2004). Furthermore, we predicted a novel JUN binding motif alteration within an intronic region of RAD51B (Figure 17), a gene that encodes a protein that assists in the repair of DNA double strand breaks and is associated with male breast cancer (Orr et al. 2012) and poor cardiovascular disease outcomes (Larson et al. 2007). Loss of a JUN site could potentially reduce RAD51B production and affect DNA double strand break repair. This variant is within a strictly conserved pan-tissue JUN site. Although Orr et al. have already suggested that this variant may lead to loss of a JUN binding site based on DNA sequence similarity to the JUN motif (Orr et al. 2012), these predictions have yet to be functionally validated. Unlike evidence gained from GWAS, finding these regulatory regions bound in ECs from multiple species supports ECs as a relevant cell type in which to further investigate the function of these regulatory regions through reporter gene assays and genome editing.

Figure 17: Regulatory variants alter JUN binding motifs in the promoter region of IL6 and an intronic region of RAD51B. Figure 17 Legend: Two examples of predictions made by variation-scan of HGMD variants. Both involve serious alterations to a JUN binding motif. Reference and alternative alleles are presented above the JUN position weight matrix. In the IL6 locus, the alternative allele C disrupts a highly favoured A position in the JUN matrix. The alternative T allele in the intronic region of RAD51B disrupts the highly favoured C position in the JUN matrix. The p-value fold change is shown under each motif and is the ratio of the probability of JUN binding at the alternative allele to the probability of JUN binding at the reference allele.
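The idea behind comparing a reference and an alternative allele against a position weight matrix can be sketched as below. This is not the RSAT variation-scan implementation: variation-scan reports a ratio of p-values, whereas this simplified illustration compares log-odds scores. The frequency matrix is a made-up AP-1-like matrix (core TGA[C/G]TCA), the uniform background is an assumption, only the forward strand is scanned, and the sequences are toy examples rather than the IL6 or RAD51B loci.

```python
import math

# Hypothetical position frequency matrix for an AP-1-like motif,
# one dict per motif position. Not the matrix derived in the thesis.
PFM = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.45, "G": 0.45, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]
BACKGROUND = 0.25  # assumed uniform base composition


def log_odds(window: str) -> float:
    """Log2 odds of a motif-length window under the PFM vs. background."""
    return sum(math.log2(PFM[i][base] / BACKGROUND) for i, base in enumerate(window))


def best_window_score(seq: str) -> float:
    """Highest-scoring motif-length window on the forward strand."""
    width = len(PFM)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(log_odds(seq[i:i + width]) for i in range(len(seq) - width + 1))


def variant_effect(ref_seq: str, alt_seq: str) -> float:
    """Alternative minus reference best score; negative suggests motif loss."""
    return best_window_score(alt_seq) - best_window_score(ref_seq)


# Toy sequences: the alternative allele replaces the final A of the motif with C.
ref = "GGTGACTCAGG"
alt = "GGTGACTCCGG"
effect = variant_effect(ref, alt)
print(effect)
```

A fuller treatment would also scan the reverse complement and convert scores to p-values against a background model, which is closer to what the text describes.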

2.6 Many regulatory variants in conserved TF-bound sites are tissue specific

To determine if known human variants are preferentially found in conserved sites in a tissue-dependent manner, we compared TF binding in aortic ECs and liver. We asked if the regulatory variants that are found in TF binding sites are specific to each cell type. As an example, we used previously published data from Ballester et al. (2014), where the binding profile of HNF4a, a liver-enriched TF, is available for human and rat. Since binding data was limited to human and rat, we defined strict JUN conservation in aortic ECs as human peaks that share a peak with rat (Panel A in Supplementary Figure 3) in order to make an analogous phylogenetic TF comparison. We generated contingency tables testing the distribution of regulatory HGMD variants in strictly conserved HNF4a and JUN sites and non-conserved sites (Figure 18). We wanted to know how similar the regulatory variants identified in TF-bound sites of aortic ECs and liver are to each other. That is, could we be recovering variants coincidentally based on the annotations within HGMD? We took the genes that were annotated to the HGMD variants and created a Venn diagram comparing variants that fall in HAEC JUN peaks versus liver HNF4a peaks (Figure 18). We see that the majority of the variants are specific to each cell type, with 19 (73%) HGMD genes unique to EC and 80 (92%) HGMD genes unique to liver. The 7 HGMD genes shared between the two categories include genes that encode enzymes for common cellular functions, such as POR (an oxidoreductase), PTEN (a phosphatase), and MDM2 (an E3 ubiquitin ligase). Interestingly, TGFBR2, which regulates both normal cardiac development (Robson et al. 2010) and hepatocarcinogenesis (Morris et al. 2012), is also shared between the two categories. Thus, we observe that the majority of genes annotated to regulatory variants in conserved TF sites are specific to each cell type.

Figure 18: Known human disease variants enrich for conserved TF binding sites in a cell type-dependent manner. Figure 18 Legend: A contingency table evaluating the distribution of regulatory HGMD variants in conserved versus non-conserved sites is shown. We used our JUN ChIP-seq data in human and rat ECs and compared it to previously published HNF4a ChIP-seq data in human and rat liver. The number of regions conserved between human and rat are shown. The fold enrichment values are calculated as the proportion of conserved peaks with regulatory variants divided by the proportion of non-conserved peaks with regulatory variants. There are more regulatory variants in conserved sites than non-conserved sites, with Bonferroni adjusted p-values displayed. A Venn diagram of genes annotated to regulatory HGMD variants in JUN and HNF4a sites is shown. The majority of sites are unique to each cell type.

We then asked if the variants that overlapped conserved aortic EC active enhancers were the same as those found in conserved active enhancers in liver. Recently, Villar et al. published H3K27ac ChIP-seq experiments for primary human, rat, and bovine liver tissue (Villar et al. 2015). We downloaded their raw data and processed sequencing reads in the same manner as described in Materials and Methods (see pages 15 and 43). We defined conserved human liver peaks as H3K27ac sites that had an orthologous H3K27ac peak in either rat or bovine liver. As observed for aortic ECs, liver H3K27ac sites that are biochemically conserved between human and rat were enriched for HGMD regulatory variants (Figure 19, Panel A, Bonferroni p-value = 1.3e-39, fold enrichment = 3.0). The majority of EC HGMD genes (230/252, 91%) are contained within the biochemically-conserved liver H3K27ac dataset (Figure 19, Panel B). The categories that are enriched in EC-specific HGMD variants are related to cellular response to chemical stimulus (hypergeometric p-value = 7.3e-3) and response to endogenous stimulus (hypergeometric p-value = 9.7e-3) (Supplementary Table 9, Section A). The 230 common H3K27ac HGMD variants enrich for response to stress (hypergeometric p-value = 4.5e-32) and blood circulation (hypergeometric p-value = 6.4e-8) (Supplementary Table 9, Section B). Liver-specific HGMD variants fall near genes that regulate response to organic substances (hypergeometric p-value = 6.9e-56) and the glucose metabolism pathway (hypergeometric p-value = 8.3e-6) (Supplementary Table 9, Section C). Using simple binary overlap, we show that the majority of variants that fall within H3K27ac sites of aortic ECs are shared in liver, while the majority of variants that fall in conserved JUN sites do not overlap with HNF4a sites in liver. 
However, one caveat is that since the liver dataset was generated from primary tissue and the liver contains sinusoidal ECs (Friedman 2012), we could be detecting EC signatures. Furthermore, many of the H3K27ac regions could be differentially bound between liver and ECs. Even in common H3K27ac regions, different sets of tissue-enriched TFs could bind and result in tissue-specific gene expression patterns (e.g. Kim et al. 2014).

Figure 19: The majority of conserved H3K27ac aortic EC and liver sites enrich for the same genes associated to HGMD variants. Figure 19 Legend: Panel A: A contingency table evaluating the distribution of regulatory HGMD variants in conserved versus non-conserved sites of liver from previously published H3K27ac ChIP-seq data in human, rat, and bovine liver (Villar et al. 2015). The fold enrichment values are calculated as the proportion of conserved peaks with regulatory variants divided by the proportion of non-conserved peaks with regulatory variants. There are more regulatory variants in conserved liver H3K27ac sites than non-conserved sites, with Bonferroni adjusted p-values displayed. Panel B: A Venn diagram comparing genes annotated to regulatory HGMD variants in EC and liver H3K27ac sites. The majority of genes annotated to regulatory variants in aortic EC are shared between the two cell types.

2.7 Conserved H3K27ac regions are functionally active in blood vessels of mouse embryos

In order to evaluate functionality of biochemically conserved H3K27ac and JUN sites in HAECs, we asked whether any of them serve as in vivo enhancers in a mouse model. To do this we interrogated publicly available databases of in vivo enhancer activity assayed in mice (Figure 20, Panel A). Visel et al. tested over 2000 evolutionarily conserved enhancer elements in a transgenic mouse assay to validate in vivo enhancer function and pattern of activity (Visel et al. 2007). They defined a positive enhancer as one that showed reproducible expression in the same structure in at least three independent biological replicates, and a negative enhancer as one that was tested with at least five transgenic embryos yet no reproducible expression was observed in any structure in at least three different embryos. We downloaded the most-updated version of this resource (VISTA http://enhancer.lbl.gov) and retrieved the enhancers mapping to the human genome (N=1,740). We then asked if any H3K27ac or JUN sites identified in our study overlapped these tested enhancers. In total, 186 VISTA enhancers contained the full sequence of an H3K27ac and/or JUN HAEC peak. We were curious to know whether VISTA enhancers that covered conserved peaks had positive functional activity, and if so, how many were expressed in blood vessels. We formulated three contingency tables to test if biochemically conserved H3K27ac and JUN sites were more likely to be positively expressed (Supplementary Figure 5). We found that the conservation status of the peak contained in the VISTA enhancer was not indicative of positive enhancer activity, since there are as many positive VISTA enhancers as negative enhancers that are conserved in E11.5 embryos (p-values = 0.624, 0.712, 0.742 for conserved H3K27ac, strictly conserved JUN, and leniently conserved JUN contingency tables, respectively). 
Thus, conserved sites defined by our study do not preferentially have positive enhancer activity in the in vivo transgenic mouse assay over non-conserved sites. This could be due to the fact that a very small number of enhancers identified by this thesis were tested, and in order to confidently categorize positive enhancer activity by conservation status, more enhancers should be functionally validated. It is important to note that the enhancers selected to be assayed in the VISTA database show high DNA constraint (not biochemical conservation) and thus are potentially biased for controlling early developmental gene expression. We filtered the 186 VISTA enhancers by status of biochemical conservation and found that 96 contained conserved peaks (either H3K27ac, strictly conserved JUN, and/or leniently conserved JUN), and 91 contained non-conserved peaks (Figure 20, Panel B). Approximately half of the VISTA enhancers (conserved or not conserved) had positive expression. From the 56 positively expressed enhancers, 52 contained both a conserved JUN and a conserved H3K27ac peak, 3 contained a conserved H3K27ac peak, and 1 contained a conserved JUN peak. From the full VISTA catalogue, 9 candidate enhancers had positive expression in blood vessels, 7 of which were identified by either JUN or H3K27ac ChIP experiments in HAECs. Six of the 7 enhancers that had positive expression in blood vessels contained a conserved peak from our datasets, and 1 of the 7 had a non-conserved H3K27ac peak. To test if this distribution was significant, we calculated the hypergeometric statistic for the probability of observing 6/9 biochemically conserved peaks with positive enhancer activity in blood vessels given 95 opportunities, and observed a small p-value (1.67e-6). The 6 VISTA enhancers that have conserved sites are annotated to PHF7-SEMA3G, MARCH8, BCL2L1, BCAN-NES, HHEX-EXOC6, and ZNF521. 
The SEMA3G enhancer controls expression of a gene that produces semaphorin, a secreted signaling molecule that has classically been studied as an axon guidance molecule. However, recently SEMA3G was shown to be a key regulator of angiogenesis (Kutschera et al. 2011; Aranguren et al. 2013). Furthermore, using a different transgenic mouse assay, Kutschera et al. showed that SEMA3G protein localizes to arterial vasculature detectable from embryonic day 9.5 through to adolescence. This provides a secondary in vivo validation of the role that this gene and protein play in ECs. This enhancer is approximately 4 kb upstream of the SEMA3G TSS and is found in a leniently conserved JUN site that is biochemically conserved between human and rat aortic ECs. This peak is also considered as pan-tissue (Supplementary Table 5). Other TFs that bind within this enhancer include ETS1 (Zhang et al. 2013), FOS (Linnemann et al. 2011), and ERG (data generated in collaboration with Dr. Jason Fish). Examples like this one provide evidence of activity and localization of conserved enhancers. This analysis gave us a glimpse of the possible functions of enhancers identified by our study, and many more enhancers remain to be functionally validated. Moreover, the VISTA enhancer database is based on a single developmental time point, E11.5, which allows for whole embryo staging and visualization of major tissues and organs (Pennacchio et al. 2006). The H3K27ac and JUN sites identified by our study could be active at other time points during development and in fully differentiated adult tissues. Since ECs are found in all tissues of the body, we are limited in our ability to pinpoint the expression of these enhancers specifically to ECs using this assay.
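The hypergeometric calculation for blood vessel enhancers in this section can be sketched as an upper-tail probability. The mapping of the reported numbers to parameters below is one reading and is an assumption, since the text does not spell it out: the 1,740 human-mapped VISTA enhancers as the population, the 9 blood vessel-positive enhancers as successes, the 95 "opportunities" as draws, and 6 as the observed count.

```python
from math import comb


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    if not (0 <= successes <= population and 0 <= draws <= population):
        raise ValueError("successes and draws must lie between 0 and population")
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, x) * comb(population - successes, draws - x)
        for x in range(max(k, 0), upper + 1)
    )
    return favourable / total


# Assumed parameterization (see lead-in); not stated explicitly in the thesis.
p_value = hypergeom_upper_tail(k=6, population=1740, successes=9, draws=95)
print(f"{p_value:.3g}")
```

Under this reading the tail probability is on the order of 1e-6, close to the value reported in the text. Other parameter choices would give different values, so the sketch should be treated as an illustration of the test rather than a reconstruction of the exact analysis.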

Figure 20: Six biochemically-conserved H3K27ac or JUN sites have positive enhancer activity in blood vessels of mouse embryos. Figure 20 Legend: Panel A shows a schematic representation of the in vivo enhancer assay conducted by Visel et al. (2007). Candidate VISTA enhancers were cloned into LacZ reporter constructs, microinjected into fertilized embryos, and assessed for LacZ localization on embryonic day 11.5. Panel B is a flowchart of the 186 VISTA enhancers that overlapped either H3K27ac or JUN sites, separated by conservation, positive expression, and blood vessel localization. A representative embryo is shown for each enhancer that is positively expressed in blood vessels and other structures (e.g. the RIC3-LMO1 embryo also has marked expression in the midbrain). The gene annotated to the enhancer by Visel et al. and the reproducibility of this pattern of expression are displayed under each embryo. There are more conserved enhancers that have positive blood vessel expression than non-conserved enhancers.


2.8 DNA sequence constraint is not a good predictor of biochemically conserved JUN binding

Biochemically conserved JUN and H3K27ac binding sites were enriched for EC-specific biological pathways (Figure 14) and known human regulatory variants that lead to diseases (Figure 15, Figure 16). We asked whether DNA sequence constraint could accurately predict which regions had biochemical conservation. Specifically, we asked whether peaks with the highest DNA sequence constraint or signal intensity scores were also biochemically conserved between human and at least one other species. We used the DNA sequence constraint scoring methods GERP (Cooper et al. 2005) and fitCons (Gulko et al. 2015) and calculated the average GERP and fitCons score of each peak for our tests. The signal intensity scores were calculated as the sequencing reads in a peak corrected to the input reads, as determined by MACS2. We restricted this analysis to JUN sites to capture the biological functions of JUN at TF-bound sites; averaging DNA sequence constraint across broad, bimodal H3K27ac peaks would reduce the peak scores. We ranked all JUN peaks based on average GERP, fitCons, and signal intensity scores and took the top N peaks, where N matches the number of “strict” biochemically conserved peaks (N=3,687 for JUN, see Figure 10). We first compared the biological pathways that were enriched in the top ranked DNA constrained peaks and top ranked intensity peaks using GREAT (Figure 21). GO Biological Process categories that were highly enriched in biochemically-conserved and top intensity JUN peaks included angiogenesis (binomial FDR q-value = 2.18e-25, fold enrichment = 2.0) and TGFBR signaling pathway (binomial FDR q-value = 1.20e-14, fold enrichment = 2.6).
Abnormal mouse phenotypes enriched in biochemically-conserved and top intensity JUN peaks were related to hemostasis (binomial FDR q-value = 1.34e-16, fold enrichment = 2.0) and wound healing (binomial FDR q-value = 1.50e-16, fold enrichment = 2.2). The top GERP and top fitCons peaks were enriched for processes like posttranslational regulation of gene expression (binomial FDR q-value = 3.54e-23, fold enrichment = 2.0) and nucleocytoplasmic transport (binomial FDR q-value = 4.08e-19, fold enrichment = 2.3). Top GERP peaks were also enriched for early developmental processes like artery morphogenesis (binomial FDR q-value = 1.24e-19, fold enrichment = 3.1) and artery development (binomial FDR q-value = 4.23e-18, fold enrichment = 2.9). The genes underlying these early developmental categories included HEY2, FOXC1, and HAND2, outlined in Figure 2. Abnormal mouse phenotypes enriched in top ranking DNA sequence constrained peaks were related to complete embryonic lethality (e.g. embryonic lethality during organogenesis, binomial FDR q-value = 3.45e-76, fold enrichment = 2.0), suggesting that these JUN sites could be necessary for early development. These findings indicate that biochemically-conserved and DNA sequence-conserved regions are enriched for distinct pathways that are biologically relevant to ECs. Biochemically conserved and top intensity TF peaks had the closest pattern of enrichment for the same categories, while top GERP and top fitCons scoring peaks were enriched for categories similar to each other. Therefore, by assessing whether a human aortic EC peak is biochemically conserved or has DNA sequence constraint, we recover distinct and complementary biological pathways. Next, we assessed the ability of DNA sequence constraint scoring methods to predict biochemically conserved protein-DNA interactions.
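
The ranking step described in this section (average a per-base score across each peak, sort, and keep the top N) can be illustrated with a minimal sketch. This is not the code used in the thesis: the function name `top_n_peaks`, the dictionary layout, and the toy scores are all assumptions made for illustration, and real GERP or fitCons values would be read from genome-wide score tracks.

```python
from statistics import mean


def top_n_peaks(peak_scores, n):
    """Return the ids of the n peaks with the highest mean per-base score.

    peak_scores maps a peak id to the list of per-base scores inside that
    peak (hypothetical layout, chosen only for this illustration).
    """
    averaged = {peak_id: mean(scores) for peak_id, scores in peak_scores.items()}
    ranked = sorted(averaged, key=lambda peak_id: averaged[peak_id], reverse=True)
    return ranked[:n]


# Toy per-base constraint scores (made-up values, not thesis data).
toy_scores = {
    "peak_a": [0.1, 0.2, 0.3],
    "peak_b": [2.0, 3.0, 4.0],
    "peak_c": [1.0, 1.0, 1.0],
}
top_two = top_n_peaks(toy_scores, 2)
```

In the analysis above, N would be set to the number of strictly conserved JUN peaks so that every ranked set is the same size as the biochemically conserved set it is compared against.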


Figure 21: Biochemically-conserved JUN peaks are enriched for different biological pathways than the top ranked DNA sequence constraint peaks.


Figure 21 Legend: Multiple GREAT plots showing up to five of the most significant enrichments obtained for each of the four analyses for JUN from GO Biological Process and Mouse Phenotype Database. Biochemically conserved sites were defined as peaks shared between human and either rat and/or bovine (Figure 10). The top ranked GERP, fitCons, and intensity peaks were queried for biological pathways using GREAT. The –log10 of binomial q-values are shown for each GREAT category along the x-axis. The size of the asterisk is proportional to the binomial fold change obtained for the given database.

We next asked how well DNA sequence constraint metrics predicted biochemically conserved TF binding using receiver operating characteristic (ROC) curves. We used biochemically conserved JUN sites in aortic ECs and also examined biochemically conserved HNF4a sites in liver to determine if our findings held true in two different tissues. We calculated the average and maximum GERP and fitCons scores per peak, and determined the signal intensity and q-value scores from the MACS2 peak calling output for each JUN and HNF4a peak. The accuracy of each scoring and ranking method depends on how well it can separate peaks into biochemically conserved and non-biochemically conserved categories. True positives were defined as human JUN EC peaks or liver HNF4a peaks that are biochemically conserved with rat aortic ECs and rat liver, respectively. Neither GERP nor fitCons could predict biochemically-conserved peaks, as shown by an area under the ROC curve (AUC) of approximately 0.5 for both JUN and HNF4a (Figure 22). The best scoring GERP and fitCons nucleotide within the peak (marked in blue as GERP_max and fitCons_max) showed a similar trend (AUC close to 0.5). An AUC of 0.5 corresponds to random chance, which means that biochemically conserved sites are not preferentially found among the top ranking GERP and fitCons peaks. Signal intensity and q-value scores, on the other hand, gave AUC values close to 0.7 for JUN and 0.55 for HNF4a. We interpret these findings as indicating that DNA sequence constraint, or more sophisticated methods that use DNA constraint in their models, does not accurately predict biochemically-conserved sites in two different tissues. Overall, we show that biochemical conservation datasets are valuable for finding biologically relevant gene regulatory regions.
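
To make the AUC interpretation concrete, the sketch below computes the area under the ROC curve directly from its probabilistic definition: the chance that a randomly chosen conserved peak receives a higher score than a randomly chosen non-conserved peak, with ties counted as one half. It is an illustration under stated assumptions rather than the procedure used to produce Figure 22; the function name `roc_auc` and the toy inputs are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons.

    scores: one numeric score per peak (e.g. mean GERP or signal intensity).
    labels: truthy for biochemically conserved peaks, falsy otherwise.
    Quadratic in the number of peaks, so suitable only for small examples.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative peak")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# A score that ranks every conserved peak above every non-conserved peak.
perfect = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0])
# A score that carries no information about conservation (all values tied).
uninformative = roc_auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
```

A score that separates the two classes perfectly yields an AUC of 1.0, whereas a score unrelated to conservation yields 0.5, which is the value observed here for GERP and fitCons. For tens of thousands of peaks, a rank-based implementation would be preferable to this pairwise loop.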


Figure 22: Peak intensity scores are better predictors of true biochemical conservation than DNA sequence constraint methods GERP and fitCons.

Figure 22 Legend: ROC curves of GERP, fitCons, and peak intensity tests are shown for JUN and HNF4a, with the area under the ROC curve (AUC) indicated in the legend. The y-axis is the true positive rate or sensitivity of the test, and the x-axis represents the false positive rate or 1 – specificity of the test. The total number of regions tested is the full set of HAEC peaks for JUN (N=64,351) and HNF4a (N=58,462). True positives are human peaks that are conserved with rat. The average GERP and fitCons scores per peak (green) and maximum GERP and fitCons scoring nucleotides within a peak (blue) do not accurately predict biochemical conservation, with an AUC of nearly 0.5, equating to random chance. Peak intensity and q-value of peaks (confidence of peak call) are fair methods for predicting biochemically-conserved regions, with an AUC of approximately 0.7 for JUN. Peak intensity and q-value of peaks are only slightly better at predicting biochemically-conserved HNF4a sites than DNA sequence constraint scoring.


Chapter 2 Summary

We determined which EC regulatory regions identified in our HAEC epigenetic map generated in Chapter 1 were biochemically conserved in two common model organisms used by vascular biologists. The human regulatory regions that were biochemically conserved with rat and/or bovine aortic EC regulatory regions were enriched for known EC functions. We observed that conserved JUN sites were more likely to be found in EC-private regulatory regions, while conserved H3K27ac sites were enriched for EC regulatory regions that were shared with a non-EC liver cell line, HepG2. One of the major challenges moving forward with comparative TF binding analyses is the limited availability of antibodies that function similarly in multiple species. Although the antibody we used for JUN in this study was raised against epitopes conserved across species, we observed variable numbers of JUN peaks in human, rat and bovine. Since the majority of conserved JUN sites were found in H3K27ac peaks, we proposed a second way to designate a “conserved” JUN peak: determining which human JUN peaks occur in a region of the human genome that is orthologous to an H3K27ac-enriched region in one or more additional species. This “lenient” definition of conservation allowed us to make use of multi-species enhancer data that mark a broader region of regulatory control and to circumvent the variation in JUN ChIP-seq experiments between species. We observed that the lenient category recovered virtually all strictly conserved JUN sites, and similar biological pathways were enriched in both sets. We then asked if these regions were enriched for known regulatory human disease variants using the HGMD and found that conserved sites were more likely to have regulatory variants than non-conserved sites.
We further characterized the regulatory variants that fell in leniently conserved JUN sites and found that biological processes like angiogenesis and nitric oxide metabolism were highly enriched. We asked if this trend was specific to aortic ECs or could be reproduced in another cell type. Using previously published data from the liver (Ballester et al. 2014), we showed that regulatory variants in TF sites were tissue-specific, while regulatory variants in H3K27ac sites were common among ECs and liver. Biological processes that were enriched in H3K27ac sites common between ECs and liver included cellular proliferation and apoptosis, while liver-specific genes were enriched for lipid homeostasis. Moreover, to search for evidence of in vivo enhancer activity, we turned to the VISTA browser, which is based on a transgenic mouse assay, and evaluated the expression of VISTA enhancers that contained the full sequence of our JUN or H3K27ac sites. We saw that enhancers that stimulated reporter gene expression in the blood vessels of mouse embryos tended to contain conserved JUN or H3K27ac sites. Finally, we showed that we could not have simply predicted our biochemically-conserved sites using sophisticated and widely used metrics of DNA sequence constraint. Thus, understanding epigenetic regulation of genes requires more than the underlying DNA sequence similarity. Overall, by using evolutionary conservation as a filter to distill regulatory regions, we characterized the conserved epigenetic landscape of aortic ECs and identified unique tissue-specific pathways and human regulatory variants that serve as the basis of further functional experimentation.


Discussion and Future Directions

By generating an inter-vascular bed and inter-species atlas of endothelial cell TF binding and epigenetic modifications, we identified many regulatory regions that pertain to vascular EC gene expression. We further showed that regulatory regions that are biochemically conserved and private to ECs were enriched for known human regulatory variants that lead to disease. However, the vast majority of conserved EC regulatory regions we identified have not been functionally characterized. There is great interest in 1) determining the function of putative gene regulatory regions maintained in multiple species, and 2) understanding the impact of genetic variation in these regulatory regions on gene expression. In the following discussion, we outline computational and experimental strategies for characterizing the function of EC regulatory elements and highlight new investigations that can be informed by comparative epigenomic analyses in ECs.

Identifying candidates for functional study in artery- and vein-specific regulatory regions

With an epigenetic atlas of regulatory elements and chromatin state maps, we have the opportunity to identify potential targets for functional study in ECs. Though we identify a large number of regulatory regions, we can distill these using differential binding analysis, chromatin state information, and evolutionary conservation to find potentially functional regulatory regions. Since ECs are located in a variety of different microenvironments, the regulatory regions that control artery- and vein-specific gene expression will differ. With active chromatin states and repressed/poised state annotations, we can use differences in chromatin state domains between ECs to identify vascular-bed specific gene regulatory networks. For regions that are active only in HAECs, we can assess whether they are repressed or poised in venous ECs by interpreting the chromatin state map. There are 87 H3K27ac and 7 JUN HAEC-only peaks that fall in repressed or poised states in HUVEC (Supplementary Table 10). For example, the closest genes to these regions include Notch signaling pathway targets such as the transcription factors HES2 and HEY2 in both H3K27ac and JUN sites, some of which are biochemically conserved. Conversely, there are 163 H3K27ac and 9 JUN HUVEC-only peaks in active enhancer states that are repressed or poised in HAECs (Supplementary Table 10). Interestingly, members of the TBX family of TFs are close to candidates from both HAEC and HUVEC lists; TBX3, which regulates early heart development (Hoogaars et al. 2004), is a candidate from the HAEC list, while TBX20, which has been linked to atherosclerosis (Shen et al. 2013), is a candidate from the HUVEC list. Thus, the regulatory atlas and chromatin state information can be used as multiple lines of evidence to uncover potential targets for further study.

Other AP-1 members could play regulatory roles in multiple species

We chose to identify the genome-wide occupancy of the JUN monomer of AP-1 in this study, mainly because JUN and GATA2 TFs are known to co-regulate EC gene expression. However, the AP-1 complex is composed of dimeric DNA binding proteins from a number of different TF families (JUN, FOS, JDP, MAF, and ATF). The ability of AP-1 to control a wide range of biological processes, such as apoptosis, cellular proliferation, cell survival, and differentiation, is regulated by changes in the transcription of genes encoding AP-1 monomers (Shaulian and Karin 2002). JUN is the most potent transcriptional activator from the AP-1 family (Hsu et al. 1992; Shaulian and Karin 2002) and is often antagonized or weakened by JUNB. Since the genome-wide occupancy of AP-1 monomers has not been elucidated for ECs in multiple species, other AP-1 subunits could play important unexplored roles in rat or bovine ECs. In addition to technical differences that could be due to the antibody, it is important to consider biological differences with regard to the different repertoires of AP-1 family members expressed at a given time (Shaulian and Karin 2002). It could also be that different AP-1 monomers are present in AECs from human, bovine, and rat. Future experiments should explore other members of the AP-1 family of TFs that could play essential roles in EC biology.

Cross-species datasets are a useful resource

Growing evidence suggests that biochemically-conserved protein-DNA interactions provide essential information about biological pathways and targets that are important for cell-type specific functions (Visel et al. 2008; Cheng et al. 2014; Stergachis et al. 2014; Roadmap Epigenomics et al. 2015). Our experiments in multiple species provided essential evidence of evolutionarily conserved protein-DNA interactions that could not be predicted using DNA sequence constraint or ChIP-seq signal alone (Figure 22). We showed this for a ubiquitously expressed TF in ECs, JUN, and a tissue-restricted TF in liver, HNF4a. The evidence we present here for the functional relevance of EC-specific regulatory elements and biochemical conservation of JUN and H3K27ac sites suggests that our epigenetic atlas will be a valuable resource for cross-species TF and epigenetic comparisons. However, this is not to say that human-specific regulatory elements are not important; indeed, our group previously showed that liver detoxification of xenobiotics was highly enriched in human-specific regulatory regions in liver (Ballester et al. 2014). Nevertheless, in terms of identifying the critical control regions of genes required for the function of a specific cell type, shared orthologous TF binding is able to distinguish an important set of regulatory regions that warrants further investigation.

Finding EC-related regulatory variants within strictly conserved JUN sites

Requiring JUN binding to be shared in orthologous regions between multiple species is a stringent filter that provides evidence of purifying selection. To demonstrate that novel variants in conserved regulatory regions identified by our study can be linked with EC biology, we describe the variants found in strictly conserved JUN sites (as described on page 56) in more detail (Supplementary Table 11). There were 7 unique genes annotated to “disease-causing” regulatory mutations (“DM” category as defined by Stenson et al. 2014). Two of the seven genes, TGFBR2 and PROCR, are clearly related to EC biological functions (e.g. blood coagulation). Three of the seven genes are associated with neurological diseases (amyotrophic lateral sclerosis (ALS) and auditory neuropathy) that are linked with different cell types, but these genes also play other functional roles related to ECs. For example, there is an ALS-associated variant in the promoter region of heat shock protein beta-1 (HSPB1); HSPB1 is released primarily from ECs and regulates angiogenesis via direct interaction with vascular endothelial growth factor (VEGF) (Choi et al. 2014). Our data add to the current state of knowledge about these regulatory variants by showing that they lie within biochemically conserved JUN sites. These variants have the potential to alter the binding sites of AP-1 monomers or other co-factors; for example, HSF1 TF binding is disrupted in the core promoter of HSPB1 (Dierick et al. 2007), and we have predicted that this variant disrupts binding of another stress response TF, HESX1. Loss of HSF1 and HESX1 TF binding at the core promoter can potentially cause misregulation and lead to aberrant expression of HSPB1. Thus, our multi-species regulatory atlas can be leveraged to find potential TF motif altering regulatory variants that cause disease and new unexplored roles that ECs might play in these diseases.

NOS3 promoter is conserved between species and harbours functional regulatory variants

As described in Section 1.4, a series of regulatory HGMD variants were found that occurred in HAEC-only and HUVEC-only H3K27ac and JUN sites. Of particular interest, we identified two variants in the promoter region of NOS3 that were covered by overlapping HAEC-only H3K27ac and HAEC-only JUN peaks. The two variants in the promoter region of NOS3, rs2070744 and rs3918226, are associated with coronary spasm (Nakayama et al. 1999) and hypertension (Zhang et al. 2012), respectively. Using evolutionary conservation as a filter in Chapter 2, we showed that the peaks including these two NOS3 variants were shared in orthologous regions of human, rat, and bovine. We found that the major allele (C) for the rare variant C>T at rs3918226 is conserved in all three species using the Multiz Alignment (Blanchette et al. 2004) and that the DNA sequence is considered constrained by GERP (Cooper et al. 2005). The common variant (C) at rs2070744 is not conserved among human, rat, and bovine according to either the Multiz alignment or GERP. Within a 120-bp window that is nucleosome-depleted, there is evidence of other TFs binding, like FOS (Linnemann et al. 2011), ETS1 (Zhang et al. 2013), and ERG (data generated in collaboration with Dr. Jason Fish). These factors are all known regulators of EC gene expression, and their binding sites were overrepresented in our H3K27ac and JUN datasets (Figure 13). Indeed, there is a predicted disruption of ETS and EHF binding sites at rs3918226 in the HaploReg V2 database (Ward and Kellis 2012). Thus, converging evidence from biochemical binding of co-factors, the biological relevance of NOS3 to ECs, and disease variants at this locus points to this specific cis-regulatory module as a suitable candidate for further functional follow-up studies.

The disease association of many regulatory regions remains unexplored

We have learned how variants within the promoter regions of highly studied genes lead to rare human diseases. The role of variants in many more regulatory regions remains unexplored. Novel variants associated with cardiovascular diseases could be uncovered in biochemically-conserved regions. A strategy to prioritize which experiments to conduct first would be to examine the GWAS catalogue and assess whether conserved sites overlap GWAS annotations like cardiovascular or immune-related traits. Enrichment for GWAS annotation terms will provide more evidence to prioritize which regulatory regions to target for functional experimentation. Our group previously showed that liver-associated diseases were highly enriched in conserved liver TF binding sites (Ballester et al. 2014); however, enrichment of GWAS annotated terms remains unknown in conserved regulatory regions of ECs. Future work could also involve leveraging the conservation status of a peak in a hypothesis-driven GWAS (HD-GWAS) of cardiovascular disease, where conserved regions are prioritized over non-conserved regions as described by Sun et al. (2012). Using HD-GWAS, we could test if a high-priority group containing conserved regulatory regions is enriched for GWAS variants relative to low-priority non-conserved sites. Overall, identifying known human regulatory variants in our datasets has afforded us the ability to characterize potential roles for ECs in various diseases; however, disease variants in many more regulatory regions remain unexplored.

Functional dissection of regulatory regions

We have identified several intriguing candidate EC gene regulatory regions by developing an epigenetic atlas of regulatory elements in ECs. One example is the NOS3 promoter region, which harbours regulatory variants related to cardiovascular disease. Although in vitro luciferase assays are used to predict how specific variants directly affect TF binding sites and gene expression patterns (Supplementary Table 11), plasmid DNA is extrachromosomal and does not have natively bound histone proteins. A strategy to test the function of variants of the NOS3 promoter in an in vivo setting is to use genome-editing techniques to alter the variant and assess if TF-bound sites are disrupted. Genomic regions can be modified with CRISPR/Cas9 genome editing technology (Jinek et al. 2012; Cong et al. 2013; Ran et al. 2013). By deleting or altering these variants, we can determine if levels of NOS3 expression are affected. Two strategies that we could use to functionally dissect the biochemically-conserved NOS3 promoter region are to 1) delete the biochemically-conserved JUN binding site and 2) switch two SNPs in proximal promoter regions (Figure 23). We expect that modifying the NOS3 variants will produce a marked difference in NOS3 expression, measurable by RNA-seq, and in turn changes in global gene expression. Because CRISPR/Cas9 genome editing can have off-target effects, controls will be included to check for them.


Figure 23: Target site of CRISPR/Cas9 genome editing in a biochemically-conserved JUN site of the NOS3 proximal promoter region.

Figure 23 Legend: Targeting of SNPs in the NOS3 proximal promoter as outlined on page 80. The target site is highlighted in pale blue and covers the two SNPs of interest. A first strategy is to delete this 200-bp region and assess gene expression changes of NOS3, and a second strategy is to switch the SNPs and assess through RNA-seq if NOS3 expression is affected.


Frontiers of cross-species comparative epigenomics

In Chapters 1 and 2, we generated and annotated a reference atlas of regulatory regions in human aortic ECs. Besides using evolutionary conservation to identify critical points of EC gene control, this atlas can also serve as a starting resource to study regulatory events that are involved in biological pathways such as inflammation. ECs are highly responsive to their surrounding microenvironment and play crucial roles in mediating inflammatory responses. For example, ECs control vascular permeability, which allows for trafficking of immune cells during the inflammatory response (Mai et al. 2013). A key inflammatory cytokine that signals systemic inflammation and an acute phase response is tumor necrosis factor-alpha (TNFA). TNFA signals the activation of two critical inflammatory responses: 1) the activation of MAPK signaling that regulates JUN, and 2) NF-κB nuclear translocation. Both JUN and NF-κB are key mediators of the inflammatory response as they regulate early response genes (Aggarwal 2000). The genome-wide binding of the NF-κB subunit RELA has been characterized in HUVECs treated with TNFA (Brown et al. 2014). Since RELA is known to bind to tissue-specific enhancers (Natoli et al. 2011), we would predict the responses to be similar in arterial and venous ECs. However, whether the HAEC-specific enhancer regions are bound by RELA remains to be determined. I am involved in preliminary experiments that aim to determine which regulatory elements are responsive to TNFA stimulation and which of these are biochemically conserved among multiple species in cultured primary aortic ECs. Arterial EC datasets for JUN and RELA pre- and post-inflammation do not exist, and so it remains to be seen how similar the TNFA response is between these two vascular beds.
We assayed JUN binding under normal physiological conditions, so we focused on positive controls needed to establish the groundwork for including RELA in a project to consider the inflammatory response. The NF-κB complex is sequestered in the cytoplasm of cells under physiological conditions, and when TNFA signals environmental stress the nuclear localization signal of NF-κB becomes unmasked (Ballard et al. 1992). To demonstrate that nuclear translocation of NF-κB is induced after TNFA stimulation in HAECs, we conducted immunofluorescence assays to track the RELA subunit using rabbit anti-RELA (Santa Cruz Biotechnology, sc372 polyclonal) (Figure 24). At 0 minutes of TNFA exposure, RELA is diffuse throughout the cell. After 15 minutes of TNFA exposure, we observed strong co-localization of RELA with DAPI-stained nuclei. This localization remains for approximately one hour post stimulation, with the highest intensity at around 30 minutes. Based on the findings of this immunofluorescence assay and other published methods (Turner et al. 2010; Ostuni et al. 2013), we chose to assess the genome-wide binding occupancy of RELA and JUN, and the histone modification H3K27ac, at 45 minutes post TNFA stimulation to build an epigenetic atlas of NF-κB stimulation.


Figure 24: Nuclear translocation of RELA occurs within 15 minutes post TNFA stimulation with sustained nuclear localization for approximately one hour.


Figure 24 Legend: Time course of immunofluorescence labeling of RELA (Santa Cruz Biotechnology, sc372 polyclonal) in HAECs treated with 10 ng/mL human recombinant TNFA. Z-stacked images were taken on a Nikon A1R Si Point Scanning Confocal microscope with 40X water-immersion objective lenses and processed using NIS Elements software. The same laser intensity, sensitivity and acquiring mode were used to image all samples. RELA (red) translocates to the nucleus, stained with DAPI (blue), within 15 minutes of TNFA treatment, with the highest intensity of localization occurring at the 30-minute mark. After 90 minutes, the intensity of RELA in the nucleus decreases. Scale bar shown in bottom right panel = 31 µm.

To optimize various ChIP-seq parameters, I conducted pilot ChIP-seq experiments for RELA, JUN, and H3K27ac in pre- and post-TNFA stimulated HAECs. We observed approximately 1,200 RELA peaks that overlap JUN peaks post stimulation. For example, both RELA and JUN bind within a nucleosome-depleted region of an active enhancer of TNFA-induced protein 3 (TNFAIP3) (Figure 25). These preliminary findings provide the foundation for studying epigenetic changes to ECs in an inflammatory state. A future aim is to build on Chapter 1 by comparing the changes of the epigenetic landscape in the inflammatory state, characterizing active enhancers and TF-bound sites in HAEC and HUVEC. Furthermore, since rodent models are widely used to study human inflammatory diseases, we are curious to know which TNFA-responsive regulatory regions from Chapter 2 are conserved between human, mouse, and rat aortic ECs that are treated with TNFA. By developing an inter-vascular bed and inter-species atlas of inflamed regulatory elements in ECs, we will study the evolution of the inflammatory response. Furthermore, we will search for known and novel regulatory variants in these prioritized inflammatory response regions.
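
Counting RELA peaks that overlap JUN peaks reduces to an interval intersection per chromosome. The sketch below is a minimal illustration, not the pipeline used for the pilot experiments: it assumes peaks are given as (chromosome, start, end) tuples in half-open coordinates and that any overlap of at least one base pair counts, neither of which is specified in the text. The function name and the toy coordinates are hypothetical.

```python
from collections import defaultdict


def count_overlapping(query_peaks, target_peaks):
    """Count query peaks that overlap at least one target peak.

    Peaks are (chrom, start, end) tuples with half-open coordinates;
    an overlap of one base pair or more is counted (an assumption).
    """
    targets_by_chrom = defaultdict(list)
    for chrom, start, end in target_peaks:
        targets_by_chrom[chrom].append((start, end))

    count = 0
    for chrom, start, end in query_peaks:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < t_end and t_start < end
               for t_start, t_end in targets_by_chrom[chrom]):
            count += 1
    return count


# Toy coordinates (made up for illustration, not thesis data).
rela_peaks = [("chr6", 100, 200), ("chr6", 500, 600), ("chr1", 100, 200)]
jun_peaks = [("chr6", 150, 250), ("chr1", 300, 400)]
n_shared = count_overlapping(rela_peaks, jun_peaks)
```

This linear scan over each chromosome is adequate for a small example; for genome-wide peak sets, sorted intervals or a dedicated interval tool would be the more practical choice.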


Figure 25: RELA and JUN co-regulate expression of TNFA-induced protein 3 (TNFAIP3) by binding to a nucleosome depleted region in the proximal promoter.

Figure 25 Legend: UCSC screenshot of JUN and H3K27ac ChIP-seq experiments presented in this thesis and preliminary RELA ChIP-seq experiment in TNFA-treated HAECs. A 3.5 kb genomic window at the TNFAIP3 locus (x-axis) and number of sequencing reads per library (y-axis) are shown. JUN and RELA form a cis-regulatory module and control expression of this inflammatory-induced gene.


References

Adams WJ, Zhang Y, Cloutier J, Kuchimanchi P, Newton G, Sehrawat S, Aird WC, Mayadas TN, Luscinskas FW, Garcia-Cardena G. 2013. Functional vascular endothelium derived from human induced pluripotent stem cells. Stem Cell Reports 1(2): 105-113.
Aggarwal BB. 2000. Tumour necrosis factors receptor associated signalling molecules and their role in activation of apoptosis, JNK and NF-κB. Ann Rheum Dis 59 Suppl 1: i6-16.
Aird WC. 2012. Endothelial cell heterogeneity. Cold Spring Harbor Perspectives in Medicine 2: a006429.
Aird WC, Edelberg JM, Weiler-Guettler H, Simmons WW, Smith TW, Rosenberg RD. 1997. Vascular bed-specific expression of an endothelial cell gene is programmed by the tissue microenvironment. J Cell Biol 138(5): 1117-1124.
Akiyama H, Kondoh T, Kokunai T, Nagashima T, Saito N, Tamaki N. 2000. Blood-brain barrier formation of grafted human umbilical vein endothelial cells in athymic mouse brain. Brain Research 858(1): 172-176.
Andrews S. 2010. FastQC: a quality control tool for high throughput sequence data.
Angel P, Hattori K, Smeal T, Karin M. 1988. The jun proto-oncogene is positively autoregulated by its product, Jun/AP-1. Cell 55(5): 875-885.
Angel P, Karin M. 1991. The role of Jun, Fos and the AP-1 complex in cell-proliferation and transformation. Biochim Biophys Acta 1072(2-3): 129-157.
Aranguren XL, Agirre X, Beerens M, Coppiello G, Uriz M, Vandersmissen I, Benkheil M, Panadero J, Aguado N, Pascual-Montano A et al. 2013. Unraveling a novel transcription factor code determining the human arterial-specific endothelial cell signature. Blood 122(24): 3982-3992.
Azahri NS, Kavurma MM. 2013. Transcriptional regulation of tumour necrosis factor-related apoptosis-inducing ligand. Cell Mol Life Sci 70(19): 3617-3629.
Ball SG, Shuttleworth CA, Kielty CM. 2007. Vascular endothelial growth factor can signal through platelet-derived growth factor receptors. J Cell Biol 177(3): 489-500.
Ballard DW, Dixon EP, Peffer NJ, Bogerd H, Doerre S, Stein B, Greene WC. 1992. The 65-kDa subunit of human NF-κB functions as a potent transcriptional activator and a target for v-Rel-mediated repression. Proc Natl Acad Sci U S A 89(5): 1875-1879.
Ballester B, Medina-Rivera A, Schmidt D, Gonzalez-Porta M, Carlucci M, Chen X, Chessman K, Faure AJ, Funnell AP, Goncalves A et al. 2014. Multi-species, multi-transcription factor binding highlights conserved control of tissue-specific biological pathways. Elife 3: e02626.
Barski A, Cuddapah S, Cui K, Roh T-Y, Schones DE, Wang Z, Wei G, Chepelev I, Zhao K. 2007. High-resolution profiling of histone methylations in the human genome. Cell 129: 823-837.
Bernstein BE, Birney E, Dunham I, Green ED, Gunter C, Snyder M. 2012. An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57-74.
Bird A. 2007. Perceptions of epigenetics. Nature 447(7143): 396-398.
Birdsey GM, Shah AV, Dufton N, Reynolds LE, Osuna Almagro L, Yang Y, Aspalter IM, Khan ST, Mason JC, Dejana E et al. 2015. The endothelial transcription factor ERG promotes vascular stability and growth through Wnt/beta-catenin signaling. Dev Cell 32(1): 82-96.
Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED et al. 2004. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res 14(4): 708-715.
Boffelli D, McAuliffe J, Ovcharenko D, Lewis KD, Ovcharenko I, Pachter L, Rubin EM. 2003. Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 299(5611): 1391-1394.
Bonello GB, Pham M-h, Begum K, Sigala J, Sataranatarajan K, Mummidi S. 2011. An evolutionarily conserved TNF-α-responsive enhancer in the far upstream region of human CCL2 locus influences its gene expression. Journal of Immunology (Baltimore, MD: 1950) 186: 7025-7038.
Bovine Genome Sequencing and Analysis Consortium, Elsik CG, Tellam RL, Worley KC, Gibbs RA, Muzny DM, Weinstock GM, Adelson DL, Eichler EE et al. 2009. The genome sequence of taurine cattle: a window to ruminant biology and evolution. Science 324(5926): 522-528.
Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, Karczewski KJ, Park J, Hitz BC, Weng S et al. 2012. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res 22(9): 1790-1797.
Brown JD, Lin CY, Duan Q, Griffin G, Federation AJ, Paranal RM, Bair S, Newton G, Lichtman AH, Kung AL et al. 2014. NF-κB Directs Dynamic Super Enhancer Formation in Inflammation and Atherogenesis. Molecular Cell 56: 219-231.
Carmeliet P, Ferreira V, Breier G, Pollefeyt S, Kieckens L, Gertsenstein M, Fahrig M, Vandenhoeck A, Harpal K, Eberhardt C et al. 1996. Abnormal blood vessel development and lethality in embryos lacking a single VEGF allele. Nature 380(6573): 435-439.
Castellazzi M, Spyrou G, La Vista N, Dangy JP, Piu F, Yaniv M, Brun G. 1991. Overexpression of c-jun, junB, or junD affects cell growth differently. Proc Natl Acad Sci U S A 88(20): 8890-8894.
Chen X, Qin J, Cheng C-M, Tsai M-J, Tsai SY. 2012. COUP-TFII is a major regulator of cell cycle and Notch signaling pathways. Molecular Endocrinology (Baltimore, MD) 26: 1268-1277.
Cheng Y, Ma Z, Kim B-H, Wu W, Cayting P, Boyle AP, Sundaram V, Xing X, Dogan N, Li J et al. 2014. Principles of regulatory information conservation between mouse and human. Nature 515: 371-375.
Chi J-T, Chang HY, Haraldsen G, Jahnsen FL, Troyanskaya OG, Chang DS, Wang Z, Rockson SG, van de Rijn M, Botstein D et al. 2003. Endothelial cell diversity revealed by global expression profiling. PNAS 100: 10623-10628.
Chiu JJ, Chien S. 2011. Effects of disturbed flow on vascular endothelium: pathophysiological basis and clinical perspectives. Physiol Rev 91(1): 327-387.
Choi SH, Lee HJ, Jin YB, Jang J, Kang GY, Lee M, Kim CH, Kim J, Yoon SS, Lee YS et al. 2014. MMP9 processing of HSPB1 regulates tumor progression. PLoS One 9(1): e85509.
Clanchy FIL, Hamilton JA. 2012. HUVEC co-culture and haematopoietic growth factors modulate human proliferative monocyte activity. Cytokine 59: 31-34.
Conboy CM, Spyrou C, Thorne NP, Wade EJ, Barbosa-Morais NL, Wilson MD, Bhattacharjee A, Young RA, Tavare S, Lees JA et al. 2007. Cell cycle genes are the evolutionarily conserved targets of the E2F4 transcription factor. PLoS One 2(10): e1061.
Cong L, Ran FA, Cox D, Lin S, Barretto R, Habib N, Hsu PD, Wu X, Jiang W, Marraffini LA et al. 2013. Multiplex Genome Engineering Using CRISPR/Cas Systems. Science 339(6121): 819-823.
Consortium, Encode Project. 2012. An integrated encyclopedia of DNA elements in the human genome. Nature 489(7414): 57-74.
Consortium ModEncode, Roy S, Ernst J, Kharchenko PV, Kheradpour P, Negre N, Eaton ML, Landolin JM, Bristow CA, Ma L et al. 2010. Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science 330(6012): 1787-1797.

Cooper GM, Goode DL, Ng SB, Sidow A, Bamshad MJ, Shendure J, Nickerson DA. 2010. Single-nucleotide evolutionary constraint scores highlight disease-causing mutations. Nat Methods 7(4): 250-251. Cooper GM, Stone EA, Asimenos G, Program NCS, Green ED, Batzoglou S, Sidow A. 2005. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res 15(7): 901-913. Corradin O, Saiakhova A, Akhtar-Zaidi B, Myeroff L, Willis J, Cowper-Sal lari R, Lupien M, Markowitz S, Scacheri PC. 2014. Combinatorial effects of multiple enhancer variants in linkage disequilibrium dictate levels of gene expression to confer susceptibility to common traits. Genome Res 24: 1-13. Creyghton MP, Cheng AW, Welstead GG, Kooistra T, Carey BW, Steine EJ, Hanna J, Lodato Ma, Frampton GM, Sharp Pa et al. 2010. Histone H3K27ac separates active from poised enhancers and predicts developmental state. PNAS 107: 21931-21936. Davis RJ. 1994. MAPKs: new JNK expands the group. Trends in Biochemical Sciences 19(11): 470-473. De Val S, Black BL. 2009. Transcriptional Control of Endothelial Cell Development. Developmental Cell 16: 180-195. Dierick I, Irobi J, Janssens S, Theuns J, Lemmens R, Jacobs A, Corsmit E, Hersmus N, Van Den Bosch L, Robberecht W et al. 2007. Genetic variant in the HSPB1 promoter region impairs the HSP27 stress response. Hum Mutat 28(8): 830. Ding Q, Hussain Y, Chorazyczewski J, Gros R, Feldman RD. 2015. GPER-independent effects of estrogen in rat aortic vascular endothelial cells. Mol Cell Endocrinol 399: 60-68. Dorfman DM, Wilson DB, Bruns GA, Orkin SH. 1992. Human transcription factor GATA-2. Evidence for regulation of preproendothelin-1 gene expression in endothelial cells. J Biol Chem 267(2): 1279-1285. Du J, Teng RJ, Guan T, Eis A, Kaul S, Konduri GG, Shi Y. 2012. Role of autophagy in angiogenesis in aortic endothelial cells. American Journal of Physiology Cell Physiology 302(2): C383-391. Eppig JT, Blake JA, Bult CJ, Kadin JA, Richardson JE, Mouse Genome Database G. 
2015. The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease. Nucleic Acids Res 43(Database issue): D726-736. Ernst J, Kellis M. 2010. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nature Biotechnology 28: 817-825. ---. 2012. ChromHMM: automating chromatin-state discovery and characterization. Nature Methods 9: 215-216. ---. 2013. Interplay between chromatin state, regulator binding, and regulatory motifs in six human cell types. Genome Res 23: 1142-1154. Ernst J, Kellis M. 2015. Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues. Nat Biotechnol 33(4): 364-376. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M et al. 2011. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473: 43-49. Everett LA, Cleuren AC, Khoriaty RN, Ginsburg D. 2014. Murine coagulation factor VIII is synthesized in endothelial cells. Blood 123(24): 3697-3705. Faggioli L, Costanzo C, Donadelli M, Palmieri M. 2004. Activation of the Interleukin-6 promoter by a dominant negative mutant of c-Jun. Biochim Biophys Acta 1692(1): 17-24. Farh KK-h, Marson A, Zhu J, Kleinewietfeld M, Housley WJ, Beik S, Shoresh N, Whitton H, Ryan RJH, Shishkin AA et al. 2014. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518(7539): 337-343. 91

Flicek P, Ahmed I, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S et al. 2013. Ensembl 2013. Nucleic Acids Research 41: D48-55. Francescone R, Hou V, Grivennikov SI. 2015. Cytokines, IBD, and Colitis-associated Cancer. Inflamm Bowel Dis 21(2): 409-418. Friedman SL. 2012. A silent partner no longer - sinusoidal endothelial cells in liver homeostasis and disease. Journal of Hepatology 56(5): 1001-1002. Fu J, Chen YF, Zhao X, Creighton JR, Guo Y, Hage FG, Oparil S, Xing DD. 2014. Targeted delivery of pulmonary arterial endothelial cells overexpressing interleukin-8 receptors attenuates monocrotaline-induced pulmonary vascular remodeling. Arterioscler Thromb Vasc Biol 34(7): 1539-1547. Fu M, Zhu X, Zhang J, Liang J, Lin Y, Zhao L, Ehrengruber MU, Chen YE. 2003. Egr-1 target genes in human endothelial cells identified by microarray analysis. Gene 315: 33-41. Funnell APW, Wilson MD, Ballester B, Mak KS, Burdach J, Magan N, Pearson RCM, Lemaigre FP, Stowell KM, Odom DT et al. 2013. A CpG mutational hotspot in a ONECUT binding site accounts for the prevalent variant of hemophilia B Leyden. American Journal of Human Genetics 92: 460-467. Gene Ontology Consortium. 2015. Gene Ontology Consortium: going forward. Nucleic Acids Res 43(Database issue): D1049-1056. Gerritsen ME, Bloor CM. 1993. Endothelial cell gene expression in response to injury. FASEB Journal: Official publication of the Federation of American Societies for Experimental Biology 7: 523-532. Gerstein MB Lu ZJ Van Nostrand EL Cheng C Arshinoff BI Liu T Yip KY Robilotto R Rechtsteiner A Ikegami K et al. 2010. Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science 330(6012): 1775-1787. Gjoneska E, Pfenning AR, Mathys H, Quon G, Kundaje A, Tsai LH, Kellis M. 2015. Conserved epigenomic signals in mice and humans reveal immune basis of Alzheimer's disease. Nature 518(7539): 365-369. Grassl C, Luckow B, Schlondorff D, Dendorfer U. 1999. 
Transcriptional regulation of the interleukin-6 gene in mesangial cells. Journal of the American Society of Nephrology : JASN 10(7): 1466-1477. Gulko B, Hubisz MJ, Gronau I, Siepel A. 2015. A method for calculating probabilities of fitness consequences for point mutations across the human genome. Nat Genet 47(3): 276-283. Hakimi M, Peters A, Becker A, Bockler D, Dihlmann S. 2014. Inflammation-related induction of absent in melanoma 2 (AIM2) in vascular cells and atherosclerotic lesions suggests a role in vascular pathogenesis. J Vasc Surg 59(3): 794-803. Han KH, Hong K-H, Park J-H, Ko J, Kang D-H, Choi K-J, Hong M-K, Park S-W, Park S-J. 2004. C-reactive protein promotes monocyte chemoattractant protein-1--mediated chemotaxis through upregulating CC chemokine receptor 2 expression in human monocytes. Circulation 109: 2566-2571. Harada K, Yamazaki T, Iwata C, Yoshimatsu Y, Sase H, Mishima K, Morishita Y, Hirashima M, Oike Y, Suda T et al. 2009. Identification of targets of Prox1 during in vitro vascular differentiation from embryonic stem cells: functional roles of HoxD8 in lymphangiogenesis. J Cell Sci 122(Pt 21): 3923-3930. Heger A, Webber C, Goodson M, Ponting CP, Lunter G. 2013. GAT: a simulation framework for testing the association of genomic intervals. Bioinformatics 29(16): 2046-2048. Heinz S, Romanoski CE, Benner C, Allison Ka, Kaikkonen MU, Orozco LD, Glass CK. 2013. Effect of natural genetic variation on enhancer selection and function. Nature. Hess J, Angel P, Schorpp-Kistner M. 2004. AP-1 subunits: quarrel and harmony among siblings. J Cell Sci 117(Pt 25): 5965-5973. 92

Hilberg F, Aguzzi A, Howells N, Wagner EF. 1993. c-jun is essential for normal mouse development and hepatogenesis. Nature 365(6442): 179-181. Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA. 2009. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A 106(23): 9362-9367. Hoffman MM, Buske OJ, Wang J, Weng Z, Bilmes JA, Noble WS. 2012. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nat Methods 9(5): 473-476. Hoogaars WM, Tessari A, Moorman AF, de Boer PA, Hagoort J, Soufan AT, Campione M, Christoffels VM. 2004. The transcriptional repressor Tbx3 delineates the developing central conduction system of the heart. Cardiovasc Res 62(3): 489-499. Hsu F, Kent WJ, Clawson H, Kuhn RM, Diekhans M, Haussler D. 2006. The UCSC Known Genes. Bioinformatics 22(9): 1036-1046. Hsu JC, Bravo R, Taub R. 1992. Interactions among LRF-1, JunB, c-Jun, and c-Fos define a regulatory program in the G1 phase of liver regeneration. Mol Cell Biol 12(10): 4654- 4665. Ikawa T, Kawamoto H, Goldrath AW, Murre C. 2006. E proteins and Notch signaling cooperate to promote T cell lineage specification and commitment. J Exp Med 203(5): 1329-1342. James AC, Szot JO, Iyer K, Major JA, Pursglove SE, Chapman G, Dunwoodie SL. 2014. Notch4 reveals a novel mechanism regulating Notch signal transduction. Biochim Biophys Acta 1843(7): 1272-1284. Jeon JY, Kim HA, Kim SH, Park HS, Suh CH. 2010. Interleukin 6 gene polymorphisms are associated with systemic lupus erythematosus in Koreans. The Journal of Rheumatology 37(11): 2251-2258. Jinek M, Chylinski K, Fonfara I, Hauer M, Doudna JA, Charpentier E. 2012. A Programmable Dual-RNA-Guided DNA Endonuclease in Adaptive Bacterial Immunity. Science 337: 816-821. Kaikkonen MU, Niskanen H, Romanoski CE, Kansanen E, Kivela AM, Laitalainen J, Heinz S, Benner C, Glass CK, Yla-Herttuala S. 2014. 
Control of VEGF--a transcriptional programs by pausing and genomic compartmentalization. Nucleic Acids Res 42(20): 12570-12584. Kim T-H, Li F, Ferreiro-Neira I, Ho L-L, Luyten A, Nalapareddy K, Long H, Verzi M, Shivdasani Ra. 2014. Broadly permissive intestinal chromatin underlies lateral inhibition and cell plasticity. Nature. Kolesnikov N, Hastings E, Keays M, Melnichuk O, Tang YA, Williams E, Dylag M, Kurbatova N, Brandizi M, Burdett T et al. 2015. ArrayExpress update--simplifying data submissions. Nucleic Acids Res 43(Database issue): D1113-1116. Koskinas KC, Chatzizisis YS, Antoniadis AP, Giannoglou GD. 2012. Role of endothelial shear stress in stent restenosis and thrombosis: pathophysiologic mechanisms and implications for clinical translation. Journal of the American College of Cardiology 59(15): 1337- 1349. Kourtzelis I, Ferreira A, Mitroulis I, Ricklin D, Bornstein SR, Waskow C, Lambris JD, Chavakis T. 2015. Complement inhibition in a xenogeneic model of interactions between human whole blood and porcine endothelium. Hormone and Metabolic Research 47(1): 36-42. Kutschera S, Weber H, Weick A, De Smet F, Genove G, Takemoto M, Prahst C, Riedel M, Mikelis C, Baulande S et al. 2011. Differential endothelial transcriptomics identifies semaphorin 3G as a vascular class 3 semaphorin. Arterioscler Thromb Vasc Biol 31(1): 151-159. 93

Landt SG, Marinov GK, Kundaje A, Kheradpour P, Pauli F, Batzoglou S, Bernstein BE, Bickel P, Brown JB, Cayting P et al. 2012. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Research 22: 1813-1831. Larson MG, Atwood LD, Benjamin EJ, Cupples LA, D'Agostino RB, Sr., Fox CS, Govindaraju DR, Guo CY, Heard-Costa NL, Hwang SJ et al. 2007. Framingham Heart Study 100K project: genome-wide associations for cardiovascular disease outcomes. BMC Med Genet 8 Suppl 1: S5. Leach HG, Chrobak I, Han R, Trojanowska M. 2013. Endothelial cells recruit macrophages and contribute to a fibrotic milieu in bleomycin lung injury. American Journal of Respiratory Cell and Molecular Biology 49(6): 1093-1101. Li CX, Poznansky MJ. 1990. Characterization of the ZO-1 protein in endothelial and other cell lines. J Cell Sci 97 ( Pt 2): 231-237. Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14): 1754-1760. Lindblad-Toh K, Garber M, Zuk O, Lin MF, Parker BJ, Washietl S, Kheradpour P, Ernst J, Jordan G, Mauceli E et al. 2011. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478(7370): 476-482. Linnemann AK, O'Geen H, Keles S, Farnham PJ, Bresnick EH. 2011. Genetic framework for GATA factor function in vascular biology. Proceedings of the National Academy of Sciences of the United States of America 108: 13641-13646. Liu ZJ, Tan Y, Beecham GW, Seo DM, Tian R, Li Y, Vazquez-Padron RI, Pericak-Vance M, Vance JM, Goldschmidt-Clermont PJ et al. 2012. Notch activation induces endothelial cell senescence and pro-inflammatory response: implication of Notch signaling in atherosclerosis. Atherosclerosis 225(2): 296-303. Loh YH, Wu Q, Chew JL, Vega VB, Zhang W, Chen X, Bourque G, George J, Leong B, Liu J et al. 2006. The Oct4 and Nanog transcription network regulates pluripotency in mouse embryonic stem cells. Nat Genet 38(4): 431-440. Maezawa Y, Takemoto M, Yokote K. 2015. 
Cell biology of diabetic nephropathy: Roles of endothelial cells, tubulointerstitial cells and podocytes. Journal of Diabetes Investigation 6(1): 3-15. Mai J, Virtue A, Shen J, Wang H, Yang X-F. 2013. An evolving new paradigm: endothelial cells--conditional innate immune cells. Journal of Hematology & Oncology 6: 61-74. Mathelier A, Zhao X, Zhang AW, Parcy F, Worsley-Hunt R, Arenillas DJ, Buchman S, Chen CY, Chou A, Ienasescu H et al. 2014. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res 42(Database issue): D142-147. Matouk CC, Marsden PA. 2008. Epigenetic Regulation of Vascular Endothelial Gene Expression. Circulation Research 102: 873-887. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J et al. 2012. Systematic localization of common disease- associated variation in regulatory DNA. Science (New York, NY) 337: 1190-1195. Mclean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM, Bejerano G. 2010. GREAT improves functional interpretation of cis-regulatory regions. Nature Biotechnology 28: 1630-1639. Medina-Rivera A, Defrance M, Sand O, Herrmann C, Castro-Mondragon JA, Delerce J, Jaeger S, Blanchet C, Vincens P, Caron C et al. 2015. RSAT 2015: Regulatory Sequence Analysis Tools. Nucleic Acids Research: 1-7. Miwa Y, Yamamoto K, Onishi A, Iwamoto M, Yazaki S, Haneda M, Iwasaki K, Liu D, Ogawa H, Nagasaka T et al. 2010. Potential value of human thrombomodulin and DAF 94

expression for coagulation control in pig-to-human xenotransplantation. Xenotransplantation 17(1): 26-37. Morange P-E, Trégouët D-a. 2013. Current knowledge on the genetics of incident venous thrombosis. Journal of thrombosis and haemostasis : JTH 11 Suppl 1: 111-121. Morbidelli L, Donnini S, Ziche M. 2003. Role of nitric oxide in the modulation of angiogenesis. Current Pharmaceutical Design 9(7): 521-530. Morris SM, Baek JY, Koszarek A, Kanngurn S, Knoblaugh SE, Grady WM. 2012. Transforming growth factor-beta signaling promotes hepatocarcinogenesis induced by loss. Hepatology 55(1): 121-131. Nakatsu MN, Sainson RC, Aoto JN, Taylor KL, Aitkenhead M, Perez-del-Pulgar S, Carpenter PM, Hughes CC. 2003. Angiogenic sprouting and capillary lumen formation modeled by human umbilical vein endothelial cells (HUVEC) in fibrin gels: the role of fibroblasts and Angiopoietin-1. Microvasc Res 66(2): 102-112. Nakayama M, Yasue H, Yoshimura M, Shimasaki Y, Kugiyama K, Ogawa H, Motoyama T, Saito Y, Ogawa Y, Miyamoto Y et al. 1999. T-786-->C mutation in the 5'-flanking region of the endothelial nitric oxide synthase gene is associated with coronary spasm. Circulation 99(22): 2864-2870. Nateri AS, Spencer-Dene B, Behrens A. 2005. Interaction of phosphorylated c-Jun with TCF4 regulates intestinal cancer development. Nature 437(7056): 281-285. Natoli G, Ghisletti S, Barozzi I. 2011. The genomic landscapes of inflammation. Genes & Development 25: 101-106. Nguyen H, Chiasson VL, Chatterjee P, Kopriva SE, Young KJ, Mitchell BM. 2013. Interleukin- 17 causes Rho-kinase-mediated endothelial dysfunction and hypertension. Cardiovasc Res 97(4): 696-704. Nishida K, Harrison DG, Navas JP, Fisher AA, Dockery SP, Uematsu M, Nerem RM, Alexander RW, Murphy TJ. 1992. Molecular cloning and characterization of the constitutive bovine aortic endothelial cell nitric oxide synthase. J Clin Invest 90(5): 2092-2096. Odom DT. 2011. Identification of Transcription Factor-DNA Interactions In Vivo. 
Sub-cellular Biochemistry 52: 175-191. Odom DT, Dowell RD, Jacobsen ES, Nekludova L, Rolfe PA, Danford TW, Gifford DK, Fraenkel E, Bell GI, Young RA. 2006. Core transcriptional regulatory circuitry in human hepatocytes. Molecular Systems Biology 2: 2006-2017. Orford K, Kharchenko P, Lai W, Dao MC, Worhunsky DJ, Ferro A, Janzen V, Park PJ, Scadden DT. 2008. Differential H3K4 methylation identifies developmentally poised hematopoietic genes. Dev Cell 14(5): 798-809. Orr N, Lemnrau A, Cooke R, Fletcher O, Tomczyk K, Jones M, Johnson N, Lord CJ, Mitsopoulos C, Zvelebil M et al. 2012. Genome-wide association study identifies a common variant in RAD51B associated with male breast cancer risk. Nature Genetics 44: 1182-1184. Ostuni R, Piccolo V, Barozzi I, Polletti S, Termanini A, Bonifacio S, Curina A, Prosperini E, Ghisletti S, Natoli G. 2013. Latent enhancers activated by stimulation in differentiated cells. Cell 152: 157-171. Park C, Kim TM, Malik AB. 2013. Transcriptional regulation of endothelial cell and vascular development. Circulation Research 112: 1380-1400. Paten B, Herrero J, Beal K, Fitzgerald S, Birney E. 2008. Enredo and Pecan: genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Res 18(11): 1814-1828.

95

Pennacchio LA, Ahituv N, Moses AM, Prabhakar S, Nobrega MA, Shoukry M, Minovitsky S, Dubchak I, Holt A, Lewis KD et al. 2006. In vivo enhancer analysis of human conserved non-coding sequences. Nature 444(7118): 499-502. Phillips JE, Corces VG. 2009. CTCF: master weaver of the genome. Cell 137(7): 1194-1211. Quail DF, Joyce JA. 2013. Microenvironmental regulation of tumor progression and metastasis. Nat Med 19(11): 1423-1437. Ran FA, Hsu PD, Wright J, Agarwala V, Scott Da, Zhang F. 2013. Genome engineering using the CRISPR-Cas9 system. Nature Protocols 8: 2281-2308. Reimand J, Kull M, Peterson H, Hansen J, Vilo J. 2007. g:Profiler--a web-based toolset for functional profiling of gene lists from large-scale experiments. Nucleic Acids Research 35(Web Server Issue): W193-200. Roadmap Epigenomics C, Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi- Moussavi A, Kheradpour P, Zhang Z, Wang J et al. 2015. Integrative analysis of 111 reference human epigenomes. Nature 518(7539): 317-330. Robson A, Allinson KR, Anderson RH, Henderson DJ, Arthur HM. 2010. The TGFbeta type II receptor plays a critical role in the endothelial cells during cardiac development. Dev Dyn 239(9): 2435-2442. Romanoski CE, Glass CK, Stunnenberg HG, Wilson L, Almouzni G. 2015. Epigenomics: Roadmap for regulation. Nature 518(7539): 314-316. Rzeczkowska PA, Hou H, Wilson MD, Palmert MR. 2014. Epigenetics: a new player in the regulation of mammalian puberty. Neuroendocrinology 99(3-4): 139-155. Sana TR, Janatpour MJ, Sathe M, McEvoy LM, McClanahan TK. 2005. Microarray analysis of primary endothelial cells challenged with different inflammatory and immune cytokines. Cytokine 29: 256-269. Scheller J, Chalaris A, Schmidt-Arras D, Rose-John S. 2011. The pro- and anti-inflammatory properties of the cytokine interleukin-6. Biochim Biophys Acta 1813(5): 878-888. Schmidt D, Wilson MD, Ballester B, Schwalie PC, Brown GD, Marshall A, Kutter C, Watt S, Martinez-Jimenez CP, Mackay S et al. 2010. 
Five-Vertebrate ChIP-seq Reveals the Evolutionary Dynamics of Transcription Factor Binding. Science 328: 1036-1040. Schmidt D, Wilson MD, Spyrou C, Brown GD, Hadfield J, Odom DT. 2009. ChIP-seq: using high-throughput sequencing to discover protein-DNA interactions. Methods (San Diego, Calif) 48: 240-248. Scott DW, Vallejo MO, Patel RP. 2013. Heterogenic endothelial responses to inflammation: role for differential N-glycosylation and vascular bed of origin. Journal of the American Heart Association 2(4): e000263. Seger R, Krebs EG. 1995. The MAPK signaling cascade. FASEB J 9(9): 726-735. Shalaby F, Rossant J, Yamaguchi TP, Gertsenstein M, Wu XF, Breitman ML, Schuh AC. 1995. Failure of blood-island formation and vasculogenesis in Flk-1-deficient mice. Nature 376(6535): 62-66. Sharrocks AD. 2001. The ETS-domain transcription factor family. Nature Reviews Molecular Cell Biology 2(11): 827-837. Shaulian E, Karin M. 2002. AP-1 as a regulator of cell life and death. Nature Cell Biology 4(5): E131-136. Shen T, Zhu Y, Patel J, Ruan Y, Chen B, Zhao G, Cao Y, Pang J, Guo H, Li H et al. 2013. T- box20 suppresses oxidized low-density lipoprotein-induced human vascular endothelial cell injury by upregulation of PPAR-gamma. Cellular Physiology and Biochemistry: international journal of experimental cellular physiology, biochemistry, and pharmacology 32(5): 1137-1150.

96

Shimoyama M, De Pons J, Hayman GT, Laulederkind SJ, Liu W, Nigam R, Petri V, Smith JR, Tutaj M, Wang SJ et al. 2015. The Rat Genome Database 2015: genomic, phenotypic and environmental variations and disease. Nucleic Acids Res 43(Database issue): D743- 750. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S et al. 2005. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 15(8): 1034-1050. Singh KK, Rommel K, Mishra A, Karck M, Haverich A, Schmidtke J, Arslan-Kirchner M. 2006. TGFBR1 and TGFBR2 mutations in patients with features of Marfan syndrome and Loeys-Dietz syndrome. Human Mutation 27: 770-777. Solomon SS, Majumdar G, Martinez-Hernandez A, Raghow R. 2008. A critical role of in regulating gene expression in response to insulin and other hormones. Life Sciences 83: 305-312. Spivakov M, Akhtar J, Kheradpour P, Beal K, Girardot C, Koscielny G, Herrero J, Kellis M, Furlong EE, Birney E. 2012. Analysis of variation at transcription factor binding sites in Drosophila and humans. Genome Biol 13(9): R49. Stark R, Brown GD. 2011. DiffBind: differential binding analysis of ChIP-Seq peak data., http://bioconductor.org/packages/release/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf. Stenson PD, Mort M, Ball EV, Shaw K, Phillips A, Cooper DN. 2014. The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine. Hum Genet 133(1): 1-9. Stergachis AB, Neph S, Sandstrom R, Haugen E, Reynolds AP, Zhang M, Byron R, Canfield T, Stelhing-Sun S, Lee K et al. 2014. Conservation of trans-acting circuitry during mammalian regulatory evolution. Nature 515(7527): 365-370. Steyers CM, Miller FJ. 2014. Endothelial dysfunction in chronic inflammatory diseases. International Journal of Molecular Sciences 15: 11324-11349. Sturtzel C, Testori J, Schweighofer B, Bilban M, Hofer E. 2014. 
The transcription factor MEF2C negatively controls angiogenic sprouting of endothelial cells depending on oxygen. PLoS One 9(7): e101521. Sun L, Rommens JM, Corvol H, Li W, Li X, Chiang Ta, Lin F, Dorfman R, Busson P-F, Parekh RV et al. 2012. Multiple apical plasma membrane constituents are associated with susceptibility to meconium ileus in individuals with . Nature Genetics 44: 562-569. Suzuki A, Hamada K, Sasaki T, Mak TW, Nakano T. 2007. Role of PTEN/PI3K pathway in endothelial cells. Biochem Soc Trans 35(Pt 2): 172-176. Tagle DA, Koop BF, Goodman M, Slightom JL, Hess DL, Jones RT. 1988. Embryonic epsilon and gamma globin genes of a prosimian primate (Galago crassicaudatus). Nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints. Journal of molecular biology 203: 439-455. Thomas-Chollier M, Herrmann C, Defrance M, Sand O, Thieffry D, van Helden J. 2012. RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets. Nucleic Acids Res 40(4): e31. Tornavaca O, Chia M, Dufton N, Almagro LO, Conway DE, Randi AM, Schwartz MA, Matter K, Balda MS. 2015. ZO-1 controls endothelial adherens junctions, cell-cell tension, angiogenesis, and barrier formation. J Cell Biol 208(6): 821-838. Tran TA, Kinch L, Pena-Llopis S, Kockel L, Grishin N, Jiang H, Brugarolas J. 2013. Platelet- derived growth factor/vascular endothelial growth factor receptor inactivation by sunitinib results in Tsc1/Tsc2-dependent inhibition of TORC1. Mol Cell Biol 33(19): 3762-3779. 97

Turner DA, Paszek P, Woodcock DJ, Nelson DE, Horton CA, Wang Y, Spiller DG, Rand DA, White MRH, Harper CV. 2010. Physiological levels of TNFα stimulation induce stochastic dynamics of NF-κB responses in single living cells. Journal of Cell Science 123: 2834-2843. Uhlen M, Fagerberg L, Hallstrom BM, Lindskog C, Oksvold P, Mardinoglu A, Sivertsson A, Kampf C, Sjostedt E, Asplund A et al. 2015. Proteomics. Tissue-based map of the human proteome. Science 347(6220): 1260419. Vastenhouw NL, Schier AF. 2012. Bivalent histone modifications in early embryogenesis. Current Opinion in Cell Biology 24(3): 374-386. Verissimo AR, Herbert JM, Heath VL, Legg JA, Sheldon H, Andre M, Swain RK, Bicknell R. 2009. Functionally defining the endothelial transcriptome, from Robo4 to ECSCR. Biochem Soc Trans 37(Pt 6): 1214-1217. Vernot B, Stergachis AB, Maurano MT, Vierstra J, Neph S, Thurman RE, Stamatoyannopoulos JA, Akey JM. 2012. Personal and population genomics of human regulatory variation. Genome Res 22(9): 1689-1697. Villar D, Berthelot C, Aldridge S, Rayner TF, Lukk M, Pignatelli M, Park TJ, Deaville R, Erichsen JT, Jasinska AJ et al. 2015. Enhancer Evolution across 20 Mammalian Species. Cell 160(3): 554-566. Villar D, Flicek P, Odom DT. 2014. Evolution of transcription factor binding in metazoans - mechanisms and functional implications. Nature Reviews Genetics 15: 221-233. Visel A, Minovitsky S, Dubchak I, Pennacchio LA. 2007. VISTA Enhancer Browser--a database of tissue-specific human enhancers. Nucleic Acids Res 35(Database issue): D88-92. Visel A, Prabhakar S, Akiyama JA, Shoukry M, Lewis KD, Holt A, Plajzer-Frick I, Afzal V, Rubin EM, Pennacchio LA. 2008. Ultraconservation identifies a small subset of extremely constrained developmental enhancers. Nat Genet 40(2): 158-160. Voyta JC, Via DP, Butterfield CE, Zetter BR. 1984. Identification and isolation of endothelial cells based on their increased uptake of acetylated-low density lipoprotein. 
The Journal of Cell Biology 99: 2034-2040. Wang HJ, Huang HC, Chuang YC, Liao PJ, Yang DM, Yang WK, Huang H. 2012. Modulation of tissue factor and thrombomodulin expression in human aortic endothelial cells incubated with high glucose. Acta Diabetologica 49(2): 125-130. Wang L, Bhatta A, Toque HA, Rojas M, Yao L, Xu Z, Patel C, Caldwell RB, Caldwell RW. 2014. Arginase inhibition enhances angiogenesis in endothelial cells exposed to hypoxia. Microvasc Res 98C: 1-8. Ward LD, Kellis M. 2012. HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Research 40(Database issue): D930-934. Warde-Farley D, Donaldson S. 2010. The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Research 38: 214-220. Wilson MD, Odom DT. 2009. Evolution of transcriptional control in mammals. Current Opinion in Genetics & Development 19: 579-585. Wisdom R, Johnson RS, Moore C. 1999. c-Jun regulates cell cycle progression and apoptosis by distinct mechanisms. EMBO J 18(1): 188-197. Wythe JD, Dang LTH, Devine WP, Boudreau E, Artap ST, He D, Schachterle W, Stainier DYR, Oettgen P, Black BL et al. 2013. ETS factors regulate Vegf-dependent arterial specification. Developmental Cell 26: 45-58.

98

Xiao S, Xie D, Cao X, Yu P, Xing X, Chen C-C, Musselman M, Xie M, West FD, Lewin Ha et al. 2012. Comparative epigenomic annotation of regulatory DNA. Cell 149: 1381-1392. Xu Z, Yoshida T, Wu L, Maiti D, Cebotaru L, Duh EJ. 2015. Transcription factor MEF2C suppresses endothelial cell inflammation via regulation of NF-κB and KLF2. J Cell Physiol 230(6): 1310-1320. Yue F Cheng Y Breschi A Vierstra J Wu W Ryba T Sandstrom R Ma Z Davis C Pope BD et al. 2014. A comparative encyclopedia of DNA elements in the mouse genome. Nature 515: 355-364. Zeng Y, Tarbell JM. 2014. The adaptive remodeling of endothelial glycocalyx in response to fluid shear stress. PLoS One 9(1): e86249. Zhang B, Day DS, Ho JW, Song L, Cao J, Christodoulou D, Seidman JG, Crawford GE, Park PJ, Pu WT. 2013. A dynamic H3K27ac signature identifies VEGFA-stimulated endothelial enhancers and requires EP300 activity. Genome Research 23: 917-927. Zhang X, Lynch AI, Davis BR, Ford CE, Boerwinkle E, Eckfeldt JH, Leiendecker-Foster C, Arnett DK. 2012. Pharmacogenetic association of NOS3 variants with cardiovascular disease in patients with hypertension: the GenHAT study. PLoS One 7(3): e34217. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W et al. 2008. Model-based analysis of ChIP-Seq (MACS). Genome Biol 9(9): R137. Zhao H, Anand AR, Ganju RK. 2014. Slit2-Robo4 pathway modulates lipopolysaccharide- induced endothelial inflammation and its expression is dysregulated during endotoxemia. Journal of Immunology 192(1): 385-393. Zinzen RP, Girardot C, Gagneur J, Braun M, Furlong EE. 2009. Combinatorial binding predicts spatio-temporal cis-regulatory activity. Nature 462(7269): 65-70. Zuin J, Dixon JR, van der Reijden MI, Ye Z, Kolovos P, Brouwer RW, van de Corput MP, van de Werken HJ, Knoch TA, van IWF et al. 2014. Cohesin and CTCF differentially affect chromatin architecture and gene expression in human cells. Proc Natl Acad Sci U S A 111(3): 996-1001.

Appendix

Supplementary Table 1: HAEC ChIP-seq library sequencing statistics


Supplementary Table 2: HUVEC and HepG2 ChIP-seq library information


Supplementary Table 3: HAEC ChIP-seq quality control


Supplementary Table 4: Regulatory HGMD variants in HAEC-only and HUVEC-only H3K27ac and JUN sites. Supplementary Table 4 Legend: HAEC-only and HUVEC-only regulatory regions were overlapped with known regulatory disease variants from HGMD. There are 37 variants in HAEC-only H3K27ac regions, 36 in HUVEC-only H3K27ac regions, and 16 in HAEC-only JUN sites. The genomic coordinates of each variant are shown, along with the rs identification number if available. The HGMD evidence category is shown for each variant: disease-causing mutations (DM; in DM?, the question mark denotes that a degree of doubt exists with regard to pathogenicity), disease-associated polymorphisms (DP), disease-associated polymorphisms with additional supporting functional evidence (DFP), and in vitro/laboratory or in vivo functional polymorphisms (FP). Disease associations based on the original publication and the corresponding PubMed IDs are displayed for each variant.

Genomic Position (hg19) | Categories Defined by HGMD | rs ID (reference SNP) | Gene | Association | PubMed ID

HAEC-only H3K27ac peaks with regulatory HGMD variants (N=37) in 25 unique genes
chr1:21617244-21617245 | DFP | rs213045 | ECE1 | Ambulatory blood pressure | 12566389
chr1:110229943-110229944 | DFP | rs412543 | GSTM1 | Breast cancer: reduced risk | 19228880
chr1:114415367-114415368 | DP | rs2488457 | PTPN22 | Diabetes: type 1 | 16470599
chr1:228337560-228337561 | DM | NULL | GJC2 | Pelizaeus-Merzbacher-like disease | 20695017
chr3:30647159-30647160 | DFP | rs3087465 | TGFBR2 | Gastric cancer: reduced risk | 17187359
chr3:30648038-30648039 | DM? | NULL | TGFBR2 | Marfan syndrome II | 16799921
chr3:30735936-30735937 | DFP | rs744751 | TGFBR2 | Leprosy | 21917900
chr3:49395756-49395757 | FP | rs1800668 | GPX1 | Reduced transcriptional activity | 15331559
chr3:49396359-49396360 | FP | rs3811699 | GPX1 | Reduced transcriptional activity | 15331559
chr4:55991730-55991731 | FP | rs7667298 | KDR | Increased mRNA expression | 19435508
chr4:55992365-55992366 | DFP | rs2071559 | KDR | Coronary heart disease | 17707181
chr4:145567826-145567827 | DM | rs201404563 | HHIP | Pituitary hormone deficiency: combined | 22897141
chr5:70220891-70220892 | DM | NULL | SMN1 | Spinal muscular atrophy | 20564270
chr6:31526855-31526856 | DFP | rs13192469 | LTA | Leprosy | 22071774
chr6:33588146-33588147 | DP | rs3748079 | ITPR3 | Systemic lupus erythematosus | 18219441
chr7:35293779-35293780 | DM | NULL | TBX20 | Ventricular septal defect | 22465533
chr7:150689969-150689970 | DP | rs2243311 | NOS3 | Limb deficiency defect | 16906563
chr7:150690078-150690079 | DFP | rs2070744 | NOS3 | Coronary spasm | 10359729
chr7:150690175-150690176 | DP | rs3918226 | NOS3 | Hypertension | 22184326
chr8:23082970-23082971 | DFP | rs13278062 | TNFRSF10A | Bladder cancer | 19070628
chr9:139565149-139565150 | FP | rs4636297 | MIR126 | Reduced expression | 20621067
chr11:321016-321017 | DFP | rs3888188 | IFITM3 | Tuberculosis | 23874452
chr11:10325324-10325325 | FP | rs11042725 | ADM | Altered transcriptional activity | 19636336
chr12:6451589-6451590 | DP | rs4149570 | TNFRSF1A | Kawasaki disease | 21315128
chr12:69201796-69201797 | DFP | rs937282 | MDM2 | Bladder cancer: increased risk | 18519798
chr12:69202555-69202556 | DFP | rs117039649 | MDM2 | Breast cancer: reduced risk | 21316605
chr12:69202579-69202580 | DFP | rs2279744 | MDM2 | Cancer susceptibility | 15550242
chr14:23305648-23305649 | DFP | rs1004030 | MMP14 | Focal segmental glomerulosclerosis: reduced risk | 18927121
chr14:23305662-23305663 | DP | rs1003349 | MMP14 | Focal segmental glomerulosclerosis: reduced risk | 18927121
chr16:69761660-69761661 | DFP | rs689455 | NQO1 | ALI after major trauma: reduced risk | 19017358
chr17:42078967-42078968 | DM | NULL | NAGS | Hyperammonaemia: carbamylglutamate responsive | 21681857
chr19:49457937-49457938 | DFP | rs4645878 | BAX | Disease progression in chronic lymphocytic leukaemia | 12359369
chr20:23030291-23030292 | DFP | rs16984852 | THBD | Venous thrombosis: increased risk | 23332921
chr20:23030342-23030343 | DM? | rs13306848 | THBD | Myocardial infarction | 9236408
chr20:23030442-23030443 | DM | NULL | THBD | Myocardial infarction | 9236408
chr20:34025982-34025983 | DFP | rs143383 | GDF5 | Osteoarthritis | 17384641
chr20:34026067-34026068 | FP | NULL | GDF5 | Increased gene expression | 22929025

HUVEC-only H3K27ac peaks with regulatory HGMD variants (N=36) in 21 unique genes
chr1:145507645-145507646 | FP | rs139428292 | RBM8A | Reduced promoter activity | 22366785
chr1:145507764-145507765 | FP | rs201779890 | RBM8A | Reduced promoter activity | 22366785
chr1:160312264-160312265 | DFP | rs10752637 | NCSTN | Alzheimer disease | 19840113
chr1:192777806-192777807 | DFP | rs2746072 | RGS2 | Metabolic syndrome: susceptibility | 17143182
chr2:175629599-175629600 | DFP | rs16862847 | CHRNA1 | Myasthenia gravis: early onset | 17687331
chr3:37034769-37034770 | DM? | rs35032294 | MLH1 | Colorectal cancer: non-polyposis | 16830052
chr3:37034931-37034932 | DM | NULL | MLH1 | Colorectal cancer: non-polyposis | 17690979
chr3:37034945-37034946 | DP | rs1800734 | MLH1 | Lung cancer: risk | 15382050
chr3:37034974-37034975 | DM? | NULL | MLH1 | Colorectal cancer: non-polyposis | 21615986
chr3:37034996-37034997 | DM | rs41285097 | MLH1 | Colorectal cancer: non-polyposis | 11726306
chr3:37035010-37035011 | DM? | rs56198082 | MLH1 | Colorectal cancer: non-polyposis | 11726306
chr3:37035010-37035011 | DM? | NULL | MLH1 | Colorectal cancer: non-polyposis | 14517962
chr3:37035011-37035012 | DM | NULL | MLH1 | Colorectal cancer: non-polyposis | 21840485
chr3:115377645-115377646 | DM | NULL | GAP43 | | 22138049
chr5:159894846-159894847 | DFP | rs57095329 | MIR146A | Systemic lupus erythematosus: increased risk | 21738483
chr6:135375990-135375991 | DP | rs2297339 | HBS1L | Severity in thalassaemia | 18839276
chr6:163148720-163148721 | DM | NULL | PARK2 | Parkinson disease: autosomal recessive | 11971093
chr6:163149054-163149055 | DFP | rs9347683 | PARK2 | Parkinsonism: late-onset | 12374768
chr6:163149054-163149055 | DP | rs9347683 | PACRG | Infertility: male | 19268936
chr7:92157801-92157802 | FP | rs12386703 | PEX1 | Altered gene expression | 16088892
chr7:92157885-92157886 | FP | rs12386601 | PEX1 | Altered gene expression | 16088892
chr8:119124442-119124443 | FP | rs34016643 | EXT1 | Increased promoter activity | 22037484
chr13:60738071-60738072 | DM | NULL | DIAPH3 | Auditory neuropathy | 20624953
chr14:75745092-75745093 | DM? | NULL | FOS | Lipodystrophy: congenital generalized | 23919306
chr16:3115110-3115111 | DP | rs28372698 | IL32 | Thyroid carcinoma: increased risk | 23486016
chr16:88718095-88718096 | DFP | rs72811418 | CYBA | Essential hypertension | 17620958
chr16:88718352-88718353 | DFP | rs9932581 | CYBA | Reduced promoter activity | 12729892
chr17:66508598-66508599 | DM | NULL | PRKAR1A | Carney complex | 12424709
chr19:44079686-44079687 | DFP | rs3213245 | XRCC1 | Increased lung cancer risk | 16652158
chr22:42015764-42015765 | DFP | rs5751129 | XRCC6 | Pterygium | 17768380
chr22:42016698-42016699 | DFP | rs2267437 | XRCC6 | Renal cell carcinoma: increased risk | 22593040
chr22:42017263-42017264 | DFP | rs132770 | XRCC6 | Renal cell carcinoma: increased risk | 22455395
chrX:23799737-23799738 | DFP | rs6526342 | SAT1 | Suicide | 16389195
chrX:100662900-100662901 | FP | rs2071225 | GLA | Reduced activity | 18979223
chrX:100662920-100662921 | FP | rs3027584 | GLA | Increased transcription | 7672123
chrX:100663463-100663464 | DM? | NULL | GLA | Parkinson disease | 21683120


HAEC-only JUN peaks with regulatory HGMD variants (N=16) in 14 unique genes
chr1:11072690-11072691 | DM? | rs11121679 | TARDBP | Amyotrophic lateral sclerosis | 19695877
chr1:202927506-202927507 | DFP | rs7517286 | ADIPOR1 | Lower insulin resistance | 18466348
chr1:207925191-207925192 | DP | rs2796268 | CD46 | Haemolytic uraemic syndrome | 15661753
chr3:49395756-49395757 | FP | rs1800668 | GPX1 | Reduced transcriptional activity | 15331559
chr7:75931812-75931813 | DM | NULL | HSPB1 | Amyotrophic lateral sclerosis | 17623484
chr7:90893971-90893972 | FP | rs2232157 | FZD1 | Altered transcription factor binding | 20051274
chr7:116312438-116312439 | DFP | rs1858830 | MET | Autism | 17053076
chr7:150690078-150690079 | DFP | rs2070744 | NOS3 | Coronary spasm | 10359729
chr7:150690175-150690176 | DP | rs3918226 | NOS3 | Hypertension | 22184326
chr11:61735060-61735061 | DM | NULL | FTH1 | Iron overload | 11389486
chr11:61735175-61735176 | FP | rs114778979 | FTH1 | Reduced expression | 16797877
chr14:90863356-90863357 | DFP | rs12885713 | CALM1 | Osteoarthritis | 15746150
chr15:89878390-89878391 | DFP | rs2856268 | POLG | Breast cancer: protection against | 22684821
chr17:76356095-76356096 | DP | rs12953258 | SOCS3 | Increased whole-body insulin sensitivity | 15249995
chr19:42364186-42364187 | DM? | NULL | RPS19 | Diamond-Blackfan anaemia | 20054847
chr22:37415491-37415492 | FP | NULL | TST | Reduced promoter activity | 16790311
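
The overlap of HGMD variants with cell-type-specific peaks, as described in the Supplementary Table 4 legend, can be illustrated with a minimal Python sketch. It is not the pipeline used in this thesis. The function names and the peak coordinates are hypothetical, intervals are assumed to be half-open (start inclusive, end exclusive), and only the two variant positions are taken from the table.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(peaks):
    """Group (chrom, start, end) peaks by chromosome, sorted by start."""
    index = defaultdict(list)
    for chrom, start, end in peaks:
        index[chrom].append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def variant_in_peak(index, chrom, start, end):
    """Return True if [start, end) overlaps any peak by at least 1 bp."""
    intervals = index.get(chrom, [])
    # Only peaks that start before the variant ends can overlap it.
    n_candidates = bisect_right(intervals, (end - 1, float("inf")))
    return any(p_end > start for _, p_end in intervals[:n_candidates])


# Hypothetical peak coordinates, invented for illustration only.
peaks = [("chr7", 150689000, 150691000), ("chr3", 49395000, 49397000)]
index = build_index(peaks)

# Variant positions as listed in Supplementary Table 4 (hg19).
variants = {
    "rs2070744": ("chr7", 150690078, 150690079),
    "rs143383": ("chr20", 34025982, 34025983),
}
hits = {rs: variant_in_peak(index, *pos) for rs, pos in variants.items()}
print(hits)
```

With these invented peaks, only rs2070744 is reported as overlapping, because the toy peak set contains no interval on chr20.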


Supplementary Table 5: List of regulatory region definitions


Supplementary Figure 1: Inter-replicate cross correlation shows that HAEC and HUVEC libraries cluster based on ChIP-seq factor. Supplementary Figure 1 Legend: Peaks from each library were pooled into a master set, and for each library the number of sequencing reads overlapping each master-set peak by at least 1 base pair was counted. The histogram (bottom left) shows the intensity of the Pearson correlation in green. Hierarchical clustering based on Pearson correlation values is shown in the dendrogram (left), and the legend at the top indicates tissue type (pink = HUVEC, blue = HAEC), factor, and replicate. Distinct clusters include 1) an outgroup of repressed regions marked by H3K27me3, 2) gene bodies marked by H3K36me3, and 3) insulated regions marked by the chromatin modulator CTCF.
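
As a rough illustration of the correlation step in this legend, the sketch below computes pairwise Pearson correlations between libraries from read counts over a shared set of peaks. The read counts are invented and the library names are only illustrative labels. The code shows the general idea, not the software used to produce the figure, and it omits the hierarchical clustering step.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical read counts per master-set peak (one list per library).
counts = {
    "HAEC_H3K27ac_rep1": [120, 15, 300, 8, 42],
    "HAEC_H3K27ac_rep2": [110, 20, 280, 10, 50],
    "HUVEC_H3K27ac_rep1": [100, 25, 310, 5, 38],
    "HAEC_H3K27me3_rep1": [3, 90, 6, 150, 12],
}

# Symmetric library-by-library correlation matrix.
matrix = {a: {b: pearson(counts[a], counts[b]) for b in counts} for a in counts}

for name, row in matrix.items():
    print(name, " ".join(f"{value:+.2f}" for value in row.values()))
```

In this toy data the H3K27ac libraries correlate strongly with each other and negatively with the H3K27me3 library, which mirrors the factor-based grouping described in the legend.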


Supplementary Table 6: RAEC and BAEC ChIP-seq library sequencing statistics

Supplementary Table 7: RAEC and BAEC ChIP-seq quality control


Supplementary Table 8: Top 10 g:Profiler enrichments in GO Terms used to describe genes from Supplementary Table 4. Supplementary Table 8 Legend: Genes annotated to regulatory HGMD variants in HAEC-only H3K27ac sites (Section A) and HAEC-only JUN sites (Section B) were submitted to g:Profiler (Reimand et al. 2007) for gene set enrichment analysis. Genes from HUVEC-only H3K27ac sites were not significantly enriched and are not included in the table. The number of genes within each GO term is shown in the term column, and the percentage of query genes covered is shown together with the corrected hypergeometric p-values. Genes that are enriched in each GO term are shown in the last column.

A) Category enrichments for 25 unique genes annotated to HGMD variants in HAEC-only H3K27ac peaks
GO Term Name | # Term Genes | # Query Genes | # Common Genes | % Genes | Corrected p-value | GO Term ID | Common genes
response to organic substance | 2742 | 25 | 17 | 68 | 1.28E-06 | GO:0010033 | TNFRSF1A,BAX,ITPR3,GDF5,KDR,PTPN22,MDM2,IFITM3,ADM,MMP14,TGFBR2,HHIP,NOS3,THBD,NQO1,LTA,GPX1
morphogenesis of a branching epithelium | 197 | 25 | 7 | 28 | 7.37E-06 | GO:0061138 | BAX,KDR,ADM,MMP14,TGFBR2,HHIP,TBX20
angiogenesis | 411 | 25 | 8 | 32 | 5.19E-05 | GO:0001525 | BAX,KDR,ADM,MMP14,TGFBR2,TBX20,NOS3,GPX1
branching morphogenesis of an epithelial tube | 164 | 25 | 6 | 24 | 9.56E-05 | GO:0048754 | BAX,KDR,MMP14,TGFBR2,HHIP,TBX20
extrinsic apoptotic signaling pathway via death domain receptors | 80 | 25 | 5 | 20 | 9.99E-05 | GO:0008625 | TNFRSF1A,BAX,TNFRSF10A,NOS3,GPX1
cell proliferation | 1901 | 25 | 13 | 52 | 1.77E-04 | GO:0008283 | BAX,GDF5,KDR,PTPN22,MDM2,ADM,MMP14,TGFBR2,HHIP,TBX20,NOS3,LTA,GPX1
negative regulation of biological process | 4438 | 25 | 18 | 72 | 2.63E-04 | GO:0048519 | TNFRSF1A,BAX,TNFRSF10A,GDF5,KDR,PTPN22,MDM2,IFITM3,ADM,MMP14,TGFBR2,HHIP,TBX20,NOS3,THBD,NQO1,LTA,GPX1
regulation of signaling | 2832 | 25 | 15 | 60 | 2.76E-04 | GO:0023051 | TNFRSF1A,BAX,ITPR3,TNFRSF10A,ECE1,GDF5,KDR,PTPN22,MDM2,ADM,TGFBR2,HHIP,TBX20,NOS3,GPX1
apoptotic process | 1849 | 25 | 12 | 48 | 1.31E-03 | GO:0006915 | TNFRSF1A,BAX,TNFRSF10A,GDF5,KDR,MDM2,ADM,TGFBR2,NOS3,NQO1,LTA,GPX1
regulation of cell communication | 2845 | 25 | 14 | 56 | 2.57E-03 | GO:0010646 | TNFRSF1A,BAX,ITPR3,TNFRSF10A,GDF5,KDR,PTPN22,MDM2,ADM,TGFBR2,HHIP,TBX20,NOS3,GPX1
response to bacterium | 485 | 25 | 7 | 28 | 3.43E-03 | GO:0009617 | TNFRSF1A,PTPN22,ADM,NOS3,THBD,LTA,GPX1
positive regulation of cellular process | 4565 | 25 | 17 | 68 | 3.50E-03 | GO:0048522 | TNFRSF1A,BAX,ITPR3,TNFRSF10A,ECE1,GDF5,KDR,PTPN22,MDM2,ADM,MMP14,TGFBR2,TBX20,NOS3,NQO1,LTA,GPX1
embryo development | 1013 | 25 | 9 | 36 | 4.52E-03 | GO:0009790 | BAX,ECE1,GDF5,KDR,ADM,MMP14,TGFBR2,TBX20,NOS3
positive regulation of multicellular organismal process | 1370 | 25 | 10 | 40 | 6.00E-03 | GO:0051240 | BAX,GDF5,KDR,PTPN22,ADM,TGFBR2,TBX20,NOS3,THBD,LTA
lung development | 192 | 25 | 5 | 20 | 7.81E-03 | GO:0030324 | KDR,MMP14,TGFBR2,HHIP,NOS3
ovulation cycle process | 85 | 25 | 4 | 16 | 7.97E-03 | GO:0022602 | BAX,KDR,MMP14,NOS3
female gonad development | 99 | 25 | 4 | 16 | 1.46E-02 | GO:0008585 | BAX,KDR,MMP14,NOS3
regulation of systemic arterial blood pressure by endothelin | 5 | 25 | 2 | 8 | 3.30E-02 | GO:0003100 | ECE1,NOS3

B) Category enrichments for 14 unique genes annotated to HGMD variants in HAEC-only JUN peaks
GO Term Name | # Term Genes | # Query Genes | # Common Genes | % Genes | Corrected p-value | GO Term ID | Common genes
regulation of response to external stimulus | 775 | 14 | 8 | 57.1 | 3.13E-05 | GO:0032101 | RPS19,MET,HSPB1,CD46,NOS3,SOCS3,CALM1,GPX1
negative regulation of response to stimulus | 1308 | 14 | 8 | 57.1 | 1.79E-03 | GO:0048585 | RPS19,MET,HSPB1,CD46,FZD1,NOS3,SOCS3,GPX1
cellular response to chemical stimulus | 2547 | 14 | 10 | 71.4 | 1.92E-03 | GO:0070887 | RPS19,MET,HSPB1,POLG,FZD1,ADIPOR1,NOS3,SOCS3,CALM1,GPX1
negative regulation of oxidative stress-induced cell death | 31 | 14 | 3 | 21.4 | 2.51E-03 | GO:1903202 | MET,HSPB1,GPX1
homeostatic process | 1491 | 14 | 8 | 57.1 | 4.83E-03 | GO:0042592 | RPS19,MET,HSPB1,POLG,ADIPOR1,FTH1,CALM1,GPX1
endothelial cell migration | 141 | 14 | 4 | 28.6 | 4.85E-03 | GO:0043542 | MET,HSPB1,NOS3,GPX1
regulation of signaling | 2832 | 14 | 10 | 71.4 | 5.22E-03 | GO:0023051 | MET,HSPB1,CD46,TARDBP,FZD1,ADIPOR1,NOS3,SOCS3,CALM1,GPX1
regulation of cell communication | 2845 | 14 | 10 | 71.4 | 5.45E-03 | GO:0010646 | MET,HSPB1,CD46,TARDBP,FZD1,ADIPOR1,NOS3,SOCS3,CALM1,GPX1
response to wounding | 1030 | 14 | 7 | 50 | 5.81E-03 | GO:0009611 | RPS19,HSPB1,CD46,NOS3,SOCS3,CALM1,GPX1
enzyme linked receptor protein signaling pathway | 1115 | 14 | 7 | 50 | 9.84E-03 | GO:0007167 | MET,HSPB1,FZD1,ADIPOR1,NOS3,SOCS3,CALM1
response to gamma radiation | 51 | 14 | 3 | 21.4 | 1.15E-02 | GO:0010332 | POLG,SOCS3,GPX1
positive regulation of endothelial cell chemotaxis | 9 | 14 | 2 | 14.3 | 3.22E-02 | GO:2001028 | MET,HSPB1
cellular response to oxygen-containing compound | 888 | 14 | 6 | 42.9 | 3.92E-02 | GO:1901701 | MET,POLG,FZD1,ADIPOR1,NOS3,SOCS3
negative regulation of cellular protein metabolic process | 889 | 14 | 6 | 42.9 | 3.95E-02 | GO:0032269 | HSPB1,CD46,TARDBP,SOCS3,CALM1,GPX1
positive regulation of cellular process | 4565 | 14 | 11 | 78.6 | 4.60E-02 | GO:0048522 | RPS19,MET,HSPB1,CD46,TARDBP,FZD1,ADIPOR1,NOS3,SOCS3,CALM1,GPX1
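
The hypergeometric test behind the p-values in Supplementary Table 8 can be sketched in a few lines of Python. This is an illustration only: the background size of 20,000 annotated genes is an assumption, and the result is an uncorrected p-value, whereas g:Profiler applies its own background and multiple-testing correction. The numbers printed here are therefore not expected to reproduce the table.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return tail / total


# Counts from the row "regulation of systemic arterial blood pressure
# by endothelin" in Section A: 5 term genes, 25 query genes, 2 in common.
term_genes, query_genes, common_genes = 5, 25, 2
background = 20000  # assumed number of annotated genes (hypothetical)

p_uncorrected = hypergeom_sf(common_genes, background, term_genes, query_genes)
percent_of_query = 100 * common_genes / query_genes

print(f"uncorrected p = {p_uncorrected:.3g}")
print(f"% of query genes = {percent_of_query}")
```

The percentage column in this table corresponds to the common genes divided by the query genes (here 2 of 25, or 8%).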

Supplementary Figure 2: Two variants within the proximal promoter of NOS3 are within a HAEC-only H3K27ac and HAEC-only JUN peak. Supplementary Figure 2 Legend: Regulatory variants rs2070744 and rs3918226 are within a HAEC-only H3K27ac and HAEC-only JUN peak. UCSC screenshot of H3K27ac, JUN, and input ChIP-seq experiments in HAEC (red) and HUVEC (blue). A 5kb genomic window at NOS3 locus (x-axis) and number of sequencing reads per library (50bp) are shown. There is a marked difference in signal between HAEC and HUVEC libraries.


Supplementary Figure 3: Number of human aortic EC peaks shared with rat and/or bovine aortic ECs. Supplementary Figure 3 Legend: Grey numbers for each species represent original number of peaks in library; white numbers in boxes represent human peaks conserved in each category. Less than 20% of human H3K27ac and 4.2% of human JUN peaks are shared with rat (Panel A), while approximately 30% of human H3K27ac and 2.3% of human JUN peaks are shared with bovine (Panel B). Ultraconserved regions (human, rat, and bovine) represent 13.2% and 0.8% of human H3K27ac and JUN peaks, respectively (Panel C).


Supplementary Figure 4: Number of leniently conserved JUN sites with rat and bovine aortic ECs. Supplementary Figure 4 Legend: Number of human peaks shared with rat or bovine separately, and shared in both rat and bovine (right) are shown. Grey numbers under each species represent original number of peaks in library, white numbers in boxes represent human peaks conserved in each category. 15% of human JUN peaks are shared with rat, while approximately 20% of human JUN peaks are shared with bovine. 10% of human JUN peaks are leniently conserved in both rat and bovine.

Supplementary Table 9: Top 10 g:Profiler enrichments in GO Terms used to describe genes from Figure 19 show common genes enrich for cellular functions. Supplementary Table 9 Legend: Genes annotated to regulatory HGMD variants in EC and liver H3K27ac sites were classified into a Venn diagram in Figure 19. EC-only (Section A), common between ECs and liver (Section B), and liver-only (Section C) categories are shown. Each category was submitted to g:Profiler (Reimand et al. 2007) for gene set enrichment analysis. The number of genes within each GO term is shown in the common genes column, and the percentage of gene coverage from the gene list is shown. The corrected hypergeometric p-values are displayed, and the genes that are enriched in each GO term are shown in the last column. HGMD genes unique to EC enhancers are related to immune responses and apoptosis. HGMD genes common between EC and liver enhancers are related to cellular functions such as protein localization and fibril organization, while HGMD genes unique to liver enhancers are enriched for glucose metabolism and vitamin biosynthesis. From the perspective of EC enhancers, most HGMD genes are shared with liver.

A) Category enrichments for 8 unique genes annotated to HGMD variants in EC-only H3K27ac peaks
GO Term Name | # Term Genes | # Query Genes | # Common Genes | % Genes | Corrected p-value | GO Term ID | Common genes
regulation of apoptotic process | 1423 | 8 | 6 | 0.4 | 5.71E-03 | GO:0042981 | HSPA5,KITLG,IL6,TNFAIP8,NQO1,TBX1
cellular response to chemical stimulus | 2547 | 8 | 7 | 0.3 | 7.25E-03 | GO:0070887 | HSPA5,KITLG,HTR1B,IL6,STAT4,NQO1,TBX1
response to endogenous stimulus | 1558 | 8 | 6 | 0.4 | 9.72E-03 | GO:0009719 | HSPA5,KITLG,HTR1B,IL6,NQO1,TBX1
response to organic substance | 2742 | 8 | 7 | 0.3 | 1.20E-02 | GO:0010033 | HSPA5,KITLG,HTR1B,IL6,STAT4,NQO1,TBX1
tissue remodeling | 141 | 8 | 3 | 2.1 | 2.92E-02 | GO:0048771 | HTR1B,IL6,TBX1
response to toxic substance | 165 | 8 | 3 | 1.8 | 4.68E-02 | GO:0009636 | HSPA5,IL6,NQO1

B) Category enrichments for 239/244 unique genes annotated to HGMD variants that are in common between EC and liver H3K27ac sites
GO Term Name | # Term Genes | # Query Genes | # Common Genes | % Genes | Corrected p-value | GO Term ID | Common genes
response to stress | 3864 | 239 | 134 | 3.5 | 4.48E-32 | GO:0006950 | TFPI,ERCC1,RB1CC1,GCLM,NR1H3,IFNGR1,RAD51,ATG5,FECH,KLF6,TNFRSF1A,XRCC1,MLH1,CPB2,PSEN1,GSK3B,BAX,DNMT3B,UNC13D,COMT,AAAS,MSH2,HSP90AB1,SIRT1,GADD45B,SNAP29,SMARCB1,HMOX1,APEX1,PSMA6,NFKBIA,PROCR,SMAD7,HMOX2,TGFB1,MET,HSPB1,SERPINE1,ENG,GATA3,CCL2,PRKAR1A,YWHAE,HSPA8,GTF2H1,MVK,CDKN1B,VEGFA,SLC29A1,LOX,HMGCR,OGG1,GNAI2,NFE2L2,PARK7,CD46,F3,IRF6,CREB1,CTGF,TARDBP,CDKN1A,MT2A,IRF1,BMP4,INSIG2,GCH1,CASP9,PTPN22,APC,P2RX4,MDM2,RAC1,XPA,FGF2,IFITM3,IFNAR1,SOD1,PTPRF,RHOB,ARL6IP5,CCNA2,ADM,ATM,PDCD4,XRCC4,NEIL2,XPC,SLC16A1,GNAQ,MMP14,CALM3,IL6R,TGFBR2,ANXA5,ITGA2,NIPBL,NOS3,HTRA1,IRF2,FEN1,ADAM9,ADRB2,FOS,ETFDH,PTEN,GAP43,STAT5B,EXO1,ATP2A2,UCP2,FOXC2,JUN,THBD,CALR,RAD51B,CHEK2,PROS1,SOCS3,FANCA,XRCC6,XRCC2,ADH5,DDX39B,CALM1,MTOR,BMPR2,NFKBIL1,ERCC6,LTA,GPX1,ADSL,MIF,HMBS
blood circulation | 429 | 239 | 26 | 6.1 | 6.36E-08 | GO:0008015 | GCLM,ATG5,HMOX1,SMAD7,ENG,YWHAE,VEGFA,HMGCR,RGS2,ECE1,CTGF,GCH1,P2RX4,SOD1,ADM,CALM3,TBX20,NOS3,AR,ADRB2,ATP2A2,FOXC2,ADH5,CALM1,BMPR2,GPX1
regulation of protein localization to nucleus | 183 | 239 | 14 | 7.7 | 1.74E-04 | GO:1900180 | GSK3B,HSP90AB1,SIRT1,NFKBIA,TGFB1,SUFU,OGG1,PARK7,CDKN1A,BMP4,GNAQ,FZD1,MTOR,NFKBIL1
regulation of fibril organization | 3 | 239 | 3 | 100 | 4.88E-03 | GO:1902903 | HSPA8,PARK7,GPX1
regulation of proteasomal ubiquitin-dependent protein catabolic process | 154 | 239 | 11 | 7.1 | 8.97E-03 | GO:0032434 | PSEN1,GSK3B,HSP90AB1,SMAD7,NFE2L2,PARK7,APC,MDM2,RB1,ATM,PTEN
interferon-gamma-mediated signaling pathway | 85 | 239 | 8 | 9.4 | 2.66E-02 | GO:0060333 | NR1H3,IFNGR1,HSP90AB1,IRF6,MT2A,IRF1,IRF2,SOCS3
embryonic cranial skeleton morphogenesis | 43 | 239 | 6 | 14 | 3.62E-02 | GO:0048701 | MTHFD1,BMP4,MMP14,TGFBR2,NIPBL,FOXC2
negative regulation of histone H4 acetylation | 5 | 239 | 3 | 60 | 4.79E-02 | GO:0090241 | ATG5,SPI1,SIRT1

C) Category enrichments for 339/341 unique genes annotated to HGMD variants in liver-only H3K27ac peaks
GO Term Name | # Term Genes | # Query Genes | # Common Genes | % Genes | Corrected p-value | GO Term ID | Common genes
response to organic substance | 2742 | 339 | 173 | 6.3 | 6.87E-56 | GO:0010033 | GCLC,CFTR,PON1,FMO1,NR1H4,HGF,ABCC2,FAS,TNFRSF1B,ATP6V0A1,OTC,CDH1,CYBA,ITIH4,F7,MAOB,SEL1L,PTGS2,TBX21,IL4R,PIK3C3,IL12RB2,APOB,BIRC5,THPO,ESR1,ITPR3,XBP1,HNF4A,CST3,SYP,FLT1,IL21R,CSK,NAMPT,PRKAG2,PTGDS,CHRNE,COL1A1,CRYAB,APOA5,IL10RA,TCIRG1,IFNG,IL1R1,FASLG,APOA1,TNFAIP3,ARG1,TGFB3,DNMT3A,ADRA1A,CAT,PCK1,IL1B,PAX8,CCR7,IRF5,LOXL1,CDKN1C,EPO,ASS1,F12,VIMP,PPARG,SERPINF1,CHI3L1,KL,MEN1,SORT1,IL2RA,PDGFRA,ETS1,HNF1A,AGT,EDNRB,BLK,GATA4,IL10,IL1RN,TOR1A,TLR4,ALDOB,TLR2,THBS1,CYP1B1,RBP4,BRCA2,ESR2,CYP1A2,POLG,FURIN,IRF8,CREB3L4,PKLR,EPHX1,NR1I2,AHSG,EGFR,IL2RG,CDKN2B,SCGB1A1,SERPINH1,ADRA2A,IL18,NR3C2,ACSL1,HHEX,NR4A2,LY96,PPARGC1B,MX1,USF1,APOA2,WNT9B,ADIPOR1,AGRP,ABCG1,S100B,LRP5,ALPL,GFI1,VCAM1,CXCR1,SLC2A2,HHIP,TNFRSF11B,ABCA1,GJB2,RET,STAT6,NKX3-1,IGF2,VKORC1,SOST,SFTPC,IL12A,IRS1,PCSK9,CXCL10,MGMT,CD14,PDGFD,FGA,FGB,BCL2,BTC,HSD11B2,MTHFR,CD19,GBA,CIITA,RNLS,SOCS1,IRS2,CXCR3,MAPT,PPARA,LITAF,S100A14,MME,ENPP1,AGER,HSPA1B,HSPA1A,APOM,LTC4S,CYP21A2,TNF,TLR9,IL10RB,P2RY11,INS
positive regulation of transcription from RNA polymerase II promoter | 995 | 339 | 51 | 5.1 | 1.36E-08 | GO:0045944 | NR1H4,IGF1,HGF,FOXP3,ESR1,XBP1,HNF4A,NAMPT,TNFSF8,IFNG,POMC,BCL9,TGFB3,HOXB5,PCK1,IL1B,PAX8,LIF,IRF5,PPARG,MEN1,ETS1,HNF1A,GATA4,IL10,TLR4,TLR2,IRF8,CREB3L4,NR1I2,TNIP1,EGFR,CDKN2B,HHEX,NR4A2,PPARGC1B,USF1,LRP5,SHH,STAT6,NKX3-1,IGF2,ATF5,CXCL10,NPAS2,CIITA,INSIG1,CXCR3,PPARA,TNF,TLR9
positive regulation of protein localization to nucleus | 103 | 339 | 15 | 14.6 | 8.80E-07 | GO:1900182 | IGF1,CDH1,PTGS2,XBP1,TGFB3,IL1B,LIF,TLR4,TLR2,EGFR,IL18,SHH,TNF,TLR9,INS
glucose metabolic process | 225 | 339 | 20 | 8.9 | 8.29E-06 | GO:0006006 | IGF1,POMC,PPP1R3C,PCK1,G6PC,VIMP,ALDOB,RBP4,PKLR,IGFBP3,USF1,ADIPOR1,LRP5,IGF2,IRS1,IRS2,PPARA,ENPP1,TNF,INS
positive regulation of protein import into nucleus | 92 | 339 | 13 | 14.1 | 1.96E-05 | GO:0042307 | IGF1,CDH1,PTGS2,XBP1,TGFB3,IL1B,TLR4,TLR2,EGFR,IL18,SHH,TNF,TLR9
platelet degranulation | 87 | 339 | 12 | 13.8 | 9.79E-05 | GO:0002576 | IGF1,HGF,APOA1,TGFB3,ACTN4,THBS1,SERPING1,IGF2,FGG,FGA,FGB,F8
regulation of Cdc42 protein signal transduction | 4 | 339 | 4 | 100 | 2.35E-04 | GO:0032489 | APOC3,APOA1,APOE,ABCA1
fever generation | 11 | 339 | 5 | 45.5 | 1.73E-03 | GO:0001660 | PTGS2,IL1B,EDNRB,IL1RN,TNF
phospholipase C-activating G-protein coupled receptor signaling pathway | 84 | 339 | 10 | 11.9 | 5.40E-03 | GO:0007200 | ESR1,SLC9A3R1,NPR3,ADRA1A,AGT,EDNRB,AGTR1,ADRA2A,F2,P2RY11
fat-soluble vitamin biosynthetic process | 15 | 339 | 5 | 33.3 | 1.06E-02 | GO:0042362 | PLTP,IFNG,IL1B,GFI1,TNF
response to laminar fluid shear stress | 15 | 339 | 5 | 33.3 | 1.06E-02 | GO:0034616 | XBP1,TGFB3,ASS1,ETS1,ABCA1
positive regulation of immature T cell proliferation in thymus | 3 | 339 | 3 | 100 | 1.35E-02 | GO:0033092 | FOXP3,IL1B,SHH
regulation of renal output by angiotensin | 3 | 339 | 3 | 100 | 1.35E-02 | GO:0002019 | CYBA,AGT,ACE
regulation of interferon-alpha production | 17 | 339 | 5 | 29.4 | 2.12E-02 | GO:0032647 | TLR8,IRF5,IL10,TLR4,TLR9
positive regulation of mitotic nuclear division | 46 | 339 | 7 | 15.2 | 3.63E-02 | GO:0045840 | IGF1,BIRC5,IL1B,LRP5,IGF2,BTC,INS
extracellular matrix organization | 388 | 339 | 20 | 5.2 | 4.86E-02 | GO:0030198 | CDH1,MMP9,CST3,COL1A1,TGFB3,LOXL1,PDGFRA,ETS1,AGT,LAMC1,THBS1,CYP1B1,FURIN,SERPINH1,VCAM1,TNFRSF11B,FGG,FGA,FGB,TNF


Supplementary Figure 5: The conservation status of aortic EC peaks tested by VISTA enhancers is not predictive of positive enhancer activity. Supplementary Figure 5 Legend: Contingency table evaluating the distribution of positive and negative VISTA enhancers in conserved H3K27ac and strictly and leniently conserved JUN and non-conserved sites. A positive enhancer showed reproducible expression in the same structure in at least three independent biological replicates, and a negative enhancer was tested with at least five transgenic embryos yet no reproducible expression was observed in any structure in at least three different embryos. There is equal probability that positive VISTA enhancers are found in conserved and non-conserved sites. The likelihood of the observed distribution of VISTA enhancers is not significant, as shown by the p-value of Fisher’s exact test displayed under each contingency table. Therefore, the conservation status of a HAEC peak contained within a VISTA enhancer is not indicative of positive expression within this set of tested enhancers.
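
For readers unfamiliar with the test named in this legend, the sketch below implements a two-sided Fisher's exact test for a 2x2 contingency table using only the Python standard library. The counts in the example are invented and do not come from the figure, and the function name is illustrative. The two-sided rule used here (summing all tables no more probable than the observed one) is a common convention, and it is an assumption that it matches the software used for the figure.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: rows = conserved / non-conserved peaks,
# columns = positive / negative VISTA enhancers.
p_value = fisher_exact_two_sided(12, 10, 30, 27)
print(f"p = {p_value:.3f}")
```

A large p-value from such a table would be read as in the legend: the proportion of positive enhancers does not differ detectably between conserved and non-conserved sites.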

Supplementary Table 10: List of closest genes to HAEC-only regions that are in an active chromatin state and repressed or poised state in HUVEC, and vice versa. Supplementary Table 10 Legend: The closest gene to each candidate peak was obtained using RefSeq annotations. The description of each gene, gene type, and Online Mendelian Inheritance in Man (OMIM) IDs were obtained from the UCSC Table Browser. The peak score and distance to the closest gene are recorded, with 0 bp indicating direct overlap. Genes that encode transcription factors are in bold.

HAEC-only JUN sites in active chromatin state, with repressed and/or poised HUVEC state (N=7)
hgnc_symbol | description | gene_biotype | OMIM | Peak_score | Distances
DLX2 | distal-less 2 [Source:HGNC Symbol;Acc:HGNC:2915] | protein_coding | 126255 | 686 | 55
GJA5 | gap junction protein, alpha 5, 40kDa [Source:HGNC Symbol;Acc:HGNC:4279] | protein_coding | 121013 | 344 | 0
HEY2 | hes-related family bHLH transcription factor with YRPW motif 2 [Source:HGNC Symbol;Acc:HGNC:4881] | protein_coding | 604674 | 121 | 0
LINC01471 | long intergenic non-protein coding RNA 1471 [Source:HGNC Symbol;Acc:HGNC:51106] | lincRNA | 0 | 385 | 40640
LOC101928227 | NA | NA | 0 | 980 | 67308
MIR187 | microRNA 187 [Source:HGNC Symbol;Acc:HGNC:31558] | miRNA | 612698 | 411 | 51674
NFIX | nuclear factor I/X (CCAAT-binding transcription factor) [Source:HGNC Symbol;Acc:HGNC:7788] | protein_coding | 164005 | 719 | 0

HAEC-only H3K27ac sites in active chromatin state, with repressed and/or poised HUVEC state (N=87) hgnc_symbol description gene_biotype OMIM Peak_score Distances ACTN2 actinin, alpha 2 [Source:HGNC Symbol;Acc:HGNC:164] protein_coding 102573 18 0 ADCY8 adenylate cyclase 8 (brain) [Source:HGNC Symbol;Acc:HGNC:239] protein_coding 103070 94 66178 ADGRF1 adhesion G protein-coupled receptor F1 [Source:HGNC Symbol;Acc:HGNC:18990] protein_coding 0 27 40779 ADORA3 adenosine A3 receptor [Source:HGNC Symbol;Acc:HGNC:268] protein_coding 600445 42 0 ADORA3 adenosine A3 receptor [Source:HGNC Symbol;Acc:HGNC:268] LRG_gene 600445 42 0 AFAP1L2 actin filament associated protein 1-like 2 [Source:HGNC Symbol;Acc:HGNC:25901] protein_coding 612420 47 0 AMICA1 adhesion molecule, interacts with CXADR antigen 1 [Source:HGNC Symbol;Acc:HGNC:19084] protein_coding 609770 308 0

BBOX1-AS1 BBOX1 antisense RNA 1 [Source:HGNC Symbol;Acc:HGNC:50700] antisense 0 28 0 BCL2L11 BCL2-like 11 (apoptosis facilitator) [Source:HGNC Symbol;Acc:HGNC:994] protein_coding 603827 201 26869 BCL2L11 BCL2-like 11 (apoptosis facilitator) [Source:HGNC Symbol;Acc:HGNC:994] LRG_gene 603827 201 26869 BET1 Bet1 golgi vesicular membrane trafficking protein [Source:HGNC Symbol;Acc:HGNC:14562] protein_coding 605456 33 64005

CBWD5 COBW domain containing 5 [Source:HGNC Symbol;Acc:HGNC:24584] protein_coding 0 70 0 CDH11 cadherin 11, type 2, OB-cadherin (osteoblast) [Source:HGNC Symbol;Acc:HGNC:1750] protein_coding 600023 113 0 CDH4 cadherin 4, type 1, R-cadherin (retinal) [Source:HGNC Symbol;Acc:HGNC:1763] protein_coding 603006 64 0 CECR2 cat eye syndrome region, candidate 2 [Source:HGNC Symbol;Acc:HGNC:1840] protein_coding 607576 17 0

CIDEA cell death-inducing DFFA-like effector a [Source:HGNC Symbol;Acc:HGNC:1976] protein_coding 604440 40 0 COL24A1 collagen, type XXIV, alpha 1 [Source:HGNC Symbol;Acc:HGNC:20821] protein_coding 610025 56,13 0,0 CORO2B coronin, actin binding protein, 2B [Source:HGNC Symbol;Acc:HGNC:2256] protein_coding 605002 35 1634 121

CORO2B coronin, actin binding protein, 2B [Source:HGNC Symbol;Acc:HGNC:2256] protein_coding 605002 35 1634 CYP19A1 cytochrome P450, family 19, subfamily A, polypeptide 1 [Source:HGNC Symbol;Acc:HGNC:2594] protein_coding 107910 42 0

CYSLTR2 cysteinyl leukotriene receptor 2 [Source:HGNC Symbol;Acc:HGNC:18274] protein_coding 605666 230 26538 DLGAP1 discs, large (Drosophila) homolog-associated protein 1 [Source:HGNC Symbol;Acc:HGNC:2905] protein_coding 605445 31 0

FAM89A family with sequence similarity 89, member A [Source:HGNC Symbol;Acc:HGNC:25057] protein_coding 0 74 0 FRMD3 FERM domain containing 3 [Source:HGNC Symbol;Acc:HGNC:24125] protein_coding 607619 59 0 FST follistatin [Source:HGNC Symbol;Acc:HGNC:3971] protein_coding 136470 59 0 GDNF-AS1 GDNF antisense RNA 1 (head to head) [Source:HGNC Symbol;Acc:HGNC:43592] lincRNA 0 25 0 GNG2 guanine nucleotide binding protein (G protein), gamma 2 [Source:HGNC Symbol;Acc:HGNC:4404] protein_coding 606981 32 0

GPR37 G protein-coupled receptor 37 (endothelin receptor type B-like) [Source:HGNC protein_coding 602583 75 0 Symbol;Acc:HGNC:4494] GPRIN3 GPRIN family member 3 [Source:HGNC Symbol;Acc:HGNC:27733] protein_coding 611241 60 74001 HAND2 heart and neural crest derivatives expressed 2 [Source:HGNC Symbol;Acc:HGNC:4808] protein_coding 602407 16 31900 HCG9 HLA complex group 9 (non-protein coding) [Source:HGNC Symbol;Acc:HGNC:21243] processed_transcri 615797 38 7458 pt HCG9 HLA complex group 9 (non-protein coding) [Source:HGNC Symbol;Acc:HGNC:21243] lincRNA 615797 38 7458 HES2 hes family bHLH transcription factor 2 [Source:HGNC Symbol;Acc:HGNC:16005] protein_coding 609970 320 0 HEY2 hes-related family bHLH transcription factor with YRPW motif 2 [Source:HGNC protein_coding 604674 30 6579 Symbol;Acc:HGNC:4881] IFITM1 interferon induced transmembrane protein 1 [Source:HGNC Symbol;Acc:HGNC:5412] protein_coding 604456 34 0 IFITM3 interferon induced transmembrane protein 3 [Source:HGNC Symbol;Acc:HGNC:5414] protein_coding 605579 167 0 IGFBP5 insulin-like growth factor binding protein 5 [Source:HGNC Symbol;Acc:HGNC:5474] protein_coding 146734 81 69711 IGFL3 IGF-like family member 3 [Source:HGNC Symbol;Acc:HGNC:32930] protein_coding 610546 510 7117 KCNMA1 potassium channel, calcium activated large conductance subfamily M alpha, member 1 protein_coding 600150 31 21093 [Source:HGNC Symbol;Acc:HGNC:6284] KIAA1211L KIAA1211-like [Source:HGNC Symbol;Acc:HGNC:33454] protein_coding 0 51 0 LDLRAD3 low density lipoprotein receptor class A domain containing 3 [Source:HGNC protein_coding 0 125 23760 Symbol;Acc:HGNC:27046] LIMCH1 LIM and calponin homology domains 1 [Source:HGNC Symbol;Acc:HGNC:29191] protein_coding 0 143 0 LINC00518 long intergenic non-protein coding RNA 518 [Source:HGNC Symbol;Acc:HGNC:28626] lincRNA 0 44 738 LINC01057 long intergenic non-protein coding RNA 1057 [Source:HGNC Symbol;Acc:HGNC:49057] lincRNA 0 65 8588 LINC01471 long intergenic non-protein coding RNA 1471 
[Source:HGNC Symbol;Acc:HGNC:51106] lincRNA 0 39 40406 LINC01514 long intergenic non-protein coding RNA 1514 [Source:HGNC Symbol;Acc:HGNC:51207] lincRNA 0 43 9814 LOC102723709 NA NA 0 47 353622 LOC102724000 NA NA 0 39 143331 LOC102724096 NA NA 0 54 3961 LOC728613 NA NA 0 53 0 LOC728673 NA NA 0 39 47431 LRRC74A leucine rich repeat containing 74A [Source:HGNC Symbol;Acc:HGNC:23346] protein_coding 0 32 0 MIR187 microRNA 187 [Source:HGNC Symbol;Acc:HGNC:31558] miRNA 612698 34 51370 MIS18A MIS18 kinetochore protein A [Source:HGNC Symbol;Acc:HGNC:1286] protein_coding 0 61 18446 MON1B MON1 secretory trafficking family member B [Source:HGNC Symbol;Acc:HGNC:25020] protein_coding 608954 389 16670 MOXD1 monooxygenase, DBH-like 1 [Source:HGNC Symbol;Acc:HGNC:21063] protein_coding 609000 69 24018 122

OVAAL ovarian adenocarcinoma amplified long non-coding RNA [Source:HGNC lincRNA 0 145 0 Symbol;Acc:HGNC:49422] PARP11 poly (ADP-ribose) polymerase family, member 11 [Source:HGNC Symbol;Acc:HGNC:1186] protein_coding 0 82 0 PIFO primary cilia formation [Source:HGNC Symbol;Acc:HGNC:27009] protein_coding 614234 86 11639 PLEKHG1 pleckstrin homology domain containing, family G (with RhoGef domain) member 1 [Source:HGNC protein_coding 0 164 0 Symbol;Acc:HGNC:20884] PPP2R2A protein phosphatase 2, regulatory subunit B, alpha [Source:HGNC Symbol;Acc:HGNC:9304] protein_coding 604941 48 59806

PRKG1 protein kinase, cGMP-dependent, type I [Source:HGNC Symbol;Acc:HGNC:9414] protein_coding 176894 39,27 0,0 PRNT prion protein (testis specific) [Source:HGNC Symbol;Acc:HGNC:18046] protein_coding 0 25 0 PRR15 proline rich 15 [Source:HGNC Symbol;Acc:HGNC:22310] protein_coding 0 43 0 PRSS3 protease, serine, 3 [Source:HGNC Symbol;Acc:HGNC:9486] protein_coding 613578 30 0 PTPRO protein tyrosine phosphatase, receptor type, O [Source:HGNC Symbol;Acc:HGNC:9678] protein_coding 600579 155 0 RASGRF2 Ras protein-specific guanine nucleotide-releasing factor 2 [Source:HGNC Symbol;Acc:HGNC:9876] protein_coding 606614 64 0

RASGRF2-AS1 RASGRF2 antisense RNA 1 [Source:HGNC Symbol;Acc:HGNC:40499] antisense 0 64 0 REEP1 receptor accessory protein 1 [Source:HGNC Symbol;Acc:HGNC:25786] protein_coding 609139 71 0 RELN reelin [Source:HGNC Symbol;Acc:HGNC:9957] protein_coding 600514 38 0 RFPL1S RFPL1 antisense RNA 1 [Source:HGNC Symbol;Acc:HGNC:9978] antisense 605972 38 15194 SATB2 SATB homeobox 2 [Source:HGNC Symbol;Acc:HGNC:21637] protein_coding 608148 44 0 SEMA4C sema domain, immunoglobulin domain (Ig), transmembrane domain (TM) and short cytoplasmic domain, (semaphorin) 4C [Source:HGNC Symbol;Acc:HGNC:10731] protein_coding 604462 141 0 SETBP1 SET binding protein 1 [Source:HGNC Symbol;Acc:HGNC:15573] protein_coding 611060 135 0 SLC35G3 solute carrier family 35, member G3 [Source:HGNC Symbol;Acc:HGNC:26848] protein_coding 0 55 18735 SMIM12 small integral membrane protein 12 [Source:HGNC Symbol;Acc:HGNC:25154] protein_coding 0 63 23782 SPATA18 spermatogenesis associated 18 [Source:HGNC Symbol;Acc:HGNC:29579] protein_coding 612814 276 0 SPTBN2 spectrin, beta, non-erythrocytic 2 [Source:HGNC Symbol;Acc:HGNC:11276] protein_coding 604985 30 6025 SSX2IP synovial sarcoma, X breakpoint 2 interacting protein [Source:HGNC Symbol;Acc:HGNC:16509] protein_coding 608690 22 46826

TBX20 T-box 20 [Source:HGNC Symbol;Acc:HGNC:11598] protein_coding 606061 33 0 TBX20 T-box 20 [Source:HGNC Symbol;Acc:HGNC:11598] LRG_gene 606061 33 0 TMEM158 transmembrane protein 158 (gene/pseudogene) [Source:HGNC Symbol;Acc:HGNC:30293] protein_coding 0 40 0

TMEM233 transmembrane protein 233 [Source:HGNC Symbol;Acc:HGNC:37219] protein_coding 0 57 0 TMIGD3 NA NA 0 42 0 TMTC1 transmembrane and tetratricopeptide repeat containing 1 [Source:HGNC Symbol;Acc:HGNC:24099] protein_coding 615855 46 0

TSHZ3 teashirt zinc finger homeobox 3 [Source:HGNC Symbol;Acc:HGNC:30700] protein_coding 614119 52 436 TSLP thymic stromal lymphopoietin [Source:HGNC Symbol;Acc:HGNC:30743] protein_coding 607003 52 2751 TWIST2 twist family bHLH transcription factor 2 [Source:HGNC Symbol;Acc:HGNC:20670] protein_coding 607556 38 0 ULK4P1 ULK4 pseudogene 1 [Source:HGNC Symbol;Acc:HGNC:15775] transcribed_unprocessed_pseudogene 0 49 0 ULK4P2 ULK4 pseudogene 2 [Source:HGNC Symbol;Acc:HGNC:15776] transcribed_unprocessed_pseudogene 0 49 0

UNC5B unc-5 homolog B (C. elegans) [Source:HGNC Symbol;Acc:HGNC:12568] protein_coding 607870 48,52 0,0 UNC5B-AS1 UNC5B antisense RNA 1 [Source:HGNC Symbol;Acc:HGNC:45096] antisense 0 52 0 WNT9A wingless-type MMTV integration site family, member 9A [Source:HGNC Symbol;Acc:HGNC:12778] protein_coding 602863 176 0

HUVEC-only JUN sites in active chromatin state, with repressed and/or poised HAEC state (N=11) hgnc_symbol description gene_biotype OMIM Peak_score Distances ARL4C ADP-ribosylation factor-like 4C [Source:HGNC Symbol;Acc:HGNC:698] protein_coding 604787 376,145 0,193954 FHL2 four and a half LIM domains 2 [Source:HGNC Symbol;Acc:HGNC:3703] protein_coding 602633 170 16720 GAP43 growth associated protein 43 [Source:HGNC Symbol;Acc:HGNC:4140] protein_coding 162060 390 109894 HOXA-AS3 HOXA cluster antisense RNA 3 [Source:HGNC Symbol;Acc:HGNC:43748] antisense 0 252 4143 HOXD1 homeobox D1 [Source:HGNC Symbol;Acc:HGNC:5132] protein_coding 142987 122 1684 LINC00845 long intergenic non-protein coding RNA 845 [Source:HGNC Symbol;Acc:HGNC:45033] lincRNA 0 230 2393 LINC01422 long intergenic non-protein coding RNA 1422 [Source:HGNC Symbol;Acc:HGNC:50728] lincRNA 0 55 292037 LOC729950 NA NA 0 94 169784 NR5A2 nuclear receptor subfamily 5, group A, member 2 [Source:HGNC Symbol;Acc:HGNC:7984] protein_coding 604453 73 50780 TMEM200A transmembrane protein 200A [Source:HGNC Symbol;Acc:HGNC:21075] protein_coding 0 130 182892

HUVEC-only JUN sites in active chromatin state, with repressed and/or poised HAEC state (N=162) hgnc_symbol description gene_biotype OMIM Peak_score Distances ABCG1 ATP-binding cassette, sub-family G (WHITE), member 1 [Source:HGNC Symbol;Acc:HGNC:73] protein_coding 603076 86 0

ADAM12 ADAM metallopeptidase domain 12 [Source:HGNC Symbol;Acc:HGNC:190] protein_coding 602714 13 0 ADAMTS10 ADAM metallopeptidase with thrombospondin type 1 motif, 10 [Source:HGNC Symbol;Acc:HGNC:13201] protein_coding 608990 108 0 ADAMTS12 ADAM metallopeptidase with thrombospondin type 1 motif, 12 [Source:HGNC Symbol;Acc:HGNC:14605] protein_coding 606184 63 0 ADAP2 ArfGAP with dual PH domains 2 [Source:HGNC Symbol;Acc:HGNC:16487] protein_coding 608635 33 1037 AKAP6 A kinase (PRKA) anchor protein 6 [Source:HGNC Symbol;Acc:HGNC:376] protein_coding 604691 25 0 ALDH1A2 aldehyde dehydrogenase 1 family, member A2 [Source:HGNC Symbol;Acc:HGNC:15472] protein_coding 603687 17 0 ALDH8A1 aldehyde dehydrogenase 8 family, member A1 [Source:HGNC Symbol;Acc:HGNC:15471] protein_coding 606467 152 7740 ALPL alkaline phosphatase, liver/bone/kidney [Source:HGNC Symbol;Acc:HGNC:438] protein_coding 171760 48 7971 AQP9 aquaporin 9 [Source:HGNC Symbol;Acc:HGNC:643] protein_coding 602914 20 0 ARHGEF10L Rho guanine nucleotide exchange factor (GEF) 10-like [Source:HGNC Symbol;Acc:HGNC:25540] protein_coding 612494 31 0

ARL4C ADP-ribosylation factor-like 4C [Source:HGNC Symbol;Acc:HGNC:698] protein_coding 604787 17 193709 ASB13 ankyrin repeat and SOCS box containing 13 [Source:HGNC Symbol;Acc:HGNC:19765] protein_coding 615055 15 48580 ATP11B ATPase, class VI, type 11B [Source:HGNC Symbol;Acc:HGNC:13553] protein_coding 605869 20 109997 ATP6V1H ATPase, H+ transporting, lysosomal 50/57kDa, V1 subunit H [Source:HGNC Symbol;Acc:HGNC:18303] protein_coding 608861 21 132490 BACH2 BTB and CNC homology 1, basic leucine zipper transcription factor 2 [Source:HGNC Symbol;Acc:HGNC:14078] protein_coding 605394 18,18 0,0 BCL2 B-cell CLL/lymphoma 2 [Source:HGNC Symbol;Acc:HGNC:990] protein_coding 151430 37 10472 BMPR1B bone morphogenetic protein receptor, type IB [Source:HGNC Symbol;Acc:HGNC:1077] protein_coding 603248 23 0 BMPR1B-AS1 BMPR1B antisense RNA 1 (head to head) [Source:HGNC Symbol;Acc:HGNC:50864] lincRNA 0 23 0 BVES blood vessel epicardial substance [Source:HGNC Symbol;Acc:HGNC:1152] protein_coding 604577 58 0

C10orf35 chromosome 10 open reading frame 35 [Source:HGNC Symbol;Acc:HGNC:23519] protein_coding 0 19 61980 C17orf107 chromosome 17 open reading frame 107 [Source:HGNC Symbol;Acc:HGNC:37238] protein_coding 0 41 0 C17orf67 chromosome 17 open reading frame 67 [Source:HGNC Symbol;Acc:HGNC:27900] protein_coding 0 28 83432 C1QTNF9B C1q and tumor necrosis factor related protein 9B [Source:HGNC Symbol;Acc:HGNC:34072] protein_coding 614148 40 2391

C8orf33 chromosome 8 open reading frame 33 [Source:HGNC Symbol;Acc:HGNC:26104] protein_coding 0 27 22180 CCDC89 coiled-coil domain containing 89 [Source:HGNC Symbol;Acc:HGNC:26762] protein_coding 0 31 740 CD163L1 CD163 molecule-like 1 [Source:HGNC Symbol;Acc:HGNC:30375] protein_coding 606079 52 4762 CDK15 cyclin-dependent kinase 15 [Source:HGNC Symbol;Acc:HGNC:14434] protein_coding 616147 15 0 CDK15 cyclin-dependent kinase 15 [Source:HGNC Symbol;Acc:HGNC:14434] protein_coding 616147 15 0 CDK6 cyclin-dependent kinase 6 [Source:HGNC Symbol;Acc:HGNC:1777] protein_coding 603368 63 0 CELSR1 cadherin, EGF LAG seven-pass G-type receptor 1 [Source:HGNC Symbol;Acc:HGNC:1850] protein_coding 604523 75 0

CFAP221 cilia and flagella associated protein 221 [Source:HGNC Symbol;Acc:HGNC:33720] protein_coding 0 72 0 CHRNE cholinergic receptor, nicotinic, epsilon (muscle) [Source:HGNC Symbol;Acc:HGNC:1966] protein_coding 100725 41 0 CHST8 carbohydrate (N-acetylgalactosamine 4-0) sulfotransferase 8 [Source:HGNC Symbol;Acc:HGNC:15993] protein_coding 610190 36 8200 COL12A1 collagen, type XII, alpha 1 [Source:HGNC Symbol;Acc:HGNC:2188] protein_coding 120320 38 0 COPRS coordinator of PRMT5, differentiation stimulator [Source:HGNC Symbol;Acc:HGNC:28848] protein_coding 0 29 124521 CREB5 cAMP responsive element binding protein 5 [Source:HGNC Symbol;Acc:HGNC:16844] protein_coding 0 87 0 CRISPLD1 cysteine-rich secretory protein LCCL domain containing 1 [Source:HGNC Symbol;Acc:HGNC:18206] protein_coding 0 34 0 CRISPLD1 cysteine-rich secretory protein LCCL domain containing 1 [Source:HGNC Symbol;Acc:HGNC:18206] protein_coding 0 34 0 CSF1 colony stimulating factor 1 (macrophage) [Source:HGNC Symbol;Acc:HGNC:2432] protein_coding 120420 87,61 1,391,617,291 CSRP2 cysteine and glycine-rich protein 2 [Source:HGNC Symbol;Acc:HGNC:2470] protein_coding 601871 37 194 CTHRC1 collagen triple helix repeat containing 1 [Source:HGNC Symbol;Acc:HGNC:18831] protein_coding 610635 67 0 DHRS3 dehydrogenase/reductase (SDR family) member 3 [Source:HGNC Symbol;Acc:HGNC:17693] protein_coding 612830 35 27203

DMC1 DNA meiotic recombinase 1 [Source:HGNC Symbol;Acc:HGNC:2927] protein_coding 602721 64 0 DNAH14 dynein, axonemal, heavy chain 14 [Source:HGNC Symbol;Acc:HGNC:2945] protein_coding 603341 63 0 DNAH8 dynein, axonemal, heavy chain 8 [Source:HGNC Symbol;Acc:HGNC:2952] protein_coding 603337 41 0 DTX3 deltex 3, E3 ubiquitin ligase [Source:HGNC Symbol;Acc:HGNC:24457] protein_coding 613142 80 0 EDIL3 EGF-like repeats and discoidin I-like domains 3 [Source:HGNC Symbol;Acc:HGNC:3173] protein_coding 606018 41 0 FAM178A family with sequence similarity 178, member A [Source:HGNC Symbol;Acc:HGNC:17814] protein_coding 610348 32 21439 FAM178A family with sequence similarity 178, member A [Source:HGNC Symbol;Acc:HGNC:17814] protein_coding 610348 32 21439 FAM46C family with sequence similarity 46, member C [Source:HGNC Symbol;Acc:HGNC:24712] protein_coding 613952 42 67507 FAM71A family with sequence similarity 71, member A [Source:HGNC Symbol;Acc:HGNC:26541] protein_coding 0 161 8311 FAM78A family with sequence similarity 78, member A [Source:HGNC Symbol;Acc:HGNC:25465] protein_coding 0 18 0 FBXO32 F-box protein 32 [Source:HGNC Symbol;Acc:HGNC:16731] protein_coding 606604 62 0 FCN2 ficolin (collagen/fibrinogen domain containing lectin) 2 [Source:HGNC Symbol;Acc:HGNC:3624] protein_coding 601624 71 8828

FLRT2 fibronectin leucine rich transmembrane protein 2 [Source:HGNC Symbol;Acc:HGNC:3761] protein_coding 604807 270 0 GAP43 growth associated protein 43 [Source:HGNC Symbol;Acc:HGNC:4140] protein_coding 162060 29 0 GATA5 GATA binding protein 5 [Source:HGNC Symbol;Acc:HGNC:15802] protein_coding 611496 41 25665 GATA6-AS1 GATA6 antisense RNA 1 (head to head) [Source:HGNC Symbol;Acc:HGNC:48840] lincRNA 0 85 62917

GFPT2 glutamine-fructose-6-phosphate transaminase 2 [Source:HGNC Symbol;Acc:HGNC:4242] protein_coding 603865 33 0 GPRIN3 GPRIN family member 3 [Source:HGNC Symbol;Acc:HGNC:27733] protein_coding 611241 15 47407 HAS2-AS1 HAS2 antisense RNA 1 [Source:HGNC Symbol;Acc:HGNC:34340] antisense 614353 32 0 HFM1 HFM1, ATP-dependent DNA helicase homolog (S. cerevisiae) [Source:HGNC Symbol;Acc:HGNC:20193] protein_coding 615684 35 0 HIST1H2BI histone cluster 1, H2bi [Source:HGNC Symbol;Acc:HGNC:4756] protein_coding 602807 99 0 HIST1H3E histone cluster 1, H3e [Source:HGNC Symbol;Acc:HGNC:4769] protein_coding 602813 142 0 HIST1H3G histone cluster 1, H3g [Source:HGNC Symbol;Acc:HGNC:4772] protein_coding 602815 99 0 HIST1H4F histone cluster 1, H4f [Source:HGNC Symbol;Acc:HGNC:4783] protein_coding 602824 51 0 HOXA5 homeobox A5 [Source:HGNC Symbol;Acc:HGNC:5106] protein_coding 142952 29 0 HOXA-AS3 HOXA cluster antisense RNA 3 [Source:HGNC Symbol;Acc:HGNC:43748] antisense 0 29 0 HOXB6 homeobox B6 [Source:HGNC Symbol;Acc:HGNC:5117] protein_coding 142961 119 0 HOXB-AS3 HOXB cluster antisense RNA 3 [Source:HGNC Symbol;Acc:HGNC:40283] antisense 0 38,44,119 2318,0,0 HSBP1L1 heat shock factor binding protein 1-like 1 [Source:HGNC Symbol;Acc:HGNC:37243] protein_coding 0 88 0 HTRA3 HtrA serine peptidase 3 [Source:HGNC Symbol;Acc:HGNC:30406] protein_coding 608785 15 0 IGF2BP1 insulin-like growth factor 2 mRNA binding protein 1 [Source:HGNC Symbol;Acc:HGNC:28866] protein_coding 608288 25,42 0,0

IGFBP1 insulin-like growth factor binding protein 1 [Source:HGNC Symbol;Acc:HGNC:5469] protein_coding 146730 60 0 IL1R2 interleukin 1 receptor, type II [Source:HGNC Symbol;Acc:HGNC:5994] protein_coding 147811 19 17543 IL1RAPL1 interleukin 1 receptor accessory protein-like 1 [Source:HGNC Symbol;Acc:HGNC:5996] protein_coding 300206 44 0 IL21 interleukin 21 [Source:HGNC Symbol;Acc:HGNC:6005] protein_coding 605384 31 69685 INHBA inhibin, beta A [Source:HGNC Symbol;Acc:HGNC:6066] protein_coding 147290 81 0 INHBA-AS1 INHBA antisense RNA 1 [Source:HGNC Symbol;Acc:HGNC:40303] antisense 0 81 0 ISPD isoprenoid synthase domain containing [Source:HGNC Symbol;Acc:HGNC:37276] protein_coding 614631 70 23399 ITGA11 integrin, alpha 11 [Source:HGNC Symbol;Acc:HGNC:6136] protein_coding 604789 38 0 KBTBD11 kelch repeat and BTB (POZ) domain containing 11 [Source:HGNC Symbol;Acc:HGNC:29104] protein_coding 0 20 0

KCNE1 potassium channel, voltage gated subfamily E regulatory beta subunit 1 [Source:HGNC Symbol;Acc:HGNC:6240] protein_coding 176261 41 0 KCNE1 potassium channel, voltage gated subfamily E regulatory beta subunit 1 [Source:HGNC Symbol;Acc:HGNC:6240] LRG_gene 176261 41 0 KCNJ15 potassium channel, inwardly rectifying subfamily J, member 15 [Source:HGNC Symbol;Acc:HGNC:6261] protein_coding 602106 58 0 KCNK3 potassium channel, two pore domain subfamily K, member 3 [Source:HGNC Symbol;Acc:HGNC:6278] protein_coding 603220 70 0 KCTD16 potassium channel tetramerization domain containing 16 [Source:HGNC Symbol;Acc:HGNC:29244] protein_coding 613423 38 0

KIF21B kinesin family member 21B [Source:HGNC Symbol;Acc:HGNC:29442] protein_coding 608322 30 0 LDLRAD4 low density lipoprotein receptor class A domain containing 4 [Source:HGNC Symbol;Acc:HGNC:1224] protein_coding 606571 20 0 LGR6 leucine-rich repeat containing G protein-coupled receptor 6 [Source:HGNC Symbol;Acc:HGNC:19719] protein_coding 606653 121 0 LIN7A lin-7 homolog A (C. elegans) [Source:HGNC Symbol;Acc:HGNC:17787] protein_coding 603380 36 0 LINC00486 long intergenic non-protein coding RNA 486 [Source:HGNC Symbol;Acc:HGNC:42946] lincRNA 0 64 0 LINC01422 long intergenic non-protein coding RNA 1422 [Source:HGNC Symbol;Acc:HGNC:50728] lincRNA 0 82,71 219,085,281,557 LOC100271832 NA NA 0 64 0

LOC100507642 NA NA 0 64 118467 LOC101927040 NA NA 0 37 16983 LOC101927497 NA NA 0 63 0 LOC101929268 NA NA 0 39 0 LOC101929565 NA NA 0 46 2409 LOC158434 NA NA 0 67 16593 LOC283045 NA NA 0 23 0 LOC440390 NA NA 0 42 229925 LOC647323 NA NA 0 36,93 0,0 LRRC2 leucine rich repeat containing 2 [Source:HGNC Symbol;Acc:HGNC:14676] protein_coding 607180 43 0 MIR132 microRNA 132 [Source:HGNC Symbol;Acc:HGNC:31516] miRNA 610016 70 0 MIR212 microRNA 212 [Source:HGNC Symbol;Acc:HGNC:31589] miRNA 613487 70 0 MIR5580 microRNA 5580 [Source:HGNC Symbol;Acc:HGNC:43482] miRNA 0 15 289146 MIR8086 microRNA 8086 [Source:HGNC Symbol;Acc:HGNC:50107] miRNA 0 13 37459 MOBP myelin-associated oligodendrocyte basic protein [Source:HGNC Symbol;Acc:HGNC:7189] protein_coding 600948 117 0 MRAS muscle RAS oncogene homolog [Source:HGNC Symbol;Acc:HGNC:7227] protein_coding 608435 80 0 MYO1B myosin IB [Source:HGNC Symbol;Acc:HGNC:7596] protein_coding 606537 23 0 NBPF20 NA NA 614007 49 0 NEDD9 neural precursor cell expressed, developmentally down-regulated 9 [Source:HGNC Symbol;Acc:HGNC:7733] protein_coding 602265 45 6975 NIPAL1 NIPA-like domain containing 1 [Source:HGNC Symbol;Acc:HGNC:27194] protein_coding 0 186 0 NR2F1-AS1 NR2F1 antisense RNA 1 [Source:HGNC Symbol;Acc:HGNC:48622] antisense 0 25 856253 NR5A2 nuclear receptor subfamily 5, group A, member 2 [Source:HGNC Symbol;Acc:HGNC:7984] protein_coding 604453 53 64818 NTM neurotrimin [Source:HGNC Symbol;Acc:HGNC:17941] protein_coding 607938 102 0 PCDH17 protocadherin 17 [Source:HGNC Symbol;Acc:HGNC:14267] protein_coding 611760 22 0 PLEKHA7 pleckstrin homology domain containing, family A member 7 [Source:HGNC Symbol;Acc:HGNC:27049] protein_coding 612686 18 0 PPFIBP2 PTPRF interacting protein, binding protein 2 (liprin beta 2) [Source:HGNC Symbol;Acc:HGNC:9250] protein_coding 603142 58 0 PRNP prion protein [Source:HGNC Symbol;Acc:HGNC:9449] protein_coding 176640 42 29601 PRRX2 paired related homeobox 2 [Source:HGNC Symbol;Acc:HGNC:21338] protein_coding
604675 67 0 PRTG protogenin [Source:HGNC Symbol;Acc:HGNC:26373] protein_coding 613261 46 0 RAB3C RAB3C, member RAS oncogene family [Source:HGNC Symbol;Acc:HGNC:30269] protein_coding 612829 54 0 RASA3 RAS p21 protein activator 3 [Source:HGNC Symbol;Acc:HGNC:20331] protein_coding 605182 47 41726 RGS20 regulator of G-protein signaling 20 [Source:HGNC Symbol;Acc:HGNC:14600] protein_coding 607193 38 0 RIMS1 regulating synaptic membrane exocytosis 1 [Source:HGNC Symbol;Acc:HGNC:17282] protein_coding 606629 30 0 ROR1 receptor tyrosine kinase-like orphan receptor 1 [Source:HGNC Symbol;Acc:HGNC:10256] protein_coding 602336 69 0 RUNX2 runt-related transcription factor 2 [Source:HGNC Symbol;Acc:HGNC:10472] protein_coding 600211 21 26734 SALL4 spalt-like transcription factor 4 [Source:HGNC Symbol;Acc:HGNC:15924] protein_coding 607343 37 0 SALL4 spalt-like transcription factor 4 [Source:HGNC Symbol;Acc:HGNC:15924] LRG_gene 607343 37 0 SAMD5 sterile alpha motif domain containing 5 [Source:HGNC Symbol;Acc:HGNC:21180] protein_coding 0 14 276 SDHAP3 succinate dehydrogenase complex, subunit A, flavoprotein pseudogene 3 [Source:HGNC Symbol;Acc:HGNC:18781] transcribed_unprocessed_pseudogene 0 66 12376 SEMA3A sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3A [Source:HGNC Symbol;Acc:HGNC:10723] protein_coding 603961 60 0

SERPINE2 serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 2 [Source:HGNC Symbol;Acc:HGNC:8951] protein_coding 177010 22 102884 SKAP1 src kinase associated phosphoprotein 1 [Source:HGNC Symbol;Acc:HGNC:15605] protein_coding 604969 45 0 SKAP2 src kinase associated phosphoprotein 2 [Source:HGNC Symbol;Acc:HGNC:15687] protein_coding 605215 114 0 SLC30A1 solute carrier family 30 (zinc transporter), member 1 [Source:HGNC Symbol;Acc:HGNC:11012] protein_coding 609521 56 22651

SNX10 sorting nexin 10 [Source:HGNC Symbol;Acc:HGNC:14974] protein_coding 614780 22 2163 SNX16 sorting nexin 16 [Source:HGNC Symbol;Acc:HGNC:14980] protein_coding 614903 46 17614 SOBP sine oculis binding protein homolog (Drosophila) [Source:HGNC Symbol;Acc:HGNC:29256] protein_coding 613667 59 4018

SORBS1 sorbin and SH3 domain containing 1 [Source:HGNC Symbol;Acc:HGNC:14565] protein_coding 605264 128,106 0,0 SP3 Sp3 transcription factor [Source:HGNC Symbol;Acc:HGNC:11208] protein_coding 601804 32 47105 SPATA13 spermatogenesis associated 13 [Source:HGNC Symbol;Acc:HGNC:23222] protein_coding 613324 52 0 SPECC1 sperm antigen with calponin homology and coiled-coil domains 1 [Source:HGNC Symbol;Acc:HGNC:30615] protein_coding 608793 36,98 0,0 SPTSSA serine palmitoyltransferase, small subunit A [Source:HGNC Symbol;Acc:HGNC:20361] protein_coding 613540 37 180456 STC2 stanniocalcin 2 [Source:HGNC Symbol;Acc:HGNC:11374] protein_coding 603665 31 22052 SULT6B1 sulfotransferase family, cytosolic, 6B, member 1 [Source:HGNC Symbol;Acc:HGNC:33433] protein_coding 0 33 0

SUSD5 sushi domain containing 5 [Source:HGNC Symbol;Acc:HGNC:29061] protein_coding 0 60 0 TBX3 T-box 3 [Source:HGNC Symbol;Acc:HGNC:11602] protein_coding 601621 26 0 TCEA3 transcription elongation factor A (SII), 3 [Source:HGNC Symbol;Acc:HGNC:11615] protein_coding 604128 28 0 TET1 tet methylcytosine dioxygenase 1 [Source:HGNC Symbol;Acc:HGNC:29484] protein_coding 607790 113 0 TMEM200A transmembrane protein 200A [Source:HGNC Symbol;Acc:HGNC:21075] protein_coding 0 25 182661 TRIB2 tribbles pseudokinase 2 [Source:HGNC Symbol;Acc:HGNC:30809] protein_coding 609462 368 40545 TRIM2 tripartite motif containing 2 [Source:HGNC Symbol;Acc:HGNC:15974] protein_coding 614141 56 0 TRPS1 trichorhinophalangeal syndrome I [Source:HGNC Symbol;Acc:HGNC:12340] protein_coding 604386 109 0 TTC29 tetratricopeptide repeat domain 29 [Source:HGNC Symbol;Acc:HGNC:29936] protein_coding 0 43 29945 VGLL3 vestigial-like family member 3 [Source:HGNC Symbol;Acc:HGNC:24327] protein_coding 609980 49 2345 VWA3B von Willebrand factor A domain containing 3B [Source:HGNC Symbol;Acc:HGNC:28385] protein_coding 0 91 0 ZNF295-AS1 ZNF295 antisense RNA 1 [Source:HGNC Symbol;Acc:HGNC:23130] lincRNA 0 82 15240 ZNF365 zinc finger protein 365 [Source:HGNC Symbol;Acc:HGNC:18194] protein_coding 607818 23 0 ZNF462 zinc finger protein 462 [Source:HGNC Symbol;Acc:HGNC:21684] protein_coding 0 23 1601 ZNF521 zinc finger protein 521 [Source:HGNC Symbol;Acc:HGNC:24605] protein_coding 610974 16 192549 ZNF521 zinc finger protein 521 [Source:HGNC Symbol;Acc:HGNC:24605] protein_coding 610974 16 192549 ZNF664-FAM101A NA NA 0 40 0 ZNF704 zinc finger protein 704 [Source:HGNC Symbol;Acc:HGNC:32291] protein_coding 0 30 50146 ZNF793-AS1 ZNF793 antisense RNA 1 (head to head) [Source:HGNC Symbol;Acc:HGNC:51303] antisense 0 66 0 ZNF804A zinc finger protein 804A [Source:HGNC Symbol;Acc:HGNC:21711] protein_coding 612282 12 873


Supplementary Table 11: Regulatory HGMD disease variants in conserved JUN sites annotated as disease causing. Supplementary Table 11 Legend: Seven unique genes annotated to rare regulatory variants in conserved JUN peaks are shown. The function of the gene product (RefSeq) and the associated diseases from the HGMD entry are shown. The original paper cited in the HGMD entry is summarized, including its major findings and the authors' predictions of altered TF binding sites. Our own variation-scan predictions are shown for comparison, where breakers are variants that disrupt a TF binding site and makers are variants that create one. Findings from other papers linking each gene to endothelial cells are shown with the corresponding PMIDs. The total number of variants annotated to each gene is shown in the last column.

Gene PMID rsID Gene encodes Disease Paper Tissues Findings Predicted variation- Relevance to Total association tested disruption of scan ECs number of TF binding predictions HGMD sites variants annotated to gene in conserved JUN site TARDBP 19695877 rs11121679 TDP-43, a Elevated Examined 46 SALS 16/46 patients Used Motif breaker Double- 1 transcriptional levels or sporadic ALS Brain bank had MatInspector (CH2H, homozygous repressor abnormalities (SALS) (prefrontal noncoding to predict that TGIF1 zebrafish of TDP-43 caused by cortex, less TARDBP rs11121679 (interacts with mutants of associated cytoplasmic affected by mutation abolishes JUN)), tardbp and with ALS, aggregates of ALS at time SF2-ASF no motif tardbpl dementia, TDP-43 of death than binding EGR makers (tardbp-like) head- protein in spinal cord nerve growth show muscle injury/concus motor neurons factor-induced degeneration, sions and glial cells. protein C. strongly (2009) Predicted reduced blood other circulation, mutations mispatterning (exonic) to of vessels. disrupt other PMID: TFs 23457265 TGFBR2 16799921 -334T>A TGF beta 2 Loeys–Dietz Examined 41 Blood of 41 All missense Used Motif maker TGFB 1 receptor, syndrome and unrelated patients vs. 93 mutations TFSEARCH (C2H2), signalling kinase domain Marfan individuals non-patients occurred in to predict gain motif breaker through the for TGF beta syndrome, with a kinase domain of GATA-1 at (TCF3, MYB, TGFBR2 signalling TGFBR2 confirmed or of TGFBR2 -334T>A. RFX5, C2H2) receptor, in (cell growth, mutations tentative protein, functionally endothelial cell found in 2.5% clinical predicted to tested by cells, plays an differentiation of individuals diagnosis of be luciferase important role 129

, apoptosis, affected by Marfan evolutionarily construct in cardiac cellular thoracic aortic Syndrome, in conserved transfection in development, homeostasis aneurysm and whom FBN1 between COS-7 and is dissection. (causal gene) mouse, monkey essential for mutations zebrafish, rat, kidney cell cerebral were not and dog. Six line. Showed vascular identified. mutations increase in integrity (2006) detected in activity. Did PMID: TGFBR2 and not confirm it 20652948 two in was GATA-1 TGFBR1. HSPB1 17623484 NULL Heat shock Amyotrophic Examined at Blood of 150 The C-217T Used Motif HSPB1 is 1 protein lateral sporadic ALS Belgian ALS in HSPB1 MatInspector breakers released induced by sclerosis, a (SALS) cases patients vs. core promoter to predict a (HSF1 (heat primarily environmental motor neuron and identified 262 non- is associated reduction in shock factor from stress and disease. this rare patients with a two- binding site protein 1), endothelial developmenta genetic fold reduction affinity of HESX1 cells (ECs) l changes. variant in a in basal heat shock (stress and regulates This protein is conserved transcriptional factor (HSF) response), angiogenesis a factor in "heat shock activity in C2H2 cluster via direct motor neuron element" in COS-1 and 52) interaction survival. the promoter SH-SY5Y no motif with vascular region of cells as makers endothelial HSPB1 compared to growth factor (2003) the wild type (VEGF) allele, using PMID: luciferase 24465581 assays

PTEN 12844284 NULL Phosphatidyli Cowden Examined Blood of 122 50% Predicted an Motif makers PI3K/Akt 8 nositol-3,4,5- disease, variants in unrelated reduction in alteration of (CREB1, signalling trisphosphate characterized PTEN in individuals PTEN protein SP1 site JDP2, PAX2, regulates 3-phosphatase by multiple individuals diagnosed in patients PAX4, TLX1, angiogenesis that as a tumour-like with Cowden with CS vs. with promoter MYB) through tumour growths syndrome 186 non- mutations. motif breakers affecting the suppressor called (CS) and patients Ten (THAP1 (EC expression of that is hamartomas Bannayan- heterozygous proliferation), VEGF mutated in a and an Riley- sequence E2F1, C2H2, PMID: large number increased risk Ruvalcaba variants TBP, SPZ1, 22505939 of cancers at of certain syndrome within the MBD2 (EC), high forms of (BRRS) PTEN TFIG, frequency cancer. (2003) promoter TEAD1,EPA region were S1, HEY2). found in nine patients with 130

CS.

PTEN 17847000 NULL " Cowden Examined Blood of 122 They further They could Motif makers " " disease variants in unrelated characterized not predict if (TFCP2,NKX PTEN in CS individuals promoter these 2.5,PAX5,YB and BRRS, diagnosed variants found promoter X1,E2F1), same authors with Cowden in 5 CS mutations motif breakers as above. syndrome patients. altered TF (C2H2, (2007) (CS) vs. 186 Promoter binding sites PAX2, RFX2, controls variant cause and suggested XBP1, JUN) large mRNA that abnormal secondary protein structure translation as alterations, a novel resulting in an mechanism of inhibition of CS protein pathogenesis translation and a decrease in PTEN protein expression. 2/5 variants showed 50% reduction in luciferase activity with promoter mutation in MCF7 cells. 3/5 variants had normal mRNA expression but aberrant secondary structures. DIAPH3 20624953 NULL formin Auditory Examined Blood or a 2- to 3-fold SP1, KLF by Motif makers Increased 1 involved in neuropathy, a variant in saliva samples overexpressio comparison to (C2H2, REL, chromosomal actin rare form of highly from 3 related n of DIAPH3 binding motif TFDP2, loss of the remodelling, deafness conserved 5' individuals mRNA in ZNF350, DIAPH3 which characterized UTR of affected by lymphoblastoi E2F1, locus in a regulates cell by an absent DIAPH3 in non- d cell lines PTF1A, cohort of movement or abnormal auditory syndromic from affected SREBF1) prostate and adhesion auditory neuropathy. auditory individuals. - no motif cancer brainstem (2010) neuropathy 172G > A breakers patients, response with vs. 379 non- mutation is related to preservation patients sufficient to microvesicle of outer hair drive release and 131

cell function. overexpressio tumour cell n of a growth luciferase facilitated by reporter activation of ECs. PMID 20445011.

PROCR 11552992 Receptor for Late foetal Looked at 95 women Five Insertion in Motif makers Thrombomod 1 activated loss, involvement with mutations patients (C2H2, ulin (TM) and protein C, a associated of PROCR unexplained were corresponding STAT4/5/2, the serine with placental and late fetal loss identified in to Ets1 ZEB1), endothelial protease insufficiency thrombomudu (> 20 weeks) the TM gene recognition motif breakers protein C activated by and lin (TM) in vs. 236 in 95 patients motif, (ATF5) receptor and involved coagulation late fetal loss. women and three in insertion is (EPCR) are in the blood defects. (2001). 236 control reported in glycoprotein coagulation subjects, and patients with receptors pathway two mutations venous expressed were thromboembo mainly on the identified in lism and endothelial the EPCR myocardial surface of gene in 95 infarction blood vessels patients and and also in the one in 236 placenta; they control both play a subjects. key physiological role in the protein C anticoagulant pathway. RMRP various various nuclear long Cartilage-Hair Transcriptiona 48 noncoding Hypoplasia, a l profiling of RNA bone growth CHH patient disorder RNAs showed characterized upregulation by short of several stature, cytokines and abnormal cell cycle immune regulatory function, and genes PMID: fine spare hair 16254002


License for Citation (Park et al. 2013) Supplementary Table 12 WOLTERS KLUWER HEALTH, INC. LICENSE TERMS AND CONDITIONS This Agreement between Lina Antounians ("You") and Wolters Kluwer Health, Inc. ("Wolters Kluwer Health, Inc.") consists of your license details and the terms and conditions provided by Wolters Kluwer Health, Inc. and Copyright Clearance Center.

Total Terms and Conditions
Terms and conditions Wolters Kluwer Health
1. Transfer of License: Wolters Kluwer hereby grants you a non-exclusive license to reproduce this material for this purpose, and for no other use, subject to the conditions herein
https://s100.copyright.com/MyAccount/viewPrintableLicenseDetails?ref=156c28ac-8930-46a9-80b0-561243ab2e77 Page 1 of 3 RightsLink - Your Account 2015-07-28, 5:20 PM
2. Credit Line: A credit line will be prominently placed, wherever the material is reused and include: the author(s), title of article, title of journal, volume number, issue number and inclusive pages. Where a journal is being published by a learned society, the details of that society must be included in the credit line.
i. for Open access journals: The following statement needs to be added when reprinting the material in Open Access journals only: ‘promotional and commercial use of the material in print, digital or mobile device format is prohibited without the permission from the publisher Wolters Kluwer Health. Please contact [email protected] for further information’
3. Exceptions: In case of Disease Colon Rectum, Plastic Reconstructive Surgery, The Green Journal, Critical care Medicine, Pediatric Critical Care Medicine, the American Heart Publications, the American Academy of Neurology the following guideline applies: no drug/ trade name or logo can be included in the same page as the material re-used.
4. Translations: When requesting a permission to translate a full text article, Wolters Kluwer/ Lippincott Williams & Wilkins request to receive the pdf of the translated document. This disclaimer should be added at all times: Wolters Kluwer Health and its Societies take no responsibility for the accuracy of the translation from the published English original and are not liable for any errors which may occur.


5. Warranties: The requestor warrants that the material shall not be used in any manner which may be considered derogatory to the title, content, or authors of the material, or to Wolters Kluwer.
6. Indemnity: You hereby indemnify and hold harmless Wolters Kluwer and their respective officers, directors, employees and agents, from and against any and all claims, costs, proceeding or demands arising out of your unauthorised use of the Licensed Material.
7. Geographical Scope: Permission granted is valid worldwide in the English language and the languages specified in your original request.
8. Wolters Kluwer cannot supply the requestor with the original artwork or a “clean copy.”
9. Permission is valid if the borrowed material is original to a Wolters Kluwer imprint (Lippincott-Raven Publishers, Williams & Wilkins, Lea & Febiger, Harwal, Rapid Science, Little Brown & Company, Harper & Row Medical, American Journal of Nursing Co, and Urban & Schwarzenberg).
10. Termination of contract: If you opt not to use the material requested above please notify RightsLink or Wolters Kluwer Health/ Lippincott Williams & Wilkins within 90 days of the original invoice date.
11. This permission does not apply to images that are credited to publications other than Wolters Kluwer journals. For images credited to non-Wolters Kluwer Health journal publications, you will need to obtain permission from the journal referenced in the figure or table legend or credit line before making any use of image(s) or table(s).
12. Third party material: Adaptations are protected by copyright, so if you would like to reuse material that we have adapted from another source, you will need not only our permission, but the permission of the rights holder of the original material. Similarly, if you want to reuse an adaptation of original LWW content that appears in another publisher's work, you will need our permission and that of the next publisher. The adaptation should be credited as follows: Adapted with permission from Wolters Kluwer Health: Book author, title, year of publication or Journal name, article author, title, reference citation, year of publication.
13. Altering or modifying material: Please note that modification of text within figures or full-text article is strictly forbidden.
14. Please note that articles in the ahead-of-print stage of publication can be cited and the content may be re-used by including the date of access and the unique DOI number. Any final changes in manuscripts will be made at the time of print publication and will be reflected in the final electronic issue.
   Disclaimer: Articles appearing in the Published Ahead-of-Print section have been peer-reviewed and accepted for publication in the relevant journal and posted online before print publication. Articles appearing as publish ahead-of-print may contain statements, opinions, and information that have errors in facts, figures, or interpretation. Accordingly, Lippincott Williams & Wilkins, the editors and authors and their respective employees are not responsible or liable for the use of any such inaccurate or misleading data, opinion or information contained in the articles in this section.
15. Duration of the license:
   i. Permission is granted for a one-time use only within 12 months from the date of this invoice. Rights herein do not apply to future reproductions, editions, revisions, or other derivative works. Once the 12-month term has expired, permission to renew must be submitted in writing.
   ii. For content reused in another journal or book, in print or electronic format, the license is one-time use and lasts for the 1st edition of a book or for the life of the edition in case of journals.
   iii. If your Permission Request is for use on a website (which is not a journal or a book), internet, intranet, or any publicly accessible site, you agree to remove the material from such site after 12 months or else renew your permission request.
16. Contingent on payment: While you may exercise the rights licensed immediately upon issuance of the license at the end of the licensing process for the transaction, provided that you have disclosed complete and accurate details of your proposed use, no license is finally effective unless and until full payment is received from you (either by publisher or by CCC) as provided in CCC's Billing and Payment terms and conditions. If full payment is not received on a timely basis, then any license preliminarily granted shall be deemed automatically revoked and shall be void as if never granted. Further, in the event that you breach any of these terms and conditions or any of CCC's Billing and Payment terms and conditions, the license is automatically revoked and shall be void as if never granted. Use of materials as described in a revoked license, as well as any use of the materials beyond the scope of an unrevoked license, may constitute copyright infringement and publisher reserves the right to take any and all action to protect its copyright in the materials.
17. Waived permission fee: If the permission fee for the requested use of our material has been waived in this instance, please be advised that your future requests for Wolters Kluwer materials may attract a fee on another occasion. Please always check with the Wolters Kluwer Permissions Team if in doubt [email protected]
For Books only:
18. 1. Permission is granted for a one time use only. Rights herein do not apply to future reproductions, editions, revisions, or other derivative works.
SPECIAL CASES:
1. For STM Signatories only, as agreed as part of the STM Guidelines: Any permission granted for a particular edition will apply also to subsequent editions and for editions in other languages, provided such editions are for the work as a whole in situ and do not involve the separate exploitation of the permitted illustrations or excerpts. Please click here to view the STM guidelines.
v1.11
Questions? [email protected] or +1-855-239-3415 (toll free in the US) or +1-978-646-2777.
