
Predicting Cell Types and Genetic Variations Contributing to Disease by Combining GWAS and Epigenetic Data

Anna Gerasimova*, Lukas Chavez, Bin Li, Gregory Seumois, Jason Greenbaum, Anjana Rao, Pandurangan Vijayanand, Bjoern Peters

La Jolla Institute for Allergy and Immunology, La Jolla, California, United States of America

Abstract

Genome-wide association studies (GWASs) identify single nucleotide polymorphisms (SNPs) that are enriched in individuals suffering from a given disease. Most disease-associated SNPs fall into non-coding regions, so it is not straightforward to infer their phenotype or function. Moreover, many SNPs are in tight genetic linkage, so a SNP identified as associated with a particular disease may not itself be causal, but may rather signify the presence of a linked SNP that is functionally relevant to disease pathogenesis. Here, we present an analysis method that takes advantage of the recent rapid accumulation of epigenomics data to address these problems for some SNPs. Using asthma as a prototypic example, we show that non-coding disease-associated SNPs are enriched in genomic regions that function as regulators of transcription, such as enhancers and promoters. Identifying enhancers based on the presence of histone modification marks such as H3K4me1 in different cell types, we show that the location of enhancers is highly cell-type specific. We use these findings to predict which SNPs are likely to contribute directly to disease based on their presence in regulatory regions, and in which cell types their effect is expected to be detectable. Moreover, we can also predict which cell types contribute to a disease based on the overlap of the disease-associated SNPs with the locations of enhancers present in a given cell type. Finally, we suggest that it will be possible to re-analyze GWASs with much higher power by limiting the SNPs considered to those in coding or regulatory regions of cell types relevant to a given disease.

Citation: Gerasimova A, Chavez L, Li B, Seumois G, Greenbaum J, et al. (2013) Predicting Cell Types and Genetic Variations Contributing to Disease by Combining GWAS and Epigenetic Data. PLoS ONE 8(1): e54359. doi:10.1371/journal.pone.0054359

Editor: Yi-Hsiang Hsu, Harvard Medical School, United States of America

Received July 17, 2012; Accepted December 11, 2012; Published January 30, 2013

Copyright: © 2013 Gerasimova et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: Funding for this study was provided by United States National Institutes of Health (NIH) R01 HL114093 and NIH U19 AI100275. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected]

Introduction

Asthma is a chronic inflammatory disease, characterized by reversible airway obstruction and increased bronchial hyperresponsiveness. This complex disorder is influenced by the interdependencies between various factors, genetic and environmental being the most important ones. The estimate that 35–80% of the variation in the risk of asthma can be attributed to genetic variation has spurred a number of genome-wide association studies (GWASs) of asthma [1–17], making it one of the best studied diseases to date. Most of these studies involved genotyping asthmatic vs. non-asthmatic donors using single nucleotide polymorphism (SNP) arrays that can detect the presence or absence of up to a million SNPs [10]. Using information from studies such as the 1000 Genomes [18] and HapMap projects [19], the presence of additional SNPs not present on the genotyping arrays can be extrapolated by imputation [20,21]. GWASs have led to the discovery of a large set of SNPs that are significantly more frequent in patients with asthma compared to individuals without asthma or healthy controls. However, identifying a functional link between the presence of an asthma-risk SNP and development of disease has not been straightforward [22], primarily because the majority of identified SNPs are located in non-coding regions, so that there is no obvious expected phenotype or function, but also because of the tight genetic linkage of SNPs in a haploblock, only a few of which will have a functional effect.

Recently, there has been a vast acceleration in the identification of non-coding genomic elements that regulate gene transcription [23,24]. This has been enabled by the advent of genome-wide chromatin analysis, such as profiling of the histone mark H3K4me1 (histone 3 lysine 4, monomethylation). This mark is found on histones associated with genomic regions accessible to regulatory DNA-binding proteins (transcription factors), which thereby function as enhancers of transcription. Enhancers can also be marked by other protein modifications, e.g. H3K4me2 (histone 3 lysine 4, dimethylation) and H3K27ac (histone 3 lysine 27, acetylation), as well as by the histone variant H2A.Z [25–27]. In this study, we focused on the H3K4me1 and H3K27ac marks, as more data were publicly available for these markers.

Notably, the location of enhancers is highly cell-type specific [28], supporting the notion that different cell types maintain their specialized functions by selectively activating different regulatory regions of the genome [29]. Moreover, several recent studies have shown that disease-associated SNPs are enriched in enhancers [29–32].

Based on these considerations, we tested the hypothesis that SNPs associated with a specific disease are more frequently found in enhancers specific to cells that are relevant for the disease. Using asthma as an example, we document a significant enrichment of asthma-associated SNPs in genomic regions marked by H3K4me1 in CD4+ T cells, which are known to contribute to asthma pathogenesis. In contrast, cells from tissues unrelated to asthma, such as brain, breast and skeletal muscle, are depleted of asthma-associated SNPs in their enhancer regions. Thus, the methodology we report here can be utilized to make an unbiased prediction of which cell types contribute to disease pathogenesis, and which disease-associated SNPs are likely to be functionally important.

Results

The Majority of Asthma-associated SNPs is Located in Non-coding Regions

We retrieved all known asthma-associated SNPs from the GWAS integrator database [33], resulting in 131 SNPs (Table S1 in File S1). For all these directly disease-associated SNPs, we also retrieved SNPs in tight genetic linkage (n = 2510, r² = 0.8) based on the most recent release of HaploReg [30] (Table S2 in File S1). As a background control set, we assembled non-asthma-associated SNPs from the DbSNP135Common dataset of the UCSC Browser [34,35]. The distribution of the three sets of SNPs in coding regions (cds), 5′-untranslated regions (5′-UTRs), 3′-UTRs, introns, and intergenic regions is shown in Figure 1. As expected, asthma-associated SNPs were highly enriched in coding sequences compared to the background set of SNPs, and a significant but lower enrichment was found for 3′- and 5′-UTRs. However, the largest proportion of asthma-associated SNPs was found in introns and intergenic regions, i.e. non-coding sequences known to contain enhancers.

Non-coding SNPs are Significantly Enriched in Enhancers

In an elegant study by the Kellis group [36], genomic regions in CD4+ T cells were comprehensively classified into different chromatin states using a Hidden Markov Model. We used these classifications to examine if any chromatin states were enriched for asthma-associated SNPs.

In addition, we have detected an enrichment of asthma SNPs in many active intergenic states (states 29, 30, and 34–36). In contrast, regions with heterochromatin state and repetitive elements (states 40–42 and 48) showed high depletion for disease-associated SNPs, as did some states associated with transcription (states 15, 17–19, 23 and 28).

Notably, we also found significant enrichment in states 9 and 34, which are associated with promoter and enhancer regions, suggesting that SNPs present in non-coding regions (enhancers and promoters) contribute to disease pathogenesis by perturbing the transcriptional regulation of an associated gene.

The Location of Active Enhancers is Primarily a Function of Cell Type

Enhancers have previously been reported to be tissue and cell-type specific [37,38]. To determine if the SNP enrichment in enhancers that we found for CD4+ T cells (which are known contributors to asthma) was a cell-type specific effect, we compared the available data on the distribution of the H3K4me1 modification for human CD4+ T cells and several other human tissues/organs using published studies. We obtained all available H3K4me1 Chromatin ImmunoPrecipitation (ChIP)-Seq data from the epigenome atlas [39,40], limiting our analysis to data for which both IP and control input samples were available. A total of 37 samples from 19 distinct cell types and 8 tissues were retrieved, and putative enhancers in each dataset were identified by H3K4me1 peak calling (Materials and Methods). As an example of the cell type-specific distribution of H3K4me1, Figure 2 shows the extended Th2 cytokine locus. The well-known locus control region LCR-O and the hypersensitivity site (HS) V, which both function as enhancers in CD4+ T cells, are marked in red boxes [41]. Importantly, there are H3K4me1 peaks at these regions in six CD4+ T cell types (Figure 2), but no peaks were found in the kidney, liver or brain cells. This result demonstrates that our method for calling H3K4me1 peaks accurately identifies known enhancers.

Next, we compared the enhancers identified in different datasets by calculating pair-wise Matthews correlation coefficients (MCC), which quantify the overlap of enhancers on a per nucleotide basis (see Materials and Methods).
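
As an illustration of the per-nucleotide comparison, the following minimal sketch computes an MCC between two sets of peak intervals. It uses the standard MCC definition and is not the procedure from Materials and Methods: it assumes half-open [start, end) coordinates on a single chromosome of known length, and the function names (merge, covered, overlap, per_base_mcc) are hypothetical.

```python
from math import sqrt

def merge(intervals):
    """Sort and merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def covered(intervals):
    """Number of bases covered by a merged interval list."""
    return sum(end - start for start, end in intervals)

def overlap(a, b):
    """Number of bases covered by both merged, sorted interval lists."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total

def per_base_mcc(peaks_a, peaks_b, region_length):
    """MCC over bases: each base is 'enhancer' or not in each dataset.

    Assumption: both peak sets lie on one chromosome of region_length bases.
    """
    a, b = merge(peaks_a), merge(peaks_b)
    tp = overlap(a, b)                   # bases marked in both datasets
    fp = covered(a) - tp                 # bases marked only in dataset A
    fn = covered(b) - tp                 # bases marked only in dataset B
    tn = region_length - tp - fp - fn    # bases marked in neither
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Under these assumptions, two identical peak sets give an MCC of 1, while peak sets that overlap less than expected by chance give a negative value; a genome-wide version would sum the four base counts over all chromosomes before applying the formula.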