Quantitative Analysis of Chip-Seq Data Uncovers Dynamic and Sustained H3k4me3 and H3k27me3 Modulation in Cancer Cells Under Hypo
Total Page:16
File Type:pdf, Size:1020Kb
Adriaens et al. Epigenetics & Chromatin (2016) 9:48 DOI 10.1186/s13072-016-0090-4 Epigenetics & Chromatin RESEARCH Open Access Quantitative analysis of ChIP‑seq data uncovers dynamic and sustained H3K4me3 and H3K27me3 modulation in cancer cells under hypoxia Michiel E. Adriaens1,2*† , Peggy Prickaerts3†, Michelle Chan‑Seng‑Yue4,8, Twan van den Beucken5,6,7, Vivian E. H. Dahlmans3, Lars M. Eijssen2, Timothy Beck4,9, Bradly G. Wouters5,6,7, Jan Willem Voncken3 and Chris T. A. Evelo2 Abstract Background: A comprehensive assessment of the epigenetic dynamics in cancer cells is the key to understand‑ ing the molecular mechanisms underlying cancer and to improving cancer diagnostics, prognostics and treat‑ ment. By combining genome-wide ChIP-seq epigenomics and microarray transcriptomics, we studied the effects of oxygen deprivation and subsequent reoxygenation on histone 3 trimethylation of lysine 4 (H3K4me3) and lysine 27 (H3K27me3) in a breast cancer cell line, serving as a model for abnormal oxygenation in solid tumors. A priori, epigenetic markings and gene expression levels not only are expected to vary greatly between hypoxic and normoxic conditions, but also display a large degree of heterogeneity across the cell population. Where traditionally ChIP-seq data are often treated as dichotomous data, the model and experiment here necessitate a quantitative, data-driven analysis of both datasets. Results: We first identified genomic regions with sustained epigenetic markings, which provided a sample-specific reference enabling quantitative ChIP-seq data analysis. Sustained H3K27me3 marking was located around cen‑ tromeres and intergenic regions, while sustained H3K4me3 marking is associated with genes involved in RNA binding, translation and protein transport and localization. Dynamic marking with both H3K4me3 and H3K27me3 (hypoxia- induced bivalency) was found in CpG-rich regions at loci encoding factors that control developmental processes, congruent with observations in embryonic stem cells. Conclusions: In silico-identified epigenetically sustained and dynamic genomic regions were confirmed through ChIP-PCR in vitro, and obtained results are corroborated by published data and current insights regarding epigenetic regulation. Keywords: Epigenomics, Transcriptomics, Data normalization, ChIP-sequencing, H3K4me3, H3K27me3, Hypoxia, MCF7 *Correspondence: [email protected] †Michiel E Adriaens and Peggy Prickaerts have contributed equally to this work 1 Maastricht Centre for Systems Biology – MaCSBio, Maastricht University, Maastricht, The Netherlands Full list of author information is available at the end of the article © 2016 The Author(s). This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/ publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Adriaens et al. Epigenetics & Chromatin (2016) 9:48 Page 2 of 11 Background unsupervised approach is required to identify and define Epigenomics explores genome-wide epigenetic modi- epigenetically invariant genomic regions. fications, such as DNA (hydroxy)methylation or post- Second, although biological interpretation of ChIP-seq translational modification of N-terminal histone tails data by itself is legitimate [19–21], integration of tran- [1]; such marking is often dynamic and serves to support scriptomics data is pivotal for a comprehensive biological phenotypic plasticity. Well-known examples of histone interpretation [22]. We extrapolate from this observation modifications are trimethylation of histone 3 lysine 4 to propose that transcriptomics data analysis is a require- (H3K4me3) and lysine 27 (H3K27me3), which are often ment for biology-driven preprocessing of ChIP-seq data. associated with transcriptional activity and repression, Specifically, the invariant regions used for ChIP-seq data respectively. Combined chromatin immunoprecipita- normalization should be defined in conjunction with tion (ChIP) and high-throughput sequencing technology transcriptomics data. (ChIP-seq [2, 3]) has proven to be a powerful approach to Considering the challenges presented by a highly vari- create genome-wide maps of such histone modifications ant biological system, we pursued the invariant-set para- in multiple human cancer cell lines [4–7], and the key digm outlined above in a data-driven analysis approach to understanding the molecular mechanisms underlying combining genes with sustained expression and genomic cancer and to improving cancer diagnostics, prognostics regions with sustained H3K4me3 and H3K27me3 mark- and treatment [1, 8, 9]. ings across all states. The approach enables quantitative In this study we focused specifically on ChIP-seq- data comparison for H3K4me3 and H3K27me3 ChIP-seq based analysis of dynamically changing H3K4me3 and data and a comprehensive integrative analysis of epig- H3K27me3 enrichment in response to fluctuating oxygen enomics and transcriptomics data. The biological insights levels. MCF7 cells were exposed to 8 and 24 h of hypoxia, derived from the analysis are further corroborated a and subsequently reoxygenated, serving as a model for posteriori by publicly available data and current insights such dynamic epigenetic effects occurring in response to regarding epigenetic regulation. reduced oxygen availability in solid tumors [10]. Analysis of a highly dynamic biological system, where the major- Results ity of epigenetic markings and gene expression levels will Data generation be altered between physiologically distinct states, as well MCF7 cells were exposed to 8 or 24 h of hypoxia (0.02 % as display a large degree of heterogeneity across the cell oxygen) and subsequently reoxygenated for 8 h (21 % population due to epigenetic selection and stochasticity, oxygen) (see Methods section for details). Cells were har- poses a number of challenges on data analysis. vested and processed for ChIP-seq and microarray-based First, the expected heterogeneity, i.e., the observation expression analysis at 0, 8 and 24 h of hypoxia (referred that the same region can harbor a specific epigenetic to as t = 0, t = 8 and t = 24, respectively), and at 8 h of mark in some part of the cell population, while not in reoxygenation (referred to henceforth as t =+8). the other, results in (1) a quantitative range of ChIP-seq signals within a state and (2) quantitative differences Quantitative analysis of sustained and dynamic H3K27me3 between states. Although ChIP-seq data are tradition- enrichment under fluctuating oxygen levels ally treated as dichotomous (i.e., present/not present), To enable quantitative analysis, H3K27me3 datasets were here quantitative data comparison is therefore a must. normalized based on identification of regions with sus- This requires normalization of the ChIP-seq data [11]. tained H3K27me3 enrichment between all samples ana- Most commonly, a scaling approach relative to the total lyzed. The cumulative area under the curve (AUC) for all number of aligned reads is applied, based on the predic- peaks in all these regions was determined for each con- tion that the number of differences between conditions is dition, resulting in sample-specific scaling factors (see limited [12–14]. This assumption, however, does not hold Methods section for details). here, as quantitative data comparison between highly For H3K27me3, after normalization, a single cutoff variant states requires invariant measurement sets across value for all samples was set at the peak height above all experimental conditions as a reference, such as an which the H3K27me3 data correlated with the ChIP- artificial spike-in [15–17], or biologically steadily marked seq data of CBX8, a known H3K27me3-binding protein H3K4me3/H3K27me3 regions (by analogy: steadily under normoxic conditions (t = 0) [23], corresponding expressed genes in transcriptomic studies). However, to a normalized height of 5.0 on a log2 scale. For biologi- a priori definition of regions with sustained epigenetic cal interpretation analyses, we set the level of biologi- marking is challenging, as this depends on the biologi- cal significance at two times this background level (6.0 cal model as well as the studied histone modification on a log2 scale) and considered peaks below this value [13]. To accommodate study-specific approaches [18], an background noise (i.e., biologically irrelevant). To add Adriaens et al. Epigenetics & Chromatin (2016) 9:48 Page 3 of 11 further biological validity to this threshold, we defined enrichment of H3K4me3 surrounding in TSS (region a set of 1137 genes whose expression levels were above defined as− 1000 bp to +100 bp of the TSS [4]) was the 95th percentile for all samples analyzed, i.e., consist- summed, yielding sample-specific scaling factors to ently highly expressed genes. Any repressive H3K27me3 which each peak in the dataset was scaled. A number of enrichment present in these genes were considered back- genes with dynamically altered or sustained expression ground noise, and indeed, the median enrichment of were validated by conventional ChIP-PCR. Oxygen dep- highly expressed genes H3K27me3 peaks was found