Histone Modification Levels Are Predictive for Gene Expression
Total Page:16
File Type:pdf, Size:1020Kb
Histone modification levels are predictive for gene expression Rosa Karlića,b,1, Ho-Ryun Chunga,1,2, Julia Lasserrea, Kristian Vlahovičekb,c, and Martin Vingrona aMax-Planck-Institut für Molekulare Genetik, Department of Computational Molecular Biology, Ihnestraße 73, 14195 Berlin, Germany; bBioinformatics Group, Division of Biology, Faculty of Science, Zagreb University, Horvatovac 102a, 10000 Zagreb, Croatia; and cDepartment of Informatics, University of Oslo, P.O. Box 1080, Blindern, NO-0316 Oslo, Norway Edited by Robert G. Roeder, The Rockefeller University, New York, NY, and approved January 7, 2010 (received for review August 20, 2009) Histones are frequently decorated with covalent modifications. and are tightly regulated to achieve a precise control of gene ex- These histone modifications are thought to be involved in various pression. The regulatory mechanisms depend on the action of chromatin-dependent processes including transcription. To eluci- transcription factors, which facilitate the recruitment of pol II date the relationship between histone modifications and transcrip- and/or chromatin modifying complexes. Histone modifications tion, we derived quantitative models to predict the expression can therefore be viewed as a read out of the activity of transcrip- level of genes from histone modification levels. We found that his- tion factors. In line with this idea, there are established links be- tone modification levels and gene expression are very well corre- tween the distinct steps in the transcription cycle and some lated. Moreover, we show that only a small number of histone histone modifications, e.g., H3K4me3 is associated with tran- modifications are necessary to accurately predict gene expression. scription initiation [(14) and references therein], H3K27me3 with We show that different sets of histone modifications are necessary the repression of transcription elongation [(15) and references to predict gene expression driven by high CpG content promoters therein], and H3K36me3 with the removal of histone acetylations (HCPs) or low CpG content promoters (LCPs). Quantitative models in the wake of an elongating pol II (for review, see ref. 11). How- involving H3K4me3 and H3K79me1 are the most predictive of the ever, in general, little is known about the relationship between expression levels in LCPs, whereas HCPs require H3K27ac and histone modifications and the transcriptional process. H4K20me1. Finally, we show that the connections between histone Here, we systematically study this relationship. We analyzed modifications and gene expression seem to be general, as we were the recently published genome-wide localization data of 38 his- BIOPHYSICS AND able to predict gene expression levels of one cell type using a mod- tone modifications and one histone variant measured in human COMPUTATIONAL BIOLOGY el trained on another one. CD4þ T-cells (16, 17). We address four major questions: (i)Is there a quantitative relationship between histone modifications high CpG content promoter ∣ low CpG content promoter ∣ levels and transcription? (ii) Are there histone modifications that regression analysis ∣ transcription are more important than others to predict transcript levels? (iii) Are there different requirements for different promoter iv he DNA of eukaryotic organisms is packaged into chromatin, types? ( ) Are the relationships general? To answer these ques- Twhose basic repeating unit is the nucleosome. A nucleosome tions, we derived models that quantitatively relate the measured is formed by wrapping 147 base pairs of DNA around an octamer expression level of genes (18) to the level of modifications at their of four core histones, H2A, H2B, H3, and H4 (1–5) which are promoters. We show that our models faithfully capture the mea- subject to a number of posttranslational covalent modifications sured expression levels of genes, suggesting that the levels of [(6); for review, see ref. 7]. These modifications can alter the modifications are quantitatively related to gene expression and chromatin structure and function by changing the charge of that there is a tight link between these histone modifications the nucleosome particle, and/or by recruiting protein complexes and the transcriptional process. Furthermore, combinations of either individually or in combination (8). Hence, histone modifi- only two to three modifications are sufficient to build models that cations are thought to constitute a “Histone Code,” which is read give rise to at least 95% of the performance obtained by using all out by proteins to bring about specific downstream effects (9, 10). modifications. The separation of low CpG content promoters Histone modifications have been linked to a number of chro- (LCPs) from high CpG content promoters (HCPs) revealed that matin-dependent processes, including replication, DNA-repair, different histone modifications are identified dependent on the and transcription. The link between histone modifications and CpG content of the promoters. Finally, gene expression levels transcription has been particularly intensively studied. It has been of another cell type were accurately predicted using a model 4þ found that individual modifications can be associated with trained on the CD T-cells, suggesting that the relationship be- transcriptional activation or repression. Acetylation and phos- tween histone modifications and gene expression is a general one. phorylation generally accompany transcription; sumoylation, Results deimination, and proline isomerization are usually found in transcriptionally silent regions; methylation and ubiquitination Histone Modification Levels Are Predictive of Gene Expression in CD4þ are implicated in both activation and repression of transcription T-Cells. The presence or absence of certain histone modifi- (8). Furthermore, the establishment of some modifications is de- cations has been shown to correlate with the expression status of pendent on the presence of other modifications, e.g., the catalysis of H3K4me3 requires the presence of H2BK120ub1 (the Author contributions: R.K., H.-R.C., K.V., and M.V. designed research; R.K. and H.-R.C. so-called trans-tail pathway) and the phosphorylation on serine performed research; J.L. contributed new reagents/analytic tools; R.K. and H.-R.C. 5 on the C-terminal domain of RNA polymerase II (pol II) analyzed data; R.K., H.-R.C., J.L., K.V., and M.V. wrote the paper. (for review, see ref. 11, which also reviews other examples for The authors declare no conflict of interest. the combinatorial action of histone modifications). This article is a PNAS Direct Submission. Transcription proceeds in a series of steps, also referred to as Freely available online through the PNAS open access option. transcription cycle, starting with preinitiation complex formation, 1R.K. and H.-R.C. contributed equally to this work. pol II recruitment, the transition to an initiating and thereafter 2To whom correspondence should be addressed. E-mail: [email protected]. elongating pol II, elongation, and finally termination (for review, This article contains supporting information online at www.pnas.org/cgi/content/full/ see refs. 12, 13). The first four steps take place at the promoter 0909344107/DCSupplemental. www.pnas.org/cgi/doi/10.1073/pnas.0909344107 PNAS Early Edition ∣ 1of6 Downloaded by guest on September 27, 2021 genes (19). To get a better understanding of the relationships One-modification model (41 models) between histone modifications and gene expression, we analyzed f(N'i) =a+bi *N'i the publicly available genome-wide localization data for 38 his- N' f(N' ) y tone modifications and one histone variant in human CD4þ i i T-cells (ChIP-seq data) (16, 17). We used the numbers of tags Two-modifications model (820 models) for each histone modification or variant, found in a region of f(N' N' ) =a+b N' +b N' 4,001 base pairs surrounding the transcription start sites of i, j i i j j 14,802 RefSeq genes, as an estimation of the level of histone N'i modifications. Furthermore, we examined microarray data mea- f(N' N' ) y 4þ i, j suring the transcript abundance of each of these genes in CD N'j T-cells (18), where the logarithm of the intensity served as expres- sion value. Because open chromatin regions preferentially have Three-modifications model (10,660 models) higher tag counts than closed chromatin regions (20, 21), we also f(N' N' N' ) =a+b N' +b N' +b N' obtained ChIP-seq data for unspecific control ChIPs using goat i, j, k i i j j k k and rabbit IgG antibodies in CD4þ T-cells (22) and mapped N'i these tags to the same regions as the histone modification data. We derived a simple model that relates the expression values to N' y j f(N'i, N'j, N'k ) the histone modification levels (for details, see Methods). Briefly, N i we transformed the number of tags ij for each modification and N'k promoter j to a logarithmic scale. Because some Nij were zero, we had to add a pseudocount αi to the levels of each modification Full model (1 model) to assure the logarithm would always be defined (N0 ¼ ij f(N' ... N' ) =a+b N' + ... + b N' logðNij þ αiÞ). We chose the αi that maximizes the correlation 1, , 41 1 1 41 41 of Nij to the expression value (estimated using 4,934 randomly N' selected promoters). We then built for the remaining 9,868 pro- 1 moters a linear regression model where the entire set of modifi- f(N' , ..., N' ) y cations and the control IgG data served as input (Fig. 1; referred ... 1 41 “ ” to as full model ). We used 10-fold cross-validation to ascertain N'41 that a possible quantitative relationship is of general nature and not limited to a subset of genes. We evaluated the performance of Fig. 1. Modeling framework. Models are equations that linearly relate the N0 the model by determining the Pearson correlation coefficient r levels of histone modifications to the measured expression value. i between modeled and measured expression. corresponds to a vector of length L (the number of promoters), where the r ¼ 0 77 components are the transformed levels of a histone modification i The full model is very well correlated to expression ( .