The Epigenome Tools 2: Chip-Seq and Data Analysis
Total Page:16
File Type:pdf, Size:1020Kb
The Epigenome Tools 2: ChIP-Seq and Data Analysis Chongzhi Zang [email protected] http://zanglab.com PHS5705: Public Health Genomics March 20, 2017 1 Outline • Epigenome: basics review • ChIP-seq overview • ChIP-seq data analysis 2 Epigenome histone nucleosome The epigenome is a multitude of chemical compounds that can tell the genome what to do. The epigenome is made up of chemical compounds and proteins that can attach to DNA and direct such actions as turning genes on or off, controlling the production of proteins in particular cells. -- from genome.gov Original figure from ENCODE, Darryl Leja (NHGRI), Ian Dunham (EBI) 3 Epigenomic marks • DNA methylation • Histone marks – Covalent modifications – Histone variants • Chromatin regulators – Histone modifying enzymes – Chromatin remodeling complexes • * Transcription factors 4 Histone modifications • Nucleosome Core Particles • Core Histones: H2A, H2B, H3, H4 Notation: H3K4me3 • Covalent modifications on histone tails include: methylation (me), acetylation (ac), phosphorylation … • Histone variants • Histone modifications are implicated in influencing gene expression. Allis C. et al. Epigenetics. 2006 5 LETTERS Figure 3 Patterns of histone modifications at a IL2RA CD28RE b CNS22 IFNG Chr12: 66810000 66830000 66850000 enhancers. (a) The histone modification pattern at Chr10: 6145000 6150000 6155000 6160000 H3K4me3 25 * H3K4me3 184 H3K4me2 11 the CD28RE enhancer (highlighted in red) of the H3K4me2 15 H3K4me1 30 H3K4me1 39 * H3K4ac 4 IL2RA gene. Significant modifications are * 8 H3K4ac * H3K27me3 4 H3K27me3 indicated by asterisks on the left. (b) Histone 4 H3K27me2 5 H3K27me2 5 H3K27me1 6 modification patterns at the IFNG gene and its H3K27me1 10 * H3K27ac 17 * H3K27ac * 23 H3K9me3 5 downstream enhancer, CNS22, are shown. H3K9me3 2 Histone modificationsH3K9me2 associate5 with H3K9me2 3 H3K9me1 12 Significant modifications at CNS22 are indicated H3K9me1 15 * H3K9ac 3 * H3K9ac 13 by asterisks on the left. (c) The fractions of H3K79me3 2 H3K79me3 7 regulation of geneH3K79me2 expression 2 enhancers associated with each of the 38 H3K79me2 6 H3K79me1 3 H3K79me1 5 H3R2me2 4 modifications. (d) Patterns of histone H3R2me2 3 H3R2me1 5 H3R2me1 modifications at 4,179 DNase hypersensitive 5 H3K36me3 11 H3K36me3 4 H3K36me1 5 sites. The y axis indicates the number of patterns, H3K36me1 7 H3K36ac 5 H3K36ac 8 * H4K20me3 4 and x axis indicates the number of hypersensitive H4K20me3 3 H4K20me1 3 H4K20me1 6 sites associated with each pattern. (e) Correlation H2BK5me1 5 H2BK5me1 9 H2BK5ac 6 * H2BK5ac 15 analysis of gene expression with the ten largest * H4R3me2 3 H4R3me2 5 change) - H3K14ac 7 modification patterns by assigning an enhancer to H3K14ac 5 H3K18ac 9 H3K18ac 17 * the TSS of the nearest known gene. All, all * H3K23ac 16 H3K23ac 8 * * H4K5ac 10 H4K5ac 8 * DNase I hypersensitive sites. The number of * H4K8ac 17 H4K8ac 4 * * H4K12ac 4 hypersensitive sites associated with each H4K12ac 7 H4K16ac 6 H4K16ac 6 * H4K91ac 4 pattern is indicated. H4K91ac 10 (fold log2 * * H2BK12ac 7 H2BK12ac 6 * Differentialexpression H2BK20ac 5 H2BK20ac 12 * H2BK120ac 7 H2BK120ac 14 * * H2AK5ac 8 H2AK5ac 12 * H2AK9ac 3 H2AK9ac 3 H2A.Z 23 H2A.Z 16 * * detected patterns, 1,174 are associated with c 0.35 multiple genes and 3,165 with only one gene 0.30 each (Fig. 2a). The 13 most prevalent patterns 0.25 are each associated with more than 62 genes. 0.20 We next examined the expression of genes in 0.15 these patterns, using the mean expression of 0.10 all genes as a reference (Fig. 2b). It seems that Fractions of enhancers 0.05 we can roughly classify these top patterns into 0 three classes (I, II and III in Fig. 2b) accord- ing to expression. Four of six patterns in class H2A.Z H3K4ac H3K9ac H4K5ac H4K8ac H3K14ac H3K18ac H3K23ac H3K27ac H3K36ac H4K12ac H4K16ac H4K91ac H2AK5ac H2AK9ac H2BK5ac H3K4me1 H3K4me2 H3K4me3 H3K9me1 H3K9me2 H3K9me3 H3R2me1 H3R2me2 I contain H3K27me3 and correlate with low H2BK12ac H2BK20ac H3K27me1 H3K27me2 H3K27me3 H3K36me1 H3K36me3 H3K79me1 H3K79me2 H3K79me3 H4K20me1 H4K20me3 H2BK5me1 H2BK120ac 6 Wang, Zang et al. Nat Genet 2008expression. These patterns also contain H2AZ, H3K4me1, H3K4me2, H3K4me3, H3K9me1: 21 deH2AZ, H3K4me1: 36 H3K4me1/2/3, H3K9me1 and H2A.Z but H2AZ, H3K4me2, H3K4me3, H3K9me1: 37 10,000 100 H2AZ, H3K4me3: 20 no acetylations. The patterns containing H2AZ: 67 90 H3K4me1, H4K20me1: 21 H3K4me1: 64 only H3K4me3 or no modification also 80 H3K4me3: 34 1,000 belong to this class. Class II contains 70 H3K27me3: 65 H3K36me3: 41 60 H4K20me1: 42 H3K36me3 or a modification backbone con- H2AK5ac: 63 100 50 No modification: 1,817 sisting of 17 modifications (as discussed All enhancers: 4,179 40 below), or the backbone plus H4K16ac, 30 10 higher expression which correlates with intermediate gene 20 expression. Class III shows the highest expres- Percentage of enhancers with 10 Number of combinatorial patterns sion, and it includes H2BK5me1, H4K16ac, 1 0 1 10 100 1,000 10,000 1 10 100 1,000 10,000 H4K20me1 and H3K79me1/2/3 in addition Number of enhancers associated with one pattern Enhancers-associated gene expression to the modification backbone (Fig. 2b). Our Gene Ontology analysis suggests that genes histone acetyl transferases (HATs) can associate with different involved in cellular physiology and metabolism are enriched in the regions of genes. For example, PCAF associates with the elongation- active class III patterns, consistent with their house-keeping roles (data competent RNA Pol II, whereas p300 interacts with the initiation- not shown). In contrast, many genes involved in development, cell– competent RNA Pol II (ref. 19). Additionally, depletion of GCN5 cell signaling and synaptic transmission are enriched in the inactive or PCAF, but not CBP or p300, affects H4K8ac and H3K14ac20. class I patterns, consistent with their not being required for mature The distribution patterns of these histone acetylations and histone T-cell function. methylations are exemplified by the genomic locus for ZMYND8 (also To correlate each modification with gene expression, we compared known as PRKCBP1), which is expressed in CD4+ T cells (Fig. 1g). The the average gene expression with or without each modification promoter region (highlighted in red), which was defined as a 2-kb (Fig. 2c). H3K27me3 was among a group of repressive marks also region surrounding the TSS, is associated with 25 modifications including H3K27me2, H3K9me2, H3K9me3 and H4K20me3, whereas 7 (P o 10À ). most other modifications correlated with activation. Although the To identify the patterns of histone modifications in an unbiased modification patterns do not uniquely determine the extent of way, we examined each of the 12,541 gene promoters for association expression, the H3K79me3 and H2BK5ac modifications showed with each of the 18 acetylations, 19 methylations and H2A.Z. Of the weak correlation with expression within a modification pattern possible patterns, only a small fraction exists at promoters. Of 4,339 (Supplementary Fig. 5 online). NATURE GENETICS VOLUME 40 [ NUMBER 7 [ JULY 2008 899 “Functions” of histone marks Table 3. Distinctive Chromatin Features of Genomic Elements Functional Annotation Histone Marks References Promoters H3K4me3 Bernstein et al., 2005; Kim et al., 2005; Pokholok et al., 2005 Bivalent/Poised Promoter H3K4me3/H3K27me3 Bernstein et al., 2006 Transcribed Gene Body H3K36me3 Barski et al., 2007 Enhancer (both active and poised) H3K4me1 Heintzman et al., 2007 Poised Developmental Enhancer H3K4me1/H3K27me3 Creyghton et al., 2010; Rada-Iglesias et al., 2011 Active Enhancer H3K4me1/H3K27ac Creyghton et al., 2010; Heintzman et al., 2009; Rada-Iglesias et al., 2011 Polycomb Repressed Regions H3K27me3 Bernstein et al., 2006; Lee et al., 2006 Heterochromatin H3K9me3 Mikkelsen et al., 2007 survey DNA enrichment at 500 representative loci using the steps in a row, or Sequential-ChIP-seq, can uncover histone nCounter probe system. High-quality antibodies are distinctRivera &PTMs Ren Cell on2013 the7 same molecule or chromatin-associated proteins from IgG patterns and the occupancy distribution correlates in the same complex. Several groups combined bisulfite with a logical set of chromatin states. Validating antibody re- sequencing with ChIP giving rise to BisChIP-seq and ChIP-BS- agents for enrichment specificity and robustness ensures good seq (Brinkman et al., 2012; Statham et al., 2012). Long-distance quality ChIP-seq data sets with high signal to noise ratios. DNA interactions mediated by a specific protein can be profiled Given that ChIP-seq is a mature technology, the technical using chromatin interaction analysis by paired-end-tag restrictions of the technique are well defined by its users. These sequencing, or ChIA-PET (Fullwood et al., 2009). We anticipate restrictions include the need for large amounts of starting mate- other inventive uses of ChIP technology to continue to uncover rial, limited resolution, and the dependence on antibodies. undiscovered roles of histone modifications and histone variants Improvements to ChIP-seq have been developed to address in transcriptional regulation. these limitations and expand the possibilities of its use. Collect- ing enough starting material for ChIP-seq can be challenging Mapping of Chromatin Structures because experiments typically require 1 million (histone modifi- Nucleosome Positioning cations) to 5 million (TFs and chromatin modifiers) cells. Although Moving up the hierarchy of genomic organization, we now look this is feasible when studying fast dividing cell lines, the chal-