Aggregating Hundreds of ENCODE Epigenomic Annotations to Predict Cell-State-Specific Regulatory Elements
Total Page:16
File Type:pdf, Size:1020Kb
Aggregating hundreds of ENCODE epigenomic annotations to predict cell-state-specific regulatory elements Tiffany Amariuta Ph.D. Candidate, Raychaudhuri Lab Harvard University, Boston, MA pre-ASHG ENCODE Workshop 2018 Majority of GWAS variants are noncoding noncoding >90% coding (Hrdlickova et al 2014 Review, BBA-Molecular Basis of Disease) Big data: genomic functional annotation Gene regulatory elements enriched for disease heritability 50 *** *** CD4+ T Histone Marks *** CD4+ T Gene Sets 30 CD4+ T Enhancers *** *** *** 10 RA h2 Enrichment *** *** *** *** *** 0 CD4+ T T.4int8+.ThT.4.Pa.BDC H3K4me3H3K27ac (Treg)H3K4me3 (Th2) (Treg) T.4SP69+.Th T.4.PLN.BDC H3K4me3 (Th17 stim) H3K27ac (Th17 stim) CD4+ T Super Enhancers data from Finucane et al 2015 Nat Genet, Finucane et al 2018 Nat Genet Distinguishing specific from nonspecific gene regulatory elements NON-SPECIFIC SPECIFIC Distinguishing specific from nonspecific gene regulatory elements NON-SPECIFIC SPECIFIC Distinguishing specific from nonspecific gene regulatory elements NON-SPECIFIC SPECIFIC hypothesis: MORE ENRICHED FOR DISEASE HERITABILITY Regulatory elements may be identified by bound TFs and local epigenome specific genome-wide regulatory changes Defining cell-state-specific gene regulatory elements Identify TFs that regulate your cell-state ChIP-seq for all TFs? Yes No Regulatory element: 1. Identify TF motifs any spot where TF binds 2. Grow cells 3. Open chromatin assay Regulatory element: peaks containing TF motifs Defining cell-state-specific gene regulatory elements Identify TFs that regulate your cell-state ChIP-seq for all TFs? May not know the TFs, have ChIP-seq for all TFs, have enough cells, or have a means to assay cellYes No -state Regulatory element: 1. Identify TF motifs any spot where TF binds 2. Grow cells 3. Open chromatin assay Regulatory element: peaks containing TF motifs Defining cell-state-specific gene regulatory elements Identify TFs that regulate your cell-state ChIP-seq for all TFs? would include generic regulatory elements as well Yes No Regulatory element: 1. Identify TF motifs any spot where TF binds 2. Grow cells 3. Open chromatin assay Regulatory element: peaks containing TF motifs Chromosome 5 Chromosome 12 132.005 mb 132.015 mb 132.025 mb 68.54 mb 68.55 mb 68.56 mb 68.545 mb 68.555 mb 132.01 mb 132.02 mb 3 IFNG gene IL4 gene 1 1 TF Th1 Th1 IL4 Gene 0.5 0.8 IMPACT 0.5 TF 0 0.6 0 1 1 c(0, predictions, 0) c(0, predictions, 0.4 Th2 0) c(0, predictions, Th2 (Inference and Modeling of Phenotype in CD4+ Th1 IMPACT 0.2 -related ACtive Transcription)0.8 Index Index 0.5 rs71542467, rs715424680 0.5 0.6 0.4 0 0 1 1 c(0, predictions, 0) c(0, predictions, HLA−DQB1 rs72844401 Th17 0) c(0, predictions, 0.2 IMPACT in CD4+ Th2 IMPACT Th17 2 0 Index Index 0.5 0.5 Regulatory Activity rs28451423 rs71542466 rs4279477 1.5 epigenomic featuresRegulatory Activity 0 0 1 1 specific TF 1 c(0, predictions, 0) c(0, predictions, Treg 0) c(0, predictions, Treg Information content Information ChIP 0.5 Index Index 0.5 ATAC 0.5 Chromosome 2 0 0 0 Chromosome 17 1 2 3 4 5 6 7 8 9 10 11 12 c(0, predictions, 0) c(0, predictions, Index 68.5 chr 12 coordinatePosition (Mb) 68.6 0) c(0, predictions, 204.725132.01 mb chr 5 coordinate204.735 mb (Mb) 132.03204.745 mb 76.345 mb Index76.355 mb Index chr12 genomic coordinate chr5 genomic coordinate 76.35 mb 76.36 mb 204.73 mb 204.74 mb RFX5 SOCS3 Gene CTLA4 gene Gene 1 1 Index SOCS3 Th1 Th1 TF logistic regression + feature selection 0.5 0.5 IMPACT Gene CTLA4 c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) 0 0 1 1 0.8 c(0, predictions, 0) c(0, predictions, c(0, predictions, 0) c(0, predictions, Th2 Th2 CIITA 0.6 Index TF Index 0.5 0.5 0.4 0.2 0 IMPACT in CD4+ Th17 IMPACT Index 0 1 1 0 c(0, predictions, 0) c(0, predictions, c(0, predictions, 0) c(0, predictions, Th17 0.8 Th17 c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) 1- 0.6 Index Index 0.5 0.5 Regulatory Activity H3K4me1 Activated Regulatory Activity 0.4 0 0 0.2 1 1 0- c(0, predictions, 0) c(0, predictions, c(0, predictions, 0) c(0, predictions, H3K4me3 Activated Treg in CD4+ Treg IMPACT 0 Treg Index Index Index 0.5 H3K27ac Activated 0.5 cell-state-specific c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) 0 0 c(0, predictions, 0) c(0, predictions, c(0, predictions, 0) c(0, predictions, regulatory annotation H3K4me1 Resting 76.35 chr 17 coordinate (Mb) 76.36 204.73 chr 2 coordinate (Mb) 204.75 Index Index H3K4me3 Resting chr17 genomic coordinate chr2 genomic coordinate Index H3K27ac Resting c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) 32626192 32626692 32629192 32630692 32632192 32633692 32635192 32636692 Index chr6 genomic coordinate (bp) c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) IMPACT Epigenomic Features IMPACT Feature Marks IMPACT Feature Cell Types H3K4me1 Macrophage H3K4me3 fetal_brain HepG2 H3K27ac Treg DNase Th1 ATAC Chondrocyte H3K9ac CD4T CD8 H3K27me3 Th17_stim Promoter Th2 H3K9me3 None Pol2 CD4 Sequence−based Pancreatic_Islets Tconv H3K36me3 CD4_Memory FAIRE Naive DGF Pancreatic_Tissue CTCF CD4_Naive CD8_Naive K27ac Jurkat pTEF Memory PolII Pancreatic_islets 0 20 40 60 80 100 0 10 20 30 40 50 frequency frequency IMPACT model 1A Key: master regulator TF master TF coregulators generic TFs cell-type-specific regulatory elements key TF key TF coregulatorsc training regulatory regions ctraining non-regulatory regions cell-type-nonspecific regulatory elements Genes < < < < < < < < > > > > > > > > > Cellular Truth Training TF Motifs ChIP-seq Features Open c Chromatin H3K4me1 H3K9me3 … + 511 peak 1 peak 2 peak 3 IMPACT 1 score 0 IMPACT model 1A Key: master regulator TF master TF coregulators generic TFs cell-type-specific regulatory elements key TF key TF coregulatorsc training regulatory regions ctraining non-regulatory regions cell-type-nonspecific regulatory elements Genes < < < < < < < < > > > > > > > > > Cellular Truth Training training regulatory regions TF Motifs c cell-type-specific regulatory elementsc ChIP-seq training non-regulatory regions Features Open Chromatin H3K4me1 c H3K9me3 … + 511 peak 1 peak 2 peak 3 IMPACT 1 score 0 IMPACT learns a cell-state-specific epigenomic signature 1A training regulatory regions Key: generic TFs cell-type-specific regulatory elements master regulator TF master TF coregulators cell-type-specific regulatory elementsc key TF key TF coregulatorsc c training regulatory regions ctraining non-regulatory regions cell-type-nonspecific regulatorytraining elements non-regulatory regions Genes < < < < < < < < > > > > > > > > > Cellular cell-type-specific regulatory elements Truth Training cell-type-nonspecific regulatory elements TF Motifs ChIP-seq Features Open Feature 1 Feature 2 Chromatin H3K4me1 Region Feat.1 Feat.2 Feat.1 Feat.2 H3K9me3 motif 1 1 0 1 1 … + 509 peak 1 peak 2 peak 3 local local distal distal IMPACT 1 score 0 2A ElasticNet Logistic Regression Fit for Tbet IMPACT epigenomic signature no asterisk: 1 -. / + 1234 / < |/7| < 2 ∗ -. / + 1234(/) Th1 DGF local ∗: 2 -. / + 1234 / < |/7| < 3 ∗ -. / + 1234(/) Th17 DNase local *** ∗∗: 3 -. / + 1234 / < |/7| < 4 ∗ -. / + 1234(/) Th1 DNase local ** ***: |/7| > 4 ∗ -. / + 1234(/) Th1 H3K27ac local * Th1 K27ac local * Tconv H3K4me1 distal * rs71542467, rs71542468 CD56 primary H3K4me1 local * HLA−DQB1 rs72844401 CD4 ATAC distal * rs28451423 rs71542466 rs4279477 * T-bet (Th1) Naive H3K4me1 local ChIP Th1 H3K27ac distal ATAC Th1 DGF local Index Th2 DGF local Th1 DNase local RFX5 Th2 DNase local Index Th1 DNase local c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) Th2 DGF local CIITA Th1 DNase local CD56 primary H3K4me1+ 498 distal Index Th1 Brd4 local c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) 0 1 H3K4me1 Activated strength of feature to identify active binding (model Th2 DNase local !) H3K4me3 Activated Th2 K27ac local Index CD4T ATAC distal H3K27ac Activated c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) Tconv H3K4me1 local H3K4me1 Resting H3K4me3 Resting Index Intercept H3K27ac Resting c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) *** 32626192 32626692 32629192 32630692 32632192 32633692 32635192 32636692 Index −6 −4 −2 0 2 chr6 genomic coordinate (bp) c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) Strength of Feature to Identify Active Binding c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) Chromosome 17 Chromosome 17 Chromosome 12 76.345Chromosome mb 17 76.355 mb Epigenomic resolution is finer than ChIPChromosome 12 -seq 76.345 mb 76.355 mb 68.54 mb 68.55 mb 68.56 mb 76.345 mb 76.35 mb 76.355 mb 76.36 mb 68.54 mb 68.55 mb 68.56 mb 68.545 mb 68.555 mb 76.35 mb 76.36 mb 68.545 mb 68.555 mb IFNG gene Gene Chromosome 2 1C SOCS3 SOCS3 gene Gene SOCS3 T-BET motifs Gene SOCS3 TF STAT3 motifs TF TF T-BET ChIP TF 204.725 mb 204.735 mb STAT3 ChIP 204.745 mb TF IMPACT 10.8 1 0.8 IMPACT 0.8 Chromosome 5 Chromosome 2 p(Th1 0.60.6 0.8 204.73 mb 204.74 mb 0.5 p(Th17 0.6 Chromosome 2 0.40.4 Chromosome 17 0.50.60.8 regulatory 0.4 204.725 mb 204.735 mb 204.745 mb 132.005 mb 132.015 mb 132.025 mb regulatory IMPACT in CD4+ Th1 IMPACT IMPACT in CD4+ Th1 IMPACT 0.20.2 Chromosome 5 0.40.20.6 ChromosomeChromosome 17 5 in CD4+ Th17 IMPACT Chromosome 12 element) 0 76.345Chromosome mb 17 76.355 mb 0 204.725 mb 204.735 mb 204.745 mb 0 0 element) 00.20.4 204.73 mb 204.74 mb Chromosome 12 132.005 mb 132.01 mb 132.015 mb 132.02 mb 132.025 mb in CD4+ Th17 IMPACT 68.54 mb 68.55 mb 68.56 mb 76.345 mb 68.5132.005 mb chr 12 coordinate76.355132.015 mb mb (Mb) 68.6132.025 mb 0.2 76.35 chr 17 coordinate (Mb) 76.36