Aggregating hundreds of ENCODE epigenomic annotations to predict cell-state-specific regulatory elements
Tiffany Amariuta Ph.D. Candidate, Raychaudhuri Lab Harvard University, Boston, MA pre-ASHG ENCODE Workshop 2018 Majority of GWAS variants are noncoding
noncoding >90%
coding
(Hrdlickova et al 2014 Review, BBA-Molecular Basis of Disease) Big data: genomic functional annotation Gene regulatory elements enriched for disease heritability
50 *** *** CD4+ T Histone Marks *** CD4+ T Gene Sets
30 CD4+ T Enhancers *** ***
*** 10 RA h2 Enrichment *** *** *** *** *** 0
CD4+ T T.4int8+.ThT.4.Pa.BDC H3K4me3H3K27ac (Treg)H3K4me3 (Th2) (Treg) T.4SP69+.Th T.4.PLN.BDC H3K4me3 (Th17 stim) H3K27ac (Th17 stim) CD4+ T Super Enhancers
data from Finucane et al 2015 Nat Genet, Finucane et al 2018 Nat Genet Distinguishing specific from nonspecific gene regulatory elements
NON-SPECIFIC SPECIFIC Distinguishing specific from nonspecific gene regulatory elements
NON-SPECIFIC SPECIFIC Distinguishing specific from nonspecific gene regulatory elements
NON-SPECIFIC SPECIFIC
hypothesis: MORE ENRICHED FOR DISEASE HERITABILITY Regulatory elements may be identified by bound TFs and local epigenome
specific genome-wide regulatory changes Defining cell-state-specific gene regulatory elements
Identify TFs that regulate your cell-state
ChIP-seq for all TFs?
Yes No
Regulatory element: 1. Identify TF motifs any spot where TF binds 2. Grow cells 3. Open chromatin assay
Regulatory element: peaks containing TF motifs Defining cell-state-specific gene regulatory elements
Identify TFs that regulate your cell-state
ChIP-seq for all TFs? May not know the TFs, have ChIP-seq for all TFs, have enough cells, or have a means to assay cellYes No -state
Regulatory element: 1. Identify TF motifs any spot where TF binds 2. Grow cells 3. Open chromatin assay
Regulatory element: peaks containing TF motifs Defining cell-state-specific gene regulatory elements
Identify TFs that regulate your cell-state
ChIP-seq for all TFs? would include generic regulatory elements as well Yes No
Regulatory element: 1. Identify TF motifs any spot where TF binds 2. Grow cells 3. Open chromatin assay
Regulatory element: peaks containing TF motifs Chromosome 5
Chromosome 12 132.005 mb 132.015 mb 132.025 mb 68.54 mb 68.55 mb 68.56 mb
68.545 mb 68.555 mb 132.01 mb 132.02 mb 3 IFNG gene IL4 gene 1 1 TF Th1 Th1 IL4 Gene 0.5 0.8 IMPACT 0.5 TF 0 0.6 0 1 1 c(0, predictions, 0) c(0, predictions, 0.4 Th2 0) c(0, predictions, Th2
(Inference and Modeling of Phenotype in CD4+ Th1 IMPACT 0.2 -related ACtive Transcription)0.8 Index Index 0.5 rs71542467, rs715424680 0.5 0.6 0.4 0 0 1 1 c(0, predictions, 0) c(0, predictions, HLA−DQB1 rs72844401 Th17 0) c(0, predictions, 0.2
IMPACT in CD4+ Th2 IMPACT Th17 2 0 Index Index 0.5 0.5 Regulatory Activity rs28451423 rs71542466 rs4279477 1.5 epigenomic featuresRegulatory Activity 0 0
1 1 specific TF 1 c(0, predictions, 0) c(0, predictions, Treg 0) c(0, predictions, Treg Information content Information ChIP 0.5 Index Index 0.5 ATAC 0.5 Chromosome 2 0 0 0 Chromosome 17 1 2 3 4 5 6 7 8 9 10 11 12 c(0, predictions, 0) c(0, predictions, Index 68.5 chr 12 coordinatePosition (Mb) 68.6 0) c(0, predictions, 204.725132.01 mb chr 5 coordinate204.735 mb (Mb) 132.03204.745 mb 76.345 mb Index76.355 mb Index chr12 genomic coordinate chr5 genomic coordinate 76.35 mb 76.36 mb 204.73 mb 204.74 mb RFX5 SOCS3 Gene CTLA4 gene Gene 1 1 Index SOCS3 Th1 Th1
TF logistic regression + feature selection 0.5 0.5 IMPACT Gene CTLA4 c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) 0 0 1 1 0.8 c(0, predictions, 0) c(0, predictions, c(0, predictions, 0) c(0, predictions, Th2 Th2 CIITA 0.6
Index TF Index 0.5 0.5 0.4 0.2 0 IMPACT in CD4+ Th17 IMPACT Index 0 1 1 0 c(0, predictions, 0) c(0, predictions, c(0, predictions, 0) c(0, predictions, Th17 0.8 Th17 c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) 1- 0.6 Index Index 0.5 0.5 Regulatory Activity H3K4me1 Activated Regulatory Activity 0.4 0 0 0.2 1 1 0- c(0, predictions, 0) c(0, predictions, c(0, predictions, 0) c(0, predictions, H3K4me3 Activated Treg in CD4+ Treg IMPACT 0 Treg Index Index Index 0.5 H3K27ac Activated 0.5 cell-state-specific c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) 0 0 c(0, predictions, 0) c(0, predictions,
c(0, predictions, 0) c(0, predictions, regulatory annotation H3K4me1 Resting 76.35 chr 17 coordinate (Mb) 76.36 204.73 chr 2 coordinate (Mb) 204.75 Index Index H3K4me3 Resting chr17 genomic coordinate chr2 genomic coordinate Index H3K27ac Resting c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) 32626192 32626692 32629192 32630692 32632192 32633692 32635192 32636692 Index chr6 genomic coordinate (bp) c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) IMPACT Epigenomic Features
IMPACT Feature Marks IMPACT Feature Cell Types
H3K4me1 Macrophage H3K4me3 fetal_brain HepG2 H3K27ac Treg DNase Th1 ATAC Chondrocyte H3K9ac CD4T CD8 H3K27me3 Th17_stim Promoter Th2 H3K9me3 None Pol2 CD4 Sequence−based Pancreatic_Islets Tconv H3K36me3 CD4_Memory FAIRE Naive DGF Pancreatic_Tissue CTCF CD4_Naive CD8_Naive K27ac Jurkat pTEF Memory PolII Pancreatic_islets
0 20 40 60 80 100 0 10 20 30 40 50
frequency frequency IMPACT model 1A
Key: master regulator TF master TF coregulators generic TFs cell-type-specific regulatory elements key TF key TF coregulatorsc training regulatory regions ctraining non-regulatory regions cell-type-nonspecific regulatory elements Genes < < < < < < < < > > > > > > > > > Cellular Truth Training TF Motifs
ChIP-seq
Features Open c Chromatin H3K4me1
H3K9me3 … + 511 peak 1 peak 2 peak 3 IMPACT 1 score 0 IMPACT model 1A
Key: master regulator TF master TF coregulators generic TFs cell-type-specific regulatory elements key TF key TF coregulatorsc training regulatory regions ctraining non-regulatory regions cell-type-nonspecific regulatory elements Genes < < < < < < < < > > > > > > > > > Cellular Truth Training training regulatory regions TF Motifs c cell-type-specific regulatory elementsc ChIP-seq training non-regulatory regions
Features Open Chromatin H3K4me1 c
H3K9me3 … + 511 peak 1 peak 2 peak 3 IMPACT 1 score 0 IMPACT learns a cell-state-specific epigenomic signature
1A training regulatory regions Key: generic TFs cell-type-specific regulatory elements master regulator TF master TF coregulators cell-type-specific regulatory elementsc key TF key TF coregulatorsc c training regulatory regions ctraining non-regulatory regions cell-type-nonspecific regulatorytraining elements non-regulatory regions Genes < < < < < < < < > > > > > > > > > Cellular cell-type-specific regulatory elements Truth Training cell-type-nonspecific regulatory elements TF Motifs
ChIP-seq
Features Open Feature 1 Feature 2 Chromatin H3K4me1 Region Feat.1 Feat.2 Feat.1 Feat.2 H3K9me3 motif 1 1 0 1 1 … + 509 peak 1 peak 2 peak 3 local local distal distal IMPACT 1 score 0 2A ElasticNet Logistic Regression Fit for Tbet IMPACT epigenomic signature no asterisk: 1 -. / + 1234 / < |/7| < 2 ∗ -. / + 1234(/) Th1 DGF local ∗: 2 -. / + 1234 / < |/7| < 3 ∗ -. / + 1234(/) Th17 DNase local *** ∗∗: 3 -. / + 1234 / < |/7| < 4 ∗ -. / + 1234(/) Th1 DNase local ** ***: |/7| > 4 ∗ -. / + 1234(/) Th1 H3K27ac local * Th1 K27ac local * Tconv H3K4me1 distal * rs71542467, rs71542468 CD56 primary H3K4me1 local * HLA−DQB1 rs72844401 CD4 ATAC distal * rs28451423 rs71542466 rs4279477 * T-bet (Th1) Naive H3K4me1 local ChIP Th1 H3K27ac distal ATAC Th1 DGF local Index Th2 DGF local Th1 DNase local RFX5 Th2 DNase local Index Th1 DNase local
c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) Th2 DGF local CIITA Th1 DNase local CD56 primary H3K4me1+ 498 distal Index Th1 Brd4 local c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) 0 1 H3K4me1 Activated strength of feature to identify active binding (model Th2 DNase local !) H3K4me3 Activated Th2 K27ac local Index CD4T ATAC distal H3K27ac Activated c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) Tconv H3K4me1 local H3K4me1 Resting H3K4me3 Resting Index Intercept H3K27ac Resting
c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) *** 32626192 32626692 32629192 32630692 32632192 32633692 32635192 32636692 Index −6 −4 −2 0 2 chr6 genomic coordinate (bp) c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) Strength of Feature to Identify Active Binding c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) Chromosome 17
Chromosome 17 Chromosome 12 76.345Chromosome mb 17 76.355 mb
Epigenomic resolution is finer than ChIPChromosome 12 -seq 76.345 mb 76.355 mb 68.54 mb 68.55 mb 68.56 mb 76.345 mb 76.35 mb 76.355 mb 76.36 mb 68.54 mb 68.55 mb 68.56 mb 68.545 mb 68.555 mb 76.35 mb 76.36 mb 68.545 mb 68.555 mb
IFNG gene Gene Chromosome 2 1C SOCS3 SOCS3 gene Gene SOCS3
T-BET motifs Gene SOCS3
TF STAT3 motifs TF TF T-BET ChIP TF 204.725 mb 204.735 mb STAT3 ChIP 204.745 mb TF IMPACT 10.8 1 0.8 IMPACT 0.8 Chromosome 5 Chromosome 2 p(Th1 0.60.6 0.8 204.73 mb 204.74 mb 0.5 p(Th17 0.6 Chromosome 2 0.40.4 Chromosome 17 0.50.80.6 regulatory 0.4 204.725 mb 204.735 mb 204.745 mb 132.005 mb 132.015 mb 132.025 mb regulatory IMPACT in CD4+ Th1 IMPACT IMPACT in CD4+ Th1 IMPACT 0.20.2 Chromosome 5 0.20.60.4 ChromosomeChromosome 17 5 in CD4+ Th17 IMPACT Chromosome 12 element) 0 76.345Chromosome mb 17 76.355 mb 0 204.725 mb 204.735 mb 204.745 mb 0 0 element) 00.40.2 204.73 mb 204.74 mb
Chromosome 12 132.005 mb 132.01 mb 132.015 mb 132.02 mb 132.025 mb in CD4+ Th17 IMPACT 68.54 mb 68.55 mb 68.56 mb 76.345 mb 68.5132.005 mb chr 12 coordinate76.355132.015 mb mb (Mb) 68.6132.025 mb 0.2 76.35 chr 17 coordinate (Mb) 76.36
76.35 mb 76.36 mb in CD4+ Th17 IMPACT 0 68.54 mb 68.55 mb 68.56 mb 0 132.01 mb 132.02 mb Gene 76.35 mb 76.36 mb 204.73 mb 204.74 mb 68.545 mb 68.555 mb 76.35 mb 132.01 mb 132.0276.36 mb mb CTLA4 68.545 mb 68.555 mb Gene Gene IFNG gene Chromosome 2 CTLA4 CTLA4 gene IL4 Gene 1C SOCS3 SOCS3 gene
Gene IL4 gene SOCS3 IL4 Gene
T-BET motifs Gene SOCS3 TF GATA3STAT3 motifs motifs FOXP3 motifs TF TF T-BET ChIP TF TF IL4 Gene TF
TF 204.725 mb 204.735 mb 204.745 mb STAT3 ChIP TF
TF GATA3 ChIP FOXP3 ChIP Gene
IMPACT 1 CTLA4 0.80.8 IMPACT 11 1 IMPACT 0.8 Chromosome 2 IMPACT Chromosome 5 0.8TF 0.8 p(Th1 0.60.6 0.60.80.8 204.73 mb 204.74 mb 0.5 p(Th17p(Th2 0.6 Chromosome 2 p(Treg 0.80.6 0.40.4 0.50.50.80.6 regulatory 0.40.40.6 204.725 mb 204.735 mb 204.745 mb 0.5 132.005 mb 132.015 mb 132.025 mb regulatoryregulatory regulatory 0.60.4 IMPACT in CD4+ Th1 IMPACT IMPACT in CD4+ Th1 IMPACT 0.20.2 Chromosome 5 0.20.60.4 IMPACT in CD4+ Th17 IMPACT 0.2 Chromosome 5 0.80.4 TF 0.2 element) in CD4+ Th2 IMPACT 0 0 0 element) 0 204.725 mb 204.735 mb 204.745 mb 0.4 element) 0.4 element) in CD4+ Treg IMPACT 00.200.60.20 204.73 mb 204.74 mb 00 132.005 mb 132.01 mb 132.015 mb 132.02 mb 132.025 mb in CD4+ Th17 IMPACT 68.5132.005 mb chr 12 coordinate132.015 mb (Mb) 68.6132.025 mb in CD4+ Th2 IMPACT 0.2 76.35 chrchr 17 5coordinate coordinate (Mb) (Mb) 76.36 0.2 chr 2 coordinate (Mb) IMPACT in CD4+ Th17 IMPACT 0.40 0 132.01 132.03 204.73 204.75
0 in CD4+ Treg IMPACT 132.01 mb 132.02 mb Gene 204.73 mb 204.74 mb 0.80 132.01 mb 132.02 mb CTLA4 0.2 IMPACT in CD4+ Th2 IMPACT 0.6 Gene
CTLA4 0 CTLA4 gene IL4 Gene IL4 gene 0.4 IL4 Gene GATA3 motifs FOXP3 motifs 0.2 TF IL4 Gene TF TF TF GATA3 ChIP FOXP3 ChIP in CD4+ Treg IMPACT 0 Gene IMPACT 1 IMPACT CTLA4 1
0.8TF 0.8 p(Th2 0.8 p(Treg 0.8 0.50.6 0.50.6 regulatory 0.40.6 regulatory 0.60.4 0.2 0.80.4 TF 0.2 IMPACT in CD4+ Th2 IMPACT 0.4 element) element) in CD4+ Treg IMPACT 00.60.20 00 IMPACT in CD4+ Th2 IMPACT 0.40 132.01 chr 5 coordinate (Mb) 132.03 0.2 204.73 chr 2 coordinate (Mb) 204.75
IMPACT in CD4+ Treg IMPACT 0.8 0.2 0
IMPACT in CD4+ Th2 IMPACT 0.6 0 0.4 0.2
IMPACT in CD4+ Treg IMPACT 0 IMPACT Validation
• IMPACT model assumes that binding sites of each TF will constitute a global epigenomic signature. (Cowper Sal-Iari et al Nat Genet 2012) IMPACT Validation
• IMPACT model assumes that binding sites of each TF will constitute a global epigenomic signature. (Cowper Sal-Iari et al Nat Genet 2012)
• If the signature is robust, each binding site genome-wide should resemble this signature. IMPACT Validation
• IMPACT model assumes that binding sites of each TF will constitute a global epigenomic signature. (Cowper Sal-Iari et al Nat Genet 2012)
• If the signature is robust, each binding site genome-wide should resemble this signature.
• Therefore, IMPACT should be able to accurately predict TF binding.
• TF binding prediction is notoriously difficult. IMPACT reidentifies TF binding
1B
1 cell−state−specific H3K4me3 + TF motif ● cell−state−specific DNase + TF motif IMPACT
● validation AUC validation −
● 0.75
● ● ●
0.5 T−BET GATA3 STAT3 FOXP3 REST HNF4A TCF7L2 PolII
TF motif binding cross (Th1) (Th2) (Th17) (Treg) (Fetal Brain) (Liver) (Pancreas) (Lymphocyte) TF (cell type) IMPACT is more accurate than open chromatin + motif
1B
1 cell−state−specific H3K4me3 + TF motif ● cell−state−specific DNase + TF motif IMPACT
● validation AUC validation −
● 0.75
● ● ●
0.5 T−BET GATA3 STAT3 FOXP3 REST HNF4A TCF7L2 PolII
TF motif binding cross (Th1) (Th2) (Th17) (Treg) (Fetal Brain) (Liver) (Pancreas) (Lymphocyte) TF (cell type)
average increase in AUC from 0.66 to 0.92 IMPACT Application: Expression Causal Variation
• cis eQTL causal variation is known to be concentrated around promoters and the TSS. (Liu et al AJHG 2017) IMPACT Application: Expression Causal Variation
• cis eQTL causal variation is known to be concentrated around promoters and the TSS. (Liu et al AJHG 2017)
Chromosome 5
Chromosome 12 132.005 mb 132.015 mb 132.025 mb 68.54 mb 68.55 mb 68.56 mb
68.545 mb 68.555 mb 132.01 mb 132.02 mb • Hypothesis 3 IFNG gene IL4 gene 1 1 TF Th1 Th1 IL4 Gene 0.5 0.8 0.5 TF 0 0.6 0 1 1 c(0, predictions, 0) c(0, predictions, 0.4 Th2 0) c(0, predictions, Th2
IMPACT in CD4+ Th1 IMPACT 0.2 0.8 Index Index 0.5 IMPACT RNA polymerase II rs71542467, rs71542468annotation would better capture this variation.0 0.5 0.6 0.4 0 0 1 1 c(0, predictions, 0) c(0, predictions, HLA−DQB1 rs72844401 Th17 0) c(0, predictions, 0.2
IMPACT in CD4+ Th2 IMPACT Th17 2 0 Index Index 0.5 0.5 Regulatory Activity rs28451423 rs71542466 rs4279477 1.5 Regulatory Activity 0 0
1 1 1 c(0, predictions, 0) c(0, predictions, Pol II Treg 0) c(0, predictions, Treg Information content Information 0.5 Index Index 0.5 ATAC 0.5 Chromosome 2 0 0 0 Chromosome 17 1 2 3 4 5 6 7 8 9 10 11 12 c(0, predictions, 0) c(0, predictions, Index 68.5 chr 12 coordinatePosition (Mb) 68.6 0) c(0, predictions, 204.725132.01 mb chr 5 coordinate204.735 mb (Mb) 132.03204.745 mb 76.345 mb Index76.355 mb Index chr12 genomic coordinate chr5 genomic coordinate 76.35 mb 76.36 mb 204.73 mb 204.74 mb RFX5 SOCS3 Gene CTLA4 gene Gene 1 1 Index SOCS3 Th1 Th1 TF 0.5 0.5 Gene IMPACT CTLA4 c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) 0 0 1 1 0.8 c(0, predictions, 0) c(0, predictions, c(0, predictions, 0) c(0, predictions, Th2 Th2 CIITA 0.6
Index TF Index 0.5 0.5 0.4 0.2 0 IMPACT in CD4+ Th17 IMPACT Index 0 1 1 0 c(0, predictions, 0) c(0, predictions, c(0, predictions, 0) c(0, predictions, Th17 0.8 Th17 c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) 0.6 Index Index 0.5 0.5 Regulatory Activity H3K4me1 Activated Regulatory Activity 0.4 0 0 0.2 1 1 c(0, predictions, 0) c(0, predictions, c(0, predictions, 0) c(0, predictions, H3K4me3 Activated Treg in CD4+ Treg IMPACT 0 Treg Index Index Index 0.5 H3K27ac Activated 0.5 c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) 0 0 c(0, predictions, 0) c(0, predictions, H3K4me1 Resting 0) c(0, predictions, 76.35 chr 17 coordinate (Mb) 76.36 204.73 chr 2 coordinate (Mb) 204.75 Index Index H3K4me3 Resting chr17 genomic coordinate chr2 genomic coordinate Index H3K27ac Resting c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) 32626192 32626692 32629192 32630692 32632192 32633692 32635192 32636692 Index chr6 genomic coordinate (bp) c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) IMPACT Application: Expression Causal Variation cis eQTL χ2 enrichment across functional annotations
1.8 *** ***
1.6 *** *** * * * *** * ***
1.4 *** *** Enrichment
1.2 *** *** *** *** 1.0
PolII ChIP (0.5%) PolII IMPACT (1%) TSS; Hoffman (2%) Coding; UCSC (1.4%)Promoter; UCSC (3%) 5' UTR; UCSC (0.5%) Coding+500bp; UCSC (6%) 5' UTR+500bp; UCSC (3%) Enhancers; Hoffman (4%) Promoter+500bp; UCSCTSS+500bp; (4%) Hoffman (3.4%) Enhancers; Andersson (0.4%) Enhancers+500bp; Hoffman (9%) Flanking Promoter; Hoffman (0.8%) Enhancers+500bp; Andersson (2%) Flanking Promoter+500bp; Hoffman (3%)
25% average increase in enrichment IMPACT Application: Polygenic Causal Variation
naïve CD4+ T cell • CD4+ T cells are strongly implicated in RA pathogenesis
• CD4+ T cell-states: Th1, Th2, Th17, Tregs, etc…
Th1 Th2 Th17 Treg IMPACT explains majority of RA h2 Top 5% of IMPACT captures large % of RA h2
EUR EAS 1.2 1.0 Top 5% of IMPACT Treg SNPs explain 85.7% of RA h2. 0.8
Top 9.8% explain 97.3% of RA h2. 0.6 Proportion of RA h2 0.4 0.2
0.0 Treg Th2 Th17 Th1 IMPACT annotation Most comprehensive explanation for RA h2 Top 5% of IMPACT captures large % of RA h2
EUR EAS 1.2 1.2 1.0 1.0 0.8 0.8 0.6 0.6 42.3% Proportion of RA h2 36.4% 0.4 0.4 Proportion of RA h2 0.2 0.2 0.0 Treg Th2 Th17 Th1 0.0 CD4+CD4+ T T Histone MarksCD4+ T IMPACT annotation Histone Marks Gene Sets IMPACT applied to 42 polygenic traits IMPACT applied to 42 polygenic traits
−10 −5 0 5 10
signed log10 p τ*
Immune Blood Th1 (T−BET) Th2 (GATA3) Metabolism Th17 (STAT3) Body Treg (FOXP3) Lung Treg (STAT5) Macrophage (IRF5) Reproductive Monocyte (IRF1) Pigment Monocyte (CEBPB) B cell (PAX5) Brain Liver (HNF4A) Other Pancreas (TCF7L2) Brain (RXRA) Fetal Brain (REST) BMI LDL HDL Height Autism Autism Tanning Tanning Sunburn Anorexia Anorexia Diabetes Heel Tscore Hair Pigment Skin Pigment Ever Smoked Smoked Ever Platelet Count Schizophrenia Schizophrenia Menarche Age Smoking Status Morning Person Morning Person Crohns Disease Type 2 Diabetes Type Menopause Age High Cholesterol Ulcerative Colitis Ulcerative BMI adj for WHR BMI adj for Eosinophil Count Downs Syndrome Downs Years of Education Years Reticulocyte Count Lung Smoking FVC Allergy and Eczema Allergy and Eczema Rheumatoid Arthritis Balding Less Severe Balding Less Severe Balding More Severe Balding More Severe Red Blood Cell Width Red Blood Cell Count White Blood Cell Count Systolic Blood Pressure All Autoimmune Disease All Autoimmune Coronary Artery Disease Lung Smoking FEV1 FVC Respiratory Ear/Nose/Throat Impedance Basal Metabolic Rate IMPACT applied to 42 polygenic traits: highlights
• CD4+ T IMPACT (T-BET, STAT3, GATA3, FOXP3) à explains heritability of (auto)immune traits
• Liver IMPACT (HNF4A) à explains heritability of LDL, HDL, + some blood traits
• Macrophage IMPACT (IRF5) à explains heritability of Schizophrenia, + immune traits
• Monocyte IMPACT (CEBPB, IRF1) à explains heritability of blood traits, not immune traits
• CD4+ Treg IMPACT (STAT5) replicates CD4+ Treg IMPACT (FOXP3) IMPACT Application: A priori identification of functional variants
• Fine-mapping 20 RA risk loci using GWAS of 11,475 European RA cases and 15,870 controls (Westra et al 2018 Nature Genetics)
• Identification of putatively causal variants:
1. Differential binding of CD4+ T nuclear extract (EMSA)
2. Differential enhancer activity (luciferase assays) IMPACT Application: A priori identification of functional variants
Chromosome 2 6B ChromosomeChr 2 2 ChromosomeChromosome 6 6 204.6 mb 204.8 mb Chr 6 138 mb 138.2 mb Chromosome 2 6C
204.6 mb 204.7 mb 204.8 mb 138.1 mb 204.7 mb CD28 204.6 mb CTLA4 204.8 mb 138 mb TNFAIP3 138.2 mb 204.7 mb 138.1 mb CD28 Gene TNFAIP3 Gene TNFAIP3 genomic coordinate (Mb) 204.5 genomic coordinate (Mb) 204.8 137.9 rs6927172 138.3 CTLA4 Gene rs11314585 rs11757201 rs6933404 rs17264332 rs62432712 rs55686954 rs117701653 rs3087243 rs2327832 rs35926684 rs6920220 (functional) (functional) 1 1 Th1 Th1 0.5 0.5 0 0 1 1 Th2 Th2 Index Index 0.5 0.5 c(0, predmat2[selectcoords, i], 0) c(0, predmat2[selectcoords, c(0, predmat[selectcoords, i], 0) c(0, predmat[selectcoords, 0 0 1 1 Th17 Th17 CTLA4 Gene state regulatory element) state regulatory element)
TNFAIP3 Gene TNFAIP3 Index Index − 0.5 − 0.5 c(0, predmat2[selectcoords, i], 0) c(0, predmat2[selectcoords, c(0, predmat[selectcoords, i], 0) c(0, predmat[selectcoords, 0 0 1 1 Treg Treg Index
Index 0.5 0.5 c(0, predmat2[selectcoords, i], 0) c(0, predmat2[selectcoords, c(0, predmat[selectcoords, i], 0) c(0, predmat[selectcoords, IMPACT p(cell IMPACT 0 IMPACT p(cell IMPACT 0 204566515 204616515 204666515 204716515 137958235 137983235 138008235 Index Index genomic coordinate (bp) c(0, predmat2[selectcoords, i], 0) c(0, predmat2[selectcoords,
c(0, predmat[selectcoords, i], 0) c(0, predmat[selectcoords, genomic coordinate (bp) Applications of IMPACT
Chromosome 5
Chromosome 12 132.005 mb 132.015 mb 132.025 mb 68.54 mb 68.55 mb 68.56 mb
68.545 mb 68.555 mb 132.01 mb 132.02 mb 3 IFNG gene IL4 gene 1 1 TF Th1 Th1 IL4 Gene 0.5 0.8 0.5 TF 0 1. Cell-state0.6 -specific regulation 0 2. Causal variation of gene expression 1 1 c(0, predictions, 0) c(0, predictions, 0.4 Th2 0) c(0, predictions, Th2 2
IMPACT in CD4+ Th1 IMPACT 0.2 0.8 Index Index cis eQTL χ enrichment across functional annotations 0.5 rs71542467, rs715424680 0.5 0.6 0.4 0 0 1 1 c(0, predictions, 0) c(0, predictions, HLA−DQB1 rs72844401 Th17 0) c(0, predictions, 0.2
IMPACT in CD4+ Th2 IMPACT Th17 2 0 Index Index 1.8 0.5 0.5
Regulatory Activity *** rs28451423 rs71542466 rs4279477 1.5 Regulatory Activity *** 0 0
1 1 1 c(0, predictions, 0) c(0, predictions, c(0, predictions, 0) c(0, predictions, Treg 1.6 Treg *** *** * Information content Information * * 0.5 Index Index *** 0.5 0.5 * ATAC Chromosome 2 *** 0 0 0
1.4 *** Chromosome 17 1 2 3 4 5 6 7 8 9 10 11 12 *** c(0, predictions, 0) c(0, predictions, Index 68.5 chr 12 coordinatePosition (Mb) 68.6 0) c(0, predictions, 204.725132.01 mb chr 5 coordinate204.735 mb (Mb) 132.03204.745 mb
76.345 mb Index76.355 mb Index Enrichment chr12 genomic coordinate chr5 genomic coordinate
1.2 *** 204.73 mb 204.74 mb *** 76.35 mb 76.36 mb *** *** RFX5 SOCS3 Gene CTLA4 gene Gene 1 1 SOCS3 Index Th1 1.0 Th1 TF 0.5 0.5 Gene IMPACT CTLA4 c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) PolII ChIP (0.5%) PolII IMPACT (1%) TSS; Hoffman (2%) 0 0 Coding; UCSC (1.4%)Promoter; UCSC (3%) 5' UTR; UCSC (0.5%) 1 1 0.8 c(0, predictions, 0) c(0, predictions, c(0, predictions, 0) c(0, predictions, Th2 Coding+500bp;Th2 UCSC (6%) 5' UTR+500bp; UCSC (3%) Enhancers; Hoffman (4%) CIITA 0.6 Promoter+500bp; UCSCTSS+500bp; (4%) Hoffman (3.4%) Enhancers; Andersson (0.4%) Enhancers+500bp; Hoffman (9%) Index TF Index Flanking Promoter; Hoffman (0.8%) Enhancers+500bp; Andersson (2%) 0.5 0.5 0.4 0.2 Flanking Promoter+500bp; Hoffman (3%) 0 IMPACT in CD4+ Th17 IMPACT Index 0 1 1 0 c(0, predictions, 0) c(0, predictions, c(0, predictions, 0) c(0, predictions, Th17 0.8 Th17 c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) 0.6 Index Index 0.5 0.5 Regulatory Activity H3K4me1 Activated Regulatory Activity 0.4 0 0 0.2 1 1 c(0, predictions, 0) c(0, predictions, c(0, predictions, 0) c(0, predictions, H3K4me3 Activated Treg in CD4+ Treg IMPACT 0 Treg Index Index Index 0.5 H3K27ac Activated 0.5 c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) 0 0 c(0, predictions, 0) c(0, predictions, H3K4me1 Resting 0) c(0, predictions, 76.35 chr 17 coordinate (Mb) 76.36 204.73 chr 2 coordinate (Mb) 204.75 Index Index H3K4me3 Resting chr17 genomic coordinate chr2 genomic coordinate Index H3K27ac Resting Top 5% of IMPACT captures large % of RA h2 c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) 3. Causal variation of polygenic traits 4. Functional fine-mapping EUR 32626192 32626692 32629192 32630692 32632192 32633692 32635192 32636692 EAS Chromosome 2 6B ChromosomeChr 2 2 ChromosomeChromosome 6 6 204.6 mb 204.8 mb Chr 6 138 mb 138.2 mb Chromosome 2 6C
204.6 mb 204.7 mb 204.8 mb 138.1 mb 1.2 204.7 mb Index 204.6 mb 204.8 mb 138 mb 138.2 mb CD28 CTLA4 TNFAIP3 chr6 genomic coordinate (bp) 204.7 mb 138.1 mb c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) CD28 Gene TNFAIP3 Gene TNFAIP3
1.0 genomic coordinate (Mb) 204.5 genomic coordinate (Mb) 204.8 137.9 rs6927172 138.3 CTLA4 Gene rs11314585 rs11757201 rs6933404 rs17264332 rs62432712 rs55686954 rs117701653 rs3087243 rs2327832 rs35926684 rs6920220 c(0, rep(bedgraph_trimmed[, 4], region_lengths), 0) 0.8 (functional) (functional) 1 1 Th1 Th1 0.5 0.5 0.6 0 0 1 1 Th2 Th2 Index Index 0.5 0.5 c(0, predmat2[selectcoords, i], 0) c(0, predmat2[selectcoords, c(0, predmat[selectcoords, i], 0) c(0, predmat[selectcoords, Proportion of RA h2 0 0 0.4 1 1 Th17 Th17 CTLA4 Gene state regulatory element) state regulatory element)
TNFAIP3 Gene TNFAIP3 Index Index − 0.5 − 0.5 c(0, predmat2[selectcoords, i], 0) c(0, predmat2[selectcoords, c(0, predmat[selectcoords, i], 0) c(0, predmat[selectcoords, 0 0 1 1 0.2 Treg Treg Index
Index 0.5 0.5 c(0, predmat2[selectcoords, i], 0) c(0, predmat2[selectcoords, c(0, predmat[selectcoords, i], 0) c(0, predmat[selectcoords, IMPACT p(cell IMPACT 0 IMPACT p(cell IMPACT 0 204566515 204616515 204666515 204716515 137958235 137983235 138008235 0.0 Index Treg Th2 Th17 Th1 Index genomic coordinate (bp) c(0, predmat2[selectcoords, i], 0) c(0, predmat2[selectcoords,
c(0, predmat[selectcoords, i], 0) c(0, predmat[selectcoords, genomic coordinate (bp) IMPACT annotation Thank You
Raychaudhuri Lab Soumya Raychaudhuri Yang Luo Emma Davenport Harm-Jan Westra
Price Lab Alkes Price Steven Gazal Bryce van de Geijn email: [email protected] twitter: @TAmariuta IMPACT manuscript on bioRxiv!