Aggregating hundreds of ENCODE epigenomic annotations to predict cell-state-specific regulatory elements

Tiffany Amariuta, Ph.D. Candidate, Raychaudhuri Lab
Harvard University, Boston, MA
pre-ASHG ENCODE Workshop 2018

Majority of GWAS variants are noncoding

[Figure: pie chart of GWAS variants, split into noncoding (>90%) and coding. (Hrdlickova et al 2014 Review, BBA-Molecular Basis of Disease)]

Big data: genomic functional annotation

Gene regulatory elements enriched for disease heritability

[Figure: RA h2 enrichment (y-axis, 0 to 50; asterisks mark significance) for CD4+ T Enhancers, CD4+ T Super Enhancers, CD4+ T Marks (H3K4me3 and H3K27ac in cell types including Treg, Th2, and Th17 stim), and CD4+ T Gene Sets (T.4int8+.Th, T.4.Pa.BDC, T.4SP69+.Th, T.4.PLN.BDC). Data from Finucane et al 2015 Nat Genet, Finucane et al 2018 Nat Genet.]

Distinguishing specific from nonspecific gene regulatory elements

[Figure: regulatory elements divided into NON-SPECIFIC and SPECIFIC.]

Hypothesis: SPECIFIC regulatory elements are MORE ENRICHED FOR DISEASE HERITABILITY.

Regulatory elements may be identified by bound TFs and local epigenome

specific genome-wide regulatory changes

Defining cell-state-specific gene regulatory elements

Identify TFs that regulate your cell-state

ChIP-seq for all TFs?

Yes:
Regulatory element: any spot where TF binds

No:
1. Identify TF motifs
2. Grow cells
3. Open chromatin assay
Regulatory element: peaks containing TF motifs

Caveats:

• May not know the TFs, have ChIP-seq for all TFs, have enough cells, or have a means to assay cell-state

• Would include generic regulatory elements as well
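The "No" branch defines a regulatory element as an open chromatin peak that contains a TF motif. A minimal sketch of that selection step follows, assuming peaks and motif hits are already available as half-open genomic intervals. The function name `peaks_with_motif` and the toy coordinates are hypothetical and not taken from the talk.

```python
from typing import List, Tuple

# (chromosome, start, end), half-open interval [start, end)
Interval = Tuple[str, int, int]

def peaks_with_motif(peaks: List[Interval], motifs: List[Interval]) -> List[Interval]:
    """Return the peaks that fully contain at least one motif hit."""
    selected = []
    for chrom, p_start, p_end in peaks:
        for m_chrom, m_start, m_end in motifs:
            if m_chrom == chrom and p_start <= m_start and m_end <= p_end:
                selected.append((chrom, p_start, p_end))
                break  # one motif hit inside the peak is enough
    return selected

# Hypothetical toy coordinates, not real data.
peaks = [("chr12", 100, 600), ("chr12", 1000, 1400), ("chr5", 200, 700)]
motifs = [("chr12", 250, 262), ("chr5", 900, 912)]

regulatory_elements = peaks_with_motif(peaks, motifs)
print(regulatory_elements)
```

The nested loop is quadratic and only meant to show the logic. For genome-scale peak sets, a sorted sweep or an interval tree would be the usual choice.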

IMPACT (Inference and Modeling of Phenotype-related ACtive Transcription)

[Figure: overview of IMPACT. A cell-state-specific TF (motif shown as information content by position, plus TF ChIP) and epigenomic features (e.g., ATAC, H3K4me1, H3K4me3, and H3K27ac in resting and activated cells) feed into logistic regression + feature selection, producing a cell-state-specific regulatory annotation. Example tracks show IMPACT regulatory activity (0 to 1) in CD4+ Th1, Th2, Th17, and Treg around the IFNG gene (chr12), IL4 gene (chr5), SOCS3 gene (chr17), and CTLA4 gene (chr2).]

IMPACT Epigenomic Features

[Figure: two frequency histograms of the features used by IMPACT.

IMPACT Feature Marks: H3K4me1, H3K4me3, H3K27ac, DNase, ATAC, Promoter, None, Pol2, Sequence-based, FAIRE, DGF, CTCF, K27ac, pTEF, PolII

IMPACT Feature Cell Types: Macrophage, fetal_brain, HepG2, Treg, Th1, Chondrocyte, CD4T, CD8, Th17_stim, Th2, CD4, Pancreatic_Islets, Tconv, CD4_Memory, Naive, Pancreatic_Tissue, CD4_Naive, CD8_Naive, Jurkat, Memory, Pancreatic_islets]

IMPACT model

[Figure 1A: schematic of the IMPACT model.
Key: master regulator TF; master TF coregulators; generic TFs; cell-type-specific regulatory elements; cell-type-nonspecific regulatory elements; training regulatory regions; training non-regulatory regions.
Rows: Cellular Truth (genes with bound TFs); Training (TF motifs and ChIP-seq, peaks 1 to 3); Features (open chromatin, H3K4me1, H3K9me3, … + 511); IMPACT score (0 to 1).]

IMPACT learns a cell-state-specific epigenomic signature

[Figure 1A, continued: each training region (with a motif) is described by feature values measured locally and distally, e.g. Feat.1 local, Feat.2 local, Feat.1 distal, Feat.2 distal, alongside open chromatin, H3K4me1, H3K9me3, … + 509 further features.]

[Figure 2A: ElasticNet Logistic Regression Fit for Tbet, the IMPACT epigenomic signature for T-bet (Th1). x-axis: Strength of Feature to Identify Active Binding (model β), from −6 to 2. Features shown include Th1 DGF local, Th17 DNase local, Th1 DNase local, Th1 H3K27ac local, Th1 K27ac local, Tconv H3K4me1 distal, CD56 primary H3K4me1 local, CD4 ATAC distal, Naive H3K4me1 local, Th1 H3K27ac distal, Th2 DGF local, Th2 DNase local, Th1 Brd4 local, Th2 K27ac local, Tconv H3K4me1 local, + 498 others, and the Intercept. Asterisks (none, *, **, ***) grade the magnitude of each coefficient against thresholds with multipliers 1 to 4; the threshold symbols are garbled in the source but appear to combine the standard deviation and mean of the coefficients.]
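The talk names the model class (elastic-net logistic regression over epigenomic features) but not its implementation. The sketch below is one way to fit such a model, using proximal gradient descent in plain Python on synthetic data. The penalty weights `lam` and `alpha`, the learning rate, and the two toy features are assumptions chosen for illustration, not values from IMPACT.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(x, t):
    # Proximal operator of the L1 penalty: shrink x toward zero by t.
    return math.copysign(max(abs(x) - t, 0.0), x)

def fit_elastic_net_logistic(X, y, lam=0.05, alpha=0.5, lr=0.1, n_iter=500):
    """Minimize mean log-loss + lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2)."""
    n, p = len(X), len(X[0])
    beta0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(beta0 + sum(b * v for b, v in zip(beta, xi))) - yi
            g0 += err / n
            for j in range(p):
                g[j] += err * xi[j] / n
        beta0 -= lr * g0  # the intercept is not penalized
        for j in range(p):
            # Gradient step on the smooth part (loss + ridge), then L1 shrinkage.
            step = beta[j] - lr * (g[j] + lam * (1 - alpha) * beta[j])
            beta[j] = soft_threshold(step, lr * lam * alpha)
    return beta0, beta

def predict(beta0, beta, x):
    # Probability that a region is a regulatory element, between 0 and 1.
    return sigmoid(beta0 + sum(b * v for b, v in zip(beta, x)))

# Synthetic toy data: feature 0 tracks the label, feature 1 is pure noise.
rng = random.Random(0)
y = [i % 2 for i in range(200)]
X = [[label + rng.gauss(0, 0.5), rng.gauss(0, 1)] for label in y]

beta0, beta = fit_elastic_net_logistic(X, y)
print("intercept:", beta0, "coefficients:", beta)
```

The L1 part of the penalty shrinks uninformative coefficients toward zero, which is one way to obtain the "feature selection" named in the talk. The fitted probability from `predict` plays the role of a score between 0 and 1.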

Epigenomic resolution is finer than ChIP-seq

[Figure 1C: for each of four loci, tracks show the gene, TF motifs, TF ChIP, and IMPACT p(cell-state regulatory element) from 0 to 1:
T-BET at the IFNG gene (chr12, 68.5 to 68.6 Mb), IMPACT in CD4+ Th1;
STAT3 at the SOCS3 gene (chr17, 76.35 to 76.36 Mb), IMPACT in CD4+ Th17;
GATA3 at the IL4 gene (chr5, 132.01 to 132.03 Mb), IMPACT in CD4+ Th2;
FOXP3 at the CTLA4 gene (chr2, 204.73 to 204.75 Mb), IMPACT in CD4+ Treg.]

IMPACT Validation

• IMPACT model assumes that binding sites of each TF will constitute a global epigenomic signature. (Cowper-Sal·lari et al Nat Genet 2012)

• If the signature is robust, each binding site genome-wide should resemble this signature.

• Therefore, IMPACT should be able to accurately predict TF binding.

• TF binding prediction is notoriously difficult.

IMPACT reidentifies TF binding

[Figure 1B: TF motif binding cross-validation AUC (y-axis, 0.5 to 1) for three methods: cell-state-specific H3K4me3 + TF motif, cell-state-specific DNase + TF motif, and IMPACT. x-axis, TF (cell type): T-BET (Th1), GATA3 (Th2), STAT3 (Th17), FOXP3 (Treg), REST (Fetal Brain), HNF4A (Liver), TCF7L2 (Pancreas), PolII (Lymphocyte).]

IMPACT is more accurate than open chromatin + motif

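The comparison is reported as validation AUC. As a reminder of what that number measures, here is a minimal sketch of AUC as the probability that a randomly chosen bound site scores higher than a randomly chosen unbound site. The labels and both score vectors are made-up toy values, not results from the talk.

```python
def roc_auc(labels, scores):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy held-out regions: 1 = TF bound, 0 = not bound (hypothetical values).
labels = [1, 1, 1, 0, 0, 0]
# A binary baseline such as "open chromatin peak with a motif" (hypothetical).
baseline_scores = [1, 0, 1, 1, 0, 0]
# A continuous probabilistic score (hypothetical).
model_scores = [0.9, 0.6, 0.8, 0.7, 0.2, 0.1]

print("baseline AUC:", roc_auc(labels, baseline_scores))
print("model AUC:", roc_auc(labels, model_scores))
```

A binary annotation produces many tied scores, which caps how well it can rank sites. A continuous score can rank regions within the set of open, motif-containing sites.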

average increase in AUC from 0.66 to 0.92

IMPACT Application: Expression Causal Variation

• cis eQTL causal variation is known to be concentrated around promoters and the TSS. (Liu et al AJHG 2017)

• Hypothesis: IMPACT RNA polymerase II annotation would better capture this variation.

[Figure: Pol II ChIP and epigenomic features are combined to build the Pol II IMPACT annotation.]

cis eQTL χ2 enrichment across functional annotations

[Figure: bar chart of cis eQTL χ2 enrichment (y-axis, 1.0 to 1.8; asterisks mark significance) for each functional annotation. Annotations, with the percentage shown on the slide in parentheses:
PolII ChIP (0.5%)
PolII IMPACT (1%)
TSS; Hoffman (2%)
TSS+500bp; Hoffman (3.4%)
Coding; UCSC (1.4%)
Coding+500bp; UCSC (6%)
Promoter; UCSC (3%)
Promoter+500bp; UCSC (4%)
5' UTR; UCSC (0.5%)
5' UTR+500bp; UCSC (3%)
Enhancers; Hoffman (4%)
Enhancers+500bp; Hoffman (9%)
Enhancers; Andersson (0.4%)
Enhancers+500bp; Andersson (2%)
Flanking Promoter; Hoffman (0.8%)
Flanking Promoter+500bp; Hoffman (3%)]
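The slide does not define how χ2 enrichment is computed. One simple reading, stated here purely as an assumption, is the mean association χ2 of SNPs inside an annotation divided by the mean χ2 of all SNPs. The sketch below implements that assumed definition on made-up numbers.

```python
def chi2_enrichment(chi2, in_annotation):
    """Assumed definition: mean chi-square inside the annotation / mean chi-square overall."""
    if len(chi2) != len(in_annotation):
        raise ValueError("chi2 and in_annotation must have the same length")
    inside = [c for c, a in zip(chi2, in_annotation) if a]
    if not inside:
        raise ValueError("annotation contains no SNPs")
    mean_inside = sum(inside) / len(inside)
    mean_all = sum(chi2) / len(chi2)
    return mean_inside / mean_all

# Hypothetical eQTL chi-square statistics for four SNPs and their annotation membership.
chi2 = [4.0, 1.0, 1.0, 2.0]
in_annotation = [True, False, False, True]

print(chi2_enrichment(chi2, in_annotation))
```

Under this definition a value of 1.0 means the annotation carries no more association signal than an average SNP, which matches the baseline of the y-axis in the figure.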

25% average increase in enrichment

IMPACT Application: Polygenic Causal Variation

• CD4+ T cells are strongly implicated in RA pathogenesis

• CD4+ T cell-states: Th1, Th2, Th17, Tregs, etc.

[Figure: a naïve CD4+ T cell giving rise to the Th1, Th2, Th17, and Treg cell-states.]

IMPACT explains majority of RA h2

Top 5% of IMPACT captures large % of RA h2

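[Figure: Proportion of RA h2 (y-axis, 0.0 to 1.2) captured by the top 5% of SNPs in each IMPACT annotation (Treg, Th2, Th17, Th1), shown for EUR and EAS.]

Top 5% of IMPACT Treg SNPs explain 85.7% of RA h2.

Top 9.8% explain 97.3% of RA h2.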
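In heritability partitioning, enrichment is commonly defined as the proportion of h2 an annotation explains divided by the proportion of SNPs it contains. Assuming that definition applies to the two figures quoted above, the implied fold-enrichments can be worked out as follows. The function name is hypothetical.

```python
def h2_enrichment(prop_h2, prop_snps):
    """Fold-enrichment: proportion of heritability / proportion of SNPs."""
    if not 0 < prop_snps <= 1:
        raise ValueError("prop_snps must be in (0, 1]")
    return prop_h2 / prop_snps

# Values quoted on the slide for the Treg IMPACT annotation.
top_5 = h2_enrichment(0.857, 0.05)
top_9_8 = h2_enrichment(0.973, 0.098)

print(f"top 5% of SNPs: {top_5:.1f}-fold enrichment")
print(f"top 9.8% of SNPs: {top_9_8:.1f}-fold enrichment")
```

The arithmetic shows why the narrower threshold is the more striking result: widening the annotation from 5% to 9.8% of SNPs adds about 12 percentage points of h2, so the per-SNP enrichment falls.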

Most comprehensive explanation for RA h2

[Figure: Proportion of RA h2 (y-axis, 0.0 to 1.2) in EUR and EAS for the top 5% of each IMPACT annotation (Treg, Th2, Th17, Th1), compared with CD4+ T Histone Marks and CD4+ T Gene Sets; the comparison bars are labeled 42.3% and 36.4%.]

IMPACT applied to 42 polygenic traits

[Color scale: signed log10 p of τ*, from −10 to 10]
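The heatmap color encodes a signed log10 p-value for τ*. The talk does not spell out the convention; a common one, assumed here, is to take −log10(p) and give it the sign of the effect estimate, so that strong positive effects are large and positive and strong negative effects are large and negative.

```python
import math

def signed_log10_p(tau_star, p_value):
    """Assumed convention: sign(tau_star) * -log10(p_value)."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    if tau_star == 0:
        return 0.0
    magnitude = -math.log10(p_value)
    return math.copysign(magnitude, tau_star)

# Hypothetical effect estimates and p-values.
print(signed_log10_p(0.8, 1e-5))
print(signed_log10_p(-0.3, 1e-3))
```

Values would then be clipped to the plotted range of −10 to 10 for display.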

[Figure: heatmap of IMPACT annotations (rows) by traits (columns).

IMPACT annotations: Th1 (T-BET), Th2 (GATA3), Th17 (STAT3), Treg (FOXP3), Treg (STAT5), Macrophage (IRF5), Monocyte (IRF1), Monocyte (CEBPB), B cell (PAX5), Liver (HNF4A), Pancreas (TCF7L2), Brain (RXRA), Fetal Brain (REST)

Trait categories: Immune, Blood, Metabolism, Body, Lung, Reproductive, Pigment, Brain, Other

Traits: BMI, LDL, HDL, Height, Autism, Tanning, Sunburn, Anorexia, Diabetes, Heel Tscore, Hair Pigment, Skin Pigment, Ever Smoked, Platelet Count, Schizophrenia, Menarche Age, Smoking Status, Morning Person, Crohns Disease, Type 2 Diabetes, Menopause Age, High Cholesterol, Ulcerative Colitis, BMI adj for WHR, Eosinophil Count, Downs Syndrome, Years of Education, Reticulocyte Count, Lung Smoking FVC, Allergy and Eczema, Rheumatoid Arthritis, Balding Less Severe, Balding More Severe, Red Blood Cell Width, Red Blood Cell Count, White Blood Cell Count, Systolic Blood Pressure, All Autoimmune Disease, Coronary Artery Disease, Lung Smoking FEV1 FVC, Respiratory Ear/Nose/Throat, Impedance Basal Metabolic Rate]

IMPACT applied to 42 polygenic traits: highlights

• CD4+ T IMPACT (T-BET, STAT3, GATA3, FOXP3) → explains heritability of (auto)immune traits

• Liver IMPACT (HNF4A) → explains heritability of LDL, HDL, + some blood traits

• Macrophage IMPACT (IRF5) → explains heritability of Schizophrenia, + immune traits

• Monocyte IMPACT (CEBPB, IRF1) → explains heritability of blood traits, not immune traits

• CD4+ Treg IMPACT (STAT5) replicates CD4+ Treg IMPACT (FOXP3)

IMPACT Application: A priori identification of functional variants

• Fine-mapping 20 RA risk loci using GWAS of 11,475 European RA cases and 15,870 controls (Westra et al 2018 Nature Genetics)

• Identification of putatively causal variants:

1. Differential binding of CD4+ T nuclear extract (EMSA)

2. Differential activity (luciferase assays)

[Figures 6B and 6C: IMPACT p(cell-state regulatory element) tracks (0 to 1) for Th1, Th2, Th17, and Treg across two fine-mapped loci: the CD28 / CTLA4 locus on chr2 (genomic coordinate 204.5 to 204.8 Mb) and the TNFAIP3 locus on chr6 (137.9 to 138.3 Mb). Variants marked on the tracks: rs6927172, rs11314585, rs11757201, rs6933404, rs17264332, rs62432712, rs55686954, rs117701653, rs3087243, rs2327832, rs35926684, rs6920220. Two variants carry the label (functional); the extracted text does not preserve which ones.]

Applications of IMPACT

1. Cell-state-specific regulation
2. Causal variation of gene expression
3. Causal variation of polygenic traits
4. Functional fine-mapping

[Figure: four summary panels, one per application, repeating the earlier IMPACT track, cis eQTL χ2 enrichment, proportion of RA h2, and fine-mapping figures.]

Thank You

Raychaudhuri Lab: Soumya Raychaudhuri, Yang Luo, Emma Davenport, Harm-Jan Westra

Price Lab: Alkes Price, Steven Gazal, Bryce van de Geijn

email: [email protected]
twitter: @TAmariuta

IMPACT manuscript on bioRxiv!