Inferring the epigenetic landscape of polygenic disease

ENCODE 2019: Research Applications and Users Meeting

Tiffany Amariuta, PhD Candidate, Raychaudhuri Lab, Harvard Medical School
July 9, 2019

Polygenic contributions to complex

104 rheumatoid arthritis risk loci (Okada et al 2014)

Changes in expression can be associated with disease risk

[Figure: cellular subtypes associated with increased rheumatoid arthritis risk (T cells, monocytes)]

Fonseka et al 2018, STM; Zhang, Slowikowski, Fonseka, Rao, Wei et al 2019, Nature Immunology

Mechanisms leading to transcriptional change

[Figure: TF binding, DNA methylation (Me), microRNAs, and post-translational modification (PTM) / degradation]

EMBL; Zhang et al 2018, Mol

Genotype has a direct impact on TF binding
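A common way to quantify how a single variant changes TF binding is to score the reference and alternative sequences against the TF's position weight matrix (PWM) and compare their log-odds scores. This is a minimal sketch only: the 4-bp motif, its probabilities, the uniform 0.25 background and the function names are hypothetical, and the slides do not say that allelic effects were scored this way.

```python
import math

# Hypothetical 4-bp motif: per-position base probabilities (each column sums to 1).
PWM = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds_score(seq, pwm=PWM, background=BACKGROUND):
    """Sum of log2(p(base) / background) over motif positions."""
    if len(seq) != len(pwm):
        raise ValueError("sequence and motif must have the same length")
    return sum(math.log2(col[base] / background) for base, col in zip(seq, pwm))


def allelic_effect(ref_seq, alt_seq, pwm=PWM):
    """Change in motif score when the reference allele is swapped for the alternative."""
    return log_odds_score(alt_seq, pwm) - log_odds_score(ref_seq, pwm)


# Variant at motif position 2: reference allele A, alternative allele C.
delta = allelic_effect("TAGC", "TCGC")
print(f"motif score change: {delta:.2f} bits")  # negative means a weaker motif match
```

A negative difference means the alternative allele weakens the motif match, which is one way a genotype could lower TF occupancy and, downstream, transcription.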

[Figure: with allele A, gene expression is normal; with allele C, transcription is decreased]

TFs mark promoters and enhancers

[Figure: schematic of a gene with its promoter and enhancers]

[Figure: IMPACT-predicted regulatory activity in CD4+ Th1, Th2, Th17 and Treg cells at IFNG (chr12, ~68.54–68.56 Mb) and IL4 (chr5, ~132.00–132.03 Mb), with TF binding tracks]

Knowing where regulation occurs in disease-driving cell types could reveal causal biology

[Figure: the same CD4+ T cell tracks at SOCS3 (chr17, ~76.35–76.36 Mb) and CTLA4 (chr2, ~204.73–204.74 Mb), including T-BET binding in Th1]

[Figure: TF binding pileups (T-BET in Th1, GATA3 in Th2, STAT3 in Th17, FOXP3 and STAT5 in Treg) and IMPACT regulatory activity in CD4+ Th1, Th2, Th17 and Treg cells at IFNG, IL4, SOCS3 and CTLA4, with additional annotation tracks for macrophage (IRF5, IRF1), monocyte (IRF1, CEBPB), B cell (PAX5), liver (HNF4A), blood (Pol II), pancreas (TCF7L2), brain (RXRA), fetal brain (REST), K562 (CUX1), cardio (SMAD2), and aggregate adaptive and innate tracks]

Incomplete nature of assaying TF binding

Genome-wide strategy: ChIP-seq (in vivo)

Drawbacks:
1. Can only test one TF / cell-type pair at a time
2. Binding is dynamic; ChIP-seq is a snapshot

Results:
1. Biases in the literature and in available data
2. Potential false negatives

Incomplete nature of assaying TF binding (bias)
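The sparsity of TF ChIP-seq coverage can be expressed as the fraction of TF / cell-type pairs that have no experiment. With 142 TFs and 57 cell types there are 142 × 57 = 8,094 possible pairs, so 94% sparsity leaves only about 6% of them assayed. The helper below is a minimal sketch; its name and the toy experiments are hypothetical and are not the data behind the figure.

```python
def matrix_sparsity(assayed_pairs, tfs, cell_types):
    """Fraction of TF x cell-type pairs with no ChIP-seq experiment.

    assayed_pairs: iterable of (tf, cell_type) tuples, one per experiment;
    repeated experiments for the same pair count once.
    """
    tfs, cell_types = set(tfs), set(cell_types)
    observed = {(tf, ct) for tf, ct in assayed_pairs if tf in tfs and ct in cell_types}
    return 1.0 - len(observed) / (len(tfs) * len(cell_types))


# Toy example: 2 TFs x 3 cell types, with a duplicated FOXP3/Treg experiment.
experiments = [("GATA3", "Th2"), ("FOXP3", "Treg"), ("FOXP3", "Treg")]
print(matrix_sparsity(experiments, ["GATA3", "FOXP3"], ["Th1", "Th2", "Treg"]))
```

Using a set of observed pairs means replicate experiments do not inflate coverage, which matches the question the figure asks: which pairs have been assayed at all.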

[Figure: available ChIP-seq experiments for 142 TFs across 57 cell types (color scale: count, 0–32); the TF × cell-type matrix is 94% sparse]

Epigenetic modifications also mark promoters and enhancers

Shlyueva and Stampfel 2014, Nat Rev Genet

Epigenetic modifications capture genetic variation in polygenic disease (S-LDSC)

[Figure: −log10 enrichment p-value per cell-type group and trait, adapted from Finucane et al 2015, Nat Genet]

Figure 6. Enrichment of cell-type groups. We report significance of enrichment for each of 10 cell-type groups, for each of 11 traits. The black dotted line at −log10(P) = 3.5 is the cutoff for Bonferroni significance. The grey dotted line at −log10(P) = 2.1 is the cutoff for FDR < 0.05. For HDL, three of the top individual cell types are adipose nuclei, which explains the enrichment of the “Other” category.

Nat Genet. Author manuscript; available in PMC 2016 May 01.
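The two dotted lines in the caption are multiple-testing thresholds on the −log10(P) scale: −log10(P) = 3.5 corresponds to P ≈ 3.2 × 10⁻⁴, and −log10(P) = 2.1 to P ≈ 7.9 × 10⁻³. The caption does not state how many tests they correct for, so this sketch only shows one standard way such cutoffs are computed (a Bonferroni threshold for m tests and a Benjamini–Hochberg FDR threshold); the number of tests and the p-values below are hypothetical.

```python
import math


def bonferroni_log10_cutoff(m, alpha=0.05):
    """-log10 of the Bonferroni per-test threshold alpha / m."""
    return -math.log10(alpha / m)


def bh_pvalue_cutoff(pvalues, q=0.05):
    """Largest p-value declared significant by Benjamini-Hochberg at FDR q.

    Returns None when no test passes.
    """
    ranked = sorted(pvalues)
    m = len(ranked)
    cutoff = None
    for k, p in enumerate(ranked, start=1):
        if p <= k * q / m:
            cutoff = p  # keep the largest rank that satisfies the BH condition
    return cutoff


print(round(bonferroni_log10_cutoff(100), 2))  # hypothetical m = 100 tests
print(bh_pvalue_cutoff([0.001, 0.01, 0.02, 0.5]))
```

The BH cutoff depends on the observed p-values, so, unlike the Bonferroni line, its position on a plot shifts from dataset to dataset.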

[Figure: IMPACT-predicted regulatory activity in CD4+ Th1, Th2, Th17 and Treg cells at IFNG, IL4, SOCS3 and CTLA4]

Using epigenetic data to predict TF binding and cell type regulation

[Figure: example at the HLA-DQB1 locus (chr6, ~32.626–32.637 Mb) with variants rs71542466, rs71542467, rs71542468, rs72844401, rs28451423 and rs4279477. Inputs: TF binding motifs (motif logo, information content by position; RFX5, CIITA), TF ChIP-seq, and epigenomic features (ATAC-seq; H3K4me1, H3K4me3 and H3K27ac in resting and activated cells). IMPACT (logistic regression + feature selection) turns these into a TF motif binding prediction, used as a cell type regulatory annotation. Amariuta et al 2019, AJHG]

IMPACT

Key: master regulator TF; master TF coregulators; generic TFs; cell-state-specific regulatory elements; cell-state-nonspecific regulatory elements

[Figure: IMPACT schematic, adapted from Amariuta et al 2019, AJHG. Cellular truth: the key TF, the TF's coregulators, genes, and cell-type-specific regulatory elements. Training: TF motif sites labeled with ChIP-seq as key TF training regulatory regions or training inactive regulatory regions. Features: open chromatin, H3K4me1, H3K9me3, … (+ 509 more; + 5,339 in the expanded feature set). Output: an IMPACT score between 0 and 1 for each region (peaks 1–3), separating cell type regulatory elements from other regulatory elements.]
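The schematic describes IMPACT as logistic regression with feature selection over epigenomic features at TF motif sites. The sketch below is a hedged stand-in, not the published implementation: it assumes an L1 penalty (fit by proximal gradient descent) as the feature-selection step, and the function names, penalty strength, learning rate and synthetic data are all assumptions. Rows are candidate motif sites, columns are epigenomic features, and labels mark sites treated as bound (1) or inactive (0).

```python
import numpy as np


def fit_l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=3000):
    """L1-penalized logistic regression fit by proximal gradient descent (ISTA).

    X: (n_sites, n_features) feature matrix; y: 0/1 labels.
    Returns feature weights w and intercept b (the intercept is not penalized).
    """
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        resid = prob - y  # per-site gradient of the log-loss w.r.t. the linear predictor
        w = w - lr * (X.T @ resid) / n
        # Soft-thresholding: the proximal step for the L1 penalty zeroes weak features.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
        b = b - lr * resid.mean()
    return w, b


def regulatory_score(X, w, b):
    """Predicted probability that each site is bound, between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


# Synthetic data: only feature 0 is informative; features 1-4 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=400) > 0).astype(float)
w, b = fit_l1_logistic(X, y)
scores = regulatory_score(X, w, b)
print("weights:", np.round(w, 2))
print("selected features:", np.flatnonzero(w))
```

The L1 penalty drives the weights of uninformative features to exactly zero, so fitting and feature selection happen in one step; the fitted probabilities then play the role of a 0–1 regulatory score per site.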

Selection of TFs and cell types to model (resulting in 730 IMPACT annotations)

[Figure: log10 frequency of the most common cell types (lung, liver, brain, B cell, breast, myeloid, colon, prostate, stem cell, mesendoderm), the most common TFs (MAX, SPI1, CTCF, ESR1, REST, HSF1, FOXA1, RUNX1, GABPA) and all tissue types (GI, eye, liver, lung, skin, bone, blood, breast, neural, uterus, kidney, muscle, heart, stem cell, prostate, vascular, sarcoma, adrenal, fibroblast, adipocyte, pancreas, melanoma, others), plus counts per category (cell, cell-derived, TF, tissue)]

Collection of high-quality data with matched controls (Eiryo Kawakami)

Epigenetic features to characterize motifs

[Figure: IMPACT features by data type (log10 frequency; total = 5,345): Txn, DGF, HiC, DNase, ATAC, FAIRE, CTCF, K27ac, H3K27ac, H3K4me3, H3K4me1, H3K9me3, H3K27me3, H3K4me2, H3K36me3, sequence, H2BK12ac, other]

IMPACT accurately predicts TF motif binding

77% of 730 models have AUPRC > 0.6

[Figure: distribution of AUPRC across the 730 IMPACT models, with an inset of random-classifier AUPRC thresholds (0–0.08)]
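AUPRC measures how well predicted scores rank truly bound sites above unbound ones, and a random classifier's AUPRC is roughly the fraction of positives, so a random-classifier threshold between 0 and 0.08 corresponds to a positive rate of at most about 8%. The sketch below computes AUPRC as average precision (one common estimator; the slides do not say which estimator was used) and checks the random baseline on synthetic labels with an assumed 5% positive rate.

```python
import random


def average_precision(scores, labels):
    """AUPRC estimated as average precision: mean precision at the rank of each positive.

    Ties in score keep their input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


# Small worked case: positives at ranks 1 and 3 -> (1/1 + 2/3) / 2.
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))

# Random scores on 10,000 sites with 5% positives: AUPRC lands near 0.05.
rng = random.Random(0)
labels = [1] * 500 + [0] * 9500
random_scores = [rng.random() for _ in labels]
print(round(average_precision(random_scores, labels), 3))
```

Because the random baseline equals the positive rate rather than 0.5, an AUPRC of 0.6 is far above chance when bound sites are rare.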

Benchmarking IMPACT against state-of-the-art approaches

Mocap (cell-type-specific epigenetic assays)
Virtual ChIP-seq (+ )

Karimzadeh and Hoffman, 2019, bioRxiv

Chen et al, 2017, Nucleic Acids Res

IMPACT outperforms state-of-the-art approaches

[Figure A: AUPRC and MCC (n = 50 trials) for IMPACT, MocapG, MocapS and Virtual ChIP-seq predicting T-BET (Th1), GATA3 (Th2), STAT3 (Th17), FOXP3 (Treg), TCF7L2 (pancreas), REST (fetal brain), Pol II (lymphocyte) and HNF4A (liver) binding. Figure B: AUPRC and MCC (n = 50 trials) for IMPACT and Virtual ChIP-seq predicting Pol II binding in colon, fibroblast, heart, liver, pancreas and stomach. Amariuta et al 2019, AJHG]
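MCC, the second benchmark metric, scores binary predictions using all four cells of the confusion matrix, so it stays informative when bound sites are rare. Because it needs hard calls, continuous scores must be thresholded first; the 0.5 cutoff, the function names and the toy data below are assumptions for illustration, since the slides do not give the threshold used.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for 0/1 labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when any margin is empty."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def threshold(scores, cutoff=0.5):
    """Turn continuous scores into 0/1 calls (the cutoff is an assumption)."""
    return [1 if s >= cutoff else 0 for s in scores]


labels = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.1, 0.3]
print(mcc(labels, threshold(scores)))
```

MCC ranges from −1 (every call wrong) through 0 (no better than chance) to 1 (every call right), which makes it a useful complement to AUPRC, which only looks at the ranking of scores.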

Does IMPACT-predicted cell type regulation capture polygenic heritability? (S-LDSC)
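S-LDSC summarizes each annotation with a standardized effect size τ* and a heritability enrichment. In its usual definition, τ* rescales the per-SNP coefficient τ_c by the annotation's standard deviation across SNPs (sd_c), the number of SNPs M and the SNP heritability h²g, so τ* = τ_c · sd_c · M / h²g; enrichment is the proportion of h²g in the annotation divided by the proportion of SNPs in it. This sketch assumes per-SNP heritability estimates are already available and also shows one hypothetical way to turn continuous scores into a top-5% binary annotation; all function names and toy numbers are illustrative.

```python
def tau_star(tau_c, sd_c, h2g, m_snps):
    """Standardized effect size: tau* = tau_c * sd_c * M / h2g."""
    return tau_c * sd_c * m_snps / h2g


def top_fraction_mask(scores, frac=0.05):
    """Binary annotation marking the top `frac` of SNPs by score (at least one SNP)."""
    k = max(1, round(frac * len(scores)))
    top = set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])
    return [i in top for i in range(len(scores))]


def h2_enrichment(per_snp_h2, in_annot):
    """Return (proportion of h2, proportion of SNPs, enrichment) for a binary annotation."""
    total_h2 = sum(per_snp_h2)
    annot_h2 = sum(h for h, a in zip(per_snp_h2, in_annot) if a)
    prop_h2 = annot_h2 / total_h2
    prop_snps = sum(in_annot) / len(in_annot)
    return prop_h2, prop_snps, prop_h2 / prop_snps


# Toy example: 20 SNPs; the top 5% (one SNP) by score carries most of the heritability.
scores = [i / 20 for i in range(20)]
per_snp_h2 = [0.01] * 19 + [0.81]
mask = top_fraction_mask(scores, frac=0.05)
print(h2_enrichment(per_snp_h2, mask))
print(tau_star(tau_c=1e-8, sd_c=0.2, h2g=0.5, m_snps=5_000_000))
```

Enrichment compares an annotation's share of heritability with its share of SNPs, while τ* asks whether the annotation contributes beyond the other annotations in the model, which is why the two can rank annotations differently.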


Cell type regulation predicted by IMPACT captures polygenic heritability

[Figure: heatmap of τ* significance (−log10 τ* p, scale 0–21, Bonferroni threshold marked) for 434 IMPACT annotations across traits: LDL, diabetes, balding (more severe), bipolar disorder, fasting glucose, lung FEV1/FVC (smoking), neuroticism, age at menopause, morning person, lung FVC (smoking), HDL, high cholesterol, systolic blood pressure, coronary artery disease, years of education, smoking status, asthma, respiratory and ear/nose/throat disease, allergy and eczema, ulcerative colitis, osteoporosis, heel T-score, height, BMI adjusted for WHR, breast cancer, impedance, basal metabolic rate, schizophrenia, age at menarche, white blood cell count, prostate cancer, reticulocyte count, red blood cell count, platelet count, red blood cell width, all autoimmune disease, rheumatoid arthritis, T1D, celiac disease, IBD, Crohn's disease, primary biliary cirrhosis, lupus, eosinophil count, multiple sclerosis]

[Figure x-axis: the 434 annotations, labeled cell type:TF, e.g. Treg:FOXP3, Th1:TBX21, Th2:GATA3, B cell:PAX5, Myeloid:SPI1, Monocyte:CEBPB, Liver:HNF4A, Breast:ESR1, Prostate:FOXA1, Stomach:GATA6, Hematopoietic progenitors:GATA2]

IMPACT regulatory annotations outperform epigenetic and transcriptomic annotations
Top 5% of IMPACT captures large % of RA h2

[Figure: proportion of RA h2 captured in EUR and EAS by the top 5% of IMPACT annotations for CD4+ Treg, Th2, Th17 and Th1 cells, compared with CD4+ T cell histone marks and gene sets; highlighted values: 42.3% and 36.4%. Amariuta et al 2019, AJHG]

[Figure A: τ* and heritability enrichment of the Treg (FOXP3) IMPACT annotation compared with other annotations: FOXP3 motif, FOXP3 ChIP-seq, T.4int8+.Th, T.4.Pa.BDC, T.4SP69+.Th, T.4.PLN.BDC, CD4 control, averaged tracks, H3K4me3 (Treg), H3K27ac (Treg), H3K4me3 (Th2), H3K27ac (Th2), super enhancers, H3K4me3 (Th17 stim) and H3K27ac (Th17 stim); annotation sizes range from 0.04% to 13% of SNPs]

IMPACT annotations capture regulatory mechanisms distinct from epigenetic marks

[Figure B: annotation effect size τ* (with 95% CI) for the same annotations, estimated independently and conditional on Treg IMPACT]

Amariuta et al 2019, AJHG

IMPACT regulatory annotations outperform ChIP-seq annotations

Improvement in τ* significance

71% of annotations experienced an increase in enrichment post-IMPACT

[Figure: scatter of −log10 τ* p for each annotation before (pre-IMPACT, x axis) and after (post-IMPACT, y axis) applying IMPACT]

Disease-driving regulation is concordant between ancestries

[Figure: per-annotation statistics in EUR (y axis) versus EAS (x axis) for asthma (chi-square p < 7.2e−28), coronary artery disease (CAD; p < 1.1e−48), prostate cancer (PrCa; p < 5.7e−79) and rheumatoid arthritis (RA; p < 1.1e−27)]
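The slides report a chi-square p-value for each trait's cross-ancestry comparison without naming the exact test. One plausible version, shown here purely as an assumption, is a Pearson chi-square test on a 2×2 table that counts annotations whose statistics have the same or opposite sign in EUR and EAS; a 1-degree-of-freedom chi-square p-value equals erfc(√(χ²/2)). The function names and toy z-scores are hypothetical.

```python
import math


def sign_concordance_table(z_eur, z_eas):
    """2x2 counts (a, b, c, d) of annotation signs: ++, +-, -+, --.

    Zero is treated as negative.
    """
    a = b = c = d = 0
    for x, y in zip(z_eur, z_eas):
        if x > 0 and y > 0:
            a += 1
        elif x > 0:
            b += 1
        elif y > 0:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) and 1-df p-value."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2))


z_eur = [2.1, 1.5, 3.0, -0.5, -1.2, 0.8, -2.2, 1.1]
z_eas = [1.8, 0.9, 2.5, -0.7, -0.4, -0.3, -1.9, 1.4]
table = sign_concordance_table(z_eur, z_eas)
print(table, chi2_2x2(*table))
```

With hundreds of annotations, even moderate sign agreement between ancestries yields very small p-values, which is the scale of the values reported above.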

Acknowledgements
Raychaudhuri Lab: Soumya Raychaudhuri, Yang Luo, Emma Davenport, Harm-Jan Westra

Price Lab: Alkes Price, Steven Gazal, Bryce van de Geijn

Funding: ENCODE, NHGRI T32
Twitter: @TAmariuta