Functional Enrichments of Disease Variants Indicate Hundreds of Independent Loci Across Eight Diseases
Abhishek K. Sarkar, Lucas D. Ward, & Manolis Kellis

[Figure: line plot. x-axis: top n SNPs (full meta-analysis), 0 to 100,000. y-axis: hold-out Pearson correlation, -0.25 to 1.00. One line per cohort (BRASS, CANADA, EIRA, NARAC1, NARAC2, WTCCC) plus Overall.]

Supplementary Figure 1: Correlation between individual cohort z-scores and meta-analyzed z-scores of the remainder in a study of rheumatoid arthritis, considering increasing numbers of SNPs. SNPs are ranked by p-value in the overall meta-analysis. Overall correlation is between sample-size weighted z-scores and published inverse-variance weighted z-scores.

[Figure: data availability matrix for the reference epigenomes.]
Columns: cell type/tissue group; epigenome name; additional marks (Addtl marks); H3K4me1; H3K4me3; H3K36me3; H3K27me3; H3K9me3; H3K27ac; H3K9ac; DNase-Seq; DNA methyl; RNA-Seq; EID; chromatin states.
Chromatin state models: 15-state model, 5 marks, 127 epigenomes; 18-state model, 7 marks, 98 epigenomes; 25-state model, 12 imputed marks, 127 epigenomes.
Count row: 56, 60, 98, 62, 53, 95, 127, 127, 127, 127, 127.
Legend: DNA methylation by WGBS, RRBS, or mCRF (0% to 100%); RNA-seq or microarray; chromatin states trained or applied.
Epigenomes by tissue group (a number in parentheses is the Addtl marks value where one is given):
IMR90: E017 IMR90 fetal lung fibroblasts Cell Line (21)
ESC: E002 ES-WA7 Cell Line; E008 H9 Cell Line (21); E001 ES-I3 Cell Line; E015 HUES6 Cell Line; E014 HUES48 Cell Line; E016 HUES64 Cell Line; E003 H1 Cell Line (20); E024 ES-UCSF4 Cell Line
iPSC: E020 iPS-20b Cell Line; E019 iPS-18 Cell Line; E018 iPS-15b Cell Line; E021 iPS DF 6.9 Cell Line; E022 iPS DF 19.11 Cell Line
ES-deriv: E007 H1 Derived Neuronal Progenitor Cultured Cells (13); E009 H9 Derived Neuronal Progenitor Cultured Cells (1); E010 H9 Derived Neuron Cultured Cells (1); E013 hESC Derived CD56+ Mesoderm Cultured Cells; E012 hESC Derived CD56+ Ectoderm Cultured Cells; E011 hESC Derived CD184+ Endoderm Cultured Cells; E004 H1 BMP4 Derived Mesendoderm Cultured Cells (11); E005 H1 BMP4 Derived Trophoblast Cultured Cells (15); E006 H1 Derived Mesenchymal Stem Cells (13)
Blood & T-cell: E062 Primary mononuclear cells from peripheral blood; E034 Primary T cells from peripheral blood; E045 Prim. T cells effector/memory enriched from periph. blood; E033 Primary T cells from cord blood; E044 Primary T regulatory cells from peripheral blood; E043 Primary T helper cells from peripheral blood; E039 Primary T helper naive cells from peripheral blood; E041 Primary T helper cells PMA-I stimulated; E042 Primary T helper 17 cells PMA-I stimulated; E040 Primary T helper memory cells from peripheral blood; E037 Primary T helper memory cells from peripheral blood; E048 Primary T CD8+ memory cells from peripheral blood; E038 Primary T helper naive cells from peripheral blood; E047 Primary T CD8+ naïve cells from peripheral blood
HSC & B-cell: E029 Primary monocytes from peripheral blood; E031 Primary B cells from cord blood; E035 Primary hematopoietic stem cells; E051 Primary hematopoietic stem cells G-CSF-mobilized Male; E050 Primary hematopoietic stem cells G-CSF-mobilized Female; E036 Primary hematopoietic stem cells short term culture; E032 Primary B cells from peripheral blood; E046 Primary Natural Killer cells from peripheral blood; E030 Primary neutrophils from peripheral blood
Mesench: E026 Bone Marrow Derived Cultured Mesenchymal Stem Cells; E049 Mesenchymal Stem Cell Deriv. Chondrocyte Cultured Cells; E025 Adipose Derived Mesenchymal Stem Cell Cultured Cells; E023 Mesenchymal Stem Cell Derived Adipocyte Cultured Cells
Myosat: E052 Muscle Satellite Cultured Cells (1)
Epithelial: E055 Foreskin Fibroblast Primary Cells skin01; E056 Foreskin Fibroblast Primary Cells skin02; E059 Foreskin Melanocyte Primary Cells skin01; E061 Foreskin Melanocyte Primary Cells skin03; E057 Foreskin Keratinocyte Primary Cells skin02; E058 Foreskin Keratinocyte Primary Cells skin03; E028 Breast variant Human Mammary Epithelial Cells (vHMEC); E027 Breast Myoepithelial Primary Cells
Neurosph: E054 Ganglion Eminence derived primary cultured neurospheres; E053 Cortex derived primary cultured neurospheres
Thymus: E112 Thymus; E093 Fetal Thymus
Brain: E071 Brain Hippocampus Middle; E074 Brain Substantia Nigra; E068 Brain Anterior Caudate; E069 Brain Cingulate Gyrus; E072 Brain Inferior Temporal Lobe; E067 Brain Angular Gyrus; E073 Brain Dorsolateral Prefrontal Cortex; E070 Brain Germinal Matrix; E082 Fetal Brain Female; E081 Fetal Brain Male
Adipose: E063 Adipose Nuclei
Muscle: E100 Psoas Muscle; E108 Skeletal Muscle Female; E107 Skeletal Muscle Male; E089 Fetal Muscle Trunk; E090 Fetal Muscle Leg
Heart: E083 Fetal Heart; E104 Right Atrium; E095 Left Ventricle; E105 Right Ventricle; E065 Aorta
Smooth Muscle: E078 Duodenum Smooth Muscle; E076 Colon Smooth Muscle; E103 Rectal Smooth Muscle; E111 Stomach Smooth Muscle
Digestive: E092 Fetal Stomach; E085 Fetal Intestine Small; E084 Fetal Intestine Large; E109 Small Intestine; E106 Sigmoid Colon; E075 Colonic Mucosa; E101 Rectal Mucosa Donor 29; E102 Rectal Mucosa Donor 31; E110 Stomach Mucosa; E077 Duodenum Mucosa; E079 Esophagus; E094 Gastric
Other: E099 Placenta Amnion; E086 Fetal Kidney; E088 Fetal Lung; E097 Ovary; E087 Pancreatic Islets; E080 Fetal Adrenal Gland; E091 Placenta; E066 Liver; E098 Pancreas; E096 Lung; E113 Spleen
ENCODE: E114 A549 EtOH 0.02pct Lung Carcinoma Cell Line (4); E115 Dnd41 TCell Leukemia Cell Line (4); E116 GM12878 Lymphoblastoid Cell Line (4); E117 HeLa-S3 Cervical Carcinoma Cell Line (4); E118 HepG2 Hepatocellular Carcinoma Cell Line (4); E119 HMEC Mammary Epithelial Primary Cells (4); E120 HSMM Skeletal Muscle Myoblasts Cell Line (4); E121 HSMM cell derived Skeletal Muscle Myotubes Cell Line (4); E122 HUVEC Umbilical Vein Endothelial Cells Cell Line (5); E123 K562 Leukemia Cell Line (5); E124 Monocytes-CD14+ RO01746 Primary Cells (4); E125 NH-A Astrocyte Primary Cells (4); E126 NHDF-Ad Adult Dermal Fibroblast Primary Cells (4); E127 NHEK-Epidermal Keratinocyte Primary Cells (5); E128 NHLF Lung Fibroblast Primary Cells (4); E129 Osteoblast Primary Cells (4)

Supplementary Figure 2: Unique identifiers, cell type names, and tissue groups for 127 reference epigenomes. ChromHMM state definitions for the 15-state model trained on observed data and the 25-state model trained on imputed data.

[Figure: eight panels (CD, RA, T1D, AD, BIP, SCZ, CAD, T2D). x-axis: SNP rank by p-value, 0 to 30,000. y-axis: cumulative deviation. Labeled curves include T cell, B cell, Brain, Pancreatic islets, Colonic mucosa, and Small intestine. Tissue legend: IMR90, ESC, iPSC, ES-deriv, T-cell, B-cell, Mesench, Myosat, Epithelial, Neurosph, Thymus, Brain, Adipose, Muscle, Heart, Sm. Muscle, Digestive, Other, ENCODE2012.]

Supplementary Figure 3: Identification of relevant cell types and determination of an empirical p-value cutoff using enhancer annotations predicted by a 15-state chromatin model learned on observed data for 5 histone modification marks across 111 reference epigenomes. Each curve corresponds to enhancer regions predicted in a specific reference epigenome and is colored by tissue group. The black line at zero cumulative deviation indicates zero enrichment, and the red vertical line indicates the empirical p-value cutoff taken forward for the rest of the analysis. A priori relevant enrichments are denoted by opaque lines.

[Figure: heatmap. Rows: phenotype (CD, RA, T1D, AD, BIP, SCZ, CAD, T2D). Columns: reference epigenome, colored by tissue group. Color scale: z-score (ticks at 10 and 20). Tissue legend as in Supplementary Figure 3.]

Supplementary Figure 4: Enrichment of enhancers across 127 reference epigenomes in eight diseases. Only enhancer annotations significantly enriched (BH q < 0.05) in at least one phenotype are shown. In contrast to enhancer modules (Figure 2), enrichment methods for annotations learned on individual cell types count constitutive elements towards every annotation, confounding the enrichments.

[Figure: bar chart above a heatmap. Bar chart y-axis: count of master regulators, 0 to 30. Heatmap rows: tissue group (IMR90, ESC, iPSC, ES-deriv, T-cell, B-cell, Mesench, Myosat, Epithelial, Neurosph, Thymus, Brain, Adipose, Muscle, Heart, Sm. Muscle, Digestive, Other, ENCODE2012). Heatmap columns: enhancer module (legible labels: 51, 22, 13, 39, 38, 125, 226, 128, 157, 103, 117, 153, 178, 213, 193, 181, 159, 177, 151, 180, 161, 212, 218, 141). Color scale: cluster weight, 0.25 to 1.00.]

Supplementary Figure 5: Counts of putative master regulators identified across any of the eight diseases in 226 enhancer modules comprising patterns of observed histone modification across 111 reference epigenomes. Only counts for 32 enhancer modules in which a master regulator was discovered in any phenotype are shown. The leftmost four modules are defined as constitutive (having at least 50% of cluster weights greater than 0.25).

[Figure: two heatmaps sharing transcription factor rows. Transcription factors: MYEF2, CTCF, SRF, TEAD4, BACH1, ELF4, MEF2D, REL, SPI1, ETS1, ELK4, ETV6, BHLHE41, NFKB1, NFKB2, MEF2A, ETV7, ELF3, TEF, ELF5, JDP2, MEF2B, SPIB, SPIC, CEBPG, ATF1, TEAD1, NFE2. Phenotype columns: AD, RA, CD, BIP, T1D, T2D, SCZ, CAD (color scale: log odds ratio, 0.25 to 1.25). Reference epigenome columns: E016, E003, E024, E007, E013, E012, E011, E004, E005, E006, E062, E037, E038, E047, E050, E055, E056, E059, E061, E057, E058, E028, E027, E054, E053, E112, E071, E070, E082, E100, E104, E095, E105, E065, E085, E084, E109, E106, E079, E094, E097, E087, E066, E098, E096, E113, E114, E116, E117, E118, E119, E120, E122, E123, E127, E128 (color scale: relative expression, 0.00 to 1.00). Tissue legend as in Supplementary Figure 3.]

Supplementary Figure 6: Normalized expression (such that the maximum equals one) across 57 reference epigenomes of putative master regulators predicted in constitutively marked enhancers in eight diseases.
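
The hold-out comparison described for Supplementary Figure 1 can be sketched in a few lines. This is a minimal illustration and not the authors' pipeline. It assumes the common square-root-of-sample-size weighting for combining z-scores (the caption says only "sample-size weighted"), and the function names, cohort sizes, and simulated data are all hypothetical.

```python
import numpy as np

def sample_size_weighted_z(z, n):
    """Combine per-cohort z-scores (shape: cohorts x SNPs) into one z-score per SNP.

    Assumed weighting: w_i = sqrt(n_i), Z = sum_i(w_i * z_i) / sqrt(sum_i(w_i^2)).
    """
    z = np.asarray(z, dtype=float)
    w = np.sqrt(np.asarray(n, dtype=float))
    return (w[:, None] * z).sum(axis=0) / np.sqrt((w ** 2).sum())

def holdout_correlations(z, n, top_n):
    """Pearson correlation of each cohort with the meta-analysis of the remaining cohorts.

    SNPs are restricted to the top_n ranked by the full meta-analysis; ranking by
    two-sided p-value is equivalent to ranking by decreasing |z|.
    """
    z = np.asarray(z, dtype=float)
    n = np.asarray(n, dtype=float)
    full = sample_size_weighted_z(z, n)
    top = np.argsort(-np.abs(full))[:top_n]
    correlations = []
    for i in range(z.shape[0]):
        keep = np.arange(z.shape[0]) != i          # hold out cohort i
        rest = sample_size_weighted_z(z[keep], n[keep])
        correlations.append(np.corrcoef(z[i, top], rest[top])[0, 1])
    return np.array(correlations)

# Hypothetical simulated input: 3 cohorts, 500 SNPs, shared true effects.
rng = np.random.default_rng(0)
n_sim = np.array([1000, 2000, 4000])
beta = rng.normal(0.0, 0.05, size=500)
z_sim = np.sqrt(n_sim)[:, None] * beta + rng.normal(size=(3, 500))

for top_n in (50, 100, 500):
    print(top_n, holdout_correlations(z_sim, n_sim, top_n))
```

Repeating the call for increasing `top_n` traces one curve per cohort, which is the quantity plotted against the number of top SNPs in the figure.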
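
The two simple rules stated in the captions of Supplementary Figures 5 and 6 can also be written out directly. The sketch below assumes that a module's cluster weights form one vector with an entry per reference epigenome, and that expression is normalized per transcription factor across epigenomes. Neither detail is spelled out in the captions, and the function names are hypothetical.

```python
import numpy as np

def is_constitutive(cluster_weights, weight_cutoff=0.25, min_fraction=0.5):
    """True if at least min_fraction of the cluster weights exceed weight_cutoff."""
    w = np.asarray(cluster_weights, dtype=float)
    return bool(np.mean(w > weight_cutoff) >= min_fraction)

def normalize_to_max(expression):
    """Scale each row (assumed: one transcription factor) so its maximum equals one.

    Rows whose maximum is zero are left as zeros to avoid division by zero.
    """
    x = np.asarray(expression, dtype=float)
    peak = x.max(axis=-1, keepdims=True)
    return np.divide(x, peak, out=np.zeros_like(x), where=peak > 0)

# Hypothetical inputs for illustration only.
print(is_constitutive([0.3, 0.3, 0.1, 0.1]))   # half of the weights exceed 0.25
print(normalize_to_max([[2.0, 4.0], [0.0, 0.0]]))
```

Keeping the cutoffs as keyword arguments makes it easy to check how sensitive the set of constitutive modules is to the 0.25 and 50% thresholds.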