ENCODE: Understanding the Genome Michael Snyder May 8, 2013

Conflicts: Personalis, Genapsys, Illumina
Slides from Ewan Birney, Marc Schaub, Alan Boyle

Encyclopedia of DNA Elements (ENCODE)
• NHGRI-funded consortium
• Goal: delineate all "functional" elements in the human genome
• Wide array of experimental assays
• Three phases: 1) Pilot 2) Scale Up 1.0 3) Scale Up 2.0
The ENCODE Project Consortium. An Integrated Encyclopedia of DNA Elements in the Human Genome. Nature 2012
Project website: http://encodeproject.org

The ENCODE Consortium Phase 2
Brad Bernstein (Eric Lander, Manolis Kellis, Tony Kouzarides)
Ewan Birney (Jim Kent, Mark Gerstein, Bill Noble, Peter Bickel, Ross Hardison, Zhiping Weng)
Greg Crawford (Ewan Birney, Jason Lieb, Terry Furey, Vishy Iyer)
Jim Kent (David Haussler, Kate Rosenbloom)
John Stamatoyannopoulos (Evan Eichler, George Stamatoyannopoulos, Job Dekker, Maynard Olson, Michael Dorschner, Patrick Navas, Phil Green)
Mike Snyder (Kevin Struhl, Mark Gerstein, Peggy Farnham, Sherman Weissman)
Rick Myers (Barbara Wold)
Scott Tenenbaum (Luiz Penalva)
Tim Hubbard (Alexandre Reymond, Alfonso Valencia, David Haussler, Ewan Birney, Jim Kent, Manolis Kellis, Mark Gerstein, Michael Brent, Roderic Guigo)
Tom Gingeras (Alexandre Reymond, David Spector, Greg Hannon, Michael Brent, Roderic Guigo, Stylianos Antonarakis, Yijun Ruan, Yoshihide Hayashizaki)
Zhiping Weng (Nathan Trinklein, Rick Myers)
Additional ENCODE Participants: Elliott Margulies, Eric Green, Job Dekker, Laura Elnitski, Len Pennacchio, Jochen Wittbrodt
... and many senior scientists, postdocs, students, technicians, computer scientists, statisticians and administrators in these groups
NHGRI: Elise Feingold, Mike Pazin, Peter Good

The ENCODE Consortium Phase 3
Brad Bernstein (Eric Lander, Manolis Kellis, Tony Kouzarides)
Ewan Birney (Jim Kent, Mark Gerstein, Bill Noble, Peter Bickel, Ross Hardison, Zhiping Weng)
Greg Crawford (Ewan Birney, Jason Lieb, Terry Furey, Vishy Iyer)
Jim Kent (David Haussler, Kate Rosenbloom)
Mike Cherry
John Stamatoyannopoulos (Evan Eichler, George Stamatoyannopoulos, Job Dekker, Maynard Olson, Michael Dorschner, Patrick Navas, Phil Green)
Mike Snyder (Kevin Struhl, Mark Gerstein, Peggy Farnham, Sherman Weissman)
Rick Myers (Barbara Wold)
Scott Tenenbaum (Luiz Penalva)
Tim Hubbard (Alexandre Reymond, Alfonso Valencia, David Haussler, Ewan Birney, Jim Kent, Manolis Kellis, Mark Gerstein, Michael Brent, Roderic Guigo)
Tom Gingeras (Alexandre Reymond, David Spector, Greg Hannon, Michael Brent, Roderic Guigo, Stylianos Antonarakis, Yijun Ruan, Yoshihide Hayashizaki)
Zhiping Weng (Nathan Trinklein, Rick Myers)
Brenton Graveley (John Rinn, Others)
... and many senior scientists, postdocs, students, technicians, computer scientists, statisticians and administrators in these groups
NHGRI: Elise Feingold, Mike Pazin, Peter Good

Experimental Assays
• ChIP-seq (180 TFs + histone marks; 1770 data sets)
• RNA-seq (418)
• DNase-seq (318)

RNA-Sequencing
Wang et al. 2009 Nat. Rev. Genet.
Functional data: ChIP-seq
• Immunoprecipitation of a transcription factor with an antibody
• Sequence and align
• ChIP-seq peak: 300-500 bp; motif: 8-12 bp
• ChIP-exo
• Histone marks

Functional data: DNase-seq
• DNaseI hypersensitivity
• Sequence and align to obtain a peak
[Diagram labels: DNaseI, transcription factor, region of open chromatin, histones]

Functional data: DNase footprints
• Sequence and align to obtain a footprint
[Diagram labels: DNaseI, transcription factor, region of open chromatin, histones]

[Figure, panels a-e: overlap of phenotype-associated SNPs with ENCODE features.
 a, b: fraction of SNPs that overlap a feature, and enrichment or depletion, for phenotype-associated SNPs, random sampling of matched SNPs, genotyped SNPs, Genomes and personal genomes; features include DNase I peaks, TF, TSS and CTCF.
 c: GWAS enrichment (log p-value); GO immune response genes above threshold.
 d: table of SNP-phenotype associations overlapping TF occupancy (summary columns below).
 e: genome browser view near position 40000000 with genes C9, PTGER4, TTC33, DAB2 and PRKAA1; GWAS Catalog SNPs for Crohn's disease, ulcerative colitis and multiple sclerosis; HUVEC GATA2, HUVEC cFos and HUVEC Input tracks.]

Examples of Signal Tracks (Ross Hardison, Belinda Giardine)

Panel d column labels include TF occupancy data sets (Gm12878Ebf, Gm12878Pol24h8, Gm12878Pol2, Helas3CebpbIggrab, K562Ctcf, Gm12878Pu1, Gm12878Mef2a, K562Pol2V0416101, Gm12878Pax5c20, Gm12878NfkbIggrab, Hepg2Ctcf, HuvecGata2Ucd, Gm12878Elf1sc631V0416101, Gm12878Egr1V0416101, Hepg2Mafkab50322Iggrab, HuvecCfosUcd, Gm12878Irf4, Gm12878Batf, Hepg2Fosl2, K562MaxV0416102, Hepg2Foxa2, Hepg2P300V0416101, Hepg2Foxa1c20, HuvecCtcf, Helas3Ctcf, Hepg2Jund) and DNase data sets (CACO2.DS8235, HUVEC.all, HepG2.all, Jurkat.DS12659, hTH1.all, hTH2.DS7842, CD34.DS16814).

Phenotype                                   SNP-Pheno associations   Overlap any TF occupancy
TOTAL                                       4860                     600
Height                                      204                      34
Systemic_lupus_erythematosus                62                       10
Crohn's_disease                             105                      20
Ulcerative_colitis                          85                       11
Multiple_sclerosis                          71                       15
Rheumatoid_arthritis                        57                       11
LDL_cholesterol                             45                       8
Bone_mineral_density                        65                       9
Coronary_heart_disease                      107                      17
Chronic_lymphocytic_leukemia                17                       8
Prostate_cancer                             56                       8
Triglycerides                               48                       10
Celiac_disease                              54                       11
Colorectal_cancer                           18                       5
Hematological_parameters                    85                       12
HIV-1_control                               55                       10
Protein_quantitative_trait_loci             48                       7
Alzheimer's_disease                         42                       5
HDL_cholesterol                             55                       8
Cholesterol                                 16                       6
Longevity                                   30                       5
Attention_deficit_hyperactivity_disorder    102                      9
Cognitive_performance                       111                      8
Type_2_diabetes                             97                       13
Conduct_disorder                            38                       5
Type_1_diabetes                             67                       7
Dialysis-related_mortality                  26                       6
Bipolar_disorder                            110                      6
Body_mass                                   98                       5
C-reactive_protein                          34                       7
Menarche_and_menopause                      62                       6
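
The "overlap any TF occupancy" counts above come down to asking whether each SNP position falls inside at least one occupancy region. A minimal sketch of that counting step follows. It is not the ENCODE pipeline. It assumes a single chromosome, peaks given as half-open (start, end) intervals, and toy coordinates; the function names `merge_intervals` and `count_overlapping_snps` are hypothetical.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def count_overlapping_snps(snp_positions, peaks):
    """Count SNP positions that fall inside at least one peak.

    Assumption: all coordinates are on the same chromosome and use the
    same half-open convention (start included, end excluded).
    """
    merged = merge_intervals(peaks)
    starts = [start for start, _ in merged]
    hits = 0
    for pos in snp_positions:
        # Index of the last merged interval whose start is <= pos.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < merged[i][1]:
            hits += 1
    return hits


# Toy data (made up for illustration): three peaks, six SNP positions.
peaks = [(100, 500), (450, 800), (2000, 2400)]
snps = [99, 100, 799, 800, 2100, 5000]

overlap = count_overlapping_snps(snps, peaks)
fraction = overlap / len(snps)
print(overlap, fraction)
```

Merging the peaks first means a SNP covered by several overlapping peaks is counted once, which matches an "overlap any" definition. Comparing this fraction against the same fraction for randomly sampled matched SNPs would give an enrichment estimate like the one the figure describes.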
