
ENCODE: Understanding the Genome

Michael Snyder

May 8, 2013

Conflicts: Personalis, Genapsys, Illumina

Slides from , Marc Schaub, Alan Boyle

Encyclopedia of DNA Elements (ENCODE)

• NHGRI-funded consortium
• Goal: delineate all “functional” elements in the human genome

• Wide array of experimental assays
• Three phases: 1) Pilot, 2) Scale-up 1.0, 3) Scale-up 2.0

The ENCODE Project Consortium. An Integrated Encyclopedia of DNA Elements in the Human Genome. 2012.
Project website: http://encodeproject.org

The ENCODE Consortium Phase 2

Brad Bernstein (, Manolis Kellis, Tony Kouzarides)
Ewan Birney (Jim Kent, Mark Gerstein, Bill Noble, Peter Bickel, Ross Hardison, )
Greg Crawford (Ewan Birney, Jason Lieb, Terry Furey, Vishy Iyer)
Jim Kent (, Kate Rosenbloom)
John Stamatoyannopoulos (Evan Eichler, George Stamatoyannopoulos, Job Dekker, Maynard Olson, Michael Dorschner, Patrick Navas, Phil Green)
Mike Snyder (Kevin Struhl, Mark Gerstein, Peggy Farnham, Sherman Weissman)
Rick Myers (Barbara Wold)
Scott Tenenbaum (Luiz Penalva)
Tim Hubbard (Alexandre Reymond, Alfonso Valencia, David Haussler, Ewan Birney, Jim Kent, Manolis Kellis, Mark Gerstein, Michael Brent, Roderic Guigo)
Tom Gingeras (Alexandre Reymond, David Spector, Greg Hannon, Michael Brent, Roderic Guigo, Stylianos Antonarakis, Yijun Ruan, Yoshihide Hayashizaki)
Zhiping Weng (Nathan Trinklein, Rick Myers)

Additional ENCODE participants: Elliott Margulies, Eric Green, Job Dekker, Laura Elnitski, Len Pennacchio, Jochen Wittbrodt

... and many senior scientists, postdocs, students, technicians, computer scientists, statisticians and administrators in these groups

NHGRI: Elise Feingold, Mike Pazin, Peter Good

The ENCODE Consortium Phase 3

Brad Bernstein (Eric Lander, Manolis Kellis, Tony Kouzarides)
Ewan Birney (Jim Kent, Mark Gerstein, Bill Noble, Peter Bickel, Ross Hardison, Zhiping Weng)
Greg Crawford (Ewan Birney, Jason Lieb, Terry Furey, Vishy Iyer)
Jim Kent (David Haussler, Kate Rosenbloom)
Mike Cherry
John Stamatoyannopoulos (Evan Eichler, George Stamatoyannopoulos, Job Dekker, Maynard Olson, Michael Dorschner, Patrick Navas, Phil Green)
Mike Snyder (Kevin Struhl, Mark Gerstein, Peggy Farnham, Sherman Weissman)
Rick Myers (Barbara Wold)
Scott Tenenbaum (Luiz Penalva)
Tim Hubbard (Alexandre Reymond, Alfonso Valencia, David Haussler, Ewan Birney, Jim Kent, Manolis Kellis, Mark Gerstein, Michael Brent, Roderic Guigo)
Tom Gingeras (Alexandre Reymond, David Spector, Greg Hannon, Michael Brent, Roderic Guigo, Stylianos Antonarakis, Yijun Ruan, Yoshihide Hayashizaki)
Zhiping Weng (Nathan Trinklein, Rick Myers)
Brenton Graveley (John Rinn, others)

... and many senior scientists, postdocs, students, technicians, computer scientists, statisticians and administrators in these groups

NHGRI: Elise Feingold, Mike Pazin, Peter Good

Experimental Assays
• ChIP-seq (180 TFs + histone marks; 1,770 data sets)
• RNA-seq (418)
• DNase-seq (318)

RNA-Sequencing

Wang et al. 2009, Nat. Rev. Genet.

Functional data: ChIP-seq

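[Figure: ChIP-seq schematic. An antibody immunoprecipitates a transcription factor bound to DNA; the recovered DNA is sequenced and aligned, giving a ChIP-seq peak of 300-500 bp that contains the factor's motif (8-12 bp).]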
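The slide does not say how reads become peaks (ENCODE's uniform pipeline, later in these slides, names SPP and PeakSeq). Purely as an illustration, the sketch below flags fixed-size windows whose ChIP-seq read count is improbably high under a Poisson background estimated from an input control. This is a simplified window/Poisson idea, not SPP or PeakSeq; the function names, the window representation, the background floor `min_lambda` and the `alpha` threshold are hypothetical choices.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)  # clamp tiny negative round-off


def call_windows(chip_counts, input_counts, scale=1.0, alpha=1e-5, min_lambda=1.0):
    """Hypothetical toy peak caller over fixed-size genomic windows.

    chip_counts / input_counts: reads per window for the ChIP and the input control.
    scale: ChIP-to-input sequencing-depth ratio used to rescale the control.
    Returns (window_index, chip_count, p_value) for windows with p < alpha.
    """
    calls = []
    for i, (chip, ctrl) in enumerate(zip(chip_counts, input_counts)):
        lam = max(ctrl * scale, min_lambda)  # floor avoids lambda = 0 for empty input windows
        p = poisson_sf(chip, lam)
        if p < alpha:
            calls.append((i, chip, p))
    return calls


if __name__ == "__main__":
    chip = [5, 4, 60, 6, 3]
    control = [5, 5, 5, 5, 5]
    for window, count, p in call_windows(chip, control):
        print(f"window {window}: {count} reads, p = {p:.2e}")
```

Real callers also model fragment shift, local background variation and multiple testing; the point here is only that a peak is a window with far more reads than the control predicts.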

ChIP-exo; histone marks

Functional data: DNase-seq

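[Figure: DNase-seq schematic. DNaseI cuts DNA in regions of open chromatin (DNaseI hypersensitivity), for example around a bound transcription factor between histones; the fragments are sequenced and aligned to give a peak.]

Functional data: DNase footprints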

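[Figure: DNase footprint schematic. Within a region of open chromatin, the DNA covered by a bound transcription factor is protected from DNaseI; after sequencing and alignment the protected site appears as a footprint.]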
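The slide shows the idea of a footprint but not a scoring rule. One simple way to turn per-base DNaseI cleavage counts into candidate footprints, sketched below under assumed parameters, is to compare the mean cut count in a short core window with the mean in its two flanks: a bound factor should leave a low core-to-flank ratio. The function name, the window sizes and the pseudocount are hypothetical; published footprinting methods use more elaborate statistics.

```python
def footprint_ratios(cuts, core=10, flank=10, pseudo=1.0):
    """Score every candidate footprint position in a list of per-base DNaseI cut counts.

    For a core window cuts[s:s+core] with flanks of width `flank` on each side,
    the score is (mean core + pseudo) / (mean flanks + pseudo). Low values mean
    the core is protected relative to its accessible surroundings.
    Returns a list of (core_start, ratio).
    """
    scores = []
    for s in range(flank, len(cuts) - core - flank + 1):
        core_mean = sum(cuts[s:s + core]) / core
        left_mean = sum(cuts[s - flank:s]) / flank
        right_mean = sum(cuts[s + core:s + core + flank]) / flank
        flank_mean = (left_mean + right_mean) / 2
        scores.append((s, (core_mean + pseudo) / (flank_mean + pseudo)))
    return scores


if __name__ == "__main__":
    # Toy profile: accessible DNA with a 10-bp protected stretch starting at position 15.
    profile = [10] * 15 + [0] * 10 + [10] * 15
    best_start, best_ratio = min(footprint_ratios(profile), key=lambda t: t[1])
    print(best_start, round(best_ratio, 3))  # 15 0.091
```

The pseudocount keeps the ratio finite when a flank has no cuts; in practice sequencing depth and the sequence bias of DNaseI cleavage would also need to be accounted for.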

[Figure: phenotype-associated SNPs and ENCODE functional data. (a) Overlap of phenotype-associated SNPs with a feature, compared with random sampling of matched SNPs, showing enrichment or depletion. (b) Fraction of SNPs that overlap DNaseI peaks, TF binding, TSSs and CTCF sites. (c) GWAS enrichment (log p-value) and GO immune-response genes above a threshold. (d) Examples of signal tracks: genome browser view (human, GRCh37/hg19) of the region containing C9, DAB2, PTGER4, TTC33 and PRKAA1, with GWAS Catalog SNPs for Crohn's disease, ulcerative colitis and multiple sclerosis alongside TF ChIP-seq and DNaseI tracks (HUVEC, Jurkat, Th cells) (Ross Hardison, Belinda Giardine). (e) For each phenotype, the number of SNP-phenotype associations and the number that overlap any TF occupancy; further columns list individual datasets (e.g. Gm12878Ebf, K562Ctcf, Hepg2Foxa2, HUVEC.all, CD34.DS16814).]

Phenotype                                 Associations   Overlap any TF occupancy
TOTAL                                     4860           600
Height                                    204            34
Systemic lupus erythematosus              62             10
Crohn's disease                           105            20
Ulcerative colitis                        85             11
Multiple sclerosis                        71             15
Rheumatoid arthritis                      57             11
LDL cholesterol                           45             8
Bone mineral density                      65             9
Coronary heart disease                    107            17
Chronic lymphocytic leukemia              17             8
Prostate cancer                           56             8
Triglycerides                             48             10
Celiac disease                            54             11
Colorectal cancer                         18             5
Hematological parameters                  85             12
HIV-1 control                             55             10
Protein quantitative trait loci           48             7
Alzheimer's disease                       42             5
HDL cholesterol                           55             8
Cholesterol                               16             6
Longevity                                 30             5
Attention deficit hyperactivity disorder  102            9
Cognitive performance                     111            8
Type 2 diabetes                           97             13
Conduct disorder                          38             5
Type 1 diabetes                           67             7
Dialysis-related mortality                26             6
Bipolar disorder                          110            6
Body mass                                 98             5
C-reactive protein                        34             7
Menarche and menopause                    62             6
Breast cancer                             43             6
Mean platelet volume                      15             5
Soluble levels of adhesion molecules      5              5
Psoriasis                                 38             6
Parkinson's disease                       46             5
Obesity                                   36             5
Fasting glucose-related traits            17             5
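One analysis summarized here compares phenotype-associated SNPs with random samples of matched SNPs, asking whether disease SNPs fall in functional elements more often than expected. The sketch below illustrates that kind of comparison under simple assumptions: each phenotype SNP comes with a hypothetical pool of control SNPs matched to it (how they are matched, e.g. by allele frequency, is left open), features are half-open intervals, and the empirical p-value counts how often a random matched draw overlaps features at least as often as the real SNPs. All names and parameters are illustrative; this is not the analysis code behind the figure.

```python
import random


def in_features(snp, features):
    """True if a (chrom, pos) SNP lies in any half-open (chrom, start, end) feature."""
    chrom, pos = snp
    return any(c == chrom and start <= pos < end for c, start, end in features)


def overlap_fraction(snps, features):
    return sum(in_features(s, features) for s in snps) / len(snps)


def matched_enrichment(pheno_snps, matched_pools, features, n_iter=1000, seed=0):
    """Observed overlap fraction, mean null fraction and a one-sided empirical p-value.

    matched_pools[i] is a (hypothetical) list of control SNPs matched to pheno_snps[i];
    each iteration draws one control per phenotype SNP.
    """
    rng = random.Random(seed)
    observed = overlap_fraction(pheno_snps, features)
    null = [
        overlap_fraction([rng.choice(pool) for pool in matched_pools], features)
        for _ in range(n_iter)
    ]
    mean_null = sum(null) / n_iter
    # +1 in numerator and denominator keeps the estimate away from exactly zero.
    p_value = (1 + sum(x >= observed for x in null)) / (n_iter + 1)
    return observed, mean_null, p_value


if __name__ == "__main__":
    features = [("chr1", 0, 100)]
    pheno = [("chr1", 10), ("chr1", 20), ("chr1", 30)]
    # Each toy pool has one SNP inside the feature and three outside it.
    pools = [[("chr1", 15), ("chr1", 500), ("chr1", 600), ("chr1", 700)]] * 3
    print(matched_enrichment(pheno, pools, features))
```

Matching controls to each test SNP, rather than sampling genome-wide, is what keeps differences in allele frequency or genomic context from masquerading as functional enrichment.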

ENCODE Dimensions
• Cells: 200 cell lines/tissues
• Experiments: 2,886
• Sequence: ~10 terabases (~3000x the size of the human genome)
• Methods/factors: 200 assays (~165 different TFs)

Mouse Datasets Produced and Released

Assay                   Tissue/cell types   Experiments   Data sets
Histone modifications   33                  157           310
Transcription factor    29                  109           299
RNA-seq                 69                  104           193
DNase-seq               55                  55            127
Replication timing      18                  18            33
ChIP controls           34                  36            108
Total                   123                 479           1070

Data download from http://encodeproject.org

ENCODE Uniform Analysis Pipeline
Anshul Kundaje, Qunhua Li, Michael Hoffman, Jason Ernst, Joel Rozowsky, Pouya Kheradpour

• Mapped reads from production (BAM)
• Uniform peak calling pipeline (SPP, PeakSeq)
• Signal generation (read extension and mappability correction)
• IDR processing of replicates (Rep1 vs. Rep2; good vs. poor reproducibility), QC and blacklist filtering
• Segmentation (ChromHMM/Segway)
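IDR (irreproducible discovery rate) fits a statistical model to the ranks of peaks in two replicates; that model is not reproduced here. The simpler check sketched below is a swapped-in proxy, not IDR, but it captures the intuition behind "good" versus "poor" reproducibility: take each replicate's top-k peaks by score and ask what fraction of replicate 1's top peaks overlap replicate 2's. Strong peaks should agree, and the fraction typically drops as k reaches into weak, noisy peaks. The peak tuple layout and function names are assumptions for illustration.

```python
def overlaps(a, b):
    """True if two half-open peaks (chrom, start, end, ...) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def top_k_consistency(rep1, rep2, k):
    """Fraction of rep1's k highest-scoring peaks that overlap one of rep2's k highest.

    Each replicate is a list of (chrom, start, end, score) tuples.
    """
    top1 = sorted(rep1, key=lambda p: p[3], reverse=True)[:k]
    top2 = sorted(rep2, key=lambda p: p[3], reverse=True)[:k]
    if not top1:
        return 0.0
    hits = sum(any(overlaps(p, q) for q in top2) for p in top1)
    return hits / len(top1)


if __name__ == "__main__":
    rep1 = [("chr1", 100, 400, 50.0), ("chr1", 1000, 1300, 40.0), ("chr2", 500, 800, 5.0)]
    rep2 = [("chr1", 120, 420, 45.0), ("chr1", 1010, 1290, 30.0), ("chr3", 10, 300, 6.0)]
    for k in (1, 2, 3):
        print(k, round(top_k_consistency(rep1, rep2, k), 3))
```

The pairwise overlap test is quadratic, which is fine for a sketch; genome-scale peak sets would normally be compared with sorted sweeps or interval trees.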

Downstream: self-organising maps, motif discovery, stats, GSC enrichments, etc., and signal aggregation over peaks

Raw genome coverage of elements
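Because elements overlap, the cumulative column in the coverage table on this slide is a union rather than a running sum: exons (3%) plus ChIP-seq bound motifs (4.5%) together reach only 5%. A minimal sketch of that union computation, assuming elements are stored as half-open (chrom, start, end) intervals, merges overlapping intervals before counting bases. The function names and the toy genome size are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(iv) for iv in merged]


def covered_bases(intervals):
    return sum(end - start for _, start, end in merge_intervals(intervals))


def cumulative_coverage(element_sets, genome_size):
    """element_sets: ordered list of (name, intervals).

    Returns (name, own_fraction, cumulative_fraction) rows, where the cumulative
    fraction is the union of this element type and every type listed before it.
    """
    rows, union = [], []
    for name, intervals in element_sets:
        union.extend(intervals)
        rows.append((name,
                     covered_bases(intervals) / genome_size,
                     covered_bases(union) / genome_size))
    return rows


if __name__ == "__main__":
    genome_size = 1_000  # toy genome size, for illustration only
    elements = [
        ("Exons", [("chr1", 0, 40)]),
        ("Bound motifs", [("chr1", 30, 80), ("chr2", 0, 20)]),
    ]
    for name, own, cumulative in cumulative_coverage(elements, genome_size):
        print(f"{name}: {own:.1%} own, {cumulative:.1%} cumulative")
```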

Element type                 Coverage   Cumulative coverage
Exons                        3%         3%
ChIP-seq bound motifs        4.5%       5%
DNaseI footprints            5.7%       9%
ChIP-seq bound regions       8.1%       12%
DNaseI HS regions            15.2%      19.4%
Histone modifications (*)    44%        49%
RNA                          62%        80%

(*) Excluding broad marks. Bound motifs and footprints: union over all experiments and cell types.

Saturation (Steve Wilder)

[Figure: number of elements discovered (0 to 1,200,000) versus number of cell lines (0 to 60). The most aggressive fit for saturation suggests that at most 50% of elements have been discovered; the true fraction is likely lower because of inaccessible cell types, etc.]
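The slide does not give the model behind the "most aggressive fit". As an assumed example only, a saturating curve N(x) = N_max · x / (K + x) (the Michaelis-Menten form) can be fitted to the number of elements N found after x cell lines; N_max is then the extrapolated total and x / (K + x) the fraction discovered so far. The sketch fits it by ordinary least squares on the linearized form 1/N = (K/N_max)(1/x) + 1/N_max, which needs no dependencies but is sensitive to noise at small x; a nonlinear fit such as `scipy.optimize.curve_fit` would usually be preferable. The function names and data are hypothetical and synthetic.

```python
def fit_saturation(xs, ys):
    """Fit y = n_max * x / (k + x) by linear least squares on 1/y versus 1/x.

    Returns (n_max, k). All xs and ys must be positive.
    """
    u = [1.0 / x for x in xs]
    v = [1.0 / y for y in ys]
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    slope = (sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
             / sum((a - mean_u) ** 2 for a in u))
    intercept = mean_v - slope * mean_u  # estimates 1 / n_max
    n_max = 1.0 / intercept
    k = slope * n_max  # slope estimates k / n_max
    return n_max, k


def fraction_discovered(x, k):
    """Fraction of the asymptote reached after x samples under the fitted model."""
    return x / (k + x)


if __name__ == "__main__":
    # Synthetic, noise-free counts generated from n_max = 2,000,000 and k = 30.
    cell_lines = [1, 2, 5, 10, 20, 40]
    counts = [2_000_000 * x / (30 + x) for x in cell_lines]
    n_max, k = fit_saturation(cell_lines, counts)
    print(round(n_max), round(k, 3), round(fraction_discovered(40, k), 3))
```

How "aggressive" an extrapolation is depends on the model family: curves that approach their asymptote quickly predict a smaller undiscovered remainder than slower-saturating ones.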

ENCODE Integrative Segmentations
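ChromHMM and Segway, named in the analysis pipeline, learn genome segmentations with probabilistic models (ChromHMM uses a hidden Markov model; Segway uses a dynamic Bayesian network). The toy below shows only the decoding step of an HMM segmentation: given per-bin observations and already-chosen parameters, the Viterbi algorithm finds the most likely state for every bin. The two states, the single binarized mark per bin and every probability are hypothetical; real segmentations learn their parameters from many marks and use many more states.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete-emission HMM, computed in log space."""
    # best[t][s]: log-probability of the best path that ends in state s at bin t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: best[t - 1][r] + math.log(trans_p[r][s]))
            best[t][s] = (best[t - 1][prev] + math.log(trans_p[prev][s])
                          + math.log(emit_p[s][obs[t]]))
            back[t][s] = prev
    # Trace back from the highest-scoring final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical 2-state model; obs = 1 if a mark is called in a bin, else 0.
    states = ("background", "enhancer")
    start = {"background": 0.9, "enhancer": 0.1}
    trans = {"background": {"background": 0.9, "enhancer": 0.1},
             "enhancer": {"background": 0.2, "enhancer": 0.8}}
    emit = {"background": {0: 0.9, 1: 0.1},
            "enhancer": {0: 0.2, 1: 0.8}}
    bins = [0, 0, 1, 1, 1, 0, 0]
    print(viterbi(bins, states, start, trans, emit))
```

Working in log space avoids numerical underflow when the product of probabilities spans millions of bins.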

~7 major genome segments, 25 “elaborations”, 1,000s of details
• Well known: TSS, gene start, gene bodies
• New info: “enhancers” (2 states), insulators
• Unexpected: specific gene end

Experimental Confirmation of New Enhancers
Jason Gertz, Barbara Wold, Rick Myers, Len Pennacchio

[Figure: median reporter expression in K562 for tested elements, grouped by prediction category.]
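The enhancer validation reports Mann-Whitney p-values for reporter expression. For reference, a minimal version of that rank test is sketched below: average ranks for ties, the U statistic for the first group, and a two-sided p-value from the normal approximation without tie or continuity correction. It is meant to show how the statistic works; for real analyses `scipy.stats.mannwhitneyu` is the usual choice, and the expression values in the example are invented.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """U statistic for sample x and a two-sided normal-approximation p-value.

    No tie correction or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


if __name__ == "__main__":
    # Invented reporter values, for illustration only.
    predicted = [1.8, 2.1, 1.2, 2.5, 0.9, 1.6]
    background = [0.2, -0.4, 0.5, 0.1, -0.8, 0.3]
    print(mann_whitney(predicted, background))
```

A rank test suits reporter data because it compares whole distributions without assuming normality, so a few extreme constructs cannot dominate the result.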

Categories: Back, Discr, HMM, Naive, Unk
Mouse assay: 53% hit rate
Mann-Whitney: p = 0.003, HMM vs. background; p = 1e-7, HMM vs. naïve or biologist picks
Pennacchio Lab; Myers Lab

Many other stories…

[Figure: TF co-association matrix. Factors: TAF1, YY1, TBP, HEY1, E2F4, E2F6, ELF1, MAX, POLR2A, HMGN3, ZBTB7A, CCNT2, EGR1, ETS1, SIN3A, HDAC2, GABPA, MXI1, MYC, CHD2, IRF1, GTF2F1, THAP1, SP2, REST, NRF1, USF1, FOS, SP1, SRF, SPI1, SIX5, CTCF, RAD21, SMC3, CTCFL, ZNF263, BCLAF1, TAF7, RDBP, ZBTB33, BCL3, ATF3, USF2, NFE2, SETDB1, TRIM28, ZNF274, NR2C2, GATA1, GATA2, TAL1, EP300, SMARCA4, SMARCB1, SIRT6, JUNB, JUND, JUN, FOSL1, MAFK, CEBPB, HDAC8, STAT1, STAT2, BDP1, POLR3A, BRF1, GTF2B, BRF2.]

• RNA landscape: Tom Gingeras
• TF co-association and regulatory code: Mike Snyder + Mark Gerstein
• Splicing/histone interaction: Roderic Guigo

• DNaseI footprints: John Stamatoyannopoulos
• DNA methylation: Rick Myers

What’s Next: Enhancing ENCODE

1) Deep analysis of six cell lines/tissues: K562 and MCF7; diploid: GM12878 and H1 ES cells; tissues: liver and brain
2) More limited coverage of hundreds of other cell types and tissues (RNA-seq, DNase HS, etc.)
3) Some mouse data

Many investigators are the same as in ENCODE 2, plus Brenton Graveley and a few others.

What’s Next: Species Comparisons

1) modENCODE/ENCODE comparison (worms, flies, human): hundreds of worm and fly datasets (e.g. >250 C. elegans TF ChIP-seq datasets)

2) Mouse ENCODE vs. human ENCODE comparison