ENCODE: Understanding the Genome

Michael Snyder
November 6, 2012
Conflicts: Personalis, Genapsys, Illumina
Slides from Ewan Birney, Marc Schaub, Alan Boyle

Encyclopedia of DNA Elements (ENCODE)
• NHGRI-funded consortium
• Goal: delineate all functional elements in the human genome
• Wide array of experimental assays
• Three phases: 1) Pilot 2) Scale Up 1.0 3) Scale Up 2.0
The ENCODE Project Consortium. An Integrated Encyclopedia of DNA Elements in the Human Genome. Nature 2012
Project website: http://encodeproject.org

The ENCODE Consortium
Brad Bernstein (Eric Lander, Manolis Kellis, Tony Kouzarides)
Ewan Birney (Jim Kent, Mark Gerstein, Bill Noble, Peter Bickel, Ross Hardison, Zhiping Weng)
Greg Crawford (Ewan Birney, Jason Lieb, Terry Furey, Vishy Iyer)
Jim Kent (David Haussler, Kate Rosenbloom)
John Stamatoyannopoulos (Evan Eichler, George Stamatoyannopoulos, Job Dekker, Maynard Olson, Michael Dorschner, Patrick Navas, Phil Green)
Mike Snyder (Kevin Struhl, Mark Gerstein, Peggy Farnham, Sherman Weissman)
Rick Myers (Barbara Wold)
Scott Tenenbaum (Luiz Penalva)
Tim Hubbard (Alexandre Reymond, Alfonso Valencia, David Haussler, Ewan Birney, Jim Kent, Manolis Kellis, Mark Gerstein, Michael Brent, Roderic Guigo)
Tom Gingeras (Alexandre Reymond, David Spector, Greg Hannon, Michael Brent, Roderic Guigo, Stylianos Antonarakis, Yijun Ruan, Yoshihide Hayashizaki)
Zhiping Weng (Nathan Trinklein, Rick Myers)
Additional ENCODE Participants: Elliott Margulies, Eric Green, Job Dekker, Laura Elnitski, Len Pennacchio, Jochen Wittbrodt
... and many senior scientists, postdocs, students, technicians, computer scientists, statisticians and administrators in these groups
NHGRI: Elise Feingold, Mike Pazin, Peter Good

Experimental Assays
• ChIP-seq (165 TFs + histone marks)
• RNA-seq (292)
• DNase-seq (~200)

RNA-Sequencing
Wang et al. 2009 Nat. Rev. Genet.
Functional data: ChIP-seq
• Immunoprecipitation with an antibody against the transcription factor
• Sequence and align
• ChIP-seq peak: 300-500 bp
• Motif: 8-12 bp
• Also shown: ChIP-exo, histone marks

Functional data: DNase-seq
• DNaseI hypersensitivity
• Sequence and align: peak
• Diagram labels: DNaseI, transcription factor, region of open chromatin, histones

Functional data: DNase footprints
• Sequence and align: footprint
• Diagram labels: DNaseI, transcription factor, region of open chromatin, histones

[Figure, panel b: GM12878. Legend: phenotype-associated SNPs, random sampling of matched SNPs. SNP sets: genotyped SNPs, 1000 Genomes, 24 personal genomes. Categories on the x-axis include DNaseI peaks and TF; the remaining category labels and the axis values are not recoverable.]

[Figure, panel c: GWAS enrichment (-log10 p-value); GO:0006955 immune response genes above threshold.]

[Figure, panels d-e: Examples of Signal Tracks. Human Feb. 2009 (GRCh37/hg19) chr5:39,274,501-40,819,500 (1,545,000 bp), with names Ross Hardison and Belinda Giardine. Gene labels: PTGER4, C9, TTC33, DAB2, OSRF, PRKAA1, BC026261. Tracks include GWAS Catalog and Phenotype. A table gives per-phenotype counts (column headers not recoverable) for: TOTAL, Height, Systemic lupus erythematosus, Crohn's disease, Ulcerative colitis, Multiple sclerosis, Rheumatoid arthritis, LDL cholesterol, Bone mineral density, Coronary heart disease, Chronic lymphocytic leukemia.]

[Figure, zoomed view: chr5:40,390,001-40,440,000 (50,000 bp). Crohn's disease: rs4613763, rs17234657, rs11742570, rs6896969, rs1373692, rs9292777. Ulcerative colitis: rs1992660.]
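
The enrichment panels above compare phenotype-associated SNPs with a random sampling of matched SNPs inside DNaseI peaks. The comparison can be sketched as below. This is only a minimal illustration, not the consortium's pipeline: the function names, the coordinates, and both SNP lists are hypothetical, and peaks are assumed to be non-overlapping half-open intervals.

```python
from bisect import bisect_right

def build_index(peaks):
    """Group (chrom, start, end) peaks by chromosome and sort them by start.

    Assumes half-open [start, end) intervals that do not overlap.
    """
    index = {}
    for chrom, start, end in peaks:
        index.setdefault(chrom, []).append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index

def in_peak(index, chrom, pos):
    """Return True if position pos on chrom falls inside any peak."""
    intervals = index.get(chrom, [])
    # Last interval whose start is <= pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]

def fraction_in_peaks(index, snps):
    """Fraction of (chrom, pos) SNPs that fall inside a peak."""
    if not snps:
        return 0.0
    hits = sum(in_peak(index, chrom, pos) for chrom, pos in snps)
    return hits / len(snps)

# Hypothetical toy data, made up for illustration only.
peaks = [("chr5", 100, 400), ("chr5", 1000, 1300), ("chr1", 50, 350)]
associated_snps = [("chr5", 150), ("chr5", 1100), ("chr5", 700), ("chr1", 60)]
matched_snps = [("chr5", 450), ("chr5", 1250), ("chr1", 900), ("chr1", 400)]

index = build_index(peaks)
frac_associated = fraction_in_peaks(index, associated_snps)
frac_matched = fraction_in_peaks(index, matched_snps)
print(frac_associated, frac_matched)
```

A real analysis would repeat the matched sampling many times and derive a p-value from the resulting distribution; the sketch shows only the overlap step.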
