
ENCODE: Understanding the

Michael Snyder

November 6, 2012

Conflicts: Personalis, Genapsys, Illumina
Slides From , Marc Schaub, Alan Boyle

Encyclopedia of DNA Elements (ENCODE)

• NHGRI-funded consortium
• Goal: delineate all functional elements in the human genome
• Wide array of experimental assays
• Three phases: 1) Pilot 2) Scale Up 1.0 3) Scale Up 2.0

The ENCODE Project Consortium. An Integrated Encyclopedia of DNA Elements in the Human Genome. 2012
Project website: http://encodeproject.org

The ENCODE Consortium

• Brad Bernstein (Eric Lander, Manolis Kellis, Tony Kouzarides)
• Ewan Birney (Jim Kent, Mark Gerstein, Bill Noble, Peter Bickel, Ross Hardison, Zhiping Weng)
• Greg Crawford (Ewan Birney, Jason Lieb, Terry Furey, Vishy Iyer)
• Jim Kent (David Haussler, Kate Rosenbloom)
• John Stamatoyannopoulos (Evan Eichler, George Stamatoyannopoulos, Job Dekker, Maynard Olson, Michael Dorschner, Patrick Navas, Phil Green)
• Mike Snyder (Kevin Struhl, Mark Gerstein, Peggy Farnham, Sherman Weissman)
• Rick Myers (Barbara Wold)
• Scott Tenenbaum (Luiz Penalva)
• Tim Hubbard (Alexandre Reymond, Alfonso Valencia, David Haussler, Ewan Birney, Jim Kent, Manolis Kellis, Mark Gerstein, Michael Brent, Roderic Guigo)
• Tom Gingeras (Alexandre Reymond, David Spector, Greg Hannon, Michael Brent, Roderic Guigo, Stylianos Antonarakis, Yijun Ruan, Yoshihide Hayashizaki)
• Zhiping Weng (Nathan Trinklein, Rick Myers)

Additional ENCODE Participants: Elliott Margulies, Eric Green, Job Dekker, Laura Elnitski, Len Pennacchio, Jochen Wittbrodt, and many senior scientists, postdocs, students, technicians, computer scientists, statisticians and administrators in these groups
NHGRI: Elise Feingold, Mike Pazin, Peter Good

Experimental Assays
• ChIP-seq (165 TFs + histone marks)
• RNA-seq (292)
• DNase-seq (~200)

RNA-Sequencing

Wang et al. 2009, Nat. Rev. Genet.

Functional data: ChIP-seq

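[Figure: ChIP-seq. An antibody immunoprecipitates the transcription factor together with its bound DNA; the DNA is sequenced and aligned, giving a ChIP-seq peak of 300-500 bp around a motif of 8-12 bp. Also labeled: ChIP-exo, histone marks.]

Functional data: DNase-seq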

[Figure: DNase-seq. DNaseI cuts in a region of open chromatin between histones, where a transcription factor is bound; the fragments are sequenced and aligned, giving a DNaseI hypersensitivity peak.]

Functional data: DNase footprints

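[Figure: DNase footprints. Within a region of open chromatin between histones, the bound transcription factor protects its site from DNaseI; after sequencing and alignment this shows up as a DNaseI footprint.]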

[Figure: GWAS SNPs and ENCODE elements in GM12878. Legend: phenotype-associated SNPs vs. random sampling of matched SNPs, for genotyped SNPs, 1000 Genomes, and 24 personal genomes. Categories include DNaseI peaks and TF. Further axis labels: GWAS enrichment (-log10 p-value); GO:0006955 immune response genes above threshold.]

Examples of Signal Tracks
Ross Hardison, Belinda Giardine

[Figure: genome browser views, Human Feb. 2009 (GRCh37/hg19).
Panel d: chr5:39,274,501-40,819,500 (1,545,000 bp), with genes PTGER4, C9, TTC33, DAB2, OSRF, PRKAA1 and BC026261, and a GWAS Catalog track.
Panel e: chr5:40,390,001-40,440,000 (50,000 bp), with tracks labeled TFs (HUVEC GATA2, HUVEC cFOS, HUVEC Input) and DNase I (HUVEC, Jurkat, Th1, Th2), and GWAS Catalog SNPs for Crohn's disease (rs4613763, rs17234657, rs11742570, rs6896969, rs1373692, rs9292777), ulcerative colitis (rs1992660) and multiple sclerosis (rs6451493).]

[Table: GWAS Catalog phenotypes against ENCODE experiments. Each row is a phenotype followed by its counts; the two leading counts per row are listed here.]
TOTAL 4860 600
Height 204 34
Systemic_lupus_erythematosus 62 10
Crohn's_disease 105 20
Ulcerative_colitis 85 11
Multiple_sclerosis 71 15
Rheumatoid_arthritis 57 11
LDL_cholesterol 45 8
Bone_mineral_density 65 9
Coronary_heart_disease 107 17
Chronic_lymphocytic_leukemia 17 8
Prostate_cancer 56 8
Triglycerides 48 10
Celiac_disease 54 11
Colorectal_cancer 18 5
Hematological_parameters 85 12
HIV-1_control 55 10
Protein_quantitative_trait_loci 48 7
Alzheimer's_disease 42 5
HDL_cholesterol 55 8
Cholesterol 16 6
Longevity 30 5
Attention_deficit_hyperactivity_disorder 102 9
Cognitive_performance 111 8
Type_2_diabetes 97 13
Conduct_disorder 38 5
Type_1_diabetes 67 7
Dialysis-related_mortality 26 6
Bipolar_disorder 110 6
Body_mass 98 5
C-reactive_protein 34 7
Menarche_and_menopause 62 6
Breast_cancer 43 6
Mean_platelet_volume 15 5
Soluble_levels_of_adhesion_molecules 5 5
Psoriasis 38 6
Parkinson's_disease 46 5
Obesity 36 5
Fasting_glucose-related_traits 17 5

ENCODE Dimensions

• 3,010 experiments
• ~10 terabases of sequence, ~3000x the human genome
• 200 cell lines/tissues
• Methods/factors: 200 assays (~165 ChIP-seq of different TFs)
• Mouse: 126 TF ChIP, 70 RNA-seq

ENCODE Uniform Analysis Pipeline
Anshul Kundaje, Qunhua Li, Michael Hoffman, Jason Ernst, Joel Rozowsky, Pouya Kheradpour

• Input: mapped reads from production (BAM)
• Uniform peak calling pipeline (SPP, PeakSeq)
• Signal generation (read extension and mappability correction)
• IDR processing, QC and blacklist filtering (figure: Rep1 vs. Rep2, good vs. poor reproducibility)
• Segmentation (ChromHMM/Segway)
• Self-organising maps
• Motif discovery
• Stats, GSC enrichments, etc.
• Signal aggregation over peaks

Raw genome coverage of elements
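The coverage figures on this slide are fractions of genome bases, and the cumulative column behaves like a running union over element types: it grows by less than each row's own coverage wherever elements overlap. A minimal sketch of that bookkeeping follows, assuming half-open intervals on a single toy sequence. The interval sets, the 1,000 bp length, and all function names are made up for illustration and are not ENCODE data or ENCODE code.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), in base pairs


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coverage_fraction(intervals: Iterable[Interval], genome_length: int) -> float:
    """Fraction of bases covered by at least one interval."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / genome_length


def cumulative_coverage(
    element_sets: Dict[str, List[Interval]], genome_length: int
) -> List[Tuple[str, float, float]]:
    """Per-type coverage plus the running union over all types seen so far."""
    rows = []
    seen: List[Interval] = []
    for name, intervals in element_sets.items():
        seen.extend(intervals)
        rows.append(
            (
                name,
                coverage_fraction(intervals, genome_length),
                coverage_fraction(seen, genome_length),
            )
        )
    return rows


# Hypothetical toy data on a 1,000 bp sequence (not real annotations).
GENOME_LENGTH = 1000
toy_elements = {
    "exons": [(100, 130)],
    "bound_motifs": [(120, 140), (400, 420)],
    "dnase_hs": [(390, 500)],
}

for name, own, cumulative in cumulative_coverage(toy_elements, GENOME_LENGTH):
    print(f"{name:14s} coverage={own:.1%} cumulative={cumulative:.1%}")
```

In the toy data the motif intervals partly overlap the exon and sit inside the DNase region, so the cumulative value rises more slowly than the sum of the individual rows.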

Element type                 Coverage   Cumulative coverage
Exons                        3%         3%
ChIP-seq bound motifs        4.5%       5%
DNaseI footprints            5.7%       9%
ChIP-seq bound regions       8.1%       12%
DNaseI HS regions            15.2%      19.4%
Histone modifications (*)    44%        49%
RNA                          62%        80%
(* excluding broad marks)
(Union over all experiments and cell types)
Figure labels: Region; Bound Motif/Footprint

ENCODE Integrative Segmentations

• ~7 major genome segments
• 25 “elaborations”
• 1,000s of details
• Well known: TSS, gene start, gene bodies
• New info: “Enhancers” (2 states), insulators
• Unexpected: specific gene end

Experimental Confirmation of New Enhancers
Jason Gertz, Barbara Wold, Rick Myers, Len Pennacchio

• 53% hit rate in mouse assay; Mann-Whitney 0.003, HMM vs. background (Pennacchio Lab)
• 1e-7, HMM vs. naïve or biologist picks (Myers Lab)

Many other stories…

• Splicing/histone interaction (Roderic Guigo)
• RNA landscape (Tom Gingeras)
• TF co-association and regulatory code (Mike Snyder + Mark Gerstein)

• DNaseI footprints (John Stamatoyannopoulos)
• DNA methylation (Rick Myers)

Saturation
Steve Wilder

• Most aggressive fit for saturation suggests a maximum of 50% of elements discovered
• Likely to be lower due to inaccessible cell types, etc.

[Figure: number of elements vs. number of cell lines (0 to 60).]
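The slide does not say which curve was fitted, so the following is only a sketch of how a saturation fit can yield a "fraction discovered" estimate. It assumes a hyperbolic model N(k) = n_max * k / (k + half_k), fitted by ordinary least squares on its linearised form. The model choice, the function name, and the synthetic numbers are all assumptions for illustration, not the analysis behind the slide.

```python
from typing import Sequence, Tuple


def fit_hyperbolic_saturation(
    cell_lines: Sequence[float], elements: Sequence[float]
) -> Tuple[float, float]:
    """Fit N(k) = n_max * k / (k + half_k).

    Uses least squares on the linearised form
        1/N = 1/n_max + (half_k / n_max) * (1/k),
    i.e. a straight line in (1/k, 1/N).
    Returns (n_max, half_k).
    """
    xs = [1.0 / k for k in cell_lines]
    ys = [1.0 / n for n in elements]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    n_max = 1.0 / intercept      # plateau: elements found with unlimited cell lines
    half_k = slope * n_max       # cell lines needed to reach half the plateau
    return n_max, half_k


# Synthetic, noise-free data generated from the assumed model (not ENCODE counts).
TRUE_N_MAX = 1_000_000.0
TRUE_HALF_K = 60.0
cell_lines = [5, 10, 20, 40, 60]
elements = [TRUE_N_MAX * k / (k + TRUE_HALF_K) for k in cell_lines]

n_max, half_k = fit_hyperbolic_saturation(cell_lines, elements)
fraction_discovered = elements[-1] / n_max
print(f"estimated plateau: {n_max:.0f} elements")
print(f"fraction discovered at {cell_lines[-1]} cell lines: {fraction_discovered:.2f}")
```

With real counts the points would be noisy, and a direct nonlinear fit would usually be preferable to the linearised one, which weights small counts heavily.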

Discovering functional genome segments
Michael Hoffman, Jason Ernst, Bill Noble, Manolis Kellis

• ~7 major segments of the genome
• 25 “elaborations”
• 1,000s of details
• Well understood: TSS, gene start, gene bodies
• Reassuringly interesting: “Enhancers” (2 states), insulators
• Definitely there, unexpected: specific gene end, sub-classification of repeats

Irreproducible Discovery Rate (IDR)
Ben Brown, Qunhua Li, Peter Bickel

• If one re-ran the experiment, what is the probability that one would observe the same element at this rank or better?

• Uses ranked element lists from two replicates, and assumes that there is noise at the bottom of the ranking

[Figure panels: ChIP-seq, DNase-seq, RNA-seq]
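The intuition behind comparing ranked lists from two replicates can be shown with a much simpler quantity than IDR itself: for each cutoff n, the fraction of the top n elements in one replicate that are also in the top n of the other. The sketch below computes only that overlap profile. It is a stand-in for illustration, not the IDR statistic, which fits a statistical model to the paired ranks. The peak names and scores are invented.

```python
from typing import Dict, List


def top_n_overlap(rep1: Dict[str, float], rep2: Dict[str, float], n: int) -> float:
    """Fraction of the top-n elements (by score) shared between two replicates."""
    top1 = set(sorted(rep1, key=rep1.get, reverse=True)[:n])
    top2 = set(sorted(rep2, key=rep2.get, reverse=True)[:n])
    return len(top1 & top2) / n


def overlap_profile(rep1: Dict[str, float], rep2: Dict[str, float]) -> List[float]:
    """Top-n overlap for every cutoff n = 1 .. number of elements."""
    shared = min(len(rep1), len(rep2))
    return [top_n_overlap(rep1, rep2, n) for n in range(1, shared + 1)]


# Hypothetical peak scores: three strong peaks agree between replicates,
# three weak ones are ranked inconsistently (noise at the bottom of the list).
replicate_1 = {"p1": 90, "p2": 80, "p3": 70, "p4": 12, "p5": 11, "p6": 10}
replicate_2 = {"p1": 95, "p2": 88, "p3": 60, "p4": 9, "p5": 13, "p6": 14}

profile = overlap_profile(replicate_1, replicate_2)
for n, value in enumerate(profile, start=1):
    print(f"top {n}: overlap {value:.2f}")
```

In this toy example the overlap is perfect for the first three cutoffs and drops once the cutoff reaches the weak peaks, which is the kind of transition a reproducibility threshold is meant to locate.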