ENCODE Analysis Working Group and Data Analysis Centre
Rick Myers, Ewan Birney

Motivation for mandated DAC

- Genesis: the experience of the pilot project
  - Everyone looking at the ceiling when a key piece of annoying analysis needs to happen
- A set of people who are funded to ensure that critical integrative analysis occurs (consistently and in a timely manner)
- In no way exclusive
  - Everyone is invited to take part in analysis
  - The DAC should fit around things that are happening at the consortium level
  - Porous (no distinction expected between DAC members and other consortium members), except...
  - ...the cleaning-of-the-Augean-stables moment (e.g., creating repeat libraries, consistently remapping everyone's ChIP-seq data)
- Interplay with the DCC is deliberate (trade-off in where things occur)
- When there are too many things on the DAC to-do list, ask the AWG to prioritise.

[Organisation diagram. Labels: AWG; Chair of AWG: Rick Myers; Participates in discussion; Birney, Bickel, Haussler; Project Manager: EBI (Ian Dunham)]

[Diagram: Directed analysis and Methods development, each spanning EBI, UCSC, Yale, BU, U. Wash, Penn, Berkeley]

DAC - federated, embedded
- /Paul Flicek/Ian Dunham (EBI) - comparative, short-read technology methods
- Mark Gerstein (Yale) - ChIP-seq, link to genes/transcripts, link to modENCODE, P
- (BU) - ChIP-chip, ChIP-seq, motif finding, Bayesian analysis
- Ross Hardison/Webb Miller (PSU) - comparative genomics, regulatory regions
- Jim Kent/ (UCSC) - comparative genomics, DCC
- Peter Bickel (UC Berkeley) - statistician
- Bill Noble (UW) - machine learning: HMMs, change-point analysis, wavelets, SVMs

[Flowchart: DAC task flow. Labels: New analysis tasks from AWG or community; Triage and initial prioritisation; Prioritisation of all tasks by AWG; Active projects handled by EDAC; Ad hoc analysis; Converting analysis to pipelines; EDAC suggest pipelining tasks (AWG prioritisation); Results provided back to AWG; Experimental group, in-house methods; Data exploration, normalisation, sanity checking; DCC coordination]

[Diagram: DAC activities and other consortium activities. Labels: Posing biological questions by AWG; Feedback during method development; Feedback to AWG and experimental groups; New algorithms, new statistical methods; Existing methods to scale; Appropriate method, scalable genome-wide; Pipelined analysis, no scaling problem; DCC web sites, browser integration, coordination]

Foundational paper
- Concept: publish a human ENCODE manuscript midway through the 4-year grant period
- Why?
  - Waiting 2+ years from now for the first comprehensive paper is too late
  - It will discipline us to move forward and finish some things
  - New technologies are allowing us to collect much more data than before

Foundational paper
- Ideas for content
  - Focus on Tier 1 cell lines
  - Include integrated analysis of all ENCODE aspects (i.e., expression, TF binding, chromatin, DNA methylation, etc.)
  - Also integrate and summarize data from papers already published by individual groups
  - Recruit non-ENCODE-funded experts in B-cell and erythroid cell biology
  - Other ideas?

Foundational paper - Timing
- Hope for submission by March-April 2009
- Need 6 months for analysis and writing (?)
- So a late October/early November data freeze (?)
- The Analysis Working Group will have a face-to-face meeting on December 8-9