RESEARCH HIGHLIGHTS

FUNCTIONAL
The modENCODE guide to the

Now that obtaining genome sequence is routine, assigning function is the current frontier. Vast data sets that are now available for the Caenorhabditis elegans and Drosophila melanogaster genomes, and which are described in a raft of new papers, show that large-scale collaborative efforts offer a way forward. This work by the model organism Encyclopedia of DNA Elements (modENCODE) Project offers unprecedented functional annotation and is likely to provide the foundation for countless future experimental and computational studies.

The modENCODE Project was instigated in 2007 (in parallel with an expansion of the human ENCODE Project, the pilot phase of which had analysed 1% of the human genome) with the aim of identifying all sequence-based functional elements in the worm and fruitfly genomes. Two integrative papers that summarize the work and several companion papers have now been published.

The genome-wide data sets collected (237 for C. elegans and 700 for D. melanogaster) include high-throughput RNA sequencing (RNA-seq), chromatin immunoprecipitation followed by sequencing (ChIP–seq) for transcription factor binding and histone modifications, as well as DNA replication patterns and nucleosome occupancy. Importantly, the researchers analysed samples from a range of developmental stages and cell lines, which will help to discern the functional importance and dynamics of DNA features.

Integrative analysis of these data sets was particularly important for studying gene regulation, which by its nature integrates DNA sequence information, transcription factors, RNA and chromatin. For example, studies of worms and flies identified numerous short regions, termed highly occupied target (HOT) regions, that are enriched for the binding of many transcription factors. Intriguingly, their genomic locations suggest functional importance: in C. elegans they are enriched near genes that are highly expressed throughout development, and in D. melanogaster they overlap origins of replication. HOT regions have novel sequence motifs that could point towards unidentified modes of cooperative factor recruitment.

Data-set integration also revealed functional regulatory networks that can help to improve predictions of function and expression. In D. melanogaster these networks were used to make predictions of target gene expression based on the expression of their regulators, and to predict the function of previously unannotated genes. The authors found that one-quarter of genes have predictable expression during embryogenesis and that regulatory models can predict their expression under novel conditions (in cell lines). The 'predictable' genes may be those with more precise regulation, although predictions might be possible for more genes with the addition of further data sets.

This parallel work in C. elegans and D. melanogaster could highlight shared and distinct features of the two species. Comparison of the two species to each other, and to the human ENCODE project, should enable substantial advances in identifying fundamental regulatory principles across animal genomes and provide a model for understanding the role of DNA sequence in gene regulation and disease.

Mary Muers

ORIGINAL RESEARCH PAPERS
Gerstein, M. B. et al. Integrative analysis of the Caenorhabditis elegans genome by the modENCODE Project. Science 330, 1775–1787 (2010)
The modENCODE Consortium et al. Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science 330, 1787–1797 (2010)

FURTHER READING
Hawkins, R. D., Hon, G. C. & Ren, B. Next-generation genomics: an integrative approach. Nature Rev. Genet. 11, 476–486 (2010)

WEBSITE
The modENCODE Project: http://www.modencode.org

NATURE REVIEWS | Genetics VOLUME 12 | FEBRUARY 2011
© 2011 Macmillan Publishers Limited. All rights reserved