Vol 459|18 June 2009 FEATURE

Unlocking the secrets of the genome

Despite the successes of genomics, little is known about how genetic information produces complex organisms. A look at the crucial functional elements of fly and worm genomes could change that.

Susan E. Celniker, Laura A. L. Dillon, Mark B. Gerstein, Kristin C. Gunsalus, Steven Henikoff, Gary H. Karpen, Manolis Kellis, Eric C. Lai, Jason D. Lieb, David M. MacAlpine, Gos Micklem, Fabio Piano, Michael Snyder, Lincoln Stein, Kevin P. White and Robert H. Waterston, for the modENCODE Consortium

© 2009 Macmillan Publishers Limited. All rights reserved

The primary objective of the Human Genome Project was to produce high-quality sequences not just for the human genome but also for those of the chief model organisms: Escherichia coli, yeast (Saccharomyces cerevisiae), worm (Caenorhabditis elegans), fly (Drosophila melanogaster) and mouse (Mus musculus). Free access to the resultant data has prompted much biological research, including development of a map of common human genetic variants (the International HapMap Project)1, expression profiling of healthy and diseased cells2 and in-depth studies of many individual genes. These genome sequences have enabled researchers to carry out genetic and functional genomic studies not previously possible, revealing new biological insights with broad relevance across the animal kingdom3,4.

Nevertheless, our understanding of how the information encoded in a genome can produce a complex multicellular organism remains far from complete. To interpret the genome accurately requires a complete list of functionally important elements and a description of their dynamic activities over time and across different cell types. As well as genes for proteins and non-coding RNAs, functionally important elements include regulatory sequences that direct essential functions such as gene expression, DNA replication and chromosome inheritance.

Although geneticists have been quick to decode the functional elements in the yeast S. cerevisiae, with its small compact genome and powerful experimental tools5–6, our understanding of the more complex genomes of human, mouse, fly and worm is still rudimentary. Intrinsic signals that define the boundaries of protein-coding genes can only be partly recognized by current algorithms, and signals for other functional elements are even harder to find and interpret. Experimental approaches, notably the sequencing of complementary DNA and expressed sequence tags, have been invaluable, but unfortunately these data sets remain incomplete7. Non-coding RNA genes present an even greater challenge8–10, and many remain to be discovered, particularly those that have not been strongly conserved during evolution. Flies and worms have roughly the same number of known transcription factors as humans11, but comprehensive molecular studies of gene regulatory networks have yet to be tackled in any of these species.

In an attempt to remedy this situation, the National Human Genome Research Institute (NHGRI) launched the ENCODE (Encyclopedia of DNA Elements) project in 2003, with the goal of defining the functional elements in the human genome. The pilot phase of the project focused on 1% of the human genome and a parallel effort to foster technology development12. The initial ENCODE analysis revealed new findings but also made clear just how complex the biology is and how our grasp of it is far from complete13. On the basis of this experience, the NHGRI launched two complementary programmes in 2007: an expansion of the human ENCODE project to the whole genome (www.genome.gov/ENCODE) and the model organism ENCODE (modENCODE) project to generate a comprehensive annotation of the functional elements in the C. elegans and D. melanogaster genomes (www.modencode.org; www.genome.gov/modENCODE).

These two model organisms, with their ease of husbandry and genetic manipulation, are pillars of modern biological research, and a systematic catalogue of their functional genomic elements promises to pave the way to a more complete understanding of the human genome. Studies of these animals have provided key insights into many basic metazoan processes, including developmental patterning, cellular signalling, DNA replication and inheritance, programmed cell death and RNA interference (RNAi). The genomes are small enough to be investigated comprehensively with current technologies and findings can be validated in vivo. The research communities that study these two organisms will rapidly make use of the modENCODE results, deploying powerful experimental approaches that are often not possible or practical in mammals, including genetic, genomic, transgenic, biochemical and RNAi assays. modENCODE, with its potential for biological validation, will add value to the human ENCODE effort by illuminating the relationship between molecular and biological events.

The modENCODE project (Table 1) complements other systematic investigations into these highly studied organisms. In both organisms, RNAi collections have been developed and used to uncover novel gene functions14–18. Mutants are being recovered through insertional mutagenesis19 and targeted deletions (http://celeganskoconsortium.omrf.org; www.shigen.nig.ac.jp/c.elegans), with the eventual goal of one for every known gene. Genome sequences of related species are now also available for both fly20,21 and worm22, and multiple independent wild isolates are being characterized (T. MacKay, personal communication, www.dpgp.org23; R.H.W.). First-generation catalogues have been assembled of gene expression patterns during development and in different tissues24–34.

Table 1 | modENCODE consortium
Elements | Worm | Fly | Primary experimental data
Transcripts (mRNAs, non-coding RNAs, transcription start sites, untranslated regions, miRNAs) | Robert Waterston (University of Washington), Fabio Piano (New York University) | Susan Celniker (LBNL), Eric Lai (Sloan-Kettering Institute) | Tiling arrays, RNASeq, RT-PCR/RACE, mass spectrometry, 3’ untranslated region clone library, UAS-miRNA flies, knockdowns of RNA-binding proteins
Transcription-factor-binding sites | Michael Snyder (Yale University) | Kevin White (University of Chicago) | ChIP-chip, ChIP-seq, transcription-factor-tagged strains, anti-transcription factor antibodies
Chromatin marks | Jason Lieb (University of North Carolina), Steven Henikoff (University of Washington) | Gary Karpen (LBNL), Steven Henikoff | ChIP-chip and ChIP-seq of chromosome-associated proteins and nucleosomes
DNA replication | - | David MacAlpine (Duke University Medical Center) | ChIP-chip and ChIP-seq of essential initiator proteins, origin mapping and DNA copy number in differentiated tissues

Figure 1 | DNA element functions and identification process. (Panels: epigenetics and transcription regulation; replication; transcription and splicing.)

Research and analysis

The modENCODE project will operate as an open consortium and participants can join on the understanding that they will abide by the set criteria (www.genome.gov/26524644). An important aim of the project is to respond to the needs of the broader Drosophila and C. elegans scientific communities, and several avenues will be open for suggestions on

other issues as the opportunities arise. The core of the modENCODE project consists of ten groups who use high-throughput methods to identify functional elements (see Table 1). A Data Coordinating Center (DCC) will collect, integrate and display the data. Together, the groups expect to identify the principal classes of functional element for D. melanogaster and C. elegans. They will work closely together to complete the precise annotation of protein-coding genes, identify small RNAs and non-coding RNA transcripts, map transcription start sites, identify promoter motif elements, elucidate functional elements within 3ʹ untranslated regions, and identify alternatively spliced transcripts as well as the signals required for splicing. Genomic sites bound by sequence-specific transcription factors will also be comprehensively identified. Charting the chromatin

the different types of functional element will be used to reveal fundamental principles of fly and worm genome biology and to begin to uncover the emergent properties of these complex genomes. Some topics the modENCODE groups, along with interested members of the wider community, intend to explore are outlined below, but these are only a beginning. Our intention is to create a resource that will provide the foundation for ongoing analysis by scientists for years to come.

Our two model organisms share many similarities with other metazoans, including humans. They also differ from other organisms in some striking ways, particularly in the details of the establishment and maintenance of cellular identity, centromere biology and heterochromatin function. To help understand how the similarities and differences in