Decoding the Human Genome
Total Page:16
File Type:pdf, Size:1020Kb
Downloaded from genome.cshlp.org on September 6, 2012 - Published by Cold Spring Harbor Laboratory Press Decoding the human genome Kelly A. Frazer Genome Res. 2012 22: 1599-1601 Access the most recent version at doi:10.1101/gr.146175.112 To subscribe to Genome Research go to: http://genome.cshlp.org/subscriptions © 2012, Published by Cold Spring Harbor Laboratory Press ThisReceiveFreelyEmail Openarticle availableReferences freeCreative alertingcitesisAccess emaildistributed online14 alertsarticles, through exclusivelywhen 9 of thenew which Genome articlesby canCold be citeResearchSpring accessed this Harborarticle Open free -Laboratory sign Accessat: up inoption. thePress box at the http://genome.cshlp.org/content/22/9/1599.full.html#ref-list-1topfor theright first Commonscorner sixservice months of the articleafter the or clickfull-issue here publication date (see http://genome.cshlp.org/site/misc/terms.xhtmlLicense ). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported License), as described at http://creativecommons.org/licenses/by-nc/3.0/. Downloaded from genome.cshlp.org on September 6, 2012 - Published by Cold Spring Harbor Laboratory Press To subscribe to Genome Research go to: http://genome.cshlp.org/subscriptions © 2012, Published by Cold Spring Harbor Laboratory Press WhatRelated does Content our genome encode? John A. Stamatoyannopoulos Genome Res. September , 2012 22: 1602-1611 GENCODE: The reference human genome annotation for The ENCODE Project Jennifer Harrow, Adam Frankish, Jose M. Gonzalez, et al. Genome Res. September , 2012 22: 1760-1774 Combining RT-PCR-seq and RNA-seq to catalog all genic elements encoded in the human genome Cédric Howald, Andrea Tanzer, Jacqueline Chrast, et al. Genome Res. September , 2012 22: 1698-1710 The GENCODE v7 catalog of human long noncoding RNAs: Analysis of their gene structure, evolution, and expression Thomas Derrien, Rory Johnson, Giovanni Bussotti, et al. Genome Res. September , 2012 22: 1775-1789 Annotation of functional variation in personal genomes using RegulomeDB Alan P. Boyle, Eurie L. Hong, Manoj Hariharan, et al. Genome Res. September , 2012 22: 1790-1797 Toward mapping the biology of the genome Stephen Chanock Genome Res. September , 2012 22: 1612-1615 Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs Hagen Tilgner, David G. Knowles, Rory Johnson, et al. Genome Res. September , 2012 22: 1616-1625 RNA editing in the human ENCODE RNA-seq data Eddie Park, Brian Williams, Barbara J. Wold, et al. Genome Res. September , 2012 22: 1626-1633 Discovery of hundreds of mirtrons in mouse and human small RNA data Erik Ladewig, Katsutomo Okamura, Alex S. Flynt, et al. Genome Res. September , 2012 22: 1634-1645 Long noncoding RNAs are rarely translated in two human cell lines Balázs Bánfai, Hui Jia, Jainab Khatun, et al. Genome Res. September , 2012 22: 1646-1657 Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors Jie Wang, Jiali Zhuang, Sowmya Iyer, et al. Genome Res. September , 2012 22: 1798-1812 Understanding transcriptional regulation by integrative analysis of transcription factor binding data Chao Cheng, Roger Alexander, Renqiang Min, et al. Genome Res. September , 2012 22: 1658-1667 A highly integrated and complex PPARGC1A transcription factor binding network in HepG2 cells Alexandra E. Charos, Brian D. Reed, Debasish Raha, et al. Genome Res. September , 2012 22: 1668-1679 Widespread plasticity in CTCF occupancy linked to DNA methylation Hao Wang, Matthew T. Maurano, Hongzhu Qu, et al. Genome Res. September , 2012 22: 1680-1688 Predicting cell-type−specific gene expression from regions of open chromatin Anirudh Natarajan, Galip Gürkan Yardimci, Nathan C. Sheffield, et al. Genome Res. September , 2012 22: 1711-1722 Sequence and chromatin determinants of cell-type−specific transcription factor binding Aaron Arvey, Phaedra Agius, William Stafford Noble, et al. Genome Res. September , 2012 22: 1723-1734 Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements Anshul Kundaje, Sofia Kyriazopoulou-Panagiotopoulou, Max Libbrecht, et al. Genome Res. September , 2012 22: 1735-1747 ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia Stephen G. Landt, Georgi K. Marinov, Anshul Kundaje, et al. Genome Res. September , 2012 22: 1813-1831 Personal and population genomics of human regulatory variation Benjamin Vernot, Andrew B. Stergachis, Matthew T. Maurano, et al. Genome Res. September , 2012 22: 1689-1697 Linking disease associations with regulatory information in the human genome Marc A. Schaub, Alan P. Boyle, Anshul Kundaje, et al. Genome Res. September , 2012 22: 1748-1759 Downloaded from genome.cshlp.org on September 6, 2012 - Published by Cold Spring Harbor Laboratory Press Commentary Decoding the human genome Kelly A. Frazer1 Moores UCSD Cancer Center, Department of Pediatrics and Rady Children’s Hospital, University of California at San Diego, La Jolla, California 92093, USA Interpreting the human genome sequence is one of the major and noncoding) along with their corresponding RNA transcripts, scientific endeavors of our time. In February 2001, when the hu- and transcriptional regulatory regions. This paper (The ENCODE man genome reference sequence was initially released (Lander Project Consortium 2012), along with companion papers in Nature, et al. 2001), our understanding of the encoded contents was sur- Genome Research,andGenome Biology,providesmuchmorethana prisingly limited. It was perplexing to many in the scientific mere inventory of sequence elements but rather presents an inte- community when we realized that the human genome contains grated analysis providing important insights into the functional only ;21,000 distinct protein-coding genes (Claverie 2001; Hollon organization of the human genome. In alignment with the tradition 2001; Pennisi 2003; Clamp et al. 2007), as other less complex of large consortia sponsored by the NHGRI, the ENCODE project has species like the nematode Caenorhabditis elegans were known to made all data and derived results available through a freely accessible have a similar number of protein-coding genes (Hillier et al. 2005). database (Rosenbloom et al. 2010). It quickly became apparent that the developmental and physio- The following sections describe some of the highlights of the logical complexity of humans would not be explained solely by the ENCODE project, including technical accomplishments, high number of protein-coding genes, and the quest to understand the quality data sets, and integrated analyses with other resources, contents of the human genome began full force. such as disease-associated variants identified through genome-wide The Encyclopedia of DNA Elements (ENCODE) Project was association studies. launched in September of 2003 with the daunting task of identi- fying all the functional elements encoded in the human genome Methodology sequence. To accomplish this task, the National Human Genome ENCODE has placed emphasis on developing technologies and Research Institute (NHGRI) organized The ENCODE Project Con- generating data using standardized guidelines for each type of assay sortium, which consists of an international group of scientists with (The ENCODE Project Consortium 2011). As an example, ENCODE diverse expertise in experimental and computational methods for has performed thousands of individual chromatin immunoprecipi- generating and analyzing high-throughput genomic data (The tation (ChIP) reactions followed by high-throughput sequencing ENCODE Project Consortium 2004). During the initial four years, (ChIP-seq), and through these experiences, the investigators have the consortium conducted a pilot project which focused on an- developed a set of guidelines addressing antibody validation, ex- notating functional elements in a defined 1% of the human ge- perimental replication, sequence depth, data reporting, and data nome consisting of ;30 Mb divided among 44 genomic regions. quality assessment (Landt et al. 2012). For all ENCODE data types, On June 14, 2007, a report summarizing the findings of the pilot standardized guidelines are established and protocol descriptions project revealed pervasive transcription of the human genome, are available at http://www.encodeproject.org/. Importantly, these with the majority of nucleotides represented in transcripts in at guidelines enable investigators to readily generate additional data least a limited number of cell types at some time (The ENCODE sets to compare with ENCODE data of the same type as well as Project Consortium 2007). Many of these transcripts comprised perform integrative analyses across multiple data types. novel noncoding RNA genes. Importantly, The ENCODE Pilot Project assigned function to 60% of the evolutionarily constrained Genes and transcripts bases in the 44 genomic regions and identified many additional functional elements seemingly unconstrained across mammalian The ENCODE project has produced a reference gene set referred to evolution. Integration of the various experimental data generated as GENCODE (Harrow et al. 2012), which has been constructed by The ENCODE Pilot Project provided further insights into con- by a merge of manual and automated annotations of all human nections between chromatin structure (modifications and acces- protein-coding genes, pseudogenes, and noncoding