Is Junk DNA Bunk? a Critique of ENCODE

PERSPECTIVE Is junk DNA bunk? A critique of ENCODE W. Ford Doolittle1 Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, Canada B3H 4R2 Edited by Michael B. Eisen, Howard Hughes Medical Institute, University of California, Berkeley, CA, and accepted by the Editorial Board February 4, 2013 (received for review December 11, 2012) Do data from the Encyclopedia Of DNA Elements (ENCODE) project render the notion of junk DNA obsolete? Here, I review older arguments for junk grounded in the C-value paradox and propose a thought experiment to challenge ENCODE’s ontology. Specifically, what would we expect for the number of functional elements (as ENCODE defines them) in genomes much larger than our own genome? If the number were to stay more or less constant, it would seem sensible to consider the rest of the DNA of larger genomes to be junk or, at least, assign it a different sort of role (structural rather than informational). If, however, the number of functional elements were to rise significantly with C-value then, (i) organisms with genomes larger than our genome are more complex phenotypically than we are, (ii) ENCODE’sdefinition of functional element identifies many sites that would not be considered functional or phenotype-determining by standard uses in biology, or (iii) the same phenotypic functions are often determined in a more diffuse fashion in larger-genomed organisms. Good cases can be made for propo- sitions ii and iii. A larger theoretical framework, embracing informational and structural roles for DNA, neutral as well as adaptive causes of complexity, and selection as a multilevel phenomenon, is needed. evolution | human genome There is much excitement in the blogosphere, chromatin landscapes, transcription factor animals in the efficiency of its chromo- among mainstream science journalists, and footprints, and long-range chromosomal somal organization and not just its cultural within the community of practicing genome interactions—support many current popu- attainments. Such genomic anthropocen- biologists about a flurry of articles and letters lation genetic studies linking human dis- trism, unacknowledged conflation of pos- in the September 6th, 2012 issue of Nature. eases to supposedly nongenic regions, and sible meanings of “function,” questionable These papers and many others published at they are truly impressive in scope and depth null hypotheses, and unrecognized pana- about the same time and since under the (5). They resonate with the current enthu- daptationism are behind this most recent umbrella of the ENCODE project collectively siasm for assigning multiple subtle but vital attempt to junk “junk.” claim function for the majority of the 3.2 Gb regulatory roles to the still enigmatic long Several of these same points have been human genome, not just the few percent al- noncoding RNAs (lncRNAs) now known made in brief by Eddy (8) and Niu and ready recognized as genes (traditionally de- to be transcribed from much of the length Jiang (9). My aim here is to remind readers fined) or obvious gene-controlling elements. of our genome (6, 7). Additionally, congru- of the structure of some earlier arguments Kolata writes in The New York Times that ence at many sites between the many meth- in defense of the junk concept (10) that “[t]he human genome is packed with at least ods used (RNA sequencing, binding by one remain compelling, despite the obvious four million gene switches that reside in bits or more of 100+ DNA binding proteins, success of ENCODE in mapping the sub- ofDNAthatwereoncedismissedas‘junk’ DNase I hypersensitivity, histone modifica- tle and complex human genomic landscape. but that turn out to play critical roles in con- tion, DNA methylation, and chromosome Also, I will suggest that we need as biol- trolling how cells, organs and other tissues conformation capture) leaves no doubt that ogists to defend traditional understand- behave” (1). In a Nature News and View many of these 4 million gene switches do ings of function: the publicity surrounding commentary, Ecker et al. (2) assert that “[o]ne represent chromosomal loci that are special ENCODE reveals the extent to which these understandings have been eroded. How- of the more remarkable findings described in some way, in at least one cell type. How- ever, theoretical expansion in other di- in the consortium’s entrée paper is that 80% ever, do ENCODE’s data truly require us to rections, reconceptualizing junk, might of the genome contains elements linked to abandon the widespread notion that junk be advisable. biochemical functions, dispatching the widely DNA—here specifically understood as DNA held view that the human genome is mostly that does not encode information promot- Perennial Problem of C-Value ‘junk DNA.’” The editors of The Lancet (3) ing the survival and reproduction of the Information and Structure. The junk idea enthuse: “Far from being ‘junk,’ the DNA organismsthatbearit—is the major con- long predates genomics and since its early between protein encoding genes consists of stituent of many eukaryotic genomes, our decades has been grounded in the “C-value myriad elements that determine gene ex- own genome included? paradox,” the observation that DNA amounts pression, whether by switching transcription I will argue by way of a thought experi- (C-value denotes haploid nuclear DNA con- on or off, or by regulating the degree of ment and an analysis of what biologists tent) and complexities correlate very poorly transcription and consequently the concen- traditionally have understood as function trations and function of all proteins.” Suc- that they do not. At the very least, “junk” cinctly, in Science, Pennisi (4) declares that as it has been conceived is an apt descrip- Author contributions: W.F.D. wrote the paper. the ENCODE publications write the “eulogy tor of the bulk of many genomes larger The author declares no conflict of interest. for junk DNA.” than our own. Moreover, it almost certainly This article is a PNAS Direct Submission. M.B.E. is a guest editor The new data—coming from high- still is for much of our genome, unless we invited by the Editorial Board. throughput analyses of transcriptional and hold Homo sapiens to be unique among the 1E-mail: [email protected]. 5294–5300 | PNAS | April 2, 2013 | vol. 110 | no. 14 www.pnas.org/cgi/doi/10.1073/pnas.1221376110 Downloaded by guest on September 28, 2021 with organismal complexity or evolutionary structural and cell biological roles “nucleo- than until very recently thought contributes PERSPECTIVE “advancement” (10–14). Humans do have skeletal,” considering C-value to be opti- to our survival and reproduction as organ- a thousand times as much DNA as simple mized by organism-level natural selection isms, because it encodes information tran- bacteria, but lungfish have at least 30 times (13, 20). Gregory, now the principal C-value scribed or expressed phenotypically in one more than humans, as do many flowering theorist, embraces a more “pluralistic, hier- tissue or another, or specifically regulates plants and some unicellular protists (14). archical approach” to what he calls “nucleo- such expression. Moreover, as is often noted, the discon- typic” function (11, 12, 17). A balance nection between C-value and organismal between organism-level selection on nuclear A Thought Experiment. ENCODE (5) de- fi “ complexity is also found within more re- structure and cell size, cell division times nes a functional element (FE) as a discrete fi stricted groups comprising organisms of and developmental rate, selfish genome-level genome segment that encodes a de ned seemingly similar lifestyle and comparable selection favoring replicative expansion, and product (for example, protein or non-coding organismal or behavioral complexity. The (as discussed below) supraorganismal (clade- RNA) or displays a reproducible biochemical most heavily burdened lungfish (Protopterus level) selective processes—as well as drift— signature (for example, protein binding, or fi ” aethiopicus) lumbers around with 130,000 must all be taken into account. a speci c chromatin structure). Asimple fi Mb, but the pufferfish Takifugu (formerly These forces will play out differently in thought experiment involving FEs so-de ned Fugu) rubripes gets by on less than 400 Mb different taxa. González and Petrov (23) point is at the heart of my argument. (15, 16). A less familiar but better (because out, for instance, that Drosophila and humans Suppose that there had been (and proba- monophyletic) animal example might be are at opposite extremes in terms of the bly, some day, there will be) ENCODE amphibians, showing a 120-fold range from balance of processes, with the minimalist projects aimed at enumerating, by transcrip- frogs to salamanders (17). Among angio- genomes of the former containing few (but tional and chromatin mapping, factor foot- sperms, there is a thousandfold variation mostly young and quite active) TEs, whereas printing, and so forth, all of the FEs in the (14). Additionally, even within a single ge- at least one-half of our own much larger ge- genomes of Takifugu and a lungfish, some nus, there can be substantial differences. nome comprises the moribund remains of small and large genomed amphibians (in- Salamander species belonging to Plethodon older TEs, principally SINEs and LINEs cluding several species of Plethodon), plants, boast a fourfold range, to cite a comparative (short and long interspersed nuclear ele- and various protists. There are, I think, two study popular from the 1970s (18). Some- ments). Such difference may in part reflect possible general outcomes of this thought times, such within-genus genome size dif- population size. As Lynch notes, small pop- experiment, neither of which would give us ferences reflect large-scale or whole-genome ulation size (characteristic of our species) clear license to abandon junk. duplications and sometimes rampant selfish will have limited the effectiveness of natural The firstoutcomewouldbethatFEs(es- DNA or transposable element (TE) mul- selection in preventing a deleterious accu- timated to be in the millions in our genome) tiplication.

Is Junk DNA Bunk? a Critique of ENCODE

ENCODE Regions Identification of Higher-Order Functional Domains in the Human

Developing and Implementing an Institute-Wide Data Sharing Policy Stephanie OM Dyke and Tim JP Hubbard*

Estimating the Number of Unseen Variants in the Human Genome

The ENCODE Project

The ENCODE (Encyclopedia of S DNA Elements) Project

What Is a Gene, Post-ENCODE? History and Updated Definition

A User's Guide to the Encyclopedia of DNA Elements (ENCODE)

Current Advances of Epigenetics in Periodontology from ENCODE Project

DNA Copy Number Analysis of Fresh and Formalin-Fixed Specimens By

ENCODE Phase 2: Participants and Projects

National Human Genome Research Institute EA Feingold, PJ G

Multiple Genes Encode Nuclear Factor 1-Like Proteins That Bind To