PERSPECTIVE

Is junk DNA bunk? A critique of ENCODE

W. Ford Doolittle¹
Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, Canada B3H 4R2

Edited by Michael B. Eisen, Howard Hughes Medical Institute, University of California, Berkeley, CA, and accepted by the Editorial Board February 4, 2013 (received for review December 11, 2012)

Do data from the Encyclopedia Of DNA Elements (ENCODE) project render the notion of junk DNA obsolete? Here, I review older arguments for junk grounded in the C-value paradox and propose a thought experiment to challenge ENCODE’s ontology. Specifically, what would we expect for the number of functional elements (as ENCODE defines them) in genomes much larger than our own? If the number were to stay more or less constant, it would seem sensible to consider the rest of the DNA of larger genomes to be junk or, at least, assign it a different sort of role (structural rather than informational). If, however, the number of functional elements were to rise significantly with C-value, then (i) organisms with genomes larger than our genome are more complex phenotypically than we are, (ii) ENCODE’s definition of functional element identifies many sites that would not be considered functional or phenotype-determining by standard uses in biology, or (iii) the same phenotypic functions are often determined in a more diffuse fashion in larger-genomed organisms. Good cases can be made for propositions ii and iii. A larger theoretical framework, embracing informational and structural roles for DNA, neutral as well as adaptive causes of complexity, and selection as a multilevel phenomenon, is needed.

evolution |

Author contributions: W.F.D. wrote the paper.

The author declares no conflict of interest.

This article is a PNAS Direct Submission. M.B.E. is a guest editor invited by the Editorial Board.

¹E-mail: [email protected].

5294–5300 | PNAS | April 2, 2013 | vol. 110 | no. 14 | www.pnas.org/cgi/doi/10.1073/pnas.1221376110

There is much excitement in the blogosphere, among mainstream science journalists, and within the community of practicing genome biologists about a flurry of articles and letters in the September 6th, 2012 issue of Nature. These papers and many others published at about the same time and since under the umbrella of the ENCODE project collectively claim function for the majority of the 3.2 Gb human genome, not just the few percent already recognized as genes (traditionally defined) or obvious gene-controlling elements. Kolata writes in The New York Times that “[t]he human genome is packed with at least four million gene switches that reside in bits of DNA that were once dismissed as ‘junk’ but that turn out to play critical roles in controlling how cells, organs and other tissues behave” (1). In a Nature News and View commentary, Ecker et al. (2) assert that “[o]ne of the more remarkable findings described in the consortium’s entrée paper is that 80% of the genome contains elements linked to biochemical functions, dispatching the widely held view that the human genome is mostly ‘junk DNA.’” The editors of The Lancet (3) enthuse: “Far from being ‘junk,’ the DNA between protein-encoding genes consists of myriad elements that determine gene expression, whether by switching transcription on or off, or by regulating the degree of transcription and consequently the concentrations and function of all proteins.” Succinctly, in Science, Pennisi (4) declares that the ENCODE publications write the “eulogy for junk DNA.”

The new data, coming from high-throughput analyses of transcriptional and chromatin landscapes, transcription factor footprints, and long-range chromosomal interactions, support many current population genetic studies linking human diseases to supposedly nongenic regions, and they are truly impressive in scope and depth (5). They resonate with the current enthusiasm for assigning multiple subtle but vital regulatory roles to the still enigmatic long noncoding RNAs (lncRNAs) now known to be transcribed from much of the length of our genome (6, 7). Additionally, congruence at many sites between the many methods used (RNA sequencing, binding by one or more of 100+ DNA binding proteins, DNase I hypersensitivity, histone modification, DNA methylation, and chromosome conformation capture) leaves no doubt that many of these 4 million gene switches do represent chromosomal loci that are special in some way, in at least one cell type. However, do ENCODE’s data truly require us to abandon the widespread notion that junk DNA (here specifically understood as DNA that does not encode information promoting the survival and reproduction of the organisms that bear it) is the major constituent of many eukaryotic genomes, our own genome included?

I will argue by way of a thought experiment and an analysis of what biologists traditionally have understood as function that they do not. At the very least, “junk” as it has been conceived is an apt descriptor of the bulk of many genomes larger than our own. Moreover, it almost certainly still is for much of our genome, unless we hold Homo sapiens to be unique among the animals in the efficiency of its chromosomal organization and not just its cultural attainments. Such genomic anthropocentrism, unacknowledged conflation of possible meanings of “function,” questionable null hypotheses, and unrecognized panadaptationism are behind this most recent attempt to junk “junk.”

Several of these same points have been made in brief by Eddy (8) and Niu and Jiang (9). My aim here is to remind readers of the structure of some earlier arguments in defense of the junk concept (10) that remain compelling, despite the obvious success of ENCODE in mapping the subtle and complex human genomic landscape. Also, I will suggest that we need as biologists to defend traditional understandings of function: the publicity surrounding ENCODE reveals the extent to which these understandings have been eroded. However, theoretical expansion in other directions, reconceptualizing junk, might be advisable.

Perennial Problem of C-Value

Information and Structure. The junk idea long predates genomics and since its early decades has been grounded in the “C-value paradox,” the observation that DNA amounts (C-value denotes haploid nuclear DNA content) and complexities correlate very poorly with organismal complexity or evolutionary “advancement” (10–14). Humans do have a thousand times as much DNA as simple bacteria, but lungfish have at least 30 times more than humans, as do many flowering plants and some unicellular protists (14). Moreover, as is often noted, the disconnection between C-value and organismal complexity is also found within more restricted groups comprising organisms of seemingly similar lifestyle and comparable organismal or behavioral complexity. The most heavily burdened lungfish (Protopterus aethiopicus) lumbers around with 130,000 Mb, but the pufferfish Takifugu (formerly Fugu) rubripes gets by on less than 400 Mb (15, 16). A less familiar but better (because monophyletic) animal example might be amphibians, showing a 120-fold range from frogs to salamanders (17). Among angiosperms, there is a thousandfold variation (14). Additionally, even within a single genus, there can be substantial differences. Salamander species belonging to Plethodon boast a fourfold range, to cite a comparative study popular from the 1970s (18). Sometimes, such within-genus genome size differences reflect large-scale or whole-genome duplications and sometimes rampant selfish DNA or transposable element (TE) multiplication. Schnable et al. (19) figure that the maize genome has more than doubled in size in the last 3 million years, overwhelmingly through the replication and accumulation of TEs, for example. If we do not think of this additional or “excess” DNA, so manifest through comparisons between and within biological groups, as junk (irrelevant if not frankly detrimental to the survival and reproduction of the organism bearing it), how then are we to think of it?

Of course, DNA inevitably does have a basic structural role to play, unlinked to specific biochemical activities or the encoding of information relevant to genes and their expression. Centromeres and telomeres exemplify noncoding chromosomal components with specific functions. More generally, DNA as a macromolecule bulks up and gives shape to chromosomes and thus, as many studies show, determines important nuclear and cellular parameters such as division time and size, themselves coupled to organismal development (11–13, 17). The “selfish DNA” scenarios of 1980 (20–22), in which C-value represents only the outcome of conflicts between upward pressure from reproductively competing TEs and downward-directed energetic restraints, have thus, in subsequent decades, yielded to more nuanced understandings. Cavalier-Smith (13, 20) called DNA’s structural and cell biological roles “nucleoskeletal,” considering C-value to be optimized by organism-level natural selection (13, 20). Gregory, now the principal C-value theorist, embraces a more “pluralistic, hierarchical approach” to what he calls “nucleotypic” function (11, 12, 17). A balance between organism-level selection on nuclear structure and cell size, cell division times and developmental rate, selfish genome-level selection favoring replicative expansion, and (as discussed below) supraorganismal (clade-level) selective processes, as well as drift, must all be taken into account.

These forces will play out differently in different taxa. González and Petrov (23) point out, for instance, that Drosophila and humans are at opposite extremes in terms of the balance of processes, with the minimalist genomes of the former containing few (but mostly young and quite active) TEs, whereas at least one-half of our own much larger genome comprises the moribund remains of older TEs, principally SINEs and LINEs (short and long interspersed nuclear elements). Such difference may in part reflect population size. As Lynch notes, small population size (characteristic of our species) will have limited the effectiveness of natural selection in preventing a deleterious accumulation of TEs (24, 25).

Zuckerkandl (26) once mused that all genomic DNA must be to some degree “polite,” in that it must not lethally interfere with gene expression. Indeed, some might suggest, as I will below, that true junk might better be defined as DNA not currently held to account by selection for any sort of role operating at any level of the biological hierarchy (27). However, junk advocates have to date generally considered that even DNA fulfilling bulk structural roles remains, in terms of encoded information, just junk. Cell biology may require a certain C-value, but most of the stretches of noncoding DNA that go to satisfying that requirement are junk (or worse, selfish).

In any case, structural roles or multilevel selection theorizing are not what ENCODE commentators are endorsing when they proclaim the end of junk, touting the existence of 4 million gene switches or myriad elements that determine gene expression and assigning biochemical functions for 80% of the genome. Indeed, there would be no excitement in either the press or the scientific literature if all the ENCODE team had done was acknowledge an established theory concerning DNA’s structural importance. Rather, the excitement comes from interpreting ENCODE’s data to mean that a much larger fraction of our DNA than until very recently thought contributes to our survival and reproduction as organisms, because it encodes information transcribed or expressed phenotypically in one tissue or another, or specifically regulates such expression.

A Thought Experiment. ENCODE (5) defines a functional element (FE) as “a discrete genome segment that encodes a defined product (for example, protein or non-coding RNA) or displays a reproducible biochemical signature (for example, protein binding, or a specific chromatin structure).” A simple thought experiment involving FEs so-defined is at the heart of my argument.

Suppose that there had been (and probably, some day, there will be) ENCODE projects aimed at enumerating, by transcriptional and chromatin mapping, factor footprinting, and so forth, all of the FEs in the genomes of Takifugu and a lungfish, some small and large genomed amphibians (including several species of Plethodon), plants, and various protists. There are, I think, two possible general outcomes of this thought experiment, neither of which would give us clear license to abandon junk.

The first outcome would be that FEs (estimated to be in the millions in our genome) turn out to be more or less constant in number, regardless of C-value, at least among similarly complex organisms. If larger C-value by itself does not imply more FEs, then there will, of course, be great differences in what we might call functional density (FEs per kilobase) (26) among species. FEs spaced by kilobases in Arabidopsis would be megabases apart in maize on average. Averages obscure details: the extra DNA in the larger genomes might be sequestered in a few giant silent regions rather than uniformly stretching out the space between FEs or lengthening intragenic introns. However, in either case, this DNA could be seen as a sort of polite functionless filler or diluent. At best, such DNA might have functions only of the structural or nucleoskeletal/nucleotypic sort. Indeed, even this sort of functional attribution is not necessary. There is room within an expanded, pluralistic and hierarchical theory of C-value (see below) (12, 27) for much DNA that makes no contribution whatever to survival and reproduction at the organismal level and thus is junk at that level, although it may be under selection at the sub- or supraorganismal levels (TEs and clade selection).

If the human genome is junk-free, then it must be very luckily poised at some sort of minimal size for organisms of human complexity. We may no longer think that
mankind is at the center of the universe, but we still consider our species’ genome to be unique, first among many in having made such full and efficient use of all of its millions of SINEs and LINEs (retrotransposable elements) and introns to encode the multitudes of lncRNAs and house the millions of enhancers necessary to make us the uniquely complex creatures that we believe ourselves to be. However, were this extraordinary coincidence the case, a corollary would be that junk would not be defunct for many other larger genomes: the term would not need to be expunged from the genomicist’s lexicon more generally. As well, if, as is commonly believed, much of the functional complexity of the human genome is to be explained by evolution of our extraordinary cognitive capacities, then many other mammals of lesser acumen but similar C-value must truly have junk in their DNA.

The second likely general outcome of my thought experiment would be that FEs as defined by ENCODE increase in number with C-value, regardless of apparent organismal complexity. If they increase roughly proportionately, FE numbers will vary over a many-hundredfold range among organisms normally thought to be similarly complex. Defining or measuring complexity is, of course, problematic if not impossible. Still, it would be hard to convince ourselves that lungfish are 300 times more complex than Takifugu or 40 times more complex than us, whatever complexity might be. More likely, if indeed FE numbers turn out to increase with C-value, we will decide that we need to think again about what function is, how it becomes embedded in macromolecular structures, and what FEs as defined by ENCODE have to tell us about it.

Problematics of Function

Definition and Inference. What do we mean by function, informational or otherwise? Most philosophers of biology, and likely, most practicing biologists when pressed, would endorse some form of the selected effect (SE) definition of function (28–30). Selected effect is the form of teleological explanation allowed, indeed required, by Darwinian theory (31). Accordingly, the functions of a trait or feature are all and only those effects of its presence for which it was under positive natural selection in the (recent) past and for which it is under (at least) purifying selection now. They are why the trait or feature is there today and possibly why it was originally formed. Thus, we might reasonably say that the function of the lac operon in Escherichia coli is (and presumably, long has been) to allow facultative growth of bacteria on β-galactosides, because we believe that, long before E. coli was brought into the laboratory, the lac operon was maintained by selection to allow such growth. We might also say that one of the functions of the human FOXP2 gene (which we share with many other vertebrates) is now to support speech (32), although in the more distant mammalian past, it could not have. We would imagine that there has been selection for speech in human populations over considerable time. Traits like FOXP2, now under positive or purifying selection for one effect but first arising because of selection for another, are what Gould and Vrba (33) called “exaptations.”

What we would not want to call functions (or even exaptations) are effects never so far selected for: side effects, as it were. Gould and Lewontin (34) famously called these “spandrels.” They comprise both undesirable but apparently unavoidable consequences, like vulnerability to phages in bacteria with pili or lower back pain in primates walking upright, and seemingly neutral ones, like the thumping noise made by the heart, to use an example beloved of philosophers. Indeed, even fortuitously advantageous traits, such as the FOXP2-enabled capacity to leave voice messages on answering machines, are not SE functions. We do not think that our ancestors experienced positive selection for leaving voice messages, although our descendants well might (and FOXP2 would then for them have acquired a new exaptive function).

In any case, past selection, recent or ancient, can only be inferred, and we must use indirect ways to make the inference. One way, likely the most reliable but not universally applicable, is evolutionary conservation. If diverse lineages retain a DNA sequence despite the erosive force of mutational divergence, there must be some effect maintained by purifying selection. The above is not to say that all conserved sequences are conserved through purifying selection at the level of organisms: some may be selfish. Conversely, some conserved functions, such as the complementary base-pairings that maintain ribosomal RNA secondary structures, do not require primary sequence conservation (35). Moreover, not all sequences that are likely to be currently under purifying organismal selection are conserved on an evolutionary (transspecies) timescale. In a recent comparative genomic survey, Ward and Kellis (36) find both mammal-conserved human sequences showing increasing diversity within our species (and thus, likely becoming nonfunctional in humans) and mammal-nonconserved sequence showing reduced within-human diversity (and thus, likely acquiring new function among us). Ponting and Hardison (37), using methods that they believe to take into account such turnover, “estimate that the steady-state value of α_sel [the proportion of all nucleotides in the human genome that are subject to purifying selection because of their biological function] lies between 10% and 15%” (37).

Another way to attribute function is through experimental ablation: whatever organism-level effect E does not occur after deleting or blocking the expression of a region R of DNA is taken to be the latter’s function. This attribution is close to the everyday understanding of function, as in the function of the carburetor is to oxygenate gasoline. The approach embodies what philosophers would call a causal role (CR) definition of function and supposedly eschews evolutionary or historical justifications. Much biological research into function is done this way, but I think that most biologists consider that experimental ablation indirectly points to SE. They believe that effect E could, under suitable conditions, be shown to have contributed to the past fitness of organisms and most importantly, that R exists as it does because of E. Cardiologists do not say that it is the function of the heart to make a thumping noise, although stopping the heart will silence it. Similarly, geneticists studying Huntington disease would not say that the trinucleotide repeat in the cognate gene, reiteration of which gives rise to the disorder, has disease causation as a function, although replacing the repeat with a nonidentical set of unique triplets encoding the same amino acid sequence would eliminate the deleterious effect (38).

A third, and the least reliable, method to infer function is mere existence. The presence of a structure or the occurrence of a process or detectable interaction, especially if complex, is taken as adequate evidence for its being under selection, even when ablation is infeasible and the possibly selectable effect of presence remains unknown. Because our genomes have introns, Alu elements, and endogenous retroviruses, these things must be doing us some good. Because a region is transcribed, its transcript must have some fitness benefit, however remote. Because residue N of protein P is leucine in species A and isoleucine in species B, there must be some selection-based explanation. This approach enshrines “panadaptationism,” which was forcefully and effectively debunked by Gould and Lewontin (34) in 1979 but still informs much of molecular
and evolutionary genetics, including genomics. As Lynch (39) argues in his essay “The Frailty of Adaptive Hypotheses for the Origins of Adaptive Complexity,”

This narrow view of evolution has become untenable in light of recent observations from genomic sequencing and population genetic theory. Numerous aspects of genomic architecture, gene structure, and developmental pathways are difficult to explain without invoking the nonadaptive forces of genetic drift and mutation. In addition, emergent biological features such as complexity, modularity, and evolvability, all of which are current targets of considerable speculation, may be nothing more than indirect by-products of processes operating at lower levels of organization.

Functional attribution under ENCODE is of this third sort (mere existence) in the main. Although FEs as defined by ENCODE might be cross-identified by several methods and even evolutionarily conserved, they could most often be the molecular equivalent of spandrels: structured elements that are the indirect consequence of selection operating on other features but are themselves selectively neutral, a form of structured noise. Demonstrations that some biochemical signatures are not neutral and may even meet SE criteria say nothing about the rest and are, of course, expected as long as opportunism and co-optation are understood to be key elements in evolution.

In taking such a liberal definitional course, ENCODE follows the lead of the Gene Ontology (GO) project, which defines molecular function in decontextualized nonhistorical terms (40):

Molecular function describes activities, such as catalytic or binding activities, that occur at the molecular level. GO molecular function terms represent activities rather than the entities (molecules or complexes) that perform the actions, and do not specify where or when, or in what context, the action takes place.

Proponents of ENCODE actually are concerned with what they call “functional validation” but principally seem worried about the danger of mistaking functional specificity (recognition or activity) rather than the risk of attributing function (especially regulatory function) where none exists in the SE sense. Stamatoyannopoulos (41) writes:

These examples illustrate a natural temptation to equate activity with patterning of epigenomic features. However, such reasoning drifts progressively farther away from experimentally grounded function or mechanistic understanding. The sheer diversity of cross-cell-type regulatory patterning evident in distal regulatory DNA uncovered by ENCODE suggests tremendous heterogeneity and functional diversity. ENCODE is thus in a unique position to promote clearer terminology that separates the identification of functional elements per se from the ascription of specific functional activities using historical experimentally defined categories, and also to dissuade the ascription of very specific functions based on a biochemical signature in place of a deeper mechanistic understanding.

Functions of Classes and Parts. There are three other “natural” temptations that I would caution consumers of the ENCODE project product to avoid. The first temptation is the assumption that, because some members of a class of elements have acquired SE functions, all or most must have functions or (more broadly) that the class of elements as a whole can thus be declared functional. Stamatoyannopoulos (41), for instance, writes:

In marked contrast to the prevailing wisdom, ENCODE chromatin and transcription studies now suggest that a large number of transposable elements encode highly cell type-selective regulatory DNA that controls not only their own cell-selective transcription, but also those of neighboring genes. Far from an evolutionary dustbin, transposable elements appear to be active and lively members of the genomic regulatory community, deserving of the same level of scrutiny applied to other genic or regulatory features.

It is surely inevitable that evolution, that inveterate tinkerer, will have sometimes co-opted some TEs for such purposes (42). However, it is an overenthusiastic extrapolation to describe TEs as a class as “active and lively members of the genomic regulatory community.”

Moreover, the word “regulation” has itself degraded through use by genomicists, from designating evolved effects shown or likely to enhance fitness, presumably by efficient control of the use of resources, to more broadly denoting any measurable impact of one element or process on other elements or processes, regardless of fitness consequences. I think this broadening of definition misleads biologists such as Barroso (43) in a passage cited later in this essay. Pacemakers regulate heartbeats and that is their function: tasers and caffeine also affect cardiac rhythm, but we would not (at least in the former case) see this as regulatory function.

Regulation, defined in this loose way, is, for instance, the assumed function of many or most lncRNAs, at least for some authors (6, 7, 44, 45). However, the transcriptional machinery will inevitably make errors: accuracy is expensive, and the selective cost of discriminating against all false promoters will be too great to bear. There will be lncRNAs with promoters that have arisen through drift and exist only as noise (46). Similarly, binding to proteins and other RNAs is something that RNAs do. It is inevitable that some such interactions, initially fortuitous, will come to be genuinely regulatory, either through positive selection or the neutral process described below as constructive neutral evolution (CNE). However, there is no evolutionary force requiring that all or even most do. At another (sociology of science) level, it is inevitable that molecular biologists will search for and discover some of those possibly quite few instances in which function has evolved and argue that the function of lncRNAs as a class of elements has, at last, been discovered. The positivist, verificationist bias of contemporary science and the politics of its funding ensure this outcome.

However, what is the correct conceptual framework here? Why should either function or nonfunction for a class of elements be taken as the null hypothesis, and why should evidence for or against function, however defined, be taken as support of one or the other? Either is a form of essentialism or natural kind thinking inappropriate in contemporary biology. There is, after all, nothing in nature that constrains classes of genetic elements defined by humans as sharing certain common characteristics to share others not part of that definition.

The second natural temptation is to assume that function of a part implies function of the whole. Co-optation of the promoters of TEs or the insertion of bona fide regulatory sequences into introns is taken to impart function to the TE or intron as a whole. However, the cell does not necessarily see the rest of the TE or the intronic surround of an embedded enhancer as relevant to its activity, and only the promoter or enhancer sequence may be under selection. Even when an entire genetic element seems relevant or necessary (whole introns must be removed even if only certain sites are active in their removal), there is the possibility of excess baggage or junk-like character. Do enhancer-harboring introns really need to be so long?

Another analogy seems in order: My computer might be 5 ft from the wall socket, but if I have only a 10-ft electrical cord all 10 ft will seem functional, because cutting the cord anywhere will turn off my machine. In this connection, note that much more than one-half of the 80.4% of the human genome that ENCODE deems functional is so considered because it is transcribed (4), most often into an intron or lncRNA, only a tiny fraction of the length of which is likely to be involved in potentially regulatory interactions.
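The part-versus-whole point lends itself to a small worked example. The sketch below is illustrative only: the intron length and the site coordinates are invented for the purpose (they are not ENCODE data), and `covered_fraction` is a hypothetical helper rather than a function from any genomics library. It contrasts labeling an entire transcribed intron as functional with counting only the bases that fall inside annotated sites.

```python
def covered_fraction(region_length, sites):
    """Fraction of a region's bases that fall inside at least one site.

    sites: list of (start, end) half-open intervals, 0-based, within the region.
    Overlapping intervals are counted once.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(sites):
        start = max(start, last_end)  # skip bases already counted
        if end > start:
            covered += end - start
            last_end = end
    return covered / region_length


# Hypothetical 50-kb intron with three annotated sites, two of which overlap.
intron_length = 50_000
enhancer_sites = [(12_000, 12_300), (12_200, 12_500), (40_000, 40_200)]

# "The intron is transcribed, so all of it is functional."
whole_element = 1.0
# Only the bases inside annotated sites are counted.
site_only = covered_fraction(intron_length, enhancer_sites)

print(f"whole-element attribution: {whole_element:.1%}")
print(f"site-only attribution:     {site_only:.1%}")
```

With these made-up numbers, site-only attribution covers 1.4% of the intron, whereas whole-element attribution counts all of it. Like the 10-ft cord, the difference lies in the bookkeeping rule and not in the DNA.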

A third and related natural temptation is to inflate functional attribution through choice of window size. The ENCODE Project Consortium notes (5) that:

The vast majority (80.4%) of the human genome participates in at least one biochemical RNA- and/or chromatin-associated event in at least one cell type. Much of the genome lies close to a regulatory event: 95% of the genome lies within 8 kilobases (kb) of a DNA–protein interaction (as assayed by bound ChIP-seq motifs or DNase I footprints), and 99% is within 1.7 kb of at least one of the biochemical events measured by ENCODE.

Even this last and largest percentage, assuming the average biochemical event to directly involve tens to hundreds of bases (41), assigns functions to only a minority (perhaps 10%) of the genome’s base pairs. ENCODE’s data are indeed unexpected and impressive, but without some principled and agreed-on metric for functional density and FE boundaries, any answer to the question “how much of the genome is functional?” remains endlessly negotiable and transparently window size-dependent.

Future Function. In her News and View editorial in the September 6, 2012 issue of Nature, Barroso (43) speculates as follows concerning the vast majority of human DNA, until now thought useless:

... there is a good reason to keep this DNA. Results from the ENCODE project show that most of these stretches of DNA harbor regions that bind proteins and RNA molecules, bringing these into positions from which they cooperate with each other to regulate the function and level of expression of protein-coding genes. In addition, it seems that widespread transcription from non-coding DNA potentially acts as a reservoir for the creation of new functional molecules, such as regulatory RNAs.

In addition to equating regulation with having an effect, Barroso (43) revives here the notion that excess DNA is not junk because it may some day be of use, and thus it is maintained as a reservoir. Brenner (47) long ago derided this sort of reasoning as follows:

There is a strong and widely held belief that all organisms are perfect and that everything within them is there for a function. Believers ascribe to the Darwinian natural selection process a fastidious prescience that it cannot possibly have and some go so far as to think that patently useless features of existing organisms are there as an investment for the future...

[...] traits that incurs selective cost at that level, while offering only the remotest hope of future benefit to the individual and its descendants.

Other evolutionary mechanisms can. The publications by Lynch et al. (24) and Lynch (39, 49) have effectively argued that drift operating in very small populations will (by chance) encourage accumulation of DNA that can add to C-value and might, in the future, come in handy. Selection at the suborganismal (selfish DNA) level may also seem to be future-directed: TEs do sometimes later become useful through co-optation or general effects on the generation of novelty (42, 50). Indeed, Fedoroff (51) has recently proposed that TEs should not be described pejoratively as “selfish” and that the prevailing view (that the epigenetic silencing mechanisms that eukaryotes use to limit TE replication arose to do just that) puts the evolutionary cart before the horse. Rather, Fedoroff (51) suggests that these mechanisms, by also limiting recombination between repeated TEs (which makes genomes smaller), allow TEs to accumulate, growing genomes and that this was a “...critical step in the evolution of multicellular organisms, underpinning the ability to diversify duplicates for expression in specific cells and tissues” (51). Fedoroff (51) concludes that

On balance, then, the likelihood that contemporary eukaryotic genomes evolved in the context of epigenetic mechanisms seems vastly greater than the likelihood that they were invented as an afterthought to combat a plague of parasitic transposons.

However, evolutionary explanations of genome structure need not be either/or in this sense, after it is recognized that selection affecting the genome operates at all (including supraorganismal) levels of the biological hierarchy (12, 27). TEs can be selected through selfish replication at the level of DNA, while selection at the level of organisms has established silencing mechanisms to rein in TE replication, and selection at the level of clades has looked favorably on those clades with complex genomes that engender evolutionary novelty and render whole-clade extinction less likely. (That is, clades that have TE-rich dynamic genomes may have indeed because of that produced [...] a species, TE-bearing species in a genus, or classes comprising such species in a phylum). Indeed, one could reasonably argue for an eventual expansion of the SE definition of “function” to include all levels as long as we distinguish them (DNA-level function, organism-level function, clade-level function, and so forth). This definitional expansion might well lead to a reduction in the amount of DNA that we could reasonably call junk. However, I think such an expanded idea of function does not currently inform ENCODE or most genomic thinking, and in any case, the problems of inference, evidence, and appropriate null hypotheses remain.

Function as a Diffusible Quality. Quite often, we can intuit the results of a thought experiment before articulating the reasons for our intuition. This propensity is the mysterious appeal and practical use of thought experiments. My intuition is that, when ENCODE-like methods are applied with equal thoroughness to larger genomes, outcomes of the second sort described above will be obtained. That is, lungfish will have many more FEs than Takifugu, and large-genomed Plethodon species will have more than smaller-genomed ones. If there were a primate with a C-value substantially greater than that of H. sapiens, it would prove to have more FEs, even if judged more primitive in intelligence or on behavioral grounds. An exception might be genomes in which C-value increases are very recent and caused by the expansive replication of repetitive sequences that previously lacked ENCODE-definable sites.

Assuming these predictions are borne out, what might we make of it? Lynch (39) suggests that much of the genomic- and systems-level complexity of eukaryotes vis-à-vis prokaryotes is maladaptive, reflecting the inability of selection to block fixation of incrementally but mildly deleterious mutations in the smaller populations of the former. Thus, for instance, the fact that eukaryotic molecular machines comprise more interacting subunits than their prokaryotic counterparts reflects the inability of selection operating on smaller populations to enforce functional efficiency. Additionally, to be sure, the proliferation of
more and more interestingly diverse and short- and long-range molecular interac- One sees similar Panglossian futuristic evolutionarily robust and evolvable descen- tions deleteriously interposing themselves speculation in some of the recent literature dant species.) Evolutionary forces operate si- within previously simpler regulatory net- on robustness and evolvability (critique in multaneously at all levels in the same and works will be harder to stop in small than ref. 48). However, it cannot in general be different directions with differing strengths large populations. the case that selection operating at the and results measurable in different units Several recent publications (52–55) have level of fitness of individuals within a spe- (such as the frequency of TEs in an indi- revived the argument that CNE is just such cies can favor the origin or maintenance of vidual genome, TE-enhanced individuals in an interpositional force, possibly a very

5298 | www.pnas.org/cgi/doi/10.1073/pnas.1221376110 Doolittle

powerful one. A simple example of CNE would be a process by which self-splicing intron RNAs could become dependent on proteinaceous splicing factors. Initially, the RNA secondary structure is sufficient to catalyze self-removal, but fortuitously bound proteins that stabilize the RNA can compensate for (presuppress) mutations that might destabilize elements of the structure necessary for independent splicing. Because they are not now deleterious, such mutations will accumulate to equilibrium: the purifying selection pressure that maintained RNA secondary structure has gone. No selection is involved at this stage. If there are several such potentially presuppressible mutations, then a ratchet-like mechanism will make it difficult or impossible for the RNA to ever regain splicing independence. Elimination of the protein will become a lethal event. Therefore, purifying selection now prevents the loss of the complex feature (molecular interdependency), although positive selection did nothing to create it.

The entire multiprotein, multi-RNA, eukaryotic spliceosome might have evolved through reiterations of this process along with very many of the intricacies of the cellular machinery (52–55). CNE would work in concert with any population-size effect. Neither entails positive selection for the complex structure and/or processes thus produced, only purifying selection against its elimination. Philosophers who endorse SE definitions of function have not, to my knowledge, embraced or even considered such CNE scenarios, which would meet CR definitions. Considering traits fixed by CNE to have SE function would add still more arrows to the quiver of panadaptationism and should perhaps be discouraged.

A common consequence of CNE is that even structures or processes that have arisen by positive selection because they increase organismal fitness will later become more complex in terms of the number of intermolecular interactions required for their successful completion. Function diffuses. Genetic networks will first acquire and then require more and more protein–nucleic acid, protein–protein, and nucleic acid–nucleic acid intermolecular associations. Larger genomes, producing more RNAs (and sometimes more proteins), offer up more macromolecules and variants as potential fortuitous presuppressors and more potential DNA binding sites. Recognition systems for transcription and binding cannot be made indefinitely more accurate without the aid of unbearably slow and selectively costly proofreading systems. Tradeoffs between speed, economy, and accuracy will unavoidably entail that larger genomes will produce disproportionately more noise in terms of fortuitous transcripts capable of becoming presuppressors and, thus, more complex, seemingly regulatory, networks of interaction. Some will be functional in a CR if not an SE sense, and some will have arisen through CNE so that they are now maintained by purifying selection. However, many, possibly the vast majority, are just there.

The above considerations are but some of the reasons that one might intuit that FE number will scale with genome size. A very recent comparative analysis of transcription factor binding sites in model organisms (46) confirms this conjecture. Ruths and Nakhleh (46) claim that

“...neutral evolutionary forces alone can explain binding site accumulation, and that selection on the regulatory network does not alter this finding. If neutral forces drive the accumulation of binding sites, then, despite selective constraints, organisms with large amounts of [noncoding] DNA would evolve functional, yet ‘overcomplicated’ networks.”

So Is Junk Bunk?

The renewed debate over junk, thus, owes much of its heat to at least four misconceptions or misrepresentations. First is the pretense that there is any definable boundary between informational and structural (genic and nongenic) functions for DNA. Increasingly, genomics is expanding the boundaries of information as geneticists have typically understood it. Minimally, gene means more than it used to mean. Djebali et al. (56) write

“...the determination of genic regions is currently defined by the cumulative lengths of the isoforms and their genetic association to phenotypic characteristics, the likely continued reduction in the lengths of intergenic regions will steadily lead to the overlap of most genes previously assumed to be distinct genetic loci. This supports and is consistent with earlier observations of a highly interleaved transcribed genome, but more importantly, prompts the reconsideration of the definition of a gene. As this is a consistent characteristic of annotated genomes, we would propose that the transcript be considered as the basic atomic unit of inheritance. Concomitantly, the term gene would then denote a higher-order concept intended to capture all those transcripts (eventually divorced from their genomic locations) that contribute to a given phenotypic trait.”

However, regulatory loci are also informational even if not transcribed, and ENCODE has documented many long-range interactions between chromosomal regions that may be brought together physically in the nucleus, a very complex and structure-rich molecular machine, at some time during the cell cycle. Therefore, in this sense, the gross structure of the chromosome set also carries information that may be relevant to the function of genes, broadly defined. Additionally, all DNA has the job of serving as a template for its own replication, to that extent encoding information.

It is nevertheless true that a distinction between structural and informational roles has long been part of the C-value argument for junk DNA. This line of reasoning has held that high C-value might be necessary for cellular function, but the nongenic DNA that fills the requirement is informationally junk. ENCODE’s claim is that much more of the DNA is, in fact, informational (especially regulatory) than we had thought, and indeed ENCODE’s focus is on sites likely to be involved directly or indirectly in transcription, “on the myriad elements that determine gene expression” to quote The Lancet (3). Therefore, the structure–information distinction informs the interpretation of the project’s results, and without it there would be nothing novel or newsworthy in the assertion that all of the human genome has some sort of role in human biology. We have known that since the mid-1980s.

Second is the conflation of SE and CR definitions of function. Those of us who speak of excess DNA as informationally junk mean that its presence is not to be explained by past and/or current selection at the level of organisms, that is, that it has no informational function construable historically as an SE. Those who say that almost the whole of the human genome is functional informationally do so on the basis of an operational diagnosis embracing a nonhistorical CR definition of function. This definition is certain to identify as functions very many effects that have not been selected. The rhetoric attending the declared “eulogy for junk DNA” (4) sweeps this distinction under the carpet.

Third is a false natural kind ontology, essentialist in nature, that encourages (i) the attribution to a whole class of operationally defined genetic elements of those functions known only for a few and/or (ii) the attribution to the whole length of such a genetic element of a function that resides in only part of it. In the case of lncRNAs and intron transcripts, whose lengths together make up more than three-quarters of the 80% of the genome said to be functional, this second sort of functional attribution seems especially misleading.

Fourth may be a seldom-articulated or -questioned notion that cellular complexity is adaptive, the product of positive selection

at the organismal level. Our disappointment that humans do not have many more genes than fruit flies or nematodes has been assuaged by evidence that regulatory mechanisms that mediate those genes’ phenotypic expressions are more various, subtle, and sophisticated (57), evidence of the sort that ENCODE seems to vastly augment. Yet there are nonselective mechanisms, such as CNE, that could result in the scaling of FEs, as ENCODE defines them, to C-value nonadaptively or might be seen as selective at some level higher or lower than the level of individual organisms. Splits within the discipline between panadaptationists and neutralists, and between those researchers accepting and those doubting the importance of multilevel selection, fuel this controversy and others in biology.

I submit that, up until now, junk has been used to denote DNA whose presence cannot reasonably be explained by natural selection at the level of the organism for encoded informational roles. There remain good reasons to believe that much of the DNA of many species fits this definition. Nevertheless, while still insisting on SE functionality, we might want to come up with new definitions of function and junk by (i) abandoning the distinction between informational and nucleoskeletal or nucleotypic roles for DNA, (ii) admitting that there may be strong selection for C-value as a determinant of many cell biological features, (iii) fully embracing hierarchical selection theory and acknowledging that different genomic features may have legitimate functions defined and in play at different levels, and (iv) expanding the SE definition of function to include traits that arise neutrally but are preserved by purifying selection (12). Much that we now call junk could then become functional. However, such a philosophically informed theoretical expansion is not what ENCODE, or at least those authors stressing the demise of junk, so far seem to have in mind (1–5).

In the end, of course, there is no experimentally ascertainable truth of these definitional matters other than the truth that many of the most heated arguments in biology are not about facts at all but rather about the words that we use to describe what we think the facts might be. However, that the debate is in the end about the meaning of words does not mean that there are not crucial differences in our understanding of the evolutionary process hidden beneath the rhetoric.

Note Added in Proof. I note that a very forcefully worded critique by Graur et al. (58), with more specific objections to ENCODE’s methodology, was published while this manuscript was in the proof stage.

ACKNOWLEDGMENTS. I thank Evelyn Fox Keller, Eric Bapteste, James McInerney, Andrew Roger, and the Centre for Comparative Genomics and Evolutionary Bioinformatics (CGEB) group for comments on this manuscript and the Canadian Institutes for Health Research for support.

1 Kolata G (September 5, 2012) Bits of mystery DNA, far from ‘junk’, play crucial role. The New York Times, Section A, p. 1.
2 Ecker JR, et al. (2012) Genomics: ENCODE explained. Nature 489(7414):52–55.
3 Anonymous (2012) Cracking ENCODE. Lancet 380(9846):950.
4 Pennisi E (2012) Genomics. ENCODE project writes eulogy for junk DNA. Science 337(6099):1159–1161.
5 Dunham I, et al. (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489(7414):57–74.
6 Rinn JL, Chang HY (2012) Genome regulation by long noncoding RNAs. Annu Rev Biochem 81:145–166.
7 Barry G, Mattick JS (2012) The role of regulatory RNA in cognitive evolution. Trends Cogn Sci 16(10):497–503.
8 Eddy SR (2012) The C-value paradox, junk DNA and ENCODE. Curr Biol 22(21):R898–R899.
9 Niu DK, Jiang L (2013) Can ENCODE tell us how much junk we carry in our genome? Biochem Biophys Res Commun 430(4):1340–1343.
10 Ohno S (1972) So much “junk” DNA in our genome. Brookhaven Symp Biol 23:366–370.
11 Gregory TR (2001) Coincidence, coevolution, or causation? DNA content, cell size, and the C-value enigma. Biol Rev Camb Philos Soc 76(1):65–101.
12 Gregory TR (2004) Macroevolution, hierarchy theory and the C-value enigma. Paleobiology 30:179–202.
13 Cavalier-Smith T (2005) Economy, speed and size matter: Evolutionary forces driving nuclear genome miniaturization and expansion. Ann Bot (Lond) 95(1):147–175.
14 Gregory TR (2005) Synergy between sequence and size in large-scale genomics. Nat Rev Genet 6(9):699–708.
15 Metcalfe CJ, Filée J, Germon I, Joss J, Casane D (2012) Evolution of the Australian lungfish (Neoceratodus forsteri) genome: A major role for CR1 and L2 LINE elements. Mol Biol Evol 29(11):3529–3539.
16 Brenner S, et al. (1993) Characterization of the pufferfish (Fugu) genome as a compact model vertebrate genome. Nature 366(6452):265–268.
17 Gregory TR (2003) Variation across amphibian species in the size of the nuclear genome supports a pluralistic, hierarchical approach to the C-value enigma. Biol J Linn Soc Lond 79:329–339.
18 Mizuno S, Macgregor HC (1974) Chromosomes, DNA sequences, and evolution in salamanders of the genus Plethodon. Chromosoma 48:239–296.
19 Schnable PS, et al. (2009) The B73 maize genome: Complexity, diversity, and dynamics. Science 326(5956):1112–1115.
20 Cavalier-Smith T (1978) Nuclear volume control by nucleoskeletal DNA, selection for cell volume and cell growth rate, and the solution of the DNA C-value paradox. J Cell Sci 43:247–278.
21 Doolittle WF, Sapienza C (1980) Selfish genes, the phenotype paradigm and genome evolution. Nature 284(5757):601–603.
22 Orgel LE, Crick FHC (1980) Selfish DNA: The ultimate parasite. Nature 284(5757):604–607.
23 González J, Petrov DA (2012) Evolution of genome content: Population dynamics of transposable elements in flies and humans. Methods Mol Biol 855:361–383.
24 Lynch M, Bobay LM, Catania F, Gout JF, Rho M (2011) The repatterning of eukaryotic genomes by random genetic drift. Annu Rev Genomics Hum Genet 12:347–366.
25 Fernández A, Lynch M (2011) Non-adaptive origins of interactome complexity. Nature 474(7352):502–505.
26 Zuckerkandl E (1986) Polite DNA: Functional density and functional compatibility in genomes. J Mol Evol 24(1–2):12–27.
27 Doolittle WF (1987) What introns have to tell us: Hierarchy in genome evolution. Cold Spring Harb Symp Quant Biol 52:907–913.
28 Garson J (2011) Selected effects and causal role functions in the brain: The case for an etiological approach to neuroscience. Biol Philos 26:547–565.
29 Godfrey-Smith P (1994) A modern history theory of functions. Nous 28:344–362.
30 Schwartz PH (1999) Proper function and recent selection. Philos Sci 66:S210–S222.
31 Ayala FJ (1970) Teleological explanations in evolutionary biology. Philos Sci 37:1–15.
32 Scharff C, Petri J (2011) Evo-devo, deep homology and FoxP2: Implications for the evolution of speech and language. Philos Trans R Soc Lond B Biol Sci 366(1574):2124–2140.
33 Gould SJ, Vrba E (1982) Exaptation – a missing term in the science of form. Paleobiology 8:4–15.
34 Gould SJ, Lewontin RC (1979) The spandrels of San Marco and the Panglossian paradigm: A critique of the adaptationist programme. Proc R Soc Lond B Biol Sci 205(1161):581–598.
35 Noller HF, Woese CR (1981) Secondary structure of 16S ribosomal RNA. Science 212(4493):403–411.
36 Ward LD, Kellis M (2012) Evidence of abundant purifying selection in humans for recently acquired regulatory functions. Science 337(6102):1675–1678.
37 Ponting CP, Hardison RC (2011) What fraction of the human genome is functional? Genome Res 21(11):1769–1776.
38 La Spada AR, Taylor JP (2010) Repeat expansion disease: Progress and puzzles in disease pathogenesis. Nat Rev Genet 11(4):247–258.
39 Lynch M (2007) The frailty of adaptive hypotheses for the origins of organismal complexity. Proc Natl Acad Sci USA 104(Suppl 1):8597–8604.
40 An introduction to the gene ontology. Available at http://www.geneontology.org/GO.doc.shtml#molecular_function. Accessed December 11, 2012.
41 Stamatoyannopoulos JA (2012) What does our genome encode? Genome Res 22(9):1602–1611.
42 Kines KJ, Belancio VP (2012) Expressing genes do not forget their LINEs: Transposable elements and gene expression. Front Biosci 17:1329–1344.
43 Barroso I (2012) Non-coding but functional. Nature 489:54.
44 Carninci P (2010) RNA dust: Where are the genes? DNA Res 17(2):51–59.
45 van Leeuwen S, Mikkers H (2010) Long non-coding RNAs: Guardians of development. Differentiation 80(4–5):175–183.
46 Ruths T, Nakhleh L (2012) ncDNA and drift drive binding site accumulation. BMC Evol Biol 12:159.
47 Brenner S (1998) Refuge of spandrels. Curr Biol 8(19):R669.
48 Pigliucci M (2008) Is evolvability evolvable? Nat Rev Genet 9(1):75–82.
49 Lynch M (2007) The Origins of Genome Architecture (Sinauer, Sunderland, MA).
50 Faulkner GJ, Carninci P (2009) Altruistic functions for selfish DNA. Cell Cycle 8(18):2895–2900.
51 Fedoroff NV (2012) Presidential address. Transposable elements, epigenetics, and genome evolution. Science 338(6108):758–767.
52 Gray MW, Lukes J, Archibald JM, Keeling PJ, Doolittle WF (2010) Cell biology. Irremediable complexity? Science 330(6006):920–921.
53 Lukes J, Archibald JM, Keeling PJ, Doolittle WF, Gray MW (2011) How a neutral evolutionary ratchet can build cellular complexity. IUBMB Life 63(7):528–537.
54 Stoltzfus A (2012) Constructive neutral evolution: Exploring evolutionary theory’s curious disconnect. Biol Direct 7:35.
55 Doolittle WF (2012) Evolutionary biology: A ratchet for protein complexity. Nature 481(7381):270–271.
56 Djebali S, et al. (2012) Landscape of transcription in human cells. Nature 489(7414):101–108.
57 Frazer KA (2012) Decoding the human genome. Genome Res 22(9):1599–1601.
58 Graur D, et al. (2013) On the immortality of television sets: “Function” in the human genome according to the evolution-free gospel of ENCODE. Genome Biol Evol, 10.1093/gbe/evt028.
