25.5 MH 22/5/06 2:53 PM Page 398

NEWS FEATURE |Vol 441|25 May 2006 C DARKIN

© 2006 Nature Publishing Group 25.5 gene MH 22/5/06 2:53 PM Page 399

NATURE|Vol 441|25 May 2006 NEWS FEATURE C. DARKIN WHAT IS A GENE? The idea of as beads on a DNA string is fast fading. Protein-coding sequences have no clear beginning or end and RNA is a key part of the information package, reports Helen Pearson.

ene’ is not a typical four-letter Laurence Hurst at the University of Bath, UK. viously unimagined scope of RNA. word. It is not offensive. It is never “All of that information seriously challenges The one gene, one protein idea is coming bleeped out of TV shows. And our conventional definition of a gene,” says under particular assault from researchers who ‘Gwhere the meaning of most four- molecular biologist Bing Ren at the University are comprehensively extracting and analysing letter words is all too clear, that of gene is not. of California, San Diego. And the information the RNA messages, or transcripts, manufac- The more expert scientists become in molecu- challenge is about to get even tougher. Later tured by genomes, including the human and lar , the less easy it is to be sure about this year, a glut of data will be released from mouse genome. Researchers led by Thomas what, if anything, a gene actually is. the international Encyclopedia of DNA Ele- Gingeras at the company Affymetrix in Santa Rick Young, a geneticist at the Whitehead ments (ENCODE) project. The pilot phase of Clara, California, for example, recently studied Institute in Cambridge, Massachusetts, says ENCODE involves scrutinizing roughly 1% of all the transcripts from ten chromosomes that when he first started teaching as a young the in unprecedented detail; across eight human cell lines and worked out professor two decades ago, it took him about the aim is to find all the precisely where on the chro- two hours to teach fresh-faced undergraduates sequences that serve a useful “We’ve come to the mosomes each of the tran- what a gene was and the nuts and bolts of how purpose and explain what realization that the scripts came from3. it worked. Today, he and his colleagues need that purpose is. “When we genome is full of The picture these studies three months of lectures to convey the concept started the ENCODE project paint is one of of the gene, and that’s not because the students I had a different view of overlapping transcripts.” mind-boggling complexity. are any less bright. “It takes a whole semester what a gene was,” says con- — Phillip Kapranov Instead of discrete genes to teach this stuff to talented graduates,” Young tributing researcher Roderic dutifully mass-producing says. “It used to be we could give a one-off def- Guigo at the Center for Genomic Regulation identical RNA transcripts, a teeming mass of inition and now it’s much more complicated.” in Barcelona. “The degree of complexity we’ve transcription converts many segments of the In classical genetics, a gene was an abstract seen was not anticipated.” genome into multiple RNA ribbons of differing concept — a unit of inheritance that ferried a lengths. These ribbons can be generated from characteristic from parent to child. As bio- Under fire both strands of DNA, rather than from just one chemistry came into its own, those character- The first of the complexities to challenge molec- as was conventionally thought. Some of these istics were associated with enzymes or proteins, ular biology’s paradigm of a single DNA transcripts come from regions of DNA previ- one for each gene. And with the advent of mol- sequence encoding a single protein was alterna- ously identified as holding protein-coding ecular biology, genes became real, physical tive splicing, discovered in viruses in 1977 (see genes. But many do not. “It’s somewhat revolu- things — sequences of DNA which when con- ‘Hard to track’, overleaf). Most of the DNA tionary,” says Gingeras’s colleague Phillip verted into strands of so-called messenger sequences describing proteins in humans have a Kapranov. “We’ve come to the realization that RNA could be used as the basis for building modular arrangement in which exons, which the genome is full of overlapping transcripts.” their associated protein piece by piece. The carry the instructions for making proteins, are Other studies, one by Guigo’s team4, and one great coiled DNA molecules of the chromo- interspersed with non-coding introns. In alter- by geneticist Rotem Sorek5, now at Tel Aviv somes were seen as long strings on which gene native splicing, the cell snips out introns and University, Israel, and his colleagues, have sequences sat like discrete beads. sews together the exons in various different hinted at the reasons behind the mass of tran- This picture is still the working model for orders, creating messages that can code for dif- scription. The two teams investigated occa- many scientists. But those at the forefront of ferent proteins. Over the years geneticists have sional reports that transcription can start at a genetic research see it as increasingly old-fash- also documented overlapping genes, genes DNA sequence associated with one protein ioned — a crude approximation that, at best, within genes and countless other weird arrange- and run straight through into the gene for a hides fascinating new complexities and, at ments (see ‘Muddling over genes’, overleaf). completely different protein, producing a worst, blinds its users to useful new paths Alternative splicing, however, did not in itself fused transcript. By delving into databases of of enquiry. require a drastic reappraisal of the notion of a human RNA transcripts, Guigo’s team esti- Information, it seems, is parceled out along gene; it just showed that some DNA sequences mate that 4–5% of the DNA in regions con- chromosomes in a much more complex way could describe more than one protein. Today’s ventionally recognized as genes is transcribed than was originally supposed. RNA molecules assault on the gene concept is more far reach- in this way. Producing fused transcripts could are not just passive conduits through which the ing, fuelled largely by studies that show the pre- be one way for a cell to generate a greater vari- gene’s message flows into the world but active ety of proteins from a limited number of regulators of cellular processes. In some cases, exons, the researchers say. RNA may even pass information across gener- Many scientists are now starting to think ations — normally the sole preserve of DNA. that the descriptions of proteins encoded in PLAILLY/SPL P. An eye-opening study last year raised the DNA know no borders — that each sequence possibility that plants sometimes rewrite their reaches into the next and beyond. This idea DNA on the basis of RNA messages inherited will be one of the central points to emerge from generations past1. A study on page 469 of from the ENCODE project when its results are this issue suggests that a comparable phenom- published later this year. enon might occur in mice, and by implication Kapranov and others say that they have doc- in other mammals 2. If this type of phenome- umented many examples of transcripts in non is indeed widespread, it “would have huge Spools of DNA (above) still harbour surprises, with which protein-coding exons from one part of implications,” says evolutionary geneticist one protein-coding gene often overlapping the next. the genome combine with exons from another

399 © 2006 Nature Publishing Group 25.5 gene MH 22/5/06 2:53 PM Page 400

NEWS FEATURE NATURE|Vol 441|25 May 2006

part that can be hundreds of thousands of when scientists have searched for the genetic Hard to track bases away, with several other ‘genes’ in basis of a disease or other characteristic they 1860s After playing between. This continuum of genes might have overwhelmingly found the underlying with pea plants, even spill over the boundaries of chromo- mutation to be in a protein-coding gene Austrian monk somes: last year, Richard Flavell at Yale Uni- rather than in another region. “The prepon- Gregor Mendel versity School of Medicine in New Haven, derance of evidence suggests that protein- defines the basic Connecticut, documented human immune- coding genes will hold their own when the rules of inheritance. system genes that seem to be controlled by day is over,” Hogenesh says. Traits are regulatory regions from another chromo- Some of the recent discoveries — that the determined by some6. “Discrete genes are starting to van- human genome makes a continuum of tran- discrete units that ish,” Guigo says. “We have a continuum of scripts and that cells produce masses of non- are passed from one transcripts.” coding RNA molecules — have not posed generation to the next. much of a problem to people outside the ABBEY OF ST THOMAS, BRNO, CZECH REPUBLIC CZECH THOMAS, BRNO, ABBEY OF ST Slippery concept world of molecular biology. Population 1909 Danish botanist Wilhelm Johanssen The large transcriptional surveys suggest geneticists can examine how a trait is passed coins the word ‘gene’ for the unit associated that a vast amount of the RNA manufactured down and evolves regardless of the precise with an inherited trait, although the physical basis remains unknown. by the mouse and human genomes do not molecular mechanism that underlies it. For code for proteins. Last year a consortium of example, geneticists can build models show- 1910 Thomas researchers in Japan, for example, estimated ing how a mutation is inherited whether it Morgan’s work on that a whopping 63% of the mouse genome is affects a protein, a non-coding RNA or a reg- 7,8 fruitflies (right), transcribed ; only 1–2% of the genome is ulatory region. “I don’t actually care if it’s shows that genes sit thought to be spanned by sequences that making a protein or not,” says Hurst. “The on chromosomes, contain everyday exons. equations are still the same.” leading to the idea The discovery of RNA sequences that But the same can’t be said for studies of genes as beads aren’t just intermediates between the DNA revealing so-called extragenomic modes of GRAPHIC SCIENCE/ALAMY GRAPHIC SCIENCE/ALAMY on a string. and the protein-making machinery is not inheritance. In recent years, many investiga- new in itself; the cell’s protein-building appa- tors have focused on epigenetic inheritance, 1941 George Beadle ratus requires a number of RNA molecules in which information is passed from parent and Edward Tatum introduce the concept that as well as proteins to operate. But the finding to offspring independent of the DNA one gene makes one enzyme. of ‘microRNAs’ and other RNA molecules sequence. And this week in Nature (see page now known to be vital in controlling many 469), Minoo Rassoulzadegan’s team at the 1944 Genes are made of DNA, find Oswald cellular processes in plants and animals, and French National Institute for Health and Avery (below), Colin MacLeod and Maclyn the newly revealed ferment of RNA tran- Medical Research (INSERM) in Nice, France, McCarty. scription, contributes to the view that RNA reports that RNA may sometimes be compli- actively processes and carries out the cating traditional models of inheritance. 1953 James Watson instructions in the genome. In mice, mutations in the Kit gene cause and Francis Crick publish the chemical Perhaps the regions that make non-coding white patches on the tail and feet; if a mouse structure of DNA; RNA should also carry the status of genes, if has one normal Kit gene and one mutated the central dogma of not the name itself. “I think it’s time for peo- one it will have the spots. The odd thing is molecular biology ple to take a deep breath “A lot of the that some of the offspring of emerges in which and step back,” says molec- such mice, who inherit two information flows ular biologist John Mattick information is being normal Kit genes, still have the from DNA to RNA of the University of transacted by RNA.” white tail. The French group to protein. Queensland in Brisbane, suggest that the mutant Kit TENNESSEE STATE LIB. & ARCHIVES/NLM/NIH STATE TENNESSEE — John Mattick Australia. “A lot of the gene manufactures abnormal 1977 Richard Roberts and Phillip Sharp information in the system is being transacted RNA molecules, which accumulate in sperm discover that genes can be split into segments, by RNA.” and pass into the egg. These bits of RNA leading to the idea that one gene can make Although functions have been identified somehow silence the normal Kit gene in the several proteins. for several RNA molecules, the crux of the next generation and subsequent ones, pro- debate now is the extent to which all the ducing the spotted-tail effect. “We are con- 1993 The first microRNA is identified in the extra RNA plays a part. It is conceivable that vinced that it’s a more general phenomenon,” worm Caenorhabditis elegans. it is easier to overtranscribe and ignore the says co-author Franc¸ois Cuzin. rubbish than to invest in systems that pro- If this is strange, the work reported last 2003 duce only what is needed. A study from last year1 on the cress plant Arabidopsis by GeneSweep: year, however, hints that at least some of the Robert Pruitt and his colleagues at Purdue Human mass of RNAs is doing something useful. University in West Lafayette, Indiana, is even geneticists come up with a Working at the Institute of the stranger. Here the gene involved is called definition for Novartis Research Foundation in San Diego, HOTHEAD. Pruitt and his co-workers’ protein-coding California, John Hogenesch and his co- analysis shows that some plants do not carry genes in order to workers systematically quenched the activity the mutant version of HOTHEAD that their decide on a winner for a bet on the number of of more than 500 non-coding RNAs in parents possessed. These plants had replaced human genes. The winner is announced, but human cells and found that eight were the abnormal DNA sequence with the regu- 9 geneticists acknowledge that they don’t know involved in cell signalling and growth . lar code possessed by earlier generations. the true answer. But Hogenesh, and many other scientists, “It’s like, whoa, this changes everything,” remain convinced that non-coding RNAs Pruitt says. “It definitely changes my view of 2006 The idea that human genes are one long are much less important, functionally, than inheritance.” continuum begins to emerge. those that describe proteins; in the past, Pruitt is now working to explain how the

400 © 2006 Nature Publishing Group 25.5 gene MH 22/5/06 2:53 PM Page 401

NATURE|Vol 441|25 May 2006 NEWS FEATURE

one means when they talk about genes because we don’t share the same definition,” says devel- opmental geneticist William Gelbert of Har-

M. ALICE WEBB vard University in Cambridge, Massachusetts. Without a clear definition of a gene, life is also difficult for bioinformaticians who want to use computer programs to spot landmark sequences in DNA that signal where one gene ends and the next begins. But reaching a con- sensus over the definition is virtually impossi- ble, as Karen Eilbeck can attest. Eilbeck, who works at the University of California in Berke- ley, is a coordinator of the Sequence Ontology consortium. This defines labels for landmarks within genetic-sequence databases of organ- isms, such as the mouse and fly, so that the databases can be more easily compared. The consortium tries, for example, to decide whether a protein-coding sequence should always include the triplet of DNA bases that mark its end. Eilbeck says that it took 25 scientists the bet- ter part of two days to reach a definition of a gene that they could all work with. “We had several meetings that went on for hours and everyone screamed at each other,” she says. The group finally settled on a loose definition that could accommodate everyone’s demands. (Since you ask: “A locatable region of genomic sequence, corresponding to a unit of inheri- Back-up copies: mutant DNA in the cress plant may be ‘corrected’ by inherited RNA. tance, which is associated with regulatory regions, transcribed regions and/or other plant could perform such a feat. One idea is promises to enrich — and complicate — the functional sequence regions.”) that they carry a back-up copy of their grand- notion of a gene yet further. Rather than striving to reach a single defin- parents’ genetic information encoded in RNA Leaving aside the can of worms that studies ition — and coming to blows in the process — that is passed into seeds along with the regular on epigenetics are beginning to open up, does most geneticists are instead incorporating less DNA and is then used as a template to ‘correct’ it matter that many scientists not directly con- ambiguous words into their vocabulary such certain genes. Conceivably, Pruitt says, some cerned with molecular mechanisms continue as transcripts and exons. When it is used, the of the mystery non-coding transcripts could to think of genetics in simpler terms? Some word ‘gene’ is frequently preceded by ‘protein- be responsible. “I think there’s something geneticists say yes. They worry that coding’ or another descriptor. “We almost being inherited outside what we think of as the researchers working with an oversimplistic have to add an adjective every time we use that conventional DNA genome.” idea of the gene could discard important noun,” says Francis Collins, director of the results that don’t fit. A medical researcher, for National Human Genome Research Institute Changing views example, might gloss over the many different at the National Institutes of Health in The implications of such findings for our transcripts generated by a sequence at one Bethesda, Maryland. understanding of evolution have yet to be fig- location. And the lack of a clear idea of what a But however much geneticists struggle to pin ured out. But research into the role of RNA as a gene is might also hinder collaboration. “I find down the elusive gene, it is precisely its ambigu- carrier of information across generations it sometimes very difficult to tell what some- ous nature that fuels their continued curiosity. “It’s ever more fascinating,” says Whitehead’s Young. Some things, it seems, are not best por- Muddling over genes trayed by a crude four-letter word. ■ Science philosophers Karola One is a DNA segment 500 biologists who Helen Pearson is a reporter working for Nature Stotz, at Indiana University that uses some of the same completed the questionnaire. in New York. in Bloomington, and Paul protein-coding sequences Stotz and Griffiths found that 1. Lolle, S. J., Victor, J. L., Young, J. M. & Pruitt. R. E. Nature Griffiths, now at the to manufacture two entirely 60% are typically sure of one 434, 505–509 (2005). University of Queensland in different proteins with answer, and 40% are 2. Rassoulzadegan, M. et al. Nature 441, 469–474 (2006). Australia, are attempting to distinct functions. In another, confident of another. 3. Cheng J. et al. Science 308, 1149–1154 (2005). measure the extent of one ‘gene’ is nestled within Hardly any confess that 4. Parra, G. et al. Genome Res. 16, 37–44 (2006). 5. Akiva, P. et al. Genome Res. 16, 30–36 (2006). working biologists’ the non-protein coding they don’t know. 6. Spilianakis, C. G., Lalioti, M. D., Town, T., Lee, G. R. & Flavell, bewilderment over genes. intron of another. Another Stotz wants to examine R. A. Nature 435, 637–645 (2005). They collected together 14 protein is assembled when whether scientists working in 7. FANTOM Consortium and RIKEN Genome Exploration weird and wonderful (but four different RNA separate disciplines tend to Research Group and Genome Science Group (Genome Network Project Core Group) Science 309, 1559–1563 real) genetic arrangements molecules, made from DNA view the situations in (2005). and asked biologists to scattered over 40,000 base different lights. “It will be 8. RIKEN Genome Exploration Research Group and Genome decide whether each pairs, are assembled into interesting to know if there is Science Group (Genome Network Project Core Group) and represents one, or more one transcript. some order to the confusion,” the FANTOM Consortium Science 309, 1564–1566 than one, gene. Confused? So were the Stotz says. H.P. (2005). 9. Willingham, A. T. et al. Science 309, 1570–1573 (2005).

401 © 2006 Nature Publishing Group