Downloaded from genome.cshlp.org on September 24, 2021 - Published by Cold Spring Harbor Laboratory Press

Insight/Outlook

A New Function Evolved from Gene Fusion

Manyuan Long¹
Department of Ecology and Evolution, The University of Chicago, Chicago, Illinois 60637, USA

What constitutes genetic difference among organisms? How do new gene functions originate in nature? Since the early days of molecular biology, we have known that homologous genes between species differ in DNA and protein sequence. Noncoding regions have also been evolving, with repetitive sequences, transposable elements, and other elements continuously reshaping the genomes of organisms. As more genomes of humans and other organisms are examined, it also becomes clear that species differ not only in these two genomic parameters but also in the number and kinds of genes.

Genes are subject to a life and death process: New genes have originated continuously throughout evolution. For example, Drosophila melanogaster contains 87 cuticle protein genes, while Caenorhabditis elegans contains no such genes in its genome (Rubin et al. 2000). If this is thought to be comparing too-divergent organisms, take a look at recently diverged sibling species. Drosophila teissieri and Drosophila yakuba contain a gene called jingwei (Long and Langley 1993; Wang et al. 2000), which originated only 2.5 million years ago. D. melanogaster itself has a unique gene, Sdic, which is expressed particularly in the sperm tail and does not exist in even its closest relative species (Nurminsky et al. 1998).

New genes often give rise to new biological functions driven by adaptive Darwinian selection (Long and Langley 1993; Chen et al. 1997; Begun 1997; Nurminsky et al. 1998). New genes may even have controlled the origination of new species, for example, Odysseus, a homeobox duplicate gene in Drosophila (Ting et al. 1998). Such new genes are associated with two conspicuous changes consistent with the origin of new functions: high protein substitution rates and drastic changes in gene structure. Drosophila is not the only organism whose genome has been found to originate new protein-coding genes differentiating one species from another. Other organisms, including plants and mammals, also have newly originated genes. For example, the Mus musculus genome contains multiple copies of the new gene Sp100-rs, which is absent in its sibling species Mus caroli (Weichenhan et al. 1998), though little detail of its evolution and function is known. In potato, a new cytochrome c1 originated a mitochondrial targeting function (Long et al. 1996). Retrosequences may have contributed to the origin of new vertebrate regulatory elements or new parts of vertebrate coding regions (Brosius 1999). In these cases, recombination of protein modules and gene duplication played essential roles in creating the initial gene structures, and natural selection participated in the subsequent evolution.

Although insights from young chimerical genes in Drosophila have enormously changed our views of new gene evolution, good data from humans or mammals have been lacking. This is a significant hurdle for understanding new gene evolution in the genetic systems of the human and its primate relatives. In this issue, Thomson et al. (2000) present a clear example of how new genes with novel functions can originate in humans and other mammals, including the molecular process and derived biological function. A closer look at the origination of this new gene, Kua-UEV, offers insights into the general problem of human gene origination.

UEV is a conserved gene, distributed across all major eukaryotic lineages ranging from animals to fungi, plants, and protozoa. The UEV proteins in these organisms share multiple functions, for example, cell protection, c-FOS transcription, and cell-cycle progression (Sancho et al. 1998; Thomson et al. 1998; Xiao et al. 1998). In Saccharomyces cerevisiae, the UEV protein controls elongation of polyubiquitin chains when associated with ubiquitin-conjugating enzymes (E2; Hofmann and Pickart 1999). The UEV genes in divergent organisms have maintained a very conserved structure in their common domain (C domain). However, there exists an additional domain (B domain) in one isoform of the human gene that does not exist in other organisms and, thus, creates a new, chimerical gene structure. How did this new structure originate, and where does the B domain come from?

At first glance, this human gene is reminiscent of the chimerical structure of two young Drosophila genes. The first example is jingwei, which is composed of a major domain and an additional N-terminal domain (Long and Langley 1993). Recent work implies that the mosaic structure of jingwei was created by insertion of the retrosequence of the alcohol dehydrogenase gene into a previously existing gene, recruiting a portion of the N-terminal domain (Long et al. 1999; Wang et al. 2000). The second example is Sdic, which was created by a […] in two adjacent genes at the DNA level (Nurminsky et al. 1998). However, the human UEV gene seems to have taken a different evolutionary route to acquire its additional B domain (Fig. 1).

Figure 1 The molecular process for Kua-UEV gene fusion.

In the genomic databases of D. melanogaster and C. elegans, two small DNA fragments unrelated to the UEV gene in these species were found to be significantly similar to the B domain of the human UEV gene. Further analysis showed that these are seven exons encoding a 319–amino acid protein in C. elegans and five exons encoding a 326–amino acid protein in D. melanogaster. This newly discovered gene, named Kua (derived from the word “Cua” in Catalan, which means “tail” or “queue”), encodes a protein having features reminiscent of fatty acid hydroxylase. Kua was also detected in other species (M. musculus, Trypanosoma cruzi, and Arabidopsis thaliana) but was not found in S. cerevisiae genome sequences.

What is the linkage relationship between Kua and UEV? In D. melanogaster, Kua and the UEV gene are separated by 2.5 million bases in chromosome 1, while in C. elegans these genes are located on two different chromosomes. Thus, the genes Kua and UEV are simply different loci. However, in the human genome these two loci lie adjacent, within several kilobases of each other, and a portion of RNA transcripts from the two genes is fused into a single RNA. This fused transcript structure may result from a relatively weak terminating signal for Kua gene transcription. A similar mechanism is responsible for generating read-through transcripts of the L1 element and its downstream cellular gene sequences (Boeke and Pickeral 1999; Moran et al. 1999).

Gene fusion has been observed before in various organisms. The classic examples are the fatty acid synthase gene (McCarthy and Hardie 1984) and the tryptophan synthetase gene in fungi (Burns et al. 1990). Other noted cases include HisA and HisF in the histidine pathway (Lang et al. 2000), glutamyl- and prolyl-tRNA synthetase genes (Berthonneau and Mirande 2000), the young fusion gene Sp100-rs in M. musculus (Weichenhan et al. 1998), and the old fused genes of ubiquitin and ribosomal proteins in diverged organisms like yeast and human (Kirschner and Stratakis 2000). In […] and […], gene fusion was genomically surveyed in a number of species whose genomes have been sequenced (Snel et al. 2000). However, the human Kua-UEV gene fusion provides a revealing case regarding several important aspects of new protein origin.

First, a fused transcript is not a synonym for a fused protein. Distinct proteins in prokaryotic organisms are organized in operons, long transcripts encoding many proteins; many C. elegans genes are also encoded in an operon-like structure (Blumenthal and Spieth 1996). Thus, an authentic gene fusion should possess a particular mechanism to override the nonsense codon used to stop translation of the N-terminal protein. For example, a […]-like insertion in the stop codon would continue translation for a fused protein (Burns et al. 1990). However, the Kua-UEV human gene uses another, more sophisticated mechanism to solve the problem. Taking advantage of the more efficient splicing system in […], Kua-UEV employs alternative splicing to skip the exon k6 of Kua that contains the Kua stop codon and exon A of UEV that contains a translation initiation codon. Given that many vertebrate genes often contain long UTR regions and an intergenic region, alternative splicing may be an efficient mechanism to avoid the stop codon in upstream gene(s), as represented by the Kua-UEV gene. These long stretches of noncoding DNA may contain many stop codons, and the random peptides translated from such DNAs may not be able to provide useful folds. Thus, one can predict that, in the future, it would not be unusual to find gene fusion products using this existing cellular mechanism, rather than waiting for a mutation in the stop codon.

What is the evolutionary advantage of gene fusion? Conspicuously, covalently connected proteins would ensure coregulation of […] of related functions. The covalently linked proteins can ensure stoichiometric production of the component peptides (McCarthy and Hardie 1984). Gene fusion also confers other advantages for particular proteins. For example, the multifunctionality of fatty acid synthase prevents dissociation at low protein concentration (McCarthy and Hardie 1984). In these ideas or experiments, the fused proteins are viewed as linked independent functional units. In the case of Kua-UEV, however, a new advantage arises: The fusion creates a new function for UEV enzymatic activity. The nonfused form of UEV proteins, UEV1A, is intracellularly located in the nucleus, while KUA proteins are distributed in endomembranes. Consistent with the location of KUA proteins, KUA-UEV proteins were shown to be associated with cytoplasmic structures. Thus, the fused UEV enzymes work in new intracellular locations, suggesting the origin of a new gene function. This fused gene mimics some chimerical genes created by exon shuffling, for example, the coxII gene (Nugent and Palmer 1991) and the potato cytochrome c1 (Long et al. 1996), where the N-terminal-recruited portions also ensure a particular intracellular position for the enzymatic activity encoded in the C-terminal peptide.

Keeping original functions may be a premise in the creation of new genes. Gene duplication is often involved in exon shuffling, suggesting selection pressure for maintaining the function of donor genes. Even as UEV and Kua are fused together, they have kept their original separate functions. UEV has a duplicate copy (UEV2), but no duplicate copy of Kua has been found yet. Is this a reason for the fused gene to generate independent transcripts? While there may be other possible reasons for this pattern of transcription, speculation like this cannot be discounted.

It is often thought that the function of fused genes and chimerical genes is simply an addition of functions in preexisting component genes. If so, one would predict that the component sequences in such genes would evolve at a neutral substitution rate. However, this prediction is inconsistent with an observed phenomenon of accelerated evolution in chimerical genes (Long and Langley 1993; Long et al. 1996). A recent structural analysis of the histidine biosynthesis components HisA and HisF indicated that the protein structure after gene fusion was also subject to structural and functional adaptation (Lang et al. 2000). In this sense, gene fusion may be a critical step toward creating a new gene with novel function.

Is the function of the fused Kua-UEV created recently in humans and their close relatives? In mouse, Kua and UEV may be in close proximity, because a hybrid transcript of these two genes was also observed (T.M. Thomson, pers. comm.). However, these two mouse genes generate different hybrid transcripts, suggesting that the fused protein and its functions may have evolved recently in humans or their primate ancestors. It should be possible to demonstrate or falsify this hypothesis by characterizing Kua and UEV genes in our primate relatives.

Finally, UEV genes also possess an interesting exon-intron structure that is telling about intron evolution. UEV has unusually conserved positions of introns 2 and 3, identical among plants, fungi, animals, and protozoa. These two introns thus should date back to 1–2 billion years ago. However, the authors’ interpretation of intron 4 in Schizosaccharomyces pombe requires some caution. This intron is interpreted as a new arrival by recent intron insertion. This explanation seems plausible because it is the only intron among the analyzed organisms in phase 2 (i.e., the intron breaks a codon after the second nucleotide) and because it breaks the secondary structure of the UEV protein. However, an alternative hypothesis generates a biologically sensible prediction and, hence, may be considered for a test: If this intron is an ancient intron, like introns 2 and 3, the missing corresponding intron in all species except S. pombe would be the result of intron loss. The position of all these missing introns would be in the 3′-end exon. This would extend the interesting model of Fink (1987) from yeast to the organisms under investigation. In this model, intron loss, by reverse transcription and homologous recombination, should show a gradient from 3′ to 5′ in loss frequency. This alternative hypothesis predicts that some eukaryotic organisms, in addition to S. pombe, may still retain this intron.

¹E-MAIL [email protected]; FAX (773) 702-9740. Article and publication are at www.genome.org/cgi/doi/10.1101/gr.165700.

Genome Research 10:1655–1657 ©2000 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/00 $5.00; www.genome.org

REFERENCES

Begun, D.J. 1997. Genetics 145: 375–382.
Berthonneau, E. and Mirande, M. 2000. FEBS Lett. 470: 300–304.
Blumenthal, T. and Spieth, J. 1996. Curr. Opin. Genet. Dev. 6: 692–698.
Boeke, J.D. and Pickeral, O.K. 1999. Nature 398: 108–109.
Burns, D.M., Horn, V., Paluh, J., and Yanofsky, C. 1990. J. Biol. Chem. 265: 2060–2069.
Chen, L., DeVries, A.L., and Cheng, C.H. 1997. Proc. Natl. Acad. Sci. 94: 3817–3822.
Fink, G.R. 1987. Cell 49: 5–6.
Hofmann, R.M. and Pickart, C.M. 1999. Cell 96: 645–653.
Kirschner, L.S. and Stratakis, C.A. 2000. Biochem. Biophys. Res. Commun. 270: 1106–1110.
Lang, D., Thoma, R., Henn-Sax, M., Sterner, R., and Wilmanns, M. 2000. Science 289: 1546–1550.
Long, M. and Langley, C.H. 1993. Science 260: 91–95.
Long, M., de Souza, S.J., Rosenberg, C., and Gilbert, W. 1996. Proc. Natl. Acad. Sci. 93: 7727–7731.
Long, M., Wang, W., and Zhang, J. 1999. Gene 238: 135–142.
McCarthy, A.D. and Hardie, D.G. 1984. Trends Biochem. Sci. 9: 60–63.
Moran, J.V., DeBerardinis, R.J., and Kazazian Jr., H.H. 1999. Science 283: 1530–1534.
Nugent, J.M. and Palmer, J.D. 1991. Cell 66: 473–481.
Nurminsky, D.I., Nurminskaya, M.V., De Aguiar, D., and Hartl, D.L. 1998. Nature 396: 572–575.
Rubin, G.M., Yandell, M.D., Wortman, J.R., Gabor Miklos, G.L., Nelson, C.R., Hariharan, I.K., Fortini, M.E., Li, P.W., Apweiler, R., Fleischmann, W., et al. 2000. Science 287: 2204–2215.
Sancho, E., Vila, M.R., Sanchez-Pulido, L., Lozano, J.J., Paciucci, R., Nadal, M., Fox, M., Harvey, C., Bercovich, B., Loukili, N., Ciechanover, A., et al. 1998. Mol. Cell Biol. 18: 576–589.
Snel, B., Bork, P., and Huynen, M. 2000. Trends Genet. 16: 9–11.
Thomson, T.M., Khalid, H., Lozano, J.J., Sancho, E., and Arino, J. 1998. FEBS Lett. 423: 49–52.
Thomson, T.M., Lozano, J.J., Loukili, N., Carrió, R., Serras, F., Cormand, B., Valeri, M., Díaz, V.M., Abril, J., Burset, M., et al. 2000. Genome Res. 10: 1743–1756.
Ting, C.T., Tsaur, S.C., Wu, M.L., and Wu, C.I. 1998. Science 282: 1501–1504.
Wang, W., Zhang, J., Alvarez, C., Llopart, A., and Long, M. 2000. Mol. Biol. Evol. 17: 1294–1301.
Weichenhan, D., Kunze, B., Traut, W., and Winking, H. 1998. Cytogenet. Cell Genet. 80: 226–231.
Xiao, W., Lin, S.L., Broomfield, S., Chow, B.L., and Wei, Y.F. 1998. Nucleic Acids Res. 26: 3908–3914.

Errata

Genome Research 10: 1655–1657 (2000)

A New Function Evolved from Gene Fusion Manyuan Long

The following reference was omitted:

Brosius, J. 1999. Gene 238: 116–134.

Genome Research 10: 1697–1710 (2000)

Sequence and Comparative Analysis of the Mouse 1-Megabase Region Orthologous to the Human 11p15 Imprinted Domain Patrick Onyango, Webb Miller, Jessica Lehoczky, Cheuk T. Leung, Bruce Birren, Sarah Wheelan, Ken Dewar, and Andrew P. Feinberg

The URL http://www.jhmi.edu/feinberg_lab found in the Methods section under the heading ‘Sequencing of the Mouse Contig’ should instead be http://www.hopkinsmedicine.org/imprinting.

Genome Research 11:308 ©2001 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/01 $5.00; www.genome.org


Genome Res. 2000 10: 1655-1657 Access the most recent version at doi:10.1101/gr.165700

Related Content: Errata for vol. 10, p. 1655. Genome Res. February 2001 11: 308
