Gene Duplication and Evolution

TECHNICAL COMMENTS

www.sciencemag.org SCIENCE VOL 293 31 AUGUST 2001 1551a

Lynch and Conery (1) presented one of the first serious efforts to study the evolutionary fate of gene duplication using genomic sequence data. Their analysis led to several interesting observations, particularly with respect to the rate of gene duplication in eukaryotic genomes and the subsequent half-life of duplicates. These two parameters are of particular importance in studying the evolutionary processes of gene duplication and subsequent functional divergence. The most frequent class of duplications appeared to be similar in all six species, which suggests some silencing process for old duplicates. Several additional considerations in the analysis and interpretation, however, might have led to some different conclusions.

First, Lynch and Conery (1) used the number of substitutions per silent site, S, to measure the age of a duplicate-gene pair [figure 2 of (1)]. It is unclear, however, that silent divergence is a suitable proxy for a molecular clock involving different genes or gene duplicates. For example, Zeng et al. (2) reported 9- to 15-fold differences in S values and a flat distribution of S for 24 single-copy genes in Drosophila. Two points are important in this context: (i) this large variation in S is expected when the divergence time is low; and (ii) the divergence time for each comparison made by Zeng et al. (2) was fixed. Thus, for different genes, S may vary by more than an order of magnitude given a fixed divergence time. This situation differs from description of divergence time using S values from homologous genes across a group of organisms, in which a dependable molecular clock may exist. The same S values may represent duplicates of very different ages, and the different S values may be from duplicates of the same or similar ages. Thus, figure 2 of (1) should be viewed with caution as a description of the age distribution of gene duplications. A related issue is the reliability of estimates of S, because many of the values presented by Lynch and Conery (1) were larger than 1. Estimates larger than 1 are associated with a large variance due to saturation of substitutions and should generally be considered unreliable (3).

Second, the calculation of the half-life of gene duplicates was based on the untested, hidden assumption that the rate of gene duplication is constant over evolutionary time, an assumption implicit in both figure 3 and equation 3 of (1). Unfortunately, there are insufficient data with which to estimate the variation in the rate of gene duplication on a short time scale; nevertheless, there is some evidence that the duplication rate for some families may indeed not be stationary over a short evolutionary time. For example, in the mouse Sp100-rs family, a short lineage of Mus musculus has created at least 60 gene duplicates within 1.7 million years; other lineages such as the sibling taxa Mus caroli, a group that diverged 2.5 million years ago, contain few duplicates (4). If the duplication rate over the time during which divergence is observed is much lower than the recent rate of duplication, the half-life calculated by Lynch and Conery would represent a serious underestimate.

Finally, an alternative interpretation for the short half-life of duplicate genes before silencing may deserve consideration. Assuming that small values of S may more reliably reflect a short evolutionary time, the authors chose to estimate the half-life of duplicate genes only from gene pairs with S values in the range of 0 to 0.25. They estimated a mean half-life of 4 million years, concluding that “the fate awaiting most gene duplications appears to be silencing rather than preservation,” and, hence, that “duplicate genes may only rarely evolve new functions.” Yet their analysis appears to have ignored several important features of the data [figure 2 of (1)]. (i) Notwithstanding their model of “young” duplicates, the tails of the distribution are long and flat, which suggests that the data are actually heterogeneous. (ii) The proportions of the duplications that reside in the tails are high: 85% for Drosophila melanogaster, 66% for Caenorhabditis elegans, and 65% for Saccharomyces cerevisiae. (iii) The tails include old and ancient duplications. The heterogeneity of the age distribution in figure 2 of (1) suggests that the short half-life calculated from young duplicate-gene pairs cannot be extended to most pairs. After all, a large proportion of these older duplicates may be much older than 4 million years, with real ages of tens or hundreds of million years. It is likely that these genes have been functional since their origin; otherwise, the duplicate sequences would have been deleted from the genome (5).

In addition, the absolute number of old or ancient gene duplicates is relatively large. For example, 40% of the approximately 13,600 coding sequences in the D. melanogaster genome appear to have arisen by gene duplication (6). Thus, some 34% of the fly genome, or 4624 genes [40% × 85% × 13,600, with the 85% from item (ii), above], comprise old or ancient duplicates. It is therefore misleading to assert that the vast majority of gene duplicates are quickly silenced, even if the calculation of the half-life is correct. Rather, it appears that the accumulation of “survivors” of the silencing process constitutes a large fraction of modern eukaryotic genomes.

An analogy for the application of half-lives is the mortality of newborns centuries ago: At that time the infant mortality rate was very high, because medical science was underdeveloped, but just because the “half-life” of newborns is short, it does not follow that half of all adults will die shortly. We suggest that figure 2 of (1) supports a conclusion opposite to the one that Lynch and Conery drew: A large proportion of duplicate genes either have evolved new functions (7) or have been maintained by subfunctionalization (8, 9) or other mechanisms.

Manyuan Long
Department of Ecology and Evolution
University of Chicago
1101 East 57th Street
Chicago, IL 60637, USA

Kevin Thornton
Committee on Genetics
University of Chicago

References
1. M. Lynch, J. S. Conery, Science 290, 1151 (2000).
2. L.-W. Zeng, J. M. Comeron, B. Chen, M. Kreitman, Genetica 102–103, 369 (1998).
3. W.-H. Li, Molecular Evolution (Sinauer Associates, Sunderland, MA, 1997).
4. D. Weichenhan, B. Kunze, W. Traut, H. Winking, Cytogenet. Cell Genet. 80, 226 (1998).
5. D. A. Petrov, E. R. Lozovskaya, D. L. Hartl, Nature 384, 346 (1996).
6. G. M. Rubin et al., Science 287, 2204 (2000).
7. W. Wang, J. Zhang, C. Alvarez, A. Llopart, M. Long, Mol. Biol. Evol. 17, 1294 (2000).
8. A. Force et al., Genetics 151, 1531 (1999).
9. M. Lynch, A. Force, Genetics 154, 459 (2000).

21 December 2000; accepted 21 June 2001

Lynch and Conery (1) have proposed a number of provocative hypotheses regarding the evolution of duplicate genes, using data from nine eukaryotic species. One hypothesis is that the ratio of replacement (R) to silent (S) nucleotide substitutions among recently duplicated genes is near 1.0, the neutral expectation. Their analysis indicates that this phase of relaxed selection is confined to recently duplicated gene pairs. Another hypothesis is that many duplicate-gene pairs are short-lived, with half-lives of 3 to 7 million years, depending on the organism.

Unfortunately, their conclusions are compromised by the fact that their data, obtained through GenBank taxon searches, included many redundant records. For example, 43.3% of the gene pairs in their Arabidopsis data set had no synonymous differences (S = 0). We randomly examined 50 of these gene pairs and found that 86% were derived from the same genomic sequence, mostly because of the presence of a single gene on two overlapping clones. These redundant sequences were used to estimate the rate at which duplicate-gene pairs reverted to single copies, a procedure that tended to overestimate the rate of gene loss. Such problems were not limited to Arabidopsis; 58.3% of human gene pairs and 67.7% of mouse gene pairs had R = S = 0. Because Lynch and Conery recognized the potential problem of redundancy, human and mouse gene pairs with S < 0.01 were not used in their analyses. In many cases, however, both gene sequences from an S < 0.01 pair were compared with a more distant gene family member, which did result in the use of redundant data entries.

Also problematic are the mammalian gene pairs in the 0.01 < S < 0.05 class, which were crucial to the conclusion by Lynch and Conery (1) that selective con-

Liqing Zhang
Brandon S. Gaut
Department of Ecology and Evolutionary Biology
University of California, Irvine
Irvine, CA 92697, USA

Todd J. Vision
USDA-ARS Center for Agricultural Bioinformatics
Cornell University
Ithaca, NY 14853, USA

References
1.

demographic analysis of gene duplicates made the hidden assumption of a constant rate of gene duplication over time. This assumption was actually stated explicitly in (1), although strictly speaking we only assumed rate constancy over the time scale for which S < 0.25. Long-term rate constancy was not relevant to our birthrate estimates, which were simply the average values that apply over the time scale required for S to reach 0.01, perhaps the past few hundred thousand to million years for the species analyzed. Rate constancy is an important assumption under-
