<<

Primer Duplication: The Genomic Trade in Spare Parts Matthew Hurles

f necessity is the mother of change. provides is the widespread existence of gene invention, then its father is an opportunities to explore this forbidden families. Members of a that Iinveterate tinkerer, with a large evolutionary space more widely by share a common ancestor as a result garage full of spare parts. Innovation generating duplicates of a gene that of a duplication event are denoted as (like homicide) requires motive and can ‘wander’ more freely, on condition being paralogous, distinguishing them opportunity. Clearly, the predominant that between them they continue to from orthologous in different ‘motive’ during the of a novel supply the original function. , which share a common gene function is to gain a selective Susumu Ohno was the fi rst to ancestor as a result of a advantage. To understand why gene comprehensively elucidate the event. Paralogous genes can often duplications represent the major potential of gene duplication, in his be found clustered within a , ‘opportunities’ from which new genes book Evolution by Gene Duplication, although dispersed paralogues, often evolve, we must fi rst consider what published more than 30 years ago with more diverse functions, are also constrains genic evolution. (Ohno 1970). The prescience of common. The vast majority of genes in every Ohno’s book is highlighted by the genome are selectively constrained, fact that his book has almost certainly in that most nucleotide changes that been cited more times in the past fi ve Copyright: © 2004 Matthew Hurles. This is an alter the fi tness of the are years than in the fi rst fi ve years after its open-access article distributed under the terms of the Creative Commons Attribution License, which deleterious. How do we know this? publication. permits unrestricted use, distribution, and reproduc- Comparisons between genomes clearly tion in any medium, provided the original work is demonstrate that coding sequences What Is the Evidence for the properly cited. diverge at slower rates than non-coding Importance of Gene Duplication? Matthew Hurles is at the Wellcome Trust Sanger Insti- regions, largely due to a defi cit of The primary evidence that tute near Cambridge in the United Kingdom. E-mail: [email protected] at positions where a base duplication has played a vital role in change would cause an amino-acid the evolution of new gene functions DOI: 10.1371/journal.pbio.0020206

PLoS Biology | http://biology.plosjournals.org July 2004 | Volume 2 | Issue 7 | Page 0900 Whole genome sequences of closely of new biological functions over all classifi ed on the basis of the size of related have allowed evolutionary timescales, it is of great duplication generated, and whether us to identify changes in the gene interest to be able to comprehensively they involve an RNA intermediate complements of over relatively document the duplicative differences (Figure 1). short evolutionary distances. These that exist between our own species ‘Retrotransposition’ describes the comparisons typically reveal dramatic and our closest relatives, the great integration of reverse transcribed expansions and contractions of apes. The study by Fortna et al. (2004) mature RNAs at random sites in a gene families that can be related to in this issue of PLoS Biology identifi es genome. The resultant duplicated underlying biological differences. For over 3% of around 30,000 genes as genes (retrogenes) lack introns and example, and mice differ having undergone lineage-specifi c have poly-A tails. Separated from their in their sensory reliance on sight copy number changes among fi ve regulatory elements, these integrated and smell respectively; colour vision hominoid (humans plus the great apes) sequences rarely give rise to expressed in humans has been signifi cantly species. This is the fi rst time that copy full-length coding sequences, although enhanced by the duplication of an number changes among apes have functional retrogenes have been Opsin gene that allows us to distinguish been assayed for the vast majority of identifi ed in most genomes. light at three different wavelengths, genes, and we can expect that Tandem duplication of a genomic while mice can distinguish only segment (segmental duplication) the biological consequences of the 140 two. By contrast, a much higher is one of the possible outcomes of human-specifi c copy number changes proportion of the large gene family of ‘’, which results identifi ed in this study will be heavily olfactory receptors have retained their from investigated over the coming years. functionality in mice, as compared to between paralogous sequences. humans. How Do Duplications Arise? These recombination events can also Given the apparent importance of The various mechanisms by which give rise to the deletion or inversion gene duplication for the evolution genes become duplicated are often of intervening sequences. Recent

DOI: 10.1371/journal.pbio.0020206.g001

Figure 1. Mechanism of Gene Duplication A two-exon gene is fl anked by two Alu elements and a neighbouring replication termination site. Recombination between the two Alu elements leads to a tandem duplication event, as does a replication error instigated by the replication termination site. Retrotransposition of the mRNA of the gene leads to the random integration of an intron-less paralogue at a distinct genomic location.

PLoS Biology | http://biology.plosjournals.org July 2004 | Volume 2 | Issue 7 | Page 0901 evidence suggests that the explosion duplications, because duplication So what are the relative contributions of segmental duplications in recent breakpoints are enriched at replication of these different mechanisms? Not evolution has been caused in termination sites (Koszul et al. 2004). all interspersed duplicate genes are part by the rapid proliferation of Alu Genome duplication events generated by retrotransposition. elements about 40 MYA. Alu elements generate a duplicate for every gene The initially tandem arrangement of are derived from the 7SL RNA gene in the genome, representing a huge segmental duplications can be broken and represent the most frequent opportunity for a step-change in up by subsequent rearrangements. dispersed repeat in the human organismal complexity. However, In keeping with this hypothesis, genome, with the approximately genome duplication presents duplicated genes in a tandem 1 million copies of the 300-bp Alu signifi cant problems for the faithful arrangement typically represent more element representing around 10% transmission of a genome from recent duplication events (Friedman of the entire genome. The striking one generation to the next, and is and Hughes 2003). Recent analyses enrichment of Alu elements at the consequently a rare event, at least suggest that 70% of non-functional junctions between duplicated and in Metazoa. In principle, genome duplicated genes () single copy sequences implicates duplications should be easily identifi ed in the result from unequal crossing over between these through the coincident emergence retrotransposition rather than any repeats in the generation of segmental within a phylogeny of many gene DNA-based process (Torrents et al. duplications (Bailey et al. 2003). families. Unfortunately, this signal is 2003). The observation of segmental complicated by subsequent piecemeal duplication events with no evidence for loss and gain of gene family members. What Fates Befall a Recently -driven unequal crossing over Consequently, there is heated Duplicated Gene? suggests that segmental duplications debate over possible ancient genome A duplicated gene newly arisen can also arise through non-homologous duplication events in early vertebrate in a single genome must overcome mechanisms. A recent screen for evolution and more recently in teleost substantial hurdles before it can be spontaneous duplications in fi sh, both of which must have occurred observed in evolutionary comparisons. suggests that replication-dependent hundreds of millions of years ago First, it must become fi xed in the breakages also play a (McLysaght et al. 2002; Van de Peer et population, and second, it must be signifi cant role in generating tandem al. 2003). preserved over time. Population tells us that for new , fi xation is a rare event, even for new mutations that confer an immediate selective advantage. Nevertheless, it has been estimated that one in a hundred genes is duplicated and fi xed every million years (Lynch and Conery 2000), although it should be clear from the duplication mechanisms described above that it is highly unlikely that duplication rates are constant over time. However, once fi xed, three possible fates are typically envisaged for our gene duplication. Despite the slackened selective constraints, mutations can still destroy the incipient functionality of a duplicated gene: for example, by introducing a premature stop codon or a that destroys the structure of a major domain. These degenerative mutations result in the creation of a (nonfunctionalization). Over time, the likelihood of such a mutation being introduced increases. Recent studies suggest that there is a relatively DOI: 10.1371/journal.pbio.0020206.g002 narrow time window for evolutionary Figure 2. Fates of Duplicate Genes exploration before degradation A new duplication in a gene (blue) with two tissue-specifi c promoters (arrows) arises becomes the most likely outcome, in a population of single copy genes. Fixation within the population results in a typically of the order of 4 million years minority of cases. After fi xation, one gene is inactivated (degradation) or assumes a new function (neofunctionalization), or the expression pattern of the original gene is (Lynch and Conery 2000). partitioned between the two duplicates as one promoter is silenced in each duplicate in a During the relatively brief period complementary manner (subfunctionalization). of relaxed selection following gene

PLoS Biology | http://biology.plosjournals.org July 2004 | Volume 2 | Issue 7 | Page 0902 duplication, a new, advantageous investigated the retrotransposition retains the original function, nor has may arise as a result of one of the (since the existence of a human–mouse the original function been partitioned. gene copies gaining a new function common ancestor) of one of the two This mode of ‘duplicate co-evolution’ (neofunctionalization). This can be autosomal copies of the CDYL gene is likely to be especially prevalent in revealed by an accelerated rate of to (forming CDY). In signalling pathways. amino-acid change after duplication in the mouse, both Cdyl genes produce Earlier, we saw that homologous one of the gene copies. This burst of two distinct transcripts, one of which is recombination between selection is necessarily episodic—once expressed ubiquitously while the other paralogous sequences can result in a new function is attained by one of is testis-specifi c. By contrast, in humans rearrangements, including tandem the duplicates, selective constraints both CDYL genes produce a single duplications. Such recombination on this gene are reasserted. These ubiquitously expressed transcript, and events need not cause rearrangements, patterns of selection can be observed CDY exhibits testis-specifi c expression. but can also result in the non- in real data: most recently duplicated As CDY is a retrogene (see above) that reciprocal transfer of sequence gene pairs in the human genome have has not been duplicated together with from one paralogue to the other—a diverged at different rates from their its ancestral regulatory sequences, it process known as . ancestral amino-acid sequence (Zhang is clear that the DDC model is not the Gene conversion homogenizes et al. 2003). A convincing instance of only route by which to achieve spatial paralogous sequences, retarding neofunctionalization is the evolution partitioning of ancestral expression their divergence, and consequently of antibacterial activity in the ECP gene patterns. obscuring their antiquity. This leads in Old World Monkeys and hominoids Subfunctionalization can also lead to to the observation of ‘concerted after a burst of amino-acid changes the partitioning of temporal as well as evolution’ whereby duplicates within following the tandem duplication spatial expression patterns. In humans, a species can be highly similar and yet of the progenitor gene EDN (a the β-globin cluster of duplicated genes continue to diverge between species ribonuclease) some 30 MYA (Zhang et contains three genes with coordinated (Figure 3). Once gene duplicates al. 1998). The divergence of duplicated but distinct developmental expression have diverged suffi ciently so that genes over time can be also monitored patterns. One gene is expressed in they differ in their functionality (or in genome-wide functional studies. In embryos, another in foetuses, and non-functionality), gene conversion both yeast and nematodes, the ability of the third from neonates onwards. In events can become deleterious—for a gene to buffer the loss of its duplicate addition, coding sequence changes example, by introducing disrupting declines over time as their functional have co-evolved with the regulatory mutations from a pseudogene into overlap decreases. changes so that the O2 binding affi nity its functional duplicate. A substantial Rather than one gene duplicate of haemoglobin is optimised for each proportion of disease alleles in retaining the original function, developmental stage. This coupling Gaucher disease result from the while the other either degrades or between coding and regulatory change introduction of mutations into the evolves a new function, the original is similarly noted at a genomic level glucocerebrosidase gene from a functions of the single-copy gene may when expression differences between tandemly repeated pseudogene be partitioned between the duplicates many duplicated genes pairs are (Tayebi et al. 2003). These kinds of (subfunctionalization). Many genes correlated with their coding sequence recombinatorial interactions only perform a multiplicity of subtly distinct divergence (Makova and Li 2003). occur between paralogues that are functions, and selective pressures have minimally diverged. Thus, while resulted in a compromise between Other Evolutionary Consequences selective interactions and functional optimal sequences for each role. of Gene Duplication overlap between duplicates declines Partitioning these functions between If duplication results in the relatively slowly over evolutionary the duplicates may increase the formation of a novel function as a time, the potential for recombinatorial fi tness of the organism by removing result of interaction between the two interactions between paralogues is the confl ict between two or more diverged duplicates, which of the above relatively short-lived. functions. This outcome has become categories of evolutionary outcome For some genes, duplication confers associated with a population genetic does this innovation fall into? Not all an immediate selective advantage by model known as the Duplication– new biological functions resulting from facilitating elevated expression, or Degeneration–Complementation gene duplications can be ascribed as Ohno put it, ‘duplication for the (DDC) model, which focuses attention to individual genes. Protein–protein sake of producing more of the same’. on the regulatory changes after interactions often occur between This has clearly been the case for duplication (Force et al. 1999). In diverged gene duplicates. This is histones and ribosomal RNA genes. this model, degenerative changes especially true for ligand–receptor In this scenario, gene conversion is occur in regulatory sequences of both pairs, which are often supposed to co- of potential benefi t in maintaining duplicates, such that these changes evolve after a gene duplication event, homogeneity between copies. complement each other, and the union and thus progress from homophilic Certainly both histone and rDNA of the expression patterns of the two to heterophilic interactions. This genes are commonly found in arrays duplicates reconstitutes the expression emergent function of the new gene of duplicates: structures that facilitate pattern of the original (Figure 2). pair does not fi t comfortably into array homogenization by both gene A recent study by Dorus and any of the scenarios outlined above: conversion and repeated unequal colleagues (Dorus et al. 2003) both genes are functional yet neither crossing over.

PLoS Biology | http://biology.plosjournals.org July 2004 | Volume 2 | Issue 7 | Page 0903 biological consequences (if any). Extensive functional studies targeted at duplicated genes are required if we are to more fully understand the range of evolutionary outcomes. Moreover, collaborations between the proteomics and evolutionary genetics communities would facilitate investigation of the potential role of gene duplication during the evolution of the protein– protein and –cell interactions that are fundamental to the biology of multicellular organisms.

References Bailey JA, Liu G, Eichler EE (2003) An Alu transposition model for the origin and expansion of human segmental duplications. Am J Hum Genet 73: 823–834. Dorus S, Gilbert SL, Forster ML, Barndt RJ, Lahn BT (2003) The CDY-related gene family: Coordinated evolution in copy number, expression profi le and protein sequence. Hum Mol Genet 12: 1643–1650. Emes RD, Goodstadt L, Winter EE, Ponting CP (2003) Comparison of the genomes of human and mouse lays the foundation of genome zoology. Hum Mol Genet 12: 701–709. Force A, Lynch M, Pickett FB, Amores A, Yan YL, et al. (1999) Preservation of duplicate genes DOI: 10.1371/journal.pbio.0020206.g003 by complementary, degenerative mutations. Genetics 151: 1531–1545. Figure 3. Fortna A, Young K, MacLaren E, Marshall K, Hahn Different gene conversion events homogenize minimally diverged duplicate genes in G, et al. (2004) Genome-wide identifi cation of each daughter species (A and B), with the result that while paralogues are highly similar, great ape and human lineage-specifi c genes. orthologues diverge over time. PLoS Biol 2: e207 DOI:10.1371/journal. pbio.0020207. Friedman R, Hughes AL (2003) The temporal distribution of gene duplication events in a set Mechanisms of segmental are plenty of ‘successful’ gene family of highly conserved human gene families. Mol duplication are oblivious to where clusters that contain associated Biol Evol 20: 154–161. genes begin and end, and so are pseudogenes. Koszul R, Caburet S, Dujon B, Fischer G (2004) Eucaryotic through the additionally capable of duplicating Duplicate gene evolution has spontaneous duplication of large chromosomal parts of genes or several contiguous most likely played a substantial segments. EMBO J 23: 234–243. genes. The intragenic duplication of role in both the rapid changes in Lynch M, Conery JS (2000) The evolutionary fate and consequences of duplicate genes. Science individual exons or enhancer elements organismal complexity apparent 290: 1151–1155. also presents new opportunities for the in deep evolutionary splits and the Makova KD, Li WH (2003) Divergence in the spatial pattern of between evolution of new functions or greater diversifi cation of more closely related human duplicate genes. Genome Res 13: regulatory complexity. species. The rapid growth in the 1638–1645. number of available genome sequences McLysaght A, Hokamp K, Wolfe KH (2002) Extensive genomic duplication during early Conclusions presents diverse opportunities to chordate evolution. Nat Genet 31: 200–204. The likelihood that newly address important outstanding Ohno S (1970) Evolution by gene duplication. London: George Allen and Unwin. 160 p. duplicated genes will both questions in duplicate gene evolution. Tayebi N, Stubblefi eld BK, Park JK, Orvisky remain functional clearly relates For those interested in patterns of E, Walker JM, et al. (2003) Reciprocal to their inherent potential to selection following duplication, the and nonreciprocal recombination at the glucocerebrosidase gene region: Implications undergo subfunctionalization or transient nature of the evolutionary for complexity in Gaucher disease. Am J Hum neofunctionalization. Under the window of opportunity following Genet 72: 519–534. Torrents D, Suyama M, Zdobnov E, Bork P (2003) DDC model, greater regulatory duplication will focus attention A genome-wide survey of human pseudogenes. complexity bestows greater potential on recently duplicated genes. In Genome Res 13: 2559–2567. for subfunctionalization (Force et al. this regard it will be important to Van de Peer Y, Taylor JS, Meyer A (2003) Are all fi shes ancient polyploids? J Struct Funct 1999), whereas neofunctionalization document 3: 65–73. is more likely to occur in genes that not only among species, as Fortna et Zhang J, Rosenberg HF, Nei M (1998) Positive are necessarily rapidly evolving, such al. have, but within species as well. Darwinian selection after gene duplication in primate ribonuclease genes. Proc Natl Acad Sci as those involved in reproduction, In addition, it has been, and will U S A 95: 3708–3713. immunity, and host defence (Emes continue to be, a lot easier to identify Zhang P, Gu Z, Li WH (2003) Different evolutionary patterns between young duplicate et al. 2003). This is not to say that copy number changes between genes in the human genome. Genome Biol 4: these biases are deterministic, there genomes than it is to identify their R56.

PLoS Biology | http://biology.plosjournals.org July 2004 | Volume 2 | Issue 7 | Page 0904