New Gene Evolution: Little Did We Know

Manyuan Long,1,2,∗ Nicholas W. VanKuren,1,2 Sidi Chen,3 and Maria D. Vibranovski4

1Department of Ecology and Evolution, The University of Chicago, Chicago, Illinois 60637; email: [email protected]
2Committee on Genetics, Genomics, and Systems Biology, The University of Chicago, Chicago, Illinois 60637; email: [email protected]
3Department of Biology and the Koch Institute, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139; email: [email protected]
4Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil 05508; email: [email protected]

Annu. Rev. Genet. 2013. 47:307–33
First published online as a Review in Advance on September 13, 2013
This article’s doi: 10.1146/annurev-genet-111212-133301
Copyright © 2013 by Annual Reviews. All rights reserved
Annu. Rev. Genet. 2013.47:307-333. Access provided by Carnegie Mellon University on 08/20/15. For personal use only.
∗ Corresponding author

Keywords
evolutionary patterns, evolutionary rates, phenotypic evolution, brain evolution, sex dimorphism, gene networks

Abstract
Genes are perpetually added to and deleted from genomes during evolution. Thus, it is important to understand how new genes are formed and how they evolve to be critical components of the genetic systems that determine the biological diversity of life. Two decades of effort have shed light on the process of new gene origination and have contributed to an emerging comprehensive picture of how new genes are added to genomes, ranging from the mechanisms that generate new gene structures to the presence of new genes in different organisms to the rates and patterns of new gene origination and the roles of new genes in phenotypic evolution. We review each of these aspects of new gene evolution, summarizing the main evidence for the origination and importance of new genes in evolution. We highlight findings showing that new genes rapidly change existing genetic systems that govern various molecular, cellular, and phenotypic functions.

BACKGROUND AND HISTORICAL OVERVIEW

Understanding how genes originate and subsequently evolve is crucial to explaining the genetic basis for the origin and evolution of novel phenotypes and, ultimately, biological diversity. Gene origination is thus a widely interesting, yet difficult, problem to study. Perhaps unsurprisingly, the peculiar structures, functions, and evolution of new genes have attracted the interest of pioneers in genetics and evolution since the early twentieth century. Sturtevant (129) was one of the first to identify a duplicated gene, the Bar duplication in Drosophila melanogaster, from which Muller (103) developed the first prevalent model of new gene evolution in 1936. Muller (103, p. 529) predicted that a new duplicate copy of a gene could acquire a novel function and be preserved in the genome, and further that “there remains no reason to doubt the application of the dictum ‘all life from pre-existing life’ and ‘every cell from a pre-existing cell’ to the gene: ‘every gene from a pre-existing gene.’”

This early thinking on single-gene and whole-chromosome duplications (55) was greatly expanded in the 1970s. Ohno (112) further developed Muller’s model in 1970, and Gilbert (52) proposed an entirely new model of new gene formation in 1978, whereby pieces of unrelated genes can be recombined into new genes rather than just be strictly duplicated. However, experimental work on new genes did not begin until the early 1990s when a plausible framework for experimental studies of new gene formation and evolution was proposed: studies must focus on genes that were recently formed because young genes still carry all the signatures of the evolutionary forces that shaped their origination and the evolution of their new structures and functions (83). As genes age, they accumulate mutations that obscure the structural or evolutionary signals from their early history (83, 84). In eukaryotes, genes younger than 10–30 million years have not experienced much sequence evolution and thus constitute a valid system in which to investigate the evolution of new genes and to understand their properties. This idea was first manifested in the discovery of jingwei, a three-million-year-old gene in two species of African Drosophila (85). Jingwei revealed several interesting features of new gene evolution that are now known to be general: (a) recombination of existing genes, leading to a hybrid gene structure; (b) rapid sequence evolution driven by positive selection; and (c) acquisition of new biochemical functions (150, 162).

Today, it is clear that new gene origination is a general process in evolution and that species-specific or lineage-specific genes exist in many, if not all, organisms. Gigantic databases of genomic sequences from thousands of species reveal that genomes contain huge numbers and a large diversity of protein-coding genes. For example, the plant Glycine max genome encodes more than 50,000 protein-coding genes, whereas the bacterial genome of Candidatus Hodgkinia cicadicola contains only 189 genes. In addition, the abundance and diversity of noncoding genes are only now beginning to be realized. Even genomes with similar gene numbers can have very different, unrelated genes. These recent data reveal a widespread process of birth and death of genes in organisms in which new genes enter the genome and old genes are lost. What mechanisms and forces dictate gene birth and death? Specifically, how are new genes and novel functions added to genomes?

In the two decades since the discovery of jingwei, there have been several hundred additional publications reporting various interesting and significant observations of new genes and new gene functions in many different organisms. Regrettably, we can only choose a few representative publications to sketch several lines of observation that can provide insight into an emerging, global picture of new gene evolution. We follow the growth of scientific information and underlying ideas and concepts in new gene evolution, beginning by discussing the methods for identifying new genes and mechanistic processes of new gene formation. We then describe the rates and patterns of new gene origination and evolution that may indicate some rules governing these processes and discuss the evolutionary forces that act on new genes. Finally, we review the rapid growth of studies of the phenotypic effects of new genes and their impact on phenotypic evolution.

THE CONCEPT OF NEW GENE ORIGINATION

To understand various basic properties of new gene evolution, we need to have some conception of the process of new gene origination and an operational definition for the process. This definition helps us explore methods for new gene identification.

The Process of New Gene Origination

New gene origination begins in a microevolutionary process. A protogene structure is first generated by a mutation in a single germ-cell genome. This protogene structure must then spread through the population until it is fixed. Various evolutionary forces, such as natural selection and genetic drift, govern the spread of the protogene through the population, thus making protogene fixation a population genetic process. Both before and after fixation, the protogene accumulates mutations that confer on it new structures and beneficial, sometimes novel, functions that are acted on by natural selection. From the point that the protogene carries an optimized function and is fixed in the genome, it is essentially the same as most other, older genes in the genome and can be considered a new gene. New gene studies typically focus on these first two stages (the fixation process and acquisition of a beneficial function) and the consequences of accepted mutations on the sequence, structure, and function of the new gene.

Fixation: the population genetic process by which a mutation spreads to all individuals in a population

Interest in new gene origination has raised several general problems. What molecular mechanisms generate new gene structures? What are the evolutionary forces that drive the origination of new genes? How often are new genes fixed in a species? Are there any rules or patterns of new gene origination? What are the roles of new genes in phenotypic evolution? This review provides an overview of efforts to understand the answers to these problems.

Approaches to Identifying New Genes

All new gene identification methods are based on comparative analysis of the structures of genes and genomes. Within a group of closely related species, we can define new genes as those that are present in all members of a monophyletic group but absent from all outgroup species (Figure 1). Early studies often serendipitously identified new genes by analyzing the phylogenetic distribution of genes via characterization of small genomic regions (e.g., 85, 108). Microarrays (42, 44, 45) and especially next-generation sequencing (168, 169) have made recent searches for new genes more purposeful efforts.

Monophyletic group: a group of taxa that share a common ancestor

Multiple genomes. Syntenic alignments (Figure 1) of genomes can be used to identify new genes from related species whose phylogenetic relationship is known. Syntenic alignments of each gene in each species allow identification of genes that are present or absent in one genome relative to another (Figure 1). In these comparisons, a gene can be defined as a new gene candidate if it is present in a certain clade or single species and absent in all outgroup species (Figure 1). Additionally, the orthologous genes that flank the new gene candidate appear in all species under consideration. This strategy has been used with great success in Drosophila and mammals (35, 168, 169, 172). New genes formed by different mechanisms also have correspondingly different structural features that can be used to infer the mechanism of new gene formation and the ancestral and derived characters.

Single genomes. Duplicate genes within a single genome can be identified using exhaustive pairwise comparisons between all annotated genes in that genome. Most mechanisms to form new gene structures (see below) result in certain structural changes in the new gene. For example, new genes created by

[Figure 1 image: (a) species tree for S1–S4 with genes G1, G2, and G3; (b) the Cdic, Sdic, and AnnX region in D. simulans, D. mauritiana, D. melanogaster, and D. yakuba]

Figure 1 New genes are defined using syntenic and sequence comparisons between the genomes of a group of related species. (a) The general procedure to identify new genes. The phylogenetic relationship of species S1–S4 is shown by the blue tree. The linking relationship between the genes G1 (yellow), G2 (red), and G3 (green) is shown within the species tree. Aligning the genomes of species S1–S4 shows that the new gene G2 is present in S1–S3 but absent in S4, indicating that G2 arose in the common ancestor of S1–S3. G2 was thus generated in the genome between old genes G1 and G3 in the common ancestor of S1, S2, and S3 (red star). (b) An example of using syntenic alignments to identify new genes. Sdic exists only in Drosophila melanogaster (110, 160). In this case, Sdic originated as a chimeric gene through recombination of duplicates of the two flanking genes, a 5′ piece of Cdic encoding a cytoplasmic dynein intermediate chain and a 3′ piece of AnnX.
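The procedure in Figure 1a reduces to a set comparison: a gene is a new gene candidate when it is present in every species of the focal clade and absent from every outgroup species. A minimal Python sketch of that rule follows, assuming presence/absence calls have already been made from syntenic alignments. The function name `new_gene_candidates` and the dictionary layout are illustrative assumptions, not taken from any published pipeline.

```python
def new_gene_candidates(presence, ingroup, outgroup):
    """Return genes found in every ingroup species and in no outgroup species.

    presence: dict mapping a gene name to the set of species in which a
              syntenic copy was found (hypothetical input format).
    """
    ingroup, outgroup = set(ingroup), set(outgroup)
    return sorted(
        gene
        for gene, species in presence.items()
        if ingroup <= species and not (outgroup & species)
    )


# Toy data mirroring Figure 1a: G1 and G3 are old flanking genes, G2 is new.
presence = {
    "G1": {"S1", "S2", "S3", "S4"},
    "G2": {"S1", "S2", "S3"},
    "G3": {"S1", "S2", "S3", "S4"},
}
candidates = new_gene_candidates(presence, ingroup=["S1", "S2", "S3"], outgroup=["S4"])
print(candidates)  # ['G2']
```

A fuller version would probably also require that the flanking orthologs (G1 and G3 here) are present in all species, since the text treats conserved flanks as part of the evidence.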

RNA-based duplication (retrogenes) most often lack introns, contain a stretch of adenine nucleotides at their 3′ end, and contain a pair of short flanking direct repeats. These signals fade with evolutionary time. Betrán et al. (11), Bai et al. (4), and Meisel et al. (100) took advantage of these new structures to identify new retrogenes in fruit flies; Wang et al. (147) in silkworm; and Emerson et al. (43), Marques et al. (92) and Vinckenbosch et al. (144) in primates and specifically humans. Divergence between the new retrogene and the original gene from which the retrogene was derived can be used to define the age of the new genes using a molecular clock. However, both strategies that we have discussed so far can depend on the current annotations, which are biased against the new genes, so caution must be taken when making claims about the presence/absence of genes in different genomes (167).

Predicting functionality of new genes. It is desirable to predict whether candidate new genes are functional before beginning more laborious functional and phenotypic analyses. Comparisons of open reading frame length, transcription of new gene candidates, and substitution rates between nonsynonymous and synonymous sites (Ka versus Ks) and polymorphism and divergence (60, 97) are often used to predict whether the new gene is functional. A Ka/Ks ratio significantly lower than one (for single genome data, Ka/Ks < 0.5 in a comparison between the new gene and its parental copy), for example, indicates functional constraint acting on the new gene, which we would expect if disruptive mutations were being prevented from accumulating in new protein-coding genes by natural selection. These methods are widely used as the first step to predict if a new gene is likely functional (e.g., 4, 11, 43, 147, 168, 169).

MECHANISMS TO FORM NEW GENE STRUCTURES

How are new gene structures formed? Mutation toward a new gene structure is the first step of new gene evolution, and 11 distinct molecular processes are known that contribute to the formation of new genes. These mechanisms are covered in depth elsewhere (27, 65, 84), so we only briefly touch on them here. We highlight several examples in Figure 2.

Gene duplication is thought to contribute most to the generation of new genes. A single (or a

few) new gene structure(s) can be formed at one time by DNA-based duplication (the copying and pasting of a DNA sequence from one genomic region to another) or retroposition. Although DNA-based duplications are often tandem (134), retroposed genes most often move to a new genomic environment (14, 15, 65, 172), where they must acquire new regulatory elements or risk becoming processed pseudogenes.

An important gene duplication mechanism is whole-genome duplication (WGD), which has occurred multiple times in eukaryote evolution, particularly in plants (126). Hundreds to thousands of duplicate genes are formed by a WGD event, and the vast majority of duplicates are quickly lost. However, estimates of duplicate gene retention after WGDs in teleost fishes (∼15% after 350 million years) (16), yeast (∼12% after 80 million years) (68), and Arabidopsis (∼30% after 80 million years) (13) all suggest that large fractions of duplicated loci can be retained. We show below that there are a variety of ways that new gene structures can subsequently acquire new functions (2, 33, 61, 78, 158, 170). McLysaght et al. (98) showed that WGD may more easily generate new paralogs.

Pseudogenes: genes that are thought to have lost their ability to code for a full-length protein

Alteration of Existing Gene Structures

New gene structures can be generated by modifying existing genes, domains, or exons. Gilbert (52) proposed that exons and domains could be recombined to produce new chimeric gene structures (Figure 2a,b). Chimeric proteins formed by gene recombination have been found in many organisms since their discovery in the LDL receptor gene (86, 130), including Drosophila (85, 118, 119), Caenorhabditis elegans (67), mammals (92, 133), and plants (151), and are estimated to have contributed ∼19% of new exons in eukaryotes (see Reference 86 and references therein). In addition, retroposed sequences may jump into or near existing genes and recruit existing exons, or involve two adjacent protein-coding genes (164). Conversely, new gene structures may be formed by splitting existing genes. Wang et al. (149), for example, found that gene duplication is an intermediate stage in an evolutionary process leading to gene fission (Figure 2c). Okamura et al. (113) demonstrated that frameshift mutations often generate new coding sequences and found 470 human gene duplicates that had done so. Xue et al. (157) found that the Epstein-Barr virus contains an early gene that undergoes frequent frameshifts, probably to combat host immunity. In addition, divergence in alternative splicing patterns between duplicate genes can generate distinct transcripts that produce noncoding RNAs or polypeptides with slightly or entirely different functions and rapidly alter duplicate gene structures and functions (51, 57, 69, 163, 173).

De Novo Genes

New gene structures may arise from previously noncoding DNA (Figure 2d). Chen et al. (24) showed that antifreeze proteins of low structural complexity, which bind and halt the growth of ice crystals in the blood of some polar fishes, were created by amplification of previously noncoding microsatellite DNA. Surprisingly, a number of de novo genes of high structural complexity were first found to originate from noncoding regions in Drosophila (6, 75). Since then, more de novo genes were identified in Drosophila (26, 168, 172) as well as in humans (71, 153, 155, 169), primates (137), murine rodents (104), protozoa (159), yeast (17, 21), rice (154), and viruses (122). Similar to strict de novo gene origination, horizontal gene transfer (HGT), the exchange of genes between genomes from distantly related taxa, can immediately add new genes and functions to a genome (Figure 2f). HGT is a major mechanism for the addition of new genes to prokaryotic genomes (73, 111) but has also been reported in a number of eukaryotic organisms, including plants (8, 161), insects (102), and fungi (56) (Figure 2f).

Noncoding RNAs

Not all new genes code for proteins. Noncoding RNAs were found to play an important

role in neuronal functions in the early 1990s (136). A large number of functional RNAs from noncoding regions have been reported to play vital roles in a wide variety of organisms (7, 80). In Drosophila, microRNAs appear to turn over rapidly, but can be strongly influenced by positive selection (89, 90, 109). Strikingly, Dai et al. (34) showed that a new long noncoding RNA influences courtship behavior in D. melanogaster. Pseudogenes are conventionally thought of as dead genes that play no functional roles (41), but they may evolve functions in regulating expression of related genes. Zheng & Gerstein (171) recently found that many mammalian pseudogenes are transcribed and thus may still function. McCarrey & Riggs (96) predicted that pseudogenes may regulate their parental genes, similar to long noncoding RNAs or miRNAs. An explicit mechanistic model of the use of pseudogene transcripts as decoys for cross-regulating expression of target genes was actually proposed and tested by Marques et al. (93, 94).

New Gene Regulatory Systems

New genes must acquire a specific transcription regulatory system to ensure certain temporal and spatial expression patterns. Betrán et al. (10) investigated the origin of the male-specific expression of Dntf-2r, a retroposed gene in the D. melanogaster–Drosophila simulans clade. The new retrogene did not contain the parental promoter but had acquired a new β2-tubulin-like promoter by recruiting a novel 5′ regulatory sequence. This regulatory sequence drives testis-specific expression of β2-tubulin and appears to still do so for Dntf-2r. In addition, the new retrogene Xcbp1 recruited existing neuron promoters present at its site of integration (29). This co-opted mode of promoter recruitment is also observed in human retrogenes (144) and may be a general mode for retrogene promoter gain (65). Additionally, Ni et al. (107) observed that eight new genes essential for Drosophila development evolved binding sites for the CCCTC binding factor (CTCF) insulator under positive selection, ensuring the delineation of the regulatory domains of these genes.

Transposable Elements

Transposable elements (TEs) can contribute to functional divergence between duplicate genes in several ways, all similar to those described above (12). For instance, TEs can mediate gene recombination by carrying coding sequences from one part of the genome to another (63, 158) and can even themselves be incorporated into existing coding sequences (46, 88, 106). In addition, TEs were recently found to be a source of microRNAs, which are major components of posttranscriptional regulation of expression (116).

Figure 2 Representative new genes exhibiting various new gene origination mechanisms. (a) Jingwei, a new gene found only in Drosophila teissieri and Drosophila yakuba, was generated by a combination of retroposition, DNA-based duplication, and gene recombination, which formed a chimeric gene consisting of an Adh-derived enzymatic domain and a hydrophobic domain from Ymp (85, 150). (b) PIPSL in humans is a consequence of gene fusion between two adjacent ancestral genes by read-through transcription and subsequent coretroposition (164). (c) Gene fission split the ancestral gene monkeyking into two distinct genes in Drosophila mauritiana, revealing an intermediate process of gene fission aided by gene duplication and complementary degeneration (149). (d) The gene ENSMUSG00000078384 in mouse revealed the evolutionary process of de novo gene origination (104). Red boxes are ancestral stop codons (TGA) with two triangles showing the positions of the enabling mutations, including a substitution and a deletion. (e) Two new genes in humans, DAF and mNSCI, were generated by domesticating transposable elements, Alu, and short interspersed elements (B1–B4) (91, 106). DAF and Alu elements together make an interesting case in which alternative splicing generated a new isoform in the mammalian genome. (f) Horizontal gene transfer (HGT) is prevalent in bacteria with mechanisms such as homologous recombination (111). Antibiotic resistance genes can be acquired by host genomes containing the intI gene (which encodes integrase), a recombination site (Att), and a promoter to express the captured gene, as depicted by the process shown in the panels on the left.

Although we still have a developing picture of the contributions of each of these mechanisms for new gene formation in different taxa,

work in humans and Drosophila suggests that ∼80% of new genes are formed by DNA-based duplication, 5% to 10% by de novo origination, and ∼10% by retroposition (168, 169). And although these mechanisms may generate the initial gene structures, many new structures (in a large variety of taxa) undergo radical structural renovation to change exon-intron structure and even recruit new or existing coding sequences into the new locus (30, 49, 151, 172). It has been observed that the various mechanisms are often combined to determine the structure of new genes.

Evolution of Transcription Units

Other than the origination and evolution of the macrostructure of genes described above, it was recently found that the transcription units in the genes of vertebrates have been directionally evolving toward productive transcription. Almada et al. (1) reported a highly significant linear correlation between the gene age and the critical signals to define transcription units in a gene, including the U1 small nuclear ribonucleoprotein recognition sites and polyadenylation sites (PASs). The observed incremental gain of the U1 sites and gradual loss of PASs in the 5′ end of protein-coding genes revealed a selection for a U1-PAS axis for productive transcription.

ABUNDANCE AND ORIGINATION RATES OF NEW GENES

The advent of whole-genome sequences for many organisms allowed identification of many new DNA-based and RNA-based duplicate genes (e.g., 11, 43). With more genome sequences available, especially in closely related groups such as the twelve Drosophila species (32), it became possible to investigate the rates of new gene origination in particular lineages. We review these findings in Drosophila, mammals, and plants. There have been no reports of new gene origination rates for mechanisms other than DNA-based duplication, RNA-based duplication, de novo origination, and gene recombination. Thus, the rates of new gene origination we highlight should be viewed as serious underestimates.

Drosophila

The first estimate of the rate of new gene origination was made for retrogenes in Drosophila in 2002 by Betrán et al. (11), who identified ∼150 retrogenes in D. melanogaster (4, 11) that arose after the divergence of the Drosophila and Sophophora subgenera approximately 50 Mya. Their estimate of three new retrogenes per million years in the lineage leading to D. melanogaster was corroborated by an independent estimation of ∼1.5 new retrogenes per million years based on cDNA hybridization against salivary polytene chromosomes in species in the D. melanogaster subgroup (∼25 million years old) (158). Zhou et al. (172) computationally estimated the rate of new gene origination via DNA-based duplication, retroposition, de novo origination, and gene recombination in the D. melanogaster subgroup to be 5–11 new genes per million years and found different rates for the four mechanisms. In particular, approximately 80% of new genes added to the D. melanogaster lineage genome were generated by DNA-based duplication. More extensive and detailed analyses of DNA-based and RNA-based duplicates were conducted by Vibranovski et al. (142), Meisel et al. (100), and Zhang et al. (168). Zhang et al. (168) analyzed the 12 Drosophila genomes and estimated that ∼17 duplicate genes per million years arose in the Drosophila genome. Figure 3a shows the distribution of these new genes on the Drosophila phylogeny.

Mammals

Emerson et al. (43) and Marques et al. (92) identified ∼120 retrogenes in the human genome, yielding an estimated retrogene origination rate of one retrogene per million years in the lineage leading to humans. Zhang et al. (166, 169) systematically identified new genes in vertebrates, especially in primates, and showed that the rates of new gene origination are variable

[Figure 3 image: phylogenies with branch labels, new gene counts, and a time axis in Mya. (a) Drosophila: D. melanogaster, D. sechellia, D. simulans, D. yakuba, D. erecta, D. ananassae, D. persimilis, D. pseudoobscura, D. willistoni, D. grimshawi, D. mojavensis, D. virilis. (b) Vertebrates: human, chimp, orangutan, rhesus, marmoset, mouse, guinea pig, dog, cow, armadillo, opossum, platypus, chicken, lizard, frog, fugu, zebrafish]
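The rates quoted in this section can be read, to a first approximation, as a count of new genes divided by the time over which they accumulated. The sketch below applies that reading to the retrogene figures quoted above for D. melanogaster (about 150 retrogenes over roughly 50 million years). The function name is an illustrative assumption, and published estimates may correct for gene loss and detection bias in ways this arithmetic ignores.

```python
def origination_rate(n_new_genes, elapsed_my):
    """New genes fixed per million years, assuming a constant rate and no loss."""
    if elapsed_my <= 0:
        raise ValueError("elapsed_my must be positive")
    return n_new_genes / elapsed_my


# ~150 D. melanogaster retrogenes arising over ~50 million years (values from the text).
rate = origination_rate(150, 50)
print(rate)  # 3.0
```

The result matches the three retrogenes per million years reported for the lineage leading to D. melanogaster.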

in different evolutionary stages of vertebrates (Figure 3b), although 25–30 genes generated mostly by DNA-based and RNA-based duplication arise per million years. Interestingly, this rate is much higher on the branches closer to human (66 new genes per million years in the human lineage alone) (166).

Plants

In contrast to flies and mammals, Zhang et al. (165) reported that 0.6 retrogenes per million years arose in the Arabidopsis thaliana genome, a rate comparable to Populus (174), and a microarray-based study in Arabidopsis identified 94 new genes created by DNA-based duplication and retroposition (45). Surprisingly, Wang et al. (151) found that a very high rate of retrogene and chimeric gene origination was present in rice: More than 1,000 retrogenes were identified in the rice genome, 380 of which evolved chimeric gene structures by recruiting previously existing genes into their gene structures. These authors determined the rate of chimeric gene origination to be 7 per million years in grass genomes in the lineage leading to rice, 50 times the origination rate of chimeric genes in humans (144), and the highest rate of chimeric gene origination known. In addition, Jiang et al. (63) identified more than 3,000 gene recombinants in rice mediated by Pack-Mutator-like transposable elements (Pack-MULEs). These results suggest a huge potential for protein diversity in plant genomes.

Along with these extensive studies in Drosophila, mammals, and plants, there have been many valuable investigations of chimeric genes and retrogenes in Caenorhabditis elegans (66), fish (25, 49), silkworm (147), and chicken (62).

Copy Number Variation

Inexpensive whole-genome analysis has also made it possible to identify genes at the very earliest stages of their evolution, before fixation. Abundant copy number variation (CNV) of individual genes has been detected in Drosophila (40, 42, 124), humans (47), mouse (54), and C. elegans (81). Dopman & Hartl (40), Emerson et al. (42), Cardoso-Moreira & Long (20), and Cardoso-Moreira et al. (19) identified more than 1,000 partial and 100 complete gene duplications/deletions in just 15 strains of D. melanogaster relative to the reference genome using microarray hybridization. In addition, next-generation sequencing and microarrays have identified more than 1,200 partial and 600 complete gene duplications/deletions in 179 individual human genomes relative to the reference genome (101, 125). The recent sequencing of 43 genomes in two D. melanogaster populations revealed more CNVs, including 2,588 duplications and 3,336 deletions relative to the reference genome (74). The large number of new genes segregating in populations is just now beginning to be appreciated and investigated further. An active area of research will be to perform functional and statistical analyses of these new genes to understand their earliest stages of evolution. In all, these studies have shown that new gene origination rates can differ between taxa, yet are appreciable in all groups studied. These results further strengthen the conclusion that new gene origination is a general evolutionary process.

PATTERNS OF NEW GENE ORIGINATION

Gene Traffic in Drosophila, Humans, and Other Organisms

With the large number of new genes identified in various organisms, researchers were able to investigate statistical patterns of new gene characteristics to explore the mechanistic and evolutionary forces that impact the formation, origination, and evolution of new genes. Betrán et al. (11) examined the chromosomal distribution of retrogenes and their parental copies in D. melanogaster (Figure 4a). Surprisingly, these authors found a significant excess of autosomal retrogenes derived from X-linked parental genes (X→A) and a significant deficiency of retrogenes formed in the opposite direction (A→X) or between autosomes

[Figure 4 graphic: diagrams of retrogene movement between the X chromosome and the autosomes for (a) Drosophila and (b) humans. Arrow labels: "Excess male-biased functions" and "Excess non-sex and female functions". Percentages shown on the arrows: 114%, 260%, 299%, –39%, –10 to –12%, and –33%.]

Figure 4 Retrogene traffic in (a) Drosophila (11, 142) and (b) humans (43). Each arrow indicates the movement of retrogenes from the parental gene chromosomal location to the retrogene’s location. The size of the arrow indicates the intensity of gene movement between chromosomes, and the percentages show quantitatively the excess of movement over the null expectation (random origination and insertion). The functions of the retrogenes are indicated.
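The percentages in Figure 4 express how far an observed number of retrogene movements departs from a null expectation. A minimal sketch of that arithmetic follows, assuming the excess is computed as (observed − expected) / expected; the function name and the counts are hypothetical and are not taken from the cited studies, whose null models account for chromosome-specific origination and insertion.

```python
def excess_over_null(observed: float, expected: float) -> float:
    """Percent excess (positive) or deficiency (negative) relative to a null expectation."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return 100.0 * (observed - expected) / expected


# Hypothetical counts, for illustration only (not data from refs. 11, 43, or 142).
print(excess_over_null(observed=20, expected=10))  # 100.0 -> twice as many as expected
print(excess_over_null(observed=6, expected=10))   # -40.0 -> a deficiency
```

A positive value corresponds to an excess of movement in that direction, and a negative value to a deficiency.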

(A→A). Dai et al. (34a) further revealed that retrogenes derived from autosomal parental copies tend to locate to the same chromosome as the parental copies. However, 42 out of the 43 retrogenes exhibited X→A movement; only one retrogene moved X→X. These two observations clearly reveal a striking pattern of new gene origination in flies: Retrogenes derived from X-linked genes prefer to copy into autosomes. This directional movement of new genes is called gene traffic (43). These results hold in the 12 sequenced species of Drosophila (100, 142) and in Anopheles gambiae (5, 138). Interestingly, 90% of X→A retrogenes in D. melanogaster are expressed in testis, a significantly higher proportion of testis-expressed genes than average (11), suggesting that the retrogene's function (in this case, male-beneficial function) can be associated with its relocation. The symmetric pattern was observed in silkworm, which has ZW sex determination (females are ZW and males ZZ), whereby genes retroposed from Z→A tend to be ovary expressed (147). Gene traffic appears to be general in Drosophila for different mechanisms of new gene formation, as Vibranovski et al. (142) also showed that new genes created by DNA-based duplication exhibit the same X→A movement and testis expression. Moreover, the neo-X chromosome, an autosomal chromosome arm that fused to the ancestral X chromosome in the Drosophila genus evolution, also shows the same excess of gene traffic (100, 142).

Relative to Drosophila, human and mouse studies revealed similar yet distinct patterns of gene traffic (43) (Figure 4b). Compared with a neutral expectation based on the chromosomal distribution of processed pseudogenes, which are expected to be evolving neutrally, there is an excess of X→A retrogene movement and most X→A retrogenes exhibit testis expression. However, there is also a significant excess of A→X retrogene movement in humans, and these excess A→X retrogenes exhibit either female expression or unbiased expression, with nonsignificant male expression. A→A movement is very low in humans (43). The mouse genome shows a very similar pattern. Zhang et al. (166, 168) have shown that these patterns exist for DNA-based duplicates, retrogenes, and de novo genes in Drosophila, humans, and mouse.

Consequences of Gene Traffic for Genome Evolution

If gene traffic has been historically important for genome evolution, the majority of

testis-biased/male-biased genes should be autosomal, contrary to the previous conclusion that the X was a hotbed for male-biased genes (148). Several microarray-based studies of male-biased genes and their chromosome locations by Ranz et al. (117) and Parisi et al. (114) in Drosophila, Khil et al. (70) in mouse, and later by Zhang et al. (166) in humans and mouse have confirmed this prediction. In Drosophila, Zhang et al. (168) showed a smooth transition of new male-biased genes from X linkage to autosomal linkage over evolutionary time.

Models to Interpret the Causes of Gene Traffic

In general, models to explain gene traffic, and experimental evaluation of those models, show that natural selection is a major force governing gene traffic but that mutational processes likely also play a role (38). Meiotic sex chromosome inactivation (MSCI) in the male germ line (11, 43, 139, 140), dosage compensation in the heterogametic sex (3, 143), sexual antagonism between male- and female-beneficial genes (22, 128), and meiotic drive (131, 132) have all been implicated in driving gene traffic. The relative role of each of these forces has been hotly debated. MSCI has a strong effect in mammals (70), and experimental evidence for MSCI in Drosophila comes from several studies (59, 139, 140). Vibranovski et al. (139) showed that genes that are highly expressed in the meiotic phase of spermatogenesis (when the X chromosome is predicted to be inactivated) are significantly enriched on the autosomes. Conversely, genes expressed in the mitotic phases of spermatogenesis are randomly distributed throughout the genome. Other studies suggest reduced expression throughout spermatogenesis, including in the spermatogonia, which also discredits dosage compensation models (99; however, see 141). A clear-cut single cell transcriptome is needed to clarify these issues. Along with the MSCI model, other non-germ-line-based models, e.g., sexual antagonism, are also necessary to interpret the expression of new genes in the male somatic cells, although these models need to be rigorously experimentally tested.

MSCI model: X chromosome inactivation during spermatogenesis favors relocation of genes involved in spermatogenesis to autosomes

Correlation Between Gene Age and Expression

Early studies revealed a connection between the expression and the ages of new genes. Betrán & Long (10) showed that Dntf-2r, a ∼10 million-year-old gene in the D. melanogaster subgroup, is expressed only in testis; however, its parent Dntf-2 is expressed ubiquitously. Almost all retrogenes in Drosophila appear to have testis expression (4) and to have maintained testis-biased or testis-specific expression independent of age (50). Vinckenbosch et al. (144) showed that new human retrogenes are often transcribed in testis and later evolve stronger and more diverse spatial expression patterns, coining the "out of the testis" hypothesis. Whether or not the testis is the starting point for new genes, a general survey of the expression patterns for new genes that originated within vertebrates revealed strong positive correlation with age in both transcription intensity and spatial expression (167). It is possible that this testis-biased pattern of retrogene expression is due to our inability to detect genes expressed at low levels in different tissues, but this issue should be resolved soon with advances in next-generation sequencing.

EVOLUTIONARY FORCES ACTING ON NEW GENES

Evolutionary forces, such as natural selection and genetic drift, operate on both facets of new gene evolution: the fixation of new gene loci and their acquisition of a beneficial function. These two facets may overlap. In this section, we discuss theoretical models developed to describe how new genes arise and acquire novel functions as well as general approaches to studying new genes and the selective forces that act on them.

Selective Models of New Gene Evolution

Muller (103) was among the first to recognize the potential importance of duplicate genes in evolution. He proposed a simple model whereby new duplicate genes could acquire novel, beneficial functions distinct from those of the original copies. Ohno (112) further elaborated on Muller's duplication model as a major means of neofunctionalization. However, Ohno also predicted that duplicate genes are most often inactivated and become pseudogenes. This classic model assumes that the new gene is functional upon duplication and that the new gene subsequently acquires mutations that provide a novel beneficial function. The novel function is then preserved in the genome by natural selection.

Neofunctionalization: the process by which a new gene acquires a novel function

However, strictly duplicate genes are redundant, and beneficial mutations are extremely rare. How do new duplicate genes remain in the population long enough to accumulate a beneficial, selected mutation(s)? This problem led to the development of models that predict selective preservation of both copies at all stages of their evolution: adaptive radiation (AR), innovation-amplification-divergence (IAD), and escape from adaptive conflict (EAC). The AR model proposes that gene duplication itself is favored, e.g., for increased dosage of a gene product, and that the new duplicates then undergo functional radiation (48). Thus, AR posits that novel functions are acquired after duplication. IAD and EAC, in contrast, propose that ancestral loci develop novel beneficial secondary functions before duplication (9, 36). Under IAD, repeated gene duplication is favored to increase the dosage of the novel secondary function. Different duplicates are then free to optimize the ancestral or novel secondary function, and only the two best copies are retained in the genome. The increase in the number of duplicate genes within the AR and IAD models also provides additional targets for beneficial mutations, thus increasing the probability and speed of functional improvement. EAC predicts that the bifunctional ancestral gene is subject to selection before gene duplication, that adaptive conflict between the ancestral function and the new function constrains improvement of the selected function(s) before duplication, and that adaptive changes and functional improvement occur in the daughter genes after duplication.

For additional information on duplicate gene evolution, see Conant & Wolfe (33), who suggest that preservation of new genes stems from the co-option of existing functions to serve new purposes, and Walsh (145, 146), who gives a detailed mathematical description of the models and relative probabilities of neofunctionalization and pseudogenization.

Examples of EAC (36), IAD (105), and AR (48) have been published, and each model has specific predictions for what we should observe if a new gene originated by each process (33). However, none of these models can be used as a statistical framework for rigorously testing the roles of evolutionary forces in new gene origination. Classic molecular population genetic tests based on nucleotide substitution patterns and allele frequency spectra do provide this framework and have been used extensively to detect selection on new genes. These tests, such as the M-K (McDonald-Kreitman) test (97) and the HKA (Hudson, Kreitman, and Aguadé) test (60), detect elevated rates of amino acid substitutions (M-K) or reduced effective population size (HKA) at loci. In addition, Thornton (135) introduced a coalescent-based model that can be used to test for selection on CNV. The HKA test and Thornton's test compare measurements of nucleotide variation in genes with a distribution of parameter values derived from neutral coalescent simulations. Thus, the M-K, HKA, and Thornton's tests are used to test the classic model. Each of these five models (classic, AR, IAD, EAC, and statistical) predicts that new genes should experience strong natural selection after they are formed. We now discuss some of the evidence indicating that this often appears to be the case.
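The M-K contrast mentioned above can be sketched as a 2×2 comparison of fixed versus polymorphic changes at nonsynonymous and synonymous sites. The following is a minimal illustration and not the procedure used in the cited studies: it assumes Fisher's exact test as the significance test, reports the neutrality index as a summary statistic, and uses hypothetical counts. The function names `fisher_exact_two_sided` and `mk_test` are introduced here for illustration only.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def mk_test(dn: int, ds: int, pn: int, ps: int) -> tuple[float, float]:
    """Contrast fixed (dn, ds) with polymorphic (pn, ps) nonsynonymous/synonymous changes.

    Returns (neutrality_index, p_value). A neutrality index below 1 means
    relatively more nonsynonymous changes among fixed differences than among
    polymorphisms, the pattern expected under positive selection.
    """
    p_value = fisher_exact_two_sided(dn, ds, pn, ps)
    ni = (pn * ds) / (ps * dn) if ps and dn else float("nan")
    return ni, p_value


# Hypothetical counts, for illustration only (not data from any cited study).
ni, p = mk_test(dn=20, ds=10, pn=5, ps=20)
print(f"neutrality index = {ni:.3f}, p = {p:.4g}")
```

The same 2×2 logic applies to any fixed-versus-polymorphic contrast between two classes of changes; only the meaning of the rows and columns differs.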

[Figure 5 graphic. Panel (a): counts of fixed and polymorphic retrogenes for D. simulans/D. melanogaster (fixed 65/32, polymorphic 30/4) and for chimpanzee/humans (fixed 70/20, polymorphic 36/3). Panel (b): the origin of jingwei from Adh by retroposition in the D. teissieri/D. yakuba lineage, with branch and triangle ratios 0/0, 9/0, 4/11, 21/16, 0/18, 2/8, and 2/19. Panel (c): Adh-derived chimeric genes, each compared with Adh (scale bar 0.01): Adh-twain in the D. subobscura-guanchi clade (KA = 0.018, KS = 0); jingwei in the D. teissieri-yakuba clade (KA = 0.020, KS = 0); Adh-Finnegan in the D. hydei-mettleri clade (KA = 0.038, KS = 0); siren in D. ananassae and the D. bipectinata complex (KA = 0.064, KS = 0).]

Figure 5 Positive Darwinian selection acting on new genes. (a) Positive selection for the fixation of new retrogenes in Drosophila (124) and humans (123). The numerator and denominator show the numbers of retrogenes that originate on the autosomes and the X, respectively. Tests based on the M-K framework indicate an excess of fixed X→A retrogenes in both species and strong positive selection for X→A retrogene movement. (b) The jingwei (jgw) gene in Drosophila (85). The ratios over the branches are the numbers of nonsynonymous changes over the numbers of synonymous changes, and the ratios in the triangles are the ratios of divergence between the species and the polymorphisms. M-K tests and Ka/Ks ratios indicate strong positive selection acted on jgw shortly after it originated. (c) Selection acted on all Adh-derived chimeric genes in Drosophila (64), as indicated by elevated Ka/Ks ratios.

Fixation of New Genes

The first study to identify signatures of selection on a new gene journeying to fixation was performed by Llopart et al. (82), who analyzed a new variant of the jingwei gene in Drosophila teissieri, which lost its second intron. This D. teissieri–specific intron presence-absence polymorphism exhibits a significant excess of rare alleles and patterns of nucleotide polymorphism that is consistent with moderate natural selection driving the polymorphism to fixation. Selection has also been detected on CNV in D. melanogaster and other organisms. Emerson et al. (42) found a genome-wide pattern consistent with strong purifying selection on all CNVs except duplications of whole genes. That is, single-gene duplications are under significantly weaker purifying selection than partial gene duplications or partial or complete gene deletions. Similarly, Schrider et al. (123, 124) showed a significant excess of fixed versus polymorphic retrogene CNV originating from the X chromosome in both Drosophila and humans, indicating that natural selection governs the patterns of retrogene CNV evolution (Figure 5a). Overall, these studies show that

natural selection can play a key role in driving new genes to fixation. In addition, they highlight the use of classic population genetic tests in determining whether selection acts on new genes during their journeys to fixation.

Selection on Sequence Changes in New Genes

In addition to studies of the evolutionary forces governing the fixation of new genes, many studies have investigated the effects of selection and drift on new gene sequences. Long & Langley (85) showed that the new chimeric gene jingwei in D. teissieri and Drosophila yakuba contains a significant excess of nonsynonymous substitutions compared with nonsynonymous polymorphisms (relative to the ratio of synonymous substitutions to polymorphisms), indicating that amino acid substitutions were rapidly driven to fixation shortly after the origination of jingwei (Figure 5b). Similarly, Nurminsky et al. (110) showed that a D. melanogaster–specific gene family, Sdic, involved in sperm motility rapidly acquired a new exon-intron structure and testis-specific expression (Figure 1). Sdic is a chimeric gene composed of a 5′ piece of Cdic, encoding a cytoplasmic dynein intermediate chain, and a 3′ piece of AnnX, a phospholipid binding protein. This fusion protein underwent rapid structural renovations, including the conversion of a Cdic intron into an exon and an AnnX exon and Cdic intron into a testis-specific promoter. Low levels of sequence polymorphism, preservation of coding potential, and the absence of Sdic in other closely related species suggest that Sdic was rapidly swept to fixation.

These first discoveries sparked searches for general evolutionary patterns in new genes. Jones & Begun (64) searched for common patterns in the evolution of three Adh-derived chimeric genes in different lineages of Drosophila. All three new genes quickly accumulated a large number of amino acid replacement substitutions, several at identical amino acid sites, in the Adh-derived region shortly after they arose. Strikingly, Jones & Begun (64) and Shih & Jones (127) showed that different Adh-derived fusion genes often accumulate mutations at the same sites, regardless of to which other gene they have fused (Figure 5c). In addition, each of the four Adh-derived fusion genes exhibits strong signals of accelerated amino acid substitution using classic population genetic statistical tests (e.g., M-K test).

Some of these observations have recently been borne out by genome-wide studies. Xu et al. (156) surveyed structural differences between more than 600 paralogous pairs of genes in plants and found that most new genes underwent radical changes in exon/intron content and boundaries as well as insertion/deletions. And using molecular population genetic tests, Chen et al. (30) found that young genes in D. melanogaster show strong signals of selection. These authors predicted that ∼25% of amino acid substitutions in young essential genes were fixed by natural selection. In addition, this signal of selection diminishes as genes grow older. Altogether these studies indicate that there are general patterns to new gene evolution: New genes often undergo rapid (or immediate) structural and sequence renovations and expression pattern changes that are driven by strong natural selection.

Analysis of New Gene Structure and Function

In addition to analyses of new gene frequencies and nucleotide changes, many groups have investigated the evolutionary forces acting on new genes by analyzing new gene functions, genomic locations, or expression patterns. This complementary approach has revealed several fundamental patterns of new gene origination. Chen et al. (24) and Cheng & Chen (31), for example, investigated the antifreeze proteins found in the blood of several orders of Arctic and Antarctic fish. These proteins independently evolved in the different orders, yet they consist of nearly identical tripeptide repeats. These tripeptide repeats were generated de novo by amplification of short nucleotide sequences. These studies showed that similar environmental pressures may favor the generation of genes with similar functions.

In addition, as we showed in the previous section, testis-biased genes are underrepresented on the D. melanogaster and mammalian X chromosome. Diaz-Castillo & Ranz's (38) analysis of the genomic location of genes relative to the position of chromosome domains during spermatogenesis led the authors to alternatively propose that the enrichment of testis-biased retrogenes on the autosomes is caused by an increased availability during spermatogenesis of open chromatin domains that contain testis-expressed genes. This larger target for retrogene integration allows a higher proportion of these retrogenes to acquire testis-biased expression. These general observations of the location of sex-biased genes, and their general movement off of the X chromosome, indicate that differences in expression alone can dictate where in the genome new genes originate. Together, these results show that studies of general patterns of extant gene locations, structures, and expressions can be informative of new gene origination and evolution.

PHENOTYPIC EFFECTS OF NEW GENES

Studying the roles of new genes in phenotypic evolution recently became feasible with the advent of sophisticated genetic tools and molecular techniques as well as significant progress in related areas of important phenotypes in biology. Young genes are often assumed to be dispensable because important functions are thought to require a long evolutionary period to be developed and optimized (76). However, studies in the past decade have found numerous young genes with important, and sometimes essential, functions at the molecular, cellular, and individual level (27). These findings added significantly to the classic reports of the diverged or novel functions in the genes created by duplication (see 53, 79 for references therein).

Biochemical Pathways

New genes can generate new biochemical pathways and products if they are enzymes or become enzymes. Zhang et al. (162) showed that jingwei evolved the capacity to catalyze breakdown of long-chain alcohols in D. yakuba and D. teissieri, whereas the parent Adh can only act on short-chain alcohols. In Arabidopsis, Weng et al. (152) and Matsuno et al. (95) demonstrated that three recently evolved new duplicate genes from the P-450 family, Cyp98A9, Cyp98A8, and Cyp84A4, assembled two new biochemical pathways related to phenolic metabolism that are required for pollen development and α-pyrone synthesis.

Gene Expression Networks

New genes can also be quickly integrated into existing gene networks. Chen et al. (30) observed that almost all young essential genes have been assimilated into protein-protein physical interaction networks in Drosophila, and a significant number of these young genes have developed multiple interactions with old genes (Figure 6). Integration appears to be driven by natural selection. Several new genes have become new hubs. Analysis of one new gene, Zeus, derived from the DNA-binding protein Caf40 via retroposition (28), revealed that it retained ∼30% of Caf40's DNA-binding

Figure 6 New genes integrated into gene networks and reshaped those networks. (a) New yeast genes that originated through duplication-based (blue) and non-duplication-based (red) mechanisms since the recent whole-genome duplication (<100 Mya) were integrated into the physical interaction network (18). The orange box highlights a module composed of two new genes involved in the pathway to form and process actin. DID4 (green box) interacts with 11 new genes within three steps. (b) New genes in Drosophila formed hubs in protein-protein interaction networks (30). (c) The Drosophila melanogaster–Drosophila simulans-specific gene Zeus quickly accumulated more than 100 amino acid substitutions (red, yellow, blue, and light green) in its nucleotide-binding domains under positive selection. Consequently, it evolved into a new DNA-binding motif that evolved hundreds of new gene links to rewire the gene networks that control reproduction (28).








[Figure 6 graphic, panels (b) and (c). Panel (c) compares the nucleic acid binding grooves and the binding-motif sequence logos (positions 1–21, 0–2 bits) of Caf40 and Zeus. Annotations: 107 amino acid substitutions in Zeus 4–6 Mya after generating Zeus through retroposition; Zeus has created 193 new gene links and kept only 30% (129) of ancestral links of Caf40.]

sites. However, in a short evolutionary period (4–6 million years) Zeus acquired 193 new binding sites through which it activates or represses hundreds of downstream genes involved in reproduction. This observation indicates that gene expression networks can be rapidly and globally reshaped in evolution by new genes. Li et al. (77) showed that a de novo gene in yeast can suppress a previously existing mating type–control pathway, thus rewiring the structure of gene networks in the species. Capra et al. (18) revealed that new genes in yeast become more integrated into cellular networks over time. The modified networks are not necessarily novel or unimportant, either: Konikoff et al. (72) found that genes have been continually added and removed from the Wnt and TGF β-signaling pathways, ancient networks involved in animal development.

Development

Surprisingly, new genes can quickly acquire essential roles in development. Chen et al. (30) identified 59 genes that originated in the past ∼35 million years in Drosophila that evolved essential developmental functions. Silencing expression of these young genes causes development failure in early to late pupae and in some cases at even earlier stages (Figure 7a,b). Furthermore, tissue-specific knockdown of these young genes can cause morphological defects in adult flies. Silencing new genes can also have a critical effect on reproduction, even when the individual can complete development. The duplicate gene nsr (novel spermatogenesis regulator) exists only in the four species of the D. melanogaster clade that diverged 3 Mya, yet it evolved an essential function required for sperm individualization (39). Similarly, silencing Zeus, a gene in the same group of Drosophila, causes sterility by disrupting testis and sperm development (28).

Recent work on Umbrea, a 12–15 million-year-old gene in Drosophila, carefully dissected the evolutionary steps this young gene took to becoming essential in D. melanogaster (121). Umbrea arose by DNA-based duplication of heterochromatin protein 6 (HP6) 12–15 Mya. Subsequent loss of one of its two domains (the chromodomain) and the accumulation of protein coding changes in the remaining chromoshadow domain gave Umbrea a distinct chromatin localization pattern at the centromere. Umbrea appears to have become essential only after it lost the chromodomain 5–7 Mya. Careful molecular dissection, ancestral protein resurrection, and population genetic analyses are the keys to understanding the processes and time new genes take to acquire important roles in organisms.

Brain Evolution in Flies and Humans

Chen et al. (29) investigated the expression patterns of new genes in Drosophila and found that approximately five new genes per million years evolved brain expression patterns, mostly in structures involved in olfaction and learning/memory. All new brain genes are expressed in the α/β lobe, an evolutionarily new set of neurons, implicating the roles of new genes in the evolution of this brain structure. Some of the new brain genes have significant effects on behavior. For example, Xcbp1 and Desr influence foraging behaviors (29), and sphinx influences courtship behaviors (34). The frequent acquirement of new brain genes into the genome and the behavioral phenotypes of some of these genes suggested rapid evolution of behaviors, which is consistent with the remarkable observations of Rollman et al. (120) that detected a great variation in the olfactory behavioral response associated with odorant receptor gene duplicates within the natural population of D. melanogaster. The incorporation of new genes into the brain is not specific to Drosophila. Zhang et al. (166) found a correlation between new genes and brain evolution in the human lineage. A high proportion of hominoid-specific and human-specific genes are expressed in the prefrontal cortex and temporal lobe, the newest brain structures, in early fetal development. Strikingly, 54 of 380 human-specific genes are expressed in these two brain regions, regions that are critical for proper cognitive function.

[Figure 7 graphic. Panel (a) labels: genes G13463, CG6289, and G32376; ages 30 Mya, 9 Mya, and 18 Mya; arrest stages early pupa, early larva, and pharate. Panel (b): phylogeny of D. melanogaster, D. simulans, D. mauritiana, D. teissieri, D. yakuba, D. erecta, and D. ananassae with a time scale in Mya and the labels CG7627, "Gene duplication", and YLL1, followed by this table.]

LINES      METHOD/MUTATION          PHENOTYPES
11343      P-element insertion      Lethal, pupal stage
SH2-0995   EMS/G717S                Lethal, pupal stage
SH2-1101   EMS/T765I                Lethal, pupal stage
SH2-0504   EMS/synonymous           Viable
V39539     RNAi/constitutive Gal4   Lethal, pupal stage
V39540     RNAi/constitutive Gal4   Lethal, pupal stage

Figure 7 The essential effects of new genes on development. (a) Development was terminated in a developmental stage when three different genes were knocked down using RNA interference (RNAi). (b) YLL1 originated in the common ancestor of the Drosophila melanogaster subgroup species ∼6–10 Mya, yet showed lethal effects in the pupal stage when silenced by RNAi, mutated by EMS, or disrupted by the P element (30).

One of these genes, SRGAP2, is involved in neocortical development (23, 37).

Sexual Dimorphism and Sexual Reproduction

New genes impact sexual dimorphism by participating in the genetic systems that control sexual reproduction and sex determination (87). As the aforementioned patterns of new gene origination show, the vast majority of new genes are sex-biased, especially male-biased, and their origination processes show directional copying between the sex chromosomes and autosomes (e.g., 11, 43). A number of new genes

