New Gene Evolution: Little Did We Know
GE47CH14-Long ARI 29 October 2013 14:13

Manyuan Long,1,2,∗ Nicholas W. VanKuren,1,2 Sidi Chen,3 and Maria D. Vibranovski4

1Department of Ecology and Evolution, The University of Chicago, Chicago, Illinois 60637; email: [email protected]
2Committee on Genetics, Genomics, and Systems Biology, The University of Chicago, Chicago, Illinois 60637; email: [email protected]
3Department of Biology and the Koch Institute, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139; email: [email protected]
4Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil 05508; email: [email protected]
∗ Corresponding author

Annu. Rev. Genet. 2013. 47:307–33
First published online as a Review in Advance on September 13, 2013
The Annual Review of Genetics is online at genet.annualreviews.org
This article's doi: 10.1146/annurev-genet-111212-133301
Copyright © 2013 by Annual Reviews. All rights reserved
Annu. Rev. Genet. 2013.47:307-333. Downloaded from www.annualreviews.org. Access provided by Carnegie Mellon University on 08/20/15. For personal use only.

Keywords
evolutionary patterns, evolutionary rates, phenotypic evolution, brain evolution, sex dimorphism, gene networks

Abstract
Genes are perpetually added to and deleted from genomes during evolution. Thus, it is important to understand how new genes are formed and how they evolve to be critical components of the genetic systems that determine the biological diversity of life. Two decades of effort have shed light on the process of new gene origination and have contributed to an emerging comprehensive picture of how new genes are added to genomes, ranging from the mechanisms that generate new gene structures to the presence of new genes in different organisms to the rates and patterns of new gene origination and the roles of new genes in phenotypic evolution.
We review each of these aspects of new gene evolution, summarizing the main evidence for the origination and importance of new genes in evolution. We highlight findings showing that new genes rapidly change existing genetic systems that govern various molecular, cellular, and phenotypic functions.

BACKGROUND AND HISTORICAL OVERVIEW

Understanding how genes originate and subsequently evolve is crucial to explaining the genetic basis for the origin and evolution of novel phenotypes and, ultimately, biological diversity. Gene origination is thus a widely interesting, yet difficult, problem to study. Perhaps unsurprisingly, the peculiar structures, functions, and evolution of new genes have attracted the interests of pioneers in genetics and evolution since the early twentieth century. Sturtevant (129) was one of the first to identify a duplicated gene, the Bar duplication in Drosophila melanogaster, from which Muller (103) developed the first prevalent model of new gene evolution in 1936. Muller (103, p. 529) predicted that a new duplicate copy of a gene could acquire a novel function and be preserved in the genome, and further that "there remains no reason to doubt the application of the dictum 'all life from pre-existing life' and 'every cell from a pre-existing cell' to the gene: 'every gene from a pre-existing gene.'" This early thinking on single-gene and whole-chromosome duplications (55) was greatly expanded in the 1970s. Ohno (112) further developed Muller's model in 1970, and Gilbert (52) proposed an entirely new model of new gene formation in 1978, whereby pieces of unrelated genes can be recombined into new genes rather than just be strictly duplicated. However, experimental work on new genes did not begin until the early 1990s when a plausible framework for experimental studies of new gene formation and evolution was proposed: studies must focus on genes that were recently formed because young genes still carry all the signatures of the evolutionary forces that shaped their origination and the evolution of their new structures and functions (83). As genes age, they accumulate mutations that obscure the structural or evolutionary signals from their early history (83, 84). In eukaryotes, genes younger than 10–30 million years have not experienced much sequence evolution and thus constitute a valid system in which to investigate the evolution of new genes and to understand their properties. This idea was first manifested in the discovery of jingwei, a three-million-year-old gene in two species of African Drosophila (85). Jingwei revealed several interesting features of new gene evolution that are now known to be general: (a) recombination of existing genes, leading to a hybrid gene structure; (b) rapid sequence evolution driven by positive selection; and (c) acquisition of new biochemical functions (150, 162).

Today, it is clear that new gene origination is a general process in evolution and that species-specific or lineage-specific genes exist in many, if not all, organisms. Gigantic databases of genomic sequences from thousands of species reveal that genomes contain huge numbers and a large diversity of protein-coding genes. For example, the plant Glycine max genome encodes more than 50,000 protein-coding genes, whereas the bacterial genome of Candidatus Hodgkinia cicadicola contains only 189 genes. In addition, the abundance and diversity of noncoding genes are only now beginning to be realized. Even genomes with similar gene numbers can have very different, unrelated genes. These recent data reveal a widespread process of birth and death of genes in organisms in which new genes enter the genome and old genes are lost. What mechanisms and forces dictate gene birth and death? Specifically, how are new genes and novel functions added to genomes?

In the two decades since the discovery of jingwei, there have been several hundred additional publications reporting various interesting and significant observations of new genes and new gene functions in many different organisms. Regrettably, we can only choose a few representative publications to sketch several lines of observation that can provide insight into an emerging, global picture of new gene evolution. We follow the growth of scientific information and underlying ideas and concepts in new gene evolution, beginning by discussing the methods for identifying new genes and mechanistic processes of new gene formation. We then describe the rates and patterns of new gene origination and evolution that may indicate some rules governing these processes and discuss the evolutionary forces that act on new genes. Finally, we review the rapid growth of studies of the phenotypic effects of new genes and their impact on phenotypic evolution.

THE CONCEPT OF NEW GENE ORIGINATION

To understand various basic properties of new gene evolution, we need to have some conception of the process of new gene origination and an operational definition for the process. This definition helps us explore methods for new gene identification.

The Process of New Gene Origination

New gene origination begins in a microevolutionary process. A protogene structure is first generated by a mutation in a single germ-cell genome. This protogene structure must then spread through the population until it is fixed. Various evolutionary forces, such as natural selection and genetic drift, govern the spread of the protogene through the population, thus making protogene fixation a population genetic process. Both before and after fixation, the protogene accumulates mutations that confer on it new structures and beneficial, sometimes novel, functions that are acted on by natural selection. From the point that the protogene carries an optimized function and is fixed in the genome, it is essentially the same as most other, older genes in the genome and can be considered a new gene. New gene studies typically focus on these first two stages (the fixation process and acquisition of a beneficial function) and the consequences of accepted mutations on the sequence, structure, and function of the new gene. This review provides an overview of efforts to understand the answers to these problems.

Fixation: the population genetic process by which a mutation spreads to all individuals in a population

Approaches to Identifying New Genes

All new gene identification methods are based on comparative analysis of the structures of genes and genomes. Within a group of closely related species, we can define new genes as those that are present in all members of a monophyletic group but absent from all outgroup species (Figure 1). Early studies often serendipitously identified new genes by analyzing the phylogenetic distribution of genes via characterization of small genomic regions (e.g., 85, 108). Microarrays (42, 44, 45) and especially next-generation sequencing (168, 169) have made recent searches for new genes more purposeful efforts.

Monophyletic group: a group of taxa that share a common ancestor

Multiple genomes. Syntenic alignments (Figure 1) of genomes can be used to identify new genes from related species whose phylogenetic relationship is known. Syntenic alignments of each gene in each species allow identification of genes that are present or absent in one genome relative to another (Figure 1). In these comparisons, a gene can be defined as a new gene candidate if it is present in a certain clade or single species and absent in all outgroup species (Figure 1). Additionally, the orthologous genes that flank the new gene candidate appear in all species under consideration. This strategy has been used with great success in Drosophila and mammals (35, 168, 169, 172). New genes formed by different mechanisms also have correspondingly different structural features that can be used to infer the mechanism of new gene formation