Theory of Prokaryotic Genome Evolution INAUGURAL ARTICLE
Total Page:16
File Type:pdf, Size:1020Kb
Theory of prokaryotic genome evolution INAUGURAL ARTICLE Itamar Selaa, Yuri I. Wolfa, and Eugene V. Koonina,1 aNational Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 This contribution is part of the special series of Inaugural Articles by members of the National Academy of Sciences elected in 2016. Contributed by Eugene V. Koonin, August 24, 2016 (sent for review June 7, 2016; reviewed by Edo Kussell and Claus O. Wilke) Bacteria and archaea typically possess small genomes that are deleterious for the host, because the new DNA does not perform tightly packed with protein-coding genes. The compactness of pro- any immediately beneficial function but incurs the cost right from karyotic genomes is commonly perceived as evidence of adaptive the beginning. This theory, which is steeped in the well-estab- genome streamlining caused by strong purifying selection in large lished principles of population genetics, provides a simple, unified microbial populations. In such populations, even the small cost framework for understanding evolution of genomic complexity incurred by nonfunctional DNA because of extra energy and time without invoking widespread adaptation. This nonadaptive theory expenditure is thought to be sufficient for this extra genetic material can be reasonably assumed as the null hypothesis of genome to be eliminated by selection. However, contrary to the predic- evolution, the predictions of which have to be falsified to claim tions of this model, there exists a consistent, positive correlation adaptive phenomena (8, 12). between the strength of selection at the protein sequence level, The population genetic theory clearly predicts an inverse measured as the ratio of nonsynonymous to synonymous sub- correlation between the strength of selection at different levels stitution rates, and microbial genome size. Here, by fitting the and genome size: small genomes are predicted to be subject to genome size distributions in multiple groups of prokaryotes to stronger selection than large genomes (9). However, when predictions of mathematical models of population evolution, we protein sequence-level selection was measured for multiple show that only models in which acquisition of additional genes is, groups of closely related bacteria using the ratio of the non- on average, slightly beneficial yield a good fit to genomic data. synonymous to synonymous substitution rates (dN/dS)asaproxy, These results suggest that the number of genes in prokaryotic the opposite effect, namely a significant negative correlation be- genomes reflects the equilibrium between the benefit of additional tween dN/dS and genome size, was observed, indicating that larger EVOLUTION genes that diminishes as the genome grows and deletion bias (i.e., prokaryotic genomes typically evolve under a stronger selection the rate of deletion of genetic material being slightly greater than than small ones (13). the rate of acquisition). Thus, new genes acquired by microbial Here, we sought to further investigate the evolutionary factors genomes, on average, appear to be adaptive. The tight spacing of that control genome evolution in prokaryotes. We reproduced protein-coding genes likely results from a combination of the the negative correlation between dN/dS and genome size on an deletion bias and purifying selection that efficiently eliminates non- expanded genome collection and then, developed a mathemati- functional, noncoding sequences. cal model of genome evolution by gene gain and loss in pro- karyotic populations. By fitting the distribution of genome sizes evolutionary genomics | prokaryotic genome size | genome streamlining | predicted by the model to the empirical distribution for many positive selection | deletion bias groups of prokaryotes, we found that a good fit between theory and the data could be obtained only for models that included a he majority of bacterial and archaeal genomes are small, at positive mean fitness contribution of the gained genes countered Tleast compared with the genomes of multicellular and many by a deletion bias. These results imply that, at the level of gene unicellular eukaryotes (1, 2). Also, with the exception of deterio- rating genomes of some parasitic bacteria, the prokaryotic genomes Significance are highly compact, with densely packed protein-coding genes and a low fraction of noncoding sequences (3). The small genome size is Bacteria and archaea have small genomes with tightly packed thought to be selected for fast replication, whereas the high gene protein-coding genes. Typically, this genome architecture is density additionally facilitates coregulation of gene expression via explained by “genome streamlining” (minimization) under the operon organization (4, 5). Across the full range of cellular life selection for high replication rate. We developed a mathe- forms, a significant positive correlation has been shown to exist matical model of microbial evolution and tested it against between genome size and Neu,whereNe is the effective population extensive data from multiple genome comparisons to identify size, and u is the mutation rate per nucleotide (6–9). Accordingly, the key evolutionary forces. The results indicate that genome a simple and appealing population genetic theory has been de- evolution is not governed by streamlining but rather, reflects veloped, under which selection strength controls genome size and the balance between the benefit of additional genes that di- complexity (6, 9). Prokaryotes, with the exception of some para- minishes with the genome size and the intrinsic preference for sites, have large effective population sizes on the order of 109 or DNA deletion over acquisition. These results explain the ob- even higher, which implies strong selection enabling prokary- servation that, in an apparent contradiction with the pop- otes to maintain compact genomes (10). Under this strong selec- ulation genetic theory, microbes with large genomes reach tion regime, even short nonfunctional sequences incur cost that is higher abundance and are subject to stronger selection than “visible” to selection, conceivably through a combination of in- small “streamlined” genomes. creasing energy expenditure and reducing the replication rate, and are efficiently weeded out (11). In eukaryotes, at least the Author contributions: Y.I.W. and E.V.K. designed research; I.S. performed research; I.S., Y.I.W., multicellular forms, the effective population size is substantially and E.V.K. analyzed data; and I.S. and E.V.K. wrote the paper. (by orders of magnitude) smaller, and consequently, selection is Reviewers: E.K., New York University; and C.O.W., The University of Texas at Austin. not strong enough to eliminate superfluous genetic material, The authors declare no conflict of interest. which results in “bloated” genomes but also provides the raw Freely available online through the PNAS open access option. material for the evolution of complex features (6–8). It is often 1To whom correspondence should be addressed. Email: [email protected]. assumed, implicitly or explicitly, that any extra genetic material This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. arising from duplication or acquisition is, on average, slightly 1073/pnas.1614083113/-/DCSupplemental. www.pnas.org/cgi/doi/10.1073/pnas.1614083113 PNAS | October 11, 2016 | vol. 113 | no. 41 | 11399–11407 Downloaded by guest on September 28, 2021 gain and loss, selection for genome compactness does not play a fixation probability that depends on the selection coefficient s, major role in determining the genome size in prokaryotes. which is a function of genome size as well s = sðxÞ. Specifically, Rather, the relatively small number of genes in prokaryotes is sðxÞ denotes the mean selective advantage of an individual with explained by the diminishing return associated with the acqui- x + 1 genes over an individual with x genes. Conveniently, selec- sition of new genes as the genome grows combined with the tion coefficients associated with gene acquisition and gene de- intrinsic deletion bias. letion are related as (Methods) Results Protein-Level Selection and Microbial Genome Size: Model Prediction sdeletion = −sacquisition. [1] and Genomic Data. It has been shown previously that, in prokary- F otes, genome size is negatively and significantly correlated with The fixation probability can be approximated by dN/dS, suggestive of a stronger protein-level selection in larger s x genomes (13). For the purpose of this work, we reproduced these F x ≈ ð Þ [2] ð Þ −N s x , findings on a substantially expanded set of groups of closely re- 1 − e e ð Þ lated bacterial and archaeal genomes compiled in the latest ver- N sion of the Alignable Tight Genomic Cluster (ATGC) database where e is the effective population size (16). The effective pop- ulation size is estimated for each prokaryotic group from the (14) (Fig. 1). Specifically, genome size (measured by the number dN/dS Methods of genes) and dN/dS values are correlated with Spearman’s rank ratio ( ). Fixed acquisition and deletion events, hereinafter “gene gain” and “gene loss,” respectively, occur with correlation coefficient ρ = −0.397. These findings imply that, in probabilities given by the multiplication of the event rate and the addition to local influences of different environments and life- fixation probability: styles, there are underlying universal components of gene gain and loss probabilities with