Selection and Genome Plasticity As the Key Factors in the Evolution of Bacteria
Total Page:16
File Type:pdf, Size:1020Kb
PHYSICAL REVIEW X 9, 031018 (2019) Selection and Genome Plasticity as the Key Factors in the Evolution of Bacteria † Itamar Sela,* Yuri I. Wolf, and Eugene V. Koonin National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA (Received 10 December 2018; revised manuscript received 6 June 2019; published 5 August 2019) In prokaryotes, the number of genes in different functional classes shows apparent universal scaling with the total number of genes that can be approximated by a power law, with a sublinear, near-linear, or superlinear scaling exponent. These dependences are gene class specific but hold across the entire diversity of bacteria and archaea. Several models have been proposed to explain these universal scaling laws, primarily based on the specifics of the respective biological functions. However, a population-genetic theory of universal scaling is lacking. We employ a simple mathematical model for prokaryotic genome evolution, which, together with the analysis of 34 clusters of closely related bacterial genomes, allows us to identify the underlying factors that govern the evolution of the genome content. Evolution of the gene content is dominated by two functional class-specific parameters: selection coefficient and genome plasticity. The selection coefficient quantifies the fitness cost associated with deletion of a gene in a given functional class or the advantage of successful incorporation of an additional gene. Genome plasticity reflects both the availability of the genes of a given class in the external gene pool that is accessible to the evolving population and the ability of microbes to accommodate these genes in the short term, that is, the class-specific horizontal gene transfer barrier. The selection coefficient determines the gene loss rate, whereas genome plasticity is the principal determinant of the gene gain rate. DOI: 10.1103/PhysRevX.9.031018 Subject Areas: Biological Physics, Interdisciplinary Physics I. INTRODUCTION microbes (notwithstanding some debate on the validity of the exact universality [5,7]), suggesting that differences in Comparative analyses of prokaryotic genomes show that scaling reflect important not yet understood features of the number of genes in different functional classes scales cellular organization and its evolution. In attempts to differentially with the genome size [1–6]. The scaling laws explain the empirical observation that power-law scaling are robust under various statistical tests [5], across different databases, and for different gene classifications [2–6].In is a good fit to the genomic data, several theoretical models the seminal analysis of scaling, van Nimwegen fitted the have been proposed, as outlined below. scaling to a power law of the form [2] In the first, now classic analysis of scaling laws, van Nimwegen grouped the functional classes of genes along three integer exponents 0,1,2, arguing that deviations from x ¼ ηxγ; ð Þ 1 1 the integers, as demonstrated in the dataset analyzed here (Fig. 1 and Table I), most likely reflected gene classifica- where x1 denotes the number of genes that belong to a tion ambiguities [2]. The gene classes with the 0 exponent specific functional class, and x is the total number of genes. include information processing systems (translation, basal Power laws are the simplest functions that give good fits to transcription, and replication), those with the exponent of 1 the gene scaling data [2,5]. Analysis of the scaling are primarily genes for metabolic enzymes and transporters, exponents γ has shown that such exponents are (nearly) whereas those with the exponent of 2 encode various universal for each functional class across a broad range of regulatory proteins. The essential information processing systems are universally conserved and remain nearly the *[email protected] same in all microbes regardless of genome size; metabolic † [email protected] networks expand proportionally to the genome growth, and the complexity of regulatory circuits increases quadrati- Published by the American Physical Society under the terms of cally with the total number of genes (i.e., linearly with the the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to number of potential interactions between gene products). the author(s) and the published article’s title, journal citation, To put this conceptual thinking on quantitative ground, the and DOI. toolbox model has been proposed to explain the quadratic 2160-3308=19=9(3)=031018(19) 031018-1 Published by the American Physical Society SELA, WOLF, and KOONIN PHYS. REV. X 9, 031018 (2019) FIG. 1. The scaling laws for all functional classes of the COGs. The number of genes in each functional class scales with the total number of genes, and the scaling exponents substantially differ between the classes. The number of genes in a given COG category is plotted against the total number of genes, together with a power-law fit and model fit, as given by Eq. (2). Each gray point represents one genome from the analyzed set of 1490 genomes. The scaling is fitted to a power law which is indicated by a solid gray line. The fitted scaling exponent is indicated in parentheses. Red points correspond to the mean values for each ATGC in the dataset, and the model fit of Eq. (2) is shown by the solid red line. scaling, whereby the number of regulators grows faster than extensive gene loss and horizontal gene transfer (HGT) the number of metabolic enzymes thanks to the frequent [12–16], it has been hypothesized that the universal reuse of the latter enzymes in new pathways [8,9]. exponents are determined by distinct gene gain and loss Subsequently, the toolbox model has been extended to rates for different classes of genes and reflect the innovation assume that the gene composition of prokaryotic genomes potential of these classes [17]. Clearly, regulatory genes is determined by selection for fixed proportions of genes have the highest innovation potential, whereas information from different functional classes; under this assumption, the processing systems have next to none. model recapitulates both the scaling laws and the distri- Although the power laws provide good fits to the bution of gene family sizes within each functional class genomic data, the origin of the observed scaling remains [10]. The linear scaling was also obtained under a theo- obscure. Given that genome sizes barely span 2 orders of retical model with two classes of genes, internal (house- magnitude (Fig. 1), the power-law fits should be treated as keeping) and external (ecological), and a reproductive rate approximations rather than firmly established quantitative set to be independent of the genome size [11]. Finally, laws. More importantly, although the models surveyed given that prokaryotic genome evolution is dominated by above account for the power-law fit to the genomic data, 031018-2 SELECTION AND GENOME PLASTICITY AS THE KEY FACTORS … PHYS. REV. X 9, 031018 (2019) TABLE I. Scaling, selection, and plasticity in different functional classes of microbial genes. Scaling ΔS1 Average selection Average Plasticity Class Functions exponent γ slope −q coefficient hΔS1i plasticity hp1i slope b J Translation 0.35 −1.10 × 10−2 2.68 0.005 1.54 × 10−5 L Replication and repair 0.51 −1.35 × 10−3 0.98 0.013 −9.18 × 10−5 D Cell division 0.64 −1.58 × 10−5 1.83 0.002 −3.21 × 10−5 F Nucleotide metabolism and transport 0.69 −1.75 × 10−2 2.15 0.003 3.65 × 10−5 O Post-translational modification, 0.83 −5.42 × 10−3 1.38 0.010 4.88 × 10−5 protein turnover, and chaperone functions M Membrane and cell wall structure and biogenesis 0.88 −1.05 × 10−3 0.65 0.029 4.57 × 10−5 H Coenzyme metabolism 0.88 −4.68 × 10−3 1.51 0.011 4.62 × 10−5 V Defense 0.94 −3.72 × 10−7 −0.44 0.034 −6.68 × 10−5 C Energy production and conversion 1.00 −4.52 × 10−3 1.19 0.017 8.92 × 10−5 I Lipid metabolism 1.08 −4.86 × 10−3 0.92 0.016 1.24 × 10−4 N Secretion and motility 1.18 −7.13 × 10−8 0.27 0.014 1.07 × 10−4 X Mobilome: prophages, transposons 1.24 −2.06 × 10−4 −4.76 1.021 1.12 × 10−2 P Inorganic ion transport and metabolism 1.24 −3.76 × 10−3 0.75 0.026 1.15 × 10−4 E Amino acid metabolism and transport 1.24 −1.90 × 10−3 1.12 0.028 8.15 × 10−5 R General functional prediction only 1.26 −1.05 × 10−3 0.37 0.051 1.15 × 10−4 S Function unknown 1.27 −5.92 × 10−7 0.55 0.025 3.35 × 10−5 G Carbohydrate metabolism and transport 1.47 −1.86 × 10−3 0.46 0.041 1.65 × 10−4 T Signal transduction 1.49 −1.28 × 10−3 0.57 0.030 1.25 × 10−4 K Transcription 1.63 −1.67 × 10−3 0.27 0.058 1.81 × 10−4 Q Biosynthesis, transport, and catabolism 1.69 −5.77 × 10−3 0.06 0.021 2.54 × 10−4 of secondary metabolites they stop short of a general theory of genome evolution class-specific quantities are normalized by genomic means, rooted in population genetics that would yield power laws and the ratios are used to extract model parameters. or even account for the observed scaling. The analyses presented here show that prokaryotic Here, we analyze a simple population genetics model for evolution is dominated by two underlying factors: selection prokaryotic genome evolution [18]. Genome evolution is coefficient and genome plasticity.