
GENOMIC FLUX: GENOME EVOLUTION BY GENE LOSS AND ACQUISITION

Jeffrey G. Lawrence and John R. Roth

15

Genome evolution is the process by which the content and organization of a species' genetic information changes over time. This process involves four sorts of changes: (i) point mutations and gene conversion events gradually alter internal information; (ii) rearrangements (e.g., inversions, translocations, integration, and transpositions) alter chromosome topology with little change in information content; (iii) deletions cause irreversible loss of information; and (iv) insertions of foreign material can add novel information to a genome. Although the first two processes can create new genes, they act very slowly. Gene loss and acquisition are genomic changes that can radically and rapidly increase fitness or alter some aspect of lifestyle.

Most thought on genome evolution has focused on how the slow sequence changes can cause divergence of gene functions. This is understandable because available data suggest that horizontal genetic transfer has been a minor contributor to the evolution of eukaryotic lineages (with notable exceptions, such as the introduction of mitochondria and chloroplasts). In bacteria, however, both genetics and genome analysis provide extensive evidence for gene loss and horizontal genetic transfer. Analyses of these data suggest that gene loss and acquisition are likely to be the primary mechanisms by which bacteria adapt genetically to novel environments and by which bacterial populations diverge and form separate, evolutionarily distinct species. We suggest that bacterial adaptation and speciation are determined predominantly by acquisition of selectively valuable genes (by horizontal transfer) and by loss of weakly contributing genes (by mutation, deletion, and drift from the population) during periods of relaxed selection.

We propose that a limitation of genome expansion couples the rates of gene acquisition and loss. Genome size may be limited in part by population-based factors that limit the ability of cells to selectively maintain information; some limitation may also be imposed by physiological considerations. The balance between selective gene acquisition and secondarily imposed gene loss implies that addition of a foreign gene increases the probability of loss of some resident function of lower selective value. The interaction of these factors, we suggest, drives the divergence of bacterial types.

Jeffrey G. Lawrence, Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260. John R. Roth, Department of Biology, University of Utah, Salt Lake City, UT 84112.

Organization of the Prokaryotic Genome, Edited by Robert L. Charlebois, © 1999 American Society for Microbiology, Washington, D.C.

DYNAMICS OF GENE LOSS

Existence of a Gene Implies a Function

Traditional bacterial genetics allows identification of gene function by correlating mutant growth phenotypes with biochemical defects. One can demonstrate the functional importance of many DNA sequences by observing the consequences of their disruption. In contrast, genomic analyses identify genes solely as open reading frames, with possible similarity to genes of known function but without a direct tie to either phenotype or biochemical defect.

In identifying a gene by its sequence, rather than by mutations and phenotypes, one assumes implicitly that the very presence of a gene implies that it must confer a selectable function. That is, the gene could have remained in the population only if the encoded function is important: mutants lacking the function show reduced fitness and are removed from the population by natural selection. This evolutionary argument implies that a mutant phenotype (perhaps difficult to demonstrate) will result if the gene is disrupted. In principle, a few genes might be encountered which either have just been introduced or have escaped selection, and for which the process of elimination has not yet run its course; data supporting such cases will be presented below. An important question arises when one tries to define the function of a gene (or determine whether it is one of the rare nonfunctional examples). How important must a function be to assure the maintenance of its gene? How large a fitness contribution must a gene confer to remain in a genome?

A Spectrum of Fitness Contributions

Not all bacterial genes are equally important. Functions performed by bacterial cells are diverse, and while some are essential for life under all conditions (e.g., RNA polymerase), others may provide a benefit only under certain circumstances (e.g., TMAO reductase). In this way, one may sort bacterial genes into broad classes that reflect their average importance to the cell (Table 1). Mutations in genes making essential or very important contributions to the cell will be strongly counterselected in bacterial populations, since these mutants cannot compete effectively against otherwise uncompromised conspecific individuals. However, mutations in genes that do not contribute to cell fitness (e.g., selfish genes on transposons) will not be counterselected, since these neutral mutants do not put their bearers at a selective disadvantage.

Between these extremes lies the gray area of mutations that have subtle effects on fitness; within it we distinguish two extreme classes of genes. Some genes may make a minimal contribution to fitness under all growth conditions; others may make a large contribution to fitness but do so only in a rare subset of environments. For either class of gene, the average selection coefficient is low. The fate of mutations in the most weakly selected genes is governed by a complex interaction of natural selection, random genetic drift, population size, population subdivision, and genetic exchange within the species.

For any one species, a different fraction of genes may make up each category in Table 1. In small genomes (e.g., that of Mycoplasma genitalium), a large fraction of genes are likely to be essential (42), while a smaller fraction of genes are likely to be essential in organisms with larger genomes (e.g., Escherichia coli). Regardless of the distribution, some genes in any genome will fall at the bottom of the list and make a minimal contribution to cellular fitness (i.e., they are nearly neutral). These genes will be at greatest risk for functional loss by mutation, deletion, and genetic drift. This class of genes may comprise a large portion of genomic information.

In E. coli, few of the 4,286 protein-coding genes are essential. Isolation of temperature-sensitive lethal mutations suggests that only ~200 genes are essential on rich medium (55).

TABLE 1 Fitness contributions of bacterial genes

Class     Fitness contribution  Physiological role                  Consequence of null mutation
Top       Large                 Essential                           Lethality
High      Large                 Very important                      Strong impairment
Middle    Moderate              Important                           Obvious impairment
Low (I)   Very low              Minimally useful in all conditions  Subtle impairment; hard to detect experimentally
Low (II)  Very low              Important in rare conditions        Strong impairment in some environments
Neutral   None                  None                                None

This estimate of the minimal number of essential genes in the E. coli chromosome (even when adjusted for failure to detect some genes by this method) is congruent with theoretical estimates of minimal gene number (256 genes) based on comparisons of several bacterial genomes (42). Moreover, a similar number of genes are detected by mutations that cause a nutritional requirement for growth. Together, these rather important gene classes constitute approximately 10% of the E. coli genome; the remaining 90% of genes must make smaller contributions to fitness or be needed only under particular conditions. This conclusion is supported by the analysis of E. coli mutants described above and by the observation that the genomes of free-living bacteria vary greatly in size. In addition, experimental approaches have revealed that lesions in a large proportion of genes in the Saccharomyces cerevisiae genome have only minimal effects on fitness (65).

A Minimal Fitness Contribution Is Required for Gene Maintenance

While a complete description of the processes governing the selective maintenance of bacterial genes is beyond the scope of this paper, some general guidelines can be described. If a gene is to avoid loss by mutation and genetic drift, it must provide some minimal average selective benefit to the cell; we call that value s. (This is the maintenance threshold in Fig. 1.) If mutation rates are low, genes with small fitness contributions can be maintained in a population. If mutation rates are high, a stronger selective coefficient would be required to maintain a gene in a population of the same size. Therefore, s is proportional to the mutation rate, $\mu$ ($s \propto \mu$). As the mutation rate increases, null mutations in a gene under weak selection are more likely to drift to fixation, since defective alleles are created more rapidly than they can be removed by selection. As population size decreases, loss by drift becomes more likely and more genes become effectively neutral (49, 50). In these smaller populations, a larger selective value is required to maintain a gene in the population. That is, a gene must make a stronger contribution to fitness to be selectively maintained. Therefore, s is inversely proportional to the effective population size, $N_e$ ($s \propto \mu / N_e$). The dynamics of how selection acts on mutant alleles is influenced by the rate of intraspecific recombination. As the recombination rate increases, selection can more effectively remove the steady accumulation of detrimental alleles from a population, and the species can avoid to some extent the fitness decline mandated by Muller's ratchet (41). As a result, the minimum fitness contribution required for selective maintenance in a haploid genome decreases as the recombination rate, r, increases:

    $s \propto \mu / (r N_e)$    (1)

Here, recombination allows selection to act more efficiently on individual alleles by placing them in more genetic contexts and minimizing their ability to persist by association with valuable genes.

FIGURE 1 Genome evolution by genomic flux. A genome is depicted as a set of genes (lines and boxes) ranked by average selection coefficient; classes of genes discussed in Table 1 are noted on the left. Genes inherited vertically are represented by solid lines; foreign genes are indicated by open boxes. Genes below the threshold for maintenance cannot be maintained by natural selection and will be lost by mutation and deletion. Acquired genes introduced by horizontal transfer will ultimately be lost if they fail to confer sufficient fitness. (Figure labels: "Resident Genes Listed by Fitness Contribution"; "Genes Likely To Be Lost".)

Thus, a gene must make a larger fitness contribution to be maintained as the mutation rate increases, as the population size decreases, or as the recombination rate decreases. Conversely, with lower mutation rates, bigger populations, or more recombination, genes making extremely subtle contributions to cellular fitness can maintain their positions. This relationship leads to two conclusions: (i) genes failing to make a minimal contribution to cellular fitness will be eliminated from the genome, and (ii) barring any change in mutation rate, recombination rate, or population size, the genome will have an upper size limit at which all resident genes can persist in a population of genomes by counterselection of less-fit mutants. Additional genes cannot be maintained by natural selection, and those genes making the smallest contribution to the organism's fitness will be lost.

Thus, the minimal selective value needed for gene maintenance is a variable dependent on several factors. These dependencies make it difficult to estimate the absolute values for the minimum required fitness contribution. Eukaryotic sexual species typically have high recombination frequencies, small population sizes, and mutation rates similar to or lower than those of bacteria. For such species, the estimate is made that a new mutation with no effect on fitness has a probability of 1/N of drifting to fixation, with N being the population size. Therefore, in the absence of selection, every neutral allele in the gene pool has an equal probability of displacing all other alleles and drifting to fixation in the population. To prevent loss by null mutations and drift, a gene must provide a selection value great enough to oppose this drift, placing the required fitness contribution in the range of 1/N.

For bacteria, which have much larger population sizes, the above estimate would suggest that an extremely small fitness contribution would be sufficient to allow maintenance in the population. In bacteria, mutation rate is likely to be a more prominent dictator of loss than drift; opposing this would require a fitness contribution in the realm of the likelihood of mutational loss ($10^{-5}$ per gene per generation), which is probably larger than 1/N. Despite these reasonable calculations, the minimal fitness contribution could become much higher if many beneficial genes compete with each other to maintain a position in a genome of limited size. This possibility emerges from the model presented below.
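The scaling in equation 1 and the comparison between the drift scale (1/N) and the mutational loss rate can be made concrete with a short sketch. It is only an illustration: equation 1 is a proportionality, so the function below returns a relative value that is meaningful only as a ratio between two parameter sets. The function names and the baseline numbers are assumptions introduced here, not values from the text; only the mutational loss rate of 10^-5 per gene per generation is taken from the chapter.

```python
def relative_threshold(mu, r, n_e):
    """Relative maintenance threshold, s proportional to mu / (r * N_e) (equation 1).

    The result is only meaningful as a ratio between two parameter sets,
    because equation 1 gives a proportionality, not an absolute value of s.
    """
    if mu <= 0 or r <= 0 or n_e <= 0:
        raise ValueError("mu, r and n_e must all be positive")
    return mu / (r * n_e)


def loss_floor(n, mutational_loss=1e-5):
    """Larger of the drift scale (1/N) and the per-gene mutational loss rate."""
    if n <= 0:
        raise ValueError("n must be positive")
    return max(1.0 / n, mutational_loss)


# Hypothetical baseline values, chosen only for illustration.
base = relative_threshold(mu=1e-5, r=1.0, n_e=1e6)

# Doubling the mutation rate doubles the required contribution.
print(relative_threshold(mu=2e-5, r=1.0, n_e=1e6) / base)

# A tenfold larger effective population lowers it tenfold.
print(relative_threshold(mu=1e-5, r=1.0, n_e=1e7) / base)

# Small population: drift (1/N) sets the floor. Large population: mutation does.
print(loss_floor(1e3), loss_floor(1e9))
```

Working with ratios rather than absolute values mirrors the caution in the text that the absolute minimum fitness contribution is difficult to estimate.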

Regardless of the numerical evaluation of the minimal fitness contribution, if genes are sorted in order of decreasing average fitness contribution (Table 1), some must necessarily fall at the end of the list and be prone to loss by mutation and drift. It is these genes that are always at risk for loss, especially when conditions change or newly acquired genes are inserted above them on the list.

Evidence for Gene Loss

Genome sequence comparisons among enteric bacteria reveal genes that have been lost from certain lineages while being maintained in others. For example, the phoA gene, encoding alkaline phosphatase, has been lost from the Salmonella lineage but has been maintained in the genomes of virtually all other enteric bacteria (16). Such cases demonstrate that genes conferring a sufficient selectable phenotype in one ecological context may fail to provide such a benefit in another context and therefore be subject to loss by mutation and genetic drift.

DYNAMICS OF GENE ACQUISITION

Horizontal Transfer of Genetic Information

DNA may be introduced into bacterial genomes by many processes, including conjugation, bacteriophage-mediated transduction, and transformation. The general term recombination typically refers to the introduction of DNA from a conspecific cell, whereas horizontal genetic transfer or lateral genetic transfer refers to interspecific exchanges; we will use these definitions in the discussion that follows.

The introduction of a foreign gene into a bacterial genome does not guarantee its survival. Rather, the gene must confer a sufficiently large selective advantage to the cell to avoid loss by mutation and genetic drift; above, we have described this minimal fitness contribution (s). The fate of an introduced DNA sequence can be predicted by the model developed above for the loss of DNA sequences under weak selection.

One may infer that the vast majority of sequences introduced by horizontal transfer would fail to make a minimal contribution and would be lost. Several factors may explain the failure to make a contribution. (i) The introduced DNA does not encode a product. (ii) The acquired genes are not expressed in the new host. (iii) The acquired genes are expressed and could contribute to a valuable function, but they cannot provide that function without the help of other genes that were not cotransferred (see "Genomic Flux and the Evolution of Gene Clusters" below). (iv) The acquired gene produces a functional protein, but this function does not increase the fitness of the new host cell (either the cell already possessed the function, or the activity is not useful). Therefore, the introduction and incorporation of foreign DNA per se does not ensure its persistence in the recipient gene pool. We must distinguish between horizontal transfer of genetic information (the physical process of incorporating foreign DNA into one of a cell's replicons) and horizontal transfer of useful phenotypic information (foreign DNA sequences that can be maintained for long periods by selection for the encoded functions). Although cursory inspection of genomic sequences may reveal genes that were introduced by horizontal transfer (38, 47, 69), such analyses alone do not reveal whether these sequences are providing a useful function (i.e., have been maintained for long periods by selection). The acquisition of sequence that may or may not increase fitness is diagrammed (with loss) in Fig. 1.

Horizontal Transfer of Useful Phenotypic Information

Horizontally acquired DNA can confer a selectable new function only if the genes are expressed and all of the new genes required for that function have been cotransferred. If the new function makes a sufficiently large contribution to fitness (>s, the threshold for maintenance), then the genes providing for this function may be maintained within the new host genome and escape loss from the population by mutation and genetic drift. In this way, the horizontal transfer of genetic information is successful. That is, long-term maintenance of the acquired DNA occurs only when that DNA provides phenotypic information that is useful to the recipient cell.

Some phenotypic capabilities may provide a selective advantage only to hosts growing in certain environmental regimes. Similarly, hosts growing in a particular ecological niche may find only certain kinds of functions useful. Therefore, although DNA may be transferred nonspecifically among a broad range of organisms, it persists in only a small subset of fortuitous cases, where the transferred DNA imparts a function of value to the host organism in its traditional, or a newly available, ecological context. Only in these rare cases does the horizontal-transfer event contribute to genome evolution (ignoring any mechanistic consequences of recombination). Below we describe methods for assessing the frequency of successful gene acquisition and the fraction of modern genomes that has been acquired by horizontal transfer of useful phenotypic information.

Evidence for Gene Acquisition

The evidence for gene acquisition in many organisms has been detailed elsewhere (64). In bacteria, horizontally transferred genes frequently confer phenotypes that are characteristic of particular taxonomic groups. Bacterial taxonomists assign a new isolate to a particular species by scoring possession of particular phenotypes that are present in one lineage but not in another. Examples of group-specific traits among enteric bacteria are lactose utilization in E. coli and citrate transport in Salmonella enterica. Salmonella's ability to synthesize cobalamin and use it to support propanediol degradation forms the basis of a test diagnostic for this bacterium (52). These species-specific abilities are frequently ones whose genes appear to have been acquired horizontally (based on nucleotide composition and codon usage bias) (37). These data suggest that horizontal transfer is an important aspect of the genomic changes that have caused the divergence of phylogenetic groups. Large-scale surveys which assess the numbers and ages of acquired genes in bacterial genomes by DNA sequence analysis are discussed below.

Limits to Genome Expansion

The gene acquisition process outlined above implies that a genome might continuously accumulate genes that impart a minimal selective coefficient to the cell. However, it is clear that finite populations of cells cannot maintain an infinite number of genes. There is no evidence that bacterial genomes have been continuously increasing over evolutionary time. Rather, genome size is constrained by the population-genetic considerations outlined above. As detailed in equation 1, the minimal fitness contribution required to maintain a gene can be described as a function of the mutation rate, recombination rate, and population size ($s \propto \mu / (r N_e)$). If the selective benefit of an allele falls short of this level, it may be lost from the population by genetic drift. When applied to many genes in a genome, these forces limit genome expansion, because the minimum selective value required for each gene increases with the total number of genes under selection (G, the informative genome size).

The effect of genome size on the minimum selective value is a bit more difficult to explain but can be visualized in the following way. The overall fitness of a genome is a complex function of the combined fitness contributions of the many individual genes. As the number of genes increases, the target for deleterious mutations increases. Each deleterious mutation causes a fitness decrease for the organism and for alleles of other genes carried by that organism. Natural selection can reduce the frequency of deleterious mutations by favoring cells that do not carry deleterious mutations. However, as a genome increases in size, selection becomes less efficient at removing deleterious mutations from any one gene, since there are many more genes with potentially deleterious mutations.

First, consider a genome of sufficient size that, on average, one mutation occurs per genome per generation. Assuming that the number of individuals in the population exceeds the number of genes in a chromosome, every cell will acquire a potentially deleterious mutation every generation. If the recombination rate is sufficiently high, such mutations may be removed and Muller's ratchet may be avoided. As the number of genes increases, this selective cleansing becomes more difficult, until the number of mutations occurring every generation is more than can be counterselected. Eventually, null mutations will completely eliminate some genes from the population, specifically, those that made the smallest contribution to the organism's fitness. Thus, as genome size increases, the component genes effectively compete with one another for maintenance in the genome. The consequence of these effects is that maintenance of a gene in a large genome requires a higher selective value than maintenance of the same gene as part of a smaller genome.

The number of genes that can be simultaneously maintained by selection is limited by these population factors, which therefore limit genome expansion. The relationship between genome size (G) and these population factors is

    $G \propto r N_e / \mu$    (2)

Genome size, G, can increase only if recombination rates increase (increasing the efficiency with which selection can favor nonmutant alleles of each gene), population sizes increase (to decrease the rate of genetic drift), or mutation rates decrease (to minimize gene damage). The coupled gain and loss process is diagrammed in Fig. 1.

Empirical evidence supports this limitation on genome size. Despite high rates of horizontal genetic transfer among enteric bacterial species, the genomes of E. coli, S. enterica, and related organisms are notably uniform in size (2, 3, 45).

Acquisition of Gene Clusters

For a horizontally transferred gene to provide a new, selectively valuable function, all other genes required for that function must be present or cointroduced into the new host genome and be appropriately expressed. Unless every required gene is already present or is transferred at the same time, a single acquired gene cannot provide a selective benefit and the gene will not remain in the new population.

In bacteria, genes for nonessential (but useful) metabolic functions are often found in operons or in closely linked clusters of independently transcribed genes. These operons and clusters frequently include all the genes needed to provide a selectable function. We have proposed that evolutionary formation of these clusters (selfish operons) can occur by stepwise aggregation of unlinked genes and that it is driven by selection for increased efficiency of horizontal transfer among prokaryotic genomes (36). Evidence for this model is detailed later.

DYNAMICS OF GENOMIC FLUX

Competition between Genes for Maintenance in a Genome

To this point, we have considered gene loss and acquisition as separate processes. Although both loss and acquisition strongly influence genome evolution, we suggest that the two processes are synergistic due to the limits on genome expansion. As detailed above, the minimal selective contribution required for gene maintenance, s, is a function of the mutation rate, the recombination rate, and the population size (equation 1). The number of genes, G, that can be maintained by selection is also a function of these parameters (equation 2). If mutation rate, recombination rate, and population size remain constant, it is clear that the minimum selective contribution required for maintenance of each individual gene is a function of the genome size:

    $s \propto G$    (3)

That is, as genome size approaches the maximum number of genes maintainable by natural selection, each gene must make a greater contribution to cellular fitness in order to persist. If a genome is small relative to the maximum maintainable size, a gene making only a small contribution to cellular fitness can be maintained, since mutations in this gene can be effectively counterselected. However, as the number of genes under simultaneous selection increases, mutations in this same gene may no longer be effectively counterselected.

Thus, genes in a genome are competing with each other for maintenance in the face of mutation, deletion, and genetic drift. Consequently, an increase in genome size caused by a valuable horizontally acquired gene may increase the likelihood that some less valuable gene (probably of totally unrelated function) will be lost. Since genome size is limited by selection, each gene within a genome competes with the new invading genes, and with other genes, for maintenance in the genome.

The competition among genes for selective maintenance complicates the process of gene acquisition and loss. Because of this competition, the selective hierarchy of genes outlined above (Table 1) changes when a gene for a new function is acquired. Consider genes for a weakly selected function (wsf) poised at the threshold of maintenance by natural selection, $s_i$. If the cell acquires genes for a novel function that provides a selective advantage greater than $s_i$, the wsf genes will be lost from the genome, because the minimum selective contribution required for maintenance has increased and the value provided by wsf is no longer sufficient for maintenance (equation 3). In this way, once genome size has reached its upper limit, the process of gene acquisition can indirectly cause loss of less valuable genes. Genomes that are continually acquiring useful functions probably persist at this upper size limit as genes are continually added and lost.

Acquired genes may also cause selective loss of previously resident sequences if the old functions conflict with the introduced ones. That is, an introduced function may disrupt cellular processes and be unable to contribute maximally until some resident functions are eliminated by mutations. Thus, the innovations provided by horizontally acquired genes may drive selective elimination of some previous functions from the host genome. In this way, the deletion of ancestral sequences effectively increases the fitness impact of the acquired sequences.

Genomic Flux and Ecological Differentiation

Gene acquisition, coupled with gene loss, is a dynamic process by which genetic material flows in and out of bacterial genomes. As noted above, long-term stability of transferred DNA must be associated with a selectable phenotype. Hence, the positively selected gain of valuable genes by a genome will be associated with loss of other genes, resulting in a constantly changing portfolio of phenotypic capabilities. The process of gene acquisition and loss will serve to create descendent populations that are genetically and phenotypically different from ancestral populations. This process is the inevitable outcome of the horizontal transfer of phenotypic information; the valuable acquired sequences will be maintained by selection, and insufficiently valuable native sequences will be displaced (Fig. 1).

Genomic flux can alter the phenotypic character of the recipient organism even when the acquired sequences encode functions that are highly similar to functions encoded by native sequences (e.g., E. coli maintains the horizontally acquired argI and gapA genes although each has a homologue performing a similar function). Unless the new and old versions have some sort of functional difference, natural selection cannot counterselect mutations in both sets of genes. An acquired sequence that provides precisely the same function as a native gene, and makes an identical, redundant contribution to cellular fitness, will have an equal chance of surviving mutational loss. This predicts that we may expect to see occasional examples of old functions being superseded by functionally equivalent genes acquired by horizontal transfer.
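The competition among genes for maintenance can be sketched as a ranking problem. The toy model below is a deliberate simplification, not the authors' quantitative model: it assumes a genome can maintain a fixed number of genes (`capacity`, standing in for the upper size limit G) and that each gene has a single average selection coefficient. The gene names other than wsf, and all numerical values, are hypothetical and chosen only for illustration.

```python
def acquire_gene(resident, newcomer, capacity):
    """Add one gene to a genome that can maintain at most `capacity` genes.

    Genes are (name, s) pairs, where s is an average selection coefficient.
    Returns (kept, lost): the `capacity` most valuable genes and the rest.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    # Rank every candidate by fitness contribution, highest first.
    # sorted() is stable, so on an exact tie the resident gene is kept;
    # that tie-break is an arbitrary choice of this sketch.
    ranked = sorted(resident + [newcomer], key=lambda gene: gene[1], reverse=True)
    return ranked[:capacity], ranked[capacity:]


# Hypothetical genome already at its upper size limit (capacity = 3).
genome = [("rpoB", 1.0), ("lacZ", 0.01), ("wsf", 0.001)]

# A valuable newcomer displaces the weakly selected function (wsf).
kept, lost = acquire_gene(genome, ("novel", 0.005), capacity=3)
print([name for name, _ in kept], [name for name, _ in lost])

# A newcomer below the threshold is itself lost; residents are unaffected.
kept2, lost2 = acquire_gene(genome, ("weak", 0.0001), capacity=3)
print([name for name, _ in kept2], [name for name, _ in lost2])
```

The text notes that a truly redundant acquired gene has an equal chance of surviving mutational loss; a fuller model would resolve ties at random rather than always favoring the resident gene as this sketch does.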

One can envision two outcomes of gene acquisition. First, the phenotypic alteration may allow the population to exploit its current ecological niche more effectively. This refines an existing species but does not create an ecologically distinct group of organisms. Alternatively, the acquired information may broaden the ecological niche of the descendent population, allowing it to exploit or survive new aspects of its surroundings. When this happens, the descendent population has novel capabilities that distinguish it from the ancestral population. This process of niche alteration defines bacterial speciation, whereby a group of organisms evolves and forms a distinct evolutionary lineage (67, 70).

The genotypic divergence of speciating lineages is increased by the inevitable loss of genes that accompanies gene acquisition. The losses make it unlikely that gene acquisition will simply broaden the ecological niche; rather, they mandate a qualitative change in niche definition. Lineages that prosper due to acquired traits will differ from the parent strain both by the newly acquired properties and by the gene losses that necessarily occur. As multiple genotypic differences that distinguish newly diverging species accumulate, the infrequent recombination among them will be less likely to cause coalescence of the lineages. To complete the speciation process, neutral point mutations lead to sequence differences between the diverging lineages; this reduces the homologous-recombination rate and imposes reproductive isolation. This drop in recombination is imposed by the sequence specificity of the recombination and mismatch repair (Mut) systems (53, 61, 68, 71).

Genomic flux makes bacterial divergence a natural outcome of the exploration of novel environments. The divergence of nascent species accelerates due to the combination of horizontal transfer and concomitant mutational loss of information. We think speciation would be less likely to occur (or would occur much more slowly) if it depended only on accumulation of internal point mutations. However, it is difficult to compare the two processes. It is unclear how frequently useful information is transferred or how efficiently novel metabolic capabilities can evolve by point-mutational change (internal gene duplication and divergence). It is also unclear how effectively internal gene formation could compete with acquisition of equivalent functions by horizontal transfer. Only by quantifying these two processes (discussed below) can we compare their importance. Below we provide evidence that in enteric bacteria, the magnitude of genomic flux is so large that it probably dominates the process of adding new functions to a genome.

Limits to Applicability of Genomic Flux in Speciation

The genomic-flux model may be less important for some bacterial lineages. The genomes of enteric bacteria show properties that fit well with predictions of the model. These enteric bacteria have large (5 Mb) genomes that encode many nonessential (but presumably useful) functions; such modestly important functions are subject to loss and acquisition. Differences in these nonessential functions distinguish enteric bacterial species, allowing each lineage to effectively exploit a different ecological niche.

The genomic-flux paradigm may apply less well to organisms that do not experience high rates of horizontal transfer or those unlikely to derive any selective benefit from acquired genes. For example, bacterial endosymbionts have little access to horizontal genetic transfer or intraspecific recombination and persist in the relatively constant environment of the host. One might expect that their smaller population sizes and lower recombination rates would serve to reduce the number of genes the organisms could maintain (equation 3). Not surprisingly, phylogenetic analyses show that the genomes of these organisms are becoming smaller (39). Genome size reduction is more dramatic in obligate endosymbionts, although these inferences are confounded by possible gene transfer from the endosymbiont genome to the host genome. In this lineage, acquisition of novel metabolic capabilities is unlikely to allow differentiation of this lineage into related, but unexplored, ecological niches. The minimal fitness contribution required to maintain a gene in such organisms would be expected to be high.

MEASURING GENOMIC FLUX

Assessing Gene Loss and Acquisition

To assess the contribution of genomic flux to genome evolution and speciation, one must measure rates of gene loss and acquisition. To do this, one must identify foreign genes and determine when each was acquired (and how many other genes were lost) since the divergence of sibling species from a common ancestor. One must then determine how long each acquired sequence has persisted in the genome. By comparing the complete genome sequences of two closely related organisms, one can easily identify the sequences that are unique to each species; these sequences have either been gained unilaterally by one taxon or lost unilaterally from the other. The next problem is to assess the arrival times of the foreign genes.

Complete genome sequences allow one to count genes unique to each member of a closely related species pair. M. genitalium has a 580-kb genome (19), while the closely related species Mycoplasma pneumoniae possesses an 816-kb genome (20). Although each gene in the M. genitalium genome has a homologue in the M. pneumoniae genome, the M. pneumoniae genome contains 209 genes not found in M. genitalium (21). Similarly, the genome of the sulfate-reducing archaeon Archaeoglobus sp. (29) is 25% larger than that of the strict methanogen Methanococcus jannaschii (11). One could speculate that the additional genes found in Archaeoglobus allow its heterotrophic lifestyle; however, these organisms are difficult to analyze further because we lack robust phylogenies describing their relationships to each other and to similar bacteria. Without these phylogenies, we cannot easily identify which genes were lost by one species (and would be present in a closely related outgroup taxon) and which sequences were gained (and would be absent from other members of the phylogenetic group). More importantly, genome comparisons do not tell us whether an acquired foreign gene is providing a function that increases the fitness of the organism. This information would be provided if we knew how long each gene had persisted in the genome. A long persistence time would suggest that the sequence is valuable; that is, its mutant alleles are being removed from the population by selection.

An alternative tactic is to identify genes of foreign origin and determine how long ago each entered the genome. This method relies on intrinsic sequence characteristics rather than genome comparisons. From the entry times of foreign genes, one can estimate the age structure of the gene population and the overall rate of gene acquisition. If genomes are not increasing in size, one can conclude that the overall gene acquisition rate is accompanied by an approximately equal rate of gene loss. The lost genes would include some old ancestral genes and some of the recently added genes that failed to confer sufficient selective value.

The assumption of a constant genome size seems reasonable in cases where the members of a robust phylogeny possess similar genome sizes. In these cases one can compare the rate of information influx to the rate at which information is introduced by point-mutational change and thereby assess the relative contributions of these processes to genome evolution.
We have explored this general strategy such genes would not confer a selective ad­ by examining the genomes of Salmonella and vantage on autotrophs like M. janaschii. E. coli. These genome comparisons demonstrate Enteric bacteria have a robust phylogeny that gene loss and acquisition contributed to (33) and do not vary substantially in genome the divergence of these bacterial species. size (45). Moreover, the various members of However, these species pairs are difficult to this clade inhabit diverse ecological niches- 15. GENOMIC FLUX • 273 they exploit soil and water regimes and the Codon usage bias measures the translational digestive tracts of insects, fish, amphibians, preferences among synonymous codons (some reptiles, birds, and mammals and show path­ codons are translated more quickly, more ogenic lifestyles. Furthermore, they have ob­ efficiently, or more faithfully than others). vious phenotypic characteristics that distin­ These preferences reflect the relative abun­ guish one species from another. The two dance of different cognate tRNAs (23, 24) and enteric species E. coli and S. enterica are clearly the nature of the tRNA modifications that al­ distinguishable, and each is the closest major low for discrimination among synonymous relative of the other. The sequence of the E. codons (29a). These factors modulate the ef­ coli genome is available (4), and that of the fect of directional mutation pressures and are Salmonella genome will be available shortly. reflected in the relative percent G+C contents Considerable sequence information is already of the three codon positions in protein-coding available for Salmonella. Therefore, we can sequences (43). identify genes unique to E. coli and S. enterica, The effect of directional mutation pressure identify the foreign-looking genes, and ana­ on position-specific nucleotide composition is lyze the age structure of each genome, making shown in Fig. 2A. 
Muto and Osawa (43) cor­ the assumption of no genome expansion. The related the overall nucleotide compositions of genomic-flux model predicts that the differ­ various bacterial genomes with the nucleotide ences between these taxa (including metabolic composition of each of the three codon po­ differences or degree of pathogenicity) arose sitions in coding sequences. They showed that by the process of gene loss and acquisition. experiencing very weak selection The selfish operon model (see below) predicts for function (e.g., third codon positions, at that multigene phenotypes acquired by hori­ which most substitutions fail to alter the en­ zontal transfer will be found in gene clusters coded amino acid) show a dramatic increase and operons. in percent G+C content as genome percent G+C nucleotide composition increases. In Identification of Foreign Genes contrast, nucleotides under strong natural se­ Acquired by E. coli lection (e.g., the second codon position, Genes that are native to a genome show char­ whose substitution alters the character of the acteristic and species-specific nucleotide com­ encoded amino acid) show more modest positions, dinucleotide fingerprints (28), and changes. The behavior of the first position is codon usage biases (56, '-57). These patterns intermediate, since a few changes there are emerge when genes experience the same di­ synonymous. Although the original data of rectional mutation pressures for a long time Muto and Osawa show substantial variance (62, 63). The rate of mutations that convert from a strict linear relationship between po­ an AT (or TA) pair to a GC (or CG) sitional and overall percent G+C contents, is not necessarily the same as the rate of the their study was done at a time when rather reverse substitutions, GC (or CG) to AT (or little sequence data was available. When large TA). 
The relative rates of the two substi_tution sets of genes or whole genome sequences are types reflect intracellular deoxynucleoside tri­ added to the analysis, the relationships appear phosphate pools, the error frequency of DNA quite robust (linear equations are provided in polymerases, and the error-specific effective­ Lawrence and Ochman [31]). ness of mismatch repair systems. Organism­ Directional mutation pressures are evident, specific differences in these functions dictate in that every long-time resident gene of a ge­ the relative rates of AT and GC pair inter­ nome shows an extremely predictable and conversion and establish the characteristic dif­ characteristic nucleotide composition for the ferences in nucleotide composition (percent first, second, and third codon positions (Fig. G+C). 2A). In contrast, nonnative genes are identi- 274 • LA WR.ENCE AND ROTH

[Figure 2 plot. Panel A: percent G+C of the first, second, and third codon positions plotted against percent G+C of the bacterial genome. Panel B: amelioration of percent G+C content, showing the inferred donor genome, the acquired genes, and the current genome, with an inset plotted against time.]

FIGURE 2 (A) Relationships between overall nucleotide composition of a bacterial genome and nucleotide compositions of the three codon positions (first to third); after Muto and Osawa (43) and Lawrence and Ochman (31). The organisms providing the data are shown at the top. The data for E. coli and S. enterica were calculated from 100 and 25% of the genome sequences, respectively, after known horizontally transferred sequences were removed. (B) Process of amelioration used to infer the time of introduction of acquired genes (31). The acquired genes (shaded symbols) are atypical for the genome (solid symbols) in which they are found. The codon position-specific nucleotide compositions of acquired genes are back-ameliorated (equation 4) until the minimum deviation (by least-squares analysis) from the Muto and Osawa relationships (open symbols) is obtained. The heavy lines indicate the codon position-specific nucleotide compositions during back-amelioration. The arrows indicate the calculated back-amelioration process used to estimate the elapsed time since the sequence showed the pattern of the donor. The inset graph shows the deviation of the curves from the Muto and Osawa relationships as a function of time rather than overall percent G+C.
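The codon position-specific percent G+C values plotted in panel A can be computed directly from a coding sequence. The following is a minimal sketch, not the authors' software: the function name `gc_by_codon_position` is hypothetical, and it assumes the input is an in-frame coding sequence made up of A, C, G, and T.

```python
def gc_by_codon_position(coding_seq):
    """Return the G+C fraction at codon positions 1, 2 and 3.

    Assumes coding_seq is in frame (its first base is codon position 1).
    """
    seq = coding_seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence must contain at least one complete codon")
    fractions = []
    for offset in range(3):
        # Every third base, starting at this codon position.
        bases = seq[offset:usable:3]
        gc_count = sum(base in "GC" for base in bases)
        fractions.append(gc_count / len(bases))
    return tuple(fractions)


# Toy example: codons ATG, GCG, TTA.
# Positions 1 and 2 are each 1/3 G+C; position 3 is 2/3 G+C.
print(gc_by_codon_position("ATGGCGTTA"))
```

Comparing these three values with those expected for a resident gene of the same genome is the kind of test the text describes for flagging a gene as atypical.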

fiably different, because they still show the sequence patterns imposed by the directional mutation pressures of their donor organisms. For example, a gene from Bacillus cereus, with 28% G+C at the third codon position, would be readily detectable as unusual in a background of E. coli genes with 58% G+C content at this position. Moreover, the codon usage bias and dinucleotide frequencies of B. cereus genes would be strikingly different from the major patterns found in the E. coli genome. These patterns have been used by many workers to identify foreign genes in partially sequenced bacterial genomes (28, 31, 38, 47, 69).

Lawrence and Ochman (32) have used these criteria to identify all of the horizontally transferred genes in the complete E. coli genome sequence. They used the nucleotide composition of codon positions, patterns of codon usage bias, and dinucleotide frequencies to determine that 15% of the E. coli genome (755 of 4,288 genes) was made up of genes identifiably introduced by horizontal transfer. This figure agrees well with previous estimates based on subsets of E. coli genes (38, 69). It should be noted that these methods provide a minimum estimate of the numbers of acquired genes, since they would not detect foreign genes donated by an organism with directional mutation pressures similar to those of E. coli. The 755 foreign genes (547.8 kb) were introduced into the E. coli genome in at least 234 lateral-transfer events since this species diverged from the Salmonella lineage 100 million years ago.

The large number of acquired genes supports the magnitude of horizontal transfer but does not reveal how many of the acquired genes contribute a useful function. Some of these genes, like those found on mobile genetic elements and prophages, may have been introduced recently into the E. coli genome and may not contribute to the fitness of the organism. To assess the role of gene acquisition in selective divergence, we must determine how many of these foreign genes have remained in the genome for sufficient time to assure us that they are maintained by selection. That is, we must estimate the entry time of genes that have arrived since the divergence of the E. coli and Salmonella lineages ~100 million years ago (40, 48).

Estimating the Introduction Time of Acquired Genes

The same characteristics of genes that suggest a foreign origin can help assess their time of introduction. As described above (see also Fig. 2), all genes that have been long-term residents of a bacterial genome experience the same directional mutation pressures and consequently evolve to exhibit relatively uniform patterns of nucleotide composition, codon usage bias, and dinucleotide frequencies. These genes are readily identified as foreign when encountered, following horizontal transfer, in a recipient genome which experiences different directional mutation pressures.

As these foreign genes are selectively maintained in the new genome, they accumulate substitutions that occur under the new set of directional mutation pressures. Over time, their nucleotide compositions will evolve to resemble patterns reflected by native genes (31). During this period of amelioration, however, the nucleotide composition of ameliorating genes will reflect a combination of directional mutation pressures: those imparted by their donor organism and the modifications imposed since entry into the new host genome. A gene caught in the act of amelioration is between the equilibrium states of the donor and recipient genomes and will show a sequence pattern that is unlike that of either donor or recipient (or any other well-ameliorated genome). The unique properties of nonequilibrium genes not only show their foreign origin but also permit an estimation of how long the amelioration process that has brought them from the well-ameliorated donor condition to their present (nonequilibrium) intermediate state has been under way. One can estimate the elapsed time since transfer by assessing the degree to which patterns deviate from the Muto and Osawa relationships (31).

The degree to which the compositional patterns of ameliorating genes depart from the relationships defined by typical genomes allows quantification of the time these genes have experienced the directional mutation pressures of their recipient genome. Lawrence and Ochman (31) described the rates at which the nucleotide composition of each codon position will change over time. At any moment, the amelioration rate for each codon position can be expressed as a function of the overall substitution rate, R (a function of the synonymous and nonsynonymous substitution rates), the current nucleotide composition of the horizontally transferred sequence, GC^HT, the nucleotide composition of the recipient genome, GC^Native, and the transition/transversion ratio (IV):

$$\Delta \mathrm{GC}^{\mathrm{HT}} = \frac{IV + 1/2}{IV + 1} \cdot R \cdot \left(\mathrm{GC}^{\mathrm{Native}} - \mathrm{GC}^{\mathrm{HT}}\right) \tag{4}$$
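As a rough numerical illustration of equation 4 for a single codon position, the sketch below steps the composition of a transferred sequence forward or backward in time. It is a simplification under stated assumptions and not the published procedure: the function names, the step size, and the values chosen for R and IV are hypothetical, and the donor composition is supplied as a known input, whereas the method described in the text infers it by least-squares fit to the Muto and Osawa relationships across all three codon positions. The 28% and 58% third-position values are the B. cereus and E. coli figures quoted above.

```python
import math


def amelioration_rate(gc_ht, gc_native, rate, iv):
    """Equation 4: change in G+C fraction of the transferred sequence per unit time."""
    return (iv + 0.5) / (iv + 1.0) * rate * (gc_native - gc_ht)


def ameliorate(gc_start, gc_native, rate, iv, elapsed, step=0.01):
    """Step equation 4 forward (elapsed > 0) or backward (elapsed < 0) in time."""
    n_steps = int(round(abs(elapsed) / step))
    dt = math.copysign(step, elapsed)
    gc = gc_start
    for _ in range(n_steps):
        gc += amelioration_rate(gc, gc_native, rate, iv) * dt
    return gc


def time_since_transfer(gc_now, gc_donor, gc_native, rate, iv):
    """Elapsed time implied by equation 4 when it is treated as a continuous rate.

    Assumes the donor composition is known, which the real analysis does not.
    """
    k = (iv + 0.5) / (iv + 1.0) * rate
    return math.log((gc_donor - gc_native) / (gc_now - gc_native)) / k


# Illustrative values only: rate (per site per million years) and iv are assumptions.
RATE, IV = 0.005, 2.0
GC_DONOR, GC_NATIVE = 0.28, 0.58  # third-position G+C of donor and recipient

gc_now = ameliorate(GC_DONOR, GC_NATIVE, RATE, IV, elapsed=50.0)
print(round(gc_now, 4))                                     # composition after 50 Myr
print(round(ameliorate(gc_now, GC_NATIVE, RATE, IV, -50.0), 4))  # back-ameliorated
print(round(time_since_transfer(gc_now, GC_DONOR, GC_NATIVE, RATE, IV), 2))
```

Running the rate backward from the present composition is the single-position analogue of the back-amelioration shown in Fig. 2B.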

Three equations, each derived from equation 4, describe the rate of amelioration for each codon position. If the positional nucleotide composition of an acquired gene deviates from the Muto and Osawa equilibrium relationships, each codon position may have been ameliorating, as described in equation 4. These amelioration equations can be used to determine the length of time that would be required to cause the observed deviation of the three codon positions from Muto and Osawa equilibria. By computational "back-amelioration" of the sequence, one can estimate the introduction time of an acquired gene. This is diagrammed in Fig. 2B. Normal amelioration in this example is proceeding leftward toward the host pattern.

The Age Structure of the E. coli Genome

Each of the 750 foreign genes in the E. coli genome was subjected to amelioration analysis (32) to estimate the time at which each entered the genome. Introduction times ranged from 0 to 100 million years ago. As seen in Fig. 3, about a third (225) of the foreign genes do not show any sign of amelioration, since their codon-positional percent G+C contents conform to the Muto and Osawa relationships expected for a sequence of their overall percent G+C content. These genes were likely introduced very recently and still reflect the directional mutation pressures of their donor genomes. Without direct genetic evidence, we cannot assess whether any of these newly acquired genes influences the fitness of E. coli and will be maintained in the genome.

[Figure 3 histogram. Horizontal axis: millions of years since introduction, in bins 0, 1-10, 11-20, 21-30, 31-40, 41-50, 51-60, and 61-70.]

FIGURE 3 The distribution of times of introduction for horizontally acquired genes in E. coli; after Lawrence and Ochman (32). All foreign genes that could be ameliorated successfully are included. Roughly one-third of the foreign genes are not included because their positional percent G+C contents did not converge to fit the Muto and Osawa relationships (see the text).

Another third of the foreign genes have been impossible to date by the amelioration method. They are abnormal with respect to the Muto and Osawa relationships, but the back-amelioration process does not converge on a pattern that fits these relationships, so they are not represented in Fig. 3. Since this group includes transposable elements, prophages, and other selfish elements, we suspect that the odd nature of their sequences may result from their having moved repeatedly from one genome to another with long periods in which their sequences ameliorated without selection. Thus, two-thirds of the foreign genes are either newly acquired or from selfish elements; neither class of genes is likely to impart a selectable phenotype and contribute to species divergence. We expect that these genes will ultimately be deleted from the chromosome; selfish elements will continue to propagate and to persist, but older copies will eventually be removed (34, 44).

Another third of the acquired sequences shows signs of amelioration since entering the E. coli genome (Fig. 3). In these genes, mutations that abolish gene function appear to have been counterselected (the reading frames are not interrupted by nonsense codons), while mutations that alter nucleotide composition but do not abolish function have accumulated. These data suggest that these acquired genes have improved the average fitness of E. coli and have been maintained by natural selection. The age distribution of these useful foreign genes shows that fairly recent additions are more common than earlier arrivals, suggesting that many useful genes are held by selection for considerable time but are ultimately lost because of insufficient selective value. We assume that the rate of introduction of DNA by horizontal transfer has remained constant since the divergence of the E. coli and Salmonella lineages, and the vast majority of horizontally acquired genes have failed to confer a sufficiently useful function to become permanent residents.

After correcting for this inevitable deletion of ameliorated foreign selected genes, we estimate the rate of horizontal transfer of genes conferring selectable phenotypes (horizontal transfer of phenotypic information) into E. coli to be ~16 kb/million years (32). The age structure of this group of genes suggests that despite being held by selection for some time, most of these genes are ultimately eliminated (a small proportion of genes has ameliorated beyond our ability to detect them, but this process is much slower than the apparent deletion rate). As noted above, we assume that this rate of acquisition is balanced by a comparable rate of deletion of ancestral genes that could no longer be maintained by natural selection. The overall rate of introduction of DNA is much higher (in excess of 64 kb/million years) and includes sequences which fail to make a contribution to the fitness of the cell and have remained only long enough to be detected in the genome of E. coli K-12. We suspect that this vastly underestimates the true flux of total sequence. Our interpretation of the genome structure is diagrammed in Fig. 4.

The identities of the horizontally acquired sequences support the genomic-flux model of speciation. Horizontally acquired genes encode many of the functions by which E. coli and Salmonella differ. These include E. coli genes for lactose utilization (lac), phosphonate utilization (phn), iron citrate transport (fec), and tryptophan degradation (tna) and Salmonella genes for cobalamin synthesis (cob), propanediol degradation (pdu), citrate utilization (tct), and host invasion (spa). Other functions present in only one taxon (like alkaline phosphatase [PhoA] in E. coli [16]) are unique because the corresponding genes were deleted from the other taxon. We know of no phenotypic property that discriminates between these taxa and is not correlated with a gene loss or acquisition event. No taxonomically useful function has been identified in either taxon that has arisen by internal duplication and divergence of an ancestral gene to allow evolution of a new function by point mutation.

Genomic Flux versus Point-Mutational Change

The overall rate of horizontal transfer of selectively valuable phenotypic information (16 kb/million years) is lower than the total rate of sequence introduction, since much information does not contribute to fitness. Based on the substitution rates of E. coli genes (56), scattered point mutations are estimated to alter the information of the E. coli genome at a rate equivalent to 22 kb/million years (31). However, unlike the overall rate of DNA introduction by horizontal transfer, very few of these changes are likely to contribute to cellular fitness; the bulk (~90%) are at synonymous codon positions and cause no change in the nature of the encoded protein. Therefore, while gene acquisition and point mutations both introduce change into bacterial genomes, the qualitative nature of the information they furnish is quite different. Acquired sequences must provide a selectable function in order to be maintained by natural selection and show signs of amelioration; very few point muta-

[Figure 4 diagram. Recoverable labels: Ancestor of E. coli and Salmonella; E. coli; Ancestral genes displaced (250); Other foreign genes introduced (>6000?); Useless foreign genes lost (>6000?); flux arrows lettered A to H.]

FIGURE 4 Schematic representation of the effects of genomic flux on the E. coli genome. The events depicted occurred since divergence of E. coli and Salmonella from the common ancestor. The useful gene flux (ABD) depicted above the major arrow describes genes that are kept in the genome long enough to show measurable amelioration. Although large numbers of potentially useful genes enter the chromosome (A) and may be maintained for some time, most of these are ultimately lost (B); however, about 250 remain (D). The number of selectively maintained foreign genes (D) is matched by a loss of ancestral genes (C). The flux below the major arrow (EFGH) represents sequences that provide no selective advantage. Large amounts of DNA with no value are likely introduced into the chromosome (G), but the vast majority are removed by deletion (H). About 500 of these genes are still in the genome but will eventually be removed by mutation and drift; these include those known to be useless (F), like mobile genetic elements, and those merely too recently arrived to have been subject to deletion (E). The newly arrived genes have not been tested by selection, but very few are likely to remain. As shown by the black bars, the ancestral chromosome also contained a portion of selectively useless genes that would have been deleted.
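The rate estimates quoted in the text can be set side by side in a short back-of-the-envelope calculation. The sketch below uses only numbers stated in the chapter (16, 64, and 22 kb per million years, ~90% synonymous changes, 100 million years of divergence, and 755 foreign genes totalling 547.8 kb). The names are hypothetical, and taking the nonsynonymous share of point-mutational change as the remaining 10% is an assumption made here for illustration, not a figure from the text.

```python
USEFUL_HT_KB_PER_MYR = 16.0   # horizontal transfer of selectable information
TOTAL_HT_KB_PER_MYR = 64.0    # lower bound on all DNA introduced
POINT_MUT_KB_PER_MYR = 22.0   # information altered by point mutation
SYNONYMOUS_FRACTION = 0.90    # share of point changes at synonymous positions
DIVERGENCE_MYR = 100.0        # time since the E. coli / Salmonella split
FOREIGN_GENES = 755
FOREIGN_KB = 547.8


def flux_summary():
    """Collect simple derived quantities from the quoted rates."""
    return {
        # Mean length of a detected foreign gene.
        "mean_foreign_gene_kb": FOREIGN_KB / FOREIGN_GENES,
        # Useful sequence introduced over the whole divergence time.
        "useful_ht_kb_total": USEFUL_HT_KB_PER_MYR * DIVERGENCE_MYR,
        # Minimum total sequence introduced over the same time.
        "total_ht_kb_min": TOTAL_HT_KB_PER_MYR * DIVERGENCE_MYR,
        # Assumed nonsynonymous share of point-mutational change.
        "nonsyn_point_kb_per_myr": POINT_MUT_KB_PER_MYR * (1.0 - SYNONYMOUS_FRACTION),
    }


for name, value in flux_summary().items():
    print(f"{name}: {value:.2f}")
```

On these figures, useful horizontal transfer (16 kb per million years) exceeds the assumed nonsynonymous point-mutational change (about 2.2 kb per million years) several-fold, which is the comparison the text draws qualitatively.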

tions are likely to improve cellular fitness. For this reason, we maintain that new genes contributing to the long-term evolution of bacterial species are more likely to have been obtained by horizontal transfer than by internal duplication and divergence (point mutations).

Salmonella and Its Divergence from E. coli

Analysis of the E. coli and Salmonella genomes by using the principles outlined above allows several conclusions regarding the divergence of Salmonella and E. coli from their common ancestor. Many of these predictions will be better tested when the complete sequence of the Salmonella genome is available. We predict that 15 to 30% of each genome will comprise sequences absent from the other taxon; DNA hybridization data support this conclusion. Whole-genome DNA-DNA hybridization studies showed that the E. coli and S. enterica genomes are 45% "related" (7-10). This estimate reflects two processes: (i) more than half of each genome is composed of shared sequences that are ~85% identical (48, 56), and (ii) the remaining portion of each genome (between 25 and 45%) is unique (8, 10). As detailed above, the sequences unique to each genome include both acquired sequences and genes that have been lost from the other species' genome (Fig. 4).

Lateral genetic transfer contributed many of the features used to identify strains of Salmonella. A notable example is B12 metabolism, the basis of the Rambach test (52) for Salmonella identification, which scores the ability to synthesize cobalamin (cob) and perform cobalamin-dependent degradation of propanediol (pdu). Salmonella acquired these functions in a single horizontal-transfer event that added a block of over 40 contiguous genes, nearly 1% of the Salmonella genome. These abilities are shared by virtually every isolate of S. enterica (35). Amelioration analysis suggests that the genes were acquired 71 million years ago (31), after the divergence of the E. coli and Salmonella lineages (~100 million years ago) but before the radiation of the salmonellae (~50 million years ago). The phylogenetic distribution of these functions supports the amelioration method for estimating transfer time.

The gene arrangement within the acquired sequence block (pdu-cob) supports the importance of horizontal transfer. This block includes about 40 genes that act together to provide a single selectable phenotype: the ability to degrade a carbon source and synthesize the cofactor needed for this degradation. The block of genes includes four independently transcribed units: the pdu operon for degradation of propanediol (15 to 20 genes), the pduF gene for importing propanediol, the cob operon for synthesis of B12 (20 genes), and the pocR gene for regulation of all three transcripts (54). There is no known reason that these units need to be close together in order to perform their functions. The fact that all were acquired in a single transfer event suggests why they happen to be close together: their proximity in some donor organism made it possible for Salmonella to acquire a complex metabolic capability and a selectable phenotype. Below, we will propose that such gene clusters and operons are formed in a process driven by the horizontal-transfer process discussed above.

A notable feature of the Salmonella lineage is its widespread exploitation of pathogenic lifestyles. Strains of S. enterica are pathogens of many organisms, including humans, mice, cows, reptiles, amphibians, and poultry (22). Many genes facilitating this pathogenic lifestyle are found in clusters known as pathogenicity islands. Analysis of these regions in Salmonella has indicated that many essential virulence factors, including those for attachment (Salmonella pathogenicity island [SPI-1]), invasion (SPI-2), and macrophage survival (SPI-3), are found on segments of DNA acquired by horizontal transfer (46, 58). The occurrence of horizontally acquired pathogenicity islands in many pathogens (1) suggests that the genomic-flux model may provide a general framework for describing the evolution of many bacterial lineages. A more complete evaluation of the impact of genomic flux in Salmonella evolution can be performed when the genomic sequence becomes available.

Examination of available Salmonella sequence data suggests that the rate of foreign-sequence acquisition has been similar to that estimated (see above) for E. coli. However, this equality of acquisition rates need not be true and may not be seen for all pairs of sister species. One species may have continued to live in a manner similar to that of the common ancestor, while the sister species diverged to explore some novel environmental situation. The exploratory species may have acquired more foreign genes that gave it the capabilities needed to support its divergence, while the conservative species did not experience significant selection for acquisition of new functions.

If Salmonella acquires useful foreign sequence at the rate of 16 kb per million years estimated for E. coli, we assume (as we did for E. coli) that it will lose ancestral sequence at a similar rate. However, gain and loss of sequences will occur independently in the two lineages, and their genomes will diverge to the extent that they gain and lose different information. By this analysis alone (without comparison to an outgroup taxon), we cannot estimate directly how much sequence was lost unilaterally from the Salmonella or E. coli lineage or how much ancestral sequence was lost from both genomes independently. However, even without this information, we can use the above considerations to predict what will be seen when the two genomes are compared after 100 million years of divergence.

The general expectations are diagrammed in Fig. 5. Assuming similar events in each lineage and a random loss of ancestral sequences, we predict that the roughly 4,000 genes under selection in both Salmonella and E. coli will prove to be distributed in the following way. The two genomes will each contain roughly 3,000 shared genes inherited vertically from their common ancestor. Each will contain an added 250 ancestral genes that have been lost from the other species by deletion. Each genome will contain about 750 foreign genes, a different set in each organism, that were acquired by horizontal transfer; of these, about 250 are under selection and contributing to the fitness of the organism. The rest contribute no valuable function: they are either new arrivals that have yet to be eliminated (250) or they are selfish elements that are decaying without selection (250). These expectations are generally borne out by the available sequence data and by DNA hybridization tests done many years ago (7, 10).

IMPACT OF GENOMIC FLUX

Mechanisms of Gene Acquisition

The high rate of introduction of DNA into the E. coli genome, in excess of 64 kb/million

3000 Shared ancestral genes ~ 3000

250 Unique ancestral genes -+- 250 FIGURE 5 Divergence of Salmonella and E. coli genomes. Based on the model presented in the text, the likely events and 250 Unique useful foreign genes l 250 final genomic consequences of this act of speciation are portrayed. On genomes, tri­ 500 Unique useless foreign genes l 500 angles indicate acquired foreign material. Open boxes indicate lost ancestral mate­ IGenomes share about 75% of their genes I rial. 15. GENOMIC FLUX • 281 years, requires plausible mechanisms for the pos1t1on. (If replicative transpos1t1on occurs introduction and stable chromosomal incor­ between a linear fragment and the chromo­ poration of heterologous genes. It may seem some, a second exchange would be re­ unlikely that sequences could be transferred in quired-transposition, recombination, or an natural environments at the rates inferred here. illegitimate event-to recircularize the recip­ Several facets of this process should be appre­ ient chromosome ([59]). The IS element me­ ciated. First, even an extremely low rate of diating the transposition probably resided on heterologous recombination will ensure a high the E. coli chromosome and not on the ac­ rate of gene acquisition by a species if the pop­ quired fragment, since IS elements of each ulation size is large and the transferred gene class within E. coli are nearly identical in DNA provides a selective advantage. This combi­ sequence. nation allows extremely rare transfer events to One might argue that the association of IS introduce genes that will rise to high fre­ elements with acquired genes is not due to a quency in the population. Second, genes may role in integration but rather reflects the dis­ initially be introduced into the cytoplasm on pensability of foreign sequences, which allows independent replicons ( or bacterio­ them to be used as targets with minimal se­ phages) that can provide a long period of rep­ lective consequences. 
This alternative would lication and selective maintenance prior to predict that all IS elements would be found integration into the bacterial chromosome. more often with foreign sequences. The Lastly, transposons are site-specific recombi­ mechanistic involvement in incorporation nation mechanisms that may mediate insertion seems more likely to us because certain IS el­ of foreign genes; these mechanisms circum­ ements show a much higher likelihood of as­ vent the need for homologous recombination sociation than others (six of seven for IS2 but with its demand for close sequence similarity. zero of three for IS 18 6). A notable example There is evidence for transposon- or bac­ of apparent IS-mediated gene acquisition is E. teriophage-mediated acquisition of foreign coli's G+C-rich ar;glregion at min 7, which is genes in the E. coli chromosome (32). First, flanked by IS 1 elements. many acquired genes lie adjacent to tRNA genes; bacteriophages are known to use tRNA Genomic Flux and Speciation genes as chromosomal integration sites (14, Gene loss and acquisition represent a powerful 25, 51), and many acquired gene blocks ad­ mechanism by which genomes adjust to major jacent to tRNA genes inClude homologues of changes in organism lifestyle. Such adjust­ bacteriophage genes. We suspect that other ments are implicit in bacterial speciation. Spe­ acquired sequences adjacent to tRNA genes ciation is the process by which an ancestral may be remnants of prophages from' which population of organisms-defined by their ex­ identifiable bacteriophage genes have been de­ ploitation of a particular ecological niche­ leted; far too many more foreign genes are evolves to form two populations, each ex­ found adjacent to tRNA genes than would be ploiting a separate ecological niche and re­ expected at random. .. combining more often within their own A significant fraction (68%) of the insertion group than with the sister species. As detailed sequences found in the E. 
coli chromosome are above, genomic flux appears to have played a associated with horizontally acquired genes, big role in the speciation event by which E. often lying at the boundary between native coli and S. enterica diverged. The distribution and acquired sequences (32). A foreign seg­ of these organisms in natural environments ment would initially be flanked by direct­ (18, 66) and their distinctive behavior in clin­ order copies of the insertion sequence (IS) el­ ical situations (22) suggest that they enjoy sig­ ement if an introduced circular DNA nificantly different lifestyles and inhabit sub­ fragment were integrated by replicative trans- stantially different ecological niches. 282 • LAWRENCE AND ROTH
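The gene counts in the Fig. 5 model can be checked with a few lines of arithmetic. The following Python sketch is an illustration only: the four counts are the round numbers given in the figure, while the dictionary keys and the function name summarize_partition are hypothetical labels introduced here.

```python
# Per-genome gene counts under the Fig. 5 model (round numbers from the figure).
# The key names are illustrative labels, not terms defined by the authors.
partition = {
    "shared_ancestral": 3000,
    "unique_ancestral": 250,
    "unique_useful_foreign": 250,
    "unique_useless_foreign": 500,
}

def summarize_partition(parts):
    """Return total genes, genes unique to one genome, and the shared fraction."""
    total = sum(parts.values())
    unique = total - parts["shared_ancestral"]
    return {
        "total": total,
        "unique": unique,
        "shared_fraction": parts["shared_ancestral"] / total,
    }

summary = summarize_partition(partition)
print(summary)
```

With these round numbers, each genome carries about 4,000 selected genes, three-quarters of them shared, leaving about 1,000 genes per genome that are absent from the other.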

We postulate that such diversification into ecologically distinct organisms would not be feasible (in the 100-million-year period) if the genomes of these organisms diverged solely by point mutation or internal rearrangement. We estimate (see above) that about 75% of the genomes of E. coli and Salmonella are derived from their common ancestor and that each species possesses about 1,000 genes that are absent from the other's genome. The observed differences appear to be the result of unilateral gene acquisition and unilateral gene loss. None of the features now used to distinguish these organisms taxonomically can be attributed to an accumulation of point mutations and internal functional divergence. Rather, gene acquisition (and attendant gene loss) has provided each organism with distinct selectable phenotypes.

Evolution of Metabolic Novelty

Even if horizontal transfer of phenotypic information is a major factor in bacterial evolution, metabolic novelties must ultimately evolve by stepwise mutational changes that cause functional alteration of preexisting genes. We argue that, when the possibility of horizontal transfer exists, the slow process of functional modification cannot provide for efficient competitive exploitation of an environment. We suggest that the obviously homologous genes with distinct functions seen in a single bacterial genome (apparent paralogues) probably arose initially in distinct lineages as the sole member of the gene family (orthologues). After the orthologues diverged in different genomes, they were brought into a common lineage by horizontal transfer. Thus, we suggest that most of the apparent paralogues seen in bacterial genomes are actually horizontally transferred orthologues. After the microbial world came to include a wide variety of highly adapted genes, assorting these information bits by horizontal transfer became much more likely than reinvention of functions by internal duplication and divergence. In this sense, we suspect there will be few examples of true paralogues in bacterial genomes.

The evolution of metabolic novelty by duplication and divergence is almost always a slow, stepwise, and inherently inefficient process. For example, the evolution of a catabolic pathway for utilization of a hexose may require the following:

1. alteration of binding specificities for several enzymes to accommodate binding of new substrates and intermediates
2. alteration of binding sites of these enzymes to minimize binding of their original substrates
3. alteration of the active sites of these enzymes to allow them to perform their catalytic functions efficiently and effectively on the new substrates and intermediates
4. alteration of release activities of the new products to optimize turnover
5. alteration of regulatory proteins to discriminate between old and new substrates
6. alteration of regulatory interactions to allow gene expression under potentially different growth conditions
7. coordinate evolution of multiple enzymes to accommodate new substrates and intermediates

We maintain that all of these evolutionary steps are required to provide a function that allows efficient, competitive exploitation of novel environments. Organisms at any stage in the evolutionary process outlined above will not effectively compete with those that have acquired a preformed, highly evolved gene complex which performs the same task.

We propose that when horizontal acquisition is possible and appropriate information modules preexist, internal evolution of metabolic novelty (reinvention of the wheel by duplication and divergence) cannot occur in the context of competition. When bacterial speciation entails the competitive invasion of an ecological niche, the evolution of metabolic novelty cannot be correlated with bacterial speciation. The analysis of E. coli and S. enterica detailed above supports this hypothesis; none of the characteristics that discriminate between these taxa can be attributed to the evolution of metabolic novelty by intragenomic duplication and divergence. Rather, all of these differences, which we posit were intimately involved in the divergence of the lineages and adaptation to their individual niches, can be attributed to gene loss and gene acquisition. All novel phenotypes were conferred by horizontally transferred genes. Genes for the diverse metabolic pathways must therefore have formed slowly at earlier times and not during the course of competitive invasion of novel ecological niches. Regardless of the relative rates of these two processes, it is clear that gene loss and acquisition have facilitated exploration of novel environments and allowed more rapid divergence of bacterial types in competitive situations.

Genomic Flux and the Evolution of Gene Clusters

The genomic-flux model predicts that horizontal gene transfer mediates the acquisition of novel phenotypic capabilities. If multiple gene products are required to confer a selectable phenotype, the phenotype cannot be transmitted unless all of the required genes are simultaneously mobilized. A subset of the necessary genes will not provide a selectable phenotype and cannot be maintained by natural selection. Therefore, the applicability of the genomic-flux model beyond simple functions requiring the product of only a single gene is contingent upon the physical clustering of genes that provide for a single function.

Bacterial genomes are notable for having clusters of cotranscribed genes, usually contributing to a single selectable function (26). These clusters include genes from a variety of gene families that must have been brought together after their initial formation. As described previously (36) and outlined below, we believe that formation of these clusters was driven by the selective advantage conferred on the clustered alleles themselves (in comparison to the same genes in an unclustered state). The clustered alleles are fitter in a global sense because they can transfer more widely and (like transposable elements) have a selfish advantage. This advantage need not be associated with any physiological improvement of host phenotype. We propose that the prevalence of gene clusters stands as evidence that genomic flux has historically been a primary contributor to genome evolution.

The cotranscription of many clustered genes (operons) is a further refinement of this process and has allowed subsequent coregulation of gene clusters whose products participate in a single metabolic process. Since the elucidation of gene clusters in the 1950s (15) and that of operon structure in the 1960s (26, 27), four theories have been offered to explain the evolution of bacterial gene clusters; these are reviewed in reference 36. The natal theory proposes that gene clusters resulted from the tandem duplication and divergence of parental genes that occurred during initial evolution of metabolic pathways. While this process may explain clusters of homologous genes (e.g., mammalian globin gene clusters), bacterial operons typically contain genes whose products belong to distinct gene families (e.g., kinases, methylases, and dehydrogenases) and were likely assembled from previously existing unlinked genes.

The Fisher theory postulates that coadapted genes (those whose products have been selected to work together efficiently) will appear to be more tightly linked than expected, since recombination may disrupt particularly advantageous combinations of coadapted alleles (17). Consider two genes, each with two alleles, A and a at one locus and B and b at another locus. If natural selection favors organisms bearing a coadapted combination of alleles at these loci (AB or ab) and counterselects organisms bearing more poorly cooperating combinations (Ab or aB), linkage disequilibrium will occur among the alleles at these loci, simply because the good combinations confer greater fitness. Apparent disequilibrium among alleles would be expected, even for genes on different chromosomes, because the alternative combinations have lower fitness. This model was extended (5, 60) to explain the origins of clusters of genes in haploid organisms, especially in bacteriophage genomes (6, 12, 13). It was postulated that selection favors the assembly of genes into clusters, so that unfavorable recombination between coadapted genes occurs less frequently. This model is restricted to the assembly of operons bearing coadapted alleles (probably encoding products that interact physically) in a freely recombining, variable population. The model is not supported by the fact that organisms with the highest recombination rates (obligatory sexual species) show little evidence of clustering related genes. Conversely, the largely asexual bacterial lineages show a strong tendency to cluster genes with related functions.

The coregulation model (probably the most widely accepted) postulates that genes are found in operons because coordinate expression of the constituent genes is beneficial to the cell. While the benefits of coregulation may contribute to selection for maintenance of a gene cluster, this selection cannot drive formation of the clusters, since coregulation cannot provide a strong selective advantage for the intermediate states that must exist prior to transcriptional fusion. Moreover, coregulation is seen for many bacterial genes (e.g., the E. coli arg, nad, pur, and pyr genes) that are not assembled into operons, making it clear that clustering is not essential to coregulation. We have proposed a different model to account for the evolution of operons.

The selfish operon model posits that gene clusters were assembled as a natural consequence of frequent horizontal transfer and that they serve to increase the fitness of the constituent genes by facilitating their distribution to a wide variety of genomes (36). As detailed below, this model provides a way of selecting for progressive clustering of genes and for the maintenance of gene clusters once formed.

Genomic Flux Facilitates Gene Clustering

The assembly of genes into clusters must entail a series of intermediate steps, during which not all of the members of the gene clusters are physically close. More precise positioning of genes in cotranscribable groups is likely to require additional steps. The most effective means of juxtaposing genes (deletion of intervening genetic material) will not be useful when the intervening material is selectively valuable. For this reason, we discount coregulation as a plausible selective force to drive the evolution of operons.

Horizontal transfer of genes to new genomes provides a selective force that is not subject to the above limitations (36). As detailed above, the introduction of a foreign DNA sequence to a genome does not ensure its persistence. Sequences that do not contribute to fitness are subject to deletion. Therefore, when a large fragment is introgressed, those functions contributing a valuable phenotype will be selectively maintained while intervening noncontributing material will be deleted. In this way, transferred genes that act together to confer a valuable phenotype will be maintained and brought closer together. Although deletion of intervening genes was not possible in the context of the donor genome (where the deletions were deleterious), deletion is inevitable in the context of a new recipient genome. Therefore, any loosely linked collection of cooperating genes, just close enough to be transferred between organisms, will become progressively more tightly linked with time in the new host. As the cluster of related genes becomes tighter, it will transfer more efficiently to new hosts. It should be noted that the base sequence of genes in a cluster may be identical before and after clustering has occurred. Thus, clustering is not expected to provide any phenotypic improvement to the host (compared to the same genes in an unclustered state) but rather provides a benefit to the clustered genes themselves (wider distribution). We termed this the selfish operon model because we consider physical proximity a selfish property of the constituent genes, allowing more frequent transfer to naive genomes. The process for evolution of gene clusters is diagrammed in Fig. 6.

[Figure 6 labels. Title: Evolutionary advantage of clustered selectable alleles. Alleles (S or C) confer equal fitness regardless of genome position. Path I, evolutionary loss: equally likely for the identical clustered (C) and separated (S) alleles; both are subject to loss by mutation and drift of the alleles out of the population. Path II, horizontal transfer and clustering: open only to clustered (C) alleles, enhancing their persistence; transfer to a new host with intervening genes, deletion of nonessential material (neg), and more efficient transfer of the tighter cluster.]

FIGURE 6 Mechanism for clustering of genes by horizontal transfer. The diagram compares the fates of identical alleles of two genes, wsf, that together confer a weakly selectable function. In one organism these genes are clustered (C), and in the other they are separated (S). Both sets are equally subject to loss by mutation and drift. However, clustered wsf genes can spread horizontally to new genomes. Following transfer, genes (ess) that were essential in the donor become nonessential (neg) in the new species. While the selected wsf genes are selectively maintained, the neg genes are deleted. This tightens the wsf cluster and enhances the likelihood of its further transfer.
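Why a tighter cluster should transfer more readily can be seen from a simple geometric argument. The sketch below assumes, purely for illustration, that each transfer event moves one contiguous fragment of fixed length whose start point is uniformly distributed around a circular chromosome; a cluster spanning S kb is then carried intact with probability (F - S)/G for fragment length F and genome length G. The 4,600-kb genome and 100-kb fragment are illustrative round values, and cotransfer_probability is a hypothetical helper.

```python
def cotransfer_probability(cluster_span_kb, fragment_kb, genome_kb):
    """Chance that one random fragment carries the entire cluster.

    Assumes a circular genome, a fragment of fixed length, and a start
    point chosen uniformly at random (illustrative assumptions only).
    """
    if not 0 <= fragment_kb <= genome_kb:
        raise ValueError("fragment length must lie between 0 and genome length")
    if cluster_span_kb > fragment_kb:
        return 0.0
    return (fragment_kb - cluster_span_kb) / genome_kb

GENOME_KB = 4600    # assumed round figure for an E. coli-sized chromosome
FRAGMENT_KB = 100   # assumed size of a transferred fragment

for span in (5, 20, 80, 500):
    p = cotransfer_probability(span, FRAGMENT_KB, GENOME_KB)
    print(f"cluster span {span:>3} kb: cotransfer probability {p:.4f}")
```

Under these assumptions the probability falls linearly as the cluster spreads out and drops to zero once the cluster is longer than the fragment, which is the sense in which dispersed genes have a very low probability of cotransfer.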

Tighter Gene Clustering and Cotranscription Facilitate Horizontal Transfer

As described above, assembly of related genes into clusters is driven by the improved efficiency with which clustered genes spread between and among species. As gene clusters become tighter, their ability to be mobilized increases, making the model described here operate with ever-increasing efficiency. All known mechanisms of gene transfer increase in efficiency with decreasing size of the fragment that must be transferred. Genes contributing to a single function have a very low probability of cotransfer if they are dispersed on a bacterial chromosome. However, once even loosely aggregated genes have been assembled into a cluster following a single horizontal transfer (see above), these genes have a higher probability of successful cotransfer (and cooperative provision of a phenotype) to new genomes.

Cotranscription of gene clusters (organization of gene clusters into operons) facilitates horizontal transfer by minimizing the number of promoters that must function in the new context. Gene clusters may frequently be transferred into recipient cells with a transcription apparatus different from that of the donor; no phenotype can be conferred unless all promoters function in the new host. This problem is minimized as the number of required promoters is reduced and is eliminated if a block of cotranscribable genes integrates near a host promoter. Furthermore, operons of translationally coupled genes gain the additional advantage of not requiring de novo translation start signals in the new host. Therefore, operons of cotranscribed, translationally coupled genes are highly portable packages of genetic information that function in the widest variety of organisms. For this reason, cotranscription can be considered an additional selfish property of the constituent genes that extends the benefits accrued by proximity. Once an operon has formed, for purely selfish reasons, a regulatory mechanism may provide additional benefits; in this way, coregulation may contribute to selection for the maintenance of a gene cluster, but it cannot provide selection for the initial assembly, or cotranscription, of that cluster.

Summary

We have outlined a model for the evolution of bacterial genomes through the synergistic processes of gene acquisition and gene loss. From analysis of the E. coli genome, we estimate that genes are lost and gained at a rate of 16 kb/million years. The information gained by this process allows exploration of novel ecological niches under competitive conditions; the coupled loss of genes cuts the new organism off from its old lifestyle and contributes to its divergence from the ancestral population. We suggest that this process has driven the organization of bacterial genes into clusters and cotranscribed operons, which allow a single horizontally transferred fragment to confer novel phenotypic capabilities on recipient cells. The organization of genes into operons reflects the important role in bacterial evolution and speciation played by genomic flux, that is, the development of bacterial genomes by gene loss and gene acquisition.

REFERENCES

1. Barinaga, M. 1996. A shared strategy for virulence. Science 272:1261-1263.
2. Bergthorsson, U., and H. Ochman. 1995. Heterogeneity of genome sizes among natural isolates of Escherichia coli. J. Bacteriol. 177:5784-5789.
3. Bergthorsson, U., and H. Ochman. 1998. Distribution of chromosome length variation in natural isolates of Escherichia coli. Mol. Biol. Evol. 15:6-16.
4. Blattner, F. R., G. Plunkett III, C. A. Bloch, N. T. Perna, V. Burland, M. Riley, J. Collado-Vides, J. D. Glasner, C. K. Rode, G. F. Mayhew, J. Gregor, N. W. Davis, H. A. Kirkpatrick, M. A. Goeden, D. J. Rose, B. Mau, and Y. Shao. 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453-1474.
5. Bodmer, W. F., and P. A. Parsons. 1962. Linkage and recombination in evolution. Adv. Genet. 11:1-100.
6. Botstein, D. 1980. A theory of modular evolution for bacteriophages. Ann. N. Y. Acad. Sci. 354:484-491.
7. Brenner, D. J., and D. B. Cowie. 1968. Thermal stability of Escherichia coli-Salmonella typhimurium deoxyribonucleic acid duplexes. J. Bacteriol. 95:2258-2262.
8. Brenner, D. J., and S. Falkow. 1971. Molecular relationships among members of the enterobacteriaceae. Adv. Genet. 16:81-118.
9. Brenner, D. J., G. R. Fanning, K. E. Johnson, R. V. Citarella, and S. Falkow. 1969. Polynucleotide sequence relationships among members of the Enterobacteriaceae. J. Bacteriol. 98:637-650.
10. Brenner, D. J., G. R. Fanning, F. J. Skerman, and S. Falkow. 1972. Polynucleotide sequence divergence among strains of Escherichia coli and closely related organisms. J. Bacteriol. 109:953-965.
11. Bult, C. J., O. White, G. J. Olsen, L. Zhou, R. D. Fleischmann, G. G. Sutton, J. A. Blake, L. M. FitzGerald, R. A. Clayton, J. D. Gocayne, A. R. Kerlavage, B. A. Dougherty, J.-F. Tomb, M. D. Adams, C. I. Reich, R. Overbeek, E. F. Kirkness, K. G. Weinstock, J. M. Merrick, A. Glodek, J. L. Scott, N. S. M. Geoghagen, J. F. Weidman, J. L. Fuhrmann, D. Nguyen, T. R. Utterback, J. M. Kelley, J. D. Peterson, P. W. Sadow, M. C. Hanna, M. D. Cotton, K. M. Roberts, M. A. Hurst, B. P. Kaine, M. Borodovsky, H.-P. Klenk, C. M. Fraser, H. O. Smith, C. R. Woese, and J. C. Venter. 1996. Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science 273:1058-1073.

12. Campbell, A., and D. Botstein. 1983. Evolution of lambdoid phages, p. 365-380. In R. W. Hendrix, J. W. Roberts, F. W. Stahl, and R. A. Weisberg (ed.), Lambda II. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
13. Casjens, S., G. Hatfull, and R. Hendrix. 1992. Evolution of dsDNA tailed-bacteriophage genomes. Virology 3:383-397.
14. Cheetham, B. F., and M. E. Katz. 1995. A role for bacteriophages in the evolution and transfer of bacterial virulence determinants. Mol. Microbiol. 18:201-208.
15. Demerec, M., and P. Hartman. 1959. Complex loci in microorganisms. Annu. Rev. Microbiol. 13:377-406.
16. DuBose, R. F., and D. L. Hartl. 1990. The molecular evolution of alkaline phosphatase: correlating variation among enteric bacteria to experimental manipulations of the protein. Mol. Biol. Evol. 7:547-577.
17. Fisher, R. A. 1930. The Genetical Theory of Natural Selection. Oxford University Press, Oxford, United Kingdom.
18. Foltz, V. D. 1969. Salmonella ecology. J. Am. Oil Chem. Soc. 46:222-224.
19. Fraser, C. M., J. D. Gocayne, O. White, M. D. Adams, R. A. Clayton, R. D. Fleischmann, C. J. Bult, A. R. Kerlavage, G. Sutton, J. M. Kelley, J. L. Fritchman, J. F. Weidman, K. V. Small, M. Sandusky, J. L. Fuhrmann, D. T. Nguyen, T. R. Utterback, D. M. Saudek, C. A. Phillips, J. M. Merrick, J.-F. Tomb, B. A. Dougherty, K. F. Bott, P.-C. Hu, T. S. Lucier, S. N. Peterson, H. O. Smith, C. A. I. Hutchison, and J. C. Venter. 1995. The minimal gene complement of Mycoplasma genitalium. Science 270:397-403.
20. Himmelreich, R., H. Hilbert, H. Plagens, E. Pirkl, B. C. Li, and R. Herrmann. 1996. Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res. 24:4420-4449.
21. Himmelreich, R., H. Plagens, H. Hilbert, B. Reiner, and R. Herrmann. 1996. Comparative analysis of the genomes of the bacteria Mycoplasma pneumoniae and Mycoplasma genitalium. Nucleic Acids Res. 25:701-712.
22. Hoff, G. L., and D. M. Hoff. 1984. Salmonella and Arizona, p. 69-82. In G. L. Hoff, F. L. Frye, and E. R. Jacobson (ed.), Diseases of Amphibians and Reptiles. Plenum Press, New York, N.Y.
23. Ikemura, T. 1980. The frequency of codon usage in E. coli genes: correlation with abundance of cognate tRNA, p. 519-523. In S. Osawa, H. Ozeki, H. Uchida, and T. Yura (ed.), Genetics and Evolution of RNA Polymerase, tRNA and Ribosomes. University of Tokyo Press, Tokyo, Japan.
24. Ikemura, T. 1982. Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes. Differences in synonymous codon choice patterns of yeast and Escherichia coli with reference to the abundance of isoaccepting transfer RNAs. J. Mol. Biol. 158:573-597.
25. Inouye, S., M. G. Sunshine, E. W. Six, and M. Inouye. 1991. Retrophage phi R73: an E. coli phage that contains a retroelement and integrates into a tRNA gene. Science 252:969-971.
26. Jacob, F., and J. Monod. 1962. On the regulation of gene activity. Cold Spring Harbor Symp. Quant. Biol. 26:193-211.
27. Jacob, F., D. Perrin, C. Sanchez, and J. Monod. 1960. L'opéron: groupe de gènes à expression coordonnée par un opérateur. C. R. Acad. Sci. 250:1727-1729.
28. Karlin, S., and C. Burge. 1995. Dinucleotide relative abundance extremes: a genomic signature. Trends Genet. 11:283-290.
29. Klenk, H. P., R. A. Clayton, J. F. Tomb, O. White, K. E. Nelson, K. A. Ketchum, R. J. Dodson, M. Gwinn, E. K. Hickey, J. D. Peterson, D. L. Richardson, A. R. Kerlavage, D. E. Graham, N. C. Kyrpides, R. D. Fleischmann, J. Quackenbush, N. H. Lee, G. G. Sutton, S. Gill, E. F. Kirkness, B. A. Dougherty, K. McKenney, M. D. Adams, B. Loftus, and J. C. Venter. 1997. The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature 390:364-370.
29a. Lawrence, J. Unpublished results.
30. Lawrence, J. G. 1997. Selfish operons and speciation by gene transfer. Trends Microbiol. 5:355-359.
31. Lawrence, J. G., and H. Ochman. 1997. Amelioration of bacterial genomes: rates of change and exchange. J. Mol. Evol. 44:383-397.
32. Lawrence, J. G., and H. Ochman. 1998. Molecular archaeology of the Escherichia coli genome. Proc. Natl. Acad. Sci. USA 95:9413-9417.
33. Lawrence, J. G., H. Ochman, and D. L. Hartl. 1991. Molecular and evolutionary relationships among enteric bacteria. J. Gen. Microbiol. 137:1911-1921.
34. Lawrence, J. G., H. Ochman, and D. L. Hartl. 1992. The evolution of insertion sequences within enteric bacteria. Genetics 131:9-20.
35. Lawrence, J. G., and J. R. Roth. 1996. Evolution of coenzyme B12 synthesis among enteric bacteria: evidence for loss and reacquisition of a multigene complex. Genetics 142:11-24.
36. Lawrence, J. G., and J. R. Roth. 1996. Selfish operons: horizontal transfer may drive the evolution of gene clusters. Genetics 143:1843-1860.

37. Lawrence, J. G., and J. R. Roth. 1997. Roles of horizontal transfer in bacterial evolution. In M. Syvanen and C. Kado (ed.), Horizontal Gene Transfer. Chapman and Hall, London, United Kingdom.
38. Medigue, C., T. Rouxel, P. Vigier, A. Henaut, and A. Danchin. 1991. Evidence for horizontal gene transfer in Escherichia coli speciation. J. Mol. Biol. 222:851-856.
39. Moran, N. A. 1996. Accelerated evolution and Muller's rachet in endosymbiotic bacteria. Proc. Natl. Acad. Sci. USA 93:2873-2878.
40. Moran, N. A., M. A. Munson, P. Baumann, and H. Ishikawa. 1993. A molecular clock in endosymbiotic bacteria is calibrated using insect hosts. Proc. R. Soc. Lond. B 253:167-171.
41. Muller, H. 1932. Some genetic aspects of sex. Amer. Nat. 66:118-138.
42. Mushegian, A. R., and E. V. Koonin. 1996. A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc. Natl. Acad. Sci. USA 93:10268-10273.
43. Muto, A., and S. Osawa. 1987. The guanine and cytosine content of genomic DNA and bacterial evolution. Proc. Natl. Acad. Sci. USA 84:166-169.
44. Naas, T., M. Blot, W. M. Fitch, and W. Arber. 1994. Insertion sequence-related genetic variation in resting Escherichia coli. Genetics 136:721-730.
45. Ochman, H., and U. Bergthorsson. 1995. Genome evolution in enteric bacteria. Curr. Opin. Genet. Dev. 5:734-738.
46. Ochman, H., and E. A. Groisman. 1996. Distribution of pathogenicity islands in Salmonella spp. Infect. Immun. 64:5410-5412.
47. Ochman, H., and J. G. Lawrence. 1996. Phylogenetics and the amelioration of bacterial genomes, p. 2627-2637. In F. C. Neidhardt, R. Curtiss III, J. L. Ingraham, E. C. C. Lin, K. B. Low, B. Magasanik, W. S. Reznikoff, M. Riley, M. Schaechter, and H. E. Umbarger (ed.), Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, 2nd ed. American Society for Microbiology, Washington, D.C.
48. Ochman, H., and A. C. Wilson. 1988. Evolution in bacteria: evidence for a universal substitution rate in cellular genomes. J. Mol. Evol. 26:74-86.
49. Ohta, T. 1973. Slightly deleterious mutant substitutions in evolution. Nature 246:96-98.
50. Ohta, T. 1976. Role of very slightly deleterious mutations in molecular evolution and polymorphism. Theor. Popul. Biol. 10:254-275.
51. Pierson, L. S. D., and M. L. Kahn. 1987. Integration of satellite bacteriophage P4 in Escherichia coli. DNA sequences of the phage and host regions involved in site-specific recombination. J. Mol. Biol. 196:487-496.
52. Rambach, A. 1990. New plate medium for facilitated differentiation of Salmonella spp. from Proteus spp. and other enteric bacteria. Appl. Environ. Microbiol. 56:301-303.
53. Rayssiguier, C., D. S. Thaler, and M. Radman. 1989. The barrier to recombination between Escherichia coli and Salmonella typhimurium is disrupted in mismatch-repair mutants. Nature 342:396-401.
54. Roth, J. R., J. G. Lawrence, and T. A. Bobik. 1996. Cobalamin (coenzyme B12): synthesis and biological significance. Annu. Rev. Microbiol. 50:137-181.
55. Schmid, M. B., N. Kapur, D. R. Isaacson, P. Lindroos, and C. Sharpe. 1989. Genetic analysis of temperature-sensitive lethal mutants of Salmonella typhimurium. Genetics 123:625-633.
56. Sharp, P. M. 1991. Determinants of DNA sequence divergence between Escherichia coli and Salmonella typhimurium: codon usage, map position, and concerted evolution. J. Mol. Evol. 33:23-33.
57. Sharp, P. M., and W.-H. Li. 1987. The codon adaptation index - a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15:1281-1295.
58. Shea, J. E., M. Hensel, C. Gleeson, and D. W. Holden. 1996. Identification of a virulence locus encoding a second type III secretion system in Salmonella typhimurium. Proc. Natl. Acad. Sci. USA 93:2593-2597.
59. Sonti, R. V., D. H. Keating, and J. R. Roth. 1992. Lethal transposition of Mud phages in Rec- strains of Salmonella typhimurium. Genetics 133:17-28.
60. Stahl, F. W., and N. E. Murray. 1966. The evolution of gene clusters and genetic circularity in microorganisms. Genetics 53:569-576.
61. Stambuk, S., and M. Radman. 1998. Mechanism and control of interspecies recombination in Escherichia coli. I. Mismatch repair, methylation, recombination and replication functions. Genetics 150:533-542.
62. Sueoka, N. 1988. Directional mutation pressure and neutral molecular evolution. Proc. Natl. Acad. Sci. USA 85:2653-2657.
63. Sueoka, N. 1992. Directional mutation pressure, selective constraints, and genetic equilibria. J. Mol. Evol. 34:95-114.
64. Syvanen, M., and C. Kado. 1998. Horizontal Gene Transfer. Chapman & Hall, Ltd., London, England.
65. Thatcher, J. W., J. M. Shaw, and W. J. Dickinson. 1998. Marginal fitness contributions of nonessential genes in yeast. Proc. Natl. Acad. Sci. USA 95:253-257.
66. Thomason, B. M., J. W. Biddle, and W. B. Cherry. 1975. Detection of salmonellae in the environment. Appl. Microbiol. 30:764-767.

67. Van Valen, L. 1976. Ecological species, multispecies, and oaks. Taxon 25:223-239.
68. Vulic, M., F. Dionisio, F. Taddei, and M. Radman. 1997. Molecular keys to speciation: DNA polymorphism and the control of genetic exchange in Enterobacteria. Proc. Natl. Acad. Sci. USA 94:9763-9767.
69. Whittam, T. S., and S. Ake. 1992. Genetic polymorphisms and recombination in natural populations of Escherichia coli, p. 223-246. In N. Takahata and A. G. Clark (ed.), Mechanisms of Molecular Evolution. Japan Scientific Society Press, Tokyo, Japan.
70. Wiley, E. O. 1978. The evolutionary species concept reconsidered. Syst. Zool. 27:17-26.
71. Zahrt, T. C., and S. Maloy. 1997. Barriers to recombination between closely related bacteria: MutS and RecBCD inhibit recombination between Salmonella typhimurium and Salmonella typhi. Proc. Natl. Acad. Sci. USA 94:9786-9791.