REVIEW ARTICLE
Origins of bacterial diversity through horizontal genetic transfer and adaptation to new ecological niches
Jane Wiedenbeck & Frederick M. Cohan

Department of Biology, Wesleyan University, Middletown, CT, USA

Correspondence: Frederick M. Cohan, Department of Biology, Wesleyan University, Middletown, CT 06459-0170, USA. Tel.: +1 860 685 3482; fax: +1 860 685 3279; e-mail: [email protected]

Received 9 November 2010; accepted 15 June 2011. Final version published online July 2011.

DOI:10.1111/j.1574-6976.2011.00292.x

Editor: Fernando Baquero

Keywords: species concept; niche-transcending adaptation; ecotype; evolution; genome content; amelioration.

Abstract
Horizontal genetic transfer (HGT) has played an important role in bacterial evolution at least since the origins of the bacterial divisions, and HGT still facilitates the origins of bacterial diversity, including diversity based on antibiotic resistance. Adaptive HGT is aided by unique features of genetic exchange in bacteria, such as the promiscuity of genetic exchange and the shortness of segments transferred. Genetic exchange rates are limited by the genetic and ecological similarity of organisms. Adaptive transfer of genes is limited to those that can be transferred as a functional unit, provide a niche-transcending adaptation, and are compatible with the architecture and physiology of other organisms. Horizontally transferred adaptations may bring about fitness costs, and natural selection may ameliorate these costs. The origins of ecological diversity can be analyzed by comparing the genomes of recently divergent, ecologically distinct populations, which can be discovered as sequence clusters. Such genome comparisons demonstrate the importance of HGT in ecological diversification. Newly divergent populations cannot be discovered as sequence clusters when their ecological differences are coded by plasmids, as is often the case for antibiotic resistance; the discovery of such populations requires a screen for plasmid-coded functions.

Introduction

The evolution of bacteria is not just the evolution of animals and plants writ small. The origin of bacterial species is accelerated by unique features of bacterial genetics, perhaps the most important being the ability of bacteria to readily acquire genes from other organisms (Ochman & Davalos, 2006; Cohan & Koeppel, 2008). Fully sequenced genomes reveal that a substantial fraction of ORFs have been horizontally transferred (Nakamura et al., 2004; McDaniel et al., 2010), and many of these acquisitions are thought to have driven the origins of new bacterial species (Gogarten et al., 2002).

Early evidence of the importance of horizontal genetic transfer (HGT) in bacterial evolution was seen in the spread of penicillin resistance through plasmid transfer across the Enterobacteriaceae (Datta & Kontomichalou, 1965). Bacteriologists might have interpreted this rapid spread of resistance as evidence for the general importance of HGT in bacterial evolution, but the extent and impact of HGT were not fully appreciated until much later. With eventual access to genome sequences, it became clear that HGT occurs frequently throughout the genome, and has been responsible for the origins of extremely diverse adaptations (Ochman et al., 2000). Moreover, HGT has played a role in bacterial evolution at least since the origins of the bacterial divisions (Gogarten et al., 2002). For example, methanotrophs acquired the ability to synthesize some of the cofactors for methane utilization from methanogenic Archaea (Chistoserdova et al., 1998; Gogarten et al., 2002).

More recent HGT events have resulted in important ecological differences between closely related species and between populations within a single recognized species (Welch et al., 2002). For example, the virulence factors that distinguish Salmonella from Escherichia coli were largely acquired by HGT (Groisman & Ochman, 1996; Gogarten et al., 2002). Also, antibiotic resistance factors coded on plasmids distinguish extremely close relatives within recognized species taxa. For many bacterial pathogens, the acquisition of antibiotic resistance is likely necessary to survive in the ecological niche of agricultural animals and humans treated by antibiotics (O’Brien, 2002).

FEMS Microbiol Rev 35 (2011) 957–976. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.

Through HGT, divergent populations can share an adaptation whose value transcends their differences in physiological capabilities, cellular structures, and ecological niches. This sharing of niche-transcending genes does not generally make the two populations more similar ecologically. Instead, HGT allows the recipient to build on its unique, pre-existing adaptations to either invade a new niche or to improve its performance in its current niche (Cohan & Koeppel, 2008). For example, enterotoxigenic E. coli, which attacks the epithelial cells of the small intestine, has shared the Class 5 fimbriae by HGT with Burkholderia cepacia (Anantha et al., 2004), which can reside in the lungs of cystic fibrosis patients, attacking the respiratory epithelium. Acquiring these niche-transcending genes thus allows each lineage to better attack cells of its respective niche, but does not cause the donor and the recipient to converge ecologically (Cohan & Koeppel, 2008; Cohan, 2011). Likewise, when ecologically disparate human pathogens acquire the same antibiotic resistance factors by HGT (Fondi & Fani, 2010), their ecological niches are not converging beyond their response to natural selection by antibiotics.

Here, we will review the properties of bacteria that make HGT so effective in fostering adaptations and the origins of new ecological populations. We will review the classes of adaptations most and least likely to be delivered by HGT, and the ecological and phylogenetic limits on genetic transfer. Also, we will review the evolutionary challenges that the recipient faces in accommodating horizontally acquired adaptations. Finally, we will review evidence from genome content comparisons that demonstrates a prominent role of HGT in the origins of bacterial diversity, and how genome comparisons have led to the discovery of the ecological dimensions by which bacterial divergence occurs.

The qualities of bacterial genetic exchange that foster adaptive HGT

Exchange of genes in bacteria is both rare and promiscuous. A broad survey of recombination rates has shown that recombination, even among close relatives, occurs at a per capita per gene rate that is generally close to the rate of mutation, and rarely more than about 10 times the rate of mutation (Vos & Didelot, 2009). The rarity of recombination on a per capita basis does not prevent acquisition of adaptations from other organisms, as the enormous population sizes of many bacteria can bring unlikely recombination events within reach (Levin & Bergstrom, 2000). Moreover, recombination in bacteria is too rare to hinder the ecological diversification of closely related populations. This is because natural selection against maladaptive, niche-specifying genes (genes that are adaptive only in the context of their home population) from other populations easily keeps these foreign genes at negligible frequencies (Cohan, 1994; Cohan & Koeppel, 2008) (Box 1).

Box 1. Why recombination does not hinder the adaptive divergence of bacterial populations

Haldane long ago showed why the rare introduction of niche-specifying alleles from one population to another cannot reverse the adaptive divergence between populations: the equilibrium frequency ($p^{*}$) of a maladaptive, niche-specifying allele from another population is equal to its rate of entry into the population ($m$) divided by the selection intensity disfavoring the foreign allele ($s$), i.e. $p^{*} = m/s$ (Haldane, 1932). In the case of bacteria, the rate of entry of another population’s allele is equal to the rate of recombination between populations ($c_b$) (Cohan, 1994). A broad survey of recombination rates in the Bacteria and Archaea showed that recombination generally occurs at about the same rate as mutation and never greater than about 10 times the rate of mutation (per gene per generation) (Vos & Didelot, 2009), which is about $10^{-6}$ per gene per generation (Drake, 2009). This analysis of recombination rates took special care to estimate only recombination rates among closest relatives, and so the recombination rates may be taken as estimates of the rate of recombination within populations ($c_w$). Thus, the estimate of recombination rate at $10^{-6}$ per gene per generation may be taken as an upper limit for the rate of recombination between populations ($c_b \le c_w$). If foreign alleles are disfavored even by weak selection, for example around $s = 10^{-2}$, the equilibrium frequency of foreign alleles would be negligible, in this case around $p^{*} = c_b/s \le c_w/s$, or $10^{-6}/10^{-2} = 10^{-4}$.

Thus, even if recombination between populations is occurring at as high a rate as recombination within populations, each population will be able to hold onto its respective combinations of genes, which adapt it to its respective way of making a living. Thus, adaptive divergence in bacteria does not require sexual isolation (Cohan, 1994; Cohan & Koeppel, 2008). This argument has been ignored by workers claiming a central role for recombination and sexual isolation in bacterial divergence (Fraser et al., 2007; Sheppard et al., 2008), but the argument has never been refuted.

Nevertheless, recombination is an important force fostering adaptive evolution in bacteria. Recombination of niche-transcending genes has been shown to be an important means of introduction of adaptations into bacterial populations. The difference is that maladaptive recombination of niche-specifying genes can be rejected by natural selection, but rare, adaptive introductions of niche-transcending genes are amplified by natural selection.

Bacterial recombination is much more promiscuous than recombination in higher organisms, where it is always limited to very close relatives (usually members of the same species or very closely related species) (Mallet et al., 2007; Mallet, 2008). Bacterial recombination can extend across the bacterial divisions and even across the three domains of life (Garcia-Vallve et al., 2000; Rest & Mindell, 2003). Thus, bacterial recombination can foster the acquisition of adaptations from both close and distant relatives.

Transferred DNA is generally short, often the length of one to several genes. A short length helps to enable the success of adaptive HGT between deeply divergent bacteria, as it allows a recipient to pick up a niche-transcending gene (or set of genes) without also acquiring the niche-specifying
genes of the donor, which would be maladaptive in the recipient (Zawadzki & Cohan, 1995; Cohan & Koeppel, 2008). This is in contrast to adaptive introgression in animals and plants. Because eukaryotic meiosis involves a 50 : 50 mix of the two parents’ genomes, several generations of backcrossing are required to yield a pure transfer of niche-transcending genes across species (Rieseberg et al., 2003); in bacteria, a set of niche-transcending genes can be transferred, without other genes, in a single transfer event (Cohan, 2001).

Donor–recipient similarities required for genetic transfer

The promiscuity of bacteria and the opportunity to transfer adaptations across taxa are not absolute. Certain shared characteristics between donor and recipient are required for transfer to take place (Cohan, 2002). In some cases, genetic exchange takes place through a plasmid or a phage vector, whereby the vector incorporates genetic material from one host, the vector then infects another host, and the genetic material becomes recombined into the new host (Majewski, 2001). Clearly, vector-based genetic exchange between bacteria can take place only if the bacteria share their vectors (Cohan, 2002). The host ranges of phage and plasmids are generally narrow, but the host ranges of certain phage extend across bacterial genera (Jensen et al., 1998), and some plasmids have extremely broad host ranges that extend across bacterial divisions (Norman et al., 2009).

Sharing of vectors between hosts is challenged by bacterial restriction–modification systems. Many bacteria are equipped with a restriction endonuclease that recognizes one specific DNA sequence, usually a palindromic tetramer (e.g. GGCC), hexamer, or octamer; these bacteria are also equipped with a modification enzyme that methylates the recognition sequence and thereby protects the cell’s DNA from its own restriction enzyme (Weiserová & Ryu, 2008). Thus, bacteria that share the same restriction–modification system can more easily share a phage or a plasmid, whereas bacteria with different restriction–modification systems will effectively digest the DNA of one another’s vectors (Jeltsch, 2003; Budroni et al., 2011). Nevertheless, short plasmid and phage genomes may be shared across bacteria differing in their restriction–modification systems, as short vectors are less likely to contain a given recognition sequence and are thereby more protected from cleavage (Lacks & Springhorn, 1984). Even some larger plasmids, including the broad-host-range Inc-P1 plasmids, are able to defend against restriction endonucleases by removing restriction target sites from their sequences (Wilkins et al., 1996), whereas others simulate the host modification system, protecting their sequences from cleavage (Kruger & Bickle, 1983), thus contributing to their promiscuity.

In some taxa (e.g. Neisseria), genetic exchange can occur by transformation, through uptake of naked DNA from the environment. Most transforming bacteria are able to take up DNA indiscriminately (Lorenz & Wackernagel, 1994), but some species are selective about the DNA sequences that are allowed to cross the membrane. For example, in Neisseria gonorrhoeae, successful transformation requires a 10-bp uptake sequence that occurs abundantly in the genome of this species taxon. The requirement for this uptake sequence effectively prevents genetic transfer from divergent organisms (Hamilton & Dillard, 2006).

Once present in the cell, transferred DNA must ensure its replication so that it is not lost. For DNA that enters a recipient cell on a plasmid, replication may occur by simply remaining on the plasmid, as seen in many plasmids with antibiotic resistance factors (Fondi & Fani, 2010). Alternatively, if transferred DNA is contained within a pair of insertion sequences on a plasmid or a phage, the DNA flanked by the insertion sequences may replicate and then transfer to the recipient chromosome, without a need for homology between donor and recipient DNA (Vo et al., 2010).

Otherwise, entering DNA may integrate into the recipient chromosome through homologous recombination. This requires a ‘minimum efficient processing segment’ (MEPS) consisting of near-identical sequences of at least 25 bp at one or both ends of a donor segment (Shen & Huang, 1986; Majewski & Cohan, 1999). The requirement of near identity limits the divergence of genetic material transferred through homologous recombination, effectively allowing only close relatives (with < 20% nucleotide divergence) to exchange genes in this manner (Majewski & Cohan, 1999). The MEPS requirement has resulted in an exponential decay in the recombination rate as sequence divergence increases, as seen in Bacillus (Roberts & Cohan, 1993; Zawadzki et al., 1995; Majewski & Cohan, 1999), Escherichia (Vulić et al., 1997, 1999), and Streptococcus (Majewski et al., 2000a), although requirements of near identity may be relaxed in hypermutator strains lacking mismatch repair (Matic et al., 2000).

The requirement for near identity in homologous recombination is expected to reduce the transfer of homologous adaptations, such as penicillin resistance in Streptococcus, which occurs through a mutational change in the target protein (the so-called ‘penicillin-binding protein’) (Maynard Smith et al., 1991). Resistance alleles have been transferred within the genus through homologous recombination (Maynard Smith et al., 1991; Hoffman-Roberts et al., 2005; Barlow, 2009), and presumably, the rate of transfer across species taxa was reduced by their sequence divergence.

Sequence divergence of shared genes may also limit the transfer of heterologous genes. This is because homologous
recombination can facilitate the transfer of heterologous genes, particularly when a heterologous gene is sandwiched between homologous genes shared by the donor and the recipient [i.e. homology-facilitated illegitimate recombination (HFIR)] (de Vries & Wackernagel, 2002). Provided that a donor segment’s end sequences are homologous with recipient sequences and they pass the MEPS criterion, any genetic material sandwiched between the end segments may be cotransferred (Majewski & Cohan, 1999).

While acquisition of novel genes by HGT is possible between extremely divergent organisms, HGT occurs much more frequently between close relatives than between more distant relatives. For example, in the gammaproteobacterium Legionella pneumophila, the number of genes acquired by HGT events from a donor taxon is correlated with the phylogenetic distance (based on 16S rRNA gene divergence) between L. pneumophila and the donor (Coscollá et al., 2011). More broadly, a recent analysis of 657 sequenced prokaryotic genomes clearly showed that HGT events are much more common among close relatives than distant relatives (Popa et al., 2011) (Fig. 1).

A close phylogenetic relationship fosters HGT in at least two ways. First, close relatives will have greater sequence identity in their shared genes, thus increasing their rate of homologous recombination (Zawadzki et al., 1995; Vulić et al., 1997; Majewski & Cohan, 1998, 1999; Majewski et al., 2000b), as well as HFIR (de Vries & Wackernagel, 2002). Second, close relatives are more likely to share the same habitat than distant relatives; even members of different families but the same order are more likely to share habitats than more distant relatives (Philippot et al., 2010). Because organisms can exchange genes only when they are in the same habitat (Matte-Tailliez et al., 2002), phylogenetic closeness tends to promote genetic exchange through a sharing of environments.

What adaptations can and cannot be widely transferred?

Beyond the requirement of similarity between donor and recipient, there are limitations on the kinds of adaptations that can be transferred. The various genes must fit on a chromosomal segment short enough to be transferred or they must fit on a mobile extrachromosomal element; also, the set of genes must be functional when it arrives in a new organism (Lawrence, 1999). The selfish operon model predicts that natural selection will favor the contiguous arrangement of a functionally related set of genes as an operon, enabling the gene set to be successfully transferred as a single, functional element across taxa (Lawrence & Roth, 1996; Lawrence, 1997, 1999). In this model, natural selection acts at the level of the operon, preserving the transferability of the unit, such that an operon can spread widely across the bacteria. Indeed, certain operons have passed between organisms at a high rate (Homma et al., 2007).

An operon may expand to include more functions through HGT acquisition of additional genes (Homma et al., 2007); when this occurs, it is generally the first step in the pathway that is gained, such as the gaining of the fucP gene in the fucPIKUR operon in E. coli (for the digestion of fucose) (Pál et al., 2005). In addition, regulators of newly added genes may be gained along with the genes they control (Price et al., 2008). The co-incorporation of newly acquired protein-coding genes along with their regulators may increase the successful transferability of an enlarged operon (Price et al., 2008).

Transferable gene clusters often contain a set of genes that are all involved in processing a single resource molecule. This is exemplified by the lac operon, providing inducible catabolism of environmental lactose (Jacob & Monod, 1961). The lac operon includes protein-coding genes for a lactose-binding repressor, which shuts down the operon in the absence of lactose; a lactose permease, allowing for the uptake of lactose; and two proteins for catabolizing lactose in different directions: one that cleaves lactose to its constituent monosaccharides and another that transacetylates lactose. Another motif for a transferable cluster is an entire biochemical pathway, for example the full synthetic pathway for cytochrome c biogenesis (Goldman & Kranz, 1998).

Fig. 1. The relationship between genome sequence similarity and the proportion of species pairs that engaged in HGT. The ‘donor–recipient’ pairs represent the pairs of species that were connected by at least one HGT event, based on 32 028 HGT events that could be attributed to a recipient and a donor organism, among 657 sequenced prokaryotic genomes; ‘disconnected pairs’ represent those species pairs that did not show evidence of any HGT events. The decrease in the number of species pairs apparently engaging in HGT among closest relatives (from 50% to 100% genome sequence similarity) was attributed to the difficulty in identifying HGT events among close relatives (Popa et al., 2011) (used with permission from Cold Spring Harbor Laboratory Press). [Plot: x-axis, genome sequence similarity (%), ticks at 0.1, 1, 10, 100; y-axis, proportion of species pairs, 0.0 to 0.7; series: donor–recipient pairs and disconnected pairs.]
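The MEPS criterion invoked above (near-identical sequence of at least 25 bp at the end of a donor segment) can be given a rough quantitative feel. The following is a minimal sketch under simplifying assumptions that are ours and not the cited authors': the MEPS is treated as a run of exactly identical bases, and each aligned site is assumed to differ independently with probability equal to the nucleotide divergence. The function name `prob_identical_run` and the example segment length of 1000 bp are hypothetical choices for illustration only.

```python
def prob_identical_run(length: int, divergence: float, meps: int = 25) -> float:
    """Probability that `length` aligned sites contain at least one run of
    `meps` consecutive identical sites.

    Assumption (illustrative only): every site differs between donor and
    recipient independently with probability `divergence`.
    """
    if meps <= 0:
        return 1.0
    if length < meps:
        return 0.0
    match = 1.0 - divergence
    # state[j] = probability that no qualifying run has occurred so far and
    # the current run of identical sites has length j (0 <= j < meps).
    state = [0.0] * meps
    state[0] = 1.0
    for _ in range(length):
        new_state = [0.0] * meps
        # A mismatch resets the current run, whatever its length was.
        new_state[0] = sum(state) * divergence
        # A match extends the run by one site. Mass that would reach a run
        # of length `meps` is left out of `state` (a MEPS has been found).
        for j in range(meps - 1):
            new_state[j + 1] = state[j] * match
        state = new_state
    return 1.0 - sum(state)


# Illustrative use: how quickly a 25-bp identical run becomes unlikely
# in a hypothetical 1000-bp segment as divergence rises.
for d in (0.01, 0.05, 0.10, 0.20):
    print(f"divergence {d:.2f}: P(run >= 25 bp) = {prob_identical_run(1000, d):.4f}")
```

Under these assumptions the probability of finding a usable identical run falls steeply as divergence rises, which is qualitatively in line with the decay of recombination with sequence divergence described in the text; the sketch is not a model of the measured recombination rates.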


In both of these cases, a contiguous set of genes constitutes a transferable adaptation.

Multiple antibiotic resistance genes are frequently arranged in clusters on plasmids (Gomez-Lus, 1998; Barlow, 2009). While not strictly operons in the sense that the various contiguously arranged antimicrobial-resistance genes on a plasmid may be regulated independently (Gomez-Lus, 1998), the contiguous organization of such gene sets can be explained through an extension of the selfish operon theory. The association of different antimicrobial resistance genes may contribute to a single, selectable function. Carrying resistance for multiple antibiotics is a highly successful strategy for human pathogens likely to be targeted by multiantibiotic therapy, either simultaneously or in a cycling regimen (Lawrence, 2000; Summers, 2006).

Surveys of the genes and adaptations that have been most and least frequently transferred demonstrate some of these principles of transferability. In a survey of completely sequenced genomes, the majority of genes identified as recently transferred were classified into four functional categories: plasmid, phage, and transposon functions; cell envelope functions; regulatory functions; and cellular processes (Nakamura et al., 2004). Within the cell envelope and regulatory function categories, commonly transferred genes include those encoding cell surface structures, biosynthesis and degradation of surface polysaccharides, and DNA interactions. Frequently transferred cell surface proteins may be understood as niche-transcending adaptations, as they are often the targets of immune systems, and a newly acquired surface protein may allow immune escape (Stein et al., 2010). The most commonly transferred cellular process genes were those involved in DNA transformation, pathogenesis, toxin production, and resistance (Nakamura et al., 2004). In the case of genes coding for pathogenesis, toxin production, and resistance, we may interpret their transferability in terms of providing niche-transcending adaptations for invading new environments or for resisting new challenges appearing in an organism’s present environment, such as antibiotics.

The survey by Nakamura et al. (2004) also showed that plasmid, phage, and transposon functions comprised nearly a third of the transferred genes. Many of these transferred genes were introduced through infection by these subcellular particles and provide no adaptation for bacteria. In one special case, bacteria can acquire defense against phage through HGT of gene segments from phage that have infected them; this is the modus operandi of the clustered regularly interspaced short palindromic repeats (CRISPR) system (Marraffini & Sontheimer, 2008) (Box 2). CRISPR units are comprised of a series of short repeats separated by spacer sequences that are derived most often from bacteriophage or plasmids (Bolotin et al., 2005; Mojica et al., 2005; Pourcel et al., 2005), and provide defense against the phages or plasmids from which they are derived (Barrangou et al., 2007). The CRISPR loci are transcribed into RNA segments, which are cleaved into smaller units that each target the complementary sequence on the phage (Brouns et al., 2008) by base-pairing with either the phage’s DNA (Marraffini & Sontheimer, 2008) or RNA (Hale et al., 2009). Thus, the CRISPR system represents a form of acquired defense, where infection of a bacterium by a phage may be followed by the incorporation of phage DNA into new spacer regions of the CRISPR system, thereby providing defense against related phage in the future (Sorek et al., 2008; van der Oost et al., 2009; Horvath & Barrangou, 2010).

Box 2. Evolution of the CRISPR system through HGT

CRISPR modules are an acquired immunity defense against phage and plasmids, and are possessed by most Archaea and many Bacteria (Makarova et al., 2011). Each CRISPR module within a genome contains a number of spacer regions, each derived from the sequence of a previously infecting phage or plasmid through HGT, and the spacer regions are separated by repeat regions. Resistance to phages through the CRISPR system is very precise. If the spacer sequences are not a 100% match to the phage, the phage may retain the ability to infect the bacterium (Barrangou et al., 2007). Thus, the incorporation of spacer sequences into CRISPR loci contributes to the evolutionary arms race between the bacterium and the phage, where selective pressure imposed on the phage through the CRISPR system may drive high evolutionary rates in the phage (Sorek et al., 2008). Spacer regions in CRISPR loci are also rapidly gained and lost; the most newly acquired spacer regions are often unique to individual bacterial isolates (Pourcel et al., 2005).

Large-scale evolutionary changes have also been noted in CRISPR systems. Although sequences of repeats in CRISPR loci vary among different bacterial species, there are occurrences of unexpectedly similar repeat sequences between diverse bacteria (Makarova et al., 2002; Haft et al., 2005; Godde & Bickerton, 2006). These similarities point to the propagation of CRISPR systems by HGT. The transfer of CRISPR systems is hypothesized to be mediated by megaplasmids (plasmids > 40 kb in size), as CRISPR loci in some genomes are located on megaplasmids rather than on the bacterial chromosome (Godde & Bickerton, 2006). The HGT of CRISPR loci may have helped to establish the widespread presence of CRISPR systems in bacteria [40% of bacterial genomes contain CRISPR loci (Godde & Bickerton, 2006)], and indicates the importance of the CRISPR system for phage defense.

Genes that have been acquired by a particular lineage through HGT cannot be assumed to be adaptations, and the function of an acquired gene should be confirmed, for example by showing that it is expressed under natural conditions in a way that is consistent with its likely adaptive value (Steunou et al., 2008). Acquired genes are most likely to be nonadaptive ‘craters’ that have fallen onto the genome under a variety of circumstances, particularly if the genes were newly acquired. One possibility, which genome analyses abundantly support, is that many transferred genes have entered the genome nonadaptively with invading phage or transposons (Nakamura et al., 2004). Another
FEMS Microbiol Rev 35 (2011) 957–976. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved. J. Wiedenbeck & F.M. Cohan

possibility is that a nonadaptive, transferred gene has randomly reached a high frequency within a population or a taxon by genetic drift or by periodic selection. Finally, a gene may have been adaptive in the past, but is no longer adaptive. This could be the case for an antibiotic resistance gene that was adaptive when a particular antibiotic was applied, but in the absence of the antibiotic, the gene is no longer useful.

Among the cellular processes that are transferred the least are informational genes, involved in translation and transcription (Jain et al., 1999; Nakamura et al., 2004). Within E. coli in particular, Davids & Zhang (2008) found genes encoding informational processes to be dominated by core genes shared by all E. coli strains, with no significant contribution from HGT. The low rates of transfer of informational genes [as well as photosynthetic genes (Aris-Brosou, 2005)] can be understood most generally by Riedl's 'burden' hypothesis (Riedl, 1978). The burden hypothesis predicts (for animal evolution) that organs burdened by connections to many other organs would be slow to evolve; their burden of complex interactions with other functional units would make their evolution difficult. Aris-Brosou (2005) has argued that the property of burden (or connectivity) that makes adaptive evolution difficult for complex structures also renders their HGT unlikely to succeed. More specifically, the burden hypothesis predicts a low rate of transfer of genes coding for parts of the transcription and translation machinery, as these functions are burdened with many complex interactions; the transfer of only one part of a complex set of co-adapted structures would likely bring about an incompatibility and loss of function (Jain et al., 1999). The degree of burden, and thus the predicted resistance to the transfer of a gene, may be approximated by the number of protein–protein interactions of the gene's product (Wellner et al., 2007; Price et al., 2008). This measure of a protein's connectivity has been shown to be an important factor for the transferability of genes across all functional categories, in both ancient and recent transfers (Cohen et al., 2011) (Fig. 2).

Although uncommon, the transfer of highly connected informational genes can occur (e.g. the transfer of an rRNA operon or a gene encoding a ribosomal protein) (Gogarten et al., 2002; Lind et al., 2010). The burden hypothesis (or the 'complexity' hypothesis) predicts that for informational genes to be successfully transferred, the recipient and the donor should be closely related. As predicted, the fitness costs of transferred ribosomal genes in Salmonella in experimental manipulations were generally much higher from distant relatives than from close relatives (Lind et al., 2010). In contrast, the most phylogenetically distant transfers, such as those between the Bacteria and the Archaea domains, almost always involve metabolic genes (Kanhere & Vingron, 2009). This pattern is consistent with informational genes being incompatible with the physiology and structures of any but the most closely related organisms.

Among the most complex structures that show no evidence of HGT are features of cell architecture, which are profoundly different across the most anciently divergent bacterial taxa (Cohan & Koeppel, 2008; Cohan, 2010). The prokaryotic domains of Bacteria and Archaea differ fundamentally in the structure of their cell membranes, owing to their use of ester vs. ether linkages in their membranes' fatty acids. Some Archaean lineages build upon the ether linkage to produce a tetraether diglycerol structure yielding a lipid monolayer. This monolayer structure is of great ecological importance to hyperthermophiles, as the monolayer does not peel apart at high temperatures, in contrast to the lipid bilayer of Bacteria (Madigan et al., 2009). Thus, this profound structural difference between these domains is at least partly responsible for the ecological success of Archaea at extremely high temperatures. While this architecture-based ability to resist high temperature might be of value for some Bacterial lineages, such differences in the 'bauplans' of cells, however adaptive, are not likely to be packaged and transferred with success (Cohan & Koeppel, 2008). Because the lipid monolayer membrane interacts with many aspects of cell physiology, including the function of many trans-membrane proteins, transfer of such an adaptation is unlikely, even if it were succinctly packaged as a string of genes on a plasmid.

Fig. 2. The relationship between the number of HGT events and the number of protein–protein interactions. Each dot represents a gene family, and the number of protein–protein interactions for a given family is quantified as the number of other gene families with which the given family interacts. The negative relationship between the number of HGT events and the protein–protein interactions was quantified with a Spearman correlation coefficient of −0.422, with P < 8.18 × 10^−105 (Cohen et al., 2011) (used with permission from Oxford University Press).
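The Spearman coefficient quoted for Fig. 2 is a rank correlation: each variable is replaced by its rank, and the ordinary Pearson correlation is then taken on the ranks. The sketch below illustrates the calculation on invented toy numbers; the data, and the helper names `average_ranks` and `spearman_rho`, are assumptions made for illustration and are not taken from Cohen et al. (2011).

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical gene families: interaction partners vs. inferred HGT events.
interactions = [1, 2, 3, 5, 8, 13, 21, 34]
hgt_events = [9, 7, 8, 5, 6, 3, 2, 1]

rho = spearman_rho(interactions, hgt_events)
print(round(rho, 3))  # negative: more partners, fewer transfers in this toy set
```

Because only ranks enter the calculation, the coefficient is insensitive to the long right tail that counts of interaction partners typically show, which may be one reason a rank statistic suits data of this kind.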


Other nontransferable adaptations include architectural structures of the Firmicutes and the Planctomyces. The Firmicutes are built with an extremely strong cell wall, containing multiple layers of cross-linked peptidoglycan (Madigan et al., 2009), conferring effective and constitutive resistance to osmotic stress (Schimel et al., 2007). The gram-positive cell wall thus confers on the Firmicutes the ability to thrive in drought-prone environments, which can deliver an osmotic shock by rapidly re-wetting the cell. The Planctomyces build a stalked structure, unique in the bacterial world, which lifts the cell above the surface of sediment and brings the cell within reach of much higher levels of nutrients (Lawler & Brun, 2007). In all these cases, bauplan-level adaptations would in principle be gratefully received by other lineages for their ecological advantages, but they have not been transferred, presumably because such adaptations would integrate into the existing physiology and development of a recipient cell only with extreme difficulty, if at all (Cohan & Koeppel, 2008; Cohan, 2010; Philippot et al., 2010).

Accommodation of transferred adaptations

While a set of genes acquired through HGT may provide an important adaptation to a recipient, the transfer may also incur harmful pleiotropic effects (Cohan et al., 1994; Nogueira et al., 2009). In the case of animals and plants, large genetic changes often disrupt the organism's physiology and development (Fisher, 1958), and we should likewise expect HGT events to be disruptive in bacteria. It is easy to imagine that the expression of novel genes would disrupt existing physiology, but even acquiring an adaptive allele by homologous recombination could disrupt the smooth functioning of a cell, especially in the case of antibiotic resistance alleles. This is because many antibiotics target informational proteins, which are difficult to alter, as we have discussed. Thus, many resistance-conferring mutations in targeted information molecules bear a severe cost to the growth rate and competitiveness of a cell (Cohan et al., 1994; Andersson & Levin, 1999; Levin et al., 2000).

The deleterious side effects of a new adaptation can drive natural selection toward 'domesticating' the adaptation, that is, toward ameliorating its negative fitness effects. One mode of ameliorating the cost of a new adaptation is through compensatory evolution, whereby natural selection favors modifiers at other loci that compensate for the deleterious effects of the adaptation (Cohan et al., 1994; Andersson & Levin, 1999). These other loci are expected to code for functions that interact with the new adaptation. For example, a costly streptomycin-resistance allele of the S12 ribosomal protein locus can be ameliorated by compensatory mutations in the L19 locus, coding for another ribosomal protein (Maisnier-Patin et al., 2007).

The potential for compensatory evolution may be evaluated by observing spontaneous evolution among the descendants of a cell that has integrated an adaptation through HGT. This reveals the potential of new mutations in the genome's already-existing genes to ameliorate an adaptation's harmful effects (Andersson & Levin, 1999; Levin et al., 2000). Alternatively, one may place an adaptive gene into different genetic backgrounds within a species taxon and observe the extent to which the fitness effect of the adaptive gene varies between genetic backgrounds (Cohan et al., 1994). This provides an estimate of the potential for a species' standing genetic variation to ameliorate the adaptation. For example, placing rifampicin-resistance alleles of the rpoB locus into different strains of Bacillus subtilis yielded a large range of fitness costs across different genetic backgrounds (Cohan et al., 1994).

Amelioration can also occur through change in the adaptation itself. Newly acquired genes have higher rates of evolution than other genes in the genome (Daubin & Ochman, 2004; Hao & Golding, 2006; Marri et al., 2007; Davids & Zhang, 2008). For example, Hao & Golding (2006) found that within the Bacillaceae, recently transferred genes had longer tree lengths and higher Ka/Ks ratios than native genes, illustrating rapid evolution. As transferred genes persist in the genome, the Ka/Ks ratio decreases, suggesting adaptation of the new genes to the new environment of the host (Hao & Golding, 2006). Additionally, horizontally transferred genes are more likely to be regulated by multiple factors; this complex regulation is believed to evolve after the genes are transferred (Price et al., 2008).

Another mechanism for domesticating a horizontally acquired adaptation involves altering the amount of protein product of the transferred gene, by either gene amplification or gene repression. In full-genome studies of paralogous genes, it has been shown that genes newly acquired by the genome are more likely to undergo gene duplication (Hooper & Berg, 2003a). Although the reason for the duplication of horizontally transferred genes is not well established, it is thought that gene duplication functions to amplify the amount of gene product for genes with suboptimal levels of protein product, as genes with large amounts of protein product are rarely duplicated (Hooper & Berg, 2003b; Lind et al., 2010). Gene amplification may be especially important when HGT results in the homologous replacement of essential genes. For example, Lind et al. (2010) found that in Salmonella typhimurium, gene amplification was common following the homologous replacement of ribosomal genes from a variety of sources. The growth rates of these mutants with gene amplifications were significantly higher than those without, indicating that the gene duplication, in this case, served as amelioration following HGT.


Amelioration can also occur by way of gene repression. Horizontally acquired sequences are often initially repressed in the host genome by histone-like nucleoid-structuring proteins (H-NS) (Schechter et al., 2003; Dorman, 2004, 2009; Oshima et al., 2006). H-NS appears to have the ability to repress horizontally transferred DNA by recognizing sequences with an aberrant GC content (Dorman, 2009). In some strains, upwards of 40% of horizontally transferred genes are associated with H-NS proteins (Lucchini et al., 2006). In the case of Salmonella enterica, an Hns-like gene on a plasmid expresses H-NS-like proteins that target horizontally acquired genes only, allowing the silencing of horizontally transferred genes upon transfer (Beloin et al., 2003; Doyle et al., 2007; Baños et al., 2009).

The initial silencing of transferred genes may foster the incorporation and domestication of transferred DNA into the genome, as repression serves to limit the fitness costs of expression of the novel DNA on the host cell (Dorman, 2007; Baños et al., 2009). Dorman has argued that this initial silencing can be lessened over time as compensatory changes occur, allowing a potential benefit of a transferred gene while minimizing its disruptive effects (Dorman, 2009).

Finally, one other form of amelioration involves changes in an acquired set of genes to fit the nucleotide composition of the host genome. Organisms in different taxa above the genus level frequently are different in their frequencies of single nucleotides, dinucleotides, trinucleotides, and so on. The compositional differences between a donor segment and the recipient are diminished over time as incorporated genes are subjected to the host's pattern of mutation (Lawrence & Ochman, 1998), as well as natural selection to better integrate the acquired genes with host genome regulation (Lercher & Pal, 2008).

HGT and ecological divergence among close relatives

How did the bacterial world diversify from a single common ancestor to the hundred or more divisions of today, representing profoundly different cell architectures, metabolisms, and ecologies? While the full answer is buried in the vast expanses of time, we can approach a partial answer by investigating a much more modest process, that of speciation, in which one lineage splits into two irreversibly separate lineages that are at least subtly different in ecological requirements and capabilities. One can approach the origins of new diversity through a genetic analysis of newly divergent bacterial populations. The availability of full genome sequences from closely related bacteria has enabled microbiologists to discover the ecological differences among closely related groups as well as the genetic differences underlying the ecological divergence.

In cases where habitat differences suggest ecological differentiation between close relatives, a genome-based analysis can reveal the physiological basis for the ecological divergence and the role of HGT. For example, a recent study by Luo et al. (2011) aimed to find the physiological basis of differences between E. coli clades associated with external environments (e.g. freshwater beaches and sediments) vs. clades associated with mammals and birds, either as commensals or as pathogens. Comparisons of genome content confirmed that clades associated with external environments were indeed ecologically distinct from clades associated with endosymbiont lifestyles within mammals and birds. In contrast to commensal and pathogenic clades, the environmental clades were found consistently to contain lysozyme, for breaking down the cell walls of other bacteria in the external environment, as well as the biochemical pathway for diol utilization, apparently for the use of diol as an energy substrate. The gut-associated clades were found to have transport and metabolic capability related to various substrates that are known to be common in the gut, including N-acetylglucosamine, gluconate, and five- and six-carbon sugars. Most importantly, by virtue of genome content differences, the environmental clades are not likely to compete well in the gut and are not expected to be of concern for human health (Luo et al., 2011).

Likewise, closely related gut and dairy taxa within Lactobacillus have acquired genes that foster adaptation to their respective environments (O'Sullivan et al., 2009). The gut-adapted taxon Lactobacillus acidophilus bears the gene for maltose-6-phosphate glucosidase, for degradation of the abundant maltose in the gut; also, the gut-adapted taxon has a gene for bile salt hydrolase, which contributes to resistance to bile in the gut. Both these functions are absent in the dairy-adapted taxon Lactobacillus helveticus. The dairy-adapted taxon has a carboxypeptidase gene not found in the gut-adapted taxon, which contributes to survival in environments, such as milk, where amino acid levels are low.

Genome comparisons among other extremely closely related populations in other taxa also identify the physiological basis of ecological divergence and point to a role for HGT in the origins of ecological diversification. Genome comparisons among soil populations of Pseudomonas putida associated with polluted and unpolluted environments show the polluted populations to hold metal resistance genes not found in unpolluted sites (Wu et al., 2010). Also, closely related populations within Prochlorococcus marinus have diverged in their phosphate-acquiring adaptations: those populations adapted to marine environments with very low levels of phosphate have acquired a set of genes involved in the uptake, regulation, and utilization of organic phosphates (Martiny et al., 2009a). In studies like these, comparisons of genome contents among close relatives point to the physiological basis by which populations diverge ecologically and

implicate a role for HGT as the motive force by which this diversification has proceeded.

Genome content comparisons can also add to our knowledge about the ecological dimensions by which populations have diverged. This is seen in the case of two clades of Synechococcus in a Yellowstone hot spring: one associated with hotter, upstream environments closest to the source of the spring (the A clade, 60–65 °C) and the other associated with cooler, downstream environments (B′ clade, 55–61 °C). Comparison of the genome content of these groups showed that beyond differences in their temperature optima (Allewalt et al., 2006), the groups were also different in their abilities to take up and store mineral nutrients (Bhaya et al., 2007). The genomes suggest that the cooler, downstream clade can accommodate lower concentrations of mineral nutrients (owing to consumption by upstream populations) by being able to take up phosphonate and store nitrogen, adaptations acquired by HGT.

Metagenomic approaches have also yielded evidence for HGT in the origin of ecological diversity. A shotgun sequencing of random clones from the microbial mat of a Yellowstone hot spring indicated an adaptive divergence among extremely closely related populations of Synechococcus (Bhaya et al., 2007). Whereas the complete genome of one isolate showed no genome content dedicated to transport of the reduced ferrous ion, several closely related organisms identified from the metagenome contained the ferrous transport genes feoA and feoB. This suggested that some populations, but not others, are able to scavenge reduced iron during the nighttime when the mat becomes anoxic (Bhaya et al., 2007). Thus, niche adaptation strategies can be inferred by comparing metagenome sequences with an 'anchor' isolate whose genome has been fully sequenced.

Genome-based analyses have also shown how ecological diversity can emerge from changes other than HGT, through changes in existing genes. Petersen et al. (2007) searched the set of genes shared by six E. coli strains for genes whose amino acid divergence was accelerated by natural selection. This search identified several genes coding for cell surface proteins, and the amino acids under selection were consistently found to be in the extracellular region, indicating that diversification has resulted from the arms race between the bacteria and their enemies, such as phage and the host immune system.

Analyses of genome-wide gene expression have also helped to characterize how changes in existing genes have led to niche partitioning among close relatives. Denef et al. (2010) found that one clade of the bacterium Leptospirillum, which is typically a late colonizer of acid-mine pools, differs from a closely related, early-colonizing clade in the expression of several shared proteins that may be important in growth at low nutrient concentrations, including proteins for cobalamin synthesis and glycine cleavage. It will be fascinating to find the extent to which ecological differences among close relatives are caused by the horizontal acquisition of novel genes vs. changes in the sequences and expression patterns of shared genes.

In summary, genome-based comparisons have confirmed that closely related populations are ecologically distinct; they have identified the ecological dimensions by which the populations have diverged; and they have determined the genetic and physiological bases of ecological divergence, including the role of HGT. Approaches adopted up to now, however, have not directly addressed the origins of species, by which one lineage irreversibly splits into two lineages, as we next discuss.

HGT and the origins of bacterial species

To discover the ecological and genetic changes that occur during speciation, it is not enough to compare closely related species taxa or even close relatives within a species taxon. To investigate the dynamics of lineage splitting and the origin of irreversible separateness, we need to identify the most newly divergent, ecologically distinct populations. Focusing on these most newly divergent populations will allow us to identify the features responsible for the earliest splitting of lineages. If we instead compare more divergent lineages, we cannot distinguish the features responsible for lineage splitting from those features added on after the lineages have become established. Studies of the speciation process in bacteria have not yet attempted to identify and compare the genomes of the most recent products of speciation.

The established taxonomy of bacteria does not help identify the most newly divergent, irreversibly separate, ecologically distinct bacterial populations. The problem is that the taxa recognized as species by bacterial systematics are extremely broadly defined, with enormous levels of divergence in physiology, genome content, and most importantly, ecology (Welch et al., 2002; Whittam & Bumbaugh, 2002; Tettelin et al., 2005; Lefébure & Stanhope, 2007; Rasko et al., 2008; Touchon et al., 2009; Paul et al., 2010). For example, isolates within the species taxon E. coli show huge differences in habitat (Walk et al., 2007, 2009) and genome content (Welch et al., 2002; Touchon et al., 2009; Luo et al., 2011). While there are increasing numbers of studies comparing the genomes of close relatives within the recognized species taxa, little attention has been paid as to whether the populations being compared represent the most newly divergent, irreversibly separate, ecologically distinct populations.

One approach to identifying such populations takes into account the dynamic properties that are widely understood to characterize species, at least outside of the systematics of bacteria (de Queiroz, 2005). Foremost among these dynamic

properties is that species are cohesive, in the sense that diversity within a species is limited by a force of evolution (Templeton, 1989; Cohan & Perry, 2007). In contrast, different species are viewed to be irreversibly separate, with no cohesion holding them together. Also, species are expected to be ecologically distinct, which allows the species to coexist into the indefinite future. Finally, each species is expected to be invented only once (de Queiroz, 2005).

The forces of species cohesion in bacteria potentially include genetic exchange between populations (Papke et al., 2007; Sheppard et al., 2008; Fraser et al., 2009; Retchless & Lawrence, 2010), genetic drift (Wernegreen & Moran, 1999; Kuo et al., 2009), and periodic selection, a diversity-purging process occurring in populations with low recombination rates (Koch, 1974; Levin, 1981; Cohan, 1994). Genetic exchange has been widely considered to be a cohesive force preventing ecological divergence in obligately sexual animals and plants; in some models, permanent, adaptive divergence between animal and plant populations is impossible unless barriers to genetic exchange have evolved (Mayr, 1963; Coyne & Orr, 2004), although the necessity of barriers to genetic exchange in animal and plant speciation has been recently challenged (Dieckmann et al., 2004; Mallet, 2008).

In the case of bacteria, the rate of exchange of genes between populations, even between differently adapted populations of the same microhabitat, is extremely low. This is because the flow of genes between populations is limited by the bacterial rate of recombination, which is consistently near the mutation rate (Vos & Didelot, 2009). At such low rates of gene flow between populations, genetic exchange is not a significant force preventing ecological divergence between bacterial populations (Cohan, 1994; Cohan & Koeppel, 2008) (Box 1).

Genetic drift and periodic selection are the most likely forces of cohesion within bacterial species. Cohesion by these forces is limited to the set of ecologically similar organisms within an 'ecotype' (Kopac & Cohan, 2011). Genetic drift is most likely to be an important force of cohesion for ecotypes with low effective population sizes, for example, obligate endosymbionts that are transmitted between hosts in extremely small numbers (Wernegreen & Moran, 1999). Periodic selection can act to purge the diversity within any ecotype, regardless of its population size (Levin & Bergstrom, 2000). In a periodic selection event, an adaptive mutation or a gene acquisition by HGT within an ecotype outcompetes to extinction all other members of the ecotype; owing to the low recombination rates in bacteria, selection favoring the adaptive mutation or recombinant can bring nearly the entire genome sequence of the mutant cell to 100% frequency within the ecotype (Cohan, 2005). Thus, the diversity within an ecotype is ephemeral, lasting only until the next periodic selection event (Fig. 3).

The origin of permanent diversity requires the origin of a new, ecologically distinct population, that is, the origin of an

Fig. 3. The dynamics of ecotype formation and periodic selection within an ecotype. Circles represent different genotypes, and asterisks represent adaptive mutations. (a) Ecotype-formation event. A mutation or a recombination event allows the cell to occupy a new ecological niche, founding a new ecotype. A new ecotype can be formed only if the founding organism has undergone a fitness trade-off, whereby it cannot compete successfully with the parental ecotype in the old niche. (b) Periodic selection event. A periodic selection mutation improves the fitness of an individual such that the mutant and its descendants outcompete all other cells within the ecotype; these mutations do not affect the diversity within other ecotypes because ecological differences between ecotypes prevent direct competition. Periodic selection leads to the distinctness of ecotypes by purging the divergence within, but not between ecotypes. (Used with permission from Landes Publishers.)
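The diversity-purging effect of a periodic selection event, as drawn in Fig. 3b, can be illustrated with a deliberately simple deterministic model of asexual (haploid) selection. The sketch below is an assumption-laden cartoon rather than a model from the sources cited here: the genotype frequencies, the 10% fitness advantage, the number of generations, and the function names are all invented for illustration, and recombination and drift are ignored.

```python
def gini_simpson(freqs):
    """Probability that two randomly drawn cells differ in genotype."""
    return 1.0 - sum(p * p for p in freqs)


def select_one_generation(freqs, fitness):
    """Reweight genotype frequencies by relative fitness (haploid, asexual)."""
    mean_w = sum(p * w for p, w in zip(freqs, fitness))
    return [p * w / mean_w for p, w in zip(freqs, fitness)]


def periodic_selection(freqs, fitness, generations):
    """Apply selection repeatedly within a single ecotype."""
    for _ in range(generations):
        freqs = select_one_generation(freqs, fitness)
    return freqs


# Ecotype 1: three resident genotypes plus a rare adaptive mutant
# (hypothetical selective advantage of 10%).
ecotype1 = [0.333, 0.333, 0.333, 0.001]
fitness1 = [1.0, 1.0, 1.0, 1.1]

# Ecotype 2 occupies a different niche, so the mutant in ecotype 1 does not
# compete with it; its genotypes keep equal fitness.
ecotype2 = [0.25, 0.25, 0.25, 0.25]
fitness2 = [1.0, 1.0, 1.0, 1.0]

after1 = periodic_selection(ecotype1, fitness1, 300)
after2 = periodic_selection(ecotype2, fitness2, 300)

print("ecotype 1 diversity:", gini_simpson(ecotype1), "->", gini_simpson(after1))
print("ecotype 2 diversity:", gini_simpson(ecotype2), "->", gini_simpson(after2))
```

In this toy run the mutant lineage approaches fixation in ecotype 1, so diversity there collapses toward zero, while ecotype 2 is untouched. Because the model treats whole genotypes as the units of selection, it mirrors the low-recombination assumption under which the entire genome of the mutant cell sweeps together.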

ecotype (Cohan & Perry, 2007). A new ecotype is formed when an organism invades a new ecological niche, by virtue of a mutation or an HGT event. The diversity-purging power of an ecotype's periodic selection events cannot reach beyond the ecological similarity of ecotype members; the ecological distinctness across ecotypes will generally protect an ecotype from periodic selection stemming from another ecotype (Fig. 3). An ecotype may be most precisely understood as the domain of competitive superiority of an adaptive mutant (Cohan, 1994).

The formation of a new ecotype requires that the new niche-invading organism must be able to coexist with the parental ecotype. This requires a fitness trade-off, where the success of the niche-invading organism in its new niche comes at a cost, usually in its ability to compete with the parental ecotype in the old niche. The fitness trade-off leads to two alternatively specialized ecotypes that can coexist (Fig. 4). A different dynamic ensues when a mutant or a recombinant organism acquires the ability to use new resources at no cost in the old niche. In this case, the new organism becomes more of an ecological generalist, and will simply outcompete the members of its ecotype to extinction, and no new ecotype is created (Fig. 4).

Thus, the origin of bacterial ecotypes that can coexist can be viewed as the origin of species: a nascent ecotype is irreversibly separate from the parental ecotype from which it is derived, because the divergence of different ecotypes is not limited either by periodic selection (Fig. 3) or by recombination (Box 1); also, each ecotype is genetically cohesive, owing to the effect of genetic drift and/or periodic selection in purging the diversity among the ecologically interchangeable membership. Ecotypes thus hold the dynamic attributes of species, and the origin of species can be understood by exploring the origins of ecological divergence among newly formed ecotypes.

Fortunately, the most newly divergent ecotypes can potentially be identified even when we do not know the ecological dimensions by which they have diverged,

[Fig. 4 diagram. Columns: Event, Consequence. Panel (a), periodic selection: Ecotype 1 yields Ecotype 1. Panel (b), ecotype formation (labeled 'Loss'): Ecotype 1 yields Ecotype 1 and Ecotype 2.]

Fig. 4. The consequences of a change in ecological niche following an HGT event. Acquisition of a new ecological function by HGT (indicated by the green triangle) can lead to either a periodic selection event (a) or an ecotype formation event (b). (a) The new ecological function is added to the ecological repertoire of the recipient strain, and the resultant strain is now able to outcompete the membership of its ecotype by virtue of its greater repertoire. An example of such a niche-expanding HGT event would be the acquisition of one more antibiotic resistance factor in a human pathogen that is already resistant to a number of antibiotics; this would expand the set of clinical conditions under which the strain could succeed. (b) Gain of one function is incompatible with an existing function or reduces the performance of an existing function. Thus, acquisition of the new ecological capability comes at the expense of some pre-existing function. In this case, acquisition of the new function leads to a new ecotype, which can coexist with the pre-existing ecotype. An example of this would be the acquisition of a new symbiosis plasmid in Rhizobium, conferring on the recipient the ability to infect a new plant species; however, two symbiosis plasmids within a cell are incompatible. Thus, the plasmid transfer event would create a splitting of the ecotype into two ecotypes. Alternatively, a mutation to better utilize a new carbon source could lead to the invention of a new ecotype, provided that the mutation causes lower performance in utilizing the old carbon source. This has been seen repeatedly in experimental populations of Escherichia coli that primarily used glucose for carbon; a mutation to utilize secreted acetate created a new ecotype because the acetate-utilizing bacteria were less able to utilize glucose (Treves et al., 1998).
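The two outcomes contrasted in Fig. 4 turn on whether the newly acquired function carries a cost in an existing niche. The following sketch encodes only that decision rule; it is a cartoon under stated assumptions, not a population model. The function name `classify_hgt_outcome`, the unitless performance scores, and the numbers in the three examples are hypothetical, chosen to echo the antibiotic resistance, Rhizobium, and acetate cases described in the caption.

```python
def classify_hgt_outcome(parent, recipient):
    """Classify an HGT (or mutation) event from per-niche performance scores.

    parent, recipient: dicts mapping a niche or resource name to a
    hypothetical, unitless performance score; a missing entry counts as 0.
    """
    niches = set(parent) | set(recipient)
    gains = [n for n in niches if recipient.get(n, 0.0) > parent.get(n, 0.0)]
    losses = [n for n in niches if recipient.get(n, 0.0) < parent.get(n, 0.0)]
    if not gains:
        return "no advantage"
    if not losses:
        # Greater repertoire at no cost: the generalist sweeps its own ecotype.
        return "periodic selection"
    # A gain paid for by a loss elsewhere: a trade-off that permits coexistence.
    return "ecotype formation"


# (a) One more resistance factor added to an already resistant strain.
resistant = {"drug A": 1.0, "drug B": 1.0}
more_resistant = {"drug A": 1.0, "drug B": 1.0, "drug C": 1.0}

# (b) A new symbiosis plasmid that is incompatible with the old one.
old_host_only = {"host plant 1": 1.0}
new_host_only = {"host plant 2": 1.0}

# (b) Better acetate use at the expense of glucose use.
glucose_user = {"glucose": 1.0, "acetate": 0.2}
acetate_user = {"glucose": 0.7, "acetate": 1.0}

print(classify_hgt_outcome(resistant, more_resistant))
print(classify_hgt_outcome(old_host_only, new_host_only))
print(classify_hgt_outcome(glucose_user, acetate_user))
```

The rule is intentionally coarse: it treats any loss as a sufficient trade-off, whereas in practice coexistence would depend on the size of the cost and on how the niches are distributed.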


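[Fig. 5 diagram. Panel (a), Stable ecotype: ecotypes labeled ABC and ABD. Panel (b), Nano-niche: ecotypes labeled Abc, aBc, and abC. Panel (c), Recurrent niche invasion: lineages labeled ABC, ABD, and ABC.]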

Fig. 5. Models of bacterial speciation. Ecotypes are represented by different colors; periodic selection events are indicated by asterisks and extinct lineages are represented by dashed lines. The letters represent the resources that each group of organisms can utilize. In cases where ecotypes utilize the same set of resources, but in different proportions, the predominant resource of each ecotype is denoted by a capital letter. (a) The Stable Ecotype model. The Stable Ecotype model is marked by a much higher rate of periodic selection than ecotype formation, such that each ecotype endures many periodic selection events during its lifetime. The Stable Ecotype model generally yields a one-to-one correspondence between ecotypes and sequence clusters. The ecotypes are able to coexist indefinitely because each has a resource not shared with the other. (b) The Nano-Niche model of bacterial speciation. In the figure, there are three Nano-Niche ecotypes (denoted by Abc, aBc, and abC) that use the same set of resources, but in different proportions. Each Nano-Niche ecotype can coexist with the other two because they have partitioned their resources, at least quantitatively. However, because the ecotypes share all their resources, each is vulnerable to a possible speciation-quashing mutation that may occur in the other ecotypes. This could be a mutation that increases efficiency in the utilization of all resources. These speciation-quashing mutations are indicated by a large asterisk; each of these extinguishes the other Nano-Niche ecotypes. Thus, in the Nano-Niche model, cohesion can cut across ecologically distinct populations, provided that they are only quantitatively different in their resource utilization. (c) The Recurrent Niche Invasion model. Here, a lineage may move, frequently and recurrently, from one ecotype to another, usually by acquisition and loss of niche-determining plasmids. 
In the figure, the red lines indicate the times in which a lineage is in the plasmid-containing ecotype; the blue lines indicate the times when the lineage is in the plasmid-absent ecotype. Periodic selection events within one ecotype extinguish only the lineages of the same ecotype. For example, in the most ancient periodic selection event shown, which is in the plasmid-absent (blue) ecotype, only the lineages missing the plasmid at the time of periodic selection are extinguished, while the plasmid-containing lineages (red) persist. Ecotypes determined by a plasmid are not likely to be discoverable as sequence clusters (Cohan, 2011).

provided that a particular model of bacterial speciation is assumed (Cohan & Perry, 2007). Under the Stable Ecotype model (Fig. 5a), ecotypes are formed rarely and each ecotype endures many periodic selection events within its lifetime. This model therefore allows the most newly divergent ecotypes to accumulate a unique set of neutral sequence mutations since diverging from their most recent common ancestor; moreover, the recurrent periodic selection events ensure that each ecotype appears as a sequence cluster of very close relatives, based on a phylogeny of any gene in the genome. Different ecotypes will therefore appear to be much more distantly related than members of the same ecotype and will correspond to sequence clusters (Fig. 5a).

For decades, sequence clustering has formed the basis for discovering ecotypes. For example, a diversity of ecotypes was hypothesized within the species taxon Mycobacterium bovis based on the clustering of rapidly evolving sequences, and these clusters were confirmed to be ecologically distinct on the basis of differences in host range (Smith et al., 2006). Most closely related ecotypes have similarly been hypothesized and independently confirmed to be ecologically distinct in many other taxa, including the photosynthetic marine cyanobacteria of Prochlorococcus (Martiny et al., 2009b), on the basis of temperature and nitrogen requirements.

The demarcation of ecotypes can be aided by theory-based algorithms, which do not rely on an investigator's intuition about the phylogenetic size of a sequence cluster most likely to correspond to an ecotype. The algorithms Ecotype Simulation (Koeppel et al., 2008) and AdaptML (Hunt et al., 2008) infer ecotype demarcations using universal molecular methods, generally interpreting ecotypes as sequence clusters for any gene shared among organisms. These algorithms have inferred ecotypes from sequence data in various bacterial systems, and the ecotypes have consistently been confirmed to be ecologically distinct. For example, Ecotype Simulation has identified extremely closely related ecotypes in soil Bacillus that are ecologically distinct in their

FEMS Microbiol Rev 35 (2011) 957–976. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.

associations with solar exposure (Koeppel et al., 2008), soil texture (Connor et al., 2010), rhizospheres, elevation, and salinity (S. Kopac, unpublished data), as well as in hot spring Synechococcus that are ecologically distinct in their associations with horizontal (temperature and nutrient availability) and vertical (light level and quality) dimensions of the mat (Ward et al., 2006; Melendrez et al., 2011), and in the pathogen L. pneumophila, where ecotypes differ in their amoebic host ranges and in their gene expression patterns (Cohan et al., 2006); AdaptML has identified ecotypes of marine Vibrio that are associated with different particle sizes and with seasons (Hunt et al., 2008).

These most closely related ecotypes, whether discovered by an intuitive or an algorithmic demarcation, are the key to discovering the ecological dimensions by which lineages split and the genetic basis of ecotype formation, as well as the role of HGT in the origins of ecological diversity. Genome-based comparisons of the most closely related ecotypes could confirm that the ecotypes are ecologically different, and could reveal the ecological dimensions of ecotype formation; moreover, they could identify the role of HGT in the origins of species. If genome investigations could be directed in the future toward comparing populations identified by algorithms such as Ecotype Simulation and AdaptML to be the most closely related, ecologically distinct populations, we would make better progress toward studying the dynamics of bacterial speciation and identifying all the ecological diversity within bacterial taxa of interest (Cohan & Perry, 2007). To our knowledge, clades that have been identified as most closely related ecotypes have yet to be compared by genome-based analyses.

Genome comparisons among close relatives can also test the validity of alternative models of speciation. Of foremost importance is the need to test whether bacterial ecotypes are cohesive, and genome comparisons provide an opportunity to do this. Doolittle & Zhaxybayeva (2009) have argued that there is no fundamental reason why bacterial species should be cohesive, and they suggest that frequent HGT events may continuously cause a lineage to switch from one ecological niche to another. In the most extreme form of this argument, ecological interchangeability may extend only as far as a bacterium and its immediate offspring.

Doolittle & Zhaxybayeva (2009) have likely exaggerated the importance of HGT in fostering ecological diversity among closest relatives (Kopac & Cohan, 2011). While closest relatives are inevitably different in their rosters of genes, careful analyses of genome content differences have shown that nearly all HGT events among closest relatives involve transfers of genes not known to be involved in the ecological divergence of bacteria, such as phage-related genes, genes for transposition activity, and genes whose function is not known (the genes of so-called 'hypothetical' function) (Touchon et al., 2009; Wiedenbeck, 2011). Nevertheless, the phylogenetic groups identified as ecotypes can be cohesive only if their members are ecologically interchangeable, as genetic drift and periodic selection can purge diversity only within populations of the same ecological niche; it is therefore essential to test whether the members of an identified ecotype are ecologically interchangeable (Kopac & Cohan, 2011). Genome-based comparisons, whether focusing on genome content differences, positive selection on shared genes, or changes in gene expression, can test whether the members of a putative ecotype (identified by sequence-based algorithms) are ecologically interchangeable and can identify the phylogenetic breadth of ecological interchangeability.

We have recently tested for ecological interchangeability within one putative ecotype hypothesized by Ecotype Simulation and AdaptML, to our knowledge the only test of ecological interchangeability at this small phylogenetic scale (Wiedenbeck, 2011). This was an analysis of five strains from a putative ecotype (PE15) within B. subtilis ssp. spizizenii, including four strains isolated from a Death Valley canyon (Connor et al., 2010) and one well-characterized reference strain (Zeigler et al., 2008). The five strains were shown to be heterogeneous in their genome content, largely in genes related to phage and transposition as well as genes classified as hypothetical (Wiedenbeck, 2011). Remarkably, none of the remaining unshared genes provided any novel ecological functions; all these genes were either duplicates of genes already shared by every strain or novel genes contributing to functions already possessed by all strains. For example, all five strains had multiple genes involved in the transport and utilization of maltose, and additional genes unique to some strains were either copies of existing maltose genes or provided additional capability for metabolizing maltose. Every nonphage, nontransposing, nonhypothetical gene that was unique to a subclade of strains within our sample shared this pattern seen for maltose.

Thus, while the putative ecotype's members appear not to be ecologically interchangeable, the heterogeneity appears to be limited. We hypothesized that the divergence among genome types represents a kind of quantitative divergence, where each strain sampled has its own ecological niche, but no strain has a unique resource that protects it completely from competition (Wiedenbeck, 2011). In this case, the various members of PE15 may utilize maltose under different conditions, and each strain's unique set of maltose genes may allow that strain to utilize maltose best under its own optimal conditions.

This kind of quantitative divergence among ecotypes is different from the Stable Ecotype model, where each ecotype is more completely protected from competition from other ecotypes because each ecotype has some unique resources (Fig. 5a). In the case of the variants within PE15, each

quantitatively different ecotype may have its own periodic selection events, but it is still possible that a periodic selection event in one ecotype could drive the other ecotypes to extinction, as in the Nano-Niche model of bacterial speciation (Cohan & Perry, 2007) (Fig. 5b). Thus, while the putative ecotype analyzed here turned out to be ecologically heterogeneous, it still may be cohesive owing to the incomplete and quantitative nature of the ecological divergence among populations. This study shows the potential for gene content comparisons to study both the phylogenetic breadth of ecological interchangeability and the extent to which a demarcated, putative ecotype forms a cohesive unit.

An alternative model of bacterial speciation, the Recurrent Niche Invasion model, takes into account the role of mobile genetic elements, such as plasmids or phage, in determining bacterial niches (Cohan & Perry, 2007) (Fig. 5c). For example, in the case of Rhizobium, a bacterial lineage may acquire a symbiosis plasmid that adapts it as an endosymbiotic mutualist to one set of legume hosts; then the lineage may lose that plasmid and acquire another symbiosis plasmid, thereby adapting it to another set of legumes (Segovia et al., 1991). Also, a pathogenic lineage may acquire a multiple-resistance plasmid, which will adapt it to habitats where antibiotics are free flowing, and then lose it when living in habitats where antibiotics are rarely used. In either case, a cell can be converted from one ecotype to another by acquiring and/or losing a niche-specifying plasmid or phage, and so a lineage moves back and forth between memberships in different, previously existing ecotypes. In contrast to chromosomally based ecotypes, ecotypes that are distinguished only by the presence of different plasmids cannot be discovered as sequence clusters in genes unrelated to ecological divergence (Cohan, 2011) (Fig. 5c). We therefore cannot predict and demarcate plasmid-based ecotypes on the basis of sequence divergence in randomly chosen chromosomal genes. The ecological diversity created by plasmids can be discovered only by screening for the functions known or expected to be coded by the plasmids ahead of time. As we next discuss, such a functional approach has been successful in identifying ecological diversity in antibiotic resistance factors, which are commonly carried and transferred on plasmids (Fondi & Fani, 2010).

The antibiotic resistance of our future

What are the resistance genes of our future? To find the resistance factors most likely to enter human pathogens through HGT in the near future, we should take into account that environmental and phylogenetic proximity of two organisms contribute to their sharing of genes, as well as the complexity and compatibility of resistance factors.

One approach screens for resistance factors with environmental proximity to potential human pathogens; here, bacteria associated with humans have been surveyed for their antibiotic resistance genes. Sommer et al. (2009) recently expressed DNA cloned from the human microbiome and selected clones for resistance to various antibiotics. This approach does not require homology or sequence similarity of resistance genes to any known resistance factor. Indeed, Sommer and colleagues found a broad diversity of resistance factors that were extremely divergent from, and some not even homologous with, previously known factors (Box 3). It is not understood why these abundant resistance factors have not yet made their way into E. coli and other familiar bacteria in our guts. While this approach identifies possible resistance factors in our future, it is important to note that the proximity of these factors to possible human pathogens might not be enough to foster their transfer.

Box 3. Identification of antibiotic resistance factors in human gut bacteria

The widespread occurrence of antibiotic resistance factors among bacteria, as well as their propagation by HGT, has led to an increased interest in characterizing the possible reservoirs for resistance factors (Riesenfeld et al., 2004; D'Costa et al., 2006). As the majority of bacteria are not presently cultivable, metagenomic analyses of these possible reservoirs may supplement previous analyses of resistance factors present in bacterial isolates. Sommer et al. (2009) analyzed a possible resistance reservoir using a metagenomic analysis of human microbial communities. The authors isolated DNA from human saliva and fecal samples, and created clone libraries from these DNA isolates. The clones were then screened for resistance to a number of antibiotics, and the metagenomic inserts conferring antibiotic resistance were sequenced and annotated. Interestingly, many of the resistance inserts identified in this way did not match previously characterized antibiotic resistance genes (Sommer et al., 2009). Some of these genes encoded proteins that were identical to proteins annotated as hypothetical. This finding is especially intriguing, as it indicates that many hypothetical or poorly characterized proteins from sequenced bacterial genomes may indeed be resistance factors. Identifying these proteins will not only help to further characterize modes of antibiotic resistance in bacteria, but will also help to improve sequence annotation by identifying the functions of hypothetical proteins. The findings of the Sommer and colleagues study indicated that metagenomic analyses are a successful way to identify new resistance factors. The human microbiome, however, is only one of many possible reservoirs for antibiotic resistance. Other important sources for resistance factors include agricultural sources, water environments (Baquero et al., 2008), and soil (Riesenfeld et al., 2004). Further investigations into these reservoirs, including metagenomic analyses, will help to better characterize resistance factors likely to be medically significant.

Beyond searching for antibiotic resistance genes in bacteria associated with humans, it will be informative to screen the metagenomes of other organisms from which we frequently acquire pathogens, including agricultural animals, mice, ticks, and mosquitoes. We will increasingly be able to identify the organism sources of resistance genes discovered in this way as more bacterial genome sequences become

available. Those resistance factors that are in phylogenetic and environmental proximity to human pathogens will be of greatest concern (Matte-Tailliez et al., 2002).

As we discover increasing numbers of resistance factors, we may test them for their efficacy in human pathogens. (Or for safety purposes, we may test them in closely related, nonpathogenic model systems.) This would include quantitative measures of the resistance benefit as well as the fitness cost of resistance factors, perhaps by following the high-throughput approach of Sorek et al. (2007) for introducing thousands of genes into a strain and measuring the fitness effects of gene acquisition. It will be interesting to observe the extent to which antibiotic resistance genes tend to be specialized to different phylogenetic groups of pathogens and how easily a resistance factor can co-evolve with a new pathogen to overcome the initial incompatibility between resistance and organism.

In summary, predicting our future of antibiotic resistance will involve identifying the resistance factors that are within reach by HGT of our species' pathogens. It will also be important to use the approaches outlined here to determine which of these resistance factors are compatible with our pathogens and which can become compatible through ameliorative evolution.

The population biology approaches we have outlined may also help to identify all the bacteria whose acquisition of antibiotic resistance may harm us and our agricultural dependents. We have illustrated how the broadly defined species taxa of bacterial systematics can obscure within them multiple clades differing in their pathogenic properties; predicting the next epidemic may require going beyond the existing taxonomy, through discovering new pathogenic ecotypes within the established taxa (Cohan & Perry, 2007) and by not being distracted by ecotypes that are not pathogenic (Cohan et al., 2006; Smith et al., 2006; Luo et al., 2011). We suggest using the universal approaches described here to discover all the ecotypes within known pathogenic taxa and to characterize their ecological capabilities through genome comparisons. The discovery of potential pathogens by these universal approaches, followed by surveillance of their acquisitions of antibiotic resistance, will be an important preemptive public health strategy.

Acknowledgements

This work was funded by a NASA-funded Connecticut Space Grant to J.W. and by Wesleyan University research funds to F.M.C. We thank Jessica Sherry for alerting us to several examples of niche-specifying HGT events. We thank Fernando Baquero and Dieter Haas, as well as two anonymous reviewers, for their valuable suggestions for improving the manuscript.

References

Allewalt JP, Bateson MM, Revsbech NP, Slack K & Ward DM (2006) Effect of temperature and light on growth of and photosynthesis by Synechococcus isolates typical of those predominating in the Octopus Spring microbial mat community of Yellowstone National Park. Appl Environ Microb 72: 544–550.
Anantha RP, McVeigh AL, Lee LH, Agnew MK, Cassels FJ, Scott DA, Whittam TS & Savarino SJ (2004) Evolutionary and functional relationships of colonization factor antigen I and other class 5 adhesive fimbriae of enterotoxigenic Escherichia coli. Infect Immun 72: 7190–7201.
Andersson DI & Levin BR (1999) The biological cost of antibiotic resistance. Curr Opin Microbiol 2: 489–493.
Aris-Brosou S (2005) Determinants of adaptive evolution at the molecular level: the extended complexity hypothesis. Mol Biol Evol 22: 200–209.
Baños RC, Vivero A, Aznar S, García J, Pons M, Madrid C & Juárez A (2009) Differential regulation of horizontally acquired and core genome genes by the bacterial modulator H-NS. PLoS Genet 5: e1000513.
Baquero F, Martinez JL & Canton R (2008) Antibiotics and antibiotic resistance in water environments. Curr Opin Biotech 19: 260–265.
Barlow M (2009) What antimicrobial resistance has taught us about horizontal gene transfer. Methods Mol Biol 532: 397–411.
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA & Horvath P (2007) CRISPR provides acquired resistance against viruses in prokaryotes. Science 315: 1709–1712.
Beloin C, Deighan P, Doyle M & Dorman CJ (2003) Shigella flexneri 2a strain 2457T expresses three members of the H-NS-like protein family: characterization of the Sfh protein. Mol Genet Genomics 270: 66–77.
Bhaya D, Grossman AR, Steunou AS et al. (2007) Population level functional diversity in a microbial community revealed by comparative genomic and metagenomic analyses. ISME J 1: 703–713.
Bolotin A, Quinquis B, Sorokin A & Ehrlich SD (2005) Clustered regularly interspaced short palindrome repeats (CRISPRs) have spacers of extrachromosomal origin. Microbiology 151: 2551–2561.
Brouns SJ, Jore MM, Lundgren M et al. (2008) Small CRISPR RNAs guide antiviral defense in prokaryotes. Science 321: 960–964.
Budroni S, Siena E, Hotopp JC et al. (2011) Neisseria meningitidis is structured in clades associated with restriction modification systems that modulate homologous recombination. P Natl Acad Sci USA 108: 4494–4499.
Chistoserdova L, Vorholt JA, Thauer RK & Lidstrom ME (1998) C1 transfer enzymes and coenzymes linking methylotrophic bacteria and methanogenic Archaea. Science 281: 99–102.

Cohan FM (1994) The effects of rare but promiscuous genetic exchange on evolutionary divergence in prokaryotes. Am Naturalist 143: 965–986.
Cohan FM (2001) Bacterial species and speciation. Syst Biol 50: 513–524.
Cohan FM (2002) Sexual isolation and speciation in bacteria. Genetica 116: 359–370.
Cohan FM (2005) Periodic selection and ecological diversity in bacteria. Selective Sweep (Nurminsky D, ed), pp. 78–93. Landes Bioscience, Georgetown, TX.
Cohan FM (2010) Synthetic biology: now that we're creators, what should we create? Curr Biol 20: R675–R677.
Cohan FM (2011) Are species cohesive? – A view from bacteriology. Bacterial Population Genetics: A Tribute to Thomas S Whittam (Walk S & Feng P, eds), pp. 43–65. American Society for Microbiology Press, Washington, DC.
Cohan FM & Koeppel AF (2008) The origins of ecological diversity in prokaryotes. Curr Biol 18: R1024–R1034.
Cohan FM & Perry EB (2007) A systematics for discovering the fundamental units of bacterial diversity. Curr Biol 17: R373–R386.
Cohan FM, King EC & Zawadzki P (1994) Amelioration of the deleterious pleiotropic effects of an adaptive mutation in Bacillus subtilis. Evolution 48: 81–95.
Cohan FM, Koeppel A & Krizanc D (2006) Sequence-based discovery of ecological diversity within Legionella. Legionella: State of the Art 30 Years after Its Recognition (Cianciotto NP, Abu Kwaik Y & Edelstein PH, et al, eds), pp. 367–376. ASM Press, Washington, DC.
Cohen O, Gophna U & Pupko T (2011) The complexity hypothesis revisited: connectivity rather than function constitutes a barrier to horizontal gene transfer. Mol Biol Evol 28: 1481–1489.
Connor N, Sikorski J, Rooney AP et al. (2010) The ecology of speciation in Bacillus. Appl Environ Microb 76: 1349–1358.
Coscollá M, Comas I & González-Candelas F (2011) Quantifying nonvertical inheritance in the evolution of Legionella pneumophila. Mol Biol Evol 28: 985–1001.
Coyne JA & Orr HA (2004) Speciation. Sinauer Associates, Sunderland, MA.
Datta N & Kontomichalou P (1965) Penicillinase synthesis controlled by infectious R factors in Enterobacteriaceae. Nature 208: 239–241.
Daubin V & Ochman H (2004) Bacterial genomes as new gene homes: the genealogy of ORFans in E. coli. Genome Res 14: 1036–1042.
Davids W & Zhang Z (2008) The impact of horizontal gene transfer in shaping operons and protein interaction networks–direct evidence of preferential attachment. BMC Evol Biol 8: 23.
D'Costa VM, McGrann KM, Hughes DW & Wright GD (2006) Sampling the antibiotic resistome. Science 311: 374–377.
Denef VJ, Kalnejais LH, Mueller RS, Wilmes P, Baker BJ, Thomas BC, VerBerkmoes NC, Hettich RL & Banfield JF (2010) Proteogenomic basis for ecological divergence of closely related bacteria in natural acidophilic microbial communities. P Natl Acad Sci USA 107: 2383–2390.
de Queiroz K (2005) Ernst Mayr and the modern concept of species. P Natl Acad Sci USA 102 (suppl 1): 6600–6607.
de Vries J & Wackernagel W (2002) Integration of foreign DNA during natural transformation of Acinetobacter sp. by homology-facilitated illegitimate recombination. P Natl Acad Sci USA 99: 2094–2099.
Dieckmann U, Metz JAJ, Doebeli M & Tautz D (2004) Introduction. Adaptive Speciation (Dieckmann U, Doebeli M, Metz JAJ & Tautz D, eds), pp. 1–16. Cambridge University Press, Cambridge.
Doolittle WF & Zhaxybayeva O (2009) On the origin of prokaryotic species. Genome Res 19: 744–756.
Dorman CJ (2004) H-NS: a universal regulator for a dynamic genome. Nat Rev Microbiol 2: 391–400.
Dorman CJ (2007) H-NS, the genome sentinel. Nat Rev Microbiol 5: 157–161.
Dorman CJ (2009) Regulatory integration of horizontally-transferred genes in bacteria. Front Biosci 14: 4103–4112.
Doyle M, Fookes M, Ivens A, Mangan MW, Wain J & Dorman CJ (2007) An H-NS-like stealth protein aids horizontal DNA transmission in bacteria. Science 315: 251–252.
Drake JW (2009) Avoiding dangerous missense: thermophiles display especially low mutation rates. PLoS Genet 5: e1000520.
Fisher RA (1958) The Genetical Theory of Natural Selection. Dover, New York.
Fondi M & Fani R (2010) The horizontal flow of the plasmid resistome: clues from inter-generic similarity networks. Environ Microbiol 12: 3228–3242.
Fraser C, Hanage WP & Spratt BG (2007) Recombination and the nature of bacterial speciation. Science 315: 476–480.
Fraser C, Alm EJ, Polz MF, Spratt BG & Hanage WP (2009) The bacterial species challenge: making sense of genetic and ecological diversity. Science 323: 741–746.
Garcia-Vallvé S, Romeu A & Palau J (2000) Horizontal gene transfer of glycosyl hydrolases of the rumen fungi. Mol Biol Evol 17: 352–361.
Godde JS & Bickerton A (2006) The repetitive DNA elements called CRISPRs and their associated genes: evidence of horizontal transfer among prokaryotes. J Mol Evol 62: 718–729.
Gogarten JP, Doolittle WF & Lawrence JG (2002) Prokaryotic evolution in light of gene transfer. Mol Biol Evol 19: 2226–2238.
Goldman BS & Kranz RG (1998) Evolution and horizontal transfer of an entire biosynthetic pathway for cytochrome c biogenesis: Helicobacter, Deinococcus, Archae and more. Mol Microbiol 27: 871–873.
Gomez-Lus R (1998) Evolution of bacterial resistance to antibiotics during the last three decades. Int Microbiol 1: 279–284.
Groisman EA & Ochman H (1996) Pathogenicity islands: bacterial evolution in quantum leaps. Cell 87: 791–794.

Haft DH, Selengut J, Mongodin EF & Nelson KE (2005) A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes. PLoS Comput Biol 1: e60.
Haldane JBS (1932) The Causes of Evolution. Longmans, Green, and Co., London.
Hale CR, Zhao P, Olson S, Duff MO, Graveley BR, Wells L, Terns RM & Terns MP (2009) RNA-guided RNA cleavage by a CRISPR RNA-Cas protein complex. Cell 139: 945–956.
Hamilton HL & Dillard JP (2006) Natural transformation of Neisseria gonorrhoeae: from DNA donation to homologous recombination. Mol Microbiol 59: 376–385.
Hao W & Golding GB (2006) The fate of laterally transferred genes: life in the fast lane to adaptation or death. Genome Res 16: 636–643.
Hoffman-Roberts H, Babcock E & Mitropoulos I (2005) Investigational new drugs for the treatment of resistant pneumococcal infections. Expert Opin Inv Drug 14: 973–995.
Homma K, Fukuchi S, Nakamura Y, Gojobori T & Nishikawa K (2007) Gene cluster analysis method identifies horizontally transferred genes with high reliability and indicates that they provide the main mechanism of operon gain in 8 species of gamma-Proteobacteria. Mol Biol Evol 24: 805–813.
Hooper SD & Berg OG (2003a) Duplication is more common among laterally transferred genes than among indigenous genes. Genome Biol 4: R48.
Hooper SD & Berg OG (2003b) On the nature of gene innovation: duplication patterns in microbial genomes. Mol Biol Evol 20: 945–954.
Horvath P & Barrangou R (2010) CRISPR/Cas, the immune system of bacteria and archaea. Science 327: 167–170.
Hunt DE, David LA, Gevers D, Preheim SP, Alm EJ & Polz MF (2008) Resource partitioning and sympatric differentiation among closely related bacterioplankton. Science 320: 1081–1085.
Jacob F & Monod J (1961) Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol 3: 318–356.
Jain R, Rivera MC & Lake JA (1999) Horizontal gene transfer among genomes: the complexity hypothesis. P Natl Acad Sci USA 96: 3801–3806.
Jeltsch A (2003) Maintenance of species identity and controlling speciation of bacteria: a new function for restriction/modification systems? Gene 317: 13–16.
Jensen EC, Schrader HS, Rieland B, Thompson TL, Lee KW, Nickerson KW & Kokjohn TA (1998) Prevalence of broad-host-range lytic bacteriophages of Sphaerotilus natans, Escherichia coli, and Pseudomonas aeruginosa. Appl Environ Microb 64: 575–580.
Kanhere A & Vingron M (2009) Horizontal gene transfers in prokaryotes show differential preferences for metabolic and translational genes. BMC Evol Biol 9: 9.
Koch AL (1974) The pertinence of the periodic selection phenomenon to evolution. Genetics 77: 127–142.
Koeppel A, Perry EB, Sikorski J et al. (2008) Identifying the fundamental units of bacterial diversity: a paradigm shift to incorporate ecology into bacterial systematics. P Natl Acad Sci USA 105: 2504–2509.
Kopac S & Cohan FM (2011) A theory-based pragmatism for discovering and classifying newly divergent bacterial species. Genetics and Evolution of Infectious Diseases (Tibayrenc M, ed), pp. 21–41. Elsevier, London.
Kruger DH & Bickle TA (1983) Bacteriophage survival: multiple mechanisms for avoiding the deoxyribonucleic acid restriction systems of their hosts. Microbiol Rev 47: 345–360.
Kuo CH, Moran NA & Ochman H (2009) The consequences of genetic drift for bacterial genome complexity. Genome Res 19: 1450–1454.
Lacks SA & Springhorn SS (1984) Transfer of recombinant plasmids containing the gene for DpnII DNA methylase into strains of Streptococcus pneumoniae that produce DpnI or DpnII restriction endonucleases. J Bacteriol 158: 905–909.
Lawler ML & Brun YV (2007) Advantages and mechanisms of polarity and cell shape determination in Caulobacter crescentus. Curr Opin Microbiol 10: 630–637.
Lawrence J (1999) Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes. Curr Opin Genet Dev 9: 642–648.
Lawrence J (2000) Clustering of antibiotic resistance genes: Beyond the selfish operon. ASM News 66: 281–286.
Lawrence JG (1997) Selfish operons and speciation by gene transfer. Trends Microbiol 5: 355–359.
Lawrence JG & Ochman H (1998) Molecular archaeology of the Escherichia coli genome. P Natl Acad Sci USA 95: 9413–9417.
Lawrence JG & Roth JR (1996) Selfish operons: horizontal transfer may drive the evolution of gene clusters. Genetics 143: 1843–1860.
Lefébure T & Stanhope MJ (2007) Evolution of the core and pan-genome of Streptococcus: positive selection, recombination, and genome composition. Genome Biol 8: R71.
Lercher MJ & Pal C (2008) Integration of horizontally transferred genes into regulatory interaction networks takes many million years. Mol Biol Evol 25: 559–567.
Levin BR (1981) Periodic selection, infectious gene exchange and the genetic structure of E. coli populations. Genetics 99: 1–23.
Levin BR & Bergstrom CT (2000) Bacteria are different: observations, interpretations, speculations, and opinions about the mechanisms of adaptive evolution in prokaryotes. P Natl Acad Sci USA 97: 6981–6985.
Levin BR, Perrot V & Walker N (2000) Compensatory mutations, antibiotic resistance and the population genetics of adaptive evolution in bacteria. Genetics 154: 985–997.
Lind PA, Tobin C, Berg OG, Kurland CG & Andersson DI (2010) Compensatory gene amplification restores fitness after inter-species gene replacements. Mol Microbiol 75: 1078–1089.
Lorenz MG & Wackernagel W (1994) Bacterial gene transfer by natural genetic transformation in the environment. Microbiol Rev 58: 563–602.
Lucchini S, Rowley G, Goldberg MD, Hurd D, Harrison M & Hinton JC (2006) H-NS mediates the silencing of laterally acquired genes in bacteria. PLoS Pathog 2: e81.

Luo C, Walk ST, Gordon DM, Feldgarden M, Tiedje JM & Konstantinidis KT (2011) Genome sequencing of environmental Escherichia coli expands understanding of the ecology and speciation of the model bacterial species. P Natl Acad Sci USA 108: 7200–7205.
Madigan MT, Martinko JM, Dunlap PV & Clark DP (2009) Brock Biology of Microorganisms. 12th edn. Pearson Benjamin Cummings, San Francisco.
Maisnier-Patin S, Paulander W, Pennhag A & Andersson DI (2007) Compensatory evolution reveals functional interactions between ribosomal proteins S12, L14 and L19. J Mol Biol 366: 207–215.
Majewski J (2001) Sexual isolation in bacteria. FEMS Microbiol Lett 199: 161–169.
Majewski J & Cohan FM (1998) The effect of mismatch repair and heteroduplex formation on sexual isolation in Bacillus. Genetics 148: 13–18.
Majewski J & Cohan FM (1999) DNA sequence similarity requirements for interspecific recombination in Bacillus. Genetics 153: 1525–1533.
Majewski J, Zawadzki P, Pickerill P, Cohan FM & Dowson CG (2000a) Barriers to genetic exchange between bacterial species: Streptococcus pneumoniae transformation. J Bacteriol 182: 1016–1023.
Majewski J, Zawadzki P, Pickerill P, Cohan FM & Dowson CG (2000b) Barriers to genetic exchange between bacterial species: Streptococcus pneumoniae transformation. J Bacteriol 182: 1016–1023.
Makarova KS, Aravind L, Grishin NV, Rogozin IB & Koonin EV (2002) A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 30: 482–496.
Makarova KS, Haft DH, Barrangou R et al. (2011) Evolution and classification of the CRISPR-Cas systems. Nat Rev Microbiol 9: 467–477.
Mallet J (2008) Hybridization, ecological races and the nature of species: empirical evidence for the ease of speciation. Philos T Roy Soc B 363: 2971–2986.
Mallet J, Beltran M, Neukirchen W & Linares M (2007) Natural hybridization in heliconiine butterflies: the species boundary as a continuum. BMC Evol Biol 7: 28.
Marraffini LA & Sontheimer EJ (2008) CRISPR interference limits horizontal gene transfer in staphylococci by targeting DNA. Science 322: 1843–1845.
Marri PR, Hao W & Golding GB (2007) The role of laterally transferred genes in adaptive evolution. BMC Evol Biol 7 (suppl 1): S8.
Martiny AC, Huang Y & Li W (2009a) Occurrence of phosphate acquisition genes in Prochlorococcus cells from different ocean regions. Environ Microbiol 11: 1340–1347.
Martiny AC, Tai AP, Veneziano D, Primeau F & Chisholm SW (2009b) Taxonomic resolution, ecotypes and the biogeography of Prochlorococcus. Environ Microbiol 11: 823–832.
Matic I, Taddei F & Radman M (2000) No genetic barriers between Salmonella enterica serovar Typhimurium and Escherichia coli in SOS-induced mismatch repair-deficient cells. J Bacteriol 182: 5922–5924.
Matte-Tailliez O, Brochier C, Forterre P & Philippe H (2002) Archaeal phylogeny based on ribosomal proteins. Mol Biol Evol 19: 631–639.
Maynard Smith JM, Dowson CG & Spratt BG (1991) Localized sex in bacteria. Nature 349: 29–31.
Mayr E (1963) Animal Species and Evolution. Belknap Press of Harvard University Press, Cambridge.
McDaniel LD, Young E, Delaney J, Ruhnau F, Ritchie KB & Paul JH (2010) High frequency of horizontal gene transfer in the oceans. Science 330: 50.
Melendrez MC, Lange RK, Cohan FM & Ward DM (2011) Influence of molecular resolution on sequence-based discovery of ecological diversity among Synechococcus populations in an alkaline siliceous hot spring microbial mat. Appl Environ Microb 77: 1359–1367.
Mojica FJ, Díez-Villaseñor C, García-Martínez J & Soria E (2005) Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 60: 174–182.
Nakamura Y, Itoh T, Matsuda H & Gojobori T (2004) Biased biological functions of horizontally transferred genes in prokaryotic genomes. Nat Genet 36: 760–766.
Nogueira T, Rankin DJ, Touchon M, Taddei F, Brown SP & Rocha EP (2009) Horizontal gene transfer of the secretome drives the evolution of bacterial cooperation and virulence. Curr Biol 19: 1683–1691.
Norman A, Hansen LH & Sorensen SJ (2009) Conjugative plasmids: vessels of the communal gene pool. Philos T Roy Soc B 364: 2275–2289.
O’Brien TF (2002) Emergence, spread, and environmental effect of antimicrobial resistance: how use of an antimicrobial anywhere can increase resistance to any antimicrobial anywhere else. Clin Infect Dis 34 (suppl 3): S78–S84.
Ochman H & Davalos LM (2006) The nature and dynamics of bacterial genomes. Science 311: 1730–1733.
Ochman H, Lawrence JG & Groisman EA (2000) Lateral gene transfer and the nature of bacterial innovation. Nature 405: 299–304.
Oshima T, Ishikawa S, Kurokawa K, Aiba H & Ogasawara N (2006) Escherichia coli histone-like protein H-NS preferentially binds to horizontally acquired DNA in association with RNA polymerase. DNA Res 13: 141–153.
O’Sullivan O, O’Callaghan J, Sangrador-Vegas A et al. (2009) Comparative genomics of lactic acid bacteria reveals a niche-specific gene set. BMC Microbiol 9: 50.
Pál C, Papp B & Lercher MJ (2005) Adaptive evolution of bacterial metabolic networks by horizontal gene transfer. Nat Genet 37: 1372–1375.
Papke RT, Zhaxybayeva O, Feil EJ, Sommerfeld K, Muise D & Doolittle WF (2007) Searching for species in haloarchaea. P Natl Acad Sci USA 104: 14092–14097.
Paul S, Dutta A, Bag SK, Das S & Dutta C (2010) Distinct, ecotype-specific genome and proteome signatures in the marine cyanobacteria Prochlorococcus. BMC Genomics 11: 103.


Petersen L, Bollback JP, Dimmic M, Hubisz M & Nielsen R (2007) Genes under positive selection in Escherichia coli. Genome Res 17: 1336–1343.
Philippot L, Andersson SG, Battin TJ, Prosser JI, Schimel JP, Whitman WB & Hallin S (2010) The ecological coherence of high bacterial taxonomic ranks. Nat Rev Microbiol 8: 523–529.
Popa O, Hazkani-Covo E, Landan G, Martin W & Dagan T (2011) Directed networks reveal genomic barriers and DNA repair bypasses to lateral gene transfer among prokaryotes. Genome Res 21: 599–609.
Pourcel C, Salvignol G & Vergnaud G (2005) CRISPR elements in Yersinia pestis acquire new repeats by preferential uptake of bacteriophage DNA, and provide additional tools for evolutionary studies. Microbiology 151: 653–663.
Price MN, Dehal PS & Arkin AP (2008) Horizontal gene transfer and the evolution of transcriptional regulation in Escherichia coli. Genome Biol 9: R4.
Rasko DA, Rosovitz MJ, Myers GS et al. (2008) The pangenome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates. J Bacteriol 190: 6881–6893.
Rest JS & Mindell DP (2003) Retroids in archaea: phylogeny and lateral origins. Mol Biol Evol 20: 1134–1142.
Retchless AC & Lawrence JG (2010) Phylogenetic incongruence arising from fragmented speciation in enteric bacteria. P Natl Acad Sci USA 107: 11453–11458.
Riedl R (1978) Order in Living Systems. Wiley, Chichester.
Rieseberg LH, Raymond O, Rosenthal DM et al. (2003) Major ecological transitions in wild sunflowers facilitated by hybridization. Science 301: 1211–1216.
Riesenfeld CS, Goodman RM & Handelsman J (2004) Uncultured soil bacteria are a reservoir of new antibiotic resistance genes. Environ Microbiol 6: 981–989.
Roberts MS & Cohan FM (1993) The effect of DNA sequence divergence on sexual isolation in Bacillus. Genetics 134: 401–408.
Schechter LM, Jain S, Akbar S & Lee CA (2003) The small nucleoid-binding proteins H-NS, HU, and Fis affect hilA expression in Salmonella enterica serovar Typhimurium. Infect Immun 71: 5432–5435.
Schimel J, Balser TC & Wallenstein M (2007) Microbial stress-response physiology and its implications for ecosystem function. Ecology 88: 1386–1394.
Segovia L, Piñero D, Palacios R & Martínez-Romero E (1991) Genetic structure of a soil population of nonsymbiotic Rhizobium leguminosarum. Appl Environ Microb 57: 426–433.
Shen P & Huang HV (1986) Homologous recombination in Escherichia coli: dependence on substrate length and homology. Genetics 112: 441–457.
Sheppard SK, McCarthy ND, Falush D & Maiden MC (2008) Convergence of Campylobacter species: implications for bacterial evolution. Science 320: 237–239.
Smith NH, Kremer K, Inwald J, Dale J, Driscoll JR, Gordon SV, van Soolingen D, Hewinson RG & Smith JM (2006) Ecotypes of the Mycobacterium tuberculosis complex. J Theor Biol 239: 220–225.
Sommer MO, Dantas G & Church GM (2009) Functional characterization of the antibiotic resistance reservoir in the human microflora. Science 325: 1128–1131.
Sorek R, Zhu Y, Creevey CJ, Francino MP, Bork P & Rubin EM (2007) Genome-wide experimental determination of barriers to horizontal gene transfer. Science 318: 1449–1452.
Sorek R, Kunin V & Hugenholtz P (2008) CRISPR – a widespread system that provides acquired resistance against phages in bacteria and archaea. Nat Rev Microbiol 6: 181–186.
Stein DC, Patrone JB & Bish S (2010) Innate immune recognition of Neisseria meningitidis and Neisseria gonorrhoeae. Neisseria: Molecular Mechanisms of Pathogenesis (Genco C & Wetzler L, eds), pp. 95–122. Caister Academic, Norwich.
Steunou AS, Jensen SI, Brecht E et al. (2008) Regulation of nif gene expression and the energetics of N2 fixation over the diel cycle in a hot spring microbial mat. ISME J 2: 364–378.
Summers AO (2006) Genetic linkage and horizontal gene transfer, the roots of the antibiotic multi-resistance problem. Anim Biotechnol 17: 125–135.
Templeton A (1989) The meaning of species and speciation: a genetic perspective. Speciation and its Consequences (Otte D & Endler J, eds), pp. 3–27. Sinauer Associates, Sunderland, MA.
Tettelin H, Masignani V, Cieslewicz MJ et al. (2005) Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial ‘pan-genome’. P Natl Acad Sci USA 102: 13950–13955.
Touchon M, Hoede C, Tenaillon O et al. (2009) Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet 5: e1000344.
Treves DS, Manning S & Adams J (1998) Repeated evolution of an acetate-crossfeeding polymorphism in long-term populations of Escherichia coli. Mol Biol Evol 15: 789–797.
van der Oost J, Jore MM, Westra ER, Lundgren M & Brouns SJ (2009) CRISPR-based adaptive and heritable immunity in prokaryotes. Trends Biochem Sci 34: 401–407.
Vo AT, van Duijkeren E, Gaastra W & Fluit AC (2010) Antimicrobial resistance, class 1 integrons, and genomic island 1 in Salmonella isolates from Vietnam. PLoS One 5: e9440.
Vos M & Didelot X (2009) A comparison of homologous recombination rates in bacteria and archaea. ISME J 3: 199–208.
Vulić M, Dionisio F, Taddei F & Radman M (1997) Molecular keys to speciation: DNA polymorphism and the control of genetic exchange in enterobacteria. P Natl Acad Sci USA 94: 9763–9767.
Vulić M, Lenski RE & Radman M (1999) Mutation, recombination, and incipient speciation of bacteria in the laboratory. P Natl Acad Sci USA 96: 7348–7351.
Walk ST, Alm EW, Calhoun LM, Mladonicky JM & Whittam TS (2007) Genetic diversity and population structure of Escherichia coli isolated from freshwater beaches. Environ Microbiol 9: 2274–2288.
Walk ST, Alm EW, Gordon DM, Ram JL, Toranzos GA, Tiedje JM & Whittam TS (2009) Cryptic lineages of the genus Escherichia. Appl Environ Microb 75: 6534–6544.


Ward DM, Bateson MM, Ferris MJ, Kühl M, Wieland A, Koeppel A & Cohan FM (2006) Cyanobacterial ecotypes in the microbial mat community of Mushroom Spring (Yellowstone National Park, Wyoming) as species-like units linking microbial community composition, structure and function. Philos T Roy Soc B 361: 1997–2008.
Weiserová M & Ryu J (2008) Characterization of a restriction modification system from the commensal Escherichia coli strain A0 34/86 (O83:K24:H31). BMC Microbiol 8: 106.
Welch RA, Burland V, Plunkett G III et al. (2002) Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. P Natl Acad Sci USA 99: 17020–17024.
Wellner A, Lurie MN & Gophna U (2007) Complexity, connectivity, and duplicability as barriers to lateral gene transfer. Genome Biol 8: R156.
Wernegreen JJ & Moran NA (1999) Evidence for genetic drift in endosymbionts (Buchnera): analyses of protein-coding genes. Mol Biol Evol 16: 83–97.
Whittam TS & Bumbaugh AC (2002) Inferences from whole-genome sequences of bacterial pathogens. Curr Opin Genet Dev 12: 719–725.
Wiedenbeck JK (2011) Genomic and ecological heterogeneity among extremely close relatives in Bacillus. MSc Thesis, Wesleyan University, Middletown, CT.
Wilkins BM, Chilley PM, Thomas AT & Pocklington MJ (1996) Distribution of restriction enzyme recognition sequences on broad host range plasmid RP4: molecular and evolutionary implications. J Mol Biol 258: 447–456.
Wu X, Monchy S, Taghavi S, Zhu W, Ramos J & van der Lelie D (2010) Comparative genomics and functional analysis of niche-specific adaptation in Pseudomonas putida. FEMS Microbiol Rev 35: 299–323.
Zawadzki P & Cohan FM (1995) The size and continuity of DNA segments integrated in Bacillus transformation. Genetics 141: 1231–1243.
Zawadzki P, Roberts MS & Cohan FM (1995) The log-linear relationship between sexual isolation and sequence divergence in Bacillus transformation is robust. Genetics 140: 917–932.
Zeigler DR, Pragai Z, Rodriguez S, Chevreux B, Muffler A, Albert T, Bai R, Wyss M & Perkins JB (2008) The origins of 168, W23, and other Bacillus subtilis legacy strains. J Bacteriol 190: 6983–6995.
