<<

Downloaded from http://cshperspectives.cshlp.org/ on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

Realizing Microbial Evolution

Howard Ochman

Department of Integrative Biology, University of Texas, Austin, Texas 78712 Correspondence: [email protected]

Genome sequences have become the new phenotype for microbial evolutionists. The pat- terns of diversity revealed in the first 100 bacterial fostered development of a comprehensive framework that can explain their contents, organization, and evolution.

he study of microbes has been at the fore- tion of the several mitochondrial and viral ge- Tfront of research for some time but has nomes over the previous decade. The era of ge- changed considerably over the past century. nomics, particularly bacterial genomics, is What were originally witnessed as agents of commonly viewed to have begun in 1995 with disease became heralded as models for all life the publication of the sequence of Hae- forms with the advent of molecular biology. mophilus influenzae (Fleishmann et al. 1995) And this, in turn, led to two of the most notable followed by the genome accomplishments in microbial evolution: an sequence 3 months later (Fig. 1A) (Fraser et al. understanding of both their variation and their 1995). relationships at the deepest and shallowest tax- The field of bacterial genomics was actually onomic depths. These two areas expanded— well underway before the appearance of the first motivated largely by polymerase chain reaction genome sequences. Before 1995, it was already (PCR)—into complementary fields: popula- known (1) that bacterial chromosomes are, tion , which analyzes the source and with few exceptions, circular, possessing a single appointment of genetic variation within bacte- replication origin; (2) that most bacterial ge- rial species, and phylogenetics, which arranges nomes comprise a single chromosome but , even those noncultivable, into a that smaller extrachromosomal elements in the molecular tree of life. form of and phage are common; (3) It is undeniable that a revolution in the that genome sizes range from 500 to 10,000 kb; study of has been fostered by genomics (4) that their chromosomes are tightly packed (i.e., the sequencing and analysis of entire ge- with genes that average 1 kb in length, that nomes). Original interest in sequencing bacte- their genes contain no , are assorted rial genomes was probably more technical than onto both strands, and are arranged in ; biological—bacteria had small, gene-dense, (5) that genomic base composition varies wide- single-chromosome genomes, which made as- ly among bacterial species (from 25% to 75% sembly more tractable, and their compact size GþC), but that base composition is relative- was viewed as the logical next step after elucida- ly homogeneous over the entire chromosome

Editor: Howard Ochman Additional Perspectives on Microbial Evolution available at www.cshperspectives.org Copyright # 2016 Cold Spring Harbor Laboratory Press; all rights reserved. Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a018101

1 Downloaded from http://cshperspectives.cshlp.org/ on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

H. Ochman

A 1995 B 1996 10,000 10,000

8000 8000

6000 6000

4000

Gene count 4000 Gene count

2000 2000 Mycoplasma pneumoniae (816 kb; 736 ORFs) Mycoplasma genitalium Mycoplasma genitalium (580 kb; 564 ORFs) 246810 12 14 246810 12 14 (Mb) Genome size (Mb) C 1997 D 1998 10,000 10,000

8000 8000

6000 6000 Escherichia coli

4000 4000 Gene count Gene count (Mostly) free-living bacteria

2000 2000 Borrelia burgdorfei (Mostly) 246810 12 14 246810 12 14 Genome size (Mb) Genome size (Mb)

Figure 1. Genome size and gene count in bacterial genomes sequenced from 1995 to 1998. Red dots indicate genomes that were published in the designated year and smaller gray dots represent genomes published in all prior years. Panels A–D show results for the year indicated. ORF, Open reading frame.

within a given species; (6) that gene order is yielding a sufficient amount of start- conserved among related species; and (7) that ing material. Second was that the complete ge- the rates and patterns of can vary nome sequence not only defines the complete within a gene, and according to chromosomal gene inventory but also reveals those genes that location and transcriptional status of a gene. are not present. Knowledge of the entire genome So, what was learned from the first se- sequence allows no room to hypothesize activ- quenced bacterial genome? Their multipage ities specified by unknown genes. And, finally, foldout figure rendered the size, location, orien- my personal delight was that this genome set the tation, and putative function of each of the 1727 standard for genome quality. A published bac- genes in the 1.8-Mb genome, the culmination terial genome needed to be a closed circle, with of spending a reported 13 months and one mil- every gap closed and nucleotide confirmed. lion dollars (although it is difficult to see how Even with only two sequenced bacterial ge- this amount could cover the expense of reagents nomes, there was a consistent relationship be- and the 40 coinvestigators for a year). Naturally, tween genome size and total gene number, and the genome contained and corroborated many the two genomes that appeared in 1996 closely of the features listed above, but there were also followed this trend (Fig. 1B). But more impor- two inimitable benefits. First and foremost was tantly, the slate of four genomes now included the credence it gave to the whole-genome shot- Mycoplasma pneumoniae, and direct compari- gun (WGS) approach to sequencing. This meth- sons with the already-sequenced M. genitalium odology eliminated the need for genetic maps genome indicated that members of the same or manipulation, and could be applied to any bacterial genus can differ by more than 30% in

2 Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a018101 Downloaded from http://cshperspectives.cshlp.org/ on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

Realizing Microbial Evolution

genome size and contents (Himmelreich et al. (Deckert et al. 1998) and Mycobacte- 1996). rium tuberculosis, a genome size of more than The genome size difference between these 4.4 Mb (Cole et al. 1998). congeners was attributed to the horizontal Moving forward a few years, we see that by acquisition of numerous genes by M. pneumo- 2000, after the resolution of 30 bacterial ge- niae, as opposed to the loss of genes by nomes, there remained a remarkable relation- M. genitalium. Typically, conclusions about ship between genome size and gene number gene gain and gene loss are based on the pres- across eight phyla and a .10-fold range in ge- ence or absence of genes in a common ancestor nome size (Fig. 2A). But this tenet of bacterial of the focal species, but no such genome was genome contents was undermined by elucida- available. But by exhuming those long-estab- tion of the genome (Fig. lished principles of bacterial genome organiza- 2B) (Cole et al. 2001). M. leprae was replete with tion (hint, point 5 above), it is possible to infer such that its 3.3-Mb genome con- the ancestry of a sequence without any compar- tained only 1600 functional genes, the majority isons whatsoever. Because base composition of which are also present in M. tuberculosis. varies widely among bacterial species but is ho- Discovering that M. leprae harbored large mogeneous within a species, genes possessing numbers of pseudogenes raised questions about atypical features (such as anomalously high or why other bacterial genomes lacked similarly low base compositions) are most plausibly large numbers of inactivated and nonfunctional gained from outside sources. This makes it pos- regions. Organisms are continually sieged by sible to scan bacterial genomes to assess the ex- mutations and pseudogenes are continually be- tent of laterally acquired genes, and it was sub- ing generated. So, why is there the same mono- sequently shown that most bacterial genomes tonic relationship between size and gene num- were subject to high amounts of gene transfer ber across genomes large and small? and that many strain- or species-specific pheno- First, it is necessary to understand that types were conferred by unique genomic seg- M. leprae pseudogenes were discovered using ments of atypical base composition (Ochman a comparative approach—by gene-by-gene et al. 2000). alignments with orthologs in closely related In 1997, the genomes of the well-studied M. tuberculosis. The increasing availability of se- model systems, Escherichia coli and Bacillus sub- quenced genomes offered several such opportu- tilis, were published (Blattner et al. 1997; Kunst nities for comparisons, and it became clear that et al. 1997), and at more than 4 Mb in length, many—possibly all—genomes contained pseu- they constituted the largest bacterial genomes dogenes, identified as truncated versions of yet sequenced (with the B. subtilis publication genes in related genomes. Most interesting, how- surpassing the 100-author mark) (Fig. 1C). ever, was that the genomes with the largest num- Also that year, were the genomes of Helicobacter bers of pseudogenes were, like M. leprae, rela- pylori (Tomb et al. 1997) and Borrelia burgdor- tively recent pathogens of humans: for example, feri, whose relatively high gene number reflects Shigella flexneri, an agent of dysentery that the inclusion of genes encoded on its 11 extra- is descended from E. coli, and Yersinia pestis, chromosomal elements (Fraser et al. 1997). By which causes plague, each possessing more the time there were a dozen bacterial genome than 300 pseudogenes when compared with sequences (Fig. 1D), it was apparent that there their closest sequenced relatives (Lerat and Och- was an association between genome size and man 2004, 2005). bacterial lifestyle. Those species with smaller But the reason that pseudogenes do not ac- genomes tended to be pathogens, whereas those cumulate in genomes is that the mutational pro- with larger genomes were generally free-living cess in bacteria is biased toward deletions of all species. There were two notable exceptions: the sizes, such that superfluous sequences are elim- hot-spring bacterium, Aquifex aeolicus, had a inated from the genomes (Mira et al. 2001). This genome size of only 1.6 Mb and the human deletion bias is apparent at all scales—from

Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a018101 3 Downloaded from http://cshperspectives.cshlp.org/ on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

H. Ochman

A 2000 B 2001 10,000 10,000

8000 8000

6000 6000

4000 4000 Gene count Gene count

Eight bacterial phyla represented Mycobacterium leprae 2000 2000

Intracellular pathogens and obligate

246810 12 14 246810 12 14 Genome size (Mb) Genome size (Mb) C 2004 D 2005 10,000 10,000

8000 8000

6000 6000

4000 4000 Gene count Gene count

2000 2000

Intracellularn pathogensg andd obliobligateg endosymbionts CorsCorsonellasonella rruddiiuddiu i (160 kb; 1182 ORFs) 246810 12 14 246810 12 14 Genome size (Mb) Genome size (Mb)

Figure 2. Genome size and gene count in bacterial genomes sequenced from 2001 to 2005. Red dots indicate genomes that were published in the designated year and smaller gray dots represent genomes published in all prior years. Panels A–D show results for the year indicated. ORF, Open reading frame.

continuously cultured populations that mani- mostly functional genes (Ochman and Moran fest deletions spanning 5% of their genome 2001). (Nilsson et al. 2005) to the broad phylogenetic Which brings us to the topic of tiny genomes level in which lineages of bacteria with small and the minimal gene set required for life. In the genomes derive from large-genomed ancestors decades before full genome sequencing, the over evolutionary time scales (Ochman 2005). minimal set of genes that would support a cel- Synthesizing information about the size and lular life form was reasoned to number between contents of sequenced genomes has led to a 500 and 600. This figure was arrived at by re- comprehensive understanding of the principles covering genome sizes, using microscopic and that underlie the progression toward compact DNA renaturation procedures, in the 400-MDa genomes. Large-genomed, free-living bacteria (600-kb) range for the small-celled Mycoplas- move into a host or any environment that serves ma and estimating gene lengths as averaging as a ready source of previously scarce or unavail- 1200 bp, based on the mass of a typical protein. able nutrients. The nutrient-rich environment Despite the vagaries of these calculations, renders many genes superfluous, which are in- the numbers were remarkably correct. And, activated by mutations causing pseudogenes to when the complete genome sequence of M. accumulate. Finally,the pervasive deletional bias genitalium was elucidated in 1995 (Fraser et al. rids genomes of pseudogenes. Note that the pri- 1995), its 580-kb genome encoded a mere 506 mary force countering gene erosion and elimi- genes (470 proteins and 36 structural RNAs). nation is , with the result that Other experimental and computational ap- bacterial genomes, both large and small, contain proaches in estimating the minimal gene sets

4 Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a018101 Downloaded from http://cshperspectives.cshlp.org/ on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

Realizing Microbial Evolution

seemed to point to rather similar numbers. For The maintenance of an intact coding region, example, based on the proportion of essential given the constant onslaught of mutations, is genes detected in an early mutagenesis screen of evidence that a sequence has been required several dozen loci in B. subtilis, a minimal ge- over the evolutionary history of the lineage, be- nome size was calculated to be 318 kb to 562 kb cause genes that serve no function incur debili- (or about 300 to 500 genes) (Itaya 1995). And by tating mutations and are eventually eliminated comparing the genes common to the first two by deletions. The genomes of recent pathogens, fully sequenced bacterial genomes (H. influen- which often contain substantial numbers of zae and M. genitalium; Fig. 1A), the absolute nonfunctional genes, can be viewed as a transi- minimal gene set sufficient for cellular life was tional stage in the progression of events that estimated to number 256 genes (Mushegian and result in the typically high coding densities ob- Koonin 1996), which matched well with some served in other bacterial genomes. Therefore, of the lower estimates obtained through large- most bacterial genomes, regardless of their size scale gene disruptions in various bacterial spe- or gene number, are already at a minimal size cies (see Table 2 in Zhang et al. 2010 for a for the environment in which they evolved— compilation of these studies). there needs to be changes either in the chal- With nearly 200 genomes sequenced, it was lenges faced by bacteria or in the overall efficacy generally agreed that the minimal size of a bac- of selection to alter genomes in any appreciable terial genome was 500 kb (or 500 genes) be- way. cause numerous genomes from divergent taxa The goal of this essay is to show that the converged and hovered around that size (Fig. contents of bacterial genomes can be under- 2C). Clearly, there are many routes to minimal stood in a context above that includes—in genomes, but it seemed that it took about 500 most genome publications—that there is some- genes to thrive. But as the poet A.R. Ammons thing beyond the compendia of genomic prop- observed—“In nature there are few sharp erties and gene inventories. Very little about the lines”—and in 2005 came the report of Carso- evolution of bacterial genomes can be gleaned nella ruddii, whose 159,622-bp genome encod- from a single publication produced during the ed only 182 genes (Fig. 2D) (Nakabachi et al. glory days of bacterial genome sequencing, but 2006). Not only did this genome surpass the collectively we have been able to render the en- lowest estimates of the number of genes neces- tire landscape. sary to sustain life, but its genome was extreme in its base composition (16.5% GþC) and in its lack of several genes that are essential to replica- REFERENCES tion and repair in E. coli. Once the 500-gene Blattner FR, Plunkett G III, Bloch CA, Perna NT, Burland V, barrier was broken, there have been numerous Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF,et al. 1997. The complete genome sequence of Escher- other reports of tiny bacterial genomes—all of ichia coli K-12. Science 277: 1453–1462. which are endosymbionts of insects—with the Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris current record held by Tremblaya princeps, D, Gordon SV, Eiglmeier K, Gas S, Barry CE III, et al. which has only 121 genes and encodes no rec- 1998. Deciphering the biology of Mycobacterium tuber- culosis from the complete genome sequence. Nature 393: ognizable tRNA synthetases (McCutcheon and 537–544. Von Dohlen 2011), although it clearly needs to Cole ST, Eiglmeier K, Parkhill J, James KD, Thomson NR, translate its genes into proteins. Wheeler PR, Honore´ N, Garnier T,Churcher C, Harris D, Integrating what we currently know about et al. 2001. Massive gene decay in the bacillus. Nature 409: 1007–1011. the process of bacterial genome evolution, we Deckert G, Warren PV,Gaasterland T,YoungWG, Lenox AL, can conclude that most bacterial genomes are Graham DE, Overbeek R, Snead MA, Keller M, Aujay already at their minimal size. Genome-wide pat- M, et al. 1998. The complete genome of the hyperther- mophilic bacterium Aquifex aeolicus. Nature 392: 353– terns of sequence conservation indicate that the 358. vast majority of protein-coding genes in bacte- Fleishmann RD, Adams MD, White O, Clayton RA, Kirk- rial genomes are under functional constraints. ness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty

Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a018101 5 Downloaded from http://cshperspectives.cshlp.org/ on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

H. Ochman

BA, Merrick JM, et al. 1995. Whole-genome random Mira A, Ochman H, Moran NA. 2001. Deletional bias and sequencing and assembly of Haemophilus influenzae the evolution of bacterial genomes. Trends Genet 10: Rd. Science 269: 496–512. 589–596. Fraser CM, Gocayne JD, White O, Adams MD, Clayton RA, Mushegian AR, Koonin EV. 1996. A minimal gene set for Fleischmann RD, Bult CJ, Kerlavage AR, Sutton G, Kelley cellular life derived by comparison of complete bacterial JM, et al. 1995. The minimal gene complement of Myco- genomes. Proc Natl Acad Sci 93: 10268–10273. plasma genitalium. Science 270: 397–403. Nakabachi A, YamashitaA, Toh H, Ishikawa H, Dunbar HE, Fraser CM, Casjens S, Huang WM, Sutton GG, Clayton R, Moran NA, Hattori M. 2006. The 160-kilobase genome Lathigra R, White O, Ketchum KA, Dodson R, Hickey of the bacterial Carsonella. Science 314: EK, et al. 1997. Genomic sequence of a Lyme dis- 267. ease spirochaete, Borrelia burgdorferi. Nature 390: 580– Nilsson A, Koskiniemi S, Eriksson S, Kugelberg E, Hinton 586. JCD, Andersson DI. 2005. Bacterial genome size reduc- Himmelreich R, Hilbert H, Plagens H, Prikl E, Li BC, Herr- tion by experimental evolution. Proc Natl Acad Sci 102: mann R. 1996. Complete sequence analysis of the ge- 12112–12116. nome of the bacterium Mycoplasma pneumoniae. Nucleic Ochman H. 2005. Genomes on the shrink. Proc Natl Acad Acids Res 24: 4420–4449. Sci 102: 11959–11960. Itaya M. 1995. An estimation of minimal genome size re- Ochman H, Moran NA. 2001. Genes lost and genes found: quired for life. FEBS Lett 362: 257–260. The molecular evolution of bacterial pathogenesis and Kunst F, Ogasawara N, Moszer I, Albertini AM, Alloni G, . Science 292: 1096–1098. Azevedo V, Bertero MG, Bessie`res P, Bolotin A, Borchert Ochman H, Lawrence JG, Groisman EA. 2000. Lateral gene S, et al. 1997. The complete genome sequence of the transfer and the nature of bacterial innovation. Nature Gram-positive bacterium Bacillus subtilis. Nature 390: 405: 299–305. 249–256. Tomb J, White O, Kerlavage AR, Clayton RA, Sutton GG, Lerat E, Ochman H. 2004. C2F: Exploring the outer limits Fleischmann RD, Ketchum KA, Klenk HP,Gill S, Dough- of bacterial pseudogenes. Genome Res 14: 2273–2278. erty BA, et al. 1997. The complete genome sequence of Lerat E, Ochman H. 2005. Recognizing pseudogenes in bac- the gastric pathogen Helicobacter pylori. Nature 388: terial genomes. Nucleic Acids Res 33: 3125–3132. 539–547. McCutcheon JM, VonDohlen CD. 2011. An interdependent Zhang LY,Chang SH, WangJ. 2010. How to make a minimal metabolic patchwork in the nested symbiosis of mealy- genome for synthetic minimal cell. Protein Cell 1: 427– bugs. Curr Biol 21: 1366–1372. 434.

6 Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a018101 Downloaded from http://cshperspectives.cshlp.org/ on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

Realizing Microbial Evolution

Howard Ochman

Cold Spring Harb Perspect Biol published online March 1, 2016

Subject Collection Microbial Evolution

Not So Simple After All: Bacteria, Their Population Genome-Based Microbial Taxonomy Coming of Genetics, and Recombination Age William P. Hanage Philip Hugenholtz, Adam Skarshewski and Donovan H. Parks Realizing Microbial Evolution and the History of Life Howard Ochman Vincent Daubin and Gergely J. Szöllosi Thoughts Toward a Theory of Natural Selection: Early Microbial Evolution: The Age of Anaerobes The Importance of Microbial Experimental William F. Martin and Filipa L. Sousa Evolution Daniel Dykhuizen Coevolution of the Organization and Structure of Microbial Speciation Prokaryotic Genomes B. Jesse Shapiro and Martin F. Polz Marie Touchon and Eduardo P.C. Rocha −−The Engine of Evolution: Studying The Evolution of Campylobacter jejuni and Mutation and Its Role in the Evolution of Bacteria Campylobacter coli Ruth Hershberg Samuel K. Sheppard and Martin C.J. Maiden The Origin of Mutants Under Selection: How Paleobiological Perspectives on Early Microbial Natural Selection Mimics Mutagenesis (Adaptive Evolution Mutation) Andrew H. Knoll Sophie Maisnier-Patin and John R. Roth Evolution of New Functions De Novo and from Preexisting Genes Dan I. Andersson, Jon Jerlström-Hultqvist and Joakim Näsvall

For additional articles in this collection, see http://cshperspectives.cshlp.org/cgi/collection/

Copyright © 2016 Cold Spring Harbor Laboratory Press; all rights reserved