Realizing Microbial Evolution
Downloaded from http://cshperspectives.cshlp.org/ on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Howard Ochman
Department of Integrative Biology, University of Texas, Austin, Texas 78712
Correspondence: [email protected]

Editor: Howard Ochman
Additional Perspectives on Microbial Evolution available at www.cshperspectives.org
Copyright © 2016 Cold Spring Harbor Laboratory Press; all rights reserved; doi: 10.1101/cshperspect.a018101
Cite this article as Cold Spring Harb Perspect Biol 2016;8:a018101

Genome sequences have become the new phenotype for microbial evolutionists. The patterns of diversity revealed in the first 100 bacterial genomes fostered development of a comprehensive framework that can explain their contents, organization, and evolution.

The study of microbes has been at the forefront of research for some time but has changed considerably over the past century. What were originally witnessed as agents of disease became heralded as models for all life forms with the advent of molecular biology. And this, in turn, led to two of the most notable accomplishments in microbial evolution: an understanding of both their variation and their relationships at the deepest and shallowest taxonomic depths. These two areas expanded, motivated largely by polymerase chain reaction (PCR), into complementary fields: population genetics, which analyzes the source and apportionment of genetic variation within bacterial species, and phylogenetics, which arranges organisms, even those noncultivable, into a molecular tree of life.

It is undeniable that a revolution in the study of bacteria has been fostered by genomics (i.e., the sequencing and analysis of entire genomes). Original interest in sequencing bacterial genomes was probably more technical than biological: bacteria had small, gene-dense, single-chromosome genomes, which made assembly more tractable, and their compact size was viewed as the logical next step after elucidation of the several mitochondrial and viral genomes over the previous decade. The era of genomics, particularly bacterial genomics, is commonly viewed to have begun in 1995 with the publication of the genome sequence of Haemophilus influenzae (Fleischmann et al. 1995) followed by the Mycoplasma genitalium genome sequence 3 months later (Fig. 1A) (Fraser et al. 1995).

The field of bacterial genomics was actually well underway before the appearance of the first genome sequences. Before 1995, it was already known (1) that bacterial chromosomes are, with few exceptions, circular, possessing a single replication origin; (2) that most bacterial genomes comprise a single chromosome but that smaller extrachromosomal elements in the form of plasmids and phage are common; (3) that genome sizes range from 500 to 10,000 kb; (4) that their chromosomes are tightly packed with genes that average 1 kb in length, that their genes contain no introns, are assorted onto both strands, and are arranged in operons; (5) that genomic base composition varies widely among bacterial species (from 25% to 75% G+C), but that base composition is relatively homogeneous over the entire chromosome
within a given species; (6) that gene order is conserved among related species; and (7) that the rates and patterns of mutations can vary within a gene, and according to chromosomal location and transcriptional status of a gene.

[Figure 1: four panels, A (1995), B (1996), C (1997), D (1998); x-axis: genome size (Mb); y-axis: gene count. Point labels: Haemophilus influenzae; Mycoplasma genitalium (580 kb; 564 ORFs); Mycoplasma pneumoniae (816 kb; 736 ORFs); Escherichia coli; Bacillus subtilis; Borrelia burgdorferi. Legend: (mostly) free-living bacteria; (mostly) pathogens.]

Figure 1. Genome size and gene count in bacterial genomes sequenced from 1995 to 1998. Red dots indicate genomes that were published in the designated year and smaller gray dots represent genomes published in all prior years. Panels A-D show results for the year indicated. ORF, Open reading frame.

So, what was learned from the first sequenced bacterial genome? The authors' multipage foldout figure rendered the size, location, orientation, and putative function of each of the 1727 genes in the 1.8-Mb genome, the culmination of spending a reported 13 months and one million dollars (although it is difficult to see how this amount could cover the expense of reagents and the 40 coinvestigators for a year). Naturally, the genome contained and corroborated many of the features listed above, but there were also two inimitable benefits. First and foremost was the credence it gave to the whole-genome shotgun (WGS) approach to sequencing. This methodology eliminated the need for genetic maps or manipulation, and could be applied to any organism yielding a sufficient amount of starting material. Second was that the complete genome sequence not only defines the complete gene inventory but also reveals those genes that are not present. Knowledge of the entire genome sequence allows no room to hypothesize activities specified by unknown genes. And, finally, my personal delight was that this genome set the standard for genome quality. A published bacterial genome needed to be a closed circle, with every gap closed and nucleotide confirmed.

Even with only two sequenced bacterial genomes, there was a consistent relationship between genome size and total gene number, and the two genomes that appeared in 1996 closely followed this trend (Fig. 1B). But more importantly, the slate of four genomes now included Mycoplasma pneumoniae, and direct comparisons with the already-sequenced M. genitalium genome indicated that members of the same bacterial genus can differ by more than 30% in genome size and contents (Himmelreich et al. 1996).

The genome size difference between these congeners was attributed to the horizontal acquisition of numerous genes by M. pneumoniae, as opposed to the loss of genes by M. genitalium. Typically, conclusions about gene gain and gene loss are based on the presence or absence of genes in a common ancestor of the focal species, but no such genome was available. But by exhuming those long-established principles of bacterial genome organization (hint, point 5 above), it is possible to infer the ancestry of a sequence without any comparisons whatsoever. Because base composition varies widely among bacterial species but is homogeneous within a species, genes possessing atypical features (such as anomalously high or low base compositions) are most plausibly gained from outside sources. This makes it possible to scan bacterial genomes to assess the extent of laterally acquired genes, and it was subsequently shown that most bacterial genomes were subject to high amounts of gene transfer and that many strain- or species-specific phenotypes were conferred by unique genomic segments of atypical base composition (Ochman et al. 2000).

In 1997, the genomes of the well-studied model systems, Escherichia coli and Bacillus subtilis, were published (Blattner et al. 1997; Kunst et al. 1997), and at more than 4 Mb in length, they constituted the largest bacterial genomes yet sequenced (with the B. subtilis publication surpassing the 100-author mark) (Fig. 1C). Also published that year were the genomes of Helicobacter pylori (Tomb et al. 1997) and Borrelia burgdorferi, whose relatively high gene number reflects the inclusion of genes encoded on its 11 extrachromosomal elements (Fraser et al. 1997). By the time there were a dozen bacterial genome sequences (Fig. 1D), it was apparent that there was an association between genome size and [...] pathogen (Deckert et al. 1998) and Mycobacterium tuberculosis, a genome size of more than 4.4 Mb (Cole et al. 1998).

Moving forward a few years, we see that by 2000, after the resolution of 30 bacterial genomes, there remained a remarkable relationship between genome size and gene number across eight phyla and a >10-fold range in genome size (Fig. 2A). But this tenet of bacterial genome contents was undermined by elucidation of the Mycobacterium leprae genome (Fig. 2B) (Cole et al. 2001). M. leprae was replete with pseudogenes such that its 3.3-Mb genome contained only 1600 functional genes, the majority of which are also present in M. tuberculosis.

Discovering that M. leprae harbored large numbers of pseudogenes raised questions about why other bacterial genomes lacked similarly large numbers of inactivated and nonfunctional regions. Organisms are continually besieged by mutations and pseudogenes are continually being generated. So, why is there the same monotonic relationship between size and gene number across genomes large and small?

First, it is necessary to understand that M. leprae pseudogenes were discovered using a comparative approach, by gene-by-gene alignments with orthologs in closely related M. tuberculosis. The increasing availability of sequenced genomes offered several such opportunities for comparisons, and it became clear that many (possibly all) genomes contained pseudogenes, identified as truncated versions of genes in related genomes. Most interesting, however, was that the genomes with the largest numbers of pseudogenes were, like M. leprae, relatively recent pathogens of humans: for example, Shigella flexneri, an agent of dysentery that is descended from E. coli, and Yersinia pestis, which causes plague, each possessing more than 300 pseudogenes when compared with their closest sequenced relatives (Lerat and Ochman 2004, 2005).
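
The comparative logic described above, in which pseudogenes are recognized as truncated versions of genes in a related genome, can be illustrated with a minimal Python sketch. It is not the procedure used in the cited studies, which the text describes as gene-by-gene alignments with orthologs; it only compares the length of the uninterrupted reading frame of each gene with that of its ortholog. The function names, the 0.8 length cutoff, and the toy sequences are assumptions made for illustration, and the stop-codon set is that of the standard genetic code (some bacteria, such as Mycoplasma, read TGA as tryptophan).

```python
# Stop codons of the standard genetic code (an assumption; not valid for
# organisms that reassign TGA, such as Mycoplasma).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def coding_length(seq):
    """Return the number of nucleotides read in frame from position 0
    before the first stop codon (or to the last complete codon)."""
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return len(seq) - len(seq) % 3


def flag_truncated(focal, reference, min_fraction=0.8):
    """Return names of focal genes whose uninterrupted reading frame is
    shorter than min_fraction of the ortholog's reading frame.

    focal and reference map a shared (hypothetical) ortholog name to a
    nucleotide sequence; min_fraction is an illustrative cutoff."""
    flagged = []
    for name, seq in focal.items():
        if name not in reference:
            continue  # no ortholog available, so no comparison is possible
        ref_len = coding_length(reference[name])
        if ref_len and coding_length(seq) < min_fraction * ref_len:
            flagged.append(name)
    return flagged


# Toy data: geneA carries a premature in-frame stop (TAG) in the focal genome.
reference = {"geneA": "ATGAAACCCGGGTTTTAA", "geneB": "ATGGCTGCTTAA"}
focal = {"geneA": "ATGAAATAGGGGTTTTAA", "geneB": "ATGGCTGCTTAA"}

print(flag_truncated(focal, reference))  # ['geneA']
```

A length comparison of this kind only catches truncations; frameshifts, insertions, and loss of the start codon would need an actual alignment, which is why the sketch should be read as a simplification of the comparative approach rather than a substitute for it.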