PART V Microbial Evolution and Diversity

This material cannot be copied, disseminated, or used in any way without the express written permission of the publisher. Copyright 2007 Sinauer Associates Inc.

The objectives of this chapter are to:

- Provide information on how bacteria are named and what is meant by a validly named species.
- Discuss the classification of Bacteria and Archaea and the recent move toward an evolutionarily based, phylogenetic classification.
- Describe the ways in which the Bacteria and Archaea are identified in the laboratory.

17 Taxonomy of Bacteria and Archaea

It’s just astounding to see how constant, how conserved, certain sequence motifs (proteins, genes) have been over enormous expanses of time. You can see sequence patterns that have persisted probably for over three billion years. That’s far longer than mountain ranges last, than continents retain their shape.
Carl Woese, 1997 (in Perry and Staley, )

This part of the book discusses the variety of microorganisms that exist on Earth and what is known about their characteristics and evolution. Most of the material pertains to the Bacteria and Archaea because there is a special chapter dedicated to eukaryotic microorganisms. Therefore, this first chapter discusses how the Bacteria and Archaea are named and classified and is followed by several chapters (Chapters 18–22) that discuss the properties and diversity of the Bacteria and Archaea.

When scientists encounter a large number of related items, such as the chemical elements, plants, or animals, they characterize, name, and organize them into groups. Thousands of species of plants, animals, and bacteria have been named, and many more will be named in the future as more are discovered. Not even the most brilliant biologist knows all of the species. Organizing the species into groups of similar types aids the scientist not only in remembering them but also in comparing them to their closest relatives, some of which the scientist would know very well. In addition, biologists are interested in evolution, because this is the process through which organisms became diverse. Unraveling the route of evolution leads to an understanding of how one species is related to another. As discussed later in this chapter, evolutionary relationships are assessed by molecular phylogeny, the analysis of gene and protein sequences to determine the relatedness among organisms. To date, approximately 5,000 bacterial and archaeal species have been named and, based on their characteristics, placed


within the existing framework of other known species. The branch of bacteriology that is responsible for characterizing and naming organisms and organizing them into groups is called taxonomy or systematics.

Taxonomy can be separated into three major areas of activity. One is nomenclature, which is the naming of bacteria. The second is classification, which entails the ordering of bacteria into groups based on common properties. In identification, the third area, an unknown bacterium, for example, from a clinical or soil sample, is characterized to determine its species. This chapter covers all three of these areas.

TABLE 17.1 Hierarchical classification of the bacterium Spirochaeta plicatilis

Taxon     Name
Domain    Bacteria
Phylum    Spirochaetes (vernacular name: spirochetes)
Class     Spirochaetes
Order     Spirochaetales
Family    Spirochaetaceae
Genus     Spirochaeta
Species   plicatilis

17.1 Nomenclature

Bacteriologists throughout the world have agreed on a set of rules for naming Bacteria and Archaea. These rules, called the “International Code for the Nomenclature of Bacteria” (1992), state what a scientist must do to describe a new species or other taxon (taxa, pl.), which is a unit of classification, such as a species, genus, or family. Each bacterium is placed in a genus and given a species name in the same manner as are plants and animals. For example, humans are Homo sapiens (genus name first, followed by species), and a common intestinal bacterium is named Escherichia coli. This binomial system of names follows that proposed for plants and animals by the Swedish taxonomist Carl von Linné (Linnaeus; 1707–1778).

According to the rules of bacterial nomenclature, the root for the name of a species or other taxon can be derived from any language, but it must be given a Latin ending so that the genus and species names agree in gender. For example, consider the species name Staphylococcus aureus. The first letter in the genus name is capitalized, the species name is lowercase, and they are both italicized to indicate that they are Latinized. When writing species names in longhand, as for a laboratory notebook, they should be underlined to denote that they are italicized. The genus name Staphylococcus is derived from the Greek Staphyl from staphyle, which means a “bunch of grapes,” and coccus, from the Greek, meaning “a berry.” The o (“oh”) between the two words is a joining vowel used to connect two Greek words together. The figurative meaning of the genus name is “a cluster of cocci,” which describes the overall morphology of members of the genus. The species name aureus is from the Latin and means “golden,” the pigmentation of members of this species. The -us ending of the genus and species names is the Latin masculine ending for a noun (Staphylococcus in this case) and its adjective (aureus). Successively higher taxonomic categories are family, order, class, phylum, and domain (Table 17.1).

The International Journal of Systematic and Evolutionary Microbiology (IJSEM) is a journal devoted to the taxonomy of bacteria that is published by the Society for General Microbiology. IJSEM publishes papers that describe and name new bacterial taxa and contains an updated listing of all new bacteria whose names have been validly published. Thus, although bacterial species may be described in other scientific journals, they are not considered validly published until they have been included on a validation list in IJSEM.

IJSEM also provides a forum to debate specific controversies in nomenclature by allowing a scientist to challenge the current nomenclature of an organism or group of organisms. Such challenges, if accepted by peer review, are then published as a question in the IJSEM. The question is then evaluated by the Judicial Commission of the International Union of Microbiological Societies, which subsequently publishes a ruling in the journal.

One typical example of a problem considered by the Judicial Commission was the question about Yersinia pestis, the causative agent of bubonic plague. Scientific evidence indicates that Y. pestis is really just a subspecies of Yersinia pseudotuberculosis, a species name that has precedence over Y. pestis because of its earlier publication. Because of the potential confusion and possible public health issues that could arise from renaming Y. pestis as Y. pseudotuberculosis subspecies pestis, the Judicial Commission ruled against renaming the bacterium despite its scientific justification.

SECTION HIGHLIGHTS

Nomenclature is concerned with naming organisms. For Bacteria and Archaea, specific rules must be followed in order to name and describe new species. Organisms that are placed on the approved or validated lists are officially recognized species.
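The capitalization rules for binomial names described in this section can be illustrated with a short Python sketch. The function name format_binomial and its abbreviate option are hypothetical, introduced only for illustration; plain text cannot show the italics (or underlining) that the rules also require, and the abbreviated form simply mirrors the way the text writes Y. pestis after the first full mention.

```python
def format_binomial(genus: str, species: str, abbreviate: bool = False) -> str:
    """Return a binomial name with the genus capitalized and the species lowercase.

    Illustrative helper only (not part of any nomenclature code or library).
    Italics cannot be expressed in a plain string and are left to the typesetter.
    """
    genus = genus.strip()
    species = species.strip()
    if not genus or not species:
        raise ValueError("both a genus name and a species name are required")
    genus = genus.capitalize()      # first letter of the genus is capitalized
    species = species.lower()       # species name is written in lowercase
    if abbreviate:
        genus = genus[0] + "."      # e.g. "Yersinia" becomes "Y."
    return f"{genus} {species}"


if __name__ == "__main__":
    print(format_binomial("staphylococcus", "Aureus"))
    print(format_binomial("Yersinia", "pestis", abbreviate=True))
```

The sketch deliberately does nothing about Latin gender agreement between genus and species names, which depends on the words themselves and is not something a formatting helper can check.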


17.2 Classification

Classification is that part of taxonomy concerned with the grouping of bacteria into taxa based on common characteristics. The earliest classifications did not consider microorganisms. There were two kingdoms of life: Plants and Animals. In 1868, Ernst Haeckel, a German scientist, proposed a third kingdom specifically for microorganisms. Approximately a century later, in 1969, Robert Whittaker proposed a five-kingdom system of classification. His classification included Plants (Plantae), Animals (Animalia), Fungi, Protista, and Monera. In this system, the eukaryotic microorganisms were placed in the Protista kingdom and the fungi had their own special kingdom. The bacteria and archaea were placed, as prokaryotes, in the kingdom Monera. Organisms were separated from one another on the basis of nutrition and cell structure. Therefore, plants are photosynthetic eukaryotes, fungi are heterotrophs that use dissolved nutrients, and animals are heterotrophs that ingest their food.

The five-kingdom classification remained popular until recently. Then, in 1990, Carl Woese and colleagues proposed an entirely new classification, the Tree of Life (see Chapter 1). Unlike all previous classifications, this new classification uses a molecular phylogenetic approach. The Tree of Life is based on the sequence analysis of a common macromolecule that all organisms share, the RNA in the small subunit of the ribosome (see Chapter 4). This RNA was used to separate all organisms on Earth into three different domains, the Bacteria, the Archaea, and the Eukarya. Viruses, which are not organisms because they are not cellular (see Chapter 1), cannot be classified by this system.

Classification systems can be either artificial or natural. Artificial systems of classification are based on expressed characteristics of the organisms, or the phenotype of the organism. In contrast, natural or phylogenetic systems are based on the purported evolution of the organism. Until recently, all bacterial classifications were artificial because there was no meaningful basis for determining their evolution. In contrast, plants and animals have a fairly extensive fossil record on which to base an evolutionary classification system. Although fossils of microorganisms do exist (see Chapter 1), the simple structures of microorganisms do not permit their assignment to a taxonomic group by morphological criteria. Furthermore, the stable morphology and ontogeny of plants and animals have been useful in developing natural systems of classification. This type of information is rarely available for microorganisms.

When considering bacterial classification, it is important to keep in mind that bacteria have been evolving on Earth for the past 3.5 to 4 billion years. Therefore, it should not seem surprising that two separate domains of prokaryotic organisms exist, the Bacteria and the Archaea, versus only one domain for eukaryotes, Eukarya. In addition, the Eukarya may have evolved more recently as the result of symbiotic events between different early prokaryotic forms of life (see Chapter 1). Because of the long period of evolution of bacteria and archaea, the various groups within these domains exhibit considerable diversity, particularly metabolic and physiological. In contrast, the metabolic diversity of the Eukarya is limited, especially with respect to energy generation. The vast diversity of metabolic types of prokaryotes is discussed more fully in Chapter 5 and in Chapters 18–22. This chapter first discusses the traditional system of classification and then covers what is being done to make it phylogenetic.

Artificial versus Phylogenetic Classifications

Conventional artificial taxonomy uses phenotypic tests to determine differences between strains and species. These tests are typically weighted so that characteristics that are considered to be more important are given higher priority. For example, in traditional taxonomy, the Gram stain has been given more weight in determining the classification of an organism than whether the organism uses glucose as a carbon source. Therefore, all gram-positive strains would be ascribed to one family or genus, and, within that group, certain species or strains would use glucose and others would not.

Most bacteriologists favor a phylogenetic system for the classification of bacteria, and with the advent of molecular phylogeny, this hope is now being realized. The current accepted treatise that contains a complete listing of prokaryotic species and their classification is Bergey’s Manual of Systematic Bacteriology (2001–2008), published by Springer, and its more condensed edition, Bergey’s Manual of Determinative Bacteriology (1994). In addition to containing a complete classification of bacteria and archaea, the more comprehensive version of Bergey’s Manual contains a description of all known validly described bacterial species. Therefore, it is the “encyclopedia” of the bacteria and archaea that is widely used by bacteriologists (Box 17.1). Bergey’s Manual of Systematic Bacteriology is now in its second edition. The first edition was based on an artificial classification because too little phylogenetic information was available. However, the second edition is based on the Tree of Life, which uses a phylogenetic framework, as discussed later in this chapter.

Phenotypic Properties and Artificial Classifications

Phenotypic properties are those that are expressed by an organism, and they have always played a major role in


BOX 17.1 Milestones

Bergey’s Manual Trust

David Bergey was a professor of bacteriology at the University of Pennsylvania in the early 1900s. As a taxonomist he was a member of a committee of the Society of American Bacteriologists (SAB, now called the American Society for Microbiology), which was interested in formulating a classification of the bacteria that could be used for identification of species. In 1923, he and four others published the first edition of Bergey’s Manual of Determinative Bacteriology. This was followed by new editions every few years. Royalties collected by the publication activities of the committee were held in SAB. When David Bergey and his co-editor, Robert Breed, requested money from the account to be used for preparation of the fifth edition, the leadership in SAB refused. After a long and bitter fight, the SAB relented and turned the total proceeds over to Bergey, who promptly put the money into a nonprofit trust with a board of trustees to oversee the publication of manuals on bacterial systematics. The Trust, now named in his honor as Bergey’s Manual Trust, is responsible for the publication of Bergey’s Manual of Determinative Bacteriology, which is now in its ninth edition, as well as other taxonomic books such as Bergey’s Manual of Systematic Bacteriology. The Trust is headquartered at University of Georgia and has a nine-member international board of trustees, as well as associate members from many countries.

Photograph: David Bergey. Courtesy of the National Library of Medicine.

microbial taxonomy. Indeed, early classifications had to be based entirely on phenotypic properties because they were the only properties that could be studied.

In classifications, it is important to select phenotypic characteristics that allow one to group organisms together and others that enable one to distinguish organisms from one another. Furthermore, it is important that the identifying characteristics selected be easy to determine. Two examples of simple phenotypic characteristics that have been widely used in artificial classification schemes are the Gram stain and cell shape. It turns out that each of these has some utility in classifications. For example, the Gram stain tells something about the nature of the cell wall (see Chapter 4). Furthermore, the Gram stain happens to be important phylogenetically because two of the phylogenetic groups of Bacteria are gram-positive (i.e., Firmicutes and Actinobacteria) and the other 20 or more are gram-negative. However, there are some drawbacks to phenotypic properties. For example, it is noteworthy that some species of gram-positive bacteria stain as gram-negative bacteria. Likewise, the mycoplasmas, which stain as gram-negative organisms, have been found to be members of the Firmicutes through 16S rRNA analyses (see Chapter 20). The reason the mycoplasmas stain as gram-negative is that they lack cell walls altogether. Therefore, they have apparently evolved from a group of gram-positive bacteria that lost their wall during evolution.

In contrast to the gram-positive bacteria, gram-negative bacteria and archaea fall into many different phylogenetic groups, including peptidoglycan-containing types and non–peptidoglycan-containing types that are bacteria as well as archaea. Therefore, gram-negative organisms are very diverse phylogenetically.

At one time, some bacteriologists proposed that the simplest, and purportedly the most stable, cell shape (the sphere) must have been the shape of the earliest bacteria. They then developed an evolutionary scheme based on this theory, in which all of the coccus-shaped bacteria were included in the same phylogenetic group. The validity of this classification has not been borne out by molecular phylogeny. For example, there are both gram-negative and gram-positive cocci. Some cocci are photosynthetic, others are nonphotosynthetic, some are highly resistant to ultraviolet (UV) light (Deinococcus), some are Archaea, and others are Bacteria.

Nonetheless, cell shape is important for some groups. For example, the spirochetes phylum contains all of the

helically shaped bacteria with periplasmic flagella (see Table 17.1 and Chapter 22). Because of their morphology, these bacteria were correctly classified with one another in the order Spirochaetales and now in the phylum Spirochaetes long before phylogenetic data confirmed the grouping. Apart from this group, overall cell shape has little meaning at higher taxonomic levels. Nonetheless, it can still be significant at the species, genus, and even family levels.

Other phenotypic properties have also proved useful in both artificial and phylogenetic classifications. For example, because of their unique ability to produce methane gas, the methanogenic microorganisms have always been classified together in artificial classifications. Likewise, from a phylogenetic standpoint, all the methanogenic organisms are members of the Euryarchaeota of the Archaea.

Of course, phenotypic properties have a special significance not found in gene sequence analyses in that these features provide information about what the organism is capable of doing. One cannot directly conclude from a sequence that an organism is or is not a methanogen, for example, unless that particular feature has been tested for and determined. Thus, phenotypic tests provide valuable information about the capabilities of the organism that may help explain its role in the environment in which it lives.

Numerical Taxonomy

When a large number of similar bacteria are being compared, computers are very useful in the analysis of the data. This aspect of taxonomy, which has been used in artificial classifications, is referred to as numerical taxonomy. Numerical taxonomy is most useful at the species and strain level, where phylogenetic relatedness has already been established by ribosomal RNA (rRNA) sequencing and DNA–DNA reassociation.

In numerical taxonomy all characteristics are given equal weight. Therefore, the use of a particular carbon source is considered to be as important as the Gram stain or the presence of a given structural feature. In characterizing strains in this manner, a large number of characteristics are determined, and the similarity between strains is then compared by a similarity coefficient. Each strain is compared with every other strain. The similarity coefficient, S_AB, between two strains, A and B, is defined as follows:

\[ S_{AB} = \frac{a}{a + b + c} \]

where a represents the number of properties shared in common by strains A and B; b represents the number of properties positive for A and negative for B; and c represents the number of properties positive for B and negative for A.

Characteristics for which both strains A and B are negative are considered irrelevant because there would be many such features that would have no bearing on their similarity. For example, endospore formation is an uncommon characteristic for bacteria. It is not significant to incorporate this characteristic when comparing two species within a genus that do not produce endospores. It would, however, be of value in comparing endospore-forming organisms to closely related organisms. It should be noted that the similarity coefficient can be used to relate not only phenotypic features of one organism to another, but also the sequence similarity of macromolecules of different organisms.

In numerical taxonomy, it is best to have as many tests of phenotypic characters as possible. Typically at least 50 independent characters are used, and many strains are usually compared simultaneously. Ideally, each characteristic should represent a single and separate gene. The same gene should not be assessed more than once, and, therefore, overlapping phenotypic tests must be avoided. Typically, S_AB values are greater than or equal to 70% within a species and greater than 50% within a genus. An example of a numerical analysis is shown in Box 17.2.

In artificial classifications, bacteria are grouped into a hierarchy based on phenotypic properties. For example, the chemoautotrophic bacteria that obtain energy from the oxidation of inorganic nitrogen compounds, such as ammonia and nitrite, would be classified in the group of nitrifying bacteria. This group would be regarded as an order. One subgroup, termed the family, would contain ammonia oxidizers and another would contain nitrite oxidizers. Within each of these groups, features such as cell shape would be further used to define differences among the various genera and species. However, this artificial classification does not take into account the evolutionary relatedness among the members of the nitrifying bacteria.

Phylogenetic Classification and Molecular Phylogeny

An important article published by Zuckerkandl and Pauling in 1962 suggested that the evolution of organisms might be recorded in the sequences of their macromolecules. Subsequent research in the late 1960s and 1970s has supported this concept, resulting in a revolution in the field. In particular, molecules such as rRNA and some proteins have changed at a very slow rate during evolution; therefore, their sequences provide important clues to the relatedness among the various bacterial taxa and their relatedness to plants and animals


BOX 17.2 Methods & Techniques

Determination of Similarity Coefficients

In this example, eight strains, A through H, are compared to one another by ten phenotypic tests. The results are shown in the first table:

Table 1 Results of phenotypic tests for eight strains, A–H

        Strains tested
Test    A  B  C  D  E  F  G  H
1       +  –  +  +  +  +  –  +
2       –  +  +  –  +  –  +  +
3       +  +  –  +  –  +  +  +
4       +  –  –  –  +  +  –  +
5       +  –  –  +  +  +  –  –
6       +  +  –  +  –  –  +  –
7       –  +  +  +  +  –  –  +
8       +  –  –  +  –  +  +  –
9       +  –  +  +  +  +  –  +
10      –  +  +  –  +  –  +  +

Similarity coefficients are then determined by comparing the results of the tests for each of the strains against one another, using the formula given in the text. The results are shown in Table 2.

Table 2 Similarity coefficients (× 100 to give percent similarity) for the eight strains

Strain    A    B    C    D    E    F    G    H
A       100
B        20  100
C        20   43  100
D        75   33   33  100
E        40   38   71   40  100
F        86   10   22   63   44  100
G        33   67   25   33   20   22  100
H        40   50   71   40   75   44   33  100

The information from this matrix is then used to group the strains into similar types, as shown in the following matrix:

Table 3 Similarity matrix of grouped strains

Strain    A    F    D    E    H    C    G    B
A       100
F        86  100
D        75   63  100
E        40   44   40  100
H        40   44   40   75  100
C        20   22   33   71   71  100
G        33   22   33   20   33   25  100
B        20   10   33   38   50   43   67  100

According to these tests, strains A, F, and D are very similar to one another and probably compose a single species. Likewise, E, C, and H are very similar and appear to be a separate species. Strains B and G may also be a different species, although more tests should probably be performed to substantiate this. As mentioned earlier in the chapter, phenotypic data such as these can provide an indication of relatedness at the species level, but if new species are being described, DNA–DNA reassociation tests should be performed.
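The calculation in Box 17.2 can be reproduced with a short Python sketch. It applies the similarity coefficient from the text (shared positives divided by shared positives plus the positives unique to each strain) to the Table 1 results as transcribed here, and then groups strains whose similarity is at least 70%, the species-level guideline given in the text. The function names, the data layout, and the simple chain-linking used for grouping are assumptions made for illustration; the box does not say which grouping procedure was used. Recomputing from Table 1 as transcribed gives 33% for the B–E pair, whereas the printed matrix shows 38%; the other values in Table 2 are reproduced to within rounding.

```python
from itertools import combinations

STRAINS = "ABCDEFGH"

# Table 1 of Box 17.2: one string per test (tests 1-10),
# one character per strain in the order of STRAINS.
RESULTS = [
    "+-++++-+",
    "-++-+-++",
    "++-+-+++",
    "+---++-+",
    "+--+++--",
    "++-+--+-",
    "-++++--+",
    "+--+-++-",
    "+-++++-+",
    "-++-+-++",
]


def positives(strain):
    """Return the set of test numbers for which the strain is positive."""
    col = STRAINS.index(strain)
    return {n for n, row in enumerate(RESULTS, start=1) if row[col] == "+"}


def similarity(x, y):
    """S_xy = a / (a + b + c); tests negative for both strains are ignored."""
    px, py = positives(x), positives(y)
    a = len(px & py)   # positive in both strains
    b = len(px - py)   # positive in x only
    c = len(py - px)   # positive in y only
    total = a + b + c
    return a / total if total else 0.0


def similarity_matrix():
    """Similarity coefficient for every pair of strains."""
    return {(x, y): similarity(x, y) for x, y in combinations(STRAINS, 2)}


def group_strains(threshold=0.70):
    """Merge strains linked by any pair at or above the threshold.

    This chain-linking rule is an illustrative choice, not the book's method.
    """
    groups = [{s} for s in STRAINS]
    for (x, y), s in similarity_matrix().items():
        if s >= threshold:
            gx = next(g for g in groups if x in g)
            gy = next(g for g in groups if y in g)
            if gx is not gy:
                gx |= gy
                groups.remove(gy)
    return sorted("".join(sorted(g)) for g in groups)


if __name__ == "__main__":
    for (x, y), s in similarity_matrix().items():
        print(f"{x}-{y}: {100 * s:.0f}%")
    print(group_strains())
```

At the 70% threshold the sketch places A, D, and F in one group and C, E, and H in another, and leaves B and G ungrouped (their similarity is about 67%), which is in line with the box's cautious statement that B and G need more tests.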

as well. The result is that a major breakthrough has occurred in the classification of prokaryotic organisms, which has rapidly become phylogenetic through the study of molecular phylogeny, as discussed below. The fact that the two domains Bacteria and Archaea exist was not at all appreciated until molecular phylogenetic studies were performed. Bacteriologists are now trying to determine when the split occurred between the Bacteria, Archaea, and Eukarya and what the nature was of their last common ancestor.

By the 1970s, data had accumulated indicating that a true phylogenetic classification of bacteria was possible. What made it possible was, first of all, acceptance of the evidence that some of the macromolecules of organisms were highly conserved, that is, changed very slowly during evolution, and that their sequences held the key to unlocking the relatedness of bacteria to one another and to plants and animals. Second, sequencing techniques were developed and improved so that it became easy to conduct sequence analyses of rRNA and other macromolecules.

In this section, emphasis will be given to rRNA sequencing, in particular the RNA of the small subunit of the ribosome (see Chapter 4), 16S rRNA (or 18S rRNA of Eukarya), as it is the most common conserved molecule used to study the phylogeny of microorganisms (Figure 17.1). In actual practice, the 16S rRNA gene, or 16S rDNA, is sequenced because the polymerase chain reaction (PCR) procedures are simple, and both strands of the DNA can be used to confirm the actual sequence.

Several reasons justify the choice of rRNA as an evolutionary marker. First, the ribosome is found in all cellular organisms (see Chapter 4) and therefore allows one to compare all organisms. Second, the function of the ribosome as the structure responsible for protein synthesis holds true for all classes of life. Therefore, it is possible to compare the phylogeny of all organisms by analysis of a single structure with an important cellular function. The third advantage of using the ribosome in phylogeny is that ribosomes are highly conserved and therefore have changed very slowly over many millions of years. This is true because the ribosome is a very complex structure that carries out a specific function, protein synthesis (see Chapter 11). Keep in mind that the 16S rRNA is interacting in a three-dimensional structure with protein and other rRNA molecules as well as messenger RNA (mRNA). Therefore, a high rate of evolutionary change in ribosome structure has been selected against during evolution. Mutant organisms with dysfunctional ribosomes would be unable to compete with existing types and have therefore not survived. In this manner, evolution has selected against major changes in the ribosome. Nonetheless, incremental modifications have occurred over the billions of years of biological evolution and these differences are used to construct evolutionary trees.

Ribosomal RNAs are not the only macromolecules that have been considered in determining relatedness at higher taxonomic levels. Proteins such as cytochrome c and ribulose bisphosphate carboxylase are just two examples of other molecules that have been used. However, not all organisms synthesize these macromolecules. Furthermore, not all cytochrome c–like molecules found in different organisms have the same physiological function. Because these other molecules are not universally distributed among all organisms, they cannot be used to compare distantly related taxa. However, the sequences of highly conserved proteins, such as cytochromes and nitrogenase, are very useful in constructing phylogenies of their origin and evolution within disparate microbial groups. It should be noted that because proteins are difficult and expensive to sequence, their sequences are normally deduced from their gene sequences.

Within the biological world, ribosomes share many similarities, indicating the conservative nature of the structure. Prokaryotic ribosomes contain three types of RNA: 5S, 16S, and 23S. Both 5S and 16S rRNA have been used to determine relatedness among organisms. Because the 16S molecule is larger (with about 1,500 bases), it contains more information (see Figure 17.1) than the smaller 5S molecule with only about 120 bases (Figure 17.2). Less work has been done on the 23S molecule because it is longer (about 3,000 nucleotides) and therefore not as easy to study. Therefore, scientists interested in the classification and evolution of bacteria have concentrated on 16S rRNA. The method of evaluation that provides the most information is sequence determination, especially for the complementary DNA of 16S rRNA, that is, the double-stranded 16S rRNA gene or 16S rDNA. It has been found that some regions of these molecules are more highly conserved than others. The more highly conserved regions permit the comparison of distantly related organisms (Figure 17.3), and the more variable domains are used for comparing more closely related organisms. Figure 17.3 shows the “stem-loop” secondary structure of a model 16S rRNA that allows for comparison among all organisms. The stem and loop portions, designated by the blue color, contain the most highly variable regions. For example, the blue area designated with a red star has been expanded for three species to illustrate the variation among them: E. coli (a bacterium), Methanococcus vannielii (an archaean), and Saccharomyces cerevisiae (a eukaryote). The regions that are unique to a given taxon are termed signature sequences. They can be used for the design of specific hybridization probes for identification (see later section on Identification).

It appears that an analysis of 16S rDNA sequences provides important information on the evolution of prokaryotes. However, before concluding that the se-


Figure 17.1 16S rRNA
Secondary structure of the 16S rRNA molecule from the small subunit of the ribosome from the bacterium Escherichia coli. The bases are numbered from 1 at the 5′ end to 1,542 at the 3′ end. Every tenth nucleotide is marked with a tick mark, and every fiftieth nucleotide is numbered. Tertiary interactions with strong comparative data are connected by solid lines. Figure key: canonical base pair (AU, GC); GU base pair; GA base pair; non-canonical base pair. From the Comparative RNA Web Site, www.rna.icmb.utexas.edu (courtesy of Robin Gutell).

Figure 17.2 5S rRNA. Comparison of the secondary structures of 5S rRNA molecules from the bacterium Escherichia coli and the eukaryote Homo sapiens. Note how similar the overall secondary structure is between these two molecules even though they represent many millions of years of divergence from one another. The key distinguishes canonical base pairs (AU, GC), GU base pairs, GA base pairs, and non-canonical base pairs (UU, AC).

Figure 17.3 Conservation and variation in small subunit rRNA. (A) This diagram shows conserved and variable regions of the small subunit rRNA (16S in prokaryotes or 18S in eukaryotes). Each dot and triangle represents a position that holds a nucleotide in 95% of all organisms sequenced, although the actual nucleotide present (A, U, C, or G) varies among species. (B) The starred region from (A) as it appears in a bacterium (Escherichia coli), an archaean (Methanococcus vannielii), and a eukaryote (Saccharomyces cerevisiae). This region includes important signature sequences for the Bacteria and Archaea. The key marks positions that are identical in 98% or more of all organisms; conserved only in the Bacteria; conserved only in the Archaea; conserved only in the Eukarya; conserved within each domain but variable among domains; and regions that vary structurally among domains. Figure by Jamie Cannone, courtesy of Robin Gutell; data from the Comparative RNA Web Site: www.rna.icmb.utexas.edu.


quence of bases in rDNA accurately reflects the phylogeny of organisms, it is important to find totally separate and independent evolutionary markers to confirm the classification. Some work has been performed with sequencing of ATP synthases, elongation factors, RNA polymerases, and other conserved macromolecules. The outcome of this research, which represents one of the most exciting areas of biology, is leading to the development of a complete phylogenetic classification of the Bacteria, Archaea, and other microorganisms. The 16S rDNA–based phylogeny will face its most stringent test as additional genomes are sequenced and compared.

Before we discuss the use of 16S rDNA sequences in arriving at the current phylogeny of Bacteria and Archaea, it is first necessary to consider phylogenetic trees.

PHYLOGENETIC TREES Like a family tree, a phylogenetic tree shows the descendants of a biological family or group. However, whereas the family tree traces the genealogy of a family of humans, phylogenetic trees trace the lineage of a variety of different species. Thus, phylogenetic trees reflect the purported evolutionary relationships among a group of species, usually through the use of some molecular attribute they possess, such as the sequence of their rRNA. In this particular section we will discuss molecular phylogeny based on a comparison of the 16S rRNA sequences of organisms.

Phylogenetic trees have two features: branches and nodes (Figure 17.4). Each node represents an individual species. External nodes (usually drawn to the extreme right of the tree) represent living species, and internal nodes represent ancestors. A branch is a length that represents the distance between or degree of separation of the species (nodes).

Figure 17.4 Phylogenetic trees. Two different formats of phylogenetic trees used to show relatedness among genes or species. In both types of trees, internal nodes represent ancestor species, external nodes (here, A through E) represent extant, known species, and branch lengths represent the evolutionary distance or degree of relatedness between species.

Trees may be rooted or unrooted. Figure 17.5A shows an unrooted tree, drawn in two formats, containing three different species: A, B, and C. Unrooted trees typically compare one feature of a group of related organisms, such as the sequence of their 16S rRNA gene. There is only one shape to this particular tree with three species. In contrast, rooted trees provide more information. Typically, rooted trees use the same gene from a distantly related organism for the root. This allows one to compare the relatedness of the more closely related species (A, B, and C) to one another. A rooted tree containing three species has three possible shapes (Figure 17.5B). In the examples shown, A and B are more closely related to one another in the uppermost example, whereas A and C and B and C, respectively, are more closely related in the two lower examples.

Figure 17.5 Unrooted and rooted trees. Representations of the possible relatedness between three species: A, B, and C. (A) A single unrooted tree (shown in both formats; see Figure 17.4). (B) Three possible rooted trees (in one format).

Alternatively, an additional gene can be used to compare the species. When using this approach, a different gene is chosen, such as one encoding a macromolecule that underwent a gene duplication event before the taxa being studied diverged from one another. As an example, a comparison of the elongation factors Tu and G (see Chapter 13) was used to root the 16S rRNA Tree of Life (see Figure 1.6).

For bacterial phylogeny, trees are constructed from information based on the sequence of subunits in macromolecules. As mentioned previously, rRNA, in particular the small subunit rRNA molecule, 16S rRNA, has been selected as the molecule of choice because of its conserved nature and length.

SEQUENCING 16S rDNA If one isolates a new bacterium and wishes to determine its phylogenetic position among the known bacteria, it is necessary to determine the sequence of its 16S rRNA gene. This can be accomplished in a number of ways. One of the most common ways is to use polymerase chain reaction (PCR) to amplify the 16S rDNA from genomic DNA from the bacterium (see Chapter 16). The amplified 16S rDNA may be sequenced directly or ligated into a cloning vector and cloned into E. coli. This latter step allows a large quantity of 16S rDNA to be produced through growth prior to sequence analysis. The entire 16S rDNA can be sequenced using a standard set of oligonucleotide primers and standard sequencing techniques.

ALIGNMENT WITH KNOWN SEQUENCES The next step is to incorporate the determined linear DNA sequence into an alignment with the sequences of other known organisms. Two Internet databases are of special importance in this regard. One is called the Ribosome Database Project, located at Michigan State University. It contains the 16S rRNA sequences for those bacteria that have been sequenced. Sequences from this database can be retrieved electronically via the Internet (http://rdp.cme.msu.edu/). Another frequently used database for sequence analysis is the National Center for Biotechnology Information (NCBI) (www.ncbi.nlm.nih.gov/Genbank), which contains a comprehensive listing for 16S rRNA genes as well as other genes.

By using a series of computer programs and careful manual examination, one can align and compare the sequence with those retrieved from the database.

PHYLOGENETIC ANALYSIS Having the sequence is only half of the story. It is next necessary to compare the sequence of the unknown bacterium to that of other bacteria from the database. The determination of the evolutionary relatedness among organisms can be accomplished by one of a number of phylogenetic methods. There are several types of analysis that can be used. Distance matrix methods are one type of approach. In distance matrix methods, the evolutionary distances, based on the number of nucleic acid or amino acid monomers that differ in a sequence, are determined among the strains being compared. A second approach is to use maximum parsimony methods. In maximum parsimony, the goal is to find the simplest or most parsimonious tree that could explain the relatedness between different sequences. In both approaches, the sequence of nucleic acid subunits among different strains of bacteria or other genes is compared.

Figure 17.6 illustrates the use of two different methods to create a tree from an aligned hypothetical sequence region of rRNA from four different strains (shown in Figure 17.6A). The first approach is a distance matrix method called the “unweighted pair group method with arithmetic mean,” or UPGMA (see Figure 17.6B). This is one of the simplest analytical methods that can be used. Figure 17.6C shows an analysis of the same sequences by maximum parsimony.

In the UPGMA method, a distance matrix is set up to compare the differences in sequence or “distance,” d, between each of the four strains. There are four nucleotide differences between the sequence of organism a and the sequence of organism b; therefore, d_ab is determined to be 4. Likewise, d_ac is 5, d_bc is 5, and so forth. In this instance, the shortest distance, 2, is between strains c and d. From these two strains, which show the closest relationship to one another, a simple tree is constructed that shows c and d connected by a node that is half the distance between the two, that is, 1 unit. This is expressed in the actual tree as a horizontal branch length of one unit from each of the organisms to a common ancestral node (see Figure 17.6B).

The next step is to construct another matrix in which c and d are considered as a single composite unit (cd) and compared with a and b. In this matrix, a is different from (cd) by (d_ac + d_ad)/2, or (5 + 6)/2 = 5.5. Likewise, b is different from (cd) by (d_bc + d_bd)/2 = (5 + 4)/2 = 4.5. From this matrix, the two most similar organisms are a and b, and the length of this branch is calculated as the distance d_ab/2, which is equal to 2 units. From this an intermediate second tree is formed. To determine the connection between the composite branches cd and ab, the distance d_(cd)(ab) is calculated as the average (5.5 + 4.5)/2, which is 5.0. This value is then divided by 2 to give 2.5 units, the branch length from each composite to their common ancestral node. Using this reasoning, a final tree (see Figure 17.6B) is produced showing the relationship among the four different strains.

The UPGMA method is the simplest of the distance methods used. More sophisticated distance methods include transformed distance and neighbor-joining methods, which will not be discussed here.

As mentioned earlier, in maximum parsimony the goal is to identify the simplest tree that could explain the difference between two different sequences or species. This approach has its philosophical basis in Occam’s razor, commonly used in the sciences, which states that the most likely solution to a problem is the simplest one. In this case, to explain the evolutionary difference between two species, one looks at the tree that has the fewest changes (mutational events) that could explain their differences. This is accomplished with a computer that, in theory, con-


Figure 17.6 Phylogenetic analysis. Phylogenetic analysis of four different strains (a, b, c, and d) using a hypothetical region of their 16S rRNA that contains nine bases (A). (B) The UPGMA method of determining a phylogenetic tree. (C) The maximum parsimony method (see text for details).

(A) 16S rRNA sequences

This table shows the sequence of a nine-base region of the 16S rRNAs of the four strains.

Organism    Site number
(strain)    1  2  3  4  5  6  7  8  9
a           G  C  G  G  A  C  A  A  A
b           G  A  C  G  C  C  A  A  G
c           G  A  A  A  U  C  U  A  A
d           G  A  A  A  G  C  U  A  G

(B) UPGMA method

1 Construct a distance matrix showing the relatedness (as distance, d) between strains to determine the most closely related strains, in this case c and d.

First matrix
     a           b           c
b    d_ab = 4    -           -
c    d_ac = 5    d_bc = 5    -
d    d_ad = 6    d_bd = 4    d_cd = 2

2 Diagram the relatedness between these strains.

Beginning tree: c and d each join a common node by a branch of length d_cd/2 = 1.

3 Construct a second matrix to assess the distance between the first paired group (cd) and the remaining strains (a and b).

Second matrix
     (cd)               a
a    d_(cd)a = 11/2     -
b    d_(cd)b = 9/2      d_ab = 4

4 Since a and b are close to one another, determine and diagram their relatedness.

Second tree: a and b each join a common node by a branch of length d_ab/2 = 2.

5 Determine the relatedness between cd and a and b, and diagram this.

Final tree: the (cd) node and the (ab) node join at a common node, where d_(cd)(ab)/2 = [(11/2 + 9/2)/2]/2 = 2.5.
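The UPGMA arithmetic of Figure 17.6B can also be followed in a few lines of code. The Python sketch below is illustrative only: the names `SEQS`, `hamming`, and `upgma` are invented for this example, distance is taken as the plain count of differing sites, and ties between equally close pairs (which do not arise with these data) would be broken arbitrarily.

```python
from itertools import combinations

# Aligned nine-base region from Figure 17.6A.
SEQS = {
    "a": "GCGGACAAA",
    "b": "GACGCCAAG",
    "c": "GAAAUCUAA",
    "d": "GAAAGCUAG",
}

def hamming(s, t):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(s, t))

def upgma(seqs):
    """Return the merges as (cluster_1, cluster_2, node_height) tuples."""
    # Each cluster is a tuple of strain names; start with one per strain.
    clusters = [(name,) for name in seqs]
    dist = {
        frozenset((p, q)): float(hamming(seqs[p[0]], seqs[q[0]]))
        for p, q in combinations(clusters, 2)
    }
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters in the current matrix.
        p, q = min(combinations(clusters, 2),
                   key=lambda pair: dist[frozenset(pair)])
        # The new node sits at half the distance between the pair.
        merges.append((p, q, dist[frozenset((p, q))] / 2))
        merged = p + q
        clusters = [c for c in clusters if c not in (p, q)]
        for c in clusters:
            # Size-weighted mean, i.e. the arithmetic mean over all
            # strain-to-strain distances between the two groups.
            dist[frozenset((merged, c))] = (
                len(p) * dist[frozenset((p, c))]
                + len(q) * dist[frozenset((q, c))]
            ) / len(merged)
        clusters.append(merged)
    return merges
```

With the four sequences above, `upgma(SEQS)` joins c with d at a node height of 1, then a with b at a height of 2, and finally the two pairs at a height of 2.5, in agreement with the figure.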

siders all possible trees and then identifies the simplest one (the one with the fewest assumed mutational events). As for all phylogenetic analyses, the alignment of the sequences must be accurate.

In maximum parsimony it is important to recognize sites in the sequences that are useful for a comparison between organisms. These are termed informative sites. These sites are then used to determine the most parsimonious tree. For example, Site 1 in the example given (see Figure 17.6C) is not informative because all the bases are identical. Site 2 is not informative either, because three of the strains have A and one has C, suggesting that a single mutational event has occurred. Site 3 is not informative, because Trees 1 and 2, which have two changes, are equally parsimonious, and Tree 3 differs from Tree 2 only in the inferred ancestor. Site 5 is not informative because every tree constructed from the information at this site requires three changes. In contrast, Site 4 is informative, because one tree (Tree 1) is more parsimonious than the other two. Sites 6 and 8 are


(C) Maximum parsimony method

1 Identify the informative sites in the sequences. Here, construction of trees for sites 3, 4, 5, 7, and 9 shows that only sites 4, 7, and 9 are informative, because one tree is more parsimonious at these sites than are the others.

Three possible trees
Tree 1: a grouped with b, c grouped with d. Following a maximum parsimony analysis, this proves to be the most parsimonious tree.
Tree 2: a grouped with c, b grouped with d.
Tree 3: a grouped with d, b grouped with c.

Changes marked on the trees as drawn, by site
Site 3: two changes in trees 1 and 2; three changes in tree 3
Site 4: one change in tree 1; two changes in other trees
Site 5: three changes in all trees
Site 7: one change in tree 1; two changes in other trees
Site 9: one change in tree 2; two changes in other trees

Informative sites are at positions 4, 7, and 9.

2 Determine the most parsimonious tree by analyzing each of the informative sites.

Mutations (changes) in trees at informative sites:
Tree 1    1 + 1 + 2 = 4
Tree 2    2 + 2 + 1 = 5
Tree 3    2 + 2 + 2 = 6
Therefore, tree 1 is the most parsimonious tree.
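The counting in Figure 17.6C can be sketched in code as well. The example below is a hedged illustration, not the procedure used to draw the figure: it swaps in Fitch's counting rule, a standard way to obtain the minimum number of changes a given tree needs at one site, and the names `TREES`, `fitch`, `changes`, and `informative_sites` are hypothetical. A site is treated as informative when the three trees do not all need the same number of changes.

```python
# Aligned nine-base region from Figure 17.6A.
SEQS = {
    "a": "GCGGACAAA",
    "b": "GACGCCAAG",
    "c": "GAAAUCUAA",
    "d": "GAAAGCUAG",
}

# The three ways of pairing four strains (Figure 17.6C).
TREES = {
    "Tree 1": (("a", "b"), ("c", "d")),
    "Tree 2": (("a", "c"), ("b", "d")),
    "Tree 3": (("a", "d"), ("b", "c")),
}

def fitch(tree, index):
    """Return (candidate ancestral bases, minimum changes) at one site."""
    if isinstance(tree, str):              # a tip: the observed base
        return {SEQS[tree][index]}, 0
    left_set, left_cost = fitch(tree[0], index)
    right_set, right_cost = fitch(tree[1], index)
    shared = left_set & right_set
    if shared:                             # no extra change needed here
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def changes(tree_name, site):
    """Minimum changes for a named tree at a 1-based site number."""
    return fitch(TREES[tree_name], site - 1)[1]

def informative_sites():
    """Sites at which the trees differ in the changes they require."""
    return [site for site in range(1, 10)
            if len({changes(name, site) for name in TREES}) > 1]

def totals():
    """Total changes per tree, summed over the informative sites."""
    sites = informative_sites()
    return {name: sum(changes(name, s) for s in sites) for name in TREES}
```

Run on the example data, `informative_sites()` gives sites 4, 7, and 9, and `totals()` gives 4, 5, and 6 changes for Trees 1, 2, and 3, the same sums as in the figure. Under this minimum-change count, site 3 needs two changes on each of the three trees, so it is again classed as uninformative.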

not informative because all bases are identical, but both Sites 7 and 9 are informative. Site 7 favors Tree 1, whereas Site 9 favors Tree 2. Thus, for this set of data, Tree 1 is favored two of three times, Tree 2 is favored one of three times, and Tree 3 is not favored at any time. Adding the changes at those three sites gives the following data: Tree 1 is the most parsimonious because a total of only four changes (1 + 1 + 2 = 4) would explain its phylogeny, whereas in Tree 2, five changes are required (2 + 2 + 1 = 5), and in Tree 3 six changes are required (2 + 2 + 2 = 6).

Note that the two trees that were constructed from the distance matrix and maximum parsimony methods are identical in shape. Both trees indicate that two of the strains, a and b, are more closely related to one another than they are to c and d. Likewise, c and d are more closely related to one another than they are to a and b. As you can imagine, some rather sophisticated computer programs have been developed to handle the immense amount of information inherent in longer sequences such as 16S rDNA, which contains about 1,500 bp. Moreover, when such large sets of data are being analyzed, it is often not possible to determine that a proposed tree is, in fact, the true tree. Indeed, all trees should be considered as hypotheses until additional information has been analyzed. For example, the inclusion of a sequence from a newly discovered, closely related organism may alter the shape of a tree when it is included in the analysis.

To help support the reliability of a given tree, other techniques are used. For example, in “bootstrap” analyses, random portions of the sequence are selected by the computer, and the trees formed from them are compared with the proposed tree to assess the statistical significance of the proposed tree. In this statistical approach, some 100 or 1,000 different bootstrap comparisons might be made and provided as evidence that the proposed tree is indeed the most parsimonious one. Bootstrap analyses are applicable for all phylogenetic treatment procedures.


As mentioned previously, other analytical methods can be used to analyze sequence information and construct phylogenetic trees. A common one used by microbiologists is the maximum likelihood method, which involves selecting trees that have the greatest likelihood of accounting for the observed data. This is accomplished by assigning a probability to the mutation of any one base to any other base at each possible sequence position. From this, all possible topological trees are constructed. By integrating the probabilities for each mutation over each tree, a degree of improbability for a tree is assessed. The least improbable tree is chosen as the “true” tree.

A final note on studying microbial phylogeny: It should be emphasized that many other genes besides 16S rDNA can be used for phylogenetic analysis. An appropriate gene must have the degree of conservation necessary for the analysis desired and must have a homolog in the other organisms of interest. Homologous genes are genes that share a common ancestry.

The process by which organisms evolve is termed speciation. As with plants and animals, bacteria evolve in habitats. However, unlike plants and animals, bacteria can evolve very quickly because of their rapid growth rates, high population sizes, and haploid genomes that allow for the rapid expression of favorable mutations. Thus, a lineage of bacteria is determined in large part through vertical inheritance, the process by which the parental genotype is transferred to the progeny cells following DNA replication and cell division.

However, bacteria can also acquire genetic material from other organisms through their various genetic exchange mechanisms: conjugation, transformation, or transduction (see Chapter 15). This phenomenon is referred to as horizontal (or lateral) gene transfer (HGT) to distinguish it from vertical inheritance. As a result, prokaryotic organisms may undergo dramatic changes in their population structure in a relatively short period. For example, we know that multiple-drug resistance can be rapidly acquired by a bacterial species that is sensitive to antibiotics if it is exposed to antibiotics in the presence of other antibiotic-resistant bacteria.

Consider the situation of a bacterium that is a member of the normal microbiota of the intestinal tract and is exposed to an antibiotic to which it is sensitive. The bacterium may either perish or, if a gene that confers resistance is available in the environment and the bacterium has the capability, acquire the resistance gene through an HGT process and survive. This example of a strong selective pressure likely explains how it is possible for sensitive bacteria to quickly become resistant to an antibiotic. This scenario applies equally to other environmental pressures that confront bacteria, such as exposure to potentially toxic hydrocarbons that are used as an energy source by other species in the environment. Many of the known examples of rapid genetic change occur through the acquisition of plasmids from related organisms. Thus, in the preceding examples, some plasmids are known that carry multiple antibiotic-resistance genes and others are known that carry hydrocarbon-degrading genes.

In addition, we do know that genes can be acquired from distantly related organisms. For example, it has been recently reported that some members of the Proteobacteria and other phyla of the Bacteria have been found to contain a gene responsible for bacteriorhodopsin synthesis, which was previously known only from members of the Archaea. Thus, this example appears to represent the transfer of genetic material across two different domains as well as phyla within the Bacteria. Another example is the bacterium Agrobacterium tumefaciens, which naturally transfers genetic material to plants (Chapters 16 and 19).

From an analysis of genomes that have been sequenced, genes that have been derived from other organisms have been identified in some prokaryotic genomes (Figure 17.7). The transfers between phyla appear to be relatively rare events. In addition, it should be noted that each species has hundreds of core genes that can be used to compare its relatedness to other organisms. If HGT occurred too extensively, it could confuse phylogenetic classifications based on vertical inheritance to such an extent that it would render them utterly useless. Fortunately, this does not appear to pose a major problem for 16S rRNA gene trees.

SECTION HIGHLIGHTS
Both phenotypic and genotypic properties have been used to describe microorganisms, and both are important in describing and naming new prokaryotic species. Artificial taxonomy entails the use of phenotypic tests, whereas a taxonomy based on evolutionary processes relies on molecular phylogeny. Molecular phylogeny uses 16S rRNA gene sequence and protein sequence analyses, which have become very important in the classification of Bacteria and Archaea. Several different methods, such as distance, parsimony, and maximum likelihood methods, are used to construct phylogenetic trees.


Figure 17.7 Horizontal gene transfer. Analyses of sequenced bacterial genomes indicate that a significant proportion of their genes can be traced to other phylogenetic groups, indicating the importance of horizontal gene transfer (HGT) in bacterial speciation. This diagram shows the proportion of DNA that was acquired by HGT (in red) in some microbial genomes. The chart has one bar per sequenced genome, listed from Pseudomonas aeruginosa and Escherichia coli through Mycoplasma genitalium, plotted against megabases of protein-coding DNA (axis from 0 to 6). Courtesy of Jeffrey Lawrence.

17.3 Taxonomic Units Each colony or culture of an organism represents an individual strain or clone in which all of the cells are de- The basic taxonomic unit is the species, although as men- scended from one single organism. In a somewhat dif- tioned earlier, some species have subspecies categories ferent sense of meaning, a strain can also refer to a mu- as well. The categories above the species are (sequen- tant of a species that has changed characteristics (for tially) genus, family, order, class, phylum, and domain example, lacks a particular gene). The strain, however, (see Table 17.1). It should be noted that uncertainties ex- is not considered a formal taxonomic unit, and Latin ist in bacteriology about the meaning of the higher tax- names are therefore not ascribed to strains; they have onomic categories such as kingdom because oftentimes only informal designations, such as E. coli strain K12. the phylogenetic markers that have been used (prima- There can also be varieties within species that exhibit rily 16S rDNA sequences) cannot definitively resolve the differences. These are called biovars (i.e., biological earliest branching points in the Tree of Life. Thus, al- varieties). For example, a serological variety such as E. though we know that each of these major branches is coli O157:H7 is a pathogenic serovar that causes he- equivalent to the and animal “kingdoms,” how the molytic uremic syndrome and can be lethal to children microbial “kingdoms” or phyla are related to one an- who become infected by eating contaminated food. Like- other is only poorly understood. wise, pathogenic varieties are termed pathovars, ecolog-

This material cannot be copied, disseminated, or used in any way without the express written permission of the publisher. Copyright 2007 Sinauer Associates Inc. 500 Chapter Seventeen

ical types ecovars, and so forth. Now let us look at the individual taxa beginning with the species to see what features are typical at each taxonomic level.

The Species

The definition of a bacterial species differs from that of plants and animals. In these organisms, the classical species is defined as a group of individuals (males and females) that exhibit evident morphological similarities and produce fertile progeny through sexual reproduction. Indeed, the production of progeny in many animals such as mammals requires sexual reproduction.

Although gene exchange occurs in prokaryotic organisms, it is not essential for reproduction. Most bacterial reproduction is asexual and occurs by simple binary transverse fission or budding. In prokaryotic organisms, sexuality is uncommon and different from that of eukaryotes. Eukaryotes produce haploid gametes in meiosis (see Chapter 1). During sexual reproduction, the haploid gametes (egg and sperm) from the male and female fuse to form the diploid zygote. In conjugation, DNA from one cell is transferred during replication to a receptor cell; however, only partial diploidy occurs. Genetic material can also be transferred by other mechanisms such as transformation and transduction (see Chapter 15). These transfers are not always restricted to members of the same species.

A bacterial species comprises a group of organisms that share many phenotypic properties and a common evolutionary history and are therefore much more closely related to one another than to other species. This definition, which is very subjective, has been interpreted differently by bacteriologists in describing species. For example, at one extreme some taxonomists are called lumpers because they group (or “lump”) fairly diverse organisms into a single species or genus. An example of a lumper is F. Drouet, who has proposed reducing the number of cyanobacteria from 2,000 species to only 62! At the opposite pole are splitters. These are taxonomists who consider even the slightest differences sufficient for a new species. For example, many years ago it was proposed that the genus Salmonella be “split” into hundreds of different species, a separate species for each of the hundreds of different serotypes (or serovars) that are recognized based on specific cell-surface antigens of their lipopolysaccharides and flagella. However, the views of lumpers and splitters illustrated here are considered to be extreme and are not accepted by the majority of microbiologists.

In fact, more recently, a less arbitrary, quantitative basis has been proposed to define a bacterial species. Agreement was reached by a group of prominent bacterial taxonomists to define a bacterial species based on genomic similarity between strains. Accordingly, a species is defined as follows: two strains of the same species must have a similar mole percent guanine plus cytosine content (mol % G + C) and must exhibit 70% or greater DNA–DNA reassociation. The procedures used to determine these features are described here.

MOLE PERCENT GUANINE PLUS CYTOSINE (MOL % G + C) The mol % G + C refers to the proportion of guanine and cytosine to total bases (guanine, cytosine, adenine, and thymine) in the DNA. Recall that because G and C are paired in the double-stranded DNA molecule by hydrogen bonds, as are A and T, they occur in equal concentrations. The formula is given as:

$$\text{mol \% G + C} = \frac{\text{moles}(G + C)}{\text{moles}(G + C + A + T)} \times 100$$

Several methods can be used to determine the mol % G + C, sometimes also called the “GC ratio” of a bacterium. All of them require that the DNA be first isolated from a bacterium and purified. Thus, it is necessary to lyse the cells to release the cytoplasmic constituents including DNA, and the DNA must then be purified to remove proteins and other cellular material. Cell lysis is typically accomplished by treatment with lysozyme and detergents, and the DNA is precipitated with ethanol. When the DNA has been sufficiently purified, it can be analyzed chemically to determine the content of each of the bases. Several different procedures can be used to determine the GC ratio of the purified DNA. We will describe two of them here.

A common procedure to determine GC ratios is by thermal denaturation. The principle behind this method is that the hydrogen bonds between the double strands can be broken by heating dissolved DNA. As the hydrogen bonds are broken and the two strands separate, the absorbance of the DNA increases. This procedure, called “melting the DNA,” is conducted with a spectrophotometer set at 260 nm, a wavelength at which DNA absorbs strongly. The hydrogen bonding of the GC base pair is stronger than the AT pair in the double-stranded DNA molecule. Therefore, a higher temperature is required to melt DNA that has a high content of GC pairs, that is, a high GC ratio. Figure 17.8 shows a graph of the melting of a double-stranded DNA molecule. This process is accomplished by gradually increasing the temperature of a solution of the DNA in an appropriate buffer (ionic strength is important). As the temperature is increased, the melting process begins and continues until the double-stranded DNA molecule is completely converted to the single-stranded form. The absorbance increases during this melting process. The midpoint temperature (Tm) is directly related to the GC ratio of the DNA. Thus, the


GC ratio can be read from a chart showing the relationship between Tm and GC content (Figure 17.9). Once the DNA has been melted, it will reanneal if the temperature is slowly lowered. Thus, the process shown in Figure 17.8 is reversible. However, if the solution is cooled rapidly, double-strand formation does not recur, and the molecules of DNA are left in the single-stranded state.

Figure 17.8 DNA melting curve
Melting curve for a double-stranded molecule of DNA. As the temperature is increased during the experiment, the double-stranded DNA is converted to the single-stranded form and the UV absorbance of the solution increases. The midpoint temperature, Tm, can be calculated from the curve. This process is reversible if the temperature of the solution is slowly decreased to allow the single strands to reanneal. The Tm of this species, Escherichia coli, can be used to determine its mol % G + C content (see Figure 17.9). [Plot of UV absorbance (at 260 nm) against temperature, 80 to 100°C. For each DNA, the midpoint of the melting process occurs at a characteristic midpoint temperature, Tm, in this case 90°C.]

Figure 17.9 Tm and DNA base composition
Graph showing the direct relationship between mol % G + C and midpoint temperature (Tm) of purified DNA in thermal denaturation experiments. [Plot of mol % G + C, 0 to 100, against Tm, 60 to 110°C. Organisms labeled, from highest to lowest GC content: Mycobacterium phlei, Pseudomonas, Serratia, E. coli, Bacillus subtilis, Cytophaga.]

Another method is to use the “readout” from a genome sequence, which contains all of the genetic information of a species.

Figure 17.10 shows the range of GC ratios in various groups of organisms. On this basis alone, one can see that bacteria, which have GC ratios ranging from approximately 20 to greater than 70, are truly a very diverse group. In contrast, higher organisms such as animals have a very restricted GC ratio range.

Figure 17.10 DNA base composition range
Range of mol % G + C content among various groups of organisms. Note the broad range of GC ratios for bacteria, archaea, and the lower eukaryotes in comparison to plants and animals. [Horizontal axis: mol % G + C, 0 to 100. Groups shown: prokaryotes (Bacteria, Archaea) and eukaryotes (humans, animals, plants, algae, fungi, protozoa).]

The GC ratio provides only the relative amount of guanine and cytosine compared to total bases in the DNA of an organism and says nothing about the inherent characteristics of the organisms or what genes are present. Indeed, two very different organisms can have similar or even identical GC ratios. For example, the DNAs of Streptococcus pneumoniae and humans have the same mol % G + C content.

DNA–DNA REASSOCIATION OR HYBRIDIZATION Although the determination of GC ratio is useful in bacterial taxonomy, it does not tell us anything about the linear arrangement of the bases in the DNA. It is the arrangement of the DNA subunits that codes for specific genes and proteins and therefore determines the features of an organism. DNA–DNA reassociation or hybridization is one method used to compare the linear order of bases in two different organisms (Box 17.3). It is important to recognize that the ≥ 70% level of reassociation used for the species definition does not indicate that the two DNAs are 70% homologous or identical.

In DNA–DNA reassociation, the actual order of bases in the DNA is not determined, but rather the extent of reannealing between the DNAs of two different strains is assessed. Ideally one would like to know the actual


BOX 17.3 Methods & Techniques

DNA–DNA Reassociation

DNA–DNA reassociation can be performed by using a variety of different methods. In all approaches, it is necessary to begin with purified DNA from the two organisms that are being compared. The DNA is first cut into smaller segments (i.e., sheared) and then denatured by melting. DNA from the two different strains are mixed and allowed to cool together to allow reannealing to occur. This reannealing will occur both between DNA strands of the same species and between strands of the comparison species. The degree of reannealing depends on how similar the DNAs are to one another. If two strains are very similar, their DNAs will reanneal to a high degree. In contrast, if two strains are very different, then the extent of reannealing will be much less.

One way to perform DNA–DNA reassociation is to radiolabel the DNA by growing the bacterium with tritiated thymidine or 14C-labeled thymidine (other DNA bases or 32P labeling can be used as well). If the bacterium takes up this labeled substrate and incorporates it into DNA, then the DNA becomes labeled. Alternatively, the DNA can be purified from the bacterium and labeled enzymatically in the laboratory.

After the DNA has been labeled and purified, it is sheared to an appropriate length by sonication. It is then ready for the hybridization experiments. First, single-stranded DNA is prepared. This is accomplished by heating the isolated DNA molecules to render them single-stranded and then cooling them rapidly to prevent reannealing.

First, let’s look at the control assay for the DNA reassociation experiment. In this case, a small amount of sheared radiolabeled DNA is rendered single-stranded. This is then mixed with a much larger amount of unlabeled DNA obtained from the same bacterial strain. These are heated together and cooled slowly to allow the two single-stranded groups to reanneal to form hybrid double strands. Because the amount of labeled DNA relative to the unlabeled DNA is small, there is a very low probability that it will reanneal with other labeled strands. Most of the reassociations will occur between unlabeled strands, and most of the remainder will be between the labeled and unlabeled strands. The single-stranded fragments that did not reanneal are removed by enzyme digestion using S-1 endonuclease, which specifically degrades only single-stranded DNA, and the double-stranded fragments are collected on a membrane filter or in a column. The amount of radioactivity remaining on the filter or on the column, after washing to remove low-molecular-weight material, represents the amount of hybrid formation between the labeled and unlabeled DNA for this identical strain. This is the control reaction, and the amount of radiolabel (the extent of hybridization) is considered to be 100%.

To determine the extent of reassociation between the strain described and an unknown strain, similar experiments need to be performed. In this instance, unlabeled single-stranded DNA from the unknown strain is prepared and mixed with the known strain for which we have labeled the DNA. As indicated previously, those strains that show 70% or greater reassociation or hybrid formation with the labeled strain (determined by the amount of hybrid DNA that is radiolabeled compared with the same strain control of 100% shown in the figure) are considered to be the same species. Anything less is considered a different species.

The temperature and salt concentration at which the DNA–DNA reannealing occurs will influence the degree of reassociation between single strands. Scientists conducting DNA–DNA reassociation experiments typically use a reannealing temperature that is 25°C lower than the average midpoint temperature (Tm) of the DNAs being compared. This temperature is sufficiently high that only those sequences that are most complementary will reanneal. Therefore, this is considered a stringent condition for reannealing.

[Figure: unlabeled DNA and radiolabeled DNA carried through the following steps]

1. Mix radiolabeled single-stranded DNA (obtained by labeling double-stranded DNA, then shearing and denaturing it) with large amounts of unlabeled single-stranded DNA segments from (in this control experiment) the same strain.
2. Heat the solution, then slowly cool.
3. Reannealing occurs between complementary segments.
4. Treat the solution with S-1 endonuclease to digest any remaining single-stranded segments.
5. Collect the double-stranded segments on a membrane filter, and measure the amount of radiolabel; this reveals the degree of reassociation (similarity between strains).

DNA–DNA reassociation. In this example, which is a control experiment (the radiolabeled sample is reannealed with unlabeled DNA from the same strain), the degree of reassociation is highest and treated as 100%. If a different strain is reannealed with the radiolabeled DNA, it will show a lower degree of reannealing (compared with the 100% attributed to the control), indicative of the similarity between the two strains being tested. Strains with reannealing values of 70% or greater are considered to be the same species.
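The arithmetic behind the box can be sketched in a few lines of Python. This is a minimal illustration, not a laboratory protocol: the function names, the example counts, and the optional background correction are assumptions made for the sketch. Only the 100% same-strain control and the 70% cutoff come from the text.

```python
def percent_reassociation(test_counts, control_counts, background_counts=0.0):
    """Express hybrid formation with a test strain as a percentage of the
    same-strain control, which the box treats as 100%.

    background_counts is a hypothetical correction for radiolabel retained
    on the filter without hybridization; it is not described in the box.
    """
    control = control_counts - background_counts
    if control <= 0:
        raise ValueError("control counts must exceed background counts")
    test = max(test_counts - background_counts, 0.0)
    return 100.0 * test / control


def same_species_by_reassociation(percent, threshold=70.0):
    """Apply the 70%-or-greater cutoff used in the species definition."""
    return percent >= threshold


# Hypothetical radiolabel counts retained on the filter.
control = 12000.0   # labeled strain reannealed with its own unlabeled DNA
unknown = 9000.0    # labeled strain reannealed with the unknown strain's DNA

pct = percent_reassociation(unknown, control)
print(f"{pct:.1f}% reassociation, same species: {same_species_by_reassociation(pct)}")
```

With these made-up counts the unknown strain gives 75% of the control signal and so would be assigned to the same species. A value below 70% would place it in a different species.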

sequence of genes of a species. Indeed, sequencing entire bacterial genomes has become quite common (see Chapter 16). It is worthwhile noting that one could consider that the actual DNA base sequence of a strain is the ultimate definition of a strain, analogous to the chemical formula for a compound. However, it is important to recognize that, unlike chemical compounds, bacterial strains and species are not static; they continue to evolve.

Interestingly, the bacterial species definition appears to be much broader than that used for animals and plants when one considers DNA–DNA reassociation and other molecular criteria. For example, the bacterial species E. coli can be compared to its host mammalian species using a variety of molecular features, including the range in GC ratio, 16S rDNA sequence (versus 18S rDNA sequence), and DNA–DNA reassociation (Table 17.2). Although there is essentially no variation in the range in GC ratio for the human species, it is about 4 mol % within the E. coli species. Indeed, there is less variation in GC ratio in the order Primates than there is in the species E. coli. Likewise, the 16S rDNA sequences of E. coli possess more than 15 substitutions, whereas the difference between 18S rDNA of the mouse (order Rodentia) and the human is less than 16. Finally, DNA–DNA reassociation data, which are used to define the bacterial species at 70% or greater, indicate that humans are much more similar to one another than strains of E. coli are. Therefore, it is evident that the typical bacterial species is equivalent to a genus or family of mammals based on molecular divergence, indicating that a bacterial species is defined much differently from its eukaryotic counterparts. This difference is further evidenced by the biological species definition for animals, which requires that within a species, mating between sexes produces fertile progeny.

The Genus

All species belong to a genus, the next higher taxonomic unit. When DNA–DNA reassociation is performed


TABLE 17.2 Comparison of E. coli and its host species (a)

Comparison                           Mole % G + C   16S or 18S rRNA substitutions   DNA/DNA reassociation
Among E. coli                        48–52          >15 bases                       >70%
Among Homo sapiens                   42             —                               —
Among all primates                   42             —                               —
Between H. sapiens and mouse         —              16 bases                        —
Between H. sapiens and chimpanzee    —              —                               98.6%
Between H. sapiens and lemurs        —              —                               >70%

(a) Adapted from J. T. Staley, ASM News, 1999.
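The mol % G + C values compared in the table follow from the formula given earlier in the chapter, and the genomic species definition combines that value with the reassociation result. The sketch below shows both steps under stated assumptions: the function names are illustrative, the toy sequences are invented, and the 5 mol % tolerance for a "similar" GC content is a hypothetical choice, since the chapter does not give a numeric limit.

```python
def mol_percent_gc(sequence):
    """mol % G + C = (G + C) / (G + C + A + T) x 100, counted on one strand.

    Because G pairs with C and A pairs with T, the value for one strand
    equals the value for the double-stranded molecule.
    """
    seq = sequence.upper()
    counts = {base: seq.count(base) for base in "GCAT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * (counts["G"] + counts["C"]) / total


def meets_species_criteria(gc_a, gc_b, reassociation,
                           max_gc_difference=5.0, min_reassociation=70.0):
    """Two strains qualify if their GC contents are similar AND they show
    70% or greater DNA-DNA reassociation.

    max_gc_difference is an assumed tolerance, not a value from the text.
    """
    similar_gc = abs(gc_a - gc_b) <= max_gc_difference
    return similar_gc and reassociation >= min_reassociation


print(mol_percent_gc("GGCCAATT"))               # toy sequence, half G + C
print(meets_species_criteria(48.0, 52.0, 75.0)) # within the E. coli range in the table
print(meets_species_criteria(48.0, 52.0, 60.0)) # reassociation too low
```

A similar GC content on its own is not sufficient, which is why the second function requires both conditions: the chapter notes that very different organisms can share the same GC ratio.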

within a genus, some species of the genus may show little or no significant reassociation with other species. This does not indicate that they are unrelated to one another, only that this technique is too specific to identify outlying members of the same genus. Therefore, DNA–DNA hybridization has limited utility for determining whether a species is a member of a known bacterial genus.

The definition of the genus is based on one or more prominent phenotypic characteristics that permit it to be distinguished from its closest relatives. Oftentimes some striking physiological or morphological feature is present that permits the genus to be differentiated from closely related taxa. For example, the genus Nitrosomonas is a group of rod-shaped bacteria that grow as chemoautotrophs, gaining energy from the oxidation of ammonia. Other ammonia oxidizers with coccus-shaped and helical cells are placed in other genera. Of course, all strains of each of these genera need to be more closely related to one another phylogenetically than to strains of other genera in a phylogenetic classification. Ideally then, the genus makes up a monophyletic lineage (i.e., one in which all are members of the same phylogenetic cluster or clade).

Higher Taxa

Odd bedfellows are sometimes found in phylogenetic trees; for example, photosynthetic and nonphotosynthetic members of some closely related groups have been reported. Of course, loss of a key gene or two may result in converting a formerly photosynthetic organism to one that is not photosynthetic. Consequently, although the plant and animal kingdoms are differentiated on the basis of whether or not they are photosynthetic, both features have been reported in two closely related bacterial genera. However, because phenotype is so important at the genus level, important features such as photosynthesis are sufficient to proclaim a separate genus, even though two groups are otherwise very closely related.

As mentioned previously, the taxonomy of prokaryotes is undergoing major changes. Although according to classical taxonomy each genus belongs to a family of similar genera, relatedness at the familial and higher level is often uncertain for bacteria. Bacteriologists have been reluctant to ascribe organisms to formal Latinized families and orders. However, as more becomes known about bacterial phylogeny, it is increasingly apparent that higher taxonomic levels do have meaning and can be distinguished from one another by comparing the sequences of certain macromolecules, as is reflected in the new edition of Bergey’s Manual of Systematic Bacteriology.

SECTION HIGHLIGHTS
Bacteria and Archaea are classified in a hierarchical structure from the domain level to the phylum, class, order, family, genus, and finally species. Species are described based on both phenotypic and genotypic properties including GC ratio, 16S rRNA sequence, and DNA–DNA hybridization.

17.4 Major Groups of Archaea and Bacteria

Bacteriologists have begun to construct classifications using phylogenetic information from rRNA analyses. As mentioned in Chapter 1, some prokaryotes are very different from others, a revelation that came through an analysis of rRNA (Box 17.4). Ribosomal RNA data allow the division of all organisms on Earth, prokaryotic and eukaryotic, into three domains: Bacteria, Archaea, and Eukarya (see Figure 1.6). These domains can also be distinguished from one another by phenotypic testing. For example, consider the cell envelope composition of the organisms. Peptidoglycan is found only in Bacteria, although two groups, the mycoplasmas and the Planctomycetales, lack it. Furthermore, the lipids of the Bacteria and Eukarya are ester-linked, whereas they are ether-linked in the Archaea (see Chapters 4 and 18).

BOX 17.4 Research Highlights

The Discovery of Archaea

Clearly, one of the most exciting developments in bacterial classification of the twentieth century was the discovery that there are two major groups called Domains, named the Bacteria and the Archaea. The appreciation of the difference between the Bacteria and the Archaea was the culmination of years of research by microbiologists throughout the world. However, the final piece of evidence that convinced microbiologists of this dichotomy was the discovery that these organisms had very different 16S rRNAs. This research was performed in Carl Woese’s laboratory at the University of Illinois. At that time, rRNA sequencing was not done routinely in laboratories. Instead, 16S rRNA was purified and digested by ribonucleases that cut between specific nucleotide pairs. The oligonucleotide fragments produced were subjected to two-dimensional electrophoresis. The pattern of spots on a two-dimensional chromatogram represented the various rRNA oligonucleotides that were typical of each species. Studies from Woese’s laboratory demonstrated that the patterns were very different for Bacteria and Archaea. Indeed, their studies of 18S rRNA from eukaryotic organisms indicated that the Archaea are as different from Bacteria as they are from eukaryotes. The seminal findings of this work have forever changed the way microbiologists view taxonomy and phylogeny.

[Photograph: Carl Woese. Courtesy of Jason Lindsey.]

At this time, 28 different phylogenetic groups, referred to here as phyla, are known. Included are 24 phyla of Bacteria and four phyla of Archaea. Each of these phyla has specific signature sequences in their ribosomes that are distinctive to them. The prokaryotic phyla are listed here along with a brief description of their major features. They are treated in more detail in subsequent chapters (Chapters 18 through 22), and the groups in each chapter are indicated below. It is noteworthy that many new phyla of the Bacteria, in particular, have been discovered in natural environments using clone library approaches, but have not yet been isolated in pure culture (Box 17.5). Thus, at least 10 to 15 additional major prokaryotic groups are very poorly understood.

The second edition of Bergey’s Manual of Systematic Bacteriology has been largely followed in organizing the bacterial groups treated in this book. The Archaea are described in Chapter 18 and the Bacteria in Chapters 19–22 (Table 17.3).

Domain: Archaea

The Archaea are divided into the following four phylogenetic groups or phyla.

CRENARCHAEOTA This phylum contains the most thermophilic organisms known. Some of these organisms grow at temperatures higher than the boiling point of water. Most rely on sulfur metabolism either as an energy source or as an electron sink. For example, some oxidize reduced sulfur compounds aerobically to produce sulfuric acid. Others reduce elemental sulfur and use it as an electron acceptor to form hydrogen sulfide. Some are iron and manganese reducers. Not all organisms are thermophilic. They are also significant in deep-sea environments as well as in polar seas.

EURYARCHAEOTA The methanogens (methane producers) are noted for their ability to produce methane gas


BOX 17.5 Research Highlights

Novel Phyla Discovered by Molecular Analyses of Natural Habitats

One of the major advances in exploring the diversity of microorganisms in natural environments has been the application of molecular approaches developed in Norman Pace’s laboratory. In the most recent variation of this approach, DNA is extracted from the environment of interest. Then PCR is used to amplify genes of interest from the DNA. For phylogenetic (i.e., diversity) information, 16S rDNA primers for the Bacteria or Archaea or universal primers that amplify both groups are commonly used. The segments retrieved, typically about 500 bp, can then be sequenced to identify the phylogenetic groups. Although PCR approaches are not quantitative, the various 16S rDNA types retrieved from a natural sample can provide important information about the diversity of prokaryotes that occur in that environment. Using these approaches, it has recently been determined that more than 50 major phyla of Bacteria exist, yet isolates have been obtained of only 24.

[Figure: phylogenetic tree. Branch labels: Planctomycetes, Chlamydia, OP3, Nitrospira, Acidobacterium, Termite group I, OP10, Synergistes, WS1, OS-K, Low G + C gram-positive, OP8, Flexistipes, Cyanobacteria, Actinobacteria, Green nonsulfur, Fibrobacter, Marine group A, OP5, Green sulfur, OP9, Dictyoglomus, Cytophagales, Thermus/Deinococcus, Coprothermobacter, Spirochetes, Thermotogales, TM6, Thermodesulfobacterium, WS6, Aquificales, TM7, Proteobacteria, OP11, Archaea.]

A phylogenetic tree of 16S rDNA sequences of Bacteria, based on pure cultures and clonal libraries from natural samples. Note the existence of many phyla (shown in outline rather than as solid black lines) that have not yet been cultivated. Courtesy of Phil Hugenholtz and ASM Publications (Hugenholtz, P., B. M. Goebel and N. R. Pace. 1998. J. Bacteriol. 180: 4765–4774).
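Once clone sequences have been retrieved and aligned, assigning them to groups rests on counting differences between sequences. The following is a minimal sketch of that comparison step only, assuming the fragments are already aligned to equal length with "-" marking gaps. The function names and the toy 20-base sequences are invented for illustration; real analyses use fragments of about 500 bp and dedicated alignment and tree-building software.

```python
def compare_aligned(seq_a, seq_b):
    """Return (substitutions, percent identity) for two pre-aligned sequences.

    Positions where either sequence has an alignment gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    substitutions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            substitutions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return substitutions, 100.0 * (compared - substitutions) / compared


def closest_reference(clone, references):
    """Return the name of the reference sequence most similar to the clone."""
    return max(references, key=lambda name: compare_aligned(clone, references[name])[1])


# Hypothetical aligned 16S rDNA fragments.
references = {
    "reference_A": "ACGTTGCAAGTCGAGCGGTA",
    "reference_B": "ACGATGCTAGTCGTGCGCTA",
}
clone = "ACGTTGCAAGTCGAGCGCTA"

for name, ref in references.items():
    print(name, compare_aligned(clone, ref))
print("closest:", closest_reference(clone, references))
```

In this toy case the clone differs from reference_A at one position and from reference_B at three, so it groups with reference_A. A clone that is distant from every cultivated reference is the kind of sequence that points to an uncultivated group.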


Overall organization for treatment TABLE 17.3 of Bacteria and Archaea Bergey’s Manual of Bacterial Group This Book Systematic Bacteriologya Archaea Chapter 18 Volume 1 Proteobacteria Chapter 19 Volume 2 Gram-positive bacteria Chapter 20 Volumes 3 and 5 Phototrophic bacteria Chapter 21 Volumes 1, 2, and 3 Other Chapter 22 Volume 4

aFor more complete treatment of organization of taxa see Bergey’s Manual of Systematic Bacteriology (2nd ed.). from simple carbon sources. Some use carbon dioxide present, is via the Calvin cycle in all members of this and hydrogen gas, whereas others use methanol or phylum. acetic acid. These archaea are anaerobes, some of which This phylogenetic group contains many of the well- grow at the lowest oxidation–reduction potentials of known gram-negative heterotrophic bacteria such as all prokaryotes. Some of these archaea fix carbon diox- Pseudomonas, the enteric bacteria including E. coli, Vibrio ide, but they use neither the Calvin cycle nor the reduc- and luminescent bacteria, and the more morphologically tive tricarboxylic acid (TCA) cycle. Some of the Eur- unusual bacteria such as the prosthecate bacteria. In ad- yarcheota are hyperthermophilic. Extreme halophiles dition, many symbiotic genera such as Agrobacterium, make up another phenotypic subgroup. These extremely Rickettsia, and Rhizobium are members of this group. halophilic archaea grow only in saturated salt-brine so- The gram-negative bacterial sulfate reducers such as lutions. They lyse when placed in distilled water. Desulfovibrio are also found in this group. Also included It should be noted that there is an overlap between in this phylum are the exotic myxobacteria that form the extreme halophiles and methanogens. Thus, some fruiting structures as well as unicellular, nongliding species of methanogens grow in high-salt environments. forms such as the bacterial predator Bdellovibrio. Finally, the mitochondria present in almost all eukary- NANOARCHEOTA This recently discovered group of otes evolved from this group of bacteria. the Archaea comprises obligate parasites of other mem- bers of the Archaea. They are among the smallest of or- FIRMICUTES The bacteria in this group are all gram- ganisms, hence their name. 
positive, although the Mycoplasma group lacks a cell wall altogether and therefore stains as gram-negative. All KORARCHEOTA These archaeal microorganisms have other members of the group contain large amounts of been found in hot springs, but no strains have yet been peptidoglycan in their cell wall structure. isolated in pure culture, so little is known about their phe- The Firmicutes are unicellular organisms that have a notypic properties. low mol % G + C content. Most are cocci or rods, and some produce endospores. Bacillus are aerobic or faculta- Domain: Bacteria tive spore formers, whereas Clostridium species are anaer- obic fermenters. Some are sulfate reducers. One group, The Bacteria are divided into a number of phyla, which the heliobacteria, are photosynthetic, and members pro- are described in the following text. duce a unique form of bacteriochlorophyll, Bchl g.

PROTEOBACTERIA The Proteobacteria comprise a very ACTINOBACTERIA These gram-positive bacteria range large and diverse group of organisms. All four of the in shape from unicellular organisms to branching, fila- major bacterial nutritional types are represented mentous, mycelial organisms. Most are common soil or- within this group. Some of these organisms are photo- ganisms, some of which produce specialized dissemina- synthetic (treated in Chapter 21), whereas some are tion stages called conidiospores, which enable them to heterotrophic and others are chemolithotrophic. The survive during dry periods. chemolithotrophic bacteria include the nitrifiers, the thiobacilli, the filamentous sulfur oxidizers (Beggiatoa This group contains the genus Chlo- and related genera), and many species that grow as hy- roflexus, a green gliding bacterium that is metabolically drogen . Carbon dioxide fixation, when versatile. Members can grow as heterotrophs or photo-

This material cannot be copied, disseminated, or used in any way without the express written permission of the publisher. Copyright 2007 Sinauer Associates Inc. 508 Chapter Seventeen

synthetically. Carbon dioxide is not fixed by either the The Planctomycetes group of Bacte- Calvin cycle or the reductive TCA cycle but by a special ria are budding, unicellular, or filamentous bacteria. pathway known only for this group of bacteria. These bacteria lack peptidoglycan.

CHLOROBI The green sulfur bacteria are anoxygenic The Chlamydiae are a group of oblig- photosynthetic bacteria. Some are unicellular forms, and ately intracellular parasites and whose clos- others produce networks of cells. None are motile by fla- est relatives are the Planctomycetes. They also lack pep- gella or gliding . Some have gas vacuoles. They tidoglycan. use the reductive TCA cycle rather than the Calvin cycle to fix carbon dioxide. VERRUCOMICROBIA These bacteria are unusual in that some members have bacterial tubulin genes. Very few CYANOBACTERIA The cyanobacteria are the only bac- representatives of this phylum have been isolated in teria that carry out oxygenic . This is a di- pure culture, although they comprise up to 3% of the mi- verse group of bacteria ranging from unicellular to mul- crobiota from soils. ticellular filamentous and colonial types. Some grow in association with higher plants and animals. All cyanobac- LENTISPHAERA This is a newly discovered phylum. teria use the Calvin cycle for carbon dioxide fixation. The Some isolates are marine, others are intestinal symbionts chloroplast found in all eukaryotic photosynthetic organ- of mammals. isms evolved from this group of bacteria. SPIROCHAETES The spirochetes are morphologically These bacteria are hydrogen autotrophs. distinct from other bacteria. Their flexible cells are heli- This phylogenetic group contains the most thermophilic cal. All are motile due to a special flagellum-like struc- member of the Bacteria known and makes up one of the ture, the axial filament, not found in other bacteria. deepest branches of the Bacteria. These are anaerobic bacteria, some of This is a fermentative genus that con- which live in the gastrointestinal tracts of animals. Some tains some of the most thermophilic members of the Bac- are cellulose degraders. teria known. The term “toga” refers to the outer extra- cellular material that surrounds the cells. 
They grow at These bacteria are commonly found temperatures from 55°C to 90°C. Their cell lipids are un- in soils and sediments, but few strains have been culti- usual. vated. In addition to the aerobic genus Acidobacterium, this phylum contains homoacetogenic bacteria, Holophaga, and THERMOMICROBIA The genus Thermomicrobium con- iron-reducing bacteria in the genus Geothrix. tains small, rod-shaped that grow as het- erotrophs in hot springs with an optimal temperature FUSOBACTERIA These obligately anaerobic bacteria are for growth of 70°C to 75°C. The cell wall contains very commonly found in the oral cavities and intestinal tracts low amounts of . of animals.

THERMODESULFOBACTERIA This is a group of ther- DICTYOGLOMI Species of this group are thermophilic, mophilic sulfur-reducing bacteria. obligately anaerobic fermentative bacteria.

DEINOCOCCUS–THERMUS This is a very small group of organisms currently represented by very few gen- SECTION HIGHLIGHTS . The genus Deinococcus contains gram-positive bac- Currently four phyla have been described in teria. However, they differ from other gram-positive bac- the domain Archaea and 24 have been de- teria in showing strong resistance to gamma radiation scribed in the domain Bacteria. and UV light. Thermus contains thermophilic, rod- shaped bacteria. Ornithine is the diamino acid in the cell walls of both Thermus and Deinococcus. 17.5 Identification

BACTEROIDETES This is a diverse group containing heterotrophic aerobes and anaerobes. Some are gliding heterotrophic bacteria that have a low DNA base composition (about 30 to 40 mol % G + C).

The final area of taxonomy is identification. Bacteriologists are often confronted with determining to which species a newly isolated organism belongs. Clinical microbiologists need to know whether a specific pathogenic bacterium is present so they can properly diagnose a disease. Food microbiologists need to determine whether Salmonella, Listeria, or other potentially pathogenic bacteria are present in foods. Dairy microbiologists need to keep their important lactic acid bacteria in culture to produce a uniform variety of cheese. Brewers and wine makers need to keep their cultures pure to inoculate the proper strains for quality control of their fermentations. Analysts at water treatment plants need to make sure that their chlorination treatment is effective in killing coliform bacteria in the treated water and distribution systems. Microbial ecologists need to identify bacteria that are responsible for important processes such as pesticide breakdown and .

The process of identification first assumes that the bacterium of interest is one that has already been described and named. This is the usual case for most clinical specimens or specimens from known fermentations. However, microbial ecologists often find that the organism they are interested in is new. It is estimated that less than 1% of prokaryotic species have been isolated, studied in the laboratory, and named. Therefore, it is not always possible to identify a bacterium that has been isolated from an environmental sample.

Phenotypic Tests

Phenotypic tests based on readily determined characteristics are often used to identify a species. Most are simple to perform and inexpensive. Furthermore, the amount of time and equipment required for conducting genotypic tests such as DNA–DNA reassociation precludes the use of these tests in routine diagnosis. By performing a battery of some 10 to 20 simple phenotypic tests, it is often possible to determine the genus, and perhaps even the species, of a clinically important bacterium, although some taxa are much more difficult to identify than others.

Traditional methods for identification require growing the organism in question in pure culture and performing a number of phenotypic tests. For example, if one wishes to identify a rod-shaped bacterium, the first test would be a Gram stain. If the organism is determined to be a gram-negative rod, the next questions to ask include the following: Is it motile? Is it an obligate aerobe? Is it fermentative? Does it have catalase? Can it grow using acetate as a sole carbon source? The answers to these questions will direct the investigator toward the next step in identification. Conversely, if the organism is a gram-positive rod, then a completely different set of tests would need to be performed, such as a test for endospore formation.

These tests take time to perform, and the appropriate tests for one genus of bacteria differ from those for others. Therefore, the results of one set of tests will determine which tests will need to be performed for further clarification of the taxon. Sometimes several weeks might be required to conduct all the tests needed to identify a strain. Rapid tests are extremely helpful, especially in the medical, food, and water-testing areas. Fortunately, standardized, routine tests that allow rapid identification can be performed for most clinically important bacteria. Several companies have now produced commercial kits that assist in the identification of unknowns (see Figure 30.4).

An increasingly popular approach to identification of bacterial unknowns involves characterization of their fatty acids. The fatty acids are found in membrane lipids and are readily extracted from the cells and analyzed. Different species of Bacteria produce different types and ratios of fatty acids. The Archaea, of course, do not produce fatty acids, so this procedure is not of value for their identification. However, they do produce characteristic lipids that are useful taxonomically, especially for the halobacteria.

The fatty acid analysis procedure involves hydrolyzing a small quantity of cell material (about 40 mg is all that is needed) and saponifying it in sodium hydroxide. This is acidified with hydrochloric acid in methanol so that the fatty acids can be methylated to form methyl esters. The fatty acid methyl esters (FAME) are then extracted with an organic solvent and injected into a gas chromatograph. The resulting chromatogram (Figure 17.11) can be used to identify the fatty acids that are indicative of a species. Commercial firms have developed databases of fatty acid profiles that can be used for the identification of species. The advantage of this procedure is that many samples can be analyzed quickly and without great effort. However, all organisms must be grown under controlled conditions of temperature and length of incubation and on the same medium.

Nucleic Acid Probes and Fluorescent Antisera

One exciting current area of research and commercial application involves the development of DNA or RNA “probes” that are specific for the signature sequences of rRNA or some other appropriate gene, such as one encoding an enzyme that is characteristic of a species of interest. The probes are labeled in some manner so that the hybridization can be visualized. This is accomplished either by making the probes radioactive or by tagging them with a fluorescent dye or an enzyme that gives a colorimetric reaction.

By the proper selection or design of probes, it is possible to identify an organism to a domain, genus, or species by demonstrating specific hybridization to the probe (see Chapter 25).
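The idea of probe specificity can be illustrated with a small sketch. It is a toy model only: it treats hybridization as plain base-pair matching between a probe and a single-stranded target, ignores temperature, salt, and probe chemistry, and uses invented sequences rather than real rRNA signature sequences. The function names and the `max_mismatches` parameter are hypothetical and chosen for illustration.

```python
# Toy model of probe specificity. All sequences are invented for
# illustration; they are NOT real rRNA signature sequences.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def probe_binds(probe: str, target: str, max_mismatches: int = 0) -> bool:
    """Report whether the probe can base-pair with some window of the target.

    The probe pairs wherever the target contains the probe's reverse
    complement; up to max_mismatches mismatched positions are tolerated
    (a crude stand-in for hybridization stringency).
    """
    site = reverse_complement(probe)
    target = target.upper()
    for start in range(len(target) - len(site) + 1):
        window = target[start:start + len(site)]
        mismatches = sum(1 for a, b in zip(site, window) if a != b)
        if mismatches <= max_mismatches:
            return True
    return False


# Hypothetical gene fragments from two isolates (same region, two differences).
targets = {
    "isolate_A": "GGATTACCGTAGCTAGGCTA",
    "isolate_B": "GGATTACCGTTGCAAGGCTA",
}

# A probe designed against the region unique to isolate_A.
probe = reverse_complement("CCGTAGCTAGG")

for name, sequence in targets.items():
    print(name, "binds probe:", probe_binds(probe, sequence))
```

With no mismatches allowed, the probe pairs only with the isolate it was designed against. Raising `max_mismatches` makes the probe less specific, which loosely mirrors how a probe can be aimed at a species, a genus, or a whole domain depending on how conserved its target region is.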


[Chromatogram. Vertical axis: quantity of fatty acid. Horizontal axis: retention time (minutes), 1 to 20. Peaks are labeled with fatty acid names and retention times; the first peak is the solvent.]

Figure 17.11 Fatty acid analysis
Fatty acid methyl ester (FAME) chromatogram of an unknown species, showing chromatographic column retention times and peak heights. Note: 10:0, 12:0, 16:0, and 19:0 indicate saturated fatty acids with 10, 12, 16, and 19 carbons; 16:1 and 18:1, monounsaturated 16-carbon and 18-carbon fatty acids; omega number, the position of the double bond relative to the omega end, that is, the hydrocarbon end (not the carboxyl end), of the fatty acid chain; cis and trans, the configuration of the double bond. For example, omega 7 cis indicates a cis double bond between the seventh and eighth carbons from the omega end of the fatty acid. Also, 2OH indicates a hydroxyl group at the second carbon from the carboxyl end; cyclo omega 8, a cyclo-carbon at the eighth position from the omega end. The 18:1 omega 7 cis, omega 9 trans, and omega 12 trans peak results from either one fatty acid or a mixture of fatty acids with double bonds at the three positions indicated (the chromatographic column does not separate these three fatty acids). Courtesy of MIDI (Microbial Identification, Incorporated, Delaware).
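Matching a fatty acid profile against a reference database can be sketched in a few lines. The sketch below is an assumption-laden illustration, not the method used by any commercial system: it represents each profile as percentages of total fatty acid, scores similarity with the cosine measure (chosen here for simplicity), and uses an invented two-entry library with made-up values and placeholder strain names.

```python
import math


def cosine_similarity(profile_a: dict, profile_b: dict) -> float:
    """Cosine similarity between two profiles (fatty acid name -> percent)."""
    names = set(profile_a) | set(profile_b)
    dot = sum(profile_a.get(n, 0.0) * profile_b.get(n, 0.0) for n in names)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(unknown: dict, library: dict) -> tuple:
    """Return (reference name, similarity) for the closest library profile."""
    return max(
        ((name, cosine_similarity(unknown, ref)) for name, ref in library.items()),
        key=lambda pair: pair[1],
    )


# Invented reference profiles (percent of total fatty acid); the strain
# names and values are placeholders, not measured data.
REFERENCE_LIBRARY = {
    "reference strain X": {"10:0": 5.0, "12:0": 5.0, "16:0": 30.0,
                           "16:1": 25.0, "18:1": 35.0},
    "reference strain Y": {"15:0 iso": 40.0, "15:0 anteiso": 35.0,
                           "16:0": 15.0, "17:0 iso": 10.0},
}

# Invented profile for an unknown isolate.
unknown = {"10:0": 6.0, "12:0": 6.0, "16:0": 28.0, "16:1": 27.0, "18:1": 33.0}

name, score = best_match(unknown, REFERENCE_LIBRARY)
print(f"closest reference: {name} (similarity {score:.3f})")
```

A comparison like this is only meaningful if the unknown and the reference strains were grown on the same medium, at the same temperature, and for the same length of time, since fatty acid ratios shift with growth conditions.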

Potential applications for probe technology are considerable. A number of commercial firms are already marketing probes to identify from clinical samples. A goal of this technology is to enable the rapid identification of organisms directly from clinical or environmental samples without actually growing them in culture first. Fluorescent antiserum tests are also useful. For example, Legionella spp., the causative agents of Legionnaires’ disease, are very difficult to cultivate. However, good fluorescent antisera are available for the identification of these species directly from clinical samples or even from environmental samples (Figure 17.12).

Another approach in the identification of species is through the use of multiple locus sequence typing (MLST). In this approach, several core genes (typically seven or eight) of an organism are sequenced and used to compare the organism to known species. In the clinical setting, if the organism has been isolated from a patient, its MLST “type” can be compared to that of a large database to determine whether it belongs to a known pathogenic species. The MLST approach is rapidly gaining acceptance as a means of identification of pathogenic strains and species.

Culture Collections

Unlike plants and animals, many bacteria and archaea can be easily grown in pure culture and preserved by freeze-drying (lyophilization), handled in small test tubes and vials, and readily sent anywhere in the world. As lyophils, many of these organisms remain viable for 10 to 20 years or more and can be revived and studied by anyone anywhere. Cultures can also be frozen at –80°C in vials containing a suspension medium amended with 15% glycerol. These remain viable for many years. Thus,


Figure 17.12 Legionella fluorescence
Legionella species are difficult to grow, but can be identified to species by fluorescence microscopy using tagged and specifically labeled antisera. In this image the bacteria are yellow-green. ©Michael Abbey/Visuals Unlimited.

unlike plants, for which an herbarium is used to preserve the specimens collected of an original species, bacteria are preserved as type cultures that are clones of the original viable type strain of a species. The type strain of a species is the one on which the species definition has been based.

Through culture collections type strains are made available to professional microbiologists throughout the world. Many countries maintain national collections of microorganisms, such as the American Type Culture Collection (ATCC) in the United States (www.atcc.org). If a microbiologist from India wishes to determine if he or she has a new species, the original type strain can be obtained from a culture collection and used to conduct DNA–DNA reassociation assays and other tests to compare it with his or her isolates.

Because of the importance of biological materials to science and industry, culture collections have become biological resource centers. Thus, strains that have been patented are also deposited in culture collections so that they are accessible. Clones of genomes that have been sequenced are also deposited, as are viruses and other biological materials.

SECTION HIGHLIGHTS
A number of tests are used to identify bacteria that have been isolated from clinical specimens and natural sources. Some tests are phenotypic, whereas others use molecular sequence information.

SUMMARY

• Bacterial taxonomy or systematics consists of three areas: nomenclature, classification, and identification. Nomenclature is the naming of an organism. An International Code for the Nomenclature of Bacteria has been published containing the rules for naming Bacteria and Archaea.

• Classification is the organization of Bacteria and Archaea into groups of similar species. Bacteria and Archaea are classified in increasing hierarchical rank from species, genus, family, order, class, phylum, and domain. Artificial classifications are not based on the evolution of organisms but on expressed features or the phenotype of an organism that includes properties such as cell shape and nutritional patterns. Phylogenetic classifications are based on the evolution of a group of organisms. Bacterial phylogeny is now based on the sequence information from the highly conserved macromolecule, 16S rRNA, as well as the sequences of other genes and proteins.

• Phylogenetic trees, with branches and nodes, can be constructed based on the sequence of macromolecules such as rRNA. The length of the branch represents the inferred difference (number of changes) between organisms. External nodes represent extant species, whereas internal nodes represent ancestor species. Rooted trees are based on a comparison of related species and an out-group.


• Speciation is the process whereby organisms evolve. Typically genetic material is transferred from the parental bacterium to the progeny through vertical inheritance. Horizontal gene transfer (HGT) also occurs, in which genetic material is transferred among bacteria that may or may not be closely related.

• A bacterial species is a group of similar strains that show at least 70% DNA–DNA hybridization. Organisms of the same species will have similar if not identical DNA mol % G + C content. Organisms that have the same GC ratio are not necessarily similar. Based on molecular criteria, such as DNA–DNA reassociation, bacterial species are much more broadly defined than are plant and animal species.

• Identification is the process whereby unknown cultures can be compared to existing species to determine if they are sufficiently similar to be members of the same species.

• Type strains of all species must be deposited in at least two different types of culture collections, repositories where strains are preserved by lyophilization and deep freezing. The culture collections provide cultures to microbiologists worldwide so they can compare unidentified strains to the official type strains.

Find more at www.sinauer.com/microbial-life

REVIEW QUESTIONS

1. Is it important to name and classify bacteria?
2. What procedure(s) are necessary to identify a bacterial isolate as a species?
3. Differentiate between an artificial and a phylogenetic classification.
4. In what ways does the classification of Bacteria differ from that of eukaryotic organisms?
5. How do the Archaea differ from Bacteria? From eukaryotes?
6. How is DNA melted and reannealed, and why is this useful in bacterial taxonomy?
7. How would you go about identifying a bacterium that you isolated from a soil habitat?
8. Why is morphology of little use in bacterial classification? Is it of any use?
9. What is weighting and should phenotypic features be weighted in a bacterial classification scheme?
10. Distinguish between lumpers and splitters.
11. If you were working in a clinical laboratory, outline the types of procedures you would use to identify isolates. Why do you recommend using the procedures you suggest?
12. Why is rRNA of use in bacterial classification?
13. Compare the information obtained from determining the DNA base composition (GC ratio) with that obtained by DNA reassociation experiments.

SUGGESTED READING

Boone, D., R. Castenholz and G. Garrity, eds. 2001. Bergey’s Manual of Systematic Bacteriology. 2nd ed., Vol. 1. New York: Springer-Verlag.
Brenner, D. J., N. R. Krieg, J. T. Staley and G. Garrity, eds. 2005. Bergey’s Manual of Systematic Bacteriology. 2nd ed., Vol. 2. New York: Springer-Verlag.
Gerhardt, P., ed. 1993. Methods for General and Molecular Microbiology. Washington, DC: ASM Press.
Graur, D. and W. H. Li. 2000. Fundamentals of Molecular Evolution. 2nd ed. Sunderland, MA: Sinauer Associates, Inc.
Hall, B. G. 2004. Phylogenetic Trees Made Easy. 2nd ed. Sunderland, MA: Sinauer Associates, Inc.

COMPUTER INTERNET RESOURCES

American Type Culture Collection: http://www.atcc.org/. This site has a listing of all the bacterial strains deposited in the American Type Culture Collection, as well as growth media and conditions.
Bergey’s Manual Trust. Headquarters at the University of Georgia. This website has information on the current classification of Bacteria and Archaea: http://www.bergeys.org/.
National Center for Biotechnology Information (NCBI): http://www.ncbi.nlm.nih.gov/. This center contains information that allows for comparison of genes from different organisms through BLAST (Basic Local Alignment Search Tool); GenBank, which contains a huge collection of gene sequences that can be used for comparative analyses; and a section on taxonomy.
Ribosome Database Project (RDP), Michigan State University: http://rdp.cme.msu.edu/. This site has information on more than 250,000 bacterial and archaeal 16S rRNA sequences. The database allows one to conduct phylogenetic analyses of unknown strains whose 16S rRNA sequence has been determined and to compare the sequence with those already reported.
Comparative RNA Web Site: www.rna.icmb.utexas.edu. A remarkable collection of RNA sequence information presented with secondary structure models, conservation diagrams, and more. Published as Cannone, J. J., et al. 2002. “The Comparative RNA Web Site: An online database of comparative sequence and structure information for ribosomal, intron, and other RNAs.” BioMed Central Bioinformatics 3: 2.
