Microbial Genomes 1
Voon Loong Chan

Summary

With more than 200 bacterial and archaeal genomes completely sequenced, and more than 500 genomes at various stages of completion, we begin to appreciate the enormous diversity of prokaryotic genomes in terms of chromosomal structure, gene content and organization, and the abundance and fluidity of accessory and mobile genetic elements. The genome of a bacterial species is composed of conserved core genes and variable accessory genes. Mobile genetic elements, such as plasmids, transposons, insertion sequences, integrons, prophages, genomic islands, and pathogenicity islands, are part of the accessory genes, which can have a significant influence on the phenotype and biology of the organism. These mobile elements facilitate interspecies and intraspecies genetic exchange. They play an important role in the pathogenicity of bacteria, and are a major contributor to species diversity. Further genomic analysis will likely uncover more interesting genetic elements, such as small (noncoding) RNA genes, that can play a significant role in gene regulation.

Key Words: Genome diversity; plasmids; insertion elements; genomic islands; prophages; small RNA.

1. Introduction

The first bacterial genome sequenced to completion was that of Haemophilus influenzae, published in 1995 (1). To date (Feb 1, 2006), 297 prokaryotic genomes have been sequenced, 272 bacterial and 25 archaeal. In addition, more than 500 prokaryotic genomes are at various stages of completion. Interest in prokaryotic genomes is still growing at an escalating rate (Fig. 1). The major impetus for determining the complete genome sequence of an organism is to gain a better understanding of the biology and evolution of the microbe and, for pathogens, to identify new vaccine candidates, putative virulence genes, and targets for therapeutics, including antibiotics.
Emergence of new antibiotic-resistant isolates of pathogens spurs the search for new vaccines, antimicrobials, and novel approaches to the prevention of infection.

The complete genome sequence of a bacterium provides valuable information on the genome size and topology. Many sequence analysis programs and algorithms have been developed, and their capacity and power have improved with the growth of the database (2). With the aim of identifying the function of all the genes in a particular genome, the nucleotide and predicted amino acid sequences are compared with sequences in the public databases using the Basic Local Alignment Search Tool (BLAST). Algorithms have also been developed to align the whole genomes of two closely related bacteria. A high percentage of protein genes can be identified through similarity to known genes. However, for most of the genomes sequenced, 25% or more of identified open reading frames (ORFs) do not match any known genes. To achieve full understanding of the biology and pathogenic mechanisms of a bacterial pathogen, we need to have a good understanding of the function of all the gene products and how the genes are regulated.

From: Bacterial Genomes and Infectious Diseases. Edited by: V. L. Chan, P. M. Sherman, and B. Bourke. © Humana Press Inc., Totowa, NJ

Fig. 1. Number of completed genomes per year from 1995 to 2004. The numbers for each year were derived from two web sites: http://www.ncbi.nlm.nih.gov/genomes/MICROBES/Complete.html and http://www.genomeonline.org.

2. Structure and Diversity of Microbial (Bacterial and Archaeal) Genomes

The structure of a microbial genome can be analyzed and compared with other genomes at various levels, including guanine and cytosine (G+C) content, overall size, topology, organization of genes, and the presence and abundance of accessory and mobile genetic elements. Limited information on the structure of bacterial genomes was available before the era of genomic sequencing.
Most of the early structural information was derived from physical mapping using restriction enzyme cleavage and pulsed-field gel electrophoresis. This has been reviewed extensively (3–6). It is abundantly clear from physical mapping information and genomic sequence data that bacterial, as well as archaeal, genomes are highly diverse.

2.1. Chromosome Size

Chromosome sizes of sequenced bacteria range from 580 kb for Mycoplasma genitalium (7) to 9105 kb for Bradyrhizobium japonicum (8). The distribution of the sizes of bacterial genomes is shown in Fig. 2. Ninety percent of the bacterial genomes sequenced are smaller than 5.5 Mb. There are clusters of genomes around 0.9–1.3, 1.7–2.5, 3.3–3.7, and 4.5–4.9 Mb. Variations in the size of bacterial genomes are expected because there is huge diversity among bacteria in morphology, metabolic capability, and the ability to survive and grow in various environments or hosts.

Fig. 2. Size distribution of bacterial genomes. The sizes of the bacterial genomes were from two web sites: http://www.ncbi.nlm.nih.gov/genomes/MICROBES/Complete.html and http://www.genomeonline.org. When multiple strains of the same species were sequenced, the mean size of the different strains was used, provided the size variation was less than 10%. However, if the size variation exceeded 10%, they were entered as separate genomes.

For some bacterial species, multiple strains have been sequenced: Bacillus anthracis, Bacillus cereus, Brucella melitensis, Chlamydophila pneumoniae, Escherichia coli, Leptospira interrogans, Mycobacterium tuberculosis, Neisseria meningitidis, Prochlorococcus marinus, Salmonella enterica Typhi, Shigella flexneri, Staphylococcus aureus, Streptococcus agalactiae, Streptococcus pyogenes, Tropheryma whipplei, Vibrio vulnificus, Xylella fastidiosa, and Yersinia pestis (see http://www.genomeonline.org; http://www.ncbi.nlm.nih.gov/genomes/MICROBES/Complete.html). With the exception of E. coli and P.
marinus, different strains of the same species have very similar genome sizes. E. coli K12-MG1655 is 4.6 Mb, whereas the largest strain sequenced, E. coli O157:H7 (Sakai), is 5.6 Mb, 18% larger. Two ecotypes (strains of the same species occupying different niches) of P. marinus, MED4 and MIT9313, were recently sequenced. The genome of each strain consists of a single circular chromosome. The chromosome of MED4, a high-light-adapted strain, is 1658 kb and that of MIT9313, a low-light-adapted strain, is 2411 kb. That is, the genome of ecotype MED4 is only 68.8% the size of that of MIT9313 and yet, on the basis of their ribosomal DNA sequences, they are classified as the same species.

Fewer archaeal genomes than bacterial genomes have been sequenced. Even though archaeal chromosomes can also differ greatly in size, ranging from 500 kb for Nanoarchaeum equitans (9) to 5751 kb for Methanosarcina acetivorans (10), the sizes of archaeal genomes are not as varied as those of bacteria. Fourteen of the 20 archaeal species sequenced to date have a genome size between 1.5 and 2.3 Mb (see Fig. 3).

Fig. 3. Size distribution of archaeal genomes. The sizes of the archaeal genomes were from two web sites: http://www.ncbi.nlm.nih.gov/genomes/MICROBES/Complete.html and http://www.genomeonline.org.

2.2. Chromosome Topology and Number

Bacteria generally have a single, double-stranded circular DNA chromosome. However, a number of bacterial species have a linear chromosome (5,11). The first linear bacterial chromosome identified was that of Borrelia burgdorferi (12). A number of Streptomyces species, including S. lividans 66 (13), S. coelicolor A3 (14), and S. avermitilis (15), also have a linear chromosome. It is likely that the Streptomyces linear chromosome evolved from an ancestral circular chromosome (11).

Agrobacterium tumefaciens, an α-Proteobacterium, has two chromosomes, one linear and one circular (16,17).
Other α-Proteobacteria, Brucella melitensis biovar suis 1330 (18) and B. melitensis 16M (18), also have two chromosomes. B. melitensis, which causes abortion in sheep and goats and undulant fever in humans, has three biovars (biotypes), whereas Brucella suis, which infects swine, has four biovars. B. suis biovars 1, 2, and 4 all have two circular chromosomes. The two chromosomes in biovars 2 and 4 are of similar size but differ from the two chromosomes in biovar 1, suggesting chromosome rearrangement. B. suis biovar 3 has only one large circular chromosome, whose size is about the sum of the two chromosomes found in the other biovars (19). So far, there are no other reports of different strains of the same species having different numbers of chromosomes; it is therefore probably a rare occurrence. Other α-Proteobacteria that contain two circular chromosomes are Rhizobium meliloti and Rhodobacter sphaeroides (5).

Leptospira interrogans, a causative agent of leptospirosis in humans, has a large 4.33 Mb circular chromosome and a small 359 kb circular chromosome (20,21). Leptospira borgpetersenii serovar Hardjobovis, which belongs to the same phylogenetic pathogen group as L. interrogans, also has one large (3.61 Mb) and one small (317 kb) circular chromosome (see Chapter 7). The genome structure of Leptospira spp. is thus different from that of two other sequenced spirochetes, Treponema pallidum (22) and B. burgdorferi (23). The T. pallidum genome is a circular chromosome of 1138 kb, whereas B. burgdorferi has a linear chromosome of 911 kb.

Among the γ-Proteobacteria, most of the genomes have a single circular chromosome. A well-known exception is the genus Vibrio. Three Vibrio species have been sequenced, V. cholerae (24), V. vulnificus (25), and V. parahaemolyticus (26), and all three genomes contain two circular chromosomes.
Burkholderia cepacia, a member of the β-subdivision of Proteobacteria, is found in soils and waters and is recognized as a serious respiratory pathogen for patients with cystic fibrosis. Although the genome is still being sequenced, pulsed-field gel electrophoresis analysis of B. cepacia isolates identified multiple circular chromosomes, between two and four, with overall genome sizes in the range of 5 to 9 Mb (27,28).
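
The strain-size comparisons in Section 2.1 and the rule stated in the Fig. 2 legend (use the mean size when strains differ by less than 10%, otherwise enter each strain separately) can be illustrated with a short Python sketch. This is only an illustration, not the procedure actually used to prepare the figure: the function names are hypothetical, and because the legend does not define "size variation", the sketch assumes it means (largest - smallest) / smallest.

```python
from statistics import mean


def percent_of(smaller_kb, larger_kb):
    """Return one genome size as a percentage of another."""
    return 100.0 * smaller_kb / larger_kb


def size_variation_pct(sizes_kb):
    """Relative spread of strain sizes, in percent.

    ASSUMPTION: defined here as (max - min) / min; the chapter
    does not state which definition was used.
    """
    return 100.0 * (max(sizes_kb) - min(sizes_kb)) / min(sizes_kb)


def histogram_entries(strain_sizes_kb, threshold_pct=10.0):
    """Apply the Fig. 2 legend rule to the strains of one species.

    Returns a single mean size if the strains vary by less than
    threshold_pct, otherwise one entry per strain (sorted by size).
    """
    sizes = list(strain_sizes_kb.values())
    if size_variation_pct(sizes) < threshold_pct:
        return [mean(sizes)]
    return sorted(sizes)


if __name__ == "__main__":
    # Chromosome sizes (kb) of the two P. marinus ecotypes given in the text.
    p_marinus = {"MED4": 1658, "MIT9313": 2411}
    print(round(percent_of(p_marinus["MED4"], p_marinus["MIT9313"]), 1))
    print(histogram_entries(p_marinus))

    # Hypothetical species whose two strains differ by 5%.
    print(histogram_entries({"strain_a": 2000, "strain_b": 2100}))
```

Under the assumed definition, the two P. marinus ecotypes differ by far more than 10% and would be entered as separate genomes, which is consistent with the text singling out P. marinus and E. coli as exceptions.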