Genomics Update Genomes of Model Organisms
Total Page:16
File Type:pdf, Size:1020Kb
Environmental Microbiology (2008) 10(6), 1383–1391 doi:10.1111/j.1462-2920.2008.01656.x Genomics update Genomes of model organisms: know thy tools OnlineOpen: This article is available free online at www.blackwell-synergy.com Michael Y. Galperin* included identification of the DH10B DNAregions that were National Center for Biotechnology Information, National absent in the strain MG1655 chromosome and closing the Library of Medicine, National Institutes of Health, gaps between contigs, which still required some sequenc- Bethesda, MD 20894, USA. ing.After the assembly of Wolbachia genomes from Droso- phila sequence reads by Salzberg and colleagues (2005), The list of recently completed microbial genome sequenc- this work is another impressive example of extracting ing projects (Table 1) includes genomes of two unicellular useful information on bacterial genomes from the massive eukaryotes, three archaea and a variety of bacteria, amounts of sequence data accumulated by the eukaryotic including an unusually diverse selection of the Firmicutes. genome sequencing projects. The highlights of these sequencing efforts include com- The genome sequence of E. coli DH10B revealed 226 plete genome sequences of several important model mutations, a 113 kb tandem duplication and an inversion organisms, including the standard laboratory strain as compared with the genome of E. coli MG1655 (Durfee Escherichia coli DH10B, the model halophile Halobacte- et al., 2008). Surprisingly, the presence of deoR mutation rium salinarum strain R1, the marine cyanobacterium in DH10B could not be confirmed, which made the causes Synechococcus sp. PCC 7002 and the unicellular green of the high transformation efficiency of this strain as alga Chlamydomonas reinhardtii. obscure as ever before. Arguably, the biggest news was sequencing of the In addition to DH10B, two other E. coli genomes have genome of E. coli DH10B (Durfee et al., 2008). Among been released in March 2008 and will be used for com- more than a dozen of E. coli strains with completely parative genome analysis. Escherichia coli strain SECEC sequenced genomes, most are pathogenic and only two, SMS-3–5 was isolated from a toxic metal-contaminated MG1655 and W3110, are derivatives of E. coli K-12. Strain coastal site at Shipyard Creek in Charleston, South Caro- DH10B was constructed at Douglas Hanahan’s lab at Cold lina. Surprisingly, this environmental strain is highly resis- Spring Harbor Laboratory (Grant et al., 1990) as a deriva- tant to a number of antibiotics, including ciprofloxacin and tive of E. coli MC1061 designed to serve as a convenient moxifloxacin, which is obviously a cause for great concern, host for cloning and propagation of foreign DNA. Owing to see http://msc.jcvi.org/e_coli_and_shigella/. Escherichia its unusually high transformation efficiency and the ability coli C str. ATCC 8739 has an altered outer membrane that to maintain large DNA inserts, DH10B became the strain of lacks the outer membrane porin OmpC and contains only choice for many genetic engineering tasks and has been OmpF. extensively used for preparation of mammalian DNA Another important model organism with a recently fin- libraries for whole-genome sequencing. Because of this ished genome is the extremely halophilic archaeon Halo- circumstance, the authors were able to replace most of bacterium salinarum R1. This organism has been first the sequencing with computational analysis of ~4 million isolated from salted fish in 1920s and has been known sequence reads collected in the course of the bovine under several names, including Halobacterium halobium. genome sequencing project at Baylor College of Medicine. Halobacterium salinarum was used in the famous work of Bovine BAC DNApreparations were found to contain some Oesterhelt and Stoeckenius (1971) that discovered bac- < ( 1%) DNA contamination from the E. coli DH10B host. teriorhodopsin, a 26 kDa protein that comprises the sim- These DH10B DNA fragments were identified by compari- plest membrane proton pump. Bacteriorhodopsin served son to the recently updated genomic sequence of E. coli as a founding member of a vast family of retinal-binding K12 strain MG1655 (Riley et al., 2006), extracted and proteins found in a wide variety of organisms and habitats assembled into contigs. The genomic finishing phase (Beja et al., 2000; Venter et al., 2004). Sequencing of the *For correspondence. E-mail [email protected]; Tel. (+1) H. salinarum R1 genome was performed several years 301 435 5910; Fax (+1) 301 435 7793. ago, although closing the genome proved impossible at Re-use of this article is permitted in accordance with the Creative Commons Deed, Attribution 2.5, which does not permit commercial that time owing to the abundance of insertion sequences exploitation. (Pfeiffer et al., 2008). In contrast, Halobacterium sp. Journal compilation © 2008 Society for Applied Microbiology and Blackwell Publishing Ltd No claim to original US government works 1384 Genomics update Table 1. Recently completed microbial genomes (February–March 2008). GenBank Genome Proteins Sequencing Species name Taxonomy accession size (bp) (total) centrea Reference New organisms Chlamydomonas reinhardtii Eukaryota, ABCN00000000 ~121 Mbp 14 489 JGI Merchant et al. (2007) Chlorophyta Monosiga brevicollis Eukaryota, ABFJ00000000 ~41.6 Mbp ~9 200 JGI King et al. (2008) Choanoflagellata Candidatus Korarchaeum cryptofilum OPF8 Korarchaeota CP000968 1 590 757 1 602 JGI Unpublished Thermoproteus neutrophilus Crenarchaeota CP001014 1 769 823 1 966 JGI Unpublished Halobacterium salinarum R1 Euryachaeota AM774415– 2 668 776 2 749 MPI Biochem. Pfeiffer et al. (2008) AM774419 (total) Corynebacterium urealyticum Actinobacteria AM942444 2 369 219 2 024 Bielefeld U. Tauch et al. (2008) Mycobacterium abscessus Actinobacteria CU458896 5 067 172 4 941 Genoscope Ripoll et al. (2007) CU458745 23 319 Cyanothece sp. ATCC 51142 Cyanobacteria CP000806– 5 460 377 5 304 Wash U. Unpublished CP000811 (total) Synechococcus sp. PCC 7002 Cyanobacteria CP000951– 3 409 935 3 186 BGI Unpublished CP000957 (total) Acholeplasma laidlawii Firmicutes CP000896 1 496 992 1 380 Moscow Inst. Unpublished Phys.-Chem. Candidatus Desulforudis audaxviator Firmicutes CP000860 2 349 476 2 157 JGI Unpublished Finegoldia magna Firmicutes AP008971, 1 797 577 1 813 RIKEN Goto et al. (2008) AP008972 189 163 Heliobacterium modesticaldum Firmicutes CP000930 3 075 407 3 000 TGRI Unpublished Leuconostoc citreum KM20 Firmicutes DQ489736– 1 896 614 1 840 KRIBB Kim et al. (2008) DQ489740 (total) Lysinibacillus (Bacillus) sphaericus Firmicutes CP000817 4 639 821 4 771 BGI Hu et al. (2008) CP000818 177 642 Thermoanaerobacter pseudethanolicus Firmicutes CP000924 2 362 816 2 243 JGI Unpublished Thermoanaerobacter sp. X514 Firmicutes CP000923 2 457 259 2 349 JGI Unpublished Caulobacter sp. K31 a-Proteobacteria CP000927, 5 477 872 5 438 JGI Unpublished CP000928, 233 649 CP000929 177 878 Methylobacterium radiotolerans a-Proteobacteria CP001001– 6 899 110 6 431 JGI Unpublished CP001009 (total) Methylobacterium sp. 4–46 a-Proteobacteria CP000943, 7 659 055 6 692 JGI Unpublished CP000944, 57 951 CP000945 20 019 Cupriavidus taiwanensis b-Proteobacteria CU633749 3 416 911 Genoscope Unpublished CU633750 2 502 411 CU633751 557 200 Leptothrix cholodnii b-Proteobacteria CP001013 4 909 403 4 363 JGI Unpublished Polynucleobacter necessarius b-Proteobacteria CP001010 1 560 469 1 508 JGI Unpublished Francisella philomiragia g-Proteobacteria CP000937, 2 045 775 1 915 JGI Unpublished CP000938 3 936 Shewanella halifaxensis g-Proteobacteria CP000931 5 226 917 4 278 JGI Unpublished Shewanella woodyi g-Proteobacteria CP000961 5 935 403 4 880 JGI Unpublished Leptospira biflexa strain ‘Patoc 1 (Ames)’ Spirochaetes CP000777, 3 603 977 3 600 Institut Pasteur Picardeau et al. (2008) CP000778, 277 995 and Monash Univ. CP000779 74 117 Leptospira biflexa strain ‘Patoc 1 (Paris)’ Spirochaetes CP000786, 3 599 677 3 787 Institut Pasteur Picardeau et al. (2008) CP000787, 277 655 and Monash Univ. CP000788 74 116 Thermotoga sp. RQ2 Thermotogae CP000969 1 877 693 1 819 JGI Unpublished New strains Clavibacter michiganensis ssp. sepedonicus Actinobacteria AM849034 3 258 645 2 943 Sanger institute Bentley et al. (2008) Clostridium botulinum A3 str. Loch Maree Firmicutes CP000962, 3 992 906 3 984 USAMRIID Smith et al. (2007a) CP000963 266 785 Clostridium botulinum B1 str. Okra Firmicutes CP000939, 3 958 233 3 852 USAMRIID Smith et al. (2007a) CP000940 148 780 Streptococcus pneumoniae Hungary19A-6 Firmicutes CP000936 2 245 615 2 155 JCVI Unpublished Ureaplasma parvum str. ATCC 27815 Firmicutes CP000942 751 679 609 JCVI Unpublished Burkholderia cenocepacia MC0-3 b-Proteobacteria CP000958, 3 532 883 3 160 JGI Unpublished CP000959, 3 213 911 2 795 CP000960 1 224 595 1 053 Acinetobacter baumannii AYE g-Proteobacteria CU459137– 4 048 735 3 712 Genoscope Fournier et al. (2006) CU459141 (total) Acinetobacter baumannii SDF g-Proteobacteria CU468230– 3 477 996 2 975 Genoscope Fournier et al. (2006) CU468233 (total) Journal compilation © 2008 Society for Applied Microbiology and Blackwell Publishing Ltd, Environmental Microbiology, 10, 1383–1391 No claim to original US government works Genomics update 1385 Table 1. cont. GenBank Genome Proteins Sequencing Species name Taxonomy accession size (bp) (total) centrea Reference Escherichia coli C str. ATCC 8739 g-Proteobacteria CP000946 4 746 218 4 200 JGI Unpublished Escherichia coli DH10B g-Proteobacteria CP000948 4 686 137 4 126 U. Wisconsin Durfee et al. (2008) Escherichia