PHYLOGENOMICS: A GENOME-LEVEL APPROACH TO ASSEMBLING THE BACTERIAL BRANCHES OF THE TREE OF LIFE

Jonathan A. Eisen1, Naomi Ward1, Karen E. Nelson1, Jonathan H. Badger1, James Sakwa1, Dongying Wu1, Martin Wu1, Kevin Penn1, Grace Pai1, Shannon Smith1, Elizabeth M. O’Connor2, Julie Enticknap2, Tim Steppe2, Frank T. Robb2

1 The Institute for Genomic Research (TIGR), Rockville, MD.
2 Center of Marine Biotechnology (COMB), Baltimore, MD.
Project Summary and Web Page

RESULTS

RESULTS SUMMARY
1. Shotgun sequencing is completed for 5 phyla and in progress for the other three (Table 1).
2. Annotation of the genomes helps predict physiology and may aid in experimental studies (Table 2).
3. Whole genome phylogenetic analysis suggests one group may not be a novel phylum (Thermomicrobium) and helps resolve relationships among phyla (Figure 3).
4. Using the genomic data, members of these phyla have been identified in environmental samples (Figure 4).
5. We have developed multiple “phylogenomic” tools in conjunction with this project.

Table 1. Sequencing Status for Selected Phyla

Phylum | Species selected | Growth, DNA isolation | Libraries | Shotgun coverage | Estimated genome size (Mb) | # of contigs | Auto-annotated
Chrysiogenes | Chrysiogenes arsenatis | + | + | 4x | 2.5 | 155 | +
Coprothermobacter | Coprothermobacter proteolyticus (CP) | + | + | 8x | 1.38 | 3 | +
Dictyoglomi | Dictyoglomus thermophilum (DT) | + | + | 8x | 2.0 | 9 | +
Thermodesulfobacteria | Thermodesulfobacterium commune (TC) | + | + | 8x | 1.78 | 26 | +
Nitrospirae | Thermodesulfovibrio yellowstonii (TY) | + | + | 8x | 1.98 | 27 | +
Thermomicrobia | Thermomicrobium roseum | + | + | 8x | 3.4 | 82 | +
Deferribacteres | Selecting from Deferribacter thermophilus, Geovibrio thiophilus, Flexistipes sinusarabici | + | In progress | | | |
Synergistes | Selecting from Synergistes jonesii, Aminobacter colombiense, Thermanaerovibrio acidaminovorans, Aminomonas paucivorans, Dethiosulfovibrio peptidovorans | + | In progress | | | |

Figure 4. APIS Output for Yellowstone Mat shotgun sequences

Figure 3. Whole genome phylogeny
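The whole genome phylogeny referred to in Figure 3 rests on the "Align genes, concatenate alignments" step listed in the Automated Whole Genome Trees panel. That step can be sketched roughly as follows. This is a minimal illustration, not the APIS implementation: per-gene alignments are assumed to be plain dictionaries mapping a species name to its aligned sequence, the function name `concatenate_alignments` and the toy sequences are hypothetical, and species missing from a gene are assumed to be padded with gap characters so that every row of the concatenated alignment has the same length.

```python
def concatenate_alignments(gene_alignments, gap="-"):
    """Join per-gene alignments into one concatenated alignment per species.

    gene_alignments: list of dicts, each mapping species name -> aligned
    sequence. All sequences within one dict must have the same length.
    """
    # Every species seen in at least one gene alignment, in a stable order.
    species = sorted({sp for aln in gene_alignments for sp in aln})
    parts = {sp: [] for sp in species}

    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one gene alignment must have equal length")
        width = lengths.pop()
        for sp in species:
            # Pad with gaps when a species has no homolog for this gene.
            parts[sp].append(aln.get(sp, gap * width))

    return {sp: "".join(chunks) for sp, chunks in parts.items()}


# Toy data (hypothetical sequences, for illustration only).
gene1 = {"A": "MK-L", "B": "MKAL", "C": "MRAL"}
gene2 = {"A": "GGT", "C": "GAT"}  # species B lacks this marker
supermatrix = concatenate_alignments([gene1, gene2])
```

The resulting dictionary can be written out in any alignment format and passed to a tree-building program; the poster mentions neighbor joining (NJ) for this purpose.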
Table 2: Predicted Metabolic Pathways. Predictions made through a combination of mining the automated gene lists and using the APIS-based ECFinder algorithm. See Table 1 for full species names.

Pathway                   | TC | CP | TY | DT
Acetoin metabolism        | +  | +  | +  | +
Aspartate from fumarate   | +  | -  | +  | -
Aspartate to alanine      | +  | +  | +  | -
Aspartate to oxaloacetate | +  | -  | -  | -
C1 metabolism evidence    | -  | +  | +  | -
Cellulose to cellobiose   | -  | -  | -  | +
Cellobiose metabolism     | -  | -  | +  | +
Chitobiose metabolism     | +  | -  | +  | -
Dextrin to glucose        | -  | +  | -  | +
Formate metabolism        | +  | -  | +  | +
Fumarate to glutamate     | -  | -  | -  | +
Galactose metabolism      | -  | -  | -  | +
Glycolysis partial        | +  | -  | -  | -
Glycolysis intact         | -  | +  | +  | +
Glucosamine metabolism    | +  | +  | +  | +
Glutamate to glutamine    | +  | +  | +  | -
Glutamate to citrulline   | -  | +  | -  | -
Glutamate to histidine    | -  | +  | -  | -
Glycine cleavage          | -  | +  | +  | +
Glycine to proline        | -  | -  | -  | +
Glycine to sarcosine      | -  | +  | -  | -
Glycerol metabolism       | +  | +  | +  | +
Glycogen biosynthesis     | -  | -  | +  | +
Histidine biosynthesis    | -  | -  | +  | -
Lactose metabolism        | -  | -  | -  | +
Malate to pyruvate        | +  | -  | -  | -
Mannose metabolism        | -  | +  | -  | -
Nitrite to ammonia        | +  | -  | +  | -
Pentose Phosphate Pathway | +  | +  | +  | +
Pyruvate to isoleucine    | +  | -  | +  | -
Pyruvate to leucine       | +  | -  | +  | -
Pyruvate to valine        | +  | -  | +  | -
Pyruvate to acetylCoA     | +  | +  | +  | -
Pyruvate to acetate       | +  | +  | -  | -
Pyruvate to cysteine      | -  | +  | -  | -
Pyruvate to formate       | -  | +  | -  | -
Pyruvate to lactate       | +  | -  | +  | -
Pyruvate to malate        | +  | +  | +  | -
Pyruvate to PEP           | +  | +  | +  | -
Pyruvate to OAA           | +  | +  | +  | -
Proline from glutamate    | +  | -  | -  | -
Putrescine to spermidine  | +  | +  | +  | +
Raffinose metabolism      | -  | -  | -  | +
Ribose metabolism         | -  | +  | +  | +
Ribulose metabolism       | -  | -  | +  | +
Serine to cysteine        | -  | +  | -  | -
Serine to glycine         | +  | +  | -  | -
Sorbose metabolism        | -  | -  | -  | +
Sorbitol metabolism       | -  | -  | +  | +
Sucrose metabolism        | -  | -  | -  | +
Sulfate to sulfite        | +  | -  | +  | -
Tryptophan biosynthesis   | -  | -  | +  | -
TCA partial               | +  | -  | +  | +
Urea to CO2               | -  | +  | -  | -
Xylan metabolism          | -  | -  | -  | +
Xylose metabolism         | -  | -  | -  | +
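A presence/absence table such as Table 2 lends itself to simple queries, for example asking which pathways are predicted in only one of the four genomes. The sketch below is illustrative only: it hard-codes a handful of rows copied from Table 2, and the names `PROFILES` and `unique_to` are hypothetical helpers, not part of APIS or ECFinder.

```python
# Column order follows Table 2: TC, CP, TY, DT.
GENOMES = ("TC", "CP", "TY", "DT")

# A few rows copied from Table 2 ("+" = predicted, "-" = not predicted).
PROFILES = {
    "Acetoin metabolism": "++++",
    "Aspartate to oxaloacetate": "+---",
    "Cellulose to cellobiose": "---+",
    "Glycolysis intact": "-+++",
    "Pyruvate to cysteine": "-+--",
    "Tryptophan biosynthesis": "--+-",
    "Xylan metabolism": "---+",
}


def unique_to(genome, profiles=PROFILES):
    """Return pathways predicted in `genome` and in no other genome."""
    idx = GENOMES.index(genome)
    return sorted(
        name
        for name, calls in profiles.items()
        if calls[idx] == "+" and calls.count("+") == 1
    )


dt_only = unique_to("DT")
tc_only = unique_to("TC")
```

Extending `PROFILES` to all rows of Table 2 would give the full list of genome-specific predictions.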
BACKGROUND

Figure 1. Bacterial Phyla. The tree is a schematic diagram showing the major recognized bacterial phyla (based in part on Hugenholtz (2002) and Boone et al. (2001)). Phyla are colored by genome project and culturing status. In red are the phyla selected for this project.
Legend: This Project; Published; In progress; Uncultured lineage.
Tree labels: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine GroupA, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11.

Figure 2. Project “Pipeline”
•ID phyla with cultured representative but no genomes
•Determine physiology
•Obtain live cultures
•Genomic DNA, PFGE
•Make shotgun library and sequence 2-300 clones
•Phylogenetic analysis
•If conflicts with accepted, further investigation
•Shotgun sequence
•Data Release
•Assemble genome
•Close physical and sequencing gaps
•Annotate genomes
•Phylogenomic Analysis

APIS TOOLS - EXAMPLES

Automated Whole Genome Trees
•Select “universal” phylogenetic markers
•Build alignments for known species
•Build Hidden Markov Models (HMMs)
•Search HMMs against selected genomes
•Align genes, concatenate alignments

Automated Phylogenetic Inference System
•Genomic Data Set (Genes, Proteins, DNA sequences)
•BLASTP vs. ComboDB (all proteins from complete genomes)
•Extract full length homologs from ComboDB
•Multiple Sequence Alignment (MUSCLE)
•Phylogenetic Inference (currently bootstrapped NJ using QuickTree)
•Determination of most related groups
•Web Display (see Figure 4)

rRNA Automated Phylogeny/Taxonomy Pipeline
•Small Subunit rRNA PCR
•Sequence and Assemble
•Chimera Check
•Domain (ABE) Assignment by Blastn, Align, NJ Tree with Taxa Reps
•Search Domain-Specific DB
•“Big Picture Tree” by Blastn, Align, NJ Tree with Taxa Reps
•Taxon Assignment
•Extract Taxon Seqs Plus Outgroups, and Top Hits of RDP II
•Align
•Zoom In Tree (NJ or ML)
•Compare Zoomed and Big Picture Trees
•Web Display

ACKNOWLEDGEMENTS

Thanks to Connie Shao for building the project web page and John Heidelberg for providing access to unpublished sequence data (supported by NSF FIBR Grant GC054-14-Z3423). This project is supported by an award from the National Science Foundation’s “Assembling the Tree of Life” program (Grant DEB-0228651).

REFERENCES

•Hugenholtz P (2002) Exploring prokaryotic diversity in the genomic era. Genome Biology 3(2):reviews0003.1-0003.8.
•Boone DR, Castenholz RW, Garrity GM (2001) Bergey’s Manual of Systematic Bacteriology, 2nd edition. Springer, New York.