Phylogenetic Analysis of General Bacterial Porins: A Phylogenomic Case Study


Review. J Mol Microbiol Biotechnol 2006;11:291–301. DOI: 10.1159/000095631

Thai X. Nguyen, Eric R. Alegre, Scott T. Kelley
Department of Biology, San Diego State University, San Diego, Calif., USA

Correspondence: Scott T. Kelley, Department of Biology, San Diego State University, San Diego, CA 92182-4614 (USA). Tel. +1 619 594 5371, Fax +1 619 594 4767, E-Mail [email protected]

Key Words
Bioinformatics · Outer membrane protein · ompF · ompC · Phylogeny · Porin

Abstract

Bacterial porin proteins allow for the selective movement of hydrophilic solutes through the outer membrane of Gram-negative bacteria. The purpose of this study was to clarify the evolutionary relationships among the Type 1 general bacterial porins (GBPs), a porin protein subfamily that includes outer membrane proteins ompC and ompF among others. Specifically, we investigated the potential utility of phylogenetic analysis for refining poorly annotated or mis-annotated protein sequences in databases, and for characterizing new functionally distinct groups of porin proteins. Preliminary phylogenetic analysis of sequences obtained from GenBank indicated that many of these sequences were incompletely or even incorrectly annotated. Using a well-curated set of porins classified via comparative genomics, we applied recently developed bayesian phylogenetic methods for protein sequence analysis to determine the relationships among the Type 1 GBPs. Our analysis found that the major GBP classes (ompC, phoE, nmpC and ompN) formed strongly supported monophyletic groups, with the exception of ompF, which split into two distinct clades. The relationships of the GBP groups to one another had less statistical support, except for the relationships of ompC and ompN sequences, which were strongly supported as sister groups. A phylogenetic analysis comparing the relationships of the GenBank GBP sequences to the correctly annotated set of GBPs identified a large number of previously unclassified and mis-annotated GBPs. Given these promising results, we developed a tree-parsing algorithm for automated phylogenetic annotation and tested it with GenBank sequences. Our algorithm was able to automatically classify 30 unidentified and 15 mis-annotated GBPs out of 78 sequences. Altogether, our results support the potential for phylogenomics to increase the accuracy of sequence annotations.

Copyright © 2006 S. Karger AG, Basel

Introduction

Gram-negative bacteria are distinguishable from Gram-positive bacteria by the presence of an outer membrane. This membrane serves as a selective permeation barrier that restricts the movement of hydrophilic solutes in and out of the cell [Koebnik et al., 2000; Nikaido, 2003]. The movement of solutes across the membrane is made possible by channel-forming proteins. The general bacterial porins (GBPs) comprise one such class of channel-forming proteins found in members of the gamma-proteobacteria, such as Escherichia coli, Shigella, Salmonella, Yersinia and others [Koebnik et al., 2000; Schulz, 2002]. These non-specific permeation porins are the most abundant outer membrane proteins of enteropathogenic bacteria [Blasband et al., 1986; Blasband and Schnaitman, 1987; Schulz, 2002]. The monomeric porin proteins form a stable trimeric channel that allows passive diffusion of nutrients across the outer membrane, and this trimer can also facilitate adhesion, invasion, and parasitism of pathogenic bacteria [Williams et al., 2000].

The three best-studied GBPs in E. coli include ompF, ompC, and phoE, and they differ from one another in their solute selectivity [Nikaido, 2003]. The expression of these well-characterized porins is affected by osmolarity, temperature, available carbon sources, and phosphate concentration, and these conditions have been used to characterize other porins, such as nmpC, and the LC porins [Nikaido, 2003]. The LC porin and nmpC genes are located on lambdoid bacteriophage and defective lambdoid prophage that have been integrated into bacterial genomes [Blasband et al., 1986; Blasband and Schnaitman, 1987; Prilipov et al., 1998]. The ompF, ompC, phoE, nmpC, and LC porins have all been classified as Type 1 GBPs, according to the Transport Classification Database (TCDB) [Saier et al., 2006], and we adopt this classification throughout this paper.

The purpose of this study was to determine the phylogenetic relationships among the Type 1 GBPs in order to better understand their evolution and ultimately assist the development of a broad-spectrum GBP vaccine antigen [Singh et al., 1995]. However, preliminary phylogenetic analysis of GBP-like porins obtained from a GenBank BLAST search indicated that a substantial number of protein sequences identified as ompF, ompC, or other types of general class porins were either poorly annotated or mis-annotated (fig. 1). Most of the bacterial sequences we procured from GenBank had presumably been annotated using the BLAST algorithm [Altschul et al., 1990]. The BLAST algorithm is arguably the most powerful and useful tool in bioinformatics, and has been used to functionally annotate millions of genes saving untold hours of experimentation and providing remarkable insight into biological systems. Although this algorithm is both deceptively simple and remarkably powerful, researchers have recognized that the BLAST algorithm cannot reliably distinguish between orthologous (sequences related through common ancestry) and paralogous (sequence similarity due to an ancestral duplication event) genes [Barbazuk et al., 2000; Chiu et al., 2006; Daubin et al., 2002; Srinivasan et al., 2005]. Determination of orthology or paralogy is critically important because paralogous genes often have distinct functional roles in organisms (e.g., ompF, ompC). Phylogenetic [...] sequences, and these methods have often been used to characterize new functional groups of proteins [Barbazuk et al., 2000; Kelley and Thackray, 1999; Yi et al., 1999].

Given the utility of phylogenetic analyses for classifying orthologs and paralogs, we set forth to determine the effectiveness of newly developed phylogenetic methods for establishing the relationships among the Type 1 GBPs. Using a set of GBPs that had been annotated using a combination of BLAST similarity and comparative genomic position analysis, we first determined whether phylogenetic approaches could accurately recover known groupings with high confidence. In other words, did the correctly annotated ompC, ompF, and other GBPs form strongly supported monophyletic groups? Second, we asked whether phylogenetic methods could determine the GBP group affiliation of unidentified porin-like sequences and also correct erroneous annotations. Finally, using newly developed bayesian phylogenetic methods that incorporate advanced models of protein evolution [Ronquist and Huelsenbeck, 2003], we investigated the evolutionary relationships among the various Type 1 GBP classes and attempted to detect new classes of uncharacterized GBPs. In the process of answering these questions, we also developed a phylogenetic algorithm for automatically annotating new sequences given a correctly annotated set of related sequences. We demonstrate the effectiveness of this 'phylogenomic' approach using porin-related protein sequences obtained from GenBank, and discuss its potential use in automated gene annotation. Our results suggest that automated phylogenetic methods, combined with BLAST methods and cross-genome comparisons, could be highly effective for improving the quality of gene functional annotations and reducing annotation error propagation in sequence databases.

Results

Multiple sequence alignments proved to be of high quality, with few insertions or deletions. Most of these insertions or deletions (indels) were in the variable extracellular regions of the porin, regions which are known to undergo relatively rapid evolutionary change [Nikaido, 2003]. For example, out of 486 amino acid alignment positions in the multiple sequence alignment used to estimate the phylogeny in figure [...]

Fig. 1. Preliminary NJ analysis of putative Type 1 GBPs obtained from a BLAST search of GenBank using the E. coli ompF sequence as the query. The main purpose of the figure is to show the considerable discrepancies between GenBank annotations and the phylogeny of the sequences. Names in boldface indicate protein sequences identified only as porin-like (UP = unknown porin), while arrows highlight sequences annotated as ompC porins. The sequence information at the tips of the branches includes a three-letter code for the bacterial species (see below) with a unique arbitrary identifying number for comparing trees in figures 3 and 4. The name also includes the GenBank Identifier (GI) for the sequence, and gene annotation information provided in the GenBank file. Eca = Erwinia carotovora; Eco = E. coli; Sbo = Shigella boydii; Sdy = Shigella dysenteriae; Sen = Salmonella enterica; Sfl = Shigella flexneri; Sgl = Sodalis glossinidius; Sso = Shigella sonnei; Sty = Salmonella typhimurium; Ybe = Yersinia bercovieri; Yfr = Yersinia frederiksenii; Ymo = Yersinia mollaretii; Ype = Yersinia pestis; Yps = Yersinia pseudotuberculosis.
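
The abstract and introduction describe a tree-parsing approach that annotates unidentified sequences from their position relative to a correctly annotated reference set, and they test whether each GBP class forms a monophyletic group. The excerpt does not give the details of the authors' algorithm, so the following Python is only a minimal sketch of one plausible rule, not their method. It assumes a rooted tree written as nested tuples with tip names as strings, and it gives each unlabeled tip the class of the smallest clade around it whose reference tips all agree. The function names and the tip names in the usage example are hypothetical.

```python
def leaves(node):
    """Return the tip names under a node (a tip is a plain string)."""
    if isinstance(node, str):
        return [node]
    tips = []
    for child in node:
        tips.extend(leaves(child))
    return tips


def clades(node):
    """Yield the tip set of every internal node, root included."""
    if isinstance(node, str):
        return
    yield frozenset(leaves(node))
    for child in node:
        yield from clades(child)


def is_monophyletic(tree, members):
    """True if some clade contains exactly the given tips and no others."""
    members = frozenset(members)
    return any(clade == members for clade in clades(tree))


def annotate(tree, reference):
    """Assign a class to each tip that is missing from `reference`.

    Assumed rule (not taken from the paper): use the smallest clade that
    contains the query tip and at least one reference tip. If those
    reference tips share one class, assign it; otherwise leave None.
    """
    by_size = sorted(clades(tree), key=len)  # nested clades, smallest first
    result = {}
    for tip in leaves(tree):
        if tip in reference:
            continue
        result[tip] = None
        for clade in by_size:
            if tip not in clade:
                continue
            labels = {reference[t] for t in clade if t in reference}
            if not labels:
                continue  # no reference tips yet, try the next larger clade
            if len(labels) == 1:
                result[tip] = labels.pop()
            break  # mixed classes: leave the tip unclassified
    return result


# Hypothetical example: three unlabeled "UP" tips among reference porins.
tree = (
    (("Eco_ompC", "Sty_ompC"), "UP_1"),
    (("Eco_ompF", "UP_2"), "Eco_phoE"),
    "UP_3",
)
reference = {
    "Eco_ompC": "ompC",
    "Sty_ompC": "ompC",
    "Eco_ompF": "ompF",
    "Eco_phoE": "phoE",
}
print(annotate(tree, reference))
```

In this toy tree, UP_1 and UP_2 each fall inside a clade whose reference tips agree, while UP_3 joins only at the root, where the reference classes are mixed, so it stays unclassified. A practical version would presumably also require a minimum clade support value (for example a posterior probability threshold) before accepting an assignment. That is a design choice for the reader; the excerpt does not state it.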
