
Review
J Mol Microbiol Biotechnol 2006;11:291–301
DOI: 10.1159/000095631

Phylogenetic Analysis of General Bacterial Porins: A Phylogenomic Case Study

Thai X. Nguyen, Eric R. Alegre, Scott T. Kelley
Department of Biology, San Diego State University, San Diego, Calif., USA

Key Words
Bioinformatics · Outer membrane protein · ompF · ompC · Phylogeny · Porin

Abstract
Bacterial porin proteins allow the selective movement of hydrophilic solutes through the outer membrane of Gram-negative bacteria. The purpose of this study was to clarify the evolutionary relationships among the Type 1 general bacterial porins (GBPs), a porin protein subfamily that includes the outer membrane proteins ompC and ompF, among others. Specifically, we investigated the potential utility of phylogenetic analysis for refining poorly annotated or mis-annotated protein sequences in databases, and for characterizing new, functionally distinct groups of porin proteins. Preliminary phylogenetic analysis of sequences obtained from GenBank indicated that many of these sequences were incompletely or even incorrectly annotated. Using a well-curated set of porins classified via comparative genomics, we applied recently developed Bayesian phylogenetic methods for protein sequence analysis to determine the relationships among the Type 1 GBPs. Our analysis found that the major GBP classes ompC, phoE, nmpC and ompN each formed a strongly supported monophyletic group, whereas ompF split into two distinct clades. The relationships of the GBP groups to one another had less statistical support, except for ompC and ompN, which were strongly supported as sister groups. A phylogenetic analysis comparing the GenBank GBP sequences with the correctly annotated set of GBPs identified a large number of previously unclassified and mis-annotated GBPs. Given these promising results, we developed a tree-parsing algorithm for automated phylogenetic annotation and tested it with GenBank sequences. Out of 78 sequences, our algorithm automatically classified 30 unidentified and 15 mis-annotated GBPs. Altogether, our results support the potential for phylogenomics to increase the accuracy of sequence annotations.

Copyright © 2006 S. Karger AG, Basel
1464–1801/06/0116–0291$23.50/0
Accessible online at: www.karger.com/mmb

Scott T. Kelley, Department of Biology, San Diego State University, San Diego, CA 92182-4614 (USA); Tel. +1 619 594 5371, Fax +1 619 594 4767; E-Mail [email protected]

Introduction
Gram-negative bacteria are distinguishable from Gram-positive bacteria by the presence of an outer membrane. This membrane serves as a selective permeation barrier that restricts the movement of hydrophilic solutes into and out of the cell [Koebnik et al., 2000; Nikaido, 2003]. Solutes cross the membrane through channel-forming proteins. The general bacterial porins (GBPs) are one such class of channel-forming proteins, found in members of the gamma-proteobacteria such as Escherichia coli, Shigella, Salmonella, Yersinia and others [Koebnik et al., 2000; Schulz, 2002]. These non-specific permeation porins are the most abundant outer membrane proteins of enteropathogenic bacteria [Blasband et al., 1986; Blasband and Schnaitman, 1987; Schulz, 2002]. The monomeric porin proteins form a stable trimeric channel that allows passive diffusion of nutrients across the outer membrane, and this trimer can also facilitate adhesion, invasion, and parasitism by pathogenic bacteria [Williams et al., 2000].

The three best-studied GBPs in E. coli are ompF, ompC, and phoE, which differ from one another in their solute selectivity [Nikaido, 2003]. The expression of these well-characterized porins is affected by osmolarity, temperature, available carbon sources, and phosphate concentration, and these conditions have been used to characterize other porins, such as nmpC and the LC porins [Nikaido, 2003]. The LC porin and nmpC genes are located on lambdoid bacteriophages and on defective lambdoid prophages that have been integrated into bacterial genomes [Blasband et al., 1986; Blasband and Schnaitman, 1987; Prilipov et al., 1998]. The ompF, ompC, phoE, nmpC, and LC porins have all been classified as Type 1 GBPs by the Transport Classification Database (TCDB) [Saier et al., 2006], and we adopt this classification throughout this paper.

The purpose of this study was to determine the phylogenetic relationships among the Type 1 GBPs in order to better understand their evolution and ultimately assist the development of a broad-spectrum GBP vaccine antigen [Singh et al., 1995]. However, preliminary phylogenetic analysis of GBP-like porins obtained from a GenBank BLAST search indicated that a substantial number of protein sequences identified as ompF, ompC, or other types of general class porins were either poorly annotated or mis-annotated (fig. 1). Most of the bacterial sequences we procured from GenBank had presumably been annotated using the BLAST algorithm [Altschul et al., 1990]. BLAST is arguably the most powerful and useful tool in bioinformatics; it has been used to functionally annotate millions of genes, saving untold hours of experimentation and providing remarkable insight into biological systems. Although the algorithm is both deceptively simple and remarkably powerful, researchers have recognized that BLAST cannot reliably distinguish between orthologous genes (sequences that diverged through a speciation event) and paralogous genes (sequences that diverged after an ancestral gene duplication event) [Barbazuk et al., 2000; Chiu et al., 2006; Daubin et al., 2002; Srinivasan et al., 2005]. Determining orthology or paralogy is critically important because paralogous genes often have distinct functional roles in an organism (e.g., ompF and ompC). Phylogenetic […] sequences, and these methods have often been used to characterize new functional groups of proteins [Barbazuk et al., 2000; Kelley and Thackray, 1999; Yi et al., 1999].

Given the utility of phylogenetic analyses for classifying orthologs and paralogs, we set out to determine the effectiveness of newly developed phylogenetic methods for establishing the relationships among the Type 1 GBPs. Starting from a set of GBPs that had been annotated with a combination of BLAST similarity and comparative genomic position analysis, we first determined whether phylogenetic approaches could accurately recover known groupings with high confidence. In other words, did the correctly annotated ompC, ompF, and other GBPs form strongly supported monophyletic groups? Second, we asked whether phylogenetic methods could determine the GBP group affiliation of unidentified porin-like sequences and also correct erroneous annotations. Finally, using newly developed Bayesian phylogenetic methods that incorporate advanced models of protein evolution [Ronquist and Huelsenbeck, 2003], we investigated the evolutionary relationships among the various Type 1 GBP classes and attempted to detect new classes of uncharacterized GBPs. In the process of answering these questions, we also developed a phylogenetic algorithm for automatically annotating new sequences given a correctly annotated set of related sequences. We demonstrate the effectiveness of this 'phylogenomic' approach using porin-related protein sequences obtained from GenBank, and discuss its potential use in automated gene annotation. Our results suggest that automated phylogenetic methods, combined with BLAST and cross-genome comparisons, could be highly effective for improving the quality of gene functional annotations and reducing the propagation of annotation errors in sequence databases.

Fig. 1. Preliminary neighbor-joining (NJ) analysis of putative Type 1 GBPs obtained from a BLAST search of GenBank using the E. coli ompF sequence as the query. The main purpose of the figure is to show the considerable discrepancies between GenBank annotations and the phylogeny of the sequences. Names in boldface indicate protein sequences identified only as porin-like (UP = unknown porin), while arrows highlight sequences annotated as ompC porins. The label at each branch tip includes a three-letter code for the bacterial species (see below) with a unique arbitrary identifying number for comparing trees in figures 3 and 4, the GenBank Identifier (GI) for the sequence, and the gene annotation provided in the GenBank file. Eca = Erwinia carotovora; Eco = E. coli; Sbo = Shigella boydii; Sdy = Shigella dysenteriae; Sen = Salmonella enterica; Sfl = Shigella flexneri; Sgl = Sodalis glossinidius; Sso = Shigella sonnei; Sty = Salmonella typhimurium; Ybe = Yersinia bercovieri; Yfr = Yersinia frederiksenii; Ymo = Yersinia mollaretii; Ype = Yersinia pestis; Yps = Yersinia pseudotuberculosis.

Results
Multiple sequence alignments proved to be of high quality, with few insertions or deletions (indels). Most indels were in the variable extracellular regions of the porin, regions known to undergo relatively rapid evolutionary change [Nikaido, 2003]. For example, out of 486 amino acid alignment positions in the multiple sequence alignment used to estimate the phylogeny in figure
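The tree-parsing annotation algorithm described above is not specified in this part of the paper, so the following Python sketch illustrates only one plausible rule and is not the authors' method. It assumes that a query sequence (unannotated or carrying a GenBank label) receives the class of the smallest enclosing clade containing curated reference sequences, but only when all reference sequences in that clade share a single class; otherwise the query is left unresolved. The tree is assumed to be rooted already and encoded as nested tuples, and the functions `classify` and `annotate`, the status labels, and the toy leaf names (`ref_ompC_1`, `gb_1`, etc.) are hypothetical.

```python
from typing import Optional, Union

# Hypothetical encoding: a leaf is a sequence name (str); an internal node is
# a tuple of child subtrees. The tree is assumed to be rooted already.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> list[str]:
    """Return the names of all leaves in this subtree."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def path_to(tree: Tree, target: str) -> Optional[list[Tree]]:
    """Return the nodes on the path from the root down to leaf `target`."""
    if isinstance(tree, str):
        return [tree] if tree == target else None
    for child in tree:
        sub = path_to(child, target)
        if sub is not None:
            return [tree] + sub
    return None


def classify(tree: Tree, query: str, reference: dict[str, str]) -> Optional[str]:
    """Give `query` the class of the smallest enclosing clade that contains
    curated reference leaves, if those leaves all agree; else return None."""
    path = path_to(tree, query)
    if path is None:
        raise ValueError(f"{query!r} is not a leaf of the tree")
    # Walk upward from the query's parent toward the root.
    for node in reversed(path[:-1]):
        classes = {reference[n] for n in leaves(node) if n in reference and n != query}
        if classes:
            # Unanimous clade -> transfer its label; mixed clade -> unresolved.
            return classes.pop() if len(classes) == 1 else None
    return None


def annotate(
    tree: Tree, reference: dict[str, str], genbank: dict[str, Optional[str]]
) -> dict[str, tuple[Optional[str], Optional[str], str]]:
    """Compare each GenBank label (None = unknown porin) with the prediction."""
    report = {}
    for name, label in genbank.items():
        predicted = classify(tree, name, reference)
        if predicted is None:
            status = "unresolved"
        elif label is None:
            status = "newly classified"
        elif label != predicted:
            status = "possible mis-annotation"
        else:
            status = "consistent"
        report[name] = (label, predicted, status)
    return report


# Toy example with made-up names (not data from the paper).
toy_tree = (
    (("ref_ompC_1", "ref_ompC_2"), "gb_1"),
    (("ref_ompF_1", "ref_ompF_2"), ("gb_2", "gb_3")),
)
toy_reference = {
    "ref_ompC_1": "ompC", "ref_ompC_2": "ompC",
    "ref_ompF_1": "ompF", "ref_ompF_2": "ompF",
}
toy_genbank = {"gb_1": None, "gb_2": "ompC", "gb_3": "ompF"}

for seq, row in annotate(toy_tree, toy_reference, toy_genbank).items():
    print(seq, *row)
```

In the toy tree, `gb_1` sits beside a curated ompC clade and would be newly classified, while `gb_2`, labelled ompC but nested within the ompF clade, would be flagged as a possible mis-annotation. The sketch ignores branch support; a practical version would likely demand strong support (e.g., high posterior probability) before transferring a label, and would read trees from Newick files, for example with Biopython's Bio.Phylo module.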