Computational Prediction and Comparative Analysis of Protein Subcellular Localization in Bacteria

COMPUTATIONAL PREDICTION AND COMPARATIVE ANALYSIS OF PROTEIN SUBCELLULAR LOCALIZATION IN BACTERIA Jennifer Leigh Gardy B.Sc., University of British Columbia, 2000 Graduate Certificate in Biotechnology, McGill University, 2001 THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY In the Department of Molecular Biology and Biochemistry O Jennifer Gardy 2006 SIMON FRASER UNIVERSITY Spring 2006 All rights reserved. This work may not be reproduced in whole or in part, by photocopy or other means, without permission of the author APPROVAL Name: Jennifer Leigh Gardy Degree: Doctor of Philosophy Title of Thesis: Computational Prediction and Comparative Analysis of Protein Subcellular Localization in Bacteria Examining Committee: Chair: Dr. Erika Plettner Assistant Professor, Department of Chemistry Dr. Fiona S.L. Brinkman Senior Supervisor Assistant Professor, Department of Molecular Biology and Biochemistry Dr. Jamie K. Scott Supervisor Professor, Department of Molecular Biology and Biochemistry Dr. Frederic Pio Supervisor Assistant Professor, Department of Molecular Biology and Biochemistry Dr. Margo Moore Internal Examiner Professor, Department of Biological Sciences Dr. Chris Upton External Examiner Associate Professor, Department of Biochemistry and Microbiology University of Victoria Date DefendedlApproved: Friday, April 7, 2006 SIMON FRASER ,,@ UNIVERSIW~ ibra ry DECLARATION OF PARTIAL COPYRIGHT LICENCE The author, whose copyright is declared on the title page of this work, has granted to Simon Fraser University the right to lend this thesis, project or extended essay to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users. The author has further granted permission to Simon Fraser University to keep or make a digital copy for use in its circulating collection, and, without changing the content, to translate the thesislproject or extended essays, if technically possible, to any medium or format for the purpose of preservation of the digital work. The author has further agreed that permission for multiple copying of this work for scholarly purposes may be granted by either the author or the Dean of Graduate Studies. It is understood that copying or publication of this work for financial gain shall not be allowed without the author's written permission. Permission for public performance, or limited permission for private scholarly use, of any multimedia materials forming part of this work, may have been granted by the author. This information may be found on the separately catalogued multimedia material and in the signed Partial Copyright Licence. The original Partial Copyright Licence attesting to these terms, and signed by this author, may be found in the original bound copy of this work, retained in the Simon Fraser University Archive. Simon Fraser University Library Burnaby, BC, Canada ABSTRACT Predicting the subcellular localization of a protein is a critical step in processes ranging from genome annotation to drug and vaccine target discovery. Previously developed methods for localization prediction in bacteria exhibit poor predictive performance and are not conducive to the high-throughput analysis required in this era of genome-scale biological analysis. We therefore developed PSORTb, a high-precision, high-throug hput tool for the prediction of bacterial protein localization. PSORTb implements a multi-component approach to prediction, incorporating the detection of several sequence features known to influence subcellular localization. With a reported overall precision of 96%, it is the most precise method available and one of the most comprehensive methods - capable of assigning a query protein to one or more of four Gram-positive or five Gram-negative localization sites. The PSORTb algorithm comprises a series of analytical steps, each step - or module - being an independent piece of software which scans the protein for the presence or absence of a particular sequence feature. Modules include: SCL-BLAST for homology-based detection, the HMMTOP transmembrane helix prediction tool, a signal peptide prediction tool, a series of frequent subsequence-based support vector machines, as well as motif and profile-matching modules. The modules return as output either a predicted localization site or - if the feature is not detected - a result of "unknown". The output is then integrated by a Bayesian network into a final prediction. Development of PSORTb also required the creation of PSORTdb, a database storing both known and predicted localization information for bacterial proteins. This is a valuable resource to both the localization prediction and microbial research communities, providing a source of training data for new predictive algorithms and acting as a discovery space. The release of PSORTb v.2.0 allowed us to carry out a number of analyses related to localization. We performed the first genome-wide computational and laboratory screen for N- terminal signal peptides in the opportunistic pathogen Pseudomonas aeruginosa, used PSORTb as a complement to laboratory-based high-throughput 2D gel studies of individual cellular compartments, and examined protein localization in a global context, revealing trends with implications for adaptive evolution in microbes. Keywords: Bioinformatics, localization, pathogenomics, proteomics, bacteria. ACKNOWLEDGEMENTS I would like to extend my sincere thanks to everyone involved in the PSORTb project. First and foremost, Dr. Fiona S.L. Brinkman and the members of the Brinkman Lab - past and present - who collaborated on the PSORTb work: Michael Acab, Matthew Laird, Dr. Sebastien Rey, Cory Spencer, and Christopher Walsh. Thank you also to our collaborators in Simon Fraser University's Computing Science Department: Fei Chen, Dr. Martin Ester, Dr. Oliver Schulte, Rong She, and Dr. Ke Wang; at the University of British Columbia: Dr. Raphael Gottardo, Dr. Robert E.W. Hancock, and Dr. Shawn Lewenza; and around the world: Katalin deFays, Dr. Sujun Hua, Dr. Christophe Lambert, Dr. lstvan Simon, and Dr. Gabor Tusnady. Special thanks to Dr. Kenta Nakai, who developed the first version of PSORT in 1991 and graciously allowed us access to his source code and offered much helpful advice. TABLE OF CONTENTS .. Approval ..............................................................................................................11 ... Abstract ..............................................................................................................III Dedication............................................................................................................ v Acknowledgements ...........................................................................................vi .. Table of Contents ..............................................................................................VII List of Figures ....................................................................................................xi .. List of Tables .....................................................................................................XII ... Glossary ............................................................................................................XIII 1 An introduction to protein subcellular localization in bacteria ................1 1. 1 Protein localization in the bacterial cell ...............................................1 1.2 Signals governing protein targeting in bacteria ...................................3 1.2.1 Bacterial transport systems ............................................................. 3 1.2.2 Type I transporters - ABC transporters ...........................................4 1.2.3 Type II transporters - the general secretory pathway (GSP) ...........4 1.2.4 Type V transporters - autotransporters ...........................................9 1.2.5 Type Ill and IV transporters - delivery of DNAIprotein into host cells .......................................................................................I0 1.2.6 Differences in protein transport in Gram-positive bacteria ............. 11 1.3 Laboratory-based methods for localization determination ................. 12 1.3.1 Microscopy-based visualization .....................................................12 1.3.2 PhoA fusion ...................................................................................13 1.3.3 Subcellular fractionation ................................................................14 1.3.4 Limitations of laboratory-based methods .......................................14 1.4 Computational methods for localization prediction in bacteria .......... I6 1.4.1 The importance of computational predictive methods ................... 16 1.4.2 Early computational methods for localization prediction in bacteria ..........................................................................................18 1.4.3 Recent computational methods for localization prediction in bacteria.......................................................................................... 22 1.5 Goal of the present research ............................................................27 2 ePSORTdb: A database of bacterial proteins of known localization.................................................................................................. 28 2.1 Summary ..........................................................................................28

Load more