
D320–D324 Nucleic Acids Research, 2017, Vol. 45, Database issue
Published online 28 November 2016
doi: 10.1093/nar/gkw1068

TransportDB 2.0: a database for exploring membrane transporters in sequenced genomes from all domains of life

Liam D. H. Elbourne*, Sasha G. Tetu, Karl A. Hassan and Ian T. Paulsen*

Department of Chemistry and Biomolecular Sciences, Macquarie University, NSW 2109, Australia

Received September 09, 2016; Revised October 20, 2016; Editorial Decision October 24, 2016; Accepted October 25, 2016

*To whom correspondence should be addressed. Tel: +61 2 98508122; Fax: +61 2 98508313; Email: [email protected]
Correspondence may also be addressed to Ian T. Paulsen. Tel: +61 2 98508152; Fax: +61 2 98508313; Email: [email protected]

© The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]

ABSTRACT

All cellular life contains an extensive array of membrane transport proteins. The vast majority of these transporters have not been experimentally characterized. We have developed a bioinformatic pipeline to identify and annotate complete sets of transporters in any sequenced genome. This pipeline is now fully automated enabling it to better keep pace with the accelerating rate of genome sequencing. This manuscript describes TransportDB 2.0 (http://www.membranetransport.org/transportDB2/), a completely updated version of TransportDB, which provides access to the large volumes of data generated by our automated transporter annotation pipeline. The TransportDB 2.0 web portal has been rebuilt to utilize contemporary JavaScript libraries, providing a highly interactive interface to the annotation information, and incorporates analysis tools that enable users to query the database on a number of levels. For example, TransportDB 2.0 includes tools that allow users to select annotated genomes of interest from the thousands of species held in the database and compare their complete transporter complements.

INTRODUCTION

All living cells have proteins in their cell membrane(s), generally referred to as transport proteins, that play crucial roles in fundamental processes such as the uptake of nutrients, the efflux of toxic and other compounds and in ion homeostasis. Transport proteins can be simple channels or pores created in the membrane, that facilitate diffusions of compounds down their concentration gradient, or active transporters that utilize either ATP and PEP hydrolysis, or chemiosmotic energy in the form of an electrochemical proton, ion or solute gradient, to drive the movement of solutes against their concentration gradient (1,2).

Genomic DNA sequencing efforts are generating data on predicted genes many orders of magnitude faster than laboratory experimentalists can investigate. This has resulted in an increased requirement for high throughput and high quality in silico bioinformatic annotation that is not accommodated by basic genome annotation pipelines. For example, thousands of membrane transport proteins have been experimentally characterized over the course of the last few decades (3), but we now have on the order of millions of predicted transporters. Furthermore, the number and type of transport systems varies widely between genomes (4). However, experimentally characterized transporters can be utilized as a reference set from which the functions of the putative transporters can be computationally inferred based on primary sequence identity, predicted secondary structural homology and topology.

TransportDB is a MySQL database with the objective of providing annotations for predicted transporters across sequenced genomes. In 2004, TransportDB was initially released with membrane transport analysis conducted on 121 genomes (5), and subsequently updated in 2007 (3) to 248 genomes. The analyses were conducted by a semi-automated bioinformatic pipeline with additional manual curation. Subsequently a further 117 genomes were analyzed and the results included in TransportDB, facilitating large-scale analysis of membrane transporter distribution (3,4). There are tens of thousands of complete and/or draft genome sequences in the major public databases and these numbers continue to increase at an accelerating rate (6,7). The semi-automatic bioinformatic characterisation, combined with a degree of manual curation, utilized by the original TransportDB, does not scale well and is inadequate at keeping pace with the rapid rate of sequence data generation.

To the best of our knowledge there are no other databases that currently describe transporters amongst large numbers of sequenced genomes. Other transporter tools or databases that exist include TransportTP, an annotation pipeline optimized for eukaryotic organisms, that uses homology modeling approaches followed by machine learning methods, which requires a model organism to be selected to refine the predictions (8). There are some databases that focus on only one type of transporter (e.g. the Ligand-Gated Ion Channel database; (9)) or include the transporters only of a single model organism (e.g. Human Transporter Database, (10)). The Transporter Classification Database, TCDB (11), is a very well developed resource for classification of transporters, whose main focus is to classify model proteins into families (analogous to the enzyme commission system) but it does not list annotations for whole organisms.

In this paper, we present the latest iteration of the database, TransportDB 2.0, which now utilizes an augmented pipeline with completely automated transporter annotation, and hence is able to better keep pace with the rate of data acquisition. We have populated TransportDB 2.0 with the membrane transporter annotation for over 2700 closed genomes, including the representative sequences available from NCBI's RefSeq database, and this is being added to on an ongoing basis. Furthermore, we have modernized the web portal for TransportDB 2.0, to better visualize data generated from the increased numbers of genomes now available, to expedite the future inclusion of more interactive and engaging ways of viewing the data presented, and to facilitate a range of comparative analyses.

Data sources

The transporter annotation pipeline uses data sources including NCBI's RefSeq (ftp://ftp.ncbi.nih.gov/genomes/refseq/bacteria) (12) and COG (13) databases, the Transporter Classification Database (TCDB; www.tcdb.org) (11) as well as selected HMMs for transporter protein families from the TIGRfam and Pfam databases (14,15). An additional important datasource is the manually curated set of over 100 000 membrane transport proteins from the original TransportDB database. RefSeq is used as the source for new genome sequences to provide a diverse set of annotations for organisms with genome sequences available, while avoiding the potential redundancy of GenBank and other International Nucleotide Sequence Database Collaboration databases. All data sources are updated regularly to ensure [...]

[...] TMHMM 2.0c (20) searches for each protein sequence, using the databases listed above where relevant.

Subsequently a combination of Perl, bash and PHP scripts identify and load all potential transport protein candidates into a MySQL database, based on them having potentially significant identity to transporter candidates in the TransportDB, TCDB, Pfam, TIGRfam and COG datasets, as well as putative transmembrane regions as determined using TMHMM. These searches are designed to be inclusive to minimize false negatives in this first stage of the selection process. Subsequently, a combination of PHP and Perl scripts are used to drive a rigorous filtering of the candidates, using empirically derived rules to eliminate false positives.

The above processes result in the generation of three SQL files per genome, encoding: (i) the genome data (taxonomy, size, transporter number), (ii) the identified transporters and (iii) the history of the searches (the nearest database matches to the identified transporters). Each of these SQL files are then uploaded to corresponding tables in the MySQL database on the TransportDB 2.0 server (http://www.membranetransport.org/transportDB2/) and are accessible through the TransportDB 2.0 portal.

Web interface

The previous version of the website serving TransportDB was primarily server-side driven, with PHP scripts providing service of the MySQL database to access to the raw transporter annotation data. In order to move toward providing more engaging and dynamic ways of viewing content, and incorporate new analytical tools which facilitate greater user interaction, we have developed a new website utilizing the more extensible, and client-side based resources provided by the jQuery (21) and D3 (22) JavaScript libraries.

The front page of the new site includes several panels of summary information and mechanisms to access data. The top of the page has links to data or analysis tools (detailed below), as well as an overview describing the current contents of the database, including the number of genomes that have been analyzed from the broad taxonomic divisions [...]
