Combinatorial Chemistry & High Throughput Screening, 2014, 17, 173-182

Computer Applications Making Rapid Advances in High Throughput Microbial Proteomics (HTMP)

Balakrishna Anandkumar1,§, Steve W. Haga2,§ and Hui-Fen Wu*,3,4,5,6

1Department of Biochemistry and Biotechnology, Sourashtra College, Madurai 625004, India
2Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, 804, Taiwan
3Department of Chemistry, National Sun Yat Sen University, Kaohsiung, 804, Taiwan
4School of Pharmacy, College of Pharmacy, Kaohsiung Medical University, Kaohsiung, 800, Taiwan
5Center for Nanoscience and Nanotechnology, National Sun Yat-Sen University, Kaohsiung, 804, Taiwan
6Doctoral Degree Program in Marine Biotechnology, National Sun Yat-sen University, Kaohsiung, 804, Taiwan

Abstract: The last few decades have seen the rise of widely available proteomics tools. From new data acquisition devices, such as MALDI-MS and 2DE, to new database-searching software, these products have paved the way for high throughput microbial proteomics (HTMP). These tools are enabling researchers to gain new insights into microbial metabolism, and are opening up new areas of study, such as the discovery of protein-protein interactions (interactomics). Computer software is a key part of these emerging fields. This review considers: 1) software tools for identifying the proteome, such as MASCOT or PDQuest, 2) online databases of proteomes, such as SWISS-PROT, Proteome Web, or the Proteomics Facility of the Pathogen Functional Genomics Resource Center, and 3) software tools for applying proteomic data, such as PSI-BLAST or VESPA. These tools allow for research in network biology, protein identification, functional annotation, target identification/validation, protein expression, protein structural analysis, metabolic pathway engineering and drug discovery.
Keywords: Drug discovery, high throughput microbial proteomics (HTMP), high throughput screening, protein identification.

*Address correspondence to this author at the Department of Chemistry, National Sun Yat Sen University, Kaohsiung, 804, Taiwan; Tel: 886-7-5252000-3955; Fax: 886-7-5253908; E-mail: [email protected]
§Co-first author: contributed equally to the review.
1875-5402/14 $58.00+.00 © 2014 Bentham Science Publishers
Combinatorial Chemistry & High Throughput Screening, 2014, Vol. 17, No. 2

1. INTRODUCTION

In recent decades, proteomics research has made rapid advances. Some of the factors driving this advance include expanding genome databases, improved protein identification technologies, and improved software tools for analyzing protein information. Microbial genome sequencing projects have shed light on metabolic pathways, gene functions, metabolic networks, and the potential applications of microbes in various fields. Each of these areas is its own field of study. Proteomics has therefore given rise to many other “omics”: transcriptomics, proteogenomics and interactomics [1].

Fig. (1) presents an overview of some of the various tools available in microbial protein analysis, and also presents the various ways in which researchers generate and use proteomic data with these tools. Please note that a single user never performs all of the steps that are shown in this figure. Typically, the user will just perform the steps along the path of interest. The user might wish, for example, to: 1) use 2DE to isolate proteins, 2) use MALDI-MS to create a set of spectra, 3) use the MASCOT software to create the proteome, 4) use PSI-BLAST to identify the shapes of the proteins within this proteome, 5) use Psort to identify potential targets, and then 6) use Geno3D to identify a potentially new antimicrobial agent.

In fact, many users will not even perform all of the steps along a path. If, for example, the user already has the genome of the microbe (because someone else has already sequenced it), then the user might simply use it. For the proteome, however, reuse is more problematic, because the proteome is not a constant; it changes as the microbe responds to its environment. This fact notwithstanding, there is also value in analyzing already-known proteomes. Concerning this concept of reuse, the key point is to take particular note of the five blue circles in the figure. Each of these circles represents an “ome,” and each of these “omes” represents a type of information that can be obtained from private and/or publicly available databases. Consequently, some users will not perform all of the steps comprising a full path through Fig. (1); instead, they will start in the middle of the figure, using existing databases as their inputs.

Regarding the tools shown in this figure, a distinction must be made between software systems and specific algorithms. Various companies offer full packages that simplify the user’s task by integrating the steps of Fig. (1) into a complete package that can be managed by an easy-to-use graphical user interface. In some cases these systems may blur the lines presented in the figure, as certain tools may have the capability to accomplish more than one task (although only one task is shown for each). This technicality is not of real consequence to the figure, however, since the figure is intended to present a flow, but is not intended to be comprehensive of all of the tools available. (If a more comprehensive list is desired, consider the ExPASy proteomics tools page [2]).

Fig. (1). An illustration of how the various tools for high throughput microbial proteomics (HTMP) relate to one another. On the left-hand side of the figure, a microbe is subjected to molecular analysis. To obtain DNA data, shotgun cloning can be used; to obtain RNA data, reverse transcription of expressed mRNAs can be used; to obtain protein data, various techniques involving chromatography and/or mass spectrometry can be used. Next, this raw data can be analyzed by software to derive information about the genome, transcriptome or proteome, respectively. The middle of the figure also indicates that the genome or the transcriptome can be used to derive proteomic information. The proteomic prediction power of the genome is weak, because not all genes are expressed; the proteomic prediction power of the transcriptome is strong, because it measures the specific instructions for protein creation at a moment in time. Next, the right-hand side of the figure presents the various uses of the proteome. Software tools like VESPA can allow proteomic information to be annotated onto the genome, at the site of the specific gene that produced it. Software tools like PSI-BLAST can be used to predict structural information about the proteins (or, more accurate information can be derived directly from the sample through tedious X-ray crystallography). Regardless of how these structures are derived, software tools such as PREDICTOME identify interactions between proteins. Alternatively, these interactions can be predicted directly from the proteomic information, with tools like BIND. Finally, on the far right of the figure, some of the real-world applications of this data are considered, along with some of the tools available for these applications.

Having considered Fig. (1) as a whole, it is now time to delve into its details. The details of Fig. (1) will serve as a framework for the organization of the remainder of this paper, as we consider the various software tools that are available at each stage indicated in the figure.

2. RAW DATA COLLECTION

The left-hand side of Fig. (1) describes options for the physical measurement of various molecules within the microbe of interest. Data regarding four types of molecules are discussed in the figure: RNA, DNA, peptides and proteins. There are, of course, many other types of molecules within the microbe; but these are shown because these molecules provide evidence (either directly or indirectly) about the proteome. That is not to say that other molecules, such as metabolites, are less relevant for study; instead, it is only to say that such molecules are outside the scope of this current study, which is focused only on the means of obtaining and/or the means of using the proteome.

For RNA, the process involves creating cDNA (DNA strands complementary to the mRNA), and then passing these cDNA strands over a microarray that contains known DNA sequences at known positions. Based on the binding sites, the transcriptome can be predicted. Each microarray manufacturer offers its own tools to interpret the results. For example, the Factor Analysis for Robust Microarray Summarization (FARMS) tool is available for Affymetrix GeneChips [3]. Notice, therefore, that FARMS is placed on the edge from the cDNA to the transcriptome, indicating that it is one of the tools that researchers may use to create the transcriptome.

For DNA, the process usually involves using shotgun cloning and high throughput Next Generation Sequencing technologies to obtain gene sequences. These sequences are then analyzed with tools for genome assembly, such as PAGIT [4], and with tools for genome comparison, such as MUMmer [5].

For peptide identification, the researcher can excise the proteins from a gel, convert them into peptides, and then analyze them by mass spectrometry. This is indicated by the flow pattern between protein and proteome

After performing 2DE, the result is a gel with spots at certain locations. Two questions naturally arise from this: 1) how to ensure that spots are not overlooked, and 2) how to identify which protein is responsible for producing a specific spot. To address the first question, a variety of 2DE software tools, such as Melanie, Melanie DIGE, Image Master, or PDQuest, provide spot detection algorithms for photographs of gels. To address the second question, the scientist might choose either to use software to analyze the gel results directly, or to continue with physical analysis of the now-isolated individual proteins. This latter option involves the use of mass spectrometry and is described in the next section.
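As a rough illustration of what spot detection on a gel image involves, the minimal sketch below thresholds a grayscale intensity grid and groups adjacent above-threshold pixels into spots, reporting the area, integrated intensity, and centroid of each. It is not the algorithm used by Melanie, Image Master, or PDQuest, whose methods are considerably more sophisticated; the function name `detect_spots`, the threshold value, and the toy image are all assumptions made for this example.

```python
from collections import deque


def detect_spots(image, threshold):
    """Group 4-connected pixels with intensity >= threshold into spots.

    Assumes a positive threshold, so every spot has a non-zero volume.
    Returns one dict per spot with its pixel count ("area"), summed
    intensity ("volume"), and intensity-weighted centroid (row, col).
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    spots = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < threshold:
                continue
            # Breadth-first flood fill collects one connected spot.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            volume = sum(image[y][x] for y, x in pixels)
            centroid = (
                sum(y * image[y][x] for y, x in pixels) / volume,
                sum(x * image[y][x] for y, x in pixels) / volume,
            )
            spots.append(
                {"area": len(pixels), "volume": volume, "centroid": centroid}
            )
    return spots


# Toy 5x6 "gel" with hypothetical intensities: two spots on an empty background.
gel = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 0, 0, 0, 8, 0],
    [0, 0, 0, 0, 0, 0],
]
spots = detect_spots(gel, threshold=5)
for spot in spots:
    print(spot)
```

On a real gel photograph, where spots are usually dark on a light background, the intensities would first need to be inverted and the background subtracted. A fixed global threshold like this one is also likely to miss faint spots or merge overlapping ones, which is part of why dedicated 2DE packages exist.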