
Combinatorial Chemistry & High Throughput Screening, 2014, 17, 173-182

Computer Applications Making Rapid Advances in High Throughput Microbial Proteomics (HTMP)

Balakrishna Anandkumar1,§, Steve W. Haga2,§ and Hui-Fen Wu*,3,4,5,6

1Department of Biochemistry and Biotechnology, Sourashtra College, Madurai 625004, India
2Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, 804, Taiwan
3Department of Chemistry, National Sun Yat Sen University, Kaohsiung, 804, Taiwan
4School of Pharmacy, College of Pharmacy, Kaohsiung Medical University, Kaohsiung, 800, Taiwan
5Center for Nanoscience and Nanotechnology, National Sun Yat-Sen University, Kaohsiung, 804, Taiwan
6Doctoral Degree Program in Marine Biotechnology, National Sun Yat-sen University, Kaohsiung, 804, Taiwan

Abstract: The last few decades have seen the rise of widely-available proteomics tools. From new data acquisition devices, such as MALDI-MS and 2DE, to new database searching software, these products have paved the way for high throughput microbial proteomics (HTMP). These tools are enabling researchers to gain new insights into microbial metabolism, and are opening up new areas of study, such as the discovery of protein-protein interactions (interactomics). Computer software is a key part of these emerging fields. This review considers: 1) software tools for identifying the proteome, such as MASCOT or PDQuest, 2) online databases of proteomes, such as SWISS-PROT, Proteome Web, or the Proteomics Facility of the Pathogen Functional Resource Center, and 3) software tools for applying proteomic data, such as PSI-BLAST or VESPA. These tools allow for research in network biology, protein identification, functional annotation, target identification/validation, protein expression, protein structural analysis, metabolic pathway engineering and drug discovery.

Keywords: Drug discovery, high throughput microbial proteomics (HTMP), high throughput screening, protein identification.

*Address correspondence to this author at the Department of Chemistry, National Sun Yat Sen University, Kaohsiung, 804, Taiwan; Tel: 886-7-5252000-3955; Fax: 886-7-5253908; E-mail: [email protected]
§Co-first author: contributed equally to the review.

1875-5402/14 $58.00+.00 © 2014 Bentham Science Publishers

1. INTRODUCTION

In recent decades, proteomics research has made rapid advances. Some of the factors driving this advance include expanding databases, improved protein identification technologies, and improved software tools for analyzing protein information. Microbial genome projects have shed light on metabolic pathways, functions, metabolic networks, and the potential applications of microbes in various fields. Each of these areas is its own field of study. Proteomics has therefore given rise to many other “omics”: transcriptomics, proteogenomics and interactomics [1].

Fig. (1) presents an overview of some of the various tools available in microbial protein analysis, and also presents the various ways in which researchers generate and use proteomic data with these tools. Please note that a single user never performs all of the steps that are shown in this figure. Typically, the user will just perform the steps along the path of interest. The user might wish, for example, to: 1) use 2DE to isolate proteins, 2) use MALDI-MS to create a set of spectra, 3) use the MASCOT software to create the proteome, 4) use PSI-BLAST to identify the shapes of the proteins within this proteome, 5) use Psort to identify potential targets, and then 6) use Geno3D to identify a potentially new antimicrobial agent.

In fact, many users will not even perform all of the steps along a path. If, for example, the user already has the genome of the microbe (because someone else has already sequenced it), then the user might simply use it. For the proteome, however, reuse is more problematic, because the proteome is not a constant; it changes as the microbe responds to its environment. This fact notwithstanding, there is also value in analyzing already-known proteomes. Concerning this concept of reuse, the key point is to take particular note of the five blue circles in the figure. Each of these circles represents an “ome,” and each of these “omes” represents a type of information that can be obtained from private and/or publicly available databases. Consequently, some users will not perform all of the steps comprising a full path through Fig. (1); instead, they will start in the middle of the figure, using existing databases as their inputs.

Regarding the tools shown in this figure, a distinction must be made between software systems and specific algorithms. Various companies offer full packages that simplify the user’s task by integrating the steps of Fig. (1) into a complete package that can be managed by an easy-to-use graphical user interface. In some cases these systems may blur the lines presented in the figure, as certain tools may have the capability to accomplish more than one task (although only one task is shown for each). This technicality is not of real consequence to the figure, however, since the figure is intended to present a flow, but is not intended to be comprehensive of all of the tools available. (If a more comprehensive list is desired, consider the ExPASy proteomics tools page [2]).

Fig. (1). An illustration of how the various tools for high throughput microbial proteomics (HTMP) relate to one another. On the left-hand side of the figure, a microbe is subjected to molecular analysis. To obtain DNA data, shotgun cloning can be used; to obtain RNA data, reverse transcription of expressed mRNAs can be used; to obtain protein data, various techniques, including chromatography, can be used. Next, this raw data can be analyzed by software to derive information about the genome, transcriptome or proteome, respectively. The middle of the figure also indicates that the genome or the transcriptome can be used to derive proteomic information. The proteomic prediction power of the genome is weak, because not all genes are expressed; the proteomic prediction power of the transcriptome is strong, because it measures the specific instructions for protein creation at a moment in time. Next, the right-hand side of the figure presents the various uses of the proteome. Software tools like VESPA can allow proteomic information to be annotated onto the genome, at the site of the specific gene that produced it. Software tools like PSI-BLAST can be used to predict structural information about the proteins (or, more accurate information can be derived directly from the sample through tedious X-ray crystallography). Regardless of how these structures are derived, software tools such as PREDICTOME identify interactions between proteins. Alternatively, these interactions can be predicted directly from the proteomic information, with tools like BIND. Finally, on the far right of the figure, some of the real-world applications of this data are considered, along with some of the tools available for these applications.

Having considered Fig. (1) as a whole, it is now time to delve into its details. The details of Fig. (1) will serve as a framework for the organization of the remainder of this paper, as we consider the various software tools that are available at each stage indicated in the figure.

2. RAW DATA COLLECTION

The left-hand side of Fig. (1) describes options for the physical measurement of various molecules within the microbe of interest. Data regarding four types of molecules are discussed in the figure: RNA, DNA, peptides and proteins. There are, of course, many other types of molecules within the microbe; but these are shown because these molecules provide evidence (either directly or indirectly) about the proteome. That is not to say that other molecules, such as metabolites, are less relevant for study; instead, it is only to say that such molecules are outside the scope of this current study, which is focused only on the means of obtaining and/or the means of using the proteome.

For RNA, the process involves creating cDNA (DNA strands complementary to the mRNA strands), and then passing these cDNA strands over a microarray which contains known DNA sequences at known positions. Based on the binding sites, the transcriptome can be predicted. Each microarray manufacturer offers its own tools to interpret the results. For example, the Factor Analysis for Robust Microarray Summarization (FARMS) tool is available for Affymetrix GeneChips [3]. Notice, therefore, that FARMS is placed on the edge from the cDNA to the transcriptome, indicating that it is one of the tools that researchers may use to create the transcriptome.

For DNA, the process usually involves using shotgun cloning and high throughput Next Generation Sequencing Technologies to get gene sequences. These sequences are then analyzed with tools for genome assembly, such as PAGIT [4], and with tools for genome comparison, such as MUMmer [5].

For peptide identification, the researcher can excise the proteins from a gel, convert them into peptides, and then analyze them by mass spectrometry. This is indicated by the flow pattern between protein and proteome in Fig. (1). The typical steps in this process are: 1) enzyme digestion (usually trypsin), 2) collision-induced fragmentation and 3) mass spectrometry. The resulting spectra contain information about the masses of the component peptides and peptide fragments.

Mass spectrometers may use one of several technologies. Time-of-Flight (TOF) mass spectrometers infer the mass-to-charge ratio of ions by measuring the time that the ions take to reach the target. The family of TOF mass spectrometers includes Matrix Assisted Laser Desorption Ionization-Mass Spectrometry (MALDI-MS) [6, 7], Tandem MS (MS/MS) [8], and Isotope-coded affinity tag (ICAT)-Liquid Chromatography (LC-MS). An alternative family of mass spectrometers infers an ion’s mass-to-charge ratio by measuring the ion’s resonance frequency inside of a cyclotron. One example of this method is LC-MS with Fourier transform ion cyclotron resonance (FTICR) [9]. The cyclotron approach is more challenging than other methods, since it studies the peptide behavior under magnetic and electric fields after ionization, but it yields highly accurate peptide mass results [10-12]. Belov et al. [13] have developed a high pressure LC-ESI-FTICR design that is fully automated, and therefore suitable for high-throughput proteomics.

For proteins, the process offers more options. In fact, advances in technology have allowed protein/peptide-driven analyses to supersede technologies based on DNA sequencing [14]. Consequently, we will consider these areas in greater depth. The first step in identifying proteins is typically to separate them based on physical properties, such as the protein’s mass or isoelectric point. Any of a variety of separation technologies may be used. One such technology is liquid chromatography (LC) [15, 16]. For stand-alone analysis, however, a gel is more commonly used. After the sample is placed onto the gel, an electric field is applied, first in one direction and then in an orthogonal direction. With movement in two dimensions, this process is called 2D electrophoresis (2DE). Many reviews about 2DE have been published describing its applications and limitations [17-20].

One such 2DE separation method is two dimensional polyacrylamide gel electrophoresis (2D-PAGE). By itself, this method cannot recognize low concentration proteins. Typically, only a few hundred proteins can be identified in this way [21]. To obtain more accurate results, 2D fluorescence differential gel electrophoresis (2D DIGE) has been introduced [22, 23], in which hundreds of proteins from two different bacterial sources can be analyzed in a single gel with two different fluorochromes.

After performing 2DE, the result is a gel with spots at certain locations. Two questions naturally arise from this: 1) how to ensure that spots are not overlooked, and 2) how to identify which protein is responsible for producing a specific spot. To address the first question, a variety of 2DE software tools provide spot detection algorithms for photographs of gels, such as Melanie, Melanie DIGE, Image Master, or PDQuest. To address the second question, the scientist might choose to either use software to directly analyze the gel results, or else to continue with physical analysis on the now-isolated individual proteins. This latter option involves the use of mass spectrometry and is described in the next section.

For direct software analysis of photographs of gel results, various tools such as Melanie, PDQuest, or 2DHunt can be used [24]. These software packages work by comparing the experimental photograph against a database of photographs for known peptides. In addition, tools like Flicker [25] can be used to visually compare or overlay images.

Some available databases are the SWISS-PROT 2D PAGE [26], 2DWG mega-database [27], and GelBank [28]. The GelBank repository of 2DE images is noteworthy because of its focus on microbes (being part of the Microbial Proteome Project) [29]. Another noteworthy database is the ProteomeWeb [30]. This database is designed to identify proteins from data obtained via 2D electrophoresis of microbial proteomes. The advantage of ProteomeWeb is that it is able to consider the influence of environmental conditions, because each organism has multiple proteomes in the database, depending upon conditions. This database also has information to identify the gene positions responsible for known proteins.

Beyond identifying the individual proteins within a sample, an additional task is to quantify the amounts of these proteins. This work often involves labeling [31], using a method such as stable isotope labeling-based isotope-coded affinity tags (ICAT), or isobaric tag for relative and absolute quantification (iTRAQ) [32].

One specific application of 2DE is in analyzing the secretome and the membrane proteome. A special reference to secretome analysis is with JVirGel [33], which works by predicting signal peptides involved in protein secretion pathways, while attempting to exclude those preproteins that are expected to be secreted. Having computed these peptides, a set of virtual 2D gel images is extrapolated and then compared against the experimental images. JCaMelix is a similar tool for identifying transmembrane helical proteins [33].

3. PROTEIN IDENTIFICATION AND QUANTITATION FROM SPECTRA DATA

VerBerkmoes et al. [34] and Nesvizhskii et al. [35] have already reviewed some of the software tools available for analyzing mass spectra files. To go into the many possible software tools for identifying and quantifying proteins from MS data would be beyond the scope of this current review. Instead, it suits the current purpose to briefly describe a few representative samples of the more popular methods, such as Mascot [36], PepSea [37], and ProtQuant [38].
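To make the digestion and mass-measurement steps of Section 2 concrete, the following is a minimal sketch in Python of an in-silico tryptic digest followed by a toy mass-matching step. It is not how Mascot or any other named package is implemented. It assumes the common rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P), it ignores missed cleavages and modifications, and the function names (`tryptic_digest`, `peptide_mass`, `count_matches`) and the 0.01 Da tolerance are hypothetical choices made for illustration only.

```python
# Monoisotopic residue masses in daltons (standard textbook values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_digest(protein):
    """Cut after K or R unless followed by P (assumed rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, protein, tol=0.01):
    """Toy score: how many observed masses match a theoretical peptide of this protein."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)
```

A database search could, in principle, rank candidate proteins by such a count, although real tools use far more elaborate probabilistic scoring.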

Mascot is one of many software packages that use a TrEMBL, PROSITE, and ENZYME databases. These technique called peptide mass fingerprinting to identify the databases have been reviewed in detail [42]. peptides by the means of matching their spectra against a database. Mascot is a popular tool, in part, because of its 3.1.1. UniProtKB flexibility, as it allows the user to set many of the search UniProtKB comprises two separate databases, SWISS- parameters, such as the digestion enzymes used, the mass PROT and TrEMBLE [43]. The difference between these values, and the protein’s molecular weights. databases is their annotations. The original database, But not every analyzer uses the mass finger printing SWISS-PROT, remains very popular; since annotations on technique. Pepsea is an example of a search tool that uses a protein data are made by manual curations whereas in technique called de novo sequencing in order to identify TrEMBLE annotations are made by methods. SWISS-PROT peptides. Unlike mass finger printing, de novo sequencing does has been expanded to include SWISS-2DPAGE and SWISS- not require a database, because it directly analyzes the peptides MODEL where annotations provide many insights into the based on their mass spectra. For a simple understanding of the proteome, but the insertion of these annotations is time above method, consider a situation where a certain mass consuming, so that pace cannot keep up with the rapid spectrum contains two adjacent peaks which are separated by a advances in HTMP. The automated annotation methods in mass difference that exactly corresponds to the molecular TrEMBLE are a much faster solution and are able to keep weight of a particular amino acid. In this situation, it is likely pace with the growth of proteomic data, but it has a higher that that the heavier of the two peaks represents a peptide which chance of errors, since there is no human interaction to verify contains this amino acid at one of its ends. 
In reasoning about the annotations. Due to TrEMBLE's higher chance for error, the spectra peaks, computational tools such as Peptidemass and SWISS-PROT still remains useful. PeptideCutter [39] can assist, by identifying likely sites for cleavage and for post-translational modification. 3.1.2. HAMAP A hybrid approach can be developed that obtain benefits Another project on computational proteomics has been from both data-base searching and from de novo sequencing. initiated with HAMAP (High-quality Automated and Manual One tool for this is Guten Tag [40]. In addition, various Annotation of Microbial Proteomes) to overcome the above more-recent comprehensive software tools may apply a said problem of slow period manual annotations (SWISS- variety of algorithms to maximize the number of matches PROT) and error prone computational annotations found. When evaluating such a tool, scientists must consider (TrEMBLE) by combining the two approaches [44]. This sensitivity (how effective it is at identifying low- tool provides ways to analyze and annotate the proteomes of concentration species) and accuracy (how many false various and . So far, this project has positives are produced). Some additional concerns are the identified and annotated over 250,000 proteins, which quality of the provided database, the cost of the softwares, includes both the Archaea and Bacterial databases. and the ease of use of the software. A simple internet search quickly turns up various websites offering comparative 3.1.3. Microbe-Specific Proteomic Databases information regarding these comprehensive software tools. The PFGRC (Pathogen Resource Once peptides and/or proteins have been identified, the next Center) is a proteomics database started by the J. Craig step is to quantify the concentration levels of these species. One Venter Institute, for the purpose of understanding microbial of the tools to accomplish this is ProtQuant, which builds upon pathogens and bioweapons [45]. 
Transmembrane and the ΣXCorr method for quantitation. As its name implies, periplasmic subcellular proteomes are provided for many ΣXCorr computes cross-correlation values to provide pathogens. information about both the likelihood of the existence of a In many cases, databases are not limited to a single protein in the sample and its concentration level. In general, the position in Fig. (1). The Microbial Proteome Project [21] is accuracy of computed concentration levels is less than the such an example. This database contains gel images for accuracy of the identification test. To improve upon this, 2DE-base identification and quantification. Because of this, ProtQuant makes use of a more advanced algorithm, ΣXCorr*, it was mentioned in the earlier section dealing with that which considers information even from spectra whose peptide topic. 2DE-base not only contains gel images; it also match-score was below the confidence threshold. In other words, contains the proteomes of 11 different microbes. rather than treating these data points as zeroes, ΣXCorr* uses the small amount of information that is still present in these spectra. Microbesonline [46] is another database that crosses the boundaries within Fig. (1). Although the present discussion 3.1. Databases of Proteomes is in the context of proteomic data (such as would be needed by tools like Mascot or PepSea, in their efforts to match With decades of genomic and proteomic research having spectra against anticipated proteins), yet it should be noted been conducted, a large amount of information is now that microbes online also provides databases for related publicly available for a wide variety of microbes (as well as fields, such as the genome and the metabolome. It contains other organisms). These results have been collected into over a thousand microbial . various publicly available databases and some of those related to microbes are described by Galparin and 3.2. Proteome Derivation Fernandez-Suarez [41]. 
The ExPASy (the Expert Protein Analysis System) server provides a centralized repository for Moving rightward in Fig. (1), the center portion indicates many tools and databases, such as the SWISS-PROT, that the proteome does not need to be observed through Computer Applications Making Rapid Advances in HTMP Combinatorial Chemistry & High Throughput Screening, 2014, Vol. 17, No. 2 177 techniques such as mass spectrometry; it can instead be Tools also exist to create the proteogenome from the inferred (at least approximately) from either the genome or genome and the transcriptome. One of these is ssahaEST the transcriptome. Concerning the genome, proteins can be (Sequence Search and Alignment by Hashing Algorithm) predicted from the DNA sequences that would produce them [50], which searches the genome for a match to the provided (a sequence of three DNA base pairs encodes for one of the protein. Other tools include ISMARA [51], MIDAW [52] twenty amino acids that occur within proteins). This method and GEPAS [53]. The process of mapping is similar to the is not very useful on its own, however, since DNA only method using the proteome, because the transcriptome is a indicates the proteins that might be created, and since a strong indicator of the proteome. section of DNA might encode for more than one protein. Although such a direct extrapolation from the DNA cannot 3.4. Protein Structural Modeling yield an accurate proteome, it can however assist in protein identification by matching measurements with the Proteins typically interact when they share matching prediction. HAMAP (which was mentioned earlier in the amino acid subsequences. 
An additional complexity, context of databases) also contains a program to accomplish however, is that proteins are folded into complex shapes in this mapping of DNA into the proteome (indeed this tool has three dimensions and interactions can only occur between been used to validate the information now available in that two amino acid subsequences if they lie on the exterior database). surfaces of their proteins. A protein structure is elucidated and analyzed in terms of its domains and motifs. Compared to the genome-based derivation, the transcriptome is a more accurate way to derive the proteome, Fig. (1) illustrates that these domains and motifs may be because the cDNA transcripts are directly responsible for the derived via two different methods of differing accuracy and creation of the proteins. A tool called TromER had been difficulty. As the figure shows, one approach is to predict the introduced to map the proteome with hypothetical protein shape of a protein based on the proteomic information. Such sequences derived from both the transcriptomic data (trEST) a prediction is possible because of expert knowledge about and genomic data (trGEN) [47]. the ways that specific amino acid sequences can be expected to fold back onto themselves. PSI-BLAST [54] is a Genomic and transcriptomic data are available from probabilistic tool that predicts the physical structure of a many websites. In particularly, many of the proteomic data specified protein. The advantage of this predictive approach bases mentioned earlier also contain genome information; Microbesonline, for example, has over a thousand genomes, is speed. The disadvantage is accuracy. One of the major limitations to PSI-BLAST’s predictive ability is the as mentioned above. Another database with genomic data is existence of side-binding sites along the protein, where KEGG (Kyoto Encyclopedia of Genes and Genomes) [48]. various other molecules may attach themselves. 
Obviously, Although KEGG is focused on metabolism (a topic which these unexpected molecules will affect the way the protein this present review will not address), KEGG also has a folds. genomic database. In addition, it has a tool, KEGG-disease, which will be discussed later, for discovering new bio Another tool for identifying protein structures is MnM markers. (Minimotif Miner) [55]. MnM predicts which short peptides (i.e., minimotifs), will lie on the surface of the protein. False 3.3. The Proteogenome positives can be reduced by indicating situations where minimotifs are shared between different species. The proteogenome, or annotated genome, provides a To obtain more accurate structural information, the mapping of the proteome onto the genome. To understand scientist must perform direct physical measurement. As Fig. how this work fits into Fig. (1), consider that the figure (1) indicates, this involves isolating the protein of interest shows an edge from the genome to the proteome. It might be and then applying X-ray crystallography and/or Nuclear wondered whether this edge might not have more-properly Magnetic Resonance (NMR). The reason why both the been shown as going to the proteogenome – after all, if the techniques might be used is that they provide complimentary genes are used to infer the proteins, by means of the process information. One technique requires crystallizing the protein, described earlier, the question is, why not preserve the whereas the other requires dissolving it into solution. mapping at that time? The answer is that this is possible, but Whichever technique is used, however, purity is a concern. not necessarily the right choice. In the case of HAMAP One must also consider the difficulty of creating a single (which is the software shown on the edge in question), the crystal (crystallization propensity), and of dissolving the proteome is potentially inferred automatically from the protein (solubility). 
To address these concerns, the SPINE genome. In the case of automated inferences, the genes database [56] and SPINE analysis system were created [57]. might not really become expressed with the predicted SPINE uses a machine learning approach to predict protein protein. In contrast to this, Fig. (1) indicates that the genome solubility and crystallization propensity. With this and proteome must both first exist, before the proteogenome information, the raw measurement can be adjusted to can be built. This greatly increases the chance that the compensate for predicted imperfections. Other tools exist to annotations will be correct. As Fig. (1) shows, HAMAP is achieve a similar result, including: XANNpred [58], ParCrys able to produce the proteogenome (through manual [59], PPCpred [60], PXS [61] and XtalPred [62]. In a review annotation). VESPA (Visual Exploration and Statistics to [63], it is claimed that the XANNpred predictor performs Promote Annotation) is another package that can do this better than others. [49]. VESPA provides useful ways to display genomic data, as either integrated (high throughput proteomic (HTP) data Although X-ray crystallography and NMR are more (peptide-centric) or transcriptomic data. accurate than direct predictors like MnM, they are still 178 Combinatorial Chemistry & High Throughput Screening, 2014, Vol. 17, No. 2 Anandkumar et al. imperfect. Firstly, they are tedious. Secondly, they cannot visually intuitive way. The CuraGen tool also can compare compensate for structural changes that may occur during the the protein interaction maps of different organisms. Protein preparation and crystallography process. The Swiss PDB interaction pathways can be searched with PIMrider. viewer [64], the Java based open source program Jmol [65], and the Unix based Cn3D [66] are some of the tools to 4. DATABASES FOR NETWORKING BIOLOGY AND visualize protein structures. 
Swiss PDB viewer is also able to compute the predicted protein structure (thus explaining its presence in the figure).

Known proteins have known shapes. The domains and motifs of proteins can therefore be placed in a database. One of these is PDB (the Protein Data Bank) [67, 68]. It has more than 85,000 entries, including 6,000 proteins for Escherichia coli. Within PDB, the SCOPE database holds structural classifications of proteins and the CATH database holds class, architecture, topology and homologous superfamily information. PDB also contains atomic coordinate files from NMR and X-ray crystallography measurements.

3.5. The Interactome

Many cellular processes are controlled by protein-protein interactions, as described by Causier [74]. The interactome is the set of interactions that may occur within a given cell. As Fig. (1) shows, some knowledge of domains and motifs can assist in predicting these interactions, using software such as PREDICTOME [69] or STRING [70].

Alternatively, the ability of two proteins to interact with each other can be experimentally determined by a protein-fragment complementation assay (PCA) [71]. The idea is to covalently link the two proteins of interest ("bait" and "prey") to a third protein (the reporter), which has been fragmented into two pieces, so that each piece is attached to one of the proteins of interest. If the two proteins of interest bind, this can be inferred from the functioning of the third protein. Referring to Fig. (1), this possibility is not shown, both because it is not really a part of HTMP and because it would clutter the figure. It is not a part of HTMP because it is not geared to high throughput. If it were drawn, it would be represented as an edge directly between the microbe sample on the far left and the interactome circle.

Although HTMP does not involve performing PCA, this is not to say that its results cannot be used by HTMP. Instead, the work of previous researchers, compiled into databases, is a fast way to predict interactions. The yeast two-hybrid system (Y2H) [72, 73] is another such system for studying protein-protein interactions, as previously reviewed in [74].

Another method, called phage display, is also being used in studying recombinant protein expression and the selection of interacting molecules in human and microbial systems [75].

3.5.1. Tools for Using the Interactome

Protein-protein interactions lead to the field of functional proteomics. PIMrider [76], the CuraGen data analysis software [77], and Myriad Genetics' ProNet [78] are some commercial platforms used to navigate the interactome (by displaying interaction maps) and the proteogenome (by automatically displaying relevant annotation information) [79]. In other words, these tools help the user to manage massive amounts of information, by the means of sorting what is important and by displaying the information in a

METABOLIC PATHWAYS

Proteomics data sets are incomplete without an awareness of post-translational modifications (PTMs). PRIDE is an example of a proteomics database that includes a focus on such PTMs [80]. The role of PTMs in influencing the structure and function of proteins has been discussed for archaea, prokarya and eukarya [81]. Knowledge of these PTMs is necessary for modeling networks and metabolic pathways, since they control protein interactions, by design.

Many useful tools for protein networking visualization and analysis are available. They include: BIND (Biomolecular Interaction Network Database) [82], DIP (Database of Interacting Proteins) [83], GRID (General Repository for Interaction Datasets) [84], MINT (Molecular Interactions Database) [85] and STRING (a database of predicted functional associations among genes/proteins) [86]. The nodes (proteins) and edges (type of interaction) of the above protein networking databases and protein-protein interaction networks can be browsed with a computational tool, APIN (Agile Protein Interaction Network browser) [87].

5. TOOLS FOR TARGET IDENTIFICATION AND DRUG DISCOVERY

A primary application of microbial proteomics is the identification of targets for the purpose of developing diagnosis methods and therapeutic compounds for microbial pathogens and diseases. Target selection for these purposes relies on the specific cellular localization of the protein, that is, on whether it is present inside the cytoplasm, in the membrane (transmembrane), or outside of the cell. Tools exist to predict these locations with good accuracy. Among these, Proteome Analyst (PA) can predict the subcellular localization of the protein, by allowing the user to create custom predictors [88]. It also can predict the genetic function (using Gene Quiz) and the molecular function (using ) of the protein. Although PA is considered comparatively more sensitive than some other tools [89], there are still a great many other localization predictors, such as CELLO [90], SignalP [91], SecretomeP [92], and the various versions of PSORTb (for Gram-negative bacteria versus Gram-positive bacteria) [93].

Target selection can be achieved by tools such as XANNpred [58], ParCrys [59], and OB-Score [94]. Some of these programs were discussed earlier, in regard to their support for crystallography as well. Target optimization can be achieved by TarO [95], which searches homologues from a group of predicted targets, ranking the results based on the presence of multiple sequence alignments.

Once target proteins are identified, potential antimicrobials are predicted based on such information as the predicted structure and the local similarity [96]. There are many servers and programs for three dimensional comparative protein modeling, including SWISS-MODEL [97], Geno3D [98], ROBETTA [99], and MODELLER [100].
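The localization-based target selection described in this section can be illustrated with a small sketch. It assumes that the output of some localization predictor has already been parsed into records holding a protein identifier, a predicted compartment and a confidence score. The record layout, the compartment labels, the example proteins and the 0.8 threshold are all hypothetical and are not the output format of PA, CELLO, PSORTb or any other tool named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One parsed localization prediction (hypothetical record layout)."""
    protein_id: str    # illustrative identifier, not a real accession
    localization: str  # predicted compartment label
    score: float       # predictor confidence, assumed to lie in [0, 1]


def select_targets(predictions, wanted=("outer membrane", "extracellular"),
                   min_score=0.8):
    """Return IDs of proteins confidently predicted in a wanted compartment.

    Results are ordered from highest to lowest confidence.
    """
    wanted_set = {label.lower() for label in wanted}
    hits = [
        p for p in predictions
        if p.localization.lower() in wanted_set and p.score >= min_score
    ]
    hits.sort(key=lambda p: p.score, reverse=True)
    return [p.protein_id for p in hits]


# Invented example data, for illustration only.
predictions = [
    Prediction("protA", "cytoplasmic", 0.95),
    Prediction("protB", "outer membrane", 0.91),
    Prediction("protC", "extracellular", 0.62),
    Prediction("protD", "Extracellular", 0.88),
]

print(select_targets(predictions))
```

Surface-exposed or secreted proteins are used as the default compartments here only as an example; a different application (for instance, an intracellular drug target) would pass a different `wanted` tuple.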

Organic compounds and very short peptides that may function as antimicrobials are predicted with ab initio programs by comparing with yet unknown homologues. There are also molecular modeling programs suitable for peptides and proteins with identified and annotated homologues. In silico design of antimicrobials is being done with combined relational database and molecular modeling tools [96].

The easiest method of identifying a new drug is to simply test whether any of the target proteins is likely to react with a pre-existing drug [101] or anti-bacteriocin [102]. If this fails, short peptides can be predicted with ab initio programs that search for as-yet-unknown homologues [98]. In the process of drug discovery, the proteome is searched for "drug friendly" features, such as ligand binding clefts in the nuclear hormone receptors, or binding sites for enzymes such as phosphodiesterases, cytochrome p450s, or proteinase [103, 104].

Based on pathways, the Y2H system for yeasts is an important part of drug discovery [105, 106]. In addition, the Y2H subsystem for transmembrane proteins has also proved useful [107]. Tools such as PIMRider are also useful in analyzing Y2H data. Beyond drug discovery, Y2H is also useful for disease diagnosis. Based on the protein interactions identified by Y2H, toxic molecules can be predicted as disease markers.

Vaxign is a web-based vaccine design tool that uses many types of proteogenomic information to predict vaccine targets [108]. The program considers a protein's subcellular location, its transmembrane domain, its adhesion probability, its sequence conservation likelihood across genomes, its sequence similarity to the host proteome, and its epitope binding to MHC classes I and II [108]. Vaxign currently provides results of target predictions for more than 40 pathogen genomes.

HTMP is used to find drugs for some of the world's most serious health problems, including malaria, which is caused by protozoan Plasmodium species. An approach called QSAR (quantitative structure-activity relationship) classifies and predicts antimalarial activity using TOMOCOMD-CARDD (TOpological MOlecular COMputer Design-Computer Aided "Rational" Drug Design) fingerprints [109]. For finding drug targets using the in silico docking approach, there is a project called WISDOM (World-wide In Silico Docking on Malaria), in which more than 40 million compounds have been screened for their potential to suppress the disease [110].

6. CONCLUSIONS

Proteomics takes center stage in the universe of microbial '-omics,' because the proteome is the target of so much research in industrial, environmental and clinical fields. Proteome analysis with high throughput technologies and data generated from them are delivering vast knowledge through the use of the computational tools discussed here. These measurement techniques, software tools, and pre-calculated databases for HTMP facilitate advancements in microbial identification, protein quantitation, expression analyses, metabolic pathways, protein localization, modifications, structure analysis and drug discovery. Hopefully, the flow framework presented herein may assist in grasping the interrelations between these fields and tools.

CONFLICT OF INTEREST

The authors confirm that they do not have any conflict of interest.

ACKNOWLEDGEMENTS

The authors acknowledge National Science Council, Taiwan for the funding.

REFERENCES

[1] Zhang, W.; Li, F.; Nie, L. Integrating multiple 'omics' analysis for microbial biology: application and methodologies. Microbiol., 2010, 156, 287-301.
[2] The ExPASy proteomics tools page. http://www.expasy.org/tools (Accessed November 16, 2013).
[3] Hochreiter, S.; Clevert, D-A.; Obermayer, K. A new summarization method for affymetrix probe level data. Bioinformatics, 2006, 22, 943-949.
[4] Swain, M.T.; Tsai, I.J.; Assefa, S.A.; Newbold, C.; Berriman, M.; Otto, T.D. A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs. Nat. Protoc., 2012, 7, 1260-1284.
[5] Kurtz, S.; Phillippy, A.; Delcher, A.L.; Smoot, M.; Shumway, M.; Antonescu, C.; Salzberg, S.L. Versatile and open software for comparing large genomes. Genome Biology, 2004, 5, R12.
[6] Sauer, S.; Kliem, M. Mass spectrometry tools for the classification and identification of bacteria. Nat. Rev. Microbiol., 2010, 8, 74-82.
[7] Yan, W.; Hwang, D.; Aebersold, R. Quantitative proteomic analysis to profile dynamic changes in the spatial distribution of cellular proteins. Methods Mol. Biol., 2008, 432, 389-401.
[8] Baggerman, G.; Vierstraete, E.; De Loof, A.; Schoofs, L. Gel-based versus gel-free proteomics: a review. Comb. Chem. High Throughput Screen., 2005, 8, 669-677.
[9] Nie, L.; Wu, G.; Zhang, W. Statistical application and challenges in global gel-free proteomic analysis by mass spectrometry. Crit. Rev. Biotechnol., 2008, 28, 297-307.
[10] Mørtz, E.; O'Connor, P.B.; Roepstorff, P.; Kelleher, N.L.; Wood, T.D.; McLafferty, F.W.; Mann, M. Sequence tag identification of intact proteins by matching tandem mass spectral data against sequence data bases. Proc. Natl. Acad. Sci. U.S.A., 1996, 93, 8264-8267.
[11] Qian, W-J.; Camp, D.G. II.; Smith, R.D. High-throughput proteomics using Fourier transform ion cyclotron resonance mass spectrometry. Expert Rev. Proteomics, 2004, 1(1), 87-95.
[12] Marshall, A.G.; Hendrickson, C.L.; Jackson, G.S. Fourier transform ion cyclotron resonance mass spectrometry: a primer. Mass Spectrom. Rev., 1998, 17, 1-35.
[13] Belov, M.E.; Gorshkov, M.V.; Udseth, H.R.; Anderson, G.A.; Tolmachev, A.V.; Prior, D.C.; Harkewicz, R.; Smith, R.D. Initial implementation of an electrodynamic ion funnel with FTICR mass spectrometry. J. Am. Soc. Mass. Spectrom., 2000, 11, 19-23.
[14] Domon, B.; Aebersold, R. Mass spectrometry and protein analysis. Science, 2006, 312, 212-217.
[15] de Hoog, C.L.; Mann, M. Proteomics. Annu. Rev. Genomics Hum. Genet., 2004, 5, 267-293.
[16] Gygi, S.P.; Corthals, G.L.; Zhang, Y.; Rochon, Y.; Aebersold, R. Evaluation of two-dimensional gel electrophoresis-based proteome analysis technology. Proc. Natl. Acad. Sci. U.S.A., 2000, 97, 9390-9395.
[17] Issaq, H.J.; Veenstra, T.D. Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE): advances and perspectives. BioTechniques, 2008, 44, 697-700.
[18] Rabilloud, T.; Chevallet, M.; Luche, S.; Lelong, C. Two-dimensional gel electrophoresis in proteomics: Past, present and future. J. Proteomics, 2010, 73, 2064-2077.
[19] Sá-Correia, I.; Teixeira, M.C. 2D electrophoresis-based expression proteomics: a microbiologist's perspective. Expert Rev. Proteomics, 2010, 7, 943-953.

[20] O'Farrell, P.H. High resolution two-dimensional electrophoresis of proteins. J. Biol. Chem., 1975, 250, 4007-4021.
[21] Sonck, K.A.; Kint, G.; Schoofs, G.; Vander Wauven, C.; Vanderleyden, J.; De Keersmaecker, S.C. The proteome of Salmonella typhimurium grown under in vivo-mimicking conditions. Proteomics, 2009, 9, 565-579.
[22] Wojcik, J.; Schächter, V. Proteomic databases and software on the web. Briefings in Bioinformatics, 2000, 1, 250-259.
[23] Hoogland, C.; Sanchez, J-C.; Walther, D.; Baujard, V.; Baujard, O.; Tonella, L.; Hochstrasser, D.F.; Appel, R.D. Two dimensional electrophoresis resources available from ExPASy. Electrophoresis, 1999, 20, 3568-3571.
[24] Lemkin, P.F.; Thornwall, G. Flicker image comparison of 2-D gel images for putative protein identification using the 2DWG meta-database. Mol. Biotechnol., 1999, 12, 159-172.
[25] Hoogland, C.; Mostaguir, K.; Appel, R.D.; Lisacek, F. The World-2DPAGE Constellation to promote and publish gel-based proteomics data through the ExPASy server. J. Proteomics, 2008, 71, 245-248.
[26] Lemkin, P.F.; Thornwall, G. Flicker image comparison of 2-D gel images for putative protein identification using the 2DWG meta-database. Mol. Biotechnol., 1999, 12, 159-172.
[27] Babnigg, G.; Giometti, C.S. GELBANK: a database of annotated two-dimensional gel electrophoresis patterns of biological systems with completed genomes. Nucleic Acids Res., 2004, 32, D582-D585.
[28] Giometti, C.S.; Babnigg, G.; Tollaksen, S.L.; Khare, T.; Ahrendt, A.; Zhu, W.; Lovley, D.R.; Fredrickson, J.K.; Yates, J.R. III. The Microbial Proteome Project: A Database of Microbial Protein Expression in the Context of Genome Analysis. Accelerating Discovery for Energy and Environment Workshop. DOE Genomics: GTL, Contractor-Grantee Workshop III, Washington, D.C. 2005, pp. 58-59.
[29] Babnigg, G.; Giometti, C.S. ProteomeWeb: A web-based interface for the display and interrogation of proteomes. Proteomics, 2003, 3, 584-600.
[30] Haqqani, A.S.; Kelly, J.F.; Stanimirovic, D.B. Quantitative protein profiling by mass spectrometry using label-free proteomics. Methods Mol. Biol., 2008, 439, 241-256.
[31] Bruce, J.E.; Anderson, G.A.; Wen, J.; Harkewicz, R.; Smith, R. High mass measurement accuracy for 100 % sequence coverage of enzymatically digested bovine serum albumin from an ESI-FTICR mass spectrum. Anal. Chem., 1999, 71, 2595-2599.
[32] Belov, M.E.; Anderson, G.A.; Wingerd, M.; Udseth, H.R.; Tang, K.; Prior, D.C.; Swanson, K.R.; Buschbach, M.A.; Strittmatter, E.F.; Moore, R.J.; Smith, R.D. An automated high performance capillary liquid chromatography-Fourier transform ion cyclotron resonance mass spectrometer for high-throughput proteomics. J. Am. Soc. Mass. Spectrom., 2004, 15, 212-232.
[33] Hiller, K.; Grote, A.; Maneck, M.; Munch, R.; Jahn, D. JVirGel 2.0: computational prediction of proteomes separated via two-dimensional gel electrophoresis under consideration of membrane and secreted proteins. Bioinformatics, 2006, 22, 2441-2443.
[34] VerBerkmoes, N.C.; Connelly, H.M.; Pan, C.; Hettich, R.L. Mass spectrometric approaches for characterizing bacterial proteomes. Expert Rev. Proteomics, 2004, 1, 433-447.
[35] Nesvizhskii, A.I.; Vitek, O.; Aebersold, R. Analysis and validation of proteomic data generated by tandem mass spectrometry. Nat. Methods, 2007, 4, 787-797.
[36] Perkins, D.N.; Pappin, D.J.C.; Creasy, D.M.; Cottrell, J.S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis, 1999, 20, 3551-3567.
[37] Mann, M.; Hojrup, P.; Roepstorff, P. Use of mass spectrometric molecular weight information to identify proteins in sequence databases. Biol. Mass. Spectrom., 1993, 22, 338-345.
[38] Bridges, S.M.; Magee, G.B.; Wang, N.; Williams, W.P.; Burgess, S.C.; Nanduri, B. ProtQuant: a tool for the label-free quantification of MudPIT proteomics data. BMC Bioinformatics, 2007, 8 (Suppl 7), S24.
[39] PeptideCutter. http://web.expasy.org/peptide_cutter (Accessed November 16, 2013).
[40] Tabb, D.L.; Saraf, A.; Yates, J.R. GutenTag: high-throughput sequence tagging via an empirically derived fragmentation model. Anal. Chem., 2003, 75, 6415-6421.
[41] Galperin, M.Y.; Fernandez-Suarez, X-M. The 2012 Nucleic Acids Research Database Issue and the online Molecular Biology Database Collection. Nucleic Acids Res., 2012, 40, D1-D8.
[42] Gasteiger, E.; Gattiker, A.; Hoogland, C.; Ivanyi, I.; Appel, R.D.; Bairoch, A. ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res., 2003, 31, 3784-3788.
[43] Uniprot Database. http://www.uniprot.org (Accessed November 16, 2013).
[44] Gattiker, A.; Michoud, K.; Rivoire, C.; Auchincloss, A.H.; Coudert, E.; Lima, T.; Kersey, P.; Pagni, M.; Sigrist, C.J.A.; Lachaize, C.; Veuthey, A-L.; Gasteiger, E.; Bairoch, A. Automated annotation of microbial proteomes in Swiss-Prot. Comput. Biol. Chem., 2003, 27, 49-58.
[45] Pathogen Functional Genomics Resource Center: J. Craig Venter Institute. http://pfgrc.jcvi.org (Accessed November 16, 2013).
[46] Dehal, P.S.; Joachimiak, M.P.; Price, M.N.; Bates, J.T.; Baumohl, J.K.; Chivian, D.; Friedland, G.D.; Huang, K.H.; Keller, K.; Novichkov, P.S.; Dubchak, I.L.; Alm, E.J.; Arkin, A.P. MicrobesOnline: an integrated portal for comparative and functional genomics. Nucleic Acids Res., 2010, 38, D396-D400.
[47] Sperisen, P.; Iseli, C.; Pagni, M.; Stevenson, B.J.; Bucher, P.; Jongeneel, C.V. Trome, trEST and trGEN: databases of predicted protein sequences. Nucleic Acids Res., 2004, 32, D509-D511.
[48] Kanehisa, M.; Araki, M.; Goto, S.; Hattori, M.; Hirakawa, M.; Itoh, M.; Katayama, T.; Kawashima, S.; Okuda, S.; Tokimatsu, T.; Yamanishi, Y. KEGG for linking genomes to life and the environment. Nucleic Acids Res., 2008, 36, D480-D484.
[49] Peterson, E.S.; McCue, L.A.; Schrimpe-Rutledge, A.C.; Jensen, J.L.; Walker, H.; Kobold, M.A.; Webb, S.R.; Payne, S.H.; Ansong, C.; Adkins, J.N.; Cannon, W.R.; Webb-Robertson, B-J.M. VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data. BMC Genomics, 2012, 13, 131-143.
[50] ssahaEST: Sequence Search and Alignment by Hashing Algorithm. http://www.sanger.ac.uk/resources/software/ssahaest/ (Accessed November 16, 2013).
[51] ISMARA - Integrated System for Motif Activity Response Analysis. http://ismara.unibas.ch/fcgi/mara (Accessed November 16, 2013).
[52] Romualdi, C.; Vitulo, N.; Favero, M.D.; Lanfranchi, G. MIDAW: a web tool for statistical analysis of microarray data. Nucleic Acids Res., 2005, 33, W644-W649.
[53] Vaquerizas, J.M.; Conde, L.; Yankilevich, P.; Cabezon, A.; Minguez, P.; Díaz-Uriarte, R.; Al-Shahrour, F.; Herrero, J.; Dopazo, J. GEPAS, an experiment-oriented pipeline for the analysis of microarray data. Nucleic Acids Res., 2005, 33, W616-W620.
[54] Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 1997, 25, 3389-3402.
[55] Rajasekaran, S.; Merlin, J.C.; Kundeti, V.; Oommen, A.; Mi, T.; Vyas, J.; Alaniz, I.; Chung, K.; Chowdhury, F.; Deverasatty, S.; Irvey, T.M.; Lacambacal, D.; Lara, D.; Panchangam, S.; Rathnayake, V.; Watts, P.; Schiller, M.R. A computational tool for identifying minimotifs in protein-protein interactions and improving the accuracy of minimotif predictions. Proteins, 2011, 79, 153-164.
[56] SPINE database and analysis system for the Northeast Structural Genomics Consortium. http://spine.nesg.org/user_login.cgi?url=http://spine.nesg.org/front_page.cgi? (Accessed November 16, 2013).
[57] Bertone, P.; Kluger, Y.; Lan, N.; Zheng, D.; Christendat, D.; Yee, A.; Edwards, A.M.; Arrowsmith, C.H.; Montelione, T.; Gerstein, M. SPINE: an integrated tracking database and data mining approach for identifying feasible targets in high-throughput structural proteomics. Nucleic Acids Res., 2001, 29, 2884-2898.
[58] Overton, I.M.; van Niekerk, C.A.J.; Barton, G.J. XANNpred: Neural nets that predict the propensity of a protein to yield diffraction-quality crystals. Proteins, 2011, 79, 1027-1033.
[59] Overton, I.M.; Padovani, G.; Girolami, M.A.; Barton, G.J. ParCrys: a Parzen window density estimation approach to protein crystallization propensity prediction. Bioinformatics, 2008, 24, 901-907.

[60] Mizianty, M.J.; Kurgan, L. Sequence-based prediction of protein crystallization, purification and production propensity. Bioinformatics, 2011, 27, i24-i33.
[61] Price, W.N. II.; Chen, Y.; Handelman, S.K.; Neely, H.; Manor, P.; Karlin, R.; Nair, R.; Liu, J.; Baran, M.; Everett, J.; Tong, S.N.; Forouhar, F.; Swaminathan, S.S.; Acton, T.; Xiao, R.; Luft, J.R.; Lauricella, A.; DeTitta, G.T.; Rost, B.; Montelione, G.T.; Hunt, J.F. Understanding the physical properties controlling protein crystallization based on analysis of large-scale experimental data. Nat. Biotechnol., 2009, 27, 51-57.
[62] Slabinski, L.; Jaroszewski, L.; Rodrigues, A.P.C.; Rychlewski, L.; Wilson, I.A.; Lesley, S.A.; Godzik, A. The challenge of protein structure determination: lessons from structural genomics. Protein Sci., 2007, 16, 2472-2482.
[63] Overton, I.M.; Barton, G.J. Computational approaches to selecting and optimizing targets for structural biology. Methods, 2011, 55, 3-11.
[64] Johansson, M.U.; Zoete, V.; Michielin, O.; Guex, N. Defining and searching for structural motifs using DeepView/Swiss-PdbViewer. BMC Bioinformatics, 2012, 13, 173.
[65] Jmol. http://jmol.sourceforge.net (Accessed November 16, 2013).
[66] Wang, Y.; Geer, L.Y.; Chappey, C.; Kans, J.A.; Bryant, S.H. Cn3D: sequence and structure views for Entrez. Trends Biochem. Sci., 2000, 25, 300-302.
[67] Deshpande, N.; Addess, K.J.; Bluhm, W.F.; Merino-Ott, J.C.; Townsend-Merino, W.; Zhang, Q.; Knezevich, C.; Xie, L.; Chen, L.; Feng, Z.; Green, R.K.; Flippen-Anderson, J.L.; Westbrook, J.; Berman, H.M.; Bourne, P.E. The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema. Nucleic Acids Res., 2005, 33, D233-D237.
[68] Rose, P.W.; Beran, B.; Bi, C.; Bluhm, W.F.; Dimitropoulos, D.; Goodsell, D.S.; Prlic, A.; Quesada, M.; Quinn, G.B.; Westbrook, J.D.; Young, J.; Yukich, B.; Zardecki, C.; Berman, H.M.; Bourne, P.E. The RCSB Protein Data Bank: redesigned web site and web services. Nucleic Acids Res., 2011, 39, D392-D401.
[69] Predictome. http://visant.bu.edu (Accessed November 16, 2013).
[70] von Mering, C.; Huynen, M.; Jaeggi, D.; Schmidt, S.; Bork, P.; Snel, B. STRING: a database of predicted functional associations between proteins. Nucleic Acids Res., 2003, 31, 258-261.
[71] Pelletier, J.N.; Remy, I.; Michnick, S.W. Protein-Fragment Complementation Assays: A general strategy for the in vivo detection of Protein-Protein Interactions. J. Biomol. Tech., 1998, http://www.abrf.org/JBT/Articles/JBT0012/jbt0012.html (Accessed November 16, 2013).
[72] Fields, S.; Sternglanz, R. The yeast two-hybrid system: An assay for protein-protein interactions. Trends Genet., 1994, 10, 286-292.
[73] Colas, P.; Brent, R. The impact of two-hybrid and related methods on biotechnology. Trends Biotechnol., 1998, 16, 355-363.
[74] Causier, B. Studying the interactome with the yeast two-hybrid system and mass spectrometry. Mass Spectrometry Rev., 2004, 23, 350-367.
[75] Walter, G.; Konthur, Z.; Lehrach, H. High-throughput Screening of Surface Displayed Gene Products. Comb. Chem. High Throughput Screen., 2001, 4, 193-205.
[76] Hybrigenics. https://pim.hybrigenics.com (Accessed November 16, 2013).
[77] CuraGen data analysis software. http://portal.curagen.com (Accessed November 16, 2013).
[78] Myriad Genetics' ProNet. http://pronet.doubletwist.com (Accessed November 16, 2013).
[79] Marcotte, E.M. Practical computational approaches to inferring protein function. Drug Discovery Today: BIOSILICO, 2004, 2, 24-29.
[80] Jones, P.; Côté, R.G.; Martens, L.; Quinn, A.F.; Taylor, C.F.; Derache, W.; Hermjakob, H.; Apweiler, R. PRIDE: a public repository of protein and peptide identifications for the proteomics community. Nucleic Acids Res., 2006, 34, D659-D663.
[81] Jensen, O.N. Interpreting the protein language using proteomics. Nat. Rev. Mol. Cell Biol., 2006, 7, 391-403.
[82] Bader, G.D.; Betel, D.; Hogue, C.W. BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res., 2003, 31, 248-250.
[83] Xenarios, I.; Salwinski, L.; Duan, X.J.; Higney, P.; Kim, S-M.; Eisenberg, D. DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res., 2002, 30, 303-305.
[84] Breitkreutz, B.J.; Stark, C.; Tyers, M. The GRID: the General Repository for Interaction Datasets. Genome Biol., 2003, 4, R23.
[85] Zanzoni, A.; Montecchi-Palazzi, L.; Quondam, M.; Ausiello, G.; Helmer-Citterich, M.; Cesareni, G. MINT: a Molecular INTeraction database. FEBS Lett., 2002, 513, 135-140.
[86] von Mering, C.; Huynen, M.; Jaeggi, D.; Schmidt, S.; Bork, P.; Snel, B. STRING: a database of predicted functional associations between proteins. Nucleic Acids Res., 2003, 31, 258-261.
[87] de Las Rivas, J.; de Luis, A. Interactome data and databases: different types of protein interaction. Comp. Funct. Genom., 2004, 5, 173-178.
[88] Szafron, D.; Lu, P.; Greiner, R.; Wishart, D.S.; Poulin, B.; Eisner, R.; Lu, Z.; Anvik, J.; Macdonell, C.; Fyshe, A.; Meeuwis, D. Proteome Analyst: custom predictions with explanations in a web-based tool for high-throughput proteome annotations. Nucleic Acids Res., 2004, 32, W365-W371.
[89] Lu, Z.; Szafron, D.; Greiner, R.; Lu, P.; Wishart, D.S.; Poulin, B.; Anvik, J.; Macdonell, C.; Eisner, R. Predicting subcellular localization of proteins using machine learned classifiers. Bioinformatics, 2004, 20, 547-556.
[90] Yu, C.; Lin, C.; Hwang, J. Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions. Protein Sci., 2004, 13, 1402-1406.
[91] Bendtsen, J.D.; Nielsen, H.; von Heijne, G.; Brunak, S. Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol., 2004, 340, 783-795.
[92] Bendtsen, J.D.; Kiemer, L.; Fausbøll, A.; Brunak, S. Non-classical protein secretion in bacteria. BMC Microbiol., 2005, 5, 58-70.
[93] Gardy, J.L.; Laird, M.R.; Chen, F.; Rey, S.; Walsh, C.J.; Ester, M.; Brinkman, F.S.L. PSORTb v.2.0: Expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis. Bioinformatics, 2005, 21, 617-623.
[94] Overton, I.M.; Barton, G.J. A normalised scale for structural genomics target ranking: The OB-Score. FEBS Lett., 2006, 580, 4005-4009.
[95] Overton, I.M.; van Niekerk, C.A.J.; Carter, L.G.; Dawson, A.; Martin, D.M.A.; Cameron, S.; McMahon, S.A.; White, M.F.; Hunter, W.N.; Naismith, J.H.; Barton, G.J. TarO: a target optimisation system for structural biology. Nucleic Acids Res., 2008, 36, W190-W196.
[96] Hammami, R.; Fliss, I. Current trends in antimicrobial agent research: chemo- and bioinformatics approaches. Drug Dis. Today, 2010, 15, 540-546.
[97] Arnold, K.; Bordoli, L.; Kopp, J.; Schwede, T. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics, 2006, 22, 195-201.
[98] Combet, C.; Jambon, M.; Deleage, G.; Geourjon, C. Geno3D: automatic comparative molecular modelling of protein. Bioinformatics, 2002, 18, 213-214.
[99] Kim, D.E.; Chivian, D.; Baker, D. Protein structure prediction and analysis using the Robetta server. Nucleic Acids Res., 2004, 32, W526-W531.
[100] Sali, A.; Blundell, T.L. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol., 1993, 234, 779-815.
[101] Weir, M.; Swindells, M.; Overington, J. Insights into protein function through large-scale computational analysis of sequence and structure. Trends Biotechnol., 2001, 19, S61-66.
[102] Hammami, R.; Zouhir, A.; Lay, C.L.; Hamida, J.B.; Fliss, I. BACTIBASE second release: a database and tool platform for bacteriocin characterization. BMC Microbiol., 2010, 10, 22.
[103] Laskowski, R.; Luscombe, N.M.; Swindells, M.B.; Thornton, J.M. Protein clefts in molecular recognition and function. Protein Sci., 1996, 5, 2438-2452.
[104] Brady, G.P.; Stouten, P.; Weirst, F. Prediction and visualization of protein binding pockets with PASS. J. Comput. Aided Mol. Design, 2000, 14, 383-401.
[105] Hollingsworth, R.; White, J.H. Target discovery using the yeast two-hybrid system. Drug Discovery Today: Targets, 2004, 3, 97-103.
[106] Hamdi, A.; Colas, P. Yeast two-hybrid methods and their applications in drug discovery. Trends Pharmacol. Sci., 2012, 33, 109-118.
[107] Parrish, J.R.; Gulyas, K.D.; Finley, R.L. Jr. Yeast two-hybrid contributions to interactome mapping. Curr. Opinion Biotechnol., 2006, 17, 387-393.

[108] Xiang, Z.; He, Y. Vaxign: a web-based vaccine target design program for reverse vaccinology. Procedia in Vaccinol., 2009, 1, 23-29. 2nd Vaccine Global Congress, Boston 2008.
[109] Marrero-Ponce, Y.; Iyarreta-Veitía, M.; Montero-Torres, A.; Romero-Zaldivar, C.; Brandt, C.A.; Ávila, P.A.; Kirchgatter, K.; Machado, Y. Ligand-based virtual screening and in silico design of new antimalarial compounds using nonstochastic and stochastic total and atom-type quadratic maps. J. Chem. Inf. Model, 2005, 45, 1082-1100.
[110] Ananthula, R.S.; Ravikumar, M.; Pramod, A.B. Strategies for generating less toxic P-selectin inhibitors: pharmacophore modeling, virtual screening and counter pharmacophore screening to remove toxic hits. J. Mol. Graph. Model, 2008, 27, 546-557.

 Received: May 30, 2013 Revised: October 10, 2013 Accepted: October 14, 2013