Active Site Detection by Spatial Conformity and Electrostatic Analysis—Unravelling a Proteolytic Function in Shrimp Alkaline Phosphatase
Total Page:16
File Type:pdf, Size:1020Kb
Active Site Detection by Spatial Conformity and Electrostatic Analysis—Unravelling a Proteolytic Function in Shrimp Alkaline Phosphatase Sandeep Chakraborty, Renu Minda, Lipika Salaye, Swapan K. Bhattacharjee, Basuthkar J. Rao* Department of Biological Sciences, Tata Institute of Fundamental Research, Mumbai, India, Abstract Computational methods are increasingly gaining importance as an aid in identifying active sites. Mostly these methods tend to have structural information that supplement sequence conservation based analyses. Development of tools that compute electrostatic potentials has further improved our ability to better characterize the active site residues in proteins. We have described a computational methodology for detecting active sites based on structural and electrostatic conformity - CataLytic Active Site Prediction (CLASP). In our pipelined model, physical 3D signature of any particular enzymatic function as defined by its active sites is used to obtain spatially congruent matches. While previous work has revealed that catalytic residues have large pKa deviations from standard values, we show that for a given enzymatic activity, electrostatic potential difference (PD) between analogous residue pairs in an active site taken from different proteins of the same family are similar. False positives in spatially congruent matches are further pruned by PD analysis where cognate pairs with large deviations are rejected. We first present the results of active site prediction by CLASP for two enzymatic activities - b-lactamases and serine proteases, two of the most extensively investigated enzymes. The results of CLASP analysis on motifs extracted from Catalytic Site Atlas (CSA) are also presented in order to demonstrate its ability to accurately classify any protein, putative or otherwise, with known structure. The source code and database is made available at www.sanchak.com/clasp/. Subsequently, we probed alkaline phosphatases (AP), one of the well known promiscuous enzymes, for additional activities. Such a search has led us to predict a hitherto unknown function of shrimp alkaline phosphatase (SAP), where the protein acts as a protease. Finally, we present experimental evidence of the prediction by CLASP by showing that SAP indeed has protease activity in vitro. Citation: Chakraborty S, Minda R, Salaye L, Bhattacharjee SK, Rao BJ (2011) Active Site Detection by Spatial Conformity and Electrostatic Analysis—Unravelling a Proteolytic Function in Shrimp Alkaline Phosphatase. PLoS ONE 6(12): e28470. doi:10.1371/journal.pone.0028470 Editor: Vladimir N. Uversky, University of South Florida College of Medicine, United States of America Received July 27, 2011; Accepted November 8, 2011; Published December 8, 2011 Copyright: ß 2011 Chakraborty et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: This work was funded by the Tata Institute of Fundamental Research (Department of Atomic Energy). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist. * E-mail: [email protected] Introduction considerably from their intrinsic values [16,17]. These large pKa deviations from standard values have been used for predicting The PDB database has more than 60,000 protein structures to active site residues [18–20]. The pKa perturbations of the ionizable date [1]. Classification of proteins is the logical outcome of the residues on the surface of proteins are minimal, while those buried motivation to add a sense of order in this rapidly growing database in the interior of the protein show considerable shifts [21,22]. [2–7]. Identifying the catalytic sites in a protein is an important Significant shifts in pKa values usually stem from interactions with first step towards characterizing its function. Active site determi- other ionizable residues, hydrogen bonding with non-ionizable nation by site directed mutagenesis methods is often time residues and its exposure to the bulk solvent. The buried nature of consuming and labour intensive to apply on the large number of residues in active sites often results in their high electrostatic new proteins being discovered on a continual basis. potential which has been used as a differentiator between catalytic Currently several methods are available for identifying active and other residues [23,24]. site residues in a protein, amongst which methods based on There are essentially two flavors in the existing models of active electrostatic potentials are increasingly gaining ground. Electro- site prediction. The first one requires a motif of a known static interactions determine various properties of biomolecules enzymatic function to search for the same within a protein of such as catalytic activity, ligand binding, structure and stability [8]. interest. Several web-servers give access to 3D motif based search Finite difference Poisson-Boltzmann (FDPB) electrostatics is used methods in proteins with known structures. SPASM [25] and to compute potential differences (PD) and pKa values from charge RASMOT [26] search for a specified motif in the queried protein. interactions in proteins [9–11]. This continuum model of charges MultiBind recognizes 3D-binding patterns common to several led to the development of tools for studying electrostatic protein structures against precomputed ligands [27]. PAR-3D uses interactions [12–14]. These tools implement fast numerical 3D motifs to identify several different functions but is restricted to approximation of solutions to the linearized Poisson-Boltzmann a few protein activities [28]. Superimpose allows searching for a equation [15]. Active site residues have pKa values that differ specific 3D motif in protein structure databases [29]. A method PLoS ONE | www.plosone.org 1 December 2011 | Volume 6 | Issue 12 | e28470 Active Site Prediction Using 3D and Electrostatics that detects recurring three-dimensional side-chain patterns of amino acids in proteins has concluded that alkaline phosphatases and aminopeptidases have evolved convergently [30]. A recent method, ProBis, detects structurally similar sites on protein surfaces by a local structural alignment technique [31], and compares its results with other analogous methods [32–34]. Though most of these methods do a spatial match of motifs, they do not factor in properties related to electrostatic potentials for eliminating false positive matches. Our current method addresses this shortcoming. The other approaches typically describe learning from known active sites across a variety of catalytic functions and use these learnt properties to predict active sites. Finally these results must be tested either in silico, in vitro or in vivo in order to validate the assigned putative activity. The search for active sites in these in silico methods are generally guided by time tested thumb rules [35]. Conservation of residues in a family of proteins, their presence in a cleft, solvent accessibility and hydrophilicity are a few such properties. RESBOOST integrates the characteristics of catalytic residues and presents them as a logical expression of simple rules [36]. DISCERN combines phylogenetic scores, information from structure, properties computed for structural neighbours and a statistical regularization to control for overfitting [37]. Another method also combines multiple predictors into a Figure 1. CLASP flow. The flow for using a motif from a protein with single classifier [38]. ConCavity uses evolutionary sequence known function and 3D structure in identifying active sites in an conservation estimates with structure-based surface pocket pre- unknown protein. Electrostatic potentials are precomputed for proteins diction [39]. Other methods that analyze evolutionary conserva- already in the PDB Database or computed on the fly for new proteins. tion essentially rely on clusters of conserved sites [40], or create These properties are then used to prune out false positives. doi:10.1371/journal.pone.0028470.g001 databases of sequence segments that encode structural attributes of the binding pockets [41]. However none of the above mentioned methods use electrostatic features in the search process, while a substrate accessibility, cleft geometry, etc that are critically few do not provide much flexibility in modulating the specificity involved in defining an active site. and sensitivity. We present here a computational methodology, CataLytic Active Some methods developed recently use electrostatic properties to Site Prediction (CLASP), for active site detection. A motif from a discriminate active sites. Active sites are known to occur in protein of known function and 3D structure provides the initial electrostatically unfavourable environments [18,42]. Often active template for searching spatially congruent matches. These sites involve ionizable groups where anomalous pKa values have matches are then pruned by the observation that the PD in pairs been tagged with catalytic activity [19]. Q-SiteFinder does ligand of residues in active sites is highly correlated. Our method can binding site prediction by computing energetically favourable interrogate any given hypothetical protein with known structure binding sites on the surface of a protein [43]. THEMATICS for motifs derived