Computational Biology, Protein Chemistry and Mass Spectrometry
FEBS Letters 580 (2006) 4764–4770

Minireview

Phosphoproteomics toolbox: Computational biology, protein chemistry and mass spectrometry

Majbrit Hjerrild*, Steen Gammeltoft

Department of Clinical Biochemistry, Glostrup Hospital, Nordre Ringvej, DK-2600 Glostrup, Denmark

Received 1 June 2006; revised 21 July 2006; accepted 25 July 2006
Available online 4 August 2006
Edited by Francesc Posas

Abstract

Protein phosphorylation is important for regulation of most biological functions and up to 50% of all proteins are thought to be modified by protein kinases. Increased knowledge about potential phosphorylation of a protein may increase our understanding of the molecular processes in which it takes part. Despite the importance of protein phosphorylation, identification of phosphoproteins and localization of phosphorylation sites are still a major challenge in proteomics. However, high-throughput methods for identification of phosphoproteins are being developed, in particular within the fields of bioinformatics and mass spectrometry. In this review, we present a toolbox of current technology applied in phosphoproteomics including computational prediction, chemical approaches and mass spectrometry-based analysis, and propose an integrated strategy for experimental phosphoproteomics.

© 2006 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

Keywords: Phosphoproteomics; Phosphorylation; Bioinformatics; Mass spectrometry

*Corresponding author. Fax: +45 43 23 39 29. E-mail address: [email protected] (M. Hjerrild).

Abbreviations: ANN, artificial neural network; ESI, electrospray ionization; GPS, group-based phosphorylation site predicting and scoring platform; HPLC, high performance liquid chromatography; HPRD, human protein reference database; IMAC, immobilized metal affinity chromatography; LC-MS/MS, liquid chromatography tandem mass spectrometry; MALDI-TOF, matrix-assisted laser desorption ionization time-of-flight; PTM, post-translational modification; QTOF, quadrupole time-of-flight; SDS–PAGE, sodium dodecyl sulphate–polyacrylamide gel electrophoresis; SILAC, stable isotope labelling by amino acid in cell culture; siRNA, small interfering RNA; SVM, support vector machine

0014-5793/$32.00
doi:10.1016/j.febslet.2006.07.068

1. Phosphoproteomics of cellular regulation

The cellular proteome is highly dynamic because the expressed proteins, their abundance, and their post-translational modifications depend on the physiological state of the cell. Phosphorylation is one of the most common and best characterized post-translational modifications (PTM) of cellular proteins. Activation of protein kinases and phosphatases provides a powerful control of the phosphorylation state, and thus of the subsequent biological process, e.g. through alteration of protein activity, subcellular localization, degradation, conformation, or interaction with other proteins. Regulation of the cell cycle, membrane transport and permeability, cell adhesion, neurotransmission, and metabolism are examples of biological functions that are modulated through protein phosphorylation [1]. The human genome contains 518 different protein kinases, and identification of their biological targets is an active research area [2]. Even though the number of identified phosphoproteins is rapidly increasing, especially due to the development of high-throughput methods for the identification of phosphoproteins, in particular within the fields of bioinformatics and mass spectrometry, it is believed that only a small fraction of physiological phosphorylation sites has been assigned. In this review a toolbox of techniques currently available for analysis of the phosphoproteome is presented (Fig. 1). Based on this overview we propose an integrated strategy for high-throughput analysis of phosphorylated proteins (Fig. 2).

[Figure 1: PHOSPHOPROTEOMICS TOOLBOX, in three columns]

Computational biology
  Phosphorylation site prediction
    - PROSITE
    - GPS
    - Scansite
    - NetPhos/NetPhosK
    - PredPhospho
    - KinasePhos
    - Predikin
  Phosphorylation site databases
    - PhosphoBase
    - Phospho.ELM
    - PhosphoSite
    - Human Protein Reference Database
    - Swiss-Prot

Protein chemistry
  Protein chemistry
    - In vitro or in vivo radiolabeling
    - 2D phosphopeptide mapping
    - Phosphoamino acid analysis
    - Sequencing by Edman degradation
  Biochemistry
    - Immunochemistry using phospho-specific antibodies
    - Kinase inhibitors
    - Site-directed mutagenesis
    - Silencing of the protein kinase by small interfering RNA

Mass spectrometry
  Enrichment of phosphopeptides
    - Fractionation by HPLC
    - IMAC columns
    - Titanium dioxide columns
    - Graphite columns
  Mass spectrometry analysis
    - Peptide mapping in combination with phosphatase treatment
    - Sequencing by tandem MS
    - Sequencing by hypothesis-driven MS
  Quantitative phosphoproteomics
    - Chemical labeling in vitro
    - Metabolic labeling in vivo

Fig. 1. Techniques currently used for phosphoproteome analysis including computational biology, protein chemistry, and mass spectrometry-based analysis.

[Figure 2: SIX STEPS IN EXPERIMENTAL PHOSPHOPROTEOMICS]

1. Computational prediction of kinase substrates and phosphorylation sites
2. Phosphorylation in vitro or in vivo of proteins by active kinases (e.g. radiolabelling)
3a. Gel electrophoresis, 2D phosphopeptide mapping, autoradiography
3b. Phosphopeptide enrichment by IMAC or titanium dioxide chromatography
4a. Sequencing by Edman degradation, phosphoamino acid analysis
4b. Sequencing by tandem mass spectrometry (MS/MS), hypothesis-driven MS/MS
5. Validation by immunochemistry, site-directed mutagenesis, kinase inhibitors, siRNA
6. Functional characterization of phosphoprotein and phosphorylation sites in vitro and in vivo

Fig. 2. Strategies for integrated analysis of phosphoproteins using computational biology, protein chemistry, and mass spectrometry.

2. Computational phosphoproteomics

The specificity of protein kinases is determined by e.g. acidic, basic, or hydrophobic amino acids adjacent to the phosphoacceptor site, often referred to as the consensus sequence of the kinase. Alternatively, a key determinant for MAP kinase and CDK specificity is a proline in the +1 position. A wide range of computational approaches has been developed for prediction of phosphorylation sites, ranging from simple motif searches to more complex methods like artificial neural networks (ANN), where sequence correlations can be taken into account (Table 1).

Definitions of sequence motifs for specific kinases have, for example, been included in the PROSITE database [3]. The drawback of this simple motif search is that the consensus sequence is often based on limited data and that the sensitivity of this approach tends to be quite low [4]. Another simple approach is the group-based phosphorylation scoring (GPS) method, which is based on comparison of the sequence surrounding the reported phosphorylation sites (three residues on each side of the site) to a given heptapeptide in the candidate substrate. This approach has recently been used for prediction of phosphorylation sites for 71 protein kinase subfamilies [5].

A more complex method for prediction of phosphorylation sites is based on weight matrices, which define more diverse patterns and make it possible to rank the predicted phosphorylation sites. Scansite is a weight matrix-based kinase-specific phosphorylation site prediction server containing more than 60 motifs characterizing binding or substrate specificities of many families of Ser/Thr- or Tyr-kinases, SH2, SH3, PDZ, 14-3-3 and PTB domains [6]. The prediction of phosphorylation sites by Scansite is based on in vitro phosphorylation of an oriented peptide library by specific protein kinases [7]. A major advantage of this approach is that the sequence motifs are determined in unbiased experiments and that no prior knowledge of substrates is required. One limitation is that the optimal sequence might not be determined, since five amino acids have been omitted during synthesis of the peptide library. Secondly, the synthetic peptide includes only eight amino acid residues around the site of phosphorylation and neglects the role of more distant residues for specificity. Finally, synergy between amino acids in two or more positions in the motif will be underestimated by this approach.

Sequence motifs are complex in the sense that positional correlation between several residues is significant for the specificity. Hence, complex machine learning approaches such as

Table 1
WWW-accessible phosphorylation site prediction servers and databases

Name                                        WWW-accessibility

Phosphorylation site prediction servers
GPS                                         http://973-proteinweb.ustc.edu.cn/gps/gps_web/
KinasePhos                                  http://KinasePhos.mbc.nctu.edu.tw/
NetPhos                                     www.cbs.dtu.dk/services/NetPhos/
NetPhosK                                    www.cbs.dtu.dk/services/NetPhosK/
PredPhospho                                 www.ngri.re.kr/proteo/PredPhospho.htm/
PREDIKIN                                    http://florey.biosci.uq.edu.au/kinsub/home.htm
Prosite                                     http://us.expasy.org/prosite/
Scansite                                    http://scansite.mit.edu

Phosphorylation site databases
PhosphoBase                                 http://www.cbs.dtu.dk/databases/PhosphoBase/
Phospho.ELM                                 http://phospho.elm.eu.org/
Human protein reference database (HPRD)     http://www.hprd.org/
PhosphoSite                                 http://www.phosphosite.org/
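
To make the simple motif search of Section 2 concrete, the following minimal Python sketch scans a sequence for a proline-directed motif, i.e. a Ser or Thr followed by Pro in the +1 position. The pattern `[ST]P`, the function name `find_motif_sites` and the example sequence are assumptions chosen for illustration; they are not taken from PROSITE or from any of the servers in Table 1.

```python
import re

# Hypothetical minimal motif: Ser/Thr phosphoacceptor with Pro at position +1.
# The lookahead lets the regex report every acceptor, including adjacent ones.
PROLINE_DIRECTED = re.compile(r"(?=([ST])P)")


def find_motif_sites(sequence, pattern=PROLINE_DIRECTED):
    """Return (1-based position, residue) for each acceptor matching the motif."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Invented example sequence, not a real protein.
    for position, residue in find_motif_sites("MSPKTPAASP"):
        print(f"{residue}{position} matches the proline-directed motif")
```

Such a search only answers yes or no for each residue, which is one reason the sensitivity of plain motif matching tends to be low.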
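
The heptapeptide comparison described for the GPS method in Section 2 can be sketched in the same spirit. The code below extracts the seven-residue window (three residues on each side of the site) around each candidate phosphoacceptor and compares it to a set of reported site windows. Plain residue identity is used here as a stand-in for the similarity measure of the published method, and the function names and the example peptides are invented for illustration.

```python
def heptapeptides(sequence, acceptors="STY"):
    """Yield (1-based position, heptapeptide) for acceptors with 3 flanking residues."""
    seq = sequence.upper()
    for i in range(3, len(seq) - 3):
        if seq[i] in acceptors:
            yield i + 1, seq[i - 3:i + 4]


def identity(a, b):
    """Count positions at which two equal-length peptides carry the same residue."""
    return sum(x == y for x, y in zip(a, b))


def gps_like_score(candidate, known_sites):
    """Average identity of a candidate heptapeptide to reported site heptapeptides."""
    return sum(identity(candidate, known) for known in known_sites) / len(known_sites)


if __name__ == "__main__":
    # Invented "reported" windows for one hypothetical kinase group.
    known = ["RRASLGA", "KRKSLPA"]
    for position, peptide in heptapeptides("AARRASLGSAA"):
        print(position, peptide, gps_like_score(peptide, known))
```

Acceptors closer than three residues to either end of the sequence are skipped in this sketch because a full heptapeptide cannot be formed around them.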
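
Weight matrix-based prediction, as discussed for Scansite in Section 2, differs from a motif search in that each candidate site receives a score and sites can therefore be ranked. The sketch below assumes a toy matrix with made-up weights keyed by offset from the phosphoacceptor; the values, the names `TOY_MATRIX`, `score_site` and `rank_sites`, and the example sequence are hypothetical and do not reproduce Scansite data or its scoring scheme.

```python
# Toy position-specific weights (hypothetical values for illustration only).
# Keys are offsets relative to the phosphoacceptor at position 0.
TOY_MATRIX = {
    -3: {"R": 2.0, "K": 1.0},
    -2: {"R": 1.0},
    +1: {"L": 0.5, "P": -1.0},
}


def score_site(sequence, index, matrix, default=0.0):
    """Sum the weights of the residues found at each offset around `index`."""
    total = 0.0
    for offset, weights in matrix.items():
        pos = index + offset
        if 0 <= pos < len(sequence):  # ignore offsets that fall off the sequence
            total += weights.get(sequence[pos], default)
    return total


def rank_sites(sequence, matrix, acceptors="ST"):
    """Return (1-based position, residue, score) tuples, best score first."""
    seq = sequence.upper()
    scored = [
        (i + 1, residue, score_site(seq, i, matrix))
        for i, residue in enumerate(seq)
        if residue in acceptors
    ]
    return sorted(scored, key=lambda site: site[2], reverse=True)


if __name__ == "__main__":
    # Invented example sequence, not a real substrate.
    for position, residue, score in rank_sites("ARRASLGTP", TOY_MATRIX):
        print(f"{residue}{position}\t{score:+.1f}")
```

Because each position contributes independently to the sum, a matrix of this kind cannot capture synergy between residues at two or more positions, which is the limitation noted in the text.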