Published online 20 November 2014
Nucleic Acids Research, 2015, Vol. 43, Database issue D521–D530
doi: 10.1093/nar/gku1154

ProteomeScout: a repository and analysis resource for post-translational modifications and proteins

Matthew K. Matlock, Alex S. Holehouse and Kristen M. Naegle*

Department of Biomedical Engineering and the Center for Biological Systems Engineering, Washington University, St Louis, MO 63130, USA

Received August 15, 2014; Revised October 28, 2014; Accepted October 29, 2014

*To whom correspondence should be addressed. Tel: (314) 935-7665; Fax: (314) 9357448; Email: [email protected]

© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

ProteomeScout (https://proteomescout.wustl.edu) is a resource for the study of proteins and their post-translational modifications (PTMs) consisting of a database of PTMs, a repository for experimental data, an analysis suite for PTM experiments, and a tool for visualizing the relationships between complex protein annotations. The PTM database is a compendium of public PTM data, coupled with user-uploaded experimental data. ProteomeScout provides analysis tools for experimental datasets, including summary views and subset selection, which can identify relationships within subsets of data by testing for statistically significant enrichment of protein annotations. Protein annotations are incorporated in the ProteomeScout database from external resources and include terms such as Gene Ontology annotations, domains, secondary structure and non-synonymous polymorphisms. These annotations are available in the database download, in the analysis tools and in the protein viewer. The protein viewer allows for the simultaneous visualization of annotations in an interactive web graphic, which can be exported in Scalable Vector Graphics (SVG) format. Finally, quantitative data measurements associated with public experiments are also easily viewable within protein records, allowing researchers to see how PTMs change across different contexts. ProteomeScout should prove useful for protein researchers and should benefit the proteomics community by providing a stable repository for PTM experiments.

INTRODUCTION

ProteomeScout is a database of protein post-translational modifications, protein annotations and a web resource for analysis of PTMs and proteins. ProteomeScout is unique in the world of post-translational modification (PTM) databases: it is the only database that encompasses PTM compendia and allows for the community deposition of PTM experiments, and it is the only web service that has built-in subset selection and enrichment for exploring and analyzing PTM experiments. ProteomeScout is also unique in the world of protein resources: it is a resource that includes a flexible, track-based viewer of protein annotations. The purposeful design of these features was inspired by the historical progression of resources for other types of molecules, including the UCSC Genome Browser (1) and the Gene Expression Omnibus (GEO) (2).

The rapid expansion in the identification of protein PTMs, enabled by high-throughput experimental techniques (3), has led to explosive growth in the number of specialized PTM databases, although none of them reflects the full capabilities of the UCSC Genome Browser or the GEO. A sampling of PTM-specific databases includes Phosida (4), PhosphoSitePlus (5), phospho.ELM (6), PTMScout (7), dbPTM (8), PTM-SD (9), PTMCode (10) and O-GLYCBASE (11). The names of these databases reflect the progression of PTM research, and despite initially being focused on phosphorylation, many of them now encompass multiple modification types.

Growth in PTM identification is just one part of the rapid expansion in PTM-centric research; the other part involves the quantitative profiling of PTM changes across conditions. New experimental capabilities have enabled a number of studies, including high-throughput profiling of tyrosine phosphorylation changes in response to receptor tyrosine kinase activation (12,13), methylation response to methyltransferase inhibition (14) and ubiquitination response to proteasome inhibition (15). These datasets contain rich information about the regulation of proteins. However, supplementary data standards in this research field have made it difficult for researchers to harness the power of this information.

The status quo for PTM data involves publication of isolated, static supplementary tables. This means that there is no single standard format, no central repository and, due to the volatility of protein databases (16), there is no persistent mapping of identified PTMs to protein database records. Protein identifiers become out-of-date relatively quickly, and in the most extreme cases, disappear altogether, as with the discontinuation of the International Protein Index (IPI) identifiers. The need for a stable, useable and accessible repository of experiments has arisen before, for high-throughput gene expression experiments. That need resulted in the development of the GEO (2), now an important mainstay in the research community. We have created ProteomeScout to serve as a living repository of high-throughput PTM experiments, thereby enabling the stability and accessibility of published experimental data in the hopes that in the future it may have as much utility in the field of PTM research as GEO does in gene expression research. To encourage its adoption as a standard repository, we have focused on making it easy to deposit datasets and created the ability to maintain private datasets, which enables private analysis and evaluation prior to publishing to the public domain.

In addition to a centralized repository, useful analysis methods are also necessary in order to dissect meaningful information from high-throughput PTM experiments. We have developed a number of methods, and utilized them in our own work, which led to novel insights from PTM experiments, including prediction of novel protein interactions from dynamic phosphorylation data (17), the identification of upregulated motifs downstream of the EGFRvIII receptor (18), prediction of PTMs acting in specific pathways (19) and dissection of poorly understood signaling networks (20). ProteomeScout incorporates a number of these analysis tools, plus others, with a focus on ease-of-use, flexibility and reproducibility.

Another tool needed in protein-centric research is a flexible protein viewer, which allows for the simultaneous visualization of complex protein annotations. PTMs are just one complexity in proteins, which are ultimately biochemical machines. It is a difficult task to predict how alterations in sequence properties, such as those introduced by modifications or through non-synonymous polymorphisms, will affect a protein's structure and function. Chromosomes are also complex linear sequences composed of regulatory elements, modification sites, exons, introns and so forth. The UCSC Genome Browser was developed to give researchers the ability to visualize the components that make up chromosomes. Proteins, composed of complex secondary structure elements, independent domains, sequence variation, [...] by exporting the viewer using Scalable Vector Graphics formats.

This paper will discuss ProteomeScout features and the underlying methods. We have included Figure 1 as a reference map for the type of interactions a typical user might have. We will focus on those features that make ProteomeScout most unique. These include the breadth of PTMs, the repository aspects for PTM experiments, the analysis suite available for datasets and the protein viewer for visualization of the relationship between complex protein annotations. ProteomeScout can be found at https://proteomescout.wustl.edu.

MATERIALS AND METHODS

Protein annotations

The protein loading process begins when accessions are loaded through one of the following three interfaces: (i) Dataset Upload, (ii) Batch Search or (iii) Annotate Sites. When a protein accession is passed to ProteomeScout through these methods, the appropriate external database is queried for the taxonomy and protein sequence associated with that accession. If the sequence from that species already exists in ProteomeScout, then the accession is added. If the protein sequence does not already exist, then all of the following protein annotations (Table 1) are added to ProteomeScout for the new protein.

Protein sequences. Proteins specified by UniProt accession numbers are obtained from the UniProtKB batch query web service. Proteins specified using other accession numbers are, if possible, fetched from NCBI Entrez (24) using the Biopython (25) Entrez API.

PTM ontologies and rules. PTM ontologies are established from the UniProt ptmlist (http://www.uniprot.org/docs/ptmlist) and the ExPASy FindMod tool (http://web.expasy.org/findmod/findmod_masses.html) (26). Rules for a PTM are established by extracting the 'Target Residue' and 'Taxonomic Range' from the UniProt ptmlist. These rules are applied when modified peptides are loaded from Upload Dataset: ProteomeScout ensures that the annotated residue and the