PINOT: an Intuitive Resource for Integrating Protein-Protein Interactions James E

Tomkins et al. Cell Communication and Signaling (2020) 18:92 https://doi.org/10.1186/s12964-020-00554-5 METHODOLOGY Open Access PINOT: an intuitive resource for integrating protein-protein interactions James E. Tomkins1, Raffaele Ferrari2, Nikoleta Vavouraki1, John Hardy2,3,4,5,6, Ruth C. Lovering7, Patrick A. Lewis1,2,8, Liam J. McGuffin9* and Claudia Manzoni1,10* Abstract Background: The past decade has seen the rise of omics data for the understanding of biological systems in health and disease. This wealth of information includes protein-protein interaction (PPI) data derived from both low- and high-throughput assays, which are curated into multiple databases that capture the extent of available information from the peer-reviewed literature. Although these curation efforts are extremely useful, reliably downloading and integrating PPI data from the variety of available repositories is challenging and time consuming. Methods: We here present a novel user-friendly web-resource called PINOT (Protein Interaction Network Online Tool; available at http://www.reading.ac.uk/bioinf/PINOT/PINOT_form.html) to optimise the collection and processing of PPI data from IMEx consortium associated repositories (members and observers) and WormBase, for constructing, respectively, human and Caenorhabditis elegans PPI networks. Results: Users submit a query containing a list of proteins of interest for which PINOT extracts data describing PPIs. At every query submission PPI data are downloaded, merged and quality assessed. Then each PPI is confidence scored based on the number of distinct methods used for interaction detection and the number of publications that report the specific interaction. Examples of how PINOT can be applied are provided to highlight the performance, ease of use and potential utility of this tool. (Continued on next page) * Correspondence: [email protected]; [email protected] 9School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK 1School of Pharmacy, University of Reading, Whiteknights, Reading RG6 6AP, UK Full list of author information is available at the end of the article © The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Tomkins et al. Cell Communication and Signaling (2020) 18:92 Page 2 of 11 (Continued from previous page) Conclusions: PINOT is a tool that allows users to survey the curated literature, extracting PPI data in relation to a list of proteins of interest. PINOT extracts a similar numbers of PPIs as other, analogous, tools and incorporates a set of innovative features. PINOT is able to process large queries, it downloads human PPIs live through PSICQUIC and it applies quality control filters on the downloaded PPI data (i.e. removing the need for manual inspection by the user). PINOT provides the user with information on detection methods and publication history for each downloaded interaction data entry and outputs the results in a table format that can be straightforwardly further customised and/or directly uploaded into network visualization software. Keywords: Protein interaction, Protein network, Network, Data mining, Protein database, Data integration Background downloaded “live” at the time of the query) from seven During the past two decades the use of omics data to databases using the Proteomics Standard Initiative Com- understand biological systems has become an increas- mon Query Interface (PSICQUIC) and integrated to en- ingly valued approach [1]. This includes extensive efforts sure a wide coverage of the PPIs available from these to detect protein-protein interactions (PPIs) on an al- repositories [16]. These data are scored through a simple most proteome-wide scale [2, 3]. The utility of such data and transparent procedure based on ‘method detection’ has been greatly supported by primary database curation and ‘publication records’ and allows the user to further and the International Molecular Exchange (IMEx) Con- apply customized confidence thresholds. PINOT is fully sortium, which promotes collaborative efforts in standar- automated and available online as an open access re- dising and maintaining high quality data curation across source. Output data are provided as a summary table the major molecular interaction data repositories [4]. (directly online or emailed to the user), which summa- The primary databases, such as IntAct [5] and BioGRID rizes the most comprehensive current knowledge of the [6], are rich data resources providing a comprehensive PPI landscape for the protein(s)-of-interest submitted in record of published PPI literature. PPI data are critical the query list. Additionally, and of note, the R scripts to describe connections among proteins, which in turn that underlie PINOT can be freely downloaded from the supports both inference of new functions for proteins help-page. (based on the guilt by association principle [7]) and visualization of protein connectivity via shared interactors. This enables identification of potential communal Methods pathways involving proteins of interest [8–10]. Addition- Protein interaction network online tool (PINOT) ally, literature extracted PPI data can support the PINOT can be run automatically at http://www.reading. prioritization of interactions from high-throughput ex- ac.uk/bioinf/PINOT/PINOT_form.html (hereafter re- “ ” periments (which generate large lists of potential PPI ferred to as web tool ). A choice of parameters is inte- hits), assisting the selection of candidates for further grated by default as explained further below and in analysis/validation [11]. Supplementary Materials S1. Alternatively, R scripts can However, the process of collating PPI data is currently be downloaded from the help-page and in this way “ ” hampered by the fact that no single data source encom- PINOT can be used as a standalone tool whereby pa- passes the full extent of PPIs reported in literature; rameters can be modified as per user choice. hence, users are required to merge (partial) information A list of proteins of interest (seeds) can be queried to mined from multiple different primary databases. Mer- identify their literature-reported interactors that have ging such data is not straightforward due to inconsisten- been curated into PPI databases (Fig. 1). cies in data format and differences in data curation For Homo sapiens (taxonomy ID: 9606) the seed iden- across the PPI databases (e.g. IMEx member vs non- tifiers submitted into the query field must be in an ap- member databases). proved HUGO Gene Nomenclature Committee (HGNC) To optimize the use of PPI data within the public do- gene symbol [17] or valid Swiss-Prot UniProt ID format. main, we developed a user-friendly tool that assists PPI Upon query submission, PPI data are extracted directly data extraction and processing: the Protein Interaction (via API: Shannon, P. (2020) PSICQUIC R package, Network Online Tool (PINOT). This tool represents the DOI: https://doi.org/10.18129/B9.bioc.PSICQUIC [18]) development (and automation) of our previous PPI from seven primary databases, all of which directly an- analysis framework, termed Weighted Protein-Protein notate PPI data from peer-reviewed literature: bhf-ucl, Interaction Network Analysis (WPPINA) [9, 11–15]. BioGRID [6], InnateDB [19], IntAct [5], MBInfo (https:// Through PINOT, PPI data are downloaded directly (i.e. www.mechanobio.info), MINT [20] and UniProt [21]. The downloaded protein interaction data are then Tomkins et al. Cell Communication and Signaling (2020) 18:92 Page 3 of 11 ab c d e Fig. 1 PINOT user interface. a. Screenshot of the PINOT webpage. b. Examples of the text file to be uploaded or list to be populated into the text box of query seeds (i.e. proteins for which protein interactors will be extracted from primary databases that manually curate the literature). c. Example result output file from PINOT, containing the extracted and processed PPI data (only the file header is reported as an example). d. Example of the discarded proteins log file from PINOT, a text file reporting all the seeds for which interactions are not returned to the user. e. Example of the network providers log file from PINOT containing a list of active databases that were utilised for downloading PPI data parsed, merged, filtered and scored (Fig. 2) automatically i) a network file (final_network.txt), which is a tab- by PINOT. A detailed description of the PINOT pipeline spaced text file containing the processed PPIs for the can be found in the Supplementary Materials S1. The seeds

PINOT: an Intuitive Resource for Integrating Protein-Protein Interactions James E

Original Article Text Mining in the Biocuration Workflow: Applications for Literature Curation at Wormbase, Dictybase and TAIR

The ELIXIR Core Data Resources: Fundamental Infrastructure for The

UC Davis UC Davis Previously Published Works

Webnetcoffee

The Biogrid Interaction Database

NIH-GDS: Genomic Data Sharing

SGD and the Alliance of Genome Resources Stacia R

Biocuration Experts on the Impact of Duplication and Other Data Quality Issues in Biological Databases

Annotation of Metabolic Genes in Caenorhabditis Elegans and Reconstruction of Icel1273

The Distribution of Lectins Across the Phylum Nematoda: a Genome-Wide Search

Wormbase 2017: Molting Into a New Stage Tim Schedl Washington University School of Medicine in St

Annual Scientific Report 2011 Annual Scientific Report 2011 Designed and Produced by Pickeringhutchins Ltd