The Biogrid Interaction Database: 2017 Update Andrew Chatr-Aryamontri1, Rose Oughtred2, Lorrie Boucher3, Jennifer Rust2, Christie Chang2, Nadine K

Published online 14 December 2016 Nucleic Acids Research, 2017, Vol. 45, Database issue D369–D379 doi: 10.1093/nar/gkw1102 The BioGRID interaction database: 2017 update Andrew Chatr-aryamontri1, Rose Oughtred2, Lorrie Boucher3, Jennifer Rust2, Christie Chang2, Nadine K. Kolas3, Lara O’Donnell3, Sara Oster3, Chandra Theesfeld2, Adnane Sellam4, Chris Stark3, Bobby-Joe Breitkreutz3, Kara Dolinski2 and Mike Tyers1,3,* 1Institute for Research in Immunology and Cancer, Universite´ de Montreal,´ Montreal,´ Quebec H3T 1J4, Canada, 2Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA, 3The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada and 4Centre Hospitalier de l’Universite´ Laval (CHUL), Quebec,´ Quebec´ G1V 4G2, Canada Received September 29, 2016; Revised October 25, 2016; Editorial Decision October 26, 2016; Accepted October 27, 2016 ABSTRACT hence essential for the interpretation of genomic, functional genomic, phenotypic and chemical screen data. Since the The Biological General Repository for Interaction advent of high-throughput approaches to detect protein (1– Datasets (BioGRID: https://thebiogrid.org) is an open 3) and genetic interactions (4), the depth of coverage and access database dedicated to the annotation and accuracy of biological interaction networks has continued archival of protein, genetic and chemical interac- to improve (5,6), with increased resolution at the level of tions for all major model organism species and hu- protein isoforms (7) and metabolites (8,9). Extensive net- mans. As of September 2016 (build 3.4.140), the Bi- work contexts now provide a basis for the rationalization of oGRID contains 1 072 173 genetic and protein in- perturbations caused by disease-associated mutations (10– teractions, and 38 559 post-translational modifica- 12) and have helped deconvolve complex mutational pro- tions, as manually annotated from 48 114 publica- files generated by genome-wide association studies (GWAS) tions. This dataset represents interaction records for and next-generation sequencing-based approaches for anal- ysis of the genome (13), transcriptome, and epigenome (14). 66 model organisms and represents a 30% increase The network paradigm thus holds the promise of predic- compared to the previous 2015 BioGRID update. Bi- tive and precision medicine, as illustrated for example by the oGRID curates the biomedical literature for major synthetic lethal interaction networks between cancer driver model organism species, including humans, with a mutations and established drug targets (15). recent emphasis on central biological processes and The Biological General Repository for Interaction specific human diseases. To facilitate network-based Datasets (BioGRID: https://thebiogrid.org) was originally approaches to drug discovery, BioGRID now incorpo- implemented in 2003 (16) to provide open access to high- rates 27 501 chemical–protein interactions for human throughput (HTP) interaction datasets (1–3), and subse- drug targets, as drawn from the DrugBank database. quently to augment and benchmark HTP data with interac- A new dynamic interaction network viewer allows the tions drawn from focused low-throughput (LTP) studies in easy navigation and filtering of all genetic and pro- the biomedical literature (17). Since its inception as a yeast- specific database (16), BioGRID has grown to cover inter- tein interaction data, as well as for bioactive com- actions from 66 different species, including all major model pounds and their established targets. BioGRID data organisms and humans. Correspondingly, the number of are directly downloadable without restriction in a annotated interactions in BioGRID has increased from 30 variety of standardized formats and are freely dis- 000 protein interactions in the original version to more than tributed through partner model organism databases one million protein, genetic and chemical interactions in the and meta-databases. current release (Figure 1). The primary focus of BioGRID is the manual curation of experimentally validated genetic and protein interactions INTRODUCTION that are reported in peer-reviewed biomedical publications. Biological interactions, whether functional interactions be- Text-mining approaches are now routinely used to acceler- tween genes or physical interactions between proteins and ate and prioritize expert manual curation in BioGRID (18– other biomolecules, provide the framework to understand 20). All interactions in BioGRID are annotated by cura- how the cell is structured and controlled. Interaction net- tors according to a structured set of experimental evidence works reflect the organization of cellular pathways and are codes. In addition BioGRID captures post-translational *To whom correspondence should be addressed. Tel: +1 514 343 6668; Fax: +1 514 343 5839; Email: [email protected] C The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] D370 Nucleic Acids Research, 2017, Vol. 45, Database issue Total interactions Publications containing interactions Protein Interactions Publications curated Genetic Interactions 1,200K 200K 1,000K 160K 800K 120K 600K 80K 400K 200K 40K 0 0 Jul/2013 Jul/2013 Oct/2014 Apr/2012 Apr/2012 Oct/2014 Jun/2011 Jan/2016 Jun/2011 Jun/2016 Jan/2011 Jan/2016 Jun/2016 Jan/2011 Mar/2010 Mar/2015 Feb/2013 Aug/2010 Aug/2015 Dec/2013 Nov/2011 Sep/2012 Mar/2015 Mar/2010 Feb/2013 May/2014 Nov/2011 Dec/2013 Aug/2015 Sep/2012 Aug/2010 May/2014 Figure 1. Increase in data content of BioGRID. Increments in interaction records and source publications reported in BioGRID from March 2010 (release 2.0.62) to September 2016 (release 3.4.140). Left panel shows the increase of annotated protein interactions (red), genetic interactions (green) and total interactions (blue). Right panel shows the number of publications that actually reported protein or genetic interactions (blue) as a function of the total number of publications examined by BioGRID curators (red). modification (PTM) data, such as sites of phosphoryla- standard (34). The BioGRID also currently contains data tion and ubiquitination, from both LTP and HTP stud- on 38 559 protein PTMs as curated from 4317 publica- ies. Recently, BioGRID has extended its data content to tions. These PTM data are now drawn mainly from high- include the protein and/or genetic interactions of drugs, throughput mass spectrometry studies, which are able to metabolites and other bioactive small molecules. The cur- routinely survey many thousands of modification sites for rent version of BioGRID includes a newly implemented any given PTM type (35). All yeast phosphorylation site network viewer that allows visualization of all search re- data in BioGRID are also currently housed in the Phos- sults in an interactive graphical format. Additional new cus- phoGRID database (36), but this older database has been tom page views also allow the interrogation of PTM sites essentially subsumed by the new PTM functionalities of and chemical interaction data. BioGRID content is up- BioGRID (see below). BioGRID curation has recently fo- dated on a monthly basis and is made freely accessible via cused on improved coverage for PTMs. For example, a re- the web interface, through downloadable files in standard- cent themed project on the ubiquitin proteasome system has ized formats, and through dissemination by model organ- documented 130 184 sites of ubiquitin modification on pro- ism database (MOD) partners (21–26) and other biological teins encoded by 8681 human genes and 20 019 sites on 2549 resources (27–32). yeast proteins, all which will be released as a consolidated dataset by the end of 2016. A new aspect of BioGRID is the coverage of chemical interactions, typically for drug- DATABASE GROWTH AND STATISTICS protein targets and/or drug-gene interactions. BioGRID Since our 2015 NAR Database report (33), the number of release 3.3.123 (April 2015) included 27 034 chemical–target curated interactions housed in BioGRID has increased by interaction records drawn from DrugBank, a database of 30%. As of September 2016 (version 3.4.140), BioGRID manually curated drug-target relationships (37). At present, contains 1 072 173 protein and genetic interactions, of BioGRID contains 2519 unique genes/proteins from 21 or- which 836 212 are non-redundant interactions. These in- ganisms that are linked to 4999 total unique chemicals as cu- teractions correspond to 621 639 (470 810 non-redundant) rated from 8989 publications. 2129 of these interactions are protein interactions and 450 534 (373 762 non-redundant) between drugs or other bioactive agents and human genes genetic interactions (Table 1). These data were directly ex- or proteins. tracted from 47 223 manually annotated peer-reviewed pub- In 2016, Google Analytics reported that the BioGRID lications, which were identified from the biomedical litera- received on average 124 232 page views and 14 444 unique ture by keyword searches and text-mining approaches. Ex- visitors per month, versus 88 080 page views and 12 399 tensive manual inspection of candidate abstracts and/or unique visitors per month in 2014.

The Biogrid Interaction Database: 2017 Update Andrew Chatr-Aryamontri1, Rose Oughtred2, Lorrie Boucher3, Jennifer Rust2, Christie Chang2, Nadine K

Impact of the Protein Data Bank Across Scientific Disciplines.Data Science Journal, 19: 25, Pp

Pdbefold Tutorial Tutorial Pdbefold Can May Be Accessed from Multiple Locations on the Pdbe Website

EMBL-EBI-Overview.Pdf

Human Genetics 1990–2009

EC-PSI: Associating Enzyme Commission Numbers with Pfam Domains

RCSB Protein Data Bank: Overview

Uniprot Knowledgebase: a Hub of Integrated Protein Data

Lab Manual.Indd 1 17/01/2019 4:34:55 PM Title : Bioinformatics for Beginners Laboratory Manual

The Role of Uniprot's Protein Sequence Databases in Biomedical Research

Search of Biological Databases and Literature

Saccharomyces Cerevisiae HOWARD BUSSEY*T, DAVID B

Introduction to BLAST Using Human Leptin