Published online 6 November 2019
Nucleic Acids Research, 2020, Vol. 48, Database issue D335–D343
doi: 10.1093/nar/gkz990

PDBe: improved findability of macromolecular structure data in the PDB

David R. Armstrong1, John M. Berrisford1, Matthew J. Conroy1, Aleksandras Gutmanas1, Stephen Anyango1, Preeti Choudhary1, Alice R. Clark1, Jose M. Dana1, Mandar Deshpande1, Roisin Dunlop1, Paul Gane1, Romana Gáborová2, Deepti Gupta1, Pauline Haslam1, Jaroslav Koča2, Lora Mak1, Saqib Mir1, Abhik Mukhopadhyay1, Nurul Nadzirin1, Sreenath Nair1, Typhaine Paysan-Lafosse1,3, Lukas Pravda1, David Sehnal2, Osman Salih4, Oliver Smart1, James Tolchard1, Mihaly Varadi1, Radka Svobodová-Vařeková2, Hossam Zaki1, Gerard J. Kleywegt1,4 and Sameer Velankar1,*

1Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK, 2CEITEC - Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 625 00 Brno-Bohunice, Czech Republic, 3InterPro, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK and 4Electron Microscopy Data Bank (EMDB), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Received September 17, 2019; Revised October 11, 2019; Editorial Decision October 11, 2019; Accepted October 25, 2019

*To whom correspondence should be addressed. Tel: +44 1223 49 4646; Email: [email protected]

Present addresses:
Alice R. Clark, Faculty of Science and Engineering, University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1LY, UK.
Saqib Mir, GSK, Gunnels Wood Road, Stevenage, Herts SG1 2NY, UK.
Abhik Mukhopadhyay, The Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, UK.
Oliver Smart, School of Life Sciences, Anglia Ruskin University, East Road, Cambridge CB1 1PT, UK.
Hossam Zaki, Department of Molecular and Cellular Biology and Biochemistry, Brown University, Providence, RI, USA.

© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

The Protein Data Bank in Europe (PDBe), a founding member of the Worldwide Protein Data Bank (wwPDB), actively participates in the deposition, curation, validation, archiving and dissemination of macromolecular structure data. PDBe supports diverse research communities in their use of macromolecular structures by enriching the PDB data and by providing advanced tools and services for effective data access, visualization and analysis. This paper details the enrichment of data at PDBe, including mapping of RNA structures to Rfam, and identification of molecules that act as cofactors. PDBe has developed an advanced search facility with ∼100 data categories and sequence searches. New features have been included in the LiteMol viewer at PDBe, with updated visualization of carbohydrates and nucleic acids. Small molecules are now mapped more extensively to external databases and their visual representation has been enhanced. These advances help users to more easily find and interpret macromolecular structure data in order to solve scientific problems.

INTRODUCTION

Protein Data Bank in Europe (PDBe; pdbe.org) is a founding member of the Worldwide Protein Data Bank (wwPDB; wwpdb.org) (1), the international consortium responsible for the management of the Protein Data Bank (PDB) (2), the single, global archive for experimentally determined three dimensional (3D) structures of biological macromolecules. The other wwPDB partners are Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB; rcsb.org) (3), Protein Data Bank Japan (PDBj; pdbj.org) (4) and Biological Magnetic Resonance Bank (BMRB; bmrb.wisc.edu) (5). Together, the four partners fully cooperate in the areas of deposition, curation, validation and dissemination of macromolecular structure data, guided by the FAIR principles of administering data resources, which ensure data is Findable, Accessible, Interoperable and Reusable (2,6).

Since 2014, the processing of all PDB depositions is managed through the OneDep system (7). PDBe is responsible for the processing of OneDep depositions from European and African institutions, totalling over 4,000 PDB entries in 2018, which equates to 34% of the total PDB depositions that year (wwpdb.org/stats/deposition). PDBe and the wwPDB partners collaborate closely with the Electron Microscopy Data Bank (EMDB) (8), the archive for electron microscopy (EM) electric potential maps, which are also deposited via OneDep. While all wwPDB partners collaborate on processing deposited data in the PDB archive, thereby creating an authoritative source of macromolecular structure data, each partner has its own website and tools to support users in accessing the data.

In 2019, the PDB reached the milestone of 150 000 structures. This wealth of information is used by researchers in fields ranging from drug discovery to protein engineering. To support these diverse user communities, and to present macromolecular structure information in other biologically relevant contexts, PDBe enriches its data through collaborations with resources that specialize in other areas of bioinformatics. For example, the SIFTS project (9) brings together PDB structures with protein sequence data and annotations from UniProtKB (10), Pfam (11) and InterPro (12), structure domains from CATH (13) and SCOP (14), and, more recently, started incorporating genetic variation data from Ensembl (15) and putative homology groups from Homologene (16). PDBe provides online data access, analysis and visualization tools for PDB data through its website, FTP/RSYNC file download, Application Programming Interface (REST API) and PDB Coordinate/Electron-density servers. PDBe services were accessed from 2.2 million unique IP addresses in 2018, with 1.1 million users accessing the PDBe REST API in the same time period, based on statistics collected at the European Bioinformatics Institute (EMBL-EBI) (17).

A new resource, Protein Data Bank in Europe-Knowledge Base (PDBe-KB, pdbe-kb.org), has been developed by PDBe in collaboration with the structural bioinformatics community. The PDBe-KB resource aggregates macromolecular structure data in a series of dedicated views centred on commonly used biological objects (e.g. UniProtKB accessions) rather than on PDB entries. These views incorporate additional biological context and functional information and are discussed in a separate publication in this issue (18).

This paper describes data enrichment efforts and updates to the PDBe

highlighting any failed processes and allowing easy restart of individual tasks. This new infrastructure has significantly streamlined the complex process of releasing data at PDBe, ensuring timely availability of data and allowing easy integration of additional software into the PDBe release process, some of which is described below.

IMPROVEMENTS TO SMALL-MOLECULE DEPICTION

Previous efforts to improve the PDBe website focused on macromolecules, and the area of small molecules was identified as a future improvement (19). To support clear and effective web-delivery of information pertaining to small molecules in the PDB, we have now redesigned the underlying process for handling small molecules present in the wwPDB chemical component dictionary (CCD) (20), deriving added-value information, improving two-dimensional (2D) depictions and adding cross-references to cheminformatics resources. The redesigned process is implemented as a freely available python package (pdbeccdutils; https://gitlab.ebi.ac.uk/pdbe/ccdutils) that builds on the RDKit software (http://rdkit.org) and its data model. The package provides functionality for reading PDBx/mmCIF-formatted files (compound definitions from the CCD) as well as utility functions to compute physicochemical properties, scaffolds, fragments and to draw 'collision-free' images using a series of templates (Figure 1). This process generates 2D coordinates and schematic depictions for the vast majority of small molecules in the PDB (e.g. only 447 out of 29,318 compounds do not have a collision-free image as of September 2019). The images have been added to the PDBe search interface and PDB entry pages (see below). Where possible, we have added cross-references to the database entries for the same compound in other resources, including ChEMBL (21), ChEBI (22), ZINC (23) and DrugBank (24).

DATA ENRICHMENT

Macromolecular structures provide insight into the function of biological systems, but to capitalize fully on the value of the data inherently present in the PDB, it must be presented in the context of related biological and chemical information, particularly for scientists whose expertise may be outside of structural biology. PDBe works closely with