Wwpdb Biocuration: on the Front Line of Structural Biology

FOCUS | COMMENT FOCUS | comment wwPDB biocuration: on the front line of structural biology Biocurators, the backbone of the wwPDB, manage structural biology data deposition, quality, and integrity, and provide integral support to the research community worldwide. Jasmine Y. Young, John Berrisford and Minyu Chen hrough the open-access Protein the structural biology community stepped Data Bank (PDB) archive, structural up to provide much-needed insights Tbiologists provide and gain access into SARS-CoV-2, wwPDB biocuration to atomic-level views of over 175,000 of SARS-CoV-2 protein structures was biological macromolecules. These prioritized to ensure rapid public release structures augment our understanding of of data while maintaining our enduring the connections between structure and commitment to quality. More than 1,000 function in biology. The archive continues COVID-19-related structures are now to grow every year in both the number of freely available from the PDB, a year structures and complexity––even during the after release of the first SARS-CoV-2 COVID-19 pandemic. The PDB is widely wwPDB Biocurator Summit held virtually in 2020 structure. Combined with SARS-CoV-1 regarded as one of the best-curated biodata during the pandemic. Top row, from the left: Yuhe (over 200 structures) and MERS-CoV (70 resources1, supporting the structural biology Liang (RCSB PDB), Jasmine Young (RCSB PDB), structures) PDB structures from the two community and resourced by a team of Irina Persikova (RCSB PDB). Second row, from the earlier epidemics, PDB data are facilitating expert biocurators vital to the integrity of left: Deborah Harrus (PDBe), David Armstrong structure-guided discovery and development the PDB data ecosystem. The biocurators (PDBe), Brian Hudson (RCSB PDB). Bottom row, of anti-coronavirus drugs, vaccines and collaborate with PDB depositors using the from the left: Ezra Peisach (RCSB PDB), John neutralizing antibodies. OneDep software system2 to standardize, Berrisford (PDBe), Minyu Chen (PDBj). Open access to PDB data has helped validate and biocurate incoming structure to expose the inner workings of the data, ensuring that they are findable, coronavirus beyond the scientific accessible, interoperable and reusable community. Throughout the COVID-19 (FAIR). Now in its 50th year of operation, and ontologies. This is as true now as it pandemic, wwPDB biocuration staff have the PDB has exemplified the FAIR principles was 47 years ago when PDB biocuration continued to support research, training for responsible data management since pioneer Frances C. Bernstein and others and education worldwide. We are thrilled long before they were widely known and at Brookhaven National Laboratory that our efforts have contributed to general adopted3. curated data that were represented in the awareness of the importance of scientific Key to the long-term success and original PDB file format. Today, strict advances. As science evolves and new sustainability of the PDB has been the definitions of data types and file formats technologies and methods are adopted by Worldwide Protein Data Bank partnership4,5, in the fully extensible PDBx/mmCIF data the structural biology community, we will formalized in 2003. The wwPDB jointly dictionary allow identification of errors face increasing challenges in handling the manages the single global PDB archive. and inconsistencies, highlight molecular growth in number, size and complexity Today, more than 15 biocurators work properties, and connect data depositors, of depositions to the PDB. In response to at wwPDB data centers located in the biocurators and data consumers. While these challenges, we are working with the United States, Europe and Asia. Using the structural biologists push the envelope by community to increase the accuracy of OneDep system, we share responsibility determining ever larger and more complex metadata in PDB entries through increased for managing incoming PDB structures structures, developing novel experimental automatic data harvesting, increasing the along geographic lines, bringing diverse methods, or designing new validation tools, level of automation in the biocuration expertise in biochemistry, biophysics, we work closely with software developers, pipeline to improve biocuration efficiency, computational chemistry, enzymology and structural biology facilities and scientific and collaborating with the community task small-molecule crystallography to the global innovators to refine PDB data management forces and mmCIF Working Group (http:// enterprise. With training in macromolecular practices. We also use OneDep periodically wwpdb.org/task/mmcif) to extend the data crystallography, nuclear magnetic resonance to look back and across the archive with model and improve validation of models and spectrometry and electron microscopy, ‘remediation efforts’ to bring previously experimental data that will better support we operate on the front lines of structural released structure data up to modern new technologies and methods. ❐ biology, working closely with thousands of standards. PDB depositors every year. The move for biocurators to work from Jasmine Y. Young 1 ✉ , John Berrisford 2 Since its launch in 1971, more than 100 home, away from each regional data center and Minyu Chen3 biocurators have served the PDB, united by during the COVID-19 pandemic, was 1Research Collaboratory for Structural Bioinformatics passion for science, thirst for knowledge, enabled by the global collaborative practices Protein Data Bank (RCSB PDB), Institute for and interest in file formats, data dictionaries that support the wwPDB partnership. As Quantitative Biomedicine, Rutgers, Te State NATURE METHODS | VOL 18 | MAY 2021 | 431–443 | www.nature.com/naturemethods 431 COMMENT | FOCUS comment | FOCUS University of New Jersey, Piscataway, NJ, USA. References US National Institutes of Health (R01GM133198). The 2Protein Data Bank in Europe (PDBe), European 1. Howe, D. et al. Nature 455, 47–50 (2008). Protein Data Bank in Europe is supported by the European 2. Young, J. Y. et al. Database 2018, bay002 (2018). Molecular Biology Laboratory–European Bioinformatics Molecular Biology Laboratory, European 3. Wilkinson, M. D. et al. Sci. Data 3, 160018 (2016). Institute and Wellcome Trust (104948). Protein Data Bank Bioinformatics Institute (EMBL-EBI), Wellcome 4. Berman, H., Henrick, K. & Nakamura, H. Nat. Struct. Biol. 10, Japan is supported by the Database Integration Coordination Genome Campus, Hinxton, UK. 3Protein Data Bank 980 (2003). Program from the National Bioscience Database Center 5. wwPDB Consortium. Nucleic Acids Res. 47, D520–D528 (2019). Japan (PDBj), Institute for Protein Research, Osaka (NBDC)–JST (Japan Science and Technology Agency), the Platform Project for Supporting in Drug Discovery and Life University, Osaka, Japan. Acknowledgements Science Research from AMED, and the joint usage program ✉ RCSB PDB is jointly funded by the US National Science e-mail: [email protected] of Institute for Protein Research, Osaka University. Foundation (DBI-1832184), the US Department of Energy (DE-SC0019749) and the National Cancer Institute, Published online: 7 May 2021 National Institute of Allergy and Infectious Diseases and Competing interests https://doi.org/10.1038/s41592-021-01137-z National Institute of General Medical Sciences of the The authors declare no competing interests. A new era of synchrotron-enabled macromolecular crystallography The future of macromolecular crystallography includes new X-ray sources, enhanced remote-accessible capabilities and time-resolved methods to capture intermediate structures along reaction pathways. Aina E. Cohen n 1974, three years after the official start of the Protein Data Bank, the Ifirst beamlines dedicated to structural D485 studies began operation at multi-GeV Template storage rings. During early macromolecular D483 strand crystallography experiments, a crystal exposed for only a few minutes would produce a higher resolution diffraction D481 pattern than could be measured after Mg2+ hours with a rotating anode source, first demonstrating the value of synchrotron radiation to structural biology1. At state-of-the-art synchrotron beamlines today, complete datasets are measured in seconds with crystal rotation speeds NPE exceeding 90° s−1 and diffraction image frame rates exceeding 100 Hz. The resulting images are monitored for diffraction quality in real time and transferred to Bridge helix automated processing pipelines to simplify data analysis. As technologies advance, structural investigations are transitioning Fig. 1 | Time-resolved studies of transcription. The Calero lab (University of Pittsburgh) has beyond solving single static structures. demonstrated that photoactive caged ATP (green) can bind to RNA polymerase II and that ultraviolet Sequential series of structural snapshots illumination can break the nitrobenzyl group (NPE) in crystallo, allowing ATP release, metal coordination are being applied to provide details of the (blue) and phosphodiester bond formation (red). These experiments were performed remotely atomic positions and motions that define using SSRL beam line 12-1, where crystals were exposed to UV light to break the cage, followed by a the relationships involved in molecular temperature increase from 100 K to 170 K (above the glass transition temperature of water), followed by recognition, transition state stabilization, rapid helical-rotation data collection (2 s per dataset). Credit:Guillermo Calero. allostery and other

Wwpdb Biocuration: on the Front Line of Structural Biology

Original Article Text Mining in the Biocuration Workflow: Applications for Literature Curation at Wormbase, Dictybase and TAIR

Biocuration 2016 - Posters

Biocuration - Mapping Resources and Needs [Version 2; Peer Review: 2 Approved]

Improving the Gene Ontology Resource to Facilitate More Informative Analysis and Interpretation of Alzheimer’S Disease Data

Biocuration Experts on the Impact of Duplication and Other Data Quality Issues in Biological Databases

ISB Newsletter

Ontology: Tool for Broad Spectrum Knowledge Integration Barry Smith

Integration of Proteomics Data Into Uniprotkb

Biocuration at the Saccharomyces Genome Database

Perspective on Literature Curation

OM2017 Camerafinal

Building Curation Filters for BCO and Adapting the BCO Framework for Biocuration