Synergies Between the Protein Data Bank and the Community the Protein Data Bank (PDB) Is a Community Resource
Total Page:16
File Type:pdf, Size:1020Kb
COMMENT | FOCUS comment | FOCUS Synergies between the Protein Data Bank and the community The Protein Data Bank (PDB) is a community resource. But how do we defne community, and how has it changed over the last 50 years since the PDB was founded? How did the community infuence the evolution of the PDB, and how did the PDB infuence both the science and the behavior of the community? Helen M. Berman he PDB community is large and Although structure validation tools were Right now, more than 2.5 million coordinate heterogeneous. It consists of structural minimal as compared to now, the depositor sets are downloaded every day by a very Tbiologists who deposit their data, could be assured that errors were caught by diverse user community. scientists, educators and students who use the PDB staff, especially Frances Bernstein at As the PDB began to be widely used, the data, and journals that publish articles BNL, who worked with the PDB for almost it became more and more critical that the about the deposited structures and the twenty-five years. Before long, structures data were reliable. Coordinate data needed corresponding analyses. The community determined using nuclear magnetic not only to yield good geometry but also also includes professional societies whose resonance (NMR) spectroscopy3 and then had to fit the experimental data. Both the members deposit and use the PDB data and three-dimensional electron microscopy PDB12 and its users13 urged that structure funding agencies that have ensured that (3DEM) were also deposited4. As the PDB factors be deposited. In 2007, erroneous the PDB has the resources to operate. Each grew in size and in scope, it became clear structures deposited in the PDB were sector has its own needs and requirements to some members of the structural biology detected by independent analyses of the that must be considered for the PDB to be a community—most notably Dick Dickerson structures and structure factors. Structural useful and effective resource. and Fred Richards—that the deposition biologists were dismayed and deeply From the beginning, the PDB archive of data should be a prerequisite for the concerned by these events14. Until 2008, was a global operation, and since 2003, it publication of any structure in a scientific the PDB only mandated the deposition of has been managed by an international group journal. The International Union of coordinates; deposition of the underlying called the Worldwide Protein Data Bank Crystallography (IUCr) formed a committee experimental data that would allow (wwPDB) that consists of data centers in the to recommend best practices and, in 1989, checking of the structure against the data US, Europe and Asia1. Since its inception, guidelines were published that articulated was optional. In 2008, structure factor the wwPDB has tried to ensure that the the timing for depositions of coordinates deposition became mandatory, opening various community interests are reflected in and structure factors and that placed the door to the establishment of a rigorous its data management policies. “holds” on data release until publication5. validation pipeline. An X-ray Validation Community efforts that began in Not surprisingly, the first journal to require Task Force15 was commissioned by the the 1960s came together in 1971 when deposition was Acta Crystallographica wwPDB to identify the criteria to be Walter Hamilton at Brookhaven National D. Now virtually all journals require the used. Task Forces were also created for Laboratory (BNL) in New York and Olga deposition of structural data into the PDB. NMR16, 3DEM17, small angle scattering18,19 Kennard at the Crystallographic Data Centre PDB holdings have steadily increased and integrative modeling methods20,21. (Cambridge, UK) agreed to set up the PDB over the years. In 1976, the number of Through these Task Forces, researchers as an international resource for structures structures available was 13; in 2021, it is from the respective communities provide obtained using X-ray crystallography2. more than 175,000. As the PDB grew, the recommendations regarding data standards The archive was launched with just seven user community expanded along with it. as well as best practices for structure structures. At first, protein crystallographers Computational biologists began to use PDB curation and validation. The wwPDB needed to be convinced to add their data to data for protein classification6,7 and protein implements recommendations of the Task this newly formed resource. Tom Koetzle, structure prediction8. New specialty data Forces in the form of validation reports who became the head of the PDB after resources were created, with deep curation that are produced by the wwPDB data Hamilton’s untimely death in 1973, searched for particular groups of structures; there are management system, called OneDep22. the literature for newly published structures now more than 200 related resources that are The reports are now an integral part of the and wrote personal letters to protein regularly updated with PDB data. A whole data deposition process and are required crystallographers requesting that they new field called structural bioinformatics by many journals as part of the review deposit their data. Several researchers were was born9. The PDB also became an process23. wwPDB validation reports are notable champions of the effort. Michael essential resource for drug development made publicly available at the time of Rossmann put his considerable energy into efforts, as exemplified during the HIV–AIDS data release. lobbying his colleagues to deposit data. At epidemic of the 1980s and the current Each step in the evolution of the PDB the time, sharing crystallographic data was a COVID-19 pandemic. As modern web and involved collaborations and conversations challenging endeavor, requiring shipment of visualization tools were brought online, among community members, with the computer punch cards or magnetic tape reel PDB data were used more and more for goal of building a consensus as to how to the PDB. But at a minimum, deposition educational purposes10, and the PDB began exactly the archive should be operated. ensured that the data would not be lost. to create specific resources for education11. The passionate involvement of the various 400 NATURE STRUCTURAL & MOLECULAR BIOLOGY | VOL 28 | MAY 2021 | 400–401 | www.nature.com/nsmb FOCUS | COMMENT FOCUS | comment sectors is one reason why the PDB has References 12. Jiang, J., Abola, E. & Sussman, J. L. Acta Crystallogr. D Biol. Crystallogr. 55, 4 (1999). endured for 50 years and why it will 1. Berman, H., Henrick, K. & Nakamura, H. Nat. Struct. Biol. 10, 980 (2003). 13. Wlodawer, A. Acta Crystallogr. D Biol. Crystallogr. 63, 421–423 continue to be a key resource for many 2. Nature New Biol. 233, 223 (1971). (2007). years to come. ❐ 3. Williamson, M. P., Havel, T. F. & Wüthrich, K. J. Mol. Biol. 182, 14. Borrell, B. Nature 462, 970 (2009). 295–315 (1985). 15. Read, R. J. et al. Structure 19, 1395–1412 (2011). 16. Montelione, G. T. et al. Structure 21, 1563–1570 Helen M. Berman 1,2 ✉ 4. Henderson, R. et al. J. Mol. Biol. 213, 899–929 (1990). (2013). 1 Department of Chemistry and Chemical Biology, 5. International Union of Crystallography. Acta Crystallogr. A 45, 17. Henderson, R. et al. Structure 20, 205–214 658 (1989). (2012). Research Collaboratory for Structural Bioinformatics 18. Trewhella, J. et al. Structure 21, 875–881 (2013). Protein Data Bank (RCSB PDB), Rutgers, the State 6. Andreeva, A., Kulesha, E., Gough, J. & Murzin, A. G. Nucleic Acids Res. 48, D376–D382 (2020). 19. Trewhella, J. et al. Acta Crystallogr. D Struct. Biol. 73, University of New Jersey, Piscataway, NJ, USA. 7. Sillitoe, I., Dawson, N., Tornton, J. & Orengo, C. Biochimie 119, 710–728 (2017). 2Te Bridge Institute, Michelson Center for 209–217 (2015). 20. Sali, A. et al. Structure 23, 1156–1167 (2015). 21. Berman, H. M. et al. Structure 27, 1745–1759 Convergent Bioscience, University of Southern 8. Kryshtafovych, A., Schwede, T., Topf, M., Fidelis, K. & Moult, J. Proteins 87, 1011–1020 (2019). (2019). California, Los Angeles, CA, USA. 9. Gu, J. & Bourne, P. E. (eds) Structural Bioinformatics 2nd edn. 22. Young, J. Y. et al. Structure 25, 536–545 (2017). ✉e-mail: [email protected] (Wiley-Blackwell, 2009). 23. Nat. Struct. Mol. Biol. 23, 871 (2016). 10. Martz, E. Trends Biochem. Sci. 27, 107–109 (2002). Published online: 7 May 2021 11. Goodsell, D. S., Zardecki, C., Berman, H. M. & Burley, S. K. Competing interests https://doi.org/10.1038/s41594-021-00586-6 Biochem. Mol. Biol. Educ. 48, 350–355 (2020). The author declares no competing interests. NATURE STRUCTURAL & MOLECULAR BIOLOGY | VOL 28 | MAY 2021 | 400–401 | www.nature.com/nsmb 401.