Computation Resources for Molecular Biology: A Special Issue

Increasingly, computational approaches are playing a central role across many areas of research tackling the challenges of understanding the complexity of biological systems. A resource such as BLAST (Basic Local Alignment Search Tool) [1], which was published in this journal in 1990, has transformed sequence searching because of its speed and power in detecting distant but biologically significant relationships. Since then, high-throughput molecular biology technologies have led to a rapid expansion in available sequence, structural and -omics data for many systems being studied. Computational biologists, statisticians and mathematicians have been motivated by this exponential growth of data to develop enhanced novel computational tools. Challenges include storing and cataloging the primary data, searching for relationships between the data (in particular the identification of homology), performing comparative analyses to derive fundamental principles and integrating information from different modalities.

Via the Web, computational resources are readily being made available to researchers in the community, many of whom are not necessarily skilled in computing. Accordingly, in recognition of the crucial importance of methods, databases, software and algorithms, the Journal of Molecular Biology has devoted a Special Issue to collect a series of eight important computational resources, which can aid researchers to gain novel molecular and functional insights into important biological systems and help to solve unanswered challenging questions relevant to health and disease.

We have three contributions where the authors present an integrated database for an important biological system. Sousa et al. [2] have developed the AlloRep database that details sequence, structural and mutagenesis data for LacI/GalR transcriptional regulators. This well-studied family of transcriptional regulators binds a diverse set of DNA sequences. The authors provide manually curated sequence alignments for over 3000 sequences and in vivo phenotypic and biochemical data for over 5750 mutant variants. The authors provide proof of principle that AlloRep can be used to predict residues that alter allosteric regulation. This resource allows one to generate novel hypotheses and to test the robustness of computational approaches for engineering synthetic transcription repressors.

Harrison et al. [3] have developed UbSRD, a relational SQL database reporting structural features for all 509 ubiquitin-like folds in the Protein Data Bank. The resource quantifies the structures of ubiquitins and SUMOs (small ubiquitin-like modifier proteins) and their different modes of protein–protein interactions. The database allowed the authors to establish that the ubiquitin tail is flexible and adopts a range of conformations on binding. Users can browse the database by phylogeny, by structural properties and by residue interactions.

The third database [4] in this Special Issue, authored by Keerthikumar et al., is ExoCarta, a manually curated compendium of exosomal proteins, RNAs and lipids. The current version details more than 41,000 protein, 7000 RNA and 1000 lipid molecules. Users can browse the database by organism, content type or gene. Data can also be downloaded for further analysis. In addition, users can submit their data via a spreadsheet to be added to the database after review.

With the explosion in the number of known protein sequences for a diverse range of organisms and the expansion in the number of experimentally determined protein structures, template-based protein structure prediction is now widely used by the community. Estimation of the accuracy of a predicted structure is crucial in guiding interpretation of the model by the biologist. Yang et al. [5] report a new server, ResQ, which estimates the accuracy at the residue level together with a value for the thermal mobility (B-value) for a predicted model. The approach uses information about local structural variation within a series of structural templates. The authors also demonstrate how the ResQ values can help in structure determination by molecular replacement.

When a new protein structure has been solved, it is common practice to search the Protein Data Bank for related structures; indeed, authors reporting structures in the Journal of Molecular Biology are expected to have performed such a search. The identification of related structures can provide insights into the specificity, function and evolution of the protein being studied. Similarly, a biologist who obtains a predicted protein structure will often wish to undertake these structural searches. Mezulis et al. [6] report a new Web server, PhyreStorm, that can search the entire Protein Data Bank in typically less than 1 min. This speed should facilitate users undertaking several iterative searches exploring multiple hypotheses.

0022-2836/© 2016 Elsevier Ltd. All rights reserved. J Mol Biol (2016) 428, 669–670

The identification of small-molecule binding sites in proteins is of widespread interest in understanding function and in the development of novel drugs and other regulators of activity. Cryptic sites are those that are not readily detectable from the ligand-free structure but become identifiable as a result of conformational change. Typically, these sites are identified by time-consuming molecular dynamics simulations. Here, Cimermancic et al. [7] present the CryptoSite server that predicts cryptic sites from sequence, structural and evolutionary features together with a rapid simulation of protein mobility. Users input their protein coordinates and obtain predicted sites.

The determination of the three-dimensional structure of a biomolecular complex is generally harder than solving the structures of the components. Here, van Zundert et al. [8] report an update of the widely used HADDOCK2 Web server for the prediction of biomolecular complexes. The HADDOCK approach is based on using a series of diverse distance constraints to identify the predicted complex. This version, HADDOCK2.2, provides facilities to dock mixed molecule types and to incorporate additional experimental constraints, including a restraint based on an experimental radius of gyration, such as that obtained from a small-angle X-ray scattering experiment, and several additional restraints identified from NMR studies.

Kanehisa et al. [9] report two new automatic servers, BlastKOALA and GhostKOALA, which annotate genome and metagenome sequences. These servers use the information in the widely used KEGG database that provides a biological interpretation of proteins including their location in pathways. These two servers identify orthologs in the KEGG database and thereby construct a KEGG pathway. BlastKOALA is designed for genome sequences and uses a version of BLAST for sequence searching. GhostKOALA employs a far more rapid sequence search approach and is therefore appropriate for metagenome sequences.

This Special Issue reports specialist databases about molecules and vesicles together with Web servers that provide biological insight about individual proteins, biomolecular complexes and pathways. We trust that the resources described will assist many in their research to characterise biological structure and function at the molecular level. We would like to thank all the contributors to this Special Issue.

References

[1] Stephen F. Altschul, Warren Gish, Webb Miller, Eugene W. Myers, David J. Lipman, Basic local alignment search tool, J. Mol. Biol. (1990), http://dx.doi.org/10.1016/S0022-2836(05)80360-2.
[2] Filipa L. Sousa, Daniel J. Parente, David L. Shis, Jacob A. Hessman, Allen Chazelle, Matthew R. Bennett, Sarah A. Teichmann, Liskin Swint-Kruse, AlloRep: A repository of sequence, structural and mutagenesis data for the LacI/GalR transcription regulators, J. Mol. Biol., http://dx.doi.org/10.1016/j.jmb.2015.09.015.
[3] Joseph S. Harrison, Tim M. Jacobs, Kevin Houlihan, Koenraad Van Doorslaer, Brian Kuhlman, UbSRD: The Ubiquitin Structural Relational Database, J. Mol. Biol., http://dx.doi.org/10.1016/j.jmb.2015.09.011.
[4] Shivakumar Keerthikumar, David Chisanga, Dinuka Ariyaratne, Haidar Al Saffar, Sushma Anand, Kening Zhao, Monisha Samuel, Mohashin Pathan, Markandeya Jois, Naveen Chilamkurti, Lahiru Gangoda, Suresh Mathivanan, ExoCarta: A Web-Based Compendium of Exosomal Cargo, J. Mol. Biol., http://dx.doi.org/10.1016/j.jmb.2015.09.019.
[5] Jianyi Yang, Yan Wang, Yang Zhang, ResQ: An approach to unified estimation of B-factor and residue-specific error in protein structure prediction, J. Mol. Biol., http://dx.doi.org/10.1016/j.jmb.2015.09.024.
[6] Stefans Mezulis, Michael J.E. Sternberg, Lawrence A. Kelley, PhyreStorm: A Web server for fast structural searches against the PDB, J. Mol. Biol., http://dx.doi.org/10.1016/j.jmb.2015.10.017.
[7] Peter Cimermancic, Andrej Sali, CryptoSite: Expanding the druggable proteome by characterization and prediction of cryptic binding sites, J. Mol. Biol., http://dx.doi.org/10.1016/j.jmb.2016.01.029.
[8] G.C.P. van Zundert, J.P.G.L.M. Rodrigues, M. Trellet, C. Schmitz, P.L. Kastritis, E. Karaca, A.S.J. Melquiond, M. van Dijk, S.J. de Vries, A.M.J.J. Bonvin, The HADDOCK2.2 Web Server: User-friendly integrative modeling of biomolecular complexes, J. Mol. Biol., http://dx.doi.org/10.1016/j.jmb.2015.09.014.
[9] Minoru Kanehisa, Yoko Sato, Kanae Morishima, BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences, J. Mol. Biol., http://dx.doi.org/10.1016/j.jmb.2015.11.006.

Michael J.E. Sternberg
Structural Bioinformatics Group, Department of Life Sciences, Imperial College London, South Kensington, London SW7 2AZ, England
Corresponding author.
E-mail address: [email protected].

Marina I. Ostankovitch
Journal of Molecular Biology, Cambridge, MA 02139, USA