Macromolecular Modeling and Design in Rosetta: New Methods and Frameworks

Total Page:16

File Type:pdf, Size:1020Kb

Macromolecular Modeling and Design in Rosetta: New Methods and Frameworks Supplementary information for: Macromolecular modeling and design in Rosetta: new methods and frameworks Rosetta is licensed and distributed through www.rosettacommons.org. Licenses for academic, non-profit and government laboratories are free of charge, there is a license fee for industry users. The main documentation page can be found at https://www.rosettacommons.org/docs/latest/Home. Tutorials, demos and protocol captures are documented at https://www.rosettacommons.org/demos/latest/Home. RosettaScripts documentation is available at https://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/RosettaScripts. PyRosetta tutorials are available at http://www.pyrosetta.org/tutorials. Method Use Developer(s) Lab developed Documentation Protocol capture / demo Limitations Competing methods Protein structure prediction fragment picker1 picks protein Dominik Gront Dominik Gront*,** https://www.rosettacommons.org/docs/latest/applicati Rosetta/demos/public/fragment_picking fragments for on_documentation/utilities/app-fragment-picker various modeling tasks RosettaCM2 comparative Yifan Song formerly David https://www.rosettacommons.org/docs/latest/applicati Rosetta/demos/public/homology_modeling_threa model quality depends on manual MODELLER3, iTasser4, modeling from Baker on_documentation/structure_prediction/RosettaCM ding_basic adjustment of sequence alignment, which HHpred5 multiple templates Supp to 2 can take time to do well iterative recombination of Sergey David Baker, https://www.rosettacommons.org/docs/latest/Iterative hybridize6,7 model Ovchinnikov, Sergey Hybridize substructures for Hahnbeom Park Ovchinnikov de novo modeling Loop modeling MODELLER8 NGK (next- Next Generation Amelie Stein Tanja Kortemme https://www.rosettacommons.org/docs/latest/applicati Rosetta/demos/tutorials/loop_modeling performance deteriorates for loops longer Sphinx10, DiSGro11 generation KIC) 9 Kinematic loop on_documentation/structure_prediction/loop_modelin than 12 residues, as all loop modeling closure g/next-generation-KIC methods do https://www.rosettacommons.org/docs/latest/applicati on_documentation/structure_prediction/loop_modelin g/loopmodel-kinematic (for -vicinity_sampling) LoopHashKIC uses a loop Xingjie Pan Tanja Kortemme https://www.rosettacommons.org/docs/latest/scripting database and KIC _documentation/RosettaScripts/Movers/movers_page for loop design s/LoopModelerMover Consensus_Loop_ uses sequence Enrique Marcos, David Baker https://www.rosettacommons.org/docs/latest/scripting Rosetta/main/tests/integration/tests/ConsensusL LOOPY14, PLOP15, PS116, Design12,13 profiles of known Tom Linsky _documentation/RosettaScripts/TaskOperations/tasko oopDesign PS217 loops to design perations_pages/ConsensusLoopDesignOperation new ones GenKIC Generalized Vikram Mulligan Richard Bonneau* https://www.rosettacommons.org/docs/latest/scripting https://www.rosettacommons.org/demos/latest/tu generality requires more setup and can be molecular dynamics (generalized KIC) KInematic loop _documentation/RosettaScripts/composite_protocols/ torials/GeneralizedKIC/generalized_kinematic_cl less efficient than more biased methods; simulation packages 18 Closure: robotics generalized_kic/GeneralizedKIC osure_1, parameter tuning can require some based approach to https://www.rosettacommons.org/docs/latest/scripting https://www.rosettacommons.org/demos/latest/tu experience sample non- _documentation/RosettaScripts/Movers/movers_page torials/GeneralizedKIC/generalized_kinematic_cl canonical loop s/GeneralizedKICMover osure_2, conformations https://www.rosettacommons.org/docs/latest/scripting https://www.rosettacommons.org/demos/latest/tu _documentation/RosettaScripts/composite_protocols/ torials/GeneralizedKIC/generalized_kinematic_cl generalized_kic/GeneralizedKICperturber osure_3, https://www.rosettacommons.org/docs/latest/scripting https://www.rosettacommons.org/demos/latest/tu _documentation/RosettaScripts/composite_protocols/ torials/GeneralizedKIC/generalized_kinematic_cl generalized_kic/GeneralizedKICfilter osure_4 https://www.rosettacommons.org/docs/latest/scripting Supp to 18,19 _documentation/RosettaScripts/composite_protocols/ generalized_kic/GeneralizedKICselector 1 Method Use Developer(s) Lab developed Documentation Protocol capture / demo Limitations Competing methods https://www.rosettacommons.org/demos/latest/tutorial s/GeneralizedKIC/generalized_kinematic_closure_1 https://www.rosettacommons.org/demos/latest/tutorial s/GeneralizedKIC/generalized_kinematic_closure_2 https://www.rosettacommons.org/demos/latest/tutorial s/GeneralizedKIC/generalized_kinematic_closure_3 https://www.rosettacommons.org/demos/latest/tutorial s/GeneralizedKIC/generalized_kinematic_closure_4 Protein-protein docking RosettaDock4.020 protein-protein Nick Marze, Jeffrey Gray https://www.rosettacommons.org/docs/latest/applicati Rosetta/demos/tutorials/Protein-Protein-Docking global docking requires two orders or InterEVDock21, ClusPro22, docking Shourya Roy on_documentation/docking/docking-protocol https://www.rosettacommons.org/demos/latest/tu magnitude more sampling than local GalaxyTongDock23, Burman torials/Protein-Protein-Docking/Protein-Protein- docking – in these cases RosettaDock is HADDOCK24, PPI3D25 Docking slower than FFT-based methods; like for other tools, large conformational changes make docking difficult Rosetta docking of Shourya Roy (formerly Ingemar https://www.rosettacommons.org/docs/latest/SymDoc SymDock is sensitive to quality of input GalaxyTongDock23, SymDock226 symmetric proteins Burman André), Jeffrey kProtocol monomer; as with other algorithms, SAM27, HSYMDOCK28 Gray performance deteriorates for higher-order symmetries Small molecule ligand docking RosettaLigand29–31 small molecule Sam DeLuca, Jens Meiler https://www.rosettacommons.org/demos/latest/tutorial Rosetta/demos/tutorials/ligand_docking not as fully a dynamic approach as MD AutoDock Vina32, docking to proteins Darwin Fu, s/ligand_docking/ligand_docking_tutorial simulations; performs worse in more GlideScore33 Shannon Smith, solvent-exposed pockets or small (Schrodinger), Rocco Moretti pockets; requires ligand conformers to be DrugScore34, Molecular pre-generated; assumes ligand binding Operating Environment site is known (MOE)35 from Chemical Computing Group, GOLD36 RosettaLigandEns docks ligand Darwin Fu, Jens Meiler https://www.rosettacommons.org/docs/latest/scripting Rosetta/demos/protocol_capture/rosettaligand_e same limitations as RosettaLigand; HybridDock38 emble37 ensembles into Rocco Moretti _documentation/RosettaScripts/Movers/movers_page nsemble assumes ligands in ensemble follow similar proteins s/HighResEnsembleMover binding modes pocket samples binding David Johnson John Karanicolas https://www.rosettacommons.org/docs/latest/applicati Supp to 39,40 competitors use small optimization39,40 pocket on_documentation/utilities/pocket-relax probe molecules conformations https://www.rosettacommons.org/docs/latest/applicati (FTMap41) or molecular during docking on_documentation/analysis/pocket-measure dynamics42 DARC43–45 Docking Approach Ragul Gowthaman, John Karanicolas https://www.rosettacommons.org/docs/latest/applicati Rosetta/demos/public/darc there are numerous using Ray Casting: Karen Khar on_documentation/docking/DARC Supp to 43–45 approaches (f.ex. identifies binding DOCKTITE46 in MOE) for pockets small-molecule docking / virtual screening, but none really comparable to how DARC works Modeling of antibodies and immune system proteins RosettaAntibody47– antibody homology Jeliazko Jeliazkov, Jeffrey Gray https://www.rosettacommons.org/docs/latest/applicati Rosetta/main/tests/integration/tests/antibody_cc performance deteriorates for loops longer SAbPred51, PIGS52, 50 modeling and Nick Marze, Brian on_documentation/antibody/antibody-protocol than 12 residues; unusual CDR loops are BIOVIA Discovery docking Weitzner, hard to model because they are likely not Studio53, MOE54, Jared Adolf- represented in the database Schrodinger55 Bryfogle, Sergey Lyskov, Daisuke Kuroda AbPredict56,57 antibody structure Christoffer Norn, Sarel Fleishman https://www.rosettacommons.org/docs/latest/applicati https://www.rosettacommons.org/demos/latest/tu performance deteriorates for loops longer SAbPred51, PIGS52, prediction Gideon Lapidoth on_documentation/structure_prediction/AbPredict2 torials/AbPredict/AbPredict than 12 residues; unusual CDR loops are BIOVIA Discovery hard to model because they are likely not Studio53, MOE54, represented in the database Schrodinger55 2 Method Use Developer(s) Lab developed Documentation Protocol capture / demo Limitations Competing methods RosettaMHC58 modeling and Santrupti Nerli Nik Sgourakis https://github.com/snerligit/mhc-pep-threader https://github.com/snerligit/mhc-pep- the binding energies cannot fully bioinformatics-based docking of antigen threader/tree/master/examples discriminate best binders from average methods59: NetMHCPan60, / MHC-I / binders MHCFlurry61 (chaperone or T- structure-based cell receptor) methods62: AutoDock63, DINC64 TCRModel65 structure prediction Ragul Gowthaman Brian Pierce https://www.rosettacommons.org/docs/latest/applicati Rosetta/main/tests/integration/tests/tcrmodel challenging to model long CDR loops LYRA66, ImmuneScape67 of T-cell receptors on_documentation/structure_prediction/TCRmodel SnugDock68 docking of Jeliazko Jeliazkov, Jeffrey Gray https://www.rosettacommons.org/docs/latest/applicati Rosetta/main/tests/integration/tests/SnugDock performance deteriorates for loops longer usually for FFT methods
Recommended publications
  • Structural Bioinformatics
    Structural Bioinformatics Sanne Abeln K. Anton Feenstra Centre for Integrative Bioinformatics (IBIVU), and Department of Computer Science, Vrije Universiteit, De Boelelaan 1081A, 1081 HV Amsterdam, Netherlands December 4, 2017 arXiv:1712.00425v1 [q-bio.BM] 1 Dec 2017 2 Abstract This chapter deals with approaches for protein three-dimensional structure prediction, starting out from a single input sequence with unknown struc- ture, the `query' or `target' sequence. Both template based and template free modelling techniques are treated, and how resulting structural models may be selected and refined. We give a concrete flowchart for how to de- cide which modelling strategy is best suited in particular circumstances, and which steps need to be taken in each strategy. Notably, the ability to locate a suitable structural template by homology or fold recognition is crucial; without this models will be of low quality at best. With a template avail- able, the quality of the query-template alignment crucially determines the model quality. We also discuss how other, courser, experimental data may be incorporated in the modelling process to alleviate the problem of missing template structures. Finally, we discuss measures to predict the quality of models generated. Structural Bioinformatics c Abeln & Feenstra, 2014-2017 Contents 7 Strategies for protein structure model generation5 7.1 Template based protein structure modelling . .7 7.1.1 Homology based Template Finding . .7 7.1.2 Fold recognition . .8 7.1.3 Generating the target-template alignment . 10 7.1.4 Generating a model . 11 7.1.5 Loop or missing substructure modelling . 12 7.2 Template-free protein structure modelling .
    [Show full text]
  • Methods for the Refinement of Protein Structure 3D Models
    International Journal of Molecular Sciences Review Methods for the Refinement of Protein Structure 3D Models Recep Adiyaman and Liam James McGuffin * School of Biological Sciences, University of Reading, Reading RG6 6AS, UK; [email protected] * Correspondence: l.j.mcguffi[email protected]; Tel.: +44-0-118-378-6332 Received: 2 April 2019; Accepted: 7 May 2019; Published: 1 May 2019 Abstract: The refinement of predicted 3D protein models is crucial in bringing them closer towards experimental accuracy for further computational studies. Refinement approaches can be divided into two main stages: The sampling and scoring stages. Sampling strategies, such as the popular Molecular Dynamics (MD)-based protocols, aim to generate improved 3D models. However, generating 3D models that are closer to the native structure than the initial model remains challenging, as structural deviations from the native basin can be encountered due to force-field inaccuracies. Therefore, different restraint strategies have been applied in order to avoid deviations away from the native structure. For example, the accurate prediction of local errors and/or contacts in the initial models can be used to guide restraints. MD-based protocols, using physics-based force fields and smart restraints, have made significant progress towards a more consistent refinement of 3D models. The scoring stage, including energy functions and Model Quality Assessment Programs (MQAPs) are also used to discriminate near-native conformations from non-native conformations. Nevertheless, there are often very small differences among generated 3D models in refinement pipelines, which makes model discrimination and selection problematic. For this reason, the identification of the most native-like conformations remains a major challenge.
    [Show full text]
  • Exercise 6: Homology Modeling
    Workshop in Computational Structural Biology (81813), Spring 2019 Exercise 6: Homology Modeling Homology modeling, sometimes referred to as comparative modeling, is a commonly-used alternative to ab-initio modeling (which requires zero knowledge about the native structure, but also overlooks much of our understanding of structural diversity). Homology modeling leverages the massive amount of structural knowledge available today to guide our search of the native structure. Practically, it assumes that sequence similarity implies structural similarity, i.e. assumes a protein homologous to our query protein cannot be very different from it. Therefore, we can roughly "thread" the query sequence onto the template (homologue) structure, and exploit existing knowledge to guide our modeling. Contents Homology modeling with Rosetta 1 Overview 1 Template Selection 1 Loop Modeling 2 Running the full simulation 4 Bonus section! Homology modeling with automated/pre-calculated servers 6 MODBASE 6 I-Tasser 6 Homology modeling with Rosetta Overview In this section, we will model the structure of the Erbin PDZ domain, using homologue template structures, and analyze the quality of the models. The structure of this protein has been solved (PDB ID 1MFG), so we will be able to compare prediction with experiment. As taught in class, classical homology modeling in Rosetta is composed of four phases: 1. Template selection 2. Template-query alignment 3. Rebuilding loop regions 4. Full-atom refinement of the model Our model protein is the Erbin PDZ domain: Erbin contains a class-I PDZ domain that binds to the C-terminal region of the receptor tyrosine kinase ErbB2. PDB ID 1MFG is the crystal structure of the human Erbin PDZ bound to the peptide EYLGLDVPV corresponding to the C-terminal residues of human ErbB2.
    [Show full text]
  • Advances in Rosetta Protein Structure Prediction on Massively Parallel Systems
    UC San Diego UC San Diego Previously Published Works Title Advances in Rosetta protein structure prediction on massively parallel systems Permalink https://escholarship.org/uc/item/87g6q6bw Journal IBM Journal of Research and Development, 52(1) ISSN 0018-8646 Authors Raman, S. Baker, D. Qian, B. et al. Publication Date 2008 Peer reviewed eScholarship.org Powered by the California Digital Library University of California Advances in Rosetta protein S. Raman B. Qian structure prediction on D. Baker massively parallel systems R. C. Walker One of the key challenges in computational biology is prediction of three-dimensional protein structures from amino-acid sequences. For most proteins, the ‘‘native state’’ lies at the bottom of a free- energy landscape. Protein structure prediction involves varying the degrees of freedom of the protein in a constrained manner until it approaches its native state. In the Rosetta protein structure prediction protocols, a large number of independent folding trajectories are simulated, and several lowest-energy results are likely to be close to the native state. The availability of hundred-teraflop, and shortly, petaflop, computing resources is revolutionizing the approaches available for protein structure prediction. Here, we discuss issues involved in utilizing such machines efficiently with the Rosetta code, including an overview of recent results of the Critical Assessment of Techniques for Protein Structure Prediction 7 (CASP7) in which the computationally demanding structure-refinement process was run on 16 racks of the IBM Blue Gene/Le system at the IBM T. J. Watson Research Center. We highlight recent advances in high-performance computing and discuss future development paths that make use of the next-generation petascale (.1012 floating-point operations per second) machines.
    [Show full text]
  • Homology Modeling
    25 HOMOLOGY MODELING Elmar Krieger, Sander B. Nabuurs, and Gert Vriend The ultimate goal of protein modeling is to predict a structure from its sequence with an accuracy that is comparable to the best results achieved experimentally. This would allow users to safely use rapidly generated in silico protein models in all the contexts where today only experimental structures provide a solid basis: structure-based drug design, analysis of protein function, interactions, antigenic behavior, and rational design of proteins with increased stability or novel functions. In addition, protein modeling is the only way to obtain structural information if experimental techniques fail. Many proteins are simply too large for NMR analysis and cannot be crystallized for X-ray diffraction. Among the three major approaches to three-dimensional (3D) structure prediction described in this and the following two chapters, homology modeling is the easiest one. It is based on two major observations: 1. The structure of a protein is uniquely determined by its amino acid sequence (Epstain, Goldberger, and Anfinsen, 1963). Knowing the sequence should, at least in theory, suffice to obtain the structure. 2. During evolution, the structure is more stable and changes much slower than the associated sequence, so that similar sequences adopt practically identical structures, and distantly related sequences still fold into similar structures. This relationship was first identified by Chothia and Lesk (1986) and later quantified by Sander and Schneider (1991). Thanks to the exponential growth of the Protein Data Bank (PDB), Rost (1999) could recently derive a precise limit for this rule, shown in Figure 25.1.
    [Show full text]
  • A New Computer Model for Evaluating the Selective Binding Affinity Of
    pharmaceuticals Article A New Computer Model for Evaluating the Selective Binding Affinity of Phenylalkylamines to T-Type Ca2+ Channels You Lu 1 and Ming Li 2,* 1 Center for Aging, School of Medicine, Tulane University, New Orleans, LA 70112, USA; [email protected] 2 Department of Physiology, School of Medicine, Tulane University, New Orleans, LA 70112, USA * Correspondence: [email protected]; Tel.: +1-504-988-8207 Abstract: To establish a computer model for evaluating the binding affinity of phenylalkylamines (PAAs) to T-type Ca2+ channels (TCCs), we created new homology models for both TCCs and a L-type calcium channel (LCC). We found that PAAs have a high affinity for domains I and IV of TCCs and a low affinity for domains III and IV of the LCC. Therefore, they should be considered as favorable candidates for TCC blockers. The new homology models were validated with some commonly recognized TCC blockers that are well characterized. Additionally, examples of the TCC blockers created were also evaluated using these models. Keywords: T-type calcium channel blocker; homology modeling; computer-aid drug design; virtual drug screening; L-type calcium channel 1. Introduction Citation: Lu, Y.; Li, M. A New As the only type of voltage-gated Ca2+ channels that are activated at or near resting Computer Model for Evaluating the membrane potentials, T-type Ca2+ channels (TCCs) play an important role in regulat- Selective Binding Affinity of 2+ ing [Ca ]i homeostasis in a variety of tissues, including pancreatic β-cells and tumor Phenylalkylamines to T-Type Ca2+ cells [1–4]. Therefore, TCC antagonists could be potentially useful for the treatment of Channels.
    [Show full text]
  • Homology Modeling Hanka Venselaar, Elmar Krieger, & Gert Vriend
    [Book Title], Edited by [Editor’s Name]. ISBN 0-471-XXXXX-X Copyright © 2000 Wiley[Imprint], Inc. Chapter 25 Homology Modeling Hanka Venselaar, Elmar Krieger, & Gert Vriend INTRODUCTION The goal of protein modeling is to predict a structure from its sequence with an accuracy that is comparable to the best results achieved experimentally. This would allow users to safely use in silico generated protein models in scientific fields where today only experimental structures provide a solid basis: structure-based drug design, analysis of protein function, interactions, antigenic behavior, or rational design of proteins with increased stability or novel functions. Protein modeling is the only way to obtain structural information when experimental techniques fail. Many proteins are simply too large for NMR analysis and cannot be crystallized for X-ray diffraction. Among the three major approaches to 3D structure prediction described in this and the following two chapters, homology modeling is the "easiest" approach based on two major observations: • The structure of a protein is uniquely determined by its amino acid sequence (Epstain et al. 1963),and therefore the sequence should, in theory, contain suffice information to obtain the structure. • During evolution, structural changes are observed to be modified at a much slower rate than sequences. Similar sequences have been found to adopt practically identical structures while distantly related sequences can still fold into similar structures. This relationship was first identified by Chothia & Lesk (Chothia et al. 1986) and later quantified by Sander & Schneider (Sander et al. 1991) as summarized in Figure 1. Since the initial establishment of this relationship, Rost et al.
    [Show full text]
  • Homology-Based Loop Modelling Yields More Complete Crystallographic Protein Structures
    bioRxiv preprint doi: https://doi.org/10.1101/329219; this version posted May 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Homology-based loop modelling yields more complete crystallographic protein structures Bart van Beusekom1, Krista Joosten1, Maarten L. Hekkelman1, Robbie P. Joosten1,* and Anastassis Perrakis1,* 1 Department of Biochemistry, The Netherlands Cancer Institute, Plesmanlaan 121, 1066CX, Amsterdam, The Netherlands Contact information: [email protected] and [email protected] 1 bioRxiv preprint doi: https://doi.org/10.1101/329219; this version posted May 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Abstract Inherent protein flexibility, poor or low-resolution diffraction data, or poor electron density maps, often inhibit building complete structural models during X-ray structure determination. However, advances in crystallographic refinement and model building nowadays often allow to complete previously missing parts. Here, we present algorithms that identify regions missing in a certain model but present in homologous structures in the Protein Data Bank (PDB), and “graft” these regions of interest. These new regions are refined and validated in a fully automated procedure. Including these developments in our PDB-REDO pipeline, allowed to build 24,962 missing loops in the PDB.
    [Show full text]
  • A Reinforcement Learning Approach to Protein Loop Modeling Kevin Molloy, Nicolas Buhours, Marc Vaisset, Thierry Simeon, Étienne Ferré, Juan Cortés
    A Reinforcement Learning Approach to Protein Loop Modeling Kevin Molloy, Nicolas Buhours, Marc Vaisset, Thierry Simeon, Étienne Ferré, Juan Cortés To cite this version: Kevin Molloy, Nicolas Buhours, Marc Vaisset, Thierry Simeon, Étienne Ferré, et al.. A Reinforcement Learning Approach to Protein Loop Modeling. IEEE/RSJ International Conference on Intelligent Robots and Systems, Sep 2015, Hamburg, Germany. hal-01206128 HAL Id: hal-01206128 https://hal.archives-ouvertes.fr/hal-01206128 Submitted on 28 Sep 2015 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. A Reinforcement Learning Approach to Protein Loop Modeling Kevin Molloy∗y, Nicolas Buhours ∗yz, Marc Vaisset∗y, Thierry Simeon´ ∗y, Etienne´ Ferre´ z, Juan Cortes´ ∗y ∗CNRS, LAAS, 7 avenue du colonel Roche, F-31400 Toulouse, France yUniv de Toulouse, LAAS, F-31400 Toulouse, France zSiemens Industry Software, 478 rue de la decouverte,´ 31670 Labege,` France Abstract—Modeling the loop regions of proteins is an active formation space, we propose discretizing the search space via area of research due to their significance in defining how a database of small, contiguous segments of experimentally the protein interacts with other molecular partners. The high observed loop configurations organized by the corresponding structural flexibility of loops presents formidable challenges for both experimental and computational approaches.
    [Show full text]
  • Homology-Based Loop Modeling Yields More Complete
    research papers Homology-based loop modeling yields more IUCrJ complete crystallographic protein structures ISSN 2052-2525 BIOLOGYjMEDICINE Bart van Beusekom, Krista Joosten, Maarten L. Hekkelman, Robbie P. Joosten* and Anastassis Perrakis* Department of Biochemistry, The Netherlands Cancer Institute, Plesmanlaan 121, Amsterdam 1066CX, The Netherlands. Received 18 May 2018 *Correspondence e-mail: [email protected], [email protected] Accepted 23 July 2018 Inherent protein flexibility, poor or low-resolution diffraction data or poorly defined electron-density maps often inhibit the building of complete structural Edited by J. L. Smith, University of Michigan, models during X-ray structure determination. However, recent advances in USA crystallographic refinement and model building often allow completion of previously missing parts. This paper presents algorithms that identify regions Keywords: loop building; structural re-building; PDB-REDO; model completion; missing in a certain model but present in homologous structures in the Protein crystallography. Data Bank (PDB), and ‘graft’ these regions of interest. These new regions are refined and validated in a fully automated procedure. Including these Supporting information: this article has developments in the PDB-REDO pipeline has enabled the building of 24 962 supporting information at www.iucrj.org missing loops in the PDB. The models and the automated procedures are publicly available through the PDB-REDO databank and webserver. More complete protein structure models enable a higher quality public archive but also a better understanding of protein function, better comparison between homologous structures and more complete data mining in structural bioinfor- matics projects. 1. Introduction Protein structure models give direct and detailed insights into biochemistry (Lamb et al., 2015) and are therefore highly relevant to many areas of biology and biotechnology (Terwilliger & Bricogne, 2014).
    [Show full text]
  • Loop Modeling in Proteins Using a Database Approach with Multi-Dimensional Scaling
    Loop Modeling in Proteins Using a Database Approach with Multi-Dimensional Scaling by Daniel Holtby A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Doctor of Philosophy in Computer Science Waterloo, Ontario, Canada, 2013 c Daniel Holtby 2013 I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public. ii Abstract Modeling loops is an often necessary step in protein structure and func- tion determination, even with experimental X-ray and NMR data. It is well known to be difficult. Database techniques have the advantage of producing a higher proportion of predictions with sub-angstrom accuracy when com- pared with ab initio techniques, but the disadvantage of often being able to produce usable results as they depend entirely on the loop already be- ing represented within the database. My contribution is the LoopWeaver protocol, a database method that uses multidimensional scaling to rapidly achieve better clash-free, low energy placement of loops obtained from a database of protein structures. This maintains the above-mentioned advan- tage while avoiding the disadvantage by permitting the use of lower quality matches that would not otherwise fit. Test results show that this method achieves significantly better results than all other methods, including Mod- eler, Loopy, SuperLooper, and Rapper before refinement. With refinement, the results (LoopWeaver and Loopy combined) are better than ROSETTA's, with 0.53A˚ RMSD on average for 206 loops of length 6, 0.75A˚ local RMSD for 168 loops of length 7, 0.93A˚ RMSD for 117 loops of length 8, and 1.13A˚ RMSD loops of length 9, while ROSETTA scores 0.66A,˚ 0.93A,˚ 1.23A,˚ 1.56A,˚ respectively, at the same average time limit (3 hours on a 2.2GHz Opteron).
    [Show full text]
  • Modeling of Loops in Proteins: a Multi-Method Approach Michal Jamroz, Andrzej Kolinski*
    Jamroz and Kolinski BMC Structural Biology 2010, 10:5 http://www.biomedcentral.com/1472-6807/10/5 RESEARCH ARTICLE Open Access Modeling of loops in proteins: a multi-method approach Michal Jamroz, Andrzej Kolinski* Abstract Background: Template-target sequence alignment and loop modeling are key components of protein comparative modeling. Short loops can be predicted with high accuracy using structural fragments from other, not necessairly homologous proteins, or by various minimization methods. For longer loops multiscale approaches employing coarse-grained de novo modeling techniques should be more effective. Results: For a representative set of protein structures of various structural classes test predictions of loop regions have been performed using MODELLER, ROSETTA, and a CABS coarse-grained de novo modeling tool. Loops of various length, from 4 to 25 residues, were modeled assuming an ideal target-template alignment of the remaining portions of the protein. It has been shown that classical modeling with MODELLER is usually better for short loops, while coarse-grained de novo modeling is more effective for longer loops. Even very long missing fragments in protein structures could be effectively modeled. Resolution of such models is usually on the level 2-6 Å, which could be sufficient for guiding protein engineering. Further improvement of modeling accuracy could be achieved by the combination of different methods. In particular, we used 10 top ranked models from sets of 500 models generated by MODELLER as multiple templates for CABS modeling. On average, the resulting molecular models were better than the models from individual methods. Conclusions: Accuracy of protein modeling, as demonstrated for the problem of loop modeling, could be improved by the combinations of different modeling techniques.
    [Show full text]