A Critical Note on Symmetry Contact Artifacts and the Evaluation of the Quality of Homology Models

Total Page:16

File Type:pdf, Size:1020Kb

A Critical Note on Symmetry Contact Artifacts and the Evaluation of the Quality of Homology Models S S symmetry Article A Critical Note on Symmetry Contact Artifacts and the Evaluation of the Quality of Homology Models Dipali Singh 1,†, Karen R. M. Berntsen 2, Coos Baakman 2, Gert Vriend 2,* and Tapobrata Lahiri 1 1 Bioinformatics Division, Indian Institute of Information Technology, Allahabad 211012, India; [email protected] (D.S.); [email protected] (T.L.) 2 CMBI, Radboud University Nijmegen Medical Centre, 6525 GA 26-28 Nijmegen, The Netherlands; [email protected] (K.R.M.B.); [email protected] (C.B.) * Correspondence: [email protected] † Present address: Institute for Computer Science and Department of Biology, Heinrich Heine University, 40225 Düsseldorf, Germany. Received: 20 November 2017; Accepted: 2 January 2018; Published: 11 January 2018 Abstract: It is much easier to determine a protein’s sequence than to determine its three dimensional structure and consequently homology modeling will be an essential aspect of most studies that require 3D protein structure data. Homology modeling templates tend to be PDB files. About 88% of all protein structures in the PDB have been determined with X-ray crystallography, and thus are based on crystals that by necessity hold non-natural packing contacts in accordance with the crystal symmetry. Active site residues, residues involved in intermolecular interactions, residues that get post-translationally modified, or other sites of interest, normally are located at the protein surface so that it is particularly important to correctly model surface-located residues. Unfortunately, surface residues are just those that suffer most from crystal packing artifacts. Our study of the influence of crystal packing artifacts on the quality of homology models reveals that this influence is much larger than generally assumed, and that the evaluation of the quality of homology models should properly account for these artifacts. Keywords: homology modeling; symmetry contact; side chain rotamericity 1. Introduction Knowledge of the three dimensional structure of proteins is a prerequisite for rational drug design, for many forms of protein engineering, or for explaining the molecular phenotype associated with the disease phenotype caused by a mutation in the human genome. It is considerably easier to determine the sequence of a protein than it is to determine its structure, and consequently homology modeling will be an essential aspect of most studies that require protein structure data. The homology modeling community is continuously working on improving all aspects of the process, and the bi-annual CASP ‘competition’ [1–9] is a good benchmark for where the field stands. The root mean square deviation of the atomic positions in a homology model from the equivalent atoms in the corresponding real structure after they have been optimally superposed is an important aspect of the evaluation of the quality of homology modeling procedures [10–12]. Selecting correct rotamers [5–8] is another aspect. Other measures have also been proposed [13–19]. The expected frequency of occurrence of a molecule or residue in a certain conformation is exponentially related to the energy calculated for that conformation. This is a direct result of application of the Boltzmann law [20]. A frequency plot will thus have much sharper peaks than an energy plot. Similar reasoning suggests that amino acids will prefer to have their χ1 torsion angle rather close to +60◦, 180◦, and −60◦. Many articles have been written on this topic e.g., [21–26] so that the valine χ1 frequency distribution plot, shown as an example in Figure1, does not come as a surprise. Symmetry 2018, 10, 25; doi:10.3390/sym10010025 www.mdpi.com/journal/symmetry Symmetry 2018, 10, 25 2 of 13 Symmetry 2018, 10, 25 2 of 13 The relations between the backbone structure and the preferred side chain rotamer have long been The relations between the backbone structure and the preferred side chain rotamer have long knownbeen known [27], and [27], a series and a of series rotamer of rotamer libraries libraries have been have designed been designed over the over years; the years; most oftenmost often to improve to homologyimprove modelinghomology [modeling28–38]. Rotamers [28–38]. Rotamers are also ofare great also importanceof great importance when analyzing when analyzing the quality the of experimentallyquality of experimentally determined determined protein structures protein structures and homology and homology models. mode Thels. rotameric The rotameric state state of side chainsof side has chains often has been often used been as aused quality as a measurequality measure of homology of homology models models [5–8,39 [5–8,39].]. Most Most studies studies suggest thatsuggest rotamer that distributions rotamer distributions are a function are a offunction secondary of secondary structure, structure, to a lesser to extend a lesser of extend the accessibility, of the andaccessibility, barely of the and absence barely or of presence the absence of symmetry or presence contacts. of symmetry However, ifcontacts. two distributions However, areif two similar, theydistributions do not need are to similar, reflect they the samedo not underlying need to reflect data. the same underlying data. FigureFigure 1. 1.Frequency Frequency distribution of of valine valine in ina st arand strand as function as function of the of torsion the torsion angle χ angle1. Theχ red1. Theline red linerepresents represents actual actual frequencies frequencies observed observed in a subset in a of subset PDB files of PDB solved files by solvedcrystallography by crystallography at 1.8 Å resolution or better (with an R-factor of 0.18 or better) that was culled at 90% sequence identity. 26,028 at 1.8 Å resolution or better (with an R-factor of 0.18 or better) that was culled at 90% sequence identity. valines contributed to the red line. The black line represents the negated exponential of the energy as 26,028 valines contributed to the red line. The black line represents the negated exponential of the calculated with Gamess-US [40] using the 6-31g** basis set and the DFT b3lip energy model. Vertical energy as calculated with Gamess-US [40] using the 6-31g** basis set and the DFT b3lip energy model. scales are arbitrary. The small differences in the locations of the maxima of the g+ and g− peaks are Vertical scales are arbitrary. The small differences in the locations of the maxima of the g+ and g− peaks caused by the fact that the experimentally observed valines are found in-between other residues so are caused by the fact that the experimentally observed valines are found in-between other residues so that also 1-5 repulsive forces act on the Cγ atoms while the predicted distribution is calculated for an that also 1-5 repulsive forces act on the Cγ atoms while the predicted distribution is calculated for an isolated valine in vacuum. isolated valine in vacuum. Figure 2 shows, as an example, the rotamer distributions for a few amino acid types in an α- helix,Figure a β-strand,2 shows, or as a loop an example, region. In the most rotamer cases distributionsthe g− conformation for a few is preferred. amino acid The types three in plots an α for-helix, a βisoleucine-strand, orare a shown loop region.because Inthey most are representative cases the g− forconformation the whole set is of preferred. 51 plots (Gly, The Ala, three and plots Pro for isoleucinedo not have are a shown meaningful because χ1 angle). they are Phenylalanine representative in a for β-strand the whole and sethistidine of 51 plotsin an (Gly,α-helix Ala, are and shown Pro do notbecause have a their meaningful curves differχ1 angle). most from Phenylalanine all other pl inots. a βThe-strand general and absence histidine of significant in an α-helix differences are shown becausebetween their the curvessubsets differof the data most as from a function all other of eith plots.er solvent The general accessibility absence or the of presence/absence significant differences of symmetry contacts seem to suggests that this factor is not important. between the subsets of the data as a function of either solvent accessibility or the presence/absence of 88% of the protein structures in the PDB have been elucidated with X-ray crystallography, 11% symmetry contacts seem to suggests that this factor is not important. with NMR, and about 1% with other techniques. Structures solved by X-ray tend to be more accurate. 88% of the protein structures in the PDB have been elucidated with X-ray crystallography, NMR, on the other hand, often gives a better impression of any mobility present in the structure, and 11%it does with not NMR, suffer, and from about artifacts 1% withcaused other by crystal techniques. packing Structures contacts. solved by X-ray tend to be more accurate.Table NMR, 1 shows on thethat otheron average hand, about often one-fifth gives a of better the amino impression acids at of the any surface mobility of a presentprotein are in the structure,involved and in crystal it does packing. not suffer, Table from 1 artifactsalso shows caused that not by crystalall residue packing types contacts. are equally likely to be involvedTable1 in shows crystal that packing on average contacts, about but that one-fifth study ofis beyond the amino the scope acids of at this the surfacearticle. of a protein are involved in crystal packing. Table1 also shows that not all residue types are equally likely to be involved in crystal packing contacts, but that study is beyond the scope of this article. Symmetry 2018, 10, 25 3 of 13 Symmetry 2018, 10, 25 3 of 13 FigureFigure 2. χ 12. frequency χ1 frequency plots plots for for residues residues with or or without without symmetry symmetry contacts contacts (S+ or (S+ S−, orrespectively) S−, respectively) and withand variouswith various solvent solvent accessibilities.
Recommended publications
  • Supporting Information
    Supporting information for ProBiS H2O MD Approach for Identification of Conserved Water Sites in Protein Structures for Drug Design‡ Marko Jukič a,b , Janez Konc a,c, Dušanka Janežič b,* and Urban Bren a,b,* a University of Maribor, Faculty of Chemistry and Chemical Engineering, Laboratory of Physical Chemistry and Chemical Thermodynamics, Smetanova ulica 17, SI-2000 Maribor, Slovenia. b University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies, Glagoljaška 8, SI-6000 Koper, Slovenia. c National Institute of Chemistry, Hajdrihova 19, SI-1000, Ljubljana, Slovenia. KEYWORDS: ProBiS, conserved water molecules, molecular dynamics, water clustering, in silico drug design, water site identification. ‡This paper is dedicated to the memory of Professor Maurizio Botta, an excellent researcher and an exceptional colleague with whom we had long-standing fruitful collaboration. WATER ANALYSIS SOFTWARE REVIEW provides energetic evaluation from the distribution of Approaches can be roughly divided into four groups as is interaction energies between water and macromolecular elaborated below, namely calculations with explicit water environments. In order to alleviate sampling problems in models (molecular dynamics - MD, RISM), calculations with explicit solvent MD calculations, an alternative bordering on implicit water models (probe-based), experimental methods the implicit solvent representations was reported by 5 (ProBiS H2O) and descriptor-based methods. Sindhikara et al. Sindhikara et al. also published an analysis
    [Show full text]
  • PDF Hosted at the Radboud Repository of the Radboud University Nijmegen
    PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is a publisher's version. For additional information about this publication click this link. http://hdl.handle.net/2066/185107 Please be advised that this information was generated on 2021-10-01 and may be subject to change. 3400–3403 Nucleic Acids Research, 2003, Vol. 31, No. 13 DOI: 10.1093/nar/gkg505 NRSAS: Nuclear Receptor Structure Analysis Servers Emmanuel Bettler, Roland Krause1, Florence Horn2 and Gert Vriend* CMBI KUN, PO Box 9010, 6500 GL Nijmegen, The Netherlands, 1Washington University, Center for Computational Mechanics, Saint Louis, MO, USA and 2UCSF, Genentech Hall, 600 16th Street, San Francisco CA 94143-2240, USA ReceivedFebruary 15, 2003; AcceptedMarch 4, 2003 ABSTRACT Table 1. Server classes and descriptions We present a coherent series of servers that can Server class Server description perform a large number of structure analyses on nuclear hormone receptors. These servers are part Administration List residue List sequence of the NucleaRDB project, which provides a power- List coordinates ful information system for nuclear hormone recep- Renumber a file tors. The computations performed by the servers Structure validation Check packing quality include homology modelling, structure validation, Check Ramachandran plot calculating contacts, accessibility values, hydrogen Check crystal packing clashes Check bond angles (or distances, planarities, etc.) bonding patterns, predicting mutations and a host of two- and three-dimensional visualisations. The Protein analysis Accessibility, secondary structure and crystal contacts Protein ligand contacts Nuclear Receptor Structure Analysis Servers (NRSAS) are freely accessible at http://www.cmbi. 2D graphics B-factor plot Ramachandran plot kun.nl/NR/servers/html/ and in-house copies can be obtained upon request.
    [Show full text]
  • Modulation of the Activity and Regioselectivity of a Glycosidase: Development of a Convenient Tool for the Synthesis of Specific Disaccharides
    molecules Article Modulation of the Activity and Regioselectivity of a Glycosidase: Development of a Convenient Tool for the Synthesis of Specific Disaccharides Yari Cabezas-Pérusse 1, Franck Daligault 2, Vincent Ferrières 1 , Olivier Tasseau 1 and Sylvain Tranchimand 1,* 1 Ecole Nationale Supérieure de Chimie de Rennes, CNRS, ISCR (Institut des Sciences Chimiques de Rennes)—UMR 6226, Université de Rennes, F-35000 Rennes, France; [email protected] (Y.C.-P.); [email protected] (V.F.); [email protected] (O.T.) 2 CNRS, UFIP (Unité de Fonctionnalité et Ingénierie des Protéines)—UMR 6286, Université de Nantes, F-44000 Nantes, France; [email protected] * Correspondence: [email protected] Abstract: The synthesis of disaccharides, particularly those containing hexofuranoside rings, requires a large number of steps by classical chemical means. The use of glycosidases can be an alternative to limit the number of steps, as they catalyze the formation of controlled glycosidic bonds starting from simple and easy to access building blocks; the main drawbacks are the yields, due to the balance between the hydrolysis and transglycosylation of these enzymes, and the enzyme-dependent regiose- lectivity. To improve the yield of the synthesis of β-D-galactofuranosyl-(1!X)-D-mannopyranosides catalyzed by an arabinofuranosidase, in this study we developed a strategy to mutate, then screen the catalyst, followed by a tailored molecular modeling methodology to rationalize the effects of the Citation: Cabezas-Pérusse, Y.; identified mutations. Two mutants with a 2.3 to 3.8-fold increase in transglycosylation yield were Daligault, F.; Ferrières, V.; Tasseau, O.; obtained, and in addition their accumulated regioisomer kinetic profiles were very different from Tranchimand, S.
    [Show full text]
  • Structural Characterization of the D-Tyr-Trnatyr Deacylase from Bacillus Lichenformis, an Organism of Great Industrial Importance
    Computational Molecular Bioscience, 2011, 1, 1-6 doi:10.4236/cmb.2011.11001 Published Online December 2011 (http://www.SciRP.org/journal/cmb) Structural Characterization of the D-Tyr-tRNATyr Deacylase from Bacillus lichenformis, an Organism of Great Industrial Importance Angshuman Bagchi Department of Biochemistry and Biophysics, University of Kalyani, Kalyani, India E-mail: [email protected] Received November 5, 2011; revised December 2, 2011; accepted December 12, 2011 Abstract A new class of enzyme was established that hydrolyze the ester bond between D-Tyr bound onto its cognate t-RNA. The enzyme is called D-Tyr-tRNA deacylase. The three dimensional structure of the D-Tyr-tRNA deacylase from industrially important microorganism Bacillus lichenformis DSM13 was predicted by com- parative modeling approach. Since the protein acts as a dimer a dimeric model of the enzyme was con- structed. The interactions responsible for dimerization were also predicted. With the help of docking and molecular dynamics simulations the favourable binding mode of the enzyme was predicted. The probable biochemical mechanism of the hydrolysis process was elucidated. This study provides a rational framework to interpret the molecular mechanistic details of the removal of toxic D-Tyr-tRNA from the cells of industri- ally important microorganism Bacillus lichenformis DSM13 using the enzyme D-Tyr-tRNA deacylase. Keywords: Molecular Modeling, Molecular Docking, Molecular Dynamics, Industrially Important Microorganism, Ester Hydrolysis, D-Tyr-tRNA Deacylase 1. Introduction The complete genome sequence of Blich has recently been annotated [4]. It has been observed previously that several organisms The three dimensional structure of the protein has been (for example E.
    [Show full text]
  • Developing Bioinformatics Computer Skills
    Developing Bioinformatics Computer Skills Cynthia Gibas Per Jambeck Publisher: O'Reilly First Edition April 2001 ISBN: 1-56592-664-1, 446 pages Developing Bioinformatics Computer Skills Copyright © 2001 O'Reilly & Associates, Inc. All rights reserved. Printed in the United States of America. Published by O'Reilly & Associates, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O'Reilly & Associates books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safari.oreilly.com). For more information contact our corporate/institutional sales department: 800-998-9938 or [email protected]. The O'Reilly logo is a registered trademark of O'Reilly & Associates, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly & Associates, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps. The association between the image of a Caenorhabditis elegans and the topic of bioinformatics is a trademark of O'Reilly & Associates, Inc. While every precaution has been taken in the preparation of this book, the publisher assumes no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein. 2 Preface__________________________________________________________________________________ 6 Audience for This Book _________________________________________________________________
    [Show full text]
  • From Existing Data to Novel Hypotheses
    From existing data to novel hypotheses Design and application of structure-based Molecular Class Specific Information Systems Remko Kuipers Thesis committee Thesis supervisors Prof. dr. ir. V.A.P. Martins dos Santos Professor of Systems and Synthetic Biology Wageningen University Prof. dr. G. Vriend Professor of Bioinformatics of Macromolecular Structures CMBI, Radboud University Nijmegen, Medical Centre Thesis co-supervisor Dr. P.J. Schaap Assistant professor, Laboratory of Systems and Synthetic Biology Wageningen University Other members Prof. dr. S.C. de Vries, Wageningen University Prof. dr. T.R.J.M. Desmet, Ghent University, Belgium Dr. S.A.F.T. van Hijum, Radboud University Nijmegen Dr. R. de Jong, DSM Biotechnology Center, Delft This research was conducted under the auspices of the Graduate School VLAG (Advanced studies in Food Technology, Agrobiotechnology, Nutrition and Health Sciences). From existing data to novel hypotheses Design and application of structure-based Molecular Class Specific Information Systems Remko Kuipers Thesis submitted in fulfilment of the requirements for the decree of doctor at Wageningen University by the authority of the Rector Magnificus Prof. dr. M.J. Kropff, in the presence of the Thesis Committee appointed by the Academic Board to be defended in public on Wednesday December 12th, 2012 at 11 a.m. in the Aula. Remko K.P. Kuipers From existing data to novel hypotheses. Design and application of structure- based Molecular Class Specific Information Systems. 232 pages Thesis, Wageningen University, Wageningen, The Netherlands (2012) With references, with summaries in Dutch and English ISBN: 978-94-6173-350-4 Table of Contents Chapter 1 General Introduction 7 Chapter 2 Technical Background 49 Chapter 3 3DM: systematic analysis of heterogeneous super- 95 family data to discover protein functionalities Chapter 4 Protein structure analysis of mutations causing 119 inheritable diseases.
    [Show full text]
  • A Series of PDB-Related Databanks for Everyday Needs Wouter G
    D364–D368 Nucleic Acids Research, 2015, Vol. 43, Database issue Published online 28 October 2014 doi: 10.1093/nar/gku1028 A series of PDB-related databanks for everyday needs Wouter G. Touw1, Coos Baakman1, Jon Black1, Tim A. H. te Beek2, E. Krieger1, Robbie P. Joosten1,3 and Gert Vriend1,* 1Centre for Molecular and Biomolecular Informatics, CMBI, Radboud university medical center, Geert Grooteplein Zuid 26–28 6525 GA Nijmegen, The Netherlands, 2Bio-Prodict BV, Nieuwe Marktstraat 54E, 6511 AA Nijmegen, The Netherlands and 3Department of Biochemistry, Netherlands Cancer Institute, Plesmanlaan 121, Amsterdam 1066 CX, The Netherlands Received September 16, 2014; Revised October 08, 2014; Accepted October 09, 2014 ABSTRACT compromise between inclusiveness and human readability by the PDBx/mmCIF file format (3,6), in practice the old We present a series of databanks (http://swift.cmbi. format is still used by most software applications. The PDB ru.nl/gv/facilities/) that hold information that is com- format is the source of many problems, some of which are putationally derived from Protein Data Bank (PDB) addressed by our databanks. entries and that might augment macromolecular All databanks are available from http://swift.cmbi.ru.nl/ structure studies. These derived databanks run par- gv/facilities/. This site also provides extensive documenta- allel to the PDB, i.e. they have one entry per PDB en- tion that includes help for downloading individual files or try. Several of the well-established databanks such whole databanks. as HSSP, PDBREPORT and PDB REDO have been updated and/or improved. The software that creates the DSSP databank, for example, has been rewrit- UPDATE ON EXISTING DATABANKS ten to better cope with ␲-helices.
    [Show full text]
  • Phd-Vollan-DUO.Pdf
    In silico analyzes of porins involved in niche adaptation: Exploring the role of Helicobacter pylori outer membrane phospholipase A in acid tolerance Dissertation for the degree of Philosophiae Doctor (Ph.D.) Hilde Synnøve Vollan Division for Infection Control and Environmental Health, Norwegian Institute of Public Health and Department of Clinical Molecular Biology (EpiGen), Division of Medicine, Akershus University Hospital and University of Oslo 2017 © Hilde Synnøve Vollan, 2018 Series of dissertations submitted to the Faculty of Medicine, University of Oslo ISBN 978-82-8377-183-1 All rights reserved. No part of this publication may be reproduced or transmitted, in any form or by any means, without permission. Cover: Hanne Baadsgaard Utigard. Print production: Reprosentralen, University of Oslo. Acknowledgements I would first like to thank my principal supervisor prof. Geir Bukholm for providing motivation and encouragement throughout the project. Scientific hypotheses have been generated with great enthusiasm, based on his experience in the field of microbiology, medicine and global health. Throughout my battle with health issues, the tasks and workload have always been adjusted flexibly and insightful to fit my situation. I have learned a great deal of bioinformatics through the courses at Centre for Molecular and Biomolecular Informatics (CMBI), Radboud University Nijmegen Medical Centre (Radboudumc), The Netherlands. I would especially thank my co-supervisor prof. Gert Vriend for taking the time to patiently teach me protein structure analyzes. The thesis is based on laboratory work by my co-supervisor Tone Tannæs who also has helped and supported me throughout these years. I would also like to thank my co-supervisor prof.
    [Show full text]