Bioinformatics Methods for NMR Chemical Shift Data

Total Page:16

File Type:pdf, Size:1020Kb

Bioinformatics Methods for NMR Chemical Shift Data Bioinformatics Methods for NMR Chemical Shift Data Dissertation an der Fakult¨at fur¨ Mathematik, Informatik und Statistik der Ludwig-Maximilians-Universit¨at Munchen¨ Simon W. Ginzinger vorgelegt am 24.10.2007 Erster Gutachter: Prof. Dr. Volker Heun, Ludwig-Maximilians-Universit¨at Munchen¨ Zweiter Gutachter: Prof. Dr. Robert Konrat, Universit¨at Wien Rigorosum: 8. Februar 2008 Kurzfassung Die nukleare Magnetresonanz-Spektroskopie (NMR) ist eine der wichtigsten Meth- oden, um die drei-dimensionale Struktur von Biomolekulen¨ zu bestimmen. Trotz großer Fortschritte in der Methodik der NMR ist die Aufl¨osung einer Protein- struktur immer noch eine komplizierte und zeitraubende Aufgabe. Das Ziel dieser Doktorarbeit ist es, Bioinformatik-Methoden zu entwickeln, die den Prozess der Strukturaufkl¨arung durch NMR erheblich beschleunigen k¨onnen. Zu diesem Zweck konzentriert sich diese Arbeit auf bestimmte Messdaten aus der NMR, die so genannten chemischen Verschiebungen. Chemische Verschiebungen werden standardm¨aßig zu Beginn einer Struktur- aufl¨osung bestimmt. Wie alle Labordaten k¨onnen chemische Verschiebungen Fehler enthalten, die die Analyse erschweren, wenn nicht sogar unm¨oglich machen. Als erstes Resultat dieser Arbeit wird darum CheckShift pr¨asentiert, eine Meth- ode, die es erm¨oglich einen weit verbreiteten Fehler in chemischen Verschiebungs- daten automatisch zu korrigieren. Das Hauptziel dieser Doktorarbeit ist es jedoch, strukturelle Informationen aus chemischen Verschiebungen zu extrahieren. Als erster Schritt in diese Richtung wurde SimShift entwickelt. SimShift erm¨oglicht es zum ersten Mal, strukturelle Ahnlichkeiten¨ zwischen Proteinen basierend auf chemischen Verschiebungen zu identifizieren. Der Vergleich zu Methoden, die nur auf der Aminos¨aurensequenz basieren, zeigt die Uberlegenheit¨ des verschiebungsbasierten Ansatzes. Als eine naturliche¨ Erweiterung des paarweisen Vergleichs von Proteinen wird darauf fol- gend SimShiftDB vorgestellt. Gegeben ein Protein, durchsucht SimShiftDB eine Datenbank bekannter Proteinstrukturen nach strukturell homologen Eintr¨agen. Die Suche basiert hierbei nur auf der Aminos¨auresequenz und den chemischen Verschiebungen des Proteins. Die detektierten Ahnlichkeiten¨ werden zus¨atzlich nach statistischer Signifikanz bewertet. Mit der Chemical Shift Pipeline wird schließlich das Hauptresultat der Dis- sertation vorgestellt. Durch die Kombination der automatischen Fehlerkorrektur (CheckShift) mit dem Datenbank-Suchalgorithmus (SimShiftDB), wird in 70% bis 80% der vorhergesagten strukturellen Ahnlichkeiten¨ eine sehr hohe Qualit¨at erreicht. Der Anteil der fehlerhaften Vorhersagen betr¨agt nur etwa 10%. iii Summary Nuclear magnetic resonance spectroscopy (NMR) is one of the most important methods for measuring the three-dimensional structure of biomolecules. Despite major progress in the NMR methodology, the solution of a protein structure is still a tedious and time-consuming task. The goal of this thesis is to develop bioinformatics methods which may strongly accelerate the NMR process. This work concentrates on a special type of measurements, the so-called chemical shifts. Chemical shifts are routinely measured at the beginning of a structure resolu- tion process. As all data from the laboratory, chemical shifts may be error-prone, which might complicate or even circumvent the use of this data. Therefore, as the first result of the thesis, we present CheckShift, a method which automatically corrects a frequent error in NMR chemical shift data. However, the main goal of this thesis is the extraction of structural information hidden in chemical shifts. SimShift was developed as a first step in this direc- tion. SimShift is the first approach to identify structural similarities between proteins based on chemical shifts. Compared to methods based on the amino acid sequence alone, SimShift shows its strength in detecting distant structural relationships. As a natural further development of the pairwise comparison of proteins, the SimShift algorithm is adapted for database searching. Given a pro- tein, the improved algorithm, named SimShiftDB, searches a database of solved proteins for structurally homologue entries. The search is based only on the amino acid sequence and the associated chemical shifts. The detected similarities are additionally ranked based on calculations of statistical significance. Finally, the Chemical Shift Pipeline, the main result of this work, is presented. By combining automatic chemical shift error correction (CheckShift) and the database search algorithm (SimShiftDB), it is possible to achieve very high quality in 70% to 80% of the similarities identified. Thereby, only about 10% of the predictions are in error. v Contents 1 Introduction 1 1.1 Overview ............................... 1 1.2 Synopsis................................ 2 2 Preliminaries 5 2.1 Introduction.............................. 5 2.2 NMRBasics.............................. 5 2.3 SHIFTX................................ 9 2.3.1 RingCurrentEffects . .. .. .. .. 9 2.3.2 ElectricFieldEffects . 10 2.3.3 HydrogenBondEffects. 11 2.3.4 Empirical Chemical Shift Hypersurfaces . 12 2.4 TALOS ................................ 12 2.5 PSIPRED ............................... 15 2.6 Chemical Shift Index (CSI) . 15 2.7 Structural Identification (STRIDE) . 16 2.8 HHsearch ............................... 16 2.9 Secondary Structure Element Alignment (SSEA) . .. 17 2.10 CombinatorialExtension(CE). 17 2.10.1 DistanceScores . 18 2.11MaxSub ................................ 19 2.12Databases ............................... 19 3 CheckShift 21 3.1 Introduction.............................. 21 3.2 TheCheckShiftAlgorithm . 22 3.2.1 Preparation of Reference Density Functions . 22 3.2.2 CalculationofSimilarity . 23 3.2.3 Re-ReferencingofDataSets . 26 3.3 Results................................. 26 3.4 Discussion............................... 29 3.5 Availability .............................. 29 vii viii Contents 4 SimShift 31 4.1 Introduction.............................. 31 4.2 SelectionoftheBenchmarkSet . 33 4.2.1 DatabasesUsed........................ 34 4.2.2 Evaluating the Structural Correctness of Alignments ... 34 4.2.3 Defining a Benchmark Set . 35 4.3 The Shift-Alignment Algorithm . 36 4.3.1 Phase 1: Calculation of the Shift-Difference Matrix . .. 36 4.3.2 Phase2:FindGoodBlocks . 36 4.3.3 Phase3: ConcatenationofBlocks . 37 4.3.4 ParameterOptimization . 39 4.4 Results................................. 40 4.4.1 Comparison to SSEA and HHsearch . 40 4.4.2 ComparisontoTALOS. 45 4.5 Discussion............................... 45 5 SimShiftDB 47 5.1 Introduction.............................. 47 5.2 TheTemplateDatabase . 48 5.3 SubstitutionMatricesforShiftData . 48 5.4 E-Values for Chemical Shift Alignments . 49 5.5 The Shift Alignment Algorithm . 51 5.5.1 Step 1: Calculate local alignments . 51 5.5.2 Step 2: Identify the best legal combination . 52 5.6 Results................................. 55 5.6.1 Evaluation of the Modeling Performance . 55 5.6.2 Evaluation of the P-Value Correctness . 56 5.7 Discussion............................... 56 5.8 Availability .............................. 59 6 The Chemical Shift Pipeline 61 6.1 Introduction.............................. 61 6.2 Coping with Missing Chemical Shift Data . 61 6.3 Chemical Shift Substitution Matrices . 62 6.4 TheBenchmarkSet.......................... 65 6.5 Results................................. 65 6.6 Discussion............................... 69 6.7 Availability .............................. 69 7 Outlook 71 A SHIFTX Supplementary Material 81 Contents ix B Empirical vs. Theoretical P-Values for Set S1 85 C Empirical vs. Theoretical P-Values for Set S2 89 D The Benchmark Set 93 E Chemical Shift Pipeline Results 99 1 Introduction 1.1 Overview The main goal of Bioinformatics is to assist people working on biological prob- lems with the solution of tasks that require a thorough understanding of computer science. One of the most prominent methods, which has nowadays become a stan- dard tool for people working in molecular biology, is sequence comparison. With the size of the publicly available databases increasing, it became unfeasible to compare a newly derived sequence to a database of already solved and possibly functionally annotated sequences by hand. Therefore, tools were developed to search databases of millions of sequences automatically. To date, sequence com- parison has a strong research background and many problems arising in this area have been solved. Is the time for computational biology now over? Of course not, because there is more than just the protein sequence one might like to compare and investigate. What is strongly important in understanding the function of a protein is the protein’s three-dimensional structure. It has been shown that there exist proteins with very low sequence similarity sharing a strongly similar three-dimensional structure (see for example [Pastore and Lesk, 1990]). This underlines the need for structural data as often sequence comparison alone is not sufficient. To date, it is not possible to calculate the protein structure reliably from the protein sequence alone. NMR spectroscopy is one of the most important methods for resolving a protein structure in the laboratory. However, solving a protein as a whole is a complicated task, requiring a series of complex experiments. The work presented in this dissertation focuses on data which may be acquired fairly easily in a standard NMR experiment. This data are the so-called chemical shifts,
Recommended publications
  • An Effective Computational Method Incorporating Multiple Secondary
    Old Dominion University ODU Digital Commons Computer Science Faculty Publications Computer Science 2017 An Effective Computational Method Incorporating Multiple Secondary Structure Predictions in Topology Determination for Cryo-EM Images Abhishek Biswas Old Dominion University Desh Ranjan Old Dominion University Mohammad Zubair Old Dominion University Stephanie Zeil Old Dominion University Kamal Al Nasr See next page for additional authors Follow this and additional works at: https://digitalcommons.odu.edu/computerscience_fac_pubs Part of the Biochemistry Commons, Computer Sciences Commons, Molecular Biology Commons, and the Statistics and Probability Commons Repository Citation Biswas, Abhishek; Ranjan, Desh; Zubair, Mohammad; Zeil, Stephanie; Al Nasr, Kamal; and He, Jing, "An Effective Computational Method Incorporating Multiple Secondary Structure Predictions in Topology Determination for Cryo-EM Images" (2017). Computer Science Faculty Publications. 81. https://digitalcommons.odu.edu/computerscience_fac_pubs/81 Original Publication Citation Biswas, A., Ranjan, D., Zubair, M., Zeil, S., Al Nasr, K., & He, J. (2017). An effective computational method incorporating multiple secondary structure predictions in topology determination for cryo-em images. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 14(3), 578-586. doi:10.1109/tcbb.2016.2543721 This Article is brought to you for free and open access by the Computer Science at ODU Digital Commons. It has been accepted for inclusion in Computer Science Faculty Publications by an authorized administrator of ODU Digital Commons. For more information, please contact [email protected]. Authors Abhishek Biswas, Desh Ranjan, Mohammad Zubair, Stephanie Zeil, Kamal Al Nasr, and Jing He This article is available at ODU Digital Commons: https://digitalcommons.odu.edu/computerscience_fac_pubs/81 HHS Public Access Author manuscript Author ManuscriptAuthor Manuscript Author IEEE/ACM Manuscript Author Trans Comput Manuscript Author Biol Bioinform.
    [Show full text]
  • Chapter 13 Protein Structure Learning Objectives
    Chapter 13 Protein structure Learning objectives Upon completing this material you should be able to: ■ understand the principles of protein primary, secondary, tertiary, and quaternary structure; ■use the NCBI tool CN3D to view a protein structure; ■use the NCBI tool VAST to align two structures; ■explain the role of PDB including its purpose, contents, and tools; ■explain the role of structure annotation databases such as SCOP and CATH; and ■describe approaches to modeling the three-dimensional structure of proteins. Outline Overview of protein structure Principles of protein structure Protein Data Bank Protein structure prediction Intrinsically disordered proteins Protein structure and disease Overview: protein structure The three-dimensional structure of a protein determines its capacity to function. Christian Anfinsen and others denatured ribonuclease, observed rapid refolding, and demonstrated that the primary amino acid sequence determines its three-dimensional structure. We can study protein structure to understand problems such as the consequence of disease-causing mutations; the properties of ligand-binding sites; and the functions of homologs. Outline Overview of protein structure Principles of protein structure Protein Data Bank Protein structure prediction Intrinsically disordered proteins Protein structure and disease Protein primary and secondary structure Results from three secondary structure programs are shown, with their consensus. h: alpha helix; c: random coil; e: extended strand Protein tertiary and quaternary structure Quarternary structure: the four subunits of hemoglobin are shown (with an α 2β2 composition and one beta globin chain high- lighted) as well as four noncovalently attached heme groups. The peptide bond; phi and psi angles The peptide bond; phi and psi angles in DeepView Protein secondary structure Protein secondary structure is determined by the amino acid side chains.
    [Show full text]
  • Increasing the Accuracy of Single Sequence Prediction Methods Using a Deep Semi-Supervised Learning Framework Lewis Moffat 1,∗ and David T
    Structural Bioinformatics Downloaded from https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btab491/6313164 by guest on 07 July 2021 Increasing the Accuracy of Single Sequence Prediction Methods Using a Deep Semi-Supervised Learning Framework Lewis Moffat 1,∗ and David T. Jones 1,∗ 1Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK; and Francis Crick Institute, 1 Midland Road, London NW1 1AT, UK ∗To whom correspondence should be addressed. Associate Editor: XXXXXXX Received on XXXXX; revised on XXXXX; accepted on XXXXX Abstract Motivation: Over the past 50 years, our ability to model protein sequences with evolutionary information has progressed in leaps and bounds. However, even with the latest deep learning methods, the modelling of a critically important class of proteins, single orphan sequences, remains unsolved. Results: By taking a bioinformatics approach to semi-supervised machine learning, we develop Profile Augmentation of Single Sequences (PASS), a simple but powerful framework for building accurate single- sequence methods. To demonstrate the effectiveness of PASS we apply it to the mature field of secondary structure prediction. In doing so we develop S4PRED, the successor to the open-source PSIPRED-Single method, which achieves an unprecedented Q3 score of 75.3% on the standard CB513 test. PASS provides a blueprint for the development of a new generation of predictive methods, advancing our ability to model individual protein sequences. Availability: The S4PRED model is available as open source software on the PSIPRED GitHub repository (https://github.com/psipred/s4pred), along with documentation. It will also be provided as a part of the PSIPRED web service (http://bioinf.cs.ucl.ac.uk/psipred/) Contact: [email protected], [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
    [Show full text]
  • Bi-Allelic Novel Variants in CLIC5 Identified in a Cameroonian
    G C A T T A C G G C A T genes Article Bi-Allelic Novel Variants in CLIC5 Identified in a Cameroonian Multiplex Family with Non-Syndromic Hearing Impairment Edmond Wonkam-Tingang 1, Isabelle Schrauwen 2 , Kevin K. Esoh 1 , Thashi Bharadwaj 2, Liz M. Nouel-Saied 2 , Anushree Acharya 2, Abdul Nasir 3 , Samuel M. Adadey 1,4 , Shaheen Mowla 5 , Suzanne M. Leal 2 and Ambroise Wonkam 1,* 1 Division of Human Genetics, Faculty of Health Sciences, University of Cape Town, Cape Town 7925, South Africa; [email protected] (E.W.-T.); [email protected] (K.K.E.); [email protected] (S.M.A.) 2 Center for Statistical Genetics, Sergievsky Center, Taub Institute for Alzheimer’s Disease and the Aging Brain, and the Department of Neurology, Columbia University Medical Centre, New York, NY 10032, USA; [email protected] (I.S.); [email protected] (T.B.); [email protected] (L.M.N.-S.); [email protected] (A.A.); [email protected] (S.M.L.) 3 Synthetic Protein Engineering Lab (SPEL), Department of Molecular Science and Technology, Ajou University, Suwon 443-749, Korea; [email protected] 4 West African Centre for Cell Biology of Infectious Pathogens (WACCBIP), University of Ghana, Accra LG 54, Ghana 5 Division of Haematology, Department of Pathology, Faculty of Health Sciences, University of Cape Town, Cape Town 7925, South Africa; [email protected] * Correspondence: [email protected]; Tel.: +27-21-4066-307 Received: 28 September 2020; Accepted: 20 October 2020; Published: 23 October 2020 Abstract: DNA samples from five members of a multiplex non-consanguineous Cameroonian family, segregating prelingual and progressive autosomal recessive non-syndromic sensorineural hearing impairment, underwent whole exome sequencing.
    [Show full text]
  • Accurate and Automated Classification of Protein Secondary Structure with Psicsi
    Accurate and automated classification of protein secondary structure with PsiCSI LING-HONG HUNG AND RAM SAMUDRALA Computational Genomics, Department of Microbiology, University of Washington, Seattle, Washington 98109, USA (RECEIVED July 2, 2002; FINAL REVISION October 18, 2002; ACCEPTED October 31, 2002) Abstract PsiCSI is a highly accurate and automated method of assigning secondary structure from NMR data, which is a useful intermediate step in the determination of tertiary structures. The method combines information from chemical shifts and protein sequence using three layers of neural networks. Training and testing was performed on a suite of 92 proteins (9437 residues) with known secondary and tertiary structure. Using a stringent cross-validation procedure in which the target and homologous proteins were removed from the databases used for training the neural networks, an average 89% Q3 accuracy (per residue) was observed. This is an increase of 6.2% and 5.5% (representing 36% and 33% fewer errors) over methods that use chemical shifts (CSI) or sequence information (Psipred) alone. In addition, PsiCSI improves upon the and is able to use sequence (87.4% ס translation of chemical shift information to secondary structure (Q3 without 13C shifts and 86.9% ס information as an effective substitute for sparse NMR data (Q3 with only H␣ shifts available). Finally, errors made by PsiCSI almost exclusively involve the 86.8% ס Q3 interchange of helix or strand with coil and not helix with strand (<2.5 occurrences per 10000 residues). The automation, increased accuracy, absence of gross errors, and robustness with regards to sparse data make PsiCSI ideal for high-throughput applications, and should improve the effectiveness of hybrid NMR/de novo structure determination methods.
    [Show full text]
  • Further Confirmation of the Association of SLC12A2 with Non-Syndromic
    www.nature.com/jhg ARTICLE OPEN Further confirmation of the association of SLC12A2 with non-syndromic autosomal-dominant hearing impairment Samuel M. Adadey1,2,9, Isabelle Schrauwen3,9, Elvis Twumasi Aboagye1, Thashi Bharadwaj3, Kevin K. Esoh 2, Sulman Basit 4, 3 3 5 2 6 1 Anushree Acharya , Liz M. Nouel-Saied ,✉ Khurram Liaqat , Edmond✉ Wonkam-Tingang , Shaheen Mowla , Gordon A. Awandare , Wasim Ahmad 7, Suzanne M. Leal 3,8 and Ambroise Wonkam 2 © The Author(s) 2021 Congenital hearing impairment (HI) is genetically heterogeneous making its genetic diagnosis challenging. Investigation of novel HI genes and variants will enhance our understanding of the molecular mechanisms and to aid genetic diagnosis. We performed exome sequencing and analysis using DNA samples from affected members of two large families from Ghana and Pakistan, segregating autosomal-dominant (AD) non-syndromic HI (NSHI). Using in silico approaches, we modeled and evaluated the effect of the likely pathogenic variants on protein structure and function. We identified two likely pathogenic variants in SLC12A2, c.2935G>A:p.(E979K) and c.2939A>T:p.(E980V), which segregate with NSHI in a Ghanaian and Pakistani family, respectively. SLC12A2 encodes an ion transporter crucial in the homeostasis of the inner ear endolymph and has recently been reported to be implicated in syndromic and non-syndromic HI. Both variants were mapped to alternatively spliced exon 21 of the SLC12A2 gene. Exon 21 encodes for 17 residues in the cytoplasmatic tail of SLC12A2, is highly conserved between species, and preferentially expressed in cochlear tissues. A review of previous studies and our current data showed that out of ten families with either AD non-syndromic or syndromic HI, eight (80%) had variants within the 17 amino acid residue region of exon 21 (48 bp), suggesting that this alternate domain is critical to the transporter activity in the inner ear.
    [Show full text]
  • A Web Server for Rapid NMR-Based Protein Structure Determination Berjanskii, M.; Tang, P.; Liang, J.; Cruz, J
    NRC Publications Archive Archives des publications du CNRC GeNMR: a web server for rapid NMR-based protein structure determination Berjanskii, M.; Tang, P.; Liang, J.; Cruz, J. A.; Zhou, J.; Zhou, Y.; Bassett, E.; Macdonell, C.; Lu, P.; Lin, G.; Wishart, D. S. This publication could be one of several versions: author’s original, accepted manuscript or the publisher’s version. / La version de cette publication peut être l’une des suivantes : la version prépublication de l’auteur, la version acceptée du manuscrit ou la version de l’éditeur. For the publisher’s version, please access the DOI link below./ Pour consulter la version de l’éditeur, utilisez le lien DOI ci-dessous. Publisher’s version / Version de l'éditeur: https://doi.org/10.1093/nar/gkp280 Nucleic Acids Research, 37, Suppl. 2, pp. W670-W677, 2009-07-01 NRC Publications Record / Notice d'Archives des publications de CNRC: https://nrc-publications.canada.ca/eng/view/object/?id=8c24f92f-5a63-4bc9-9510-868e720b942e https://publications-cnrc.canada.ca/fra/voir/objet/?id=8c24f92f-5a63-4bc9-9510-868e720b942e Access and use of this website and the material on it are subject to the Terms and Conditions set forth at https://nrc-publications.canada.ca/eng/copyright READ THESE TERMS AND CONDITIONS CAREFULLY BEFORE USING THIS WEBSITE. L’accès à ce site Web et l’utilisation de son contenu sont assujettis aux conditions présentées dans le site https://publications-cnrc.canada.ca/fra/droits LISEZ CES CONDITIONS ATTENTIVEMENT AVANT D’UTILISER CE SITE WEB. Questions? Contact the NRC Publications Archive team at [email protected].
    [Show full text]
  • Bayesian Model of Protein Primary Sequence for Secondary Structure Prediction
    Bayesian Model of Protein Primary Sequence for Secondary Structure Prediction Qiwei Li1, David B. Dahl2*, Marina Vannucci1, Hyun Joo3, Jerry W. Tsai3 1 Department of Statistics, Rice University, Houston, Texas, United States of America, 2 Department of Statistics, Brigham Young University, Provo, Utah, United States of America, 3 Department of Chemistry, University of the Pacific, Stockton, California, United States of America Abstract Determining the primary structure (i.e., amino acid sequence) of a protein has become cheaper, faster, and more accurate. Higher order protein structure provides insight into a protein’s function in the cell. Understanding a protein’s secondary structure is a first step towards this goal. Therefore, a number of computational prediction methods have been developed to predict secondary structure from just the primary amino acid sequence. The most successful methods use machine learning approaches that are quite accurate, but do not directly incorporate structural information. As a step towards improving secondary structure reduction given the primary structure, we propose a Bayesian model based on the knob- socket model of protein packing in secondary structure. The method considers the packing influence of residues on the secondary structure determination, including those packed close in space but distant in sequence. By performing an assessment of our method on 2 test sets we show how incorporation of multiple sequence alignment data, similarly to PSIPRED, provides balance and improves the accuracy of the predictions. Software implementing the methods is provided as a web application and a stand-alone implementation. Citation: Li Q, Dahl DB, Vannucci M, Joo H, Tsai JW (2014) Bayesian Model of Protein Primary Sequence for Secondary Structure Prediction.
    [Show full text]
  • Quantitative Analysis of Biomolecular NMR Spectra: a Prerequisite for the Determination of the Structure and Dynamics of Biomolecules
    Current Organic Chemistry, 2006, 10, 00-00 1 Quantitative Analysis of Biomolecular NMR Spectra: A Prerequisite for the Determination of the Structure and Dynamics of Biomolecules Thérèse E. Malliavin* Laboratoire de Biochimie Théorique, CNRS UPR 9080, Institut de Biologie PhysicoChimique, 13 rue P. et M. Curie, 75 005 Paris, France. Abstract: Nuclear Magnetic Resonance (NMR) became during the two last decades an important method for biomolecular structure determination. NMR permits to study biomolecules in solution and gives access to the molecular flexibility at atomic level on a complete structure: in that respect, it is occupying a unique place in structural biology. During the first years of its development, NMR was trying to meet the requirements previously defined in X-ray crystallography. But, NMR then started to determine its own criteria for the definition of a structure. Indeed, the atomic coordinates of an NMR structure are calculated using restraints on geometrical parameters (angles and distances) of the structure, which are only indirectly related to atom positions: in that respect, NMR and X-ray crystallography are very different. The indirect relation between the NMR measurements and the molecular structure and dynamics makes critical the precision and the interpretation of the NMR parameters and the development of quantitative analysis methods. The methods published since 1997 for liquid-NMR of proteins are reviewed here. First, methods for structure determination are presented, as well as methods for spectral assignment and for structure quality assessment. Second, the quantitative analysis of structure mobility is reviewed. 1. INTRODUCTION parameters is fuzzy, because each NMR parameter is Nuclear Magnetic Resonance (NMR) became at the depending not only on the structure, but also on the internal beginning of 90's, a method of choice for structural biology dynamics of the molecule, and these two aspects are difficult in solution.
    [Show full text]
  • Chemical Shift Secondary Structure Population Inference
    bioRxiv preprint doi: https://doi.org/10.1101/2021.02.20.432095; this version posted February 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 1 CheSPI: Chemical shift Secondary structure Population Inference Jakob Toudahl Nielsen1* and Frans A.A. Mulder1* 1Interdisciplinary Nanoscience Center (iNANO) and Department of Chemistry, Aarhus University, Gustav Wieds VeJ 14, 8000 Aarhus C, Denmark *Correspondence to: [email protected], [email protected] ORCID: Jtn http://orcid.org/0000-0002-9554-4662 faam http://orcid.org/0000-0001-9427-8406 Keywords: NMR, protein, order, disorder, chemical shifts Abstract NMR chemical shifts (CSs) are delicate reporters of local protein structure, and recent advances in random coil CS (RCCS) prediction and interpretation now offer the compelling prospect of inferring small populations of structure from small deviations from RCCSs. Here, we present CheSPI, a simple and efficient method that provides unbiased and sensitive aggregate measures of local structure and disorder. It is demonstrated that CheSPI can predict even very small amounts of residual structure and robustly delineate subtle differences into four structural classes for intrinsically disordered proteins. For structured regions and proteins, CheSPI can assign up to eight structural classes, which coincide with the well-known DSSP classification. The program is freely available, and can either be invoked from URL www.protein-nmr.org as a web implementation, or run locally from command line as a python program.
    [Show full text]
  • The Structure and Dynamics of Bmr1 Protein from Brugia Malayi: in Silico Approaches
    Int. J. Mol. Sci. 2014, 15, 11082-11099; doi:10.3390/ijms150611082 OPEN ACCESS International Journal of Molecular Sciences ISSN 1422-0067 www.mdpi.com/journal/ijms Article The Structure and Dynamics of BmR1 Protein from Brugia malayi: In Silico Approaches Bee Yin Khor, Gee Jun Tye, Theam Soon Lim, Rahmah Noordin and Yee Siew Choong * Institute for Research in Molecular Medicine, Universiti Sains Malaysia, Minden, Penang 11800, Malaysia; E-Mails: [email protected] (B.Y.K.); [email protected] (G.J.T.); [email protected] (T.S.L.); [email protected] (R.N.) * Author to whom correspondence should be addressed; E-Mail: [email protected]; Tel.: +604-653-4837; Fax: +604-653-4803. Received: 28 January 2014; in revised form: 25 March 2014 / Accepted: 4 June 2014 / Published: 19 June 2014 Abstract: Brugia malayi is a filarial nematode, which causes lymphatic filariasis in humans. In 1995, the disease has been identified by the World Health Organization (WHO) as one of the second leading causes of permanent and long-term disability and thus it is targeted for elimination by year 2020. Therefore, accurate filariasis diagnosis is important for management and elimination programs. A recombinant antigen (BmR1) from the Bm17DIII gene product was used for antibody-based filariasis diagnosis in “Brugia Rapid”. However, the structure and dynamics of BmR1 protein is yet to be elucidated. Here we study the three dimensional structure and dynamics of BmR1 protein using comparative modeling, threading and ab initio protein structure prediction. The best predicted structure obtained via an ab initio method (Rosetta) was further refined and minimized.
    [Show full text]
  • HASH: a Program to Accurately Predict Protein H Shifts from Neighboring Backbone Shifts
    J Biomol NMR DOI 10.1007/s10858-012-9693-7 ARTICLE a HASH: a program to accurately predict protein H shifts from neighboring backbone shifts Jianyang Zeng • Pei Zhou • Bruce Randall Donald Received: 25 October 2012 / Accepted: 5 December 2012 Ó Springer Science+Business Media Dordrecht 2012 Abstract Chemical shifts provide not only peak identities atoms (i.e., adjacent atoms in the sequence), can remedy for analyzing nuclear magnetic resonance (NMR) data, but the above dilemmas, and hence advance NMR-based also an important source of conformational information for structural studies of proteins. By specifically exploiting the studying protein structures. Current structural studies dependencies on chemical shifts of nearby backbone requiring Ha chemical shifts suffer from the following atoms, we propose a novel machine learning algorithm, a a limitations. (1) For large proteins, the H chemical shifts called HASH, to predict H chemical shifts. HASH combines can be difficult to assign using conventional NMR triple- a new fragment-based chemical shift search approach with resonance experiments, mainly due to the fast transverse a non-parametric regression model, called the generalized relaxation rate of Ca that restricts the signal sensitivity. (2) additive model, to effectively solve the prediction problem. Previous chemical shift prediction approaches either We demonstrate that the chemical shifts of nearby back- require homologous models with high sequence similarity bone atoms provide a reliable source of information for or rely heavily on accurate backbone and side-chain predicting accurate Ha chemical shifts. Our testing results structural coordinates. When neither sequence homologues on different possible combinations of input data indicate nor structural coordinates are available, we must resort to that HASH has a wide rage of potential NMR applications in other information to predict Ha chemical shifts.
    [Show full text]