An Overview of Tools for the Validation of Protein NMR Structures

J Biomol NMR (2014) 58:259–285 DOI 10.1007/s10858-013-9750-x ARTICLE An overview of tools for the validation of protein NMR structures Geerten W. Vuister • Rasmus H. Fogh • Pieter M. S. Hendrickx • Jurgen F. Doreleijers • Aleksandras Gutmanas Received: 27 March 2013 / Accepted: 4 June 2013 / Published online: 23 July 2013 Ó Springer Science+Business Media Dordrecht 2013 Abstract Biomolecular structures at atomic resolution examples from the PDB: a small, globular monomeric present a valuable resource for the understanding of biol- protein (Staphylococcal nuclease from S. aureus, PDB ogy. NMR spectroscopy accounts for 11 % of all structures entry 2kq3) and a small, symmetric homodimeric protein (a in the PDB repository. In response to serious problems with region of human myosin-X, PDB entry 2lw9). the accuracy of some of the NMR-derived structures and in order to facilitate proper analysis of the experimental Keywords NMR Á Structure Á Validation Á Restraints Á models, a number of program suites are available. We Quality Á Program Á Review Á Chemical shifts discuss nine of these tools in this review: PROCHECK- NMR, PSVS, GLM-RMSD, CING, Molprobity, Vivaldi, ResProx, NMR constraints analyzer and QMEAN. We Introduction evaluate these programs for their ability to assess the structural quality, restraints and their violations, chemical Biomolecular structures at atomic resolution are crucial for shifts, peaks and the handling of multi-model NMR interpreting cellular processes in a molecular context. In ensembles. We document both the input required by the addition, they serve important roles in drug discovery and programs and output they generate. To discuss their rela- functional industrial design, such as the modification of tive merits we have applied the tools to two representative enzyme properties. For all these applications, it is imperative that the biomolecular atomic structures are accurate, Electronic supplementary material The online version of this precise and truthfully reflect the experimental data on article (doi:10.1007/s10858-013-9750-x) contains supplementary which they were based. material, which is available to authorized users. The Protein Data Bank (PDB) (Berman 2008; Bernstein et al. 1977) is the primary repository of the atomic coor- G. W. Vuister (&) Á R. H. Fogh Department of Biochemistry, School of Biological Sciences, dinates of three-dimensional (3D) biomolecular structures. University of Leicester, Henry Wellcome Building, Lancaster It currently contains more than 91,000 entries, which cover Road, Leicester LE1 9HN, UK proteins, oligonucleotides and their complexes, including e-mail: [email protected] small-molecule ligands. The entries solved by Nuclear R. H. Fogh Magnetic Resonance (NMR) represent approximately Department of Biochemistry, University of Cambridge, 11 % of the total ([10,000 entries). The PDB archive is 80 Tennis Court Road, Cambridge CB2 1GA, UK jointly managed by four partner organisations (RCSB PDB (Berman et al. 2000), PDBe (Velankar et al. 2012), PDBj P. M. S. Hendrickx Á A. Gutmanas Protein Data Bank in Europe, European Molecular Biology (Kinjo et al. 2012) and BMRB (Ulrich et al. 2008) under Laboratory, European Bioinformatics Institute, Wellcome Trust the aegis of the wwPDB (Berman et al. 2007) consortium. Genome Campus, Hinxton, Cambridge CB10 1SD, UK A series of erroneously modelled NMR structures (Clore et al. 1995; Lambert et al. 2004; Nabuurs et al. 2006; J. F. Doreleijers Radboud University Medical Centre, NCMLS, CMBI, Geert Spadaccini et al. 2006) and cases of outright scientific Grooteplein Zuid 26-28, 6525 GA Nijmegen, The Netherlands fraud (Borrell 2009) with X-ray derived structures 123 260 J Biomol NMR (2014) 58:259–285 underscore the need for dedicated tools to assess the plot regions reported by PROCHECK-NMR were com- structural quality of biomolecular structures as well as the monly regarded as an assurance of a good quality structure, agreement with the experimental data. Moreover, the 3D but recent assessments have shown this to be invalid structure and dynamic properties of the biomolecules can (Doreleijers et al. 2012a, b). Tools designed for X-ray change in response to interactions with other molecules and crystallography, such as WHAT IF (Vriend 1990; Hooft hence it is also imperative to carefully assess the accuracy et al. 1996) or Molprobity (Davis et al. 2007; Chen et al. of the structures. 2010), can also be used for NMR-derived structures. Structure validation typically encompasses two broad However, NMR-specific properties such as the presence of aspects: the agreement of the experimental data with the multiple models in one structural ensemble and the resulting structure and a geometric validation. In order to potential dynamical aspects represented in this ensemble, calculate the agreement with the experimental data, a often present problems that are not accommodated by these theoretical description relating the data to the atomic programs. In particular, most of the routines also fail to coordinates is required. These relations are typically also adequately address the validation of ‘ensembles of used during the structure calculation procedure to drive the ensembles’, where the computational protocols simulta- convergence and hence the assessment only conveys the neously aim to treat both the structural model and the degree to which the structure was calculated properly. If, available dynamical data (Montalvao et al. 2012; Lindorff- however, the data are internally inconsistent, this will Larsen et al. 2005). Structure validation software dedicated typically result in statistically poor or unusual distributions to NMR-derived structures, such as the PSVS suite of related structural parameters (vide infra). More inde- (Bhattacharya et al. 2007) or CING (Doreleijers et al. pendent measures are based upon cross-validation methods 2012a), typically provide solutions for the issues inherently (Brunger et al. 1993; Clore and Schwieters 2006; Nabuurs associated with X-ray oriented tools. et al. 2005; Tjandra et al. 2007) that exclude a fraction of Compared to X-ray crystallography, validation of NMR- the data in the structure calculation procedures. derived structures is in general more complicated. Not only Geometric structure validations aim to assess the quality do the tools need to take into account the aforementioned in relation to the chemical and structural knowledge dynamical effects and the multiple conformers in the derived from relevant reference structures. Local structural structural ensemble, but also the nature of the NMR data, parameters such as bond lengths, bond angles and torsion which differs vastly from the X-ray situation. Whereas the angles are obtained from X-ray crystallography data of latter only concerns the reflections, which are uniform in small molecules and ultra-high resolution biomolecular data content, NMR-derived structures can be based on a structures (Engh and Huber 1991, 2001), whereas dihedral large variety of experimental data (Vuister et al. 2011). angle distributions are based upon a set of high-resolution These can be both local in nature, such as distance and X-ray structures. Clearly, there is an inherent danger that dihedral restraints, global in nature, such as residual dipolar structures are evaluated with respect to an incomplete or couplings (RDCs) and pseudo-contact shifts (PCS), or biased reference, but it is nowadays generally appreciated describe the overall shape, such as the small angle scat- that uncommon features flagged by a geometric assessment tering (SAS) data. should be supported by solid experimental data (Vriend A typical NMR structure calculation protocol involves a 1990; Chen et al. 2010; Bhattacharya et al. 2007; Dore- customised simulated annealing procedure, typically in leijers et al. 2012a; Nabuurs et al. 2006; Hooft et al. 1996). torsion-angle space, where restraints are included as Traditional and still-popular biomolecular NMR struc- pseudo-harmonic potentials (Stein et al. 1997;Gu¨ntert ture validation routines have relied on a limited set of tools 1998). Many additions to the basic protocol have been and metrics. It has been customary to summarise restraint proposed, such as the use of database potentials (Ku- content using a simple count of the number of restraints, szewski and Clore 2000), radius of gyration (Schwieters 15 whereas it has long been known that these numbers are and Clore 2008), N-T1/T2-relaxation parameters (Tjandra flawed for multiple reasons (Nabuurs et al. 2003). A recent et al. 1997), SAXS-derived potentials (Gabel et al. 2008)or large-scale analysis also showed great redundancy in the ensembles consistent with S2 order parameters (Best and reported number of distance restraints (Doreleijers et al. Vendruscolo 2004). Refinement in explicit water using a 2009). It even proved possible to refine NMR-derived more extended force field was shown to significantly structures using random 15N-RDC values to acceptable improve the structural quality. (Linge et al. 2003; Spronk Q-factors (Bax and Grishaev 2005). PROCHECK-NMR et al. 2002) Inferential structure determination (Rieping (Laskowski et al. 1996) has been the accepted choice for et al. 2005), while computationally expensive, was shown the assessment of the geometrical quality of NMR to significantly improve the treatment of dynamical effects ensembles, in spite of it long being out-dated. High per- and provide for a more unbiased parametrisation of the centages of residues in the most favoured Ramachandran underlying

An Overview of Tools for the Validation of Protein NMR Structures

A Web Server for Rapid NMR-Based Protein Structure Determination Berjanskii, M.; Tang, P.; Liang, J.; Cruz, J

Quantitative Analysis of Biomolecular NMR Spectra: a Prerequisite for the Determination of the Structure and Dynamics of Biomolecules

HASH: a Program to Accurately Predict Protein H Shifts from Neighboring Backbone Shifts

Automatic 13C Chemical Shift Reference Correction for Unassigned Protein NMR Spectra

Noor E Hafsa

A Modest Improvement in Empirical NMR Chemical Shift Prediction by Means of an Artiﬁcial Neural Network

Bioinformatics Methods for NMR Chemical Shift Data

David Wishart University of Alberta [email protected] NOE-Based NMR

Predicting Chemical Shifts with Graph Neural Networks APREPRINT

Using Chemical Shift Perturbation to Characterise Ligand Binding ⇑ Mike P

Arxiv:1305.2164V2 [Physics.Chem-Ph] 24 Nov 2013

HASH: a Program to Accurately Predict Protein H Shifts from Neighboring Backbone Shifts