Quality Assessment of Protein NMR Structures

Available online at www.sciencedirect.com ScienceDirect § Quality assessment of protein NMR structures 1 2 3,4 Antonio Rosato , Roberto Tejero and Gaetano T Montelione Biomolecular NMR structures are now routinely used in macromolecules. In addition to its unique role in charac- biology, chemistry, and bioinformatics. Methods and metrics terizing biomolecular dynamics, NMR is routinely used for for assessing the accuracy and precision of protein NMR structure determinations of small (<20 kDa) proteins structures are beginning to be standardized across the [1 ,2,3 ] and is beginning to be used more routinely biological NMR community. These include both knowledge- for determining structures of larger (20–50 kDa) soluble based assessment metrics, parameterized from the database and membrane proteins (e.g. Refs. [4 ,5 ,6,7]). NMR- of protein structures, and model versus data assessment derived structure models can be used interchangeably with metrics. On line servers are available that provide models generated by X-ray crystallography in many bio- comprehensive protein structure quality assessment reports, logical applications. It is therefore natural that many qual- and efforts are in progress by the world-wide Protein Data Bank ity assessment metrics are common between the two (wwPDB) to develop a biomolecular NMR structure quality techniques. In addition, there is a portfolio of metrics that assessment pipeline as part of the structure deposition are specific to NMR, which take into account the distinc- process. These quality assessment metrics and standards will tive features of the NMR data used in the structure aid NMR spectroscopists in determining more accurate determination process. structures, and increase the value and utility of these structures for the broad scientific community. In this review, we outline some of the metrics in common Addresses use for protein NMR structure quality assessment. A large 1 Magnetic Resonance Center and Department of Chemistry, University part of our review reflects recently published recommen- of Florence, 50019 Sesto Fiorentino, Italy dations of the world-wide Protein Data Base (wwPDB) 2 ´ ´ Departamento de Quimica Fisica, Universidad de Valencia, Avenida Dr. task forces on validation of biomolecular structures deter- Moliner 50, 46100 Burjassot, Valencia, Spain 3 mined by X-ray crystallography [8 ] and NMR methods Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, and Northeast Structural Genomic [9 ]. Consortium, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA Knowledge-based measures 4 Department of Biochemistry and Molecular Biology, Robert Wood Knowledge-based (KB) metrics describe how well the Johnson Medical School, Rutgers, The State University of New Jersey, structure model conforms to expectations with respect to Piscataway, NJ 08854, USA selected features that can be assessed by comparison with Corresponding author: Montelione, Gaetano T ([email protected]) the extensive database of experimental structures. These include bond length and bond angle distributions, dihedral angle distributions, atomic packing, hydrogen bond Current Opinion in Structural Biology 2013, 23:715–724 geometries, and other geometric features. Ideal values or This review comes from a themed issue on Biophysical methods value distributions are derived from statistical analyses of Edited by Wah Chiu and Gerhard Wagner high-resolution X-ray structures, and are generally con- For a complete overview see the Issue and the Editorial sistent with basic principles of biophysical chemistry. Available online 21st September 2013 There has been some debate regarding the use of KB 0959-440X/$ – see front matter, # 2013 The Authors. Elsevier Ltd. All information derived from X-ray crystal structures of bio- rights reserved. molecules in assessing solution NMR structures. There http://dx.doi.org/10.1016/j.sbi.2013.08.005 are often differences in the sample conditions used in determining NMR and X-ray structures, and particular conformations from the distribution present in solution Introduction may be selected by the requirements of the crystal lattice. Nuclear Magnetic Resonance (NMR) spectroscopy, Nevertheless, there is no cogent reason to adopt different along with X-ray diffraction and cryo-electron micro- KB parameter distributions for assessing solution or solid- scopy, is one of the three major experimental techniques state protein NMR structures with respect to those used providing three-dimensional (3D) structures of biological to assess structures determined by X-ray crystallography. Problems indicated by KB assessments can be mapped § onto the 3D structure to identify local hot spots of This is an open-access article distributed under the terms of the structural inaccuracy [10 ,11 ]. Creative Commons Attribution-NonCommercial-No Derivative Works License, which permits non-commercial use, distribution, and repro- duction in any medium, provided the original author and source are The most general protein model assessment tools look at credited. residue pair-distribution functions (e.g. PROSA2 [12]) or www.sciencedirect.com Current Opinion in Structural Biology 2013, 23:715–724 716 Biophysical methods distributions of hydrophobic and hydrophilic residues and fragment libraries, can significantly improve the (e.g. Verify3D [13]) which are characteristic of native accuracy of NMR structures (see e.g. [1,3,20 ,21]), protein structures. These analyses are important first particularly for larger proteins determined with sparse steps in structure validation. However, their value is restraint networks [4 ,5 ]. Approaches based on molecular primarily in identifying severely incorrect folds [14], dynamics can also often provide appreciable improve- which rarely result from NMR structure determinations ments in structure quality [22]. done with high restraint densities. These scores may, however, be important when assessing structures deter- Model versus data measures mined from sparse restraint networks. Non-globular Model versus data (MvD) metrics describe how well the protein folds (e.g. coil–coil structures) may exhibit poor NMR structure model matches experimental data. MvD PROSA2 or Verify3D scores even when the models are quality assessments include data that have been used in accurate. the structure generation process and, where feasible, cross-validation using data that have not been used in Dihedral angle distributions are the most prominent KB structure generation calculations. statistic used in assessing protein NMR structures. They are generally reported as Z-scores relative to distributions NMR restraint analysis observed in high-resolution crystal structures, or across all The most general form of MvD validation involves com- structures that have been deposited in the PDB. Dihedral parison of distances and dihedral angles in models with angle distributions are generally reported separately for the corresponding experimental restraints. Table 1 pro- amino-acid residue backbone and side chains. Backbone vides a standard format used to report such restraint f, c distributions are generally assessed based on com- violations [9 ,20 ], although other formats are also com- pliance with the Ramachandran plot. Historically, the mon in the NMR literature. These metrics provide an program ProCheck [15] has been used for this analysis, overview of global (or average) restraint violation stat- but recent work using a larger set of X-ray structures istics, as well as information on the most significant determined at high resolution suggests that more accurate outliers. Clusters of restraint violations in regions of assessments can be made using improved backbone f, c the 3D structure may indicate errors in the local struc- distribution statistics [8 ,16]. The assessment of side ture. Methods have been described to convert between chain dihedral angle distributions (also referred to as restraint formats [20 ,23,24], allowing initial restraint rotamer normality) can be more subtle. Protein side lists generated for one structure generation program to chains have been observed to largely adopt standard be used with alternative structure generation programs. rotamer states (gÀ, t, g+) even when buried in the cores The NOE completeness score [20 ,25] is a useful metric of protein crystal structures. While NMR data can in of structural accuracy, assessing the fraction of short principle determine accurate side-chain conformations, distances in the model structure that are consistent with surface side chains are often dynamically averaged com- restraint data set. plicating the interpretation of the corresponding NMR data. NOESY data Restraint analysis has the significant shortcoming that Several tools have been used to assess core atom packing, the restraints are themselves interpretations of NOESY including the Molprobity [11 ] program for assessment and other NMR data. Accordingly, NMR structure qual- of overpacking, and both the Molprobity and Rosetta- ity assessment should also include some metrics validat- Holes [17] programs for assessment of underpacking. ing models against uninterpreted spectral data. In the Severe atomic overlaps are rarely an issue in NMR case of NOESY data, several methods have been devel- structures, unless there are errors in the restraints, oped for back-calculating NOESY spectral data (e.g. because of the use of lower-distance bounds. High-energy Refs. [26–28]), although to date none of these has come

Quality Assessment of Protein NMR Structures

Quantitative Analysis of Biomolecular NMR Spectra: a Prerequisite for the Determination of the Structure and Dynamics of Biomolecules

HASH: a Program to Accurately Predict Protein H Shifts from Neighboring Backbone Shifts

Automatic 13C Chemical Shift Reference Correction for Unassigned Protein NMR Spectra

A Modest Improvement in Empirical NMR Chemical Shift Prediction by Means of an Artiﬁcial Neural Network

Bioinformatics Methods for NMR Chemical Shift Data

David Wishart University of Alberta [email protected] NOE-Based NMR

Predicting Chemical Shifts with Graph Neural Networks APREPRINT

Using Chemical Shift Perturbation to Characterise Ligand Binding ⇑ Mike P

An Overview of Tools for the Validation of Protein NMR Structures

Arxiv:1305.2164V2 [Physics.Chem-Ph] 24 Nov 2013

HASH: a Program to Accurately Predict Protein H Shifts from Neighboring Backbone Shifts

Quantum-Mechanics-Derived C Chemical Shift Server (Cheshift) for Protein Structure Validation