Tools for Validation of NMR-Structures
Geerten W. Vuister
Department of Biochemistry, University of Leicester
Protein Biophysics, IMM, Radboud University Nijmegen
http://proteins.dyndns.org/Validation
http://proteins.dyndns.org

Motivation
Analysis of ~400 NMR structures:
• NMR structures are generally not very good
• ~25% of recently deposited structures are seriously flawed
Structural quality can often be improved by:
• Proper computational procedures
• Validation of input data
• Validation of results
[Figure: plot with 10%, 25%, 50%, 75%, 90% and average lines]
Nabuurs et al. PLoS Comp. Biol. 2, e9, 2006
NMR Structure Validation EMBO course, Basel, July 2013
Structural quality
1Q7X human PDZ2-AS; 1OZI mouse PDZ2-AS

Statistics over first 10 deposited structures    hPDZ2    mPDZ2
Number of NOE restraints                         1648     1354
Number of torsion angle restraints               80       76
RMSD all backbone atoms (Å)                      0.24     2.56
RMSD all heavy atoms (Å)                         0.86     3.00
PROCHECK Most favoured                           59%      79%
PROCHECK Additionally allowed                    27%      16%
PROCHECK Generously allowed / Disallowed         14%      5%
WHAT IF Ramachandran plot Z-score                -6.7     -3.7
WHAT IF Packing quality Z-score                  -3.7     -1.2
WHAT IF Rotamer normality Z-score                -6.5     -1.8
WHAT IF Backbone normality Z-score               -8.6     -3.8
Backbone RDC R-factor                            69%      40%

Structural quality
[Figure: RMS Z-scores output of WHAT IF]
Maybe the wrong ensemble was deposited?
‣ Unfortunately it looks like this was not the case.
‣ The images used in the publication also contain the errors.
‣ Furthermore, the structural observations described in the paper are in agreement with the incorrect structure… [JMB, 2003, 334:143-155]

Structural quality
DR1885: apo, 1X7L (replaced by 2JQA); copper-bound, 1X9L
[Figure: RMS Z-scores output of WHAT IF]
Structural quality
DR1885
[Figure: restraints; RMS Z-scores output of WHAT IF]
[PNAS, 2005, 102 (11) 3994-3999]

Structural quality
How did such errors pass unnoticed?
Background reading
• Chris A.E.M. Spronk, Sander B. Nabuurs, Elmar Krieger, Gert Vriend, Geerten W. Vuister. Validation of protein structures derived by NMR spectroscopy. Progress in Nuclear Magnetic Resonance Spectroscopy 45 (2004) 315–337.
• Sander B. Nabuurs, Chris A.E.M. Spronk, Gert Vriend, Geerten W. Vuister. Concepts and Tools for NMR Restraint Analysis and Validation. Concepts Magn Reson Part A 22A: 90–105, 2004.
• G.W. Vuister, R.H. Fogh, P.M.S. Hendrickx, J.F. Doreleijers, A. Gutmanas. An overview of tools for the validation of protein NMR structures. Discusses nine tools: PROCHECK-NMR, PSVS, GLM-RMSD, CING, Molprobity, Vivaldi, ResProx, NMR constraints analyzer and QMEAN.
[Figure: first pages of the papers listed above]

wwPDB NMR-VTF
Task: to formulate generally accepted routines and procedures for the validation of NMR derived biomolecular structures
• Phase 1. Tasks to be implemented by PDB using largely existing software
• Phase 2. Tasks for which software / methods are available, but which need more assessment before defining standard validation conventions for PDB
• Phase 3. Tasks requiring further research over the ...
Recommendations of the wwPDB NMR Validation Task Force: Gaetano T. Montelione, Michael Nilges, Ad Bax, Peter Güntert, Torsten Herrmann, Jane S. Richardson, Charles Schwieters, Wim F. Vranken, Geerten W. Vuister, and David S. Wishart, ...
Structural quality
• Proper computation.
• Error detection.
• Mobility and structural variation.
• Assessment of structural quality.

Structural quality: proper computation
• Short annealing/restrained MD calculation using electrostatics and explicit solvent (Spronk et al. J. Biomol NMR 22, 281-9, 2002; Linge et al. Proteins 50, 496-506, 2003)
• Publicly available databases: DRESS (http://www.cmbi.ru.nl/dress) and RECOORD (http://www.ebi.ac.uk/msd-srv/docs/NMR/recoord/main.html).
• >100, 500 refined structures and validation reports.
• NMR_REDO

Structural quality: assessment of structural quality
• Data are ‘complex in nature’ (NOEs, J, RDC, SAXS, databases, ..).
• Differences in interpretation and parametrization.
• Differences in protocols.
• Multiple structures (dynamics).
• Tools for structure validation.
Structure determination process
[Figure: experimental data, structure, validated structure, reinterpretation of data; Spronk et al., Fig. 1]

Structure calculation and selection
[Figure: experimental data (restraints) -> calculation, repeat n-times -> selection -> ensemble; Spronk et al., Fig. 2]
Precision and accuracy
Precision is the variation of X around its mean ⟨X⟩.
Accuracy is the closeness of ⟨X⟩ to the true value.
Precision and accuracy are often mixed in the literature.
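
The distinction can be made concrete with a small numerical sketch. The function name, the sample values and the "true" value below are invented for illustration; the sample standard deviation stands in for precision and the offset of the mean from the true value stands in for accuracy.

```python
import numpy as np

def precision_and_accuracy(samples, true_value):
    """Return (precision, accuracy) for repeated estimates of one quantity.

    precision: sample standard deviation, i.e. the spread of X around <X>
    accuracy:  |<X> - true value|, i.e. how far the mean is from the truth
    """
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean()
    precision = samples.std(ddof=1)
    accuracy = abs(mean - true_value)
    return float(precision), float(accuracy)

# Hypothetical measurements of a quantity whose true value is 3.0
TRUE_VALUE = 3.0
tight_but_wrong = [5.1, 5.0, 4.9, 5.0]   # small spread, mean far from truth
loose_but_right = [2.0, 4.0, 3.5, 2.5]   # large spread, mean on the truth

tight_prec, tight_acc = precision_and_accuracy(tight_but_wrong, TRUE_VALUE)
loose_prec, loose_acc = precision_and_accuracy(loose_but_right, TRUE_VALUE)
```

The first data set is precise but inaccurate, the second is imprecise but accurate, which is why a low ensemble RMSD alone says nothing about correctness.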
Precision and accuracy
[Figure: Spronk et al., Fig. 5]

Accuracy of NMR structures
Accuracy can only be assessed when the true structure is known (“Gold Standard”)
• Only the case for simulated data-sets
Sometimes X-ray structures are used
• Different experimental conditions
• Crystal contacts
• In some cases X-ray structures fit NMR data better than NMR structures
Uncertainty in structure coordinates
X-ray crystallography: B-factor
• Quality of the crystal
• Dynamic behavior of the molecule
• Disorder
NMR: atomic Root Mean Square Deviation or RMSD
• Should reflect measured dynamics and the uncertainty in the experimental data
• Used as a measure for precision and accuracy!

Coordinate RMSDs
Calculation requires superposition of structures
• Region dependent
Structure selection criteria are subjective
Use Circular Variance, CV?
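
Why the RMSD depends on superposition can be seen in a minimal sketch. The slides do not name a superposition algorithm; the Kabsch method (optimal rotation from a singular value decomposition) is assumed here, and the function name and test coordinates are hypothetical.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Both sets are centred on their centroids, then P is rotated onto Q with
    the rotation that minimises the RMSD (Kabsch algorithm).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                          # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T)) # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = Pc @ R.T
    return float(np.sqrt(((P_rot - Qc) ** 2).sum() / len(P)))

def naive_rmsd(P, Q):
    """RMSD without any superposition, for comparison."""
    diff = np.asarray(P, dtype=float) - np.asarray(Q, dtype=float)
    return float(np.sqrt((diff ** 2).sum() / len(diff)))

# Hypothetical "model 2" = "model 1" rotated by 60 degrees and translated
rng = np.random.default_rng(0)
model_1 = rng.normal(size=(20, 3))
angle = np.pi / 3
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle),  np.cos(angle), 0.0],
                  [0.0,            0.0,           1.0]])
model_2 = model_1 @ rot_z.T + np.array([10.0, -5.0, 2.0])
```

Restricting `P` and `Q` to a subset of residues before the call changes the result, which is the "region dependent" caveat on the slide.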
Superposition
[Figure: Snyder and Montelione, Proteins 59, 673 (2008), Fig. 1. Superimpositions of structural bundles calculated from a single ordered region. A, B: structural bundles obtained from the PDB are superimposed using only a single stretch of locally ordered residues (those for which the sum of the two dihedral angle order parameters is greater than or equal to 1.5), shown in blue. Other locally ordered residues are shown in green; residues whose order parameters add to less than 1.5 are shown in red. Only four models are shown for clarity, although all models in the PDB file were used to calculate the average structure for the superimposition. A: in 1PKT (phosphatidylinositol 3-kinase, SH3 domain), residues 6–33 were used, which also gives a relatively tight superimposition of the other locally ordered residues (36–77). B: in 1CFC (calcium free calmodulin), residues 1–75 were used, which gives a poor superimposition of residues 78–148. C, D: sum of the dihedral angle order parameters as a function of residue number for 1PKT (C) and 1CFC (D). In (C), the disordered loop residues 34–35 separate ordered structures which are well defined with respect to one another. In (D), disordered residues 76–77 separate ordered structures that are not well defined with respect to each other.]

NMR-VTF Phase 1
• Superposition: Cyrange (distance variance matrix)
• Assessing structured regions: Cyrange
• Representative model: medoid

[Excerpt, Snyder and Montelione:] Recall that each element of the IVM is a sum of n square displacements from a mean value. If scaled by the appropriate parametric variance, each element would therefore approximately sample a χ²[n] distributed random variable (i.e., with n degrees of freedom). For even a moderate number of degrees of freedom, the distribution of the cube root of a χ² random variable approximates a normal distribution. Since scaling a normally distributed random variable results in a variable that maintains a Gaussian distribution, by taking the cube root of each element in the IVM, the transformed row vectors from the sub-matrix consisting only of the (now cube root of the) variances in distance between core atoms represent core backbone atoms as vectors of approximately normally distributed random variables. The optimal SAHN variant to use for clustering data represented by vectors whose elements are normally distributed ...
[Figure: Snyder and Montelione, Fig. 3]

Precision revisited
A good estimate of the precision requires maximizing the conformational variability within a given set of experimental restraints

Precision and accuracy
• “Tight” bundle: low RMSD, suggesting high precision
• “Loose” bundle: high RMSD, a more realistic estimate of precision
[Figure: Spronk et al., Fig. 6]
Resampling to assess precision
[Figure: ensemble of protein structures -> CONCOORD -> re-sampled ensemble -> refinement -> refined ensemble; Spronk et al., J. Biomol. NMR. 25, 225-234 (2003)]

Resampling to assess precision
[Figure: ubiquitin; panels labelled RMSD (Å), RMSD distances (Å), RMSD angles (º) and Z-scores, with CONCOORD; Spronk et al., J. Biomol. NMR. 25, 225-234 (2003)]
Resampling to assess precision
• NMR ensembles tend not to accurately reflect the experimental uncertainty.
• Important to consider when assessing structural differences, e.g. structural changes resulting from interaction.

Structure determination process
[Figure: Spronk et al., Fig. 1]
Evaluation of experimental data
Quality of experimental data:
• Number (?)
• Completeness / RPF scores
• QUEEN.
Agreement of the structure with the experimental data:
• Number and size of violations.
• Root mean square deviations.
  • NOE, Dihedral angles etc.
• R-factors, Q-factors.
  • RDC, chemical shifts etc.
• Cross-validation.
Evaluation of experimental data
Number of restraints:
• Intra-residual (|i-j|=0): side chain conformation
• Sequential (|i-j|=1): secondary structure
• Medium range (1<|i-j|≤4): secondary structure
• Long range (|i-j|>4): secondary and tertiary structure
[Figure: restraints per class, Total: 1413; Nabuurs et al., Fig. 3]

Evaluation of experimental data
Redundancy of restraints.
[Figure: HN and Hα; maximum HN-Hα distance = 3.0 Å]
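
The four restraint classes are defined purely by the sequence separation |i-j| of the two residues involved, which can be sketched as below. The function names and the example residue pairs are illustrative, and the medium-range boundary is read as 1 < |i-j| ≤ 4.

```python
from collections import Counter

def restraint_class(i, j):
    """Classify a distance restraint by the sequence separation |i - j|."""
    separation = abs(i - j)
    if separation == 0:
        return "intra-residual"
    if separation == 1:
        return "sequential"
    if separation <= 4:
        return "medium range"
    return "long range"

def count_by_class(residue_pairs):
    """Count restraints per class for an iterable of (i, j) residue numbers."""
    return Counter(restraint_class(i, j) for i, j in residue_pairs)

# Hypothetical restraint list given as residue-number pairs
example_pairs = [(5, 5), (5, 6), (6, 5), (5, 8), (5, 9), (5, 10), (12, 40)]
class_counts = count_by_class(example_pairs)
```

Reporting the counts per class rather than a single total is more informative, because long-range restraints carry most of the information about the fold.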
Evaluation of experimental data
Redundancy of restraints.
Multiple entries of the same restraint!
1D3Z (Ubiquitin), counts:
  singly defined      940
  multiple defined    696
  total unique       1636
  duplicates         1091
  total all          2727

Evaluation of experimental data
Completeness
[Figure: Nabuurs et al., Fig. 3]
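
The bookkeeping behind the 1D3Z redundancy counts (singly defined, multiple defined, total unique, duplicates, total all) can be sketched as follows. This is an assumed reading of those categories, not the tool used for the slide: a restraint is treated as an unordered pair of atoms, and the atom labels in the example are invented.

```python
from collections import Counter

def redundancy_summary(restraints):
    """Summarise repeated entries in a list of atom-pair restraints.

    Each restraint is a pair of hashable atom identifiers; the order of the
    two atoms is ignored, so (a, b) and (b, a) count as the same restraint.
    """
    counts = Counter(frozenset(pair) for pair in restraints)
    unique = len(counts)
    multiple = sum(1 for n in counts.values() if n > 1)
    return {
        "singly defined": unique - multiple,
        "multiple defined": multiple,
        "total unique": unique,
        "duplicates": len(restraints) - unique,
        "total all": len(restraints),
    }

# Hypothetical restraint list: (residue number, atom name) pairs
example_restraints = [
    ((1, "HN"), (2, "HA")),
    ((2, "HA"), (1, "HN")),    # same pair with the atoms swapped
    ((1, "HN"), (2, "HA")),    # exact repeat
    ((3, "HN"), (7, "HB2")),
]
summary = redundancy_summary(example_restraints)
```

With these definitions "singly defined" plus "multiple defined" equals "total unique", and "total unique" plus "duplicates" equals "total all", which matches the arithmetic of the 1D3Z numbers (940 + 696 = 1636; 1636 + 1091 = 2727).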
Evaluation of experimental data
Restraints per residue
Completeness per residue
[Figure: Nabuurs et al., Fig. 3]
Protein NMR RPF Scores
[Excerpt from J. Am. Chem. Soc. 127 (6), 2005, doi: 10.1021/ja047109h]

Figure 1. Comparison of distance network $\bar{G}$ generated from an ensemble of 3D query structures and $G_{ANOE}$ generated from input NOE peaklist (NOE) and resonance assignment (R) data. Edges that are present in both $\bar{G}$ and $G_{ANOE}$ are true positives (TP). Edges present in $\bar{G}$, but not in $G_{ANOE}$, are false positives (FP). Edges that are not present in both $\bar{G}$ and $G_{ANOE}$ are true negatives (TN). NOE cross peaks (p) are counted (only once) as false negatives (FN) if corresponding linking edges in $G_{ANOE}$ are not present in $\bar{G}$.

Recall is defined as the fraction of relevant documents that are retrieved by the algorithm and Precision is defined as the fraction of retrieved documents that are in fact relevant. The F-measure characterizes the combined performance of Recall and Precision. In the context of NOESY-based structure analysis, proton pair interactions (h1, h2) are analogous to “documents”. Observed NOESY cross peaks are defined as true relevant documents, assuming the peak lists (set NOE) have no noise. Potential NOESY peaks not observed in the data are analogous to not-relevant documents, assuming the input data are complete.

Since $G_{ANOE}$ is an ambiguous network, a FN score is assigned to the peak only if none of the several possible interactions are observed in $\bar{G}$. In this context, Recall (eq 1) measures the fraction of NOE cross peaks that are retrieved by the query structures, while Precision (eq 2) measures the fraction of retrieved proton pair interactions in the query structure that are relevant (in $G_{ANOE}$), weighted by interproton distance. The upper-bound observed distance, $d_{NOE\_max}$, used in these measures is 5 Å, but can also be calibrated from the NOESY data. Accordingly, the performance score (F-measure) of the final ensemble of structures, $F(\bar{G})$, is assessed by the following set of statistics:

\[ \mathrm{Recall}(\bar{G}) = \frac{\left|\{\, p \mid (h_1,h_2,p) \in G_{ANOE},\ (h_1,h_2,d) \in \bar{G} \,\}\right|}{\left|\{\, p \mid (h_1,h_2,p) \in G_{ANOE} \,\}\right|} \tag{1} \]

\[ \mathrm{Precision}_w(\bar{G}) = \frac{\sum_{(h_1,h_2,d) \in \bar{G},\ (h_1,h_2,p) \in G_{ANOE}} d(h_1,h_2)^{-6}}{\sum_{(h_1,h_2,d) \in \bar{G}} d(h_1,h_2)^{-6}} \tag{2} \]

\[ F(\bar{G}) = \frac{2 \times \mathrm{Recall}(\bar{G}) \times \mathrm{Precision}_w(\bar{G})}{\mathrm{Recall}(\bar{G}) + \mathrm{Precision}_w(\bar{G})} \tag{3} \]

In this analysis, a distance ($d^{-6}$) weighting of the precision metric, $\mathrm{Precision}_w(\bar{G})$, is used to reduce the otherwise dominant influence of the many weak NOEs arising from interproton distances close to the upper-bound detection limit, $d_{NOE\_max}$. This weighting also makes the quality scores less sensitive to the value chosen for $d_{NOE\_max}$.

Discriminating Power (DP-score). $F(G_{ideal})$ and $F(G_{free})$ describe the two bounds of the performance $F(\bar{G})$; i.e., $F(G_{ideal}) \geq F(\bar{G}) \geq F(G_{free})$. With these definitions, the fold Discriminating Power (DP) for $\bar{G}$ is then estimated as:

\[ DP(\bar{G}) = \frac{F(\bar{G}) - F(G_{free})}{F(G_{ideal}) - F(G_{free})} \tag{4} \]

where $DP(G_{ideal}) = 1$ and $DP(G_{free}) = 0$. The F-measure score provides an assessment of the overall fit between the query model structure(s) and the experimental data, assuming that the input data are near complete; the Discriminating Power score, $DP(\bar{G})$, measures how the query structure is distinguished from the freely rotating chain model.

Importantly, good stereochemical and/or packing scores alone do not necessarily demonstrate that the corresponding structure fits well to the experimental NOE data.

QUEEN
[Figure: structure calculations from experimental data (restraints); Uncertainty: calculate H using concepts of Shannon’s information theory]

QUEEN: average information
Average information of restraint r: Iave is a measure for the average contribution of the restraint to the determination of the fold.
[Figure: restraints of 1GB1]
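
The RPF statistics of eqs 1-4 above (Recall, distance-weighted Precision, F-measure and DP-score) can be sketched in a few lines. This is only an illustration of the formulas, not the published RPF software: the data layout (a dict of candidate proton pairs per peak, a dict of distances per proton pair), the function names and the toy values are all assumptions.

```python
def rpf_scores(peak_candidates, structure_distances):
    """Return (recall, weighted precision, F-measure).

    peak_candidates: dict mapping each NOESY peak id to the set of proton
        pairs it may be assigned to (the ambiguous network G_ANOE).
    structure_distances: dict mapping each proton pair closer than the
        detection limit in the query structures to its distance in Angstrom.
    Pairs must be hashable and written the same way in both inputs.
    Both inputs are assumed to be non-empty.
    """
    # Eq 1: a peak is retrieved if at least one candidate pair is present
    # in the structure-derived network.
    retrieved = sum(
        1 for pairs in peak_candidates.values()
        if any(pair in structure_distances for pair in pairs)
    )
    recall = retrieved / len(peak_candidates)

    # Eq 2: d^-6 weighted fraction of structure pairs supported by G_ANOE.
    anoe_pairs = set().union(*peak_candidates.values())
    weights = {pair: d ** -6 for pair, d in structure_distances.items()}
    precision = (
        sum(w for pair, w in weights.items() if pair in anoe_pairs)
        / sum(weights.values())
    )

    # Eq 3: harmonic mean of recall and precision.
    total = recall + precision
    f_measure = 2 * recall * precision / total if total > 0 else 0.0
    return recall, precision, f_measure


def dp_score(f_query, f_free, f_ideal):
    """Eq 4: rescale F between the free-chain and ideal bounds."""
    return (f_query - f_free) / (f_ideal - f_free)


# Toy data, invented for illustration only
peaks = {
    "p1": {("HA1", "HB2")},
    "p2": {("HA1", "HG3"), ("HB2", "HG3")},   # ambiguous peak
    "p3": {("HG3", "HD4")},                    # not explained by structure
}
distances = {("HA1", "HB2"): 2.0, ("HB2", "HG3"): 4.0, ("HA1", "HD4"): 4.0}
recall, precision, f_measure = rpf_scores(peaks, distances)
```

In the toy data the unsupported contact at 4.0 Å lowers the precision only slightly, because the d^-6 weighting is dominated by the short 2.0 Å contact.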
[Figure: distribution of Iave, fraction of restraints (%); YBOX (1H95), the 10 restraints with highest Iave]

QUEEN: unique information
Unique information of restraint r: a measure for the degree of support by other restraints.
[Figure: restraints of 1GB1, fraction of restraints (%)]

QUEEN: Redundancy
[Figure: restraints of 1GB1; Total: 1413 and Total: 565]

Evaluation of experimental data
Number and size of restraint violations:
(Remember selection!)
[Figure: Nabuurs et al., Fig. 4]

Evaluation of experimental data
RMS deviations
R-factors:
• General
• Gonzales et al., J. Magn. Reson. 91, 659-64 (1991), others
• Clore et al., J.Am.Chem.Soc. 121, 9008-12 (1999)
Q-factors:
• Cornilescu et al., J.Am.Chem.Soc. 120, 6836-7 (1998)

Cross validation
• J-Couplings
• T1/T2 ratios
• S2
• RDCs
• H-bonds
• Chemical shifts / Chemical shift anisotropy
• Database potentials
• SAXS restraints
• Ensemble or time-averaged
-> Cross-validation of the results (Stone, J. Roy. Stat. Soc. B 36, 111-47, 1974)

Cross-validation
[Figure: Clore et al. J. Mol. Biol. (2006) 355, 879-86]

Cross-validation
[Figure: data -> partition -> working set and test set; working set + parameters -> algorithm -> model; model -> back-calculate test set -> score; repeated n times]
Problems:
• Different types of NMR data have very different information content.
• Individual data points have very different information content.

QUEEN: cross-validation
3GB1 model calculations
Cross validation approach
Nabuurs et al., J. Biomol. NMR 33, 123-34, 2005

NMR-VTF Phase 1
• Superposition: Cyrange (distance variance matrix)
• Assessing structured regions: Cyrange
• Representative model: medoid
• Restraints: ‘simple validation’ (number, violations per bin) of all restraints (distance, dihedral, H-bonds, RDCs, ..), also per model

Structure determination process
[Figure: Spronk et al., Fig. 1]

Validation of protein structure quality
How is the quality of the properties expressed?
• Z-scores, RMS Z-scores (WHAT IF)
What type of properties are important?
How can we check these properties?
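
The cross-validation scheme shown earlier (partition the data into a working set and a test set, build a model from the working set, back-calculate the test set, score, repeat n times) can be sketched generically. Everything specific here is an assumption made for illustration: a straight-line fit stands in for the structure calculation, and the score is a Q-factor written in one commonly used form, rms(observed - calculated) / rms(observed).

```python
import numpy as np

def q_factor(observed, calculated):
    """Assumed Q-factor form: rms(obs - calc) / rms(obs)."""
    observed = np.asarray(observed, dtype=float)
    calculated = np.asarray(calculated, dtype=float)
    rms_diff = np.sqrt(np.mean((observed - calculated) ** 2))
    return float(rms_diff / np.sqrt(np.mean(observed ** 2)))

def cross_validate(x, y, n_partitions=5, seed=0):
    """Return one test-set score per partition.

    Each partition is left out in turn as the test set; the remaining data
    form the working set from which the (stand-in) model is derived.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.random.default_rng(seed).permutation(len(x))
    scores = []
    for test_idx in np.array_split(order, n_partitions):
        work_idx = np.setdiff1d(order, test_idx)
        # "Algorithm": least-squares line through the working set only
        slope, intercept = np.polyfit(x[work_idx], y[work_idx], deg=1)
        # "Back-calculate" the data that were never used to build the model
        predicted = slope * x[test_idx] + intercept
        scores.append(q_factor(y[test_idx], predicted))
    return scores

# Hypothetical noise-free data: the model should reproduce the test sets
x_data = np.arange(20, dtype=float)
y_data = 2.0 * x_data + 1.0
cv_scores = cross_validate(x_data, y_data)
```

A score that is much worse on the test sets than on the working sets signals over-fitting. The random partition used here ignores the problem noted on the slide that individual data points can differ strongly in information content.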
Normal distributions and Z-scores

Normal distributions and RMS Z-scores
[Figure: distributions with RMS Z-score ~0.5, RMS Z-score = 1.0 (reference), RMS Z-score ~2. Spronk et al., Fig. 8]

Z-scores and RMS Z-scores
Structure Z-scores:
• Z-scores > 0 are “better” than average
• Z-scores < 0 are “worse” than average
• However: a Z-score of -1 is just as normal as a Z-score of +1!!
RMS Z-scores:
• Too tight restraining of geometry: RMS Z-score < 1
• Too loose restraining of geometry: RMS Z-score > 1
• Proper Gaussian distribution: RMS Z-score ~1

The WHAT IF reference set
Structure Z-scores:
• Well refined high resolution X-ray structures (resolution < 2.0 Angstrom, R-factor < 19%)
• Continuously updated
Local geometry RMS Z-scores:
• Cambridge Structural Database (CSD) of small molecules
• Well refined high resolution X-ray structures

Validation criteria for protein structures
Overall quality:
• Ramachandran plot, rotameric states, packing quality, backbone conformation
Local geometry:
• Bond lengths, bond angles, chirality, omega angles, side chain planarity
Others:
• Inter-atomic bumps, buried hydrogen-bonds, electrostatics

Overall quality: Ramachandran plot
[Figure: φ and ψ angles; regions: Allowed, Additionally allowed, Generously allowed, Forbidden. Spronk et al., Fig. 12]
[Figure: Z-score = +1.8; Z-score = -8.5. Spronk et al., Fig. 13]
[Figure: Residue specific Ramachandran plot]

Overall quality: Rotameric states
[Figure: Eclipsed; Staggered]

Overall quality: dihedral angle distributions
Chi-1/Chi-2: Janin plot
[Figure: Z-score = +1.9; Z-score = -1.5]

Overall quality: packing quality
[Figure: Bad packing; Good packing]

Overall quality: backbone conformation
[Figure: Very normal, Z-score = +0.8; Very unique, Z-score = -14]
Spronk et al., Fig. 16

Local quality: bonded geometry
[Figure: D-amino acid; L-amino acid; Distorted Cα-chirality (common)]

Local quality: bond length distributions
[Figure: RMS Z-score = 0.96; RMS Z-score = 0.22. Spronk et al., Fig. 8]

Local quality: omega angles
[Figure: Trans-conformation (omega = 180°); Cis-conformation (omega = 0°)]

Local quality: angle distributions
[Figure: |ω deviation| for Lysozyme X-ray (PDB 3LZT) and Lysozyme NMR (PDB 1E8L). Spronk et al., Fig. 15]

Local quality: side chain planarity
[Figure: Planar ARG side-chain (Good); Non-planar ARG side-chain (Bad). Spronk et al., Fig. 10]

Other quality indicators: inter-atomic bumps
[Figure: Overlap of two backbone atoms]

Other quality indicators: electrostatics
[Figure: “Bad” electrostatics; Good electrostatics. Spronk et al., Fig. 17]

Other quality indicators: internal hydrogen bonding
Spronk et al., Fig. 17

Validation of protein structure quality
How is the quality of the properties expressed?
What type of properties are important?
How can we check these properties?
• Visual inspection
• “Legacy tools”: PROCHECK / WHAT IF / WHAT_CHECK / Molprobity
• Program suites: Protein Structure Validation Suite (PSVS), ResProx, Vivaldi, CING

Validation of protein structure quality
Table 1. Reviewed validation programs and supported features. Abbreviations used: DR: Distance Restraint; DHR: Dihedral Restraint; RDC: Residual Dipolar Coupling; CV: Circular Variance; IVM: Inter-atomic distance Variance Matrix; ROG: Red/Orange/Green; RMSD: Root-Mean-Square Deviation; HB: Hydrogen Bond.
[Table 1: table body not recoverable]
Vuister et al., J. Biomol. NMR, in press

Structural quality
DR1885: apo, copper-bound
Visual inspection is the easiest tool!
[Figure: RMS Z-scores output of WHATIF]

Structure validation programs
PROCHECK and PROCHECK_NMR:
• Very useful graphical and text output.
• No longer maintained.
• Relatively easy to run.
• Part of PSVS and CING.

Structure validation programs
WHAT IF / WHAT_CHECK:
• More checks and more critical checks.
• The reference database of X-ray structures is continuously updated.
• Difficult to install and run, massive (text) output.
• Part of Vivaldi and CING.

A WHAT IF summary report

Structure validation programs
Molprobity

Structure validation programs
Protein Structure Validation Suite (PSVS)
Bhattacharya et al. Proteins (2007) vol. 66 (4) pp. 778-95

In this article, we present the Protein Structure Validation Software (PSVS) suite for consistent and rapid evaluation of the quality of protein structures, with a focus on NMR structures and homology models.
PSVS provides a standardized set of quality scores and constraint analyses for each input structure. In addition to experimental constraints, this set encompasses a number of parameters evaluating different aspects of the structure, including fold and packing, local residue separation, deviations of bond lengths and bond angles, backbone and side-chain torsion angle stereochemistry, and steric overlaps between atoms. These data allow both global and site-specific structure quality assessments. A graphical user interface (GUI) runs the analysis and integrates information reported by several structure quality evaluation tools. Quality scores, calibrated with a set of high-quality X-ray crystal structures (1.8 Å resolution), are summarized as Z scores for several structure validation analysis programs. The output consists of a standard set of tables and graphs and a concise validation report. The PSVS software is broadly applicable in structural biology projects. As a demonstration of the value of the PSVS server, we apply these tools to evaluate protein structures determined by different Structural Genomics Consortia, and compare the distributions of quality scores in these structures with X-ray crystal and solution NMR structures deposited in the PDB in recent years.

METHODS
Tools for Structure Quality Evaluation

PSVS incorporates published software developed by other research groups and in our own laboratory that have been integrated under a single graphical user interface. Table I summarizes the software tools supported by the current version of PSVS.

PDBStat is a C++ program used to perform various statistical analyses given the Cartesian coordinates of a protein and a list of spatial constraints used to generate the structure (if available). The program is able to read and write coordinates and/or constraints in different standard formats (CONGEN,[29] XPLOR/CNS,[30][31] PDB,[8] and DYANA/CYANA[32]), handling the different hydrogen naming conventions, and can deal with proteins with multiple chains and/or models. PDBStat also produces a constraint satisfaction analysis for distance or dihedral constraints, giving a summary with minimum, maximum, average, and root-mean-square violations, with violations classified in ranges. The program also provides a summary of experimental distance constraints, including the numbers of conformationally restricting distance constraints classified into intra and sequential (backbone/backbone, backbone/side-chain and side-chain/side-chain), long range, hydrogen bond, and disulfide bond constraints. PDBStat is also used to filter out conformationally nonrestricting intraresidue and sequential NOE constraints, if the constraint is too restricting, nonrestricting, or corresponds to a fixed distance, based on the ranges imposed by molecular geometry. In addition, it filters out duplicate constraints, and constraints for identical atom pairs with different bounds. PDBStat also generates contact maps based on coordinates or constraints, calculates atomic rmsd's for an ensemble of structures, and evaluates structural order parameters for backbone and dihedral angles[33] in order to assess how well local structure is defined across an ensemble of models. The program is also used to fit coordinates to a specified model, and translate and rotate coordinates to optimally superpose them for all or a selected set of atoms, over the average structure or an individual model.
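The constraint satisfaction summary described above (minimum, maximum, average and root-mean-square violations, with violations classified in ranges) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PDBStat's implementation: the function names, the range boundaries of 0.1, 0.3 and 0.5 Å, and the example distances and bounds are all hypothetical.

```python
import math
from bisect import bisect_right


def restraint_violations(distances, lower_bounds, upper_bounds):
    """Violation (in Angstrom) of each distance restraint; 0.0 when the restraint is satisfied."""
    result = []
    for d, lo, up in zip(distances, lower_bounds, upper_bounds):
        if d > up:
            result.append(d - up)   # distance exceeds the upper bound
        elif d < lo:
            result.append(lo - d)   # distance falls below the lower bound
        else:
            result.append(0.0)
    return result


def summarize_violations(violations, range_edges=(0.1, 0.3, 0.5)):
    """Count, min, max, mean and RMS of the non-zero violations, plus counts per range.

    per_range[i] counts violations v with range_edges[i-1] <= v < range_edges[i];
    the first entry is below the first edge, the last entry is at or above the last edge.
    """
    violated = [v for v in violations if v > 0.0]
    per_range = [0] * (len(range_edges) + 1)
    for v in violated:
        per_range[bisect_right(range_edges, v)] += 1
    if not violated:
        return {"count": 0, "min": 0.0, "max": 0.0, "mean": 0.0,
                "rms": 0.0, "per_range": per_range}
    n = len(violated)
    return {
        "count": n,
        "min": min(violated),
        "max": max(violated),
        "mean": sum(violated) / n,
        "rms": math.sqrt(sum(v * v for v in violated) / n),
        "per_range": per_range,
    }


if __name__ == "__main__":
    # Hypothetical model distances and restraint bounds (Angstrom).
    distances = [4.0, 5.2, 6.0, 2.0]
    lower = [1.8, 1.8, 1.8, 2.5]
    upper = [5.0, 5.0, 5.0, 5.0]
    print(summarize_violations(restraint_violations(distances, lower, upper)))
```

For an ensemble, the same summary would be computed per model, so that a restraint violated in every model can be told apart from one violated only occasionally.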
Some other functions of PDBStat include performing a simple close contact analysis, main chain and side chain (for Ile and Thr) chirality analysis, and an analysis of hydrogen bond satisfaction and classification (based on geometric parameters).

Table I. Tools Used by PSVS to Evaluate Different Aspects of Structure Quality
Tool(s)                      Parameter(s) evaluated
PDBStat                      Define ordered regions of the structure, and analyze numbers of conformationally-restricting constraints, violations of constraints, and rmsd of superimposed atomic coordinates
RPF[15]                      Goodness-of-fit of protein NMR structure with NOESY peak list and resonance assignment data
DSSP[36]                     Calculate secondary structure
PROCHECK G-factors[3]        Probability of dihedral angles of a residue type to be within a given range
MolProbity[5][26-28]         Calculate and visualize bad contacts, atomic overlaps, and Cβ position deviations
Verify3D[6]                  Likelihood of the amino acid sequence to have the three-dimensional packing seen in the structure
ProsaII[7]                   Pseudo energy of pair-wise interactions from the spatial separation of residues
PDB validation software[8]   Close contacts, deviations of bond lengths, and bond angles from ideality

Structure validation programs
Protein Structure Validation Suite (PSVS)
• Based upon ‘known’ tools.
Bhattacharya et al. Proteins (2007) vol. 66 (4) pp. 778-95

Structure validation programs
Vivaldi

Structure validation programs
CING

CING: Common Interface for NMR structure Generation
User friendly interface to WHAT IF/QUEEN/PROCHECK/Aqua/SHIFTX/Wattos/DSSP/.. results and reports.
• Residue oriented
• Validation and data together
• Hyperlinked HTML
• Color-coded (red, orange, green) (ROG-score)
• Automated export to multiple formats
• API to data and validation results
Smart: guides the user with suggestions in troublesome areas!

CING: Data flow
[Diagram: input from CYANA, XEASY, PDB files, XPLOR, CCPN, ...; output as “ROG” HTML, Text “ROG”, ...; Plugins: access to external programs]

CING: checks
• Correction of minor errors; e.g. nomenclature.
   • CING’s internal database
   • WHATIF
   • CCPN import
• Validation of resonance assignments.
   • Relative to database
   • Inconsistencies (e.g. both pseudo-atom and explicit atoms assigned, stereo-specific assignments).
   • Expected assignments.
• Validation of experimental restraints.
   • CING
   • QUEEN
   • AQUA
   • Wattos
   • RPF (soon)
• Validation of stereochemical quality.
   • Whatif
• Validation of structural quality.
   • Whatif
   • Procheck_NMR
   • CING
• Analysis of structural results.
   • Secondary structure (dssp).
   • Salt-bridges.
   • Potential di-sulfide bridges.
   • Proline cis-trans, Leu/Val pro-chiral methyls.
   • Talos+ predictions

CING API
• API: high-level language, high-level constructs.
• Scriptable and interactive usage (ipython).
• Easy access to all data (shifts, peaks, restraints, coordinates, validation data).
• Data is ‘linked’, reflecting their relationships.
• Simple solutions for complicated problems.

CING: ROG color coding
ROG Color coding:
• red: bad
• orange: problems
• green: good

CING: ROG color coding
TAF homology domain
• 2PP4: Wei et al., Nature Struct. Mol. Biol. 14, 653, 2007
• 2H7B: Plevin et al., PNAS 103, 10242, 2006

CING: ROG color coding
• 2PP4: red: 25 (23%); orange: 36 (34%); green: 46 (43%)
• 2H7B: red: 48 (46%); orange: 44 (42%); green: 13 (12%)

NRG-CING
• Converted NRG NMR restraints
• Import as CCPN projects
[Diagram: entry counts 4102, 5080, 5669, 5839, 8878, 10002; Entries (>95%)]
http://nmr.cmbi.ru.nl/NRG-CING
Doreleijers et al., J. Biomol. NMR, 2009
Doreleijers et al., NAR 2012

NRG-CING: results
[Figure: Frequency vs. Protein size (# residues); Average: 92]

NRG-CING: results
[Figure: Frequency vs. # of models; Most often occurring: 20]

NRG-CING: results
Distance restraints
[Figure: Frequency vs. Average # distance restraints per residue; 15 restraints per residue]

NRG-CING: results
Distance restraints
[Figure: Entries vs. Restraint surplus (%)]
Doreleijers et al., J. Biomol. NMR, 2009

NRG-CING: results
Distance restraints
[Figure: Frequency vs. Completeness per residue; ~50% completion]

NRG-CING: results
Dihedral restraints
[Figure: Frequency vs. Average # dihedral restraints per residue; 1.1 restraints per residue]

NRG-CING: results
Dihedral restraints
[Figure: Frequency vs. # Dihedral restraints per residue; 2 restraints per residue; Talos?]

NRG-CING: results
RDC restraints
[Figure: Frequency vs. Average # RDC restraints per residue; 1.2 restraints per residue]

NRG-CING: results
RDC restraints
[Figure: Frequency vs. # RDC restraints per residue; 1 restraint per residue; 15N-1H]

NRG-CING: results
Quality
[Figure: Frequency vs. Overall WHATIF QualityCheck; -3.3]

NRG-CING: results
Quality
[Figure: Frequency vs. Per residue WHATIF QualityCheck]

NRG-CING: results
Quality
[Figure: Frequency vs. rms Z-score omega angle; > 2500 Non-refined structures]

NRG-CING: results
[Figure: green (%) vs. red (%), both 0 to 100; regions labelled “fine” and “problematic”]

CING: ROG color coding
1HKT (Vuister et al., 1994): red: 67 (63%); orange: 20 (19%); green: 19 (18%)

NRG-CING: results
[Figure: green (%) axis; regions labelled “fine” and “problematic”]
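The red/orange/green counts and percentages quoted above for 1HKT (67, 20 and 19 residues) are a simple tally of per-residue ROG scores. A minimal Python sketch of that tally follows. It assumes the per-residue scores are already available as a list of strings; the function name `rog_summary` is hypothetical and this is not CING's API.

```python
from collections import Counter

ROG_COLORS = ("red", "orange", "green")


def rog_summary(residue_scores):
    """Map each ROG color to (count, percentage rounded to a whole number)."""
    counts = Counter(residue_scores)
    total = sum(counts[color] for color in ROG_COLORS)
    if total == 0:
        raise ValueError("no ROG scores given")
    return {color: (counts[color], round(100 * counts[color] / total))
            for color in ROG_COLORS}


if __name__ == "__main__":
    # Per-residue scores rebuilt from the counts quoted for 1HKT.
    scores_1hkt = ["red"] * 67 + ["orange"] * 20 + ["green"] * 19
    for color, (count, percent) in rog_summary(scores_1hkt).items():
        print(f"{color}: {count} ({percent}%)")
```

The red and green percentages computed this way are the two coordinates used in the green (%) versus red (%) plot of the NRG-CING entries.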