Automatic 13C Chemical Shift Reference Correction for Unassigned Protein NMR Spectra
Journal of Biomolecular NMR
https://doi.org/10.1007/s10858-018-0202-5

ARTICLE

Xi Chen1,2 · Andrey Smelter1,3,4 · Hunter N. B. Moseley1,2,3,4,5

Received: 10 April 2018 / Accepted: 1 August 2018
© The Author(s) 2018

Abstract

Poor chemical shift referencing, especially for 13C in protein Nuclear Magnetic Resonance (NMR) experiments, fundamentally limits and even prevents effective study of biomacromolecules via NMR, including protein structure determination and analysis of protein dynamics. To solve this problem, we constructed a Bayesian probabilistic framework that circumvents the limitations of previous reference correction methods that required protein resonance assignment and/or three-dimensional protein structure. Our algorithm, named Bayesian Model Optimized Reference Correction (BaMORC), can detect and correct 13C chemical shift referencing errors before the protein resonance assignment step of analysis and without three-dimensional structure. By combining the BaMORC methodology with a new intra-peaklist grouping algorithm, we created a combined method called Unassigned BaMORC that utilizes only unassigned experimental peak lists and the amino acid sequence. Unassigned BaMORC kept all experimental three-dimensional HN(CO)CACB-type peak lists tested within ± 0.4 ppm of the correct 13C reference value. On a much larger unassigned chemical shift test set, the base method kept 13C chemical shift referencing errors to within ± 0.45 ppm at a 90% confidence interval. With chemical shift assignments, Assigned BaMORC can detect and correct 13C chemical shift referencing errors to within ± 0.22 ppm at a 90% confidence interval. Therefore, Unassigned BaMORC can correct 13C chemical shift referencing errors when it will have the most impact, right before protein resonance assignment and other downstream analyses are started.
After assignment, chemical shift reference correction can be further refined with Assigned BaMORC. These new methods will allow non-NMR experts to detect and correct 13C referencing error at critical early data analysis steps, lowering the bar of NMR expertise required for effective protein NMR analysis.

Keywords Carbon chemical shift · Reference correction · Protein NMR · Statistical modeling

http://software.cesb.uky.edu; https://doi.org/10.6084/m9.figshare.5270755.v1

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s10858-018-0202-5) contains supplementary material, which is available to authorized users.

* Hunter N. B. Moseley [email protected]

1 Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, KY 40356, USA
2 Department of Statistics, University of Kentucky, Lexington, KY 40356, USA
3 Markey Cancer Center, University of Kentucky, Lexington, KY 40356, USA
4 Center for Environmental and Systems Biochemistry, University of Kentucky, Lexington, KY 40356, USA
5 Institute for Biomedical Informatics, University of Kentucky, Lexington, KY 40356, USA

Introduction

Nuclear magnetic resonance (NMR) is a highly versatile analytical technique for studying molecular configuration, conformation, and dynamics, especially of biomacromolecules such as proteins (Saitô 1986; Spera and Bax 1991; Wishart et al. 1991; Iwadate et al. 1999; Wishart and Case 2001; Neal et al. 2003; Mao et al. 2011; Serrano et al. 2012; Rosato et al. 2012). Several factors are fundamental to the utilization of NMR spectral data: resonance sensitivity, spectral precision, and spectral accuracy (De Dios et al. 1993; Vila et al. 2008). While various improvements in sample preparation (Wu et al. 2010; Akira et al. 2012), instrumentation (Yang and Bax 2010; Barette et al. 2011; Lange et al. 2012; Vernon et al. 2013), and pulse sequences (Meissner and Sørensen 2001; Khaneja et al. 2005) have greatly improved resonance sensitivity and spectral precision, spectral accuracy still depends on the same basic procedure: referencing chemical shifts to a designated chemical standard. Additionally, variance in chemical shifts can be caused by a variety of experimental factors, including pH, temperature, salts, organic solvent mixtures, and inaccurate referencing due to human error (Nowick et al. 2003; Ulrich et al. 2008). In protein NMR analyses, 4,4-dimethyl-4-silapentane-1-sulfonic acid (DSS) is the recommended internal standard for chemical shift referencing (Wishart et al. 1995; Markley et al. 1998). However, DSS has a negative charge at NMR-relevant pHs and can interact with positively charged residues of a protein of interest, broadening and altering its reference chemical shift values (Nowick et al. 2003). Additionally, temperature affects the reference chemical shift of DSS. Lack of experience or familiarity with chemical shift referencing and the factors that can affect referencing is a major contributor to chemical shift referencing inaccuracy. All downstream analyses and interpretations are affected by these inaccuracies in chemical shifts, including the assignment of resonances in biomacromolecules such as proteins. Moreover, these inaccuracies can outright prevent data analysis, especially with semiautomated data analysis tools, or propagate through data analysis, snowballing into interpretive errors about structure and dynamics. Since the structural and dynamic information contained in the chemical shift is subtle, even small chemical shift errors due to inaccurate referencing may provide a distorted representation of the protein, especially if the chemical shifts are directly used in structure determination (Wu et al. 2010; Yang and Bax 2010; Barette et al. 2011; Lange et al. 2012).

Supplemental Table 2 shows several available programs used by the biomolecular NMR community for correcting referencing in 1H, 13C and 15N chemical shifts (Wishart 2011). In addition, there are a variety of tools for detecting protein resonance assignment errors, which can be due to bad referencing. These tools include but are not limited to AVS (Moseley et al. 2004), PANAV (Wang et al. 2010), CheckShift (Ginzinger et al. 2007; Wang et al. 2010), SHIFTX2 (Han et al. 2011) and VASCO (Rieping and Vranken 2010). Due to the complexity of manual procedures and various experimental factors, approximately 40% of the entries in the Biological Magnetic Resonance Bank (BMRB) have chemical shift accuracy problems (Wang et al. 2005; Ulrich et al. 2008). Unfortunately, current reference correction methods are heavily dependent on the availability of assigned protein chemical shifts or protein structure. One of the best examples is the SHIFTX program (Wang et al. 2005), which is used by the Re-referenced Protein Chemical shift Database (RefDB) (Zhang et al. 2003) to predict protein 1H, 13C and 15N chemical shifts from the X-ray or NMR coordinate data of previously assigned proteins to check and correct referencing using the companion program SHIFTCOR (Zhang et al. 2003). Another good example is the linear analysis of chemical shifts (LACS) method, which was developed by the National Magnetic Resonance Facility at Madison and the associated Biological Magnetic Resonance Bank (BMRB) and employs assigned chemical shifts to directly calculate a reference correction (Wang et al. 2005). However, as shown in Fig. 1, this dependence on assigned shifts creates a vicious cycle between referencing and assignment in NMR spectra analysis: a correct chemical shift reference is required for good resonance assignment, and a good resonance assignment is needed to validate and correct chemical shift referencing. From a statistical analysis perspective, neither chemical shift referencing nor resonance assignment can be assessed independently of each other.

Fig. 1 Overview of traditional and Unassigned BaMORC protein NMR referencing workflows. Top: The traditional workflow requires a manual referencing at step 2 to resolve the assignment initially, followed by refinement of referencing through a trial and error process. Bottom: The Unassigned BaMORC workflow allows referencing correction before assignment

To address these issues in protein NMR, we have developed a new methodology referred to as Bayesian Model Optimized Reference Correction (BaMORC), which detects and corrects 13C chemical shift referencing errors using sets of Cα and Cβ chemical shift pairs. BaMORC minimizes the difference between the known amino acid frequencies based on the protein sequence and the frequencies predicted using a set of mostly bivariate statistical models that are amino acid and secondary structure specific (Wishart et al. 1991) and are based on Cα and Cβ chemical shift statistics. The minimization comes from the adjustment of the 13C chemical shift referencing. The statistical models integrate prior amino acid and chemical shift propensity information along with amino acid and secondary structure probabilities cal- [...]

[...] accompanying the NMR chemical shift data provided by the RefDB, we associated residue-specific Cα and Cβ chemical shifts and then sub-grouped them by amino acid and secondary structure type, as shown in Supplemental Fig. 1 for 19 of the 20 common amino acids (not including glycine) in proteins and for the secondary structure types helix, sheet, and coil. The univariate Cα and Cβ chemical shift distributions are multimodal, with most of the modes being secondary structure specific (Spera and Bax 1991; Wishart