Improved Chemical Shift Based Fragment Selection for CS-Rosetta Using Rosetta3 Fragment Picker
Supporting Online Material to:
Improved chemical shift based fragment selection for CS-Rosetta using Rosetta3 fragment picker.
Robert Vernon^a, Yang Shen^b, David Baker^c,d, Oliver F. Lange^a,e,1
a Biomolecular NMR and Munich Center for Integrated Protein Science, Department Chemie, Technische Universität München, 85747 Garching, Germany
b Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892-0510, USA
c Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
d Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
e Institute of Structural Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
1 Corresponding Author:
email: [email protected]; phone: +49 289 13864; postal: TUM, Chemie, Lichtenbergstrasse 4, 85747 Garching
Supplementary Figure 1: Chemical Shift Score Parameter Optimization
During Stage 3, a brute-force grid optimization was used to determine the optimal values for the sigmoid constants A and B, as defined in Eq. 2 and used in Eq. 3. Fragments were selected for the training benchmark set using every combination of constants, with A ranging from 1 to 5 and B ranging from 3 to 7. Performance was measured by the RMSD improvement of the best and worst fragments relative to the MFR control, which shows a clear optimum at A=2 and B=4.
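The grid search described in this caption can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' script: `evaluate` is a hypothetical callback standing in for the full "pick fragments, then measure RMSD improvement over the MFR control" step, the integer step size for A and B is assumed (the caption gives only the ranges), and `toy_objective` is an invented function used to exercise the search.

```python
from itertools import product


def grid_search(evaluate, a_values, b_values):
    """Score every (A, B) pair with `evaluate` and return the best pair.

    `evaluate` is a hypothetical callable (A, B) -> improvement over the
    control, where larger values are better.
    """
    results = {(a, b): evaluate(a, b) for a, b in product(a_values, b_values)}
    best = max(results, key=results.get)
    return best, results


def toy_objective(a, b):
    # Invented stand-in with a single peak at A=2, B=4; it is NOT the
    # benchmark measurement and only makes the example runnable.
    return -((a - 2) ** 2 + (b - 4) ** 2)


# A in 1..5 and B in 3..7, assuming integer steps.
best, results = grid_search(toy_objective, range(1, 6), range(3, 8))
print(best, len(results))
```

Keeping the full `results` dictionary, rather than only the best pair, makes it possible to plot the whole grid in the way the figure does.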
Supplementary Figure 2: Relative Effects of the Three Chemical Shift Based Scores
The final R3FP protocol was run with every combination of the three chemical shift based scores: the revised SS-Similarity score (S), Phi/Psi-SquareWell (P), and CSScore (C). The figure shows how each score affects both the average fragment RMSD across the benchmark and the percentage of fragment positions where >90% of the fragments match the target secondary structure.
In all cases, adding a score improves both quality metrics, suggesting that although every score is derived from the same chemical shift data, each contains independent information and provides complementary restraints during fragment selection.
Supplementary Figure 3: Final fragment assembly benchmark, effects on Rosetta Energy
Rosetta energy and RMSD values were taken from the full fragment assembly benchmarks to determine each fragment set’s relative ability to guide sampling towards low-energy models. In A), Z-scores were calculated for individual Rosetta energies using the average and standard deviation of the energies obtained for the same target with the MFR control. The histogram of these values demonstrates that, over the entire benchmark, the R3FP fragments increase Rosetta’s ability to sample low-energy (negative Z-score) models while decreasing the frequency of high-energy outliers (Z-score > 1).
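The Z-score normalization used for panel A can be illustrated with a short Python sketch. The function name and the energy values below are made up for illustration, and using the sample (n-1) standard deviation is an assumption, since the caption does not say which estimator was applied.

```python
from statistics import mean, stdev


def energy_z_scores(energies, control_energies):
    """Z-score each energy against the control set's mean and std. dev."""
    mu = mean(control_energies)
    sigma = stdev(control_energies)  # sample standard deviation (assumed)
    if sigma == 0:
        raise ValueError("control energies have zero spread")
    return [(e - mu) / sigma for e in energies]


# Invented energies for one target; not data from the benchmark.
control = [-100.0, -110.0, -90.0, -105.0, -95.0]  # stands in for MFR models
r3fp = [-120.0, -100.0, -85.0]                    # stands in for R3FP models

z = energy_z_scores(r3fp, control)
low_energy_fraction = sum(v < 0 for v in z) / len(z)  # negative Z-score
outlier_fraction = sum(v > 1 for v in z) / len(z)     # Z-score > 1
print(z, low_energy_fraction, outlier_fraction)
```

Because each target is normalized by its own control statistics, Z-scores from targets with very different absolute energies can be pooled into a single histogram.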
In B), the energy vs. CA-RMSD values for a single significantly improved target, VpR247, show that in some cases improved sampling can reveal a low-energy funnel that was not previously observed, dramatically improving Rosetta’s ability to discriminate low-RMSD models by energy.
Supplementary Table 1: Weight Ranges Used During Optimization
Score Function                    Relative Weights Tested During Parameter Optimization*
CSScore – (Eq. 3)                 1.0, 1.5, 2.0, 2.5
ProfileScore – (Eq. 4)            0.2, 0.55, 1.1, 1.5, 2.8
RamaScore – (Eq. 6)               0.04, 0.2, 0.4, 1.0, 2.0
Phi/Psi-SquareWell – (Eq. 8)      0.5, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0
Revised SS-Similarity – (Eq. 9)   0.2, 0.25, 0.55, 1.1
*The final accepted weights are underlined.
Supplementary Table 2: Benchmark Information
Target Name     PDB ID  BMRB ID / Data Ref      Residues      Number of backbone  Rosetta Benchmark (Best 0.05% RMSD)*
                                                (Range used)  chemical shifts     MFR    R3FP
From MFR Set (Shen et al. 2008)
gb3             2OED    (Ulmer et al. 2003)     56(1-55)      332                 0.7    0.7
ccr19           1T17    6460                    148(2-144)    842                 4.7    4.8
cspA            1MJC    4296                    70(4-70)      405                 3.0    3.2
calbindin(MFR)  4ICB    390                     75(3-74)      435                 2.2    1.8
DinI            1GHH    (Ramirez et al. 2000)   81(1-77)      463                 1.6    1.8
hpr             1POH    (Jia et al. 1993)       85(2-83)      419                 1.8    1.6
mrr16           1YWX    6799                    88(2-81)      514                 1.8    1.7
sen15           2GW6    6860                    123(6-122)    831                 3.8    3.9
tm1442          1SBO    5921                    110(5-109)    647                 1.5    1.3
From PCS Set (Schmitz et al. 2012)
proteinG        3GB1    7280                    56(1-56)      273                 0.6    0.6
calbindin       1KQV    6699                    79(2-75)      442                 1.6    1.7
theta           2AE9    6571                    76(10-64)     387                 5.4    6.1
argn            1AOY    (Su et al. 2008)        78(8-70)      447                 5.3    3.2
calmodulin      2K61    15852                   148(3-146)    852                 4.0    4.7
Ncalmodulin     1SW8    (Bertini et al. 2004)   79(3-79)      380                 2.1    2.1
thioredoxin     1XOA    1813                    108(2-108)    296                 2.7    1.8
parvalbumin     1RJV    6049                    110(2-109)    535                 5.3    3.8
epsilon         1J54    6184                    186(7-180)    780                 10.7   8.1
From CASD Set (Rosato et al. 2012)
VpR247          2KIF    (Rosato et al. 2012)    106(1-106)    588                 3.7    2.3
HR5537A         2KK1    (Rosato et al. 2012)    135(1-135)    783                 7.2    7.2
atT13           2KNR    (Rosato et al. 2012)    121(1-121)    677                 8.5    7.7
PGR122A         2KMM    (Rosato et al. 2012)    73(1-73)      416                 3.5    3.9
NeR103A_trim    2KPM    (Rosato et al. 2012)    105(1-105)    531                 5.1    5.0
CGR26A          2KPT    (Rosato et al. 2012)    148(1-131)    781                 4.7    3.9
*Significant changes (Average Best 0.05% RMSD difference > 4 standard deviations, as determined by a bootstrap analysis of the population) are coloured in green when results improve and red when they get worse.
Supplementary Text 1: Instructions for Running the Protocol
The entire CS-Rosetta package is available for download at www.csrosetta.org, with detailed instructions and tutorials for a variety of protocols.
The installed distribution package includes a wrapper that calls third-party programs, formats data files, assembles command lines, and runs the picker. The wrapper is invoked with the command line:
./pick_fragments -cs [talos format chemical shifts file]
Program versions and command lines may be updated over time. For reference, the protocol presented here used the following versions:
TALOS+ Version 3.80F1
blastpgp Version 2.2.26
nr Blast database, Mar 15, 2013
rosetta SVN Version 51111 (Compatible with Release Version 3.5)
The wrapper then called the following command lines:
TALOS+: talos+ -in [talos format chemical shifts file]
(output pred.ss.tab contains secondary structure predictions, and pred.tab contains phi/psi predictions)
Psipred: blastpgp -t T -i [fasta] -F F -j2 -o [fasta].blast -d nr -v10000 -b10000 -K1000 -h0.0009 -e0.0009 -C chk_file -Q [fasta].pssm
(The chk_file is then converted into a Rosetta-readable format named t000_.checkpoint)
blastpgp -t T -F F -i [fasta] -j 1 -R chk_file -o [fasta].outn -e 0.05 -d nr
(The wrapper then copies PDB IDs from [fasta].outn and [fasta].blast into a file named t000_.homologs, which is used for optional homolog exclusion)
Rosetta:
./fragment_picker.linuxgccrelease @flags.file
Where the flags.file contains the following flags:
# Database Locations
-database [path]/rosetta_database/
-in::file::vall [path]/csrosetta3/frag_picker/csrosetta_vall/vall.dat.2008.apr24.vCS
# Input Data
-in:file:talos_cs [target].tab
-in::file:fasta [target].fasta
-frags::ss_pred pred.ss.tab talos
-in::file::talos_phi_psi pred.tab
-in::file::checkpoint t000_.checkpoint
# Flag for Homolog Exclusion (Optional)
-frags:denied_pdb t000_.homologs
# Output Data
-frags::describe_fragments frags.fsc.score
-out::file::frag_prefix frags.score
# Protocol Constants
-frags::n_frags 200
-frags::frag_sizes 3 9
-frags::scoring::config scores.score.cfg
-frags::sigmoid_cs_A 2
-frags::sigmoid_cs_B 4
# Mute Output
-mute all
-mute core.fragment.picking.VallProvider
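When many targets are processed, it can be convenient to generate the flags file from a script rather than edit it by hand. The Python sketch below is one possible way to do this and is not part of the distributed wrapper. The function name `render_flags` and its arguments are hypothetical, while the flag names and values are copied verbatim from the listing above, with the `[path]` and `[target]` placeholders left unresolved.

```python
def render_flags(target, database, vall, homologs=None):
    """Return the text of a flags file for one target.

    Flag names and constants are copied from the listing above; the
    homolog-exclusion flag is written only when `homologs` is given.
    """
    lines = [
        "# Database Locations",
        f"-database {database}",
        f"-in::file::vall {vall}",
        "# Input Data",
        f"-in:file:talos_cs {target}.tab",
        f"-in::file:fasta {target}.fasta",
        "-frags::ss_pred pred.ss.tab talos",
        "-in::file::talos_phi_psi pred.tab",
        "-in::file::checkpoint t000_.checkpoint",
    ]
    if homologs is not None:
        lines += [
            "# Flag for Homolog Exclusion (Optional)",
            f"-frags:denied_pdb {homologs}",
        ]
    lines += [
        "# Output Data",
        "-frags::describe_fragments frags.fsc.score",
        "-out::file::frag_prefix frags.score",
        "# Protocol Constants",
        "-frags::n_frags 200",
        "-frags::frag_sizes 3 9",
        "-frags::scoring::config scores.score.cfg",
        "-frags::sigmoid_cs_A 2",
        "-frags::sigmoid_cs_B 4",
        "# Mute Output",
        "-mute all",
        "-mute core.fragment.picking.VallProvider",
    ]
    return "\n".join(lines) + "\n"


# Placeholders are kept as in the listing; substitute real paths before use.
flags_text = render_flags(
    target="[target]",
    database="[path]/rosetta_database/",
    vall="[path]/csrosetta3/frag_picker/csrosetta_vall/vall.dat.2008.apr24.vCS",
)
flags_with_exclusion = render_flags(
    "[target]", "[path]/rosetta_database/", "[path]/vall", homologs="t000_.homologs"
)
print(flags_text)
```

Writing the returned text to a file named flags.file reproduces the input expected by the `@flags.file` argument shown above.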
And the scores.score.cfg file contains a constant set of score weight definitions:
# score name        priority  weight  min_allowed  extras
CSScore             400       1       -
ProfileScoreL1      300       1.5     -
TalosSSSimilarity   200       0.25    -            talos
RamaScore           100       1       -            talos
PhiPsiSquareWell    50        5.0     -
References:
Bertini I, Del Bianco C, Gelis I, Katsaros N, Luchinat C, Parigi G, Peana M, Provenzani A, Zoroddu MA (2004) Experimentally exploring the conformational space sampled by domain reorientation in calmodulin. Proc Natl Acad Sci USA 101(18):6841–6846.
Jia Z, Quail JW, Waygood EB, Delbaere LT (1993) The 2.0-Å resolution structure of Escherichia coli histidine-containing phosphocarrier protein HPr. A redetermination. J Biol Chem 268(30):22490–22501.
Ramirez BE, Voloshin ON, Camerini-Otero RD, Bax A (2000) Solution structure of DinI provides insight into its mode of RecA inactivation. Protein Sci 9(11):2161–2169.
Rosato A, Aramini JM, Arrowsmith C, Bagaria A, Baker D, Cavalli A, Doreleijers JF, Eletsky A, Giachetti A, Guerry P, et al. (2012) Blind testing of routine, fully automated determination of protein structures from NMR data. Structure 20:227–236.
Schmitz C, Vernon R, Otting G, Baker D, Huber T (2012) Protein structure determination from pseudocontact shifts using ROSETTA. J Mol Biol 416:668–677.
Shen Y, Lange OF, Delaglio F, Rossi P, Aramini JM, Liu G, Eletsky A, Wu Y, Singarapu KK, Lemak A, et al. (2008) Consistent blind protein structure generation from NMR chemical shift data. Proc Natl Acad Sci USA 105:4685–4690.
Su XC, Man B, Beeren S, Liang H, Simonsen S, Schmitz C, Huber T, Messerle BA, Otting G (2008) A dipicolinic acid tag for rigid lanthanide tagging of proteins and paramagnetic NMR spectroscopy. J Am Chem Soc 130(32):10486–10487.
Ulmer TS, Ramirez BE, Delaglio F, Bax A (2003) Evaluation of backbone proton positions and dynamics in a small protein by liquid crystal NMR spectroscopy. J Am Chem Soc 125(30):9179–9191.