Protein Structure Prediction Using Sparse NOE and RDC Restraints with Rosetta in CASP13

Received: 20 February 2019 Revised: 25 May 2019 Accepted: 6 July 2019 DOI: 10.1002/prot.25769 RESEARCH ARTICLE Protein structure prediction using sparse NOE and RDC restraints with Rosetta in CASP13 Georg Kuenze1,2 | Jens Meiler1,2 1Department of Chemistry, Vanderbilt University, Nashville, Tennessee Abstract 2Center for Structural Biology, Vanderbilt Computational methods that produce accurate protein structure models from limited University, Nashville, Tennessee experimental data, for example, from nuclear magnetic resonance (NMR) spectros- Correspondence copy, hold great potential for biomedical research. The NMR-assisted modeling chal- Georg Kuenze, PhD, Department of lenge in CASP13 provided a blind test to explore the capabilities and limitations of Chemistry, Vanderbilt University, Stevenson Center, Station B 351822, Room 7330, current modeling techniques in leveraging NMR data which had high sparsity, ambi- Nashville, TN 37235. guity, and error rate for protein structure prediction. We describe our approach to Email: [email protected] predict the structure of these proteins leveraging the Rosetta software suite. Protein Funding information structure models were predicted de novo using a two-stage protocol. First, low- American Heart Association, Grant/Award Number: 18POST34080422; Deutsche resolution models were generated with the Rosetta de novo method guided by non- Forschungsgemeinschaft, Grant/Award ambiguous nuclear Overhauser effect (NOE) contacts and residual dipolar coupling Number: KU 3510/1-1; National Institutes of Health, Grant/Award Numbers: R01 (RDC) restraints. Second, iterative model hybridization and fragment insertion with GM073151, R01 GM080403 the Rosetta comparative modeling method was used to refine and regularize models guided by all ambiguous and nonambiguous NOE contacts and RDCs. Nine out of 16 of the Rosetta de novo models had the correct fold (global distance test total score > 45) and in three cases high-resolution models were achieved (root-mean- square deviation < 3.5 Å). We also show that a meta-approach applying iterative Rosetta + NMR refinement on server-predicted models which employed non-NMR- contacts and structural templates leads to substantial improvement in model quality. Integrating these data-assisted refinement strategies with innovative non-data- assisted approaches which became possible in CASP13 such as high precision contact prediction will in the near future enable structure determination for large proteins that are outside of the realm of conventional NMR. KEYWORDS CASP13, distance restraints, NMR spectroscopy, protein structure prediction, Rosetta 1 | INTRODUCTION data at atomic-detail. NMR can obtain such data for small proteins under physiologically conditions but loses quickly resolution and sen- A current focus in structural biology has been the development of sitivity when the protein under study becomes large (>20 kDa). The advanced integrative modeling techniques that can determine struc- classical approach of collecting short-range inter-proton distance mea- tures of proteins and their interactions from limited experimental surements by nuclear Overhauser effect spectroscopy (NOESY) data.1,2 Those methods are called in when classical structural biology becomes difficult due to peak line broadening and low signal-to-noise techniques such as X-ray crystallography and nuclear magnetic reso- ratio. Low-resolution and sparse NMR datasets thus call for computa- nance (NMR) spectroscopy fail to obtain complete and unambiguous tional methods that can translate them into accurate structural Proteins. 2019;1–10. wileyonlinelibrary.com/journal/prot © 2019 Wiley Periodicals, Inc. 1 2 KUENZE AND MEILER models. At the same time these methods add atomic-detail informa- technologies like deep convolutional neural networks that allow effi- tion to the model which may not be present in the NMR data, for cient use of coevolution information.19-22 As those new modeling example, sidechain positions. techniques are made available to users, for example, in the form of a The Rosetta program3 offers a unique platform of integrative public webserver, they can be easily incorporated into the NMR modeling tools and has been designed to make use of different types structure prediction protocol. Here, we demonstrate that models of NMR data. For example, chemical shifts (CSs)4-6 and residual dipo- submitted by modeling servers can be easily recombined and hybrid- lar couplings (RDCs)7 can be used to guide the search and assembly of ized with Rosetta and refined with NMR data to yield structure pre- small peptide fragments with known conformations from which dictions with better accuracy than Rosetta-NMR models and the Rosetta builds a protein structure de novo. Using only this kind of original server predictions. We therefore suggest a “meta-approach” backbone NMR data, which is available at an early stage of the NMR to NMR structure prediction which supplements sparse experimental structure determination process, this method, called “CS-Rosetta,” data with complementary restraints, for example, from homolog tem- was able to correctly model the structure of proteins up to 25 kDa.8 plates and predicted contacts. This meta-approach can be a con- Incorporation of sparse NOEs from selectively methyl labeled Ile, Leu structive way to determine the structures of challenging protein and Val sidechains in deuterated proteins and improvements to the targets which till now were outside of the realm of solution conformational sampling algorithm were shown to increase the appli- state NMR. cation limit to 40 kDa.9 Moreover, the CS-Rosetta method was extended to include sparse contact information10 and structural tem- 2 | MATERIALS AND METHODS plates11 from homologous proteins with guidance from CS-based alignments. With the impressive advancement in coevolution- and 2.1 | Overall structure prediction protocol deep-learning-based contact prediction methods,12 evolutionary couplings (ECs) become now increasingly available as new type of dis- As in previous CASP contact-assisted experiments,23,24 a two-stage tance restraints to supplement sparse NMR data13 and facilitate de modeling approach consisting of initial fold-level modeling and subse- novo protein structure prediction in Rosetta.14,15 quent model refinement was employed (Figure 1). The aforementioned CS-Rosetta studies used expert-collected The provided NMR data comprised simulated 1H-1H-NOE and experimental NMR datasets with high completeness in the sense that 1H-15N-RDC data for 11 out of 12 protein targets as well as one real CS assignments were available for almost all residues. In addition, two experimental 1H-1H-NOE dataset for one protein target. In addition, or more RDC datasets and one to two NOEs per residue were used.9 φ/ψ dihedral angle ranges (with ±30 uncertainty) which had been The data provided in CASP13 comprised simulated data and one real back-calculated from the simulated or real CSs were provided by the NMR dataset. The number of NOE restraints was comparable to that organizers. The NMR data were sparse and ambiguous, that is, many one in previous CS-Rosetta studies but datasets contained a consider- NOE cross-peaks could be assigned to more than one possible pair of able number of NOEs with high ambiguity as well as incorrectly H-atoms. In addition, the NOE dataset contained a considerable frac- assigned NOEs. Furthermore, in order to simulate realistic difficulties tion of false positive (FP) contacts arising from incorrect peak assign- in the data collection process, for example, line broadening due to ments because NOEs had been extracted from NOESY-NMR spectra internal protein motions, residues were removed from the peak list, simulated with realistic peak line widths and signal-to-noise ratio. and hence, the assignment became incomplete; RDCs, CS-derived tor- Thus, denoising and optimal use of the ambiguous data along the sion angle restraints and NOEs were available for only part of the pro- modeling pipeline was important. tein. It is an interesting question how sparse and ambiguous NMR In the first modeling stage, only nonambiguous NOEs and RDCs data are best incorporated into modeling methods, to which extent were used. In the following refinement stage, starting from these fold- they can improve the accuracy of the predicted structural model and level models, ambiguous data were included (Figure 1A). These NMR whether they can provide higher accuracy than so-called “free model- data were further supplemented by EC distance restraints which were ing” techniques which omit the use of experimental data. provided by the CASP organizers and had been computed with the In CASP13, we employed a two-stage approach adopted to the MetaPSICOV method.31 ambiguity level of the restraints: initial fold-level modeling guided by Parallel to calculations with this full set of restraints (ie, dihedral, unambiguous data followed by iterative model hybridization and NOE, RDC, and EC restraints), a second independent prediction using refinement using all NMR data. Our results show that with this low only dihedral and RDC restraints was made for every target, except amount of NMR information Rosetta can generate models that have N1008 for which no RDCs had been measured. Purpose of this mini- the correct fold and are in some cases very close to the native struc- mal restraint set was to test the efficacy of backbone-only data for ture. In our analysis after CASP13, we further explored the possibility NMR structure prediction.

Protein Structure Prediction Using Sparse NOE and RDC Restraints with Rosetta in CASP13

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support