Blind Tests of RNA–Protein Binding Affinity Prediction

Blind tests of RNA–protein binding affinity prediction Kalli Kappela, Inga Jarmoskaiteb, Pavanapuresan P. Vaidyanathanb, William J. Greenleafc,d, Daniel Herschlagb, and Rhiju Dasa,b,e,1 aBiophysics Program, Stanford University, Stanford, CA 94305; bDepartment of Biochemistry, Stanford University School of Medicine, Stanford, CA 94305; cDepartment of Genetics, Stanford University School of Medicine, Stanford, CA 94305; dDepartment of Applied Physics, Stanford University, Stanford, CA 94305; and eDepartment of Physics, Stanford University, Stanford, CA 94305 Edited by David Baker, University of Washington, Seattle, WA, and approved March 12, 2019 (received for review November 8, 2018) Interactions between RNA and proteins are pervasive in biology, and a complete prediction framework for RNA–protein complexes driving fundamental processes such as protein translation and has yet to be developed and systematically tested. participating in the regulation of gene expression. Modeling the Existing computational methods that attempt to quantitatively energies of RNA–protein interactions is therefore critical for under- predict relative RNA–protein binding affinities have achieved lim- standing and repurposing living systems but has been hindered by ited success, likely as a result of neglecting key features of the binding complexities unique to RNA–protein binding. Here, we bring together process such as intramolecular interactions and the unbound states. A several advances to complete a calculation framework for RNA–pro- previously developed method to predict relative RNA–protein bind- tein binding affinities, including a unified free energy function for ing affinities from a database-derived statistical potential produced bound complexes, automated Rosetta modeling of mutations, and significant correlations with experimental measurements for protein use of secondary structure-based energetic calculations to model un- mutants, but for RNA mutants the calculations exhibited no detect- bound RNA states. The resulting Rosetta-Vienna RNP-ΔΔGmethod able correlation with experiment (22). Another approach used mo- achieves root-mean-squared errors (RMSEs) of 1.3 kcal/mol on high- lecular dynamics simulations in combination with a nonlinear Poisson throughput MS2 coat protein–RNA measurements and 1.5 kcal/mol Boltzmann model and linear response approximation. Despite the on an independent test set involving the signal recognition particle, complexity of the method and computation exerted on a single model human U1A, PUM1, and FOX-1. As a stringent test, the method system, statistical uncertainties in relative binding affinity calculations achieves RMSE accuracy of 1.4 kcal/mol in blind predictions of hun- were reported to be 1 to 3 kcal/mol (23). More recently, a machine- – dreds of human PUM2 RNA relative binding affinities. Overall, these learning method, GLM-Score, developed to predict absolute nucleic BIOPHYSICS AND RMSE accuracies are significantly better than those attained by prior – acid protein binding affinities from structures of bound complexes COMPUTATIONAL BIOLOGY structure-based approaches applied to the same systems. Importantly, reported excellent accuracies (R2 = 0.75), but has not been tested in Rosetta-Vienna RNP-ΔΔG establishes a framework for further im- its ability to predict relative binding affinities on independent RNA– provements in modeling RNA–protein binding that can be tested by protein complexes (14). prospective high-throughput measurements on new systems. The propensity of RNA to adopt multiple stable conformations in the unbound state makes it problematic to predict binding af- RNA–protein complex | conformational change | binding affinity | blind finities with standard approaches that typically neglect the unbound prediction | energetic prediction state altogether. However, RNA is also distinctive from other molecules in that its unbound energetics can be predicted from a NA binding proteins (RBPs) affect nearly all aspects of RNA Rbiology, including alternative splicing, localization, trans- Significance lation, and stability (1, 2), and novel RNA–protein biophysical phenomena, ranging from in vivo phase separations to helicase- RNA and protein molecules interact to perform translation, induced rearrangements, are being discovered at a rapid pace (3, splicing, and other fundamental processes. These interactions 4). The function of an RBP depends on its ability to identify a are defined by their strength and specificity, but it remains specific target RNA sequence and structure, a process governed by infeasible to experimentally measure these properties for all the energetics of the interactions between each RNA and every RBP biologically important RNA–protein complexes. Development in its biological milieu (5). Recently developed high-throughput ex- of computational strategies for calculating RNA–protein ener- perimental methods have been used to quantitatively characterize getics has been hindered by unique complexities of RNA– the binding landscapes of a handful of RBPs (6–11), improving our protein binding, particularly the propensity of RNA to adopt understanding of the relationship between RNA sequence, structure, multiple conformations in the unbound state. We describe a and binding affinity. However, these empirically derived landscapes method, Rosetta-Vienna RNP-ΔΔG, combining 3D structure are limited to specific systems with solubilities and affinities within modeling with RNA secondary structure-based energetic cal- the concentration windows accessible to these methods. To un- culations to predict RNA–protein relative binding affinities. For derstand RNA–protein systems inaccessible to experimental char- several diverse systems and in rigorous blind tests, the accu- acterization and to rationally design new RNA–protein interactions, racy of Rosetta-Vienna RNP-ΔΔG compared with experimental a general physical model is needed to predict RNA–protein binding measurements is significantly better than that of prior energies. Physical models have proved useful for predicting changes approaches. in binding free energies (ΔΔG) for macromolecular interactions – – Author contributions: K.K. and R.D. designed research; K.K., I.J., and P.P.V. performed that do not involve RNA, including protein protein, protein small research; K.K., I.J., and P.P.V. contributed new reagents/analytic tools; K.K., I.J., P.P.V., molecule, and protein–DNA interactions (12–16). The best meth- W.J.G., D.H., and R.D. analyzed data; and K.K., I.J., P.P.V., W.J.G., D.H., and R.D. wrote ods for these other macromolecular interactions report accuracies the paper. of between 1 and 2 kcal/mol and include rigorous blind studies, The authors declare no conflict of interest. validating their use for applications that range from drug discovery This article is a PNAS Direct Submission. to protein–protein interface design (14, 16, 17). However, the ac- Published under the PNAS license. curacy of these methods deteriorates when molecules are highly 1To whom correspondence should be addressed. Email: [email protected]. flexible or undergo large conformational changes (18, 19), factors This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. commoninRNA–protein binding events (20, 21). Accurate pre- 1073/pnas.1819047116/-/DCSupplemental. diction of RNA–protein binding affinities is therefore challenging, www.pnas.org/cgi/doi/10.1073/pnas.1819047116 PNAS Latest Articles | 1of6 simple secondary structure-based model derived from dozens of A optical melting experiments (24–26). One straightforward but pre- ΔG viously untested solution for the treatment of unbound RNA en- bind ergetics in RNA–protein binding affinity calculations is to use these secondary structure-based calculations. RNA secondary and tertiary structure modeling are commonly integrated to increase the accu- ΔGcomplex ΔGunbound, ΔGunbound, racy of RNA 3D structure prediction (27), but this combination has protein RNA yet to be tested for quantitatively predicting binding affinities. Calculated with Calculated with Calculated with nearest Here we present a complete structure-based computational Rosetta Rosetta neighbor energies framework, Rosetta-Vienna RNP-ΔΔG, for predicting RNA–protein ΔGmut = ΔGmut – ΔG mut – ΔGmut relative binding affinities, bringing together secondary structure- bind complex unbound, unbound, based energetic calculations of unbound RNA free energies and a protein RNA unified energy function for bound RNA–protein complexes. Rosetta- Vienna RNP-ΔΔG achieves root-mean-squared errors (RMSEs) of BCExperimental Relaxed mutant DRNA sequence 1.3 kcal/mol on a dataset of binding affinities of the MS2 coat protein structure structure A C A U G A G U A U C A A C C A A G U with thousands of variants of its partner RNA hairpin (6) and 1.5 kcal/mol on a diverse, independent set of RNA–protein complexes. Additionally, we rigorously evaluated the accuracy of the method Ensemble of possible through a blind challenge that involved making predictions and secondary structures A U C U A U A U Mutate & relax G AA Remove RNA G A A C C separately measuring binding affinities of the human PUF family A C G C surrounding G C U A and relax U A A A A A C G U C G U protein PUM2 with hundreds of RNA sequences using the high- residues A C A U C A U A U A A G A U A C G throughput RNA MaP technology (28). On all tests, the prediction A A C G C G C U A U A ΔΔ A A A A accuracy of Rosetta-Vienna RNP- G appreciably exceeds that of C G C G A U previous structure-based

Blind Tests of RNA–Protein Binding Affinity Prediction

Alternative Polyadenylation in Triple-Negative Breast Tumors Allows NRAS and C-JUN to Bypass PUMILIO Post-Transcriptional Regulation

Regulation of Pluripotency by RNA Binding Proteins

PUM1 Represses CDKN1B Translation and Contributes to Prostate Cancer

Structural Basis for Specific Recognition of Multiple Mrna Targets by a PUF Regulatory Protein

Spinocerebellar Ataxia: an Update

A 5 Cytosine Binding Pocket in Puf3p Specifies Regulation Of

A Divergent Pumilio Repeat Protein Family for Pre-Rrna Processing and Mrna Localization

Coding RNA Genes

Role of the Post-Transcriptional Regulators Pumilio1 and Pumilio2 in Murine Hematopoietic Stem Cells Fabio Michelet

Genomic and Expression Profiling of Human Spermatocytic Seminomas: Primary Spermatocyte As Tumorigenic Precursor and DMRT1 As Candidate Chromosome 9 Gene

Pumilio 1 (PUM1) (NM 014676) Human Tagged ORF Clone Lentiviral Particle Product Data

A Conserved Abundant Cytoplasmic Long Noncoding RNA Modulates Repression by Pumilio Proteins in Human Cells