Computer Folding of RNA Tetraloops: Identification of Key Force Field Deficiencies

Petra Kührová,§ Robert B. Best,‡ Sandro Bottaro,† Giovanni Bussi,† Jiří Šponer,§£* Michal Otyepka,§£ and Pavel Banáš§£*

§Regional Centre of Advanced Technologies and Materials, Department of Physical Chemistry, Faculty of Science, Palacky University Olomouc, 17. listopadu 12, 771 46 Olomouc, Czech Republic; e-mail: [email protected]
‡Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892-0520
†Scuola Internazionale Superiore di Studi Avanzati, Via Bonomea 265, 34136 Trieste, Italy
£Institute of Biophysics, Academy of Sciences of the Czech Republic, Kralovopolska 135, 612 65 Brno, Czech Republic
*corresponding authors

KEYWORDS: force field, RNA tetraloop, folding, enhanced sampling methods, REMD

Table of Contents: The developments of enhanced sampling methods and force fields are highly mutually interrelated efforts. Here we used three different enhanced sampling techniques (temperature-based replica exchange molecular dynamics, replica exchange with solute tempering, and metadynamics) to fold the 5’-GAGA-3’ RNA tetraloop. We aimed to separate problems caused by limited sampling from those due to force-field inaccuracies, and to identify which terms of the force field are responsible for the poor description of tetraloop folding.

ABSTRACT

The computer-aided folding of biomolecules, particularly RNAs, is one of the most difficult challenges in computational structural biology. RNA tetraloops are fundamental RNA motifs playing key roles in RNA folding and in RNA-RNA and RNA-protein interactions.
Although state-of-the-art Molecular Dynamics (MD) force fields correctly describe the native state of these tetraloops as a stable free-energy basin on the microsecond time scale, enhanced sampling techniques reveal that the native state is not the global free energy minimum, suggesting as yet unidentified significant imbalances in the force fields. Here we tested our ability to fold the RNA tetraloops in various force fields and simulation settings. We employed three different enhanced sampling techniques, namely temperature replica exchange MD (T-REMD), replica exchange with solute tempering (REST2), and well-tempered metadynamics (WT-MetaD). We aimed to separate problems caused by limited sampling from those due to force-field inaccuracies. We found that none of the contemporary force fields is able to correctly describe folding of the 5’-GAGA-3’ tetraloop over a range of simulation conditions. We thus aimed to identify which terms of the force field are responsible for this poor description of tetraloop (TL) folding. We showed that at least two different imbalances contribute to this behavior, namely, overstabilization of base-phosphate and/or sugar-phosphate interactions and underestimated stability of the hydrogen bonding interaction in base pairing. The first artifact stabilizes the unfolded ensemble while the second one destabilizes the folded state. The former problem might be partially alleviated by reparameterization of the van der Waals parameters of the phosphate oxygens suggested by Case et al., while in order to overcome the latter effect we suggest local potentials to better capture hydrogen bonding interactions.

INTRODUCTION

Computer simulation of the folding of biomolecules is one of the major challenges in biomolecular modelling.
Simulations, together with experimental methods, help us to build a more complete picture of these nontrivial processes.1-7 Computational studies of folding often use molecular dynamics (MD) simulations in conjunction with enhanced sampling methods. The accuracy, and in turn the predictive power, of folding simulations critically depends on the quality of enhanced sampling techniques, accessible time scales, and most importantly on the force fields used. Although modern force fields often perform satisfactorily in unbiased simulations initiated from RNA experimental structures, this does not guarantee that the native state is the global free energy minimum within the force field description, which would be required to achieve folding from the unfolded state. Therefore, contemporary force fields should be validated not only using unbiased MD simulations, which are typically trapped in a free energy basin corresponding to the starting structure, but also using enhanced sampling methods capable of describing, e.g., folding of the RNA. The developments of enhanced sampling methods and force fields are thus highly mutually interrelated efforts.

The enhanced sampling methods are designed to overcome energy barriers and provide a robust exploration of the free energy landscape.8 In the limit of converged sampling, the accuracy of enhanced sampling simulations is determined by the accuracy of the force field.9 There are many variants of enhanced sampling MD methods. The widely used temperature-based replica exchange MD10 (T-REMD) benefits from multiple independent MD simulations running in parallel over a range of different temperatures. Exchanges of configurations between neighbouring temperatures are attempted at fixed time intervals and accepted according to a Metropolis-style algorithm to ensure canonical sampling at all temperatures. Another method based on multiple independent MD simulations is replica exchange with solute scaling (in a REST2 variant, see ref. 11), which is a variant of Hamiltonian-based replica exchange MD (H-REMD).12 This method is based on a modification of the potential energy, so that the interactions between solute atoms are scaled by a factor λ, solvent-solvent interactions remain unscaled, and solute-solvent interactions are scaled by an intermediate factor (in this case by √λ). Scaling the energy by a factor λ is equivalent to scaling the temperature by 1/λ. Thus, in the case of REST2 only the solute atoms are effectively heated up. More importantly, the solvent-solvent interactions that typically contribute the most to the energy differences between replicas in T-REMD simulations do not contribute to exchanges. Another popular and conceptually very different enhanced sampling method is well-tempered metadynamics (WT-MetaD).13-14 WT-MetaD uses a history-dependent biasing potential acting on a few chosen coarse-grained degrees of freedom referred to as collective variables (CVs). Suitable CVs help the system to escape from free energy minima and accelerate sampling of rare events. The well-tempered version of metadynamics prevents overfilling of the free energy landscape and generally improves convergence of this technique.13 Defining an appropriate set of CVs, which would allow the most relevant conformational changes to be described, is usually the most difficult challenge.15-16

RNA hairpins composed of a short loop at the tip of a Watson-Crick base-paired A-RNA stem are ideal testing systems for validation of force fields because of their small size and the significance of interactions other than Watson-Crick base pairs in stabilizing their structure. These abundant building blocks are indispensable for RNA folding17-18 and play crucial roles in RNA-RNA and protein-RNA interactions.
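For concreteness, the two replica-exchange schemes outlined earlier in this Introduction can be illustrated with a minimal Python sketch. It is not the simulation protocol used in this work; it only shows the standard Metropolis acceptance rule for a T-REMD swap and a REST2-style energy scaling. The function names, the energy units (kcal/mol), and the choice of √λ as the intermediate solute-solvent factor (as in the usual REST2 formulation) are assumptions made for illustration.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K); unit choice is an assumption


def tremd_accept_prob(e_i, e_j, t_i, t_j):
    """Metropolis probability of swapping the configurations of two replicas.

    e_i, e_j: potential energies of the replicas currently at t_i and t_j.
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_j - e_i)
    if delta <= 0.0:
        return 1.0  # favourable swaps are always accepted
    return math.exp(-delta)


def rest2_energy(e_solute, e_solute_solvent, e_solvent, lam):
    """REST2-style scaled potential energy.

    Solute-solute terms are scaled by lam, solute-solvent terms by
    sqrt(lam) (assumed intermediate factor), solvent-solvent terms are unscaled.
    """
    return lam * e_solute + math.sqrt(lam) * e_solute_solvent + e_solvent


def rest2_effective_temperature(t_ref, lam):
    """Scaling the energy by lam corresponds to a solute temperature t_ref / lam."""
    return t_ref / lam


if __name__ == "__main__":
    # Hypothetical energies for two neighbouring temperature replicas.
    print(tremd_accept_prob(-110.0, -100.0, 300.0, 310.0))
    # A replica with lam = 0.5 heats only the solute to twice the reference temperature.
    print(rest2_effective_temperature(300.0, 0.5))
```

Because the solvent-solvent term enters `rest2_energy` unscaled, it is identical in every replica and cancels in the exchange criterion, which is why REST2 typically needs fewer replicas than T-REMD for the same solute temperature range.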
Hairpin loops containing four nucleotides in the loop region, denoted tetraloops (TLs), are notably stable and are often involved in tertiary contacts.17, 19-22 Moreover, TLs play several biological roles in translation and transcription.17 They have even been suggested to be involved in the emergence of spontaneous RNA catalysis in early stages of RNA prebiotic chemistry.23 The most prominent families of TLs are 5’-GNRA-3’ and 5’-UNCG-3’ TLs (N stands for any nucleotide and R for purine). GNRA TLs can participate in protein-RNA complexes, serving as recognition sites for RNA binding, and are involved in many RNA-RNA and RNA-ligand interactions.24

Experimental results indicate that RNA TL folding is more hierarchical than the folding of small fast-folding proteins.25-26 Two types of RNA TL folding mechanism have been suggested.27 In the first model, the RNA chain collapses through non-specific base pairs to form a heterogeneous structural ensemble, followed by native-like or misfolded helix growth from the nucleation centre.27-28 The second pathway is a zipping mechanism, in which the loop is closed by a stable base pair, followed by zipping of the helical stem.27

Despite their small size, a computational description of the folding of RNA TLs remains a challenge.29 RNA TLs are precisely shaped molecular building blocks with characteristic signature molecular interactions determining their consensus sequences.30 The TL folding process results from a complex interplay of canonical base pairing, non-canonical interactions, stacking, solvation effects, backbone substates and electrostatic interactions between RNA and surrounding ions.
The main obstacles to an accurate molecular description of TL folding are limited sampling in molecular simulations and limited accuracy of the force field.31

Current atomistic simulations of nucleic acids are still based on second-generation pair-additive force fields derived around twenty years ago.2-3 There have been efforts to improve their performance by partial reparametrizations.32-38 Most of these works attempted tuning of the uncoupled one-dimensional backbone dihedral potentials, which is the most straightforward refinement. These rather unphysical tweaks are used for final tuning of the force field once the other parameters are fixed, and we certainly
