Mining for Recurrent Long-Range Interactions in RNA Structures

i “main” — 2018/4/3 — 9:38 — page 1 — #1 i i i Published online XX XXX XXXX Nucleic Acids Research, XXXX, Vol. XX, No. XX 1–10 doi:10.1093/nar/gkn000 Mining for recurrent long-range interactions in RNA structures reveals embedded hierarchies in network families Vladimir Reinharz 1,2, Antoine Soulé 2,3,EricWesthof4,JérômeWaldispühl2 and Alain Denise 5,6,ò 1Department of Computer Science, Ben-Gurion University of the Negev, P.O.B. 653 Beer-Sheva, 84105, Israel, 2School of Computer Science, McGill University, 3480 University, Montreal, Quebec, H3A 0E9, Canada, 3LIX, Ecole´ Polytechnique, CNRS, Inria, Palaiseau, 91120, France, 4ARN, Universitéde Strasbourg, IBMC-CNRS, 15 rue RenéDescartes, Strasbourg Cedex, 67084, France, 5LRI, UniversitéParis-Sud, CNRS, UniversitéParis-Saclay, Bâtiment 650, Orsay cedex, 91405, France and 6I2BC, UniversitéParis-Sud, CNRS, CEA, UniversitéParis-Saclay, Bâtiment 400, Orsay cedex, 91405, France Received XXX XX, XXXX; Revised XXX XX, XXXX; Accepted XXX XX, XXXX ABSTRACT loops, bulges, terminal loops. Additional long-range interactions, those that connect distinct secondary structure The wealth of the combinatorics of nucleotide base pairs elements (SSEs) in 3D structures, and non canonical base pairs enables RNA molecules to assemble into sophisticated or interactions make the molecule adopt its three-dimensional interaction networks, which are used to create complex 3D tertiary structure. substructures. These interaction networks are essential to RNA modules are small substructures which appear in shape the 3D architecture of the molecule, and also to multiple locations in a variety of different RNA molecules, provide the key elements to carry molecular functions such and which fold identically or almost identically. They as protein or ligand binding. They are made of organised are formed of assemblies of non-Watson-Crick base pairs, sets of long-range tertiary interactions which connect they mediate the folding of the molecule and they can distinct secondary structure elements in 3D structures. also constitute specific protein or ligand binding sites (21, Here, we present a de novo data-driven approach to 24, 25, 26, 27, 32). Well known RNA modules are, for example, GNRA loops, Kink-turns, G-bulges, and the A- extract automatically from large data sets of full RNA minor interactions. Identifying, characterizing RNA modules, 3D structures the recurrent interaction networks. Our understanding how they form and what are their relationships methodology enables us for the first time to detect the are key points for a better understanding of how RNA folds interaction networks connecting distinct components of the and interact with other molecules. RNA modules can be RNA structure, highlighting their diversity and conservation classified in two classes: through non-related functional RNAs. We use a graphical model to perform pairwise comparisons of all RNA structures • Local modules are located within secondary structure available and to extract recurrent interaction networks and elements: they are mainly formed of non-Watson-Crick base pairings inside the loops (internal, multiple or modules. Our analysis yields a complete catalogue of RNA terminal loops, or bulges) of the secondary structure. 3D structures available in the Protein Data Bank and reveals Most known modules are built mainly locally, as the G- the intricate hierarchical organization of the RNA interaction bulges and the Kink-turn loops (21, 25), but they can networks and modules. We assembled our results in an online also constitute an element of an interaction module. database (http://carnaval.lri.fr)whichwillbe regularly updated. Within the site, a tool allows users with • Interaction modules connect two distinct secondary a novel RNA structure to detect automatically whether structure elements (helices, loops or local modules). A the novel structure contains previously observed recurrent well known element of this class is the “A- minor” Type I/II (27, 28). interaction networks. Here we distinguish recurrent interaction networks (RINs) from interaction modules. As specified below, a INTRODUCTION recurrent interaction network does not contain any sequence RNA tertiary structures are highly modular. Canonical information, but only topological information about the Watson-Crick base pairs form what is called the secondary interactions between nucleotides and the nature of these structure, composed of helices interspersed with other interactions. Thus, a given RIN may be a constituent element secondary structure elements such as multiloops, interior of several other RINs. Further, when embedded in sequence òTo whom correspondence should be addressed. Tel: +33 (0) 1 69 15 63 69; Fax: +33 (0) 1 69 15 65 86; Email: [email protected] © XXXX The Author(s) This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/ by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. i i i i i “main” — 2018/4/3 — 9:38 — page 2 — #2 i i i 2 Nucleic Acids Research, XXXX, Vol. XX, No. XX space, a given RIN may participate in several types of other constraints which are developed below. These interaction modules. In other words, when mapped onto subgraphs are called interaction networks. sequence information, an identical recurrent interaction network can give rise to one or several interaction modules. 4. Finally we cluster the identical interaction networks A number of computational approaches have been together and create a network of direct inclusions. developed so far for finding automatically RNA modules in tertiary structures, either by geometric methods, or by We present in Sup. Mat. FigS1 a schema of the method, and algorithms based on graph theory (1, 2, 6, 9, 10, 12, 14, we detail it below. 15, 31, 33, 34, 36, 37). Most of these methods aim to find known modules in new structures. A few methods aim to Data search for modules without any prior knowledge of their The non-redundant RNA database maintained on geometry or topology (9, 15), but they only consider local RNA3DHub (30) on Sept. 9th 2016, version 2.92, was interactions. Databases, as the RNA 3D Motif Atlas (32), and used. It contains 845 all-atom molecular complexes with RNA Bricks (3) store information on the RNA modules which a resolution of at most 3A.˚ From these complexes, we have been found in experimentally determined RNA tertiary retrieved all RNA chains also marked as non-redundant by structures. RNA3DHub. Each chain was annotated by FR3D. Because Regarding especially recurrent interaction networks, apart FR3D cannot analyse modified nucleotides or those with a preliminary attempt (8), no automated method has been missing atoms, our present method does not include them developed up to now to detect them in tertiary structures either. If several models exist for a same chain, the first one and to classify them without any a priori knowledge of their only was considered. For the rest of this paper, the base geometry or topology. pairs extracted from the FR3D annotations are those defined We developed a graph-based methodology to extract all in the Leontis-Westhof geometric classification (23). They recurrent interaction networks in crystallized RNA tertiary are any combination of the orientation cis (c) (resp. trans structures and to cluster them according to their similarity. (t)) with the name of the side which interacts for each of We applied our methodology to a large set of experimentally the two nucleotides: Watson-Crick (W) cis c (or b for resolved RNA structures. Not only we retrieved the known trans), Hoogsteen (H) v (or u) or Sugar-Edge (S) Z (resp. recurrent interaction networks (as the different types of A- V). Thus, each base pair is annotated by a string from the minors), but we also extracted new ones. Our method gives 2 set: c,t ✓ W,S,H or by combining previous symbols. a global view on interaction networks and their modularity, { } { } by organizing them in families according to their inclusion To represent a canonical cWW interaction, a double line is relations. The publicly accessible database CaRNAval http: generally used instead of (cc). //carnaval.lri.fr allows to visually explore and study all the interaction networks and their intricate relationships. Secondary Structure We further analyze our data and expose the remarkable For each chain a secondary structure without pseudoknots diversity of the well known A-minor networks. In particular, was deduced from the annotated interactions, as follows. we show that an unexpected number of unrelated structures First all canonical Watson-Crick and wobble base pairs (i.e. form the exact same intricate network of interactions. A-U, G-C and G-U) were identified. Then, since many Furthermore, the diversity of the molecules in which several of structures are naturally pseudoknotted, we used the K2N (35) these networks are found (e.g. ribosomes, ribozymes and other implementation in the PyCogent (18) Python module to non-functionally related RNAs) underlines the universality remove pseudoknots. Problems arise when a nucleotide and fundamental nature of these recurrent architectures. is involved in several Watson-Crick base pairs (which is geometrically not feasible), probably due to an error of the MATERIALS AND METHODS automatic annotation. Those discrepancies were removed with a ad hoc algorithm such that if a nucleotide is involved in Given an mmCIF file from the PDB describing an RNA chain, several Watson-Crick base pairs, we remove the base pair the method presented here works in five steps. which belongs to the shortest helix. 0. We first build for the chain a directed graph such that the edges represent the phosphodiester bonds as well as Secondary Structure Elements and Skeleton Graph the canonical and non-canonical interactions. From the secondary structure, four types of secondary 1. From the annotations all canonical base pairs are structure elements (SSEs) are defined.

Mining for Recurrent Long-Range Interactions in RNA Structures

Conformational Transition of H-Shaped Branched Polymers

Electrically Sensing Hachimoji DNA Nucleotides Through a Hybrid Graphene/H-BN Nanopore Cite This: Nanoscale, 2020, 12, 18289 Fábio A

The New Architectonics: an Invitation to Structural Biology

De Novo Nucleic Acids: a Review of Synthetic Alternatives to DNA and RNA That Could Act As † Bio-Information Storage Molecules

Chain Structure Characterization

Why Does TNA Cross-Pair More Strongly with RNA Than with DNA? an Answer from X-Ray Analysis**

Bottlebrush Polymers in the Melt and Polyelectrolytes in Solution Share Common Structural Features

Lecture1541230922.Pdf

Statistical Properties of Linear-Hyperbranched Graft

REPLICATION of DOUBLE-STRAND NUCLEIC ACIDS Structure

Variation of Protein Backbone Amide Resonance by Electrostatic Field 1

Molecular Specification of Homopolymer of Vinylbenzyl-Lactose-Amide in Aqueous Solution