R2DT Is a Framework for Predicting and Visualising RNA Secondary Structure Using Templates

R2DT Is a Framework for Predicting and Visualising RNA Secondary Structure Using Templates

ARTICLE https://doi.org/10.1038/s41467-021-23555-5 OPEN R2DT is a framework for predicting and visualising RNA secondary structure using templates Blake A. Sweeney1,7, David Hoksza 2,7, Eric P. Nawrocki3, Carlos Eduardo Ribas 1, Fábio Madeira 1, Jamie J. Cannone4, Robin Gutell4, Aparna Maddala5, Caeden D. Meade5, Loren Dean Williams 5, ✉ Anton S. Petrov 5, Patricia P. Chan 6, Todd M. Lowe6, Robert D. Finn 1,8 & Anton I. Petrov 1,8 Non-coding RNAs (ncRNA) are essential for all life, and their functions often depend on their 1234567890():,; secondary (2D) and tertiary structure. Despite the abundance of software for the visualisa- tion of ncRNAs, few automatically generate consistent and recognisable 2D layouts, which makes it challenging for users to construct, compare and analyse structures. Here, we present R2DT, a method for predicting and visualising a wide range of RNA structures in standardised layouts. R2DT is based on a library of 3,647 templates representing the majority of known structured RNAs. R2DT has been applied to ncRNA sequences from the RNAcentral database and produced >13 million diagrams, creating the world’s largest RNA 2D structure dataset. The software is amenable to community expansion, and is freely available at https://github. com/rnacentral/R2DT and a web server is found at https://rnacentral.org/r2dt. 1 European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK. 2 Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic. 3 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA. 4 Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA. 5 School of Chemistry and Biochemistry, Center for the Origins of Life, Georgia Institute of Technology, Atlanta, GA, USA. 6 Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA. 7These authors contributed equally: Blake A. Sweeney, David Hoksza. 8These authors jointly supervised this work: Robert D. Finn, ✉ Anton I. Petrov. email: [email protected] NATURE COMMUNICATIONS | (2021) 12:3494 | https://doi.org/10.1038/s41467-021-23555-5 | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-021-23555-5 NA molecules are key components of a wide range of but they do not take advantage of the community-accepted lay- Rbiological processes, such as translation, splicing, and outs. In addition, some methods can show other issues such as transcription. For many RNAs, the 3D structure is essential overlaps and a lack of visual clarity (Fig. 1). Some packages, like for biological function. For example, ribosomal RNA (rRNA) and the SSU-ALIGN21 can generate 2D structure diagrams of SSU transfer RNA (tRNA) adopt very specific, evolutionarily con- rRNA following the CRW layout but display only a fixed number served 3D conformations in order to perform translation, and of consensus positions. RNA aptamers can specifically recognise small molecules and The lack of tools for visualising RNAs in consistent, repro- other ligands by virtue of their 3D structures. The architecture of ducible, and recognisable layouts, makes comparing RNA struc- a structured RNA molecule is hierarchical, whereby the RNA tures difficult for RNA biologists and essentially impossible for sequence (primary structure) folds into local elements that, in non-specialists. turn, interact with each other to form the 3D structure1. The Here we fill a fundamental gap in visualising structured majority of intramolecular contacts in most ncRNAs can be RNAs by introducing R2DT (RNA 2D Templates). R2DT represented in the form of 2D structure diagrams, which are far encapsulates a comprehensive pipeline for template-based more accessible and can present a broader variety of information RNA 2D visualisation, generating diagramatic 2D representa- than the corresponding 3D structures. tions of RNA structures based on a representative library of Many RNAs are visualised following standard, community- templates, and is implemented as both a standalone application accepted conventions. For example, the 2D diagrams from the (https://github.com/rnacentral/R2DT) and a web server Comparative RNA Web Site2 (CRW) have been used for decades (https://rnacentral.org/r2dt). The framework can be easily and are widely accepted as standard for rRNA visualisation. updated and extended with additional templates, and it has Similarly, tRNAs are traditionally displayed in a cloverleaf layout been extensively tested on > 13 million sequences from with the 5′- and 3′- ends located at the top, the anticodon loop RNAcentral22, a comprehensive database of ncRNA sequences pointing towards the bottom, and the D- and T- loops facing left (see Validation for more information). and right, respectively3. Both of these representations capture important structural and functional features, providing valuable Results insights into the RNA structure and function. However, the Automatic pipeline for template selection and 2D structure construction of RNA secondary structures requires manual cura- visualisation. We developed a computational pipeline that uses a tion, which does not scale with the large numbers of sequences template library to define standard layouts for different types of being generated by modern molecular biology techniques. RNA. A minimal template contains a reference sequence, as well There are many automated approaches for visualising RNA 2D as cartesian coordinates for each nucleotide, and a 2D repre- structure that can be broadly categorised into one or more of the sentation of the structure in dot-bracket notation that encapsu- following groups based upon the type of visualisation and the lates the canonical Watson-Crick base pairs. Some templates also algorithm used to generate it. Visualisations may (1) Use dot plots contain the wobble GU base pairs, but non-canonical base pairs where the X and Y axes represent the RNA sequence and the base are not currently included in the templates (see the next section pairs are shown as dots in Cartesian space (for example iDotter4); for the detailed description of the template library). or (2) Arc plots5 which represent the RNA as a line and inter- To enable automatic template selection, for each template a actions as arcs connecting paired nucleotides (for example covariance model is generated using Infernal23 based on the R-Chie6); and finally (3) Circular diagrams, which are effectively reference sequence and its 2D structure. The R2DT pipeline arc diagrams folded into a circle (for example VARNA7). includes the following steps: Algorithmically, the approaches may use (1) Force-directed lay- outs where nucleotides represent the nodes of a graph and base 1. For each input sequence, the top scoring covariance model pairs as well as other interactions are shown as lines connecting is selected using the ribotyper.pl programme in the the nodes. These layouts are governed by the attraction and Ribovore software package (version 0.40) (https://github. repulsion between the nucleotides and interactions (for example com/ncbi/ribovore). For model selection, ribotyper.pl runs 23 fi Forna8); or (2) Rule-based methods9, such as RNAView10, the Infernal cmsearch programme and uses a pro le 3DNA11, PseudoViewer12, R2R13, RNA2Drawer14, jViz15, HMM derived from the covariance model that scores RNApuzzler16 and many others, for a more comprehensive sequence only and ignores secondary structure to limit fi review see17,18. However, none of these methods achieves all five running time. If Ribovore does not nd any matches, 24 properties of ideal RNA 2D structure diagrams18,19: they should tRNAscan-SE 2.0 is used to search query sequences be modular to reflect functional domains, appear similar for against the tRNA models. To speed up template selection, related structures despite sequence or minor structural changes to the library is divided into several subsets which are allow for easy comparison, avoid overlaps and ensure visual processed separately (Rfam, LSU and SSU RiboVision clarity, be a realistic representation of the 3D structure (if rRNAs, CRW, and tRNA templates). If a sequence is fi fi known), and finally be aesthetically pleasing18,19. classi ed to a template model in one of the subsets (de ned “ ” In general, existing tools do not produce diagrams in widely as being designated PASS by ribotyper.pl without a “ ” fl accepted layouts, and in some cases, homologous or even iden- MultipleHits ag) then the remaining subsets are not tical sequences are displayed in different layouts that can be hard searched. In cases where both a 3D-based and a to analyse and compare17. Specifically, while dot plots are useful covariation-based template are available for the same for comparative analysis in the hands of expert users20, they are RNA, the 3D-based template is preferentially selected. difficult for users to interpret; arc plots, which show sequence The Ribovore software is used to search against all models conservation and base pairs, can be prohibitively large and make except for tRNA. If no hits are detected, tRNAscan-SE 2.0 is it difficult to examine small structural changes; circular diagrams then used to compare the sequences against the bacterial, fi can be uninformative and can make small structural changes hard archaeal, and eukaryotic domain-speci c tRNA models. fi to analyse; while force-directed layouts are not always robust Once a top scoring domain-speci c tRNA model is chosen, fi against changes in the sequence or structure, making a compar- the sequence is compared with the isotype-speci c tRNA ison across related structures difficult. The rule-based methods models for that domain. It is also possible to specify a fi can reliably produce diagrams even for large RNA structures16, template and bypass the classi cation step. If the sequence 2 NATURE COMMUNICATIONS | (2021) 12:3494 | https://doi.org/10.1038/s41467-021-23555-5 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-021-23555-5 ARTICLE Fig. 1 Examples of 2D structures of the Thermus thermophilus SSU rRNA.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    12 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us