Folding RNA/DNA Hybrid Duplexes
Total Page:16
File Type:pdf, Size:1020Kb
Vol. 00 no. 00 BIOINFORMATICS Pages 1–2 Folding RNA/DNA hybrid duplexes Ronny Lorenz 1,∗, Ivo L Hofacker 1,2 and Stephan H Bernhart 2,∗ 1Institute for Theoretical Chemistry, University Vienna, Wahringer¨ Straße 17, 1090 Vienna, Austria 2Research group BCB, Faculty of Computer Science, University Vienna 3Department of Bioinformatics, University of Leipzig, Haertelstraße 16-18, 04109 Leipzig, Germany Received on XXXXX; revised on XXXXX; accepted on XXXXX Associate Editor: XXXXXXX ABSTRACT (for a review see e.g. [11]) or even their dimers, there is, to our Motivation: While there are numerous programs that can predict knowledge, no program that also includes the possibility to predict RNA or DNA secondary structures, a program that predicts RNA/DNA the full secondary structure of RNA/DNA hetero-dimers. This may hetero-dimers is still missing. The lack of easy to use tools for be due to the lack of compiled energy parameters, as there are only predicting their structure may be in part responsible for the small stacking energies available today. We introduce here a possibility number of reports of biologically relevant RNA/DNA hetero-dimers. to predict the secondary structure of such hetero-dimers within the Results: We present here an extension to the widely used widely used ViennaRNA Package. ViennaRNA Package [6] for the prediction of the structure of RNA/DNA hetero-dimers. Availability: http://www.tbi.univie.ac.at/˜ronny/RNA/vrna2.html 2 APPROACH Contact: [email protected] 2.1 Adaptation of the ViennaRNA Package For the prediction of RNA/DNA hetero-dimers, three distinct energy 1 INTRODUCTION parameter sets (RNA, DNA and mixed) are necessary. In order to keep the changes within the inner recursions of the algorithm Nucleic acids have many important functions in biological systems, minimal, we use 8 instead of four bases and 24 instead of such as information carriers, catalysts and regulators. Both variants, six types of Watson-Crick or wobble base pairs for the energy DNA and RNA, can form complex structures by base pair computations. The look-up tables for the energy parameters grow interactions. The pattern of Watson-Crick or wobble base pairs a accordingly. However, this does not significantly contribute to molecule forms is called the secondary structure. A main difference memory consumption. Finally, we demand a fixed order for the between RNA and DNA is due to their usage in biological systems: input sequences: RNA first, DNA last. while DNA builds a dimer with its reverse complement almost all the time, RNA, being mostly single stranded, is more prone to fold back onto itself. However, if DNA is single stranded (e.g. during 2.2 Energy parameters replication, transcription, repair or recombination), it will also form Because we are only considering RNA/DNA hetero-dimers, not co- intramolecular base pairs. There are several examples where RNA polymers, we only need a subset of the energy parameters necessary and DNA interact via base pairing, generating a structure called R- for the full computations in the RNA or DNA case: we do not loop. They protect CpG islands from being methylated [2]. The encounter stacking of a RNA on a DNA base, for example. In stability of DNA/mRNA hybrids is reported to have an influence contrast, stacking parameters of RNA/DNA base pairs, multi-loop on the copy number of repeats [8] and can impair transcription penalties and the interior loop parameters are needed. For stacked elongation, promoting transcription-associated recombination [5]. pairs, i.e. perfect helices, the parameters have been determined Moreover, during the intensive search for new functional transcripts experimentally [7, 12]. Stacking parameters are the most important rendered possible by next generation sequencing technology, at least part of the energy model, as they account for the majority of 1000 novel lincRNAs were identified that may act as regulators RNA/DNA binding free energy. Unfortunately, for other mixed of transcription, and one possibility to do this is via base RNA-DNA loops no experimental data is available. As no high pairing between RNA and DNA (see [3] for a recent review). quality data set of RNA/DNA structures exists, parameters can not As a final biological example, DNA/RNA hybrids play a role be trained, and the performance of different parameter sets can not within the CRISPR pathway in bacteria and archaea [4]. On a be evaluated. The data we have is not sufficient to chose a function technical side, RNA/DNA dimers are often used in micro-array or for the derivation of the energy parameters (supp. fig. 1), so we PCR experiments. The ability to computationally investigate the simply use the average of RNA and DNA ER/D = (ER + ED)/2 properties of the structures of interacting RNA and DNA molecules for all missing values. We are aware that this formula gives an at best would help in the analysis of these dimers. While a number of crude approximation of the actual folding energies, but note that new programs exist that can predict RNA or DNA secondary structures parameters can easily be incorporated. We hope that the availability of prediction tools for RNA/DNA hetero-dimers will encourage ∗to whom correspondence should be addressed experimentalists to provide the missing energy parameters. c Oxford University Press . 1 Lorenz et al A) B) C) D) C A G A C G A C C A C U A A A C U C G C A A C G U A A A U U A G C C G G C U U CC C C G UA GU A G A A A U G C A U U C A U A U C G G C C U G U G U C C C C G C A U A G C C G C U C G C U C G A C U C C G G A U CC C AU G C G A G A C C U G U C U C A A C C U G C C A AU G C U A U A A G C G C C C C G C U C C G A U C G C C G G U C U U G G A A C C C C G C A U A U G G U C C A U C C C G C G A C C A U C A A C G C C U U C U G CC C C G G G A U U C CG A U C C G C U G G C A C CU G G C C C C C C A U C A G G U U A A A U C A A A C A G C C U U U AU G C C G U UC G AUC G A UA A U U G U C C A U U A C G A U U C U C U U C C A G C A U U A U G C A U C G C G tree edit distance U A U C U A U U A G G U C A G C A C A C G C A A G C C A A C C C A A U C U A A C U G A C G A U U U G A G U C G U A G C G C A G C G A A UA U G U A GCUCUAU G U G C A C C A C U U C A U C C G C A U C U A C G UC C C CGC G A A U A U A G A C A C UAC A G A U C U G A U C A U G C G A U G A A A U G A U G U U U G A U U A G A G G C A G C C G G G U C U U A C A A C G U A A C U A CAGU U G A A C A A C U U C A C G A A C A U A A A C U G G U C A A A A A C 200 400 600 800 G C U A U U G A C A G U A C A G C C C A A C U A G C C U C G U C U C G A A A A A C U U AC CA G C A G C C A UU C U G G C G G G G A C C C U A C A C G C U A U C U G U C C G G A G U G G A U U C A U C A UC U U U U A A A A A U C A U C U U C A C C C G A CC C C U A U A U U C C A G G A G AC A C A A U A A U U G U G A C C C 0 G C A G G C G U C U U C A C C A A Fig. 1. Using hybrid instead of RNA parameters leads to very different structure prediction as summarized in a boxplot of the tree edit distances (A) for all interactions between an ncRNA and a putative target region at human chr , XX to YY (20000 nt).