Published online 23 May 2011
Nucleic Acids Research, 2011, Vol. 39, Web Server issue W107–W111
doi:10.1093/nar/gkr248

The PETfold and PETcofold web servers for intra- and intermolecular structures of multiple RNA sequences

Stefan E. Seemann1,2, Peter Menzel1,2, Rolf Backofen1,3 and Jan Gorodkin1,2,*

1Center for Non-coding RNA in Technology and Health, 2Division of Genetics and Bioinformatics, IBHV, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg, Denmark and 3Bioinformatics Group, University of Freiburg, Georges-Koehler-Allee 106, D-79110 Freiburg, Germany

Received February 19, 2011; Revised March 28, 2011; Accepted April 5, 2011

*To whom correspondence should be addressed. Tel: +45 353 33578; Fax: +45 353 33042; Email: [email protected]

© The Author(s) 2011. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

The function of non-coding RNA genes largely depends on their secondary structure and the interaction with other molecules. Thus, an accurate prediction of secondary structure and RNA–RNA interaction is essential for the understanding of biological roles and pathways associated with a specific RNA gene. We present web servers to analyze multiple RNA sequences for common RNA structure and for RNA interaction sites. The web servers are based on the recent PET (Probabilistic Evolutionary and Thermodynamic) models PETfold and PETcofold, but add user friendly features ranging from a graphical layer to interactive usage of the predictors. Additionally, the web servers provide direct access to annotated RNA alignments, such as the Rfam 10.0 database and multiple alignments of 16 vertebrate genomes with human. The web servers are freely available at: http://rth.dk/resources/petfold/

INTRODUCTION

RNA folding follows a hierarchical pathway where the primary sequence dictates the formation of the secondary structure, which in turn leads to the functional tertiary structure. Since the RNA secondary structure dominates the free energy of the tertiary conformation, the secondary structure is critical for a large number of RNA mediated processes such as regulation of gene expression, RNA maturation, splicing and translation (1). Additionally, many non-protein-coding RNAs (ncRNAs) and structured mRNA elements exhibit their function through binding to proteins, other RNA molecules or both. Besides post-transcriptional gene regulation through mRNA binding by microRNAs and siRNAs, RNA–RNA interactions occur between many small RNAs in bacteria, e.g. CopA–CopT and FinP–traJ 5′-UTR, small nucleolar RNAs and small nuclear RNAs as well as ribosomal RNAs for their methylation and pseudouridylation, or even between certain long non-coding RNAs (lncRNAs) and microRNAs to regulate their activity or guide RNA editing. Thereby, the accessibility of nucleotides in the RNA sequence, which is constrained by the intra-molecular structure, has a large impact on the possible interaction sites.

Thermodynamic methods, such as RNAfold (2) or Mfold (3), employ a dynamic programming algorithm to find the thermodynamically most stable secondary structure by minimizing the free energy of the folded molecule. However, it is known that due to several reasons, such as interactions with proteins or other RNAs and processing of RNAs, the minimum free energy (MFE) structure is often not the functional structure of the molecule. Instead, the functional structure is a suboptimal structure, often in the vicinity of the MFE structure. Comparative methods use information from multiple homologous sequences for secondary structure prediction by considering compensatory mutations that maintain important structural elements over the course of evolution, e.g. Pfold (4). The integration of both the thermodynamic and evolutionary paradigms into one model was the motivation behind PETfold (5) and we have shown that our combined PET model increases the accuracy of comparative structure prediction compared to previous methods. We further extended the PET model into PETcofold (6,7) for predicting conserved RNA–RNA interactions.

Two web servers for RNA secondary structure prediction from multiple sequence alignments have been developed previously, RNAalifold [Vienna RNA web suite (8)] and Pfold (4). As mentioned in (5), the approach here employs a scheme based on the principles of these two programs. Interactions between two RNA molecules can be predicted with, e.g. IntaRNA [Freiburg RNA Tools (9)]. However, simultaneous RNA cofolding and prediction of conserved interactions from multiple sequence alignments are not supported. Additionally, none of these web servers allows the direct access to such alignments of known ncRNAs and UTR elements as well as human genome blocks.

The web servers presented here offer the prediction of conserved RNA secondary structures (PETfold) and RNA–RNA interactions (PETcofold) on multiple sequence alignments. The web servers are accompanied by a graphical interface that enables optimal exploitation of the methods when working on a specific data set. As an example, we demonstrate PETfold's structure prediction from a multiple alignment of the microRNA lin-4. As an example for RNA–RNA interaction with PETcofold, we demonstrate the web server functionality on the bacterial inhibitory antisense-target RNA complex between the two conserved structured RNA sequences FinP and the 5′-UTR of traJ.

ALGORITHM

The PETfold algorithm is described in depth in (5) and the extended algorithm of PETcofold in (6). In short, the maximum expected accuracy (MEA) approach of PETfold integrates evolutionary structure reliabilities and thermodynamic structure probabilities in one scoring scheme. First, base pairs with high evolutionary reliability score are fixed to take functional conserved structure elements into account and then the weighted sum of reliabilities is calculated. The consensus structure with maximal expected overlap of a multiple alignment is computed by a Nussinov style algorithm using the combined reliabilities. The evolutionary reliability of paired and unpaired bases is calculated by Pfold from a phylogenetic tree, a substitution model and a stochastic context-free (SCFG) grammar stochastically describing the building of RNA structures. The thermodynamic probability of a base to be paired or unpaired is calculated by RNAfold as the partition function in the energetic equilibrium.

The PETfold algorithm is extended in PETcofold by hierarchical and constrained folding to predict conserved interactions between two RNA alignments taking the non-accessibility of highly reliable intra-molecular stems into account. The hierarchical folding consists of two steps: (i) intra-molecular folding of both RNA alignments

WEB SERVER

Usage

The web server offers a separate input page for PETfold and PETcofold. The required input is one multiple sequence alignment for PETfold or two multiple sequence alignments with at least three shared sequence identifiers for PETcofold in FASTA format. Both input pages also offer direct access to Rfam 10.0 (10) seed alignments with or without paralogs and/or human (hg18) 17-way MULTIZ alignments for predicting secondary structures and interactions. MULTIZ alignments are free of paralogs by definition, in contrast, many Rfam seed alignments contain several instances of a family in the same organism. Thus, both servers offer a no paralogs version of each Rfam seed that is mandatory for the input of PETcofold to get the same identifiers in both alignments. The no paralogs alignment has the Rfam identifiers mapped to organism-specific MULTIZ identifiers and only the sequence that has the closest distance to the mean distance in the phylogenetic tree is selected for each group of paralogs. In addition, the user can specify a phylogenetic tree in Newick format as well as one or two consensus secondary structures in dot-bracket format for calculating their score in the PET-model. Several other parameters can be set for each submission. The default values for both the PETfold and PETcofold models are set to the optimal values that we found in the application to previous data sets. Detailed description of the input formats and sample files are provided on the website.

After submission, the user can access his results via a link that contains a secure unique identifier. Optionally, an e-mail notification on the completion of the computation can be selected upon submission. The results of each submission are displayed on an HTML page, which is accessible up to 60 days after completion of the computation. The output page shows either the user-specified phylogenetic tree or the tree calculated by the neighbor joining algorithm. The PETfold or PETcofold plain text output includes the predicted (joint) consensus RNA secondary structure in dot-bracket notation, its score and length normalized score. An example of the PETfold output for a typical microRNA is shown in Figure 1. The alignment shows the conservation of the primary sequence and the reliability scores for base pairs, while the predicted consensus secondary structure is plotted underneath using the same color scheme. The dotplot shows the base pair reliabilities (upper triangle) and the base pairs that take part