Annotating RNA Motifs in Sequences and Alignments Paul P
Published online Xxxx 2014
Nucleic Acids Research, 2014, Vol. XX, No. YY 1–37
doi:10.1093/nar/gkn000

Supplementary material: Annotating RNA motifs in sequences and alignments

Paul P. Gardner1,∗ and Hisham Eldai1,∗

1School of Biological Sciences, Biomolecular Interaction Centre, University of Canterbury, Private Bag 4800, Christchurch 8140, New Zealand

Received July, 2014; Revised July, 2014; Accepted July, 2014

∗To whom correspondence should be addressed. Tel: +64 3 364 2987; Fax: +64 3 364 2590; Email: [email protected]

© 2014 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

[11:22 5/11/2014 supplementary-results.tex] Page: 1 1–37

SUMMARY

In this document we present supplementary methods, results and figures relating to the RMfam resource:

1. Figures 1-8 show secondary structure diagrams for each of the RMfam motifs. Figure 1 contains a legend detailing the color and symbol schemes used to illustrate the different evolutionary constraints on the structures.
2. Figure 9 illustrates our estimates of the accuracy of using covariance models to annotate RNA motifs on sequences and alignments.
3. Figures 10-43 contain secondary structures and the results of per-motif benchmarks.
4. Figures 44 and 45 illustrate improvements to Rfam (v11.0) alignments and consensus structures based upon RMfam annotations.
5. Figure 46 illustrates the network of the 50 highest-scoring RMfam-to-Rfam mappings.
SECONDARY STRUCTURES

[Figure 1 image: legend and tetraloop secondary structure diagrams. Legend entries: base-pair annotations (covarying mutations, compatible mutations, no mutations observed); "nucleotide present" and "nucleotide identity" markings at thresholds between 40% and 80%; R = A or G, Y = C or U.]

Figure 1. A legend describing the symbols used in all the secondary structure images presented in Figures 1-8. Secondary structure diagrams of tetraloops: ANYA (1, 2, 3), CUYG (4, 5, 6, 7), GNRA (8, 9, 10, 11, 12, 13), UMAC (14, 15) and UNCG (10, 12, 13, 16), and of the hairpin loops C-loop (17, 18, 19, 20), T-loop (12, 13, 21, 22, 23) and U-turn (12, 13, 24, 25).

[Figure 2 image: hairpin loop secondary structure diagrams; one variable-length region is labelled 5-46 nt.]

Figure 2. Secondary structure diagrams of the hairpin loops: C-loop (17, 18, 19, 20), T-loop (12, 13, 21, 22, 23) and U-turn (12, 13, 24, 25).

[Figure 3 image: internal loop secondary structure diagrams; variable-length regions are labelled 0-38 nt, 3-81 nt, 4-40 nt and 0-57 nt.]

Figure 3. Secondary structure diagrams of internal loops: three k-turns (3, 12, 13, 18, 26, 27, 28) and two sarcin-ricin loops (12, 20, 29, 30).

[Figure 4 image: secondary structure diagrams; variable-length regions are labelled 4-88 nt, 2-22 nt, 0-31 nt, 28-249 nt, 1-12 nt, 0-7 nt and 0-36 nt.]

Figure 4.
Secondary structure diagrams of internal loops: the tandem-GA (20, 31), twist up (17), UAA GAN (32), docking elbow (33), and right angle 2 and 3 (34) motifs.

[Figure 5 image: secondary structure diagrams.]

Figure 5. Secondary structure diagrams of Rho-independent transcription terminators (35).

[Figure 6 image: secondary structure diagrams.]

Figure 6. Secondary structure diagrams of interactions: the AUF1 (36), CRC (37, 38, 39), CsrA (40, 41, 42, 43, 44, 45, 46, 47), HuR (48, 49), Roquin (50) and VTS1 (51, 52, 53, 54) protein-binding motifs.

[Figure 7 image: secondary structure diagrams.]

Figure 7. Secondary structure diagrams of the vapC target (55), the SRP RNA S domain (56, 57, 58) and the catalytic Domain-V (59, 60).

[Figure 8 image: three panels, each a WebLogo 3.1 sequence logo shown above a consensus diagram. Panel titles: "Shine-Dalgarno sequence from Bacillus subtilis subsp. subtilis str. 168"; "Shine-Dalgarno sequence from Escherichia coli str. K-12 substr. MG1655"; "Shine-Dalgarno sequence from Helicobacter pylori 26695". Vertical axis: bits (0.0 to 2.0). Horizontal axis: distance from start codon in nucleotides (-30 to 5).]

Figure 8.