Carnival of SL : Structural variants and the possibility of a common origin

Manja Marza, Nathalie Vanzof, Peter F. Stadlera,b,c,d,e

aBioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, H¨artelstraße 16-18, D-04107 Leipzig, Germany bMax Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany cFraunhofer Institut f¨ur Zelltherapie und Immunologie – IZI, Perlickstraße 1, D-04103 Leipzig, Germany dDepartment of Theoretical Chemistry University of Vienna, W¨ahringerstraße 17, A-1090 Wien, Austria eSanta Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA fCentre de Biologie du D´eveloppement, UMR 5547 C.N.R.S. Universit´ePaul Sabatier, F-31062 Toulouse Cedex, France

Abstract SL RNA trans-splicing so far has been described in seven eukaryotic phyla. A careful re-evaluation of predicted SL RNA secondary structures reveils striking similarities among Euglenida, Kinetoplastida, Dinophyceae, cnidarians, rotifers, nematods, platyhelminths and tunicates. Together with motifs, our analysis suggests an early common origin of spliced leader trans-splicing. Key words:

Introduction later, related RNAs were found in Euglena gracilis [4]. The first metazoan examples were the nematode Many eukaryotes have two fundamental modes of [5] and in platyhelminth Schis- spliceosomal splicing. cis-splicing is the excision of in- tosoma mansoni [6]. Many more examples were soon trons. In trans-splicing,on the other hand, a short leader found in related species see Tab. 2, but it took until the sequence is transfered to the 5’ end of a (typically pro- turn of the millenium before SL RNAs were discovered tein coding) mRNA, which is usually processed from in additional metazoan phyla (cnidaria [7], tunicates a polycistronic transcript. This leader contains the 5’ [8, 9], rotifera [10]), and in dinoflagellates [11, 12]. hypermodified cap structure necessary for translational In some species, multiple divergent copies of the SL initiation [1]. In all cases so far, the leader sequence is RNA have been reported [13, 4, 14, 10], and some derived from small non-coding RNA, the spliced-leader groups of species harbour two or more clearly distinct (SL) RNA, which has a common organization, Fig. 1. types of SL RNAs [15, 7, 14]. Of these SL1 and SL2

SL RNA pre−mRNA are distinguished also in the Rfam [16]. On the other / Leader Intron hand, several model organism do not seem to utilize trans-splicing: despite substantial efforts, no evidence m*Gppp GU Sm AG AUG for trans-splicing could be gathered for Drosophila melanogaster, Saccharomyces cerevisae, and Arabidop- m*Gppp G AUG sis thaliana [17, 18], and no evidence for trans-splicing has been reported in vertebrates despite the availability Figure 1: Schematic drawing of a typical SL RNA of extensive transcriptomics data. The unexpectedly scattered distribution of SL trans- The first SL RNAs were discovered in kinetoplas- splicing across the phylogeny of Eukarya has prompted tids a quarter of a century ago [2, 3]. A few years interest in the evolution of trans-splicing already two decades ago. To-date the two major competing hypothe- ses (reviewed in [7, 17, 18]) still remain unresolved: Email addresses: [email protected] (Manja Marz), [email protected] (Nathalie Vanzo), [email protected] (Peter F. Stadler) 1. SL RNAs have a common origin (Fig. 2a). Based Preprint submitted to Preprint May 20, 2009 adprmtr ie,atmeaueof temperature a (i.e., parameters dard on ntesplmna aeil ayo h pub- the using of obtained Many were structures material. lished supplemental be the can in alternatives found structural ambient Additional optimal organism’s temperature. each to adjusted pa- with rameters computed that structures selected with together structures secondary of Re-evaluation Results SL- Eukarya. towards of the evolution of leans origin strongly common evidence a our provide proof, cannot definite we few as Although very phyla. show across themselves similarities sequences alternatives the structural though secondary similar even similar admit fairly even into that fold structure can fact that in find We RNAs SL apparent problems. methodological the to of attributed be much that find di We studies. previous sec- the to [19]. lead e.g. has see hypothesis, structure ond secondary RNA SL of ity di between from homology RNAs sequence SL detect to failure the while SL- func- of similarities the mechanistic by and primarily tional supported is hypothesis first The iue2 vlto fS Ns(7 7) a LRA aea have RNAs improbable. SL be independen to (a) show derived here RNAs we 17]). which SL clades, (b) ([7, seven in RNAs ancestor. SL eukaryotic of common Evolution 2: Figure .S Nsaoeo utpeocsos(i.2b) (Fig. occasions multiple on arose RNAs SL 2. ff a.2smaie h ulse eodr structures secondary published the summarizes 2 Tab. the from structures secondary the re-evaluate we Here Euglenida rne ewe LRAscnaysrcue can structures secondary RNA SL between erences

ntefc,ta l LRA aetesm func- monocistronic same to mRNAs. transcripts the II have Pol RNAs resolving SL of tion all that fact, the on Kinetoplastida Arabidopsis thaliana Dinophyceae Saccharomyces cerevisae (a) Cnidaria Rotifera

ff Nematoda

rn hl n h paetdispar- apparent the and phyla erent Drosophila melanogaster Platyhelminths Tunicates Homo sapiens trans

Euglenida

slcn al nthe in early -splicing Kinetoplastida Arabidopsis thaliana mfold Dinophyceae

trans Saccharomyces cerevisae (b) T Cnidaria ihstan- with

= Rotifera -splicing, Nematoda 37 Drosophila melanogaster

◦ Platyhelminths C), tly Tunicates Homo sapiens 2 elcnevdpr ftegnmclocus. genomic the of part well-conserved the expanded therefore intestinalis We [9]. reported previously tween Tunicata. sec- phyla. known other the in structures to ex- ondary The similarly folds 2. Tab. then see shorter, sequence nt that tended 16 suggest be would would RNA SL which this site, binding Sm alternative sntfre.For substructure formed. this not that is suggesting strongly param- all settings, under eter energy folding positive a has [22] RNA mbnigst.Asml euneaineto this of of RNA alignment SL sequence the simple with sequence A site. binding Sm hsS N s2n hre hntepbihdse- published the than proposed shorter with 24nt consistent is quence, RNA unpaired. SL that be furthermore, this likely, to it site makes analysis binding con- structural Sm The additional the an forces Moreover, without that and straints at with work. unfavourable consistent temperatures, energetically same not all is the is structure from that reported data site the EST splice the donor a with with [11] in Dinoflagellata. section. Methods the in mentionend pub- di and the corrections para- Small from next data. drastically the lished ques- deviates in analysis detail in several our in In organisms discuss graphs, will [20]. the we ones which of unicellular cases, the most particular for in tion, lethal is which aeS Nsaepooe n[2.Frboth For [12]. in proposed crum are RNAs di SL Completely late [11]. nal mpoen n 1sRAse ob osre ndi- The in conserved be [21]. to noflagelates seem snRNA U1 and proteins Sm the di cor- of major If nization indicate downstream. would of this instead rect, site splice the of stream Euglenida. LRA fohrpya ealdainet a be www.bioinf.uni- can at leipzig.de alignments Supplement Electronic Detailed of the in organization phyla. found other common of the RNAs to SL conform that models 9to h ’sd otepbihdsqec rmthe ( from and GenBank sequence EF143082.1 side published from 5’ the available DNA the to side genomic on 3’ 4’nt the Adding on work 79nt truncated. the are from [12] sequences RNA of SL dinoflagelate the or LRAi lal osre mn hs alveolates. published these the among either conserved Thus, clearly is RNA SL and .intestinalis C. / .piscicida P. Publications ihntncts h eunesmlrt be- similarity sequence the tunicates, Within eune2n obt ie,cvrn the covering sides, both to 20nt sequence tmI ftepublished the of IV Stem , EF143084.1 trans The .sulcatum E. slcn ahnr.Hwvr the However, machinery. -splicing and .brevis K. h mbnigst ssonup- shown is site binding Sm the / .minimum P. SUPPLEMENTS .dioica O. ff ,w aiyoti structure obtain easily we ), rn oeso dinoflagel- of models erent .brevis K. .brevis K. LRAwsreported was RNA SL eietf possible a identify we ff rne nteorga- the in erences smc ihrthan higher much is A ff srpre without reported is 5 rne r briefly are erences smc o long, too much is emnto sig- termination Entosiphon hw htthe that shows / 09-009. EF143079.1 .mi- K. SL C. , h.contortus.1 # STOCKHOLM 1.0

t.spiralis.1 p.pacificus.1 c.elegans.1 GG TTT.....AA.TTACCCAAGTTT....GAG p.pacificus1 GG TTT.....AA.TTACCCAAGTTT....GAG h.contortus1 GG TTT.....AA.TTACCCAAGTTT....GAG c.elegans2 GG TTT..TA...... ACCCA.GTTACT.CAAG h.contortus2 GG TTT..TA...... ACCCA.GTATCT.CAAG p.pacificus2 GG TTTAT...... ACCCA.GTATCT.CAAG c.elegans.1 a.ricciae1 GGC TTATTACAACTTA.CCAAG...... AG philodina GGC TTATTACAACTTA.CCAAG...... AG t.spiralis.2 t.spiralis GG.. TAT...... TTA.CCA.G.ATCTAAAAG //

Figure 4: A sequence feature shared between rotifer and nematod SL RNAs. c.elegans.2 0.1 h.contortus.2 p.pacificus.2 2 2

Figure 3: Neighbor-net analysis confirms that nematode SL1 and SL2 1 1 bits RNAs form distinct paralog groups, with the possible exception of bits Trichinella. TTTTTT T CAAAAA GCGCG G 0 AGCG C CT 0 CGA G 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 A A1 T

10 A GGT TTT G Sm Binding Motif. In most of the previous publications, (a) (b) structures were computed with the constraint of an ex- ternally unpaired Sm-binding site. However, most of the Figure 5: Consensus of (a) Donor splice site and (b) Sm binding site sequences can fold in a haipin structure in which (most of SL RNA for all seven clades. of) the Sm-binding site is located in an accessible loop, as reported e.g. for Euglena [4] and Hydra [7]. There- the complete data set, Fig. 5). No other sequence sim- fore we report structures with and without a constraint ilarities have been reported previously, and a sequence on the Sm-binding site in Tab. 2. alignment does not pick up any additional motifs. How- ever, a comparison of the IUPAC-code consensus se- Phylum specific alignments quences of the 7 phyla identifies several weak sequence The SL RNAs are fairly well conserved as sequence features, Fig. 6. Details can be found in the Electronic level within each of the 7 phyla. Alignments can be Supplement. found in the Electronic Supplement. An exception is the 3’ part of the T. brucei SL RNA, which has a long • Upstream of SL RNA stem I, there is an insert upstream of the Sm-binding site. A/C-rich region containing an occasional T. We ∗ The two paralogous SL RNA in Hydra are probably denote this region by H since H= {A,C,T}. recent duplicates, sharing a highly conserved 17nt long This G-poor sequence conform to the proposed block in the 3’region. Neither paralogs has a recogniz- able homolog in Nematostella vectensis. Exon Intron like element Most nematodes have multiple SL RNAs. Within Donor SL start splice site Rhabditina, there are clearly dicernible paralog groups Sm binding SL1 and SL2 [23]. The SL RNAs of Trichinella (clade site Dorylaimia) [14] do not fit well into this scheme, how- [H][]**[H]* D[]G GU D]* [[]... V]6+ [... []B** [...] []B Initiator loop highly 5’ Stem I 3’ Stem I ever ( Fig. 3). variable A sequence-based alignment between two adjacent region phyla shows no conserved regions, as expected. How- ever, aligning just exonic parts reveals some similarities Figure 6: Common sequence features of SL-RNAs: Initiator lacking between rotifers and nematods, Fig. 4. of G, a variable length of Stem I, which 5’ part consists mainly of A,C,T followed by a loop, which for SL RNA-α contains mainly T and for SL RNA-β A,G. The Donor Splice site is located downstream Ubiquitous Sequence Features of the loop in the hairpin with the highly conserved GGU-motif. Stem I continous nearly without any C. After a highly variable region, the Not surprisingly, the donor splice-site, G|GU, and the known Sm binding site with the common motif AATTTTTGG and a U-rich Sm-binding site are well conserved within each possible unpaired region a last part often structured as a hairpin shows phylum and can be detected easily using Meme [24] in no A to be paired. 3 initiator sequences UNCU in euglenoids [19], Secondary Structure Analysis YA+1NU/AYYY generally observed in metazoans The secondary structures obtained by folding SL [25], YYHBYA+1ACU described for trypanosomes RNAs with and without constraints, and those reported [26] and CA+1AUCUC in K. brevis [11]. in the literature, can be classified into 10 groups. Using each of these structures as constraints, we computed the • The loop and 3’ part of the first hairpin are de- folding energies of all known SL RNAs. Tab. 1 lists the pleated in C. This may be associated with con- results, which can be summarized as follows: straints associated with the splice-site and/or the binding affinity between SL RNA to mRNA. 1. Most euglenidSL RNA foldsinto 4 hairpins. How- ever, other phyla such as cnidaria, rotifers, or tuni- • The 5’ part of the first hairpin (most of the exon) cates (due to the shorter sequence) never fold into shows a lack of G, explained by the pairing with such a structure. the D= {A, G, T} region mentioned above. This 2. The SL RNAs of all species can fold into a single H-block is less well-conserved than the 3’ D-block hairpin including both the donor splice site and a due to the possibility of G-U pairing. A succession Sm binding site. Except for Oikopleura, however, of A-U pairs followed by U-A pairs in the outer part this structure is never energetically preferred. of stem I (just “below” the splice site) was reported 3. Stem I upstream of the Sm-binding site is highly in [19] for euglenoids. In several other clades such conserved. All SL RNAs except that of Oikopleura a stringent pattern is not visible, however. can form this structure. In most cases, this struc- ture is also thermodynamically highly favoured, • The loop of stem I shows a clear prevalence of W= (see Tab. 1, 3rd and 4th column). {A,U}. Depending on the A/U-ratio, two subtypes 4. For the Sm-binding site either a completely un- can be distinguished (see below). bound external structure or a mostly unbound lo- cation within a hairpin loop were discussed. Both • The donor-splicing site is highly conserved with structural models are plausible, the hairpin variant the sequence G|GU, Fig. 5. It is always located is always energetically favorable. downstream of loop I. 5. In most SL RNAs, stem I folds can attain two dif- ferent hairpins, see below (Fig. 7). • The region between stem I and the Sm-binding site is highly variable not only between but also within Interestingly, the sequence underlying stem I can each phylum. form two alternative structures, Fig 7. Since the sequences are highly divergent between phyla, it is • The SM binding site consists of a highly conserved very unlikely to observe the same structural alternative D-region. The common pattern is AAUUUUUGG, throughout the entire dataset. We therefore conclude Fig. 5b, with the sole exception of Oikopleura that the conformational change between the two alter- dioica. natives is required for SL RNA function. Secondary structure alignments show common fea- • Stems downstream of the Sm binding site show a tures within each phylum, as well as for rotifers and ne- highly conserved C,G-rich B= {C,G,U} stem. The matods together. Weak signals fuer conserved structure loop in contrast is A-rich. elements were obtained by aligning all prostomia to- gether, Fig. 8. The corresponding alignments are com- Taken together, we find recognizable sequence con- piled in the Electronic Supplement. straints covering almost the entire SL RNA . The loop region of stem I clearly distinguished two Discussion sub-types of SL RNAs. In class α, the loop consists mostly of Us, while the loop in class β is essentially free SL RNAs share a common function in trans-splicing of Us. In most metazoans with more than SL RNA (e.g. [18, 17]. They provide a short exonic leader sequence C. elegans, H. contortus, P. pacificus, T. spiralis, and S. with an unusual trimethyl-guanosine cap and they play mediterannea both types are present. The two rotifers A. an active role in the spliceosome-catalyzed processing ricciae and Rhilodina sp., as well as C. intestinalis have by virtue of binding to the Sm protein. SL RNAs are only type β, while otherwise type α appears to be preva- found in wide range of eukaryotic phyla, they are con- lent. In the cnidarian Hydra the classification remains spicuosly absent in many major clades suggesting a ambiguous. complex evolutionary history. 4 Table 1: Secondary structures, their MFEmin for all constraint folded MFEcons and their ratio of contraint folding to minimum MFE (MFEcons/MFEmin).

Organism T MFEmin E. gracilis 29 -35.02 -35.02 1.00 -23.38 0.67 -20.51 0.59 -6.22 0.18 -22.52 0.64 -27.15 0.78 -21.54 0.62 -31.50 0.90 -29.27 0.84 -21.94 0.63 P. curvicauda 22 -41.51 -41.51 1.00 -34.19 0.82 -24.35 0.59 -11.65 0.28 -32.71 0.79 -32.86 0.79 -23.18 0.56 -34.98 0.84 -35.13 0.85 -25.45 0.61 C. acus 22 -54.10 -54.10 1.00 -40.10 0.74 -27.01 0.50 -13.84 0.26 -46.80 0.87 -42.92 0.79 -33.50 0.62 -49.63 0.92 -40.73 0.75 -28.97 0.54 R. costata 23 -42.69 -42.69 1.00 -32.29 0.76 -17.57 0.41 -6.61 0.15 -27.23 0.64 -24.78 0.58 -14.38 0.34 -32.01 0.75 -30.09 0.70 -19.69 0.46 M. pellucidum 23 -35.73 -35.73 1.00 -28.32 0.79 -15.28 0.43 -7.66 0.21 – 0 -22.26 0.62 -20.05 0.56 -23.67 0.66 -27.55 0.77 -17.75 0.50 E. sulcatum-long 25 -35.98 – 0 – 0 – 0 -7.13 0.20 -35.98 1.00 -33.35 0.93 -14.13 0.39 – 0 – 0 – 0 *E. sulcatum-short 25 -35.07 -16.47 0.47 -19.29 0.55 -11.84 0.34 -7.15 0.20 – 0 -35.07 1.00 -33.86 0.97 – 0 -33.66 0.96 -27.13 0.77 T. cruzi 23 -44.05 -34.93 0.79 -38.23 0.87 -32.84 0.75 -24.96 0.57 -33.17 0.75 -38.58 0.88 -42.24 0.96 -34.75 0.79 -44.05 1.00 -43.74 0.99 T. vivax 23 -53.64 -46.90 0.87 -53.64 1.00 -43.94 0.82 -22.16 0.41 – 0 – 0 -42.79 0.80 -48.25 0.90 -51.40 0.96 -48.61 0.91 T. brucei 23 -75.05 -60.45 0.81 -75.05 1.00 -34.05 0.45 -15.95 0.21 -48.89 0.65 -45.54 0.61 -44.64 0.59 -55.73 0.74 -71.99 0.96 -43.42 0.58 *T. brucei5’SM 23 -80.31 L. collosoma 28 -27.43 – 0 -27.43 1.00 -21.07 0.77 -15.60 0.57 – 0 -23.54 0.86 -22.69 0.83 – 0 -21.85 0.80 -22.53 0.82 C. fasciculata 20 -30.83 -25.40 0.82 -30.83 1.00 -29.69 0.96 -21.44 0.70 -21.20 0.69 -23.13 0.75 -30.10 0.98 – 0 -30.76 1.00 -29.62 0.96 L. enriettii 36 -24.72 -12.68 0.51 -17.26 0.70 -11.00 0.44 -10.36 0.42 – 0 – 0 -24.72 1.00 – 0 – 0 – 0 K. brevis 20 -45.34 -38.06 0.84 -45.34 1.00 -40.31 0.89 -22.47 0.50 – 0 – 0 -33.03 0.73 – 0 -21.04 0.46 -22.77 0.50 *K. brevis3’SM 20 -50.23 -41.21 0.82 -39.88 0.79 -22.74 0.45 -19.30 0.38 -45.41 0.90 -50.23 1.00 -39.31 0.78 -28.14 0.56 -26.02 0.52 -16.17 0.32 K. micrumlong 20 -54.66 – 0 -53.29 0.97 -53.56 0.98 -26.06 0.48 – 0 -23.93 0.44 -30.62 0.56 – 0 -47.97 0.88 -54.66 1.00

5 Hydra-A 18 -18.06 – 0 -10.96 0.61 -9.26 0.51 -6.60 0.37 – 0 – 0 -18.06 1.00 – 0 -12.63 0.70 -14.35 0.79 Hydra-B1 18 -18.80 – 0 – 0 -18.80 1.00 -7.63 0.41 – 0 – 0 -12.19 0.65 – 0 – 0 -18.03 0.96 Hydra-B2 18 -18.03 – 0 -18.03 1.00 -11.79 0.65 -8.20 0.45 – 0 -15.86 0.88 -17.23 0.96 – 0 -14.70 0.82 -10.85 0.60 Hydra-B3 18 -24.27 – 0 -11.62 0.48 -8.82 0.36 -8.20 0.34 – 0 -22.17 0.91 -20.88 0.86 – 0 -24.27 1.00 -14.93 0.62 A. ricciae 24 -46.48 – 0 – 0 -36.14 0.78 -6.76 0.15 – 0 – 0 – 0 – 0 – 0 -46.48 1.00 Philodina sp. 20 -49.82 – 0 – 0 -42.28 0.85 -7.77 0.16 – 0 – 0 – 0 – 0 – 0 -49.82 1.00 C. elegans 1 20 -44.73 -30.04 0.67 -43.54 0.97 -28.75 0.64 -12.27 0.27 -28.25 0.63 -33.80 0.76 -12.27 0.27 – 0 -44.73 1.00 -31.67 0.71 C. elegans 2 20 -58.38 -43.52 0.75 -57.14 0.98 -47.88 0.82 -23.35 0.40 -33.79 0.58 -32.85 0.56 -34.06 0.58 -52.20 0.89 -58.38 1.00 -49.20 0.84 T. spiralis 1 36 -30.15 -20.92 0.69 -27.80 0.92 -21.08 0.70 -8.34 0.28 – 0 -17.41 0.58 -19.06 0.63 – 0 -30.15 1.00 -20.71 0.69 T. spiralis 2 36 -19.51 – 0 -19.51 1.00 -8.40 0.43 -7.91 0.41 – 0 -17.78 0.91 -11.74 0.60 – 0 -18.27 0.94 -11.81 0.61 P. pacificus 1 20 -54.23 -19.16 0.35 -50.76 0.94 -36.91 0.68 -13.79 0.25 -38.17 0.70 -36.34 0.67 -37.31 0.69 -54.03 1.00 -54.23 1.00 -37.30 0.69 P. pacificus 2 20 -51.68 -48.85 0.95 -48.98 0.95 -39.48 0.76 -19.94 0.39 -29.28 0.57 -32.14 0.62 -38.51 0.75 -42.26 0.82 -51.68 1.00 -37.19 0.72 H. contortus 2 25 -41.15 -33.91 0.82 -39.91 0.97 -35.12 0.85 -17.74 0.43 -17.52 0.43 -17.81 0.43 -41.15 1.00 -34.69 0.84 -39.26 0.95 -40.79 0.99 S. mansoni 28 -31.43 – 0 -23.53 0.75 -26.65 0.85 -21.63 0.69 – 0 -26.41 0.84 -21.63 0.69 – 0 – 0 -31.43 1.00 F. hepatica 16 -57.02 -32.80 0.58 -53.75 0.94 -47.25 0.83 -32.89 0.58 – 0 -41.96 0.74 -41.80 0.73 – 0 -57.02 1.00 -52.15 0.91 S. mediterranea-1 22 -44.16 -32.07 0.73 -43.87 0.99 -37.88 0.86 -21.96 0.50 -15.99 0.36 -28.24 0.64 -26.57 0.60 -31.91 0.72 -44.16 1.00 -41.19 0.93 E. multilocularis 35 -42.20 -33.35 0.79 -36.34 0.86 -26.78 0.63 -19.78 0.47 – 0 -42.20 1.00 -37.35 0.89 – 0 -37.37 0.89 -28.69 0.68 C. intestinalis 21 -14.98 – 0 – 0 -5.60 0.37 -5.71 0.38 – 0 – 0 -14.98 1.00 – 0 – 0 – 0 O. dioicaSM1 20 -14.24 – 0 – 0 – 0 -14.24 1.00 – 0 – 0 – 0 – 0 – 0 – 0 O. dioicaSM2+9 20 -16.35 – 0 -0.82 0.05 -16.35 1.00 -14.24 0.87 – 0 – 0 – 0 – 0 – 0 – 0

0 0.6-0.7 0.8-0.9 0.9-1.0 Organism Isoform1 Isoform2 Euglenida UCG A A Kinetoplastida A U A G U GGUAU AGAAGCU UAU A A UCUGUACUU UGGUAU Dinophyceae UUUCAUG UCUUUGAC U U UGACUAUGAA AUCG C. fasciculata GUC AUAUAAGUA UAU Cnidaria CAG C U G C GUACUGCAUGGUAAGAA CCGUCG C Rotifera U U U U C A U GUGA UUGU G AUGGUAAGA * UAGUGU UCAUUUUGGCACUGC U UC C C CAUU GGCA C UGCCA S. mansoni A U UU * Nematoda GUCGG G C Platyhelminths C U U G G AGGGCUGUACAAG G A A C CGGUUU CCGAU UAUGGCUCAAGGUAC Tunicates U UA GC U C ACCGA G UCUA K. brevis AUCU U U CC Figure 8: Sequence and structure similarities obtained by standard UAAGGUAAGAAU CU U alignment programs ClustalW (• – full sequence,  – exon only) and A A C A UUUAUCUUAGG U A A G A U G U A U CU UUUGAAU GUA GAA the structural alignment tool locarnate (N – full sequence). Filled A A C. intestinalis UCAUACAUUUG UGG AAGUUUA CAUACU symbols indicate similarities, empty symbols indicate that there are no obvious common features; * – alignable with nematods SL1 only.

AUAAAGGUAGG G A G U UUUUAUCU UUAAUAAAGGUA UC U C to SL RNAs that always share the same sequence and C U UUAU UUCA Hydra sp. CAAA U C structure constraints, since these similarities suggest that they interacting with the same proteins. Our analy- U GU C U GUGUA AGAGACU UC U A U CUGUCUUUGAUCA AG U A UU A AG C sis thus favours a common orgin of trans-splicing, with U CA AA CUGU CUUCA GGU UGU AGA G U U UGACAAGAAGU UUA ACA UC L. collosoma CUAAAACAAUUUU UU AAA frequent losses througout the eukaryotic tree [8]. The inability to find SL RNA in genomic se-

UUGAGU UGAGAAGCU U quence data does make a good argument against the ex- A C UAUCAUGUCUUUGAC AGU G A UCUGUACUAUAUUGGU AGAA A U istence of trans-splicing since mutation studies in kine- UAACGCUAUUAUUAG UGACAAGAUUAUUAUCG UC T. brucei C CAA toplasts and nematodes showed that much of the se- quence and structure can be disrupted without conse- Figure 7: Two alternative secondary structures of stem I can be formed quence to function in trans-splicing [17, 27], indicat- in all phyla. In case of C. intestinalis 16 nucleotides were added to the ing that SL RNAs are evolutionary very plastic. This 5’ end of the published sequence. property is shared with several other functionally cru- cial ncRNAs such as telomerase RNA [28, 29] and 7SK RNA [30, 31], which so far also have been found only A re-evaluation of the available evidence on SL in a rather scattered collection of clades. Is is quite con- RNAs across Eukarya shows that there a several se- ceivable, therefore, that SL RNA trans-splicing will be quence and structure similarities among SL RNAs that discovered in multiple additional phyla as sufficiently together strongly suggest not only a common function large EST libraries become available. The reason is but also a common mechanism. SL RNAs share: that SL trans-splicing might well be a relatively rarely 1. The relative positioning of the splice-donor site used processing mechanism in some clades. For exam- and the Sm-binding site is the same. ple, the basal eukaryotes Giardia lamblia [32, 33] and 2. There is a weak but recognizable shared sequence Trichinella spiralis [34] have functional splicesomes but pattern, suggesting common descent or common there are only a handful of mRNAs with spliceosomal selection pressures. introns in these species. Similarly, introns are quite rare 3. Structural similarties between SL RNA are much in Saccharomyces cerevisae [35]. We argue, therefore, greater than recognizedin previouswork when nat- that the absence of SL trans-splicing is plausibly estab- ural ambient temperatures are taken into account. lished only in a few organisms such as mammals, fruit- flies, yeast, or Arabidopsis. For these, the transcriptome 4. All SL RNAs share the possibility of two alterna- is known sufficiently well to rule out with near certainty tive conformations of stem I, suggesting that the that there unrelated mRNAs with a common leader se- structural transition between the two states is in- quence of unclear origin. volved in SL RNA function. If one accepts a common origin of SL RNAs, their While logically possible, it seems quite unlikely that structural evolution must have followed one of the three trans-splicing arose de novo several times to give rise scenaria outline in Fig. 9: (1) Most ancestral SL RNAs 6 Euglenida Methods Kinetoplastida ? Dinophyceae Sequences. SL RNA sequences were taken from the Cnidaria following publications: Chordata: Ciona intestinalis Rotifera [8], Oikopleura dioica [9]; Nematoda: Caenorhabdi- Nematoda tis elegans [5], Ascaris [36], Wucheria bancrofti [37], Platyhelminths Haemonchus contortus [38], Pristionchus pacificus Tunicates [39], Trichinella spiralis [14]; Platyhelminthes: Schis- (a) tosoma mansoni [6], Fasciola hepatica [40], Echinococ- Euglenida cus multilocularis [41], Schmidtea mediterranea [42]; Kinetoplastida Rotifera: Philodina sp., Adineta ricciae [10]; Cnidaria: ? Dinophyceae Hydra sp. [7]; Euglenida: Euglena gracilis [4], En- Cnidaria tosiphon sulcatum [22], Cyclidiopsis acus, Phacus cur- Rotifera vicauda, Rhabdomonas castata, Menoidium pellucidum Nematoda [19] Kinetoplastida: Trypanosoma cruzi, T. vivax, T. Platyhelminths brucei, Leptomonas collosoma [2, 3], Leishmania en- Tunicates riettii [43], Crithidia fasciculata [44], Bodo caudatus (b) [45]. Dinoflagellata: Karenia brevis [11], Karlodinium micrum, Pfiesteria piscicida, Prorocentrum minimum Euglenida [12]. Kinetoplastida ? Dinophyceae A blastn search in GenBank returns additional Cnidaria full length SL RNA sequences, which were used in Rotifera subsequent analyses: Nematoda: Ascaris AB022045.1, Nematoda Loa U31638.1, Mansonella AJ279033.1, Acan- Platyhelminths thocheilonema U31646.1, Onchocerca M37737.1, Tunicates Foleyella AJ250988.1, Setaria AF282181.1, Toxocara (c) U65503.1, Enterobius AY234784.1, Nippostrongylus EB185208.1, Meloidogyne CN443291.1, Haemonchus Figure 9: Three alternative scenaria for the evolution of SL RNA sec- CA994732.1, Teladorsagia CB043522.1; Platy- ondary structure. (a) Ancestral state with 4 hairpins, (b) ancestral helminthes: Echinostoma U85825.1; Rotifera: Bdel- state containing only features that are still contained in all present-day SL RNAs, (c) most parsimonious scenario minimizing the number of loidea AY823993.1; Kinetoplastida: Herpetomonas hairping gain/loss events. AY547489.1, Phytomonas AF243335.1, Wallaceina AY547488.1.

Alignments. Sequences based alignments were calcu- lated by ClustalW 2.0.10 with standard parameters contain 4 hairpins close to the Sm binding site, Fig. 9a, [46]. Structure-based alignments were calculated by with subsequent simplifications in some clades. (2) Locarnate [47]. Alternatively, the structural complexity may have in- creased, Fig. 9b. (3) A maximum parsimony analysis Secondary Structures. Secondary Structures were com- in which we interpret the hairpins as characters, also puted using the Vienna RNA Package 1.8.2 [48]. We points a structurally fairly simple ancestor, Fig. 9c. The used RNAalifold for consensus structure prediction, inferred ancestral states in these scenaria should help RNAsubopt for assessing structural alternatives, and with constructing descriptors for homology search of RNAeval to determine the folding energy of published SL RNAs. RNA secondary models. Whenever individual se- quences were folded, the temperature parameter was set In summary, the re-evaluation of SL RNA sequences to temperature under which the organisms are handled, and secondary structures does not definitively resolve see Tab. 2 for references. the questions regarding the origin of SL transsplicing, The S. mansoni structure proposed in [40] based the but it provides several lines of circumstantial evidence sequence published in [6] contained small differences that consistently favour an ancient common origin in in sequences. We used the original sequence folded into Eukarya. the published structure. 7 Other Software. We used the pattern search program [14] J Pettitt, B M¨uller, I Stansfield, and B Connolly. Spliced leader Meme [24] with default parameters to search for con- trans-splicing in the nematode Trichinella spiralis uses highly served sequence motifs. Phylogenetic analysis was per- polymorphic, noncanonical spliced leaders. RNA, 14(4):760– 770, Apr 2008. formed with the SplitsTree package [49]. [15] D Evans and T Blumenthal. trans splicing of polycistronic caenorhabditis elegans pre-mRNAs: analysis of the SL2 RNA. Mol Cell Biol, 20(18):6659–6667, Sep 2000. Acknowledgements [16] S Griffiths-Jones, S Moxon, M Marshall, A Khanna, S R Eddy, and A Bateman. Rfam: annotating non-coding RNAs in com- We thank Jan Engelhard for extracting growth tem- plete genomes. Nucleic Acids Res, 33(Database issue):121–124, perature of the relevant organisms from the literature. Jan 2005. [17] T W Nilsen. Evolutionary origin of SL-addition trans-splicing: Special thanks to Petra Pregel and Jens Steuck for mak- still an enigma. Trends Genet, 17(12):678–680, Dec 2001. ing work so much easier. This work was supported in [18] K E Hastings. SL trans-splicing: easy come or easy go? Trends part by the Freistaat Sachsen, and 6th framework pro- Genet, 21(4):240–247, Apr 2005. grame of the European Union (project SYNLET). [19] C Frantz, C Ebel, F Paulus, and P Imbault. Characterization of trans-splicing in Euglenoids. Curr Genet, 37(6):349–355, Jun 2000. References [20] CS MCDANIEL. Microorganism coating components, coatings, and coated surfaces. WIPO, 2005. URL {http://www.wipo.int/pctdb/en/wo.jsp?IA= [1] W J Murphy, K P Watkins, and N Agabian. Identification of a novel Y branch structure as an intermediate in trypanosome WO2005026269&wo=2005026269&DISP%LAY=DESC}. mRNA processing: evidence for trans splicing. Cell, 47(4):517– [21] E Alverca, S Franca, and S M D´ıaz de la Espina. Topology of 525, Nov 1986. splicing and snrnp biogenesis in dinoflagellate nuclei. Biol Cell, [2] M Milhausen, R G Nelson, S Sather, M Selkirk, and N Agabian. 98(12):709–720, Dec 2006. Identification of a small RNA containing the trypanosome [22] C Ebel, C Frantz, F Paulus, and P Imbault. Trans-splicing and spliced leader: a donor of shared 5’ sequences of trypanoso- cis-splicing in the colourless Euglenoid, Entosiphon sulcatum. Curr Genet, 35(5):542–550, Jun 1999. matid mRNAs? Cell, 38(3):721–729, Oct 1984. [23] D B Guiliano and M L Blaxter. Operon conservation and the [3] T De Lange, T M Berkvens, H J Veerman, A C Frasch, J D Barry, and P Borst. Comparison of the genes coding for the evolution of trans-splicing in the phylum Nematoda. PLoS common 5’ terminal sequence of messenger RNAs in three try- Genet, 2(11), Nov 2006. panosome species. Nucleic Acids Res, 12(11):4431–4443, Jun [24] TL Bailey and C Elkan. Fitting a mixture model by expectation 1984. maximization to discover motifs in biopolymers. Proceedings [4] L H Tessier, M Keller, R L Chan, R Fournier, J H Weil, and of the Second International Conference on Intelligent Systems P Imbault. Short leader sequences may be transferred from for , pages 28–36, 1994. [25] H Zhang, D A Campbell, N R Sturm, and S Lin. Dinoflagellate small RNAs to pre-mature mRNAs by trans-splicing in Euglena. EMBO J, 10(9):2621–2625, Sep 1991. spliced leader RNA genes display a variety of sequences and [5] M Krause and D Hirsh. A trans-spliced leader sequence on actin genomic arrangements. Mol Biol Evol, 2009. mRNA in C. elegans. Cell, 49(6):753–761, Jun 1987. [26] H Luo, G Gilinger, D Mukherjee, and V Bellofatto. Transcrip- [6] A Rajkovic, R E Davis, J N Simonsen, and F M Rottman. A tion initiation at the TATA-less spliced leader RNA gene pro- spliced leader is present on a subset of mRNAs from the human moter requires at least two dna-binding proteins and a tripartite parasite Schistosoma mansoni. Proc Natl Acad Sci U S A, 87 architecture that includes an initiator element. J Biol Chem, 274 (45):31947–31954, Nov 1999. (22):8879–8883, Nov 1990. [27] P A Maroney, G J Hannon, J D Shambaugh, and T W Nilsen. In- [7] N A Stover and R E Steele. Trans-spliced leader addition to mRNAs in a cnidarian. Proc Natl Acad Sci U S A, 98(10):5693– tramolecular base pairing between the nematode spliced leader 5698, May 2001. and its 5’ splice site is not essential for trans-splicing in vitro. [8] A E Vandenberghe, T H Meedel, and K E Hastings. mRNA 5’- EMBO J, 10(12):3869–3875, Dec 1991. leader trans-splicing in the chordates. Genes Dev, 15(3):294– [28] J. L. Chen, M. A. Blasco, and C. W. Greider. Secondary struc- 303, Feb 2001. ture of vertebrate telomerase RNA. Cell, 100:503–514, 2000. [9] P Ganot, T Kallesoe, R Reinhardt, D Chourrout, and E M [29] M Xie, A Mosig, X Qi, Y Li, P F Stadler, and J J Chen. Structure and function of the smallest vertebrate telomerase RNA from Thompson. Spliced-leader RNA trans splicing in a chordate, teleost fish. J Biol Chem, 283(4):2049–2059, Jan 2008. Oikopleura dioica, with a compact genome. Mol Cell Biol, 24 (17):7795–7805, Sep 2004. [30] Andreas R. Gruber, Dorota Koper-Emde, Manja Marz, Hakim [10] N N Pouchkina-Stantcheva and A Tunnacliffe. Spliced leader Tafer, Stephan Bernhart, Gregor Obernosterer, Axel Mosig, RNA-mediated trans-splicing in phylum Rotifera. Mol Biol Ivo L. Hofacker, Peter F. Stadler, and Bernd-Joachim Benecke. Evol, 22(6):1482–1489, Jun 2005. Invertebrate 7SK snRNAs. J. Mol. Evol., 107-115:66, 2008. [11] K B Lidie and F M van Dolah. Spliced leader RNA-mediated [31] Andreas Gruber, Carsten Kilgus, Axel Mosig, Ivo L. Hofacker, trans-splicing in a dinoflagellate, Karenia brevis. J Eukaryot Wolfgang Hennig, and Peter F. Stadler. Arthropod 7SK RNA. Mol. Biol. Evol., 1923-1930:25, 2008. Microbiol, 54(5):427–435, Sep-Oct 2007. [32] G. B. Simpson, Alastair, Erin K. MacQuarrie, and Andrew J. [12] H Zhang, Y Hou, L Miranda, D A Campbell, N R Sturm, T Gaasterland, and S Lin. Spliced leader RNA trans-splicing in Roger. Eukaryotic evolution: Early origin of canonical introns. dinoflagellates. Proc Natl Acad Sci U S A, 104(11):4618–4623, Nature, 419:270, 2002. Mar 2007. [33] X S Chen, W T White, L J Collins, and D. Penny. Compu- [13] M Blaxter and L Liu. Nematode spliced leaders–ubiquity, evo- tational identification of four spliceosomal snRNAs from the lution and utility. Int J Parasitol, 26(10):1025–1033, Oct 1996. deep-branching eukaryote Giardia intestinalis. PLoS ONE, 3: 8 e3106, 2008. 131/1/83.pdf. [34] A Simoes-Barbosa, D Meloni, J A Wohlschlegel, M M [51] WH Kusber. A study on Phacus smulkowskianus (Eugleno- Konarska, and P J. Johnson. Spliceosomal snRNAs in the uni- phyceae) - a rarely reported taxon found in waters of the botanic cellular eukaryote Trichomonas vaginalis are structurally con- garden berlin-dahlem. Wlldenowia, 28:239–247, 1998. served but lack a 5’-cap structure. RNA, 14:1617–1631, 2008. [52] T L Webb and D Francis. Mating types in Stentor coeruleus. J [35] E Bon, S Casaregola, G Blandin, B Llorente, C Neuv´eglise, Protozool, 16(4):758–763, Nov 1969. M Munsterkotter, U. Guldener, H W Mewes, J Van Helden, [53] J Kielhorn and G Rosner. Morpholine. first draft. IN- B Dujon, and C. Gaillardin. Molecular evolution of eukaryotic TERNATIONAL PROGRAMME ON CHEMICAL SAFETY, genomes: hemiascomycetous yeast spliceosomal introns. Nu- 1996. URL http://www.inchem.org/documents/ehc/ cleic Acids Res., 31:1121–1135, 2003. ehc/ehc179.htm. [36] P A Maroney, J A Denker, E Darzynkiewicz, R Laneve, and T W [54] J P Bruzik, K Van Doren, D Hirsh, and J A Steitz. Trans splicing Nilsen. Most mRNAs in the nematode ascaris lumbricoides are involves a novel form of small nuclear ribonucleoprotein parti- trans-spliced: a role for spliced leader addition in translational cles. Nature, 335(6190):559–562, Oct 1988. efficiency. RNA, 1(7):714–723, Sep 1995. [55] D L LEHMANN. Culture forms of Trypanosoma ranarum [37] R S Dassanayake, N V Chandrasekharan, and E H (lankester, 1871). II. effect of temperature upon reproduction Karunanayake. Trans-spliced leader RNA, 5s-rRNA genes and and cyclic development. J Protozool, 9:325–326, Aug 1962. novel variant orphan spliced-leader of the lymphatic filarial ne- [56] ADC Manaia, MDCM de Souza, EDS Lustosa, and Roitman I. matode wuchereria bancrofti, and a sensitive polymerase chain Leptomonas lactosovorans n.sp., a lactose-utilizing trypanoso- reaction based detection assay. Gene, 269(1-2):185–193, May matid: Description and nutritional requirements. J.Protozool, 2001. 28(1):124–126, 1981. [38] D L Redmond and D P Knox. Haemonchus contortus SL2 trans- [57] JBT da Silva and Roitman I. Effect of temperature and osmo- spliced RNA leader sequence. Mol Biochem Parasitol, 117(1): larity on growth of crithidia fasciculata, c. hutneri, c. luciliae 107–110, Sep 2001. thermophila, and herpetomonas samuelpessoai. J.Protozool, 29 [39] K Z Lee and R J Sommer. Operon structure and trans-splicing (2):269–272, 1982. in the nematode Pristionchus pacificus. Mol Biol Evol, 20(12): [58] J D Berman and F A Neva. Effect of temperature on multiplica- 2097–2103, Dec 2003. tion of Leishmania amastigotes within human monocyte-derived [40] R E Davis, H Singh, C Botka, C Hardwick, M Ashraf el macrophages in vitro. Am J Trop Med Hyg, 30(2):318–321, Mar Meanawy, and J Villanueva. RNA trans-splicing in Fasciola 1981. hepatica. Identification of a spliced leader (SL) RNA and sl se- [59] M Gray, B Wawrik, J Paul, and E Casper. Molecular detection quences on mRNAs. J Biol Chem, 269(31):20026–20030, Aug and quantitation of the red tide dinoflagellate Karenia brevis in 1994. the marine environment. Appl Environ Microbiol, 69(9):5726– [41] K Brehm, K Jensen, and M Frosch. mRNA trans-splicing in 5730, Sep 2003. the human parasitic cestode Echinococcus multilocularis. J Biol [60] H D Park, N E Sharpless, and A B Ortmeyer. Growth and differ- Chem, 275(49):38311–38318, Dec 2000. entiation in Hydra. I. the effect of temperature on sexual differ- [42] R M Zayas, T D Bold, and P A Newmark. Spliced-leader trans- entiation in Hydra littoralis. J Exp Zool, 160(3):247–254, Dec splicing in freshwater planarians. Mol Biol Evol, 22(10):2048– 1965. 2054, Oct 2005. [61] C Ricci, M Caprioli, and D Fontaneto. Stress and fitness in [43] S I Miller, S M Landfear, and D F Wirth. Cloning and charac- parthenogens: is dormancy a key feature for bdelloid rotifers? terization of a Leishmania gene encoding a RNA spliced leader BMC Evol Biol, 7 Suppl 2, 2007. sequence. Nucleic Acids Res, 14(18):7341–7360, Sep 1986. [62] LI Lebedeva and TN Gerasimova. Peculiarities of Philodina [44] M L Muhich, D E Hughes, A M Simpson, and L Simpson. The roseola growth and reproduction under various temperature con- monogenetic kinetoplastid protozoan, Crithidia fasciculata, con- ditions. Int. Revue ges. Hydrobiol., 70(4):509–525, 1985. tains a transcriptionally active, multicopy mini-exon sequence. [63] W A Van Voorhies and S Ward. Genetic and environmental Nucleic Acids Res, 15(7):3141–3153, Apr 1987. conditions that increase longevity in Caenorhabditis elegans de- [45] D A Campbell. Bodo caudatus medRNA and 5s rRNA genes: crease metabolic rate. Proc Natl Acad Sci U S A, 96(20):11399– tandem arrangement and phylogenetic analyses. Biochem Bio- 11403, Sep 1999. phys Res Commun, 182(3):1053–1058, Feb 1992. [64] J Martinez, J Perez-Serrano, W E Bernadina, and F Rodriguez- [46] M A Larkin, G Blackshields, N P Brown, R Chenna, P A Caabeiro. Oxidative, heat and anthelminthic stress responses in McGettigan, H McWilliam, F Valentin, I M Wallace, A Wilm, four species of trichinella: comparative study. J Exp Zool, 293 R Lopez, J D Thompson, T J Gibson, and D G Higgins. Clustal (7):664–674, Dec 2002. W and Clustal X version 2.0. Bioinformatics, 23(21):2947– [65] A Pires da Silva. Pristionchus pacificus genetic protocols. 2948, Nov 2007. WormBook, pages 1–8, 2005. [47] Wolfgang Otto, Sebastian Will, and Rolf Backofen. Structure [66] L F Lejambre and J H Whitlock. Optimum temperature for egg local multiple alignment of RNA. In Proceedings of German development of phenotypes in haemonchus contortus cayugen- Conference on Bioinformatics (GCB’2008), volume P-136 of sis as determined by arrhenius diagrams and sacher’s entropy Lecture Notes in Informatics (LNI), pages 178–188. Gesellschaft function. Int J Parasitol, 3(3):299–310, May 1973. f¨ur Informatik (GI), 2008. ISBN 987-3-88579-230-7. [67] N J Lwambo, E S Upatham, M Kruatrachue, and V Viyanant. [48] I L Hofacker. Vienna RNA secondary structure server. Nucleic The host-parasite relationship between the saudi arabian schis- Acids Res, 31(13):3429–3431, Jul 2003. tosoma mansoni and its intermediate and definitive hosts. 2. ef- [49] Daniel H. Huson and D. Bryant. Application of phylogenetic fects of temperature, salinity and ph on the infection of mice networks in evolutionary studies. Mol. Biol. Evol., 23:254–267, by s. mansoni cercariae. Southeast Asian J Trop Med Public 2006. Health, 18(2):166–170, Jun 1987. [50] JR Cook. Adaptations to temperature in two closely [68] J R Claxton, J Sutherst, P Ortiz, and M J Clarkson. The effect related strains of euglena gracilis. Biological Bulletin, of cyclic temperatures on the growth of Fasciola hepatica and 1966. URL http://www.biolbull.org/cgi/reprint/ Lymnaea viatrix. Vet J, 157(2):166–171, Mar 1999. 9 [69] D Palakodeti, M Smielewska, and B R Graveley. MicroRNAs from the planarian schmidtea mediterranea: a model system for stem cell biology. RNA, 12(9):1640–1649, Sep 2006. [70] M Novak. Growth of echinococcus multilocularis in gerbils ex- posed to different environmental temperature. Experientia, 39 (4):414–414, Apr 1983. [71] JK Petersen and HU Riisgard. Filtration capacity of the as- cidian Ciona intestinalis and its grazing impact in a shallow fjord. Mar.Ecol.Prog.Ser, 88:9–17, 1992. URL http://www. int-res.com/articles/meps/88/m088p009.pdf. [72] F Broms and P Tiselius. Effects of temperature and body size on the clearance rate of oikopleura dioica. J.Plan.Res., 25(5): 573–577, 2003.

Table 2: Sequences, secondary structures, and folding energies ∆G(kcal/mol) of known SL RNAs. Donor splice site (arrow) and Sm-binding site (box) are marked. Left column: Structures pro- posed in the literature. Right column: Alternative structural mod- els proposed in this work. Abbreviations: UWGCG – Univer- sity of Wusconsin Genetics Computer Group; * – Recalculated; T – natural ambient temperatur of organism; Blue nucleotides (Eu- glena, Rotifera) indicate mutations within known SL RNA align- ments; Blue Box (Hydra) – alternative SM-binding sites; Green Ar- row (K. brevis) indicates erroneous splice site from the literature; Se- quences, constraints and drawings are available at www.bioinf.uni- leipzig.de/Publications/SUPPLEMENTS/09-009

10 Published SL RNA T Alternative possible structures at organisms temperature E. gracilis, 1991 [4], Additional Constraint: folded by hand locarna-output AA C A CA AA CAC G G A C A CA CA CAC G CU CU CU C G C CU C C G C G AG C G C G C G C G C G A U A U A C A U A G C A U A CA C G U C G A G C G G C G C C G G C G A C G U C U C G AA C A A C G A U A C G A U G C U U CAA U GGUA AUUUCGA UU ACAAGG GCC CUUCAC U 29C U A U A U U U A U C G U C U U A A CA GA A U U UUC U U A G C G C U C U U A G C G C U GGUA AUUUCGA CAAGGA GC GC CUUCAC U U UUUUGUGAGUAU C ACC CGG GGAGGUUU U GGUA AUUUCGA CAAG CAAUUUUUG C U GGUA AUUUCGA CAAG CAAUUUUUGGAUCCG CA U U U C U CU G A C U U UUUUGUGAGAU U CCU CG CG GGAGGUUU UU C CA U U UUUUGUGAGAU U U C C AC CAA GC [50] U UUUUGUGAGAU U U C C UUU U C C UUU UUU RNAfold (29C): -26.52 -35.02 -32.30 -28.04 E. gracilis, 1999 [22], Additional Constraint: folded by hand? locarna-output AA CA CA C G G A AA CA AA C U C U GC CA C G C G C U C U C U C U C G C G C C G CA C G A U A G C C G C G A U C U A A U A CA C G C G G C C G G C G C C G G C G A A U A U A C G A U A C G A U GC C G A C G A U C A U A U A U U A U C U U A G UUC U C G G C G C 29C U UUC U U A G C G C U C U U A AA C UA U U GGUA AUUCGAC UU ACAAG CAAUUUUUGGAGGCG CCAUCC U C U U A G C G C U GGUA AUUCGAC CAAG CAAUUUUUGGAG C U GGUA AUUCGAC CAAGG GCC G CUUCCAA U U GGUA AUUCGAC CAAG CAAUUUUUG C AU U U U UUUUGUGAGUAU C U UUUGUGAGAU CC UUUGUGAGAU CC CGCG GGAGGUU U C U U UUUGUGAGAU U U U UUU C U C AA U ACUU [50] U U U U C C C A U C C ACUUU ACUUU UCC ACUUU RNAfold (29C): -23.49 -32.59 -33.69 -28.54 P. curvicauda, 2000 [19], folded by hand? CC UU U U AUUG CC A G U U C A C G C A C G CC C G A U C G A U U U A C CA C G A C CA C G C G G C G A C G G C G A C G A U C G G C U A C G G C U A C G A U C G C A U C G C UC U C G A U G C 22C UC U A A U G C UC U U A A U GGUA AUACUCGUU ACACCCG CAAUUUUUGCAG CA U U GGUA AUACUCG CCCGAACGACCAUUGAGCCGUU CA U U GGUA AUACUCG CCCG CAAUUUUUGCAG CA U U U UUUGAU U GAGU C UUUGAU U GAG CGGCGG GU U UU C C CA U UUUGAU U GAG C UU C U AA AC U U UU C U CA C C CACCU [51] C CACCUU GCCACAC CACCUU RNAfold (22C): -34.62 -41.51 -32.85 C. acus, 2000 [19], folded by hand? C C C A CCA C A AC U U AA U U U U AC C G U G C G G A AC C G U G U A C G U A G C U G U A C G AC G A U GC C G C G G A U U G G C AA A U G C AA C G C G G C G A C G G C G C C G G C G A C G A U A U G C C G G U G C C G A U G C U C G G U C G 22C U A U A A U G U C G UC U U G GA C AU A U U C G G C G C U G G C G C U GGUA AUACUCG CCGGAU GGUGUCC G CC UCCAA UC U UC U UC U U G G C G C U U U GGUA AUACUCGUU GCAUCG CAAUUUUUGGGA CACC U GGUA AUACUCG CAUCGG CAAUUUUUG C U GGUA AUACUCG CAUCG CAAUUUUUGGGA CACC U UUAU UGUGAG CC CC CA AGG C GG AGGUU U U U UU C U A G A U UUUUUAUCUGUGAGUC U UUAU UGUGAG U UUAU UGUGAG C U UU C U UU C U AUCU U ACUU C C AUCU U AUCU U RNAfold (22C): -48.37 -55.20 -54.10 -42.92 R. costata, 2000 [19], folded by hand? C CC A U C A U C G CC A U CC A U C G A U C G C G AA C G C G AA C G C G G C G A C G G C G A C G U A A U C G U A A U C G U A U A A U C G UU U A A U C G UU U A C C G G C C G U U C G AC U A 23C U U U C G G C C G UCAU UCGGUAUAAUCUA AACCGGAAGCC UGGCUUCCAAU GGU AUAAUCU AACCG CAAUUUUUGCAA UA A CAU UCGGUAUAAUCUA AACCG CAAUUUUUGCAA UA A U U C A C A GUG GGUCAU UGG GCC AA GUU U UUUA AU UGG UGUG GGUCAUC U CA CA A G C C U U CAG G A U AA U U CAUUAC [52] RNAfold (23C): -35.72 -42.69 -24.78 M. pellucidum, 2000 [19], folded by hand? UUG U C C C U G U U U U U C A U U U C C G C G C G A A C G C G AAA A U A U G C GA A U G C GA C G A U C G C G A U C G C G C G A U C G UU C G A U C G UU C G GA UU U A G C C G U U U A G C C G U U U U A G ACU C C UCGGU AU AAAUCACU ACCG CAAUUUUUGCAA U 23C UCAU UCGGUAUAAUCACUA ACCG CAAUUUUUGCAA U A CAU UCGGUAUAAUCACUA ACC G UUGCG C U AC A AC A A GUGAGUCAU A GUGAGUCAU GG C AACGU A UUUUUUAC AAGUGAGUC U U U U G A GCC U U A A A U AA UU ACAUU AC AC AC A RNAfold (23C): -26.58 -35.73 -22.94 E. sulcatum, 1999 [22], 16 nt shorter folded by hand? different SM-binding site UU UUUC U C UUG G C G C CC A C A C U U C C U A U U A U UUG G U C G U G C G U G C G C C G U C G C C G C C C G C C G G U C G G C C G G C C G U G C C G A U UU C G A U G C C G A U G C C G G U G C C G UUC A U CG UUC U G G A U C G U G 25C A U C G U GGUAU CUCCUUA GGCACCCAGGCAAGUUUCCCUG C A U UUC C U G C G G U UUC U G C G U U GGUAU CUCCUA GCACCCAGGCA UUCGGUAUUA C GCA UCCUUUUGGCUUUUUUUGGU G G U U U GGUAU CUCCUA GCA UCCUUUUGGCUUUUUUUGGU U UUCAAU GAUU U G U G G U AU UCAAU GAU CUGGGUCCGU UUGCAAAU G UCAAU GAU U C U U U C C C UUU A [53] U U U G ACUCGUUU U AU C U U AU C GUGUUU C U ACUCGUUUCU ACUCGUUU ACUCGUUU UUU RNAfold (25C): -33.00 -33.55 -34.10 -33.55 L. collosoma, 1988 [54], suboptimal stem I

UWGCG5.0:-17.5U U C A C A CG CU U CG CU A U G U C A A U G U CG CU UU GU AG C AUG A A U AU A U G U A U AU GGAGACU U A GAU UCC AA CU UCUGA A G C G C A U AU G C G C A U UU GU A U C G G C G C UU GU A U C G UCAUGCUUUGAU GGG UU GA AGGUU GGAGACU U A GACUUC GAAAUUUUGGCA G 28C U A UU A AG A U C G GGAGACU U A GACUUC GAAAUUUUGGCA G CU C C CU C GCC U A G CUGU CUUCA GUG UGU AGACUUCC GAAAUUUUGGCA GG A G GAA U C U C C A CUUCAUGCUUUGAU C UUU UGACAAGAAGU UUAACA UCA UUU CUUCAUGCUUUGAU C UUU A UU AA UU AAA A AA GU G [56] G U AACUAAAACAAUUUUUGAA AACUAAAACAAUUUUUGAA AACUAAAACAAUUU RNAfold (28C): -24.00 -27.43 -24.00 -20.04 C. fasciculata, 1988 [54], suboptimal stem I UWGCG5.0:-12.5 ACA A A AC AC AC A A AA A A AA G C A A G G AC A A G G A A AA U A G G A A G C G C A A G G G C G C U GGAU U AAGA GUUCCC GCCAA G C C G G C G C G C C G A U U A C G G C A U G C C G U A C G G C U GGAU U AAGGACUUC GCACAAAUUUUAGG UG U U U A A C G G C U GGAU U AAGGACUUC GCACAAAUUUUAGG UG UUUCAUGUCUU CGAAGG CG GUUU A 20C CUG UACUU UGGU AAGGACUUC GCACAAAUUUUAGG UG A UG GA U UCAUGUCUUUGA U U UCAUGUCUUUGA U C UGACUAUGAA AUCG UCA U C UCAG CUG UA UAU CAA A UA U U AU AACUAACGCUAUAUAAG [57] AACUAACGCUAUAUAAG G AAU AACUAACGCUAUA RNAfold (20C): -30.83 -30.10 -30.83 -31.40 L. enriettii, 1988 [54], UWGCG5.0:-6.4 UG A C U G A C C G C AU C G A U G UUGGAUGGAAACU C CUUCCGGAACC UGUCU AA G C C G U AU AA G C CA G C G U A C U U G C CA UUGGAUGGAAACU C C U U CCG CUGU CUUCCGG A C UCAUGCUUUGU GAGGGUUUU G ACGG UCUG UACUU GGUAUGGAAACC CUUCCG CUGU CUUCCGG A U C C G A U 36C U A C A A C U G A G U GGAUGGAAACCUUU C GAACGAUUUUGGG U GAC AUGAA UAUGCC GGC GGCG GAGGGUU CUUCAUGCUUUGU A GGC GGCGCGAGGGUU A A C G U U UA A U AA C U A C U AA U AU CGGAACGGUUU A U U UA U U CUUCAUGCUUUGU A AACU U U C UG U UA [58] A AACUAACGCUAUAUAAG U A AACUAACGCUAUAUAAG AACUAACGCUAUAU RNAfold (36C): -16.26 -20.40 -24.72 -20.87

11 Published SL RNA T Alternative possible structures at organisms temperature T. cruzi, 1988 [54],

UWGCG5.0:-24.1G A U GC C G AC G C GU C G G C G U U C A GC A U C GC AC G A U UA A U GC UC UG U AU U G G C A U C G UC A C A U C G GC C G G C G C G C UUC C UU U GUA C A UC C G G U G C GGUACGGAAGCU CAA GCGAGUCG GG CU UUGGUCAA U UUCGGUACGGAAGCU CAAAUCCGG C CAAUUUUUUUCGACCG U U U U C C G UU A A U UA U CUGAU C AUAUGGUACG CAAAUC GCUAUGUUUGGUCAA UCAUGGCUUUGA GUU CGC AGC CC GG GGCCAGUU C A U G C 23C A U U U UA A U UU UAUCAUGCUUUGAU GUUA C C A A U A A A U G C C A UGACAUAGUUUA UAUCGC GGGGCCAGUUUU AU U A A C AU U U U UUC C G G C U A C U CCUUU GGUACGGAAGCUUC GUCAAUUUUUUUCGACCG U A AACUA AAC A UU [55] AACUAACGCU CACGAACCCUUU UAUCAUGCUUUGAU C AUAGU U AACUAACGCUAUUA RNAfold (23C): -39.11 -34.93 -33.17 -42.24 T. vivax, 1988 [54], UWGCG5.0:-26.5 CA CC G A G C CC CG G C CA G C CC U A C G G A G C G C G C G C C G C G G C UA U A G C C G UU G A A G A GC C G A G C C G G C GGUAGAAGCUUA CC CGGUCGC AG CCGUG U CG C G AG G C C G C G A U U G C G C C G G C C G UCAUGCUUUGAU GG GCCGGUG UC GGCAC U C G G C UU A U U G C G C UA C C C UU G 23C U A CGC CG G C C CC A U CUGU CU AUA GGUAUG CC GGAAUUGUUUGACACGGCCCUCUG A GGUAGAAGCUUA C GUUUGACACGGCCCUCUAAUUG G AA A UUGGUAGAAGCUAG UCCCGGU AAGAC UGGAAUUGUUUGACACGGCCCUCUG A AA C GG A A U A A AA CC UCAUGCUUUGAU C UGACAAGAUUAU UCGA C UCAUGCUUUGAU G UA UUC G AUCA C GU G UA C UU A UUUAAACAAAACACAA UA C AAU UU AA A A AA UUUAAACAAAACACAA A A UUUAAACAAAACACA A GC G AACU AUACUAAAGCU UUAU G AUACUAAAAGCU UUAUU [55] CCAAACAACACAAAAAAACUUU RNAfold (23C): -52.77 -46.90 -53.64 -42.79 T. brucei, 1988 [54], alternative SM site

UWGCG5.0:-42.3U A U C G G U U CG A U A C G G C UU G U A U G C CG C G G C A A U U U G C A U U G A U C G G C C G C G U C GA A U GA C G G U A U G U G U CG G C CG C G UA G U U A UA C G CAUA G C G C G C G C C G U G C G U C U U U U A G C G U CG G C U CC G C G C C G U C U G C G A U G U A G C GA GC A U G C U CA G A C G GC 23C C G G C C G U AC G U AA C G G C C C U U A U A G C U G G C U A G C U A C G U A A U C G CG G C C G GC A U AUG C G A G U A U G C UA UCUGUACUA UAUUGGU AGAAG CAG CGCAUUUGUGCUGUUG C UG G U A U A U C G U A G C G C GACA GAU AU AUCG UC [55] C G G G G C U A U U A A C CAA A A U G C UU G C G A A U UA U GGUAGAAGCUA UC GCCAAC GAAUCUGGA U UU G U G G G GC G C A U GGUAGAAGCUUA CCCAG CAUACU CGG A A AU G C UAUCAUGCUUUGAU C A CG U A AA UAUCAUGCUUUGAU C GGUGG GUCU C G G AA G AAG C G G C AACUAACGCUAUUAUUA G G UU G AACUAACGCUAUUAUUA UCGGAUGACCUC GGUAGAAGCUUA C GGAAUCUGGAA UC C A U A UAUCAUGCUUUGAU C UU UUUCUC AA G AACUAACGCUAUUAUUA RNAfold (23C): -67.58 -60.45 -53.11 -57.98 K. brevis, 2007 [11], 24 nt 3’ removed mfold (SM cons) UC C AGGC AGG G A G C GC C C G C C U C U C G A C A C G C C U C U U A UG UU U U A G AUG G U GG UU U G G C C G C G C G U G G U GU C G G C G U U A U A G C G C A U A U C U A G C C C G C G C U U C G AAG U A U A 20C G U A U A C G G AC A C G G C GA A U UUGGC U C AAGGUAC AAG C AGGC C UC UU U UGUAUC AGAU AGC A GU AUGAUAU A A A G G U A U U A A U G C G U C C GA G UC UA AGUC UA UC GU C G UUA UAU U G U C A C AA A U C C C C AG C U A G A A U U G A GU U U C C G U A A C A UU C A U A UA UAAA [59] G U UA C G UC UC GC C A UGG A GGGC AGC AGA C A U GAUA U UUGGC U C AAG C AGAUAAGAC AGC AG A GG C A U GAU AUAAA U A U A UA A A U U U GGC U C AGGUAC AAG C AGC AGA GG C AU GAU AUAA AGAG C GGU AC C GUC U C C GUA C UAU U U C C GA G UC UA GUC U C C GUA C UA UAUU A U U AGA A UC U UA A A UC GG UAUU A C C A U U U C C GA G UC UA AGUC U C C GUA C UUA UAU A AAA C U A UC GGC A A U U U C C UC GC G RNAfold (20C): -46.29 -58.15 -39.64 -46.43 K. micrum, 2007 [12], 79 nt 3’ added mfold3.1.2 (SM cons,20C)

GGCCU UC AU G A C U G C G U C G C U G C AC AU U A U A U G U G U G C U G UU U C G C G U UC G CU U G A C G CC A G C U A AC G U G U G C U A C U A A G C U G U C A U U GCU G U U G U A U G CAA C G UUC U GGCU GGUACAAGUUGGGC UGCG 20C A UUGGCU GGUACAAGUUGGG CCUUUUUUAAUCG C UCUUAGCCAACUCUGAAUCC GAAGUCA UGCAUGG CG C G GCCGA C U U U A G C U UA UACC GA CCAC AGG CUCCAGU ACGUACGUGA C UG U CUA UGU G C U CGAC CCA UCCGACGC U A C G A UG G A G C U A A U U A A C U G C C A AU U C A U UUGGCU GGUAC GAAUCC GAAGUCA UGCAUGCG G C A C U U A C A AU U C U CGAC CCA AGG CU CAGU ACGUAC UG UUGGCU GGUAC GAAUCC GAAGUCA UGCAUGCG G A UG C C A C U U G A C U U G U A C UACGAC CCAC AGG CUCCAGU ACGUACGUGA UG U CUA UGU C RNAfold (20C): -27.68 -66.47 -50.85 -63.97 Hydra-A, 2001 [7], mfold2.3 (SM,Donor cons,18C) UU U A G C CA U A U GG GG C G A G A A C U CU AC U AU U C U U U UA U U U U G U U G UA A U U C AUAAAGGGGUA UACCAAACUAGUU UUCC AC U C U U U G A AAGGGGUA UACCAUAACAGUUU CUCU G U A G C A U 21C U U G AUAAAGGGGUA UACCAUAACAGUUUACUUCUCUUUAGACUUUUCGGG U A A UUCUUUUAUCU UCGG AGGG UG C C U A UUCUUUUAUCU GGUUAGG GGGG U UU C U G CA U U G UUCUUUUAUCU G U C U U GG U A G C C C U UU [60] AUAAAGGGGUA CGAUUUUCGGG U U U A UUCUUUUAUCU RNAfold (21C): -14.37 -10.96 -9.26 -18.06 Hydra-B, 2001 [7], mfold2.3 (SM,Donor cons,18C) AAAU U U A U A U C G U A A C U C G A A GU A AGGAU GG C AAAU G U A U U G U AAAU U UCC AUU UUC AU G C U U G G U A U G A U C G A U U U C G C G U G AA UG U U AU A A U U UA AA U A U A U U U UA C G A AAGG GGC AUC AUUCGCAAAUUUCGAU A A U A C U A A AAGG GGCAU ACGAAAUUUCGACU A A U U GG U 21C G G U G C G U G A U U UCCCUG UGG UGA GCUUUGGGGCU U U UAA AAG UAGG CAAAUUUUCGAA UG U UCCCUG GCUUUGGGGCU U GUG A A G UG U C G G U C G G G A G U U A CUU C AUU UUC GUC G U CUUUU A U U A U U [24nt] A [60] CUG AAA UA AAAU U AGUGGGUAAA A U C [21nt] AAACUUU A U AGCG AAAAAAACCA G C U G C G A U U A A C G A U AGGGUAAACG AAAAAAACC G RNAfold (21C): -32.62 -18.03 -22.17 -20.88

12 Published SL RNA T Alternative possible structures at organisms temperature A. ricciae, 2005 [10], mfold CU CU AC AC AU AAAU AU AA C G AA G C C G A U U A A U C G U A G C GA A U A U A G C AGGUAAAUGAUCAACUCUGAUCAAUACAUAGUUGC UG GCGGGGUGCU AG A U C G A G C G UGC G G G C A G C GA A U G GGAUA AU AAGUUGCAUACAU A UGCGGGGUGCU A 24C A CCAUU AC CAACG AC CGUCCGAUCAU A GGAUA AU AAAUACAUGUUGC UG GCGGGGUGCU A G C A A A G C CA U AG A CCAUU CAACG CGUCCGAUCAU A U A CCAUU CAACG AC CGUCCGAUCAU A A C A G A C A GGCUAU A C A AA AA AG C C GAGCUAU UU [61] GAGCUAU UU RNAfold (24C): -45.31 -36.14 0 -46.28 Philodina sp., 2005 [10], mfold UU UU A U A U AU AA AU C G AA A U C G AAAU G C AAAU U A A U C G G C U A G C A U G C GA A U C G A U C G A AGGAUA AUGACUAAUUUUGAUCAAUCAUGUUGCCA UG GCGGGGUAUUGG GA A U A G C G UGC G A G G C CAAGAGGAUA AUGA UAAC UUUUGAUCAAUCAUGUUGCCA UG GCGGGGUAUUGG A G C GA A U A G GGAUA AU ACAAUCAUGUUGC A UGCGGGGUUUG 20C A CCUAU AC CAACG AC CGGUC A UCAC G C C GGAUA AU ACAAU CAUGUUGC UG GCGGGGUUUG G C A A C A G C CA U AG AUUCAACAUUAUUCGG CAACG AC CGGUCCA UCACAG A U AG A ACCUAU C CAACGA G A C CGUCCAGUCACAG GGCUAU ACCUAU C CAACG AC CGGUCCA UCACAG AA AA AG C C GAGCUAU UU [62] GAGCUAU UU RNAfold (20C): -51.41 -42.28 -42.21 -49.82 C. elegans 1, 1988 [54],

UWGCG5.0:-16.0 AA G A AA AA AU G A G A UC AU AU AU C G A U AU UC AU CG A U C G AU A U UG C G CG CU AUU CG C G C AUA C G UG UGA AG AA A U AG UA C G C G C C G U GGUAAAACUAGACCC AGA GGCGUU CUA A AU C U A C G A U C G C G U 20C A U U U A U U A G A UG CU C G C G G U A U CCAUU UUUGG GGG UCU CUGCAAGGU U A AG A U U A A A G U C G AAC AA A GC C U UU U GGUAAAACUGA UAGCUUAAAAUUUUGGAACG C AC U A U G AC U A U A G C U A UUGAGGUA AUUGAAAC G C GAACGUCUCCUC UUGAGGUA AUUGAAAC GCUUAAAAUUUUGGAACG C A G A U U U A CAAAAUACUAA A CCCAUU UUUGG AAUCAUAA G C G A A AA [63] AAC CCAU UAAUUUG ACAGAGGGG AAC CCAU UAAUUUG AAUCAUAA G A G AAUACUAA RNAfold (20C): -38.14 -44.73 -43.54 -33.80 C. elegans 2, 2000 [15], folded by hand? AA AA G A AA G A C G G A C G U A C G U A UC U A U A U A G A AA GUCA UC U A G U G A G U U A U A G A C G G C U A GUCA C G G U C G G C U A G C U A U G C G G U U G C U U G C U A C G U G C C G G C C G U U U G C G G U C G G C A U U G U U A A U A U C G C G C G A U U G G C C G A U U A 20C G C C G U AA G U G C A U G U A C A G C C G U AA G U C A U A U A U G U A GGUA GCUGGGUCU U AAACGACUUUAAUUUUUGGAACGC CA C A U A U GC C G C U G C G A A U A U A CUUUGACCAACA C U A U GC C G U A U A U A U [63] U A U A A C G C C G U A C A C G GGUU A C G C C G GGUA GCUGGA UUUAAUUUUUGGAACGC CA G C A C GGUA GCUGGA UUUAAUUUUUGGAACGC CA A A A GUA GCUGGAG C G C G CA A A CUUUGACCCA C A A A CUUUGACCCA C AAUUUUGG A A CUCAUUGACCC A GGUUUUA A GGUUUUA RNAfold (20C): -54.79 -68.58 -57.14 -58.38 T. spiralis 1, 2008 [14], mfold (SM cons) U U U U U U U U U UUU GC GC UUU GC U A G C G C U A G C C G G C G C C G G C A U U A UUU U A A U U A U AA UU UU C G UU U A UU U U C G U U A AGGUAAAUAUUUCAAAUCUGAAC CUU AGUGUC GACAAU A U U G C G A U U G A U U G U U G CU CG 36C A U A U CG G CU CG C CCAUUUAUGGA CU UGGG UCA GG CUGUU U U A U C G C G A U U U A U UAGA CC U UU UU U AAAA C G G C AA A U A U G C AAAA C G G C A U GGUAAAUAUUUCAAAU AAAUUUUUUUUC GCG CU A AGGUAAAUAUUUCAAAUCUG CUUG CUUG C U GGUAAAUAUUUCAAAU AAAUUUUUUUUC GCG CU UGGUCCUUU C U U U C U U ACCAUUUAUGGA U [64] C CCAUUUAUGGA U U ACCAUUUAUGGA U AG UAGA U AG RNAfold (36C): -29.49 -30.15 -27.80 -19.06 T. spiralis 2, 2008 [14], mfold (SM cons) UA UA UA A U A U A U A U A U UU A U UU A U A U U U A U U U C G C G U G G C G U G UAA A U A U U G U U A A U A U U A A U A U A U C G A U G U A U AUUUUGAA GGUAAUCGAAACGUGUACGACAAAUAUGU U CAGAAUU G U A U C G A U C G U C G G U U A U A U A 36C C G G U A AACUU CCAU GUAA AC GUUUUG U G C A U A U A U UU G A A U U G U A U A U A U UU UU C U A A AUUUUGAA GGUAAUCGAAACGUGU A C GGAA U A UUGAA GGUAAUCGAAACGU AGAAUGGAUUUUUUUUG A U A UUGAA GGUAAUCGAAACGUGU A C GGAA U A A AACUU CCAU A A A AACUU CCAU UU G A [64] UUAACUUGCCAUA UUAGUAAUACU UU G A C C C RNAfold (36C): -19.51 -17.78 -11.74 -18.27 P. pacificus 1, 2003 [39] AA GA AA G C AA GA U A GA UA G C AA G C C G U A A U GA UA A U C G UA G C C G U A C G C G UA U A C G A U G C U C G U C G A U G C C C G U U G U G C G U G C G C G U G UG U AU C G C G G C C G G C U C G G C U G AU U G G C U A U G A U G C G C U U G 20C A U C G A U C G UG CU C G UG C A U C G UG G U C G C G G C G C C G G C C G C U U A A C U G C G C U A UA UU C G C U A U G C U GGUAAAACUG CUUUUGCCCACCCAAAUUUGGAACUG CA U UG GGUAAA UG CUUUUGCCCACC GAACG CA G U G C UA U A U C G G UUGAGGUAAAGAUUC GCCCACCCAAAUUUUGGAACG C A CCCAUU UUGGU G AC CCAUU U A U A G U C G G C A AA [65] A A G UG GGUAAAGAUUC GCCCACC GAACG C A AAC CCAUUA U A GGUUU A G GGUUU AAC CCAUUA A GGUUU RNAfold (20C): -45.98 -57.87 -50.76 -54.23 P. pacificus 2, 2003 [39] AAU A C A U G C A U C G AAU AU U A C G A AAU AU AU A U G U A C G A G A C G AU A U G U G U A U G A G C G C G C G C G C A U G C A U G C G C G U G U C G G U C G G U CA G U U G G C U C G A A C G U GU C G CG G AAU G C G C G G C C G AU G A U U A A C G C U A UU AC G U A U CU U A G U C G 20C U A U U A C G G U A U C G A U G U A C U C G C U A U A C A C G A U C U C G C G A G U C U A U G C C G GG CA G C U G A U U A GG GC UGGGCUU ACAACCAACAGUUUUUGGGCAG CA A GGUAC UGGG GACC G U AGUUUUUGGGCAG CA G U C G C G C G A A C A G C U A AGGUAC UGGGCUCAACU AGCAGG C CUC UG ACCCA AC [65] UCUAUG ACCC UUGG AC U A C G A A UA U A U C A GAGUUU UA U A U A UCUAUG ACCCA AC A C C G U A GGUACG CAGUUUUUGGGCAG CA GGUUUA C A UCUAUGA AC CC C GGUUUAUA RNAfold (20C): -41.68 -48.85 -48.98 -51.68 H. contortus 2, 2001 [38] UU C G GUCA A U C C C G A G A UC GU GA UA U U G U U A C G UU G C G C C C G G C G C C G A U C G U U A U C G U G G C UC GU C G G C U U U AC U A U AC A AC A A U G U A CC G U AU C A A GGUACU GGG CCCGUCUG UCU C CU GUG GC GUC CAAA U CC GC C G 25C U C C A U U G C U AA UCUAUGACCCA CGGGGCGG AGG GG CACUCGAUAG GUUU A AC G C C G A UUU G C A A C A CU UUA AGGUACUGGG CGUCU ACAAACCCUAAUUUUUGGAAU G C U C GGUUUU ACAAGC CU A A G C C G G C C CUAUGA CCC A AC C U C G U A CGAAC [66] AGGUACU GGG GACCUU AUAGCUCACUCGGGG A G U A G A A A A GGUACUACG CAAUUUUUGGAUAGCUC CUCGG G GUGUUU C CUAUGACCC UUGG CGGG GGGCC A C A U A U A C U UCUAUGAC ACGGGCGGGCCUA AU A CC A CAAGC A CAAGC GGUUUUA RNAfold (25C): -37.63 -47.38 -39.91 -41.15

13 Published SL RNA T Alternative possible structures at organisms temperature S. mansoni, 1990 [6], Zuker:-16.6 UU AAG AC AC U U AC C A C A C A G C C A A C A C C A A C U G A U U G A U U G C G A U C G G C C G GU U G C C G G U G U G C C G C G C G U UGCA GGUAA AACCUG GACCAAAAGCGAUAGUUUUUUUCGGCAGC G 28C G U G C C G AC U UUGCA GGUAA AACCUG GACCAAAAGCGAUAGUUUUUUUCGGCAGC G GUUGCAUGGUAAGAACCUG AAGUUUUUUUCGGCAGC G U U UUGCA GGUAA AACCUG GACCAAAAGU GCAGCCCU A U U UAGUGU UCAUU UUGGCA CUGC UAGUGU UCAUU UUGGCA CUGC U UC C U AGUGU UCAUU UUGGCA CUG GGGG C UC C GUGU UCAUU UUGGC AA U C A AA UA UC A UC C C C [67] AA AACCGU RNAfold (28C): -24.84 -31.05 -26.65 -23.53 F. hepatica, 1994 [40],

Zuker UUC UG CAU A C U C U CA A C A C A AUC C G C G UU A U C G U A CG U U CA A U CG C G A U A U A C U A A U A U C G G C C G C U AG U A A C C U C G AGUGCAUGGUAAGAAUUCG UG CAAACC GCUAGCCUC U U UGCA GGU AGAAUUCG UGG GUG CCAAACCCAUUAUUUGCUG AGCCUC U C G G C G C 16C U U C A G C A U U C AUAUGUCCGC UCUUGGCAAUUCCAA AGAAUCGGAG U U C U AG U A AUGU CGC UCUUGGCAA G AUCGGAG U UC A ACG AGUGCAUGGUAAGAAUUCG UGGACCAUCGG CUAGCCUC U U UGCA GGU AGAAUUCG U AAACCCAUUAUUUGGCU AA AU C U A A UCC U UC U A ACG U C U AUGU CGC UCUUGGCA G AUGU CGC UCUUGGCAAUU AUCGGAG U AU C A CCUAA AACC UCC AU C UC C A A G UC U [68] C G C AACCU AA AAUCC RNAfold (16C): -46.72 -57.02 -47.25 -52.15 S. mediterranea-1, 2005 [42], mfold (SM cons) U U A C A C U A A A A AUC A C GC GC UU A A A A U A U A A A GC GC UU U A U U A CG U A U A AA C G A U U A U U A CG A U UU UU C G A U C G UU C G A U C G AA A U A U A C G A U A U A U A U U A CG A U C G G C C G C G A U C G A C G G C 22C A U U G U A G C U U U A UC U G U A CUUAAU UGGGGACU C CGUUAUC GUUAAUUUUUGACA U UA A U U A G C A UA A C U A G C A C CU U AU C G A U G C CUUAAU UGG CG GACCGUU AACAUUAGUUGG CA U U U UUAAU UGG CG GACCGUUAUCC G G C A U AUAA AUC UU AUGGUA UU AAUUUUUGACA U A AAUAUAUC UCUGGCAG U A U A U U A U U A U A U AAUAUAUC GC CAUGGC G UUG U AAAUAUAUC GC CAUGGC GA A UAUUCUGGCAG UGCCG A U A U U A C A A GCCGU [69] U A U U A C AU C A U U G AA AU G AGC AA AU GCCGU RNAfold (22C): -43.84 -44.16 -32.07 -41.19 E. multilocularis, 2000 [41], mfold no SM given G G U U U U CG CG G C GA C G C G C GA U G G C A U U G U G C U A A C C G C G AC C G A UA U C G C U G G C G C G C C G G C U U U G G U A A U A U U A U G U AG A G U A U A C G U U U C G A CU C A C G C G G C C G U G C G C G C G 35C U UAGU GGUGGA AUCGAUG UGCCU CGAG GA CCAGU U UUAGU UGGUGGA UAUCGAUG UCGCUACGAG UAUUUGGCUG C G C U G U U U G C G C C G ACGU CAC UUC UGGCUA CGGG GCUU CUGGUCG U U ACGU CAC UUC UGGCUA A U A U U G U U U A U G C G U C A C A C G G U C A UUAGU UGGUGGA UAUCGAUGC GACCC G G C U UAGU GGUGGA AUCGAUGC GACCCAGUAUUUGGCUG C U U U C U CACCGU CACCGU U GACGUUCAC UUCCUGGCUAA [70] GACGUUCAC UUCCUGGCUAA U U CACCGU CACCGU RNAfold (35C): -36.34 -42.20 -33.35 -37.37 C. intestinalis, 2001 [8], 4nt 5’ added mfold3.0

A A U A A U U AGGU AGAAUCUGAGC U AGGU AGAAU CU GAGC U UAAGGUAAGAAUCUGAGCUUUGGCUCAACUAUGU 21C U UAAGGUAAGAAUCUGAGCUUUGGCUCAACUAUGU A A A U A A UUUA UUUAC GG CUCG U AGUUUA UUUAC A UUUA UUUAC ACUCG AGUUUA UUUAC G U A A G [71] G A G A C C UAUGU UAUGU RNAfold (21C): -5.71 -14.98 -5.71 -17.06 O. dioica, 2004 [9], folded by hand? CUA U A G C U A AU GG G U C AA C A A A U UGA A GGU GAAUCG CUCCAUGCCAACCCUUUCCUAUCCCUCUCGCCUCUUACAAA U A U U A 20C UUA UUUAGC GAG C G U A U U GC CU UU UUUU G C U U UAGUG CC GUCUCCAUGCCAACCCUUUCCUAUCCCUCUCGCCUCUUACAAA ACUCAUCCCAUU A ACCU A CCCU [72] RNAfold (20C): -14.19 -14.24

14