Supplementary data: The core of the SAM-IV riboswitch aptamer mimics the ligand-binding site of SAM-I riboswitches

Zasha Weinberg1, Elizabeth E. Regulski1, Ming C. Hammond2, Jeffrey E. Barrick2,3, Zizhen Yao4, Walter L. Ruzzo4,5 and Ronald R. Breaker1,2,3

1 Department of Molecular, Cellular and Developmental Biology, 2 Department of Molecular Biophysics and Biochemistry, 3 Howard Hughes Medical Institute, Yale University, New Haven, Connecticut 06520. 4 Department of Computer Science and Engineering, 5 Department of Genome Sciences, University of Washington, PO Box 352350, Seattle, WA, 98195. Correspondence should be addressed to R.R.B.

Note: Supplementary Figures S3 and S4 present data on RNA motifs that are similar in nature to data in previous reports from our laboratory. Therefore, we used a similar or identical figure design and figure legend wording to those in our previous reports [4, 5]. The underlying data on SAM-I riboswitches (Supplementary Figure S4) are derived from a previously established alignment [1]. The data on SAM-IV riboswitches are novel.

Contents

Additional analysis of SAM-IV
  Non-actinomycete SAM-IV instance
  Other environmental sequences searched
  The number of SAM-IV riboswitches per species
  Experiments designed to deplete cellular SAM concentrations

Supplementary Figure S1: consensus sequence/structure of SAM-IV motif, including evidence of covariation

Supplementary Figure S2: in-line probing gel showing more detail of SAM-IV’s cleavage pattern

Supplementary Figure S3A: taxonomy of SAM-IV riboswitches

Supplementary Figure S3B: gene context of SAM-IV riboswitches

Supplementary Figure S3C: conserved domains present in genes downstream of SAM-IV riboswitches

Supplementary Figure S3D: SAM-IV multiple sequence alignment

Supplementary Figure S4A: taxonomy of SAM-I riboswitches

Supplementary Figure S4B: gene context of SAM-I riboswitches

Supplementary Figure S4C: conserved domains present in genes downstream of SAM-I riboswitches

Supplementary Figure S4D: SAM-I multiple sequence alignment

Additional analysis of SAM-IV

Non-actinomycete SAM-IV instance

Only one instance of the SAM-IV motif was identified outside of Actinomycetales, and was in Magnetospirillum magnetotacticum, in the class α-proteobacteria. However, we detected no SAM-IV in M. magneticum, a fully sequenced relative of M. magnetotacticum. Similarly, the only predicted SAM-I riboswitch in α-proteobacteria is also in M. magnetotacticum. Previously, fourteen protein coding genes typical of [...] were found in M. magnetotacticum, and it was proposed that these might have been acquired by horizontal transfer [2], which could also apply to SAM-IV.

Other environmental sequences searched

We performed homology searches on the following additional environmental sequences downloaded from GenBank, but failed to uncover SAM-IV riboswitches in any of them: acid mine drainage (GenBank project AADL), soil and whale fall (AAFX-AAFZ, AAGA), the human gut (AAQK, AAQL), gutless sea worms (AASZ), mouse gut (AATA-AATF) and sludge communities (AATN, AATO). Sargasso Sea sequences (AACY) contain one SAM-IV riboswitch; see Supplementary Figure S3A.

The number of SAM-IV riboswitches per species

Within Actinomycetales, the number of SAM-IV riboswitches per species varies markedly. For example, Streptomyces coelicolor has one, versus four in S. avermitilis. Similarly, Corynebacterium glutamicum has two, while C. diphtheriae has none. See Supplementary Figure S3A.

Experiments designed to deplete cellular SAM concentrations

We measured reporter gene expression in a methionine auxotroph strain of S. coelicolor [3], grown in minimal media with and without methionine. Since methionine is directly used to make SAM, its depletion induces low SAM levels [6]. Unfortunately, XylE activity in all experiments was at background, i.e., similar to a strain lacking the xylE gene (data not shown). We presume that growth in minimal media lowered gene expression below detection limits with XylE, rendering the data inconclusive. By contrast, while the results in Fig. 3 of our manuscript are somewhat close to background, the difference is significant.

Supplementary Figure S1: consensus sequence/structure of SAM-IV motif, including evidence of covariation

Conserved nucleotide positions and evidence of covariation were calculated as in [5]. Stems are labeled P1-P5. P5 in SAM-IV is often missing, but its 5′ side involved in the pseudoknot is always present.
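As a rough illustration of how a consensus symbol and conservation level might be assigned to a single alignment column, the sketch below uses the R/Y codes and the 97%, 90% and 75% levels from the figure key. It is not the procedure of [5]: the function name `consensus_symbol`, the choice to ignore gap characters, and the rule of preferring the highest conservation level over the most specific symbol are all assumptions made for this example.

```python
from collections import Counter

# Symbols are tried in this order: single nucleotides first, then R and Y.
SYMBOLS = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "AG", "Y": "CU"}
LEVELS = (0.97, 0.90, 0.75)  # conservation levels shown in the figure key


def consensus_symbol(column, gap_chars=".-"):
    """Return (symbol, level) for one column, or ("N", None) if nothing reaches 75%."""
    # Assumption: gap characters do not count toward nucleotide identity.
    bases = [b for b in column.upper() if b not in gap_chars]
    if not bases:
        return ("N", None)
    counts = Counter(bases)
    # Assumption: the highest level reached wins, even if the symbol is R or Y.
    for level in LEVELS:
        for symbol, members in SYMBOLS.items():
            fraction = sum(counts[m] for m in members) / len(bases)
            if fraction >= level:
                return (symbol, level)
    return ("N", None)


print(consensus_symbol("GGGGGAAAAA"))  # a column of purines only
```

Under these assumptions a column that is half G and half A is reported as R at the 97% level, because every non-gap nucleotide is a purine.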

[Figure: consensus sequence and secondary structure of the SAM-IV motif, with stems P1-P5, the pseudoknot and a variable hairpin labeled. Key: R = A or G; Y = C or U. Base pair annotations: has covarying mutations; has compatible mutations; no mutations observed. Nucleotide identity is shown at 97%, 90% and 75%; nucleotide presence at 97%, 90%, 75% and 50%.]

Supplementary Figure S2: in-line probing gel showing more detail of SAM-IV’s cleavage pattern

[Figure: in-line probing gel. Lane labels: T1, -OH, NC, 1 mM SAM. Bands are labeled by G residues between G9 and G126, plus U132.]

Supplementary Figure S3A: taxonomy of SAM-IV riboswitches

The taxonomy of each organism containing a putative SAM-IV riboswitch is listed, with abbreviations (e.g., “Cef-1-1”) used to denote that riboswitch in later figures. (This explanatory text is largely copied from supplementary data on a different RNA motif [4].)
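The abbreviations can be expanded mechanically when counting riboswitches per genome. The sketch below assumes, judging from the table that follows, that an abbreviation such as "Sav-1-4" reads as species code, genome number, riboswitch number, and that "X to Y" denotes a consecutive range. The function names are illustrative only.

```python
import re
from collections import Counter

# Assumed pattern: <species code>-<genome number>-<riboswitch number>
HIT = re.compile(r"^([A-Za-z]+)-(\d+)-(\d+)$")


def expand_hits(entry):
    """Expand 'Sav-1-1 to Sav-1-4' into the individual abbreviations."""
    parts = [p.strip() for p in entry.split(" to ")]
    if len(parts) == 1:
        return parts  # a single hit, e.g. 'Sco-1-1' or 'env-1'
    if len(parts) != 2:
        raise ValueError(f"unexpected entry: {entry!r}")
    first, last = HIT.match(parts[0]), HIT.match(parts[1])
    if not first or not last or first.group(1, 2) != last.group(1, 2):
        raise ValueError(f"unexpected range: {entry!r}")
    code, genome = first.group(1, 2)
    start, end = int(first.group(3)), int(last.group(3))
    return [f"{code}-{genome}-{i}" for i in range(start, end + 1)]


def hits_per_genome(entries):
    """Count riboswitches per 'code-genome' prefix."""
    counts = Counter()
    for entry in entries:
        for hit in expand_hits(entry):
            counts[hit.rsplit("-", 1)[0]] += 1
    return counts


counts = hits_per_genome(["Sco-1-1", "Sav-1-1 to Sav-1-4", "Cgl-1-1 to Cgl-1-2"])
print(dict(counts))
```

For these three entries the counts are one for Sco-1, four for Sav-1 and two for Cgl-1, matching the per-species numbers quoted in the additional analysis.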

abbrev. of hits     taxonomy of species
Cef-1-1             Actinobacteria Actinomycetales Corynebacterineae Corynebacteriaceae Corynebacterium efficiens YS-314
Cgl-1-1 to Cgl-1-2  Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae Corynebacteriaceae Corynebacterium glutamicum ATCC 13032
Mfl-1-1 to Mfl-1-2  Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae flavescens PYR-GCK
Mle-1-1             Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae Mycobacteriaceae TN
Msm-1-1 to Msm-1-2  Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae Mycobacteriaceae str. MC2 155
Msp-1-1 to Msp-1-2  Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae Mycobacteriaceae Mycobacterium sp. JLS
Msp-2-1 to Msp-2-2  Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae Mycobacteriaceae Mycobacterium sp. KMS
Msp-3-1 to Msp-3-2  Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae Mycobacteriaceae Mycobacterium sp. MCS
Mul-1-1             Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae Mycobacteriaceae Agy99
Mva-1-1 to Mva-1-2  Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae Mycobacteriaceae Mycobacterium vanbaalenii PYR-1
Mav-1-1             Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae Mycobacteriaceae Mycobacterium avium 104
Mav-2-1             Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae Mycobacteriaceae Mycobacterium avium subsp. paratuberculosis K-10
Mbo-1-1             Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae Mycobacteriaceae AF2122/97
Mtu-1-1             Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae Mycobacteriaceae Mycobacterium tuberculosis C
Mtu-2-1             Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae Mycobacteriaceae Mycobacterium tuberculosis CDC1551
Mtu-3-1             Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae Mycobacteriaceae Mycobacterium tuberculosis F11
Mtu-4-1             Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae Mycobacteriaceae Mycobacterium tuberculosis H37Rv
Mtu-5-1             Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae Mycobacteriaceae Mycobacterium tuberculosis str. Haarlem
Nfa-1-1 to Nfa-1-3  Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae farcinica IFM 10152
Rsp-1-1 to Rsp-1-2  Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae Nocardiaceae Rhodococcus sp. RHA1
Fal-1-1 to Fal-1-2  Actinobacteria Actinobacteridae Actinomycetales Frankineae Frankiaceae alni ACN14a
Fsp-1-1             Actinobacteria Actinobacteridae Actinomycetales Frankineae Frankiaceae Frankia sp. EAN1pec
Kra-1-1 to Kra-1-4  Actinobacteria Actinobacteridae Actinomycetales Frankineae Kineosporiaceae Kineococcus radiotolerans SRS30216
Bli-1-1             Actinobacteria Actinobacteridae Actinomycetales Micrococcineae Brevibacteriaceae Brevibacterium linens BL2
Jsp-1-1 to Jsp-1-3  Actinobacteria Actinobacteridae Actinomycetales Micrococcineae Intrasporangiaceae Janibacter sp. HTCC2649
Aau-1-1 to Aau-1-5  Actinobacteria Actinobacteridae Actinomycetales Micrococcineae aurescens TC1
Asp-1-1 to Asp-1-4  Actinobacteria Actinobacteridae Actinomycetales Micrococcineae Micrococcaceae Arthrobacter sp. FB24
Sar-1-1             Actinobacteria Actinobacteridae Actinomycetales Micromonosporineae Salinispora arenicola CNS205
Str-1-1             Actinobacteria Actinobacteridae Actinomycetales Micromonosporineae Micromonosporaceae Salinispora tropica CNB-440
Nsp-1-1             Actinobacteria Actinobacteridae Actinomycetales Propionibacterineae Nocardioidaceae Nocardioides sp. JS614
Sav-1-1 to Sav-1-4  Actinobacteria Actinobacteridae Actinomycetales Streptomycineae Streptomyces avermitilis MA-4680
Sco-1-1             Actinobacteria Actinobacteridae Actinomycetales Streptomycineae Streptomycetaceae Streptomyces coelicolor A3(2)
Mma-1-1             α-proteobacteria Rhodospirillales Rhodospirillaceae Magnetospirillum magnetotacticum MS-1
env-1               environmental sample

Supplementary Figure S3B: gene context of SAM-IV riboswitches

All riboswitches (indicated by “RNA→”) are listed with their downstream genes, according to the RefSeq annotation. Environmental sequences and some RefSeq entries lack gene annotations, and no genes are listed for such sequences. Lines beginning with a superscript “1” indicate riboswitches that partially overlap the reverse complement of a hypothetical protein. However, BLAST cannot find any homologs of the overlapping region. The superscript “2” riboswitch overlaps a conserved hypothetical protein (COG5515). However, the overlapping part of the hypothetical protein gene has no homologs (by BLAST), even though the non-overlapping part does. COG5515 genes are regulated by other instances of the riboswitch according to genome annotations; we presume that the indicated riboswitch also regulates the COG5515 gene. The superscript “3” riboswitch overlaps a hypothetical protein, but BLAST can find no homolog of this protein. The direction of each downstream gene is indicated with an arrow (→), and each conserved domain in the gene is colored. Conserved domains associated with more than one SAM-IV riboswitch are assigned a color; other domains are gray. Conserved domains are explained in Supplementary Figure S3C. Nucleotide coordinates are given for the 5′ and 3′ boundaries of the riboswitch. Note that these coordinates are for the full sequence listed in Supplementary Figure S3D, including extra downstream nucleotides used to annotate transcription terminators and start codons. Therefore the listed 3′ coordinate will extend past the actual riboswitch aptamer. (This explanatory text is largely copied from supplementary data on a different RNA motif [4].)
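To make the coordinate convention concrete, the sketch below derives the length of a listed region from its 5′ and 3′ coordinates and checks that their order agrees with the strand sign. It assumes 1-based, inclusive coordinates, which the legend does not state; the class and method names are hypothetical. The two example rows are taken from the table in this figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListedRegion:
    abbrev: str
    strand: str       # "+" or "-"
    five_prime: int   # coordinate of the 5' boundary
    three_prime: int  # coordinate of the 3' boundary

    def length(self):
        # Assumption: coordinates are 1-based and inclusive.
        return abs(self.three_prime - self.five_prime) + 1

    def orientation_matches_strand(self):
        # On "+" the 5' coordinate is the smaller one; on "-" it is the larger one.
        if self.strand == "+":
            return self.five_prime < self.three_prime
        if self.strand == "-":
            return self.five_prime > self.three_prime
        raise ValueError(f"unknown strand: {self.strand!r}")


cef = ListedRegion("Cef-1-1", "+", 1474341, 1474771)
aau = ListedRegion("Aau-1-1", "-", 3791869, 3791425)
print(cef.length(), aau.length())  # prints: 431 445
```

As the legend notes, these lengths include the extra downstream nucleotides, so they overestimate the size of the aptamer itself.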

abbrev     RefSeq accession   strand  5′ at    3′ at    genes
4 Cef-1-1  NC_004369.1        +       1474341  1474771  RNA→ hypo→
  Cgl-1-1  NC_003450.3        +       1372526  1372970  RNA→ hypo→
  Cgl-1-2  NC_006958.1        +       1373993  1374437  RNA→ hypo→
  Aau-1-1  NC_008711.1        -       3791869  3791425  RNA→ Nitrilotriacetate_monoxgenase (cd01095)→ Nitrilotriacetate_monoxgenase (cd01095)→ DdpA (COG0747)→ DppB (COG0601)→ DppC (COG1173)→ COG1123 (COG1123)→
  Aau-1-2  NC_008711.1        -       1118288  1117855  RNA→ hypo→
1 Mva-1-1  NC_008726.1        -       1175546  1175106  RNA→ DSPc (pfam00782) ADP_ribosyl_GH (pfam03747)→ hypo→
  Msm-1-1  NC_008596.1        -       1362007  1361570  RNA→ DSPc (pfam00782) ADP_ribosyl_GH (pfam03747)→ hypo→
  Sav-1-1  NC_003155.3        +       3688850  3689305  RNA→ hypo→ hypo→
2 Asp-1-1  NC_008541.1        +       4342124  4342550  RNA→ COG5515 (COG5515)→ RHOD_1 (cd01522)→ MetC (COG0626)→
  Aau-1-3  NC_008711.1        +       4015346  4015771  RNA→ RHOD_1 (cd01522)→ metZ (pfam01053)→ putative integral membrane protein→
  Fal-1-1  NC_008278.1        +       4831332  4831773  RNA→ DdpA (COG0747)→ DppB (COG0601)→
  Sav-1-2  NC_003155.3        -       3688791  3688366  RNA→ hypo→
  Sav-1-3  NC_003155.3        +       8585278  8585711  RNA→ HisM (COG0765)→ GlnQ (COG1126)→ SBP_bac_3 (pfam00497)→ COG3393 (COG3393)→
  Fal-1-2  NC_008278.1        +       4609186  4609626  RNA→ CsdB (COG0520)→
  env-1    AACY01218155.1     +       112      544      RNA→ unknown→
  Sco-1-1  NC_003888.3        -       2308784  2308334  RNA→ CsdB (COG0520)→
  Sav-1-4  NC_003155.3        +       7291219  7291670  RNA→ CsdB (COG0520)→
4 Jsp-1-1  NZ_AAMN01000002.1  +       104068   104500   RNA→ metA (COG2021)→
  Nsp-1-1  NC_008699.1        +       3644112  3644553  RNA→ CsdB (COG0520)→
  Rsp-1-1  NC_008268.1        -       6724436  6724002  RNA→ metC (COG2873)→ metA (COG2021)→
  Aau-1-4  NC_008711.1        +       1618323  1618754  RNA→ metA (COG2021)→
  Rsp-1-2  NC_008268.1        +       4627879  4628337  RNA→ CsdB (COG0520)→
  Jsp-1-2  NZ_AAMN01000002.1  +       4561     5010     RNA→ CsdB (COG0520)→ DegV (COG1307)→ DUF205 (pfam02660)→
  Nfa-1-1  NC_006361.1        -       1024561  1024137  RNA→ metA (COG2021)→
4 Asp-1-2  NC_008541.1        +       1479272  1479711  RNA→ metA (COG2021)→ Sugar_tr (pfam00083)→ AdhC (COG1062)→
  Kra-1-1  NZ_AAEF02000064.1  +       13374    13810    RNA→ DszC (cd01163)→
  Asp-1-3  NC_008541.1        +       2980393  2980837  RNA→ CsdB (COG0520)→ SmtA (COG0500)→ TroR (COG1321) FeoA (pfam04023)→ CrcB (COG0239)→ CrcB (COG0239)→ hypo→
  Aau-1-5  NC_008711.1        +       2904748  2905189  RNA→ CsdB (COG0520)→
  Jsp-1-3  NZ_AAMN01000002.1  +       91101    91530    RNA→ metC (COG2873)→
  Mle-1-1  NC_002677.1        -       819835   819388   RNA→ metA (COG2021)→
  Msm-1-2  NC_008596.1        -       1748407  1747949  RNA→ metC (COG2873)→ metA (COG2021)→ UbiE (COG2226)→
  Mfl-1-1  NZ_AAPA01000001.1  +       833500   833951   RNA→ metC (COG2873)→ metA (COG2021)→ UbiE (COG2226)→
  Mva-1-2  NC_008726.1        -       1643759  1643310  RNA→ metC (COG2873)→ metA (COG2021)→ UbiE (COG2226)→
  Msp-3-1  NC_008146.1        -       1307453  1307006  RNA→ metC (COG2873)→ metA (COG2021)→ UbiE (COG2226)→
  Msp-2-1  NC_008705.1        -       1311762  1311315  RNA→ metC (COG2873)→ metA (COG2021)→ UbiE (COG2226)→
  Msp-1-1  NZ_AAQC01000009.1  +       7131     7578     RNA→ metC (COG2873)→ metA (COG2021)→ UbiE (COG2226)→
  Mul-1-1  NC_008611.1        -       1553444  1552996  RNA→ metC (COG2873)→ metA (COG2021)→ UbiE (COG2226)→
  Mtu-2-1  NC_002755.2        +       3723565  3724013  RNA→ metC (COG2873)→ metA (COG2021)→ UbiE (COG2226)→
  Mtu-4-1  NC_000962.2        +       3725957  3726405  RNA→ metC (COG2873)→ metA (COG2021)→ UbiE (COG2226)→
  Mbo-1-1  NC_002945.3        +       3683207  3683655  RNA→ metC (COG2873)→ metA (COG2021)→ UbiE (COG2226)→
  Mtu-3-1  NZ_AAIX01000036.1  +       18996    19444    RNA→ metC (COG2873)→ metA (COG2021)→ UbiE (COG2226)→
  Mtu-1-1  NZ_AAKR01000147.1  +       6245     6693     RNA→ metC (COG2873)→ metA (COG2021)→ UbiE (COG2226)→
  Mtu-5-1  NZ_AASN01000046.1  +       389912   390360   RNA→
  Mav-2-1  NC_002944.2        +       3838558  3839009  RNA→ metC (COG2873)→ metA (COG2021)→ UbiE (COG2226)→
  Mav-1-1  NC_008595.1        +       4445969  4446420  RNA→ metC (COG2873)→ metA (COG2021)→ UbiE (COG2226)→
  Asp-1-4  NC_008541.1        +       1477689  1478157  RNA→ metC (COG2873)→
  Fsp-1-1  NZ_AAII01000122.1  +       12974    13413    RNA→ CsdB (COG0520)→
3 Kra-1-2  NZ_AAEF02000003.1  +       44250    44673    RNA→ DdpA (COG0747)→ DppB (COG0601)→ DppC (COG1173)→ COG1123 (COG1123)→ Nitrilotriacetate_monoxgenase (cd01095)→
  Kra-1-3  NZ_AAEF02000019.1  -       36781    36357    RNA→ NlpA (COG1464)→ AbcC (COG1135)→ AbcD (COG2011)→
4 Nfa-1-2  NC_006361.1        -       5665654  5665216  RNA→ RHOD_1 (cd01522)→ metZ (pfam01053)→
  Str-1-1  NZ_AATJ01000006.1  -       318699   318245   RNA→ CsdB (COG0520)→
  Sar-1-1  NZ_AAWA01000001.1  -       73920    73461    RNA→ CsdB (COG0520)→
  Nfa-1-3  NC_006361.1        +       396879   397316   RNA→ CsdB (COG0520)→
  Kra-1-4  NZ_AAEF02000050.1  -       6285     5844     RNA→ CsdB (COG0520)→
  Mma-1-1  NZ_AAAP01003574.1  +       1800     2239     RNA→ COG0520: Selenocysteine lyase→ CsdB (COG0520)→
  Msp-1-2  NZ_AAQC01000005.1  +       250651   251094   RNA→ ←ABC_Class3 (cd03229)
  Mfl-1-2  NZ_AAPA01000002.1  +       124393   124836   RNA→ ERCC4 (pfam02732)→
4 Msp-3-2  NC_008146.1        -       920195   919752   RNA→ hypo→
  Msp-2-2  NC_008705.1        -       928924   928481   RNA→ hypo→ hypo→ COG0714 (COG0714) Mrr (COG1715)→ McrC (COG4268)→
  Bli-1-1  NZ_AAGP01000018.1  -       74585    74159    RNA→ hypo→

Supplementary Figure S3C: conserved domains present in genes downstream of SAM-IV riboswitches

Conserved domains found in downstream genes (Supplementary Figure S3B) are listed, with the first sentence in their description from the Conserved Domain Database. Conserved domains downstream of more than one SAM-IV riboswitch are assigned a color, while others are shown in gray. (This explanatory text is largely copied from supplementary data on a different RNA motif [4].)

cd01095 nitrilotriacetate monoxygenase oxidizes nitrilotriacetate utilizing reduced flavin mononucleotide (FMNH2) and oxygen.
cd01163 Dibenzothiophene (DBT) desulfurization enzyme C (DszC).
cd01522 Member of the Rhodanese Homology Domain superfamily, subgroup 1.
cd03229 This class is comprised of all BPD (Binding Protein Dependent) systems that are largely represented in archaea and eubacteria and are primarily involved in scavenging solutes from the environment.
COG0239 Integral membrane protein possibly involved in chromosome condensation [Cell division and chromosome partitioning]
COG0500 SAM-dependent methyltransferases [Secondary metabolites biosynthesis, transport, and catabolism / General function prediction only]
COG0520 Selenocysteine lyase [Amino acid transport and metabolism]
COG0601 ABC-type dipeptide/oligopeptide/nickel transport systems, permease components [Amino acid transport and metabolism / Inorganic ion transport and metabolism]
COG0626 Cystathionine beta-lyases/cystathionine gamma-synthases [Amino acid transport and metabolism]
COG0714 MoxR-like ATPases [General function prediction only]
COG0747 ABC-type dipeptide transport system, periplasmic component [Amino acid transport and metabolism]
COG0765 ABC-type amino acid transport system, permease component [Amino acid transport and metabolism]
COG1062 Zn-dependent alcohol dehydrogenases, class III [Energy production and conversion]
COG1123 ATPase components of various ABC-type transport systems, contain duplicated ATPase [General function prediction only]
COG1126 ABC-type polar amino acid transport system, ATPase component [Amino acid transport and metabolism]
COG1135 ABC-type metal ion transport system, ATPase component [Inorganic ion transport and metabolism]
COG1173 ABC-type dipeptide/oligopeptide/nickel transport systems, permease components [Amino acid transport and metabolism / Inorganic ion transport and metabolism]
COG1307 Uncharacterized protein conserved in [...] [Function unknown]
COG1321 Mn-dependent transcriptional regulator [Transcription]
COG1464 ABC-type metal ion transport system, periplasmic component/surface antigen [Inorganic ion transport and metabolism]
COG1715 Restriction endonuclease [Defense mechanisms]
COG2011 ABC-type metal ion transport system, permease component [Inorganic ion transport and metabolism]
COG2021 Homoserine acetyltransferase [Amino acid transport and metabolism]
COG2226 Methylase involved in ubiquinone/menaquinone biosynthesis [Coenzyme metabolism]
COG2873 O-acetylhomoserine sulfhydrylase [Amino acid transport and metabolism]
COG3393 Predicted acetyltransferase [General function prediction only]
COG4268 McrBC 5-methylcytosine restriction system component [Defense mechanisms]
COG5515 Uncharacterized conserved small protein [Function unknown]
pfam00083 Sugar (and other) transporter.
pfam00497 Bacterial extracellular solute-binding proteins, family 3.
pfam00782 Dual specificity phosphatase, catalytic domain.
pfam01053 Cys/Met metabolism PLP-dependent enzyme.
pfam02660 Domain of unknown function DUF.
pfam02732 ERCC4 domain.
pfam03747 ADP-ribosylglycohydrolase.
pfam04023 FeoA domain.

Supplementary Figure S3D: SAM-IV multiple sequence alignment

The multiple sequence alignment of SAM-IV riboswitches follows. The alignment includes sequences containing the putative SAM-IV riboswitches, as well as downstream sequence, in which Shine-Dalgarno sequences, rho-independent transcription terminators and start codons are annotated. Superscript “1”, “2”, “3” and “4” are as explained in Supplementary Figure S3B. Nucleotides proposed to basepair are colored when they comprise Watson-Crick or G-U pairs. Otherwise they are gray. Colors are as follows: P1, P2, P2b, P3, P4, P5, pseudoknots and stems of predicted rho-independent transcription terminators. Putative Shine-Dalgarno and start codons are colored green. They can be distinguished from pseudoknots because they appear 3′ to the conserved aptamer motif. Start codons are derived from RefSeq annotations, while Shine-Dalgarno sequences were estimated manually based on annotated start codons. Stems (except for terminators) are also indicated at the bottom of the alignment by angle brackets, where matching < and > denote base-paired columns. Below these angle brackets, the symbol “2” denotes base pairs exhibiting covariation, “1” denotes base pairs exhibiting compatible mutations, “0” denotes base pairs that are not observed to mutate and “?” denotes base pairs that have a significant frequency of non-canonical nucleotides for Watson-Crick or G-U pairs. Below these base pair annotations is the consensus sequence: “R” = “A” or “G”, “Y” = “C” or “U”, red nucleotides: nucleotide identity conserved more than 97% of the time, black nucleotides: 90%, gray nucleotides: 75%, red circle: nucleotide is present 97% of the time, black circle: 90%, gray circle: 75%, white circle: 50%. The following SAM-IV riboswitches are not shown because they have an identical nucleotide sequence to other hits that are shown: Cgl-1-2, Mav-1-1, Mbo-1-1, Msp-2-1, Msp-2-2, Mtu-1-1, Mtu-3-1, Mtu-4-1, Mtu-5-1.

(This explanatory text is largely copied from supplementary data on a different RNA motif [4].)
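The bracket and symbol annotations described above can be sketched in a few lines. The example below pairs up matching < and > columns and assigns one of the four symbols to a pair of columns. It is an illustration under stated assumptions, not the method of [4] or [5]: here "covariation" is taken to mean two Watson-Crick or G-U pairs that differ at both positions, "compatible" means pairs that differ at one position only, and the 10% cutoff for a "significant frequency" of non-canonical pairs is a placeholder. All function names are hypothetical.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_columns(brackets):
    """Return sorted (left, right) column indices for matching '<' and '>'."""
    stack, pairs = [], []
    for i, ch in enumerate(brackets):
        if ch == "<":
            stack.append(i)
        elif ch == ">":
            if not stack:
                raise ValueError(f"unmatched '>' at column {i}")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError(f"unmatched '<' at column {stack[-1]}")
    return sorted(pairs)


def classify_pair(left_col, right_col, max_noncanonical=0.1):
    """Return '2', '1', '0' or '?' for two paired alignment columns."""
    # Assumption: rows with a gap on either side are skipped.
    observed = [(l, r) for l, r in zip(left_col.upper(), right_col.upper())
                if l not in ".-" and r not in ".-"]
    if not observed:
        raise ValueError("no ungapped rows in this column pair")
    canonical = [p for p in observed if p in CANONICAL]
    # Assumption: the cutoff for "significant" non-canonical frequency is 10%.
    if 1 - len(canonical) / len(observed) > max_noncanonical:
        return "?"
    kinds = set(canonical)
    if len(kinds) == 1:
        return "0"  # no mutations observed
    for a in kinds:
        for b in kinds:
            if a[0] != b[0] and a[1] != b[1]:
                return "2"  # both positions changed, pairing kept
    return "1"  # only one position changed, pairing kept


print(pair_columns("<<..>>"))
print(classify_pair("GGA", "CUU"))
```

Pseudoknots appear on separate bracket rows in the alignment, so each row can be passed to `pair_columns` on its own without handling crossing pairs.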

alignment positions 1 ··· 149
4 Cef-1-1  AACCUCAUCUUCGCGAUCAAGAAGUUCAGCCU..CUAAGCCCUUCGG.CA.GGCUGAC.UGGCAACCGCGC.....AA.C.GC.....ACA..CGGUGCCC.CCGAAGGAAGAUCCGCUCUGUACUC...... AAAA......
  Cgl-1-1  AAUGUCGAUUUUACGAUCAAGAAGAUCAGCC...GCAAGCCCUUCGG.CA.GGCUGAC.UGGCAACCGCGC...... AUC.GC.....GCA..CGGUGCCC.CCGAAGGAAGAUCGGCUCUGUACUC...... AAAA......
  Aau-1-1  UAGAUUCAAGUCCAGGUCAAGAGUCAGCACG...CCAAGG..ACCGG.CU.CGUGCUG.CGGCAACCCUCA....GGCAU.U.AAG..UGGCGGGGUGCU..CCGGAAACAGACCAGGCCGCA...... CA......
  Aau-1-2  UAGGGUGGUCCCAGCUUCAAGAGAUGUGGUG...CCAAGC..UCCGA.CU.GGCCACA.CACCAACCCCAU...... GU.UC.....AC...GGGUGGU..UCGGAUGAGGAAGCGGCCCGUCCCGG.....ACCAGAGACAA.....
1 Mva-1-1  UACAGUGCUCAUUGCUUCACGAGAUGCCGUCG..CCAAGC..UCCGA.CUGGGCGGCA.CGCCAACCCCAC...... GC.UC.....GU...GGGUGGC..UCGGAUGACGAAGCGGCCA.GCACC...... GACC......
  Msm-1-1  UUAUCCUGAACCUGCUUCACGAGACGCCGUCG..ACAAGC..UCCGA.CUGGAUGGCG.CGCCAACCCCAC...... GC.UC.....GU...GGGUGGC..UCGGAUGACGAAGCGGCCUGCACC...... ACCC......
  Sav-1-1  AAAGCCUCCCGCUUGGUCACGAAGCCGAUCG...... CGCC.CCUGGACGAUCAUCGGCUGGCAACCUCGC....ACACC.GC.....GCA..AGGUGCCUUCCAGG.GAAGACCGGGCCCCCACU...... GUCCA......
2 Asp-1-1  UUACGUUUUUGGGGAGUAAUGAGAACCGGCG...CCAAGC..CCUGA.CU.GGCCGGU.CGGCAACCCUCC....UUC.C.AC.....GGCG.GGGUGCC..UCAGGUGAAUACUCGGCAUAUCGA...... UAU......
  Aau-1-3  UUACGUUUUUCUUGAGUAAUGAGCACCAGCG...CAAAGC..CCUGA.CU.UGCUGGU.CGGCAACCCUCU...... CUC.C...... GGCG.GGGUGCC..UCAGGUGACUACUCGGCG.UAUCGA...... CAAC......
  Fal-1-1  CUGAAGAUCACAACUGUCACGAGUGCCAACG...UCCAGC..CCCGG.CU.UGUUGGC.CGGCAACCCUCC...... ACC.GCGA...AG...GGGUGCC..CCGGGUGAGGACACG.GCCGCAUCC...... AGU......
  Sav-1-2  UUAAUUUCACGCCCGGUCACGAGUUCCAGCG...UCAAGC..CCCGG.CU.UGCUGGA.CGGCAUCCC.GC...... CC.GCU....AC...GGGUGCC..CCGGGUGACGACCGGCCCCGUGCG...... CGG......
  Sav-1-3  UUACGCCCCAGAACGGUCAAGAGCGUCAGCG...ACAAGC..CCUGG.CU.UGCUGAC.CGGCAACCCUCG.....CGAC.GC.....GGUG.GGGUGCC..CCAGGUGACGACCGGACCGGA...... UGACC......
  Fal-1-2  UAAGGUUUCUCCCACGUCUUGAGUGCCAGCG...UUCAGC..CCCGG.CU.UGCUGGC.CGGCAACCCUCC.....UUAU.GCA.....GCG.GGGUGCC..CCGGGUGGUGACGAGGCCG.CGGC...... AACCCC......
  env-1    UAUAGUCGGAGACAGGUCAAGAGAUUCAGCU...UUAAGC..CCCGG.CU.AGCUGGA.CGGCAACCCUCC.....AACC.GC.....GGUG.GGGUGCU..CC.GGUGGAGACCUGGCCC...... UCAC......
  Sco-1-1  UAGGUUUUUCGACAGGUCAUGAGUGACAGUCA..UGAGGC..CCCGG.CC.GACUGUC.CGGCAACCCUCC.....GUCC.GU.....GGCG.GGGUGCC..CCGGGUGAAGACCAGGUCGUGGAC...... AGCAAG......
  Sav-1-4  AAGAUGCUGUAUCAGGUCAUGAGCGACAGUCA..UGAGGC..CCCGG.CC.GACUGUC.CGGCAACCCUCC.....GUCC.GU.....GGCG.GGGUGCC..CCGGGUGAAGACCAGGUCGUAGGC...... AGCGAG......
4 Jsp-1-1  UAGACUCGUGCCCAGGUCAUGAGUCCCAGCG...ACAAGC..CCCGG.CU.UGCUGGG.CGGCAACCCUCC...... UC.GC.....GGUG.GGGUGCC..CCGGGUGAAGACACGGCCCUUCCGG...... UA......
  Nsp-1-1  GCUAGCGUCCCGUCCGUCACGAGUACUGGCG...CGAAGC..CCCGG.CU.GGCCAGU.CGGCAACCCUCC...... UCC.GC.....GGCG.GGGUGCC..CCGGGUGAGGACACGGCCGACUGCG...... AC......
  Rsp-1-1  UAAGAUUCACUCCAGGUCAUGAGUGCCAGCA...ACGAGC..CCCGG.CU.AGCUGGC.CGGCAACCCUCC...... ACC.GC.....GGUG.GGGUGCC..CCGGGUGAUGACCUGGUUUCCUGAUU.....CACACAAC......
  Aau-1-4  GACUCUUCGUAUUAGGUCAUGAGUGCCAGCA...CACAGC..CCCGG.CU.UGCUGGC.CGGCAACCCUCC.....UC.C.GC.....GGUG.GGGUGCC..CCGGGUGAAGACCUGGCCUGUUCGC...... ACGCAAG......
  Rsp-1-2  CAAUCUUCGAAACAGGUCACGAGUACCAGCG...UCAAGC..CCCGG.CU.AGCUGGU.CGGCAACCCUCC...... ACC.GC.....GGCG.GGGUGCU..CCGGGUGACGACCAGGCUGAGGUC...... CAUACC......
  Jsp-1-2  GACUCCACAACCAAGGUCACGAGUACCAGUG...UUCAGC..CCCGG.CU.UGCUGGU.CGGCAACCCUCC...... UCC.GC.....GGUG.GGGUGCU..CCGGGUGACGACCUGGUCGGCUGC...... AGCAA......
  Nfa-1-1  UACCAUGGCGGUCAGGUCAUGAGCGCCAGCG...UCAAGC..CCCGG.CU.CGCUGGC.CGGCAACCCUCC.....AGCU.GC.....GGUG.GGGUGCU..CCGGGUGAUGACCGGGCUCCCGA...... AGG......
4 Asp-1-2  GUAGACUCUUUAAAGGUCAUGAGUGCCAGCG...AC.AGC..CCCGG.CU.UGCUGGC.CGGCAACCCUCC.....UUUC.GC.....GGCG.GGGUGCC..CCGGGUGAAGACCUGGCCUGCCGGCCA....GU......
  Kra-1-1  GUUCGCUGGCGCCAGGUCAUGAGCGCCAGCG...ACAAGC..CCCGG.CU.CGCUGGC.CGGCAACCCUCC...... UC.GU.....CGUG.GGGUGCC..CCGGGUGAGGACCUGACCUGGUGCC...... CC......
  Asp-1-3  CUAAUCUCGAAACAGGUCACGAGUGCCAGCG...CUAAAC..CCCGG.UU.UGCUGGC.CGGCAACCCUCC.....AUUC.GC.....GGUG.GGGUGCC..CCGGGUGACGACCAGGCCGG.UCCG...... GAA......
  Aau-1-5  CUAGCCUUUUUACAGGUCACGAGUGCCAGCG...CUAAAC..CCCGG.UU.UGCUGGC.CGGCAACCCUCC.....AUUC.GC.....GGUG.GGGUGCC..CCGGGUGACGACCCGGCCG.GUCCG...... GAA......
  Jsp-1-3  ACGUUUCAACCACAGGUCAUGAGUGCCAGCG...ACAAGC..CCCGG.CU.CGCUGGC.CGGCAACCCUCC.....AACC.GC.....GGUG.GGGUGCC..CCGGGUGAAGACCAGGCGGAGCGUC...... GACC......
  Mle-1-1  AUAGGCUGCAACGCGGUCAUGAGCGCCAGCG...UCAAGC..CCCGA.CU.UGCUGGC.CGGCAACCCUCC....AAC.C.GC.....GUUG.GAGUGCC..CC.GGUGAUGACCAGGUUGAGUAGCC.....AGAACC......
           <<<<....<.<<<<<<...... <<...... >>..>>>>>>.><<<..<<<.<<...... >>...>>>.>>>...... >>>>..<<<<<<<<<<<<<<<<...... >>>> ......
           2202....2.222222...... 22...... 22..222222.2220..002.2?...... ?2...200.022...... 2022..222??2??22220000...... 0000 ......
           <..<<<<<...... >>>>>.>......
           2..22202...... 20222.2......
           <<<<.<<......
           ???2.22......
           R YY GGUCAYGAGYRYCAGCR--- YAAGC--CCCGG-CU- GCUGRY-CGGCAACCCUCC----- YC-GC-----GGY -GGGUGCC--CCGGGUGA GACC GGYY ------

Msm-1-2 AUAGUCUCUUUGUCGGUCAUGAGCGCCAGCG...ACAAGC..CCCGG.CU.UGCUGGC.CGGCAACCCUCC.....AACC.GC.....GGUG.GGGUGCC..CCGGGUGAUGACCAGGUUGAGUGGUUGACGGAUUCCU...... CCGU Mfl-1-1 AUACAGUUCGUCUCGGUCAUGAGUGCCAGCG...ACAAGC..CCCGG.CU.UGCUGGC.CGGCAACCCUCC.....AACC.GC.....GGUG.GGGUGCC..CCGGGAGACGACCAGGUUGAGUAGCCG....AACA...... Mva-1-2 AUAGUCUUUGACUCGGUCAUGAGUGCCAGCG...AUAAGC..CCCGG.CU.UGCUGGC.CGGCAACCCUCC.....AACC.GC.....GGUG.GGGUGCC..CCGGGAGACGACCAGGUUGAGUAGCCG....GCCA...... Msp-3-1 AUAAGCUUCCUGGCGGUCAUGAGUGCCAGCG...UCAAGC..CCCGG.CU.UGCUGGC.CGGCAACCCUCC.....AACC.GC.....GGUG.GGGUGCC..CCGGGUGAUGACCAGGUUGAGUAGCCG....UGA...... Msp-1-1 AUAAGCUUCCUGGCGGUCAUGAGUGCCAGCG...UCAAGC..CCCGG.CU.UGCUGGC.CGGCAACCCUCC.....AACC.GC.....GGUG.GGGUGCC..CCGGGUGAUGACCAGGUUGAGUAGCCG....UGA...... Mul-1-1 GUAGGCUUCAGAUCGGUCAUGAGCGCCAGCG...UCAAGC..CCCGG.CU.UGCUGGU.CGGCAACCCUCC.....AACC.GC.....GGUG.GGGUGCC..CCGGGUGAUGACCAGGUUGAGUAGCC.....ACAGUC...... Mtu-2-1 CUAGGCUUCGAGUCGGUCAUGAGCGCCAGCG...UCAAGC..CCCGG.CU.UGCUGGC.CGGCAACCCUCC.....AACC.GC.....GGUG.GGGUGCC..CCGGGUGAUGACCAGGUUGAGUAGCC.....AUCGCC...... Mav-2-1 AUAGUCUUUGUAUCGGUCAUGAGCGCCAGCA...UCAAGC..CCCGG.CU.UGCUGGC.CGGCAACCCUCC.....AACC.GC.....GGUG.GGGUGCC..CCGGGUGAUGACCAGGUUGAGUAGCC.....ACCACAACC...... Asp-1-4 CUACACUGGGUCAAGGUCAUGAGCGCCAGCA...UUGAGC..CCCGG.CU.UGCUGGC.CGGCAACCCUCG.....UUCC.GC.....GGUG.GGGUGCC..CCGGGUGAGGACCUGGCCU.CCGGC...... AACC...... Fsp-1-1 UACAGUUUUCGCCAGGUCCUGAGCGCCAGCG...UCAAGC..CCCGG.CU.UGCUGGC.CGGCAACCCUCC...... UCC.GC.....GGUG.GGGUGCC..CCGGGUG.UGACCAGGCCC.GCGGCG.....AACC...... 3 Kra-1-2 CUAGCUUCUUCCUCGGUCAUGAGUGACAGCU...GUAGGC..CCUGG.CU.GGCUGUC.CGGCAACCCUCC.....UUCC.GU.....GGCG.GGGUGCC..CCAGGUGACGACCCGGCCGGGCG...... ACG...... Kra-1-3 CUACGGUCCUCCCAGGUCAUGAGUGACAGCG...ACAAGC..CCUGG.CU.CGCUGUC.CGGCAACCCUCC...... UCC.GC.....GGCG.GGGUGCC..CCAGGUGAAGACCCGGCCGGACG...... AAAA...... 4 Nfa-1-2 UCUCAGUGACCGAAGGUCAUGAGCACCAGCG...CCAAGC..CCCGG.CU.CGCUGGU.CGGCAACCCUCC...... UCC.GC.....GGCG.GGGUGCU..CCGGGUGACGACCUGGCCGUACCG...... AAGA...... 
Str-1-1 CUACACUGCAAGCAGGUCACGAGCGCCAGCG...ACAAGC..CCCGG.CU.UGCUGGC.CGGCAACCCUCGUCGAGGUUC.GC.....GGUG.GGGUGCC..CCGGGUGAUGACCGGGCCUGGCGCG...... GCG...... Sar-1-1 CUACACUGCGAACAGGUCACGAGCGCCAGCG...ACAAGC..CCCGG.CU.UGCUGGC.CGGCAACCCUCGUCGAGGUUC.GC.....GGUG.GGGUGCC..CCGGGUGAUGACCGGGCCUGGCGCG...... GCG...... Nfa-1-3 CCAAUCUCGGUCCAGGUCAUGAGUGCCAGCG...CAAAGC..CCCGG.CU.CGCUGGU.CGGCAACCCUCC...... UCC.GC.....GGUG.GGGUGCU..CCGGGUGACGACCUGGCCGCCGUCCG.....AC...... Kra-1-4 GCAACCUCGCCGCAGGUCAUGAGCGCCAGCG...ACAAGC..CCCGG.CU.CGCUGGC.CGGCAACCCUCC...... UCC.GC.....GGCG.GGGUGCC..CCGGGUGACGACCUGGCCGGUGCCG...... ACG...... Mma-1-1 CAUCCUGGUCCGCAGGUCAUGAGUGCCAGCG...CGAAAC..CCCGG.UU.UGCUGGC.CGGCAACCCUCC.....UCUC.GC.....GGCG.GGGUGCC..CCGGGUGACGACCAGGCCGCACGCCG.....AC...... Msp-1-2 UUAUUCUGAGACCAGCUCACGAGCGGACGGCAGUUUGGGUA.CCUGG.CCCGUCGUCC.CGGCAACCGCGC.....GACCACC.....GCA..CGGUGCC..CCAGGGGAAGAGCGGGCGCG...... GCUUA...... Mfl-1-2 UUACUCUGGCUUCAGCUCACGAGCGGACGACUGUUCGGGUA.CCUGG.CCCGUCGUCC.CGGCAACCGCGC.....GACCACC.....GCA..CGGUGCC..CCAGGGGAAGAGCGGGCACGC...... GGU...... 4 Msp-3-2 UUAACCUGGGAGCAGCUCACGAGCGGACGACUGUUCGGGUA.CCUGG.CCCGUCGUCC.CGGCAACCGCGC.....GACCGCC.....GCA..CGGUGCC..CCAGGGGAAGAGCGGGCACGA...... GCA...... Bli-1-1 AAGAUCAUUCACACGUUCACGAGUCGUGGCU...AUCGGUA.CCUGG.CC.GGCCACG.CGGCAACCC.GU.....CAUC.G...... ACC..GGGUGCC..CCAGGGAAAGAACGGGCCCGCACCA...... CG...... <<<<....<.<<<<<<...... <<...... >>..>>>>>>.><<<..<<<.<<...... >>...>>>.>>>...... >>>>..<<<<<<<<<<<<<<<<...... >>>> ...... 2202....2.222222...... 22...... 22..222222.2220..002.2?...... ?2...200.022...... 2022..222??2??22220000...... 0000 ...... <..<<<<<...... >>>>>.>...... 2..22202...... 20222.2...... <<<<.<<...... ???2.22...... R YY GGUCAYGAGYRYCAGCR--- YAAGC--CCCGG-CU- GCUGRY-CGGCAACCCUCC----- YC-GC-----GGY -GGGUGCC--CCGGGUGA GACC GGYY ------qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa alignment positions 150 ··· 309 4 Cef-1-1 ..GACUGCAGAGAAGA.GC.GAUU...... 
AUGAAGGAGAUUCCCAUGCGGAAUACCUAUGAAAACACCCCGCUAGUAAUCAGAUUGCACAACAAGUGCGGGCACACUCAAUCCG Cgl-1-1 ..GAAUGCAGAGAAGA.GC.GAUUUUU...... UGAAG...... GAGAUU.CCCAUGGGGAAUAUUUCUGCUAAGCCUCCAUUGCCCAAUAGCUGCGGACAUCAUUUACCAGGACACGAACCCUACUACCUACAAGU Aau-1-1 ...... UGCGGCAAGU.GA.AUGUCGU...... CCCGGCCGCCGC...... GCGGCC.UCUGAAAGAGAGAAUCAUGCCUUCCACUCCUGUACCUGAAUUCCGCUCCCGCGUCAUCGCAUUGGAGCUCGACGGCGAUGGCGCCC Aau-1-2 .CCGA.GCGGGCAAGA.GC.ACUCU...... CCCAAGGAGCAGUCCCUUGCCGUACCAAGGCGAUACCACCCCGGCAGCAAGCACCCCAAAAACACCCAGCAUCCUGGAAGCACCA 1 Mva-1-1 ..GGUGCCUGGCAAGA.GC.GCCCGGC...... AACGG...... GCCCGG.UUGAUCCGGAGCAUCAAUGAACACCUCACGAACGACGAUUCAGCGUGACCGCGCCGCCGGUGCACUGGUCGGUCUGGCCUCGGGGG Msm-1-1 ...GGUGCGGGUAAGA.GC.GCCUCC...... GCGUG...... GGAGG.CCCGGAUCGACCCGGGAGAAGCCAUGGGUUCUUCACGUAUCGAGUUGACGGCUGCGCAGCGAGAUCGGGCUGUCGGCGUGCUGUUG Sav-1-1 ...GUUGGGGGUAAGC.GCGGAUCUCC...... CUGCACGACGCGUAA...... GGGGGU.CCCCGAUUCUCAGGAGACCCCAUGGACAACACCCGCGCAUCAGCCACGCCCCGGAACUGGGCGGGCUACCAGCAGGCAGUCGUGGA 2 Asp-1-1 ...UCGAAUUGCAAGC.GU.GAGAG...... AAGGAGAUCCGUCGUGUCCGACGCACCCGCACAGCUGGCCCCGGCAGCCGAACCAGACGCCGCCGGCCAGGAACAACCGGCCCUG Aau-1-3 ..UCG.UACUGCAAGC.GU.GACUG...... AGGAGCACCAGCAAUGCCUGAACCCACUGAAGAAAAACUCUCCUACCGCCUGAUUACAGGCCCGGAUACCCGUGAUUUCUGCGAG Fal-1-1 ..GGAGGCGGCAAAGC.GC.GGGCCC.GG.....UCU...... CCUGGGCC.UCGGCCGAGGGAGCAAAGACAUGCUGGGAGCGACUCGCCACGCGGGUGGCCGCUCCACGACCGCCGGUCGCGGGCCGGGCACGAUC Sav-1-2 ...UGCACGGGGAACC.GC.GG...... AUCA...... C.CGGGGUGUGGCACUACCGGGGUCCACCGACUCCGGAAGGCCGUCCGCCCAUGCCCGCGUCCGACGCGAUCCCCGCCCUGCCACAGC Sav-1-3 ...... UUCGGUAAGC.GC.GAGGCCC...... CCCAA...... GGG....CCGGAACAGUGACGGUGACGGCGAUGGACGCAGCUACGAGGACGACCGCACAGCGGCGCCCCCGCCGCGUCCUGCUGUGCGCGAAC Fal-1-2 ...GCCGACGGCAAGU.GC.CGACCCU...... GUG...... AGGGUC.CUACCAGGAGCGAUGAUGACGUCGAUCAUCUCUGCCGUGUCCCCGUCCCCUGCCCGGCAGGCCUGCCCCGACCAGCCGACCGGGCA env-1 ...... GGGCAAGU.GC.GGUCUUG...... CGCAGGG...... CAAGGC.CCCGGAUGAAUGCGAGGUCUCAAAUGCGCUGUACCUGUCUCAUCUAGUGACCUGGCGCCACGCCCUGCGACUGUGCGCGCCAUCAG Sco-1-1 ...GUCCACGGCAAGC.GC.GGACCCCU...... 
CGCGGAACC...... AGGGGUC.CUGGGUCGUCCGAGGGAGUCUCCCGUGAACCAGCGAAUGCGAUAGGGCCGCAGAGCCCCGCCUCCGCACACCCCUUUUCUUCUCUU Sav-1-4 ...GUCUACGGCAAGC.GC.GGACCCCU...... CGUAAUCACC...... AGGGGUC.CUGGUCGUUCGAGGGAGCCUUCCGUGAGCAAGCGAAUGCGAUAGGGCGCCGAGCCCCGCCUCCGCAAUCCCCUUUUCACCUUUCCU 4 Jsp-1-1 ..UCGGUGGGGCAAGC.GC.GAUCC...... GA...... GGAG.CACUCCGAUGACCAUCACCGACCUCCCGACAGCGUCGUUCUGGCGCCCGGGCGAUCACCCGGGCGCGCGCCAGUUCGUCCAGGUCG Nsp-1-1 ..CGCAGCCGGCAAGC.GC.GGACUCG...... GCGACC...... CGGGUC.CCUGACCCGACGGAUGUGCCCACAUGCCCACGAACACUCGCACCUCGGCCCAGGCCCGGGCCGCUCGGACCCGCCCACUGGCCCGC Rsp-1-1 .AGUCGGGUGACAAGC.GC.GGGUC...... CGACAUCGGAGGUUCAUGAUGUGUUGUUGUCGUAGAUAGACCCCAGACAAGCCCUGCGCGCGAACGAACCAGCAACUCUGUCGCG Aau-1-4 ..GCG.ACGGGCAAGC.GC.GAGGUU...... AAGGUGUCAAAGGAAUGACCAUUACUGCUACAGCCCUUCCAAAAUCCGGAGAAGAAGACGGAAUCGUCAAGUACGCCGGCAUAGG Rsp-1-2 ...GGACCCGGCAAGC.GC.GGACUC...... CGAGAGGGCACCGACCCUGCGAC...... GAGUC.CCUGACUACGAGGGAUGUCGUGAUGACUGCUGUACUGACCGCUGACGUGUGUGUUGCCCCCCUAGCCCAGGUUUCGGGCUCCGACC Jsp-1-2 ..CGCAGUUGGCAAGC.GC.GGACUCCCGAG...CACC...... UUCGAGGGUC.CAACGACUCUCCAGGGAGCACCGGAUGUCCAUCGCCAGCUCGAUUCCUCACACCUUCACCCGUGAACGUCGCCAGCCGUGGCUGCG Nfa-1-1 ....UCGGGGGCAAGC.GC.GCUCG...... CAGGAGGUUUUCCCGACAUGAUCGAGUGGUCCGCGUGACCGUGAGCACCGAUCAGAGCCCCUGCCCCUCGGCCACCGGGGCGGAA 4 Asp-1-2 UGG.CGGCAGGCAAGC.GC.GAAGA...... AGAGG...... UCCU.CAGAUGACGAUUGCCGUCACCCGCAGCGGUGUACCCGAAACAUCCAGCCACAGCCUGUCAGCCCGUGACGUGAAAACCACCGCGGG Kra-1-1 ..GGCAC.GGGUAAGG.AC.GAGCCG...... CCGCA...... CGCCU.CCCGGUCCGGUCGCCGACGCGUCCACGAUCGAGGAGAACCCCGGUGACCACCACCACCCCGCCCGUCCCCGCCACCACCGCCCGGC >>>>>>>>>>>>...... <<<<<<<<<<<...... >>>>>>>>>>.>...... 2222??2??222...... ?????2220??...... ??0222????.?...... >>.>>>>...... 22.2???...... 
-- GGCAAGY-GC-GR YY ------Y

Asp-1-3 ..CGGACCCGGCAAGC.GC.GGAUCCC...... CGGAGUGC...... GGGGUC.CCUUUACAGUGAGGUCCCCAUGACUACUGCCACUUUCCCCGCAAACACCGCAACCACGUCCACUGCAGGCCUUCUGGACCCGCGGU Aau-1-5 ..UGGAUGCGGCAAGC.GC.GGAUCCC...... CUCCG...... GGGGUC.CCUCUUGAGUGAGGUCCCAUGAGCACUGUCACGUUCGACUCAAACAUCAUUCCGUCCUUCGCCGCAGAAACCACAGCAGAAACAGC Jsp-1-3 ..GGCGC.CCGCAAGC.AC.GGCAUA...... CAAGGAGCUUUUGCGAUGGCUGACGACGCAACUUCUGGCACCCCGAACUGGUCGUUCGAGACCCGGCAGAUCCACGCCGGACAGG Mle-1-1 .GGCUACGCGGCAAGC.GC.GAGUCCG...... CCGUGA...... CGGGCC.CUCUUAGCAAACAGGGAAACGUGAUAUGCGGCUAUGUGUAUCGCUGCUAGCAACCAGCGUUGGUUGACGUCAUUAGCGAUGACGUC Msm-1-2 CGGCCGCAAGGCAAGC.GC.GGGUCCG...... CAGGUU...... CGGGCC.CCAGAGAGACAGGGAAACCUGAUGGAGGCCAACCAUGUGUCGUUGUUCGUGAACCGUUCACACCCACCCGCGGUGUUCAUCUGUAU Mfl-1-1 CGGCUAUAAGGCAAGC.GC.GGGUCCG...... CUGCUGAAA...... CGGGCC.CCCUGUUUCACGGAGGACUUUCGAUGACCGACCCCGAUCUGACUGCUGGUUGGGCUUUUGAGACCAAGCAGGUGCACGCCGGCCAG Mva-1-2 CGGCUAUAAGGCAAGC.GC.GGGUCCG...... CUGUGAA...... CGGGCC.CCCCGGUUUCACAGAGGAGCACAUCGAUGACCGAUACCACUGCGGCCAAUGCAUCCUGGGCGUUCGAGACCAAACAGGUCCACGCC Msp-3-1 CGGCUACGCGGCAAGC.GC.GGGUCCGU...... AUCG...... ACGGGCC.CCCAGACACAGAUGGAGGGCCGUUCCGUAUGAGCACGCCAGAAGACCUGAUCGCCAAUUGGUCGUUCGAGACCAAGCAGGUCCACG Msp-1-1 CGGCUACGCGGCAAGC.GC.GGGUCCGU...... AUCG...... ACGGGCC.CCCAGACACAGAUGGAGGGCCGUUCCGUAUGAGCACGCCAGAAGACCUGAUCGCCAACUGGUCGUUCGAGACCAAGCAGGUUCACG Mul-1-1 .GGCUACACGGCAAGC.GC.GGGUCCG...... CCAUGA...... CGGGCC.CCUGAGCAGACAGGGAAAGUAGCCUCAUGAGCGCCGAAAACACCAGCACUGACGCAGAUCCGACCGCGCAUUGGUCAUUUGAAACC Mtu-2-1 .GGCUGCGCGGCAAGC.GC.GGGUCCG...... CCAUGA...... CGGGCC.CCUGACCAGACGGGGAAAGCUCAUGAGCGCCGACAGCAAUAGCACCGACGCCGAUCCGACCGCGCAUUGGUCGUUCGAAACCAAAC Mav-2-1 .GGCUAUCCGGCAAGC.GC.GGGUCCG...... CCAUGA...... CGGGCC.CCUGAGCAGACAGGGAAACGUGAUGCGCAUUCACUCGGUUCAUCGGCAAGCUCGAUCGCGCCCGCGCGUGCGUCGCUGCUGACGGC Asp-1-4 ..GCCGGUCGGCAAGC.GC.GGCACCA...... GAACCGGCCCGGCAUACCGUGCGUCCGCCUG..UGGUGC.CUUUCUUUAGAAGGAGUCUUCAAUGUCCAAUGCCUGGUCUUUCGAAACCCGCCAGAUCCACGCCGGGCAGGAGCCGGACAGCGCCA Fsp-1-1 .CGCCGUC.GGCAAGU.GC.GGACCC...... CCUCU...... 
GGGUC.CACCGGGGCCCAAGGAGGCUUCGUCGUGUCCCUUUCGUAUGCCACCACGUCUCCCGACGUUGAUCGGGAAGGGCCCGCCGUCGGCG 3 Kra-1-2 ....UGCCCGGCAAGU.GC.GGAG...... CUGAGGAGAAGCACGUGAUCCCCACGUCGGAGCGGGUCCUGCAGUCGGUGCUGCUCGUCGAGCGUCGCCACGUGGACCUGGUGCG Kra-1-3 ....CGUCCGGCAAGC.GC.GGAUC...... AACGGACGCCGUCGACCACGGGGUGCUGCGAGCCGCGCGACCGCCGUUCGACCCCCGGAGAGCCCACCGUGUCCAGCCCCGCCCC 4 Nfa-1-2 ...CGGUGCGGCAAGU.GC.GGAUU...... CACAGGA...... GGUU.CGAAUGAGUUACGCCGGGGACAUCACCCCCCGCCAGGCCUGGGAACUGCUGCGUGAGAACCCCGCGGCCGUCCUCGUCGACGUGCG Str-1-1 ..CGCGUCGGGCAAGC.GC.GGACCCCG...... CCCGCCCACC...... CGGGGUC.CCUGACUCCGGAGAAGCCACCCCGAUGACCUGUGCCCUCGCCCCGUCGCCGACCGUCGCCGGCCCCCUCAACGUGCUCGGCGUUCC Sar-1-1 ..CGCGUCGGGCAAGC.GC.GGACCCCG...... CCCGGACACCUCCAC...... CGGGGUC.CCCGACCCCGGAGACGUACCCCUAUGACCUGUGCCCUCGCCCCGGCACCGACCGUCACCGGCCCGCUCGACGUGCUCGGCGUGCCC Nfa-1-3 .CGGAC.GCGGCAAGC.GC.GGACUCG...... A...... CGAAUC.CUCUCGAUCUCGGGUCCAUGAACCGACGGGAUGACGUGAUGACUGCUGUUGUGGACCAGACCUGCGCCGCCUUCGCCCGGGUGUCC Kra-1-4 ..UGGCGCCGGCAAGC.GC.GGACCUC...... GCCGG...... GGGAAC.CCCGGUCCUCCCGGCAGGGUCCUGUUCCCGAGGAGACCCGUCGUGACGACCACCACCCUGUCCCCCUCCCGCCCCUGUCCCGGGGC Mma-1-1 .CGGCGGGCGGCAAGC.GC.GGGGAG...... GUU...... CUCCC.CGGACGACCCUGGAGGACUGUGCCAUGACUGCUCUCGCGGACGUCUCGUGUCCCACCGACGCCCCGGCCCCUGACCUCGCGGCGCU Msp-1-2 ...... CGUGCAAGU.GGUGGACGUC...... CGUCGGU...... GGCGUCGCCGGAAAAGGCGAAUGAUGACACAGCACCCUCUGAGCGCAGCCGAUGCGGCUCUGCGCGAGCGUAUCACCGAGCUUUCGGUGCACA Mfl-1-2 ...... GCGUGCAAGC.GGUGGACGCC...... CGAAUGC...... GGCGUCGCCAGAAACGGCGAACGAUGUCAGAGCAACCUAUUACCGAACUGUCCGUGCACAUUCCGUGCGGCGGGCUGCGCGGGCCCGUUCAAC 4 Msp-3-2 ...... UCGUGCAAGU.GGUGGACGUC...... CGGAGGC...... GGCGUCGCCGGAGAUGGCGAAUGAUGACACAACAACCCCUGAGCGCGACCGAUGCGGCUCUGCGCGAACGUAUUACCGAACUCUCGGUGCACA Bli-1-1 ..UGGUGCUGGCAAGAUUC.GAUCC...... GGGCGCGAUGCGCCCGAGAAACGGGUGCCGUCAUGUCACUUGACGCCAACGAGAACUGGCCAGGAUUCUCCCCAGAGGAAUCCCU >>>>>>>>>>>>...... <<<<<<<<<<<...... >>>>>>>>>>.>...... 2222??2??222...... ?????2220??...... ??0222????.?...... >>.>>>>...... 22.2???...... 
-- GGCAAGY-GC-GR YY ------Y qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa alignment positions 310 ··· 469 4 Cef-1-1 GGCAUGAUGUGUUUUACCUUCGGGCAAAGGUAACCCUGGAAAGAUCGGAGUGGUUGCCGGUUUCCAAUCUGCGAAUGGACGGGGAGUGGGUCGAAUUUGAUGCCGGUGACAAGCAUUAUCGCGGUUGGAAUCACAACAGCGAACUAAUUGCACAGGUCAU Cgl-1-1 AAAGGUUUUGCGCGAAAGGGCGACAUGGGAGCCAGUUACUAACCUGCGUAUGGAUGGCCAGUUCGUCGAGUUUGAUGCUGGUGGGGAACAUAUUCGUGGUUGGAAUCAUGAUCCGGCUCGGAUGGAAUGGUUUGUAAAAAAUGGUGGUCUGGCUGAAUGG Aau-1-1 ACCCAGCGUCGUGGCGCAAGGCCCACCAUGAUCCGCAGGCACUAUUGGGGGGCCCACGGAUCCGUGAAACUGUCCAAGCAGCGGACAUCGCCGGUUUCCAUGUGGCAACGUUCAAGGACACCCGCGUUGUGGAUCCUGGAAGCACCGGGAUUGCGGGCCG Aau-1-2 GGAACCCCGGCGAAAACGCCCACGGUCCCGGUGAUUCCCGCCGUCGACCUCUUCCUUGGCACGGUGACCAUCACGACAAAGGCCCGGACAGGGAAGUUCGCAUAUCCCGAGCCGCAGCUGAUCGCGCAAGCAUUGGCCAAAGCCAUACUGCCGACGCGCU 1 Mva-1-1 ACGCCGUGGGCGCCGGCUACGAGUUCGACCCGCCAAUGGCGGCCGGCCAUCCGGUUGCCAUGAUCGGCGGCGGGCCGGCCCCGUUCCAGCCCGGCGAGUGGACCGACGGCACGUCGAUGGCCAUCGCGAUCGCCGAGAUCGCCGCGACGGGAGCCGACCU Msm-1-1 GGUGCCGCGGCCGGUGAUGCGCUCGGCGCCGGAUACGAGUUAAACCGGCCGCUCGCGCCGGAUCACCCGGUCGGCAUGAUCGGCGGCGGACUCGGACCGUUCGCGCCGGGCGAGUGGACCGACGACACCUCAAUGGCGAUCGCGAUCGCCGAAAUUGCGU Sav-1-1 CAUCCGACUCGCCGACCGCACGGUGAGGGUCGUGCCGGGCCCGCAGGGUACAGCCGAAGGCCACUUCCCAGAACCGGCCGGGCGCACCAUCCAUGUGAUCACCGCAUGCAAUCCCUAUGGCCGUACGGCCUCUGCCGAGGACAACGGCCGUGCCCAGGCG 2 Asp-1-1 GCACUGCCGCAGGAAAGGCUGGCCUACCGGCUCAUCACCGGCCCCGACGACCGCUCGUUCUGCGAGCGCAUCUCCGCAGCACUGGCCGACGGCUAUGUCCUGCACGGAGGCCCGGCGGCAACCUUCAACGGCACCAGCGUGAUCGUUGCCCAGGCCGUGA Aau-1-3 CGCAUCCACAACUCGCUCGCAGAAGGUUACGUGCUCCAUGGCAGCCCGGCCGCCACGUUCAACGGCACCGACGUGAUCGUUGCCCAAGCCGUUGUCCUCCCUGCCGCGAUCGCCAGUGCGGAUGCUGCGGUAGCCAACGCCGUCGAUCAGCUGGAAAACU Fal-1-1 
ACCGUGCCCGGCCCGCGGGCUCACCGGUCACCGCAGGUGCGGCUCACCCGCCGCCUACCGAUCGACCUGCGUCGUUCAUCCAGCUGUCUCUGUUCGCCGGCCUGACACCCGCUGCCGGCCGGCCUCCGGCGGUCCCGCCGCGCCCCACCCACCCGAGUUG Sav-1-2 ACAUCGGCUCCGCCCUCCCCACGCAGCCCGAAGCUGCCGAGCCGGGUGCCGACCUGCCAACGAUCUCGGCCCGUCCGGAAGCGUUUUACCCCCACCCCUCGCCGCUCGCCCCCGUCGUCCAACGGGAGCCCGGCCCGGUGCCCGCGGAACACGAGCACGC Sav-1-3 CGCUGUGUCGAUCUGCUCCGUGUCAGCAGCGCGCUGUGUACCGCCCCGGGCUCUGCCCGCCGUCGAGGCUGACGGCGUUCUGAUUCUCUGACGCGCACCGCCGCCGCGGUGUGCUCUCCGCCCGCCCAGUCCCCGGUUCCGGCGCUCUCCCGUGCUUCGU Fal-1-2 GCGGAUCGGCCGGGCACCCACCGGCGGCGAGCCCGCGGGUCACGAGCCCGCGGGUCACGAGCUCAAGGAUCACGAGCUCACGGGCGCCGCCGAACUUUCGGACGCCCAUCGGGUGCAGGGUGACCCGCCCGGUGCGCUGCUGGCGGUGCAGGGAGCGGGG env-1 AACCCCUGUAUCGGCGUCCCGAACGCCCGCCGAGUCCUUGUUCGGCUCGGGUGACAACGGGCUCAUCGAUAACCGCACGACAAGCACGACCACACGUAUACGAAACGCAUCACCAACGAAACUCAAGGAGAGUCACAUGAGCGAGGGCUGGUCCUUCGAA Sco-1-1 UCACAGCGCCCUUGUCUGCGCUGGUCCUCACGGGAGUCCGCCAUGUCCGUACCCACCGCUGUCGCCACCCCGCUGCCCGUCCUCGGCCGGGACGUCACCGUGCCGCUCGUCACCGGCGGUCAGGUCGCCUACGCCGCGCUCGACUACGCCGCCAGCGCGC Sav-1-4 GUGCCGUUGACGUGCGCGUUUCGCGCUGUCCGGCACGGUGAUUCCGCCUGCUCUUCACGGGAGUACACCCAUGUCCGUCUCCACCGCUGCCGCCGACCAGACCGUUUGCUGUGACAUUUCGGGACACCUGCCCGUUCUGGGCCGCGAGGUCACCGUCCCG 4 Jsp-1-1 GCCCCGUCGCGCUCGAGCGCGGUGAAUCCCUGCCCGAGGUCACCGUGGCCUACGAGACGUGGGGGACGCUCAACGCCGCCCGCGACAACGCGGUGCUCGUCGAACACGCCCUCACCGGCGACGCCCACGUCGAGGGCCAGGCCGGUCCCGGCCACGCCAC Nsp-1-1 CUGGUCGGUGCGGACCAGCAGGUGCCGCUGGUGACCGGCGGCCCGGUCCGCUACGCGAACCUGGACAUCGCCGCCAGCGCCCCCGCGCUCCAGUCCGUGGCGGACCGGGUCGCGGAGUUCCUGCCGUUCUACUCCAGCGUGCACCGCGGCGCCGGCUACG Rsp-1-1 CCUCGUCAGACCACUGGAGUAGACAGCACAUGACCGAGAACUGGUCGUUCGAGACCAAGCAGAUCCACGCCGGGCAGACGUCCGACGCGACCACCAAGGCGCGGGCGCUGCCGAUCUACCAGACCACCUCGUACACGUUCGACAGCACCGAUCACGCUGC Aau-1-4 GCCGCUGGAACUUGAAGCCGGCGGAUUCCUUCCCGACGUUGUCCUUGCGUACGAGACCUGGGGGCAGCUGAACUCCGACGCAUCCAAUGCCGUACUCGUCCAGCACGCGCUCACGGGCAGUACACACGUAGCCCGCGGCGCCACUGAUGAAGAAGGCUGG Rsp-1-2 
UGCGGGUUCCGCUCGUUCAGGGAGGCGACUGUGCCUACGUCAACUUCGACUACGCCGCCAGUGCGCCGGCGUUGGCGCAGGUGACCGACCGGAUCGGGGCACUGCUCCCGACGUACUCCAGCGUGCACCGCGGCGCCGGCUACGCGUCGCGGGUCUCGAC ......


Jsp-1-2 CUCCGUCCCCGCACCGGCCGGGCCCGCCACCCCGAGCGCGCCCAGCGACGACAACGUCUCGCAUGCGGACGGCAUACCCACCGACGUUGUGGACACCAAGGUGCCGGCGGUCGUUGCCGCACCCAAGGUCCCGACGUGUCACGGCACCUUCGUCGAGUAC Nfa-1-1 CUGCUGCCGCCGCCGGACGGCACCCUGGCCAUCGUGCCCGUCGGCGACAUCCGGCUGGAGAGCGGCGCCGUCAUCCCCGACGUCCACCUCGGCGUGCAGCGCUGGGGCGAGCUCUCCCCCGGCCUGGACAACGUCGUGCUCGUCGAGCACGCCCUCACCG 4 Asp-1-2 CAAAACCGCAGGCACUGUCCCCGACGGUACCGUCAGGUUCCAGGGCAUCGGCGGGCUUGACCUUGAAGCCGGCGGGCAUCUGCCGGACGUCACACUCGCCUACGAGACGUGGGGCACGCUGAACGCGGACCGUUCCAACGCCGUGCUGGUGCAGCAUGCC Kra-1-1 AGCGCCUCGCGCCCGUGCUGGAGCGCAUCGCCGCCGGUGCGGUGCAGCGCGAGCAGGACCACGAGCUGCCCCGCGCGCAGGUGCGCGAACUGCUCGACGCCGGUCUCGGCCGGCUGCGCCUCCCCGUCGAGGAGGGCGGGGAGGGGCUCGACCUCGUCGA Asp-1-3 UCACCGUCGCCCAGCGCCCCCUCUCCGCCGUUACCGGCGCGGAGAUCCAGGCUCCGCUGAUCCAGGGCGGCCACGUCCGCUACGCGAACCUGGACUACGGCGCAUCGGCCCCGGCGCUGUCCGUUGUCUCGGCCUACCUCAACGAGAUCCUGCCGUACUA Aau-1-5 CCCUGCUGCACGGCCGCUGGCCGCUGUUACCGGCGCCGAGUUGCAGGCGCCCCUGAUUCAGGGCGGACAUGUUCGCUACGCAAACCUCGACUACGGGGCAUCGGCUCCGGCGUUGACCAUCGUCUCGGCGUACCUCAACGAAGUCCUUCCCUACUACGCG Jsp-1-3 UGCCCGACCCGACGACCGGCGCCCGCGCGCUACCGAUCUACCAGACGACCUCCUACGUCUUCAAGGACACCGAGCAGGCGGCCAACCUGUUCGCGCUCAAGGAGUUCGGCAACAUCUACACGCGCAUCAUGAACCCGACCCAGGACGUCGUGGAGCAACG Mle-1-1 CUAGCCUGAGUACCCCAAUUACACGAUUAGCCCACUUCCGAGCAGAAGGCAGCCCAAGUGAGCUCUGAAAAACAGCCCCCGAGCCGAGUCGAUCCGCAUUGGGCAUUCGCAACUAAACAGAUUGAUGUUAGGCAGAGCCCCGGCUCGGCCACCAACGCCC Msm-1-2 GUGCGCACCCGGGUCCGGCUCGCCCUAGGCGCCGGCCGUCCAGCCGCUUUCACCCCUCAUUCGCCCCCGAGCUCGCGGAGGAGACCCUCUAUGACCACCCCCGAUCCCACCGAGAACUGGUCGUUCGAGACCAAGCAGAUCCACGCGGGUCAGUCGCCCG Mfl-1-1 ACCCCCGACAGCGCGACGAACGCCCGGGCCCUGCCGAUCUACCAGACCACCAGCUACAUCUUCGACAGCACCGACCACGCCGCGGCCCUGUUCGGGCUGGCCGAGCCGGGCAACAUCUACACCCGGAUCAUGAAUCCGACCCAGGACGUGGUCGAGCAGC Mva-1-2 GGGCAGACACCCGACAUCGCCACCAACGCGCGGGCCCUGCCGAUCUACCAGACCACCAGCUACACCUUCAACAGCACCGACCACGCGGCGGCGCUGUUCGGGCUCGCCGAGCCCGGCAACAUCUACACCCGGAUCAUGAACCCGACGACCGACGUCGUCG Msp-3-1 
CCGGCCAGACGCCCGACGCGGCGACCAACGCCCGCGCCCUGCCGAUCUACCAGACCACCUCGUACACCUUCCGCGACACCGACCACGCCGCGGCGCUGUUCGGGCUGGCCGAGACCGGCAACAUCUACACGCGGAUCAUGAACCCGACGACCGACGUCGU Msp-1-1 CCGGCCAGACGCCCGACGCGGCGACCAACGCCCGCGCCCUGCCGAUCUACCAGACCACCUCGUACACCUUCCGCGACACCGACCACGCCGCGGCGCUGUUCGGGCUGGCCGAGACCGGCAACAUCUACACGCGGAUCAUGAACCCGACGACCGACGUCGU Mul-1-1 AAGCAGAUCCACGCCGGCCAGCAGCCCGAUUCCGCCACCAACGCGCGGGCGCUGCCGAUCUACCAAACCACCUCGUACACCUUCGAAAACACGGCGCACGCUGCCGCUUUGUUCGGCCUGGAGGUUCCCGGCAACAUCUACACGCGGCUGGGCAACCCCA Mtu-2-1 AGAUACACGCUGGUCAGCACCCUGAUCCGACCACCAACGCCCGGGCUCUGCCGAUCUAUGCGACCACGUCGUACACCUUCGACGACACCGCGCACGCCGCCGCCCUGUUCGGACUGGAAAUUCCGGGCAAUAUCUACACCCGGAUCGGCAACCCCACCAC Mav-2-1 UGCCCGCCGGAUUCUUCUUCCGACUUCUUCCGACCGACUUCUUCCGACCGAGAGAAAGCACCCCCGUGAGCUCCGAGAACACCGAUCACGACACCGACCCCAGCGCGCACUGGUCGUUCGAGACCAAGCAGGUGCACGCCGGCCAGCACCCCGACUCGGC Asp-1-4 CCGGCGCCAGGGCCCUCCCCAUCUACCAGACGACGUCGUUCGUGUUCCCGAGCGCCGAGAGCGCUGCCAACCGCUUUGCGCUGGCCGAACUGGCGCCCAUCUACACGCGCAUCGGCAAUCCCACCCAGGACGCAGUGGAGCAGCGCGUGGCCAGCCUUGA Fsp-1-1 CGCUGCUCGACGUGGUGGGGGCCGGCAUCCCCGUUCCGCUCGCCGACGGACGCGAGGUCCCGUACGCCAAUCUCGACCAGGCCGCCAGCGCCCCCUGCCUGCGCGGGGUCGCCGAGCACGUCGAGCGCGUCCUGCCGUACUCGGCGAGCGUGCACCGCGG 3 Kra-1-2 CACCGGCAGCGCGGCCUGUCGCCCCGGACGCGCCUGCCCCGUCUGACGCGCCUGGCGCGUCCCACGACGCCGUCACCUCUCCACCGUUCUCAGCUGCAUCCCUUCUCUCCCCCACCCGUUUCCGGACCGGCAGGAGUUCCGCCCGCAUGCCCGCACGCCC Kra-1-3 CGCUCUGCCCGACAAGCCCAAGCGUCGCGGUCUCCUCAUCGGGAUCGUCGCCGCCCUGGCCGUGAUCGCCGUGAUCGUCGCCGUCGUCCUCGGCACCCGCGACGACACCACCAGCACGGCGGCCGCCGGUUCCGGUUCCGGCGAGACGGUCACGAUCGGC 4 Nfa-1-2 UACCGAAGCCGAGUGGCGCUUCGUCGGCGUGCCCGACACCAGCUCCAUCGACCGGCCCACCCUGCUGAUCGAGUGGGUGGACGGCACCGGCUCGCCGAACCCGCGCUUCGCCGAGCAGCUCGGCAAGGCACUGGCCGACCGCGCGCCGGAGGCGCCCGUG Str-1-1 CGAUCAGAUCAACCUGGACUACGCGGCCAGUGCCCCGUGCGCGCAGGCCGCGGCGGACGCCGUGACCGCGCUGCUGCCCUGGUACGCAAGUGUGCACCGGGGCGCCGGAGCCCUGUCGCAGCACUGCACUCUGGCCUACGAGCGGGCCCGGCAGACGGUC Sar-1-1 
GGUCAGAUCAACCUGGACUACGCGGCCAGCGCGCCGUGCGCGCAGGUCGCGGCGGAUGCCGUGACCGAGCUGCUGCCCUGGUACGCCAGCGUGCACCGGGGCGCUGGCGCCCUGUCGCAGCGCUGCACCCUGGCGUACGAGCGGGCCCGGCAAUCGGUCG Nfa-1-3 GGCGACGACCUGCGCGUGCCGCUCGUCCAGGGCGGGACCACCACCUACGCCAACUUCGACCUGGCCGCGAGCGCUCCGGCGCUGGCCACGGUGGCCGACCGGAUCCAGCACCUGCUGCCCUACUACGCCAGCGUGCACCGGGGCGCCGGGUAUGCCUCGC Kra-1-4 GGUCCACGGGCCCCUGCUGCCCGUCGUCGGCGCGGGCACCCGCGUCCCCCUCGCCCACGGCGGGGAGAUCCCCUACGCCAACCUGGACAACGCGGCGAGCGCGCCGGCGCUGACCGCGGUCGCCGACCGCGUCGCGCAGGUCCUGCCCCACUACGCCAGC Mma-1-1 CGCCGACGCGCCCGCGCUCCUGCCCGUCGUCGGCGGGGACACGCUCGUCCCGCUCGUCGACGGCCGCAGCGUGCCGUACGCGAACCUCGACGUCGCGGCCUCGGCCCCGGCGCUGCGCUCGGUCGCGGACCGCGUGACCGAGGUGCUGCCGCUCUACGCG Msp-1-2 UCCGGUGCGGCGGUCUACGUGGCCCGCUGCAGCGACGAAGUCCGUCACAACCGGACCGGCCGGUGCGGUGGCAGUCGUGCGGGGACGAGGACCACCCCGUGCGAUGGGAGGACGCCGAUGUCUCAAGGGAGCACGAUCUGUGCAUCAUCUGCUUCCGGGG Mfl-1-2 UGCGCGGGCGACGCUACGCACCCGGGGAGGUGCGGUGGCAGUCCUGCAGCGAUGAGGUGCGUCCCGUCCGAUGGGCGGACUCCGACGUCUCCCGGGAGUGCGACCUGUGCGUCAUCUGCCUCCGGGCGACAGCCGGCGGCCGGUCGCGCUGGUCGUGGCU 4 Msp-3-2 UUCCGUGCGGCGGUCUGCGCGGCCCACUGCAGCGACGAAGCCCGUCGCACCCGGACCGGCCGGUGCGAUGGCAGUCAUGCCAGGACGAGGACCACCCCGUGCGAUGGGAGGACGCCGACGUCUCAAGGGACCGGGACUUGUGCGUCAUUUGUUUCCGGGG Bli-1-1 CCAAUGGGCCCGAGCUCUCCUCCGCCACUCGCCCCAGGCGCUGCCGCCCAGCUACAAGGCACUCGCGCACGCAGAUAUCUCACGCGGGGUUCCCCACGCCGGACCAGACUGGAUGAGAACCGCCGAAGCAGCUCGCACCAUCGACUUCACGCCUGUGCUC ......

alignment positions 470 ··· 524
4 Cef-1-1 CGCUGUCGAUCCAACAGCUCAGUGGGAAGAUAGCAGUAGUUUCGGUGGUCUCCUG
Cgl-1-1 AGUGGAUCUAGUCCAACCAGUCGCCUCAUGCGUUUUGAGUCCAAAAAUGGGCAUU
Aau-1-1 UUUGGACGCGAUCCAGCGGGCGGCCUUUGCAGCAGCGAUCACCCGGUCCAUCGGU
Aau-1-2 GGUGUGAGGCGUCAAGGACGCUGACGGUGACUGUUGCGGCCGUCGGGAAGCGGGC
1 Mva-1-1 GCGCCACGAGGAUUCGCUCGACUACAUCGCCGAGCGGUGGCACUGGUGGUCGCGC
Msm-1-1 CCACCGGUGAGGAUCUGAGAAAGGAGAAGCCACUCGACUACCUCGUCGAGCGCUG
Sav-1-1 CUGCUGUUCGAGGAGAUCGGCCGGCGUCGGCUCGCCUGGUGGCCCGCGGCAGGAG
2 Asp-1-1 UCCUGCCCGAGGCAGUCGCCAGCUCCGACGCGGCGGUGGCAACCGCAGUGGACGA
Aau-1-3 ACGACGACGAAGAAGCAUUUGAGGGCCACGCAUGAGCUACGCAGGUGAUCUGACC
Fal-1-1 AGCCGACCGUCGCCGUCCGUCGGCUGAGUAUCCGGACCUGGCCGCCCGGGUUCGG
Sav-1-2 GGCCGACCGGCAAGACACCUCGACGUGUGCCAGUACGCCGGGCAACUGCUGGGGG
Sav-1-3 CUCCCGCAGCGGCCUUGCGCCGCCGUGUGCCGUCGCGCUCGCCGCCCGGGGUGUC
Fal-1-2 CUGGGCGUGCCGCUGGUGGGCGGCGGCUGGGCGGAGUACGUCAACCUCGACCACG
env-1 ACCAAGCAGAUCCACGCGGGGCAAGCGCCCGACGGAGUUACCAACGCGCGCGCGC
Sco-1-1 CCGCGCUCCAGCGGGUCUGGGACGACGUGGCCGCCUACGCCCCGUACUACGGCAG
Sav-1-4 CUCGUCACCGGUGGCGAGGUCACCUACGCCGCCCUCGACUACGCCGCCAGCGCCC
4 Jsp-1-1 GCCCGGCUGGUGGGACGGACUGGUGGGCCCGGGACGCCCGCUCGACACCGACCGC
...... Y

Nsp-1-1 CCUCGGCCGUGAGCACCGCGGCCUACGAGACCGCACGCCGCACCAUCGGUGCGUU
Rsp-1-1 CGCGCUGUUCGGUCUGGCCGAGCCGGGCAACAUCUACACCCGCAUCAUGAACCCC
Aau-1-4 UGGGAGCAACUUGUUGGACCAGGUGCCACCAUCGACACCAACAAGUUCUUCGUCA
Rsp-1-2 CGAGUGCUACGAGAAGGCCCGCGACUCGGUCGCCCGAUUCGUCGGAGCGAGUGAG
Jsp-1-2 GCGAACCUCGACCACGGCGCCUCGACCCCGGCACUGGCCAGCGUGGUCGCGGCCG
Nfa-1-1 GCGACUCCCACGUCGUCGGCCCGGCCGACGACGUGCACCAGCUGCCCGGCUGGUG
4 Asp-1-2 CUGACCGGCAGCACGCACGUUACCAGGGGAGCCAGUGACGAAGAAGGCUGGUGGG
Kra-1-1 GUUCGCCGACCUGCUCGUCGACCUCGCCGCGGCCGACUCCAACCUGCCGCAGGUG
Asp-1-3 CGCCAGCGUGCACCGCGGCGCGGGCUAUGCCUCGCAGAUCAGCACGUCGGUGUAC
Aau-1-5 AGUGUCCACCGCGGCGCAGGCUACGCCUCCCAAAUCAGCACGUCGGUGUACGAGA
Jsp-1-3 CAUCGCCUCGCUGGAAGGCGGCGUCGGCGCUUUGCUCGUGGCCUCGGGUCAGGCC
Mle-1-1 GUGCGCUGCCAAGCCAACAAACCACCUCCUACACGUUCACCGACACCGCACAUGC
Msm-1-2 ACAGCGCGACCCACGCCCGUGCGCUGCCGAUCUACCAGACCACGUCGUACACCUU
Mfl-1-1 GCAUCGCCGCGCUCGAAGGUGGUGUCGCGGCGCUGUUCCUGUCCUCGGGCCAGGC
Mva-1-2 AGCAGCGCAUCGCCGCACUGGAAGGCGGCGUGGCGGCGCUGUUCCUGGCCUCCGG
Msp-3-1 CGAACAGCGCAUCGCCGCGCUCGAAGGUGGUGUGGCGGCGCUGUUCCUGUCGUCC
Msp-1-1 CGAACAGCGCAUCGCCGCGCUCGAAGGUGGUGUGGCGGCGCUGUUCCUGUCGUCC
Mul-1-1 CUACCGAUGUGGUCGAGCAGCGCAUCGCCGCGCUCGAAGGUGGGGUCGCCGCGCU
Mtu-2-1 CGACGUCGUCGAGCAGCGCAUCGCCGCGCUCGAGGGCGGUGUGGCCGCGCUGUUC
Mav-2-1 CACCAACGCCCGGGCCCUGCCGAUCUACCAGACCACGUCGUACACCUUCGACGAC
Asp-1-4 GGGCGGCCUGGCGGCGCUGCUGCUCAGUUCGGGGCAGGCGGCCGAAACAUUCGCC
Fsp-1-1 AACCGGCUACUCCUCCGCGGUCUGCACCGCGCUCUACGAGGGGGCCCGCGCCGCC
3 Kra-1-2 CCGAAGAACCCUCGUCGCCGGGUUCGCCGCUCUGCUGCUGACCGGUUGCGCCGGU
Kra-1-3 GUCGCCGACAAGUCGCUGCCGUACUGGAACACCUACACCGAGCUCGCGCAGUCGC
4 Nfa-1-2 GUGUUCCUGUGUCGGUCGGGGCAGCGUUCGGCGCACGCCGCCGACGUCGCGACCG
Str-1-1 GGUGACUUCUUCGGCGCCCGCCCCGACGACCACGUCAUCUUCACCCGGAACACGA
Sar-1-1 GUGACUUCCUCGGCGCGCGACCCGACGACCACGUCGUCUUCACCCGGAACACGAC
Nfa-1-3 GCAUCUCCACCGAGUGCUACGAGGCCGCGCGCGGCUCGGUCGCCCGGUUCCUCGA
Kra-1-4 GUCCACCGCGGCGCCGGGUACCUCUCCCGCGUCUCGACCGCGCUGUUCGAGCAGG
Mma-1-1 AGCGUGCACCGCGGCGCGGGCUACCUGUCGCAGGUCUCCACGGCGCUCUACGAGG
Msp-1-2 CACGGCCGGGGGUCGCUCCCGCUGGUCGUGGUUGGCUUGCGGCGACUGCCGCUCG
Mfl-1-2 GGCCUGCGAGAACUGCCGGGCCGUCAACUCAGCAGUCGAGACCGGCUGGGGAAUC
4 Msp-3-2 CACCGCCGGCGGCCGCUCCCGCUGGUCGUGGCUGGCGUGCGACAACUGCCGCUCG
Bli-1-1 UACCACUCACUCUUCAAAUCCCUGGAGUCGAUCGACUCCGACUCGUUCAGAUGGC
...... Y

Supplementary Figure S4A: taxonomy of SAM-I riboswitches

The taxonomy of each organism containing a putative SAM-I riboswitch is listed, with abbreviations (e.g., “Sus-1-1”) used to denote that riboswitch in later figures. This data is derived from the SAM-I alignment in [1]. (This explanatory text is largely copied from supplementary data on a different RNA motif [4].)
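The abbreviations in the table that follows are given either as a single label (e.g., "Sus-1-1") or as a range (e.g., "Kra-1-1 to Kra-1-5"). As a minimal sketch, a range can be expanded into individual labels as shown here. The function name `expand_hits` is hypothetical, and the code assumes only that the trailing integer of each label is a running index; the source does not define what the individual numbers mean.

```python
import re

# Hypothetical helper, not part of the original supplementary data.
# Assumption: a label ends in "-<integer>" and a range keeps the same stem.
_LABEL = re.compile(r"^(.+)-(\d+)$")


def expand_hits(spec):
    """Expand 'X-1-1' or 'X-1-1 to X-1-5' into a list of individual labels."""
    parts = [p.strip() for p in spec.split(" to ")]
    if len(parts) > 2:
        raise ValueError(f"unexpected range format: {spec!r}")
    first = _LABEL.match(parts[0])
    if first is None:
        raise ValueError(f"unrecognized label: {parts[0]!r}")
    if len(parts) == 1:
        return [parts[0]]
    last = _LABEL.match(parts[1])
    if last is None or last.group(1) != first.group(1):
        raise ValueError(f"range endpoints do not share a stem: {spec!r}")
    start, stop = int(first.group(2)), int(last.group(2))
    # Ranges in the table are inclusive at both ends.
    return [f"{first.group(1)}-{i}" for i in range(start, stop + 1)]
```

For example, `expand_hits("Kra-1-1 to Kra-1-5")` yields the five labels Kra-1-1 through Kra-1-5, which can then be matched against the rows of the gene-context table.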

abbrev. of hits  taxonomy of species
Sus-1-1  Solibacteres Solibacterales Solibacteraceae Solibacter usitatus Ellin6076
Fsp-1-1  Actinobacteria Actinobacteridae Actinomycetales Frankineae Frankiaceae Frankia sp. CcI3
Fsp-2-1  Actinobacteria Actinobacteridae Actinomycetales Frankineae Frankiaceae Frankia sp. EAN1pec
Kra-1-1 to Kra-1-5  Actinobacteria Actinobacteridae Actinomycetales Frankineae Kineosporiaceae Kineococcus radiotolerans SRS30216
Asp-1-1  Actinobacteria Actinobacteridae Actinomycetales Micrococcineae Micrococcaceae Arthrobacter sp. FB24
Sav-1-1  Actinobacteria Actinobacteridae Actinomycetales Streptomycineae Streptomycetaceae Streptomyces avermitilis MA-4680
Sco-1-1  Actinobacteria Actinobacteridae Actinomycetales Streptomycineae Streptomycetaceae Streptomyces coelicolor A3(2)
Tfu-1-1  Actinobacteria Actinobacteridae Actinomycetales Streptosporangineae Nocardiopsaceae Thermobifida fusca YX
Rxy-1-1 to Rxy-1-3  Actinobacteria Rubrobacteridae Rubrobacterales Rubrobacterineae Rubrobacteraceae Rubrobacter xylanophilus DSM 9941
Sth-1-1 to Sth-1-4  Actinobacteria Symbiobacterium thermophilum IAM 14863
Chu-1-1 to Chu-1-3  Sphingobacteriales Flexibacteraceae Cytophaga hutchinsonii ATCC 33406
Cte-1-1  Chlorobi Chlorobia Chlorobiales Chlorobiaceae Chlorobaculum Chlorobium tepidum TLS
Cli-1-1  Chlorobi Chlorobia Chlorobiales Chlorobiaceae Chlorobium/Pelodictyon group Chlorobium limicola DSM 245
Cph-1-1 to Cph-1-3  Chlorobi Chlorobia Chlorobiales Chlorobiaceae Chlorobium/Pelodictyon group Chlorobium phaeobacteroides BS1
Cph-2-1  Chlorobi Chlorobia Chlorobiales Chlorobiaceae Chlorobium/Pelodictyon group Chlorobium phaeobacteroides DSM 266
Pph-1-1 to Pph-1-2  Chlorobi Chlorobia Chlorobiales Chlorobiaceae Chlorobium/Pelodictyon group Pelodictyon phaeoclathratiforme BU-1
Pae-1-1  Chlorobi Chlorobia Chlorobiales Chlorobiaceae Prosthecochloris aestuarii DSM 271
Cau-1-1  Chloroflexi Chloroflexales Chloroflexaceae Chloroflexus aurantiacus J-10-fl
Gvi-1-1  Gloeobacteria Gloeobacterales Gloeobacter violaceus PCC 7421
Dge-1-1 to Dge-1-2  Deinococcus-Thermus Deinococci Deinococcales Deinococcaceae Deinococcus geothermalis DSM 11300
Dra-1-1 to Dra-1-3  Deinococcus-Thermus Deinococci Deinococcales Deinococcaceae Deinococcus radiodurans R1
Tth-1-1  Deinococcus-Thermus Deinococci Thermales Thermaceae Thermus thermophilus HB27
Bcl-1-1 to Bcl-1-8  Bacillales Bacillaceae Bacillus clausii KSM-K16
Bha-1-1 to Bha-1-5  Firmicutes Bacillales Bacillaceae Bacillus halodurans C-125
Bli-1-1 to Bli-1-10  Firmicutes Bacillales Bacillaceae Bacillus licheniformis ATCC 14580 (DSM 13)
Bsu-1-1 to Bsu-1-11  Firmicutes Bacillales Bacillaceae Bacillus subtilis subsp. subtilis str. 168
Ban-1-1  Firmicutes Bacillales Bacillaceae Bacillus Bacillus cereus group Bacillus anthracis str. A1055
Ban-2-1  Firmicutes Bacillales Bacillaceae Bacillus Bacillus cereus group Bacillus anthracis str. A2012
Ban-3-1 to Ban-3-17  Firmicutes Bacillales Bacillaceae Bacillus Bacillus cereus group Bacillus anthracis str. Sterne
Bce-1-1 to Bce-1-12  Firmicutes Bacillales Bacillaceae Bacillus Bacillus cereus group Bacillus cereus ATCC 10987
Bce-2-1 to Bce-2-10  Firmicutes Bacillales Bacillaceae Bacillus Bacillus cereus group Bacillus cereus ATCC 14579
Bce-3-1 to Bce-3-4  Firmicutes Bacillales Bacillaceae Bacillus Bacillus cereus group Bacillus cereus E33L
Bce-4-1 to Bce-4-7  Firmicutes Bacillales Bacillaceae Bacillus Bacillus cereus group Bacillus cereus G9241
Bth-1-1 to Bth-1-9  Firmicutes Bacillales Bacillaceae Bacillus Bacillus cereus group Bacillus thuringiensis serovar konkukian str. 97-27
Bth-2-1  Firmicutes Bacillales Bacillaceae Bacillus Bacillus cereus group Bacillus thuringiensis str. Al Hakam
Esi-1-1 to Esi-1-11  Firmicutes Bacillales Bacillaceae Exiguobacterium sibiricum 255-15
Gka-1-1 to Gka-1-8  Firmicutes Bacillales Bacillaceae Geobacillus kaustophilus HTA426
Oih-1-1 to Oih-1-13  Firmicutes Bacillales Bacillaceae Oceanobacillus iheyensis HTE831
Lin-1-1 to Lin-1-5  Firmicutes Bacillales Listeriaceae Listeria innocua Clip11262
Lmo-1-1 to Lmo-1-5  Firmicutes Bacillales Listeriaceae Listeria monocytogenes EGD-e
Lmo-2-1 to Lmo-2-7  Firmicutes Bacillales Listeriaceae Listeria monocytogenes str. 4b F2365
Lmo-3-1  Firmicutes Bacillales Listeriaceae Listeria monocytogenes str. 4b H7858
Sau-1-1 to Sau-1-3  Firmicutes Bacillales Staphylococcus aureus RF122
Sau-2-1 to Sau-2-2  Firmicutes Bacillales Staphylococcus aureus subsp. aureus MRSA252
Sep-1-1 to Sep-1-4  Firmicutes Bacillales Staphylococcus epidermidis RP62A
Cac-1-1 to Cac-1-7  Firmicutes Clostridia Clostridiales Clostridiaceae Clostridium acetobutylicum ATCC 824
Cpe-1-1 to Cpe-1-2  Firmicutes Clostridia Clostridiales Clostridiaceae Clostridium perfringens str. 13
Cte-2-1 to Cte-2-4  Firmicutes Clostridia Clostridiales Clostridiaceae Clostridium tetani E88
Cth-1-1  Firmicutes Clostridia Clostridiales Clostridiaceae Clostridium thermocellum ATCC 27405
Dha-1-1  Firmicutes Clostridia Clostridiales Peptococcaceae Desulfitobacterium hafniense DCB-2
Mth-1-1 to Mth-1-3  Firmicutes Clostridia Thermoanaerobacteriales Thermoanaerobacteriaceae Moorella group Moorella thermoacetica ATCC 39073
Tte-1-1 to Tte-1-3  Firmicutes Clostridia Thermoanaerobacteriales Thermoanaerobacteriaceae Thermoanaerobacter tengcongensis MB4
Lme-1-1  Firmicutes Lactobacillales Leuconostoc mesenteroides subsp. mesenteroides ATCC 8293
Ooe-1-1  Firmicutes Lactobacillales Oenococcus oeni PSU-1
Mmy-1-1  Firmicutes Mollicutes Mycoplasmataceae Mycoplasma mycoides subsp. mycoides SC str. PG1

Fnu-1-1 to Fnu-1-2  Fusobacterales Fusobacteriaceae Fusobacterium nucleatum subsp. nucleatum ATCC 25586
Fnu-2-1 to Fnu-2-2  Fusobacteria Fusobacteria (class) Fusobacterales Fusobacteriaceae Fusobacterium nucleatum subsp. vincentii ATCC 49256
Mma-1-1  α-proteobacteria Rhodospirillales Rhodospirillaceae Magnetospirillum magnetotacticum MS-1
Dde-1-1  δ-proteobacteria Desulfovibrionales Desulfovibrionaceae Desulfovibrio desulfuricans G20
Dvu-1-1  δ-proteobacteria Desulfovibrionales Desulfovibrionaceae Desulfovibrio vulgaris subsp. vulgaris str. Hildenborough
Dac-1-1  δ-proteobacteria Desulfuromonadales Desulfuromonadaceae Desulfuromonas acetoxidans DSM 684
Gme-1-1 to Gme-1-2  δ-proteobacteria Desulfuromonadales Geobacteraceae Geobacter metallireducens GS-15
Gsu-1-1 to Gsu-1-2  δ-proteobacteria Desulfuromonadales Geobacteraceae Geobacter sulfurreducens PCA
Xax-1-1  γ-proteobacteria Xanthomonadales Xanthomonadaceae Xanthomonas axonopodis pv. citri str. 306
Xca-1-1  γ-proteobacteria Xanthomonadales Xanthomonadaceae Xanthomonas campestris pv. campestris str. 8004
Xor-1-1  γ-proteobacteria Xanthomonadales Xanthomonadaceae Xanthomonas oryzae pv. oryzae KACC10331
env-1 to env-24  environmental samples

Supplementary Figure S4B: gene context of SAM-I riboswitches

All riboswitches (indicated by “RNA→”) are listed with their downstream genes, according to the RefSeq annotation. Environmental sequences and some RefSeq entries lack gene annotations, and no genes are listed for such sequences. Lines beginning with a superscript “1” indicate riboswitches that lack P4. The direction of each downstream gene is indicated with an arrow (→), and each conserved domain in the gene is colored. Conserved domains associated with more than one SAM-I riboswitch are assigned a color; other domains are gray. Conserved domains are explained in Supplementary Figure S4C. Nucleotide coordinates are given for the 5′ and 3′ boundaries of the riboswitch. This data is derived from the SAM-I alignment in [1]. (This explanatory text is largely copied from supplementary data on a different RNA motif [4].)
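To illustrate how a row of the gene-context table can be read, the following minimal sketch parses the leading fields of a row and derives the riboswitch length from its 5′ and 3′ boundaries. The class and function names are hypothetical, the accession is written with the underscore that RefSeq identifiers normally carry, and the length calculation assumes 1-based, inclusive coordinates, which the source does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class RiboswitchHit:
    """Leading fields of one gene-context row (names are illustrative)."""
    abbrev: str
    lacks_p4: bool      # True if the row carried the leading superscript "1"
    accession: str
    strand: str         # "+" or "-"
    five_prime: int
    three_prime: int

    @property
    def length(self):
        # Assumption: coordinates are 1-based and inclusive.
        return abs(self.three_prime - self.five_prime) + 1

    @property
    def coordinates_match_strand(self):
        # On "+" the 5' boundary precedes the 3' boundary; on "-" it follows.
        if self.strand == "+":
            return self.five_prime <= self.three_prime
        return self.five_prime >= self.three_prime


def parse_hit(row):
    """Parse the first five whitespace-separated fields of a table row."""
    abbrev, accession, strand, five, three = row.split()[:5]
    # The superscript "1" may survive text extraction as a plain "1".
    lacks_p4 = abbrev[0] in "1¹" and abbrev[1:2].isalpha()
    if lacks_p4:
        abbrev = abbrev[1:]
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand field: {strand!r}")
    return RiboswitchHit(abbrev, lacks_p4, accession, strand,
                         int(five), int(three))
```

Applied to the Lme-1-1 row (minus strand, 5′ at 1672379, 3′ at 1672239), this gives a length of 141 nucleotides and flags the riboswitch as lacking P4. The strand check is only a sanity test on the extracted numbers.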

abbrev  RefSeq accession  strand  5′ at  3′ at  genes
¹Lme-1-1  NC_008531.1  -  1672379  1672239  RNA→ MetK (COG0192)→ ADP ribosyl GH (pfam03747)→
Mmy-1-1  NC_005364.1  +  557978  558087  RNA→ MetK (COG0192)→ CutC (COG3142)→ Gid (COG1206)→ ManA (COG1482)→ Ung (COG0692)→ COG3481 (COG3481)→
¹Ooe-1-1  NC_008528.1  +  781465  781568  RNA→ MetK (COG0192)→
¹env-1  AACY01379654.1  +  207  340  RNA→ unknown→
Gka-1-1  NC_006510.1  -  460747  460629  RNA→ hypo→ CysH (COG0175)→
Dvu-1-1  NC_002937.3  -  1264327  1264192  RNA→ ←hypo
Dra-1-1  NC_001263.1  +  980694  980823  RNA→ MetH (COG0646)MetH (COG1410)→ hypo→
Dge-1-1  NC_008025.1  +  1419237  1419367  RNA→ MetH (COG0646)MetH (COG1410)→ MetF (COG0685)→
Dde-1-1  NC_007519.1  +  2490969  2491090  RNA→ hypo→
env-2  AACY01111288.1  +  617  741  RNA→ unknown→
env-3  AACY01520571.1  +  680  806  RNA→
env-4  AACY01675808.1  -  610  483  RNA→ unknown→
env-5  AACY01585427.1  -  646  519  RNA→ unknown→
env-6  AACY01470187.1  +  42  170  RNA→ unknown→
Pph-1-1  NZ_AAIK01000053.1  -  11876  11737  RNA→ metZ (pfam01053)→ COG1451 (COG1451)→
¹env-7  AACY01775498.1  -  574  468  RNA→
env-8  AACY01438816.1  +  560  676  RNA→
Xax-1-1  NC_003919.1  -  3558028  3557894  RNA→ metA (COG2021)→ MetC (COG0626)→ ThrA (COG0460)→
Xca-1-1  NC_007086.1  +  1531032  1531166  RNA→ metA (COG2021)→ MetC (COG0626)→ ThrA (COG0460)→
Xor-1-1  NC_006834.1  +  1911391  1911524  RNA→ metA (COG2021)→ MetC (COG0626)→
Cph-1-1  NZ_AAIC01000108.1  +  4619  4742  RNA→ ←hypo
Dra-1-2  NC_001263.1  -  886190  886055  RNA→ metC (COG2873)→ metA (COG2021)→
Chu-1-1  NC_008255.1  -  1136304  1136191  RNA→ hypo→
Asp-1-1  NC_008541.1  -  556497  556385  RNA→ CIMS C terminal like (cd03311)→
Dge-1-2  NC_008025.1  -  547346  547206  RNA→ metC (COG2873)→ metA (COG2021)→ MIP (cd00333)→ COG1215 (COG1215)→
Dra-1-3  NC_001263.1  +  1363053  1363199  RNA→ ABC MetN methionine transporter (cd03258)→ AbcD (COG2011)→ Lipoprotein 9 (pfam03180)→ NlpA (COG1464)→
env-9  AACY01080826.1  +  330  455  RNA→ unknown→
env-10  AAFZ01018340.1  +  187  305  RNA→
Sus-1-1  NC_008536.1  +  2095118  2095234  RNA→ COG0714 (COG0714)→ hypo→ hypo→ MetK (COG0192)→
Rxy-1-1  NC_008148.1  +  2295929  2296052  RNA→ hypo→
¹Mma-1-1  NZ_AAAP01003574.1  -  1026  908  RNA→ RHOD 1 (cd01522)→ COG0626: Cystathionine beta-lyases/cystathionine gamma-synthases→
¹Kra-1-1  NZ_AAEF02000050.1  +  6416  6521  RNA→ MTHFR (cd00537)→ Methionine synt (pfam01717)CIMS N terminal like (cd03312)→ COG2334 (COG2334)→
¹Kra-1-2  NZ_AAEF02000160.1  -  414  313  RNA→
¹Kra-1-3  NZ_AAEF02000010.1  -  52121  52019  RNA→ metC (COG2873)→ Nitrilotriacetate monoxgenase (cd01095)→
¹Kra-1-4  NZ_AAEF02000035.1  -  11882  11779  RNA→ TauB (COG1116)→ TauA (COG0715)→ TauC (COG0600)→
¹Kra-1-5  NZ_AAEF02000040.1  +  10779  10892  RNA→ RHOD 1 (cd01522)→ metZ (pfam01053)→ CIMS C terminal like (cd03311)→ UbiG (COG2227)→
env-11  AAFX01104703.1  -  201  32  RNA→
env-12  AAFZ01007634.1  +  875  1007  RNA→
Cph-1-2  NZ_AAIC01000627.1  -  953  830  RNA→ MetC (COG0626)→
env-13  AAGA01023434.1  +  7  144  RNA→
env-14  AAGA01012172.1  -  587  471  RNA→
Tth-1-1  NC_005835.1  -  397567  397417  RNA→ metC (COG2873)→ metA (COG2021)→ DUF72 (pfam01904)→ hypo→
env-15  AAFX01098755.1  -  483  341  RNA→
Fsp-1-1  NC_007777.1  +  5255164  5255294  RNA→ ThrC (COG0498)→
Fsp-2-1  NZ_AAII01000013.1  +  63749  63882  RNA→ ThrC (COG0498)→
env-16  AAFX01081827.1  +  80  234  RNA→
Tfu-1-1  NC_007333.1  +  254817  254971  RNA→ ThrC (COG0498)→ MoaD (cd00754)→
Sav-1-1  NC_003155.3  -  4858633  4858473  RNA→ ThrC (COG0498)→ MoaD (cd00754)→
Sco-1-1  NC_003888.3  +  4708429  4708590  RNA→ ThrC (COG0498)→ MoaD (cd00754)→
Esi-1-1  NZ_AADW02000002.1  +  259178  259295  RNA→ COG1647 (COG1647)→
Gka-1-2  NC_006510.1  +  1382529  1382647  RNA→ Glyoxalase (pfam00903)→
Esi-1-2  NZ_AADW02000005.1  -  146206  146089  RNA→ PMSR (pfam01625)→
Esi-1-3  NZ_AADW02000008.1  +  46455  46570  RNA→ LysP (COG0833)→ MHT1 (COG2040)→ hypo→
env-17  AAFX01071693.1  -  989  870  RNA→
env-18  AAFX01048923.1  -  825  710  RNA→
env-19  AAFX01049186.1  +  415  546  RNA→
Pae-1-1  NZ_AAIJ01000018.1  -  24135  24016  RNA→ metC (COG2873)→ metA (COG2021)→
Cph-1-3  NZ_AAIC01000009.1  +  45881  46000  RNA→ metC (COG2873)→ metA (COG2021)→
Cph-2-1  NC_008639.1  -  2010016  2009894  RNA→ metC (COG2873)→ metA (COG2021)→
Cli-1-1  NZ_AAHJ01000010.1  -  33494  33373  RNA→ metC (COG2873)→ metA (COG2021)→
Pph-1-2  NZ_AAIK01000012.1  -  37289  37166  RNA→ metC (COG2873)→ metA (COG2021)→
Cte-1-1  NC_002932.3  +  606182  606302  RNA→ metC (COG2873)→ metA (COG2021)→
Rxy-1-2  NC_008148.1  +  1121061  1121196  RNA→ metA (COG2021)→ AA kinase (pfam00696)→ ThrA (COG0460)→ Asd (COG0136)→
Rxy-1-3  NC_008148.1  -  882038  881913  RNA→ MetH (COG0646)MTHFR (cd00537)→ CIMS like (cd03310)→ CIMS C terminal like (cd03311)→
Gme-1-1  NC_007517.1  +  3145546  3145689  RNA→ hypo→ metA (COG2021)→ Pta (COG0857)→
Gsu-1-1  NC_002939.4  +  2700111  2700227  RNA→ metA (COG2021)→
Dac-1-1  NZ_AAEW02000009.1  -  42231  42110  RNA→ metA (COG2021)→
env-20  AAFX01122557.1  +  18  159  RNA→
Gvi-1-1  NC_005125.1  -  2730833  2730707  RNA→ MetK (COG0192)→ hypo→ Uma2 (COG4636)→
Mth-1-1  NC_007644.1  -  1204942  1204825  RNA→ PstB (COG1117)→ COG0390 (COG0390)→
Sth-1-1  NC_006177.1  -  2688761  2688642  RNA→ MTHFR (cd00537)→
Cac-1-1  NC_003030.1  +  453556  453672  RNA→ hypo→ MetC (COG0626)→ MetC (COG0626)→
Lmo-1-1  NC_003210.1  +  137126  137242  RNA→ DdpA (COG0747)→
Lmo-2-1  NC_002973.6  +  148864  148980  RNA→ DdpA (COG0747)→
Lin-1-1  NC_003212.1  +  172391  172507  RNA→ DdpA (COG0747)→
env-21  AAFX01054820.1  -  1101  968  RNA→
Cau-1-1  NZ_AAAH02000004.1  +  164475  164600  RNA→ metC (COG2873)→ metA (COG2021)→
Bcl-1-1  NC_006582.1  -  604078  603961  RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→ AbgB (COG1473)→
env-22  AAGA01004112.1  +  794  937  RNA→
Oih-1-1  NC_004193.1  +  3200757  3200878  RNA→ COG0431 (COG0431)→
Bcl-1-2  NC_006582.1  -  1705486  1705370  RNA→ CIMS C terminal like (cd03311)→
Oih-1-2  NC_004193.1  +  727019  727138  RNA→ MetH (COG0646)→
Oih-1-3  NC_004193.1  -  2856872  2856750  RNA→ COG0599 (COG0599)CMD (pfam02627)→ Nitrilotriacetate monoxgenase (cd01095)→
Esi-1-4  NZ_AADW02000008.1  +  73933  74052  RNA→ Alkanesulfonate monoxygenase (cd01094)→ RimK (COG0189)→
Gsu-1-2  NC_002939.4  +  1014089  1014209  RNA→ metZ (pfam01053)→ metZ (pfam01053)→
Gme-1-2  NC_007517.1  +  765273  765392  RNA→ metZ (pfam01053)→ metZ (pfam01053)→
Cte-2-1  NC_004557.1  -  1456777  1456655  RNA→ AbcC (COG1135)→
Sep-1-1  NC_002976.3  -  2600726  2600612  RNA→ metA (COG2021)→
Sau-1-1  NC_007622.1  +  15925  16037  RNA→ metA (COG2021)→

Oih-1-4 NC_004193.1 - 2134373 2134257 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→
Esi-1-5 NZ_AADW02000002.1 + 57116 57233 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→
Bcl-1-3 NC_006582.1 + 951399 951519 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→ GldA (COG0371)→
Cpe-1-1 NC_003366.1 - 2500090 2499971 RNA→ MetK (COG0192)→
Cth-1-1 NZ_AABG04000106.1 + 1107 1226 RNA→ MetK (COG0192)→
Oih-1-5 NC_004193.1 + 1319034 1319148 RNA→ hypo→
Bce-4-1 NZ_AAEK01000065.1 + 5093 5213 RNA→ HisM (COG0765)→ GlnQ (COG1126)→ SBP bac 3 (pfam00497)→ COG2514 (COG2514)→
Bce-1-1 NC_003909.8 - 3299174 3299054 RNA→ HisM (COG0765)→ GlnQ (COG1126)→ SBP bac 3 (pfam00497)→ COG2514 (COG2514)→
Oih-1-6 NC_004193.1 + 3162066 3162189 RNA→ metC (COG2873)→
Lmo-1-2 NC_003210.1 + 882763 882874 RNA→ CIMS C terminal like (cd03311)→
Lmo-2-2 NC_002973.6 + 882707 882818 RNA→ CIMS C terminal like (cd03311)→
Lin-1-2 NC_003212.1 + 871741 871851 RNA→ CIMS C terminal like (cd03311)→ UvrC (COG0322)→
Ban-3-1 NC_005945.1 - 4554911 4554776 RNA→ MetK (COG0192)→
Bth-1-1 NC_005957.1 - 4553311 4553176 RNA→ MetK (COG0192)→
Bcl-1-4 NC_006582.1 - 3008456 3008335 RNA→ MetK (COG0192)→
Bha-1-1 NC_002570.2 - 3427474 3427348 RNA→ MetK (COG0192)→
Bli-1-1 NC_006270.2 - 3080583 3080432 RNA→ MetK (COG0192)→
Gka-1-3 NC_006510.1 - 2876477 2876349 RNA→ MetK (COG0192)→
Oih-1-7 NC_004193.1 - 2365520 2365396 RNA→ MetK (COG0192)→
Bsu-1-1 NC_000964.2 - 3128389 3128237 RNA→ MetK (COG0192)→
Esi-1-6 NZ_AADW02000019.1 + 7529 7669 RNA→ MetK (COG0192)→
Oih-1-8 NC_004193.1 - 1098106 1097986 RNA→ CIMS C terminal like (cd03311)→
Esi-1-7 NZ_AADW02000002.1 - 258943 258826 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→ AbgB (COG1473)→
Cac-1-2 NC_003030.1 - 1073895 1073778 RNA→ ←CysK (COG0031)
¹Fnu-1-1 NC_003454.1 - 1317660 1317560 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→
¹Fnu-2-1 NZ_AABF02000063.1 + 3018 3118 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→
¹Fnu-1-2 NC_003454.1 - 987493 987392 RNA→ MetK (COG0192)→
¹Fnu-2-2 NZ_AABF02000062.1 - 1992 1888 RNA→ MetK (COG0192)→
Sau-2-1 NC_002952.2 - 1959055 1958930 RNA→ MetK (COG0192)→
Sau-1-2 NC_007622.1 - 1790399 1790275 RNA→ MetK (COG0192)→
Sep-1-2 NC_002976.3 - 1409744 1409621 RNA→ MetK (COG0192)→
Gka-1-4 NC_006510.1 + 883125 883249 RNA→ metZ (pfam01053)→ metZ (pfam01053)→
Bcl-1-5 NC_006582.1 + 1563575 1563693 RNA→ CIMS N terminal like (cd03312)Methionine synt (pfam01717)→
Bcl-1-6 NC_006582.1 + 1983205 1983318 RNA→ MetC (COG0626)→ MetH (COG0646)MTHFR (cd00537)→ MetH (COG0646)MetH (COG1410)→
Lin-1-3 NC_003212.1 - 2538261 2538129 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→
Lmo-1-3 NC_003210.1 - 2491183 2491051 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→
Lmo-2-3 NC_002973.6 - 2444832 2444700 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→
Bcl-1-7 NC_006582.1 + 1701642 1701766 RNA→ ThrA (COG0460)→
Bha-1-2 NC_002570.2 + 1348809 1348936 RNA→ ThrA (COG0460)→
Bce-4-2 NZ_AAEK01000047.1 - 24547 24421 RNA→ COG1878 (COG1878)→
Ban-3-2 NC_005945.1 - 371149 371023 RNA→ COG1878 (COG1878)→
Bce-1-2 NC_003909.8 - 471132 471006 RNA→ COG1878 (COG1878)→
Bce-2-1 NC_004722.1 - 374997 374871 RNA→ COG1878 (COG1878)→
Lmo-2-4 NC_002973.6 - 644529 644415 RNA→ metC (COG2873)→ metA (COG2021)→
Sth-1-2 NC_006177.1 + 138150 138272 RNA→ MetK (COG0192)→
Bth-1-2 NC_005957.1 - 768979 768860 RNA→ RbsB (COG1879)→ MglA (COG1129)→ AraH (COG1172)→
Bce-2-2 NC_004722.1 - 755254 755135 RNA→ RbsB (COG1879)→ MglA (COG1129)→ AraH (COG1172)→
Bsu-1-2 NC_000964.2 + 1426177 1426291 RNA→ RbcL (COG1850)→ COG4359 (COG4359)→ AraD (COG0235)→ COG1791 (COG1791)→
Bli-1-2 NC_006270.2 + 1488837 1488950 RNA→ RbcL (COG1850)→ AraD (COG0235)COG4359 (COG4359)→ COG1791 (COG1791)→
Gka-1-5 NC_006510.1 + 974629 974751 RNA→ RbcL (COG1850)→ COG4359 (COG4359)→ AraD (COG0235)→ COG1791 (COG1791)→
Bsu-1-3 NC_000964.2 + 1629408 1629527 RNA→ CysH (COG0175)→ PitA (COG0306)→ ATPS (cd00517)→ CysC (COG0529)→ CysG (COG0007)→ COG2138 (COG2138)→ CysG (COG1648)→

Bli-1-3 NC_006270.2 + 1734721 1734841 RNA→ COG5583 (COG5583)→
Bce-2-3 NC_004722.1 + 1382033 1382155 RNA→ COG5583 (COG5583)→ CysH (COG0175)→ ATPS (cd00517)→ CysC (COG0529)→ CysI (COG0155)→ hypo→ CysG (COG0007)→ COG2138 (COG2138)→ CysG (COG1648)→
Bce-3-1 NC_006274.1 + 1397375 1397497 RNA→ CysH (COG0175)→ ATPS (cd00517)→ CysC (COG0529)→ CysI (COG0155)→ group-specific protein→ CysG (COG0007)→ COG2138 (COG2138)→ CysG (COG1648)→
Ban-3-3 NC_005945.1 + 1362719 1362841 RNA→ CysH (COG0175)→ ATPS (cd00517)→ CysC (COG0529)→ CysI (COG0155)→ hypo→ CysG (COG0007)→ COG2138 (COG2138)→ CysG (COG1648)→
Bli-1-4 NC_006270.2 - 3063546 3063425 RNA→ NlpA (COG1464)→
Bha-1-3 NC_002570.2 + 1699950 1700073 RNA→ MetC (COG0626)→ MetC (COG0626)→ MTHFR (cd00537)MetH (COG0646)→ MetH (COG0646)MetH (COG1410)→
Ban-3-4 NC_005945.1 - 4074587 4074464 RNA→ MetH (COG0646)MTHFR (cd00537)→ MetH (COG0646)MetH (COG1410)→ MurG (COG0707)→ hypo→ hypo→ hypo→
Bce-1-3 NC_003909.8 - 4022659 4022536 RNA→ MetH (COG0646)MTHFR (cd00537)→ MetH (COG0646)MetH (COG1410)→ MurG (COG0707)→
Tte-1-1 NC_003869.1 + 500236 500359 RNA→ MetK (COG0192)→
Mth-1-2 NC_007644.1 + 2070455 2070577 RNA→ MetC (COG0626)→ metZ (pfam01053)→
Ban-3-5 NC_005945.1 + 3893424 3893545 RNA→ RbcL (COG1850)→ COG4359 (COG4359)→ AraD (COG0235)→ COG1791 (COG1791)→
Bce-1-4 NC_003909.8 + 3836040 3836159 RNA→ RbcL (COG1850)→ COG4359 (COG4359)→ AraD (COG0235)→ COG1791 (COG1791)→
Esi-1-8 NZ_AADW02000005.1 - 169223 169108 RNA→ RbcL (COG1850)→ COG4359 (COG4359)→ AraD (COG0235)→ COG1791 (COG1791)→
Cte-2-2 NC_004557.1 - 1226207 1226090 RNA→ metC (COG2873)→ HTS (pfam04204)→
Bth-1-3 NC_005957.1 + 1833720 1833843 RNA→ DUF894 (pfam05977)→
Cac-1-3 NC_003030.1 - 2991414 2991290 RNA→ MetK (COG0192)→
Cte-2-3 NC_004557.1 + 303331 303450 RNA→ MetK (COG0192)→
Bce-4-3 NZ_AAEK01000030.1 + 59118 59236 RNA→ metC (COG2873)→ MetA (COG1897)→
Bce-2-4 NC_004722.1 - 5321943 5321825 RNA→ metC (COG2873)→ HTS (pfam04204)→
Bce-1-5 NC_003909.8 - 5120384 5120265 RNA→ metC (COG2873)→ HTS (pfam04204)→
Ban-3-6 NC_005945.1 - 5141702 5141583 RNA→ metC (COG2873)→ MetA (COG1897)→
Cac-1-4 NC_003030.1 - 2914848 2914726 RNA→ metC (COG2873)→
Bce-1-6 NC_003909.8 + 1084273 1084393 RNA→ Alkanesulfonate monoxygenase (cd01094)→ hypo→
Esi-1-9 NZ_AADW02000029.1 - 16706 16587 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→ Major facilitator superfamily MFS 1→
Gka-1-6 NC_006510.1 - 972366 972184 RNA→ COG4857 (COG4857)→ COG0182 (COG0182)→ RbsB (COG1879)→ MglA (COG1129)→ AraH (COG1172)→
Esi-1-10 NZ_AADW02000005.1 + 169417 169533 RNA→ COG4857 (COG4857)→ COG0182 (COG0182)→ RbsB (COG1879)→ MglA (COG1129)→ AraH (COG1172)→
Bcl-1-8 NC_006582.1 - 3104700 3104581 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→ COG2096 (COG2096)→ YvrC (cd01143)→ COG0599 (COG0599)→
Tte-1-2 NC_003869.1 - 1750376 1750263 RNA→ MTHFR (cd00537)→ MetH (COG0646)MetH (COG1410)→ EbsC (COG2606)→
Tte-1-3 NC_003869.1 - 2076689 2076566 RNA→ ThrA (COG0460)→ metC (COG2873)→
Dha-1-1 NZ_AAAW04000002.1 + 1181600 1181714 RNA→ AbcC (COG1135)→ AbcD (COG2011)→
Lin-1-4 NC_003212.1 - 1772469 1772347 RNA→ MetK (COG0192)→
Lmo-2-5 NC_002973.6 - 1697573 1697451 RNA→ MetK (COG0192)→
Oih-1-9 NC_004193.1 - 2437314 2437201 RNA→ AbcC (COG1135)→ AbcD (COG2011)→
Bha-1-4 NC_002570.2 - 3590828 3590702 RNA→ AbcC (COG1135)→ AbcD (COG2011)→
Gka-1-7 NC_006510.1 - 3025519 3025396 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→
Bli-1-5 NC_006270.2 - 3302453 3302332 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→
Bsu-1-4 NC_000964.2 - 3363537 3363417 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→ COG0599 (COG0599)→
Bce-1-7 NC_003909.8 - 4732575 4732453 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→ NlpA (COG1464)→ hypo→ SufC (COG0396)→ UPF0051 (pfam01458)→ CsdB (COG0520)→ IscU (COG0822)→ UPF0051 (pfam01458)→
Ban-3-7 NC_005945.1 - 4741319 4741198 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→ NlpA (COG1464)→
Sau-1-3 NC_007622.1 + 837534 837651 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→
Sep-1-3 NC_002976.3 + 480760 480876 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→
Oih-1-10 NC_004193.1 - 3466527 3466394 RNA→ GldA (COG0371)→ NlpA (COG1464)→ AbcC (COG1135)→ AbcD (COG2011)→ hypo→
Cac-1-5 NC_003030.1 + 1976364 1976478 RNA→ HTS (pfam04204)→
Bce-1-8 NC_003909.8 - 3833860 3833704 RNA→ hypo→ COG4857 (COG4857)→ COG0182 (COG0182)→
Bce-2-5 NC_004722.1 - 4007990 4007836 RNA→ COG4857 (COG4857)→ COG0182 (COG0182)→ hypo→

Bth-1-4 NC_005957.1 - 3870225 3870070 RNA→ COG4857 (COG4857)→ COG0182 (COG0182)→
Bce-3-2 NC_006274.1 - 3927244 3927089 RNA→ COG4857 (COG4857)→ COG0182 (COG0182)→
Ban-3-8 NC_005945.1 - 3891245 3891090 RNA→ COG4857 (COG4857)→ COG0182 (COG0182)→
Bli-1-6 NC_006270.2 + 4014236 4014359 RNA→ CIMS C terminal like (cd03311)→
Bsu-1-5 NC_000964.2 + 3998177 3998297 RNA→ CIMS C terminal like (cd03311)→
Bsu-1-6 NC_000964.2 + 3996787 3996907 RNA→ CIMS C terminal like (cd03311)→
Bha-1-5 NC_002570.2 - 910199 910081 RNA→ RimK (COG0189)→
Bth-1-5 NC_005957.1 - 3170768 3170656 RNA→ hypo→
Ban-3-9 NC_005945.1 - 3092379 3092267 RNA→ hypo→
Bth-2-1 NC_008600.1 - 3165965 3165853 RNA→ SWIM (pfam04434)→ Flavodoxin 2 (pfam02525)→
Bce-1-9 NC_003909.8 + 2536316 2536445 RNA→ DUF1393 (pfam07155)→ COG1123 (COG1123)→ CbiQ (COG0619)→ conserved hypothetical protein subfamily, putative→
Bth-1-6 NC_005957.1 + 2515869 2515998 RNA→ DUF1393 (pfam07155)→ ABC cobalt transport domain1 (cd03225)→ CbiQ (COG0619)→
Ban-3-10 NC_005945.1 + 2459405 2459534 RNA→ DUF1393 (pfam07155)→ ABC cobalt transport domain1 (cd03225)→ CbiQ (COG0619)→
Bce-4-4 NZ_AAEK01000052.1 + 2958 3087 RNA→ DUF1393 (pfam07155)→ COG1123 (COG1123)→ CbiQ (COG0619)→ conserved hypothetical protein protein subfamily, putative→
Bce-2-6 NC_004722.1 + 2587946 2588075 RNA→ Integral membrane protein→
Lmo-2-6 NC_002973.6 - 325795 325662 RNA→ NlpA (COG1464)→ AbcC (COG1135)→ AbcD (COG2011)→ COG0388 (COG0388)→
Lmo-3-1 NZ_AADR01000001.1 + 130761 130894 RNA→ NlpA (COG1464)→ AbcC (COG1135)→ AbcD (COG2011)→ COG0388 (COG0388)→
Lmo-1-4 NC_003210.1 - 309392 309259 RNA→ NlpA (COG1464)→ AbcC (COG1135)→ AbcD (COG2011)→ COG0388 (COG0388)→
Lin-1-5 NC_003212.1 - 327343 327210 RNA→ NlpA (COG1464)→ AbcC (COG1135)→ AbcD (COG2011)→ COG0388 (COG0388)→
Ban-2-1 NZ_AAAC02000001.1 - 716908 716780 RNA→ NlpA (COG1464)→ AbcC (COG1135)→ AbcD (COG2011)→
Bce-1-10 NC_003909.8 - 205489 205361 RNA→ NlpA (COG1464)→ AbcC (COG1135)→ AbcD (COG2011)→
Ban-3-11 NC_005945.1 - 177282 177154 RNA→ NlpA (COG1464)→ AbcC (COG1135)→ AbcD (COG2011)→
Bce-2-7 NC_004722.1 - 174297 174169 RNA→ NlpA (COG1464)→ AbcC (COG1135)→ AbcD (COG2011)→
Bce-3-3 NC_006274.1 - 177111 176983 RNA→ NlpA (COG1464)→ AbcC (COG1135)→ AbcD (COG2011)→
Oih-1-11 NC_004193.1 - 2708652 2708532 RNA→ COG3464 (COG3464)→ metC (COG2873)→
Oih-1-12 NC_004193.1 - 3200645 3200523 RNA→ Alkanesulfonate monoxygenase (cd01094)→ RimK (COG0189)→
Bce-1-11 NC_003909.8 + 361547 361667 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→
Bce-3-4 NC_006274.1 + 334363 334481 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→
Ban-3-12 NC_005945.1 + 320611 320731 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→
Ban-1-1 NZ_AAEO01000030.1 + 75093 75213 RNA→
Ban-3-13 NC_005945.1 + 185578 185697 RNA→ DppB (COG0601)→ DppC (COG1173)→ DppD (COG0444)→ AppF (COG4608)→ SBP bac 5 (pfam00496)→
Bth-1-7 NC_005957.1 + 188037 188156 RNA→ DppB (COG0601)→ DppC (COG1173)→ DppD (COG0444)→ AppF (COG4608)→ SBP bac 5 (pfam00496)→
Esi-1-11 NZ_AADW02000012.1 - 72076 71959 RNA→ NhaC (COG1757)→
Sep-1-4 NC_002976.3 - 1935508 1935398 RNA→ NhaC (COG1757)→
Sau-2-2 NC_002952.2 - 2480784 2480675 RNA→ NhaC (COG1757)→
Gka-1-8 NC_006510.1 - 745492 745376 RNA→ MTHFR (cd00537)MetH (COG0646)→ MetH (COG0646)MetH (COG1410)→
Bli-1-7 NC_006270.2 + 1291279 1291409 RNA→ metZ (pfam01053)→ metZ (pfam01053)→
Bli-1-8 NC_006270.2 - 1206885 1206760 RNA→ MTHFR (cd00537)MetH (COG0646)→ MetH (COG0646)MetH (COG1410)→
Bsu-1-7 NC_000964.2 - 1180114 1179983 RNA→ MTHFR (cd00537)MetH (COG0646)→ Acetyltransf 1 (pfam00583)→ Acetyltransf 1 (pfam00583)→
Bsu-1-8 NC_000964.2 + 1257609 1257743 RNA→ metZ (pfam01053)→ metZ (pfam01053)→
Bce-2-8 NC_004722.1 - 194629 194511 RNA→ OppA (COG4166)→
Bce-4-5 NZ_AAEK01000017.1 - 12415 12297 RNA→ OppA (COG4166)→
Bce-1-12 NC_003909.8 - 225261 225140 RNA→ OppA (COG4166)→
Ban-3-14 NC_005945.1 - 197195 197075 RNA→ OppA (COG4166)→
Oih-1-13 NC_004193.1 + 3294465 3294595 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→ AbgB (COG1473)→
Bsu-1-9 NC_000964.2 - 2024468 2024363 RNA→ SerA (COG0111)→ XylB (COG1070)→ Sugar tr (pfam00083)→
Bce-4-6 NZ_AAEK01000001.1 + 280075 280190 RNA→ AbgB (COG1473)→
Ban-3-15 NC_005945.1 - 2953927 2953812 RNA→ AbgB (COG1473)→
Bth-1-8 NC_005957.1 - 3052794 3052679 RNA→ possible aminoacylase (N-acyl-L-amino-acid amidohydrolase), N-terminal→ AbgB (COG1473)→

Bli-1-9 NC_006270.2 + 969425 969552 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→
Ban-3-16 NC_005945.1 + 1375412 1375531 RNA→ NhaC (COG1757)→
Bce-2-9 NC_004722.1 + 1394794 1394913 RNA→ NhaC (COG1757)→
Cte-2-4 NC_004557.1 - 2676962 2676846 RNA→ NhaC (COG1757)→ Sm multidrug ex (pfam06695)→
Ban-3-17 NC_005945.1 + 4074776 4074899 RNA→ MetC (COG0626)→ MetC (COG0626)→
Bce-2-10 NC_004722.1 + 4199707 4199831 RNA→ MetC (COG0626)→ MetC (COG0626)→
Bth-1-9 NC_005957.1 + 4062233 4062356 RNA→ MetC (COG0626)→ MetC (COG0626)→
Bce-4-7 NZ_AAEK01000021.1 - 52446 52323 RNA→ MetC (COG0626)→ MetC (COG0626)→
Cac-1-6 NC_003030.1 + 1131530 1131648 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→
Cac-1-7 NC_003030.1 - 671363 671244 RNA→ MetH (COG0646)MetH (COG1410)→
Cpe-1-2 NC_003366.1 - 2665238 2665061 RNA→ NhaC (COG1757)→
Lmo-2-7 NC_002973.6 - 1720517 1720397 RNA→ CIMS N terminal like (cd03312)Methionine synt (pfam01717)→ MetC (COG0626)→ MetC (COG0626)→ MetH (COG0646)MTHFR (cd00537)→
Lmo-1-5 NC_003210.1 - 1739604 1739484 RNA→ Methionine synt (pfam01717)CIMS N terminal like (cd03312)→ MetC (COG0626)→ MetC (COG0626)→ MTHFR (cd00537)MetH (COG0646)→
Bli-1-10 NC_006270.2 - 1486652 1486495 RNA→ COG4857 (COG4857)→ COG0182 (COG0182)→
Bsu-1-10 NC_000964.2 - 1423998 1423828 RNA→ COG4857 (COG4857)→ COG0182 (COG0182)→
Bsu-1-11 NC_000964.2 - 1385204 1385035 RNA→ Methionine synt (pfam01717)CIMS N terminal like (cd03312)→
Mth-1-3 NC_007644.1 - 1354291 1354160 RNA→ metC (COG2873)→ metA (COG2021)→
Sth-1-3 NC_006177.1 + 278265 278396 RNA→ AbcC (COG1135)→ AbcD (COG2011)→ NlpA (COG1464)→ hypo→
Sth-1-4 NC_006177.1 + 1830599 1830728 RNA→ metA (COG2021)→
env-23 AACY01426272.1 + 132 241 RNA→ unknown→
env-24 AAFY01008477.1 - 792 654 RNA→
¹Chu-1-2 NC_008255.1 - 2128348 2128246 RNA→ MetC (COG0626)→ MhpC (COG0596)→
Chu-1-3 NC_008255.1 - 165481 165358 RNA→ metA (COG2021)→ membrane spanning protein, required for outer membrane integrity→

Supplementary Figure S4C: conserved domains present in genes downstream of SAM-I riboswitches

Conserved domains found in downstream genes (Supplementary Figure S4B) are listed, with the first sentence in their description from the Conserved Domain Database. Conserved domains downstream of more than one SAM-I riboswitch are assigned a color, while others are shown in gray. Because there are many domains associated with SAM-I riboswitches, some colors are re-used. This data is derived from the SAM-I alignment in [1]. (This explanatory text is largely copied from supplementary data on a different RNA motif [4].)

cd00333 Major intrinsic protein (MIP) superfamily.
cd00517 ATP-sulfurylase (ATPS), also known as sulfate adenylate transferase, catalyzes the transfer of an adenylyl group from ATP to sulfate, forming adenosine 5’-phosphosulfate (APS).
cd00537 Methylenetetrahydrofolate reductase (MTHFR).
cd00754 MoaD family.
cd01094 Alkanesulfonate monoxygenase is the monoxygenase of a two-component system that catalyzes the conversion of alkanesulfonates to the corresponding aldehyde and sulfite.
cd01095 nitrilotriacetate monoxygenase oxidizes nitrilotriacetate utilizing reduced flavin mononucleotide (FMNH2) and oxygen.
cd01143 Periplasmic binding protein YvrC.
cd01522 Member of the Rhodanese Homology Domain superfamily, subgroup 1.
cd03225 Domain I of the ATPase component of a cobalt transport family found in both bacteria and archaea.
cd03258 MetN (also known as YusC) is an ABC-type transporter encoded by metN of the metNPQ operon in Bacillus subtilis that is involved in methionine transport.
cd03310 CIMS - Cobalamine-independent methonine synthase, or MetE.
cd03311 CIMS - Cobalamine-independent methonine synthase, or MetE, C-terminal domain like.
cd03312 CIMS - Cobalamine-independent methonine synthase, or MetE, N-terminal domain like.
COG0007 Uroporphyrinogen-III methylase [Coenzyme metabolism]
COG0031 Cysteine synthase [Amino acid transport and metabolism]
COG0111 Phosphoglycerate dehydrogenase and related dehydrogenases [Amino acid transport and metabolism]
COG0136 Aspartate-semialdehyde dehydrogenase [Amino acid transport and metabolism]
COG0155 Sulfite reductase, beta subunit (hemoprotein) [Inorganic ion transport and metabolism]
COG0175 3’-phosphoadenosine 5’-phosphosulfate sulfotransferase (PAPS reductase)/FAD synthetase and related enzymes [Amino acid transport and metabolism / Coenzyme metabolism]
COG0182 Predicted translation initiation factor 2B subunit, eIF-2B alpha/beta/delta family [Translation, ribosomal structure and biogenesis]
COG0189 Glutathione synthase/Ribosomal protein S6 modification enzyme (glutaminyl transferase) [Coenzyme metabolism / Translation, ribosomal structure and biogenesis]
COG0192 S-adenosylmethionine synthetase [Coenzyme metabolism]
COG0235 Ribulose-5-phosphate 4-epimerase and related epimerases and aldolases [Carbohydrate transport and metabolism]
COG0306 Phosphate/sulphate permeases [Inorganic ion transport and metabolism]
COG0322 Nuclease subunit of the excinuclease complex [DNA replication, recombination, and repair]
COG0371 Glycerol dehydrogenase and related enzymes [Energy production and conversion]
COG0388 Predicted amidohydrolase [General function prediction only]
COG0390 ABC-type uncharacterized transport system, permease component [General function prediction only]
COG0396 ABC-type transport system involved in Fe-S cluster assembly, ATPase component [Posttranslational modification, protein turnover, chaperones]
COG0431 Predicted flavoprotein [General function prediction only]
COG0444 ABC-type dipeptide/oligopeptide/nickel transport system, ATPase component [Amino acid transport and metabolism / Inorganic ion transport and metabolism]
COG0460 Homoserine dehydrogenase [Amino acid transport and metabolism]
COG0498 Threonine synthase [Amino acid transport and metabolism]
COG0520 Selenocysteine lyase [Amino acid transport and metabolism]
COG0529 Adenylylsulfate kinase and related kinases [Inorganic ion transport and metabolism]
COG0596 Predicted hydrolases or acyltransferases (alpha/beta hydrolase superfamily) [General function prediction only]
COG0599 Uncharacterized homolog of gamma-carboxymuconolactone decarboxylase subunit [Function unknown]
COG0600 ABC-type nitrate/sulfonate/bicarbonate transport system, permease component [Inorganic ion transport and metabolism]
COG0601 ABC-type dipeptide/oligopeptide/nickel transport systems, permease components [Amino acid transport and metabolism / Inorganic ion transport and metabolism]
COG0619 ABC-type cobalt transport system, permease component CbiQ and related transporters [Inorganic ion transport and metabolism]
COG0626 Cystathionine beta-lyases/cystathionine gamma-synthases [Amino acid transport and metabolism]
COG0646 Methionine synthase I (cobalamin-dependent), methyltransferase domain [Amino acid transport and metabolism]
COG0685 5,10-methylenetetrahydrofolate reductase [Amino acid transport and metabolism]
COG0692 Uracil DNA glycosylase [DNA replication, recombination, and repair]
COG0707 UDP-N-acetylglucosamine:LPS N-acetylglucosamine transferase [Cell envelope biogenesis, outer membrane]
COG0714 MoxR-like ATPases [General function prediction only]
COG0715 ABC-type nitrate/sulfonate/bicarbonate transport systems, periplasmic components [Inorganic ion transport and metabolism]
COG0747 ABC-type dipeptide transport system, periplasmic component [Amino acid transport and metabolism]
COG0765 ABC-type amino acid transport system, permease component [Amino acid transport and metabolism]
COG0822 NifU homolog involved in Fe-S cluster formation [Energy production and conversion]
COG0833 Amino acid transporters [Amino acid transport and metabolism]
COG0857 BioD-like N-terminal domain of phosphotransacetylase [General function prediction only]
COG1070 Sugar (pentulose and hexulose) kinases [Carbohydrate transport and metabolism]
COG1116 ABC-type nitrate/sulfonate/bicarbonate transport system, ATPase component [Inorganic ion transport and metabolism]
COG1117 ABC-type phosphate transport system, ATPase component [Inorganic ion transport and metabolism]
COG1123 ATPase components of various ABC-type transport systems, contain duplicated ATPase [General function prediction only]
COG1126 ABC-type polar amino acid transport system, ATPase component [Amino acid transport and metabolism]
COG1129 ABC-type sugar transport system, ATPase component [Carbohydrate transport and metabolism]
COG1135 ABC-type metal ion transport system, ATPase component [Inorganic ion transport and metabolism]
COG1172 Ribose/xylose/arabinose/galactoside ABC-type transport systems, permease components [Carbohydrate transport and metabolism]
COG1173 ABC-type dipeptide/oligopeptide/nickel transport systems, permease components [Amino acid transport and metabolism / Inorganic ion transport and metabolism]
COG1206 NAD(FAD)-utilizing enzyme possibly involved in translation [Translation, ribosomal structure and biogenesis]
COG1215 Glycosyltransferases, probably involved in cell wall biogenesis [Cell envelope biogenesis, outer membrane]
COG1410 Methionine synthase I, cobalamin-binding domain [Amino acid transport and metabolism]
COG1451 Predicted metal-dependent hydrolase [General function prediction only]
COG1464 ABC-type metal ion transport system, periplasmic component/surface antigen [Inorganic ion transport and metabolism]
COG1473 Metal-dependent amidase/aminoacylase/carboxypeptidase [General function prediction only]
COG1482 Phosphomannose isomerase [Carbohydrate transport and metabolism]
COG1647 Esterase/lipase [General function prediction only]
COG1648 Siroheme synthase (precorrin-2 oxidase/ferrochelatase domain) [Coenzyme metabolism]
COG1757 Na+/H+ antiporter [Energy production and conversion]
COG1791 Uncharacterized conserved protein, contains double-stranded beta-helix domain [Function unknown]
COG1850 Ribulose 1,5-bisphosphate carboxylase, large subunit [Carbohydrate transport and metabolism]
COG1878 Predicted metal-dependent hydrolase [General function prediction only]
COG1879 ABC-type sugar transport system, periplasmic component [Carbohydrate transport and metabolism]
COG1897 Homoserine trans-succinylase [Amino acid transport and metabolism]
COG2011 ABC-type metal ion transport system, permease component [Inorganic ion transport and metabolism]
COG2021 Homoserine acetyltransferase [Amino acid transport and metabolism]
COG2040 Homocysteine/selenocysteine methylase (S-methylmethionine-dependent) [Amino acid transport and metabolism]
COG2096 Uncharacterized conserved protein [Function unknown]
COG2138 Uncharacterized conserved protein [Function unknown]
COG2227 2-polyprenyl-3-methyl-5-hydroxy-6-metoxy-1,4-benzoquinol methylase [Coenzyme metabolism]
COG2334 Putative homoserine kinase type II (protein kinase fold) [General function prediction only]
COG2514 Predicted ring-cleavage extradiol dioxygenase [General function prediction only]
COG2606 Uncharacterized conserved protein [Function unknown]
COG2873 O-acetylhomoserine sulfhydrylase [Amino acid transport and metabolism]
COG3142 Uncharacterized protein involved in copper resistance [Inorganic ion transport and metabolism]
COG3464 Transposase and inactivated derivatives [DNA replication, recombination, and repair]
COG3481 Predicted HD-superfamily hydrolase [General function prediction only]
COG4166 ABC-type oligopeptide transport system, periplasmic component [Amino acid transport and metabolism]
COG4359 Uncharacterized conserved protein [Function unknown]
COG4608 ABC-type oligopeptide transport system, ATPase component [Amino acid transport and metabolism]
COG4636 Uncharacterized protein conserved in cyanobacteria [Function unknown]
COG4857 Predicted kinase [General function prediction only]
COG5583 Uncharacterized small protein [Function unknown]
pfam00083 Sugar (and other) transporter.
pfam00496 Bacterial extracellular solute-binding proteins, family 5.
pfam00497 Bacterial extracellular solute-binding proteins, family 3.
pfam00583 Acetyltransferase (GNAT) family.
pfam00696 Amino acid kinase family.
pfam00903 Glyoxalase/Bleomycin resistance protein/Dioxygenase superfamily.
pfam01053 Cys/Met metabolism PLP-dependent enzyme.
pfam01458 Uncharacterized protein family (UPF0051).
pfam01625 Peptide methionine sulfoxide reductase.
pfam01717 Methionine synthase, vitamin-B12 independent.
pfam01904 Protein of unknown function DUF72.
pfam02525 Flavodoxin-like fold.
pfam02627 Carboxymuconolactone decarboxylase family.
pfam03180 NLPA lipoprotein.
pfam03747 ADP-ribosylglycohydrolase.
pfam04204 Homoserine O-succinyltransferase.
pfam04434 SWIM zinc finger.
pfam05977 Bacterial protein of unknown function (DUF894).
pfam06695 Putative small multi-drug export protein.
pfam07155 Protein of unknown function (DUF1393).

Supplementary Figure S4D: SAM-I multiple sequence alignment

The multiple sequence alignment of SAM-I riboswitches follows. The alignment includes sequences containing the putative SAM-I riboswitches, as well as downstream sequence, in which Shine-Dalgarno sequences, rho-independent transcription terminators and start codons are annotated. Superscript “1” indicates riboswitches lacking P4. Nucleotides proposed to basepair are colored when they comprise Watson-Crick or G-U pairs. Otherwise they are gray. Colors are as follows: P1, P2, P2b, P3 and P4. Pseudoknots are not colored. Stems are also indicated at the bottom of the alignment by angle brackets, where matching < and > denote base-paired columns. Below these angle brackets, the symbol “2” denotes base pairs exhibiting covariation, “1” denotes base pairs exhibiting compatible mutations, “0” denotes base pairs that are not observed to mutate and “?” denotes base pairs that have a significant frequency of non-canonical nucleotides for Watson-Crick or G-U pairs. Below these base pair annotations is the consensus sequence: “R” = “A” or “G”, “Y” = “C” or “U”, red nucleotides: nucleotide identity conserved more than 97% of the time, black nucleotides: 90%, gray nucleotides: 75%, red circle: nucleotide is present 97% of the time, black circle: 90%, gray circle: 75%, white circle: 50%. This data is derived from the SAM-I alignment in [1]. (This explanatory text is largely copied from supplementary data on a different RNA motif [4].)
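As a rough illustration of the annotation scheme described above, the following Python sketch pairs the < and > columns of a bracket line and assigns the “2”, “1”, “0” or “?” symbol to one base-paired column. It is not the procedure used to produce this alignment. The operational definitions (covariation as two canonical pairs that differ at both positions, a compatible mutation as a change at one position only), the 10% cutoff for “?” and the function names are all assumptions made for the example.

```python
from typing import List, Tuple

# Watson-Crick and G-U pairs, written as 5' nucleotide followed by 3' nucleotide.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def paired_columns(brackets: str) -> List[Tuple[int, int]]:
    """Return 0-based (left, right) column pairs for matching '<' and '>'."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for col, ch in enumerate(brackets):
        if ch == "<":
            stack.append(col)
        elif ch == ">":
            if not stack:
                raise ValueError(f"unmatched '>' at column {col}")
            # The most recent unmatched '<' pairs with this '>'.
            pairs.append((stack.pop(), col))
    if stack:
        raise ValueError(f"unmatched '<' at column {stack[-1]}")
    return sorted(pairs)


def annotate_pair(observed: List[str], noncanonical_cutoff: float = 0.10) -> str:
    """Classify one base-paired column from the two-letter pairs seen in each sequence.

    Assumed definitions (not taken from the source):
      '?'  more than `noncanonical_cutoff` of ungapped pairs are non-canonical
      '0'  only one kind of canonical pair is observed
      '2'  two canonical pairs differ at both positions (covariation)
      '1'  canonical pairs differ, but only at one position (compatible mutation)
    """
    # Drop entries containing gap characters such as '.' or '-'.
    pairs = [p.upper() for p in observed if len(p) == 2 and p.isalpha()]
    if not pairs:
        raise ValueError("no ungapped pairs to classify")
    canonical = [p for p in pairs if p in CANONICAL]
    if (len(pairs) - len(canonical)) / len(pairs) > noncanonical_cutoff:
        return "?"
    distinct = set(canonical)
    if len(distinct) <= 1:
        return "0"
    if any(a[0] != b[0] and a[1] != b[1] for a in distinct for b in distinct):
        return "2"
    return "1"


if __name__ == "__main__":
    print(paired_columns("<<..>>"))                   # [(0, 5), (1, 4)]
    print(annotate_pair(["GC", "GC", "AU", "GU"]))    # 2
```

For a real alignment, the two-letter strings passed to `annotate_pair` would be read from the columns returned by `paired_columns`, one string per sequence row.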

alignment positions 1 ··· 156
¹Lme-1-1 UUUAAAACAGCU.AUCA.AGAG.UAGG.CGAGUGAU..ACA.G.CACAAUGAC.CCUG.CACCAACC.G.G.CAAA...... AUAAC...... GAAAAGCAGCUACGAGAGUGUAUUUUCAUCAGUU
Mmy-1-1 AAAUAAUAUCUU.AUUA.AGAA.CCAU.GGAGAGAU..AAA.A.CUUAAUGAA.GUGG.UAGCAACC.U.A...... UUAGA......
¹Ooe-1-1 CAGUUAAAACCU.AUUC.AGAGCGUAA.UGAGGGAU..ACC.A.CCCGAUGAG.CUAC.CGGCAACC.U.GCU...... UUUU......
¹env-1 UUUGCCUUGCUU.AUCA.AGAA.AGAC.CGAGGGA...UUG.GGCCCGCUGAC.GUCU.UAGCAACC...UUCUGGUAGG...... AUUUC...... CGAGCAAAGCUCGGCAACCUCCC
Gka-1-1 UUGCAUACGUUG.CUCG.GGAG.UGAG.GGAGGAA...UUG.G.CUCAAUGAA.CUC..CGUCAACC.U.GCCG...... CCUA......
Dvu-1-1 CCUGCGUCACUC.AUGA.CGAG.GGUG.CGAGGGAC..UUG.G.CCCGAUGAC.CACC.CGGCAACC.U..GUGCAGGCGAGACGCC...... GUCA...... GCCA
Dra-1-1 CCGUGCGCGGUC.AUCC.AGAGUCGCC.CCAGGGUGUUUCC.UGCCCGCCUAC.GGCG.CAGCAACC.G..GCC...... UUCAU...... CA
Dge-1-1 CUCGGCACGGUC.AUCC.UGAG.CGCC.CGAGGGAAUUCCA.GACCCACUGAC.GGCG.CGGCAACC.G.GCCCU...... CAUC......
Dde-1-1 AAAAGACAACUC.AUGA.AGAG.GACG.AGAGGGGU..CUG.G.CCCGAUGAC.CGUC.CGGCAACC.U.GCC...... CGCAA......
env-2 AAAAGUUAAUGC.AUCG.AGAUGACAC.GGAGGGAU..CUG.G.CCCUUUGAU.GUGUCAAGCAGCC.U.UUU...... CAGC......
env-3 UACGAAAUUGGU.AUCG.AGAUGACAC.UGAGGGAU..CUG.G.CCCUUUGAA.GUGACGAGCAGCC.U.UUU...... UCAGC......
env-4 ACUAUUAUAUGU.AUCA.CGAUGACAC.UGAGGGAU..CUG.G.CCCUUUGAA.GUGUCGAGCAUCC.U.UUU...... CAGC......
env-5 ACGUUUAUUUGU.AUCG.AGAUGAUAC.UGAGGGAU..CUG.G.CCCUUUGAA.GUGUCGAGCAGCC.U.UUU...... CAGC......
env-6 GCACUAUUAUGU.AUCA.AGAUGACAC.UGAGGGAU..CUG.G.CCCUUUGAA.GUGUCGAGCAGCC.U.UUU...... CAGC......
Pph-1-1 UCCAUUGCUCAUCAUCA.AAAA.AGGC.UGAGGGAA..UAG.A.CCCGGUGAA.GCCU.UGACAACC.G.ACUCCG...... UUCAA...... AGAC
¹env-7 UCCAUUGUGUUC.AUCA.AGAG.AGUG.UGAGGGUU..CUG.G.CCCUAUGAC.CAUU.CACCAACC.G.GCCCU...... UUCG......
env-8 UCAGGGGCGUUC.AUCA.AGAG.AGUG.UGAGGGUU..CUG.G.CCCUAUGAC.CAUU.CACCAACC.G.GCCCC...... UUCG......
Xax-1-1 CCUAGCCUCACC.AUCG.AGAC.CGGC.GGAGGGA...CAG.G.CCCUUUGAU.GCCG.GGGCAGCC.A.GCG.GAGCGCG...... CAAGC...... GUC
Xca-1-1 CGUAGCCUCACC.AUCG.AGAC.CGGC.GGAGGGA...CAG.G.CCCUUUGAU.GCCG.GGGCAGCC.A.GCG.GAGCGCG...... CAAGC...... GCC
Xor-1-1 CUUAGCCUCACC.AUCG.AGAC.CGGC.GGAGGGA...CAG.G.CCCUUUGAU.GCCG.GGGCAGCC.A.GCG.GAGCGCG...... CAAGC...... GUC
Cph-1-1 AUGUAGUGAUUU.AUAA.AGAA.GGGC.GGAGAGAU..UAG.G.CUCAAGGAA.ACCC.UGGCAACC.U...GC...... UGCAA......
Dra-1-2 AGCCACCCACUU....C.AGAG.CGCC.CGAGAGAC..CUG.G.CUCGACGAC.GGCG.CGGCAACC.G.GAC..CCC...... AUCAC......
Chu-1-1 GGGUCGUGUAUU.AUAC.AGAACGAAG.GGAGAGAU..UAG.G.CUCUAUGAA.CUUC.UGGCAACC.U.GCCC...... ACCA......
Asp-1-1 UAAGAGUGACCC.AUCC.CGAG.CGGC.CGAGAGAC..CUG.G.AUCGUUGAC.GCCG.CAGCAACC.C.UGG...... AAACC......
Dge-1-2 UGGAGCGACCUC.AUCC.AGAG.CGCC.CGAGAGAU..CUG.G.CUCGUUGAC.GGCG.CGGCAACC.G.GACCU...... CAUC......
Dra-1-3 AGGGUCACCUUU.AUCC.AGAGUCGGC.GCAGGGAC..CUG.G.CCCCAUGACCGCCG.CAGCAACC.G..GCCCUC...... AUCA......
env-9 UUGCAUAAACUU.AUGA.AGAA.AGGU.GGAGGGA...UUG.G.CCCUUUGAA.ACCU.UGGCAACC...CUUAG...... CUAAU......
env-10 UUGAGAAUCGAU.AUCG.AGAG.CGACGUGAGGGUU..CUG.G.CCCAGUGAACGUC..CGGCAACC.U.GCCC...... CAUG......
Sus-1-1 AGCUUCAAAAAC.AUUA.AGAG.UGGCGUGAGGGUU..CUG.A.CCCGAUGACCGCC..CGGCAACC.U.GGCG...... AAGC......
Rxy-1-1 UCUGGCCCGUAC.AUCG.AGAG.AGGCGCGAGGGAA..CUG.G.CCCAAAGACCGCC..CGGCAACC.U.GGCUU...... CUCCG......
¹Mma-1-1 UCCGAGCCAACC.AUCC.GGAG.CGGC.CGAGUGAC..GUG.G.CACGUCGAC.GCCG.CAGCAACC....CCCCGGACCGC...... CUCG...... GCGGCCG
¹Kra-1-1 UCGAACCACACC.AUCC.AGAG.CGGC.CGAGAGUC..CUG.G.CUCGACGAC.GCCG.CAGCAACC...CCCCG...... AGCCC......
¹Kra-1-2 CCGGCAACAC.C.AUCC.AGAG.UGGC.CGAGAGAC..CUG.G.CUCGACGAC.GCCA.CAGCAACC...ACCGC...... GCA......
¹Kra-1-3 ACUUGACUCGCC.AUCC.AGAG.CGGC.CGAGAGAC..CCG.G.CUCGUCGAC.GCCG.CAGCAACC.U.CCC...... CCCG......
¹Kra-1-4 CUGGACCCAC.C.AUCC.AGAG.CGGC.CGAGAGAC..CUG.G.CUCGUCGAC.GCCG.CAGCAACC.G.CCCCC...... UCG......
¹Kra-1-5 GCUGGCCCGACC.AUCC.AGAG.CGGC.CGAGAGAC..CUG.G.CUCGUCGAC.GCCG.CAGCAACC.A.UCCCCCGC...... CGCCC...... CGG
env-11 CCGCGCGCUUUC.AUCA.AGAG.UGAC.UGAGGGA...CUG.G.CCCGUUGAA.GUCA.CGGCAACC...GCUCAAAGGUUGGUCGACCGCG...... AAAUA...... GCGCGGCAGAUCAGUCGAUGG

....<<<<<<<<.<<<.....<.<<<<....<<<...... >>>...... >>>>.><<<..<<.<.<<<<<......
?????222.012.....2.?222....222...... 222...... 222?.2221..02.?.??2??......
<<<.<......
??2.2......
CUU-AUC -AGAG- GGY- GAGGGA---Y G-G-CCC RYGA -RCC -CRGCAACC------

env-12 UCAUGCAUUCUCCAUC..AGAA.CGGU.GGAGGCAU..CAG.GUCCUUUUGAA.GCCU.UGGCAAAC.....AUCUGGUACUU...... AUUAA...... GU
Cph-1-2 AACAAUUCAUUU.AUCA.AGAA.AGGU.GGAGGGU...AAG.G.CCCGUUGAA.ACCU.UAGCAACC...CUGAUGG...... UAAUA...... C
env-13 UACUGAAAUGUU.AUCA.AGAA.AGGU.GGAGGGAU..UAG.A.CCCAUUGAA.GCCU.UAGCAACC...CUUAG...... ACUUA......
env-14 AUUGAAACUGUU.AUAA.AGAA.AGGC.AGAGGGAU..UAG.A.CCCGAUGAA.GCCU.UAGCAACC...CUUUA...... CUAU......
Tth-1-1 GCCCGUUCUCUU.AUCC.AGAG.CGGU.GGAGGGU...ACG.G.CCCUGUGAA.GCCG.CGGCAACC.U..CCCGCCCCUUCCG...... UUCCA...... UGGCCGGAUGCGGG
env-15 UACCGUUUACUU.AUCA.AGAA.AGGC.UGAGGGUA..AUG.G.CCCUGCGAA.GCCU.UGACAACC.U..CCUUC...... UACAA...... CUCCUGGCGGACUUGGG
Fsp-1-1 UUCAUCUGGCUC.AUCC.AGAG.GGGC.AGAGGGA...ACG.G.CCCAGCGAA.GCCC.CGGCAACC...ACC..CGC...... AUCCC...... GUGCGC
Fsp-2-1 UUCAUCUGGCUC.AUCC.AGAG.GGGC.AGAGGGA...ACG.G.CCCAGCGAA.GCCC.CGGCAACC...ACCGUCGC...... AUCCA...... UGC
env-16 UGAAUACCGCUC.AUCC.AGAG.GGGU.GGAGGGA...CCG.G.CCCUGCGAA.GCCC.CGGCAACC.A..CCCGGCUGUGGCGCGUCGCG...... AUCCU...... GCGAGGCCGGC
Tfu-1-1 UUGAGAUUGCUC.AUCC.AGAG.GGGU.GGAGGGAC..ACG.G.CCCUGUGAA.GCCC.CGGCAACC.A.UCCCGGUGGACG...... UUGCU...... CGUCAGGUGCGAGGCGGC
Sav-1-1 CGAAUACCGCUC.AUCC.AGAG.GGGC.AGAGGGAU..ACG.G.CCCGAUGAA.GCCC.CGGCAACC.C.UCCAGCCGGUCUUGUCACG...... UUGAU...... GUGGCGAGGCUCCCGGC
Sco-1-1 UUCAUACCGCUC.AUCC.AGAG.GGGC.AGAGGGAU..ACG.G.CCCGAUGAA.GCCC.CGGCAACC.C.UCC.AGUCGG...... UUCUU...... GUCACACGGACGUGGCGAGGCUCCCGGC
Esi-1-1 CCUAUACGAUAU.AUCA.AGAG.UGGA.CGAGAGAC..CUG.G.CUCUAGGAC.UCCA.CGGCAACC.U.GCC...... GAUC......
Gka-1-2 AGCAUACGUCUU.AUCA.AGAG.UGGG.CGAGAGAA..CGG.G.CUUGAUGAC.CCCA.CAGCAACC.U.GCC...... GCA......
Esi-1-2 CAUAUAACACUU.AUCA.AGAGUGGAC.CGAGAGAU..CUG.G.AUCGAUGAC.GUCC.CAGCAACC.U.GC...... GCACA......
Esi-1-3 AAUUACAUUCUU.AUCC.AGAG.AGGU.GGAGGGA...CUG.G.CCCUUUGAA.GCCU.CAGCAACA.G.GUC...... GAAA......
env-17 GGGCGAGCACUU.AUCCCAGAG.UGGC.UGAGGGAA..CAG.GGCCCGCUGAA.GCCA.CAGCAACCAG.GCGC...... GAGA......
env-18 CUGCAGACGCUU.AUCC.AGAG.UGGC.UGAGGGAA..CAG.G.CCCGUCGAA.GCCA.CAGCAACC.A.GGC...... GCAA......
env-19 AUCAACGGUCUC.AUUC.AGAG.UGGC.AGAGGGA...CUG.G.CCCGAUGAA.GCCA.CGGCAACC.G.GCCUCG...... AUAAU...... UUC
Pae-1-1 UUUUAAGCUGUC.AUCC.AGAA.AGGU.GGAGGGA...CUG.G.CCCUGAGAA.GCCU.UGGCAACC.G.UCA...... UUCG......
Cph-1-3 UUUCAAGCUAUC.AUCC.AGAA.AGGU.GGAGGGA...GUG.G.CCCUGAGAA.GCCU.UGGCAACC.G.UCA...... GUAG......
Cph-2-1 UUUUACGCUGUC.AUCG.AGAA.AGGU.GGAGGGA...CUG.G.CCCUGAGAA.ACCU.UGGCAACC.G.UCAU...... UGCAA......
Cli-1-1 UUUAACGCUAUC.AUCG.AGAA.AGGU.GGAGGGA...CUG.G.CCCUGAGAA.GCCU.UGGCAACC.G.UCAC...... GCACA......
Pph-1-2 UUUCAAGCUAUC.AUCC.AGAA.AGGU.GGAGGGA...CUG.G.CCCUGAGAA.GCCU.UGGCAACC.G.UCAUU...... GCUCU......
Cte-1-1 UUUCGAGCUAUC.AUCC.AGAA.AGGC.GGAGGGA...CUG.G.CCCUGCGAA.GCCU.UGGCAACC.U.UCAU...... UCCAC......
Rxy-1-2 GCGCGAACCCGC.AUCA.AGAG.CGGU.GGAGGGAA..CUG.G.CCCAGCGAA.GCCG.CGGCAACC.G.GCGGGGGGCC...... CUUAC...... C
Rxy-1-3 UUCCGGGCGCUC.AUCG.AGAG.CGGU.GGAGGGA...CGG.G.CCCUGCGAA.GCCG.CGGCAACC.G.GCGGGCGGCG...... GACGC...... C
Gme-1-1 AAACGCAUUCUU.AUCA.AGAG.UGGU.GGAGGGA...AAG.G.CCCUGCGAA.GCCA.CAGCAACC.G.GUCUUCCGGGUCGCA...... UUCGA...... GCACGAAGGCGAACCGGAAG
Gsu-1-1 GUAGACCUUCUU.AUCA.AGAG.UGGU.GGAGGGA...AAG.G.CCCUGUGAA.ACCA.CAGCAACC.G.GUCCG...... GUAG......
Dac-1-1 GACAACCGUGUU.AUCA.CGAG.UGGU.GGAGGGA...AUG.G.CCCUUUGAA.ACCA.CAGCAACC.G.GUCCUGACG...... UUCA...... GG
env-20 CCCUGGCUGCUC.AUCC.AGAG.AGGU.GGAGGGA...CCG.G.CCCUGAGAA.ACCU.CGGCAACC.G.CGAA...... GCGCA......
Gvi-1-1 GUCUCUUGACUU.AUCC.AGAGCAGGC.GUAGGGAA..CAG.G.CCCGGUGAC.GCCA.CGGCAACC.G...CCCAC...... UAGCA...... AC
Mth-1-1 AUCCCACGGCUU.AUCG.CGAG.AGGU.GGAGGG....AGG.GCCCCGAUGAA.ACC..CGGCAACC...CCGCU...... AUUUA......
Sth-1-1 CCCGCCCGGUUC.AUCG.AGAG.UGGC.GGAGGGA...CUG.G.CCCCAUGAU.GCCA.CGGCAACC.U..CUCC...... CGCGG......
Cac-1-1 UAAUUGUUUCUU.AUCA.AGAG.UGAC.GGAGGGA...UAG.G.CCCUAUGAA.GUC..CGGCAACA.U..CCAA...... UUAUU......
Lmo-1-1 UUACGUUUUCUU.AUCA.AGAG.UGGU.GGAGGGAA..UCG.G.CCCAGUGAA.ACC..CAGCAGCG.G.AGC...... GCAA......
Lmo-2-1 UUACGUUUUCUU.AUCA.AGAG.CGGU.GGAGGGAA..UCG.G.CCCAGUGAA.GCC..CAGCAGCG.G.AGC...... GCAA......
Lin-1-1 UUACAAUUUCUU.AUCC.AGAG.UGGU.GGAGGGAA..UCG.G.CCCAGUGAA.ACC..CGGCAGCG.G.AGC...... GCAA......
env-21 CGGGACGCUCUU.AUCG.AGAG.UGGU.GGAGGGA...CUG.G.CCCGACGAA.GCC..CGGCAACC.C.GCCGAACC...... GUCAA...... CGGU
Cau-1-1 ACACACGCACUC.AUCC.AGAG.CGGU.GGAGGGA...CCG.G.CCCGUUGAA.ACCG.CAGCAACC.C.UCGUAUGC...... GUCGC...... CGCA
Bcl-1-1 AUACGAAUUCUU.AUUA.AGAG.GAGC.AGAGGGA...CUG.G.CCCAAUGAU.GCUU.CAGCAACC...CCGC...... CAUGA......
env-22 UCAUGCAUUCUC.AUCA.AGAA.AGGU.GGAGGGAU..CAG.G.UCCUUUGAA.GCCU.UGGCAACC.A.UCUAGUAC...... UUAAU...... UAAGUG
Oih-1-1 UUAAUACUUCUU.AUCG.AGAG.AAGC.UAAGGGAC..CUG.G.CCUGUUGAC.GCUU.CAGCAACC.U..CUAU...... CUCCA......
Bcl-1-2 AUGUUUUUUCUU.AUCC.AGAG.AGAU.GGAGGGAU..UUG.G.CCCUUUGAA.GUCU.CAGCAACC.G..GCC...... UUU......
Oih-1-2 AUAGUUAGACUU.AUCA.AGAG.AGAU.GGAGGGA...UUG.G.CCCGAUGAA.GUCU.CAGCAACC.A.GCCU...... AGAUA......
Oih-1-3 UUAUUUUUCCUU.AUCA.AGAGUCGGG.GGAGGAAU..CUG.G.UCCAUUGAU.CCCG.CAGCAACC.A.GUUACA...... AUGAA......
Esi-1-4 CCAAUAAUUCUU.AUCA.AGAG.AAGU.CGAGGGA...ACG.G.CCCGAAGAC.ACUU.CAGCAACC...CCGC...... GCGUA......
Gsu-1-2 ACGGCUUAACUU.AUCA.AGAG.CGAC.CGAGGGA...CAG.G.CCCGGUGAC.GUCG.CGGCAACC.U.CCCC...... AUGG......
Gme-1-2 GAUUCAUUUCUU.AUCA.CGAG.CGAC.CGAGGGA...CUG.G.CCCUAUGAC.GUCG.CGGCAACC...CCCC...... GCAA......
Cte-2-1 UAAAAAAAGCUU.AUUA.AGAG.CGGU.GGAGGGA...CUG.G.CCCUAUGAA.GCC..CGGCAACC.U.GUA.UAUG...... UGUUU...... GCAU
Sep-1-1 UUACCUAACCUU.AUUU.UGAG.AAGC.AGAGGGAU..UUG.G.CCCGUAGAA.GCUU.CAGCAACC.G.ACU...... UUAAA......
Sau-1-1 UUCAUAUUUCUU.AUUG.UGAG.AAGU.UGAGGGAC..UUG.G.CCCUGUGAU.ACUU.CAGCAACC.G.ACU...... UUAU......
Oih-1-4 AUUGAAUAACUU.AUCC.AGAG.UGAC.GGAGGGAA..CAG.G.ACCUACGAU.GUCA.CAGCAACC.U.ACC...... UUUAC...... Esi-1-5 GCAUAUCUACUU.AUCG.AGAG.CGAC.CGAGGGAU..UAG.G.CCCAACGAC.GUC..CGGCAACC...ACC...... UUUAA...... Bcl-1-3 AAAAAAGGACUU.AUCA.AGAG.CGAC.UGAGGGAU..UAG.G.CCCAAUGAC.GUC..CAGCAACC...UCCC...... GUUAC...... Cpe-1-1 UUAUAUACUCUU.AUCC.AGAG.AGGU.GGAGGGAAA.AAG.G.CCCUAUGAA.ACC..CGGCAACC...AGUGA...... GAAA...... Cth-1-1 ACAGGUAACCUU.AUCA.AGAG.AGGC.GGAGGGA...AUG.GGCCCUAUGAA.ACC..CGGCAACC.G..GCA...... GAAU...... Oih-1-5 AUGAAAAUACUU.AUCA.AGAG.AGGU.GGAGGGA...CUG.G.CCCGCUGAA.ACCU.CAGCAACA.GAACG...... CAUC...... Bce-4-1 AUUAGUUUUCUU.AUCA.AGAG.AGAU.GGAGGGA...CUG.G.CCCGAUGAA.AUCU.CAGCAACA.G.GCU...... ACACA...... Bce-1-1 AUUAGUUUUCUU.AUUA.AGAG.AGAU.GGAGGGA...CUG.G.CCCGAUGAA.AUCU.CAGCAACA.G.GCU...... AUAAA...... <<<<<<<<.<<<.....<.<<<<....<<<...... >>>...... >>>>.><<<..<<.<.<<<<<...... ?????222.012.....2.?222....222...... 222...... 222?.2221..02.?.??2??...... <<<.<...... ??2.2...... CUU-AUC -AGAG- GGY- GAGGGA---Y G-G-CCC RYGA -RCC -CRGCAACC------qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa 26

Oih-1-6 AGCAAAUCUCUU.AUCA.AGAG.UGGU.GGAGGGAA..UAG.G.CCCUGCGAA.GCC..CGGCAACC.U....GUAGC...... AAUU...... GCU Lmo-1-2 ACAUAGUAACUU.AUCA.AGAA.AGGU.GGAGGGUU..CUG.G.CCCCGUGAA.GCCU.UGGCAACC.G.GA...... UUUU...... Lmo-2-2 ACAUAGCAACUU.AUCA.AGAA.AGGU.GGAGGGUU..CUG.G.CCCCGUGAA.GCCU.UGGCAACC.G.GA...... AGUU...... Lin-1-2 ACAUAGUAACUU.AUCA.AGAA.AGGU.GGAGGGUU..CUG.G.CCCAGUGAA.GCCU.UGGCAACC.G.GA...... CUUU...... Ban-3-1 GGAUACUCUCUU.AUCC.CGAGCUGGC.GGAGGGA...CAG.G.CCCGAUGAA.GCC..CAGCAACC.U...CACUUGUAGUGG...... UAAAU...... ACAG Bth-1-1 GGAUACUCUCUU.AUCC.CGAGCUGGC.GGAGGGA...CAG.G.CCCGAUGAA.GCC..CAGCAACC...UCACUUGUAUUG...... GUAAA...... CACA Bcl-1-4 CAUAACGCUUUU.AUCC.AGAGAUGGC.GGAGGGA...CAG.G.CCCGAAGAA.GCC..CAGCAACC...AACAC...... GUAAC...... Bha-1-1 ACGGAUACUCUU.AUCC.AGAGUUGGU.GGAGGGA...CAG.G.CCCGAAGAA.ACCC.CAGCAACC...AACACCUG...... UUAAA...... CAAAG Bli-1-1 GCGGAUACUCUU.AUCC.CGAGCUGGU.GGAGGGA...CAG.G.CCCAAUGAA.ACC..CAGCAACC.G.GUUUCUCUUA...... UUAAU...... GGAAAAAAACAGUUUCUGAGA Gka-1-3 ACGGAUACUCUU.AUCC.CGAGCCGGU.GGAGGGA...CAG.G.CCCGAUGAA.GCC..CAGCAACC.G..UCACAACUG...... UACAU...... GU Oih-1-7 ACGGAUACUCUU.AUUC.AGAGUUGGU.GGAGGGA...CAG.A.CCCGAUGAA.GCC..CAGCAACC...AUCAC...... UACUG...... Bsu-1-1 GCGGAUACUCUU.AUCC.CGAGCUGGC.GGAGGGA...CAG.G.CCCUAUGAA.GCC..CAGCAACC...GGUUUCUCUGUUA...... UUUAU...... UAUGUUCAACUGAGUGAGA Esi-1-6 ACGGAUACUCUU.AUUC.UGAGCAGGU.GGAGGGAAC.AAG.G.CCCGAAGAA.ACC..CGGCAACC.G..UCUUAUA...... UUAAU...... UUCAUCAUUAAUGU Oih-1-8 AUGACAAUUCUU.AUCC.AGAG.AGGU.GGAGGGA...CUG.G.CCCAAGGAA.GCCU.CGGCAACA.G.ACUUAU...... UUGAU...... Esi-1-7 ACCGCAACACUU.AUCC.AGAG.AGGU.GGAGGGA...CUG.G.CCCUACGAA.ACCU.CGGCAACA.G.ACUC...... AUUAU...... Cac-1-2 UAAUAUUUCCUU.AUCA.AGAG.AAAC.GGAGGGA...CUG.G.CCCAAUGAU.GUUU.CAGCAACC.A.AGGU...... UUUAU...... 1 Fnu-1-1 AAAUAAAUAACC.AUCC.AGAG.AAAC.GGAGGGA...CUG.G.CCCAAUGAU.GUUU.CAGCAACC.U.ACU...... UAAA...... 1 Fnu-2-1 AAAUAAAUAACC.AUCC.AGAG.AAAC.UGAGGGA...CUG.G.CCCUAUGAU.GUUU.CAGCAACC.U.ACU...... UAAA...... 
1 Fnu-1-2 UGGAAAUAAACC.AUCA.AGAG.AGAU.UGAGGGA...CAG.G.CCCGUUGAG.AUCU.CAGCAACC.U.ACG...... UAAAA...... 1 Fnu-2-2 UAGAAAUGAACC.AUCA.AGAG.AGAU.UGAGGGA...CAG.G.CCCGUUGAG.AUCU.CAGCAACC.U.ACAUU...... AAUU...... Sau-2-1 ACGGAUUCUCUU.AUCC.UGAG.UGGU.GGAGGGAC..AUG.GACCCAAUGAA.ACC..CAGCAACC.U..CUUUU...... UUUAU...... A Sau-1-2 ACGGAUUCUCUU.AUCC.UGAG.UGGU.GGAGGGAC..AUG.GACCCAAUGAA.ACC..CAGCAACC.U..CUUUU...... UUAUA...... Sep-1-2 ACGGAUUCUCUU.AUCC.UGAG.UGGU.GGAGGGAC..AUG.GACCCAAUGAA.ACC..CAGCAACC.U..CUUU...... AUUU...... Gka-1-4 CGUCCCAUUCUU.AUCA.AGAG.AAGC.GGAGGGAA..CUG.G.CCCAAUGAA.GCUU.CAGCAACC.A.GCCGC...... CCG...... Bcl-1-5 AAAAACACUCUU.AUAA.CGAG.AAGC.GGAGGGA...CUG.G.CCCAAUGAA.GCUU.CAGCAACC.A.UUCAU...... UGCG...... Bcl-1-6 GAAUAGGUUCUU.AUCA.AGAG.AAGU.GGAGGGA...AUG.G.CCCAAUGAA.GCUU.CAGCAACC.A.GCCA...... ACCA...... Lin-1-3 UGUAGAAAUCUU.AUCC.AGAG.UGGU.GGAGGGA...AAU.G.CCCUGUGAA.ACC..CAGCAACC.U.AAACAAUAAUUC...... AUUAU...... G Lmo-1-3 UGUAGAAAUCUU.AUCC.AGAG.UGGU.GGAGGGA...AAU.G.CCCUAUGAA.GCC..CAGCAACC.U.AAACAAUAAUUC...... AUUAU...... G Lmo-2-3 UGUAGAAAUCUU.AUCC.AGAG.UGGU.GGAGGGA...AAU.G.CCCUGUGAA.ACC..CAGCAACC.U.AAACAAUAAUUC...... AUUAU...... G Bcl-1-7 GAAUAAGAACUU.AUCA.AGAG.UGGC.GGAGGGAC..CUG.A.CCCAAUGAA.GCC..CGGCAACC...AGUUUA...... UUUGU...... Bha-1-2 AUAAAAAGACUU.AUCG.AGAG.AGGC.AGAGGGA...CUG.A.CCCGAUGAU.GCC..CGGCAACC...CGUUUG...... UUAGC...... CAAGC Bce-4-2 UAAUAUAUCUUU.AUCA.AGAG.AGGC.AGAGGGA...CCG.G.CCCUUUGAA.GCC..CAGCAACC.U...CAGU...... UUACA...... CAAA Ban-3-2 GAAUAAUUCUUU.AUCA.AGAG.AGGC.AGAGGGA...CCG.G.CCCUUUGAA.GCC..CAGCAACC.U...CAGU...... UUAUA...... CAAA Bce-1-2 GAAUAUACCUUU.AUCA.AGAG.AGGC.AGAGGGA...CCG.G.CCCUUUGAA.GCC..CAGCAACC.U...CAGU...... UUAUA...... CAAA Bce-2-1 GAAUACUUCUUU.AUCA.AGAG.AGGC.AGAGGGA...CCG.G.CCCUUUGAA.GCC..CAGCAACC.U...CAGU...... UUAUA...... CAAA Lmo-2-4 UAGUAUUUUCUU.AUCA.CGAA.AGGU.GGAGGGA...CUG.G.CCCUUUGAA.GCCU.UAGCAACC.G..GAA...... UUUAU...... Sth-1-2 CCGAAUACUCUU.AUCA.AGAG.AAGC.GGAGGGAC..CUG.G.CCCGAUGAA.GCUU.CGGCAACC.A.GCCUGC...... GUCAC...... 
Bth-1-2 AUUCGAUGUCUU.AUCA.AGAGCAGGU.GGAGGGA...UGA.G.CCCUACGAA.GCC..CGGCAACC.G.ACCCA...... UUUA...... Bce-2-2 AUCGGAUGUCUU.AUCA.AGAGCAGGU.GGAGGGA...UGA.G.CCCUACGAA.GCC..CGGCAACC.G.ACCCA...... UUUA...... Bsu-1-2 CUAUAUUUUCUU.AUCA.AGAGCAGGC.AGAGGGA...CGA.G.CCCGAUGAA.GCC..CGGCAACC.G.ACU...... UAUAA...... Bli-1-2 AUCAAUAUUCUU.AUCA.AGAGCAGGC.AGAGGGA...CAA.G.CCCGAUGAA.GCC..CGGCAACC.G.ACU...... UUUUA...... Gka-1-5 GCAGGCUUUCUU.AUCA.AGAGCAGGC.GGAGGGA...CGA.G.CCCAAUGAA.GCC..CGGCAACC.G.GCUUGG...... CGCGC...... GC Bsu-1-3 AUCUAAAAACUU.AUCA.AGAG.CGGC.UGAGGGA...CUG.G.ACCUAUGAA.GCC..CGGCAACC.U.GCA...... UAGUU...... Bli-1-3 AAAAGCAAACUU.AUCA.AGAG.CGGU.GGAGGGA...CUG.G.UCCGAUGAA.ACC..CGGCAACC.U.GCGU...... GUGAA...... Bce-2-3 AAUACAAAGCUU.AUCA.AGAG.AAGC.GGAGGGAA..CUG.G.CCCGAAGAA.GCU..CGGCAACC.U.GCUU...... AUAGA...... Bce-3-1 AAUACAAAGCUU.AUCA.AGAG.AAGC.GGAGGGAA..CUG.G.CCCGGUGAA.GCU..CGGCAACC.U.GCUU...... AUAGA...... Ban-3-3 AAUACAAAGCUU.AUCA.AGAG.AAGC.GGAGGGAA..CUG.G.CCCGGCGAA.GCU..CGGCAACC.U.GCUU...... AUAGA...... Bli-1-4 AAUAUGCAGCUU.AUAA.AGAG.AGAU.GGAGGGA...CUG.G.CCCGGUGAA.AUCU.CAGCAACC.U.GCA...... GCAG...... Bha-1-3 UCUCGUAUUCUU.AUCC.AGAG.AGGU.GGAGGGA...ACG.G.CCCGAAGAA.ACCU.CAGCAACC.A.GCCACG...... AUCCU...... Ban-3-4 AAGACAACUCUU.AUUG.AGAG.CGGU.GGAGGGA...AAG.G.CCCUGUGAA.ACC..CGGCAACC.U..UCAAAC...... GAAAU...... GU Bce-1-3 AGACAAACUCUU.AUUG.AGAG.CGGU.GGAGGGA...AAG.G.CCCUGUGAA.ACC..CGGCAACC.U..UCAAAC...... GAAAU...... GU Tte-1-1 UAACACGCUCUU.AUCA.AGAG.AGGU.GGAGGGAA..AGA.G.CCCGAUGAA.ACC..CGGCAACC.U.GUCCU...... UUUA...... Mth-1-2 CCGGAAACUCUU.AUCG.AGAGCUGGC.GGAGGGA...CUG.G.CCCGAUGAA.GCC..CGGCAACC.G.GCCUCU...... UAGAA...... CC Ban-3-5 UAAAUACUUCUU.AUCA.AGAGCAGGU.GGAGGGA...CGA.G.CCCGACGAA.ACC..CGGCAACC.G.AUCUAC...... AUAAU...... UG Bce-1-4 UAAAUACUUCUU.AUCA.AGAGCAGGU.GGAGGGA...CGA.G.CCCGACGAA.ACC..CGGCAACC.G.AUCUAC...... AAUU...... G Esi-1-8 CAUCCAAUUCUU.AUCA.AGAGCAGAC.GGAGGGA...CGA.G.CCCUACGAU.GUCG.CAGCAACC.G.ACC...... ACUU...... 
Cte-2-2 UUAAAAUAUCUU.AUCA.AGAG.CGGU.UGAGGGA...CUG.G.CCCUAUGAA.ACC..CAGCAACC.U..AUAC...... ACAAA...... Bth-1-3 UAAAAAUUUCUU.AUUA.AGAG.AGGU.GGAGGGA...CUG.G.CCCUUCGAA.GCCU.CAGCAACC.U.GAUUU...... AUGUG...... Cac-1-3 AUGGAAACUCUU.AUCA.AGAG.AGGU.GGAGGGAA..AGG.G.CCCGUUGAA.ACC..CGGCAACC.G..AUGUAUUAA...... UUUAA...... GU Cte-2-3 UAAAAAGCUCUU.AUCG.AGAG.AGGU.GGAGGGAA..AGG.G.CCCUAUGAA.ACC..CGGCAACC...AAUAUU...... UUUA...... GAA ....<<<<<<<<.<<<.....<.<<<<....<<<...... >>>...... >>>>.><<<..<<.<.<<<<<...... ?????222.012.....2.?222....222...... 222...... 222?.2221..02.?.??2??...... <<<.<...... ??2.2...... CUU-AUC -AGAG- GGY- GAGGGA---Y G-G-CCC RYGA -RCC -CRGCAACC------qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa 27

Bce-4-3 UUGCAUAGUCUU.AUCA.AGAAAAGGU.GGAGGGA...CAG.G.CCCGAUGAA.ACCU.UGGCAACA.G..CCG...... UAUAA...... Bce-2-4 UUGCAUAGUCUU.AUCA.AGAAAAGGU.GGAGGGA...CAG.G.CCCGAUGAA.ACCU.UGGCAACA.G..CCG...... UAUAA...... Bce-1-5 UUGCAUAGUCUU.AUCA.AGAAAAGGU.GGAGGGA...CAG.G.CCCGAUGAA.ACCU.UGGCAACA.G..CCG...... UAUAA...... Ban-3-6 UUGCAUAGUCUU.AUCA.AGAAAAGGU.GGAGGGA...CAG.G.CCCGAUGAA.ACCU.UGGCAACA.G..CCG...... UAUAA...... Cac-1-4 UGAUAAGGUCUU.AUCA.AGAG.AGGU.GGAGGGA...CUG.G.CCCUAUGAA.ACC..CAACAACC.A.GCAUUU...... UUUAA...... UUA Bce-1-6 CAAACAAUUCUU.AUGU.UGAG.AAGU.GGAGGGA...CGG.G.CCCUAUGAA.ACUU.CGGCAACC.U..CGU...... AUGAG...... Esi-1-9 CCGAUAACUCUU.AUCG.AGAG.UGGU.GGAGGGA...CUG.G.CCCGAUGAA.ACC..CGGCAACC.G...CGGAA...... UUUAU...... U Gka-1-6 AUCGGUACUCUU.AUCA.AGAGUUGGC.UGAGGGAA..UUG.G.CCCAAUGAA.GCC..CAGCAACC.G.ACCGUAA...... UACUAUCGUGAGAUAGGGCGCACGCCAAGGGCGGCGCCGGAAGCGUCAUGCUUCCGC Esi-1-10 UAUAUGACUCUU.AUCG.UGAGUUGGC.AGAGGGAA..UUG.G.CCCGAAGAC.GCCG.CAGCAACC.G.ACC...... AUCC...... Bcl-1-8 AUAUUCAUUCUU.AUCG.AGAG.AGGU.GGAGGGA...CUG.G.CCCAAUGAA.ACC..CGGCAACC.G.CAAG...... UUCG...... Tte-1-2 UUAAAAUCUCUU.AUCA.AGAG.AGGU.GGAGGGA...CUG.G.CCCGAUGAA.ACC..CGGCAACC.A.GCC...... UUAG...... Tte-1-3 CUCAAUCCUCUU.AUCA.AGAG.UGGU.GGAGGGA...CUG.G.CCCGAUGAA.ACC..CGGCAACC.G.GCAC...... GUAA...... Dha-1-1 AUAAACACUCUU.AUCC.AGAGUAGGU.GGAGGGA...CUG.G.CCCGAUGAA.ACC..CGGCAACC.G.ACAU...... GAAA...... Lin-1-4 AAUUUAUCUCUU.AUCC.AGAG.CGGU.AGAGGGA...CUG.A.CCCUUUGAA.GCC..CAGCAACC.U.ACACA...... UAUA...... Lmo-2-5 AAUUUAUCUCUU.AUCC.AGAG.CGGU.AGAGGGA...CUG.A.CCCUUUGAA.GCC..CAGCAACC.U.ACACA...... UAUA...... Oih-1-9 AUGAUAUCUCUU.AUCU.AGAG.CGGU.GGAGGGA...CUG.G.CCCUUUGAA.ACCG.CGGCAACC.U.UCAUA...... AUUA...... Bha-1-4 AAGAAAACUCUU.AUCA.UGAG.AGGU.GGAGGGA...CUG.G.CCCGAUGAA.GCC..CAGCAACC.G...CCAAGCAG...... CAAAU...... CGCU Gka-1-7 UGUUGCGCUCUU.AUCA.AGAG.AGGU.GGAGGGA...UGU.G.CCCAAUGAA.GCC..CGGCAACC.G..UCAGCG...... CAUGU...... G Bli-1-5 UAUGUUUCUCUU.AUCC.AGAG.AGGU.GGAGGGA...AGU.G.CCCUAUGAA.ACC..CGGCAACC.A..UCAAC...... ACGU...... 
G Bsu-1-4 UAUAUUUCUCUU.AUCA.AGAG.AGGU.GGAGGGA...AGU.G.CCCUAUGAA.GCC..CGGCAACC.A..UCAA...... CACUG...... Bce-1-7 CUGAUUUCUCUU.AUCA.AGAG.AGGU.GGAGGGA...CUGUG.CCCUGUGAA.GCC..CGGCAACC.G..UCAAC...... UUUAU...... G Ban-3-7 CUGAUUUCUCUU.AUCA.AGAG.AGGU.GGAGGGA...CUGUG.CCCUGUGAA.GCC..CGGCAACC.G..UCAAC...... UUAU...... G Sau-1-3 GCGUAAACUCUU.AUCG.AGAG.UGGU.GGAGGGA...UGU.G.CCCUACGAA.GCC..CGGCAACC.G..UCU...... UAUAU...... Sep-1-3 GCUUUAACUCUU.AUCG.AGAG.AGGU.GGAGGGA...UGU.G.CCCUAAGAA.GCC..CGGCAACC.G..UCU...... AAAAU...... Oih-1-10 CUAAUAUCUCUU.AUUG.AGAG.UGGC.UGAGGGA...CUG.G.CCCUGUGAC.GCC..CGGCAACC.G.UUCAUCG...... UAAUU...... CCAGUG Cac-1-5 AUAUUAUUUCUU.AUCA.AGA..AGGU.GGAGGGA...CUG.G.CCCUAUGAA.GCCU..GACAACC.G.GC...... AAAU...... Bce-1-8 UAUACAACUCUU.AUCA.AGAGCAGGU.GGAGGGAU..UUG.G.CCCGAUGAA.GCC..CAGCAACC.G.ACCGUAAUACCAUUGUGAAAUGGGGCG..UUUAU...... UUACGCCAA Bce-2-5 UAUACAACUCUU.AUCA.AGAGCAGGU.GGAGGGAU..UUG.G.CCCGAUGAA.GCC..CA.CAACC.G.ACCGUAAUACCAUUGUGAAAUGGGGCG..UUUAU...... UACGCCAA Bth-1-4 UAUACAACUCUU.AUCA.AGAGCAGGU.GGAGGGAU..UUG.G.CCCUAUGAA.GCC..CAGCAACC.G.ACCGUAAUACCAUUGUGAAAUGGGGCG..UUUAU...... GACGCCAA Bce-3-2 UAUAUAACUCUU.AUCA.AGAGCAGGU.GGAGGGAU..UUG.G.CCCGAUGAA.GCC..CAGCAACC.G.ACCGUAAUACCAUUGUGAAAUGGGGCG..UUUAU...... GACGCCAA Ban-3-8 UAUACAACUCUU.AUCA.AGAGCAGGU.GGAGGGAU..UUG.G.CCCGAUGAA.GCC..CAGCAACC.G.ACCGUAAUACCAUUGUGAAAUGGGGCG..UUUAU...... GACGCCAA Bli-1-6 AAGGUUUUCCUU.AUCA.AGAG.UGGU.GGAGGGA...CUG.G.CCCUGUGAA.ACC..CGGCAACC.G..CUGU...... CUAUG...... Bsu-1-5 AAGGUUUUCCUU.AUCA.AGAG.AGGU.GGAGGGA...CUG.G.CCCUGCGAU.ACC..CGGCAACC.G..CUG...... UUUAA...... Bsu-1-6 AAGUUGUACCUU.AUCA.AGAG.AGGU.GGAGGGA...CUG.G.CCCUAUGAU.ACC..CGGCAACC.G..CUGU...... UUCAA...... Bha-1-5 UCAUAUUUUCUU.AUCC.AGAG.UGGU.GGAGGGA...CUG.G.CCCUGUGAA.GCC..CGGCAACC.U.CUUU...... UUUU...... Bth-1-5 GAAUAUUUUCUU.AUCC.AGAG.AGGU.GGAGGGA...CUG.G.CCCGAUGAA.ACC..CAGCAACC...GC...... GAU...... Ban-3-9 GAAUAUUUUCUU.AUCC.AGAG.AGGU.GGAGGGA...CUG.G.CCCGAUGAA.ACC..CAGCAACC...GC...... GAU...... 
Bth-2-1 GAAUAUUUUCUU.AUCC.AGAG.AGGU.GGAGGGA...CUG.G.CCCGAUGAA.ACC..CAGCAACC...GC...... GAU...... Bce-1-9 AAAUUAAUACUU.AUCC.AGAG.AGGU.GGAGGGA...ACG.G.CCCUAUGAA.ACCU.CAGCAACC...CCUAUA...... UAUAU...... UU Bth-1-6 AAAUUAAUACUU.AUCC.AGAG.AGGU.GGAGGGA...ACG.G.CCCUAUGAA.ACCU.CAGCAACC...CCUAUG...... UAAAU...... GC Ban-3-10 AAAUUAAUACUU.AUCC.AGAG.AGGU.GGAGGGA...ACG.G.CCCUAUGAA.ACCU.CAGCAACC...CCUAUG...... UAAAU...... GC Bce-4-4 AAAUUAAUACUU.AUCC.AGAG.AGGU.GGAGGGA...CCG.G.CCCUAUGAA.ACCU.CAGCAACC...CCUAUG...... UAAAU...... GU Bce-2-6 AAAUUAAUACUU.AUCC.AGAG.AGGU.GGAGGGA...ACG.G.CCCUAAAAA.ACCU.CAGCAACC...CCUAUGCA...... AUUUU...... Lmo-2-6 AUAUUUUCUCUU.AUCG.AGAG.UGGC.AGAGGGA...CUG.G.CCCGAUGAA.GCC..CGGCAACC.U.AACUUUA...... UUUAA...... GCGUA Lmo-3-1 AUAUUUUCUCUU.AUCG.AGAG.CGGC.AGAGGGA...CUG.G.CCCGAUGAA.GCC..CGGCAACC.U.AACUUUA...... UUUAA...... GCGUA Lmo-1-4 AUAUUUUCUCUU.AUCG.AGAG.CGGC.AGAGGGA...CUG.G.CCCGAUGAA.GCC..CGGCAACC.U.AACUUUA...... UUUAA...... GCAUA Lin-1-5 AUAUUUUCUCUU.AUCG.AGAG.CGGC.AGAGGGA...CUG.G.CCCGAUGAA.GCC..CGGCAACC.U.AACUUUA...... UUUAA...... GCGUA Ban-2-1 NNNNNNNNNNNN.NNNN.NNAG.UGGC.GGAGGGA...CUG.G.CCCUCUGAU.GCC..CGGCAACC.G.AGCUUAUGAC...... GUAU...... Bce-1-10 ACACAUACUCUU.AUCA.AGAG.UGGC.GGAGGGA...CUG.G.CCCGAUGAU.GCC..CGGCAACC.G.AGCUUAUAAC...... GUAU...... Ban-3-11 ACACAUACUCUU.AUCA.AGAG.UGGC.GGAGGGA...CUG.G.CCCGAUGAU.GCC..CGGCAACC.G.AGCUUAUGAC...... GUAU...... Bce-2-7 ACACAUACUCUU.AUCA.AGAG.UGGC.GGAGGGA...CUG.G.CCCGAUGAU.GCC..CGGCAACC.G.AGCUUAUGAC...... GUAU...... Bce-3-3 ACACAUACUCUU.AUCA.AGAG.UGGC.GGAGGGA...CUG.G.CCCGAUGAU.GCC..CGGCAACC.G.AGCUUAUAAC...... GUAU...... Oih-1-11 UACGUUUUUCUU.AUCA.UGAG.AGGC.GGAGGGAA..AUG.G.CCCAACGAA.ACCU.CGGCAACA.G.GUUCU...... UAUU...... Oih-1-12 AUGAAAUAUCUU.AUCC.UGAG.AGGU.GGAGGGAA..AUG.G.CCCAAAGAA.GCCU.CGGCAACA.G.GUUC...... UAGCU...... Bce-1-11 CGAUACAUUCUU.AUCC.AGAG.AGGU.GGAGGGA...CUG.G.CCCUACGAU.ACCU.CAGCAACG.G.GUU...... UUUUU...... Bce-3-4 CGAUACAUUCUU.AUCC.AGAG.AGGU.GGAGGGA...CUG.G.CCCUACGAU.ACCU.CAGCAACG.G.GUU...... 
UUUU...... Ban-3-12 CGAUACAUUCUU.AUCC.AGAG.AGGU.GGAGGGA...CUG.G.CCCUACGAU.ACCU.CAGCAACG.G.GUU...... UUUUU...... Ban-1-1 CGAUACAUUCUU.AUCC.AGAG.AGGU.GGAGGGA...CUG.G.CCCUACGAU.ACCU.CAGUAACG.G.GUU...... UUUUU...... Ban-3-13 AGCAAUUUACUU.AUCC.AGAG.AGGU.AGAGGGA...CUG.G.CCCUAUGAC.ACCU.CAGCAGCG.G.GUUCUG...... UAAUA...... Bth-1-7 AGCAAUUUACUU.AUCC.AGAG.AGGU.AGAGGGA...CUG.G.CCCUAUGAC.ACCU.CAGCAGCG.G.GUUCUG...... CAAGA...... <<<<<<<<.<<<.....<.<<<<....<<<...... >>>...... >>>>.><<<..<<.<.<<<<<...... ?????222.012.....2.?222....222...... 222...... 222?.2221..02.?.??2??...... <<<.<...... ??2.2...... CUU-AUC -AGAG- GGY- GAGGGA---Y G-G-CCC RYGA -RCC -CRGCAACC------

Esi-1-11 UGACUUCUUUUU.AUCG.CGAG.AGAC.GGAGGGA...CUG.G.CCCGAUGAU.GUUU.CGGCAGCG.G.ACGA...... UUAAA...... Sep-1-4 AUUCAAUAACUU.AUCA.AGAG.AAGU.GGAGGGA...CUG.G.CCCAAAGAA.GCUU.CGGCAACA.U.U...... GUAUC...... Sau-2-2 UAAGCAUCACUU.AUCU.AGAG.AGGU.GGAGGGA...CUG.G.CCCUAUGAA.GCCU.CGGCAACA.U...... CUCGA...... Gka-1-8 ACCGGCAUUCUU.AUCA.AGAG.AGGG.GGAGGGA...CUG.G.CCCGGUGAA.CCCU.CAGCAACC.U.GGCC...... CGC...... Bli-1-7 GAAGAGAUUCUU.AUCA.CGAG.AGGU.GGAGGGA...CUG.G.CCCUUUGAA.ACCU.CAGCAACC...GGUCUGCACUGACGAC...... GUCA...... GU Bli-1-8 AUAGCUGUUCUU.AUCA.AGAG.AGGC.AGAGGGA...CUG.G.CCCGAUGAA.GCCU.CAGCAACC...GGUGAAUG...... AAUAU...... UCA Bsu-1-7 AUAUCCGUUCUU.AUCA.AGAG.AAGC.AGAGGGA...CUG.G.CCCGACGAA.GCUU.CAGCAACC...GGU..G...... UAAU...... GGCGAUCAGCCAUG Bsu-1-8 UCGAUAUUUCUU.AUCG.UGAG.AGGU.GGAGGGA...CUG.G.CCCUUAGAA.ACCU.CAGCAACC...GGCUUG...... UUUU...... GCAUUUGCAAA Bce-2-8 UUUACUCAUUGU.AUCA.AGAG.AGGU.GGAGGGA...CUG.G.CCCUUUGAA.ACCU.CGGCAACA.G.GUUC...... AUUUU...... Bce-4-5 UUUACUCAUUGU.AUCA.AGAG.AGGU.GGAGGGA...CUG.G.CCCUUUGAA.ACCU.CGGCAACA.G.GUUC...... AUUUU...... Bce-1-12 UUUACUCAUUGU.AUCA.AGAG.AGGU.GGAGGGA...CUG.G.CCCUUUGAA.ACCU.CGGCAGCA.G.GUUCA...... UUUUU...... Ban-3-14 UUUACUCAUUGU.AUCA.AGAG.AGGU.GGAGGGA...CUG.G.CCCUUUGAA.ACCU.CGGCAGCA.G.GUUCA...... UUUU...... Oih-1-13 ACGUUUUUUCUU.AUCU.AGAG.AGAU.UGAGGGAU..CAG.G.CCCUAUGAC.AUCU.CGGCAGCG.G.AUUCUUUAU...... AUUAA...... Bsu-1-9 UCAAUAUUUUCU.AUCC.AGAG.AGGU.GGAGGGA...CUG.G.CCCUAUGAA.ACCU.CGGCAACA...... UUAU...... Bce-4-6 AUGAAAAUUCUU.AUCG.CGAG.AGGU.GGAGGGA...CUG.G.CCCUAUGAU.ACCU.CGGCAGCG.G.AUUC...... GUUAU...... Ban-3-15 AUGAAAAUUCUU.AUCA.CGAG.AGGU.GGAGGGA...CUG.G.CCCUAUGAA.ACCU.CGGCAGCG.G.AUUC...... GUUAU...... Bth-1-8 AUGAAAAUUCUU.AUCA.CGAG.AGGU.GGAGGGA...CUG.G.CCCUAUGAU.ACCU.CGGCAGCG.G.AUUC...... GUUAU...... Bli-1-9 ACUAUACUUCUU.AUUC.AGAG.AGGC.GGAGGGAA..UUG.G.CCCUGUGAA.ACCU.CGGCAGCG.G.GUUCUGC...... AUACA...... GC Ban-3-16 UGAAACCUUCUU.AUAA.AGAG.AGGC.GGAGGGA...CUG.G.CCCUACGAU.GCCU.CGGCAGCG.G.ACUCG...... AUUUU...... 
Bce-2-9 UGAAACCUUCUU.AUAA.AGAG.AGGC.GGAGGGA...CUG.G.CCCUACGAU.GCCU.CGGCAGCG.G.ACUCG...... AUUUC...... Cte-2-4 AUAAGGAUUCUU.AUCA.AGAG.AGGC.GGAGGGA...CUG.G.CCCUAUGAA.ACC..CGGCAACC.A.AAAAU...... AAUA...... Ban-3-17 ACGAACAUUCUU.AUCU.AGAG.AGGU.AGAGGGA...CUG.G.CCCUAUGAC.GCCU.CAGCAACC...AUUAAC...... AUUU...... G Bce-2-10 ACGAACAUUCUU.AUCU.AGAG.AGGU.AGAGGGA...CUG.G.CCCUGUGAC.GCCU.CAGCAACC...AUUAAC...... AUUUU...... G Bth-1-9 ACGAACAUUCUU.AUCU.AGAG.AGGU.AGAGGGA...CUG.G.CCCUAUGAC.GCCU.CAGCAACC...AUUAAC...... AUUU...... G Bce-4-7 ACGAACAUUCUU.AUCU.AGAG.AGGU.AGAGGGA...CUG.G.CCCUAUGAC.GCCU.CAGCAACC...AUUAAC...... AUUU...... G Cac-1-6 AUUAGUGCACUU.AUCA.AGAG.AGGU.GGAGGGA...CCG.G.CCCUGUGAA.GCC..CAGCAACC.U.GUAUAUG...... UUAAU...... Cac-1-7 UGUAAAAAUCUU.AUCA.AGAG.UGGU.GGAGGGA...CUG.G.CCCUUUGAA.ACC..CGGCAACC.A.GUAUAUUU...... UUUAA...... Cpe-1-2 UUAAUAAAUCUU.AUCA.AGAG.AGGU.GGAGGGA...CUG.G.CCCUGUGAA.ACC..CAGCAACC.G..GUAAUUCUUUGCGG...... UUAAAACAAUGCUGAUUUUAAAAUAAAAAAAUCAGUAGUAAUUUCCUAUGCAAAGAU Lmo-2-7 UAAAUUACUCUU.AUUA.UGAG.UGGU.AGAGGGA...CUG.G.CCCGUUGAA.ACC..CAGCAACC.U.UUCAA...... UUCG...... Lmo-1-5 UAAAUUGCUCUU.AUAA.UGAG.UGGU.AGAGGGA...CUG.G.CCCGUUGAA.ACC..CGGCAACC.U.UUCAA...... UACG...... Bli-1-10 UUGAUUUCUCUU.AUCG.AGAGUUGGG.UGAAGGA...CUG.G.CCUAAUGAU.CCCAACAGCAACC.G.ACCGUAAUACC...... AUUGU...... GAAAUGGGGCGCGAAUCUCUGCGCCGCU Bsu-1-10 AUAUAUUCUCUU.AUCG.AGAGUUGGG.CGAGGGAU..UUG.G.CCUUUUGAC.CCCAAAAGCAACC.G.ACCGUAAUUCCAUUGUGAAAUGGGGCGCAUUUUU...... UUCGCGCCGAGACGCUGGUCUCUU Bsu-1-11 ACAUUUUCUCUU.AUCG.AGAGUUGGG.CGAGGGA...UUG.G.CCUUUUGAC.CCCAACAGCAACC.G.ACCGUAAUACC...... AUUGU...... GAAAUGGGGCGCACUGCUUUUCGCGCCGAGACUGAUGUCUCAU Mth-1-3 AAUGAAACUCUU.AUCG.AGAG.UGGU.GGAGGGA...CUG.G.CCCGAUGAA.GCC..CGGCAACC....CGCUCUCCUGG...... UUUCU...... CCGGG Sth-1-3 CCGUUGACUCUU.AUCC.AGAGUAGGC.UGAGGGA...CUG.G.CCCGAUGAC.GCCA.CGGCAACC.G..GCACG...... UCUCC...... CCAGGC Sth-1-4 CGGCGGGCUCUU.AUCC.AGAG.AGGU.GGAGGGAA..CUG.G.CCCGAUGAA.ACC..CGGCAACC.C.AGGACG...... CUGC...... 
GGGGC env-23 AGUUUAUUACUU.AUCA.AGAA.AGGU.GGAGGGA...CAG.G.CCCUGUGAA.GCCU.UGGCAACC.U...... UUGAC...... env-24 GACUCCGUGCUU.AUCA.AGAA.AGGU.GGAGGGAA..UAG.G.CCCUGCGAA.ACCU.UGGCAACC...AACUAUAAGUCCCG...... AAUGU...... UCGGGAU 1 Chu-1-2 CGCACAGCAUUU.AUAC.AGAA.UAGG.GAAGAGAG..CAG.G.CUCAAUGAU.CCUA.UAGCAACC.G.AC...... UUAAA...... Chu-1-3 AGUAGUUGAUUU.AUAA.AGAA.GAGG.GGAGAGAA..CGG.G.CUCUUUGAA.CCUC.UAGCAACC.U..CCGGGAG...... CAAAU...... <<<<<<<<.<<<.....<.<<<<....<<<...... >>>...... >>>>.><<<..<<.<.<<<<<...... ?????222.012.....2.?222....222...... 222...... 222?.2221..02.?.??2??...... <<<.<...... ??2.2...... CUU-AUC -AGAG- GGY- GAGGGA---Y G-G-CCC RYGA -RCC -CRGCAACC------aq aq aq aq aq aq aq aq aq aq aq aq aq aq aq aq aq aq aq aq aq aq aq aq aq aq aq alignment positions 157 ··· 259 1 Lme-1-1 UUGAC.A...U..GGUG..GUCCAUCUGU...... AC.A..GA..UAGUCGUCACGUU Mmy-1-1 ....U.U...A..GGUG..CUUAAAUAAG...GGCU...... UUA...... AGCU.AA.C..AA..UGAGAUUGUCUUA 1 Ooe-1-1 ..UGC.A...A..GGUG..CCCCAGUGGUA...... AG.C..GA..UGGGUCUAAGUCU 1 env-1 CGGUU.U...C..GGUG..CUACCUCCCA...... CC.A..GA..UGAGCUAAUCUCU Gka-1-1 .UAGC.A...A..AGUG..CCAAUUCCAG...CUAAAG...... AAGA...... UCUUUAG.UA.A..AA..UAAGGGGCACCCU Dvu-1-1 GCACA.A...C..GGUG..CCAA..CCAG...CCGCG.AC...... GGGA...... A.CGCGG.GA.A..CA..UGAGGCCAGGCUU Dra-1-1 CGGUC.A...C..GGUG..CU.UUCAGGAA..AGGGCCG...... UUUA...... GGUGCGCC.GA.C..GA..UGGCGCGAGCGGC Dge-1-1 ACGGC.A...C..GGUG..CCCCGUCUGG...AAGGCCCCGCGC...... GCAA...... GACGCC...GA.C..GA..UGGCGUGAGAGCG Dde-1-1 ..GGC.A...A..GGUG..CCAAAGCCAG...CCACGGUG...... AGAU...... GCCGCGG.UG.C..CA..UGAGGUCGACAGC env-2 ..CUG.A...A..GGUG..CUAAAGCCAG...UCUCUUC...... CUAA...... GUGGGAGGGA.AU.C..GA..UGCGACUCGAAGG env-3 ..AUA.A...A..GGUG..CUAAUGCCAG...CCCAACCGGA...... AUAU...... AGGUGGG.AGUC..GA..UGCGACUUGAUUG >>>>>.....>..>>.>..>>...... <<<<<<...... >>>>>>...... >>..>>>>>>>>>.... ??2??.....?..20.1..22...... 22222?...... ?22222...... 21..0222?????...... >>>>...... 22??...... 
-R--- --GGUG--CYAA UCC R---C ------G-RR-A--GA--URAGR R

env-4 ..AUA.A...A..GGUG..CUAAUGCCAG...CCCAUCGGGU...... AGUU...... GACGGUGGG.AGUC..GA..UGCGACUCGAUUC env-5 ..AUA.A...A..GGUG..CUAAUGCCAG...CCCAACCGGC...... AAAU...... GUCGGUGGG.AGUC..GA..UGCGACUUGAUUG env-6 ..GUA.U...A..GGUG..CUAAUGCCAG...CCCAAUCGGC...... UAAU...... UGUCGGUGGG.AGUC..GA..UGCGACUCGAUUC Pph-1-1 GGAAC.A...C..GGUG..ACACAUUCUU...UUAAACGCGC...... UAUU...... CAUGAGUUUUC.AG.ACUGA..UACGAUUCACGCC 1 env-7 GGGGC.A...C..GGUG..GUAACGCCAG...... AUUC..GA..UGAGGAGAAACCA env-8 GGGGC.A...C..GGUG..GUAACGCCAG...AUUC...... GAU...... GAGG.AG.A..AA..UCAUGGUCAUGUC Xax-1-1 CGCGU.U...U..GGUG..CCAAAUCCUG...CGGGGAC...... CUCC...... GCGUCCGCCG.AA.A..GA..UGGUUCGAAUCGU Xca-1-1 CGCGU.U...U..GGUG..CCAAAUCCUG...CGGGGAC...... CUCC...... GCGUCCGCCG.AA.A..GA..UGGUUCGAAUCGU Xor-1-1 CGCGU.U...U..GGUG..CCAAAUCCUG...CGGGGACCC...... CGCG...... UCCGCCG.AA.A..GA..UGGUUCGACUCGU Cph-1-1 .GGAA.A...A..GGUG..CCAAAACCUAA..CCUGCUCA...... AUAU...... GAGCCGG.AA.A.UGA..UAAAUUAAUUGAU Dra-1-2 ..GUC.A...C..GGUG..CCAAGGCCAG...CCCCCUGGGCGUG...... UCAACGAUGAGCCGCCGGGCG.GG.A...C..CAAGCGGGAAGGC Chu-1-1 .AAGC.A...A..GGUG..CUAAUUCCUA...CCC...... AUG...... UGG.AA.C..UA..UAAAAAGUACGAC Asp-1-1 ..UCG.G...U..GGUGGACGUAGUCCAG...C...... GGCA...... G.AA.G..GACGUGGGCAUCGACAU Dge-1-2 ACGUC.A...C..GGUG..CCAAGGCCAG...CCCAGCGCAAGAGG...... UCAACGAUGGCCGCGUGUGGG.AA.C..GA..UCAGGGAAGGUCA Dra-1-3 CGGCA.G...C..GGUG..CU..UUCCAGA..CCCGCGCGAGCAGCGCCCGA...... CGAU..GGGCGGCGCCGCGGG.AA.C..GA..UAAAGGAAGGCGG env-9 AUAAA.A.GUA..GGUG..CUAAAUCCAA...UUUCGAUU...... UUUA...... AUUGAAA.UA.A..GA..UAAGUGUGGUUUU env-10 .GGGC.A...A..GGUG..CCAAUGCCAG...CCCGG...... GAAA...... CCGGG.AA.C..GA..UGUGGUCCACAUG Sus-1-1 .CGCC.A...A..GGUG..CCAAGGUCAG...CCCG...... GAAA...... CGGG.GA.C..GA..UGUGGCGAACGAC Rxy-1-1 GAGCC.A...A..GGUG..CCAAUGCCAG...CCCGCC...... GAGA...... GACGGG.GA.C..GA..UGUAGCUCGAAAC 1 Mma-1-1 GGCGC.A...G..GGUG..CUACUGCCAC...... G.AU.C..GA..UGGAGGAGGACUC 1 Kra-1-1 GGGCG.A...G..GGUG..CUAAAGCCAG...... GC.C..GA..UGGAGUUCGUGAU 1 Kra-1-2 GCGGC.A...... 
GGUG..CUCCCGCCAGG...... GG.C..GA..UG.GAGGAACGCA 1 Kra-1-3 ..GGG.C...A..GGUG..CUCCCGCCGGG...... AC.C..GA..UGGGAAGGGGACG 1 Kra-1-4 GGGGU.G...U..GGUG..CUCACGCCAGG...... AC.C..GA..UG.GAGGACGACC 1 Kra-1-5 CGGGG.A...G..GGUG..CUCACGCCAGG...... AC.C..GA..UGGGAGGAAGAAG env-11 GGACC.A...... GGUG..CCAAAUCCAG...CUUCGGUGU...... UGUU..CCGACAACAUCGAGG.GA.A..GA..UGAGAGGAUCCGU env-12 GCUAA.A...A..GGUG...UACAUCCUG...CUGGUACA...... AUAU...... UUGUCUG.AA.A..AG..UGAUUCAAGAGUU Cph-1-2 AUCAG.U...A..GGUG..CUAAUUCCUG...CUUCAU...... UUUC...... AUGAAG.AA.A..GA..UAAAUCACACUGC env-13 CUAAG.A...A..GGUG..CUAAAUUCUA...CUUAAUUUUCGAUUCA...... UUAA....UCGAUACAUU.GG.AU.A..GA..UAACUAAAAACUA env-14 UAAAG.A...A..GGUG..CUAAAUUCUA...CCAA...... UUUU...... UUGG.AU.A..GA..UAACAAAAAUGAA Tth-1-1 CGGGG.C...U..GGUG..CCAACGCCGG...CCCGGGCGGGG...... GAAA...... CGCCCGGG.GA.C..GA..UAAGAGAGGGGGG env-15 AAGGA.A...A..GGUG..CCAAAUCCAU...CCUUGCCG...... CGUU...... UGCGGGG.AA.A..GA..UAAGUCAGAUCAC Fsp-1-1 ..GGC.A...... GGUG..CUAAUUCCGG...CCUGGUGGCA...... GACU...... GCCCCGGG.AA.A..GA..UGAGGAGCUUCUU Fsp-2-1 GCGGC.A...... GGUG..CUAAUUCCGA...CCCGGGAC...... GCAG.....CCGGCUCCCGGG.AA.A..GA..UGAGGAGUUCCGC env-16 CGGGAGA...U..GGUG..CCAACUCCGG...CCUGCGA...... CCAA...... GGUGGCGCAGG.GA.A..GA..UGGGGAGAAAGGC Tfu-1-1 CGGGA.C...A..GGUG..CCAACUCCGU...CCCACUG...... UCAA.....GGUGGCAGUGGG.GA.A..GA..UGAGGGGAGAACG Sav-1-1 UCGGG.A...A..GGUG..CCAAAUCCGU...CUCACGG...... CGAA.....GUGCGUCGUGAG.GA.A..GA..UGAGGAGAAAGGG Sco-1-1 UAGGG.A...A..GGUG..CCAAAUCCGU...CUCACGGCG...... AGAU...... GCGUCGUGAG.GA.A..GA..UGAGGAGAAAGGG Esi-1-1 ..GGC.A...A..GGUG..CUCCGACCAG...CAAAGC...... AGUU...... GCUUUG.GA.U..GA..UACACACGAGAAA Gka-1-2 ..GGC.A...A..GGUG..CUAACACCCG...CAAAGCG...... GUUU...... CGCUUUG.GA.U..GA..UAAGAACGGCUCA Esi-1-2 ...GC.A...A..GGUG..CUACGACCAG...CAAAGC...... GAAA...... GCUUUG.GU.U..GA..UAAAAAGCGGCGA Esi-1-3 ..GAC.A...C..UGUG..CUAAUUCCUG...CGGGUG...... UUG...... UACCCG.AU.C..GA..UAAGCUUCCUUUG env-17 .GCCC.C...C..GGUG..CUACUUCCUG...CCCGA...... UCA...... 
ACGGG.AG.A..GA..UAAGAGCCUCGGU env-18 ..GCC.C...A..GGUG..CUACUUCCUG...CCCGA...... UAAA...... CCGGG.AG.A..GA..UAAGAGCCUCGGC env-19 GAGGU.GACCA..GGUG..CCAAUUCCAG...CUCGAGC...... GAAA...... GUUCGGGAAA.A..AA..UGAGAAGAGCAGU Pae-1-1 ..UGA.U...U..GGUG..CCAAUUCCAA...CCCGGAG...... AGCU...... GUCCGGGAAA.A..GA..UGAUGGUAUGUGC Cph-1-3 ..UGA.U...U..GGUG..CCAAUUCCAA...CCCGGACA...... AGCA...... GUGCGGG.GA.A..GA..UGAUUGUAUAUGC Cph-2-1 UAUGA.U...U..GGUG..CCAAUUCCAU...CCCGGAU...... UACU...... GGCCGGG.AG.A..GA..UGAUGGUAUGCAU Cli-1-1 .GUGA.U...U..GGUG..CCAAUUCCAU...CCCGGAC...... UGCG...... AGCCGGG.AG.A..GA..UGAUGGUAUGUCU Pph-1-2 AAUGA.G...C..GGUG..CCAAUUCCAU...CCCGGCA...... UAAA...... GACCGGG.AA.U..GA..UGAUGGUAUGCAU Cte-1-1 .AUGA.G...C..GGUG..CCAAAUCCAU...CCCGGA...... GGAA...... AUCCGGG.AA.A..GA..UGAUGUAUGCAUU Rxy-1-2 CCCGU.G...A..GGUG..CCAAUUCCAG...CAGGUCUCCC...... CGAG...... GGGGGCCUG.AA.A..GA..UGUGGGGGAGAGC Rxy-1-3 CGCGC.C...A..GGUG..CCAAUUCCCG...CGGAG...... GAGA...... CUCCG.AG.A..GA..UGAGCCGGCAGCC Gme-1-1 GUGAC.....A..GGUG..CUAAAUCCU...... GCC...... GGGA...... GGC...AA.A..GA..UGAGAACGGCGCU Gsu-1-1 CGGAC.G...CCAGGUG..CUAAAUCCUG...CCC...... GAAA...... GGG.AG.C..GA..UGAGAGGGAGCUU Dac-1-1 AUGAC.....A..GGUG..CUAAUUCCAC...CCCC...... GCAA...... GGGG.AC.A..GA..UGAGACGGCGUCU env-20 .GUCG.C...U..GGUG..CCAAAUCCGG...CAGAAGUUCUCCGGCAGGGUGGCCGGGAGA...... CAUUUCUG.GA.A..GA..UGAGGAAAUUCGC Gvi-1-1 GGGAA.A...A..GGUG..CCAAUUCCUG...CGGUUCC...... UCA...... CGAACCG.GA.A..GA..UAAGUCAGGCAUC Mth-1-1 GCCGG.A...A..GGUG..CCAAGUGCCG...CAGGA...... AACC...... UCCUG.GA.A..GA..UAAGUUGACUCUU Sth-1-1 GGAGA.A...C..GGUG..CCAAAUCCAG...CGGAC...... ACUC...... G.GUCCG.AG.A..GA..UGAAGCGUGCGCA >>>>>.....>..>>.>..>>...... <<<<<<...... >>>>>>...... >>..>>>>>>>>>.... ??2??.....?..20.1..22...... 22222?...... ?22222...... 21..0222?????...... >>>>...... 22??...... -R--- --GGUG--CYAA UCC R---C ------G-RR-A--GA--URAGR R qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa qa 30

Cac-1-1 UUGGA.G...A..UGUG..CUAAUUCCUA...CAGGU...... UUA...... UCCUG.AG.A..GA..UGAGAAUGUUUUU Lmo-1-1 ..GUU.....C..UAUG..CUAAUUCCGAU..CAGAAG...... UAAU...... AUUCUG.GC.A..GA..UAAGUAGUAGCUU Lmo-2-1 ..GUU.....C..UAUG..CUAAUUCCGAU..CAGAAG...... UAAU...... AUUCUG.GC.A..GA..UAAGUAGUAGCUU Lin-1-1 ..GUU.....C..UAUG..CUAAUUCCGAU..CAGAAG...... UAAU...... AUUCUG.GC.A..GA..UAAGUAGUAGCUU env-21 UCGGG.C...G..GGUG..CCAAGUCCGA...CCUCAUUCC...... GCAG...... CGGAUGGGG.AG.A..GA..UGAGAGUUCGAAA Cau-1-1 UACGA.A...C..GGUG..CUAAGUCCGG...CAGAG...... GUG...... UUCUG.GA.A..GA..UGAGCGACGUGGA Bcl-1-1 .GCGG.A...A..GGUG..CUAAUUCCAG...CAGGAC...... UAC...... GUCCUG.GG.A..GA..UAAGAGAUUGAAU env-22 CUAGA.A...A..GGUG..CUACAUCCUG...CUUAGUAC...... UUAA.AAACUUAUGUGCUGAG.AA.A..GA..UGAGUGCGAAGAG Oih-1-1 UUAGA.A...A..GGUG..CUACCUCCAG...CAAGAU...... GUAU...... GUCUUG.AA.A..GA..UAAGAGUCCAGAU Bcl-1-2 .GGCA.A...U..GGUG..CUAAUUCCAA...UAGGUA...... AUG...... UACCUA.GG.A..GA..UAAGAAGUUCGUU Oih-1-2 .AGGU.A...U..GGUG..CUAAUUCCAA...UAGGCU...... UACA...... AGCCUU.AA.A..GA..UAAGAAGAGCUAU Oih-1-3 GUAAC.A...U..GGUG..CUCAUUCCAG...CAAGC...... GUAG...... GCUUG.AU.A..GA..UGAGAAAAGUGUU Esi-1-4 .GCGG.A...A..GGUG..CUACUUCCUA...AAAGGA...... AUUU...... UUCCUUU.GA.A..GA..UAAGAGCAGACAU Gsu-1-2 .GGGG.A...A..GGUG..CCAAUUCCUG...CGAGACC...... GACA...... GGUUUCG.GG.A..GA..UAAGGAAGAGCGU Gme-1-2 .GGGG.A...C..GGUG..CCAAAUCCUG...CGGAACG...... GCAA...... CGUUCCG.GG.A..GA..UAAGGAAGAGCGU Cte-2-1 AUUAU.A...A..GGUG..CUAAAUCCUG...CGGU...... GUAA...... ACCG.AG.A..GA..UGAGGAUUUGAAG Sep-1-1 .UAGC.A...C..GGUG..CUAAUACCAA...CGAG...... CAA...... CUCG.AA.U..GA..UAAGUACGAUAAG Sau-1-1 ..AGC.A...C..GGUG..CUAAAACCAA...CGAG...... UUA...... CUCG.AA.U..GA..UAAGUAUAAAGAC Oih-1-4 ..GGA.G...U..GGUG..CUUCUUCCUG...CAGAA...... UUUU...... UUCUG.AA.A..GA..UAAGGUAAUGAUA Esi-1-5 ..GGA.A...C..GGUG..CCACAUCCUA...CAGAAU...... GUUU...... CAUUCUG.AG.A..GA..UAAGUUACGAAAU Bcl-1-3 .GGGG.A...A..GGUG..CCAAUUCCUG...CAGAAUG...... GGAU...... 
CAUUCUG.AA.A..GA..UAAGCCGGAAAAA Cpe-1-1 UCACU.A...C..GGUG..CCAAUUCCGG...UAAAGA...... AAU...... UCUUUA.CA.A..GA..UGAGAGAAGAUAA Cth-1-1 .UGUA.U...C..GGUG..CCAAUUCCUA...CAGGAUG...... UAAA...... AGUCCUG.AC.A..GA..UGAGGAUAAAAAA Oih-1-5 ..UGU.....C..UGUG..CUAAAUCCUG...CAAGC...... AAUA...... GCUUG.AA.A..GA..UAAGUUGAGGUUA Bce-4-1 ..AGU.A...C..UGUG..CUAAUUCCAG...CAAACGUA...... UGAA...... GCGUUUG.GA.A..GA..UGAGGGGAAAUGG Bce-1-1 ..AGU.A...C..UGUG..CUAAGUCCAG...CAAACGUA...... UGAA...... GCGUUUG.GA.A..GA..UGAGGGGAAAUGG Oih-1-6 AUUGA.A...A..GGUG..CUAAAUCCUA...CAGACU...... UCAU...... CGUCUG.GA.A..GA..UAAGAGGAGGUUC Lmo-1-2 ...UC.A...C..GGUG..CCAAAUCCAG...CAGG...... UAAC...... ACUG.AC.A..GA..UAAGGCACGCGAA Lmo-2-2 ...UC.A...C..GGUG..CCAAAUCCAG...CAGG...... UAAC...... ACUG.AC.A..GA..UAAGGCACGCGAA Lin-1-2 ...UC.A...C..GGUG..CCAAAUCCAG...CAGU...... AUC...... ACUG.AC.A..GA..UAAGGCACGCGAA Ban-3-1 GUGAA.U...A..GGUG..CUAAAACCUGUG.CGAGGC...... UACA...... GGUCUCG.AA.C..GA..UAAGAGCGAAGGG Bth-1-1 AGUGA.A..UA..GGUG..CUAAAACCUGUG.CGAGGC...... UACA...... GGUCUCG.AA.C..GA..UAAGAGCGAAGGG Bcl-1-4 GUGUA.A...A..GGUG..CUAA..CCUG...CAGAAUG...... CUCG...... GCGUUCUG.GA.A..GA..UAAGAGGCGAAAG Bha-1-1 GUGAA.A...A..GGUG..CUAA..CCUG...CAAGGC...... GUU...... GCCUUG.AA.A..GA..UAAGAGGCGAAAG Bli-1-1 CAACU.A...C..GGUG..CUAA..CCUGAUGCAAGGUGU...... UCAA...... UACCUUG.AG.C..GA..UAAGAGUGAAAGG Gka-1-3 GUGAA.A...U..GGUG..CUAA..CCUGUGGCAAGGC...... GCAG...... UCCUUG.AA.C..GA..UAAGAGUGAAAGG Oih-1-7 GUGAA.A...A..GGUG..CUAA..UCUGAUGCAAGGAU...... AAUA...... GUUCCUUG.AA.C..AA..UAAGAGCGAAAGG Bsu-1-1 CAACC.A...A..GGUG..CUAA..CCUGUUGCAAGGUU...... GUAU...... GAUUCCUUG.AG.C..GA..UAAGAGUGAAAGG Esi-1-6 AGGAA.A...A..GGUG..CU..CUCCUGAAGCGAAGU...... AAAC...... ACUUCG.AA.C..GA..UAAGAGGGUAAAG Oih-1-8 UAAGU.A...C..UGUG..CCAAUUCCAG...UAGCG...... UAAU...... UGCUA.GA.A..GA..UGAGAAGAGUAUA Esi-1-7 .GAGC.A...C..UGUG..CCAAUUCCAU...CAGAC...... AAUU...... GUCUG.AG.A..GA..UGAGUCGAGUGGA Cac-1-2 .ACUU.A...U..GGUG..CUAAUUCCAG...CAGGA...... UAUU...... 
UUCUG.AA.A..GA..UGAGGAGCGACUA
1 Fnu-1-1 ..UGU.G...U..GGUG..CUAAUUCCAG...... AG.A..GA..UGGAGAGGAAAAU
1 Fnu-2-1 ..UGU.G...U..GGUG..CUAAUUCCAG...... AG.A..GA..UGGAAAGGUCAAU
1 Fnu-1-2 ..UGU.G...U..GGUG..CUAAUUCCUG...... AU.A..GA..UGGAAAAGAUUAU
1 Fnu-2-2 AUUGU.G...U..GGUG..CUAAUUCCUG...... AU.A..GA..UGGAAAAGGUUAU
Sau-2-1 AAAGA.A...A..GGUG..CCAAA.CCGUUUGCAGACA...... AAUA...... GGUCUG.AA.C..GA..UAAGAGCGAAUGG
Sau-1-2 AAAGA.A...A..GGUG..CCAAA.CCGUUUGCAGACA...... AAUA...... GGUCUG.AA.C..GA..UAAGAGCGAAUGG
Sep-1-2 AAAGA.A...A..GGUG..CCAAA.CCGUUUGCAGAC.AA...... AUAU...... G.GUCUG.AA.C..GA..UAAGAGCGAAUGG
Gka-1-4 GCGGC.C...A..GGUG..CUAAAUCCAG...CGAAUUGAG...... CAUU...... CAAUUCG.GC.A..GA..UAAGAAGAAGCAU
Bcl-1-5 AUGAA.A...A..GGUG..CUAAAUCCAG...CAAAG...... GGAA...... CUUUG.GC.A..GA..UAAGGGGAUUCAU
Bcl-1-6 .UGGU.C...A..GGUG..CUAAAUCCAG...CAGU...... UUA...... UCUG.CA.A..GA..UAAGAGAAGCAUG
Lin-1-3 UGUUU.A...A..GGUG..CUAAGUCAUG...CAGAACAAC...... GAUU...... UGUUCUG.AA.A..GA..UGAGAAGGAAGUU
Lmo-1-3 UGUUU.A...A..GGUG..CUAAGUCAUG...CAGAACAAC...... UAAU...... UGUUCUG.AA.A..GA..UGAGAAGGAAGUU
Lmo-2-3 UGUUU.A...A..GGUG..CUAAGUCAUG...CAGAACAAC...... UAAU...... UGUUCUG.AA.A..GA..UGAGAAGGAAGUU
Bcl-1-7 AAACC.A...C..GGUG..CUACAUUCAG...CAGAGCAG...... CUUU...... UGUUCUG.AA.A..GA..UAAGUGAGGCGAA
Bha-1-2 AAACG.A...A..GGUG..CUAAUUUCAG...CAGAAUG...... AUUU...... CAUUCUG.GA.A..GA..UAAGCGAAGGCGA
Bce-4-2 CUGAA.U...A..GGUG..CUAAUUCCUG...CAAAAUG...... CAUU...... GCAUUUUG.AA.A..GA..UAAAACAUAACUA
Ban-3-2 CUGAA.U...A..GGUG..CUAAUUCCUG...CAAAAUG...... CAUU...... GCAUUUUG.AA.A..GA..UAAAACGUAACUA
Bce-1-2 CUGAA.U...A..GGUG..CUAAUUCCUG...CAAAAUG...... CAUU...... GCAUUUUG.AA.A..GA..UAAAACGUAACUA
>>>>>.....>..>>.>..>>...... <<<<<<...... >>>>>>...... >>..>>>>>>>>>....
??2??.....?..20.1..22...... 22222?...... ?22222...... 21..0222?????......
>>>>...... 22??......
-R--- --GGUG--CYAA UCC R---C ------G-RR-A--GA--URAGR R

Bce-2-1 CUGAA.U...A..GGUG..CUAAUUCCUG...CAAAAUG...... CAAU...... GCAUUUUG.GA.A..GA..UAAAACUCAACUA Lmo-2-4 .UUUC.A...C..GGUG..CUAAUUCCAG...CAGU...... AUAU...... UCUG.AA.A..GA..UAAGUCGGAAAUC Sth-1-2 CAGGC.A...A..GGUG..CCAAGUCCAG...CCGGGC...... AUA...... CUACCG.GG.A..GA..UAAGAGGGAAGAG Bth-1-2 UGGGC.A...C..GGUG..CUAAUUCUUA...CAACAC...... AUU...... GUGUUG.AA.A..GA..UAAGAGUAAUAUG Bce-2-2 UGGGC.A...C..GGUG..CUAAUUCUUA...CAACAC...... AUU...... GUGUUG.AA.A..GA..UAAGAGUAAUAUG Bsu-1-2 ..AGC.A...C..GGUG..CUAAUUCUUG...CAGCU...... AGC...... GGCUG.AG.A..GA..UAAGAUUCGGACG Bli-1-2 ..AGC.A...C..GGUG..CUAAUUCUUG...CAGC...... UGAC...... GCUG.AG.A..GA..UAAGGAUUCGAAC Gka-1-5 CAAGC.A...C..GGUG..CUAAUUCUUG...CAGCG...... GAAA...... CGCUG.AG.A..GA..UAAGAAGAGCGCC Bsu-1-3 ..UGU.A...A..GGUG..CUACUUCCAG...CAAAAUG...... AAUU...... CCAUUUUG.AA.A..GA..UAAGGGCUGCAUG Bli-1-3 .GCGU.A...A..GGUG..CUACUUCCAG...CAAAAUG...... CCCU...... CAUUUUG.AG.A..GA..UAAGGAAUAGCGG Bce-2-3 .AAGC.A...A..GGUG..CUAAAUCCAG...CAAAAUG...... GGAU...... CCAUUUUG.AA.A..GA..UAAGGUAAAAUAU Bce-3-1 .AAGC.A...A..GGUG..CUAAAUCCAG...CAAAAUG...... GAAU...... CCAUUUUG.AA.A..GA..UAAGGUAAAAUAU Ban-3-3 .AAGC.A...A..GGUG..CUAAAUCCAG...CAAAAUG...... GAAU...... CCAUUUUG.AA.A..GA..UAAGGUAAAAUAU Bli-1-4 ..UGC.A...A..GGUG..CUAAAUCCAG...CAGAAUG...... UCAG...... GCGCAUUCUG.AA.A..GA..UAAGUUGAACAAC Bha-1-3 GUGGU.C...A..GGUG..CUAAUUCCUG...CAAGCA...... UUAU...... UUGCUUG.AG.A..GA..UAAGAGGAAGCGA Ban-3-4 UUGAA.A...C..GGUG..CUAAUACCUG...CAAAAC...... GAAU...... GUUUUG.CA.U..AA..UAAGAGGAGGAAC Bce-1-3 UUGAA.A...C..GGUG..CUAAUACCUG...CAAAAC...... GAAU...... GUUUUG.CA.U..AA..UAAGAGGAGGAUC Tte-1-1 AGGAU.A...A..GGUG..CCAAUUCUCU...CAGAAGA...... UUUU...... UUCUUCUG.AA.A..GA..UGAGGGUAUGCCU Mth-1-2 CAGGU.A...C..GGUG..CCAAUUCCUG...CAGGA...... UUUC...... UCCUG.GC.A..GA..UGAGAGUUGAAAA Ban-3-5 UAGAC.A...C..GGUG..CUAAUUCUCG...CAGCA...... UUA...... CGCUG.AC.A..GA..UAAGGAGCUGGUU Bce-1-4 UAGAC.A...C..GGUG..CUAAUUCUCG...CAGCA...... UUA...... 
CGCUG.AC.A..GA..UAAGGAGCUGGUU Esi-1-8 ..GGC.A...C..GGUG..CUAAUUCUUG...CAGCU...... AUUA...... CGCUG.AC.A..GA..UAAGAGUCCACGC Cte-2-2 GUAUA.A...A..GGUG..CUAACUCCAG...CAGGA...... AAUU...... UCCUG.AA.A..GA..UAAGAAAAGUUUA Bth-1-3 AUAUA.A...A..GGUG..CUAAAUCCUG...UAGGAUA...... UAAA...... AGUCCUA.AU.A..GA..UAAGAAAAUGGGU Cac-1-3 ACAUA.A...U..GGUG..CCAAUUCCUG...CAGAA...... UUA...... UUCUG.CA.A..GA..UAAGAGAGAGAAU Cte-2-3 GUAUU.A...A..GGUG..CCAAUUCCUG...CAGA...... AAGU...... UCUG.CA.A..GA..UAAGAGGGCUGGC Bce-4-3 .CGGA.A...U..UGUG..CCAAAUCCUG...CAGG..UA...... GUAA...... U..CCUG.AA.A..GA..UAAGAAAGAGCCU Bce-2-4 .CGGA.A...U..UGUG..CCAAAUCCUG...CAGG..UA...... GUAA...... U..CCUG.AG.A..GA..UAAGAAAGAGCCU Bce-1-5 .CGGA.A...U..UGUG..CCAAAUCCUG...CAGGUA...... AUAA...... ACCCUG.AG.A..GA..UAAGAAAGAGCCU Ban-3-6 .CGGA.A...U..UGUG..CCAAAUCCUG...CAGGUA...... AUAA...... AUCCUG.AG.A..GA..UAAGAAAGAGCCU Cac-1-4 GAUGU.A...U..GGUG..UUAAUUCCUG...CAAAG...... UUAA...... UUUUG.AG.A..GA..UAAGAGGAUUAUA Bce-1-6 .ACGA.A...A..GGUG..CCAAAUCCUG...CAGGUGA...... AGAA...... ACACCUG.AA.A..GA..UAAGAGCGGUUCA Esi-1-9 CCGAA.G...U..GGUG..CUAAUUCCAG...CAGAC...... GAUU...... GUCUG.CA.A..GA..UGAGAGCAAAUGG Gka-1-6 AGGGC.A...C..GGUG..CUAAGUCCAA...CAGAAAGACCG...... AUGU...... CUUUCUG.AA.A..GA..UAAGAGGCGCGAA Esi-1-10 ..GGC.A...C..GGUG..CUACAUCCAA...CAGAC...... AUUC...... GUCUG.AG.A..GA..UAAGAGGACGGAA Bcl-1-8 .CUUG.A...A..GGUG..CUAAAUCCUG...CAAAGC...... AUAU...... GGGCUUUG.GG.A..GA..UGAGAGGGAAGCA Tte-1-2 ..GGC.A...U..GGUG..CCAAUUCCUG...CAGCG...... GUUU...... CGCUG.AA.A..GA..UGAGAGAUUCUUG Tte-1-3 .GUGC.U...U..GGUG..CCAAUUCCUG...CAGGUUGGG...... GUUA...... CCCAGCCUG.AG.A..GA..UGAGAGGAGAGGC Dha-1-1 .AUGC.A...C..GGUG..CUAAAUCCUG...CAGG...... AAAU...... ACUG.GG.A..GA..UGAGAGGAAACCC Lin-1-4 AGUGA.A...A..GGUG..CUAA..UCUGUUGCAGGAGU...... AAUA...... UCUCCUG.AA.C..GA..UGAGAGCAAAGGU Lmo-2-5 AGUGA.A...A..GGUG..CUAA..UCUGUUGCAGGAGU...... AUUA...... UCUUCUG.AA.C..GA..UGAGAGCAAAGGU Oih-1-9 AAUGA.A...A..GGUG..CCAAUUCCUG...CAG...... AAA...... 
AUG.AA.A..GA..UGAGAGAACGUCA Bha-1-4 UGGAA.A...A..GGUG..CUAAUUCCUG...CAAAGC...... GAU...... GCUUUG.AG.A..GA..UGAGAGAAGGGAA Gka-1-7 CUGAA.A...U..GGUG..CCAAUUCACA...CAAAGC...... GGCC...... UGCUUUG.AG.A..GA..UAAGAGACGGAAU Bli-1-5 UUGAA.A...U..GGUG..CCAAUUCACG...CGAAGC...... GUUA...... UGCUUUG.AA.A..GA..UGAGAGAAAGGCC Bsu-1-4 UUGAA.A...U..GGUG..CCAAUUCACA...CGAAGC...... GUUC...... AGCUUUG.AA.A..GA..UGAGAGAAAGGCA Bce-1-7 UUGAA.A...U..GGUG..CCAAUUCCUG...CAAAGC...... AAAU...... GCUUUG.AG.A..GA..UGAGAGAGAGGGA Ban-3-7 UUGAA.A...U..GGUG..CCAAUUCCUG...CAAAGC...... AAAU...... GCUUUG.AG.A..GA..UGAGAGAGAGGGA Sau-1-3 .AGAA.A...U..GGUG..CCAAUUCACA...UAAAGU...... UUUA...... ACUUUU.GA.A..GA..UGAGAGAAACAAU Sep-1-3 .AGAA.A...U..GGUG..CCAAUUCACA...UAAAGU...... AUA...... ACUUUA.GA.A..GA..UGAGAGAAAGAAC Oih-1-10 AUGAA.U...A..GGUG..CUAAAUCCUG...CAAAAUACG...... GACA...... GUAUUUUG.AG.A..AA..UAAGAGAGGUGAU Cac-1-5 ...GU.A...C..GGUG..UUAAUUCCUG...CAAAACU...... UAUU...... UGUUUUG.AA.A..GA..UAAGAAAACAGCU Bce-1-8 AAGGC.A...C..GGUG..CUAAUUCCAG...CAGAAAG...... UAAA...... ACUUUCUG.GC.A..GA..UAAGAGGGGAGAA Bce-2-5 AAGGC.A...C..GGUG..CUAAUUCCAG...CAGAAAG...... UAAA...... ACUUUCUG.GC.A..GA..UAAGAGGGGAGAA Bth-1-4 AAGGC.A...C..GGUG..CUAAUUCCAG...CAGAAAG...... UAAA...... ACUUUCUG.GC.A..GA..UAAGAGGGGAGAA Bce-3-2 AAGGC.A...C..GGUG..CUAAUUCCAG...CAGAAAG...... UAAA...... ACUUUCUG.GC.A..GA..UAAGAGGGGAGAA Ban-3-8 AAGGC.A...C..GGUG..CUAAUUCCAG...CAGAAAG...... UAAA...... ACUUUCUG.GC.A..GA..UAAGAGGGGAGAA Bli-1-6 ACAGA.A...U..GGUG..CUAAAUCCUU...AAGAGCA...... UGUU...... CGUGCUCUU.GA.A..GA..UAAGGAGGAGAUU Bsu-1-5 .CAGA.A...U..GGUG..CUAAAUCCUU...UAGAGCAA...... UGAU...... UGCUCUU.GA.A..GA..UAAGGUUGAGAUU Bsu-1-6 ACAGA.A...U..GGUG..CUAAAUCCUU...AAGAAC...... AUUG...... CGUUCUU.GC.A..GA..UGAGGCGGAGAUU >>>>>.....>..>>.>..>>...... <<<<<<...... >>>>>>...... >>..>>>>>>>>>.... ??2??.....?..20.1..22...... 22222?...... ?22222...... 21..0222?????...... >>>>...... 22??...... 
-R--- --GGUG--CYAA UCC R---C ------G-RR-A--GA--URAGR R

Bha-1-5 .AAAG.A...A..GGUG..CCAAUUCCAG...CAGAAC...... AUGA...... UGUUCUG.AA.A..GA..UAAGAAGCGAACG Bth-1-5 ...GC.A...... GGUG..CUAAUUCCAG...CAGAACA...... UAUU...... GGUUCUG.GG.A..GA..UAAGACGAAGAUA Ban-3-9 ...GC.A...... GGUG..CUAAUUCCAG...CAGAACA...... AAUU...... UGUUCUG.GG.A..GA..UAAGACGAAGAUA Bth-2-1 ...GC.A...... GGUG..CUAAUUCCAG...CAGAACA...... UAUU...... UGUUCUG.GG.A..GA..UAAGACGAAGAUA Bce-1-9 AUAGG.A...A..GGUG..CUAAUUCCG....CAGAGAACACG...... AUGU...... GUUUUUUG.GA.A..GA..UAAGAGGAUUCUU Bth-1-6 AUAGG.A...A..GGUG..CUAAUUCCG....CAGAGAACACG...... UUGU...... GUUUUUUG.GA.A..GA..UAAGAGGAUUCUU Ban-3-10 AUAGG.A...A..GGUG..CUAAUUCCG....CAGAGAACACG...... UUGU...... GUUUUUUG.GA.A..GA..UGAGAGGAUUCUU Bce-4-4 AUAGG.A...A..GGUG..CUAAUUCC.G...CAGAGAACACG...... AUGU...... GUUUUUUG.GA.A..GA..UAAGAGGAUUCUU Bce-2-6 AUAGG.A...A..GGUG..CUAAUUCC.G...CAGAGGACACG...... AUGU...... GUUUUUUG.AA.A..GA..UAAGAGGAUUCUU Lmo-2-6 AAGUG.A...A..GGUG..CUAAUUCCAG...CAAAAUGGU...... GUAU...... UCCAUUUUG.GU.A..GA..UAAGAGGAGCUGG Lmo-3-1 AAGUG.A...A..GGUG..CUAAUUCCAG...CAAAAUGGU...... GUAU...... UCCAUUUUG.GU.A..GA..UAAGAGGAGCUGG Lmo-1-4 AAGUG.A...A..GGUG..CUAAUUCCAG...CAAAAUGGU...... GUAU...... UCCGUUUUG.GU.A..GA..UAAGAGGAGCUGG Lin-1-5 AAGUG.A...A..GGUG..CUAAUUCCAG...CAAAAUGGU...... GUAU...... UCCGUUUUG.GU.A..GA..UAAGAGGAGCUGG Ban-2-1 AAGCU.A...A..GGUG..CUAAUUCCUG...CAAAAUGAG...... UUUU...... CGUUUUG.GA.A..GA..UAAGAGAGGAUCC Bce-1-10 AAGCU.A...A..GGUG..CUAAUUCCUG...CAAAACG...... AGUU...... CUCGUUUUG.GA.A..GA..UAAGAGAGGAAUC Ban-3-11 AAGCU.A...A..GGUG..CUAAUUCCUG...CAAAAUGAG...... UUUU...... CGUUUUG.GA.A..GA..UAAGAGAGGAUCC Bce-2-7 AAGCU.A...A..GGUG..CUAAUUCCUG...CAAAACGAG...... UUUU...... CGUUUUG.GA.A..GA..UAAGAGAGGAAUC Bce-3-3 AAGCU.A...A..GGUG..CUAAUUCCUG...CAAAACGAG...... UUUU...... CGUUUUG.GA.A..GA..UAAGAGAGGAACC Oih-1-11 AGAAU.A...C..UGUG..CCAAUUCCAU...CAAGCA...... AAU...... UGCUUG.AA.A..GA..UAAGAGUAGAAUA Oih-1-12 UGAAU.A...C..UGUG..CCAAAUCCAU...CAAGUA...... UUCU...... 
AUGCUUG.GU.A..GA..UAAGAGAAGUCGG Bce-1-11 ..AAU.A...C..CGUG..CUAACUCCAG...CAAGCCUA...... UGAA...... AGGCUUG.GA.A..GA..UGAGAAGAUGUGA Bce-3-4 ..AAU.A...C..CGUG..CUAACUCCAG...CAAGCCU...... AUAA...... GGGCUUG.GA.A..GA..UGAGAAGAUGUGA Ban-3-12 ..AAU.A...C..CGUG..CUAACUCCAG...CAAGCCAU...... AUAA...... AGGCUUG.GA.A..GA..UGAGAAGAUGUGA Ban-1-1 ..AAU.A...C..CGUG..CUAACUCCAG...CAAGCCAU...... AUAA...... AGGCUUG.GA.A..GA..UGAGAAGAUGUGA Ban-3-13 GGAAC.A...C..CGUG..CUAAUUCCAG...CAAGC...... AAG...... UCUUG.AA.A..GA..UAAGUGAUGGGCC Bth-1-7 UGAAC.A...C..CGUG..CUAAUUCCAG...CAAGC...... AAG...... UCUUG.AA.A..GA..UAAGUGAUGGGCC Esi-1-11 .UCGU.A...C..UGUG..CCAAAUCCAG...CAAGC...... UGCG...... GCUUG.AG.A..GA..UAAGAAGAGCGUC Sep-1-4 ...... A..UGUG..CCAAUUCCAG...UAACCG...... AGA...... AGGUUA.GA.A..GA..UAAGGUUAAACAC Sau-2-2 ...... A..UGUG..CCAAUUCCAG...UAACCG...... UAA...... UGGUUU.GA.A..GA..UAAGCAGGUAAAG Gka-1-8 .GGCC.A...A..GGUG..CUAAAUCCAGA..CAGGC...... GGAA...... GCCUG.GA.A..GA..UAAGAAGAAGCGA Bli-1-7 GCACC.A...A..GGUG..CUAAAUCCAG...CAAGC...... GGAU...... GCUUG.GA.A..GA..UAAGAAGAAGCGA Bli-1-8 UGACC.A...A..GGUG..CUAAAUCCAG...CAAGCA...... GCC...... UGCUUG.GA.A..GA..UAAGAAGACGGAC Bsu-1-7 ..ACC.A...A..GGUG..CUAAAUCCAG...CAAGCUC...... GAA...... CAGCUUG.GA.A..GA..UAAGAAGAGACAA Bsu-1-8 GCGCC.A...A..GGUG..CUAAAUCCAG...CAAGCGU...... UUUU...... UAUGCUUG.GA.A..GA..UAAGAAGAAGCGU Bce-2-8 .GAAU.A...C..UGUG..CCACUUCCUG...CAAGC...... UUUA...... U.GCUUG.AA.A..GA..UAGAAUGAGGGAC Bce-4-5 .GAAU.A...C..UGUG..CCACCUCCUG...CAAGC...... UUUG...... U.GCUUG.AA.A..GA..UAGAAUGAGGGAC Bce-1-12 UGAAU.A...C..UGUG..CCACUUCCUG...CAAGCU...... UUAU...... AGCUUG.AA.A..GA..UAGAAUGAGGGAC Ban-3-14 UGAAU.A...C..UGUG..CCACUUCCUG...CAAGCU...... UUAU...... AGCUUG.AA.A..GA..UAGAAUGAGGGAC Oih-1-13 AGAAU.A...C..UGUG..CCAAUUCCUG...CAAAUGC...... AAAC...... GAGCAUUUG.AA.A..GA..UGAGAAACGAUGG Bsu-1-9 ...... UGUG..CCAAUUCCAG...CAAGC...... GCUA...... GCUUG.AA.A..GA..UAGGAAAGCAAGG Bce-4-6 .GAAU.A...C..UGUG..CCAAUUCCAG...CAAG...... GUAA...... 
CUUG.AA.A..GA..UAAGAAAGAAGCU Ban-3-15 .GAAU.A...C..UGUG..CCAAUUCCAG...CAAG...... GUAA...... CUUG.AA.A..GA..UAAGAAAGAAGCU Bth-1-8 .GAAU.A...C..UGUG..CCAAUUCCAG...CAAG...... GUAA...... CUUG.AA.A..GA..UAAGAAAGAAGCU Bli-1-9 AGAAU.A...C..UGUG..CCACAUCCAA...CAAGCC...... GUAC...... GGGCUUG.GA.A..GA..UAAGAAGAGAGCG Ban-3-16 AGAGU.G...C..UGUG..CCAAAUCCAG...CAAGC...... AUGU...... GCUUG.AA.A..GA..UGAGAAGAGCGUU Bce-2-9 AGAGU.G...C..UGUG..CCAAAUCCAG...CAAGC...... GUGU...... GCUUG.AA.A..GA..UGAGAAGAGUGUU Cte-2-4 AUUUU.A...A..GGUG..CCAAUUCCAG...CAGGU...... GAA...... ACCUG.AC.A..GA..UAAGACGUAGAGG Ban-3-17 UUAAU.A...A..GGUG..CUAAUUCCAG...CAAAUUG...... CGAA...... AAAUUUG.AC.A..GA..UGAGAAGAAGACU Bce-2-10 UUAAU.A...A..GGUG..CUAAUUCCAG...CAAAUUG...... UGAA...... AGAUUUG.AC.A..GA..UGAGAAGAAGACU Bth-1-9 UUAAU.A...A..GGUG..CUAAUUCCAG...CAAAUUG...... UGAA...... AGAUUUG.AC.A..GA..UGAGAAGAAGACU Bce-4-7 UUAAU.A...A..GGUG..CUAAUUCCAG...CAAAUUG...... UGAA...... CGAUUUG.AC.A..GA..UGAGAAGAAGACU Cac-1-6 UAUAC.A...A..GGUG..CUAAUUCCUG...CAGC...... GCUA...... GCUG.AG.A..GA..UGAGAAUAUAAAU Cac-1-7 UAUAU.G...U..GGUG..CUAAAUCCUG...CAGC...... AAAC...... GCUG.AU.A..GA..UGAGAAUAAUCGC Cpe-1-2 UUAUA.G...C..GGUG..CUAAAUCCUG...CGGU...... AGAA...... ACUG.AG.A..GA..UAAGAAAGAGAGU Lmo-2-7 UUGAA.A...A..GGUG..CUAAAUCCUG...CGAAGU...... GUGA...... UGCUUCG.AG.A..GA..UAAGAGAGACUUA Lmo-1-5 UUGAA.A...A..GGUG..CUAAAUCCUG...CGAAGU...... GUGA...... UGCUUCG.AG.A..GA..UAAGAGAGACUUA Bli-1-10 AGGGC.A...C..GGUG..CUAAUUCCAU...CAGACU...... UUGA...... AUUCUG.AG.A..GA..UAAGAGAGGCGUA Bsu-1-10 AAGGC.A...C..GGUG..CUAAUUCCAUU..CAGAU...... CUG...... AUCUG.AG.A..GA..UAAGAGAGGCGGA Bsu-1-11 AAGGC.A...C..GGUG..CUAAUUCCAU...CAGAU...... UGU...... GUCUG.AG.A..GA..UGAGAGAGGCAGU Mth-1-3 GGCGA.C...A..GGUG..CCAAUUCCUG...CAGGGC...... AGAA...... GGCCCUG.AG.A..GA..UAAGGGGGGUAAG >>>>>.....>..>>.>..>>...... <<<<<<...... >>>>>>...... >>..>>>>>>>>>.... ??2??.....?..20.1..22...... 22222?...... ?22222...... 21..0222?????...... >>>>...... 22??...... 
-R--- --GGUG--CYAA UCC R---C ------G-RR-A--GA--URAGR R

Sth-1-3 GUGCA.G...C..GGUG..CCAAUUCCUG...GGGAGGG...... CUCC...... GCCCUCCC.GA.A..GA..UGAGAGGCAGCCG
Sth-1-4 GUCCG.A...A..GGUG..CCAACUCCAC...CGGCGGC...... CUCA...... CGCCGCCG.GA.A..GA..UAAGAGAGCGGAC
env-23 ...... U..GGUG..CCAAUUCCUGAU.UUCCA...... AUG...... UGGAA.UA.A..GA..UAAGUUAAAAGAG
env-24 UGGAA.A...AUCGGUG..CUAAAUCCUA...UUCGCU...... UUA...... AGCGGA.AA.A..GA..UAAGUAGAAAAAU
1 Chu-1-2 ...GA.A...C..GGUG..CUAAA.CCUG...... CCC...... AA.A..AA..UAAAAGGGAACUA
Chu-1-3 CCAGA.A...A..GGUG..CUAAUUCCUGA..CCUGA...... AUAA...... ACAGG.GC.A..UA..UAAAUUAAUUGCA
>>>>>.....>..>>.>..>>...... <<<<<<...... >>>>>>...... >>..>>>>>>>>>....
??2??.....?..20.1..22...... 22222?...... ?22222...... 21..0222?????......
>>>>...... 22??......
-R--- --GGUG--CYAA UCC R---C ------G-RR-A--GA--URAGR R

References

[1] J.E. Barrick and R.R. Breaker. The distributions, mechanisms, and structures of metabolite-binding riboswitches. Genome Biol., 8(11):R239, 2007.

[2] Beile Gao, Ragi Paramanathan, and Radhey S Gupta. Signature proteins that are distinctive characteristics of Actinobacteria and their subgroups. Antonie Van Leeuwenhoek, 90(1):69–91, Jul 2006.

[3] David A. Hopwood. New data on the linkage map of Streptomyces coelicolor. Genet. Res., 6:248–262, 1965.

[4] Elizabeth E. Regulski, Ryan H. Moy, Zasha Weinberg, Jeffrey E. Barrick, Zizhen Yao, Walter L. Ruzzo, and Ronald R. Breaker. A widespread riboswitch candidate that controls bacterial genes involved in molybdenum cofactor metabolism, 2007 (submitted).

[5] Zasha Weinberg, Jeffrey E Barrick, Zizhen Yao, Adam Roth, Jane N Kim, Jeremy Gore, Joy Xin Wang, Elaine R Lee, Kirsten F Block, Narasimhan Sudarsan, Shane Neph, Martin Tompa, Walter L Ruzzo, and Ronald R Breaker. Identification of 22 candidate structured RNAs in bacteria using the CMfinder comparative genomics pipeline. Nucleic Acids Res., 35(14):4809–4819, 2007.

[6] W.C. Winkler, A. Nahvi, N. Sudarsan, J.E. Barrick, and R.R. Breaker. An mRNA structure that controls gene expression by binding S-adenosylmethionine. Nat. Struct. Biol., 10:701–707, Sep 2003.