<<

bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Preprint bioRxiv

Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea Claire Toffano-Nioche, Daniel Gautheret and Fabrice Leclerc*

Abstract A structural and functional classification of H/ACA and H/ACA-like motifs is proposed from the analysis of the H/ACA guide RNAs which have been identified previously in the genomes of Euryarchaea (Pyrococcus) and Crenarchaea (). A unified structure/function model is proposed based on the common structural determinants shared by H/ACA and H/ACA-like motifs in both Euryarchaea and Crenarchaea. Using a computational approach, structural and energetic rules for the guide-target RNA-RNA interactions are derived from structural and functional data on the H/ACA RNP particles. H/ACA(-like) motifs found in Pyrococcus are evaluated through the classification and their biological relevance is discussed. Extra-ribosomal targets found in both Pyrococcus and Pyrobaculum are presented as testable candidates which might support the hypothesis of a gene regulation mediated by H/ACA(-like) guide RNAs. Keywords RNomics, , H/ACA motif, pseudouridylation, RNA structure, RNA modification, RNP Dept. of Genome Biology, Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Universite´ Paris Sud, 91405, Orsay, France *Corresponding author: [email protected]

Introduction focused on the H/ACA RNAs and RNPs from P. abyssi [16, 17, 4, 18, 19, 5, 20, 21] and P. furiosus [3, 22, 23, 24, 25, The H/ACA guide RNAs are part of a RNP machinery includ- 26, 5, 27, 28, 29]. These studies describe the structure/func- ing several (L7Ae, Cbf5, Nop10 and Gar1) which tion relationships for this RNA guide machinery associated catalyzes the uridine-to-pseudouridine isomerization. Ho- with the RNA fold, the RNA:RNA interactions between the mologs of the snoRNA H/ACA found in Eukarya, they are guide and its target(s) and the RNA: contacts (between found in a widespread number of Archaea. This family of L7Ae and the K-turn or K-loop motif, between Cbf5 and the small guide RNAs was detected both in Euryarchaea (Ar- ACA box, etc.). More recently, the discovery of ’atypical’ chaeoglobus fulgidus [1], Haloferax volcanii [2], Pyrococcus H/ACA motifs in the Pyrobaculum has suggested an abyssi [3, 4, 5]), Crenarchaea (Sulfolobus solfataricus [6]) and alternate way for this machinery to assemble and achieve its Nanoarchaea (Nanoarchaeum equitans [7]). Computational function [30]. These non-canonical H/ACA motifs, further screens for H/ACA RNAs and their potential targets in the designated as H/ACA-like motifs, are detected specifically archaeal genomes suggest they are present in a variable num- in this genus of Crenarchaea and differ from the canonical ber of copies among all the archaeal phyla [8, 5]. The natural H/ACA motifs in that the lower stem is truncated. They in- targets are ribosomal RNAs but other RNAs might be targeted clude two long free single-stranded regions at both 5’ and 3’ with functional implications [9, 10]. In higher eukaryotes, ends forming a pseudo-internal loop and the generic hairpin snRNAs are additional targets [11]; the U2 snRNA is one with an embedded K-turn or K-loop motif. In Pyrobaculum, particular target which is also modified in S. cerevisiae by two more canonical H/ACA motifs are also found (sR201 an H/ACA guide RNA under particular conditions [12]. The and sR202) among the ’atypical’ H/ACA motifs. These two artificial modification of mRNAs by the H/ACA guide RNP particular guide RNAs exhibit a lower stem (5 to 6 base pairs) machinery was shown to suppress nonsense codons by target- although slightly shorter than that of the canonical H/ACA ing the uridine present as the first letter of stop codons [13]. motifs found in Pyrococcus (8 to 9 base pairs) while the other On the other hand, snoRNAs have also been involved in un- eight guide RNAs are fully truncated with no lower stem. usual roles: in alternative splicing in the case of the C/D box From a structural point of view, one may then consider that guide RNA HBII-52 [14] or as one of the RNA biomarkers sR201 and sR202 do correspond to canonical H/ACA motifs. for non-small-cell lung cancer in the case of the H/ACA guide Actually, the first annotated ’atypical’ H/ACA motif sR201 RNA snoRA42 [15]. was identified previously as a canonical H/ACA motif in P. In archaea, extensive structural and functional studies have aerophylum [19, 31, 32]. bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 2/146

In this work, the structure/function relationships of H/ACA single strand region exceeding 10 nucleotides (Pab40.2 and RNAs are revisited from the structural and functional data on Pab19), were omitted. The 2D structure representations were the canonical H/ACA guide RNAs of Pyrococcus. With 7 generated from the STOCKHOLM alignments using the R2R corresponding to 11 H/ACA RNA motifs in its genome, program [36]. The structure representations of the guide- Pyrococcus is one of the archaeal genus with so many genes target RNA pairs are also generated using the R2R program. for this class of sRNA. As shown before [5], one given H/ACA The corresponding pairing energies are calculated using the guide RNA may potentially target several positions in 16S or RNAsnoop utility [37] from the Vienna Package (version 23S rRNAs and one given position may happen to be a poten- 2.0) [38]. In the analysis of the energy contributions to the tial target for two different H/ACA guide RNAs. However, duplex stability, we refer to the 5’ and 3’ duplex as defined in there is only a unique pair between a target and its associ- RNAsnoop. ated guide RNA, i.e. only a few of the potential targeted In the case of the H/ACA-like motifs from Pyrobaculum positions are actually modified (“true targets”) while others and other new H/ACA(-like) motifs (from Pyrococcus), the are not modified (“false targets”). Vice versa, a position po- alignments were obtained from the UCSC genome browser tentially targeted by two guide RNAs is actually modified in MAF format [39, 40] and then converted into an aligned by only one H/ACA guide RNA (“productive”). Here, we FASTA and STOCKHOLM formats. Finally, R2R [36] deco- propose a structural and functional classification that gives a rators were added to generate the 2D structure representations. rational for a “true target” and its associated “productive” The 2D representations of hybrid guide-target RNAs were H/ACA guide RNA or for a “false target” and its associ- generated using R2R with specific options and decorators; ated “non-productive” H/ACA guide RNA. The classification, some detailed examples to reproduce this kind of represen- based on the 2D folds of guide RNAs and the stability of the tations are provided elsewhere [41]. The structural analysis guide-target RNA:RNA duplexes, is intended to discriminate of the H/ACA sRNAs was performed using X3DNA (version between these “productive”/“non-productive” H/ACA com- 2.1) [42] and Curves+[43]. plexes; the “productive” H/ACA complexes are expected to lead to the pseudouridylation of some RNA target(s) (rRNA, 1.3 Computational RNomics tRNA or other RNA) while the non-productive ones may even- tually play some alternative role. A multi-step workflow is used to search for possible H/ACA- The consistency of the classification is tested for all the like motifs in the genome of P. abyssi [46, 47] (Fig. 1). The archaea (Archaeoglobus, Haloferax, Holoarcula, Sulfolobus, H/ACA-like motifs are identified using the descriptor-based Aeropyrum, etc.) where some H/ACA guide RNAs and their approach implemented in RNAMotif [44]. Based on the 2D respective target(s) were characterized. The H/ACA-like mo- structure of the H/ACA(-like) motifs identified in Pyrobac- tifs, found in Pyrobaculum, are also examined to determine ulum [30], six families are defined to consider all possible whether the same classification may apply in spite of their candidates with different stem lengths and bulge positions structural specificities. The structural and functional relevance (Supplementary data: Listings 1 to 12). The two canonical of the H/ACA-like motifs proposed recently in P. abyssi [33] H/ACA motifs Pae sR201 & sR202 are defined using a unique is also evaluated according to the classfication. descriptor. The following sRNAs: Pae sR204, Pae sR208, Pae sR209, Pae sR210 are also defined by a common descriptor with a bulge nucleotide one downstream (3’ end) 1. Materials and Methods from the two GA pairs of the K-turn motif. Each of the other 1.1 RNA-seq Data sRNAs (Pae sR203, Pae sR205, Pae sR206, Pae sR207) is The full RNA-seq data on P. abyssi are available from the defined by a unique descriptor. The substitution between K- work published recently [33] and from its supplementary turn and K-loop motifs is allowed: two descriptors are used material (http://rna.igmors.u-psud.fr/suppl_ for each of the six families of sRNAs. All the descriptors are data/Pyro). provided (as text files) in the supplemental material. A series of S-MART scripts [45] is used to extract the 1.2 Structural alignments and representations H/ACA(-like) motifs that lie in the non-coding transcrip- Most of the sequences of the H/ACA motifs from P. abyssi tome of P. abyssi. A minimum overlap of 50 nucleotides are available in RFAM [34, 35]: Pab21 or snoR9 (RFAM ID: is required to filter out the initial hits. The sequence and RF00065), Pab35 or HgcF (RFAM ID: RF00058), Pab40 or motif conservation are verified using the UCSC archaeal HgcG (RFAM ID: RF00064), Pab105 or HgcE (RFAM ID: genome browser [39, 40]. A combination of two softwares: RF00060), Pab19, Pa160, and Pab91 [5]. The STOCKHOLM locaRNA [48] and RNAalifold [49] are used to identify opti- alignments were edited to split the RNA genes with multiple mal and sub-optimal 2D structures which are consistent with motifs (Pab35: Pab35.1, Pab35.2, Pab35.3; Pab105: Pab105.1, the H/ACA(-like) fold. The search for associated targets is Pab105.2; Pab40: Pab40.1, Pab40.2) into their corresponding performed using the RIsearch program [50], using the rec- unique motifs. All the individual H/ACA motifs were then ommended extension penalty (“-d 30”). The potential RNA merged to generate a full alignment. For the sake of graphical targets are identified by searching through the whole genomes representation, the motifs with a large internal loop, with a matching sequences that can associate with a pseudo-guide bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 3/146

K-loop hits K-turn hits P. abyssi genome 3000 2000 1000 0 0 50 100 150

H/ACA(-like) motifs screen

overlap with RNAseq

sequence conservation

motif H/ACA(-like) RNAs conservation (P. abyssi & P. aerophilum) no other ncRNA presence of candidates target(s) no H/ACA(-like) candidates overlap with RNAseq

target H/ACA(-like) guide:target conservation candidates

Figure 1. Workflow for the identification of H/ACA(-like) motifs and potential targets in Pyrococcus abyssi and . The genome is initially screened for the presence of H/ACA-like motifs using a descriptor-based approach (RNAMotif [44]). The number of hits is given at each step of the workflow as an indicative value for the motifs carrying either a K-turn or K-loop submotif. The first filter is the selection of motifs in the non-coding transcriptome (S-MART scripts [33, 45]), then the selection of conserved sequences and motifs among other archaea (UCSC archaea genome browser [40, 39]) and finally the selection of motifs that have potential target(s) in the genome which are themselves transcribed and conserved among other archaea. bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 4/146 sequence containing the 5’ and 3’ duplex elements from the like motifs discovered in Pyrobaculum, this constraint can internal loop of the guide RNA. This pseudo-guide sequence vary between 9 and 12 base-pairs [30]. When searching for is obtained by merging the 5’ and 3’ duplex elements from the new H/ACA(-like) motifs in the archaeal genomes, a very internal loop of the guide RNA separated by a di-nucleotide loose constraint is generally used for this distance: between spacer equivalent to the di-nucleotide spacer containing the 5 and 12 nucleotides [55, 5, 8, 19]. In all the 3D structures targeted U position. In order to impose a 5’-UN-3’ sequence of the full H/ACA RNP complexes (L7Ae, Nop10, Cbf5, constraint in the target RNAs, the di-nucleotide spacers in and Gar1) including or not a target RNA (PDB IDs: 2HVY, the pseudo-guide sequences are defined by the complemen- 3LWO, 3HAX) [23, 27, 28], this distance constraint is exactly tary element: 5’-NA-3’. The target RNA candidates are se- 10 base-pairs: it gives the proper position and orientation of lected based upon a RIsearch energy score of 16 kcal/mol the K-turn or K-loop motif within the RNP complex for L7Ae or lower which corresponds to the less favorable energy score to bind the guide RNA and interact with Nop10. The only obtained for validated guide-pair targets (with a deviation of exception is the case of the incomplete H/ACA RNP complex 1 kcal/mol). Only target sequences with at most two bulge reconstituted with a composite guide RNA derived from Pf9 nucleotides are tolerated but no bulge nucleotide is allowed on (Pa160 ortholog) which exhibits a 9 base-pairs constraint the 5’ duplex (5’ end of the paired target sequence) upstream (PDB ID: 2RFK) [26]. Interestingly, the L7Ae protein is from the predicted modified position. missing in this particular H/ACA RNP complex. To determine After a guide-target pair is identified, two additional filters whether these constraints apply to all the known “productive” are applied: the presence of the target sequence in a region H/ACA guide RNAs, we are revisiting the structure/function of the genome of P. abyssi which is expressed [33], the con- relationships of the archaeal H/ACA and H/ACA-like motifs. servation of the target in both P. abyssi and P. aerophilum whether it is present in homologous genes (identified using 2.2 New Structural and Functional Classification of the BBH approach [51]) or in genes which are functionally re- H/ACA RNAs in Pyrococcus lated. First, a functional annotation analysis using the DAVID bioinformatics resources [52, 53] is carried out to evaluate The sequence and structure analysis of H/ACA guide RNAs the enrichment in genes and family of genes targeted in both from P. abyssi reveals the existence of different structural genomes. Then, the targeted genes that belong to the same families which can all fit into a common structure/function functional category are analyzed and characterized using the model consistent with a so-called ‘10/n/9’ fold (Fig. 2) where tools described above for the representations and energy cal- the numbers indicate a distance in base-pairs: (a) from the culations of the guide-target pairs. internal loop to the K-turn or K-loop motif and (b) from the internal loop to the ANA box (which is generally equivalent to the length of the lower stem), respectively. The 2D consensus 2. Results and Discussion structure (Fig. 2(a)A), represented from a STOCKHOLM 2.1 Current Structure/Function Model of H/ACA mo- alignment (Supplementary Listing 13) is consistent with the tifs 3D structure determined for a chimeral H/ACA RNA from The H/ACA motif is well described as a stem-loop-stem motif P. furiosus (Fig. 2(a)B, PDB ID: 3LWO) [28] (patched from closed by an apical loop and terminated by an ACA box at the Pf9 and Pf6 RNAs from P. furiosus that are orthologs of the 3’ end [54]. In archaea, the L7Ae ribosomal protein is Pa160 and Pa35.2 from P. abyssi). The main features of the part of the H/ACA RNP particle and specifically binds to a consensus (Fig. 2(a)A) are the following: a lower stem with 9 K-turn motif which is embedded in the upper stem or merged base pairs, a symmetric or asymmetric internal loop (usually within the apical loop as a K-loop. The lower stem generally small: 5-8 nt and 5-6 nt on the 5’ and 3’ single-stranded region, includes 9 base-pairs in the canonical H/ACA motifs and the respectively), an upper stem including 10 stacked base-pairs distance between the ACA box and the base-pair closing the between the internal loop and the K-turn/K-loop motif. internal loop in the upper stem is around 15 nt (between 14 The H/ACA motifs are then classified in two fold families and 16 nt [1]); in the numerous H/ACA motifs present in denoted ‘10/5/9’ (Fig. 2(a)-(b)) and ‘10/6/9’ (Fig. 2(b)) where P. abyssi, an empirical constraint between 14 and 15 nt was n= 5,6 indicates the size of the 3’ single-stranded region of { } proposed [5]. On the other hand, no well-defined constraint is the internal loop. The ‘10/5/9’ fold family is consistent with proposed regarding the position of the K-turn or K-loop motif the 3D structure of a chimeral H/ACA RNA from P. furiosus with respect to the internal loop. In Haloferax volcanii, the (Fig. 2(a)B) while the ‘10/6/9’ fold family is consistent with distance between the base-pair closing the internal loop in the the 3D structure of another chimeral H/ACA RNA (Fig. 2(b)F) upper stem and the second sheared G:A base-pair of the K-turn derived from Archaeoglobus fulgidus (Afu46) which is assem- or K-loop motif corresponds to 8 base-pairs [2]. In the same bled with the H/ACA ribonucleoproteins from P. furiosus , the 2D folds proposed for the “productive” H/ACA (PDB ID: 3HAX) [23]. The sequence of Afu46 is somehow guide RNAs involve a distance of 9 base-pairs [1, 16, 55] similar to that of Pf91 in P. furiosus [5] (homolog of the Pa91 or 10 base-pairs [1, 16]. In Pyrococcus, it is between 9 and motif (Fig. 2(a)) from P. abyssi). In the ‘10/6/9’ fold family, 11 base-pairs but it is restrained to 9 or 10 base-pairs in the the internal loop is extended by the addition of one nucleotide “productive” H/ACA guide RNAs [5]. Finally, in the H/ACA- in the 3’ single-stranded region of the internal loop which bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 5/146

Pa_HACA Pa_HACA-GUIDE65 A B C subfam_weight=0.284006 3-21 nt 3LWO R 28% U G A G R G A Y 10 bp 5 nt 10 bp 6 nt Y R 0-1 nt Y R C G 0-1 nt G base pair annotations 0-1 nt 0-1 nt covarying mutations R Y compatible mutations no mutations observed 5 stacked 5-8 nt 5-6 nt nucleotide nucleotide 6 stacked layers present identity 97% 75% N 0-1 nt layers 97% 90% 50% N 90% N 75% C G C G 9 bp connector (zero length) C G variable-length region 9 bp variable-length loop Y variable-length stem 5´ A C A modular sub-structure Pa105.2-S27 Pa105.2-L1377 C A C A A A A A Pa40.3-L2016 G C G C G U U C G C G C G C G 40 C G 40 C G 30 G C 30 G C Pa91-L2685 30 G C U G U U G K-loop A U U G U G A C U 40 G G G G C G G A 30 U A G C G C A U U A K-turn A K-turn A K-turn A G A G U A G U A G U G A G A G A 50 G A 50 G C U A U G U G G C C G C G C G C G 10 bp 10 C G 10 bp C G 10 bp C G 10 bp 20 C G 20 G C 50 20 G C 20 G C U A C G C G C G C G C G C G C G C G C G G G G G A G 40 G U G G U G U C C A G G 60 G G 60 A U U G G G C G U C C G G U A A U 5 G C C 5' guide sequence U 5' guide sequence A 5' guide sequence U 5' guide sequence U C G U 3' guide sequence U (6 stacked layers) C 3' guide sequence (6 stacked layers) G C 3' guide sequence (6 stacked layers) C A (6 stacked layers) A 3' guide sequence U C G U G 60 (5 stacked layers) U A (5 stacked layers) C U A (5 stacked layers) G G (5 stacked layers) 10 C G 10 C G 10 G C 10 G C C G U G G C G C C G G C C G C G U A 9 bp , 50 C G 9 bp C G 9 bp C G 9 bp C G 9 C G C G 70 C G 70 C G C G G C G C C G ANA box G C ANA70 box G C ANA box G C ANA box C G G U G U G U 5' A C A 5' A C A 5' A U A 5' A U A (a) Figure 2. Structure/Function Model of H/ACA guide RNAs in Pyrococcus abyssi. A. The consensus 2D structure, labeled Pa HACA, is obtained from a curated STOCKHOLM alignment modified from the RFAM entries for H/ACA guide RNAs in P. abyssi (RFAM [34, 35] IDs: RF00058, RF00060, RF00065). B. The 3D structure of the chimeric guide RNA from Pyrococcus furiosus is shown in a stereo view as it is in the H/ACA sRNP complex including its RNA target (PDB ID: 3LWO) [28]. C. The more representative structural subfamily Pa HACA-GUIDE65 (6 and 5 nucleotides in the 5’ and 3’ strand regions of the internal loop, respectively) belongs to the ‘10/5/9’ fold family. The corresponding H/ACA guide RNAs from this subfamily in P. abyssi are shown: Pa91, Pa40.3, Pa105.2. Each RNA fold is annotated with the names of the H/ACA motif and its associated target in the large (L) or small (S) ribosomal subunit indicated by the position which is modified in the rRNAs. The graphical representations of the full 2D structures and the internal loop subfamilies were generated using the R2R program [36]. The internal loop of the H/ACA motif is shaded in green, the K-turn or K-loop motifs in blue and the ANA motif in grey. bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 6/146

D Pa160-S922(Pa35.2-L2697) E Pa105.1-L2554a G A U A A A U A G A C G C G G C 30 C G 30 G C A Pa35.2-L2549a G C 40 U G 40 U G U A K-loop A G C C G C A C G G C U C A K-turn U U G C A G A G 30 A K-turn U A G U G A G A G A G C G C G A 50 A U 50 G C G C 10 bp Pa_HACA-GUIDE55 20 G C 10 bp 10 20 C G 10 20 A G 9 bp +1 10 Pa_HACA-GUIDE56 G C subfam_weight=0.183422 G C subfam_weight=0.171508 C G A G C C G C G 18% G C 17% 5' guide sequence C G C G G U (3 stacked layer +2nt) C G A 0-1 nt Y C U G Y G U C 40 G C Y G C C C U C G C G 5+1 nt C C A C C C C C 60 5 nt C C Y U C Y Y 5 U 6 G 6 A A U R A G 3' guide sequence G 3' guide sequence 5 nt (6-1) 5' guide sequence A C 3' guide sequence 5 nt (6-1) C (5 stacked layers +1nt) 5' guide sequence G U (5 stacked layers +1nt) Y A C G U U A G C R Y (3 stacked layer +2nt) 10 (5 stacked layers) R 10 (3 stacked layer +2nt) G C 1nt U G 10 G C G C G C G C C G G C 50 C G C G 9 bp C G 9 bp C G 9 bp C G 9 C G 9 C G 9 C G G C G C 70 C G ANA box G C ANA box C G ANA box G C 70 G U C G 5' A C A 5' A C A 5' A C A F Pa105.2-L2794 C A A A Pa35.1-L2930 G C C G Pa35.1-L2672 C G 40 30 C 30 G C U G U G 3HAX U G G G U G U U G G G U G C A G C G A K-turn A K-turn , 40 A G U A G U 10 bp G A 50 A A U G C G C G G C C G 10 bp C G 10 bp Pa_HACA-GUIDE66 20 G C 10 20 C G 10 subfam_weight=0.158649 C G C G C G C G U 50 16% G G C G G G U G U 6 stacked G U G G 60 C G G G G C C 5+1 nt C A C U layers C 5' guide sequence C U 3' guide 6sequence A C 6 6 stacked 6 nt Y U A 5' guide sequence U A 3' guide sequence U A (6 stacked layers) C A (6 stacked layers) (6 stacked layers) C C (6 stacked layers) layers C U G U G U G1 nt 10 G C 10 G C 60 G C G C G C C G C G C G 9 bp U A 9 bp C G 70 9 C G 9 9 bp G C G C G C ANA box G C ANA box , 70 G U G C 5' A U A 5' A C A

Pa21-S891 Pa40.1-L2588 GH30 30 K-loop K-loop C U G U A G A U U U A G A G G A G A G A C G G C U A Pa_HACA-GUIDE75 G C 10 bp 10 Pa_HACA-GUIDE85 G C 10 bp 10 subfam_weight=0.102745 C G subfam_weight=0.106948 G C C G U A 10% 20 U G 40 11% 20 C G 40 C G CG U CG C G 1 nt C G C C G C U G G G U 5' guide sequence G G G U C U C G A G G U 6+2 nt A (6 stacked layers +2nt) C G U 5' guide sequence G C G 6+1 nt 5 nt (6 stacked layers +1nt) C C G 5 nt G C C 5 C C 5 C C C C A A C G 3' guide sequence G G 3' guide sequence C G U C G C A C C G (5 stacked layers) C G A C G (5 stacked layers) G C 2 nt G C 1 ntG C 10 G C 10 G C 50 C G 50 C G C G C G 9 bp C G 8 bp + 1 C G 9 C G 9 G C C G ANA box G C ANA box G C G U 5' U A C A 5' A C A (b) Figure 2. Structure/Function Model of H/ACA guide RNAs in Pyrococcus abyssi (continued). D. Structural subfamily Pa HACA-GUIDE55 and one of its representative folds (Pa160-S922; the Pab35.2-L2697 fold is equivalent). E. Structural subfamily Pa HACA-GUIDE56 and its 2 members (Pa35.2-L2549 and Pa105.1-L2554). F. Structural subfamily Pa HACA- GUIDE66 corresponding to the 3D structure of the chimeric guide RNA derived from Afu46 (from Archaeoglobus fulgidus and its representative folds (Pa105.2-L2794 and Pa35.1-L2930 or Pab35.1-L2672). Only the RNA component of H/ACA RNP complex (PDB ID: 3HAX) is shown in a stereoview. G. Structural subfamily Pa HACA-GUIDE75 and its representative fold (Pa21-S891). H. Structural subfamily Pa HACA-GUIDE85 and its representative fold (Pa40.1-L2588). Pa35.2 can adopt two different folds: 10/6/9 or 10/5/9 whether the base-pair from the apical stem closing the internal loop is a G:U (Pab35.2-L2549a) or G:C (Pa35.2-L2697), respectively.

Pa40.1-L2588

30 K-loop U A G U U A G G A C G U A G C 10 bp G C U A 20 C G 40 C G G G U 5' guide sequence G G (6 stacked layers +2nt) A G C G C G A A C 3' guide sequence A C G (5 stacked layers) G C 10 C G 50 C G C G 8 bp + 1 C G C G ANA box G C 5' U A C A bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 7/146 folds with 6 stacked nucleotides (Fig. 2(b)E-F). As a result, nucleotides (8 base-pairs and one unpaired U nucleotide). In the number of stacked layers from the ANA box to the closing other closely related species like P. horikoshii, P. yayanossi, base-pair of the upper stem differs in the two fold families and T. kodakaraensis, the first base-pair is preserved as a ‘10/5/9’ and ‘10/6/9’ with 14 and 15 nucleotides, respectively. wobble U-G or U-A base-pair. The second minor varia- Nevertheless, the two crystallized H/ACA guide RNAs (PDB tion involves a bulge nucleotide in Pa105.1 (A at position IDs: 3LWO and 3HAX, Fig. 2(a)B and Fig. 2(b)F) which are 16: Pa105.1-L2554a in Fig. 2(b)E) which is assumed to be representative from both fold families do exhibit the same rel- stacked in the upper stem: this conformation is supported by ative positions of the internal loop, lower stem and ANA box compensatory changes in the ortholog from T. kodakaraen- when superposing the two RNP complexes. In the ‘10/6/9’ sis: the substitution of the bulge nucleotide by a base-pair family, the additional stacked nucleotide in the 3’ end duplex which is correlated with the substitution of the two unusual sits at the location of the first base-pair from the lower stem in G-A sheared base-pairs embedded in the upper stem by two the ‘10/5/9’ family. This is made possible by a change in the canonical base-pairs (Supplementary Fig. S1). The presence morphology of the lower stem where the RNA helix is more of both a single bulge nucleotide and two G-A base-pairs in bent in ‘10/6/9’. This more pronounced bending is associated the upper stem are expected to allow the adjustment of the with a more compact helix (smaller helical rise) by almost 1A˚ helical twist so that the relative position of the K-turn/K-loop and less twisted (smaller helical twist) by about 15 degrees motif with respect to the internal loop is preserved (a bulge A (Table S1). At the junction between the lower stem and the nucleotide adopts such a stacked conformation in the 24-mer internal loop, the helical twist is large in ‘10/5/9’ (43.1) but RNA hairpin coat protein binding site from bacteriophage small in ‘10/6/9’ (30.8) with respect to a canonical A-RNA R17, PDB ID: 1RHT [56]). helix (33.6). As suggested in a previous work [21], the RNA From this classification, we can summarize the structural sequence may have an influence on the intrinsic bending of determinants of H/ACA RNAs that reside in the relative po- the lower stem of H/ACA guide RNAs. It is likely that the sitions of both the K-turn/K-loop motif and the ANA box proteins of the RNP particle also play a non-specific role to with respect to the internal loop. The distance between the induce a bending suitable with its catalytic activity in this K-turn/K-loop motif and the internal loop includes 10 stacked portion of the guide RNA. layers. A more detailed analysis also suggests a prevalence Different structural subfamilies can then be distinguished of the 5’ duplex over the 3’ duplex in the guide-target hybrid based on the size of the 5’ single-stranded region of the inter- duplex. The “productive” H/ACA RNA are thus expected to nal loop, e.g. the Pa HACA-GUIDE65 subfamily corresponds fulfill the following criteria: (1) a first stretch of 10 stacked to an asymmetric internal loop with 6 and 5 residues, respec- layers from the upper stem to the K-turn/K-loop motif, (2) a tively (Fig. 2(a)C). The 2D representation of each guide RNA second stretch of 14 to 15 stacked layers from the lower stem corresponds to the expected “productive” fold in the H/ACA to the ANA box. The second stretch is generally composed RNP particle and thus includes the name of the H/ACA motif of 9 base-pairs in the lower stem and 5 to 6 nucleotides in (e.g. Pa91) and its associated target in the large (e.g. L2685) the 3’ region of the internal loop. In addition, the 5’ region or small (e.g. S27) ribosomal subunit. In total, 6 different sub- of the internal loop should not have fewer base-pairs than the families are present and the Pa HACA-GUIDE65 subfamily 3’ region in the RNA guide-target hybrid duplex due to geo- is the larger one (28%) including 4 H/ACA guide RNAs in metrical constraints. All the H/ACA guide RNAs that were Pyrococcus (Fig. 2(a)). The other subfamilies, represented in validated from a functional point of view in P. abyssi [5] fit the order of the frequence of occurence in all H/ACA motifs into one of the two fold families: ‘10/6/9’ or ‘10/5/9’ (Fig. 2), (Fig. 2(b)D-H), are variations from the Pa HACA-GUIDE65 except Pa19 for which an atypical pairing model was proposed subfamily in the size of the internal loop which is either with only 8 base-pairs in the upper stem (‘8/7/9’ fold); this shortened or extended by one or two nucleotides on either particular case is discussed later. Pa105.1-L2554 (Fig. 2(b)) sides. H/ACA RNA homologs in other species can switch to was initially considered as a ‘9/6/9’ fold but the structural and a different subfamily due to the insertion(s) or deletion(s) in phylogenetic data suggest it actually fits into the ‘10/6/9’ fold the internal loop. In the Pa HACA-GUIDE55 family, Pa160 (Fig. 2(b)E and Supplementary Fig. S1). has one homolog in P. furiosus: Pf9 that switches to the Because of the conformational flexibility of RNA, the cri- Pa HACA-GUIDE65 family. The homologs of Pa35.2 in Ther- teria relative to the size of the internal loop are only relevant mococcus kodakaraensis and Pa105.2 in P. furiosus switch to the active conformation of the H/ACA RNAs while they from Pa HACA-GUIDE56 to Pa HACA-GUIDE66. The ho- are associated to the proteins of the H/ACA RNP particle and molog of Pa40.1 in T. kodakaraensis switches from Pa HACA- the RNA target. Therefore, one can imagine the internal loop GUIDE85 to Pa HACA-GUIDE65 (data not shown). may change in size due to the opening of the lower or up- Some other minor structural variations are observed (Fig. 2). per stems at the closing base-pairs to accommodate different The first one corresponds to the loss of the first base-pair at RNA targets (Fig. 3). The opening of the upper stem sys- the bottom of the lower stem in Pa40.1 (Pa40.1-L2588 in tematically leads (if not compensated) to switch from a ‘pro- Fig. 2(b)H). However, the distance between the first base-pair ductive’ (’10/5(6)/9’) to a ‘non-productive’ fold (‘9/5(6)/9’, of the lower stem and the ANA box remains the same: 9 ‘8/5(6)/9’, etc.). The Pa35.1 motif is productive (‘10/6/9’) bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 8/146 with the following ribosomal targets: L2930 and L2672 (23S productive complexes are also made up of 14 or 15 stacked rRNA) but non-productive with other targets in the 16S rRNA: layers from the closing base-pair of the upper stem to the ANA S922 (‘9/7/9’) or S1122 (‘9/8/9’) (Figure 3(a)A). On the op- box including 7 to 9 base-pairs in the lower stem depending posite, the opening of the lower stem can preserve the 14 on the extent of opening of the internal loop. or 15 stacked nucleotides allowing extended single-stranded The structure-function classification based on the struc- regions for base-pairing with RNA target(s). One example is ture and pairing models inferred from the available data dis- provided by Pa35.2-L2549 (‘10/6/9’) where the lower stem is criminates the ‘productive’ from the ‘non-productive’ H/ACA shortened by the opening of one (‘10/7/8’) or eventually two guide RNAs for all possible guide-target pairs described pre- wobble base-pairs (‘10/8/7’). viously taking into account one additional criterion for the The presence of a bulge nucleotide in some specific po- guide-target association. The calculated stability of the guide- sitions near the base pairs closing the internal loop can help target complex (RNAsnoop [37]) is not discriminative enough switching between different “productive” folds, thus extend- since some non-productive complexes are as stable as produc- ing the repertoire of potential targets. Bulge nucleotides are tive ones (compare for example Pa21-S891 and Pa21-S892 in a source of sequence variability in the internal loop when Table 1). However, a cutoff value can be defined for the pre- they are located either at the (n-1) position from the closing diction of new guide-pair targets as the minimum energy value base-pair of the upper stem (e.g. Pa35.2-L2697, Fig. 3(a)B) or for a productive complex which is 27 kcal/mol for Pa35.2- at the (n+1) position from the closing base-pair of the lower L2697 (Table 1). According to the classification, 17 guide- stem (e.g. Pa105.2-L1377 or Pa105.2-S27, Fig. 3(a)C). The 5’ target pairs correspond to ‘productive’ folds: e.g. ‘10/6/9’, region of the internal loop is thus enlarged by the insertion of ‘10/5/9’ or equivalent folds (see Table 1), while 8 other pairs one nucleotide at the 5’ or 3’ end while preserving a canonical correspond ‘non-productive’ folds. All the guide RNAs which base-pair (Watson-Crick or wobble) closing the internal loop. are predicted as ‘non-productive’ are indeed non-productive In the first case, the sequence is modified at the position cor- (6 true negative cases). All the guide RNAs which are pro- responding to the last base-pair in the 5’ duplex of the guide- ductive are also predicted as ‘productive’ (15 true positive target hybrid. In the second case, the sequence is extended cases). The two missing cases (false positives) correspond at the first position of the 5’ duplex. In the case of Pa35.2- to guide-target pairs which adopt a ‘productive’ fold with a L2697, the closing base-pair G15-C41 imposes a ‘10/5/9’ weak interaction at 5’ duplex: Pa35.2-L2250 and Pa105.2- fold; the bulge nucleotide U-40 can form a wobble pair with S995 (Table 1). This duplex includes only 3 base-pairs, two G15 by pushing C-41 into the internal loop and thus induce of which are wobble base-pairs for Pab35.2-L2250 (Supple- a switch to the ‘10/6/9’ fold which is productive in Pa35.2- mentary Fig. S2) or 4 base-pairs, two of which are also wob- L2549 (Fig. 3(a)B). Similarly, the Pa105.2 motif can switch ble base-pairs for Pa105.2-S995. The only major difference from a ‘10/5/9’ (Pa105.2-L1377 or P105.2-S27)subfamily to between the two guide-target complexes Pa105.2-L1377(+) a ‘10/6/9’ fold by substituting the closing base pair from the (Fig. 4A and Supplementary Listing 14) and Pa105.2-S995(- lower stem: U9-A64 by the U9-G65 while pushing A64 in ) (Fig. 4B and Supplementary Listing 15) is the pairing in the internal loop (Pa105.2-L2794). The Pa105.2 motif can the 5’ duplex of Pa105.2 (upstream from the pseudo-uridine also undergo typical rearrangements described above for the position): a perfect match for the productive complex but opening of the lower stem: from ‘10/6/9’ to ‘10/7/8’ or alter- a poor match for the non-productive complex. In the other natively from ‘10/5/9’ to ‘10/6/8’ (Fig. 3(a)C). target in 23S rRNA (L2794), the 5’ duplex energy is more stable ( 8.5 kcal/mol) for the productive complex Pa105.2- In the case of Pa105.1, various examples of possible changes that impact negatively the upper stem are also shown L2794 (Fig. 4C and Supplementary Listing 16). Although (Fig. 3(b)D). Pa160-S922 (‘10/5/9’) is productive while Pa160- the total energy of both guide-RNA complexes is the same, the 5’ duplex energy is 5.7 kcal/mol for the non-productive L2672 (‘11/5/8’) obtained by ’zipping down’ the internal loop is not productive (Fig. 3(b)E). More significant conforma- complex (Pa105.2-S995) and -6.9 kcal/mol for the produc- tional changes associated with a fold change are possible tive one (Pa105.2-L1377). Similarly, the 5’ duplex energy of Pab35.2-L2250 is only 5.8 kcal/mol. In some extreme case, for a few H/ACA motifs; Pa21-S891 (‘10/5/9’) is productive the 3’ duplex energy of Pab35.2-L2549(+) is very weak ( 1 while its alternate fold Pa21-S892 (‘9/6/9’) is not (Fig. 3(b)F). kcal/mol) but the 5’ duplex energy is still below 6 kcal/mol Two particular motifs have potential crossed targets: Pa160 ( 9.5 kcal/mol). We can expect the existence of a thermody- and Pa35.1 which can both target the positions S922 and L2672 (Fig. 3(a)-(b)). The productive complexes are Pa160- namic control where a minimum stability of the 5’ duplex is S922 (‘10/5/9’) and Pa35.1-L2672 (‘10/5/9’) while the non- required for the guide-target complex to be productive. This productive ones are: Pa35.1-S922 (‘9/7/9’) and Pa160-L2672 rule is followed in all negative cases. In the border line case of Pa40.1-L2588, the 5’ duplex energy is 6.2 kcal/mol (Ta- (‘11/5/8’). All the productive complexes do exhibit 10 base- ble 1). A minimum energy of 6 kcal/mol can be used as pairs or stacked layers between the the upper stem and the K-turn or K-loop motif. A bulge in the upper stem may com- a discriminant criterion to exclude guide-target pairs which pensate the presence of only 9 base-pairs if it is stacked in the may not be productive (Table 1). helix (and most likely stabilized in the RNP complex). The The structural determinants and the energy criteria pro- bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 9/146

Pa35.1-L2930(+) A Pa35.1-L2672(+) Pa35.1-S922(-) Pa35.1-S1122(-) 30 C U G 30 U C G 30 U C G U G U G U G G G G G G G U U U U U U G U G U G U G C G G A K-turn , 40 G C G G C A G A K-turn , 40 A K-turn , 40 U A G U A G U A A A A A A C G C G C G G C G C 10 bp G C C G 10 C G 9 bp 9 C G 9 bp 9 20 C G 20 C G 20 C G C G C G C G C G U 50 C G C G C G U 50 C G G C U G 50 G U G U U 3' guide sequence C G C G C G C C C C C (6 stacked layers +2nt) 5' guide sequence 3' guide sequence 5' guide sequence C C U C U (7 stacked layers) C A C 6 (7 stacked layers) (6 stacked layers +1nt) 7 U 8 5' guide sequence 3' guide sequence A C A C U A U A U (6 stacked layers) C C (5 stacked layers +1nt) C A U G C U G C U G C 10 G C 60 10 G C 60 10 G C 60 G C G C G C C G C G C G U A 9 bp 9 U A 9 bp 9 U A 9 bp 9 C G C G C G G C G C G C G C ANA box , 70 G C ANA box , 70 G C ANA box , 70 G C G C G C 5' A C A 5' A C A 5' A C A B Pa35.2-L2697(+) Pa35.2-L2549a(+) Pa35.2-L2549b(+) K-loop K-loop K-loop C A A C A A C A A U U U U U U A G 30 A G 30 A G 30 G A G A G A G C G C G C G C G C G C 10 20 C G 10 bp 10 20 C G 10 bp 20 C G 10 bp G C G C 10 G C C C C C G C G C G C G C G C G C G C G C G U 40 G C G U C 40 G U C 40 C U C C U C C U C C C C 7 U U U U 3' guide sequence 5 U U (6 stacked layers +1nt) A G A 3' guide sequence6 5' guide sequence A 5´ guide sequence 5' guide sequence G G C U 3´ guide sequence C (5 stacked layers +1nt) C (3 stacked layers +2nt) (3 stacked layers +2nt) U (4 stacked layers +2nt) U G U (5 stacked layers) G U 10 G U 10 U G 10 U G U G G C G C G C G C 50 G C 50 G C 50 C G 9 bp C G 9 bp C G 8 bp 8 C G C G 9 C G 9 G C G C G C G C ANA box G C ANA box G C ANA box G U G U G U 5´ A C A 5' A C A 5' A C A Pa105.2-L1377(+) C Pa105.2-L2794a(+) Pa105.2-L2794b(+) Pa105.2-S27(+) C A C A A A C A A A A A G C G C G C C G C G C G C G 40 40 C G 40 C G 30 G C 30 30 G C G C U G U G U G U G U G U G G G G G G G G C A A G C A A K-turn G C K-turn A K-turn A G A A G U A G U U G A 50 50 G A 50 G A U G U G U G C G C G C G C G 10 bp 10 bp 10 C G 10 bp 10 C G 10 20 G C 20 G C 20 G C C G C G C G C G C G C G G G G G G G G U 60 G U 60 G G U G G G G G G 60 G G G G G C A 7 C 5' guide sequence C A 3' guide sequence A C U C 5 5' guide sequence 3' guide sequence6 (7 stacked layers) C U (6 stacked layers +1nt) 5' guide sequence U U A U A U A 3' guide sequence (6 stacked layers) (5 stacked layers +1nt) (6 stacked layers) C U A C U G A C A G (5 stacked layers) U G 10 G C 10 G C 10 G C G C G C G C C G C G C G C G 9 bp C G 9 bp C G 8 bp 8 C G 70 9 C G 70 9 C G 70 G C G C G C G C ANA box G C ANA box G C ANA box G U G U G U 5' A U A 5' A U A 5' A U A (a) Figure 3. Productive/Non-productive H/ACA guide RNAs. A. Productive/non-productive folds for Pa35.1 and its true (L2930, L2672) and false (S922 and S1122) targets. B. Productive folds for Pa35.2 and its true targets: L2697 and L2549. C. Productive folds for Pa105.2 and its true targets: L1377, S27, L2794. The 2D structures of the guide RNAs are shown according to the RNA fold consistent with the pairing with its associated target (the RNA targets are omitted for clarity): productive and non-productive complexes are indicated by a (+) and (-) sign, respectively. The different possible structures for a given guide RNA are shown using a double arrow. Structural changes associated with opening or closing base-pairs in the internal loop are indicated by single vertical arrows; dashed lines indicate the opening of the internal loop. Bulge nucleotides switching out/in of a stem are marked using an oblique arrow. bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 10/146

Pa105.1-L2554a(+) Pa105.1-L2554b(-) Pa105.1-L2554c(-) D G A A A G A G A A A A A G A G A C G C G G A 30 C G C G 30 C G 40 30 C G G C G C 40 G C 40 U G U G G C U G G C G C C G C G G C C C G K-turn G C C A K-turn G C C A G U A A K-turn A G U A G G A G A U 50 G A G A 50 G A G A 50 G C G C 20 9 bp +1 9 G C 9 A G 10 20 A G 9 bp 20 A G 9 bp G A G A C G G A C G C G C G C G A A C G G C A G C G C G G C G C A C A C A 5' guide sequence C C C C 60 (3 stacked layers +2nt) C C C G C 7 G 6 G 3' guide sequence6 G 3' guide sequence G 3' guide sequence U 5' guide sequence G 5' guide sequence U G (5 stacked layers +1nt) U (5 stacked layers +2nt) G (5 stacked layers +1nt) A G C (6 stacked layers) G (3 stacked layers +2nt) A G C C 10 10 A G G C G C G C G C G C G C C G C G C G 9 bp 8 bp C G 9 bp C G C G C G C G 9 C G 8 9 G C G C 70 G C C G 70 ANA box C G 70 C G ANA box ANA box C G C G C G 5' A C A 5' A C A 5' A C A

E Pa160-S922(+) Pa160-L2672(-) Pa105.1-L2552(-) U A U A G A U A U A A A C G C G G A G C G C C G 30 G C 30 G C A A 30 C G U G 40 U G 40 G C 40 U A U A U G C G C G G C G C U G C U A K-turn A K-turn C G A G U A G U G C C G A G A A K-turn A G U G C G C G A A U 50 A U 50 G A 50 20 G C 10 bp 20 G C 11 bp 11 G C 8 G C 10 G C 20 A G 8 bp G C G C G A G C G C C G G U G U C G C U A C G C C U G G G C C G C G C A C C C C 5 C C 8 A U 5' guide sequence 3' guide sequence C G 5 (3 stacked layers +2nt) A U (5 stacked layers) 3' guide sequence 5' guide sequence A C 3' guide sequence A C G U (5 stacked layers +3nt) (3 stacked layers +2nt) A C (5 stacked layers) , 60 5' guide sequence G C 10 10 A C A G G C G C (6 stacked layers +2nt) 10 G C G C G C G C C G C G C G C G 9 bp 8 bp C G 8 C G 8 bp 8 C G 9 C G C G C G C G G C C G ANA box C G ANA box C G ANA70 box G C 70 G C 70 C G 5' A C A 5' A C A 5' A C A

Pa21-S891(+) Pa21-S892(-) F 30 30 K-loop K-loop C U G U C G A U A U A G A G G A G A G A G A G C G C 10 bp 9 G C 10 G C 9 bp C G C C G C G 20 U C U G 20 U G 40 C G 40 C G U C G C G C G C C U G U G U C U 5' guide sequence 6 (6 stacked layers +1nt) C C C C C C 5 5´ guide sequence C C 3´ guide sequence C G 3' guide sequence (6 stacked layers) U G (5 stacked layers +1nt) U C G (5 stacked layers) C G 10 G C 10 G C G C 50 G C 50 C G C G C G 9 bp C G 8-9 bp 9 C G 9 C G G C G C G C ANA box G C ANA box G U G U 5' A C A 5´ A C A (b) Figure 3. Productive/Non-productive H/ACA guide RNAs (continued). D. Productive/non-productive folds for Pa105.1 ands its true targets: L2554, false target: L2552. E. Productive/non-productive folds for Pa160 ants its true (S922) and false (L2672) targets. F. Productive/non-productive folds for Pa21 and its true (S891) and false (S892) targets. Pa105.1 can adopt alternative folds by opening the lower stem by one base-pair to increase the base-pairing between the guide and its target (x/6/9 to x/7/8). Opening the apical stem as well by one base-pair (x/8/8) creates an internal loop that would accept L2552 as possible target. Adding one base-pair at the bottom of the lower stem of Pa160 creates an internal loop that would accept L2672 as possible target. bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 11/146 posed in the classification to discriminate between productive/non-Table 1. Functional and structural features from known productive guide-target pairs are further supported by the anal- H/ACA guide RNAs in Pyrococcus targeting 16S and 23S ysis of alternative targets identified in the 16S and 23S rRNAs rRNAs.

c of P. abyssi using RIsearch (see Methods). We can define a f

f e true target as a ribosomal position which was shown experi- a

b mentally to be modified by a given H/ACA guide RNA. In the b d case of P. abyssi, there are 16 ribosomal positions correspond- ing to true targets (pseudouridylation: + , Table 1) and 8 other guide-target model pairs conservation expression promoter % GC content pseudouridylation structure-function hybrid energy duplex energy 5’ duplex energy positions corresponding to false targets (pseudouridylation: Pa19-S1017g ++- 72 + 8/7/9- -38 -24 -11 - , Table 1). Among 37 additional false targets (Table S2), Pa19-S1017h ++- 72 + 10/6/9+ -38 -21 -11 25 guide-pair candidates are expected to be non-productive Pa21-S891 + + + 76 + 10/5/9+ -29 -20 -7.7 Pa21-S892 + + + 76 - 9/6/9- -29 -21 -7.7 folds (‘8/7/9’, ‘8/8/9’, ‘9/6/9’, ‘9/6/9’, ‘9/7/8’) while 12 are Pa35.1-L2930 + + + 70 + 10/6/9+ -30 -19 -8.1 expected to be productive folds (‘10/5/9’, ‘10/6/9’, ‘10/6/8’). Pa35.1-L2672 + + + 70 + 10/8/7+ -30 -17 -16 Pa35.1-S922 + + + 70 - 9/7/9- -28 -13 -6.5 However, all the productive folds except those associated Pa35.1-S1122 + + + 70 - 9/8/9- -28 -15 -9.4 with 3 particular targets (tRNAs) exhibit one of the 7 listed Pa35.2-L2549 + + + 71 + 10/6/9+ -29 -6.4 -9.5 Pa35.2-L2697 + + + 71 + 10/5/9+ -27 -10 -6.0 anomalies (Table S2) such as: a 5’ duplex below the energy Pa35.2-L2250 + + + 72 - 10/5/9+ -27 -6.8 -5.1 cutoff ( 6 kcal/mol), a mismatch in the 5’ duplex prior to the Pa40.1-L2588 ++- 69 + 10/5/9+ -30 -21 -6.2 Pa40.2-L1932 ++- 74 + 10/6/8+ -46 -13 -11 targeted U position, a succession of wobble pairs in the 5’ du- Pa40.2-L278 ++- 74 - 9/7/8- -46 -20 -13 plex, or the absence of pair with the first base of the 3’ duplex Pa40.3-L2016 ++- 63 + 10/5/9+ -32 -13 -7.7 Pa91-L2685 + + + 72 + 10/5/9+ -33 -18 -11 (bulge on the guide). The anomalies are expected to have an Pa105.1-L2554 + + + 76 + 10/6/9+ -35 -15 -8.4 impact on the stability or on the kinetics of association/dis- Pa105.1-L2552 + + + 76 - 8/7/9- -34 -20 -8.1 Pa105.2-S995 + + + 71 - 10/6/9+ -34 -13 -5.7 sociation of the guide-pair complex. One particular anomaly Pa105.2-L2794 + + + 71 + 10/6/9+ -34 -15 -8.5 corresponds to the presence of a 3 nucleotides spacer between Pa105.2-L1377 + + + 71 + 10/6/9+ -34 -13 -6.9 Pa105.2-S27 + + + 71 + 10/6/9+ -34 -17 -6.9 the 5’ and 3’ duplex elements of the target, a feature which Pa160-L2672 + + + 69 - 11/5/8- -38 -12 -8.7 was proposed in a previous model for the Pa19-S1017 guide- Pa160-S922 + + + 69 + 10/5/9+ -38 -20 -14 a b target complex. This feature is present in two guide-targets as provided by the UCSC genome browser [39, 40]. as determined from RNA-seq data [33]. c as determined from previous work by RT-CMCT pairs (Pa19-L655 and Pa160-L1318, see Table S2) associated analysis and in vitro activity of pseudouridylation [5]. d structure-function with pseudo-uridylation modifications which are not detected model as proposed from the ’productive’/’non-productive’ classification (the +/- sign indicates whether it is predicted to be productive or not); the listed in the 23S rRNA of P. abyssi. Based on these two examples, models also include some additional models that optimize the hybrid and this feature is rather considered like an anomaly. duplex energies: 10/6/8 (Pa40.2-L1932) and 10/8/7 (Pa35.1-L2672) which are derived from 10/5/9 and 10/6/9 by opening the lower stem by one or two In the particular case of Pa19, the previous model [5] base-pairs (Fig. 3), respectively. e as calculated by RNAsnoop (kcal/mol) suggested a pairing with only 8 base-pairs in the upper stem from the Vienna RNA package [37]. f the duplex energy includes both the 5’ and 3’ duplex energies with a correction factor (+4.1 kcal/mol). g the and a 3 nucleotide target spacer (Fig. 5A and Supplementary Pab19-S1017 pairing proposed previously is based on a 8/7/9 model [5]. h Listing 17): two features which are not compatible with a pro- the Pab19-S1017 pairing in the current model (10/6/9). ductive guide-target complex according to our classification. An alternate canonical fold is proposed here corresponding to a ‘10/6/9’ model (Fig. 5B and Supplementary Listing 18) similar to that proposed for Pab105.1-L2554a (Supplementary minor sequence variations are observed between P. abyssi and Fig. S1). The bulge nucleotide is a U which may not stack in T. kodakaerensis; the major variation is the substitution in the upper stem but rather loop out to the minor groove and Pa105.1 of the A bulge at the bottom of the upper stem by still induce a compensatory twist in the upper stem as shown a Watson-Crick base-pair which restores a more canonical in some known RNA structures [57]. It is thus expected to stem (Supplementary Fig. S1). In Archaeoglobus fulgidus, be productive because of this structural compensation in the all the identified H/ACA guide RNAs fit in the subfamilies upper stem which also includes three stacked G-A base-pairs described previously in Pyrococcus: GUIDE55, GUIDE56, (Fig. 5). The very long single-strand region at the 5’ side GUIDE65, GUIDE66 and GUIDE75 (Supplementary Fig. S3 of the internal loop (and the presence of 6 or 7 nucleotides & Listing 19). The H/ACA folds are compatible with pro- of the 3’ side) may also give enough flexibility to allow the ductive guide RNAs where Af190, Af4.1, Af4.2b and Af46 accommodation of the K-turn motif in the proper position and can associate with their rRNA targets to allow the pseudo- orientation in the RNP particle. uridylation at the respective positions: S1004, S1167, L2601 and L2639 as shown by Tang et al. [1]. It was later suggested that Af52 is not functional for pseudo-uridylation especially 2.3 H/ACA RNAs in other Euryarchaea and Crenar- because it does not carry any K-turn or K-loop motif [16]. chaea Thus, Af52 is not expected to modify the position L2878 The H/ACA guide RNAs which are present in Thermococcus proposed initially [1, 16]. Af4.3 was also proposed to guide genomes, in particular from T. kodakaerensis, are orthologs the pseudo-uridylation at L1364 but the RNA fold that would from Pyrococcus motifs as described previously [5]. Only target this position (only 9 base-pairs in the upper stem when bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 12/146

A Pa105.2-L1377, Ph105.2, Pf105.2 Pa105.2-L1377(+) A C A A A A A G G C C G C G C G C G 40 G C 30 G C R U G G U G G G G G C A G C A A K-turn A K-turn A G U A G U G A G A 50 U R U G C G C G C G C G G C 20 G C 10 C G LSU C G LSU C G 1377 C G 1377 G G G G G U G U G Y G U 60 C C G G C C guide G G G G C C G Y G G C C C A G U 5' U C C G G G G A U ANA C A G U C U G A C U 6 G A U A G G G G G C C C C U A 5' U A G G C A C A Y A G 5’ duplex G U 3’ duplex U G A G G C G C 10 G C G C target G C U G G C U G C G A C E = -34.5 kcal/mol C G A C C G G U C G G U C G C A E = -6.9 kcal/mol C G 70 9 C A G C A U 5’ G C A U 5' 5' G C ANA box G C ANA box G Y G U 5' A A 5' A U A B Pa105.2-S995, Ph105.2, Pf105.2 Pa105.2-S995(-) A C A A A A A G G C C G C G C G C G 40 G C 30 G C R U G G U G G G G G C A G C A A K-turn A K-turn A G U A G U G A G A 50 U R U G C G C G C G C G G C 20 G C 10 C G SSU C G SSU C G 995 C G 995 G G G G G U G U G Y G U G G C U G G 60 C U G G C C G G C C C A G U guide G Y C A G U C U G G C U G G 5' U C C G G G G A U ANA 6 U A A G U A A G C A C A Y C A A G G C C U C U G 5' U G C A G C G A 10 G C G A G C C G G U G C C G C G G U target C G G U C G C C C G C C C G C G E = -34.5 kcal/mol C G 70 9 C G G C G G G C G G 5' E = -5.7 kcal/mol 5' G C ANA box 5’ G C ANA box G Y G U 5' A A 5' A U A C Pa105.2-L2794, Ph105.2, Pf105.2 Pa105.2-L2794a(+) A C A A A A A G G C C G C G C G C G 40 G C 30 G C R U G G U G G G G G C A G C A A K-turn A K-turn A G U A G U G A G A 50 U R U G C G C G C G C G G C 20 G C 10 C G LSU C G LSU C G 2794 C G 2794 G G G G C U C U G Y G U G G C C guide G Y G G 60 C C G G C C G G C C C A G U 5' U C C G G G G A U A ANA C A G U C U G G C U G G G G G C C C C U G U 5' 6 U A G U U A G U C A C A Y C U C U U G C U 10 G C A G target G C A G G C U G G C U G C G C C E = -34.5 kcal/mol C G C C C G G C C G G C C G C A E5’= -8.5 kcal/mol C G 70 9 C A G C C U G C C U 5' 5' G C ANA box G C ANA box G Y G U 5' A A 5' A U A Figure 4. Models of productive (+) and non-productive (-) guide-target pairs of Pa105.2. A. Productive fold for Pa105.2 and its true target L1377. B. Non-productive fold for Pa105.2 and its false target S995. C. Productive fold for Pa105.2 and its true target L2794. The duplexes are represented with the following color code: orange for the 5’ duplex, green for the 3’ duplex. The energy values are those calculated by RNAsnoop. bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 13/146 A Pa19-S1017a, Ph19, Pf19 Pa19-S1017a A A A A A A C G 40 C G G C G C G C G C G C G C G C G C G C G C U G U G 50 U G guide U G C A C G A K-turn A G U C A G U G A 5´ G A C G G C G G C G U C C U C C U ANA G A G A C U G C C G C C G U A G G A G G A 5' 30 G A G C G G C 8 G C CGU G C G C SSU target G C 60 SSU G C 1017 20 G C 1017 C G U G C G U G C G C G U C G C G U G C E = -38.4 kcal/mol G C C A G C C A G C C U G G C U G G E = -11.0 kcal/mol U G A 5’ G U G A G C G G C C G 7 G C C C G A C C G A U U G G C C A G G C C A A U G C G G A C G C 70 G G 3-4 nt G C C A 10 G C C A G C C G G C C G A U G C A U G C C G U C C G U 9 C C G C G C G C G 5' 5' C G ANA box C G 80 ANA box R U G U 5´ A U A 5´ A U A

Pa19-S1017b , Ph19, Pf19 Pa19-S1017b B A A A A A A C G 40 C G G C G C G C G C G C G C G C G C G C G C U G U G 50 U G C A U G A G guide G C A K-turn U A G U G A 5' C G G C G G C C C U C C U ANA G A G A 30 G A G C G C C G G U C G G A G G A 5' G C A G C G U G C G C G C 60 10 target G C SSU G C SSU C G 1017 20 C G 1017 U E = -38.5 kcal/mol G U C G U G C G U G G C C G C G C C G G C E = -11.0 kcal/mol G C C C 5’ C A G A G G G U U A U U A C G C G G C G G A 6 C C G G C G G U G U G C C A A G C C A A C U 5-6 nt G C C G G C 70 C G G C G A 10 G C G A G C C G G C C G A U C C A U C C C G G C C G 9 G C C G U G C G U G 5' 5' C G ANA box C C G 80 ANA box C R U G U 5' A U A 5' A U A Figure 5. Models of guide-target pairs of Pa19. A. Non-productive Pa19-S1017a model proposed previously (‘8/7/9’) [5]. B. New productive Pa19-S1017b model (‘10/6/9’).

opening the basal G-U base-pair: see Af4.3 in Supplementary fold supported by multiple sequence-structure alignment is Fig. S3) is not consistent with a productive particle accord- proposed for Hvo1 (Supplementary Fig. S5). In the case of ing to our classification. On the other hand, Af4.3 can still Hvo2, a productive fold ‘10/6/9’ can be obtained by extending guide the modification of L1970 [19]; it is closely related to the upper stem and including an additional unpaired residue Pa40.3 and its orthologs in P. horikoshii (Ph40.3) and P. furio- downstream of the pseudo-uridylated position as proposed sus (Pf7.3) which target the position L2016. Another related in a previous model of association [5] (Fig. 5); the resulting H/ACA motif is also found in Methanocaldococcus jannaschii ‘10/6/9’ fold is annotated Hvo2a (Supplementary Fig. S6). where it can guide the modification at the equivalent position The ‘11/8/9’ alternative fold for Hvo2 includes more than L2015 (Supplementary Fig. S4). 5 mismatches in the upper stem to accommodate the target In phylogenetically more distant Euryarchaea such as: sequence [55]. However, a slight change in the pairing of Haloferax volcanii, Haloarcula marismortui, Halobacterium the upper stem and positions of bulge nucleotides both in salinarum, Haloquadratum walsbyi and others, H/ACA RNA the guide and target RNAs can make it switch to an alterna- guides have also been identified [55, 2]. In the case of H. tive ‘10/6/9’ fold suited for the pseudo-uridylation of L1958 volcanii and the related species mentionned above, the RNA (Hvo2b, see Supplementary Fig. S7). Hvo1 and Hvo2a/Hvo2b fold proposed based on the association between the guide can be folded according to ‘10/5/9’ and ‘10/6/9’ models to RNA and its target(s) corresponds to the ‘10/5/9’ fold for the form the following hybrid RNA duplexes: Hvo1-L2621 (Sup- first H/ACA motif (Hvo1 targeting L2621) or the ‘9/7/9’ and plementary Fig. S8), Hvo2a-L1956 (Supplementary Fig. S9) ‘11/8/9’ folds for the second motif (Hvo2 targeting L1956 and Hvo2b-L1958 (Supplementary Fig. S10). and L1958, respectively) [55]. A slightly modified ‘10/5/9’ In the Euryarchaea M. jannaschii and P. furiosus, differen- bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 14/146 tial roles were identified for Cbf5 (TruB/Pus4 family) corre- are from P. calidifontis and P. arsenaticum (sR202). These sponding to two distinct pseudo-uridylation pathways: a guide two sRNA may switch from the Pae HACA-GUIDE56 to the RNA-dependent pathway for rRNA modification and a guide Pae HACA-GUIDE55 subfamily by shortening the internal RNA-independent pathway for the U55 sequence-specific loop by one nucleotide which is then included in the upper tRNA modification [58, 25, 29]. In the general case of tRNA stem (Supplementary Fig. S12). This leads to increase the modifications, the canonical pseudo-uridylation pathway is number of stacked layers from 9 to 10 to restore a productive based on other Y-synthases (PsuX/Pus10 family). In sul- configuration (Fig. 6). The minor variation with respect to folobales and other Crenarchaea (such as: Aeropyrum pernix the canonical H/ACA motif is the length of the lower stem: and Metalloshaera sedula), a guide RNA-independent path- 8 instead of 9 base-pairs. We can assume that the presence way based on an additional Y-synthase (TruD/Pus7 family) of bulge(s) and/or mismatch(es) in the lower stem allows the Tyr also operates on tRNAs at position U35 of tRNAGUA. In guide RNA to be accommodated into the H/ACA RNP par- sulfolobales and related species, this Y35-synthase is defi- ticle in a similar way. The ANA box is well conserved in cient but rescued through the guide RNA-dependent path- all the motifs sR201 and sR202 from the different species of way [20]. The U35 position can be potentially modified Pyrobaculum. A single variation is observed in P. calidifontis through the guide RNA-dependent pathway in 5 species where where the ANA box is replaced by a CCA box reminiscent to a compatible H/ACA guide RNA was identified: Sulfolobus a tRNA 3’ end that might indicate a slight difference in the solfataricus, Sulfolobus tokodaii, Sulfolobus acidocaldarius, structural contraints for the guide RNA to be functional. Aeropyrum pernix, Metallosphaera sedula [20]. This guide The second family includes all the other H/ACA-like RNA-dependent pathway was validated experimentally for guide RNAs (sR203 to sR210) except sR207. Although they Sulfolobus solfataricus: the H/ACA RNA guide Sso1 corre- are truncated from the lower stem, they generally include sponds to a ‘10/5/9’ fold which is also found in A. pernix one or two possible residual base-pair(s) at the position(s) in (Supplementary Fig. S11). On the other hand, the RNA guide the sequence which would be consistent with the Pae HACA- candidates found in the other species fold into a ‘9/5/9’ model GUIDE66 or Pyr HACA-GUIDE65 subfamilies (Fig. 7A). (Supplementary Fig. S11). However, the tRNA substrates From the structural alignment of the H/ACA-like motifs for modified through the RNA-dependent pathway have some this family (Supplementary Listing 21): 64% of the analyzed particularities [58, 20]. Thus, one cannot exclude that the sequences do contain one or two base-pairing position(s) (sub- ‘9/5/9’ fold is still active due to compensatory changes associ- familes BPS, BP1 and BP2, Fig. 7B-D) reminiscent from the ated with the structure of the substrate. canonical H/ACA motif while 36% lost any trace of the lower stem (subfamily NONE, Fig. 7E). Representative members of 2.4 H/ACA(-like) RNAs in Pyrobaculum each subfamily are shown: the presence of residual base-pairs From the data provided on the H/ACA-like guide RNAs in Py- in about two thirds of the H/ACA-like motifs suggest they robaculum [30], we propose a structural classification which may derive from canonical H/ACA motifs degenerated by the separates the ten sRNA (from Pae sR201 to Pae sR210) into accumulation of mutations in the lower stem. The H/ACA- 3 main families which differ by how far they are from the like motifs generally carry an ANA box or a more degenerated canonical H/ACA motifs found in other archaea, especially in NCA box at a distance from the stem which is consistent with Pyrococcus. The first family corresponds to H/ACA motifs the Pyr HACA-GUIDE66 or Pyr HACA-GUIDE65 subfam- reclassified as canonical motifs (Pae sR201 and Pae sR202) ilies (modulo one residue) and the canonical H/ACA motifs while the two other ones correspond to the atypical H/ACA (59%). Some motifs (27%) still carry a degenerated box motifs, designated as H/ACA-like motifs. The first two fam- (NCA) which is downstream of the expected position (up to ilies adopt H/ACA(-like) folds that are consistent with pro- 10-12 residues). In a few cases (14%), there is no ANA or ductive complexes according to the structural and functional NCA box (Supplementary Fig. S13). classification. The third family just includes one of the guide The associations between Pae sR207 and its two riboso- RNA (Pae sR207) and represents a special case that will be mal targets L2597 and L2596 [30] are based on a 12/(13)- discussed later. and 13/(12)- model of pairing, respectively. There is no direct The first family includes sR201 and sR202 which can evidence that these two ribosomal positions are actually mod- be considered as canonical H/ACA motifs (‘10/5/8’ fold), as ified by Pae sR207 and none of the paring models is expected shown in Figure 6: the two subfamilies Pae HACA-GUIDE55 to be productive (Table 2 and Fig. 8). The L2596 position is and Pae HACA-GUIDE56 are already known from Pyrococ- particularly questionable since it requires the presence of two cus (Fig. 2, Supplementary Listing 20). However, only the successive wobble base-pairs closing the pseudo-upper stem sRNA from the Pae HACA-GUIDE55 subfamily (or equiv- while the presence of a second U-G base-pair is not supported alent) are expected to be productive. All the H/ACA motifs by phylogenetic data (Supplementary Fig. S14 and S15). Al- from the Pae HACA-GUIDE55 subfamily exhibit 10 base- ternative folds that would restore a productive H/ACA-like pairs between the internal loop and the K-turn motif. On motif can be proposed by excluding some nucleotides from the other hand, the Pae HACA-GUIDE56 subfamily only in- the stacking layers of the stem (Fig. 9, Supplementary List- cludes 9 base-pairs, the only two members of this subfamily ing 22). However, there is no compatible compensation that bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 15/146

PyrHACA-GUIDE55 PyrHACA-GUIDE56 B subfam_weight=0.824695 C subfam_weight=0.26234 82% 26% C G C G Y C Y 5 nt U G 6 nt R A C R R C 5 nt (6-1) G 5 nt (6-1) C G R Y G C G G Pae_sR201 (+) Pca_sR202b(+) Pca_sR202a(-) A U C U C C G U G U Pyr_HACA C A C G C G G C G C G C (sR201 & sR202) 30 C G 30 C G 30 C G C G C G 40 C G 40 17-18 nt U G C C G C G C G C G 40 G A G A G U G C U U U A G G K-turn G C G C K-turn A G G K-turn G G A U A G U A G U G A G A G A 0-1 nt C A R A C G R Y U A U A 50 C G 50 U R 20 G C 50 10 G C 10 U A C G 20 G C 10 bp 20 U G 9 G C G C 10 bp C G 10 bp U G 9 bp G C G C G C C G G Y G C C G C G C G U G C U C G G G U G U 5 5 C G 5' guide sequence 3' guide sequence 5' guide sequence C A 3' guide sequence 5' guide sequence A 5 nt 5-6 nt C C A (3 stacked layers +2nt) (5 stacked layers) (3 stacked layers +2nt) A C (5 stacked layers) (3 stacked layers +2nt) C 6 G C G G G G 3' guide sequence G A C G 60 C (5 stacked layers) , 60 U A 10 G C 10 G C G Y G 10 C G 60 C G C G Y C G C U 0-1 nt 8 C U 8 C G G C G C 8 C G 8 bp C G 8(+) bp C U 8(+1) bp C U 8(+1) bp R G C A G A G G U G 70 ANA box A G A G ANA box 0-1 nt A 70 ANA box A A C U G U G 70 5' A C A 5' A A C A 5' A C C A 5' A C C A Figure 6. Canonical H/ACA Motifs in Pyrobaculum (Pae sR201 & Pae sR202). A. Consensus structure of the H/ACA motifs. B. Structural subfamily Pae HACA-GUIDE55. C. Structural subfamily Pae HACA-GUIDE56.

would allow Pae sR207 to target L2596. Alternative targets were identified in 16S, 23S rRNAs and tRNAs (Table S3) but Table 2. Functional and structural features from known half of the guide-target pairs exhibit a weak 5’ duplex (below H/ACA guide RNAs in Pyrobaculum targeting rRNAs and the energy cutoff of 6 kcal/mol) and the other half some tRNAs. pairing anomaly(ies) similar to those already described in P.

b

e abyssi.

e d

a

b a c 2.5 New H/ACA(-like) guide candidates in Pyrococ- cus guide-target model structure-function Based on the H/ACA-like motifs present in Pyrobaculum pairs conservation expression promoter % GC content pseudouridylation hybrid energy duplex energy 5’ duplex energy Pae201-S934 + + + 73 + 10/5/8+ -29 -18 -10 genomes, the genome of P. abyssi was scanned for the pres- Pae202-L2562 + + + 67 + 10/7/6+ -27 -21 -7.5 ence of similar motifs in transcribed regions [33] (see Fig. 1). Pae203-S762 + + + 69 + 10/(14)+ -34 -20 -6.0 Pae204-L2023 + + + 63 + 10/(14)+ -37 -21 -19 Several new H/ACA(-like) motifs were identified and ranked Pae205-L2029 + + + 53 + 10/(14)+ -35 -19 -16 according to several criteria: GC-content, conservation, ex- Pae206-L1902 + + + 65 + 10/(15)+ -46 -30 -29 Pae207-L2597a + + + 67 ND 12/(13)- -27 -25 -30 pression, promoter strength [33]. Among the 8 new candidates Pae207-L2597b + + + 67 ND 10/(13)+ -24 -25 -30 thus identified, three of them look like canonical H/ACA mo- Pae207-L2596 + + + 67 ND 13/(12)- -26 -23 -27 Pae208-L1978 + + + 59 ND 10/(14)+ -48 -28 -28 tifs with minor variations and the five other ones correspond to Pae209-S8 + + + 59 + 10/(14)+ -33 -24 -9.2 H/ACA-like motifs where the lower stem is absent or mostly Pae210-tR54 + + + 61 + 10/(15)+ -37 -17 -16 Pae210-tR55 + + + 61 + 9/(15)- -37 -18 -18 truncated (Fig. 10). Three of the motifs were initially con- ND: no direct experimental evidence. a as provided by the UCSC genome sidered of particular interest because they satisfy at least two browser [39, 40]. b as determined from high-throughput pyrosequencing of the criteria listed above: PabO1 (high expression, strong (expression) and experimental validation of RNA targets (productive) [30]. c structure-function model as proposed from the ’productive’/’non-productive’ promoter, conservation, medium GC-content, Supplementary classification; the +/- sign indicates whether it is predicted to be productive Fig. S16), PabO48 (high conservation, high expression, Sup- or not. d as calculated by RNAsnoop (kcal/mol) from the Vienna RNA package [37]. e the duplex energy includes both the 5’ and 3’ duplex energies plementary Fig. S17), PabO78 (high expression, strong pro- with a correction factor (+4.1 kcal/mol). moter, conservation: Supplementary Fig. S18). PabO48 looks like a canonical H/ACA motif while PabO1 and PabO78 are H/ACA-like motifs suggesting these non-canonical motifs bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 16/146

Pyr-BP1 Pyr-BP2 B subfam_weight=0.541517 C subfam_weight=0.388949 54% 39%

R Y R Y R R Y Y Y

6 nt 3-5 nt 6-7 nt 2-6 nt Par_sR208 Pca_sR204 A U C A U A C G G G C G A U C G 40 Pyr: sR203, sR204, sR205, sR206, C G 40 C G G C sR208, sR209, sR210 C G 30 G C A G 30 C G G G 15-20 nt R U U G C G U C U A K-turn G C G A G A G A K-turn U A G 50 G A G A 50 U U A G A A G A 10 bp 10 bp A C G 10 bp G C C G C G 0-2 nt 20 G U 0-1 nt G C 20 G C G C C G 60 C G R U G C 60 C G U G U 0-1 nt U U G U C A A C U C C G C R U G U 70 80 G C70 C 10 U A C 10U C G 5´ A C C A A A A A C G U G G U C A G C A 5´ G C U U A U U C U G C U U U U A U G C G G A A A 5´ guide sequence 3´ guide sequence 5´ guide sequence 3´ guide sequence Y 5´ 6-7 nt 2-6 nt Pyr-BPS Pyr-NONE D subfam_weight=0.284827 E subfam_weight=0.35436 28% 35% C R R Y Y 7 nt 2-5 nt R Y R Y C

6 nt 3-5 nt

Pae_sR203 Pae_sR206

U G G G U G A C G C G G C C G G C 40 C G 40 30 C G C G G C 30 G C A C A C G C U C G K-turn G C A G G K-turn U A G U G A G A U A 50 U G 50 G U C G G C 10 bp G C A10 bp G C A 20 C G G C C C G 20 C G G C G C 10 G 60 70 A C GG U A U G 5´ C C C C U C C C A U U A A U C A G U C G G U C C C C C U U G C A C A C G 60 5´ guide sequence 3´ guide sequence A U G G G C 10 G C C U C U 70 80 G C 5´ C C U U G U C C C U C U A G U U U A U A 5´ guide sequence 3´ guide sequence Figure 7. H/ACA-like Motifs in Pyrobaculum. A. Consensus structure of all the H/ACA-like motifs. B. Structural subfamily (Pyr-BP1) with one pseudo base-pair corresponding to the (n+6) position in the canonical lower stem. C. Structural subfamily (Pyr-BP2) with one pseudo base-pair corresponding to the (n+9) position in the canonical lower stem. D. Structural subfamily (Pyr-BPS) with two pseudo base-pairs corresponding to the (n+6) and (n+9) positions in the canonical lower stem. E. Structural subfamily (Pyr-NONE) without any pseudo base-pair. bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 17/146

Pyrobaculum sR207-L2597 Pae207-L2597

R A U C G G A C G C G C G C G G C C G 40 G C G C Y Y 30 G C G C Y C C G guide G U C A G G C U 5´ U G C C G C G G G G U U G G G A A G U 0-1 nt 0-1 nt G A G U G G C G C C C C A G C 5´ A 50 Y R A G 0-1 nt U R U U U A Y A U A 0-1 nt target G R C G G C 12 bp E = -26.6 kcal/mol A U 12 bp G C 20 G C G C G C G C G C G C G C 0-1 nt ANA box 10 70 ANA box G U G U 60 5´ Y R Y A R G G G C C G C G G G G U U A C A 5´ U U A G G A G A G U C G G G U G C C G C G G G G U U G C A C A 0-1 nt

U G C A G A C C C A U G G U U U G G C G C C C C A G C 5´ U G C A G A C C C A U G G U U U G G C G C C C C A G C 5´ LSU LSU 2597 2597 Figure 8. Pairing Model of guide-target Pae sR207-L2597

Pae_sR207 U G A Pae_sR207 U Pyrobaculum C G G A C G C G (a) C G 40 C G G C (b) C G 40 17-20 nt Y 30 G C G 17-20 nt Y G C U C C G 30 G C A G G C C U G K-turn A G C C G A A G G C C U G A G K-turn 0-1 nt 0-1 nt G A A G 50 Y R A 50 U 0-1 nt A G 1-2 nt 3 nt G A A U R U A A G Y A Y R 0-1 nt U A 0-1 nt U A A G R Y R U C G 0-1 nt C G G C 12 bp A U 12 bp G C 10 bp A U 10 bp G C 20 G C G C 20 G C G C G C G C G C G C G C G C G C G C G C 0-1 nt G C 0-1 nt G C G U U G U G60 G U U G U G60 5´ guide sequence G C 5´ guide sequence G C G C G C G G G 8-9 nt C 3´ guide sequence 8-9 nt G 3´ guide sequence 5-9 nt C 5-9 nt C G C G U G U G 10 G G 10 G G A G U G70 A G U G70 A U A U A U A U G G ANA box G G ANA box C G C C G C 5´ A C A 5´ U U A A C A 5´ A C A 5´ U U A A C A 3-4 nt 3-4 nt Figure 9. Alternative RNA folds for Pae sR207. (a) RNA fold proposed by Bernick et al. [30] vorresponding to a 12/13 model; (b) RNA fold compatible with a 10/(13) model. bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 18/146

H/ACA Motifs H/ACA-like Motifs

PabO48(+/-) 30 PabO1(-) K-loop PabT1(+) PabX2(+/-) K-loop A C A A K-loop C C U C U U A C A U A G C C U A C G 40 U U 30 A A C C G A G G U U U A U A 30 A A A A G 30 C G U G G A G A A K-turn 9 G G A U 10 bp A G U U A 10 G C A A U A 20 G C 20 U A 8 (+1) bp A U G U U A 40 U C C A 20 G G 40 10 U G 10 bp 10 C G U A G C 50 A G U G 8 (+2) bp G U C C G U A U 5´ guide sequence G C C C G 20 A C G C 5´ guide sequence 40 G (6 stacked layers +2nt) A A G U U G G G G 10 G G G U C A A U U 50 3´ guide sequence A G 6(5) 5´ guide sequence C A A C 3´ guide sequence U C G A A U G A G (5 stacked layers +1nt) G A U A G A U U C G 10 G C G G C 11 C U 6 50 C C 3´ guide sequence U G 10 A U A A 8 (+1) bp G C 9 bp C C 5´ guide sequence 3´ guide sequence C G C A G (5 stacked layers +1nt) A U A U C (6 stacked layers +2nt) A G A A G 10 A U 60 9 G C G C U C G C G C G C A 9 A U C U G 5 bp G U A U 50 A U U ANA box G C 4 bp U A U A A ANA box U A 70 5´ G G G C G A U A 4 G C A U 7 G C U 5´ C C A 5´ U A C U U U G A C A 60 5´ U U U 60 PabX3(+/-) PabX4(+/-) K-loop K-loop 30 U C G U G G A C A G U 30 A U A G A G G A G A A C U U U A C G 10 C 10 bp 20 U C 20 U A U G 10 bp C G 10 PabX5(+) C G C G U C 40 G A A U C G U A 40 A U U G G A U 40 A U C G U U G C C A U G 5´ guide sequence U C 3´ guide sequence 6 C C U C G U 5´ guide sequence G U A U A C 9 A A G A 30 U A A U 10 G G A U U A 3´ guide sequence 10 A C G A G 50 C G G U G G U A C U U K-turn 6 bp A G U C U 50 G C 6 (+1) bp U C A A (9) U A 50 G G ANA box A U 60 ANA box A G C G A U 6 U G 5´ C U C C C U C A 5´ C C U C C G 10 bp 10 U G 20 U U A U PabO78(-) A U A A A C A U 60 G C 30 C G U A U A A 5 A A C G 5´ guide sequence G U G A U 40 (6 stacked layers) C A 3´ guide sequence C A 10 C G (5 stacked layers) A U G A G U G U U G C A A K-turn 20 U A U A A U A 9 bp , 70 A U A U 9 C C A 9/(14) U G U G 8 (+1) bp A U 80 ANA box 5´ guide sequence 10 U A 50 60 3´ guide sequence 70ANA box U A U A 5´ G A G C A G A 5´ A U G A A C A U C A C G A G A G G G A A G C U G A A A U A G A A G C A Figure 10. New H/ACA(-like) motifs identified in Pyrococcus abyssi [33]. The functional annotation (+, - or +/-) indicates the relevance of the H/ACA(-like) candidate based on the following criteria: (1) productive fold (’10/6/9’, ’10/5/9’ or equivalent folds), (2) distance of 14 to 15 residues from the internal loop to the ANA box, (3) ANA box, (4) absence of alternative folds due to pairings in the internal loop. (+): all 4 criteria are met; (+/-) only 3 criteria are met; (-) less than 3 criteria are met. bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 19/146 might not be specific to Crenarchaea. However, both PabO1 other functional regions. or PabO78 adopt a non-productive fold (’9/10/9’ and ’9/14’, There is a total of 525 common CDS between P. abyssi respectively); the ANA box signature is present in both motifs and P. aerophilum but only 455 correspond to annotated genes but it is not located at the proper position in PabO1. There (after excluding hypothetical proteins). Among the 525 CDS, are 3 tRNA genes with 12- or 13-mer sequences that could there are 246 CDS targeted by H/ACA(-like) motifs from both be targeted by PabO1 (tRNA-ProCGG, tRNA-ProTGG, tRNA- P. abyssi and P. aerophilum (Fig. 11); among the 246 CDS, AsnTAA, data not shown) but missing a U at the proper position. 218 correspond to annotated genes. PabO48 is the only motif that could target a ribosomal posi- In the sublist of 246 genes which are potential common tion (L2047) but it does not carry any ANA box signature at targets in both Pyrococcus and Pyrobaculum (Fig. 11A-B), the expected position. the more statistically relevant functions (DAVID analysis with A fourth motif, PabT1 (Supplementary Fig. S19), also P value lower than 1E-4 and Benjamini score lower than meets the indicated criteria but it is embedded within the 1E-3, Fig. 11C-D) are: the amino-acid biosynthesis and ni- Pro trogen compound biosynthetic process (UniProt: KW-0028 tRNATGG gene and no potential target was found in the whole genome for this motif. The other 4 motifs are annotated PabX2 & GO:0044271, enrichment score of 4.09), zinc (UniProt: to PabX5 (Supplementary Fig. S20 to Fig. S23): they are all KW-0862, enrichment score of 2.68), nucleotide-binding and conserved in at least two Pyrococcus species and located in ATP-binding (UniProt: KW-0547 and KW-0067, enrichment expressed regions except PabX2. Only PabX4 has a strong score of 2.33). Interestingly, the gene of the 3-dehydroquinate promoter and only PabX5 corresponds to a canonical H/ACA synthase ( Gene ID: 1495344) found in the clus- motif. PabX5 is the only canonical motif that includes a ter of amino-acid biosynthesis was identified as potentially proper ANA box but there is no potential targeted sequence regulated by some guide RNA (C/D box or H/ACA-like) in in the genome. PabX2 and PabX3 can adopt alternative folds Pyrobaculum calidifontis [59]. Similarly, one gene corre- involving several pairings in the internal loop suggesting they sponding to the elongation factor EF-2 (ENTREZ Gene ID: may be degenerated H/ACA-like motifs or just false-positive 1495166) and two genes corresponding to tRNA-synthetases candidates. are potentially regulated by guide RNAs in different species of Pyrobaculum [59]. Among the genes candidates clustered in the nucleotide-binding function, we also find EF-2 and 2.6 Extra-Ribosomal Targets from H/ACA(-like) RNAs 9 tRNA-synthetases which makes this latter class of genes in Pyrococcus and Pyrobaculum the more abundant category among all gene candidates and The full genomes of P. abyssi and P. aerophilum were ex- perhaps the more relevant candidates to test. plored to identify extra-ribosomal targets associated with In the extra-ribosomal targets, the guide-target hybrid du- H/ACA motifs from P. abyssi and H/ACA(-like) motifs from P. plexes generally correspond to sub-optimal hits including aerophilum (see Materials and Methods). A large number of more wobble pairs and/or bulge(s) and mismatch(es); the potential targets were identified since the sequence constraints more frequently involved H/ACA(-like) RNAs are: Pab19 for for guide-target pairing are rather loose. In Pyrococcus abyssi, P. abyssi and Pae sR207 for P. aerophilum. As an example, more than 4000 hits were obtained: only 37% of those corre- the leucyl-tRNA synthetase (ENTREZ Gene ID: 1496220) spond to productive folds (‘10/5/9’ or ‘10/6/9’ or equivalent is shown as a potential gene target (Supplementary data: folds), the remaining 63% correspond to non-productive alter- Fig. S26). As proposed previously in Pyrobaculum [59], some native folds for all the H/ACA motifs. Hits from both the pro- of the H/ACA(-like) RNAs may act as antisense RNAs but it ductive and non-productive folds are more frequently found is not clear whether the full-length RNA is involved [59]. The in 16S and 23S rRNAs that represent 0.3% of the genome data provided here suggest the H/ACA(-like) RNAs might size. The hits from the productive motifs have a 10-fold to potentially act as specific antisenses taking advantage of the a 3-fold enrichment (with respect to the expected number of H/ACA RNP machinery to stabilize some productive or non- targets by chance in rRNA) whether one considers or not the productive complexes with mRNA targets. overlapping targets (Fig. S24). On the contrary, there is an under-representation of targets in CDS regions (92% of the genome size) due to a similar proportion of targets (around 3. Conclusions 50%) located in antisense from CDS or in other functional A more precise structural and functional classification of ar- regions. In P. aerophilum, more than 6,000 hits were obtained: chaeal H/ACA guide RNAs is proposed based on the X-ray hits from both the productive and non-productive folds are data of H/ACA RNPs and 2D RNA predictions supported only slightly more frequent in 16S and 23S rRNAs (2 to 5- by the phylogenetic data of the known H/ACA(-like) motifs fold over-representation). In hits from productive folds, the and their target(s); it is consistent with all the current data targets located in 16S and 23S rRNAs represent 0.3% of the available up to now in both Euryarchaea and Crenarchaea. non-overlapping targets while these regions of the genome One of the major structural determinants is the presence of a represent 0.2% of the genome size (Fig. S25). The number of stretch of 10 stacked layers in the upper stem from the closing targets in the CDS regions is much more under-represented base-pair of the pseudo-uridylation pocket to the G-A sheared due to a large number of targets in antisense from CDS or base-pairs of the K-turn or K-loop motif. A distance of 14 or bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 20/146

A B C D

Figure 11. Venn Diagrams of CDS and Associated Functions Targeted by H/ACA(-like) folds from P. abyssi (Pab) and P. aerophilum (Pae). A. CDS genes/RNA genes targeted by Pab and Pae (full list of genes: Listing 23). B. CDS targeted by productive/non-productive H/ACA(-like) motifs from Pab and Pae (full list of CDS: Listing 24). C. Biological functions or processes (UniProt keywords) targeted by productive/non-productive H/ACA(-like) folds from Pab and Pae (full list of keywords: Listing 25). D. Biological functions or processes (GO terms) targeted by productive/non-productive H/ACA(-like) folds from Pab and Pae (full list of GO terms: Listing 26).

15 nucleotides from the same closing base-pair to the ANA Acknowledgments box proposed previously as another structural determinant [5] is the result of the conformational flexibility at the junction The bioinformatics workflow integrates S-MART scripts which between the internal loop and the lower stem. The 3’ single- were developed by Matthias Zytnicki. Perl scripts were also stranded region of the internal loop usually varies between 5 provided by Olivier Lespinet. The authors are members of the and 6 nucleotides in the typical RNA folds: in the GUIDE65 french network “GDR-Archaea”. subfamily (5 nucleotides) the helical twist is more pronounced at the junction than in the GUIDE66 subfamily (6 nucleotides) and compensates for the shorter 3’ single-stranded region. Fi- References nally, the stability of the 5’ duplex also appears to play a key [1] Thean-Hock Tang, Jean-Pierre Bachellerie, Timofey role in the formation of productive RNA-RNA complexes. It Rozhdestvensky, Marie-Line Bortolin, Harald Huber, probably contributes to tether the uridine to be modified in the Mario Drungowski, Thorsten Elge, Jurgen¨ Brosius, and pseudo-uridylation pocket and to restrain its mobility in the Alexander Huttenhofer.¨ Identification of 86 candidates catalytic site of Cbf5. for small non-messenger RNAs from the archaeon Ar- A more detailed structural analysis of H/ACA-like guide chaeoglobus fulgidus. Proceedings of the National RNAs in the Pyrobaculum genus reveals a strong similarity Academy of Sciences of the United States of America, with the canonical H/ACA guide RNAs previously identified 99(11):7536–7541, 2002. in both Euryarchaea and Crenarchaea. Two particular sRNAs [2] Ian K Blaby, Mrinmoyee Majumder, Kunal Chatterjee, (sR201 and sR202) can actually be considered as canonical Sujata Jana, Henri Grosjean, Valerie´ de Crecy-Lagard,´ H/ACA guide RNAs. The other H/ACA-like guide RNAs and Ramesh Gupta. Pseudouridine formation in archaeal (sR203 to sR210) can be classified in different subfamilies RNAs: The case of Haloferax volcanii. RNA (New York, which retain to some degree the vestiges of the lower stem. NY), 17(7):1367–1380, 2011. The H/ACA(-like) motifs identified recently in P. abyssi [33] show a similar signature with a pseudo lower stem degener- [3] Daniel L Baker, Osama A Youssef, Michael I R Chastkof- ated by the incorporation of mismatches and bulges. Accord- sky, David A Dy, Rebecca M Terns, and Michael P Terns. ing to this current work, all the H/ACA(-like) motifs from RNA-guided RNA modification: functional organization P. aerophilum except one are classified as productive as ex- of the archaeal H/ACA RNP. Genes & Development, pected from the published data [30]. In the particular case 19(10):1238–1248, 2005. of Pae sR207, its role in the modification of the positions [4] Bruno Charpentier, Sebastien´ Muller, and Christiane L2596 and L2597 is not confirmed because there is no direct Branlant. Reconstitution of archaeal H/ACA small ri- experimental evidence; an alternative RNA fold that is con- bonucleoprotein complexes active in pseudouridylation. sistent with a 10/(13) model is proposed for the modification Nucleic Acids Research, 33(10):3133–3144, 2005. of L2597. The presence of genes which might be regulated by H/ACA(-like) RNAs, a hypothesis which was already pro- [5] Sebastien´ Muller, Fabrice Leclerc, Isabelle Behm- posed based on experimental evidence in Pyrobaculum [59], Ansmant, Jean-Baptiste Fourmann, Bruno Charpentier, remains to be tested in the case of gene candidates which are and Christiane Branlant. Combined in silico and exper- potential targets in both Pyrococcus and Pyrobaculum. imental identification of the Pyrococcus abyssi H/ACA sRNAs and their target sites in ribosomal RNAs. Nucleic Acids Research, 36(8):2459–2475, 2008. bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 21/146

[6] Maria A Zago, Patrick P Dennis, and Arina D Omer. The aNOP10 complex reveal crucial roles of the C-terminal expanding world of small RNAs in the hyperthermophilic domains of both proteins in H/ACA sRNP activity. Nu- archaeon Sulfolobus solfataricus. Molecular Microbiol- cleic Acids Research, 34(3):826–839, 2006. ogy, 55(6):1812–1828, 2005. [19] Sebastien´ Muller, Bruno Charpentier, Christiane Branlant, [7] Lennart Randau. RNA processing in the minimal and Fabrice Leclerc. A dedicated computational approach organism Nanoarchaeum equitans. Genome Biology, for the identification of archaeal H/ACA sRNAs. Methods 13(7):R63, 2012. in enzymology, 425:355–387, 2007. [8] P Thebault,´ S de Givry, T Schiex, and C Gaspin. Search- [20] Sebastien´ Muller, Alan Urban, Arnaud Hecker, Fabrice ing RNA motifs and their intermolecular contacts with Leclerc, Christiane Branlant, and Yuri Motorin. Defi- constraint networks. Bioinformatics (Oxford, England), ciency of the tRNA(Tyr):Psi 35-synthase aPus7 in Ar- 22(17):2074–2080, 2006. chaea of the Sulfolobales order might be rescued by [9] M Charette and M W Gray. Pseudouridine in RNA: what, the H/ACA sRNA-guided machinery. Nucleic Acids Re- where, how, and why. IUBMB Life, 49(5):341–351, 2000. search, 37(4):1308–1322, 2009. [21] [10] Junhui Ge and Yi-Tao Yu. RNA pseudouridylation: new Jean-Baptiste Fourmann, Anne-Sophie Tillault, Magali insights into an old modification. Trends in Biochemical Blaud, Fabrice Leclerc, Christiane Branlant, and Bruno Sciences, 38(4):210–218, 2013. Charpentier. Comparative study of two box H/ACA ri- bonucleoprotein pseudouridine-synthases: relation be- [11] Xinliang Zhao, Zhu-Hong Li, Rebecca M Terns, tween conformational dynamics of the guide RNA, en- Michael P Terns, and Yi-Tao Yu. An H/ACA guide RNA zyme assembly and activity. PLOS ONE, 8(7):e70313– directs U2 pseudouridylation at two different sites in the e70313, 2012. branchpoint recognition region in Xenopus oocytes. RNA [22] (New York, NY), 8(12):1515–1525, 2002. Rumana Rashid, Bo Liang, Daniel L Baker, Osama A Youssef, Yang He, Kathleen Phipps, Rebecca M Terns, [12] Xiaoju Ma, Chunxing Yang, Andrei Alexandrov, Eliza- Michael P Terns, and Hong Li. Crystal structure of a Cbf5- beth J Grayhack, Isabelle Behm-Ansmant, and Yi-Tao Nop10-Gar1 complex and implications in RNA-guided Yu. Pseudouridylation of yeast U2 snRNA is catalyzed by pseudouridylation and dyskeratosis congenita. Molecular either an RNA-guided or RNA-independent mechanism. Cell, 21(2):249–260, 2006. The EMBO Journal, 24(13):2403–2413, 2005. [23] Ling Li and Keqiong Ye. Crystal structure of an H/ACA [13] John Karijolich and Yi-Tao Yu. Converting nonsense box ribonucleoprotein particle. Nature, 443(7109):302– codons into sense codons by targeted pseudouridylation. 307, 2006. Nature, 474(7351):395–398, 2011. [24] Youssef and Terns. Dynamic interactions within sub- [14] Shivendra Kishore and Stefan Stamm. The snoRNA HBII- complexes of the H/ACA pseudouridylation guide RNP. 52 regulates alternative splicing of the serotonin receptor Nucleic Acids Research, 35(18):6196–6206, 2007. 2C. Science (New York, NY), 311(5758):230–232, 2006. [25] Priyatansh Gurha, Archi Joardar, Priyasri Chaurasia, and [15] Jipei Liao, Lei Yu, Yuping Mei, Maria Guarnera, Jun Ramesh Gupta. Differential roles of archaeal box H/ACA Shen, Ruiyun Li, Zhenqiu Liu, and Feng Jiang. Small proteins in guide RNA-dependent and independent pseu- nucleolar RNA signatures as biomarkers for non-small- douridine formation. RNA Biology, 4(2):101–109, 2007. cell lung cancer. Molecular cancer, 9:198, 2010. [26] [16] Timofey S Rozhdestvensky, Inna V Tchirkova, Jurgen¨ Bo Liang, Song Xue, Rebecca M Terns, Michael P Terns, Brosius, Jean-Pierre Bachellerie, and Alexander Hutten-¨ and Hong Li. Substrate RNA positioning in the archaeal hofer. Binding of L7Ae protein to the K-turn of archaeal H/ACA ribonucleoprotein complex. Nature Structural & snoRNAs: a shared RNA binding motif for C/D and Molecular Biology, 14(12):1189–1195, 2007. H/ACA box snoRNAs in Archaea. Nucleic Acids Re- [27] Jingqi Duan, Ling Li, Jing Lu, Wei Wang, and Keqiong search, 31(3):869–877, 2003. Ye. Structural mechanism of substrate RNA recruitment [17] C Charron, X Manival, A Clery,´ V Senty-Segault,´ B Char- in H/ACA RNA-guided pseudouridine synthase. Molecu- pentier, N Marmier-Gourrier, C Branlant, and A Aubry. lar Cell, 34(4):427–439, 2009. The archaeal sRNA binding protein L7Ae has a 3D struc- [28] Jing Zhou, Bo Liang, and Hong Li. Functional and struc- ture very similar to that of its eukaryal counterpart while tural impact of target uridine substitutions on the H/ACA having a broader RNA-binding specificity. Journal of ribonucleoprotein particle pseudouridine synthase. Bio- Molecular Biology, 342(3):757–773, 2004. chemistry, 49(29):6276–6281, 2010. [18] Xavier Manival, Christophe Charron, Jean-Baptiste Four- [29] Rajashekhar Kamalampeta and Ute Kothe. Archaeal mann, Franc¸ ois Godard, Bruno Charpentier, and Chris- proteins Nop10 and Gar1 increase the catalytic activity tiane Branlant. Crystal structure determination and site- of Cbf5 in pseudouridylating tRNA. Scientific Reports, directed mutagenesis of the Pyrococcus abyssi aCBF5- 2:663, 2012. bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 22/146

[30] David L Bernick, Patrick P Dennis, Matthias Hochsmann,¨ [41] Yann Ponty and Fabrice Leclerc. Drawing and Editing the and Todd M Lowe. Discovery of Pyrobaculum small Secondary Structure(s) of RNA. Methods in molecular RNA families with atypical pseudouridine guide RNA biology (Clifton, N.J.), 1269(Chapter 5):63–100, January features. RNA (New York, NY), 18(3):402–411, 2012. 2015. [31] Sebastien´ Muller. Development and application of compu- [42] Xiang-Jun Lu and Wilma K Olson. 3DNA: a versatile, in- tational methods dedicated to the identification of H/ACA tegrated software system for the analysis, rebuilding and box sRNAs within archaeal genomes and functional study visualization of three-dimensional nucleic-acid structures. of the corresponding RNAs and H/ACA sRNPs. PhD the- Nature Protocols, 3(7):1213–1227, 2008. sis, Universite´ Henri Poincare´ [Manuscript available from [43] R Lavery, M Moakher, J H Maddocks, D Petkeviciute, theses.fr (2007NAN10110)], 2007. and K Zakrzewska. Conformational analysis of nu- [32] Andre´ Lambert, Jean-Fred Fontaine, Matthieu Legen- cleic acids revisited: Curves+. Nucleic Acids Research, dre, Fabrice Leclerc, Emmanuelle Permal, Franc¸ ois Ma- 37(17):5917–5929, 2009. jor, Harald Putzer, Olivier Delfour, Bernard Michot, and [44] T J Macke. RNAMotif, an RNA secondary structure Daniel Gautheret. The ERPIN server: an interface to definition and search algorithm. Nucleic Acids Research, profile-based RNA motif identification. Nucleic Acids 29(22):4724–4735, 2001. Research, 32(Web Server issue):W160–5, 2004. [45] Matthias Zytnicki and Hadi Quesneville. S-MART, a [33] Claire Toffano-Nioche, Alban Ott, Estelle Crozat, An N software toolbox to aid RNA-Seq data analysis. PLOS Nguyen, Matthias Zytnicki, Fabrice Leclerc, Patrick ONE, 6(10):e25988, 2011. Forterre, Philippe Bouloc, and Daniel Gautheret. RNA at [46] 92 degrees C: The non-coding transcriptome of the hyper- Junxiang J Gao and Ji J Wang. Re-annotation of two thermophilic archaeon Pyrococcus abyssi. RNA Biology, hyperthermophilic archaea Pyrococcus abyssi GE5 and 10(7):1211–1220, 2013. Pyrococcus furiosus DSM 3638. Current Microbiology, 64(2):118–129, January 2012. [34] Sam Griffiths-Jones, Simon Moxon, Mhairi Marshall, [47] Ajay Khanna, Sean R Eddy, and Alex Bateman. Rfam: Kounthea´ Phok, Annick Moisan, Dana Rinaldi, Nicolas annotating non-coding RNAs in complete genomes. Nu- Brucato, Agamemnon J Carpousis, Christine Gaspin, and cleic Acids Research, 33(Database issue):D121–4, 2005. Beatrice´ Clouet-d’Orval. Identification of CRISPR and riboswitch related RNAs among novel noncoding RNAs [35] Paul P Gardner, Jennifer Daub, John G Tate, Eric P of the euryarchaeon Pyrococcus abyssi. BMC Genomics, Nawrocki, Diana L Kolbe, Stinus Lindgreen, Adam C 12:312, 2011. Wilkinson, Robert D Finn, Sam Griffiths-Jones, Sean R [48] Eddy, and Alex Bateman. Rfam: updates to the RNA Sebastian Will, Michael F Siebauer, Steffen Heyne, Jan families database. Nucleic Acids Research, 37(Database Engelhardt, Peter F Stadler, Kristin Reiche, and Rolf issue):D136–40, 2009. Backofen. LocARNAscan: Incorporating thermodynamic stability in sequence and structure-based RNA homology [36] Zasha Weinberg and Ronald R Breaker. R2R–software search. Algorithms for molecular biology : AMB, 8(1):14, to speed the depiction of aesthetic consensus RNA sec- 2013. ondary structures. BMC Bioinformatics, 12:3, 2011. [49] Stephan H Bernhart, Ivo L Hofacker, Sebastian Will, [37] Hakim Tafer, Stephanie Kehr, Jana Hertel, Ivo L Ho- Andreas R Gruber, and Peter F Stadler. RNAalifold: facker, and Peter F Stadler. RNAsnoop: efficient target improved consensus structure prediction for RNA align- prediction for H/ACA snoRNAs. Bioinformatics (Oxford, ments. BMC Bioinformatics, 9:474, 2008. England), 26(5):610–616, 2010. [50] Anne Wenzel, Erdinc¸ Akbasli, and Jan Gorodkin. [38] Ronny Lorenz, Stephan H Bernhart, Christian Honer¨ zu RIsearch: fast RNA-RNA interaction search using a sim- Siederdissen, Hakim Tafer, Christoph Flamm, Peter F plified nearest-neighbor energy model. Journal of Geron- Stadler, and Ivo L Hofacker. ViennaRNA Package 2.0. tology, 28(21):2738–2746, October 2012. Algorithms for molecular biology : AMB, 6(1):26, 2011. [51] Daniel A Dalquen and Christophe Dessimoz. Bidirec- [39] Patricia P Chan, Andrew D Holmes, Andrew M Smith, tional Best Hits Miss Many Orthologs in Duplication- Danny Tran, and Todd M Lowe. The UCSC Archaeal Rich Clades such as Plants and Animals. Genome Biology Genome Browser: 2012 update. Nucleic Acids Research, and Evolution, 5(10):1800–1806, 2013. 40(Database issue):D646–52, 2012. [52] Da Wei Huang, Brad T Sherman, Qina Tan, Joseph Kir, [40] Kevin L Schneider, Katherine S Pollard, Robert Baertsch, David Liu, David Bryant, Yongjian Guo, Robert Stephens, Andy Pohl, and Todd M Lowe. The UCSC Archaeal Michael W Baseler, H Clifford Lane, and Richard A Genome Browser. Nucleic Acids Research, 34(Database Lempicki. DAVID Bioinformatics Resources: expanded issue):D407–10, 2006. annotation database and novel algorithms to better extract bioRxiv preprint doi: https://doi.org/10.1101/016246; this version posted March 10, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Revisiting the Structure/Function Relationships of H/ACA(-like) RNAs: A Unified Model for Euryarchaea and Crenarchaea — 23/146

biology from large gene lists. Nucleic Acids Research, 35(Web Server issue):W169–W175, June 2007. [53] Da Wei Huang, Brad T Sherman, and Richard A Lem- picki. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Pro- tocols, 4(1):44–57, 2009. [54] M Terns and R Terns. Noncoding RNAs of the H/ACA family. Cold Spring Harbor symposia on quantitative biology, 71:395–405, 2006. [55] Henri Grosjean, Christine Gaspin, Christian Marck, Wayne A Decatur, and Valerie´ de Crecy-Lagard.´ RNomics and Modomics in the halophilic archaea Haloferax volcanii: identification of RNA modification genes. BMC Genomics, 9:470, 2008. [56] P N Borer, Y Lin, S Wang, M W Roggenbuck, J M Gott, O C Uhlenbeck, and I Pelczer. Proton NMR and structural features of a 24-nucleotide RNA hairpin. Biochemistry, 34(19):6488–6503, 1995. [57] Andre´ Barthel and Martin Zacharias. Conformational transitions in RNA single uridine and adenosine bulge structures: a molecular dynamics free energy simulation study. Biophysical Journal, 90(7):2450–2462, 2006. [58] Martine Roovers, Caryn Hale, Catherine Tricot, Michael P Terns, Rebecca M Terns, Henri Grosjean, and Louis Droogmans. Formation of the conserved pseu- douridine at position 55 in archaeal tRNA. Nucleic Acids Research, 34(15):4293–4301, 2006. [59] David L Bernick, Patrick P Dennis, Lauren M Lui, and Todd M Lowe. Diversity of Antisense and Other Non- Coding RNAs in Archaea Revealed by Comparative Small RNA Sequencing in Four Pyrobaculum Species. Frontiers in Microbiology, 3:231, 2012.