U7 Snrnas: a Computational Survey Introduction
Total Page:16
File Type:pdf, Size:1020Kb
Article U7 snRNAs: A Computational Survey Manja Marz1,AxelMosig2,3,B¨arbel M.R. Stadler3, and Peter F. Stadler1,4,5,6,7* 1 Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig D-04107, Germany; 2 Department of Combinatorics and Geometry, CAS/MPG Partner Institute for Computational Biology, Shang- hai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; 3 Max Planck Institute for Mathematics in the Sciences, Leipzig D-04103, Germany; 4 Interdisciplinary Center for Bioin- formatics, University of Leipzig, Leipzig D-04107, Germany; 5 Fraunhofer Institute for Cell Therapy and Im- munology, Leipzig D-04103, Germany; 6 Department of Theoretical Chemistry, University of Vienna, Vienna A-1090, Austria; 7 Santa Fe Institute, Santa Fe, NM 87501, USA. U7 small nuclear RNA (snRNA) sequences have been described only for a handful of animal species in the past. Here we describe a computational search for func- tional U7 snRNA genes throughout vertebrates including the upstream sequence elements characteristic for snRNAs transcribed by polymerase II. Based on the results of this search, we discuss the high variability of U7 snRNAs in both se- quence and structure, and report on an attempt to find U7 snRNA sequences in basal deuterostomes and non-drosophilids insect genomes based on a combination of sequence, structure, and promoter features. Due to the extremely short se- quence and the high variability in both sequence and structure, no unambiguous candidates were found. These results cast doubt on putative U7 homologs in even more distant organisms that are reported in the most recent release of the Rfam database. Key words: U7 snRNA, non-coding RNA, RNA secondary structure, evolution Introduction The U7 small nuclear RNA (snRNA) is the small- the only mRNAs that are processed in this way (4 ). est polymerase II transcript known to date, with a The 5 region of the U7 snRNA is complementary length ranging from only 57 nt (sea urchin) to 70 nt to the “histone downstream element” (HDE), located (fruit fly). Its expression level of only a few hun- just downstream of the conserved hairpin. The in- dred copies per cell in mammals is at least three teraction of the U7 snRNP with the HDE is crucial orders of magnitude smaller than the abundance of for the correct processing of the histone 3 elements other snRNAs. It is part of the U7 small nuclear ri- (1 ). The 3 part of the U7 snRNA is occupied by a bonucleoprotein (snRNP), which plays a crucial role modified binding domain for Sm proteins consisting in the 3 end processing of histone mRNAs (1 ). of a characteristic sequence motif followed by a con- Replication-dependent histone mRNAs in metazoa served stem-loop secondary structure motif (5 ). U7 are the only known eukaryotic protein-coding mR- snRNA binds five of the seven Sm proteins that are NAs that are not polyadenylated ending but con- present in spliceosomal snRNAs, while the D1 and D2 tain a conserved stem-loop sequence instead (2 ). subunits are replaced by the Sm-like proteins Lsm10 Beyond metazoan animals, non-polyadenylated his- and Lsm11 (6–8 ). This difference is likely to be as- tone genes have been described in the algae Chlamy- sociated with the differences in the Sm-binding se- domonas reinhardtii and Volvox carteri (3 ), and Dic- quence. Recently, the U7 snRNP has not only re- tyostelium discoideum has a homolog of the histone ceived considerable attention from a structural biol- RNA hairpin-binding protein/stem-loop-binding pro- ogy point of view (9 , 10 ), but also has been investi- tein (HBP/SLBP) (DictyBase: DDB0169192). It ap- gated as a means of modifying splicing dys-regulation. pears that replication-dependent histone mRNAs are In particular, U7 snRNA-derived constructs that tar- get a mutant dystrophin gene were explored as a gene- therapy approach to Duchenne muscular dystrophy *Corresponding author. (11 , 12 ). E-mail: [email protected] Geno. Prot. Bioinfo. Vol. 5 No. 3–4 2007 187 U7 snRNAs: A Computational Survey Given the attention received by histone RNA 3 genes (16 , 32 ). Notably, several variant U7 snRNA end processing and the protein components of the U7 sequences from human HeLa cells were reported in snRNP, it may come as a surprise that the U7 snRNA Yu et al (17 ). This might indicate that the human itself has received little attention in the last decades. genome, in apparent contrast to mouse, also contains In fact, the only two experimentally characterized more than one functional U7 snRNA gene, or that mammalian U7 snRNAs are those of mouse (13–16 ) some of the pseudogenes are transcribed at low levels. and human (1 , 17 ), while most of the earliest work Table 1 therefore lists the number of U7-associated on U7 snRNPs concentrated on the sea urchin Psam- loci obtained by BLAST searches that use the pre- mechinus miliaris (18–21 )andtwoXenopus species sumably functional gene from the same species as (22–24 ). More recently, the U7 snRNA sequences query. This number can be fairly large in some mam- have been reported for Drosophila melanogaster (25 ) malian lineages, reaching almost 100 loci in primates. and Takifugu rubripes (26 ). In contrast, in most species there are only a few U7- We are aware of only two studies that consid- associated sequences, most of which are readily rec- ered U7 snRNA from a bioinformatics point of view. ognizable as retrogenes by virtue of poly-A tails. In L¨uck et al (27 ), the U7 snRNA was used as In several genomes, we were not able to find an an example for the application of the Construct unambiguous candidate for a functional U7 snRNA, tool to compute consensus secondary structures, and although we found sequences that are clearly derived Bompf¨unewerer et al (28 ) briefly reported on a from U7 but are not accompanied by a recogniz- BLAST-based homology search that uncovered can- able proximal sequence element (PSE). Examples in- didate sequences for chicken and two teleost fish. clude Sorex araneus and platypus. Most likely, these The U7 snRNP-dependent mode of histone end BLAST hits are pseudogenes, although many of them processing is a metazoan innovation (2 , 6 ). Never- are annotated with Ensembl gene IDs. This annota- theless, the most recent release of the Rfam database tion derives from sequence homology with the exam- (29 ) (Version 8.0; February 2007) lists sequences from ples stored in the Rfam database. In Figure 3 and eukaryotic protozoa, plants, and even bacteria. This Table 1, we compile the results of our BLAST-based discrepancy prompted us to critically assess the avail- homology search, which contain only sequences that able information on U7 snRNAs. are either experimentally known to be expressed or are predicted to be functional genes based on the pres- ence of conserved upstream elements. Results Separate multiple sequence alignments of am- niotes, teleosts, frogs, sea urchins, and flies reveal Bona fide U7 snRNA sequences strong conservation of the Sm-binding motif, consist- ing of the deviant Sm-binding site RUUUNUCYNG The results of the BLAST-based searches are summa- and the 3 hairpin structure. Furthermore, the rized in Table 1. In most species, only a single gene histone-binding region contains a universally con- with clear snRNA-like upstream elements was found. served box UCUUU (33 ). Using these features as an- In addition, BLAST identified several pseudogenes. chors, we obtained the alignment in Figure 3, which Clusters of U7 snRNAs as previously described for highlights the differences between major clades. No- sea urchins and frogs were otherwise only found in table variations within the vertebrates are in particu- zebrafish (Figure 1). lar the A-rich 5 and the reduced stem in teleosts, and The short length and the substantial divergence of their A-rich sequences in the hairpin loop. The hair- the U7 snRNA sequences make it impossible to distin- pin region is very poorly conserved at the sequence guish functional U7 snRNAs from pseudogenes based level between vertebrates, sea urchins, and flies, al- on the U7 sequence alone. To make this distinction, it though its structural variation is limited in essence to is necessary to analyze the flanking sequences as well. the length of the stem and a few short interior loops Bona fide snRNA genes are accompanied by charac- or single-nucleotide bulges. teristic promoter elements (30 , 31 ). Figure 2 displays the consensus sequence motifs of the presumably func- More distant homologs? tional amniote U7 snRNAs. In human and mouse, several pseudogenes have The U7 snRNA sequences evolve rather fast. Together been described in detail in addition to the functional with the short sequence length, this limits the power 188 Geno. Prot. Bioinfo. Vol. 5 No. 3–4 2007 Marz et al. Table 1 Trusted U7 snRNA sequences* Species Assembly Sequence Start Stop Ori. Database ID ψ Mus musculus Ensembl 43 Chr.6 124,706,844 124,706,905 − ENSMUSG00000065217 27 Rattus norvegicus Ensembl 43 Chr.X 118,163,804 118,163,865 − ENSRNOG00000034996 31 Rattus norvegicus Ensembl 43 Chr.4 160,870,934 160,870,995 − ENSRNOG00000035016 31 Homo sapiens Ensembl 43 Chr.12 6,923,240 6,923,302 + ENSG00000200368 91 Macaca mulatta Ensembl 43 Chr.11 7,125,496 7,125,557 + ENSMMUG00000027525 95 Otolemur garnettii PreEnsembl 43 Scaffold 102959 117,572 117,633 − 0 Oryctolagus cuniculus Ensembl 43 GeneScaffold 1693 111,485 111,546 + 3 Procavia capensis NCBI TRACE 175719230 275 336 + – Loxodonta africana Ensembl 43 Scaffold 60301 4,254 4,314 − 2 Echinops telfairi Ensembl 43 GeneScaffold 2204 10,742 10,803 + ENSETEG00000020899 57 Felis catus Ensembl 43 GeneScaffold 69 192,907 192,968 + 7 Canis familiaris Ensembl 43 Chr.27 41,131,749 41,131,810 − ENSCAFG00000021852 2 Myotis lucifugus PreEnsembl 43 Scaffold 168837 32,294 32,356 − 0 Equus caballus PreEnsembl 43 Scaffold 58 7,463,562 7,463,623 + 0 Bos taurus Ensembl 43 Chr.5 10,349,126 10,349,187 − AAFC03061782 8 Tursiops truncatus NCBI TRACE 194072802 598 659 + – Dasypus novemcinctus Ensembl 43 GeneScaffold 1944 24,469 24,530 + 16 Spermophilus tridec.