Consensus Folding of Aligned Sequences as a New Measure for the Detection of Functional RNAs by Comparative Genomics


doi:10.1016/j.jmb.2004.07.018 J. Mol. Biol. (2004) 342, 19–30

Stefan Washietl and Ivo L. Hofacker*

Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Währingerstraße 17, A-1090 Wien, Austria

Facing the ever-growing list of newly discovered classes of functional RNAs, it can be expected that further types of functional RNAs are still hidden in recently completed genomes. The computational identification of such RNA genes is, therefore, of major importance. While most known functional RNAs have characteristic secondary structures, their free energies are generally not statistically significant enough to distinguish RNA genes from the genomic background. Additional information is required. Considering the wide availability of new genomic data of closely related species, comparative studies seem to be the most promising approach. Here, we show that prediction of consensus structures of aligned sequences can be a significant measure to detect functional RNAs. We report a new method to test multiple sequence alignments for the existence of an unusually structured and conserved fold. We show for alignments of six types of well-known functional RNA that an energy score consisting of free energy and a covariation term significantly improves sensitivity compared to single sequence predictions. We further test our method on a number of non-coding RNAs from Caenorhabditis elegans/Caenorhabditis briggsae and seven Saccharomyces species. Most RNAs can be detected with high significance. We provide a Perl implementation that can be used readily to score single alignments and discuss how the methods described here can be extended to allow for efficient genome-wide screens.

© 2004 Elsevier Ltd. All rights reserved.
Keywords: conserved secondary structure; consensus structure prediction; non-coding RNAs; comparative genomics; randomizing multiple sequence alignments

*Corresponding author

Abbreviations used: MFE, minimum free energy; RUF, RNAs of unknown function; SGD, Saccharomyces Genome Database; ncRNA, non-coding RNA.

E-mail address of the corresponding author: [email protected]

0022-2836/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.

Introduction

In the past few years, our knowledge on the molecular and cellular functions of RNA has increased dramatically. In particular, the identification of numerous RNA transcripts that function directly as RNA without ever being translated to protein (non-coding RNAs; ncRNAs) has made clear that the traditional view of RNA must be extended profoundly. To mention just one example, the discovery of micro RNAs [1–3] has led to a new paradigm of RNA-directed gene expression regulation. There are many other examples of such new "RNA-genes" [4,5].

Another aspect of RNA function concerns cis-acting regulatory elements within protein-coding genes. A recent example is the regulation of metabolic pathways in bacteria through "riboswitches". These riboswitches occur in leader sequences of operons and interact directly with small metabolites in order to control protein expression [6].

These findings not only force experimental biologists to reconsider their strategies and methods, but also pose new challenges to bioinformatics. In particular, the computational identification of functional RNAs in genomes is a major, yet largely unsolved, issue.

Current methods mostly are based on similarity searches and are successful in the identification of functional RNAs that are members of already known families [7–11]. A more general approach that detects new classes of functional RNAs without relying on any a priori knowledge would be helpful. This, however, proved to be difficult. In contrast to protein-coding genes, which show strong statistical signals like open reading frames and codon bias, the primary sequences of functional RNAs seem to lack comparable signals completely.

Since most known functional RNAs depend on a defined secondary structure, it was suggested by Maizel and co-workers that functional RNAs have a more stable secondary structure than expected by chance [12–14]. However, efforts to build a general RNA gene finder based on secondary structure prediction failed. Rivas & Eddy had to conclude in an in-depth study on the subject that secondary structure alone is generally not significant enough for the detection of ncRNAs [15]. Some other statistical measures, partly derived from secondary structure predictions, have been proposed [16–18]. Still, additional information seems to be required for reliable predictions on a genome-wide scale.

The most promising source of information comes from comparative studies. Already, a number of complete genomes from closely related species are available. Some of them have been sequenced solely for the purpose of genome comparisons. Readily available sets for comparison are: more than 15 enteric bacteria [19,20], seven yeast species [21,22], two nematodes [23,24] and the two mammalian genomes from human [25] and mouse [26]. Facing the ever-growing pace of genome projects, even more can be expected in the near future.

QRNA is a program that makes use of this comparative information and scans pairwise alignments for conserved secondary structures using probabilistic models based on stochastic context-free grammars [27]. This approach has been applied successfully to predict candidates for non-coding RNAs in Escherichia coli and Saccharomyces cerevisiae, some of which could be verified experimentally [28,29].

Here, we propose an alternative method to assess a multiple sequence alignment for the existence of a conserved secondary structure. We compute an averaged folding energy of aligned sequences that also takes into account sequence covariations. Following the ideas of the Maizel group, we compare this to a set of random alignments in order to estimate if there is an unusually stable and conserved fold. We address the question of whether this can be a significant measure to detect functional RNAs in genome-wide screens.

Results and Discussion

MFE predictions for single sequences are of limited statistical significance

Secondary structure is a useful level on which to understand RNA function. Fairly reliable models can be predicted with computational methods. Since many known functional RNAs are tied to a defined secondary structure, such predictions appear a straightforward measure for their detection. However, prediction programs readily calculate minimum free energy (MFE) structures also for arbitrary random sequences. The question arises of whether natural RNAs are more stable (have lower MFE) than random sequences. This question has been partly addressed [15]. Here, we test it again for sequences from a set of six structural RNA families (tRNA, 5 S rRNA, hammerhead ribozyme type III, group II catalytic intron, signal recognition particle RNA, U5 spliceosomal RNA). We used RNAfold for the prediction and calculated z-scores from a sample of 100 random sequences (see Methods). The results are shown in Table 1. On average, the structural RNAs all have z-scores clearly below zero, meaning they have lower folding energy than the random samples. Is this significant enough to reliably distinguish single sequences from the random background? Figure 2 illustrates this for the tRNA test set. The topmost panel shows the distribution of z-scores for 579 tRNAs together with the z-scores of 579 random sequences (one shuffled version for each tRNA). If we use a conservative limit of −4 to define a significant z-score, we can detect only 2% of the tRNAs. To detect half of all tRNAs we would have to lower the cutoff to −1.8. Then, however, we would encounter 4% of false positives. For genome-wide screens where a huge number of candidates has to be scored, this selectivity is too low (especially for a corresponding sensitivity of only 50%). Some of the tested families form more stable structures (e.g. group II catalytic intron, average z = −3.88; hammerhead ribozyme III, z = −3.08) but generally the native sequences are not efficiently separated from the bulk of random sequences.

An additional point seems noteworthy regarding these experiments. Workman & Krogh [30] pointed out that dinucleotide content influences secondary structure predictions, because of the energy contributions of stacked base-pairs. A correct randomization procedure should, therefore, generate random sequences of the same dinucleotide content. It is impossible to consider this in the randomization of multiple sequence alignments (see the next section). For single sequences, however, we performed the z-score calculations with both mono- and dinucleotide shuffled random sequences. The results (Table 1) show that a systematic bias is not recognizable for our test sets. The values differ only minimally and the mononucleotide-shuffled z-scores are not necessarily below the dinucleotide-shuffled scores. Thus, while dinucleotide composition was important in the study by Workman & Krogh, where long (>500 nt) mRNAs are tested for an (obviously non-existent) subtle bias towards lower folding energies, it can be neglected in our case.

Additional information from aligned sequences shifts MFE predictions towards significant levels

The results so far show that folding energy is indeed a characteristic signal of (structural)

Table 1. The z-scores and detection sensitivities for single and aligned sequences of various functional RNAs

ncRNA type Single sequence
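The single-sequence test described in the Results above (a native folding energy compared against 100 shuffled versions of the same sequence, then a cutoff applied to the resulting z-scores) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Perl implementation mentioned in the abstract: the function names are hypothetical, the z-score is taken in its usual form (native value minus background mean, divided by background standard deviation), only mononucleotide shuffling is shown, and `toy_energy` is a deliberately crude stand-in for a real MFE predictor so that the example runs without external software.

```python
import random
import statistics
from typing import Callable, Sequence

# Watson-Crick pairs only; an assumption made to keep the toy model small.
_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def toy_energy(seq: str) -> float:
    """Toy stand-in for an MFE predictor (NOT a real energy model).

    Scores -1 for every complementary pair formed when the sequence is
    folded back on itself as a single hairpin (position i with n-1-i).
    """
    n = len(seq)
    return -float(sum((seq[i], seq[n - 1 - i]) in _PAIRS for i in range(n // 2)))


def shuffle_mononucleotide(seq: str, rng: random.Random) -> str:
    """Return a random permutation of seq; only base composition is preserved."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def z_score(
    seq: str,
    energy: Callable[[str], float],
    n_random: int = 100,
    seed: int = 0,
) -> float:
    """z-score of the native energy against n_random shuffled sequences.

    Negative values mean the native sequence scores lower (more stable)
    than the shuffled background.
    """
    rng = random.Random(seed)
    background = [energy(shuffle_mononucleotide(seq, rng)) for _ in range(n_random)]
    mu = statistics.mean(background)
    sigma = statistics.stdev(background)
    if sigma == 0:
        raise ValueError("background energies are all identical; z-score undefined")
    return (energy(seq) - mu) / sigma


def fraction_below(z_scores: Sequence[float], cutoff: float) -> float:
    """Fraction of z-scores strictly below the cutoff.

    Applied to native sequences this estimates sensitivity; applied to
    shuffled controls it estimates the false-positive rate.
    """
    return sum(z < cutoff for z in z_scores) / len(z_scores)


if __name__ == "__main__":
    z = z_score("GGGGAAAACCCC", toy_energy)
    print(f"z-score of toy hairpin: {z:.2f}")
```

In practice the `energy` argument would wrap a real structure prediction program such as RNAfold, which the text names; that wrapper is not shown here because its details are not given in this section. Calling `fraction_below` once on the native z-scores and once on the shuffled controls, with the same cutoff, gives the kind of sensitivity and false-positive pair quoted for the tRNA test set.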