Noncoding RNA Slides
Total Page:16
File Type:pdf, Size:1020Kb
ncRNA: Interest extensive noncoding sequence conservation Modeling and Searching even more extensive transcription for Non-Coding RNA “invisible” structural conservation? many RNA binding proteins W.L. Ruzzo examples: microRNAs, riboswitches http://www.cs.washington.edu/homes/ruzzo Bottom line: important regulatory roles Outline Why RNA? Examples of RNA biology Computational Challenges Modeling Fig. 2. The arrows show the situation as it seemed in 1958. Solid arrows represent Search probable transfers, dotted arrows possible transfers. The absent arrows (compare Fig. 1) Inference represent the impossible transfers postulated by the central dogma. They are the three possible arrows starting from protein. 1 RNA Secondary Structure: The “Central Dogma” RNA makes helices too DNA RNA Protein U C A A Base pairs G C C G Protein A gene A U G C U A DNA C G C G A U (chromosome) RNA G C A (messenger) CA A A cell U “Classical” RNAs Non-coding RNA mRNA Messenger RNA - codes for proteins tRNA Non-coding RNA - all the rest rRNA Before, say, mid 1990’s, 1-2 dozen known (critically snRNA (small nuclear - spl icing) important, but narrow roles: e.g. tRNA) snoRNA (small nucleolar - guides for t/rRNA modifications) Since mid 90’s dramatic discoveries RNAseP (tRNA maturation; ribozyme in bacteria) Regulation, transport, stability/degradation SRP (signal recognition particle; co-translational E.g. “microRNA”: ≈ 100’s in humans targeting of proteins to membranes) By some estimates, ncRNA >> mRNA telomerases 2 Bacteria Triumph of proteins 80% of genome is coding DNA Functionally diverse receptors motors catalysts regulators (Monod & Jakob, Nobel prize 1965) … . e 3 Gene Regulation: The MET Repressor , l a t e The , Riboswitch s t r e protein b l alternative A way SAM SAM Grundy & Henkin, Mol. Microbiol 1998 Epshtein, et al., PNAS 2003 Winkler et al., Nat. Struct. Biol. 2003 Protein Alberts, et al, 3e. DNA 3 . e e 3 3 , , l l a a t t e The e The , Riboswitch , Riboswitch s s t t r r e protein e protein b b l alternatives l alternatives A way A way SAM-II SAM-III SAM-I SAM-I SAM-II Fuchs et al., Corbino et al., NSMB 2006 Genome Biol. 2005 Grundy, Epshtein, Winkler Grundy, Epshtein, Winkler Corbino et al., et al., 1998, 2003 et al., 1998, 2003 Genome Biol. 2005 . e 3 , l a t e The , Riboswitch s t r e protein b l alternatives A way Widespread, deeply conserved, structurally boxed = confirmed sophisticated, functionally diverse, biologicariblloyswitch SAM-III important uses for ncRNA throughout (+2 more) SAM-I SAM-II SAM-IV prokaryotic world. Grundy, Epshtein, Winkler Corbino et al., Fuchs et al., Weinberg et al., et al., 1998, 2003 Genome Biol. 2005 NSMB 2006 RNA 2008 Weinberg, et al. Nucl. Acids Res., July 2007 35: 4809-4819. 4 RNA on the Rise Human Predictions In humans Evofold more RNA- than DNA-binding proteins? S Pedersen, G Bejerano, A Siepel, K Rosenbloom, much more conserved DNA than coding K Lindblad-Toh, ES Lander, J Kent, W Miller, D Haussler, "Identification and classification of MUCH more transcribed DNA than coding conserved RNA secondary structures in the In bacteria human genome." PLoS Comput. Biol., 2, #4 regulation of M ANY genes involves RNA (2006) e33. dozens of classes & thousands of new 48,479 candidates (~70% FDR?) examples in just last 5 years RNAz FOLDALIGN S Washietl, IL Hofacker, M Lukasser, A Hutenhofer, E Torarinsson, M Sawera, JH Havgaard, M PF Stadler, "Mapping of conserved RNA secondary structures predicts thousands of functional Fredholm, J Gorodkin, "Thousands of noncoding RNAs in the human genome." Nat. corresponding human and mouse genomic Biotechnol., 23, #11 (2005) 1383-90. regions unalignable in primary sequence 30,000 structured RNA elements contain common RNA structure." Genome 1,000 conserved across all vertebrates. Res., 16, #7 (2006) 885-9. ~1/3 in introns of known genes, ~1/6 in UTRs 1800 candidates from 36970 (of 100,000) ~1/2 located far from any known gene pairs 5 Fastest Human CMfinder Torarinsson, Yao, Wiklund, Bramsen, Gene? Hansen, Kjems, Tommerup, Ruzzo and Gorodkin. Comparative genomics beyond sequence based alignments: RNA structures in the ENCODE regions. Genome Research, Feb 2008, 18(2):242-251 PMID: 18096747 6500 candidates in ENCODE alone (better FDR, but still high) Origin of Life? Origin of Life? RNA can carry information too (RNA double helix) Life needs RNA can form complex structures information carrier: DNA RNA enzymes exist (ribozymes) molecular machines, like enzymes: Protein making proteins needs DNA + RNA + proteins The “RNA world” hypothesis: making (duplicating) DNA needs proteins 1st life was RNA-based Horrible circularities! How could it have arisen in an abiotic environment? Some extant RNAs are relicts of that origin; some are “modern” inventionsrel 6 ncRNA Example: Xist ncRNA Example: 6S large (12kb?) medium size (175nt) largely unstructured RNA structured required for X-inactivation in mammals highly expressed in E. coli in certain growth conditions sequenced in 1971; function unknown for 30 years 6S mimics an open promoter ncRNA Example: IRE Iron Response Element: a short conserved stem- loop, bound by iron response proteins (IRPs). Found in UTRs of various mRNAs whose products are involved in iron metabolism. E.g., the mRNA of ferritin (an iron storage protein) contains one IRE in its 5' UTR. When iron concentration is low, IRPs bind the ferritin mRNA IRE, repressing translation. Binding of multiple IREs in the 3' and 5' UTRs of the transferrin receptor (involved in iron acquisition) leads to increased mRNA E.coli stability. These two activities form the basis of iron homeostasis in the vertebrate cell. Barrick et al. RNA 2005 Trotochaud et al. NSMB 2005 Willkomm et al. NAR 2005 7 Iron Response Element ncRNA Example: MicroRNAs IRE (partial seed alignment): short (~22 nt) unstructured RNAs excised from Hom.sap. GUUCCUGCUUCAACAGUGUUUGGAUGGAAC Hom.sap. UUUCUUC.UUCAACAGUGUUUGGAUGGAAC ~75nt precursor hairpin Hom.sap. UUUCCUGUUUCAACAGUGCUUGGA.GGAAC approx antisense to mRNA targets, often in 3’ UTR Hom.sap. UUUAUC..AGUGACAGAGUUCACU.AUAAA Hom.sap. UCUCUUGCUUCAACAGUGUUUGGAUGGAAC regulate gene activity, e.g. by destabilizing (plants) Hom.sap. AUUAUC..GGGAACAGUGUUUCCC.AUAAU Hom.sap. UCUUGC..UUCAACAGUGUUUGGACGGAAG or otherwise suppressing (animals) message Hom.sap. UGUAUC..GGAGACAGUGAUCUCC.AUAUG Hom.sap. AUUAUC..GGAAGCAGUGCCUUCC.AUAAU hundreds and growing, each w/ perhaps 10x Cav.por. UCUCCUGCUUCAACAGUGCUUGGACGGAGC targets Mus.mus. UAUAUC..GGAGACAGUGAUCUCC.AUAUG Mus.mus. UUUCCUGCUUCAACAGUGCUUGAACGGAAC Some conserved human to worm; Mus.mus. GUACUUGCUUCAACAGUGUUUGAACGGAAC Rat.nor. UAUAUC..GGAGACAGUGACCUCC.AUAUG others evolving rapidly Rat.nor. UAUCUUGCUUCAACAGUGUUUGGACGGAAC SS_cons <<<<<...<<<<<......>>>>>.>>>>> ncRNA Example: T-boxes ncRNA Example: Riboswitches • UTR structure that directly senses/binds small molecules & regulates mRNA • widespread in prokaryotes • some in eukaryotes 8 Example: Glycine Regulation The Glycine Riboswitch • How is glycine level regulated? • Actual answer (in many bacteria): • Plausible answer: gce g gce protein protein g g g g g g g 5′ 3′ TF g gce mRNA TF glycine cleavage enzyme gene DNA glycine cleavage enzyme gene DNA transcription factors (proteins) bind to DNA to turn nearby genes on or off Mandal et al. Science 2004 (Mandal, Lee, Barrick, Weinberg, Emilsson, Ruzzo, Breaker, Science 2004) 5’ gcvT ORF 3’ And… Fig. 3. Cooperative binding of two glycine molecules by the VC I-II RNA. Plot depicts the • More examples means better alignment fraction of VC II (open) and VC I-II (solid) bound to ligand versus the concentration of glycine. • Understand phylogenetic distribution The constant, n, is the Hill coefficient for the lines as indicated that best fit the aggregate data from four different regions (fig. S3). Shaded boxes demark the dynamic range (DR) of glycine • Find riboswitch in front of new gene concentrations needed by the RNAs to progress from 10%- to 90%-bound states. 9 Riboswitches Why? • ~ 20 ligands known; multiple nonhomologous • RNA’s fold, solutions for some and function • dozens to hundreds of instances of each • TPP known in archaea & eukaryotes • on/off; transcription/translation; splicing; • Nature uses combinatorial control what works • In some bacteria, more riboregulators identified than protein TFs • all found since ~2003 Outline Homology search • ncRNA: what/why? Sequence-based • What does computation bring? – Smith-Waterman – FASTA • How to model and search for ncRNA? – BLAST • Faster search • Better model inference Sharp decline in sensitivity at ~60-70% identity So, use structure, too 10 Impact of RNA homology search (Barrick, et al., 2004) Structure Prediction glycine riboswitch operon • Xray crystalography, NMR, etc B. subtilis • Comparative modeling L. innocua – Alignment & compensatory substitutions A. tumefaciens • Single-sequence Folding V. cholera – mfold, mccaskill, vienna… – est 50-70% accurate up to 200-300 nt M. tuberculosis • Multiple sequence alignment/folding (and 19 more species) Impact of RNA homology search (Barrick, et al., 2004) Using our glycine techniques, we RNA Informatics riboswitch operon found… B. subtilis • RNA: Not just a messenger anymore L. innocua – Dramatic discoveries – Hundreds of families (besides classics like A. tumefaciens tRNA, rRNA, snRNA…) – Widespread, important roles V. cholera • Computational tools important M. tuberculosis – Discovery, characterization, annotation (and 19 more species) (and 42 more species) – BUT: slow, inaccurate, demanding 11 Q: What’s so hard? A A G A A A