Übung Bioinformatik von RNA- und Proteinstrukturen (Exercise: Bioinformatics of RNA and Protein Structures)

Ronny Lorenz
Bioinformatik, University of Leipzig
Leipzig, Germany, April 14, 2014

RNA secondary structure prediction
Secondary structures can be uniquely decomposed into loops.
[Figure: example structure with closing and interior base pairs marked; loop types shown: stacking pair, hairpin loop, interior loop, bulge, multiloop, exterior loop]

$E(S) = \sum_{L \in S} E(L)$
• The free energy of a secondary structure is the sum of the free energies of the loops it is composed of.
• Loop energies depend on loop type, loop size and sequence.
• Energy parameters are measured experimentally or extrapolated by mathematical models.

Decomposition scheme (drawn as arc diagrams on the slide; each matrix entry is the minimum over the alternatives listed):
• $F_{i,j}$: $i$ is unpaired ($F_{i+1,j}$), or $i$ pairs with some $k$ ($C_{i,k} + F_{k+1,j}$).
• $C_{i,j}$ ($i$ and $j$ paired): $(i,j)$ closes a hairpin loop; or an interior loop with the inner pair $(d,e)$ ($C_{d,e}$); or a multiloop, split at $u$ into $M_{i+1,u}$ and $M^1_{u+1,j-1}$.
• $M_{i,j}$ (part of a multiloop with at least one branch): $j$ is unpaired ($M_{i,j-1}$); or split at $u$ into a left part $i..u$, which is either unpaired or itself a multiloop part ($M_{i,u}$), and a right part $u+1..j$ that starts with a branch.
• $M^1_{i,j}$ (exactly one branch, starting at $i$): $j$ is unpaired ($M^1_{i,j-1}$), or $i$ pairs with $j$ ($C_{i,j}$).

Programs / program suites
• UNAFold / mfold (M. Zuker): http://mfold.rna.albany.edu/
  ...the ’inventor’ of the DP recursion scheme
• RNAstructure (D. Mathews): http://rna.urmc.rochester.edu/RNAstructure.html, http://rna.urmc.rochester.edu/NNDB/
  ...the energy parameter people (NNDB: Nearest Neighbor Database)
• ViennaRNA Package (I. Hofacker): http://www.tbi.univie.ac.at/RNA/, http://www.tbi.univie.ac.at/~ronny/RNA/
  ...a comprehensive collection and very fast implementation of RNA structure prediction algorithms

RNA/DNA sequence file formats
• Simple text file (ViennaRNA)
  CAAAGGCGACUCUCCUUAGACUCUAUAAAUAGUAAAUAGCUCCUAGGGACAAGGCUUACG
• SEQ file (RNAstructure)
  ; There can be any number of comments.
  A title must immediately follow on the next line and be on one line.
  AAA GCGG UUTGTT UTCUTaaTCTXXXXUCAGG1
  (the trailing 1 marks the end of the sequence)
• FASTA
  >some random sequence
  CAAAGGCGACUCUCCUUAGACUCUAUAAAU
  AGUAAAUAGCUCCUAGGGACAAGGCUUACG
• GenBank: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
• EMBL: http://www.ebi.ac.uk/ena/
• RNAML: XML-style storage of RNA sequence and structure data
• Clustal format for sequence alignments (used by many alignment programs)
• Stockholm format for sequence alignments (http://rfam.sanger.ac.uk/)

RNA/DNA structure file formats
• Dot-parenthesis (a.k.a. dot-bracket) notation
  UGGGAAUAGUCUCUUCCGAGUCUCGCGGGCGACGGGCGAUCUUCGAAAGUGGAAUCCGUA
  ..(((....((.(((.((((..((((.........))))..)))).))).))..)))...
• BPSEQ format (columns: position, nucleotide, pairing partner; 0 = unpaired)
  1 G 8
  2 G 7
  3 C 0
  4 A 0
  5 U 0
  6 U 0
  7 C 2
  8 C 1
• Connectivity table (CT) format (header: number of nucleotides; columns: position, nucleotide, previous position, next position, pairing partner, natural numbering)
  8
  1 G 0 2 8 1
  2 G 1 3 7 2
  3 C 2 4 0 3
  4 A 3 5 0 4
  5 U 4 6 0 5
  6 U 5 7 0 6
  7 C 6 8 2 7
  8 C 7 0 1 8
• RNAML, see also http://www.rnasoft.ca/strand/help.php

The ViennaRNA Package: www.tbi.univie.ac.at/~ronny/RNA
• Source code available
• Binary packages available (Fedora, Ubuntu, Debian, Windows)
• Fast and accurate: http://www.tbi.univie.ac.at/~ronny/RNA/performance.html
• Reads simple sequence files, FASTA, Clustal, and Stockholm formats
• Produces PostScript plots, dot-bracket structures, and other output
• ViennaRNA Websuite: rna.tbi.univie.ac.at

RNAfold
Compute minimum free energy structures (...and also the partition function, base pair probabilities, centroid and MEA structures, etc.)
• Sequence file, e.g. in FASTA format (sequence.fa):
  >some random sequence
  UGGGAAUAGUCUCUUCCGAGUCUCGCGGGCGA
  CGGGCGAUCUUCGAAAGUGGAAUCCGUA
• Run RNAfold (the minimum free energy in kcal/mol is printed in parentheses):
  $ RNAfold < sequence.fa
  >some random sequence
  UGGGAAUAGUCUCUUCCGAGUCUCGCGGGCGACGGGCGAUCUUCGAAAGUGGAAUCCGUA
  ..(((....((.(((.((((..((((.........))))..)))).))).))..)))... (-13.40)
• PostScript structure plot
  [Figure: PostScript plot of the MFE structure]

RNAplot
Draw and annotate RNA secondary structure plots
• Sequence/structure pair input, e.g.
output of RNAfold:
  >some random sequence
  UGGGAAUAGUCUCUUCCGAGUCUCGCGGGCGACGGGCGAUCUUCGAAAGUGGAAUCCGUA
  ..(((....((.(((.((((..((((.........))))..)))).))).))..)))... (-13.40)
• Run RNAplot with a different layout algorithm (-t 0 selects the simple radial layout instead of the default naview layout):
  $ RNAfold < sequence.fa | RNAplot -t 0
  [Figure: structure plots of the example sequence]
• Run RNAplot with the --pre="" option to create annotation macros:
  $ RNAfold < sequence.fa | RNAplot --pre=""
  Open the resulting PostScript plot in a text editor and investigate the macros.

RNAeval
Evaluate the free energy of a sequence/structure pair
• Simple check whether RNAfold and RNAeval give the same score:
  $ RNAfold < sequence.fa
  >some random sequence
  UGGGAAUAGUCUCUUCCGAGUCUCGCGGGCGACGGGCGAUCUUCGAAAGUGGAAUCCGUA
  ..(((....((.(((.((((..((((.........))))..)))).))).))..)))... (-13.40)
  $ RNAfold < sequence.fa | RNAeval
  UGGGAAUAGUCUCUUCCGAGUCUCGCGGGCGACGGGCGAUCUUCGAAAGUGGAAUCCGUA
  ..(((....((.(((.((((..((((.........))))..)))).))).))..)))... (-13.40)
• Evaluate at 20 °C with dangle model 1:
  $ RNAfold < sequence.fa | RNAeval -d 1 -T 20.
  UGGGAAUAGUCUCUUCCGAGUCUCGCGGGCGACGGGCGAUCUUCGAAAGUGGAAUCCGUA
  ..(((....((.(((.((((..((((.........))))..)))).))).))..)))... (-20.28)
• Verbose output (loop contributions in dcal/mol):
  $ RNAfold < sequence.fa | RNAeval -v
  External loop : -140
  Interior loop ( 3, 57) GC; ( 4, 56) GC: -330
  Interior loop ( 4, 56) GC; ( 5, 55) AU: -240
  Interior loop ( 5, 55) AU; ( 10, 52) UG: 380
  Interior loop ( 10, 52) UG; ( 11, 51) CG: -150
  Interior loop ( 11, 51) CG; ( 13, 49) CG: -10
  Interior loop ( 13, 49) CG; ( 14, 48) UA: -210
  Interior loop ( 14, 48) UA; ( 15, 47) UA: -90
  Interior loop ( 15, 47) UA; ( 17, 45) CG: 120
  Interior loop ( 17, 45) CG; ( 18, 44) GC: -240
  Interior loop ( 18, 44) GC; ( 19, 43) AU: -240
  Interior loop ( 19, 43) AU; ( 20, 42) GU: -60
  Interior loop ( 20, 42) GU; ( 23, 39) UA: 290
  Interior loop ( 23, 39) UA; ( 24, 38) CG: -240
  Interior loop ( 24, 38) CG; ( 25, 37) GC: -240
  Interior loop ( 25, 37) GC; ( 26, 36) CG: -340
  Hairpin loop ( 26, 36) CG : 400
  UGGGAAUAGUCUCUUCCGAGUCUCGCGGGCGACGGGCGAUCUUCGAAAGUGGAAUCCGUA
  ..(((....((.(((.((((..((((.........))))..)))).))).))..)))... (-13.40)
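The verbose RNAeval output shows the additive energy model $E(S) = \sum_{L \in S} E(L)$ at work: the 17 loop contributions add up to -1340 dcal/mol, i.e. the -13.40 kcal/mol printed in the last line. A minimal Python sketch that re-adds them, assuming every loop line ends with ": <integer>" as in the output above (the exact column layout may differ between ViennaRNA versions); `loop_energies` and `total_energy_kcal` are hypothetical helper names.

```python
import re

# Loop lines of the `RNAeval -v` output shown above (energies in dcal/mol).
VERBOSE = """\
External loop : -140
Interior loop ( 3, 57) GC; ( 4, 56) GC: -330
Interior loop ( 4, 56) GC; ( 5, 55) AU: -240
Interior loop ( 5, 55) AU; ( 10, 52) UG: 380
Interior loop ( 10, 52) UG; ( 11, 51) CG: -150
Interior loop ( 11, 51) CG; ( 13, 49) CG: -10
Interior loop ( 13, 49) CG; ( 14, 48) UA: -210
Interior loop ( 14, 48) UA; ( 15, 47) UA: -90
Interior loop ( 15, 47) UA; ( 17, 45) CG: 120
Interior loop ( 17, 45) CG; ( 18, 44) GC: -240
Interior loop ( 18, 44) GC; ( 19, 43) AU: -240
Interior loop ( 19, 43) AU; ( 20, 42) GU: -60
Interior loop ( 20, 42) GU; ( 23, 39) UA: 290
Interior loop ( 23, 39) UA; ( 24, 38) CG: -240
Interior loop ( 24, 38) CG; ( 25, 37) GC: -240
Interior loop ( 25, 37) GC; ( 26, 36) CG: -340
Hairpin loop ( 26, 36) CG : 400
"""

# Assumption: each loop line starts with the loop type and ends with ": <integer>".
LOOP_LINE = re.compile(r"^(External|Interior|Hairpin|Multi)\s+loop.*:\s*(-?\d+)\s*$")


def loop_energies(text):
    """Return (loop type, energy in dcal/mol) for every loop line in `text`."""
    energies = []
    for line in text.splitlines():
        match = LOOP_LINE.match(line.strip())
        if match:
            energies.append((match.group(1), int(match.group(2))))
    return energies


def total_energy_kcal(text):
    """E(S) as the sum of all loop energies, converted from dcal/mol to kcal/mol."""
    return sum(energy for _, energy in loop_energies(text)) / 100


if __name__ == "__main__":
    print(f"{len(loop_energies(VERBOSE))} loops, E(S) = {total_energy_kcal(VERBOSE):.2f} kcal/mol")
```

Running it prints `17 loops, E(S) = -13.40 kcal/mol`, the same value RNAfold and RNAeval report.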

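The unique loop decomposition from the first slide can be computed directly from a dot-bracket string. A minimal Python sketch (the helpers `pair_table` and `decompose` are illustrative, not ViennaRNA functions), assuming a pseudoknot-free structure: it walks every base pair (i, j), collects the pairs directly enclosed by it, and classifies the loop closed by (i, j) as hairpin, stack, bulge, interior loop or multiloop; one exterior loop is added on top.

```python
EXAMPLE = "..(((....((.(((.((((..((((.........))))..)))).))).))..)))..."


def pair_table(structure):
    """Return a 1-based list: pt[i] is the partner of position i, 0 if unpaired.

    Any character other than '(' and ')' is treated as unpaired.
    """
    pt = [0] * (len(structure) + 1)
    opened = []
    for i, ch in enumerate(structure, start=1):
        if ch == "(":
            opened.append(i)
        elif ch == ")":
            if not opened:
                raise ValueError(f"unmatched ')' at position {i}")
            j = opened.pop()
            pt[i], pt[j] = j, i
    if opened:
        raise ValueError(f"unmatched '(' at position {opened[-1]}")
    return pt


def decompose(structure):
    """Split a secondary structure into loops.

    Returns (loop type, closing pair) tuples; the exterior loop has no closing pair.
    """
    pt = pair_table(structure)
    loops = [("exterior", None)]
    for i in range(1, len(structure) + 1):
        j = pt[i]
        if j <= i:  # unpaired, or the closing bracket of a pair already handled
            continue
        branches = []  # base pairs directly enclosed by (i, j)
        k = i + 1
        while k < j:
            if pt[k] > k:
                branches.append((k, pt[k]))
                k = pt[k] + 1  # skip the substructure inside this branch
            else:
                k += 1
        if not branches:
            kind = "hairpin"
        elif len(branches) > 1:
            kind = "multi"
        else:
            p, q = branches[0]
            left, right = p - i - 1, j - q - 1  # unpaired bases on each side
            if left == 0 and right == 0:
                kind = "stack"
            elif left == 0 or right == 0:
                kind = "bulge"
            else:
                kind = "interior"
        loops.append((kind, (i, j)))
    return loops


if __name__ == "__main__":
    for kind, pair in decompose(EXAMPLE):
        print(kind, pair)
```

For the example structure this yields 1 exterior loop, 11 stacks, 4 interior loops and 1 hairpin. RNAeval -v reports stacks, bulges and interior loops all as "Interior loop", so these 17 loops correspond one-to-one to the 17 lines of its verbose output.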

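The $F$/$C$ part of the decomposition scheme can be tried out without an energy parameter file. The Python sketch below is a deliberate simplification: it assumes a toy energy of -1 per base pair (GC, AU and GU) and a minimum hairpin size of 3, which turns the recursion into Nussinov-style base-pair maximization. Hairpin, interior and multiloop energies, and with them $M$ and $M^1$, are left out, so this is not the loop-based model RNAfold uses. `fold`, `F` and `C` are hypothetical names.

```python
from functools import lru_cache

# Toy energy model (assumption): every allowed pair scores -1, nothing else counts.
ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
MIN_HAIRPIN = 3  # assumed minimum number of unpaired bases enclosed by a pair


def fold(seq):
    """Return (toy energy, dot-bracket) for one optimal structure of seq."""
    n = len(seq)

    def can_pair(i, k):
        return k - i - 1 >= MIN_HAIRPIN and (seq[i], seq[k]) in ALLOWED_PAIRS

    @lru_cache(maxsize=None)
    def F(i, j):
        # Best energy of seq[i..j] (0-based, inclusive); empty intervals score 0.
        if i > j:
            return 0
        best = F(i + 1, j)  # i unpaired
        for k in range(i + 1, j + 1):  # i pairs with k
            if can_pair(i, k):
                best = min(best, C(i, k) + F(k + 1, j))
        return best

    @lru_cache(maxsize=None)
    def C(i, j):
        # (i, j) is paired: one pair plus whatever the enclosed interval folds into.
        return -1 + F(i + 1, j - 1)

    # Traceback: follow one alternative that reproduces each optimal value.
    structure = ["."] * n
    todo = [(0, n - 1)]
    while todo:
        i, j = todo.pop()
        if i > j:
            continue
        if F(i, j) == F(i + 1, j):
            todo.append((i + 1, j))
            continue
        for k in range(i + 1, j + 1):
            if can_pair(i, k) and F(i, j) == C(i, k) + F(k + 1, j):
                structure[i], structure[k] = "(", ")"
                todo.append((i + 1, k - 1))
                todo.append((k + 1, j))
                break
    return F(0, n - 1), "".join(structure)


if __name__ == "__main__":
    print(fold("GGCAUUCC"))
```

For GGCAUUCC (the sequence of the BPSEQ/CT examples) it finds two base pairs, energy -2; because GU pairs are allowed, several optimal structures exist and the traceback returns only one of them. The memoized recursion uses Python's call stack, so very long sequences would need an iterative fill of the matrices instead.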
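The BPSEQ and CT examples in the structure file formats section encode the 8-nt sequence GGCAUUCC with the structure ((....)). A minimal Python sketch that produces both formats from a sequence/dot-bracket pair, assuming single-space-separated columns as on the slide (real CT files usually align their columns and put a title after the length); `pair_table`, `to_bpseq` and `to_ct` are hypothetical helper names.

```python
def pair_table(structure):
    """Return a 1-based list: pt[i] is the partner of position i, 0 if unpaired."""
    pt = [0] * (len(structure) + 1)
    opened = []
    for i, ch in enumerate(structure, start=1):
        if ch == "(":
            opened.append(i)
        elif ch == ")":
            if not opened:
                raise ValueError(f"unmatched ')' at position {i}")
            j = opened.pop()
            pt[i], pt[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if opened:
        raise ValueError(f"unmatched '(' at position {opened[-1]}")
    return pt


def to_bpseq(seq, structure):
    """BPSEQ: position, nucleotide, pairing partner (0 = unpaired), one line each."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure differ in length")
    pt = pair_table(structure)
    return "\n".join(f"{i} {seq[i - 1]} {pt[i]}" for i in range(1, len(seq) + 1))


def to_ct(seq, structure, title=""):
    """CT: header with the length (and optional title), then one line per position:
    index, nucleotide, previous index, next index, partner, natural numbering."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure differ in length")
    n = len(seq)
    pt = pair_table(structure)
    lines = [f"{n} {title}".rstrip()]
    for i in range(1, n + 1):
        nxt = i + 1 if i < n else 0  # 0 marks the 3' end
        lines.append(f"{i} {seq[i - 1]} {i - 1} {nxt} {pt[i]} {i}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_bpseq("GGCAUUCC", "((....))"))
    print(to_ct("GGCAUUCC", "((....))"))
```

Both outputs reproduce the slide examples line by line; the shared pair table is the same data structure the DP recursions index into.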