Übung Bioinformatik von RNA- und Proteinstrukturen (Exercise: Bioinformatics of RNA and Protein Structures)
Ronny Lorenz
Bioinformatics, University of Leipzig
Leipzig, Germany, April 14, 2014

RNA Secondary structure prediction

Secondary structures can be uniquely decomposed into loops.

[Figure: example loops, each annotated with its closing base pair and interior base pairs: stacking pair, hairpin loop, multi loop, interior loop, bulge, exterior loop]

RNA Secondary structure prediction

    E(S) = \sum_{L \in S} E(L)

• The free energy of a secondary structure is the sum of the free energies of the loops it is composed of
• Loop energies depend on loop type, loop size, and sequence
• Energy parameters are measured experimentally or extrapolated by mathematical models

RNA Secondary structure prediction

Decomposition scheme

[Figure: decomposition scheme showing how the entries F_{i,j}, C_{i,j}, M_{i,j}, and M^1_{i,j} are split into smaller subproblems over the indices i, j, k, d, e, and u]

Programs / Program suites

• Unafold / mfold (M. Zuker)
  http://mfold.rna.albany.edu/
  ...the 'inventor' of the DP recursion scheme
• RNAstructure (D. Mathews)
  http://rna.urmc.rochester.edu/RNAstructure.html
  http://rna.urmc.rochester.edu/NNDB/
  ...the energy parameter guys
• ViennaRNA Package (I. Hofacker)
  http://www.tbi.univie.ac.at/RNA/
  http://www.tbi.univie.ac.at/~ronny/RNA/
  ...comprehensive compilation and very fast implementation of RNA structure prediction algorithms

RNA/DNA sequence file formats

• Simple textfile (ViennaRNA)
    CAAAGGCGACUCUCCUUAGACUCUAUAAAUAGUAAAUAGCUCCUAGGGACAAGGCUUACG
• SEQ file (RNAstructure)
    ; There can be any number of comments.
    A title must immediately follow on the next line and be on one line.
    AAA GCGG UUTGTT UTCUTaaTCTXXXXUCAGG1
• FASTA
    >some random sequence
    CAAAGGCGACUCUCCUUAGACUCUAUAAAU
    AGUAAAUAGCUCCUAGGGACAAGGCUUACG
• GenBank
  http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
• EMBL
  http://www.ebi.ac.uk/ena/
• RNAML
  XML-style storage of RNA sequence and structure data
• Clustal format for sequence alignments (used by many alignment programs)
• Stockholm format for sequence alignments (http://rfam.sanger.ac.uk/)

RNA/DNA structure file formats

• dot parenthesis (a.k.a. dot-bracket) notation
    UGGGAAUAGUCUCUUCCGAGUCUCGCGGGCGACGGGCGAUCUUCGAAAGUGGAAUCCGUA
    ..(((....((.(((.((((..((((.........))))..)))).))).))..)))...
• BPSEQ format
    1 G 8
    2 G 7
    3 C 0
    4 A 0
    5 U 0
    6 U 0
    7 C 2
    8 C 1
• connectivity table (CT) format
    8
    1 G 0 2 8 1
    2 G 1 3 7 2
    3 C 2 4 0 3
    4 A 3 5 0 4
    5 U 4 6 0 5
    6 U 5 7 0 6
    7 C 6 8 2 7
    8 C 7 0 1 8
• RNAML

see also http://www.rnasoft.ca/strand/help.php

The ViennaRNA Package

www.tbi.univie.ac.at/~ronny/RNA

• Source code available
• Binary packages available (Fedora, Ubuntu, Debian, Windows)
• FAST and accurate!!!
  http://www.tbi.univie.ac.at/~ronny/RNA/performance.html
• Reads simple sequence files, FASTA, Clustal, and Stockholm formats
• Produces PostScript plots, dot-bracket structures, other output
• ViennaRNA Websuite: rna.tbi.univie.ac.at

RNAfold

Compute minimum free energy structures (...partition function, base pair probabilities, centroid and MEA structures, etc...)

• Sequence file, e.g. FASTA format (sequence.fa)
    >some random sequence
    UGGGAAUAGUCUCUUCCGAGUCUCGCGGGCGA
    CGGGCGAUCUUCGAAAGUGGAAUCCGUA
• Run RNAfold
    $ RNAfold < sequence.fa
    >some random sequence
    UGGGAAUAGUCUCUUCCGAGUCUCGCGGGCGACGGGCGAUCUUCGAAAGUGGAAUCCGUA
    ..(((....((.(((.((((..((((.........))))..)))).))).))..)))... (-13.40)
• PostScript structure plot
    [Figure: secondary structure plot of the predicted structure]

RNAplot

Draw and annotate RNA secondary structure plots

• Sequence/Structure pair input, e.g.
  output of RNAfold
    >some random sequence
    UGGGAAUAGUCUCUUCCGAGUCUCGCGGGCGACGGGCGAUCUUCGAAAGUGGAAUCCGUA
    ..(((....((.(((.((((..((((.........))))..)))).))).))..)))... (-13.40)
• Run RNAplot with a different layout algorithm
    $ RNAfold < sequence.fa | RNAplot -t 0
    [Figure: secondary structure plots drawn with the alternative layout]
• Run RNAplot with the --pre="" option to create annotation macros
    $ RNAfold < sequence.fa | RNAplot --pre=""
  Open the resulting PostScript plot in a text editor and investigate the macros.

RNAeval

Evaluate the free energy of a sequence/structure pair

• Simple check if RNAfold and RNAeval score equally
    $ RNAfold < sequence.fa
    >some random sequence
    UGGGAAUAGUCUCUUCCGAGUCUCGCGGGCGACGGGCGAUCUUCGAAAGUGGAAUCCGUA
    ..(((....((.(((.((((..((((.........))))..)))).))).))..)))... (-13.40)
    $ RNAfold < sequence.fa | RNAeval
    UGGGAAUAGUCUCUUCCGAGUCUCGCGGGCGACGGGCGAUCUUCGAAAGUGGAAUCCGUA
    ..(((....((.(((.((((..((((.........))))..)))).))).))..)))... (-13.40)
• Evaluate at 20°C and dangle model 1
    $ RNAfold < sequence.fa | RNAeval -d 1 -T 20.
    UGGGAAUAGUCUCUUCCGAGUCUCGCGGGCGACGGGCGAUCUUCGAAAGUGGAAUCCGUA
    ..(((....((.(((.((((..((((.........))))..)))).))).))..)))... (-20.28)

RNAeval

• Verbose output
    $ RNAfold < sequence.fa | RNAeval -v
    External loop                           : -140
    Interior loop (  3, 57) GC; (  4, 56) GC: -330
    Interior loop (  4, 56) GC; (  5, 55) AU: -240
    Interior loop (  5, 55) AU; ( 10, 52) UG:  380
    Interior loop ( 10, 52) UG; ( 11, 51) CG: -150
    Interior loop ( 11, 51) CG; ( 13, 49) CG:  -10
    Interior loop ( 13, 49) CG; ( 14, 48) UA: -210
    Interior loop ( 14, 48) UA; ( 15, 47) UA:  -90
    Interior loop ( 15, 47) UA; ( 17, 45) CG:  120
    Interior loop ( 17, 45) CG; ( 18, 44) GC: -240
    Interior loop ( 18, 44) GC; ( 19, 43) AU: -240
    Interior loop ( 19, 43) AU; ( 20, 42) GU:  -60
    Interior loop ( 20, 42) GU; ( 23, 39) UA:  290
    Interior loop ( 23, 39) UA; ( 24, 38) CG: -240
    Interior loop ( 24, 38) CG; ( 25, 37) GC: -240
    Interior loop ( 25, 37) GC; ( 26, 36) CG: -340
    Hairpin  loop ( 26, 36) CG              :  400
    UGGGAAUAGUCUCUUCCGAGUCUCGCGGGCGACGGGCGAUCUUCGAAAGUGGAAUCCGUA
    ..(((....((.(((.((((..((((.........))))..)))).))).))..)))... (-13.40)
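
The verbose listing above can be used to check the additivity rule, i.e. that the energy of the structure is the sum of its loop energies. The following is a minimal Python sketch and not part of the ViennaRNA Package. It assumes that the per-loop values are given in units of 0.01 kcal/mol, which is consistent with the listing (the values sum to -1340 and the reported total is -13.40). The helper name `loop_energies` is hypothetical.

```python
import re

# Per-loop lines copied from the verbose RNAeval output shown above.
# Assumption: values are in units of 0.01 kcal/mol.
VERBOSE_OUTPUT = """\
External loop                           : -140
Interior loop (  3, 57) GC; (  4, 56) GC: -330
Interior loop (  4, 56) GC; (  5, 55) AU: -240
Interior loop (  5, 55) AU; ( 10, 52) UG:  380
Interior loop ( 10, 52) UG; ( 11, 51) CG: -150
Interior loop ( 11, 51) CG; ( 13, 49) CG:  -10
Interior loop ( 13, 49) CG; ( 14, 48) UA: -210
Interior loop ( 14, 48) UA; ( 15, 47) UA:  -90
Interior loop ( 15, 47) UA; ( 17, 45) CG:  120
Interior loop ( 17, 45) CG; ( 18, 44) GC: -240
Interior loop ( 18, 44) GC; ( 19, 43) AU: -240
Interior loop ( 19, 43) AU; ( 20, 42) GU:  -60
Interior loop ( 20, 42) GU; ( 23, 39) UA:  290
Interior loop ( 23, 39) UA; ( 24, 38) CG: -240
Interior loop ( 24, 38) CG; ( 25, 37) GC: -240
Interior loop ( 25, 37) GC; ( 26, 36) CG: -340
Hairpin  loop ( 26, 36) CG              :  400
"""


def loop_energies(text):
    """Return the integer that ends each '<loop description>: <value>' line."""
    energies = []
    for line in text.splitlines():
        match = re.search(r":\s*(-?\d+)\s*$", line)
        if match:
            energies.append(int(match.group(1)))
    return energies


energies = loop_energies(VERBOSE_OUTPUT)
total_raw = sum(energies)        # E(S) as the sum of all loop energies E(L)
total_kcal = total_raw / 100.0   # convert to kcal/mol under the stated assumption

print(f"{len(energies)} loops, sum = {total_raw}, i.e. {total_kcal:.2f} kcal/mol")
# prints: 17 loops, sum = -1340, i.e. -13.40 kcal/mol
```

The 17 loops are the exterior loop, 15 loops closed by two consecutive base pairs, and the hairpin loop. Their sum reproduces the (-13.40) that RNAfold and RNAeval report for this structure.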
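
The dot-bracket and BPSEQ structure formats listed earlier encode the same base pairs, so one can be derived from the other with a simple stack. The following Python sketch illustrates this. The function names `pair_table` and `to_bpseq` are hypothetical and not part of any of the packages mentioned, and the dot-bracket string `((....))` for the 8-nucleotide example is inferred from the BPSEQ listing (pairs 1-8 and 2-7) and is not given in the slides.

```python
# 60-nt example sequence and its predicted structure from the slides.
SEQUENCE = "UGGGAAUAGUCUCUUCCGAGUCUCGCGGGCGACGGGCGAUCUUCGAAAGUGGAAUCCGUA"
STRUCTURE = "..(((....((.(((.((((..((((.........))))..)))).))).))..)))..."


def pair_table(structure):
    """Map each 1-based position to its pairing partner (0 if unpaired)."""
    partner = [0] * (len(structure) + 1)  # index 0 is unused
    stack = []                            # positions of currently open '('
    for pos, char in enumerate(structure, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            opening = stack.pop()
            partner[opening] = pos
            partner[pos] = opening
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return partner


def to_bpseq(sequence, structure):
    """Return BPSEQ lines of the form '<position> <base> <partner>'."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure differ in length")
    partner = pair_table(structure)
    return [f"{i} {base} {partner[i]}" for i, base in enumerate(sequence, start=1)]


# Small example matching the BPSEQ listing in the slides.
for line in to_bpseq("GGCAUUCC", "((....))"):
    print(line)

# Base pairs (i, j) with i < j of the 60-nt example.
table = pair_table(STRUCTURE)
pairs = [(i, j) for i, j in enumerate(table) if 0 < i < j]
print(len(pairs), "base pairs, outermost", pairs[0])
```

For the 60-nucleotide example, the resulting pairs are the same ones that appear as closing and interior base pairs in the verbose RNAeval output, from (3, 57) down to (26, 36).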