Probabilistic Walks with Graeme

Probabilistic walks with Graeme Sean Eddy Molecular & Cellular Biology, and Applied Mathematics HHMI and Harvard University eddylab.org Attempt to repeat some analytic method that is considered unreliable and difficult until patience and hard work yield results similar to those published by the author. Pleasure derived from success, especially if it has come without the supervision of an instructor (that is, working alone), is a clear indication of aptitude for experimental work. Santiago Ramon y Cajal (1916) April, 1993 I had proposed to develop reporter fusions to neural-specific promoters as a tool to visualize axonal processes in live animals, facilitating genetic screens... I have started to play with... green fluorescent protein (GFP) from the jellyfish... I found out at the worm meeting in June that Marty Chalfie’s lab is onto the same idea. Chalfie has already obtained bright fluorescence in the axons of the six touch neurons... [GFP introduced: Chalfie et al, Science 263:802, 1994] Image from Yishi Jin, UC Santa Cruz GABAergic neurons in C. elegans Somewhat embarrassingly, my most productive work has been unrelated to the original proposal, resulting from some moonlighting as a computational biologist... and a lingering interest in RNA structure from my thesis work.... I invented a new kind of statistical model, related to HMMs, which can model the two-dimensional structure consensus of RNAs. - progress report for my postdoc grant, 1993 From sequence to RNA structure analysis HMM algorithm SCFG algorithm Goal (sequence) (RNA structure) optimal alignment Viterbi CYK P(sequence | model) Forward Inside EM parameter estimation Forward-Backward Inside-Outside memory complexity: O(MN) O(MN2) time complexity (general): O(M2N) O(M3N3) time complexity (as used): O(MN) O(MN3) • we can analyze target sequences with secondary structure models; • but the algorithms are computationally expensive. Software tools for homology search and alignment HMMER Infernal protein homology search: profile HMMs RNA homology search: profile SCFGs http://hmmer.org http://eddylab.org/infernal 55K lines source code 90K lines source code >40,000 downloads/yr >10,000 downloads/yr human Dicer | Lau et al, Nature Struct Mol Biol 19:436 (2012) glnA glutamine riboswitch | RF01739 Pfam Rfam 17929 protein domain families 3016 RNA structure families http://pfam.xfam.org http://rfam.xfam.org Xfam Consortium and HMMER server team: Rob Finn, Alex Bateman European Bioinformatics Institute, Cambridge UK A A G A A U Group I IB4 intron (IB4.5) C G Group I IA3 intron (IA3.1) phylum: Woesearchaeota G C C phylum: Woesearchaeota A U accession: KP308748.1 P9.1C G accession: CP010426.1 260 U A host gene: LSU rRNA host gene: LSU rRNA G U A A P9 G C 280 P9.2 A G A A C G G U G C C A U G G G U U C C U U A C G C A A U G A C G G U A C G A C C G A G G G A A U A 240 U 300 A U P9 A 240 U G 3’ A G C a C G a G C U U A 3’ c A G A u U A UGG g A a U U C g U G C G P10 c C A UU g A A U a A A U G C U A A C g A U U A U U G a U A A U U A A G C 160 A U G C GP9.0 G C u C G A U P5 G C G u U A U A U A P5a U A P5 G C g A U C G C G C g G C c G U A P9.0 A U C G 100 C C g G C U A G u C P5a A A A U P7.2 C G C A P1U a A U C g G C U A A U A A 60 P1 A U A U a U G G c G U 140 A G A U U a U G G u A U 220 C G C G C G A U A A 160 C G 80 A u A C G U a 5’ C G U A A G c C G U U P4 A G A U 20 P7(pseudoknot) C G U A A P7.1 G A u 5’ U A 80 G A P7 A G P4 G C G C 200 U A 100 G C UA G C (pseudo- U C G A U A C A knot) U A C G CA A U A U G A U C G A U U U A A A G C A A G C G G A U G U A 140 A C G C U A A P6 G C U A C A A P6 G C UA C U G C A CU C G C A U A A U U C G G C 180 C A A A A A A G C G C G G C G C A A G C P3 G C P3 G U G A A G U A C G U A U A A U G G U G C A C G A C G G UU U 40 A U G C U A A U A U A A P5bU A P6a U U A U 60 A G C 120 A A U P6a C G A G U G C A U A U U A C G G C C G A U C G A U G C G G C U A G C 120 A G U C C G C G P8 U G C G C G A P8 A A A U 220 G C C G A U A A G 180 G A U U A A A A U U A P2 A U A 456nt A U 200 A U G 20 A U G U U A G C A G U A P2 A U A CG G C G A U G C A U C homology to LAGLI_DADG A G A C homing endonuclease A C (PF00961) C G A U 40 G A A A Nawrocki et al, NAR 46:7970 (2018) “In my view, biology needs numbers; not after the fashion of physics, but in a good engineering, computational spirit.”.

Probabilistic Walks with Graeme

RDA COVID-19 Recommendations and Guidelines on Data Sharing

Enhanced Representation of Natural Product Metabolism in Uniprotkb

The EMBL-European Bioinformatics Institute the Hub for Bioinformatics in Europe

Tunca Doğan , Alex Bateman , Maria J. Martin Your Choice

Evolution and Function of Drososphila Melanogaster Cis-Regulatory Sequences

Downloaded Were Considered to Be True Positive While Those from the from UCSC Databases on 14Th September 2011 [70,71]

Biological Sequence Analysis Probabilistic Models of Proteins and Nucleic Acids

Genomic and Transcriptomic Surveys for the Study of Ncrnas with a Focus on Tropical Parasites

Molecular Genetics & Genomics

On the Necessity of Dissecting Sequence Similarity Scores Into

Pfam: the Protein Families Database in 2021 Jaina Mistry 1,*, Sara Chuguransky 1, Lowri Williams 1, Matloob Qureshi 1, Gustavo A

Download PDF of This Story