6 IV April 2018
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Structure-Based Multiple Rna Sequence Alignment and Finding Rna Motifs
STRUCTURE-BASED MULTIPLE RNA SEQUENCE ALIGNMENT AND FINDING RNA MOTIFS Michael Wayne Sarver A Dissertation Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY August 2006 Committee: Craig Zirbel, Advisor Robert Boughton, Graduate Faculty Representative Truc Nguyen Gabor Szekely John Chen ii ABSTRACT Craig Zirbel, Advisor With the advent of faster computers and the availability of RNA crystal structures we can now use more information to align homologous RNA sequences. We can take a crystal structure and construct a probabilistic model, based on a SCFG, of an RNA molecule. We construct objects called nodes that modularize the model into small pieces that are more manageable. Using this model we can take sequences that are similar to the sequence in the 3D crystal structure and look for the most probable way that the model could have generated the sequence. Then we can get a detailed description of how each node of the model could have generated the sequence. Using this information we can align sequences. Given a seed alignment we give a procedure to construct a 3D structural alignment quickly. In addition we show how the parameters from the model can be estimated. We also have the ability to do motif swaps using objects called alternative nodes. We have developed an algorithm to quickly search through RNA 3D structures to find motifs. This is accomplished by taking a query motif with m bases and finding the center of the heavy atoms for each base and then rotating it onto candidate motifs that have the same number of bases. -
Strings and Sequences in Computer Science
Bioinformatics Algorithms (Fundamental Algorithms, module 2) Zsuzsanna Lipt´ak Masters in Medical Bioinformatics academic year 2018/19, II semester Strings and Sequences in Computer Science • its elements are called characters or letters • jΣj is the size of the alphabet (number of different characters) • a string over Σ is a finite sequence of characters from Σ • we write strings as s = s1s2 ::: sn i.e. si is the i'th character of s N.B.: We number strings from 1, not from 0 Some formalism on strings • Σ a finite set called alphabet 2 / 7 • jΣj is the size of the alphabet (number of different characters) • a string over Σ is a finite sequence of characters from Σ • we write strings as s = s1s2 ::: sn i.e. si is the i'th character of s N.B.: We number strings from 1, not from 0 Some formalism on strings • Σ a finite set called alphabet • its elements are called characters or letters 2 / 7 • a string over Σ is a finite sequence of characters from Σ • we write strings as s = s1s2 ::: sn i.e. si is the i'th character of s N.B.: We number strings from 1, not from 0 Some formalism on strings • Σ a finite set called alphabet • its elements are called characters or letters • jΣj is the size of the alphabet (number of different characters) 2 / 7 • we write strings as s = s1s2 ::: sn i.e. si is the i'th character of s N.B.: We number strings from 1, not from 0 Some formalism on strings • Σ a finite set called alphabet • its elements are called characters or letters • jΣj is the size of the alphabet (number of different characters) • a string over Σ is a finite sequence of characters from Σ 2 / 7 N.B.: We number strings from 1, not from 0 Some formalism on strings • Σ a finite set called alphabet • its elements are called characters or letters • jΣj is the size of the alphabet (number of different characters) • a string over Σ is a finite sequence of characters from Σ • we write strings as s = s1s2 ::: sn i.e. -
RNA Multiple Structural Alignment with Longest Common Subsequences *
RNA multiple structural alignment with longest common subsequences ? Sergey Bereg a, Marcin Kubica b, Tomasz Walen´ b, Binhai Zhu c;∗, aDepartment of Computer Science, University of Texas at Dallas, Richardson, TX 75083-0688, USA. bInstitute of Informatics, Warsaw University, Banacha 2, 02-097 Warszawa, Poland. cDepartment of Computer Science, Montana State University, Bozeman, MT 59717-3880, USA. Abstract In this paper, we present a new model for RNA multiple sequence structural align- ment based on the longest common subsequence. We consider both the off-line and on-line cases. For the off-line case, i.e., when the longest common subsequence is given as a linear graph with n vertices, we first present a polynomial O(n2) time algorithm to compute its maximum nested loop. We then consider a slightly differ- ent problem|the Maximum Loop Chain problem and present an algorithm which runs in O(n5) time. For the on-line case, i.e., given m RNA sequences of lengths n, compute the longest common subsequence of them such that this subsequence either induces a maximum nested loop or the maximum number of matches, we present efficient algorithms using dynamic programming when m is small. Key words: RNA multiple structure alignment, longest common subsequence, dynamic programming ? This research is partially supported by EPSCOR Visiting Scholar's Program and MSU Short-term Professional Development Program. ∗ Corresponding author: [email protected] Email: [email protected](Sergey Bereg), [email protected](Marcin Kubica), [email protected](Tomasz Wale´n), [email protected](Binhai Zhu) Preprint submitted to Journal of Combinatorial Optimization 6 October 2006 1 Introduction In the study of noncoding RNA (ncRNA), it is well known that the corre- sponding nucleotides are very active among genomic DNA.