DEVELOPING TOOLS FOR RNA STRUCTURAL ALIGNMENT

Ali G. Mokdad

A Dissertation Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

May 2006

Committee:
Neocles B. Leontis, Advisor
Danny Myers, Graduate Faculty Representative
Scott O. Rogers
Craig Zirbel
Alexei Fedorov

© 2006 Ali Mokdad
All Rights Reserved

ABSTRACT

Neocles B. Leontis, Advisor

This work addresses current problems of RNA sequence alignment and describes different tools for solving them. RNA molecules form basepairs that fold the molecule into its secondary and tertiary structures. These structures are more conserved in evolution than primary sequence because they directly affect the function of the molecule. Thus, sequence alignment of RNA molecules, unlike that of other biological molecules, must proceed by aligning homologous pairs of positions to each other.

The state-of-the-art methods used today for aligning RNA are based on Stochastic Context-Free Grammars (SCFG). These methods are able to characterize and thus align nested RNA basepairs, but are incapable of dealing with crossing basepair patterns. In addition, the current application of this algorithm ignores 3D structure and thus deals best with only one type of basepair. Although this type (the cis Watson-Crick/Watson-Crick or cWW type) is the most common in RNA, there are at least eleven other families of basepairs that account for about one third of RNA interactions. Each of these families has its own structural dimension, and therefore its own patterns of accepted isosteric substitutions in sequence alignments.

Here, an RNA alignment analysis and evaluation tool that takes into consideration 3D structure with all types of basepairs is described. This tool is used to structurally evaluate alignments and locate errors in them. A discussion and classification of tertiary interactions of the G/U wobble basepair is then presented.
Novel conserved interactions are discovered, and their sequence signatures are used to further enhance sequence alignments. Finally, a better SCFG approach for automatic RNA alignment is suggested and tested. This approach takes into consideration the 3D structure of all families of basepairs. It is also coupled with another theory, Markov Random Field (MRF), to align areas where crossing basepairs occur.

To My Mother and Father
Thank You!

ACKNOWLEDGEMENTS

I would like to thank my advisor and committee members for their support and help at different stages of my studies. I would also like to thank Stan Smith, John Graham, and Arthur Brecher for their kindness, trust, and valuable advice throughout my training. Finally, this work would not have been possible without the constant support and encouragement of my wife.

TABLE OF CONTENTS

CHAPTER I: BACKGROUND AND SIGNIFICANCE
    General Characteristics of RNA
        RNA Structure
        Basepairing, RNA Evolution, and Isostericity Matrices
    RNA Sequence Alignment: A Statement about Evolution
        Building Sequence Alignments
        Problems with Sequence Alignment Representations
        Models: Manageable Approximations of Realities
        Importance and Use of Good RNA Alignments
    Dissertation Structure

CHAPTER II: RIBOSTRAL – AN RNA 3D STRUCTURAL ALIGNMENT ANALYZER, EVALUATOR, AND VIEWER BASED ON BASEPAIR ISOSTERICITIES
    Introduction
        Supported Platforms and Deployment Process
    Executing Ribostral
        Loading an Alignment File
        Dividing Sequences into Subgroups
        Preparations for the Analysis of a List of Nucleotides
        Analyzing a List of Nucleotides
    Types of Output Files and their Interpretations
        Output Files for a Non-basepair List
        Output Files for a Basepair List
        Ribostral Alignment Viewer
        Additional Output Formats
    Interactive Analysis of Nucleotides
        Interactive Analysis of a Family of Basepairs
        Interactive Analysis of Individual Basepairs or Motifs
    Ribostral Preferences
    Supporting Tools
        Tool 1: Generate BP List from PDB
        Tool 2: Align Sequences
        Tool 3: Extract Parts of a FASTA File
        Tool 4: Merge & Remove Repeats from FASTA Files
        Tool 5: Create .fasta from .mat
    Help Menu
    Practical Uses of Ribostral

CHAPTER III: STRUCTURAL AND EVOLUTIONARY CLASSIFICATION OF G/U WOBBLE BASEPAIRS IN THE RIBOSOME
    Introduction
    Materials and Methods
    Results
        Types of Observed Tertiary Interactions and Their Sequence Patterns
        P-interaction
        Molecular Dynamics Analysis of the P-interaction
        Other Shallow Groove Pocket Interactions
    Discussion
    Conclusion

CHAPTER IV: SEQUENCE PARSING AND AUTOMATIC RNA ALIGNMENT
    Algorithms for Aligning RNA Sequences
        SCFG Profiles of RNA Molecules
    Alignment of an RNA Motif Using the Hybrid SCFG/MRF Method
        Alignment Results and Discussions
        SCFG/MRF Discrimination Power
    Conclusion and Summary

REFERENCES

LIST OF FIGURES

Figure
1   Identification of the parts of an RNA nucleotide
2   Secondary structure of the 5S rRNA from Haloarcula marismortui
3   Tertiary structure of 5S rRNA from Haloarcula marismortui
4   Cis versus trans orientations of glycosidic bonds
5   Basepair families and their isosteric subfamilies
6   Ribostral default installation subdirectories
7   Ribostral’s main GUI
8   File menu options
9   Browsing for an alignment file
10  A snapshot of the “KnownFASTAFilenames.xls”