Characterizing the 3D Interactions of the Base-Pair Stems
Total Page:16
File Type:pdf, Size:1020Kb
CHARACTERIZING THE 3D INTERACTIONS OF THE BASE-PAIR STEMS AND HELICES IN RNA MOLECULES BY ZAINAB MOHAMMED HASSAN FADHIL A thesis submitted to the The Graduate School- New Brunswick Rutgers, the state University of New Jersey In partial fulfillment of the requirements For the degree of Master of Science Graduate Program in Chemistry and Chemical Biology Written under the direction of Dr. Wilma K. Olson And approved By ………………………………… ………………………………… ………………………………… New Brunswick, New Jersey October 2017 ABSTRACT OF THE THESIS CHARACTERIZING THE 3D INTERACTIONS OF THE BASE-PAIR STEMS AND HELICES IN RNA MOLECULES By ZAINAB MOHAMMED HASSAN FADHIL Thesis Director: Wilma K. Olson Characterizing RNA structure experimentally is a challenging task. Computational methods complement experimental methods. In this thesis, we study, using software tools such as DSSR, Matlab, and Excel, the three-dimensional structures of a group of non-redundant RNA structures. These structures are those identifiers as Leontis and Zirbel at January 2017 [1], and the relevant coordinates for these structures downloaded from the Protein Data Bank (PDB) [2]. Specifically, we find the angles between all pairs of chemically linked double helical stems in each of the RNA structures, the distances between the mid points between all these stems, and the distributions for the calculated angles and distances. Additionally, we separate the above calculations to consider stems within the same helix. A helix is composed of at least two stacked base pairs and these base pairs are not necessarily chemically linked. The third part contains these calculations separated for stems in different helices. Based on these calculations, the stems in the same helix can be classified into two groups. The first group has small angles and mid distance among its stems. The stems all lie in the same direction. In the second group, the angle is large, while the distance is small, which means that the stem directions are opposite to each other. There are different probability density functions that fit closely to the data histograms. ii While the details are given in Chapter 3, the fitted probability density functions confirm that the proposed classification methods are correct. Finally, we plot the distribution of stems lengths to find that the lengths interacting stems tend to be of equal lengths and that interacting stems are more likely to be shorter. iii Acknowledgements I would like to express my most sincere gratitude to my adviser Dr. Olson who has supported me throughout my Master study with her patience and knowledge. Many thanks to my thesis committee Dr. York and Dr. Khare for offering their invaluable insights into my thesis. I would also like to thank my fellow labmates in Dr. Olson’s group for their support in my research. Many thanks to the department of Chemistry at Rutgers for accepting me and supporting me throughout these years. I would like to convey gratitude towards my parents: my father Mohammed and my mother Iman, my sisters Zahraa and Noor, my brothers Mostafa and Ali in Iraq, my kids: Fadhl, Yusur, and Dur for their unconditional love and support. This would have been impossible without their support. Finally, many thanks to my husband, Ahmed, for always being on my side. iv Table of Contents Abstract ……………………………………………………………………………………….... ii Acknowledgements ……………………………………………………………………………. iv List of Tables ………………………………………………………………………………........vi List of Figures …………………………………………………………………………………. vi Chapter1. Introduction ……………………………………………………………………….. 1 1.1 RNA Chemistry …………………………………………………………………………. 4 1.2 Canonical and Noncanonical base pairs ……………………………………………….... 5 1.3 Secondary structure …………………………………………………………………….. 7 1.3.1Internal Loops and Bulges ……………………………………………………… 7 1.3.2 Hairpin Loops ………………………………………………………………….. 8 1.3.3 Junctions between Double Helical Stems …………………………………….... 8 Chapter2. Material and Methods …………..…………………………………………………. 8 2.1 Obtaining the Dataset……….…………………………………………………………... 11 2.2 Processing the Date with Matlab. ……………………………………………………… 12 2.3 Calculations on the Dataset……...…………………………………………………........ 12 2.4 Dataset Histogram Plots …….……………………………………………………....… 13 2.5 Algorithm 1……..………………………………………………………………………. 14 Chapter3. Results ………….………………………………………………………………….. 16 3.1 Results for All the RNA Structures ……………………………………………………. 16 3.2 Fitting Data to Selected Probability Distributions ……………………………………... 17 3.3 Results for RNA Structures Grouped by Their Coaxial Stack……………...…………. 20 3.4 Stems in the Same Helix………………………………………………………………... 20 3.5 Stems in Different Helix ……….……………………………………………………… 24 3.6 Stems Lengths………………………………………………………………………….. 26 Chapter4. Conclusions …………….………………………………………………………….. 28 Chapter5. References ………….……………………………………………………………… 30 v List of Tables Table 1: shows PDB and number of stem in the non-redundant dataset of RNA structures with resolution 3Å or better which contain two or more stems ……………………………………….. 33 Table 2: Type of RNA structures and the number of nucleotides in each structure in the non- redundant dataset of RNA structures with resolution 3Å or better ………………………………. 41 Table 3: PDB_ID of the RNA structures in structures in the non-redundant dataset of RNA structures with resolution 3Å or better …….……………………………………………….….....50 References ………….....………………………………………………………………...……… 60 List of Figures Figure 1.1:2P7D (Hairpin Ribozyme, RNA) three-dimensional structure generated with the DSSR, Jmol. This structure contains 4 stems. ……………………….…………………………...………… 3 Figure 1.2: Base pairs found in RNA double helices: Shown are the structures of commonly base pairs. This image takes from [Echols, H. G. (2001) Operators and promoters: the story of molecular biology anits creators, of California Press …………………………………………………………………………………................ 6 Figure 1.3 Secondary elements of RNA structures. This image takes from Nowakowski, J., and Tinoco, I. (1997) RNA structure and stability. In Seminars in Virology, Elsevier…………….. 10 Figure 3.1.a: Histogram showing the distribution of distances in Å between midpoint in stems in the non-redundant dataset of structures with resolution 3Å or better…………………………... 17 Figure 3.1.b: Normalized histogram inter segment and fitted probability density functions in the non-redundant dataset composed of structures with resolution 3Å or better…………………… 18 Figure 3.2: Histogram showing the distribution of angles in the non-redundant dataset of RNA structures with resolution 3Å or better …………………..……………….…………………….. 19 Figure 3.3: A 3D histogram showing the relation between angles and distances between stems in the non-redundant dataset composed of structures with resolution 3Å or better …………….... 20 Figure 3.4.a: Histogram showing the distribution of distances in the non-redundant dataset of RNA structures with resolution 3Å or better …………………………………………………... 21 Figure 3.4.b: Normalized histogram and fitted probability density functions for the non- redundant dataset of structures with resolution 3Å or better ………….…….....................…… 22 vi Figure 3.5 : Histogram showing the distribution of histogram showing the distribution of angles between stems in the same helix in the non-redundant dataset of structures with resolution 3Å or better.……………………………………………….……….………………………………….. 23 Figure 3.6: A 3D histogram showing the relation between angles and distances between stems in same helix in structures in the non-redundant dataset of structures with resolution 3Å or better …………………………………………………………………………………………………... 23 Figure 3.7: Histogram showing the distribution distances between stems in the different helix in tRNA structures in the non-redundant dataset of structures with resolution 3Å or better……… 24 Figure 3.8: Histogram showing the distribution of angles between stems in the different helix in the structures in the non-redundant dataset of structures with resolution 3Å or better………… 25 Figure 3.9: A 3D histogram showing the distribution of angles and distances between stems in the different helix in RNA structures in the non-redundant with resolution 3Å or better……………………………………………………………………………………………. 25 Figure 3.10: Histogram showing the distribution of stems lengths in RNA structures in the non redundant dataset of structures with resolution 3Å or better ………………………………………… 26 Figure 3.11: Histogram showing the distribution of stems lengths in the same helices in RNA structures in the non-redundant dataset of structures with resolution 3Å or better ………………………………………………………………………………………………….. 27 Figure 3.12: Histogram showing the distribution of stems lengths in the different helices in RNA structures in the non-redundant dataset of structures with resolution 3Å or better ………………………………………………………………………………………………….. 27 vii 1 Chapter 1 Introduction RNA is defined as a large group of macromolecules involved in many essential cellular processes [3]. The biological functions of these macromolecules depend on the complex secondary and three-dimensional structures that they form. In particular, RNA is a main player in the field of molecular biology because of its role in the transcription and translation of the genetic code, i.e., DNA makes RNA (via transcription) and RNA makes protein (during translation) [4]. There are several types of RNA, and probably more to be discovered. The synthesis of protein involves messenger RNA (mRNA), transfer RNA (tRNA), and ribosomal RNA (rRNA). These RNA types are independent of each other and perform different tasks. In particular, the transfer RNA selects the amino acids, the