An Atlas of RNA Base Pairs Involving Modified Nucleobases with Optimal Geometries and Accurate Energies
Item Type: Article
Authors: Chawla, Mohit; Oliva, R.; Bujnicki, J. M.; Cavallo, Luigi
Citation: An atlas of RNA base pairs involving modified nucleobases with optimal geometries and accurate energies. 2015, Nucleic Acids Research
Eprint version: Publisher's Version/PDF
DOI: 10.1093/nar/gkv606
Publisher: Oxford University Press (OUP)
Journal: Nucleic Acids Research
Rights: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Download date: 26/09/2021 02:06:18
Link to Item: http://hdl.handle.net/10754/558768

Nucleic Acids Research Advance Access published June 27, 2015
Nucleic Acids Research, 2015, doi: 10.1093/nar/gkv606

Mohit Chawla1, Romina Oliva2,*, Janusz M. Bujnicki3,4 and Luigi Cavallo1,*

1King Abdullah University of Science and Technology (KAUST), Physical Sciences and Engineering Division, Kaust Catalysis Center, Thuwal 23955-6900, Saudi Arabia
2Department of Sciences and Technologies, University Parthenope of Naples, Centro Direzionale Isola C4, I-80143, Naples, Italy
3Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, 02-109 Warsaw, Poland
4Laboratory of Bioinformatics, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, Umultowska 89, 61-614 Poznan, Poland

Received March 31, 2015; Revised May 27, 2015; Accepted May 28, 2015

*To whom correspondence should be addressed. Tel: +966 12 8087566; Fax: +966 12 8089999; Email: [email protected]
Correspondence may also be addressed to Romina Oliva. Tel: +39 081 5476541; Fax: +39 081 5476514; Email: [email protected]

© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

ABSTRACT

Posttranscriptional modifications greatly enhance the chemical information of RNA molecules, helping to explain the diversity of their structures and functions. A significant fraction of the RNA experimental structures available to date present modified nucleobases, with half of them being involved in H-bonding interactions with other bases, i.e. 'modified base pairs'. Herein we present a systematic investigation of modified base pairs in the context of experimental RNA structures. To this end, we first compiled an atlas of experimentally observed modified base pairs, for which we recorded occurrences and structural context. Then, for each base pair, we selected a representative for subsequent quantum mechanics calculations, to determine its optimal geometry and interaction energy. Our structural analyses show that most of the modified base pairs are non Watson–Crick like and are involved in RNA tertiary structure motifs. In addition, quantum mechanics calculations quantify and provide a rationale for the impact of the different modifications on the geometry and stability of the base pairs they participate in.

INTRODUCTION

The discovery of various forms of noncoding RNAs in the past two decades, besides the well-known coding messenger RNA (mRNA), ribosomal RNA (rRNA) and transfer RNA (tRNA), has dramatically changed our view of RNA function. In addition to the transmission of genetic information, it is now clear that RNA molecules can fulfill a variety of other functions, including catalysis and translational regulation, up to the tuning of cellular differentiation and development. It is particularly interesting that the fraction of the human genome that is cell-specifically transcribed to generate these regulatory noncoding RNAs is larger than the fraction devoted to encoding proteins (1).

RNA fulfills this striking variety of functions apparently based on a limited chemical diversity, established by only four nucleobases: adenine (A), guanine (G), cytosine (C) and uracil (U). This apparent contradiction is resolved by considering that RNA can take advantage of a large number of posttranscriptional modifications, greatly enhancing its chemical information. To date, more than 100 different modifications have been reported in RNA molecules, ranging from simple additions or substitutions of chemical groups, as e.g. in methylations or deaminations, to complex alterations, often comprising a series of reactions, some of which even result in a different heterocyclic structure. A complete catalogue of such modifications can be found in dedicated databases, such as the RNAmods database (2) and MODOMICS (3), with the latter also containing information about RNA modification pathways and sites of modification in selected RNAs.

While the highest concentration and diversity of posttranscriptional modifications has so far been reported in tRNA molecules, they are also widespread in rRNA and mRNA, and more than a dozen modifications have already been reported in small, noncoding RNAs (2,4–8). As a matter of fact, most if not all the major classes of RNA molecules in the cell are now thought to possibly present modified nucleotides.

Specific modifications contribute to tRNA stability, favor its recognition by the cognate aminoacyl synthetase and by mRNA, influence nuclear export of mRNA, protect it from degradation and regulate splicing, or can establish resistance to antibiotics in bacterial rRNA (9–14). Many more examples of the impact of modifications on RNA function and structure are reviewed in (8,13–17). Importantly, modifications also play a role in human diseases, particularly tumors, myopathy, type-2 diabetes and obesity [reviewed in (18)].

Chemical modifications that control the stability and proper folding of the RNA molecule are generally classified as 'structural'. The most efficient ways by which they can affect the RNA structure are hydrogen bonding, π-stacking and the coordination of metal ions, with the first one playing a major role. Chemical modifications may actually occur at all three edges used by nucleobases for H-bonding to other bases, i.e. the Watson–Crick, the Hoogsteen and the sugar edge (see Figure 1). A modified nucleobase can thus exhibit significantly changed pairing properties, as compared to the corresponding canonical one. If the Watson–Crick edge is affected, for instance, the canonical Watson–Crick G-C/A-U pairing will be impaired, while non canonical base pairs, involving either of the other two edges, may be favored.

To date, more than 3000 macromolecular structures have been deposited in the wwPDB (19), which contain different types of RNA molecules including not only tRNAs, mRNAs and rRNAs, but also viral RNAs, riboswitches, ribozymes and more recently discovered small non coding nuclear and nucleolar RNAs. Remarkably, a significant fraction of such structures present modified residues. Therefore, it is time to systematically investigate the structural effect of chemical modifications in the context of experimental RNA structures. Herein we will focus on the effect of modifications on H-bonded base pairs. To this end, we performed a comprehensive search in the Protein Data Bank (19) to compile an atlas of experimentally observed 'modified base pairs', i.e. H-bonded base pairs, with a given geometry, involving at least one noncanonical nucleobase. For each specific combination of nucleobases and base pair ge- [...] tive occurrences, and an accurate estimate of the effect of each chemical modification on the structure and stability of the corresponding H-bonded base pair. Notably, we found that the modified base pairs typically exhibit non canonical geometries (i.e. different from the classical Watson–Crick pairing) and are located in a variety of different RNA molecules and structural motifs. This extends our understanding of how posttranscriptional modifications act on the structure of RNA molecules to influence their function.

MATERIALS AND METHODS

Nomenclature

The adopted nomenclature for the geometry of the analysed H-bonded base pairs (Table 1) is based on that proposed by Leontis and Westhof (42,43) and extended by Lemieux and Major (44). In it, the interacting edges involved in the H-bonding, i.e. Watson–Crick, Hoogsteen or sugar, and the two mutual orientations of the glycosidic bonds, i.e. cis or trans, are specified (42,43). A symbol 'W', 'H' or 'S' is given to indicate that the 'Watson–Crick', 'Hoogsteen' or 'sugar' edge is involved in the base-base H-bonding interaction; 'Bs' is used for bifurcated base pairs involving the sugar side amino/keto group (44). This is preceded by 'c' or 't', indicating that the orientation of the glycosidic bonds is cis or trans, respectively. We added an 'r' in brackets after the edge symbol when the corresponding ribose was also involved in H-bonding. The symbol for the edge H-bonding with the ribose of the paired nucleoside was also reported in brackets, if different from that involved in base-base pairing. Traditional abbreviations were adopted for the modified nucleobases. For the non-natural modifications, after the number of the modified atom we reported the chemical symbol of the halogen element substituting a hydrogen atom and the one-letter code of the corresponding nucleobase. When a base pair is characterized by only one H-bond, this is indicated by a '1' after the edge symbols.
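
As an illustration of the nomenclature rules described in this section, the following minimal Python sketch decodes a geometry label into its components. It is not part of the original work. The exact string layout of the labels in Table 1 is not shown here, so the assumed grammar (orientation letter, two edge symbols each optionally followed by '(r)', and an optional trailing '1') is a guess, and the names parse_geometry and PairGeometry are hypothetical. The bracketed edge symbol used for H-bonding to the ribose of the paired nucleoside is not handled.

```python
import re
from dataclasses import dataclass

# Assumed label grammar (hypothetical; the layout used in Table 1 may differ):
#   orientation 'c' or 't', then two edge symbols (W, H, S or Bs),
#   each optionally followed by '(r)', then an optional trailing '1'.
_EDGE = r"(Bs|W|H|S)(\(r\))?"
_PATTERN = re.compile(rf"^([ct]){_EDGE}{_EDGE}(1)?$")

_ORIENTATION = {"c": "cis", "t": "trans"}
_EDGE_NAMES = {
    "W": "Watson-Crick",
    "H": "Hoogsteen",
    "S": "sugar",
    "Bs": "bifurcated (sugar side)",
}


@dataclass(frozen=True)
class PairGeometry:
    orientation: str        # 'cis' or 'trans' glycosidic bond orientation
    edges: tuple            # interacting edge of each base
    ribose_involved: tuple  # True where '(r)' follows the edge symbol
    single_h_bond: bool     # True when the label ends with '1'


def parse_geometry(label: str) -> PairGeometry:
    """Decode a label such as 'cWW' or 'tW(r)S1' under the assumed grammar."""
    match = _PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognised geometry label: {label!r}")
    orient, edge1, ribose1, edge2, ribose2, one = match.groups()
    return PairGeometry(
        orientation=_ORIENTATION[orient],
        edges=(_EDGE_NAMES[edge1], _EDGE_NAMES[edge2]),
        ribose_involved=(ribose1 is not None, ribose2 is not None),
        single_h_bond=one is not None,
    )
```

Under these assumptions, parse_geometry("tHS") describes a trans pair between the Hoogsteen edge of one base and the sugar edge of the other, while a label that does not fit the assumed grammar raises a ValueError instead of being silently misread.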