
Progress in Biophysics and Molecular Biology xxx (2013) 1–20


Review

RNA structure and dynamics: A base pairing perspective

Sukanya Halder a, Dhananjay Bhattacharyya b,*

a Biophysics division, Saha Institute of Nuclear Physics, 1/AF, Bidhannagar, Kolkata 700 064, India
b Computational Science division, Saha Institute of Nuclear Physics, 1/AF, Bidhannagar, Kolkata 700 064, India

Article info

Article history: Available online xxx

Keywords: Non-canonical; RNA secondary structure; Structural characterization of non-canonical base pairs; Detection of non-canonical base pairs

Abstract

RNA is now known to possess various structural, regulatory and enzymatic functions for survival of cellular organisms. Functional RNA structures are generally created by three-dimensional organization of small structural motifs, formed by base pairing between self-complementary sequences from different parts of the RNA chain. In addition to the canonical Watson–Crick or wobble base pairs, several non-canonical base pairs are found to be crucial to the structural organization of RNA molecules. They appear within different structural motifs and are found to stabilize the molecule through long-range intra-molecular interactions between basic structural motifs like double helices and loops. These base pairs also impart functional variation to the minor groove of A-form RNA helices, thus forming anchoring sites for metabolites and ligands. Non-canonical base pairs are formed by edge-to-edge hydrogen bonding interactions between the bases. A large number of theoretical studies have been done to detect and analyze these non-canonical base pairs within crystal or NMR derived structures of different functional RNA. Theoretical studies of these isolated base pairs using ab initio quantum chemical methods as well as molecular dynamics simulations of larger fragments have also established that many of these non-canonical base pairs are as stable as the canonical Watson–Crick base pairs. This review focuses on the various structural aspects of non-canonical base pairs in the organization of RNA molecules and the possible applications of these base pairs in predicting RNA structures with more accuracy.

© 2013 Elsevier Ltd. All rights reserved.

Contents

1. Introduction
2. Studies on RNA structures
3. RNA structural organization through base pairing interactions
4. Tools for RNA structure analysis
5. Structure and stability of non-canonical base pairs
6. Structural organization of RNA: importance of non-canonical base pairs
7. Functional importance of non-canonical base pairs
8. Non-canonical base pairs in RNA structure prediction
9. Structural stabilities of double helices containing non-canonical base pairs
10. Conclusion
Acknowledgement
References

* Corresponding author. Tel.: +91 33 2337 0379x2252, fax: +91 33 2337 4637.
E-mail addresses: [email protected], [email protected] (D. Bhattacharyya).

0079-6107/$ – see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.pbiomolbio.2013.07.003

Please cite this article in press as: Halder, S., Bhattacharyya, D., RNA structure and dynamics: A base pairing perspective, Progress in Biophysics and Molecular Biology (2013), http://dx.doi.org/10.1016/j.pbiomolbio.2013.07.003

1. Introduction

For years, RNA has been considered to be an intermediate stage between DNA and protein. Until recently, RNA was only of interest in its role as an intermediary in transcription and translation, where genetic information of DNA is transferred to messenger RNA (mRNA) through transcription and then translated into protein with the help of transfer RNA (tRNA) and ribosomal RNA (rRNA).

However, it is now well established that commonly known functional RNAs like mRNA, tRNA and rRNA take additional responsibility in post-transcriptional regulation and processes within the cellular machinery and thereby regulate gene expression levels. In the past few decades, RNA has been shown to exhibit several other crucial functions also. The most important one is its enzymatic activity, similar to that of endonuclease, nucleotidyltransferase, phosphodiesterase, phosphotransferase and acid phosphatase (Zaug et al., 1983, 1984, 1986; Zaug and Cech, 1986a,b). Catalytic activity of the Tetrahymena 26S rRNA intervening sequence was first established with its self-splicing role (Kruger et al., 1982). Endonuclease activity is also imparted by the RNA component of RNaseP (Guerrier-Takada et al., 1983; Krasilnikov et al., 2003; Kazantsev et al., 2005). This ribonuclease is among the first catalytic RNAs discovered and is one of the only two known ribozymes conserved in all taxonomic kingdoms (Krasilnikov et al., 2004). Bacterial RNaseP can catalyze the hydrolysis of tRNA precursor by cleaving a phosphodiester bond even in the absence of the protein part in vitro, which results in 5′-phosphorylated mature tRNA (Krasilnikov et al., 2003; Kazantsev et al., 2005). Studies of the non-coding regions of mRNA, or aptamers, have revealed their ability to recognize and bind specific target molecules for genetic control (Mironov et al., 2002; Szostak, 2002). These non-coding regions assist in the signal transduction pathway and are required for translational regulation by sensing the level of a given metabolite. Another type of biotechnologically important gene regulatory non-coding RNA is the riboswitch, which is found in bacteria and fungi and not in higher organisms, making it a target for cure in humans. Riboswitches can fold into unique shapes and have the ability to shift their conformation to a different structure in the presence or absence of a specific metabolite (Nudler and Mironov, 2004; Vitreschak et al., 2004; Tucker and Breaker, 2005; Batey, 2006) for modulating the protein synthesis process (Nahvi et al., 2002; Winkler et al., 2002a,b). Another functional aspect of RNA molecules is their role in peptide bond formation mediated by the 2′-hydroxyl group of the 3′-terminal ribose sugars of peptidyl tRNA (Das et al., 1999; Dorner et al., 2003; Strobel and Cochrane, 2007). Recently, larger rRNA subunits of ribosomes in Eubacteria as well as plants and animals have been shown to possess chaperone activity to assist protein folding (Samanta et al., 2008).

Other smaller RNAs like snRNA, snoRNA, miRNA, siRNA etc. are also of great importance for the cell to perform all its functions properly, although their structural features are not yet fully understood. Small nuclear RNA (snRNA) is a class of small RNA molecules that are found within the nucleus of eukaryotic cells and are involved in a variety of important processes such as RNA splicing (as a part of the spliceosome), regulation of transcription factors or RNA polymerase II, and maintaining the telomeres (Matera et al., 2007; Katsamba et al., 2001; Stevens et al., 2001). MicroRNAs or miRNAs are one such class of small post-transcriptional regulator RNAs that bind to complementary sequences on target messenger RNA transcripts (mRNAs), usually resulting in translational repression or target degradation and gene silencing (Kusenda et al., 2006; Bartel, 2009). Another type of non-coding RNA is small nucleolar RNA or snoRNA, which performs chemical modification of other RNAs, mainly ribosomal RNAs (rRNA) and transfer RNAs (tRNA). They function in the form of ribonucleoproteins (snoRNPs), which use base complementarity to guide site-specific 2′-O-ribose methylations or pseudouridylations (Kiss-Laszlo et al., 1996; Liang et al., 2001; Bachellerie et al., 2002). In bacteria, small regulatory RNAs are significant for several other purposes like house keeping, stress response, virulence, quorum sensing etc.

2. Studies on RNA structures

The discovery of catalytic activity in RNA has prompted serious attempts to decipher the mechanisms of their assembly into functional native states starting from linear strands. Since the release of crystal structures of the ribosomal subunits in 2000–2001, determination of RNA structures has gained pace remarkably, thanks to the advancements of various chemical and biophysical methods. Elucidation of large and complex RNA structures like group I introns, riboswitches, RNA components of both A-type and B-type RNaseP etc. by X-ray crystallography has been possible. Improvements in techniques for the synthesis, purification, crystallization and derivatization of large RNAs, as well as the development of advanced software, have been crucial for this advancement. In recent years the nuclear magnetic resonance (NMR) spectroscopic technique has also improved significantly for structure determination of large molecules. Developments in other physical and chemical methods like single-particle cryo-electron microscopy, mass spectrometry and structure-specific chemicals and enzymes have also helped in this exponential growth of structural data in the Protein Data Bank (PDB) (Bernstein et al., 1977; Berman et al., 2000). The determination of various RNA structures, such as the hammerhead ribozyme (Scott et al., 1995a,b), SRP RNA (Zwieb et al., 1999) and the 5S, 16S and 23S RNAs of the ribosome (Ban et al., 2000) has greatly increased our knowledge of RNA folds and the three-dimensional organization of RNA chains (Ferre-D'Amare et al., 1998; Batey et al., 1999; Hermann and Patel, 1999). Collectively, these structures provide a large amount of information about RNA structural motifs (Moore, 1999). Similar exponential growth in the number of crystal structures of proteins is also taking place in the PDB. Considering the need for classification of these proteins, there are a number of methods available, such as SCOP (Murzin et al., 1995; Hubbard et al., 1997), FSSP (Holm and Sander, 1997), Pisces (Wang and Dunbrack, 2005), BIPA (Lee and Blundell, 2009) etc. These methods can classify a protein structure based on its structural class, source organism, secondary structure content, resolution, etc. In a similar manner, it is also necessary to organize the available RNA structures to determine different structure–function relationships.

The building blocks of nucleic acid structures are the base pairs, formed by specific hydrogen bonds between complementary bases. These base paired stacks give rise to the long antiparallel double helical DNA. In the canonical Watson–Crick base pairing pattern in DNA, adenine (A) forms a base pair with thymine (T), and guanine (G) with cytosine (C). RNA also has similar patterns of Watson–Crick base pairing between adenine (A) and uracil (U), and between guanine (G) and cytosine (C) (Fig. 1). However, RNA structures are collections of short helices interspersed by unpaired regions and packed together into compact structures, instead of monotonous long helices (Higgs, 2000). Unlike DNA, RNA double helices do not have structural polymorphism. They can only adopt the A-form helical conformation with a very narrow major groove and a shallow and wide minor groove, in which the minor groove sides of A:U and G:C Watson–Crick base pairs do not possess much sequence dependent variation of the hydrogen bonding signatures required for specific recognition by proteins and ligands (Fig. 2). Instead, RNA forms a range of structural patterns through base pairing and base stacking, giving rise to various motifs and folds as the building blocks of three-dimensional organization: double helices, hairpin loops, internal loops, kissing loops, pseudoknots, coaxial stacks etc. The three-dimensional structures are stabilized by long-range intra-molecular interactions between the secondary structural elements to yield complex motifs.

Fig. 1. Canonical Watson–Crick base pairs (a) A:U and (b) G:C. Hydrogen atoms are capable of forming hydrogen bonds, hence are shown as sticks pointing from the donor atoms. The nitrogen and oxygen atoms are shown as red spheres as both are acceptors. Carbon atoms are shown as shaded cyan spheres as they give rise to van der Waals interaction only.

Fig. 2. Major and minor grooves in B-form (top) and A-form (bottom) double helices. The base atoms are shown in blue and the sugar-phosphate backbone is shown in silver. It is to be noted that the major groove becomes narrow and deep, hence inaccessible, in case of A-DNA.

However, a classification of RNA based on these structural motifs, similar to the structural classification of proteins into α, β, α/β and (α + β) categories by SCOP, is difficult because the structural motifs are few and common to all types of RNA. Moreover, RNA structural motifs are truly structural, and there may be several sequences that adopt the same structure. Thus most of the RNA classification databases like RNA Base (Murthy and Rose, 2003), fRNAdb (Mituyama et al., 2009), SCOR (Klosterman et al., 2002; Tamura et al., 2004), Rfam (Griffiths-Jones et al., 2003, 2005) etc. attempt to classify the available RNA structures on the basis of their functional categories. Only RNAFRABase (Popenda et al., 2008, 2010) and the RNAJunction database (Bindewald et al., 2008) sort RNA structures according to the motifs present. However, given the ever-increasing number of RNA structures, many of these classification efforts could not keep their databases up to date by manual curation alone. Keeping in mind the growing number of structures released in the PDB and the need for regular updates, we developed the HD-RNAS database [http://www.saha.ac.in/biop/www/HD-RNAS.html] for classification of the RNA structures available in the PDB database using an automated programmatic approach (Ray et al., 2012). At the first stage, the RNA structures are classified according to their functional classes: tRNA, rRNA, mRNA, Ribozymes, Riboswitches, Ribonucleases and Signal recognition particle (SRP) RNAs. Each functional class is divided into sub-classes according to the source organisms of the RNA molecules. As the classification and database creation are done by a software suite, this automated tool is capable of frequently classifying the newly released structures with minimal manual intervention. The classification also provides (i) a non-redundant set of RNA structures for unbiased analysis, based on functionality, source organism and sequence variation, and (ii) a set of structures of identical sequences for each class, which have all the ligand-binding and environmental effects.

3. RNA structural organization through base pairing interactions

The basic structural arrangement of nucleic acids in the cell is formally defined by two types of interactions: (i) hydrogen bonding between a donor atom with a polar hydrogen and an electronegative acceptor atom forming a base pair, and (ii) π–π stacking interaction between the aromatic moieties of the nucleobases (Saenger, 1984). Hydrogen bonding in the Watson–Crick base pairing scheme is the major component for stabilization of nucleic acids. These base pairs give rise to the common double helical structure of deoxyribonucleic acid (DNA) as proposed by Watson and Crick (Watson and Crick, 1953). The specificity of base pairings in DNA leads to exact complementarity of the two strands, which further enables faithful replication of the daughter strands with identical sequence and transcription to mRNA containing the sequence of the gene. Thus, Watson–Crick base pairing is the most crucial component in this scheme as a whole.

Different experiments, namely X-ray fibre diffraction (Saenger, 1984), circular dichroism spectroscopy (CD) (Cantor and Schimmel, 1980; Bloomfield et al., 2000), linear dichroism (Premilat and Albiser, 1995), etc., indicated that DNA double helical structure may depend on base sequence, and on solution environment probably to an even larger extent. The structural forms are commonly known as A-DNA, B-DNA, C-DNA, Z-DNA, etc. (Saenger, 1984; Ghosh and Bansal, 2003). Among these, B-DNA is possibly more relevant in physiological conditions. In this form of DNA, there are ten base pairs per turn, the separation between neighbouring base pairs is around 3.4 Å, and the charged phosphate groups are at the periphery, about 10 Å away from the imaginary helix axis running through the centre of the double helix (Saenger, 1984; Neidle, 2002). Two antiparallel sugar-phosphate backbone strands of the double helix give rise to two unequal grooves, the major groove and the minor groove (Fig. 2). The

major groove in B-DNA has a larger dimension compared to the minor groove, restricted by atoms of the sugar moiety. The base pairs have a greater number of functional groups exposed towards the major groove, such as amino group, carbonyl group, imino group, etc., which can form additional hydrogen bonds with proteins or other ligands. Moreover, disposition of these functional groups depends highly on the base sequence. Nature utilizes this specific location for recognition of specific sequence during gene regulatory protein binding to DNA (Jones et al., 1999). It is found that most of the activator as well as repressor proteins bind to DNA in the major groove region. The alpha-helices of these proteins often bind, or rest, in the major groove and the side chains form specific hydrogen bonds with bases (Lee and Blundell, 2009). The minor groove, on the other hand, is much less varied: A:T and T:A both have two hydrogen bond acceptors about 6 Å apart. Both G:C and C:G have similar hydrogen bond acceptors along with a hydrogen bond donor in the middle. Protein motifs cannot bind in the narrow minor groove of B-DNA, but many DNA binding antibiotics bind to the minor groove, often rather non-specifically.

Dehydrated DNA adopts the A-form structure where the major groove becomes too narrow and the minor groove is very wide and shallow. The ribose sugar here adopts a C3′-endo form as against C2′-endo puckered sugars in B-DNA. The C3′-endo sugars lead to shorter distances between the successive phosphate groups. It was hypothesized that such short distance between the negatively charged phosphate groups would give rise to economy of hydration leading to structural alteration (Saenger et al., 1986). It may further be noted that, in hydrophobic environment, the mostly polar major groove becomes unused while the relatively more hydrophobic minor groove makes more contact with solvent. At physiological condition, RNA on the other hand adopts the A-form possibly for two reasons: (i) its minor groove attains a more polar nature due to the presence of 2′-OH and (ii) these OH-groups cannot sterically fit in the B-form structure. Recent studies also suggest that water molecules prefer to bind to Watson–Crick base paired RNA double helical structures in the groove regions (Kirilova and Carugo, 2011; Kirmizialtin and Elber, 2010; Auffinger and Westhof, 1998).

As the exposed minor groove of RNA does not give enough variations in hydrogen bonding pattern for recognition of ligands, nature probably used different non-canonical base pairs in most of the functional RNA folds (Lescoute and Westhof, 2006; Butcher and Pyle, 2011). Purine and pyrimidine bases use three edges for hydrogen bonding: the Watson–Crick edge (W), the Hoogsteen edge (H) and the Sugar edge (S) (Fig. 3). Although the Hoogsteen edge applies only to purines, the term is widely used to refer to the 'C–H' edge of pyrimidines (Leontis and Westhof, 2001; Leontis et al., 2002). These base pairs can also be in cis or trans forms, depending on the relative orientation of their ribose sugars about the pseudoaxis along the hydrogen bond interactions. According to the study of Leontis and Westhof, there can be 12 basic types of base pairing geometries possible in RNA structures, considering three distinct base pairing edges of each base (W, H, S) and two different orientations

Fig. 3. The base pairing edges of nucleobases. W: Watson–Crick edge, H: Hoogsteen edge, S: sugar edge. (a) Adenine, (b) Guanine, (c) Uracil, (d) Cytosine.
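
As a rough illustration of how the three edges shown in Fig. 3 combine into geometric families, the following minimal Python sketch enumerates every unordered pair of edges together with a cis or trans orientation. The names EDGES, ORIENTATIONS and base_pair_families are hypothetical and introduced only for this example; the labels follow the W:WC style of notation used in this review. It is a counting sketch under these assumptions, not part of any published tool.

```python
from itertools import combinations_with_replacement

# Assumed single-letter codes for the three edges and two orientations.
EDGES = ("W", "H", "S")      # Watson-Crick, Hoogsteen, Sugar edge
ORIENTATIONS = ("C", "T")    # cis, trans


def base_pair_families():
    """Return labels such as 'W:WC' or 'H:ST' for every edge pair and orientation."""
    return [
        f"{edge1}:{edge2}{orientation}"
        # Unordered edge pairs: W:W, W:H, W:S, H:H, H:S, S:S
        for edge1, edge2 in combinations_with_replacement(EDGES, 2)
        for orientation in ORIENTATIONS
    ]


families = base_pair_families()
print(len(families))  # 12
print(families)
```

Six unordered edge pairs times two orientations give the 12 families; the base identities (A, G, C, U) are deliberately left out of this sketch.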

about the axis of interactions (cis and trans): cis Watson–Crick/Watson–Crick (W:WC), trans Watson–Crick/Watson–Crick (W:WT), cis Watson–Crick/Hoogsteen (W:HC), trans Watson–Crick/Hoogsteen (W:HT), cis Watson–Crick/Sugar edge (W:SC), trans Watson–Crick/Sugar edge (W:ST), cis Hoogsteen/Hoogsteen (H:HC), trans Hoogsteen/Hoogsteen (H:HT), cis Hoogsteen/Sugar edge (H:SC), trans Hoogsteen/Sugar edge (H:ST), cis Sugar edge/Sugar edge (S:SC), and trans Sugar edge/Sugar edge (S:ST) (Leontis and Westhof, 2001). According to this classification, the canonical A:U and G:C pairs belong to the cis Watson–Crick/Watson–Crick (W:WC) geometry. Theoretically, this gives rise to 264 possible types of distinct base pairing schemes. However, stabilization of these base pairing schemes by hydrogen bonding is possible in fewer cases, as not all of these possible pairs of bases have complementary hydrogen bonding donors and acceptors in their corresponding edges. This review focuses on the structural features of these various types of non-canonical base pairs. We focus on quantum chemical studies constrained to isolated base pair systems and classical molecular dynamics simulation studies performed on non-canonical base pairs in larger systems, e.g., double helical stretches, tetraloops, etc., to elucidate their stability in the context of their base pairing patterns.

In most cases, base pairings involve polar hydrogen bonds mediated by N–H…O/N and/or O–H…O/N types of interactions. Non-polar interactions like C–H…O/N are also considered to be involved in base pairing patterns (Leonard et al., 1995; Auffinger and Westhof, 1996; Berger et al., 1996; Wahl and Sundaralingam, 1997; Auffinger and Westhof, 1999; Brandl et al., 1999). Recent studies indicated that C–H groups can also act as hydrogen bond donors (Panigrahi and Desiraju, 2007; Desiraju, 2010), while their strength may depend on the acidity of the group due to the local chemical environment (Panigrahi et al., 2011b). For example, the C8–H8 in purine bases, C5–H5 of pyrimidine bases and C2–H2 of adenine can be considered for such base pairing interactions, C5–H5 being the most feasible. Some groups have characterized these as blue shifting hydrogen bonds having substantially smaller interaction energies (Joseph and Jemmis, 2007). However, such H-bonds may still be important for understanding the specificity of ligand receptor interactions and hence in structure based drug design also. In addition to the bases interacting directly to form different types of base pairs, the 2′-OH group of RNA can also interact with another base to form a specific base pair, particularly involving the sugar edge of a base, giving rise to enormous theoretical possibilities. In an extensive study of non-canonical base pairs, the energies of different types of hydrogen bonding interactions have been estimated in the base pairing context. It is found that N–H…O and N–H…N interactions are significantly distinct in terms of their contribution towards base pair interaction energies (−7.1 kcal/mol and −5.7 kcal/mol, respectively), while O–H…N interactions are the strongest among these (−7.2 kcal/mol). Weakly polar C–H…O/N type hydrogen bonds are less strong; their energy contributions range from −0.5 to −2.0 kcal/mol (Roy et al., 2008). This analysis also helps to predict the interaction energies and stability of the base pairs from their hydrogen bonding geometries. The interaction energies of base pairs are calculated by various groups as $\Delta E = E_{\mathrm{basepair}} - E_{\mathrm{base1}} - E_{\mathrm{base2}} + \text{correction terms}$, where $E_{\mathrm{basepair}}$, $E_{\mathrm{base1}}$ and $E_{\mathrm{base2}}$ are the total potential energies of the base pair and the two individual bases in their relaxed form, calculated by ab initio quantum chemical methods with different levels of approximation/rigor. The most important correction term is due to basis set superposition error. The free energies of formation of base pairs are also sometimes calculated with significantly more effort. For example, the G:C W:WC base pair has two N–H…O type interactions, between O6G-N4C and N2G-O2C, and one N–H…N type interaction between N1G-N3C (Fig. 1). The expected interaction energy of this base pair calculated from energy components is approximately −20 kcal/mol (2 × 7.1 + 5.7 = 19.9 in magnitude). In a similar way, the interaction energy of an A:U W:WC base pair is about −13 kcal/mol, with one N–H…O and one N–H…N interaction (Fig. 1).

4. Tools for RNA structure analysis

Base pair finding in three-dimensional RNA structures is a non-trivial job, and used to be done by visual inspection. Considering the immense importance of non-canonical base pairs in RNA structure and function, several surveys have led to the development of various computer programs and databases for convenient identification, description and structural characterization of RNA base pairs and higher order motifs (Lemieux and Major, 2002; Waugh et al., 2002; Yang et al., 2003; Olson et al., 2009). Different types of canonical and non-canonical base pairing interactions are tabulated in the BPS (Base Pair Structures) (Xin and Olson, 2009) and NCIR (Non-Canonical Interactions in RNA) (Nagaswamy et al., 2000, 2002) databases. Nowadays, several software packages like MC-annotate (Gendron et al., 2001), MANIP (Massire and Westhof, 1998), HBExplore (Lindauer et al., 1996), 3DNA (Lu and Olson, 2003), BPView (Yang et al., 2003), BPFIND (Das et al., 2006), etc., have been developed to detect base pairs and tertiary structures based on geometric criteria. The 3DNA software uses the co-planarity condition of the bases to assign the base pairing interactions in RNA. However, many of these methods detect base pairs barely stabilized by a single hydrogen bond or water/ion-mediated hydrogen bonds, whose stability and strength of interaction remain questionable (Sponer et al., 2005a,b,c; Bhattacharyya et al., 2007; Sharma et al., 2008). It is true that a single hydrogen bond between two bases can lead to considerably strong attraction, but even after formation of a strong hydrogen bond the bases can rotate freely about the vector through the H…Acceptor bond. Hence, the bases may attain non-planar geometry, which may not stack well within a double helix and may not be called a base pair. Furthermore, for specificity in recognition, at least two hydrogen bonds are required. On the other hand, BPFIND identifies the base pairs stabilized by at least two hydrogen bonds in an RNA structure (Das et al., 2006). It considers closeness of two pairs of hydrogen bond donor and acceptor atoms, such as amino nitrogen and carbonyl oxygen, and linearity of four pseudo angles for detection of a base pair. Four such linear pseudo angles simultaneously ensure two linear hydrogen bonds (angle formed by D–H…A close to 180°) and co-planarity of the two bases forming a base pair. Consideration of at least two hydrogen bonds in a base pair reduces the huge number of theoretically possible varieties of base pairs (264 types) to 128. This number is achieved by considering a few possibly protonated base pairs and a few sheared varieties also. A complete list is available at http://www.saha.ac.in/biop/www/db/local/BP/rnabasepair.html. It is noteworthy that the accuracy of any automated detection procedure depends on the criteria employed in different software packages by different groups. Hence, a cutoff distance or angle can always be questionable (Kabsch and Sander, 1983). In reality, hydrogen bonding is a continuous event, whereas software programs assume it to be a discrete function of distance and angle while setting the cutoff values.

Similarly, for structural characterization of base pairs and their stacking arrangements, Curves (Lavery et al., 2009), 3DNA (Lu and Olson, 2003), Freehelix/NEWHELIX (Dickerson, 1998), SCHNAaP (Lu et al., 1997b), NUPARM (Bansal et al., 1995; Mukherjee et al., 2006) etc. are the most commonly used utilities. According to the EMBO workshop (Dickerson, 1989), structures of base pairs or base pair steps are defined with the help of three translational and three

rotational degrees of freedom along the three mutually perpendicular axes fixed on the bases or base pair planes, respectively (Dickerson, 1989; Bansal et al., 1995; Dickerson, 1998; Olson et al., 2001). Within a base pair, the spatial arrangement of one base with respect to the other can be quantitatively defined with the help of intra-base pair parameters: buckle, propeller, open-angle, shear, stagger and stretch (Fig. 4). These intra-base pair parameters have a direct resemblance to the three-dimensional conformation of a base pair, i.e., shear indicates sliding of one base with respect to the other in the base pair plane. The movement of the uracil base in a G:U W:WC base pair as compared to cytosine in a G:C W:WC base pair is a typical example of shear (Fig. 5). Stagger indicates out of plane motion of one base with respect to the other; stretch indicates separation of the two bases relating to hydrogen-bonding distance; buckle indicates the amount of cusp formation. Open indicates the angle between the two bases on the base pair plane. This causes elongation of one of the hydrogen bonds in an A:U W:WC base pair, for example. Propeller twist is the twisting motion of the two bases about the base pair long axis. Among these six parameters, shear, stretch and open-angle relate directly to the hydrogen-bonding pattern and proximity, while buckle, propeller and stagger describe the overall non-planarity of a base pair compared to the ideal coplanar geometry. It is expected that these parameters will highlight two aspects of base pair geometry: (1) quality of the hydrogen bonds forming the base pairs in terms of their deviation from the ideal geometry and (2) relative orientation of the bases with respect to each other, such as non-planarity. Both these properties would essentially indicate the strength of association of the base pairs and hence will highlight the role of such pairs in RNA fold formation and recognition.

Fig. 4. Intra-base pair parameters for describing relative orientation of the two bases in a base pair.

Fig. 5. Sheared structure of a G:U W:WC base pair (ochre colouring) superposed on a G:C W:WC base pair (cpk colouring). The guanine bases are superimposed, indicating shearing movement of uracil with respect to cytosine. H-bonds are shown by dotted lines.

In other software programs like Curves, 3DNA etc., these values are unusually large for most non-Watson–Crick base pairs and, hence, cannot be correlated to the quality of hydrogen bonding. It has been shown earlier that 3DNA calculates an open-angle value of about 90° for a stable A:U H:WT base pair (Halder and Bhattacharyya, 2010). These utilities employ a unified axis definition for both canonical and non-canonical base pairs. In contrast, the hydrogen bonding edge-specific axis system employed in NUPARM generates values of the base pair parameters (buckle, open angle, propeller twist, stagger and shear) close to zero for a strong, stable and planar base pair, irrespective of its base pairing geometry (Mukherjee et al., 2006). Thus, small magnitudes of parameters in the NUPARM definition indicate stable base pairing even for the non-canonical ones, which is in accordance with the IUPAC-IUB convention. The stretch values calculated by NUPARM are, however, around 3 Å for both canonical as well as non-canonical base pairs, reflecting the approximate hydrogen bond length between donor and acceptor atoms. The distributions of the base pair parameters for two representative base pairs, one canonical G:C W:WC and one non-canonical A:G H:ST, using a non-redundant dataset of RNA structures highlight the agreement between the NUPARM definition and the IUPAC-IUB recommendation of intra-base pair parameters (Fig. 6). The plots indicate normal distribution of rotational parameters centred around zero for all the base pairs. The propeller twist values for the cis base pairs are generally negative. The shear value for the A:G H:ST base pair is a little higher, a prerequisite to maintain two stable hydrogen bonds. A similarly large shear value is also expected for G:U wobble base pairs.

Likewise, the spatial arrangement of one base pair with respect to the successive ones in a dinucleotide unit can be quantitatively defined with the help of three rotational (tilt, roll, twist) and three translational (shift, slide, rise) inter-base pair parameters (Fig. 7). As expected in the A-form double helical structure of RNA, the roll values are generally positive and range up to 15°, slide values are close to −1.5 Å, and tilt and shift values are close to zero. The roll and slide values are mostly dinucleotide sequence dependent and follow Calladine's steric clash based rule (Calladine, 1982). In our previous study, we have observed some unusual values of inter-base pair parameters, especially of twist, for dinucleotide steps containing non-canonical base pairs (Halder and Bhattacharyya, 2010). Nevertheless, they follow a normal distribution similar to those containing the canonical base pairs only.

In addition to interaction between two bases forming different types of base pairs, quite often three or more bases appear in co-planar orientation within nucleic acid structures. These are commonly known as base triples or quadruples. These base triples can stack on top of each other giving rise to a three-stranded helix, while the base quartets give rise to a quadruple stranded helix, presumably in the telomeric regions of chromosomal DNA. Base triplets can be found in several types of RNA structural motifs, e.g., kink-turn motifs, sarcin-ricin loops, tetraloop–receptor interactions, A-minor motifs etc. There has been considerable activity in finding and classifying the base triples in RNA structures by various groups (Abu Almakarem et al., 2012; Lee and Gutell, 2004; Xin and Olson, 2009). Following the Leontis and Westhof nomenclature of non-canonical base pairs, we can theoretically have 108 types of base triplets, among which only 68 types are found in RNA structures. Model building studies have shown that the rest of the geometries are not likely because of steric clashes (Abu Almakarem et al., 2012). Some of these base triplets have special significance in functional RNAs. For example, riboswitches have two major domains: the aptamer domain for binding to the metabolites and the expression platform, which undergoes structural changes upon metabolite binding to the aptamer domain. Base triplets are found to form the floor and roof of the metabolite binding site in the aptamer domain. (Noeske et al.,


Fig. 6. Distribution of intra-base pair parameters for canonical and non-canonical base pairs (a) G:C W:WC, (b) A:G H:ST.
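The pair labels used in this review, such as G:C W:WC or A:G H:ST, give the two bases followed by the interacting edge of each base (W for Watson–Crick, H for Hoogsteen, S for sugar edge) and the cis (C) or trans (T) orientation. As a minimal sketch of this convention, the Python snippet below enumerates the twelve edge/orientation families and splits a label into its parts. The function names and the case-insensitive matching are assumptions made for illustration; they are not part of BPFIND or NUPARM.

```python
from itertools import combinations_with_replacement

# Edge and orientation codes as used in the labels of this review.
EDGES = {"W": "Watson-Crick", "H": "Hoogsteen", "S": "Sugar"}
ORIENTATIONS = {"C": "cis", "T": "trans"}


def geometric_families():
    """Return the 12 edge/orientation labels, e.g. 'W:WC' or 'H:ST'."""
    return [
        f"{e1}:{e2}{o}"
        for e1, e2 in combinations_with_replacement("WHS", 2)
        for o in "CT"
    ]


def parse_pair_label(label):
    """Split a label such as 'A:G H:ST' into bases, edges and orientation.

    Hypothetical helper: edge letters are matched case-insensitively,
    which is an assumption of this sketch.
    """
    bases, geometry = label.split()
    base1, base2 = bases.split(":")
    edge1, rest = geometry.split(":")
    edge2, orientation = rest[0], rest[1]
    return {
        "bases": (base1, base2),
        "edges": (EDGES[edge1.upper()], EDGES[edge2.upper()]),
        "orientation": ORIENTATIONS[orientation.upper()],
    }


if __name__ == "__main__":
    print(geometric_families())
    print(parse_pair_label("A:G H:ST"))
```

The edge order in a label follows the order of the bases, so 25G:A28 S:HT and 28A:G25 H:ST in Table 1 describe the same pair; the enumeration lists each family once, with W before H before S.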

2005; Sharma et al., 2009a,b). In ribosomal RNAs, 85–90% of the total base triplets are found to be conserved in different species, indicating their importance in proper functioning of the ribosomes (Abu Almakarem et al., 2012). The topology of base quartets of RNA is, however, quite different from that of quadruplex DNA. In RNA quartets, we generally notice that base A is simultaneously paired to bases B and C, and either B or C is also paired to base D. On the other hand, in a DNA quadruplex, each base pairs with two other bases in a cyclic arrangement (Fig. 8).

5. Structure and stability of non-canonical base pairs

There have been a large number of quantum chemical studies on different types of canonical and non-canonical base pairs to characterize their structure and stability using various quantum chemical methods. Most of these studies selected base pairs from crystal structures or even modelled them, obtained energy-minimized structures using Density Functional Theory, Hartree-Fock or post-Hartree-Fock methods like MP2, and analysed their interaction energies. Ab initio quantum chemical methods have been applied extensively to study nucleic acid base pairs since their first use in 1986–1988. These studies indicate that many of these non-canonical base pairs are as stable as the canonical ones (Hobza and Sponer, 1999; Sponer et al., 2004; Roy et al., 2008; Mladek et al., 2009). The interaction energy values generally range between −26 kcal/mol and −5 kcal/mol. Optimizations of the non-canonical base pairs by different quantum chemical methods show that the base pairs stabilized by a pair of polar


hydrogen bonds are optimized to structures close to the crystal ensembles by most of the methods, indicating their high stability (Roy et al., 2008; Sharma et al., 2010; Panigrahi et al., 2011a,b). The non-polar base pairs and those involving sugar-mediated hydrogen bonds are generally found to be rather flexible, as their optimized structures deviate significantly from their respective structures in the crystal environment. Earlier quantum chemical studies on different non-canonical base pairs occurring in functional RNA had shown that movement is favourable for most of them along the propeller and buckle directions, but rather restricted for open, shear and, in particular, stretch, as these are associated with distortion of hydrogen bonding between the bases (Roy et al., 2008). The vibration along the stretch direction was found to occur at high frequency and to be coupled to a few angle bending motions, indicating a larger energy cost. This study also indicated that the vibrational motions of the bases with respect to each other take place within a time period of 5–10 ps. Similar features were also obtained from nuclear magnetic resonance (NMR) spectroscopy, but these studies were restricted to Watson–Crick base pairs of DNA (Chen and Russu, 2004; Snoussi and Leroy, 2001). As expected, the non-polar base pairs have poor interaction energies as compared to most of the polar base pairs. It is also observed that base pairs with free carbonyl or amino groups tend to undergo structural conversion to involve the free carbonyl or amino group in hydrogen bonding. Two such examples are G:C W:WT and G:G W:HC, where the optimized structures use pairing through different edges (Sharma et al., 2008; Panigrahi et al., 2011a). It may be worth mentioning that the G:G W:HC base pair is an integral component of telomeric DNA forming G-quadruplexes, and these are usually stabilized by K+ ions (Fig. 8b). This indicates that the non-hydrogen bonded polar functional groups can be involved in molecular recognition: in the absence of the specific ligand the non-canonical base pairs can adopt a different structure, thereby acting as a conformational switch (Sharma et al., 2008; Panigrahi et al., 2011a). A similar structural transition has been observed for wobble G:U W:WC base pairs during MD simulation runs (Halder and Bhattacharyya, 2012).

Fig. 7. Inter-base pair parameters for describing relative orientation of the two base pairs in a local doublet.

Fig. 8. Base quartets for (a) RNA (residues 67, 102, 151 and 170 of 2J00.pdb) and (b) DNA (residues 2, 11, 15 and 22 of 3ERU.pdb). In case of DNA, the four bases in a quartet are arranged in cyclic orientation.

Most of these QM studies on base pairs, however, were carried out in gas phase. Considering the fact that these base pairs mostly appear in the core of the molecules, such an approximation of gas phase or low dielectric simulations appears reasonable. Furthermore, most of the quantum chemical methods considering an implicit solvent model are rather approximate, and a realistic measure of solvent effects does not arise from these studies (Sen et al., 2004; Sponer et al., 2010; Tomasi et al., 2005; Ribeiro et al., 2011; Sponer et al., 2009). Moreover, these base pairs often remain within a double helical region, where the two faces of the base pairs have a hydrophobic environment and the edges may have a polar environment. Simulation of such a complex environment requires a complicated QM/MM method, which to our knowledge has not been applied to the base pairs. The molecular dynamics simulation studies using complete hydration, however, give results similar to the quantum chemical studies (Sponer et al., 2010).

Double helical DNA or RNAs mostly have negative propeller twist for the Watson–Crick base pairs, and these give rise to steric clash between the bulky purine bases of successive base pairs along the minor groove and major groove for pyrimidine-purine and purine-pyrimidine dinucleotide steps, respectively (Calladine, 1982). It may be speculated that non-planarity of the base pairs in the propeller twist direction is due to partial pyramidalization of the amino groups involved in hydrogen bonding. The near-universal tendency of negative propeller twist does not remain true for the trans base pairs, and these are positive more often than negative (Mukherjee et al., 2006). Sugar-mediated base pairs mostly deviate from their ideal coplanar geometries. It has been speculated that base pairs involving the sugar edge are usually non-planar in order to avoid a steric clash involving the bulky ribose-sugar moiety (Mukherjee et al., 2006; Roy et al., 2008). Sometimes the extent of deviation becomes greater in the quantum mechanically optimized structures

compared to that observed in crystal structures. In the cis Sugar edge/Sugar edge family (S:SC), a good overall agreement can be observed between the optimized geometries and crystal structures, the interaction energies indicating their high stability. In contrast, only a few base pairs from the trans Sugar edge/Sugar edge family, like G:G S:ST, G:C S:ST and A:G S:ST, are quite stable and retain their initial base pairing patterns upon geometry optimization (Sponer et al., 2005a,b,c). In addition to the common sugar-base and base–base components, S:SC base pairs also display a polar interaction between the 2′-OH hydroxyls of the ribose moieties, unique to this class. The interaction energies of A:G S:SC, C:G S:SC, G:G S:SC, C:U S:SC and U:G S:SC are in the range of −23 kcal/mol to −27 kcal/mol, compared to −15.3 and −29.4 kcal/mol for A:U W:WC and G:C W:WC, respectively (Sponer et al., 2005a). The interaction energies for the cis Watson–Crick/Sugar edge base pairs also signify the importance of 2′-OH mediated interactions in stabilizing the base pairs (Vokacova et al., 2007). In addition to electrostatic interactions, the contribution of the electron correlation component is manifested in the stability of the W:SC base pairs because of the larger contact area between the bases. The studies also show high specificity for interactions through amino- or imino-groups of nucleobases in both the cis and trans families of W:S-type base pairs (Sponer et al., 2005b,c; Vokacova et al., 2007). The interaction energies vary more widely in the W:ST family than in the W:SC class, the leading stability coming from base–base interactions (Sponer et al., 2005b,c). On the other hand, the sugar-base form is more favoured over the base–base interaction in the W:SC family. The H:SC base pair family differs biochemically from the other base pairing families, with distinctive structural and functional roles (Sharma et al., 2010). It shows a propensity of occurrence between two consecutive bases, in which the backbone conformation obstructs the participation of the 2′-OH group in base pairing interactions. The interaction energies vary from −5.2 to −20.6 kcal/mol in this family, dispersion being the leading stabilizing force (Sharma et al., 2010). The base pairs in the H:SC family have more hydrophobic character than their counterparts in the H:ST family. The H:ST type of base pairs show stability in their alternative amino-acceptor form as well (Mladek et al., 2009).

As indicated in Section 4, the BPFIND algorithm can detect possibly protonated base pairs in RNA structure due to its unique hypothesis driven algorithm. It assumes a possible protonation of the imino nitrogen atoms when it finds such an atom in close proximity of another electronegative atom, such as a carbonyl oxygen. This gives rise to such protonated base pairs, and many of these are found to occur a significant number of times in a non-redundant dataset of RNA crystal structures. Quantum chemical studies of these indicate that some of them, such as C(+):C W:WT, C(+):G W:HC, G(+):G S:HT etc., are even more stable than the G:C W:WC base pair (Chawla et al., 2011). Admittedly, protonation is a costly process and may be favoured by the presence of ions or positively charged amino acids in close proximity. Thus, these protonated base pairs can also act as a conformational switch.

6. Structural organization of RNA: importance of non-canonical base pairs

The main driving force for RNA architecture is the packing of RNA helices and modules through molecular recognition by specific contacts between RNA segments. RNA forms more locally stable structures, or structural motifs, that are combinatorially linked and constrained by tertiary interactions for stabilization of the 3D structure (Hendrix et al., 2005). These motifs have been previously described as 'directed and ordered stacked arrays of non-Watson–Crick base pairs forming distinctive foldings of the phosphodiester backbones of the interacting RNA strands' (Leontis and Westhof, 2003). A base can be approached for hydrogen bonding interaction along any of its three edges. Even a paired base leaves the other two edges available for further interactions with bases from different parts of the RNA chain. Thus, non-canonical base pairs allow secondary or tertiary structural motifs of RNA to interact between themselves to maintain the folded structure (Lescoute and Westhof, 2006; Butcher and Pyle, 2011). The basic secondary structural elements of RNA are typified by double helices, bulges, internal loops and hairpin loops (Fig. 9). The three dimensional structures are stabilized by long-range intra-molecular interactions between basic secondary structural elements (helices and loops) to yield complex motifs, such as pseudoknots, ribose zippers, kissing hairpin loops, tetraloop–tetraloop receptor interactions, and co-axial or pseudocontinuous helices (Fig. 10). Many of the frequently found non-canonical base pairs appear quite often within the double helical regions of different functional RNA structures. These are most common at one of the termini of different double helical stems, where the non-canonical base pairs, presumably, have a role in helix capping. On the other hand, there are instances of double helices in functional RNA crystal structures where one or more non-canonical base pairs appear in tandem flanked by regular Watson–Crick base pairs (Gautheret et al., 1994).

Tetraloops are the most common and well-studied type of hairpin loops. There are at least four types of tetraloops that are characterized by their sequence and conserved structures: GNRA type (Heus and Pardi, 1991; Jucker and Pardi, 1995; Leontis and Westhof, 2002; Correll and Swinger, 2003), UNCG type (Cheong et al., 1990), ANYA type (Convery et al., 1998; Rowsell et al., 1998; Klosterman et al., 2004) and the (U/A)GNN type (Butcher et al., 1997) [N: any residue, Y: pyrimidine, R: purine]. In each of these tetraloop families, the second and third nucleotides form a turn in the RNA strand, and a non-canonical base pair between the first and fourth nucleotides stabilizes the stem-loop structure, such as G:A H:ST in GNRA tetraloops (Heus and Pardi, 1991; Jucker and Pardi, 1995; Correll and Swinger, 2003) or a Watson–Crick/sugar edge base pair in ANYA loops (Convery et al., 1998; Rowsell et al., 1998; Klosterman et al., 2004). We have found from our crystal structure analysis that UNCG type tetraloops have a U:G S:WT type of base pair at the stem-loop junction in some cases, while in others it remains unpaired. The quality of these base pairs can be judged from analysis of their base pair parameters (Table 1). It has been determined, in general, that the stability of the tetraloop depends on the composition of bases within the loop and on the composition of this closing base pair (Moody et al., 2004). In case of a GNRA tetraloop, the closing base pair is stacked with the stem region, whereas the second and the third bases are also in the 3′-stack (Fig. 9). The step parameters of the single stranded regions also highlight the near perfect stacking between residues 26 and 27 of the GNRA tetraloop (Table 2). A similar type of stacking is also found in UNCG type tetraloops, where the second and third bases are looped out and the third and fourth bases are stacked on the 3′-side (Fig. 9). In the example shown in Table 2, the second uracil residue (18U) is looped out and hence has anomalous stacking parameters with the first uracil (17U) and third cytosine (19C), whereas the parameters have values similar to A-form helix for the stacking between 19C and 20G and between 20G and the terminal 21G:16C W:WC base pair of the stem region. In all the above cases the inter-base pair parameters like tilt, roll, twist, shift, slide and rise have values within the range allowed for A-form double helical structures (Table 1). Aside from tetraloops, non-canonical base pairs are found in other types of hairpin loops as well. The single base pair in a lonepair triloop is often a non-canonical one, whereas the 5-residue long T-loop present in tRNA is closed by a W:HT interaction between the nth and (n+4)th bases. Non-canonical base pairing is also common in internal loops (Hendrix et al., 2005). In case of asymmetric internal loops also, the unpaired bases are usually


surrounded by canonical as well as non-canonical base pairs. For example, the bulged G motif seen in the sarcin-ricin loop or bacterial loop E is formed by 3 stacked non-canonical base pairs with a bulging G base that forms a base triplet (Wimberly et al., 1993; Szewczak and Moore, 1995; Correll et al., 2003). A few representative data are shown in Tables 1 and 2. The inter-base pair stacking parameters for the single base bulges, calculated by NUPARM, show A-form like stacking between the base pairs surrounding single base bulges. In all these cases all the step parameters, except twist, are similar to those in regular double helical RNA or DNA.

Fig. 9. Secondary structures of RNA. (a) Double helix. (b–c) Hairpin loops: (b) GNRA tetraloop with first base stacked to 5′-end and rest three stacked to 3′-end, (c) UNCG tetraloop with first and third bases stacked to 5′-end, second base looped out and fourth base stacked to 3′-end. (d–e) Internal loops: (d) Kink-turn motif with the two helical parts on either side of the turn in blue and purple. Loop residues are shown as backbone only. (e) Hook-turn motif where the nearer strand is folded by 180° at the residue coloured red. The adjacent two helical stretches are shown in blue and cyan, whereas the other strand is shown as backbone in yellow.

Hairpin loops and internal loops need a sudden change in backbone direction at the junction of double helical stem regions and unpaired bases (Fig. 11). Non-canonical base pairs seem to render flexibility at such junctions, so that the strain introduced in the backbone is released. Hook-turn motifs also contain a sheared A:G H:ST pair at the bent region, where the sugar-phosphate backbone is folded by 180° at a single residue (Szep et al., 2003). Like the hairpin loops and internal loops, pseudo-continuous helices or co-axial helices require a sudden change in the direction of the backbone (Fig. 11). The pseudocontinuous helices are formed by stacking interaction mediated by a single base or base pair between two helical stretches aligned along the same axis (Fig. 10). They have A:G, C:C and G:U types of non-canonical base pairs at the junction regions, which provide rigidity to the linker nucleotides (Kim et al., 1996; Butcher and Pyle, 2011). Inter-base pair or dinucleotide step parameters calculated by NUPARM at the junction of the two helices indicate strong stacking interaction between them (Table 2). Similar is the case for pseudoknots, which contain at least two stem-loop structures. The unpaired loop region of a hairpin-loop motif forms base pairs with another part of the same RNA chain, folded back on itself, to form the stem region of the other hairpin-loop. Most of the loop residues participate in non-Watson–Crick base pairing, representing several geometric families including A:C W:SC, A:G W:ST, A:G S:ST, A:C S:SC, C:G W:HT etc. Thus the non-canonical base pairs allow the change in backbone direction at the junction of double helical paired regions and single stranded unpaired regions (Fig. 11).

Two other major types of long-range RNA–RNA interactions containing non-canonical base pairs that are repeatedly observed in domain assembly are the interaction of GNRA tetraloops (Costa and Michel, 1995, 1997) with their receptors and loop–loop interactions (Lehnert et al., 1996; Costa and Michel, 1997; Costa et al., 2000), and the kissing loop interactions, where the single-stranded loop regions of two hairpins interact through base pairing, forming a composite, coaxially stacked helix (Chang and Tinoco, 1994; Ennifar et al., 2001) (Fig. 10). The loop E like structure of loop B in a hairpin ribozyme displays a characteristic set of non-Watson–Crick pairs (Earnshaw et al., 1997; Butcher et al., 1999). Sheared A:G H:ST base pairs (or eventually some other isosteric members of the H:ST family) are integral components of the sarcin/ricin loop, loop E, and kink-turn motifs. Further, the free sugar edge of adenine in this base pair is often involved in critical tertiary interactions. In case of A-minor motifs, there are four


Fig. 10. Tertiary motifs in RNA structure. (a) Coaxially stacked helices, (b) between a helical and a single stranded region, (c) Kissing loop interaction between two hairpin loops, (d) Loop-receptor interaction between a tetraloop and its receptor internal loop within a helix.

types: type 0, type I, type II and type III. Among these, type I and II interactions are specific for adenine due to non-canonical base pairing interactions between the nucleobases, whereas type 0 and III motifs are weaker and non-specific because they are mediated by interactions with a single 2′-OH. In this review we have not discussed these important base pairing interactions, as they are not formed by two hydrogen bonds and hence are not detected by the BPFIND software.

7. Functional importance of non-canonical base pairs

With an exponential increase in the number of RNA crystal structures solved by X-ray crystallography and available in the PDB, the number of non-canonical base pairs observed in three-dimensional structures of RNA has also increased to date, implying their growing significance in functional RNAs. Non-canonical base pairs have an important role to play in base-specific interactions with proteins or other ligands, as they make additional functional groups available in the major or minor groove of RNA. The discriminatory major-groove edges of the base pairs are buried in the inaccessible deep groove, whereas the shallow groove permits access to the rather uniform minor-groove side of canonical pairs. RNA helices therefore have little potential for recognition by proteins or the rest of the RNA chain. However, structural modelling studies of natural or synthetic RNAs have revealed the existence of a number of different non-canonical base pairing arrangements occurring as single, tandem or consecutive base pairs within RNA duplexes (Pley et al., 1994; Baeyens et al., 1995, 1996; Shen et al., 1995; Battiste et al., 1996; Cate et al., 1996; Lietzke et al., 1996). Such perturbations in regular RNA helices by non-canonical base pairing motifs are supposed to be functionally important in adopting unusual structures, resulting in anchoring sites for metals or proteins. Various studies have also validated the importance of non-canonical base pairs for specific recognition of ligands and


Table 1 Base pair parameters, calculated by NUPARM, for representative structural motifs of RNA.

Base pair      Type   Buckle (°)  Open (°)  Propeller (°)  Stagger (Å)  Shear (Å)  Stretch (Å)

Paired GNRA tetraloop: PDB ID 1HQ1
23C:G30        W:WC   11.50       0.14      10.40          0.20         0.00       2.81
24C:G29        W:WC   6.57        0.90      2.54           0.12         0.20       2.86
25G:A28        S:HT   3.36        6.80      12.66          0.77         1.64       3.35
26A                   X           X         X              X            X          X
27A                   X           X         X              X            X          X
28A:G25        H:ST   3.36        6.80      12.66          0.77         1.64       3.35
29G:C24        W:WC   6.57        0.90      2.54           0.12         0.20       2.86
30G:C23        W:WC   11.50       0.14      10.40          0.20         0.00       2.81

Unpaired UNCG tetraloop: PDB ID 1I6U
15U:A22        W:WC   5.52        1.33      12.36          0.06         0.05       2.85
16C:G21        W:WC   12.22       1.63      13.15          0.27         0.15       2.96
17U                   X           X         X              X            X          X
18U                   X           X         X              X            X          X
19C                   X           X         X              X            X          X
20G                   X           X         X              X            X          X
21G:C16        W:WC   12.22       1.63      13.15          0.27         0.15       2.96
22A:U15        W:WC   5.52        1.33      12.36          0.06         0.05       2.85

Single base bulge: PDB ID 2J00
138G:164C      W:WC   58.62       6.02      23.53          0.93         0.18       3.13
139A:162A      W:HT   0.89        3.87      1.74           0.46         1.79       3.29

Single base bulge: PDB ID 3OFR
1427A:1554G    W:WC   23.89       1.43      20.84          0.10         0.04       2.88
1428A:1552C    W:WT   2.76        12.77     21.71          0.72         1.98       3.03

Single base bulge: PDB ID 1VQO
2069C:2119G    W:WC   0.13        0.36      5.25           0.17         0.35       2.93
2070A:2117G    H:ST   6.27        14.05     0.73           0.07         2.30       3.24

Pseudocontinuous helix: PDB ID 3BWP
289A:260A      W:SC   1.54        3.14      38.38          2.75         1.75       3.09
290C:315G      W:WC   22.73       10.76     33.70          5.63         0.07       2.12

Pseudocontinuous helix: PDB ID 2XQD
1735C:1719G    W:WC   12.84       5.77      9.16           0.98         0.29       3.01
1736G:1677G    W:WC   3.37        4.06      8.42           0.04         0.08       3.07

Table 2 Stacking parameters, calculated by NUPARM, for representative structural motifs of RNA (Parameters in bold highlighting represent the A-form like stacking.).

Base pair      Type   Tilt (°)  Roll (°)  Twist (°)  Shift (Å)  Slide (Å)  Rise (Å)

Paired GNRA tetraloop: PDB ID 1HQ1
23C:G30        W:WC   2.98      9.17      26.46      0.69       2.14       3.35
24C:G29        W:WC   1.13      8.2       6.79       1.23       1.34       3.58
25G:A28        S:HT   150.24    29.62     22.04      1.59       0.92       4.14
26A                   11.29     4.57      39.52      1.05       1.61       3.02
27A                   35.4      1.96      105.87     6.63       0.74       1.51
28A:G25        H:ST   1.13      8.2       6.79       1.24       1.35       3.45
29G:C24        W:WC   2.98      9.17      26.46      0.69       2.14       3.35
30G:C23        W:WC   X         X         X          X          X          X

Unpaired UNCG tetraloop: PDB ID 1I6U
15U:A22        W:WC   3.26      3.00      32.72      0.51       1.34       3.23
16C:G21        W:WC   13.73     0.10      34.31      1.08       1.11       3.05
17U                   93.62     83.65     45.59      7.68       9.31       3.70
18U                   16.56     121.97    46.92      2.18       8.72       6.03
19C                   7.23      15.64     21.77      0.87       3.65       4.12
20G                   10.48     14.82     18.58      1.57       3.68       2.65
21G:C16        W:WC   3.26      3.00      32.72      0.51       1.34       3.23
22A:U15        W:WC   X         X         X          X          X          X

Single base bulge: PDB ID 2J00
138G:164C      W:WC   10.77     1.28      37.03      1.62       0.24       2.38
139A:162A      W:HT

Single base bulge: PDB ID 3OFR
1427A:1554G    W:WC   2.85      3.66      2.64       2.61       0.29       3.29
1428A:1552C    W:WT

Single base bulge: PDB ID 1VQO
2069C:2119G    W:WC   1.19      3.43      73.72      2.35       0.57       3.45
2070A:2117G    H:ST

Pseudocontinuous helix: PDB ID 3BWP
289A:260A      W:SC   2.18      1.07      38.31      0.97       1.72       3.20
290C:315G      W:WC

Pseudocontinuous helix: PDB ID 2XQD
1735C:1719G    W:WC   1.77      8.45      35.11      0.29       2.06       3.60
1736G:1677G    W:WC


proteins in A-form double helices. Sequence conservation and base pair covariation studies on different functional domains of group I intron ribozymes indicate that the presence of several non-canonical base pairs is a requisite for the structures and enzymatic properties of these molecules (Chandrasekhar and Malathi, 2003). Phylogenetic studies on 16S and 23S ribosomal RNA molecules emphasize the importance of sequence conservation of the A:G W:WC base pair for proper functioning of ribosomes (Sponer et al., 2003). A G:A pair in an RNA dodecamer, consecutive G:U pairs in the Tetrahymena group I intron and G:A tandems in hammerhead ribozymes have been identified as metal-binding sites (Pley et al., 1994; Scott et al., 1995b; Baeyens et al., 1996; Cate and Doudna, 1996). G:U base pairs in RNA provide a complex array of hydrogen bond donors and acceptors, creating a surface area for binding of proteins and metal ions. It has to be mentioned here that a G:U base pair within the acceptor stem of tRNA plays a vital role in the aminoacylation process (Ramos and Varani, 1997). Purine:purine base pairing has been found to play an important role in the recognition of RNA by Rev protein and in the loop E family of 5S-28S ribosomal RNAs and hairpin ribozymes, by widening the narrow and deep major groove side of A-form RNA helices (Shen et al., 1995; Battiste et al., 1996; Vallurupalli and Moore, 2003). A 4 base pair long non-canonical motif containing a central G:A:A:G tandem is crucial for proper functioning of the selenocysteine insertion sequence element, near the 3′-UTR of eukaryotic selenoprotein mRNAs, in mediating selenoprotein translation (Walczak et al., 1998). Fig. 12(a) shows the disposition of functional groups for canonical A:U W:WC and G:C W:WC base pairs along with non-canonical base pairs like A:G H:ST and G:C W:WT, where the approachable functional groups have been indicated. The ball-and-stick representation of the base pairs shows that more polar functional groups are projected towards the minor groove side of the non-canonical base pairs as compared to those of canonical ones. In addition to these hydrogen binding sites, the shape of the minor groove also alters significantly due to these non-canonical base pairs. The electrostatic potential surfaces of two double helices, (i) a canonical A-form double helix (PDB: 1QCU) and (ii) helix 24 of 16S rRNA in Thermus thermophilus (PDB: 1N32) containing three successive non-canonical base pairs at the central region, explain the variability (Fig. 12(b)). It can be observed that the minor groove is rendered with more recognition sites due to the presence of non-canonical base pairs.

Fig. 11. Schematic representation of the secondary structural motifs of RNA: (a) A-form double helix, (b) Hairpin loop, (c) Internal loop, (d) Pseudocontinuous helices. In all these cases, dotted lines indicate the presence of non-canonical base pairs with a change in sugar-phosphate backbone direction. Canonical base pairs are denoted by broken lines and non-canonical base pairs are represented by curly lines.

8. Non-canonical base pairs in RNA structure prediction

Non-canonical base pairs come into play in another important area of nucleic acid research: prediction of RNA structure. RNA structure prediction methods commonly focus on two aspects to arrive at a stable structure from a sequence: (i) to maximize the number of base pairs and (ii) to minimize the free energy. Since base pairing complementarity is the principal basis of RNA structure prediction algorithms, which assume that G can pair with C or U and A can pair with U, all types of RNA secondary structure analysis begin by identifying self-complementary sequence regions in the single-stranded molecule (Hendrix et al., 2005). This can be done by various methods like the dynamic programming approach, the Nussinov algorithm etc. (Nussinov and Jacobson, 1980). Thus all possible choices of complementary structures are considered to find the most stable structure. Then these potential base pairing regions are analyzed following different algorithms like energy minimization considering stacking free energy, entropy due to loop formation, etc., to obtain a thermodynamically favorable structure. Stacks and base pairs are the dominant stabilizing forces, which contribute to the negative free energy, whereas unpaired bases form destabilizing loops, contributing the positive free energy or entropy of the system (Zuker and Stiegler, 1981). Thus, the total free energy of a conformation is obtained by adding up the energy terms for each component, as well as considering the nearest neighbour effects (Walter et al., 1994; Mathews et al., 1999). Such an energy minimization method is employed by RNA structure prediction software like mFold (Zuker, 2003), RNAfold (Zuker and Stiegler, 1981; McCaskill, 1990; Hofacker and Stadler, 2006) etc. These methods, however, are restricted to finding canonical A:U or G:C Watson–Crick base pairs or wobble G:U base pairs and do not take into account various other types of non-canonical geometries for calculating the free energy of a probable conformation and

Please cite this article in press as: Halder, S., Bhattacharyya, D., RNA structure and dynamics: A base pairing perspective, Progress in Biophysics and Molecular Biology (2013), http://dx.doi.org/10.1016/j.pbiomolbio.2013.07.003 14 S. Halder, D. Bhattacharyya / Progress in Biophysics and Molecular Biology xxx (2013) 1e20

Fig. 12. (a) A:U W:WC, (b) G:C W:WC, (c) A:G H:ST, (d) G:C W:WT. Base pairs are shown in ball and stick models and the sugar moieties are replaced by –CH3 groups for simplicity. For canonical base pairs, minor groove sides have been indicated. The minor groove sides are not well-defined for isolated non-canonical base pairs. However, within a helix, the functional groups protrude towards the minor groove regions. The probable positions of the ligand/solvent molecules have been indicated by green spheres. (e) and (f) show the electrostatic potential surfaces, calculated by non-linear solution of the Poisson–Boltzmann equation using DELPHI software, of 1QCU and of a non-canonical helix from 1N32 containing three successive non-canonical base pairs [A:A s:hT, A:U H:WT, A:G H:ST].
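The base-pair maximization step described in Section 8 (Nussinov and Jacobson, 1980) can be sketched as follows. This is a minimal illustrative sketch, not the implementation used by any of the programs cited in the text. The function names `can_pair` and `nussinov` and the `min_loop` parameter (the minimum number of unpaired bases enclosed by a pair) are assumptions introduced here. Only G:C, A:U and wobble G:U pairs are allowed, which is the same restriction that the text attributes to the standard prediction methods.

```python
# Allowed pairs: Watson-Crick G:C and A:U plus wobble G:U (an assumption
# mirroring the pairing rules stated in Section 8).
ALLOWED = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Return True if bases a and b may pair under the simple rules above."""
    return (a, b) in ALLOWED


def nussinov(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov-style recursion).

    min_loop is the minimum number of unpaired bases a pair must enclose
    (hypothetical parameter name; 3 is a common choice for hairpins).
    """
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = maximum number of pairs in the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if can_pair(seq[i], seq[j]):
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent parts
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]
```

For the short hairpin-like sequence `GGGAAAUCC`, `nussinov("GGGAAAUCC")` returns 3, corresponding to the pairs G1:C9, G2:C8 and G3:U7 enclosing an AAA loop. The sketch counts pairs only; it assigns no energy to stacking or loops, and it has no way to represent a non-canonical pair.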

hence the accuracy is never better than 70%. Non-canonical base pairs are sometimes treated as internal loops owing to the lack of knowledge about their base pairing or base stacking energy contributions towards the total free energy of the system. An extensive study of the stability and energetics of non-canonical base pairs and their stacks will undoubtedly improve the quality and accuracy of RNA structure prediction methods.

9. Structural stabilities of double helices containing non-canonical base pairs

Nanosecond-scale molecular dynamics simulations with explicit solvent models have been successfully employed earlier in the investigation of the structure, dynamics and deformability of different biopolymers such as proteins, DNA, bio-membranes, etc. to compensate for the limitations of experimental methods. This technique has proved to be very useful in structural studies, in spite of the approximations inherent to the force-fields and the limitations in sampling space due to limited computational resources. There have been many simulation studies on B-form DNA double helices, for analysis of their sequence-dependent deformability, elasticity, stacking interactions, or base pairing characteristics (Young et al., 1997; Dixit et al., 2005; Samanta et al., 2009). Molecular dynamics studies of non-canonical base pairs at the beginning of hairpin loops have also been reported (Menger et al., 2000; Sorin et al., 2002; Du et al., 2004; Villa et al., 2008). These studies indicate higher stability of tetraloops like GCAA, UUCG, CACG etc. in their 'closed' base paired structures,

where first and fourth residues of the loop regions form a stable base pair, sheared G:A S:HT, wobble U:G W:WC and C:G W:WC, respectively. The hydrogen atoms from base moieties involved in base pairing are observed to be protected from rapid exchange. However, tetraloops are seen to possess inherent flexibility of the second and third residues, where the bases are either stacked with the stem regions or in a looped-out conformation. The presence of looped-out residues with potential hydrogen bond donor and acceptor groups is important for the functionality of RNA hairpins in their interaction with proteins and ligands. The loss of stacking interaction between first and third residues of the tetraloops leads to their destabilization, bringing about unfolding.

In addition to hairpin loops, Sponer and co-workers have performed molecular dynamics studies on a number of RNA secondary structural motifs containing non-canonical base pairs like kink-turns, kissing loops, pseudoknots, A-minor interactions, sarcin-ricin motifs etc. (Csaszar et al., 2001; Nissen et al., 2001; Reblova et al., 2003; Spackova and Sponer, 2006; Reblova et al., 2011). In general, the motifs are stable and the non-canonical base pairs present within these motifs appear to be the key elements that impart stability. In the case of the pseudoknot motif, it has been found that protonation of a cytosine residue is crucial for its structural organization (Csaszar et al., 2001). Without protonation, the structure undergoes large rearrangements in the local environment. Non-canonical base pairs are also very common within double helical regions. Internal loops are formed with single or tandem non-canonical base pairs present in the central regions of double helical stretches formed with canonical Watson–Crick base pairs. Single mismatches or (1×1) internal loops are the most common motif observed in RNA secondary structures (Peritz et al., 1991) and these play integral functional and structural roles (Saito and Richardson, 1981; Calin-Jageman and Nicholson, 2003). Examples of a single G:A W:WC, U:U W:WC or A:C W:+C base pair flanked by regular Watson–Crick base pairs, forming near regular double helical stems, are very common. On the other hand, there are large numbers of double helices in functional RNA crystal structures where two or three non-canonical base pairs appear in tandem flanked by regular Watson–Crick base pairs. A detailed list of these motifs is given in Table 3.

Table 3
Tandem non-canonical motifs present within the double helices of different structural classes. The number of structures in each class is given in the last column.

Motif        RNA type     Organism                   Number of structures
G:A S:HT     23S rRNA     Haloarcula marismortui     70
A:G H:ST                  Thermus thermophilus       83
                          Escherichia coli           117
                          Deinococcus radiodurans    35
             16S rRNA     E. coli                    93
                          T. thermophilus            146
             Riboswitch   Synthetic                  72
A:A s:hT     16S rRNA     E. coli                    93
A:U H:WT                  T. thermophilus            146
A:G H:ST
G:A S:HT     23S rRNA     H. marismortui             70
A:G H:ST
A:G H:ST                  E. coli                    83
                          T. thermophilus            117
U:G S:WC     23S rRNA     H. marismortui             70
U:U W:WC
G:A S:HT     23S rRNA     T. thermophilus            83
A:U H:WT
A:G H:ST
A:G W:WC     23S rRNA     T. thermophilus            83
A:G W:WC                  E. coli                    117
G:A S:HT     16S rRNA     T. thermophilus            146
G:A S:HT                  E. coli                    93
A:G H:ST
U:U W:WC
G:A S:HT     23S rRNA     T. thermophilus            83
G:A S:HT
A:G H:ST
U:U W:WC     23S rRNA     H. marismortui             70
U:U W:WC     IRES RNA     Cricket paralysis virus    Few structures
G:A S:HT     23S rRNA     E. coli                    117
A:G H:ST
G:G H:zT
G:G z:HT     16S rRNA     T. thermophilus            146
U:A W:HT                                             (in very few structures)
A:G H:ST
U:C W:WC     23S rRNA     D. radiodurans             35
U:U W:WC                                             (in very few structures)
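The (n×n) internal-loop terminology used above and in Table 3 can be made concrete with a small sketch that scans the base pairs of one helix in order and reports each run of non-canonical pairs flanked on both sides by regular Watson–Crick pairs. This is only an illustration under stated assumptions: the annotation strings follow the "X:Y EDGE" form used in this review, the helper names `is_watson_crick` and `find_internal_loops` are hypothetical, and only A:U and G:C pairs in W:WC geometry are counted as flanking Watson–Crick pairs (the wobble G:U pair is treated as non-Watson–Crick for simplicity).

```python
# Base combinations counted as regular Watson-Crick pairs (assumption:
# wobble G:U is deliberately excluded to keep the sketch simple).
WATSON_CRICK = {"A:U", "U:A", "G:C", "C:G"}


def is_watson_crick(pair):
    """pair is an annotation such as 'G:C W:WC' or 'G:A S:HT'."""
    bases, geometry = pair.split()
    return bases in WATSON_CRICK and geometry == "W:WC"


def find_internal_loops(pairs):
    """Return (start_index, n) for every run of n consecutive non-canonical
    pairs that is flanked on both sides by Watson-Crick pairs."""
    loops = []
    i, total = 0, len(pairs)
    while i < total:
        if is_watson_crick(pairs[i]):
            i += 1
            continue
        j = i
        while j < total and not is_watson_crick(pairs[j]):
            j += 1  # extend the run of non-canonical pairs
        if i > 0 and j < total:  # flanked at both ends
            loops.append((i, j - i))
        i = j
    return loops


# Hypothetical helix: a tandem G:A/A:G motif and a single U:U mismatch.
helix = ["G:C W:WC", "C:G W:WC", "G:A S:HT", "A:G H:ST",
         "C:G W:WC", "U:U W:WC", "A:U W:WC"]
loops = find_internal_loops(helix)
```

For the hypothetical helix above, `loops` is `[(2, 2), (5, 1)]`, that is, one (2×2) tandem motif starting at the third pair and one (1×1) single mismatch at the sixth pair. Runs at the helix termini are deliberately not reported, because they are not flanked on both sides.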
Though a large number of studies exist regarding the structure and dynamics of non-canonical base pairs situated at the termini of RNA double helices or at the beginning of hairpin loops (Menger et al., 2000; Sorin et al., 2002; Reblova et al., 2003, 2007; Romanowska et al., 2008; Villa et al., 2008; Ditzler et al., 2009; Reblova et al., 2011), very few studies explain the nature of dynamics and structural features of A-form double-helical stretches of RNA containing non-canonical base pairing motifs (Reblova et al., 2006; Spackova and Sponer, 2006). The question therefore remains whether these non-canonical base pairs have emerged in the large macromolecules due to constraints of the remaining sections. Had these non-canonical base pairs arisen in RNA due to contextual pressure or chance, they would be unstable in the absence of any such pressure. Moreover, systematic characterization of these base pairs and base pair stacks is essential for proper understanding of their contribution towards RNA stability. Although there have been both classical and quantum chemical studies of the structure and energetics of different non-canonical base pairs (Hobza and Sponer, 1999; Sponer et al., 2003, 2004; Bhattacharyya et al., 2007; Roy et al., 2008; Sharma et al., 2008; Mladek et al., 2009; Chawla et al., 2011; Panigrahi et al., 2011a,b), little is known about the base pair steps containing them.

In order to characterize the role of these non-canonical base pairs, that is, whether they are stabilized by the rest of the RNA structure or act as a seed for formation of the double helical regions, we performed molecular dynamics simulations of four double helical RNA fragments containing (1×1), (2×2) and (3×3) non-canonical base pairs in their central regions (Halder and Bhattacharyya, 2010, 2012). Table 3 indicates that (G:A S:HT)::(A:G H:ST), (A:A s:hT)::(A:U H:WT)::(A:G H:ST) and (G:A S:HT)::(A:G H:ST)::(A:G H:ST) are the most common among various naturally occurring motifs. It was observed from MD simulations that the tandem non-canonical base pairs present within RNA duplex regions are more stable than the singly occurring ones like A:G W:WC and U:U W:WC. Single mismatches were seen to undergo structural transition after a certain period of time, whereas tandem mismatches remained stable throughout the production runs and retained their initial orientations. These observations were validated from the RNA structural classification (Halder and Bhattacharyya, 2012; Ray et al., 2012). The occurrences of single mismatches in their respective structural classes are lower than those of the 2×2 and 3×3 mismatches. Helices containing tandemly occurring non-canonical base pairs are more conserved in crystal structures. We find that A:G and U:U base pairs have frequencies of 194 and 374, respectively, within double helical regions, while tandem mismatches have frequencies of 829 (for 2×2) and 97 (for 3×3). Both the (2×2) motif and the U:U mismatch are found in the small ribosomal subunit (16S rRNA) of T. thermophilus and Escherichia coli and the large ribosomal subunits (23S rRNA) of Haloarcula marismortui, Deinococcus radiodurans, T. thermophilus and E. coli, sometimes occurring more than once within the same molecule. The A:G mismatch

is found within small subunits of T. thermophilus and E. coli as well as the large subunit of E. coli. The (3×3) motif is less populated and is found only within the small ribosomal subunit (16S rRNA) of T. thermophilus and E. coli. However, this motif is highly conserved among all the crystal structures of 16S rRNA of T. thermophilus and E. coli. On the other hand, although U:U and A:G mismatches are found in more types of molecules spanning both large and small ribosomal subunits, the base pairs are not conserved among all the structures in the respective classes and thus have lower propensity within double helical regions.

Non-canonical mismatches naturally induce deformations within the helical stack, thus causing deviation from canonical A-form. While purine–purine mismatches like A:G W:WC require larger strand separation, a pyrimidine–pyrimidine base pair like U:U would require a narrowing in the helical architecture. The A:G W:WC base pair has an initial distance of 13.0 Å between the C1′ atoms, whereas the values for normal Watson–Crick base pairs are 10.69 Å (A–U/U–A) and 10.77 Å (G=C/C=G) (Panigrahi et al., 2011a,b). Such deformations are perhaps required for proper molecular recognition. The (2×2) and (3×3) non-canonical base pairs also introduce a cleft in the central regions of helices, having lower strand separations than canonical Watson–Crick base pairs. Considering the distance between C1′ atoms as an estimate of helical strand separation, we can easily imagine a swelled region within the helix due to the presence of an A:G W:WC pair. On the contrary, wobble G:U in its original W:WC form is nearly isosteric with canonical Watson–Crick base pairs, having a strand separation of 10.30 Å. It has been found that G:U base pairs adjacent to the single mismatches undergo structural transition to the W:SC form, where the strand separation is lower (9.7 Å). Thus, the G:U base pair appears to act as a shock-absorbing buffer, providing the required flexibility in the local environment through its bi-phasic stability.

According to the wobble hypothesis, G:U base pairs are known to populate the third position of codon-anticodon recognition, along with other wobble base pairs containing inosine, during the protein synthesis process within the ribosome. Thus these base pairs often appear at the termini of mini-helices between the codon of mRNA and the anticodon of tRNA, giving rise to the distortion necessary for the abrupt change in strand direction. It is also speculated that the overall geometry of the codon-anticodon mini-helix is, in many cases, determined by that of the wobble position. The wobble position is already known to display variation in terms of both base pairing geometry and chemical modification of the bases. We have seen that the structural transition of G:U base pairs is crucial to induce flexibility at the junction of canonical and non-canonical regions within a double helix. Additionally, the protein translation mechanism requires flexibility at this region because the tRNA binds to mRNA with high specificity and leaves the binding site of mRNA once peptide bond formation is complete. The bi-phasic stability of G:U base pairs at the wobble position may play an important role in the regulation of short-lived codon-anticodon recognition by acting as a conformational switch.

We have discussed that non-canonical base pairs play an important role in RNA structural organization by accommodating changes in backbone direction. The inclusion of non-canonical base pairs within double helical stretches is also associated with the introduction of conformational strain into the backbone (Halder and Bhattacharyya, 2010). Thus the backbone conformation of RNA has been studied extensively and used for prediction and searching of structural motifs in many cases (Murray et al., 2003; Schneider et al., 2004; Zirbel et al., 2009). A few specific base–phosphate interactions in ribosomal RNAs have been found to be phylogenetically conserved. Many hairpin loops are stabilized by hydrogen bonding between nucleobases and phosphate groups (Zirbel et al., 2009). This conserved nature of base–phosphate interactions in RNA has been implemented in FR3D software for searching recurrent RNA structural motifs (Sarver et al., 2008). Duarte and Pyle devised a novel approach for characterizing specific structural distortions within an RNA polymer, by introducing the concept of pseudorotation torsion angles around the P–C4′ joining pseudobonds in the backbones (Duarte and Pyle, 1998). Based on the P–C4′ pseudorotation values, PRIMOS software can classify and identify different structural motifs, e.g., GNRA tetraloops, in RNA molecules (Duarte et al., 2003; Wadley et al., 2007). Another important feature for describing RNA backbone conformation is the intra-strand C1′–C1′ distance along the chain. In our study of a non-redundant set of RNA structures, we found that double helical regions with Watson–Crick base pairs have an average intra-strand C1′–C1′ separation of 5.5–5.7 Å, whereas the values for non-canonical base pairs are higher. Moreover, unpaired regions have an intra-strand C1′–C1′ separation of 8–9 Å. Therefore, consideration of the backbone conformations is an essential aspect of making RNA structure prediction more reliable and accurate.

10. Conclusion

In the last few decades, the study of functional RNAs has become essential because of the structural and functional diversity displayed by them. It has been established through many experimental works that non-canonical base pairs are the key to tertiary structural organization of RNA. They are seen to be present in various types of RNA secondary structural motifs. We have seen that non-canonical base pairs present in tandem within the double helical regions are highly stable, whereas singly occurring non-canonical base pairs undergo structural transition to maintain the helical architecture and to reduce the conformational strain introduced in the double helix by the inclusion of these mismatches. The substantial stability of non-canonical base pairs is indicative of their importance as a seed or 'nucleation site' for folding of functional RNAs. In terminal regions, the non-canonical base pairs have a role to play in helix-capping for stabilization of the double helix by preventing terminal melting. Such 'capping' roles of non-canonical base pairs are also observed at the junction of stem-loop regions and at the interfaces of coaxial helices. On the other hand, DNA molecules are rather monotonous and have only long double helical structures. They require high nucleation energy for opening of a melting bubble during replication and transcription processes, which can then propagate in both directions to give a cooperative melting profile. Such cooperative melting is not required for RNA, as RNA structures are composed of short double helical regions interspersed with stretches of unpaired nucleotides. Terminal melting is an inherent tendency of double helices because of their finite length and the partial hydrophobic character of the base faces. This ease of melting is possibly utilized by nature in forming molecules like riboswitches, which can easily convert to an alternative conformation depending on the presence of a particular type of metabolite or ligand (Vitreschak et al., 2004; Tucker and Breaker, 2005; Sharma et al., 2009a,b). It is a prerequisite for RNA double helices to be capped by non-canonical base pairs, hairpin loops or coaxial stacking to maintain a stable structure.

Acknowledgement

We are thankful to Prof. Abhijit Mitra and Prof. Manju Bansal, Dr. Purshottam Sharma, Dr. Arvind Marathe, D.K. Senthil Kumar, Pavan Kumar Pingali and Angana Ray for suggestions and discussions. We are also thankful to Swati Panigrahi and Rahul Pal for providing unpublished RNA base pair data through their database. We are grateful to the Department of Biotechnology, Govt. of India for partial financial support.


References Convery, M.A., Rowsell, S., Stonehouse, N.J., Ellington, A.D., Hirao, I., Murray, J.B., Peabody, D.S., Phillips, S.E., Stockley, P.G., 1998. Crystal structure of an RNA aptamer-protein complex at 2.8 Å resolution. Nat. Struct. Biol. 5 (2), 133e139. Abu Almakarem, A.S., Petrov, A.I., Stombaugh, J., Zirbel, C.L., Leontis, N.B., 2012. Nucl. Correll, C.C., Beneken, J., Plantinga, M.J., Lubbers, M., Chan, Y.L., 2003. The common Acids Res. 40 (4), 1407e1423. and the distinctive features of the bulged-G motif based on a 1.04 Å resolution Auffinger, P., Westhof, E., 1996. H-bond stability in the tRNA(Asp) anticodon hairpin: RNA structure. Nucl. Acids Res. 31 (23), 6806e6818. 3 ns of multiple molecular dynamics simulations. Biophys. J. 71 (2), 940e954. Correll, C.C., Swinger, K., 2003. Common and distinctive features of GNRA tetraloops Auffinger, P., Westhof, E., 1998. Hydration of RNA base pairs. J. Biomol. Struct. Dyn. based on a GUAA tetraloop structure at 1.4 Å resolution. RNA 9 (3), 355e363. 16 (3), 693e707. Costa, M., Michel, F., 1995. Frequent use of the same tertiary motif by self-folding Auffinger, P., Westhof, E., 1999. Singly and bifurcated hydrogen-bonded base-pairs RNAs. EMBO J. 14 (6), 1276e1285. in tRNA anticodon hairpins and ribozymes. J. Mol. Biol. 292 (3), 467e483. Costa, M., Michel, F., 1997. Rules for RNA recognition of GNRA tetraloops deduced by Bachellerie, J.P., Cavaille, J., Huttenhofer, A., 2002. The expanding snoRNA world. in vitro selection: comparison with in vivo evolution. EMBO J.16 (11), 3289e3302. Biochimie 84 (8), 775e790. Costa, M., Michel, F., Westhof, E., 2000. A three-dimensional perspective on Baeyens, K.J., Debondt, H.L., Holbrook, S.R., 1995. Structure of an RNA double binding by a group ii self-splicing intron. EMBO J. 19 (18), 5007e5018. helix including uracil-uracil base-pairs in an internal loop. Nat. Struct. Biol. 2 Csaszar, K., Spackova, N., Stefl, R., Sponer, J., Leontis, N.B., 2001. Molecular dynamics (1), 56e62. 
of the frame-shifting pseudoknot from beet western yellows virus: the role of Baeyens, K.J., DeBondt, H.L., Pardi, A., Holbrook, S.R., 1996. A curved RNA helix non- WatsoneCrick base-pairing, ordered hydration, cation binding and base incorporating an internal loop with GA and AA non-WatsoneCrick base on stability and unfolding. J. Mol. Biol. 313 (5), 1073e1091. pairing. Proc. Natl. Acad. Sci. U. S. A. 93 (23), 12851e12855. Das, G.K., Bhattacharyya, D., Burma, D.P., 1999. A possible mechanism of peptide Ban, N., Nissen, P., Hansen, J., Moore, P.B., Steitz, T.A., 2000. The complete atomic bond formation on ribosome without mediation of peptidyl transferase. structure of the large ribosomal subunit at 2.4 Å resolution. Science 289 (5481), J. Theoret. Biol. 200 (2), 193e205. 905e920. Das, J., Mukherjee, S., Mitra, A., Bhattacharyya, D., 2006. Non-canonical base pairs Bansal, M., Bhattacharyya, D., Ravi, B., 1995. NUPARM and NUCGEN: software for and higher order structures in nucleic acids: crystal structure database analysis. analysis and generation of sequence dependent nucleic acid structures. Com- J. Biomol. Struct. Dyn. 24 (2), 149e161. put. Aided Biosci. 11, 281e287. Desiraju, G.R., 2010. A bond by any other name. Angew. Chem. Int. Ed. 49, 2e10. Batey, R.T., Rambo, R.P., Doudna, J.A., 1999. Tertiary motifs in RNA structure and Dickerson, R.E., 1989. Definitions and nomenclature of folding. Angew. Chem. Int. Ed. 38 (16), 2327e2343. components. Nucl. Acids Res. 17, 1797e1803. Bartel, D.P., 2009. MicroRNAs: target recognition and regulatory functions. Cell 136 Dickerson, R.E., 1998. DNA bending: the prevalence of kinkiness and the virtues of (2), 215e233. normality. Nucl. Acids Res. 26, 1906e1926. Batey, R.T., 2006. Structures of regulatory elements in mRNAs. Curr. Opin. Struct. Ditzler, M.A., Sponer, J., Walter, N.G., 2009. Molecular dynamics suggest multi- Biol. 16 (3), 299e306. 
functionality of an adenine imino group in acid-base catalysis of the hairpin Battiste, J.L., Mao, H.Y., Rao, N.S., Tan, R.Y., Muhandiram, D.R., Kay, L.E., Frankel, A.D., ribozyme. RNA 15 (4), 560e575. Williamson, J.R., 1996. Alpha helix-RNA major groove recognition in an HIV-1 Dixit, S.B., Beveridge, D.L., Case, D.A., Cheatham 3rd, T.E., Giudice, E., Lankas, F., rev peptide rre RNA complex. Science 273 (5281), 1547e1551. Lavery, R., Maddocks, J.H., Osman, R., Sklenar, H., Thayer, K.M., Varnai, P., 2005. Berger, I., Egli, M., Rich, A., 1996. Inter-strand C-H...O hydrogen bonds stabilizing Molecular dynamics simulations of the 136 unique tetranucleotide sequences of four-stranded intercalated molecules: stereoelectronic effects of O4’ in DNA oligonucleotides. II: sequence context effects on the dynamical structures cytosine-rich DNA. Proc. Natl. Acad. Sci. U. S. A. 93 (22), 12116e12121. of the 10 unique dinucleotide steps. Biophys. J. 89, 3721e3740. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Dorner, S., Panuschka, C., Schmid, W., Barta, A., 2003. Mononucleotide derivatives as Shindyalov, I.N., Bourne, P.E., 2000. The protein data Bank. Nucl. Acids Res. 28, ribosomal P-site substrates reveal an important contribution of the 20-OH to 235e242. activity. Nucl. Acids Res. 31, 6536e6542. Bernstein, F.C., Koetzle, T.F., Williams, G.J., Meyer Jr., E.F., Brice, M.D., Rodgers, J.R., Duarte, C.M., Pyle, A.M., 1998. Stepping through an RNA structure: A novel approach Kennard, O., Shimanouchi, T., Tasumi, M.,1977. The protein data bank: a computer- to conformational analysis. J. Mol. Biol. 284 (5), 1465e1478. based archival file for macromolecular structures. Eur. J. Biochem. 80 (2), 319e324. Duarte, C.M., Wadley, L.M., Pyle, A.M., 2003. RNA structure comparison, motif Bhattacharyya, D., Koripella, S.C., Mitra, A., Rajendran, V.B., Sinha, B., 2007. 
Theo- search and discovery using a reduced representation of RNA conformational retical analysis of noncanonical base pairing interactions in RNA molecules. space. Nucl. Acids Res. 31 (16), 4755e4761. J. Biosci. 32 (5), 809e825. Du, Z., Ulyanov, N.B., Yu, J., Andino, R., James, T.L., 2004. NMR structure of loop B Bindelwald, E., Hayes, R., Yingling, Y., Kasprzak, W., Shapiro, B.A., 2008. RNA- RNAs from the stem-loop IV domain of the enterovirus internal ribosome entry Junction: a database of RNA junctions and kissing loops for three-dimensional site: a single C to U substitution drastically changes the shape and flexibility of structural analysis and nanodesign. Nucl. Acids Res. 36, D392eD397. RNA. Biochemistry 43, 5757e5771. Bloomfield, V.A., Crothers, D.M., Tinoco, I., Hearst, J.E., Wemmer, D.E., Killman, P.A., Earnshaw, D.J., Masquida, B., Muller, S., Sigurdsson, S.T., Eckstein, F., Westhof, E., Turner, D.H., 2000. Nucleic Acids: Structures, Properties and Functions. Uni- Gait, M.J., 1997. Inter-domain cross-linking and molecular modelling of the versity Science Books, Sausalito, California, USA. hairpin ribozyme. J. Mol. Biol. 274 (2), 197e212. Brandl, M., Lindauer, K., Meyer, M., Suhnel, J., 1999. CeH...O and CeH...N interactions Ennifar, E., Walter, P., Ehresmann, B., Ehresmann, C., Dumas, P., 2001. Crystal in RNA structures. Theor. Chem. Acc. 101 (1e3), 103e113. structures of coaxially stacked kissing complexes of the HIV-1 RNA dimeriza- Butcher, S.E., Dieckmann, T., Feigon, J., 1997. Solution structure of the conserved tion initiation site. Nat. Struct. Biol. 8 (12), 1064e1068. 16S-like ribosomal RNA UGAA tetraloop. J. Mol. Biol. 268 (2), 348e358. Ferre-D’Amare, A.R., Zhou, K., Doudna, J.A., 1998. Crystal structure of a hepatitis Butcher, S.E., Allain, F.H., Feigon, J., 1999. Solution structure of the loop B domain delta virus ribozyme. Nature 395 (6702), 567e574. from the hairpin ribozyme. Nat. Struct. Biol. 6 (3), 212e216. 
Gautheret, D., Konings, D., Gutell, R.R., 1994. A major family of motifs involving GA Butcher, S.E., Pyle, A.M., 2011. The molecular interactions that stabilize RNA tertiary mismatches in ribosomal-RNA. J. Mol. Biol. 242 (1), 1e8. structure: RNA motifs, patterns, and networks. Acc. Chem. Res. 44 (12),1302e1311. Gendron, P., Lemieux, S., Major, F., 2001. Quantitative analysis of nucleic acid three- Calin-Jageman, I., Nicholson, A.W., 2003. Mutational analysis of an RNA internal dimensional structures. J. Mol. Biol. 308, 919e936. loop as a reactivity epitope for Escherichia coli ribonuclease III substrates. Ghosh, A., Bansal, M., 2003. A glossary of DNA structures from A to Z. Acta Cryst. D. Biochemistry 42 (17), 5025e5034. Biol. Cryst. 59, 620e626. Calladine, C.R., 1982. Mechanics of sequence-dependent stacking of bases in B-DNA. Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A., Eddy, S.R., 2003. Rfam: an J. Mol. Biol. 25, 343e352. RNA family database. Nucl. Acids Res. 31 (1), 439e441. Cantor, C.R., Schimmel, P.R., 1980. Biophysical Chemistry. In: Part II: Techniques for Griffiths-Jones, S., Moxon, S., Marshall, M., Khanna, A., Eddy, S.R., Bateman, A., 2005. the Study of Biological Structure and Function. W.H. Freeman & Co, New York. Rfam: annotating non-coding RNAs in complete . Nucl. Acids Res. 33 Cate, J.H., Doudna, J.A., 1996. Metal-binding sites in the major groove of a large (Database issue), D121eD124. ribozyme domain. Structure 4 (10), 1221e1229. Guerriertakada, C., Gardiner, K., Marsh, T., Pace, N., Altman, S., 1983. The RNA moiety Cate, J.H., Gooding, A.R., Podell, E., Zhou, K., Golden, B.L., Kundrot, C.E., Cech, T.R., of ribonuclease-P is the catalytic subunit of the enzyme. Cell 35 (3), 849e857. Doudna, J.A., 1996. Crystal structure of a group I ribozyme domain: principles of Halder, S., Bhattacharyya, D., 2010. Structure stability of tandemly occurring non- RNA packing. Science 273 (5282), 1678e1685. 
canonical basepairs within double helical fragments: molecular dynamics Chandrasekhar, K., Malathhi, R., 2003. Non-Watson Crick base pairs might stabilize studies of functional RNA. J. Phys. Chem. B 114, 14028e14040. RNA structural motifs in ribozymes e a comparative study of group-I intron Halder, S., Bhattacharyya, D., 2012. Structural variations of single and tandem structures. J. Biosci 28 (5), 547e555. mismatches in RNA duplexes: a joint MD simulation and crystal structure Chang, K.Y., Tinoco Jr., I., 1994. Characterization of a kissing hairpin complex derived database analysis. J. Phys. Chem. B 116, 11845e11856. from the human immunodeficiency virus . Proc. Natl. Acad. Sci. U. S. A. Hendrix, D.K., Brenner, S.E., Holbrook, S.R., 2005. RNA structural motifs: building 91 (18), 8705e8709. blocks of a modular . Q. Rev. Biophys. 38 (3), 221e243. Chawla, M., Sharma, P., Hader, S., Bhattacharyya, D., Mitra, A., 2011. Protonation of Hermann, T., Patel, D.J., 1999. Stitching together RNA tertiary architectures. J. Mol. base pairs in RNA: context analysis and quantum chemical investigations of Biol. 294 (4), 829e849. their geometries and stabilities. J. Phys. Chem. B 115 (6), 1469e1484. Heus, H.A., Pardi, A., 1991. Structural features that give rise to the unusual stability Chen, C., Russu, I.M., 2004. Sequence-dependence of the energetics of opening of at of RNA hairpins containing GNRA loops. Science 253 (5016), 191e194. basepairs in DNA. Biophys. J. 87 (4), 2545e2551. Higgs, P.G., 2000. RNA secondary structure: physical and computational aspects. Cheong, C., Varani, G., Tinoco Jr., I., 1990. Solution structure of an unusually stable 0 Q. Rev. Biophys. 33 (3), 199e253. RNA hairpin, 5 GGAC(UUCG)GUCC. Nature 346 (6285), 680e682.

